YouTube CEO Neal Mohan said OpenAI’s potential use of YouTube videos to train its text-to-video model Sora would violate YouTube’s terms of service.

Mohan told Bloomberg that if Sora used content from YouTube, it would be a “clear violation” of its terms of service.

There will be no love lost between YouTube and OpenAI, which sit on opposite sides of the Big Tech divide.

Sora is OpenAI’s revolutionary new text-to-video model, which is still being tested. It signifies generative AI’s conquest of all media forms, starting with text, then images, and now audio and video.

Generative video and audio bring a new set of risks for AI firms to negotiate, such as their models producing near-exact replicas of copyrighted material.

We’ve already witnessed this with text-to-audio model Suno, which produces audio very similar to famous songs like Queen’s “Bohemian Rhapsody” and ABBA’s “Dancing Queen.”

Neither OpenAI nor most other AI firms have been notably transparent about their reliance on vast amounts of internet-sourced data, including copyrighted material, to train models.

OpenAI even acknowledged the challenges of avoiding copyrighted data in its development processes, stating in a submission to the British House of Lords that it was “not possible” to build the technology without it.

That was something of a Freudian slip that exposed an inconvenient truth.

However, despite OpenAI stating that copyrighted data is unequivocally vital for generative AI, infringement has not yet been proven in a court of law, reflecting how copyright law in its current form was simply not built for this era.

On the subject of training Sora specifically, OpenAI CTO Mira Murati, in an interview with The Wall Street Journal, seemingly didn’t know what content was used to train the model, including whether any YouTube content was involved.

Murati said, “I’m actually unsure about that,” when asked about the content sources for Sora’s training, adding that any data used was either “publicly available or licensed.”

It’s not a glowing record of transparency for OpenAI as it prepares to release its groundbreaking new model, one it is already pitching to Hollywood for potential applications in film and TV.

Sora has already prompted producer Tyler Perry to pause an $800 million studio expansion, hinting at potentially massive upheaval ahead for the creative industries.

YouTube’s CEO speaks about Sora

YouTube CEO Mohan showed his awareness of the ongoing discussions about AI training practices. He hinted at OpenAI’s need to clarify its use of YouTube data.

He told Bloomberg, “From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by. It doesn’t allow for things like transcripts or video bits to be downloaded, and that could be a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

YouTube’s terms of service explicitly “prohibit unauthorized scraping or downloading of YouTube content,” a policy confirmed by a spokesperson for YouTube in light of Mohan’s comments.

Alphabet, YouTube’s parent company, is keenly developing its own AI tools. We can expect backlash if OpenAI directly or indirectly used YouTube videos to train Sora.

The AI data gold rush has led to strategic partnerships and licensing agreements between tech firms and content providers. Numerous lawsuits are still in progress in the domains of text and image generation, but these remain largely inconclusive.

First, even when AI models expose themselves by reproducing copyrighted work (such as Midjourney spitting out images from Marvel movies or The Simpsons), their black box nature makes it nigh-impossible to determine where this data was retrieved from and when precisely the infringement occurred.

Second, while AI-generated audio, images, video, etc., might provide strong evidence of infringement, it’s not as clear-cut as you or me copying a picture of Mickey Mouse and selling it for tens of millions without permission.

In response to these legal pressures, AI firms are beginning to strike deals for valuable data.

For instance, Reddit’s $60 million per year licensing deal with Google for training AI tools exemplifies the formal arrangements emerging in the industry.

Similarly, media organizations such as The Associated Press and Axel Springer have entered into agreements allowing their content to be used for AI training, with provisions for attribution in AI-generated responses.

This presents its own challenges. Generative AI is expensive to build and run, and now AI firms must pay for the data rather than simply extract it from the web.

The post YouTube CEO warns OpenAI about potential terms of service violation appeared first on DailyAI.

This article was originally published at dailyai.com