Google vs. OpenAI: The Battle Over YouTube Data Unveiled

Discover how OpenAI utilized YouTube videos for AI training, sparking controversy with Google. Learn about the implications of this practice.
  • Artificial intelligence is running out of data to train on.
  • With so much money at stake, companies do not hesitate to skirt the law to obtain data that is not public.

Introduction

Google accused OpenAI last week of using YouTube videos to train Sora. Now, an investigation by The New York Times reports that OpenAI also transcribed more than a million hours of YouTube videos with Whisper, its AI that converts audio into text, and used the resulting text to train its language models.

As expected, this has not sat well with Google: OpenAI is not only using its data but is also its most direct rival in the field of artificial intelligence.

We’ll see whether this case reaches the courts or whether the two companies strike an agreement that benefits them both.

OpenAI used YouTube videos to train its AI

Artificial intelligence needs real-world data to improve, and the more advanced an AI is, the more data it needs.

According to The New York Times, via The Verge, the major AI companies have already consumed all the public data available for training AI, as well as the private collections they have licensed.

According to the investigation, OpenAI ran out of data in 2021. Its executives then discussed the possibility of using YouTube videos, podcasts, and audiobooks, even knowing they were in a legal “grey zone”.

In the end, they decided to take more than a million hours of YouTube videos and transcribe the audio with Whisper, their speech-to-text AI, to obtain text for training. They would invoke “fair use”, arguing that they were using only a fraction of the hundreds of billions of hours of video on YouTube.

YouTube as a Goldmine:

Reportedly, OpenAI's president, Greg Brockman, was personally involved in collecting those videos.

Google spokesman Matt Bryant told The Verge that the company had “seen unconfirmed reports” of OpenAI's activity and said that “both our robots.txt files and the terms of service prohibit the scraping or unauthorized download of YouTube content.”

The New York Times investigation also reports that Meta ran out of data long ago and considered licensing books and even buying a large publishing house.

According to some experts, by 2028 AI companies may need more data than is being created.

The proposed solutions are to create synthetic data, that is, data generated artificially for training AI, or to use other training methods that do not require so much data. But so far, neither has been proven to work.

Conclusion

AI companies are competing in a relentless race to dominate a market that will generate a lot of money, and they do not hesitate to disregard copyright in order to train their AI faster than their rivals. It is a reckless race that casts doubt on the supposed safety of that AI, assuming it does not annihilate us or make us its slaves first…

Lineesh Kumar

Expertise in digital marketing