The legal landscape around fair use in AI training is evolving rapidly, and if your models rely on unlicensed materials scraped from the Internet, the latest court rulings are a clear warning sign. If you have video content available to license, the opportunity to monetize that content just got stronger.
Two federal judges recently issued decisions on whether using copyrighted books to train large language models (LLMs) qualifies as fair use. While both opinions ultimately found fair use in the specific circumstances, neither ruling offers blanket immunity for AI developers, and both highlight serious risks that become even more pronounced when the training data involves video content.
Below, I will break down these decisions, why they matter, and how they apply to video training, especially when that video is scraped without permission from platforms like YouTube.
Two Decisions, One Cautionary Tale
In June 2025, Judge Alsup (Bartz v. Anthropic) and Judge Chhabria (Kadrey v. Meta) issued opinions in lawsuits brought by authors against AI companies. Though the rulings reached the same bottom line of fair use, each decision emphasized different rationales and limitations.
Judge Alsup: The Problem with Pirated Inputs
Key takeaway:
Downloading and storing pirated books from shadow libraries like Library Genesis was not fair use.
Judge Alsup ruled that Anthropic’s creation of a permanent library of unauthorized book copies was a separate infringing act, regardless of whether the ultimate model outputs were transformative. He ordered a trial on damages, signaling that input acquisition matters as much as model output.
Judge Chhabria: A Warning About Market Dilution
While Judge Chhabria ruled that Meta’s use was fair due to a lack of evidence of actual market harm, he embraced a new theory of copyright harm: market dilution.
According to this view, using copyrighted works to build a model that enables countless derivative or competing outputs, even if those outputs do not copy anything verbatim, could be an infringement if it undermines the market for the original works.
In his own words: “No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to develop a tool to make billions while enabling the creation of a potentially endless stream of competing works that could significantly harm the market.”
Applying These Lessons to Video Training Data
Many AI companies have trained models on video downloaded from YouTube, Vimeo, or other platforms, often without the creators’ knowledge or consent.
Just as the judges examined whether downloading pirated text datasets was a separate infringing use, a court would almost certainly ask:
- Did the AI company acquire those video files legally?
- Was there a valid license from the content owner?
- Did the scraping violate the platform’s terms of service or copyright law?
If the answer is no, the company may face infringement claims similar to those in Bartz v. Anthropic, or worse, because verbatim reproduction is often easier to prove with video.
Moreover, the market dilution theory applies even more strongly to video:
An AI model trained on YouTube clips could eventually generate new videos that substitute for the originals, undermining demand.
This potential for substitution, even by non-infringing outputs, is exactly what Judge Chhabria warned could tip the balance against fair use in future cases.
Ethically Sourced Video Is the Sustainable Path
These rulings show that simply calling AI training transformative is no longer enough. The legal and ethical case for sourcing video responsibly has never been clearer:
Ethically sourced video means the content owner has given permission and is fairly compensated when an AI company uses that content to develop the next generation of “known world” models on the path to AGI.
By licensing video through an ethical third-party broker, an AI company supports the creators whose work enables its models to learn in the first place, and:
- Avoids the liability that Judge Alsup identified in building an unauthorized dataset.
- Reduces the risk of market harm claims under Judge Chhabria’s dilution theory.
What This Means for AI Companies
These cases are not free passes; they are early guideposts in an unsettled legal landscape. Here’s what you should take away:
- Fair use can apply to training if the content is lawfully acquired and the purpose is transformative.
- Pirated or scraped data remains a major risk, and courts are increasingly willing to scrutinize it.
- Market dilution arguments are gaining traction and could lead to liability even when no direct copying occurs.
In the case of video training data, where the stakes are higher and the evidence trails are even clearer, licensing and ethical sourcing are not only the right thing to do but also the safest legal strategy.
The Versos Commitment
Our company has chosen a clear path. We are building a video data processing platform that tracks copyright through the supply chain, supporting AI models with ethically sourced video: licensed from rights holders who are fairly compensated, clearly documented, and tracked with a “chain-of-custody” record (a minimal sketch of such a record follows the list below). We believe this approach:
- Protects the creators whose work shapes our cultural memory.
- Reduces legal exposure for our business and partners.
- Enables us to innovate responsibly, with transparency and accountability.
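To make “chain-of-custody” concrete, here is a minimal, hypothetical sketch in Python of what a provenance record for a licensed video asset could look like. The field names, hashing scheme, and event model are illustrative assumptions, not Versos’s actual platform schema; the point is that every asset carries its license reference and a tamper-evident log of how it moved through the pipeline.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical provenance record -- illustrative only, not Versos's actual schema.
@dataclass
class CustodyEvent:
    actor: str      # who handled the asset (e.g., "ingest-service")
    action: str     # what happened (e.g., "licensed", "transcoded")
    timestamp: str  # ISO-8601 UTC time of the event
    prev_hash: str  # hash of the previous event, making the log tamper-evident

@dataclass
class LicensedVideoAsset:
    content_id: str     # internal asset identifier
    rights_holder: str  # who licensed the content
    license_id: str     # reference to the signed license agreement
    source_url: str     # where the file was lawfully acquired
    sha256: str         # fingerprint of the exact bytes that were licensed
    custody_log: list[CustodyEvent] = field(default_factory=list)

    def record_event(self, actor: str, action: str) -> None:
        """Append a custody event chained to the hash of the previous one."""
        prev = self._event_hash(self.custody_log[-1]) if self.custody_log else self.sha256
        self.custody_log.append(CustodyEvent(
            actor=actor,
            action=action,
            timestamp=datetime.now(timezone.utc).isoformat(),
            prev_hash=prev,
        ))

    @staticmethod
    def _event_hash(event: CustodyEvent) -> str:
        # Deterministic hash over the event's fields.
        payload = json.dumps(event.__dict__, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()
```

In this sketch, an ingest pipeline would call record_event at each step (lawful acquisition, license verification, transcoding, delivery to a training set), so that any model trained on the asset can point back to a documented, lawful source.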
Final Thought
The next generation of AI models will be defined not only by their capabilities but by how they were trained. If the courts are telling us anything, it’s this: the era of scraping and hoping for fair use immunity is over.
Ethically sourced content is the future and the foundation of a sustainable, creator-respecting AI ecosystem.
Book a demo with Versos to see how you can monetize the ethically sourced content in your library.