Meta just told a federal court that scraping pirated books to train AI models is fair use, and the legal logic might actually hold.

The Signal

Meta is defending its LLaMA training data in a lawsuit brought by book authors, and its argument cuts straight to what counts as transformative use. Meta isn't claiming it didn't use pirated books from LibGen and Bibliotik. It's claiming that doesn't matter, because the purpose was fundamentally different from the original: the books were inputs for statistical pattern learning, not reading material for humans.

This isn't new territory. Google won a similar case in 2015, when it scanned millions of books without permission for Google Books. The court ruled that transforming books into searchable data was fair use even though Google copied entire works. Meta is extending that logic: if converting books into a search index is transformative, converting them into training data for language models is too. The model doesn't store the books; it learns statistical relationships between words. The output isn't a copy, it's a probability distribution.
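The "probability distribution, not a copy" claim is easiest to see in miniature. Here's a toy bigram model, a drastic simplification of how transformers are actually trained and in no way Meta's pipeline, but it illustrates the structural point: the trained artifact holds word-transition statistics, not the source text.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count word-pair frequencies, then normalize to probabilities.
    The resulting model keeps only aggregate statistics; the original
    text cannot be read back out of it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
        for prev, nxts in counts.items()
    }

model = train_bigram_model("the cat sat on the mat and the cat slept")
print(model["the"])  # {'cat': 0.667, 'mat': 0.333} (approximately)
```

Whether that structural difference is enough to make the copying transformative is exactly what the court has to decide; memorization and near-verbatim output from large models complicate the clean version of this argument.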

The stakes go beyond Meta. Every foundation model has been trained on data scraped from the internet, much of it copyrighted. If courts rule this isn't fair use, the entire AI training paradigm either collapses or gets paywalled behind licensing deals only the biggest players can afford. OpenAI, Anthropic, Google: every lab that matters has the same exposure. Meta is fighting the test case that determines whether training data is a moat or a commodity.

The Implication

Watch this case. If Meta wins, AI training stays open and competitive. If it loses, expect a new licensing layer to emerge where publishers and rights holders extract rent from every training run. That favors incumbents with deep pockets and existing content deals. It also creates a market for tokenized data rights, where proving ownership of and licensing training data becomes its own infrastructure play. The agent economy depends on cheap, abundant training data. This lawsuit decides whether that stays true.


Source: Hacker News Best