The companies that define words are now fighting over who owns the meaning we fed to machines.
The Signal
Encyclopedia Britannica and Merriam-Webster just filed suit against OpenAI, alleging copyright infringement over nearly 100,000 articles used for LLM training. This isn't another newspaper suing a tech company. This is the dictionary and the encyclopedia, the reference standards we've trusted for generations, saying OpenAI took their carefully crafted definitions and explanations without permission or payment.
The timing matters. We're past the "move fast and break things" phase of AI development. Every major foundation model company is now fighting copyright battles on multiple fronts. But this suit cuts deeper than most. Britannica and Merriam-Webster didn't just compile facts; they synthesized human knowledge into structured, authoritative text. That's exactly what makes their content valuable for training models that need to understand language precisely.
The number is significant: nearly 100,000 articles represent decades of editorial work, expert verification, and institutional knowledge. If OpenAI loses, the precedent doesn't just affect dictionaries. It affects every knowledge repository, every curated database, every institution that spent years building structured understanding of the world. The question isn't just about compensation. It's about whether AI companies can legally bootstrap intelligence from the careful work others did to organize human knowledge.
The Implication
Watch how OpenAI responds. If they settle quickly, it suggests they doubt their training-data sourcing would hold up in court. If they fight, we're headed toward a legal framework that will reshape how foundation models get built. For anyone building AI products, this is your warning: the free-for-all era of training data is ending. Start budgeting for licensing, or start building with models whose provenance you can defend.
Sources: TechCrunch AI