
By Hadi Brenjekjy – Board Member, London Intercultural Centre
When the Claude AI case hit the headlines, I was shocked! It made me wonder just how far we are pushing the limits in this AI race. For those who missed the news: Anthropic, the company behind Claude, has admitted to downloading over 7 million pirated books. Yes, pirated. The reason? To train their AI.
Let’s pause.
This was a choice they made: a breach of both compliance and community trust.
Claude Legal Lowdown
Sure, a U.S. judge ruled that using copyrighted books for training can fall under fair use if it’s transformative enough. But that’s not a free pass to loot the intellectual commons. Downloading pirated content is still illegal. And Anthropic knew that.
The court is basically saying: “Training on legitimately acquired content? Maybe okay. Downloading 7 million stolen books? Absolutely not.”
The damages trial is set for December 2025. We are watching.
A Brief History of AI’s Appetite for Data
To understand why data is everything, we have to zoom out.
Large language models (LLMs) like Claude and ChatGPT were raised on massive volumes of data: trillions of words from books, websites, forums, academic journals, news articles, and more. The idea is simple: the more you feed the model, the better it understands context and human expression.
OpenAI, for example, trained GPT-3 and GPT-4 on a mix of Common Crawl (a large web-scraping project), Wikipedia, open-access books, and licensed datasets. They have since confirmed licensing deals with partners like the Associated Press and Reddit to access higher-quality, ethically sourced material.
But early on, even OpenAI faced backlash over vague disclosures about its training sources. The AI community kept asking: “Did you ask permission? Who owns the words inside your models?”
That concern blew up when The New York Times sued OpenAI for allegedly using its articles without consent.
So no one’s hands are completely clean. But there is a line between murky and malicious: between “maybe you scraped something grey” and “you knowingly downloaded 7 million pirated books from illegal sites and built your product on them.”
That’s where Anthropic crossed the line.
Why This Should Alarm Us All
This is about more than copyright law. It is about trust, fairness, and respecting the work of real people: authors, translators, educators. What Anthropic did undermines all of that.
Imagine being an independent writer, pouring your soul into a book for years, only to discover it has been vacuumed up by a multibillion-dollar company and turned into a chatbot. No credit. No consent. No coin.
If the goal is to build AI that benefits society, it can’t start on a foundation of exploitation. What starts wrong, ends wrong.
Our Call to the Industry
Let this be a wake-up call to every company using AI:
Compliance isn’t optional. Trust isn’t infinite.
At LIC, we urge all organisations, especially those in education, policy, and culture, to vet their AI partners. Always ask:
- How was your AI trained?
- Who owns the content in your models?
- What are your values?
Until there’s transparency, there can be no true trust.
We also support initiatives calling for AI transparency labels: clear disclosures of what data went into training a model, what rights were respected, and how future updates are governed. Think of it like food labelling or fair-trade certification.
A Note to Creators
To every author whose work may have been scraped without consent: we see you. We stand with you.
Your words matter. The fact that the tech elite sometimes treat your stories as mere “tokens” for training does not diminish their value.
And to every AI developer still trying to do the right thing in a messy system: keep going. But make it clean. Make it accountable. Make it human.
Because if the future is built on stolen work, then the machines are not the real threat.
We are.
Hadi