Somewhere in the architecture of every major AI model — the ones answering your questions, writing your emails, generating your images — lives the unlicensed work of millions of human beings who never agreed to be training data. Painters, novelists, journalists, musicians, photographers, and coders: their labor, their craft, their intellectual output quietly absorbed into datasets measured not in pages or canvases but in billions of data points. That absorption is now the subject of the most consequential copyright battles the American legal system has faced since the internet itself rewrote the rules of intellectual property.
The courtrooms are filling up. The stakes are existential — not just for the AI companies, but for every creator who has ever posted their work online under the reasonable assumption that the world would read it, not consume it as machine fuel.
How We Got Here: The Architecture of the Problem
To understand why these cases matter, you have to understand how large language models and image-generation systems are built. They are not programmed with rules. They are trained on examples — vast, almost incomprehensibly large collections of text, images, audio, and video scraped from the open web. The LAION-5B dataset, which underpinned the development of Stable Diffusion and systems built on top of it, consisted of five billion images pulled from the internet without licensing agreements, without artist consent, and without compensation (NYU Journal of Intellectual Property & Entertainment Law, 2023). OpenAI’s GPT models were trained on datasets so large that researchers describe them as equivalent to a Microsoft Word document stretching over 3.7 billion pages (Harvard Law Review, 2024).
For decades, courts largely held that indexing and processing publicly available data was permissible under copyright law — a legal framework built before generative AI existed. The implicit assumption was that a search engine crawling a page was functionally different from a human copying a manuscript. What no one fully anticipated was a system that doesn’t just index content but internalizes it, learns from it, and then produces outputs that can directly compete with the original sources.
That competitive threat is precisely where the lawsuits are concentrated.
The Landmark Cases: A Battlefield Overview
New York Times v. OpenAI and Microsoft is the case that most cleanly defines the fault lines. Filed in December 2023, the lawsuit alleges that OpenAI and Microsoft incorporated millions of the newspaper’s copyrighted articles into the training data for the GPT models without permission or payment — and that the resulting systems can, under certain conditions, reproduce those articles nearly verbatim, effectively bypassing the Times’s paywall (NPR, March 2025). OpenAI’s defense centers on “fair use,” the legal doctrine that permits the use of copyrighted material when that use is transformative, non-competitive, and does not harm the market for the original work.
In March 2025, Judge Sidney Stein of the Southern District of New York denied most of OpenAI’s motion to dismiss, allowing the core copyright infringement claims to proceed toward trial (Axios, April 2025). The case is now deep in discovery, with its own subsidiary drama: a May 2025 court order requiring OpenAI to preserve all ChatGPT conversation logs — affecting over 400 million users globally — as potential evidence of the system’s ability to reproduce copyrighted content (Nelson Mullins, July 2025). No trial date has been set. The next major legal threshold — summary judgment on fair use — is not expected before summer 2026 (ChatGPT Is Eating the World, October 2025).
Andersen v. Stability AI, Midjourney, and DeviantArt is the visual art equivalent. Filed in January 2023 by a group of artists including Sarah Andersen, Kelly McKernan, and Karla Ortiz, the class action alleges that their copyrighted images were scraped as part of the LAION dataset and used to train image-generation systems that can now produce work in their recognizable styles — on command, at scale, for free (Hollywood Reporter, 2024). The court has allowed claims for direct copyright infringement, trademark violation, and inducement of infringement to proceed, dismissing other counts. The case is in document discovery as of late 2025.
Getty Images v. Stability AI adds a particularly striking dimension: the allegation that Stable Diffusion not only trained on Getty’s twelve million licensed photographs without authorization but, in doing so, learned to replicate Getty’s watermarks in its outputs (McKool Smith, September 2025). The trademark implications alone are significant — a system generating images that mimic the visual signature of a company’s proprietary brand is territory copyright law was never designed to navigate.
On the music side, the Recording Industry Association of America (RIAA) filed suit in June 2024 against both Suno and Udio — AI music generators capable of producing full-length, commercially viable songs from text prompts. The RIAA alleges the systems were trained on copyrighted recordings without licensing. Udio settled with Universal Music Group in late 2025 (McKool Smith, November 2025). Suno’s case continues.
The Bartz v. Anthropic case resulted in a $1.5 billion class action settlement in 2025 — one of the largest of its kind — after Anthropic was found to have downloaded millions of pirated copies of books from “shadow libraries” to train its Claude models (Copyright Alliance, January 2026). The settlement was preliminarily approved by Judge Alsup, who drew a sharp distinction: the violation was not the act of training an AI, but the act of acquiring hundreds of thousands of copyrighted works through piracy, for whatever purpose.
The Fair Use Defense: A Shield Built for a Different Era
Fair use is a four-factor balancing test codified in Section 107 of the Copyright Act. Courts weigh four factors: the purpose and character of the use (is it transformative?), the nature of the original work, the amount and substantiality of the portion used, and the effect on the market for the original. AI companies have leaned heavily on the first and fourth factors, arguing their models are highly transformative analytical tools that do not substitute for original content.
Two courts have agreed with them — partially. In Kadrey v. Meta and in the Bartz case (on the training question specifically, not the acquisition), judges found that AI training constitutes a transformative use (OpenAI, 2025). But a third court has reached the opposite conclusion, and as of late 2025, the legal tracker tallies two rulings for AI companies on fair use and one against — with all three decisions described as nuanced rather than sweeping (ChatGPT Is Eating the World, October 2025).
The core tension is this: fair use was designed for scholarship, criticism, parody, and commentary — uses that engage with the original work in a recognizable way. A critic quoting a paragraph to analyze it transforms that paragraph by putting it in a new context. An AI system ingesting five billion images to learn statistical patterns does something categorically different. Whether courts will ultimately conclude those are legally equivalent acts is, as of this writing, an open question that will likely require appellate intervention to resolve conclusively.
The market substitution factor is equally fraught. If a user can prompt ChatGPT to reproduce a Times article and read it for free, that substitutes for paying the Times. If a user can generate a photorealistic image in the style of a licensed Getty photographer without paying for a license, that substitutes for a licensing transaction. The AI companies respond that their tools do not function as substitutes for original sources for most users, in most contexts. That argument works better for some use cases than others.
The Legislative Response: Disclosure, Consent, and the EU’s Opening Move
Courts are not the only arena. In the U.S. Congress, the Generative AI Copyright Disclosure Act of 2024 would require AI developers to publicly disclose the datasets used to train their models — a transparency mandate rather than a prohibition, but one that would give rights holders the information they need to pursue claims (USC IP & Technology Law Society, 2025). The No AI FRAUD Act, also introduced in 2024, targets a narrower problem: the use of AI to impersonate real people without consent.
In Europe, the EU AI Act represents the most comprehensive regulatory framework yet enacted. It mandates that AI systems used in high-risk contexts comply with specific requirements around transparency and data governance, and opens the door to further amendment on data provenance and artistic ownership (European Parliament, 2024). The UK Intellectual Property Office issued a framework in May 2025 requiring that derivative AI-generated content be disclosed with documented data lineage — not yet enforceable, but a clear statement of regulatory intent (BLAZ Project, 2025).
Canada and South Korea are both considering legislation that would categorically restrict training on protected content without explicit permission from rights holders. The global picture is one of regulatory momentum clearly moving toward transparency and consent — even as the legal standards for what constitutes infringement remain unsettled.
The Artist’s Arsenal: Glaze, Nightshade, and the Technical Counter-Offensive
While litigation moves at its deliberate pace, some creators have stopped waiting for courts to protect them and taken matters into their own hands — technically. The University of Chicago developed Glaze and Nightshade, tools that allow artists to subtly alter their images at the pixel level in ways imperceptible to human eyes but disruptive to AI training pipelines. Images processed with Nightshade are designed to “poison” models that train on them, causing misclassification and degraded outputs (McKool Smith, June 2025). The tools have become popular enough that Stability AI sought, in the Andersen litigation, to prevent the plaintiffs’ technical expert — the developer of Glaze and Nightshade — from examining their source code.
Beyond technical tools, artists have organized internationally. Coalitions including the Creative Rights Front and the Visual Authors Network are lobbying for mandatory dataset disclosures and compensation mechanisms modeled on music streaming royalties (BLAZ Project, 2025). In April 2025, a French court ordered an AI firm that had used a graphic novelist’s work without authorization to cease distribution and pay €280,000 in damages — a relatively small sum by tech-industry standards, but widely cited as a turning-point ruling that has encouraged more artists to pursue legal action.
Blockchain-based watermarking and metadata tagging are increasingly deployed to establish provenance and create a digital chain of custody for creative work — the legal equivalent of a hallmark stamp on a piece of crafted silver.
The Deeper Stakes: What the Outcome Will Actually Determine
Strip away the legal procedural language and the arguments about tokenization and statistical patterns, and the underlying question is straightforward: does the accumulation of human creative output — billions of decisions made by individual artists, writers, photographers, and musicians about what to create and how to share it — constitute a public commons that AI companies can mine freely? Or does copyright law, properly interpreted, require that using someone’s work as fuel for a commercial engine constitutes an act that demands consent and compensation?
The answer courts give to that question will determine whether the AI industry’s foundational business model — train on everything, license nothing, build value on the aggregate — can continue. If the major pending cases resolve in favor of rights holders, AI companies will face either massive retroactive liability or the obligation to negotiate licensing frameworks at industrial scale. Several publishers — the Associated Press, News Corp., Vox Media — have already chosen that path voluntarily, reaching content-sharing agreements with OpenAI rather than litigating. The Times, the musicians, and the visual artists have chosen to fight instead.
As of early 2026, the battlefield has more participants than ever. The number of infringement cases filed against AI companies more than doubled over the course of 2025, from around 30 at the end of 2024 to over 70. Summary judgment on fair use is not expected from most courts until at least summer 2026. The appellate courts — including the Ninth and Third Circuits — will ultimately have to resolve the contradictions between district-level rulings before any genuine legal clarity arrives.
What is already clear is that the era of building AI on an assumption of free access to human creative work is over. Whether it ends in licensing regimes, legislative mandates, settlement frameworks, or landmark judicial decisions, the costs of that assumption are being counted — case by case, ruling by ruling, artist by artist. The dataset is not a commons. It was built by human hands. And the hands are finally demanding an accounting.
Sources
- Copyright Alliance. AI Copyright Lawsuit Developments in 2025: A Year in Review. January 8, 2026. https://copyrightalliance.org/ai-copyright-lawsuit-developments-2025/
- USC IP & Technology Law Society. AI, Copyright, and the Law: The Ongoing Battle Over Intellectual Property Rights. February 4, 2025. https://sites.usc.edu/iptls/2025/02/04/ai-copyright-and-the-law-the-ongoing-battle-over-intellectual-property-rights/
- NYU Journal of Intellectual Property & Entertainment Law. Andersen v. Stability AI: The Landmark Case Unpacking the Copyright Risks of AI Image Generators. 2023. https://jipel.law.nyu.edu/andersen-v-stability-ai-the-landmark-case-unpacking-the-copyright-risks-of-ai-image-generators/
- Harvard Law Review. NYT v. OpenAI: The Times’s About-Face. April 10, 2024. https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-timess-about-face/
- NPR. Judge Allows ‘New York Times’ Copyright Case Against OpenAI to Go Forward. March 26, 2025. https://www.npr.org/2025/03/26/nx-s1-5288157/new-york-times-openai-copyright-case-goes-forward
- Axios. NYT Case Against OpenAI and Microsoft Can Advance. April 1, 2025. https://www.axios.com/2025/04/01/nyt-openai-microsoft-lawsuit-advances
- Nelson Mullins. From Copyright Case to AI Data Crisis: How The New York Times v. OpenAI Reshapes Companies’ Data Governance and eDiscovery Strategy. July 10, 2025. https://www.nelsonmullins.com/insights/blogs/corporate-governance-insights/all/from-copyright-case-to-ai-data-crisis-how-the-new-york-times-v-openai-reshapes-companies-data-governance-and-ediscovery-strategy
- McKool Smith. AI Infringement Case Updates: September 15, 2025. https://www.mckoolsmith.com/newsroom-ailitigation-36
- McKool Smith. AI Infringement Case Updates: November 24, 2025. https://www.mckoolsmith.com/newsroom-ailitigation-46
- ChatGPT Is Eating the World. Status of All 51 Copyright Lawsuits v. AI (Oct. 8, 2025). October 8, 2025. https://chatgptiseatingtheworld.com/2025/10/08/status-of-all-51-copyright-lawsuits-v-ai-oct-8-2025-no-more-decisions-on-fair-use-in-2025/
- BLAZ Project. AI Artists Face Lawsuits: Copyright Issues in the Spotlight. July 23, 2025. https://blaz-project.com/en/ai-copyright-cases/
- Hollywood Reporter. Judge Advances Copyright Lawsuit by Artists Against AI Art Generators. August 14, 2024. https://www.hollywoodreporter.com/business/business-news/artists-score-major-win-copyright-case-against-ai-art-generators-1235973601/
- Sustainable Tech Partner. Generative AI Lawsuits Timeline. February 2026. https://sustainabletechpartner.com/topics/ai/generative-ai-lawsuit-timeline/
- European Parliament. EU AI Act: First Regulation on Artificial Intelligence. June 2024. https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence