The New York Times and OpenAI: Newspaper copyright in the age of chatbots

The New York Times building is shown in New York. AP Photo/Mark Lennihan

Among all of the public indicators suggesting that we’re moving deeper into uncharted territory with respect to generative AI, a potential New York Times lawsuit against OpenAI — maker of ChatGPT — might be the most glaring.

According to reporting from NPR, the Times is mulling legal action against the tech company after negotiations between the two organizations over a licensing arrangement, whereby OpenAI could access the newspaper’s archives and reporting, went south. 

Chief among the concerns raised by the Times is that OpenAI would become “a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper’s staff,” NPR, whose reporters spoke with several Times attorneys, reports.

Northeastern University Professor of Law Jeremy Paul. Photo by Alyssa Stone/Northeastern University

A Times lawsuit would by no means be the first legal spear hurled in OpenAI’s direction from publishers; the tech company already faces a class-action lawsuit from a group of authors, including the comedian Sarah Silverman, for allegedly incorporating parts of their works into its AI tools — including the datasets used to train those tools. But should the Times and OpenAI come to blows, it would set up a precedent-setting clash between the publishing world and the neighboring world of AI tech and large language models.

Indeed, a potential legal battle would come at a moment of heightened suspicion about the goings-on at Open AI, as is evidenced by a Vanity Fair piece detailing the Times’ distress about the emerging technology. “Do not put any proprietary information, including published or unpublished Times articles, into generative AI tools, such as ChatGPT, Bing Chat, Bard or others,” several New York Times deputy managing editors wrote in an email to the broader newsroom over the summer.

Northeastern Global News spoke with Jeremy Paul, a professor of law and former dean of the Northeastern University School of Law, to discuss the implications of a potential legal battle between one of the world’s leading newspapers and the AI company. The conversation has been edited for brevity and clarity.

What do you make of the New York Times exploring legal action against OpenAI, and the reporting by NPR?

All the players are being forced to navigate this situation under the outdated law of copyright, which self-evidently was not written with this problem in mind. AI is new technology, and Congress should consider the many facets of the situation and write new laws. There is much talk in Congress about updating copyright but…

Some things are clear. Were AI to be literally publishing lengthy excerpts word for word from New York Times stories, that would be unlawful and highly unlikely to be protected by a fair use defense. Online postings from ChatGPT would be competing with the New York Times for readers interested in news and ChatGPT can’t just steal New York Times formulations. Indeed, the fair use defense referenced in the article doesn’t seem that strong.

But that’s not really what’s happening here. Leaving aside how the tech works for a minute, the response to the legal challenge of unauthorized copying is that ChatGPT is effectively “reading” or, to use Ms. [Sarah] Silverman’s formulation, “ingesting” material from the New York Times and then using what it has learned to inform its own readers. 

If a writer for the Wall Street Journal read New York Times stories on a topic and then wrote her own story, as long as she copied only facts and information but not expression, there would be no copyright violation. So why should there be a violation here?

Based on the — albeit, little — information available, do you think they could have a legitimate copyright claim?

One answer might lie in the way ChatGPT works: it must effectively copy the entire New York Times somewhere onto a computer in order to use it. If that’s true — and I don’t know if it is — then the New York Times could have a stronger copyright claim. But if the New York Times is merely saying that it did the work and ChatGPT is benefiting, that’s a strong moral and policy argument but not yet a legal one.

Are there any historical examples to draw on here as it relates to publishing copyright?

There is a strong historical example of a similar problem when the U.S. Supreme Court acted on its own to tackle a problem that statutes hadn’t yet addressed. Early in the 20th century, the Associated Press had a vast apparatus gathering news around the world. Various papers subscribed to the AP and printed their stories. 

A competing news service, [International News Service], got the news that AP had gathered by reading bulletin boards on which the news was posted. INS then sent stories reporting the same information — but not copying any words — to its members via telegraph, and INS members were able to publish the info AP had gathered ahead of some AP papers. The AP sued to stop INS from “stealing” this info. 

But AP faced a problem: it could not copyright the news. You can copyright expression but not facts. Nonetheless, the court ruled for AP to protect it against “unfair competition.” Today’s Supreme Court can’t do something similar for complicated reasons, but state courts could.

Tanner Stening is a Northeastern Global News reporter. Email him at t.stening@northeastern.edu. Follow him on Twitter @tstening90.