
The New York Times bars OpenAI from using its content to train AI models
The New York Times blocks OpenAI's web crawler and explores potential legal action over intellectual property concerns, amid disputes regarding content usage for AI training.


Highlights
- The New York Times has blocked OpenAI's GPTBot from accessing its content since 17 August
- NYT contemplates suing OpenAI for potential copyright violations, joining the broader debate on AI and intellectual property
- The conflict showcases the evolving challenges at the intersection of technology, copyright, and AI training
The New York Times has taken a firm stance against OpenAI's web crawler, preventing OpenAI from using the publication's content to improve its AI models. The newspaper has added a robots.txt directive that specifically targets GPTBot, the web crawler introduced by OpenAI. Snapshots in the Internet Archive's Wayback Machine show the block in place since at least 17 August.
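For reference, a robots.txt rule aimed at OpenAI's crawler typically takes the form below. This is an illustrative snippet based on the GPTBot user-agent string OpenAI has documented, not a reproduction of the newspaper's actual file.

    User-agent: GPTBot
    Disallow: /

A "Disallow: /" line tells the named crawler not to fetch any page on the site.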

This action follows the newspaper's update to its terms of service at the beginning of the month, which explicitly prohibited the use of its content for training artificial intelligence models.
A web crawler, also known as a web spider or web robot, is a computer program that systematically browses the internet to gather information from websites. It does this by following links from one page to another, retrieving data and content along the way.
Web crawlers are commonly used by search engines to index web pages, and they play a crucial role in collecting data for various purposes such as data mining, research, and content indexing.
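As a minimal sketch of how a well-behaved crawler interacts with robots.txt, the Python example below uses the standard library's urllib.robotparser to check whether a given user agent may fetch a page. The site URL and page path here are hypothetical.

    # Minimal sketch: consulting a site's robots.txt before crawling.
    # Uses only the Python standard library; the URLs are hypothetical.
    from urllib import robotparser

    parser = robotparser.RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()  # download and parse the site's robots.txt rules

    # A polite crawler checks its own user-agent string against the rules
    # before requesting a page. "GPTBot" is the user agent OpenAI has
    # documented for its crawler.
    page = "https://www.example.com/2023/08/some-article.html"
    if parser.can_fetch("GPTBot", page):
        print("robots.txt allows GPTBot to fetch this page")
    else:
        print("robots.txt blocks GPTBot from this page")

Compliance with robots.txt is voluntary on the crawler's side, which is one reason publishers are also writing restrictions on AI training into their terms of service.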
The clash between the New York Times and OpenAI escalates
Neither party has offered immediate comment. Charlie Stadtlander, a spokesperson for The New York Times, declined to discuss the matter, and OpenAI has yet to respond to requests for comment.
The dispute may also turn litigious: NPR reported last week that The New York Times is considering legal action against OpenAI over alleged infringement of its intellectual property rights.
If it sues, The New York Times would join others, such as comedian Sarah Silverman and two fellow authors, who filed a lawsuit against OpenAI in July. Their complaint concerns Books3, a dataset used in ChatGPT's training that reportedly contains a substantial number of copyrighted works.
Programmer and lawyer Matthew Butterick has also argued that OpenAI's data-scraping practices amount to a form of software piracy.
At the rapidly evolving intersection of technology, copyright, and artificial intelligence, the dispute between OpenAI and The New York Times offers a case study in the legal and ethical questions that arise as AI models continue to train on vast amounts of textual data.