US Media Giants Block AI Crawlers to Protect Copyrights

Zach Anderson | Jan 26, 2024, 19:30 (11:30 UTC)


Highlighting the tension between media companies and artificial intelligence (AI) developers, a recent Wired report revealed that 88% of leading news outlets in the United States are actively blocking AI web crawlers. The move, driven by concerns over copyright infringement and the uncompensated use of content, reflects growing resistance within the media industry to the data collection activities of AI companies.

The survey, conducted by Ontario-based AI detection startup Originality AI, covered 44 top news sites, including prominent organizations such as The New York Times, The Washington Post, and The Guardian, and found that these outlets have taken steps to restrict the data collection activities of AI companies. OpenAI's GPTBot was identified as the most widely blocked crawler, with many publishers adding restrictions after OpenAI announced in August 2023 that the crawler would respect robots.txt directives, the standard mechanism websites use to control web crawler access.
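
For illustration, opting out of OpenAI's crawler is typically done by adding a short entry to a site's robots.txt file. The sketch below assumes a publisher wants to block GPTBot from its entire site (GPTBot is the user-agent name OpenAI documents for its crawler; the blanket Disallow path is an illustrative choice, and publishers often add similar entries for other crawlers):

    User-agent: GPTBot
    Disallow: /

A compliant crawler reads this file before fetching pages and skips any path matched by a Disallow rule for its user-agent; the approach relies on voluntary compliance rather than a technical barrier, which is one reason publishers are also pursuing legal remedies.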

This escalating conflict reached a new peak last December, when The New York Times filed a lawsuit against OpenAI alleging copyright infringement over the unauthorized use of its published works to train chatbots. The Times contends that millions of its articles were used to train chatbots that now serve as alternative sources of information, potentially undermining the credibility and financial sustainability of traditional media outlets. The media giant is seeking billions of dollars in statutory and actual damages, marking a pivotal moment in the legal landscape surrounding AI and media.

During a hearing before the Judiciary Committee's privacy and technology subcommittee, witnesses representing local and national media organizations urged lawmakers to intervene and prevent AI companies from using copyrighted news content without appropriate credit or compensation. They argued that AI companies invoke the "fair use" provision of U.S. intellectual property law to justify training their models on copyrighted news material, an interpretation the media organizations dispute, contending that such use goes far beyond the statute's established guardrails.

As media companies bolster their defenses against AI bots, the dispute underscores the complex interplay between technological advancement and content protection. It raises critical questions about the future of information dissemination, journalistic integrity, and the democratization of knowledge in the era of technological disruption.

This development has implications beyond the immediate legal battles and technical measures. It delves into fundamental issues about the role and impact of AI in the media landscape, highlighting the need for a balanced approach to innovation and accountability in the digital age.


Image source: Shutterstock

