Anthropic’s Claude Is Crawling the Web at Unprecedented Scale — and Website Owners Are Scrambling to Respond

Anthropic, the artificial intelligence company behind the Claude chatbot, has dramatically increased its web crawling activity in recent months, raising alarm among website operators, SEO professionals, and publishers. They say the aggressive data harvesting is straining their infrastructure and ignoring established protocols designed to control automated access.
According to reporting by Search Engine Land, Anthropic’s bots have been observed crawling websites at rates that far exceed what most site owners consider reasonable, with some reporting hundreds of thousands of requests in short time periods. The activity has prompted a growing backlash from the webmaster community, which is now grappling with how to manage AI crawlers that behave very differently from traditional search engine bots like Googlebot.
A Flood of Requests That Servers Can’t Handle
The scale of the crawling has caught many website operators off guard. Reports from multiple sources indicate that Anthropic’s crawlers — identified by user-agent strings such as “ClaudeBot” and “anthropic-ai” — have been hammering websites with request volumes that can degrade performance for human visitors. Some site administrators have reported seeing tens of thousands of page requests per day from Anthropic’s IP addresses, with little regard for crawl-delay directives in robots.txt files.
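For readers unfamiliar with the mechanism, the crawl-delay directives mentioned above live in a plain-text robots.txt file at a site's root. A minimal sketch follows; the user-agent tokens are the ones reported in this article, and site owners should verify the current tokens against Anthropic's own documentation. Note that Crawl-delay is a de facto extension, not part of the original standard, and crawlers are free to ignore it:

```
# Ask Anthropic's crawler to wait 10 seconds between requests
# (Crawl-delay is an informal extension; compliance is voluntary)
User-agent: ClaudeBot
Crawl-delay: 10

# Refuse AI-training collection entirely
User-agent: anthropic-ai
Disallow: /
```

Whether a given crawler honors either directive is exactly the compliance question at the center of this story.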
This behavior stands in sharp contrast to how established search engines typically operate. Google, Bing, and other major search crawlers have spent years developing systems that respect server capacity, throttle request rates, and honor the robots.txt standard that has governed crawler-website relations since the mid-1990s. Anthropic’s bots, according to the complaints documented by Search Engine Land, appear to be far less restrained.
The robots.txt Problem: Compliance in Question
At the heart of the controversy is a fundamental question about whether AI companies are obligated — legally or ethically — to respect the wishes of website owners who do not want their content scraped for AI training purposes. The robots.txt protocol, while widely respected, is technically voluntary. There is no law in the United States that requires a crawler to obey a disallow directive, though ignoring such directives can raise questions under computer fraud and copyright statutes.
Anthropic has published documentation indicating that website owners can block ClaudeBot by adding specific directives to their robots.txt files. However, multiple webmasters have reported that even after implementing these blocks, they continued to see crawling activity from Anthropic-associated IP addresses using different user-agent strings or behaving in ways that circumvented standard blocking methods. This has led to frustration and, in some cases, to site owners resorting to IP-level blocking — a more aggressive and maintenance-intensive approach.
AI Training Data: The Core Motivation
The reason for the aggressive crawling is straightforward: AI models like Claude require enormous volumes of text data to train and improve. The more diverse and current the training data, the more capable the resulting model. For Anthropic, which competes directly with OpenAI, Google DeepMind, and Meta AI, the pressure to acquire high-quality training data is immense. Web crawling remains one of the most efficient methods for gathering such data at scale.
But the economics of this arrangement are deeply asymmetrical. Website owners bear the cost of serving pages to AI crawlers — bandwidth, server resources, and infrastructure expenses — while receiving nothing in return. Unlike traditional search engine crawling, which at least offers the implicit bargain of increased visibility through search results, AI training crawlers extract value from content without driving any traffic back to the source. This has led some publishers and content creators to describe the practice as extractive and exploitative.
Industry Pushback Is Growing
The frustration is not limited to small website operators. Major publishers have been increasingly vocal about the need for AI companies to negotiate licensing agreements rather than simply scraping content without permission. The New York Times, for example, has filed a high-profile lawsuit against OpenAI and Microsoft, alleging copyright infringement related to the use of its articles in training data. While that case targets OpenAI and Microsoft, the legal theories involved apply equally to other AI companies, including Anthropic.
In the SEO and webmaster communities, the discussion has taken on a more technical flavor. Professionals are sharing strategies for identifying and blocking AI crawlers, comparing notes on which user-agent strings to target, and debating whether the current robots.txt framework is adequate for managing a new generation of automated agents that have very different incentives than traditional search bots. Some have called for a new standard — or at least an extension of the existing one — that specifically addresses AI training crawlers and gives site owners more granular control over how their content is used.
Anthropic’s Position and Public Statements
Anthropic has generally positioned itself as one of the more responsible actors in the AI space. The company, founded by former OpenAI executives Dario and Daniela Amodei, has made AI safety a central part of its public messaging. Its website includes instructions for blocking ClaudeBot, and the company has stated that it aims to respect the preferences of website owners.
However, the gap between stated policy and observed behavior is what has drawn criticism. When crawlers continue to access sites despite robots.txt blocks, or when the volume of requests overwhelms server infrastructure, the company’s safety-first reputation takes a hit. Anthropic has not issued a detailed public response to the specific complaints raised by webmasters, though the company has acknowledged that it is working to improve its crawling practices. As reported by Search Engine Land, the situation remains unresolved for many affected site owners.
Legal and Regulatory Dimensions
The legal framework governing web scraping and AI training data remains unsettled. In the United States, courts are currently weighing multiple cases that could set important precedents. Beyond the New York Times litigation, cases involving visual artists, software developers, and music rights holders are all testing the boundaries of fair use doctrine as applied to AI training.
In Europe, the situation is somewhat clearer. The EU’s AI Act and existing data protection regulations under GDPR provide additional tools for content owners to push back against unauthorized scraping. Some European publishers have already begun sending formal opt-out notices to AI companies under the EU’s text and data mining exceptions, which allow rights holders to reserve their rights against commercial data mining — a reservation that companies are then obligated to honor. Whether Anthropic’s crawlers comply with these requirements in practice is another open question.
What This Means for the Open Web
The broader implications of aggressive AI crawling extend beyond any single company. If AI firms can freely harvest web content without compensation or consent, the incentive structure that supports the open web — where publishers create content in exchange for traffic and advertising revenue — begins to break down. Why invest in producing high-quality content if it will simply be absorbed into an AI model that competes with your own website for user attention?
This concern has led to a growing movement among publishers to gate their content more aggressively, implement paywalls, or restrict access to authenticated users. Some have begun using technical measures like JavaScript rendering requirements and CAPTCHAs to make automated scraping more difficult. While these measures can be effective, they also risk degrading the experience for legitimate users and search engine crawlers alike.
The Path Forward Remains Uncertain
For now, the tension between AI companies and content creators shows no signs of easing. Anthropic’s crawling activity is just one manifestation of a much larger conflict over who owns the value created by web content and who gets to profit from it. The outcome will likely be determined by a combination of legal rulings, regulatory action, and industry negotiations — none of which are moving as quickly as the technology itself.
Website owners who are concerned about AI crawling should take immediate steps to audit their server logs for AI-related user-agent strings, implement robots.txt directives targeting known AI crawlers, and consider IP-level blocking as a fallback. Industry groups and standards bodies, meanwhile, face pressure to develop updated protocols that reflect the realities of a web increasingly shaped by artificial intelligence. The old rules were written for a different era, and the current friction makes clear that new ones are urgently needed.
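As a starting point for the server-log audit suggested above, the sketch below counts requests by AI-crawler user agent in a standard combined-format access log. The user-agent substrings are illustrative (drawn from this article and public reports); the authoritative list should come from each vendor's current documentation:

```python
import re
from collections import Counter

# Illustrative AI-crawler user-agent substrings; the strings seen in the
# wild vary, so verify against current vendor documentation.
AI_AGENTS = ["ClaudeBot", "anthropic-ai", "GPTBot", "CCBot"]

def audit_log(path: str) -> Counter:
    """Count requests per AI user-agent substring in a combined-format log."""
    counts = Counter()
    # In the combined log format, the user agent is the final quoted field.
    ua_pattern = re.compile(r'"([^"]*)"\s*$')
    with open(path) as fh:
        for line in fh:
            match = ua_pattern.search(line)
            if not match:
                continue
            user_agent = match.group(1)
            for agent in AI_AGENTS:
                if agent in user_agent:
                    counts[agent] += 1
    return counts
```

Running this against a day's access log gives a quick per-crawler request tally, which is usually enough to decide whether robots.txt directives or firewall rules are warranted.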