Nearly Right

Infrastructure power trumps regulation as AI content extraction reaches crisis point

Cloudflare's crawler blocking reveals how internet gatekeepers now set economic policy whilst governments struggle with digital governance

The numbers expose everything you need to know about artificial intelligence's relationship with online publishing. Recent data from Cloudflare shows Anthropic's AI crawlers scraped websites 73,000 times for every single user they referred back to the original source. OpenAI managed 1,700 scrapes per referral. Google Search, by contrast, operates at 14 scrapes per referral.

These aren't abstract statistics. They represent the collapse of the internet's fundamental economic bargain - the implicit agreement that drove two decades of digital growth. Publishers created content, search engines indexed it, users discovered websites through search results, and advertising revenue flowed back to content creators. Artificial intelligence has shattered this arrangement, replacing symbiotic relationships with pure extraction.

Cloudflare, the infrastructure company that manages traffic for a fifth of the global web, has responded with something unprecedented: unilateral economic policy-making. In July 2025, the company began blocking AI crawlers by default for all new customers, forcing AI companies into direct licensing negotiations with publishers. No government mandate, no regulatory process, no international treaty. Simply a software update that reshaped the economics of digital content.

"If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone," declared Matthew Prince, Cloudflare's chief executive. His company's position managing one-fifth of internet traffic grants it regulatory-like authority that most governments can only envy.

The economics of content extraction

The scale of value extraction reveals why publishers are desperate for solutions. Marc McCollum from advertising optimisation company Raptive estimates that AI-powered search features now cost publishers $2 billion annually in lost advertising revenue. When Google's AI Overviews appear in search results, research shows click-through rates to original websites decline by an average of 34.5%. Some publishers report drops exceeding 54%.
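To make that arithmetic concrete, here is a rough back-of-the-envelope calculation. Only the 34.5% decline comes from the research above; the impression, click-through and revenue figures are hypothetical.

```python
# Back-of-the-envelope estimate of ad revenue lost to AI-powered search features.
# All inputs except the 34.5% CTR decline are hypothetical.

monthly_impressions = 1_000_000   # times a page appears in search results (assumed)
baseline_ctr = 0.05               # 5% of impressions become visits (assumed)
revenue_per_visit = 0.02          # $0.02 in ad revenue per visit (assumed)
ctr_decline = 0.345               # average decline when AI Overviews appear

baseline_visits = monthly_impressions * baseline_ctr
lost_visits = baseline_visits * ctr_decline
lost_revenue = lost_visits * revenue_per_visit

print(f"Visits lost per month:  {lost_visits:,.0f}")    # 17,250
print(f"Revenue lost per month: ${lost_revenue:,.2f}")  # $345.00
```

Multiplied across thousands of pages and sites, figures of this shape are how estimates like Raptive's $2 billion are built up.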

Meanwhile, AI companies are building business models worth hundreds of billions of dollars on this content. The asymmetry is stark: Microsoft offered HarperCollins $5,000 - split between author and publisher - for three years of AI training rights. That sum could train models that generate millions in revenue whilst original content creators receive virtually nothing.

Publishers face costs without compensation. Martin Alderson from web performance consultancy Catch Metrics describes AI crawler activity as resembling denial-of-service attacks, with publishers absorbing server costs for traffic that generates no revenue. Justin Wohl, formerly of Salon, reports that crawlers appear "by user agent and by IP address with blatantly non-human signatures" that consume bandwidth whilst avoiding analytics tracking.

The traffic data reveals the problem's accelerating scope. DoubleVerify found that AI crawlers contributed to an 86% increase in invalid web traffic, with 16% of automated visits in 2024 coming from AI scrapers like GPTBot, ClaudeBot and Applebot. Publishers are paying to feed systems designed to reduce their audience.
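The screening publishers describe can be approximated from ordinary server logs. The sketch below tallies requests from the crawlers named above by user-agent string; the log path and combined-format layout are assumptions, and user agents can be spoofed, so this captures only well-behaved bots.

```python
import re
from collections import Counter

# Known AI crawler user-agent substrings, as publicly documented.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "Applebot", "CCBot"]

def tally_ai_crawlers(log_path: str) -> Counter:
    """Count requests per AI crawler in a combined-format access log."""
    counts = Counter()
    ua_pattern = re.compile(r'"([^"]*)"$')  # user agent is the last quoted field
    with open(log_path) as log:
        for line in log:
            match = ua_pattern.search(line.rstrip())
            if not match:
                continue
            user_agent = match.group(1)
            for bot in AI_CRAWLERS:
                if bot in user_agent:
                    counts[bot] += 1
                    break
    return counts

# Usage (path is hypothetical):
# print(tally_ai_crawlers("/var/log/nginx/access.log"))
```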

Historical precedents for platform devastation

Publishers recognise this pattern from Facebook's algorithm changes between 2016 and 2018, when the social media giant prioritised posts from friends and family over news content. Traffic dropped sharply at major news outlets including CNN, ABC, NBC, The New York Times and The Washington Post. Some publishers reported losing over 80% of their Facebook referral traffic as the platform optimised for user engagement rather than external link-sharing.

But the AI disruption operates differently. Facebook offered alternative revenue-sharing arrangements and maintained some referral relationships. AI products are designed to keep users within their own interfaces, providing answers without directing traffic to source websites. The business model is inherently extractive rather than complementary.

Publishers adapted to Facebook's changes by diversifying traffic sources and revenue models. Industry data shows subscription revenue growing 14.4% in early 2025, whilst digital advertising increased 12.4%. This diversification strategy positions some publishers to weather continued disruption from AI-mediated content consumption, but smaller outlets without subscription potential face more severe challenges.

The Facebook precedent also demonstrates how platform decisions can devastate entire sectors of digital publishing. LittleThings, a viral content publisher, shut down shortly after Facebook's 2018 algorithm changes, eliminating jobs throughout its partner network. The collateral damage extended to companies like Jukin Media and talent agencies that had built businesses around Facebook's previous traffic patterns.

The licensing mirage

Despite industry rhetoric about AI licensing creating new revenue streams, research suggests most publishers will see little benefit. Joshua Benton from Harvard's Nieman Lab predicts that "most publishers will not get any meaningful revenue from licensing content to technology companies, and those who do are likely to be large publishers who get at most a few percent of incremental revenue."

The numbers support this pessimism. While major publishers like The New York Times, The Atlantic and News Corporation have signed licensing deals with AI companies, these agreements typically represent small percentages of total revenue. For companies like Axel Springer and Dotdash Meredith, continued decline in print advertising has far greater financial impact than AI licensing provides.

A Reuters Institute survey of news leaders found that 35% expected most AI licensing money to flow to large media companies, whilst 48% anticipated little money for any news organisation. Both predictions appear accurate. The licensing approach legitimises content extraction whilst concentrating benefits among publishers with sufficient market power to negotiate individual deals.

Smaller publishers face a binary choice: allow AI companies unrestricted access to their content or attempt to block crawlers through technical measures. Most lack the resources to negotiate licensing agreements or implement sophisticated blocking systems. They're effectively forced to subsidise AI development through unpaid content provision whilst watching their audience migrate to AI interfaces.
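The simplest of those technical measures is a robots.txt opt-out, which relies entirely on crawlers choosing to honour it. A minimal sketch, using the opt-out tokens the AI companies themselves document:

```python
# Generate a robots.txt asking documented AI crawlers to stay away.
# Compliance is voluntary: this blocks nothing by itself.

AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]

def ai_optout_robots_txt() -> str:
    """Emit one disallow-everything rule per AI crawler, e.g.:

    User-agent: GPTBot
    Disallow: /
    """
    rules = [f"User-agent: {agent}\nDisallow: /" for agent in AI_USER_AGENTS]
    return "\n\n".join(rules) + "\n"

with open("robots.txt", "w") as f:
    f.write(ai_optout_robots_txt())
```

Anything stronger - IP-range filtering, bot fingerprinting, rate limiting - requires engineering resources most small publishers simply do not have.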

Regulatory fragmentation and infrastructure governance

Different jurisdictions are approaching AI-publisher relationships through distinct legal frameworks, creating a patchwork of regulation that may favour infrastructure-level solutions like Cloudflare's.

The European Union's AI Act requires general-purpose AI model providers to "put in place a policy to respect Union copyright law" and create public summaries of training content. The legislation explicitly links AI training to text and data mining exceptions under EU copyright law, potentially forcing AI companies to obtain licensing agreements with publishers who have opted out of automated content mining.

Australia's News Media Bargaining Code provides a more aggressive model. The legislation allows publishers to demand negotiations with digital platforms over content payment, with final-offer arbitration if agreements aren't reached voluntarily. The code worked: over 30 commercial agreements between platforms and Australian news organisations followed its introduction, deals that were "highly unlikely" without legislative pressure.

However, these regulatory approaches require years of political negotiation and face intense industry lobbying. Google threatened to withdraw search services from Australia before agreeing to voluntary negotiations. Facebook temporarily blocked Australian news sharing to demonstrate the platform's power.

Cloudflare's approach bypasses these political processes entirely. The company's infrastructure position allows it to implement economic policy through technical configuration. New websites receive crawler blocking by default. Existing customers can activate blocking with single-click options. AI companies must negotiate directly with individual publishers or lose access to content.
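In principle, enforcement at this layer is straightforward. The sketch below illustrates the idea with a WSGI middleware that denies known AI crawlers by default unless the site owner opts out; it is not Cloudflare's implementation, which also fingerprints bots that disguise their user agents.

```python
# Illustrative default-deny filter for AI crawlers. NOT Cloudflare's code:
# real edge systems also use IP ranges and behavioural fingerprints, since
# user agents are trivially spoofed.

BLOCKED_BOTS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")

class AICrawlerBlocker:
    """WSGI middleware: block AI crawlers by default, allow per-site opt-out."""

    def __init__(self, app, allow_ai_crawlers: bool = False):
        self.app = app
        self.allow_ai_crawlers = allow_ai_crawlers  # the "single click"

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if not self.allow_ai_crawlers and any(b in user_agent for b in BLOCKED_BOTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"AI crawling is not permitted on this site.\n"]
        return self.app(environ, start_response)

# Usage: wrap an existing WSGI application.
# application = AICrawlerBlocker(application)
```

The significance is not the filtering logic, which is trivial, but who applies it: a default set once, at the infrastructure layer, instantly covers millions of sites.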

Infrastructure companies as economic regulators

This represents a fundamental shift in digital governance. Infrastructure companies increasingly wield regulatory-like authority over internet economics without traditional democratic oversight or international coordination.

Cloudflare's crawler blocking affects AI companies globally, not just those operating under specific national jurisdictions. The company's network architecture enables it to enforce publisher preferences regardless of where AI crawler requests originate. This gives infrastructure providers more effective regulatory capability than most governments possess.

The precedent extends beyond publishing. Amazon's cloud services, Google's search infrastructure, and Microsoft's productivity software platforms all make decisions that reshape industry economics. These companies increasingly function as private regulators, setting standards and policies that affect entire sectors.

Publishers appear comfortable with this arrangement when it serves their interests. Major outlets including Condé Nast, TIME, The Associated Press, The Atlantic and Fortune have publicly supported Cloudflare's crawler blocking initiative. They recognise that infrastructure governance may prove more responsive than traditional regulatory processes.

But infrastructure governance lacks democratic accountability or transparent decision-making. Cloudflare's policies reflect its business interests and customer demands rather than broader public interest considerations. The company could theoretically reverse its crawler blocking approach if AI companies offer more attractive partnership terms.

The future of content economics

Cloudflare's intervention may force AI companies toward sustainable business models, but significant questions remain about the future structure of online content economics.

Some AI companies are exploring revenue-sharing arrangements with publishers. Perplexity AI plans to share advertising revenue with news organisations when its chatbot cites their content in responses. OpenAI has signed content deals with major publishers including The Financial Times and News Corporation. Such arrangements suggest possible evolution toward more reciprocal relationships.

However, the fundamental tension persists between AI companies' desire to retain users within their interfaces and publishers' need for direct audience relationships. AI systems that consistently direct users to source websites undermine their own value proposition as comprehensive information providers.

Publishers are simultaneously developing their own AI capabilities whilst negotiating with external AI companies. Deutsche Presse-Agentur has built internal AI systems trained on its content archive, allowing journalists to query their organisation's reporting without external dependencies. This approach maintains editorial control whilst capturing AI efficiency benefits.
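Deutsche Presse-Agentur has not published its architecture, but the underlying idea - retrieval over a local archive with no external dependencies - can be illustrated in a few lines. The archive contents below are invented for the example.

```python
# Toy illustration of querying an in-house article archive without external
# AI dependencies: TF-IDF retrieval over local documents. dpa's actual
# system is not public; this only sketches the idea.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

archive = [
    "Bundestag passes budget after lengthy negotiations.",
    "Cloudflare blocks AI crawlers by default for new customers.",
    "Publishers report falling referral traffic from search.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(archive)

def search(query: str, top_k: int = 2):
    """Return the archive articles most similar to the query."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vectors)[0]
    ranked = sorted(zip(scores, archive), reverse=True)[:top_k]
    return [(round(float(score), 3), text) for score, text in ranked]

print(search("AI crawler blocking"))
```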

The infrastructure governance model may spread to other technology conflicts. Content delivery networks, cloud computing platforms, and internet service providers all possess technical capabilities to enforce economic policies on digital industries. Their intervention could become more common as traditional regulatory approaches struggle with technology's rapid evolution.

Yet infrastructure governance creates new dependencies and power concentrations. Publishers celebrating Cloudflare's crawler blocking must recognise their increased reliance on the company's continued cooperation. Infrastructure providers could become kingmakers in digital industries, determining which business models survive through technical policy decisions.

The resolution of this conflict will likely determine whether the internet remains an open information ecosystem or fragments into competing closed platforms. AI companies need sustainable access to content for model training and inference. Publishers need economic models that support continued content creation. Infrastructure companies have emerged as unexpected arbiters of this relationship.

Cloudflare's crawler blocking represents just the opening move in a broader negotiation over digital content economics. The company's success in forcing AI licensing discussions suggests that infrastructure governance may become the primary mechanism for resolving technology industry conflicts - a development with profound implications for how digital economies develop and who controls their evolution.

#artificial intelligence