Nearly Right

Personal blogging now requires bank-grade security as bot traffic saturates the internet

Bear Blog's weekend outage exposes the hidden infrastructure crisis making independent platforms economically unsustainable

Herman Martinus spent Saturday watching his life's work collapse. Not through any failure of his own: his reverse proxy simply drowned under tens of thousands of requests per minute. Five years of perfect uptime ended because automated bots wanted something his servers couldn't give fast enough. They didn't care. They never do.

Bear Blog hosts 5,500 personal blogs (including this one). Text files averaging 5KB. No tracking, no advertisements, no surveillance. The sort of platform that represents what the web was supposed to be before it became an opencast mine. Yet Herman now needs defences built to protect banks just to keep these tiny text files online.

His monitoring system failed him. Despite being the only app permitted to wake him at night, it stayed silent. When he finally discovered the outage, the damage was done. His technical response was flawless—redundant monitoring, upgraded capacity, automated restart protocols. But the competence of his reaction only underscores the absurdity of the situation. This isn't a Herman problem. This is the new baseline for anyone running independent platforms.

The question isn't whether Herman can defend Bear Blog. It's whether anyone should have to.

When machines became the majority

The numbers tell a story most people haven't grasped. In 2024, automated bots generated more web traffic than humans for the first time in a decade. Not marginally more—they hit 51% of all traffic, with malicious bots alone accounting for 37%. That's up from 32% just a year earlier. Imperva blocked 13 trillion bad bot requests across thousands of domains in 2024. Thirteen trillion. The figure is almost incomprehensible until you're Herman, watching your infrastructure buckle.

This didn't happen gradually. Scraping activity surged 432% between the first and second quarters of 2023. The timing tracks perfectly with the explosion of large language models. Training data became valuable overnight, and every public website transformed into potential fodder. The incentive structure guarantees escalation—there's money in scraping, so scraping happens regardless of permissions, ethics, or collateral damage.

Three categories of bots now flood the web. AI companies scrape openly, identifying themselves in headers whilst hoovering up content. Malicious actors rotate through thousands of IP addresses, likely tunnelling through mobile devices on cellular networks, hunting for vulnerabilities. Then come the accidents—generative AI has made building scrapers so easy that novices spin them up without understanding what they've created. Home computers more powerful than virtual private servers launch accidental distributed denial of service attacks. The democratisation of technical capability removed the safety brake of competence.

Herman encountered all three that Saturday. His web application firewall and rate limiting worked perfectly, but they sit downstream of the reverse proxy. When hundreds of blogs were hammered simultaneously at commercial-grade request rates, the proxy in front of them simply gave up. Not a design flaw, just a reasonable capacity assumption destroyed by unreasonable traffic volume.
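For a sense of the mechanics: rate limiting at the application layer usually amounts to counting each client's recent requests and refusing the excess, along the lines of the minimal Python sketch below. The thresholds are illustrative assumptions, not Bear's actual configuration.

```python
import time
from collections import defaultdict, deque

# Minimal sliding-window rate limiter. Illustrative only; the limit and
# window values are assumptions, not Bear's real thresholds.
class RateLimiter:
    def __init__(self, limit: int = 60, window: float = 60.0):
        self.limit = limit        # maximum requests allowed per window
        self.window = window      # window length in seconds
        self.hits = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        recent = self.hits[client_ip]
        # Discard timestamps that have aged out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False          # over budget: answer with HTTP 429
        recent.append(now)
        return True

limiter = RateLimiter(limit=60, window=60.0)
print(limiter.allow("203.0.113.7"))   # True until the per-minute budget is spent
```

The catch the outage exposed is placement: a limiter like this only protects whatever sits behind it, so the proxy in front can still be overwhelmed before a single request is ever counted.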

The economics of hostility

Follow the money and the escalation makes sense. Mobile proxy providers sell "residential IP" access for £4 to £5 per gigabyte, marketed explicitly for bypassing bot detection. Free applications might monetise users by selling tunnel access to scrapers. This would explain why Herman sees attack traffic from cellular network ASNs—thousands of phones unwittingly proxying hostile requests whilst their owners scroll social media.

Malicious scrapers hunt systematically. They probe every site for misconfigured WordPress instances, exposed environment files, AWS credentials accidentally committed to repositories. They scrape and re-scrape constantly, finding vulnerabilities within hours of deployment. Herman blocked nearly two million malicious requests across several hundred blogs in one day. Not an unusual day. A typical one.
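Much of that hostile traffic is easy to recognise in principle, because a text-only blogging platform has no WordPress admin pages or environment files to expose. A small Python sketch of path-based probe detection, with an illustrative sample of paths rather than Herman's actual rules:

```python
# Flagging obvious vulnerability probes by requested path. The path list is a
# small illustrative sample, not Herman's detection rules.
PROBE_PATHS = ("/.env", "/wp-login.php", "/wp-admin", "/.git/config", "/.aws/credentials")

def looks_like_probe(request_path: str) -> bool:
    path = request_path.lower()
    return any(marker in path for marker in PROBE_PATHS)

print(looks_like_probe("/.env"))            # True: nothing legitimate asks for this
print(looks_like_probe("/blog/my-post/"))   # False: an ordinary page view
```

On a platform that serves nothing but small text pages, any request for those paths is a reliable tell, which is part of how two million blocks in a day becomes routine.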

Self-hosting has become genuinely dangerous. Simple mistakes that once caused embarrassment now trigger immediate exploitation. The scanning is constant, systematic, automated. No humans involved, no ethical considerations, no concept of proportionality.

The AI companies operate in plain sight. They identify themselves but scrape relentlessly. Herman allows user-initiated queries—bloggers want discoverability—whilst blocking bulk data mining. Making this distinction requires manual intervention and constant vigilance. The companies could respect robots.txt files or terms of service. They don't, because the economics favour extraction over consent.
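The openly declared crawlers are at least identifiable, so a first pass can key off the self-reported User-Agent header, as in the sketch below. The crawler names and the allow-or-block decisions are illustrative assumptions, not Bear's actual lists.

```python
# Rough User-Agent screening. The names and the policy are illustrative
# assumptions, not Bear's actual allow and deny lists.
BULK_CRAWLERS = {"gptbot", "ccbot", "bytespider"}       # bulk training scrapers
USER_INITIATED = {"chatgpt-user", "perplexitybot"}      # fetches a person asked for

def crawler_policy(user_agent: str) -> str:
    ua = user_agent.lower()
    if any(name in ua for name in BULK_CRAWLERS):
        return "block"    # bulk data mining: refuse the request
    if any(name in ua for name in USER_INITIATED):
        return "allow"    # user-initiated query: bloggers want discoverability
    return "allow"        # everything else: browsers, feed readers, search engines

print(crawler_policy("Mozilla/5.0 (compatible; GPTBot/1.1)"))   # block
```

The weakness is obvious: a header policy only binds crawlers honest enough to announce themselves, which is why the manual vigilance never goes away.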

Then there's the newest threat, born from "vibe coding" and large language models. Someone curious about a topic prompts an AI to build a scraper, runs it from their home computer, and accidentally launches an attack. No malice, no sophistication, no awareness that requesting thousands of pages per minute constitutes an assault. These aren't attacks in any traditional sense. They're accidents at scale, enabled by tools that divorced capability from understanding.

The cost no-one discusses

Bear exists because Herman wanted something specific. Fast, accessible, legible. No JavaScript surveillance, no attention-harvesting algorithms, no pivot towards "creator economy" monetisation. Just writing, made public, discoverable through a feed Herman personally curates to prevent spam. Pages average 5KB because they're mostly text.

This sounds quaint. It's becoming extinct.

Server costs stay low—text files are tiny. The actual expense is defence. Herman pays for Cloudflare's web application firewall. Maintains rate limiting rules across multiple layers. Implements custom bot detection. Monitors continuously. He's experimented with zip bombs for hostile scrapers and proof-of-work validation to make scraping expensive. These measures work, but they demand expertise most people lack and vigilance that exhausts everyone.
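Proof-of-work is the more interesting of those two experiments: the server hands out a puzzle that is cheap to verify but costs the client real CPU time to solve, negligible for one reader and ruinous at scraping scale. A hashcash-style sketch in Python, with an assumed difficulty and token format rather than whatever Bear actually runs:

```python
import hashlib
import secrets

# Hashcash-style proof-of-work sketch. The difficulty and token format are
# assumptions for illustration, not Bear's actual scheme.
DIFFICULTY = 18   # leading zero bits required; raise it to make scraping dearer

def issue_challenge() -> str:
    return secrets.token_hex(16)

def solve(challenge: str) -> int:
    """Client side: grind nonces until the hash has DIFFICULTY leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server side: one hash checks work that took the client roughly 2^DIFFICULTY tries."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

challenge = issue_challenge()
nonce = solve(challenge)      # a moment of CPU for one page view
assert verify(challenge, nonce)
```

The asymmetry is the point: verification is a single hash, solving is hundreds of thousands, so the cost lands on whoever is making the requests.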

Running a simple blogging platform now requires security engineering previously reserved for financial infrastructure. Not because Herman codes poorly or provisions inadequately. Because the baseline web environment has become fundamentally hostile.

Herman funds Bear through optional custom domain subscriptions at a few pounds a month, plus a personal commitment to maintaining a space he believes should exist. Earlier this year he wrote Bear's manifesto, promising the platform wouldn't shut down, wouldn't sell, wouldn't show advertisements. He detailed succession plans in case something happened to him. These commitments matter precisely because so many good platforms have vanished, not from lack of users but from economics that make maintenance unsustainable.

Every independent platform operator fights the same escalating battle. They absorb costs generated by hostile actors whilst receiving nothing in return. The selection pressure is brutal. Platforms without funding find defence too expensive or exhausting. Those with funding face pressure to monetise through advertising or data collection, undermining the values that made them worth defending. The middle ground—sustainable, independent, user-respecting—is being systematically eliminated. Operator burnout isn't a bug in this system. It's an externality nobody bothers counting.

Building fortresses for text files

Herman's response to the outage was technically excellent. Redundant monitoring that phones, emails, and texts during failures. Increased rate limiting that cut server load by half. Capacity bumped fivefold—overkill perhaps, but compute is cheap compared to weekend catastrophes. Automatic restart protocols. A public status page for transparency.
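Redundant monitoring, in that spirit, can be as plain as an independent process that probes the site and fans alerts out over more than one channel, so a single silent alerter can never again hide an outage. A rough sketch; the URL, interval, and alert hooks are placeholders, not Bear's actual setup.

```python
import time
import urllib.request

# External health-check loop. The URL, interval, and alert channels are
# placeholders, not Bear's actual monitoring stack.
HEALTH_URL = "https://example.com/health"
CHECK_INTERVAL = 60   # seconds between probes

def site_is_up(url: str, timeout: float = 10.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        return False

def alert_all(message: str) -> None:
    # Fan out to independent channels so one failed notifier cannot mask an outage.
    for send in (print,):   # swap in real pager, email, and SMS senders here
        send(f"ALERT: {message}")

while True:
    if not site_is_up(HEALTH_URL):
        alert_all(f"{HEALTH_URL} failed its health check")
    time.sleep(CHECK_INTERVAL)
```

What matters is that the loop runs somewhere independent of the infrastructure it watches; redundancy, not sophistication, is what gets the phone to ring.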

These measures work. They'll handle larger attacks and fail more gracefully when breached. But they address symptoms whilst leaving fundamental problems untouched. Every megabyte of hostile traffic Herman blocks costs resources—computational capacity, bandwidth, time spent tuning defences. The mitigation is effective but extractive. He's defending a garden by building a fortress, knowing the fortress demands constant maintenance that serves no purpose beyond keeping wolves at bay.

Large companies absorb bot traffic as a cost of doing business. They spread it across massive infrastructure and dedicated security teams, and they can afford mistakes and recovery; defence costs don't scale linearly with traffic. Independent operators lack these advantages. Herman's fivefold capacity increase represents a meaningful expense on a platform funded by optional £2 subscriptions. The next attack requiring more capacity will cost more. Eventually the treadmill becomes unwinnable.

The alternatives are grim. Authentication walls solve bot problems by eliminating public access—turning open platforms into private gardens that defeat the purpose. IP allowlists work for small communities but break discoverability. CAPTCHAs create friction whilst becoming less effective as AI systems master them. Rate limiting can't distinguish legitimate bursts from coordinated attacks without sophisticated analysis. Every defence trades something valuable—openness, accessibility, user experience—for protection against threats that shouldn't exist at this scale.

What remains unsaid

Herman's weekend isn't unusual. It's ordinary—the new normal for maintaining public resources. The revealing detail isn't that bots attacked Bear. Bots attack everything constantly. It's that a platform designed with unusual care, running efficiently on minimal resources, maintaining five years of perfect uptime, still requires enterprise-grade defences just to survive.

This pattern extends beyond blogging. Every independent forum, community site, resource library, and public service faces identical pressure. Some have resources to fight. Most don't. The ones that vanish disappear quietly through operator burnout rather than dramatic shutdowns. The open web contracts silently whilst everyone focuses on AI ethics debates that miss the actual infrastructure crisis.

No elegant solutions exist. ISP-level bot filtering doesn't happen for residential connections. Economic consequences for hostile traffic generators remain theoretical: attribution is difficult, and most of the behaviour is technically legal. Rearchitecting how we think about public web resources might help, but who builds that infrastructure when incentives reward extraction over maintenance?

Herman continues anyway. He'll keep tuning defences, upgrading capacity, absorbing costs. Bear stays online because someone cares enough to make it work despite economics suggesting it shouldn't. But how many other operators have that commitment? How many good spaces have we already lost? How many more vanish before people recognise the real crisis isn't what AI companies train models on, but whether spaces worth training on can survive long enough to matter?

The arms race continues. Herman's status page shows green bars he hopes stay solid. Every independent platform operator watches similar dashboards, fighting similar battles, wondering how long they can sustain defences against traffic serving no legitimate purpose. The tragedy isn't technical. It's that we've built a web where maintaining good spaces requires fighting wars nobody should have to wage just to share words on the internet.

#cybersecurity #software development