Ladybird browser reaches Apple's 90% test threshold whilst remaining years from usability
Independent project passes arbitrary benchmark as major browser vendors decline to ship their own engines on iOS
The Ladybird browser project announced it had crossed 90% on web-platform-tests—the threshold Apple requires for alternative browser engines on iOS. For eight full-time engineers competing against Google's thousand-person Chrome team, this seemed a triumph.
Then the top comment arrived, from someone who helps maintain those very tests: "I'd caution against any use of the test pass rate as a metric for anything."
That tension—between achievement and its meaninglessness—reveals how we've built a system where passing the test means almost nothing about building a usable browser. And how regulatory compliance theatre can look like progress whilst changing nothing at all.
The test designed not to measure
Web-platform-tests were never meant to rank browsers. The maintainers explicitly designed them as "a useful engineering tool rather than a good metric." This makes sense: engineers need tests that are easy to contribute and comprehensive enough to catch bugs. What makes a good debugging tool makes a terrible benchmark.
The result? More than half of all web-platform-tests cover character encoding. Not because encoding represents half the challenge of building a browser, but because encoding tests are trivial to generate algorithmically. Complex layout behaviour is hard to test, so fewer such tests exist. A browser scoring 90% might excel at encoding whilst rendering actual websites catastrophically.
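The arithmetic behind that distortion is easy to sketch. The figures below are invented purely for illustration and are not real web-platform-tests data; the area names, the test counts and the "balanced" average are all assumptions, and the balanced figure is not how Interop or anyone else actually scores browsers. The point is only that a suite dominated by one easy area can report 90% overall while a harder area sits far lower.

```python
# Hypothetical numbers for illustration only; not real web-platform-tests data.
# Each entry maps an area to (tests in that area, tests passed).
suite = {
    "encoding": (6000, 5940),   # large, easy-to-generate area
    "layout": (1500, 1050),     # smaller, harder area
    "scripting": (2500, 2010),
}


def aggregate_pass_rate(results):
    """Pass rate over every test, so large areas dominate the score."""
    total = sum(count for count, _ in results.values())
    passed = sum(ok for _, ok in results.values())
    return passed / total


def balanced_pass_rate(results):
    """Unweighted mean of per-area pass rates (an assumed alternative)."""
    rates = [ok / count for count, ok in results.values()]
    return sum(rates) / len(rates)


for area, (count, ok) in suite.items():
    print(f"{area:>10}: {ok / count:.1%}")
print(f" aggregate: {aggregate_pass_rate(suite):.1%}")
print(f"  balanced: {balanced_pass_rate(suite):.1%}")
```

With these made-up inputs the aggregate comes out at 90.0% even though layout passes only 70.0% of its tests, and weighting each area equally pulls the figure down to roughly 83%.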
The creators of web-platform-tests understand this perfectly. That's why they built the Interop Project—a separate initiative where browser vendors select test subsets representing features with genuine interoperability problems. Interop requires consensus and balances coverage across the platform.
Apple ignored it. When the EU's Digital Markets Act forced Apple to allow alternative engines on iOS, Apple chose the blunter instrument: 90% on the unbalanced test suite explicitly designed not to be a metric. It transformed an engineering tool into a regulatory gatekeeper overnight.
An arbitrary gate with impossible terms
The 90% threshold is merely the visible requirement. Apple's other conditions explain why no major browser vendor has shipped an alternative engine for iOS despite 15 months of the regulatory "opening."
Alternative engines must exist in EU-only apps, preventing global versions. Updates must ship within 15 days of any engine release, regardless of testing needs. Development and testing can only happen on devices physically located in the EU—American and Asian engineers need not apply.
Mozilla and Google, whose browser engines serve hundreds of millions, have declined. Apple claims vendors "have chosen not to" participate, a statement of spectacular disingenuousness. Building a separate codebase, abandoning existing users, and accepting Apple's contractual terms make commercial sense for exactly nobody.
This is regulatory compliance as performance art. The EU forced Apple's hand. Apple responded with requirements that technically allow alternative engines whilst making them commercially impossible. It's malicious compliance elevated to corporate strategy.
The chasm between tests and browsers
Ladybird's actual achievement deserves recognition. The project evolved from a hobby browser for a custom operating system into a serious attempt at an independent engine. Corporate sponsors include Shopify and Cloudflare. GitHub's co-founder donated a million dollars. The team publishes monthly progress reports showing genuine advancement.
Their JavaScript engine now beats Chrome and Safari on compliance tests. Gmail and Google Calendar load. For a project founded in 2022, this represents extraordinary engineering.
Yet Ladybird remains, by its developers' own admission, roughly 10 times slower than established browsers. It crashes on many real websites. Video playback fails. Form handling breaks. Complex layouts render incorrectly. The planned alpha release for early adopters isn't until 2026.
This paradox—high test scores, low usability—illuminates what test suites cannot capture. Browser development's genuinely hard problems involve performance optimisation, memory management across thousands of tabs, security sandboxing, and handling the infinite edge cases that real websites inflict. Tests verify that features technically work. They cannot measure whether features work well enough for anyone to actually use them.
Andreas Kling, Ladybird's founder, acknowledges this directly. Performance is "one of our weak points." The development philosophy is "make it work, make it right, make it fast", in that order. That is sensible for a young project, but it means the 90% score wildly overstates readiness for users.
Why new engines died
Modern browsers are operating systems masquerading as applications. They manage processes, threads, and memory. They implement networking stacks, graphics acceleration, and security sandboxes. They must handle not just current web standards but decades of accumulated cruft that real websites depend on.
The specifications alone span thousands of pages across dozens of documents, each evolving continuously. CSS, HTML, JavaScript, WebAssembly, accessibility APIs, payment APIs, geolocation, notifications, service workers: the list expands faster than small teams can implement it. Chrome's codebase contains 40 million lines. Firefox is comparably vast. These projects represent thousands of person-years, continuous optimisation, and countless bugs found through billions of users.
This explains the extinction event in browser diversity. Opera abandoned its own engine for Chromium in 2013. Microsoft killed EdgeHTML for Chromium in 2020. These companies had substantial engineering teams and existing codebases. Both concluded they couldn't maintain independent engines. If Microsoft couldn't sustain its own engine, what chance does a non-profit with eight engineers have?
The Interop Project actually addresses this problem intelligently. Browser vendors agree on priority areas and collaborate to fix interoperability issues, making the platform more usable without requiring new engines. But Interop's careful, consensus-based methodology makes it useless as a simple pass/fail gate. Apple needed something mechanical and arbitrary, so it chose test score percentages over thoughtful collaboration.
Performance art masquerading as regulation
The European Union forced Apple to allow alternative engines. Apple responded with requirements designed to make alternatives commercially impossible whilst technically complying. No major browser vendor finds these terms acceptable. The only project reaching the threshold is a small non-profit that won't ship anything usable for years.
The broader barriers remain untouched. Google can afford a thousand-person browser team because Chrome funnels users to Google Search, generating advertising revenue. Mozilla survives on payments from Google to make Google Search the default in Firefox. Apple maintains Safari because controlling the browser helps lock users into its services.
Ladybird has none of these advantages. It has sponsorship from companies believing in an open web, but no business model that scales with usage. Even reaching alpha quality won't solve the fundamental problem: sustaining development whilst competing against established engines with massive head starts requires resources that engineering skill cannot conjure.
The web's founding promise of open, interoperable standards implemented by multiple competing browsers looks increasingly quaint. What we have instead is Chromium dominance, Safari locked to iOS, Firefox barely surviving on Google's payments, and regulatory compliance so theatrical it changes nothing substantive.
Making browser diversity real rather than aspirational requires addressing the actual barriers: sustainable funding models for independent browsers, specifications that don't expand faster than small teams can implement them, and regulatory enforcement that demands substantive change rather than accepting procedural compliance. None of these seems remotely forthcoming.
Ladybird continues building regardless. Perhaps that stubborn independence has value even without changing the market: proof that building a browser from scratch remains technically possible, if commercially insane. In a world of trillion-dollar corporations and effective monopolies, someone needs to demonstrate that the old promises weren't complete fantasies.
Just don't mistake passing an arbitrary test threshold for actual progress towards browser diversity. That gap between metric and reality is precisely what makes this milestone simultaneously impressive and meaningless.