Silicon Valley's AI agents can't schedule meetings but promise to replace workers by 2027
Why Andrej Karpathy thinks reliable AI assistants need a decade while the industry sells snake oil
The most sophisticated AI system ever built just failed to schedule my Tuesday meeting. Again.
This is the maddening reality of artificial intelligence in 2025: the same technology that can diagnose rare diseases from medical scans can't handle the calendar conflict between British Summer Time and Eastern Daylight Time. OpenAI's GPT-4o—supposedly the apex of machine intelligence—fails at basic office tasks 91.4% of the time. That's not a typo. Nine out of ten times, it fails.
Andrej Karpathy isn't surprised. After five years teaching Teslas to drive themselves, watching promise after promise crash into reality, he knows the script by heart. Demo, hype, investment, failure, silence. His prediction that functional AI agents need at least a decade has split Silicon Valley into warring tribes: the believers crying "pessimist," the engineers muttering "optimist."
"They don't work," Karpathy told Dwarkesh Patel. Not "need improvement." Not "almost there." They. Don't. Work.
Yet the money keeps flowing—$131.5 billion in 2024 alone—chasing a revolution that might be further away now than it looked when we started. MIT just found 95% of enterprise AI pilots fail completely. Not underperform. Fail. Meanwhile, venture capitalists promise AI agents will replace half the workforce by 2027.
Someone's lying. And Karpathy's "march of nines" explains exactly who.
The brutal mathematics of "good enough"
Here's what nobody tells you about AI reliability: the distance from 90% to 99% isn't 9%. It's a tenfold cut in failures, and it costs roughly as much work as getting to 90% did in the first place.
Karpathy learned this at Tesla, where five years of work conquered maybe three nines of reliability. "Every single nine is a constant amount of work," he says, and hidden in that sentence is why your AI assistant will still be useless in 2030.
Think about email. At 99% reliability, an AI assistant fails once per hundred messages. For a company processing 10,000 daily emails, that's a hundred disasters every morning. Missed contracts. Leaked data. Deleted customer records. One percent failure means unemployable.
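The arithmetic is worth doing yourself. Here's a back-of-the-envelope sketch in plain Python, using the 10,000-emails-a-day workload above as an assumed volume, showing what each additional nine actually buys:

```python
# Back-of-the-envelope: expected daily failures at each reliability level.
# The 10,000-emails-a-day workload is the illustrative figure from the text.
DAILY_EMAILS = 10_000

def expected_failures(reliability: float, volume: int = DAILY_EMAILS) -> float:
    """Expected number of botched tasks per day at a given success rate."""
    return volume * (1.0 - reliability)

for nines, reliability in [(1, 0.9), (2, 0.99), (3, 0.999), (4, 0.9999)]:
    print(f"{nines} nine(s), {reliability:.2%} reliable: "
          f"~{expected_failures(reliability):,.0f} failures per day")

# 1 nine(s), 90.00% reliable: ~1,000 failures per day
# 2 nine(s), 99.00% reliable: ~100 failures per day
# 3 nine(s), 99.90% reliable: ~10 failures per day
# 4 nine(s), 99.99% reliable: ~1 failures per day
```

Each extra row, on Karpathy's account, costs roughly as much engineering as the row before it. That's the march of nines in one loop.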
Getting to 99.9%? That's not tweaking—it's archaeology. You're no longer fixing common problems but excavating the bizarre: the client whose name contains emoji, the meeting request sent during a leap second, the calendar system used by exactly one customer in Nepal. Each fix creates new edge cases. The edge cases have edge cases.
Microsoft's AI Red Team discovered the cruel joke: throwing money at the problem makes it worse. Double your model size, 10x your training data—you'll get more sophisticated failures, not fewer of them. The AI that couldn't spell "restaurant" now hallucinates entire restaurants that don't exist, complete with believable Yelp reviews.
This is why MIT found 95% of enterprise AI pilots fail. Not because the technology doesn't work, but because 95% reliable means 5% catastrophic. In the real world, that's the difference between a tool and a liability.
Where AI dreams go to die
Walk into any Fortune 500 IT department. Ask about their AI initiatives. Watch them flinch.
The pattern is so predictable it's almost a religious ritual:
- Competitor announces AI breakthrough
- CEO panics, demands action
- Consultants appear with slides
- Vendor demo dazzles everyone
- Contracts signed, champagne popped
- Reality arrives
- Silence
McKinsey—which sells these transformations—admits only 1% of companies have achieved "AI maturity." The other 99% are burning shareholder money on what amounts to extremely expensive gambling.
Take the financial services firm that spent $12 million on AI loan approvals. The pilot: 94% accurate. Production: catastrophe. It approved deadbeats, rejected gold-standard borrowers, nearly triggered regulatory intervention. The culprit? Training data from good times, deployment during a minor recession. The AI had never seen economic headwinds. Twelve million dollars to learn that patterns change.
NTT DATA's numbers are even grimmer: a 70-85% failure rate, and it's getting worse, not better. The tools are more sophisticated; the failures more spectacular. Companies building their own AI systems? 67% failure rate. Buying from vendors? Still 33% failure. Those read like casualty figures from World War I, not software metrics.
"Almost everywhere we went, enterprises were trying to build their own tool," MIT's Aditya Challapally observed, watching millions evaporate. The core delusion? That AI systems learn from experience. They don't. They pattern-match training data. When reality diverges from training—and reality always diverges—they fail like clockwork.
Sixteen years to drive six blocks
The self-driving car is AI's perfect cautionary tale—not because it failed, but because it succeeded just enough to reveal how far success really is.
Timeline of delusion:
- 1995: Carnegie Mellon's car drives itself on highways
- 2009: Google starts self-driving project
- 2014: Karpathy takes "perfect" Waymo ride
- 2015: Blind man rides alone through Austin
- 2016: Elon Musk promises coast-to-coast self-driving "next year"
- 2017: Still next year
- 2018: Still next year
- 2019: "One million robotaxis by 2020"
- 2020: Zero robotaxis
- 2025: Waymo operates in six cities, perfect weather, mapped roads only
Sixteen years. Billions spent. The brightest minds in tech. Result? Cars that work perfectly until they don't—and when they don't, they stop dead in intersections, drive through emergency scenes, or mistake a billboard for a road sign.
The demos were never fake. Waymo's cars genuinely drove perfectly—on predetermined routes, in controlled conditions, with edge cases eliminated. But reality doesn't offer predetermined routes. Reality offers construction workers dressed as traffic cones. Snow that erases lane markings. Emergencies requiring judgments no algorithm possesses.
Every company learned the same lesson at the same price. BMW promised self-driving by 2021, delivered nothing. Ford burned $1 billion on Argo AI, then killed it. Uber and Lyft sold their self-driving divisions for scrap.
"In self-driving, if things go wrong, you might get injured," Karpathy notes. "In software, it's almost unbounded how terrible something could be."
He's understating it. A car crash is localised, visible, insurable. An AI agent with system access? That's your entire customer database sold on the dark web. Your trade secrets emailed to competitors. Your company's reputation destroyed by decisions you never made.
If it takes sixteen years to make a car drive six blocks safely, how long for an AI to navigate the infinite complexity of human work?
Your brain on ChatGPT
MIT researchers wired up students' brains and discovered something terrifying: AI isn't making us smarter. It's making us cognitively disabled.
Fifty-four students, divided into three groups. Write an essay. Group one uses ChatGPT, group two uses Google, group three uses their brains. The EEG readings told a horror story: ChatGPT users showed the weakest neural activity. Their brains had essentially gone to sleep.
Then came the gut punch. After writing their essays, researchers asked students to quote from their own work. The ChatGPT group? 83.3% couldn't remember a single sentence. Not one. They had no memory of "their" words because their brains never engaged with them. They were cognitive zombies, present but not thinking.
The researchers call this "cognitive debt"—every time you let AI think for you, your brain atrophies a little more. Like a muscle you stop using. Over four months, the deterioration was measurable at every level: neural, linguistic, behavioral. When forced to work without AI, these students couldn't. The machinery had rusted.
Dr. Zishan Khan watches this play out in real time with young patients: "These neural connections that help you in accessing information, the memory of facts, and the ability to be resilient: all that is going to weaken." His teenage patients who use AI for homework are literally rewiring their brains for dependency.
Twenty-six percent of US teenagers now use ChatGPT for schoolwork—double last year's rate. They're outsourcing their cognitive development during the exact years when their brains are forming permanent patterns.
We've seen this horror movie before. GPS users develop smaller hippocampi—the navigation center of the brain literally shrinks from disuse. Now we're running the same experiment on our ability to think. The generation growing up with AI won't know how to function without it. They won't even know what they've lost.
Why everything we're building is wrong
Yann LeCun built modern AI. Now he's trying to kill it.
"LLMs are good at manipulating language, but not at thinking," Meta's chief AI scientist announced at Davos, essentially calling his industry's crown jewel an elaborate parlour trick. His timeline for obsolescence? Three to five years. After that, he predicts, using current AI architectures will be like "using a steam engine to power a spaceship."
The problem isn't scale or data or computing power. It's physics. Current AI lacks what LeCun calls "world models"—the basic understanding that objects exist, gravity pulls things down, water makes things wet. An LLM has read every description of cooking ever written but doesn't know heat makes things hot. It's like trying to become a surgeon by reading medical journals in a language you don't speak about a species you've never seen.
Think about how you'd clean a messy room. You can visualise the end state, work backwards, understand consequences. You know throwing things makes more mess, that liquids spill, that fragile things break. LLMs know none of this. They're pattern-matching machines in a world that requires understanding.
LeCun's proposed replacement, JEPA, tries to build these world models. Early versions show promise. Timeline to maturity? "Mark Zuckerberg keeps asking me," LeCun says. His answer: maybe a decade. Maybe longer.
Meanwhile, the industry's backup plan—reinforcement learning—is what Karpathy calls "terrible." It's "sucking supervision through a straw," getting one bit of feedback after minutes of action. Imagine learning to cook by being told "bad" or "good" only after serving the meal. No information about what went wrong, when, or why. That's reinforcement learning.
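To make "one bit of feedback" concrete, here's a minimal, hypothetical sketch (not any lab's actual training loop) contrasting the dense, per-step signal supervised learning gets with the single end-of-episode scalar that reinforcement learning typically has to live on:

```python
import random

STEPS = 10  # a toy 10-step "task" where the correct action at every step is 1

def random_policy(step):
    """A policy with no idea what it's doing: picks 0 or 1 at random."""
    return random.choice([0, 1])

def run_episode(policy):
    """Roll out one episode and return the list of actions the policy took."""
    return [policy(step) for step in range(STEPS)]

def dense_feedback(actions):
    """Supervised-style signal: a right/wrong label for every single step."""
    return [int(a == 1) for a in actions]             # ten pieces of information

def sparse_feedback(actions):
    """RL-style signal: one number at the very end, no hint of which step failed."""
    return 1 if all(a == 1 for a in actions) else 0   # one bit for the whole episode

actions = run_episode(random_policy)
print("actions taken:  ", actions)
print("dense feedback: ", dense_feedback(actions))    # pinpoints every mistake
print("sparse feedback:", sparse_feedback(actions))   # "bad", with no explanation
```

Scale those ten steps up to the thousands of tokens and minutes of tool calls a real agent strings together, and the straw gets very thin indeed.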
Richard Sutton, who literally wrote the textbook on reinforcement learning, doesn't believe today's LLM-centric approaches lead to understanding. "Language and reality are different things," he told Dwarkesh Patel, dismissing the premise that LLMs understand anything at all.
We're not building intelligence. We're building increasingly sophisticated mimics. The difference matters when you need something that actually works.
The future is further than you think
Here's the truth Silicon Valley doesn't want you to hear: the AI revolution already happened. This is it. Impressive demos that don't work. Cognitive crutches that weaken your brain. Billions spent on sophisticated pattern matching that fails the moment reality gets weird.
The real timeline? Karpathy says a decade. History says two. Physics says maybe never, at least not with what we're building now.
Self-driving cars needed sixteen years to operate in six cities. They had clear goals: don't hit things, get from A to B. AI agents need to navigate the infinite complexity of human work, where success itself is ambiguous, context is everything, and edge cases outnumber normal cases.
For businesses, the prescription is brutal: stop believing in magic. That vendor demo showing perfect results? That's take seventeen on curated data. Your AI transformation? It's a 95% chance of expensive failure. Use AI for what it genuinely does—first drafts, pattern recognition, creative inspiration—while maintaining armies of humans to catch its failures. Because it will fail. Constantly.
For individuals, the challenge is existential: resist the cognitive debt. Every time you let AI think for you, you're selling off pieces of your mind for convenience. The students MIT studied can't think without AI anymore. Their brains have literally rewired for dependency. Don't let that be you. Use AI as a tool, not a brain replacement. The skills that matter now aren't the ones AI can replicate but the ones it can't: judgment, context, understanding consequences.
The march of nines grinds on. Each decimal of improvement costs more, takes longer, and reveals new problems we haven't imagined. The demos will keep dazzling. The deployments will keep failing. And every year, someone will promise that next year is the year.
It won't be.
Silicon Valley's AI agents genuinely can't schedule a meeting today. By 2037, they still might not. The future isn't cancelled—it's just much, much further away than anyone selling it wants to admit.
The tragedy isn't that AI agents don't work. The tragedy is that we've built our expectations, our investments, and increasingly our cognitive abilities around the assumption that they do.
They don't. They won't. Not soon. Maybe not ever.
Plan accordingly.