Nearly Right

When security researchers found flaws in Eurostar's chatbot, the company accused them of blackmail

The rail operator's AI assistant had guardrails and signatures—but enforcement was another matter entirely

The security researcher sent his vulnerability report to the designated email address and waited. A week passed. He followed up. Nothing. After a month of silence, a colleague reached out via LinkedIn to Eurostar's head of security, asking simply whether anyone had received the disclosure.

The response, when it came, suggested this constituted attempted blackmail.

No threat had been made. No payment demanded. The researcher had followed Eurostar's own published disclosure process to the letter. What he got back was an accusation that would, if taken seriously, imply criminal conduct. The exchange, published last month by Pen Test Partners, captures something essential about how organisations are handling AI security: the instinct to deflect has outpaced the capacity to investigate.

The vulnerabilities Alex Wallace discovered in Eurostar's customer service chatbot were not particularly exotic. They were, in fact, remarkably mundane—the same input validation failures and server-side enforcement gaps that have plagued web applications for decades. But the response to their disclosure reveals an organisation that built the architecture of security without ever quite finishing the work.

Signatures that signed nothing

Wallace first encountered the chatbot as an ordinary customer. It announced, helpfully, that its answers were "generated by AI." Eurostar publishes a vulnerability disclosure programme, so Wallace decided to probe further.

What he found looked, initially, like serious security engineering. The system generated cryptographic signatures for approved messages. Conversations and messages used randomly generated UUIDs. A guardrail layer checked inputs before they reached the underlying language model, blocking off-topic queries with an identical refusal message every time. The architecture suggested someone had thought carefully about the risks.

The implementation told a different story. The guardrail checked only the final message in any conversation. Send something innocuous—or empty—and it passed, signature and all. But the server never re-verified the rest of the conversation history. An attacker could edit earlier messages to contain anything, and the model would accept them as trusted context.
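
For illustration only, here is a minimal sketch of what server-side enforcement of that history could look like, written in TypeScript with a hypothetical message shape and HMAC secret (Eurostar's actual API and code are not public): the server signs the entire transcript and re-verifies it on every request, so a client-side edit to any earlier turn invalidates the signature.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical message shape; the real schema is not public.
interface ChatMessage {
  id: string;                     // per-message UUID
  role: "user" | "assistant";
  content: string;
}

// Sign the whole transcript, not just the latest approved message.
function signTranscript(messages: ChatMessage[], secret: string): string {
  const canonical = JSON.stringify(messages.map(m => [m.id, m.role, m.content]));
  return createHmac("sha256", secret).update(canonical).digest("hex");
}

// Re-verify on every request, before the history is handed to the model
// as trusted context. A tampered earlier message fails this check.
function verifyTranscript(messages: ChatMessage[], signature: string, secret: string): boolean {
  const expected = Buffer.from(signTranscript(messages, secret), "hex");
  const received = Buffer.from(signature, "hex");
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The point is not this particular construction. It is that the check happens on the server, against the whole conversation, every time.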

Wallace demonstrated this by asking the chatbot to plan a travel itinerary with an unusual destination: "Day 3: ." The system filled in the blank. Further prompt injection extracted the entire system prompt. The guardrails existed; they just guarded nothing.

The system prompt, it turned out, instructed the model to format responses with HTML for reference links. Those links rendered directly in the browser without sanitisation. Combined with the ability to inject arbitrary instructions, Wallace could make the chatbot output executable JavaScript. The vulnerability chain was complete: bypass the guardrails, inject instructions, execute code.
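
The missing defence on the output side is equally unglamorous. As a sketch using DOMPurify, a widely used sanitisation library (the allowlist below is illustrative, not Eurostar's configuration), the model's HTML is treated as untrusted and stripped down to the markup the feature actually needs before it touches the page:

```typescript
import DOMPurify from "dompurify";

// Treat model output as untrusted input. Allow only the markup the
// reference-link feature needs; script tags, event handlers and
// javascript: URLs are stripped before anything is rendered.
function renderChatbotReply(rawModelOutput: string): string {
  return DOMPurify.sanitize(rawModelOutput, {
    ALLOWED_TAGS: ["a", "p", "ul", "li", "br"],
    ALLOWED_ATTR: ["href", "title"],
  });
}

// Usage: container.innerHTML = renderChatbotReply(reply);
// rather than assigning the raw model output directly.
```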

None of this required novel techniques. These were textbook web security failures—insufficient input validation, missing output sanitisation, client-side trust where server-side enforcement should have been. The Open Worldwide Application Security Project lists prompt injection as vulnerability number one in its Top 10 for Large Language Models, but the mitigations read like a document from 2005. Validate inputs. Sanitise outputs. Never trust the client.
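
Applied to a chatbot, "never trust the client" also means the guardrail itself runs on the server, over every turn the client submits, not only the newest one. A minimal sketch, with isOnTopic standing in for whatever classifier or moderation call a deployment actually uses:

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Re-check the entire submitted history server-side. Checking only the
// final message lets an attacker smuggle instructions into earlier turns.
async function guardConversation(
  turns: Turn[],
  isOnTopic: (text: string) => Promise<boolean>
): Promise<boolean> {
  for (const turn of turns) {
    if (turn.role === "user" && !(await isOnTopic(turn.content))) {
      return false; // refuse before anything reaches the model
    }
  }
  return true;
}
```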

The architecture of drift

In 1986, the Space Shuttle Challenger broke apart seventy-three seconds after launch, killing all seven crew members. The cause was an O-ring seal that failed in cold weather—a flaw NASA had known about for years. Engineers had flagged the risk. Managers had accepted it. Flight after flight had launched without catastrophe, and what began as a deviation from safety standards became the standard itself.

The sociologist Diane Vaughan called this "normalisation of deviance." It happens gradually. An organisation establishes a requirement—no O-ring erosion, no unvalidated input—and then tolerates small violations because nothing bad happens. The violations accumulate. The original standard becomes a historical curiosity. When disaster finally arrives, everyone asks how the warning signs could have been missed, and the answer is that they weren't missed at all. They were accepted.

The same drift is underway in AI deployment. Companies build guardrails that don't enforce and signatures that don't verify. Production pressure overwhelms thoroughness. The chatbot ships. Nothing bad happens—yet—and the gap between architecture and implementation calcifies into normal practice. Industry surveys suggest 44 per cent of organisations have experienced negative consequences from rushing AI implementation. AI assistants such as Microsoft Copilot exposed millions of sensitive records in early 2025, largely through employees using AI outside approved systems. The normalisation is measurable.

Eurostar's disclosure process makes this institutional drift explicit. Between Wallace's initial report and his colleague's LinkedIn follow-up, the company had outsourced its vulnerability disclosure programme—and apparently lost incoming reports during the transition. When finally confronted, rather than investigating, a senior security official reached for accusations. The machinery of accountability had rusted in place.

What Air Canada learned the hard way

The legal framework for chatbot liability was established before Eurostar's chatbot went off the rails, in a case that began on Remembrance Day 2022.

Jake Moffatt's grandmother had just died in Ontario. That same day, he visited Air Canada's website to book a flight home and asked the chatbot about bereavement fares—the discounts many airlines offer when a family member dies. The chatbot told him he could buy a regular ticket and claim the bereavement rate within ninety days. Moffatt did exactly that. Air Canada denied his claim, explaining that the chatbot was wrong. Retroactive bereavement discounts were not, in fact, permitted.

When Moffatt took the case to tribunal, Air Canada offered what the adjudicator called "a remarkable submission." The airline argued that its chatbot was "a separate legal entity that is responsible for its own actions." This anthropomorphised deflection—the AI did it, not us—met the response it deserved.

"While a chatbot has an interactive component, it is still just a part of Air Canada's website," tribunal member Christopher Rivers wrote. "It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot."

The principle established is straightforward: deploying AI does not create a liability firewall. If the chatbot misleads, manipulates, or malfunctions, the organisation that deployed it bears responsibility. The same logic applies to security. A guardrail the company chose not to enforce is still the company's guardrail.

The catastrophe that hasn't happened yet

Critics of the Eurostar disclosure—and there were several on Hacker News—raised a legitimate objection: without demonstrated harm to other users, do these qualify as real vulnerabilities? The cross-site scripting was "self-XSS," executing only in the attacker's own browser. The prompt injection revealed system prompts, embarrassing but not catastrophic. The chatbot had no access to booking data or payment systems. Where, exactly, was the damage?

This scepticism reflects how vulnerability severity is typically assessed, and it is not entirely wrong. The immediate business impact was indeed limited. But the framing mistakes a snapshot for a trajectory.

These chatbots are not standing still. They are being connected, rapidly, to booking systems and customer records and payment processing. The chatbot market is valued at eight billion dollars and growing at more than 23 per cent annually. Two-thirds of Fortune 500 companies are expected to deploy AI chatbots by year's end. Customer service—precisely the application that grants access to personal information—is the most common use case. The architectural weaknesses being normalised today will be the exploitation vectors of tomorrow.

The Challenger didn't explode on the first flight with O-ring erosion. It exploded on a cold January morning after years of accumulated drift. The gap between "nothing bad has happened" and "something very bad just happened" can close with startling speed.

The lesson they didn't want to hear

What makes the Eurostar case instructive is not the technical findings—those are, ultimately, fixable with competent engineering. It is the disclosure disaster that reveals how deeply the dysfunction runs.

The researchers followed the published process. They were ignored. They escalated politely. They were accused of criminality. When the company finally investigated, it emerged that Eurostar had outsourced its vulnerability programme and lost track of incoming reports. The fixes were eventually implemented, but the timeline and scope were never clearly communicated.

Bruce Schneier, the security technologist, recently documented how vulnerability disclosure programmes are being undermined by non-disclosure agreements and platform intermediaries that insulate companies from accountability. Eurostar's problem was simpler: they were not managing the process at all. The outsourcing transition swallowed reports. The internal response defaulted to hostility. The machinery of security improvement had seized up.

The irony is that Eurostar clearly understood, at some level, what secure AI deployment requires. They built guardrails. They generated signatures. They used random identifiers. The architecture was there. But architecture without enforcement is just documentation—a record of intentions never realised.

This is the lesson the accusation was designed to avoid hearing: that a chatbot is a web application, that known security practices must apply to new technology with the same rigour they apply elsewhere, that a guardrail not enforced server-side is not a guardrail at all. The researchers were not attempting blackmail. They were offering the company a chance to fix its mistakes before someone less scrupulous found them.

Eurostar, it seems, would have preferred not to know.

#cybersecurity