Paying developers per story point sounds revolutionary. Frederick Taylor tried something similar in 1895.
A buzzy new compensation model for software engineers draws from deep wells of industrial history—and psychological research on what really motivates creative work
Two software engineers sit at adjacent desks in a Manhattan co-working space. One has forty-five AI agents running simultaneously—three ordering lunch, two writing code, one conducting research. The other is typing "fo" into a text editor. One character at a time. Like a caveman, in the words of the man who noticed.
That man was Arman Hezarkhani, and he tells this story to illustrate what he sees as the central dysfunction of modern software compensation. Both engineers probably earn similar salaries. Both arrived at nine and will leave at five. Yet one is harnessing AI to multiply output while the other plods along at the pace of a typewriter. The difference, Hezarkhani argues, is incentives. The solution, according to the company he co-founded with Morning Brew's Alex Lieberman, is to pay engineers the way you pay salespeople.
Tenex, their AI consultancy, compensates developers based on the story points they complete rather than the hours they work. Complete more points, earn more money. The company claims some engineers will earn over one million dollars in cash compensation next year—purely from output. They have recruited rocket scientists from NASA and world-class machine learning researchers. The work ships fast. The code holds up.
It sounds like a clever hack for a new era. But the idea of paying knowledge workers per unit of output has a lineage stretching back more than a century, and a history that Tenex's founders do not mention.
An old idea in new clothes
In June 1895, Frederick Winslow Taylor stood before the American Society of Mechanical Engineers and presented a paper titled "A Piece-Rate System: A Step Toward Partial Solution of the Labor Problem." His argument was elegant. Workers were deliberately slow because hourly pay rewarded time spent rather than value delivered. Pay them per piece produced, and their interests would align with the factory's. Productivity would soar. Everyone would prosper.
Taylor called the problem he was solving "soldiering"—the informal conspiracy of workers to restrict output and protect their collective position. His solution was scientific measurement. Time-and-motion studies would determine exactly how long each task should take. Workers who exceeded the benchmark would earn more. Those who couldn't would be identified and replaced. The measurement would be objective. The incentives would be clear.
The results were less harmonious. At the Watertown Arsenal in Massachusetts, workers complained of feeling "strained and resentful." At Bethlehem Steel, Taylor's showcase project, the piece-rate system reduced costs for management but bred lasting resentment. Of forty men hired over several months, managers determined that only three qualified as "first-class." Most others, they noted, "break down after two or three days." Critics accused Taylor of turning workers into automatons, making work "monotonous and unfulfilling."
The parallels to Tenex are uncomfortable. Both systems promise to align incentives through output-based pay. Both assume workers underperform due to perverse compensation structures—Taylor blamed soldiering, Tenex blames failure to adopt AI. Both rely on supposedly objective measurement. Both promise that higher productivity means higher pay for workers, not just higher margins for employers.
And both face the same fundamental problem: workers game any measurement system they can. Story points are particularly vulnerable. They were invented as rough estimates of complexity, not precise units of output. One team's three-point story is another's thirteen. Tenex claims to solve this with AI-driven estimation, but doing so merely shifts the arbitrariness from human judgment to model outputs—and the models are trained on historically inconsistent human estimates. You cannot bootstrap objectivity from subjective data.
What psychology says about paying for output
The intuition behind piece-rate pay is seductive. Tie compensation to results and people will work harder to achieve them. Decades of research suggest the relationship is considerably more complicated.
Edward Deci and Richard Ryan, whose self-determination theory has shaped modern understanding of workplace motivation, conducted a meta-analysis of 128 studies on extrinsic rewards. Their central finding was counterintuitive: performance-contingent rewards "significantly undermined free-choice intrinsic motivation." Pay people for completing a task, and they become less interested in the task itself. The effect was robust across contexts and populations.
For software engineering, this creates a problem. Programming—particularly the creative problem-solving that justifies high salaries—depends on intrinsic motivation. The quiet satisfaction of elegant code. The puzzle-solving pleasure of debugging. The craftsman's pride in something well-built. Research suggests that tying pay directly to output can crowd out precisely the internal drive that makes excellent engineering possible. Faster does not mean better. More points delivered does not mean more value created.
Separate research on gig economy workers reveals another risk. A study in Work and Occupations found that platform workers who depended primarily on gig income reported higher psychological distress than those who did gig work on the side, traditional wage workers, and the self-employed. Financial strain explained roughly half this elevated distress. When pay fluctuates with output, even high average compensation carries psychological costs. The stress of uncertainty compounds across months and years.
Tenex mitigates this somewhat by paying a flat base salary with story-point bonuses added quarterly. But research published in the Journal of Applied Psychology found that workers with fluctuating pay—even when the fluctuation sits atop a stable base—"tend to have poorer sleep, more stress and miserable physical symptoms." The volatility itself extracts a toll.
The problem compensation cannot solve
Perhaps the most surprising finding in recent AI productivity research has nothing to do with compensation. In a randomised controlled trial published in July 2025, METR found that experienced open-source developers completed tasks nineteen percent slower when using AI tools than when working without them.
Read that again. Slower.
The perception gap was more remarkable still. Before starting, developers predicted AI would make them twenty-four percent faster. After finishing their tasks more slowly with AI than without, they still believed AI had improved their productivity by twenty percent. The tools felt faster. The stopwatch said otherwise.
This finding complicates Tenex's entire diagnosis. The company assumes developers resist AI adoption because hourly compensation fails to reward efficiency. But the METR study suggests developers don't need incentives to believe AI helps them. They already believe it. They're just wrong. Stack Overflow's 2025 developer survey reinforces the picture: eighty percent of developers now use AI tools, but trust in their accuracy has fallen from forty percent to twenty-nine percent. Developers are using tools they don't trust, believing those tools make them faster when they often don't.
If the problem is cognitive rather than economic, no compensation structure will solve it. Paying someone more per story point does not help them recognise when AI is slowing them down. It might exacerbate the problem, creating financial pressure to adopt tools that feel productive without being so.
The conditions for success
None of this means Tenex's model cannot work for Tenex. The company emphasises extreme hiring selectivity. "We make hiring incredibly difficult for ourselves," Hezarkhani told his conference audience, "so that everything else is easy."
This admission is more revealing than it might appear. Tenex's compensation model may succeed not because of the incentive structure but because of the talent pool it operates within. Engineers who actively seek outcome-based pay are self-selecting for high confidence in their productivity. Those who survive rigorous screening are proven performers. In a small, elite team doing client-facing work with clear deliverables, story-point compensation may function well.
But this is not a scalable philosophy. It is a description of conditions under which output-based pay can avoid its usual pathologies. The motivation research suggests that extrinsic rewards work best for people who are already intrinsically driven—precisely the population Tenex's hiring process selects. For the broader developer workforce, the century-old lessons of Taylorism may apply more directly.
The company's own safeguards hint at the fragility. Strategists are compensated based on customer happiness, creating a counterbalance against point inflation. Multiple rounds of quality assurance catch rushed work. These are not features of a self-regulating system. They are guardrails against the gaming that output-based pay inevitably invites.
What the caveman might know
Return to that Manhattan co-working space. The developer typing character by character—the one who looked to Hezarkhani like a relic—what explains their behaviour?
The Tenex story assumes they lack incentives. Tie their pay to output and they would rush to adopt AI. But the research suggests other possibilities. Perhaps they have tried AI tools and found them unreliable for their particular work. Perhaps they are maintaining a codebase where AI assistance creates more problems than it solves. Perhaps, like the experienced developers in the METR study, they have learned through painful experience that what feels faster often is not.
The co-working scene is a parable about divergence, but its meaning is less obvious than Hezarkhani assumes. He sees a failure of incentive design. The psychological research sees a cognitive puzzle—why do people believe tools help them when the evidence says otherwise? The historical record sees a recurring fantasy—that scientific measurement and aligned compensation can resolve the tensions inherent in creative work. Taylor believed this. His workers did not find it so.
Tenex may have built something that works for a small group of exceptional engineers doing clearly scoped client projects. That is worth noting. But the claim that paying per story point solves knowledge work's incentive problems rests on assumptions that a century of industrial history and decades of motivational psychology have complicated. The workers at Watertown Arsenal and Bethlehem Steel could have told us this. So could Deci and Ryan. So, perhaps, could the METR researchers watching experienced developers work slower while feeling faster.
What AI demands from compensation design may be humility rather than disruption. An acknowledgment that measuring creative work is hard. That incentives shape behaviour in unexpected ways. That the person typing "fo" one character at a time might understand something about their work that forty-five AI agents cannot see.