In November 2025, two Amazon SVPs co-signed a memo establishing an 80% weekly usage target for AI coding tools. By January, 70% of Amazon engineers had complied. By March, Amazon had suffered three major outages in ten weeks — including a six-hour shopping blackout that cost an estimated 6.3 million orders. Amazon's fix: mandatory senior engineer sign-off on all AI-assisted code changes across 335 critical systems.[1]
Read that sequence again. The company mandated AI coding. The AI coding broke things. The fix was to put the most experienced humans back in the loop — as reviewers, not writers. The fastest automated pipeline in the world, with the slowest human checkpoint bolted to the end.
This is the reviewer's trap. AI didn't eliminate the hard part of software engineering. It moved it. The hard part used to be writing code. Now the hard part is judging code — holding an entire system in your head while a machine generates plausible-looking pieces faster than any human can evaluate them. And this new burden falls disproportionately on senior engineers, the exact people whose judgment is hardest to replace and whose time is most expensive.
The Bottleneck Moved
The data is unambiguous. Faros AI analyzed telemetry from over 10,000 developers across 1,255 teams and found that high AI adoption produced a 98% increase in PR volume with a 91% increase in review time.[2] Teams are merging nearly twice as many pull requests, but each one takes nearly twice as long to review. The math is brutal: 2x the PRs × 2x the review time = 4x the total review burden.
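The compounding is easy to sanity-check. A minimal sketch, using the Faros multipliers above; the baseline team figures are hypothetical, chosen only for illustration:

```python
# Back-of-envelope check on the compounding review burden.
# The multipliers (~1.98x PR volume, ~1.91x review time per PR) are the
# Faros findings; the baseline numbers below are made up for illustration.
baseline_prs_per_week = 20          # hypothetical team
baseline_minutes_per_review = 30    # hypothetical average

pr_multiplier = 1.98                # ~98% more PRs
review_time_multiplier = 1.91       # ~91% longer per review

before = baseline_prs_per_week * baseline_minutes_per_review
after = (baseline_prs_per_week * pr_multiplier) * \
        (baseline_minutes_per_review * review_time_multiplier)

print(f"review load before: {before} min/week")            # 600
print(f"review load after:  {after:.0f} min/week")         # 2269
print(f"total burden multiplier: {after / before:.2f}x")   # 3.78x
```

The multipliers compose: the total burden is the product of the two increases, not their sum, which is why a team can feel the load quadruple when each individual metric "only" doubled.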
LinearB's analysis of 8.1 million pull requests found that AI-generated PRs wait 4.6 times longer before a reviewer even picks them up. And when they do, the acceptance rate is just 32.7%, compared to 84.4% for human-written code.[3] Two-thirds of AI output gets rejected after review. The machine wrote it in seconds. The human spent minutes understanding it. Then threw it away.
Teams that used to handle 10-15 PRs per week now face 50-100.[4] AI generates 6.4 times more code for a simple API endpoint — 186 lines where a human would write 29. An error-handling refactor: 288 lines versus 26. A 1,700% increase in code volume to review.[5]
Production got cheap. Verification got expensive. And nobody budgeted for the verification.
— LogRocket
Workslop
In September 2025, Harvard Business Review introduced a term for this pattern: "workslop." Researchers Kate Niederhoffer, Alexi Robichaux, and Stanford professor Jeffrey Hancock define it as low-effort, AI-generated work that appears polished but offloads cognitive labor to the recipient.[6]
The concept was developed for office work — emails, reports, presentations — but it maps perfectly to code. AI-generated code is the ultimate workslop: it looks structured, it's syntactically correct, it often passes tests. And it offloads the entire cognitive burden of understanding it to whoever has to review, maintain, or debug it.
Forty-one percent of employees in HBR's survey reported receiving workslop that affected their work. Fifty-three percent admitted to sending it. The dynamic is self-reinforcing: AI makes it trivially easy to generate more output, so people generate more output, and someone downstream has to evaluate all of it.[6]
In code, that someone is almost always a senior engineer.
The Role Inversion
Illya Yalovoy, writing in March 2026, described the shift plainly: "The senior engineer's job in 2026 is code review, not code writing."[7] The bottleneck moved from "we can't write code fast enough" to "we can't review code fast enough." PRs are "structurally competent but subtly wrong in ways that take real experience to spot."
This is the inversion. Senior engineers used to spend most of their time designing and building systems, with code review as a secondary responsibility. Now the ratio is flipping. They are "no longer paid to type syntax" but "paid to spot the mistakes in the syntax."
Addy Osmani, a Chrome engineering leader at Google, put the accountability question sharply: "If your pull request doesn't contain evidence that it works, you're not shipping faster — you're just moving work downstream."[8] And downstream is where the senior engineers live, absorbing the cognitive load that AI externalized.
Scott Berrevoets offered a counterpoint in his March 20 essay: the bugs in AI-generated code usually stem not from implementation errors but from "underspecification of the plan." The problem isn't that the machine coded it wrong — it's that the human described it poorly.[9] This reframes the reviewer's job from line-by-line inspection to evaluating whether the right thing was built, not just whether the thing was built right.
It's a useful distinction. It also doesn't reduce the workload. If anything, evaluating intent is harder than evaluating syntax.
Brain Fry
In March 2026, BCG published a study of 1,488 US workers and gave a name to what reviewers are experiencing: "AI brain fry."[10]
This is not burnout. Burnout is chronic emotional exhaustion that builds over months. Brain fry is acute cognitive overload from the specific task of monitoring, evaluating, and making judgment calls about AI output — too many cognitive tabs open at once. The BCG data is specific: workers with high AI oversight demands expended 14% more mental effort, reported 12% more mental fatigue, and experienced 19% greater information overload. Those affected made 39% more major errors and reported 33% more decision fatigue.
They described a "buzzing" feeling. Mental fog. Slower decision-making. Headaches.
Fourteen percent of AI-using workers reported experiencing brain fry. Productivity peaked at three simultaneous AI tools and degraded beyond four. The marginal AI tool doesn't make you faster — it makes you worse.
Lisanne Bainbridge predicted this in 1983. Her paper "Ironies of Automation" argued that the more reliable automation becomes, the worse humans get at supervising it.[11] The human role shifts from active operator to passive monitor, but humans are fundamentally bad at sustained passive monitoring. Norman Mackworth demonstrated in 1948 that the ability to detect rare signals degrades within thirty minutes of sustained watching.[12] We've known this for seventy-eight years. And we're asking senior engineers to do it for eight hours a day.
Humans review at human speed.
The math doesn't close.
The Verification Paradox
Here is the paradox at the center of the reviewer's trap: the more AI code you generate, the more human review you need. But human review capacity is fixed.
The Cisco/SmartBear code review study — 2,500 reviews, 3.2 million lines — established that defect detection plummets after 60-90 minutes of continuous review, and that the optimal review size is 200-400 lines of code.[13] AI routinely generates hundreds of lines per interaction. The cognitive window is fixed. The output is not.
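Those two constraints, a fixed cognitive window and a bounded effective reading rate, make the capacity ceiling easy to estimate. A minimal sketch: the per-session limits come from the review research above, while the session counts and output rates are hypothetical assumptions, not measured values:

```python
# Rough reviewer-capacity model. The 200-400 LOC per-sitting limit is from
# the review research cited above; the number of focused sessions per day
# and the AI output rate are hypothetical assumptions for illustration.
LOC_PER_SESSION = 400      # upper end of the effective review size
SESSIONS_PER_DAY = 3       # hypothetical: few reviewers sustain more

daily_review_capacity = LOC_PER_SESSION * SESSIONS_PER_DAY   # 1200 LOC/day

lines_per_ai_change = 186  # the API-endpoint example from above
changes_per_day = 15       # hypothetical agent/developer throughput

daily_ai_output = lines_per_ai_change * changes_per_day      # 2790 LOC/day

print(f"review capacity: {daily_review_capacity} LOC/day")
print(f"AI output:       {daily_ai_output} LOC/day")
print(f"unreviewed:      {daily_ai_output - daily_review_capacity} LOC/day")
```

Even with generous assumptions on the review side and modest ones on the generation side, the output exceeds the reviewable ceiling, and the gap is code that ships with less scrutiny than the process assumes.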
Sixty-four percent of teams report that verifying AI code takes as long as or longer than writing it from scratch.[4] Ninety-six percent of developers don't fully trust AI-generated code's functional accuracy.[14] Only 48% always verify before committing. The rest are shipping code they haven't fully reviewed because reviewing all of it is physically impossible.
The 2024 DORA report quantified the result: despite 75.9% of teams adopting AI, a 25% increase in AI adoption was associated with a 1.5% dip in delivery throughput and a 7.2% drop in delivery stability.[15] More AI, more output, less reliability. The machines are writing faster. The humans are drowning in the output.
And the industry's response? More AI — this time to review the AI. Cursor acquired Graphite, a code review platform, for over $290 million in December 2025.[16] OpenAI is reportedly pushing internally for code reviewed entirely by agents, not humans.[9] The solution to AI-generated code that humans can't review fast enough is to remove the humans from review entirely.
Which brings us back to Amazon. They tried that. Twenty-one thousand AI agents deployed across the Stores division. 4.5x developer velocity. $2 billion in claimed savings.[1] Then Kiro was given operator permissions to fix a "small" issue in AWS Cost Explorer and autonomously deleted and recreated the entire environment, causing a thirteen-hour outage. Then Amazon Q generated incorrect delivery times, losing 120,000 orders. Then a six-hour shopping blackout cost 6.3 million orders.
Amazon's internal briefing referenced a "trend of incidents" with "high blast radius" and "Gen-AI assisted changes" — a reference that was deleted from the document before the all-hands meeting.[1] Their public statement: "only one of the incidents involved AI, and the cause was unrelated to AI."
The fix was mandatory two-person code review for all changes and senior engineer sign-off for AI-assisted production changes across 335 Tier-1 systems. The slowest human checkpoint, reintroduced at the end of the fastest automated pipeline.
Because when you remove the human reviewer and something goes wrong, the only fix is to put the human reviewer back. And now they have even more to review.
The Trap
MetLife's March 2026 survey of over 5,000 HR decision-makers and employees found that 67% of employers say AI is creating "new points of friction and mistrust" — even as 83% say it helps employees work faster. Sixty-one percent of employees worry about AI's ethical and safety risks, up 5 points year-over-year. Twenty-four percent feel they must compete with AI at work.[17]
The friction is structural, not cultural. AI made production nearly free. It did not make judgment nearly free. Judgment is still scarce, still expensive, still human, and still limited to about sixty minutes of deep focus before it starts to degrade.
The reviewer's trap is this: AI was supposed to free senior engineers from the mundane work of writing code so they could focus on architecture, design, and mentorship — the high-value work that only experienced humans can do. Instead, it turned them into full-time judges of machine output. They review more code than they've ever reviewed. They write less code than they've ever written. They spend their days not building systems but scanning for the subtle ways an AI's "structurally competent" code is wrong.
And when they burn out — or leave, or get laid off in the next round of AI-motivated headcount reductions — there's nobody to replace them. Because the juniors who would have become the next generation of senior reviewers never learned to code without AI in the first place, and the company that would have trained them decided it didn't need as many engineers anymore.
The machines write. The humans worry. And the worry doesn't scale.
Disclosure
This article was written with the assistance of Claude, an AI made by Anthropic. The irony of an AI helping write an article about the burden AI places on human reviewers is not lost on us. We reviewed every line ourselves, which — appropriately — took longer than the drafting. Corrections and reader perspectives welcome at bustah_oa@sloppish.com.
Citations
1. Amazon Kiro mandate and outage timeline compiled from multiple sources: Medium (Heinan Cabouly), The Register, Awesome Agents, Fortune, TechRadar. Note: Amazon's official position is that "only one of the incidents involved AI"; internal documents referenced here were reported by multiple outlets.
2. Faros AI, "AI Productivity Paradox" research report. Telemetry from 1,255 teams and 10,000+ developers. Link.
3. LinearB, 2026 Software Engineering Benchmarks Report. Analysis of 8.1 million PRs across 4,800 engineering teams in 42 countries. Link.
4. ByteIota, "AI Code Review Bottleneck Kills 40% of Productivity." Link.
5. LogRocket, "Why AI coding tools shift the real bottleneck to review." Link.
6. Kate Niederhoffer, Alexi Robichaux, Jeffrey T. Hancock, "AI-Generated 'Workslop' Is Destroying Productivity," HBR, September 2025; and "Why People Create AI 'Workslop' — and How to Stop It," HBR, January 2026. Part 1 | Part 2.
7. Illya Yalovoy, "The Senior Engineer's Job in 2026 Is Code Review, Not Code Writing," Medium, March 2026. Link.
8. Addy Osmani, "Code Review in the Age of AI." Link.
9. Scott Berrevoets, "Review Your Own AI-Generated Code," March 20, 2026. Link.
10. BCG, "When Using AI Leads to 'Brain Fry,'" Harvard Business Review, March 2026. Study of 1,488 US workers. HBR. Also covered by Fortune and CNN.
11. Lisanne Bainbridge, "Ironies of Automation," Automatica, 19(6), 1983, pp. 775-779.
12. Norman Mackworth, "The breakdown of vigilance during prolonged visual search," Quarterly Journal of Experimental Psychology, 1(1), 1948, pp. 6-21. Laboratory simulation of radar monitoring conditions.
13. SmartBear/Cisco code review case study. 2,500 reviews, 3.2 million lines of code. PDF.
14. Sonar, developer trust survey, January 2026. 96% do not fully trust AI-generated code's functional accuracy.
15. DORA, State of AI-Assisted Software Development 2024/2025. Link.
16. Cursor's acquisition of Graphite (December 2025, $290M+) reported across multiple outlets.
17. MetLife 2026 Employee Benefit Trends Study. Three surveys: 2,480 HR decision-makers + 2,541 employees (Oct 2025) + 2,550 employees (Jan 2026). CNBC coverage | MetLife press release.