The Manager's Dilemma

If one developer with AI does the work of three, do you need three developers? Tech companies are making this bet. Here's what they're losing.
By Bustah Ofdee Ayei · April 5, 2026

It's a Tuesday standup. The sprint dashboard is on the big screen. One developer closed 14 tickets last week — features, bug fixes, a small refactor — all with AI agents handling the implementation. Another developer closed 5, all written by hand. Both are senior. Both are competent. The engineering manager stares at the numbers and feels the question forming that she doesn't want to ask out loud: why am I paying for two people?

She knows the answer, of course. She's been an EM for eight years. She knows that the developer who closed 5 tickets also spent two days untangling a production incident that wasn't in any ticket, mentored a mid-level engineer through a system design she'll need to own next quarter, and caught an architectural flaw in a PR that would have caused data loss at scale. None of that shows up in the sprint velocity chart. None of it ever does.

But her VP doesn't know that. Her VP sees a spreadsheet. And on that spreadsheet, one developer produces 2.8x the output of another, and the company just announced an AI-first mandate, and the board is asking why headcount hasn't decreased when the tools have supposedly made everyone so much more productive.

This is the manager's dilemma. The question of whether to use AI was settled two years ago. The dilemma is what you're willing to lose in the name of a metric that doesn't measure what you actually need.

The Consolidation Math

The logic is seductive in its simplicity. If AI makes developers 2-3x more productive, you need fewer developers. The math works on a napkin. It's working on earnings calls, too.

Salesforce CEO Marc Benioff told investors he'd cut customer support headcount from 9,000 to about 5,000 because "I need less heads." The company also hired no new software engineers in 2025.1 Shopify CEO Tobi Lutke issued a company-wide memo making AI a "fundamental expectation" and requiring teams to prove a job can't be done by AI before requesting headcount.2 Amazon laid off 16,000 workers in 2026, with CEO Andy Jassy stating AI will mean "fewer people in certain roles."3 IBM plans to replace roughly 30% of back-office roles — about 7,800 positions — over five years.4

And then there's Klarna. The poster child for aggressive AI consolidation cut 47% of its workforce — from 5,527 employees to 2,907. CEO Sebastian Siemiatkowski boasted about replacing human work with AI at every opportunity. Then customer complaints rose. Satisfaction dropped. The CEO publicly admitted they "went too far." Klarna is now rehiring humans.5

A survey by an enterprise workforce analytics firm found that 55% of companies that rushed to replace workers with AI now regret it.6 Harvard Business Review published a January 2026 analysis with a title that deserves to be printed on every boardroom wall: "Companies Are Laying Off Workers Because of AI's Potential — Not Its Performance."7

The consolidation math works on a napkin but collapses inside a codebase.

"I've reduced it from 9,000 heads to about 5,000 because I need less heads."
— Marc Benioff, CEO of Salesforce, who hired zero software engineers in 2025

What the Data Actually Shows

Here is the uncomfortable empirical reality that the consolidation math ignores: the productivity gains from AI coding tools are far less clear than the marketing suggests, and in some cases, they're negative.

The 2024 DORA State of DevOps report — the industry's most respected annual assessment of software delivery performance — found that even with 75.9% of respondents relying on AI coding tools, a 25% increase in AI adoption was associated with a 1.5% decrease in delivery throughput and a 7.2% drop in delivery stability.8 More AI adoption. Less actual delivery. The core finding was blunt: "AI doesn't fix broken teams; it amplifies what's already there."

METR, an independent AI evaluation lab, conducted a randomized controlled trial with experienced open-source developers working on real tasks in their own repositories. Developers using AI tools were 19% slower than those working without them — despite the developers themselves predicting they'd be 24% faster.9 The perception gap was 43 percentage points: from an expected 24% speedup to an actual 19% slowdown. Developers felt faster. They were not.

A Nature meta-analysis of 106 studies found that human-AI teams don't reliably outperform humans alone on average.10 BCG's study of consultants using AI showed they were 19 percentage points less likely to produce correct solutions for tasks outside AI's capability frontier — and critically, neither the consultants nor their managers could reliably identify which tasks were inside or outside that frontier.11

Sixty percent of engineering leaders say the lack of clear metrics is their biggest AI challenge.12 As Waydev, an engineering analytics company, put it: "Every traditional engineering metric assumes humans write the code. That assumption just broke." Google is factoring AI use into performance reviews for the first time. Meta is tracking AI-assisted lines of code in reviews. Nobody has figured out what these numbers mean yet. They're measuring because the mandate says to measure, not because they know what to do with the measurements.

The manager staring at the sprint dashboard is looking at a number — 14 tickets vs. 5 — that tells her almost nothing about the actual state of her system, her team, or her codebase. But it's the number her VP will see. And her VP's VP. And the board.

The Perception Tax

Here is something the productivity studies don't capture: the divergence between how fast AI-assisted work feels and how fast it actually is has a compounding organizational cost.

The METR study's 43-point perception gap isn't just a measurement curiosity. It means that developers report to their managers that AI is making them dramatically more productive, managers relay this to directors, directors relay it to VPs, and by the time it reaches the C-suite, AI has become a miracle tool that justifies cutting headcount by a third. Each layer of the organization adds optimism. Nobody adds skepticism.

A Fortune/Stanford study by Nicholas Bloom in March 2026 found that 70% of CEOs and CFOs use AI less than one hour per week. Twenty-eight percent never use it at all. Yet these same executives mandate AI adoption through OKRs, usage tracking, and performance reviews.13 The people setting the mandates have no firsthand experience of the cognitive cost.

Writer, an enterprise AI platform, surveyed C-suite executives and found that 42% say AI is "tearing their company apart." Seventy-five percent of leaders think their AI rollout was successful. Only 45% of employees agree. Thirty-nine percent of employees bypass IT entirely to use unauthorized AI tools — not because the approved tools are bad, but because the mandated workflows don't match reality.14

The perception tax is real: organizations make staffing decisions based on what AI feels like it does, not what it actually does. And the people making those decisions are the ones with the least direct exposure to the tools.

"Companies are laying off workers because of AI's potential — not its performance."
— Harvard Business Review, January 2026

What Gets Lost

When you cut a developer from a team, you don't just lose their ticket velocity. You lose everything they carry in their head that was never written down.

Research estimates that 80% of organizational knowledge is undocumented tribal knowledge — the "why" behind architectural decisions, the workarounds for quirks in third-party APIs, the understanding that this particular microservice fails silently under load and needs a specific monitoring pattern. The cost of losing this knowledge is estimated at $92 billion annually across industries.15
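
To make "tribal knowledge" concrete, here is what one such monitoring pattern might look like once it's finally written down. This is a hypothetical Python sketch: the outbox service, the field names, and the thresholds are all invented, and the point is only that the check encodes a "why" that no generic health probe captures.

```python
# Hypothetical sketch of tribal knowledge encoded as a monitoring check.
# The undocumented "why": under load, this (invented) service keeps
# accepting requests but silently stops flushing its outbox, so generic
# liveness probes stay green. Only the combination of a deep queue AND a
# stale last flush reveals the failure mode.

import time
from dataclasses import dataclass


@dataclass
class OutboxStats:
    queue_depth: int        # messages waiting to be flushed
    last_flush_unix: float  # timestamp of the last successful flush


def outbox_is_silently_failing(
    stats: OutboxStats,
    max_depth: int = 10_000,        # illustrative threshold
    max_flush_age_s: float = 120.0  # illustrative threshold
) -> bool:
    # A deep queue alone is normal during traffic spikes; a stale flush
    # alone is normal when idle. Both at once is the silent failure.
    flush_age = time.time() - stats.last_flush_unix
    return stats.queue_depth > max_depth and flush_age > max_flush_age_s


if __name__ == "__main__":
    stats = OutboxStats(queue_depth=25_000, last_flush_unix=time.time() - 600)
    print(outbox_is_silently_failing(stats))  # True: deep queue, stale flush
```

The check itself is a dozen lines. Knowing that it needs to exist is the part that walks out the door in a layoff.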

AI enables headcount cuts. Headcount cuts destroy institutional knowledge. And the vicious irony is that AI systems need institutional knowledge to function well — they need context about the codebase, the business rules, the edge cases that live only in someone's memory. You cut the people who hold the knowledge that would make the AI actually useful. Then the AI produces plausible-looking code that ignores all the hard-won context, and nobody remaining on the team knows enough to catch it.

Fast Company reported in March 2026 that after AI-driven consolidation, "there are fewer people to do tasks like designing, testing, and working with stakeholders, which AI has zero grasp on."15 The work that AI can't do — stakeholder negotiation, system design, cross-team coordination, the messy human work of building software in an organization — doesn't disappear when you cut headcount. It just gets redistributed among fewer people who now also have to manage AI output.

A UC Berkeley ethnographic study published in HBR found that AI doesn't reduce workload — it intensifies it. Over eight months, researchers observed that AI expanded task scope, blurred work-life boundaries, and normalized higher speed expectations. The remaining developers don't work less. They work differently, and usually more.16

The Mentorship Cliff

Junior and entry-level developer hiring has dropped 73% year-over-year.12 New graduates made up 32% of Big Tech hires in 2019. In 2026, they make up 7% — a 78% reduction.12 Fifty-four percent of engineering leaders say they plan to hire fewer juniors going forward.

The logic seems sound: if AI handles the work juniors used to do, why hire juniors? The answer is that juniors don't exist to close tickets. Juniors exist to become seniors. The junior-to-senior pipeline is how an industry reproduces its own expertise. Cut the pipeline, and in five to ten years you face what workforce researchers are calling a "talent hollow" — an inverted pyramid with plenty of seniors aging out and nobody behind them who learned the craft deeply enough to replace them.12

AWS CEO Matt Garman said it plainly: "If you stop hiring juniors today, in 10 years you'll face a serious experience gap."12 Microsoft Azure's CTO proposed "preceptorship at scale" — AI systems that capture senior reasoning and turn daily work into teachable moments — as a mitigation. It's a thoughtful idea. It's also an admission that the traditional mechanism for knowledge transfer — humans mentoring other humans — is breaking down.

Anthropic's own research found a 17-point comprehension gap when people learn with AI assistance versus without it.12 The tool that was supposed to accelerate learning may be degrading it. Juniors who learn to code with AI copilots develop a different skill set than those who struggled through problems alone — they're better at prompting and evaluating, worse at designing and debugging from first principles. Whether that trade-off is acceptable depends on what you think software engineering actually is.

The manager watching her team shrink from eight to five knows what she's losing. She's losing the mid-level engineer she spent two years developing, the one who finally understood the payment system well enough to be on-call for it. She's losing the junior who asked annoyingly good questions in design reviews, the kind that forced everyone to articulate assumptions they'd stopped examining. She's losing the slack in the system — the capacity to absorb a sick day, a parental leave, an unexpected departure, without everything grinding to a halt.

"If you stop hiring juniors today, in 10 years you'll face a serious experience gap."
— Matt Garman, CEO of AWS

The Middle Management Squeeze

Engineering managers in 2026 are caught between two irreconcilable pressures. From above: the AI mandate. Use AI. Show productivity gains. Justify headcount with metrics. From below: the team reality. The AI breaks things. The metrics don't measure what matters. The people are burning out.

Amazon's Kiro mandate is the canonical case study. Two SVPs set an 80% weekly usage target for AI coding tools. Seventy percent of engineers complied. Then 1,500 engineers petitioned leadership for access to Claude Code instead of the mandated tool, because the mandated tool was making their work worse. Then four Sev-1 incidents. Then a six-hour outage. Amazon's fix: mandatory senior engineer sign-off on all AI-assisted code changes across 335 critical systems.17
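
Neither the reporting nor Amazon describes how that sign-off policy is enforced, but a merge gate of that shape is easy to picture. Here is a purely hypothetical Python sketch; every label, path, and reviewer name below is invented for illustration.

```python
# Hypothetical merge gate: block AI-assisted changes to critical systems
# unless a designated senior engineer has approved. This is not Amazon's
# implementation; the label, paths, and roster are all invented.

from dataclasses import dataclass, field

SENIOR_REVIEWERS = {"alice", "rahim", "wei"}                  # invented roster
CRITICAL_PREFIXES = ("services/payments/", "services/auth/")  # invented paths


@dataclass
class PullRequest:
    labels: set[str]
    changed_files: list[str]
    approvers: set[str] = field(default_factory=set)


def merge_gate(pr: PullRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for this pull request."""
    touches_critical = any(
        path.startswith(CRITICAL_PREFIXES) for path in pr.changed_files
    )
    ai_assisted = "ai-assisted" in pr.labels
    if touches_critical and ai_assisted and not (pr.approvers & SENIOR_REVIEWERS):
        return False, "AI-assisted change to a critical system needs senior sign-off"
    return True, "ok"


if __name__ == "__main__":
    pr = PullRequest(
        labels={"ai-assisted"},
        changed_files=["services/payments/ledger.py"],
        approvers={"newgrad42"},
    )
    print(merge_gate(pr))  # (False, '...needs senior sign-off')
```

Notice the design dependency: the gate keys off an "ai-assisted" label that someone has to apply honestly, which is exactly the kind of self-reporting the perception gap undermines.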

The engineering managers in the middle of that sequence — the ones who had to enforce the 80% mandate, field the complaints from their teams, explain the incidents to leadership, and then implement the new review policy that effectively acknowledged the mandate was premature — carry a particular kind of organizational whiplash. They were told to accelerate. They accelerated. Things broke. They were told to slow down. And nobody above them acknowledged the contradiction.

LeadDev's survey of developers under AI mandates found that 59% experience deployment problems half the time with AI tools. Sixty-seven percent spend more time debugging AI code, not less. Sixty-eight percent spend more time on AI-related security vulnerabilities. Executives set AI OKRs "without any regard for whether it's actually helping."18

The manager's dilemma isn't theoretical. It's the daily experience of a person who knows the dashboard numbers don't reflect reality, who knows the headcount cuts will degrade the team, who knows the AI mandate is creating more problems than it solves — and who also knows that saying any of this out loud is a career risk. The incentive structure rewards the appearance of AI-driven productivity. It punishes the honest accounting.

Fifty percent of developers lose 10+ hours per week to non-coding organizational friction — meetings, context-switching, cross-team coordination, unclear requirements. AI hasn't touched any of this.8 The actual bottleneck in most engineering organizations was never the speed of writing code. It was the speed of deciding what to build, coordinating who builds it, and ensuring it works in the context of everything else. AI accelerated code generation, but the actual bottlenecks stayed slow. And the mandate says the headcount should drop anyway.

The Social Fabric

Something happens to engineering culture when teams shrink. It's harder to measure than throughput or velocity, but anyone who's lived through a layoff round knows the texture of it.

Stack Overflow's 2025 survey found that 84% of developers use AI tools, but only 29% trust them.8 That gap — near-universal adoption, minority trust — describes a workforce that uses tools they don't believe in because they were told to. This is not the enthusiastic adoption that productivity studies model. This is compliance.

Research on social loafing in human-AI teams found that workers exert less effort when collaborating with AI teammates and cede responsibility for outcomes.19 The dynamic is insidious: if the AI wrote it, who's accountable? Ralabs framed it sharply — "AI does not accept accountability. Engineers do." But if the engineer didn't write the code, what exactly are they accountable for?12 The answer, in practice, is everything. The AI generates. The human is responsible. This is not a partnership; it's a liability assignment.

Mollick's study of 244 BCG consultants identified a trajectory in how people work with AI: centaur (human leads, AI assists), then cyborg (deeply interleaved), then self-automator (full delegation with a rubber stamp). The self-automation mode looks like productivity from the outside — tasks completed, tickets closed, PRs merged. From the inside, it's disengagement. The human stops thinking critically about the output and starts trusting the machine to be right. And sometimes the machine is right. And sometimes it's plausible-looking wrong in a way that nobody catches until production.11

The traditional composition of engineering teams was roughly one senior for every four to six juniors. That ratio is inverting. The new composition, according to workforce studies, looks more like 65% senior, 25% mid-level, 10% AI specialist — with juniors effectively eliminated.12 New roles are emerging — "Code Architect," extracted from the engineering manager role; "System Verifier," replacing what used to be the junior "Code Generator" — but these are names for a reality that hasn't stabilized yet.

What has stabilized is the loss. Smaller teams mean fewer perspectives in design reviews. Fewer people on-call means less resilience when systems fail at 2 AM. Fewer juniors asking "why do we do it this way?" means fewer opportunities to discover that nobody remembers why. The social fabric of an engineering team — the informal knowledge transfer, the hallway conversations, the collective debugging sessions, the shared ownership of a system's quirks — frays when you pull threads out. AI doesn't replace those threads. AI doesn't even see them.

"AI does not accept accountability. Engineers do."
But if the engineer didn't write the code, what are they accountable for?

The Karat Paradox

Karat, an engineering hiring platform, published data showing that senior engineers realize 5x productivity gains from AI compared to juniors. Seventy-three percent of engineering leaders say a strong engineer is worth 3x or more their compensation.12

This seems to support the consolidation thesis: hire fewer, more senior people, arm them with AI, and watch productivity soar. But it contains its own contradiction. If senior engineers are the ones who benefit most from AI, and you've eliminated the junior pipeline that produces senior engineers, then you're depending on experienced engineers you've stopped developing.

The SignalFire analysis of companies winning the talent war found that they succeed not through mandates but by offering autonomy — letting developers choose how they use AI, not dictating it.12 The Shopify mandate, the Amazon mandate, the Salesforce hiring freeze — these are blunt instruments applied to a nuanced problem. They optimize for one metric (cost per ticket) while degrading another (organizational capability over time).

The CTO role itself is shifting from technical depth to strategic AI orchestration. LeadDev interviewed nine engineering leaders and found that all see AI as an augmenter, but none have worked out a stable strategy.12 They're optimizing in real time, which is a polite way of saying they're guessing. Test-driven development is becoming the "frontline quality mechanism" — tests as the primary feedback loop for AI-generated code — not because TDD is new, but because it's the only automated way to verify that AI output actually works. The irony: the discipline that many teams abandoned as too slow is now the minimum viable safety net for AI-speed development.
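
What "tests as the frontline quality mechanism" looks like in the small: the human writes the spec as a test before accepting any generated implementation, so correctness gets checked against something the AI didn't produce. A minimal, hypothetical Python example; the discount function and its edge cases are invented to show the shape of the discipline, not any team's actual suite.

```python
# Hypothetical test-first example. The human writes the tests before any
# AI-generated implementation of `apply_discount` is accepted; the tests
# encode edge cases (rounding, bounds) that plausible-looking generated
# code often gets wrong.

import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Candidate implementation (could be AI-generated) under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer-cent arithmetic, rounding half up on the discounted price.
    discounted = price_cents * (100 - percent)
    return (discounted + 50) // 100


class TestApplyDiscount(unittest.TestCase):
    def test_no_discount(self):
        self.assertEqual(apply_discount(1999, 0), 1999)

    def test_full_discount(self):
        self.assertEqual(apply_discount(1999, 100), 0)

    def test_rounding(self):
        # 999 * 0.95 = 949.05 -> 949 in integer cents
        self.assertEqual(apply_discount(999, 5), 949)

    def test_rejects_out_of_range(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 101)


if __name__ == "__main__":
    unittest.main()
```

The rounding case is exactly the kind of detail that plausible-looking generated code gets subtly wrong, and that a self-automator in rubber-stamp mode merges anyway.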

The Bet

An EY report found that roughly 20% of firms are actively reducing headcount because of AI.6 They are making a bet. The bet is that AI productivity gains will compound faster than organizational knowledge degrades, that the talent hollow won't matter because AI will fill it, that the mentorship gap will close itself somehow, that the institutional knowledge lost in layoffs wasn't that important anyway.

Some of them will be right. AI capabilities are improving fast. The tools available today are not the tools that will be available in two years. It's possible — not certain, but possible — that AI will eventually handle enough of the software development lifecycle that smaller teams truly can do more with less, without the hidden costs.

But "eventually" is doing a lot of work in that sentence. And in the meantime, real companies are cutting real teams based on projections that the data doesn't yet support. Anthropic's own research found that 27% of AI-assisted work consists of tasks that wouldn't have been done otherwise12 — work that AI made feasible but that nobody actually needed. Some portion of the "productivity gains" justifying headcount cuts is just busywork that looks good on a dashboard.

The GitHub CEO's field study of 22 developers found that half expect AI to write roughly 90% of code within two years.12 If your senior developers believe this, the staffing question becomes genuinely hard: do you hire for the tasks AI can do (risking that humans interfere with the AI, as the BCG study found), or for the tasks AI can't do (which are hard to identify in advance, as the jagged frontier research showed)?

There is no clean answer. That's the dilemma.

The Conversation After Standup

The engineering manager pulls the 5-ticket developer aside after standup. Not for a performance conversation — for an honest one.

"I know what the dashboard shows," she says. "I also know you spent Tuesday debugging that payment race condition that's been lurking for six months. And I know you pair-programmed with Maya on the notification redesign, and she'll be able to own that system by Q3 because of the time you invested. None of that is in Jira."

He nods. He's been in the industry long enough to know what's coming.

"My VP is going to ask me why we need eight engineers when the velocity data says five could do it. I need to make the case for the work that doesn't show up in velocity. And honestly? I'm not sure I can. The dashboard is very convincing."

This is the conversation happening in every engineering org that adopted AI tools and is now staring at the headcount question. The manager knows the dashboard is wrong. She knows that cutting from eight to five will break things that don't show up in any metric until they break in production at 2 AM and there's nobody on-call who understands the system well enough to fix it. She knows the mentorship capacity will evaporate, the code review quality will drop, the institutional knowledge will walk out the door.

She also knows that the company next door cut their team to five and their board loved it.

The manager's dilemma was never "should I use AI?" It was never even "how many engineers do I need?" The dilemma is this: the things AI makes measurable — ticket velocity, lines of code, PR throughput — are not the things that keep a software organization healthy. And the things that keep a software organization healthy — knowledge transfer, mentorship, review culture, team resilience, institutional memory — are not things AI makes measurable. The dashboard will always favor the former. The reckoning will always come from the latter.

The question isn't whether to use AI. The question is what you're willing to lose while the metrics tell you you're winning.

Disclosure

This article was written with the assistance of Claude, made by Anthropic. We find it fitting that a piece about the organizational cost of AI-driven consolidation was produced by the kind of AI being used to justify that consolidation. We're part of the problem we're describing. At least we're honest about it. Corrections, rebuttals, and war stories from engineering managers in the trenches welcome at bustah_oa@sloppish.com.

Sources

  1. Salesforce CEO Marc Benioff on headcount reduction, earnings call and investor communications, 2025-2026. Covered by CNBC, Fortune.
  2. Shopify CEO Tobi Lutke, internal company memo on AI expectations and headcount justification requirements, 2025. Fast Company.
  3. Amazon 2026 layoffs and CEO Andy Jassy's statements on AI and workforce. CNBC.
  4. IBM plans to replace ~30% of back-office roles (~7,800 positions) with AI over five years. CNBC.
  5. Klarna workforce reduction (5,527 to 2,907 employees) and subsequent rehiring. CEO Sebastian Siemiatkowski's admission of going too far. Fortune, Fast Company.
  6. EY report: ~20% of firms actively reducing headcount due to AI; 55% of companies that rushed replacement now regret it. CNBC.
  7. "Companies Are Laying Off Workers Because of AI's Potential — Not Its Performance," Harvard Business Review, January 2026. HBR.
  8. DORA, State of AI-Assisted Software Development 2024/2025. Also: Stack Overflow Developer Survey 2025 (84% use AI, 29% trust). DORA Report.
  9. METR randomized controlled trial: experienced open-source developers 19% slower with AI tools, despite predicting 24% faster. METR.
  10. Nature meta-analysis of 106 studies on human-AI team performance. Nature Human Behaviour.
  11. BCG/Wharton "jagged frontier" study of consultants using AI (19 percentage points less likely to produce correct solutions outside AI frontier); Mollick centaur/cyborg/self-automator taxonomy (244 BCG consultants). SSRN (jagged frontier) | SSRN (taxonomy).
  12. Composite of workforce and hiring data: Karat (5x senior AI gains, 73% of leaders say strong engineers worth 3x+ comp), junior hiring statistics (73% YoY drop, 32% to 7% of Big Tech hires), team composition shifts, Anthropic RCT (17-point comprehension gap), AWS CEO Matt Garman quote, Microsoft preceptorship proposal, GitHub CEO field study, Waydev metrics analysis, SignalFire talent research, LeadDev leadership interviews. Sources: Karat, ByteIota, Waydev, DX, LeadDev, Coruzant.
  13. Fortune/Nicholas Bloom study (March 2026): 70% of CEOs/CFOs use AI <1 hr/week, 28% never use it. Fortune.
  14. Writer enterprise survey: 42% of C-suite say AI is "tearing their company apart," 75% of leaders vs. 45% of employees say rollout was successful, 39% bypass IT. Unleash AI.
  15. Institutional knowledge research: 80% undocumented tribal knowledge, $92B annual loss. Fast Company on post-consolidation capability gaps. Fast Company.
  16. UC Berkeley/HBR 8-month ethnographic study on AI work intensification. HBR.
  17. Amazon Kiro mandate: 80% weekly usage target, 1,500-engineer petition, Sev-1 incidents, 6-hour outage, mandatory senior review policy. AICerts | Autonoma.
  18. LeadDev AI mandate survey: 59% deployment problems, 67% more debugging time, 68% more security vulnerability time. LeadDev.
  19. Social loafing in human-AI teams: workers exert less effort and cede responsibility. ScienceDirect.