Learned Helplessness in the Age of Agents

The generation that learned to code by asking machines to code for them. What happens when the machine is wrong and you never learned enough to know?
By Bustah Ofdee Ayei · March 30, 2026

In the spring of 2025, Carnegie Mellon's Fundamentals of Programming course — 15-112, one of the most storied intro CS courses in the country — recorded its highest drop rate since the course began. AI models could score 100% on every assignment. Many students used them. Then the exams arrived, closed-book and closed-laptop, and those students discovered they couldn't write a for loop without a prompt. Some described their AI dependence as an addiction. The course responded by shifting weight to in-class quizzes and mandatory recitations. The students responded by dropping out.1

This is not a story about cheating. The students weren't trying to game the system. They were using the tools they'd been told would make them more productive. They prompted, they got code, the code worked, they submitted it. The feedback loop was perfect: ask, receive, submit, pass. The problem is that the loop contained no learning. Every assignment completed by AI was a rep the student's brain never performed. And when the safety net was removed — when they had to write code alone, in a room, with nothing but their own understanding — there was nothing there.

The error message on the screen meant nothing to them. Not because it was cryptic. Because they had never seen one they had to fix themselves.

The Dogs Who Stopped Trying

In 1967, Martin Seligman ran an experiment that would reshape psychology. He placed dogs in harnesses and administered mild electric shocks. One group could press a lever to stop the shocks. The other group could not — the shocks were uncontrollable regardless of what they did. Later, both groups were placed in a box where they could easily escape the shocks by jumping over a low barrier. The dogs who had learned they could control the shocks jumped immediately. The dogs who had experienced uncontrollable shocks lay down and whimpered. They didn't even try to escape.2

Seligman called it learned helplessness. The dogs hadn't lost the physical ability to jump. They had lost the belief that their actions mattered. The mechanism was cognitive, not physical — they had learned that effort was futile, and they generalized that lesson even when the situation changed.

The parallel to AI-dependent coding is not perfect, but it is uncomfortably close. The productive struggle of debugging — that agonizing hour staring at a stack trace, the moment when you finally understand why your async callback fires before your data loads — is where programming knowledge lives. It is effortful, it is frustrating, and it is the mechanism by which novices become competent. When an AI agent resolves the error in four seconds, the developer never experiences the agency of solving it themselves. They learn, implicitly, that struggle is unnecessary. That the right response to confusion is to ask the machine.
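
If you have never hit that particular wall, here is its shape: a minimal sketch of the callback-before-data bug, with a simulated fetchUser helper (hypothetical, not taken from any source cited here) standing in for a real network call.

```typescript
type User = { name: string };

function fetchUser(id: number): Promise<User> {
  // Simulated network call: resolves after 100 ms.
  return new Promise((resolve) =>
    setTimeout(() => resolve({ name: "Ada" }), 100)
  );
}

let user: User | undefined;

fetchUser(1).then((u) => {
  user = u; // runs later, after the current turn of the event loop
});

// This line runs first, before the callback above has fired:
console.log(user?.name); // undefined

// The fix: don't use the value until the promise resolves.
fetchUser(1).then((u) => console.log(u.name)); // "Ada", 100 ms later
```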

A developer writing under the handle "NotTheCode" described the social dimension: juniors lean on AI because it's safe, fast, and private. "They stop asking humans because asking humans carries a social cost — the fear of looking slow."3 The AI never judges. It never sighs. It never makes you feel stupid for not knowing what a null pointer is. It just gives you the answer. And each answer you don't earn is a neural pathway you don't build.

"If you outsource thinking about architecture to an AI, you outsource your ability to reason about systems — which is basically what senior engineering is."
— Developer after 30 days without AI tools

The Generation Effect, Inverted

In 1978, psychologists Norman Slamecka and Peter Graf demonstrated something that should be obvious but has profound implications for AI-assisted coding: actively generating information produces dramatically better memory and understanding than passively receiving it.4 They called it the generation effect. A meta-analysis of 86 subsequent studies found an average effect size of d = 0.40 — information you generate yourself is remembered about 0.4 standard deviations better than information you merely read.5

AI code generation is the exact inversion of the generation effect. It converts what should be an active generation task — writing code, reasoning through logic, debugging failures — into a passive reading task. You read the AI's output. You maybe skim it. You run it. It works. You move on. The cognitive process that would have encoded understanding into long-term memory never fires.

Anthropic's own research confirmed this in January 2026, in a randomized controlled trial that should be tattooed on the forehead of every bootcamp instructor in the country. Fifty-two mainly junior software engineers were tasked with learning the Trio library, which was new to all of them. Half used AI assistance. Half did it manually. The result: the AI group scored 50% on comprehension quizzes versus 67% for the manual group — roughly two letter grades of difference.6

The speed difference? About two minutes. Not statistically significant.

Read that again. The AI group scored 17 percentage points lower on comprehension and wasn't meaningfully faster. The entire value proposition — "AI makes you more productive" — evaporated in a controlled setting. What remained was a measurable comprehension deficit, purchased for nothing.

But here's the finding that matters most: the pattern of failure. Developers who delegated code generation entirely to the AI scored below 40%. Those who gradually offloaded more tasks as they went — starting manual and progressively leaning on the crutch — showed declining comprehension over time. But developers who used AI to ask conceptual questions — "why does Trio handle cancellation this way?" rather than "write me a Trio function that does X" — scored 65% and above, nearly matching the manual group.6

The way almost everyone uses it is destructive.

The Debugging Gap

The Anthropic study found that the largest comprehension gap was specifically on debugging questions.6 This is the most important detail in the entire study and it has received the least attention.

Debugging is not a discrete skill. It is the skill — the meta-skill that subsumes reading error messages, forming hypotheses about system behavior, testing those hypotheses, understanding state and control flow, and reasoning about why code does what it does rather than what you intended it to do. Every senior engineer will tell you that the overwhelming majority of their expertise was built not by writing code that worked, but by fixing code that didn't.

A developer on DEV Community described finding a junior with an async JavaScript bug — "anyone who's written async JavaScript would spot it immediately, but they'd never written async JavaScript. They'd only ever prompted for it."7 The code looked right. It ran, mostly. But when it broke — as async code inevitably does — the junior had no mental model of event loops, no understanding of callback timing, no intuition for the category of bug they were looking at. The error message was in English. It might as well have been in Sumerian.
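
The post doesn't reproduce the junior's actual code, so the sketch below is a hypothetical of the category: the classic forEach-with-async mistake, which silently refuses to wait for any of the work it kicks off.

```typescript
async function saveItem(item: string): Promise<void> {
  await new Promise((r) => setTimeout(r, 50)); // simulated write
  console.log(`saved ${item}`);
}

// The bug: forEach ignores the promises its async callback returns,
// so nothing here is actually awaited.
async function saveAllBroken(items: string[]): Promise<void> {
  items.forEach(async (item) => {
    await saveItem(item);
  });
  console.log("done"); // logs immediately, before any save completes
}

// The fix: a loop that actually awaits each write.
async function saveAllFixed(items: string[]): Promise<void> {
  for (const item of items) {
    await saveItem(item);
  }
  console.log("done"); // logs after every save completes
}

saveAllBroken(["a", "b"]); // prints: done, saved a, saved b
```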

SoftwareSeni posed the question that should keep engineering managers awake at night: "Who becomes your senior engineers in five years if today's juniors never develop foundational debugging skills?"8

The answer, increasingly, is: nobody. A Stanford Digital Economy study found that employment for software developers aged 22-25 declined nearly 20% from its late 2022 peak.9 Companies are hiring fewer juniors because AI handles the work juniors used to do. But juniors are where seniors come from. The pipeline that produces the experienced engineers needed to supervise AI output is being dismantled at the intake valve.

"Every junior dev I talk to has Copilot or Claude running 24/7. The code often works. Many juniors can't confidently explain how or why."
— DEV Community

The Calculator Fallacy

The reflexive defense is always the same: "They said the same thing about calculators." In 1975, 72% of teachers opposed giving seventh-graders calculators. The fears proved largely unfounded — calculator use maintained or modestly improved mathematical learning over time.10 So the argument goes: AI is just the next calculator. Relax. Adapt. Move on.

This analogy is wrong, and the reasons it's wrong are instructive.

First, calculators are deterministic. Press 7 times 8, get 56. Every time. There is no hallucination. There is no confident-sounding wrong answer. The output is verifiable by anyone who understands what multiplication means. AI output is probabilistic, often wrong, and requires the very expertise it's supposed to replace in order to verify.

Second, scope. Calculators offload arithmetic — a mechanical subprocess of mathematical reasoning. You still have to understand the problem, set up the equation, interpret the result. AI offloads reasoning itself: problem decomposition, architectural design, debugging strategy, the entire chain of thought from "what am I trying to build" to "here is working code." The calculator automated a subprocess; AI automates the process.

Third, the data. A 2024 study of 666 participants found a significant negative correlation between frequent AI tool usage and critical thinking abilities (r = -0.68).11 For non-statisticians: that is a strong negative correlation. People who used AI tools more frequently demonstrated measurably worse critical thinking. A meta-analysis of over 60 studies found that critical thinking declined 10-15% over 30 years with increasing technology dependence.11

Calculators didn't quietly erode learners' capacity to think. There is mounting evidence that LLMs do.

But perhaps the most devastating counterpoint to the calculator analogy comes from a different domain entirely.

The GPS Problem

In 2020, Louisa Dahmani and Veronique Bohbot at McGill University published a study in Scientific Reports that should be required reading for anyone who thinks AI-assisted coding is harmless. They measured GPS use and spatial memory in a longitudinal study over three years and found that greater lifetime GPS use predicted worse spatial memory — and crucially, the longitudinal follow-up confirmed the relationship was causal, not merely correlational. GPS use didn't just correlate with poor navigation. It caused it.12

The mechanism is neurological. GPS disengages the hippocampus — the brain region responsible for building cognitive maps. When you navigate by following turn-by-turn directions, you process each instruction as an isolated command. When you navigate by building a mental map, your hippocampus encodes spatial relationships, landmarks, and routes into a persistent internal model. The GPS user arrives at the destination. The mental-map user understands the territory.

Siddhant Khare described the programming parallel precisely: "Before GPS, you built mental maps. After years of GPS, you can't navigate without it."13 Before AI coding tools, you built mental models of systems — understanding how components connected, where state lived, why errors propagated the way they did. After years of AI coding tools, you have code that works and no map of the territory it occupies.

The Sparrow, Liu, and Wegner study published in Science in 2011 found the same pattern with information more broadly: people who expected to have access to information later showed lower recall of the information itself. The internet functions as "transactive memory" — an external store that the brain offloads to.14 A Kaspersky survey found that 34% of respondents said their phone "was their memory." Forty percent couldn't remember their own children's phone numbers.

Now scale that to code. JetBrains reported that 41% of all code written in 2025 was AI-generated.15 Junior developers entering the workforce today are inheriting codebases where nearly half the code was written by machines. The "struggle" that builds expertise isn't just being bypassed by the individual developer — it's disappearing from the professional environment itself.

Autopilot Disengaged

On June 1, 2009, Air France Flight 447 was cruising over the Atlantic when ice crystals blocked the plane's pitot tubes, causing the autopilot to disengage. The airframe and engines were fine; the aircraft simply needed to be flown manually — something pilots are trained to do, in theory. But the crew had spent so many hours monitoring automation that their manual flying skills had atrophied. They failed to recognize a stall. For four minutes and twenty-four seconds, they held the nose up while the plane fell 38,000 feet into the ocean. Two hundred and twenty-eight people died.16

In 2013, the pilot of Asiana Flight 214 crashed on approach to San Francisco because he had never flown the approach without the glideslope automation in a real 777. The airline explicitly encouraged maximum automation use. Three people died.16

The FAA issued formal safety alerts in 2013 and 2017 warning that "continuous use of autoflight systems could lead to degradation of the pilot's ability to quickly recover."16 Over 60% of aviation accidents involve challenges with manual control, often tied to managing automation errors. The pattern is consistent: the automation handles the routine cases perfectly. The human is needed only for the edge cases. But the human's skills have degraded because the routine cases were where they practiced.

Lisanne Bainbridge identified this as the fundamental irony of automation in 1983: the more reliable the automation becomes, the less practiced the human operator, and therefore the worse their performance precisely when they are most needed — when the automation fails.17 Her paper has over 4,700 citations. The industry has read it. The industry has not internalized it.

Medicine is learning the same lesson in real time. A colonoscopy study found that the adenoma detection rate dropped from 28.4% to 22.4% when endoscopists reverted to non-AI-assisted procedures after repeated AI use. Erroneous AI prompts increased false-positive radiology recalls by up to 12%. The researchers identified two dimensions of deskilling: technical (reduced procedural skill) and decision-making (reduced critical thinking). Early-career clinicians were the most vulnerable — they were still building the pattern recognition that experienced practitioners had already consolidated.18

Across every domain — spatial navigation, aviation, medicine, coding — the pattern is identical: automation that removes effortful cognitive processing degrades the underlying skill. The mechanism doesn't care about the domain. The hippocampus doesn't care whether it's disengaged by GPS or by Copilot. The neural pathways that weren't exercised don't distinguish between turns you didn't navigate and bugs you didn't debug.

Automation degrades the very skills needed when it fails. The more reliable it becomes, the worse we perform without it.
— Bainbridge's Irony, 1983

The Dreyfus Stall

In 1980, Hubert and Stuart Dreyfus proposed a model of skill acquisition with five stages: Novice, Advanced Beginner, Competent, Proficient, and Expert.5 The model has been applied to everything from chess to nursing to programming, and its central insight is this: advancing beyond Competent requires letting go of conscious rule-following and developing intuitive pattern recognition through emotional engagement with outcomes.

The key word is emotional. You don't develop intuition for debugging by reading about debugging. You develop it by spending an afternoon furious at a segfault, elated when you find it, and humiliated when you realize it was a one-character typo. The emotional weight of the experience is what encodes it. The Dreyfus model predicts that practitioners who rely on external tools to bypass this engagement will stall at Competence — they will be able to follow rules and apply procedures but will never develop the intuitive grasp that characterizes expertise.

An MIT Media Lab study using EEG monitoring found that LLM users showed the weakest neural connectivity over a four-month period. Self-reported ownership of essays was lowest among LLM users, and they couldn't accurately quote their own work.19 If developers are forming weaker cognitive connections with the code they "write" via AI, the learned helplessness may be neurologically measurable, not just behavioral.

Addy Osmani, a Chrome engineering leader at Google with twelve years of experience, described his own progression of skill atrophy: first he stopped reading documentation. Then his debugging skills declined. Then deep comprehension faded. Then he found himself 10x dependent on the tools.20 If this happens to a veteran with consolidated expertise, the effect on juniors who never built the skills in the first place is not atrophy. It's absence. You cannot atrophy what you never developed.

One developer went cold turkey — thirty days without AI tools. On day one, they reached for "the AI keybind like a phantom limb." After thirty days, they had rebuilt pattern recognition, documentation literacy, and architectural reasoning. Their conclusion: the skills were recoverable, but only with deliberate, uncomfortable effort.3 The Asian Journal of Psychiatry has documented what they're calling GAID — Generative AI Dependence — with clinical withdrawal symptoms including anxiety, irritability, and restlessness when AI access is reduced.21

We are not speaking metaphorically when we say people can't function without their AI tools. We are describing a clinically documented dependency.

The Verification Abyss

Here is where all of these threads converge into something genuinely dangerous.

A junior developer uses AI to generate code. The code works, or appears to. They ship it. But they cannot verify it, because verification requires understanding — understanding the system, understanding the failure modes, understanding what "correct" means in context. Without mental models, they cannot read error messages or evaluate whether the AI's solution is appropriate, efficient, or secure, because evaluation requires exactly the expertise that the AI has been substituting for.

Sarkar's SSRN analysis confirmed this quantitatively: more experienced workers have a 6% higher accept rate for agent-generated code per standard deviation of experience. Juniors aren't just worse at writing code — they're worse at judging AI-written code.22 They accept or reject with less discrimination, which compounds the helplessness problem. They don't just lack the skills to produce. They lack the judgment to evaluate.

Veracode found that 45% of AI-generated code fails security tests. Models have gotten better at generating functional code but have shown no improvement in generating secure code over time.23 GitClear's analysis found that AI-generated code has a 41% higher churn rate — it gets rewritten or deleted faster. Refactoring dropped from 25% of code changes to under 10%. Copy-pasted code rose from 8.3% to 12.3%.24
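
Veracode doesn't publish the failing snippets, but the most common security failure class is no mystery: injection. Below is a hypothetical sketch of the pattern, code that passes every functional test a generator is likely to write and fails the first security scan, next to the parameterized version a scanner accepts. The db.query interface is a stand-in, not any particular library's API.

```typescript
// Hypothetical database interface for illustration only.
interface Db {
  query(sql: string, params?: string[]): Promise<unknown>;
}

// Passes every functional test with normal inputs...
async function getUserInsecure(db: Db, name: string) {
  return db.query(`SELECT * FROM users WHERE name = '${name}'`);
  // ...and hands an attacker the table when name is:  ' OR '1'='1
}

// The version a security scan accepts: input is data, never SQL.
async function getUserSafe(db: Db, name: string) {
  return db.query("SELECT * FROM users WHERE name = ?", [name]);
}
```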

The quality of the code is declining. The ability to detect the decline is also declining. And the people entering the profession have the least ability to detect it of anyone.

Stack Overflow described the terminal stage in January 2026: "A new worst coder has entered the chat: vibe coding without code knowledge."25 MIT's theoretical computer science courses have banned AI entirely. Their applied courses shifted grading from 25% problem sets / 75% exams to 5% problem sets / 95% exams — because problem sets are no longer trustworthy signals of understanding.26 Professor Manolis Kellis put it simply: "Students who care learn more, deeper, faster, better. Students who don't cheat more easily."

The problem is that the line between "using AI efficiently" and "cheating" has become invisible, and the students themselves cannot tell which side they're on.

The Missing Middle

A Harvard study of 62 million workers found that when companies adopt generative AI, junior developer employment drops 9-10% within six quarters.7 Senior developers are still in demand — more than ever, in fact, because someone has to supervise the machines. But the pathway from junior to senior runs through years of writing bad code, debugging it, understanding why it was bad, and slowly building the judgment that makes a senior engineer senior.

If companies remove the juniors and automate their work, they will have no new seniors in five years.

This is not hypothetical. It is arithmetic. The industry is simultaneously demanding more senior judgment (to review AI output) and eliminating the pipeline that produces it (by not hiring and not training juniors). Every company that replaces a junior developer position with an AI agent today is consuming senior expertise it will not be able to replace. The account will not be overdrawn this year, or next year. It will be overdrawn when the current seniors burn out, retire, or leave — and there is nobody behind them.

CodeRabbit's analysis found that AI-generated tests "test their own assumptions rather than intent" — they verify that the code does what the AI thought it should do, not what the system actually requires.27 TheSeniorDev reported that after 150,000 lines of AI-generated code, the AI was building redundant implementations instead of using established patterns in the codebase.28 These are the kinds of errors that only experience can catch. They require knowing the codebase, knowing the domain, knowing what "right" looks like because you've seen enough "wrong."
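
CodeRabbit describes the failure abstractly; a hypothetical sketch makes it concrete. The function below floors where the requirement says round to the nearest cent. A test generated by reading the implementation asserts the floored value and passes. A test derived from the requirement fails, which is the point.

```typescript
// Illustrative function, not from the CodeRabbit report. The spec
// says "round to the nearest cent"; the code floors instead.
function totalWithTax(cents: number, rate: number): number {
  return Math.floor(cents * (1 + rate)); // bug: should be Math.round
}

// A test generated by reading the implementation asserts what the
// code does, so it passes and the bug is enshrined:
console.assert(totalWithTax(999, 0.085) === 1083, "matches the code");

// A test derived from the requirement asserts what the system needs,
// so it fails and surfaces the bug:
console.assert(totalWithTax(999, 0.085) === 1084, "matches the intent");
```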

A junior who has never seen "wrong" — who has only ever seen code that the AI produced and that appeared to work — has no basis for this judgment. They are not incompetent. They are inexperienced. And the tools that were supposed to accelerate their learning have instead prevented it.

"Who becomes your senior engineers in five years if today's juniors never develop foundational debugging skills?"
— SoftwareSeni

The Question That Remains

In 2019, Lee Sedol — the Go grandmaster who in 2016 had been the last human to beat Google's AlphaGo — retired from professional play. His reason: "With the debut of AI in Go games, I've realized that I'm not at the top even if I become the number one through frantic efforts. There is an entity that cannot be defeated."29

Sedol knew what he was losing. He had spent thirty years building the intuition, the pattern recognition, the deep understanding of the game that made him one of the greatest players in history. He could feel the gap between his knowledge and the machine's performance. The loss was real to him because the skill was real to him.

The junior developer who has never debugged a race condition, never traced a memory leak, never spent a weekend understanding why their code works on their machine but not in production — that developer will never feel the loss. You cannot mourn expertise you never had. You cannot miss the satisfaction of solving a problem you never struggled with. The learned helplessness is invisible to the person experiencing it, because they have no baseline. They have always had the machine. The machine has always had the answer. Why would they suspect that something is missing?

The question isn't whether junior developers will use AI. Of course they will. The question is whether they can function without it. Whether, when the agent confidently produces code with a subtle concurrency bug — the kind that passes every test and fails in production at 3 a.m. on a Saturday — they have the mental models to understand what went wrong. Whether they can read the stack trace. Whether they know what a stack trace is.
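
That 3 a.m. bug has a recognizable shape. Here is a minimal sketch of the category, with simulated reads and writes standing in for a real database: a read-modify-write split across an await. Run sequentially, the way a test suite runs it, it behaves. Run concurrently, the way production runs it, it silently loses an update.

```typescript
let balance = 100;

async function readBalance(): Promise<number> {
  await new Promise((r) => setTimeout(r, 10)); // simulated DB read
  return balance;
}

async function writeBalance(value: number): Promise<void> {
  await new Promise((r) => setTimeout(r, 10)); // simulated DB write
  balance = value;
}

// The bug: the read and the write are split across an await, so the
// read-modify-write is not atomic.
async function withdraw(amount: number): Promise<void> {
  const current = await readBalance();
  await writeBalance(current - amount);
}

async function main() {
  await withdraw(30);
  console.log(balance); // 70 -- the sequential test passes

  balance = 100;
  await Promise.all([withdraw(30), withdraw(30)]); // concurrent traffic
  console.log(balance); // 70, not 40 -- one withdrawal silently lost
}

main();
```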

The evidence points the same direction in every domain studied: comprehension gaps in coding, record drop rates in CS programs, measurable spatial memory loss from GPS, diagnostic accuracy decline when medical AI is removed. Different fields, same mechanism.

The only difference is the unit of account. In aviation, the consequences are measured in crash reports. In medicine, in misdiagnoses. In software — an industry where nearly everything runs on code — we haven't yet determined how to measure what's being lost.

But we know it's being lost. And we know the people losing it don't know they're losing it.

Lee Sedol retired because he knew. The juniors won't retire. They'll just never arrive.

Disclosure

This article was written with the assistance of Claude, made by Anthropic — whose own study provides the central evidence that AI assistance impairs learning. We recognize the irony of an AI helping articulate the case that AI creates learned helplessness. We also recognize that we used the tool to write about the dangers of using the tool, which is either poetic or pathetic depending on how much coffee you've had. All claims were verified by a human who, for the record, once spent three days debugging a misplaced semicolon and is a better engineer for it. Corrections and reader perspectives welcome at bustah_oa@sloppish.com.

Sources

  1. Carnegie Mellon, 15-112 Fundamentals of Programming course. AI impact on drop rates and course restructuring. The Tartan.
  2. Martin E.P. Seligman, "Learned Helplessness," Annual Review of Medicine, 1972; Maier & Seligman, 1976. Original research conducted 1967.
  3. "30 Days Without AI: What I Learned When I Finally Used My Brain Again." DEV Community.
  4. Norman J. Slamecka and Peter Graf, "The Generation Effect: Delineation of a Phenomenon," Journal of Experimental Psychology: Human Learning and Memory, 4(6), 1978. APA PsycNet.
  5. Bertsch et al., meta-analysis of 86 studies on the generation effect, average effect size d=0.40. Dreyfus model: Hubert and Stuart Dreyfus, "A Five-Stage Model of the Mental Activities Involved in Directed Skill Acquisition," 1980.
  6. Anthropic, "The Impact of AI Assistance on Coding Skill Formation," January 2026. Randomized controlled trial, 52 junior engineers. Anthropic Research | InfoQ coverage.
  7. "The Junior Developer Crisis of 2026: AI Is Creating Developers Who Can't Debug." DEV Community. Includes reference to Harvard study of 62 million workers.
  8. SoftwareSeni, on the junior-to-senior pipeline crisis. DEV Community.
  9. Stack Overflow, Stanford Digital Economy study: employment for developers aged 22-25 declined nearly 20% from late 2022 peak. Stack Overflow Blog.
  10. "AI Can Transform the Classroom, Just Like the Calculator." Scientific American.
  11. Study of 666 participants (2024): r = -0.68 correlation between AI tool usage and critical thinking. Meta-analysis of 60+ studies on cognitive offloading and critical thinking decline. MDPI Societies.
  12. Louisa Dahmani and Veronique D. Bohbot, "Habitual use of GPS negatively impacts spatial memory during self-guided navigation," Scientific Reports, 2020. Longitudinal, causal. Nature.
  13. Siddhant Khare, "AI Fatigue Is Real." Link.
  14. Betsy Sparrow, Jenny Liu, and Daniel M. Wegner, "Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips," Science, 333(6043), 2011. Science.
  15. JetBrains, "State of Developer Ecosystem 2025." 41% of code AI-generated. JetBrains Research.
  16. FAA Safety Alert for Operators 13002 (2013) and 17007 (2017). Air France 447 (2009, 228 dead) and Asiana 214 (2013, 3 dead) accident reports. FAA PDF.
  17. Lisanne Bainbridge, "Ironies of Automation," Automatica, 19(6), 1983, pp. 775-779. 4,700+ citations. PDF.
  18. Medical deskilling: colonoscopy adenoma detection drop (28.4% to 22.4%); radiology false-positive increase (up to 12%). ScienceDirect | Springer.
  19. MIT Media Lab EEG study: LLM users showed weakest neural connectivity over 4 months. arXiv.
  20. Addy Osmani, "Avoiding Skill Atrophy in the Age of AI." Substack.
  21. Generative AI Dependence (GAID) withdrawal symptoms documented in Asian Journal of Psychiatry, 2025. ScienceDirect.
  22. Sarkar, "Experience and AI Code Acceptance Rates," SSRN. 6% higher accept rate per SD of experience. SSRN.
  23. Veracode, "GenAI Code Security Report." 45% failure rate on security tests. Veracode.
  24. GitClear, "AI Assistant Code Quality 2025." 41% higher churn rate for AI code; refactoring dropped from 25% to under 10%. GitClear.
  25. Stack Overflow, "A New Worst Coder Has Entered the Chat: Vibe Coding Without Code Knowledge," January 2026. Stack Overflow Blog.
  26. MIT EECS AI policies, Fall 2025. 6.1200/6.1210 prohibit AI; 6.1210 shifted to 5% psets / 95% exams. The Tech.
  27. CodeRabbit, "State of AI vs Human Code Generation Report." CodeRabbit.
  28. theSeniorDev, "Why I Stopped Using AI as a Senior Developer After 150,000 Lines of AI-Generated Code." Link.
  29. Lee Sedol retirement announcement, November 2019. Retired from professional Go citing inability to compete with AI.