On the evening of March 31, our managing editor burned through 68% of a Claude Max quota in under five minutes. Nothing had changed — same workflows, same cron jobs, same session patterns we'd been running for a week. The only thing that changed was the bill. The overage cost $50 and forced an upgrade from Max 5x ($100/month) to Max 20x ($200/month) just to keep working.
We assumed we'd done something wrong. We hadn't.
The Bug
Claude Code uses a key-value prompt cache to avoid re-billing conversation context on every turn. When it works, you pay for your context once and only pay for new tokens on subsequent turns. When it breaks, every single turn re-processes and bills the full conversation history. If you're 150,000 tokens deep into a session, each prompt costs 150,000 tokens in overhead before a single new word is generated.
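The arithmetic is worth making explicit. A back-of-the-envelope model (ours, not Anthropic's billing code; token counts are hypothetical) shows why a dead cache is so expensive:

```typescript
// Illustrative cost model: tokens billed per turn with and without a
// working prompt cache. Numbers are hypothetical and this is not
// Anthropic's actual billing logic — cache reads are cheap but not
// free in practice; we treat them as free here for simplicity.
function tokensBilled(
  contextTokens: number,
  newTokens: number,
  cacheWorks: boolean
): number {
  // With a working cache, the context is served from cache and you pay
  // (roughly) only for new tokens. When the cache read fails, the full
  // conversation history is reprocessed at full price on every turn.
  return cacheWorks ? newTokens : contextTokens + newTokens;
}

// A session 150,000 tokens deep, sending a 500-token prompt each turn:
const healthy = tokensBilled(150_000, 500, true); // 500 tokens billed
const broken = tokensBilled(150_000, 500, false); // 150,500 tokens billed
console.log(`per-turn overhead: ${(broken / healthy).toFixed(0)}x`);
```

The per-turn multiplier is enormous; averaged over real sessions with shorter contexts and partial cache hits, it lands in the 10–20x range users reported.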
Starting around March 23, 2026, the cache broke. The write succeeded. The read failed. Users across every subscription tier — Pro ($20), Max 5x ($100), Max 20x ($200) — reported identical workloads suddenly consuming 10–20x more quota. Five-hour session windows evaporated in 90 minutes. MacRumors documented the reports. The New Stack confirmed the pattern. On Reddit, Anthropic's official account acknowledged that users were "hitting usage limits way faster than expected" and called it "the top priority for the team."
One documented case showed cache reads consuming 99.93% of quota at a rate of 1,310 cache reads per 1 I/O token. The --resume flag — designed to cheaply reload cached context — was triggering a full reprocess of the entire prior session at the original cost, negating its efficiency benefit entirely.
The Compounding Factor
The cache regression wasn't the only thing burning tokens. An autocompact bug discovered on March 10 had already been silently wasting compute: 1,279 sessions showed 50+ consecutive compaction failures, collectively wasting approximately 250,000 API calls per day globally. The fix was three lines of code:
After 3 consecutive failures, compaction is disabled for the rest of the session.
Three lines of code to stop burning a quarter million API calls a day.
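The described fix amounts to a failure counter with a cutoff. A sketch of the logic as reported — names and structure are ours, not the actual Claude Code source:

```typescript
// Sketch of the described fix: after 3 consecutive compaction failures,
// disable compaction for the rest of the session. Hypothetical names;
// not the actual implementation.
const MAX_CONSECUTIVE_FAILURES = 3;

class Session {
  private consecutiveFailures = 0;
  private compactionDisabled = false;

  maybeCompact(runCompaction: () => boolean): void {
    if (this.compactionDisabled) return; // stop burning API calls
    if (runCompaction()) {
      this.consecutiveFailures = 0; // a success resets the counter
    } else if (++this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
      this.compactionDisabled = true; // latched off for this session
    }
  }

  get disabled(): boolean {
    return this.compactionDisabled;
  }
}
```

Without the cutoff, a session stuck in a failing state retries compaction indefinitely, and every retry is a billable call.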
Stacked on top of both bugs: Anthropic quietly implemented intentional peak-hour throttling during the same period, tightening its opaque usage limits so that quota during peak hours (5am–11am PT) drained faster than wall-clock usage would suggest. Anthropic estimated approximately 7% of users would hit limits they'd never hit before. The company characterized this as "capacity management."
Bug. Bug. Intentional throttle. All in the same week. Users couldn't tell which one was eating their quota because Anthropic doesn't expose token-level billing data to subscribers.
Then the Source Code Leaked
On March 31 — the same day we got hit — Anthropic accidentally published Claude Code's full source to npm with an unstripped source map. A Bun packaging bug shipped 512,000 lines of TypeScript to the public registry. Anthropic called it "a release packaging issue caused by human error."
The leaked source revealed, among other things: frustration-detection regexes that scan user input for profanity, an "undercover mode," fake tool definitions, and a client attestation system where a Zig module in the native binary replaces a placeholder in outgoing API requests with a cryptographic hash. The source also exposed the prompt cache system — which tracks 14 different cache-break vectors using "sticky latches" to prevent mode toggles from invalidating the cache. The sheer complexity of the caching machinery suggests how easily a regression could silently break it.
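We haven't independently audited the leaked source, but the "sticky latch" concept as described is straightforward: once a given mode toggle has broken the cache, the flag stays set, so toggling the mode again doesn't register as a fresh invalidation. A hypothetical sketch of one such latch — our interpretation, not the actual code:

```typescript
// Hypothetical sketch of a "sticky latch" for one cache-break vector,
// based on the description of the leaked source. Not the real
// implementation; names are ours.
class StickyLatch {
  private latched = false;

  // Returns true only the first time this vector breaks the cache.
  // Once set, the latch stays set, so subsequent toggles of the same
  // mode no longer count as new cache invalidations.
  observe(modeChanged: boolean): boolean {
    const isNewBreak = modeChanged && !this.latched;
    if (modeChanged) this.latched = true;
    return isNewBreak;
  }
}
```

Multiply that by 14 tracked vectors interacting with session resumption, compaction, and mode switches, and it's easy to see how one regression could slip through and silently defeat the whole mechanism.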
A Reddit user claimed to have found and patched the root cause using the leaked source, reporting that "usage limits are back to normal for me." Another user identified what they described as "two independent bugs that cause prompt cache to break, silently inflating costs by 10–20x." Whether these specific fixes address all users' issues remains unconfirmed. But the pattern is clear: users are reverse-engineering the product they're paying for because the company that built it won't explain why it's overcharging them.
Our Receipt
Plan: Claude Max 5x ($100/month)
Session status: 32% used, 68% remaining
Time to drain: ~5 minutes
Overage cost: $50+
Forced action: Upgrade to Max 20x ($200/month)
Work performed: Routine cron jobs, Discord polling, no code generation
Root cause: Prompt cache regression + cron context overhead
We run sloppish.com on Claude Code. We use it to research stories, manage our editorial pipeline, communicate with our team, and publish articles. On the evening of March 31, our session had been running all day — a long context, yes, but no different from any other session. When the cache stopped working, every cron job that fired became a full-context rebill. Three crons in five minutes. Sixty-eight percent of our quota, gone.
We weren't doing anything unusual. We were the 7%.
The Pattern
This is the third time in two weeks that Claude's metering has made headlines. We covered the initial rationing when Anthropic began adjusting limits after the Spring Break promotion expired. We tracked the contractual implications of opaque usage caps. Now we have the technical confirmation: a caching bug was silently multiplying costs by an order of magnitude, and Anthropic's response was to describe it as capacity management.
The charitable reading: Anthropic discovered the bug and the throttling simultaneously, fixed the compaction issue quickly, and is still working on the cache regression. They acknowledged the problem publicly. That's better than silence.
The less charitable reading: they knew the cache was broken, implemented throttling on top of it, and let users absorb the cost difference while they sorted it out. The 7% number they cited — the users who'd hit limits for the first time — was probably much higher when the cache bug was factored in. They just couldn't tell the difference between intentional throttling and unintentional overcharging because, by their own admission, they were doing both at the same time.
There's a version of this story where the villain is Anthropic. We don't think that's quite right. The real problem is structural: AI tool pricing is opaque by design. You don't see your token count. You don't see your cache hit rate. You don't see your per-turn cost. You see a progress bar that says "80% used" and when it hits 100% you either wait or pay more. When the metering breaks, you have no way to tell. You just pay.
That's not a bug. That's the business model.
Disclosure
This article was written using Claude Code — the tool whose billing bug is the subject of the piece. We are paying Anthropic subscribers directly affected by the issue described. We burned the quota. We paid the overage. We upgraded the plan. These are our receipts. bustah_oa@sloppish.com