AI & Building8 min read

I Burned 2 Billion Claude Code Tokens in a Week. Here's How I Fixed It.

17 April 2026

Friday, 3pm. Going live on Build Hour. I open Claude Code, start typing, and it tells me I'm out of tokens. Again.

Fifth time this month. Max plan on Opus 4.7, supposedly a bottomless well, and I've drained it by Thursday every single week.

The previous time was worse. 24 hours frozen. I stared at my to-do list with ten urgent things on it and got through exactly one of them, because I could still reply to emails and brainstorm with myself. Everything that needed code or actual thinking from Claude? Dead.

And here's the kicker: before I was on the Max plan, I'd already paid for extra Anthropic credits. Those also ran out way too fast. So this isn't a plan problem. This is a me problem. So I went live, made Claude audit its own token usage, and found three silent killers I'd been running for months.

The 2 billion token mystery

Two billion. That's the number Claude coughed up when I asked where all my tokens had gone over seven days.

The breakdown was wild:

92% was cache reuse from inside sessions. Normal, but a clue, because it meant very long sessions with the same context locked in.
75% of one particular session went to skills review work. Heavy sub-agents, parallel research, lots of context.
240 million tokens on a single session building course infrastructure for a product I'm launching in May.
200 million tokens recreating an internal admin dashboard, a to-do tool to replace a paid external product.
Strategy docs, experiments (including an AI CEO called Paperclip), and a handful of other multi-hour sessions.

None of this looked bad individually. Every one of those projects was real work I actually wanted to ship. But the shape of the spend was the tell: a few massive sessions, kept alive for days, chewing through the cache across turns.

Most founders looking at token usage only see the monthly bill. The real story is in session length, working directory, and model choice. Let me walk you through each.

The three silent killers

Here's what most people get wrong about Claude Code tokens. I got all three wrong for months.

Killer #1: Installing every MCP, every skill, and never splitting your CLAUDE.md.

When you go deep with Claude Code, the temptation is to install everything. Every new MCP server that shows up. Every shiny skill. Then you stack your global CLAUDE.md with notes, preferences, and memory entries until it's a 400-line scroll.

Each new session loads all of it. Every turn.

Whatever cache hit rate you're getting, you're paying for a bloated context window before you even type the first prompt. If you haven't split your CLAUDE.md by project, or pruned MCPs you don't actually use, that's where I'd start.

Killer #2: Zombie sessions.

I had one Claude Code session open for 123 hours straight. Five days. Through sleep, meetings, other projects. I thought keeping context warm was helping me.

The same work split into focused sessions uses 20 times fewer tokens than one marathon.

Twenty times. Not 20%. Twenty *x*. Because every new turn in a long session keeps stuffing the same history back into the context, and the cache helps a little but not enough.

Killer #3: Working from the wrong directory.

If you launch Claude Code from your home directory, it scopes its tool search and skills memory across *everything*. If you launch it from the project folder, it scopes narrow.

That single habit costs roughly 75% more tokens on exploratory work. I'd been doing this for months because my terminal lives in `~/` and I'd just start typing.

Three killers. Compounded. That's how you burn 2 billion.

The model switch I should have made six months ago

Here's the confession: I had never once switched models mid-session.

Opus 4.7 does strategy. It also does renames. It also does "read this file and summarise it." It's like running a V8 to cycle to the corner shop.

This is the fix I set up on air. I'd recommend stealing it verbatim and dropping it into your global CLAUDE.md:

```markdown ## Model Routing

Default to Sonnet 4.6. At the start of a new turn, if the task clearly belongs to a different tier, proactively suggest the switch in one line before acting. Do not ask repeatedly in the same session; suggest once per topic change.

- Haiku 4.5 - file lookups, single-line edits, short summaries, yes/no questions, memory writes, simple greps, renaming. - Sonnet 4.6 - default for coding, skill invocations, UI tweaks, bug fixes in 1-2 files, writing new features with clear specs. - Opus 4.7 - strategy, architecture, cross-system debugging (>3 files), PR/code reviews, tradeoff analysis, client-facing writing.

Never silently switch - always let the user approve. ```

Here's the same rule as a quick-scan table:

Model	Best for
Haiku 4.5	Lookups, one-line edits, summaries, yes/no, renames, simple greps
Sonnet 4.6	Coding, UI tweaks, bug fixes, features with clear specs (DEFAULT)
Opus 4.7	Strategy, architecture, debugging across >3 files, reviews, planning

The key line in the prompt is "never silently switch." Claude has to ask before downshifting, so you always know what tier is working on your code. No silent Haiku writing production code, ever.

Since I added this rule, Sonnet is handling around 70% of my turns. Opus shows up for the hard stuff. Haiku for busywork. My estimated token burn has dropped hard, though I need another week of data before I'll claim a number.

AI BUILDERS

Want to learn Claude Code the right way?

I'm running a 5-week course teaching founders how to build real products with Claude Code, including the token-efficiency habits from this post. Small cohort, starts in May.

Join the waitlist→

The prompt I used to audit my own usage

Before you fix the killers, run the audit. You can't improve what you don't see.

Here's the exact prompt I ran on air, with the privacy guards added:

"Have a look through my token usage in the last seven days and tell me what were the highest usage of tokens. Don't share any confidential or client information, don't mention any names. Categorise into buckets by project or context window. Give me a table estimating how many tokens each used as a percentage of the week, plus recommendations on what I could have done better."

That prompt surfaces three things at once: 1. Where the tokens went by project 2. Session-level anomalies (like my 123-hour zombie) 3. Concrete recommendations you can act on in 10 minutes

Run it once a week. It's the single most useful audit I've added to my Claude workflow.

I'd also recommend asking a follow-up: *"If I want to keep my weekly token budget to X, what should I change first?"* It forces the model to prioritise rather than dump a generic list.

Keep reading

Get the best of batko.ai on AI & Building - straight to your inbox

Free. Unsubscribe anytime.

The founder checklist: five moves to make today

If you're a founder burning your Claude budget in five days flat, do these in order:

Close sessions when you're done for the day. Every day. Not "when the project ends." Just every day. Your 123-hour marathon is costing you 20x.
Launch Claude Code from the project directory, not from home. `cd ~/your-project && claude`. That one habit saves roughly 75%.
Add the model routing block (above) to your global CLAUDE.md. Let Claude pick the right tier and ask before escalating.
Run the weekly audit prompt every Friday afternoon. Two minutes. Saves your following week.
Stop installing every new MCP and skill. If it's not solving a problem you had yesterday, leave it. Prune what you're not using.

You don't need all five at once. Habit #1 alone will probably get you through a week without running out. The rest compound.

What I'm building next

Here's the thing I wish existed: a daily token-audit agent that emails me every morning with how I spent yesterday, what the cheapest rewrite of my worst session would have been, and one specific change to try today.

That's next on my build list. Build Hour #4, maybe.

If you want to see it live when it ships, my Build Hour goes weekly and the recordings go up on YouTube. You'll watch me make these kinds of mistakes live, then fix them on air, then ship.

Tokens are the new burn rate. Treat them with the same discipline you'd treat cash.

Sources and Further Reading

This article is licensed under CC BY-NC 4.0. Share freely with attribution.

Running out of tokens is a habit problem, not a plan problem. The three killers - zombie sessions, wrong directory, always-on Opus - are the default state for almost every founder using Claude Code. Fix the defaults and the problem disappears.

If this was useful and you want to go deeper on building real products with Claude Code (the good, the expensive, and the "oh no I deleted the database" moments), jump on the AI Builders waitlist. We're kicking off in May.

DM me on LinkedIn WhatsApp me

AI BUILDERS

Want to learn Claude Code the right way?

I'm running a 5-week course teaching founders how to build real products with Claude Code, including the token-efficiency habits from this post. Small cohort, starts in May.

Join the waitlist→

AI & Automation

How to Build a SaaS Product With Claude Code (Zero Coding Experience)→

10 min read

AI & Automation

The Founder's Guide to Prompt Engineering (Talk More, Type Less)→

9 min read

I Burned 2 Billion Claude Code Tokens in a Week. Here's How I Fixed It.

The 2 billion token mystery

The three silent killers

The model switch I should have made six months ago

Want to learn Claude Code the right way?

The prompt I used to audit my own usage

The founder checklist: five moves to make today

What I'm building next

Sources and Further Reading

Want to learn Claude Code the right way?

Related articles

Get the newsletter

Work with me

Explore more on batko.ai

5 Frameworks I Use With Every Founder I Coach

Batko OS