I've watched developers burn through hundreds of dollars in a week without realizing why. It's almost never the prompts. It's the context piling up behind each prompt and getting re-billed at every single step.
Reducing Claude Code token usage comes down to two levers: consume fewer tokens, or make each token cheaper. The first lever is where most of your savings live. The second matters less but is worth knowing.
Here's a number that puts this in context: according to Anthropic's own usage documentation, input tokens (including cached context) are billed separately from output tokens, and input served from the prompt cache can be up to 90% cheaper than uncached input. That means the shape of your context matters as much as the model you pick.
This guide covers both levers, with the heavier hitter first.
Key Takeaways
- Context is the real bill, not the prompt: every token in your context window gets re-billed on every turn, so controlling what enters the context is where most savings live.
- Using /clear at the start of each new task is the highest-ROI habit: old unrelated context is dead weight you keep paying for, and /compact is the softer option when you need continuity.
- Delegating to subagents keeps your main session lean: each subagent runs in its own context window, does the heavy reading on the side, and hands back just the answer.
- CLAUDE.md loads on every single turn, so keep it concise, push folder-specific guidance into nested files, and only document what the code cannot say for itself.
- The prompt cache stays warm for roughly 5 minutes and makes cached reads up to 90% cheaper; working in focused bursts instead of letting sessions go idle is a low-effort way to cut costs.

Lever 1: Reduce How Many Tokens You Consume
These five habits account for the majority of savings you will see in practice.
1. Clear Early, Clear Often
This is the highest-ROI habit by a wide margin.
Every unrelated task should start with /clear. Old context that has nothing to do with what you're working on now is dead weight, and you re-pay for it on every turn. /compact is the softer version: it summarizes the thread into something shorter when you want continuity but need to shed bulk.
The discipline underneath both commands is the same: keep tasks small and scoped. A small, focused task is one you can comfortably clear and move on from. A bloated session is expensive by definition.
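To make the re-billing effect concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which every turn adds a fixed number of tokens and the whole context is sent again on each turn; it ignores caching, system prompts, and output tokens. The function name and the numbers are illustrative, not taken from any Anthropic tool.

```python
def billed_input_tokens(turn_sizes, clear_before=()):
    """Total input tokens billed across a session (simplified model).

    turn_sizes: tokens each turn adds to the context.
    clear_before: indices of turns that start with /clear.
    """
    context = 0
    total = 0
    for i, added in enumerate(turn_sizes):
        if i in clear_before:
            context = 0  # /clear drops everything accumulated so far
        context += added
        total += context  # the whole context is sent again on this turn
    return total


turns = [2_000] * 10  # ten turns, each adding 2,000 tokens
print(billed_input_tokens(turns))                    # 110000
print(billed_input_tokens(turns, clear_before={5}))  # 60000
```

Under these assumptions, one /clear halfway through a ten-turn session cuts billed input from 110,000 to 60,000 tokens, because the cost of a long session grows with the square of its length rather than linearly.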
If you want to check your Claude Code token usage before you clear, the /context command shows how large your current context has grown. Build the habit of checking it at the start of each new topic.
2. Delegate Tasks to Subagents
A subagent in Claude Code is a specialized worker defined by its own system-prompt markdown file, operating in its own separate context window.
Work handed to a subagent never piles up inside your current session. Your main context stays lean, and a lean context is cheap because a session stuffed with files and tool results gets more expensive on every turn. The subagent does the heavy reading on the side and hands back just the answer.
This is the best structural fix for complex multi-step work. Instead of one long sprawling session, you have a short orchestrator delegating to focused specialists. Each specialist starts clean and ends clean.
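A rough way to see why delegation pays off is to compare what the main session carries in each case. The sketch below assumes the subagent pays for its reading once in its own window and returns only a short summary; real subagents also have their own overhead, which is ignored here. Both function names and all numbers are hypothetical.

```python
def inline_cost(read_tokens, remaining_turns):
    """Tokens billed when the main session does the reading itself.

    The files are read once and then stay in the context, so they are
    sent again on every remaining turn.
    """
    return read_tokens * (1 + remaining_turns)


def delegated_cost(read_tokens, summary_tokens, remaining_turns):
    """Tokens billed when a subagent reads and returns only a summary.

    Simplification: the subagent pays for the reading once in its own
    window, and the main session carries just the summary afterwards.
    """
    return read_tokens + summary_tokens * remaining_turns


print(inline_cost(30_000, remaining_turns=20))          # 630000
print(delegated_cost(30_000, 500, remaining_turns=20))  # 40000
```

The gap widens with every extra turn the main session runs after the research, which is why delegation matters most early in a long session.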
3. Keep Your CLAUDE.md Lean
CLAUDE.md is loaded into context on every turn from the very start of the session. That makes it the most expensive context you own because every token in it gets billed on every prompt.
Two practices help here:
First, push folder-specific guidance into nested CLAUDE.md files inside relevant subfolders. Each one only loads when the model is working in that folder, not globally on every turn.
Second, only document what the code cannot say for itself. If the model could figure something out by reading the files, leave it out of CLAUDE.md. Over-documented CLAUDE.md files are a common source of silent token bloat that's easy to miss when you check Claude Code token usage reports.
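If you want a quick sense of how heavy your CLAUDE.md files are, a small script can give a ballpark figure. This sketch assumes roughly four characters per token, which is a common rule of thumb for English text and not Claude's actual tokenizer, so treat the output as an estimate only. The function name is made up for this example.

```python
from pathlib import Path

# Assumption: about 4 characters per token. This is a rough heuristic,
# not the tokenizer Claude actually uses.
CHARS_PER_TOKEN = 4


def estimate_claude_md_tokens(root):
    """Return {relative path: estimated tokens} for each CLAUDE.md under root."""
    root = Path(root)
    estimates = {}
    for path in sorted(root.rglob("CLAUDE.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        estimates[path.relative_to(root).as_posix()] = len(text) // CHARS_PER_TOKEN
    return estimates
```

Calling estimate_claude_md_tokens(".") from the project root lists the root file next to the nested ones, which makes it easy to spot guidance that belongs in a subfolder instead of the always-loaded file.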
4. Point Directly at Files
If you already know which file matters, point at it with @ in your prompt.
The model goes straight to it instead of grepping, listing directories, and reading two or three wrong files along the way. Fewer exploratory turns, no junk in your context. This is a small habit with compounding value because exploratory bloat sticks around after the files are read.
5. Do the QA Yourself (Especially on Frontend Work)
When the model generates UI for something like a frontend prototype, don't ask it to verify the result. AI verification spins up a headless browser, takes screenshots, and reads those images back into context. Images are expensive in tokens.
Open the browser and look yourself. You check just as well as the AI, it's faster, and it costs zero tokens. Save AI verification for flows you genuinely can't check at a glance, like complex authenticated API chains.
A smaller version of the same principle: match the reasoning effort to the task. Heavy thinking modes generate a lot of thinking tokens, and renaming a variable doesn't need them.
Lever 2: Make Each Token Cheaper
Once you've addressed consumption, these two habits squeeze additional savings from the tokens you do use.
6. Combine Models Per Phase
Not every phase of a task needs the same model.
A practical flow: plan with your strongest reasoner (that's where the expensive model earns its tokens), execute with a faster and cheaper model for the repetitive grind, review with the strong model at the end. You pay premium rates only for the thinking phases.
The trade-off to know: switching models mid-task can start the cache cold for the new model. Each model caches its own context independently, so a model swap may cost more on the first turn than staying with the same model would have. Factor that into your mental model when you're planning a long session.
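The plan, execute, review split can be sketched as simple arithmetic. The prices below are hypothetical placeholders, not real list prices, and the sketch deliberately ignores the cold-cache penalty described above, so it shows the best case for mixing models.

```python
# Hypothetical input prices in dollars per million tokens (not real list prices).
PRICE_PER_MTOK = {"strong": 15.0, "fast": 3.0}


def phase_cost(model, input_tokens):
    """Dollar cost of one phase at the given model's input price."""
    return PRICE_PER_MTOK[model] * input_tokens / 1_000_000


def plan_cost(phases):
    """phases: list of (model, input_tokens) pairs, one per phase."""
    return sum(phase_cost(model, tokens) for model, tokens in phases)


all_strong = [("strong", 200_000), ("strong", 1_000_000), ("strong", 200_000)]
mixed = [("strong", 200_000), ("fast", 1_000_000), ("strong", 200_000)]

print(plan_cost(all_strong))  # 21.0
print(plan_cost(mixed))       # 9.0
```

With these made-up numbers the bulk of the tokens sits in the execution phase, so moving only that phase to the cheaper model cuts the total by more than half.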
7. Ride the Warm Prompt Cache
Claude keeps a cache of your session's context for roughly five minutes. Keep prompting and the cache stays warm: your context is billed at a heavily reduced rate (around 90% cheaper for cached reads versus fresh input). Let the session go idle for more than five minutes and the cache expires. Your next prompt gets billed at full uncached input price.
This is one reason to work in focused bursts rather than switching between tasks mid-session. Idle gaps are invisible on the usage graph until you see the bill at the end of the month.
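Here is a minimal sketch of what a single idle gap can cost. It takes the 90% discount from the text, uses a hypothetical input price, and ignores any extra charge for writing to the cache, so the real numbers will differ.

```python
CACHED_READ_DISCOUNT = 0.90  # "up to 90% cheaper", as stated in the text
PRICE_PER_MTOK = 3.0         # hypothetical input price, dollars per million tokens


def turn_input_cost(context_tokens, new_tokens, cache_warm):
    """Dollar cost of the input side of one turn (simplified model).

    Warm cache: existing context is billed at the discounted read rate.
    Cold cache: everything is billed as fresh input.
    """
    context_rate = (1 - CACHED_READ_DISCOUNT) if cache_warm else 1.0
    billed_tokens = context_tokens * context_rate + new_tokens
    return billed_tokens * PRICE_PER_MTOK / 1_000_000


warm = turn_input_cost(100_000, 1_000, cache_warm=True)
cold = turn_input_cost(100_000, 1_000, cache_warm=False)
print(f"warm: ${warm:.3f}  cold: ${cold:.3f}")  # warm: $0.033  cold: $0.303
```

In this model the same prompt costs roughly nine times more after the cache has expired, and the larger the context, the bigger that gap becomes.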
As Andrej Karpathy noted, "Context engineering is the delicate art and science of filling the context window with just the right information for the next step." That framing captures exactly what these seven habits are doing: not just reducing tokens, but making every token that enters the window earn its place.
The Pattern Under Everything
Four of these seven habits are the same move in different clothes: control what enters the context before it gets there.
Always-on bloat (oversized CLAUDE.md, too many MCP servers loaded globally) is the worst kind because it taxes every single turn. Exploratory bloat (the model hunting for files, reading whole files when a summary would do, screenshotting for QA) is next because it sticks around once it's in. Subagents and /clear are how you deal with what has already gotten in.
If you want to know how close you are to your Claude Code usage limits, the /usage command (on subscription plans) and the usage tab in your Claude account are the two places to check. Neither requires any special setup.
Pick one habit this week. /clear discipline is the easiest place to start. Watch your usage graph change within two or three days.
Claude Code Token Usage for Product Managers
Most of the token cost conversation lives in developer circles, but PMs running Claude Code face the exact same problem, often worse, because PM sessions tend to be exploratory by nature.
A PM working on a discovery session pulls in interview transcripts, competitor teardowns, and design docs into one growing context. Each follow-up question re-bills all of it. The bill compounds fast, and it's invisible until the end of the month.
The same habits apply, starting with these four:
- Start each work area with /clear. Don't carry last week's roadmap session into today's spec review.
- Build subagents for recurring tasks. A competitor digest agent, a PRD review agent, a meeting-notes-to-ticket agent. Each one runs in its own context and stays cheap.
- Keep your CLAUDE.md scoped to what PMs actually need (team context, product principles, output formats). Don't load the whole engineering runbook into every session.
- Point directly at the transcript or doc that matters instead of asking the model to find it.
The savings are the same. The habits transfer directly.
The One Thing to Remember
Context is the bill. Anything that enters the context window gets re-billed on every subsequent turn until you clear it. The prompt you write is a tiny fraction of the total token cost. The context that surrounds it is most of it.
Control the context, control the cost.

