Using Claude Code: Session Management & 1M Context
In my recent calls with Claude Code users, one theme keeps coming up: the 1M token context window is a double-edged sword.
It lets Claude Code operate autonomously for longer and handle tasks more reliably, but it also opens the door to context pollution if you're not deliberate about managing your sessions.
Session management matters more than ever, and there seem to be a lot of questions about it. Do you keep one session open in a terminal, or two? Start fresh with every prompt? When should you use compact, rewind, or subagents? What causes a bad compact?
There’s a surprising amount of detail here that can really shape your experience with Claude Code, and almost all of it comes down to managing your context window.

A Quick Primer on Context, Compaction & Context Rot

The context window is everything the model can "see" at once when generating its next response. It includes your system prompt, the conversation so far, every tool call and its output, and every file that's been read. Claude Code has a context window of one million tokens.
Unfortunately, using context has a slight cost, often called context rot. Context rot is the observation that model performance degrades as context grows: attention gets spread across more tokens, and older, irrelevant content starts to distract from the current task. For our 1M context model, we see some level of context rot at around 300-400k tokens, but this is highly dependent on the task and is not a hard-and-fast rule.
Context windows are a hard cutoff, so when you’re nearing the end of the context window, you need to summarize the task you’ve been working on into a smaller description and continue the work in a new context window. We call this compaction. You can also trigger compaction yourself.
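To make the two thresholds concrete, here is a minimal sketch of a token budget check. The `ContextBudget` class and its status names are hypothetical and are not part of Claude Code. The soft limit is simply the low end of the rough 300-400k range mentioned above, which is task dependent.

```python
from dataclasses import dataclass

# Assumed figures: the 1M window is from the article; the 300k soft limit
# is the low end of the article's rough 300-400k range, not a hard rule.
CONTEXT_WINDOW = 1_000_000
SOFT_LIMIT = 300_000


@dataclass
class ContextBudget:
    """Hypothetical tracker for how full a session's context is."""

    used_tokens: int = 0

    def add(self, tokens: int) -> None:
        # Every prompt, tool call, tool output, and file read adds tokens.
        self.used_tokens += tokens

    def status(self) -> str:
        if self.used_tokens >= CONTEXT_WINDOW:
            return "must_compact"      # hard cutoff: the window is full
        if self.used_tokens >= SOFT_LIMIT:
            return "consider_compact"  # context rot may start to show
        return "ok"


budget = ContextBudget()
budget.add(120_000)   # e.g. system prompt plus a few large file reads
print(budget.status())
budget.add(200_000)   # e.g. a long debugging loop with many tool outputs
print(budget.status())
```

The point of the sketch is that there are two separate limits: the hard cutoff that forces compaction, and a much earlier, fuzzier point where it may already be worth compacting or starting fresh.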

Every Turn Is a Branching Point

Say you've just asked Claude to do something and it's finished. You now have some information in your context (tool calls, tool outputs, your instructions), and you have a surprising number of options for what to do next:
  • Continue — send another message in the same session
  • /rewind (esc esc) — jump back to a previous message and try again from there
  • /clear — start a new session, usually with a brief you've distilled from what you just learned
  • Compact — summarize the session so far and keep going on top of the summary
  • Subagents — delegate the next chunk of work to an agent with its own clean context, and only pull its result back in
While the most natural option is just to continue, the other four exist to help you manage your context.

When to Start a New Session

The new 1M context window means that you can now do longer tasks more reliably, for example having Claude build a full-stack app from scratch. But just because the model hasn't run out of context doesn't mean you shouldn't start a new session.
Our general rule of thumb is when you start a new task, you should also start a new session.
The grey area is related tasks, where some of the context is still necessary but not all of it.
Take writing the documentation for a feature you just implemented. You could start a new session, but Claude would have to reread the files you just changed, which would be slower and more expensive. Since documentation is usually not a highly intelligence-sensitive task, carrying the extra context is probably worth the efficiency gain of not rereading those files.

Rewinding Instead of Correcting

If I had to pick one habit that signals good context management, it’s rewind.
In Claude Code, double-tapping Esc (or running /rewind) lets you jump back to any previous message and re-prompt from there. The messages after that point are dropped from the context.
Rewind is often the better way to correct course. For example, Claude reads five files, tries an approach, and it doesn't work. Your instinct may be to type "that didn't work, try X instead," but the better move is to rewind to just after the file reads and re-prompt with what you learned: "Don't use approach A, the foo module doesn't expose that. Go straight to B."
You can also use “summarize from here” to have Claude summarize its learnings and create a handoff message, kind of like a message to the previous iteration of Claude from a future self that tried something that didn’t work.
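The difference between correcting and rewinding can be sketched with a toy transcript. This is an illustration only: the `Message` type and `rewind` function are hypothetical and say nothing about how Claude Code stores sessions internally.

```python
from typing import NamedTuple


class Message(NamedTuple):
    role: str      # "user", "assistant", or "tool"
    content: str


def rewind(history: list[Message], keep: int) -> list[Message]:
    """Keep the first `keep` messages and drop everything after them."""
    return history[:keep]


history = [
    Message("user", "Add retry logic to the client"),
    Message("tool", "<contents of five files>"),
    Message("assistant", "Trying approach A..."),
    Message("tool", "<failing test output>"),
]

# Correcting in place: the failed attempt stays in context.
corrected = history + [Message("user", "That didn't work, try B instead")]

# Rewinding: keep the file reads, drop the failed attempt, re-prompt.
rewound = rewind(history, keep=2) + [
    Message("user", "Don't use approach A, go straight to B"),
]

print(len(corrected), len(rewound))
```

In the rewound transcript the expensive file reads survive, while the dead-end attempt and its failing output no longer take up space or attention.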

Compacting vs. Fresh Sessions

Once a session gets long, you have two ways to shed weight: /compact or /clear (and start fresh). They feel similar but behave very differently.
Compact asks the model to summarize the conversation so far, then replaces the history with that summary. It's lossy: you're trusting Claude to decide what mattered. On the other hand, you don't have to write anything yourself, and Claude might be more thorough in including important learnings or files. You can also steer it by passing instructions (/compact focus on the auth refactor, drop the test debugging).
With /clear you write down what matters ("we're refactoring the auth middleware, the constraint is X, the files that matter are A and B, we've ruled out approach Y") and start clean. It's more work, but the resulting context is what you decided was relevant.
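As a rough sketch of the contrast, the two operations can be modeled on a list of history lines. Everything here is hypothetical: `compact`, `clear`, and `toy_summarize` are stand-ins, and the keyword filter only imitates the idea of steering a summary with instructions. A real summary is written by the model.

```python
def compact(history: list[str], summarize, instructions: str = "") -> list[str]:
    """Replace the whole history with one model-written summary (lossy)."""
    return [summarize(history, instructions)]


def clear(brief: str) -> list[str]:
    """Discard the history and start from a brief the user wrote."""
    return [brief]


def toy_summarize(history: list[str], instructions: str) -> str:
    # Stand-in for the model: keeps only lines mentioning the focus word.
    # A real summary is written by Claude and is far richer than this.
    kept = [line for line in history if instructions in line]
    return "Summary: " + "; ".join(kept)


history = [
    "auth: middleware rejects expired tokens",
    "tests: flaky timeout in CI",
    "auth: approach Y ruled out",
]

print(compact(history, toy_summarize, instructions="auth"))
print(clear("Refactoring auth middleware; files A and B; approach Y ruled out"))
```

In both cases the new context is a single entry. What differs is who decided its contents: the summarizer in the first case, you in the second.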

What Causes a Bad Compact?

If you run a lot of long-running sessions, you might have noticed that compaction is sometimes particularly bad. We’ve often found that bad compacts happen when the model can’t predict the direction your work is going.
For example, autocompact fires after a long debugging session and summarizes the investigation, and your next message is "now fix that other warning we saw in ."
But because the session was focused on debugging, the other warning might have been dropped from the summary.
This is particularly difficult because, due to context rot, the model is at its least intelligent point when compacting. With a 1M context window, you have more time to run /compact proactively, with a description of what you want to do next.

Subagents & Fresh Context Windows

Subagents are a form of context management, useful for when you know in advance that a chunk of work will produce a lot of intermediate output you won't need again.
When Claude spawns a subagent via the Agent tool, that subagent gets its own fresh context window. It can do as much work as it needs to, and then synthesize its results so only the final report comes back to the parent.
The mental test we use: will I need this tool output again, or just the conclusion?
While Claude Code will automatically call subagents, you may want to explicitly tell it to do so. For example, you might say:
  • “Spin up a subagent to verify the result of this work based on the following spec file”
  • “Spin off a subagent to read through this other codebase and summarize how it implemented the auth flow, then implement it yourself in the same way”
  • “Spin off a subagent to write the docs on this feature based on my git changes”
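The isolation described in this section can be sketched as a function that builds up its own working context and returns only a short report. The `run_subagent` function, the file names, and the keyword check are all hypothetical. This is not the Agent tool's interface, only an illustration of "keep the conclusion, discard the tool output".

```python
def run_subagent(task: str, files: dict[str, str]) -> str:
    """Hypothetical subagent: works in its own context, returns one report."""
    context = [task]                     # fresh context, not shared with parent
    for name, text in files.items():
        context.append(f"read {name}: {text}")   # intermediate tool output
    hits = [name for name, text in files.items() if "auth" in text]
    # Only this short report leaves the function; `context` is discarded.
    return f"{len(hits)} of {len(files)} files touch auth: {', '.join(hits)}"


parent_context = ["user: summarize how the other codebase handles auth"]
files = {
    "login.py": "def login(): check auth token",
    "utils.py": "def slugify(): ...",
    "session.py": "auth cookie refresh logic",
}

report = run_subagent("find auth-related files", files)
parent_context.append(f"subagent report: {report}")

print(parent_context[-1])
print(len(parent_context))
```

The parent's context grows by one entry regardless of how many files the subagent had to read along the way.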

Summary

In summary, when Claude has ended a turn and you’re about to send a new message, you have a decision point.
Over time we expect that Claude will help you handle this itself, but for now this is one of the ways you can guide Claude's output.


Yes but I don't want to compact. It goes too far. I want Claude to surgically remove tokens from tool calls that were unhelpful in achieving my goal. Literally just have Claude remove from its memory every token that wasn't useful in its retrieval calls. If I'm agentically
You can try building a toy harness to test this. I've found removing tokens from a transcript can confuse Claude sometimes. The particular point about search is answered via subagents.
Great as always, 👏 Could you please elaborate on rewind and prompt caching? Let's say I have a 10+ turn debugging conversation and then I want to call rewind because the current turn happened to be a dead end. The next message would then be a cache miss, wouldn't it?
No, it stays as a prompt cache hit! I didn't want to talk about prompt caching here because I felt the important thing was just talking about the effectiveness of context.
I’m trying to figure out why you guys even shipped 1m context window in the first place, literally agentic Hiroshima
it makes longer, autonomous tasks way better, autocompacting in the middle of a task is brutal
On remote sessions /clear doesn’t sync to the app UI, so all transcript history piles up. After each app cold start it has to reload the entire transcript. We’re talking minutes of loading just to get back in.
context rot is just the model's way of saying it forgot what you asked for after three coffees. guess /rewind is the new ctrl+z for when your brain turns to mush at 300k tokens
This is great information, and really makes it clear why it's called context engineering. I created a skill that saves your current session state to the shared knowledge layer before context compaction. It diffs what changed, updates shared files (CONTEXT.md, TASKS.md, memory
Shouldn't the harness transparently handle all these quirks? Other harnesses handle stuff like this like champs, without affecting end users or requiring them to manage extremely customized setups (which at some point are not even possible, i.e. managed subscriptions).
Based on your post, I’ve been trying something I call proactive compacting because I hate waiting for a compaction to complete mid-task. My Claude Code IDE checks for inactive sessions (ones I’m not interacting with right now) and compacts them if they are above 60%. Works nicely
Can you please confirm which parts of my chat history are kept and which are excluded when I use /compact? I mean, what approach does Claude take to /compact?
The distinction between simply continuing and knowing when to use /rewind or /clear is exactly what separates a smooth workflow from one that gets bogged down in context rot. The point about documentation being a good candidate for carrying over context is a helpful nuance.
It's frustrating that after making a plan, the option to clear the context and bypass permissions is gone. That was a perfect way to clear out the context.
Does /clear reload claude, settings, and memories? I find myself opening a fresh window when using multiple sessions to make sure it reloads.. no idea if it’s needed.
Didn't know about the rewind feature and often wished it existed. From my own usage I can say I almost never use compact, but tend to use clear and subagents a lot. I also think the criticism of quality and rate limits comes from people not managing their context properly.
I think that some context bar at the bottom of the terminal screen will help a lot and improve visibility of when to use clear or compact commands
What are your thoughts on telling Claude to summarize the conversation on its memory and then clean? That is generally the approach I follow when starting a new session that kind of depends on the previous one, I made a skill for that too
Love it. But I’m trying to work more from the Desktop app after recent major update, and it’s frustrating that Desktop keeps lagging behind on core Code features like /rewind. Is full session rewind / checkpointing coming to Claude Desktop Code as well?
Why not default to agent teams over subagents? A teammate can act like a disposable subagent, or stay alive and reuse its own context later. Wouldn't that keep the main session smaller and reduce context bloat?
This is a good read for context management but couldn’t we just use a tool to handle all these things for every message and conversation? Or perhaps haiku could be a better solution.
Great article! FYI the blog version doesn't have the new introduction. I really appreciate these sorts of tips. Everything in it was very understandable except for the "summarize from here" paragraph - I had to look that one up in the online docs.
Few understand this at the level they need to be successful. Sadly you may need to gate 1M for humans that are aware. The last few months have undoubtedly been hell for you due to this and it’s only going to get worse. The manifestation of this as omg quantization/etc is obscene
I have to spend 200k of the 1M just setting up all my anti-patterns to stop Material Design from ruining my project and deprogram CC of tech lead scope dilution. I guess I need to complain because convergence is getting worse over time and the product is losing the tail.
Genuine question: I feel like I under-use subagents. The "need the answer, not the tool output" test makes sense in theory, but in practice I default to just continuing in the main session. What's the signal you use to catch "this is a subagent moment"? Before starting the task,
This makes it really practical and useful - should result in much better success rates on long tasks by treating each response as a decision branch rather than passive continuation
Why would you even not want to use subagents for complex tasks that have multiple subtasks? For me it seems to work quite well to have Claude Opus with the 1M context window as orchestrator and basically call subagents all the time. Kind of like an engineering manager and dev team.
One thing that's really helped me (rather than compact or clear): 1. get handoff summary/docs 2. start a fresh session 3. leave the first one open 4. ask the new chat if it has context questions or if anything else would be useful 5. get more context handed off 6. off to the races
Most teams will just keep hitting continue until something breaks. Rewind and proactive compaction are the right patterns, but they assume users are actively managing context, which breaks down as sessions get longer and tasks get more complex.
Usage of subagents is highly relevant for doing research, testing and so many other tasks that we do in between without disturbing the main context!! Excellent read
Thanks... but guys sorry... Claude just got lazy and dumb. It just admitet to me that it did do the whole task. And then lied to because " I was to pushy" with it. Then i ask ok pls show me where did a push you do befaster. It admit do made it up. Sorry but WTF is going on.
Pretty cool stuff—are you guys working on having Claude manage the context window smartly somehow? Seems like a cool problem to tackle, especially considering how easy it is to just go rogue and continue prompting.
What's weird is that for some reason I can't apparently make CC use 1M. I've tried CLAUDE_CODE_DISABLE_1M_CONTEXT=0, and all sorts of combinations, checked settings.json, you name it. It keeps it at 200k since this morning.
In other words, the reason Claude Code works badly for you lately is that you think 150K tokens in input is only 15% of capacity and there's plenty of space, while you're already in dumb-zone
Wonderful details. I always check /context to see how much I have used, then decide whether to use compact (if I am still working on a similar feature in the code) or clear (if I am starting a new code flow). But /rewind is Thor's hammer; I never thought of it or used it.
Really great post, thank you! A bit of feedback for the future: please consider how/where to publish things like this which has as low friction as possible to share with agents. Like, a link to the same text in raw markdown in an url which is easy to reach (x is not)
Or just move to Codex like I did. Sorry Thariq, Codex just does more for what it offers. This comes from an ex-$400 a month Claude user to a new $200 a month codex user.
In "Every Turn Is a Branching Point", there can be another powerful option: fanning out into multiple forks. Of course, that's a contentious and advanced option, to be used with cautious workflow design.
Been using CC since it came out so i'm very cautious of context windows and have developed clean workflows and a deep understanding for how to get the most out of each model / session. These little details will help tremendously on top of what I already know.
Thanks for this... I run my implementations with phases (say up to 10) and I tend to start a new context window every 3 phases or so. So after phase 3, I'll say, update the plan so I can start a new session and continue with phase 4. What's the best practice here?
This is really helpful. I often find myself asking Claude if he’s doing ok on context. He seems to be comfortable pushing pretty far. But the more I use it the more comfortable I get with managing this.
Thanks for sharing! I have a question: there were lots of discussions around your teams that don't write code manually anymore, create MRs via phones, etc. Does it mean everyone is babysitting the context each turn? Or if you prefer a fully autonomous mode, how the above rules
Thanks Thariq. Always appreciate when you & the team share insights like these. rewind feels immediately useful.
It would be cool to be able to drill into subagent sessions and potentially ask follow up questions there, sending updated results back into the parent session.
Why, in the new Claude desktop, do I see my context window based on 200K while I'm using the 1M model?
Appreciate the post and insight on this, but the new CC UI app seems to have a context bloat problem at least comparing with CC in VScode
Quote
Vlad
@valerionxv
With the launch of the new Claude Code desktop UI I thought i'd give it a try. Immediately noticed that the context in the new UI is heavier than working in vscode (even after disabling every plugin, except computer use and chrome dev tools). ~100k in the app and ~50k in vscode
great summary & illustrations. Important concepts that are completely skipped by most of OC users. LLM cache management could also be covered in this article or in a dedicated one, don’t you think?
Very useful post, thanks! Would you be able at some point to implement automatic rewind when dialog gets into state "no, this was wrong, let's do it other way"? Perhaps a tool that Claude can call upon detection of "user said we went wrong direction" and somehow identify the
Good read! Although it's a bit strange to be warned about high subagent usage when that's just Claude spinning up its own Explore agents :)
I need Claude to effectively manage the usage complexity of…wait for it…Claude! How about new skills to do this for us based on prompts covering usage scenarios?
i use fork after a read. init a session. then fork sessions to do work which require same set of file reads. ask to summarize related modules save these for lot of future work.