Extended Thinking
Definition
Extended Thinking is Claude's ability to engage in visible step-by-step reasoning before responding, using additional compute time to work through complex problems systematically.
Why It Matters
Complex problems require deliberate reasoning. Working through a difficult question step by step produces better answers than rushing to a conclusion. Extended Thinking applies this principle to LLMs, giving the model time and space to reason before answering.
The key insight: LLMs perform better when they “think aloud” before responding. Extended Thinking makes this explicit, allocating dedicated tokens for reasoning that the user can observe. This improves accuracy on math, coding, analysis, and multi-step problems.
For AI engineers, Extended Thinking represents a shift in how we use LLMs. Instead of optimizing for minimal tokens, we deliberately allocate compute for reasoning. The trade-off (more tokens and latency for better answers) is often worthwhile for complex tasks.
How It Works
Extended Thinking modifies the generation process:
1. Thinking Phase: Before generating the visible response, the model produces a “thinking” block. This contains explicit reasoning steps, working through the problem systematically.
2. Controlled Token Budget: You can specify how many tokens the model may allocate to thinking. A larger budget allows deeper reasoning but increases cost and latency.
3. Visible Reasoning: Unlike hidden chain-of-thought, Extended Thinking exposes the reasoning process. Users can see how the model approached the problem, making outputs more interpretable.
4. Response Generation: After thinking, the model generates its final answer, informed by the reasoning it just performed (the sketch after this list shows the end-to-end flow).
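For concreteness, here is a minimal sketch of this flow using the Anthropic Python SDK. The model name, token budget, and prompt are illustrative placeholders; see the Source link below for current parameter details.

```python
# Minimal sketch: enable a thinking budget, then read back the thinking
# and text blocks described above. Model name and budgets are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",        # assumed model with Extended Thinking support
    max_tokens=16000,                        # caps the whole output; must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},  # tokens reserved for the thinking phase
    messages=[{"role": "user", "content": "Is 9.11 greater than 9.9? Explain carefully."}],
)

# The response contains "thinking" blocks (visible reasoning) followed by
# "text" blocks (the final answer).
for block in response.content:
    if block.type == "thinking":
        print("REASONING:\n", block.thinking)
    elif block.type == "text":
        print("ANSWER:\n", block.text)
```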
Implementation Basics
Using Extended Thinking effectively:
API Parameters: Enable Extended Thinking via the API's thinking parameter (as in the sketch above). Specify a budget limit to control cost and latency.
When to Use: Enable it for complex analytical tasks such as multi-step math, debugging code, strategic planning, and comparing options. Skip it for simple queries where immediate answers suffice.
Prompt Design: Frame problems to benefit from deliberation. “Analyze the trade-offs between X and Y” invites extended reasoning; “What is the capital of France?” doesn’t need it.
Streaming Considerations: Thinking tokens can stream separately from the response, letting users see the reasoning process unfold. Consider the UX implications for your application.
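A hedged sketch of streaming with thinking enabled, assuming the Messages streaming API: thinking deltas arrive before the answer text, so a UI can render the reasoning as it unfolds. Model, budget, and prompt are placeholders.

```python
# Streaming sketch: thinking_delta events carry the reasoning stream,
# text_delta events carry the final answer.
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Plan a migration from REST to gRPC."}],
) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)  # reasoning as it unfolds
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)      # final answer
```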
Cost Management: Thinking tokens count toward usage. For production systems, implement logic that enables Extended Thinking selectively based on query complexity.
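One way to do this, sketched below with a purely hypothetical keyword heuristic: route only queries that look complex to Extended Thinking, and omit the thinking parameter otherwise. A real system might use a classifier, a routing model, or a user-facing toggle instead.

```python
# Hypothetical routing sketch: the markers, word-count threshold, and budget
# are illustrative, not recommended values.
COMPLEX_MARKERS = ("prove", "debug", "trade-off", "plan", "compare", "analyze")

def thinking_config(query: str) -> dict | None:
    """Return a thinking configuration for complex-looking queries, else None."""
    looks_complex = len(query.split()) > 40 or any(m in query.lower() for m in COMPLEX_MARKERS)
    if looks_complex:
        return {"type": "enabled", "budget_tokens": 8000}
    return None

# Usage sketch: only pass the parameter when the heuristic fires.
# kwargs = {"thinking": cfg} if (cfg := thinking_config(user_query)) else {}
```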
Evaluation: Compare outputs with and without Extended Thinking on your specific tasks. The benefit varies: some tasks see significant improvement, others minimal gain.
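A sketch of such a comparison is below. The task list and the commented-out grade() function are hypothetical stand-ins for your own dataset and scoring criteria; only the thinking parameter differs between the two runs.

```python
# With/without comparison sketch. Model name, budget, and tasks are placeholders.
import anthropic

client = anthropic.Anthropic()
tasks = ["Find the bug in this function: ...", "Compare caching strategies for ..."]

def run(prompt: str, use_thinking: bool) -> str:
    kwargs = {"thinking": {"type": "enabled", "budget_tokens": 8000}} if use_thinking else {}
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    # Keep only the final-answer text blocks, skipping thinking blocks.
    return "".join(b.text for b in response.content if b.type == "text")

for task in tasks:
    baseline = run(task, use_thinking=False)
    with_thinking = run(task, use_thinking=True)
    # Score both however your task demands (exact match, rubric, human review):
    # print(task, grade(baseline), grade(with_thinking))
```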
Extended Thinking is one instance of the broader trend of test-time compute scaling: allocating more resources at inference time for better results, rather than only scaling training.
Source
Extended thinking allows Claude to show its reasoning process, improving performance on complex tasks requiring multi-step analysis.
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking