Reasoning tokens appear in the `reasoning` field of each message, unless you decide to exclude them.
Some reasoning models do not return their reasoning tokens: While most models and providers make reasoning tokens available in the response, some (like the OpenAI o-series and Gemini Flash Thinking) do not.
Controlling Reasoning Tokens
You can control reasoning tokens in your requests using the `reasoning` parameter. The `reasoning` config object consolidates settings for controlling reasoning strength across different models. See the Note for each option below to see which models are supported and how other models will behave.
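For orientation, here is a minimal sketch of a request body that carries the `reasoning` object (the model slug and prompt are illustrative; use only one of `effort` or `max_tokens`):

```python
# Sketch of a chat completion request body with the unified reasoning config.
payload = {
    "model": "openai/o3-mini",  # any reasoning-capable model slug
    "messages": [{"role": "user", "content": "Explain quantum entanglement briefly."}],
    "reasoning": {
        "effort": "high",        # OpenAI-style: "high" | "medium" | "low"
        # "max_tokens": 2000,    # Anthropic/Gemini-style token budget (use instead of effort)
        "exclude": False,        # set True to hide reasoning tokens from the response
    },
}
```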
Max Tokens for Reasoning
Currently supported by:

- Gemini thinking models
- Anthropic reasoning models (by using the `reasoning.max_tokens` parameter)
- Some Alibaba Qwen thinking models, where `reasoning.max_tokens` is available (mapped to `thinking_budget`)

- `"max_tokens": 2000` - Directly specifies the maximum number of tokens to use for reasoning

For models that only support `reasoning.effort` (see below), the `max_tokens` value will be used to determine the effort level.
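For example, a payload that budgets roughly 2000 reasoning tokens might look like this sketch (model slug illustrative):

```python
payload = {
    "model": "anthropic/claude-3.7-sonnet",  # illustrative slug
    "messages": [{"role": "user", "content": "Summarize the proof sketch."}],
    "reasoning": {"max_tokens": 2000},       # direct reasoning-token budget
}
```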
Reasoning Effort Level
Supported models: Currently supported by OpenAI reasoning models (o1 series, o3 series, GPT-5 series) and Grok models.

- `"effort": "high"` - Allocates a large portion of tokens for reasoning (approximately 80% of max_tokens)
- `"effort": "medium"` - Allocates a moderate portion of tokens (approximately 50% of max_tokens)
- `"effort": "low"` - Allocates a smaller portion of tokens (approximately 20% of max_tokens)

For models that only support `reasoning.max_tokens`, the effort level will be set based on the percentages above.
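For example, a payload using an effort level might look like this sketch (model slug illustrative):

```python
payload = {
    "model": "openai/o3-mini",        # illustrative slug
    "messages": [{"role": "user", "content": "Plan a three-step experiment."}],
    "reasoning": {"effort": "low"},   # "high" | "medium" | "low"
}
```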
Excluding Reasoning Tokens
If you want the model to use reasoning internally but not include it in the response:

- `"exclude": true` - The model will still use reasoning, but it won't be returned in the `reasoning` field of each message
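For example (sketch, model slug illustrative):

```python
payload = {
    "model": "openai/o3-mini",
    "messages": [{"role": "user", "content": "What is 27 * 43?"}],
    "reasoning": {"effort": "medium", "exclude": True},  # reason internally, omit from output
}
```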
Enable Reasoning with Default Config
To enable reasoning with the default parameters:

- `"enabled": true` - Enables reasoning at the "medium" effort level with no exclusions
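For example (sketch, model slug illustrative):

```python
payload = {
    "model": "openai/gpt-5-mini",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "reasoning": {"enabled": True},  # defaults: medium effort, nothing excluded
}
```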
Legacy Parameters
For backward compatibility, OpenRouter still supports the following legacy parameters:

- `include_reasoning: true` - Equivalent to `reasoning: {}`
- `include_reasoning: false` - Equivalent to `reasoning: { exclude: true }`

However, we recommend using the new unified `reasoning` parameter for better control and future compatibility.
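Expressed side by side (a sketch of the equivalence, not additional options):

```python
# Legacy flag -> unified equivalent
legacy_on = {"include_reasoning": True}    # same as {"reasoning": {}}
legacy_off = {"include_reasoning": False}  # same as {"reasoning": {"exclude": True}}
```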
Examples
Basic Usage with Reasoning Tokens
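Below is a minimal sketch using the `requests` library against the standard OpenRouter chat completions endpoint; the model slug, prompt, and effort level are illustrative.

```python
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "openai/o3-mini",
        "messages": [{"role": "user", "content": "How would you build the world's tallest skyscraper?"}],
        "reasoning": {"effort": "high"},
    },
)

message = response.json()["choices"][0]["message"]
print("REASONING:", message.get("reasoning"))
print("CONTENT:", message["content"])
```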
Using Max Tokens for Reasoning
For models that support direct token allocation (like Anthropic models), you can specify the exact number of tokens to use for reasoning:
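A sketch of such a request (the token budget and model slug are illustrative):

```python
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "anthropic/claude-3.7-sonnet",
        "messages": [{"role": "user", "content": "What's the most efficient sorting algorithm for nearly-sorted data?"}],
        "reasoning": {"max_tokens": 2000},  # direct reasoning-token budget
    },
)

message = response.json()["choices"][0]["message"]
print("REASONING:", message.get("reasoning"))
print("CONTENT:", message["content"])
```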
Excluding Reasoning Tokens from Response

If you want the model to use reasoning internally but not include it in the response:
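A sketch of such a request (model slug and prompt are illustrative):

```python
import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "openai/o3-mini",
        "messages": [{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
        "reasoning": {"effort": "high", "exclude": True},  # reason internally only
    },
)

message = response.json()["choices"][0]["message"]
# The reasoning field should be absent or empty because exclude is set.
print(message["content"])
```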
Advanced Usage: Reasoning Chain-of-Thought

This example shows how to use reasoning tokens in a more complex workflow. It injects one model's reasoning into another model to improve its response quality:
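One way to wire this up is sketched below: request reasoning from one model, then inject that reasoning into the prompt sent to a second model. Both model slugs and the prompt template are illustrative choices, not fixed requirements.

```python
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <OPENROUTER_API_KEY>"}
question = "Which is bigger: 9.11 or 9.9?"

# Step 1: get reasoning tokens from a reasoning model.
r1 = requests.post(URL, headers=HEADERS, json={
    "model": "deepseek/deepseek-r1",                    # illustrative reasoning model
    "messages": [{"role": "user", "content": question}],
    "reasoning": {"effort": "high"},
})
reasoning = r1.json()["choices"][0]["message"].get("reasoning", "")

# Step 2: inject that reasoning into a second model's prompt.
injected = f"{question}\n\nPossible reasoning to consider:\n{reasoning}\n\nAnswer concisely."
r2 = requests.post(URL, headers=HEADERS, json={
    "model": "openai/gpt-4o-mini",                      # illustrative target model
    "messages": [{"role": "user", "content": injected}],
})
print(r2.json()["choices"][0]["message"]["content"])
```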
Provider-Specific Reasoning Implementation

Anthropic Models with Reasoning Tokens
The latest Claude models, such as anthropic/claude-3.7-sonnet, support working with and returning reasoning tokens. You can enable reasoning on Anthropic models only by using the unified `reasoning` parameter with either `effort` or `max_tokens`.
Note: The `:thinking` variant is no longer supported for Anthropic models. Use the `reasoning` parameter instead.
Reasoning Max Tokens for Anthropic Models
When using Anthropic models with reasoning:

- When using the `reasoning.max_tokens` parameter, that value is used directly with a minimum of 1024 tokens.
- When using the `reasoning.effort` parameter, the budget_tokens are calculated based on the `max_tokens` value.
budget_tokens = max(min(max_tokens * {effort_ratio}, 32000), 1024)
effort_ratio is 0.8 for high effort, 0.5 for medium effort, and 0.2 for low effort.
Important: `max_tokens` must be strictly higher than the reasoning budget to ensure there are tokens available for the final response after thinking.
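Expressed as a small helper (a sketch of the formula above, with the illustrative function name `anthropic_budget_tokens`):

```python
def anthropic_budget_tokens(max_tokens: int, effort: str) -> int:
    """Reasoning budget derived from reasoning.effort, per the formula above."""
    effort_ratio = {"high": 0.8, "medium": 0.5, "low": 0.2}[effort]
    return int(max(min(max_tokens * effort_ratio, 32000), 1024))

print(anthropic_budget_tokens(10000, "medium"))  # 5000
```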
Token Usage and Billing: Please note that reasoning tokens are counted as output tokens for billing purposes. Using reasoning tokens will increase your token usage but can significantly improve the quality of model responses.
Examples with Anthropic Models
Example 1: Streaming mode with reasoning tokens
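Below is a sketch of streaming with `requests` and manual SSE parsing; the reasoning budget and model slug are illustrative, and the `delta.reasoning` / `delta.content` fields follow the streaming shape described later on this page.

```python
import json
import requests

with requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "anthropic/claude-3.7-sonnet",
        "messages": [{"role": "user", "content": "What's bigger: 9.9 or 9.11?"}],
        "reasoning": {"max_tokens": 2000},
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip keep-alive comments and blank lines
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {})
        # Reasoning tokens stream in the delta before the final answer tokens.
        if delta.get("reasoning"):
            print(delta["reasoning"], end="", flush=True)
        if delta.get("content"):
            print(delta["content"], end="", flush=True)
```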
Preserving Reasoning Blocks
Model Support: The reasoning_details are currently returned by all OpenAI reasoning models (o1 series, o3 series, GPT-5 series) and all Anthropic reasoning models (Claude 3.7, Claude 4, and Claude 4.1 series).
This means you can switch between OpenAI reasoning models (like openai/gpt-5-mini) and Anthropic reasoning models (like anthropic/claude-sonnet-4) without changing your code structure.
If you want to pass reasoning back in context, you must pass reasoning blocks back to the API. This is useful for maintaining the modelβs reasoning flow and conversation integrity.
Preserving reasoning blocks is useful specifically for tool calling. When a model like Claude invokes tools, it pauses its construction of a response to await external information. When tool results are returned, the model continues building that existing response. This necessitates preserving reasoning blocks during tool use, for a couple of reasons:
- Reasoning continuity: The reasoning blocks capture the model's step-by-step reasoning that led to tool requests. When you post tool results, including the original reasoning ensures the model can continue its reasoning from where it left off.
- Context maintenance: While tool results appear as user messages in the API structure, they're part of a continuous reasoning flow. Preserving reasoning blocks maintains this conceptual flow across multiple API calls.
Important for Reasoning Models: When providing reasoning_details blocks, the entire sequence of consecutive reasoning blocks must match the outputs generated by the model during the original request; you cannot rearrange or modify the sequence of these blocks.
Responses API Shape
When reasoning models generate responses, the reasoning information is structured in a standardized format through the `reasoning_details` array. This section documents the API response structure for reasoning details in both streaming and non-streaming responses, based on the schema definitions in the llm-interfaces package.
reasoning_details Array Structure
The `reasoning_details` field contains an array of reasoning detail objects. Each object in the array represents a specific piece of reasoning information and follows one of three possible types. The location of this array differs between streaming and non-streaming responses:

- Non-streaming responses: `reasoning_details` appears in `choices[].message.reasoning_details`
- Streaming responses: `reasoning_details` appears in `choices[].delta.reasoning_details` for each chunk
Common Fields
All reasoning detail objects share these common fields:

- `id` (string | null): Unique identifier for the reasoning detail
- `format` (string): The format of the reasoning detail, with possible values:
  - `"unknown"` - Format is not specified
  - `"openai-responses-v1"` - OpenAI responses format version 1
  - `"anthropic-claude-v1"` - Anthropic Claude format version 1 (default)
- `index` (number, optional): Sequential index of the reasoning detail
Reasoning Detail Types
1. Summary Type (`reasoning.summary`)

Contains a high-level summary of the reasoning process:
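A sketch of what such an entry can look like (the `type` discriminator and the `summary` payload field are assumptions for illustration; `id`, `format`, and `index` are the common fields listed above):

```python
summary_detail = {
    "type": "reasoning.summary",
    "summary": "The model compared both options and chose the cheaper one.",  # assumed field name
    "id": "rs_abc123",
    "format": "openai-responses-v1",
    "index": 0,
}
```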
2. Encrypted Type (`reasoning.encrypted`)

Contains encrypted reasoning data that may be redacted or protected:
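A sketch of what such an entry can look like (the `data` payload field name is an assumption for illustration):

```python
encrypted_detail = {
    "type": "reasoning.encrypted",
    "data": "gAAAAB...",  # assumed field name for the encrypted or redacted payload
    "id": "rs_def456",
    "format": "anthropic-claude-v1",
    "index": 0,
}
```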
3. Text Type (`reasoning.text`)

Contains raw text reasoning with optional signature verification:
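A sketch of what such an entry can look like (the `text` and `signature` payload field names are assumptions for illustration):

```python
text_detail = {
    "type": "reasoning.text",
    "text": "First, compare the tenths digit: 9 > 1, so 9.9 > 9.11.",
    "signature": None,  # assumed optional signature field
    "id": "rs_ghi789",
    "format": "anthropic-claude-v1",
    "index": 0,
}
```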
Response Examples
Non-Streaming Response
In non-streaming responses, `reasoning_details` appears in the message:
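A trimmed sketch of such a response body (all values are illustrative):

```python
non_streaming_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "9.9 is bigger than 9.11.",
                "reasoning": "Compare the tenths digit: 9 > 1, so 9.9 > 9.11.",
                "reasoning_details": [
                    {
                        "type": "reasoning.text",
                        "text": "Compare the tenths digit: 9 > 1, so 9.9 > 9.11.",
                        "format": "anthropic-claude-v1",
                        "index": 0,
                    }
                ],
            },
            "finish_reason": "stop",
        }
    ]
}
```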
Streaming Response
In streaming responses, `reasoning_details` appears in delta chunks as the reasoning is generated:
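A trimmed sketch of a single streamed chunk carrying reasoning (all values are illustrative):

```python
streaming_chunk = {
    "choices": [
        {
            "delta": {
                "reasoning": "Compare the tenths digit...",
                "reasoning_details": [
                    {
                        "type": "reasoning.text",
                        "text": "Compare the tenths digit...",
                        "format": "anthropic-claude-v1",
                        "index": 0,
                    }
                ],
            }
        }
    ]
}
```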
- Each reasoning detail chunk is sent as it becomes available
- The `reasoning_details` array in each chunk may contain one or more reasoning objects
- For encrypted reasoning, the content may appear as `[REDACTED]` in streaming responses
- The complete reasoning sequence is built by concatenating all chunks in order
Example: Preserving Reasoning Blocks with OpenRouter and Claude
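Below is a sketch of the round trip: make a tool-enabled request, then send the assistant message back unchanged, including its reasoning_details, together with the tool result. The `get_weather` tool, its stubbed result, and the model slug are illustrative assumptions.

```python
import json
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <OPENROUTER_API_KEY>"}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Should I bring an umbrella in Tokyo today?"}]

# First call: the model reasons, then requests a tool.
first = requests.post(URL, headers=HEADERS, json={
    "model": "anthropic/claude-sonnet-4",
    "messages": messages,
    "tools": tools,
    "reasoning": {"max_tokens": 2000},
}).json()
assistant_msg = first["choices"][0]["message"]

# Preserve the assistant message as-is, including reasoning_details, so the
# model can resume its reasoning when it sees the tool result.
messages.append(assistant_msg)

for call in assistant_msg.get("tool_calls", []):
    args = json.loads(call["function"]["arguments"])
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps({"city": args["city"], "forecast": "light rain"}),  # stubbed result
    })

# Second call: same tools, full history (with reasoning_details) passed back unmodified.
second = requests.post(URL, headers=HEADERS, json={
    "model": "anthropic/claude-sonnet-4",
    "messages": messages,
    "tools": tools,
    "reasoning": {"max_tokens": 2000},
}).json()
print(second["choices"][0]["message"]["content"])
```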