Usage Information
When enabled, the API will return detailed usage information, including:
- Prompt and completion token counts using the model’s native tokenizer
- Cost in credits
- Reasoning token counts (if applicable)
- Cached token counts (if available)
Enabling Usage Accounting
You can enable usage accounting in your requests by including the `usage` parameter:
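For illustration, here is a minimal Python sketch of a chat completion request with usage accounting enabled. The endpoint URL, model name, and the exact shape of the `usage` field (shown here as `{"include": true}`) are assumptions rather than confirmed details from this page; substitute the values from the API reference.

```python
import requests

# Sketch only: the URL, the model name, and the exact shape of the `usage`
# field ({"include": True}) are illustrative assumptions.
response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "usage": {"include": True},  # request detailed usage accounting
    },
)
print(response.json()["usage"])
```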
Response Format
When usage accounting is enabled, the response will include a `usage` object with detailed token information:
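For illustration, the object might look something like the sketch below. The nested field names and the values shown are assumptions rather than a guaranteed schema; only `cost` and `cached_tokens` are described on this page.

```python
# Illustrative shape only; field names beyond `cost` and `cached_tokens`,
# and all values, are assumptions.
usage = {
    "prompt_tokens": 194,        # prompt tokens, counted with the model's native tokenizer
    "completion_tokens": 64,     # completion tokens
    "total_tokens": 258,
    "cost": 0.00021,             # credits charged for this request
    "prompt_tokens_details": {
        "cached_tokens": 0,      # tokens read from the cache (if available)
    },
    "completion_tokens_details": {
        "reasoning_tokens": 0,   # reasoning tokens (if applicable)
    },
}
```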
`cached_tokens` is the number of tokens that were read from the cache. At this time, we do not support retrieving the number of tokens that were written to the cache.
Cost Breakdown
The usage response includes detailed cost information:
- `cost`: The total amount charged to your account
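As a sketch, you can read this field from each response to keep a running total of credits spent. The `usage.cost` path is assumed to match the illustrative response shape above.

```python
def total_credits(responses: list[dict]) -> float:
    """Sum the credits charged across a list of parsed API responses,
    assuming each carries a top-level `usage.cost` field as sketched above."""
    return sum(r.get("usage", {}).get("cost", 0.0) for r in responses)
```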
Performance Impact
Enabling usage accounting will add a few hundred milliseconds to the last response as the API calculates token counts and costs. This only affects the final message and does not impact overall streaming performance.
Benefits
- Efficiency: Get usage information without making separate API calls
- Accuracy: Token counts are calculated using the model’s native tokenizer
- Transparency: Track costs and cached token usage in real-time
- Detailed Breakdown: Separate counts for prompt, completion, reasoning, and cached tokens
Best Practices
- Enable usage tracking when you need to monitor token consumption or costs
- Account for the slight delay in the final response when usage accounting is enabled
- Consider implementing usage tracking in development to optimize token usage before production
- Use the cached token information to optimize your application’s performance (see the sketch below)
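For example, here is a small sketch of monitoring how much of each prompt is served from the cache, assuming the `prompt_tokens` and `prompt_tokens_details.cached_tokens` fields sketched in the Response Format section.

```python
def cache_hit_ratio(usage: dict) -> float:
    """Fraction of prompt tokens read from the cache for a single request,
    assuming the illustrative `usage` fields sketched earlier."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return cached / prompt_tokens if prompt_tokens else 0.0
```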