The Anannas Responses API provides a unified interface for building advanced AI agents capable of executing complex tasks autonomously. This API is compatible with OpenAI’s Responses API format and supports multimodal inputs, reasoning capabilities, and seamless tool integration.
Stateless API Implementation: This API implements the stateless version of the Responses API. Unlike OpenAI's stateful Responses API, Anannas does not maintain conversation state between requests. Each request is completely independent, and you must include the full conversation history in every request.
Beta API: This API is in beta and may introduce breaking changes. Use with caution in production environments.
Base URL
https://api.anannas.ai/api/v1/responses
Authentication
All requests require authentication using your Anannas API key:
import requests

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={
        "Authorization": "Bearer <ANANNAS_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-5-mini",
        "input": "Hello, world!",
    },
)
Core Features
Stateless Design: Each request is independent; you manage conversation history client-side
Multimodal Support: Handle various input types, including text, images, and audio
Reasoning Capabilities: Access advanced reasoning with configurable effort levels
Tool Integration: Utilize function calling with support for parallel execution
Streaming Support: Receive responses in real-time as they're generated
Managing Conversation History: Since this API is stateless, you must include the complete conversation history in each request. Include all previous user messages and assistant responses in the input array to maintain context, as in the sketch below.
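For example, a minimal sketch of a second-turn request that replays the first exchange client-side. It assumes assistant turns are replayed using the same output_text content type they arrive with; the structured message format is documented below:

import requests

history = [
    {"type": "message", "role": "user",
     "content": [{"type": "input_text", "text": "What is the capital of France?"}]},
    # Replay the assistant's earlier reply verbatim so the model keeps context.
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "The capital of France is Paris."}]},
    # The new turn goes last.
    {"type": "message", "role": "user",
     "content": [{"type": "input_text", "text": "What is its population?"}]},
]

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={"Authorization": "Bearer <ANANNAS_API_KEY>",
             "Content-Type": "application/json"},
    json={"model": "openai/gpt-5-mini", "input": history},
)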
The Responses API uses a structured request format with an input array containing conversation messages:
type ResponsesRequest = {
  // Required
  model: string;
  input: string | ResponsesMessage[];

  // Optional
  instructions?: string;
  response_format?: { type: 'json_object' };
  metadata?: { [key: string]: string };
  temperature?: number;
  max_output_tokens?: number;
  stream?: boolean;

  // Tool calling
  tools?: Tool[];
  tool_choice?: 'auto' | 'none' | { type: 'function'; name: string };
  parallel_tool_calls?: boolean;

  // Advanced parameters
  top_p?: number;
  top_k?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  stop?: string[];
  seed?: number;

  // Reasoning
  reasoning?: {
    effort?: 'minimal' | 'low' | 'medium' | 'high';
    max_tokens?: number;
    exclude?: boolean;
  };

  // Anannas-specific
  provider?: ProviderPreferences;
  modalities?: string[];
  audio?: AudioConfig;
  mcp?: MCPConfig;
  prompt_cache_key?: string;
};
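As an illustrative sketch, a request that exercises a few of the optional fields (the prompt and parameter values are made up, not recommendations):

import requests

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={"Authorization": "Bearer <ANANNAS_API_KEY>",
             "Content-Type": "application/json"},
    json={
        "model": "openai/gpt-5-mini",
        "input": "Summarize the plot of Hamlet in two sentences.",
        "instructions": "You are a terse literary assistant.",
        "temperature": 0.2,
        "max_output_tokens": 200,
        "reasoning": {"effort": "low"},
    },
)
print(response.json()["output"])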
The Responses API returns a structured response with an output array:
type ResponsesResponse = {
  id: string;
  object: 'response';
  model: string;
  created: number;
  output: ResponsesOutputItem[];
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
  metadata?: { [key: string]: any };
};
Example Response:
{
  "id": "resp_abc123",
  "object": "response",
  "model": "openai/gpt-5-mini",
  "created": 1693350000,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! How can I help you today?"
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
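Continuing the Python example from Authentication, a short sketch that collects the assistant's text from this structure, iterating over all message items and their output_text parts:

data = response.json()

# Gather text from every output_text part across all message items.
text_parts = [
    part["text"]
    for item in data["output"]
    if item["type"] == "message"
    for part in item["content"]
    if part["type"] == "output_text"
]
print("".join(text_parts))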
The input field accepts either a simple string or an array of message objects:
Simple String Input:
{
  "model": "openai/gpt-5-mini",
  "input": "What is the capital of France?"
}
Structured Message Input:
{
  "model": "openai/gpt-5-mini",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "What is the capital of France?"
        }
      ]
    }
  ]
}
The output array contains one or more output items. Each item can be:
Message: Text response from the model
Function Call: Tool/function invocation
Error: Error information
Message Output:
{
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "output_text",
      "text": "The capital of France is Paris."
    }
  ]
}
Function Call Output:
{
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "function_call",
      "function_call": {
        "id": "call_abc123",
        "name": "get_weather",
        "arguments": "{\"location\": \"Paris\"}"
      }
    }
  ]
}
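Putting the pieces together, here is a hedged sketch of one tool-calling round trip: declare a tool, detect the function_call part in the output, and run the function locally. The get_weather tool and its schema shape (following OpenAI's function-tool convention) are illustrative assumptions, not a documented contract:

import json
import requests

URL = "https://api.anannas.ai/api/v1/responses"
HEADERS = {"Authorization": "Bearer <ANANNAS_API_KEY>",
           "Content-Type": "application/json"}

# Hypothetical tool definition; the exact Tool schema is provider-defined.
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

resp = requests.post(URL, headers=HEADERS, json={
    "model": "openai/gpt-5-mini",
    "input": "What's the weather in Paris?",
    "tools": tools,
}).json()

# Look for function_call parts in the output and execute them locally.
for item in resp["output"]:
    for part in item.get("content", []):
        if part.get("type") == "function_call":
            call = part["function_call"]
            args = json.loads(call["arguments"])
            result = f"Sunny, 22°C in {args['location']}"  # stand-in for a real lookup
            # To continue, append the call and its result to the conversation
            # history and POST again; the exact result-message shape is
            # provider-specific.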
Prompt Caching
Check Caching Support: To see which models support prompt caching, and for current pricing, visit anannas.ai/models.
Prompt caching allows you to reduce costs and latency by reusing cached prompt prefixes. Anannas supports two caching methods depending on the provider:
OpenAI Models (prompt_cache_key)
For OpenAI models, use the prompt_cache_key parameter:
{
  "model": "openai/gpt-5-mini",
  "input": "Hello, world!",
  "prompt_cache_key": "my-cache-key-123"
}
Pricing:
Cache reads: Cached input tokens are billed at 50% of the original input token price
Cache writes: No additional cost for creating the cache
How it works:
First request with a prompt_cache_key creates the cache
Subsequent requests with the same key reuse the cached prefix
Cache is automatically managed by the provider
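A minimal sketch, assuming a long shared prefix: reuse the same prompt_cache_key on every request that shares the prefix, so later calls read the cached tokens at the discounted rate. The key name and prefix below are illustrative:

import requests

HEADERS = {"Authorization": "Bearer <ANANNAS_API_KEY>",
           "Content-Type": "application/json"}

# Illustrative long, static prefix shared across requests.
LONG_PREFIX = "You are a support agent for ExampleCorp. <long policy text>"

def ask(question: str) -> dict:
    # Same cache key on every call, so the shared prefix is cached after the
    # first request and read back at the discounted rate on later ones.
    return requests.post(
        "https://api.anannas.ai/api/v1/responses",
        headers=HEADERS,
        json={
            "model": "openai/gpt-5-mini",
            "input": LONG_PREFIX + "\n\n" + question,
            "prompt_cache_key": "examplecorp-support-v1",
        },
    ).json()

first = ask("How do I reset my password?")   # creates the cache
second = ask("What is your refund policy?")  # reuses the cached prefix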
Anthropic Models (cache_control)
For Anthropic Claude models, use the cache_control object in message content:
{
  "model": "anthropic/claude-sonnet-4",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Your system instructions here...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "5m"
          }
        }
      ]
    }
  ]
}
Pricing:
Cache creation: Cache creation tokens are billed at 1.25x (125%) of the original input token price
Cache reads: Cached input tokens are billed at 0.1x (10%) of the original input token price, a 90% discount
Anthropic-specific requirements:
A maximum of 4 content blocks per request can have cache_control
The cache expires after 5 minutes (TTL: "5m")
cache_control must be added to individual content parts within messages
Only the "ephemeral" cache type is supported
Example with cached system instructions:
{
  "model": "anthropic/claude-sonnet-4",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant. Always be concise.",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "5m"
          }
        },
        {
          "type": "text",
          "text": "What is 2+2?"
        }
      ]
    }
  ]
}
Other Providers
View Caching Support: For complete caching support details by provider and model, check anannas.ai/models.
Grok: Supports caching similar to OpenAI, with cached tokens at 10% of the input price (a 90% discount)
Nebius: Supports caching with provider-specific pricing
TogetherAI: Supports caching with provider-specific pricing
Monitoring Cache Usage
Cache usage is included in the response usage object:
{
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 500,
    "total_tokens": 1500,
    "cache_read_input_tokens": 800,
    "cache_creation_input_tokens": 200
  }
}
cache_read_input_tokens: Number of tokens read from cache (discounted pricing)
cache_creation_input_tokens: Number of tokens used to create the cache (Anthropic only, 1.25x pricing)
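For example, a small sketch that reports cache effectiveness from the usage object; both cache fields may be absent when no caching occurred, hence the .get defaults:

usage = response.json().get("usage", {})

cached = usage.get("cache_read_input_tokens", 0)
created = usage.get("cache_creation_input_tokens", 0)
prompt = usage.get("prompt_tokens", 0)

# Fraction of the prompt served from cache at the discounted rate.
hit_rate = cached / prompt if prompt else 0.0
print(f"prompt tokens: {prompt}, read from cache: {cached} "
      f"({hit_rate:.0%}), written to cache: {created}")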
Error Handling
The Anannas API returns standard HTTP status codes and error responses:
{
  "error": {
    "message": "Invalid request parameters",
    "type": "invalid_request_error",
    "code": "invalid_parameter"
  }
}
Common error codes:
400 Bad Request: Invalid request parameters
401 Unauthorized: Missing or invalid API key
429 Too Many Requests: Rate limit exceeded
500 Internal Server Error: Server error
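A hedged sketch of handling these errors, including a simple retry on 429; the backoff policy is illustrative, not prescribed by the API:

import time
import requests

def post_with_retry(payload: dict, retries: int = 3) -> dict:
    for attempt in range(retries):
        resp = requests.post(
            "https://api.anannas.ai/api/v1/responses",
            headers={"Authorization": "Bearer <ANANNAS_API_KEY>",
                     "Content-Type": "application/json"},
            json=payload,
        )
        if resp.status_code == 429:
            time.sleep(2 ** attempt)  # simple exponential backoff
            continue
        if resp.status_code >= 400:
            # Surface the structured error body shown above.
            err = resp.json().get("error", {})
            raise RuntimeError(f"{resp.status_code}: {err.get('message')} "
                               f"({err.get('type')}/{err.get('code')})")
        return resp.json()
    raise RuntimeError("Rate limited after retries")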
Rate Limits
Rate limits are applied per API key. See the Limits documentation for details.