The Anannas Responses API provides a unified interface for building advanced AI agents capable of executing complex tasks autonomously. This API is compatible with OpenAI’s Responses API format and supports multimodal inputs, reasoning capabilities, and seamless tool integration.
Stateless API Implementation: This API implements the stateless version of the Responses API. Unlike OpenAI's stateful Responses API, Anannas does not maintain conversation state between requests. Each request is completely independent, and you must include the full conversation history in every request.
Beta API: This API is in beta and may introduce breaking changes. Use with caution in production environments.
Base URL
https://api.anannas.ai/api/v1/responses
Authentication
All requests require authentication using your Anannas API key:
import requests

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={
        "Authorization": "Bearer <ANANNAS_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-5-mini",
        "input": "Hello, world!",
    },
)
Core Features
Stateless Design: Each request is independent; you manage conversation history client-side
Multimodal Support: Handle various input types, including text, images, and audio
Reasoning Capabilities: Access advanced reasoning with configurable effort levels
Tool Integration: Utilize function calling with support for parallel execution
Streaming Support: Receive responses in real-time as they're generated
Managing Conversation History: Since this API is stateless, you must include the complete conversation history in each request. Include all previous user messages and assistant responses in the input array to maintain context, as in the sketch below.
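For example, a minimal sketch of a second-turn request that replays the first exchange client-side. It assumes assistant turns are replayed using the same output_text content type they arrive with; the structured message format is documented below:

import requests

history = [
    {"type": "message", "role": "user",
     "content": [{"type": "input_text", "text": "What is the capital of France?"}]},
    # Replay the assistant's earlier reply verbatim so the model keeps context.
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "The capital of France is Paris."}]},
    # The new turn goes last.
    {"type": "message", "role": "user",
     "content": [{"type": "input_text", "text": "What is its population?"}]},
]

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={"Authorization": "Bearer <ANANNAS_API_KEY>",
             "Content-Type": "application/json"},
    json={"model": "openai/gpt-5-mini", "input": history},
)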
The Responses API uses a structured request format with an input array containing conversation messages:
type ResponsesRequest = {
  // Required
  model: string;
  input: string | ResponsesMessage[];

  // Optional
  instructions?: string;
  response_format?: { type: 'json_object' };
  metadata?: { [key: string]: string };
  temperature?: number;
  max_output_tokens?: number;
  stream?: boolean;

  // Tool calling
  tools?: Tool[];
  tool_choice?: 'auto' | 'none' | { type: 'function'; name: string };
  parallel_tool_calls?: boolean;

  // Advanced parameters
  top_p?: number;
  top_k?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  stop?: string[];
  seed?: number;

  // Reasoning
  reasoning?: {
    effort?: 'minimal' | 'low' | 'medium' | 'high';
    max_tokens?: number;
    exclude?: boolean;
  };

  // Anannas-specific
  provider?: ProviderPreferences;
  modalities?: string[];
  audio?: AudioConfig;
  mcp?: MCPConfig;
  prompt_cache_key?: string;
};
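As an illustrative sketch, a request that exercises a few of the optional fields (the prompt and parameter values are made up, not recommendations):

import requests

response = requests.post(
    "https://api.anannas.ai/api/v1/responses",
    headers={"Authorization": "Bearer <ANANNAS_API_KEY>",
             "Content-Type": "application/json"},
    json={
        "model": "openai/gpt-5-mini",
        "input": "Summarize the plot of Hamlet in two sentences.",
        "instructions": "You are a terse literary assistant.",
        "temperature": 0.2,
        "max_output_tokens": 200,
        "reasoning": {"effort": "low"},
    },
)
print(response.json()["output"])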
The Responses API returns a structured response with an output array:
type ResponsesResponse = {
  id: string;
  object: 'response';
  model: string;
  created: number;
  output: ResponsesOutputItem[];
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
  metadata?: { [key: string]: any };
};
Example Response:
{
  "id": "resp_abc123",
  "object": "response",
  "model": "openai/gpt-5-mini",
  "created": 1693350000,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Hello! How can I help you today?"
        }
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
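Continuing the Python example from Authentication, a short sketch that collects the assistant's text from this structure, iterating over all message items and their output_text parts:

data = response.json()

# Gather text from every output_text part across all message items.
text_parts = [
    part["text"]
    for item in data["output"]
    if item["type"] == "message"
    for part in item["content"]
    if part["type"] == "output_text"
]
print("".join(text_parts))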
The input field accepts either a simple string or an array of message objects:
Simple String Input:
{
  "model": "openai/gpt-5-mini",
  "input": "What is the capital of France?"
}
Structured Message Input:
{
  "model": "openai/gpt-5-mini",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "What is the capital of France?"
        }
      ]
    }
  ]
}
The output array contains one or more output items. Each item can be:
Message: Text response from the model
Function Call: Tool/function invocation
Error: Error information
Message Output:
{
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "output_text",
      "text": "The capital of France is Paris."
    }
  ]
}
Function Call Output:
{
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    {
      "type": "function_call",
      "function_call": {
        "id": "call_abc123",
        "name": "get_weather",
        "arguments": "{\"location\": \"Paris\"}"
      }
    }
  ]
}
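Putting the pieces together, here is a hedged sketch of one tool-calling round trip: declare a tool, detect the function_call part in the output, and run the function locally. The get_weather tool and its schema shape (following OpenAI's function-tool convention) are illustrative assumptions, not a documented contract:

import json
import requests

URL = "https://api.anannas.ai/api/v1/responses"
HEADERS = {"Authorization": "Bearer <ANANNAS_API_KEY>",
           "Content-Type": "application/json"}

# Hypothetical tool definition; the exact Tool schema is provider-defined.
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

resp = requests.post(URL, headers=HEADERS, json={
    "model": "openai/gpt-5-mini",
    "input": "What's the weather in Paris?",
    "tools": tools,
}).json()

# Look for function_call parts in the output and execute them locally.
for item in resp["output"]:
    for part in item.get("content", []):
        if part.get("type") == "function_call":
            call = part["function_call"]
            args = json.loads(call["arguments"])
            result = f"Sunny, 22°C in {args['location']}"  # stand-in for a real lookup
            # To continue, append the call and its result to the conversation
            # history and POST again; the exact result-message shape is
            # provider-specific.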
Prompt Caching
Check Caching Support: To see which models support prompt caching, and for current pricing, visit anannas.ai/models.
Prompt caching allows you to reduce costs and latency by reusing cached prompt prefixes. Anannas supports two caching methods depending on the provider:
OpenAI Models (prompt_cache_key)
For OpenAI models, use the prompt_cache_key parameter:
{
  "model": "openai/gpt-5-mini",
  "input": "Hello, world!",
  "prompt_cache_key": "my-cache-key-123"
}
Pricing:
Cache reads: Cached input tokens are billed at 50% of the original input token price
Cache writes: No additional cost for creating the cache
How it works:
First request with a prompt_cache_key creates the cache
Subsequent requests with the same key reuse the cached prefix
Cache is automatically managed by the provider
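A minimal sketch, assuming a long shared prefix: reuse the same prompt_cache_key on every request that shares the prefix, so later calls read the cached tokens at the discounted rate. The key name and prefix below are illustrative:

import requests

HEADERS = {"Authorization": "Bearer <ANANNAS_API_KEY>",
           "Content-Type": "application/json"}

# Illustrative long, static prefix shared across requests.
LONG_PREFIX = "You are a support agent for ExampleCorp. <long policy text>"

def ask(question: str) -> dict:
    # Same cache key on every call, so the shared prefix is cached after the
    # first request and read back at the discounted rate on later ones.
    return requests.post(
        "https://api.anannas.ai/api/v1/responses",
        headers=HEADERS,
        json={
            "model": "openai/gpt-5-mini",
            "input": LONG_PREFIX + "\n\n" + question,
            "prompt_cache_key": "examplecorp-support-v1",
        },
    ).json()

first = ask("How do I reset my password?")   # creates the cache
second = ask("What is your refund policy?")  # reuses the cached prefix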
Anthropic Models (cache_control)
For Anthropic Claude models, use the cache_control object in message content:
{
  "model": "anthropic/claude-sonnet-4",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Your system instructions here...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "5m"
          }
        }
      ]
    }
  ]
}
Pricing:
Cache creation: Cache creation tokens are billed at 1.25x (125%) of the original input token price
Cache reads: Cached input tokens are billed at 0.1x (10%) of the original input token price, a 90% discount
Anthropic-specific requirements:
A maximum of 4 content blocks per request can have cache_control
The cache expires after 5 minutes (TTL: "5m")
cache_control must be added to individual content parts within messages
Only the "ephemeral" cache type is supported
Example with cached system instructions:
{
  "model": "anthropic/claude-sonnet-4",
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant. Always be concise.",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "5m"
          }
        },
        {
          "type": "text",
          "text": "What is 2+2?"
        }
      ]
    }
  ]
}
Other Providers
View Caching Support: For complete caching support details by provider and model, check anannas.ai/models.
Grok: Supports caching similar to OpenAI, with cached tokens at 10% of the input price (a 90% discount)
Nebius: Supports caching with provider-specific pricing
TogetherAI: Supports caching with provider-specific pricing
Monitoring Cache Usage
Cache usage is included in the response usage object:
{
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 500,
    "total_tokens": 1500,
    "cache_read_input_tokens": 800,
    "cache_creation_input_tokens": 200
  }
}
cache_read_input_tokens: Number of tokens read from cache (discounted pricing)
cache_creation_input_tokens: Number of tokens used to create the cache (Anthropic only, 1.25x pricing)
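For example, a small sketch that reports cache effectiveness from the usage object; both cache fields may be absent when no caching occurred, hence the .get defaults:

usage = response.json().get("usage", {})

cached = usage.get("cache_read_input_tokens", 0)
created = usage.get("cache_creation_input_tokens", 0)
prompt = usage.get("prompt_tokens", 0)

# Fraction of the prompt served from cache at the discounted rate.
hit_rate = cached / prompt if prompt else 0.0
print(f"prompt tokens: {prompt}, read from cache: {cached} "
      f"({hit_rate:.0%}), written to cache: {created}")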
Error Handling
The Anannas API returns standard HTTP status codes and error responses:
{
  "error": {
    "message": "Invalid request parameters",
    "type": "invalid_request_error",
    "code": "invalid_parameter"
  }
}
Common error codes:
400 Bad Request: Invalid request parameters
401 Unauthorized: Missing or invalid API key
429 Too Many Requests: Rate limit exceeded
500 Internal Server Error: Server error
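A hedged sketch of handling these errors, including a simple retry on 429; the backoff policy is illustrative, not prescribed by the API:

import time
import requests

def post_with_retry(payload: dict, retries: int = 3) -> dict:
    for attempt in range(retries):
        resp = requests.post(
            "https://api.anannas.ai/api/v1/responses",
            headers={"Authorization": "Bearer <ANANNAS_API_KEY>",
                     "Content-Type": "application/json"},
            json=payload,
        )
        if resp.status_code == 429:
            time.sleep(2 ** attempt)  # simple exponential backoff
            continue
        if resp.status_code >= 400:
            # Surface the structured error body shown above.
            err = resp.json().get("error", {})
            raise RuntimeError(f"{resp.status_code}: {err.get('message')} "
                               f"({err.get('type')}/{err.get('code')})")
        return resp.json()
    raise RuntimeError("Rate limited after retries")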
Rate Limits
Rate limits are applied per API key. See the Limits documentation for details.