Anannas provides a unified Chat Completions API across multiple providers, including OpenAI and Anthropic. The schema is designed to be OpenAI-compatible while supporting Anannas-specific routing and model options.

Requests

Completions Request Format
The main endpoint for completions is: POST /chat/completions
Here’s the request schema in TypeScript:
// Definitions of subtypes are below
type Request = {
  // Either "messages" or "prompt" is required
  messages?: Message[];
  prompt?: string;

  // Model selection (defaults to user/org default if unspecified)
  model?: string; // See "Supported Models" section

  response_format?: { type: 'json_object' };

  stop?: string | string[];
  stream?: boolean;

  max_tokens?: number;
  temperature?: number;

  // Tool calling (OpenAI-compatible)
  tools?: Tool[];
  tool_choice?: ToolChoice;

  // Advanced parameters
  seed?: number;
  top_p?: number;
  top_k?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  repetition_penalty?: number;
  logit_bias?: { [key: number]: number };
  min_p?: number;
  top_a?: number;

  // Anannas-only parameters
  models?: string[];   // For model routing
  route?: 'fallback';  // Smart routing fallback
  provider?: ProviderPreferences; // Provider routing
  user?: string;       // Stable identifier for your end-users
  prompt_cache_key?: string; // Prompt caching key (OpenAI models)
};
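The referenced subtypes follow OpenAI-compatible conventions; here is a sketch. ProviderPreferences is Anannas-specific and its shape is not documented here, so it is omitted:
// Sketch of the referenced subtypes (OpenAI-compatible conventions)
type TextContent = { type: 'text'; text: string };

type Message =
  | { role: 'system' | 'user' | 'assistant'; content: string | TextContent[] }
  | { role: 'tool'; content: string; tool_call_id: string };

type FunctionDescription = {
  name: string;
  description?: string;
  parameters: object; // JSON Schema describing the function arguments
};

type Tool = { type: 'function'; function: FunctionDescription };

type ToolChoice =
  | 'none'
  | 'auto'
  | { type: 'function'; function: { name: string } };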
Example Request
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ANANNAS_API_KEY",
    base_url="https://api.anannas.ai/v1"
)

completion = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[
        {"role": "user", "content": "What is the meaning of life?"}
    ]
)

print(completion.choices[0].message)
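Setting stream: true in the request body streams the response incrementally. A minimal sketch using the OpenAI SDK for JavaScript against the same base URL (chunks follow the chat.completion.chunk shape described under Responses):
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_ANANNAS_API_KEY',
  baseURL: 'https://api.anannas.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'openai/gpt-5-mini',
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
  stream: true,
});

// Each chunk carries an incremental delta rather than a full message
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}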
Headers
You can set optional headers for discoverability:
  • HTTP-Referer: Identifies your app on anannas.ai
  • X-Title: Sets/modifies your app’s title
fetch('https://api.anannas.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <ANANNAS_API_KEY>',
    'HTTP-Referer': '<YOUR_SITE_URL>',
    'X-Title': '<YOUR_APP_NAME>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-5-mini',
    messages: [{ role: 'user', content: 'Hello, Anannas!' }],
  }),
});
If the model parameter is omitted, Anannas will select the default for the user/org. If multiple providers/models are available, Anannas’s routing system automatically selects the best option (based on price, availability, and latency) and falls back if a provider fails.
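For example, you can pass an ordered models list together with route: 'fallback' so Anannas tries each model in turn if the previous one fails (the model IDs here are illustrative):
fetch('https://api.anannas.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <ANANNAS_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    // Tried in order; Anannas falls back to the next model on failure
    models: ['openai/gpt-5-mini', 'anthropic/claude-sonnet-4'],
    route: 'fallback',
    messages: [{ role: 'user', content: 'Hello, Anannas!' }],
  }),
});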
Responses
Anannas normalizes responses to comply with the OpenAI Chat API schema.
type Response = {
  id: string;
  object: 'chat.completion' | 'chat.completion.chunk';
  created: number;
  model: string;
  choices: (NonStreamingChoice | StreamingChoice)[];
  usage?: ResponseUsage;
};
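A sketch of the referenced subtypes, mirroring the OpenAI schema (the cache fields are explained under Prompt Caching below):
type ToolCall = {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments is a JSON string
};

type NonStreamingChoice = {
  finish_reason: string | null;
  message: {
    role: 'assistant';
    content: string | null;
    tool_calls?: ToolCall[];
  };
};

type StreamingChoice = {
  finish_reason: string | null;
  delta: {
    role?: 'assistant';
    content?: string | null;
    tool_calls?: ToolCall[];
  };
};

type ResponseUsage = {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cache_read_input_tokens?: number;     // see Prompt Caching below
  cache_creation_input_tokens?: number; // Anthropic only
};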
Here’s an example:
{
  "id": "gen-xxxxxxxx",
  "object": "chat.completion",
  "created": 1693350000,
  "model": "openai/gpt-5-mini",
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
Finish Reason
Each choice carries a finish_reason indicating why generation stopped. Since responses are normalized to the OpenAI schema, the standard values apply: stop (a natural stop or a stop sequence was reached), length (the max_tokens limit was hit), tool_calls (the model invoked a tool), and content_filter (output was flagged).
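For example, you can detect a truncated reply using the Response type defined above:
// Sketch: flags a response that stopped at the max_tokens limit
function wasTruncated(response: Response): boolean {
  return response.choices.some((c) => c.finish_reason === 'length');
}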

Prompt Caching

Prompt caching allows you to reduce costs and latency by reusing cached prompt prefixes. Anannas supports prompt caching for compatible providers.

OpenAI Models (prompt_cache_key)

For OpenAI models, use the prompt_cache_key parameter:
{
  "model": "openai/gpt-5-mini",
  "messages": [
    {"role": "user", "content": "Hello, world!"}
  ],
  "prompt_cache_key": "my-cache-key-123"
}
Pricing:
  • Cache reads: Cached input tokens are billed at 50% of the original input token price
  • Cache writes: No additional cost for creating the cache
How it works:
  1. First request with a prompt_cache_key creates the cache
  2. Subsequent requests with the same key reuse the cached prefix
  3. Cache is automatically managed by the provider
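A sketch of this pattern: both calls share the same key and a stable prompt prefix, so the second can be served from the cache (the helper function and the shared-instructions constant are illustrative):
// Illustrative constant: a long, stable prefix worth caching
const LONG_SHARED_INSTRUCTIONS = 'You are a helpful assistant. ...';

async function ask(question: string): Promise<Response> {
  return fetch('https://api.anannas.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer <ANANNAS_API_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'openai/gpt-5-mini',
      messages: [
        { role: 'system', content: LONG_SHARED_INSTRUCTIONS }, // stable prefix
        { role: 'user', content: question },
      ],
      prompt_cache_key: 'my-cache-key-123',
    }),
  });
}

await ask('First question');  // creates the cache
await ask('Second question'); // reuses the cached prefix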

Anthropic Models (cache_control)

For Anthropic Claude models, use the cache_control object in message content. Add cache_control to individual content parts within messages:
{
  "model": "anthropic/claude-sonnet-4",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant. Always be concise.",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "5m"
          }
        },
        {
          "type": "text",
          "text": "What is 2+2?"
        }
      ]
    }
  ]
}
Pricing:
  • Cache creation: Cache creation tokens are billed at 1.25x (125%) of the original input token price
  • Cache reads: Cached input tokens are billed at 0.1x (10%) of the original input token price (a 90% discount)
Anthropic-specific requirements:
  • Maximum of 4 content blocks can have cache_control per request
  • Cache expires after 5 minutes (TTL: "5m")
  • cache_control must be added to individual content parts within messages
  • Only "ephemeral" cache type is supported

Other Providers

  • Grok: Supports caching with cached tokens at 10% of input price (90% discount)
  • Nebius: Supports caching with provider-specific pricing
  • TogetherAI: Supports caching with provider-specific pricing

Monitoring Cache Usage

Cache usage is included in the response usage object:
{
  "usage": {
    "prompt_tokens": 1000,
    "completion_tokens": 500,
    "total_tokens": 1500,
    "cache_read_input_tokens": 800,
    "cache_creation_input_tokens": 200
  }
}
  • cache_read_input_tokens: Number of tokens read from cache (discounted pricing)
  • cache_creation_input_tokens: Number of tokens used to create the cache (Anthropic only, 1.25x pricing)
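A sketch of reading these fields to track your cache hit rate. As the example above suggests, cached tokens are assumed to be counted within prompt_tokens; both cache fields may be absent when no caching occurred:
type CacheUsage = {
  prompt_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

function cacheHitRate(usage: CacheUsage): number {
  // Fraction of prompt tokens served from cache at discounted pricing
  const read = usage.cache_read_input_tokens ?? 0;
  return usage.prompt_tokens > 0 ? read / usage.prompt_tokens : 0;
}

// With the usage object above: 800 / 1000 = 0.8
console.log(cacheHitRate({
  prompt_tokens: 1000,
  cache_read_input_tokens: 800,
  cache_creation_input_tokens: 200,
}));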
Anannas ensures your requests remain provider-agnostic, resilient, and cost-optimized, while staying fully OpenAI-compatible.