Endpoint

POST https://api.anannas.ai/v1/chat/completions

Request Schema

The Anannas API follows the OpenAI Chat Completions format with Anannas-specific extensions. The request body is JSON.

Required Fields

  • model (string): Model identifier in format provider/model-name (e.g., openai/gpt-5-mini, anthropic/claude-3-sonnet)
  • messages (array): Array of message objects. Minimum 1 message required.

Request Type Definition

type ChatCompletionRequest = {
  // Required
  model: string;
  messages: Message[];

  // Optional: Sampling parameters
  temperature?: number;        // 0.0-2.0, default: 1.0
  top_p?: number;              // 0.0-1.0, default: 1.0
  max_tokens?: number;         // Maximum tokens to generate
  max_completion_tokens?: number; // Alternative to max_tokens
  stop?: string | string[];    // Stop sequences
  seed?: number;               // For deterministic outputs
  frequency_penalty?: number;  // -2.0 to 2.0
  presence_penalty?: number;   // -2.0 to 2.0
  top_k?: number;              // 0 or above
  repetition_penalty?: number; // (0, 2]
  min_p?: number;              // [0, 1]
  top_a?: number;              // [0, 1]
  logit_bias?: { [token_id: number]: number }; // Token bias map

  // Optional: Streaming
  stream?: boolean;            // Enable Server-Sent Events

  // Optional: Tool calling (OpenAI-compatible)
  tools?: Tool[];
  tool_choice?: "none" | "auto" | "required" | { type: "function", function: { name: string } };
  parallel_tool_calls?: boolean; // Default: true

  // Optional: Structured outputs
  response_format?: {
    type: "json_object";
  } | {
    type: "json_schema";
    json_schema: {
      name: string;
      strict?: boolean;
      schema: object; // JSON Schema object
    };
  };

  // Optional: Multimodal
  modalities?: string[];       // ["text", "audio", "image"]
  audio?: AudioConfig;         // Audio output configuration

  // Optional: Reasoning (for o1, o3, Claude Sonnet 4.5, etc.)
  reasoning?: {
    effort?: "low" | "medium" | "high";
    max_tokens?: number;
    enabled?: boolean;
    exclude?: boolean;
  };
  thinking_config?: {          // External API alias for reasoning
    include_thoughts?: boolean;
    thinking_budget?: number;
    thinking_level?: string;
  };

  // Optional: Anannas-specific routing
  models?: string[];           // Model routing fallbacks
  route?: "fallback";          // Enable smart fallback routing
  provider?: {
    order?: string[];           // Provider preference order
    allow_fallbacks?: boolean;  // Default: true
    require_parameters?: boolean;
    data_collection?: "allow" | "deny";
    zdr?: boolean;              // Zero Data Retention only
    only?: string[];            // Whitelist providers
    ignore?: string[];          // Blacklist providers
    quantizations?: string[];   // Filter by quantization
    sort?: "price" | "throughput" | "latency";
    max_price?: {
      prompt?: number;
      completion?: number;
      request?: number;
      image?: number;
    };
  };
  fallbacks?: Array<string | {
    model: string;
    provider?: ProviderPreferences;
    metadata?: { [key: string]: string };
    reason?: string;
  }>;

  // Optional: User tracking
  user?: string;               // Stable user identifier

  // Optional: Prompt caching
  prompt_cache_key?: string;   // OpenAI prompt caching

  // Optional: Metadata
  metadata?: { [key: string]: string }; // Custom tracking data

  // Optional: Plugins (PDF support)
  plugins?: Array<{
    type: string;
    [key: string]: any;
  }>;

  // Optional: Grok-specific
  search_parameters?: {
    mode?: "off" | "on" | "auto";
    max_search_results?: number; // 1-30
    from_date?: string;          // ISO-8601
    to_date?: string;            // ISO-8601
    return_citations?: boolean;
    sources?: Array<{
      type: "x" | "web" | "news" | "rss" | "live_search";
      included_x_handles?: string[];
      excluded_x_handles?: string[];
      allowed_websites?: string[];
      excluded_websites?: string[];
      country?: string;
      safe_search?: boolean;
      links?: string[];
    }>;
  };

  // Optional: MCP (Model Context Protocol)
  mcp?: {
    servers: Array<{
      name: string;
      [key: string]: any;
    }>;
  };

  // Deprecated: Use messages instead
  prompt?: string;
};
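The tools and tool_choice fields above follow the OpenAI function-calling convention. A minimal request payload sketch (the get_weather function name and its schema are illustrative, not a real tool):

```python
# Hypothetical tool definition; the function name and parameter schema are
# placeholders for your own tools.
payload = {
    "model": "openai/gpt-5-mini",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {  # JSON Schema describing the tool's arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```

If the model decides to call the tool, the response's finish_reason is tool_calls and the assistant message carries a tool_calls array; you then execute the tool and send the result back as a message with role "tool" and the matching tool_call_id.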

Message Object

type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string | ContentPart[];
  name?: string;               // For named tools/functions
  tool_call_id?: string;      // For tool result messages
  tool_calls?: ToolCall[];     // Assistant tool invocations
};

type ContentPart = {
  type: "text" | "image_url" | "file" | "input_audio";
  text?: string;
  image_url?: {
    url: string;               // URL or base64 data URI
    detail?: "low" | "high" | "auto";
  };
  file?: {
    url?: string;
    data?: string;             // Base64
    type?: string;             // "application/pdf"
  };
  input_audio?: {
    data: string;              // Base64 audio
    format: string;            // "wav", "mp3", etc.
  };
  cache_control?: {            // Anthropic caching
    type: "ephemeral";
    ttl: string;               // "5m"
  };
};
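A user message mixing text and image content parts, per the ContentPart type above, looks like this (the image URL is a placeholder):

```python
# Illustrative multimodal message; the URL is a placeholder and could also
# be a base64 data URI.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/photo.jpg",
                "detail": "auto",  # let the provider pick the resolution tier
            },
        },
    ],
}
```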

Example Request

import requests

response = requests.post(
  "https://api.anannas.ai/v1/chat/completions",
  headers={
    "Authorization": "Bearer <ANANNAS_API_KEY>",
    "Content-Type": "application/json",
  },
  json={
    "model": "openai/gpt-5-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": False
  }
)
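With "stream": True, the response arrives as Server-Sent Events, each carrying a chat.completion.chunk object with a delta. A minimal parser sketch, assuming the standard OpenAI SSE convention of "data: {json}" lines terminated by "data: [DONE]":

```python
import json

def iter_sse_content(lines):
    """Yield content deltas from Server-Sent Event lines ("data: {...}")."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separators
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel marking end of stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content") is not None:
            yield delta["content"]

# Sample chunks in the shape described in the Response Schema below.
sample = [
    'data: {"id":"c1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"}}]}',
    'data: {"id":"c1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # → Hello
```

In practice you would pass stream=True to requests.post and feed response.iter_lines(decode_unicode=True) into the parser instead of a hard-coded list.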

Response Schema

Responses follow the OpenAI Chat Completions format:
type ChatCompletionResponse = {
  id: string;                  // Unique completion ID
  object: "chat.completion" | "chat.completion.chunk";
  created: number;              // Unix timestamp
  model: string;                // Model identifier used
  choices: Array<{
    index: number;
    message: {
      role: "assistant";
      content: string | null;
      tool_calls?: ToolCall[];
      refusal?: string | null;  // Content refusal (Anthropic)
      reasoning?: string;       // Reasoning tokens (o1, o3, etc.)
      audio?: AudioOutput;
      images?: ImageOutput[];
      citations?: string[];    // Grok citations
    };
    finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | null;
    delta?: {                  // Streaming only
      role?: "assistant";
      content?: string;
      tool_calls?: ToolCall[];
    };
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    cache_read_input_tokens?: number;      // Cached tokens read
    cache_creation_input_tokens?: number;  // Cache creation cost (Anthropic)
    reasoning_tokens?: number;            // Reasoning token count
    audio_tokens?: number;                 // Audio token count
  };
  system_fingerprint?: string;
};

Example Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/gpt-5-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing is a computational paradigm..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165
  }
}

Headers

Required

  • Authorization: Bearer <ANANNAS_API_KEY> - API key authentication
  • Content-Type: application/json - Request content type

Optional

  • HTTP-Referer: <YOUR_SITE_URL> - Identifies your application
  • X-Title: <YOUR_APP_NAME> - Sets application name for analytics

Finish Reasons

The finish_reason field indicates why generation stopped:
  • stop: Model generated a stop sequence or natural completion
  • length: Reached max_tokens limit
  • tool_calls: Model requested tool execution
  • content_filter: Content was filtered by safety systems
  • null: Generation incomplete (streaming)
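The branching above can be sketched as a small dispatcher (the return strings are illustrative actions, not API values):

```python
def check_completion(choice):
    """Map a choice's finish_reason to a follow-up action (illustrative policy)."""
    reason = choice.get("finish_reason")
    if reason == "length":
        return "truncated: raise max_tokens or continue the conversation"
    if reason == "tool_calls":
        return "execute requested tools and send results back as tool messages"
    if reason == "content_filter":
        return "content was filtered; revise the prompt"
    if reason == "stop":
        return "complete"
    return "in progress"  # null while a streamed generation is still running

print(check_completion({"finish_reason": "length"}))
```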

Prompt Caching

Check Caching Support

For the list of models that support prompt caching, together with current pricing, see anannas.ai/models.

OpenAI Models

Use prompt_cache_key to cache prompt prefixes:
{
  "model": "openai/gpt-5-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ],
  "prompt_cache_key": "system-prompt-v1"
}
Pricing:
  • Cache reads: 50% of input token price
  • Cache writes: No additional cost

Anthropic Models

Use cache_control in message content parts:
{
  "model": "anthropic/claude-3-sonnet",
  "messages": [{
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "You are a helpful assistant.",
        "cache_control": {
          "type": "ephemeral",
          "ttl": "5m"
        }
      },
      {
        "type": "text",
        "text": "What is 2+2?"
      }
    ]
  }]
}
Pricing:
  • Cache creation: 1.25x input token price
  • Cache reads: 0.1x input token price (90% discount)

Limits:
  • Maximum 4 content blocks with cache_control per request
  • Cache expires after 5 minutes

Error Responses

Errors follow this format:
{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "code": "error_code"
  }
}
Common error types:
  • invalid_request_error: Malformed request, missing required fields
  • authentication_error: Invalid or missing API key
  • rate_limit_error: Rate limit exceeded
  • insufficient_quota_error: Insufficient credits (402)
  • server_error: Internal server error
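A sketch of how a client might classify these errors into retry decisions (the policy and return strings are illustrative, not official guidance):

```python
def classify_error(status_code, body):
    """Decide how to handle an error response body of the shape shown above."""
    err = body.get("error", {})
    etype = err.get("type", "unknown")
    if status_code == 429 or etype == "rate_limit_error":
        return "retry"        # back off, then retry the same request
    if status_code == 402 or etype == "insufficient_quota_error":
        return "top_up"       # add credits before retrying
    if etype in ("invalid_request_error", "authentication_error"):
        return "fix_request"  # retrying unchanged will not help
    return "report"           # server_error and anything unexpected

print(classify_error(429, {"error": {"type": "rate_limit_error",
                                     "message": "slow down"}}))  # → retry
```

Retries on rate_limit_error typically use exponential backoff; invalid_request_error and authentication_error should never be retried unchanged.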

Model Routing

If model is omitted, Anannas selects the default model for your account. The routing system automatically:
  1. Selects optimal provider based on price, availability, and latency
  2. Falls back to alternative providers if primary fails
  3. Respects provider preferences when specified
Use fallbacks for explicit cross-model fallback chains:
{
  "model": "openai/gpt-5-mini",
  "fallbacks": [
    "anthropic/claude-3-sonnet",
    "openai/gpt-3.5-turbo"
  ],
  "messages": [...]
}
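The provider object from the request schema can steer this routing. A sketch of a price-sorted request with a provider blacklist and price caps (the ignored provider name and price values are placeholders):

```python
# Illustrative payload; "provider-a" and the max_price numbers are
# placeholders, not a real provider list or real prices.
payload = {
    "model": "openai/gpt-5-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
        "sort": "price",           # prefer the cheapest eligible provider
        "allow_fallbacks": True,   # permit routing to the next provider on failure
        "ignore": ["provider-a"],  # never route to these providers
        "max_price": {"prompt": 1.0, "completion": 2.0},
    },
}
```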

See Also