Endpoint
POST https://api.anannas.ai/v1/chat/completions
Request Schema
The Anannas API follows the OpenAI Chat Completions format with Anannas-specific extensions. The request body is JSON.
Required Fields
model (string): Model identifier in format provider/model-name (e.g., openai/gpt-5-mini, anthropic/claude-3-sonnet)
messages (array): Array of message objects. Minimum 1 message required.
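The two required fields can be checked client-side before a request is sent. The following is a minimal sketch, not part of the Anannas API: the helper name `validate_required_fields` is hypothetical, and it assumes the `provider/model-name` format means exactly one non-empty provider segment followed by a non-empty model name.

```python
def validate_required_fields(body: dict) -> list:
    """Return a list of problems with the required fields; an empty list means OK."""
    problems = []

    model = body.get("model")
    if not isinstance(model, str):
        problems.append("model must be a string")
    else:
        # Assumption: "provider/model-name" needs text on both sides of the first slash.
        provider, sep, name = model.partition("/")
        if not (provider and sep and name):
            problems.append("model must look like provider/model-name")

    messages = body.get("messages")
    if not isinstance(messages, list) or len(messages) < 1:
        problems.append("messages must be an array with at least one message")

    return problems
```

Calling it with `{"model": "gpt-5-mini", "messages": []}` reports both problems, since the model has no provider prefix and the message array is empty.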
Request Type Definition
type ChatCompletionRequest = {
  // Required
  model: string;
  messages: Message[];
  // Optional: Sampling parameters
  temperature?: number; // 0.0-2.0, default: 1.0
  top_p?: number; // 0.0-1.0, default: 1.0
  max_tokens?: number; // Maximum tokens to generate
  max_completion_tokens?: number; // Alternative to max_tokens
  stop?: string | string[]; // Stop sequences
  seed?: number; // For deterministic outputs
  frequency_penalty?: number; // -2.0 to 2.0
  presence_penalty?: number; // -2.0 to 2.0
  top_k?: number; // 0 or above
  repetition_penalty?: number; // (0, 2]
  min_p?: number; // [0, 1]
  top_a?: number; // [0, 1]
  logit_bias?: { [token_id: number]: number }; // Token bias map
  // Optional: Streaming
  stream?: boolean; // Enable Server-Sent Events
  // Optional: Tool calling (OpenAI-compatible)
  tools?: Tool[];
  tool_choice?: "none" | "auto" | "required" | { type: "function", function: { name: string } };
  parallel_tool_calls?: boolean; // Default: true
  // Optional: Structured outputs
  response_format?: {
    type: "json_object";
  } | {
    type: "json_schema";
    json_schema: {
      name: string;
      strict?: boolean;
      schema: object; // JSON Schema object
    };
  };
  // Optional: Multimodal
  modalities?: string[]; // ["text", "audio", "image"]
  audio?: AudioConfig; // Audio output configuration
  // Optional: Reasoning (for o1, o3, Claude Sonnet 4.5, etc.)
  reasoning?: {
    effort?: "low" | "medium" | "high";
    max_tokens?: number;
    enabled?: boolean;
    exclude?: boolean;
  };
  thinking_config?: { // External API alias for reasoning
    include_thoughts?: boolean;
    thinking_budget?: number;
    thinking_level?: string;
  };
  // Optional: Anannas-specific routing
  models?: string[]; // Model routing fallbacks
  route?: "fallback"; // Enable smart fallback routing
  provider?: {
    order?: string[]; // Provider preference order
    allow_fallbacks?: boolean; // Default: true
    require_parameters?: boolean;
    data_collection?: "allow" | "deny";
    zdr?: boolean; // Zero Data Retention only
    only?: string[]; // Whitelist providers
    ignore?: string[]; // Blacklist providers
    quantizations?: string[]; // Filter by quantization
    sort?: "price" | "throughput" | "latency";
    max_price?: {
      prompt?: number;
      completion?: number;
      request?: number;
      image?: number;
    };
  };
  fallbacks?: Array<string | {
    model: string;
    provider?: ProviderPreferences;
    metadata?: { [key: string]: string };
    reason?: string;
  }>;
  // Optional: User tracking
  user?: string; // Stable user identifier
  // Optional: Prompt caching
  prompt_cache_key?: string; // OpenAI prompt caching
  // Optional: Metadata
  metadata?: { [key: string]: string }; // Custom tracking data
  // Optional: Plugins (PDF support)
  plugins?: Array<{
    type: string;
    [key: string]: any;
  }>;
  // Optional: Grok-specific
  search_parameters?: {
    mode?: "off" | "on" | "auto";
    max_search_results?: number; // 1-30
    from_date?: string; // ISO-8601
    to_date?: string; // ISO-8601
    return_citations?: boolean;
    sources?: Array<{
      type: "x" | "web" | "news" | "rss" | "live_search";
      included_x_handles?: string[];
      excluded_x_handles?: string[];
      allowed_websites?: string[];
      excluded_websites?: string[];
      country?: string;
      safe_search?: boolean;
      links?: string[];
    }>;
  };
  // Optional: MCP (Model Context Protocol)
  mcp?: {
    servers: Array<{
      name: string;
      [key: string]: any;
    }>;
  };
  // Deprecated: Use messages instead
  prompt?: string;
};
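The range comments in the type definition lend themselves to a small client-side check. This is a rough sketch only: `RANGES` and `out_of_range` are hypothetical names, and it assumes that ranges written as `0.0-2.0` or `-2.0 to 2.0` are inclusive at both ends, which the schema does not state explicitly. The half-open `(0, 2]` range for `repetition_penalty` is taken as written.

```python
# (low, high, low_inclusive) for each bounded sampling parameter.
# Assumption: ranges written without brackets are inclusive on both ends.
RANGES = {
    "temperature": (0.0, 2.0, True),
    "top_p": (0.0, 1.0, True),
    "frequency_penalty": (-2.0, 2.0, True),
    "presence_penalty": (-2.0, 2.0, True),
    "repetition_penalty": (0.0, 2.0, False),  # (0, 2]: zero is excluded
    "min_p": (0.0, 1.0, True),
    "top_a": (0.0, 1.0, True),
}


def out_of_range(body: dict) -> list:
    """Return the names of sampling parameters whose values fall outside the documented ranges."""
    bad = []
    for name, (low, high, low_inclusive) in RANGES.items():
        if name not in body:
            continue  # Optional parameters may simply be absent.
        value = body[name]
        above_low = value >= low if low_inclusive else value > low
        if not (above_low and value <= high):
            bad.append(name)
    # top_k has only a lower bound ("0 or above").
    if "top_k" in body and body["top_k"] < 0:
        bad.append("top_k")
    return bad
```

The server remains the authority on what it accepts; a check like this only catches obvious mistakes before a request is sent.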
Message Object
type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string | ContentPart[];
  name?: string; // For named tools/functions
  tool_call_id?: string; // For tool result messages
  tool_calls?: ToolCall[]; // Assistant tool invocations
};
type ContentPart = {
  type: "text" | "image_url" | "file" | "input_audio";
  text?: string;
  image_url?: {
    url: string; // URL or base64 data URI
    detail?: "low" | "high" | "auto";
  };
  file?: {
    url?: string;
    data?: string; // Base64
    type?: string; // "application/pdf"
  };
  input_audio?: {
    data: string; // Base64 audio
    format: string; // "wav", "mp3", etc.
  };
  cache_control?: { // Anthropic caching
    type: "ephemeral";
    ttl: string; // "5m"
  };
};
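As an illustration of the `ContentPart` shape, the sketch below builds a user message that combines a text part with an image passed as a base64 data URI. The helper name `image_message` and the default `image/png` MIME type are assumptions made for the example; only the field names come from the type definitions above.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png", detail: str = "auto") -> dict:
    """Build a user message with one text part and one image_url part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # image_url.url accepts either a plain URL or a base64 data URI.
                "image_url": {"url": f"data:{mime};base64,{encoded}", "detail": detail},
            },
        ],
    }
```

The returned dictionary can be placed directly in the `messages` array of a request body.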
Example Request
import requests

# Send a non-streaming chat completion request.
response = requests.post(
    "https://api.anannas.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer <ANANNAS_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-5-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing"},
        ],
        "temperature": 0.7,
        "max_tokens": 500,
        "stream": False,
    },
    timeout=60,
)
response.raise_for_status()  # Raise on 4xx/5xx responses
print(response.json()["choices"][0]["message"]["content"])
Response Schema
Responses follow the OpenAI Chat Completions format:
type ChatCompletionResponse = {
  id: string; // Unique completion ID
  object: "chat.completion" | "chat.completion.chunk";
  created: number; // Unix timestamp
  model: string; // Model identifier used
  choices: Array<{
    index: number;
    message: {
      role: "assistant";
      content: string | null;
      tool_calls?: ToolCall[];
      refusal?: string | null; // Content refusal (Anthropic)
      reasoning?: string; // Reasoning tokens (o1, o3, etc.)
      audio?: AudioOutput;
      images?: ImageOutput[];
      citations?: string[]; // Grok citations
    };
    finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | null;
    delta?: { // Streaming only
      role?: "assistant";
      content?: string;
      tool_calls?: ToolCall[];
    };
  }>;
  usage: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    cache_read_input_tokens?: number; // Cached tokens read
    cache_creation_input_tokens?: number; // Cache creation cost (Anthropic)
    reasoning_tokens?: number; // Reasoning token count
    audio_tokens?: number; // Audio token count
  };
  system_fingerprint?: string;
};
Example Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/gpt-5-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing is a computational paradigm..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165
  }
}
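A response body like the one above can be reduced to the fields most callers need. The following sketch assumes a non-streaming response with at least one choice; `summarize_completion` is a hypothetical helper name, not something the API provides.

```python
import json


def summarize_completion(raw: str) -> dict:
    """Extract the assistant text, finish reason, and total token count from a response body."""
    data = json.loads(raw)
    choice = data["choices"][0]  # Assumes at least one choice is present.
    usage = data.get("usage", {})
    return {
        "text": choice["message"].get("content"),  # May be None, e.g. for tool calls.
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```

With the `requests` example earlier, `summarize_completion(response.text)` would return the generated text alongside the token usage.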
Request Headers
Required
Authorization: Bearer <ANANNAS_API_KEY> - API key authentication
Content-Type: application/json - Request content type
Optional
HTTP-Referer: <YOUR_SITE_URL> - Identifies your application
X-Title: <YOUR_APP_NAME> - Sets application name for analytics
Finish Reasons
The finish_reason field indicates why generation stopped:
stop: Model generated a stop sequence or natural completion
length: Reached max_tokens limit
tool_calls: Model requested tool execution
content_filter: Content was filtered by safety systems
null: Generation incomplete (streaming)
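When streaming, `finish_reason` stays `null` until the final chunk, so a client typically concatenates `delta.content` pieces until a non-null value arrives. The sketch below works on chunks that have already been decoded from JSON into dictionaries; how the Server-Sent Events lines are framed on the wire is not covered here, and `accumulate_stream` is a hypothetical helper that only follows the first choice.

```python
def accumulate_stream(chunks) -> tuple:
    """Join delta.content pieces for choice 0 and return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # This sketch ignores any additional choices.
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
            # finish_reason is null (None) while generation is still in progress.
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

A returned `finish_reason` of `"length"` indicates the text was cut off by the token limit, while `None` means the stream ended before the model signalled completion.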
Prompt Caching
Check Caching Support
For models that support prompt caching and current pricing, visit anannas.ai/models.
OpenAI Models
Use prompt_cache_key to cache prompt prefixes:
{
  "model": "openai/gpt-5-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ],
  "prompt_cache_key": "system-prompt-v1"
}
Pricing:
- Cache reads: 50% of input token price
- Cache writes: No additional cost
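To make the discount concrete, the prompt cost for an OpenAI model can be estimated as follows. This is an illustrative sketch under two assumptions that the text above does not confirm: that `prompt_tokens` already includes the cached tokens, and that the caller supplies the per-token input price. The function name `openai_prompt_cost` is hypothetical.

```python
def openai_prompt_cost(prompt_tokens: int, cached_tokens: int, price_per_token: float) -> float:
    """Estimate prompt cost when cached tokens are billed at 50% of the input price."""
    if cached_tokens > prompt_tokens:
        raise ValueError("cached_tokens cannot exceed prompt_tokens")
    # Assumption: prompt_tokens counts cached and uncached tokens together.
    uncached_tokens = prompt_tokens - cached_tokens
    return uncached_tokens * price_per_token + cached_tokens * price_per_token * 0.5
```

For example, with 1,000 prompt tokens of which 800 are cache reads, the prompt is billed as 200 full-price tokens plus 800 half-price tokens, the equivalent of 600 full-price tokens.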
Anthropic Models
Use cache_control in message content parts:
{
  "model": "anthropic/claude-3-sonnet",
  "messages": [{
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "You are a helpful assistant.",
        "cache_control": {
          "type": "ephemeral",
          "ttl": "5m"
        }
      },
      {
        "type": "text",
        "text": "What is 2+2?"
      }
    ]
  }]
}
Pricing:
- Cache creation: 1.25x input token price
- Cache reads: 0.1x input token price (90% discount)
Verify Caching Pricing
For current caching pricing and supported models, check anannas.ai/models.
Limits:
- Maximum 4 content blocks with cache_control per request
- Cache expires after 5 minutes
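The Anthropic pricing multipliers and the four-block limit can both be checked before sending a request. The sketch below is an illustration rather than an official client: `count_cache_blocks` and `anthropic_prompt_cost` are hypothetical names, and the cost function assumes the caller already knows how many tokens were uncached, written to the cache, and read from the cache.

```python
MAX_CACHE_BLOCKS = 4  # Limit stated above: at most 4 cache_control blocks per request.


def count_cache_blocks(messages: list) -> int:
    """Count content parts that carry a cache_control marker."""
    count = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):  # Plain string content cannot carry cache_control.
            count += sum(1 for part in content if "cache_control" in part)
    return count


def anthropic_prompt_cost(uncached_tokens: int, cache_write_tokens: int,
                          cache_read_tokens: int, price_per_token: float) -> float:
    """Estimate prompt cost: cache writes at 1.25x, cache reads at 0.1x the input price."""
    weighted_tokens = uncached_tokens + 1.25 * cache_write_tokens + 0.1 * cache_read_tokens
    return price_per_token * weighted_tokens
```

A request builder could refuse to send when `count_cache_blocks(messages) > MAX_CACHE_BLOCKS`. Since a cache write costs 25% extra and a read saves 90%, caching a prefix pays for itself from the first cache hit.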
Error Responses
Errors follow this format:
{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "code": "error_code"
  }
}
Common error types:
invalid_request_error: Malformed request, missing required fields
authentication_error: Invalid or missing API key
rate_limit_error: Rate limit exceeded
insufficient_quota_error: Insufficient credits (402)
server_error: Internal server error
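A client can parse this error envelope and decide whether a retry is worthwhile. In the sketch below, `parse_error` is a hypothetical helper, and treating `rate_limit_error` and `server_error` as retryable is an assumption based on common practice; the documentation above does not specify retry semantics.

```python
import json

# Assumption: only these error types are worth retrying; the API docs do not say so.
RETRYABLE_TYPES = {"rate_limit_error", "server_error"}


def parse_error(raw: str):
    """Return a dict describing the error, or None if the body has no error object."""
    payload = json.loads(raw)
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return {
        "type": error.get("type"),
        "code": error.get("code"),
        "message": error.get("message"),
        "retryable": error.get("type") in RETRYABLE_TYPES,
    }
```

Errors such as `authentication_error` or `insufficient_quota_error` are left as non-retryable here, because repeating the same request would not change the outcome without a new key or more credits.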
Model Routing
If model is omitted, Anannas selects the default model for your account. The routing system automatically:
- Selects optimal provider based on price, availability, and latency
- Falls back to alternative providers if primary fails
- Respects provider preferences when specified
Use fallbacks for explicit cross-model fallback chains:
{
  "model": "openai/gpt-5-mini",
  "fallbacks": [
    "anthropic/claude-3-sonnet",
    "openai/gpt-3.5-turbo"
  ],
  "messages": [...]
}
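A request body with a fallback chain can be assembled programmatically. The following is a minimal sketch: `build_fallback_request` is a hypothetical helper, and dropping entries that repeat the primary model or an earlier fallback is a design choice made for this example, not documented API behavior. Entries may be plain model strings or objects with a `model` key, matching the `fallbacks` type in the request schema.

```python
def build_fallback_request(primary: str, fallbacks: list, messages: list) -> dict:
    """Build a request body with an ordered fallback chain that has no repeated models."""
    seen = {primary}
    chain = []
    for entry in fallbacks:
        # Entries are either a model string or an object carrying a "model" key.
        name = entry if isinstance(entry, str) else entry["model"]
        if name in seen:
            continue  # Example-only choice: skip models already in the chain.
        seen.add(name)
        chain.append(entry)
    return {"model": primary, "fallbacks": chain, "messages": messages}
```

The order of the input list is preserved, so the first entry remains the first model tried after the primary.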