Overview

This document describes all parameters available in the /v1/chat/completions endpoint. Parameters are validated server-side; unsupported parameters for a given provider are ignored.

Required Parameters

model

Type: string
Required: Yes
Description: Model identifier in the format provider/model-name
Examples:
  • openai/gpt-5-mini
  • anthropic/claude-3-sonnet
  • openai/gpt-3.5-turbo

messages

Type: Message[]
Required: Yes
Minimum: 1 message
Description: Array of message objects with role and content
Message roles:
  • system: System instructions (typically first message)
  • user: User input
  • assistant: Model responses (for conversation history)
  • tool: Tool execution results
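
A minimal request body combining the two required parameters might look like this:
{
  "model": "openai/gpt-5-mini",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
}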

Sampling Parameters

temperature

Type: number
Range: 0.0 - 2.0
Default: 1.0
Description: Controls randomness. Lower values make output more deterministic.
  • 0.0: Most deterministic
  • 1.0: Balanced
  • 2.0: Most random
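
For example, a request biased toward deterministic output might set:
{
  "temperature": 0.2
}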

top_p

Type: number
Range: 0.0 - 1.0
Default: 1.0
Description: Nucleus sampling: samples only from the smallest set of tokens whose cumulative probability mass reaches top_p.
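For example, to sample only from tokens covering the top 90% of probability mass:
{
  "top_p": 0.9
}
Adjusting temperature and top_p together is generally discouraged; tune one at a time.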

max_tokens

Type: integer
Minimum: 1
Description: Maximum tokens to generate. Model-specific limits apply.

max_completion_tokens

Type: integer
Description: Alternative to max_tokens (provider-specific).
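A sketch capping output length (which of the two fields a given model expects is provider-specific):
{
  "max_completion_tokens": 512
}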

stop

Type: string | string[]
Description: Stop sequences that halt generation. Can be a single string or array.
{
  "stop": ["\n\n", "Human:"]
}

seed

Type: integer
Description: Random seed for deterministic outputs. Only supported by some models.
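For example, pairing a fixed seed with temperature 0 for best-effort reproducibility (determinism is still not guaranteed across model or backend versions):
{
  "seed": 42,
  "temperature": 0
}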

Penalties

frequency_penalty

Type: number
Range: -2.0 to 2.0
Description: Penalizes tokens in proportion to how often they have already appeared. Positive values decrease repetition.

presence_penalty

Type: number
Range: -2.0 to 2.0
Description: Penalizes tokens that have appeared at least once, regardless of frequency. Positive values encourage the model to introduce new topics.

repetition_penalty

Type: number
Range: (0, 2]
Description: Provider-specific repetition control.
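For example, discouraging verbatim repetition while nudging the model toward new topics:
{
  "frequency_penalty": 0.5,
  "presence_penalty": 0.3
}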

Advanced Sampling

top_k

Type: integer
Minimum: 0
Description: Limits sampling to the top K most probable tokens.
Not allowed with reasoning: When reasoning is enabled, top_k is not supported and will result in an error.

top_a

Type: number
Range: [0, 1]
Description: Provider-specific sampling parameter.

min_p

Type: number
Range: [0, 1]
Description: Minimum probability threshold for token selection, relative to the probability of the most likely token.
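A combined sketch (values illustrative; unsupported parameters are ignored by providers that lack them):
{
  "top_k": 40,
  "min_p": 0.05
}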

logit_bias

Type: { [token_id: number]: number }
Description: Bias specific tokens by ID. Values typically range from -100 to 100.
{
  "logit_bias": {
    "1234": 10,   // Increase likelihood
    "5678": -10   // Decrease likelihood
  }
}

Tool Calling

tools

Type: Tool[]
Description: Array of function definitions for tool calling.
{
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  }]
}

tool_choice

Type: "none" | "auto" | "required" | { type: "function", function: { name: string } }
Default: "auto"
Description: Controls tool usage behavior.
  • "none": No tools called
  • "auto": Model decides
  • "required": Model must call a tool
  • Object: Force specific function
Restrictions with reasoning: When reasoning is enabled, only "auto" or "none" are allowed. Using "required" or forcing a specific tool will result in an error. This restriction enables interleaved thinking, which allows reasoning between tool calls.
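For example, forcing the get_weather function defined above (not allowed when reasoning is enabled):
{
  "tool_choice": {
    "type": "function",
    "function": {"name": "get_weather"}
  }
}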

parallel_tool_calls

Type: boolean
Default: true
Description: Allow multiple tool calls in a single response.
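To allow at most one tool call per response:
{
  "parallel_tool_calls": false
}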

Structured Outputs

response_format

Type: { type: "json_object" } | { type: "json_schema", json_schema: object }
Description: Enforce JSON output format.
JSON Object:
{
  "response_format": {
    "type": "json_object"
  }
}
JSON Schema:
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "response",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"}
        }
      }
    }
  }
}

Streaming

stream

Type: boolean
Default: false
Description: Enable Server-Sent Events streaming. See Streaming documentation.
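For example:
{
  "model": "openai/gpt-5-mini",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}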

Reasoning

reasoning

Type: object
Description: Configure reasoning for models that support it.

Check Reasoning Support

For a list of models that support reasoning and their configuration options, visit anannas.ai/models.
{
  "reasoning": {
    "effort": "high",      // "low" | "medium" | "high"
    "max_tokens": 10000,   // Maximum reasoning tokens
    "enabled": true,       // Enable reasoning
    "exclude": false       // Exclude from response
  }
}
Interleaved Thinking with Tools

When using reasoning with tools on Claude 4 models (Sonnet 4.5, Opus 4.5, Haiku 4.5), interleaved thinking is automatically enabled, allowing the model to reason between tool calls. With interleaved thinking, max_tokens in the reasoning config can exceed the request's max_tokens parameter, as it represents the total budget across all thinking blocks within one assistant turn.

Interleaved thinking requires tool_choice: "auto" (or no tool_choice specified). Using tool_choice: "required" or forcing a specific tool will result in an error when reasoning is enabled.

thinking_config

Type: object
Description: External API alias for reasoning. Maps to reasoning internally.
{
  "thinking_config": {
    "include_thoughts": true,
    "thinking_budget": 10000,
    "thinking_level": "high"
  }
}

Multimodal

modalities

Type: string[]
Description: Requested output modalities: ["text", "audio", "image"]
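For example, requesting text plus audio output (pair with the audio configuration below):
{
  "modalities": ["text", "audio"]
}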

audio

Type: object
Description: Audio output configuration.
{
  "audio": {
    "voice": "alloy",
    "format": "mp3"
  }
}

Prompt Caching

prompt_cache_key

Type: string
Description: Cache key for OpenAI prompt caching. See Overview.
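For example (key value illustrative; reuse the same key across requests that share a prompt prefix):
{
  "prompt_cache_key": "support-bot-v1"
}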

Routing

models

Type: string[]
Description: Fallback model list for routing. See the combined example under route below.

route

Type: "fallback"
Description: Enable smart fallback routing.
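One possible shape, assuming model stays the primary and models lists the fallbacks:
{
  "model": "openai/gpt-5-mini",
  "models": ["anthropic/claude-3-sonnet"],
  "route": "fallback"
}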

provider

Type: object
Description: Provider selection preferences.

Check Provider Pricing

For current provider pricing when setting max_price limits, visit anannas.ai/models.
{
  "provider": {
    "order": ["openai", "anthropic"],
    "allow_fallbacks": true,
    "require_parameters": false,
    "data_collection": "deny",
    "zdr": true,
    "only": ["openai"],
    "ignore": ["anthropic"],
    "quantizations": ["q4", "q8"],
    "sort": "price",
    "max_price": {
      "prompt": 0.001,
      "completion": 0.002
    }
  }
}

fallbacks

Type: Array<string | object>
Description: Explicit fallback chain.
{
  "fallbacks": [
    "anthropic/claude-3-sonnet",
    {
      "model": "openai/gpt-3.5-turbo",
      "provider": {"only": ["openai"]}
    }
  ]
}

User Tracking

user

Type: string
Description: Stable identifier for end-users (abuse prevention).
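For example (identifier format is up to you; avoid personally identifying values):
{
  "user": "user-1234"
}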

Metadata

metadata

Type: { [key: string]: string }
Description: Custom metadata for request tracking.
{
  "metadata": {
    "request_id": "req-123",
    "environment": "production"
  }
}

Provider-Specific Parameters

Check Parameter Support

For detailed parameter support by model and provider, visit anannas.ai/models.
Some parameters are provider-specific and may be ignored by other providers:
  • Mistral: safe_prompt
  • Hyperbolic: raw_mode
  • Grok: search_parameters, deferred
  • Anthropic: cache_control in message content

Parameter Validation

  • Invalid parameter values return 400 Bad Request
  • Unsupported parameters are silently ignored
  • Provider-specific parameters are passed through when supported
