Sampling parameters shape the model's token generation process. This page lists the request parameters you can send to /chat/completions. You may send any parameters from the list below, as well as others, to Anannas.
If a parameter is absent from your request, Anannas falls back to the default listed below (for example, temperature defaults to 1.0). Provider-specific parameters, such as safe_prompt for Mistral or raw_mode for Hyperbolic, are passed through directly to the respective provider when specified. If a parameter is unsupported by the selected provider, it is ignored.
Please check the documentation for the specific model/provider to confirm which parameters are supported.

Universal Parameters

Temperature
  • Key: temperature
  • float, 0.0 – 2.0
  • Default: 1.0
Top P
  • Key: top_p
  • float, 0.0 – 1.0
  • Default: 1.0
Max Tokens
  • Key: max_tokens
  • integer ≥ 1
Stop
  • Key: stop
  • string or array of strings
    Stops generation when matched.
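
A minimal request sketch combining the universal parameters above. The base URL, model slug, and API key are illustrative assumptions; substitute your own values.

```python
import requests

# Assumption: base URL and model slug below are placeholders, not confirmed values.
API_URL = "https://api.anannas.ai/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "temperature": 0.7,   # float, 0.0 - 2.0; defaults to 1.0 if omitted
    "top_p": 0.9,         # float, 0.0 - 1.0; defaults to 1.0 if omitted
    "max_tokens": 128,    # integer >= 1
    "stop": ["\n\n"],     # generation halts when this sequence is produced
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```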

OpenAI-specific

Tools
  • Key: tools
  • Supports OpenAI tool calling API.
Tool Choice
  • Key: tool_choice
  • "none" | "auto" | "required" | {object}"
Parallel Tool Calls
  • Key: parallel_tool_calls
  • Default: true
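
A sketch of a tool-calling request using the OpenAI-compatible schema; the get_weather function is hypothetical, defined here only for illustration.

```python
payload = {
    "model": "openai/gpt-4o",  # placeholder slug
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function for illustration
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",        # or "none", "required", or a specific-tool object
    "parallel_tool_calls": True,  # defaults to true
}
# Send with requests.post as in the first sketch; any tool calls come back in
# response.json()["choices"][0]["message"]["tool_calls"].
```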
Response Format
  • Key: response_format
  • { "type": "json_object" } → JSON output
Logit Bias
  • Key: logit_bias
  • Bias token selection by ID.
Seed
  • Key: seed
  • Enables deterministic outputs (when supported).
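
A combined sketch of both parameters. The token ID below is a placeholder, since IDs depend on the model's tokenizer.

```python
payload = {
    "model": "openai/gpt-4o",  # placeholder slug
    "messages": [{"role": "user", "content": "Pick a random fruit."}],
    "logit_bias": {"1234": -100},  # placeholder token ID; -100 effectively bans it
    "seed": 42,                    # same seed + same params -> repeatable output,
                                   # on a best-effort basis where supported
}
```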
Prompt Cache Key
  • Key: prompt_cache_key
  • string
  • Enables prompt caching for OpenAI models. Cached input tokens are billed at 50% of the original input token price. First request with a key creates the cache; subsequent requests with the same key reuse the cached prefix.
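
A sketch of prompt caching with a stable key; the key string and system prompt are illustrative.

```python
SYSTEM_PROMPT = "You are a support assistant. " * 100  # long, stable prefix worth caching

payload = {
    "model": "openai/gpt-4o",  # placeholder slug
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "First question."},
    ],
    # First request with this key creates the cache; later requests that
    # reuse it are billed at 50% on the cached input tokens.
    "prompt_cache_key": "support-bot-v1",  # illustrative key
}
```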

Anthropic-specific

  • system messages must be passed in the root system field (not inside messages).
  • No logit_bias support.
  • response_format is not yet supported — JSON enforcement must be done via prompt instructions.
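
A sketch of a Claude request with the system prompt at the root; the model slug is a placeholder.

```python
payload = {
    "model": "anthropic/claude-sonnet-4",  # placeholder slug
    "system": "You are a concise assistant.",  # root field, not a messages entry
    "messages": [{"role": "user", "content": "Explain prompt caching in one line."}],
}
```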
Cache Control
  • Key: cache_control (in message content parts)
  • object with type: "ephemeral" and ttl: "5m"
  • Enables prompt caching for Anthropic Claude models. Add cache_control to individual content parts within messages.
  • Pricing: Cache creation tokens billed at 1.25x input price; cache reads at 0.1x (90% discount).
  • Limits: Maximum 4 content blocks with cache_control per request; cache expires after 5 minutes.
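
A sketch marking a large content part for caching; the document text and model slug are placeholders.

```python
LONG_DOCUMENT = "Reference text. " * 500  # large, stable context worth caching

payload = {
    "model": "anthropic/claude-sonnet-4",  # placeholder slug
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": LONG_DOCUMENT,
                    # Cached for 5 minutes; writes cost 1.25x input price,
                    # reads 0.1x. At most 4 parts may carry cache_control.
                    "cache_control": {"type": "ephemeral", "ttl": "5m"},
                },
                {"type": "text", "text": "What are the key points?"},
            ],
        }
    ],
}
```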