Sampling parameters shape the model's token generation process. This page lists the request parameters you can send to /chat/completions. You may send any parameters from the list below, as well as others, to Anannas.
If a parameter is absent from your request, Anannas falls back to the default listed below (for example, temperature defaults to 1.0). Provider-specific parameters, such as safe_prompt for Mistral or raw_mode for Hyperbolic, are passed through directly to the respective provider when specified. If a parameter is unsupported by the selected provider, it is ignored.
Please check the documentation for the specific model/provider to confirm which parameters are supported.

Universal Parameters

Temperature
  • Key: temperature
  • float, 0.0 – 2.0
  • Default: 1.0
Top P
  • Key: top_p
  • float, 0.0 – 1.0
  • Default: 1.0
Max Tokens
  • Key: max_tokens
  • integer ≥ 1
Stop
  • Key: stop
  • string or array of strings
    Stops generation when matched.
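
A minimal request sketch combining the universal parameters above. The base URL, model slug, and API key are illustrative assumptions; substitute your own values.

```python
import requests

# Assumption: base URL and model slug below are placeholders, not confirmed values.
API_URL = "https://api.anannas.ai/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "temperature": 0.7,   # float, 0.0 - 2.0; defaults to 1.0 if omitted
    "top_p": 0.9,         # float, 0.0 - 1.0; defaults to 1.0 if omitted
    "max_tokens": 128,    # integer >= 1
    "stop": ["\n\n"],     # generation halts when this sequence is produced
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```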

OpenAI-specific

Tools
  • Key: tools
  • Supports OpenAI tool calling API.
Tool Choice
  • Key: tool_choice
  • "none" | "auto" | "required" | {object}"
Parallel Tool Calls
  • Key: parallel_tool_calls
  • Default: true
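
A sketch of a tool-calling request using the OpenAI-compatible schema; the get_weather function is hypothetical, defined here only for illustration.

```python
payload = {
    "model": "openai/gpt-4o",  # placeholder slug
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function for illustration
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",        # or "none", "required", or a specific-tool object
    "parallel_tool_calls": True,  # defaults to true
}
# Send with requests.post as in the first sketch; any tool calls come back in
# response.json()["choices"][0]["message"]["tool_calls"].
```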
Response Format
  • Key: response_format
  • { "type": "json_object" } → JSON output
Logit Bias
  • Key: logit_bias
  • Bias token selection by ID.
Seed
  • Key: seed
  • Enables deterministic outputs (when supported).
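
A combined sketch of both parameters. The token ID below is a placeholder, since IDs depend on the model's tokenizer.

```python
payload = {
    "model": "openai/gpt-4o",  # placeholder slug
    "messages": [{"role": "user", "content": "Pick a random fruit."}],
    "logit_bias": {"1234": -100},  # placeholder token ID; -100 effectively bans it
    "seed": 42,                    # same seed + same params -> repeatable output,
                                   # on a best-effort basis where supported
}
```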
Prompt Cache Key
  • Key: prompt_cache_key
  • string
  • Enables prompt caching for OpenAI models. Cached input tokens are billed at 50% of the original input token price. First request with a key creates the cache; subsequent requests with the same key reuse the cached prefix.
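
A sketch of prompt caching with a stable key; the key string and system prompt are illustrative.

```python
SYSTEM_PROMPT = "You are a support assistant. " * 100  # long, stable prefix worth caching

payload = {
    "model": "openai/gpt-4o",  # placeholder slug
    "messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "First question."},
    ],
    # First request with this key creates the cache; later requests that
    # reuse it are billed at 50% on the cached input tokens.
    "prompt_cache_key": "support-bot-v1",  # illustrative key
}
```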

Anthropic-specific

  • system messages must be passed in the root system field (not inside messages).
  • No logit_bias support.
  • response_format is not yet supported — JSON enforcement must be done via prompt instructions.
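
A sketch of a Claude request with the system prompt at the root; the model slug is a placeholder.

```python
payload = {
    "model": "anthropic/claude-sonnet-4",  # placeholder slug
    "system": "You are a concise assistant.",  # root field, not a messages entry
    "messages": [{"role": "user", "content": "Explain prompt caching in one line."}],
}
```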
Cache Control
  • Key: cache_control (in message content parts)
  • object with type: "ephemeral" and ttl: "5m"
  • Enables prompt caching for Anthropic Claude models. Add cache_control to individual content parts within messages.
  • Pricing: Cache creation tokens billed at 1.25x input price; cache reads at 0.1x (90% discount).
  • Limits: Maximum 4 content blocks with cache_control per request; cache expires after 5 minutes.
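
A sketch marking a large content part for caching; the document text and model slug are placeholders.

```python
LONG_DOCUMENT = "Reference text. " * 500  # large, stable context worth caching

payload = {
    "model": "anthropic/claude-sonnet-4",  # placeholder slug
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": LONG_DOCUMENT,
                    # Cached for 5 minutes; writes cost 1.25x input price,
                    # reads 0.1x. At most 4 parts may carry cache_control.
                    "cache_control": {"type": "ephemeral", "ttl": "5m"},
                },
                {"type": "text", "text": "What are the key points?"},
            ],
        }
    ],
}
```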