Audio Inputs

Anannas supports sending audio files to compatible models via the API. This guide will show you how to work with audio using Anannas.

Audio files must be base64-encoded - direct URLs are not supported for audio content.
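Because the audio travels inside the JSON body as base64 text, the request is roughly one third larger than the file on disk. The sketch below estimates that size before encoding. The helper name `encoded_size` is hypothetical, and no Anannas request-size limit is assumed here, since none is stated in this guide.

```python
import base64
import math


def encoded_size(raw_size: int) -> int:
    """Return the character count of the padded base64 encoding of raw_size bytes."""
    # Standard base64 maps every 3 input bytes to 4 output characters,
    # padding the final group when the input is not a multiple of 3.
    return 4 * math.ceil(raw_size / 3)


sample = b"\x00" * 1000  # stand-in for real audio bytes
encoded = base64.b64encode(sample).decode("utf-8")

# The estimate matches the actual encoded length.
print(len(encoded), encoded_size(len(sample)))
```

Comparing the estimate against whatever budget your own application sets can avoid reading and encoding a file that is too large to send.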

Send audio to compatible models through the /v1/chat/completions API using the input_audio content type. Each audio part carries the base64-encoded data together with a format field. Only models with audio processing capabilities will handle these requests; to find them, filter by the audio input modality on our Models page.

Sending Audio Files

Here’s how to send an audio file for processing:

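import base64
import os
from pathlib import Path

import requests

# Assumption: the API key is read from an environment variable.
# ANANNAS_API_KEY is a placeholder name; use whatever your setup provides.
API_KEY_REF = os.environ.get('ANANNAS_API_KEY', 'YOUR_API_KEY')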

def transcribe_audio(file_path, model_id, max_tokens=4096):
    """Transcribe audio file using Anannas API."""
    
    # Read and encode audio file
    with open(file_path, 'rb') as f:
        audio_data = base64.b64encode(f.read()).decode('utf-8')
    
    # Get file extension for format (e.g., .mp3 -> mp3)
    file_ext = Path(file_path).suffix.lstrip('.').lower()
    
    # Prepare payload
    payload = {
        'model': model_id,
        'messages': [{
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': 'Transcribe and summarize this audio.'
                },
                {
                    'type': 'input_audio',
                    'input_audio': {
                        'data': audio_data,
                        'format': file_ext
                    }
                }
            ]
        }],
        'max_tokens': max_tokens
    }
    
    # Make API request
    response = requests.post(
        'https://api.anannas.ai/v1/chat/completions',
        headers={
            'Authorization': f'Bearer {API_KEY_REF}',
            'Content-Type': 'application/json'
        },
        json=payload,
        timeout=60
    )
    
    if response.status_code == 200:
        result = response.json()
        content = result['choices'][0]['message']['content']
        print(content)
        
        # Print token usage if available
        if 'usage' in result:
            print(f"\nTokens used: {result['usage'].get('total_tokens', 'N/A')}")
        return content
    else:
        print(f'Error: {response.status_code} - {response.text}')
        return None

# Example usage
transcribe_audio('path/to/your/audio.wav', 'google/gemini-3-flash-preview', max_tokens=20000)
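The example above indexes straight into the response, which raises an exception if a field is missing. A more defensive reading of the same fields might look like the sketch below. The helper names `extract_text` and `total_tokens` are hypothetical, and the sample dictionary is hand-written to mirror the fields the example reads; it is not real API output.

```python
from typing import Any, Optional


def extract_text(result: dict[str, Any]) -> Optional[str]:
    """Return the first choice's message content, or None if the shape is unexpected."""
    try:
        return result["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None


def total_tokens(result: dict[str, Any]) -> Optional[int]:
    """Return usage.total_tokens when present, otherwise None."""
    usage = result.get("usage") or {}
    return usage.get("total_tokens")


# Hand-written sample in the shape the example above reads.
sample = {
    "choices": [{"message": {"content": "Transcript: hello world."}}],
    "usage": {"total_tokens": 42},
}

print(extract_text(sample))
print(total_tokens(sample))
```

Returning None instead of raising lets the caller decide whether a missing field is an error or simply an empty result.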

Format Detection

The audio format is typically determined from the file extension. Extract the format by removing the leading dot from the file extension (e.g., .mp3 → mp3, .wav → wav).
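File extensions do not always match the format names one to one. The sketch below applies the extraction rule described above and then normalizes a few alternative extensions. The alias table is an illustrative assumption, not a mapping defined by Anannas, and `audio_format_from_path` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: illustrative aliases only; extend or remove to suit your files.
EXTENSION_ALIASES = {"aif": "aiff", "wave": "wav", "oga": "ogg"}


def audio_format_from_path(file_path: str) -> str:
    """Derive the format string from a file extension (e.g. .mp3 -> mp3)."""
    ext = Path(file_path).suffix.lstrip(".").lower()
    if not ext:
        raise ValueError(f"No file extension found in {file_path!r}")
    # Map known aliases onto the canonical name; pass everything else through.
    return EXTENSION_ALIASES.get(ext, ext)


print(audio_format_from_path("recordings/Interview.MP3"))
print(audio_format_from_path("sample.aif"))
```

Raising on a missing extension surfaces the problem locally rather than sending a request with an empty format field.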

Supported Formats

Supported audio formats vary by provider. Common formats include:

wav - WAV audio
mp3 - MP3 audio
aiff - AIFF audio
aac - AAC audio
ogg - OGG Vorbis audio
flac - FLAC audio
m4a - M4A audio
pcm16 - PCM16 audio
pcm24 - PCM24 audio
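A client-side check against the list above can catch obvious mistakes, such as a document file passed by accident, before a request is made. This is a minimal sketch: `COMMON_FORMATS` simply copies the list above, `check_common_format` is a hypothetical helper, and passing the check does not mean a particular model accepts the format.

```python
# Copied from the list of common formats above.
COMMON_FORMATS = frozenset(
    {"wav", "mp3", "aiff", "aac", "ogg", "flac", "m4a", "pcm16", "pcm24"}
)


def check_common_format(fmt: str) -> str:
    """Return the normalized format name, or raise ValueError if it is not in the list."""
    normalized = fmt.strip().lstrip(".").lower()
    if normalized not in COMMON_FORMATS:
        allowed = ", ".join(sorted(COMMON_FORMATS))
        raise ValueError(f"Unrecognized audio format {fmt!r}; expected one of: {allowed}")
    return normalized


print(check_common_format(".WAV"))
```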

Check Format Support

Note: Check your model’s documentation to confirm which audio formats it supports. Not all models support all formats. Visit anannas.ai/models to see capabilities by model.
