Anannas supports sending audio files to compatible models via the API. This guide shows how to work with audio using Anannas.
Audio files must be base64-encoded; direct URLs are not supported for audio content.

Audio Inputs

Send audio to compatible models through the /v1/chat/completions API using the input_audio content type. Each audio file must be base64-encoded and accompanied by its format. Only models with audio processing capabilities will handle these requests; to find them, filter by the audio input modality on our Models page.
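
As a minimal sketch of the encoding step, the helper below turns raw audio bytes into an input_audio content part. The name build_audio_part is hypothetical and not part of any SDK; the field names (type, input_audio, data, format) follow the request shape described in this guide.

```python
import base64


def build_audio_part(raw_audio: bytes, audio_format: str) -> dict:
    """Build an input_audio content part from raw audio bytes.

    build_audio_part is a hypothetical helper name used for illustration.
    """
    if not raw_audio:
        raise ValueError("audio payload is empty")
    # The audio must be sent as base64 text, not as raw bytes or a URL.
    encoded = base64.b64encode(raw_audio).decode("utf-8")
    return {
        "type": "input_audio",
        "input_audio": {"data": encoded, "format": audio_format},
    }


# Example with placeholder bytes (not real audio):
part = build_audio_part(b"\x00\x01\x02\x03", "wav")
print(part["input_audio"]["format"])
```

If the audio is hosted at a URL, download the bytes first and pass them to a helper like this one, since the URL itself cannot be sent.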

Sending Audio Files

Here’s how to send an audio file for processing:
import base64
from pathlib import Path

import requests

# Placeholder: replace with your Anannas API key
API_KEY_REF = 'YOUR_API_KEY'

def transcribe_audio(file_path, model_id, max_tokens=4096):
    """Transcribe audio file using Anannas API."""
    
    # Read and encode audio file
    with open(file_path, 'rb') as f:
        audio_data = base64.b64encode(f.read()).decode('utf-8')
    
    # Get file extension for format (e.g., .mp3 -> mp3)
    file_ext = Path(file_path).suffix.lstrip('.').lower()
    
    # Prepare payload
    payload = {
        'model': model_id,
        'messages': [{
            'role': 'user',
            'content': [
                {
                    'type': 'text',
                    'text': 'Transcribe and summarize this audio.'
                },
                {
                    'type': 'input_audio',
                    'input_audio': {
                        'data': audio_data,
                        'format': file_ext
                    }
                }
            ]
        }],
        'max_tokens': max_tokens
    }
    
    # Make API request
    response = requests.post(
        'https://api.anannas.ai/v1/chat/completions',
        headers={
            'Authorization': f'Bearer {API_KEY_REF}',
            'Content-Type': 'application/json'
        },
        json=payload,
        timeout=60
    )
    
    if response.status_code == 200:
        result = response.json()
        content = result['choices'][0]['message']['content']
        print(content)
        
        # Print token usage if available
        if 'usage' in result:
            print(f"\nTokens used: {result['usage'].get('total_tokens', 'N/A')}")
        return content
    else:
        print(f'Error: {response.status_code} - {response.text}')
        return None

# Example usage
transcribe_audio('path/to/your/audio.wav', 'google/gemini-3-flash-preview', max_tokens=20000)

Format Detection

The audio format is typically determined from the file extension. Extract the format by removing the leading dot from the file extension (e.g., .mp3 → mp3, .wav → wav).
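
The extension rule above can be sketched as a small function. This is an illustrative helper, not part of the Anannas API: the name detect_audio_format and the alias table are assumptions, added because some extensions (such as .aif) differ from the format name.

```python
from pathlib import Path

# Assumed aliases for extensions that differ from the format name.
# Illustrative only; confirm against your model's documentation.
EXTENSION_ALIASES = {"aif": "aiff", "oga": "ogg"}


def detect_audio_format(file_path: str) -> str:
    """Derive the format string from a file's extension."""
    # Drop the leading dot and normalize case, e.g. ".MP3" -> "mp3".
    ext = Path(file_path).suffix.lstrip(".").lower()
    if not ext:
        raise ValueError(f"cannot infer audio format: {file_path!r} has no extension")
    return EXTENSION_ALIASES.get(ext, ext)


print(detect_audio_format("recordings/Interview.MP3"))  # mp3
```

Raising on a missing extension avoids sending an empty format value, which would otherwise surface only as an API error.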

Supported Formats

Supported audio formats vary by provider. Common formats include:
  • wav - WAV audio
  • mp3 - MP3 audio
  • aiff - AIFF audio
  • aac - AAC audio
  • ogg - OGG Vorbis audio
  • flac - FLAC audio
  • m4a - M4A audio
  • pcm16 - PCM16 audio
  • pcm24 - PCM24 audio

Check Format Support

Note: Check your model’s documentation to confirm which audio formats it supports. Not all models support all formats. Visit anannas.ai/models to see capabilities by model.
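
One way to act on this note is a pre-flight check before encoding and uploading a file. The sketch below is an assumption-laden example: check_audio_format and its model_formats parameter are hypothetical, and the per-model set must be copied by hand from the model's documentation. The default set mirrors the common formats listed in this guide.

```python
# Formats listed in this guide as commonly supported.
COMMON_AUDIO_FORMATS = frozenset(
    {"wav", "mp3", "aiff", "aac", "ogg", "flac", "m4a", "pcm16", "pcm24"}
)


def check_audio_format(audio_format, model_formats=None):
    """Return the normalized format, or raise ValueError if it is not allowed.

    model_formats is a hypothetical, caller-supplied collection of formats
    taken from the chosen model's documentation.
    """
    fmt = audio_format.strip().lower()
    allowed = COMMON_AUDIO_FORMATS if model_formats is None else frozenset(model_formats)
    if fmt not in allowed:
        raise ValueError(
            f"format {fmt!r} is not in the allowed set: {sorted(allowed)}"
        )
    return fmt


print(check_audio_format("FLAC"))
```

Failing locally is cheaper than discovering an unsupported format after a large base64 payload has been uploaded.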