Skip to main content
Anannas supports multiple input modalities beyond text, so you can send images, PDFs, and audio files to compatible models through the same /chat/completions endpoint. This enables rich, multimodal interactions with minimal integration overhead.

Supported Modalities

  • Images - Send images to vision-enabled models for tasks like analysis, captioning, OCR, and visual Q&A. Supports: URL-based images, base64-encoded images
    Read More about Image Inputs ->
  • PDFs - Process PDF documents seamlessly. Anannas extracts text and handles both text-based PDFs and scanned files. Accepted formats: URL or base64-encoded.
    Read More about PDF processing ->
  • Audio - Send audio files to models with audio processing capabilities for tasks like transcription, translation, and multimodal analysis. Supports: base64-encoded audio (URLs not supported).
    Read More about Audio Input ->

Getting Started

All multimodal inputs use /chat/completions with the messages array.
Specify the content type for each input:
  1. Images -> image_url
  2. PDFs -> file
  3. Audio -> input_audio
You can combine multiple inputs in a single request. The maximum number of files depends on the model/provider.

Model Compatibility

Check Multimodal Support

For models that support images, PDFs, and other modalities, visit anannas.ai/models to see capabilities by model.
  1. Images -> Supported (URL + base64)
  2. PDFs -> Supported (URL + base64)
  3. Audio -> Supported (base64 only)
You can combine multiple modalities in a single request, and the number of files you can send varies by provider and model.

Input Format Support

URLs (Recommended for public content)
  1. Images: https://example.com/image.jpg
  2. PDFs: https://example.com/document.pdf
  3. Audio: ❌ Not supported (base64 only)
Base64 Encoding (Required for local/private files)
  1. Images: data:image/jpeg;base64,{base64_data}
  2. PDFs: data:application/pdf;base64,{base64_data}
  3. Audio: Base64-encoded with format specification
URLs are preferred for large files since they avoid payload bloat.Use base64 encoding when working with local or private files.

FAQs

Yes. You can mix text, images, and PDFs in the same request. The model will process them together.
Audio inputs are only processed by models with audio input modality. If you send audio to a model that doesn’t support it, the audio will be ignored silently (no error will be thrown).
Models with audio input modality support audio processing. Check the Models page and filter by audio input modality to see which models support audio.
Was this page helpful?