Multimodal Capabilities

Anannas supports multiple input modalities beyond text, so you can send images, PDFs, and audio files to compatible models through the same /chat/completions endpoint. This enables rich, multimodal interactions with minimal integration overhead.

Supported Modalities

Images - Send images to vision-enabled models for tasks like analysis, captioning, OCR, and visual Q&A. Supports: URL-based images, base64-encoded images
Read More about Image Inputs ->
PDFs - Process PDF documents seamlessly. Anannas extracts text and handles both text-based PDFs and scanned files. Accepted formats: URL or base64-encoded.
Read More about PDF processing ->
Audio - Send audio files to models with audio processing capabilities for tasks like transcription, translation, and multimodal analysis. Supports: base64-encoded audio (URLs not supported).
Read More about Audio Input ->

Getting Started

All multimodal inputs use /chat/completions with the messages array.
Specify the content type for each input:

Images -> image_url
PDFs -> file
Audio -> input_audio

You can combine multiple inputs in a single request. The maximum number of files depends on the model/provider.

Model Compatibility

Check Multimodal Support

For models that support images, PDFs, and other modalities, visit anannas.ai/models to see capabilities by model.

Images -> Supported (URL + base64)
PDFs -> Supported (URL + base64)
Audio -> Supported (base64 only)

You can combine multiple modalities in a single request, and the number of files you can send varies by provider and model.

Input Format Support

URLs (Recommended for public content)

Images: https://example.com/image.jpg
PDFs: https://example.com/document.pdf
Audio: ❌ Not supported (base64 only)

Base64 Encoding (Required for local/private files)

Images: data:image/jpeg;base64,{base64_data}
PDFs: data:application/pdf;base64,{base64_data}
Audio: Base64-encoded with format specification

URLs are preferred for large files since they avoid payload bloat.Use base64 encoding when working with local or private files.

FAQs

Can I send both images and PDFs in one request?

Yes. You can mix text, images, and PDFs in the same request. The model will process them together.

What happens if I send audio to a model that doesn't support it?

Audio inputs are only processed by models with audio input modality. If you send audio to a model that doesn’t support it, the audio will be ignored silently (no error will be thrown).

Which models support audio?

Models with audio input modality support audio processing. Check the Models page and filter by audio input modality to see which models support audio.

Was this page helpful?

Getting Started

Features

API

Models

Use Cases

Community

Multimodal Capabilities

Supported Modalities

Getting Started

Model Compatibility

Check Multimodal Support

Input Format Support

FAQs

Getting Started

Features

API

Models

Use Cases

Community

​Supported Modalities

​Getting Started

​Model Compatibility

Check Multimodal Support

​Input Format Support

​FAQs

Supported Modalities

Getting Started

Model Compatibility

Input Format Support

FAQs