Embedders turn text, images, and other data into vectors (lists of numbers) that capture meaning. Those vectors make it easy to store and search information semantically, so you find content by intent and context, not just exact keywords. If you’re building features like retrieval-augmented generation (RAG), semantic search, question answering over docs, or long-term memory for agents, embedders are the foundation that makes it all work.
Why use embedders?
- Better recall than keywords: They understand meaning, so “How do I reset my passcode?” finds docs mentioning “change PIN”.
- Ground LLMs in your data: Provide the model with trusted, domain-specific context at answer time.
- Scale to large knowledge bases: Vectors enable fast similarity search across thousands or millions of chunks.
- Multilingual retrieval: Many embedders map different languages to the same semantic space.
When to use embedders
Use embedders when you need any of the following:
- RAG and context injection: Supply relevant snippets to your agent before responding.
- Semantic search: Let users query by meaning across product docs, wikis, tickets, or chats.
- Deduplication and clustering: Group similar content or avoid repeating the same info.
- Personal and team memory: Store summaries and facts for later recall by agents.
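
Deduplication, for example, can be approximated by dropping any item whose vector points in nearly the same direction as one already kept. The following is a minimal sketch under assumptions: the vectors are hand-written stand-ins for real embeddings, and the 0.95 threshold and the function names are illustrative choices, not values or APIs from Agno.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(items, threshold=0.95):
    """Keep an item only if it is not too similar to any item already kept.

    `items` is a list of (text, vector) pairs; `threshold` is an assumed cutoff.
    """
    kept = []
    for text, vec in items:
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]


# Hand-written 3-dimensional vectors standing in for real embeddings.
items = [
    ("Reset your passcode", [0.9, 0.1, 0.0]),
    ("Change your PIN", [0.88, 0.12, 0.01]),
    ("Export an invoice", [0.0, 0.2, 0.95]),
]
unique = deduplicate(items)
```

With these example vectors the second item is treated as a near-duplicate of the first and dropped. A suitable threshold depends on the embedder and the content, so it is worth tuning on real data.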
How it works in Agno
Agno uses OpenAIEmbedder as the default, but you can swap in any supported embedder. When you add content to a knowledge base, the embedder converts each chunk into a vector and stores it in your vector database. Later, when an agent searches, the same embedder converts the query into a vector, and the vector database returns the stored vectors most similar to it.
Here’s a basic setup:
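
As a library-independent sketch of that flow (not Agno's API), the example below embeds two chunks, stores them, and ranks them against a query. `ToyEmbedder`, `ToyVectorStore`, and `tokenize` are hypothetical names. The word-count embedder only captures word overlap, whereas a real embedder such as OpenAIEmbedder captures meaning.

```python
import math


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return [w for w in words if w]


class ToyEmbedder:
    """Hypothetical stand-in for a real embedder: word counts over a fixed vocabulary."""

    def __init__(self, vocabulary):
        self.index = {word: i for i, word in enumerate(sorted(vocabulary))}

    def embed(self, text):
        vec = [0.0] * len(self.index)
        for word in tokenize(text):
            if word in self.index:
                vec[self.index[word]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        # Unit-length vectors make the dot product equal to cosine similarity.
        return [x / norm for x in vec] if norm else vec


class ToyVectorStore:
    """Hypothetical in-memory store holding (chunk, vector) rows."""

    def __init__(self, embedder):
        self.embedder = embedder
        self.rows = []

    def add(self, chunk):
        # Embedding happens at insert time, once per chunk.
        self.rows.append((chunk, self.embedder.embed(chunk)))

    def search(self, query, limit=1):
        # The query is embedded with the same embedder as the stored chunks.
        q = self.embedder.embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:limit]]


chunks = [
    "To reset your password, open Settings and choose Security.",
    "Invoices are emailed at the start of each month.",
]
vocabulary = {word for chunk in chunks for word in tokenize(chunk)}
store = ToyVectorStore(ToyEmbedder(vocabulary))
for chunk in chunks:
    store.add(chunk)

results = store.search("How do I reset my password?")
```

The important design point is that chunks and queries go through the same embedder, so their vectors live in the same space and can be compared directly.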
Choosing an embedder
Pick based on your constraints:
- Hosted vs local: Prefer local (e.g., Ollama, FastEmbed) for offline or strict data residency; hosted (OpenAI, Gemini, Voyage) for best quality and convenience.
- Latency and cost: Smaller models are cheaper/faster; larger models often retrieve better.
- Language support: Ensure your embedder supports the languages you expect.
- Dimension compatibility: Match your vector DB’s expected embedding size if it’s fixed.
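
Dimension mismatches are easiest to catch before anything is written to the database. The check below is a minimal sketch: `check_dimensions` is a hypothetical helper, and the value 1536 is only an assumed example of a fixed index size, so substitute whatever your vector DB and embedder actually use.

```python
def check_dimensions(vector, expected_dim):
    """Raise early if an embedding does not match the size the index expects."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but the index expects {expected_dim}."
        )
    return vector


EXPECTED_DIM = 1536  # assumed size of a fixed-dimension index

# A vector of the right size passes through unchanged.
ok = check_dimensions([0.0] * EXPECTED_DIM, EXPECTED_DIM)

# A vector from a different model (here 768 dimensions) is rejected.
try:
    check_dimensions([0.0] * 768, EXPECTED_DIM)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

Running a check like this once at startup, on a single test embedding, is usually enough to surface a mismatch after switching embedders.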
Quick Comparison
| Embedder | Type | Best For | Cost | Performance |
|---|---|---|---|---|
| OpenAI | Hosted | General use, proven quality | $$ | Excellent |
| Ollama | Local | Privacy, offline, no API costs | Free | Good |
| Voyage AI | Hosted | Specialized retrieval tasks | $$$ | Excellent |
| Gemini | Hosted | Google ecosystem, multilingual | $$ | Excellent |
| FastEmbed | Local | Fast local embeddings | Free | Good |
| HuggingFace | Local/Hosted | Open source models, customization | Free/$ | Variable |
Supported embedders
The following embedders are supported:
- OpenAI
- Cohere
- Gemini
- AWS Bedrock
- Azure OpenAI
- Fireworks
- HuggingFace
- Jina
- Mistral
- Nebius
- Ollama
- Qdrant FastEmbed
- Together
- Voyage AI
Best Practices
Batch Embeddings
Many embedding providers support processing multiple texts in a single API call, known as batch embedding. This approach reduces the number of API requests, helps avoid rate limits, and significantly improves performance when processing large amounts of text. To enable batch processing, set the enable_batch flag to True when configuring your embedder.
The batch_size parameter controls the number of texts sent per batch.
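
To illustrate what batching does conceptually, the sketch below splits a list of texts into groups of at most `batch_size` and makes one call per group. It is not Agno's internal implementation: `iter_batches`, `embed_all`, and `fake_embed_batch` are hypothetical names, and the fake provider call simply returns a one-number vector per text.

```python
def iter_batches(texts, batch_size):
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def embed_all(texts, embed_batch, batch_size=100):
    """Embed every text, issuing one `embed_batch` call per batch.

    Returns the vectors in input order and the number of calls made.
    """
    vectors = []
    calls = 0
    for batch in iter_batches(texts, batch_size):
        vectors.extend(embed_batch(batch))  # one request covers the whole batch
        calls += 1
    return vectors, calls


def fake_embed_batch(batch):
    """Hypothetical provider call: returns one tiny vector per input text."""
    return [[float(len(text))] for text in batch]


texts = [f"chunk {i}" for i in range(250)]
vectors, calls = embed_all(texts, fake_embed_batch, batch_size=100)
```

Here 250 texts with a batch size of 100 result in 3 calls instead of 250. Larger batches mean fewer requests, but providers typically cap the size of a single request, so the best value depends on the provider.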