Most knowledge bases work great with Agno’s defaults. But if you’re seeing slow searches, memory issues, or poor results, a few strategic changes can make a big difference.

When to Optimize

Don’t prematurely optimize. Focus on performance when you notice:
  • Slow search - Queries taking more than 2-3 seconds
  • Memory issues - Out of memory errors during content loading
  • Poor results - Search returning irrelevant chunks or missing obvious matches
  • Slow loading - Content processing taking unusually long
If things are working fine, stick with the defaults and focus on building your application.

The 80/20 of Performance

These five changes give you the biggest performance boost for the least effort:

1. Pick the Right Vector Database

Your database choice has the biggest impact on performance at scale:
from agno.vectordb.lancedb import LanceDb
from agno.vectordb.pgvector import PgVector

# Development: Fast, local, zero setup
dev_db = LanceDb(
    table_name="dev_knowledge",
    uri="./local_db"
)

# Production: Scalable, battle-tested
prod_db = PgVector(
    table_name="prod_knowledge",
    db_url="postgresql+psycopg://user:pass@db:5432/knowledge"
)
Guidelines:
  • LanceDB for development and testing (no setup required)
  • PgVector for production (scales to ~1M documents, or when you need SQL features)
  • Pinecone for managed services (no ops overhead, auto-scaling)

2. Skip Already-Processed Files

The single biggest speed-up for re-running your ingestion:
# Skip files you've already processed
knowledge.add_content(
    path="large_document.pdf",
    skip_if_exists=True,  # Don't reprocess existing files
    upsert=False          # Don't update existing
)

# For batch loading
knowledge.add_contents(
    paths=["docs/", "policies/"],
    skip_if_exists=True,
    include=["*.pdf", "*.md"],
    exclude=["*temp*", "*draft*"]
)

3. Use Metadata Filters

Narrow searches before vector comparison for faster, more accurate results:
# Slow: Search everything
results = knowledge.search("deployment process", max_results=10)

# Fast: Filter first, then search
results = knowledge.search(
    query="deployment process",
    max_results=10,
    filters={"department": "engineering", "type": "procedure"}
)

# Validate your filters to catch typos
valid_filters, invalid_keys = knowledge.validate_filters({
    "department": "engineering",
    "invalid_key": "value"  # This gets flagged
})
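Conceptually, filter validation just splits the requested keys into known and unknown ones. A standalone sketch of the idea (function and parameter names here are illustrative, not Agno's internals):

```python
def split_filters(filters: dict, valid_keys: set) -> tuple[dict, list]:
    """Split a filter dict into recognized filters and unrecognized keys."""
    valid = {k: v for k, v in filters.items() if k in valid_keys}
    invalid = [k for k in filters if k not in valid_keys]
    return valid, invalid

valid, invalid = split_filters(
    {"department": "engineering", "invalid_key": "value"},
    valid_keys={"department", "type"},
)
print(valid)    # {'department': 'engineering'}
print(invalid)  # ['invalid_key']
```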

4. Match Chunking Strategy to Your Content

Different strategies have different performance characteristics:
Strategy      Speed    Quality   Best For
Fixed Size    Fast     Good      Uniform content, when speed matters
Semantic      Slower   Best      Complex docs, when quality matters
Recursive     Fast     Good      Structured docs, good balance
from agno.knowledge.chunking.fixed import FixedSizeChunking
from agno.knowledge.chunking.semantic import SemanticChunking

# Fast processing for simple content
fast_chunking = FixedSizeChunking(
    chunk_size=800,
    overlap=80
)

# Better quality for complex content (but slower)
quality_chunking = SemanticChunking(
    chunk_size=1200,
    similarity_threshold=0.5
)
Learn more about choosing chunking strategies.
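Fixed-size chunking is fast because it is essentially string slicing with a stride, no model calls involved. A standalone sketch of the idea (not Agno's implementation):

```python
def chunk_fixed(text: str, chunk_size: int = 800, overlap: int = 80) -> list[str]:
    """Slice text into chunk_size-character pieces, each sharing
    `overlap` characters with the previous chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_fixed("x" * 2000)
print(len(chunks))  # 3 chunks, starting at offsets 0, 720, 1440
```

Semantic chunking, by contrast, embeds candidate boundaries and compares similarity scores, which is why it is markedly slower on large datasets.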

5. Use Async for Batch Operations

Process multiple items concurrently:
import asyncio

async def load_knowledge_efficiently():
    # Load multiple content sources in parallel
    tasks = [
        knowledge.add_content_async(path="docs/hr/"),
        knowledge.add_content_async(path="docs/engineering/"),
        knowledge.add_content_async(url="https://company.com/api-docs"),
    ]
    await asyncio.gather(*tasks)

asyncio.run(load_knowledge_efficiently())
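asyncio.gather launches every task at once. If you have many sources, a semaphore caps how many run concurrently so you don't flood your embedder API with parallel requests. A standalone sketch with a stand-in loader (the real call would be knowledge.add_content_async):

```python
import asyncio

async def load_source(path: str, sem: asyncio.Semaphore, loaded: list) -> None:
    async with sem:
        # Stand-in for: await knowledge.add_content_async(path=path)
        await asyncio.sleep(0)
        loaded.append(path)

async def load_all(paths: list[str], max_concurrent: int = 3) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)
    loaded: list[str] = []
    await asyncio.gather(*(load_source(p, sem, loaded) for p in paths))
    return loaded

done = asyncio.run(load_all(["docs/hr/", "docs/engineering/", "docs/legal/", "docs/sales/"]))
print(len(done))  # 4 — all sources loaded, at most 3 in flight at a time
```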

Common Performance Pitfalls

Issue: Search Returns Irrelevant Results

What’s happening: Chunks are too large, too small, or the chunking strategy doesn’t match your content.
Quick fixes:
  1. Check your chunking strategy - try semantic chunking for better context
  2. Verify content actually loaded: knowledge.get_content_status(content_id)
  3. Increase max_results to see if relevant results are just ranked lower
  4. Add metadata filters to narrow the search scope
# Debug search quality
results = knowledge.search("your query", max_results=10)
if not results:
    content_list, count = knowledge.get_content()
    print(f"Total content items: {count}")
    
    # Check for failed content
    for content in content_list[:5]:
        status, message = knowledge.get_content_status(content.id)
        print(f"{content.name}: {status}")

Issue: Content Loading is Slow

What’s happening: Processing large files without batching, or using semantic chunking on huge datasets.
Quick fixes:
  1. Use skip_if_exists=True to avoid reprocessing
  2. Switch to fixed-size chunking for faster processing
  3. Process in batches instead of all at once
  4. Use file filters to only process what you need
# Batch processing for large datasets
import os

def load_content_in_batches(knowledge, content_dir, batch_size=10):
    files = [f for f in os.listdir(content_dir) if f.endswith('.pdf')]
    
    for i in range(0, len(files), batch_size):
        batch_files = files[i:i+batch_size]
        print(f"Processing batch {i//batch_size + 1}")
        
        for file in batch_files:
            knowledge.add_content(
                path=os.path.join(content_dir, file),
                skip_if_exists=True
            )

Issue: Running Out of Memory

What’s happening: Loading too many large files at once, or chunk sizes that are too large.
Quick fixes:
  1. Process content in smaller batches (see code above)
  2. Reduce chunk size in your chunking strategy
  3. Use include and exclude patterns to limit what gets processed
  4. Clear old/outdated content regularly with knowledge.remove_content_by_id()
# Process only what you need
knowledge.add_contents(
    paths=["large_dataset/"],
    include=["*.pdf"],       # Only PDFs
    exclude=["*backup*"],    # Skip backups
    skip_if_exists=True,
    metadata={"batch": "current"}
)

Advanced Optimizations

Once you’ve applied the quick wins above, consider these for further improvements.

Enable Hybrid Search

Combine vector and keyword search for better results:
from agno.vectordb.pgvector import PgVector, SearchType

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    search_type=SearchType.hybrid  # Vector + keyword search
)

Add Reranking

Improve result quality by reranking with Cohere:
from agno.knowledge.reranker.cohere import CohereReranker

vector_db = PgVector(
    table_name="knowledge",
    db_url="postgresql+psycopg://user:pass@localhost:5432/db",
    reranker=CohereReranker(
        model="rerank-multilingual-v3.0",
        top_n=10
    )
)

Optimize Embedder Dimensions

Reduce dimensions for faster search (with slight quality trade-off):
from agno.knowledge.embedder.openai import OpenAIEmbedder

# Smaller dimensions = faster search, lower cost
embedder = OpenAIEmbedder(
    id="text-embedding-3-large",
    dimensions=1024  # Instead of full 3072
)
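The speed and cost savings come largely from index size: fewer dimensions mean less memory to scan per query. A quick back-of-envelope for one million float32 vectors:

```python
# Approximate index memory for 1M float32 vectors
docs = 1_000_000
bytes_per_float = 4

full = docs * 3072 * bytes_per_float / 1e9     # full 3072 dimensions
reduced = docs * 1024 * bytes_per_float / 1e9  # truncated to 1024 dimensions

print(f"3072 dims: {full:.1f} GB")    # 12.3 GB
print(f"1024 dims: {reduced:.1f} GB")  # 4.1 GB
```

Roughly a 3x reduction in index memory and scan work, at the cost of a small drop in retrieval quality.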

Monitoring Performance

Keep an eye on these metrics:
# Check content processing status
content_list, total_count = knowledge.get_content()

failed = [c for c in content_list if c.status == "failed"]
if failed:
    print(f"Failed items: {len(failed)}")
    for content in failed:
        status, message = knowledge.get_content_status(content.id)
        print(f"  {content.name}: {message}")

# Time your searches
import time

start = time.time()
results = knowledge.search("test query", max_results=5)
elapsed = time.time() - start
print(f"Search took {elapsed:.2f} seconds")

Next Steps

Chunking Strategies

Learn how different chunking strategies affect performance

Vector Databases

Compare vector database options for your scale

Embedders

Choose the right embedder for your use case

Hybrid Search

Combine vector and keyword search for better results

Start simple, optimize when needed. Agno’s defaults work well for most use cases. Profile your application to find actual bottlenecks before spending time on optimization.