Documentation Index
Fetch the complete documentation index at: https://spacesail.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Semantic chunking is a method of splitting documents into smaller chunks by analyzing semantic similarity between text segments using embeddings. It uses the chonkie library to identify natural breakpoints where the semantic meaning changes significantly, based on a configurable similarity threshold. This helps preserve context and meaning better than fixed-size chunking by ensuring semantically related content stays together in the same chunk, while splitting occurs at meaningful topic transitions.
Code
import asyncio
from agno.agent import Agent
from agno.knowledge.chunking.semantic import SemanticChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"
knowledge = Knowledge(
vector_db=PgVector(table_name="recipes_semantic_chunking", db_url=db_url),
)
asyncio.run(knowledge.add_content_async(
url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
reader=PDFReader(
name="Semantic Chunking Reader",
chunking_strategy=SemanticChunking(similarity_threshold=0.5),
),
))
agent = Agent(
knowledge=knowledge,
search_knowledge=True,
)
agent.print_response("How to make Thai curry?", markdown=True)
Usage
Create a virtual environment
Open the Terminal and create a python virtual environment.python3 -m venv .venv
source .venv/bin/activate
Install libraries
pip install -U sqlalchemy psycopg pgvector agno chonkie
Run PgVector
docker run -d \
-e POSTGRES_DB=ai \
-e POSTGRES_USER=ai \
-e POSTGRES_PASSWORD=ai \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-v pgvolume:/var/lib/postgresql/data \
-p 5532:5432 \
--name pgvector \
agno/pgvector:16
Run Agent
python cookbook/knowledge/chunking/semantic_chunking.py
Semantic Chunking Params
| Parameter | Type | Default | Description |
embedder | Embedder | OpenAIEmbedder | The embedder to use for semantic chunking. |
chunk_size | int | 5000 | The maximum size of each chunk. |
similarity_threshold | float | 0.5 | The similarity threshold for determining chunk boundaries. |