CSV row chunking is a method of splitting CSV files based on the number of rows, rather than character count. This approach treats each row (or group of rows) as a semantic unit, preserving the integrity of individual records while enabling efficient processing of tabular data.
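The idea of treating rows as units can be sketched with the standard library alone. This is only an illustration of the technique, not how `RowChunking` is implemented: the helper name `chunk_rows`, the `"column: value"` formatting, and the three-row sample are all assumptions made for this example.

```python
import csv
import io


def chunk_rows(csv_text: str, rows_per_chunk: int = 1) -> list[str]:
    """Split CSV text into chunks made of whole rows (hypothetical helper)."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")

    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to chunk

    chunks: list[str] = []
    batch: list[str] = []
    for row in reader:
        # Pair each value with its column name so a chunk is readable on its own.
        batch.append(", ".join(f"{col}: {val}" for col, val in zip(header, row)))
        if len(batch) == rows_per_chunk:
            chunks.append("\n".join(batch))
            batch = []
    if batch:
        # Keep the final, possibly shorter, group of rows.
        chunks.append("\n".join(batch))
    return chunks


# Made-up sample data for illustration only.
sample = "Title,Year\nGuardians of the Galaxy,2014\nPrometheus,2012\nSplit,2016\n"
chunks = chunk_rows(sample, rows_per_chunk=2)
for chunk in chunks:
    print(chunk)
    print("---")
```

Because a chunk boundary can only fall between rows, no record is ever cut in half, which is the property that character-based splitting cannot guarantee.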
Code
import asyncio

from agno.agent import Agent
from agno.knowledge.chunking.row import RowChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.csv_reader import CSVReader
from agno.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge_base = Knowledge(
    vector_db=PgVector(table_name="imdb_movies_row_chunking", db_url=db_url),
)

asyncio.run(
    knowledge_base.add_content_async(
        url="https://agno-public.s3.amazonaws.com/demo_data/IMDB-Movie-Data.csv",
        reader=CSVReader(
            chunking_strategy=RowChunking(),
        ),
    )
)

# Initialize the Agent with the knowledge_base
agent = Agent(
    knowledge=knowledge_base,
    search_knowledge=True,
)

# Use the agent
agent.print_response("Tell me about the movie Guardians of the Galaxy", markdown=True)
Usage
Create a virtual environment
Open a terminal and create a Python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
Install libraries
pip install -U sqlalchemy psycopg pgvector agno
Run PgVector
docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16
Run Agent
python cookbook/knowledge/chunking/csv_row_chunking.py
CSV Row Chunking Params
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| rows_per_chunk | int | 100 | The number of rows to include in each chunk. |
| skip_header | bool | False | Whether to skip the header row when chunking. |
| clean_rows | bool | True | Whether to clean and normalize row data. |
| include_header_in_chunks | bool | False | Whether to include the header row in each chunk. |
| max_chunk_size | int | 5000 | Maximum character size for each chunk (fallback limit). |
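One way to read how `rows_per_chunk`, `include_header_in_chunks`, and `max_chunk_size` might interact is sketched below. This is an interpretation of the table, not the library's code: `pack_rows` is a hypothetical function, and the rule that a chunk closes early when the next row would push it past `max_chunk_size` is an assumption about what "fallback limit" means.

```python
def pack_rows(
    header: str,
    rows: list[str],
    rows_per_chunk: int = 100,
    include_header_in_chunks: bool = False,
    max_chunk_size: int = 5000,
) -> list[str]:
    """Group rows into chunks by count, with a character-size fallback (hypothetical)."""
    chunks: list[str] = []
    current: list[str] = []
    prefix = [header] if include_header_in_chunks else []

    def flush() -> None:
        # Emit the rows collected so far as one chunk, then start a new one.
        if current:
            chunks.append("\n".join(prefix + current))
            current.clear()

    for row in rows:
        # Size the chunk would have if this row were added to it.
        candidate_size = len("\n".join(prefix + current + [row]))
        if current and (len(current) >= rows_per_chunk or candidate_size > max_chunk_size):
            flush()
        # A single row longer than max_chunk_size is kept whole, never split.
        current.append(row)
    flush()
    return chunks


rows = ["a,1", "b,2", "c,3"]

# Row count is the primary limit; the header is repeated in every chunk.
by_count = pack_rows("k,v", rows, rows_per_chunk=2, include_header_in_chunks=True)

# The character limit closes a chunk early when the next row would not fit.
by_size = pack_rows("k,v", rows, max_chunk_size=7)

print(by_count)
print(by_size)
```

Repeating the header costs a few characters per chunk but lets each chunk be interpreted without access to the rest of the file, which is the trade-off `include_header_in_chunks` appears to control.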