CSV Row Chunking

CSV row chunking splits a CSV file into chunks along row boundaries, so that every chunk contains one or more complete rows. Rather than cutting the text at fixed character counts, which can split a record in the middle, it keeps each row intact so that a record can be retrieved as a unit.
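The idea of chunking a CSV by rows can be sketched with the Python standard library alone. This is only an illustration and not Agno's implementation: `chunk_csv_by_row` is a hypothetical helper, the two-row sample is made up for the example, and pairing each value with its column name is an assumption about what makes a useful chunk.

```python
import csv
import io


def chunk_csv_by_row(csv_text: str) -> list[str]:
    """Return one text chunk per data row (hypothetical helper, not Agno's API)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)  # the first row is treated as the header
    if header is None:
        return []
    chunks = []
    for row in reader:
        # Pair each value with its column name so the chunk is self-describing
        chunks.append(", ".join(f"{col}: {val}" for col, val in zip(header, row)))
    return chunks


# Tiny made-up sample in the spirit of the IMDB dataset used on this page
sample = "Title,Year\nGuardians of the Galaxy,2014\nPrometheus,2012\n"
for chunk in chunk_csv_by_row(sample):
    print(chunk)
```

Because every chunk is built from exactly one row, a search hit always returns a whole record rather than a fragment of one.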

Code

import asyncio
from agno.agent import Agent
from agno.knowledge.chunking.row import RowChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.csv_reader import CSVReader
from agno.vectordb.pgvector import PgVector

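# Connection string for the local PgVector container (see "Run PgVector" below)
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Knowledge base backed by a PgVector table
knowledge_base = Knowledge(
    vector_db=PgVector(table_name="imdb_movies_row_chunking", db_url=db_url),
)

# Download the CSV and load it into the knowledge base, chunked by row
asyncio.run(
    knowledge_base.add_content_async(
        url="https://agno-public.s3.amazonaws.com/demo_data/IMDB-Movie-Data.csv",
        reader=CSVReader(
            chunking_strategy=RowChunking(),
        ),
    )
)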

# Initialize the Agent with the knowledge_base
agent = Agent(
    knowledge=knowledge_base,
    search_knowledge=True,
)

# Ask the agent a question that it answers from the knowledge base
agent.print_response("Tell me about the movie Guardians of the Galaxy", markdown=True)

Usage

Create a virtual environment

Open a terminal and create a Python virtual environment.

python3 -m venv .venv
source .venv/bin/activate

Install libraries

pip install -U sqlalchemy psycopg pgvector agno

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16

Run Agent

python cookbook/knowledge/chunking/csv_row_chunking.py

CSV Row Chunking Params

Parameter	Type	Default	Description
`rows_per_chunk`	`int`	`100`	The number of rows to include in each chunk.
`skip_header`	`bool`	`False`	Whether to skip the header row when chunking.
`clean_rows`	`bool`	`True`	Whether to clean and normalize row data.
`include_header_in_chunks`	`bool`	`False`	Whether to include the header row in each chunk.
`max_chunk_size`	`int`	`5000`	Maximum character size for each chunk (fallback limit).
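To make the parameters above concrete, here is a minimal standard-library sketch of how they could interact. It is not Agno's code: `chunk_rows` and `_to_line` are hypothetical helpers, the defaults are copied from the table, and the behavior chosen for `skip_header`, `include_header_in_chunks`, and `max_chunk_size` is an assumption based only on their descriptions. `clean_rows` is left out because the table does not say what cleaning involves.

```python
import csv
import io


def _to_line(row: list[str]) -> str:
    """Serialize one parsed row back into a single CSV line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(row)
    return buf.getvalue()


def chunk_rows(
    csv_text: str,
    rows_per_chunk: int = 100,
    skip_header: bool = False,
    include_header_in_chunks: bool = False,
    max_chunk_size: int = 5000,
) -> list[str]:
    """Group CSV rows into text chunks (hypothetical sketch, not Agno's API)."""
    lines = [_to_line(r) for r in csv.reader(io.StringIO(csv_text)) if r]
    if not lines:
        return []
    header, body = lines[0], lines[1:]
    # Assumption: unless the header is skipped or repeated per chunk,
    # it is chunked like any other row.
    data = body if (skip_header or include_header_in_chunks) else lines
    chunks = []
    for start in range(0, len(data), rows_per_chunk):
        group = data[start:start + rows_per_chunk]
        if include_header_in_chunks:
            group = [header] + group
        text = "\n".join(group)
        # Assumption: max_chunk_size is a hard fallback cut on characters
        chunks.extend(
            text[i:i + max_chunk_size] for i in range(0, len(text), max_chunk_size)
        )
    return chunks


sample = "Title,Year\nA,2001\nB,2002\nC,2003\n"
print(chunk_rows(sample, rows_per_chunk=2, include_header_in_chunks=True))
```

Repeating the header in each chunk costs a few characters per chunk but keeps the column names next to the values, which can help when a chunk is read in isolation.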
