Readers transform raw content from different sources into structured Document objects that can be embedded, chunked, and stored in vector databases.
What are Readers?
A Reader is a specialized component that knows how to parse and extract content from specific data sources or file formats. Think of readers as translators that convert different content formats into a standardized format that Agno can work with. Every piece of content that enters your knowledge base must pass through a reader first. The reader’s job is to:
- Parse the raw content from its original format
- Extract the meaningful text and metadata
- Structure the content into Document objects
- Apply chunking strategies to break large content into manageable pieces
How Readers Work
All readers inherit from the base Reader class and follow a consistent pattern: they take in raw content and return a list of Document objects.
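Here is a minimal sketch of that pattern. It assumes a base class with a single read() method that returns a list of documents. BaseReader, InlineTextReader, and this Document dataclass are hypothetical stand-ins written for illustration. They are not Agno's actual classes or signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a Document: extracted text plus a name and metadata."""
    content: str
    name: str = ""
    meta_data: Dict[str, Any] = field(default_factory=dict)


class BaseReader(ABC):
    """Hypothetical base class: every reader exposes the same read() entry point."""

    @abstractmethod
    def read(self, source: str) -> List[Document]:
        """Turn a raw source into a list of Documents."""


class InlineTextReader(BaseReader):
    """Toy reader that treats the source string itself as the raw text."""

    def read(self, source: str) -> List[Document]:
        text = source.strip()
        if not text:
            return []  # nothing to extract
        return [Document(content=text, name="inline-text", meta_data={"length": len(text)})]


# Callers depend only on the shared interface, not on the concrete reader.
readers: List[BaseReader] = [InlineTextReader()]
for reader in readers:
    for doc in reader.read("  Hello, Agno!  "):
        print(doc.name, doc.content)  # prints: inline-text Hello, Agno!
```

Because every reader shares the same read() interface, calling code can swap one reader for another without changes.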
The Reading Process
When a reader processes content, it follows these steps:
- Content Ingestion: The reader receives raw content (file, URL, text, etc.)
- Parsing: Extract text and metadata using format-specific logic
- Document Creation: Convert parsed content into Document objects
- Chunking: Apply chunking strategies to break content into smaller pieces
- Return: Provide a list of processed documents ready for embedding
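The five steps can be sketched end to end in plain Python. This is an illustrative toy, not Agno's implementation. The names parse, chunk_by_paragraph, and read are hypothetical. Parsing is just UTF-8 decoding, and chunking splits the text on blank lines.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    content: str
    name: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


def parse(raw: bytes) -> str:
    # Parsing: a real reader would use format-specific logic; here we just decode UTF-8
    return raw.decode("utf-8")


def chunk_by_paragraph(doc: Document) -> List[Document]:
    # Chunking: one chunk per blank-line-separated paragraph, keeping the parent's metadata
    paragraphs = [p.strip() for p in doc.content.split("\n\n") if p.strip()]
    return [
        Document(content=p, name=doc.name, meta_data={**doc.meta_data, "chunk": i})
        for i, p in enumerate(paragraphs, start=1)
    ]


def read(raw: bytes, name: str) -> List[Document]:
    text = parse(raw)  # ingestion + parsing
    document = Document(content=text, name=name, meta_data={"source": name})  # document creation
    return chunk_by_paragraph(document)  # chunking + return


docs = read(b"Readers parse content.\n\nThen they chunk it.\n", "intro.txt")
for d in docs:
    print(d.meta_data["chunk"], d.content)
```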
Content Types and Specialization
Each reader specializes in handling specific content types. To do so, a reader can:
- Use format-specific parsing libraries
- Extract relevant metadata
- Handle format-specific challenges (encryption, encoding, etc.)
- Optimize processing for that content type
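Here is a small CSV example that shows what format-specific parsing looks like. It is built on Python's standard csv module. csv_to_documents is a hypothetical helper, not Agno's CSVReader. It shows a dedicated parser handling a format-specific challenge: a custom delimiter, plus a quoted field that contains that delimiter. It also attaches per-row metadata.

```python
import csv
import io
from typing import Any, Dict, List


def csv_to_documents(raw: str, delimiter: str = ",") -> List[Dict[str, Any]]:
    """Turn each CSV row into a field-labeled text document (illustrative, not Agno's CSVReader)."""
    rows = csv.DictReader(io.StringIO(raw), delimiter=delimiter)  # stdlib parser handles quoting
    documents = []
    for row_number, row in enumerate(rows, start=1):
        text = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append({
            "content": text,
            "meta_data": {"row": row_number, "columns": list(rows.fieldnames or [])},
        })
    return documents


docs = csv_to_documents('name;role\nAda;"engineer; mathematician"\nGrace;admiral\n', delimiter=";")
print(docs[0]["content"])
```

A naive split on ";" would break the quoted field in two. The csv module keeps it intact.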
Reader Configuration
Readers are highly configurable to meet different processing needs.
Chunking Control
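Here is a minimal sketch of what a chunking switch and a size limit control. ChunkingOptions and apply_chunking are hypothetical names. Fixed-size character slicing is only one simple strategy, and it is not necessarily what any Agno reader does by default.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkingOptions:
    """Hypothetical options modelled on a chunk on/off switch and a size limit."""
    chunk: bool = True
    chunk_size: int = 100  # maximum characters per chunk (illustrative value)


def apply_chunking(text: str, options: ChunkingOptions) -> List[str]:
    if not options.chunk:
        return [text]  # chunking disabled: the whole text stays one piece
    if options.chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    size = options.chunk_size
    return [text[i:i + size] for i in range(0, len(text), size)]


text = "x" * 250
print(len(apply_chunking(text, ChunkingOptions(chunk_size=100))))  # 3 chunks: 100 + 100 + 50
print(len(apply_chunking(text, ChunkingOptions(chunk=False))))     # 1 chunk
```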
Content Processing Options
Encoding Control
For text-based readers, you can override the file encoding. This helps when a file was saved in a different encoding than the reader expects.
Metadata and Naming
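Here is a minimal sketch of a text reader that accepts encoding, naming, and metadata options. It is plain Python, not Agno's API. The function read_text_file and its encoding, name, and metadata parameters are hypothetical.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    content: str
    name: str
    meta_data: Dict[str, Any] = field(default_factory=dict)


def read_text_file(
    path: Path,
    encoding: str = "utf-8",
    name: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> List[Document]:
    """Hypothetical text reader with encoding, name, and metadata overrides."""
    text = path.read_text(encoding=encoding)  # decode with the caller-chosen encoding
    meta = {"file": path.name, "encoding": encoding, **(metadata or {})}
    return [Document(content=text, name=name or path.stem, meta_data=meta)]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_bytes("café menu".encode("latin-1"))  # not valid UTF-8
    docs = read_text_file(path, encoding="latin-1", name="menu", metadata={"team": "ops"})

print(docs[0].name, docs[0].content, docs[0].meta_data)
```

Reading the same file with the default UTF-8 encoding would raise UnicodeDecodeError. An encoding override exists for exactly that case.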
The Document Output
Readers convert raw content into Document objects. Each Document holds a piece of extracted text together with its metadata.
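Here is a plain dataclass that gives a rough picture of that structure. The field names (content, id, name, meta_data, embedding) are assumptions chosen for illustration. Agno's own Document class is the authoritative definition.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Illustrative shape of a reader's output; field names are assumptions."""
    content: str                                                 # extracted text of this document or chunk
    id: str = field(default_factory=lambda: str(uuid.uuid4()))   # unique identifier
    name: Optional[str] = None                                   # e.g. a file name or page title
    meta_data: Dict[str, Any] = field(default_factory=dict)      # source-specific details
    embedding: Optional[List[float]] = None                      # empty until the document is embedded


doc = Document(content="Quarterly revenue grew.", name="report.pdf", meta_data={"page": 3})
print(doc.name, doc.meta_data, doc.embedding)  # report.pdf {'page': 3} None
```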
Chunking Integration
One of the most important features of readers is their integration with chunking strategies.
Automatic Chunking
When chunk=True, readers automatically apply chunking strategies to break large documents into smaller, more manageable pieces.
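A common way to chunk automatically is a sliding window with overlap. Context near a chunk boundary then appears in both neighbouring chunks. The function below is a minimal word-based sketch. chunk_words and its chunk_size and overlap parameters are illustrative. They are not Agno's chunking strategy classes.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into chunks of at most chunk_size words, repeating `overlap` words between neighbours."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(1, 11))  # "w1 w2 ... w10"
for piece in chunk_words(text, chunk_size=4, overlap=1):
    print(piece)
# w1 w2 w3 w4
# w4 w5 w6 w7
# w7 w8 w9 w10
```

Smaller windows give more precise matches at retrieval time. Larger windows and more overlap preserve more surrounding context.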
Chunking Strategy Support
Different readers support different chunking strategies based on their content type.
Reader Factory and Auto-Selection
Agno provides automatic reader selection through the ReaderFactory, which can choose an appropriate reader for the content you provide.
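Here is a sketch of how a factory might choose readers, using an extension lookup and a cache. The names get_reader and reader_for, and the stub reader classes, are hypothetical. They do not reflect ReaderFactory's actual interface. The sketch only illustrates two ideas: selecting a reader by file type, and reusing one cached reader instance per type.

```python
from functools import lru_cache
from pathlib import Path
from typing import Dict, Type


class ReaderStub:
    """Placeholder base class standing in for a real reader."""


class TextReaderStub(ReaderStub):
    pass


class MarkdownReaderStub(ReaderStub):
    pass


class CSVReaderStub(ReaderStub):
    pass


# Hypothetical mapping from file extension to reader class
READERS_BY_EXTENSION: Dict[str, Type[ReaderStub]] = {
    ".txt": TextReaderStub,
    ".md": MarkdownReaderStub,
    ".csv": CSVReaderStub,
}


@lru_cache(maxsize=None)
def get_reader(extension: str) -> ReaderStub:
    # Cached: each extension gets one reader instance that is reused across files
    reader_class = READERS_BY_EXTENSION.get(extension, TextReaderStub)  # plain-text fallback
    return reader_class()


def reader_for(path: str) -> ReaderStub:
    # Normalise the extension so "report.CSV" and "data.csv" share a cached reader
    return get_reader(Path(path).suffix.lower())


print(type(reader_for("guide.md")).__name__)  # MarkdownReaderStub
```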
Supported Readers
The following readers are currently supported:

| Reader Name | Description |
|---|---|
| ArxivReader | Fetches and processes academic papers from arXiv |
| CSVReader | Parses CSV files and converts rows to documents |
| FieldLabeledCSVReader | Converts CSV rows to field-labeled text documents |
| FirecrawlReader | Uses Firecrawl API to scrape and crawl web content |
| JSONReader | Processes JSON files and converts them into documents |
| MarkdownReader | Reads and parses Markdown files |
| PDFReader | Reads and extracts text from PDF files |
| PPTXReader | Reads and extracts text from PowerPoint (.pptx) files |
| TextReader | Handles plain text files |
| WebsiteReader | Crawls entire websites following links recursively |
| WebSearchReader | Searches and reads web search results |
| WikipediaReader | Searches and reads Wikipedia articles |
| YouTubeReader | Extracts transcripts and metadata from YouTube videos |
Async Processing
All readers support asynchronous processing, so I/O-bound work such as fetching URLs or reading files can run concurrently.
Usage in Knowledge
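Readers integrate with Agno Knowledge. Content you add to a knowledge base is first processed by a reader into documents, which are then embedded and stored.
Best Practices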
Choose the Right Reader
- Use specialized readers for better extraction quality
- Consider format-specific features (PDF encryption, CSV delimiters, etc.)
Configure Chunking Appropriately
- Smaller chunks for precise retrieval
- Larger chunks for maintaining context
- Use semantic chunking for structured documents
Optimize for Performance
- Use async readers for I/O-heavy operations
- Batch process multiple files when possible
- Cache readers through ReaderFactory when processing many files
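Async readers pay off when most of the time is spent waiting on I/O. The sketch below uses plain asyncio rather than Agno's async API. read_source and read_all are hypothetical, and asyncio.sleep stands in for a network or disk read. A semaphore caps how many reads run at once, so a large batch does not open too many connections at the same time.

```python
import asyncio
from typing import Dict, List


async def read_source(name: str, delay: float) -> List[str]:
    # Stand-in for an I/O-bound read (HTTP fetch, disk read, API call); sleep simulates waiting
    await asyncio.sleep(delay)
    return [f"document from {name}"]


async def read_all(sources: Dict[str, float], limit: int = 4) -> List[str]:
    semaphore = asyncio.Semaphore(limit)  # cap how many reads run at once

    async def guarded(name: str, delay: float) -> List[str]:
        async with semaphore:
            return await read_source(name, delay)

    # gather runs the reads concurrently and returns results in input order
    results = await asyncio.gather(*(guarded(n, d) for n, d in sources.items()))
    return [doc for docs in results for doc in docs]


docs = asyncio.run(read_all({"a.pdf": 0.05, "b.pdf": 0.05, "c.pdf": 0.05}))
print(docs)
```

The three simulated reads overlap, so the batch takes roughly one delay rather than three.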
Handle Errors Gracefully
- Readers return an empty list when processing fails
- Check reader logs for debugging information
- Provide fallback readers for unknown formats
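The error-handling advice can be combined into a small fallback chain. This is a hypothetical sketch in plain Python, not an Agno feature. Each reader follows the empty-list-on-failure convention listed above and logs why it failed. read_with_fallback tries the readers in order until one returns documents.

```python
import logging
from typing import Callable, List

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("reader-sketch")

ReadFn = Callable[[bytes], List[str]]


def strict_utf8_reader(raw: bytes) -> List[str]:
    try:
        return [raw.decode("utf-8")]
    except UnicodeDecodeError as exc:
        logger.warning("strict UTF-8 reader failed: %s", exc)  # leave a trace for debugging
        return []  # empty list signals failure instead of raising


def lenient_reader(raw: bytes) -> List[str]:
    # Fallback: undecodable bytes become U+FFFD instead of causing a failure
    return [raw.decode("utf-8", errors="replace")]


def read_with_fallback(raw: bytes, readers: List[ReadFn]) -> List[str]:
    for read in readers:
        documents = read(raw)
        if documents:
            return documents
    return []  # every reader failed


print(read_with_fallback(b"caf\xe9", [strict_utf8_reader, lenient_reader]))
```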
Next Steps
- Chunking Strategies: Learn how to optimize content chunking for better search results
- Content Types: Understand different ways to add information to your knowledge base
- Vector Databases: Choose the right storage solution for your processed content
- Examples: See readers in action with practical examples