Skip to main content

Documentation Index

Fetch the complete documentation index at: https://spacesail.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

This example demonstrates how to create an agent that can transcribe audio conversations, identifying different speakers and providing accurate transcriptions.

Code

audio_to_text.py
import requests
from agno.agent import Agent
from agno.media import Audio
from agno.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp"),
    markdown=True,
)

url = "https://agno-public.s3.us-east-1.amazonaws.com/demo_data/QA-01.mp3"

response = requests.get(url)
audio_content = response.content

# Give a transcript of this audio conversation. Use speaker A, speaker B to identify speakers.

agent.print_response(
    "Give a transcript of this audio conversation. Use speaker A, speaker B to identify speakers.",
    audio=[Audio(content=audio_content)],
    stream=True,
)

Usage

1

Create a virtual environment

Open the Terminal and create a python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
2

Install libraries

pip install -U agno google-generativeai requests
3

Export your GOOGLE API key

  export GOOGLE_API_KEY="your_google_api_key_here"
4

Create a Python file

Create a Python file and add the above code.
touch audio_to_text.py
5

Run Agent

python audio_to_text.py
6

Find All Cookbooks

Explore all the available cookbooks in the Agno repository. Click the link below to view the code on GitHub:Agno Cookbooks on GitHub