Skip to main content

Documentation Index

Fetch the complete documentation index at: https://spacesail.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

This example demonstrates how to create an agent that can process audio input and generate audio output using OpenAI’s GPT-4o audio preview model. The agent analyzes an audio recording and responds with both text and audio.

Code

audio_input_output.py
import requests
from agno.agent import Agent
from agno.media import Audio
from agno.models.openai import OpenAIChat
from agno.utils.audio import write_audio_to_file
from rich.pretty import pprint

# Fetch the audio file and convert it to a base64 encoded string
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content

agent = Agent(
    model=OpenAIChat(
        id="gpt-5-mini-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "sage", "format": "wav"},
    ),
    markdown=True,
)

run_response = agent.run(
    "What's in these recording?",
    audio=[Audio(content=wav_data, format="wav")],
)

if run_response.response_audio is not None:
    pprint(run_response.content)
    write_audio_to_file(
        audio=run_response.response_audio.content, filename="tmp/result.wav"
    )

Usage

1

Create a virtual environment

Open the Terminal and create a python virtual environment.
python3 -m venv .venv
source .venv/bin/activate
2

Install libraries

pip install -U openai agno requests
3

Export your OpenAI API key

  export OPENAI_API_KEY="your_openai_api_key_here"
4

Create a Python file

Create a Python file and add the above code.
touch audio_input_output.py
5

Run Agent

python audio_input_output.py
6

Find All Cookbooks

Explore all the available cookbooks in the Agno repository. Click the link below to view the code on GitHub:Agno Cookbooks on GitHub