Build Your Own Voice Assistant: A Step-by-Step Guide with OpenAI Agents SDK

Are you envisioning a digital voice assistant capable of handling any user query, from account actions to product details? Building a complete voice assistant used to be a complex endeavor. Fortunately, OpenAI's recent releases (the Responses API, the Agents SDK, and voice agents) make it easier than ever to build and orchestrate voice-powered agent workflows. This guide shows you how to use these tools to create your own intelligent voice assistant.

What You'll Learn

How to build a modular, agentic architecture for your voice assistant.
How to route user requests to specialized agents for accurate responses.
How to integrate voice pipelines to enable audio input and output.
How to use real-time information from web search to inform purchasing decisions.

Install Required Packages

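Make sure the following packages are installed before proceeding. The quotes around openai-agents[voice] stop shells such as zsh from treating the brackets as a glob pattern. (The os module ships with Python's standard library, so it does not need to be installed.)

pip install openai
pip install "openai-agents[voice]"
pip install numpy
pip install sounddevice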
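To confirm the installation before going further, a small standard-library check like the following can help. It is a sketch: the helper name `missing_modules` is hypothetical, and it relies on the `openai-agents` package being imported as `agents`. Because it uses `importlib.util.find_spec`, it only locates the modules without importing them, so a missing PortAudio library behind `sounddevice` will not show up at this stage.

```python
import importlib.util

# Import names for the packages installed above
# (the pip package "openai-agents" is imported as "agents")
REQUIRED_MODULES = ["openai", "agents", "numpy", "sounddevice"]


def missing_modules(names=REQUIRED_MODULES):
    """Hypothetical helper: return the module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All packages found" if not missing else f"Missing: {', '.join(missing)}")
```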

Core Components: OpenAI's Powerful Trio

Three key releases from OpenAI have simplified the development of voice assistants:

Responses API: This agentic API allows easy interaction with OpenAI's advanced models through stateful conversations. It provides built-in tools for file search, web search, and computer use.
Agents SDK: This open-source framework lets you build and orchestrate workflows across multiple agents. It allows your assistant to route inputs to the appropriate agent and to scale to other use cases.
Voice Agents: Extending the Agents SDK, voice agents facilitate the use of voice pipelines. They enable your agents to understand and produce audio with minimal code.

Building Blocks: Specialized Agents for Specific Tasks

This guide shows how to build an in-app voice assistant for a fictitious consumer application. We'll create these separate agents:

Triage Agent: Greets users, determines their intent, and routes requests to other specific agents.
Search Agent: Uses the Responses API's web search tool to provide real-time information.
Knowledge Agent: Uses the Responses API's file search tool to retrieve information from a managed vector database with our company information.
Account Agent: Uses function calling to trigger custom actions via API and to provide account information.

Setting Up Your API Key

Don't forget to set your OpenAI API key as shown:

from agents import set_default_openai_key

set_default_openai_key("YOUR_API_KEY")
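If you'd rather not paste the key into your code, one option is to read it from the `OPENAI_API_KEY` environment variable and fail early with a clear message when it is missing. This is a minimal sketch; `load_openai_key` is a hypothetical helper, not part of the Agents SDK.

```python
import os
from collections.abc import Mapping


def load_openai_key(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper (not part of the Agents SDK): read the key from the environment."""
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fail before any agent runs rather than on the first API call
        raise RuntimeError("Set the OPENAI_API_KEY environment variable before running the assistant.")
    return key
```

You would then call `set_default_openai_key(load_openai_key())` in place of the hard-coded string.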

Agent 1: Search Agent for Real-time Information

This agent uses the WebSearchTool to provide up-to-date information based on user queries.

from agents import Agent, WebSearchTool

search_agent = Agent(
    name="SearchAgent",
    instructions="You immediately provide an input to the WebSearchTool to find up-to-date information on the user's query.",
    tools=[WebSearchTool()],
)

Agent 2: Knowledge Agent for Product Portfolio Insights

This agent answers questions about your products, using the FileSearchTool to retrieve details from an OpenAI-managed vector store. You can create a vector store and upload files to it either on the OpenAI Platform website or through the OpenAI API.

from agents import Agent, FileSearchTool

knowledge_agent = Agent(
    name="KnowledgeAgent",
    instructions="You answer user questions on our product portfolio with concise, helpful responses using the FileSearchTool.",
    tools=[FileSearchTool(
        max_num_results=3,
        vector_store_ids=["VECTOR_STORE_ID"],  # replace with your vector store ID
    )],
)
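If you go the API route, the following sketch creates a vector store, uploads your product documents to it, and returns the ID to use in place of `VECTOR_STORE_ID`. It assumes a recent `openai` package where vector stores live under `client.vector_stores` (older releases exposed the same methods as `client.beta.vector_stores`); the function name and the default store name are hypothetical. The client is passed in rather than created inside, which keeps the function easy to test.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for the type hint; the caller supplies the client
    from openai import OpenAI


def create_product_vector_store(
    client: OpenAI, paths: Iterable[str], name: str = "ACME product portfolio"
) -> str:
    """Hypothetical helper: create a vector store, upload each file, return its ID."""
    vector_store = client.vector_stores.create(name=name)
    for path in paths:
        with open(path, "rb") as fh:
            # Waits until the file has been processed for search (or has failed)
            vs_file = client.vector_stores.files.upload_and_poll(
                vector_store_id=vector_store.id, file=fh
            )
        if vs_file.status != "completed":
            raise RuntimeError(f"Indexing {path} ended with status {vs_file.status!r}")
    return vector_store.id
```

For example, `create_product_vector_store(OpenAI(), ["acme_catalogue.pdf"])` would return an ID for the `vector_store_ids` list, where the file name is a placeholder.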

Agent 3: Account Agent for Personalized Account Information

This agent provides account information using a custom function tool.

from agents import Agent, function_tool

@function_tool
def get_account_info(user_id: str) -> dict:
    """Return dummy account info for a given user."""
    return {
        "user_id": user_id,
        "name": "Bugs Bunny",
        "account_balance": "£72.50",
        "membership_status": "Gold Executive"
    }

account_agent = Agent(
    name="AccountAgent",
    instructions="You provide account information based on a user ID using the get_account_info tool.",
    tools=[get_account_info],
)
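The dummy tool returns the same record for every ID. A slightly more realistic sketch backs it with a lookup and returns an error payload for unknown or malformed IDs, so the agent can relay the problem to the user instead of reporting a balance that does not exist. `ACCOUNTS` and `lookup_account` are hypothetical stand-ins for your real account API.

```python
# Hypothetical in-memory store standing in for a real account API
ACCOUNTS = {
    "1234567890": {
        "name": "Bugs Bunny",
        "account_balance": "£72.50",
        "membership_status": "Gold Executive",
    },
}


def lookup_account(user_id: str) -> dict:
    """Return account info for a user ID, or an error the agent can relay to the user."""
    user_id = user_id.strip()
    if not user_id.isdigit():
        return {"error": "User IDs contain digits only."}
    account = ACCOUNTS.get(user_id)
    if account is None:
        return {"error": f"No account found for user ID {user_id}."}
    return {"user_id": user_id, **account}
```

Because it is a plain function, you can unit test it directly and then register it with `function_tool(lookup_account)`; the SDK builds the tool's parameter schema from the type hints and uses the docstring as the tool description.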

Agent 4: Triage Agent, the Smart Router

The triage agent is the entry point: it welcomes users and directs their requests to the appropriate specialized agent. Its instructions are wrapped with prompt_with_handoff_instructions, which adds the SDK's recommended guidance on how to handle handoffs.

from agents import Agent
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions

triage_agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions("""
        You are the virtual assistant for Acme Shop. Welcome the user and ask how you can help.
        Based on the user's intent, route to:
        - AccountAgent for account-related queries
        - KnowledgeAgent for product FAQs
        - SearchAgent for anything requiring real-time web search
        """),
    handoffs=[account_agent, knowledge_agent, search_agent],
)

Putting it Together: Running the Voice Agent Workflow

Now that you've defined all the agents, test them with a few example queries to see how well they perform. The SDK's built-in trace function groups these runs into a single trace, so you can follow the flow of events, including handoffs and LLM calls, across each agent run.

from agents import Runner, trace

async def test_queries():
    examples = [
        "What's my ACME account balance doc? My user ID is 1234567890", # Account Agent test
        "Ooh i've got money to spend! How big is the input and how fast is the output of the dynamite dispenser?", # Knowledge Agent test
        "Hmmm, what about duck hunting gear - what's trending right now?", # Search Agent test
    ]
    with trace("ACME App Assistant"):
        for query in examples:
            result = await Runner.run(triage_agent, query)
            print(f"User: {query}")
            print(result.final_output)
            print("---")

# Run the tests (top-level await works in a Jupyter notebook; in a plain script use asyncio.run(test_queries()))
await test_queries()
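Reading printed answers works for three queries but scales poorly as you add agents. One option is to record which agent each query should reach and compare that with the agent that produced the final answer. This is a minimal sketch; `RoutingCase` and `check_routing` are hypothetical names, and `run` is any async function that takes a query and returns the name of the answering agent.

```python
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingCase:
    """Hypothetical test case: a query and the agent expected to answer it."""
    query: str
    expected_agent: str


async def check_routing(
    cases: list[RoutingCase], run: Callable[[str], Awaitable[str]]
) -> list[str]:
    """Run each query through `run` and describe any that ended at the wrong agent."""
    failures = []
    for case in cases:
        actual = await run(case.query)
        if actual != case.expected_agent:
            failures.append(f"{case.query!r}: expected {case.expected_agent}, got {actual}")
    return failures
```

With the SDK, `run` can await `Runner.run(triage_agent, query)` and return `result.last_agent.name`; passing a stub instead lets you test the checking logic offline. Model routing is not deterministic, so treat a failure as a prompt to inspect the trace rather than as proof of a bug.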

Make it Voice-Activated: Enabling Voice Functionality

Use the Agents SDK to convert the text-based workflow into a voice-based one. The VoicePipeline class transcribes the audio input, runs your workflow on the resulting text, and converts the reply back into speech, while SingleAgentVoiceWorkflow lets you reuse your existing agent workflow, starting from the triage agent.

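import numpy as np
import sounddevice as sd
from agents import trace
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

# The pipeline's default text-to-speech output is 24 kHz, 16-bit mono PCM
OUTPUT_SAMPLE_RATE = 24000

async def voice_assistant():
    # Record at the microphone's native sample rate
    samplerate = int(sd.query_devices(kind='input')['default_samplerate'])

    while True:
        # A fresh pipeline per turn means every query starts at the triage agent
        pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(triage_agent))

        cmd = input("Press Enter to speak your query (or type 'esc' to exit): ")
        if cmd.lower() == "esc":
            print("Exiting...")
            break
        print("Listening... (press Enter to stop)")
        recorded_chunks = []

        # Collect microphone chunks until the user presses Enter again
        with sd.InputStream(samplerate=samplerate, channels=1, dtype='int16', callback=lambda indata, frames, time, status: recorded_chunks.append(indata.copy())):
            input()

        if not recorded_chunks:
            print("No audio captured, please try again.")
            continue

        recording = np.concatenate(recorded_chunks, axis=0)
        # Pass the real recording rate; AudioInput otherwise assumes 24 kHz
        audio_input = AudioInput(buffer=recording, frame_rate=samplerate)

        with trace("ACME App Voice Assistant"):
            result = await pipeline.run(audio_input)

        # Gather the streamed audio chunks of the reply
        response_chunks = []
        async for event in result.stream():
            if event.type == "voice_stream_event_audio" and event.data is not None:
                response_chunks.append(event.data)

        if not response_chunks:
            print("No audio response received.")
            continue

        response_audio = np.concatenate(response_chunks, axis=0)

        print("Assistant is responding...")
        # Play back at the TTS output rate, not the microphone rate
        sd.play(response_audio, samplerate=OUTPUT_SAMPLE_RATE)
        sd.wait()
        print("---")

# Run the voice assistant (top-level await works in Jupyter; in a script use asyncio.run(voice_assistant()))
await voice_assistant()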
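When the assistant mishears a query or a reply sounds wrong, it helps to listen to exactly what went into and came out of the pipeline. The sketch below writes mono 16-bit samples to a WAV file using only the standard `wave` module and NumPy. `save_pcm16_wav` is a hypothetical helper, and the 24 kHz default assumes the pipeline's default OpenAI text-to-speech output format. It expects int16 samples, which is what the recording above produces; the reply audio should also be int16 under the pipeline's default settings, and the helper raises `TypeError` otherwise.

```python
import wave

import numpy as np

# Assumed default output rate of the pipeline's OpenAI text-to-speech
TTS_SAMPLE_RATE = 24_000


def save_pcm16_wav(path: str, samples: np.ndarray, samplerate: int = TTS_SAMPLE_RATE) -> float:
    """Hypothetical helper: write mono int16 audio to a WAV file, return its length in seconds."""
    mono = np.asarray(samples).reshape(-1)  # (N, 1) recordings -> (N,)
    if mono.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {mono.dtype}")
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per int16 sample
        wav_file.setframerate(int(samplerate))
        # WAV stores little-endian samples regardless of the host byte order
        wav_file.writeframes(mono.astype("<i2").tobytes())
    return mono.size / samplerate
```

Inside the loop you might call `save_pcm16_wav("query.wav", recording, samplerate)` before running the pipeline and `save_pcm16_wav("reply.wav", response_audio)` after collecting the reply.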

Level Up Your Voice Assistant

By using the OpenAI Agents SDK, you can quickly create your own intelligent voice assistant. You can then further optimize voice interactions by improving prompts, adjusting agent behavior, and refining voice pipelines.