Create Your Own Interactive Digital Human: A Step-by-Step Guide to Open Avatar Chat
Want to build your own real-time conversational digital human? Open Avatar Chat makes it possible, even on a single PC! This guide walks you through everything you need to get started, from core features and installation to running a low-latency, personalized digital human.
What is Open Avatar Chat?
Open Avatar Chat is a modular, open-source project for creating interactive digital humans that react in real time. Imagine customizing your own AI avatar for customer service, virtual assistance, or just for fun. Open Avatar Chat gives you the tools to do it, using cutting-edge multimodal language models locally or cloud APIs, so there are options for different levels of hardware.
Key Benefits of Open Avatar Chat
- Low-Latency Real-Time Conversation: Experience natural interactions with quick response times (around 2.2 seconds on average).
- Multimodal Language Model Support: Use text, audio, and video to enhance your digital human's communication capabilities.
- Modular Design: Easily swap out components to create custom digital human experiences.
What You'll Need
Before diving in, make sure you have the following:
- Python: Version 3.10 - 3.12
- CUDA-enabled GPU: A CUDA-enabled GPU is needed if your machine will be running the complete interactive digital human chat platform.
- Git LFS: Ensure Git LFS is installed.
Don't have a powerful GPU? No problem. You can use cloud APIs to handle the heavy lifting – more on that later.
Quick Steps to Getting Started: Local Execution
Ready to bring your digital human to life? Follow these steps for local execution:
- Install Git LFS: Open Avatar Chat requires Git LFS to handle large files in the repository.
- Update Submodules: fetch the project's Git submodules after cloning.
- Install UV: UV is a Python environment and package manager. Use it to manage dependencies.
- Install Dependencies: Choose a configuration file (more on this below) and install the necessary dependencies.
- Run the Demo: launch the demo, pointing it at your chosen configuration file.
Choosing the Right Configuration
Open Avatar Chat uses configuration files to define the behavior of your digital human. Here are a few pre-set options:
- `chat_with_gs.yaml`: A lightweight setup with API-based LLM and TTS, suitable for serving multiple connections.
- `chat_with_minicpm.yaml`: Employs MiniCPM-o-2.6 as the audio-to-audio chat model; requires a beefy GPU.
- `chat_with_openai_compatible.yaml`: Integrates an OpenAI-compatible API for the LLM and CosyVoice for local TTS.
- `chat_with_openai_compatible_bailian_cosyvoice.yaml`: The lightest config; both LLM and TTS are provided by API.
- `chat_with_openai_compatible_edge_tts.yaml`: Uses Microsoft Edge TTS, which doesn't require an API key.
Running Open Avatar Chat with Docker
Want an even easier setup? Use Docker! Here’s how:
- Install Docker: Make sure you have Docker installed and configured to use your GPU.
- Build and Run: Execute the following command, pointing to your chosen configuration file:
Diving Deeper: Handler Dependencies
Open Avatar Chat uses handlers for various tasks like rendering, language models, and text-to-speech. Here's a closer look at some key handlers:
- LAM Client Rendering Handler: This handler uses assets generated by the LAM (ultra-realistic 3D digital humans from a single image) project to render the client avatar. You can use the sample assets provided or create your own.
- OpenAI Compatible LLM Handler: Connect to your existing LLM API (like Bailian) to power the digital human's responses.
- MiniCPM Omni Speech2Speech Handler: Use the MiniCPM-o-2.6 model for multimodal dialogue capabilities, downloading the model from Hugging Face or ModelScope.
- Bailian CosyVoice Handler: Integrate Bailian's CosyVoice API for text-to-speech, reducing system requirements.
- CosyVoice Local Inference Handler: Run CosyVoice locally for TTS, but be aware of potential installation issues on Windows (see the original documentation for workarounds).
- Edge TTS Handler: Leverage Microsoft Edge's cloud-based TTS service.
- LiteAvatar Avatar Handler: Utilize 2D avatar features with a library of 100+ avatars on ModelScope’s LiteAvatarGallery.
- LAM Avatar Driver Handler: Use audio to drive facial expressions with the `facebook/wav2vec2-base-960h` and `LAM_audio2exp` models.
Optional Deployment Tweaks
- SSL Certificates: If you're not running Open Avatar Chat on localhost, you'll need SSL certificates to secure the video and audio streams.
- TURN Server: If you encounter connection issues, setting up a TURN server can help with NAT traversal.
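For the SSL requirement, a self-signed certificate is enough for testing (browsers will warn, but the audio/video streams will work once you accept it). A minimal sketch with `openssl`; the output paths and common name are illustrative:

```shell
# Generate a self-signed certificate and key, valid for one year
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout key.pem -out cert.pem \
  -days 365 -subj "/CN=localhost"
```

Point Open Avatar Chat's SSL settings at the generated `cert.pem` and `key.pem`; for production, use a certificate from a real CA instead.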
Configuring Your Digital Human
The default configuration file is `<project_root>/configs/chat_with_minicpm.yaml`. You can load a configuration from another location by passing `--config` on the command line.
Key configuration parameters are grouped into the following modules:
- VAD
- LLM
- ASR
- TTS
- LiteAvatar Digital Human
Remember, all path parameters can be absolute or relative to the project root.
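As an illustrative sketch only (every key name below is an assumption; the real names live in the `configs/*.yaml` files shipped with the project), a configuration groups per-module parameters like this:

```yaml
# Illustrative fragment, not a real shipped config
chat_engine:
  handler_configs:
    VAD:
      speaking_threshold: 0.5       # voice-activity sensitivity (assumption)
    LLM:
      model_name: "qwen-plus"       # API model to call (assumption)
      api_key: "your-api-key"
    TTS:
      voice: "longxiaochun"         # TTS voice id (assumption)
    LiteAvatar:
      avatar_name: "sample_avatar"  # avatar asset, absolute or relative to project root
```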
Start Building Your Digital Human Today
Open Avatar Chat offers a powerful and flexible platform for creating interactive digital humans. Dive in, experiment with different configurations, and bring your AI avatar to life!