Create Your Own Interactive Digital Human: A Step-by-Step Guide to Open Avatar Chat
Want to build your own real-time conversational digital human? Open Avatar Chat makes it possible, even on a single PC! This guide walks you through everything you need to get started, from core features and installation to running a low-latency, personalized digital human.
What is Open Avatar Chat?
Open Avatar Chat is a modular, open-source project for creating interactive digital humans that react in real time. Imagine customizing your own AI avatar for customer service, virtual assistance, or just for fun. Open Avatar Chat gives you the tools to do it, using cutting-edge multimodal language models locally or cloud APIs, so there are options for different levels of hardware.
Key Benefits of Open Avatar Chat
- Low-Latency Real-Time Conversation: Experience natural interactions with quick response times (around 2.2 seconds on average).
- Multimodal Language Model Support: Use text, audio, and video to enhance your digital human's communication capabilities.
- Modular Design: Easily swap out components to create custom digital human experiences.
What You'll Need
Before diving in, make sure you have the following:
- Python: Version 3.10 - 3.12
- CUDA-enabled GPU: A CUDA-enabled GPU is needed if your machine will be running the complete interactive digital human chat platform.
- Git LFS: Ensure Git LFS is installed.
Don't have a powerful GPU? No problem. You can use cloud APIs to handle the heavy lifting – more on that later.
Quick Steps to Getting Started: Local Execution
Ready to bring your digital human to life? Follow these steps for local execution:
- Install Git LFS: Open Avatar Chat requires Git LFS to handle large files in the repository.
- Update Submodules: fetch the project's Git submodules after cloning.
- Install UV: UV is a Python environment and package manager. Use it to manage dependencies.
- Install Dependencies: Choose a configuration file (more on this below) and install the necessary dependencies.
- Run the Demo: launch the demo, pointing it at your chosen configuration file.
Choosing the Right Configuration
Open Avatar Chat uses configuration files to define the behavior of your digital human. Here are a few pre-set options:
- `chat_with_gs.yaml`: A lightweight setup with API-based LLM and TTS, suitable for serving multiple connections.
- `chat_with_minicpm.yaml`: Employs MiniCPM-o-2.6 as the audio-to-audio chat model; requires a beefy GPU.
- `chat_with_openai_compatible.yaml`: Integrates an OpenAI-compatible API for the LLM and CosyVoice for local TTS.
- `chat_with_openai_compatible_bailian_cosyvoice.yaml`: The lightest config; both LLM and TTS are provided by API.
- `chat_with_openai_compatible_edge_tts.yaml`: Uses Microsoft Edge TTS, which doesn't require an API key.
Running Open Avatar Chat with Docker
Want an even easier setup? Use Docker! Here’s how:
- Install Docker: Make sure you have Docker installed and configured to use your GPU.
- Build and Run: Execute the following command, pointing to your chosen configuration file:
Diving Deeper: Handler Dependencies
Open Avatar Chat uses handlers for various tasks like rendering, language models, and text-to-speech. Here's a closer look at some key handlers:
- LAM Client Rendering Handler: This handler uses assets generated by the LAM (ultra-realistic 3D digital humans from a single image) project to render the client avatar. You can use the sample assets provided or create your own.
- OpenAI Compatible LLM Handler: Connect to your existing LLM API (like Bailian) to power the digital human's responses.
- MiniCPM Omni Speech2Speech Handler: Use the MiniCPM-o-2.6 model for multimodal dialogue capabilities, downloading the model from Hugging Face or ModelScope.
- Bailian CosyVoice Handler: Integrate Bailian's CosyVoice API for text-to-speech, reducing system requirements.
- CosyVoice Local Inference Handler: Run CosyVoice locally for TTS, but be aware of potential installation issues on Windows (see the original documentation for workarounds).
- Edge TTS Handler: Leverage Microsoft Edge's cloud-based TTS service.
- LiteAvatar Avatar Handler: Utilize 2D avatar features with a library of 100+ avatars on ModelScope’s LiteAvatarGallery.
- LAM Avatar Driver Handler: Use audio to drive facial expressions with the `facebook/wav2vec2-base-960h` and `LAM_audio2exp` models.
Optional Deployment Tweaks
- SSL Certificates: If you're not running Open Avatar Chat on localhost, you'll need SSL certificates to secure the video and audio streams.
- TURN Server: If you encounter connection issues, setting up a TURN server can help with NAT traversal.
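For the SSL requirement, a self-signed certificate is enough for testing (browsers will warn, but the audio/video streams will work once you accept it). A minimal sketch with `openssl`; the output paths and common name are illustrative:

```shell
# Generate a self-signed certificate and key, valid for one year
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout key.pem -out cert.pem \
  -days 365 -subj "/CN=localhost"
```

Point Open Avatar Chat's SSL settings at the generated `cert.pem` and `key.pem`; for production, use a certificate from a real CA instead.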
Configuring Your Digital Human
The default configuration file is `<project_root>/configs/chat_with_minicpm.yaml`. You can load a configuration from another location by passing `--config` on the command line.
Key configuration parameters are grouped into the following modules:
- VAD
- LLM
- ASR
- TTS
- LiteAvatar Digital Human
Remember, all path parameters can be absolute or relative to the project root.
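As an illustrative sketch only (every key name below is an assumption; the real names live in the `configs/*.yaml` files shipped with the project), a configuration groups per-module parameters like this:

```yaml
# Illustrative fragment, not a real shipped config
chat_engine:
  handler_configs:
    VAD:
      speaking_threshold: 0.5       # voice-activity sensitivity (assumption)
    LLM:
      model_name: "qwen-plus"       # API model to call (assumption)
      api_key: "your-api-key"
    TTS:
      voice: "longxiaochun"         # TTS voice id (assumption)
    LiteAvatar:
      avatar_name: "sample_avatar"  # avatar asset, absolute or relative to project root
```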
Start Building Your Digital Human Today
Open Avatar Chat offers a powerful and flexible platform for creating interactive digital humans. Dive in, experiment with different configurations, and bring your AI avatar to life!