Create Realistic Dialogue with Dia: A Text-to-Speech Model for Researchers

Looking for a powerful text-to-speech model that generates realistic dialogue? Nari Labs' Dia is a 1.6B parameter model that directly produces high-fidelity speech from text. Learn how this innovative tool can enhance your research and unlock new possibilities in audio generation.

What is Dia and How Can It Benefit You?

Dia is more than just a text-to-speech model; it's a dialogue generator capable of producing realistic conversations. Here's what makes it stand out:

Realistic Dialogue: Dia creates natural-sounding conversations directly from transcripts.
Emotion and Tone Control: You can condition the output on audio to influence the emotion and tone of the generated speech.
Nonverbal Cues: The model produces nonverbal cues like laughter and coughing to add realism.
Voice Cloning: Clone a voice by providing a reference audio clip together with its transcript.

Getting Started with Dia: A Quick Installation Guide

Ready to dive in? Here’s how to quickly install and run Dia:

Install via pip:

pip install git+https://github.com/nari-labs/dia.git

Run the Gradio UI: This opens a user-friendly interface to interact with the model.
```
git clone https://github.com/nari-labs/dia.git
cd dia && uv run app.py
```
Alternatively, if you don't have uv pre-installed:
```
git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py
```
Note: The model may produce a different voice on each run because it isn't fine-tuned on a specific voice. To keep the speaker consistent, add an audio prompt or fix the seed.
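
Fixing the seed can be sketched as follows. This is a minimal illustration, not part of Dia itself: the helper name `fix_seed` is hypothetical, and it assumes that seeding Python's `random`, NumPy, and PyTorch before generation is enough to make sampling repeatable on your setup.

```python
import random


def fix_seed(seed: int) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np  # optional: only seeded if NumPy is installed
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional: only seeded if PyTorch is installed
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Call this once before generating audio so repeated runs start from the same state.
fix_seed(42)
```

Even with a fixed seed, results can still differ across GPUs or library versions, so an audio prompt is the more dependable way to pin a voice.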

Unleash Dialogue Creation: Key Features of the Nari Labs Text-to-Speech Model

Dia offers a range of features to enhance your dialogue creation process:

Dialogue Tags: Use [S1] and [S2] tags to designate different speakers in your script.
Non-Verbal Tags: Add tags such as (laughs) and (coughs) to the script to produce non-verbal sounds.
Voice Cloning: Clone a voice by uploading a sample of the audio you want to imitate.
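
The tagging scheme above can be assembled programmatically. The sketch below is only an illustration: `format_dialogue` is a hypothetical helper, not a function provided by Dia, and it assumes a script is a single string in which each turn starts with its speaker tag.

```python
SPEAKER_TAGS = ("S1", "S2")  # the two speaker tags described above


def format_dialogue(turns):
    """Join (speaker, line) pairs into one tagged script string."""
    parts = []
    for speaker, line in turns:
        if speaker not in SPEAKER_TAGS:
            raise ValueError(f"unknown speaker tag: {speaker!r}")
        parts.append(f"[{speaker}] {line.strip()}")
    return " ".join(parts)


script = format_dialogue([
    ("S1", "Did you hear the demo?"),
    ("S2", "I did. (laughs) It sounded real."),
])
print(script)
```

Non-verbal tags such as (laughs) are written inline as part of a speaker's line, so they need no special handling here.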

Generate Realistic Speech: How to Use Dia as a Python Library

Integrate Dia directly into your Python projects for customized text-to-speech applications. Here's a simple example:

from dia.model import Dia
# Load the pretrained 1.6B checkpoint in half precision
model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
# [S1] and [S2] mark the speakers; (laughs) is a non-verbal tag
text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."
# Generate the audio and write it to disk
output = model.generate(text, use_torch_compile=True, verbose=True)
model.save_audio("simple.mp3", output)
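
Voice cloning builds on the same calls. The following is a hedged sketch rather than the official recipe: it assumes that the transcript of the reference clip is placed in front of the new script, and that `generate` accepts the reference audio through an `audio_prompt` keyword. Both helper names and that keyword are assumptions, so check them against the version of Dia you have installed.

```python
def build_clone_script(reference_transcript: str, new_script: str) -> str:
    """Prefix the new script with the transcript of the reference audio."""
    return reference_transcript.strip() + " " + new_script.strip()


def clone_voice(reference_audio, reference_transcript, new_script,
                out_path="voice_clone.mp3"):
    """Generate new_script in the voice of reference_audio (hypothetical helper)."""
    # Imported here so build_clone_script stays usable without Dia installed.
    from dia.model import Dia

    model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
    full_script = build_clone_script(reference_transcript, new_script)
    # ASSUMPTION: the keyword name `audio_prompt` may differ between versions.
    output = model.generate(full_script, audio_prompt=str(reference_audio),
                            verbose=True)
    model.save_audio(out_path, output)
    return out_path
```

A call might look like `clone_voice("simple.mp3", "[S1] ...transcript of the clip...", "[S1] Text to speak in the cloned voice.")`, where the first file is the reference recording.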

Hardware Considerations and Inference Speed

Dia is optimized for GPUs (PyTorch 2.0+, CUDA 12.6). CPU support is planned, but GPUs currently offer the best performance. The first run may take longer because the Descript Audio Codec also has to be downloaded. Quantized versions are planned for the future.
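
Before loading the model, it can help to confirm that PyTorch sees a CUDA GPU. This is a minimal sketch under stated assumptions: `pick_device` is a hypothetical helper, not part of Dia, and it reports only what PyTorch detects, not whether Dia will run well on that device.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU can be used from Python.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
if device != "cuda":
    print("No CUDA GPU detected; generation may be slow or unsupported.")
```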

Responsible Use of Nari Labs' Dia: A Crucial Disclaimer

While Dia provides powerful speech generation capabilities, ethical use is paramount. Avoid:

Identity Misuse: Creating audio that resembles real individuals without their explicit consent.
Deceptive Content: Generating misleading or false information.
Illegal Activities: Using the model for harmful or unlawful purposes.

By using Dia, you agree to adhere to these ethical guidelines and legal standards.

Call to Action: Contribute and Shape the Future of Dia

Nari Labs welcomes contributions to further enhance Dia's capabilities. Join the Discord server to participate in discussions and help shape the future of this exciting text-to-speech technology.
