Unleash the Power of Liquid: A Unified Approach to Multimodal AI Generation
Tired of juggling separate models for text and images? Liquid offers a groundbreaking solution: a single, scalable autoregressive generation paradigm that seamlessly combines multimodal comprehension and generation. Discover how Liquid can revolutionize your AI workflows.
What is Liquid?
Liquid is a novel approach to multimodal AI, integrating visual and language tasks into a single large language model (LLM). This unified architecture eliminates the need for external visual embeddings, simplifying development and improving efficiency.
- Unified Architecture: Combines visual and language processing into a single LLM.
- Simplified Development: No need for external visual embeddings like CLIP.
- Improved Efficiency: Streamlined workflow for multimodal tasks.
- Scalable Performance: Liquid exhibits a clear scaling law in multimodal generation across model sizes from 0.5B to 32B parameters.
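The unified architecture can be pictured with a toy sketch. This is illustrative only: the vocabulary and codebook sizes below are assumptions, not Liquid's actual configuration. The idea is that text tokens and discrete image codes share one id space, so a single autoregressive model can emit either modality:

```python
# Conceptual sketch (not Liquid's actual code): one shared id space for
# text tokens and discrete image codes. Sizes below are assumptions.
TEXT_VOCAB_SIZE = 50_000     # assumed text vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192  # assumed VQ image codebook size

def image_code_to_token_id(code: int) -> int:
    """Map a discrete image code into the shared id space, after the text ids."""
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def token_id_is_image(token_id: int) -> bool:
    """Ids at or above the text vocabulary boundary are image codes."""
    return token_id >= TEXT_VOCAB_SIZE

# A mixed sequence: three text token ids followed by two image code ids.
sequence = [101, 2009, 2003] + [image_code_to_token_id(c) for c in (5, 42)]
print([token_id_is_image(t) for t in sequence])  # [False, False, False, True, True]
```

Because both modalities live in one vocabulary, the model needs no external visual embedding tower: generating an image is just predicting the next token, the same as generating text.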
Key Benefits of Using Liquid
Liquid offers significant advantages over traditional multimodal AI approaches: better performance and greater flexibility in a single unified framework.
- Visual Understanding: Analyze and interpret images with natural-language prompts.
- Visual Generation: Generate high-quality, photorealistic images from text descriptions, at any aspect ratio.
- Multimodal Generation: Seamlessly combine visual and textual information for complex creative tasks.
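Generation at any aspect ratio follows naturally from tokenizing images into a grid. The arithmetic below is a hedged sketch: the 16x downsampling factor is an assumption for illustration, not Liquid's published configuration.

```python
# Illustrative arithmetic (assumed config, not Liquid's published one):
# a VQ tokenizer that downsamples by 16x turns an H x W image into an
# (H/16) x (W/16) grid of discrete tokens, so changing the aspect ratio
# just changes the token grid shape.
DOWNSAMPLE = 16  # assumed spatial downsampling factor

def token_grid(height: int, width: int) -> tuple[int, int]:
    """Token grid shape for an image whose sides divide the downsample factor."""
    assert height % DOWNSAMPLE == 0 and width % DOWNSAMPLE == 0
    return height // DOWNSAMPLE, width // DOWNSAMPLE

def num_image_tokens(height: int, width: int) -> int:
    """Total discrete tokens the model must generate for one image."""
    h, w = token_grid(height, width)
    return h * w

print(num_image_tokens(512, 512))  # square 1:1 -> 1024 tokens
print(num_image_tokens(512, 768))  # 2:3 -> 1536 tokens
```

Since the model only sees a flat token sequence, supporting a new aspect ratio is a matter of choosing a different grid shape rather than retraining a separate decoder.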
Getting Started with Liquid: Inference Made Easy
Using Liquid for inference is straightforward. Because it's built as a standard Hugging Face language model, you only need the `transformers` library and a few basic components.
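As a hedged sketch of what that looks like in practice (the model id here is a placeholder and the helper names are my own, not taken from the project's docs), loading and prompting Liquid could follow the standard causal-LM pattern in `transformers`:

```python
# Sketch only: "your-org/liquid-checkpoint" is a placeholder model id,
# and these helpers are illustrative, not part of the Liquid codebase.
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_liquid(model_id: str):
    """Load a Liquid checkpoint with the standard causal-LM classes."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model

def chat(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Plain text-to-text generation; decode only the newly generated ids."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)
```

The point of the unified design is that no extra vision-specific loader is needed; the same two classes cover all three task types.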
Run a Local Gradio Demo in Seconds
- Ensure you have Python installed.
- Install necessary packages.
- Navigate to the evaluation directory.
- Launch the demo!
Single Inference Examples
- Text-to-Text (Language Dialogue)
- Image-to-Text (Image Understanding)
- Text-to-Image (Image Generation): generate high-quality images from text prompts.
Tip: If you're running on a GPU with less than 30 GB of VRAM, add the `--load_8bit` flag to avoid out-of-memory errors.
The Liquid Scaling Law: Bigger Is Better
Liquid's authors uncovered a fascinating scaling law: the performance penalty typically incurred by jointly training visual and language tasks diminishes as model size increases. Larger Liquid models therefore offer superior performance in both visual and language domains.
Open-Source and Ready to Use
Liquid is designed with accessibility in mind.
- Open-Source Plan: Liquid-7B-IT (instruction-tuned multimodal model with instruction-following ability)
  - Web Demo ✅
  - Evaluation ✅
  - Checkpoints ✅
  - Training Code ✅
- Pretrained Models: Liquid-0.5B~32B-Pretrain (multimodal extension models at six scales, from 0.5B to 32B, across three model families)
License
The project is licensed under the MIT License, offering flexibility for research and commercial use.
Contributing and Citing
If you find Liquid valuable, consider citing the project and contributing to its development. You can find citation details in the project's README.