Unleash the Power of Liquid: A Scalable Multimodal Generation Paradigm
Ready to experience the future of AI? Liquid offers a groundbreaking approach to multimodal generation, seamlessly blending visual comprehension and text generation. Learn how Liquid can revolutionize your AI projects!
What is Liquid?
Liquid is a cutting-edge autoregressive generation paradigm that unifies multimodal comprehension and generation. Instead of relying on external visual embeddings, Liquid achieves integration using a single large language model (LLM). This innovative approach delivers a new level of scalability and versatility.
Key Benefits of Using Liquid
- Unified Multimodal Generation: Liquid seamlessly integrates visual and textual data, enabling powerful applications.
- Single LLM Architecture: Eliminates the need for external pre-trained visual embeddings like CLIP, simplifying the architecture.
- Scalable Performance: The performance drop from unifying comprehension and generation diminishes as model size increases.
- Mutual Enhancement: The unified token space enables visual generation and comprehension to mutually enhance each other.
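The unified token space behind these benefits can be sketched in a few lines: text tokens and discrete (VQ) image codes share a single vocabulary, so one LLM can autoregressively predict both. The vocabulary sizes and helper names below are illustrative assumptions, not Liquid's actual configuration.

```python
# Illustrative sketch of a unified text/image token space (not Liquid's
# real tokenizer): image codebook indices are offset past the text vocab,
# so both modalities live in one vocabulary.

TEXT_VOCAB_SIZE = 32_000     # ordinary BPE text tokens occupy ids [0, 32000)
IMAGE_CODEBOOK_SIZE = 8_192  # VQ image codes are mapped into [32000, 40192)

def image_code_to_token(code: int) -> int:
    """Map a VQ codebook index into the shared vocabulary."""
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def is_image_token(token_id: int) -> bool:
    """True if this id falls in the image-token range of the vocabulary."""
    return token_id >= TEXT_VOCAB_SIZE

# Interleave a text prompt with image tokens into one sequence that a
# single LLM models autoregressively, with no external vision encoder.
text_tokens = [17, 942, 88]  # pretend-tokenized prompt
image_tokens = [image_code_to_token(c) for c in (5, 4095, 8191)]
sequence = text_tokens + image_tokens

print(sequence)  # [17, 942, 88, 32005, 36095, 40191]
```

Because both modalities share one next-token objective over this sequence, gradients from image generation and image understanding flow through the same weights, which is what allows the two tasks to reinforce each other.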
Getting Started with Liquid: Inference and Evaluation
Diving into the world of Liquid is easier than you think! Inference and evaluation require no complex setup: because Liquid is a standard HuggingFace-format language model, all you need is the transformers library and a few basic components.
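As a rough sketch of what that looks like with plain transformers (the checkpoint path is a placeholder and the generation settings are illustrative defaults, not Liquid's actual interface):

```python
# Hedged sketch: loading a HuggingFace-format checkpoint and running a
# text prompt through it. Path and settings are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

def chat(model_path: str, prompt: str, max_new_tokens: int = 128) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example (requires a downloaded checkpoint):
# print(chat("path/to/Liquid-7B-IT", "What is multimodal generation?"))
```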
Simple Steps for Inference
- Install the dependencies.
- Run the Gradio demo.
Examples of Single Inference
- Text to Text (T2T): Pure Language Dialogue
- Image to Text (I2T): Image Understanding
- Text to Image (T2I): Image Generation

Add `--load_8bit` when using GPUs with less than 30GB of VRAM.
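For reference, a `--load_8bit`-style flag in HuggingFace demo scripts typically toggles 8-bit quantized loading via bitsandbytes. A hedged sketch of the equivalent transformers call (placeholder path; not necessarily Liquid's exact implementation):

```python
# Hedged sketch: 8-bit weight loading, the usual mechanism behind a
# --load_8bit flag, to fit a large model under ~30GB of VRAM.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_8bit(model_path: str):
    """Load a checkpoint with 8-bit quantized weights."""
    return AutoModelForCausalLM.from_pretrained(
        model_path,  # placeholder, e.g. a local Liquid checkpoint directory
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
```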
Liquid's Open-Source Plan: Models and Capabilities
Liquid is designed to be accessible and adaptable, with a detailed open-source plan:
- Liquid-7B-IT: Instruction Tuned Multimodal Model with Instruction Following Ability
- [✅] Web Demo
- [✅] Evaluation
- [✅] Checkpoints
- [✅] Training Codes
- Liquid-0.5B~32B-Pretrain: Multimodal extension models of six different scales ranging from 0.5B to 32B across three model families.
- [ ] Checkpoints
Scaling Law and Multimodal Generation
Liquid exhibits a clear scaling law in multimodal generation across model sizes from 0.5B to 32B. It generates high-quality, photorealistic images at any aspect ratio from textual prompts using a purely autoregressive paradigm.
Dive Deeper: Installation, Training, and Further Resources
For detailed instructions on installation and training, refer to Data.md and TRAIN.md.
License Information
This project is licensed under the MIT License. See the LICENSE file for details.