Unleash the Power of Liquid: A Unified Multimodal Generation Paradigm
Discover Liquid, a groundbreaking auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation. This innovative approach utilizes a single large language model (LLM) to bridge the gap between text and visuals, unlocking unprecedented possibilities for multimodal applications.
Why Liquid is Revolutionizing Multimodal AI
Liquid stands apart from traditional multimodal large language models (MLLMs) by eliminating the reliance on external pretrained visual embeddings like CLIP. This streamlined architecture offers several key advantages:
- Simplified Architecture: Reduces complexity and dependencies, making integration easier.
- Enhanced Efficiency: Leverages the full potential of a single LLM for both visual and textual processing.
- Unified Token Space: Allows visual generation and comprehension to mutually enhance each other.
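To make the unified token space concrete, here is a minimal sketch of the general pattern: a VQ-style codebook's discrete image codes are appended to an LLM's vocabulary so text and image tokens share one embedding table. The base model, codebook size, and token names below are illustrative assumptions, not Liquid's actual configuration.

```python
# Illustrative sketch only: how a unified text+image token space can be set up
# with Hugging Face transformers. The base model, codebook size, and token
# naming scheme are assumptions, not Liquid's actual configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "gpt2"      # stand-in base LLM (assumption)
CODEBOOK_SIZE = 8192     # stand-in VQ codebook size (assumption)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Append one new token per discrete image code so image and text tokens
# share a single vocabulary and embedding table.
image_tokens = [f"<img_{i}>" for i in range(CODEBOOK_SIZE)]
tokenizer.add_tokens(image_tokens)
model.resize_token_embeddings(len(tokenizer))

# A mixed sequence: text tokens and image-code tokens interleave freely.
mixed = "A photo of a cat: " + "".join(f"<img_{i}>" for i in [3, 77, 4091])
print(tokenizer(mixed).input_ids)
```

Because both modalities live in one vocabulary, the same next-token objective trains comprehension and generation jointly, which is what lets the two tasks reinforce each other.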
Key Dates & Updates
Stay up-to-date with the latest advancements in Liquid:
- 2025-03-25: Updated data processing and model pretraining scripts.
- 2025-03-04: Released text-to-image and visual understanding evaluation scripts.
- 2025-02-28: Paper, demo, model, and project page officially launched.
Open-Source Potential: Liquid's Accessibility
Foundation Vision is committed to open-source principles, making Liquid accessible to researchers and developers worldwide. Explore the available resources:
Liquid-7B-IT (Instruction Tuned Multimodal Model)
- [✅] Web Demo
- [✅] Evaluation
- [✅] Checkpoints
- [✅] Training Codes
Liquid-0.5B~32B-Pretrain (Multimodal Extension Models)
- Checkpoints available for various scales across three model families.
Hands-On with Liquid: Simple Inference Guide
Getting started with Liquid is straightforward, thanks to its HuggingFace-compatible format. You can perform both inference and evaluation with minimal dependencies.
- Installation:

```bash
pip install gradio==4.44.1
pip install gradio_client==1.3.0
```
- Run the Gradio demo locally:

```bash
cd evaluation
python app.py
```
If deploying on a GPU with less than 30GB VRAM, enable `load_in_8bit` in `AutoModelForCausalLM.from_pretrained` within `app.py` to prevent out-of-memory errors.
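As a concrete illustration of this tip, here is a minimal sketch of the `from_pretrained` call with 8-bit loading enabled; the checkpoint path is a placeholder, and the `bitsandbytes` package must be installed.

```python
# Minimal sketch of the 8-bit loading change suggested above. The model path
# is a placeholder; requires bitsandbytes and a CUDA GPU.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/Liquid-7B-IT",  # placeholder checkpoint path (assumption)
    load_in_8bit=True,       # quantize weights to 8-bit to fit < 30GB VRAM
    device_map="auto",       # let accelerate place layers across devices
)
```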
Real-World Examples: Putting Liquid to Work
See Liquid in action across three inference modes; an illustrative sketch for each follows the list:

- Pure Language Dialogue
- Image Understanding
- Image Generation (Text-to-Image)
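Below is a minimal sketch of a pure language dialogue round-trip using the standard transformers API. The checkpoint path and prompt format are assumptions; consult the released evaluation scripts for the exact usage.

```python
# Sketch 1: pure language dialogue. Checkpoint name and prompt format are
# assumptions, not the repository's exact interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/Liquid-7B-IT"  # placeholder (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

prompt = "Explain autoregressive image generation in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```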
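For image understanding, an image must first be mapped to discrete visual tokens. The sketch below assumes a hypothetical `encode_image` helper standing in for the repository's visual tokenizer; it is not a real Liquid API.

```python
# Sketch 2: image understanding. Liquid consumes discrete image tokens, so the
# image is first encoded by the model's visual tokenizer. `encode_image` is a
# hypothetical stand-in for that step, not a real Liquid API.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/Liquid-7B-IT"  # placeholder (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

def encode_image(path: str) -> str:
    """Hypothetical helper: run the VQ image tokenizer and return the image's
    discrete codes rendered as special tokens such as <img_0><img_1>..."""
    raise NotImplementedError("use the image tokenizer shipped with the repo")

prompt = encode_image("cat.jpg") + "\nDescribe this image."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```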
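For text-to-image generation, the model emits discrete image tokens that a VQ decoder converts back to pixels. The `decode_image` helper below is likewise a hypothetical stand-in for the repository's decoder.

```python
# Sketch 3: text-to-image generation. The model emits discrete image tokens;
# a VQ decoder turns them back into pixels. `decode_image` is a hypothetical
# stand-in for that decoder, not a real Liquid API.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/Liquid-7B-IT"  # placeholder (assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

prompt = "Generate an image: a red fox in fresh snow at golden hour."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True)

def decode_image(token_ids) -> "PIL.Image.Image":
    """Hypothetical helper: map generated <img_*> tokens back to pixels via
    the VQ decoder shipped with the repo."""
    raise NotImplementedError

image = decode_image(output[0])
image.save("fox.png")
```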
Scaling Laws: The Future of Multimodal Generation
Liquid reveals a crucial insight: the performance drop associated with jointly training visual and language tasks diminishes as model size increases. This scaling law for unified multimodal generation holds consistently across model sizes from 0.5B to 32B parameters.
Unleashing Visual Creativity: Autoregressive Image Generation
Liquid generates high-quality, photorealistic images at any aspect ratio from descriptive text prompts, giving autoregressive image generation precise, language-driven control.
Dive Deeper: Installation, Training, and More
For detailed instructions on installation, training, and data processing, refer to Data.md and TRAIN.md. Start exploring the world of unified multimodal generation with Liquid, and discover the potential of this powerful AI paradigm.
License & Citation
This project is licensed under the MIT License. If you find Liquid useful, please cite the following:
```bibtex
@article{wu2024liquid,
  title={Liquid: Language models are scalable multi-modal generators},
  author={Wu, Junfeng and Jiang, Yi and Ma, Chuofan and Liu, Yuliang and Zhao, Hengshuang and Yuan, Zehuan and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2412.04332},
  year={2024}
}
```