In-Context Image Editing: Achieve SOTA Results with Minimal Data and Parameters
Looking for a new way to perform instruction-based image editing? In-Context Edit (ICEdit) offers cutting-edge results utilizing a fraction of the data and parameters compared to existing methods. This innovative approach uses in-context generation within a large-scale diffusion transformer to achieve state-of-the-art performance. No need to spend hours training with massive datasets!
This article explores the capabilities of ICEdit, its implementation, and how you can start using it for instructional image editing today.
Why ICEdit Stands Out: Performance and Efficiency
ICEdit distinguishes itself through its efficiency and effectiveness. By leveraging in-context learning, it achieves remarkable results while requiring only 0.5% of the training data and 1% of the parameters compared to previous state-of-the-art methods.
- Reduced Resource Consumption: Train models with significantly less data and computational power.
- Superior Editing Precision: Execute multi-turn edits with high accuracy.
- Diverse Applications: Achieve visually stunning results for single-turn editing tasks.
Addressing Potential Challenges: Tips for Optimal Results
While ICEdit offers impressive capabilities, understanding its limitations ensures optimal performance. Here are some tips to avoid common issues and get the most out of instruction-based image editing:
- Seed Variation: If you encounter unexpected results, try using a different seed value.
- Style Considerations: Be aware that the base model, FLUX, may sometimes inadvertently alter the artistic style of your image due to its training data.
- Image Realism: The model is primarily trained on realistic images. Editing non-realistic images (like anime) may lead to decreased success rates.
- Object Removal: Approach object removal with caution; it may have a lower success rate because the OmniEdit removal dataset used in training is not of high quality.
Getting Started with ICEdit: Installation and Inference
Ready to dive in and start using ICEdit? Follow these simple steps to get the official implementation up and running.
Installation
- Conda Environment Setup:
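A minimal setup sketch is shown below; the environment name and Python version are assumptions, and the requirements file is whatever ships with the official repository:

```bash
# Create and activate a fresh environment (name and Python version are illustrative)
conda create -n icedit python=3.10 -y
conda activate icedit

# Install the repository's dependencies (run from the cloned ICEdit repo root)
pip install -r requirements.txt
```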
- Download Pretrained Weights:
- If you have a Hugging Face account or have the Hugging Face Hub package installed, the weights should download automatically. Otherwise, download them manually.
- Required files:
  - Flux.1-fill-dev
  - ICEdit-MoE-LoRA
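If you prefer to fetch the weights manually, a sketch using the Hugging Face CLI looks like this; the repository IDs and local paths below are assumptions, so confirm the exact names on the project page:

```bash
# Download the FLUX.1 Fill base model (gated: accept the license and log in first)
huggingface-cli download black-forest-labs/FLUX.1-Fill-dev --local-dir ./weights/flux.1-fill-dev

# Download the ICEdit LoRA weights (repository ID is an assumption; check the official README)
huggingface-cli download sanaka87/ICEdit-MoE-LoRA --local-dir ./weights/ICEdit-MoE-LoRA
```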
Inference (Bash)
Run ICEdit from the command line:
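A minimal command-line sketch, assuming the repository exposes an inference script such as scripts/inference.py (the script name, flags, and asset paths are assumptions; check the official README for the exact interface):

```bash
# Edit an image according to a natural-language instruction
python scripts/inference.py \
  --image assets/girl.png \
  --instruction "Make her hair dark green and her clothes checked." \
  --seed 42
```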
If you're running on a system with limited GPU memory (e.g., 24GB), enable CPU offloading:
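For example, assuming the script exposes a CPU-offload flag like the one below (the flag name is an assumption):

```bash
# Offload model weights to CPU between steps to fit in ~24 GB of VRAM
python scripts/inference.py \
  --image assets/girl.png \
  --instruction "Make her hair dark green and her clothes checked." \
  --enable-model-cpu-offload
```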
If you downloaded the pretrained weights manually, specify their paths:
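Assuming the script accepts explicit weight paths (the flag names and paths below are illustrative):

```bash
# Point the script at locally downloaded weights instead of the Hugging Face cache
python scripts/inference.py \
  --image assets/girl.png \
  --instruction "Make her hair dark green and her clothes checked." \
  --flux-path ./weights/flux.1-fill-dev \
  --lora-path ./weights/ICEdit-MoE-LoRA
```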
Inference (Gradio Demo)
For a more user-friendly experience, use the Gradio demo:
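A sketch of launching the demo, assuming a gradio_demo.py script and a standard port flag (both are assumptions):

```bash
# Launch the web UI; open the printed local URL (e.g., http://127.0.0.1:7860) in a browser
python scripts/gradio_demo.py --port 7860
```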
Optional parameters for memory management and local weights:
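The same memory and path options from the command-line workflow should carry over; for example (flag names are assumptions):

```bash
# Gradio demo with CPU offloading and local weight paths
python scripts/gradio_demo.py --port 7860 \
  --enable-model-cpu-offload \
  --flux-path ./weights/flux.1-fill-dev \
  --lora-path ./weights/ICEdit-MoE-LoRA
```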
Access the demo in your browser using the provided link.
Key Considerations for Image Editing with ICEdit
Keep these points in mind for optimal results with in-context image editing:
- Image Size: The model is designed for images with a width of 512 pixels. Images with different widths will be automatically resized.
- Troubleshooting: If the model fails to produce the expected results, adjust the `--seed` parameter; a quick seed sweep (see the sketch below) often helps. Inference-time scaling with a VLM can also substantially improve results.
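For example, one quick way to sweep a few seeds and compare the outputs (script name and flags are assumptions, as above):

```bash
# Try several seeds and keep the output you like best
# (depending on the script, you may need distinct output names to avoid overwriting)
for seed in 0 42 123 2025; do
  python scripts/inference.py \
    --image assets/girl.png \
    --instruction "Make her hair dark green and her clothes checked." \
    --seed "$seed"
done
```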
ICEdit vs. Commercial Models: A Cost-Effective Alternative
ICEdit offers comparable or even superior performance to commercial models such as Gemini and GPT-4o, particularly in character ID preservation and instruction following. It is also open source, lower in cost, and fast, taking roughly 9 seconds per image.
Citing ICEdit
If you find ICEdit helpful for your research, please cite the following BibTeX entry:
```bibtex
@misc{zhang2025ICEdit,
  title={In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer},
  author={Zechuan Zhang and Ji Xie and Yu Lu and Zongxin Yang and Yi Yang},
  year={2025},
  eprint={2504.20690},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.20690},
}
```