Stable Diffusion Textual Inversion: Master Image Generation
Want to control your Stable Diffusion image generation and create unique visual concepts? This tutorial shows you how to use Textual Inversion embeddings for precise image control. Learn to teach Stable Diffusion specific objects or styles, like turning ordinary photos into works of art!
What is Stable Diffusion Textual Inversion?
Textual Inversion is not just about fine-tuning; it's about teaching Stable Diffusion new tricks. It empowers the model to generate specific image concepts by creating new "words" within its understanding of images. Imagine adding "grooty" to Stable Diffusion's vocabulary!
- Custom Concepts: Learn to generate specific objects or styles.
- Precise Control: Fine-tune text prompts for improved output.
- Combined Power: Pair with DreamBooth for even stronger control over the model.
Easy Installation for Stable Diffusion Textual Inversion
Let's get your environment set up with a few simple commands:
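The install commands did not survive in this copy of the post, so here is a minimal sketch that matches the libraries named below; the directory name `my_concept_images` is a placeholder:

```shell
# Install the libraries used throughout this tutorial (versions unpinned;
# pin them if you need reproducible results)
pip install --quiet accelerate transformers diffusers

# Create a directory to hold the training images
# (the name "my_concept_images" is an assumed placeholder)
mkdir -p my_concept_images
```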
These commands install crucial libraries such as `accelerate`, `transformers`, and `diffusers`, preparing your system for Textual Inversion. We will also create the necessary directories for storing your training images.
Loading Stable Diffusion v1-5
Get the Stable Diffusion model files directly from Hugging Face:
This step will download all the necessary components that power the Stable Diffusion model, ensuring you have a local copy for faster and offline training.
Teaching Stable Diffusion a New Concept
Here's where the magic happens. Select images that represent the concept you want to teach the model. For this Stable Diffusion tutorial, we will be using pictures of a plastic Baby Groot toy.
- Data is Key: Good images lead to better embeddings.
- 3-5 Images: Ideally, use about 3-5 images for training.
- Diverse Data: Use images that showcase different perspectives.
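A minimal sketch for gathering the training photos, assuming they sit in a local folder (the folder name is a placeholder):

```python
from pathlib import Path
from PIL import Image

def load_training_images(image_dir):
    """Load every PNG/JPEG in the folder and convert it to RGB."""
    image_dir = Path(image_dir)
    paths = sorted(p for p in image_dir.iterdir()
                   if p.suffix.lower() in {".png", ".jpg", ".jpeg"})
    return [Image.open(p).convert("RGB") for p in paths]

image_dir = Path("my_concept_images")  # assumed folder holding your 3-5 photos
if image_dir.exists():
    images = load_training_images(image_dir)
    print(f"Loaded {len(images)} training images")
```

Converting to RGB up front avoids surprises with grayscale or RGBA photos later in the training pipeline.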
Defining Your Concept
Let's define what we aim to teach Stable Diffusion:
- `concept_name`: The unique name for your concept.
- `initializer_token`: A similar existing word that will help guide the model.
- `what_to_teach`: Is it an `object` or a `style`?
- `placeholder_token`: The unique token to represent your idea.
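A minimal sketch of these settings, reusing the "grooty" example from earlier; all of the values are illustrative:

```python
# Concept settings (values are illustrative, matching the Baby Groot example)
concept_name = "grooty"                   # unique name for the concept
initializer_token = "toy"                 # existing word that guides the model
what_to_teach = "object"                  # "object" for a thing, "style" for a look
placeholder_token = f"<{concept_name}>"   # token used in prompts

assert what_to_teach in ("object", "style"), "what_to_teach must be 'object' or 'style'"
print(placeholder_token)  # → <grooty>
```

Wrapping the placeholder in angle brackets is a common convention that keeps it from colliding with real words in the vocabulary.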
Creating a Training Dataset
We'll construct the sentences using our new token, guiding Stable Diffusion to understand the visual features:
During training, the pipeline also resizes, crops, and normalizes the input images as needed, which helps the model learn the concept's visual features more accurately.
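The sentence construction can be sketched as follows; the template list is in the spirit of the diffusers Textual Inversion examples, but the exact wording here is an assumption:

```python
import random

# Prompt templates for an "object" concept; "{}" is replaced by the
# placeholder token (this particular template list is an assumption)
object_templates = [
    "a photo of a {}",
    "a close-up photo of a {}",
    "a cropped photo of the {}",
    "a rendition of a {}",
]

def make_prompt(placeholder_token: str) -> str:
    """Pick a random template and insert the placeholder token."""
    return random.choice(object_templates).format(placeholder_token)

print(make_prompt("<grooty>"))  # e.g. "a photo of a <grooty>"
```

Drawing a fresh template for every training step exposes the new token to varied sentence contexts, which helps the embedding capture the concept rather than one fixed phrase.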
Load Tokenizer and Setup the New Tokens
Finally, use the CLIPTokenizer to add your brand-new token. This enables the Stable Diffusion model to recognize and use your custom concept:
Congratulations! You've unlocked enhanced creative control over Stable Diffusion. Experiment with these techniques for amazing AI-generated art.