SDXL-Lightning is a family of text-to-image models distilled from SDXL using a diffusion distillation method, aiming at a new state of the art in one-step/few-step 1024px image generation.
The models are distilled from stabilityai/stable-diffusion-xl-base-1.0 on Hugging Face. The repository contains checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models. The 2-step, 4-step, and 8-step models produce images of amazing quality, while the 1-step model is more experimental.
Both full UNet and LoRA checkpoints are provided. The LoRA checkpoints can be applied to other base models, but the full UNet checkpoints give the best quality.
- Generate with all configurations on Hugging Face, best quality: Demo
- Real-time generation as you type, lightning-fast: Demo from fastsdxl.ai
- Image Generation with Customization Features: Replicate Playground
SDXL-Lightning enables the generation of exceptionally high-quality images in as few as a single step.
For more information, refer to the research paper SDXL-Lightning: Progressive Adversarial Diffusion Distillation. The model is open-sourced as part of the research.
Diffusion models have advanced text-to-image and video generation, but their iterative sampling process is slow and computationally demanding. To speed up high-quality sampling, researchers are exploring methods such as SDXL-Lightning, which combines progressive and adversarial distillation to cut the number of inference steps while balancing quality and mode coverage.
In this article, I will show you how to use SDXL-Lightning to generate high-quality images.
SDXL-Lightning: Speed and Quality Combined in Just 2 Steps
SDXL-Lightning is a new text-to-image generation model created by researchers at ByteDance. It employs Progressive Adversarial Diffusion Distillation, which enables the fast creation of high-resolution (1024px) images in a few steps. The model has been open-sourced for research and provides a range of configurations and checkpoints for different inference step counts.
ByteDance, the parent company of TikTok, released the SDXL-Lightning weights itself, so they are not directly tied to Stability AI. Because the model is optimized for 1024x1024px, it appears to produce better results than the SDXL Turbo and LCM models.
Key Features
- Fast Generation: Produces high-quality images in as few as 1–8 inference steps.
- Progressive Adversarial Diffusion Distillation: Combines progressive and adversarial distillation to turn text prompts into images quickly without collapsing quality or diversity.
- Open Source: Models and checkpoints are open for public use and experimentation.
How to Use SDXL-Lightning Text-to-Image Generation with Diffusers
SDXL-Lightning runs through Hugging Face's diffusers library, which relies on transformers for its text encoders, so make sure a recent transformers release is installed. Run the following command in a notebook (drop the leading ! to run it in a terminal):
! pip install -U "transformers==4.38.1"
The next step is to install diffusers:
! pip install diffusers
Now import the required libraries, classes, and functions with the following Python snippet:
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
ByteDance/SDXL-Lightning provides a complete range of model weights to download, with supplementary information for each file in the model card. There are several options: LoRA, UNet, and full checkpoints for 1, 2, 4, or 8 steps. The following sections focus on the 4-step UNet model.
After importing, select the specific checkpoint to download. You can use a different checkpoint if required; the list of available checkpoints is in the model card.
base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_unet.safetensors"
Now download the checkpoint from the Hub and load it:
# Build the SDXL UNet architecture, then load the Lightning-distilled weights into it.
unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# Create the SDXL pipeline around the distilled UNet.
pipe = StableDiffusionXLPipeline.from_pretrained(base, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
# Ensure sampler uses "trailing" timesteps.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
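If you are curious what the "trailing" spacing actually does, you can ask the scheduler for the timesteps it will run. A quick, optional check:
# Optional: print the 4 timesteps the sampler will use.
# "Trailing" spacing starts the schedule at the highest-noise timestep,
# which is what the Lightning checkpoints were distilled for.
pipe.scheduler.set_timesteps(4)
print(pipe.scheduler.timesteps)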
Well done! 👍 Your model is now ready to use. Specify a prompt for the image you want to create and a file name for the output:
# Match num_inference_steps to the 4-step checkpoint and keep guidance_scale at 0.
pipe("A panda playing with football", num_inference_steps=4, guidance_scale=0).images[0].save("image_name.png")
Use this additional step if you want to view the output in your notebook:
from IPython.display import Image
Image("image_name.png")
Check the Google Colab notebook for the previous steps here.
2-Step, 4-Step, 8-Step LoRA
If you are using a base model other than vanilla SDXL, use the LoRA checkpoints. Otherwise, use the UNet checkpoint for better quality.
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_lora.safetensors" # Use the correct ckpt for your step setting!
# Load model.
pipe = StableDiffusionXLPipeline.from_pretrained(base, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo, ckpt))
pipe.fuse_lora()
# Ensure sampler uses "trailing" timesteps.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
# Ensure using the same inference steps as the loaded model and CFG set to 0.
pipe("A panda smiling", num_inference_steps=4, guidance_scale=0).images[0].save("output.png")
1-Step UNet
The 1-step model is experimental, and its quality is significantly less stable. Consider using the 2-step model for much better quality.
The 1-step model uses "sample" prediction instead of the usual "epsilon" prediction, which does not work well at a single step, so the scheduler needs to be configured accordingly.
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_1step_unet_x0.safetensors" # Use the correct ckpt for your step setting!
# Load model.
unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
pipe = StableDiffusionXLPipeline.from_pretrained(base, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
# Ensure sampler uses "trailing" timesteps and "sample" prediction type.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing", prediction_type="sample")
# Ensure using the same inference steps as the loaded model and CFG set to 0.
pipe("A panda playing with football", num_inference_steps=1, guidance_scale=0).images[0].save("output.png")
FAQs
8-step vs. 4-step: which one produces more realistic photos?
More steps generally give better quality.
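If you want to compare the step settings yourself, you can reuse one base pipeline and swap the LoRA checkpoints. A rough sketch; the 8-step LoRA filename is assumed from the model card's naming pattern, so confirm it in the repository:
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"

pipe = StableDiffusionXLPipeline.from_pretrained(base, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")

for steps in (4, 8):
    ckpt = f"sdxl_lightning_{steps}step_lora.safetensors"  # filename pattern assumed; verify in the repo
    pipe.load_lora_weights(hf_hub_download(repo, ckpt))
    pipe.fuse_lora()
    # Match num_inference_steps to the checkpoint and keep CFG at 0.
    pipe("A panda playing with football", num_inference_steps=steps, guidance_scale=0).images[0].save(f"panda_{steps}step.png")
    pipe.unfuse_lora()            # remove the fused LoRA weights
    pipe.unload_lora_weights()    # fully unload before loading the next checkpoint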
Why download/use the LoRA?
If you opt for the full UNet, there is no need to download the LoRA. The LoRA is for applying Lightning acceleration to a community-finetuned SDXL base model.
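The only change from the LoRA example above is the base repository. A sketch, where the base model name is a hypothetical placeholder for whichever community SDXL finetune you prefer:
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download

base = "some-community/sdxl-finetune"  # hypothetical placeholder; any SDXL-architecture finetune
pipe = StableDiffusionXLPipeline.from_pretrained(base, torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download("ByteDance/SDXL-Lightning", "sdxl_lightning_4step_lora.safetensors"))
pipe.fuse_lora()
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe("A panda smiling", num_inference_steps=4, guidance_scale=0).images[0].save("output.png")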
Does this model support 512px resolution?
It is a derivative of the SDXL model, which typically starts at 1024px, so I have reservations about its 512px support. For a resolution of 512×512, consider using SDXL-Turbo directly from Stability AI.
How do I finetune the Lightning model or create my own version?
Suggestion:
- Train the regular SDXL model on your dataset, then apply the SDXL-Lightning LoRA on top for acceleration.
- Preferably, train your SDXL finetune as a LoRA as well, keeping model changes minimal for the best compatibility (see the sketch after this list).
More advanced:
- If the quality is not satisfactory, consider merging SDXL-Lightning LoRA with your model and then training on top. However, using MSE loss may diminish the acceleration effect.
- For the most advanced approach, merge SDXL-Lightning LoRA and employ an adversarial objective, similar to what the SDXL-Lightning paper demonstrates.
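As a concrete sketch of the LoRA-on-LoRA suggestion: recent diffusers releases (with the peft backend installed) let you load your own style LoRA next to the Lightning LoRA as named adapters and weight them jointly. The style LoRA path and the 0.8 weight are illustrative assumptions:
# Assumes `pipe` is the SDXL pipeline from the LoRA example above, with no LoRA fused yet.
pipe.load_lora_weights("ByteDance/SDXL-Lightning", weight_name="sdxl_lightning_4step_lora.safetensors", adapter_name="lightning")
pipe.load_lora_weights("path/to/your-style-lora", adapter_name="style")  # hypothetical path
pipe.set_adapters(["lightning", "style"], adapter_weights=[1.0, 0.8])
pipe("A panda playing with football", num_inference_steps=4, guidance_scale=0).images[0].save("styled.png")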
Can’t load the model files; the error is the same whether it is the 4-step or 8-step checkpoint:
Error occurred when executing CheckpointLoaderSimple:
‘model.diffusion_model.input_blocks.0.0.weight’
You probably downloaded the wrong weights. CheckpointLoaderSimple (ComfyUI's checkpoint loader) expects the full checkpoint:
Use: sdxl_lightning_4step.safetensors
Not: sdxl_lightning_4step_lora.safetensors or sdxl_lightning_4step_unet.safetensors