Hello everyone! Today, I’m going to walk through a step-by-step introduction to fine-tuning FLUX.1, one of the most popular text-to-image models. This guide covers the whole process and aims to be accessible even if you’re new to the field.
FLUX has been taking the internet by storm this past month, and for good reason. Its claims of superiority over models like DALL·E 3, Ideogram, and Stable Diffusion 3 have proven well founded. With support being added to more and more popular image-generation tools such as Stable Diffusion WebUI Forge and ComfyUI, its reach into the Stable Diffusion space will only continue to grow.
Prerequisites:
- Hardware: A powerful GPU with ample VRAM, such as an NVIDIA A100 or RTX 4090.
- Software:
- Python 3.8 or later
- PyTorch 1.12 or later
- Hugging Face Transformers
- AI Toolkit (available on GitHub)
- Dataset: A collection of images and corresponding text prompts that align with your desired fine-tuning goals.
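Before going further, you may want to confirm your environment meets these requirements. Here is a minimal check, assuming PyTorch is already installed:

import sys
import torch

print(f"Python: {sys.version.split()[0]}")   # expect 3.8 or later
print(f"PyTorch: {torch.__version__}")       # expect 1.12 or later

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name} with {vram_gb:.1f} GB VRAM")
else:
    print("No CUDA GPU detected -- fine-tuning FLUX.1 will not be practical")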
Steps:
- Set Up the Environment:
- Clone the AI Toolkit repository from GitHub:
git clone https://github.com/ostris/ai-toolkit
- Install required dependencies:
cd ai-toolkit
pip install -r requirements.txt
- Prepare the Dataset:
- Organize your images and text prompts into a structured format, such as JSON, CSV, or a directory of paired files.
- Ensure that the image filenames match the corresponding text prompts.
- Consider using a data augmentation technique to increase the diversity of your training data.
- Configure the Training Script:
- Open the train_lora_flux_24gb.py script in your preferred text editor.
- Modify the following parameters:
- model_path: Path to the pre-trained Flux.1 model.
- data_path: Path to your dataset.
- output_dir: Directory where the fine-tuned model will be saved.
- train_batch_size: Batch size for training.
- eval_batch_size: Batch size for evaluation.
- num_epochs: Number of training epochs.
- learning_rate: Learning rate.
- lora_rank: Rank of the LoRA layers.
- lora_alpha: Scaling factor for the LoRA layers.
- save_every: Frequency of saving checkpoints.
- Start the Training:
- Run the training script:
python train_lora_flux_24gb.py
- The training process may take several hours or even days, depending on the size of your dataset and the hardware you’re using.
- Evaluate the Fine-Tuned Model:
- Generate images using the fine-tuned model and compare the results to the original model (see the sketch after these steps).
- Assess the quality of the generated images based on your specific requirements.
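For the evaluation step, one option is to load the trained LoRA into a diffusers pipeline and generate from the same prompts with and without it. This is a minimal sketch, assuming a recent diffusers release with Flux support; the LoRA path shown is a hypothetical example and should point at whatever file your training run actually produced:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Hypothetical path -- replace with the .safetensors file from your run
pipe.load_lora_weights("/content/output/my_first_flux_lora_v1/my_first_flux_lora_v1.safetensors")

image = pipe(
    "woman with red hair, playing chess at the park",
    height=1024, width=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
).images[0]
image.save("lora_sample.png")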
Details: AI Toolkit Training in a Notebook
Install Required Dependencies
!nvidia-smi
!git clone https://github.com/ostris/ai-toolkit
!mkdir -p /content/dataset
# Put your image dataset in the /content/dataset folder
!cd ai-toolkit && git submodule update --init --recursive && pip install -r requirements.txt
Model License
Training currently only works with FLUX.1-dev, which means anything you train will inherit the non-commercial license. It is also a gated model, so you need to accept the license on Hugging Face before using it; otherwise, training will fail. Here are the required steps to set up access.
Sign in to Hugging Face and accept the model access terms here: black-forest-labs/FLUX.1-dev.
Get a READ token from Hugging Face and paste it into the prompt after running the next cell.
import getpass
import os
# Prompt for the token
hf_token = getpass.getpass('Enter your HF access token and press enter: ')
# Set the environment variable
os.environ['HF_TOKEN'] = hf_token
print("HF_TOKEN environment variable has been set.")
import os
import sys
sys.path.append('/content/ai-toolkit')
from toolkit.job import run_job
from collections import OrderedDict
from PIL import Image
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
Dataset Format
Datasets are a folder of images. Captions need to be .txt files with the same name as the image, for instance image2.jpg and image2.txt. Only jpg, jpeg, and png are currently supported. Images will automatically be resized and bucketed into the resolutions specified in the config.
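A quick sanity check of the dataset folder can save a failed run. This is a minimal sketch that flags images with missing captions or unsupported extensions, assuming the /content/dataset path used in this notebook:

from pathlib import Path

dataset_dir = Path("/content/dataset")
supported = {".jpg", ".jpeg", ".png"}

for image_path in sorted(dataset_dir.iterdir()):
    if image_path.suffix.lower() == ".txt":
        continue  # caption files are checked via their paired image
    if image_path.suffix.lower() not in supported:
        print(f"Unsupported file type, will be ignored: {image_path.name}")
        continue
    caption_path = image_path.with_suffix(".txt")
    if not caption_path.exists():
        print(f"Missing caption for: {image_path.name}")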
Training Code
This is your config. The inline comments document it fairly well. Normally you would define it as a YAML file, but for Colab this works. It will run as-is without modification, but feel free to edit it as you like.
from collections import OrderedDict
job_to_run = OrderedDict([
('job', 'extension'),
('config', OrderedDict([
# this name will be used for the output folder and filenames
('name', 'my_first_flux_lora_v1'),
('process', [
OrderedDict([
('type', 'sd_trainer'),
# root folder to save training sessions/samples/weights
('training_folder', '/content/output'),
# uncomment to see performance stats in the terminal every N steps
#('performance_log_every', 1000),
('device', 'cuda:0'),
# if a trigger word is specified, it will be added to captions of training data if it does not already exist
# alternatively, in your captions you can add [trigger] and it will be replaced with the trigger word
# ('trigger_word', 'image'),
('network', OrderedDict([
('type', 'lora'),
('linear', 16),
('linear_alpha', 16)
])),
('save', OrderedDict([
('dtype', 'float16'), # precision to save
('save_every', 250), # save every this many steps
('max_step_saves_to_keep', 4) # how many intermittent saves to keep
])),
('datasets', [
OrderedDict([
('folder_path', '/content/dataset'),
('caption_ext', 'txt'),
('caption_dropout_rate', 0.05), # will drop out the caption 5% of the time
('shuffle_tokens', False), # shuffle caption order, split by commas
('cache_latents_to_disk', True), # leave this true unless you know what you're doing
('resolution', [512, 768, 1024]) # flux enjoys multiple resolutions
])
]),
('train', OrderedDict([
('batch_size', 1),
('steps', 2000), # total number of steps to train; 500-4000 is a good range
('gradient_accumulation_steps', 1),
('train_unet', True),
('train_text_encoder', False), # probably won't work with flux
('content_or_style', 'balanced'), # content, style, balanced
('gradient_checkpointing', True), # need this on unless you have a ton of vram
('noise_scheduler', 'flowmatch'), # for training only
('optimizer', 'adamw8bit'),
('lr', 1e-4),
# uncomment this to skip the pre training sample
# ('skip_first_sample', True),
# uncomment to completely disable sampling
# ('disable_sampling', True),
# uncomment to use new bell curved weighting. Experimental but may produce better results
# ('linear_timesteps', True),
# ema will smooth out learning, but could slow it down. Recommended to leave on.
('ema_config', OrderedDict([
('use_ema', True),
('ema_decay', 0.99)
])),
# will probably need this if gpu supports it for flux, other dtypes may not work correctly
('dtype', 'bf16')
])),
('model', OrderedDict([
# huggingface model name or path
('name_or_path', 'black-forest-labs/FLUX.1-dev'),
('is_flux', True),
('quantize', True), # run 8bit mixed precision
#('low_vram', True), # uncomment this if the GPU is connected to your monitors. It will use less vram to quantize, but is slower.
])),
('sample', OrderedDict([
('sampler', 'flowmatch'), # must match train.noise_scheduler
('sample_every', 250), # sample every this many steps
('width', 1024),
('height', 1024),
('prompts', [
# you can add [trigger] to the prompts here and it will be replaced with the trigger word
#'[trigger] holding a sign that says \'I LOVE PROMPTS!\'',
'woman with red hair, playing chess at the park, bomb going off in the background',
'a woman holding a coffee cup, in a beanie, sitting at a cafe',
'a horse is a DJ at a night club, fish eye lens, smoke machine, lazer lights, holding a martini',
'a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background',
'a bear building a log cabin in the snow covered mountains',
'woman playing the guitar, on stage, singing a song, laser lights, punk rocker',
'hipster man with a beard, building a chair, in a wood shop',
'photo of a man, white background, medium shot, modeling clothing, studio lighting, white backdrop',
'a man holding a sign that says, \'this is a sign\'',
'a bulldog, in a post apocalyptic world, with a shotgun, in a leather jacket, in a desert, with a motorcycle'
]),
('neg', ''), # not used on flux
('seed', 42),
('walk_seed', True),
('guidance_scale', 4),
('sample_steps', 20)
]))
])
])
])),
# you can add any additional meta info here. [name] is replaced with config name at top
('meta', OrderedDict([
('name', '[name]'),
('version', '1.0')
]))
])
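If you prefer to keep the config as a YAML file (as you normally would outside Colab), you can dump the dictionary above. A small sketch, assuming PyYAML is available:

import json
import yaml  # PyYAML

# Round-tripping through JSON turns the nested OrderedDicts into plain dicts
plain_config = json.loads(json.dumps(job_to_run))

with open("my_first_flux_lora_v1.yaml", "w") as f:
    yaml.safe_dump(plain_config, f, sort_keys=False)

The resulting file can then be used with the toolkit's run.py script when training outside a notebook (check the repo README for the exact invocation).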
Run it
The cell below does all the magic. Check your folders to the left: items will be in output/LoRA/your_name_v1. The samples folder contains periodic samples generated during training, though this doesn't display great in Colab. Everything will be under /content/output.
run_job(job_to_run)
# Check your output dir and grab your LoRA
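To find the saved weights once training starts producing checkpoints, you can list the output folder. A minimal sketch:

import glob

# List any LoRA checkpoints saved so far under the training_folder from the config
for path in sorted(glob.glob("/content/output/**/*.safetensors", recursive=True)):
    print(path)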
Additional Tips:
- Experiment with different hyperparameters: Adjust the learning rate, batch size, and other parameters to optimize the training process.
- Consider using a learning rate scheduler: Decaying the learning rate over the run can help stabilize training and improve convergence (see the generic sketch after these tips).
- Monitor the training progress: Keep an eye on the loss function and evaluation metrics to ensure that the model is learning effectively.
- Share your fine-tuned model: Contribute to the community by sharing your trained model with others.
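Since this guide drives training through the AI Toolkit config rather than a hand-written loop, the following is only a generic PyTorch illustration of the scheduler idea mentioned above; the module and step count are placeholders:

import torch

# Placeholder module standing in for the parameters being trained
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Cosine decay from 1e-4 toward zero over a 2000-step run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=2000)

for step in range(2000):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()
    scheduler.step()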
By following these steps and incorporating the additional tips, you can effectively fine-tune Flux.1 using the AI Toolkit to achieve your desired image generation goals.