Imports and Setup
Libraries: The code imports the required libraries: datasets, transformers, peft, trl, and torch.
Logging: Sets up logging to track the training process.
import sys
import logging
import datasets
from datasets import load_dataset
from peft import LoraConfig
import torch
import transformers
from trl import SFTTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
# Logging setup (you can customize this as needed)
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)
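If you also want the Hugging Face libraries to log at a matching verbosity, an optional addition like the following works (this sketch assumes INFO level; adjust as needed):
# Optional: align the datasets/transformers loggers with the same level
log_level = logging.INFO
logger.setLevel(log_level)
datasets.utils.logging.set_verbosity(log_level)
transformers.utils.logging.set_verbosity(log_level)
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()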
Hyperparameters
Hyperparameters: Defines two dictionaries, training_config and peft_config, to store the hyperparameters for training and for PEFT (Parameter-Efficient Fine-Tuning), respectively.
Training Arguments: Creates a TrainingArguments object from the training_config dictionary.
PEFT Configuration: Creates a LoraConfig object from the peft_config dictionary, specifying the LoRA (Low-Rank Adaptation) settings for efficient fine-tuning.
# Training hyperparameters
training_config = {
    "bf16": True,  # Use mixed precision
    "do_eval": False,
    "learning_rate": 5.0e-06,
    "log_level": "info",
    "logging_steps": 20,
    "logging_strategy": "steps",
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 1,
    "max_steps": -1,
    "output_dir": "./checkpoint_dir",
    "overwrite_output_dir": True,
    "per_device_eval_batch_size": 4,
    "per_device_train_batch_size": 4,
    "remove_unused_columns": True,
    "save_steps": 100,
    "save_total_limit": 1,
    "seed": 0,
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
    "gradient_accumulation_steps": 1,
    "warmup_ratio": 0.2,
}
# PEFT (LoRA) configuration
peft_config = {
    "r": 16,  # LoRA rank
    "lora_alpha": 32,  # LoRA scaling factor
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": "all-linear",  # apply LoRA to all linear layers
    "modules_to_save": None,
}
# Create TrainingArguments and LoraConfig objects
train_conf = TrainingArguments(**training_config)
peft_conf = LoraConfig(**peft_config)
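As a quick sanity check on these values, the effective global batch size is the per-device batch size times the gradient accumulation steps times the number of processes; the short sketch below assumes a single GPU.
# Effective global batch size (illustrative; world_size depends on how training is launched)
world_size = 1
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
    * world_size
)
print(f"Effective train batch size: {effective_batch_size}")  # 4 with the values above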
Model and Tokenizer Loading
Checkpoint Path: Specifies the path to the pre-trained model, here “microsoft/Phi-3-mini-4k-instruct”.
Model Arguments: Defines model_kwargs with settings such as use_cache, torch_dtype (bfloat16 for mixed precision), and the attention implementation (flash_attention_2).
Model and Tokenizer Loading: Loads the pre-trained model using AutoModelForCausalLM.from_pretrained and the tokenizer using AutoTokenizer.from_pretrained.
Tokenizer Configuration: Sets the maximum sequence length, the padding token and its ID, and the padding side for the tokenizer.
# Model checkpoint to fine-tune
checkpoint_path = "microsoft/Phi-3-mini-4k-instruct" # Or other Phi-3 model
# Model loading arguments
model_kwargs = dict(
    use_cache=False,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",  # Flash Attention support
    torch_dtype=torch.bfloat16,
    device_map=None,
)
# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(checkpoint_path, **model_kwargs)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
# Tokenizer configuration
tokenizer.model_max_length = 2048 # Set maximum sequence length
tokenizer.pad_token = tokenizer.unk_token # Use unk as padding token
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
tokenizer.padding_side = 'right'
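Note that flash_attention_2 requires the flash-attn package and a supported NVIDIA GPU (Ampere or newer). If that is not available in your environment, falling back to another attention implementation should work, at some cost in speed and memory; the commented lines below are an optional sketch.
# Optional fallback if flash-attn is not installed or unsupported on your GPU:
# model_kwargs["attn_implementation"] = "eager"  # or "sdpa"
# model = AutoModelForCausalLM.from_pretrained(checkpoint_path, **model_kwargs)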
Data Processing
apply_chat_template Function: This function preprocesses each example by:
Adding an empty system message at the beginning if none exists.
Applying the tokenizer's chat template to format the conversation into a single text string.
def apply_chat_template(example, tokenizer):
    messages = example["messages"]
    # Add an empty system message if there is none
    if messages[0]["role"] != "system":
        messages.insert(0, {"role": "system", "content": ""})
    example["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False
    )
    return example
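To see what the template produces, you can run the function on a toy example before mapping it over the full dataset. The message structure below is hand-written to mirror the ultrachat_200k schema; the exact output depends on the tokenizer's chat template.
# Illustrative check on a single hand-written example
toy_example = {
    "messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "A parameter-efficient fine-tuning method."},
    ]
}
print(apply_chat_template(toy_example, tokenizer)["text"])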
Data Loading and Processing
Dataset Loading: Loads the “HuggingFaceH4/ultrachat_200k” dataset using the datasets library.
Dataset Splitting: Extracts the train_sft and test_sft splits for training and evaluation.
Data Processing: Applies the apply_chat_template function to both the training and test datasets using the map function, preparing the data for the chat-based fine-tuning task.
# Load the dataset
raw_dataset = load_dataset("HuggingFaceH4/ultrachat_200k")
# Extract train and test splits
train_dataset = raw_dataset["train_sft"]
test_dataset = raw_dataset["test_sft"]
column_names = list(train_dataset.features) # Get column names
# Process the datasets using the apply_chat_template function
processed_train_dataset = train_dataset.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer},
    num_proc=10,
    remove_columns=column_names,
    desc="Applying chat template to train_sft",
)
processed_test_dataset = test_dataset.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer},
    num_proc=10,
    remove_columns=column_names,
    desc="Applying chat template to test_sft",
)
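Since ultrachat_200k is large, it can be worth doing a quick smoke test on a small subset before launching a full run; the commented lines below are one way to do that (the variable names and subset sizes are purely illustrative).
# Optional: sub-sample for a quick smoke test before a full training run
# small_train_dataset = processed_train_dataset.shuffle(seed=0).select(range(2000))
# small_test_dataset = processed_test_dataset.select(range(200))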
Training
Trainer Initialization: Creates an SFTTrainer object with the following arguments:
model: The loaded pre-trained model.
args: The TrainingArguments object.
peft_config: The LoraConfig object with the LoRA settings.
train_dataset and eval_dataset: The processed training and evaluation datasets.
Other arguments such as max_seq_length, dataset_text_field, tokenizer, and packing.
Training Execution: Starts the training process using trainer.train().
Metrics Logging and Saving: Logs and saves the training metrics.
Saving Trainer State: Saves the trainer state for potential resuming or further analysis.
# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=train_conf,
    peft_config=peft_conf,
    train_dataset=processed_train_dataset,
    eval_dataset=processed_test_dataset,
    max_seq_length=2048,
    dataset_text_field="text",
    tokenizer=tokenizer,
    packing=True,
)
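# Optional sanity check: because peft_config was passed, SFTTrainer wraps the base
# model in a PeftModel, so only the LoRA adapter weights should show up as trainable.
trainer.model.print_trainable_parameters()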
# Train the model
train_result = trainer.train()
# Log and save training metrics
metrics = train_result.metrics
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()
Evaluation
Tokenizer Adjustment: Changes the tokenizer padding side to 'left' for evaluation.
Evaluation: Runs the evaluation using trainer.evaluate() and obtains the evaluation metrics.
Metrics Logging and Saving: Logs and saves the evaluation metrics.
# Adjust tokenizer padding side for evaluation
tokenizer.padding_side = 'left'
# Evaluate the model
metrics = trainer.evaluate()
# Log and save evaluation metrics
metrics["eval_samples"] = len(processed_test_dataset)
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
Save the Fine-Tuned Model
Saving Fine-Tuned Model: Saves the fine-tuned model to the specified output directory using trainer.save_model().
# Save the fine-tuned model
trainer.save_model(train_conf.output_dir)
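With a PEFT setup, trainer.save_model() typically stores only the LoRA adapter weights and their config rather than the full model. A rough sketch for reloading the adapter later, and optionally merging it into the base model, might look like this (the merged output path is illustrative):
# Reload the saved adapter and optionally merge it into the base weights
from peft import AutoPeftModelForCausalLM

tuned_model = AutoPeftModelForCausalLM.from_pretrained(
    train_conf.output_dir, torch_dtype=torch.bfloat16
)
merged_model = tuned_model.merge_and_unload()  # folds LoRA weights into the base model
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")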