Mistral-7B Instruct Fine-Tuning using Transformers LoRa

Using the Mistral 7B model, this tutorial will give you an overview of how to fine-tune your natural language processing projects.
By Ateeq Azam · Last updated: May 6, 2024 3:51 pm

Mistral 7B is a Large Language Model (LLM) from Mistral AI, trained on large datasets to generate coherent text and perform a variety of natural language processing tasks. Pre-trained LLMs only do next-token prediction, so out of the box they cannot follow task-specific instructions. The base models are therefore fine-tuned on instructions and answers so they can serve as helpful assistants.

Table of Contents
  • Understanding Mistral 7B
  • Model Architecture
  • Why Fine-tune LLMs
  • Fine-Tune Mistral-7B using LoRa
  • Loading Dataset
  • Dataset Formatting
  • Loading the Mistral 7B base model
  • Train the Model
  • Save Model and Push to Hub
  • Test the Trained Model
  • Fine-tuning mistralai/Mistral-7B-Instruct-v0.2
  • Conclusion

In this tutorial, you'll learn how to use and fine-tune the Mistral 7B model for natural language processing projects. You will learn how to load the model, run inference, quantize it, fine-tune it with LoRA, merge the adapters, and push the result to the Hugging Face Hub.

Understanding Mistral 7B

With great excitement, the Mistral AI team introduced the Mistral 7B model as a new addition to the generative AI era. It's a language model giant with 7.24 billion parameters and many new features.

Mistral AI 7b Features

Mistral 7B is one of the most impressive small language models. It outperforms Llama 2 13B on all benchmark tests and holds its own against the much larger Llama 1 34B. Additionally, Mistral 7B is adept at code-related tasks while still performing well on English-language tasks. This outstanding performance exemplifies its versatility and strength.

Both the base and the instruct-tuned models are released under the Apache 2.0 license, allowing them to be used without restriction. The Mistral 7B research paper (https://arxiv.org/pdf/2310.06825.pdf) describes the model architecture, performance, and instruction fine-tuning in greater depth.

Model Architecture

Mistral-7B-v0.1 is a transformer model, with the following architecture choices:

  • Grouped-Query Attention
  • Sliding-Window Attention
  • Byte-fallback BPE tokenizer
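
If you want to verify these architecture choices yourself, they are visible in the model's configuration. A minimal check (not part of the original tutorial; the attribute names follow the Transformers MistralConfig):

Python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

# Grouped-Query Attention: 32 query heads share 8 key/value heads
print(config.num_attention_heads, config.num_key_value_heads)

# Sliding-Window Attention span, in tokens
print(config.sliding_window)

# Vocabulary size of the byte-fallback BPE tokenizer
print(config.vocab_size)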

Why Fine-tune LLMs

Fine-tuning is the most effective way to teach a model task-specific behavior. It takes a pre-trained model and continues training it on a custom dataset to improve its performance on a specific task. This refinement is a form of transfer learning: the base model's parameters are updated to reflect the newly acquired knowledge.

Full fine-tuning updates all of the model's parameters, which is expensive and out of reach for most developers. This is where LoRA and QLoRA come into the picture.

Before we start, we need to install (or update, if already installed) the essential libraries to avoid errors.

Python
! pip install transformers trl accelerate torch bitsandbytes peft datasets -qU

Fine-Tune Mistral-7B using LoRa

The most common way to access Mistral 7B is through Hugging Face. The Hub's Models feature provides a convenient way to work with it: models and datasets can be loaded or fine-tuned without downloading them manually, and it only takes a few minutes to get started.

Loading Dataset

First, we need to load our mosaicml/instruct-v3 dataset. It's a large collection of instruction prompts and responses drawn from several sources.

Python
from datasets import load_dataset

instruct_tune_dataset = load_dataset("mosaicml/instruct-v3")

Dataset Formatting

Let’s take a peek at our dataset.

It’s our job to merge these prompt and response columns into a single formatted prompt for fine-tuning.

Python
instruct_tune_dataset
Output
DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 56167
    })
    test: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 6807
    })
})

Since we want a model that generates instructions, we're going to filter away all the other subsets and only use the dolly_hhrlhf component!

Python
instruct_tune_dataset = instruct_tune_dataset.filter(lambda x: x["source"] == "dolly_hhrlhf")
instruct_tune_dataset
Output
DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 34333
    })
    test: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 4771
    })
})

We're going to train on a small subset of the data – if you were considering an epoch-based approach, this would reduce the amount of time spent training!

Python
instruct_tune_dataset["train"] = instruct_tune_dataset["train"].select(range(5_000))
instruct_tune_dataset["test"] = instruct_tune_dataset["test"].select(range(200))
instruct_tune_dataset
Output
DatasetDict({
    train: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 5000
    })
    test: Dataset({
        features: ['prompt', 'response', 'source'],
        num_rows: 200
    })
})

In the following function, we'll merge our prompt and response columns using this template:

<s>### Instruction:
Use the provided input to create an instruction that could have been used to generate the response with an LLM.

### Input:
{input}

### Response:
{response}</s>

Let's do it in Python.

Python
def create_prompt(sample):
  bos_token = "<s>"
  original_system_message = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
  system_message = "Use the provided input to create an instruction that could have been used to generate the response with an LLM."
  # Note the deliberate swap: the dataset's original instruction becomes the
  # target response, and the dataset's response becomes the input, because the
  # new task is "given a response, recover the instruction".
  response = sample["prompt"].replace(original_system_message, "").replace("\n\n### Instruction\n", "").replace("\n### Response\n", "").strip()
  input_text = sample["response"]
  eos_token = "</s>"

  full_prompt = ""
  full_prompt += bos_token
  full_prompt += "### Instruction:"
  full_prompt += "\n" + system_message
  full_prompt += "\n\n### Input:"
  full_prompt += "\n" + input_text
  full_prompt += "\n\n### Response:"
  full_prompt += "\n" + response
  full_prompt += eos_token

  return full_prompt
Python
create_prompt(instruct_tune_dataset["train"][0])

Loading the Mistral 7B base model

Next, we will configure 4-bit quantization with the NF4 data type using bitsandbytes so that the model loads in 4-bit precision. This shrinks the memory footprint and speeds up loading, making the model fit on Google Colab or consumer GPUs.

We're going to load our model in 4-bit, with double quantization and bfloat16 as our compute dtype. You'll notice we're loading the instruct-tuned model – this is because it's already adept at following instructions – we're just teaching it a new task!

Python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    device_map='auto',
    quantization_config=nf4_config,
    use_cache=False
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # the instruct model shares the base model's tokenizer

tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Let's examine how well the model currently does at this task:

Python
def generate_response(prompt, model):
  encoded_input = tokenizer(prompt,  return_tensors="pt", add_special_tokens=True)
  model_inputs = encoded_input.to('cuda')

  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)

  decoded_output = tokenizer.batch_decode(generated_ids)

  return decoded_output[0].replace(prompt, "")
Python
generate_response("### Instruction:\nUse the provided input to create an instruction that could have been used to generate the response with an LLM.\n\n### Input:\nI think it depends a little on the individual, but there are a number of steps you’ll need to take.  First, you’ll need to get a college education.  This might include a four-year undergraduate degree and a four-year doctorate program.  You’ll also need to complete a residency program.  Once you have your education, you’ll need to be licensed.  And finally, you’ll need to establish a practice.\n\n### Response:", model)
Output
<s> 
To become a healthcare provider, you should pursue a college education, obtaining a four-year undergraduate degree and a four-year doctorate program. Afterward, you must complete a residency program. Once you have completed your education, you will need to become licensed. Finally, to establish a practice, you must have all of these steps completed.</s>

Train the Model

Now we're going to prepare our model for 4-bit LoRA training! The peft library from Hugging Face gives us handy helper functions to achieve this.

The fine-tuning process uses PEFT LoRA, based on the Low-Rank Adaptation (LoRA) method. A model stores what it has learned in large weight matrices. LoRA approximates the weight updates with pairs of much smaller matrices, exploiting the fact that the full update matrix contains a great deal of redundancy for our specific objective.

Think of everything the model could learn as a very long list, of which only a small portion matters for our specific task. LoRA lets us concentrate solely on that small portion, so we don't have to revisit the entire list every time we train the model for our particular job. That is LoRA's core principle.

This also reduces GPU memory usage, because the model no longer has to store and update every parameter. LoRA's main purpose is to optimize the use of GPU resources, resulting in faster training and lower compute cost.
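
To make the "smaller matrices" idea concrete, here is a quick back-of-the-envelope calculation (an illustration, not from the original post) for a single 4096×4096 weight matrix at the rank we use below:

Python
d = 4096                      # hidden size of Mistral 7B
r = 64                        # LoRA rank (the r used in the config below)

full_update = d * d           # parameters in a full-rank weight update
lora_update = d * r + r * d   # parameters in the low-rank factors A and B

print(full_update)                # 16777216
print(lora_update)                # 524288
print(lora_update / full_update)  # ~0.031, i.e. about 3% of the full update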

Python
from peft import AutoPeftModelForCausalLM, LoraConfig, get_peft_model, prepare_model_for_kbit_training

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM"
)
Python
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
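
Once the adapters are attached, it's worth checking how small the trainable portion actually is. peft exposes a helper for this (the exact counts will vary with your peft version):

Python
# Prints the number of trainable (LoRA) parameters vs. the total parameter count
model.print_trainable_parameters()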

All that's left to do is set up a number of hyperparameters.

Python
from transformers import TrainingArguments

args = TrainingArguments(
  output_dir = "mistral_instruct_generation",
  #num_train_epochs=5,
  max_steps = 100, # comment out this line if you want to train in epochs
  per_device_train_batch_size = 4,
  warmup_ratio = 0.03,  # fraction of total steps used for learning-rate warmup
  logging_steps=10,
  save_strategy="epoch",
  #evaluation_strategy="epoch",
  evaluation_strategy="steps",
  eval_steps=20, # comment out this line if you want to evaluate at the end of each epoch
  learning_rate=2e-4,
  bf16=True,
  lr_scheduler_type='constant',
)

Supervised fine-tuning (SFT) uses labeled data to adapt the pre-trained language model with standard supervised learning. The model's weights are adjusted using the gradients of a task-specific loss, which measures the difference between the LLM's predictions and the ground-truth labels.
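
Conceptually, the task-specific loss here is ordinary next-token cross-entropy over the formatted prompts. The trainer computes this internally; the sketch below, reusing the objects defined above, only illustrates the idea:

Python
import torch
import torch.nn.functional as F

# Tokenize one formatted training example and score the model's next-token
# predictions against the ground-truth tokens (this is what the trainer
# minimizes during fine-tuning).
example = create_prompt(instruct_tune_dataset["train"][0])
batch = tokenizer(example, return_tensors="pt").to("cuda")

with torch.no_grad():
    logits = model(**batch).logits            # (1, seq_len, vocab_size)

loss = F.cross_entropy(
    logits[0, :-1].float(),                   # predictions for positions 0..n-2
    batch["input_ids"][0, 1:],                # targets are the next tokens 1..n-1
)
print(loss.item())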

Python
from trl import SFTTrainer

max_seq_length = 2048

trainer = SFTTrainer(
  model=model,
  peft_config=peft_config,
  max_seq_length=max_seq_length,
  tokenizer=tokenizer,
  packing=True,
  formatting_func=create_prompt,
  args=args,
  train_dataset=instruct_tune_dataset["train"],
  eval_dataset=instruct_tune_dataset["test"]
)

Next, we call the train function; here we train the model for 100 steps.

Python
trainer.train()
Mistral-7B Fine-tuning Results

Save Model and Push to Hub

Python
trainer.save_model("exnrt_mistral_instruct")
Python
!pip install huggingface-hub -qU
Python
from huggingface_hub import notebook_login

notebook_login()
Python
trainer.push_to_hub("exnrt/exnrt_mistral_instruct")
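
The introduction also mentions merging the adapters back into the base model. The original post doesn't show that step, but a minimal sketch using peft's merge_and_unload (assuming the adapter was saved to "exnrt_mistral_instruct" as above) looks like this; it also defines the merged_model used for inference below:

Python
import torch
from peft import AutoPeftModelForCausalLM

# Load the saved LoRA adapter together with its base model, then fold the
# adapter weights into the base weights so no peft wrapper is needed at
# inference time.
merged_model = AutoPeftModelForCausalLM.from_pretrained(
    "exnrt_mistral_instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged_model = merged_model.merge_and_unload()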

Test the Trained Model

After training, the fine-tuned model can be used for inference.

Python
def generate_response(prompt, model):
  encoded_input = tokenizer(prompt,  return_tensors="pt", add_special_tokens=True)
  model_inputs = encoded_input.to('cuda')

  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)

  decoded_output = tokenizer.batch_decode(generated_ids)

  return decoded_output[0]
Python
generate_response("### Instruction:\nUse the provided input to create an instruction that could have been used to generate the response with an LLM.### Input:\n--Your input text to test the model.\n\n### Response:", merged_model)

Fine-tuning mistralai/Mistral-7B-Instruct-v0.2

Python
! pip install -U bitsandbytes peft accelerate trl datasets wandb

Import

Python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,HfArgumentParser,TrainingArguments,pipeline, logging
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
import os, torch, wandb
from datasets import load_dataset
from trl import SFTTrainer

Login to Hugging Face

Python
from huggingface_hub import notebook_login
notebook_login()

Login to wandb and Create Project

Python
# Monitoring the LLM
wandb.login(key = '')
run = wandb.init(
    project='Fine-tuning-Mistral', 
    job_type="training", 
    anonymous="allow"
)

Select Model and Dataset

Python
base_model = "mistralai/Mistral-7B-Instruct-v0.2" 
dataset_name = "mwitiderrick/lamini_mistral"
new_model = "Mistral-7b-v2-finetune"

Manage Dataset

Python
#Importing the dataset
dataset = load_dataset(dataset_name, split="train")
dataset["text"][100]

Model and Tokenizer Loading

Python
bnb_config = BitsAndBytesConfig(  
    load_in_4bit= True,
    bnb_4bit_quant_type= "nf4",
    bnb_4bit_compute_dtype= torch.bfloat16,
    bnb_4bit_use_double_quant= False,
)
model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
)
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token  # check whether BOS/EOS tokens are added automatically

Adding the adapters to the layers

Python
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj","gate_proj"]
)
model = get_peft_model(model, peft_config)

Test Before Fine-tuning

Python
prompt = "How to make banana bread?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])
Python
print(model)

Set Hyperparameters

Python
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="wandb"
)

Setting SFT parameters and starting training

Python
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length= None,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    packing= False,
)
trainer.train()

Save Model

Python
# Save the fine-tuned model
trainer.model.save_pretrained(new_model)
wandb.finish()
model.config.use_cache = True
model.eval()
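
If you later want to reuse the adapter you just saved (for example, in a fresh session), a minimal sketch is to reload the base model and attach the saved adapter with peft; pushing to the Hub is optional and assumes you are logged in:

Python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the base model and attach the saved LoRA adapter from `new_model`
base = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
reloaded = PeftModel.from_pretrained(base, new_model)

# Optionally publish the adapter and tokenizer to the Hugging Face Hub
# reloaded.push_to_hub(new_model)
# tokenizer.push_to_hub(new_model)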

Test Fine-tuned Model

Python
prompt = "Can I find information about the code's approach to handling long-running tasks and background jobs?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

Conclusion

Fine-tuning the Mistral 7B LLM is a captivating blend of theoretical concepts and practical implementation. Understanding the theory behind the process shows you how much customization such a robust language model allows. Keep in mind that fine-tuning is often a laborious process that requires experimentation and refinement to achieve the best possible outcome. This guide gives you the knowledge you need to start building your own version of Mistral 7B, tailored to your individual preferences.

Tagged: Artificial Intelligence, Hugging Face, Programming