Finetune Gemma Models with Transformers

In this post, we briefly discuss how to use the Hugging Face Transformers and PEFT libraries to perform parameter-efficient fine-tuning (PEFT) on Gemma models.
By Ateeq Azam | Last updated: March 17, 2024

Gemma AI Model

Gemma, an open language model from Google DeepMind, has been made available to the community via Hugging Face. It comes in two sizes, 2 billion and 7 billion parameters, each with pre-trained and instruction-tuned versions. The code is available on GitHub, it has TGI support, and you can deploy and fine-tune it with Vertex AI Model Garden and Google Kubernetes Engine.

Table of Contents

  • What is Gemma?
  • Prompt format
  • Using gemma-7b-it with Transformers
  • How to Finetune Gemma Models with HuggingFace and PEFT
    • 1. Download the Model and Tokenizer
    • 2. Test the Model
    • 3. Start the Finetuning Process
    • 4. Use LoRA Config
    • 5. Test the Finetuned Model
  • Why PEFT?
  • PyTorch on GPU and TPU
  • Next Steps
  • Let’s Wrap

To ensure optimal integration with the Hugging Face ecosystem, Google and Hugging Face collaborated on the release. The Hub offers access to the four open-access models: two base models and two fine-tuned ones.

What is Gemma?

Gemma is a family of four new LLMs from Google, based on the research and technology behind Gemini. It comes in two sizes: 2 billion and 7 billion parameters. Writing, translating, and answering questions are among the many tasks these models can perform. They run on a variety of machines without special size-reduction techniques and support a context of roughly 8,000 tokens.

  • gemma-2b: Base 2B model.
  • gemma-2b-it: Instruction fine-tuned version of the base 2B model.
  • gemma-7b: Base 7B model.
  • gemma-7b-it: Instruction fine-tuned version of the base 7B model.

How good are the Gemma models? Below is a summary of the base models’ performance on the Open LLM Leaderboard compared with other open models (higher scores indicate better performance):

Model                        | License         | Commercial use? | Pretraining size [tokens] | Leaderboard score ⬇️
Llama 2 70B Chat (reference) | Llama 2 license | ✅              | 2T                        | 67.87
Gemma-7B                     | Gemma license   | ✅              | 6T                        | 63.75
DeciLM-7B                    | Apache 2.0      | ✅              | unknown                   | 61.55
PHI-2 (2.7B)                 | MIT             | ✅              | 1.4T                      | 61.33
Mistral-7B-v0.1              | Apache 2.0      | ✅              | unknown                   | 60.97
Llama 2 7B                   | Llama 2 license | ✅              | 2T                        | 54.32
Gemma 2B                     | Gemma license   | ✅              | 2T                        | 46.51

Gemma comparison with other LLMs

Prompt format

The base models have no prompt format. Like other base models, they can be used to continue an input sequence with a plausible completion or for zero-shot/few-shot inference, and they provide a solid foundation for fine-tuning on your own use cases. The instruct versions use a simple conversation structure:

<start_of_turn>user
knock knock<end_of_turn>
<start_of_turn>model
who is there<end_of_turn>
<start_of_turn>user
LaMDA<end_of_turn>
<start_of_turn>model
LaMDA who?<end_of_turn>

This format must be reproduced exactly for effective use.
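In practice you rarely need to type this format by hand: the instruct tokenizer ships with a chat template that reproduces it. A minimal sketch (assuming you have accepted the license and can download google/gemma-7b-it):

Python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
messages = [
    {"role": "user", "content": "knock knock"},
    {"role": "assistant", "content": "who is there"},
]
# Renders the <start_of_turn>/<end_of_turn> structure shown above,
# with a trailing model turn opened for generation
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))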

Using gemma-7b-it with Transformers

Here’s a quick rundown of how to use gemma-7b-it with Transformers. Running it requires about 18 GB of GPU memory, which fits consumer GPUs such as the RTX 3090 or 4090.

Python
from transformers import AutoTokenizer, pipeline
import torch

model_id = "google/gemma-7b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Use a distinct name to avoid shadowing the imported `pipeline` function
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": "What is Fine Tuning? Write in one line."},
]
# Build the Gemma chat prompt from the message list
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
# Strip the prompt from the returned text and print only the model's reply
print(outputs[0]["generated_text"][len(prompt):])
Output
Fine tuning is a machine learning technique that involves adjusting the parameters of a pre-trained model to fit a specific downstream task.

How to Finetune Gemma Models with HuggingFace and PEFT

Users are required to sign a consent form before they can access Gemma model artifacts.

In addition, Gemma models are compatible with torch.compile() with CUDA graphs, resulting in a ~4x inference time speedup!
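As a rough sketch of what enabling this looks like (assuming a model already loaded as in the sections below; "reduce-overhead" is the torch.compile mode that uses CUDA graphs):

Python
import torch

# Compile the model's forward pass; "reduce-overhead" mode enables CUDA graphs
model = torch.compile(model, mode="reduce-overhead")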

To use Gemma models with Transformers, make sure you are using a recent transformers release (4.38.1 or newer):

pip install --upgrade "transformers==4.38.1"
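The rest of this walkthrough also relies on the datasets, peft, trl, and bitsandbytes libraries, so if you are following along you will likely want to install those too (versions left unpinned here):

pip install --upgrade datasets peft trl bitsandbytes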

1. Download the Model and Tokenizer

  • Users who have filled in the consent form can view and download the model artifacts from the Hugging Face Hub.
  • This example downloads the model and tokenizer, using BitsAndBytesConfig for 4-bit weight quantization.
Python
import os

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "google/gemma-2b"

# Quantize the weights to 4-bit NF4, computing in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ["HF_TOKEN"])
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on GPU 0
    token=os.environ["HF_TOKEN"],
)
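To sanity-check that the 4-bit load worked, you can ask the model for its memory footprint (get_memory_footprint is a standard Transformers model method); it should be substantially smaller than a half-precision load would need:

Python
# Report how much memory the quantized model occupies
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")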

2. Test the Model

Now, let’s test the model before starting the fine-tuning:

Python
text = "Quote: The greatest glory in living"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

# Generate a short continuation of the prompt
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The model produces a reasonable completion, with some additional tokens:

Output
Quote: The greatest glory in living lies not in never falling, but in rising every time we fall.

-Nelson Mandela

However, this is not the format we want the answer to follow. Let’s fine-tune the model to generate output in the following format:

Required Output Format
Quote: The greatest glory in living lies not in never falling, but in rising every time we fall.

Author: Nelson Mandela

3. Start the Finetuning Process

OK, let’s begin the fine-tuning process. We will use the Abirate/english_quotes dataset of English quotes for this purpose.

Python
from datasets import load_dataset

data = load_dataset("Abirate/english_quotes")
# Tokenize the "quote" field of every example
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
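Each record in this dataset carries at least a quote and an author field, which the formatting function in the next step relies on. A quick way to peek at one example:

Python
# Inspect one training example: expect "quote" and "author" keys
print(data["train"][0]["quote"])
print(data["train"][0]["author"])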

4. Use LoRA Config

Now we define the LoRA configuration and use it to set up the SFTTrainer:

Python
import transformers
from peft import LoraConfig
from trl import SFTTrainer

# LoRA configuration: rank-8 adapters on the attention and MLP projections
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Format each batch into the "Quote: ...\nAuthor: ..." shape we want to learn
def formatting_func(example):
    text = f"Quote: {example['quote'][0]}\nAuthor: {example['author'][0]}"
    return [text]

trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)
trainer.train()
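After training, you will usually want to persist the adapter weights. Since only the LoRA adapters were trained, the checkpoint is tiny compared with the base model; a minimal sketch (the output path is just an example):

Python
# Save only the small LoRA adapter weights, not the full base model
trainer.model.save_pretrained("outputs/gemma-2b-quotes-lora")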

5. Test the Finetuned Model

After training completes, we can examine the fine-tuned Gemma using a prompt similar to the one we used before:

Python
text = "Quote: The best and most beautiful things in"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Output
Quote: The best and most beautiful things in the world cannot be seen or even touched - they must be felt with the heart.

Author: Helen Keller

Why PEFT?

Training language models, even modestly sized ones, is memory- and compute-intensive. That can be prohibitive for users who rely on openly available compute platforms such as Colab or Kaggle for learning and experimentation, and the cost of adapting models to different domains is a crucial consideration for enterprise users. Parameter-efficient fine-tuning (PEFT) keeps this scalable and cost-effective by training only a small fraction of the model’s weights.
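To make that saving concrete, the peft API can report how few parameters LoRA actually trains. A short illustration using the model and lora_config from above (SFTTrainer performs this wrapping internally when you pass peft_config, so this is for inspection only):

Python
from peft import get_peft_model

# Wrap the base model with LoRA adapters and count trainable parameters;
# with LoRA, typically well under 1% of the weights require gradients
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()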

PyTorch on GPU and TPU

Hugging Face Transformers support for Gemma is optimized for both PyTorch and PyTorch/XLA, so GPU and TPU users alike can access and experiment with the models. The same FSDP-via-SPMD integration also brings PyTorch/XLA TPU acceleration to other Hugging Face models.

Next Steps

We walked through a basic example, taken from the source notebook, demonstrating the LoRA fine-tuning technique for Gemma models. The complete Colab for GPU is available here, and the complete script for TPU can be found here.

Let’s Wrap

In this post, we introduced Gemma, a new family of open language models. Gemma performs well on academic benchmarks for language understanding, safety, and reasoning, and it is available in two sizes (2 billion and 7 billion parameters) with both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 of 18 text-based tasks. Google also provides thorough assessments of the models’ safety and responsibility, along with a detailed explanation of their development. Responsible releases like this one are key to improving the safety of frontier models and enabling the next wave of LLM innovation.

Tagged: Artificial Intelligence, Chatbot, Hugging Face, Programming