Databricks released DBRX, an open, general-purpose LLM that outperforms other available open-source models on various benchmark tests. The model weights are available on Hugging Face and GitHub for both research and commercial use.
DBRX Base and DBRX Instruct
DBRX Base and DBRX Instruct are the latest open-source Mixture-of-Experts (MoE) models, with 36B active parameters out of 132B total. Both the base version (DBRX Base) and the fine-tuned version (DBRX Instruct) were trained from scratch and can be run or further fine-tuned on public, custom, or proprietary datasets. The table below summarizes how they differ, and the sketch after it shows how each variant is prompted.
Feature | DBRX Base | DBRX Instruct |
---|---|---|
Model Type | Foundation Model | Fine-tuned, instruction-following model |
Customizability | Highly customizable for specific applications | Optimized out of the box for following instructions and completing tasks |
Purpose | Blank canvas for building specialized AI apps | Designed for tasks like question answering, summarization, creative writing, and coding assistance |
Fine-Tuning | Requires fine-tuning for specific tasks | Already fine-tuned for few-turn interactions and task completion |
Usage | Flexible across a range of applications | Specifically optimized for tasks within a conversation |
Specialization | General-purpose, adaptable | Task-specific, optimized for instruction following |
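As a rough illustration of the difference, here is a minimal sketch, assuming the Hugging Face repo IDs databricks/dbrx-base and databricks/dbrx-instruct and a valid read token: the Base model takes raw text for plain completion, while the Instruct model expects chat-formatted prompts.
from transformers import AutoTokenizer
# Base model: plain next-token completion on raw text
base_tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-base", trust_remote_code=True, token="hf_YOUR_TOKEN")
base_inputs = base_tokenizer("Databricks DBRX is a Mixture-of-Experts model that", return_tensors="pt")
# Instruct model: wrap the request in a chat message and apply the chat template
instruct_tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token="hf_YOUR_TOKEN")
messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}]
instruct_inputs = instruct_tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")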
A New State-of-the-Art Open LLM
DBRX outperforms open models such as LLaMA2-70B, Mixtral, and Grok-1 in language comprehension, programming, math, and logic tasks. On the Databricks Gauntlet, an open-source benchmarking suite spanning more than 30 advanced benchmarks, DBRX stands out, reflecting ongoing improvements in model quality.
In terms of efficiency and control, DBRX is a strong choice for enterprises replacing proprietary models with open-source alternatives, as it outperforms GPT-3.5 in most tests. Databricks has observed this pattern across its 12,000+ customers, many of whom achieve higher quality and speed by customizing open-source models to their specific requirements.
Here are the key points about DBRX:
- Model Type: Utilizes a mixture-of-experts (MoE) architecture with 132B total, 36B active parameters.
- Model Base: A transformer-based, decoder-only model trained using next-token prediction.
- Training Data: Trained on 12 trillion tokens.
- Context Length: Supports a maximum context length of 32k tokens (see the configuration sketch after this list).
- Licensing: Released under a Llama-like license, with use restrictions above 700 million monthly active users and a prohibition on using its outputs to train other models.
- Fine-Tuning: Details on the fine-tuning process have not been published; the Instruct model is tuned for instruction following, and it is unclear whether reinforcement learning from human feedback (RLHF) was used.
- Estimated Cost: Estimated to cost $10-30 million, involving significant resources such as focused time, hardware, engineering efforts, and data contracts.
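To sanity-check some of these numbers yourself, you can load the model configuration without downloading the full weights. This is a minimal sketch: the repo ID databricks/dbrx-instruct comes from the quickstart below, while the attribute names (max_seq_len, ffn_config) are assumptions about the DBRX configuration and may differ.
from transformers import AutoConfig
# Load only the configuration of the gated repo (requires an approved access request and a read token)
config = AutoConfig.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token="hf_YOUR_TOKEN")
# Assumed attribute names: maximum context length and the MoE settings (number of experts, experts per token)
print(config.max_seq_len)
print(config.ffn_config)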
Comparison with Other Models
Model | MMLU | GSM8K | HumanEval |
---|---|---|---|
GPT-4 | 86.4 | 92 | 67 |
Llama2-70B | 69.8 | 54.4 | 23.7 |
Mixtral-8x7B-base | 70.6 | 74.4 | 40.2 |
Qwen1.5-72B | 77.5 | 79.5 | 41.5 |
DBRX Instruct | 73.7 | 66.9 | 70.1 |
Quickstart Guide
Note: Manual approval is required to access the Base model.
The Transformers library makes it simple to start working with DBRX models. The following packages and approximately 264GB of RAM are needed to run the model:
pip install transformers tiktoken
The hf_transfer package, as described in the Hugging Face documentation, can be used to speed up downloads:
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
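If you prefer to stay in Python, the same effect can be achieved by setting the environment variable before importing huggingface_hub and pre-downloading the weights. A minimal sketch, assuming the huggingface_hub package is installed and your access request has been approved:
import os
# Must be set before huggingface_hub is imported, since the flag is read at import time
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
# Download the gated repository into the local Hugging Face cache using your read token
snapshot_download("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")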
To download the model, you will have to request access to the repository on Hugging Face. Once your request is approved, create an access token with read permission and use it to authenticate.
Alternatively, you can use the scripts from the DBRX GitHub repository:
# Or requirements-gpu.txt to use flash attention on GPU(s)
pip install -r requirements.txt
# Add your Hugging Face token in order to access the model
huggingface-cli login
# See generate.py to change the prompt and other settings
python generate.py
The requirements files and generate.py are available in the DBRX GitHub repository.
Run the model on a CPU:
Import necessary libraries:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the tokenizer and model
# Replace "hf_YOUR_TOKEN" with your actual Hugging Face token
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct",
device_map="cpu", # Use CPU for inference
torch_dtype=torch.bfloat16, # Use BFloat16 for efficient memory usage
trust_remote_code=True,
token="hf_YOUR_TOKEN")
# Prepare the input text
input_text = "How to Fine-tune LLM?"
# Format the input as a chat message
messages = [{"role": "user", "content": input_text}]
# Apply the chat template and tokenize the input
input_ids = tokenizer.apply_chat_template(messages,
return_dict=True,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt")
# To generate up to 200 new tokens
outputs = model.generate(**input_ids, max_new_tokens=200)
# Decode the generated text & Print
print(tokenizer.decode(outputs[0]))
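The decoded output above includes the prompt and any special tokens. As a small optional refinement (not part of the original snippet), you can slice off the prompt and skip special tokens when decoding:
# Keep only the newly generated tokens and drop special tokens when decoding
new_tokens = outputs[0][input_ids["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))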
Run the model on multiple GPUs:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, token="hf_YOUR_TOKEN")
input_text = "What is Fine-tuning?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
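Because generation with a model of this size can take a while, you may want to see tokens as they are produced. A minimal sketch using transformers' built-in TextStreamer, reusing the tokenizer, model, and input_ids from the example above:
from transformers import TextStreamer
# Stream decoded tokens to stdout as they are generated, hiding the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**input_ids, max_new_tokens=200, streamer=streamer)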
If your GPU system supports FlashAttention2, you can pass attn_implementation="flash_attention_2" as a keyword argument to AutoModelForCausalLM.from_pretrained() to achieve faster inference.
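A sketch of the modified load call under that assumption (it requires the flash-attn package to be installed and a GPU that supports it):
# Same multi-GPU load as above, with FlashAttention2 enabled for faster inference
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    token="hf_YOUR_TOKEN",
    attn_implementation="flash_attention_2",
)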
Limitations and Ethical Considerations
Training Dataset Limitations
DBRX models were trained on 12T text tokens with a knowledge cutoff of December 2023.
DBRX uses a mixture of code and natural-language examples in its training dataset. Most of the training text is in English, and DBRX's proficiency in non-English languages has not been tested. It should therefore be considered a generalist model for text-based tasks in English. It does not have multimodal capabilities.
Associated Risks and Recommendations
All foundation models are cutting-edge technologies that carry a range of risks and can produce biased, offensive, incomplete, or erroneous information. Before using or sharing such output, users should exercise caution and assess whether it is accurate and suitable for their intended use case.
Databricks recommends retrieval augmented generation (RAG) for situations where precision and authenticity are crucial. It also advises anyone using or fine-tuning DBRX Base or DBRX Instruct to conduct additional safety testing within the scope of their specific application and domain.