Optimizing Small Language Models on a Free T4 GPU | by Yanli Liu | Jan, 2024


“Small” Large Language Models (LLMs) are rapidly becoming a game-changer in the field of artificial intelligence.

Unlike traditional LLMs, which require significant computational resources, these models are much smaller and more efficient. While their performance can match that of the larger ones, they can easily run on standard devices such as laptops, or even at the edge. This also means that they can be easily customized and integrated for use with your own data set.

In this article, I’ll first explain the basics and inner workings of the model fine-tuning and alignment processes. Then, I’ll guide you through the process of preference fine-tuning Phi-2, a small LLM with 2.7 billion parameters, using a novel approach called Direct Preference Optimization (DPO).
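To give a taste of what’s ahead (section 1.2 unpacks this properly), here is the DPO objective from the original paper (Rafailov et al., 2023), written in LaTeX notation:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]

where x is a prompt, y_w and y_l are the preferred and rejected responses from a preference dataset D, π_θ is the model being trained, π_ref is a frozen reference copy of it, σ is the logistic function, and β controls how far the policy may drift from the reference.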

Thanks to the model’s small size and optimization techniques such as quantization and QLoRA, we’ll be able to run this process on Google Colab using the free T4 GPU! This requires some adaptation of the settings and hyperparameters used by Hugging Face to train its Zephyr 7B model.
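Before the step-by-step guide, here is a minimal sketch of what such a QLoRA setup looks like in code, assuming the usual Hugging Face transformers, bitsandbytes, and peft stack. The hyperparameter values and target module names below are illustrative assumptions, not the article’s final settings.

```python
# A minimal QLoRA-style sketch: load the base model in 4-bit so it fits in
# the T4's ~16 GB of VRAM, then define small trainable LoRA adapters on top.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization shrinks the frozen base weights in GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # the T4 does not support bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # required by early releases of Phi-2
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# QLoRA: keep the quantized weights frozen and train low-rank adapters only.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # assumed Phi-2 attention layers
    task_type="CAUSAL_LM",
)
```

With the 2.7B base parameters frozen in 4-bit and only the low-rank adapters trained in half precision, this kind of setup is what keeps the whole run within the T4’s memory budget.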

Table of Contents:

  1. Why We Need Fine-Tuning and the Mechanics of Direct Preference Optimization (DPO)
    1.1. Why We Need to Fine-Tune an LLM
    1.2. What Is DPO, and DPO vs. RLHF
    1.3. Why Use DPO?
    1.4. How to Implement DPO?
  2. An Overview of Key Components in the DPO Process
    2.1. Hugging Face Transformer Reinforcement Learning (TRL) Library
    2.2. Preparing the Dataset
    2.3. Microsoft’s Phi-2 Model
  3. Step-by-Step Guide to Fine-Tuning Phi-2 on a T4 GPU
  4. Closing Thoughts

Why Do We Need to Fine-Tune an LLM?

Although highly capable, Large Language Models (LLMs) have their limits, especially in handling the latest or domain-specific knowledge captured in a company’s repositories. To address this, we have two options:
