Fine-Tuning LLMs on a Single Consumer Graphics Card | by Naser Tamimi | Jan, 2024



GENERATIVE AI

Lessons from fine-tuning a large language model on a single consumer GPU

Image by Author (Midjourney).

After we take into consideration Massive Language Fashions or another generative fashions, the primary {hardware} that involves thoughts is GPU. With out GPUs, many developments in Generative AI, machine studying, deep studying, and information science would’ve been unimaginable. If 15 years in the past, avid gamers have been enthusiastic concerning the newest GPU applied sciences, at the moment information scientists and machine studying engineers be a part of them and pursue the information on this subject too. Though normally avid gamers and ML customers are two totally different sorts of GPUs and graphic playing cards.

Gaming users typically buy consumer graphics cards (such as NVIDIA GeForce RTX series GPUs), whereas ML and AI developers usually follow news about data center and cloud computing GPUs (such as the V100, A100, or H100). Gaming graphics cards generally have much less GPU memory (at most 24GB as of January 2024) compared to data center GPUs (usually in the range of 40GB to 80GB). Price is another significant difference: while most consumer graphics cards cost up to $3,000, most data center graphics cards start at that price and can easily reach tens of thousands of dollars.

Since many people, including myself, have a consumer graphics card for gaming or daily use, they may be curious to see whether they can use the same card for training, fine-tuning, or inference of LLMs. In 2020, I wrote a comprehensive article about whether we can use consumer graphics cards for data science projects (link to the article). Back then, models were mostly small ML or deep learning models, and even a graphics card with 6GB of memory could handle many training projects. In this article, however, I am going to use such a graphics card for large language models with billions of parameters.
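To see why billions of parameters strain a consumer card, it helps to estimate the memory that full fine-tuning needs: the weights themselves, a gradient per parameter, and (for Adam-style optimizers) two fp32 moment tensors per parameter. The helper below is a back-of-the-envelope sketch under those assumptions only; it ignores activations, batch size, and framework overhead, and the function name and defaults are my own illustration, not from the article.

```python
def estimate_full_finetune_gb(n_params_billion: float,
                              bytes_per_param: int = 2,
                              optimizer_moments: int = 2) -> float:
    """Rough lower bound (in GiB) on GPU memory for full fine-tuning.

    Assumes fp16 weights and gradients (2 bytes each) and an
    Adam-style optimizer keeping `optimizer_moments` fp32 tensors
    (4 bytes each) per parameter. Activation memory is NOT included.
    """
    n_params = n_params_billion * 1e9
    weights = n_params * bytes_per_param      # model weights
    grads = n_params * bytes_per_param        # one gradient per weight
    moments = n_params * optimizer_moments * 4  # fp32 optimizer states
    return (weights + grads + moments) / 2**30


# A 7B-parameter model in fp16 with Adam: roughly 78 GiB before
# activations -- far beyond a 24GB consumer card, which is why
# techniques like LoRA and quantization matter.
print(f"{estimate_full_finetune_gb(7):.1f} GiB")
```

Even this optimistic lower bound shows that naive full fine-tuning of a 7B model will not fit in 24GB, which motivates the memory-saving techniques explored later.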

For this article, I used my GeForce RTX 3090 card, which has 24GB of GPU memory. For your reference, data center graphics cards such as the A100 and H100 have 40GB and 80GB of memory respectively. Also, a typical AWS EC2 p4d.24xlarge instance has 8 GPUs (A100) with a total of 320GB of GPU memory. As you can see, the difference between a simple consumer…
