GPT Model: How Does it Work? | by Dmitrii Eliuseev | Feb, 2024


Let’s look together under the hood with Python and PyTorch

Image by Hal Gatewood, Unsplash

During the past few years, the buzz around AI has been enormous, and the main trigger of all this is clearly the arrival of GPT-based large language models. Interestingly, the approach itself is not new. LSTM (long short-term memory) neural networks were created in 1997, and the famous paper “Attention Is All You Need” was published in 2017; both were cornerstones of modern natural language processing. But only in 2020 did the results of GPT-3 become good enough, not just for academic papers but also for the real world.

Nowadays, everyone can chat with GPT in a web browser, but probably less than 1% of people actually know how it works. Smart and witty answers from the model can lead people to think that they are talking to an intelligent being, but is that really so? Well, the best way to find out is to see how it works. In this article, we will take a real GPT model from OpenAI, run it locally, and see step by step what is going on under the hood.

This article is intended for beginners and people interested in programming and data science. I will illustrate my steps with Python, but a deep understanding of Python is not required.

Let’s get into it!

Loading The Model

For our test, I will be using the GPT-2 “Large” model, released by OpenAI in 2019. This model was state-of-the-art at the time, but nowadays it no longer has any commercial value, and it can be downloaded for free from HuggingFace (a minimal loading sketch is shown after the list below). What is even more important for us is that the GPT-2 model has the same architecture as the newer ones (though the number of parameters is obviously different):

  • The GPT-2 “Large” model has 0.7B parameters (GPT-3 has 175B, and GPT-4, according to internet rumors, has 1.7T parameters).
  • GPT-2 has a stack of 36 layers with 20 attention heads (GPT-3 has 96 layers, and GPT-4, according to rumors, has 120 layers).
  • GPT-2 has a 1024-token context length (GPT-3 has 2048, and GPT-4 has a 128K context length).
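
As a quick sanity check that the downloaded model really matches these numbers, it can be loaded with the HuggingFace transformers library. This is only a minimal sketch under the assumption that the standard "gpt2-large" checkpoint is used; the exact code used later in the article may differ:

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # Download (or load from the local cache) the 0.7B-parameter GPT-2 "Large" checkpoint
    model_name = "gpt2-large"
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)

    # The architecture parameters listed above are stored in the model config
    config = model.config
    print("Layers:", config.n_layer)              # 36
    print("Attention heads:", config.n_head)      # 20
    print("Context length:", config.n_positions)  # 1024
    print("Parameters:", sum(p.numel() for p in model.parameters()))  # ~0.77B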

Naturally, the GPT-3 and -4 models show better results in all benchmarks compared to GPT-2. But first, they are not available for download (and…
