LLMs and Transformers from Scratch: the Decoder | by Luís Roque

Exploring the Transformer’s Decoder Architecture: Masked Multi-Head Attention, Encoder-Decoder Attention, and a Practical Implementation

This post was co-authored with Rafael Nardi.

In this article, we delve into the decoder component of the transformer architecture, focusing on its differences from and similarities to the encoder. The decoder’s distinctive feature is its loop-like, iterative nature, which contrasts with the encoder’s linear processing. Central to the decoder are two modified forms of the attention mechanism: masked multi-head attention and encoder-decoder multi-head attention.
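To make that loop-like nature concrete, here is a minimal sketch of a greedy decoding loop in NumPy. The names `decoder_forward`, `bos_id`, and `eos_id` are placeholders of ours, not part of the article’s code; the point is only that the decoder is invoked repeatedly, each time re-reading everything it has generated so far.

```python
import numpy as np

def greedy_decode(decoder_forward, encoder_output, bos_id=1, eos_id=2, max_len=20):
    """Illustrative autoregressive loop: at each step the decoder re-reads the
    tokens generated so far and appends the most likely next token."""
    tokens = [bos_id]                                                # start-of-sequence token
    for _ in range(max_len):
        # decoder_forward is a stand-in for a full decoder stack returning
        # one row of vocabulary logits per target position.
        logits = decoder_forward(np.array(tokens), encoder_output)  # (len(tokens), vocab_size)
        next_id = int(np.argmax(logits[-1]))                        # greedy pick for the last position
        tokens.append(next_id)
        if next_id == eos_id:                                        # stop once end-of-sequence is emitted
            break
    return tokens
```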

Masked multi-head attention in the decoder enforces sequential processing of tokens, preventing each generated token from being influenced by subsequent tokens. This masking is essential for maintaining the order and coherence of the generated output. Encoder-decoder attention then relates the decoder’s output (from the masked attention) to the encoder’s output; this step injects the input context into the decoder’s process.
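The single-head sketch below illustrates both mechanisms. It is a simplified version of ours, not the article’s full implementation (which lives in the linked GitHub repo): it omits the learned projection matrices and the splitting into multiple heads, keeping only the causal mask and the two sources of keys and values.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # stabilise before exponentiating
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention; with causal=True each position can only
    attend to itself and earlier positions (masked self-attention)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (len_q, len_k) similarity scores
    if causal:
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)  # hide future positions from each query
    return softmax(scores) @ V

# Masked self-attention: Q, K, V all come from the decoder's own (shifted) input.
dec = np.random.randn(4, 8)                    # 4 target tokens, model dimension 8
self_out = attention(dec, dec, dec, causal=True)

# Encoder-decoder attention: Q from the decoder, K and V from the encoder output.
enc = np.random.randn(6, 8)                    # 6 source tokens
cross_out = attention(self_out, enc, enc, causal=False)
```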

We will also demonstrate how these ideas are implemented using Python and NumPy. We have created a simple example of translating a sentence from English to Portuguese. This practical approach will help illustrate the inner workings of the decoder in a transformer model and provide a clearer understanding of its role in Large Language Models (LLMs).

Figure 1: We decoded the LLM decoder (image by the author using DALL-E)

As always, the code is available on our GitHub.

After describing the inner workings of the encoder in the transformer architecture in our previous article, we now turn to the next piece, the decoder. When comparing the two halves of the transformer, we believe it is instructive to emphasize their main similarities and differences. The attention mechanism is the core of both. In the decoder, it appears in two places, and both carry important modifications compared to the simpler version present in the encoder: masked multi-head attention and encoder-decoder multi-head attention. Speaking of differences, we point out the…
