Discover the architecture of iTransformer and apply the model in a small experiment using Python.
The field of forecasting has seen a lot of activity in the realm of foundation models, with models like Lag-LLaMA, Time-LLM, Chronos and Moirai being proposed since the beginning of 2024.
However, their performance has been a bit underwhelming (for reproducible benchmarks, see here), and I believe that data-specific models are still the optimal solution at the moment.
To that end, the Transformer architecture has been applied in many forms for time series forecasting, with PatchTST achieving state-of-the-art performance for long-horizon forecasting.
Challenging PatchTST comes the iTransformer model, proposed in March 2024 in the paper iTransformer: Inverted Transformers Are Effective for Time Series Forecasting.
In this article, we discover the strikingly simple concept behind iTransformer and explore its architecture. Then, we apply the model in a small experiment and compare its performance to TSMixer, N-HiTS and PatchTST.
For more details, make sure to read the original paper.
Let’s get started!
The idea behind iTransformer comes from the realization that the vanilla Transformer model uses temporal tokens.
This means that each token contains all features observed at a single time step. As a result, it is challenging for the model to learn temporal dependencies when it looks at only one time step at a time.
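To make this concrete, here is a minimal sketch in PyTorch of temporal tokenization. The tensor sizes and layer names are made up for illustration; this is not the exact implementation of any of the models mentioned above.

```python
import torch
import torch.nn as nn

batch, seq_len, n_features, d_model = 32, 96, 7, 512

# A toy multivariate series: 96 time steps, 7 features per time step
series = torch.randn(batch, seq_len, n_features)

# Temporal tokens: all 7 features observed at one time step form a single token
embed = nn.Linear(n_features, d_model)
temporal_tokens = embed(series)  # (32, 96, 512): 96 tokens, one per time step
```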
A solution to that problem is patching, which was proposed with the PatchTST model. With patching, we simply group time points together before tokenizing and embedding them, as shown below.
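As a rough illustration, patching can be expressed as grouping consecutive time points before embedding. The sketch below uses non-overlapping patches and placeholder sizes for simplicity; the actual PatchTST implementation uses a sliding window with a stride and other details omitted here.

```python
import torch
import torch.nn as nn

batch, seq_len, d_model = 32, 96, 512
patch_len = 16
n_patches = seq_len // patch_len  # 96 // 16 = 6 patches

# A single univariate series (PatchTST processes each variate independently)
univariate = torch.randn(batch, seq_len)

# Group 16 consecutive time points into each patch
patches = univariate.reshape(batch, n_patches, patch_len)  # (32, 6, 16)

# Each patch becomes one token
patch_embed = nn.Linear(patch_len, d_model)
patch_tokens = patch_embed(patches)  # (32, 6, 512): 6 tokens, one per patch
```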
In iTransformer, we push patching to the extreme by simply applying the model on the inverted dimensions.
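In other words, each variate’s entire lookback window becomes a single token. Below is a minimal sketch of that inverted embedding, again with placeholder sizes rather than the authors’ code.

```python
import torch
import torch.nn as nn

batch, seq_len, n_features, d_model = 32, 96, 7, 512

# The same kind of toy multivariate series as before
series = torch.randn(batch, seq_len, n_features)

# Invert the dimensions: swap the time and feature axes
inverted = series.permute(0, 2, 1)  # (32, 7, 96)

# Each variate's full lookback window of 96 points becomes one token
variate_embed = nn.Linear(seq_len, d_model)
variate_tokens = variate_embed(inverted)  # (32, 7, 512): 7 tokens, one per variate
```

With this inversion, attention acts across the variates rather than across time, which is the key change the iTransformer paper builds on.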