Where the assumptions behind the two-tower model architecture break, and how to go beyond them
Two-tower models are among the most common architectural design choices in modern recommender systems. The key idea is to have one tower that learns relevance, and a second, shallow tower that learns observational biases such as position bias.
In this post, we’ll take a closer look at two assumptions behind two-tower models, namely:
- the factorization assumption, i.e. the hypothesis that we can simply multiply the probabilities computed by the two towers (or add their logits), and
- the positional independence assumption, i.e. the hypothesis that the only variable that determines position bias is the position of the item itself, and not the context in which it is shown.
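To make the two assumptions concrete, here is a minimal Python sketch. It is an illustration only, not the implementation of any particular system: the function names and the per-position bias values are hypothetical. It shows both ways of combining the tower outputs mentioned above (multiplying probabilities, or adding logits before the sigmoid), which are related but not numerically identical, and a bias term that depends on the position alone.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical bias-tower outputs (logits), one per display position.
# Positional independence: the bias is looked up from the position alone,
# with no dependence on the user or on neighboring items.
POSITION_BIAS_LOGITS = {1: 2.0, 2: 1.0, 3: 0.0}


def bias_logit(position: int) -> float:
    return POSITION_BIAS_LOGITS[position]


def click_prob_multiplicative(relevance_logit: float, position: int) -> float:
    """Factorization, variant 1: multiply the two towers' probabilities."""
    return sigmoid(relevance_logit) * sigmoid(bias_logit(position))


def click_prob_additive(relevance_logit: float, position: int) -> float:
    """Factorization, variant 2: add the two towers' logits, then squash."""
    return sigmoid(relevance_logit + bias_logit(position))


if __name__ == "__main__":
    # Same item (same relevance logit) shown at different positions.
    for pos in (1, 2, 3):
        print(pos,
              round(click_prob_multiplicative(1.0, pos), 3),
              round(click_prob_additive(1.0, pos), 3))
```

In both variants the same item receives a lower predicted click probability as it moves down the list, and that drop is the same for every user and every surrounding context. That is exactly what the two assumptions encode.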
We’ll see where both of these assumptions break, and how to go beyond these limitations with newer algorithms such as the MixEM model, the Dot Product model, and XPA.
Let’s start with a very brief reminder.
Two-tower models: the story so far
The primary learning objective for the ranking models in recommender systems is relevance: we want the model to predict the best possible piece of content given the context. Here, context simply means everything that we’ve learned about the user, for example from their previous engagement or search histories, depending on the application.
However, ranking models usually exhibit certain observation biases, that is, the tendency for users to engage more or less with an impression depending on how it was presented to them. The most prominent observation bias is position bias: the tendency of users to engage more with items that are shown first.
The key idea in two-tower models is to train two “towers”, that is, neural networks, in parallel, the first tower for learning relevance, and…