SentenceTransformer: A Model For Computing Sentence Embeddings | by Mina Ghashami | Jan, 2024


Convert BERT into an efficient sentence transformer

In this post, we look at SentenceTransformer [1], which was published in 2019. SentenceTransformer has a bi-encoder architecture and adapts BERT to produce efficient sentence embeddings.
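To see the bi-encoder in practice, here is a minimal sketch using the sentence-transformers library; the model name "all-MiniLM-L6-v2" is an illustrative choice, not one prescribed by the paper.

```python
# Minimal sketch: encoding sentences with the sentence-transformers library.
# The model name below is an illustrative choice.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["The cat sits on the mat.", "A feline rests on a rug."]

# Each sentence is encoded independently into a fixed-size vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, embedding_dim)
```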

BERT (Bidirectional Encoder Representations from Transformers) is built on the idea that all NLP tasks rely on the meaning of tokens/words. BERT is trained in two phases: 1) a pre-training phase, where BERT learns the general meaning of the language, and 2) a fine-tuning phase, where BERT is trained on specific tasks.

Image taken from [3]

BERT is very good at learning the meaning of words/tokens. But it is not good at learning the meaning of sentences. As a result, it performs poorly on certain tasks such as sentence classification and pairwise sentence similarity.

Since BERT produces token embeddings, one way to get a sentence embedding out of BERT is to average the embeddings of all tokens. The SentenceTransformer paper [1] showed that this produces very low-quality sentence embeddings, almost as bad as using GloVe embeddings. These embeddings do not capture the meaning of sentences.
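To make the averaging idea concrete, here is a sketch of mean pooling over BERT's token embeddings using Hugging Face transformers; masking out padding tokens is my own assumption, not something spelled out above.

```python
# Sketch: averaging BERT token embeddings to form a sentence embedding.
# Padding tokens are excluded from the mean via the attention mask.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(["The cat sits on the mat."], return_tensors="pt", padding=True)
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)

# Mean over real tokens only, ignoring padding positions.
mask = inputs["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # (1, 768)
```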

Image by author

In order to create sentence embeddings from BERT that are meaningful, SentenceTransformer trains BERT on a few sentence-related tasks, such as:

  1. NLI (natural language inference): This task receives two input sentences and outputs "entailment", "contradiction", or "neutral". In the case of "entailment", sentence 1 entails sentence 2. In the case of "contradiction", sentence 1 contradicts sentence 2. And in the third case, "neutral", the two sentences have no relation to each other.
  2. STS (semantic textual similarity): This task receives two sentences and decides how similar they are. Typically, similarity is calculated using the cosine similarity function (see the sketch after this list).
  3. Triplet dataset
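As a concrete example of the STS-style scoring mentioned above, the sketch below computes cosine similarity between two sentence embeddings; the util.cos_sim helper and the model name are illustrative choices, not mandated by the paper.

```python
# Sketch: cosine similarity between two sentence embeddings (STS-style scoring).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
emb1 = model.encode("A man is playing a guitar.", convert_to_tensor=True)
emb2 = model.encode("Someone is strumming an instrument.", convert_to_tensor=True)

score = util.cos_sim(emb1, emb2)  # value in [-1, 1]; higher means more similar
print(float(score))
```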

SentenceTransformer trains BERT on the NLI task using a Siamese network. Siamese means twins, and it consists of…
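The SBERT paper [1] describes a classification objective for NLI: the two sentence embeddings u and v, together with their element-wise difference |u − v|, are concatenated and passed through a softmax classifier. Below is a rough PyTorch sketch of that head; the module and variable names are my own, not the paper's.

```python
# Sketch of SBERT's NLI classification objective: softmax over (u, v, |u - v|).
# Module and dimension names are illustrative, not from the paper's code.
import torch
import torch.nn as nn

class SoftmaxHead(nn.Module):
    def __init__(self, embedding_dim: int = 768, num_labels: int = 3):
        super().__init__()
        # A single weight matrix over the concatenated features.
        self.classifier = nn.Linear(3 * embedding_dim, num_labels)

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(features)  # logits: entailment/contradiction/neutral

# Both sentences are encoded by the *same* BERT (the Siamese part); here the
# embeddings are faked with random tensors for brevity.
u, v = torch.randn(4, 768), torch.randn(4, 768)
logits = SoftmaxHead()(u, v)
print(logits.shape)  # (4, 3)
```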
