Multiple Choice Question Answering in HuggingFace | by Mina Ghashami | Feb, 2024


Unveiling the power of question answering

Image from unsplash.com

Natural language processing methods are demonstrating immense capability on question answering (QA) tasks. In this post, we leverage the HuggingFace library to tackle a multiple choice question answering challenge.

Specifically, we fine-tune a pre-trained BERT model on a multiple choice question dataset using the Trainer API. This allows us to adapt the powerful bidirectional representations from pre-trained BERT to our target task. By adding a classification head, the model learns textual patterns that help determine the correct choice out of a set of answer options per question. We then evaluate performance using accuracy on the held-out test set.
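The core idea behind the multiple choice head can be sketched without any model at all: each (question, choice) pair is encoded separately, a single shared linear head scores every pair, and the scores are regrouped per question before a softmax picks the most likely choice. The sketch below uses random NumPy arrays in place of BERT encodings; all names and shapes are illustrative assumptions, not the actual Transformers implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, num_choices, hidden = 2, 4, 8

# Stand-in for the pooled [CLS] encoding of every (question, choice) pair.
# A real model would produce these with BERT; here they are random.
pooled = rng.normal(size=(batch * num_choices, hidden))

# A single shared classification head maps each encoding to one score.
w = rng.normal(size=(hidden, 1))

# Regroup the flat scores so each row holds one question's choices,
# then normalize with a softmax and pick the highest-scoring choice.
scores = (pooled @ w).reshape(batch, num_choices)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
predictions = probs.argmax(axis=1)

print(predictions.shape)  # (2,) — one predicted choice index per question
```

This reshape-then-softmax step is why multiple choice models only need a head with a single output unit, regardless of how many answer options each question has.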

The Transformers framework makes it quick to experiment with different model architectures, tokenizer options, and training approaches. In this analysis, we demonstrate a step-by-step recipe for achieving competitive performance on multiple choice QA with HuggingFace Transformers.

The first step is to install and import the libraries. To install them, use the pip install command as follows:

!pip install datasets transformers[torch] --quiet

and then import the required libraries:

import numpy as np
import pandas as pd
import os
import json
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from transformers.modeling_outputs import SequenceClassifierOutput
from transformers import (
AutoTokenizer,
Trainer,
TrainingArguments,
set_seed,
DataCollatorWithPadding,
DefaultDataCollator
)
from datasets import load_dataset, load_metric
from dataclasses import dataclass, field
from typing import Optional, Union

In the second step, we load the train and test datasets. We use the CODAH dataset, which is available for commercial use and is licensed under "odc-by" [1].

from datasets import load_dataset

codah = load_dataset("codah", "codah")
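Before tokenization, each multiple choice example must be expanded into one (prompt, candidate) text pair per answer option, so that BERT can score every candidate independently. The sketch below shows that expansion on a hand-written mock example; the field names ("question_propmt", "candidate_answers", "correct_answer_idx") are assumptions about the CODAH schema, and the text itself is invented for illustration, not taken from the dataset.

```python
# Mock example mimicking the assumed CODAH record structure.
example = {
    "question_propmt": "A man on his first date",
    "candidate_answers": [
        "orders a salad.",
        "flies to the moon.",
        "files his taxes.",
        "paints the table.",
    ],
    "correct_answer_idx": 0,
}

# Repeat the prompt once per candidate so each answer option
# becomes its own (prompt, candidate) pair for the tokenizer.
first_sentences = [example["question_propmt"]] * len(example["candidate_answers"])
second_sentences = example["candidate_answers"]

pairs = list(zip(first_sentences, second_sentences))
print(len(pairs))  # 4 — one pair per answer option
```

In a real preprocessing function this expansion would be applied over a whole batch (e.g. via `datasets.Dataset.map` with `batched=True`) before passing the pairs to the tokenizer.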
