ixambert-base-cased fine-tuned for question answering

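This is a basic implementation of the multilingual model ixambert-base-cased, fine-tuned on SQuAD v1.1 and an experimental Basque version of SQuAD1.1 (one third the size of the original SQuAD1.1). It can answer basic factual questions in English, Spanish, and Basque.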

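Overview

Language model: ixambert-base-cased
Languages: English, Spanish, and Basque
Downstream task: extractive question answering (extractive QA)
Training data: SQuAD v1.1 + experimental Basque version of SQuAD1.1
Evaluation data: SQuAD v1.1 + experimental Basque version of SQuAD1.1
Infrastructure: 1x GeForce RTX 2080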

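Output

The model outputs the answer to the question, the start and end character positions of that answer in the original context, and a score estimating the probability that this text span is the correct answer. For example: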

{'score': 0.9667195081710815, 'start': 101, 'end': 105, 'answer': '1820'}
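To make the meaning of `start` and `end` concrete, here is a minimal sketch that slices the answer back out of the context. It reuses the context string from this card's usage example and the example prediction above; no model is loaded, and the variable name `span` is only illustrative.

```python
# Context string taken from this card's usage example.
context = (
    "Florence Nightingale, known for being the founder of modern nursing, "
    "was born in Florence, Italy, in 1820"
)

# Example prediction copied from the output shown above.
pred = {'score': 0.9667195081710815, 'start': 101, 'end': 105, 'answer': '1820'}

# 'start' is inclusive and 'end' is exclusive, so a plain Python slice
# over the context recovers the answer text.
span = context[pred['start']:pred['end']]
print(span)  # prints 1820
```

Because the offsets index into the original context, they can be used to highlight the answer in the source text rather than relying on the `answer` string alone.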

Usage

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "MarcBrun/ixambert-finetuned-squad-eu-en"

# To get predictions
context = "Florence Nightingale, known for being the founder of modern nursing, was born in Florence, Italy, in 1820"
question = "When was Florence Nightingale born?"
qa = pipeline("question-answering", model=model_name, tokenizer=model_name)
pred = qa(question=question, context=context)

# To load the model and tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Hyperparameters

batch_size = 8
n_epochs = 3
learning_rate = 2e-5
optimizer = AdamW
lr_schedule = linear
max_seq_len = 384
doc_stride = 128
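
The `max_seq_len` and `doc_stride` values control how contexts longer than one input sequence are split into overlapping windows. The sketch below illustrates the idea in plain Python. It assumes the convention of the original BERT SQuAD script, where `doc_stride` is the step between consecutive window starts. Some tokenizer APIs instead treat the stride as the number of overlapping tokens, and this card does not say which convention was used. The function name `split_into_windows` and the three reserved special tokens are assumptions for illustration, not the author's training code.

```python
def split_into_windows(n_context_tokens, n_question_tokens,
                       max_seq_len=384, doc_stride=128):
    """Return (start, end) token ranges covering the context.

    Assumes an input layout of [CLS] question [SEP] context [SEP],
    i.e. three special tokens, and that doc_stride is the step between
    window starts (an assumption, see the note above).
    """
    max_context = max_seq_len - n_question_tokens - 3
    if max_context <= 0:
        raise ValueError("question leaves no room for context tokens")

    windows = []
    start = 0
    while start < n_context_tokens:
        length = min(max_context, n_context_tokens - start)
        windows.append((start, start + length))
        if start + length == n_context_tokens:
            break  # the last window reached the end of the context
        start += min(length, doc_stride)
    return windows


# A short context fits in a single window.
short = split_into_windows(n_context_tokens=100, n_question_tokens=10)

# A long context is covered by several overlapping windows.
long_windows = split_into_windows(n_context_tokens=1000, n_question_tokens=10)
print(short)
print(len(long_windows), long_windows[-1])
```

Under these assumptions each window holds at most 371 context tokens when the question has 10 tokens, and consecutive windows overlap, so an answer near a window boundary is still seen whole in at least one window.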

Author:

Marc Brun

Dataset size:

677.69 MB