
DistilBERT base uncased distilled SQuAD

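Table of Contents

  • Model Details
  • How to Get Started With the Model
  • Uses
  • Risks, Limitations and Biases
  • Training
  • Evaluation
  • Environmental Impact
  • Technical Specifications
  • Citation Information
  • Model Card Authors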

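Model Details

Model Description: The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and in the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. DistilBERT is a small, fast, cheap and light Transformer model trained with knowledge distillation. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.

This model is a fine-tuned checkpoint of DistilBERT-base-uncased, fine-tuned on SQuAD v1.1 using (a second step of) knowledge distillation.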

  • Developed by: Hugging Face
  • Model type: Transformer-based language model
  • Language(s): English
  • License: Apache 2.0
  • Related models: DistilBERT-base-uncased
  • Resources for more information:

How to Get Started With the Model

Use the code below to get started with the model.

>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')

>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """

>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'SQuAD dataset', score: 0.4704, start: 147, end: 160
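
The start and end values in the result are character offsets into context, so slicing the context with them should give back the answer string. The check below is a small sketch of that relationship. It hard-codes the result printed above instead of calling the pipeline, so it runs without downloading the model; treating end as exclusive is an assumption taken from the numbers shown.

```python
# Context from the pipeline example (single spaces between words).
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

# Result copied from the printed output above (not recomputed here).
result = {"answer": "SQuAD dataset", "score": 0.4704, "start": 147, "end": 160}

# Assumption: start is inclusive and end is exclusive, as in Python slicing.
answer_from_offsets = context[result["start"]:result["end"]]
print(answer_from_offsets == result["answer"])
```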

Here is how to use this model in PyTorch:

from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

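# Pick the most likely start and end token positions independently.
answer_start_index = torch.argmax(outputs.start_logits)
answer_end_index = torch.argmax(outputs.end_logits)

# Slice the predicted answer tokens out of the input and decode them to text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))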
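
Taking the argmax of the start and end logits independently can yield an end index that comes before the start index, which decodes to an empty string. One common workaround is to search for the best-scoring pair with start <= end. The following is a minimal sketch of that idea in plain Python; the function name best_answer_span and the max_answer_len default are illustrative choices, not part of the transformers API, and the sketch does not exclude question or special tokens.

```python
from typing import Sequence, Tuple


def best_answer_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 15,  # illustrative default, chosen for this sketch
) -> Tuple[int, int]:
    """Return the (start, end) token indices that maximize
    start_logits[start] + end_logits[end], subject to
    start <= end < start + max_answer_len."""
    if len(start_logits) != len(end_logits) or len(start_logits) == 0:
        raise ValueError("start_logits and end_logits must be non-empty and equal in length")

    best_score = float("-inf")
    best_span = (0, 0)
    for start, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start position.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best_span = (start, end)
    return best_span


# Toy logits where independent argmax would give start=2, end=0 (invalid).
toy_start = [0.1, 0.2, 5.0, 0.3]
toy_end = [4.0, 0.1, 0.2, 3.0]
print(best_answer_span(toy_start, toy_end))
```

With the PyTorch outputs above, the function could be called as best_answer_span(outputs.start_logits[0].tolist(), outputs.end_logits[0].tolist()), and the returned indices used to slice inputs.input_ids in the same way.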

And in TensorFlow:

from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

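# Pick the most likely start and end token positions independently.
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

# Slice the predicted answer tokens out of the input and decode them to text.
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))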

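Uses

This model can be used for question answering.

Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes, identity characteristics, and sensitive social and occupational groups. For example: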

>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-uncased-distilled-squad')

>>> context = r"""
... Alice is sitting on the bench. Bob is sitting next to her.
... """

>>> result = question_answerer(question="Who is the CEO?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'Bob', score: 0.4183, start: 32, end: 35

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

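Training

Training Data

The distilbert-base-uncased model card describes its training data as follows:

DistilBERT was pretrained on the same data as BERT: a dataset of 11,038 unpublished books together with English Wikipedia (excluding lists, tables and headers).

To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card.

Training Procedure

Preprocessing

See the distilbert-base-uncased model card for further details.

Pretraining

See the distilbert-base-uncased model card for further details.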

Evaluation

As discussed in the model repository, this model reaches an F1 score of 86.9 on the SQuAD v1.1 dev set (for comparison, the BERT bert-base-uncased version reaches an F1 score of 88.5).
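
For readers unfamiliar with the metric, the F1 score on SQuAD is a token-overlap measure between the predicted and reference answers, computed after light text normalization and averaged over questions. The sketch below follows the general approach of the SQuAD v1.1 evaluation script as we understand it; the function names are our own, and it is not the code that produced the 86.9 figure.

```python
import re
import string
from collections import Counter
from typing import Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall for one answer pair."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, references: Sequence[str]) -> float:
    """Score a prediction against several references and keep the best match."""
    return max(token_f1(prediction, ref) for ref in references)


print(token_f1("the SQuAD dataset", "SQuAD dataset"))
print(best_f1("SQuAD", ["SQuAD dataset", "the Stanford dataset"]))
```

A dataset-level score would then be the mean of best_f1 over all questions, multiplied by 100.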

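Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We report the hardware type and hours used based on the associated paper. Note that these details cover only the training of DistilBERT and do not include the fine-tuning on SQuAD.

  • Hardware type: 8 16GB V100 GPUs
  • Hours used: 90 hours
  • Cloud provider: Unknown
  • Compute region: Unknown
  • Carbon emitted: Unknown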
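
Because the cloud provider and compute region are unknown, no emissions figure can be given here. The sketch below only illustrates the usual energy-times-carbon-intensity arithmetic behind such estimates; it is not the calculator's exact method. The function name, the pue parameter, and the power and grid-intensity values in the usage lines are hypothetical placeholders, not measurements for this model. Only the GPU count and hours come from the list above.

```python
def estimate_emissions_kg(
    num_gpus: int,
    hours: float,
    gpu_power_kw: float,
    carbon_intensity_kg_per_kwh: float,
    pue: float = 1.0,
) -> float:
    """Rough CO2-equivalent estimate in kilograms.

    energy (kWh) = GPUs x hours x average power per GPU (kW) x PUE
    emissions    = energy x grid carbon intensity (kg CO2eq per kWh)
    """
    energy_kwh = num_gpus * hours * gpu_power_kw * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# From the list above: 8 GPUs for 90 hours.
gpu_hours = 8 * 90
print(gpu_hours)  # 720

# Hypothetical inputs for illustration only: the real per-GPU power draw
# and the grid carbon intensity for this training run are unknown.
example_kg = estimate_emissions_kg(
    num_gpus=8,
    hours=90,
    gpu_power_kw=0.3,
    carbon_intensity_kg_per_kwh=0.4,
)
```

The result depends heavily on the region's grid mix, which is why the missing provider and region details leave the emissions value undetermined.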

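Technical Specifications

See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training details.

Citation Information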

@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}

APA:

  • Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

Model Card Authors

This model card was written by the Hugging Face team.