Model:

distilbert-base-cased-distilled-squad

Task:

Question Answering

Libraries:

PyTorch TensorFlow Rust Safetensors OpenVINO Transformers

Datasets:

squad

Language:

English

Other:

distilbert Eval Results AutoTrain Compatible

Preprints:

arxiv:1910.01108 arxiv:1910.09700

License:

apache-2.0

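DistilBERT base cased distilled SQuAD

Model Details

Model Description: The DistilBERT model was proposed in the blog post Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT, and in the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.

This model is a fine-tuned checkpoint of DistilBERT-base-cased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1.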
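To make the idea of knowledge distillation more concrete, the following is a minimal sketch of a soft-target loss: the student is penalized by the KL divergence between the teacher's and the student's temperature-softened output distributions. This is a generic illustration, not the authors' training code; the temperature value and the example logits are assumptions chosen only for demonstration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)


# Hypothetical logits, for illustration only.
teacher = [4.0, 1.0, 0.5]
aligned_student = [3.9, 1.1, 0.4]
mismatched_student = [0.5, 1.0, 4.0]

print(distillation_loss(aligned_student, teacher))     # small loss
print(distillation_loss(mismatched_student, teacher))  # larger loss
```

A student whose outputs track the teacher's receives a loss near zero, and the loss grows as the two distributions diverge.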

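Developed by: Hugging Face
Model Type: Transformer-based language model
Language(s): English
License: Apache 2.0
Related Models: DistilBERT-base-cased
Resources for more information:
- See this repository for more about Distil* (a class of compressed models that includes this model).
- See Sanh et al. (2019) for more information about knowledge distillation and the training procedure.

How to Get Started with the Model

Use the code below to get started with the model.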

>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
... """

>>> result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
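The `start` and `end` values in the output are character offsets into the context string, so slicing the context with them recovers the answer text. The short sketch below checks this without loading the model; the `result` dictionary is copied from the output shown above rather than recomputed.

```python
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

# Values copied from the pipeline output shown above (not recomputed here).
result = {"answer": "SQuAD dataset", "score": 0.5152, "start": 147, "end": 160}

# The offsets index characters of the context, with an exclusive end.
span = context[result["start"]:result["end"]]
print(repr(span))
```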

Here is how to use this model in PyTorch:

from transformers import DistilBertTokenizer, DistilBertModel
import torch

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')
# DistilBertModel loads only the encoder, without the question-answering head
model = DistilBertModel.from_pretrained('distilbert-base-cased-distilled-squad')

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The output holds the encoder's hidden states, not answer start/end logits
print(outputs)
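To obtain an answer span in PyTorch, the checkpoint can instead be loaded with its question-answering head. The sketch below wraps this in a helper; the function name `answer_question` is hypothetical and introduced only for illustration, and the libraries are imported inside the function so that defining it does not require them to be installed.

```python
def answer_question(question, context, model_name="distilbert-base-cased-distilled-squad"):
    """Return the answer span predicted by the checkpoint's question-answering head."""
    # Imported lazily: only needed when the function is actually called.
    import torch
    from transformers import DistilBertForQuestionAnswering, DistilBertTokenizer

    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    model = DistilBertForQuestionAnswering.from_pretrained(model_name)

    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Most likely start and end token positions, chosen independently.
    start = int(outputs.start_logits.argmax(dim=-1)[0])
    end = int(outputs.end_logits.argmax(dim=-1)[0])

    # Decode the tokens between the two positions (end inclusive).
    answer_tokens = inputs.input_ids[0, start : end + 1]
    return tokenizer.decode(answer_tokens, skip_special_tokens=True)
```

Calling `answer_question("Who was Jim Henson?", "Jim Henson was a nice puppet")` downloads the checkpoint on first use and returns the decoded span as a string.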

And in TensorFlow:

from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering
import tensorflow as tf

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

inputs = tokenizer(question, text, return_tensors="tf")
outputs = model(**inputs)

# Pick the most likely start and end token positions
answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])

# Decode the tokens between the two positions (end inclusive)
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
print(tokenizer.decode(predict_answer_tokens))

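Uses

This model can be used for question answering.

Misuse and Out-of-scope Use

The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be a factual or true representation of people or events, so using it to generate such content is out of scope for its abilities.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example: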

>>> from transformers import pipeline
>>> question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

>>> context = r"""
... Alice is sitting on the bench. Bob is sitting next to her.
... """

>>> result = question_answerer(question="Who is the CEO?", context=context)
>>> print(
... f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}"
... )

Answer: 'Bob', score: 0.7527, start: 32, end: 35

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

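Training

Training Data

The distilbert-base-cased model was trained using the same data as the distilbert-base-uncased model. The distilbert-base-uncased model card describes its training data as follows:

DistilBERT was pretrained on the same data as BERT, namely BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (excluding lists, tables and headers).

To learn more about the SQuAD v1.1 dataset, see the SQuAD v1.1 data card.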

Training Procedure

Preprocessing

See the distilbert-base-cased model card for further details.

Pretraining

See the distilbert-base-cased model card for further details.

Evaluation

As discussed in the model repository:

This model reaches an F1 score of 87.1 on the SQuAD v1.1 dev set (for comparison, the BERT bert-base-cased version reaches an F1 score of 88.7).
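For readers unfamiliar with the metric, the F1 score used for SQuAD v1.1 measures token overlap between a predicted answer and a reference answer after light normalization. The following is a minimal sketch that follows the logic of the SQuAD evaluation procedure; it is not the official script, and the function names are chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts tokens shared by both answers.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(f1_score("SQuAD dataset", "the SQuAD dataset"))  # identical after normalization
print(f1_score("the dataset", "SQuAD dataset"))        # partial overlap
```

A benchmark score is then obtained by taking, for each question, the best F1 over its reference answers, averaging over all questions, and expressing the result as a percentage.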

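Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). We present the hardware type and hours used based on the associated paper. Note that these details cover only the training of DistilBERT and do not include the fine-tuning on SQuAD.

Hardware Type: 8 16GB V100 GPUs
Hours used: 90 hours
Cloud Provider: Unknown
Compute Region: Unknown
Carbon Emitted: Unknown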
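The kind of estimate such a calculator produces can be sketched as energy used multiplied by the carbon intensity of the electricity grid. In the sketch below only the GPU count and the hours come from the figures above. The per-GPU power draw (300 W) and the carbon intensity (0.4 kg CO2eq per kWh) are placeholder assumptions, since the cloud provider and region are unknown, so the resulting emissions figure is purely illustrative and not a reported value. The estimate also ignores CPUs, cooling and other datacenter overhead.

```python
def estimate_emissions(num_gpus, hours, gpu_power_watts, carbon_intensity_kg_per_kwh):
    """Rough estimate: GPU-hours -> energy (kWh) -> emissions (kg CO2eq)."""
    gpu_hours = num_gpus * hours
    energy_kwh = gpu_hours * gpu_power_watts / 1000
    co2_kg = energy_kwh * carbon_intensity_kg_per_kwh
    return gpu_hours, energy_kwh, co2_kg


# 8 GPUs and 90 hours are from the model card; the other two values are ASSUMED.
gpu_hours, energy_kwh, co2_kg = estimate_emissions(
    num_gpus=8,
    hours=90,
    gpu_power_watts=300,              # assumption: nominal V100 power draw
    carbon_intensity_kg_per_kwh=0.4,  # assumption: placeholder grid intensity
)

print(f"GPU-hours: {gpu_hours}")
print(f"Energy under the assumed power draw: {energy_kwh} kWh")
print(f"Illustrative emissions under the assumed intensity: {co2_kg:.1f} kg CO2eq")
```

Substituting the actual power draw and regional carbon intensity, where known, gives a more meaningful figure.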

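Technical Specifications

See the associated paper for details on the modeling architecture, objective, compute infrastructure, and training details.

Citation Information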

@inproceedings{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  booktitle={NeurIPS EMC^2 Workshop},
  year={2019}
}

APA:

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

Model Card Authors

This model card was written by the Hugging Face team.

Dataset size:

1.66 GB