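Datasets:

squad_v2

Tasks:

Question answering

Sub-tasks:

open-domain-qa extractive-qa

Languages:

Multilinguality:

monolingual

Size categories:

100K<n<1M

Language creators:

crowdsourced

Annotations creators:

crowdsourced

Source datasets:

original

ArXiv:

arxiv:1606.05250

License:

cc-by-sa-4.0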

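Dataset Card for "squad_v2"

Dataset Summary

SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with 50,000 unanswerable questions written by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, a system must not only answer questions when possible, but also determine when the paragraph supports no answer and abstain from answering.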
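As a rough illustration of the abstention requirement, the sketch below separates answerable from unanswerable examples and applies a simple score threshold before answering. It assumes that unanswerable questions are stored with empty `answers` lists. The names `is_answerable`, `predict_or_abstain`, `no_answer_score`, and `no_answer_threshold`, as well as the scores, are hypothetical and exist only for this example; they are not part of the dataset or of any library.

```python
from typing import Any, Dict, List, Tuple


def is_answerable(example: Dict[str, Any]) -> bool:
    """Return True when the example carries at least one gold answer.

    Assumption: unanswerable questions have an empty answers["text"] list.
    """
    return len(example["answers"]["text"]) > 0


def predict_or_abstain(
    candidates: List[Tuple[str, float]],
    no_answer_score: float,
    no_answer_threshold: float = 0.0,
) -> str:
    """Pick the best-scoring span, or return "" to abstain.

    Hypothetical decision rule: abstain when the no-answer score beats the
    best span score by more than no_answer_threshold.
    """
    if not candidates:
        return ""
    best_text, best_score = max(candidates, key=lambda pair: pair[1])
    if no_answer_score - best_score > no_answer_threshold:
        return ""
    return best_text


answerable = {"answers": {"text": ["10th and 11th centuries"], "answer_start": [94]}}
unanswerable = {"answers": {"text": [], "answer_start": []}}  # assumed layout

print(is_answerable(answerable), is_answerable(unanswerable))
# Made-up scores: the first call answers, the second abstains.
print(repr(predict_or_abstain([("10th and 11th centuries", 4.2)], no_answer_score=1.0)))
print(repr(predict_or_abstain([("10th and 11th centuries", 0.3)], no_answer_score=2.5)))
```

Returning an empty string for abstention is a design choice of this sketch; a real system would typically tune the threshold on the validation split.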

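Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

squad_v2

Size of downloaded dataset files: 46.49 MB
Size of the generated dataset: 128.52 MB
Total amount of disk used: 175.02 MB

An example of 'validation' looks as follows.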

This example was too long and was cropped:

{
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"]
    },
    "context": "\"The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...",
    "id": "56ddde6b9a695914005b9629",
    "question": "When were the Normans in Normandy?",
    "title": "Normans"
}
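To make the relationship between `answer_start` and `context` concrete, the following sketch checks that each answer text appears in the context at its stated character offset. It uses only the visible prefix of the cropped context and assumes that the leading escaped quote in the display above is an artifact of cropping rather than part of the stored string. `span_matches` is a hypothetical helper written for this example.

```python
# Prefix of the context from the example above. Assumption: the leading
# escaped quote in the cropped display is not part of the stored text.
context = (
    "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
    "were the people who in the 10th and 11th centuries gave thei"
)
answers = {
    "answer_start": [94, 87, 94, 94],
    "text": [
        "10th and 11th centuries",
        "in the 10th and 11th centuries",
        "10th and 11th centuries",
        "10th and 11th centuries",
    ],
}


def span_matches(context: str, start: int, text: str) -> bool:
    """Check that `text` occurs in `context` exactly at character offset `start`."""
    return context[start:start + len(text)] == text


# One boolean per reference answer.
results = [
    span_matches(context, start, text)
    for start, text in zip(answers["answer_start"], answers["text"])
]
print(results)
```

Under that assumption, the offsets are zero-based character positions, and the several entries are alternative reference answers to the same question.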

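Data Fields

The data fields are the same among all splits.

squad_v2

id: a string feature.
title: a string feature.
context: a string feature.
question: a string feature.
answers: a dictionary feature containing:
- text: a string feature.
- answer_start: an int32 feature.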
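The field list above can be turned into a small structural check. The sketch below is one possible validator using only the standard library. It assumes, based on the example instance, that `text` and `answer_start` are parallel lists. `schema_errors` and the sample record are hypothetical and not part of any library.

```python
from typing import Any, Dict, List

STRING_FIELDS = ("id", "title", "context", "question")
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def _is_int32(value: Any) -> bool:
    """True for plain integers that fit in 32 bits (bool is excluded)."""
    return isinstance(value, int) and not isinstance(value, bool) and INT32_MIN <= value <= INT32_MAX


def schema_errors(example: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means the record fits."""
    errors = []
    for name in STRING_FIELDS:
        if not isinstance(example.get(name), str):
            errors.append(f"{name} should be a string")
    answers = example.get("answers")
    if not isinstance(answers, dict):
        return errors + ["answers should be a dictionary"]
    texts = answers.get("text")
    starts = answers.get("answer_start")
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        errors.append("answers.text should be a list of strings")
    if not isinstance(starts, list) or not all(_is_int32(s) for s in starts):
        errors.append("answers.answer_start should be a list of int32 values")
    # Assumption: the two lists are parallel, one offset per answer text.
    if isinstance(texts, list) and isinstance(starts, list) and len(texts) != len(starts):
        errors.append("answers.text and answers.answer_start differ in length")
    return errors


record = {
    "id": "56ddde6b9a695914005b9629",
    "title": "Normans",
    "context": "The Normans ...",  # shortened placeholder text
    "question": "When were the Normans in Normandy?",
    "answers": {"text": ["10th and 11th centuries"], "answer_start": [94]},
}
print(schema_errors(record))
```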

Data Splits

name	train	validation
squad_v2	130319	11873
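For a quick sense of scale, the split counts in the table can be summed and expressed as shares of the whole. This is plain arithmetic on the numbers above; the variable names are illustrative only.

```python
# Example counts copied from the split table above.
splits = {"train": 130319, "validation": 11873}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

print(f"total examples: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The two splits together hold 142,192 examples, with the training split making up a little over nine tenths of them.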

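Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information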

@article{2016arXiv160605250R,
       author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev},
                 Konstantin and {Liang}, Percy},
        title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
      journal = {arXiv e-prints},
         year = 2016,
          eid = {arXiv:1606.05250},
        pages = {arXiv:1606.05250},
archivePrefix = {arXiv},
       eprint = {1606.05250},
}

Contributions

Thanks to @lewtun, @albertvillanova, @patrickvonplaten, @thomwolf for adding this dataset.

Author:

Anonymous

Dataset size:

16.47 KB
