Datasets:

cosmos_qa

Tasks:

multiple-choice

Sub-tasks:

multiple-choice-qa

Languages:

Multilinguality:

monolingual

Size categories:

10K<n<100K

Language creators:

found

Annotations creators:

crowdsourced

Source datasets:

original

ArXiv:

arxiv:1909.00277

License:

cc-by-4.0

English

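Dataset Card for "cosmos_qa"

Dataset Summary

Cosmos QA is a large-scale dataset of 35.6K problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. It focuses on reading between the lines over a diverse collection of people's everyday narratives, asking questions about the likely causes or effects of events that require reasoning beyond the exact text spans in the context.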

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

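Size of downloaded dataset files: 24.40 MB
Size of the generated dataset: 24.51 MB
Total amount of disk used: 48.91 MB

An example from the "validation" split looks as follows.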

This example was too long and was cropped:

{
    "answer0": "If he gets married in the church he wo nt have to get a divorce .",
    "answer1": "He wants to get married to a different person .",
    "answer2": "He wants to know if he does nt like this girl can he divorce her ?",
    "answer3": "None of the above choices .",
    "context": "\"Do i need to go for a legal divorce ? I wanted to marry a woman but she is not in the same religion , so i am not concern of th...",
    "id": "3BFF0DJK8XA7YNK4QYIGCOG1A95STE##3180JW2OT5AF02OISBX66RFOCTG5J7##A2LTOS0AZ3B28A##Blog_56156##q1_a1##378G7J1SJNCDAAIN46FM2P7T6KZEW2",
    "label": 1,
    "question": "Why is this person asking about divorce ?"
}
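To make the record layout concrete, here is a minimal Python sketch of how such an example might be turned into a multiple-choice prompt and how `label` indexes into the four answer fields. The helper names `format_prompt` and `gold_answer` are hypothetical and not part of the dataset or any library; the record is the cropped example above with `id` omitted, and the cropped `context` is kept as shown.

```python
from typing import Any, Mapping

# Field names as they appear in the example record above.
ANSWER_KEYS = ("answer0", "answer1", "answer2", "answer3")


def format_prompt(example: Mapping[str, Any]) -> str:
    """Render context, question and the four candidate answers as plain text."""
    lines = [
        f"Context: {example['context']}",
        f"Question: {example['question']}",
    ]
    for index, key in enumerate(ANSWER_KEYS):
        lines.append(f"{index}. {example[key]}")
    return "\n".join(lines)


def gold_answer(example: Mapping[str, Any]) -> str:
    """Return the answer text selected by `label` (assumed to be 0..3 here)."""
    label = example["label"]
    if label not in range(len(ANSWER_KEYS)):
        raise ValueError(f"label {label!r} does not index an answer field")
    return example[ANSWER_KEYS[label]]


# The cropped validation example from this card (context left truncated).
example = {
    "answer0": "If he gets married in the church he wo nt have to get a divorce .",
    "answer1": "He wants to get married to a different person .",
    "answer2": "He wants to know if he does nt like this girl can he divorce her ?",
    "answer3": "None of the above choices .",
    "context": '"Do i need to go for a legal divorce ? I wanted to marry a woman '
               "but she is not in the same religion , so i am not concern of th...",
    "label": 1,
    "question": "Why is this person asking about divorce ?",
}

print(format_prompt(example))
print("Gold:", gold_answer(example))
```

Because `label` is 1 in this record, the gold answer is the text stored in `answer1`.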

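Data Fields

The data fields are the same among all splits.

default

id : a string feature.
context : a string feature.
question : a string feature.
answer0 : a string feature.
answer1 : a string feature.
answer2 : a string feature.
answer3 : a string feature.
label : an int32 feature.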
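The field list above can be mirrored in a small typed container. The sketch below is one possible way to check a raw record against the listed types; the class name `CosmosQARecord` and its `from_mapping` constructor are hypothetical, and the sample values passed in are placeholders rather than real dataset content.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# The seven string-typed fields listed in this card.
STRING_FIELDS = (
    "id", "context", "question",
    "answer0", "answer1", "answer2", "answer3",
)
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


@dataclass(frozen=True)
class CosmosQARecord:
    id: str
    context: str
    question: str
    answer0: str
    answer1: str
    answer2: str
    answer3: str
    label: int  # stored as int32 in the dataset

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "CosmosQARecord":
        """Build a record, rejecting missing or wrongly typed fields."""
        for name in STRING_FIELDS:
            if not isinstance(raw.get(name), str):
                raise TypeError(f"field {name!r} must be a string")
        label = raw.get("label")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(label, bool) or not isinstance(label, int):
            raise TypeError("field 'label' must be an integer")
        if not INT32_MIN <= label <= INT32_MAX:
            raise ValueError("field 'label' does not fit in int32")
        return cls(**{name: raw[name] for name in STRING_FIELDS}, label=label)


# Placeholder values for illustration only (not taken from the dataset).
record = CosmosQARecord.from_mapping({
    "id": "example-id",
    "context": "A short placeholder context .",
    "question": "A placeholder question ?",
    "answer0": "First option .",
    "answer1": "Second option .",
    "answer2": "Third option .",
    "answer3": "None of the above choices .",
    "label": 0,
})
print(record.question, record.label)
```

The check only enforces the types named in the card; it deliberately makes no assumption about which `label` values occur in each split.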

Data Splits

name	train	validation	test
default	25262	2985	6963
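As a quick arithmetic check on the table above, the following sketch sums the three split sizes and computes each split's share of the total. The counts are copied from the table; the function name `split_fractions` is hypothetical.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"train": 25262, "validation": 2985, "test": 6963}


def split_fractions(sizes: dict) -> dict:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(f"total examples: {total}")
for name, fraction in fractions.items():
    print(f"{name:<10} {SPLIT_SIZES[name]:>6} {fraction:6.1%}")
```

The three splits listed here sum to 35,210 examples, with roughly 72% in train, 8% in validation and 20% in test.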

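Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information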

More Information Needed

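Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

As reported via email by Yejin Choi, the dataset is licensed under the CC BY 4.0 license.

Citation Information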

@inproceedings{huang-etal-2019-cosmos,
    title = "Cosmos {QA}: Machine Reading Comprehension with Contextual Commonsense Reasoning",
    author = "Huang, Lifu  and
      Le Bras, Ronan  and
      Bhagavatula, Chandra  and
      Choi, Yejin",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1243",
    doi = "10.18653/v1/D19-1243",
    pages = "2391--2401",
}

Contributions

Thanks to @patrickvonplaten, @lewtun, @albertvillanova, @thomwolf for adding this dataset.

Author:

Anonymous

Dataset size:

15.55 KB
