Datasets:

qasc

Languages:

en

Multilinguality:

monolingual

Size:

1K<n<10K

Language Creators:

found

Annotations Creators:

crowdsourced

Source Datasets:

original

ArXiv:

arxiv:1910.11473

License:

cc-by-4.0

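Dataset Card for "qasc"

Dataset Summary

QASC is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test) and comes with a corpus of 17M sentences.

Supported Tasks and Leaderboards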

More Information Needed

Languages

More Information Needed

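Dataset Structure

Data Instances

default
  • Size of downloaded dataset files: 1.61 MB
  • Size of the generated dataset: 5.87 MB
  • Total amount of disk used: 7.49 MB

An example from the "validation" split looks as follows.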

{
    "answerKey": "F",
    "choices": {
        "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
        "text": ["sand", "occurs over a wide range", "forests", "Global warming", "rapid changes occur", "local weather conditions", "measure of motion", "city life"]
    },
    "combinedfact": "Climate is generally described in terms of local weather conditions",
    "fact1": "Climate is generally described in terms of temperature and moisture.",
    "fact2": "Fire behavior is driven by local weather conditions such as winds, temperature and moisture.",
    "formatted_question": "Climate is generally described in terms of what? (A) sand (B) occurs over a wide range (C) forests (D) Global warming (E) rapid changes occur (F) local weather conditions (G) measure of motion (H) city life",
    "id": "3NGI5ARFTT4HNGVWXAMLNBMFA0U1PG",
    "question": "Climate is generally described in terms of what?"
}
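To illustrate how the fields of such a record relate to one another, here is a minimal Python sketch. The helper names `answer_text` and `format_question` are hypothetical and not part of the dataset or any library. The sketch also assumes that `formatted_question` is always the question followed by each choice as "(label) text", which the example above suggests but the card does not state.

```python
# The validation example from above, as a plain Python dictionary.
example = {
    "answerKey": "F",
    "choices": {
        "label": ["A", "B", "C", "D", "E", "F", "G", "H"],
        "text": ["sand", "occurs over a wide range", "forests", "Global warming",
                 "rapid changes occur", "local weather conditions",
                 "measure of motion", "city life"],
    },
    "formatted_question": "Climate is generally described in terms of what? "
                          "(A) sand (B) occurs over a wide range (C) forests "
                          "(D) Global warming (E) rapid changes occur "
                          "(F) local weather conditions (G) measure of motion "
                          "(H) city life",
    "question": "Climate is generally described in terms of what?",
}


def answer_text(record):
    """Return the text of the choice whose label equals answerKey (hypothetical helper)."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return texts[labels.index(record["answerKey"])]


def format_question(record):
    """Rebuild a formatted_question-style string (assumed pattern, see lead-in)."""
    options = " ".join(
        f"({label}) {text}"
        for label, text in zip(record["choices"]["label"], record["choices"]["text"])
    )
    return f"{record['question']} {options}"


print(answer_text(example))
print(format_question(example) == example["formatted_question"])
```

For this record, the correct answer matches the tail of `combinedfact`, which reflects how the question was composed from `fact1` and `fact2`.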

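Data Fields

The data fields are the same among all splits.

default
  • id: a string feature.
  • question: a string feature.
  • choices: a dictionary feature containing:
    • text: a string feature.
    • label: a string feature.
  • answerKey: a string feature.
  • fact1: a string feature.
  • fact2: a string feature.
  • combinedfact: a string feature.
  • formatted_question: a string feature.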
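The field list can be turned into a simple record check. The following is a rough sketch in plain Python, not an official schema: the function name `schema_problems` is hypothetical, and it assumes that `choices.text` and `choices.label` hold parallel lists of strings, as in the data instance shown earlier.

```python
# Top-level fields that the card describes as string features.
STRING_FIELDS = (
    "id", "question", "answerKey",
    "fact1", "fact2", "combinedfact", "formatted_question",
)


def schema_problems(record):
    """Return a list of human-readable schema problems; empty if none are found."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected a string")

    choices = record.get("choices")
    if not isinstance(choices, dict):
        problems.append("choices: expected a dictionary")
        return problems

    # Assumption: each sub-field is a list of strings, one entry per choice.
    for name in ("text", "label"):
        values = choices.get(name)
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            problems.append(f"choices.{name}: expected a list of strings")

    text, label = choices.get("text"), choices.get("label")
    if isinstance(text, list) and isinstance(label, list) and len(text) != len(label):
        problems.append("choices: text and label differ in length")
    return problems


# A tiny made-up record, used only to exercise the check.
toy = {
    "id": "x", "question": "q?", "answerKey": "A",
    "fact1": "f1", "fact2": "f2", "combinedfact": "cf",
    "formatted_question": "q? (A) a (B) b",
    "choices": {"text": ["a", "b"], "label": ["A", "B"]},
}
print(schema_problems(toy))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a record at once.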

Data Splits

name     train  validation  test
default  8134   926         920
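As a quick consistency check, the split sizes above can be summed and compared with the 9,980 questions stated in the summary. This is a small illustrative sketch; the variable names are arbitrary.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"train": 8134, "validation": 926, "test": 920}

total = sum(SPLIT_SIZES.values())

# Share of each split as a percentage of all questions, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in SPLIT_SIZES.items()}

print(total)
print(shares)
```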

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

Licensing Information

The dataset is released under the CC BY 4.0 license.

Citation Information

@article{allenai:qasc,
      author    = {Tushar Khot and Peter Clark and Michal Guerquin and Peter Jansen and Ashish Sabharwal},
      title     = {QASC: A Dataset for Question Answering via Sentence Composition},
      journal   = {arXiv:1910.11473v2},
      year      = {2020},
}

Contributions

Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.