Dataset: ai2_arc
Tasks:
Languages:
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: found
Source datasets: original
License:
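A new dataset of 7,787 genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question answering. The dataset is partitioned into a Challenge Set and an Easy Set, where the Challenge Set contains only questions answered incorrectly by both a retrieval-based algorithm and a word co-occurrence algorithm. We also include a corpus of over 14 million science sentences relevant to the task, and an implementation of three neural baseline models for this dataset. We pose ARC as a challenge to the community.
An example of "train" looks as follows.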
{
"answerKey": "B",
"choices": {
"label": ["A", "B", "C", "D"],
"text": ["Shady areas increased.", "Food sources increased.", "Oxygen levels increased.", "Available water increased."]
},
"id": "Mercury_SC_405487",
"question": "One year, the oak trees in a park began producing more acorns than usual. The next year, the population of chipmunks in the park also increased. Which best explains why there were more chipmunks the next year?"
}
An example of ARC-Easy "train" looks as follows.
{
"answerKey": "B",
"choices": {
"label": ["A", "B", "C", "D"],
"text": ["Shady areas increased.", "Food sources increased.", "Oxygen levels increased.", "Available water increased."]
},
"id": "Mercury_SC_405487",
"question": "One year, the oak trees in a park began producing more acorns than usual. The next year, the population of chipmunks in the park also increased. Which best explains why there were more chipmunks the next year?"
}
The data fields are the same among all splits.
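Since every record carries the same four fields, a single helper can resolve the text of the correct answer for any of them. The following is a minimal sketch, not part of the dataset's tooling: the names `REQUIRED_FIELDS` and `answer_text` are hypothetical, and the record is the example shown above, with its question shortened to the final sentence.

```python
from typing import Any, Dict

# Field names taken from the example record; the constant itself is hypothetical.
REQUIRED_FIELDS = ("id", "question", "choices", "answerKey")


def answer_text(record: Dict[str, Any]) -> str:
    """Return the choice text whose label matches the record's answerKey."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")

    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    # "label" and "text" are parallel lists, so their lengths must agree.
    if len(labels) != len(texts):
        raise ValueError("choices.label and choices.text differ in length")

    # list.index raises ValueError if answerKey is not among the labels.
    return texts[labels.index(record["answerKey"])]


record = {
    "answerKey": "B",
    "choices": {
        "label": ["A", "B", "C", "D"],
        "text": [
            "Shady areas increased.",
            "Food sources increased.",
            "Oxygen levels increased.",
            "Available water increased.",
        ],
    },
    "id": "Mercury_SC_405487",
    # Question shortened here for brevity.
    "question": "Which best explains why there were more chipmunks the next year?",
}

print(answer_text(record))  # Food sources increased.
```

Looking the answer up by label rather than by position means the helper does not assume the labels are always "A" through "D".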
| name | train | validation | test |
|---|---|---|---|
| ARC-Challenge | 1119 | 299 | 1172 |
| ARC-Easy | 2251 | 570 | 2376 |
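As a quick consistency check, the split sizes in the table can be summed and compared with the 7,787 questions stated in the description. This is a small sketch using only the numbers above; the variable names are illustrative.

```python
# Split sizes copied from the table above.
splits = {
    "ARC-Challenge": {"train": 1119, "validation": 299, "test": 1172},
    "ARC-Easy": {"train": 2251, "validation": 570, "test": 2376},
}

# Total number of questions per configuration.
totals = {name: sum(counts.values()) for name, counts in splits.items()}
grand_total = sum(totals.values())

print(totals)       # {'ARC-Challenge': 2590, 'ARC-Easy': 5197}
print(grand_total)  # 7787
```

The two configurations add up to 2,590 and 5,197 questions respectively, which together match the stated dataset size.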
@article{allenai:arc,
author = {Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and
Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
title = {Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
journal = {arXiv:1803.05457v1},
year = {2018},
}
Thanks to @lewtun, @patrickvonplaten, @thomwolf for adding this dataset.