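Dataset: commonsense_qa
Tasks: question-answering
Sub-tasks: open-domain-qa
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1811.00937
License: mit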
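CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, each with one correct answer and four distractor answers. The dataset comes in two major train/validation/test splits: the "random split", which is the main evaluation split, and the "question token split"; see the paper for details.
The dataset is in English (en).
An example from "train" looks as follows: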
{'id': '075e483d21c29a511267ef62bedc0461',
'question': 'The sanctions against the school were a punishing blow, and they seemed to what the efforts the school had made to change?',
'question_concept': 'punishing',
'choices': {'label': ['A', 'B', 'C', 'D', 'E'],
'text': ['ignore', 'enforce', 'authoritarian', 'yell at', 'avoid']},
'answerKey': 'A'}
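
As a minimal sketch of how a record like the one above might be consumed, the following resolves `answerKey` to its choice text and renders the question as a lettered prompt. The helper names `answer_text` and `format_prompt` are hypothetical, introduced only for illustration; they are not part of the dataset or of any library. The sketch assumes `answerKey` matches one of the entries in `choices['label']`; a record without a matching key would raise `ValueError`.

```python
# Record copied from the "train" example above.
example = {
    "id": "075e483d21c29a511267ef62bedc0461",
    "question": (
        "The sanctions against the school were a punishing blow, and they "
        "seemed to what the efforts the school had made to change?"
    ),
    "question_concept": "punishing",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["ignore", "enforce", "authoritarian", "yell at", "avoid"],
    },
    "answerKey": "A",
}


def answer_text(record):
    """Return the choice text whose label equals the record's answerKey.

    Hypothetical helper: assumes answerKey is one of the choice labels.
    """
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return texts[labels.index(record["answerKey"])]


def format_prompt(record):
    """Render the question followed by one 'LABEL. text' line per choice."""
    lines = [record["question"]]
    for label, text in zip(record["choices"]["label"], record["choices"]["text"]):
        lines.append(f"{label}. {text}")
    return "\n".join(lines)


print(format_prompt(example))
print("Answer:", answer_text(example))  # Answer: ignore
```

The labels and texts are stored as two parallel lists, so the lookup goes through the index of the label rather than a dictionary key.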
The data fields are the same among all splits.

| name | train | validation | test |
|---|---|---|---|
| default | 9741 | 1221 | 1140 |
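
The split sizes in the table can be checked against the total of 12,102 questions given in the description. The snippet below is a small sketch with the counts hard-coded from the table; the variable names are illustrative only.

```python
# Split sizes copied from the table above (default configuration).
split_sizes = {"train": 9741, "validation": 1221, "test": 1140}

# Total number of questions across all three splits.
total = sum(split_sizes.values())

# Share of the whole dataset that each split represents.
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name:<10} {count:>6} {fractions[name]:.1%}")

print("total:", total)  # total: 12102
```

With five answer options per question (one correct answer and four distractors), uniform random guessing would be expected to score 1/5 = 20% on any of these splits.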
The dataset is licensed under the MIT License.
See: https://github.com/jonathanherzig/commonsenseqa/issues/5
@inproceedings{talmor-etal-2019-commonsenseqa,
title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
author = "Talmor, Alon and
Herzig, Jonathan and
Lourie, Nicholas and
Berant, Jonathan",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1421",
doi = "10.18653/v1/N19-1421",
pages = "4149--4158",
archivePrefix = "arXiv",
eprint = "1811.00937",
primaryClass = "cs",
}
Thanks to @thomwolf, @lewtun, @albertvillanova, and @patrickvonplaten for adding this dataset.