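Dataset: commonsense_qa
Tasks: question-answering
Sub-tasks: open-domain-qa
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1811.00937
License: mit
CommonsenseQA is a new multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, each with one correct answer and four distractor answers. The dataset is provided in two main train/validation/test splits: the "random split", which is the main evaluation split, and the "question token split"; see the paper for details.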
The dataset is in English (en).
An example from "train" looks as follows:
{'id': '075e483d21c29a511267ef62bedc0461', 'question': 'The sanctions against the school were a punishing blow, and they seemed to what the efforts the school had made to change?', 'question_concept': 'punishing', 'choices': {'label': ['A', 'B', 'C', 'D', 'E'], 'text': ['ignore', 'enforce', 'authoritarian', 'yell at', 'avoid']}, 'answerKey': 'A'}
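The record above stores the answer options as two parallel lists under `choices`, with `answerKey` naming the label of the correct option. A minimal sketch of resolving that key to its text, assuming every record follows the layout of this example; the helper names `answer_text` and `distractor_texts` are hypothetical and not part of the dataset or any library:

```python
# Record copied from the "train" example above.
example = {
    "id": "075e483d21c29a511267ef62bedc0461",
    "question": (
        "The sanctions against the school were a punishing blow, and they "
        "seemed to what the efforts the school had made to change?"
    ),
    "question_concept": "punishing",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["ignore", "enforce", "authoritarian", "yell at", "avoid"],
    },
    "answerKey": "A",
}


def answer_text(record):
    """Return the choice text whose label matches the record's answerKey."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return texts[labels.index(record["answerKey"])]


def distractor_texts(record):
    """Return the texts of all choices other than the correct one."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return [t for l, t in zip(labels, texts) if l != record["answerKey"]]


print(answer_text(example))       # ignore
print(distractor_texts(example))  # ['enforce', 'authoritarian', 'yell at', 'avoid']
```

Looking the label up with `index` keeps the helper independent of the label alphabet, so it would still work if the labels were not exactly `A` through `E`.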
The data fields are the same across all splits.
name | train | validation | test |
---|---|---|---|
default | 9741 | 1221 | 1140 |
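As a quick consistency check, the split sizes in the table can be summed and compared with the 12,102 questions stated in the description. The sketch below only does arithmetic on the numbers in the table; the variable names are illustrative:

```python
# Split sizes copied from the table above.
split_sizes = {"train": 9741, "validation": 1221, "test": 1140}

total = sum(split_sizes.values())
print(total)  # 12102

# Share of each split, as a percentage of all questions.
for name, size in split_sizes.items():
    print(f"{name}: {100 * size / total:.1f}%")
```

The total matches the question count given in the description, so the three splits together cover the whole dataset.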
The dataset is licensed under the MIT License.
See: https://github.com/jonathanherzig/commonsenseqa/issues/5
@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
    archivePrefix = "arXiv",
    eprint = "1811.00937",
    primaryClass = "cs",
}
Thanks to @thomwolf, @lewtun, @albertvillanova, and @patrickvonplaten for adding this dataset.