Dataset: quoref
Tasks:
Languages:
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotations creators: crowdsourced
Source datasets: original
License:
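Quoref is a question-answering dataset that tests the coreferential reasoning ability of reading comprehension systems. It contains 24K questions over 4.7K paragraphs from Wikipedia; a system must resolve coreferences before selecting the appropriate span(s) in the paragraph to answer a question.
An example of 'validation' looks as follows.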
This example was too long and was cropped:
{
"answers": {
"answer_start": [1633],
"text": ["Frankie"]
},
"context": "\"Frankie Bono, a mentally disturbed hitman from Cleveland, comes back to his hometown in New York City during Christmas week to ...",
"id": "bfc3b34d6b7e73c0bd82a009db12e9ce196b53e6",
"question": "What is the first name of the person who has until New Year's Eve to perform a hit?",
"title": "Blast of Silence",
"url": "https://en.wikipedia.org/wiki/Blast_of_Silence"
}
The data fields are the same among all splits.
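The record above pairs each answer string with an `answer_start` value. The following is a minimal sketch of how such a record could be checked, assuming `answer_start` is a character offset into `context` (the usual SQuAD-style convention; the card itself does not state this). The `answer_spans` helper and the `toy` record are hypothetical and written for illustration only; the toy record is not taken from the dataset.

```python
from typing import Any, Dict, List


def answer_spans(record: Dict[str, Any]) -> List[str]:
    """Slice the context at each answer_start, using the answer's own length.

    Assumption: answer_start holds character offsets into record["context"].
    """
    context = record["context"]
    answers = record["answers"]
    return [
        context[start:start + len(text)]
        for start, text in zip(answers["answer_start"], answers["text"])
    ]


# Hypothetical, shortened record with the same fields as the example above.
toy = {
    "id": "toy-0",
    "title": "Toy example",
    "url": "",
    "question": "What is the first name of the hitman?",
    "context": "Frankie Bono is a hitman. He returns to New York City.",
    "answers": {"answer_start": [0], "text": ["Frankie"]},
}

# If the offsets are consistent, the sliced spans equal the answer texts.
spans_match = answer_spans(toy) == toy["answers"]["text"]
```

A check of this kind cannot be run on the cropped example itself, because its `context` is truncated well before offset 1633.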
| name | train | validation |
|---|---|---|
| default | 19399 | 2418 |
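As a quick sanity check on the table, the sketch below sums the two listed splits, computes each split's share, and confirms that the total falls inside the 10K<n<100K size category stated in the card metadata. The variable names are illustrative; the counts are copied from the table.

```python
# Split sizes copied from the table above.
splits = {"train": 19399, "validation": 2418}

total = sum(splits.values())

# Fraction of the listed examples that falls in each split.
shares = {name: count / total for name, count in splits.items()}

# The card's size category is 10K<n<100K.
in_size_bucket = 10_000 < total < 100_000

for name, share in shares.items():
    print(f"{name}: {splits[name]} examples ({share:.1%})")
print(f"total: {total}, within 10K<n<100K: {in_size_bucket}")
```

The two listed splits account for 21,817 questions, roughly 89% of them in train.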
@article{allenai:quoref,
author = {Pradeep Dasigi and Nelson F. Liu and Ana Marasovic and Noah A. Smith and Matt Gardner},
title = {Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning},
journal = {arXiv:1908.05803v2},
year = {2019},
}
Thanks to @lewtun, @patrickvonplaten, @thomwolf for adding this dataset.