Dataset: gap
Tasks:
Languages:
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1810.05201
License:
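GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia and released by Google AI Language for evaluating coreference resolution in practical applications.
An example of 'validation' looks as follows.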
{
"A": "aliquam ultrices sagittis",
"A-coref": false,
"A-offset": 208,
"B": "elementum curabitur vitae",
"B-coref": false,
"B-offset": 435,
"ID": "validation-1",
"Pronoun": "condimentum mattis pellentesque",
"Pronoun-offset": 948,
"Text": "Lorem ipsum dolor",
"URL": "sem fringilla ut"
}
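The sample above uses placeholder text, so its offsets do not line up with its `Text`. As a minimal sketch of how the fields relate, the snippet below assumes that each `*-offset` is a character offset into `Text` and that the two `*-coref` flags mark which candidate name the pronoun refers to. The record, the helper names `mention_at` and `gold_label`, and the label strings are hypothetical and exist only for illustration.

```python
from typing import Any, Dict


def mention_at(text: str, mention: str, offset: int) -> bool:
    """Return True if `mention` occurs in `text` starting at character `offset`."""
    return text[offset:offset + len(mention)] == mention


def gold_label(record: Dict[str, Any]) -> str:
    """Collapse the two boolean flags into one label: "A", "B", or "NEITHER"."""
    if record["A-coref"]:
        return "A"
    if record["B-coref"]:
        return "B"
    return "NEITHER"


# Hypothetical record (not taken from GAP) with the same fields as the sample.
record = {
    "ID": "example-1",
    "Text": "Alice told Beth that she had won the prize.",
    "Pronoun": "she",
    "Pronoun-offset": 21,
    "A": "Alice",
    "A-offset": 0,
    "A-coref": True,   # assumed annotation, for illustration only
    "B": "Beth",
    "B-offset": 11,
    "B-coref": False,
    "URL": "",
}

# Check that every span really sits at its stated offset.
for name_key, offset_key in (("Pronoun", "Pronoun-offset"),
                             ("A", "A-offset"),
                             ("B", "B-offset")):
    assert mention_at(record["Text"], record[name_key], record[offset_key])

print(gold_label(record))
```

A record in which both flags are false, as in the sample above, would map to "NEITHER" under this scheme.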
The data fields are the same among all splits.
| name | train | validation | test |
|---|---|---|---|
| default | 2000 | 454 | 2000 |
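As a quick consistency check, the split sizes can be reconciled with the 8,908 pairs quoted in the description. The sketch below assumes that each example contributes two (pronoun, name) pairs, one for candidate `A` and one for candidate `B`; the variable names are illustrative only.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 2000, "validation": 454, "test": 2000}

# Assumption: every example carries two candidate names, A and B,
# so it yields two (pronoun, name) pairs.
CANDIDATES_PER_EXAMPLE = 2

total_examples = sum(split_sizes.values())
total_pairs = total_examples * CANDIDATES_PER_EXAMPLE

print(f"examples: {total_examples}")  # examples: 4454
print(f"pairs:    {total_pairs}")     # pairs:    8908
```

Under that assumption, the 4,454 examples across the three splits account for exactly the 8,908 labeled pairs.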
@article{webster-etal-2018-mind,
title = "Mind the {GAP}: A Balanced Corpus of Gendered Ambiguous Pronouns",
author = "Webster, Kellie and
Recasens, Marta and
Axelrod, Vera and
Baldridge, Jason",
journal = "Transactions of the Association for Computational Linguistics",
volume = "6",
year = "2018",
address = "Cambridge, MA",
publisher = "MIT Press",
url = "https://aclanthology.org/Q18-1042",
doi = "10.1162/tacl_a_00240",
pages = "605--617",
}
Thanks to @thomwolf, @patrickvonplaten, @otakumesi, and @lewtun for adding this dataset.