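Dataset: quartz
License: CC BY 4.0
Source datasets: original
Annotation creators: crowdsourced
Language creators: crowdsourced
Size: 1K<n<10K
Multilinguality: monolingual
Languages: English
Tasks: multiple-choice question answering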
QuaRTz (V1) is a crowdsourced dataset of 3864 multiple-choice questions about open-domain qualitative relationships. Each question is paired with a background sentence (sometimes a short paragraph).
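The dataset is split into a training set (2696), a development/validation set (384), and a test set (784). Each background sentence appears in only one split.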
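The three split sizes add up to the 3864 questions stated above, and the one-split-per-background-sentence property can be checked mechanically. The following is a minimal standard-library sketch. It assumes each record is a dict whose `para_id` field identifies its background sentence (the example record in this card has `para_id` `QRSent-10116`). `splits_share_no_background` and the toy records are hypothetical and are not part of the dataset or any loading library.

```python
from itertools import combinations

# Split sizes as listed in this card; they should sum to the 3864 questions.
SPLIT_SIZES = {"train": 2696, "validation": 384, "test": 784}
assert sum(SPLIT_SIZES.values()) == 3864


def splits_share_no_background(splits: dict[str, list[dict]]) -> bool:
    """Return True if no background sentence (para_id) occurs in two splits.

    Assumption: "para_id" identifies the background sentence of a record.
    Several questions in the same split may share one para_id.
    """
    ids_per_split = {
        name: {rec["para_id"] for rec in records} for name, records in splits.items()
    }
    # Compare every pair of splits; any shared para_id means a leak.
    for (_, ids_a), (_, ids_b) in combinations(ids_per_split.items(), 2):
        if ids_a & ids_b:
            return False
    return True


# Hypothetical toy records, not real QuaRTz data.
toy = {
    "train": [{"para_id": "QRSent-1"}, {"para_id": "QRSent-1"}],
    "validation": [{"para_id": "QRSent-2"}],
    "test": [{"para_id": "QRSent-3"}],
}
print(splits_share_no_background(toy))  # True
```

Two training questions share `QRSent-1`, which is allowed. Moving that sentence into the test split as well would make the function return `False`.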
An example from the 'train' split looks as follows.
{
"answerKey": "A",
"choices": {
"label": ["A", "B"],
"text": ["higher", "lower"]
},
"id": "QRQA-10116-3",
"para": "Electrons at lower energy levels, which are closer to the nucleus, have less energy.",
"para_anno": {
"cause_dir_sign": "LESS",
"cause_dir_str": "closer",
"cause_prop": "distance from a nucleus",
"effect_dir_sign": "LESS",
"effect_dir_str": "less",
"effect_prop": "energy"
},
"para_id": "QRSent-10116",
"question": "Electrons further away from a nucleus have _____ energy levels than close ones.",
"question_anno": {
"less_cause_dir": "electron energy levels",
"less_cause_prop": "nucleus",
"less_effect_dir": "lower",
"less_effect_prop": "electron energy levels",
"more_effect_dir": "higher",
"more_effect_prop": "electron energy levels"
}
}
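In this record, `choices` stores labels and texts as parallel lists, so the gold answer is found by looking up `answerKey` in `label` and taking the text at the same position. Below is a minimal standard-library sketch. `answer_text` and `filled_question` are hypothetical helpers, not part of the dataset or any library. Filling the blank assumes the question marks it with `_____` as this example does. Questions without such a blank are returned unchanged.

```python
import json

# The train example above, abbreviated to the fields this sketch reads.
record = json.loads("""
{
  "answerKey": "A",
  "choices": {"label": ["A", "B"], "text": ["higher", "lower"]},
  "question": "Electrons further away from a nucleus have _____ energy levels than close ones."
}
""")


def answer_text(rec: dict) -> str:
    """Map answerKey to its choice text via the parallel label/text lists."""
    labels = rec["choices"]["label"]
    texts = rec["choices"]["text"]
    return texts[labels.index(rec["answerKey"])]


def filled_question(rec: dict) -> str:
    """Substitute the correct answer into the first '_____' blank, if any."""
    return rec["question"].replace("_____", answer_text(rec), 1)


print(answer_text(record))  # higher
print(filled_question(record))
```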
The data fields are the same across all splits.
| name | train | validation | test |
|---|---|---|---|
| default | 2696 | 384 | 784 |
This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
@InProceedings{quartz,
author = {Oyvind Tafjord and Matt Gardner and Kevin Lin and Peter Clark},
title = {{QUARTZ}: An Open-Domain Dataset of Qualitative Relationship Questions},
year = {2019},
}
Thanks to @patrickvonplaten, @lewtun, and @thomwolf for adding this dataset.