Dataset: sberquad
Tasks:
Sub-tasks: extractive-qa
Languages:
Multilinguality: monolingual
Size: 10K<n<100K
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1912.09723
License:
The Sber Question Answering Dataset (SberQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. It is a Russian analogue of SQuAD, originally presented at the Sberbank Data Science Journey 2017.
Russian
{
  "context": "Первые упоминания о строении человеческого тела встречаются в Древнем Египте...",
  "id": 14754,
  "qas": [
    {
      "id": 60544,
      "question": "Где встречаются первые упоминания о строении человеческого тела?",
      "answers": [{"answer_start": 60, "text": "в Древнем Египте"}]
    }
  ]
}
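To illustrate how `answer_start` relates to `context`, here is a minimal sketch that slices the answer span out of the passage and compares it with the stored `text`. It assumes the nested `qas` layout shown in the example instance and that `answer_start` is a character offset (Unicode code points, not bytes). The helper name `extract_spans` is hypothetical and is not part of any library.

```python
import json

# Record copied from the example instance above (the context is truncated there).
record = json.loads("""
{
  "context": "Первые упоминания о строении человеческого тела встречаются в Древнем Египте...",
  "id": 14754,
  "qas": [
    {
      "id": 60544,
      "question": "Где встречаются первые упоминания о строении человеческого тела?",
      "answers": [{"answer_start": 60, "text": "в Древнем Египте"}]
    }
  ]
}
""")


def extract_spans(rec):
    """Yield (question, stored_text, sliced_text) for every answer in a record.

    Assumption: answer_start counts characters from the start of the context.
    """
    context = rec["context"]
    for qa in rec["qas"]:
        for ans in qa["answers"]:
            start = ans["answer_start"]
            end = start + len(ans["text"])
            yield qa["question"], ans["text"], context[start:end]


spans = list(extract_spans(record))
for question, stored, sliced in spans:
    status = "match" if stored == sliced else "mismatch"
    print(f"{question} -> {sliced!r} ({status})")
```

A mismatch on real data would suggest that the offsets are measured differently (for example in bytes) or that the context was modified after annotation.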
| name | train | validation | test |
|---|---|---|---|
| plain_text | 45328 | 5036 | 23936 |
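As a quick sanity check on the split table, the sketch below sums the three splits, computes each split's share, and compares the total with the `10K<n<100K` size category listed in the metadata. The numbers are copied from the table; the variable names are illustrative only.

```python
# Split sizes copied from the table above (configuration "plain_text").
split_sizes = {"train": 45328, "validation": 5036, "test": 23936}

total = sum(split_sizes.values())

# Share of each split as a percentage of all examples, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in split_sizes.items()}

# The card's size category is 10K<n<100K; check the total against it.
in_size_category = 10_000 < total < 100_000

print("total examples:", total)
print("shares (%):", shares)
print("within 10K<n<100K:", in_size_category)
```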
Who are the source language producers? [Needs More Information]
Who are the annotators? [Needs More Information]
@InProceedings{sberquad,
doi = {10.1007/978-3-030-58219-7_1},
author = {Pavel Efimov and
Andrey Chertok and
Leonid Boytsov and
Pavel Braslavski},
title = {SberQuAD -- Russian Reading Comprehension Dataset: Description and Analysis},
booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
year = {2020},
publisher = {Springer International Publishing},
pages = {3--15}
}
Thanks to @alenusch for adding this dataset.