Dataset: wiki_bio
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Annotation creators: found
Source datasets: original
Preprint: arxiv:1603.07771
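This dataset contains 728,321 biographies extracted from Wikipedia. Each entry consists of the first paragraph of the biography and the tabular infobox.
The main purpose of this dataset is to develop text generation models.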
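As a rough illustration of how infobox fields might be fed to a text generation model, the sketch below flattens parallel lists of field names and values into a single source string. The function name `linearize_infobox` and the `field: value | field: value` format are assumptions made for this example; they are not taken from the original paper or from any library.

```python
from typing import List


def linearize_infobox(headers: List[str], values: List[str]) -> str:
    """Join parallel header/value lists into one flat source string.

    The "field: value | field: value" layout is an illustrative choice,
    not a format prescribed by the dataset.
    """
    # Strip stray whitespace such as the trailing newline some values carry.
    pairs = [f"{h}: {v.strip()}" for h, v in zip(headers, values)]
    return " | ".join(pairs)


# Hypothetical two-field record, shortened for readability.
headers = ["name", "nationality"]
values = ["michael iii of alexandria", "egyptian"]

source = linearize_infobox(headers, values)
print(source)
```

A model would then be trained to map such a source string to the biography paragraph. Other linearizations, or a structured encoder, are equally plausible.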
English.
More Information Needed
A single sample has the following structure:
{
"input_text":{
"context":"pope michael iii of alexandria\n",
"table":{
"column_header":[
"type",
"ended",
"death_date",
"title",
"enthroned",
"name",
"buried",
"religion",
"predecessor",
"nationality",
"article_title",
"feast_day",
"birth_place",
"residence",
"successor"
],
"content":[
"pope",
"16 march 907",
"16 march 907",
"56th of st. mark pope of alexandria & patriarch of the see",
"25 april 880",
"michael iii of alexandria",
"monastery of saint macarius the great",
"coptic orthodox christian",
"shenouda i",
"egyptian",
"pope michael iii of alexandria\n",
"16 -rrb- march -lrb- 20 baramhat in the coptic calendar",
"egypt",
"saint mark 's church",
"gabriel i"
],
"row_number":[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
}
},
"target_text":"pope michael iii of alexandria -lrb- also known as khail iii -rrb- was the coptic pope of alexandria and patriarch of the see of st. mark -lrb- 880 -- 907 -rrb- .\nin 882 , the governor of egypt , ahmad ibn tulun , forced khail to pay heavy contributions , forcing him to sell a church and some attached properties to the local jewish community .\nthis building was at one time believed to have later become the site of the cairo geniza .\n"
}
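The "table" field stores all of the information from the Wikipedia infobox: the infobox field names are stored in "column_header" and the corresponding values in "content".
[More Information Needed]
This dataset was introduced in the paper "Neural Text Generation from Structured Data with Application to the Biography Domain" (arXiv:1603.07771) and is hosted in a repository owned by DavidGrangier.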
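Because "column_header" and "content" are parallel lists, one way to work with an infobox is to pair them into a dictionary. The following is a minimal sketch under that assumption. The helper name `infobox_to_dict` is hypothetical, and the code assumes each field name appears only once, as in the sample above; a repeated name would overwrite the earlier value.

```python
from typing import Dict


def infobox_to_dict(table: dict) -> Dict[str, str]:
    """Pair each infobox field name with its value.

    Expects a mapping with the "column_header" and "content" keys
    shown in the sample record.
    """
    headers = table["column_header"]
    values = table["content"]
    if len(headers) != len(values):
        raise ValueError("column_header and content must have equal length")
    # Strip whitespace because some values end with a newline.
    return {h: v.strip() for h, v in zip(headers, values)}


# Shortened excerpt of the sample record shown above.
sample_table = {
    "column_header": ["name", "death_date", "successor"],
    "content": ["michael iii of alexandria", "16 march 907", "gabriel i"],
    "row_number": [1, 1, 1],
}

infobox = infobox_to_dict(sample_table)
print(infobox["successor"])
```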
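Initial Data Collection and Normalization: [More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]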
This dataset is released under the Creative Commons CC BY-SA 3.0 license.
To cite the original paper in BibTeX format, use:
@article{DBLP:journals/corr/LebretGA16,
author = {R{\'{e}}mi Lebret and
David Grangier and
Michael Auli},
title = {Generating Text from Structured Data with Application to the Biography
Domain},
journal = {CoRR},
volume = {abs/1603.07771},
year = {2016},
url = {http://arxiv.org/abs/1603.07771},
archivePrefix = {arXiv},
eprint = {1603.07771},
timestamp = {Mon, 13 Aug 2018 16:48:30 +0200},
biburl = {https://dblp.org/rec/journals/corr/LebretGA16.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Thanks to @alejandrocros for adding this dataset.