Dataset:

ubuntu_dialogs_corpus

Tasks:

conversational

Sub-tasks:

dialogue-generation

Languages:

Multilinguality:

monolingual

Size:

1M<n<10M

Language creators:

found

Annotations creators:

found

Source datasets:

original

ArXiv:

arxiv:1506.08909

License:

license:unknown

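Dataset Card for "ubuntu_dialogs_corpus"

Dataset Summary

The Ubuntu Dialogue Corpus is a dataset of almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. It provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets and the unstructured nature of interactions from microblog services such as Twitter.

Supported Tasks and Leaderboards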

More Information Needed

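Languages

More Information Needed

Dataset Structure

Data Instances

train

Size of downloaded dataset files: 0.00 MB
Size of the generated dataset: 65.49 MB
Total amount of disk used: 65.49 MB

An example of "train" looks as follows.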

This example was too long and was cropped:

{
    "Context": "\"i think we could import the old comment via rsync , but from there we need to go via email . i think it be easier than cach the...",
    "Label": 1,
    "Utterance": "basic each xfree86 upload will not forc user to upgrad 100mb of font for noth __eou__ no someth i do in my spare time . __eou__"
}
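The sample record above contains `__eou__` tokens inside its text. The following is a minimal sketch of how such a field might be split into individual utterances. It assumes that `__eou__` marks the end of one utterance, which this card does not state explicitly; `split_utterances` is a hypothetical helper, not part of any library.

```python
from typing import List

# Assumption: "__eou__" marks the end of one utterance inside a text field.
EOU = "__eou__"


def split_utterances(text: str) -> List[str]:
    """Split a field on the end-of-utterance marker and drop empty pieces."""
    return [part.strip() for part in text.split(EOU) if part.strip()]


# The "Utterance" value copied from the sample record.
utterance = (
    "basic each xfree86 upload will not forc user to upgrad 100mb of font "
    "for noth __eou__ no someth i do in my spare time . __eou__"
)

for piece in split_utterances(utterance):
    print(piece)
```

Stripping whitespace and dropping empty pieces avoids a trailing empty string when a field ends with the marker, as the sample does.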

Data Fields

The data fields are the same among all splits.

train

Context: a string feature.
Utterance: a string feature.
Label: an integer feature.
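As an illustration of this schema, the sketch below maps one raw record onto a typed Python object, using the field names that appear in the sample record (`Context`, `Utterance`, `Label`). The names `DialogueExample` and `parse_example` are hypothetical and introduced only for this example; the type checks are an assumption about how one might validate rows, not something the dataset itself enforces.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class DialogueExample:
    """One record: two string fields and an integer label."""
    context: str
    utterance: str
    label: int


def parse_example(row: Mapping[str, Any]) -> DialogueExample:
    """Validate a raw record and convert it into a DialogueExample."""
    context = row["Context"]
    utterance = row["Utterance"]
    label = row["Label"]
    if not isinstance(context, str) or not isinstance(utterance, str):
        raise TypeError("Context and Utterance must be strings")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(label, bool) or not isinstance(label, int):
        raise TypeError("Label must be an integer")
    return DialogueExample(context=context, utterance=utterance, label=label)


example = parse_example(
    {"Context": "some context __eou__", "Utterance": "a reply __eou__", "Label": 1}
)
print(example.label)
```

A frozen dataclass keeps parsed records immutable, which can help when the same examples are shared across preprocessing steps.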

Data Splits

name	train
train	127422

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information

@article{DBLP:journals/corr/LowePSP15,
  author    = {Ryan Lowe and
               Nissan Pow and
               Iulian Serban and
               Joelle Pineau},
  title     = {The Ubuntu Dialogue Corpus: {A} Large Dataset for Research in Unstructured
               Multi-Turn Dialogue Systems},
  journal   = {CoRR},
  volume    = {abs/1506.08909},
  year      = {2015},
  url       = {http://arxiv.org/abs/1506.08909},
  archivePrefix = {arXiv},
  eprint    = {1506.08909},
  timestamp = {Mon, 13 Aug 2018 16:48:23 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/LowePSP15.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contributions

Thanks to @thomwolf, @patrickvonplaten, @lewtun for adding this dataset.

Author:

Anonymous

Dataset size:

18.38 KB