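Dataset:

empathetic_dialogues

Tasks:

Conversational

Question Answering

Sub-tasks:

dialogue-generation open-domain-qa

Languages:

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

crowdsourced

Annotation creators:

crowdsourced

Source datasets:

original

Preprint:

arxiv:1811.00207

License:

cc-by-nc-4.0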


Dataset Card for "empathetic_dialogues"

Dataset Summary

This is the original PyTorch implementation of the paper Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset.

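Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

default

Size of downloaded dataset files: 28.02 MB
Size of the generated dataset: 25.13 MB
Total amount of disk used: 53.15 MB

An example of 'train' looks as follows.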

{
    "context": "sentimental",
    "conv_id": "hit:0_conv:1",
    "prompt": "I remember going to the fireworks with my best friend. There was a lot of people_comma_ but it only felt like us in the world.",
    "selfeval": "5|5|5_2|2|5",
    "speaker_idx": 1,
    "tags": "",
    "utterance": "I remember going to see the fireworks with my best friend. It was the first time we ever spent time alone together. Although there was a lot of people_comma_ we felt like the only people in the world.",
    "utterance_idx": 1
}
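
The text fields in the example above contain the literal token `_comma_` where a comma would normally appear. The following is a minimal sketch, not part of the original card, of how such a record might be cleaned before use. The helper name `restore_commas` and the choice of which fields to clean are assumptions made for illustration.

```python
# Illustrative sketch only: `restore_commas` is a hypothetical helper,
# not a function shipped with the dataset.

# Assumption: only these two fields carry free text that needs cleaning.
TEXT_FIELDS = ("prompt", "utterance")


def restore_commas(record: dict) -> dict:
    """Return a copy of `record` with the `_comma_` placeholder replaced by ','."""
    cleaned = dict(record)  # shallow copy so the input record is left untouched
    for field in TEXT_FIELDS:
        cleaned[field] = cleaned[field].replace("_comma_", ",")
    return cleaned


# The 'train' example shown above, reproduced verbatim.
example = {
    "context": "sentimental",
    "conv_id": "hit:0_conv:1",
    "prompt": "I remember going to the fireworks with my best friend. There was a lot of people_comma_ but it only felt like us in the world.",
    "selfeval": "5|5|5_2|2|5",
    "speaker_idx": 1,
    "tags": "",
    "utterance": "I remember going to see the fireworks with my best friend. It was the first time we ever spent time alone together. Although there was a lot of people_comma_ we felt like the only people in the world.",
    "utterance_idx": 1,
}

cleaned = restore_commas(example)
print(cleaned["prompt"])
```

Returning a copy rather than editing in place keeps the raw record available, which can be convenient when comparing cleaned text against the stored form.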

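Data Fields

The data fields are the same among all splits.

default

conv_id: a string feature.
utterance_idx: an int32 feature.
context: a string feature.
prompt: a string feature.
speaker_idx: an int32 feature.
utterance: a string feature.
selfeval: a string feature.
tags: a string feature.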
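
As a rough illustration of the schema listed above, the sketch below models one row as a Python dataclass and regroups rows into conversations. It assumes, without confirmation from this card, that rows sharing a `conv_id` belong to one conversation and that `utterance_idx` gives the turn order. The class name `Utterance`, the function `group_by_conversation`, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One row, with the eight fields listed in the card (int32 mapped to int)."""
    conv_id: str
    utterance_idx: int
    context: str
    prompt: str
    speaker_idx: int
    utterance: str
    selfeval: str
    tags: str


def group_by_conversation(rows):
    """Group rows by conv_id and sort each group by utterance_idx.

    Assumption: utterance_idx reflects turn order within a conversation.
    """
    conversations = defaultdict(list)
    for row in rows:
        conversations[row.conv_id].append(row)
    return {
        conv_id: sorted(turns, key=lambda u: u.utterance_idx)
        for conv_id, turns in conversations.items()
    }


# Made-up placeholder rows, deliberately out of order; not real dataset content.
rows = [
    Utterance("conv_a", 2, "sentimental", "placeholder prompt", 0, "second turn", "", ""),
    Utterance("conv_a", 1, "sentimental", "placeholder prompt", 1, "first turn", "", ""),
    Utterance("conv_b", 1, "sentimental", "another prompt", 1, "other conversation", "", ""),
]

conversations = group_by_conversation(rows)
for conv_id, turns in conversations.items():
    print(conv_id, [t.utterance for t in turns])
```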

Data Splits

name	train	validation	test
default	76673	12030	10943
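
To put the split sizes in the table above in proportion, the short sketch below sums the three row counts and prints each split's share of the total. The variable names are illustrative; the numbers are taken directly from the table.

```python
# Row counts copied from the splits table for the "default" configuration.
split_sizes = {"train": 76673, "validation": 12030, "test": 10943}

total = sum(split_sizes.values())
shares = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count} rows ({shares[name]:.1%} of all rows)")
print(f"total: {total} rows")
```

The total of 99646 rows is consistent with the 10K<n<100K size category listed in the card's metadata.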

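Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Additional Information

Dataset Curators

More Information Needed

Licensing Information

Creative Commons Attribution-NonCommercial 4.0 International.

Citation Information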

@inproceedings{rashkin-etal-2019-towards,
    title = "Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset",
    author = "Rashkin, Hannah  and
      Smith, Eric Michael  and
      Li, Margaret  and
      Boureau, Y-Lan",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1534",
    doi = "10.18653/v1/P19-1534",
    pages = "5370--5381",
}

Contributions

Thanks to @thomwolf, @patrickvonplaten, @lewtun for adding this dataset.

Author:

Anonymous

Dataset size:

14.4 KB
