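Dataset: empathetic_dialogues
Language:
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1811.00207
License: Creative Commons Attribution-NonCommercial 4.0 International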
This is the original PyTorch implementation of Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset.
An example from the 'train' split looks as follows.
{
"context": "sentimental",
"conv_id": "hit:0_conv:1",
"prompt": "I remember going to the fireworks with my best friend. There was a lot of people_comma_ but it only felt like us in the world.",
"selfeval": "5|5|5_2|2|5",
"speaker_idx": 1,
"tags": "",
"utterance": "I remember going to see the fireworks with my best friend. It was the first time we ever spent time alone together. Although there was a lot of people_comma_ we felt like the only people in the world.",
"utterance_idx": 1
}
The data fields are the same across all splits.
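
In the 'train' example above, commas inside text fields appear as the placeholder `_comma_`, and `selfeval` is a single string of digits separated by `|` and `_`. The following is a minimal sketch of how such a record might be cleaned up before use. The helper names `restore_commas` and `parse_selfeval` are hypothetical, and the card does not say what the individual `selfeval` numbers mean, so the sketch only splits the string structurally.

```python
from typing import List


def restore_commas(text: str) -> str:
    """Replace the `_comma_` placeholder with a literal comma."""
    return text.replace("_comma_", ",")


def parse_selfeval(selfeval: str) -> List[List[int]]:
    """Split a string such as '5|5|5_2|2|5' into groups of integers.

    Assumption: '_' separates groups and '|' separates values in a group.
    The meaning of each value is not documented in this card.
    """
    if not selfeval:
        return []
    return [[int(v) for v in group.split("|")] for group in selfeval.split("_")]


# Fields copied from the 'train' example above.
record = {
    "context": "sentimental",
    "conv_id": "hit:0_conv:1",
    "prompt": "I remember going to the fireworks with my best friend. "
              "There was a lot of people_comma_ but it only felt like us in the world.",
    "selfeval": "5|5|5_2|2|5",
    "speaker_idx": 1,
    "utterance_idx": 1,
}

clean_prompt = restore_commas(record["prompt"])
ratings = parse_selfeval(record["selfeval"])

print(clean_prompt)
print(ratings)
```

Keeping the cleanup in small pure functions makes it easy to apply the same step to both `prompt` and `utterance` in any split, since the fields are shared.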
default

| name | train | validation | test |
|---|---|---|---|
| default | 76673 | 12030 | 10943 |
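
As a quick sanity check, the split sizes in the table can be summed and compared against the `10K<n<100K` size category listed in the metadata. This is only an arithmetic sketch over the numbers shown above; the variable names are illustrative.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 76673, "validation": 12030, "test": 10943}

total = sum(split_sizes.values())
print(f"total: {total}")  # total: 99646

# Share of each split relative to the whole dataset.
shares = {name: n / total for name, n in split_sizes.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# The total falls inside the size category stated in the metadata.
in_size_category = 10_000 < total < 100_000
print(in_size_category)
```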
Creative Commons Attribution-NonCommercial 4.0 International.
@inproceedings{rashkin-etal-2019-towards,
title = "Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset",
author = "Rashkin, Hannah and
Smith, Eric Michael and
Li, Margaret and
Boureau, Y-Lan",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/P19-1534",
doi = "10.18653/v1/P19-1534",
pages = "5370--5381",
}
Thanks to @thomwolf, @patrickvonplaten, and @lewtun for adding this dataset.