Datasets:

dair-ai/emotion

Languages:

en

Multilinguality:

monolingual

Size categories:

10K<n<100K

Language creators:

machine-generated

Annotations creators:

machine-generated

Source datasets:

original

License:

other

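Dataset Card for "emotion"

Dataset Summary

Emotion is a dataset of English Twitter messages labeled with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detailed information, please refer to the accompanying paper.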

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure

Data Instances

An example looks as follows.

{
  "text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
  "label": 0
}

Data Fields

The data fields are:

  • text: a string feature.
  • label: a classification label, with possible values sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5).
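
As a minimal sketch of how the integer labels above can be turned back into emotion names, the snippet below hard-codes the mapping listed in this card. The names `LABEL_NAMES` and `decode_label` are hypothetical helpers introduced for illustration and are not part of the dataset or of any library.

```python
# Emotion names indexed by label id, copied from the data fields listed above.
# LABEL_NAMES and decode_label are illustrative names, not part of the dataset.
LABEL_NAMES = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def decode_label(label_id: int) -> str:
    """Return the emotion name for an integer label id."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unknown label id: {label_id}")
    return LABEL_NAMES[label_id]


# The data instance shown earlier in this card.
example = {
    "text": "im feeling quite sad and sorry for myself but ill snap out of it soon",
    "label": 0,
}

print(decode_label(example["label"]))
```

For the instance above, label 0 corresponds to sadness, which matches the content of the text.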

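Data Splits

The dataset has two configurations:

  • split: 20,000 examples in total, divided into train, validation, and test splits.
  • unsplit: 416,809 examples in a single train split.

name     train   validation  test
split    16000   2000        2000
unsplit  416809  n/a         n/a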
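
The sketch below shows one way to load a configuration and compare its split sizes with the table above. It assumes the Hugging Face `datasets` library is installed and that the dataset is reachable under the repository id `dair-ai/emotion` with configuration names `split` and `unsplit`, as listed in this card. `EXPECTED_ROWS`, `mismatched_splits`, and `load_and_check` are hypothetical helpers, and the hosted data may have changed since these counts were recorded.

```python
# Row counts copied from the table above; treated as expectations, not guarantees.
EXPECTED_ROWS = {
    "split": {"train": 16000, "validation": 2000, "test": 2000},
    "unsplit": {"train": 416809},
}


def mismatched_splits(config_name, actual_rows):
    """Return the sorted names of splits whose row count differs from the table."""
    expected = EXPECTED_ROWS[config_name]
    names = set(expected) | set(actual_rows)
    return sorted(n for n in names if expected.get(n) != actual_rows.get(n))


def load_and_check(config_name):
    """Download one configuration and report any split-size mismatches.

    Needs network access and the `datasets` package (an assumption of this sketch).
    """
    from datasets import load_dataset

    dataset = load_dataset("dair-ai/emotion", config_name)
    actual_rows = {name: split.num_rows for name, split in dataset.items()}
    return dataset, mismatched_splits(config_name, actual_rows)
```

Calling `load_and_check("split")` returns the loaded dataset together with a list of split names that disagree with the table. An empty list means the downloaded sizes match.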

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

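Licensing Information

The dataset should be used for educational and research purposes only.

Citation Information

If you use this dataset, please cite: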

@inproceedings{saravia-etal-2018-carer,
    title = "{CARER}: Contextualized Affect Representations for Emotion Recognition",
    author = "Saravia, Elvis  and
      Liu, Hsien-Chi Toby  and
      Huang, Yen-Hao  and
      Wu, Junlin  and
      Chen, Yi-Shin",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1404",
    doi = "10.18653/v1/D18-1404",
    pages = "3687--3697",
    abstract = "Emotions are expressed in nuanced ways, which varies by collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion, as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.",
}

Contributions

Thanks to @lhoestq, @thomwolf, and @lewtun for adding this dataset.