Datasets:

shunk031/wrime

Languages:

ja

Multilinguality:

monolingual

Language Creators:

crowdsourced

Annotations Creators:

crowdsourced

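Dataset Card for WRIME

Dataset Summary

In this study, we introduce WRIME, a new dataset for emotional intensity estimation. We collect both the subjective emotional intensity reported by the writers themselves and the objective intensity annotated by readers, and explore the differences between the two. For data collection, we hired 50 participants via a crowdsourcing service. They annotated their own past posts on a social networking service (SNS) with subjective emotional intensity. We also hired 3 annotators, who annotated all posts with objective emotional intensity. As a result, our Japanese emotion analysis dataset consists of 17,000 posts with both subjective and objective emotional intensities for Plutchik's eight emotions (Plutchik, 1980), given on a four-point scale (none, weak, medium, and strong).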

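Supported Tasks and Leaderboards

[More Information Needed]

Languages

  • Japanese

Dataset Structure

Data Instances

When loading a specific configuration, pass the version as the name argument ("ver1" or "ver2"):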

from datasets import load_dataset

dataset = load_dataset("shunk031/wrime", name="ver1")

print(dataset)
# DatasetDict({
#     train: Dataset({
#         features: ['sentence', 'user_id', 'datetime', 'writer', 'reader1', 'reader2', 'reader3', 'avg_readers'],
#         num_rows: 40000
#     })
#     validation: Dataset({
#         features: ['sentence', 'user_id', 'datetime', 'writer', 'reader1', 'reader2', 'reader3', 'avg_readers'],
#         num_rows: 1200
#     })
#     test: Dataset({
#         features: ['sentence', 'user_id', 'datetime', 'writer', 'reader1', 'reader2', 'reader3', 'avg_readers'],
#         num_rows: 2000
#     })
# })
Ver. 1

An example looks as follows:

{
    "sentence": "ぼけっとしてたらこんな時間。チャリあるから食べにでたいのに…",
    "user_id": "1",
    "datetime": "2012/07/31 23:48",
    "writer": {
        "joy": 0,
        "sadness": 1,
        "anticipation": 2,
        "surprise": 1,
        "anger": 1,
        "fear": 0,
        "disgust": 0,
        "trust": 1
    },
    "reader1": {
        "joy": 0,
        "sadness": 2,
        "anticipation": 0,
        "surprise": 0,
        "anger": 0,
        "fear": 0,
        "disgust": 0,
        "trust": 0
    },
    "reader2": {
        "joy": 0,
        "sadness": 2,
        "anticipation": 0,
        "surprise": 1,
        "anger": 0,
        "fear": 0,
        "disgust": 0,
        "trust": 0
    },
    "reader3": {
        "joy": 0,
        "sadness": 2,
        "anticipation": 0,
        "surprise": 0,
        "anger": 0,
        "fear": 1,
        "disgust": 1,
        "trust": 0
    },
    "avg_readers": {
        "joy": 0,
        "sadness": 2,
        "anticipation": 0,
        "surprise": 0,
        "anger": 0,
        "fear": 0,
        "disgust": 0,
        "trust": 0
    }
}
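
To make the writer/reader difference concrete, the following minimal sketch computes the per-emotion gap between the writer's labels and the averaged reader labels for the record above. The helper name intensity_gap and the hard-coded record (only the two relevant fields, copied from the example) are illustrative assumptions, not part of the dataset's API.

```python
# Emotion keys as they appear in the example record above.
EMOTIONS = ["joy", "sadness", "anticipation", "surprise",
            "anger", "fear", "disgust", "trust"]

# Only the fields needed here, copied from the example record.
record = {
    "writer": {"joy": 0, "sadness": 1, "anticipation": 2, "surprise": 1,
               "anger": 1, "fear": 0, "disgust": 0, "trust": 1},
    "avg_readers": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                    "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
}


def intensity_gap(example):
    """Hypothetical helper: writer label minus avg_readers label, per emotion."""
    return {e: example["writer"][e] - example["avg_readers"][e] for e in EMOTIONS}


gap = intensity_gap(record)
print(gap)
```

A positive value means the writer reported a stronger emotion than the readers perceived on average. In this record the largest gap is for anticipation, which the writer rated 2 and the readers rated 0.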
Ver. 2

An example looks as follows:

{
    "sentence": "ぼけっとしてたらこんな時間。チャリあるから食べにでたいのに…", 
    "user_id": "1", 
    "datetime": "2012/7/31 23:48", 
    "writer": {
        "joy": 0, 
        "sadness": 1, 
        "anticipation": 2, 
        "surprise": 1, 
        "anger": 1, 
        "fear": 0, 
        "disgust": 0, 
        "trust": 1, 
        "sentiment": 0
    }, 
    "reader1": {
        "joy": 0, 
        "sadness": 2, 
        "anticipation": 0, 
        "surprise": 0, 
        "anger": 0, 
        "fear": 0, 
        "disgust": 0, 
        "trust": 0, 
        "sentiment": -2
    }, 
    "reader2": {
        "joy": 0, 
        "sadness": 2, 
        "anticipation": 0, 
        "surprise": 0, 
        "anger": 0, 
        "fear": 1, 
        "disgust": 1, 
        "trust": 0, 
        "sentiment": -1
    }, 
    "reader3": {
        "joy": 0, 
        "sadness": 2, 
        "anticipation": 0, 
        "surprise": 1, 
        "anger": 0, 
        "fear": 0, 
        "disgust": 0, 
        "trust": 0, 
        "sentiment": -1
    }, 
    "avg_readers": {
        "joy": 0, 
        "sadness": 2, 
        "anticipation": 0, 
        "surprise": 0, 
        "anger": 0, 
        "fear": 0, 
        "disgust": 0, 
        "trust": 0, 
        "sentiment": -1
    }
}
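
The avg_readers values in the record directly above (the one with sentiment fields) are consistent with taking the mean of the three reader labels and rounding to the nearest integer. The sketch below checks this for that one record. The rounding rule is an assumption inferred from this single example, not something the dataset card states, and rounded_reader_mean is a hypothetical helper.

```python
# Reader annotations copied from the example record that includes "sentiment".
readers = {
    "reader1": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                "anger": 0, "fear": 0, "disgust": 0, "trust": 0, "sentiment": -2},
    "reader2": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                "anger": 0, "fear": 1, "disgust": 1, "trust": 0, "sentiment": -1},
    "reader3": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 1,
                "anger": 0, "fear": 0, "disgust": 0, "trust": 0, "sentiment": -1},
}

# The avg_readers field from the same record.
avg_readers = {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
               "anger": 0, "fear": 0, "disgust": 0, "trust": 0, "sentiment": -1}


def rounded_reader_mean(reader_labels):
    """Hypothetical helper: mean over readers, rounded to the nearest integer."""
    keys = next(iter(reader_labels.values())).keys()
    n = len(reader_labels)
    return {k: round(sum(r[k] for r in reader_labels.values()) / n) for k in keys}


computed = rounded_reader_mean(readers)
print(computed == avg_readers)
```

With three readers the mean is always a multiple of 1/3, so it can never land exactly on .5 and the choice of tie-breaking rule does not affect the result.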

Data Fields

Ver. 1
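  • sentence: post text
  • user_id: user ID
  • datetime: date and time of the post
  • writer: subjective labels (writer)
    • joy: subjective intensity of joy
    • sadness: subjective intensity of sadness
    • anticipation: subjective intensity of anticipation
    • surprise: subjective intensity of surprise
    • anger: subjective intensity of anger
    • fear: subjective intensity of fear
    • disgust: subjective intensity of disgust
    • trust: subjective intensity of trust
  • reader1: objective labels A (reader A)
    • joy: intensity of joy according to reader A
    • sadness: intensity of sadness according to reader A
    • anticipation: intensity of anticipation according to reader A
    • surprise: intensity of surprise according to reader A
    • anger: intensity of anger according to reader A
    • fear: intensity of fear according to reader A
    • disgust: intensity of disgust according to reader A
    • trust: intensity of trust according to reader A
  • reader2: objective labels B (reader B)
    • joy: intensity of joy according to reader B
    • sadness: intensity of sadness according to reader B
    • anticipation: intensity of anticipation according to reader B
    • surprise: intensity of surprise according to reader B
    • anger: intensity of anger according to reader B
    • fear: intensity of fear according to reader B
    • disgust: intensity of disgust according to reader B
    • trust: intensity of trust according to reader B
  • reader3: objective labels C (reader C)
    • joy: intensity of joy according to reader C
    • sadness: intensity of sadness according to reader C
    • anticipation: intensity of anticipation according to reader C
    • surprise: intensity of surprise according to reader C
    • anger: intensity of anger according to reader C
    • fear: intensity of fear according to reader C
    • disgust: intensity of disgust according to reader C
    • trust: intensity of trust according to reader C
  • avg_readers
    • joy: average intensity of joy over readers A, B, and C
    • sadness: average intensity of sadness over readers A, B, and C
    • anticipation: average intensity of anticipation over readers A, B, and C
    • surprise: average intensity of surprise over readers A, B, and C
    • anger: average intensity of anger over readers A, B, and C
    • fear: average intensity of fear over readers A, B, and C
    • disgust: average intensity of disgust over readers A, B, and C
    • trust: average intensity of trust over readers A, B, and C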
Ver. 2
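  • sentence: post text
  • user_id: user ID
  • datetime: date and time of the post
  • writer: subjective labels (writer)
    • joy: subjective intensity of joy
    • sadness: subjective intensity of sadness
    • anticipation: subjective intensity of anticipation
    • surprise: subjective intensity of surprise
    • anger: subjective intensity of anger
    • fear: subjective intensity of fear
    • disgust: subjective intensity of disgust
    • trust: subjective intensity of trust
    • sentiment: subjective sentiment polarity
  • reader1: objective labels A (reader A)
    • joy: intensity of joy according to reader A
    • sadness: intensity of sadness according to reader A
    • anticipation: intensity of anticipation according to reader A
    • surprise: intensity of surprise according to reader A
    • anger: intensity of anger according to reader A
    • fear: intensity of fear according to reader A
    • disgust: intensity of disgust according to reader A
    • trust: intensity of trust according to reader A
    • sentiment: sentiment polarity according to reader A
  • reader2: objective labels B (reader B)
    • joy: intensity of joy according to reader B
    • sadness: intensity of sadness according to reader B
    • anticipation: intensity of anticipation according to reader B
    • surprise: intensity of surprise according to reader B
    • anger: intensity of anger according to reader B
    • fear: intensity of fear according to reader B
    • disgust: intensity of disgust according to reader B
    • trust: intensity of trust according to reader B
    • sentiment: sentiment polarity according to reader B
  • reader3: objective labels C (reader C)
    • joy: intensity of joy according to reader C
    • sadness: intensity of sadness according to reader C
    • anticipation: intensity of anticipation according to reader C
    • surprise: intensity of surprise according to reader C
    • anger: intensity of anger according to reader C
    • fear: intensity of fear according to reader C
    • disgust: intensity of disgust according to reader C
    • trust: intensity of trust according to reader C
    • sentiment: sentiment polarity according to reader C
  • avg_readers
    • joy: average intensity of joy over readers A, B, and C
    • sadness: average intensity of sadness over readers A, B, and C
    • anticipation: average intensity of anticipation over readers A, B, and C
    • surprise: average intensity of surprise over readers A, B, and C
    • anger: average intensity of anger over readers A, B, and C
    • fear: average intensity of fear over readers A, B, and C
    • disgust: average intensity of disgust over readers A, B, and C
    • trust: average intensity of trust over readers A, B, and C
    • sentiment: average sentiment polarity over readers A, B, and C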

Data Splits

name  train   validation  test
ver1  40,000  1,200       2,000
ver2  30,000  2,500       2,500
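
As a quick sanity check on the split sizes above, the totals and per-split shares can be computed directly. The numbers are copied from the table; SPLITS and totals are just illustrative names.

```python
# Split sizes copied from the table above.
SPLITS = {
    "ver1": {"train": 40_000, "validation": 1_200, "test": 2_000},
    "ver2": {"train": 30_000, "validation": 2_500, "test": 2_500},
}

# Total number of posts per configuration.
totals = {name: sum(sizes.values()) for name, sizes in SPLITS.items()}

for name, sizes in SPLITS.items():
    # Share of each split within its configuration, as a percentage string.
    shares = {split: f"{n / totals[name]:.1%}" for split, n in sizes.items()}
    print(name, totals[name], shares)
```

The ver2 total of 35,000 matches the number of posts reported in the abstract of Suzuki et al. (2022), while ver1 sums to 43,200 posts.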

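Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

From the README on GitHub:

  • The dataset is available for research purposes only.
  • Redistribution of the dataset is prohibited.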

Citation Information

@inproceedings{kajiwara-etal-2021-wrime,
    title = "{WRIME}: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations",
    author = "Kajiwara, Tomoyuki  and
      Chu, Chenhui  and
      Takemura, Noriko  and
      Nakashima, Yuta  and
      Nagahara, Hajime",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.naacl-main.169",
    doi = "10.18653/v1/2021.naacl-main.169",
    pages = "2095--2104",
    abstract = "We annotate 17,000 SNS posts with both the writer{'}s subjective emotional intensity and the reader{'}s objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer and that of the readers with this dataset. We found that the reader cannot fully detect the emotions of the writer, especially anger and trust. In addition, experimental results in estimating the emotional intensity show that it is more difficult to estimate the writer{'}s subjective labels than the readers{'}. The large gap between the subjective and objective emotions imply the complexity of the mapping from a post to the subjective emotion intensities, which also leads to a lower performance with machine learning models.",
}
@inproceedings{suzuki-etal-2022-japanese,
    title = "A {J}apanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain",
    author = "Suzuki, Haruya  and
      Miyauchi, Yuto  and
      Akiyama, Kazuki  and
      Kajiwara, Tomoyuki  and
      Ninomiya, Takashi  and
      Takemura, Noriko  and
      Nakashima, Yuta  and
      Nagahara, Hajime",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.759",
    pages = "7022--7028",
    abstract = "We annotate 35,000 SNS posts with both the writer{'}s subjective sentiment polarity labels and the reader{'}s objective ones to construct a Japanese sentiment analysis dataset. Our dataset includes intensity labels (\textit{none}, \textit{weak}, \textit{medium}, and \textit{strong}) for each of the eight basic emotions by Plutchik (\textit{joy}, \textit{sadness}, \textit{anticipation}, \textit{surprise}, \textit{anger}, \textit{fear}, \textit{disgust}, and \textit{trust}) as well as sentiment polarity labels (\textit{strong positive}, \textit{positive}, \textit{neutral}, \textit{negative}, and \textit{strong negative}). Previous studies on emotion analysis have studied the analysis of basic emotions and sentiment polarity independently. In other words, there are few corpora that are annotated with both basic emotions and sentiment polarity. Our dataset is the first large-scale corpus to annotate both of these emotion labels, and from both the writer{'}s and reader{'}s perspectives. In this paper, we analyze the relationship between basic emotion intensity and sentiment polarity on our dataset and report the results of benchmarking sentiment polarity classification.",
}

Contributions

Thanks to @moguranosenshi for creating this dataset.