HebEMO - Emotion Recognition Model for Modern Hebrew

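HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique Covid-19 related dataset that we collected and annotated.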

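HebEMO achieved high performance on polarity classification, with a weighted average F1-score of 0.96. Emotion detection reached F1-scores of 0.78-0.97, with the exception of surprise, which the model failed to capture (F1 = 0.41). These results are better than the best reported performance, even when compared to the English language.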

Emotion UGC Data Description

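Our UGC data consists of comments posted on news articles from 3 major Israeli news sites, collected between January 2020 and August 2020. The total size of the data is about 150 MB, comprising over 7 million words and 350K sentences. About 2,000 sentences were annotated, each by 3-10 annotators, for overall sentiment (polarity) and for eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust. The percentage of sentences in which each emotion appeared is given in the table below.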

Performance

Emotion Recognition

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

The above metrics are for the positive class (that is, the emotion is reflected in the text).
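As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the first column of the table can be approximately recomputed from the other two. The sketch below does this with the rounded values copied from the table. The names `reported` and `f1_from` are illustrative and not part of HebEMO, and small differences (around 0.01) are expected because the inputs are already rounded to two decimals.

```python
# Rounded values copied from the emotion recognition table: (f1, precision, recall).
reported = {
    "anger": (0.96, 0.99, 0.93),
    "disgust": (0.97, 0.98, 0.96),
    "anticipation": (0.82, 0.80, 0.87),
    "fear": (0.79, 0.88, 0.72),
    "joy": (0.90, 0.97, 0.84),
    "sadness": (0.90, 0.86, 0.94),
    "surprise": (0.40, 0.44, 0.37),
    "trust": (0.83, 0.86, 0.80),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 per emotion from the rounded precision and recall.
recomputed = {emotion: f1_from(p, r) for emotion, (_, p, r) in reported.items()}

for emotion, (f1, _, _) in reported.items():
    print(f"{emotion:<13} reported={f1:.2f} recomputed={recomputed[emotion]:.3f}")
```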

Sentiment (Polarity) Analysis

class	precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy			0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96
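The macro average row is the unweighted mean of the three per-class values, which is why the low recall on the neutral class pulls macro recall down to 0.82 while the weighted average stays at 0.97. A minimal sketch that recomputes the macro row from the rounded per-class values is shown below; `per_class` is an illustrative name. The weighted average is not recomputed because the per-class support counts are not listed here.

```python
from statistics import mean

# Per-class values copied from the polarity table: (precision, recall, f1).
per_class = {
    "neutral": (0.83, 0.56, 0.67),
    "positive": (0.96, 0.92, 0.94),
    "negative": (0.97, 0.99, 0.98),
}

# zip(*...) regroups the tuples by metric, so each mean runs over the classes.
macro_precision, macro_recall, macro_f1 = (
    mean(values) for values in zip(*per_class.values())
)

print(
    f"macro avg: precision={macro_precision:.2f} "
    f"recall={macro_recall:.2f} f1={macro_f1:.2f}"
)
```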

The sentiment (polarity) analysis model is also available on AWS! For more information, visit AWS' git.

How to use

Emotion Recognition Model

An online version of the model can be found on huggingface spaces or as a colab notebook.

# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()

HebEMO_model.hebemo(input_path = 'data/text_example.txt')
# returns the analysis as a pandas.DataFrame

hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)

For the sentiment classification model only (polarity only):

from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis") #same as 'avichr/heBERT' tokenizer
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")

# how to use?
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores = True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')	
>>>  [[{'label': 'neutral', 'score': 0.9978172183036804},
>>>  {'label': 'positive', 'score': 0.0014792329166084528},
>>>  {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')
>>>  [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>>  {'label': 'positive', 'score': 0.9994067549705505},
>>>  {'label': 'negative', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')
>>>  [[{'label': 'neutral', 'score': 9.214012970915064e-05},
>>>  {'label': 'positive', 'score': 8.876807987689972e-05},
>>>  {'label': 'negative', 'score': 0.9998190999031067}]]
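With `return_all_scores = True`, the pipeline returns one inner list per input text, holding a score for every label. If only the predicted class is needed, the highest-scoring entry can be picked out. The sketch below works on a copy of the first example output, so it runs without downloading the model; `top_label` is a hypothetical helper and not part of HeBERT or transformers.

```python
from typing import Dict, List, Union

Score = Dict[str, Union[str, float]]


def top_label(scores: List[Score]) -> str:
    """Return the label with the highest score for one input text."""
    return max(scores, key=lambda item: item["score"])["label"]


# Output copied from the first example (one inner list per input text).
example_output = [[
    {"label": "neutral", "score": 0.9978172183036804},
    {"label": "positive", "score": 0.0014792329166084528},
    {"label": "negative", "score": 0.0007035882445052266},
]]

predictions = [top_label(scores) for scores in example_output]
print(predictions)  # ['neutral']
```

The same helper can be applied directly to the list returned by `sentiment_analysis(...)`, assuming the output keeps the nested structure shown in the examples.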

Contact us

Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا

If you use this model, please cite our paper:

Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}

Author:

avi chr

Dataset size:

418.03 MB