HebEMO - 现代希伯来语情绪识别模型

HebEMO是一种工具，可以从现代希伯来语用户生成的内容（UGC）中检测极性并提取情绪。我们在一个独特的与Covid-19相关的数据集上进行了训练和注释。

HebEMO在极性分类方面达到了0.96的加权平均F1分数的高性能。情绪检测的F1分数为0.78-0.97，仅对惊讶情绪无法捕捉（F1 = 0.41）。这些结果甚至比英语的最佳性能更好。

情绪UGC数据描述

我们的UGC数据包括从2020年1月至2020年8月收集的三个主要以色列新闻网站上的文章评论。数据总大小约为150MB，包括700万个词和35万个句子。其中约有2000个句子由众包成员（每个句子有3-10个注释者）注释了整体情绪（极性）以及惊讶、愤怒、厌恶、期待、恐惧、喜悦、悲伤、惊讶和信任等方面的情绪。每种情绪出现于句子中的百分比详见下表。

anger	disgust	expectation	fear	happy	sadness	surprise	trust	sentiment
ratio	0.78	0.83	0.58	0.45	0.12	0.59	0.17	0.11	0.25

性能

情绪识别

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

上述指标适用于积极类别（即文本中反映的情绪）。

情感（极性）分析

precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy	0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96

情感（极性）分析模型也可以在AWS上使用！有关更多信息，请访问 AWS' git 。

如何使用

情绪识别模型

在线模型请参见 huggingface spaces 或者使用 colab notebook 。

# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()

HebEMO_model.hebemo(input_path = 'data/text_example.txt')
# return analyzed pandas.DataFrame  

hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)

情感分类模型（仅限极性）：

from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis") #same as 'avichr/heBERT' tokenizer
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")

# how to use?
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores = True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')	
>>>  [[{'label': 'neutral', 'score': 0.9978172183036804},
>>>  {'label': 'positive', 'score': 0.0014792329166084528},
>>>  {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')
>>>  [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>>  {'label': 'possitive', 'score': 0.9994067549705505},
>>>  {'label': 'negetive', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')
>>>  [[{'label': 'neutral', 'score': 9.214012970915064e-05}, 
>>>  {'label': 'possitive', 'score': 8.876807987689972e-05}, 
>>>  {'label': 'negetive', 'score': 0.9998190999031067}]]

联系我们

Avichay Chriqui Inbal Yahav The Coller Semitic Languages AI Lab 谢谢, תודה, شكرا

如果您使用了此模型，请引用我们：

Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}

作者:

avi chr

数据集大小:

417.74 MB