HebEMO - Emotion Recognition Model for Modern Hebrew

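HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique Covid-19 related dataset that we collected and annotated.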

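HebEMO achieved a weighted average F1-score of 0.96 for polarity classification. Emotion detection reached F1-scores of 0.78-0.97, with the exception of surprise, which the model captured poorly (F1 = 0.41). These results exceed the best reported performance, even when compared to English.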

Emotion UGC Data Description

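Our UGC data consists of comments posted on news articles, collected from 3 major Israeli news sites between January 2020 and August 2020. The total size of the data is about 150 MB, including over 7 million words and 350K sentences. About 2000 sentences were annotated by crowd members (3-10 annotators per sentence) for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise and trust. The percentage of sentences in which each emotion appears is given in the table below.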

anger	disgust	anticipation	fear	joy	sadness	surprise	trust	sentiment
ratio	0.78	0.83	0.58	0.45	0.12	0.59	0.17	0.11	0.25
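As a rough illustration of how per-emotion ratios like those in the table above can be computed, here is a minimal sketch. It assumes each sentence already carries one aggregated 0/1 label per emotion; how the 3-10 annotator votes were combined is not described here, so that step is left out. The toy rows and the helper name `emotion_ratios` are hypothetical and are not part of the HebEMO data or code.

```python
# Hypothetical toy annotations: one dict per sentence, 1 = emotion present.
# These four rows are invented for illustration; they are not HebEMO data.
annotations = [
    {"anger": 1, "fear": 0, "trust": 0},
    {"anger": 1, "fear": 1, "trust": 0},
    {"anger": 0, "fear": 1, "trust": 0},
    {"anger": 1, "fear": 0, "trust": 1},
]


def emotion_ratios(rows):
    """Return the share of sentences in which each emotion is marked present."""
    emotions = rows[0].keys()
    return {e: sum(r[e] for r in rows) / len(rows) for e in emotions}


ratios = emotion_ratios(annotations)
print(ratios)
```

Because a sentence can express several emotions at once, the ratios are computed independently per emotion and do not need to sum to 1.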

Performance

Emotion Recognition

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

The above metrics are for the positive class (i.e., the emotion is reflected in the text).
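For reference, F1 is the harmonic mean of precision and recall. The sketch below recomputes F1 from the precision and recall columns of the emotion table and prints it next to the reported value. The helper name `f1_score` is ours. Because the table values are rounded to two decimals, the recomputed score can differ from the reported one by about 0.01 (for example, for anticipation).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the emotion table above.
reported = {
    "anger": (0.99, 0.93, 0.96),
    "disgust": (0.98, 0.96, 0.97),
    "anticipation": (0.80, 0.87, 0.82),
    "fear": (0.88, 0.72, 0.79),
    "joy": (0.97, 0.84, 0.90),
    "sadness": (0.86, 0.94, 0.90),
    "surprise": (0.44, 0.37, 0.40),
    "trust": (0.86, 0.80, 0.83),
}

for emotion, (p, r, f1) in reported.items():
    print(f"{emotion:12s} recomputed={f1_score(p, r):.3f} reported={f1:.2f}")
```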

Sentiment (Polarity) Analysis

precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy	0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96
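The "macro avg" row is the unweighted mean of each metric over the three classes, which is why it is pulled down by the weaker neutral class. A minimal sketch that recomputes it from the per-class rows above follows; the helper name `macro_average` is ours. The "weighted avg" row cannot be recomputed the same way, because it needs the number of test examples per class, which is not listed here.

```python
# Per-class (precision, recall, f1) copied from the polarity table above.
per_class = {
    "neutral": (0.83, 0.56, 0.67),
    "positive": (0.96, 0.92, 0.94),
    "negative": (0.97, 0.99, 0.98),
}


def macro_average(rows):
    """Unweighted mean of each metric column across all classes."""
    n = len(rows)
    columns = zip(*rows.values())
    return tuple(sum(col) / n for col in columns)


macro_p, macro_r, macro_f1 = macro_average(per_class)
print(f"macro avg  precision={macro_p:.2f} recall={macro_r:.2f} f1={macro_f1:.2f}")
```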

The sentiment (polarity) analysis model is also available on AWS! For more information, visit AWS' git.

How to use

Emotion Recognition Model

An online model is available on huggingface spaces or as a colab notebook.

# Run in a notebook (Colab/Jupyter); lines starting with "!" are shell commands.
# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()

# Analyze a text file; returns the analysis as a pandas.DataFrame
HebEMO_model.hebemo(input_path = 'data/text_example.txt')

# Analyze a single string, with plot=True
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)

For the sentiment classification model (polarity only):

from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis") #same as 'avichr/heBERT' tokenizer
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")

# Build a sentiment-analysis pipeline that returns the scores of all labels
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores = True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')	
>>>  [[{'label': 'neutral', 'score': 0.9978172183036804},
>>>  {'label': 'positive', 'score': 0.0014792329166084528},
>>>  {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')
>>>  [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>>  {'label': 'positive', 'score': 0.9994067549705505},
>>>  {'label': 'negative', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')
>>>  [[{'label': 'neutral', 'score': 9.214012970915064e-05}, 
>>>  {'label': 'positive', 'score': 8.876807987689972e-05},
>>>  {'label': 'negative', 'score': 0.9998190999031067}]]
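With all scores returned, the pipeline output is a list with one inner list per input text, each holding one dict per label. If only the predicted class is needed, something like the following sketch can pick it out. It runs on a copy of the first sample output above, so it does not download the model; `top_label` is a hypothetical helper name and not part of transformers or HeBERT.

```python
def top_label(scores):
    """Return (label, score) for the highest-scoring class of one input."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Copy of the first sample output above: one inner list per input text.
sample_output = [[
    {"label": "neutral", "score": 0.9978172183036804},
    {"label": "positive", "score": 0.0014792329166084528},
    {"label": "negative", "score": 0.0007035882445052266},
]]

predictions = [top_label(scores) for scores in sample_output]
print(predictions)
```

The same list comprehension works when the pipeline is called with a list of texts, since each input gets its own inner list of scores.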

Contact us

Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا

If you use this model, please cite us:

Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}

Author:

avi chr

Dataset size:

418.03 MB