HebEMO - Emotion Recognition Model for Modern Hebrew

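HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique COVID-19-related dataset that we collected and annotated.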

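HebEMO achieves high performance on polarity classification, with a weighted-average F1 score of 0.96. Emotion detection reaches F1 scores of 0.78-0.97, with the exception of surprise, which the model failed to capture (F1 = 0.41). These results are better than the best reported performance, even when compared with English.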

Emotion UGC Data Description

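Our UGC data consists of comments posted on news articles, collected from 3 major Israeli news sites between January 2020 and August 2020. The total size of the data is about 150 MB, comprising 7 million words and 350K sentences. Crowd members annotated about 2000 sentences for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust. The percentage of sentences in which each emotion appears is given in the table below.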

	anger	disgust	expectation	fear	happy	sadness	surprise	trust	sentiment
ratio	0.78	0.83	0.58	0.45	0.12	0.59	0.17	0.11	0.25

Performance

Emotion Recognition

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

The above metrics are for the positive class (i.e., the emotion is reflected in the text).
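As a sanity check on the emotion table, F1 is the harmonic mean of precision and recall. The sketch below recomputes F1 from the rounded precision and recall values reported above. The helper name `f1_from_pr` is ours, for illustration only, and is not part of HebEMO. Because the inputs are already rounded to two decimals, differences of about 0.01 from the reported F1 are expected.

```python
# Recompute F1 from the rounded precision/recall values in the emotion table.
# `f1_from_pr` is an illustrative helper, not part of HebEMO.

def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (reported F1, precision, recall) copied from the table above
reported = {
    "anger": (0.96, 0.99, 0.93),
    "disgust": (0.97, 0.98, 0.96),
    "anticipation": (0.82, 0.80, 0.87),
    "fear": (0.79, 0.88, 0.72),
    "joy": (0.90, 0.97, 0.84),
    "sadness": (0.90, 0.86, 0.94),
    "surprise": (0.40, 0.44, 0.37),
    "trust": (0.83, 0.86, 0.80),
}

for emotion, (f1, p, r) in reported.items():
    recomputed = f1_from_pr(p, r)
    print(f"{emotion:<12} reported={f1:.2f} recomputed={recomputed:.3f}")
```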

Sentiment (Polarity) Analysis

precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy	0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96
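To make the gap between the two average rows concrete: the macro average is the unweighted mean of the per-class scores, so the weaker neutral class pulls it down, while the weighted average weights each class by its number of test examples. The sketch below reproduces the macro row from the per-class values in the table. The weighted row cannot be recomputed here because the per-class example counts are not listed. The function name `macro_average` is ours, for illustration.

```python
# Per-class (precision, recall, f1) copied from the polarity table above.
per_class = {
    "neutral": (0.83, 0.56, 0.67),
    "positive": (0.96, 0.92, 0.94),
    "negative": (0.97, 0.99, 0.98),
}

def macro_average(rows):
    """Unweighted mean of each metric across classes, rounded to 2 decimals."""
    n = len(rows)
    columns = zip(*rows.values())  # transpose into (precisions, recalls, f1s)
    return tuple(round(sum(col) / n, 2) for col in columns)

macro = macro_average(per_class)
print(macro)  # (0.92, 0.82, 0.86), matching the "macro avg" row
```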

The sentiment (polarity) analysis model is also available on AWS! For more information, visit AWS' git.

How to use

Emotion Recognition Model

An online model can be found at huggingface spaces or as a colab notebook.

# Run in a notebook (e.g. Colab); lines starting with "!" are shell commands.
# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import HebEMO
HebEMO_model = HebEMO()

# Analyze the text in a file; returns the analysis as a pandas.DataFrame
HebEMO_model.hebemo(input_path = 'data/text_example.txt')

# Analyze a single string and plot the result
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)

For the sentiment classification model (polarity only):

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Optional: load the tokenizer and model directly (the pipeline below loads them by name)
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")  # same as the 'avichr/heBERT' tokenizer
model = AutoModelForSequenceClassification.from_pretrained("avichr/heBERT_sentiment_analysis")

# Build a sentiment-analysis pipeline that returns a score for every label
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores=True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')	
>>>  [[{'label': 'neutral', 'score': 0.9978172183036804},
>>>  {'label': 'positive', 'score': 0.0014792329166084528},
>>>  {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')
>>>  [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>>  {'label': 'positive', 'score': 0.9994067549705505},
>>>  {'label': 'negative', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')
>>>  [[{'label': 'neutral', 'score': 9.214012970915064e-05},
>>>  {'label': 'positive', 'score': 8.876807987689972e-05},
>>>  {'label': 'negative', 'score': 0.9998190999031067}]]
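With `return_all_scores` enabled, the pipeline returns one inner list per input text, holding one dict per label. A minimal sketch of reducing that structure to the single most likely label is shown below. It uses a sample output copied from the first example above, so it runs without downloading the model. The helper name `top_labels` is ours, for illustration.

```python
# Sample output copied from the first example above:
# one inner list per input text, one dict per label.
result = [[{'label': 'neutral', 'score': 0.9978172183036804},
           {'label': 'positive', 'score': 0.0014792329166084528},
           {'label': 'negative', 'score': 0.0007035882445052266}]]

def top_labels(pipeline_output):
    """Return (label, score) of the highest-scoring label for each input text."""
    best_per_text = (
        max(scores, key=lambda d: d['score']) for scores in pipeline_output
    )
    return [(best['label'], best['score']) for best in best_per_text]

print(top_labels(result))  # [('neutral', 0.9978172183036804)]
```

The same helper applies unchanged to the output for a batch of texts, since each text contributes its own inner list.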

Contact us

Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا

If you used this model, please cite us as:

Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}

Author:

avi chr

Dataset size:

417.74 MB