HebEMO - Emotion Recognition Model for Modern Hebrew

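HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique Covid-19-related dataset that we collected and annotated.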

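HebEMO achieves high performance on polarity classification, with a weighted-average F1 score of 0.96. Emotion recognition reaches F1 scores of 0.78-0.97 for every emotion except surprise, which the model fails to capture (F1 = 0.41). These results exceed the best reported performance, even when compared with English.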

Emotion UGC Data Description

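Our UGC data consists of comments posted on news articles from three major Israeli news sites, collected between January 2020 and August 2020. The data totals about 150 MB and includes over 7 million words and 350K sentences. About 2,000 sentences were annotated by crowd members (3-10 annotators per sentence) for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust. The table below gives the fraction of sentences in which each emotion appears.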

	anger	disgust	anticipation	fear	joy	sadness	surprise	trust	sentiment
ratio	0.78	0.83	0.58	0.45	0.12	0.59	0.17	0.11	0.25

Performance

Emotion Recognition

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

The metrics above are for the positive class (i.e., the emotion is reflected in the text).
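As a quick consistency check, F1 is the harmonic mean of precision and recall, so each F1 value in the table can be approximately recomputed from the other two columns. The sketch below does this for the rows above. The helper name `f1_from` is introduced here for illustration only, and small differences (for example, for anticipation) are expected because the table values are rounded to two decimals.

```python
# (f1, precision, recall) per emotion, copied from the table above.
reported = {
    "anger": (0.96, 0.99, 0.93),
    "disgust": (0.97, 0.98, 0.96),
    "anticipation": (0.82, 0.80, 0.87),
    "fear": (0.79, 0.88, 0.72),
    "joy": (0.90, 0.97, 0.84),
    "sadness": (0.90, 0.86, 0.94),
    "surprise": (0.40, 0.44, 0.37),
    "trust": (0.83, 0.86, 0.80),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (illustrative helper)."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from the rounded precision/recall values.
recomputed = {name: f1_from(p, r) for name, (_, p, r) in reported.items()}

for name, (f1, _, _) in reported.items():
    print(f"{name:<13} reported={f1:.2f} recomputed={recomputed[name]:.3f}")
```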

Sentiment (Polarity) Analysis

	precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy			0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96
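The macro average row is the unweighted mean of the three per-class rows, which can be checked directly from the table. The sketch below is illustrative only, and `macro_average` is a name introduced here. The weighted average cannot be recomputed the same way, because the per-class support (number of sentences per class) is not listed here.

```python
# Per-class scores copied from the table above.
per_class = {
    "neutral": {"precision": 0.83, "recall": 0.56, "f1": 0.67},
    "positive": {"precision": 0.96, "recall": 0.92, "f1": 0.94},
    "negative": {"precision": 0.97, "recall": 0.99, "f1": 0.98},
}


def macro_average(metric):
    """Unweighted mean of one metric across all classes."""
    values = [scores[metric] for scores in per_class.values()]
    return sum(values) / len(values)


macro = {metric: macro_average(metric) for metric in ("precision", "recall", "f1")}

for metric, value in macro.items():
    print(f"macro {metric}: {value:.2f}")
```

Rounded to two decimals, the results match the reported macro averages of 0.92, 0.82, and 0.86.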

The sentiment (polarity) analysis model is also available on AWS! For more information, visit AWS' git.

How to use

Emotion Recognition Model

An online model is available on Hugging Face Spaces or as a Colab notebook.

# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

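# Notebook cell: clone the HeBERT repository, which contains the HebEMO wrapper
!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()

# Analyze the text in a file; returns the analysis as a pandas.DataFrame
HebEMO_model.hebemo(input_path = 'data/text_example.txt')

# Analyze a single string; plot=True also plots the result
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)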

For the sentiment classification model (polarity only):

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis") # same as the 'avichr/heBERT' tokenizer
# Load the checkpoint together with its classification head
model = AutoModelForSequenceClassification.from_pretrained("avichr/heBERT_sentiment_analysis")

# Build a pipeline that returns the scores for all three labels
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores = True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')	
>>>  [[{'label': 'neutral', 'score': 0.9978172183036804},
>>>  {'label': 'positive', 'score': 0.0014792329166084528},
>>>  {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')
>>>  [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>>  {'label': 'positive', 'score': 0.9994067549705505},
>>>  {'label': 'negative', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')
>>>  [[{'label': 'neutral', 'score': 9.214012970915064e-05},
>>>  {'label': 'positive', 'score': 8.876807987689972e-05},
>>>  {'label': 'negative', 'score': 0.9998190999031067}]]
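With `return_all_scores = True`, the pipeline returns one list of label/score dictionaries per input, as in the outputs above. A minimal sketch for reducing that to the single most likely label follows. `top_label` is a hypothetical helper, not part of transformers or HeBERT, and the sample data is copied (with rounded scores) from the first example so that the snippet runs without downloading the model.

```python
def top_label(all_scores):
    """Return (label, score) for the highest-scoring entry of one input."""
    best = max(all_scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Output shape copied from the first example above (scores rounded).
sample_output = [[
    {"label": "neutral", "score": 0.9978},
    {"label": "positive", "score": 0.0015},
    {"label": "negative", "score": 0.0007},
]]

# One (label, score) pair per input sentence.
predictions = [top_label(scores) for scores in sample_output]
print(predictions)
```

In real use, `sample_output` would be replaced by the value returned from `sentiment_analysis(...)`.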

Contact us

Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا

If you used this model, please cite us:

Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}

Author:

avi chr

Dataset size:

418.03 MB