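HebEMO - Emotion Recognition Model for Modern Hebrew

HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique Covid-19 related dataset that we collected and annotated.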

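HebEMO achieved high performance on polarity classification, with a weighted average F1-score of 0.96. Emotion detection reached F1-scores of 0.78-0.97, with the exception of surprise, which the model failed to capture (F1 = 0.41). These results exceed the best previously reported performance, even when compared with results for English.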

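Emotion UGC Data Description

Our UGC data consists of comments posted on three major Israeli news sites, collected between January 2020 and August 2020. The data totals about 150 MB and includes over 7 million words and 350K sentences. About 2,000 sentences were annotated by crowd members (3-10 annotators per sentence) for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise and trust. The table below gives the percentage of sentences in which each emotion appeared.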

	anger	disgust	expectation	fear	happy	sadness	surprise	trust	sentiment
ratio	0.78	0.83	0.58	0.45	0.12	0.59	0.17	0.11	0.25

Performance

Emotion Recognition

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

The above metrics are for the positive class (that is, the emotion is reflected in the text).
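As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so each reported F1-score can be recomputed from the other two columns. The sketch below copies the values from the table; the helper name `f1_score` is ours for illustration and is not part of HebEMO. Because the table's precision and recall are already rounded to two decimals, a recomputed value may differ from the reported one by about 0.01.

```python
# (precision, recall, reported F1) for the positive class, copied from the table above.
reported = {
    "anger": (0.99, 0.93, 0.96),
    "disgust": (0.98, 0.96, 0.97),
    "anticipation": (0.80, 0.87, 0.82),
    "fear": (0.88, 0.72, 0.79),
    "joy": (0.97, 0.84, 0.90),
    "sadness": (0.86, 0.94, 0.90),
    "surprise": (0.44, 0.37, 0.40),
    "trust": (0.86, 0.80, 0.83),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 from the rounded precision and recall values.
recomputed = {
    emotion: round(f1_score(p, r), 2) for emotion, (p, r, _) in reported.items()
}

for emotion, (_, _, f1) in reported.items():
    print(f"{emotion:<12} reported={f1:.2f} recomputed={recomputed[emotion]:.2f}")
```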

Sentiment (Polarity) Analysis

	precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy			0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96
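The "macro avg" row is the unweighted mean of the three per-class rows, which is why the weak recall on the neutral class pulls it well below the weighted average. The sketch below reproduces that row from the per-class values in the table. The "weighted avg" row cannot be reproduced the same way, because it depends on the number of test sentences per class, which is not listed here. The function name `macro_average` is ours for illustration.

```python
# Per-class metrics copied from the polarity table above.
per_class = {
    "neutral": {"precision": 0.83, "recall": 0.56, "f1-score": 0.67},
    "positive": {"precision": 0.96, "recall": 0.92, "f1-score": 0.94},
    "negative": {"precision": 0.97, "recall": 0.99, "f1-score": 0.98},
}


def macro_average(metric: str) -> float:
    """Unweighted mean of one metric over all classes."""
    values = [scores[metric] for scores in per_class.values()]
    return sum(values) / len(values)


# Round to two decimals to match the precision of the table.
macro = {m: round(macro_average(m), 2) for m in ("precision", "recall", "f1-score")}
print(macro)
```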

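The sentiment (polarity) analysis model is also available on AWS! For more information, visit AWS' git.

How to use

Emotion Recognition Model

An online model can be found at huggingface spaces or as a colab notebook.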

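# Dependencies (uncomment to install them from a notebook cell):
# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

# Clone the HeBERT repository, which contains the HebEMO wrapper (notebook shell command).
!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import HebEMO
HebEMO_model = HebEMO()

# Analyze a text file; returns the analyzed text as a pandas.DataFrame.
HebEMO_model.hebemo(input_path='data/text_example.txt')

# Analyze a single string; plot=True also draws the result.
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)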

Sentiment Classification Model (Polarity ONLY):

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Optional: load the tokenizer and model directly (same tokenizer as 'avichr/heBERT').
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")
model = AutoModelForSequenceClassification.from_pretrained("avichr/heBERT_sentiment_analysis")

# Build a sentiment-analysis pipeline that returns the score of every label.
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores=True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')
# [[{'label': 'neutral', 'score': 0.9978172183036804},
#   {'label': 'positive', 'score': 0.0014792329166084528},
#   {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')
# [[{'label': 'neutral', 'score': 0.00047328314394690096},
#   {'label': 'positive', 'score': 0.9994067549705505},
#   {'label': 'negative', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')
# [[{'label': 'neutral', 'score': 9.214012970915064e-05},
#   {'label': 'positive', 'score': 8.876807987689972e-05},
#   {'label': 'negative', 'score': 0.9998190999031067}]]
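Because the pipeline returns a score for every label, a common next step is to reduce each result to its single most likely label. The following is a minimal sketch that works on the output structure shown above (a list with one inner list of label/score dictionaries per input text). The helper name `top_label` is hypothetical and not part of HeBERT or transformers, and the scores are shortened copies of the first example.

```python
from typing import Dict, List, Union

# One entry of the pipeline output: {"label": ..., "score": ...}.
Score = Dict[str, Union[str, float]]


def top_label(scores: List[Score]) -> str:
    """Return the label with the highest score for one input text."""
    best = max(scores, key=lambda item: item["score"])
    return str(best["label"])


# Shortened copy of the first example output above (one inner list per input text).
example_output = [[
    {"label": "neutral", "score": 0.9978},
    {"label": "positive", "score": 0.0015},
    {"label": "negative", "score": 0.0007},
]]

predicted = [top_label(scores) for scores in example_output]
print(predicted)
```

With a live pipeline, the same helper could be applied to the result of `sentiment_analysis(...)` in place of `example_output`.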

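Contact us

Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab
Thank you, תודה, شكرا

If you used this model, please cite us as:

Chriqui, A., & Yahav, I. (2021). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. arXiv preprint arXiv:2102.01909.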

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={arXiv preprint arXiv:2102.01909},
  year={2021}
}

Author:

avi chr

Dataset size:

418.03 MB