HebEMO - Emotion Recognition Model for Modern Hebrew

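HebEMO is a tool that detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique Covid-19 related dataset that we collected and annotated.

HebEMO reached a weighted average F1-score of 0.96 for polarity classification. Emotion detection reached F1-scores of 0.78-0.97, with the exception of surprise, which the model failed to capture (F1 = 0.41). These results are better than the best reported performance, even when compared to English.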

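Emotion UGC Data Description

Our UGC data consists of comments posted on 3 major Israeli news sites, collected between January 2020 and August 2020. The total size of the data is about 150 MB, including over 7 million words and 350K sentences. The share of sentences for each emotion is given as a ratio in the table below.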

	anger	disgust	expectation	fear	happy	sadness	surprise	trust	sentiment
ratio	0.78	0.83	0.58	0.45	0.12	0.59	0.17	0.11	0.25

Performance

Emotion Recognition

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

The above metrics are for the positive class (i.e., the emotion is reflected in the text).
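
As a sanity check on the table above, the F1-score is the harmonic mean of precision and recall, so each row can be recomputed from its other two columns. The sketch below is illustrative only: `f1_score` is a helper defined here, not part of HebEMO, and because the table values are already rounded to two decimals, a recomputed score can differ from the reported one by about 0.01.

```python
# (precision, recall, reported F1) per emotion, copied from the table above.
reported = {
    "anger": (0.99, 0.93, 0.96),
    "disgust": (0.98, 0.96, 0.97),
    "anticipation": (0.80, 0.87, 0.82),
    "fear": (0.88, 0.72, 0.79),
    "joy": (0.97, 0.84, 0.90),
    "sadness": (0.86, 0.94, 0.90),
    "surprise": (0.44, 0.37, 0.40),
    "trust": (0.86, 0.80, 0.83),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every emotion from its rounded precision and recall.
recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f1) in reported.items():
    print(f"{name:<13} reported={f1:.2f} recomputed={recomputed[name]:.2f}")
```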

Sentiment (Polarity) Analysis

class	precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy			0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96
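
The macro average row is the unweighted mean of the three per-class rows, which can be checked directly from the table. This is a minimal sketch: `macro_average` is a helper written for illustration. The weighted average cannot be reproduced the same way, because it needs the number of test examples per class, which the table does not list.

```python
# (precision, recall, f1-score) per class, copied from the table above.
per_class = {
    "neutral": (0.83, 0.56, 0.67),
    "positive": (0.96, 0.92, 0.94),
    "negative": (0.97, 0.99, 0.98),
}


def macro_average(rows: dict) -> tuple:
    """Unweighted mean of each metric column, rounded to two decimals."""
    n = len(rows)
    # zip(*...) turns the per-class rows into per-metric columns.
    return tuple(round(sum(column) / n, 2) for column in zip(*rows.values()))


macro = macro_average(per_class)
print(macro)  # (precision, recall, f1-score) macro averages
```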

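The sentiment (polarity) analysis model is also available on AWS! For more information, visit AWS' git.

How to use

Emotion Recognition Model

An online model can be found at huggingface spaces or as a colab notebook.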

# !pip install pyplutchik==0.0.7
# !pip install transformers==4.14.1

!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()

HebEMO_model.hebemo(input_path = 'data/text_example.txt')
# returns the analyzed text as a pandas.DataFrame

hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)

For the sentiment classification model (polarity only):

from transformers import AutoTokenizer, AutoModel, pipeline

tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis") #same as 'avichr/heBERT' tokenizer
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")

# how to use?
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores = True
)

sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')	
>>>  [[{'label': 'neutral', 'score': 0.9978172183036804},
>>>  {'label': 'positive', 'score': 0.0014792329166084528},
>>>  {'label': 'negative', 'score': 0.0007035882445052266}]]

sentiment_analysis('קפה זה טעים')
>>>  [[{'label': 'neutral', 'score': 0.00047328314394690096},
>>>  {'label': 'positive', 'score': 0.9994067549705505},
>>>  {'label': 'negative', 'score': 0.00011996887042187154}]]

sentiment_analysis('אני לא אוהב את העולם')
>>>  [[{'label': 'neutral', 'score': 9.214012970915064e-05}, 
>>>  {'label': 'positive', 'score': 8.876807987689972e-05},
>>>  {'label': 'negative', 'score': 0.9998190999031067}]]
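
With `return_all_scores = True`, each call returns the scores for all three labels, so downstream code usually needs to pick the highest-scoring one. The sketch below assumes the nested list-of-lists shape shown in the outputs above (other transformers versions may nest the result differently); `top_label` is a hypothetical helper, not part of HeBERT or transformers. It runs on a copied sample output, so no model download is needed.

```python
from typing import Tuple


def top_label(result: list) -> Tuple[str, float]:
    """Return (label, score) of the highest-scoring class for one input.

    Assumes `result` looks like [[{'label': ..., 'score': ...}, ...]],
    i.e. one inner list of per-label scores for a single input text.
    """
    scores = result[0]
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Sample output copied from the first example above.
example_output = [[
    {"label": "neutral", "score": 0.9978172183036804},
    {"label": "positive", "score": 0.0014792329166084528},
    {"label": "negative", "score": 0.0007035882445052266},
]]

label, score = top_label(example_output)
print(label, round(score, 4))
```

With a loaded pipeline, the same helper would be applied as `top_label(sentiment_analysis(text))`.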

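Contact us

Avichay Chriqui
Inbal Yahav
Coller Semitic Languages AI Lab
Thank you, תודה, شكرا

If you used this model, please cite our paper:

Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.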

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avichay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}

Author:

avi chr

Dataset size:

418.03 MB