Dataset: clarin-pl/polemo2-official
Multilinguality: monolingual
Language creators: other
Annotations creators: expert-generated
Source datasets: original
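PolEmo 2.0 is a dataset of online consumer reviews from four domains: medicine, hotels, products, and university. It is human-annotated at the level of full reviews and of individual sentences. The current version (PolEmo 2.0) contains 8,216 reviews with 57,466 sentences. Each text and sentence was manually annotated with sentiment in the 2+1 scheme, which gives a total of 197,046 annotations. About 85% of the reviews come from the medicine and hotel domains. Each review is annotated with one of four labels: positive, negative, neutral, or ambiguous.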
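The task is to predict the correct label of a review.
Input ('text' column): sentence
Output ('target' column): label for the sentiment of the sentence ('zero': neutral, 'minus': negative, 'plus': positive, 'amb': ambiguous)
Domain: online reviews
Measurements: accuracy, F1 Macro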
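As a reference for the two measurements, here is a minimal pure-Python sketch of accuracy and macro-averaged F1 on integer labels. The helper names `accuracy` and `macro_f1` and the toy label lists are illustrative assumptions, not part of the dataset or of any library.

```python
from collections import Counter


def accuracy(references, predictions):
    """Fraction of predictions that equal the reference label."""
    correct = sum(r == p for r, p in zip(references, predictions))
    return correct / len(references)


def macro_f1(references, predictions):
    """Unweighted mean of per-class F1 over the labels seen in either list."""
    labels = sorted(set(references) | set(predictions))
    tp, fp, fn = Counter(), Counter(), Counter()
    for r, p in zip(references, predictions):
        if r == p:
            tp[r] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[r] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical labels (not taken from the dataset).
refs = [0, 1, 1, 2, 3, 1]
preds = [0, 1, 2, 2, 3, 0]
print(accuracy(refs, preds))
print(macro_f1(refs, preds))
```

Macro averaging gives every class the same weight, so a rare class influences the score as much as a frequent one.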
Example:
输入:Na samym wejściu hotel śmierdzi . W pokojach jest pleśń na ścianach , brudny dywan . W łazience śmierdzi chemią , hotel nie grzeje w pokojach panuje chłód . Wyposażenie pokoju jest stare , kran się rusza , drzwi na balkon nie domykają się . Jedzenie jest w małych ilościach i nie smaczne . Nie polecam nikomu tego hotelu .
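Input (translated by DeepL): At the entrance, the hotel stinks. The rooms have mold on the walls and a dirty carpet. The bathroom smells of chemicals, and the hotel does not heat the rooms, which are cold. The room furnishings are old, the faucet wobbles, and the balcony door does not close properly. The food comes in small portions and is not tasty. I would not recommend this hotel to anyone.
Output: 1 (negative)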
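The example reports the integer `1` where the column description uses the name `'minus'`. A minimal sketch of a lookup between the two follows. It assumes the integer ids follow the order in which the labels are listed in the output description (`zero`, `minus`, `plus`, `amb`). Only `1` = negative is confirmed by the example, so the remaining ids and the names `LABEL_NAMES`, `LABEL_MEANINGS`, and `describe` are assumptions.

```python
# Assumed id order: zero, minus, plus, amb. Only id 1 (minus) is confirmed
# by the example; check the dataset's own label names before relying on it.
LABEL_NAMES = ["zero", "minus", "plus", "amb"]
LABEL_MEANINGS = {
    "zero": "neutral",
    "minus": "negative",
    "plus": "positive",
    "amb": "ambiguous",
}


def describe(target: int) -> str:
    """Render an integer target as 'id (name: meaning)'."""
    name = LABEL_NAMES[target]
    return f"{target} ({name}: {LABEL_MEANINGS[name]})"


print(describe(1))  # 1 (minus: negative)
```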
Data splits:

| Subset | Cardinality |
|---|---|
| train | 6573 | 
| val | 823 | 
| test | 820 | 
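
The split sizes can be checked against the corpus size quoted in the description (8,216 reviews). The short sketch below sums the three subsets and prints each share; the dictionary name `split_sizes` is illustrative.

```python
split_sizes = {"train": 6573, "val": 823, "test": 820}

total = sum(split_sizes.values())
print(total)

# Share of each subset in the whole corpus.
for name, size in split_sizes.items():
    print(f"{name}: {size / total:.1%}")
```

The three subsets add up to the full corpus, in roughly an 80/10/10 split.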

Class distribution:

| Class | train | val | test |
|---|---|---|---|
| minus | 0.3756 | 0.3694 | 0.4134 | 
| plus | 0.2775 | 0.2868 | 0.2768 | 
| amb | 0.1991 | 0.1883 | 0.1659 | 
| zero | 0.1477 | 0.1555 | 0.1439 | 
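
The class shares give a simple reference point for model scores. As a rough sketch, a classifier that always predicts the most frequent class (`minus`) would reach a test accuracy equal to that class's share, while its macro F1 would be much lower because the other three classes score zero. The calculation below uses the test-column values from the table; the variable names are illustrative.

```python
test_share = {"minus": 0.4134, "plus": 0.2768, "amb": 0.1659, "zero": 0.1439}

majority = max(test_share, key=test_share.get)
baseline_accuracy = test_share[majority]

# Always predicting the majority class: its recall is 1 and its precision
# equals its share of the data; every other class has F1 = 0.
precision, recall = test_share[majority], 1.0
majority_f1 = 2 * precision * recall / (precision + recall)
baseline_macro_f1 = majority_f1 / len(test_share)

print(majority, baseline_accuracy, round(baseline_macro_f1, 4))
```

A model is only informative on this split if it clearly exceeds both values.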

Citation:

@inproceedings{kocon-etal-2019-multi,
    title = "Multi-Level Sentiment Analysis of {P}ol{E}mo 2.0: Extended Corpus of Multi-Domain Consumer Reviews",
    author = "Koco{\'n}, Jan  and
      Mi{\l}kowski, Piotr  and
      Za{\'s}ko-Zieli{\'n}ska, Monika",
    booktitle = "Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/K19-1092",
    doi = "10.18653/v1/K19-1092",
    pages = "980--991",
    abstract = "In this article we present an extended version of PolEmo {--} a corpus of consumer reviews from 4 domains: medicine, hotels, products and school. Current version (PolEmo 2.0) contains 8,216 reviews having 57,466 sentences. Each text and sentence was manually annotated with sentiment in 2+1 scheme, which gives a total of 197,046 annotations. We obtained a high value of Positive Specific Agreement, which is 0.91 for texts and 0.88 for sentences. PolEmo 2.0 is publicly available under a Creative Commons copyright license. We explored recent deep learning approaches for the recognition of sentiment, such as Bi-directional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT).",
}
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
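Loading:

from pprint import pprint
from datasets import load_dataset
# Download the dataset from the Hugging Face Hub (network access required).
dataset = load_dataset("clarin-pl/polemo2-official")
# Show the first training example: an integer 'target' and the review 'text'.
pprint(dataset['train'][0])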
# {'target': 1,
#  'text': 'Na samym wejściu hotel śmierdzi . W pokojach jest pleśń na ścianach '
#          ', brudny dywan . W łazience śmierdzi chemią , hotel nie grzeje w '
#          'pokojach panuje chłód . Wyposażenie pokoju jest stare , kran się '
#          'rusza , drzwi na balkon nie domykają się . Jedzenie jest w małych '
#          'ilościach i nie smaczne . Nie polecam nikomu tego hotelu .'}
Evaluation:

import random
from pprint import pprint
# `load_metric` was deprecated and later removed from `datasets`;
# the `evaluate` package provides the same metrics.
import evaluate
from datasets import load_dataset
dataset = load_dataset("clarin-pl/polemo2-official")
references = dataset["test"]["target"]
# Generate uniformly random predictions over the label ids as a sanity-check baseline.
predictions = [random.randrange(max(references) + 1) for _ in range(len(references))]
acc = evaluate.load("accuracy")
f1 = evaluate.load("f1")
acc_score = acc.compute(predictions=predictions, references=references)
f1_score = f1.compute(predictions=predictions, references=references, average='macro')
pprint(acc_score)
pprint(f1_score)
# Example output (the predictions are unseeded, so exact values vary between runs):
# {'accuracy': 0.2475609756097561}
# {'f1': 0.23747048177471738}
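
The accuracy of about 0.25 reported above is what uniform random guessing over four classes is expected to give, whatever the class balance. A small offline simulation illustrates this without downloading the dataset. The synthetic references are drawn from the test-split class shares, and the seed, sample size, and variable names are arbitrary illustrative choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
class_shares = [0.4134, 0.2768, 0.1659, 0.1439]  # test-split shares of the four classes

# Synthetic references follow the class shares; predictions are uniform.
references = rng.choices(range(4), weights=class_shares, k=n)
predictions = [rng.randrange(4) for _ in range(n)]

simulated_accuracy = sum(r == p for r, p in zip(references, predictions)) / n
print(round(simulated_accuracy, 3))
```

Each uniform guess matches the reference with probability 1/4 regardless of which class the reference belongs to, so the simulated value should land close to 0.25.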