Dataset:

allegro/klej-cdsc-e

Language:

pl

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

original
klej-cdsc-e

Description

The Polish CDSCorpus consists of 10K Polish sentence pairs that are human-annotated for semantic relatedness (CDSC-R) and entailment (CDSC-E). The dataset may be used to evaluate compositional distributional semantics models of Polish. It was presented at ACL 2017.

Although the main design of the dataset was inspired by SICK, it differs in details. As in SICK, the sentences come from image captions, but the set of chosen images is more diverse, originating from 46 thematic groups.

Tasks (input, output, and metrics)

The entailment relation between two sentences is labeled as entailment, contradiction, or neutral. The task is to predict whether the premise entails the hypothesis (entailment), contradicts it (contradiction), or is unrelated to it (neutral).

b entails a (a follows from b) – if the situation or event described by sentence b occurs, the situation or event described by sentence a is recognized to occur as well, i.e. a and b refer to the same event or situation;

Input: ('sentence A', 'sentence B'): a sentence pair

Output (the 'entailment_judgment' column): one of the possible entailment relations (entailment, contradiction, neutral)

Domain: image captions

Metrics: accuracy

Example:

Input: Żaden mężczyzna nie stoi na przystanku autobusowym.; Mężczyzna z żółtą i białą reklamówką w ręce stoi na przystanku obok autobusu.

Input (translated by DeepL): No man standing at the bus stop.; A man with a yellow and white bag in his hand stands at a bus stop next to a bus.

Output: contradiction

Data splits

Subset Cardinality
train 8000
validation 1000
test 1000

Class distribution

Class train validation test
NEUTRAL 0.744 0.741 0.744
ENTAILMENT 0.179 0.185 0.190
CONTRADICTION 0.077 0.074 0.066
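The proportions in the table can be recomputed by counting labels per split. A minimal sketch; the `class_distribution` helper and the toy label list are hypothetical stand-ins — with the dataset loaded as in the Loading section below, you would pass e.g. `dataset["train"]["entailment_judgment"]` instead:

```python
from collections import Counter

def class_distribution(labels):
    """Return each label's relative frequency, rounded to 3 decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(n / total, 3) for label, n in counts.items()}

# toy labels whose proportions match the validation column of the table above
labels = ["NEUTRAL"] * 741 + ["ENTAILMENT"] * 185 + ["CONTRADICTION"] * 74
print(class_distribution(labels))
# {'NEUTRAL': 0.741, 'ENTAILMENT': 0.185, 'CONTRADICTION': 0.074}
```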

Citation

@inproceedings{wroblewska-krasnowska-kieras-2017-polish,
    title = "{P}olish evaluation dataset for compositional distributional semantics models",
    author = "Wr{\'o}blewska, Alina  and
      Krasnowska-Kiera{\'s}, Katarzyna",
    booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P17-1073",
    doi = "10.18653/v1/P17-1073",
    pages = "784--792",
    abstract = "The paper presents a procedure of building an evaluation dataset for the validation of compositional distributional semantics models estimated for languages other than English. The procedure generally builds on steps designed to assemble the SICK corpus, which contains pairs of English sentences annotated for semantic relatedness and entailment, because we aim at building a comparable dataset. However, the implementation of particular building steps significantly differs from the original SICK design assumptions, which is caused by both lack of necessary extraneous resources for an investigated language and the need for language-specific transformation rules. The designed procedure is verified on Polish, a fusional language with a relatively free word order, and contributes to building a Polish evaluation dataset. The resource consists of 10K sentence pairs which are human-annotated for semantic relatedness and entailment. The dataset may be used for the evaluation of compositional distributional semantics models of Polish.",
}

License

Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Links

HuggingFace

Source

Paper

Examples

Loading

from pprint import pprint

from datasets import load_dataset

dataset = load_dataset("allegro/klej-cdsc-e")
pprint(dataset["train"][0])

# {'entailment_judgment': 'NEUTRAL',
#  'pair_ID': 1,
#  'sentence_A': 'Chłopiec w czerwonych trampkach skacze wysoko do góry '
#                'nieopodal fontanny .',
#  'sentence_B': 'Chłopiec w bluzce w paski podskakuje wysoko obok brązowej '
#                'fontanny .'}
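In the raw dataset the `entailment_judgment` column is a plain string; the Evaluation section below converts it to integer ids with `class_encode_column`, which assigns ids to the sorted label names. A minimal sketch of the resulting mapping, with a hypothetical `encode` helper and sample row standing in for the library call:

```python
# class_encode_column assigns ids to the sorted set of label values
LABELS = ["CONTRADICTION", "ENTAILMENT", "NEUTRAL"]
label2id = {name: i for i, name in enumerate(LABELS)}

def encode(example):
    # replace the string label with its integer id, mirroring class_encode_column
    return {**example, "entailment_judgment": label2id[example["entailment_judgment"]]}

row = {"pair_ID": 1, "entailment_judgment": "NEUTRAL"}
print(encode(row))  # {'pair_ID': 1, 'entailment_judgment': 2}
```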

Evaluation

import random
from pprint import pprint

import evaluate
from datasets import load_dataset

dataset = load_dataset("allegro/klej-cdsc-e")
dataset = dataset.class_encode_column("entailment_judgment")
references = dataset["test"]["entailment_judgment"]

# generate random predictions over the encoded label ids
predictions = [random.randrange(max(references) + 1) for _ in range(len(references))]

# load_metric was removed from the datasets library; metrics now live in evaluate
acc = evaluate.load("accuracy")
f1 = evaluate.load("f1")

acc_score = acc.compute(predictions=predictions, references=references)
f1_score = f1.compute(predictions=predictions, references=references, average="macro")

pprint(acc_score)
pprint(f1_score)

# example output (predictions are random, so exact values vary between runs):
# {'accuracy': 0.325}
# {'f1': 0.2736171695141161}
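Because the class distribution is heavily skewed towards NEUTRAL, a majority-class baseline is a more informative reference point than random predictions: it already reaches about 0.744 accuracy on the test split. A minimal sketch, using toy labels matching the test column of the class-distribution table — with the real dataset you would use `dataset["test"]["entailment_judgment"]` as `references`:

```python
from collections import Counter

# stand-in for the real test labels, matching the test-split class distribution
references = ["NEUTRAL"] * 744 + ["ENTAILMENT"] * 190 + ["CONTRADICTION"] * 66

# always predict the most frequent class
majority = Counter(references).most_common(1)[0][0]
predictions = [majority] * len(references)

accuracy = sum(p == r for p, r in zip(predictions, references)) / len(references)
print(accuracy)  # 0.744
```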