Dataset: sick
Tasks:
Languages:
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: crowdsourced
License:
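Shared and internationally recognized benchmarks are fundamental to the development of any computational system. We aim to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large English benchmark tailored to them. SICK consists of about 10,000 English sentence pairs that include many examples of the lexical, syntactic, and semantic phenomena that CDSMs are expected to account for, but that do not require dealing with other aspects of existing sentential datasets (idiomatic multiword expressions, named entities, telegraphic language) that are outside the scope of CDSMs. Using crowdsourcing techniques, each pair was annotated for two crucial semantic tasks: relatedness in meaning (with a 5-point rating scale as the gold score) and the entailment relation between the two elements (with three possible gold labels: entailment, contradiction, and neutral). The SICK dataset was used in SemEval-2014 Task 1 and is freely available for research purposes.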
[More Information Needed]
The dataset is in English.
Example instance:
{
"entailment_AB": "A_neutral_B",
"entailment_BA": "B_neutral_A",
"label": 1,
"id": "1",
"relatedness_score": 4.5,
"sentence_A": "A group of kids is playing in a yard and an old man is standing in the background",
"sentence_A_dataset": "FLICKR",
"sentence_A_original": "A group of children playing in a yard, a man in the background.",
"sentence_B": "A group of boys in a yard is playing and a man is standing in the background",
"sentence_B_dataset": "FLICKR",
"sentence_B_original": "A group of children playing in a yard, a man in the background."
}
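As a rough illustration of how the fields of the instance above fit together, the following sketch parses the same record with Python's standard `json` module and reads the two gold annotations. The `LABEL_NAMES` ordering and the `describe` helper are assumptions made for this example: the card only shows that `label` 1 appears alongside a neutral pair.

```python
import json

# ASSUMPTION: this index-to-name order is not stated in the card. Only the
# pairing "1 goes with neutral" is consistent with the example instance.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]

# The example instance from this card, copied verbatim as JSON text.
RAW_INSTANCE = '''
{
  "entailment_AB": "A_neutral_B",
  "entailment_BA": "B_neutral_A",
  "label": 1,
  "id": "1",
  "relatedness_score": 4.5,
  "sentence_A": "A group of kids is playing in a yard and an old man is standing in the background",
  "sentence_A_dataset": "FLICKR",
  "sentence_A_original": "A group of children playing in a yard, a man in the background.",
  "sentence_B": "A group of boys in a yard is playing and a man is standing in the background",
  "sentence_B_dataset": "FLICKR",
  "sentence_B_original": "A group of children playing in a yard, a man in the background."
}
'''


def describe(instance):
    """Hypothetical helper: summarize the two gold annotations of one pair."""
    label_name = LABEL_NAMES[instance["label"]]
    # The directional fields look like "A_<relation>_B"; take the middle part.
    relation_ab = instance["entailment_AB"].split("_")[1]
    return {
        "label_name": label_name,
        "relation_ab": relation_ab,
        "relatedness_score": instance["relatedness_score"],
    }


example = json.loads(RAW_INSTANCE)
summary = describe(example)
print(summary)
```

For this record, the assumed label name and the relation embedded in `entailment_AB` agree, and the relatedness score falls inside the 5-point scale described in the card.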
Split sizes: training set 4,439; trial set 495; test set 4,906.
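A quick arithmetic check of the split sizes can be sketched as follows. The dictionary keys are illustrative names chosen for this example, not necessarily the split identifiers used by any particular loader.

```python
# Split sizes as listed in this card; the key names are illustrative.
split_sizes = {"train": 4439, "trial": 495, "test": 4906}

total = sum(split_sizes.values())

for name, count in split_sizes.items():
    # Share of each split in the whole dataset, as a percentage.
    print(f"{name}: {count} pairs ({count / total:.1%})")

print(f"total: {total} pairs")
```

The total of 9,840 pairs matches both the "about 10,000" figure in the description and the 1K<n<10K size category.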
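[More Information Needed]
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]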
@inproceedings{marelli-etal-2014-sick,
title = "A {SICK} cure for the evaluation of compositional distributional semantic models",
author = "Marelli, Marco and
Menini, Stefano and
Baroni, Marco and
Bentivogli, Luisa and
Bernardi, Raffaella and
Zamparelli, Roberto",
booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation ({LREC}'14)",
month = may,
year = "2014",
address = "Reykjavik, Iceland",
publisher = "European Language Resources Association (ELRA)",
url = "http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf",
pages = "216--223",
}
Thanks to @calpt for adding this dataset.