Dataset:

imppres

Languages:

en

Multilinguality:

monolingual

Size Categories:

1K<n<10K

Language Creators:

machine-generated

Annotation Creators:

machine-generated

Source Datasets:

original

Dataset Card for IMPPRES

Dataset Summary

Over 25k semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. IMPPRES is a natural language inference (NLI) dataset following the format of SNLI (Bowman et al., 2015), MultiNLI (Williams et al., 2018), and XNLI (Conneau et al., 2018), designed to evaluate how well trained NLI models recognize several classes of presuppositions and scalar implicatures.

Supported Tasks and Leaderboards

Natural language inference.

Languages

English.

Dataset Structure

Data Instances

The data comes in two configurations: presupposition and implicature. Each configuration consists of several sub-datasets:

Presupposition

  • all_n_presupposition
  • change_of_state
  • cleft_uniqueness
  • possessed_definites_existence
  • question_presupposition
  • both_presupposition
  • cleft_existence
  • only_presupposition
  • possessed_definites_uniqueness

Implicature

  • connectives
  • gradable_adjective
  • gradable_verb
  • modals
  • numerals_10_100
  • numerals_2_3
  • quantifiers

Each sentence type in IMPPRES was generated according to a template that specifies the linear order of the constituents in the sentence. The constituents were sampled from a vocabulary of over 3000 lexical items annotated with the grammatical features needed to ensure well-formedness. The IMPPRES data was generated semi-automatically using a codebase developed by Warstadt et al. (2019a) and significantly expanded for the BLiMP dataset (Warstadt et al., 2019b).

Here is an instance of the raw presupposition data from an arbitrary sub-dataset:
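As a rough illustration of this template-based generation, the sketch below fills a fixed template from a toy vocabulary, using a grammatical feature (number) to keep the output well-formed. The template, vocabulary, and function names here are invented for illustration; they are not taken from the actual IMPPRES codebase.

```python
import random

# Toy vocabulary for illustration only; the real IMPPRES vocabulary has
# over 3000 items annotated with richer grammatical features.
NOUNS = [
    {"text": "that teenager", "number": "sg"},
    {"text": "all ten guys", "number": "pl"},
]
VERBS = [
    {"sg": "yells", "pl": "yell"},
    {"sg": "boasts", "pl": "boast"},
]

def generate(rng):
    """Instantiate a fixed NP-V template, using the noun's number feature
    to select an agreeing verb form (a minimal stand-in for the feature
    checks that ensure well-formedness)."""
    noun = rng.choice(NOUNS)
    verb = rng.choice(VERBS)[noun["number"]]
    return f"{noun['text'].capitalize()} {verb}."

rng = random.Random(0)
print(generate(rng))
```

The real generation procedure works over many templates and constituent types, but the core idea is the same: sample lexical items into template slots subject to grammatical feature constraints.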

{
"sentence1": "All ten guys that proved to boast might have been divorcing.", 
"sentence2": "There are exactly ten guys that proved to boast.", 
"trigger": "modal", 
"presupposition": "positive", 
"gold_label": "entailment", 
"UID": "all_n_presupposition", 
"pairID": "9e", 
"paradigmID": 0
}

And here is an instance of the raw implicature data from an arbitrary sub-dataset:

{
"sentence1": "That teenager couldn't yell.", 
"sentence2": "That teenager could yell.", 
"gold_label_log": "contradiction", 
"gold_label_prag": "contradiction", 
"spec_relation": "negation", 
"item_type": "control", 
"trigger": "modal", 
"lexemes": "can - have to"
}

Data Fields

Presupposition

For the presupposition sub-datasets, there is a slight mapping between the raw field names and the fields that appear in the HuggingFace Datasets. When using HF Datasets, the fields map as follows:

"premise" -> "sentence1"
"hypothesis"-> "sentence2"
"trigger" -> "trigger" or "Not_In_Example"
"trigger1" -> "trigger1" or "Not_In_Example"
"trigger2" -> "trigger2" or "Not_In_Example"
"presupposition" -> "presupposition" or "Not_In_Example"
"gold_label" -> "gold_label"
"UID" -> "UID"
"pairID" -> "pairID"
"paradigmID" -> "paradigmID"

For the most part, the raw fields remain unchanged. However, a new mapping was introduced for the various trigger fields: some examples in the dataset only have a trigger field, while others have trigger1 and trigger2 fields and no trigger or presupposition field. Most examples have a format similar to the one shown in the Data Instances section above, but occasionally an example looks like this:

{
'sentence1': 'Did that committee know when Lissa walked through the cafe?', 
'sentence2': 'That committee knew when Lissa walked through the cafe.', 
'trigger1': 'interrogative', 
'trigger2': 'unembedded', 
'gold_label': 'neutral', 
'control_item': True, 
'UID': 'question_presupposition', 
'pairID': '1821n', 
'paradigmID': 95
}

In this example, the trigger1 and trigger2 fields appear, and the presupposition and trigger fields are absent, which keeps the dictionary the same length. To handle such examples, the mapping above was introduced so that every example accessed through the HF Datasets interface has the same size and the same fields. If a field has no value for a given example, it is kept in the dictionary but given the value Not_In_Example.

To illustrate, the example from the Data Instances section above would appear as follows in HF Datasets:

{
"premise": "All ten guys that proved to boast might have been divorcing.", 
"hypothesis": "There are exactly ten guys that proved to boast.", 
"trigger": "modal",
"trigger1":  "Not_In_Example",
"trigger2": "Not_In_Example",
"presupposition": "positive", 
"gold_label": "entailment", 
"UID": "all_n_presupposition", 
"pairID": "9e", 
"paradigmID": 0
}
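The normalization described above can be sketched as follows. This is a minimal illustration of the idea, not the dataset's actual loading script; the field list is taken from the examples above.

```python
# Fields every presupposition example exposes through the HF interface.
CANONICAL_FIELDS = [
    "premise", "hypothesis", "trigger", "trigger1", "trigger2",
    "presupposition", "gold_label", "UID", "pairID", "paradigmID",
]

# Raw field -> canonical field renames.
RENAMES = {"sentence1": "premise", "sentence2": "hypothesis"}

def normalize(raw):
    """Map a raw example onto the fixed schema, filling any field that is
    missing from the raw data with the value "Not_In_Example"."""
    renamed = {RENAMES.get(k, k): v for k, v in raw.items()}
    return {f: renamed.get(f, "Not_In_Example") for f in CANONICAL_FIELDS}
```

Under this scheme, an example with only a trigger field and an example with trigger1/trigger2 fields both come out with the same ten keys.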

A description of the fields:

"premise": The premise. 
"hypothesis": The hypothesis. 
"trigger": A detailed discussion of trigger types appears in the paper.
"trigger1":  A detailed discussion of trigger types appears in the paper.
"trigger2": A detailed discussion of trigger types appears in the paper.
"presupposition": positive or negative. 
"gold_label": Corresponds to entailment, contradiction, or neutral. 
"UID": Unique id. 
"pairID": Sentence pair ID.
"paradigmID": ?

It is unclear what the difference is between trigger, trigger1, and trigger2, or what paradigmID refers to.

Implicature

The implicature fields have only the following mapping:

"premise" -> "sentence1"
"hypothesis"-> "sentence2"

A description of the fields:

"premise": The premise. 
"hypothesis": The hypothesis. 
"gold_label_log": Gold label for a logical reading of the sentence pair.
"gold_label_prag": Gold label for a pragmatic reading of the sentence pair.
"spec_relation": ?
"item_type": ?
"trigger": A detailed discussion of trigger types appears in the paper.
"lexemes": ? 

Data Splits

Since this dataset is meant for testing models that have already been trained, there is only a single test split.

Dataset Creation

Curation Rationale

IMPPRES was created to evaluate how well trained NLI models recognize several classes of presuppositions and scalar implicatures.

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

The annotations were generated semi-automatically.

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

IMPPRES is available under the Creative Commons Attribution-NonCommercial 4.0 International Public License ("the License"). You may not use these files except in compliance with the License. Please read the LICENSE file for more information before using the dataset.

Citation Information

@inproceedings{jeretic-etal-2020-natural,
    title = "Are Natural Language Inference Models {IMPPRESsive}? {L}earning {IMPlicature} and {PRESupposition}",
    author = "Jereti\v{c}, Paloma  and
      Warstadt, Alex  and
      Bhooshan, Suvrat  and
      Williams, Adina",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.768",
    doi = "10.18653/v1/2020.acl-main.768",
    pages = "8690--8705",
    abstract = "Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of 32K semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by {``}some{''} as entailments. For some presupposition triggers like {``}only{''}, BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.",
}

Contributions

Thanks to @aclifton314 for adding this dataset.