Dataset: anli
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Preprint: arxiv:1910.14599
The Adversarial Natural Language Inference (ANLI) dataset is a new large-scale NLI benchmark, collected via an iterative, adversarial human-and-model-in-the-loop procedure. ANLI is much more difficult than its predecessors, including SNLI and MNLI. It contains three rounds, and each round has train/dev/test splits.
The text in the dataset is in English.
An example of 'train_r2' looks as follows.
This example was too long and was cropped:
{
"hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.",
"label": 0,
"premise": "\"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...",
"reason": "",
"uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712"
}
The data fields are the same among all splits.
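As a rough illustration of the record layout, the sketch below parses the cropped `train_r2` example shown above and maps its integer label to a class name. The label order (0 = entailment, 1 = neutral, 2 = contradiction) is an assumption taken from the Hugging Face `anli` feature definition rather than from this card's text, and `LABEL_NAMES`, `EXPECTED_FIELDS`, and `describe` are hypothetical names chosen for the example.

```python
import json

# Assumed label order, following the Hugging Face `anli` ClassLabel definition.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]

# Fields that appear in the example record; every split shares them.
EXPECTED_FIELDS = {"uid", "premise", "hypothesis", "label", "reason"}

# The cropped example from the card, kept verbatim (the raw string preserves \").
RAW_EXAMPLE = r'''
{
  "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.",
  "label": 0,
  "premise": "\"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...",
  "reason": "",
  "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712"
}
'''


def describe(record: dict) -> str:
    """Check that the record has the expected fields and return its label name."""
    missing = EXPECTED_FIELDS - set(record)
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")
    return LABEL_NAMES[record["label"]]


example = json.loads(RAW_EXAMPLE)
print(describe(example))
```

Under the assumed label order, the example reads as an entailment: a birth date of January 1993 supports the hypothesis about the first month of the year preceding 1994.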
| name | train_r1 | dev_r1 | train_r2 | dev_r2 | train_r3 | dev_r3 | test_r1 | test_r2 | test_r3 |
|---|---|---|---|---|---|---|---|---|---|
| plain_text | 16946 | 1000 | 45460 | 1000 | 100459 | 1200 | 1000 | 1000 | 1200 |
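The split sizes in the table can be summed per round and overall, which is a quick way to sanity-check a download. The sketch below copies the counts from the table; `round_total` and `load_anli_split` are hypothetical helpers, and the loader assumes the Hugging Face `datasets` package is installed, that network access is available, and that the dataset id is `"anli"` as listed on this card.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "train_r1": 16946, "dev_r1": 1000, "test_r1": 1000,
    "train_r2": 45460, "dev_r2": 1000, "test_r2": 1000,
    "train_r3": 100459, "dev_r3": 1200, "test_r3": 1200,
}


def round_total(round_no: int) -> int:
    """Sum the train, dev, and test sizes for one round (1, 2, or 3)."""
    return sum(SPLIT_SIZES[f"{part}_r{round_no}"] for part in ("train", "dev", "test"))


def load_anli_split(split: str):
    """Load one split by name (assumes `datasets` is installed and online)."""
    if split not in SPLIT_SIZES:
        raise KeyError(f"unknown split: {split}")
    # Imported lazily so the arithmetic above runs without the package.
    from datasets import load_dataset
    return load_dataset("anli", split=split)  # "anli" is the assumed dataset id


for r in (1, 2, 3):
    print(f"round {r}: {round_total(r)} examples")
print("all splits:", sum(SPLIT_SIZES.values()))
```

The overall count comes to 169,265 examples, which is consistent with the `100K<n<1M` size category listed for the dataset.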
License: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
@InProceedings{nie2019adversarial,
title={Adversarial NLI: A New Benchmark for Natural Language Understanding},
author={Nie, Yixin
and Williams, Adina
and Dinan, Emily
and Bansal, Mohit
and Weston, Jason
and Kiela, Douwe},
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
year = "2020",
publisher = "Association for Computational Linguistics",
}
Thanks to @thomwolf, @easonnie, @lhoestq, and @patrickvonplaten for adding this dataset.