Dataset: nq_open
Task: question-answering
Sub-task: open-domain-qa
Language: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: other
Annotation creators: expert-generated
License: cc-by-sa-3.0

The NQ-Open task, introduced by Lee et al. (2019), is an open-domain question answering benchmark derived from Natural Questions. The goal is to predict an English answer string for an input English question. All questions can be answered using the contents of English Wikipedia.
Open-domain question answering; EfficientQA leaderboard: https://ai.google.com/research/NaturalQuestions/efficientqa
English (en)
{ "question": "names of the metropolitan municipalities in south africa", "answer": [ "Mangaung Metropolitan Municipality", "Nelson Mandela Bay Metropolitan Municipality", "eThekwini Metropolitan Municipality", "City of Tshwane Metropolitan Municipality", "City of Johannesburg Metropolitan Municipality", "Buffalo City Metropolitan Municipality", "City of Ekurhuleni Metropolitan Municipality" ] }
[More Information Needed]
Natural Questions contains questions drawn from aggregated queries issued to Google Search (Kwiatkowski et al., 2019). To obtain an open-domain version of the dataset, we keep only questions with short answers and discard the given evidence documents. Answers with many tokens often resemble extractive snippets rather than canonical answers, so we discard answers with more than 5 tokens.
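As a rough illustration of that filtering rule, the sketch below drops answers longer than five tokens and keeps an example only if at least one short answer remains; whitespace tokenization is an assumption here, and the original preprocessing may tokenize differently.

```python
def filter_short_answers(answers: list[str], max_tokens: int = 5) -> list[str]:
    """Drop answers with more than max_tokens whitespace tokens.

    An example is kept only if this list is non-empty. Whitespace tokenization
    is an illustrative assumption, not necessarily the original tokenizer.
    """
    return [a for a in answers if len(a.split()) <= max_tokens]

answers = [
    "City of Johannesburg Metropolitan Municipality",                    # 5 tokens -> kept
    "the metropolitan municipality that governs greater johannesburg",   # 7 tokens -> dropped
]
print(filter_short_answers(answers))
# ['City of Johannesburg Metropolitan Municipality']
```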
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
Evaluating on this diverse set of question-answer pairs is crucial, because all existing datasets have inherent biases that are problematic for open-domain QA systems with learned retrieval. In the Natural Questions dataset, the question askers do not already know the answer, which accurately reflects the distribution of genuine information-seeking questions. However, annotators must separately find the correct answers, which requires assistance from automatic tools and can introduce a moderate bias towards results from those tools.
[More Information Needed]
[More Information Needed]
All of the Natural Questions data is released under the CC BY-SA 3.0 license.
@article{doi:10.1162/tacl_a_00276,
    author = {Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Devlin, Jacob and Lee, Kenton and Toutanova, Kristina and Jones, Llion and Kelcey, Matthew and Chang, Ming-Wei and Dai, Andrew M. and Uszkoreit, Jakob and Le, Quoc and Petrov, Slav},
    title = {Natural Questions: A Benchmark for Question Answering Research},
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {7},
    pages = {453--466},
    year = {2019},
    doi = {10.1162/tacl_a_00276},
    url = {https://doi.org/10.1162/tacl_a_00276},
    abstract = {We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotated sequestered as test data. We present experiments validating quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.}
}

@inproceedings{lee-etal-2019-latent,
    title = {Latent Retrieval for Weakly Supervised Open Domain Question Answering},
    author = {Lee, Kenton and Chang, Ming-Wei and Toutanova, Kristina},
    booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
    month = jul,
    year = {2019},
    address = {Florence, Italy},
    publisher = {Association for Computational Linguistics},
    url = {https://www.aclweb.org/anthology/P19-1612},
    doi = {10.18653/v1/P19-1612},
    pages = {6086--6096},
    abstract = {Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a blackbox information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.}
}
Thanks to @Nilanshrajput for adding this dataset.