Dataset:

nq_open

Task:

question-answering

Sub-task:

open-domain-qa

Language:

en

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

other

Annotation creators:

expert-generated

Dataset Card for nq_open

Dataset Summary

The NQ-Open task, introduced by Lee et al. (2019), is an open-domain question answering benchmark derived from Natural Questions. The goal is to predict an English answer string for an input English question. All questions can be answered using the contents of English Wikipedia.

Supported Tasks and Leaderboards

Open-domain question answering; EfficientQA leaderboard: https://ai.google.com/research/NaturalQuestions/efficientqa

Languages

English (en)

Dataset Structure

Data Instances

{
    "question": "names of the metropolitan municipalities in south africa",
    "answer": [
        "Mangaung Metropolitan Municipality",
        "Nelson Mandela Bay Metropolitan Municipality",
        "eThekwini Metropolitan Municipality",
        "City of Tshwane Metropolitan Municipality",
        "City of Johannesburg Metropolitan Municipality",
        "Buffalo City Metropolitan Municipality",
        "City of Ekurhuleni Metropolitan Municipality"
    ]
}
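
For reference, the instance above can be loaded directly with the Hugging Face datasets library. The snippet below is a minimal sketch; the split sizes it prints should match the numbers listed under Data Splits.

from datasets import load_dataset

# Download nq_open from the Hugging Face Hub (train and validation splits).
nq_open = load_dataset("nq_open")

# Each example is a question string plus a list of possible answer strings.
print(nq_open["train"][0])

# Number of examples per split.
for split_name, split in nq_open.items():
    print(split_name, split.num_rows)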

Data Fields

  • question - the input open-domain question.
  • answer - the list of possible answers to the question (see the scoring sketch after this list).
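
Because answer is a list, open-domain QA systems are usually scored by exact match against any of the listed answers after light normalization. The helper below is an illustrative sketch of that convention; the normalization rules (lowercasing, stripping punctuation and articles) are SQuAD-style assumptions, not the official EfficientQA scorer.

import re
import string

def normalize_answer(s):
    # Lowercase, drop punctuation, remove English articles, collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, answers):
    # A prediction counts as correct if it matches any acceptable answer.
    return any(normalize_answer(prediction) == normalize_answer(a) for a in answers)

print(exact_match("the eThekwini Metropolitan Municipality",
                  ["eThekwini Metropolitan Municipality"]))  # True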

Data Splits

  • Train: 87,925
  • Validation: 1,800

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

Natural Questions contains questions from aggregated queries issued to Google Search (Kwiatkowski et al., 2019). To gather an open version of this dataset, we only keep questions with short answers and discard the given evidence document. Answers with many tokens often resemble extractive snippets rather than canonical answers, so we discard answers with more than 5 tokens.
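
As an illustration of the filtering rule described above, the sketch below keeps only answers of at most 5 whitespace tokens and drops the evidence document. The record layout (question, short_answers, document) is a hypothetical simplification for illustration, not the actual Natural Questions schema.

def to_open_example(record, max_answer_tokens=5):
    # `record` is assumed to look like:
    #   {"question": str, "short_answers": [str, ...], "document": ...}
    # (hypothetical field names, not the real Natural Questions schema).
    answers = [a for a in record["short_answers"]
               if len(a.split()) <= max_answer_tokens]
    if not answers:
        # No sufficiently short answer: the question is discarded.
        return None
    # The evidence document is dropped for the open-domain setting.
    return {"question": record["question"], "answer": answers}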

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

Evaluating on this diverse set of question-answer pairs is crucial, because all existing datasets have inherent biases that are problematic for open-domain QA systems with learned retrieval. In the Natural Questions dataset, the question askers do not already know the answer. This accurately reflects a distribution of genuine information-seeking questions. However, annotators must separately find correct answers, which requires assistance from automatic tools and can introduce a moderate bias towards results from the tool.

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

All of the Natural Questions data is released under the CC BY-SA 3.0 license.

Citation Information

@article{doi:10.1162/tacl\_a\_00276,
    author = {Kwiatkowski, Tom and Palomaki, Jennimaria and Redfield, Olivia and Collins, Michael and Parikh, Ankur and Alberti, Chris and Epstein, Danielle and Polosukhin, Illia and Devlin, Jacob and Lee, Kenton and Toutanova, Kristina and Jones, Llion and Kelcey, Matthew and Chang, Ming-Wei and Dai, Andrew M. and Uszkoreit, Jakob and Le, Quoc and Petrov, Slav},
    title = {Natural Questions: A Benchmark for Question Answering Research},
    journal = {Transactions of the Association for Computational Linguistics},
    volume = {7},
    number = {},
    pages = {453-466},
    year = {2019},
    doi = {10.1162/tacl\_a\_00276},
    URL = {https://doi.org/10.1162/tacl_a_00276},
    eprint = {https://doi.org/10.1162/tacl_a_00276},
    abstract = { We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotated sequestered as test data. We present experiments validating quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature. }
}

@inproceedings{lee-etal-2019-latent,
    title = "Latent Retrieval for Weakly Supervised Open Domain Question Answering",
    author = "Lee, Kenton  and
      Chang, Ming-Wei  and
      Toutanova, Kristina",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1612",
    doi = "10.18653/v1/P19-1612",
    pages = "6086--6096",
    abstract = "Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a blackbox information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.",
}

Contributions

Thanks to @Nilanshrajput for adding this dataset.