数据集:
wikisql
任务:
语言:
计算机处理:
monolingual大小:
10K<n<100K批注创建人:
crowdsourced源数据集:
original预印本库:
arxiv:1709.00103其他:
text-to-sql许可:
A large crowd-sourced dataset for developing natural language interfaces for relational databases.
WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries distributed across 24241 tables from Wikipedia.
An example of 'validation' looks as follows.
This example was too long and was cropped:
{
    "phase": 1,
    "question": "How would you answer a second test question?",
    "sql": {
        "agg": 0,
        "conds": {
            "column_index": [2],
            "condition": ["Some Entity"],
            "operator_index": [0]
        },
        "human_readable": "SELECT Header1 FROM table WHERE Another Header = Some Entity",
        "sel": 0
    },
    "table": "{\"caption\": \"L\", \"header\": [\"Header1\", \"Header 2\", \"Another Header\"], \"id\": \"1-10015132-9\", \"name\": \"table_10015132_11\", \"page_i..."
}
 The data fields are the same among all splits.
default| name | train | validation | test | 
|---|---|---|---|
| default | 56355 | 8421 | 15878 | 
@article{zhongSeq2SQL2017,
  author    = {Victor Zhong and
               Caiming Xiong and
               Richard Socher},
  title     = {Seq2SQL: Generating Structured Queries from Natural Language using
               Reinforcement Learning},
  journal   = {CoRR},
  volume    = {abs/1709.00103},
  year      = {2017}
}
 Thanks to @lewtun , @ghomasHudson , @thomwolf for adding this dataset.