Dataset: wikisql
Tasks:
Languages:
Multilinguality: monolingual
Size: 10K<n<100K
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1709.00103
Other: text-to-sql
License:
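A large crowd-sourced dataset for developing natural language interfaces for relational databases.
WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries, distributed across 24241 tables from Wikipedia.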
An example of 'validation' looks as follows.
This example was too long and was cropped:
{
    "phase": 1,
    "question": "How would you answer a second test question?",
    "sql": {
        "agg": 0,
        "conds": {
            "column_index": [2],
            "condition": ["Some Entity"],
            "operator_index": [0]
        },
        "human_readable": "SELECT Header1 FROM table WHERE Another Header = Some Entity",
        "sel": 0
    },
    "table": "{\"caption\": \"L\", \"header\": [\"Header1\", \"Header 2\", \"Another Header\"], \"id\": \"1-10015132-9\", \"name\": \"table_10015132_11\", \"page_i..."
}
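The integer-coded `sql` field can be decoded back into the `human_readable` form. Below is a minimal sketch of that decoding. The lookup tables `AGG_OPS` and `COND_OPS` are an assumption taken from the original WikiSQL code release and are not stated on this card, and `to_human_readable` is a hypothetical helper. Applied to the example above, it reproduces the `human_readable` string.

```python
import json

# Assumption: index meanings follow the original WikiSQL release
# (salesforce/WikiSQL); this card does not document them.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def to_human_readable(sql: dict, header: list) -> str:
    """Hypothetical helper: rebuild a readable query from the `sql` field."""
    # `sel` indexes the selected column; `agg` indexes the aggregation (0 = none).
    select = f"{AGG_OPS[sql['agg']]} {header[sql['sel']]}".strip()
    query = f"SELECT {select} FROM table"
    conds = sql["conds"]
    # Conditions are stored column-wise as three parallel lists.
    clauses = [
        f"{header[col]} {COND_OPS[op]} {value}"
        for col, op, value in zip(
            conds["column_index"], conds["operator_index"], conds["condition"]
        )
    ]
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# The validation example above; the cropped `table` keys are simply omitted.
table = json.loads(
    '{"caption": "L", "header": ["Header1", "Header 2", "Another Header"], '
    '"id": "1-10015132-9", "name": "table_10015132_11"}'
)
sql = {
    "agg": 0,
    "conds": {
        "column_index": [2],
        "condition": ["Some Entity"],
        "operator_index": [0],
    },
    "sel": 0,
}
readable = to_human_readable(sql, table["header"])
print(readable)  # SELECT Header1 FROM table WHERE Another Header = Some Entity
```

The `table` field arrives as a JSON string, so `json.loads` is used to recover the column `header` that the `sel` and `column_index` values point into. How a non-zero aggregate is spelled (for example `COUNT Header 2` rather than `COUNT(Header 2)`) is a presentation choice in this sketch, not something the card specifies.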
The data fields are the same among all splits.
default

| name | train | validation | test |
|---|---|---|---|
| default | 56355 | 8421 | 15878 | 
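As a quick consistency check, the three split sizes add up to the 80654 questions given in the description. The short snippet below computes the total and each split's share of it, using only the numbers from the table above.

```python
# Split sizes copied from the table above.
splits = {"train": 56355, "validation": 8421, "test": 15878}

total = sum(splits.values())
print(total)  # 80654, matching the question count in the description

shares = {}
for name, size in splits.items():
    # Fraction of all questions that falls in each split.
    shares[name] = size / total
    print(f"{name}: {shares[name]:.1%}")
```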
@article{zhongSeq2SQL2017,
  author    = {Victor Zhong and
               Caiming Xiong and
               Richard Socher},
  title     = {Seq2SQL: Generating Structured Queries from Natural Language using
               Reinforcement Learning},
  journal   = {CoRR},
  volume    = {abs/1709.00103},
  year      = {2017}
}
Thanks to @lewtun, @ghomasHudson, and @thomwolf for adding this dataset.