Dataset: wikisql
Tasks:
Languages:
Multilinguality: monolingual
Size: 10K<n<100K
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1709.00103
Other: text-to-sql
License:
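A large crowdsourced dataset for developing natural language interfaces for relational databases.
WikiSQL is a dataset of 80654 hand-annotated examples of questions and SQL queries, distributed across 24241 tables from Wikipedia.
An example of 'validation' looks as follows.
This example was too long and was cropped: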
{
    "phase": 1,
    "question": "How would you answer a second test question?",
    "sql": {
        "agg": 0,
        "conds": {
            "column_index": [2],
            "condition": ["Some Entity"],
            "operator_index": [0]
        },
        "human_readable": "SELECT Header1 FROM table WHERE Another Header = Some Entity",
        "sel": 0
    },
    "table": "{\"caption\": \"L\", \"header\": [\"Header1\", \"Header 2\", \"Another Header\"], \"id\": \"1-10015132-9\", \"name\": \"table_10015132_11\", \"page_i..."
}
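The index-based `sql` field in the example can be related to its `human_readable` string with a small sketch like the one below. The operator vocabularies `AGG_OPS` and `COND_OPS` are assumptions taken from the convention of the WikiSQL reference code, not from this card, and the helper `to_readable` is a hypothetical name introduced only for illustration.

```python
# Assumed operator vocabularies (WikiSQL reference-code convention, not stated in this card).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def to_readable(sql, header):
    """Rebuild a human-readable query string from the index-based `sql` field."""
    # `sel` indexes the selected column; `agg` indexes the aggregation operator.
    select = header[sql["sel"]]
    agg = AGG_OPS[sql["agg"]]
    if agg:
        select = f"{agg} {select}"

    # `conds` holds three parallel lists: column index, operator index, and value.
    conds = sql["conds"]
    clauses = [
        f"{header[col]} {COND_OPS[op]} {value}"
        for col, op, value in zip(
            conds["column_index"], conds["operator_index"], conds["condition"]
        )
    ]

    query = f"SELECT {select} FROM table"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# Values copied from the cropped 'validation' example.
header = ["Header1", "Header 2", "Another Header"]
sql = {
    "agg": 0,
    "conds": {"column_index": [2], "condition": ["Some Entity"], "operator_index": [0]},
    "sel": 0,
}
readable = to_readable(sql, header)
print(readable)
```

Under these assumptions the rebuilt string matches the `human_readable` value shown in the example, which suggests that `sel` and `column_index` are positions in the table's `header` list.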
The data fields are the same among all splits.
default

| name | train | validation | test |
|---|---|---|---|
| default | 56355 | 8421 | 15878 |
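As a quick consistency check, the split sizes in the table can be summed and compared with the 80654 examples stated in the description. This is only an arithmetic sketch; the variable names are illustrative.

```python
# Split sizes copied from the table above.
splits = {"train": 56355, "validation": 8421, "test": 15878}

# The total should equal the number of examples quoted in the dataset description.
total = sum(splits.values())

# Share of each split, in percent of the total.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)
print(shares)
```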
@article{zhongSeq2SQL2017,
author = {Victor Zhong and
Caiming Xiong and
Richard Socher},
title = {Seq2SQL: Generating Structured Queries from Natural Language using
Reinforcement Learning},
journal = {CoRR},
volume = {abs/1709.00103},
year = {2017}
}
Thanks to @lewtun, @ghomasHudson, @thomwolf for adding this dataset.