英文

TAPEX(大型模型)

TAPEX是由刘前,陈贝,郭佳琪,Morteza Ziyadi,林泽琪,陈伟柱,楼建光于 TAPEX: Table Pre-training via Learning a Neural SQL Executor 提出的。原始代码库可以在 here 中找到。

模型描述

TAPEX(Table Pre-training via Execution)是一种概念简单且经验丰富的预训练方法,可以赋予现有模型具有表格推理技能。TAPEX通过学习一个神经SQL执行器来实现表格预训练,该执行器通过自动合成可执行的SQL查询获取合成语料库。

TAPEX基于BART架构,具有双向(类BERT)编码器和自回归(类GPT)解码器的Transformer编码器-解码器(seq2seq)模型。

此模型是在 WikiTableQuestions 数据集上微调的tapex-base模型。

预期用途

您可以使用该模型回答关于复杂问题的表格问答。下面显示了一些可解答的问题(对应的表格未显示):

Question Answer
according to the table, what is the last title that spicy horse produced? Akaneiro: Demon Hunters
what is the difference in runners-up from coleraine academical institution and royal school dungannon? 20
what were the first and last movies greenstreet acted in? The Maltese Falcon, Malaya
in which olympic games did arasay thondike not finish in the top 20? 2012
which broadcaster hosted 3 titles but they had only 1 episode? Channel 4

如何使用

这里是在transformers中使用此模型的方法:

from transformers import TapexTokenizer, BartForConditionalGeneration
import pandas as pd

tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-large-finetuned-wtq")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-large-finetuned-wtq")

data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
}
table = pd.DataFrame.from_dict(data)

# tapex accepts uncased input since it is pre-trained on the uncased corpus
query = "In which year did beijing host the Olympic Games?"
encoding = tokenizer(table=table, query=query, return_tensors="pt")

outputs = model.generate(**encoding)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# [' 2008.0']

如何评估

请找到评估脚本 here

BibTeX条目和引用信息

@inproceedings{
    liu2022tapex,
    title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
    author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=O50443AsCP}
}