Model:
eugenesiow/bart-paraphrase
A large BART seq2seq (text-to-text generation) model fine-tuned on 3 paraphrase datasets.
The BART model was proposed by Lewis et al. in 2019.
The original BART code comes from this repository.
You can use the pre-trained model to paraphrase an input sentence:
import torch
from transformers import BartForConditionalGeneration, BartTokenizer
input_sentence = "They were there to enjoy us and they were there to pray for us."
# Load the fine-tuned paraphrase model and move it to the GPU if one is available
model = BartForConditionalGeneration.from_pretrained('eugenesiow/bart-paraphrase')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
tokenizer = BartTokenizer.from_pretrained('eugenesiow/bart-paraphrase')
# Tokenize the input and move the tensors to the same device as the model
batch = tokenizer(input_sentence, return_tensors='pt').to(device)
generated_ids = model.generate(batch['input_ids'])
generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_sentence)
['They were there to enjoy us and to pray for us.']
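The call above returns a single paraphrase. The sketch below shows one way to request several candidate paraphrases with beam search, assuming the model, tokenizer and batch from the snippet above; num_beams, num_return_sequences and max_length are standard transformers generate() arguments, and the values used here are illustrative choices rather than settings recommended by this model card.
# Sketch: generate several candidate paraphrases with beam search.
# The parameter values below are illustrative, not tuned recommendations.
generated_ids = model.generate(
    batch['input_ids'],
    num_beams=5,
    num_return_sequences=3,
    max_length=64,
)
candidates = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
for candidate in candidates:
    print(candidate)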
The model is fine-tuned from the pretrained facebook/bart-large checkpoint on the Quora, PAWS and MSR paraphrase corpora.
We followed the training procedure provided in the simpletransformers seq2seq example; a rough sketch of that procedure is shown below.
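As a minimal sketch of that procedure: simpletransformers' Seq2SeqModel trains on a pandas DataFrame with "input_text" and "target_text" columns. The single training pair, epoch count and sequence length below are placeholders for illustration, not the actual data or hyperparameters used to produce eugenesiow/bart-paraphrase.
import torch
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs
# Paraphrase pairs go in a DataFrame with "input_text" / "target_text" columns.
# This single pair is a placeholder; the real training data was Quora, PAWS and MSR.
train_df = pd.DataFrame(
    [["They were there to enjoy us and they were there to pray for us.",
      "They were there to enjoy us and to pray for us."]],
    columns=["input_text", "target_text"],
)
model_args = Seq2SeqArgs()
model_args.num_train_epochs = 2   # illustrative value, not the actual setting
model_args.max_seq_length = 128   # illustrative value, not the actual setting
# Fine-tune from the facebook/bart-large checkpoint, as in the simpletransformers example
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
    use_cuda=torch.cuda.is_available(),
)
model.train_model(train_df)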
@misc{lewis2019bart,
title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension},
author={Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
year={2019},
eprint={1910.13461},
archivePrefix={arXiv},
primaryClass={cs.CL}
}