Model:
eugenesiow/bart-paraphrase
A large BART seq2seq (text-to-text generation) model fine-tuned on three paraphrase datasets.
The BART model was proposed by Lewis et al. in 2019.
The original BART code comes from this repository.
You can use the pre-trained model to paraphrase an input sentence.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

input_sentence = "They were there to enjoy us and they were there to pray for us."

# Load the fine-tuned paraphrase model and move it to the GPU if one is available.
model = BartForConditionalGeneration.from_pretrained('eugenesiow/bart-paraphrase')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Tokenize the input and keep the tensors on the same device as the model.
tokenizer = BartTokenizer.from_pretrained('eugenesiow/bart-paraphrase')
batch = tokenizer(input_sentence, return_tensors='pt').to(device)

# Generate a paraphrase and decode it back to text.
generated_ids = model.generate(batch['input_ids'])
generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_sentence)
['They were there to enjoy us and to pray for us.']
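The call above returns a single output. If you want several candidate paraphrases, the standard transformers generation parameters (num_beams, num_return_sequences, max_length) can be passed to model.generate; the values below are illustrative choices, not settings taken from this model card.

import torch
from transformers import BartForConditionalGeneration, BartTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BartForConditionalGeneration.from_pretrained('eugenesiow/bart-paraphrase').to(device)
tokenizer = BartTokenizer.from_pretrained('eugenesiow/bart-paraphrase')

batch = tokenizer("They were there to enjoy us and they were there to pray for us.",
                  return_tensors='pt').to(device)

# Beam search with several returned sequences yields multiple candidate paraphrases.
generated_ids = model.generate(
    batch['input_ids'],
    num_beams=5,               # illustrative beam width, not a setting from the model card
    num_return_sequences=3,    # ask for three distinct candidates
    max_length=64,             # illustrative cap on output length
)
for candidate in tokenizer.batch_decode(generated_ids, skip_special_tokens=True):
    print(candidate)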
The model is fine-tuned from the pre-trained facebook/bart-large model on the Quora, PAWS and MSR paraphrase corpus datasets.
We follow the training procedure provided in the simpletransformers seq2seq example; a rough sketch of that setup is given below.
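A minimal sketch of that fine-tuning setup, assuming the simpletransformers Seq2SeqModel API; the training pairs, hyperparameters and column layout below are illustrative, not the exact configuration used for this checkpoint.

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Hypothetical paraphrase pairs; the real training data combines the Quora, PAWS
# and MSR paraphrase corpora, which are not reproduced here.
train_df = pd.DataFrame(
    [
        ["They were there to enjoy us and they were there to pray for us.",
         "They were there to enjoy us and to pray for us."],
    ],
    columns=["input_text", "target_text"],
)

# Illustrative hyperparameters; the model card does not publish the exact values.
model_args = Seq2SeqArgs()
model_args.num_train_epochs = 2
model_args.overwrite_output_dir = True

# Fine-tune facebook/bart-large as a BART seq2seq model on the paraphrase pairs.
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
)
model.train_model(train_df)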
@misc{lewis2019bart,
  title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension},
  author={Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
  year={2019},
  eprint={1910.13461},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}