Model:
eugenesiow/bart-paraphrase
A large BART seq2seq (text2text generation) model fine-tuned on 3 paraphrase datasets.
The BART model was proposed in BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Lewis et al. (2019).
The original BART code is from this repository.
You can use the pre-trained model for paraphrasing an input sentence.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

input_sentence = "They were there to enjoy us and they were there to pray for us."

# Load the fine-tuned paraphrase model and its tokenizer.
model = BartForConditionalGeneration.from_pretrained('eugenesiow/bart-paraphrase')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
tokenizer = BartTokenizer.from_pretrained('eugenesiow/bart-paraphrase')

# Tokenize the input and move it to the same device as the model.
batch = tokenizer(input_sentence, return_tensors='pt').to(device)
generated_ids = model.generate(batch['input_ids'])
generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_sentence)
['They were there to enjoy us and to pray for us.']
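To get several candidate paraphrases instead of a single greedy output, you can pass standard transformers generate() arguments such as num_beams and num_return_sequences. The following is a minimal sketch continuing from the snippet above; the specific generation settings are illustrative and are not part of the original model card.

# Continuing from the snippet above: use beam search to return several candidates.
generated_ids = model.generate(
    batch['input_ids'],
    num_beams=5,               # illustrative beam width
    num_return_sequences=3,    # return three candidate paraphrases
    max_length=64,             # illustrative cap on output length
)
for candidate in tokenizer.batch_decode(generated_ids, skip_special_tokens=True):
    print(candidate)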
The model was fine-tuned on a pretrained facebook/bart-large, using the Quora, PAWS and MSR paraphrase corpus.
We follow the training procedure provided in the simpletransformers seq2seq example.
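For reference, the sketch below shows what that procedure looks like with the simpletransformers Seq2SeqModel API. The toy DataFrame and the hyperparameter values are illustrative placeholders, not the actual data or settings used to train this model.

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Toy paraphrase pairs; the real fine-tuning used Quora, PAWS and MSR pairs
# in the same "input_text" / "target_text" column format.
train_df = pd.DataFrame(
    [["They were there to enjoy us and they were there to pray for us.",
      "They were there to enjoy us and to pray for us."]],
    columns=["input_text", "target_text"],
)

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 1          # illustrative value
model_args.overwrite_output_dir = True

# Start from the pretrained facebook/bart-large checkpoint, as described above.
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    args=model_args,
)
model.train_model(train_df)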
@misc{lewis2019bart,
  title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension},
  author={Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
  year={2019},
  eprint={1910.13461},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}