Model:
pszemraj/bigbird-pegasus-large-K-booksum
This is the longest-trained "latest" version of the model, currently at 70k training steps.
An extended example, including a batch summarization demo, is available here.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# Load the model; low_cpu_mem_usage reduces peak RAM during loading
model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
)

# Build a summarization pipeline from the loaded model and tokenizer
summarizer = pipeline(
    "summarization",
    model=model,
    tokenizer=tokenizer,
)

wall_of_text = "your text to be summarized goes here."

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,  # block repeated 3-grams in the summary
    clean_up_tokenization_spaces=True,
)
print(result[0]["summary_text"])