Model:
s-nlp/roberta_toxicity_classifier
This model is trained for the toxicity classification task. The training data is the merge of the English parts of three Jigsaw datasets (Jigsaw 2018, Jigsaw 2019, Jigsaw 2020), about 2 million examples in total. We split it into two parts and fine-tuned a RoBERTa model (RoBERTa: A Robustly Optimized BERT Pretraining Approach) on it. The classifier performs well on the test set of the first Jigsaw competition, reaching an AUC-ROC of 0.98 and an F1 score of 0.76.
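The card does not publish the training script; the following is a minimal fine-tuning sketch under stated assumptions: the merged data is stood in for by a toy in-memory dataset with "text" and "label" columns, and every hyperparameter (epochs, batch size, sequence length) is illustrative rather than the setting used for the released checkpoint.

from datasets import Dataset
from transformers import (RobertaTokenizer, RobertaForSequenceClassification,
                          Trainer, TrainingArguments)

# start from the pretrained roberta-base checkpoint with a 2-way head
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=2)

def tokenize(batch):
    # truncate/pad so all examples share one sequence length (illustrative max_length)
    return tokenizer(batch['text'], truncation=True, padding='max_length', max_length=128)

# toy stand-in for the merged Jigsaw data: label 0 = neutral, 1 = toxic
train_ds = Dataset.from_dict({'text': ['you are amazing', 'you are awful'],
                              'label': [0, 1]}).map(tokenize, batched=True)

args = TrainingArguments(output_dir='toxicity_clf', num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train_ds).train()

To run inference with the released classifier: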
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# load tokenizer and model weights (the checkpoint was formerly published
# under 'SkolkovoInstitute/roberta_toxicity_classifier')
tokenizer = RobertaTokenizer.from_pretrained('s-nlp/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('s-nlp/roberta_toxicity_classifier')

# prepare the input
batch = tokenizer.encode('you are amazing', return_tensors='pt')

# inference: returns a SequenceClassifierOutput with two logits per example
model(batch)
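The forward pass above returns raw logits; below is a short follow-up sketch for turning them into probabilities. The assumed label order (index 0 = neutral, index 1 = toxic) can be confirmed via model.config.id2label.

import torch

with torch.no_grad():
    logits = model(batch).logits       # shape: (1, 2)
probs = torch.softmax(logits, dim=-1)  # assumed order: [neutral, toxic]
print(probs)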
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.