Dataset:
SetFit/toxic_conversations
This is a version of the Jigsaw Unintended Bias in Toxicity Classification dataset. It contains comments from the Civil Comments platform, together with annotations indicating whether each comment is toxic.
Each example was annotated by 10 annotators and, as recommended on the task page, a comment is labeled toxic when target >= 0.5.
The dataset is imbalanced, with only about 8% of the comments marked as toxic.
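
The labeling rule and the class imbalance described above can be illustrated with a minimal sketch. It is not taken from the dataset's own tooling: the helper names `binarize` and `class_weights` are hypothetical, the scores in `example_targets` are made up for illustration, and inverse-frequency weighting is just one common way to compensate for a rare positive class.

```python
from collections import Counter

# Threshold from the dataset description: toxic when target >= 0.5.
TOXICITY_THRESHOLD = 0.5


def binarize(target: float, threshold: float = TOXICITY_THRESHOLD) -> int:
    """Map a continuous toxicity score to a binary label (1 = toxic)."""
    return 1 if target >= threshold else 0


def class_weights(labels: list[int]) -> dict[int, float]:
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Made-up scores for illustration only; these are NOT real dataset values.
example_targets = [0.0, 0.1, 0.5, 0.0, 0.2, 0.0, 0.9, 0.3, 0.0, 0.4]

labels = [binarize(t) for t in example_targets]
toxic_fraction = sum(labels) / len(labels)
weights = class_weights(labels)

print(labels)
print(toxic_fraction)
print(weights)
```

Note that a score of exactly 0.5 counts as toxic because the rule uses `>=`. On the real data, where only about 8% of comments are toxic, accuracy alone is a weak metric (always predicting "not toxic" would already score around 92%), so weights like these or metrics such as F1 are worth considering.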