Dataset:

SetFit/toxic_conversations


Toxic Conversation

This is a version of the Jigsaw Unintended Bias in Toxicity Classification dataset. It contains comments from the Civil Comments platform, together with annotations indicating whether each comment is toxic.

Ten annotators labeled each example. As recommended on the task page, a comment is marked as toxic when target >= 0.5.
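The thresholding rule can be sketched in a few lines of Python. This is only an illustration: the helper name `binarize_toxicity` and the sample scores are hypothetical and are not taken from the dataset or its loading code. It assumes `target` is a score between 0 and 1.

```python
def binarize_toxicity(target: float, threshold: float = 0.5) -> int:
    """Return 1 (toxic) when target >= threshold, otherwise 0 (not toxic)."""
    return int(target >= threshold)


# Hypothetical target scores, for illustration only.
scores = [0.0, 0.2, 0.5, 0.83]
labels = [binarize_toxicity(s) for s in scores]
print(labels)  # [0, 0, 1, 1]
```

Note that the comparison is inclusive, so a score of exactly 0.5 counts as toxic.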

The dataset is imbalanced, with only about 8% of the comments marked as toxic.
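One common way to account for this kind of skew during training is to weight each class inversely to its frequency. The sketch below is an assumption about how one might handle it, not something the dataset card prescribes. The helper `balanced_class_weights` is hypothetical, and the 8/92 split is a toy stand-in for the roughly 8% toxic rate, not the real label counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy label list mimicking an ~8% positive rate (illustrative, not real data).
toy_labels = [1] * 8 + [0] * 92
weights = balanced_class_weights(toy_labels)
print(weights[1])  # 6.25
```

With these weights, each toxic example contributes about 11.5 times as much to a weighted loss as a non-toxic one, which offsets the difference in class counts.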