Dataset: mediabiasgroup/mbib-base
| Task | Best model (Micro F1 / Macro F1) | Micro F1 | Macro F1 |
| --- | --- | --- | --- |
| cognitive-bias | ConvBERT / ConvBERT | 0.7126 | 0.7664 |
| fake-news | Bart / RoBERTa-T | 0.6811 | 0.7533 |
| gender-bias | RoBERTa-T / ELECTRA | 0.8334 | 0.8211 |
| hate-speech | RoBERTa-T / Bart | 0.8897 | 0.7310 |
| linguistic-bias | ConvBERT / Bart | 0.7044 | 0.4995 |
| political-bias | ConvBERT / ConvBERT | 0.7041 | 0.7110 |
| racial-bias | ConvBERT / ELECTRA | 0.8772 | 0.6170 |
| text-level-bias | ConvBERT / ConvBERT | 0.7697 | 0.7532 |
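As a reminder of how the two reported metrics differ, micro F1 pools true positives, false positives, and false negatives over all instances, while macro F1 averages the per-class F1 scores, giving minority classes equal weight. A minimal sketch using scikit-learn (the choice of library and the toy labels are assumptions for illustration, not the benchmark's evaluation code):

```python
from sklearn.metrics import f1_score

# Hypothetical gold labels and predictions for a binary bias task
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Micro F1: pools TP/FP/FN across classes, so every instance counts equally
micro_f1 = f1_score(y_true, y_pred, average="micro")

# Macro F1: averages per-class F1, so the minority class weighs equally
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(f"Micro F1: {micro_f1:.4f}, Macro F1: {macro_f1:.4f}")
```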
All datasets are in English.
An example training instance is shown below.
```json
{
    "text": "A defense bill includes language that would require military hospitals to provide abortions on demand",
    "label": 1
}
```
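Assuming the collection is hosted on the Hugging Face Hub under `mediabiasgroup/mbib-base` with one configuration per task, an instance like the one above can be loaded with the `datasets` library. The config name `"cognitive-bias"` and the `"train"` split below are assumptions based on the task names in the table; adjust them to the actual configurations exposed by the dataset.

```python
from datasets import load_dataset

# Load one MBIB task; the config name "cognitive-bias" is assumed to
# match the task names listed in the table above.
ds = load_dataset("mediabiasgroup/mbib-base", "cognitive-bias", split="train")

# Each record holds a "text" string and an integer "label"
# (presumably 1 = biased, 0 = non-biased).
example = ds[0]
print(example["text"], example["label"])
```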
We believe MBIB provides a new common ground for research in this field, especially in light of the growing attention to media bias.
```bibtex
@inproceedings{wessel2023mbib,
    title  = {Introducing MBIB - the first Media Bias Identification Benchmark Task and Dataset Collection},
    author = {Wessel, Martin and Spinde, Timo and Horych, Tomáš and Ruas, Terry and Aizawa, Akiko and Gipp, Bela},
    year   = {2023},
    note   = {[in review]}
}
```