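Dataset: id_clickbait
Tasks:
Subtasks: fact-checking
Languages: Indonesian
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: original
License: Creative Commons Attribution 4.0 International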
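The CLICK-ID dataset is a collection of Indonesian news headlines gathered from 12 local online news publishers: detikNews, Fimela, Kapanlagi, Kompas, Liputan6, Okezone, Posmetro-Medan, Republika, Sindonews, Tempo, Tribunnews, and Wowkeren. It consists of two main parts: (i) 46,119 raw article records and (ii) 15,000 sample headlines annotated for clickbait. Each headline was reviewed by 3 annotators, who judged it on the headline alone, and the majority vote was taken as the ground truth. Of the annotated samples, 6,290 headlines are labeled clickbait and 8,710 non-clickbait.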
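The labeling rule and the class balance described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' annotation pipeline: the `majority_label` helper, the example votes, and the encoding 1 = clickbait / 0 = non-clickbait are assumptions made for the sketch. Only the counts 6,290 and 8,710 come from the description.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators.

    Hypothetical helper: assumes an odd number of binary votes,
    so a tie cannot occur.
    """
    label, _ = Counter(votes).most_common(1)[0]
    return label


# Hypothetical votes from three annotators for a single headline
# (assumed encoding: 1 = clickbait, 0 = non-clickbait).
example_votes = [1, 0, 1]
gold = majority_label(example_votes)

# Class counts as reported in the dataset description.
counts = {"clickbait": 6290, "non-clickbait": 8710}
total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

print(gold)
print(total)
print({name: round(share, 4) for name, share in shares.items()})
```

With three annotators and two classes, a majority always exists, so no tie-breaking rule is needed. The shares show that the annotated part is moderately imbalanced toward non-clickbait headlines.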
[More Information Needed]
Indonesian
An example of an annotated article:
{
'id': '100',
'label': 1,
'title': "SAH! Ini Daftar Nama Menteri Kabinet Jokowi - Ma'ruf Amin"
}
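As a rough sketch of how such a record might be handled in Python, the snippet below turns the example instance into a readable line. The `LABEL_NAMES` mapping (0 = non-clickbait, 1 = clickbait) and the `describe` helper are assumptions for illustration; the card itself does not spell out the label encoding.

```python
# Assumed mapping from integer labels to class names (not stated in the card).
LABEL_NAMES = {0: "non-clickbait", 1: "clickbait"}

# The example instance from the card; note that 'id' is a string, not an integer.
record = {
    "id": "100",
    "label": 1,
    "title": "SAH! Ini Daftar Nama Menteri Kabinet Jokowi - Ma'ruf Amin",
}


def describe(rec):
    """Format a record as '[class name] headline' (hypothetical helper)."""
    return f"[{LABEL_NAMES[rec['label']]}] {rec['title']}"


summary = describe(record)
print(summary)
```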
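The dataset contains a train set.
[More Information Needed]
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Creative Commons Attribution 4.0 International License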
@article{WILLIAM2020106231,
title = "CLICK-ID: A novel dataset for Indonesian clickbait headlines",
journal = "Data in Brief",
volume = "32",
pages = "106231",
year = "2020",
issn = "2352-3409",
doi = "https://doi.org/10.1016/j.dib.2020.106231",
url = "http://www.sciencedirect.com/science/article/pii/S2352340920311252",
author = "Andika William and Yunita Sari",
keywords = "Indonesian, Natural Language Processing, News articles, Clickbait, Text-classification",
abstract = "News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, the majority of the tasks has been focused on English news, in which there is already a rich representative resource. For other languages, such as Indonesian, there is still a lack of resource for clickbait tasks. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It is comprised of 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model achieving favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection or other experiments in NLP areas."
}
Thanks to @cahya-wirawan for adding this dataset.