数据集:
id_nergrit_corpus
任务:
语言:
计算机处理:
monolingual大小:
10K<n<100K语言创建人:
expert-generated批注创建人:
expert-generated源数据集:
original许可:
Nergrit Corpus is a dataset collection of Indonesian Named Entity Recognition, Statement Extraction, and Sentiment Analysis developed by PT Gria Inovasi Teknologi (GRIT) .
[More Information Needed]
Indonesian
A data point consists of sentences seperated by empty line and tab-seperated tokens and tags.
{'id': '0',
'tokens': ['Gubernur', 'Bank', 'Indonesia', 'menggelar', 'konferensi', 'pers'],
'ner_tags': [9, 28, 28, 38, 38, 38],
}
[More Information Needed]
The ner_tags correspond to this list:
"B-CRD", "B-DAT", "B-EVT", "B-FAC", "B-GPE", "B-LAN", "B-LAW", "B-LOC", "B-MON", "B-NOR", "B-ORD", "B-ORG", "B-PER", "B-PRC", "B-PRD", "B-QTY", "B-REG", "B-TIM", "B-WOA", "I-CRD", "I-DAT", "I-EVT", "I-FAC", "I-GPE", "I-LAN", "I-LAW", "I-LOC", "I-MON", "I-NOR", "I-ORD", "I-ORG", "I-PER", "I-PRC", "I-PRD", "I-QTY", "I-REG", "I-TIM", "I-WOA", "O",
The ner_tags have the same format as in the CoNLL shared task: a B denotes the first item of a phrase and an I any non-initial word. The dataset contains 19 following entities
'CRD': Cardinal
'DAT': Date
'EVT': Event
'FAC': Facility
'GPE': Geopolitical Entity
'LAW': Law Entity (such as Undang-Undang)
'LOC': Location
'MON': Money
'NOR': Political Organization
'ORD': Ordinal
'ORG': Organization
'PER': Person
'PRC': Percent
'PRD': Product
'QTY': Quantity
'REG': Religion
'TIM': Time
'WOA': Work of Art
'LAN': Language
Sentiment Analysis
The ner_tags correspond to this list:
"B-NEG", "B-NET", "B-POS", "I-NEG", "I-NET", "I-POS", "O",Statement Extraction
The ner_tags correspond to this list:
"B-BREL", "B-FREL", "B-STAT", "B-WHO", "I-BREL", "I-FREL", "I-STAT", "I-WHO", "O"
The ner_tags have the same format as in the CoNLL shared task: a B denotes the first item of a phrase and an I any non-initial word.
The dataset is splitted in to train, validation and test sets.
[More Information Needed]
[More Information Needed]
Who are the source language producers?[More Information Needed]
[More Information Needed]
Who are the annotators?The annotators are listed in the Nergrit Corpus repository
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Thanks to @cahya-wirawan for adding this dataset.