数据集:
tner/fin
FIN NER 数据集格式化为 TNER 项目的一部分。FIN 数据集包含训练集 (FIN5) 和测试集 (FIN3),因此我们从训练集中随机抽取一半的测试实例来创建验证集。
训练集的一个示例如下所示。
{
"tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal", "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"]
}
标签到 ID 的映射词典可以在 here 找到。
{
"O": 0,
"B-PER": 1,
"B-LOC": 2,
"B-ORG": 3,
"B-MISC": 4,
"I-PER": 5,
"I-LOC": 6,
"I-ORG": 7,
"I-MISC": 8
}
| name | train | validation | test |
|---|---|---|---|
| fin | 1014 | 303 | 150 |
@inproceedings{salinas-alvarado-etal-2015-domain,
title = "Domain Adaption of Named Entity Recognition to Support Credit Risk Assessment",
author = "Salinas Alvarado, Julio Cesar and
Verspoor, Karin and
Baldwin, Timothy",
booktitle = "Proceedings of the Australasian Language Technology Association Workshop 2015",
month = dec,
year = "2015",
address = "Parramatta, Australia",
url = "https://aclanthology.org/U15-1010",
pages = "84--90",
}