Model:
pysentimiento/robertuito-ner
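Repository: https://github.com/pysentimiento/pysentimiento/
This model was trained on the Spanish/English dataset of the LinCE NER corpus. LinCE is a benchmark for linguistic code-switching. The base model is RoBERTuito, a RoBERTa model trained on Spanish tweets.
If you want to use this model, we recommend calling it through the pysentimiento library rather than through a pipeline, where it does not work properly because of tokenization issues.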
from pysentimiento import create_analyzer
ner_analyzer = create_analyzer("ner", lang="es")
ner_analyzer.predict(
"rindanse ante el mejor, leonel andres messi cuccitini. serresiete no existis, segui en al-nassr"
)
# [{'type': 'PER',
# 'text': 'leonel andres messi cuccitini',
# 'start': 24,
# 'end': 53},
# {'type': 'PER', 'text': 'serresiete', 'start': 55, 'end': 65},
# {'type': 'LOC', 'text': 'al-nassr', 'start': 108, 'end': 116}]
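The entities in the output above can be post-processed with plain Python. The following is a minimal sketch, assuming the prediction is available as a list of dictionaries with the `type`, `text`, `start` and `end` keys shown in that output. The helper `group_entities_by_type` and the variable `sample_entities` are hypothetical names introduced for illustration and are not part of pysentimiento.

```python
from collections import defaultdict


def group_entities_by_type(entities):
    """Group entity texts by their 'type' label, keeping the original order.

    Assumes each entity is a dict with at least 'type' and 'text' keys,
    as in the example output above (hypothetical helper, not library API).
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


# Sample data copied from the example output above.
sample_entities = [
    {"type": "PER", "text": "leonel andres messi cuccitini", "start": 24, "end": 53},
    {"type": "PER", "text": "serresiete", "start": 55, "end": 65},
    {"type": "LOC", "text": "al-nassr", "start": 108, "end": 116},
]

grouped = group_entities_by_type(sample_entities)
print(grouped)
# {'PER': ['leonel andres messi cuccitini', 'serresiete'], 'LOC': ['al-nassr']}
```

If the object returned by `predict` is not a plain list in your installed version, convert it to one before passing it to the helper.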
Results are taken from the LinCE leaderboard.
| Model | Sentiment | NER | POS |
|---|---|---|---|
| RoBERTuito | 60.6 | 68.5 | 97.2 |
| XLM Large | -- | 69.5 | 97.2 |
| XLM Base | -- | 64.9 | 97.0 |
| C2S mBERT | 59.1 | 64.6 | 96.9 |
| mBERT | 56.4 | 64.0 | 97.1 |
| BERT | 58.4 | 61.1 | 96.9 |
| BETO | 56.5 | -- | -- |
If you use this model in your research, please cite the pysentimiento, RoBERTuito and LinCE papers:
@misc{perez2021pysentimiento,
title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
year={2021},
eprint={2106.09462},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@inproceedings{perez2022robertuito,
title={RoBERTuito: a pre-trained language model for social media text in Spanish},
author={P{\'e}rez, Juan Manuel and Furman, Dami{\'a}n Ariel and Alemany, Laura Alonso and Luque, Franco M},
booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
pages={7235--7243},
year={2022}
}
@inproceedings{aguilar2020lince,
title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
pages={1803--1813},
year={2020}
}