BERTeus base cased

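This is the pre-trained Basque language model presented in the paper Give your Text Representation Models some Love: the Case for Basque. The model was trained on a Basque corpus comprising news articles from online newspapers and the Basque Wikipedia. The training corpus contains 224.6 million tokens, of which 35 million come from Wikipedia.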

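BERTeus has been tested on four different downstream tasks for Basque: part-of-speech (POS) tagging, named entity recognition (NER), sentiment analysis, and topic classification. It improves on the previous state of the art in all four. A summary of the results follows: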

Downstream task	BERTeus	mBERT	Previous SOTA
Topic Classification	76.77	68.42	63.00
Sentiment	78.10	71.02	74.02
POS	97.76	96.37	96.10
NER	87.06	81.52	76.72
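
To make the size of the improvements easier to read, the following minimal sketch computes the absolute gain of BERTeus over mBERT and over the previous state of the art for each task. The scores are copied from the table above. The names `scores` and `gains` are illustrative only and are not part of any released code, and the gains are in whatever units the table reports.

```python
# Scores copied from the results table: (BERTeus, mBERT, previous SOTA).
scores = {
    "Topic Classification": (76.77, 68.42, 63.00),
    "Sentiment": (78.10, 71.02, 74.02),
    "POS": (97.76, 96.37, 96.10),
    "NER": (87.06, 81.52, 76.72),
}


def gains(table):
    """Return, per task, the absolute gain of BERTeus over mBERT
    and over the previous SOTA, rounded to two decimals."""
    return {
        task: (round(berteus - mbert, 2), round(berteus - sota, 2))
        for task, (berteus, mbert, sota) in table.items()
    }


if __name__ == "__main__":
    for task, (vs_mbert, vs_sota) in gains(scores).items():
        print(f"{task}: +{vs_mbert:.2f} vs mBERT, +{vs_sota:.2f} vs previous SOTA")
```

By this calculation the largest gains are on topic classification and NER, and the smallest on POS tagging, where all three systems already score above 96.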

If you use this model, please cite the following paper:

@inproceedings{agerri2020give,
  title={Give your Text Representation Models some Love: the Case for Basque},
  author={Rodrigo Agerri and I{\~n}aki San Vicente and Jon Ander Campos and Ander Barrena and Xabier Saralegi and Aitor Soroa and Eneko Agirre},
  booktitle={Proceedings of the 12th International Conference on Language Resources and Evaluation},
  year={2020}
}

Author:

Ixa taldea

Dataset size:

950.46 MB