Model:

xlm-roberta-large-finetuned-conll02-spanish

Task:

Fill-Mask

Libraries:

PyTorch, Rust, Transformers

Language:

multilingual

Other:

xlm-roberta, AutoTrain Compatible

Preprints:

arxiv:1911.02116, arxiv:1910.09700

xlm-roberta-large-finetuned-conll02-spanish

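Model Details

Model Description

The XLM-RoBERTa model was proposed in 2019 in "Unsupervised Cross-lingual Representation Learning at Scale" by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multilingual language model, trained on 2.5 TB of filtered CommonCrawl data. This model is XLM-RoBERTa-large fine-tuned on the Spanish portion of the CoNLL-2002 dataset.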

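Developed by: See associated paper
Model type: Multilingual language model
Language(s) (NLP): XLM-RoBERTa is a multilingual model trained on 100 different languages; see the GitHub Repo for the full list. This model is fine-tuned on a dataset in Spanish.
License: More information needed
Related Models: RoBERTa, XLM
- Parent Model: XLM-RoBERTa-large
Resources for more information:
- GitHub Repo
- Associated Paper
- CoNLL-2002 data card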

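Uses

Direct Use

The model is a language model. It can be used for token classification, a natural language understanding task in which a label is assigned to some of the tokens in a text.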
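As a rough illustration of what assigning a label to each token involves, the sketch below turns per-token scores (logits) into a label and a probability. It applies a softmax and keeps the most likely class. This is a simplified sketch of the idea, not the Transformers implementation. The label set `ID2LABEL`, the tokens and the logits are hypothetical values made up for this example. The real label mapping is stored in the model configuration (`model.config.id2label`) and may be ordered differently.

```python
import math

# Hypothetical label set for illustration only; the real mapping is in
# model.config.id2label and may differ in content and order.
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-LOC", 4: "I-LOC"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_tokens(tokens, logits_per_token, id2label=ID2LABEL):
    """Give each token its highest-probability label and that probability."""
    results = []
    for token, logits in zip(tokens, logits_per_token):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((token, id2label[best], probs[best]))
    return results


# Made-up logits for three tokens (not real model output).
tokens = ["▁vuelo", "▁a", "▁Madrid"]
logits = [
    [4.0, 0.1, 0.0, 0.2, 0.1],
    [3.5, 0.0, 0.0, 0.5, 0.3],
    [0.2, 0.1, 0.0, 5.0, 0.4],
]
for token, label, score in label_tokens(tokens, logits):
    print(token, label, round(score, 3))
```

Tokens whose best label is "O" are the ones an NER pipeline typically leaves out of its output.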

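Downstream Use

Potential downstream use cases include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. To learn more about token classification and other potential downstream use cases, see the Hugging Face token classification docs.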
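For NER, tags usually follow a B-/I-/O scheme, and the example output in this card uses B-LOC and I-LOC. In this scheme, B- begins an entity, I- continues it, and O marks tokens outside any entity. Downstream code typically groups these tags into entity spans. The following is a minimal sketch of that grouping, assuming word-level tags. The function name `bio_to_spans` and the hand-written tags are hypothetical and for illustration only.

```python
def bio_to_spans(words, tags):
    """Group word-level B-/I-/O tags (e.g. B-LOC, I-LOC, O) into (type, text) spans.

    An I- tag that does not continue an open span of the same type is treated
    as the start of a new span (a lenient convention; stricter decoders differ).
    """
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag == "O":
            # Close any open span when we leave an entity
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type == current_type:
            current_words.append(word)  # continue the open span
        else:
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = ent_type, [word]  # start a new span
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hand-labeled example tags (not model output).
words = ["Gabriel", "García", "Márquez", "nació", "en", "Aracataca", "."]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(bio_to_spans(words, tags))
# [('PER', 'Gabriel García Márquez'), ('LOC', 'Aracataca')]
```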

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

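Bias, Risks, and Limitations

CONTENT WARNING: Readers should be aware that language generated by this model may be disturbing or offensive to some and may propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.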

Training

See the following resources for details on the training data and training procedure:

Evaluation

See the associated paper for evaluation details.

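Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator.

Hardware Type: 500 32GB Nvidia V100 GPUs (from the associated paper)
Hours used: More information needed
Cloud Provider: More information needed
Compute Region: More information needed
Carbon Emitted: More information needed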
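An estimate of this kind can be approximated by hand. Energy is the GPU count × per-GPU power × hours, and emissions are that energy multiplied by the carbon intensity of the electricity grid. The sketch below is a simplified back-of-the-envelope version of that arithmetic, not the calculator's exact method. Only the GPU count (500) comes from this card. The 0.3 kW per-GPU draw (roughly the rated power of an SXM2 V100) and the 0.4 kg CO2eq/kWh grid intensity are assumptions. Training hours are unknown, so the example reports emissions per hour of training.

```python
def estimate_co2_kg(num_gpus, gpu_power_kw, hours, kg_co2_per_kwh, pue=1.0):
    """Rough CO2-equivalent estimate in kilograms.

    energy (kWh)         = num_gpus * gpu_power_kw * hours * pue
    emissions (kg CO2eq) = energy * kg_co2_per_kwh
    `pue` (data-center power usage effectiveness) defaults to 1.0, i.e. no overhead.
    """
    energy_kwh = num_gpus * gpu_power_kw * hours * pue
    return energy_kwh * kg_co2_per_kwh


NUM_GPUS = 500                # from this card: 500 32GB V100 GPUs
ASSUMED_GPU_POWER_KW = 0.3    # assumption: ~300 W rated power per V100 (SXM2)
ASSUMED_KG_CO2_PER_KWH = 0.4  # hypothetical grid carbon intensity

# Hours are unknown, so compute emissions for one hour of training.
per_hour = estimate_co2_kg(NUM_GPUS, ASSUMED_GPU_POWER_KW, 1.0, ASSUMED_KG_CO2_PER_KWH)
print(f"~{per_hour:.0f} kg CO2eq per hour of training (under the assumptions above)")
```

Multiplying the per-hour figure by the real number of training hours, once known, gives a total under the same assumptions.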

Technical Specifications

See the associated paper for further details.

Citation

BibTeX:

@article{conneau2019unsupervised,
  title={Unsupervised Cross-lingual Representation Learning at Scale},
  author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},
  journal={arXiv preprint arXiv:1911.02116},
  year={2019}
}

APA:

Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.

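Model Card Authors

This model card was written by the team at Hugging Face.

How to Get Started with the Model

Use the code below to get started with the model. You can use this model directly within a pipeline for NER.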

>>> from transformers import AutoTokenizer, AutoModelForTokenClassification
>>> from transformers import pipeline
>>> tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large-finetuned-conll02-spanish")
>>> model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-large-finetuned-conll02-spanish")
>>> classifier = pipeline("ner", model=model, tokenizer=tokenizer)
>>> classifier("Efectuaba un vuelo entre bombay y nueva york.")

[{'end': 30,
  'entity': 'B-LOC',
  'index': 7,
  'score': 0.95703226,
  'start': 25,
  'word': '▁bomba'},
 {'end': 39,
  'entity': 'B-LOC',
  'index': 10,
  'score': 0.9771854,
  'start': 34,
  'word': '▁nueva'},
 {'end': 43,
  'entity': 'I-LOC',
  'index': 11,
  'score': 0.9914097,
  'start': 40,
  'word': '▁yor'}]
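The pipeline output above is token-level. A word split into subword pieces may be reported only partially (for example '▁bomba' for "bombay"), and a multi-word entity appears as separate B-/I- pieces. Transformers NER pipelines accept an `aggregation_strategy` argument (for example `aggregation_strategy="simple"`) that groups entities for you. To show the idea on its own, the hypothetical helper `merge_entities` below works on the output above. It is not part of Transformers. It joins an I- piece to the previous span when the types match and the token `index` values are consecutive. It then extends each span to the end of its word using the character offsets.

```python
def merge_entities(text, predictions):
    """Merge token-level NER predictions into word-level (type, text) spans.

    `predictions` are dicts with 'entity', 'index', 'start' and 'end' keys,
    as in the pipeline output above.
    """
    spans = []
    for pred in sorted(predictions, key=lambda p: p["index"]):
        prefix, _, ent_type = pred["entity"].partition("-")
        prev = spans[-1] if spans else None
        continues = (
            prev is not None
            and prefix == "I"
            and prev["type"] == ent_type
            and pred["index"] == prev["last_index"] + 1  # adjacent token
        )
        if continues:
            prev["end"] = pred["end"]
            prev["last_index"] = pred["index"]
        else:
            spans.append({"type": ent_type, "start": pred["start"],
                          "end": pred["end"], "last_index": pred["index"]})
    result = []
    for span in spans:
        end = span["end"]
        # Extend to the end of the word the span stops inside
        # (stops at spaces and punctuation, including hyphens).
        while end < len(text) and text[end].isalnum():
            end += 1
        result.append((span["type"], text[span["start"]:end]))
    return result


text = "Efectuaba un vuelo entre bombay y nueva york."
predictions = [
    {"entity": "B-LOC", "index": 7, "start": 25, "end": 30, "word": "▁bomba"},
    {"entity": "B-LOC", "index": 10, "start": 34, "end": 39, "word": "▁nueva"},
    {"entity": "I-LOC", "index": 11, "start": 40, "end": 43, "word": "▁yor"},
]
print(merge_entities(text, predictions))
# [('LOC', 'bombay'), ('LOC', 'nueva york')]
```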

Author:

None

Dataset size:

4.18 GB