Model:
HeNLP/LongHeRo
A state-of-the-art Longformer language model for Hebrew.
How to use
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('HeNLP/LongHeRo')
model = AutoModelForMaskedLM.from_pretrained('HeNLP/LongHeRo')
# Tokenization Example:
# Tokenizing
tokenized_string = tokenizer('שלום לכולם')
# Decoding
decoded_string = tokenizer.decode(tokenized_string['input_ids'], skip_special_tokens=True)
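
The snippet above loads a masked-language-model head but only exercises the tokenizer. As a minimal sketch of how the model itself might be queried, the following fills in a single masked token. The helper names `top_k_indices` and `predict_masked` are hypothetical and not part of the model card; the sketch assumes PyTorch is installed and that the tokenizer defines a mask token.

```python
def top_k_indices(scores, k):
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def predict_masked(prefix, suffix="", k=5, model_name="HeNLP/LongHeRo"):
    """Return the k most likely tokens for a mask placed between prefix and suffix.

    Hypothetical helper for illustration; it downloads the model on first use.
    """
    # Heavy imports are deferred so top_k_indices can be used without them.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Use the tokenizer's own mask token rather than hard-coding its spelling.
    text = f"{prefix}{tokenizer.mask_token}{suffix}"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

    # Locate the first mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_pos = is_mask.nonzero()[0].item()

    # Rank the vocabulary by the logit assigned at the masked position.
    scores = logits[0, mask_pos].tolist()
    return [tokenizer.decode([i]).strip() for i in top_k_indices(scores, k)]
```

Calling, for example, `predict_masked('שלום ')` would return five candidate tokens for the position after the greeting; the actual candidates depend on the model weights and are not shown here.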
If you use LongHeRo in your research, please cite HeRo: RoBERTa and Longformer Hebrew Language Models.
@article{shalumov2023hero,
title={HeRo: RoBERTa and Longformer Hebrew Language Models},
author={Vitaly Shalumov and Harel Haskey},
year={2023},
journal={arXiv:2304.11077},
}