Model:
danjohnvelasco/filipino-sentence-roberta-v1
We fine-tuned RoBERTa Tagalog Base (fine-tuned on COHFIE) on NewsPH-NLI to learn sentence embeddings that encode Filipino/Tagalog sentences. The model was fine-tuned using sentence-transformers. For full details on the model, training setup, and corpora, please refer to the paper: Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings.
The intended use of this model is to extract sentence embeddings for clustering. We have not checked the model for bias, so it may be unsafe to use in production. Use with caution.
Using this model is easiest once sentence-transformers is installed:
pip install -U sentence-transformers
To encode sentences into sentence embeddings with SentenceTransformer:
from sentence_transformers import SentenceTransformer

# Load the fine-tuned Filipino sentence encoder from the Hugging Face Hub.
model = SentenceTransformer("danjohnvelasco/filipino-sentence-roberta-v1")

# encode() returns one embedding vector per input sentence.
sentence_list = ["sentence 1", "sentence 2", "sentence 3"]
sentence_embeddings = model.encode(sentence_list)
print(sentence_embeddings)
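Since the intended use is clustering, the embeddings are typically compared by cosine similarity. The sketch below shows that step in isolation; the small mock vectors stand in for the output of `model.encode`, so it runs without downloading the model.

```python
import numpy as np

# Mock embeddings standing in for model.encode(sentence_list);
# real embeddings from this model are higher-dimensional.
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

# Cosine similarity: dot products of L2-normalised rows.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T

# The first two mock "sentences" point in nearly the same direction,
# so their similarity is close to 1; the third is orthogonal to both.
print(np.round(similarity, 2))
```

The resulting similarity matrix can be fed to any clustering routine that accepts a precomputed similarity or distance, e.g. agglomerative clustering over `1 - similarity`.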
If you use this model, please cite our work:
@misc{https://doi.org/10.48550/arxiv.2204.03251,
doi = {10.48550/ARXIV.2204.03251},
url = {https://arxiv.org/abs/2204.03251},
author = {Velasco, Dan John and Alba, Axel and Pelagio, Trisha Gail and Ramirez, Bryce Anthony and Cruz, Jan Christian Blaise and Cheng, Charibeth},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}