Model:
danjohnvelasco/filipino-sentence-roberta-v1
We fine-tuned RoBERTa Tagalog Base (fine-tuned on COHFIE) on NewsPH-NLI to learn sentence embeddings that encode Filipino/Tagalog sentences. The model was fine-tuned using sentence-transformers. For full details on the model, training setup, and corpora, please refer to the paper: Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings.
The intended use of this model is to extract sentence embeddings for clustering. We have not checked the model for bias, so it may be unsafe to use in production. Use with caution.
Using this model is easiest once sentence-transformers is installed:
pip install -U sentence-transformers
To encode sentences into sentence embeddings with SentenceTransformer:
from sentence_transformers import SentenceTransformer

# Load the fine-tuned Filipino sentence encoder from the Hugging Face Hub.
model = SentenceTransformer("danjohnvelasco/filipino-sentence-roberta-v1")

# encode() returns one embedding vector per input sentence.
sentence_list = ["sentence 1", "sentence 2", "sentence 3"]
sentence_embeddings = model.encode(sentence_list)
print(sentence_embeddings)
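Since the intended use is clustering, the embeddings are typically compared by cosine similarity. The sketch below shows that step in isolation; the small mock vectors stand in for the output of `model.encode`, so it runs without downloading the model.

```python
import numpy as np

# Mock embeddings standing in for model.encode(sentence_list);
# real embeddings from this model are higher-dimensional.
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

# Cosine similarity: dot products of L2-normalised rows.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T

# The first two mock "sentences" point in nearly the same direction,
# so their similarity is close to 1; the third is orthogonal to both.
print(np.round(similarity, 2))
```

The resulting similarity matrix can be fed to any clustering routine that accepts a precomputed similarity or distance, e.g. agglomerative clustering over `1 - similarity`.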
If you use this model, please cite our work:
@misc{https://doi.org/10.48550/arxiv.2204.03251,
doi = {10.48550/ARXIV.2204.03251},
url = {https://arxiv.org/abs/2204.03251},
author = {Velasco, Dan John and Alba, Axel and Pelagio, Trisha Gail and Ramirez, Bryce Anthony and Cruz, Jan Christian Blaise and Cheng, Charibeth},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}