Helsinki-NLP/opus-mt-zh-en

Model:

Helsinki-NLP/opus-mt-zh-en

Task:

Translation

Libraries:

PyTorch TensorFlow Rust Transformers

Languages:

Other tags:

marian text2text-generation AutoTrain Compatible

License:

cc-by-4.0


zho-eng

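Model Details

Model Description:
Developed by: Language Technology Research Group at the University of Helsinki
Model type: Translation
Language(s):

Source language: Chinese
Target language: English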

Uses

Direct Use

This model can be used for translation and text-to-text generation.

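Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware that this section contains content that is disturbing and offensive, and that can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

Further details about the dataset used for this model can be found in the OPUS readme: zho-eng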

Training

System Information

helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
port_machine: brutasse
port_time: 2020-08-21-14:41
src_multilingual: False
tgt_multilingual: False

Training Data Preprocessing

pre-processing: normalization + SentencePiece (spm32k, spm32k)
ref_len: 82826.0
dataset: opus
download original weights: opus-2020-07-17.zip
test set translations: opus-2020-07-17.test.txt

Evaluation

Results

test set scores: opus-2020-07-17.eval.txt
brevity_penalty: 0.948
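The brevity_penalty value is BLEU's length penalty. Under the standard definition (Papineni et al., 2002), BP is 1 when the system output length c is at least the reference length r. Otherwise it is exp(1 - r/c). The minimal Python sketch below implements that formula and its inverse. Applied to the reported ref_len of 82826.0, it roughly estimates how much shorter the system output was than the references. The function names are hypothetical. Because 0.948 is rounded to three decimals, the inferred length is only approximate.

```python
import math


def brevity_penalty(hyp_len: float, ref_len: float) -> float:
    """Standard BLEU brevity penalty: 1 if the output is at least as long
    as the reference, otherwise exp(1 - r/c)."""
    if hyp_len <= 0:
        return 0.0  # empty output gets no credit
    if hyp_len >= ref_len:
        return 1.0  # no penalty for output that is long enough
    return math.exp(1.0 - ref_len / hyp_len)


def implied_hyp_len(bp: float, ref_len: float) -> float:
    """Solve BP = exp(1 - r/c) for c; only defined when 0 < BP < 1."""
    if not 0.0 < bp < 1.0:
        raise ValueError("inversion is only defined for 0 < bp < 1")
    return ref_len / (1.0 - math.log(bp))


if __name__ == "__main__":
    ref_len = 82826.0  # ref_len reported in this model card
    bp = 0.948         # brevity_penalty reported in this model card (rounded)
    c = implied_hyp_len(bp, ref_len)
    print(f"implied output length: {c:.0f} tokens ({c / ref_len:.1%} of reference)")
```

Run as a script, this puts the system output at roughly 78.6k tokens, about 95% of the reference length. That is consistent with a penalty slightly below 1.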

Benchmarks

testset	BLEU	chr-F
Tatoeba-test.zho.eng	36.1	0.548
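The chr-F column is a character n-gram F-score (Popović, 2015), reported here on a 0 to 1 scale. The sketch below illustrates the idea at sentence level. It assumes character n-grams of order 1 to 6 with whitespace removed, and β = 2, which weights recall more heavily than precision. It is a simplified illustration, not sacrebleu's implementation. sacrebleu accumulates statistics over the whole test set, and its averaging details vary between versions, so this function will not reproduce the 0.548 above. The names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped (min-count) matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over the n-gram orders that were scored
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

An identical hypothesis scores 1.0, and one that shares no characters with the reference scores 0.0. A shorter but otherwise matching hypothesis keeps full precision but loses recall, and β = 2 makes that loss count more.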

Citation Information

@InProceedings{TiedemannThottingal:EAMT2020,
  author = {J{\"o}rg Tiedemann and Santhosh Thottingal},
  title = {{OPUS-MT} -- {B}uilding open translation services for the {W}orld},
  booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
  year = {2020},
  address = {Lisbon, Portugal}
 }

How to Get Started With the Model

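from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the SentencePiece-based Marian tokenizer for this model
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

# Load the Marian encoder-decoder model (Chinese to English)
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")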
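The snippet above only loads the tokenizer and model. A minimal sketch of translating a batch of sentences follows. It assumes the PyTorch backend and the `sentencepiece` package are installed, since the Marian tokenizer depends on the latter. `translate`, `load_opus_mt_zh_en`, and the default `max_new_tokens=256` are hypothetical choices for illustration. The tokenizer call, `generate`, and `batch_decode` are standard Transformers methods.

```python
from typing import List, Sequence

MODEL_NAME = "Helsinki-NLP/opus-mt-zh-en"


def load_opus_mt_zh_en():
    """Load (downloading on first use) the tokenizer and model for this card."""
    # Imported lazily so this module can be imported without transformers installed
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    return tokenizer, model


def translate(texts: Sequence[str], tokenizer, model, max_new_tokens: int = 256) -> List[str]:
    """Translate a batch of Chinese sentences into English."""
    if not texts:
        return []
    # Pad to the longest sentence in the batch and truncate anything over the model's limit
    batch = tokenizer(list(texts), return_tensors="pt", padding=True, truncation=True)
    # Decoding settings such as beam size come from the model's stored configuration
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    # Strip padding and end-of-sentence tokens while converting ids back to text
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

For example, `tokenizer, model = load_opus_mt_zh_en()` followed by `translate(["今天天气很好。"], tokenizer, model)` returns a list containing one English string. Batching several sentences in one call is usually faster than translating them one at a time.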

Author:

Language Technology Research Group at the University of Helsinki

Dataset size:

1.12 GB