Model:

Helsinki-NLP/opus-mt-en-trk

eng-trk

  • source group: English

  • target group: Turkic languages

  • OPUS readme: eng-trk

  • model: transformer

  • source language(s): eng

  • target language(s): aze_Latn bak chv crh crh_Latn kaz_Cyrl kaz_Latn kir_Cyrl kjh kum ota_Arab ota_Latn sah tat tat_Arab tat_Latn tuk tuk_Latn tur tyv uig_Arab uig_Cyrl uzb_Cyrl uzb_Latn

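  • pre-processing: normalization + SentencePiece (spm32k,spm32k)

  • a sentence-initial language token is required, in the form >>id<< (id = valid target language ID)

  • download original weights: opus2m-2020-08-01.zip

  • test set translations: opus2m-2020-08-01.test.txt

  • test set scores: opus2m-2020-08-01.eval.txt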
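
The sentence-initial token requirement can be sketched in Python as follows. This is a minimal sketch, not the authors' own usage example: it assumes the model is loaded through the Hugging Face transformers classes MarianMTModel and MarianTokenizer, and the helper names add_language_token and translate are hypothetical. The set of valid IDs is copied from the target language list above.

```python
# Valid target language IDs, copied from the target language list in this card.
TARGET_LANGUAGE_IDS = {
    "aze_Latn", "bak", "chv", "crh", "crh_Latn", "kaz_Cyrl", "kaz_Latn",
    "kir_Cyrl", "kjh", "kum", "ota_Arab", "ota_Latn", "sah", "tat",
    "tat_Arab", "tat_Latn", "tuk", "tuk_Latn", "tur", "tyv", "uig_Arab",
    "uig_Cyrl", "uzb_Cyrl", "uzb_Latn",
}


def add_language_token(sentences, lang_id):
    """Prefix each English sentence with the >>id<< target language token."""
    if lang_id not in TARGET_LANGUAGE_IDS:
        raise ValueError(f"unknown target language ID: {lang_id!r}")
    return [f">>{lang_id}<< {sentence}" for sentence in sentences]


def translate(sentences, lang_id, model_name="Helsinki-NLP/opus-mt-en-trk"):
    """Translate English sentences (hypothetical helper; downloads the model)."""
    # Imported lazily so add_language_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(
        add_language_token(sentences, lang_id),
        return_tensors="pt",
        padding=True,
    )
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, translate(["How are you?"], "tur") would send the string ">>tur<< How are you?" to the model. Rejecting unknown IDs up front is a design choice of this sketch, so that a mistyped ID fails loudly before any text reaches the model.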

Benchmarks

testset BLEU chr-F
newsdev2016-entr-engtur.eng.tur 10.1 0.437
newstest2016-entr-engtur.eng.tur 9.2 0.410
newstest2017-entr-engtur.eng.tur 9.0 0.410
newstest2018-entr-engtur.eng.tur 9.2 0.413
Tatoeba-test.eng-aze.eng.aze 26.8 0.577
Tatoeba-test.eng-bak.eng.bak 7.6 0.308
Tatoeba-test.eng-chv.eng.chv 4.3 0.270
Tatoeba-test.eng-crh.eng.crh 8.1 0.330
Tatoeba-test.eng-kaz.eng.kaz 11.1 0.359
Tatoeba-test.eng-kir.eng.kir 28.6 0.524
Tatoeba-test.eng-kjh.eng.kjh 1.0 0.041
Tatoeba-test.eng-kum.eng.kum 2.2 0.075
Tatoeba-test.eng.multi 19.9 0.455
Tatoeba-test.eng-ota.eng.ota 0.5 0.065
Tatoeba-test.eng-sah.eng.sah 0.7 0.030
Tatoeba-test.eng-tat.eng.tat 9.7 0.316
Tatoeba-test.eng-tuk.eng.tuk 5.9 0.317
Tatoeba-test.eng-tur.eng.tur 34.6 0.623
Tatoeba-test.eng-tyv.eng.tyv 5.4 0.210
Tatoeba-test.eng-uig.eng.uig 0.1 0.155
Tatoeba-test.eng-uzb.eng.uzb 3.4 0.275
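
Scores in the table above vary widely across target languages, so it can help to load the rows into a structure and rank them. The following is a minimal sketch, assuming the rows are available as whitespace-separated text in the same three-column layout; the names Score and parse_benchmark are hypothetical, and the sample contains only three rows copied from the table.

```python
from typing import NamedTuple


class Score(NamedTuple):
    testset: str
    bleu: float
    chrf: float


def parse_benchmark(text):
    """Parse 'testset BLEU chr-F' rows, skipping the header and blank lines."""
    scores = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] == "testset":
            continue
        scores.append(Score(parts[0], float(parts[1]), float(parts[2])))
    return scores


# Three rows copied from the benchmark table; the full table parses the same way.
SAMPLE = """
testset BLEU chr-F
Tatoeba-test.eng-kir.eng.kir 28.6 0.524
Tatoeba-test.eng-tur.eng.tur 34.6 0.623
Tatoeba-test.eng-uig.eng.uig 0.1 0.155
"""

# Rank test sets from highest to lowest BLEU.
ranked = sorted(parse_benchmark(SAMPLE), key=lambda s: s.bleu, reverse=True)
for score in ranked:
    print(f"{score.testset}: BLEU {score.bleu}, chr-F {score.chrf}")
```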

System Info:

  • hf_name: eng-trk

  • source languages: eng

  • target languages: trk

  • OPUS readme URL: https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-trk/README.md

  • original repo: Tatoeba-Challenge

  • tags: ['translation']

  • languages: ['en', 'tt', 'cv', 'tk', 'tr', 'ba', 'trk']

  • source constituents: {'eng'}

  • target constituents: {'kir_Cyrl', 'tat_Latn', 'tat', 'chv', 'uzb_Cyrl', 'kaz_Latn', 'aze_Latn', 'crh', 'kjh', 'uzb_Latn', 'ota_Arab', 'tuk_Latn', 'tuk', 'tat_Arab', 'sah', 'tyv', 'tur', 'uig_Arab', 'crh_Latn', 'kaz_Cyrl', 'uig_Cyrl', 'kum', 'ota_Latn', 'bak'}

  • source multilingual: False

  • target multilingual: True

  • pre-processing: normalization + SentencePiece (spm32k,spm32k)

  • model URL: https://object.pouta.csc.fi/Tatoeba-MT-models/eng-trk/opus2m-2020-08-01.zip

  • test set URL: https://object.pouta.csc.fi/Tatoeba-MT-models/eng-trk/opus2m-2020-08-01.test.txt

  • source alpha-3 code: eng

  • target alpha-3 code: trk

  • short pair: en-trk

  • chrF2 score: 0.455

  • BLEU score: 19.9

  • brevity penalty: 1.0

  • reference length: 57072.0

  • source name: English

  • target name: Turkic languages

  • training date: 2020-08-01

  • source alpha-2 code: en

  • target alpha-2 code: trk

  • prefer old model: False

  • long pair: eng-trk

  • Helsinki Git SHA: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535

  • Transformers Git SHA: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b

  • port machine: brutasse

  • port time: 2020-08-21-14:41
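
The brevity penalty of 1.0 listed above means, under the standard BLEU definition, that the system output was at least as long as the reference (57072 tokens), so no length penalty was applied. The following minimal sketch shows that definition; the function name brevity_penalty is hypothetical, and the hypothesis lengths are made up for illustration because the card does not report the actual output length.

```python
import math


def brevity_penalty(hyp_len, ref_len):
    """Standard BLEU brevity penalty from total hypothesis and reference lengths."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0  # output is at least as long as the reference: no penalty
    return math.exp(1.0 - ref_len / hyp_len)


REF_LEN = 57072  # reference length reported in the system info above

# Hypothetical hypothesis lengths, chosen only to show both branches.
print(brevity_penalty(57072, REF_LEN))  # no penalty
print(brevity_penalty(50000, REF_LEN))  # shorter output is penalized
```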