Whisper Medium Indonesian

This model is a fine-tuned version of openai/whisper-medium on the Indonesian subsets of the mozilla-foundation/common_voice_11_0, magic_data, titml and google/fleurs datasets. It achieves the following results:

CV11 test set:

Loss: 0.0698
WER: 3.8274

Google/fleurs test set:

WER: 9.74

Usage

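from transformers import pipeline
# Load the fine-tuned checkpoint as a speech-recognition pipeline.
transcriber = pipeline(
  "automatic-speech-recognition",
  model="cahya/whisper-medium-id"
)
# Force the decoder to transcribe in Indonesian instead of auto-detecting the language.
transcriber.model.config.forced_decoder_ids = (
  transcriber.tokenizer.get_decoder_prompt_ids(
    language="id",
    task="transcribe"
  )
)
# The pipeline returns a dict; the transcript is in transcription["text"].
transcription = transcriber("my_audio_file.mp3")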

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 10000
mixed_precision_training: Native AMP
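This sketch shows how the learning rate evolves under the settings above. It assumes the run used the standard linear schedule from Hugging Face Transformers: the rate rises linearly from 0 to learning_rate over the 500 warmup steps, then falls linearly to 0 at step 10000. The function name linear_warmup_lr is hypothetical.

```python
def linear_warmup_lr(step: int,
                     base_lr: float = 1e-6,
                     warmup_steps: int = 500,
                     training_steps: int = 10_000) -> float:
    """Hypothetical helper: learning rate at a given optimizer step under
    linear warmup followed by linear decay (assumed to match the card's setup)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr over warmup_steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at warmup_steps to 0 at training_steps,
    # and stay at 0 for any step past training_steps.
    remaining = max(0, training_steps - step)
    return base_lr * remaining / max(1, training_steps - warmup_steps)


# Print the rate at a few points across the 10000-step run.
for s in (0, 250, 500, 5_250, 10_000):
    print(s, linear_warmup_lr(s))
```

With learning_rate at 1e-06, the peak is reached at step 500. By step 5250, halfway through the decay phase, the rate is back down to half of the peak.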

Training results

Training Loss	Epoch	Step	Validation Loss	WER
0.0427	0.33	1000	0.0664	4.3807
0.042	0.66	2000	0.0658	3.9426
0.0265	0.99	3000	0.0657	3.8274
0.0211	1.32	4000	0.0679	3.8366
0.0212	1.66	5000	0.0682	3.8412
0.0206	1.99	6000	0.0683	3.8689
0.0166	2.32	7000	0.0711	3.9657
0.0095	2.65	8000	0.0717	3.9980
0.0122	2.98	9000	0.0714	3.9795
0.0049	3.31	10000	0.0720	3.9887
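The reported CV11 WER of 3.8274 matches the step-3000 row, not the final step-10000 row. One possible explanation is that the released weights come from the lowest-WER checkpoint, as Trainer's load_best_model_at_end would produce. The card does not state this, so treat it as an assumption. The sketch below copies the table rows and picks the checkpoint with the lowest validation WER. The names RESULTS and best_by_wer are hypothetical.

```python
# Rows copied verbatim from the training-results table:
# (step, epoch, training_loss, validation_loss, wer)
RESULTS = [
    (1000, 0.33, 0.0427, 0.0664, 4.3807),
    (2000, 0.66, 0.0420, 0.0658, 3.9426),
    (3000, 0.99, 0.0265, 0.0657, 3.8274),
    (4000, 1.32, 0.0211, 0.0679, 3.8366),
    (5000, 1.66, 0.0212, 0.0682, 3.8412),
    (6000, 1.99, 0.0206, 0.0683, 3.8689),
    (7000, 2.32, 0.0166, 0.0711, 3.9657),
    (8000, 2.65, 0.0095, 0.0717, 3.9980),
    (9000, 2.98, 0.0122, 0.0714, 3.9795),
    (10000, 3.31, 0.0049, 0.0720, 3.9887),
]


def best_by_wer(rows):
    """Return the row with the lowest validation WER (ties go to the earliest step)."""
    return min(rows, key=lambda r: (r[4], r[0]))


best = best_by_wer(RESULTS)
print(f"best step={best[0]} epoch={best[1]} val_loss={best[3]} WER={best[4]}")
```

Step 3000 also has the lowest validation loss. After that point, the training loss keeps falling while the validation loss and WER drift upward, which is a typical sign of overfitting.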

Evaluation

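We evaluated the model on the test splits of two datasets, Common Voice 11 and Google Fleurs. Because Whisper can transcribe casing and punctuation, we also evaluated its performance on both raw text and normalized text (lowercased, with punctuation removed). The results are as follows: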
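The following sketch illustrates raw-text versus normalized-text WER. The card does not say which WER implementation or normalizer was used, so both functions are hypothetical. The normalizer only lowercases, strips ASCII punctuation (Python's string.punctuation), and collapses whitespace. WER is computed as the word-level Levenshtein distance divided by the number of reference words and is reported as a percentage, matching the scale of the figures in this card.

```python
import string


def normalize(text: str) -> str:
    """Hypothetical normalizer: lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev holds the DP row for the previous reference prefix:
    # prev[j] = edit distance between that prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return 100.0 * prev[len(hyp)] / max(1, len(ref))


reference = "Saya pergi ke pasar."
hypothesis = "saya pergi ke pasar"
print(wer(reference, hypothesis))                        # raw text
print(wer(normalize(reference), normalize(hypothesis)))  # normalized text
```

In this example, the raw comparison counts "Saya"/"saya" and "pasar."/"pasar" as two substitutions out of four reference words, which gives 50.0. After normalization the WER drops to 0.0. This is why scores on raw and normalized text can differ substantially.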

Common Voice 11

WER
1234321	3.83
1235321	12.62

Google/Fleurs

WER
1234321	9.74
1234321 + text normalization	tbc
1235321	10.2
1235321 + text normalization	tbc

Framework versions

Transformers 4.26.0.dev0
Pytorch 1.13.0+cu117
Datasets 2.7.1.dev0
Tokenizers 0.13.2

Author:

Cahya Wirawan

Dataset size:

2.85 GB