Hebrew Dataset for ASR
{'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/8ce7402f6482c6053251d7f3000eec88668c994beb48b7ca7352e77ef810a0b6/train/e429593fede945c185897e378a5839f4198.wav',
           'array': array([-0.00265503, -0.0018158 , -0.00149536, ..., -0.00135803,
                           -0.00231934, -0.00190735]),
           'sampling_rate': 16000},
 'sentence': 'היא מבינה אותי יותר מכל אחד אחר'}
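As a rough illustration of how a record with this layout can be used, the sketch below derives a clip's duration from its `array` and `sampling_rate` fields. The helper name `clip_duration_seconds` and the synthetic record are assumptions made for the example; they are not part of the dataset or of any library.

```python
def clip_duration_seconds(example: dict) -> float:
    """Return the clip length in seconds for one record."""
    audio = example["audio"]
    # Number of samples divided by samples per second gives seconds.
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic record with the same layout as the instance above.
# The path, waveform, and transcript are placeholders, not real data.
example = {
    "audio": {
        "path": "example.wav",      # hypothetical path
        "array": [0.0] * 48000,     # 48,000 samples of silence
        "sampling_rate": 16000,
    },
    "sentence": "placeholder transcript",
}

print(clip_duration_seconds(example))  # 3.0
```

With the `datasets` library installed, real records would typically be obtained with `datasets.load_dataset("imvladikon/hebrew_speech_kan")`, assuming the Hub repository id matches the URL given in the citation.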
| | train | validation |
|---|---|---|
| number of samples | 8000 | 2000 |
| hours | 6.92 | 1.73 |
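The split statistics imply a mean clip length of about three seconds. The short calculation below makes that explicit; the `splits` dictionary and the `mean_clip_seconds` helper are names introduced for this example, and the result is approximate because the hour totals in the table are rounded.

```python
# Split statistics copied from the table above.
splits = {
    "train": {"samples": 8000, "hours": 6.92},
    "validation": {"samples": 2000, "hours": 1.73},
}


def mean_clip_seconds(samples: int, hours: float) -> float:
    """Average clip length in seconds: total seconds / number of clips."""
    return hours * 3600 / samples


for name, stats in splits.items():
    print(name, round(mean_clip_seconds(**stats), 2))

# Share of all samples that belong to the training split.
total = sum(s["samples"] for s in splits.values())
train_share = splits["train"]["samples"] / total
print(train_share)
```

Both splits come out at roughly 3.11 seconds per clip, and the training split holds 80% of the samples.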
The data was scraped from YouTube (channel כאן, i.e. Kan). Outliers were removed based on clip length and on the ratio between the length of the audio and the length of the sentence.
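The exact filtering rules are not documented here, so the following is only a minimal sketch of what a length-and-ratio outlier filter could look like. The function `is_outlier`, the use of characters per second as the ratio, and every threshold value are assumptions chosen for illustration; they are not the author's actual settings.

```python
def is_outlier(
    duration_s: float,
    sentence: str,
    min_s: float = 1.0,      # hypothetical minimum clip length (seconds)
    max_s: float = 15.0,     # hypothetical maximum clip length (seconds)
    min_cps: float = 3.0,    # hypothetical minimum characters per second
    max_cps: float = 30.0,   # hypothetical maximum characters per second
) -> bool:
    """Flag a clip whose length or text-to-audio ratio is implausible."""
    # Reject clips that are empty, too short, or too long.
    if duration_s <= 0 or not (min_s <= duration_s <= max_s):
        return True
    # Reject clips whose transcript is too sparse or too dense for the audio.
    chars_per_second = len(sentence) / duration_s
    return not (min_cps <= chars_per_second <= max_cps)


# Synthetic (duration, transcript) pairs, not real data.
samples = [
    (3.0, "a" * 30),    # 10 chars/s: kept
    (0.2, "a" * 30),    # too short: dropped
    (3.0, "a" * 300),   # 100 chars/s: dropped
    (10.0, "ab"),       # 0.2 chars/s: dropped
]

kept = [s for s in samples if not is_outlier(*s)]
print(len(kept))  # 1
```

A ratio check of this kind tends to catch misaligned subtitles, where the transcript is far longer or shorter than the audio could plausibly contain.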
Who are the annotators? [More Information Needed]
@misc{imvladikon2022hebrew_speech_kan,
author = {Gurevich, Vladimir},
title = {Hebrew Speech Recognition Dataset: Kan},
year = {2022},
howpublished = {\url{https://huggingface.co/datasets/imvladikon/hebrew_speech_kan}},
}