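Dataset: lj_speech
Task: Automatic Speech Recognition, Text-to-Speech
Languages: English
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: expert-generated
Source datasets: original
License: Public Domain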
This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books in English. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.
The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded in 2016-17 by the LibriVox project and is also in the public domain.
The dataset can be used to train a model for Automatic Speech Recognition (ASR) or Text-to-Speech (TTS).
The transcriptions and audio are in English.
A data point comprises the path to the audio file, called file, and its transcription, called text. A normalized version of the transcription, called normalized_text, is also provided, along with an id and the decoded audio.
{ 'id': 'LJ002-0026', 'file': '/datasets/downloads/extracted/05bfe561f096e4c52667e3639af495226afe4e5d08763f2d76d069e7a453c543/LJSpeech-1.1/wavs/LJ002-0026.wav', 'audio': {'path': '/datasets/downloads/extracted/05bfe561f096e4c52667e3639af495226afe4e5d08763f2d76d069e7a453c543/LJSpeech-1.1/wavs/LJ002-0026.wav', 'array': array([-0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449], dtype=float32), 'sampling_rate': 22050}, 'text': 'in the three years between 1813 and 1816,', 'normalized_text': 'in the three years between eighteen thirteen and eighteen sixteen,' }
Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22050 Hz.
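For illustration, a clip in this format can be decoded with only Python's standard library. This is a minimal sketch assuming a mono 16-bit PCM WAV; read_pcm16_wav and write_pcm16_wav are hypothetical helper names, and dividing by 32768 is an assumed scaling that happens to reproduce the first values of the example record (-16 / 32768 ≈ -0.00048828). It is not the decoder used by the datasets library, and it returns Python floats (float64) rather than a float32 array.

```python
import io
import sys
import wave
from array import array


def read_pcm16_wav(f):
    """Decode a mono 16-bit PCM WAV into floats in [-1, 1) plus its sample rate.

    `f` may be a filesystem path or a binary file-like object.
    """
    with wave.open(f, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected single-channel 16-bit PCM audio")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    pcm = array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Scale signed 16-bit integers into [-1, 1)
    return [s / 32768.0 for s in pcm], rate


def write_pcm16_wav(f, samples, rate=22050):
    """Write signed 16-bit integer samples as a mono PCM WAV (used for the demo)."""
    pcm = array("h", samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # store as little-endian regardless of host
    with wave.open(f, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(pcm.tobytes())


# Round-trip three integer samples through an in-memory WAV file
buf = io.BytesIO()
write_pcm16_wav(buf, [-16, -6, -45])
buf.seek(0)
demo_samples, demo_rate = read_pcm16_wav(buf)
duration_s = len(demo_samples) / demo_rate  # clip length in seconds
```

The same len(samples) / sampling_rate division gives the duration of any real clip, which should fall in the 1 to 10 second range described above.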
id: unique id of the data sample.
file: a path to the downloaded audio file in .wav format.
audio: a dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. When the audio column is accessed, e.g. dataset[0]["audio"], the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate. Decoding and resampling a large number of audio files can take a significant amount of time, so query the sample index before the "audio" column: dataset[0]["audio"] decodes a single file and should always be preferred over dataset["audio"][0], which decodes the entire column first.
text: the transcription of the audio file.
normalized_text: the transcription with numbers, ordinals, and monetary units expanded into full words.
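The advice on access order for the audio column can be illustrated with a toy model. LazyAudioTable is a hypothetical class, not part of the datasets library, and its _decode method only counts calls instead of reading and resampling files; it shows why indexing the row first touches one file while indexing the column first touches every file.

```python
class LazyAudioTable:
    """Toy table whose "audio" cells are decoded only when accessed.

    Hypothetical stand-in for illustration; it is not the datasets library.
    """

    def __init__(self, paths):
        self.paths = list(paths)
        self.decode_calls = 0  # how many files have been "decoded"

    def _decode(self, path):
        self.decode_calls += 1
        # Placeholder for real decoding and resampling work
        return {"path": path, "array": [], "sampling_rate": 22050}

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row first: build one row, so only one file is decoded
            return {"audio": self._decode(self.paths[key])}
        if key == "audio":
            # Column first: materialize the whole column, decoding every file
            return [self._decode(p) for p in self.paths]
        raise KeyError(key)


# Hypothetical file names; 100 clips are enough to show the difference
table = LazyAudioTable([f"LJ001-{i:04d}.wav" for i in range(1, 101)])

table[0]["audio"]
row_first_decodes = table.decode_calls

table.decode_calls = 0
table["audio"][0]
column_first_decodes = table.decode_calls
```

Both expressions return the same first clip, but the column-first form does work proportional to the size of the whole dataset.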
The dataset is not pre-split. Some statistics:
[Needs More Information]
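Because no official split is provided, users have to create their own. One common approach, shown here as a sketch rather than an official split, is to hash each clip id so that the train/test assignment is reproducible across runs and machines and does not depend on row order. The function name assign_split, the 10% test fraction, and the generated ids below are all hypothetical.

```python
import hashlib


def assign_split(clip_id, test_fraction=0.1):
    """Deterministically map a clip id to "train" or "test".

    Hashing the id (rather than shuffling) keeps the assignment stable
    across runs and machines, independent of row order.
    """
    digest = hashlib.sha256(clip_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a roughly uniform number in [0, 1]
    u = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if u < test_fraction else "train"


# Hypothetical ids in the LJxxx-yyyy pattern (50 x 262 = 13,100 of them)
ids = [f"LJ{c:03d}-{n:04d}" for c in range(1, 51) for n in range(1, 263)]
splits = {"train": [], "test": []}
for clip_id in ids:
    splits[assign_split(clip_id)].append(clip_id)
test_share = len(splits["test"]) / len(ids)
```

Since every clip comes from the same speaker, a per-clip split like this one cannot leak speaker identity between train and test; it only guarantees that no clip appears in both.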
This dataset consists of excerpts from the following works:
Some details about normalization:
Abbreviation | Expansion |
---|---|
Mr. | Mister |
Mrs. | Misess (*) |
Dr. | Doctor |
No. | Number |
St. | Saint |
Co. | Company |
Jr. | Junior |
Maj. | Major |
Gen. | General |
Drs. | Doctors |
Rev. | Reverend |
Lt. | Lieutenant |
Hon. | Honorable |
Sgt. | Sergeant |
Capt. | Captain |
Esq. | Esquire |
Ltd. | Limited |
Col. | Colonel |
Ft. | Fort |
(*) There is no standard expansion for "Mrs."
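The abbreviation table, together with the year reading in the example record (1813 read as "eighteen thirteen"), can be approximated by a small rule-based normalizer. This is a sketch under stated assumptions: the function names are hypothetical, only years from 1100 to 1999 are expanded, ordinals and monetary units are not covered, and it is not the script that produced normalized_text.

```python
import re

# Abbreviation table from the normalization notes (periods stripped from keys)
ABBREVIATIONS = {
    "Mr": "Mister", "Mrs": "Misess", "Dr": "Doctor", "No": "Number",
    "St": "Saint", "Co": "Company", "Jr": "Junior", "Maj": "Major",
    "Gen": "General", "Drs": "Doctors", "Rev": "Reverend", "Lt": "Lieutenant",
    "Hon": "Honorable", "Sgt": "Sergeant", "Capt": "Captain", "Esq": "Esquire",
    "Ltd": "Limited", "Col": "Colonel", "Ft": "Fort",
}
# The required trailing "." means "Mr" cannot match inside "Mrs."
_ABBREV_RE = re.compile(r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\.")

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]


def _two_digits(n):
    """Spell out 0 <= n < 100."""
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] + ("-" + _ONES[ones] if ones else "")


def expand_year(year):
    """Read a year from 1100 to 1999 in pairs, e.g. 1813 -> "eighteen thirteen"."""
    hi, lo = divmod(year, 100)
    if lo == 0:
        return _two_digits(hi) + " hundred"
    if lo < 10:
        return _two_digits(hi) + " oh " + _ONES[lo]
    return _two_digits(hi) + " " + _two_digits(lo)


def normalize(text):
    """Expand table abbreviations, then four-digit years between 1100 and 1999."""
    text = _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
    return re.sub(r"\b1[1-9]\d{2}\b", lambda m: expand_year(int(m.group(0))), text)
```

For example, normalize("in the three years between 1813 and 1816,") yields the normalized_text of the example record. Like any table-driven approach it can misfire, for instance on a sentence that ends in the word "No.".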
[Needs More Information]
Recordings by Linda Johnson from LibriVox. Alignment and annotation by Keith Ito.
The dataset consists of recordings of a single speaker who donated their voice online. You agree not to attempt to determine the identity of the speaker in this dataset.
[Needs More Information]
[Needs More Information]
The dataset was initially created by Keith Ito and Linda Johnson.
Public Domain (LibriVox)
@misc{ljspeech17, author = {Keith Ito and Linda Johnson}, title = {The LJ Speech Dataset}, howpublished = {\url{https://keithito.com/LJ-Speech-Dataset/}}, year = 2017 }
Thanks to @anton-l for adding this dataset.