数据集:
PolyAI/minds14
任务:
子任务:
keyword-spotting计算机处理:
multilingual大小:
10K<n<100K预印本库:
arxiv:2104.08524许可:
MINDS-14是一个用于语音数据意图检测任务的训练和评估资源。它覆盖了来自电子银行领域商业系统中提取的14个意图,与14种不同语言风格的口语示例相关联。
MInDS-14可以按以下方式下载和使用:
from datasets import load_dataset
minds_14 = load_dataset("PolyAI/minds14", "fr-FR") # for French
# to download all data for multi-lingual fine-tuning uncomment following line
# minds_14 = load_dataset("PolyAI/all", "all")
# see structure
print(minds_14)
# load audio sample on the fly
audio_input = minds_14["train"][0]["audio"] # first decoded audio sample
intent_class = minds_14["train"][0]["intent_class"] # first transcription
intent = minds_14["train"].features["intent_class"].names[intent_class]
# use audio_input and language_class to fine-tune your model for audio classification
我们展示了配置fr-FR的数据集示例配置的详细信息。所有其他配置具有相同的结构。
fr-FR
配置fr-FR的数据实例示例如下:
{
"path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
"audio": {
"path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
"array": array(
[0.0, 0.0, 0.0, ..., 0.0, 0.00048828, -0.00024414], dtype=float32
),
"sampling_rate": 8000,
},
"transcription": "je souhaite changer mon adresse",
"english_transcription": "I want to change my address",
"intent_class": 1,
"lang_id": 6,
}
数据字段在所有拆分中都相同。
每个配置只有一个“train”拆分,包含约600个示例。
所有数据集均在 Creative Commons license (CC-BY) 下获得许可。
@article{DBLP:journals/corr/abs-2104-08524,
author = {Daniela Gerz and
Pei{-}Hao Su and
Razvan Kusztos and
Avishek Mondal and
Michal Lis and
Eshan Singhal and
Nikola Mrksic and
Tsung{-}Hsien Wen and
Ivan Vulic},
title = {Multilingual and Cross-Lingual Intent Detection from Spoken Data},
journal = {CoRR},
volume = {abs/2104.08524},
year = {2021},
url = {https://arxiv.org/abs/2104.08524},
eprinttype = {arXiv},
eprint = {2104.08524},
timestamp = {Mon, 26 Apr 2021 17:25:10 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2104-08524.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
感谢 @patrickvonplaten 添加了这个数据集