Dataset: a6kme/minds14-mirror
Sub-tasks: keyword-spotting
Multilinguality: multilingual
Size: 10K<n<100K
Preprint: arxiv:2104.08524
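MInDS-14 is a training and evaluation resource for intent detection from spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, with spoken examples in 14 different language varieties.
MInDS-14 can be downloaded and used as follows: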
from datasets import load_dataset
minds_14 = load_dataset("PolyAI/minds14", "fr-FR") # for French
# to download all data for multi-lingual fine-tuning uncomment following line
# minds_14 = load_dataset("PolyAI/minds14", "all")
# see structure
print(minds_14)
# load audio sample on the fly
audio_input = minds_14["train"][0]["audio"] # first decoded audio sample
intent_class = minds_14["train"][0]["intent_class"] # integer intent label of the first sample
intent = minds_14["train"].features["intent_class"].names[intent_class]
# use audio_input and intent_class to fine-tune your model for audio classification
Below we show the details of the fr-FR configuration; the other configurations have the same structure.
fr-FR
An example data instance from the fr-FR configuration looks as follows:
{
"path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
"audio": {
"path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
"array": array(
[0.0, 0.0, 0.0, ..., 0.0, 0.00048828, -0.00024414], dtype=float32
),
"sampling_rate": 8000,
},
"transcription": "je souhaite changer mon adresse",
"english_transcription": "I want to change my address",
"intent_class": 1,
"lang_id": 6,
}
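The instance above is sampled at 8 kHz, whereas many pretrained speech models are trained on 16 kHz audio (an assumption about the downstream model, not something the dataset prescribes). The following is a minimal standard-library sketch of computing a clip's duration and naively upsampling it by linear interpolation. The helper names `duration_seconds` and `upsample_linear` are hypothetical, and the sample values are made up for illustration.

```python
from typing import List, Sequence


def duration_seconds(samples: Sequence[float], sampling_rate: int) -> float:
    """Return the clip length in seconds."""
    return len(samples) / sampling_rate


def upsample_linear(samples: Sequence[float], factor: int) -> List[float]:
    """Insert (factor - 1) linearly interpolated points between neighbours.

    Naive illustration only: this is not a band-limited resampler.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not samples:
        return []
    out: List[float] = []
    for left, right in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(left + (right - left) * k / factor)
    out.append(samples[-1])  # keep the final original sample
    return out


# Toy stand-in for minds_14["train"][0]["audio"]; the values are made up.
audio = {"array": [0.0, 0.5, 1.0, 0.0], "sampling_rate": 8000}

clip_length = duration_seconds(audio["array"], audio["sampling_rate"])
resampled = upsample_linear(audio["array"], factor=2)  # 8 kHz -> roughly 16 kHz
```

In practice, the `datasets` library can resample on access with `minds_14.cast_column("audio", Audio(sampling_rate=16_000))`, which is likely preferable to hand-rolled interpolation.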
The data fields are the same across all splits.
Each configuration has only a "train" split, containing around 600 examples.
All datasets are licensed under the Creative Commons license (CC-BY).
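Because each configuration ships only a "train" split, evaluation requires carving out a held-out set yourself. The sketch below shows one possible approach, a seeded split stratified by intent label, using only the standard library. The function name `stratified_split` and the toy labels are hypothetical; with the real data, the labels would come from `minds_14["train"]["intent_class"]`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split example indices into train/test, keeping class proportions."""
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out at least one example per class when the class has more than one.
        n_test = max(1, round(len(indices) * test_fraction)) if len(indices) > 1 else 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for the intent_class column; the values are made up.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=0)
```

The resulting index lists can be passed to `minds_14["train"].select(...)`. The `datasets` library also offers `train_test_split` on a `Dataset`, which may be simpler when stratification is not required.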
@article{DBLP:journals/corr/abs-2104-08524,
author = {Daniela Gerz and
Pei{-}Hao Su and
Razvan Kusztos and
Avishek Mondal and
Michal Lis and
Eshan Singhal and
Nikola Mrksic and
Tsung{-}Hsien Wen and
Ivan Vulic},
title = {Multilingual and Cross-Lingual Intent Detection from Spoken Data},
journal = {CoRR},
volume = {abs/2104.08524},
year = {2021},
url = {https://arxiv.org/abs/2104.08524},
eprinttype = {arXiv},
eprint = {2104.08524},
timestamp = {Mon, 26 Apr 2021 17:25:10 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2104-08524.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Thanks to @patrickvonplaten for adding this dataset.