Dataset:
Jzuluaga/atcosim_corpus
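The ATCOSIM Air Traffic Control Simulation Speech corpus is a speech database of air traffic control (ATC) operator speech, provided by Graz University of Technology (TUG) and the Eurocontrol Experimental Centre (EEC). It contains ten hours of speech data recorded with close-talk microphones during real-time ATC simulations. The utterances are in English and were spoken by ten non-native speakers. The database includes orthographic transcriptions and additional information about the speakers and recording sessions. The corpus was recorded and annotated by Konrad Hofbauer.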
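Both the transcriptions and the recordings are in English. All participating controllers were actively employed air traffic controllers with professional experience in the simulated airspace. The six male and four female controllers were either German or Swiss, and their native language was German, Swiss German or Swiss French. The controllers consented to the recording of their voices for linguistic analysis and for research and development in speech technology, and they were asked to show their normal working behaviour.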
The licensing status of the dataset depends on the legal status of the creators of the ATCOSIM corpus.
Contributors who prepared, processed, normalized and uploaded the dataset to Hugging Face:
@article{zuluaga2022how,
title={How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications},
author={Zuluaga-Gomez, Juan and Prasad, Amrutha and Nigmatulina, Iuliia and Sarfjoo, Saeed and others},
journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
year={2022}
}
@article{zuluaga2022bertraffic,
title={BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications},
author={Zuluaga-Gomez, Juan and Sarfjoo, Seyyed Saeed and Prasad, Amrutha and others},
journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
year={2022}
}
@article{zuluaga2022atco2,
title={ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications},
author={Zuluaga-Gomez, Juan and Vesel{\'y}, Karel and Sz{\H{o}}ke, Igor and Motlicek, Petr and others},
journal={arXiv preprint arXiv:2211.04054},
year={2022}
}
Authors of the dataset:
@inproceedings{hofbauer-etal-2008-atcosim,
title = "The {ATCOSIM} Corpus of Non-Prompted Clean Air Traffic Control Speech",
author = "Hofbauer, Konrad and
Petrik, Stefan and
Hering, Horst",
booktitle = "Proceedings of the Sixth International Conference on Language Resources and Evaluation ({LREC}'08)",
month = may,
year = "2008",
address = "Marrakech, Morocco",
publisher = "European Language Resources Association (ELRA)",
url = "http://www.lrec-conf.org/proceedings/lrec2008/pdf/545_paper.pdf",
}