Metrics and Command Extraction
Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances, O. Ohneiser, S. Sarfjoo, H. Helmke, S. Shetty, P. Motlicek, M. Kleinert, H. Ehr, Š. Murauskas (Oro navigacija, Lithuania), Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021, pp. 3291-3295.
The joint paper of DLR, Idiap and the Lithuanian Air Navigation Service Provider Oro navigacija describes the recognition performance at the word level and at the semantic level for utterances from Lithuanian airspace.
Automated Interpretation of Air Traffic Control Communication: The Journey from Spoken Words to a Deeper Understanding of the Meaning, M. Kleinert, H. Helmke, S. Shetty, O. Ohneiser, H. Ehr, A. Prasad, P. Motlicek, J. Harfmann, 40th Digital Avionics Systems Conference (DASC 21), hybrid conference, San Antonio, Texas, USA, October 3-7, 2021.
The joint paper of DLR, Idiap and NATS describes the rule-based algorithm that transforms a sequence of words from an air traffic controller or pilot utterance into its semantic interpretation, defined as an extension of the 16-04 ontology. The defined JSON format allows a consistent exchange of Speech-to-Text output, ontology information, or both between different systems and applications. The format is by definition machine-readable and easy to expand with additional key-value pairs while remaining compatible with old data. The paper also shows how strongly extraction performance degrades when no surveillance data is used or available.
Measuring Speech Recognition And Understanding Performance in Air Traffic Control Domain Beyond Word Error Rates; H. Helmke, S. Shetty, M. Kleinert, O. Ohneiser, H. Ehr, A. Prasad, P. Motlicek, A. Cerna and C. Windisch, 11th SESAR Innovation Days, online conference, 2021.
The joint paper of DLR, Idiap, Austro Control and the Czech air navigation service provider ANS CR introduces the metrics command extraction rate, callsign extraction rate and command extraction error rate. These rates are evaluated on utterances from Austro Control and ANS CR, which were recorded in the ops room environment in the MALORCA project and in the lab environment in solution 16-04.
A shorter version of this paper was presented at the Interspeech 2021 Satellite Workshop:
How to Measure Speech Recognition Performance in the Air Traffic Control domain? The Word Error Rate is only half of the truth!, H. Helmke, S. Shetty, M. Kleinert, O. Ohneiser, H. Ehr, A. Prasad, P. Motlicek, A. Cerna and C. Windisch, Interspeech 2021 Satellite Workshop, Brno, Czechia, 30 August – 3 September, 2021.
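The three rates named above can be illustrated with a minimal Python sketch. The exact definitions are given in the paper. The sketch only assumes that a command counts as correct when it appears in both the gold annotation and the extraction, that an extracted command missing from the gold annotation counts as an error, and that both counts are divided by the number of gold commands. The class name, the function name and the command strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One annotated utterance (hypothetical structure for illustration)."""
    gold_callsign: str
    gold_commands: frozenset
    extracted_callsign: str
    extracted_commands: frozenset


def extraction_rates(utterances):
    """Return the three rates as fractions between 0 and 1 (assumed definitions)."""
    total_gold = sum(len(u.gold_commands) for u in utterances)
    if not utterances or total_gold == 0:
        raise ValueError("need at least one utterance with a gold command")
    # Correct: the command appears in both the gold and the extracted set.
    correct = sum(len(u.gold_commands & u.extracted_commands) for u in utterances)
    # Error: the command was extracted but is not part of the gold annotation.
    wrong = sum(len(u.extracted_commands - u.gold_commands) for u in utterances)
    callsign_hits = sum(u.gold_callsign == u.extracted_callsign for u in utterances)
    return {
        "command_extraction_rate": correct / total_gold,
        "command_extraction_error_rate": wrong / total_gold,
        "callsign_extraction_rate": callsign_hits / len(utterances),
    }


# Illustrative data only: one wrong speed value and one wrong callsign.
example = [
    Utterance("DLH123", frozenset({"CLIMB 200 FL", "REDUCE 220 kt"}),
              "DLH123", frozenset({"CLIMB 200 FL", "REDUCE 200 kt"})),
    Utterance("AUA45", frozenset({"DESCEND 80 FL"}),
              "AUA54", frozenset({"DESCEND 80 FL"})),
]
rates = extraction_rates(example)
```

In this sketch a gold command that is neither correctly extracted nor replaced by a wrong one lowers the extraction rate without raising the error rate, which is why the two rates need not sum to one.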
Readback Error Detection
Readback error detection is the supreme discipline of Automatic Speech Recognition and Understanding. Noisy and heavily abbreviated pilot readbacks require speech recognition and semantic interpretation to work even when word error rates exceed 10%. An even bigger challenge is that readback errors are, fortunately, rare events: only 1% to 4% of conversations contain readback errors.
Readback Error Detection by Automatic Speech Recognition to Increase ATM Safety, H. Helmke, M. Kleinert, S. Shetty, O. Ohneiser, H. Ehr, H. Arilíusson, T. Simiganoschi, A. Prasad, P. Motlicek, K. Veselý, K. Ondřej, P. Smrz, J. Harfmann, C. Windisch, 14th ATM Seminar, 20.09.2021 – 24.09.2021, Virtual conference.
The joint paper of DLR, Isavia ANS, Idiap, Brno University of Technology (BUT), NATS and Austro Control shows that a command-level recognition rate of slightly above 50% is already sufficient to achieve a readback error detection rate of 50%, provided the command-level error rate is below 0.2%. Otherwise, a readback error false alarm rate of more than 10% must be accepted.
Readback Error Detection by Automatic Speech Recognition and Understanding, H. Helmke, K. Ondřej, S. Shetty, H. Arilíusson, T. Simiganoschi, M. Kleinert, O. Ohneiser, H. Ehr, J. Zuluaga-Gomez, P. Smrz, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.
The joint paper of DLR, BUT, Isavia ANS and Idiap presents two different algorithms for readback error detection: a rule-based one and a data-driven one, which is based on training a neural network with artificial readback error samples. The paper also presents two different approaches for command extraction, again a rule-based one and a data-driven one.
Application of HAAWAII architecture
The HAAWAII architecture has already been used successfully in different projects. HAAWAII architecture means:
- to use Assistant Based Speech Recognition (ABSR), which integrates contextual knowledge (e.g., callsigns) from flight plan and surveillance data into Speech Recognition (Speech-to-Text with so-called callsign boosting) and Speech Understanding (Text-to-Concept),
- to make very clear that speech recognition (Speech-to-Text) does not automatically incorporate speech understanding (Text-to-Concept); only both together enable automatic speech recognition and understanding (ASRU),
- to use contextual knowledge from the conversation (e.g., the previous utterance) in Text-to-Concept, e.g. “two zero zero thank you” in a pilot readback is very probably an altitude readback, and not a speed or heading readback, if the ATCo has just given a CLIMB command to flight level 200 (RBA),
- to integrate command validation in Text-to-Concept phase (VAL),
- to have the same acoustic and language model for ATCo and pilot utterances (ONE),
- to have a separate block for detection of voice transmissions, which either relies on push-to-talk (PTT) availability or needs to evaluate the input wave signal in more detail (Voice Activity Detection, VAD),
- to repair over- or under-splitting in the Text-to-Concept phase (REP).
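The use of conversational context (RBA) in the list above can be sketched in Python. This is a minimal illustration under stated assumptions, not the HAAWAII implementation: the function names, the command type labels and the representation of the previous ATCo utterance as (type, value) pairs are all hypothetical.

```python
# Spoken digits as they typically appear in ATC transcripts.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}


def spoken_number(words):
    """Join all digit words into one integer; None if there are no digits.

    Simplification: digit words are collected even if they are not adjacent.
    """
    digits = [DIGITS[w] for w in words if w in DIGITS]
    return int("".join(digits)) if digits else None


def interpret_bare_value(readback_words, previous_commands):
    """Guess the command type of a bare value in a pilot readback.

    previous_commands is a list of (command_type, value) pairs taken from
    the preceding ATCo utterance (hypothetical representation).
    """
    value = spoken_number(readback_words)
    if value is None:
        return None
    matches = [ctype for ctype, v in previous_commands if v == value]
    if len(matches) == 1:
        # Exactly one previous command carries this value, so reuse its type.
        return (matches[0], value)
    # No context, or ambiguous context: leave the type open.
    return ("UNKNOWN", value)


readback = "two zero zero thank you".split()
with_context = interpret_bare_value(readback, [("CLIMB", 200)])
without_context = interpret_bare_value(readback, [])
```

With the CLIMB command in the context, the bare value is resolved to an altitude readback. Without it, the same words remain untyped.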
Apron Controller Support by Integration of Automatic Speech Recognition with an Advanced Surface Movement Guidance and Control System, M. Kleinert, H. Helmke, S. Shetty, O. Ohneiser, H. Ehr, I. Nigmatulina, H. Wiese, M. Maier, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.
The paper above, written by DLR, Idiap, Fraport and Atrics, benefits from the HAAWAII elements ABSR, ASRU, VAL, VAD and REP. It integrates a modern A-SMGCS with speech recognition and understanding to support apron controllers in maintaining flight strip information and to reduce the workload of simulation pilots.
Understanding Tower Controller Communication for Support in Air Traffic Control Displays, O. Ohneiser, H. Helmke, S. Shetty, M. Kleinert, H. Ehr, G. Balogh, A. Tønnesen, W. Rinaldi, S. Mansi, G. Piazzolla, Š. Murauskas, T. Pagirys, G. Kis-Pál, R. Tichy, V. Horváth, F. Kling, H. Usanovic, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.
The joint paper of DLR, Indra Navia AS, LEONARDO S.p.A., the Lithuanian ANSP Oro Navigacija, HungaroControl and Austro Control benefits from ABSR, ASRU, VAL, PTT and REP. It summarizes the results of three exercises conducted in solution 97 of SESAR Industrial Research on speech recognition and understanding support for tower controllers.
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator, J. Zuluaga-Gomez, A. Prasad, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.
The paper from Idiap shows how to support simulation pilots with automatic speech recognition. The main components are:
(i) ASR to generate a transcript of the ATCo utterance,
(ii) an entity generator to tag words (callsigns, commands, values) and
(iii) a repetition generator that uses a rule-based system to generate a pilot response based on the generated tags, and a text-to-speech system that acts as a pseudo-pilot to repeat the generated pilot response.
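Steps (ii) and (iii) above can be sketched in Python, starting from a transcript that step (i) is assumed to have produced. The keyword-based tagger below is a toy stand-in for whatever entity generator the paper uses, and the readback rule (command and value first, callsign last) is an assumption for illustration. All names are hypothetical, and the text-to-speech stage is left out.

```python
# Toy lexicon of command keywords (hypothetical and far from complete).
COMMAND_WORDS = {"climb", "descend", "turn", "reduce", "contact"}


def tag_entities(transcript):
    """Split a transcript into callsign, command and value words.

    Assumption: the callsign comes first, followed by one command keyword,
    followed by the value words of that command.
    """
    tags = {"callsign": [], "command": [], "value": []}
    current = "callsign"
    for word in transcript.lower().split():
        if word in COMMAND_WORDS:
            tags["command"].append(word)
            current = "value"  # everything after the keyword is the value
        else:
            tags[current].append(word)
    return tags


def generate_readback(tags):
    """Rule-based pilot response: repeat command and value, then the callsign."""
    return " ".join(tags["command"] + tags["value"] + tags["callsign"])


atco_transcript = "Lufthansa one two three descend flight level eight zero"
tags = tag_entities(atco_transcript)
pilot_response = generate_readback(tags)
```

The resulting string would then be handed to a text-to-speech system that plays the pseudo-pilot role.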
The first step in ATC utterance understanding is to extract the callsign. Knowing which numbers and letters belong to the callsign makes it easier to extract the values of the commands. The following papers of the HAAWAII team address this challenge.
Early Callsign Highlighting using Automatic Speech Recognition to reduce Air Traffic Controller Workload, S. Shetty, H. Helmke, M. Kleinert, O. Ohneiser, International Conference on Applied Human Factors and Ergonomics (AHFE), 24 – 28 July 2022, New York, USA.
The paper of DLR shows the advantages of having and using callsign information. It describes the algorithm and shows the effect of its individual parts: performance decreases measurably when certain parts of the algorithm are excluded.
Improving callsign recognition with air-surveillance data in air-traffic communication, I. Nigmatulina, R. Braun, J. Zuluaga-Gomez, P. Motlicek, Interspeech 2021 Satellite Workshop, Brno, Czechia, 30 August – 3 September, 2021.
The paper from Idiap addresses the improvement on word level when callsign boosting is applied, by using information from the flight plan and surveillance data.
Boosting of Contextual Information in ASR for Air-Traffic Call-Sign Recognition, Martin Kocour, Karel Veselý, Alexander Blatt, Juan Zuluaga Gomez, Igor Szöke, Jan Černocký, Dietrich Klakow, Petr Motlicek, Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021.
The paper from Brno University of Technology (BUT), Saarland University and Idiap addresses the improvement at the word level when callsign boosting is applied.
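The callsign papers above share the idea of exploiting the callsigns known from flight plan and surveillance data. The following Python sketch illustrates that idea in a deliberately simplified form: instead of boosting inside the speech recognizer, as the papers do, it picks the known callsign whose spoken form is most similar to the first recognized words. The telephony table, the function names and the threshold are hypothetical, and only callsigns with a three-letter designator and a purely numeric flight number are handled.

```python
from difflib import SequenceMatcher

# Hypothetical excerpt of a designator-to-telephony table.
TELEPHONY = {"DLH": "lufthansa", "AUA": "austrian", "BAW": "speedbird"}
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spoken_form(callsign):
    """Expand e.g. 'DLH123' to ['lufthansa', 'one', 'two', 'three']."""
    airline, flight_number = callsign[:3], callsign[3:]
    return [TELEPHONY[airline]] + [DIGIT_WORDS[d] for d in flight_number]


def best_callsign(recognized_words, active_callsigns, min_score=0.6):
    """Return the active callsign closest to the start of the utterance.

    Returns None if no candidate reaches min_score (assumed threshold).
    """
    best, best_score = None, 0.0
    for callsign in active_callsigns:
        expected = spoken_form(callsign)
        candidate = recognized_words[:len(expected)]
        # Word-level similarity between recognized and expected sequence.
        score = SequenceMatcher(None, candidate, expected).ratio()
        if score > best_score:
            best, best_score = callsign, score
    return best if best_score >= min_score else None


# "too" is a plausible misrecognition of "two".
words = "lufthansa one too three descend flight level eight zero".split()
chosen = best_callsign(words, ["AUA123", "DLH723", "DLH123"])
```

Restricting the candidates to callsigns currently in the airspace is what lets the sketch recover from the misrecognized digit.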
The following papers concentrate on improvements at the speech-to-text level.
M. Kleinert, N. Venkatarathinam, H. Helmke, O. Ohneiser, M. Strake, T. Fingscheidt: Easy Adaptation of Speech Recognition to Different Air Traffic Control Environments using the DeepSpeech Engine, 11th SESAR Innovation Days, online conference, 2021.
The joint paper of DLR and Braunschweig University shows the results of training the DeepSpeech Engine to recognize utterances from the Prague and Vienna ops room environments.
Zuluaga-Gomez, J., Prasad, A., Nigmatulina, I., Sarfjoo, S., Motlicek, P., Kleinert, M., Helmke, H., Ohneiser, O. and Zhan, Q. “How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications”, 2023 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023.
The joint paper of Idiap, DLR and Beijing Institute of Technology addresses the usage of pre-trained Wav2Vec2.0.
A two-step approach to leverage contextual data: speech recognition in air-traffic communications, Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo and Petr Motlicek, in: Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.
The paper from Idiap presents a two-step approach to leverage contextual data.
Contextual Semi-Supervised Learning: An Approach to Leverage Air-Surveillance and Untranscribed ATC Data in ASR Systems, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Karel Veselý, Martin Kocour, Igor Szöke, Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021.
The paper above from Idiap, Brno University of Technology (BUT) and ReplayWell addresses contextual semi-supervised learning.
Detecting English Speech in the Air Traffic Control Voice Communication, Igor Szöke, Santosh Kesiraju, Ondřej Novotný, Martin Kocour, Karel Veselý, Jan Černocký, Interspeech 2021, Brno, Czechia, 30 August – 3 September, 2021.
The paper from Brno University of Technology (BUT) describes how to detect English speech in ATC utterances containing more than one language.
The following joint paper from Idiap and DLR describes the application of BERTraffic to detect the speaker role, i.e. whether the air traffic controller or the pilot is speaking.
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications, J. Zuluaga-Gomez, S. S. Sarfjoo, A. Prasad, I. Nigmatulina, P. Motlicek, O. Ohneiser, H. Helmke, 2023 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023.
The following paper from Idiap and DLR uses a grammar-based approach for identifying the speaker role.
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition, Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Oliver Ohneiser, Hartmut Helmke, SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8, 2022.
A shorter version of the paper was presented at the Interspeech 2021 Satellite Workshop.
Grammar Based Identification Of Speaker Role For Improving ATCO And Pilot ASR, Amrutha Prasad (Idiap), Juan Pablo Zuluaga, Petr Motlicek, Oliver Ohneiser, Hartmut Helmke, Seyyed Saeed Sarfjoo and Iuliia Nigmatulina, Interspeech 2021 Satellite Workshop, Brno, Czechia, 30 August – 3 September, 2021.
References used as starting point for the project
- H. Helmke, J. Rataj, T. Mühlhausen, O. Ohneiser, H. Ehr, M. Kleinert, Y. Oualil, and M. Schulder, “Assistant-based speech recognition for ATM applications,” in 11th USA/Europe Air Traffic Management Research and Development Seminar (ATM2015), Lisbon, Portugal, 2015.
- H. Helmke, O. Ohneiser, Th. Mühlhausen, M. Wies, “Reducing controller workload with automatic speech recognition,” in IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), Sacramento, California, 2016.
- H. Helmke, O. Ohneiser, J. Buxbaum, C. Kern, “Increasing ATM efficiency with assistant-based speech recognition,” in 12th USA/Europe Air Traffic Management Research and Development Seminar (ATM2017), Seattle, Washington, 2017.
- M. Kleinert, H. Helmke, G. Siol, H. Ehr, A. Cerna, C. Kern, D. Klakow, P. Motlicek et al., “Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, United Kingdom, 2018.
- H. Helmke, M. Slotty, M. Poiger, D. F. Herrer, O. Ohneiser et al., “Ontology for transcription of ATC speech commands of SESAR 2020 solution PJ.16-04,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, United Kingdom, 2018.
- M. Kleinert, H. Helmke, S. Moos, P. Hlousek, C. Windisch, O. Ohneiser, H. Ehr, and A. Labreuil, “Machine Learning of Air Traffic Controller Command Extraction Models for Speech Recognition Applications,” 9th SESAR Innovation Days, Athens, Greece, 2019.
- H. Helmke, M. Kleinert, O. Ohneiser, H. Ehr, and S. Shetty, “Reducing Controller Workload by Automatic Speech Recognition Assisted Radar Label Maintenance,” in IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), Virtual Conference, 2020.