Special Issue “Automatic Speech Recognition and Understanding in Air Traffic Management” in MDPI Aerospace

Since the arrival of Alexa, Google Assistant and Siri at the latest, voice recognition technologies have become part of our everyday lives. This innovation not only frees our hands when entering a new address into a navigation system, it also has the potential to reduce air traffic controller (ATCo) workload and enhance air traffic management safety.

The idea of using data link has existed for more than 30 years. Nevertheless, voice communication between ATCos and pilots via radio is still the main communication channel in air traffic control. The ATCo issues verbal commands to the cockpit crew. Whenever the information from voice communication has to be digitized, ATCos are burdened with manually entering information that has already been uttered. Automatic Speech Recognition (ASR) transforms the analog voice signal into a sequence of words, e.g.

speed bird four eight six descend flight level one two zero

Automatic Speech Understanding extracts the meaning from the above sequence of words, e.g. that the aircraft with the callsign BAW486 should descend to flight level 120, i.e. roughly 12 thousand feet. The different approaches in Europe and in the US to modeling the spoken words and semantics in machine-readable form are discussed in the article by Chen et al. For example, the above sequence of words can also be modeled as

speedbird 4 8 6 descend flight level 1 2 0

Here “speed bird” is written as one word and the numbers are written as digits instead of words.
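The relationship between the two written forms can be illustrated with a short Python sketch. It is only an illustration of the idea, not the notation or tooling of any of the articles: the lookup tables `DIGITS` and `MULTIWORD` and the function `normalize` are hypothetical names introduced here, and a real system would need a far larger vocabulary.

```python
# Hypothetical lookup tables, introduced only for this illustration.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
# Word pairs that the second representation writes as a single word.
MULTIWORD = {("speed", "bird"): "speedbird"}


def normalize(transcript: str) -> str:
    """Convert a word-by-word transcript into the digit-based form."""
    words = transcript.lower().split()
    out = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in MULTIWORD:
            # Merge the two words into one token and skip both.
            out.append(MULTIWORD[pair])
            i += 2
            continue
        # Replace number words by digits, keep every other word unchanged.
        out.append(DIGITS.get(words[i], words[i]))
        i += 1
    return " ".join(out)


print(normalize("speed bird four eight six descend flight level one two zero"))
```

Applied to the first example, the sketch yields the second one; words outside the two tables pass through unchanged.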

Once the formal problems of representing lexical, syntactic and semantic information are solved, the spoken callsigns still need to be extracted from a spoken transmission. This challenge is addressed in the paper by Garcia et al. The authors address not only the extraction of callsigns spoken by the ATCo in the lab environment, but also of callsigns spoken by pilots in the noisy operational environment, i.e. from the cockpit, when pilots with different accents are flying in the Spanish approach environment. Highlighting the displayed callsign of the aircraft that is currently transmitting via radio reduces the workload of the air traffic controller.

Another application of Automatic Speech Recognition and Understanding (ASRU) is the pre-filling of radar labels with information extracted from ATCo voice transmissions, for example for Vienna approach control. This means that not only the spoken callsigns need to be extracted, but also the spoken command and the associated values. The paper by Ahrendhold et al. quantifies the effects of ASRU support in terms of safety and human performance. An implemented ASRU system was validated within a human-in-the-loop environment by ATCos in different traffic-density scenarios. In the baseline condition, ATCos performed radar label maintenance by entering the voice transmission content manually with a mouse and keyboard into the aircraft radar label. In the proposed solution, ATCos were supported by ASRU, which most of the time automatically entered the information into the radar labels. This approach led to a reduction of the ATCos' clicking times by a factor of more than 30.
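To make the three pieces of information concrete (callsign, command and value), the following Python sketch extracts them from a transcript that is already written with digits. It is a deliberately minimal illustration under stated assumptions, not the method of the paper: the table `AIRLINE_CODES`, the regular expression `PATTERN` and the function `extract_label_entry` are hypothetical, and only descend and climb commands to a flight level are covered.

```python
import re
from typing import Optional

# Hypothetical mapping from spoken airline designator to three-letter code.
# Only the example from the text above is included.
AIRLINE_CODES = {"speedbird": "BAW"}

# Assumed transcript shape: "<airline> <digits> <command> flight level <digits>".
PATTERN = re.compile(
    r"^(?P<airline>[a-z]+) (?P<number>\d(?: \d)*) "
    r"(?P<command>descend|climb) flight level (?P<level>\d(?: \d)*)$"
)


def extract_label_entry(transcript: str) -> Optional[dict]:
    """Return callsign, command and value, or None if nothing is recognized."""
    match = PATTERN.match(transcript)
    if match is None or match["airline"] not in AIRLINE_CODES:
        # Nothing is entered into the label when extraction fails.
        return None
    return {
        "callsign": AIRLINE_CODES[match["airline"]] + match["number"].replace(" ", ""),
        "command": match["command"].upper(),
        "flight_level": int(match["level"].replace(" ", "")),
    }


print(extract_label_entry("speedbird 4 8 6 descend flight level 1 2 0"))
```

Returning `None` for anything that does not match reflects the observation above that the information is entered automatically most of the time, not always; in the remaining cases the ATCo still has to enter it manually.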

A similar application is presented in the paper by Kleinert et al. Here Automatic Speech Recognition and Understanding (ASRU) is integrated into an Advanced Surface Movement Guidance and Control System (A-SMGCS). ASRU provides the A-SMGCS with the ability to automatically adapt the apron controller's route planning based on the voice communication. This relieves the controllers of the burden of entering much of the information manually into the A-SMGCS. Validations with the ASRU-enhanced A-SMGCS were performed in the complex apron simulation training environment of Frankfurt airport with 14 apron controllers in a human-in-the-loop simulation in summer 2022. The integration significantly reduced the workload of controllers and increased safety as well as overall performance.

The article by Ohneiser et al. addresses lower technology readiness levels. Ten ATCos from Lithuania and Austria participated in a human-in-the-loop simulation at DLR to validate Assistant Based Speech Recognition (ABSR) support within a prototypic multiple remote tower controller working position. The ABSR supports ATCos by (1) highlighting recognized callsigns, (2) entering recognized commands from ATCo voice transmissions into electronic flight strips, (3) offering correction of ABSR output, (4) automatically accepting ABSR output, and (5) feeding the digital air traffic control system. The presented results motivate bringing the technology to a higher technology readiness level, which is also supported by subjective feedback from questionnaires and by an objective measurement of workload reduction based on a secondary task.

Most results of ASRU applications still come from the lab environment, which requires simulations with air traffic controllers and, in particular, the integration of simulation pilots. The article by Zuluaga et al. presents a virtual simulation pilot agent to reduce the number of simulation pilots needed, especially in the context of air traffic controller training. The agent also includes a text-to-speech engine and therefore generates the pilot’s readbacks. The framework employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models.

All articles are available here. (More to come)