Adapting a speech recognizer to new areas or airports requires re-training its acoustic and language models. This in turn requires transcriptions of spoken controller and pilot utterances, and producing them is very time-consuming: one hour of speech can easily need 40 hours of manual work to reach an acceptable transcription quality. To expedite that task, DLR has developed the Controller Command Logging Tool for Context Comparison (CoCoLoToCoCo).
Fraport AG, for example, used CoCoLoToCoCo to transcribe 2660 controller utterance recordings in December 2021 from Fraport’s apron simulator environment. With silence removed, the 2660 utterances amount to 175 minutes of data. The whole transcription process, including checking but excluding annotation, was completed by one person in 13 hours, which means less than five hours of work per hour of data. This was, however, only possible because the word error rate of the pre-transcriptions was already below 5%.
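As a quick check of these figures, the following short calculation reproduces the effort ratio from the numbers given above. The comparison uses the 40-hour rule of thumb mentioned earlier as a baseline, which is a general estimate and not a value measured at Fraport; the variable names are purely illustrative.

```python
# Figures taken from the text above
speech_minutes = 175        # audio after removing silence
work_hours = 13             # transcription incl. checking, excl. annotation
baseline_hours_per_hour = 40  # rule-of-thumb effort for fully manual work

speech_hours = speech_minutes / 60
hours_per_hour = work_hours / speech_hours
speedup = baseline_hours_per_hour / hours_per_hour

print(f"{hours_per_hour:.2f} hours of work per hour of speech")  # 4.46
print(f"about {speedup:.0f} times faster than the baseline")     # about 9
```

Note that the baseline and the Fraport figure are not strictly comparable, because the Fraport effort excludes annotation and started from pre-transcriptions with a low word error rate.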
In general, CoCoLoToCoCo provides an easy interface for listening to the speech and moving from one utterance to the next. To make even noisy and quickly spoken utterances understandable, the audio player properties can be changed, e.g. by adjusting the playback speed.
Color coding and font style show the processing status of the speech files:
black: fully done
blue: annotation missing
red: transcription missing
bold: needs checking
A special text field is provided for writing down the utterance word by word. To save typing time, the text field automatically converts digits and capital letters to their telephony spelling: pressing “Shift-A” results in “alfa”, and typing “8912” results in “eight nine one two”.
In the same way, typing “condor 1HE contact tower on 112.780” yields a transcription in which the callsign and the frequency are spelled out word by word.
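The automatic expansion of digits and capital letters can be illustrated with a minimal sketch. This is not the CoCoLoToCoCo implementation: the function names are hypothetical, the letter table follows the ICAO spelling alphabet (the tool's exact spellings, e.g. for J and X, are an assumption), and rendering a dot between two digits as “decimal” is an assumption based on standard radiotelephony phraseology.

```python
# ICAO spelling alphabet; exact spellings used by the tool are an assumption
ICAO_LETTERS = {
    "A": "alfa", "B": "bravo", "C": "charlie", "D": "delta", "E": "echo",
    "F": "foxtrot", "G": "golf", "H": "hotel", "I": "india", "J": "juliett",
    "K": "kilo", "L": "lima", "M": "mike", "N": "november", "O": "oscar",
    "P": "papa", "Q": "quebec", "R": "romeo", "S": "sierra", "T": "tango",
    "U": "uniform", "V": "victor", "W": "whiskey", "X": "xray",
    "Y": "yankee", "Z": "zulu",
}
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def expand_token(token):
    """Split one whitespace-free token into spoken words."""
    words = []
    plain = ""  # run of characters that are kept as typed
    for i, ch in enumerate(token):
        if ch in ICAO_LETTERS:
            spoken = ICAO_LETTERS[ch]
        elif ch in DIGITS:
            spoken = DIGITS[ch]
        elif (ch == "." and 0 < i < len(token) - 1
              and token[i - 1].isdigit() and token[i + 1].isdigit()):
            spoken = "decimal"  # assumption: dot between digits
        else:
            plain += ch
            continue
        if plain:
            words.append(plain)
            plain = ""
        words.append(spoken)
    if plain:
        words.append(plain)
    return words


def expand(text):
    """Expand digits and capital letters in a typed line of text."""
    words = []
    for token in text.split():
        words.extend(expand_token(token))
    return " ".join(words)


print(expand("A"))                   # alfa
print(expand("8912"))                # eight nine one two
print(expand("condor 1HE contact"))  # condor one hotel echo contact
```

Because every capital letter is expanded, the sketch assumes that ordinary words are typed in lower case, as in the examples above.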
Furthermore, the callsigns of all aircraft in the current sector are displayed together with the ways in which they are usually spoken (“lufthansa one one one alfa”, “lufthansa one alfa”, but also “hansa triple one alfa”). Some utterance parts, especially those in a language other than English, occur often and are difficult to type on the keyboard. Users can therefore define their own frequently occurring phrases, which can be copied and pasted into the transcription when needed to simplify the typing.
To guarantee transcription quality, a syntactic and semantic check is performed automatically. Common typos are corrected automatically: “niner” becomes “nine”, “pushback” becomes “push back”, “alpha” becomes “alfa”, etc.
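The automatic correction of common typos can be sketched as a simple whole-word replacement. The table below contains only the three examples named above; the function name and the regular-expression approach are illustrative assumptions and say nothing about how CoCoLoToCoCo actually performs its checks.

```python
import re

# Correction table limited to the examples mentioned in the text
CORRECTIONS = {
    "niner": "nine",
    "pushback": "push back",
    "alpha": "alfa",
}

# Match any listed typo, but only as a whole word
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CORRECTIONS) + r")\b"
)


def correct_typos(text):
    """Replace each known typo with its normalized spelling."""
    return _PATTERN.sub(lambda match: CORRECTIONS[match.group(1)], text)


print(correct_typos("lufthansa niner alpha pushback approved"))
# lufthansa nine alfa push back approved
```

Matching on word boundaries keeps the replacement from touching longer words that merely contain one of the listed typos.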
Stay tuned for further content on CoCoLoToCoCo in future blogs.