Glottis Analysis Tools (Kist et al., 2021)
Journal contribution posted on 2021-05-17, 21:15, authored by Andreas M. Kist, Pablo Gómez, Denis Dubrovskiy, Patrick Schlegel, Melda Kunduk, Matthias Echternach, Rita Patel, Marion Semmler, Christopher Bohr, Stephan Dürr, Anne Schützenberger, Michael Döllinger
Purpose: High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because dedicated software to analyze the data is lacking. HSV allows the vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine.
Method: We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation.
Results: We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and can extract various features from the video data, among them the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software.
Conclusion: GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders.
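Illustration: The Results mention threshold-based region growing and the glottal area waveform (the glottal area over time). The following is a minimal Python sketch of those two ideas, not GAT's actual implementation (GAT itself is written in C#). It assumes that the glottis appears darker than the surrounding tissue, that the user supplies a seed pixel and an intensity threshold, and that frames are small grayscale grids. The function names `region_grow` and `glottal_area_waveform` and the toy frames are hypothetical.

```python
from collections import deque


def region_grow(frame, seed, threshold):
    """Return the set of pixels 4-connected to `seed` with intensity <= threshold.

    Assumption: the glottal gap is dark, so "inside" means a low intensity.
    """
    rows, cols = len(frame), len(frame[0])
    r0, c0 = seed
    if frame[r0][c0] > threshold:
        return set()  # seed is not inside a dark region: nothing to grow
    region = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and frame[nr][nc] <= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def glottal_area_waveform(frames, seed, threshold):
    """Segmented area (in pixels) for each frame, i.e. area as a function of time."""
    return [len(region_grow(frame, seed, threshold)) for frame in frames]


# Toy 4x4 frames (hypothetical data): open, nearly closed, and closed glottis.
open_frame = [
    [200, 200, 200, 200],
    [200, 10, 20, 200],
    [200, 15, 200, 200],
    [200, 200, 200, 10],  # dark pixel not connected to the seed, so it is ignored
]
nearly_closed_frame = [
    [200, 200, 200, 200],
    [200, 10, 200, 200],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
closed_frame = [[200] * 4 for _ in range(4)]

gaw = glottal_area_waveform(
    [open_frame, nearly_closed_frame, closed_frame], seed=(1, 1), threshold=50
)
print(gaw)
```

In this sketch the waveform is simply the pixel count of the grown region per frame; converting it to physical units or deriving further parameters from it would require calibration and analysis steps that are not shown here.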
Supplemental Material S1. Supplemental figures:
Figure S1. Neural network training metrics.
Figure S2. Neural network evaluation.
Figure S3. Example segmentations using different backbones from various source data.
Figure S4. Audio analysis platform.
Supplemental Material S2. Endoscopic footage with semi-automatic and fully automatic segmentation methods, multiple examples.
Supplemental Material S3. GAT manual.
Kist, A. M., Gómez, P., Dubrovskiy, D., Schlegel, P., Kunduk, M., Echternach, M., Patel, R., Semmler, M., Bohr, C., Dürr, S., Schützenberger, A., & Döllinger, M. (2021). A deep learning enhanced novel software tool for laryngeal dynamics analysis. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2021_JSLHR-20-00498
Work was supported over the years by several projects: Bundesministerium für Wirtschaft und Energie (ZF4010105BA8), Deutsche Forschungsgemeinschaft DFG DO1247/8-1/2, BO4399/2-1, DO1247/12-1, SCHU3441/3-2, and EC409/1-2. A. M. K. was supported by an Addon-Fellowship of the Joachim Herz Foundation.
Keywords: speech, voice, deep learning, software, laryngeal, dynamics, high-speed videoendoscopy, HSV, endoscopy, disorder, assess, diagnose, vocal fold, oscillation, glottal, clinical, user-friendly, editing, motion, correction, segmentation, quantitative, neural, network, Glottis Analysis Tools, GAT, in vivo, ex vivo, recording, high speed, video, audio, waveform, research, diagnosis, treatment, disorders, Laboratory Phonetics and Speech Science