Wav2DDK: An automated DDK estimation algorithm (Kadambi et al., 2023)
Purpose: Oral diadochokinesis is a useful task in the assessment of speech motor function in the context of neurological disease. Remote collection of speech tasks provides a convenient alternative to in-clinic visits, but scoring these assessments can be a laborious process for clinicians. This work describes Wav2DDK, an automated algorithm for estimating the diadochokinetic (DDK) rate on remotely collected audio from healthy participants and participants with amyotrophic lateral sclerosis (ALS).
Method: Wav2DDK was developed using a corpus of 970 DDK assessments from healthy and ALS speakers, for which ground-truth DDK rates were provided manually by trained annotators. The clinical utility of the algorithm was demonstrated on a corpus of 7,919 assessments collected longitudinally from 26 healthy controls and 82 ALS speakers. Both corpora were collected on the participants' own mobile devices, with speech elicitation instructions delivered through a mobile app. DDK rate was estimated by parsing the character transcript produced by a deep neural network transformer acoustic model trained on healthy and ALS speech.
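The transcript-parsing step can be illustrated with a minimal Python sketch. It assumes a "pa-ta-ka" style task, a character transcript already produced by some ASR model, and a known speech duration in seconds. The names `DDK_SYLLABLES`, `count_ddk_syllables`, and `ddk_rate` are hypothetical, and the regex-based count is a simple stand-in, not the published Wav2DDK parsing rules.

```python
import re

# Hypothetical syllable inventory for a "pa-ta-ka" DDK task (an assumption,
# not taken from the paper).
DDK_SYLLABLES = ("pa", "ta", "ka")


def count_ddk_syllables(transcript: str, syllables=DDK_SYLLABLES) -> int:
    """Count target syllables in a character-level ASR transcript."""
    # Keep only lowercase letters so spaces or punctuation emitted by the
    # acoustic model do not affect the count.
    letters = re.sub(r"[^a-z]", "", transcript.lower())
    # Non-overlapping left-to-right matches of any target syllable.
    pattern = re.compile("|".join(syllables))
    return len(pattern.findall(letters))


def ddk_rate(transcript: str, duration_s: float) -> float:
    """Return syllables per second over an assumed speech duration (seconds)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return count_ddk_syllables(transcript) / duration_s
```

For example, `ddk_rate("pa ta ka pa ta ka", 2.0)` returns `3.0`. How the speech duration is measured (whole file versus first-to-last syllable) is left open here, since the abstract does not specify it.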
Results: Algorithm-estimated DDK rates are highly accurate, achieving a correlation of .98 with manual annotation and an average error of only 0.071 syllables per second. The estimated rate exactly matched the ground truth for 83% of files and was within 0.5 syllables per second of it for 95% of files. Estimated rates also show high test-retest reliability (r = .95) and good correlation with the speech subscore of the revised ALS Functional Rating Scale (ALSFRS-R; r = .67).
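Agreement figures of this kind can be computed along the following lines. This is a sketch under stated assumptions: "average error" is read as mean absolute error, and rates count as an exact match when they agree up to floating-point rounding. The function `agreement_metrics` and its `tolerance` parameter are hypothetical, and Pearson's r is implemented directly rather than taken from a statistics package.

```python
import math


def agreement_metrics(estimated, reference, tolerance=0.5):
    """Compare estimated DDK rates with manually annotated reference rates.

    Both sequences are assumed to hold rates in syllables per second,
    paired file by file.
    """
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have the same length")
    n = len(estimated)
    if n < 2:
        raise ValueError("need at least two files to compute a correlation")

    # Pearson correlation coefficient from mean-centered values.
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    dev_e = [e - mean_e for e in estimated]
    dev_r = [r - mean_r for r in reference]
    cov = sum(de * dr for de, dr in zip(dev_e, dev_r))
    denom = math.sqrt(sum(d * d for d in dev_e) * sum(d * d for d in dev_r))
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    pearson_r = cov / denom

    # Per-file absolute errors in syllables per second.
    abs_err = [abs(e - r) for e, r in zip(estimated, reference)]

    return {
        "pearson_r": pearson_r,
        # Assumption: "average error" is interpreted as mean absolute error.
        "mean_abs_error": sum(abs_err) / n,
        # "Exact" match tolerates floating-point rounding only.
        "exact_match_rate": sum(
            math.isclose(err, 0.0, abs_tol=1e-9) for err in abs_err
        ) / n,
        "within_tolerance_rate": sum(err <= tolerance for err in abs_err) / n,
    }
```

The same Pearson computation could be applied to rates from paired recording sessions as a simple test-retest check, although the exact reliability statistic used in the study is not stated in the abstract.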
Conclusion: We demonstrate a system for automated DDK estimation that is more efficient than manual annotation. Thorough analytical and clinical validation shows that the algorithm is not only highly accurate but also provides a convenient, clinically relevant metric for tracking longitudinal decline in ALS, serving to promote participation and diversity of participants in clinical research.
Supplemental Material S1. Description of DDK Corpus 2.
Supplemental Material S2. The three-stage model of speech generation.
Supplemental Material S3. Elicitations for ALS sentences dataset.
Supplemental Material S4. An acoustic-to-character sequence ASR model.
Supplemental Material S5. Data visualization using t-SNE embeddings.
Kadambi, P., Stegmann, G. M., Liss, J., Berisha, V., & Hahn, S. (2023). Wav2DDK: Analytical and clinical validation of an automated diadochokinetic rate estimation algorithm on remotely collected speech. Journal of Speech, Language, and Hearing Research. Advance online publication. https://doi.org/10.1044/2023_JSLHR-22-00282
Publisher Note: This article is part of the Special Issue: Select Papers From the 2022 Conference on Motor Speech.