Speech compression in dysarthria (Utianski et al., 2019)
Dataset posted on 2019-02-23, 00:12; authored by Rene L. Utianski, Steven Sandoval, Visar Berisha, Kaitlin L. Lansford, Julie M. Liss
Purpose: Telemedicine, used to offset disparities in access to speech-language therapy, relies on technology that utilizes compression algorithms to transmit signals efficiently. These algorithms have been thoroughly evaluated on healthy speech; however, the effects of compression algorithms on the intelligibility of disordered speech have not been adequately explored.
Method: This case study assessed acoustic and perceptual effects of resampling and speech compression (i.e., transcoding) on the speech of 2 individuals with dysarthria. Twenty naive undergraduate listeners completed forced-choice vowel identification and transcription tasks.
Results: Intelligibility improved on some measures and declined on others, depending on each speaker’s acoustic profile. Transcoding with the speech compression algorithm enlarged the vowel space area and improved vowel identification for 1 speaker, but reduced the vowel space area and vowel identification for the other speaker. In the transcription task, however, intelligibility decreased overall in this condition for both speakers.
Conclusions: There is a complex interplay between dysarthria and compression algorithms that warrants further exploration. The findings suggest that it is critical to be mindful of apparent changes in intelligibility secondary to compression algorithms necessary for practicing telemedicine.
Supplemental Material S1. Confusion matrices for each vowel for each condition for Speaker One, where numbers are raw values and indicate the number of tokens identified out of the corpus of 80 classifications per vowel; target vowels are indicated with phonetic (International Phonetic Alphabet [IPA]) notation; corresponding perceived vowels are indicated with orthographic notation.
Supplemental Material S2. Confusion matrices for each vowel for each condition for Speaker Two, where numbers are raw values and indicate the number of tokens identified out of the corpus of 80 classifications per vowel; target vowels are indicated with phonetic (International Phonetic Alphabet [IPA]) notation; corresponding perceived vowels are indicated with orthographic notation.
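Because the matrices in S1 and S2 report raw counts out of 80 classifications per target vowel, converting a row to percent correct is a simple division. The following is a minimal Python sketch of that conversion, not the authors' analysis code. It assumes that rows are target vowels and that column k holds the response matching the target in row k; the function name, vowel labels, and counts are hypothetical and are not taken from the supplemental files.

```python
# Illustrative sketch only: the labels and counts below are invented
# and do NOT come from Supplemental Material S1 or S2.
TOKENS_PER_VOWEL = 80  # classifications per target vowel, per the description


def percent_correct(matrix, targets):
    """Return {target vowel: percent identified correctly} from raw counts.

    Assumes row k is target vowel k and column k is its matching response,
    so the diagonal holds the correct identifications.
    """
    result = {}
    for k, (target, row) in enumerate(zip(targets, matrix)):
        total = sum(row)
        if total != TOKENS_PER_VOWEL:
            raise ValueError(
                f"row for {target!r} sums to {total}, expected {TOKENS_PER_VOWEL}"
            )
        result[target] = 100 * row[k] / total
    return result


# Hypothetical three-vowel example (rows: targets, columns: responses).
toy_targets = ["i", "æ", "u"]
toy_matrix = [
    [60, 15, 5],
    [10, 62, 8],
    [4, 16, 60],
]

toy_scores = percent_correct(toy_matrix, toy_targets)
for vowel, score in toy_scores.items():
    print(f"/{vowel}/: {score:.1f}% correct")
```

The row-sum check is a guard against transcription errors when copying counts from the supplemental tables; it can be dropped if a condition legitimately has fewer than 80 responses.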
Utianski, R. L., Sandoval, S., Berisha, V., Lansford, K. L., & Liss, J. M. (2019). The effects of speech compression algorithms on the intelligibility of two individuals with dysarthric speech. American Journal of Speech-Language Pathology, 28, 195–203. https://doi.org/10.1044/2018_AJSLP-18-0081