ASHA journals

Machine learning to explain errors in narratives (Zavaleta et al., 2025)

online resource
posted on 2025-03-03, 17:20 authored by Rosa Zavaleta, Jacob Brue, Sandip Sen, Laura Wilson

Purpose: This study examines how personal, clinical, and word-level features explain paraphasias when using machine learning–based error analysis on the narratives of people with aphasia (PWA).

Method: We used AphasiaBank as the source of narrative transcript data for 236 PWA. We tested machine learning classification algorithms, including decision trees and random forests, on the utterances of PWA; the data comprised nonparaphasic words and, where a paraphasia was produced, the intended word. We classified target words as paraphasic or nonparaphasic based on PWA’s age; aphasia severity, duration, and type; presence of apraxia or dysarthria; and word-level features including part of speech, word frequency, imageability, syllable count, and location in the utterance. We measured the models’ predictive accuracy across classification thresholds on held-out test sets, and we used feature analysis to compare feature importance.
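As an illustration of what measuring accuracy "across classification thresholds" involves, the following is a minimal sketch of computing AUC and sensitivity from a classifier's scores. It is not taken from the authors' scripts (Supplemental Material S2); the function names and the toy labels and scores are assumptions made up for this example.

```python
# Illustrative sketch only: toy data, hypothetical function names.
# label 1 = paraphasic target word, label 0 = nonparaphasic word.

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """Share of true paraphasias whose score reaches the threshold."""
    pos_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not pos_scores:
        raise ValueError("no positive items")
    hits = sum(1 for s in pos_scores if s >= threshold)
    return hits / len(pos_scores)


# Made-up held-out labels and model scores, for demonstration only.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

auc = roc_auc(labels, scores)                  # 15 of 16 pairs ranked correctly
sens_at_half = sensitivity(labels, scores, 0.5)
print(auc, sens_at_half)                       # 0.9375 0.75
```

AUC summarizes ranking quality over all thresholds at once, whereas sensitivity depends on the single threshold chosen, which is why the two are reported separately.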

Results: At the word level, our random forest model achieved an area under the curve (AUC) of 0.896. We found a sensitivity of 0.821 for semantic paraphasias, 0.764 for phonemic paraphasias, and 0.872 for neologistic paraphasias. The most salient features, in order of importance, were word frequency, imageability, part of speech, age, severity, and syllable count, followed by aphasia duration, location of word, presence of apraxia, type of aphasia (e.g., fluent), and presence of dysarthria. Our random forest model that included information about surrounding words achieved AUC scores ranging from 0.881 to 0.899. Additionally, we developed a model that was trained on surrounding words and their respective features but was not given the actual error word. The best such model achieved an AUC of 0.745.
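The surrounding-word models can be pictured as building one feature row per target word from a window of neighboring words, with the target's own features optionally left out. The sketch below is one plausible way to do this and is not the authors' implementation; the function name, the feature names (`pos`, `log_freq`, `syllables`), the padding values, and the example utterance values are all hypothetical.

```python
# Illustrative sketch only: hypothetical names and made-up feature values.

# Placeholder features for positions that fall outside the utterance.
PAD = {"pos": "<pad>", "log_freq": 0.0, "syllables": 0}


def window_features(words, index, window, include_target=True):
    """Flatten the features of words[index - window .. index + window]
    into one row, keyed by feature name and signed offset."""
    row = {}
    for offset in range(-window, window + 1):
        if offset == 0 and not include_target:
            continue  # target-excluded variant: drop the word itself
        j = index + offset
        feats = words[j] if 0 <= j < len(words) else PAD
        for name, value in feats.items():
            row[f"{name}[{offset:+d}]"] = value
    return row


# A three-word utterance with invented word-level features.
utterance = [
    {"pos": "det", "log_freq": 6.1, "syllables": 1},
    {"pos": "noun", "log_freq": 3.2, "syllables": 2},
    {"pos": "verb", "log_freq": 4.0, "syllables": 1},
]

with_target = window_features(utterance, index=1, window=1)
without_target = window_features(utterance, index=1, window=1,
                                 include_target=False)
print(len(with_target), len(without_target))  # 9 6
```

Varying `window` in a sketch like this corresponds to the "various window sizes" named in Supplemental Material S1, under the assumption that window size counts words on each side of the target.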

Conclusions: Machine learning can aid in the explanation of paraphasias. In this study, we analyzed word- and person-level features and highlighted the nonrandom nature of paraphasic productions. Furthermore, this work lays the groundwork for developing machine learning models with clinical applications at the various stages of treatment of PWA.

Supplemental Material S1. AUC values for random forest models with various window sizes, target word excluded.

Supplemental Material S2. Scripts used for data analyses.

Zavaleta, R., Brue, J., Sen, S., & Wilson, L. (2025). Using machine learning to explain paraphasias in narratives of people with aphasia. Perspectives of the ASHA Special Interest Groups. Advance online publication. https://doi.org/10.1044/2024_PERSP-23-00291

Funding

Authors Rosa Zavaleta, Jacob Brue, and Laura Wilson received internal funding from The University of Tulsa to travel and present this study at the 2023 ASHA Convention. No other funding was obtained. AphasiaBank is supported by National Institute on Deafness and Other Communication Disorders Grant R01-DC008524 (2022–2027).
