Test–retest reliability of discourse measures (Stark et al., 2023)
Purpose: The purpose of this study was to characterize test–retest reliability of discourse measures across a battery of common tasks in individuals with aphasia and prospectively matched adults without brain damage.
Method: We collected spoken discourse during five monologue tasks at two timepoints (test and retest; no more than 2 weeks apart) in an aphasia group (n = 23) and a peer group with no brain damage (n = 24). We evaluated test–retest reliability for percentage of correct information units, correct information units per minute, mean length of utterance, verbs per utterance, noun/verb ratio, open/closed class word ratio, tokens, sample duration (seconds), propositional idea density, type–token ratio, and words per minute. We also explored the relationship of reliability with sample length and aphasia severity.
Results: Rater reliability was excellent. Across tasks, both groups demonstrated discourse measures with poor, moderate, and good reliability, and the aphasia group additionally had measures demonstrating excellent test–retest reliability. When measures were evaluated within each task, test–retest reliability again ranged from poor to excellent for both groups. Across groups and tasks, the most reliable measures tended to reflect lexical, informativeness, or fluency information. Sample length and aphasia severity influenced reliability, with effects that differed across and within tasks.
Conclusions: We identified several discourse measures that were reliable across and within tasks. Test–retest statistics are intimately linked to the specific sample, underscoring the importance of multiple baseline studies. Task itself should be treated as an important variable: discourse measures found to be reliable when averaged across several tasks should not be assumed to be likewise reliable for any single task.
Supplemental Table S1. Discourse reporting standards developed through an expert consensus process conducted as part of a FOQUS Aphasia (www.foqusaphasia.com) initiative.
Supplemental Table S2. Summary of test–retest results across tasks, stratified by sample length and aphasia severity.
Supplemental Table S3. Summary of data by task split by group (test and retest are averaged).
Supplemental Table S4. Summary of test–retest results for the Cat Rescue task (describing a single picture).
Supplemental Table S5. Summary of test–retest results for the Cinderella task (retelling a fictional narrative).
Supplemental Table S6. Summary of test–retest results for the Sandwich task (procedural “how to” narrative).
Supplemental Table S7. Summary of test–retest results for the Broken Window task (describing a picture sequence).
Supplemental Table S8. Summary of test–retest results for the Refused Umbrella task (describing a picture sequence).
Supplemental Table S9. Summary of test–retest results for the Cat Rescue picture description.
Supplemental Table S10. Summary of test–retest results for the Cinderella fictional narrative.
Supplemental Table S11. Summary of test–retest results for the Sandwich procedural narrative.
Supplemental Table S12. Summary of test–retest results for the Broken Window picture sequence.
Supplemental Table S13. Summary of test–retest results for the Refused Umbrella picture sequence.
Supplemental Figure S1. Test–retest reliability for Cat Rescue.
Supplemental Figure S2. Test–retest reliability for Cinderella.
Supplemental Figure S3. Test–retest reliability for Sandwich.
Supplemental Figure S4. Test–retest reliability for Broken Window.
Supplemental Figure S5. Test–retest reliability for Refused Umbrella.
Stark, B. C., Alexander, J. M., Hittson, A., Doub, A., Igleheart, M., Streander, T., & Jewell, E. (2023). Test–retest reliability of microlinguistic information derived from spoken discourse in persons with chronic aphasia. Journal of Speech, Language, and Hearing Research, 66(7), 2316–2345. https://doi.org/10.1044/2023_JSLHR-22-00266