posted on 2020-08-13, 22:50, authored by Courtney E. Walters Jr., Rachana Nitin, Katherine Margulis, Olivia Boorom, Daniel E. Gustavson, Catherine T. Bush, Lea K. Davis, Jennifer E. Below, Nancy J. Cox, Stephen M. Camarata, Reyna L. Gordon
Purpose: Data mining algorithms using electronic health records (EHRs) are useful in large-scale population-wide studies to classify etiology and comorbidities (Casey et al., 2016). Here, we apply this approach to developmental language disorder (DLD), a prevalent communication disorder whose risk factors and epidemiology remain largely undiscovered.
Method: We first created a reliable system for manually identifying DLD in EHRs based on speech-language pathologist (SLP) diagnostic expertise. We then developed and validated an automated algorithmic procedure, the Automated Phenotyping Tool for identifying DLD cases in health systems data (APT-DLD), that classifies DLD status for patients within EHRs on the basis of International Statistical Classification of Diseases and Related Health Problems (ICD) codes. APT-DLD was validated in a discovery sample (N = 973) using expert SLP manual phenotype coding as a gold-standard comparison and then applied and further validated in a replication sample of N = 13,652 EHRs.
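To make the idea of classifying DLD status from ICD codes concrete, here is a minimal sketch of a rule-based classifier. It is not the published APT-DLD procedure: the inclusion and exclusion code lists are hypothetical placeholders chosen only for illustration, the use of an exclusion step at all is an assumption, and the function name `classify_patient` and the three status labels are invented for this example.

```python
from typing import Iterable, Set

# Hypothetical placeholder lists, NOT the code lists used by APT-DLD.
# Assumption: language-disorder codes suggest a case, and some other
# diagnoses rule a record out of the DLD cohort.
INCLUSION_CODES: Set[str] = {"F80.1", "F80.2"}
EXCLUSION_CODES: Set[str] = {"F84.0"}


def classify_patient(
    icd_codes: Iterable[str],
    inclusion: Set[str] = INCLUSION_CODES,
    exclusion: Set[str] = EXCLUSION_CODES,
) -> str:
    """Return 'excluded', 'case', or 'no_evidence' for one patient's ICD codes."""
    # Normalize whitespace and letter case so "f80.1 " matches "F80.1".
    codes = {code.strip().upper() for code in icd_codes}
    # Exclusions take priority over inclusions in this sketch.
    if codes & exclusion:
        return "excluded"
    if codes & inclusion:
        return "case"
    return "no_evidence"


if __name__ == "__main__":
    print(classify_patient(["F80.1", "J45.909"]))
    print(classify_patient(["F80.2", "F84.0"]))
    print(classify_patient(["J45.909"]))
```

Checking exclusions first is a design choice of this sketch; a real phenotyping algorithm would take its code lists and rule order from the published specification.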
Results: In the discovery sample, the APT-DLD algorithm classified 98% of DLD cases in concordance with manually coded records in the training set, indicating that APT-DLD successfully mimics a comprehensive chart review. The output of APT-DLD was also validated against independently conducted SLP clinician coding in a subset of records, with a positive predictive value of 95%. We also applied APT-DLD to the replication sample, where it achieved a positive predictive value of 90% in relation to SLP clinician classification of DLD.
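The two validation metrics reported above can be illustrated with a short sketch. Positive predictive value is the share of algorithm-flagged records that the reference coder also judged to be cases, TP / (TP + FP); concordance is computed here as simple proportion agreement, which is an assumption about how the term is used. The function names and the toy labels are hypothetical and are not the study's data.

```python
from typing import Sequence


def positive_predictive_value(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> float:
    """PPV = true positives / (true positives + false positives)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    true_pos = sum(1 for p, r in zip(predicted, reference) if p and r)
    false_pos = sum(1 for p, r in zip(predicted, reference) if p and not r)
    if true_pos + false_pos == 0:
        raise ValueError("no records were flagged as cases")
    return true_pos / (true_pos + false_pos)


def concordance(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Proportion of records on which the two label sets agree."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("no records to compare")
    agreements = sum(1 for p, r in zip(predicted, reference) if p == r)
    return agreements / len(predicted)


if __name__ == "__main__":
    # Toy labels only: True means "classified as a DLD case".
    algorithm = [True, True, True, False, False]
    clinician = [True, True, False, False, True]
    print(f"PPV: {positive_predictive_value(algorithm, clinician):.2f}")
    print(f"Concordance: {concordance(algorithm, clinician):.2f}")
```

In the toy data, two of the three flagged records are confirmed by the reference labels, and the two label sets agree on three of five records.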
Conclusions: APT-DLD is a reliable, valid, and scalable tool for identifying DLD cohorts in EHRs. This new method has promising public health implications for future large-scale epidemiological investigations of DLD and may inform EHR data mining algorithms for other communication disorders.
Supplemental Material S1. Developmental language disorder (DLD) manual chart review rubric.
Supplemental Material S2. Intercoder reliability for research assistant coders and SLP coders for 10% of the discovery sample.
Supplemental Material S3. Determining the DLD phenotype among EHRs retrieved from a broad search for LD symptoms.
Walters, C. E., Jr., Nitin, R., Margulis, K., Boorom, O., Gustavson, D. E., Bush, C. T., Davis, L. K., Below, J. E., Cox, N. J., Camarata, S. M., & Gordon, R. L. (2020). Automated Phenotyping Tool for identifying developmental language disorder cases in health systems data (APT-DLD): A new research algorithm for deployment in large-scale electronic health record systems. Journal of Speech, Language, and Hearing Research. https://doi.org/10.1044/2020_JSLHR-19-00397
Funding
Research reported in this publication was supported by the National Institute on Deafness and Other Communication Disorders and the Office of the Director of the National Institutes of Health (NIH) under Grants R01DC016977, R03DC014802, K18DC017383, R01DC017175, and R03DC015329. The data sets used for the analyses described were obtained from Vanderbilt University Medical Center’s (VUMC’s) BioVU, which is supported by institutional funding and by Grant UL1TR000445 through the Clinical and Translational Science Awards Program from the NIH National Center for Advancing Translational Sciences. Support was also provided by a Vanderbilt Kennedy Center Nicholas Hobbs Discovery grant; the Vanderbilt Genetics Institute; the Vanderbilt Brain Institute; a Chancellor’s Initiative Vanderbilt Trans-Institutional Programs grant; and the Department of Otolaryngology, VUMC.