Electronic health records contain the clinical history of patients. The enormous potential for discovery in such a rich dataset is hampered by its complexity. We hypothesize that machine learning models trained on EHR data can predict future clinical events significantly better than current models. We analyze an EHR database of 594,862 echocardiography studies from 272,280 unique patients with both unsupervised and supervised machine learning techniques.
In the unsupervised approach, we first develop a simulation framework to evaluate a family of clustering pipelines. We apply the optimized pipeline to 41,645 patients with heart failure without providing any survival information to the underlying clustering algorithm. The model separates patients into groups with significantly different survival characteristics. For example, in a 10-cluster model, the minimum- and maximum-risk clusters had median survivals of 22 and 53 months, respectively.
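The workflow above can be sketched as follows. This is a minimal illustration, not the dissertation's actual pipeline: the features, the survival times, and the use of plain k-means (Lloyd's algorithm) in place of the optimized clustering approach are all assumptions for demonstration. The key point it shows is that survival information enters only after clustering, to characterize the resulting groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for echo-derived features of heart-failure patients;
# survival_months is synthetic follow-up time, unseen by the clustering step.
n, d, k = 300, 5, 3
X = rng.normal(size=(n, d))
survival_months = rng.exponential(scale=30.0, size=n)

# Minimal k-means as a placeholder for the optimized clustering pipeline.
centers = X[rng.choice(n, size=k, replace=False)]
for _ in range(50):
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    new_centers = np.array(
        [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
         for j in range(k)]
    )
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

# Survival is used only AFTER clustering, to characterize each cluster.
medians = {j: float(np.median(survival_months[labels == j])) for j in range(k)}
```

In the dissertation, a Kaplan-Meier analysis would replace the raw per-cluster medians shown here.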
In the supervised approach, with 723,754 videos available from 27,028 unique patients, we assess the predictive capacity of echocardiography video data for one-year mortality. We also hold out a balanced dataset of 600 patients to compare model performance against cardiologists. We find that the best model among four candidate architectures is a 3D dyadic CNN, with an average AUC of 0.78 for a single parasternal long-axis view. The model yields an accuracy of 75% (AUC of 0.8) on the held-out dataset, while the cardiologists achieve 56% and 61%. The model performance was significantly higher than that of the cardiologists (p = 4.2e-11 and p = 6.9e-7).
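The AUC figures above summarize ranking quality: the probability that a randomly chosen patient who died within one year receives a higher risk score than a randomly chosen survivor. A minimal pairwise implementation of that definition (not the dissertation's evaluation code) is:

```python
import numpy as np

def auc(y_true, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs where the
    positive case's score is higher, counting ties as half a win."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Toy example: two deceased (label 1) and two surviving (label 0) patients.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which is why a binary yes/no reader study (like the cardiologist comparison) is reported in accuracy rather than AUC.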
Finally, we develop a multi-modal supervised approach that enables interpretability. The model provides interpretations through polynomial transformations that describe each feature's individual contribution, and it weights the transformed features to determine their importance. We validate our proposed approach using 31,278 videos from 26,793 patients. We test the approach against logistic regression and against non-linear, non-interpretable models based on Random Forests and XGBoost. Our results show that the proposed neural network architecture consistently outperforms logistic regression while approximating the performance of the other non-linear models. Overall, our multi-modal classifier combining the 3D dyadic CNN and the interpretable neural network outperforms all other classifiers (AUC = 0.83).
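The interpretable architecture described above can be sketched as an additive model: each input feature passes through its own learned polynomial transformation, and a linear layer then weights the transformed features. Both the per-feature response curves and the importance weights are directly readable. The sketch below is a forward pass only, with randomly initialized parameters standing in for learned ones; the exact architecture, degree, and training procedure are assumptions, not the dissertation's specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, degree = 4, 3

# Hypothetical parameters; in the dissertation these are learned from data.
poly_coefs = rng.normal(size=(n_features, degree + 1))  # per-feature polynomial
importance = rng.normal(size=n_features)                # feature-importance weights
bias = 0.0

def forward(x):
    """Additive interpretable model: transformed[i] = g_i(x_i) is a polynomial
    in feature i alone, so each feature's contribution can be plotted; the
    importance vector then weights these contributions into a logit."""
    powers = np.stack([x ** p for p in range(degree + 1)], axis=-1)
    transformed = (poly_coefs * powers).sum(axis=-1)     # g_i(x_i), shape (n_features,)
    logit = importance @ transformed + bias
    return 1.0 / (1.0 + np.exp(-logit)), transformed

prob, contributions = forward(np.array([0.5, -1.0, 2.0, 0.0]))
```

Because every transformation depends on a single feature, the model trades some of the expressive power of Random Forests or XGBoost for per-feature explanations, consistent with the abstract's observation that it approximates but does not exceed those models on tabular inputs alone.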
|Advisor:||Pattichis, Marios S, Fornwalt, Brandon K|
|Committee:||Martinez-Ramon, Manuel, Pattichis, Constantinos|
|School:||The University of New Mexico|
|School Location:||United States -- New Mexico|
|Source:||DAI-B 81/4(E), Dissertation Abstracts International|
|Subjects:||Computer Engineering, Bioinformatics|
|Keywords:||Echocardiography, interpretable, video analysis|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved