Biological, biomedical, and radiological data tend to be large, complex, and noisy. Gene expression studies contain expression levels for thousands of genes and hundreds or thousands of patients. Chest Computed Tomography images used for diagnosing lung cancer consist of hundreds of 2-D image ”slices”, each containing hundreds of thousands of pixels. Beneath the size and apparent complexity of many of these data are simple and sparse structures. These low complexity structures can be leveraged into new approaches to biological, biomedical, and radiological data analyses. Two examples are presented here. First, a new framework SparRec (Sparse Recovery) for imputation of GWAS data, based on a matrix completion (MC) model taking advantage of the low-rank and low number of co-clusters of GWAS matrices. SparRec is flexible enough to impute meta-analyses with multiple cohorts genotyped on different sets of SNPs, even without a reference panel. Compared with Mendel-Impute, another MC method, our low-rank based method achieves similar accuracy and efficiency even with up to 90% missing data; our co-clustering based method has advantages in running time. MC methods are shown to have advantages over statistics-based methods, including Beagle and fastPhase. Second, we demonstrate NoduleX, a method for predicting lung nodule malignancy from chest Computed Tomography (CT) data, based on deep convolutional neural networks. For training and validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort and compare our results with classifications provided by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of up to 0.99, commensurate with the radiologists’ analysis. Whether they are leveraged directly or extracted using mathematical optimization and machine learning techniques, low complexity structures provide researchers with powerful tools for taming complex data.
|Commitee:||Hong, Huixiao, Su, Hung-Chi, Walker, Karl, Yang, Mary|
|School:||University of Arkansas at Little Rock|
|School Location:||United States -- Arkansas|
|Source:||DAI-B 79/10(E), Dissertation Abstracts International|
|Keywords:||Convolutional neural network, Gwas imputation, Low rank, Lung cancer|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
supplemental files is subject to the ProQuest Terms and Conditions of use.