Biological studies are data-intensive by nature, and the past decade has seen a rapid accumulation of many types of biological data. Because biology is complex, selecting the most relevant features and building mechanism-based models from this flood of data is challenging. In this thesis, we applied machine learning to two problems: predicting the kinetic constants of proteins with models trained on features generated by Rosetta, and predicting mutations in the genome of Escherichia coli (E. coli) under a given culture condition. Building machine learning models requires high-quality, standardized data for the biological problem at hand; for mutation prediction, we curated a mutation database from the literature. Because research is ongoing, new experiments are commonly designed to fill gaps or resolve ambiguities in the data already collected, and with a limited budget it is imperative to select the most valuable experiments to run. We applied an active learning (optimal experimental design) technique that uses a Gaussian process (GP) to quantify the uncertainty and representativeness of each candidate experiment. The most uncertain and representative candidates were selected, and data for them were collected in a wet lab. On a transcriptomic profiling problem, in which the transcriptomic profile of E. coli in one culture condition was predicted by GP models trained on its profiles in other culture conditions, our approach reduced the number of data points needed to reach the same prediction accuracy by 44%. The optimal experimental design framework consists of two modules: a predictive model and a utility score that quantifies the information content of a candidate experiment. The framework can be applied to other scenarios by replacing the predictive model with one suited to the scenario.
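The abstract describes a two-module framework (a GP predictive model plus a utility score combining uncertainty and representativeness) but does not give the exact score. The sketch below is a hypothetical illustration only, not the thesis's method: it assumes a zero-mean GP with a fixed RBF kernel, takes "uncertainty" as the GP posterior standard deviation, takes "representativeness" as the mean kernel similarity of a candidate to the whole candidate pool, and multiplies the two (a common density-weighted heuristic). All function names and parameters are made up for illustration.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential (RBF) kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior_std(X_train, X_cand, length_scale=1.0, noise=1e-2):
    # Posterior standard deviation of a zero-mean GP at the candidate inputs.
    # The posterior variance depends only on inputs, not on measured outputs.
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand, length_scale)
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) = 1 for the RBF kernel
    return np.sqrt(np.maximum(var, 0.0))


def representativeness(X_cand, length_scale=1.0):
    # Hypothetical score: mean kernel similarity of each candidate to the pool.
    return rbf_kernel(X_cand, X_cand, length_scale).mean(axis=1)


def select_batch(X_train, X_cand, batch_size=3, length_scale=1.0, noise=1e-2):
    # Greedy batch selection with utility = uncertainty * representativeness.
    X_train = np.asarray(X_train, dtype=float)
    X_cand = np.asarray(X_cand, dtype=float)
    rep = representativeness(X_cand, length_scale)
    chosen = []
    for _ in range(batch_size):
        score = gp_posterior_std(X_train, X_cand, length_scale, noise) * rep
        for j in chosen:
            score[j] = -np.inf  # never pick the same experiment twice
        idx = int(np.argmax(score))
        chosen.append(idx)
        # Add the pick to the design as if measured, which shrinks the
        # uncertainty of nearby candidates and discourages redundant picks.
        X_train = np.vstack([X_train, X_cand[idx]])
    return chosen
```

For example, with one measured condition at 0.0 and candidates at 0.1, 2.0, 2.1 and 5.0, `select_batch([[0.0]], [[0.1], [2.0], [2.1], [5.0]], batch_size=2)` returns `[1, 3]`: it first picks the uncertain point in the dense cluster (2.0), then skips its near-duplicate (2.1) in favor of the still-unexplored point at 5.0. Swapping `gp_posterior_std` for another model's uncertainty estimate mirrors the abstract's point that the predictive module is replaceable.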
|Committee:||Facciotti, Marc T., Siegel, Justin|
|School:||University of California, Davis|
|School Location:||United States -- California|
|Source:||DAI-B 82/2(E), Dissertation Abstracts International|
|Subjects:||Biomedical engineering, Bioinformatics, Artificial intelligence|
|Keywords:||Evolution, Machine learning, Omics, Optimal experimental design, Protein design|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved.