Dissertation/Thesis Abstract

Machine Learning Methods for Predicting Evolution, Mutation Effects, and Optimal Experimental Design
by Wang, Xiaokang, Ph.D., University of California, Davis, 2020, 138; 27999669
Abstract (Summary)

Biological studies are data-intensive by nature, and the past decade has seen a rapid accumulation of many types of biological data. Because biology is complex, it is challenging to select the most relevant features and build mechanism-based models from this flood of data. In this thesis, we applied machine learning to two prediction problems: predicting the kinetic constants of proteins from features generated by Rosetta, and predicting mutations in the genome of Escherichia coli (E. coli) under a given culture condition. Building machine learning models requires high-quality, standardized data around a biological problem, so a mutation database was curated from the literature for mutation prediction. Because research is ongoing by nature, it is common to design new experiments to fill gaps or resolve ambiguity in the data already collected. Given a limited budget, it is imperative to select the most valuable experiments to run. We applied an active learning (optimal experimental design) technique that uses Gaussian process (GP) models to quantify the uncertainty and representativeness of each candidate experiment. The most uncertain and representative candidates were selected, and the corresponding data were collected in a wet lab. On a transcriptomic profiling problem, in which the transcriptomic profile of E. coli was predicted by GP models trained on transcriptomic profiles from other culture conditions, our approach reduced the number of data points needed to reach the same prediction accuracy by 44%. The optimal experimental design framework consists of two modules: a predictive model and a utility score that quantifies the information content of a candidate experiment. The framework can also be applied to other scenarios by replacing the predictive model with one suited to the scenario.
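The two-module structure described in the abstract (a predictive model plus a utility score) can be illustrated with a minimal sketch. The abstract does not state the kernel, the exact utility function, or any implementation details, so everything below is an assumption made for illustration: an RBF kernel, a zero-mean GP, uncertainty measured as posterior variance, representativeness measured as mean kernel similarity to the candidate pool, and a utility equal to their product. The function names (`rbf_kernel`, `gp_posterior`, `utility_scores`, `select_next`) are hypothetical and are not taken from the thesis.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_cand, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at the candidate points."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_tc = rbf_kernel(x_train, x_cand, length_scale)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_tc.T @ alpha
    v = np.linalg.solve(chol, k_tc)
    # The RBF prior variance k(x, x) equals 1, so subtract the explained part.
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


def utility_scores(x_train, y_train, x_cand, length_scale=1.0):
    """Assumed utility: posterior variance times mean similarity to the pool."""
    _, var = gp_posterior(x_train, y_train, x_cand, length_scale=length_scale)
    # Representativeness (assumption): average kernel similarity of each
    # candidate to all candidates, so isolated outliers score lower.
    representativeness = rbf_kernel(x_cand, x_cand, length_scale).mean(axis=1)
    return var * representativeness


def select_next(x_train, y_train, x_cand, length_scale=1.0):
    """Index of the candidate experiment with the highest utility."""
    return int(np.argmax(utility_scores(x_train, y_train, x_cand, length_scale)))


if __name__ == "__main__":
    # Toy one-dimensional "culture conditions"; values are illustrative only.
    x_train = np.array([[0.0], [1.0]])
    y_train = np.array([0.0, 1.0])
    x_cand = np.array([[0.0], [0.5], [5.0], [5.1], [5.2]])
    print("utility:", utility_scores(x_train, y_train, x_cand))
    print("next experiment index:", select_next(x_train, y_train, x_cand))
```

In this toy setup, candidates near already-measured conditions receive low utility because the GP is confident there, while a candidate in the middle of a dense, unexplored cluster scores highest. Swapping the GP for another model that reports predictive uncertainty would leave `utility_scores` and `select_next` structurally unchanged, which mirrors the modularity the abstract describes.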

Indexing (document details)
Advisor: Tagkopoulos, Ilias
Committee: Faccioti, Marc T., Siegel, Justin
School: University of California, Davis
Department: Biomedical Engineering
School Location: United States -- California
Source: DAI-B 82/2(E), Dissertation Abstracts International
Subjects: Biomedical engineering, Bioinformatics, Artificial intelligence
Keywords: Evolution, Machine learning, Omics, Optimal experimental design, Protein design
Publication Number: 27999669
ISBN: 9798664726923
Copyright © 2021 ProQuest LLC. All rights reserved.