Proteins are important biological molecules that perform many different functions in an organism. They are composed of sequences of amino acids that play a large part in determining both their structure and function. In turn, the structures of proteins are related to their functions. Using computational methods for protein study is a popular approach, offering the possibility of being faster and cheaper than experimental methods. These software-based methods are able to take information such as the protein sequence and other empirical data and output predictions such as protein structure or function.
In this work, we developed a set of computational methods for protein structure prediction and protein function prediction. For protein structure prediction, we evolve logic circuits to produce classifiers that predict a protein's contact map from high-dimensional feature data. The diversity of the evolved logic circuits allows for the creation of ensembles of classifiers, and the answers from these ensembles are combined to produce more accurate answers. We also apply a number of ensemble algorithms to our results.
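As a rough illustration of how the answers of several logic circuit classifiers might be combined, the following minimal Python sketch applies a simple majority vote to a residue pair's Boolean features. The three circuits, the feature layout, and the use of majority voting are all assumptions made for illustration; the abstract does not specify the evolved circuits or the exact combination rule.

```python
from typing import Callable, List, Sequence

# A "circuit" here is any function mapping Boolean features to a Boolean
# prediction (True = the residue pair is predicted to be in contact).
Circuit = Callable[[Sequence[bool]], bool]


def circuit_a(f: Sequence[bool]) -> bool:
    # Hypothetical circuit: AND of features 0 and 1.
    return f[0] and f[1]


def circuit_b(f: Sequence[bool]) -> bool:
    # Hypothetical circuit: OR of features 0 and 2.
    return f[0] or f[2]


def circuit_c(f: Sequence[bool]) -> bool:
    # Hypothetical circuit: XOR of features 1 and 2.
    return f[1] != f[2]


def majority_vote(circuits: List[Circuit], features: Sequence[bool]) -> bool:
    """Return True when strictly more than half of the circuits vote True.

    Majority voting is an assumed combination rule, not necessarily the
    one used in the dissertation.
    """
    votes = sum(1 for circuit in circuits if circuit(features))
    return 2 * votes > len(circuits)


ensemble = [circuit_a, circuit_b, circuit_c]
prediction = majority_vote(ensemble, [True, False, True])
```

With an odd number of circuits the vote can never tie, which keeps this toy rule well defined.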
Our protein function prediction work is based on six existing computational protein function prediction methods: four that were optimized for use on a benchmark dataset, and two others developed by collaborators. We used a similar ensemble framework, combining the answers from the six methods using CONS, an ensemble algorithm that we helped develop.
Our contact map prediction study demonstrated that it was possible to evolve logic circuits for this purpose, and that ensembles of the classifiers improved performance. The results fell short of state-of-the-art methods, and additional ensemble algorithms failed to improve the performance. However, the method was also able to work as a feature detector, discovering salient features from the high-dimensional input data, a computationally intractable problem. In our protein function prediction work, the combination of methods similarly led to a robust ensemble. The CONS ensemble, while not performing as well as the best individual classifier in absolute terms, was nevertheless very close in terms of performance. More intriguingly, there were many specific cases where it performed better than any single method, indicating that this ensemble provided valuable information not captured by any single method.
To our knowledge, this is the first time the evolution of logic circuits has been used in any bioinformatics problem, and we expect that results will improve as the method becomes more developed. We also expect that the feature-detection aspect of this method can be used in other studies. The function prediction study also marks, to our knowledge, the most comprehensive ensemble classification for protein function prediction. Finally, we expect that the ensemble classification methods used and developed in our protein structure and function work here will pave the way towards stronger ensemble predictors in the future.
|Advisor:||KC, Dukka B., Bikdash, Marwan|
|Committee:||Bikdash, Marwan, Flurchick, Kenneth M., Harrison, Scott H., KC, Dukka B., Newman, Rob H., Smith, Mary A.|
|School:||North Carolina Agricultural and Technical State University|
|Department:||Computational Science and Engineering|
|School Location:||United States -- North Carolina|
|Source:||DAI-B 78/06(E), Dissertation Abstracts International|
|Subjects:||Biology, Bioinformatics, Artificial intelligence|
|Keywords:||Ensemble methods, Evolutionary computation, Machine learning, Protein function, Protein structure|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved