Structural genomics is a field of study that strives to derive and analyze the structural characteristics of proteins through means of experimentation and prediction using software and other automatic processes. Alongside implications for more effective drug design, the main motivation for structural genomics concerns the elucidation of each protein’s function, given that the structure of a protein almost completely governs its function. Historically, the approach to derive the structure of a protein has been through exceedingly expensive, complex, and time consuming methods such as x-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.
In response to the inadequacies of these methods, three families of approaches developed in a relatively new branch of computer science known as bioinformatics. The aforementioned families include threading, homology-modeling, and the de novo approach. However, even these methods fail either due to impracticalities, the inability to produce novel folds, rampant complexity, inherent limitations, etc. In their stead, this work proposes the Fuzzy Greedy K-means Decision Forest model, which utilizes sequence motifs that transcend protein family boundaries to predict local tertiary structure, such that the method is cheap, effective, and can produce semi-novel folds due to its local (rather than global) prediction mechanism. This work further extends the FGK-DF model with a new algorithm, the Hierarchically Clustered-Hidden Markov Models (HC-HMM) method to extract protein primary sequence motifs in a more accurate and adequate manner than currently exhibited by the FGK-DF model, allowing for more accurate and powerful local tertiary structure predictions. Both algorithms are critically examined, their methodology thoroughly explained and tested against a consistent data set, the results thereof discussed at length.
|Commitee:||Hu, Chenyi, Kockara, Sinan, Young, Paul|
|School:||University of Central Arkansas|
|Department:||Computer Science - Applied Computing|
|School Location:||United States -- Arkansas|
|Source:||MAI 52/04M(E), Masters Abstracts International|
|Subjects:||Bioinformatics, Computer science|
|Keywords:||Motif, Prediction, Protein, Sequential, Structure, Tertiary|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be