Dissertation/Thesis Abstract

Protein structure analysis and prediction utilizing the Fuzzy Greedy K-means Decision Forest model and Hierarchically-Clustered Hidden Markov Models method
by Hudson, Cody Landon, M.S., University of Central Arkansas, 2013, 97; 1549796
Abstract (Summary)

Structural genomics is a field of study that strives to derive and analyze the structural characteristics of proteins through experimentation and through prediction using software and other automated processes. Alongside its implications for more effective drug design, the main motivation for structural genomics is the elucidation of each protein’s function, since a protein’s structure almost completely governs its function. Historically, protein structures have been derived through exceedingly expensive, complex, and time-consuming methods such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.

In response to the inadequacies of these methods, three families of approaches emerged in bioinformatics, a relatively new branch of computer science: threading, homology modeling, and the de novo approach. However, even these methods fall short, whether through impracticality, an inability to produce novel folds, high complexity, or other inherent limitations.

As an alternative, this work proposes the Fuzzy Greedy K-means Decision Forest (FGK-DF) model, which uses sequence motifs that transcend protein family boundaries to predict local tertiary structure. Because its prediction mechanism is local rather than global, the method is inexpensive and effective and can produce semi-novel folds. This work further extends the FGK-DF model with a new algorithm, the Hierarchically Clustered Hidden Markov Models (HC-HMM) method, which extracts protein primary sequence motifs more accurately than the FGK-DF model currently does, allowing for more accurate and powerful local tertiary structure predictions. Both algorithms are critically examined: their methodology is explained in detail, both are tested against the same data set, and the results are discussed at length.
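The abstract does not spell out the FGK-DF procedure. As a rough, hypothetical illustration of the K-means clustering idea the model's name refers to, the sketch below groups fixed-length amino-acid windows with plain K-means (Lloyd's algorithm). The window width, the one-hot encoding, and the helper names (`windows`, `one_hot`, `kmeans`, `nearest_cluster`) are assumptions made for illustration only. The sketch deliberately omits the fuzzy and greedy components and the decision forest, so it should not be read as the thesis's method.

```python
import random

# Assumption: the 20 standard amino acids, one-hot encoded per position.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def windows(sequence, width):
    """Slide a fixed-width window along a primary sequence (hypothetical helper)."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def one_hot(segment):
    """Encode a segment as a flat 0/1 vector, 20 entries per residue."""
    vec = []
    for residue in segment:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means (not the fuzzy greedy variant). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels


def nearest_cluster(centroids, segment):
    """Assign a new segment to the index of its closest centroid."""
    vec = one_hot(segment)
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


if __name__ == "__main__":
    segments = ["AAAAA", "AAAAC", "WWWWW", "WWWWY"]
    centroids, labels = kmeans([one_hot(s) for s in segments], k=2)
    print(labels, nearest_cluster(centroids, "AAAAD"))
```

Each centroid can be read loosely as a candidate sequence "motif" summarizing similar windows, and a new window is mapped to its closest motif. A local structure prediction would then draw on whatever structural information is attached to that cluster, a step this sketch leaves out.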

Indexing (document details)
Advisor: Chen, Bernard
Committee: Hu, Chenyi; Kockara, Sinan; Young, Paul
School: University of Central Arkansas
Department: Computer Science - Applied Computing
School Location: United States -- Arkansas
Source: MAI 52/04M(E), Masters Abstracts International
Subjects: Bioinformatics, Computer science
Keywords: Motif, Prediction, Protein, Sequential, Structure, Tertiary
Publication Number: 1549796
ISBN: 978-1-303-62768-2
Copyright © 2020 ProQuest LLC. All rights reserved.