Dissertation/Thesis Abstract

Structure Prediction and Variant Interpretation of Membrane Proteins Aided by Machine Learning Algorithms
by Li, Bian, Ph.D., Vanderbilt University, 2018, 192; 13917125
Abstract (Summary)

Protein folding is a process of molecular self-assembly during which a disordered polypeptide chain collapses to form a compact and well-defined three-dimensional (3D) tertiary structure. A grand challenge in biochemistry has been to understand the process by which proteins fold into their functional tertiary structure (folding mechanism) and to predict this tertiary structure from amino acid sequence (structure prediction), two tasks that are collectively known as “the protein folding problem”. Solving this problem is of far-reaching impact as it will not only reveal the missing link between sequence and structure but also provide molecular biologists with a theoretical framework and practical tools for applications such as drug design and protein engineering. Chapter I of this dissertation gives a comprehensive review of the computational techniques developed in the past half century or so for studying the protein folding problem.

Helical membrane proteins (HMPs) play essential roles in various biological processes, including signal transduction, transport of ions and molecules across the membrane, and energy generation. HMPs are estimated to be encoded by about 20% to 30% of the genes in the human genome. Frequently, these transmembrane proteins do not function as monomers but undergo concerted interactions to form homo-oligomers, or interact with other transmembrane proteins to form hetero-oligomers. Despite their prevalence in the genome, HMPs account for only a very small portion of the structures in the Protein Data Bank because of the experimental difficulties in determining structures of HMPs and their complexes. Therefore, accurate and efficient computational methods would be valuable tools to complement existing experimental techniques. Chapters II, III, and IV describe a novel computational approach developed in this work for improving the tertiary structure prediction of HMPs and the quaternary structure prediction of HMP complexes.

In chapter II, the concept of the residue weighted contact number (WCN) is introduced and a method is developed, using state-of-the-art machine learning techniques, for predicting WCNs from amino acid sequence alone. The WCN of an amino acid residue is defined as the number of neighboring residues weighted by their proximity to the focal residue. It measures the local packing degree of residues within the protein tertiary structure. In helical membrane proteins, every transmembrane helix has a characteristic profile of WCNs, and this profile is strongly coupled with native contacts between helices. This implies that WCNs can be incorporated as restraints in the prediction of helix-helix packing. In chapter III, it is demonstrated that residues’ WCNs predicted by the method developed in chapter II are effective restraints for improving the fraction of native contacts in predicted tertiary structure models of HMPs. Chapter IV concerns the characterization of interfaces between HMPs and the prediction of quaternary structures of HMP complexes via protein-protein docking. First, the physicochemical characteristics and evolutionary conservation of interface residues are compared with those of residues on the rest of the surface. A machine learning-based method is then developed for predicting the WCNs of interface and surface residues. Finally, it is shown that predicted interface residues and their WCNs can be used to derive a powerful score for selecting native-like docking candidates of HMP complexes.
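As an illustration of the WCN definition above, the following minimal sketch computes a WCN per residue from 3D coordinates. The abstract does not state the weighting function, so this sketch assumes inverse-square-distance weighting, a form commonly used in the literature; it is not necessarily the weighting used in the dissertation. The function name `weighted_contact_numbers` and the example coordinates are hypothetical.

```python
def weighted_contact_numbers(coords):
    """Return one WCN per residue.

    ASSUMPTION: each neighbor j contributes 1 / d_ij**2 to residue i,
    where d_ij is the distance between the two residues. The abstract
    only says neighbors are "weighted by their proximity", so this
    particular weight is an illustrative choice.
    Coordinates must be distinct; coincident points would divide by zero.
    """
    wcns = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i == j:
                continue  # a residue is not its own neighbor
            # Squared Euclidean distance between residues i and j.
            d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
            total += 1.0 / d2
        wcns.append(total)
    return wcns


# Hypothetical coordinates (in angstroms): four residues on a straight line.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
wcns = weighted_contact_numbers(coords)
```

In this toy example the two middle residues receive a higher WCN than the two end residues, which matches the interpretation of WCN as a measure of local packing: residues with more close neighbors score higher.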

Proteins mutate in response to changes in the environment or errors in gene replication. Many diseases are caused by dysfunctional variants of HMPs. Mapping the relationship between variants and their functional impact is an essential step toward precision medicine. Ideally, except for certain well-established disease-causing cases, variants should be evaluated by physiologically relevant experimental functional assays, but experimental characterization remains labor-intensive and costly to scale. Variant interpretation is bound to present an increasingly daunting challenge in the era of next-generation sequencing. Under such constraints, computational methods, which are usually machine learning-based, represent a common predictive approach.

Dysfunctional variants of the KCNQ1 potassium channel are associated with congenital long QT syndrome. Chapter V describes a machine learning-based, protein-specific method developed in this work that is capable of accurately classifying the functional impact of nonsynonymous variants of KCNQ1. This method was trained on a manually curated, functionally validated dataset to classify molecular functional impact. It showed superior performance when compared with eight previous methods tested in parallel.

Chapter VI concludes with a summary of the key contributions this work has made to the relevant fields and a discussion of a few major limitations that need to be addressed in future work. It also points out some questions that are of significant interest for future work.

Indexing (document details)
Advisor: Meiler, Jens
Committee: Lybrand, Terry, McCabe, Clare, Meiler, Jens, Nakagawa, Terunaga
School: Vanderbilt University
Department: Chemistry
School Location: United States -- Tennessee
Source: DAI-B 80/11(E), Dissertation Abstracts International
Subjects: Physical chemistry, Bioinformatics, Biophysics
Keywords: Machine learning, Membrane protein docking, Neural networks, Protein folding, Protein structure prediction, Variant of KCNQ1
Publication Number: 13917125
ISBN: 9781392289211
Copyright © 2019 ProQuest LLC. All rights reserved.