Biology's entrance into the genomic age has brought dramatic changes. Biologists once carried out painstaking, low-throughput experiments, but now often rely on massive high-throughput experimental centers and "big data". In modern biology, the quantity of data scientists can create vastly outstrips their ability to analyze it and understand its full meaning. One of the most pressing current challenges is therefore to create methods that can manage, organize and visualize massive datasets, with the goal of assisting biologists in creating and testing hypotheses.
The computational solution presented in this dissertation is the protein similarity network (PSN), together with its implementation and usage. A PSN is constructed from an all-by-all pairwise comparison of a protein entity or feature, and the resulting network can then be visualized. These networks help show proteins of interest within their context, whether that context is sequence, structure or function, and help in creating hypotheses about how the data of interest relate to the much larger whole.
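As a rough illustration of the all-by-all construction described above, the following minimal Python sketch builds a toy network in which proteins are nodes and an edge joins any pair whose similarity meets a cutoff. It is not Pythoscape code: the function names `similarity` and `build_psn`, the toy sequences, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as a stand-in similarity score are all assumptions made for illustration. A real PSN would more plausibly use scores from a sequence or structure alignment tool.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(seq_a, seq_b):
    """Stand-in similarity score in [0, 1] (assumption: not an alignment score)."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def build_psn(sequences, threshold):
    """Compare every pair of sequences and keep edges at or above the threshold.

    `sequences` maps a protein name to its sequence string. The result maps
    each protein name to the set of names it is connected to.
    """
    network = {name: set() for name in sequences}
    for name_a, name_b in combinations(sequences, 2):
        if similarity(sequences[name_a], sequences[name_b]) >= threshold:
            network[name_a].add(name_b)
            network[name_b].add(name_a)
    return network


# Hypothetical toy sequences: protA and protB differ by one residue,
# protC is unrelated.
toy_sequences = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MKTAYIAKQRQISFVKSHFSRA",
    "protC": "GSHMLEDPVDAFWWWWPPPPCC",
}
psn = build_psn(toy_sequences, threshold=0.8)
```

Raising or lowering the threshold changes how densely the network is connected, which is one reason an all-by-all comparison is computed once and then filtered at several cutoffs when exploring a large superfamily.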
First, Pythoscape is presented, a novel software framework for the creation, modification and output of large PSNs. The architecture of the framework is described, and an example using the glutathione transferase superfamily shows the power of the framework in investigating the sequence and structure relationships of large protein superfamilies.
Second, an application of Pythoscape to the alkaline phosphatase superfamily is presented. PSNs are used to generate evolutionary hypotheses for this large protein superfamily. These networks, in conjunction with phylogenetic trees, are used to propose an evolutionary model that can annotate protein function more accurately and that also demonstrates the complexity of evolution in large, mechanistically diverse enzyme superfamilies.
Finally, an application of Pythoscape to the kinase superfamily is presented. PSNs are used to study how members of this superfamily are targeted by caspases, proteases that are activated during apoptosis. This preliminary research demonstrates that sequence similarity and function do not always track each other, and that other, orthogonal sources of information may be necessary for accurate annotation.
|Advisor:||Babbitt, Patricia C.|
|Committee:||Sali, Andrej; Wells, James A.|
|School:||University of California, San Francisco|
|Department:||Pharmaceutical Sciences and Pharmacogenomics|
|School Location:||United States -- California|
|Source:||DAI-B 74/06(E), Dissertation Abstracts International|
|Keywords:||Alkaline phosphatase superfamily, Apoptosis, Caspase, Computational biology, Protein evolution, Protein origins, Protein similarity networks|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved