Dissertation/Thesis Abstract

Integration of Cancer-Related Mutations for Pan-Cancer Analysis
by Wu, Tsung-Jung, M.S., The George Washington University, 2014, 45; 1556905
Abstract (Summary)

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq, and other database biocurators have produced a rich repository of information on the functional sites of genes and proteins. This information, together with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. Such workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers against genomic data. BioMuta, an integrated sequence feature database, provides a framework for the automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, and UniProtKB, and through biocuration of information available in publications. In addition, nsSNVs identified through automated analysis of TCGA NGS data are included in the database. Because NGS primary databases hold petabytes of data and sequence information, a High-performance Integrated Virtual Environment (HIVE) platform for storing, analyzing, computing on, and curating NGS data and associated metadata has been developed. Using HIVE, 31,979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. BioMuta currently covers 26 cancer types, with 13,896 variations derived from small-scale studies and 308,986 derived from large-scale studies. Integrating variation data allows the identification of novel or common nsSNVs that can be prioritized for validation studies.
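The integration step described in the abstract (merging nsSNVs from several sources, checking whether they fall on annotated functional sites, and separating variants reported by several sources from those seen in only one) can be sketched as below. This is a minimal illustration under stated assumptions, not BioMuta's or HIVE's actual implementation: the `Variant` record, the functions `integrate` and `hits_functional_site`, the protein identifier `PROT_A`, and the site coordinates are all hypothetical, and treating "seen in one source only" as "novel" is an assumed simplification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A protein-level variant record; all field names are hypothetical."""
    protein: str   # hypothetical protein identifier
    position: int  # 1-based residue position
    ref_aa: str    # reference amino acid
    alt_aa: str    # observed amino acid
    source: str    # source label, e.g. "COSMIC", "ClinVar", "TCGA"


def is_nonsynonymous(v: Variant) -> bool:
    # An nsSNV changes the encoded amino acid.
    return v.ref_aa != v.alt_aa


def hits_functional_site(protein, position, sites):
    """Return labels of annotated sites (inclusive 1-based ranges) covering a residue."""
    return [label for start, end, label in sites.get(protein, [])
            if start <= position <= end]


def integrate(variants, sites):
    """Merge nsSNVs from several sources keyed by (protein, position, ref, alt).

    Synonymous variants are dropped. Each key maps to a tuple of
    (sorted list of reporting sources, labels of overlapping functional sites).
    """
    sources = defaultdict(set)
    for v in variants:
        if is_nonsynonymous(v):
            sources[(v.protein, v.position, v.ref_aa, v.alt_aa)].add(v.source)
    return {
        key: (sorted(srcs), hits_functional_site(key[0], key[1], sites))
        for key, srcs in sources.items()
    }


# Hypothetical toy annotation and variant calls, for illustration only.
sites = {"PROT_A": [(10, 20, "binding site"), (55, 55, "active site")]}
variants = [
    Variant("PROT_A", 15, "R", "H", "COSMIC"),
    Variant("PROT_A", 15, "R", "H", "TCGA"),
    Variant("PROT_A", 55, "D", "N", "TCGA"),
    Variant("PROT_A", 80, "L", "L", "TCGA"),  # synonymous, so it is dropped
]

for key, (srcs, labels) in sorted(integrate(variants, sites).items()):
    # Assumed rule: reported by more than one source -> "common", else "novel".
    status = "common" if len(srcs) > 1 else "novel"
    print(key, srcs, labels, status)
```

Keying on the protein-level change rather than on the source record is what lets the same nsSNV reported by, say, COSMIC and a TCGA-derived call collapse into one entry; variants that both land on a functional site and appear in only one source would then be natural candidates for the validation prioritization the abstract mentions.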

Indexing (document details)
Advisor: Mazumder, Raja
Committee: Han, Zhiyong
School: The George Washington University
Department: Biochemistry and Molecular Biology
School Location: United States -- District of Columbia
Source: MAI 53/01M(E), Masters Abstracts International
Source Type: DISSERTATION
Subjects: Bioinformatics
Keywords: Biomarker, Biomuta, Pan-cancer, SNP, high-performance integrated virtual environment, nsSNV
Publication Number: 1556905
ISBN: 9781303933196
Copyright © 2019 ProQuest LLC. All rights reserved.