Next-generation sequencing (NGS) data can be mapped to a reference genome to identify single-nucleotide polymorphisms/variations (SNPs/SNVs; called SNPs hereafter). In theory, SNPs can be compared across several samples, and the differences can be used to create phylogenetic trees depicting relatedness among the samples. In practice, however, this is difficult because there is currently no stand-alone tool that takes SNP data directly as input and produces phylogenetic trees. In response to this need, the PhyloSNP application was created with two analysis methods: 1) a quantitative method that either creates a presence/absence matrix, which can be used directly to generate phylogenetic trees, or creates a tree from a shrunk genome alignment (which includes additional bases surrounding each SNP position), and 2) a qualitative method that clusters samples based on the frequency of the different bases found at a particular position. The algorithms were used to generate trees from Poliovirus, Burkholderia, and human cancer genomics NGS datasets.
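As a rough illustration of the presence/absence idea described above, the following minimal Python sketch builds a binary matrix from per-sample SNP calls and computes pairwise Hamming distances between samples. It is not PhyloSNP's actual implementation: the function names, the `(position, alternate base)` representation of a SNP, and the toy sample data are all assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a SNP is identified by (reference position, alternate base).
Snp = Tuple[int, str]


def presence_absence_matrix(
    samples: Dict[str, Set[Snp]],
) -> Tuple[List[Snp], Dict[str, List[int]]]:
    """Return the sorted list of all SNP sites and one 0/1 row per sample."""
    # Union of every SNP seen in any sample defines the matrix columns.
    sites = sorted(set().union(*samples.values()))
    matrix = {
        name: [1 if site in snps else 0 for site in sites]
        for name, snps in samples.items()
    }
    return sites, matrix


def hamming(row_a: List[int], row_b: List[int]) -> int:
    """Count the columns at which two presence/absence rows differ."""
    return sum(x != y for x, y in zip(row_a, row_b))


# Hypothetical toy input: three samples with a few SNP calls each.
samples = {
    "sample_A": {(101, "T"), (250, "G")},
    "sample_B": {(101, "T"), (399, "C")},
    "sample_C": {(250, "G"), (399, "C"), (512, "A")},
}

sites, matrix = presence_absence_matrix(samples)
distance_ab = hamming(matrix["sample_A"], matrix["sample_B"])
distance_ac = hamming(matrix["sample_A"], matrix["sample_C"])
```

A matrix of this kind, or the pairwise distances derived from it, could then be handed to a separate tree-building program; how PhyloSNP itself formats and passes on the matrix is not specified here.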
The NCI-60 cell line panel has been extensively researched [1-3]. These cell lines were aligned and profiled from Illumina exome data using the High-performance Integrated Virtual Environment (HIVE), and phylogenetic trees were generated using PhyloSNP and FastTree based on the discovered SNPs. These trees were used to determine whether there were any noticeable relationships between the cancer cell lines. We noticed that several melanoma cell lines (ME:MDA_MB_435, ME:MDA_N, ME:UACC_62, and ME:M14) and breast cancer cell lines (BR:HS578T, BR:T47D, and BR:MCF7) grouped together in both the total-variation and non-dbSNP-variation trees with near 100% bootstrap support (10,000 replicates). Preliminary findings show that the observed phylogenetic clustering of these seven cell lines is at least partially due to some altered motif and the common involvement of the TP53 gene in each of these cancer types. However, more in-depth analysis is required to elucidate these potential findings.
|School:||The George Washington University|
|Department:||School of Medicine and Health Sciences|
|School Location:||United States -- District of Columbia|
|Source:||MAI 53/01M(E), Masters Abstracts International|
|Keywords:||Breast cancer, Cancer genomics, Next-generation sequencing, Phylogenetic, SNP, SNV|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved