Reducing the effects of sequencing errors and PCR artifacts has emerged as an essential component in amplicon-based metagenomic studies. Denoising algorithms have been written that can reduce error rates in mock community data, in which the true sequences are known, but they were designed to be used in studies of real communities. To evaluate the outcome of the denoising process, we developed methods that do not rely on a priori knowledge of the correct sequences, and we applied these methods to a real-world dataset. We found that the denoising algorithms had substantial negative side-effects on the sequence data. For example, in the most widely used denoising pipeline, AmpliconNoise, the algorithm that was designed to remove pyrosequencing errors changed the reads in a manner inconsistent with the known spectrum of these errors, until one of the parameters was increased substantially from its default value.
With these shortcomings in mind, we developed a novel denoising program, FlowClus. FlowClus uses a systematic approach to filter and denoise reads efficiently. When denoising real datasets, FlowClus provides feedback about the process that can be used as the basis to adjust the parameters of the algorithm to suit the particular dataset. FlowClus produced a lower error rate compared to other denoising algorithms when analyzing a mock community dataset, while retaining significantly more sequence information. Among its other attributes, FlowClus can analyze longer reads being generated from current protocols and irregular flow orders. It has processed a full plate (1.5 million reads) in less than four hours; using its more efficient (but less precise) trie analysis option, this time was further reduced, to less than seven minutes.
|School:||University of New Hampshire|
|School Location:||United States -- New Hampshire|
|Source:||DAI-B 75/10(E), Dissertation Abstracts International|
|Keywords:||16S rRNA, Denoising, Metagenomics, Microbiome, Pyrosequencing|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be