Bioinformatics can be divided into two phases, the first phase is conversion of raw data into processed data and the second phase is using processed data to obtain scientific results. It is important to consider the first “workflow” phase carefully, as there are many paths on the way to a final processed dataset. Some workflow paths may be different enough to influence the second phase, thereby, leading to ambiguity in the scientific literature. Workflow evaluation in bioinformatics enables the investigator to carefully plan how to process their data. A system that uses real data to determine the quality of a workflow can be based on the inherent biological relationships in the data itself. To our knowledge, a general software framework that performs real data-driven evaluation of bioinformatics workflows does not exist.
The Evaluation and Utility of workFLOW (EUFLOW) decision-theoretic framework, developed and tested on gene expression data, enables users of bioinformatics workflows to evaluate alternative workflow paths using inherent biological relationships. EUFLOW is implemented as an R package to enable users to evaluate workflow data. EUFLOWis a framework which also permits user-guided utility and loss functions, which enables the type of analysis to be considered in the workflow path decision. This framework was originally developed to address the quality of identifier mapping services between UNIPROT accessions and Affymetrix probesets to facilitate integrated analysis1. An extension to this framework evaluates Affymetrix probeset filtering methods on real data from endometrial cancer and TCGA ovarian serous carcinoma samples.2 Further evaluation of RNASeq workflow paths demonstrates generalizability of the EUFLOW framework. Three separate evaluations are performed including: 1) identifier filtering of features with biological attributes, 2) threshold selection parameter choice for low gene count features, and 3) commonly utilized RNASeq data workflow paths on The Cancer Genome Atlas data.
The EUFLOW decision-theoretic framework developed and tested in my dissertation enables users of bioinformatics workflows to evaluate alternative workflow paths guided by inherent biological relationships and user utility.
|Commitee:||Chandran, Uma, Day, Roger, Hochheiser, Harry, Lu, Xinghua, Weeks, Daniel E.|
|School:||University of Pittsburgh|
|School Location:||United States -- Pennsylvania|
|Source:||DAI-B 79/01(E), Dissertation Abstracts International|
|Subjects:||Molecular biology, Bioinformatics|
|Keywords:||Data-guided, RNASeq, Workflow evaluation|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be