Reproducibility, a main principle of the scientific method, has historically depended on text and proofs in a publication. However, as computation pervades science and changes the way how research is conducted, relying only on the experimental results in a publication cannot guarantee reproducibility. The execution environment, in which the results were generated, is another important ingredient and must also be preserved to reproduce the results. Unfortunately, execution environments for scientific work are often fragile and too complex to be well understood by researchers, let alone to be preserved.
This dissertation proposes two broad approaches for improving the reproducibility of scientific applications and explore their feasibility and applicability for both single-machine scientific applications and complex scientific workflows. The first approach wraps the minimal execution environment of an application into an all-in-one package. The second approach specifies the execution environment from hardware, kernel and OS all the way up to software, data and environment variables in an organized way, preserves dependencies in the unit of basic OS image, software and data, and combines all the dependencies at runtime using mounting mechanisms.
For each approach, a prototype was implemented and the following three aspects are explored: what to preserve, how to preserve and how to reproduce. The time and space overheads to preserve and reproduce applications, and the correctness of preserved artifacts are evaluated through applications from high energy physics, bioinformatics, epidemiology and scene rendering. The evaluation results show that both approaches allow researchers to reproduce an application and verify its results. However, the second approach avoids storing shared dependencies repeatedly and makes it easier to extend the original work.
This work makes its contribution by demonstrating the importance of execution environments for the reproducibility of scientific applications and differentiating execution environment specifications, which should be lightweight, persistent and deployable, from various tools used to create execution environments, which may experience frequent changes due to technological evolution. It proposes two preservation approaches and prototypes for the purposes of both result verification and research extension, and provides recommendations on how to build reproducible scientific applications from the start.
|School:||University of Notre Dame|
|Department:||Computer Science and Engineering|
|School Location:||United States -- Indiana|
|Source:||DAI-B 80/06(E), Dissertation Abstracts International|
|Keywords:||Execution environment specifications, Reproducible research, Scientific applications, Scientific workflows, Software preservation, Virtualization techniques|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be