Dissertation/Thesis Abstract

Improving the Reproducibility of Scientific Applications with Execution Environment Specifications
by Meng, Haiyan, Ph.D., University of Notre Dame, 2017, 127; 13836455
Abstract (Summary)

Reproducibility, a main principle of the scientific method, has historically depended on text and proofs in a publication. However, as computation pervades science and changes how research is conducted, relying only on the experimental results in a publication cannot guarantee reproducibility. The execution environment in which the results were generated is another important ingredient and must also be preserved to reproduce the results. Unfortunately, execution environments for scientific work are often fragile and too complex to be well understood by researchers, let alone preserved.

This dissertation proposes two broad approaches for improving the reproducibility of scientific applications and explores their feasibility and applicability for both single-machine scientific applications and complex scientific workflows. The first approach wraps the minimal execution environment of an application into an all-in-one package. The second approach specifies the execution environment, from hardware, kernel and OS all the way up to software, data and environment variables, in an organized way; preserves dependencies in units of basic OS image, software and data; and combines all the dependencies at runtime using mounting mechanisms.
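
To make the second approach more concrete, the following is a minimal sketch of what an execution environment specification could look like. It is an illustration only, not the dissertation's prototype or its specification format: the class names, field names, archive paths and mount points are all hypothetical. It records each layer named above, derives the list of mounts that would combine the dependencies at runtime, and shows how two specifications can refer to the same preserved units.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Dependency:
    """One separately preserved unit: an OS image, a software package or a dataset."""
    name: str        # hypothetical identifier of the preserved unit
    source: str      # hypothetical location of the unit in an archive
    mountpoint: str  # where the unit should appear at runtime


@dataclass
class EnvironmentSpec:
    """Hypothetical specification covering hardware up to environment variables."""
    hardware: str
    kernel: str
    os_image: Dependency
    software: List[Dependency] = field(default_factory=list)
    data: List[Dependency] = field(default_factory=list)
    environ: Dict[str, str] = field(default_factory=dict)

    def dependencies(self) -> List[Dependency]:
        # The OS image comes first so that it can serve as the root.
        return [self.os_image, *self.software, *self.data]

    def mount_plan(self) -> List[Tuple[str, str]]:
        # (source, mountpoint) pairs to be combined at runtime by some
        # mounting mechanism; the mechanism itself is out of scope here.
        return [(d.source, d.mountpoint) for d in self.dependencies()]


def shared_dependencies(a: EnvironmentSpec, b: EnvironmentSpec) -> List[str]:
    """Names of units that both specifications reference and can store once."""
    names_a = {d.name for d in a.dependencies()}
    names_b = {d.name for d in b.dependencies()}
    return sorted(names_a & names_b)


# Hypothetical example: two applications that share an OS image and a toolkit.
example_os = Dependency("example-os", "archive/os/example-os.tar.gz", "/")
toolkit = Dependency("example-toolkit", "archive/sw/toolkit.tar.gz", "/opt/toolkit")

spec_a = EnvironmentSpec(
    hardware="x86_64", kernel="linux", os_image=example_os,
    software=[toolkit],
    data=[Dependency("dataset-a", "archive/data/a.tar.gz", "/data")],
    environ={"PATH": "/opt/toolkit/bin:/usr/bin"},
)
spec_b = EnvironmentSpec(
    hardware="x86_64", kernel="linux", os_image=example_os,
    software=[toolkit],
    data=[Dependency("dataset-b", "archive/data/b.tar.gz", "/data")],
)

for source, mountpoint in spec_a.mount_plan():
    print(f"mount {source} at {mountpoint}")
print("shared:", shared_dependencies(spec_a, spec_b))
```

Because each unit is referenced by name rather than copied into the specification, a shared OS image or software package needs to be stored only once, and a new application can reuse an existing specification by swapping in different data.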

For each approach, a prototype was implemented and three aspects were explored: what to preserve, how to preserve it and how to reproduce it. The time and space overheads of preserving and reproducing applications, and the correctness of the preserved artifacts, were evaluated through applications from high energy physics, bioinformatics, epidemiology and scene rendering. The evaluation results show that both approaches allow researchers to reproduce an application and verify its results. However, the second approach avoids storing shared dependencies repeatedly and makes it easier to extend the original work.

This work contributes by demonstrating the importance of execution environments for the reproducibility of scientific applications, and by differentiating execution environment specifications, which should be lightweight, persistent and deployable, from the various tools used to create execution environments, which may change frequently as technology evolves. It proposes two preservation approaches and prototypes for the purposes of both result verification and research extension, and provides recommendations on how to build reproducible scientific applications from the start.

Indexing (document details)
School: University of Notre Dame
Department: Computer Science and Engineering
School Location: United States -- Indiana
Source: DAI-B 80/06(E), Dissertation Abstracts International
Subjects: Computer science
Keywords: Execution environment specifications, Reproducible research, Scientific applications, Scientific workflows, Software preservation, Virtualization techniques
Publication Number: 13836455
ISBN: 978-0-438-83641-9