Dissertation/Thesis Abstract

Methods Enabling Portability of Scientific Workflows
by Hazekamp, Nicholas, Ph.D., University of Notre Dame, 2019, 184; 28187462
Abstract (Summary)

Scientific workflows are common and powerful tools for elevating small-scale analyses to large-scale distributed computation. They are easy for domain scientists to use because they support applications as they are, partitioning the data for concurrency rather than modifying the application. However, many of these workflows are written in a way that couples the scientific intent with the specifics of the execution environment. This coupling limits the flexibility and portability of the workflow, requiring it to be re-engineered for each new dataset or site.

I propose that workflows can be written purely for scientific intent, with the idiosyncrasies of execution resolved at runtime using workflow abstractions. These abstractions would allow workflows to be quickly transformed to handle new datasets, diverse sites, and different configurations. I examine three methods for developing workflow abstractions on static workflows, apply these methods to a dynamic workflow, and propose an approach that separates the user from the distributed environment.
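
As a minimal sketch of the idea of keeping scientific intent separate from execution details, the Python below writes one task template with no dataset or site information, expands it over a list of dataset items, and then applies site-specific transformations that wrap every command. The Task class, the "{x}" placeholder convention, and the functions expand, wrap and compose are hypothetical illustrations, not the interfaces used in the dissertation; "./site_wrapper.sh" and the OMP_NUM_THREADS setting are made-up site details.

```python
from __future__ import annotations
from dataclasses import dataclass, replace
from typing import Callable

@dataclass(frozen=True)
class Task:
    command: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]

# A transformation maps a list of tasks to a new list of tasks.
Transform = Callable[[list[Task]], list[Task]]

def expand(template: Task, items: list[str]) -> list[Task]:
    """Instantiate the template once per dataset item, filling the hypothetical '{x}' placeholder."""
    return [
        Task(template.command.format(x=item),
             tuple(f.format(x=item) for f in template.inputs),
             tuple(f.format(x=item) for f in template.outputs))
        for item in items
    ]

def wrap(prefix: str, extra_inputs: tuple[str, ...] = ()) -> Transform:
    """Site-specific transformation: prefix every command and ship any extra files it needs."""
    def transform(tasks: list[Task]) -> list[Task]:
        return [replace(t, command=f"{prefix} {t.command}", inputs=t.inputs + extra_inputs)
                for t in tasks]
    return transform

def compose(*transforms: Transform) -> Transform:
    """Apply transformations left to right; the original task list is never modified."""
    def apply(tasks: list[Task]) -> list[Task]:
        for tr in transforms:
            tasks = tr(tasks)
        return tasks
    return apply

# Scientific intent only: no dataset names, no site details.
template = Task("analyze {x} > {x}.out", ("{x}",), ("{x}.out",))
workflow = expand(template, ["a.dat", "b.dat"])

# Hypothetical site configuration, applied separately from the workflow.
site = compose(wrap("env OMP_NUM_THREADS=4"), wrap("./site_wrapper.sh", ("site_wrapper.sh",)))
for t in site(workflow):
    print(t.command)
# ./site_wrapper.sh env OMP_NUM_THREADS=4 analyze a.dat > a.dat.out
# ./site_wrapper.sh env OMP_NUM_THREADS=4 analyze b.dat > b.dat.out
```

Because the site configuration is an ordinary value, the same workflow could be paired with a different compose(...) for another site without editing the template.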

In developing these methods for static workflows, I first explore Dynamic Workflow Expansion, which allows workflows to be quickly adapted to new and diverse datasets. I then describe an algorithm for statically determining a workflow's storage needs, which is used at runtime to prevent storage deadlocks. Finally, I develop an algebra for transforming workflows, which isolates site- and configuration-specific designs so that they can be applied to workflows as needed. These methods were combined and applied to a dynamic workflow, adapting a site-bound MPI application into a dynamic cloud workflow.
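
The abstract does not give the storage algorithm itself. As a rough illustration of what statically determining a workflow's storage needs can mean, the sketch below simulates one sequential topological order of a task graph and records the peak number of bytes held by live files, deleting a file once its last consumer has run. The task and file representation and the name peak_storage are assumptions; the result is the peak for that one order only, not a minimum over all schedules, and this is not presented as the dissertation's algorithm.

```python
from graphlib import TopologicalSorter

def peak_storage(tasks, sizes):
    """Peak bytes of live files when tasks run one at a time in a topological order.

    tasks: task name -> (input files, output files); sizes: file name -> bytes.
    A file is deleted as soon as its last consuming task has finished.
    """
    producers = {f: name for name, (_, outs) in tasks.items() for f in outs}
    consumers = {}
    for ins, _ in tasks.values():
        for f in ins:
            consumers[f] = consumers.get(f, 0) + 1

    # Each task depends on the tasks that produce its inputs.
    graph = {name: {producers[f] for f in ins if f in producers}
             for name, (ins, _) in tasks.items()}

    # Initial inputs (not produced by any task) are on disk from the start.
    live = {f for f in consumers if f not in producers}
    current = sum(sizes[f] for f in live)
    peak = current

    for name in TopologicalSorter(graph).static_order():
        ins, outs = tasks[name]
        # While a task runs, its inputs and outputs coexist on disk.
        for f in outs:
            if f not in live:
                live.add(f)
                current += sizes[f]
        peak = max(peak, current)
        # Release inputs whose last consumer has now finished.
        for f in ins:
            consumers[f] -= 1
            if consumers[f] == 0 and f in live:
                live.remove(f)
                current -= sizes[f]
    return peak

# Hypothetical split / process / merge workflow with made-up file sizes.
tasks = {
    "split":  (["data"], ["a", "b"]),
    "proc_a": (["a"], ["ra"]),
    "proc_b": (["b"], ["rb"]),
    "merge":  (["ra", "rb"], ["out"]),
}
sizes = {"data": 100, "a": 50, "b": 50, "ra": 10, "rb": 10, "out": 5}
print(peak_storage(tasks, sizes))  # → 200
```

A runtime scheduler could then admit a workflow, or a sub-workflow, only when free disk space is at least this figure, which is one way a static bound can keep running tasks from exhausting shared storage.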

I then combine these methods and formulate the Continuously Divisible Jobs abstraction to separate the domain scientist's application from the distributed logic of a dynamic workflow. This abstraction defines an API that applications can implement to enable dynamic distributed computation, showcasing the flexibility and portability provided by workflow abstractions.
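
The abstract does not specify the Continuously Divisible Jobs API. The following is a hypothetical sketch of the general idea only: the application implements a small interface (here size, split, run and merge, all assumed names), and a generic driver decides how finely to divide the work from the number of available workers, so the application code contains no distributed logic. A local thread pool stands in for remote workers, and the driver divides only once.

```python
from __future__ import annotations
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class DivisibleJob(ABC):
    """Hypothetical interface an application implements; it holds no distributed logic."""

    @abstractmethod
    def size(self) -> int:
        """Amount of remaining work, in application-defined units."""

    @abstractmethod
    def split(self, parts: int) -> list[DivisibleJob]:
        """Divide the remaining work into at most `parts` independent jobs."""

    @abstractmethod
    def run(self):
        """Do the work locally and return a partial result."""

    @abstractmethod
    def merge(self, results: list):
        """Combine partial results from split jobs into one result."""

class RangeSum(DivisibleJob):
    """Toy application: sum the integers in [lo, hi)."""

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi

    def size(self) -> int:
        return self.hi - self.lo

    def split(self, parts: int) -> list[DivisibleJob]:
        step = max(1, -(-self.size() // parts))  # ceiling division
        return [RangeSum(a, min(a + step, self.hi)) for a in range(self.lo, self.hi, step)]

    def run(self) -> int:
        return sum(range(self.lo, self.hi))

    def merge(self, results: list) -> int:
        return sum(results)

def execute(job: DivisibleJob, workers: int):
    """Generic driver: divide the job to match the available workers, run the pieces, merge."""
    if workers <= 1 or job.size() <= 1:
        return job.run()
    pieces = job.split(workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda piece: piece.run(), pieces))
    return job.merge(results)

print(execute(RangeSum(0, 100), workers=4))  # → 4950
print(execute(RangeSum(0, 100), workers=1))  # → 4950
```

Because execute only calls the interface methods, the same RangeSum code runs unchanged with one worker or many; a fuller driver might re-split pieces as workers join or leave.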

Indexing (document details)
Advisor: Thain, Douglas
School: University of Notre Dame
School Location: United States -- Indiana
Source: DAI-B 82/3(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Scientific work flows, Static workflow systems, Distributed computation
Publication Number: 28187462
ISBN: 9798664788679
Copyright © 2020 ProQuest LLC. All rights reserved.