Digital provenance is metadata that describes the ancestry or history of a digital object. Provenance enhances the value of the data it describes as it provides answers to questions such as: How was this object created? On what other objects does this object depend? How do the ancestries of these two objects differ?
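As an illustration only, the three questions above can be framed as queries over a small ancestry graph. The sketch below is a toy in-memory model, not the PASS data model or its API; the class `ProvenanceGraph`, its methods, and the file names are all hypothetical.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy ancestry graph: each object maps to the process and inputs that produced it."""

    def __init__(self):
        self._inputs = defaultdict(set)   # object -> objects it was directly derived from
        self._process = {}                # object -> name of the process that created it

    def record(self, output, process, inputs):
        """Record that `process` created `output` from `inputs`."""
        self._process[output] = process
        self._inputs[output].update(inputs)

    def created_by(self, obj):
        """How was this object created? Returns None if nothing was recorded."""
        return self._process.get(obj)

    def ancestors(self, obj):
        """On what other objects does this object depend (transitively)?"""
        seen = set()
        stack = list(self._inputs.get(obj, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self._inputs.get(node, ()))
        return seen

    def ancestry_diff(self, a, b):
        """How do the ancestries of two objects differ? Returns (only in a, only in b)."""
        anc_a, anc_b = self.ancestors(a), self.ancestors(b)
        return anc_a - anc_b, anc_b - anc_a


# Hypothetical example: two plots derived from the same cleaned data set.
g = ProvenanceGraph()
g.record("clean.csv", "clean.py", ["raw.csv"])
g.record("plot_a.png", "plot.py", ["clean.csv"])
g.record("plot_b.png", "plot.py", ["clean.csv", "params.json"])

print(g.created_by("plot_a.png"))
print(sorted(g.ancestors("plot_a.png")))
print(g.ancestry_diff("plot_a.png", "plot_b.png"))
```

Here the ancestry difference shows that only `plot_b.png` depends on `params.json`, which is the kind of answer that is lost when provenance is not recorded.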
In many digital systems, provenance collection is either missing entirely, which loses valuable information, or is recorded as an afterthought, which risks inconsistency between data and provenance. This dissertation demonstrates that storage systems are well suited to automatically inferring and managing provenance. Accordingly, we introduce the Provenance-Aware Storage System (PASS), a storage system that automatically collects and maintains the provenance of files. We describe the challenges in building a PASS and present an architecture for collecting provenance in local file systems. The provenance that PASS collects is useful for scientific documentation, debugging, security, search, and information lifecycle management. PASS imposes reasonable overheads, with a maximum observed elapsed-time overhead of 23%.
We then extend PASS to the more semantically rich domains of applications. Ultimately, we provide the disclosed provenance API, an interface that supports and encourages the integration of multiple provenance collection substrates, each operating at a particular abstraction layer. By integrating the provenance collected by PASS, a workflow engine, a web browser, and a runtime Python provenance tracking wrapper, we demonstrate that this cross-layer integration provides powerful new functionality unavailable by other means.
While cross-layer provenance integration demonstrates how the PASS architecture can be extended up the application stack, we also demonstrate the versatility of the architecture by extending it to network-attached stores (NAS) and cloud stores. To demonstrate the functionality of the architecture in a NAS, we augmented the NFS protocol with additional operations. Our augmented NFS protocol imposes reasonable overheads, with a maximum observed elapsed-time overhead of 16.8%. To demonstrate the functionality of the architecture in a cloud, we designed protocols that store provenance with data on the cloud. Our cloud protocols impose minimal overheads, less than 10% in most cases.
|Advisor:||Seltzer, Margo Ilene|
|School Location:||United States -- Massachusetts|
|Source:||DAI-B 72/01, Dissertation Abstracts International|
|Keywords:||Cloud, File systems, Network attached stores, PASS architectures, Provenance|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved