Low-latency, high-throughput systems for serving interactive queries are crucial to today's web services. Building such systems for today's web services is challenging due to the massive volumes of data they must cater to, and the requirement for supporting sophisticated queries (e.g., searches, filters, aggregations, regular expression matches, graph queries, etc.). Several recent approaches have highlighted the importance of in-memory storage for meeting the low-latency and high-throughput requirements, but these approaches are unable to sustain this performance when the data grows larger than DRAM capacity. Existing systems thus achieve these goals either by assuming large enough DRAM (too expensive) or by supporting only a limited set of queries (e.g., key-value stores).
In this dissertation, we explore algorithmic and data structure-driven solutions to these system design problems. We present Succinct, a distributed data store that addresses these challenges using a fundamentally new approach --- executing a wide range of queries (e.g., search, random access, range, wildcard) directly on a compressed representation of the input data --- thereby enabling efficient execution of queries on data sizes much larger than DRAM capacity. We then describe BlowFish, a system that builds on Succinct to enable a dynamic storage-performance tradeoff in data stores, providing applications the flexibility to modify the storage and performance fractionally, just enough to meet the desired goals. Finally, we explore approaches that enable even richer query semantics on compressed data, including graph queries using ZipG, a memory efficient graph store, and regular expression queries using Sprint, a query rewriting technique.
|Commitee:||Hellerstein, Joseph M, Hearst, Marti A|
|School:||University of California, Berkeley|
|School Location:||United States -- California|
|Source:||DAI-B 81/9(E), Dissertation Abstracts International|
|Keywords:||Big data, Compression, Data store, Distributed systems|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be