Subgraph discovery in a single data graph, i.e., finding subsets of vertices and edges that satisfy user-specified criteria, is an essential and general graph analytics operation with a wide spectrum of applications. Depending on the criteria, subgraphs of interest may correspond to cliques of friends in social networks, interconnected entities in RDF data, or frequent patterns in protein interaction networks, to name a few. Existing systems typically examine a large number of subgraphs, rely on many computers, and often produce an enormous set of result subgraphs. How can we enable fast discovery of only the most relevant subgraphs while minimizing the computational requirements?
In this dissertation, we present Nuri, a general subgraph discovery system that allows users to succinctly specify subgraphs of interest and criteria for ranking them. Given such specifications, Nuri efficiently finds the k most relevant subgraphs using only a single computer. It prioritizes (i.e., expands earlier than others) subgraphs that are more likely to expand into the desired subgraphs (prioritized subgraph expansion), and it proactively discards irrelevant subgraphs from which the desired subgraphs cannot be constructed (pruning). Nuri can also efficiently store and retrieve a large number of subgraphs on disk without being limited by the size of main memory. Using both real and synthetic datasets, we demonstrate that Nuri running on a single core outperforms the closest alternative, a distributed system that consumes 40 times more computational resources, by more than 2 orders of magnitude for clique discovery and by 1 order of magnitude for subgraph isomorphism and pattern mining.
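To make the two ideas concrete, the following is a minimal sketch of prioritized expansion plus pruning applied to one task, finding the k largest maximal cliques. It is not Nuri's implementation or API: the function name `top_k_cliques`, the adjacency-dict input, and the use of "current size plus remaining candidates" as both the priority and the pruning bound are illustrative assumptions. Partial cliques are kept in a max-priority queue ordered by how large they could still grow, so the most promising ones are expanded first, and any partial clique whose upper bound cannot beat the current k-th best result is discarded.

```python
import heapq
from itertools import count


def top_k_cliques(adj, k):
    """Hypothetical sketch: return up to k largest maximal cliques, largest first.

    adj: dict mapping each (orderable) vertex to the set of its neighbours in an
    undirected graph without self-loops. Cliques whose size only ties the
    current k-th best are discarded, so ties beyond k are not reported.
    """
    if k <= 0:
        return []
    best = []      # min-heap of (size, clique): the k largest cliques found so far
    tie = count()  # tie-breaker so the frontier heap never compares sets
    # Frontier entries: (-upper_bound, tie, clique, candidates). Negating the bound
    # makes heapq's min-heap act as a max-heap, so the partial clique that could
    # still grow the largest is expanded first (prioritized expansion).
    frontier = []
    for v in adj:
        # Only extend with larger vertices so each clique is generated once.
        cand = frozenset(u for u in adj[v] if u > v)
        heapq.heappush(frontier, (-(1 + len(cand)), next(tie), (v,), cand))

    while frontier:
        neg_bound, _, clique, cand = heapq.heappop(frontier)
        # Pruning: every remaining entry has an upper bound <= this one, so once
        # this bound cannot beat the k-th best clique, nothing left can either.
        if len(best) == k and -neg_bound <= best[0][0]:
            break
        if not cand:
            # Keep only maximal cliques: no outside vertex is adjacent to all members.
            if not set.intersection(*(set(adj[u]) for u in clique)):
                entry = (len(clique), clique)
                if len(best) < k:
                    heapq.heappush(best, entry)
                else:
                    # Reached only if len(clique) > best[0][0] (checked above).
                    heapq.heapreplace(best, entry)
            continue
        for u in sorted(cand):
            new_cand = frozenset(w for w in cand if w > u and w in adj[u])
            bound = len(clique) + 1 + len(new_cand)
            if len(best) == k and bound <= best[0][0]:
                continue  # prune: this extension cannot beat the k-th best clique
            heapq.heappush(frontier, (-bound, next(tie), clique + (u,), new_cand))

    return [c for _, c in sorted(best, reverse=True)]
```

For a triangle {1, 2, 3} with a pendant edge 3-4, `top_k_cliques({1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}, 1)` returns `[(1, 2, 3)]`; after the triangle is found, every remaining partial clique has a bound of at most 2 and is pruned without being expanded. In Nuri, the ranking and pruning criteria are user-specified rather than hardcoded to clique size as they are here.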
|Committee:||Bogdanov, Petko, Chen, Feng|
|School:||State University of New York at Albany|
|School Location:||United States -- New York|
|Source:||DAI-B 80/05(E), Dissertation Abstracts International|
|Keywords:||Prioritization, Pruning, Subgraph discovery|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved