The Distributed Oceanographic Match-Up Service (DOMS), currently under development, is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. SAMOS is one of several endpoints connected to the DOMS network, providing in situ data for the match-up service. DOMS in situ endpoints currently use Apache Solr as the backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates the creation and maintenance of indexed data, it is limited in that its schema is fixed. The property graph model escapes this limitation by imposing no fixed schema on the data and by permitting explicit relationships between data objects.
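To make the contrast between a fixed schema and the property graph model concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the SAMOS, Solr, or Neo4j implementation: `FIXED_SCHEMA`, `PropertyGraph`, the vessel name, and the observation values are all hypothetical. A fixed-schema store is modeled as a set of fields declared in advance, while graph nodes accept any property map and can be linked by typed relationships.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical fixed schema: the only fields a rigid index would accept.
FIXED_SCHEMA = {"time", "lat", "lon", "sst"}


def fits_fixed_schema(props: dict[str, Any]) -> bool:
    """True only if every field was declared in advance."""
    return set(props) <= FIXED_SCHEMA


@dataclass
class Node:
    """Property-graph node: a set of labels plus an open-ended property map."""
    labels: set[str]
    props: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """Typed, directed edge between two nodes; it may carry properties too."""
    start: Node
    rel_type: str
    end: Node
    props: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: list[Node] = []
        self.rels: list[Relationship] = []

    def add_node(self, labels: set[str], **props: Any) -> Node:
        # No schema check: any property set is accepted.
        node = Node(labels, props)
        self.nodes.append(node)
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **props: Any) -> Relationship:
        rel = Relationship(start, rel_type, end, props)
        self.rels.append(rel)
        return rel

    def neighbors(self, node: Node, rel_type: str) -> list[Node]:
        """Follow outgoing relationships of one type from a node."""
        return [r.end for r in self.rels if r.start is node and r.rel_type == rel_type]


g = PropertyGraph()
vessel = g.add_node({"Vessel"}, name="Example Vessel")  # hypothetical ship
obs1 = g.add_node({"Observation"}, time="2016-07-01T00:00Z", lat=27.5, lon=-82.9, sst=29.1)
# A second observation gains a new field without any schema migration.
obs2 = g.add_node({"Observation"}, time="2016-07-01T00:01Z", lat=27.5, lon=-82.9,
                  sst=29.0, wind_speed=6.2)
g.relate(vessel, "RECORDED", obs1)
g.relate(vessel, "RECORDED", obs2)

print(fits_fixed_schema(obs1.props), fits_fixed_schema(obs2.props))  # True False
print(len(g.neighbors(vessel, "RECORDED")))  # 2
```

The second observation would be rejected by the fixed-schema check but is stored in the graph unchanged, and both observations are reachable from the vessel through a `RECORDED` relationship rather than through a join.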
This paper documents the development of the SAMOS Neo4j property graph database, including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS is also described. Various data models are explored, including one in which spatial-temporal records from SAMOS are attached to a time tree built with GraphAware technology. This extension provides callable Java procedures within the Cypher query language that generate the in-graph structures used in data retrieval.
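The time tree idea can be sketched in plain Python. This is a hypothetical illustration of the data structure only, not the GraphAware API or its Cypher procedures: `TimeNode`, `TimeTree`, `attach`, and the sample records are invented for this example. The get-or-create step reflects the assumption that year, month, and day nodes are built on demand when a record needs them, and each record is attached to the day node matching its timestamp.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimeNode:
    """One level of the tree (root, year, month or day)."""
    value: int
    children: dict[int, TimeNode] = field(default_factory=dict)
    events: list[dict] = field(default_factory=list)  # records attached at day level

    def child(self, value: int) -> TimeNode:
        """Get-or-create a child node, so the tree is built only on demand."""
        if value not in self.children:
            self.children[value] = TimeNode(value)
        return self.children[value]


class TimeTree:
    def __init__(self) -> None:
        self.root = TimeNode(0)

    def day_node(self, ts: datetime) -> TimeNode:
        # Walk (and lazily create) the path root -> year -> month -> day.
        return self.root.child(ts.year).child(ts.month).child(ts.day)

    def attach(self, record: dict) -> None:
        """Attach a record to the day node matching its timestamp."""
        self.day_node(record["time"]).events.append(record)


tree = TimeTree()
for rec in [  # hypothetical SAMOS-like records
    {"time": datetime(2015, 7, 3, 12, tzinfo=timezone.utc), "sst": 28.7},
    {"time": datetime(2015, 7, 3, 13, tzinfo=timezone.utc), "sst": 28.9},
    {"time": datetime(2016, 1, 9, 6, tzinfo=timezone.utc), "sst": 22.4},
]:
    tree.attach(rec)

print(sorted(tree.root.children))  # [2015, 2016]
print(len(tree.root.children[2015].children[7].children[3].events))  # 2
```

In Neo4j the equivalent nodes and relationships live in the graph itself, so the same root-to-day path can later be matched by a query instead of being recomputed from raw timestamps.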
Neo4j excels at relationship- and path-based queries, which challenge relational SQL databases because, by design, such queries require memory-intensive joins. Consider a user who wants to find records spanning several years, but only for specific months. If a traditional database stores only timestamps, this type of query can be complex and is likely to be prohibitively slow. Using the time tree model in a graph, one can specify a path from the root to the data that restricts the search to certain time frames (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, giving Neo4j a computational advantage over the SQL database alternative.
That said, while this advantage may be useful, it should not be interpreted as an advantage over Solr in the context of DOMS. Solr uses Apache Lucene indexing at its core, while Neo4j provides its own native schema indexes. Ultimately, each provides a distinct solution for data retrieval geared toward specific tasks. In the DOMS setting, Solr appears to be the most suitable option, as there seem to be very few use cases in which Neo4j outperforms Solr. This is primarily because DOMS's use case as a subsetting tool does not require the flexibility and path-based queries that graph database tools offer. Rather, DOMS nodes use high-performance indexing structures to quickly filter large amounts of raw data that are not deeply connected; deep interconnection is the property of large data sets for which graph queries become genuinely useful.
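The month-restricted query across several years, described above for the time tree model, can be illustrated with a small Python comparison. This sketch uses nested dictionaries as a stand-in for the year and month nodes and dictionary lookups as a stand-in for relationship traversal; the function names and sample records are hypothetical, and the code says nothing about actual Neo4j, SQL, or Solr performance. It only shows the structural difference: the timestamp-only version inspects every row, while the tree version follows only the matching root-to-month paths.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (timestamp, sea-surface temperature).
records = [
    (datetime(2014, 6, 15), 27.1),
    (datetime(2014, 7, 2), 28.0),
    (datetime(2015, 7, 20), 28.4),
    (datetime(2015, 12, 1), 23.2),
    (datetime(2016, 7, 9), 28.9),
    (datetime(2016, 8, 30), 29.3),
]

# Time tree as nested dicts: year -> month -> list of records.
tree: dict[int, dict[int, list]] = defaultdict(lambda: defaultdict(list))
for ts, sst in records:
    tree[ts.year][ts.month].append((ts, sst))


def by_months_scan(rows, years, months):
    """Timestamp-only storage: every row must be inspected and decomposed."""
    return [r for r in rows if r[0].year in years and r[0].month in months]


def by_months_tree(tree, years, months):
    """Time-tree storage: follow only root -> year -> month paths that match."""
    out = []
    for y in sorted(years):
        year_node = tree.get(y)  # .get avoids creating empty branches
        if year_node is None:
            continue
        for m in sorted(months):
            out.extend(year_node.get(m, []))
    return out


years = {2014, 2015, 2016}
july = by_months_tree(tree, years, {7})
print([ts.year for ts, _ in july])  # [2014, 2015, 2016]
print(july == by_months_scan(records, years, {7}))  # True
```

Both functions return the same July records; the difference is that the tree version never touches records from other months, which is the property the path-based Cypher query relies on.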
Advisor: Smith, Shawn R.; Zhao, Peixiang
Committee: Haiduc, Sonia; Nistor, Adrian
School: The Florida State University
School Location: United States -- Florida
Source: MAI 57/01M(E), Masters Abstracts International
Keywords: CYPHER, Database, Graph, Neo4j, SAMOS, Solr
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved