Dissertation/Thesis Abstract

Analysis and optimization for processing grid-scale XML datasets
by Head, Michael Reuben, Ph.D., State University of New York at Binghamton, 2009, 121; 3389637
Abstract (Summary)

In the field of scientific computing, two trends are clear: the size of data sets in use is growing rapidly, and microprocessor performance is improving through increases in parallelism rather than through clock rate increases. Further, Extensible Markup Language (XML) is increasingly being used to encode large data sets, and SOAP is being used to provide Grid services. Neither standard was designed for these uses, and naïve implementations can incur performance penalties. As these trends continue, past assumptions about the value of seeking out parallel algorithms should be revisited.

Lexical analysis has traditionally been seen as an inherently serial process. This work seeks to challenge that viewpoint. We start by tracking the performance of state-of-the-art XML parsers and SOAP toolkits through benchmarks for scientific computing applications. We then examine how the caching mechanisms of current workstation- and server-class computer systems affect parser performance. Finally, we propose Piximal, an NFA-based parser that uses spare processors to reduce XML parse time, and we examine the limits of the Piximal approach to parallel XML parsing.

Indexing (document details)
Advisor: Govindaraju, Madhusudhan
Committee: Chiu, Kenneth; Guzman, Fernando; Lander, Leslie; Lewis, Michael J.
School: State University of New York at Binghamton
Department: Computer Science
School Location: United States -- New York
Source: DAI-B 71/01, Dissertation Abstracts International
Subjects: Computer science
Keywords: Automata, Benchmarks, Multicore, Parallel algorithms, Parsing, XML datasets
Publication Number: 3389637
ISBN: 978-1-109-56418-1
Copyright © 2021 ProQuest LLC. All rights reserved.