This dissertation presents a new tool for exploratory text analysis that attempts to improve the experience of navigating and exploring text and its metadata. The design of the tool was motivated by the unmet need for text analysis tools in the humanities and social sciences. In these fields, it is common for scholars to have hundreds or thousands of text-based source documents of interest from which they extract evidence for complex arguments about society and culture. These collections are difficult to make sense of and navigate. Unlike numerical data, text cannot be condensed, overviewed, and summarized in an automated fashion without losing significant information. And the metadata that accompanies the documents – often from library records – does not capture the varied content of the text within.
Furthermore, adoption of computational tools remains low among these scholars despite such tools having existed for decades. A recent study found that the main culprits were poor user interfaces and lack of communication between tool builders and tool users. We therefore took an iterative, user-centered approach to the development of the tool. From reports of classroom usage and interviews with scholars, we developed a descriptive model of the text analysis process and extracted design guidelines for text analysis systems. These guidelines recommend showing overviews of both the content and metadata of a collection, allowing users to separate and compare subsets of data according to combinations of searches and metadata filters, allowing users to collect phrases, sentences, and documents into custom groups for analysis, making the usage context of words easy to see without interrupting the current activity, and making it easy to switch between different visualizations of the same data.
WordSeer, the system we implemented, supports highly flexible slicing and dicing, as well as easier transitions than in other tools between visual analyses, drill-downs, lateral explorations, and overviews of slices in a text collection. The tool uses techniques from computational linguistics, information retrieval, and data visualization.
The contributions of this dissertation are the following. First, we contribute the design and source code of WordSeer Version 3, an exploratory text analysis system. Unlike other current systems for this audience, WordSeer 3 supports collecting evidence, isolating and analyzing subsets of a collection, making comparisons based on collected items, and exploring a new idea without interrupting the current task. Second, we give a descriptive model of how humanities and social science scholars undertake exploratory text analysis during the course of their work. We also identify pain points in their current workflows and give suggestions on how systems can address these problems. Third, we describe a set of design principles for text analysis systems aimed at addressing these pain points. For validation, we contribute a set of three real-world examples of scholars using WordSeer 3, which was designed according to those principles. As a measure of success, we show how the scholars were able to conduct analyses yielding otherwise inaccessible results useful to their research.
|Advisor:||Hearst, Marti A., Hartmann, Bjoern|
|Committee:||Klein, Daniel, Wagner, Bryan N.|
|School:||University of California, Berkeley|
|School Location:||United States -- California|
|Source:||DAI-B 75/08(E), Dissertation Abstracts International|
|Subjects:||Educational technology, Information science, Computer science|
|Keywords:||Digital humanities, Exploratory search, Information retrieval, Natural language processing, Sensemaking, User interfaces, Wordseer|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved