Low-resource organizations worldwide work to improve health, education, infrastructure, and economic opportunity in disadvantaged communities. These organizations must collect data to inform service delivery and performance monitoring. In such settings, data collection can be laborious and expensive due to weak physical and digital infrastructure, limited capacity and retention of technical staff, and poor performance incentives. Governments, donors, and non-governmental organizations (NGOs) large and small are demanding more accountability and transparency, resulting in increased data collection workloads. Despite continued emphasis and investment, countless data collection efforts continue to experience delayed and low-quality results. Existing tools and capabilities for data collection have not kept pace with increased reporting requirements.
This dissertation addresses data collection in low-resource settings by algorithmically shepherding human attention at three different scales: (1) by redirecting workers' attention at the moment of entry, (2) by reformulating the data collection instrument in its design and use, and (3) by reorganizing the flow and composition of data entry tasks within and between organizations. These three different granularities of intervention map to the three major parts of this dissertation.
First, the Usher system learns probabilistic models from previous form responses, supplementing the lack of expertise and quality control. The models are a principled foundation for data in forms, and are applied at every step of the data collection process: form design, form filling, and answer verification. Simulated experiments demonstrate that Usher can improve data quality and reduce quality-control effort considerably.
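As a rough illustration of the idea of learning from previous form responses, the sketch below estimates how likely an answer is given another field, then flags unlikely answers for re-verification. It is not Usher's actual model, which the abstract does not specify; it substitutes a simple smoothed conditional frequency table. The field names (`district`, `clinic`), the function names, and the 0.2 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_conditional(responses, target, given, alpha=1.0):
    """Estimate P(target | given) from prior responses with add-alpha smoothing.

    Stand-in for a learned form model; unseen values receive a small
    nonzero probability rather than zero.
    """
    counts = defaultdict(Counter)
    values = set()
    for row in responses:
        counts[row[given]][row[target]] += 1
        values.add(row[target])

    def prob(value, context):
        seen = counts.get(context, Counter())
        total = sum(seen.values()) + alpha * len(values)
        return (seen[value] + alpha) / total if total else 0.0

    return prob


def needs_review(prob, value, context, threshold=0.2):
    """Flag an entered value whose estimated probability falls below threshold."""
    return prob(value, context) < threshold


# Hypothetical prior responses; field names are illustrative only.
responses = [
    {"district": "A", "clinic": "X"},
    {"district": "A", "clinic": "X"},
    {"district": "A", "clinic": "Y"},
    {"district": "B", "clinic": "Z"},
    {"district": "B", "clinic": "Z"},
]
prob = fit_conditional(responses, target="clinic", given="district")
```

Under these assumptions, `needs_review(prob, "Z", "A")` flags clinic `Z` entered for district `A`, because that pairing never appears in the prior responses, while `needs_review(prob, "X", "A")` does not flag the common pairing.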
Next, a number of dynamic user-interface mechanisms, powered by Usher, improve accuracy and efficiency during the act of data entry. Based on a cognitive model, these interface adaptations can be applied as interventions before, during, and after input. In an evaluation with professional data entry clerks in rural Uganda, these adaptations reduced error by up to 78%.
Finally, the Shreddr system transforms paper form images into structured data on demand. Shreddr reformulates data entry workflows with pipeline and batching optimizations at the organizational level. It combines emergent techniques from computer vision, database systems, and machine learning with newly available infrastructure (online workers and mobile connectivity) into a hosted data entry web service. It is a framework for data digitization that can deliver Usher and other optimizations at scale. Shreddr's impact on digitization efficiency and quality is illustrated in a one-million-value case study in Mali.
The main contributions of this dissertation are (1) a probabilistic foundation for data collection, which effectively guides form design, form filling, and value verification; (2) dynamic data entry interface adaptations, which significantly improve data entry accuracy and efficiency; and (3) the design and large-scale evaluation of a hosted-service architecture for data entry.
|Advisor:||Hellerstein, Joseph M., Parikh, Tapan S.|
|School:||University of California, Berkeley|
|Department:||Electrical Engineering & Computer Sciences|
|School Location:||United States -- California|
|Source:||DAI-B 73/07(E), Dissertation Abstracts International|
|Keywords:||Adaptive interfaces, Crowdsourcing, Data collection, Data entry, Data quality, Transcription|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved