Dissertation/Thesis Abstract

Predicting Substantiation of Office of Inspector General Investigations Using Multinomial Naïve Bayes and Natural Language Processing
by Starr, Alexis V., D.Engr., The George Washington University, 2021, 119; 28256297
Abstract (Summary)

Low substantiation rates are pervasive across the federal Office of Inspector General (OIG) community due to high levels of uncertainty and limited data availability at the time of case selection. OIG management often selects cases based on intuition and past experience. Intuitive project selection has proven unsuccessful because the methods are often subjective and prone to bias, and they lead to error. The high uncertainty surrounding case selection and the current selection method employed by OIG management teams result in a significant loss of investigative resources spent on unsubstantiated cases. This research presents a novel approach to predicting OIG investigative case substantiation, using natural language processing techniques and multinomial naïve Bayes to retrieve information from complaint intakes. It aims to improve OIG substantiation rates and reduce the cost associated with unsubstantiated cases. The model developed in this study significantly outperformed OIG management and was 20% more accurate in predicting substantiated and unsubstantiated cases. This model will augment investigative case selection, improve investigative targeting, increase the impact of investigative work, and improve OIG investigative resource allocation. Its application will result in significant savings by reducing the resources dedicated to cases with a low probability of substantiation.
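
For readers unfamiliar with the technique named in the abstract, the following is a minimal, generic sketch of multinomial naïve Bayes text classification with Laplace smoothing. It is not the dissertation's model: the abstract does not describe the preprocessing, features, or data used, so the tokenizer, the class name `ToyMultinomialNB`, the smoothing value, and the four example complaint texts are all invented here for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; real intake text
    # would likely need more careful preprocessing.
    return re.findall(r"[a-z]+", text.lower())


class ToyMultinomialNB:
    """Hypothetical minimal multinomial naive Bayes over word counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # log prior + sum of smoothed log likelihoods per token
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log(
                    (count + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy complaint intakes, for illustration only.
train_docs = [
    "contractor submitted false invoices for duplicate payments",
    "employee used government purchase card for personal travel",
    "caller unhappy with long wait time at office",
    "complaint about rude staff with no details provided",
]
train_labels = [
    "substantiated",
    "substantiated",
    "unsubstantiated",
    "unsubstantiated",
]

model = ToyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("vendor submitted false invoices"))
```

The classifier scores each class by adding the log prior to the smoothed log likelihood of every known word in the intake text, then picks the highest-scoring class. With only four training texts the output is not meaningful as a prediction; the sketch only shows the mechanics.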

Indexing (document details)
Advisor: Sarkani, Shahryar
Committee: Fossaceca, John; Etemadi, Amir
School: The George Washington University
Department: Engineering Management
School Location: United States -- District of Columbia
Source: DAI-A 82/6(E), Dissertation Abstracts International
Subjects: Applied Mathematics, Engineering, Business administration, Management, Finance
Keywords: Naive Bayes, Natural language processing, Substantiation, Low substantiation rates, Office of Inspector General, Limited data availability
Publication Number: 28256297
ISBN: 9798698589365
Copyright © 2021 ProQuest LLC. All rights reserved.