This dissertation addresses the problems of program comprehension to support the evolution of large-scale software systems. The research concerns how software engineers locate features and concepts along with categorizing changes within very large bodies of source code along with their versioned histories. More specifically, advanced Information Retrieval (IR) and Natural Language Processing (NLP) are utilized and enhanced to support various software engineering tasks. This research is not aimed at directly improving IR or NLP approaches; rather it is aimed at understanding how additional information can be leveraged to improve the final results. The work advances the field by investigating approaches to augment and re-document source code with different types of abstract behavior information. The hypothesis is that enriching the source code corpus with meaningful descriptive information, and integrating this orthogonal information (semantic and structural) that is extracted from source code, will improve the results of the IR methods for indexing and querying information. Moreover, adding this new information to a corpus is a form of supervision. That is, apriori knowledge is often used to direct and supervise machine-learning and IR approaches.
The main contributions of this dissertation involve improving on the results of previous work in feature location and source code querying. The dissertation demonstrates that the addition of statically derived information from source code (e.g., method stereotypes) can improve the results of IR methods applied to the problem of feature location. Further contributions include showing the effects of eliminating certain textual information (comments and function calls) from being included when performing source code indexing for feature/concept location. Moreover, the dissertation demonstrates an IR-based method of natural language topic extraction that assists developers in gaining an overview of past maintenance activities based on software repository commits.
The ultimate goal of this work is to reduce the costs, effort, and time of software maintenance by improving the results of previous work in feature location and source code querying, and by supporting a new platform for enhancing program comprehension and facilitating software engineering research.
|Advisor:||Maletic, Jonathan I.|
|School:||Kent State University|
|School Location:||United States -- Ohio|
|Source:||DAI-B 75/08(E), Dissertation Abstracts International|
|Subjects:||Information Technology, Computer science|
|Keywords:||Information retrieval, Latent semantic indexing, Method stereotypes, Software comprehension, Software evolution, Software maintenance|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be