One of the greatest challenges to data mining is erroneous or noisy data. Several studies have noted the weak performance of classification models trained from low quality data. This dissertation shows that low quality data can also impact the effectiveness of feature selection, and considers the effect of class noise on various feature ranking techniques. It presents a novel approach to feature ranking based on ensemble learning and assesses these ensemble feature selection techniques in terms of their robustness to class noise. It presents a noise-based stability analysis that measures the degree of agreement between a feature ranking techniques output on a clean dataset versus its outputs on the same dataset but corrupted with different combinations of noise level and noise distribution. It then considers classification performances from models built with a subset of the original features obtained after applying feature ranking techniques on noisy data. It proposes the focused ensemble feature ranking as a noise-tolerant approach to feature selection and compares focused ensembles with general ensembles in terms of the ability of the selected features to withstand the impact of class noise when used to build classification models. Finally, it explores three approaches for addressing the combined problem of high dimensionality and class imbalance. Collectively, this research shows the importance of considering class noise when performing feature selection.
|Advisor:||Khoshgoftaar, Taghi M.|
|School:||Florida Atlantic University|
|School Location:||United States -- Florida|
|Source:||DAI-B 73/01, Dissertation Abstracts International|
|Keywords:||Data mining, Feature selection, Noisy data, Quality data, Stability|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be