Motivated by an investigation of Ensemble Machine Learning (ML) techniques, this thesis contributes to addressing performance, consistency, and integrity issues in ML models, such as overfitting, underfitting, predictive errors, the accuracy paradox, and poor generalization. Ensemble ML methods have shown promising outcomes where a single algorithm fails to approximate the true prediction function. Using meta-learning, a super learner is engineered by combining weak learners. Generally, several Supervised Learning (SL) methods are evaluated to find the best fit to the underlying data and predictive analytics (hence the relevance of the “No Free Lunch” theorem). This thesis addresses three main challenges: i) determining the optimum blend of algorithms/methods for enhanced SL ensemble models, ii) engineering the selection and grouping of features that aggregate to the highest possible predictive and non-redundant value in the training data set, and iii) addressing performance integrity issues such as the accuracy paradox. To this end, an enhanced Machine Learning Engine Engineering (eMLEE) is constructed with built-in parallel processing and novel, specially designed constructs for error and gain functions that optimally score the classifier elements for improved training and validation procedures. eMLEE, based on stochastic thinking, is built on i) one centralized unit, the Logical Table unit (LT); ii) two explicit units, enhanced Algorithm Blend and Tuning (eABT) and enhanced Feature Engineering and Selection (eFES); and iii) two implicit constructs, enhanced Weighted Performance Metric (eWPM) and enhanced Cross Validation and Split (eCVS). Hence, it proposes an enhancement to the internals of SL ensemble approaches.
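As a rough illustration of combining weak learners into a stronger ensemble, the sketch below uses a plain weighted majority vote. It is not the eMLEE engine or its eABT unit: the function names, the three threshold "stumps", and the fixed weights are all hypothetical, and the fixed weights only stand in for what a meta-learner would learn from data.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine one label per weak learner into a single ensemble label."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Pick the label with the largest accumulated weight.
    return max(totals, key=totals.get)

def ensemble_predict(learners, weights, x):
    """Ask every weak learner for a label, then take the weighted vote."""
    return weighted_vote([learner(x) for learner in learners], weights)

# Three hypothetical threshold "stumps" on a single numeric feature.
learners = [
    lambda x: 1 if x > 2 else 0,
    lambda x: 1 if x > 5 else 0,
    lambda x: 1 if x > 8 else 0,
]
# Hypothetical fixed weights; a meta-learner would estimate these from data.
weights = [0.2, 0.5, 0.3]

labels = [ensemble_predict(learners, weights, x) for x in (1, 4, 6, 9)]
```

In this toy setup the middle stump dominates the vote, so the ensemble boundary sits near its threshold; changing the weights moves the boundary, which is the kind of blend a tuning unit would search over.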
Motivated by nature-inspired metaheuristic algorithms (such as GA, PSO, and ACO), feedback mechanisms are improved by introducing a specialized function, Learning from the Mistakes (LFM), to mimic the human learning experience. LFM has shown significant improvement in predictive accuracy on the testing data by using the computational processing of wrong predictions to increase the weighted scoring of the weak classifiers and features. LFM further ensures that the training layer experiences the maximum number of mistakes (i.e., errors) for optimum tuning. With this designed into the engine, stochastic modeling/thinking is implicitly implemented.
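The abstract does not give the LFM update rule, so the sketch below substitutes a well-known mistake-driven scheme, a single AdaBoost-style reweighting round, to show how wrong predictions can raise the weight given to hard examples. The function name and the toy labels are hypothetical, and the input weights are assumed to sum to 1.

```python
import math

def adaboost_round(sample_weights, y_true, y_pred):
    """One AdaBoost-style update: return (learner_weight, new_sample_weights).

    This is standard AdaBoost reweighting, used here as a stand-in;
    it is not the thesis's LFM function.
    """
    # Weighted error rate of the current weak learner.
    error = sum(w for w, t, p in zip(sample_weights, y_true, y_pred) if t != p)
    error = min(max(error, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - error) / error)
    # Mistakes are scaled up, correct predictions are scaled down.
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(sample_weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]   # one mistake, on the second sample
weights = [0.25] * 4    # uniform starting weights that sum to 1
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After this round the misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.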
Motivated by the OOP paradigm in high-level programming, eMLEE provides an interface infrastructure using LT objects so that the main units (i.e., Unit A and Unit B) can call functions on demand during the classifier learning process. This approach also supports use of the eMLEE API in real-world predictive modeling, allowing further customization of the classifier learning process and of the trade-off among tuning elements, subject to the data type and the target model.
Motivated by higher-dimensional processing and analysis (i.e., 3D) for improved analytics and learning mechanics, eMLEE incorporates 3D modeling of fitness metrics, with x for overfit, y for underfit, and z for optimum fit, and then creates logical cubes using LT handles to locate the optimum space during the ensemble process. This approach ensures fine tuning of the ensemble learning process with an improved accuracy metric.
To support the construction and implementation of the proposed scheme, mathematical models (i.e., Definitions, Lemmas, Rules, and Procedures) are provided, along with the governing algorithms’ definitions (and pseudo-code) and the illustrations needed to elaborate the concepts. Diverse data sets are used to improve the generalization of the engine and to tune the underlying constructs during the development and testing phases. To show the practicality and stability of the proposed scheme, several results are presented with a comprehensive analysis of the outcomes for the metrics (i.e., via integrity, corroboration, and quantification) of the engine. Two approaches are followed to corroborate the engine: i) testing the inner layers (i.e., internal constructs) of the engine (i.e., Unit-A, Unit-B, and C-Unit) to stabilize and test the fundamentals, and ii) testing the outer layer (i.e., the engine as a black box) against standard metrics for real-world endorsement. Comparisons with various existing state-of-the-art techniques are also reported. In conclusion, based on the extensive literature review, research undertaken, investigative approach, engine construction and tuning, validation approach, experimental study, and results visualization, eMLEE is found to outperform the existing techniques most of the time in terms of classifier learning, generalization, metrics trade-off, optimum fitness, feature engineering, and validation.
|Committee:||Abuzneid, Abdelshakour; Gupta, Navarun; Rizvi, Syed; Xiong, Xingguo|
|School:||University of Bridgeport|
|Department:||Computer Science and Engineering|
|School Location:||United States -- Connecticut|
|Source:||DAI-B 80/07(E), Dissertation Abstracts International|
|Subjects:||Mathematics, Artificial intelligence, Computer science|
|Keywords:||Algorithms blending and boosting, Ensemble machine learning, Feature engineering, Feature optimization, Stochastic thinking, eMLEE|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved