Dissertation/Thesis Abstract

Enhanced Machine Learning Engine Engineering Using Innovative Blending, Tuning, and Feature Optimization
by Uddin, Muhammad Fahim, Ph.D., University of Bridgeport, 2018, 192 pages; Publication No. 13427950
Abstract (Summary)

Motivated by an investigation of Ensemble Machine Learning (ML) techniques, this thesis addresses performance, consistency, and integrity issues in ML models, such as overfitting, underfitting, predictive errors, the accuracy paradox, and poor generalization. Ensemble ML methods have shown promising outcomes where a single algorithm fails to approximate the true prediction function: using meta-learning, a super learner is engineered by combining weak learners. In Supervised Learning (SL), several methods are generally evaluated to find the best fit to the underlying data and predictive task, as the "No Free Lunch" theorem implies. This thesis addresses three main challenges: i) determining the optimum blend of algorithms for enhanced SL ensemble models; ii) engineering the selection and grouping of features that together carry the highest predictive, non-redundant value in the training data set; and iii) addressing performance integrity issues such as the accuracy paradox. To this end, a novel enhanced Machine Learning Engine Engineering (eMLEE) is constructed with built-in parallel processing and specially designed error and gain functions that optimally score the classifier elements, improving the training experience and validation procedures. Grounded in stochastic thinking, eMLEE is built on: i) one centralized unit, the Logical Table (LT); ii) two explicit units, enhanced Algorithm Blend and Tuning (eABT) and enhanced Feature Engineering and Selection (eFES); and iii) two implicit constructs, enhanced Weighted Performance Metric (eWPM) and enhanced Cross Validation and Split (eCVS). The thesis thus proposes an enhancement to the internals of SL ensemble approaches.
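The accuracy paradox named in challenge iii) is easy to reproduce on an imbalanced data set: a classifier that always predicts the majority class scores high accuracy yet never finds a positive case. The Python sketch below contrasts plain accuracy with a weighted blend of accuracy, precision, recall, and specificity. The equal weights and the `weighted_score` helper are hypothetical illustrations of the idea behind a weighted performance metric; they are not the eWPM formula from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Confusion:
    """Count binary confusion-matrix cells, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return Confusion(tp, fp, tn, fn)


def safe_div(num: float, den: float) -> float:
    # An undefined ratio (e.g. precision with no positive predictions) counts as 0.
    return num / den if den else 0.0


def rates(c: Confusion) -> Dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": safe_div(c.tp, c.tp + c.fp),
        "recall": safe_div(c.tp, c.tp + c.fn),
        "specificity": safe_div(c.tn, c.tn + c.fp),
    }


def weighted_score(c: Confusion, weights: Optional[Dict[str, float]] = None) -> float:
    # Hypothetical equal weights; this is NOT the thesis's eWPM definition.
    weights = weights or {"accuracy": 0.25, "precision": 0.25, "recall": 0.25, "specificity": 0.25}
    r = rates(c)
    return sum(w * r[name] for name, w in weights.items())


y_true = [1] * 5 + [0] * 95                         # 5% positive class
majority = [0] * 100                                # always predicts the majority class
detector = [1, 1, 1, 1, 0] + [1] * 5 + [0] * 90     # finds 4 of 5 positives, 5 false alarms

for name, pred in (("majority", majority), ("detector", detector)):
    c = confusion(y_true, pred)
    print(f"{name}: accuracy={rates(c)['accuracy']:.3f} weighted={weighted_score(c):.3f}")
```

The majority-class predictor wins on accuracy (0.95 vs 0.94) but loses on the weighted score, because its precision and recall are both zero.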

Motivated by nature-inspired metaheuristic algorithms (such as genetic algorithms (GA), particle swarm optimization (PSO), and ant colony optimization (ACO)), the engine improves its feedback mechanisms by introducing a specialized function, Learning from the Mistakes (LFM), that mimics the human learning experience. LFM has shown significant improvement in predictive accuracy on the testing data by processing wrong predictions to increase the weighted scores of the weak classifiers and features. LFM further ensures that the training layer is exposed to the maximum number of mistakes (i.e., errors) for optimum tuning. With this mechanism designed into the engine, stochastic modeling is implemented implicitly.
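The abstract does not give LFM's update rule, so the sketch below uses AdaBoost-style reweighting purely as a familiar stand-in for the idea of learning from mistakes: a weak classifier's vote weight grows as its weighted error shrinks, and misclassified samples gain weight for the next round. `boosting_round` is a hypothetical helper, and labels are assumed to be in {-1, +1}.

```python
import math
from typing import List, Sequence, Tuple


def boosting_round(
    y_true: Sequence[int], y_pred: Sequence[int], weights: Sequence[float]
) -> Tuple[float, List[float]]:
    """One AdaBoost-style reweighting round for labels in {-1, +1}.

    Returns the weak classifier's vote weight (alpha) and the renormalized
    sample weights, in which misclassified samples gain weight.
    """
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-10), 1 - 1e-10)        # keep log((1 - err) / err) finite
    alpha = 0.5 * math.log((1 - err) / err)      # lower error -> larger vote
    # With alpha > 0, exp(-alpha * t * p) > 1 exactly when t != p (a mistake).
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(raw)
    return alpha, [w / z for w in raw]


y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]                          # one mistake, on sample 1
alpha, new_weights = boosting_round(y_true, y_pred, [0.25] * 4)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

With one mistake among four equally weighted samples, the misclassified sample carries half of the total weight in the next round, so the next weak learner is pushed to focus on it.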

Motivated by the object-oriented programming (OOP) paradigm of high-level languages, eMLEE provides an interface infrastructure in which LT objects let the main units (i.e., Unit A and Unit B) call functions on demand during the classifier learning process. This approach also allows the eMLEE API to be used by external, real-world predictive modeling applications to further customize the classifier learning process and the trade-off among tuning elements, subject to the data type and the target end model.
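The abstract does not spell out the LT interface, so the following Python sketch only illustrates the general OOP pattern it describes: a central table object through which units call shared functions on demand and exchange intermediate results without referencing each other directly. The class names `LogicalTable`, `UnitA`, and `UnitB`, and the methods `register`, `call`, `put`, and `get`, are all hypothetical.

```python
from typing import Any, Callable, Dict, Sequence


class LogicalTable:
    """Hypothetical central hub: a shared function registry plus shared state."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._state: Dict[str, Any] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        if name in self._functions:
            raise KeyError(f"function {name!r} is already registered")
        self._functions[name] = fn

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Units look functions up by name at call time, i.e. "on demand".
        return self._functions[name](*args, **kwargs)

    def put(self, key: str, value: Any) -> None:
        self._state[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._state.get(key, default)


class UnitA:
    """Hypothetical stand-in for the algorithm blend and tuning unit."""

    def __init__(self, table: LogicalTable) -> None:
        self.table = table

    def run(self, model_scores: Sequence[float]) -> None:
        self.table.put("blend_score", self.table.call("mean", model_scores))


class UnitB:
    """Hypothetical stand-in for the feature engineering and selection unit."""

    def __init__(self, table: LogicalTable) -> None:
        self.table = table

    def run(self) -> Any:
        # Reads what Unit A published, without holding a reference to Unit A.
        return self.table.get("blend_score")


lt = LogicalTable()
lt.register("mean", lambda xs: sum(xs) / len(xs))
UnitA(lt).run([0.8, 0.9, 1.0])
print(UnitB(lt).run())
```

Routing everything through one table keeps the units decoupled, which is also what would let an external caller register its own functions to customize the learning process.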

Motivated by higher-dimensional (i.e., 3D) processing and analysis for improved analytics and learning mechanics, eMLEE incorporates 3D modeling of fitness metrics, with x for overfit, y for underfit, and z for optimum fit, and then creates logical cubes using LT handles to locate the optimum space during the ensemble process. This approach enables fine tuning of the ensemble learning process with an improved accuracy metric.
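As one possible reading of the 3D fitness space, each candidate model can be mapped to a point whose coordinates measure overfit, underfit, and fit, and the candidate nearest the ideal corner selected. The coordinate definitions below (train/validation gap for x, training error for y, validation accuracy for z), the ideal point (0, 0, 1), and the use of Euclidean distance are assumptions made for illustration; the thesis's logical cubes are not reproduced, and the accuracy pairs are made up.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float, float]
IDEAL: Point = (0.0, 0.0, 1.0)   # assumed ideal: no overfit, no underfit, perfect fit


def fitness_point(train_acc: float, val_acc: float) -> Point:
    """Map train/validation accuracy to hypothetical (x, y, z) fitness coordinates."""
    x = max(0.0, train_acc - val_acc)   # overfit: generalization gap
    y = 1.0 - train_acc                 # underfit: error on the training data
    z = val_acc                         # fit: accuracy on held-out data
    return (x, y, z)


def best_candidate(results: Dict[str, Tuple[float, float]]) -> str:
    """Return the candidate whose fitness point lies closest to IDEAL."""
    return min(results, key=lambda name: math.dist(fitness_point(*results[name]), IDEAL))


# Made-up (train_acc, val_acc) pairs, for illustration only.
candidates = {
    "deep_tree": (1.00, 0.78),   # memorizes training data: large gap
    "stump": (0.70, 0.69),       # too simple: high training error
    "blend": (0.92, 0.90),       # balanced
}
print(best_candidate(candidates))
```

A single distance to the ideal point trades the three axes off equally; weighting the axes differently would shift which region of the space counts as optimum.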

To support the construction and implementation of the proposed scheme, mathematical models (i.e., definitions, lemmas, rules, and procedures) are provided, along with definitions (and pseudo-code) of the governing algorithms and illustrations that elaborate the concepts. Diverse data sets are used during the development and testing phases to improve the generalization of the engine and to tune its underlying constructs. To show the practicality and stability of the proposed scheme, several results are presented together with a comprehensive analysis of the engine's metrics (i.e., via integrity, corroboration, and quantification). Two approaches are followed to corroborate the engine: i) testing the inner layers (i.e., the internal constructs Unit-A, Unit-B, and C-Unit) to stabilize and test the fundamentals; and ii) testing the outer layer (i.e., the engine as a black box) against standard metrics for real-world endorsement. Comparisons with various existing state-of-the-art techniques are also reported. Drawing on the extensive literature review, research undertaken, investigative approach, engine construction and tuning, validation approach, experimental study, and results visualization, eMLEE is found to outperform the existing techniques in most cases in terms of classifier learning, generalization, metrics trade-off, optimum fit, feature engineering, and validation.
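Testing the engine as a black box relies on held-out validation. The abstract does not describe how eCVS partitions the data, so the Python sketch below shows only plain k-fold cross-validation as a reference point; `k_fold_indices` is a hypothetical helper, and the optional seeded shuffle is an assumption.

```python
import random
from typing import Iterator, List, Optional, Tuple


def k_fold_indices(
    n: int, k: int, seed: Optional[int] = None
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for standard k-fold cross-validation.

    Every index appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(idx)   # reproducible shuffle
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3, seed=0):
    print(len(train), len(test))
```

Because each sample is held out exactly once, averaging a metric over the k folds gives a less split-dependent estimate than a single train/test split.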

Indexing (document details)
Advisor: Lee, Jeongkyu
Committee: Abuzneid, Abdelshakour; Gupta, Navarun; Rizvi, Syed; Xiong, Xingguo
School: University of Bridgeport
Department: Computer Science and Engineering
School Location: United States -- Connecticut
Source: DAI-B 80/07(E), Dissertation Abstracts International
Subjects: Mathematics, Artificial intelligence, Computer science
Keywords: Algorithms blending and boosting, Ensemble machine learning, Feature engineering, Feature optimization, Stochastic thinking, eMLEE
Publication Number: 13427950
ISBN: 978-0-438-94599-9
Copyright © 2020 ProQuest LLC. All rights reserved.