Stochastic optimizations algorithms like stochastic gradient descent (SGD) are favorable for large-scale data analysis because they update the model sequentially and with low per-iteration costs. Much of the existing work focuses on optimizing accuracy, however, it is known that accuracy is not an appropriate measure for class imbalanced data. Area under the ROC curve (AUC) is a standard metric that is used to measure classification performance for such a situation. Therefore, developing stochastic learning algorithms that maximize AUC in lieu of accuracy is of both theoretical and practical interest. However, AUC maximization presents a challenge since the learning objective function is defined over a pair of instances of opposite classes. Existing methods can overcome this issue and achieve online processing but with higher space and time complexity. In this thesis, we will develop two novel stochastic algorithms for AUC maximization. The first is an online method which is referred to as SPAM. In comparison to the previous literature, the algorithm can be applied to non-smooth penalty functions while achieving a convergence rate of O(log T / T). The second is a batch learning method which is referred to as SPDAM. We establish a linear convergence rate for a sufficiently large batch size. We demonstrate the effectiveness of such algorithms on standard benchmark data sets as well as data sets for anomaly detection tasks.
|Commitee:||Feng, Yunlong, Reinhold, Karin, Goldfarb, Boris|
|School:||State University of New York at Albany|
|Department:||Mathematics and Statistics|
|School Location:||United States -- New York|
|Source:||DAI-B 81/11(E), Dissertation Abstracts International|
|Keywords:||AUC optimization, Batch learning, Binary classification, Machine learning, Online learning|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be