Nonconvex min-max optimization has received increasing attention in modern machine learning, especially in the context of deep learning. Examples include stochastic AUC maximization with deep neural networks and Generative Adversarial Nets (GANs), which correspond to nonconvex-concave and nonconvex-nonconcave min-max problems, respectively. The classical theory of min-max optimization focuses mainly on the convex-concave setting and is therefore not applicable to deep learning applications with nonconvex min-max formulations. A natural question arises: how can we design provably efficient algorithms for nonconvex min-max problems in deep learning? To answer this question, this dissertation focuses on the following four concrete aspects:
First, we consider the problem of stochastic AUC maximization with deep neural networks as predictive models. Building on the saddle-point reformulation of a surrogate loss for AUC, the problem can be cast as a nonconvex-concave min-max problem. We exploit the Polyak-Łojasiewicz (PL) condition, which has been proved and observed in deep learning, and develop new stochastic algorithms with faster convergence rates and more practical step-size schemes.
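The saddle-point reformulation mentioned above (in the style of the squared-surrogate AUC objective) admits a simple stochastic primal-descent / dual-ascent update. The sketch below is illustrative only: it substitutes a linear scoring model for the deep network, and the function name, arguments, and fixed step size are assumptions, not the dissertation's algorithm.

```python
import numpy as np

def auc_minmax_step(w, a, b, alpha, x, y, p, eta):
    """One stochastic primal-descent / dual-ascent step for the
    squared-surrogate AUC saddle-point objective.
    y is +1 or -1; p is the positive-class ratio; eta is the step size.
    A linear model w @ x stands in for the deep network (assumption)."""
    h = w @ x
    pos, neg = float(y == 1), float(y == -1)
    # gradient of the per-sample saddle function w.r.t. the score h
    dh = (2 * (1 - p) * (h - a) * pos + 2 * p * (h - b) * neg
          + 2 * (1 + alpha) * (p * neg - (1 - p) * pos))
    gw = dh * x
    ga = -2 * (1 - p) * (h - a) * pos
    gb = -2 * p * (h - b) * neg
    galpha = 2 * (p * h * neg - (1 - p) * h * pos) - 2 * p * (1 - p) * alpha
    # descend on the primal variables (w, a, b), ascend on the dual alpha
    return w - eta * gw, a - eta * ga, b - eta * gb, alpha + eta * galpha
```

In practice the primal model would be a deep network and the step sizes would follow the stagewise scheme the dissertation develops; the single fixed `eta` here only keeps the sketch short.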
Second, we develop a first-order convergence theory for weakly-convex-weakly-concave min-max problems. We propose an algorithmic framework motivated by the inexact proximal point method, in which the weakly monotone variational inequality (VI) corresponding to the original min-max problem is solved by approximately solving a sequence of strongly monotone VIs, each constructed by adding a strongly monotone mapping to the original gradient mapping. We prove non-asymptotic first-order convergence to a nearly stationary point of the original min-max problem in polynomial time.
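The inner-outer scheme just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: `F` is a generic (weakly) monotone operator, the inner solver is plain iterative updates, and all names and parameters are placeholders rather than the dissertation's method.

```python
import numpy as np

def inexact_proximal_point(F, z0, gamma=1.0, outer=50, inner=100, eta=0.05):
    """Inexact proximal-point sketch for a weakly monotone VI with
    operator F.  Each outer step approximately solves the strongly
    monotone subproblem with operator F(z) + gamma * (z - z_ref),
    obtained by adding the strongly monotone mapping gamma*(z - z_ref)."""
    z_ref = np.asarray(z0, dtype=float)
    for _ in range(outer):
        z = z_ref.copy()
        for _ in range(inner):
            # inner iterations on the strongly monotone VI (stand-in solver)
            z = z - eta * (F(z) + gamma * (z - z_ref))
        z_ref = z  # the approximate solution seeds the next subproblem
    return z_ref
```

On the bilinear game min_x max_y xy, whose operator F(x, y) = (y, -x) is monotone but not strongly monotone, this scheme drives the iterate to the stationary point at the origin.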
Third, we consider a class of nonconvex-nonconcave min-max problems with a focus on GAN training. Although adaptive gradient methods with alternating updates empirically work well for training GANs, they require expensive tuning efforts, lack theoretical convergence guarantees, and may diverge in practice. To bridge this gap, we design an adaptive gradient algorithm for training GANs with provably faster convergence than its non-adaptive counterpart.
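For context, the "adaptive gradient with alternating updates" pattern can be written as the generic template below: the discriminator takes an ascent step, then the generator takes a descent step, each with AdaGrad-style per-coordinate step sizes. This is an illustrative template of the pattern being analyzed, not the dissertation's provably convergent algorithm.

```python
import numpy as np

def alternate_adagrad(grad_d, grad_g, d, g, steps=100, eta=0.1, eps=1e-8):
    """Generic alternating adaptive-gradient loop for a two-player game.
    grad_d / grad_g return the gradients of the discriminator's payoff
    and the generator's loss; per-coordinate AdaGrad scaling is used."""
    vd = np.zeros_like(d)
    vg = np.zeros_like(g)
    for _ in range(steps):
        gd = grad_d(d, g)
        vd = vd + gd ** 2
        d = d + eta * gd / (np.sqrt(vd) + eps)   # discriminator ascends
        gg = grad_g(d, g)                        # uses the updated d
        vg = vg + gg ** 2
        g = g - eta * gg / (np.sqrt(vg) + eps)   # generator descends
    return d, g
```

On a toy game where the discriminator maximizes -d^2 and the generator minimizes g^2, both players contract toward the equilibrium at zero; on harder bilinear couplings, plain templates like this can cycle, which is the divergence risk the dissertation's algorithm is designed to avoid.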
Fourth, we consider large-scale GAN training in a decentralized distributed manner. Decentralized parallel algorithms are more robust to limited network bandwidth and high latency than their centralized counterparts and have merits for protecting users' privacy, yet decentralized algorithms for nonconvex-nonconcave min-max optimization have not been considered in the existing literature. We propose and analyze a decentralized algorithm for training GANs, and show its provable convergence to a first-order stationary point in polynomial time.
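A decentralized min-max round typically combines local gradient steps with gossip averaging over a communication graph, encoded by a doubly stochastic mixing matrix `W`. The sketch below illustrates that structure under those assumptions; it is a generic template, not the dissertation's exact algorithm.

```python
import numpy as np

def decentralized_minmax_round(X, Y, grad_x, grad_y, W, eta):
    """One round of a decentralized min-max scheme: every node i takes
    a local descent step in its min-variable X[i] and an ascent step in
    its max-variable Y[i] on its own objective, then gossip-averages
    parameters with its neighbors via the doubly stochastic matrix W."""
    n = X.shape[0]
    X_new = np.array([X[i] - eta * grad_x(i, X[i], Y[i]) for i in range(n)])
    Y_new = np.array([Y[i] + eta * grad_y(i, X[i], Y[i]) for i in range(n)])
    return W @ X_new, W @ Y_new  # neighbor averaging (gossip step)
```

Because `W` is doubly stochastic, the network-wide average of the parameters evolves as if a single node ran the centralized update, which is the usual starting point for decentralized convergence analyses.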
Committee: Yang, Tianbao; Lin, Qihang; Varadarajan, Kasturi; Oliveira, Suely; Xu, Weiyu
School: The University of Iowa
School Location: United States -- Iowa
Source: DAI-B 82/3(E), Dissertation Abstracts International
Subjects: Computer science, Mathematics, Operations research
Keywords: Algorithms, Applications, Deep learning, Min-max, Nonconvex, Optimization