Dissertation/Thesis Abstract

Optimization Methods Combining Variable Splitting with Deep Learning
by Cowen, Benjamin, Ph.D., New York University Tandon School of Engineering, 2019, 136; 13882605
Abstract (Summary)

Variable splitting is a classic approach to decomposing a difficult optimization problem into an iterative sequence of easier sub-problems. This framework has been highly successful for signal processing problems across a wide range of disciplines. Recently, several variable splitting techniques have been proposed for training deep neural networks, an extremely challenging optimization task. However, these algorithms do not scale to real-world problem sizes because they are largely offline. The first work presented in this dissertation is a stochastic, online alternating minimization algorithm for training deep neural networks via variable splitting, designed in collaboration with the IBM Thomas J. Watson Research Center. The proposed method demonstrates strong improvements over offline variable splitting approaches to neural network training and also shows advantages over popular state-of-the-art backpropagation-based training algorithms.
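As a toy illustration of the splitting idea (not the dissertation's stochastic, online algorithm), the NumPy sketch below trains a two-layer network by quadratic-penalty alternating minimization: auxiliary pre-activation and activation variables are introduced, and the method cycles through closed-form sub-problems for the weights, the activations, and the pre-activations. All names, sizes, and the penalty weight are illustrative assumptions.

    # Minimal sketch (NumPy) of quadratic-penalty variable splitting with
    # alternating minimization for a 2-layer network  y ~ W2 @ relu(W1 @ x).
    # Auxiliary variables: Z = W1 @ X (pre-activations), A = relu(Z) (activations).
    # This is an illustrative toy, not the dissertation's algorithm.
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, n_samples = 8, 16, 4, 200
    X = rng.standard_normal((n_in, n_samples))
    Y = rng.standard_normal((n_out, n_samples))      # toy regression targets
    W1 = 0.1 * rng.standard_normal((n_hid, n_in))
    W2 = 0.1 * rng.standard_normal((n_out, n_hid))
    Z = W1 @ X
    A = np.maximum(Z, 0.0)
    rho = 1.0                                        # quadratic penalty weight

    def relu(v):
        return np.maximum(v, 0.0)

    for it in range(30):
        # Weight updates: plain (regularized) least-squares sub-problems.
        W1 = Z @ X.T @ np.linalg.inv(X @ X.T + 1e-6 * np.eye(n_in))
        W2 = Y @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(n_hid))

        # Activation update:  min_A ||W2 A - Y||^2 + rho ||A - relu(Z)||^2
        A = np.linalg.solve(W2.T @ W2 + rho * np.eye(n_hid),
                            W2.T @ Y + rho * relu(Z))

        # Pre-activation update, elementwise case analysis over the ReLU:
        #   min_Z ||Z - W1 X||^2 + ||A - relu(Z)||^2
        V = W1 @ X
        z_pos = np.maximum((V + A) / 2.0, 0.0)       # candidate with Z >= 0
        z_neg = np.minimum(V, 0.0)                   # candidate with Z <  0
        cost_pos = (z_pos - V) ** 2 + (A - z_pos) ** 2
        cost_neg = (z_neg - V) ** 2 + A ** 2
        Z = np.where(cost_pos <= cost_neg, z_pos, z_neg)

        if it % 10 == 0:
            mse = np.mean((W2 @ relu(W1 @ X) - Y) ** 2)
            print(f"iter {it:2d}  feed-forward MSE {mse:.4f}")

Each sub-problem is a small least-squares solve or an elementwise update, which is what makes the splitting attractive; the dissertation's contribution is making such updates stochastic and online so that they scale to real-world problem sizes.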

After exploring how variable splitting can enhance deep learning, we ask a question in the other direction: how can deep learning techniques enhance variable splitting methods? One of the most widely used variable splitting frameworks, the Alternating Direction Method of Multipliers (ADMM), is an extremely powerful optimizer but can take many tens of iterations to reach a good approximate solution. We propose a framework that uses deep learning principles to accelerate ADMM on L1-regularized linear least squares problems. The learned algorithm is generated by the recently proposed technique known as "unrolling", which in our case means implementing truncated ADMM (i.e., ADMM run for a fixed number of iterations) as a neural network. Deep learning techniques can then be applied to strategically modify certain naturally arising linear operators of ADMM with data-driven corrections learned from a training dataset.
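As a point of reference, the NumPy sketch below implements standard (un-learned) ADMM for the L1-regularized least-squares problem min_x 0.5||Ax - b||_2^2 + lam*||x||_1. The function names are assumptions, and the comment marking which linear operators a learned, unrolled variant might perturb is illustrative rather than a statement of the dissertation's exact construction.

    # Standard ADMM for LASSO:  min_x 0.5||Ax - b||^2 + lam*||x||_1
    import numpy as np

    def soft_threshold(v, t):
        """Proximal operator of t*||.||_1 (elementwise shrinkage)."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=50):
        m, n = A.shape
        x = np.zeros(n)
        z = np.zeros(n)
        u = np.zeros(n)                        # scaled dual variable
        # The matrices below are the "naturally arising linear operators"
        # that a learned, unrolled variant could correct (an assumption here).
        Atb = A.T @ b
        Q = np.linalg.inv(A.T @ A + rho * np.eye(n))
        for _ in range(n_iter):
            x = Q @ (Atb + rho * (z - u))      # quadratic sub-problem
            z = soft_threshold(x + u, lam / rho)   # L1 proximal step
            u = u + x - z                      # dual (multiplier) update
        return z

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = rng.standard_normal((100, 200))
        x_true = np.zeros(200)
        x_true[rng.choice(200, 10, replace=False)] = 1.0
        b = A @ x_true + 0.01 * rng.standard_normal(100)
        x_hat = admm_lasso(A, b, lam=0.5)
        print("nonzeros recovered:", np.count_nonzero(np.abs(x_hat) > 1e-3))

Unrolling truncates this loop to a fixed number of iterations and treats the result as a network whose layers share the structure of one ADMM step, so that the chosen linear operators can be trained from data.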

We empirically validate the resulting algorithm in a variety of sparse coding settings, i.e., where the goal is to find a sparse representation of a signal with respect to a given dictionary. Of particular interest is the challenging multiple-dictionary setting given by morphological component analysis (MCA), a generalization of sparse coding in which the goal is to decompose a signal into additive parts such that each part has a distinct sparse representation in an appropriately chosen dictionary; see the formulation below. Our algorithm demonstrates substantial acceleration over common baselines in both settings. Finally, we present a theoretical framework showing that the resulting algorithm is still an instance of ADMM, but applied to a new, learned cost function. By extending a recently proposed Stochastic Alternating Optimization analysis framework, we show that a gradient descent step along this learned loss landscape is equivalent to a modified gradient descent step along the original loss landscape. In this framework, the achieved acceleration can potentially be explained by the network's ability to learn a correction that steers the gradient toward a steeper descent direction.
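For concreteness, the MCA problem described above can be written (in notation assumed here, with dictionaries D_k and regularization weights \lambda_k) as

    \min_{x_1,\dots,x_K} \; \tfrac{1}{2} \Big\| y - \sum_{k=1}^{K} D_k x_k \Big\|_2^2 + \sum_{k=1}^{K} \lambda_k \| x_k \|_1 ,

which reduces to ordinary sparse coding when K = 1.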

Indexing (document details)
Advisor: Selesnick, Ivan
Committee: Choromanska, Anna, Voltz, Peter
School: New York University Tandon School of Engineering
Department: Electrical and Computer Engineering
School Location: United States -- New York
Source: DAI-B 81/2(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Applied Mathematics, Artificial intelligence, Electrical engineering
Keywords: Convex analysis, Deep learning, Machine learning, Optimization, Signal processing, Sparsity
Publication Number: 13882605
ISBN: 9781085600828