Dissertation/Thesis Abstract

Large margin transformation learning
by Howard, Andrew G., Ph.D., Columbia University, 2009, 90; 3388470
Abstract (Summary)

With the current explosion of data coming from many scientific fields and industry, machine learning algorithms are more important than ever to help make sense of this data in an automated manner. Support vector machine (SVMs) have been a very successful learning algorithm for many applied settings. However, the support vector machine only finds linear classifiers so data often needs to be preprocessed with appropriately chosen nonlinear mappings in order to find a model with good predictive properties. These mappings can either take the form of an explicit transformation or be defined implicitly with a kernel function.

Automatically choosing these mappings has been studied raider the name of kernel learning. These methods typically optimize a cost function to find a kernel made up of a combination of base kernels thus implicitly learning mappings. This dissertation investigates methods for choosing explicit transformations automatically. This setting differs from the kernel learning framework by learning a combination of base transformations rather than base kernels. This allows prior knowledge to be exploited in the functional form of the transformations which may not be easily encoded as kernels such as when learning monotonic transformations. Additionally, kernel based SVMs are often hard to interpret because they lead to complex decision boundaries which are only linear in the implicitly defined space. However, by working with explicit mappings, models with an intuitive meaning can be learned. The learned transformations can be visualized to lend insight into the problem, and the hyperplane weights indicate the importance of transformed features.

The two basic models that will be studied are the a mixture of transformations Φ( x) = ∑imi&phis;i( x) and the matrix mixture of transformations which defines kernels of the form k(x, x') = ∑ i,jmimj&phis;i(x) T&phis;j(x'). The matrix mixture reduces to mixture of transformation learning when M is rank 1 and mixture of kernel learning when M is a diagonal matrix. First, greedy algorithms are proposed to simultaneously learn a mixture of transformations and a large margin hyperplane classifier. Then, a convex semidefinite algorithm is derived to find a matrix mixture of transformations and large margin hyperplane. More efficient algorithms based on the extragradient method are introduced to solve larger problems and extend the basic framework to a multitask setting. Another cost function based on kernel alignment is explored to learn matrix mixture of transformations. Maximizing the alignment with a cardinality constraint on the mixture weights gives rise to approximation algorithms with constant factor approximations similar to the Max-Cut problem. These methods are then applied to the task of learning monotonic transformations which are built from a mixture of truncated ramp functions. Experimental results for synthetic data, image histogram classification, text classification and gender recognition demonstrate the utility of these learned transformations.

Indexing (document details)
Advisor: Jebara, Tony
School: Columbia University
School Location: United States -- New York
Source: DAI-B 70/12, Dissertation Abstracts International
Subjects: Artificial intelligence, Computer science
Keywords: Kernel methods, Machine learning, Monotonic transformation, Semiactive programming, Support vector machines, Transformation learning
Publication Number: 3388470
ISBN: 978-1-109-54820-4
Copyright © 2020 ProQuest LLC. All rights reserved. Terms and Conditions Privacy Policy Cookie Policy