We introduce a new machine learning framework called self-taught learning for using unlabeled data in supervised classification tasks. This framework does not require that the unlabeled data follow the class labels of the supervised task, or arise from the same generative distribution. Such unlabeled data is often significantly easier to obtain than in previously studied frameworks such as semi-supervised learning. In this thesis, we demonstrate that self-taught learning can be applied successfully to a variety of hard machine learning problems.
The centerpiece of our work is a self-taught learning algorithm based on an optimization problem called "sparse coding." This algorithm uses unlabeled data to learn a new representation for complex, high-dimensional inputs, and then applies supervised learning over this representation. The learned representation captures higher-level aspects of the input and significantly improves classification performance on many test domains, including computer vision, audio recognition, and text classification. We present efficient sparse coding algorithms for a translation-invariant version of the model that can be applied to audio and image data. We also generalize the model to a much broader class of inputs, including domains that are hard to handle with previous algorithms, and apply it to text classification and a robotic perception task. Taken together, these experiments demonstrate that with the self-taught learning framework, machine learning can be applied to much harder problems than previously possible.
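To make the underlying optimization concrete, the following is a minimal sketch of sparse coding in Python with NumPy: it alternates between encoding the unlabeled data as sparse codes (here via ISTA, an iterative soft-thresholding scheme) and updating the dictionary of basis vectors by a gradient step. The specific solver, function names, and parameter values are illustrative assumptions for this sketch, not the dissertation's actual implementation.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=100):
    """Encode the columns of X as sparse codes B, approximately minimizing
    (1/2)||X - D B||^2 + lam * ||B||_1 via ISTA with a fixed dictionary D."""
    B = np.zeros((D.shape[1], X.shape[1]))
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ B - X)           # gradient of the reconstruction term
        B = soft_threshold(B - grad / L, lam / L)
    return B

def learn_dictionary(X, n_bases=64, lam=0.1, n_outer=20, lr=0.1):
    """Alternate between sparse coding and dictionary updates on unlabeled X."""
    rng = np.random.default_rng(0)
    D = rng.standard_normal((X.shape[0], n_bases))
    D /= np.linalg.norm(D, axis=0)          # keep basis vectors unit-norm
    for _ in range(n_outer):
        B = sparse_codes(X, D, lam)
        D -= lr * (D @ B - X) @ B.T / X.shape[1]   # gradient step on reconstruction
        D /= np.linalg.norm(D, axis=0)
    return D
```

After learning, `sparse_codes(X_labeled, D)` yields the higher-level features over which a standard supervised classifier can then be trained.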
These self-taught learning algorithms work best when they are allowed to learn rich models (with millions of parameters) from large amounts of unlabeled data (millions of examples). Unfortunately, with current methods it can take weeks to learn such rich models. Further, these methods require fast, sequential updates, and with current algorithms they are not amenable to parallelization on a distributed cluster. To apply self-taught learning to such large-scale problems, we show that graphics processor hardware (available in most modern desktops) can be used to massively parallelize the algorithms. Using a new, inherently parallel algorithm, sparse coding can be implemented easily on graphics processors, and we show that this reduces the learning time from about three weeks to a single day.
Finally, we consider self-taught learning methods that learn hierarchical representations using unlabeled data. We develop general principles for unsupervised learning of such hierarchical models using graphics processors, and show that the slow learning algorithms for the popular deep belief network model can be successfully parallelized. This implementation is up to 70 times faster than an optimized CPU implementation, reduces the learning time from weeks to hours, and represents the state-of-the-art in learning large deep belief networks.
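The building block of a deep belief network is the restricted Boltzmann machine (RBM), trained greedily one layer at a time with contrastive divergence; each trained layer's hidden activations become the input to the next. The following single-layer CD-1 sketch in NumPy is illustrative only: the layer size, learning rate, and training schedule are assumptions, and this sequential CPU code deliberately omits the GPU parallelization that the thesis develops.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(X, n_hidden=32, lr=0.05, n_epochs=10, seed=0):
    """Train one RBM layer with one-step contrastive divergence (CD-1).
    X holds binary training examples, one per row."""
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)                       # visible biases
    b_h = np.zeros(n_hidden)                        # hidden biases
    for _ in range(n_epochs):
        # Positive phase: hidden probabilities and samples given the data.
        p_h = sigmoid(X @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        p_v = sigmoid(h @ W.T + b_v)
        p_h2 = sigmoid(p_v @ W + b_h)
        # CD-1 update: difference of data-driven and model-driven statistics.
        n = X.shape[0]
        W += lr * (X.T @ p_h - p_v.T @ p_h2) / n
        b_v += lr * (X - p_v).mean(axis=0)
        b_h += lr * (p_h - p_h2).mean(axis=0)
    return W, b_v, b_h
```

The matrix products dominating each update are exactly the operations that map well onto graphics processors, which is what makes the large speedups reported above possible.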
Advisor: Ng, Andrew Y.
School Location: United States -- California
Source: DAI-B 70/10, Dissertation Abstracts International
Subjects: Artificial intelligence, Computer science
Keywords: Machine learning, Self-taught learning, Semisupervised learning, Unsupervised learning
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved