Deep learning has been widely and successfully applied to many difficult tasks in computer vision, such as image parsing, object detection, and object recognition, where various deep learning architectures such as deep neural networks, convolutional neural networks, and deep belief networks have achieved impressive performance and significantly outperformed state-of-the-art methods. However, the potential of deep learning for face-related problems has not yet been fully explored. In this thesis, we explore different deep learning methods and propose new network architectures and learning algorithms for face-related applications, including face parsing, face attribute inference, and face recognition.
For face parsing, we propose a novel face parser that recasts the segmentation of face components as a cross-modality data transformation problem, i.e., transforming an image patch into a label map. Specifically, a face is represented hierarchically by parts, components, and pixel-wise labels. With this representation, the approach first detects faces at both the part and component levels, and then computes the pixel-wise label maps. The part-based and component-based detectors are generatively trained with deep belief networks (DBNs) and discriminatively tuned by logistic regression. The segmenters transform the detected face components into label maps by learning a highly nonlinear mapping with a deep autoencoder. Compared with face keypoint detection and face alignment, the proposed hierarchical face parsing is not only robust to partial occlusions but also provides richer information for face analysis and face synthesis.
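The cross-modality transformation above can be sketched as a small encoder-decoder mapping a flattened image patch to a per-pixel label probability map. This is a minimal illustration with hypothetical patch/label sizes and untrained random weights standing in for the learned deep autoencoder layers, not the thesis's actual network.

```python
import numpy as np

# Hypothetical sizes: a 64x64 grayscale patch mapped to a 64x64 label map.
PATCH_DIM = 64 * 64
HIDDEN_DIM = 512
LABEL_DIM = 64 * 64

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Untrained random weights stand in for the trained autoencoder parameters.
W_enc = rng.normal(scale=0.01, size=(HIDDEN_DIM, PATCH_DIM))
W_dec = rng.normal(scale=0.01, size=(LABEL_DIM, HIDDEN_DIM))

def patch_to_label_map(patch):
    """Transform a flattened image patch into a per-pixel label probability map."""
    hidden = sigmoid(W_enc @ patch)        # encode the patch
    label_probs = sigmoid(W_dec @ hidden)  # decode into label probabilities
    return label_probs.reshape(64, 64)

patch = rng.random(PATCH_DIM)
label_map = patch_to_label_map(patch)
print(label_map.shape)  # (64, 64)
```

In the actual system, one such transformation would be learned per face component, so each detected component is converted into its own label map before the maps are assembled into a full-face parse.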
For face attribute inference, the proposed approach captures the interdependencies of local regions for each attribute, as well as the high-order correlations between different attributes, which makes it more robust to occlusions and misdetection of face regions. First, we model region interdependencies with a discriminative decision tree, where each node consists of a detector and a classifier trained on a local region. The detector locates the region, while the classifier determines the presence or absence of an attribute. Second, correlations among attributes and attribute predictors are modeled by organizing all of the decision trees into a large sum-product network (SPN), which is learned by the EM algorithm and yields the most probable explanation (MPE) of the facial attributes in terms of the regions' localization and classification. Experimental results on a large data set of 22,400 images show the effectiveness of the proposed approach.
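The detector-plus-classifier node described above can be illustrated with a minimal sketch: the detector scores candidate regions and picks the best one, and the classifier then decides whether the attribute is present in that region. Linear scorers with random weights stand in for the trained models; the feature dimension and region count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
FEAT_DIM = 16  # hypothetical feature dimension per region

class AttributeNode:
    """One node of the discriminative decision tree: a detector locates
    the most likely region, then a classifier scores attribute presence.
    Random linear weights stand in for trained detector/classifier models."""
    def __init__(self):
        self.w_detect = rng.normal(size=FEAT_DIM)
        self.w_classify = rng.normal(size=FEAT_DIM)

    def infer(self, region_feats):
        # region_feats: one feature vector per candidate region.
        scores = region_feats @ self.w_detect
        best = int(np.argmax(scores))               # detector: pick the region
        margin = region_feats[best] @ self.w_classify
        return best, bool(margin > 0)               # classifier: attribute on/off

node = AttributeNode()
regions = rng.normal(size=(5, FEAT_DIM))  # 5 candidate local regions
idx, has_attr = node.infer(regions)
print(idx, has_attr)
```

In the full model, many such nodes are organized into decision trees whose outputs feed a sum-product network, so that evidence from reliably detected regions can compensate for occluded or misdetected ones.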
For face recognition, this thesis proposes a new deep learning framework that recovers the canonical view of face images. It dramatically reduces intra-person variance while maintaining inter-person discriminativeness. Unlike existing face reconstruction methods, which were either evaluated in controlled 2D environments or relied on 3D information, our approach directly learns the transformation between face images with a complex set of variations and their canonical views. At the training stage, to avoid the costly process of labeling canonical-view images in the training set by hand, we devise a new measurement and algorithm to automatically select or synthesize a canonical-view image for each identity. The recovered canonical-view face images are matched using a facial-component-based convolutional neural network. Our approach achieves the best performance on the LFW dataset under the unrestricted protocol. We also demonstrate that the performance of existing methods can be improved when they are applied to our recovered canonical-view face images.
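The automatic selection step above can be sketched as scoring each image of an identity and keeping the highest-scoring one as the canonical view. The scoring below is purely illustrative and is not the thesis's actual measurement: it combines left-right symmetry (frontal faces are roughly symmetric) with a gradient-energy sharpness cue, two plausible ingredients for such a selector.

```python
import numpy as np

def frontalness_score(img):
    """Illustrative scoring only -- NOT the thesis's actual measurement.
    Higher symmetry and higher sharpness both raise the score."""
    img = np.asarray(img, dtype=float)
    mirrored = img[:, ::-1]
    symmetry = -np.mean((img - mirrored) ** 2)  # 0 for perfectly symmetric
    gy, gx = np.gradient(img)
    sharpness = np.mean(gx ** 2 + gy ** 2)      # penalizes blurry images
    return symmetry + sharpness

def select_canonical(images):
    """Pick the highest-scoring image as the identity's canonical view."""
    scores = [frontalness_score(im) for im in images]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
faces = [rng.random((32, 32)) for _ in range(4)]  # stand-in face crops
chosen = select_canonical(faces)
print(chosen)
```

In the full framework, the selected (or synthesized) canonical views serve as regression targets, so the network learns to map an arbitrary-pose input face to its identity's canonical view before matching.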
|Advisor:||Sun, Han Qiu|
|School:||The Chinese University of Hong Kong (Hong Kong)|
|School Location:||Hong Kong|
|Source:||DAI-B 76/08(E), Dissertation Abstracts International|
|Keywords:||Computer vision, Deep learning, Face alignment, Face recognition|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved