Deep neural networks have led to remarkable progress in visual recognition. A key driving factor is the availability of abundant labeled data, enabling effective model training. However, the amount of labeled data is often limited by the high cost of manual annotation. The reliance on large-scale labeled data has become one of the key bottlenecks in creating intelligent systems.
However, there exists data with weak annotations that carry useful, though imperfect, knowledge about the given tasks. Examples of weak annotations include possibly wrong labels, or image-level rather than pixel-level labels for semantic segmentation. Weak annotations are relatively cheap to obtain. If we could build powerful recognition systems from such supervision, labeling costs could be vastly reduced. Can we, then, enable machines to learn well with less supervision, as humans do?
This thesis explores visual learning with weakly supervised data. Our core idea is to enable models to capture certain inductive biases or desired properties by leveraging prior knowledge and data regularities. We have rich prior knowledge about weakly supervised data: we should not place full confidence in potentially wrong labels, and image-level labels indicate object presence. Modeling such prior knowledge can regularize the solutions. The data also exhibits natural regularities: a horizontally flipped image mirrors the original, and various cats are more similar to each other than to a bike. Such natural regularity in data in turn imposes regularity on the tasks and models; for example, pixel-level classification of an image mirrors the prediction on the horizontally flipped input.

We explore various forms of inductive biases in representations, in output predictions, and as prior knowledge about labels. We demonstrate the effectiveness of our ideas on learning with various forms of weak supervision for several important computer vision applications: image classification, semantic segmentation, object detection, person re-identification, and pose orientation estimation. We specifically focus on three types of weak supervision: incomplete supervision, where only a subset of the training data is given labels; inexact supervision, where the training data are given only coarse-grained labels; and inaccurate supervision, where the given labels are not always correct.
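The flip-mirroring regularity above can be expressed as a consistency objective: a pixel-level predictor should commute with horizontal flipping. Below is a minimal illustrative sketch, not the thesis's actual implementation; `hflip`, `flip_consistency_loss`, and the toy per-pixel model are assumptions introduced only for illustration.

```python
import numpy as np

def hflip(img):
    """Horizontally flip an image (or a per-pixel prediction map)."""
    return img[:, ::-1]

def flip_consistency_loss(model, img):
    """Mean squared difference between the prediction on the original
    image and the un-flipped prediction on the flipped image.
    A flip-equivariant pixel-level model drives this loss to zero."""
    pred = model(img)                        # H x W prediction map
    pred_mirror = hflip(model(hflip(img)))   # predict on the mirror, flip back
    return float(np.mean((pred - pred_mirror) ** 2))

# Hypothetical per-pixel "model": depends only on local intensity,
# hence exactly equivariant to horizontal flips.
equivariant_model = lambda img: img * 0.5

img = np.random.rand(4, 6)
print(flip_consistency_loss(equivariant_model, img))  # 0.0 for an equivariant model
```

In practice such a term would be added to the supervised loss as an unsupervised regularizer, so unlabeled or weakly labeled images also constrain the model.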
|Committee:||Chellappa, Rama; Sankaranarayanan, Aswin; Yu, Zhiding|
|School:||Carnegie Mellon University|
|Department:||Electrical and Computer Engineering|
|School Location:||United States -- Pennsylvania|
|Source:||DAI-A 82/4(E), Dissertation Abstracts International|
|Subjects:||Artificial intelligence, Information Technology, Information science|
|Keywords:||Domain adaptation, Feature disentanglement, Noisy label learning, Weakly supervised detection, Weakly supervised learning|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved