Dissertation/Thesis Abstract

Weakly Supervised Visual Understanding
by Zou, Yang, Ph.D., Carnegie Mellon University, 2020, 170; 28095257
Abstract (Summary)

Deep neural networks have led to remarkable progress in visual recognition. A key driving factor is the availability of abundant labeled data, enabling effective model training. However, the amount of labeled data is often limited by the high cost of manual annotation. The reliance on large-scale labeled data has become one of the key bottlenecks in creating intelligent systems.

Some data come with weak annotations that carry useful, though imperfect, knowledge about the given task. Examples of weak annotations include labels that may be wrong and image-level rather than pixel-level labels for semantic segmentation. Weak annotations are relatively cheap. If we could build powerful recognition systems from such supervision, the labeling cost could be vastly reduced. Can we, then, enable machines to learn well from less supervision, as humans do?

This thesis explores visual learning with weakly supervised data. Our core idea is to enable models to capture certain inductive biases or desired properties by leveraging prior knowledge and data regularities. We have rich prior knowledge about weakly supervised data: for example, we should not place 100% confidence in potentially wrong labels, and image-level labels indicate object presence. Modeling such prior knowledge can regularize the solutions. The data also have natural regularities: a horizontally flipped image mirrors the original one, and various cats are more similar to each other than to a bike. Such natural regularity in the data also imposes regularity on the tasks and models: the pixel-level classification of an image mirrors the prediction for the horizontally flipped input. We explore various forms of inductive bias in representations, predicted outputs, and prior knowledge about labels. We show the effectiveness of our ideas on learning with various kinds of weak supervision for several important computer vision applications: image classification, semantic segmentation, object detection, person re-identification, and pose orientation estimation. We specifically focus on three types of weak supervision: incomplete supervision, where only a subset of the training data is labeled; inexact supervision, where the training data are given with only coarse-grained labels; and inaccurate supervision, where the given labels are not always correct.
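
The two priors named above (not trusting a possibly wrong label completely, and predictions that mirror under a horizontal flip) can be illustrated with a small sketch. This is not the thesis's method; it is a minimal NumPy illustration under stated assumptions. The function names, the smoothing weight `eps`, and both toy "models" are hypothetical and chosen only to make the idea concrete.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Soften one-hot targets so that no class receives 100% confidence.

    `eps` is an assumed hyperparameter, not a value from the thesis.
    """
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1.0 - eps) + eps / num_classes

def flip_consistency_loss(predict, image):
    """Mean squared gap between predict(flip(x)) and flip(predict(x)).

    `image` has shape (H, W); `predict` returns per-pixel scores (H, W, C).
    """
    pred = predict(image)
    pred_of_flipped = predict(image[:, ::-1])  # flip the input along width
    return float(np.mean((pred_of_flipped - pred[:, ::-1]) ** 2))

def pointwise_model(image):
    """Hypothetical per-pixel classifier that depends only on pixel value."""
    return np.stack([image, 1.0 - image], axis=-1)

def position_biased_model(image):
    """Hypothetical classifier that leaks the column index, breaking symmetry."""
    cols = np.linspace(0.0, 1.0, image.shape[1])[None, :]
    return np.stack([image + cols, 1.0 - image], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((4, 6))
    print("pointwise model loss:", flip_consistency_loss(pointwise_model, img))
    print("position-biased loss:", flip_consistency_loss(position_biased_model, img))
    print("smoothed targets:\n", smooth_labels(np.array([0, 2]), num_classes=3))
```

In this sketch the pointwise model satisfies the flip regularity exactly, so its loss is zero, while the position-biased model violates it and incurs a positive penalty. In training, a term of this kind would typically be added to the supervised loss as a regularizer.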

Indexing (document details)
Advisor: Bhagavatula, Vijayakumar
Committee: Chellappa, Rama; Sankaranarayanan, Aswin; Yu, Zhiding
School: Carnegie Mellon University
Department: Electrical and Computer Engineering
School Location: United States -- Pennsylvania
Source: DAI-A 82/4(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Artificial intelligence, Information Technology, Information science
Keywords: Domain adaptation, Feature disentanglement, Noisy label learning, Weakly supervised detection, Weakly supervised learning
Publication Number: 28095257
ISBN: 9798678181213