Dissertation/Thesis Abstract

Human Activity Analysis using Multi-modalities and Deep Learning
by Zhang, Chenyang, Ph.D., The City College of New York, 2016, 115; 10159927
Abstract (Summary)

With the successful development of video recording devices and sharing platforms, visual media has become a significant part of everyday life. Computer vision and machine learning have become the key technologies for organizing and understanding this tremendous amount of visual data. Among the topics in computer vision research, human activity analysis is one of the most challenging and promising areas. It is dedicated to detecting, recognizing, and understanding the context and meaning of human activities in visual media. This dissertation focuses on two aspects of human activity analysis: 1) how to use a multi-modality approach, including depth sensors and traditional RGB cameras, for human action modeling; and 2) how to use more advanced machine learning technologies, such as deep learning and sparse coding, to address more sophisticated problems such as attribute learning and automatic video captioning.

To explore the use of depth cameras, we first present a depth camera-based image descriptor called histogram of 3D facets (H3DF) and its application to human action and hand gesture recognition, along with a holistic depth video representation for human actions. To unify the inputs from depth cameras and RGB cameras, this dissertation next discusses a multi-modality fusion framework that jointly models human affect from both facial expressions and body gestures. Finally, we present deep learning-based frameworks for human attribute learning and automatic video captioning. Compared to human action detection and recognition, automatic video captioning is more challenging because it involves complex language models and visual context. Extensive experiments on several public datasets demonstrate that the frameworks proposed in this dissertation outperform the state-of-the-art approaches in this research area.

Indexing (document details)
Advisor: Tian, Yingli
Committee: Brown, Lisa M., Stamos, Ioannis, Tian, Yingli, Xiao, Jizhong, Zhu, Zhigang
School: The City College of New York
Department: Electrical Engineering
School Location: United States -- New York
Source: DAI-B 78/04(E), Dissertation Abstracts International
Subjects: Computer science
Keywords: Computer vision, Human activity analysis, Visual data
Publication Number: 10159927
ISBN: 978-1-369-14855-8
Copyright © 2020 ProQuest LLC. All rights reserved.