Emerging applications in human-computer interfaces, security, and robotics have a need for understanding human behavior from video data. Much of the research in the field of action recognition evaluates methods using relatively small data sets, under controlled conditions, and with a small set of allowable action labels. There are significant challenges in trying to adapt existing action recognition models to less structured and larger-scale data sets. Those challenges include: the recognition of a large vocabulary of actions, the scalability to learn from a large corpus of video data, the need for real-time recognition on streaming video, and the requirement to operate in settings with uncontrolled lighting, a variety of camera angles, dynamic backgrounds, and multiple actors.
This thesis focuses on scalable methods for classifying and clustering actions with minimal human supervision. Unsupervised methods are emphasized in order to learn from a massive amount of unlabeled data, and for the potential to retrain models with minimal human intervention when adapting to new settings or applications. Because many applications of action recognition require real-time performance, and training data sets can be large, scalable methods for both learning and detection are beneficial.
The specific contributions from this dissertation include a novel method for Approximate Nearest Neighbor (ANN) indexing of general metric spaces and the application of this structure to a manifold-based action representation. With this structure, nearest-neighbor action recognition is demonstrated to be comparable or superior to existing methods, while also being fast and scalable. Leveraging the same metric space indexing mechanism, a novel clustering method is introduced for discovering action exemplars in data.
|Advisor:||Draper, Bruce A.|
|Commitee:||Anderson, Charles, Howe, Adele, Peterson, Christopher|
|School:||Colorado State University|
|School Location:||United States -- Colorado|
|Source:||DAI-B 74/10(E), Dissertation Abstracts International|
|Keywords:||Action recognition, Approximate nearest neighbor, Grassmann manifold, Randomized forests, Unsupervised learning, Video analysis|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be