Peter Drucker, a well-known influential writer and philosopher in the field of management theory and practice, once claimed that “the most important thing in communication is hearing what isn't said.” It is not difficult to see that a similar concept also holds in the context of video scene understanding. In almost every non-trivial video scene, most important elements, such as the motives and intentions of the actors, can never be seen or directly observed, yet the identification of these latent attributes is crucial to our full understanding of the scene. That is to say, latent attributes matter.
In this work, we explore the task of identifying latent attributes in video scenes, focusing on the mental states of participant actors. We propose a novel approach to the problem based on the use of large text collections as background knowledge and minimal information about the videos, such as activity and actor types, as query context. We formalize the task and a measure of merit that accounts for the semantic relatedness of mental state terms, as well as their distribution weights. We develop and test several largely unsupervised information extraction models that identify the mental state labels of human participants in video scenes given some contextual information about the scenes. We show that these models produce complementary information and their combination significantly outperforms the individual models, and improves performance over several baseline methods on two different datasets. We present an extensive analysis of our models and close with a discussion of our findings, along with a roadmap for future research.
|Advisor:||Cohen, Paul R.|
|Commitee:||Barnard, Kobus, McAllister, Ken S., Surdeanu, Mihai|
|School:||The University of Arizona|
|School Location:||United States -- Arizona|
|Source:||DAI-B 76/01(E), Dissertation Abstracts International|
|Subjects:||Information science, Artificial intelligence, Computer science|
|Keywords:||Artificial intelligence, Computer vision, Information extraction, Information retrieval, Mental state inference, Natural language processing|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be