This thesis proposes a balanced framework for understanding speech motor planning and control by observing aspects of its behavioral execution. To this end, it proposes representing, modeling, and analyzing real-time speech articulation data from both 'top-down' (knowledge-driven) and 'bottom-up' (data-driven) perspectives.
The first part of the thesis uses existing knowledge from linguistics and motor control to extract meaningful representations from real-time magnetic resonance imaging (rtMRI) data and, further, to posit and test specific hypotheses regarding kinematic and postural planning during pausing behavior. In the former case, we propose a measure to quantify the speed of articulators during pauses as well as in their immediate neighborhoods. Using appropriate statistical analysis techniques, we find support for the hypothesis that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these junctures. In the latter case, we present a novel automatic procedure to characterize vocal posture from rtMRI data. Statistical analyses suggest that articulatory settings differ during rest positions, ready positions and inter-speech pauses, and might, in that order, involve an increasing degree of active control by the cognitive speech planning mechanism. We show that this may be because postures assumed during pauses are significantly more mechanically advantageous than postures assumed during absolute rest. In other words, compared to postures at absolute rest, inter-speech postures allow for a larger change in the space of motor control tasks/goals for a minimal change in the articulatory posture space. We argue that such top-down approaches can be used to augment models of speech motor control.
The second part of the thesis presents a computational, data-driven approach to derive interpretable movement primitives from speech articulation data in a bottom-up manner. It puts forth a convolutive Nonnegative Matrix Factorization algorithm with sparseness constraints (cNMFsc) to decompose a given data matrix into a set of spatiotemporal basis sequences and an activation matrix. The algorithm optimizes a cost function that trades off the mismatch between the proposed model and the input data against the number of primitives that are active at any given instant. The method is applied both to measured articulatory data obtained through electromagnetic articulography (EMA) and to synthetic data generated using an articulatory synthesizer. The thesis then describes how to evaluate the algorithm's performance quantitatively and performs a qualitative assessment of its ability to recover compositional structure from data. The results suggest that the proposed algorithm extracts movement primitives from human speech production data that are linguistically interpretable. We further examine how well derived representations of "primitive movements" of speech articulation can be used to classify broad phone categories, and thus provide more insights into the link between speech production and perception. We finally show that such primitives can be mathematically modeled using nonlinear dynamical systems in a control-theoretic framework for speech motor control. Such a primitives-based framework could thus help inform practicable theories of speech motor control and coordination.
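As a rough illustration of the kind of decomposition described above, the following minimal sketch factors a nonnegative data matrix V into time-shifted basis sequences W and an activation matrix H. It is not the cNMFsc algorithm of the thesis: in place of explicit sparseness constraints it swaps in a simpler L1 penalty on the activations with multiplicative updates for a Euclidean cost. The names `shift`, `reconstruct`, `cnmf_l1`, and the penalty weight `lam` are hypothetical, chosen only for this example, and the input data are random toy values rather than articulatory measurements.

```python
import numpy as np

def shift(X, t):
    """Shift the columns of X right by t (left if t < 0), zero-filling."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out

def reconstruct(W, H):
    """Convolutive model: V_hat = sum_t W[t] @ shift(H, t).
    W has shape (T, M, K); H has shape (K, N)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))

def cnmf_l1(V, K, T, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Convolutive NMF with an L1 penalty on H (an assumed stand-in
    for the sparseness constraints mentioned in the abstract)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((T, M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Update activations; lam in the denominator shrinks H toward zero.
        L = reconstruct(W, H)
        num = sum(W[t].T @ shift(V, -t) for t in range(T))
        den = sum(W[t].T @ shift(L, -t) for t in range(T)) + lam + eps
        H *= num / den
        # Update each time slice of the basis sequences in turn.
        for t in range(T):
            L = reconstruct(W, H)
            Ht = shift(H, t)
            W[t] *= (V @ Ht.T) / (L @ Ht.T + eps)
        # Fix the scale ambiguity: unit-norm bases, rescaled activations.
        scale = np.sqrt((W ** 2).sum(axis=(0, 1))) + eps
        W /= scale
        H *= scale[:, None]
    return W, H

if __name__ == "__main__":
    # Toy data: two sparse activations driving 4-frame primitives.
    rng = np.random.default_rng(1)
    W_true = rng.random((4, 6, 2))
    H_true = rng.random((2, 120)) * (rng.random((2, 120)) < 0.1)
    V = reconstruct(W_true, H_true)
    W, H = cnmf_l1(V, K=2, T=4, lam=0.01)
    rel_err = np.linalg.norm(V - reconstruct(W, H)) / np.linalg.norm(V)
    print("relative reconstruction error:", rel_err)
```

Raising `lam` in this sketch favors fewer simultaneously active primitives at the cost of a larger reconstruction mismatch, which mirrors the trade-off the abstract describes.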
|Advisor:||Narayanan, Shrikanth S.|
|Committee:||Byrd, Dani, Goldstein, Louis M., Nayak, Krishna S., Ortega, Antonio, Schaal, Stefan|
|School:||University of Southern California|
|School Location:||United States -- California|
|Source:||DAI-B 76/03(E), Dissertation Abstracts International|
|Subjects:||Linguistics, Electrical engineering, Computer science|
|Keywords:||Machine learning, Motor control, Primitives, Real-time magnetic resonance imaging, Signal and image processing, Speech production and communication|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved