In tonal languages such as Mandarin Chinese, words are defined by their phonemic sequence and by the intonational patterns (tones) of their syllables.
To determine whether the problem of tone recognition is worth solving, we propose an information-theoretic measure for comparing the relative importance (Functional Load) of phonological contrasts in any language. Empirical calculations show that tones are at least as important as vowels for conveying information in Mandarin.
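As an illustration of the kind of measure described above, the following is a minimal sketch of one common entropy-based formulation of functional load: the relative drop in entropy of a word distribution when a contrast is merged. The abstract does not give the dissertation's exact definition, so the unigram word model, the function names (`entropy`, `functional_load`, `drop_tones`, `merge_vowels`), and the toy counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the distribution implied by frequency counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def functional_load(word_counts, merge):
    """Relative entropy lost when `merge` collapses a phonological contrast.

    word_counts: Counter mapping a word, represented as a tuple of
                 (segments, tone) pairs, to its frequency.
    merge:       hypothetical function mapping a word to the form it would
                 have if the contrast did not exist.
    """
    merged = Counter()
    for word, count in word_counts.items():
        merged[merge(word)] += count
    h = entropy(word_counts)
    return (h - entropy(merged)) / h if h > 0 else 0.0


def drop_tones(word):
    """Remove the tone contrast by mapping every tone to a single value."""
    return tuple((segments, 0) for segments, _ in word)


def merge_vowels(word):
    """Remove the a/i vowel contrast (toy inventory only)."""
    return tuple((segments.replace("a", "V").replace("i", "V"), tone)
                 for segments, tone in word)


# Invented toy lexicon: four equally frequent one-syllable words.
toy = Counter({
    (("ma", 1),): 4,
    (("ma", 3),): 4,
    (("mi", 1),): 4,
    (("mi", 3),): 4,
})

tone_load = functional_load(toy, drop_tones)     # 0.5 on this toy data
vowel_load = functional_load(toy, merge_vowels)  # 0.5 on this toy data
```

On this toy lexicon the two contrasts carry equal load by construction. A real comparison would need corpus frequencies and a language model chosen to match the measure actually used in the dissertation.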
We then carry out a large and thorough investigation of possible acoustic features for recognizing tones. This involves hundreds of experiments, each of which involves classifying over a hundred thousand syllables from ten hours of broadcast news speech.
We first determine a base set of features (based on pitch, duration, and overall intensity) that achieves a syllable classification accuracy of 58.9%.
Experiments on a subset of our data show that simple features based on energy in various frequency bands work better for tone recognition than those based on more complicated methods like harmonic-amplitude differences and glottal flow estimation. Further experiments determine a set of band energy features that improve classification accuracy to 63.7%, with the F score for Neutral Tone increasing from 0.345 to 0.619. This opens up a host of new features for future speech researchers in industry and academia to investigate and use.
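To make the idea of band energy features concrete, here is a minimal sketch that computes the log energy of one speech frame in a few frequency bands. The band edges, frame length, Hann window, and the function name `band_energies` are assumptions for illustration; the abstract does not specify the bands or the signal processing actually used. A naive DFT keeps the sketch dependency-free, whereas a practical system would use an FFT library.

```python
import cmath
import math


def band_energies(frame, sample_rate, band_edges_hz):
    """Return the log energy of `frame` in each band [low, high) given in Hz."""
    n = len(frame)
    # Hann window to reduce spectral leakage at the frame edges.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Power spectrum for the non-negative frequency bins (naive DFT).
    power = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        power.append(abs(s) ** 2)
    # Sum the power of the bins whose centre frequency falls in each band.
    energies = []
    for low, high in band_edges_hz:
        e = sum(p for k, p in enumerate(power)
                if low <= k * sample_rate / n < high)
        energies.append(math.log(e + 1e-12))  # small floor avoids log(0)
    return energies


# Usage: a synthetic 1000 Hz tone sampled at 8 kHz (illustrative values).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
bands = [(0, 500), (500, 1500), (1500, 4000)]  # hypothetical band edges
features = band_energies(frame, sample_rate, bands)
```

For the synthetic tone, the middle band dominates the other two. In a tone classifier, such per-band values would be computed over each syllable and appended to the base pitch, duration, and intensity features.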
We investigate making additional use of context: even if we know the true tones of the surrounding syllables, classification accuracy increases only to 67.2%. (This provides a useful upper bound for our experiments.) While we do not have such ideal contextual information, we can use estimates of it to increase accuracy to 65.0%.
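One plausible way to use estimated context is a two-pass scheme: a first classifier produces a tone probability estimate for every syllable, and a second classifier sees each syllable's own features together with the estimates for its neighbours. The abstract does not say how the estimates were obtained or combined, so the sketch below is an assumption throughout, including the function name `add_context_features`, the use of one neighbour on each side, and the uniform padding at utterance boundaries. It assumes five classes (the four lexical tones plus the neutral tone).

```python
def add_context_features(features, tone_posteriors, num_tones=5):
    """Append the neighbours' estimated tone distributions to each syllable.

    features:        list of per-syllable feature vectors (lists of floats).
    tone_posteriors: list of per-syllable tone probability estimates from a
                     first-pass classifier, each of length `num_tones`.
    Returns a new list of augmented feature vectors for a second pass.
    """
    uniform = [1.0 / num_tones] * num_tones  # used where no neighbour exists
    augmented = []
    for i, feats in enumerate(features):
        prev_est = tone_posteriors[i - 1] if i > 0 else uniform
        next_est = tone_posteriors[i + 1] if i + 1 < len(features) else uniform
        augmented.append(list(feats) + list(prev_est) + list(next_est))
    return augmented


# Usage with invented numbers: three syllables, one base feature each.
base = [[0.1], [0.2], [0.3]]
estimates = [
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.2, 0.2],
]
second_pass_input = add_context_features(base, estimates)
```

Replacing the estimated distributions with one-hot vectors for the true neighbouring tones would correspond to the oracle condition that yields the upper bound.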
Finally, we investigate the hypothesis that better-articulated syllables are easier to recognize. On a small corpus of lab speech from Xu (1999), we classify syllables in focussed words with over 99% accuracy, and use this to improve the classification accuracy of all syllables. However, in news broadcast speech, we find that while stronger syllables are recognized better, the difference is not enough to suggest an algorithm that makes use of it.
|School:||The University of Chicago|
|School Location:||United States -- Illinois|
|Source:||DAI-B 68/10, Dissertation Abstracts International|
|Subjects:||Linguistics, Computer science|
|Keywords:||Automatic recognition, Mandarin Chinese, Phonological contrast, Support vector machines, Tones|