The high-level goals of this thesis are to understand the neural representation of sound, produce more robust statistical models of natural sound, and develop models for top-down auditory attention. These are three critical concepts in the auditory system. The neural representation of sound should provide a useful basis for building robust statistical models and directing attention. Robust statistical models are necessary for humans to generalize their knowledge from one domain to the plethora of domains in the real world. Attention is fundamental to the perception of sound, allowing one to prioritize information in the raw audio signal.
First, I approach the neural representation of sound using the efficient coding principle and the physiological characteristics of the cochlea. A theoretical model is developed using convolutional filters and leaky integrate-and-fire (LIF) neurons to model the cochlear transform and the spiking code of the auditory nerve. The goal of this model is to explain the distributed phase code of the auditory nerve response, but it also lays the foundation for much more.
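The two ingredients named above, a convolutional (FIR) filter standing in for the cochlear transform and a leaky integrate-and-fire neuron producing spikes, can be illustrated with a minimal sketch. This is not the thesis model: the sample rate, the Hann-windowed cosine kernel, the half-wave rectification, the time constant, and the threshold are all hypothetical choices made only to show the filter-then-spike pipeline.

```python
import math

FS = 10_000  # hypothetical sample rate in Hz; dt = 1 / FS

def convolve(signal, kernel):
    """Causal FIR convolution: y[n] = sum_k kernel[k] * signal[n - k]."""
    return [
        sum(h * signal[n - k] for k, h in enumerate(kernel) if n - k >= 0)
        for n in range(len(signal))
    ]

def lif_spikes(current, dt=1.0 / FS, tau=1e-3, threshold=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron, tau * dv/dt = -v + I(t), integrated
    with forward Euler. Emits a spike and resets v whenever v reaches the
    threshold. Returns the sample indices at which spikes occur."""
    v = v_reset
    spikes = []
    for n, i_t in enumerate(current):
        v += (dt / tau) * (-v + i_t)
        if v >= threshold:
            spikes.append(n)
            v = v_reset
    return spikes

# Hypothetical "cochlear" filter: a Hann-windowed cosine tuned to 500 Hz.
f0, taps = 500.0, 20
kernel = [
    0.5 * (1 - math.cos(2 * math.pi * k / (taps - 1)))
    * math.cos(2 * math.pi * f0 * k / FS)
    for k in range(taps)
]

# 50 ms pure tone at the filter's centre frequency.
tone = [math.sin(2 * math.pi * f0 * n / FS) for n in range(FS // 20)]

# Half-wave rectify the filter output and use it as the neuron's input current.
drive = [max(0.0, y) for y in convolve(tone, kernel)]
spikes = lif_spikes(drive)
```

Because the neuron only receives current during the positive half-cycles of the filtered tone, its spikes cluster at a consistent phase of the stimulus, which is the kind of phase-locked response a model of the auditory nerve code has to account for.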
Second, I investigate an algorithm for audio source separation called deep clustering. Experiments are performed to evaluate its robustness, and a new neural network architecture is developed to improve it. The experiments show that the conventional recurrent neural network performs sub-optimally, while our dilated convolutional neural network improves robustness and uses an order of magnitude fewer parameters. This more parsimonious model is a step towards models that are minimally parameterized and generalize well across many domains.
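For readers unfamiliar with deep clustering, the sketch below shows the standard objective from the deep clustering literature; the abstract does not state which variant the thesis uses, so treat this as an assumption. Each time-frequency bin gets a D-dimensional embedding (rows of V, N x D) and a one-hot source label (rows of Y, N x C), and the loss is ||VV^T - YY^T||_F^2. Expanding the trace gives ||V^T V||^2 - 2||V^T Y||^2 + ||Y^T Y||^2, which avoids building the N x N affinity matrix. The helper names are hypothetical.

```python
def gram_t(a, b):
    """Return A^T B for row-major matrices A (N x P) and B (N x Q)."""
    p, q = len(a[0]), len(b[0])
    out = [[0.0] * q for _ in range(p)]
    for row_a, row_b in zip(a, b):
        for i in range(p):
            for j in range(q):
                out[i][j] += row_a[i] * row_b[j]
    return out

def sq_frobenius(m):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in m for x in row)

def deep_clustering_loss(v, y):
    """||V V^T - Y Y^T||_F^2 via the expansion
    ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2 (only D x D, D x C, C x C products)."""
    return (sq_frobenius(gram_t(v, v))
            - 2.0 * sq_frobenius(gram_t(v, y))
            + sq_frobenius(gram_t(y, y)))

def naive_loss(v, y):
    """Direct O(N^2) evaluation of ||V V^T - Y Y^T||_F^2, for checking."""
    total = 0.0
    for vi, yi in zip(v, y):
        for vj, yj in zip(v, y):
            a = sum(p * q for p, q in zip(vi, vj))
            b = sum(p * q for p, q in zip(yi, yj))
            total += (a - b) ** 2
    return total

# Three bins, two sources: bin 2 belongs to source 0 but is embedded with source 1.
V = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(deep_clustering_loss(V, Y))  # → 4.0
```

At inference the embeddings are clustered (for example with k-means) into as many groups as there are sources, which is why standard deep clustering needs the number of sources in the mixture to be known.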
Third, I develop a new algorithm to address the limitations of deep clustering. This algorithm can extract multiple sources at once from a mixture using an attentional context, or bias. It modulates the computation of the bottom-up pathway with a top-down neural signal that indicates which sources are of interest. To do this, it borrows a simple idea from the attentional spotlight method: the top-down signal modulates the gain on a set of low-level neurons. This computational method demonstrates one way top-down feedback could direct auditory attention in the brain. Interestingly, the method's implications go beyond neuroscience: it demonstrates that attention can be about more than efficient computation. The experiments show that it resolves one of the main shortcomings of deep clustering: unlike deep clustering, the model can extract sources from a mixture without knowing the total number of sources in the mixture.
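The gain-modulation idea can be sketched as follows, assuming a top-down context vector (here a multi-hot code over source classes of interest) is mapped through a weight matrix to one multiplicative gain per low-level unit. The weight matrix W, the 2 * sigmoid squashing (gains in (0, 2), with a zero context leaving activity unchanged), and the function names are hypothetical illustrations, not the architecture used in the thesis.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def top_down_gains(context, weights):
    """Map a top-down context vector to one multiplicative gain per
    low-level unit: g_i = 2 * sigmoid(sum_k W[i][k] * c[k]).
    Gains lie in (0, 2); a zero context yields g_i = 1 (no modulation)."""
    return [2.0 * sigmoid(sum(w * c for w, c in zip(row, context)))
            for row in weights]

def modulate(features, gains):
    """Scale each bottom-up activation by its top-down gain (elementwise)."""
    return [f * g for f, g in zip(features, gains)]

# Hypothetical example: unit 0 prefers source class A, unit 1 prefers class B.
W = [[4.0, -4.0],
     [-4.0, 4.0]]
bottom_up = [3.0, 3.0]   # equal bottom-up responses to a mixture
attend_a = [1.0, 0.0]    # top-down signal: "source A is of interest"
attended = modulate(bottom_up, top_down_gains(attend_a, W))
# unit 0 is amplified (gain about 1.96) and unit 1 suppressed (gain about 0.04)
```

Because the context is an input rather than a fixed output slot, the same network can be asked for one source or several by changing the context vector, without committing in advance to a total number of sources.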
The major contributions of this work are a theoretical model for the auditory nerve response, a more robust neural network architecture for sound understanding, and a novel and powerful model of top-down auditory attention. I hope that the first contribution will be used to build a better understanding of the complex auditory nerve code; the second, to build ever more parsimonious and robust models of source separation; and the third, to provide a basis for attention-based source separation, an under-explored research direction that I believe is the most fruitful path toward human-level auditory scene analysis.
|Advisor:||Olshausen, Bruno A.|
|Committee:||Theunissen, Frederic; Yartsev, Michael; DeWeese, Michael|
|School:||University of California, Berkeley|
|School Location:||United States -- California|
|Source:||DAI-B 81/3(E), Dissertation Abstracts International|
|Subjects:||Neurosciences, Artificial intelligence, Audiology|
|Keywords:||Audio, Deep learning, Hearing, Machine learning, Source separation|
Copyright in each Dissertation and Thesis is retained by the author. All rights reserved.