Automatic music transcription (AMT) is considered by many researchers to be the holy grail of the field because of its notorious complexity and difficulty. Meanwhile, the current decade has seen an unprecedented surge of deep learning, in which neural network methods have achieved tremendous success in many machine learning tasks, including AMT. This success is largely enabled by the ever-increasing amount of available data and by innovations in GPU hardware, which give deep learning models the capacity to process data at that scale. While more data and higher capacity generally translate to better performance, the question remains of how to design an AMT model that effectively incorporates the inductive bias for the task and makes the best use of the increased capacity.
This thesis hypothesizes that an effective way to address this question is through the use of generative neural networks.
Starting with a simplified setup of monophonic transcription, we learn the effectiveness of convolutional representation and the roles of dataset choices in data-driven models for music analysis. In the subsequent chapters, we examine the applications of deep generative models in music analysis and synthesis tasks, by introducing a WaveNet-based music synthesis model that learns a multi-dimensional timbre representation and a music language model applied in an adversarial manner to improve a piano transcription model. Finally, we combine the analysis and synthesis methods to develop a multi-instrument polyphonic music transcription system. From these observations, we conclude that deep generative models can be used to improve AMT in many ways, and they will be a crucial component for further advancing AMT.
Advisor: Bello, Juan Pablo
Committee: Rowe, Robert; Humphrey, Eric J.
School: New York University
Department: Music and Performing Arts Professions
School Location: United States -- New York
Source: DAI-A 81/8(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Artificial intelligence, Information technology, Music
Keywords: Automatic music transcription, Deep learning era, Neural networks
Publication Number: 27736453
ISBN: 9781392853870