Dissertation/Thesis Abstract

Automatic Music Transcription in the Deep Learning Era: Perspectives on Generative Neural Networks
by Kim, Jong Wook, Ph.D., New York University, 2020, 181 pages; 27736453
Abstract (Summary)

Many researchers consider automatic music transcription (AMT) the holy grail of the field because of its notorious complexity and difficulty. Meanwhile, the current decade has seen an unprecedented surge in deep learning, in which neural network methods have achieved tremendous success in many machine learning tasks, including AMT. This success has been enabled largely by the ever-increasing amount of available data and by innovations in GPU hardware. Together, these give deep learning models the capacity to process data at that scale. More data and higher capacity generally translate into better performance. However, a question remains: how should an AMT model be designed so that it effectively incorporates the inductive bias of the task and makes the best use of the increased capacity?

This thesis hypothesizes that an effective way to address this question is through the use of generative neural networks.

Starting with a simplified setup of monophonic transcription, we investigate the effectiveness of convolutional representations and the role of dataset choice in data-driven models for music analysis. In the subsequent chapters, we examine applications of deep generative models to music analysis and synthesis tasks. We introduce a WaveNet-based music synthesis model that learns a multi-dimensional timbre representation. We also apply a music language model adversarially to improve a piano transcription model. Finally, we combine the analysis and synthesis methods to develop a multi-instrument polyphonic music transcription system. From these observations, we conclude that deep generative models can improve AMT in many ways and will be a crucial component in advancing it further.

Indexing (document details)
Advisor: Bello, Juan Pablo
Committee: Rowe, Robert; Humphrey, Eric J.
School: New York University
Department: Music and Performing Arts Professions
School Location: United States -- New York
Source: DAI-A 81/8(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Artificial intelligence, Information Technology, Music
Keywords: Automatic music transcription, Deep learning era, Neural networks
Publication Number: 27736453
ISBN: 9781392853870
Copyright © 2020 ProQuest LLC. All rights reserved.