We introduce a systematic approach to retrieving individual parts from a monaural music recording with the help of its score. We are interested in isolating the accompaniment part by removing the solo part from a recording of concerto music, in which a solo instrument is accompanied by an orchestra. We require the music audio, the score, and optionally a sample library of individual notes played in isolation. Our approach is based on explicit knowledge of the musical audio at the semantic level (notes or chords), obtained from an audio-score alignment. Such knowledge allows the spectrogram energy to be decomposed into note-based models that can be trained with the sample library. Our approach has two stages: (1) "masking," which estimates a solo mask to remove the solo, and (2) "reconstruction," which imputes the harmonics of the orchestra notes that are inevitably damaged in masking.
In "masking," we estimate a 2-dimensional binary mask that classifies each time-frequency cell of the short-time Fourier transform (STFT) spectrogram as either solo or accompaniment. We mainly employ an Expectation Maximization (EM) algorithm to decompose the spectrogram magnitude into note-based models. In this process of "erasing" the soloist's contribution to the mixture by applying the mask, the remaining orchestra is degraded. In "reconstruction," we propose a novel technique to repair such degradation. We use a state-space model for each note partial, which is represented by a slowly changing amplitude envelope and an "unwrapped" phase sequence. This amplitude-phase representation can be computed by Kalman smoothing. It allows us to "transpose" intact partials of the orchestra part onto the degraded time-frequency region. Objective metrics and subjective listening are used on real and synthesized musical audio data for evaluation and parameter optimization.
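The masking step can be illustrated with a minimal sketch. It assumes that magnitude estimates for the solo and the accompaniment are already available for each time-frequency cell (in the thesis these come from the EM-fitted note-based models, which are not reproduced here). The function names and the "larger model wins" decision rule are illustrative assumptions, not the author's implementation.

```python
def binary_solo_mask(solo_model, accomp_model):
    """Return 1 where the solo model dominates a time-frequency cell, else 0.

    Assumption: a cell is labeled solo when its solo magnitude estimate
    exceeds its accompaniment magnitude estimate.
    """
    return [
        [1 if s > a else 0 for s, a in zip(solo_row, accomp_row)]
        for solo_row, accomp_row in zip(solo_model, accomp_model)
    ]


def remove_solo(mixture, mask):
    """Zero out cells classified as solo; keep cells classified as accompaniment."""
    return [
        [0.0 if m else x for x, m in zip(mix_row, mask_row)]
        for mix_row, mask_row in zip(mixture, mask)
    ]


# Toy magnitudes: rows are frequency bins, columns are time frames.
solo_model = [[0.9, 0.1], [0.2, 0.8], [0.0, 0.0]]
accomp_model = [[0.1, 0.5], [0.3, 0.1], [0.4, 0.6]]
mixture = [[1.0, 0.6], [0.5, 0.9], [0.4, 0.6]]

mask = binary_solo_mask(solo_model, accomp_model)
accompaniment = remove_solo(mixture, mask)
```

In practice the mask would be applied to the complex STFT of the recording rather than to toy magnitudes. The cells set to zero here are the ones the reconstruction stage would then have to repair.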
|Committee:||Crandall, David; Myers, Steven; Trosset, Michael|
|School Location:||United States -- Indiana|
|Source:||DAI-A 75/04(E), Dissertation Abstracts International|
|Subjects:||Music, Statistics, Information science|
|Keywords:||Acoustics and statistics, Expectation maximization, Kalman smoothing, Music informatics, Phase estimation, Source separation|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved