This dissertation addresses the problem of automatically discovering structure in music from audio signals by introducing novel approaches and proposing perceptually enhanced evaluations. First, the problem of music structure analysis is reviewed from the perspectives of music information retrieval (MIR) and music perception and cognition (MPC), including a discussion of the limitations and current challenges in both disciplines. In discussing the existing methods for evaluating the outputs of algorithms that discover musical structure, a transparent open-source software package called mir_eval, which contains implementations of these evaluations, is introduced. Then, four MIR algorithms are presented: one that compresses music recordings into audible summaries, another that discovers musical patterns from an audio signal, and two that identify the large-scale, non-overlapping segments of a musical piece. After discussing these techniques, and given the differences in how listeners perceive musical structure, the idea of applying more MPC-oriented approaches is considered in order to obtain perceptually relevant evaluations for music segmentation. A methodology for automatically identifying the tracks that are most difficult for machines to annotate is presented, so that they can be included in the design of a human study to collect multiple human annotations. To select these tracks, a novel open-source framework called the Music Structural Analysis Framework (MSAF) is introduced. This framework contains the most relevant music segmentation algorithms and uses mir_eval to evaluate them transparently. Moreover, MSAF makes use of the JSON Annotated Music Specification (JAMS), a new format that stores multiple annotations for several tasks in a single file, which simplifies dataset design and the analysis of agreement across different human references. The human study to collect these additional annotations (stored in JAMS files) is then described, in which five new annotations for fifty tracks are collected.
Finally, these additional annotations are analyzed, confirming the problem of using ground-truth datasets with a single annotator per track, given the high degree of disagreement among annotators on the challenging tracks. To alleviate this, the annotations are merged to produce a more robust human reference annotation. Lastly, the standard F-measure of the hit-rate measure for evaluating music segmentation is analyzed for the case where additional annotations are not available, and it is shown, via multiple human studies, that precision seems more perceptually relevant than recall.
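The hit-rate measure mentioned above can be illustrated with a short sketch. This is not the dissertation's code: it is a minimal, greedy implementation of the common boundary hit-rate evaluation, assuming boundaries are given as times in seconds and that an estimated boundary counts as a hit when it falls within a tolerance window (commonly 0.5 s) of an as-yet-unmatched reference boundary. Production implementations such as mir_eval use an optimal (bipartite) matching rather than this greedy pass.

```python
def hit_rate(reference, estimated, window=0.5):
    """Return (precision, recall, f_measure) for two lists of segment
    boundary times in seconds, using a +/- `window` tolerance.

    Sketch only: greedy one-to-one matching, not mir_eval's optimal matching.
    """
    matched = 0
    used = set()  # indices of reference boundaries already matched
    for est in estimated:
        for i, ref in enumerate(reference):
            if i not in used and abs(est - ref) <= window:
                used.add(i)
                matched += 1
                break  # each estimated boundary matches at most once
    precision = matched / len(estimated) if estimated else 0.0
    recall = matched / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)
    return precision, recall, f_measure
```

For example, with reference boundaries `[0.0, 15.2, 48.7, 90.1]` and estimates `[0.1, 15.0, 60.0]`, two of the three estimates are hits, giving a precision of 2/3 and a recall of 1/2. The asymmetry between the two quantities is exactly what the final human studies probe: over-segmentation hurts precision, under-segmentation hurts recall.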
|Advisor:||Farbood, Morwaread Mary; Bello, Juan Pablo|
|Committee:||Casey, Michael; Jehan, Tristan; Ruthmann, Stephen Alexander|
|School:||New York University|
|Department:||Music and Performing Arts Professions|
|School Location:||United States -- New York|
|Source:||DAI-B 76/10(E), Dissertation Abstracts International|
|Subjects:||Music, Information Technology, Electrical engineering|
|Keywords:||Music information retrieval, Music perception, Music structure, Music cognition, Signal processing|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved