Dissertation/Thesis Abstract

Discovering structure in music: Automatic approaches and perceptual evaluations
by Nieto, Oriol, Ph.D., New York University, 2015, 266; 3705329
Abstract (Summary)

This dissertation addresses the problem of the automatic discovery of structure in music from audio signals by introducing novel approaches and proposing perceptually enhanced evaluations. First, the problem of music structure analysis is reviewed from the perspectives of music information retrieval (MIR) and music perception and cognition (MPC), including a discussion of the limitations and current challenges in both disciplines. When discussing the existing methods of evaluating the outputs of algorithms that discover musical structure, a transparent open source software called mir eval, which contains implementations to these evaluations, is introduced. Then, four MIR algorithms are presented: one to compress music recordings into audible summaries, another to discover musical patterns from an audio signal, and two for the identification of the large-scale, non-overlapping segments of a musical piece. After discussing these techniques, and given the differences when perceiving the structure of music, the idea of applying more MPC-oriented approaches is considered to obtain perceptually relevant evaluations for music segmentation. A methodology to automatically obtain the most difficult tracks for machines to annotate is presented in order to include them in a design of a human study to collect multiple human annotations. To select these tracks, a novel open source framework called music structural analysis framework (MSAF) is introduced. This framework contains the most relevant music segmentation algorithms and it uses mir eval to transparently evaluate them. Moreover, MSAF makes use of the JSON annotated music specification (JAMS), a new format to contain multiple annotations for several tasks in a single file, which simplifies the dataset design and the analysis of agreement across different human references. The human study to collect additional annotations (which are stored in JAMS files) is described, where five new annotations for fifty tracks are stored. Finally, these additional annotations are analyzed, confirming the problem of having ground-truth datasets with a single annotator per track due to the high degree of disagreement among annotators for the challenging tracks. To alleviate this, these annotations are merged to produce a more robust human reference annotation. Lastly, the standard F-measure of the hit rate measure to evaluate music segmentation is analyzed when access to additional annotations is not possible, and it is shown, via multiple human studies, that precision seems more perceptually relevant than recall.

Indexing (document details)
Advisor: Farbood, Morwaread Mary, Pablo Bello, Juan
Commitee: Casey, Micahel, Jehan, Tristan, Ruthman, Stephen Alexander
School: New York University
Department: Music and Performing Arts Professions
School Location: United States -- New York
Source: DAI-B 76/10(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Music, Information Technology, Electrical engineering
Keywords: Music information retrieval, Music perception, Music structure, Musis cognition, Signal processing
Publication Number: 3705329
ISBN: 9781321782455
Copyright © 2019 ProQuest LLC. All rights reserved. Terms and Conditions Privacy Policy Cookie Policy
ProQuest