Analyzing the Relationship Among Audio Labels Using
Hubert-Arabie adjusted Rand Index
Kwan Kim
Submitted in partial fulfillment of the requirements for the
Master of Music in Music Technology
in the Department of Music and Performing Arts Professions
in The Steinhardt School
New York University
Advisor: Dr. Juan P. Bello
Reader: Dr. Kenneth Peacock
Date: 2012/12/11
Copyright © 2012 Kwan Kim
Abstract
With the advent of advanced technology and instant access to the Internet,
music databases have grown rapidly, requiring more efficient ways of
organizing and providing access to music. A number of automatic
classification algorithms have been proposed in the field of music
information retrieval (MIR) by means of supervised learning, in which
ground truth labels are imperative. The goal of this study is to analyze
the statistical relationship among audio labels such as era, emotion,
genre, instrument, and origin, using the Million Song Dataset and the
Hubert-Arabie adjusted Rand Index, in order to observe whether there is a
significant correlation between these labels. It is found that the cluster
validation is low among audio labels, which implies no strong correlation
and not enough co-occurrence between these labels when describing songs.
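
As an illustration of the index named above, the following is a minimal
sketch of the Hubert-Arabie adjusted Rand Index in its standard
pair-counting form, assuming two flat labelings of the same set of songs.
The function name and the toy label lists are hypothetical and are not
taken from the implementation used in this study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand Index between two labelings (sketch).

    Pair counts follow the usual 2 x 2 contingency layout:
      a: pairs placed together in both labelings
      b: pairs together in labels_a only
      c: pairs together in labels_b only
      d: pairs placed apart in both labelings
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)

    # Pairs that share a cluster in both labelings.
    joint = Counter(zip(labels_a, labels_b))
    a = sum(comb(count, 2) for count in joint.values())

    # Pairs that share a cluster in each labeling separately.
    same_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    same_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    b = same_a - a
    c = same_b - a
    d = total_pairs - a - b - c

    # Chance-agreement term shared by numerator and denominator.
    expected = (a + b) * (a + c) + (c + d) * (b + d)
    denominator = total_pairs ** 2 - expected
    if denominator == 0:
        # Only reached when both labelings are the same trivial partition
        # (all items together, or every item alone).
        return 1.0
    return (total_pairs * (a + d) - expected) / denominator


# Hypothetical toy labels for four songs.
genre = ["rock", "rock", "jazz", "jazz"]
era = ["90s", "90s", "60s", "70s"]
print(adjusted_rand_index(genre, era))
```

A value of 1 indicates identical partitions, while values near 0 indicate
agreement no better than chance, which is the sense in which low values
imply weak correspondence between two sets of labels.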
Acknowledgements
I would like to thank everyone involved in completing this thesis. I
especially send my deepest gratitude to my advisor, Juan P. Bello, for
keeping me motivated. His critiques and insights consistently pushed me to
become a better student. I also thank Mary Farbood for being such a
friendly mentor. It was a pleasure to work as her assistant for the past
year and a half. I thank the rest of the NYU faculty for providing an
opportunity and an excellent program of study. Lastly, I thank my family
and wife for their support and love.
Contents
List of Figures
List of Tables
1 Introduction
2 Literature Review
2.1 Music Information Retrieval
2.2 Automatic Classification
2.2.1 Genre
2.2.2 Emotion
3 Methodology
3.1 Data Statistics
3.2 Filtering
3.2.1 1st Filtering
3.2.2 2nd Filtering
3.2.3 3rd Filtering
3.2.3.1 Co-occurrence
3.2.3.2 Hierarchical Structure
3.2.4 4th Filtering
3.2.4.1 Term Frequency
3.2.5 5th Filtering
3.3 Audio Labels
3.3.1 Era
3.3.2 Emotion
3.3.3 Genre
3.3.4 Instrument
3.3.5 Origins
3.4 Audio Features
3.4.1 k-means Clustering Algorithm
3.4.2 Feature Matrix
3.4.3 Feature Scale
3.4.4 Feature Clusters
3.5 Hubert-Arabie adjusted Rand Index
4 Evaluation and Discussion
4.1 K vs. ARI_HA
4.2 Hubert-Arabie adjusted Rand Index (revisited)
4.3 Cluster Structure Analysis
4.3.1 Neighboring Clusters vs. Distant Clusters
4.3.2 Correlated Terms vs. Uncorrelated Terms
5 Conclusion and Future Work
References
List of Figures
1.1 System Diagram of a Generic Automatic Classification Model
2.1 System Diagram of a Genre Classification Model
2.2 System Diagram of a Music Emotion Recognition Model
2.3 Thayer’s 2-Dimensional Emotion Plane (19)
3.1 Clusters
3.2 Co-occurrence - same level
3.3 Co-occurrence - different level
3.4 Hierarchical Structure (Terms)
3.5 Intersection of Labels
3.6 Era Histogram
3.7 Emotion Histogram
3.8 Genre Histogram
3.9 Instrument Histogram
3.10 Origin Histogram
3.11 Elbow Method
3.12 Content-based Cluster Histogram
4.1 K vs. ARI_HA
4.2 Co-occurrence between feature clusters and era clusters
4.3 Co-occurrence between feature clusters and emotion clusters
4.4 Co-occurrence between feature clusters and genre clusters
4.5 Co-occurrence between feature clusters and instrument clusters
4.6 Co-occurrence between feature clusters and origin clusters
4.7 Co-occurrence between era clusters and feature clusters
4.8 Co-occurrence between emotion clusters and feature clusters
4.9 Co-occurrence between genre clusters and feature clusters
4.10 Co-occurrence between instrument clusters and feature clusters
4.11 Co-occurrence between origin clusters and feature clusters
List of Tables
3.1 Overall Data Statistics
3.2 Field List
3.3 Terms
3.4 Labels
3.5 Clusters
3.6 Hierarchical Structure (Clusters)
3.7 Hierarchical Structure (µ and σ)
3.8 Mutually Exclusive Clusters
3.9 Filtered Dataset
3.10 Era Statistics
3.11 Emotion Terms
3.12 Genre Terms
3.13 Instrument Statistics
3.14 Origin Terms
3.15 Audio Features
3.16 Cluster Statistics
3.17 2 x 2 Contingency Table
4.1 ARI_HA
4.2 Term Co-occurrence
4.3 Term Co-occurrence
4.4 Optimal Cluster Validation
4.5 Self-similarity Matrix
4.6 Neighboring Clusters
4.7 Distant Clusters
4.8 Term Correlation
4.9 Term Correlation
4.10 Term Vectors
4.11 Label Cluster Distance
4.12 Label Cluster Distance