JYVÄSKYLÄ STUDIES IN HUMANITIES
243
Pasi Saari
Music Mood Annotation
Using Semantic Computing
and Machine Learning
JYVÄSKYLÄ STUDIES IN HUMANITIES 243
Pasi Saari
Music Mood Annotation
Using Semantic Computing
and Machine Learning
Esitetään Jyväskylän yliopiston humanistisen tiedekunnan suostumuksella
julkisesti tarkastettavaksi yliopiston Historica-rakennuksen salissa H320
tammikuun 31. päivänä 2015 kello 12.
Academic dissertation to be publicly discussed, by permission of
the Faculty of Humanities of the University of Jyväskylä,
in building Historica, hall H320, on January 31, 2015 at 12 o’clock noon.
UNIVERSITY OF JYVÄSKYLÄ
JYVÄSKYLÄ 2015
Editors
Tuomas Eerola
Department of Music, University of Jyväskylä
Pekka Olsbo, Ville Korkiakangas
Publishing Unit, University Library of Jyväskylä
Jyväskylä Studies in Humanities
Editorial Board
Editor in Chief Heikki Hanka, Department of Art and Culture Studies, University of Jyväskylä
Petri Karonen, Department of History and Ethnology, University of Jyväskylä
Paula Kalaja, Department of Languages, University of Jyväskylä
Petri Toiviainen, Department of Music, University of Jyväskylä
Tarja Nikula, Centre for Applied Language Studies, University of Jyväskylä
Raimo Salokangas, Department of Communication, University of Jyväskylä
URN:ISBN:978-951-39-6074-2
ISBN 978-951-39-6074-2 (PDF)
ISSN 1459-4331
ISBN 978-951-39-6073-5 (nid.)
ISSN 1459-4323
Copyright © 2014, by University of Jyväskylä
Jyväskylä University Printing House, Jyväskylä 2014
ABSTRACT
Saari, Pasi
Music Mood Annotation Using Semantic Computing and Machine Learning
Jyväskylä: University of Jyväskylä, 2015, 58 p. (+ included articles)
(Jyväskylä Studies in Humanities
ISSN 1456-5390; 243)
ISBN 978-951-39-6073-5 (nid.)
ISBN 978-951-39-6074-2 (PDF)
Finnish summary
Diss.
The main appeal of music lies in its capability to express moods, so mood-based
music discovery and management is highly beneficial. Online music services enable
access to wide music catalogues, and social and editorial tagging produces
large amounts of semantic information on music mood. However, tag data are
not as reliable at representing moods expressed by music as self-report data
obtained from listening experiments. The primary aim of the present work was to
examine computational methods for enhancing the existing mood tag data and
to enable automatic annotation of music according to the expressed mood. Semantic
computing based on large-scale tag data aims to improve the accuracy of tags
at representing moods, and a novel technique called Affective Circumplex
Transformation (ACT) was proposed for this purpose. Improvements to the
generalizability of audio-based mood annotation performance were sought using
audio feature selection and a proposed technique termed Semantic Layer
Projection (SLP), which efficiently incorporates large-scale tag data. Moreover, a
genre-adaptive technique was proposed to take into account genre-specific
aspects of music mood in audio-based annotation. Performance evaluation of the
techniques was carried out using social and editorial tags, listener ratings, and
large corpora representing popular and production music. ACT led to clear
improvements in accuracy compared to raw tag data and conventional semantic
analysis techniques. Moreover, ACT models could be generalized across tag
types and different music corpora. Audio-based annotation results showed that
exploiting tags and semantic computing using SLP can lead to similar or even
higher performance than tag-based mood inference. Adapting both the semantic
mood models and audio-based models to different genres led to further
improvements, especially in terms of the valence dimension.
Keywords: music mood annotation, music emotion recognition, social tags,
editorial tags, circumplex model, feature selection, genre-adaptive,
semantic computing, audio feature extraction
Author       Pasi Saari
             Department of Music
             University of Jyväskylä, Finland

Supervisors  Professor Tuomas Eerola
             Department of Music
             Durham University, United Kingdom

             Doctor Olivier Lartillot
             Department for Architecture, Design and Media Technology
             Aalborg University, Denmark

Reviewers    Professor Dan Ellis
             Department of Electrical Engineering
             Columbia University, USA

             Associate Professor George Tzanetakis
             Department of Computer Science
             University of Victoria, Canada

Opponent     Professor Björn W. Schuller
             Chair of Complex Systems Engineering
             University of Passau, Germany
             Senior Lecturer at Department of Computing
             Imperial College London, United Kingdom
ACKNOWLEDGMENTS
During the time working on this thesis, I was fortunate to be part of the Finnish
Centre of Excellence in Interdisciplinary Music Research at the University of
Jyväskylä. The team has been a great example of a group of innovative, hard-working
and talented people who join together on a common mission to build up an
internationally renowned research center. I think this is something far beyond what
one could expect to suddenly emerge from a small town somewhere in the north.
The results speak for themselves, and I feel humbled to have contributed even a
little to that story through this work.
First and foremost, I am grateful to my supervisor Professor Tuomas Eerola
for having confidence in me, for acting as a warm and caring mentor, and
for being a perfect collaborator – both inspiring and inspired. I felt it was
thoroughly satisfying how these multiple roles switched organically and supported
my work. Thanks for supporting me in times of trouble, for celebrating some
academic victories with me, and for retaining an encouraging attitude during
the more independent phases of my work. My sincere gratitude goes to Doctor
Olivier Lartillot, my second supervisor, first of all for accepting my application
to the doctoral student position at the CoE and for being a pleasant office mate.
I benefited greatly from your expertise. Thanks for introducing me to MIRtoolbox
and for allowing me to contribute my small add-ons as well. Many thanks to
Professor Petri Toiviainen for being a true role model as a research leader, for being
a dependable source of advice, and for having confidence in my work. Also
thanks for agreeing to be the custos at my defence.
I want to express my gratitude to my external readers Professor Dan Ellis
and Associate Professor George Tzanetakis. I was privileged to receive
the attention of two such high-level experts whose work in MIR receives my
utmost admiration. Thanks for all the positive and inspiring feedback, and for
suggestions that I did my best to properly acknowledge in the final version of
the thesis summary. Thank you Professor Björn Schuller for taking the time and
lending your insights as my opponent at my defence.
My warm thanks go to the people at the Department of Music at the University
of Jyväskylä. Thank you Markku Pöyhönen for making sure, so capably, that
everything runs smoothly in the backstage of the research arenas and for your
devotion to making the lives of everyone around you more enjoyable. Thank you
Suvi Saarikallio and Geoff Luck for teaching me various topics important for this
dissertation during my undergraduate studies in the MMT, and for being nice
colleagues thereafter. Thanks to my fellow, present or former, doctoral students,
including Birgitta Burger, Anemone Van Zijl, Martín Hartmann, Henna Peltola,
Marc Thompson, Rafael Ferrer, Jonna Vuoskoski, Vinoo Alluri, and Iballa Burunat,
for the friendship and the many lunch, coffee, and other breaks. Thanks
to Mikko Leimu, Mikko Myllykoski, Tommi Himberg, Jaakko Erkkilä, Esa Ala-Ruona,
Jörg Fachner, and Marko Punkanen for being integral to forming a fine
research and teaching environment. Finally, thanks to Riitta Rautio and Erkki
Huovinen for handling many practical issues regarding the completion of this
thesis.
During this work I had the opportunity to pay a research visit to the Centre
for Digital Music at Queen Mary University of London. The three months spent
there played a pivotal role in the preparation of many of the publications included
in this thesis. My gratitude goes especially to György Fazekas and Mathieu Barthet
for the fruitful collaboration, for including me in the activities of their research
projects, and for their sincere kindness. Thank you Mark Plumbley and
Mark Sandler for supporting my stay. I also want to express special thanks to
Yading Song, Katerina Kosta, and Ioana Dalca for making my stay much more
pleasant. During the later phase of the thesis preparation, I was fortunate to be
hired as a Principal Researcher at Nokia Technologies. This appointment gave
me useful insights into how MIR research can be applied in industry to solve
real-world problems on a large scale. I want to thank my colleagues at Nokia,
especially Jussi Leppänen, Arto Lehtiniemi, Antti Eronen, and Ville-Veikko Mattila,
for their supportive and inspiring attitude, and for introducing me to the intriguing
world of professional IPR work.
Unquestionably, my deepest gratitude goes to my wife Anna Pehkoranta.
Your scholarly endeavors first stirred up in me the desire to pursue a research
career. Most importantly, your love is the engine under my hood and my main
source of inspiration. Although my research journey has several times taken me
to distant places, my home remains wherever you are. I also want to thank my
parents and family for their love and support over the years, and my late
grandmother Leena Turunen for exemplifying a thoroughly humane approach to
dealing with life’s challenges. I dedicate this work to her memory.
Durham, UK, 7 January 2015
Pasi Saari
CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
CONTENTS
LIST OF PUBLICATIONS
1 INTRODUCTION ............................................................................ 9
2 MUSIC MOOD ................................................................................ 12
   2.1 Definition ................................................................................ 12
   2.2 Models .................................................................................... 13
      2.2.1 Categorical Model ......................................................... 13
      2.2.2 Dimensional Model ....................................................... 14
3 MUSIC MOOD ANNOTATION ......................................................... 16
   3.1 Self-reports .............................................................................. 16
      3.1.1 Listener Ratings ............................................................. 16
      3.1.2 Editorial Tags ................................................................ 18
      3.1.3 Social Tags .................................................................... 19
   3.2 Semantic Analysis of Tags ......................................................... 21
      3.2.1 Techniques .................................................................... 21
      3.2.2 Uncovering Structured Representations of Music Mood .... 23
   3.3 Audio-based Annotation ........................................................... 24
      3.3.1 Audio Feature Extraction ................................................ 24
      3.3.2 Machine Learning .......................................................... 26
         3.3.2.1 Exploiting Listener Ratings .................................. 26
         3.3.2.2 Exploiting Tags ................................................... 29
4 AIMS OF THE STUDIES ................................................................... 31
5 MATERIALS, METHODS, AND RESULTS ......................................... 34
   5.1 Improving the Generalizability using Feature Selection (I) ............ 34
   5.2 Incorporating Tag Data as Information Sources (II–III) ................. 35
   5.3 Exploiting Tag Data for Audio-based Annotation (IV–VI) ............. 37
6 CONCLUSIONS .............................................................................. 42
   6.1 Methodological Considerations ................................................. 44
   6.2 Future Directions ...................................................................... 45
TIIVISTELMÄ ......................................................................................... 47
REFERENCES ......................................................................................... 49
INCLUDED PUBLICATIONS