Identifying Stable Patterns over Time for
Emotion Recognition from EEG
Wei-Long Zheng, Jia-Yi Zhu, and Bao-Liang Lu*, Senior Member, IEEE
Abstract—In this paper, we investigate stable patterns of electroencephalogram (EEG) over time for emotion recognition using a machine learning approach. Up to now, various findings of activated patterns associated with different emotions have been reported. However, their stability over time has not been fully investigated yet. In this paper, we focus on identifying EEG stability in emotion recognition. To validate the efficiency of the machine learning algorithms used in this study, we systematically evaluate the performance of various popular feature extraction, feature selection, feature smoothing and pattern classification methods with the DEAP dataset and a newly developed dataset for this study. The experimental results indicate that stable patterns exhibit consistency across sessions; the lateral temporal areas activate more for positive emotion than for negative emotion in the beta and gamma bands; the neural patterns of neutral emotion have higher alpha responses at parietal and occipital sites; and for negative emotion, the neural patterns have significantly higher delta responses at parietal and occipital sites and higher gamma responses at prefrontal sites. The performance of our emotion recognition system shows that the neural patterns are relatively stable within and between sessions.

Index Terms—Affective Computing, Affective Brain-Computer Interaction, Emotion Recognition, EEG, Stable EEG Patterns, Machine Learning

• Wei-Long Zheng, Jia-Yi Zhu and Bao-Liang Lu are with the Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, and the Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China. E-mail: {weilonglive, zhujiayi1991}@gmail.com, [email protected]. *Corresponding author.
1 INTRODUCTION

EMOTION plays an important role in human communication and decision making. Although emotions seem natural to us in our daily life, we have little knowledge of the mechanisms of the emotional functions of the brain or of how to model human emotion. In recent years, research on emotion recognition from EEG has attracted great interest from a vast number of interdisciplinary fields, from psychology to engineering, including basic studies on emotion theories and applications to affective Brain-Computer Interaction (aBCI) [1], which enhances BCI systems with the ability to detect, process, and respond to users' affective states using physiological signals.

Although much progress in the theories, methods and experiments that support affective computing has been made in the past several years, the problem of detecting and modeling human emotions in aBCI remains largely unexplored [1]. Emotion recognition is critical because computers can never respond to users' emotional states without recognizing human emotions. However, emotion recognition from EEG is very challenging due to the fuzzy boundaries and individual variations of emotion. In addition, we cannot obtain the 'ground truth' of human emotions in theory, that is, the true label of EEG corresponding to different emotional states, because emotion is considered a function of time, context, space, language, culture, and race [2].

Many previous studies focus on subject-dependent and subject-independent patterns and evaluations for emotion recognition. However, the stable patterns and the performance of models over time have not been fully explored, although they are very important for real-world applications. Stable EEG patterns are considered to be neural activities, such as critical brain areas and critical frequency bands, that share commonality across individuals and sessions under different emotional states. Although task-related EEG is sensitive to change due to different cognitive states and environmental variables [3], we intuitively expect that the stable patterns for specific tasks should exhibit consistency among repeated sessions of the same subjects. In this paper, we focus on the following issues of EEG-based emotion recognition: What is the capability of EEG signals for discriminating different emotions? Are there any stable EEG patterns of neural oscillations or brain regions for representing emotions? What is the performance of models based on machine learning approaches from day to day?

The main contributions of this paper to emotion recognition from EEG can be summarized as follows: 1) We have developed a novel emotion EEG dataset as a subset of SEED (SJTU Emotion EEG Dataset), which will be publicly available for research to evaluate stable patterns across subjects and sessions. To the
best of our knowledge, there is no publicly available EEG dataset for analyzing the stability of neural patterns in emotion recognition. 2) We carry out a systematic comparison and a qualitative evaluation of different feature extraction, feature selection, feature smoothing and pattern classification methods on a publicly available EEG dataset, DEAP, and our own dataset, SEED. 3) We adopt the discriminative Graph regularized Extreme Learning Machine (GELM) to identify stable patterns over time and evaluate the stability of our emotion recognition model with cross-session schemes. 4) Our experimental results reveal that neural signatures for three emotions (positive, neutral and negative) do exist, and that the EEG patterns at critical frequency bands and brain regions are relatively stable within and between sessions.

The layout of the paper is as follows. In Section 2, we give a brief overview of related work on emotion recognition from EEG, as well as the findings of stable patterns for different emotions. Section 3 presents the motivation and rationale for our emotion experimental setting, together with a detailed explanation of all the materials and protocol we used. A systematic description of the brain signal analysis methods and the classification procedure for feature extraction, dimensionality reduction and classification is given in Section 4. In Section 5, a systematic evaluation of different methods is conducted on the DEAP dataset and our SEED dataset; we use time-frequency analysis to find the neural signatures and stable patterns for different emotions, and we evaluate the stability of our emotion recognition models over time. Finally, we conclude the paper.

2 RELATED WORK

2.1 Emotion Recognition Methods

In the field of affective computing, a vast number of studies have been conducted on emotion recognition based on different signals. Many efforts have been made to recognize emotions using external expressions, such as facial expression [15], speech [16], and gestures [17]. However, sometimes the emotional states remain internal and cannot be detected by external expression. Consider extreme cases in which people do not say anything although they are actually angry, or even smile during negative emotional states due to social masking [18]. In these cases, the external emotional expression can be controlled subjectively, and these external cues for emotion recognition may be inadequate.

Emotion is an experience that is associated with a particular pattern of physiological activity of the central nervous system and the autonomic nervous system. Compared with audiovisual expression, it is more objective and straightforward to recognize emotions using physiological activities, which can be recorded by noninvasive sensors, mostly as electrical signals. These measures include skin conductivity, electrocardiogram, electromyogram, electrooculogram, and EEG. A detailed review of emotion recognition methods can be found in [19].

With the fast development of micro-nano technologies and embedded systems, it is now conceivable to port aBCI systems from the laboratory to real-world environments. Many advanced dry electrodes and embedded systems have been developed to address the wearability, portability, and practical use of these systems in real-world applications [20], [21]. Various studies in the affective computing community try to build computational models that estimate emotional states based on EEG features. A brief summary of emotion recognition using EEG is presented in Table 1. These studies show the efficiency and feasibility of building computational models for emotion recognition using EEG. In these studies, the stimuli used in the emotion recognition experiments include still images, music and videos, and the emotions evaluated in most studies are discrete.

2.2 EEG Patterns Associated with Emotions

One of the goals of affective neuroscience is to examine whether patterns of brain activity for specific emotions exist, and whether these patterns are to some extent common across individuals. Various studies have examined the neural correlates of emotions. It seems that there do not exist dedicated processing modules for specific emotions; however, there may exist neural signatures of specific emotions in the form of distributed patterns of brain activity [22]. Mauss and Robinson [23] proposed that an emotional state is likely to involve circuits rather than any brain region considered in isolation. For affective computing researchers, identifying neural patterns that are both common across subjects and stable across sessions can provide valuable information for emotion recognition from EEG.

Cortical activity in response to emotional cues has been related to the lateralization effect. Muller et al. [24] reported increased gamma (30-50 Hz) power for negative valence over the left temporal region. Davidson et al. [25], [26] showed that frontal EEG asymmetry is hypothesized to relate to approach and withdrawal emotions, with heightened approach tendencies reflected in left frontal activity and heightened withdrawal tendencies reflected in relative right-frontal activity. Nie et al. [27] reported that the subject-independent features associated with positive and negative emotions are mainly in the right occipital lobe and parietal lobe for the alpha band, the central site for the beta band, and the left frontal lobe and right temporal lobe for the gamma band. Balconi et al. [28] found that frequency band modulations are affected by valence and arousal ratings, with an increased response for high-arousing and negative or positive stimuli in comparison with low-arousing and neutral stimuli.
TABLE 1
Various studies on emotion classification using EEG and the best performance reported in each study

Study | Stimuli | #Chan. | Method Description | Emotion states | Accuracy | Pattern Study
[4] | IAPS, IADS | 3 | Power of alpha and beta, then PCA; 5 subjects; classification with FDA | Valence and arousal | Valence: 92.3%, arousal: 92.3% | ×
[5] | IAPS | 2 | Amplitudes of four frequency bands; 17 subjects; evaluated KNN, Bagging | Valence (12), arousal (12) and dominance (12) | Valence: 74%, arousal: 74%, and dominance: 75% | ×
[6] | Video | 62 | Wavelet features of alpha, beta and gamma; 20 subjects; classification with KNN and LDA | Disgust, happy, surprise, fear and neutral | 83.26% | ×
[7] | Music | 24 | Power spectral density and asymmetry features of five frequency bands; 26 subjects; evaluated SVM | Joy, anger, sadness, and pleasure | 82.29% | √
[8] | IAPS | 8 | Spectral power features; 11 subjects; KNN | Positive, negative and neutral | 85% | ×
[9] | IAPS | 4 | Asymmetry index of alpha and beta power; 16 subjects; SVM | Four quadrants of the valence-arousal space | 94.4% (subject-dependent), 62.58% (subject-independent) | ×
[10] | Video | 32 | Spectral power features of five frequency bands; 32 subjects; Gaussian naive Bayes classifier | Valence (2), arousal (2) and liking (2) | Valence: 57.6%, arousal: 62% and liking: 55.4% | ×
[11] | Music | 14 | Time-frequency (TF) analysis; 9 subjects; KNN, QDA and SVM | Like and dislike | 86.52% | √
[12] | Video | 32 | Power spectral density features of five frequency bands; modality fusion with eye tracking; 24 subjects; SVM | Valence (3) and arousal (3) | Valence: 68.5%, arousal: 76.4% | ×
[13] | Video | 62 | Power spectrum features, wavelet features, nonlinear dynamical features; 6 subjects; SVM | Positive and negative | 87.53% | √
[14] | IAPS | 64 | Higher Order Crossings, Higher Order Spectra and Hilbert-Huang Spectrum features; 16 subjects; QDA | Happy, curious, angry, sad, quiet | 36.8% | √

IAPS and IADS denote the International Affective Picture System and the International Affective Digital Sounds, respectively. The numbers given in parentheses denote the numbers of categories for each dimension. Pattern study denotes revealing neural activities (critical brain areas and critical frequency bands) that share commonality across subjects or sessions. Classifiers include K Nearest Neighbors (KNN), Fisher's Discriminant Analysis (FDA), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Bagging, and Support Vector Machine (SVM).
For EEG-based emotion recognition, subject-dependent and subject-independent schemes are commonly used for evaluating the performance of emotion recognition systems. As shown in Table 1, some findings on activated patterns, such as critical channels and oscillations associated with different emotions, have been reported. However, a major limitation is that these studies extract activated patterns only across subjects and do not consider the time factor.

Studies of the internal consistency and test-retest stability of EEG date back many years [3], [29], [30], especially for clinical applications [31]. McEvoy et al. [3] proposed that, under appropriate conditions, task-related EEG has sufficient retest reliability for use in assessing clinical changes. However, these previous studies investigated the stability of EEG features under different conditions, for example, a working memory task [3]. Moreover, in these studies, stability and reliability are often quantified using statistical parameters such as intraclass correlation coefficients [30], instead of the performance of pattern classifiers.

So far, only a few preliminary studies on the stability and reliability of neural patterns for emotion recognition have been conducted. Lan et al. [32] presented a pilot study of the stability of features in emotion recognition algorithms. However, in their stability assessment, the same features derived from the same channel for the same emotion class of the same subject were grouped together to compute the correlation coefficients. Furthermore, their experiments were conducted on a small group of subjects with 14-channel EEG signals. They investigated the stability of each feature rather than the neural patterns that we focus on in this paper.

Up to now, there has been no systematic evaluation of the stability of activated patterns over time. The performance of emotion recognition systems over time is still an unsolved problem for developing real-world application systems. Therefore, our major aim in this paper is to investigate the stable
EEG patterns over time using time-frequency analysis and machine learning approaches. Here, we want to emphasize that we do not study neural patterns under emotion regulation [33], but rather those of specific emotional states at different times.

To investigate various critical problems of emotion recognition from EEG, we face a serious lack of publicly available emotional EEG datasets. To the best of our knowledge, the only publicly available emotional EEG datasets are MAHNOB HCI [12] and DEAP [10]. The first one includes the EEG, physiological signals, eye gaze, audio, and facial expressions of 30 people watching 20 emotional videos. The DEAP dataset includes the EEG and peripheral physiological signals of 32 participants watching 40 one-minute music videos. It also contains the participants' ratings of each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. However, these datasets do not contain EEG data from different sessions of the same subject, so they cannot be used to investigate stable patterns over time. Since there is no published EEG dataset available for analyzing the stability of neural patterns in emotion recognition, we develop a new emotional EEG dataset for this study as a subset of SEED¹.

1. http://bcmi.sjtu.edu.cn/~seed/

3 EMOTION EXPERIMENT DESIGN

In order to investigate the neural signatures of different emotions and stable patterns over time, we design a new emotion experiment to collect EEG data; this design differs from those of other existing publicly available datasets. In our experiments, the same subject performs the emotion experiments three times, with an interval of one week or longer. We choose film clips as emotion elicitation materials. These emotional films contain both scenes and audio, which can expose subjects to more realistic scenarios and elicit strong subjective and physiological changes.

In our emotion experiments, Chinese film clips are used, considering that native cultural factors may affect elicitation in emotion experiments [34]. In a preliminary study, we manually selected a pool of emotional film clips from famous Chinese films. Twenty participants were asked to assess their emotions while watching the selected film clips with scores (1-5) and keywords (positive, neutral and negative). The criteria for selecting film clips were as follows: (a) the length of the whole experiment should not be too long, to avoid subject visual fatigue; (b) the videos should be understandable without explanation; and (c) the videos should elicit a single desired target emotion. In the end, 15 Chinese film clips for positive, neutral and negative emotions, which received mean rating scores of 3 or higher from the participants, were chosen from the pool of materials. Each emotion has five film clips in one experiment. The duration of each film clip is about 4 minutes. Each film clip is well edited to create coherent emotion elicitation. The details of the film clips used in the experiments are listed in Table 2.

TABLE 2
Details of film clips used in our emotion experiment

No. | Labels | Film clip sources | #clips
1 | negative | Tangshan Earthquake | 2
2 | negative | Back to 1942 | 3
3 | positive | Lost in Thailand | 2
4 | positive | Flirting Scholar | 1
5 | positive | Just Another Pandora's Box | 2
6 | neutral | World Heritage in China | 5

Fifteen subjects (7 males and 8 females; mean age: 23.27, std: 2.37), different from those in the preliminary study, participated in the experiments. In order to investigate neural signatures and stable patterns across sessions and individuals, each subject was required to perform the experiments in three sessions, with a time interval of one week or longer between two sessions. All participants are native Chinese students from Shanghai Jiao Tong University with self-reported normal or corrected-to-normal vision and normal hearing. Before the experiments, the participants were informed about the experiment and instructed to sit comfortably, watch the forthcoming movie clips attentively without diverting their attention from the screen, and refrain as much as possible from overt movements.

Facial videos and EEG data were recorded simultaneously. EEG was recorded using an ESI NeuroScan System² at a sampling rate of 1000 Hz from a 62-channel active AgCl electrode cap according to the international 10-20 system. The layout of the EEG electrodes on the cap is shown in Fig. 1. The impedance of each electrode had to be less than 5 kΩ. The frontal face videos were recorded by a camera mounted in front of the subjects and encoded into AVI format at a frame rate of 30 frames per second and a resolution of 160×120.

2. http://www.neuroscan.com/

There are 15 trials in total in each experiment, with a 15 s hint before each clip and 10 s of feedback after each clip. For the feedback, participants were told to report their emotional reactions to each film clip by completing a questionnaire immediately after watching it. The questions follow Philippot [35]: (1) what they had actually felt in response to viewing the film clip; (2) how they felt at the specific time they were watching the film clip; (3) whether they had watched this movie before; and (4) whether they had understood the film clip. They also rated the intensity of subjective emotional arousal using a 5-point scale
according to what they actually felt during the task [36]. Fig. 2 shows the detailed protocol. For EEG signal processing, the raw EEG data are first downsampled to a 200 Hz sampling rate. To filter out noise and remove artifacts, the EEG data are then processed with a bandpass filter between 0.5 Hz and 70 Hz.

Fig. 1. The EEG cap layout for 62 channels.

Fig. 2. The protocol used in our emotion experiment.

4 METHODOLOGY

4.1 Feature Extraction

From our previous work [37], [38], we have found that the following six different features and electrode combinations are efficient for EEG-based emotion recognition: power spectral density (PSD), differential entropy (DE), differential asymmetry (DASM), rational asymmetry (RASM), asymmetry (ASM) and differential caudality (DCAU) features from EEG. As a result, we use these six different features in this study. For the five frequency bands, delta (1-3 Hz), theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz) and gamma (31-50 Hz), we compute the traditional PSD features using the Short-Time Fourier Transform (STFT) with a 1 s non-overlapping Hanning window. The differential entropy feature is defined as follows [37]:

h(X) = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \log\left(\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right) dx = \frac{1}{2} \log(2\pi e \sigma^2),   (1)

where X follows the Gaussian distribution N(\mu, \sigma^2), x is a variable, and \pi and e are constants. According to [37], in a certain band, DE is equivalent to the logarithmic power spectral density for a fixed-length EEG sequence.
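To make the extraction pipeline concrete, here is a minimal sketch of the PSD/DE computation described above, assuming EEG arrays of shape (n_channels, n_samples) sampled at 200 Hz; the function and constant names are ours, not from the authors' code. Under the Gaussian assumption of Eq. (1), the DE of a band reduces to (1/2) log(2πeσ²), so we estimate σ² from the mean STFT band power in each 1 s window.

```python
import numpy as np
from scipy.signal import stft

# Frequency bands from Section 4.1 (Hz)
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def de_features(eeg, fs=200):
    """DE per channel, band, and window: (n_channels, n_bands, n_windows).

    Uses a 1 s Hanning window with no overlap, as in Section 4.1. The
    mean band power stands in for sigma^2 in Eq. (1).
    """
    f, t, Z = stft(eeg, fs=fs, window="hann", nperseg=fs, noverlap=0)
    power = np.abs(Z) ** 2                      # (n_channels, n_freqs, n_windows)
    feats = []
    for lo, hi in BANDS.values():
        band = power[:, (f >= lo) & (f <= hi), :].mean(axis=1)
        feats.append(0.5 * np.log(2 * np.pi * np.e * band))
    return np.stack(feats, axis=1)
```

The log-PSD features come from the same band powers without the Gaussian-entropy scaling, which is why, as noted above, DE is equivalent to the logarithmic PSD up to constants for a fixed-length sequence.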
Because much evidence shows that lateralization between the left and right hemispheres is associated with emotions [26], we investigate asymmetry features. We compute differential asymmetry (DASM) and rational asymmetry (RASM) features as the differences and ratios between the DE features of 27 pairs of hemispheric asymmetry electrodes (Fp1-Fp2, F7-F8, F3-F4, FT7-FT8, FC3-FC4, T7-T8, P7-P8, C3-C4, TP7-TP8, CP3-CP4, P3-P4, O1-O2, AF3-AF4, F5-F6, F7-F8, FC5-FC6, FC1-FC2, C5-C6, C1-C2, CP5-CP6, CP1-CP2, P5-P6, P1-P2, PO7-PO8, PO5-PO6, PO3-PO4, and CB1-CB2) [37]. DASM and RASM can be expressed, respectively, as

DASM = DE(X_{left}) - DE(X_{right})   (2)

and

RASM = DE(X_{left}) / DE(X_{right}).   (3)

ASM features are the direct concatenation of the DASM and RASM features, included for comparison. In the literature, the patterns of spectral differences along frontal and posterior brain regions have also been explored [39]. To characterize spectral-band asymmetry with respect to caudality (in the frontal-posterior direction), we define DCAU features as the differences between the DE features of 23 pairs of frontal-posterior electrodes (FT7-TP7, FC5-CP5, FC3-CP3, FC1-CP1, FCZ-CPZ, FC2-CP2, FC4-CP4, FC6-CP6, FT8-TP8, F7-P7, F5-P5, F3-P3, F1-P1, FZ-PZ, F2-P2, F4-P4, F6-P6, F8-P8, FP1-O1, FP2-O2, FPZ-OZ, AF3-CB1, and AF4-CB2). DCAU is defined as

DCAU = DE(X_{frontal}) - DE(X_{posterior}).   (4)

The dimensions of the PSD, DE, DASM, RASM, ASM and DCAU features are 310 (62 electrodes × 5 bands), 310 (62 electrodes × 5 bands), 135 (27 electrode pairs × 5 bands), 135 (27 electrode pairs × 5 bands), 270 (54 electrode pairs × 5 bands), and 115 (23 electrode pairs × 5 bands), respectively.
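The asymmetry features then follow directly by indexing the per-channel DE arrays by electrode pairs. The sketch below uses small illustrative subsets of the pairs; the paper's features use all 27 hemispheric and 23 frontal-posterior pairs listed above.

```python
import numpy as np

# Illustrative subsets of the electrode pairs listed in Section 4.1.
LEFT, RIGHT = ["FP1", "F7", "F3", "T7", "P7", "O1"], ["FP2", "F8", "F4", "T8", "P8", "O2"]
FRONTAL, POSTERIOR = ["FT7", "FC5", "F7", "F3", "FZ"], ["TP7", "CP5", "P7", "P3", "PZ"]

def asymmetry_features(de, channels):
    """DASM (Eq. 2), RASM (Eq. 3), ASM, and DCAU (Eq. 4) from DE features.

    de: array (n_channels, n_bands, n_windows); channels: channel names
    ordered like the rows of de.
    """
    idx = {name: i for i, name in enumerate(channels)}
    pick = lambda names: de[[idx[c] for c in names]]
    dasm = pick(LEFT) - pick(RIGHT)               # Eq. (2)
    rasm = pick(LEFT) / pick(RIGHT)               # Eq. (3)
    asm = np.concatenate([dasm, rasm], axis=0)    # concatenation of DASM and RASM
    dcau = pick(FRONTAL) - pick(POSTERIOR)        # Eq. (4)
    return dasm, rasm, asm, dcau
```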
4.2 Feature Smoothing

Most existing approaches to emotion recognition from EEG may be suboptimal because they map EEG signals to static, discrete emotional states and do not take the temporal dynamics of the emotional state into account. In general, however, emotion should not be considered a discrete psychophysiological variable [40]. Here, we assume that the emotional state is defined in a continuous space and that emotional states change gradually. Our approach focuses on tracking the change of the emotional state over time
from EEG. In our approach, we introduce the dynamic characteristics of emotional changes into emotion recognition and investigate how the observed EEG is generated from a hidden emotional state. We apply the linear dynamic system (LDS) approach to filter out components that are not associated with emotional states [41], [42]. For comparison, we also evaluate the performance of the conventional moving average method.

To make use of the time dependency of emotional changes and further reduce the influence of emotion-unrelated EEG, we introduce the LDS approach to smooth the features. A linear dynamic system can be expressed as follows:

x_t = z_t + w_t,   (5)

z_t = A z_{t-1} + v_t,   (6)

where x_t denotes the observed variables, z_t denotes the hidden emotion variables, A is a transition matrix, w_t is Gaussian noise with mean \bar{w} and variance Q, and v_t is Gaussian noise with mean \bar{v} and variance R. These equations can also be expressed in an equivalent form in terms of Gaussian conditional distributions:

p(x_t | z_t) = N(x_t | z_t + \bar{w}, Q)   (7)

and

p(z_t | z_{t-1}) = N(z_t | A z_{t-1} + \bar{v}, R).   (8)

The initial state is assumed to be

p(z_1) = N(z_1 | \pi_0, S_0).   (9)

The above model is parameterized by \theta = {A, Q, R, \bar{w}, \bar{v}, \pi_0, S_0}, which can be determined by maximum likelihood through the EM algorithm [43] based on the observation sequence x_t. To infer the latent states z_t from the observation sequence x_t, the marginal distribution p(z_t | X) must be calculated. The latent state can be expressed as

z_t = E(z_t | X),   (10)

where E denotes expectation. This marginal distribution can be obtained using the message propagation method [43]. We use cross-validation to estimate the prior parameters.
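As a concrete point of reference, the sketch below substitutes a per-dimension Kalman filter with a Rauch-Tung-Striebel (RTS) backward pass for the full LDS: we fix the transition matrix to the identity (A = 1 per dimension, a random-walk latent state) and hand-set the noise variances q and r, whereas the paper fits θ by EM and chooses priors by cross-validation.

```python
import numpy as np

def lds_smooth(x, q=0.01, r=1.0):
    """Smooth a feature sequence x of shape (n_windows, n_features).

    Simplified stand-in for Eqs. (5)-(10): forward Kalman filter plus
    backward RTS smoother, with A = 1 and fixed noise variances.
    """
    T, d = x.shape
    mu, P = np.zeros((T, d)), np.zeros((T, d))
    m, p = x[0].copy(), np.full(d, r)        # initialize from the first sample
    mu[0], P[0] = m, p
    for t in range(1, T):                    # forward pass (filtering)
        p_pred = p + q                       # predicted variance
        k = p_pred / (p_pred + r)            # Kalman gain
        m = m + k * (x[t] - m)
        p = (1.0 - k) * p_pred
        mu[t], P[t] = m, p
    z = mu.copy()
    for t in range(T - 2, -1, -1):           # backward pass (RTS smoothing)
        g = P[t] / (P[t] + q)
        z[t] = mu[t] + g * (z[t + 1] - mu[t])
    return z
```

In the pipeline, lds_smooth would be applied to each trial's feature sequence before classification, playing the role of the marginal mean E(z_t|X) of Eq. (10).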
4.3 Dimensionality Reduction

We compute the initial EEG features based on signal analysis. However, some of the extracted features may be uncorrelated with emotional states and lead to performance degradation of the classifiers. Moreover, the high dimensionality of the features may make classifiers suffer from the 'curse of dimensionality' [44]. For real-world applications, dimensionality reduction can help to increase the speed and stability of the classifier. Hence, in this study, we compare two popular approaches: the principal component analysis (PCA) algorithm and the minimal redundancy maximal relevance (MRMR) algorithm [45].

The PCA algorithm uses an orthogonal transformation to project high-dimensional data into a low-dimensional space with minimal loss of information. This transformation is defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
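A minimal SVD-based sketch of this projection is given below (rows are samples); in practice a library implementation such as sklearn.decomposition.PCA serves the same purpose.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)                 # center so components capture variance
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T         # scores in the reduced space
```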
Although PCA can reduce the feature dimension, it cannot preserve the original domain information, such as channel and frequency, after the transformation. Hence, we also choose the minimal redundancy maximal relevance (MRMR) algorithm to select a feature subset from the initial feature set. The MRMR algorithm uses mutual information as the relevance measure, with a max-dependency criterion and a minimal-redundancy criterion. Max-Relevance searches for features satisfying (11), using the mean value of all mutual information values between each individual feature x_d and class c:

\max D(S, c),  D = \frac{1}{|S|} \sum_{x_d \in S} I(x_d; c),   (11)

where S represents the feature subset to be selected. When two features depend highly on each other, their respective class-discriminative power would not change much if one of them were removed. Therefore, the following minimal-redundancy condition can be added to select mutually exclusive features:

\min R(S),  R = \frac{1}{|S|^2} \sum_{x_{d_i}, x_{d_j} \in S} I(x_{d_i}, x_{d_j}).   (12)

The criterion combining the above two constraints is minimal-redundancy-maximal-relevance (MRMR), which can be expressed as

\max \phi(D, R),  \phi = D - R.   (13)

In practice, an incremental search method is used to find the near-optimal K features.
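The incremental search can be sketched as follows. The greedy start from the most relevant feature and the equal-width binning used to estimate feature-feature mutual information are our choices; the paper does not specify these details.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k, n_bins=10):
    """Greedy MRMR selection implementing Eqs. (11)-(13).

    X: (n_samples, n_features); y: class labels. Returns the indices of
    the k selected features.
    """
    n_features = X.shape[1]
    # Discretize each feature for the feature-feature MI estimates.
    Xd = np.stack([np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], bins=n_bins))
                   for j in range(n_features)], axis=1)
    relevance = mutual_info_classif(X, y)          # D term of Eq. (11)
    first = int(np.argmax(relevance))
    selected, rest = [first], set(range(n_features)) - {first}
    while len(selected) < k and rest:
        def phi(j):                                # phi = D - R, Eq. (13)
            redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, s])
                                  for s in selected])  # R term of Eq. (12)
            return relevance[j] - redundancy
        best = max(rest, key=phi)
        selected.append(best)
        rest.remove(best)
    return selected
```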
4.4 Classification

The extracted features are fed to three conventional pattern classifiers, namely k-nearest neighbors (KNN) [46], logistic regression (LR) [47], and support vector machine (SVM) [48], as well as a newly developed pattern classifier, the discriminative Graph regularized Extreme Learning Machine (GELM) [49], to build emotion recognition systems. For the KNN classifier, the Euclidean distance is selected as the distance metric and the number of nearest neighbors is set to 5 using cross-validation. For LR, the parameters are computed by maximum likelihood estimation. We use the LIBLINEAR software [50] to build the SVM classifier
with a linear kernel. The soft margin parameter is selected using cross-validation.

The Extreme Learning Machine (ELM) is a single-hidden-layer feedforward neural network (SLFN) [51], and learning with local consistency of data has drawn much attention in recent years as a way to improve the performance of existing machine learning models. Peng et al. [49] proposed the discriminative Graph regularized Extreme Learning Machine (GELM) based on the idea that similar samples should share similar properties. GELM obtains much better performance than other models for face recognition [49] and emotion classification [38].

Given a training data set

L = {(x_i, t_i) | x_i \in R^d, t_i \in R^m},   (14)

where x_i = (x_{i1}, x_{i2}, ..., x_{id})^T and t_i = (t_{i1}, t_{i2}, ..., t_{im})^T, the adjacency matrix W in GELM is defined as follows:

W_{ij} = 1/N_t if h_i and h_j belong to the t-th class, and 0 otherwise,   (15)

where h_i = (g_1(x_i), ..., g_K(x_i))^T and h_j = (g_1(x_j), ..., g_K(x_j))^T are the hidden-layer outputs for the two input samples x_i and x_j. We can then compute the graph Laplacian L = D - W, where D is a diagonal matrix whose entries are the column sums of W. GELM thus incorporates two regularization terms into the conventional ELM model. The objective function of GELM is defined as follows:

\min_\beta ||H\beta - T||_2^2 + \lambda_1 Tr(H\beta L \beta^T H^T) + \lambda_2 ||\beta||_2^2,   (16)

where Tr(H\beta L \beta^T H^T) is the graph regularization term, ||\beta||_2^2 is the l_2-norm regularization term, and \lambda_1 and \lambda_2 are regularization parameters that balance the two terms.

By setting the derivative of the objective function (16) with respect to \beta to zero, we have

\beta = (HH^T + \lambda_1 HLH^T + \lambda_2 I)^{-1} HT.   (17)

In GELM, the constraint imposed on the output weights enforces the outputs of samples from the same class to be similar. The constraint can be formulated as a regularization term added to the objective function of the basic ELM, which also allows the output weight matrix to be calculated directly.
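The following sketch trains GELM in this closed form with a random sigmoid hidden layer, as in standard ELM. We write H with samples in rows, so the solution appears transposed relative to Eq. (17); λ1, λ2, and the random seed are illustrative values, not the paper's settings.

```python
import numpy as np

def gelm_train(X, T, n_hidden, lam1=1.0, lam2=1.0, seed=0):
    """Closed-form GELM output weights (a sketch of Eqs. (15)-(17)).

    X: (n_samples, d) inputs; T: (n_samples, m) one-hot targets.
    The paper fixes n_hidden to 10x the input dimension.
    """
    rng = np.random.default_rng(seed)
    Wh, b = rng.standard_normal((X.shape[1], n_hidden)), rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ Wh + b)))        # hidden-layer outputs, (n, K)
    labels = T.argmax(axis=1)
    W = np.zeros((len(labels), len(labels)))       # graph weights of Eq. (15)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)       # 1/N_t within each class
    L = np.diag(W.sum(axis=0)) - W                 # graph Laplacian L = D - W
    A = H.T @ H + lam1 * (H.T @ L @ H) + lam2 * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)             # row-sample form of Eq. (17)
    return Wh, b, beta

def gelm_predict(X, Wh, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ Wh + b)))
    return (H @ beta).argmax(axis=1)
```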
5 EXPERIMENT RESULTS

5.1 Experiment Results on DEAP Data

In this section, to validate the efficiency of the machine learning algorithms used in this study, we first evaluate these algorithms on the publicly available emotion dataset DEAP³ [10] and compare the performance of our models with other methods used in existing studies on the same emotion EEG dataset.

3. http://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html

The DEAP dataset consists of the EEG and peripheral physiological signals of 32 participants, recorded as each watched 40 one-minute excerpts of music videos. The EEG signals were recorded from 32 active electrodes (channels) according to the international 10-20 system, whereas the peripheral physiological signals (8 channels) include galvanic skin response, skin temperature, blood volume pressure, respiration rate, electromyogram and electrooculogram (horizontal and vertical). The participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. More details about the DEAP dataset are given in [10].

Fig. 3. The rating distribution of DEAP on the arousal-valence plane (VA plane) for the four conditions (LALV, HALV, LAHV, and HAHV).

In this experiment, we adopt an emotion representation based on the valence-arousal model, in which each dimension has a value ranging from 1 to 9. We segment the valence-arousal (VA) space into four quadrants according to the ratings: LALV, HALV, LAHV, and HAHV denote low arousal/low valence, high arousal/low valence, low arousal/high valence, and high arousal/high valence, respectively. Considering the fuzzy boundary of emotions and the variations in participants' ratings possibly associated with individual differences in rating scale, we add a gap when segmenting the quadrants of the VA space to ensure that the ratings reflect the participants' true self-elicited emotions, and we discard the EEG data whose ratings of arousal and valence are between 4.8 and 5.2. The numbers of instances for LALV, HALV, LAHV, and HAHV
are 12474, 16128, 10962 and 21420, respectively. The rating distribution of DEAP on the arousal-valence plane (VA plane) for the four conditions is shown in Fig. 3. We can see that the ratings are distributed approximately uniformly [10]. We label the EEG data according to the participants' ratings of valence and arousal.
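A small labeling sketch for this scheme is shown below; the integer encoding of the quadrants and the application of the gap to both rating dimensions are our reading of the text.

```python
import numpy as np

def va_quadrant_labels(valence, arousal, gap=(4.8, 5.2)):
    """Map 1-9 valence/arousal ratings to quadrant labels.

    Returns 0=LALV, 1=LAHV, 2=HALV, 3=HAHV, and -1 for trials whose
    ratings fall inside the discarded gap on either dimension.
    """
    lo, hi = gap
    valence, arousal = np.asarray(valence), np.asarray(arousal)
    in_gap = ((valence > lo) & (valence < hi)) | ((arousal > lo) & (arousal < hi))
    labels = 2 * (arousal >= hi).astype(int) + (valence >= hi).astype(int)
    return np.where(in_gap, -1, labels)
```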
We first extracted the PSD, DE, DASM, RASM, ASM and DCAU features of the 32-channel EEG data. The original EEG data obtained from the DEAP dataset had been preprocessed by downsampling to 128 Hz, bandpass filtering from 4.0-45.0 Hz, and removal of EOG artifacts. Accordingly, we extract the features in four frequency bands: theta (4-7 Hz), alpha (8-13 Hz), beta (14-30 Hz), and gamma (31-45 Hz). The features are further smoothed with the linear dynamic system approach. We then select SVM and GELM as classifiers. In this study, we use the SVM classifier with a linear kernel, and the number of hidden-layer neurons for GELM is fixed at 10 times the dimension of the input. To use the entire dataset for training and testing the classifiers, a 5-fold cross-validation scheme is adopted. All experiments here are performed with 5-fold cross-validation, and classification performance is evaluated by the classification accuracy rate.
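The evaluation protocol can be wrapped as in the sketch below, where train_fn and predict_fn are placeholders for any of the compared classifiers, for example the GELM sketch from Section 4.4 with n_hidden set to 10 times the input dimension.

```python
import numpy as np
from sklearn.model_selection import KFold

def evaluate(X, y, n_classes, train_fn, predict_fn, n_splits=5, seed=0):
    """Mean/std accuracy over 5-fold cross-validation, as in Section 5.

    Shuffling and the seed are our choices; the paper does not specify
    them.
    """
    X, y = np.asarray(X), np.asarray(y)
    accs = []
    for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        T = np.eye(n_classes)[y[tr]]             # one-hot training targets
        model = train_fn(X[tr], T)
        accs.append(np.mean(predict_fn(X[te], *model) == y[te]))
    return float(np.mean(accs)), float(np.std(accs))
```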
TABLE 3
The mean accuracy rates (%) of SVM and GELM classifiers for different features obtained from separate and total frequency bands.

Feature | Classifier | Theta | Alpha | Beta | Gamma | Total
PSD  | SVM  | 32.86 | 33.49 | 33.73 | 31.99 | 36.19
PSD  | GELM | 61.78 | 61.14 | 61.77 | 61.56 | 61.46
DE   | SVM  | 44.31 | 41.59 | 43.54 | 42.74 | 47.57
DE   | GELM | 61.45 | 61.65 | 62.17 | 61.45 | 69.67
DASM | SVM  | 43.18 | 42.72 | 42.07 | 41.24 | 40.70
DASM | GELM | 57.86 | 57.02 | 56.08 | 56.48 | 52.54
RASM | SVM  | 54.34 | 52.54 | 53.12 | 52.76 | 51.83
RASM | GELM | 46.66 | 44.52 | 44.88 | 45.56 | 52.70
ASM  | SVM  | 45.03 | 44.03 | 43.59 | 44.03 | 40.76
ASM  | GELM | 56.16 | 54.30 | 54.40 | 54.73 | 51.82
DCAU | SVM  | 41.51 | 41.05 | 41.27 | 40.00 | 40.85
DCAU | GELM | 57.47 | 56.58 | 58.25 | 58.25 | 55.26

Table 3 shows the mean accuracy rates of the SVM and GELM classifiers for different features obtained from various frequency bands (theta, alpha, beta and gamma) and the total frequency bands. It should be noted that 'Total' in Table 3 represents the direct concatenation of all features from the four frequency bands. Since the EEG data of DEAP are preprocessed with a bandpass frequency filter from 4.0-45.0 Hz, results for the delta frequency band are not included. The average accuracies (%) are 61.46, 69.67, 52.54, 52.70, 51.82 and 55.26 for the PSD, DE, DASM, RASM, ASM and DCAU features from the total frequency bands, respectively. The best accuracy of the GELM classifier is 69.67%, obtained using DE features of the total frequency bands, while the best accuracy of the SVM classifier is 54.34%. We also evaluate the performance of KNN, logistic regression and SVM with an RBF kernel on DEAP, which achieve accuracies (%) and standard deviations (%) of 35.50/14.50, 40.86/16.87, and 39.21/15.87, respectively, using the DE features of the total frequency bands. We perform a one-way analysis of variance (ANOVA) to study the statistical significance. The DE features outperform the PSD features significantly (p < 0.01), and among classifiers, GELM performs much better than SVM (p < 0.01). As we can also see from Table 3, the variation of classification accuracy across different frequency bands is not significant for the DEAP dataset (p > 0.95). The results here do not show specific frequency bands for the quadrants of VA space. We can also see that the DCAU features achieve comparable accuracies. These results indicate that there exists some kind of asymmetry that carries discriminative information for the four affect elicitation conditions (LALV, HALV, LAHV, and HAHV), as discussed in Section 2.2.

TABLE 4
Comparison of various studies using EEG in the DEAP dataset. Our method achieves the best performance among these approaches.

Study | Results
Chung et al. [52] | 66.6%, 66.4% for valence and arousal (2 classes), 53.4%, 51.0% for valence and arousal (3 classes) with all 32 participants.
Koelstra et al. [10] | 62.0%, 57.6% for valence and arousal (2 classes) with all 32 participants.
Liu et al. [53] | 63.04% for arousal-dominance recognition (4 classes) with 10 selected participants.
Zhang et al. [54] | 75.19% and 81.74% on valence and arousal (2 classes) with eight selected participants.
Our method | 69.67% for quadrants of VA space (4 classes) with all 32 participants.

The recognition accuracy comparison of various systems using EEG signals on the DEAP dataset is presented in Table 4. Here, a single-modality signal (EEG) is used rather than a combined modality fusion. Chung et al. [52] defined a weighted-log-posterior function for the Bayes classifier and evaluated the method on the DEAP dataset. Their accuracies for valence and arousal classification are 66.6% and 66.4% for two classes, and 53.4% and 51.0% for three classes, respectively. Koelstra et al. [10] developed the DEAP dataset and obtained average accuracies of 62.0% and 57.6% for valence and arousal (2 classes), respectively. Liu et al. [53] proposed a real-time fractal dimension (FD) based valence level recognition algorithm from EEG signals and obtained a mean accuracy of
63.04% for arousal-dominance recognition (4 classes) with 10 selected participants. Zhang et al. [54] described an ontological model for the representation and integration of EEG data, and their model reached average recognition ratios of 75.19% on valence and 81.74% on arousal for eight participants. Although their accuracies were relatively high, each dimension has only two categories, and these results were achieved on a subset of the original dataset. In contrast, our method achieves an average accuracy of 69.67% on the same dataset for the quadrants of VA space (LALV, HALV, LAHV, and HAHV) with DE features of the total frequency bands for all 32 participants.

5.2 Experiment Results on SEED Data

In this section, we present the results of our approaches evaluated on our SEED dataset. One very important difference between SEED and DEAP is that SEED contains three sessions, at time intervals of one week or longer, for the same subject.

5.2.1 Performance of Emotion Recognition Models

We first compare the six different features, namely PSD, DE, DASM, RASM, ASM and DCAU, from the total frequency bands. Here we use GELM as the classifier, with the number of hidden-layer neurons fixed at 10 times the dimension of the input, and we adopt a 5-fold cross-validation scheme. From Table 5, we can see that the DE features achieve higher accuracy and lower standard deviation than the traditional PSD features, which implies that DE features are more suitable for EEG-based emotion recognition than the other five features. As for the asymmetry features, although they have fewer dimensions than the PSD features, they achieve significantly better performance than PSD, which indicates that brain processing of positive, neutral and negative emotions has asymmetrical characteristics.

TABLE 5
The means and standard deviations of accuracies in percentage (%) for PSD, DE, DASM, RASM, ASM and DCAU features from total frequency bands.

Feature | PSD | DE | DASM | RASM | ASM | DCAU
Mean | 72.75 | 91.07 | 86.76 | 86.98 | 85.27 | 89.95
Std. | 11.85 | 7.54 | 8.43 | 9.09 | 9.16 | 6.87

We also evaluate the performance of two different feature smoothing algorithms. Here, we compare the linear dynamic system (LDS) approach with the conventional moving average algorithm. The size of the moving window is five in this study. The means and standard deviations of accuracies in percentage (%) for no smoothing, moving average, and the LDS approach are 70.82/9.17, 76.07/8.86 and 91.07/7.54, respectively. We can see that the LDS approach significantly outperforms the moving average method (p < 0.01), achieving 14.41% higher accuracy. These results also demonstrate that feature smoothing plays a significant role in EEG-based emotion recognition.

Fig. 4. The average accuracies of GELM using different features obtained from five frequency bands and in a fusion method.

Fig. 5. The spectrogram (frequency in Hz versus time in s) of one electrode position, T7, in one experiment, for positive, neutral and negative trials. As different emotions are elicited, the spectrogram shows different patterns.

We compare the performance of four different classifiers: KNN, logistic regression, SVM and GELM. In this evaluation, DE features of 310 dimensions were used as the inputs of the classifiers. The parameter K of KNN was fixed at five. For LR and linear SVM, grid search with cross-validation was used to tune the parameters. The mean accuracies and standard deviations in percentage (%) of KNN, LR, SVM with RBF kernel, SVM with linear kernel and GELM are 70.43/12.73, 84.08/8.77, 78.21/9.72, 83.26/9.08 and 91.07/7.54, respectively. From the above results, we
can see that GELM outperforms the other classifiers with higher accuracies and lower standard deviations, which implies that GELM is better suited for EEG-based emotion recognition. In GELM, the constraint imposed on the output weights enforces the outputs of samples from the same class to be similar.

5.2.2 Neural Signatures and Stable Patterns

Fig. 4 presents the average accuracies of the GELM classifier with the six different features extracted from the five frequency bands (delta, theta, alpha, beta and gamma) and from the direct concatenation of these five frequency bands. The results in Fig. 4 indicate that features obtained from the gamma and beta frequency bands perform better than those from the other frequency bands, which implies that beta and gamma oscillations of brain activity are more related to the processing of these three emotional states than other frequency oscillations, as described in [55]-[57].

Fig. 5 shows the spectrogram of one electrode position, T7, in one experiment, and Fig. 6 shows the average spectrogram over subjects for each session at selected electrodes (FPZ, FT7, F7, FT8, T7, C3, CZ, C4, T8, P7, PZ, P8 and OZ). As we can see from Figs. 5 and 6, the spectrograms have different patterns for different elicited emotions. The dynamics of higher-frequency oscillations are more related to positive/negative emotions, especially in the temporal lobes. Moreover, the neural patterns over time are relatively stable within each session. To examine the neural patterns associated with emotion processing, we project the DE features onto the scalp to find the temporal dynamics of frequency oscillations and the stable patterns across subjects.

Fig. 7 depicts the average neural patterns for positive, neutral, and negative emotions. The results demonstrate that neural signatures associated with positive, neutral, and negative emotions do exist. The lateral temporal areas activate more for positive emotion than for negative emotion in the beta and gamma bands, while the energy of the prefrontal area is enhanced for negative emotion over positive emotion in the beta and gamma bands. While the neural patterns of neutral emotion are similar to those of negative emotion, both showing less activation in temporal areas, the neural patterns of neutral emotion have higher alpha responses at parietal and occipital sites. For negative emotion, the neural patterns have significantly higher delta responses at parietal and occipital sites and significantly higher gamma responses at prefrontal sites. Existing studies [58], [59] have shown that EEG alpha activity reflects attentional processing and that beta activity reflects emotional and cognitive processes. When participants watched neutral stimuli, they tended to be more relaxed and less attentive, which evoked alpha responses. When processing positive emotion, the energy of the beta and gamma responses was enhanced. Our results are consistent with the findings of the existing work [11], [14], [58].

5.2.3 Dimensionality Reduction

As discussed above, the brain activities of emotion processing have critical frequency bands and brain areas, which implies that there must be a low-dimensional manifold structure in emotion-related EEG signals. Therefore, we investigate how the dimension of the features affects the performance of emotion recognition. Here, we compare two dimensionality reduction algorithms, the principal component analysis (PCA) algorithm and the minimal redundancy maximal relevance (MRMR) algorithm, with DE features of 310 dimensions as inputs and GELM as the classifier.

Fig. 8 shows the results of dimensionality reduction. We find that dimensionality reduction does not affect the performance of our model very much. For the PCA algorithm, when the dimension is reduced to 210, the accuracy drops from 91.07% to 88.46%, and it then reaches a local maximum of 89.57% at a dimension of 160. For the MRMR algorithm, the accuracies vary only slightly at lower feature dimensions.

Fig. 8. The results of dimensionality reduction using PCA and MRMR.

From the performance comparison between PCA and MRMR, we see that it is better to apply the MRMR algorithm in EEG-based emotion recognition, because the MRMR algorithm finds the most emotion-relevant and minimally redundant features. It also preserves original domain information, such as channel and frequency bands, which carry the most discriminative information for emotion recognition, after the transformation. This discovery helps us to reduce the computation of features and the complexity of the computational models.

Fig. 9 presents the distribution of the top 20 subject-independent features selected by correlation coefficient. The top 20 features were selected from the alpha frequency band at electrode location FT8, the beta frequency band at electrode locations AF4, F6, F8, FT7, FC5, FC6, FT8, T7 and TP7, and the gamma frequency band at electrode locations FP2, AF4, F4, F6, F8, FT7, FC5, FC6, T7 and C5. These selected features are mostly