Framework for Electroencephalography-based Evaluation of User Experience
Jérémy Frey, Univ. Bordeaux, France, [email protected]
Maxime Daniel, Immersion SAS, France, [email protected]
Julien Castet, Immersion SAS, France, [email protected]
Martin Hachet, Inria, France, [email protected]
Fabien Lotte, Inria, France, [email protected]
arXiv:1601.02768v1 [cs.HC] 12 Jan 2016
Figure 1: We demonstrate how electroencephalography can be used to evaluate human-computer interaction. For example, a keyboard (left) can be compared with a touch interface (middle) using a continuous measure of mental workload (right, here participant 4).
ABSTRACT
Measuring brain activity with electroencephalography (EEG) is mature enough to assess mental states. Combined with existing methods, such a tool can be used to strengthen the understanding of user experience. We contribute a set of methods to continuously estimate the user’s mental workload, attention and recognition of interaction errors during different interaction tasks. We validate these measures in a controlled virtual environment and show how they can be used to compare different interaction techniques or devices, by comparing here a keyboard and a touch-based interface. Thanks to such a framework, EEG becomes a promising method to improve the overall usability of complex computer systems.

Author Keywords
EEG; HCI Evaluation; Workload; Attention; Interaction errors; Neuroergonomy

ACM Classification Keywords
H.5.2 User Interfaces: Evaluation/methodology

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
CHI’16, May 07-12, 2016, San Jose, CA, USA
Copyright is held by the owner/author(s).
ACM 978-1-4503-3362-7/16/05...$15.00
DOI: http://dx.doi.org/10.1145/2858036.2858525

INTRODUCTION
In practice, a tool is only as good as one’s ability to assess it. For instance, evaluations in Human-Computer Interaction (HCI) usually rely on inquiries (e.g., questionnaires or think aloud protocols) or on users’ behavior during the interaction (e.g., reaction time or error rate). However, while both types of methods have been used successfully for decades, they suffer from some limitations. Inquiries are prone to be contaminated by ambiguities [23] or may be affected by social pressure [26]. It is also very difficult to gain real-time insights without disrupting the interaction. Indeed, a think aloud protocol distracts users, and questionnaires can be given only at specific time points, usually at the end of a session, which leads to a bias due to participants’ memory limitations [17]. On the other hand, metrics inferred from behavioral measures can be computed in real time, but are mostly quantitative. They do not provide much information about users’ mental states. For example, a high reaction time can be caused either by a low concentration level or by a difficult task [1, 14].

Recently, it has been suggested that portable brain imaging techniques, such as electroencephalography (EEG) and functional near infrared spectroscopy (fNIRS), have the potential to address these limitations [8, 27, 11]. fNIRS has been studied to assess users’ workload, for example to evaluate user interfaces [15] or different data visualizations [25]. However, most of these works evaluate passive tasks or very basic interactions that are not ecological, especially as user interfaces and interactions are getting more complex.
In this paper, not only do we detail a framework for HCI designers to gain further insights about user experience using brain signals (here from EEG), but we also validate this framework on an actual and realistic task. We show that they can be used to compare different HCI, in an environment that provides many simulations and where participants are engaged in rich interactions.

In the following sections, we describe the virtual environment that we developed, specifically aimed at validating the use of EEG as an evaluation method for HCI. We validated the workload induced by our environment in a first study, with NASA-TLX questionnaires [14]. Then, we detail how EEG can be used to assess 3 different constructs: workload, attention and error recognition. Finally, during the main study we employ EEG recordings to continuously measure such workload, together with the attention level of participants toward external stimuli and the number of interaction errors they perceived, while they interact with our virtual environment. We took the opportunity of these EEG-based measures to compare a keyboard input to a touch screen input.

To summarize, our main contributions are:
1. To validate the use of EEG as a continuous HCI evaluation method thanks to controlled and realistic interaction tasks we designed.
2. To demonstrate how such a tool can assess which of the tested interaction techniques is better suited for a particular environment.
3. To propose a framework that can be easily replicated to improve existing interfaces with little or no modifications.

Related work
For a few years, brain imaging has been used in HCI to deepen the understanding of users, thanks notably to the spread of affordable and lightweight devices. For instance, EEG and fNIRS are particularly well suited for mobile brain imaging [20, 6]. These techniques can be employed to study the spatial focus of attention [32] or to identify system errors [33, 29].

EEG and fNIRS are opportunities to assess the overall usability of a system and improve the ergonomics of HCI. In a recent work we showed preliminary results regarding the evaluation of mental workload during a 3D manipulation task [35]. After being processed, EEG signals highlighted which parts of the interaction induced a higher mental workload. In the present paper we go much further, exploring a continuous index of workload as well as two other constructs, namely attention and error recognition. We also rigorously validate such indexes, and study them in light of behavioral measures (performances and reaction times) and inquiries (NASA-TLX questionnaire).

Attention refers to the ability to focus cognitive resources on a particular stimulus [17]. In HCI, measuring the attention level could help to estimate how much information users perceive. In the present work the measure of attention relates to inattentional blindness; i.e., it concerns participants’ capacity to process stimuli irrelevant to the task [4].

Error recognition relates to the detection by users of an outcome different from what is expected [22]. We focused on interaction errors [9], which arise when a system reacts in an unexpected way, for example if a touch gesture is badly recognized. Interaction errors make it possible to assess how intuitive a UI is, and they are hardly measurable by any physiological signal other than EEG. The combined measure of workload, attention and error recognition constitutes a powerful complementary evaluation tool for people who design new interaction techniques.

Even though commercial solutions, such as the B-Alert system¹, are already pointing in this direction, they are validated in the literature with lab tasks only [1]. Our work, on the other hand, is closer to the field. More importantly, as opposed to proprietary software, our methodology is transparent and our multidimensional index of user experience can be easily replicated.

¹ http://www.advancedbrainmonitoring.com/

VIRTUAL 3D MAZE
The 3D virtual environment that we built uses gamification [7] to increase users’ engagement and ensure better physiological recordings [10]. Such a virtual environment also enables us to assess workload, attention and error recognition during ecological and realistic interaction tasks. Indeed, such constructs are traditionally evaluated during controlled lab experiments based on protocols from psychology that are vastly different from an actual interaction task, see, e.g., [12].

Overall description
The virtual environment takes the form of a maze where players have to learn and reproduce a path by triggering directions at regular intervals (see Figure 2). A character displayed with a third person perspective moves by itself at a predefined speed inside orthogonal tunnels. Soon after the character enters a new tunnel, symbols appear on-screen. Those symbols are basic 2D shapes, such as squares, circles, triangles, diamonds or stars, and their positions (bottom, top, left or right) indicate which directions are “opened”. Players must select one of these symbols before the character reaches the end of an intersection, either by pressing a key or touching the screen. If users respond too early (i.e., before symbols appear), too late, or if they select a direction that does not exist, they lose points and the character “dies” by smashing against a wall, respawning soon after at the beginning of the current tunnel.

The main element of the gameplay consists in selecting the directions in the correct order. Indeed, one level comprises two phases. During the “learning” phase a particular sequence of symbols is highlighted; each time symbols appear, one of them bounces to indicate the correct direction. Another cue takes the form of a “breadcrumb trail”, a beam of light that precedes the character and points to the right direction (see Figure 2b). Selecting an available but incorrect direction does not result in the character’s “death” but leads to a loss of points. Visual feedback is given to users when they select a direction: the corresponding symbol turns green if the choice is correct and red otherwise. When the end of the maze is reached, the character loops over the entire path so that players have another
opportunity to learn the sequence. When the training phase ends, the “recall” phase follows. The symbols are identical but the cues are no longer displayed; players have to remember the right path by themselves. The position of the symbols in each tunnel and the sequence of symbols are randomly drawn when a new level starts.

Figure 2: The virtual environment, where players control a character that moves by itself inside a 3D maze. A: Symbols appear in each tunnel to indicate the possible directions for the next turn; players have to select a particular sequence of symbols/directions. B: During the “learning” phase, the correct direction is highlighted by a breadcrumb trail and the associated symbol bounces (here the disc on top). C: Controls depend on the position of the character. If the character is on the right side, players have to press right in order to go up.

Besides learning a sequence, the principal challenge comes from how the directions are selected. The third-person view fulfills a purpose: the input device that users are controlling (i.e., keyboard or touch screen) is mapped to the character position. Since the character is a futuristic surfer that defies the law of gravity, it slides by itself from the bottom of the tunnel to one of the walls or to the ceiling from time to time. In this latter situation, when the character is upside down, commands are inverted compared to what players are used to, even though symbols remain in the same positions. This game mechanism stresses spatial cognition abilities; users have to constantly remain aware of two different frames of reference. For example, if “up” and “left” directions are open in a given tunnel and if the character’s position (controlled by the application) is on the right wall, as illustrated in Figure 2c, users have to press right to go up. This discrepancy between input and output is a reminder of the problem often observed with 3D user interfaces, where most users manipulate a device with 2 degrees of freedom (DOF), such as a mouse, to interact with a 6 DOF environment.

The combination of the game design and game mechanisms herein described offers a wide variety of elements that we put to use so as to investigate users’ mental states. In particular, we detail below how we tuned the game elements to manipulate the user’s workload and attention in controlled ways as well as to trigger interaction errors. Knowing which construct value (e.g., high or low workload) to expect, we can validate whether our EEG-based estimates during interaction match these expectations, and thus whether they are reliable.

Manipulating workload
Our virtual environment possesses several characteristics that could be used to induce different levels of mental workload. We can notably adjust 4 parameters:
• Maze depth: the number of tunnels players have to cross before reaching the end of the maze, hence the length of the sequence of symbols they have to learn. Holding more symbols in working memory increases workload [12, 31].
• Number of directions: at each intersection, up to 4 directions are “opened” in the maze; the complexity of the sequence of symbols grows as this number increases.
• Game speed: the pace of the game can be adjusted to increase temporal pressure. When the speed increases, symbols appear sooner and users must respond more quickly, thus increasing overall stress [14, 19]. In the easiest level the character spends 6 s in a tunnel and players must respond within 3 s after the symbols appear; in the hardest level a tunnel lasts 2 s and players have 1 s to choose a symbol.
• Spatial orientation: in order to keep selecting the correct directions, users have to perform a mental rotation if the character they control jumps from the floor to the walls or to the ceiling. Furthermore, they need to update their frame of reference as often as the character shifts from one side to another. Depending on the spatial ability of users, this mechanism can cause a substantial cognitive load [28].

We used those mechanisms and dimensions to create 4 different difficulty levels for the game: “EASY”, “MEDIUM”, “HARD” and “ULTRA” (see Table 1). These levels mostly affect (symbolic) memory load and time pressure. Indeed, the 3D maze is more about remembering a sequence of symbols or directions than about spatial navigation per se. Because randomization could create loops in the maze topography and since there were no landmarks, it is unlikely that participants were able to adopt an allocentric strategy.

While the EASY level is designed to be completed with very little effort, the ULTRA level, on the other hand, is designed to sustain a very high level of workload, up to the point that it is barely possible to complete it with no error. During EASY levels there is no need to perform mental rotations and players have to memorize only 2 symbols that are constrained to either the left or the right direction; in ULTRA levels the frame of reference changes between each selection, the sequence reaches 5 symbols that could appear in all 4 directions, and players have to react thrice as fast. No matter the level, players had 3 “loops” to learn the maze and another set of 3 loops to reproduce the path.
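As a rough illustration, the four difficulty levels (Table 1) could be encoded as a small configuration structure from which a sequence of directions is drawn at the start of a level. This is only a simplified sketch: the class, field and function names are hypothetical, and the order in which directions are opened when fewer than 4 are available (beyond the left/right constraint stated for EASY) is an assumption, not something the text specifies.

```python
import random
from dataclasses import dataclass

# Hypothetical names; the numeric values are those listed in Table 1.
@dataclass(frozen=True)
class DifficultyLevel:
    name: str
    depth: int                 # number of symbols/directions to learn
    directions: int            # possible directions at each intersection
    response_time_s: float     # time allowed to respond once symbols appear
    orientation_change: float  # chance that the character changes orientation

LEVELS = {
    "EASY": DifficultyLevel("EASY", 2, 2, 3.0, 0.0),
    "MEDIUM": DifficultyLevel("MEDIUM", 4, 3, 2.5, 0.3),
    "HARD": DifficultyLevel("HARD", 5, 4, 2.0, 0.6),
    "ULTRA": DifficultyLevel("ULTRA", 5, 4, 1.0, 1.0),
}

# Assumption: directions are opened in this fixed order; only the
# left/right constraint of the EASY level is stated in the text.
DIRECTION_ORDER = ["left", "right", "up", "down"]

def draw_sequence(level, rng):
    """Draw the sequence of directions to learn for one level."""
    open_directions = DIRECTION_ORDER[:level.directions]
    return [rng.choice(open_directions) for _ in range(level.depth)]

# Example: a new EASY level yields a 2-step left/right sequence.
sequence = draw_sequence(LEVELS["EASY"], random.Random(0))
```

Keeping the four parameters in one immutable record makes it easy to check a level against the table and to add intermediate levels without touching the drawing logic.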
Difficulty   Depth   Directions   Resp. time   Orientation
EASY         2       2            3 s          0%
MEDIUM       4       3            2.5 s        30%
HARD         5       4            2 s          60%
ULTRA        5       4            1 s          100%
Table 1: Four difficulty levels are created by leveraging game mechanisms. Depth: number of directions/symbols players have to learn. Directions: number of possible directions at each intersection. Response time: how much time players have to respond after the symbols appear. Orientation: percentage chance that the controlled character changes its orientation.

Assessing attention
We relied on stimuli not congruent with the main task in order to probe for inattentional blindness, using the “oddball” paradigm. The oddball paradigm is often employed with EEG, as the appearance of rare (i.e., “odd”) stimuli among a stream of frequent stimuli (i.e., distractors) triggers a particular event-related potential (ERP) within EEG signals [5]. ERPs are “peaks” and “valleys” in EEG recordings, and the amplitude of some of them decreases as users are less attentive to stimuli.

Our protocol uses audio stimuli. It is based on [3], which studied the immersion of video game players. In our virtual environment, while users’ characters were navigating in the maze, sounds were played at regular intervals, serving as a background “soundtrack” that was consistent with the user experience. 20% of these sounds had a high pitch (odd events) and the remaining 80% had a low pitch (distracting events); this proportion is on par with the literature [3, 9].

Our hypothesis is that the attention level of participants toward sounds, as measured with the oddball paradigm, should decrease as the workload increases, since most of their cognitive resources will be allocated to the main task during the most demanding levels.

Assessing error recognition
EEG could be used to measure interaction errors, i.e., errors originating from an incorrect response of the user interface that differs from what users were expecting [9]. Interaction errors are of particular interest for HCI evaluation since they could account for how intuitive an interface is [11]. In order to test the feasibility of such a measure, we decided to implement two different interaction techniques. Both of them use discrete events (i.e., symbol selections) so that we could more easily synchronize EEG recordings with in-game events later on.

The first technique uses indirect interaction by means of a keyboard (Figure 1, left). In due time, the left, right, up or down arrow keys are used to send the character into the tunnel that is situated to its left, right, top or bottom. Indeed, we have seen previously that in our virtual environment players have to orient themselves depending on the position of the character. If the character is moving on the sides, players have to perform a mental rotation of 90°; if it is on the ceiling then the angle is 180°, i.e., commands are inverted.

The second technique uses direct interaction by means of a touch screen (Figure 1, middle). Usually, with a touch screen, pointing is co-located with software events, since users can directly indicate where they want to interact. However, in our case, we decided to mimic exactly the behavior of the keyboard interface. That is to say, with the touch screen as well players have to orient themselves depending on the position of the character. Hence, if the character is positioned on the left, players have to touch the right fringe of the screen in order to go up. This is mostly counter-intuitive since players have to inhibit the urge to point to the actual direction they want to go; there is a cognitive dissonance.

Since in our experimental design the use of the direct (touch-based) interaction is counter-intuitive, we hypothesize that it will lead to an overall higher number of interaction errors compared to the indirect interface (keyboard).

PILOT STUDY: VALIDATION OF THE INDUCED WORKLOAD LEVEL
We designed our virtual environment as a test-bench aimed at inducing several mental states within users, notably different workload levels. Thus, we had to formally validate the mental workload that each game level seeks to induce. As such, we conducted a pilot study (separate from the main study, to lighten the protocol of the latter), with no physiological recordings but using the NASA-TLX questionnaire [14], a well established questionnaire that accounts for workload.

Protocol
15 participants took part in this study (4 females), mean age 24.53 (SD: 3.00). We used a within-subject design; all participants answered for all 4 difficulty levels. The gaming session used the keyboard and started with 2 “training levels” that introduced participants to the game mechanisms. In the first training level, players learned the objective of the game. In the second training level, they discovered how the character could change its orientation by itself. After this training phase, participants continued with the main phase of the experiment.

During the main phase of the experiment, participants played each of the four levels (EASY, MEDIUM, HARD or ULTRA) once, in a random order. Immediately after the end of a level, participants were given a NASA-TLX questionnaire to inquire about their mental workload. The questionnaire took the form of a 9-point Likert scale. As in the original questionnaire [14], it comprised 6 items that assessed mental demand, physical demand, temporal demand, performance, effort and frustration. The experiment lasted approximately 25 minutes and finished once participants had played all 4 levels and filled in the corresponding NASA-TLX questionnaires.

Results
For each participant and each level of difficulty, we averaged the 6 items of the NASA-TLX questionnaire and normalized the scales from [1;9] to [0;1], except for the “performance” item, which was normalized from [1;9] to [1;0] because its scale is in reverse order compared to the other items (“1” for “good” and “9” for “poor”).

The resulting averaged scores are: EASY: 0.11 (SD: 0.09); MEDIUM: 0.32 (SD: 0.17); HARD: 0.43 (SD: 0.13); ULTRA:
0.65 (SD: 0.13), see Figure 3. A repeated measures analysis of variance (ANOVA) showed a significant effect of the difficulty factor on the NASA-TLX scores, and a post-hoc pairwise Student’s t-test with false discovery rate (FDR) correction showed that each level differed significantly from the others (p < 0.01).

Figure 3: NASA-TLX scores obtained during the pilot study. Each difficulty level differs significantly from the others (p < 0.01).

Discussion
In this pilot study, we demonstrated through questionnaires that each difficulty level presented in Table 1 induces a different workload level. Hence, we can use our virtual environment as a baseline to assess the reliability of analogous EEG measures and put this new evaluation method into perspective.

EEG IN PRACTICE
EEG measures brain activity in the form of electrical currents [21]. To identify mental states from EEG, 3 types of information can be used:
• Frequency domain: oscillations that occur when large groups of neurons fire together at a similar frequency.
• Temporal information: ERPs possess temporal features; positive and negative “peaks” with varying amplitudes and delays.
• Spatial domain: position of the electrodes that record a specific brain activity.

However, there is an important variability between people’s EEG signals, and many external factors could influence EEG recordings (amplifier specifications, exact electrode locations, and so on). As such, it is difficult to identify a universal set of features to estimate a given mental state across different sessions and participants. This is why machine learning is typically used in EEG studies [2]. With this approach, a calibration phase occurs so that the system can learn which features are associated with a specific individual, during a task that is known to induce the studied construct. Once the calibration is completed, the machine can then use this knowledge to gain insights about an unknown context, for example a new interaction technique that one would want to evaluate.

To calibrate workload and attention, we chose to use standard calibration tasks, validated by the literature, so that our findings could be easily reproduced. Moreover, as shown later in this paper, using a single one of these tasks to calibrate each construct estimator was enough to obtain reliable estimations of such constructs during different and complex interaction tasks.

Concerning attention, we did not develop a dedicated task per se for its calibration. Since the audio probes were already integrated into our virtual environment, we simply used a specific level of our game.

Calibration of workload
We used the protocol known as the N-back task to induce 2 different workload levels and calibrate our workload estimator. The N-back task is a well-known task to induce workload by playing on memory load [24]. It showed promising results in [35], where it could be used to transfer calibration results to a 3D context.

In the N-back task, users watch a sequence of letters on screen, the letters being displayed one by one. For each letter the user had to indicate whether the displayed letter was identical to or different from the letter displayed N letters before, using a left or right mouse click respectively. Hence, users have to remember N items at all times.

We implemented a version similar to [12], removing vowels to prevent chunking strategies based on phonemes. We used the same time constraint as in [35], i.e., letters appeared for 0.5 s, with an inter-stimulus interval of 1.5 s. Each user alternated between “easy” blocks with the 0-back task (the user had to identify whether the current letter was a randomly chosen target letter, e.g., ‘X’) and “difficult” blocks with the 2-back task (the user had to identify whether the current letter was the same letter as the one displayed 2 letters before), see Figure 4.

Figure 4: Workload calibration task. Top: difficult task (2-back task), the target letter is the one that appeared two steps earlier, users have to select trials 4 and 5. Bottom: easy task (0-back task), the target letter “S” is randomly chosen, users have to select trials 2 and 5.

Each block contained 60 letter presentations. 4 letters were drawn at the beginning of a block so that the number of target letters accounted for 25% of the trials. Each participant completed 6 blocks, 3 blocks for each workload level (0-back vs 2-back). Therefore, 360 calibration trials (one trial being one letter presentation) were collected for each user, with 180 trials for each workload level (“low” vs “high”). This calibration phase takes approximately 12 minutes.

Calibration of error recognition
We replicated the standard protocol described in [9] to calibrate the system regarding error recognition. The task simulates a scenario in which users control the movements of a robot. The robot appears on screen and has to reach a target. At
each turn users command the robot to go right or left in order to reach the target as fast as possible (with the fewest steps). However, the robot may misunderstand the given command. This is simulated by some trials during which the command is (on purpose) erroneously interpreted; hence an interaction error happens. The ERP that can be seen in EEG following an interaction error is known as an “error related potential”, ErrP [9]. The calibration task is a simplified version of this scenario: the robot is pictured by a blue rectangle on screen that users control with the arrow keys, and the target is represented by a blue outline. The robot is constrained to the X axis and along this axis there are only 7 different positions for both the robot and the target (see Figure 5).

Figure 5: Error recognition calibration task. Users control a blue filled rectangle. They have to move it to an outlined target by pressing the left or right arrow key. 20% of the time, the rectangle goes in the opposite direction, thus causing an interaction error.
[Panels, 1 s apart: 1 Press right, 2 Cursor selected, 3 Cursor moves; 4 Press right, 5 Cursor selected, 6 Cursor moves. The second row is labeled “Interaction error”.]

We chose a ratio for the occurrence of interaction errors that is consistent with the literature. 80% of the movements matched the actual key pressed and for the other 20% the “robot” moved in the opposite direction. It was necessary not to balance both events, as too frequent errors may no longer be perceived as unexpected, and thus may not lead to an ErrP. A timer was set to prevent the appearance of artifacts, such as muscle movements, within EEG recordings (see the Discussion section for further consideration of artifacts). The rectangle moves 1 s after a key is pressed, and after movement completion users have to wait another 1 s before they can press a key again. The rectangle turned yellow to tell users they could not control it during that second.

A trial is completed once the robot reaches the target. A trial fails if after 10 attempts the robot is not yet on target. At the beginning of each trial, the screen is reinitialized with a random new position for the robot and the target. The last trial occurred after 350 interactions were performed. On average this calibration phase lasted 15 minutes.

Calibration of attention
The calibration of attention occurred within a simplified version of the virtual environment. Users did not have to control the character during this special level; it was moving by itself through the maze. They were asked to watch the character and count in their head how many times they heard the “odd” sound, i.e., a high pitched bell lasting 200 ms. The distractor was a low pitched beat of 70 ms; we did not use pure tones, to improve user experience. The pace of the game was adjusted so that a sound (target or distractor) was played every second. Since the probes for attention rely on the oddball paradigm, we chose a 20% likelihood of appearance for the target event. The calibration lasted about 7 minutes, after 350 sounds were played. Note that participants were instructed to count the “odd” events only during the calibration phase, and not during the completion of the 3D maze.

MAIN STUDY: EEG AS AN EVALUATION METHOD
The main study consisted in the evaluation of the game environment with two different types of interfaces using EEG recordings. As such we created a 4 (difficulty: EASY, MEDIUM, HARD, ULTRA) × 2 (interaction: KEYBOARD vs TOUCH) within-subject experimental plan. Our hypotheses are:
1. The workload index measured by EEG is higher in TOUCH and increases with the difficulty, reflecting NASA-TLX scores obtained during the pilot study.
2. The attentional resources that participants assign to the sounds decrease as the difficulty increases.
3. The TOUCH condition induces a higher number of interaction errors compared to the KEYBOARD condition.

The gaming phase was split into two sequences, one for each interaction technique. To avoid an overly tedious experiment, participants alternated between game sessions and the 3 calibration tasks (workload, attention and error recognition). Since the analyses were performed offline, there was no need to cluster all the calibrations at the beginning of the experiment.

The order of the gaming sessions and calibration phases was counter-balanced between participants following a Latin square (see Figure 6). After the experiment, the signals gathered from the calibration tasks were processed in order to evaluate both the virtual environment (difficulty levels) and the chosen interaction techniques.

Figure 6: The order of the 3 calibration tasks and 2 interaction techniques was counter-balanced between the 12 participants to improve engagement.

Apparatus
EEG signals were acquired at 512 Hz with 2 g.tec g.USBamp amplifiers. We used 32 electrodes placed at the AF3, AFz, AF4, F7, F3, Fz, F4, F8, FC3, FCz, FC4, C5, C3, C1, Cz, C2,
C4,C6,CP3,CPz,CP4,P7,P3,Pz,P4,P8,PO7,POz,PO8, foreachbandasetofCommonSpatialPatterns(CSP)spa-
O1,OzandO2sites. tial filters. That way, we reduced the 32 original channels
downto6 “virtual”channels thatmaximize thedifferences
12participantstookpartinthisstudy(3females),meanage
betweenthetwoworkloadlevels[30]. Sincethecalibration
26.25(SD:3.70). Allofthemreportedadailyuseoftactile
(N-backtask)andusecontexts(virtualenvironment)differs
interfaces. Theexperimentoccurredinaquietenvironment,
substantially, we used a regularized version of these filters
isolatedfromtheoutside. Thereweretwoexperimentersin
calledstationarysubspaceCSP(SSCSP)[35]. SSCSPfilters
theroomandtheprocedurecomprisedthefollowingsteps:
aremorerobusttochangesbetweencontextssincetheytake
1. Participantsenteredtheroom,readandsignedaninformed into account the distributions of the EEG signals recorded
consentformandfilledademographicquestionnaire. duringboththecalibrationandtheusecontexts(inanunsu-
2. WhileoneoftheexperimenterinstalledanEEGcaponto pervisedway,i.e.withoutconsideringtheexpectedworkload
participants’heads,theotherexperimenterintroducedpar- levels)toestimatespatialfilterswhoseresultingsignalsare
ticipantstothevirtualenvironment. Theyplayed2training stableacrosscontexts(see[35]fordetails). Finally,foreach
levelsandthe4mainlevelsinanincreasingorderofdif- frequencybandandspatialfilter, weusedtheaverageband
ficulty. They could redo some levels if they did not feel powerofthefilteredEEGsignalsasfeature. Thisresultedin
confidentenough. 30EEGfeatures(5bands×6spatialfiltersperband).
3. Oneofthe3calibrationtasksoccurred(workload,attention
orerrorrecognition). Processingattentionanderrorrecognition
4. Participantsplayedthegameusingoneofthe2interaction Sincebothattentionanderrorrecognitioncanbemeasuredin
techniques(KEYBOARDorTOUCH).Thefourlevelsof ERPs,theysharethesamesignalprocessing.Weselectedtime
difficulty (EASY, MEDIUM, HARD, ULTRA) appeared windowsof1s,startingattheeventofinterest(i.e.soundsfor
twiceduringthesession,inarandomorder. ForTOUCH, attention, rectangle’s movements for error recognition). In
a dedicated training session occurred beforehand so that ordertoutilizetemporalinformation,featureextractionrelied
participantcouldgetusedtothisinteractiontechnique. on regularized Eigen Fisher spatial filters (REFSF) method
5. Another calibration task occurred, different from the one in step 3.
6. Participants tested the second interaction technique. As in step 4, TOUCH was preceded by a training session that lasted until participants felt confident enough to proceed to the main task.
7. Participants performed the last remaining calibration task.

A game session (steps 4 and 6) took approximately 20 minutes and the whole experiment lasted 2 hours.

EEG Analyses
The calibration tasks were used to train a classifier specific to each of the studied constructs. Classifiers were calibrated separately for each participant, which ensured maximal EEG classification performance. We used EEGLAB 13.4.4b² and Matlab R2014a to process EEG signals offline. EEG features associated with workload relate to the frequency domain, while the features associated with attention and error recognition relate to temporal information, as detailed below.

Processing workload
From the signals collected during the N-back tasks, we extracted EEG features from each 2 s time window following a letter presentation. We used each of these time windows as an example to calibrate our classifier, whose objective was to learn whether these features represented a low workload level (induced by the 0-back task) or a high workload level (induced by the 2-back task). Once calibrated, this classifier can be used to estimate workload levels on new data, here while our users were interacting with the virtual environment.

As in [35], we filtered EEG signals in the delta (1-3 Hz), theta (4-6 Hz), alpha (7-13 Hz), beta (14-25 Hz) and gamma (26-40 Hz) bands. To reduce features dimensionality, we used

² http://sccn.ucsd.edu/eeglab/

[16]. Thanks to this spatial filter, specifically designed for ERP classification, the 32 EEG channels were reduced to a set of 5 channels. We then decimated the signal by a factor of 32, using the "decimate" function of Matlab, which applies a low-pass filter before decimation to prevent aliasing. As a result, there were 80 features per epoch (5 channels × 512 Hz × 1 s / 32).

Classification
We used a shrinkage LDA (linear discriminant analysis) as a classifier since it is more efficient than the regular LDA with a high number of features [18].

For each construct there were two steps. First, we used the data collected during the calibration tasks to estimate the performance of the classifiers. Second, we studied the output of the different classifiers to evaluate the virtual environment.

To assess the classifiers' performance on the calibration data, we used 4-fold cross-validation (CV). More precisely, we split the collected data into 4 parts of equal size, selecting trials randomly, used 3 parts to calibrate the classifiers and tested the resulting classifiers on the unseen data from the remaining part. This process occurred 3 more times so that in the end, each part was used once as test data. Finally, we averaged the obtained classification performances. The performance was measured using the area under the receiver-operating characteristic curve (AUROCC). The AUROCC is a metric that is robust against unbalanced classes, as is the case with attention and error recognition (20% of targets, 80% of distractors). A score of "1" means a perfect classification; a score of "0.5" is chance.

Once the classifiers were trained thanks to the calibration tasks, we could use them on the EEG signals acquired while participants were interacting with the virtual environment, to estimate the values of the different constructs.
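The 4-fold cross-validation and AUROCC scoring described above can be sketched as follows. This is a minimal illustration in plain Python, not the Matlab pipeline used in the study. The nearest-mean "classifier" on a single feature is a hypothetical stand-in for the shrinkage LDA, and the function names (auroc, four_fold_indices, cross_validate) are assumptions made for this example.

```python
import random

def auroc(scores, labels):
    """Area under the ROC curve through the rank-sum identity.

    labels: 1 for targets, 0 for distractors. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUROCC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def four_fold_indices(n_trials, seed=0):
    """Randomly split trial indices into 4 parts of (near) equal size."""
    idx = list(range(n_trials))
    random.Random(seed).shuffle(idx)
    return [idx[k::4] for k in range(4)]

def train_nearest_mean(xs, ys):
    """Hypothetical stand-in for shrinkage LDA: midpoint between class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def signed_distance(threshold, x):
    """Score of one trial, analogous to the distance to the LDA hyperplane."""
    return x - threshold

def cross_validate(features, labels, seed=0):
    """Train on 3 parts, test on the remaining one, average the 4 AUROCCs."""
    folds = four_fold_indices(len(labels), seed)
    aucs = []
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = train_nearest_mean([features[i] for i in train_idx],
                                   [labels[i] for i in train_idx])
        scores = [signed_distance(model, features[i]) for i in test_idx]
        aucs.append(auroc(scores, [labels[i] for i in test_idx]))
    return sum(aucs) / len(aucs)

# Toy data (an assumption): 100 trials, two well-separated classes.
toy_labels = [i % 2 for i in range(100)]
toy_features = [float(y) for y in toy_labels]
print("mean AUROCC over 4 folds:", cross_validate(toy_features, toy_labels))
```

A scorer that returns the same value for every trial gets 0.5 from auroc whatever the class proportions, which illustrates why the metric suits the 20%/80% split of the attention and error recognition tasks.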
For workload, we used 2 s long sliding time windows that overlapped by 1 s to extract signals and feed the classifier. From the outputs that were produced by the LDA classifier for each participant (i.e., the distance to the separating hyperplane), we first removed outliers by iteratively removing one outlier at a time using a Grubbs' test with p = 0.05, until no outlier remained [13]. We then normalized the outlier-free scores between -1 and +1. As such, for all participants a workload index close to +1 represents the highest mental workload they had to endure while they were playing. It should come close to the 2-back condition of the calibration phase. Conversely, a workload index close to -1 denotes the lowest workload, similar to the 0-back condition.

The process was similar for attention, but we only extracted epochs that corresponded to the target stimuli onset, i.e. when the high pitch sound was played. Note that contrary to [3], which studied the amplitudes of ERPs and did not use the data gathered during the calibration phase, here we kept the machine learning approach. As such, the resulting scores can be seen as a confidence index of the LDA classifier about whether participants noticed odd events while they were playing.

As for the classifier dedicated to error recognition, the processing differs. Indeed, we could not assume which interactions yielded an interaction error, i.e. if and when participants perceived a discrepancy between what they intended to do and what occurred. Consequently, we simply counted over an entire game session the number of times the classifier labelled an interaction as being erroneous in the eyes of the participants.
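The windowing and rescaling steps described above for the workload index can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the 512 Hz sampling rate is taken from the feature count given for ERPs, the normalization is assumed to be a linear min-max rescaling (the text only states that scores were normalized between -1 and +1), and the Grubbs' outlier removal is left out. The function names are hypothetical.

```python
def sliding_windows(n_samples, sfreq=512, win_s=2.0, step_s=1.0):
    """Return (start, stop) sample indices of 2 s windows overlapping by 1 s."""
    win = int(win_s * sfreq)
    step = int(step_s * sfreq)
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, step)]

def normalize_scores(scores):
    """Linearly rescale (outlier-free) classifier outputs to [-1, +1].

    Assumption: min-max rescaling, so the largest output maps to +1
    (highest workload) and the smallest to -1 (lowest workload).
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("scores must not all be identical")
    return [2 * (s - lo) / (hi - lo) - 1 for s in scores]

# 5 s of signal at 512 Hz yields four overlapping 2 s windows.
print(sliding_windows(5 * 512))
# Hypothetical LDA outputs for three windows.
print(normalize_scores([-3.0, 0.0, 1.0]))
```

Because the rescaling uses the extreme values of each participant's scores, removing outliers first (as done in the study) keeps a single aberrant window from compressing the rest of the index.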
Figure 7: EEG measures. a: The workload index significantly differs across difficulties and between interaction techniques. b: The attention index significantly differs across difficulties. c: The number of interaction errors differs by a tendency between KEYBOARD and TOUCH.

Results
Unless otherwise noted, we tested for significance using repeated measures ANOVA. For significant main effects, we used post-hoc pairwise Student's t-tests with FDR correction.

Workload
On average, the classifier AUROCC score during the training task was 0.92 (SD: 0.06); see Table 2. Over the test set there were on average 2171 data points per subject across all conditions (time windows). The statistical analysis of the classifier output during the game session showed a significant effect of the difficulty factor (p < 0.01), the workload index increasing with the difficulty of the levels (Figure 7a). The post-hoc analysis showed that all difficulty levels significantly differ from one another with p < 0.01, except for the MEDIUM level, which differs from EASY with p < 0.05 and from HARD only by a margin (p = 0.11). There was a significant effect of the interaction factor as well (p < 0.01), the workload being higher on average during the TOUCH condition. There was no interaction between the difficulty and interaction factors.

Attention
On average, the classifier AUROCC score during the training task was 0.86 (SD: 0.05); see Table 2. Over the test set there were on average 497 data points per subject across all conditions (odd events). The statistical analysis of the classifier output during the game session showed a significant effect of the difficulty factor (p < 0.01) but not of the interaction factor. The attention index decreases as the difficulty increases (Figure 7b). The post-hoc analysis showed that the ULTRA level significantly differs from the others (p < 0.05).

Error recognition
On average, the classifier AUROCC score during the training task was 0.82 (SD: 0.10); see Table 2. Over the test set there were on average 388 data points per subject across all conditions (interactions). Due to the nature of the data (numbers of interaction errors across entire game sessions), we used a one-tailed Wilcoxon Signed Rank Test to test our hypothesis. The number of interaction errors differs by a tendency (p = 0.08) between the KEYBOARD and the TOUCH conditions. 19% of the interactions (SD: 9%) were labelled as interaction errors by the classifier for KEYBOARD vs 22% (SD: 9%) for TOUCH (Figure 7c).

Behavioral measures
Besides EEG metrics, we had the opportunity to study participants' reaction time and performance so as to get a clearer picture of their user experience.

Reaction time
There was a significant effect of both the difficulty and interaction factors, as well as an interaction effect between them (p < 0.01). Post-hoc tests showed that all difficulty levels differ from one another (p < 0.01), except for MEDIUM and
Construct          P1    P2    P3    P4    P5    P6    P7    P8    P9    P10   P11   P12   Average
Workload           0.85  0.93  0.98  0.95  0.97  0.97  0.79  0.87  0.87  0.98  0.95  0.94  0.92
Attention          0.83  0.82  0.96  0.81  0.85  0.90  0.82  0.82  0.86  0.92  0.88  0.83  0.86
Error recognition  0.88  0.57  0.90  0.90  0.86  0.90  0.78  0.80  0.88  0.78  0.85  0.74  0.82
Table 2: Classification accuracy during the calibration tasks for the 3 measured constructs (AUROCC scores).
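As a quick consistency check, the per-construct averages of Table 2 and the standard deviations quoted in the Results can be recomputed from the individual scores. The sketch below uses only the values listed in the table. Using the sample standard deviation (statistics.stdev) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

# AUROCC scores of participants P1..P12, copied from Table 2.
table2 = {
    "Workload": [0.85, 0.93, 0.98, 0.95, 0.97, 0.97,
                 0.79, 0.87, 0.87, 0.98, 0.95, 0.94],
    "Attention": [0.83, 0.82, 0.96, 0.81, 0.85, 0.90,
                  0.82, 0.82, 0.86, 0.92, 0.88, 0.83],
    "Error recognition": [0.88, 0.57, 0.90, 0.90, 0.86, 0.90,
                          0.78, 0.80, 0.88, 0.78, 0.85, 0.74],
}

# Rounded to two decimals, as reported in the paper.
summary = {name: (round(mean(scores), 2), round(stdev(scores), 2))
           for name, scores in table2.items()}

for name, (avg, sd) in summary.items():
    print(f"{name}: mean={avg:.2f}, SD={sd:.2f}")
```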
HARD, which do not differ significantly (p = 0.91). The mean reaction times for EASY, MEDIUM, HARD and ULTRA were respectively 0.78 s (SD: 0.14), 0.97 s (0.18), 0.98 s (0.15) and 0.69 s (0.06). The mean reaction time was 0.78 s (0.12) for KEYBOARD and 0.93 s (0.13) for TOUCH. See Figure 8a. Note that users had less time to respond during higher difficulty levels.

Performance
The performance was computed as the ratio between the number of correct selections and the total number of interactions. There was a significant effect of both the difficulty and interaction factors, as well as an interaction effect between them (p < 0.01). Post-hoc tests showed that all difficulty levels differ from one another (p < 0.01). The mean performance for EASY, MEDIUM, HARD and ULTRA was respectively 98% (SD: 3), 89% (12), 83% (17) and 55% (21). The mean performance was 85% (13) for KEYBOARD and 77% (13) for TOUCH. See Figure 8b.

Figure 8: Behavioral measures: reaction time in seconds (left) and performance (right; proportion of correctly selected directions) significantly differ between difficulty levels and interactions. E: EASY, M: MEDIUM, H: HARD, U: ULTRA.

Discussion
Most of the main hypotheses are verified. The workload index as computed with EEG showed significant differences that match the intended design of the difficulty levels. It was also shown that in the highest difficulty the attention level of participants toward external stimuli was significantly lower, i.e. inattentional blindness increased. Concerning the interaction techniques, the number of interaction errors as measured by EEG was higher with the TOUCH condition, but this is a tendency and not a significant effect. The workload index, on the other hand, was significantly higher in the TOUCH condition compared to the KEYBOARD condition.

Thanks to the ground truth obtained during the pilot study with the NASA-TLX questionnaire, these results validate the use of a workload index measured by EEG for HCI evaluation and pave the way for two other constructs: attention and error recognition. Besides the evaluation of the content (i.e. difficulty levels), we were able to compare two interaction techniques. These are promising results for those who seek to assess how intuitive a UI is with exocentric measures [11].

In this study, we chose to use the particularity of the touch screen to make the task more difficult. Indeed, while we used a touch screen for its possibility of direct manipulation, we kept the character as a frame of reference, resulting in input commands that were (patently) not co-localized with output directions. Besides results denoting the differences between the conditions, participants also spontaneously reported how nonintuitive this condition was. We wanted to investigate our evaluation method on a salient difference at first. Our framework could then be employed to go further, for example by seeking physiological differences between direct and indirect manipulation interfaces in more traditional tasks.

It is interesting to note how those EEG measures could be combined with existing methods to broaden the overall comprehension of the user experience. For instance, while we did show significant differences across difficulty levels and between interaction techniques with behavioral measures (reaction time and performance index), EEG measures could help to understand the underlying mechanisms. Because we have more direct access to brain activity, we can make assumptions about the cause of observed behaviors. For example, participants' worse performance with TOUCH than with KEYBOARD could be due to the fact that they anticipate the outcomes of their actions less well (more interaction errors); the higher reaction time may not only be caused by the interaction technique per se, but also by a higher workload. And while participants manage to cope with the fast pace of the ULTRA level (the smallest reaction times), the increase in perceptual load lowers their awareness of task-irrelevant stimuli.

Additionally, we can observe that the performances obtained at the EASY, MEDIUM and HARD levels are very similar with the keyboard and the touch screen (see Figure 8b). However, EEG analyses revealed that the workload was significantly higher in the TOUCH condition, meaning that users had to allocate significantly more cognitive resources to reach the same performance. This further highlights that EEG-based measures do bring additional information that can complement traditional evaluations such as behavioral measures.

Measuring users' cognitive processes such as workload and attention may prove particularly useful to assess 3D user interfaces (3DUI), since they are known to be more cognitively demanding. They require users to perform 3D mental rotation tasks to successfully manipulate objects or to orient themselves in the 3D environment. Moreover, the usual need for a mapping between the user inputs (with limited degrees-of-freedom, e.g., only 2 for a mouse) and the corresponding actions on 3D objects (with typically 6 degrees-of-freedom),
makes 3DUI usually difficult to assess and design. We reproduced part of this problem with our game environment and obtained coherent results from EEG measures.

Figure 9: Workload index over time for participant 3 (60 s smoothing window). Left: KEYBOARD condition, right: TOUCH condition. Background color represents the corresponding difficulty level.

Above all, an evaluation method based on EEG enables a continuous monitoring of users. The intended use case of our framework is to enroll dedicated testers who would wear the EEG equipment and perform well during the calibration tasks. As a matter of fact, the best performer during workload calibration (participant 3 in Table 2) shows patterns that clearly meet the expectations concerning both difficulty levels and interactions, as pictured in Figure 9.

LIMITATIONS AND FUTURE CHALLENGES
Although using EEG measures as an evaluation method for HCI proved conclusive regarding workload (we obtained a continuous index on par with a ground truth based on traditional questionnaires), the two other constructs we studied could benefit from further improvements.

Despite the direct interaction (TOUCH) being more disorienting for users than the indirect one (KEYBOARD), the recognition of interaction errors differed only by a tendency. This could be explained by the fact that the calibration task was too dissimilar to the virtual environment. Notably, while there were few, slow-paced events during the calibration, users were confronted with many stimuli while they were playing; hence overlapping ERPs must have appeared within the EEG, which may have disturbed the classifier. A calibration task closer to real-life scenarios than the one described in [9] should be envisioned. Such a task should remain generic in order to facilitate the dissemination of EEG as an evaluation method for HCI.

Signal processing could also facilitate the transfer of the classification between a standard task and the evaluated HCI. Indeed, while our results demonstrate that the EEG classification of workload could be transferred from the N-back tasks to a dissimilar virtual environment and user interface, we benefited from spatial filters that specifically take into account the variance between calibration contexts and use contexts: the stationary subspace CSP [35]. Since ERPs may also slightly differ in amplitudes and delays between calibration and use contexts, in the future it would be worth designing similar approaches to optimize temporal or spatial filters for ERPs as well.

The reliability of mental state measures is strongly correlated to the quality of EEG signals. Interestingly enough, participants' mindset during the recordings is one of the factors influencing EEG signals. Their awareness and involvement toward the tasks improve system accuracy. The form of the calibration tasks could be enhanced to engage users more, for example through gamification [10], and our virtual environment proved to be suitable to do so. Whereas our participants were volunteers enrolled among students, in the end the outcome of an evaluation method based on EEG should be strengthened by recruiting dedicated testers, using a selection criterion: how reliably the different constructs could be estimated from their EEG signals during calibration tasks.

Finally, one should acknowledge that when it comes to recordings as sensitive as EEG, artifacts such as the ones induced by muscular activity are of major concern. The way we prevented the appearance of such bias in the present study is threefold. 1) The hardware we used (active electrodes with Ag/AgCl coating) is robust to cable movements, see e.g., [34]. 2) The classifiers were trained on features not related to motion artifacts or motor cortex activation. 3) The position of the screen during the "touch" condition minimized participants' motion, and gestures occurred mostly before the time window used for detecting interaction errors. These precautions are important for the technology to be correctly apprehended.

To further control for any bias in our protocol, we ran a batch of simulations where the labels of the calibration tasks had been randomly shuffled, similarly to the verification process described in [35]. Should artifacts bias our classifiers, differences would have appeared between the KEYBOARD and TOUCH conditions even with such random training. Among the 20 simulations that ran for each of the 3 constructs (workload, attention, error recognition), none yielded significant differences.

CONCLUSION
In this paper, we demonstrated how brain signals, recorded by means of electroencephalography, could be put into practice in order to obtain a continuous evaluation of different interaction techniques, for assessing their ergonomic pros and cons. In particular, we validated an EEG-based workload estimator that does not require modifying the existing software. Furthermore, we showed how users' attention level could be evaluated using background stimuli, such as sounds. Finally, we investigated how the recognition of interaction errors could help to determine the best user interface.

Being able to estimate these three constructs (workload, attention and error recognition) continuously during realistic and complex interaction tasks opened new possibilities. Notably, it enabled us to obtain additional and more objective metrics of user experience, based on the users' cognitive processes. It also provided us with additional insights that traditional measures (e.g., behavioral measures) could not reveal. To sum up, this suggests that, combined with existing evaluation methods, EEG-based evaluation tools such as the ones proposed here can help to better understand the overall user experience. We hope that the increasing availability of EEG devices will foster such approaches and benefit the HCI field.