Semi-Markov model for simulating MOOC students
Louis Faucon, Łukasz Kidziński, Pierre Dillenbourg
Computer Human Interaction in Learning and Instruction
École Polytechnique Fédérale de Lausanne
{louis.faucon,lukasz.kidzninski,pierre.dillenbourg}@epfl.ch
Proceedings of the 9th International Conference on Educational Data Mining

ABSTRACT
Large-scale experiments are often expensive and time consuming. Although Massive Online Open Courses (MOOCs) provide a solid and consistent framework for learning analytics, MOOC practitioners are still reluctant to risk resources in experiments. In this study, we suggest a methodology for simulating MOOC students, which allows estimation of distributions before implementing a large-scale experiment.

To this end, we employ generative models to draw independent samples of artificial students in Monte Carlo simulations. We use Semi-Markov Chains for modeling students' activities and the Expectation-Maximization algorithm for fitting the model. From the fitted model, we generate simulated students whose processes of weekly activities are similar to those of the real students.

Keywords
MOOCs; simulation of students; generative models; Expectation-Maximization; Semi-Markov chains; Bayesian statistics

1. INTRODUCTION
Vast amounts of data which we gather and analyse in modern learning environments allow us to build models of unprecedented scale and accuracy. This phenomenon, in parallel with developments in computer science, gave rise to new possibilities of inference from educational environments. In particular, the growing field of Simulated Learners [8, 11, 14] provides us with tools for inference from educational simulations.

Inference from any simulation is bounded by the predefined level of abstraction of the analysis. In the context of Massive Online Open Courses (MOOCs), on the one hand, as an educational institution we have access to only a handful of MOOCs; on the other hand, we have data as granular as a student's clickstream in a video player. We are therefore obliged to model granularity robustly, depending on the availability of the data. We argue that understanding the properties of the statistical methodology at hand is crucial for successful inference.

We propose a probabilistic model based on an extended version of Markov Chains, called semi-Markov Chains. In the model, we can balance the complexity of the structure and the number of parameters to estimate by cross-validating its parameters. We present an algorithm for fitting the model as well as illustrative examples of the fit on a set of MOOCs.

The contributions of this paper are threefold. First, we investigate to what extent Semi-Markov chains can be used to describe behavioural patterns of students (RQ1). Second, since our model implicitly divides users into clusters, we analyse whether these clusters are interpretable (RQ2). Third, we analyse how these models can be used to infer distributions of events (RQ3).

2. RELATED WORK
Modeling students is a key concept in learning analytics and educational research in general. Researchers build models predicting motivation and cognition based on students' goals [19], or they predict goals from motivational traits [7]. Large datasets allow researchers to find predictive power in seemingly weakly related signals like the length of pauses in a video [12] or potentially noisy signals like head movement in the classroom [16].

2.1 Generative models in MOOCs
All the aforementioned models are focused on prediction and belong to the class of so-called discriminative models. In this study, we suggest a generative model, which allows us not only to predict, but also to generate observations from the estimated distribution. These models capture the probability structure of input variables and the flow of the processes. Several generative models have been applied in MOOCs, e.g. to forums [3].

Among the many generative models that can be encountered in educational research, Markov models were employed for visualization [5], for modeling engagement [17] and for modeling student retention [1].

2.2 Simulated learner
The area of simulating students' behaviour lies at the intersection of cognitive science and artificial intelligence. Examples of applications of simulation of students can be found even outside computer science, where the teacher simulates a student's response in order to self-improve instructional skills [18]. An acknowledged example of the usage of simulated humans [9] for education deals with simulations of patients' behaviour for training medical students.

The emergence of the Internet and new data storage techniques allow researchers to collect and analyse massive amounts of information about the users. Researchers employ simulations for clustering students [13]. For a review of earlier techniques we refer to [2]. We motivate our methodology by the advancements of user modeling in the web context [4], as we find this environment conceptually close to the environment of a MOOC.

3. GENERAL FRAMEWORK
3.1 Dataset
From our internal MOOC database, aggregating data from Coursera and edX, we extracted events for 61 EPFL courses. The raw data contained approximately 23 million events for 500,000 students, arranged in tuples: <StudentID, CourseID, EventType, Timestamp>. The EventType describes the type of an activity and takes one of four possible values presented in Table 1. We chose these events as the most discriminative actions from the key areas: learning, validation and community engagement. Note that our modelling technique can be easily extended to cover other types of events.

Abbreviation   Description                Proportion
VideoPlay      watching a video           51%
Submission     submitting an assignment   33%
ForumView      visiting the forum         15%
ForumPost      posting on the forum       1%

Table 1: Distribution of events in the dataset.

For the analysis we developed our own Python implementation of the algorithm fitting the model¹. In Section 5 we explain the algorithm in detail. Since 23 million events can still fit in the memory of a single computer, we did not require a specific computing architecture to perform the analysis. However, given the considerable size of the dataset, the algorithm takes several minutes to run.

¹ Our implementation is available under https://github.com/lfaucon/edm2016-mooc-simulator

3.2 Definitions
We start with a general framework, in which a student's activity in any MOOC can be very precisely described. Next, we raise the level of abstraction of the model by adding assumptions that simplify the analysis. Our goal is to introduce a model whose complexity can be adapted to the structure of a course and the amount of available data.

We consider a model in which students' behaviour is described in a sequential manner by the type of activity they perform and the time they wait between two sessions. Furthermore, as most of the students perform at most 1 MOOC session per day, we choose a daily granularity of actions.

A sequence of a student's daily activity is described as a list of 'active events' (VideoPlay, Submission, ForumView and ForumPost) followed by an 'end of the day event' (EndOfDay), or only an EndOfDay in the case the student did not perform any activity on the given day. The formal definition of the model is as follows:

The set of all students $S$: We use the symbol $s \in S$ to designate an individual student.

The set $A$ of all types of activities: For this study we chose a set of four types of events: {VideoPlay, Submission, ForumView, ForumPost}. We add to this set one special type of event, EndOfDay. This event corresponds to the end of interactions with MOOCs on a given day. We use the symbol $a \in A$ to designate any type of activity. One can extend the set of activities to other events if needed for a certain application.

Note that we do not specify the regular 'end of a course' event, since we only model the behaviour within the limited time-frame of a course and we treat the last day of the course as the last day of the process. Therefore, each student who went through the whole course without dropping out has just an EndOfDay event on the last day of the course. The number of EndOfDay events is therefore equal to the number of days of the course.

The random sequential variable $X_1^{(s)}, X_2^{(s)}, \ldots, X_n^{(s)}$ represents the sequence of activities of one student $s$. Each $X_i^{(s)} \in A$ and the sequence stops after an EndOfDay when the student reaches the end of the course. We denote the length of the sequence for a student $s$ as $n^{(s)}$. The observation of one student's activity along one MOOC is thus a realization of the random sequence $X$.

The probability distribution $P$: In general, for each student $s \in S$ we can model the $i$-th event $X_i^{(s)}$ with a probability distribution

$$P^{(s)}\left(X_i^{(s)} = a \mid X_{i-1}^{(s)}, X_{i-2}^{(s)}, \ldots, X_1^{(s)}, C_s\right),$$

where $a \in A$, $X_1^{(s)}, \ldots, X_{i-1}^{(s)}$ are the previous events of that student and $C_s$ are personal characteristics of the student.

This distribution represents the student's behaviour profile and allows us to generate typical sequences of activities. Our main objective is to model this distribution as accurately as possible, given the limited information. An accurate distribution would allow us to draw samples of students.

3.3 Assumptions
As discussed in the previous section, assessing $P^{(s)}$ is unfeasible due to dependence on too many events in the past and due to the lack of information on personal student features. In order to fit a probabilistic model we need to relax these dependencies. We introduce the following assumptions:

A1 Students' behaviours fit into a small number of natural categories of behaviour.
A2 The type of a student's activity depends only on their previous activity and not on older activities.

Assumption A1 maps the space of all possible students' characteristics into a limited number of categories, which are much easier to attribute. Many studies on MOOCs explicitly classify students into a small number of categories [10]; for example, students are divided between 'Viewers' who only watch videos, 'Forum Actives' who share with their peers in the MOOC discussion forum and 'Completers' who succeed in the assignments. As we present in the next section, our method is based on unsupervised clustering, where groups emerge in the way that is optimal in terms of maximum likelihood of the model.

With Assumption A2 we impose that only the last activity has an impact on the current activity. This assumption is more constraining, but the complexity of the history grows exponentially with the number of steps, so, in order to be able to estimate the parameters, we have to
reduce the search space. This simplification is usually called the 'Markov assumption'.

Apart from the technical assumptions required for Markov Models, we impose other assumptions for convenience. First, we do not consider the length of events, so the VideoPlay event is only the moment when a student starts watching a video. Second, if a series of events spans midnight, an EndOfDay event is still added to the sequence.

4. PROBABILISTIC MODELING
4.1 Soft clustering
In Section 3 we proposed a simplified framework, in which we assume that there are only a few different possible classes of students (A1). We enumerate clusters $1, 2, \ldots, K$. For each student $s \in S$ we introduce a probability distribution $\mu_k^{(s)}$ which describes the probability that the student belongs to the behaviour class $k$, for $k \in \{1, 2, \ldots, K\}$.

This technique is often referred to as soft clustering, weighted clustering or fuzzy clustering [15]. Instead of a discrete cluster assignment, as for example in K-means, we obtain for each student a probability distribution among the clusters. These probabilities can be intuitively seen as our certainty that the student belongs to a given cluster.

4.2 Semi-Markov Chain
Assumption (A2), i.e. dependence only on the last state, allows us to model the process with Markov Chains. Formally, in the definition of the distribution of the next event we can drop the dependence on the events which occurred before the current one, i.e. we identify

$$P^{(s)}\left(X_i^{(s)} \mid X_1^{(s)}, \ldots, X_{i-1}^{(s)}\right) = P^{(s)}\left(X_i^{(s)} \mid X_{i-1}^{(s)}\right).$$

A preliminary analysis revealed an important weakness of using classic Markov Models in our context. A traditional Markov model considers that a student is equally likely to stop watching videos when they have watched one as when they have already watched ten videos. In practice, students watch videos sequentially and a Markov Model does not appropriately capture the number of events in the sequence.

To remedy this issue we employed Semi-Markov Models (also called Markov Renewal Processes). The key feature of this model is that it allows us to replace the self-loops (transitions from one event type to itself) in the Markov Chain by a probability distribution of the number of repetitions of a given state.

In Semi-Markov Models, we still need to choose a parametric distribution, but we have more freedom than in a traditional Markov Chain. A Markov Chain implicitly assumes that the probability of staying in the same state is the largest for 1 step and decreases with the number of steps. However, we would expect that 1 is not the most probable number of repetitions, at least for a particular group of students. This phenomenon can be captured by, for example, a Poisson distribution, which proved to be more accurate in our preliminary analysis. Thus, for an event $a \in A$ and a class $k$ we model the number of repeated events $R_a^{(k)}$ by

$$P\left(R_a^{(k)} = r\right) = \frac{e^{-\lambda_a^{(k)}} \left(\lambda_a^{(k)}\right)^{r}}{r!},$$

where $r$ is the number of repetitions and $\lambda_a^{(k)}$ is the average number of repetitions, which needs to be estimated from the data for each $k$ and $a$.

To illustrate that the Poisson distribution improves the model, let us consider an example. Suppose we expect that some group of students connects to a MOOC twice a week, with approximately three days between connections. In that case, the average number of repetitions of the EndOfDay event is 3. A simple Markov Model accurately models the average to be 3 but implicitly assumes that 1 is the most probable number of repetitions. A Semi-Markov model with a Poisson distribution also gives an average equal to 3, and the distribution is concentrated around 3.

5. FITTING THE MODEL
5.1 Algorithm
The Expectation-Maximisation (EM) algorithm was introduced in 1977 in [6]. The goal of this iterative technique is to compute the parameters that maximize the likelihood of a given probabilistic model. The EM algorithm has been proven to converge at least to a local maximum of the likelihood. This maximum depends on the initialization point, thus multiple runs with different random initialisations are often used in practice in order to increase the chances of finding the global maximum.

In this study we use the EM algorithm for unsupervised learning. Neither the parameters of the latent classes nor the repartition of the students are known at the beginning and the algorithm has to estimate both quantities at once. In our setting, we define for each $k \in \{1, 2, \ldots, K\}$ and states $a$ and $b$:

- $p_{b \to a}^{(k)}$, the probability that a student with the behaviour profile $k$ performs the activity $a$ after the activity $b$:
  $$p_{b \to a}^{(k)} = P(X_i = a \mid X_{i-1} = b)$$
- $\lambda_a^{(k)}$, the average number of repetitions of an event $a$ for a student of profile $k$.
- $\mu_k^{(s)}$, the probability that a student $s$ belongs to the profile $k$.

We can thus compute the likelihood of the observed sequences, as a function of the cluster repartition and the parameters of the Markov Chains, by

$$\text{likelihood} = \prod_{s \in S} \left[ \sum_{k=1}^{K} \mu_k^{(s)} \prod_{(a,b,r) \in T_s} p_{b \to a}^{(k)} \, P_{\lambda_a^{(k)}}(r) \right], \tag{1}$$

where $T_s$ is the set of tuples $(a, b, r) \in A \times A \times \mathbb{N}$ corresponding to transitions from activity $b$ to activity $a$ with $r$ repetitions of activity $a$, and $P_{\lambda}(r)$ is the Poisson probability of $r$ repetitions defined in Section 4.2. The goal of the algorithm is to find the parameters that maximize the likelihood.

In the first stage, the algorithm randomly initializes $K$ profiles. Next, it iteratively improves the likelihood by alternating two steps as described below. In each step it modifies either the repartition or the Markov chain parameters.

Initialization: The initialization consists in choosing randomly either the $p_{b \to a}^{(k)}$ and $\lambda_a^{(k)}$ or the $\mu_k^{(s)}$. In our algorithm, we start with the $\mu_k^{(s)}$. This can be done by generating a random number $k^*$ from 1 to $K$ for each student $s$ and by setting

$$\mu_k^{(s)} = \begin{cases} 1 & \text{if } k = k^* \\ 0 & \text{otherwise.} \end{cases}$$
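The preparation before the first iteration can be sketched as follows. This is a minimal illustration, not the authors' implementation: it turns one observed sequence into the tuples $(a, b, r)$ of $T_s$ and draws a random hard assignment for $\mu_k^{(s)}$. Several points are assumptions made only for this sketch: $r$ is taken to be the length of a run of identical events, the first run is given a hypothetical `"Start"` predecessor (the paper does not say how the first event is handled), and profiles are indexed from 0 rather than 1.

```python
import random
from itertools import groupby

START = "Start"  # hypothetical marker for "no previous activity" (assumption)


def to_tuples(sequence, start=START):
    """Run-length encode a sequence of events into (a, b, r) tuples:
    r consecutive events of type a, preceded by a run of type b."""
    tuples = []
    prev = start
    for activity, run in groupby(sequence):
        r = sum(1 for _ in run)  # length of the run of identical events
        tuples.append((activity, prev, r))
        prev = activity
    return tuples


def random_hard_assignment(student_ids, K, rng):
    """mu[s][k] = 1 for one randomly drawn profile k*, 0 for the others."""
    mu = {}
    for s in student_ids:
        k_star = rng.randrange(K)  # profiles are indexed 0..K-1 here
        mu[s] = [1.0 if k == k_star else 0.0 for k in range(K)]
    return mu


seq = ["VideoPlay", "VideoPlay", "VideoPlay", "Submission",
       "EndOfDay", "EndOfDay", "VideoPlay", "EndOfDay"]
tuples = to_tuples(seq)
mu0 = random_hard_assignment(["s1", "s2", "s3"], K=3, rng=random.Random(0))
print(tuples)
```

Passing the random generator explicitly makes the several random restarts mentioned above reproducible.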
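Iterations: The iteration phase has two steps. First, we compute the optimal values for $p_{b \to a}^{(k)}$ and $\lambda_a^{(k)}$ given that the $\mu_k^{(s)}$ are fixed (equations (2) and (3)).

$$p_{b \to a}^{(k)} = \frac{\sum_{s \in S} \sum_{(a,b,\_) \in T_s} \mu_k^{(s)}}{\sum_{s \in S} \sum_{(\_,b,\_) \in T_s} \mu_k^{(s)}} \tag{2}$$

$$\lambda_a^{(k)} = \frac{\sum_{s \in S} \sum_{(a,\_,r) \in T_s} r \, \mu_k^{(s)}}{\sum_{s \in S} \sum_{(a,\_,\_) \in T_s} \mu_k^{(s)}} \tag{3}$$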
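Equations (2) and (3) are weighted counts, which the following sketch makes explicit. It is an illustration under assumptions rather than the authors' code: the function name `m_step`, the dictionary layout (`tuples_by_student[s]` as a list of `(a, b, r)` tuples, `mu[s]` as a list of $K$ weights) and the 0-based profile index are all choices made here. Entries whose denominator is zero are simply left out.

```python
from collections import defaultdict


def m_step(tuples_by_student, mu, K):
    """Return (p, lam): p[k][(b, a)] follows eq. (2), lam[k][a] follows eq. (3)."""
    p, lam = [], []
    for k in range(K):
        trans = defaultdict(float)  # sum of mu over tuples (a, b, _)
        out = defaultdict(float)    # sum of mu over tuples (_, b, _)
        reps = defaultdict(float)   # sum of r * mu over tuples (a, _, r)
        runs = defaultdict(float)   # sum of mu over tuples (a, _, _)
        for s, tuples in tuples_by_student.items():
            w = mu[s][k]
            for a, b, r in tuples:
                trans[(b, a)] += w
                out[b] += w
                reps[a] += r * w
                runs[a] += w
        p.append({ba: c / out[ba[0]] for ba, c in trans.items() if out[ba[0]] > 0})
        lam.append({a: reps[a] / runs[a] for a in runs if runs[a] > 0})
    return p, lam


tuples_by_student = {
    "s1": [("VideoPlay", "EndOfDay", 3), ("EndOfDay", "VideoPlay", 1)],
    "s2": [("Submission", "EndOfDay", 1), ("EndOfDay", "Submission", 2)],
}
mu = {"s1": [1.0, 0.0], "s2": [0.5, 0.5]}
p, lam = m_step(tuples_by_student, mu, K=2)
print(p[0], lam[0])
```

In the toy data, student `s2` is split evenly between the two profiles, so its transitions contribute half a count to each of them.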
Next, we compute the new values of $\mu_k^{(s)}$ according to the new $p_{b \to a}^{(k)}$ and $\lambda_a^{(k)}$ (equation (4)).

$$\mu_k^{(s)} = \frac{\prod_{(a,b,r) \in T_s} p_{b \to a}^{(k)} \, P_{\lambda_a^{(k)}}(r)}{\sum_{c=1}^{K} \prod_{(a,b,r) \in T_s} p_{b \to a}^{(c)} \, P_{\lambda_a^{(c)}}(r)} \tag{4}$$

Intuitively, in the first step we compute the parameters of the latent classes given the repartition of the students, and in the second step we recompute the repartition from the new class parameters.
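The second step, equation (4), can be sketched as below. The products over long sequences underflow quickly, so this illustration works with logarithms and normalises at the end; that is a standard numerical device, not something the paper states. The function names, the data layout and the small floor `eps` applied to unseen transitions and rates are assumptions introduced only for this example.

```python
import math


def log_poisson(r, lam):
    """Logarithm of the Poisson probability e^{-lam} * lam^r / r!."""
    return -lam + r * math.log(lam) - math.lgamma(r + 1)


def e_step(tuples_by_student, p, lam, K, eps=1e-12):
    """Return mu[s] = list of K membership probabilities, following eq. (4)."""
    mu = {}
    for s, tuples in tuples_by_student.items():
        log_weights = []
        for k in range(K):
            total = 0.0
            for a, b, r in tuples:
                prob = max(p[k].get((b, a), 0.0), eps)  # floor is an assumption
                rate = max(lam[k].get(a, 0.0), eps)     # floor is an assumption
                total += math.log(prob) + log_poisson(r, rate)
            log_weights.append(total)
        top = max(log_weights)  # subtract the maximum before exponentiating
        weights = [math.exp(x - top) for x in log_weights]
        norm = sum(weights)
        mu[s] = [w / norm for w in weights]
    return mu


same_transitions = {("EndOfDay", "VideoPlay"): 1.0, ("VideoPlay", "EndOfDay"): 1.0}
p = [same_transitions, same_transitions]
lam = [{"VideoPlay": 5.0, "EndOfDay": 1.0},   # profile 0: long runs of videos
       {"VideoPlay": 1.0, "EndOfDay": 1.0}]   # profile 1: short runs of videos
data = {"s1": [("VideoPlay", "EndOfDay", 6), ("EndOfDay", "VideoPlay", 1)]}
mu = e_step(data, p, lam, K=2)
print(mu["s1"])
```

In this toy case the two profiles differ only in the expected number of repeated VideoPlay events, so a student with a run of six videos is attributed almost entirely to profile 0.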
5.2 Example: Interpretation of clusters (K=3)
Before we present the results for the choice of the number of clusters, in this section we illustrate the behaviour of the algorithm and the model when the number of clusters is small (K=3). Although in this case we may lose important variability among groups of students, a small number of clusters allows us to visualise the Semi-Markov models and interpret each of the clusters.

The visualizations of the Semi-Markov models in Figure 1 can reveal general characteristics of students' behaviours. For example, Profiles 1 and 3 are in general less active as they have more EndOfDay events. On the other hand, Profile 3 has a very high average number of repetitions on VideoPlay and a considerable probability of going back to EndOfDay events. This means that students of this cluster are not fully engaged in all MOOC activities.

Figure 1: Three graphical representations of behaviour profiles extracted by the EM algorithm. From top to bottom: profiles 1, 2 and 3 (thickness: transition probability; color: average number of repetitions)

A more insightful way to analyse and interpret the differences is to generate sequences of events and compare the outcomes. We can compute the expected number of videos watched or the expected number of posts on the forum directly from simulated sequences. Table 2 shows the average number of several types of events for 100 simulated students (average from 10000 simulations) over four weeks generated with the three Markov models from Figure 1. For example, we can see that students of Profile 1 participate in the collaborative activities of the MOOC more rarely, but engage in the assignments more than in watching the videos. This might indicate
that they already have a good understanding of the content of the course and do not need to spend more time on studying. To fully investigate this hypothesis, further analysis should be conducted.

Profiles          1      2      3
Watched Videos    1060   3133   2363
Submissions       1535   2423   442
Forum Visits      68     1711   255
Forum Posts       3      96     15

Table 2: Average number of events for 100 students over the first four weeks of the MOOC

5.3 Choice of the parameter K
A common challenge of unsupervised learning and of fitting a probabilistic model is finding the correct number of classes. In our case, the similarity of the algorithm with other clustering techniques such as K-means leads to the "elbow heuristic", often used in practice. The idea is to choose the number of clusters large enough to explain a large part of the variability, but such that a greater number of clusters would not explain substantially more.

In order to assess the correctness of the modeling method, we divided our dataset into a training set and a test set for validating the results. The first step is to run the algorithm on the training set with several values of the parameter K, then use the computed parameters to simulate a new population of students, and finally compare this population with the students from the test set. In Figure 3 we can see that the fit does not improve much after K=15, because too high a number of clusters makes the algorithm learn mostly the noise from the random actions of the students instead of their real intrinsic behavioural patterns.

Figure 2: Average distance of students from their model for different number of classes

In order to confirm the result of this first measure of quality, we designed another measure, described in equation (5). The goal is to quantify how the students' sequences diverge from their attributed cluster. In the equation, $|A|$ is the cardinality of the set of possible activities, $p_s(a)$ is the probability of finding the activity $a$ if we take uniformly at random an activity of student $s$, and $p_k(a)$ is the probability of finding the activity $a$ if we take uniformly at random an activity from a sequence generated by the class $k$.

$$d^2(s, k) = \frac{1}{|A|} \sum_{a \in A} \left(p_s(a) - p_k(a)\right)^2 \tag{5}$$

This distance measure shows an elbow shape for the same values of K, between 10 and 15, as can be seen in Figure 2. We conclude that MOOC students from our dataset can be meaningfully clustered into 10 to 15 different classes.

6. SIMULATIONS
With a model fitted with the EM algorithm at hand, the algorithm has repartitioned the students and chosen the parameters of a Semi-Markov Chain for each of the clusters. Since both the repartition and the Semi-Markov Chains are generative, we can draw samples from the fitted distribution, i.e. we can simulate the students. We run the simulations and show a possible way to measure the validity of the results.

To validate the potential value of simulations, we first propose a simple accuracy measure. In equation (6), $P_{real}(|a| > n)$ represents the probability that a student performs more than $n$ events of type $a$ during the time of the MOOC, where $|a|$ is the count of events of type $a$. $P_{sim}(|a| > n)$ represents the same probability but for a simulated student. In the measure we chose the value $N = 50$ because it covers most of the variability in the students' activity sequences and is not too large, as still 19% of the students have an activity with more than 50 repetitions.

$$MSE = \frac{1}{(|A| - 1) \cdot N} \sum_{a \in A} \sum_{n < N} \left(P_{real}(|a| > n) - P_{sim}(|a| > n)\right)^2 \tag{6}$$

Figure 3: Measure of accuracy of a simulation for different number of classes

The small error shows that the distribution obtained from simulations is close to the original distribution. This implies that the model, properly trained on a small sample of students or on just a few first events, can be extrapolated by simulation to further events or larger samples.
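The accuracy measure of equation (6) can be computed from per-student event counts, as in the following sketch. It is a minimal illustration and not the authors' code: the function names and the input layout (a dictionary mapping each activity to a list of per-student counts) are assumptions, and $n$ is taken to range over $0, \ldots, N-1$, which the equation leaves implicit. The normalisation by $(|A| - 1) \cdot N$ follows equation (6) as printed, so at least two activities are required.

```python
def tail_probabilities(counts, N):
    """Empirical P(count > n) for n = 0, ..., N-1 from per-student counts."""
    total = len(counts)
    return [sum(1 for c in counts if c > n) / total for n in range(N)]


def simulation_mse(real_counts, sim_counts, N=50):
    """Mean squared difference between real and simulated tail probabilities."""
    activities = list(real_counts)
    squared = 0.0
    for a in activities:
        p_real = tail_probabilities(real_counts[a], N)
        p_sim = tail_probabilities(sim_counts[a], N)
        squared += sum((x - y) ** 2 for x, y in zip(p_real, p_sim))
    return squared / ((len(activities) - 1) * N)  # normalisation as in eq. (6)


real = {"VideoPlay": [0, 1, 2, 3], "Submission": [0, 0, 1, 1]}
sim = {"VideoPlay": [0, 1, 2, 3], "Submission": [1, 1, 1, 1]}
mse_same = simulation_mse(real, real, N=2)
mse_diff = simulation_mse(real, sim, N=2)
print(mse_same, mse_diff)
```

In the toy data only the Submission counts differ, and only for $n = 0$, where the real tail probability is 0.5 and the simulated one is 1.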
In an experimental setup, simulations with varying initial conditions of the model (e.g. probabilities of transitions) can give us distributions of events at a later stage. Knowing the probability distributions of the results of two conditions allows us to estimate the sample sizes needed for finding statistical evidence of the investigated effect.

7. DISCUSSION
In Section 5 we showed that Semi-Markov chains can be successfully applied to describe behavioural patterns of students (RQ1). In Section 5.2, a simple study with a reduced number of clusters demonstrates their potential interpretability (RQ2). In Section 6, we discuss how these models can be used to infer distributions of events (RQ3).

Our method has two main limitations. They can be further relaxed with additional data or with incorporation of domain knowledge.

The Homogeneity of the Markov process: The Markov assumption was introduced for reducing the number of parameters of our model. It is a strong simplification, which entails some drawbacks. This assumption implicitly requires that students behave with exactly the same transition matrix during the whole course. The motivation to keep learning should increase when getting closer to the end of the course and thus the dropout rate decreases, which cannot be captured by our method. A good way to overcome this weakness is to use inhomogeneous Markov models with transition probabilities that are functions of time.

Differences between courses: The quality of the videos, the level of difficulty of the assignments or the discussion topics in the forums are all factors that can greatly influence the behaviour of a student. None of these were included in our model. We hypothesize that adding external annotations that would impact the transition probabilities of our Markov models could help solve this problem. As for now, our model can be used to compare courses. For example, if we run the algorithm on two MOOCs and realise that the Video Watchers of one course have a lower engagement, that may indicate a lower quality of video content, while differences for the Forum Followers may reveal differences in the quality of the forum discussions.

8. REFERENCES
[1] Girish Balakrishnan and Derrick Coetzee. Predicting student retention in massive open online courses using hidden markov models. Electrical Engineering and Computer Sciences University of California at Berkeley, 2013.
[2] Eric Bonabeau. Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences, 99(suppl 3):7280–7287, 2002.
[3] Christopher G Brinton, Mung Chiang, Sonal Jain, HK Lam, Zhenming Liu, and Felix Ming Fai Wong. Learning about social learning in moocs: From statistical analysis to generative model. Learning Technologies, IEEE Transactions on, 7(4):346–359, 2014.
[4] Ed H Chi, Peter Pirolli, Kim Chen, and James Pitkow. Using information scent to model user information needs and actions and the web. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 490–497. ACM, 2001.
[5] Carleton Coffrin, Linda Corrin, Paula de Barba, and Gregor Kennedy. Visualizing patterns of student engagement and performance in moocs. In Proceedings of the fourth international conference on learning analytics and knowledge, pages 83–92. ACM, 2014.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):pp. 1–38, 1977.
[7] Andrew J Elliot and Todd M Thrash. Approach-avoidance motivation in personality: approach and avoidance temperaments and goals. Journal of personality and social psychology, 82(5):804, 2002.
[8] José P González-Brenes and Yun Huang. Using data from real and simulated learners to evaluate adaptive tutoring systems. In Proceedings of the Workshops at the 18th International Conference on Artificial Intelligence in Education AIED, 2015.
[9] James A Gordon, William M Wilkerson, David Williamson Shaffer, and Elizabeth G Armstrong. "practicing" medicine without risk: students' and educators' responses to high-fidelity patient simulation. Academic Medicine, 76(5):469–472, 2001.
[10] René F Kizilcec, Chris Piech, and Emily Schneider. Deconstructing disengagement: analyzing learner subpopulations in massive open online courses. In Proceedings of the third international conference on learning analytics and knowledge, pages 170–179. ACM, 2013.
[11] Kenneth R Koedinger, Noboru Matsuda, Christopher J MacLellan, and Elizabeth A McLaughlin. Methods for evaluating simulated learners: Examples from simstudent. 17th International Conference on Artificial Intelligence in Education AIED, 5:45–54, 2015.
[12] Nan Li, Łukasz Kidziński, Patrick Jermann, and Pierre Dillenbourg. How do in-video interactions reflect perceived video difficulty? In Proceedings of the European MOOCs Stakeholder Summit 2015, number EPFL-CONF-207968, pages 112–121. PAU Education, 2015.
[13] Ran Liu and Kenneth R Koedinger. Variations in learning rate: Student classification based on systematic residual error patterns across practice opportunities. In Educational Data Mining 2015. EDM, 2015.
[14] Gord McCalla and John Champaign. Aied 2013 simulated learners workshop. In Artificial Intelligence in Education, pages 954–955. Springer, 2013.
[15] Richard Nock and Frank Nielsen. On weighting clustering. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(8):1223–1235, 2006.
[16] Mirko Raca, Łukasz Kidziński, and Pierre Dillenbourg. Translating head motion into attention-towards processing of student's body-language. In Proceedings of the 8th International Conference on Educational Data Mining, number EPFL-CONF-207803, 2015.
[17] Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daumé III, and Lise Getoor. Modeling learner engagement in moocs using probabilistic soft logic. In NIPS Workshop on Data Driven Education, 2013.
[18] Ipke Wachsmuth and Jens-Holger Lorenz. Sharpening one's diagnostic skill by simulating students' error behaviors. Focus on learning problems in mathematics, 9(2), 1987.
[19] Christopher A Wolters. Advancing achievement goal theory: Using goal structures and goal orientations to predict students' motivation, cognition, and achievement. Journal of educational psychology, 96(2):236, 2004.