Table Of ContentA Framework for Individualizing Predictions of Disease
Trajectories by Exploiting Multi-Resolution Structure
PeterSchulam SuchiSaria
Dept. ofComputerScience Dept. ofComputerScience
JohnsHopkinsUniversity JohnsHopkinsUniversity
6
Baltimore,MD21218 Baltimore,MD21218
1
[email protected] [email protected]
0
2
n
a Abstract
J
1
For many complex diseases, there is a wide variety of ways in which an indi-
2
vidual can manifest the disease. The challenge of personalized medicine is to
developtoolsthatcanaccuratelypredictthetrajectoryofanindividual’sdisease,
]
L which can in turn enable clinicians to optimize treatments. We represent an in-
M dividual’sdiseasetrajectoryasacontinuous-valuedcontinuous-timefunctionde-
scribing the severity of the disease over time. We propose a hierarchical latent
t. variablemodelthatindividualizespredictionsofdiseasetrajectories. Thismodel
a
sharesstatisticalstrengthacrossobservationsatdifferentresolutions–thepopula-
t
s tion,subpopulationandtheindividuallevel.Wedescribeanalgorithmforlearning
[ populationandsubpopulationparametersoffline,andanonlineprocedurefordy-
2 namicallylearningindividual-specificparameters. Finally,wevalidateourmodel
v on the task of predicting the course of interstitial lung disease, a leading cause
4 ofdeathamongpatientswiththeautoimmunediseasescleroderma. Wecompare
7 ourapproachagainststate-of-the-artanddemonstratesignificantimprovementsin
6 predictiveaccuracy.
4
0
. 1 Introduction
1
0 Incomplex,chronicdiseasessuchasautism,lupus,andParkinson’s,thewaythediseasemanifests
6 mayvarygreatlyacrossindividuals[1].Forexample,inscleroderma,thediseaseweuseasarunning
1 exampleinthiswork,individualsmaybeaffectedacrosssixorgansystems—thelungs,heart,skin,
: gastrointestinaltract,kidneys,andvasculature—tovaryingextents[2]. Foranysingleorgansystem,
v
someindividualsmayshowrapiddeclinethroughoutthecourseoftheirdisease,whileothersmay
i
X show early decline but stabilize later on. Often in such diseases, the most effective drugs have
r strongside-effects. Withtoolsthatcanaccuratelypredictanindividual’sdiseaseactivitytrajectory,
a clinicianscanmoreaggressivelytreatthoseatgreatestriskearly,ratherthanwaitinguntilthedisease
progresses to a high level of severity. To monitor the disease, physicians use clinical markers to
quantify severity. In scleroderma, for example, PFVC is a clinical marker used to measure lung
severity. The task of individualized prediction of disease activity trajectories is that of using an
individual’sclinicalhistorytopredictthefuturecourseofaclinicalmarker;inotherwords,thegoal
is to predict a function representing a trajectory that is updated dynamically using an individual’s
previousmarkersandindividualcharacteristics.
Predictingdiseaseactivitytrajectoriespresentsanumberofchallenges. First,therearemultiplela-
tentfactorsthatcauseheterogeneityacrossindividuals. Onesuchfactoristheunderlyingbiological
mechanism driving the disease. For example, two different genetic mutations may trigger distinct
diseasetrajectories(e.g. asinFigures1aand1b). Ifwecoulddivideindividualsintogroupsaccord-
ing to their mechanisms—or disease subtypes (see e.g. [3, 4, 5, 6])—it would be straightforward
tofitseparatemodelstoeachsubpopulation. Inmostcomplexdiseases,however,themechanisms
are poorly understood and clear definitions of subtypes do not exist. If subtype alone determined
trajectory,thenwecouldclusterindividuals. However,otherunobservedindividual-specificfactors
1
such as behavior and prior exposures affect health and can cause different trajectories across indi-
vidualsofthesamesubtype. Forinstance,achronicsmokerwilltypicallyhaveunhealthylungsand
somayhaveatrajectorythatisconsistentlylowerthananon-smoker’s,whichwemustaccountfor
usingindividual-specificparameters. Anindividual’strajectorymayalsobeinfluencedbytransient
factors—e.g. an infection unrelated to the disease that makes it difficult to breath (similar to the
“dips”inFigure1corthethirdrowinFigure1d). Thiscancausemarkervaluestotemporarilydrop,
andmaybehardtodistinguishfromdiseaseactivity. Weshowthatthesefactorscanbearrangedin
a hierarchy (population, subpopulation, and individual), but that not all levels of the hierarchy are
observed. Finally,thefunctionaloutcomeisarichtarget,andthereforemorechallengingtomodel
thanscalaroutcomes. Inaddition,themarkerdataisobservedincontinuous-timeandisirregularly
sampled,makingcommonlyuseddiscrete-timeapproachestotimeseriesmodeling(orapproaches
thatrelyonimputation)notwellsuitedtothisdomain.
Relatedwork. Themajorityofpredictivemodelsinmedicineexplainvariabilityinthetargetout-
come by conditioning on observed risk factors alone. However, these do not account for latent
sources of variability such as those discussed above. Further, these models are typically cross-
sectional—theyusefeaturesfromdatameasuredupuntilthecurrenttimetopredictaclinicalmarker
oroutcomeatafixedpointinthefuture. Asanexample,considerthemortalitypredictionmodelby
Leeetal.[7],wherelogisticregressionisusedtointegratefeaturesintoapredictionabouttheprob-
abilityofdeathwithin30daysforagivenpatient. Topredicttheoutcomeatmultipletimepoints,
typically separate models are trained. Moreover, these models use data from a fixed-size window,
ratherthanagrowinghistory.
Researchers in the statistics and machine learning communities have proposed solutions that ad-
dressanumberoftheselimitations. MostrelatedtoourworkisthatbyRizopoulos[8],wherethe
focus is on making dynamical predictions about a time-to-event outcome (e.g. time until death).
Theirmodelupdatespredictionsovertimeusingallpreviouslyobservedvaluesofalongitudinally
recorded marker. Besides conditioning on observed factors, they account for latent heterogeneity
acrossindividualsbyallowingforindividual-specificadjustmentstothepopulation-levelmodel—
e.g. for a longitudinal marker, deviations from the population baseline are modeled using random
effectsbysamplingindividual-specificinterceptsfromacommondistribution. Othercloselyrelated
work by Proust-Lima et al. [9] tackle a similar problem as Rizopoulos, but address heterogeneity
usingamixturemodel.
Another common approach to dynamical predictions is to use Markov models such as order-p
autoregressive models (AR-p), HMMs, state space models, and dynamic Bayesian networks
(seee.g. in[10]). Althoughsuchmodelsnaturallymakedynamicpredictionsusingthefullhistory
by forward-filtering, they typically assume discrete, regularly-spaced observation times. Gaussian
processes (GPs) are a commonly used alternative for handling continuous-time observations—see
Robertsetal. [11]forarecentreviewofGPtimeseriesmodels. SinceGaussianprocessesarenon-
parametric generative models of functions, they naturally produce functional predictions dynami-
callybyusingtheposteriorpredictiveconditionedontheobserveddata. MixturesofGPshavebeen
appliedtomodelheterogeneityinthecovariancestructureacrosstimeseries(e.g. [12]),howeveras
notedinRobertsetal.,appropriatemeanfunctionsarecriticalforaccurateforecastsusingGPs. In
ourwork,anindividual’strajectoryisexpressedasaGPwithahighlystructuredmeancomprising
population, subpopulation and individual-level components where some components are observed
andothersrequireinference.
Morebroadly,multi-levelmodelshavebeenappliedinmanyfieldstomodelheterogeneouscollec-
tionsofunitsthatareorganizedwithinahierarchy[13]. Forexample,inpredictingstudentgrades
overtime, individualswithinaschoolmayhaveparameterssampledfromtheschool-levelmodel,
andtheschool-levelmodelparametersinturnmaybesampledfromacounty-specificmodel. Inour
setting,thehierarchicalstructure—whichindividualsbelongtothesamesubgroup—isnotknowna
priori. Similarideasarestudiedinmulti-tasklearning,whererelationshipsbetweendistinctpredic-
tiontasksareusedtoencouragesimilarparameters. Thishasbeenappliedtomodelingtrajectories
by treating predictions at each time point as a separate task and enforcing similarity between sub-
models close in time [14]. This approach is limited, however, in that it models a finite number
of times. Others, more recently, have developed models for disease trajectories (see [15, 16] and
referenceswithin)butthesefocusonretrospectiveanalysistodiscoverdiseaseetiologyratherthan
dynamicalprediction. Schulametal.[16]incorporatedifferencesintrajectoriesduetosubtypesand
individual-specificfactors. Webuilduponthisworkhere. Finally,recommendersystemsalsoshare
2
(e)
(d) Subtype B-spline coefficients2Rdz w~g �~g G
Subtype marginal model
coefficients Rqz M
2 z
i
(a)
Subtype indicator 1,...,G
2{ }
Smuobdteypl fee amtuarregsinal Rqz ~xiz yij tij
2
N
Shytrpuecrt-upraerda mnoeisteer sGP ↵ fi i
(b) ~bi ⇢~i ~xip
Structured noise function RR
2
Individual model coefficients2Rd` ⌃b ⇤
Individual model covariance matrix2Rd`⇥d`
(c) Population model features-to-cPooepffiuclaietniotn m maopd2el Rcodpe⇥ffiqcpients 2Rdp Pfeoaptuurleastio2n mRoqdpel
Figure 1: Plots (a-c) show example marker trajectories. Plot (d) shows adjustments to a population and
subpopulation fit (row 1). Row 2 makes an individual-specific long-term adjustment. Row 3 makes short-
termstructurednoiseadjustments. Plot(e)showstheproposedgraphicalmodel. Levelsinthehierarchyare
color-coded.Modelparametersareenclosedindashedcircles.Observedrandomvariablesareshaded.
informationacrossindividualswiththeaimoftailoringpredictions(seee.g. [17,18,19]), butthe
taskisotherwisedistinctfromours.
Contributions. We propose a hierarchical model of disease activity trajectories that directly ad-
dressescommon—latentandobserved—sourcesofheterogeneityincomplex,chronicdiseasesus-
ingthreelevels:thepopulationlevel,subpopulationlevel,andindividuallevel.Themodeldiscovers
thesubpopulationstructureautomatically,andinfersindividual-levelstructureovertimewhenmak-
ing predictions. In addition, we include a Gaussian process as a model of structured noise, which
is designed to explain away temporary sources of variability that are unrelated to disease activity.
Together, thesefourcomponentsallowindividualtrajectoriestobehighlyheterogeneouswhilesi-
multaneously sharing statistical strength across observations at different “resolutions” of the data.
Whenmakingpredictionsforagivenindividual,weuseBayesianinferencetodynamicallyupdate
ourposteriorbeliefoverindividual-specificparametersgiventheclinicalhistoryandusetheposte-
riorpredictivetoproduceatrajectoryestimate. Finally,weevaluateourapproachbydevelopinga
state-of-the-arttrajectorypredictiontoolforlungdiseaseinscleroderma. Wetrainourmodelusing
alarge,nationaldatasetcontainingindividualswithsclerodermatrackedover20yearsandcompare
ourpredictionsagainstalternativeapproaches. Wefindthatourapproachyieldssignificantgainsin
predictiveaccuracyofdiseaseactivitytrajectories.
2 DiseaseTrajectoryModel
We describe a hierarchical model of an individual’s clinical marker values. The graphical model
is shown in Figure 1e. For each individual i, we use N to denote the number of observed mark-
i
ers. We denote each individual observation using y and its measurement time using t where
ij ij
j ∈ {1,...,Ni}. Weuse(cid:126)yi ∈ RNi and(cid:126)ti ∈ RNi todenoteallofindividuali’smarkervaluesand
measurementtimesrespectively. Inthefollowingdiscussion,Φ(t )denotesacolumn-vectorcon-
ij
tainingabasisexpansionofthetimet andweuseΦ(cid:0)(cid:126)t (cid:1) = [Φ(t ),...,Φ(t )](cid:62) todenotethe
ij i i1 iNi
matrixcontainingthebasisexpansionofpointsin(cid:126)t ineachofitsrows. Wemodelthejthmarker
i
valueforindividualiasanormallydistributedrandomvariablewithameanassumedtobethesum
offourterms: apopulationcomponent,asubpopulationcomponent,anindividualcomponent,and
astructurednoisecomponent:
y ∼N Φ (t )(cid:62)Λ (cid:126)x +Φ (t )(cid:62)β(cid:126) +Φ (t )(cid:62)(cid:126)b + f (t ) ,σ2. (1)
ij p ij ip z ij zi (cid:96) ij i i ij
(cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125)
(A)population (B)subpopulation (C)individual (D)structurednoise
The four terms in the sum serve two purposes. First, they allow for a number of different sources
ofvariationtoinfluencetheobservedmarkervalue,whichallowsforheterogeneitybothacrossand
within individuals. Second, they share statistical strength across different subsets of observations.
The population component shares strength across all observations. The subpopulation component
3
sharesstrengthacrossobservationsbelongingtosubgroupsofindividuals. Theindividualcompo-
nentsharesstrengthacrossallobservationsbelongingtothesameindividual. Finally,thestructured
noisesharesinformationacrossobservationsbelongingtothesameindividualthataremeasuredat
similartimes. Predictinganindividual’strajectoryinvolvesestimatinghersubtypeandindividual-
specificparametersasnewclinicaldatabecomesavailable1. Wedescribeeachofthecomponentsin
detailbelow.
Populationlevel. Thepopulationmodelpredictsaspectsofanindividual’sdiseaseactivitytrajec-
toryusingobservedbaselinecharacteristics(e.g. genderandrace),whicharerepresentedusingthe
featurevector(cid:126)x . Thissub-modelisshownwithintheorangeboxinFigure1e. Hereweassume
ip
thatthiscomponentisalinearmodelwherethecoefficientsareafunctionofthefeatures(cid:126)xip ∈Rqp.
Thepredictedvalueofthejthmarkerofindividualimeasuredattimet isshowninEq. 14(A),
ij
whereΦp(t)∈Rdp isabasisexpansionoftheobservationtimeandΛ∈Rdp×qp isamatrixusedas
alinearmapfromanindividual’scovariates(cid:126)xip tocoefficientsρi ∈ Rdp. Atthislevel,individuals
withsimilarcovariateswillhavesimilarcoefficients. ThematrixΛislearnedoffline.
Subpopulation level. We model an individual’s subtype using a discrete-valued latent variable
z ∈{1,...,G}, where G is the number of subtypes. We associate each subtype with a unique
i
diseaseactivitytrajectoryrepresentedusingB-splines,wherethenumberandlocationoftheknots
and the degree of the polynomial pieces are fixed prior to learning. These hyper-parameters de-
termine a basis expansion Φz(t) ∈ Rdz mapping a time t to the B-spline basis function values at
thattime. Trajectoriesforeachsubtypeareparameterizedbyavectorofcoefficientsβ(cid:126)g ∈ Rdz for
g ∈ {1,...,G}, which are learned offline. Under subtype z , the predicted value of marker y
i ij
measured at time t is shown in Eq. 14(B). This component explains differences such as those
ij
observedbetweenthetrajectoriesinFigures1aand1b. Inmanycases,featuresatbaselinemaybe
predictive of subtype. For example, in scleroderma, the types of antibody an individual produces
(i.e. the presence of certain proteins in the blood) are correlated with certain trajectories. We can
improvepredictiveperformancebyconditioningonbaselinecovariatestoinferthesubtype. Todo
this, we use a multinomial logistic regression to define feature-dependent marginal probabilities:
zi | (cid:126)xiz ∼ Mult(π1:G((cid:126)xiz)),whereπg((cid:126)xiz) ∝ ew(cid:126)g(cid:62)(cid:126)xiz. Wedenotetheweightsofthemultinomial
regressionusingw(cid:126) ,wheretheweightsofthefirstclassareconstrainedtobe(cid:126)0toensuremodel
1:G
identifiability. Theremainingweightsarelearnedoffline.
Individual level. This level models deviations from the population and subpopulation models us-
ing parameters that are learned dynamically as the individual’s clinical history grows. Here, we
parameterizetheindividualcomponentusingalinearmodelwithbasisexpansionΦ(cid:96)(t) ∈ Rd(cid:96) and
individual-specificcoefficients(cid:126)bi ∈Rd(cid:96).Anindividual’scoefficientsaremodeledaslatentvariables
withmarginaldistribution(cid:126)b ∼ N((cid:126)0,Σ ). Forindividuali,thepredictedvalueofmarkery mea-
i b ij
suredat time t isshown inEq. 14(C). This componentcanexplain, forexample, differences in
ij
overallhealthduetoanunobservedcharacteristicsuchaschronicsmoking,whichmaycauseatyp-
icallylowerlungfunctionthanwhatispredictedbythepopulationandsubpopulationcomponents.
SuchanadjustmentisillustratedacrossthefirstandsecondrowsofFigure1d.
Structured noise. Finally, the structured noise component f captures transient trends. For ex-
i
ample, an infection may cause an individual’s lung function to temporarily appear more restricted
than it actually is, which may cause short-term trends like those shown in Figure1c and the third
row of Figure1d. We treat f as a function-valued latent variable and model it using a Gaus-
i
sian process with zero-valued mean function and Ornstein-Uhlenbeck (OU) covariance function:
K (t ,t )=a2exp(cid:8)−(cid:96)−1|t −t |(cid:9). The amplitude a controls the magnitude of the structured
OU 1 2 1 2
noisethatweexpecttoseeandthelength-scale(cid:96)controlsthelengthoftimeoverwhichweexpect
thesetemporarytrendstooccur. TheOUkernelisidealformodelingsuchdeviationsasitisboth
mean-revertinganddrawsfromthecorrespondingstochasticprocessareonlyfirst-ordercontinuous,
whicheliminateslong-rangedependenciesbetweendeviations[20]. Applicationsinotherdomains
mayrequiredifferentkernelstructuresmotivatedbypropertiesofthenoiseinthetrajectories.
1Themodelfocusesonpredictingthelong-termtrajectoryofanindividualwhenleftuntreated. Inmany
chronicconditions,asisthecaseforscleroderma,drugsonlyprovideshort-termrelief(accountedforinour
modelbytheindividual-specificadjustments). Iftreatmentsthatalterlong-termcourseareavailableandcom-
monlyprescribed,thentheseshouldbeincludedwithinthemodelasanadditionalcomponentthatinfluences
thetrajectory.
4
2.1 Learning
Objective function. To learn the parameters of our model Θ = {Λ,w(cid:126) ,β(cid:126) ,Σ ,a,(cid:96),σ2}, we
1:G 1:G b
maximizetheobserved-datalog-likelihood(i.e. theprobabilityofallindividual’smarkervalues(cid:126)y
i
given measurement times(cid:126)t and features {(cid:126)x ,(cid:126)x }). This requires marginalizing over the latent
i ip iz
variables{z ,(cid:126)b ,f }foreachindividual. Thisyieldsamixtureofmultivariatenormals:
i i i
G
P ((cid:126)y |X ,Θ)= (cid:88)π ((cid:126)x )N (cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17), (2)
i i zi iz i p i ip z i zi i i
zi=1
where K(t ,t ) = Φ (t )(cid:62)Σ Φ (t ) + K (t ,t ) + σ2I(t = t ). The observed-data
1 2 (cid:96) 1 b (cid:96) 2 OU 1 2 1 2
log-likelihood for all individuals is therefore: L(Θ)=(cid:80)M logP ((cid:126)y |X ,Θ). A more detailed
i=1 i i
derivationisprovidedinthesupplement.
Optimizing the objective. To maximize the observed-data log-likelihood with respect to Θ, we
partitiontheparametersintotwosubsets. Thefirstsubset,Θ ={Σ ,α,(cid:96),σ2},containsvaluesthat
1 b
parameterize the covariance function K(t ,t ) above. As is often done when designing the ker-
1 2
nelofaGaussianprocess,weuseacombinationofdomainknowledgetochoosecandidatevalues
and model selection using observed-data log-likelihood as a criterion for choosing among candi-
dates [20]. The second subset, Θ ={Λ,w(cid:126) ,β(cid:126) }, contains values that parameterize the mean
2 1:G 1:G
ofthemultivariatenormaldistributioninEquation16. Welearntheseparametersusingexpectation
maximization(EM)tofindalocalmaximumoftheobserved-datalog-likelihood.
Expectationstep. Allparametersrelatedto(cid:126)b andf arelimitedtothecovariancekernelandare
i i
not optimized using EM. We therefore only need to consider the subtype indicators z as unob-
i
servedintheexpectationstep. Becausez isdiscrete,itsposterioriscomputedbynormalizingthe
i
jointprobabilityofz and(cid:126)y . Letπ∗ denotetheposteriorprobabilitythatindividualihassubtype
i i ig
g ∈{1,...,G},thenwehave
π∗ ∝π ((cid:126)x )N (cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (3)
ig g iz i p i ip z i g i i
Maximization step. In the maximization step, we optimize the marginal probability of the soft
assignments under the multinomial logistic regression model with respect to w(cid:126) using gradient-
1:G
basedmethods. Tooptimizetheexpectedcomplete-datalog-likelihoodwithrespecttoΛandβ(cid:126) ,
1:G
we note that the mean of the multivariate normal for each individual is a linear function of these
parameters. HoldingΛfixed,wecanthereforesolveforβ(cid:126) inclosedformandviceversa. Weuse
1:G
ablockcoordinateascentapproach,alternatingbetweensolvingforΛandβ(cid:126) untilconvergence.
1:G
Becausetheexpectedcomplete-datalog-likelihoodisconcavewithrespecttoallparametersinΘ ,
2
eachmaximizationstepisguaranteedtoconverge. Weprovideadditionaldetailsinthesupplement.
2.2 Prediction
Our prediction yˆ(t(cid:48)) for the value of the trajectory at time t(cid:48) is the expectation of the marker y(cid:48)
i i i
under the posterior predictive conditioned on observed markers (cid:126)y measured at times(cid:126)t thus far.
i i
Thisrequiresevaluatingthefollowingexpression:
yˆ(t(cid:48))= (cid:88)G (cid:90) (cid:90) E(cid:104)y(cid:48) |z ,(cid:126)b ,f ,t(cid:48)(cid:105)P (cid:16)z ,(cid:126)b ,f |(cid:126)y ,X ,Θ(cid:17)df d(cid:126)b (4)
i i i i i i i i i i i i i
zi=1 Rd(cid:96) RNi (cid:124) (cid:123)(cid:122) (cid:125)(cid:124) (cid:123)(cid:122) (cid:125)
predictiongivenlatentvars. posterioroverlatentvars.
(cid:104) (cid:105)
=E∗ Φ (t(cid:48))(cid:62)Λ(cid:126)x +Φ (t(cid:48))(cid:62)β(cid:126) +Φ (t(cid:48))(cid:62)(cid:126)b +f (t(cid:48)) (5)
zi,(cid:126)bi,fi p i ip z i zi (cid:96) i i i i
β(cid:126)i∗(Eq.32) (cid:126)b∗i (Eq.36) f∗(t(cid:48))(Eq.43)
(cid:122) (cid:125)(cid:124) (cid:123) (cid:122) (cid:125)(cid:124) (cid:123) i i
(cid:104) (cid:105) (cid:104) (cid:105) (cid:122) (cid:125)(cid:124) (cid:123)
= Φ (t(cid:48))(cid:62)Λ(cid:126)x +Φ (t(cid:48))(cid:62)E∗ β(cid:126) +Φ (t(cid:48))(cid:62)E∗ (cid:126)b + E∗ [f (t(cid:48))] , (6)
(cid:124)p i(cid:123)(cid:122) ip(cid:125) (cid:124)z i (cid:123)(cid:122)zi zi(cid:125) (cid:124)(cid:96) i (cid:123)(cid:122)(cid:126)bi i(cid:125) (cid:124)fi (cid:123)(cid:122)i i(cid:125)
populationprediction subpopulationprediction individualprediction structurednoiseprediction
whereE∗ denotesanexpectationconditionedon(cid:126)y ,X ,Θ. InmovingfromEq. 29to30,wehave
i i
writtentheintegralasanexpectationandsubstitutedtheinnerexpectationwiththemeanofthenor-
maldistributioninEq.14. FromEq. 30to31,weuselinearityofexpectation. Eqs. 32,36,and43
5
1 2 4 1 2 4
(a) Pr()=0.57 Pr()=0.71 Pr()=0.60 (b)
80 Pr()=0.18 Pr()=0.21 Pr()=0.39 80
60 ●●●●●●●●● ●●●●●●●●● ●●●●●●●●● 60 ●●●●●●●●● ●●●●●●●●● ●●●●●●●●●
●● ●● ●● ●● ●● ●●
40 ●●●●● ●●●●● ●●●●● 40 ●●●●● ●●●●● ●●●●●
0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0
1 2 4 1 2 4
100 (c) 100 (d) Pr()=0.56
Pr()=0.40
80 80
60 ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ● 60 ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●● ●● ● ● ● ●
40 40
Pr()=0.53 Pr()=0.54 Pr()=0.99 Pr()=0.54
20 Pr()=0.26 Pr()=0.46 Pr()=0.01 20 Pr()=0.46
0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.00.0 2.5 5.0 7.5 10.0
Years Since First Seen
Figure 2: Plots (a) and (c) show dynamic predictions using the proposed model for two individuals. Red
markersareunobserved. Blueshowsthetrajectorypredictedusingthemostlikelysubtype,andgreenshows
thesecondmostlikely. Plot(b)showsdynamicpredictionsusingtheB-splineGPbaseline. Plot(d)shows
predictionsmadeusingtheproposedmodelwithoutindividual-specificadjustments.
belowshowhowtheexpectationsinEq.31arecomputed. Anexpandedversionofthesestepsare
providedinthesupplement.
Computingthepopulationpredictionisstraightforwardasallquantitiesareobserved. Tocompute
thesubpopulationprediction,weneedtocomputethemarginalposterioroverz ,whichweusedin
i
theexpectationstepabove(Eq.19). Theexpectedsubtypecoefficientsaretherefore
(cid:16) (cid:17)
β(cid:126)∗ (cid:44) (cid:80)G π∗ β(cid:126) . (7)
i zi=1 izi zi
Tocomputetheindividualprediction,notethatbyconditioningonz ,theintegraloverthelikelihood
i
withrespecttof andthepriorover(cid:126)b formthelikelihoodandpriorofaBayesianlinearregression.
i i
LetK =K ((cid:126)t ,(cid:126)t )+σ2I,thentheposteriorover(cid:126)b conditionedonz is:
f OU i i i i
P (cid:16)(cid:126)b |z ,(cid:126)y ,X ,Θ(cid:17)∝N (cid:16)(cid:126)b |0,Σ (cid:17)N (cid:16)(cid:126)y |Φ Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) +Φ (cid:0)(cid:126)t (cid:1)(cid:126)b , K (cid:17). (8)
i i i i i b i p ip z i zi (cid:96) i i f
JustasinEq.16, wehaveintegratedoverf movingitseffectfromthemeanofthenormaldistri-
i
bution to the covariance. Because the prior over(cid:126)b is conjugate to the likelihood on the right side
i
ofEq.33,theposteriorcanbewritteninclosedformasanormaldistribution(seee.g. [10]). The
meanoftheleftsideofEq.33istherefore
(cid:104) (cid:105)−1(cid:104) (cid:16) (cid:17)(cid:105)
Σ−1+Φ ((cid:126)t )(cid:62)K−1Φ ((cid:126)t ) Φ ((cid:126)t )(cid:62)K−1 (cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126) , (9)
b (cid:96) i f (cid:96) i (cid:96) i f i p i ip z i zi
Tocomputetheunconditionalposteriormeanof(cid:126)b wetaketheexpectationofEq.9withrespectto
i
theposterioroverz . Eq.9islinearinβ(cid:126) ,sowecandirectlyreplaceβ(cid:126) withitsmean(Eq.32):
i zi zi
(cid:104) (cid:105)−1(cid:104) (cid:16) (cid:17)(cid:105)
(cid:126)b∗ (cid:44) Σ−1+Φ ((cid:126)t )(cid:62)K−1Φ ((cid:126)t ) Φ ((cid:126)t )(cid:62)K−1 (cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126)∗ . (10)
i b (cid:96) i f (cid:96) i (cid:96) i f i p i ip z i i
Finally,tocomputethestructurednoiseprediction,notethatconditionedonz and(cid:126)b ,theGPprior
i i
and marker likelihood (Eq. 14) form a standard GP regression (see e.g. [20]). The conditional
posterioroff (t(cid:48))isthereforeaGPwithmean
i i
K (t(cid:48),(cid:126)t )(cid:2)K ((cid:126)t ,(cid:126)t )+σ2I(cid:3)−1(cid:16)(cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126) −Φ ((cid:126)t )(cid:126)b (cid:17). (11)
OU i i OU i i i p i ip z i zi (cid:96) i i
Tocomputetheunconditionalposteriorexpectationoff (t(cid:48)), wenotethattheexpressionaboveis
i i
linearinz and(cid:126)b andsotheirexpectationscanbepluggedintoobtain
i i
f∗(t(cid:48))(cid:44)K (t(cid:48),(cid:126)t )(cid:2)K ((cid:126)t ,(cid:126)t )+σ2I(cid:3)−1(cid:16)(cid:126)y −Φ ((cid:126)t )Λ(cid:126)x −Φ ((cid:126)t )β(cid:126)∗−Φ ((cid:126)t )(cid:126)b∗(cid:17). (12)
i OU i i OU i i i p i ip z i i (cid:96) i i
6
3 Experiments
Wedemonstrateourapproachbybuildingatooltopredictthelungdiseasetrajectoriesofindivid-
uals with scleroderma. Lung disease is currently the leading cause of death among scleroderma
patients,andisnotoriouslydifficulttotreatbecausetherearefewpredictorsofdeclineandthereis
tremendousvariabilityacrossindividualtrajectories[21]. Clinicianstracklungseverityusingper-
centofpredictedforcedvitalcapacity(PFVC),whichisexpectedtodropasthediseaseprogresses.
In addition, demographic variables and molecular test results are often available at baseline to aid
prognoses. WetrainandvalidateourmodelusingdatafromtheJohnsHopkinsSclerodermaCenter
patientregistry,whichisoneofthelargestintheworld. Toselectindividualsfromtheregistry,we
used the following criteria. First, we include individuals who were seen at the clinic within two
yearsoftheirearliestscleroderma-relatedsymptom. Second,weexcludeallindividualswithfewer
thantwoPFVCmeasurementsaftertheirfirstvisit. Finally,weexcludeindividualswhoreceiveda
lungtransplant. Thedatasetcontains672individualsandatotalof4,992PFVCmeasurements.
Forthepopulationmodel,weuseconstantfunctions(i.e. observedcovariatesadjustanindividual’s
intercept). The population covariates ((cid:126)x ) are gender, African American race, and indicators of
ip
ACA and Scl-70 antibodies—two proteins believed to be connected to scleroderma-related lung
disease. Notethatallfeaturesarebinary. ForthesubpopulationB-splines, wesetboundaryknots
at0and25years(themaximumobservationtimeinourdatasetis23years),usetwointeriorknots
that divide the time period from 0-25 years into three equally spaced chunks, and use quadratics
as the piecewise components. These B-spline hyperparameters (knots and polynomial degree) are
also used for all baseline models. We select G = 9 subtypes using BIC. The covariates in the
subtypemarginalmodel((cid:126)x )arethesameusedinthepopulationmodel. Fortheindividualmodel,
iz
weuselinearfunctions. Forthehyper-parametersΘ = {Σ ,α,(cid:96),σ2}wesetΣ tobeadiagonal
1 b b
covariance matrix with entries [16,10−2] along the diagonal, which correspond to intercept and
slopevariancesrespectively. Finally, wesetα = 6, (cid:96) = 2, andσ2 = 1usingdomainknowledge;
weexpecttransientdeviationstolastaround2yearsandtochangePFVCbyaround±6units.
Baselines. First,tocompareagainsttypicalapproachesusedinclinicalmedicinethatconditionon
baselinecovariatesonly(e.g. [22]),wefitaregressionmodelconditionedonallcovariatesincluded
in(cid:126)x above. ThemeanisparameterizedusingB-splinebases(Φ(t))as:
iz
(cid:16) (cid:17)
yˆ|(cid:126)x =Φ(t)(cid:62) β(cid:126) +(cid:80) x β(cid:126) +(cid:80) x x β(cid:126) . (13)
iz 0 xiin(cid:126)xiz i i xi,xjinpairsof(cid:126)xiz i j ij
The second baseline is similar to [8] and [23] and extends the first baseline by accounting for
individual-specific heterogeneity. The model has a mean function identical to the first baseline
and individualizes predictions using a GP with the same kernel as in Equation 16 (using hyper-
parameters as above). Another natural approach is to explain heterogeneity by using a mixture
modelsimilarto[9]. However,amixturemodelcannotadequatelyexplainawayindividual-specific
sourcesofvariabilitythatareunrelatedtosubtypeandthereforefailstorecoversubtypesthatcapture
canonicaltrajectories(wediscussthisindetailinthesupplementalsection).Therecoveredsubtypes
fromthefullmodeldonotsufferfromthisissue. Tomakethecomparisonfairandtounderstandthe
extenttowhichtheindividual-specificcomponentcontributestowardspersonalizingpredictions,we
createamixturemodel(Proposedw/nopersonalization)wherethesubtypesarefixedtobethesame
asthoseinthefullmodelandtheremainingparametersarelearned. Notethatthisversiondoesnot
containtheindividual-specificcomponent.
Evaluation. Wemakepredictionsafterone, two, andfouryearsoffollow-up. Errorsaresumma-
rized within four disjoint time periods: (1,2], (2,4], (4,8], and (8,25] years2. To measure error,
we use the absolute difference between the prediction and a smoothed version of the individual’s
observedtrajectory. Weestimatemeanabsoluteerror(MAE)using10-foldCVatthelevelofindi-
viduals(i.e. allofanindividual’sdataisheld-out),andtestforstatisticallysignificantreductionsin
errorusingaone-sided, pairedt-test. Forallmodels, weusetheMAPestimateoftheindividual’s
trajectory.Inthemodelsthatincludesubtypes,thismeansthatwechoosethetrajectorypredictedby
themostlikelysubtypeundertheposterior. Althoughthisdiscardsinformationfromtheposterior,
inourexperiencecliniciansfindthischoicetobemoreinterpretable.
Qualitativeresults. InFigure2wepresentdynamicallyupdatedpredictionsfortwopatients(one
perrow,dynamicupdatesmovelefttoright).Bluelinesindicatethepredictionunderthemostlikely
subtype and green lines indicate the prediction under the second most likely. The first individual
2Aftertheeighthyear,databecomestoosparsetofurtherdividethistimespan.
7
Predictionsusing1yearofdata
Model (1,2] %Im. (2,4] %Im. (4,8] %Im. (8,25] %Im.
B-splinewithBaselineFeats. 12.78 12.73 12.40 12.14
B-spline+GP 5.49 7.70 9.67 10.71
Proposed 5.26 ∗7.04 8.6 10.17 12.12
Proposedw/nopersonalization 6.17 7.12 9.38 12.85
Predictionsusing2yearsofdata
B-splinewithBaselineFeats. 12.73 12.40 12.14
B-spline+GP 5.88 8.65 10.02
Proposed ∗5.48 6.8 ∗7.95 8.1 9.53
Proposedw/nopersonalization 6.00 8.12 11.39
Predictionsusing4yearsofdata
B-splinewithBaselineFeats. 12.40 12.14
B-spline+GP 6.00 8.88
Proposed ∗5.14 14.3 ∗7.58 14.3
Proposedw/nopersonalization 5.75 9.16
Table1:MAEofPFVCpredictionsforthetwobaselinesandtheproposedmodel.Boldnumbersindicatebest
performanceacrossmodels(∗isstat.significant).“%Im.”reportspercentimprovementovernextbest.
(Figure2a)isa50-year-old,whitewomanwithScl-70antibodies,whicharethoughttobeassociated
withactivelungdisease. Withinthefirstyear,herdiseaseseemsstable,andthemodelpredictsthis
course with 57% confidence. After another year of data, the model shifts 21% of its belief to a
rapidlydecliningtrajectory;likelyinpartduetothesuddendipinyear2. Wecontrastthiswiththe
behavioroftheB-splineGPshowninFigure2b,whichhaslimitedcapacitytoexpressindividualized
long-termbehavior.Weseethatthemodeldoesnotadequatelyadjustinlightofthedownwardtrend
betweenyearsoneandtwo. Toillustratethevalueofincludingindividual-specificadjustments,we
nowturntoFigures2cand2d(whichplotpredictionsmadebytheproposedmodelwithandwithout
personalization respectively). This individual is a 60-year-old, white man that is Scl-70 negative,
which makes declining lung function less likely. Both models use the same set of subtypes, but
whereasthemodelwithoutindividual-specificadjustmentdoesnotconsidertherecoveringsubtype
to be likely until after year two, the full model shifts the recovering subtype trajectory downward
towardstheman’sinitialPFVCvalueandidentifythecorrecttrajectoryusingasingleyearofdata.
Quantitativeresults. Table1reportsMAEforthebaselinesandtheproposedmodel. Wenotethat
afterobservingtwoormoreyearsofdata,ourmodel’serrorsaresmallerthanthetwobaselines(and
statisticallysignificantlysoinallbutonecomparison). AlthoughtheB-splineGPimprovesoverthe
firstbaseline,theseresultssuggestthatbothsubpopulationandindividual-specificcomponentsen-
ablemoreaccuratepredictionsofanindividual’sfuturecourseasmoredataareobserved.Moreover,
bycomparingtheproposedmodelwithandwithoutpersonalization,weseethatsubtypesaloneare
not sufficient and that individual-specific adjustments are critical. These improvements also have
clinicalsignificance. Forexample,individualswhodropbymorethan10PFVCarecandidatesfor
aggressiveimmunosuppressivetherapy. Outofthe7.5%ofindividualsinourdatawhodeclineby
morethan10PFVC,ourmodelpredictssuchadeclineattwicethetrue-positiverateoftheB-spline
GP(31%vs. 17%)andwithalowerfalse-positiverate(81%vs. 90%).
4 Conclusion
We have described a hierarchical model for making individualized predictions of disease activity
trajectories that accounts for both latent and observed sources of heterogeneity. We empirically
demonstrated that using all elements of the proposed hierarchy allows our model to dynamically
personalize predictions and reduce error as more data about an individual is collected. Although
our analysis focused on scleroderma, our approach is more broadly applicable to other complex,
heterogeneousdiseases[1]. Examplesofsuchdiseasesincludeasthma[3], autism[4], andCOPD
[5]. Thereareseveralpromisingdirectionsforfurtherdevelopingtheideaspresentedhere. First,we
observedthatpredictionsarelessaccurateearlyinthediseasecoursewhenlittledataisavailableto
learntheindividual-specificadjustments.Toaddressthisshortcoming,itmaybepossibletoleverage
time-dependentcovariatesinadditiontothebaselinecovariatesusedhere.Second,thequalityofour
predictionsdependsupontheallowedtypesofindividual-specificadjustmentsencodedinthemodel.
More sophisticated models of individual variation may further improve performance. Moreover,
approaches for automatically learning the class of possible adjustments would make it possible to
applyourapproachtonewdiseasesmorequickly.
8
References
[1] J.Craig. Complexdiseases:Researchandapplications. NatureEducation,1(1):184,2008.
[2] J.Varga, C.P.Denton, andF.M.Wigley. Scleroderma: FromPathogenesistoComprehensiveManage-
ment. SpringerScience&BusinessMedia,2012.
[3] J.Lo¨tvalletal. Asthmaendotypes:anewapproachtoclassificationofdiseaseentitieswithintheasthma
syndrome. JournalofAllergyandClinicalImmunology,127(2):355–360,2011.
[4] L.D.Wiggins,D.L.Robins,L.B.Adamson,R.Bakeman,andC.C.Henrich. Supportforadimensional
viewofautismspectrumdisordersintoddlers.Journalofautismanddevelopmentaldisorders,42(2):191–
200,2012.
[5] P.J. Castaldi et al. Cluster analysis in the copdgene study identifies subtypes of smokers with distinct
patternsofairwaydiseaseandemphysema. Thorax,2014.
[6] S.SariaandA.Goldenberg. Subtyping: WhatIsItandItsRoleinPrecisionMedicine. IEEEIntelligent
Systems,30,2015.
[7] D.S.Lee,P.C.Austin,J.L.Rouleau,P.PLiu,D.Naimark,andJ.V.Tu.Predictingmortalityamongpatients
hospitalizedforheartfailure: derivationandvalidationofaclinicalmodel. Jama,290(19):2581–2587,
2003.
[8] D.Rizopoulos. Dynamicpredictionsandprospectiveaccuracyinjointmodelsforlongitudinalandtime-
to-eventdata. Biometrics,67(3):819–829,2011.
[9] C.Proust-Limaetal. Jointlatentclassmodelsforlongitudinalandtime-to-eventdata:Areview. Statisti-
calMethodsinMedicalResearch,23(1):74–90,2014.
[10] K.P.Murphy. Machinelearning:aprobabilisticperspective. MITpress,2012.
[11] S.Roberts,M.Osborne,M.Ebden,S.Reece,N.Gibson,andS.Aigrain. Gaussianprocessesfortime-
seriesmodelling. PhilosophicalTransactionsoftheRoyalSocietyA:Mathematical,PhysicalandEngi-
neeringSciences,371(1984):20110550,2013.
[12] J.Q.Shi,R.Murray-Smith,andD.M.Titterington.Hierarchicalgaussianprocessmixturesforregression.
Statisticsandcomputing,15(1):31–41,2005.
[13] A.GelmanandJ.Hill. Dataanalysisusingregressionandmultilevel/hierarchicalmodels. Cambridge
UniversityPress,2006.
[14] H. Wang et al. High-order multi-task feature learning to identify longitudinal phenotypic markers for
alzheimer’sdiseaseprogressionprediction.InAdvancesinNeuralInformationProcessingSystems,pages
1277–1285,2012.
[15] J.RossandJ.Dy. Nonparametricmixtureofgaussianprocesseswithconstraints. InProceedingsofthe
30thInternationalConferenceonMachineLearning(ICML-13),pages1346–1354,2013.
[16] P.F.Schulam,F.M.Wigley,andS.Saria. Clusteringlongitudinalclinicalmarkertrajectoriesfromelec-
tronichealthdata: Applicationstophenotypingandendotypediscovery. InProceedingsoftheTwinty-
NinthAAAIConferenceonArtificialIntelligence,2015.
[17] B.M.Marlin. Modelinguserratingprofilesforcollaborativefiltering. InAdvancesinneuralinformation
processingsystems,2003.
[18] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of
thestate-of-the-artandpossibleextensions. KnowledgeandDataEngineering, IEEETransactionson,
17(6):734–749,2005.
[19] D.Sontag,K.Collins-Thompson,P.N.Bennett,R.W.White,S.Dumais,andB.Billerbeck. Probabilistic
modelsforpersonalizingwebsearch. InProceedingsofthefifthACMinternationalconferenceonWeb
searchanddatamining,pages433–442.ACM,2012.
[20] C.E.RasmussenandC.K.Williams. GaussianProcessesforMachineLearning. TheMITPress,2006.
[21] Y.Allanoreetal. Systemicsclerosis. NatureReviewsDiseasePrimers,2015.
[22] D. Khanna et al. Clinical course of lung physiology in patients with scleroderma and interstitial lung
disease: analysisofthesclerodermalungstudyplacebogroup. Arthritis&Rheumatism,63(10):3078–
3085,2011.
[23] J.Q.Shi,B.Wang,E.J.Will,andR.M.West.Mixed-effectsgaussianprocessfunctionalregressionmodels
withapplicationtodose–responsecurveprediction. Stat.Med.,31(26):3165–3177,2012.
9
Supplement
A ExpectationMaximizationforDiseaseTrajectoryModel
A.1 Objective
We include the derivation of the EM objective for convenience. Recall that the model for marker y given
ij
parametersΘandlatentvariables{z ,(cid:126)b ,f }is
i i i
y ∼NΦ (t )(cid:62)Λ(cid:126)x +Φ (t )(cid:62)β(cid:126) +Φ (t )(cid:62)(cid:126)b + f (t ) ,σ2. (14)
ij p ij ip z ij zi (cid:96) ij i i ij
(cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125)
(A)population (B)subpopulation (C)individual (D)structurednoise
Let X = {(cid:126)t ,(cid:126)x ,(cid:126)x } denote individual i’s observation times and features, then marginalizing the joint
i i ip iz
likelihoodgivesus
P((cid:126)y |X ,Θ)
i i
= (cid:88)G P(z |X ,Θ)(cid:90) P(cid:16)(cid:126)b |Θ(cid:17)(cid:90) P(f |Θ)P(cid:16)(cid:126)y |z ,(cid:126)b ,f ,X ,Θ(cid:17)df d(cid:126)b (15)
i i i i i i i i i i i
zi=1(cid:124)Mult.reg(cid:123)re(cid:122)ssionprio(cid:125)r Rd(cid:96) (cid:124) (cid:123)(cid:122) (cid:125) RNi (cid:124) GP(cid:123)p(cid:122)rior (cid:125)(cid:124) (cid:123)(cid:122) (cid:125)
Normalprior Eq.14
= (cid:88)G π ((cid:126)x )N(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (16)
zi iz i p i ip z i zi i i
zi=1
MovingfromEq.15toEq.16,weevaluatetheinnermostintegralusingthefactthattheGPprioroverf is
i
conjugatetoEq.14yieldinganewmultivariatenormal[20]. ToevaluatethenextintegralinEq.15,weagain
havethatthenormalpriorover(cid:126)b isconjugatetothemultivariatenormalobtainedbymarginalizingoverf ,
i i
whichgivesusthemultivariatenormalshowninEq.16wherethecovariancefunctionisdefinedas
K(t ,t )=Φ (t )(cid:62)Σ Φ (t )+K (t ,t )+σ2I(t =t ). (17)
1 2 (cid:96) 1 b (cid:96) 2 OU 1 2 1 2
Weseethattheobserved-datalog-likelihoodforindividualiisdefinedbyamixtureofmultivariatenormals
where each subtype is associated with a class in the mixture. The mixing probabilities are defined by the
multinomiallogisticregression.Themeanofthemultivariatenormalisdefinedbythepopulationandsubpop-
ulationmodels,andthecovarianceisdefinedbytheindividualandstructurednoisemodels.Theobserved-data
log-likelihoodforallindividualsistherefore
L(Θ)=(cid:88)M log(cid:34)(cid:88)G π ((cid:126)x )N(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17)(cid:35). (18)
zi iz i p i ip z i zi i i
i=1 zi=1
A.2 ExpectationStep
Allparametersrelatedto(cid:126)b andf arelimitedtothecovariancekernelandarethereforenotoptimizedusing
i i
EM.Wethereforeonlyneedtoconsiderthemechanismindicatorsz asunobservedintheexpectationstep.
i
Becausez isdiscrete,itsposteriorissimplycomputedusingthejointprobabilityofz and(cid:126)y .Letπ∗ denote
i i i ig
theposteriorprobabilitythatindividualihasmechanismg∈{1,...,G},thenwehave
π∗ ∝π ((cid:126)x )N(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (19)
ig g iz i p i ip z i g i i
A.3 MaximizationStep
Inthemaximizationstep,wemaximizetheexpectedcomplete-dataloglikelihoodwithrespecttoΘ . Wecan
2
writethecomplete-dataloglikelihoodas
L (Θ )=(cid:88)M logπ ((cid:126)x )+logN(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17)
c 2 zi iz i p i ip z i zi i i
i=1
=(cid:88)M logπ ((cid:126)x )+(cid:88)M logN(cid:16)(cid:126)y |Φ (cid:0)(cid:126)t (cid:1)Λ(cid:126)x +Φ (cid:0)(cid:126)t (cid:1)β(cid:126) ,K(cid:0)(cid:126)t ,(cid:126)t (cid:1)(cid:17). (20)
zi iz i p i ip z i zi i i
i=1 i=1
(cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125)
Subtypemarginal Markerconditional
Theexpectationistakenwithrespecttotheposterioroverz .
i
10