Machine Learning in Non-Stationary Environments
Adaptive Computation and Machine Learning
Thomas Dietterich, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns,
Associate Editors
A complete list of the books published in this series can be found at the back
of the book.
MACHINE LEARNING IN NON-STATIONARY ENVIRONMENTS
Introduction to Covariate Shift Adaptation
Masashi Sugiyama and Motoaki Kawanabe
The MIT Press
Cambridge, Massachusetts
London, England
© 2012 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic
or mechanical means (including photocopying, recording, or information storage and retrieval)
without permission in writing from the publisher.
For information about special quantity discounts, please email
[email protected]
This book was set in Syntax and Times Roman by Newgen.
Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Sugiyama, Masashi, 1974-
Machine learning in non-stationary environments : introduction to covariate shift adaptation /
Masashi Sugiyama and Motoaki Kawanabe.
p. cm. - (Adaptive computation and machine learning series)
Includes bibliographical references and index.
ISBN 978-0-262-01709-1 (hardcover : alk. paper)
1. Machine learning. I. Kawanabe, Motoaki. II. Title.
Q325.5.S845 2012
006.3'1-dc23
2011032824
10 9 8 7 6 5 4 3 2 1
Contents
Foreword
Preface
I INTRODUCTION
1 Introduction and Problem Formulation
1.1 Machine Learning under Covariate Shift
1.2 Quick Tour of Covariate Shift Adaptation
1.3 Problem Formulation
1.3.1 Function Learning from Examples
1.3.2 Loss Functions
1.3.3 Generalization Error
1.3.4 Covariate Shift
1.3.5 Models for Function Learning
1.3.6 Specification of Models
1.4 Structure of This Book
1.4.1 Part II: Learning under Covariate Shift
1.4.2 Part III: Learning Causing Covariate Shift
II LEARNING UNDER COVARIATE SHIFT
2 Function Approximation
2.1 Importance-Weighting Techniques for Covariate Shift Adaptation
2.1.1 Importance-Weighted ERM
2.1.2 Adaptive IWERM
2.1.3 Regularized IWERM
2.2 Examples of Importance-Weighted Regression Methods
2.2.1 Squared Loss: Least-Squares Regression
2.2.2 Absolute Loss: Least-Absolute Regression
2.2.3 Huber Loss: Huber Regression
2.2.4 Deadzone-Linear Loss: Support Vector Regression
2.3 Examples of Importance-Weighted Classification Methods
2.3.1 Squared Loss: Fisher Discriminant Analysis
2.3.2 Logistic Loss: Logistic Regression Classifier
2.3.3 Hinge Loss: Support Vector Machine
2.3.4 Exponential Loss: Boosting
2.4 Numerical Examples
2.4.1 Regression
2.4.2 Classification
2.5 Summary and Discussion
3 Model Selection
3.1 Importance-Weighted Akaike Information Criterion
3.2 Importance-Weighted Subspace Information Criterion
3.2.1 Input Dependence vs. Input Independence in Generalization Error Analysis
3.2.2 Approximately Correct Models
3.2.3 Input-Dependent Analysis of Generalization Error
3.3 Importance-Weighted Cross-Validation
3.4 Numerical Examples
3.4.1 Regression
3.4.2 Classification
3.5 Summary and Discussion
4 Importance Estimation
4.1 Kernel Density Estimation
4.2 Kernel Mean Matching
4.3 Logistic Regression
4.4 Kullback-Leibler Importance Estimation Procedure
4.4.1 Algorithm
4.4.2 Model Selection by Cross-Validation
4.4.3 Basis Function Design
4.5 Least-Squares Importance Fitting
4.5.1 Algorithm
4.5.2 Basis Function Design and Model Selection
4.5.3 Regularization Path Tracking
4.6 Unconstrained Least-Squares Importance Fitting
4.6.1 Algorithm
4.6.2 Analytic Computation of Leave-One-Out Cross-Validation
4.7 Numerical Examples
4.7.1 Setting
4.7.2 Importance Estimation by KLIEP
4.7.3 Covariate Shift Adaptation by IWLS and IWCV
4.8 Experimental Comparison
4.9 Summary
5 Direct Density-Ratio Estimation with Dimensionality Reduction
5.1 Density Difference in Hetero-Distributional Subspace
5.2 Characterization of Hetero-Distributional Subspace
5.3 Identifying Hetero-Distributional Subspace
5.3.1 Basic Idea
5.3.2 Fisher Discriminant Analysis
5.3.3 Local Fisher Discriminant Analysis
5.4 Using LFDA for Finding Hetero-Distributional Subspace
5.5 Density-Ratio Estimation in the Hetero-Distributional Subspace
5.6 Numerical Examples
5.6.1 Illustrative Example
5.6.2 Performance Comparison Using Artificial Data Sets
5.7 Summary
6 Relation to Sample Selection Bias
6.1 Heckman's Sample Selection Model
6.2 Distributional Change and Sample Selection Bias
6.3 The Two-Step Algorithm
6.4 Relation to Covariate Shift Approach
7 Applications of Covariate Shift Adaptation
7.1 Brain-Computer Interface
7.1.1 Background
7.1.2 Experimental Setup
7.1.3 Experimental Results
7.2 Speaker Identification
7.2.1 Background
7.2.2 Formulation
7.2.3 Experimental Results
7.3 Natural Language Processing
7.3.1 Formulation
7.3.2 Experimental Results
7.4 Perceived Age Prediction from Face Images
7.4.1 Background
7.4.2 Formulation
7.4.3 Incorporating Characteristics of Human Age Perception
7.4.4 Experimental Results
7.5 Human Activity Recognition from Accelerometric Data
7.5.1 Background
7.5.2 Importance-Weighted Least-Squares Probabilistic Classifier
7.5.3 Experimental Results
7.6 Sample Reuse in Reinforcement Learning
7.6.1 Markov Decision Problems
7.6.2 Policy Iteration
7.6.3 Value Function Approximation
7.6.4 Sample Reuse by Covariate Shift Adaptation
7.6.5 On-Policy vs. Off-Policy
7.6.6 Importance Weighting in Value Function Approximation
7.6.7 Automatic Selection of the Flattening Parameter
7.6.8 Sample Reuse Policy Iteration
7.6.9 Robot Control Experiments
III LEARNING CAUSING COVARIATE SHIFT
8 Active Learning
8.1 Preliminaries
8.1.1 Setup
8.1.2 Decomposition of Generalization Error
8.1.3 Basic Strategy of Active Learning
8.2 Population-Based Active Learning Methods
8.2.1 Classical Method of Active Learning for Correct Models
8.2.2 Limitations of Classical Approach and Countermeasures
8.2.3 Input-Independent Variance-Only Method
8.2.4 Input-Dependent Variance-Only Method
8.2.5 Input-Independent Bias-and-Variance Approach
8.3 Numerical Examples of Population-Based Active Learning Methods
8.3.1 Setup
8.3.2 Accuracy of Generalization Error Estimation
8.3.3 Obtained Generalization Error
8.4 Pool-Based Active Learning Methods
8.4.1 Classical Active Learning Method for Correct Models and Its Limitations
8.4.2 Input-Independent Variance-Only Method
8.4.3 Input-Dependent Variance-Only Method
8.4.4 Input-Independent Bias-and-Variance Approach
8.5 Numerical Examples of Pool-Based Active Learning Methods
8.6 Summary and Discussion
9 Active Learning with Model Selection
9.1 Direct Approach and the Active Learning/Model Selection Dilemma
9.2 Sequential Approach
9.3 Batch Approach
9.4 Ensemble Active Learning
9.5 Numerical Examples
9.5.1 Setting
9.5.2 Analysis of Batch Approach
9.5.3 Analysis of Sequential Approach
9.5.4 Comparison of Obtained Generalization Error
9.6 Summary and Discussion
10 Applications of Active Learning
10.1 Design of Efficient Exploration Strategies in Reinforcement Learning
10.1.1 Efficient Exploration with Active Learning
10.1.2 Reinforcement Learning Revisited
10.1.3 Decomposition of Generalization Error
10.1.4 Estimating Generalization Error for Active Learning
10.1.5 Designing Sampling Policies
10.1.6 Active Learning in Policy Iteration
10.1.7 Robot Control Experiments
10.2 Wafer Alignment in Semiconductor Exposure Apparatus
IV CONCLUSIONS
11 Conclusions and Future Prospects
11.1 Conclusions
11.2 Future Prospects
Appendix: List of Symbols and Abbreviations
Bibliography
Index
Foreword
Modern machine learning faces a number of grand challenges. The ever-growing World Wide Web, high-throughput methods in genomics, and modern imaging methods in brain science, to name just a few, pose ever larger problems where learning methods need to scale, to increase their efficiency, and algorithms need to be able to deal with million-dimensional inputs and terabytes of data. At the same time it becomes more and more important to efficiently and robustly model highly complex problems that are structured (e.g., a grammar underlies the data) and exhibit nonlinear behavior. In addition, data from the real world are typically non-stationary, so there is a need to compensate for the non-stationary aspects of the data in order to map the problem back to stationarity. Finally, while machine learning and modern statistics generate a vast number of algorithms that tackle the above challenges, it becomes increasingly important for the practitioner not only to predict and generalize well on unseen data but also to explain the nonlinear predictive learning machine, that is, to harvest the prediction capability for making inferences about the world that will contribute to a better understanding of the sciences.
The present book contributes to one aspect of the above-mentioned grand challenges: namely, the world of non-stationary data is addressed. Classically, learning always assumes that the underlying probability distribution of the data from which inference is made stays the same. In other words, it is understood that there is no change in distribution between the sample from which we learn and the novel (unseen) out-of-sample data. In many practical settings this assumption is incorrect, and thus standard prediction will likely be suboptimal. The present book very successfully assembles the state-of-the-art research results on learning in non-stationary environments, with a focus on the covariate shift model, and has embedded this body of work into the general literature from machine learning (semisupervised learning, online learning,
Description: As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of mac