Machine Learning: A Probabilistic Perspective
Solutions Manual
(Please do not make publicly available)
Kevin P. Murphy
The MIT Press
Cambridge, Massachusetts
London, England
Chapter 1
Introduction
1.1 Solutions
1.1.1 KNN classifier on shuffled MNIST data
We just have to insert the following piece of code.
Listing 1.1: Part of mnistShuffled1NNdemo
% ... load data
%% permute columns
D = 28*28;
setSeed(0); perm = randperm(D);    % same random permutation of the pixel columns ...
Xtrain = Xtrain(:, perm);          % ... applied to both train and test sets
Xtest = Xtest(:, perm);
% ... same as before
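The reason the accuracy is unchanged: the same permutation is applied to the training and test columns, and Euclidean distance only sums per-dimension squared differences, so every pairwise train-test distance, and hence every 1-NN prediction, is identical. Below is a minimal sketch of this invariance on random data (not MNIST; the inline sqdist helper is just for illustration):

% Permuting columns identically in train and test leaves pairwise distances unchanged.
rng(0);
Xtr = rand(5, 10); Xte = rand(3, 10);                                   % toy data
perm = randperm(10);
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2*A*B';   % all pairwise squared distances
D1 = sqdist(Xte, Xtr);
D2 = sqdist(Xte(:, perm), Xtr(:, perm));
max(abs(D1(:) - D2(:)))                                                 % essentially zero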
1.1.2 Approximate KNN classifiers
According to John Chia, the following code will work.
Listing 1.2:
[result, ndists] = flann_search(Xtrain', Xtest', 1, ...
    struct('algorithm', 'kdtree', 'trees', 8, 'checks', 64));
errorRate = mean(ytrain(result) ~= ytest0)
He reports the following results on MNIST with 1NN.

            ntests = 1,000         ntests = 10,000
            Err      Time          Err       Time
Flann       4.8%     17 s          3.35%     17.2 s
Vanilla     3.8%     3.68 s        3.09%     28.36 s

So the approximate method is somewhat faster for large test sets, but is slightly less accurate.
1.1.3 CV for KNN
See Figure 1.1(b). The CV estimate is an overestimate of the test error, but has the right shape. Note, however, that the empirical test error is only based on 500 test points. A better comparison would use a much larger test set.
[Figure 1.1 appears here: two panels plotting misclassification rate against K; panel (b) is titled "5-fold cross validation, ntrain = 200" and shows train and test curves.]
Figure 1.1: (a) Misclassification rate vs K in a K-nearest neighbor classifier. On the left, where K is small, the model is complex and hence we overfit. On the right, where K is large, the model is simple and we underfit. Dotted blue line: training set (size 200). Solid red line: test set (size 500). (b) 5-fold cross validation estimate of test error. Figure generated by knnClassifyDemo.
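For reference, here is a minimal sketch of how such a 5-fold CV curve can be produced. It runs in base MATLAB on synthetic two-class data; the data-generating scheme and the brute-force majority-vote classifier are assumptions of this sketch, not the code behind the book's figure (which is knnClassifyDemo in pmtk3).

% 5-fold cross-validation estimate of KNN misclassification rate on toy 2-class data.
rng(0);
n = 200;
X = [randn(n/2, 2) + 1; randn(n/2, 2) - 1];      % two Gaussian blobs
y = [ones(n/2, 1); 2*ones(n/2, 1)];
Ks = 1:10:111;
fold = repmat(1:5, 1, n/5); fold = fold(randperm(n));
cvErr = zeros(size(Ks));
for ki = 1:numel(Ks)
  K = Ks(ki); err = zeros(1, 5);
  for f = 1:5
    tr = find(fold ~= f); te = find(fold == f);
    ypred = zeros(numel(te), 1);
    for i = 1:numel(te)
      d = sum(bsxfun(@minus, X(tr,:), X(te(i),:)).^2, 2);  % squared distances to training points
      [~, idx] = sort(d);
      ypred(i) = mode(y(tr(idx(1:K))));                    % majority vote among the K nearest
    end
    err(f) = mean(ypred ~= y(te));
  end
  cvErr(ki) = mean(err);
end
plot(Ks, cvErr, '-o'); xlabel('K'); ylabel('CV misclassification rate');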
Chapter 2
Probability
2.1 Solutions
2.1.1 Probabilities are sensitive to the form of the question that was used to generate the answer
1. The event space is shown below, where X is one child and Y the other.
X Y Prob.
G G 1/4
G B 1/4
B G 1/4
B B 1/4
Let N_g be the number of girls and N_b the number of boys. We have the constraint (side information) that N_b + N_g = 2 and 0 ≤ N_b, N_g ≤ 2. We are told N_b ≥ 1 and are asked to compute the probability of the event N_g = 1 (i.e., one child is a girl). By Bayes rule we have

p(N_g = 1 | N_b ≥ 1) = p(N_b ≥ 1 | N_g = 1) p(N_g = 1) / p(N_b ≥ 1)   (2.1)
                     = (1 × 1/2) / (3/4) = 2/3                          (2.2)
2. Let Y be the identity of the observed child and X be the identity of the other child. We want p(X = g | Y = b). By Bayes rule we have

p(X = g | Y = b) = p(Y = b | X = g) p(X = g) / p(Y = b)   (2.3)
                 = (1/2 × 1/2) / (1/2) = 1/2                (2.4)
Tom Minka (Minka 1998) has written the following about these results:

This seems like a paradox because it seems that in both cases we could condition on the fact that "at least one child is a boy." But that is not correct; you must condition on the event actually observed, not its logical implications. In the first case, the event was "He said yes to my question." In the second case, the event was "One child appeared in front of me." The generating distribution is different for the two events. Probabilities reflect the number of possible ways an event can happen, like the number of roads to a town. Logical implications are further down the road and may be reached in more ways, through different towns. The different number of ways changes the probability.
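Both answers can be confirmed by direct simulation. Below is a minimal MATLAB sketch; the uniform choice of which child is observed is my modelling assumption, matching the "one child appeared in front of me" story.

% Monte Carlo check of the two-children answers: 2/3 vs 1/2.
rng(0); n = 1e6;
kids = randi(2, n, 2);                        % 1 = girl, 2 = boy, two children per family
atLeastOneBoy = any(kids == 2, 2);
oneGirl = sum(kids == 1, 2) == 1;
p1 = mean(oneGirl(atLeastOneBoy))             % ~ 2/3: P(one child is a girl | at least one boy)
which = randi(2, n, 1);                       % a uniformly chosen child is observed
seen  = kids(sub2ind([n 2], (1:n)', which));
other = kids(sub2ind([n 2], (1:n)', 3 - which));
p2 = mean(other(seen == 2) == 1)              % ~ 1/2: P(other child is a girl | observed child is a boy)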
2.1.2 Legal reasoning
Let E be the evidence (the observed blood type), I be the event that the defendant is innocent, and G = ¬I be the event that the defendant is guilty.
1. The prosecutor is confusing p(E|I) with p(I|E). We are told that p(E|I) = 0.01 but the relevant quantity is p(I|E). By Bayes rule, this is

p(I|E) = p(E|I) p(I) / [p(E|I) p(I) + p(E|G) p(G)] = 0.01 p(I) / [0.01 p(I) + (1 − p(I))]   (2.5)

since p(E|G) = 1 and p(G) = 1 − p(I). So we cannot determine p(I|E) without knowing the prior probability p(I). So p(E|I) = p(I|E) only if p(G) = p(I) = 0.5, which is hardly a presumption of innocence.
To understand this more intuitively, consider the following isomorphic problem (from http://en.wikipedia.org/wiki/Prosecutor's_fallacy):

A big bowl is filled with a large but unknown number of balls. Some of the balls are made of wood, and some of them are made of plastic. Of the wooden balls, 100 are white; out of the plastic balls, 99 are red and only 1 is white. A ball is pulled out at random, and observed to be white.
Without knowledge of the relative proportions of wooden and plastic balls, we cannot tell how likely it is that the ball
is wooden. If the number of plastic balls is far larger than the number of wooden balls, for instance, then a white ball
pulled from the bowl at random is far more likely to be a white plastic ball than a white wooden ball — even though
white plastic balls are a minority of the whole set of plastic balls.
2. The defender is quoting p(G|E) while ignoring p(G). The prior odds are

p(G) / p(I) = 1/799,999   (2.6)

The posterior odds are

p(G|E) / p(I|E) = 1/7999   (2.7)

So the evidence has increased the odds of guilt by a factor of 100 (the likelihood ratio p(E|G)/p(E|I) = 1/0.01). This is clearly relevant, although perhaps still not enough to find the suspect guilty.
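A minimal MATLAB check of these odds, assuming (as the numbers above imply) a population of 800,000 with exactly one guilty person and a 1% chance that an innocent person matches the blood type:

% Prior and posterior odds of guilt from Bayes rule.
N = 800000;                 % population implied by the prior odds above
pG = 1/N; pI = 1 - pG;      % prior: one guilty person among N
pE_G = 1; pE_I = 0.01;      % certain match if guilty, 1% match rate if innocent
priorOdds = pG / pI                       % 1/799,999
postOdds  = (pE_G * pG) / (pE_I * pI)     % ~ 1/8000
postOdds / priorOdds                      % = pE_G / pE_I = 100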
2.1.3 Variance of a sum
We have

var[X + Y] = E[(X + Y)^2] − (E[X] + E[Y])^2                                     (2.8)
           = E[X^2 + Y^2 + 2XY] − (E[X]^2 + E[Y]^2 + 2 E[X] E[Y])               (2.9)
           = E[X^2] − E[X]^2 + E[Y^2] − E[Y]^2 + 2 E[XY] − 2 E[X] E[Y]          (2.10)
           = var[X] + var[Y] + 2 cov[X, Y]                                      (2.11)

If X and Y are independent, then cov[X, Y] = 0, so var[X + Y] = var[X] + var[Y].
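A quick numerical sanity check of (2.11) on sampled data (a minimal sketch; the particular way of generating correlated X and Y is arbitrary):

% Check var(X+Y) = var(X) + var(Y) + 2 cov(X,Y) on samples.
rng(0); n = 1e6;
X = randn(n, 1);
Y = 0.7 * X + randn(n, 1);                  % correlated with X
lhs = var(X + Y);
C = cov(X, Y);                              % 2x2 sample covariance matrix
rhs = var(X) + var(Y) + 2 * C(1, 2);
[lhs rhs]                                   % agree up to sampling error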
2.1.4 Bayes rule for medical diagnosis
Let T = 1 represent a positive test outcome, T = 0 represent a negative test outcome, D = 1 mean you have the disease, and D = 0 mean you don't have the disease. We are told

P(T = 1 | D = 1) = 0.99   (2.12)
P(T = 0 | D = 0) = 0.99   (2.13)
P(D = 1) = 0.0001         (2.14)

We are asked to compute P(D = 1 | T = 1), which we can do using Bayes' rule:

P(D = 1 | T = 1) = P(T = 1 | D = 1) P(D = 1) / [P(T = 1 | D = 1) P(D = 1) + P(T = 1 | D = 0) P(D = 0)]   (2.15)
                 = (0.99 × 0.0001) / (0.99 × 0.0001 + 0.01 × 0.9999)                                      (2.16)
                 = 0.009804                                                                                (2.17)

So although you are much more likely to have the disease (given that you have tested positive) than a random member of the population, you are still unlikely to have it.
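The same computation as a one-liner in MATLAB, with the numbers taken straight from (2.12)-(2.14):

% Posterior probability of disease given a positive test, via Bayes rule.
sens = 0.99; spec = 0.99; prior = 1e-4;
post = sens*prior / (sens*prior + (1 - spec)*(1 - prior))   % 0.0098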
2.1.5 The Monty Hall problem
Let H_i denote the hypothesis that the prize is behind door i. We make the following assumptions: the three hypotheses H_1, H_2 and H_3 are equiprobable a priori, i.e.,

P(H_1) = P(H_2) = P(H_3) = 1/3.   (2.18)
The datum we receive, after choosing door 1, is one of D = 3 and D = 2 (meaning door 3 or 2 is opened, respectively). We assume that these two possible outcomes have the following probabilities. If the prize is behind door 1 then the host has a free choice; in this case we assume that the host selects at random between D = 2 and D = 3. Otherwise the choice of the host is forced and the probabilities are 0 and 1.

P(D = 2 | H_1) = 1/2    P(D = 2 | H_2) = 0    P(D = 2 | H_3) = 1
P(D = 3 | H_1) = 1/2    P(D = 3 | H_2) = 1    P(D = 3 | H_3) = 0     (2.19)
Now, using Bayes theorem, we evaluate the posterior probabilities of the hypotheses:

P(H_i | D = 3) = P(D = 3 | H_i) P(H_i) / P(D = 3)   (2.20)

P(H_1 | D = 3) = (1/2)(1/3) / P(D = 3)    P(H_2 | D = 3) = (1)(1/3) / P(D = 3)    P(H_3 | D = 3) = (0)(1/3) / P(D = 3)   (2.21)

The denominator P(D = 3) is (1/2) because it is the normalizing constant for this posterior distribution. So

P(H_1 | D = 3) = 1/3    P(H_2 | D = 3) = 2/3    P(H_3 | D = 3) = 0.   (2.22)

So the contestant should switch to door 2 in order to have the biggest chance of getting the prize.
Many people find this outcome surprising. There are two ways to make it more intuitive. One is to play the game thirty times with a friend and keep track of the frequency with which switching gets the prize. Alternatively, you can perform a thought experiment in which the game is played with a million doors. The rules are now that the contestant chooses one door, then the game show host opens 999,998 doors in such a way as not to reveal the prize, leaving the contestant's selected door and one other door closed. The contestant may now stick or switch. Imagine the contestant confronted by a million doors, of which doors 1 and 234,598 have not been opened, door 1 having been the contestant's initial guess. Where do you think the prize is?
Another way to think about the problem is to use a directed graphical model of the form P → M ← F, where P indicates the location of the prize, F indicates your first choice, and M indicates which door Monty opens. Clearly P and F cause (determine) M. When we observe M, our belief about P changes because we have observed evidence about its child M.
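The 1/3 vs 2/3 split is also easy to reproduce by simulation. Below is a minimal MATLAB sketch; the encoding of the host's rules is my own, but follows the assumptions stated above.

% Simulate the Monty Hall game: compare sticking with switching.
rng(0); n = 1e5;
winsStick = 0; winsSwitch = 0;
for t = 1:n
  prize = randi(3); first = randi(3);
  openable = setdiff(1:3, [first prize]);     % host opens a door that is neither chosen nor winning
  opened = openable(randi(numel(openable)));
  switched = setdiff(1:3, [first opened]);    % the one remaining closed door
  winsStick = winsStick + (first == prize);
  winsSwitch = winsSwitch + (switched == prize);
end
[winsStick winsSwitch] / n    % ~ [1/3 2/3]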
2.1.6 Moments of a Bernoulli distribution
Mean:

E[X] = Σ_{x∈{0,1}} x p(x) = 0 · p(X = 0) + 1 · p(X = 1) = θ   (2.23)

Variance:

var[X] = E[(X − µ)^2] = Σ_{x∈{0,1}} p(x)(x − µ)^2   (2.24)
       = θ(1 − θ)^2 + (1 − θ)(0 − θ)^2              (2.25)
       = θ(1 + θ^2 − 2θ) + (1 − θ)θ^2               (2.26)
       = θ + θ^3 − 2θ^2 + θ^2 − θ^3                 (2.27)
       = θ − θ^2 = θ(1 − θ)                         (2.28)

Alternative proof:

E[X^2] = 0^2 p(x = 0) + 1^2 p(x = 1) = θ            (2.29)
var[X] = E[X^2] − E[X]^2 = θ − θ^2 = θ(1 − θ)       (2.30)
2.1.7 Conditional independence
1. Bayes' rule gives

P(H | E_1, E_2) = P(E_1, E_2 | H) P(H) / P(E_1, E_2)   (2.31)

Thus the information in (ii) is sufficient. In fact, we don't need P(E_1, E_2) because it is equal to the normalization constant (to enforce the sum-to-one constraint). (i) and (iii) are insufficient.

2. Now the equation simplifies to

P(H | E_1, E_2) = P(E_1 | H) P(E_2 | H) P(H) / P(E_1, E_2)   (2.32)

so (i) and (ii) are obviously sufficient. (iii) is also sufficient, because we can compute P(E_1, E_2) using normalization.
2.1.8 Pairwise independence does not imply mutual independence
We provide two counterexamples.

Let X_1 and X_2 be independent binary random variables, and X_3 = X_1 ⊕ X_2, where ⊕ is the XOR operator. We have p(X_3 | X_1, X_2) ≠ p(X_3), since X_3 can be deterministically calculated from X_1 and X_2. So the variables {X_1, X_2, X_3} are not mutually independent. However, we also have p(X_3 | X_1) = p(X_3), since without X_2, no information can be provided to X_3. So X_1 ⊥ X_3 and similarly X_2 ⊥ X_3. Hence {X_1, X_2, X_3} are pairwise independent.

Here is a different example. Let there be four balls in a bag, numbered 1 to 4. Suppose we draw one at random. Define 3 events as follows:

• X_1: ball 1 or 2 is drawn.
• X_2: ball 2 or 3 is drawn.
• X_3: ball 1 or 3 is drawn.

We have p(X_1) = p(X_2) = p(X_3) = 0.5. Also, p(X_1, X_2) = p(X_2, X_3) = p(X_1, X_3) = 0.25. Hence p(X_1, X_2) = p(X_1) p(X_2), and similarly for the other pairs. Hence the events are pairwise independent. However, p(X_1, X_2, X_3) = 0 ≠ 1/8 = p(X_1) p(X_2) p(X_3).
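A minimal enumeration check of the second counterexample in MATLAB (the indicator encoding of the three events is my own):

% Enumerate the four equally likely balls; check pairwise vs mutual independence.
balls = 1:4;
X1 = ismember(balls, [1 2]);   % event X1: ball 1 or 2
X2 = ismember(balls, [2 3]);   % event X2: ball 2 or 3
X3 = ismember(balls, [1 3]);   % event X3: ball 1 or 3
[mean(X1 & X2)  mean(X1)*mean(X2)]                   % both 0.25: pairwise independent
[mean(X1 & X2 & X3)  mean(X1)*mean(X2)*mean(X3)]     % 0 vs 0.125: not mutually independent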
2.1.9 Conditional independence iff joint factorizes
Independence ⇒ factorization. Let g(x, z) = p(x|z) and h(y, z) = p(y|z). If X ⊥ Y | Z then

p(x, y|z) = p(x|z) p(y|z) = g(x, z) h(y, z)   (2.33)

Factorization ⇒ independence. If p(x, y|z) = g(x, z) h(y, z) then

1 = Σ_{x,y} p(x, y|z) = Σ_{x,y} g(x, z) h(y, z) = [Σ_x g(x, z)] [Σ_y h(y, z)]   (2.34)
p(x|z) = Σ_y p(x, y|z) = Σ_y g(x, z) h(y, z) = g(x, z) Σ_y h(y, z)              (2.35)
p(y|z) = Σ_x p(x, y|z) = Σ_x g(x, z) h(y, z) = h(y, z) Σ_x g(x, z)              (2.36)

Multiplying (2.35) and (2.36) and using (2.34),

p(x|z) p(y|z) = g(x, z) h(y, z) [Σ_x g(x, z)] [Σ_y h(y, z)]   (2.37)
              = g(x, z) h(y, z) = p(x, y|z)                    (2.38)
2.1.10 Conditional independence
1. True, since

(X ⊥ W | Z, Y) ⇒ p(X | W, Z, Y) = p(X | Z, Y)   (2.39)
(X ⊥ Y | Z)    ⇒ p(X | Z, Y) = p(X | Z)          (2.40)
               ⇒ p(X | W, Z, Y) = p(X | Z)        (2.41)
               ⇒ (X ⊥ Y, W | Z)                   (2.42)

2. False. Consider the DAG in Figure 2.1. It encodes that (X ⊥ Y | Z) and (X ⊥ Y | W) but not (X ⊥ Y | Z, W).
Figure 2.1: A DGM.
2.1.11 Deriving the inverse gamma density
Let x ∼ Ga(a, b) and y = 1/x, so x = 1/y. We have

p_y(y) = p_x(x) |dx/dy|   (2.43)

where

dx/dy = −1/y^2 = −x^2   (2.44)

So

p_y(y) = x^2 (b^a / Γ(a)) x^{a−1} e^{−xb}                  (2.45)
       = (b^a / Γ(a)) x^{a+1} e^{−xb}                       (2.46)
       = (b^a / Γ(a)) y^{−(a+1)} e^{−b/y} = IG(y | a, b)    (2.47)
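A small numeric check of (2.47) in MATLAB (the values a = 3, b = 2 are arbitrary): the change-of-variables density p_x(1/y)/y^2 should match the IG density pointwise, and the latter should integrate to 1.

% Verify the inverse-gamma density by change of variables and normalization.
a = 3; b = 2;
gam = @(x) b^a / gamma(a) * x.^(a-1) .* exp(-b*x);       % Ga(x | a, b)
ig  = @(y) b^a / gamma(a) * y.^(-(a+1)) .* exp(-b./y);   % IG(y | a, b)
y = [0.3 1 2.5];
[ig(y); gam(1./y) ./ y.^2]        % two matching rows
integral(ig, 0, Inf)              % ~ 1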
2.1.12 Normalization constant for a 1D Gaussian
Following the first hint we have

Z^2 = ∫_0^{2π} ∫_0^∞ r exp(−r^2 / (2σ^2)) dr dθ            (2.48)
    = [∫_0^{2π} dθ] [∫_0^∞ r exp(−r^2 / (2σ^2)) dr]        (2.49)
    = (2π) I                                               (2.50)

where I is the inner integral

I = ∫_0^∞ r exp(−r^2 / (2σ^2)) dr   (2.51)

Following the second hint we have

I = −σ^2 ∫_0^∞ (−r/σ^2) e^{−r^2/(2σ^2)} dr   (2.52)
  = −σ^2 [e^{−r^2/(2σ^2)}]_0^∞                (2.53)
  = −σ^2 [0 − 1] = σ^2                        (2.54)

Hence

Z^2 = 2πσ^2      (2.55)
Z = σ √(2π)      (2.56)
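A one-line numeric confirmation in MATLAB (σ = 1.7 is an arbitrary choice):

% Check that the Gaussian normalization constant is sigma*sqrt(2*pi).
s = 1.7;
Z = integral(@(x) exp(-x.^2 / (2*s^2)), -Inf, Inf)
[Z, s*sqrt(2*pi)]     % agree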
2.1.13 Expressing mutual information in terms of entropies

I(X, Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]                          (2.57)
        = Σ_{x,y} p(x, y) log [ p(x|y) / p(x) ]                                  (2.58)
        = −Σ_{x,y} p(x, y) log p(x) + Σ_{x,y} p(x, y) log p(x|y)                 (2.59)
        = −Σ_x p(x) log p(x) − ( −Σ_{x,y} p(x, y) log p(x|y) )                   (2.60)
        = −Σ_x p(x) log p(x) − ( −Σ_y p(y) Σ_x p(x|y) log p(x|y) )               (2.61)
        = H(X) − H(X|Y)                                                          (2.62)

We can show I(X, Y) = H(Y) − H(Y|X) by symmetry.
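A minimal MATLAB check of (2.62) on a small joint distribution (the 2×2 table J is arbitrary):

% Verify I(X,Y) = H(X) - H(X|Y) on a toy joint distribution.
J = [0.3 0.2; 0.1 0.4];                              % p(x, y): rows index x, columns index y
px = sum(J, 2); py = sum(J, 1);
I = sum(sum(J .* log(J ./ (px * py))));              % mutual information (nats)
HX = -sum(px .* log(px));
HXgY = -sum(sum(J .* log(J ./ repmat(py, 2, 1))));   % H(X|Y) = -sum p(x,y) log p(x|y)
[I, HX - HXgY]                                       % agree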
2.1.14 Mutual information for correlated normals
The entropy is

h(X, Y) = (1/2) log[(2πe)^2 det Σ] = (1/2) log[(2πe)^2 σ^4 (1 − ρ^2)]   (2.63)

Since X and Y are individually normal with variance σ^2, we have

h(X) = h(Y) = (1/2) log[2πeσ^2]   (2.64)

Hence

I(X, Y) = h(X) + h(Y) − h(X, Y)                                       (2.65)
        = log[2πeσ^2] − (1/2) log[(2πe)^2 σ^4 (1 − ρ^2)]              (2.66)
        = (1/2) log[(2πeσ^2)^2] − (1/2) log[(2πeσ^2)^2 (1 − ρ^2)]     (2.67)
        = (1/2) log[1 / (1 − ρ^2)] = −(1/2) log[1 − ρ^2]              (2.68)

1. ρ = 1. In this case, X = Y, and I(X, Y) = ∞, which makes sense.
2. ρ = 0. In this case, X and Y are independent, and I(X, Y) = 0, which makes sense.
3. ρ = −1. In this case, X = −Y, and I(X, Y) = ∞, which again makes sense.
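A numerical check of (2.68) in MATLAB (a minimal sketch; σ = 1, ρ = 0.8, and the integration box are arbitrary choices):

% Check I(X,Y) = -0.5*log(1 - rho^2) for a bivariate normal with correlation rho.
s = 1; rho = 0.8; L = 8;
p  = @(x, y) exp(-(x.^2 - 2*rho*x.*y + y.^2) ./ (2*s^2*(1 - rho^2))) ./ (2*pi*s^2*sqrt(1 - rho^2));
px = @(x) exp(-x.^2 ./ (2*s^2)) ./ sqrt(2*pi*s^2);
I = integral2(@(x, y) p(x, y) .* log(p(x, y) ./ (px(x) .* px(y))), -L, L, -L, L)
-0.5 * log(1 - rho^2)       % matches (~ 0.51 nats)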
2.1.15 A measure of correlation (normalized mutual information)
1. We have

r = [H(X) − H(Y|X)] / H(X) = [H(Y) − H(Y|X)] / H(X) = I(X, Y) / H(X)   (2.69)

where the second step follows since H(X) = H(Y).