Approximate Inference, Structure Learning
and Feature Estimation
in Markov Random Fields
Pradeep Ravikumar
August 2007
CMU-ML-07-115
Machine Learning Department
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Thesis Committee:
John Lafferty (Chair)
Carlos Guestrin
Eric Xing
Martin Wainwright, UC Berkeley
Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
Copyright © 2007 Pradeep Ravikumar
The views and conclusions contained in this document are those of the author and should not be
interpreted as representing the official policies, either expressed or implied, of any sponsoring institution,
the U.S. government, or any other entity.
Keywords: Markov Random Fields, Graphical Models, Approximate Inference, Structure Learning, Feature Estimation, Non-parametric Estimation, Sparsity, ℓ1 Regularization, Additive Models
Abstract
Markov random fields (MRFs), or undirected graphical models, are graphical representations of probability distributions. Each graph represents a family of distributions: the nodes of the graph represent random variables, the edges encode independence assumptions, and weights over the edges and cliques specify a particular member of the family.
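Concretely, in the exponential family form developed in Chapter 1 (the notation below is only an illustrative sketch of that standard representation), a member of such a family over variables x factorizes over the cliques $\mathcal{C}$ of the graph as
\[
  p(x) \;=\; \frac{1}{Z(\theta)} \, \exp\Big( \sum_{c \in \mathcal{C}} \theta_c \, \phi_c(x_c) \Big),
\]
where the $\phi_c$ are the (clique) feature functions, the $\theta_c$ are the weights, and $Z(\theta)$ is the normalization constant, also called the partition function.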
There are three main classes of tasks within this framework: the first is to perform inference, given the graph structure, parameters, and (clique) feature functions; the second is to estimate the graph structure and parameters from data, given the feature functions; the third is to estimate the feature functions themselves from data.
Key inference subtasks include estimating the normalization constant (also called the partition function), event probability estimation, computing rigorous upper and lower bounds (interval guarantees), inference given only moment constraints, and computing the most probable configuration.
The thesis addresses all of the above tasks and subtasks.
Acknowledgements
I will start with a couple of historical notes. For the first, we go to the post-Renaissance period, to the German polities. And for the second, we go to 2000 B.C. in India. These are the roots of the research university system, and the roots of the Indian monasteries, respectively. What is common to both is an environment of learned, wise people who devote their ascetic lives to understanding the world and existence. I apologize for the bombast, but I'm not done with the history and culture lesson yet. India has many good things; among them is a multitude of holy men who devote their lives to introspection and religion. A non-Indian might perhaps not fully grasp this, but my thesis advisor John Lafferty has a lot of an intangible something common to the best of such holy men: an inner peace, a wisdom. In the finest of Indian traditions dating back to 2000 B.C., I have learnt under a holy wise man. My parents must be very proud. In his other, professorial role, John has taught me, as much as I could learn at the least, a lot about how to think about research; I've benefited a lot from his clarity of thought and creative intellect.
I have much to thank other monks at Carnegie Mellon for as well. William Cohen showed me how to zoom in when doing research; Steve Fienberg showed me how to zoom out for the big picture. I've had a lot of fun working with Larry Wasserman, who is as insightful with acronyms as he is with All of Statistics. I've benefited not only from his insights, but also, at a fundamental level, from his style of doing research, which has greatly influenced my own research style.
I must also thank my thesis committee. The creative energies of Carlos Guestrin and Eric Xing have always fascinated me, and I was lucky to be the TA of a course they jointly offered, "Probabilistic Graphical Models"; their insights were quite helpful. Martin Wainwright's research on graphical models, and his beautifully written papers, were principal motivators for this thesis and my research.
I'm thankful to Diane Stidle, who has been an ever-present source of help and advice. I'm also grateful for the kindness and help of Sharon Cavlovich and Monica Hopes.
High-fives to my squash team, Vineet Goyal, Mohit Kumar, and Matt Mason, for all the good times playing and winning the league.
Finally, I thank my parents, Usha and Pattabhiraman Ravikumar, for their love and their encouragement.
Contents
1 Introduction
  1.1 Representation theory
    1.1.1 Exponential Family Representation
  1.2 Pairwise MRFs
  1.3 Tasks in a graphical model
  1.4 What this thesis is about

I Approximate Inference

2 Log Partition Function
  2.1 Conjugate Dual of the log-partition function

3 Preconditioner Approximations
  3.1 Preconditioners in Linear Systems
  3.2 Graphical Model Preconditioners
    3.2.1 Main Idea
    3.2.2 Generalized Eigenvalue Bounds
    3.2.3 Main Procedure
  3.3 Generalized Support Theory for Graphical Models
  3.4 Experiments

4 Quadratic Programming Relaxations for MAP
  4.1 MAP Estimation
  4.2 Problem Formulation
  4.3 Linear Relaxations
  4.4 Quadratic Relaxation
  4.5 Convex Approximation
  4.6 Iterative Update Procedure
  4.7 Inner Polytope Relaxations
  4.8 Experiments

5 General Event Probabilities, Bounds

6 Variational Chernoff Bounds
  6.1 Classical and Generalized Chernoff Bounds
  6.2 Graphical Model Chernoff Bounds
  6.3 Examples of Classical and Graphical Model Chernoff Bounds
    6.3.1 Example: Classical Chernoff bounds
    6.3.2 Example: Chernoff bounds for Markov models
  6.4 Variational Chernoff Bounds
    6.4.1 Collapsing the Nested Optimization
  6.5 Tightness of Chernoff Bounds
  6.6 Experimental Results

7 Variational Chebyshev Bounds
  7.1 Graphical Model Chebyshev bounds
  7.2 Chebyshev-Chernoff Bounds

II Structure Learning

8 Structure From Data
  8.1 Parameterizing edge selection

9 ℓ1 Regularized Regression
  9.1 Problem Formulation and Notation
  9.2 Main Result and Outline of Analysis
    9.2.1 Statement of main result
    9.2.2 Outline of analysis
  9.3 Primal-Dual Relations for ℓ1-Regularized Logistic Regression
  9.4 Constructing a Primal-Dual Pair
  9.5 Experimental Results

III Feature Estimation

10 Features from data
  10.1 Smoothing and Additive Models