Elements of Causal Inference
Foundations and Learning Algorithms
Adaptive Computation and Machine Learning
Francis Bach, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors
A complete list of books published in The Adaptive Computation and Machine Learning series appears at the back of this book.
Elements of Causal Inference
Foundations and Learning Algorithms
Jonas Peters, Dominik Janzing, and Bernhard Schölkopf
The MIT Press
Cambridge, Massachusetts
London, England
© 2017 Massachusetts Institute of Technology
This work is licensed to the public under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license (international):
http://creativecommons.org/licenses/by-nc-nd/4.0/
All rights reserved except as licensed pursuant to the Creative Commons license identified above. Any reproduction or other use not licensed as above, by any electronic or mechanical means (including but not limited to photocopying, public distribution, online display, and digital information storage and retrieval) requires permission in writing from the publisher.
This book was set in LaTeX by the authors.
Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Names: Peters, Jonas. | Janzing, Dominik. | Schölkopf, Bernhard.
Title: Elements of causal inference : foundations and learning algorithms / Jonas Peters, Dominik Janzing, and Bernhard Schölkopf.
Description: Cambridge, MA : MIT Press, 2017. | Series: Adaptive computation and machine learning series | Includes bibliographical references and index.
Identifiers: LCCN 2017020087 | ISBN 9780262037310 (hardcover : alk. paper)
Subjects: LCSH: Machine learning. | Logic, Symbolic and mathematical. | Causation. | Inference. | Computer algorithms.
Classification: LCC Q325.5 .P48 2017 | DDC 006.3/1–dc23
LC record available at https://lccn.loc.gov/2017020087
10 9 8 7 6 5 4 3 2 1
To all those who enjoy the pursuit of causal insight
Contents
Preface xi
Notation and Terminology xv
1 Statistical and Causal Models 1
1.1 Probability Theory and Statistics 1
1.2 Learning Theory 3
1.3 Causal Modeling and Learning 5
1.4 Two Examples 7
2 Assumptions for Causal Inference 15
2.1 The Principle of Independent Mechanisms 16
2.2 Historical Notes 22
2.3 Physical Structure Underlying Causal Models 26
3 Cause-Effect Models 33
3.1 Structural Causal Models 33
3.2 Interventions 34
3.3 Counterfactuals 36
3.4 Canonical Representation of Structural Causal Models 37
3.5 Problems 39
4 Learning Cause-Effect Models 43
4.1 Structure Identifiability 44
4.2 Methods for Structure Identification 62
4.3 Problems 69
5 Connections to Machine Learning, I 71
5.1 Semi-Supervised Learning 71
5.2 Covariate Shift 77
5.3 Problems 79
6 Multivariate Causal Models 81
6.1 Graph Terminology 81
6.2 Structural Causal Models 83
6.3 Interventions 88
6.4 Counterfactuals 96
6.5 Markov Property, Faithfulness, and Causal Minimality 100
6.6 Calculating Intervention Distributions by Covariate Adjustment 109
6.7 Do-Calculus 118
6.8 Equivalence and Falsifiability of Causal Models 120
6.9 Potential Outcomes 122
6.10 Generalized Structural Causal Models Relating Single Objects 126
6.11 Algorithmic Independence of Conditionals 129
6.12 Problems 132
7 Learning Multivariate Causal Models 135
7.1 Structure Identifiability 136
7.2 Methods for Structure Identification 142
7.3 Problems 155
8 Connections to Machine Learning, II 157
8.1 Half-Sibling Regression 157
8.2 Causal Inference and Episodic Reinforcement Learning 159
8.3 Domain Adaptation 167
8.4 Problems 169
9 Hidden Variables 171
9.1 Interventional Sufficiency 171
9.2 Simpson’s Paradox 174
9.3 Instrumental Variables 175
9.4 Conditional Independences and Graphical Representations 177
9.5 Constraints beyond Conditional Independence 185
9.6 Problems 195
10 Time Series 197
10.1 Preliminaries and Terminology 197
10.2 Structural Causal Models and Interventions 199
10.3 Learning Causal Time Series Models 201
10.4 Dynamic Causal Modeling 210
10.5 Problems 211
Appendices
Appendix A Some Probability and Statistics 213
A.1 Basic Definitions 213
A.2 Independence and Conditional Independence Testing 216
A.3 Capacity of Function Classes 219
Appendix B Causal Orderings and Adjacency Matrices 221
Appendix C Proofs 225
C.1 Proof of Theorem 4.2 225
C.2 Proof of Proposition 6.3 226
C.3 Proof of Remark 6.6 226
C.4 Proof of Proposition 6.13 226
C.5 Proof of Proposition 6.14 228
C.6 Proof of Proposition 6.36 228
C.7 Proof of Proposition 6.48 228
C.8 Proof of Proposition 6.49 229
C.9 Proof of Proposition 7.1 230
C.10 Proof of Proposition 7.4 230
C.11 Proof of Proposition 8.1 230
C.12 Proof of Proposition 8.2 231
C.13 Proof of Proposition 9.3 231
C.14 Proof of Theorem 10.3 232
C.15 Proof of Theorem 10.4 232
Bibliography 235
Index 263