Markov Logic Networks
Matthew Richardson ([email protected]) and
Pedro Domingos ([email protected])
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-250, U.S.A.
Abstract. We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
Keywords: Statistical relational learning, Markov networks, Markov random fields, log-linear models, graphical models, first-order logic, satisfiability, inductive logic programming, knowledge-based model construction, Markov chain Monte Carlo, pseudo-likelihood, link prediction
1. Introduction
Combining probability and first-order logic in a single representation has long been a goal of AI. Probabilistic graphical models enable us to efficiently handle uncertainty. First-order logic enables us to compactly represent a wide variety of knowledge. Many (if not most) applications require both. Interest in this problem has grown in recent years due to its relevance to statistical relational learning (Getoor & Jensen, 2000; Getoor & Jensen, 2003; Dietterich et al., 2003), also known as multi-relational data mining (Džeroski & De Raedt, 2003; Džeroski et al., 2002; Džeroski et al., 2003; Džeroski & Blockeel, 2004). Current proposals typically focus on combining probability with restricted subsets of first-order logic, like Horn clauses (e.g., Wellman et al. (1992); Poole (1993); Muggleton (1996); Ngo and Haddawy (1997); Sato and Kameya (1997); Cussens (1999); Kersting and De Raedt (2001); Santos Costa et al. (2003)), frame-based systems (e.g., Friedman et al. (1999); Pasula and Russell (2001); Cumby and Roth (2003)), or database query languages (e.g., Taskar et al. (2002); Popescul and Ungar (2003)). They are often quite complex. In this paper, we introduce Markov logic networks (MLNs), a representation that is quite simple, yet combines probability and first-order logic with no restrictions other than finiteness of the domain. We develop
efficient algorithms for inference and learning in MLNs, and evaluate them in a real-world domain.
A Markov logic network is a first-order knowledge base with a weight attached to each formula, and can be viewed as a template for constructing Markov networks. From the point of view of probability, MLNs provide a compact language to specify very large Markov networks, and the ability to flexibly and modularly incorporate a wide range of domain knowledge into them. From the point of view of first-order logic, MLNs add the ability to soundly handle uncertainty, tolerate imperfect and contradictory knowledge, and reduce brittleness. Many important tasks in statistical relational learning, like collective classification, link prediction, link-based clustering, social network modeling, and object identification, are naturally formulated as instances of MLN learning and inference.

Experiments with a real-world database and knowledge base illustrate the benefits of using MLNs over purely logical and purely probabilistic approaches. We begin the paper by briefly reviewing the fundamentals of Markov networks (Section 2) and first-order logic (Section 3). The core of the paper introduces Markov logic networks and algorithms for inference and learning in them (Sections 4–6). We then report our experimental results (Section 7). Finally, we show how a variety of SRL tasks can be cast as MLNs (Section 8), discuss how MLNs relate to previous approaches (Section 9) and list directions for future work (Section 10).
2. Markov Networks
A Markov network (also known as Markov random field) is a model for the joint distribution of a set of variables X = (X_1, X_2, ..., X_n) ∈ 𝒳 (Pearl, 1988). It is composed of an undirected graph G and a set of potential functions φ_k. The graph has a node for each variable, and the model has a potential function for each clique in the graph. A potential function is a non-negative real-valued function of the state of the corresponding clique. The joint distribution represented by a Markov network is given by
jointdistribution represented byaMarkovnetworkisgivenby
1
P(X=x) = (cid:30) (x ) (1)
k fkg
Z
k
Y
where x_{{k}} is the state of the kth clique (i.e., the state of the variables that appear in that clique). Z, known as the partition function, is given by Z = Σ_{x∈𝒳} Π_k φ_k(x_{{k}}). Markov networks are often conveniently represented as log-linear models, with each clique potential replaced by an exponentiated weighted sum of features of the state, leading to
\[ P(X\!=\!x) \;=\; \frac{1}{Z} \exp\left(\sum_j w_j f_j(x)\right) \qquad (2) \]
A feature may be any real-valued function of the state. This paper will focus on binary features, f_j(x) ∈ {0, 1}. In the most direct translation from the potential-function form (Equation 1), there is one feature corresponding to each possible state x_{{k}} of each clique, with its weight being log φ_k(x_{{k}}). This representation is exponential in the size of the cliques. However, we are free to specify a much smaller number of features (e.g., logical functions of the state of the clique), allowing for a more compact representation than the potential-function form, particularly when large cliques are present. MLNs will take advantage of this.
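As a small worked illustration of this translation, consider a clique over two binary variables X_1 and X_2 with potential φ(x_1, x_2). The direct translation uses one indicator feature per joint state:

\[ f_{ab}(x) = 1[x_1\!=\!a,\ x_2\!=\!b], \qquad w_{ab} = \log\phi(a,b), \qquad a,b \in \{0,1\}. \]

Exactly one of the four indicators fires in any state, so exp(Σ_{ab} w_{ab} f_{ab}(x)) = φ(x_1, x_2) and Equation 1 is recovered; a single logical feature such as 1[x_1 ∨ x_2], with one weight, is the kind of compact alternative that MLNs exploit.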
Inference in Markov networks is #P-complete (Roth, 1996). The most widely used method for approximate inference in Markov networks is Markov chain Monte Carlo (MCMC) (Gilks et al., 1996), and in particular Gibbs sampling, which proceeds by sampling each variable in turn given its Markov blanket. (The Markov blanket of a node is the minimal set of nodes that renders it independent of the remaining network; in a Markov network, this is simply the node's neighbors in the graph.) Marginal probabilities are computed by counting over these samples; conditional probabilities are computed by running the Gibbs sampler with the conditioning variables clamped to their given values. Another popular method for inference in Markov networks is belief propagation (Yedidia et al., 2001).
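To make the sampling step concrete, here is a minimal Gibbs-sampling sketch for a log-linear Markov network over binary variables (an illustrative sketch, not the implementation used in this paper; the feature representation and function names are our own assumptions):

import math
import random

def gibbs_sample(n_vars, features, n_samples, evidence=None, burn_in=100):
    """Gibbs sampling for a binary log-linear Markov network.

    features: list of (weight, f) pairs, where f maps a state (list of 0/1) to 0 or 1.
    evidence: dict {var_index: value} of clamped (conditioning) variables.
    Returns estimated marginals P(X_i = 1) from the collected samples.
    """
    evidence = evidence or {}
    # Initialize a random state consistent with the evidence.
    state = [evidence.get(i, random.randint(0, 1)) for i in range(n_vars)]
    counts = [0] * n_vars

    def unnormalized_log_prob(s):
        return sum(w * f(s) for w, f in features)

    for t in range(burn_in + n_samples):
        for i in range(n_vars):
            if i in evidence:          # conditioning variables stay clamped
                continue
            # Compare the two states that differ only in variable i; all
            # features not touching X_i cancel in the difference, so this is
            # effectively sampling X_i given its Markov blanket.
            state[i] = 0
            log_p0 = unnormalized_log_prob(state)
            state[i] = 1
            log_p1 = unnormalized_log_prob(state)
            p1 = 1.0 / (1.0 + math.exp(log_p0 - log_p1))
            state[i] = 1 if random.random() < p1 else 0
        if t >= burn_in:
            for i in range(n_vars):
                counts[i] += state[i]
    return [c / n_samples for c in counts]

For clarity the sketch re-evaluates every feature when resampling a variable; a practical implementation would evaluate only the features in X_i's Markov blanket, which is what makes each Gibbs step cheap.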
Maximum-likelihood or MAP estimates of Markov network weights cannot be computed in closed form, but, because the log-likelihood is a concave function of the weights, they can be found efficiently using standard gradient-based or quasi-Newton optimization methods (Nocedal & Wright, 1999). Another alternative is iterative scaling (Della Pietra et al., 1997). Features can also be learned from data, for example by greedily constructing conjunctions of atomic features (Della Pietra et al., 1997).
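The gradient these methods follow is the standard log-linear one, stated here for reference:

\[ \frac{\partial}{\partial w_j} \log P_w(X\!=\!x) \;=\; f_j(x) \;-\; E_w\!\left[f_j(X)\right], \]

i.e., the observed value of the jth feature minus its expectation under the current model. The expectation term requires inference over the model, which is what makes exact likelihood optimization expensive and is one motivation for the pseudo-likelihood approach mentioned in the abstract.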
3. First-Order Logic
A first-order knowledge base (KB) is a set of sentences or formulas in first-order logic (Genesereth & Nilsson, 1987). Formulas are constructed using four types of symbols: constants, variables, functions, and predicates. Constant symbols represent objects in the domain of interest (e.g., people: Anna, Bob, Chris, etc.). Variable symbols range over the objects in the domain. Function symbols (e.g., MotherOf) represent mappings from tuples of objects to objects. Predicate symbols represent relations among objects in the domain (e.g., Friends) or attributes of objects (e.g., Smokes). An interpretation specifies which objects, functions and relations in the domain are represented by which symbols. Variables and constants may be typed, in which case variables range only over objects of the corresponding type, and constants can only represent objects of the corresponding type. For example, the variable x might range over people (e.g., Anna, Bob, etc.), and the constant C might represent a city (e.g., Seattle).
A term is any expression representing an object in the domain. It can be a constant, a variable, or a function applied to a tuple of terms. For example, Anna, x, and GreatestCommonDivisor(x, y) are terms. An atomic formula or atom is a predicate symbol applied to a tuple of terms (e.g., Friends(x, MotherOf(Anna))). Formulas are recursively constructed from atomic formulas using logical connectives and quantifiers. If F_1 and F_2 are formulas, the following are also formulas: ¬F_1 (negation), which is true iff F_1 is false; F_1 ∧ F_2 (conjunction), which is true iff both F_1 and F_2 are true; F_1 ∨ F_2 (disjunction), which is true iff F_1 or F_2 is true; F_1 ⇒ F_2 (implication), which is true iff F_1 is false or F_2 is true; F_1 ⇔ F_2 (equivalence), which is true iff F_1 and F_2 have the same truth value; ∀x F_1 (universal quantification), which is true iff F_1 is true for every object x in the domain; and ∃x F_1 (existential quantification), which is true iff F_1 is true for at least one object x in the domain. Parentheses may be used to enforce precedence. A positive literal is an atomic formula; a negative literal is a negated atomic formula. The formulas in a KB are implicitly conjoined, and thus a KB can be viewed as a single large formula. A ground term is a term containing no variables. A ground atom or ground predicate is an atomic formula all of whose arguments are ground terms. A possible world or Herbrand interpretation assigns a truth value to each possible ground atom.
A formula is satisfiable iff there exists at least one world in which it is true. The basic inference problem in first-order logic is to determine whether a knowledge base KB entails a formula F, i.e., if F is true in all worlds where KB is true (denoted by KB ⊨ F). This is often done by refutation: KB entails F iff KB ∪ {¬F} is unsatisfiable. (Thus, if a KB contains a contradiction, all formulas trivially follow from it, which makes painstaking knowledge engineering a necessity.) For automated inference, it is often convenient to convert formulas to a more regular form, typically clausal form (also known as conjunctive normal form (CNF)). A KB in clausal form is a conjunction of clauses, a clause being a disjunction of literals. Every KB in first-order logic can be converted to clausal form using a mechanical sequence of steps.¹ Clausal form is used in resolution, a sound and refutation-complete inference procedure for first-order logic (Robinson, 1965).

¹ This conversion includes the removal of existential quantifiers by Skolemization, which is not sound in general. However, in finite domains an existentially quantified formula can simply be replaced by a disjunction of its groundings.
Inference in first-order logic is only semidecidable. Because of this, knowledge bases are often constructed using a restricted subset of first-order logic with more desirable properties. The most widely-used restriction is to Horn clauses, which are clauses containing at most one positive literal. The Prolog programming language is based on Horn clause logic (Lloyd, 1987). Prolog programs can be learned from databases by searching for Horn clauses that (approximately) hold in the data; this is studied in the field of inductive logic programming (ILP) (Lavrač & Džeroski, 1994).

Table I shows a simple KB and its conversion to clausal form. Notice that, while these formulas may be typically true in the real world, they are not always true. In most domains it is very difficult to come up with non-trivial formulas that are always true, and such formulas capture only a fraction of the relevant knowledge. Thus, despite its expressiveness, pure first-order logic has limited applicability to practical AI problems. Many ad hoc extensions to address this have been proposed. In the more limited case of propositional logic, the problem is well solved by probabilistic graphical models. The next section describes a way to generalize these models to the first-order case.
4. Markov Logic Networks
A first-order KB can be seen as a set of hard constraints on the set of possible worlds: if a world violates even one formula, it has zero probability. The basic idea in MLNs is to soften these constraints: when a world violates one formula in the KB it is less probable, but not impossible. The fewer formulas a world violates, the more probable it is. Each formula has an associated weight that reflects how strong a constraint it is: the higher the weight, the greater the difference in log probability between a world that satisfies the formula and one that does not, other things being equal.
DEFINITION 4.1. A Markov logic network L is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number. Together with a finite set of constants C = {c_1, c_2, ..., c_{|C|}}, it defines a Markov network M_{L,C} (Equations 1 and 2) as follows:

1. M_{L,C} contains one binary node for each possible grounding of each predicate appearing in L. The value of the node is 1 if the ground atom is true, and 0 otherwise.

2. M_{L,C} contains one feature for each possible grounding of each formula F_i in L. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the w_i associated with F_i in L.
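A rough Python sketch of the two items in Definition 4.1, for the function-free case (illustrative only; the data structures and the formula encoding are our assumptions, not the paper's):

from itertools import product

def ground_markov_network(predicates, formulas, constants):
    """Enumerate the nodes and features of M_{L,C} for a function-free MLN.

    predicates: dict name -> arity, e.g. {"Smokes": 1, "Friends": 2}
    formulas:   list of (weight, variables, test), where test(world, binding)
                returns True iff the formula holds under the binding; `world`
                maps ground atoms like ("Friends", ("Anna", "Bob")) to booleans.
    constants:  list of constant symbols, e.g. ["Anna", "Bob"]
    """
    # One binary node per grounding of each predicate (Definition 4.1, item 1).
    nodes = [(p, args) for p, arity in predicates.items()
             for args in product(constants, repeat=arity)]

    # One weighted feature per grounding of each formula (Definition 4.1, item 2).
    features = []
    for weight, variables, test in formulas:
        for values in product(constants, repeat=len(variables)):
            binding = dict(zip(variables, values))
            features.append((weight, lambda w, t=test, b=binding: 1 if t(w, b) else 0))
    return nodes, features

For example, with predicates = {"Smokes": 1, "Cancer": 1}, constants = ["Anna", "Bob"], and the clause ¬Sm(x) ∨ Ca(x) encoded as test = lambda w, b: (not w[("Smokes", (b["x"],))]) or w[("Cancer", (b["x"],))], the call returns four nodes and two weighted ground features.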
Table I. Example of a first-order knowledge base and MLN. Fr() is short for Friends(), Sm() for Smokes(), and Ca() for Cancer().

English: Friends of friends are friends.
  First-Order Logic: ∀x ∀y ∀z Fr(x,y) ∧ Fr(y,z) ⇒ Fr(x,z)
  Clausal Form: ¬Fr(x,y) ∨ ¬Fr(y,z) ∨ Fr(x,z)
  Weight: 0.7

English: Friendless people smoke.
  First-Order Logic: ∀x (¬(∃y Fr(x,y)) ⇒ Sm(x))
  Clausal Form: Fr(x,g(x)) ∨ Sm(x)
  Weight: 2.3

English: Smoking causes cancer.
  First-Order Logic: ∀x Sm(x) ⇒ Ca(x)
  Clausal Form: ¬Sm(x) ∨ Ca(x)
  Weight: 1.5

English: If two people are friends, either both smoke or neither does.
  First-Order Logic: ∀x ∀y Fr(x,y) ⇒ (Sm(x) ⇔ Sm(y))
  Clausal Form: ¬Fr(x,y) ∨ Sm(x) ∨ ¬Sm(y),  ¬Fr(x,y) ∨ ¬Sm(x) ∨ Sm(y)
  Weight: 1.1, 1.1
The syntax of the formulas in an MLN is the standard syntax of first-order logic (Genesereth & Nilsson, 1987). Free (unquantified) variables are treated as universally quantified at the outermost level of the formula.
An MLN can be viewed as a template for constructing Markov networks. Given different sets of constants, it will produce different networks, and these may be of widely varying size, but all will have certain regularities in structure and parameters, given by the MLN (e.g., all groundings of the same formula will have the same weight). We call each of these networks a ground Markov network to distinguish it from the first-order MLN. From Definition 4.1 and Equations 1 and 2, the probability distribution over possible worlds x specified by the ground Markov network M_{L,C} is given by
\[ P(X\!=\!x) \;=\; \frac{1}{Z} \exp\left(\sum_i w_i\, n_i(x)\right) \;=\; \frac{1}{Z} \prod_i \phi_i(x_{\{i\}})^{n_i(x)} \qquad (3) \]
where n_i(x) is the number of true groundings of F_i in x, x_{{i}} is the state (truth values) of the atoms appearing in F_i, and φ_i(x_{{i}}) = e^{w_i}. Notice that, although we defined MLNs as log-linear models, they could equally well be defined as products of potential functions, as the second equality above shows. This will be the most convenient approach in domains with a mixture of hard and soft constraints (i.e., where some formulas hold with certainty, leading to zero probabilities for some worlds).
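In other words, a world's unnormalized weight depends only on the counts n_i(x); a tiny sketch (illustrative, with the counts assumed to be precomputed):

import math

def world_log_weight(weights, true_grounding_counts):
    """Unnormalized log-probability of a world under Equation 3.

    weights: [w_1, ..., w_m] for formulas F_1..F_m
    true_grounding_counts: [n_1(x), ..., n_m(x)] for the world x
    """
    return sum(w * n for w, n in zip(weights, true_grounding_counts))

# Two worlds that differ only in one true grounding of a formula with weight
# w differ by a factor of e^w in probability, since Z cancels in the ratio.
ratio = math.exp(world_log_weight([1.5], [2]) - world_log_weight([1.5], [1]))

Since Z cancels in such ratios, a weight shifts relative log probability rather than expressing a probability directly, which is exactly the reading of weights given at the start of this section.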
The graphical structure of M_{L,C} follows from Definition 4.1: there is an edge between two nodes of M_{L,C} iff the corresponding ground atoms appear together in at least one grounding of one formula in L. Thus, the atoms in each ground formula form a (not necessarily maximal) clique in M_{L,C}. Figure 1 shows the graph of the ground Markov network defined by the last two formulas in Table I and the constants Anna and Bob. Each node in this graph is a ground atom (e.g., Friends(Anna, Bob)). The graph contains an arc between each pair of atoms that appear together in some grounding of one of the formulas. M_{L,C} can now be used to infer the probability that Anna and Bob are friends given their smoking habits, the probability that Bob has cancer given his friendship with Anna and whether she has cancer, etc.
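The edge construction can be sketched in the same spirit (our own illustration; ground formulas are assumed to be given as the sets of ground atoms they mention):

from itertools import combinations

def ground_network_edges(ground_formulas):
    """Edges of M_{L,C}: one edge per pair of ground atoms that co-occur in
    some ground formula, so each ground formula's atoms form a clique.

    ground_formulas: iterable of sets of ground-atom identifiers.
    """
    edges = set()
    for atoms in ground_formulas:
        for a, b in combinations(sorted(atoms), 2):
            edges.add((a, b))
    return edges

# The grounding x=Anna, y=Bob of "Fr(x,y) => (Sm(x) <=> Sm(y))" links
# Friends(Anna,Bob), Smokes(Anna) and Smokes(Bob) pairwise, as in Figure 1.
edges = ground_network_edges([
    {"Friends(Anna,Bob)", "Smokes(Anna)", "Smokes(Bob)"},
])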
Each state of M_{L,C} represents a possible world. A possible world is a set of objects, a set of functions (mappings from tuples of objects to objects), and a set of relations that hold between those objects; together with an interpretation, they determine the truth value of each ground atom. The following assumptions ensure that the set of possible worlds for (L, C) is finite, and that M_{L,C} represents a unique, well-defined probability distribution over those worlds, irrespective of the interpretation and domain. These assumptions are quite reasonable in most practical applications, and greatly simplify the use of MLNs. For the remaining cases, we discuss below the extent to which each one can be relaxed.
[Figure 1: a graph over the ground atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), and Cancer(B).]
Figure 1. Ground Markov network obtained by applying the last two formulas in Table I to the constants Anna (A) and Bob (B).
ASSUMPTION 1. Unique names. Different constants refer to different objects (Genesereth & Nilsson, 1987).

ASSUMPTION 2. Domain closure. The only objects in the domain are those representable using the constant and function symbols in (L, C) (Genesereth & Nilsson, 1987).

ASSUMPTION 3. Known functions. For each function appearing in L, the value of that function applied to every possible tuple of arguments is known, and is an element of C.
This last assumption allows us to replace functions by their values when grounding formulas. Thus the only ground atoms that need to be considered are those having constants as arguments. The infinite number of terms constructible from all functions and constants in (L, C) (the "Herbrand universe" of (L, C)) can be ignored, because each of those terms corresponds to a known constant in C, and atoms involving them are already represented as the atoms involving the corresponding constants. The possible groundings of a predicate in Definition 4.1 are thus obtained simply by replacing each variable in the predicate with each constant in C, and replacing each function term in the predicate by the corresponding constant. Table II shows how the groundings of a formula are obtained given Assumptions 1–3.
Assumption 1 (unique names) can be removed by introducing the equality predicate (Equals(x, y), or x = y for short) and adding the necessary axioms to the MLN: equality is reflexive, symmetric and transitive; for each unary predicate P, ∀x ∀y x = y ⇒ (P(x) ⇔ P(y)); and similarly for higher-order predicates and functions (Genesereth & Nilsson, 1987). The resulting MLN will have a node for each pair of constants, whose value is 1 if the constants represent the same object and 0 otherwise; these nodes will be connected to each other and to the rest of the network by arcs representing the axioms above. Notice that this allows us to make probabilistic inferences about the equality of two constants.
Table II. Construction of all groundings of a first-order formula under Assumptions 1–3.

function Ground(F)
  input: F, a formula in first-order logic
  output: GF, a set of ground formulas
  for each existentially quantified subformula ∃x S(x) in F
    F ← F with ∃x S(x) replaced by S(c_1) ∨ S(c_2) ∨ ... ∨ S(c_|C|),
        where S(c_i) is S(x) with x replaced by c_i
  GF ← {F}
  for each universally quantified variable x
    for each formula F_j(x) in GF
      GF ← (GF \ F_j(x)) ∪ {F_j(c_1), F_j(c_2), ..., F_j(c_|C|)},
          where F_j(c_i) is F_j(x) with x replaced by c_i
  for each formula F_j ∈ GF
    repeat
      for each function f(a_1, a_2, ...) all of whose arguments are constants
        F_j ← F_j with f(a_1, a_2, ...) replaced by c, where c = f(a_1, a_2, ...)
    until F_j contains no functions
  return GF
We have successfully used this as the basis of an approach to object identification (see Subsection 8.5).
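For concreteness, the middle loop of Table II (the expansion of universally quantified variables) might look as follows in Python for a function-free clause; the helper names and string encoding are our own, and existential quantifiers and function terms would be handled first as the table describes:

from itertools import product

def ground_universal(formula_vars, formula_template, constants):
    """Groundings of a universally quantified, function-free formula.

    formula_vars:     ordered list of variable names, e.g. ["x", "y"]
    formula_template: a string with placeholders, e.g. "~Fr({x},{y}) v Sm({x})"
    constants:        list of constant symbols, e.g. ["Anna", "Bob"]
    """
    groundings = []
    for values in product(constants, repeat=len(formula_vars)):
        binding = dict(zip(formula_vars, values))
        groundings.append(formula_template.format(**binding))
    return groundings

# Example: the clause ~Sm(x) v Ca(x) with constants Anna and Bob yields
# ["~Sm(Anna) v Ca(Anna)", "~Sm(Bob) v Ca(Bob)"].
clauses = ground_universal(["x"], "~Sm({x}) v Ca({x})", ["Anna", "Bob"])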
If the number u of unknown objects is known, Assumption 2 (domain closure) can be removed simply by introducing u arbitrary new constants. If u is unknown but finite, Assumption 2 can be removed by introducing a distribution over u, grounding the MLN with each number of unknown objects, and computing the probability of a formula F as P(F) = Σ_{u=0}^{u_max} P(u) P(F | M^u_{L,C}), where M^u_{L,C} is the ground MLN with u unknown objects. An infinite u requires extending MLNs to the case |C| = ∞.
Let H_{L,C} be the set of all ground terms constructible from the function symbols in L and the constants in L and C (the "Herbrand universe" of (L, C)). Assumption 3 (known functions) can be removed by treating each element of H_{L,C} as an additional constant and applying the same procedure used to remove the unique names assumption. For example, with a function G(x) and constants A and B, the MLN will now contain nodes for G(A) = A, G(A) = B, etc. This leads to an infinite number of new constants, requiring the corresponding extension of MLNs. However, if we restrict the level of nesting to some maximum, the resulting MLN is still finite.
To summarize, Assumptions 1–3 can be removed as long as the domain is finite. We believe it is possible to extend MLNs to infinite domains (see Jaeger (1998)), but this is an issue of chiefly theoretical interest, and we leave it for future work. In the remainder of this paper we proceed under Assumptions 1–3, except where noted.
A first-order KB can be transformed into an MLN simply by assigning a weight to each formula. For example, the clauses and weights in the last two columns of Table I constitute an MLN. According to this MLN, other things being equal, a world where n friendless people are non-smokers is e^{(2.3)n} times less probable than a world where all friendless people smoke. Notice that all the formulas in Table I are false in the real world as universally quantified logical statements, but capture useful information on friendships and smoking habits, when viewed as features of a Markov network. For example, it is well known that teenage friends tend to have similar smoking habits (Lloyd-Richardson et al., 2002). In fact, an MLN like the one in Table I succinctly represents a type of model that is a staple of social network analysis (Wasserman & Faust, 1994).

It is easy to see that MLNs subsume essentially all propositional probabilistic models, as detailed below.
PROPOSITION 4.2. Every probability distribution over discrete or finite-precision numeric variables can be represented as a Markov logic network.

Proof. Consider first the case of Boolean variables (X_1, X_2, ..., X_n). Define a predicate of zero arity R_h for each variable X_h, and include in the MLN L a formula for each possible state of (X_1, X_2, ..., X_n). This formula is a conjunction of n literals, with the hth literal being R_h() if X_h is true in the state, and ¬R_h() otherwise. The formula's weight is log P(X_1, X_2, ..., X_n). (If some states have zero probability, use instead the product form (see Equation 3), with φ_i() equal to the probability of the ith state.) Since all predicates in L have zero arity, L defines the same Markov network M_{L,C} irrespective of C, with one node for each variable X_h. For any state, the corresponding formula is true and all others are false, and thus Equation 3 represents the original distribution (notice that Z = 1). The generalization to arbitrary discrete variables is straightforward, by defining a zero-arity predicate for each value of each variable. Similarly for finite-precision numeric variables, by noting that they can be represented as Boolean vectors. □
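A minimal worked instance of this construction (our own example): a single Boolean variable X_1 with P(X_1 = true) = 0.7 becomes the two zero-arity weighted formulas

\[ (R_1(),\ \log 0.7) \qquad\text{and}\qquad (\lnot R_1(),\ \log 0.3). \]

In the world where R_1() holds, only the first formula is true, giving unnormalized weight e^{log 0.7} = 0.7; in the other world the weight is 0.3; the two sum to 1, so Z = 1 and Equation 3 reproduces the original distribution.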
Of course, compact factored models like Markov networks and Bayesian networks can still be represented compactly by MLNs, by defining formulas for the corresponding factors (arbitrary features in Markov networks, and states of a node and its parents in Bayesian networks).²

First-order logic (with Assumptions 1–3 above) is the special case of MLNs obtained when all weights are equal and tend to infinity, as described below.
² While some conditional independence structures can be compactly represented with directed graphs but not with undirected ones, they still lead to compact models in the form of Equation 3 (i.e., as products of potential functions).