arXiv:1501.03771v1 [cs.CV] 15 Jan 2015

Submodular relaxation for inference in Markov random fields

Anton Osokin, Dmitry Vetrov

Abstract—In this paper we address the problem of finding the most probable state of a discrete Markov random field (MRF), also known as the MRF energy minimization problem. The task is known to be NP-hard in general and its practical importance motivates numerous approximate algorithms. We propose a submodular relaxation approach (SMR) based on a Lagrangian relaxation of the initial problem. Unlike the dual decomposition approach of Komodakis et al. [30], SMR does not decompose the graph structure of the initial problem but constructs a submodular energy that is minimized within the Lagrangian relaxation. Our approach is applicable to both pairwise and high-order MRFs and allows us to take into account global potentials of certain types. We study theoretical properties of the proposed approach and evaluate it experimentally.

Index Terms—Markov random fields, energy minimization, combinatorial algorithms, relaxation, graph cuts
• A. Osokin is with the SIERRA team, INRIA and École Normale Supérieure, Paris, France. E-mail: [email protected]
• D. Vetrov is with the Faculty of computer science, Higher School of Economics, Moscow, Russia and Lomonosov Moscow State University, Moscow, Russia. E-mail: [email protected]

1 INTRODUCTION

The problem of inference in a Markov random field (MRF) arises in many applied domains, e.g. in machine learning, computer vision, natural language processing, etc. In this paper we focus on one important type of inference: maximum a posteriori (MAP) inference, often referred to as MRF energy minimization. Inference of this type is a combinatorial optimization problem, i.e. an optimization problem with a finite domain.

The most studied case of MRF energy minimization is the situation when the energy can be represented as a sum of terms (potentials) that depend on only one or two variables each (unary and pairwise potentials). In this setting the energy is said to be defined by a graph where the nodes correspond to the variables and the edges to the pairwise potentials. Minimization of energies defined on graphs is known to be NP-hard in general [8] but can be done exactly in polynomial time in a number of special cases, e.g. if the graph defining the energy is acyclic [36] or if the energy is submodular in the standard [28] or multi-label sense [10].

One way to go beyond pairwise potentials is to add higher-order summands to the energy. For example, Kohli et al. [23] and Ladický et al. [32] use high-order potentials based on superpixels (image regions) for semantic image segmentation; Delong et al. [11] use label cost potentials for geometric model fitting tasks.

To be tractable, high-order potentials need to have a compact representation. The recently proposed sparse high-order potentials define specific values for a short list of preferred configurations and the default value for all the others [29], [37].

In this paper we develop the submodular relaxation framework (SMR) for approximate minimization of energies with pairwise and sparse high-order potentials. SMR is based on a Lagrangian relaxation of consistency constraints on labelings, i.e. constraints that make each node have exactly one label assigned. The SMR Lagrangian can be viewed as a pseudo-Boolean function of binary indicator variables and is often submodular, thus can be efficiently minimized with a max-flow/min-cut algorithm (graph cut) at each step of the Lagrangian relaxation process.

In this paper we explore the SMR framework theoretically and provide explicit expressions for the obtained lower bounds. We experimentally compare our approach against several baselines (decomposition-based approaches [30], [25], [29]) and show its applicability on a number of real-world tasks.

The rest of the paper is organized as follows. In sec. 2 we review the related works. In sec. 3 we introduce our notation and discuss some well-known results. In sec. 4 we present the SMR framework. In sec. 5 we construct several algorithms within the SMR framework. In sec. 6 we present the theoretical analysis of our approach and in sec. 7 its experimental evaluation. We conclude in sec. 8.

2 RELATED WORKS

We mention works related to ours in two ways. First, we discuss methods that analogously to SMR rely on multiple calls of graph cuts. Second, we mention some decomposition methods based on Lagrangian relaxation.

Multiple graph cuts. The SMR framework uses the one-versus-all encoding for the variables, i.e. the indicators that particular labels are assigned to particular
variables, and runs the graph cut at the inner step. In this sense our method is similar to the α-expansion algorithm [8] (and its generalizations [22], [23]) and the methods for minimizing multi-label submodular functions [14], [10]. Nevertheless SMR is quite different from these approaches. The α-expansion algorithm is a move-making method where each move decreases the energy. Methods of [14] and [10] are exact methods applicable only when a total order on the set of labels is defined and the pairwise potentials are convex w.r.t. it (or more generally submodular in the multi-label sense). Note that the frequently used Potts model is not submodular when the number of possible labels is larger than 2, so the methods of [14] and [10] are not applicable to it.

Jegelka and Bilmes [16], Kohli et al. [24] studied the high-order potentials that group lower-order potentials together (cooperative cuts). Similarly to SMR, cooperative cut methods rely on calling graph cuts multiple times. As opposed to SMR, these methods construct and minimize upper (not lower) bounds on the initial energy.

The QPBO method [5] is a popular way of constructing the lower bound on the global minimum using graph cuts. When the energy depends on binary variables the SMR approach obtains exactly the same lower bound as QPBO.

Decomposition methods split the original hard problem into easier solvable problems and try to make them reach an agreement on their solutions. One advantage of these methods is the ability not only to estimate the solution but also to provide a lower bound on the global minimum. Obtaining a lower bound is important because it gives an idea on how much better the solution could possibly be. If the lower bound is tight (the obtained energy equals the obtained lower bound) the method provides a certificate of optimality, i.e. ensures that the obtained solution is optimal. The typical drawback of such methods is a higher computational cost in comparison with e.g. the α-expansion algorithm [18].

Usually a decomposition method can be described by two features: a way to enforce agreement of the subproblems and the type of subproblems. We are aware of two ways to enforce agreement: message passing and Lagrangian relaxation. Message passing was used in this context by Wainwright et al. [43] and Kolmogorov [25] in the tree-reweighted message passing (TRW) algorithms. The Lagrangian relaxation was applied to energy minimization by Komodakis et al. [30] and became very popular because of its flexibility. The SMR approach follows the Lagrangian relaxation path.

The majority of methods based on the Lagrangian relaxation decompose the graph defining the problem into subgraphs of simple structure. The usual choice is to use acyclic graphs (trees) or, more generally, graphs of low treewidth [30], [2]. Komodakis et al. [30] pointed out the alternative option of using submodular subproblems but only in the case of problems with binary variables and together with the decomposition of the graph. Strandmark and Kahl [42] used decomposition into submodular subproblems to construct a distributed max-flow/min-cut algorithm for large-scale binary submodular problems. Komodakis and Paragios [29] generalized the method of [30] to work with energies with sparse high-order potentials. In this method groups of high-order potentials were put into separate subproblems.

Several times the Lagrangian relaxation approach was applied to the task of energy minimization without decomposing the graph of the problem. Yarkony et al. [46] created several copies of each node and enforced the copies to take equal values. Yarkony et al. [47] introduced planar binary subproblems in a one-versus-all way (similarly to SMR) and used them in a scheme of tightening the relaxation.

The SMR approach is similar to the methods of [2], [29], [30], [42], [46], [47] in the sense that all of them use the Lagrangian relaxation to reach an agreement. The main advantage of SMR is that the number of dual variables does not depend on the number of labels and the number of high-order potentials but is fixed to the number of nodes. This structure results in speed improvements over the competitors.

3 NOTATION AND PRELIMINARIES

Notation. Let $V$ be a finite set of variables. Each variable $i \in V$ takes values $x_i \in \mathcal{P}$ where $\mathcal{P}$ is a finite set of labels. For any subset of variables $C \subseteq V$ the Cartesian product $\mathcal{P}^{|C|}$ is the joint domain of variables $C$. A joint state (or a labeling) of variables $C$ is a mapping $x_C : C \to \mathcal{P}$ and $x_C(i)$, $i \in C$, is the image of $i$ under $x_C$. We use $\mathcal{P}^C$ to denote the set of all possible joint states of variables in $C$.¹ A joint state of all variables in $V$ is denoted by $x$ (instead of $x_V$). If an expression contains both $x_C \in \mathcal{P}^C$ and $x_A \in \mathcal{P}^A$, $A, C \subseteq V$, we assume that the mappings are consistent, i.e. $\forall i \in A \cap C : x_C(i) = x_A(i)$.

Consider a hypergraph $(V, \mathcal{C})$ where $\mathcal{C} \subseteq 2^V$ is a set of hyperedges. A potential of hyperedge $C$ is a mapping $\theta_C : \mathcal{P}^C \to \mathbb{R}$ and the number $\theta_C(x_C) \in \mathbb{R}$ is the value of the potential on labeling $x_C$. The order of potential $\theta_C$ refers to the cardinality of set $C$ and is denoted by $|C|$. Potentials of order 1 and 2 are called unary and pairwise correspondingly. Potentials of order higher than 2 are called high-order.

An energy defined on hypergraph $(V, \mathcal{C})$ (factorizable according to hypergraph $(V, \mathcal{C})$) is a mapping $E : \mathcal{P}^V \to \mathbb{R}$ that can be represented as a sum of potentials of hyperedges $C \in \mathcal{C}$:

    $E(x) = \sum_{C \in \mathcal{C}} \theta_C(x_C)$.    (1)

1. Note that $\mathcal{P}^C$ is highly related to $\mathcal{P}^{|C|}$. The "mapping terminology" is introduced to allow original variables $i \in C \subseteq V$ to be used as arguments of labelings: $x_C(i)$.
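An energy of form (1) is straightforward to evaluate directly. The toy sketch below is illustrative code, not the paper's; the variables, labels, and potential values are invented. It stores each hyperedge potential as a dictionary, with absent joint states costing zero, which mirrors the sparse representation discussed later in the paper:

```python
# Toy evaluation of E(x) = sum_C theta_C(x_C), eq. (1).
# Variables V = {0, 1, 2}, labels P = {0, 1}; all values are made up.
potentials = {
    (0,): {(0,): 0.0, (1,): 1.0},   # unary potential on node 0
    (1,): {(0,): 0.5, (1,): 0.0},   # unary potential on node 1
    (0, 1, 2): {(0, 0, 0): -2.0},   # sparse high-order term: one preferred labeling
}

def energy(x):
    """E(x): sum over hyperedges C of theta_C(x_C); absent entries cost 0."""
    total = 0.0
    for C, theta in potentials.items():
        x_C = tuple(x[i] for i in C)
        total += theta.get(x_C, 0.0)
    return total

assert energy({0: 0, 1: 0, 2: 0}) == -1.5   # 0.0 + 0.5 - 2.0
assert energy({0: 1, 1: 1, 2: 1}) == 1.0    # 1.0 + 0.0 + 0.0
```

The default-zero convention keeps the storage proportional to the number of preferred configurations rather than to $|\mathcal{P}^C|$.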
It is convenient to use notational shortcuts related to potentials and variables: $\theta_{i_1 \ldots i_k}(x_{i_1}, \ldots, x_{i_k}) = \theta_{i_1 \ldots i_k, x_{i_1}, \ldots, x_{i_k}} = \theta_{C, x_C} = \theta_C(x_C)$, $x_C(i) = x_{C,i}$ where $C = \{i_1, \ldots, i_k\}$ and $i \in C$.

When an energy is factorizable into unary and pairwise potentials only we call it pairwise. We write pairwise energies as follows:

    $E(x) = \sum_{i \in V} \theta_i(x_i) + \sum_{\{i,j\} \in \mathcal{E}} \theta_{ij}(x_i, x_j)$.    (2)

Here $\mathcal{E}$ is a set of edges, i.e. hyperedges of order 2. Under this notation set $\mathcal{C}$ contains hyperedges of order 1 and 2: $\mathcal{C} = \{\{i\} \mid i \in V\} \cup \{\{i,j\} \mid \{i,j\} \in \mathcal{E}\}$.

Sometimes it will be convenient to use indicators of each variable taking each value: $y_{ip} = [x_i = p]$, $i \in V$, $p \in \mathcal{P}$. Indicators allow us to rewrite energy (1) as

    $E_I(y) = \sum_{C \in \mathcal{C}} \sum_{d \in \mathcal{P}^C} \theta_{C,d} \prod_{i \in C} y_{id_i}$.    (3)

Minimizing multi-label energy (1) over $\mathcal{P}^V$ is equivalent to minimizing binary energy (3) over the Boolean cube $\{0,1\}^{V \times \mathcal{P}}$ under so-called consistency constraints:

    $\sum_{p \in \mathcal{P}} y_{ip} = 1, \quad \forall i \in V$.    (4)

Throughout this paper, we use equation numbers to indicate a constraint given by the corresponding equation, for example $y \in (4)$.

The Lagrangian relaxation approach. A popular approximate approach to minimize energy (1) is to enlarge the variables' domain, i.e. perform the relaxation, and to construct the dual problem to the relaxed one as it might be easier to solve. Formally, we get

    $\max_\lambda D(\lambda) \le \min_{y \in Y} E_{lb}(y) \le \min_{x \in X} E(x)$    (5)

where $X$ and $Y$ are discrete and continuous sets, respectively, with $X \subset Y$. $E_{lb}(y)$ is a lower bound function defined on set $Y$ and $D(\lambda)$ is a dual function. Both inequalities in (5) can be strict. If the left one is strict we say that there is a non-zero duality gap; if the right one is strict, there is a non-zero integrality gap. Note that if the middle problem in (5) is convex and one of the regularity conditions is satisfied (e.g. it is a linear program) there is no duality gap. The goal of all relaxation methods is to make the joint gap in (5) as small as possible. In this paper we will construct the dual directly from the discrete primal and will need the continuous primal only for the theoretical analysis.

The linear relaxation of a pairwise energy consists in converting the energy minimization problem to an equivalent integer linear program (ILP) and in relaxing it to a linear program (LP) [40], [43].

Wainwright et al. [43] have shown that minimizing energy (2) over the discrete domain is equivalent to minimizing the linear function

    $E_L(y_L) = \sum_{i \in V} \sum_{p \in \mathcal{P}} \theta_{ip} y_{ip} + \sum_{(i,j) \in \mathcal{E}} \sum_{p,q \in \mathcal{P}} \theta_{ij,pq} y_{ij,pq}$    (6)

w.r.t. unary $y_{ip} \ge 0$ and pairwise $y_{ij,pq} \ge 0$ variables over the so-called marginal polytope: a linear polytope with facets defined by all marginalization constraints. The marginal polytope has an exponential number of facets, thus minimizing (6) over it is intractable. A polytope defined using only the first-order marginalization constraints is called a local polytope:

    $\sum_{p \in \mathcal{P}} y_{ip} = 1, \quad \forall i \in V$,    (7)

    $\sum_{p \in \mathcal{P}} y_{ij,pq} = y_{jq}, \quad \sum_{q \in \mathcal{P}} y_{ij,pq} = y_{ip}, \quad \forall \{i,j\} \in \mathcal{E},\ \forall p, q \in \mathcal{P}$.    (8)

We refer to constraints (7) as consistency constraints and to constraints (8) as marginalization constraints.

Minimizing (6) over the local polytope (7)-(8) is often called Schlesinger's LP relaxation [40] (see e.g. [44] for a recent review). In this paper we refer to this relaxation as the standard LP relaxation.

4 SUBMODULAR RELAXATION

In this section we present our approach. We start with pairwise MRFs (sec. 4.1) [35], generalize to high-order potentials (sec. 4.2) and to linear global constraints (sec. 4.3). A version of SMR for nonsubmodular Lagrangians is discussed in the supplementary material.

4.1 Pairwise MRFs

The indicator notation allows us to rewrite the minimization of pairwise energy (2) as a constrained optimization problem:

    $\min_y \sum_{i \in V} \sum_{p \in \mathcal{P}} \theta_{ip} y_{ip} + \sum_{\{i,j\} \in \mathcal{E}} \sum_{p,q \in \mathcal{P}} \theta_{ij,pq} y_{ip} y_{jq}$,    (9)

    s.t. $y_{ip} \in \{0,1\}, \quad \forall i \in V,\ p \in \mathcal{P}$,    (10)

    $\sum_{p \in \mathcal{P}} y_{ip} = 1, \quad \forall i \in V$.    (11)

Constraints (11) (i.e. (4)) guarantee that each node $i$ has exactly one label assigned; constraints (10) fix variables $y_{ip}$ to be binary.

The target function (9) (denoted by $E_I(y)$) of binary variables $y$ (i.e. (10) holds) is a pseudo-Boolean polynomial of degree at most 2. This function is easy to minimize (ignoring the consistency constraints (11)) if it is submodular, which is equivalent to having $\theta_{ij,pq} \le 0$ for all edges $\{i,j\} \in \mathcal{E}$ and all labels $p, q \in \mathcal{P}$. The so-called associative potentials are examples of potentials that satisfy these constraints. A pairwise potential $\theta_{ij}$ is associative if it rewards connected nodes for taking the same label, i.e.

    $\theta_{ij}(p,q) = -C_{ij,p}[p=q] = \begin{cases} -C_{ij,p}, & p = q, \\ 0, & p \ne q, \end{cases}$    (12)

where $C_{ij,p} \ge 0$. Note that if the constants $C_{ij,p}$ are independent of the label index $p$ (i.e. $C_{ij,p} = C_{ij}$) then potentials (12) are equivalent to the Potts potentials.
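The equivalence between the multi-label energy (2) and its pseudo-Boolean indicator form (9) under constraints (10)-(11) can be checked numerically on a toy instance. The sketch below is an assumed example (the specific potentials are invented): it enumerates all labelings of a two-node model, builds the consistent indicators, and verifies that both forms agree.

```python
# Check E(x) of eq. (2) against the indicator form E_I(y) of eq. (9)
# on a tiny two-node model with an associative (Potts-like) pairwise term.
import itertools

V = [0, 1]
P = [0, 1, 2]
E_edges = [(0, 1)]
theta_unary = {(i, p): 0.1 * (i + p) for i in V for p in P}       # theta_ip
theta_pair = {(0, 1, p, q): (-1.0 if p == q else 0.0)             # theta_ij,pq <= 0,
              for p in P for q in P}                              # so E_I is submodular

def E(x):                                  # multi-label energy, eq. (2)
    return (sum(theta_unary[i, x[i]] for i in V)
            + sum(theta_pair[i, j, x[i], x[j]] for (i, j) in E_edges))

def E_I(y):                                # pseudo-Boolean form, eq. (9)
    return (sum(theta_unary[i, p] * y[i, p] for i in V for p in P)
            + sum(theta_pair[i, j, p, q] * y[i, p] * y[j, q]
                  for (i, j) in E_edges for p in P for q in P))

for x in itertools.product(P, repeat=len(V)):
    y = {(i, p): int(x[i] == p) for i in V for p in P}  # constraints (10)-(11) hold
    assert abs(E(x) - E_I(y)) < 1e-12
```

Dropping the consistency constraints leaves a degree-2 pseudo-Boolean function, which is exactly what the Lagrangian relaxation of the next section exploits.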
Osokin et al. [35] observe that when $E_I(y)$ is submodular, the consistency constraints (11) are the only obstacle that prevents problem (9)-(11) from being efficiently solvable, i.e. if we drop them or perform the Lagrangian relaxation of them we obtain an easily computable lower bound. In other words we can look at the following Lagrangian:

    $L(y,\lambda) = E_I(y) + \sum_{i \in V} \lambda_i \Big( \sum_{p \in \mathcal{P}} y_{ip} - 1 \Big)$.    (13)

As a function of binary variables $y$, Lagrangian $L(y,\lambda)$ differs from $E_I(y)$ only by the presence of unary potentials $\lambda_i y_{ip}$ and a constant, so Lagrangian $L(y,\lambda)$ is submodular whenever $E_I(y)$ is submodular.

Applying the Lagrangian relaxation to problem (9)-(11) we bound the solution from below:

    $\min_{y \in (10),(11)} E_I(y) = \min_{y \in (10)} \max_\lambda L(y,\lambda) \ge \max_\lambda \min_{y \in (10)} L(y,\lambda) = \max_\lambda D(\lambda)$    (14)

where $D(\lambda)$ is the Lagrangian dual function:

    $D(\lambda) = \min_{y \in (10)} \Big( E_I(y) + \sum_{p \in \mathcal{P}} \sum_{i \in V} \lambda_i y_{ip} \Big) - \sum_{i \in V} \lambda_i$.    (15)

The dual function $D(\lambda)$ is a minimum of a finite number of linear functions and thus is concave and piecewise-linear (non-smooth). A subgradient $g \in \mathbb{R}^{|V|}$ of the dual $D(\lambda)$ can be computed given the result of the inner pseudo-Boolean minimization:

    $g_i = \sum_{p \in \mathcal{P}} y^*_{ip} - 1, \quad i \in V$,    (16)

where $\{y^*_{ip}\}_{i \in V, p \in \mathcal{P}} = \operatorname{argmin}_{y \in (10)} \big( E_I(y) + \sum_{p \in \mathcal{P}} \sum_{i \in V} \lambda_i y_{ip} \big)$.

The concavity of $D(\lambda)$ together with the ability to compute the subgradient makes it possible to tackle the problem of finding the best possible lower bound in family (14) (i.e. the maximization of $D(\lambda)$) by convex optimization techniques. The optimization methods are discussed in sec. 5.

Note that if at some point $\lambda$ the subgradient given by eq. (16) equals zero, it immediately follows that the inequality in (14) becomes an equality, i.e. there is no integrality gap and the certificate of optimality is provided. This effect happens because there exists a labeling $y^*$ delivering the minimum of $L(y,\lambda)$ w.r.t. $y$ and satisfying the consistency constraints (11).

In the special case when all the pairwise potentials are associative (of form (12)), minimization of the Lagrangian w.r.t. $y$ decomposes into $|\mathcal{P}|$ subproblems, each containing only the variables indexed by a particular label $p$:

    $\min_{y \in (10)} L(y,\lambda) = \sum_{p \in \mathcal{P}} \min_{y_p \in \{0,1\}^{|V|}} \Phi_p(y_p,\lambda) - \sum_{i \in V} \lambda_i$

where $\Phi_p(y_p,\lambda) = \sum_{i \in V} (\theta_{ip} + \lambda_i) y_{ip} - \sum_{\{i,j\} \in \mathcal{E}} C_{ij,p}\, y_{ip} y_{jp}$. In this case the minimization tasks w.r.t. $y_p$ can be solved independently, e.g. in parallel.

4.2 High-order MRFs

Consider the problem of minimizing an energy consisting of high-order potentials (1). The unconstrained minimization of energy $E(x)$ over multi-label variables $x$ is equivalent to the minimization of function $E_I(y)$ (3) over variables $y$ under the consistency constraints (11) and the discretization constraints (10).

For now let us assume that all the high-order potentials are pattern-based and sparse [37], [29], i.e.

    $\theta_C(x_C) = \begin{cases} \hat{\theta}_{C,d}, & \text{if } x_C = d \in D_C, \\ 0, & \text{otherwise.} \end{cases}$    (17)

Here set $D_C$ contains some selected labelings of variables $x_C$ and all the values of the potential are non-positive, i.e. $\hat{\theta}_{C,d} \le 0$, $d \in D_C$. In the case of a sparse potential the cardinality of set $D_C$ is much smaller than the number of all possible labelings of variables $x_C$, i.e. $|D_C| \ll |\mathcal{P}^C|$.

In this setting we can use the identity $-\prod_{\ell \in C} y_\ell = \min_{z \in \{0,1\}} \big( (|C|-1) z - \sum_{\ell \in C} y_\ell z \big)$ to perform a reduction² of the high-order function $E_I(y)$ to a pairwise function in such a way that $\min_{y \in \{0,1\}^*} E_I(y) = \min_{z \in \{0,1\}^*,\, y \in \{0,1\}^*} E^*_I(y,z)$, where by $\{0,1\}^*$ we denote the Boolean cubes of appropriate sizes and

    $E^*_I(y,z) = -\sum_{C \in \mathcal{C}} \sum_{d \in D_C} \hat{\theta}_{C,d} \Big( (|C|-1)\, z_{C,d} - \sum_{\ell \in C} y_{\ell d_\ell} z_{C,d} \Big)$.    (18)

2. This transformation of the binary high-order function (3) is in fact equivalent to a special case of the Type-II binary transformation of [37], but in this form it was proposed much earlier (see e.g. [15] for a review).

Function $E^*_I(y,z)$ is pairwise and submodular w.r.t. binary variables $y$ and $z$ and thus, in the absence of additional constraints, can be efficiently minimized by graph cuts [28]. Adding and relaxing constraints (11) in the minimization of (3) gives us the following Lagrangian dual:

    $D(\lambda) = \min_{y,z \in \{0,1\}^*} L(y,z,\lambda) = \min_{y,z \in \{0,1\}^*} \Big( E^*_I(y,z) + \sum_{i \in V} \lambda_i \Big( \sum_{p \in \mathcal{P}} y_{ip} - 1 \Big) \Big)$.    (19)

For all $\lambda$ the Lagrangian $L(y,z,\lambda)$ is submodular w.r.t. $y$ and $z$, allowing us to efficiently evaluate $D$ at any point. Function $D(\lambda)$ is a lower bound on the global minimum of energy (1) and is concave but non-smooth as a function of $\lambda$. Analogously to (15), the best possible lower bound can be found using convex optimization techniques.

Robust sparse pattern-based potentials. We can generalize the SMR approach to the case of robust sparse high-order potentials using the ideas of Kohli et al. [23]
and Rother et al. [37]. Robust sparse high-order potentials, in addition to rewarding the specific labelings $d \in D_C$, also reward labelings where the deviation from $d \in D_C$ is small. We can formulate such a term for hyperedge $C$ in the following way:

    $\theta_C(x_C) = \sum_{d \in D_C} \min\Big(0,\ \hat{\theta}_{C,d} + \sum_{\ell \in C} w^C_\ell\, [x_\ell \ne d_\ell]\Big)$    (20)

or equivalently in terms of indicator variables $y$:

    $\theta_C(y_C) = \sum_{d \in D_C} \min\Big(0,\ \hat{\theta}_{C,d} + \sum_{\ell \in C} w^C_\ell\, (1 - y_{\ell d_\ell})\Big)$.    (21)

The inner min of (21) can be transformed to pairwise form via adding switching variables $z_{C,d}$ (analogously to (18)):

    $\min_{z_{C,d} \in \{0,1\}} \Big( \hat{\theta}_{C,d}\, z_{C,d} + \sum_{\ell \in C} w^C_\ell\, z_{C,d} (1 - y_{\ell d_\ell}) \Big)$.

This function is submodular w.r.t. variables $y$ and $z$ if the coefficients $w^C_\ell$ are nonnegative. Introducing Lagrange multipliers and maximizing the dual function is absolutely analogous to the case of non-robust sparse high-order potentials.

One particular instance of robust pattern-based potentials is the robust $P^n$-Potts model [23]. Here the cardinality of set $D_C$ equals the number of labels $|\mathcal{P}|$ and all the selected labelings correspond to the uniform labelings of variables $x_C$:

    $\theta_C(x_C) = \sum_{p \in \mathcal{P}} \min\Big(0,\ -\gamma + \sum_{\ell \in C} (\gamma/Q)\, [x_\ell \ne p]\Big)$.    (22)

Here $\gamma \ge 0$ and $Q \in (0, |C|]$ are the model parameters. $\gamma$ represents the reward that the energy gains if all the variables $x_C$ take equal values. $Q$ stands for the number of variables that are allowed to deviate from the selected label $p$ before the penalty is truncated.³

3. In the original formulation of the robust $P^n$-Potts model [23] the minimum operation was used instead of the outer sum in (22). The two schemes are equivalent if the cost of the deviation from the selected labelings is high enough, i.e. $Q < 0.5|C|$. The substitution of minimum with summation in a similar context was used by Rother et al. [37].

Positive values of $\hat{\theta}_{C,d}$. Now we describe a simple way to generalize our approach to the case of general high-order potentials, i.e. potentials of form (17) but with values $\hat{\theta}_{C,d}$ allowed to be positive.

First, note that constraints (10) and (11) imply that for each hyperedge $C \in \mathcal{C}$ the equality $\sum_{p \in \mathcal{P}^C} \prod_{\ell \in C} y_{\ell p_\ell} = 1$ holds. Because of this fact we can subtract the constant $M_C = \max_{d \in D_C} \hat{\theta}_C(d)$ from all values of potential $\theta_C$ simultaneously without changing the minimum of the problem, i.e.

    $\min_{y \in (10),(11)} E_I(y) = \min_{y \in (10),(11)} \sum_{C \in \mathcal{C}} \Big( M_C + \sum_{p \in \mathcal{P}^C} \tilde{\theta}_C(p) \prod_{\ell \in C} y_{\ell p_\ell} \Big)$

where $\tilde{\theta}_C(p) = \theta_C(p) - M_C$. We have all $\tilde{\theta}_C(p)$ non-positive and therefore can apply the SMR technique described at the beginning of section 4.2. Note that in this case the number of variables $z$ required to be added within the SMR approach can be exponential in the arity of hyperedge $C$. Therefore SMR is practical when either the arity of all hyperedges is small or when there exists a compact representation such as (17) or (20).

4.3 Linear global constraints

Linear constraints on the indicator variables form an important type of global constraints. The following constraints are special cases of this class:

    $\sum_{j \in V} y_{jp} = c$,  strict class size constraints,    (23)
    $\sum_{j \in V} y_{jp} \in [c_1, c_2]$,  soft class size constraints,
    $\sum_{j \in V} y_{jp} I_j = \sum_{j \in V} y_{jq} I_j$,  flux equalities,
    $\sum_{j \in V} y_{jp} I_j = \mu \sum_{i \in V} y_{ip}$,  mean,
    $\sum_{j \in V} y_{jp} (I_j - \mu)(I_j - \mu)^T = \sum_{i \in V} y_{ip}\, \Sigma$,  variance.

Here $I_j$ is the observable scalar or vector value associated with node $j$, e.g. the intensity or color of the corresponding image pixel. There have been a number of works that separately consider constraints on the total number of nodes that take a specific label (e.g. [31]).

The SMR approach can easily handle general linear constraints such as

    $\sum_{i \in V} \sum_{p \in \mathcal{P}} w^m_{ip} y_{ip} = c^m, \quad m = 1, \ldots, M$,    (24)

    $\sum_{i \in V} \sum_{p \in \mathcal{P}} v^k_{ip} y_{ip} \le d^k, \quad k = 1, \ldots, K$.    (25)

We introduce additional Lagrange multipliers $\xi = \{\xi_m\}_{m=1,\ldots,M}$, $\pi = \{\pi_k\}_{k=1,\ldots,K}$ for constraints (24), (25) and get the Lagrangian

    $L(y,\lambda,\xi,\pi) = E_I(y) + \sum_{i \in V} \lambda_i \Big( \sum_{p \in \mathcal{P}} y_{ip} - 1 \Big) + \sum_{m=1}^M \xi_m \Big( \sum_{i \in V} \sum_{p \in \mathcal{P}} w^m_{ip} y_{ip} - c^m \Big) + \sum_{k=1}^K \pi_k \Big( \sum_{i \in V} \sum_{p \in \mathcal{P}} v^k_{ip} y_{ip} - d^k \Big)$

where the inequality multipliers $\pi_k$ are constrained to be nonnegative. The Lagrangian is submodular w.r.t. the indicator variables $y$ and thus the dual function

    $D(\lambda,\xi,\pi) = \min_{y \in (10)} L(y,\lambda,\xi,\pi)$    (26)

can be efficiently evaluated. The dual function is piecewise linear and concave w.r.t. all variables and thus can be treated similarly to the dual functions introduced earlier.
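As a concrete illustration of eq. (22), the robust $P^n$-Potts value of a clique labeling can be computed directly from its definition. The snippet below is a hypothetical toy (the function name and the numbers are ours, not the paper's): for each label $p$ it counts deviations from the uniform labeling and accumulates the truncated reward.

```python
# Evaluate the robust P^n-Potts potential of eq. (22):
# theta_C(x_C) = sum_p min(0, -gamma + (gamma / Q) * #{l in C : x_l != p}).
def robust_pn_potts(x_C, labels, gamma, Q):
    total = 0.0
    for p in labels:
        deviations = sum(1 for x_l in x_C if x_l != p)
        total += min(0.0, -gamma + (gamma / Q) * deviations)
    return total

clique = [1, 1, 1, 1]     # uniform labeling -> full reward -gamma
assert robust_pn_potts(clique, labels=[0, 1], gamma=2.0, Q=2) == -2.0
almost = [1, 1, 1, 0]     # one of Q = 2 allowed deviations -> half the reward
assert robust_pn_potts(almost, labels=[0, 1], gamma=2.0, Q=2) == -1.0
```

With two or more deviations (for $Q = 2$) each inner term is clipped at zero, so the potential vanishes, matching the truncation behavior described in the text.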
5 ALGORITHMS

In sec. 5.1 we discuss the convex optimization methods that can be used to maximize the dual functions $D(\lambda)$ constructed within the SMR approach. In sec. 5.2 we develop an alternative approach based on coordinate-wise ascent.

5.1 Convex optimization

We consider several optimization methods that might be used to maximize the SMR dual functions:
1) subgradient ascent methods [30];
2) bundle methods [17];
3) non-smooth version of L-BFGS [33];
4) proximal methods [48];
5) smoothing-based techniques [39], [48];
6) stochastic subgradient methods [30].

First, we begin with the methods that are not suited to SMR: proximal, smoothing-based and stochastic. The first two groups of methods are not directly applicable to SMR because they require not only the black-box oracle that evaluates the function and computes the subgradient (in SMR this is done by solving the max-flow problem) but also additional computations: proximal methods require the prox-operator; log-sum-exp smoothing [39] requires solving the sum-product problem instead of max-sum; quadratic smoothing [48] requires explicit computation of convex conjugates.⁴ We are not aware of possibilities of computing any of those in the case of the SMR approach.

4. Zach et al. [48] directly construct the smooth approximation $f_\varepsilon = (f^* + \varepsilon \|\cdot\|_2^2 / 2)^*$ where $\varepsilon > 0$ is a smoothing parameter and $(\cdot)^*$ is the convex conjugate. In their case function $f$ is a sum of summands of simple form for which analytical computation of convex conjugates is possible. In our case each summand is a min-cut LP, and its conjugate is a max-flow LP. The addition of quadratic terms to $f^*$ makes $f_\varepsilon$ quadratic instead of a min-cut LP and thus efficient max-flow algorithms are no longer applicable.

Stochastic techniques have proven themselves to be highly efficient for large-scale machine learning problems. Bousquet and Bottou [6, sec. 13.3.2] show that stochastic methods perform well within the estimation-optimization tradeoff but are not as good as optimization methods for the empirical risk (i.e. the optimized function). In the case of SMR we do not face the machine learning tradeoffs and thus the stochastic subgradient is not suitable.

Further in this section we give a short review of methods of the three groups (subgradient, bundle, L-BFGS) that require only a black-box oracle that for an arbitrary value of dual variables $\lambda$ computes the function value $D(\lambda)$ and one subgradient $\partial D(\lambda)$.

Subgradient methods at iteration $n$ perform an update in the direction of the current subgradient

    $\lambda^{n+1} = \lambda^n + \alpha_n \partial D(\lambda^n)$

with a step size $\alpha_n$. Komodakis et al. [30] and Kappes et al. [17] suggest the following adaptive scheme of selecting the step size:

    $\alpha_n = \gamma\, \frac{A_n - D(\lambda^n)}{\|\partial D(\lambda^n)\|_2^2}$    (27)

where $A_n$ is the current estimate of the optimum (the best value of the primal solution found so far) and $\gamma$ is a parameter. Fig. 1 summarizes the overall method.

Input: Lagrangian $L(y,\lambda)$ (13), initialization $\lambda^0$;
Output: $\lambda^* = \operatorname{argmax}_\lambda \min_y L(y,\lambda)$;
1: $\lambda := \lambda^0$;
2: repeat
3:   $y^* := \operatorname{argmin}_y L(y,\lambda)$;  // run max-flow
4:   $D(\lambda) = L(y^*,\lambda)$;  // evaluate the function
5:   for all $i \in V$ do
6:     $g_i := \sum_{p \in \mathcal{P}} y^*_{ip} - 1$;  // compute the subgradient
7:   if $g = 0$ then
8:     return $\lambda$;  // the certificate of optimality
9:   else
10:    estimate the primal;  // see sec. 5.3
11:    choose stepsize $\alpha$;  // e.g. using rule (27)
12:    $\lambda := \lambda + \alpha g$;  // make the step
13: until convergence criteria are met
14: return $\lambda$;

Fig. 1. Subgradient ascent algorithm to maximize the SMR dual (15).

Bundle methods were analyzed by Kappes et al. [17] in the context of MRF energy minimization. A bundle $B$ is a collection of points $\lambda'$, values of the dual function $D(\lambda')$, and subgradients $\partial D(\lambda')$. The next point is computed by the following rule:

    $\lambda^{n+1} = \operatorname{argmax}_\lambda \Big( \hat{D}(\lambda) - \frac{w_n}{2} \|\lambda - \bar{\lambda}\|_2^2 \Big)$    (28)

where $\hat{D}(\lambda)$ is the upper bound on the dual function:

    $\hat{D}(\lambda) = \min_{(\lambda', D(\lambda'), \partial D(\lambda')) \in B} \big( D(\lambda') + \langle \partial D(\lambda'),\, \lambda - \lambda' \rangle \big)$.

Parameter $w_n$ is intended to keep $\lambda^{n+1}$ close to the current solution estimate $\bar{\lambda}$. If the current step does not give a significant improvement then the null step is performed: bundle $B$ is updated with $(\lambda^{n+1}, D(\lambda^{n+1}), \partial D(\lambda^{n+1}))$. Otherwise the serious step is performed: in addition to updating the bundle, the current estimate $\bar{\lambda}$ is replaced by $\lambda^{n+1}$. Kappes et al. [17] suggest using the relative improvement of $D(\lambda^{n+1})$ and $\hat{D}(\lambda^{n+1})$ over $D(\bar{\lambda})$ (threshold $m_L$) to choose the type of the step. The inverse step size $w_n$ can be chosen adaptively if the previous step was serious:

    $w_n = P_{[w_{\min}, w_{\max}]}\bigg[ \Big( \gamma\, \frac{A_n - \max_{k=1,\ldots,n} D(\lambda^k)}{\|\partial D(\lambda^n)\|_2} \Big)^{-1} \bigg]$    (29)

where $P_{[w_{\min}, w_{\max}]}$ is the projection onto the segment. In the case of the null step $w_n = w_{n-1}$. In summary, the bundle method has the following parameters: $\gamma$, $w_{\min}$, $w_{\max}$, $m_L$, and the maximum size of the bundle –
Input: Lagrangian L(y,λ) (13), initialization λ0; a new value λnew such that D(λnew) ≥ D(λold). The
Output: coordinate-wise maximum of D(λ);
new value will differ from the old value only in one
1: λ:=λ0; coordinate λ , i.e. λnew =λold, i6=j.
j i i
2: repeat
Considertheunarymin-marginalsofLagrangian(13)
3: converged:=true;
4: for all j ∈V do MMjp,kL y,λold = min L(y,λold)
5: for all p∈P do y∈(50),yjp=k
(cid:0) (cid:1)
6: compute MMjp,0L(y,λ), MMjp,1L(y,λ); where k ∈ {0,1}, p ∈ P. The min-marginals
7: δpj :=MMjp,0 L(y,λ)−MMjp,1 L(y,λ); can be computed efficiently using the dynamic
8: find δj and δj ; graphcut framework [21]. Denote the differences of
(1) (2)
9: if δj and δj of the same sign then min-marginals for labels 0 and 1 with δpj: δpj =
1101:: λcoj(n1:=v)eλrgje+d(0:2=.)5(fδa(jl1s)e+; δ(j2)); MtheMjmp,a0xLim(cid:0)yu,mλoldo(cid:1)f −δpjM, δM(j2j)p,1–L(tyh,eλonlde)x.tLmetaxδi(jm1)umbe;
12: until converged6=true; p(1) and p(2) – the corresponding indices: p(1) =
j j j
13: return λ; argmax δj, δj = δj , p(2) = argmax δj,
p∈P p (1) p(j1) j p∈P\{p(j1)} p
Fig. 2. Coordinate ascent algorithm for maximizing δj = δj . Let us construct the new value λnew of
the dual D(λ) (15) in case of pairwise associative (2) p(j2) j
dual variable λ with the following rule:
potentials. j
λnew =λold+∆ (30)
j j j
bs. Note thatthe update(28) implicitly estimatesboth where ∆ is a number from segment δj ,δj .
the direction gn and the step size αn. j (2) (1)
h i
Another algorithm analyzed by Kappes et al. [17] Theorem 1. Rule (30) when applied to a pairwise energy
is the aggregated bundle method proposed by Kiwiel with associative pairwise potentials delivers the maximum
[19] with the same step-size rule (29). The method of function D(λ) w.r.t. variable λj.
has the following parameters: γ, w , w , and a
min max The proof relies on the intuition that rule (30) di-
threshold to choose null or serious step – m . Please
r rectly assigns the min-marginals MM L(y,λ), p∈
jp,k
see works [17], [19] for details. We have also tried
P, k∈{0,1} such values that only one subproblem p
a combined strategy: LMBM method5[13] where we
has y equal to 1. We provide the full proof in the
j,p
used only one non-default parameter: the maximum size of a bundle b_s.

L-BFGS methods choose the direction of the update using the history of function evaluations: S_n ∂D(λ_n), where S_n is a negative semi-definite estimate of the inverse Hessian at λ_n. The step size α_n is typically chosen via approximate one-dimensional optimization, a.k.a. line search. Lewis and Overton [33] proposed a specialized version of L-BFGS for non-smooth functions implemented in the HANSO library.6 In this code we varied only one parameter: the maximum rank of the Hessian estimator h_r.

5.2 Coordinate-wise optimization

An alternative approach to maximizing the dual function D(λ) consists in selecting a group of variables and finding the maximum w.r.t. them. This strategy was used in the max-sum diffusion [44] and MPLP [12] algorithms. In this section we derive an analogous algorithm for SMR in the case of pairwise associative potentials. The algorithm is based on the concept of min-marginal averaging used in [43], [25] to optimize the TRW dual function.

Consider some value λ_old of the dual variables λ and a selected node j ∈ V. We will show how to construct the analytical update (30) w.r.t. the dual variables of node j (see the supplementary material for the derivation).

Theorem 1 allows us to construct the coordinate-ascent algorithm (fig. 2). This result cannot be extended to the cases of general pairwise or high-order potentials: we are not aware of analytical updates like (30) for these cases. The update w.r.t. a single variable can be computed numerically, but the resulting algorithm would be very slow.

5.3 Getting a primal solution

Obtaining a primal solution given some value of the dual variables is an important practical issue related to the Lagrangian relaxation framework. There are two different tasks: obtaining a feasible primal point of the relaxation and obtaining the final discrete solution. The first task was studied by Savchynskyy et al. [39] and Zach et al. [48] for algorithms based on smoothing of the dual function. Recently, Savchynskyy and Schmidt [38] proposed a general approach that can be applied to SMR whenever a subgradient algorithm is used to maximize the dual. The second task is usually resolved using heuristics. Osokin and Vetrov [34] propose a heuristic method to obtain the discrete primal solution within the SMR approach. Their method is based on the idea of block-coordinate descent w.r.t. the primal variables and is similar to the analogous heuristics in TRW-S [25].

5. http://napsu.karmitsa.fi/lmbm/lmbmu/lmbm-mex.tar.gz
6. http://www.cs.nyu.edu/overton/software/hanso/
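The dual function D(λ) maximized by the methods of sec. 5.1 is concave and piecewise linear: a pointwise minimum of linear functions of λ, one per binary labeling. The toy sketch below (made-up coefficients, not an SMR instance) shows the basic subgradient-ascent loop with a diminishing step size γ/√n used by the subgradient variants; in SMR, each piece's offset would be the submodular energy of a labeling y and its slope the consistency-constraint residual returned by the max-flow oracle.

```python
# Toy sketch of subgradient ascent on a piecewise-linear concave "dual"
# D(lam) = min_i (C[i] + <G[i], lam>). The pieces below are made up; in SMR
# each piece corresponds to a binary labeling y returned by the graph-cut
# oracle.
import math

C = [0.0, 1.0, 0.5, 0.5, 0.2, 0.8]
G = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]

def dual_oracle(lam):
    """Return D(lam) and one subgradient (the slope of an active piece)."""
    vals = [c + gx * lam[0] + gy * lam[1] for c, (gx, gy) in zip(C, G)]
    i = min(range(len(vals)), key=vals.__getitem__)
    return vals[i], G[i]

lam, best = (0.0, 0.0), -math.inf
for n in range(1, 2001):
    val, (gx, gy) = dual_oracle(lam)
    best = max(best, val)              # best lower bound found so far
    step = 0.5 / math.sqrt(n)          # diminishing step size gamma / sqrt(n)
    lam = (lam[0] + step * gx, lam[1] + step * gy)
```

On this instance the maximum of D is 13/30 ≈ 0.433, and the best value found comes within about 0.01 of it; near the optimum the iterates oscillate rather than converge, matching the behaviour of non-smooth first-order methods described above.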
6 THEORETICAL ANALYSIS

In this section we explore some properties of the maximum points of the dual functions and express their maximum values explicitly in the form of LPs. The latter allows us to reason about the tightness of the obtained lower bounds and to provide analytical comparisons against other approaches.

Fig. 3. The three possible cases at the maximum of the SMR dual function. The optimal energy value is shown by the horizontal dotted line. Solid lines show faces of the hypograph of the dual function D(λ).

6.1 Definitions and theoretical properties

Denote the possible optimal labelings of the binary variable y_ip associated with node i and label p given Lagrange multipliers λ by

Z_ip(λ) = { z | ∃ y^λ ∈ Argmin_{y∈(50)} L(y,λ) : y^λ_ip = z },

where L(y,λ) is the Lagrangian constructed within the SMR approach (13).

Definition 1. Point λ satisfies the strong agreement condition if for all nodes i, for one label p the set Z_ip(λ) equals {1} and for the other labels it equals {0}:

∀i ∈ V ∃! p ∈ P : Z_ip(λ) = {1}, ∀q ≠ p : Z_iq(λ) = {0}.

This definition implies that for a strong agreement point λ there exists a unique binary labeling y consistent with all sets Z_ip(λ) (y_ip ∈ Z_ip(λ)) and with the consistency constraints (51). Consequently, such y defines a labeling x ∈ P^V that delivers the global minimum of the initial energy (1).

Definition 2. Point λ satisfies the weak agreement condition if for all nodes i ∈ V both statements hold true:
1) ∃p ∈ P : 1 ∈ Z_ip(λ).
2) ∀p ∈ P : Z_ip(λ) = {1} ⇒ ∀q ≠ p : 0 ∈ Z_iq(λ).

This definition means that for a weak agreement point λ there exists a binary labeling y consistent with all sets Z_ip(λ) and the consistency constraints (51).

It is easy to see that the strong agreement condition implies the weak agreement condition.

Denote a maximum point of the dual function D(λ) (15) by λ* and an assignment of the primal variables that minimizes the Lagrangian L(y,λ*) by y*. We prove the following property of λ* in the supplementary material.

Theorem 2. A maximum point λ* of the dual function D(λ) satisfies the weak agreement condition.

In general, three situations are possible at the global maximum of the dual function of SMR (see fig. 3 for illustrations):
1) The strong agreement condition holds; a binary labeling y* satisfying the consistency constraints (51) (and thus a labeling x* as well) can be easily reconstructed; there is no integrality gap.
2) Some (maybe all) sets Z_ip(λ*) consist of multiple elements; an x* consistent with (51) is non-trivial to reconstruct, but there is still no integrality gap.
3) Some sets Z_ip(λ*) consist of multiple elements; a labeling y* that is both consistent with (51) and a minimum of the Lagrangian does not exist; the integrality gap is non-zero.

In situations 1 and 2 there exists a binary labeling y* that simultaneously satisfies the consistency constraints (51) and delivers the minimum of the Lagrangian L(y,λ*). Labeling y* defines the horizontal hyperplane that contains the point (λ*, D(λ*)).

In situation 1 labeling y* is easy to find because the subgradient defined by (16) equals zero. Situation 2 can be trickier but in certain cases might be resolved exactly. Some examples are given in [35].

In practice the convex optimization methods described in sec. 5.1 typically converge to a point satisfying the strong agreement condition rather quickly when there is no integrality gap. In the case of a non-zero integrality gap all the methods start to oscillate around the optimum and do not provide a point satisfying even the weak agreement condition. The coordinate-wise algorithm (fig. 2) often converges to points with low values of the dual function (there exist multiple coordinate-wise maxima of the dual function) but allows us to obtain a point that satisfies the weak agreement condition. This result is summarized in theorem 3, proven in the supplementary material.

Theorem 3. The coordinate ascent algorithm (fig. 2), when applied to a pairwise energy with associative pairwise potentials, returns a point that satisfies the weak agreement condition.

6.2 Tightness of the lower bounds

6.2.1 Pairwise associative MRFs

Osokin et al. [35] show that in the case of pairwise associative MRFs maximizing the Lagrangian dual (15) is equivalent to the standard LP relaxation of the pairwise energy (2); thus the SMR lower bound equals the ones of competitive methods such as [30], [17], [39] and in general is tighter than the lower bounds of TRW-S [25] and MPLP [12] (TRW-S and MPLP can get stuck at bad coordinate-wise maxima). In addition, Osokin et al. [35] show that in the case of pairwise associative MRFs the standard LP relaxation is equivalent to the
Kleinberg-Tardos (KT) relaxation of a uniform metric labeling problem [20]. In this section we reformulate the most relevant result for the sake of completeness. The proof will be given later as a corollary of our more general result for high-order MRFs (theorems 5 and 6).

Theorem 4. In the case of pairwise MRFs with associative pairwise potentials (2), (12), the maximum of the Lagrangian dual (15) equals the minimum value of the standard LP relaxation (6), (7), (8) of the discrete energy minimization problem.

6.2.2 High-order MRFs

In this section we state the general result, compare it to known LP relaxations of high-order terms, and further elaborate on one important special case.

The following theorem explicitly defines the LP relaxation of energy (1) that is solved by SMR.

Theorem 5. In the case of a high-order MRF with sparse pattern-based potentials (17), the maximum of the SMR dual function D(λ) (19) equals the minimum value of the following linear program:

min_y Σ_{C∈C} Σ_{d∈D_C} θ̂_{C,d} y_{C,d}   (31)
s.t. y_ip, y_{C,d} ∈ [0,1], ∀i∈V, ∀p∈P, ∀C∈C, ∀d∈D_C,   (32)
y_{C,d} ≤ y_{ℓd_ℓ}, ∀C∈C, ∀d∈D_C, ∀ℓ∈C,   (33)
Σ_{p∈P} y_ip = 1, ∀i∈V.   (34)

Proof: We start the proof with two lemmas.

Lemma 1. Let y_1, ..., y_K be real numbers belonging to the segment [0,1], K > 1. Consider the following linear program:

min_{z, z_1,...,z_K ∈ [0,1]} z(K−1) − Σ_{k=1}^K z_k   (35)
s.t. z_k ≤ z, z_k ≤ y_k.   (36)

The point z = z_1 = ··· = z_K = min_{k=1,...,K} y_k delivers the minimum of LP (35)-(36).

Lemma 1 is proven in the supplementary material.

Lemma 2. Consider the convex optimization problem

min_{y∈Y} f(y),   (37)
s.t. Ay = b,   (38)

where Y ⊆ R^n is a convex set, A is a real-valued matrix of size m×n, b ∈ R^m, and f: R^n → R is a convex function. Let the set Y contain a non-empty interior and the solution of problem (37)-(38) be finite. Then the value of the solution of (37)-(38) equals max_{λ∈R^m} min_{y∈Y} L(y,λ), where L(y,λ) = f(y) + λ^T(Ay − b).

Lemma 2 is a direct corollary of the strong duality theorem with the linearity constraint qualification [3, Th. 6.4.2].

Let us finish the proof of theorem 5. All values of the potentials are non-positive, θ̂_{C,d} ≤ 0, so lemma 1 implies that the LP (31)-(34) is equivalent to the following LP:

min_{y,z} − Σ_{C∈C} Σ_{d∈D_C} θ̂_{C,d} ( (|C|−1) z_{C,d} − Σ_{ℓ∈C} z^ℓ_{C,d} )   (39)
s.t. z^ℓ_{C,d} ∈ [0,1], ∀C∈C, ∀d∈D_C, ∀ℓ∈C,   (40)
y_ip, z_{C,d} ∈ [0,1], ∀i∈V, ∀p∈P, ∀C∈C, ∀d∈D_C,
z^ℓ_{C,d} ≤ z_{C,d}, z^ℓ_{C,d} ≤ y_{ℓd_ℓ}, ∀C∈C, ∀d∈D_C, ∀ℓ∈C,   (41)
Σ_{p∈P} y_ip = 1, ∀i∈V.   (42)

Denote the target function (39) by Q(y,z). Let

R(λ) = min_{y,z∈(40)−(41)} ( Q(y,z) + Σ_{i∈V} λ_i ( Σ_{p∈P} y_ip − 1 ) ).

Task (39), (40)-(41) is equivalent to the standard LP relaxation of function (18) of binary variables y and z (see lemma 3 in the supplementary material). As θ̂_{C,d} ≤ 0, function (18) is submodular and thus its LP relaxation is tight [27, Th. 3], which means that R(λ) = D(λ) for an arbitrary λ. The function f(y) = min_{z∈(40)−(41)} Q(y,z), the set Y defined by y_ip ∈ [0,1], and the matrix A and vector b defined by (42) satisfy the conditions of lemma 2, which implies that the value of the solution of LP (39)-(42) equals R(λ*), where λ* = argmax_λ R(λ). The proof is finished by the equality R(λ*) = D(λ*).

Corollary 1. For the case of pairwise energy (2) with θ_{ij,pq} ≤ 0, SMR returns the lower bound equal to the solution of the following LP relaxation:

min_y Σ_{i∈V} Σ_{p∈P} θ_ip y_ip + Σ_{{i,j}∈E} Σ_{p,q∈P} θ_{ij,pq} y_{ij,pq}   (43)
s.t. y_ip ∈ [0,1], ∀i∈V, ∀p∈P,
y_{ij,pq} ∈ [0,1], ∀{i,j}∈E, ∀p,q∈P,
y_{ij,pq} ≤ y_ip, y_{ij,pq} ≤ y_jq, ∀{i,j}∈E, ∀p,q∈P,   (44)
Σ_{p∈P} y_ip = 1, ∀i∈V.   (45)

Permuted Pn-Potts. Komodakis and Paragios [29] formulate a generally tighter relaxation than (31)-(34) by adding constraints responsible for the marginalization of the high-order potentials w.r.t. all but one variable:

Σ_{p∈P^C : p_ℓ = p_0} y_{C,p} = y_{ℓp_0}, ∀C∈C, ∀p_0∈P, ∀ℓ∈C.   (46)

In this paragraph we show that in certain cases the relaxation of [29] is equivalent to (31)-(34), i.e. constraints (46) are redundant.

Definition 3. Potential θ_C is called permuted Pn-Potts if
∀C ∈ C, ∀d′, d″ ∈ D_C, ∀i ∈ C : d′ ≠ d″ ⇒ d′_i ≠ d″_i.

In permuted Pn-Potts potentials all preferable configurations differ from each other in all variables. The Pn-Potts potential described by Kohli et al. [22] is a special case.

Theorem 6. If all higher-order potentials are permuted Pn-Potts, then the maximum of the dual function (19) is equal to the value of the solution of LP (31)-(34), (46).

Proof: First, all values θ̂_{C,d} are non-positive, so the variables y_{C,p} take maximal possible values. This implies that the equality in (46) can be substituted with an inequality. Second, for a permuted Pn-Potts potential θ_C, for each ℓ ∈ C and for each p_0 ∈ P there exists no more than one labeling d ∈ D_C such that d_ℓ = p_0, which means that all the sums in (46) can be substituted by the single summands. These two observations prove that (46) is equivalent to (33).

The LP result for the associative pairwise MRFs (theorem 4) immediately follows from theorem 6 and corollary 1 because all associative pairwise potentials (12) are permuted Pn-Potts and condition (46) reduces to the marginalization constraints (8).

6.2.3 Global linear constraints

Theorem 7. The maximum of the dual function D(λ,ξ,π) (26) (constructed for the minimization of pairwise energy (2) under global linear constraints (24) and (25)) under constraints π_k ≥ 0 is equal to the value of the solution of LP relaxation (67)-(69) with the addition of the global linear constraints (24) and (25).

The proof is similar to the proof of theorem 5 but requires including extra terms in the Lagrangian. Theorem 7 generalizes to the case of high-order potentials.

7 EXPERIMENTAL EVALUATION

7.1 Pairwise associative MRFs

In this section we evaluate the performance of the different optimization methods discussed in sec. 5.1 and compare the SMR approach against the dual decomposition on trees used in [30], [17] (DD TRW) and the TRW-S method [25].

Dataset. To perform our tests we use the MRF instances created by Alahari et al. [1].7 We work with instances of two types: stereo (four MRFs) and semantic image segmentation (five MRFs). All of them are pairwise MRFs with a 4-connectivity system and Potts pairwise potentials. The instances of these two groups are of significantly different sizes, and thus we report the results separately.

Parameter choice. For each group and for each optimization method we use grid search to find the optimal method parameters. As a criterion we use the maximum value of the lower bound obtained within 25 seconds for segmentation and 200 seconds for stereo. For the bundle method we observe that larger bundles lead to better results but increase the computation time. Value b_s = 100 is a compromise for both the stereo and segmentation datasets. The same reasoning gives us b_s = 10 for LMBM and h_r = 10 for L-BFGS. Parameter w_min for the (aggregated) bundle method did not affect the performance, so we used 10^−10 as in [17]. The remaining parameters are presented in table 1.

TABLE 1
The optimal parameter values chosen by the grid search in sec. 7.1.

Method | Segmentation | Stereo
DD TRW, subgradient | γ=0.3 | γ=0.1
DD TRW, bundle | γ=0.01, mL=0.05, wmax=1000 | γ=0.01, mL=0.05, wmax=2000
DD TRW, aggr. bundle | γ=0.02, wmax=100, mr=0.0005 | γ=0.01, mr=0.002, wmax=500
SMR, subgradient | γ=0.7 | γ=0.3
SMR, bundle | γ=0.1, mL=0.2, wmax=100 | γ=0.01, mL=0.1, wmax=100
SMR, aggr. bundle | γ=0.02, mr=0.001, wmax=500 | γ=0.02, mr=0.001, wmax=1000

Implementation details. For L-BFGS and LMBM we use the original authors' implementations. All the other optimization routines are implemented in MATLAB. Both the SMR and DD TRW oracles are implemented in C++: for SMR we used the dynamic version of the BK max-flow algorithm [7], [21]; for DD TRW we used our own implementation of dynamic programming sped up specifically for the Potts model. To estimate the primal solution at the current point we use 5 iterations of ICM (Iterated Conditional Modes) [4] everywhere. We normalize all the energies in such a way that the energy of the α-expansion result corresponds to 100 and the sum of the minimums of the unary potentials to 0. All our code is available online.8

Results. Fig. 4 shows the plots of the averaged lower bounds vs. time for the stereo and segmentation instances. See fig. 7 in the supplementary material for more detailed plots. We observe that SMR outperforms DD TRW for almost all optimization methods, and for the segmentation instances it outperforms TRW-S as well. When α-expansion was not able to find the global minimum of the energy (1 segmentation instance and all stereo instances), all the tested methods obtained primal solutions with lower energies than the energies obtained by α-expansion.

7.2 High-order potentials

In this section we evaluate the performance of the SMR approach on the high-order robust Pn-Potts

7. http://www.di.ens.fr/~alahari/data/pami10data.tgz
8. https://github.com/aosokin/submodular-relaxation
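The auxiliary-variable construction used in the proof of theorem 5 can be checked numerically. The script below (an illustration with a made-up pattern weight, not the paper's code) verifies that, for fixed binary y and θ̂ ≤ 0, minimizing one term of (39) over auxiliary variables satisfying the constraints of (40)-(41) reproduces the pattern-based potential θ̂ · Π_ℓ y_{ℓd_ℓ}, i.e. the reward θ̂ is received exactly when the whole pattern is matched.

```python
# Brute-force check of one term of LP (39) under constraints (40)-(41):
# for fixed binary y the LP over (z, z_1..z_K) attains its optimum at a
# binary vertex, so enumerating binary values of the auxiliary variables
# suffices here.
from itertools import product

def pattern_term(y, theta_hat):
    """min over z, z_l in {0,1} with z_l <= z and z_l <= y_l of
    -theta_hat * ((K - 1) * z - sum_l z_l), where K = len(y)."""
    K = len(y)
    best = float("inf")
    for z in (0, 1):
        for zs in product((0, 1), repeat=K):
            if all(zl <= z and zl <= yl for zl, yl in zip(zs, y)):
                best = min(best, -theta_hat * ((K - 1) * z - sum(zs)))
    return best

theta_hat = -3.0                       # made-up pattern weight, must be <= 0
for y in product((0, 1), repeat=4):
    expected = theta_hat if all(y) else 0.0
    assert pattern_term(y, theta_hat) == expected
```

This inner minimization is the mechanism lemma 1 formalizes: it turns the linear objective (39) into the pattern indicator while keeping the whole function submodular in y.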
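The primal-estimation step mentioned in the implementation details of sec. 7.1 (a few iterations of ICM) admits a very short sketch. The graph, unary costs, and Potts weight below are illustrative stand-ins, not the benchmark instances.

```python
# Minimal sketch of ICM (Iterated Conditional Modes) for a Potts pairwise
# model: repeatedly set each node's label to the minimizer of its local
# energy while the other labels are held fixed.
def icm(unary, edges, potts_w, x, n_iters=5):
    """unary[i][p]: unary cost of label p at node i; edges: list of (i, j)
    pairs with pairwise cost potts_w * [x_i != x_j]; x: initial labeling
    (e.g. rounded from the current dual point)."""
    nbrs = {i: [] for i in range(len(unary))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(n_iters):
        for i in range(len(unary)):
            # Coordinate-wise minimization over the label of node i.
            x[i] = min(range(len(unary[i])),
                       key=lambda p: unary[i][p]
                       + sum(potts_w for j in nbrs[i] if x[j] != p))
    return x
```

For example, with unaries [[0, 1], [1, 0], [0.6, 0.5]], chain edges [(0, 1), (1, 2)], and Potts weight 0.2, starting from [1, 1, 1], ICM reaches the fixed point [0, 1, 1] after one sweep.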