Polynomial Representation for the Expected Length of Minimal Spanning Trees∗

Jared Nishikawa† Peter T. Otto† Colin Starr†

January 16, 2015

arXiv:1501.03758v1 [math.PR] 15 Jan 2015

Abstract
In this paper, we investigate the polynomial integrand of an integral formula that yields the expected length of the minimal spanning tree of a graph whose edge lengths are uniformly distributed over the interval [0,1]. In particular, we derive a general formula for the coefficients of the polynomial and apply it to express the first few coefficients in terms of the structure of the underlying graph; e.g., the number of vertices, edges, and cycles.
1 Introduction

In 2002, J.M. Steele [7] derived an integral formula for the expected length of a minimal spanning tree (MST) of a graph with independent edge lengths uniformly distributed over the interval [0,1]. While the formula gives an exact value of the mean length of the MST in terms of the Tutte polynomial of the graph, it yields (at least to us) little intuition of how the MST relates to the structure of the underlying graph.

This provided the motivation for the research project investigated by the Willamette University group of the Willamette Valley REU-RET Consortium for Mathematics Research in the summer of 2008. The authors of this paper were members of that research group and this paper covers the work that began that summer.

The main result of this paper is a general formula for the coefficients of the polynomial integrand in Steele’s formula for the expected length of the MST of a simple, finite, connected graph. For the first seven coefficients of the polynomial, we prove a surprising result that expresses the coefficients in terms of features of the underlying graph; e.g., the number of vertices, edges, and cycles.
The remainder of this paper is organized as follows: In Section 2, we state Steele’s formula, which is written in terms of the Tutte polynomial of the underlying graph. In Section 3, we investigate the integrand of the formula and prove that it is a polynomial, expressing the coefficients in terms of characteristics of the graph. We illustrate our results with an example in Section 4 and examine the particular case of the complete graph in Section 5.

∗ This research was supported by NSF grant #0649068 funding the WiVaM REU-RET in Mathematics.
† Willamette University
Throughout this paper, “graph” means a finite simple graph. We adopt the usual notations: $V(G)$ and $E(G)$ are the vertex and edge sets of $G$, respectively. The rank of $G$ is denoted by $r(G)$, and $r(G) = |V(G)| - k(G)$, where $k(G)$ denotes the number of connected components of $G$.
2 Steele’s Formula

Let $G$ be a graph. We assign independent random lengths $\xi_e$ with uniform distribution over the interval [0,1] to the edges $e \in E(G)$. The total length of a minimal spanning tree (MST) of the graph $G$ is denoted by
$$L(G) = \sum_{e \in E(\mathrm{MST}(G))} \xi_e.$$
We are interested in the expected value of $L(G)$, which we denote by $E[L(G)]$.

Steele’s formula for $E[L(G)]$ is written in terms of the Tutte polynomial of a graph, which we define next.
Definition 2.1. Let $G$ be a graph, and define $S(G)$ to be the set of spanning subgraphs of $G$; i.e., subgraphs of $G$ with vertex set $V(G)$ and edge set a subset of $E(G)$. The Tutte polynomial of $G$ is defined as follows:
$$T(G;x,y) = \sum_{A \in S(G)} (x-1)^{r(G)-r(A)} (y-1)^{|E(A)|-r(A)}.$$
The Tutte polynomial of a graph encodes much information about the graph,
but we will only use the definition above in our analysis and refer the reader to
[1] for more information.
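
As an illustration of Definition 2.1, the following Python sketch evaluates the subgraph expansion by brute force. It is not taken from the paper: the function names `num_components` and `tutte`, the vertex labelling 0, ..., n-1, and the edge-list representation are assumptions made for this example, and the enumeration of all $2^m$ spanning subgraphs is only practical for small graphs.

```python
from itertools import combinations

def num_components(n, edge_subset):
    """Number of connected components k(A) of the spanning subgraph
    on vertices 0..n-1 with the given edges (simple union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    comps = n
    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps

def tutte(n, edges, x, y):
    """Evaluate T(G; x, y) at integer (or rational) x, y by summing
    over all spanning subgraphs A, with r(A) = n - k(A)."""
    r_G = n - num_components(n, edges)
    total = 0
    for size in range(len(edges) + 1):
        for A in combinations(edges, size):
            r_A = n - num_components(n, A)
            # Both exponents are nonnegative; Python evaluates 0 ** 0 as 1.
            total += (x - 1) ** (r_G - r_A) * (y - 1) ** (size - r_A)
    return total

triangle = [(0, 1), (1, 2), (0, 2)]
print(tutte(3, triangle, 1, 1))  # 3, the number of spanning trees of the triangle
print(tutte(3, triangle, 2, 2))  # 8 = 2^3, the number of spanning subgraphs
```

For the triangle the expansion collapses to $x^2 + x + y$, which is an easy hand check on the two printed values.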
We will use the following result about the Tutte polynomial in the proof of
the main result. The proof is a straightforward calculation from the definition
and so we omit it.
Lemma 2.2. Let $G$ be a connected graph with $n$ vertices and $m$ edges. Then for values of $(x,y)$ satisfying $(x-1)(y-1)=1$, we have

(a) $$T(G;x,y) = (x-1)^{n-1}\left(\frac{x}{x-1}\right)^{m}$$

(b) $$T_x(G;x,y) = (x-1)^{n-2}\left[\sum_{A \in S(G)} k(A)\,(y-1)^{|A|} - \left(\frac{x}{x-1}\right)^{m}\right]$$

where $|A|$ denotes $|E(A)|$ and $T_x$ denotes the partial derivative of $T$ with respect to $x$.
We now state Steele’s integral formula for the expected length of the minimal spanning tree that was proved in [7].

Theorem 2.3. (Steele’s formula) Let $G$ be a connected graph and $T(G;x,y)$ the Tutte polynomial of $G$. Then
$$E[L(G)] = \int_0^1 \frac{1-t}{t}\,\frac{T_x\!\left(G;\frac{1}{t},\frac{1}{1-t}\right)}{T\!\left(G;\frac{1}{t},\frac{1}{1-t}\right)}\,dt. \tag{1}$$
Steele’s formula above has been generalized to the case of an arbitrary, but
still identical, edge distribution [5] and to edge distributions that are not nec-
essarily identically distributed [6].
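
The quantity $E[L(G)]$ can also be estimated by simulation, which gives an independent sanity check on any exact value obtained from the integral formula. The sketch below is not part of the paper: it assumes Kruskal's algorithm for the MST, Python's `random` module for the uniform edge lengths, and hypothetical function names. For the triangle the MST consists of the two shortest of three independent uniform lengths, so the exact mean is 1/4 + 2/4 = 3/4.

```python
import random

def mst_length(n, edges, lengths):
    """Total length of a minimal spanning tree via Kruskal's algorithm
    (vertices are 0..n-1; lengths[i] is the length of edges[i])."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = 0.0
    for w, (u, v) in sorted(zip(lengths, edges)):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it belongs to the MST
            parent[ru] = rv
            total += w
    return total

def estimate_expected_mst(n, edges, trials=20000, seed=0):
    """Monte Carlo estimate of E[L(G)] with i.i.d. Uniform[0,1] edge lengths."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        lengths = [rng.random() for _ in edges]
        acc += mst_length(n, edges, lengths)
    return acc / trials

triangle = [(0, 1), (1, 2), (0, 2)]
estimate = estimate_expected_mst(3, triangle)  # should be close to 0.75
```

The estimate carries sampling error of order 1/sqrt(trials), so it can corroborate an exact value but not replace it.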
3 Integrand in Steele’s Formula
3.1 Polynomial integrand
We begin by showing that the integrand in Steele’s integral formula is a poly-
nomial of degree less than or equal to the number of edges in the graph.
Theorem 3.1. Let $G$ be a connected graph with $n$ vertices and $m$ edges. Then
$$E[L(G)] = \int_0^1 p_m(t)\,dt,$$
where $p_m(t)$ is a polynomial of degree less than or equal to $m$.

Proof. For convenience, we let $|A| = |E(A)|$. By Lemma 2.2, we have
$$\frac{1-t}{t}\,\frac{T_x\!\left(G;\frac{1}{t},\frac{1}{1-t}\right)}{T\!\left(G;\frac{1}{t},\frac{1}{1-t}\right)} = \sum_{A \in S(G)} k(A)\,t^{|A|}(1-t)^{m-|A|} - 1 = -1 + \sum_{A \in S(G)} k(A) \sum_{j=0}^{m-|A|} \binom{m-|A|}{j} (-1)^{m-|A|-j}\, t^{m-j}.$$
This establishes the result, but we refine the coefficients further. Define
$$p_m(t) = -1 + \sum_{A \in S(G)} k(A) \sum_{j=0}^{m-|A|} \binom{m-|A|}{j} (-1)^{m-|A|-j}\, t^{m-j}.$$
Let $i = m-j$. Then $m-|A|-j = i-|A|$, so we have
$$p_m(t) = -1 + \sum_{A \in S(G)} k(A) \sum_{i=|A|}^{m} \binom{m-|A|}{m-i} (-1)^{i-|A|}\, t^{i}.$$
To find the coefficient of $t^i$, we sum over all $A$ in $S(G)$ such that $|E(A)| \le i$. This yields
$$a_i = \sum_{\ell=0}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \sum_{A \in S_\ell} k(A), \tag{2}$$
where $S_\ell := \{A \in S(G) : |E(A)| = \ell\}$. Thus
$$p_m(t) = -1 + \sum_{i=0}^{m} a_i t^i$$
with $a_i$ as above.
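
Formula (2) can be evaluated directly on small graphs. The following Python sketch is an illustration rather than the authors' code: the function names, the 0-based vertex labels, and the brute-force enumeration of all spanning subgraphs are assumptions of this example. It computes the coefficients $a_i$ and then integrates $p_m(t)$ term by term with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def num_components(n, edge_subset):
    """k(A): number of connected components of the spanning subgraph A."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    comps = n
    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps

def integrand_coefficients(n, edges):
    """Coefficients a_0, ..., a_m from formula (2)."""
    m = len(edges)
    # K[l] is the inner sum of (2): k(A) summed over subgraphs with l edges.
    K = [sum(num_components(n, A) for A in combinations(edges, l))
         for l in range(m + 1)]
    return [sum((-1) ** (i - l) * comb(m - l, m - i) * K[l]
                for l in range(i + 1))
            for i in range(m + 1)]

def expected_mst_length(n, edges):
    """Integral over [0,1] of p_m(t) = -1 + sum_i a_i t^i, as a Fraction."""
    a = integrand_coefficients(n, edges)
    return -1 + sum(Fraction(a_i, i + 1) for i, a_i in enumerate(a))

triangle = [(0, 1), (1, 2), (0, 2)]
print(integrand_coefficients(3, triangle))  # [3, -3, 0, 1]
print(expected_mst_length(3, triangle))     # 3/4
```

For the triangle this gives $p_3(t) = 2 - 3t + t^3$, whose integral 3/4 agrees with the mean of the two smallest of three independent uniform lengths.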
In the proof of Theorem 3.1, we derived an initial formula (2) for the coef-
ficients of the polynomial integrand in Steele’s formula for the expected length
of the MST. In the next section, we derive an easier working form for the co-
efficients but we end this section with our first main result on the first three
coefficients.
Theorem 3.2. Let
$$p_m(t) = -1 + \sum_{i=0}^{m} a_i t^i$$
be the polynomial integrand in Steele’s formula for the expected length of the MST of a connected graph $G$ with $n$ vertices and $m$ edges. Then
$$a_0 = n, \quad a_1 = -m, \quad \text{and} \quad a_2 = 0.$$

Proof. The set $S_0$ consists of just the single subgraph of $G$ with no edges and $n$ vertices, which has $n$ connected components. Therefore, $\sum_{A \in S_0} k(A) = n$. Next, the set $S_1$ consists of the $m$ spanning subgraphs with just one edge, each of which has exactly $n-1$ connected components. Therefore, $\sum_{A \in S_1} k(A) = m(n-1)$. Lastly, the set $S_2$ consists of $\binom{m}{2}$ spanning subgraphs with two edges, each of which has exactly $n-2$ connected components, since two edges of a simple graph cannot form a cycle. Therefore, $\sum_{A \in S_2} k(A) = \binom{m}{2}(n-2)$.

Substituting these values into formula (2) yields
$$a_0 = \sum_{A \in S_0} k(A) = n, \qquad a_1 = -m \sum_{A \in S_0} k(A) + \sum_{A \in S_1} k(A) = -mn + m(n-1) = -m$$
and
$$a_2 = \binom{m}{2} \sum_{A \in S_0} k(A) - (m-1) \sum_{A \in S_1} k(A) + \sum_{A \in S_2} k(A) = \binom{m}{2} n - m(m-1)(n-1) + \binom{m}{2}(n-2) = 0.$$
This completes the proof.
3.2 Coefficients of the polynomial integrand
In the previous theorem, the initial formula (2) for the coefficients is easily applied for the cases $\ell = 0,1,2$, because for each such $\ell$, the members of $S_\ell$ all have the same number of connected components. When $k(A)$ is non-constant on $S_\ell$, the enumeration becomes more difficult.

Accordingly, we partition the set $S_\ell$ into subsets with different numbers of connected components. This can be achieved by partitioning over the ranks of the members of $S_\ell$ since subgraphs in $S_\ell$ with the same rank $r$ have the same number of connected components, namely $n-r$.

Let $k_r^\ell$ be the number of spanning subgraphs of $G$ in $S_\ell$ with rank $r$; i.e., the number of spanning subgraphs of $G$ with $\ell$ edges and $n-r$ connected components. In terms of $k_r^\ell$, formula (2) can be rewritten as
$$a_i = \sum_{\ell=0}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \sum_{r=r_\ell}^{\ell} k_r^\ell\,(n-r), \tag{3}$$
where $r_\ell$ is the minimum rank of a graph with $n$ vertices and $\ell$ edges. If $K_q$ is the largest complete graph with $|E(K_q)| < \ell$, then $r_\ell = q$. In other words, $r_\ell$ is the largest integer with $\binom{r_\ell}{2} < \ell$.
We use the fact that $\sum_{r=r_\ell}^{\ell} k_r^\ell = \binom{m}{\ell}$ to reduce the number of terms of $k_r^\ell$ by one in (3). The new general expression for the polynomial coefficients $a_i$ for $i \ge 3$ is stated in Theorem 3.4 below. The proof of the theorem requires a couple of combinatorial identities stated in the following lemma.

Lemma 3.3. For integers $m$, $k$, $i$ and $n$,

(a) $$\binom{m-k}{m-i}\binom{m}{k} = \binom{m}{m-i}\binom{i}{k}$$

(b) $$\sum_{k=0}^{n} (-1)^k\, k \binom{n}{k} = 0 \quad (n \ge 2)$$
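
Both identities are standard, and a quick numerical spot check over a small range is easy to write. The sketch below is only an illustration (the helper names are hypothetical), not a proof. Identity (b) is checked for $n \ge 2$; for $n = 1$ the sum equals $-1$, which does not affect the application to coefficients $a_i$ with $i \ge 3$.

```python
from math import comb

def identity_a_holds(m, i, k):
    """Check C(m-k, m-i) * C(m, k) == C(m, m-i) * C(i, k) for 0 <= k <= i <= m."""
    return comb(m - k, m - i) * comb(m, k) == comb(m, m - i) * comb(i, k)

def alternating_weighted_sum(n):
    """Left-hand side of identity (b): sum over k of (-1)^k * k * C(n, k)."""
    return sum((-1) ** k * k * comb(n, k) for k in range(n + 1))

# Spot checks over a small range of parameters.
assert all(identity_a_holds(m, i, k)
           for m in range(10) for i in range(m + 1) for k in range(i + 1))
assert all(alternating_weighted_sum(n) == 0 for n in range(2, 15))
```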
Theorem 3.4. Let $a_i$ be the coefficients of the polynomial integrand in Steele’s integral formula for the expected length of the MST of a connected graph $G$ with $n$ vertices and $m$ edges. Then for $i \ge 3$
$$a_i = \sum_{\ell=3}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \sum_{r=r_\ell}^{\ell-1} k_r^\ell\,(\ell-r).$$

Proof. Summing all the terms $k_r^\ell$ for a fixed number of edges $\ell$ yields the total number of spanning subgraphs in $S_\ell$, which equals $\binom{m}{\ell}$. This implies that $k_\ell^\ell = \binom{m}{\ell} - \sum_{r=r_\ell}^{\ell-1} k_r^\ell$ and thus from formula (3), we get
$$a_i = \sum_{\ell=0}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \left[\sum_{r=r_\ell}^{\ell-1} k_r^\ell\,(n-r) + \left(\binom{m}{\ell} - \sum_{r=r_\ell}^{\ell-1} k_r^\ell\right)(n-\ell)\right]$$
$$= \sum_{\ell=0}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \left[\sum_{r=r_\ell}^{\ell-1} k_r^\ell\,(\ell-r) + \binom{m}{\ell}(n-\ell)\right]$$
$$= \sum_{\ell=0}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \sum_{r=r_\ell}^{\ell-1} k_r^\ell\,(\ell-r) + \sum_{\ell=0}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i}\binom{m}{\ell}(n-\ell).$$
The minimum ranks for $\ell = 0,1,2$ are $r_0 = 0$, $r_1 = 1$ and $r_2 = 2$. Therefore, for these values of $\ell$, the summation on $r$ is empty and only the second summation contributes. This and Lemma 3.3(a) yield
$$a_i = \left[\sum_{\ell=3}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \sum_{r=r_\ell}^{\ell-1} k_r^\ell\,(\ell-r)\right] + \left[\sum_{\ell=0}^{i} (-1)^{i-\ell} \binom{m}{m-i}\binom{i}{\ell}(n-\ell)\right].$$
The second sum equals zero by the Binomial Theorem and Lemma 3.3(b): the part with factor $n$ is $n\binom{m}{m-i}(1-1)^i = 0$, and the part with factor $\ell$ vanishes by Lemma 3.3(b).
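
The formula of Theorem 3.4 can be checked numerically on a small graph by tabulating the counts $k_r^\ell$ by brute force. The Python sketch below is an illustration under stated assumptions, not the authors' method: the names `rank_counts` and `coefficient_from_rank_counts` are hypothetical, vertices are labelled 0, ..., n-1, and every edge subset is enumerated. Terms with $r = \ell$ are skipped because their weight $\ell - r$ is zero.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rank(n, edge_subset):
    """r(A) = n - k(A), computed as the number of successful union-find merges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    r = 0
    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            r += 1
    return r

def rank_counts(n, edges):
    """counts[(l, r)] = number of spanning subgraphs with l edges and rank r."""
    counts = Counter()
    for l in range(len(edges) + 1):
        for A in combinations(edges, l):
            counts[(l, rank(n, A))] += 1
    return counts

def coefficient_from_rank_counts(m, counts, i):
    """a_i for i >= 3 via the double sum of Theorem 3.4."""
    total = 0
    for (l, r), cnt in counts.items():
        if 3 <= l <= i and r < l:
            total += (-1) ** (i - l) * comb(m - l, m - i) * cnt * (l - r)
    return total

# Complete bipartite graph K_{3,2}: parts {0, 1, 2} and {3, 4}.
k32 = [(a, b) for a in range(3) for b in (3, 4)]
counts = rank_counts(5, k32)
print([coefficient_from_rank_counts(len(k32), counts, i) for i in range(3, 7)])
# [0, 3, 0, -1]
```

The printed values agree with the coefficients of $t^3, \dots, t^6$ in the polynomial $4 - 6t + 3t^4 - t^6$ obtained for $K_{3,2}$ in Section 4.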
The above result gives a general formula for the coefficients of the polynomial integrand in terms of the values $k_r^\ell$. Determining the values of $k_r^\ell$ for large $\ell$ poses a major challenge. We conclude this section with the enumeration for $\ell = 3,4,5,6$ and the corresponding coefficients of $p_m(t)$.
Definition 3.5. For a connected graph $G$, define

(a) $c_i$ = number of cycles of size $i$ in $G$.

(b) $c_{i,1}$ = number of cycles of size $i$ with one chord.

(c) $\bar{c}_{i,1}$ = number of cycles of size $i$ with one chord and one additional edge that is not a chord of the cycle.

(d) $k_i$ = number of complete subgraphs $K_i$ in $G$.

(e) $k_{i,j}$ = number of complete bipartite subgraphs $K_{i,j}$ in $G$.
Lemma 3.6. For $\ell = 3,4,5,6$,
$$\sum_{r=r_\ell}^{\ell-1} k_r^\ell\,(\ell-r) = \sum_{j=r_\ell}^{\ell} c_j \binom{m-j}{m-\ell} - d_\ell, \tag{4}$$
where
$$d_3 = 0, \quad d_4 = 0, \quad d_5 = k_3^5, \quad d_6 = \bar{c}_{4,1} + c_{5,1} + k_{3,2} + 4k_4.$$
Proof. We show the above result for $\ell = 5$; the other cases are similar in nature. The minimum rank for $\ell = 5$ is $r_5 = 3$ and so the left-hand side of equation (4) is $k_4^5 + 2k_3^5$. The types of subgraphs counted in $k_4^5$ are those with 5 edges and $n-4$ connected components, which have the form shown in Figure 1(a)-(c). Analogously, there is only one type of subgraph counted in $k_3^5$ (a 4-cycle with a chord), which is shown in Figure 1(d). Note that the graphs in Figures 1 and 2 that are a one-clique sum of smaller graphs actually represent families that include subgraphs that are disjoint unions of the summands. For example, 1(a) includes $K_3 \cup P_2$, where $K_3$ is the complete graph on three vertices and $P_2$ is a path with two edges.

Now consider the right-hand side of (4). Start with any 3-cycle and choose any other 2 edges in the graph; there are $c_3\binom{m-3}{2}$ ways to do this. This counts all the types of subgraphs depicted in Figure 1(a) and counts the subgraphs in Figure 1(d) twice. Figure 2 gives a pictorial representation of $c_3\binom{m-3}{2}$. The subgraphs counted by $c_4\binom{m-4}{1}$ (start with a 4-cycle and choose any other edge) are of the type shown in Figure 1(b) and Figure 1(d). These are depicted in the right-hand side of Figure 2.
[Figure 1: Representations of the subgraphs counted in $k_4^5$ (panels (a)-(c)) and $k_3^5$ (panel (d)).]

[Figure 2: Representations of the subgraphs counted in $c_3\binom{m-3}{2}$ and $c_4\binom{m-4}{1}$.]

Lastly, $c_5$ is the number of 5-cycles, which are shown in Figure 1(c). Altogether, the right-hand terms count each subgraph of Figure 1(d) three times (twice through $c_3\binom{m-3}{2}$ and once through $c_4\binom{m-4}{1}$), one more than the two required on the left-hand side. Therefore,
$$k_4^5 + 2k_3^5 = c_3\binom{m-3}{2} + c_4\binom{m-4}{1} + c_5 - k_3^5.$$
While initially Lemma 3.6 appears only to complicate the coefficient formula given in Theorem 3.4, the next lemma shows that when it is applied to the coefficient formula, it actually simplifies it. The proof is a straightforward calculation and so we omit it; the reasoning is analogous to the proof of Lemma 3.6. Although we proved the first equation in Lemma 3.7 for $i = 3,4,5,6$, we conjecture that it holds in general for all $i \ge 3$.
Lemma 3.7. For $i = 3,4,5,6$,
$$\sum_{\ell=3}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i} \sum_{j=r_\ell}^{\ell} c_j \binom{m-j}{m-\ell} = c_i$$
and thus
$$a_i = c_i - \sum_{\ell=3}^{i} (-1)^{i-\ell} \binom{m-\ell}{m-i}\, d_\ell. \tag{5}$$
Finally, we derive representations for the coefficients $a_3$ through $a_6$ in terms of the structure of the underlying graph $G$. The proof of the theorem is a direct application of Lemmas 3.6 and 3.7 to the general coefficient formula given in Theorem 3.4.
Theorem 3.8. Let
$$p_m(t) = -1 + \sum_{i=0}^{m} a_i t^i$$
be the polynomial integrand in Steele’s formula for the expected length of the MST of a connected graph $G$ with $n$ vertices and $m$ edges. Then
$$a_3 = c_3, \quad a_4 = c_4, \quad a_5 = c_5 - k_3^5, \quad a_6 = c_6 + 2k_4 - c_{5,1} - k_{3,2}.$$
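
The cycle counts $c_i$ that appear in Theorem 3.8 can be obtained by brute force on small graphs. The sketch below is an illustration only, with hypothetical helper names: it tests every set of $i$ edges and accepts it as a cycle when every touched vertex has degree 2 and the edges form a single connected piece. The closed form $\frac{1}{2}(j-1)!\binom{n}{j}$ for complete graphs, stated in Section 5, is used as a cross-check.

```python
from itertools import combinations
from math import comb, factorial

def is_cycle(edge_subset):
    """True if the edges (at least 3, from a simple graph) form one cycle."""
    adj = {}
    for u, v in edge_subset:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    if any(len(nbrs) != 2 for nbrs in adj.values()):
        return False
    # A 2-regular graph is a single cycle exactly when it is connected.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == len(adj)

def count_cycles(edges, i):
    """c_i: number of cycles with exactly i edges (i >= 3)."""
    return sum(1 for A in combinations(edges, i) if is_cycle(A))

def complete_graph_cycles(n, j):
    """Closed form for the number of j-cycles in K_n."""
    return factorial(j - 1) * comb(n, j) // 2

k32 = [(a, b) for a in range(3) for b in (3, 4)]  # K_{3,2}
k5 = list(combinations(range(5), 2))              # K_5

print([count_cycles(k32, i) for i in (3, 4, 5, 6)])  # [0, 3, 0, 0]
print([count_cycles(k5, i) for i in (3, 4, 5)])      # [10, 15, 12]
```

By Theorem 3.8 the first two printed entries for each graph are the coefficients $a_3$ and $a_4$ of its polynomial integrand.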
4 Application of Results
In this section, we apply Theorems 3.2 and 3.8 to the complete bipartite graph $K_{3,2}$ in order to derive the expected length of the minimal spanning tree of this graph.
Proposition 4.1. Let $p_m(t)$ be the polynomial integrand in Steele’s formula for the complete bipartite graph $K_{3,2}$ shown below.

[Figure: the complete bipartite graph $K_{3,2}$.]

Then
$$p_m(t) = 4 - 6t + 3t^4 - t^6$$
and
$$E[L(K_{3,2})] = \int_0^1 (4 - 6t + 3t^4 - t^6)\,dt = 4 - 3 + \frac{3}{5} - \frac{1}{7} = \frac{51}{35}.$$
Proof. For $K_{3,2}$, $n = 5$ and $m = 6$. By Theorem 3.2, we have $a_0 = 5$, $a_1 = -6$, and $a_2 = 0$. Next, we apply Theorem 3.8. $K_{3,2}$ has no 3-cycles, so $a_3 = 0$. The graph has three 4-cycles, so $a_4 = 3$. For the coefficient $a_5$, we note that there are no 5-cycles and also no $k_3^5$-type subgraphs (a 4-cycle with a chord) either, so $a_5 = 0$. Lastly, for $a_6$, there are no 6-cycles, no $K_4$ subgraphs, no $c_{5,1}$-type subgraphs (since there are no 5-cycles), and one $k_{3,2}$-type subgraph (the entire graph). Therefore, $a_6 = -1$ and we get
$$p_m(t) = -1 + 5 - 6t + 3t^4 - t^6 = 4 - 6t + 3t^4 - t^6.$$
5 The Complete Graph
The MST problem on $K_n$ has been studied extensively. Frieze [3] proved that
$$\lim_{n \to \infty} E[L(K_n)] = \zeta(3) = \sum_{i=1}^{\infty} i^{-3} = 1.202\ldots$$
In [8], Steele extended this result to general edge distributions and Janson [4] proved a central limit theorem for $L(K_n)$.

We apply our results to the complete graph and derive exact formulas in terms of the number of vertices $n$ for the first seven coefficients of the polynomial integrand in Steele’s formula.
Theorem 5.1. Let $p_m(t) = -1 + \sum_{i=0}^{m} a_i t^i$ be the polynomial integrand in Steele’s formula for the complete graph on $n$ vertices, denoted by $K_n$. Then
$$a_0 = n, \quad a_1 = -\binom{n}{2}, \quad a_2 = 0, \quad a_3 = \binom{n}{3},$$
$$a_4 = 3\binom{n}{4}, \quad a_5 = 12\binom{n}{5} - 6\binom{n}{4}, \quad a_6 = 60\binom{n}{6} - 60\binom{n}{5} - 2(n-5)\binom{n}{4}.$$
Proof. For the complete graph on $n$ vertices, the number of edges is $m = \binom{n}{2}$ and the number of cycles of length $j$ is given by
$$c_j = \frac{1}{2}(j-1)!\binom{n}{j}.$$
In addition, $k_3^5 = 2c_4$, $c_{5,1} = 5c_5$, $k_4 = \binom{n}{4}$ and $k_{3,2} = \binom{n}{5}\binom{5}{2}$. Substituting these values into Theorems 3.2 and 3.8 yields the stated formulas.
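
The closed forms of Theorem 5.1 are easy to tabulate. The following Python sketch is an illustration (the function names are hypothetical): it returns $a_0, \dots, a_6$ for $K_n$ and, for $n \le 4$ where $m \le 6$ and these seven coefficients already make up the whole polynomial, integrates $p_m(t)$ exactly. For larger $n$ the list is only a truncation of $p_m(t)$ and should not be integrated.

```python
from fractions import Fraction
from math import comb

def complete_graph_coefficients(n):
    """a_0, ..., a_6 for K_n as stated in Theorem 5.1."""
    return [
        n,
        -comb(n, 2),
        0,
        comb(n, 3),
        3 * comb(n, 4),
        12 * comb(n, 5) - 6 * comb(n, 4),
        60 * comb(n, 6) - 60 * comb(n, 5) - 2 * (n - 5) * comb(n, 4),
    ]

def integrate_integrand(a):
    """Integral over [0,1] of -1 + sum_i a[i] t^i, as an exact Fraction."""
    return -1 + sum(Fraction(c, i + 1) for i, c in enumerate(a))

a_k4 = complete_graph_coefficients(4)
print(a_k4)                       # [4, -6, 0, 4, 3, -6, 2]
print(integrate_integrand(a_k4))  # 31/35
```

For $n = 4$ this corresponds to $p_6(t) = 3 - 6t + 4t^3 + 3t^4 - 6t^5 + 2t^6$, and for $n = 3$ the same functions return the triangle value 3/4.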
Numerical calculation of $E[L(K_n)]$ has led to the famous conjecture that the convergent sequence is also monotone increasing and concave. This problem was raised at the conference Mathematics and Computer Science II at Versailles in 2002 but no proof has been found. Clearly, our results alone will not answer this question as we have only derived exact formulas for the first seven coefficients. But our results give a hint that there may be a pattern to the coefficients of the polynomial integrand in Steele’s formula for the complete graph, which, if true, would answer the conjecture.

We end this section with a result that factors the polynomial integrand in Steele’s formula for $K_n$, with one of the factors a polynomial of degree less than or equal to the number of edges of $K_{n-1}$.
Theorem 5.2. Let $p_m(t)$ be the polynomial integrand in Steele’s formula for the expected length of the MST of the complete graph on $n$ vertices denoted by $K_n$. Then
$$p_m(t) = (1-t)^{n-1} q(t),$$
where $q(t)$ is a polynomial of degree less than or equal to $\binom{n-1}{2}$.
Proof. As in the proof of Theorem 3.1, we have
$$p_m(t) = \sum_{A \in S(G)} k(A)\,t^{|A|}(1-t)^{m-|A|} - 1,$$
where $G = K_n$ and $m = \binom{n}{2}$.

We factor out $(1-t)^{n-1}$ to get
$$p_m(t) = (1-t)^{n-1}\left[\sum_{A \in S(G)} k(A)\,t^{|A|}(1-t)^{\binom{n}{2}-|A|-(n-1)} - (1-t)^{1-n}\right].$$
Note that $\binom{n}{2} - (n-1) = \binom{n-1}{2}$. Now the sum ranges over spanning subgraphs of size (in edges) from 0 to $\binom{n}{2}$. We split it into two sums as follows:
$$p_m(t) = (1-t)^{n-1}\left[\sum_{|A| \le \binom{n-1}{2}} k(A)\,t^{|A|}(1-t)^{\binom{n-1}{2}-|A|} + \sum_{|A| > \binom{n-1}{2}} k(A)\,t^{|A|}(1-t)^{\binom{n}{2}-|A|-(n-1)} - (1-t)^{1-n}\right].$$
Clearly, the first sum over $|A| \le \binom{n-1}{2}$ is a polynomial of degree at most $\binom{n-1}{2}$. Call it $q_1(t)$.

For the second sum, we sum over the possible numbers of edges $i > \binom{n-1}{2}$ and count the number of subgraphs with $i$ edges, which for the complete graph is $\binom{\binom{n}{2}}{i}$. Furthermore, for all spanning subgraphs of $K_n$ with $i > \binom{n-1}{2}$ edges, the number of connected components is 1. Therefore, we have
$$\sum_{|A| > \binom{n-1}{2}} k(A)\,t^{|A|}(1-t)^{\binom{n}{2}-|A|-(n-1)} = \sum_{i=\binom{n-1}{2}+1}^{\binom{n}{2}} \binom{\binom{n}{2}}{i} t^i (1-t)^{\binom{n}{2}-i-(n-1)}$$
$$= \sum_{i=0}^{\binom{n}{2}} \binom{\binom{n}{2}}{i} t^i (1-t)^{\binom{n}{2}-i}(1-t)^{-(n-1)} - \sum_{i=0}^{\binom{n-1}{2}} \binom{\binom{n}{2}}{i} t^i (1-t)^{\binom{n-1}{2}-i}$$
$$= (1-t)^{1-n}\sum_{i=0}^{\binom{n}{2}} \binom{\binom{n}{2}}{i} t^i (1-t)^{\binom{n}{2}-i} - \sum_{i=0}^{\binom{n-1}{2}} \binom{\binom{n}{2}}{i} t^i (1-t)^{\binom{n-1}{2}-i}.$$
By the Binomial Theorem, the first sum equals 1 and the second sum, call it $q_2(t)$, is a polynomial of degree at most $\binom{n-1}{2}$. Since the second sum enters with a minus sign, we now have
$$p_m(t) = (1-t)^{n-1}\left(q_1(t) + (1-t)^{1-n} - q_2(t) - (1-t)^{1-n}\right) = (1-t)^{n-1}\left(q_1(t) - q_2(t)\right),$$
where both $q_1(t)$ and $q_2(t)$ are polynomials of degree less than or equal to $\binom{n-1}{2}$. This completes the proof.
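
The factorization can be observed concretely on $K_4$. The sketch below is an illustration, not part of the proof: the helper names are hypothetical, polynomials are stored as lists of coefficients in ascending order, and the input $3 - 6t + 4t^3 + 3t^4 - 6t^5 + 2t^6$ is the integrand obtained from Theorem 5.1 with $n = 4$. Dividing repeatedly by $(1-t)$ recovers $q(t)$.

```python
def divide_by_one_minus_t(p):
    """Return (s, remainder) with p(t) = (1 - t) * s(t) + remainder.
    Coefficients are in ascending order; the remainder equals p(1)."""
    remainder = sum(p)
    s, prefix = [], 0
    for c in p[:-1]:
        prefix += c
        s.append(prefix - remainder)  # s_k = p_0 + ... + p_k - p(1)
    return s, remainder

def factor_out(p, power):
    """Divide p by (1 - t)^power, raising if the division is not exact."""
    q = list(p)
    for _ in range(power):
        q, remainder = divide_by_one_minus_t(q)
        if remainder != 0:
            raise ValueError("(1 - t)^power does not divide p")
    return q

# p_m(t) for K_4 (n = 4, m = 6), constant term a_0 - 1 = 3.
p_k4 = [3, -6, 0, 4, 3, -6, 2]
q = factor_out(p_k4, 3)
print(q)  # [3, 3, 0, -2], i.e. q(t) = 3 + 3t - 2t^3
```

Here $q(t)$ has degree 3, which equals $\binom{n-1}{2}$ for $n = 4$, so the degree bound in Theorem 5.2 is attained in this case; a fourth division leaves a nonzero remainder.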
References
[1] B. Bollobás, Modern Graph Theory, Springer Graduate Texts in Mathematics (1998), p. 335.