Randomly coloring constant degree graphs

We study a simple Markov chain, known as the Glauber dynamics, for generating a random k-coloring of an n-vertex graph with maximum degree Δ. We prove that, for every ε > 0, the dynamics converges to a random coloring within O(n log n) steps assuming k ≥ k0(ε) and either: (i) k/Δ > α* + ε where α* ≈ 1.763 and the girth g ≥ 5, or (ii) k/Δ > β* + ε where β* ≈ 1.489 and the girth g ≥ 7. Our work improves upon, and builds on, previous results which have similar restrictions on k/Δ and the minimum girth but also required Δ = Ω(log n). The best known result for general graphs is O(n log n) mixing time when k/Δ > 2 and O(n^2) mixing time when k/Δ > 11/6. Related results of Goldberg et al. apply when k/Δ > α* for all Δ ≥ 3 on triangle-free "neighborhood-amenable" graphs. © 2012 Wiley Periodicals, Inc. Random Struct. Alg., 2013


INTRODUCTION
Markov Chain Monte Carlo (MCMC) is an important tool in sampling from complex distributions. It has been successfully applied in several areas of Computer Science, most notably computing the volume of a convex body [6,17,19] and estimating the permanent of a non-negative matrix [15].
One particular problem that has attracted significant interest is that of generating a (nearly) random proper k-coloring of a graph G = (V, E) with maximum degree Δ. Recall that it is straightforward to construct a proper k-coloring when k > Δ. Our interest is to sample a coloring uniformly at random from the space of all proper k-colorings. Our goal is to do this random sampling in time polynomial in the number of vertices n = |V|, even though the number of colorings is often exponential in n. This sampling problem is well studied in Combinatorics (e.g., see [2]) and Statistical Physics (e.g., see [22]).
This paper studies the (heat-bath) Glauber dynamics, which is a simple and popular Markov chain for generating a random coloring. Let K denote the set of proper k-colorings of the input graph G with maximum degree Δ. For technical purposes, the state space of the Glauber dynamics is Ω = [k]^V ⊇ K, where [k] = {1, 2, . . . , k}. We will often refer to an element of Ω as a coloring. From a coloring Z_t ∈ Ω, the evolution Z_t → Z_{t+1} is defined as follows:
a. Choose v = v(t) uniformly at random from V.
b. Choose color c = c(t) uniformly at random from the set of colors available to v, namely [k] \ Z_t(N(v)), where N(v) denotes the set of neighbors of vertex v.
c. Define Z_{t+1} by Z_{t+1}(v) = c and Z_{t+1}(w) = Z_t(w) for all w ≠ v.
It is straightforward to verify that for any graph where k ≥ Δ + 2 the stationary distribution is uniform over the set K (see, e.g., [14]). For δ > 0, the mixing time T_mix(δ) is the number of transitions until the dynamics is within variation distance at most δ of the stationary distribution, assuming the worst initial coloring Z_0. When δ is omitted from the notation, we are referring to the mixing time T_mix = T_mix(1/2e). The choice of constant 2e is somewhat arbitrary, and it follows by a straightforward boosting argument (see, e.g., [18]) that T_mix(δ) ≤ T_mix(1/2e) ln(1/2δ) for any δ > 0.
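Steps a to c translate almost directly into code. The following is a minimal Python sketch, not the authors' implementation: the graph is assumed to be an adjacency dictionary `adj`, colors are the integers 1..k, and the names `glauber_step` and `is_proper` are hypothetical helpers introduced only for illustration.

```python
import random

def glauber_step(coloring, adj, k, rng=random):
    """Apply one heat-bath Glauber update in place and return the coloring."""
    v = rng.choice(list(adj))                       # step a: uniform random vertex
    blocked = {coloring[w] for w in adj[v]}         # colors used on N(v)
    available = [c for c in range(1, k + 1) if c not in blocked]
    coloring[v] = rng.choice(available)             # steps b and c: recolor v only
    return coloring

def is_proper(coloring, adj):
    """True if no edge has both endpoints with the same color."""
    return all(coloring[v] != coloring[w] for v in adj for w in adj[v])

# Usage: a 4-cycle (maximum degree 2) with k = 4 colors, started from an
# improper coloring, which is allowed because the state space is [k]^V.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coloring = {v: 1 for v in adj}
rng = random.Random(0)
for _ in range(200):
    glauber_step(coloring, adj, 4, rng)
```

Once a vertex has been updated, every edge at that vertex is properly colored and stays so, which is why the chain leaves the improper part of the state space after each vertex has been chosen at least once.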
Jerrum [14] proved that the Glauber dynamics has mixing time O(n log n) provided k/Δ > 2. This leads to the challenging problem of determining the smallest value of k/Δ for which a random k-coloring can be generated in time polynomial in n. Note that Hayes and Sinclair [11] have shown that for constant degree graphs Ω(n log n) steps are necessary, i.e., T_mix = Ω(n log n).
Vigoda [25] gave the first significant improvement over Jerrum's result, reducing the lower bound on k/Δ to 11/6 by analyzing a different Markov chain. His result implied O(n^2) mixing time for the Glauber dynamics for the same range of k/Δ. There has been no success in extending Vigoda's approach to smaller values of k/Δ, and it remains the best bound for general graphs.
Dyer and Frieze [4] introduced an approach, known as the burn-in method, which improved the lower bound on k/Δ for the class of graphs with large maximum degree and large girth. It is within this context that this paper is written. We will prove that the Glauber dynamics is efficient for a much wider range of girth and maximum degree than has been done before.
The task in a theoretical analysis of MCMC algorithms is to show that a given Markov chain converges rapidly to its steady state. The time to get "close" in variation distance is called the mixing time. One of the most useful tools for doing this is coupling. We take two copies (X_t, Y_t) of a Markov chain M and then bound the variation distance d_t between the t-step distribution and the steady state distribution via the coupling inequality:
d_t ≤ Pr(X_t ≠ Y_t). (1)
We are free to choose our coupling and we endeavour to minimise the RHS of (1). Often we define a distance function dist between states such that X_t ≠ Y_t implies dist(X_t, Y_t) ≥ 1 and then try to prove that our coupling satisfies
E(dist(X_{t+1}, Y_{t+1}) | X_t, Y_t) ≤ α dist(X_t, Y_t) (2)
for some α < 1.
One must consider all possible X_t, Y_t and so it would seem that we have to take a worst-case pair here. We should point out that path coupling [3] does ameliorate this, in that it allows us to only consider the case where dist(X_t, Y_t) = 1. In the burn-in method, we allow the chains to run uncoupled for a sufficient amount of time (the burn-in period) so that only typical pairs of states need be considered. Using this idea Dyer and Frieze reduced the bound to k/Δ ≥ α for any α > α*, where α* ≈ 1.763 is the root of α = e^{1/α}. They required lower bounds on the maximum degree, Δ = Ω(log n), and on the girth, g = Ω(log Δ). Under these assumptions, Dyer and Frieze proved that after the burn-in period, the colorings X_t and Y_t satisfy certain properties in the local neighborhood of every vertex, so-called local uniformity properties. Assuming these local uniformity properties they were able to avoid the worst-case pair in (2).
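The threshold α* is defined only implicitly, as the root of α = e^{1/α}. As a quick numerical check of the quoted value 1.763, here is a small bisection sketch in Python; the function name `alpha_star` and the bracketing interval [1, 3] are choices made for this illustration.

```python
import math

def alpha_star(lo=1.0, hi=3.0, tol=1e-12):
    """Root of f(a) = a - exp(1/a) by bisection; f is increasing on [1, 3]."""
    f = lambda a: a - math.exp(1.0 / a)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies at or to the left of mid
    return (lo + hi) / 2

root = alpha_star()
print(round(root, 3))
```

The bracket works because f(1) = 1 − e < 0 and f(3) = 3 − e^{1/3} > 0.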
With the same restrictions on the maximum degree and girth, Molloy [21] improved the lower bound to k/Δ ≥ β for any β > β*, where β* ≈ 1.489. The girth assumptions were the first to be (nearly) removed. Hayes [8,9] reduced the girth requirements to g ≥ 5 for k/Δ > α* and g ≥ 7 for k/Δ > β*. Subsequently, Hayes and Vigoda [12] (using a non-Markovian coupling) reduced the lower bound on k/Δ to (1 + ε) for all ε > 0, which is nearly optimal. Their result requires girth g > 10. The large maximum degree restriction remained as a serious bottleneck for extending the burn-in approach to general graphs. The assumption Δ = Ω(log n) is required in all of the improvements so far that rely on the burn-in approach.
We significantly improve the maximum degree assumption, only requiring Δ to be a sufficiently large constant, independent of n. When Δ is constant, in a typical coloring a constant fraction of the vertices do not satisfy the desired local uniformity properties. This is the main obstacle our proof overcomes.
Subsequent to the publication of the conference version of this paper [5], Goldberg, Martin and Paterson [7] proved related results. They proved a certain decay of correlations property, which roughly implies that for any triangle-free and neighborhood-amenable graph with maximum degree Δ ≥ 3, when k > α*Δ the Glauber dynamics has mixing time O(n^2). The neighborhood amenability property they consider is related to the more common amenability property of infinite graphs and, very roughly, says that the volume of increasing balls around any vertex increases sub-exponentially with the radius.
We prove the following theorem.
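Theorem 1. For all ε > 0, there exist Δ0 and C* such that for every graph G = (V, E) on n vertices with maximum degree Δ > Δ0 and girth g, if either: (a) k ≥ (1 + ε)α*Δ and g ≥ 5, or (b) k ≥ (1 + ε)β*Δ and g ≥ 7, then for all δ > 0, the mixing time of the Glauber dynamics on k-colorings of G satisfies T_mix(δ) ≤ C* n log(n/δ).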
Using now classical results of Jerrum et al. [16], the above rapid mixing results imply a fully polynomial randomized approximation scheme (FPRAS) for counting k-colorings under the same conditions. Recent work of Štefankovič et al. [23] designs such an approximate counting algorithm with running time O*(n^2).
There are several recent results with a more refined picture for trees or planar graphs. Martinelli, Sinclair and Weitz [20] have significantly stronger results for the complete tree with degree Δ. They showed O(n log n) mixing time for k ≥ Δ + 3, even for any boundary condition which is a fixed coloring of the leaves of the complete tree. For planar graphs with maximum degree Δ, Hayes et al. [10] were able to achieve polynomial mixing time for k ≪ Δ; in particular, they showed polynomial mixing time when k > 100Δ/log Δ. More recently, Tetali et al. [24] have shown that on the complete tree the mixing time of the Glauber dynamics has a phase transition at k ≈ Δ/log Δ.
The heart of our proof analyzes a simple coupling over T_m = Θ(n) steps for an arbitrary pair of colorings which initially differ at a single vertex v_0. We prove that the expected Hamming distance after T_m steps is at most 3/4. We do this by breaking the analysis into two scenarios. In the advantageous scenario, during the entire T_m steps, the Hamming distance stays small and all disagreements are close to v_0. If both of these events occur, after an initial burn-in period of T_b < T_m steps, every updated vertex near v_0 will have certain local uniformity properties (the same properties used by [4,8,21]). It will then be straightforward to prove that the Hamming distance decreases in expectation over the final T_m − T_b steps. In the disadvantageous scenario where one of the events fails, we use a crude upper bound on the Hamming distance.

PRELIMINARIES
For X_t, Y_t ∈ Ω, let X_t ⊕ Y_t denote their difference and we use D_t to denote this set of "disagreements", namely: D_t = X_t ⊕ Y_t = {v ∈ V : X_t(v) ≠ Y_t(v)}. For vertex v, let d(v) denote its degree and N(v) denote its neighborhood. For vertex v and integer R ≥ 1, we denote the ball of radius R around v by B_R(v) = {w ∈ V : dist(v, w) ≤ R}, where dist(v, w) is the graph distance between v and w, i.e., the length of the shortest path from v to w. For a coloring X_t and vertex v, let For an event E, we will use the notation 1(E) to refer to the {0, 1}-valued indicator variable for the event E, i.e., 1(E) = 1 if E occurs and 1(E) = 0 otherwise.
To prove Theorem 1, we will use path coupling [3] for T = Cn log(n/δ) steps of the Glauber dynamics. Therefore, for all X_0, Y_0 ∈ Ω where |X_0 ⊕ Y_0| = 1, we will define a T-step coupling such that Then, for any X_0, Y_0 ∈ Ω, since the maximum possible Hamming distance is n, it follows by path coupling that after which Theorem 1 follows by the coupling inequality, (1). The following two technical results are elementary, but will be useful in the proof of Theorem 11.

Lemma 2.
Suppose v, w are vectors of length n, whose coefficients are both sorted in the same order. Then Proof. Since the ordering of coefficients is the same in both v and w, we have We will use the following corollary of the above lemma.
Proof. The first inequality is just a union bound, noting that X ≥ nτ implies some X i ≥ τ . The second inequality comes from applying Lemma 2 with v i = X i and w i = 1(X i ≥ τ ), which clearly have the same sorted order.
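Lemma 2 has the shape of the classical Chebyshev sum inequality: for two real sequences sorted in the same order, n Σ v_i w_i ≥ (Σ v_i)(Σ w_i). Treating that reading as an assumption, the short Python sketch below evaluates the gap between the two sides for the choice v_i = X_i and w_i = 1(X_i ≥ τ) used in the proof above. The name `chebyshev_gap` and the sample data are hypothetical.

```python
import random

def chebyshev_gap(v, w):
    """n * sum(v_i * w_i) - sum(v) * sum(w); nonnegative for similarly sorted v, w."""
    n = len(v)
    return n * sum(a * b for a, b in zip(v, w)) - sum(v) * sum(w)

# v_i = X_i (sorted) and w_i = 1(X_i >= tau) share the same sorted order.
rng = random.Random(1)
xs = sorted(rng.randint(0, 50) for _ in range(20))
tau = 25
indicators = [1 if x >= tau else 0 for x in xs]
gap = chebyshev_gap(xs, indicators)
```

Integer data is used so that the comparison is exact and free of rounding effects.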

Coupling Analysis
For a pair of colorings X and Y, let Z_0 = X, Z_1, . . . , Z_ℓ = Y denote a shortest path between X and Y along pairs of colorings that differ at a single vertex. The intermediate colorings need not be proper; this is standard in the application of the path coupling technique [3] to colorings, and it is the reason why the Glauber dynamics was defined with state space Ω = [k]^V instead of the set K of proper colorings. This is the path used for the purposes of the path coupling approach of Bubley and Dyer [3]. We refer to the colorings Z_1, . . . , Z_ℓ as interpolated colorings for X and Y.
We use the following one-step coupling, which is also used in many of the previous works which apply path coupling. Since we use path coupling we only need to analyze pairs of colorings that initially differ at a single vertex v, which we refer to as neighboring colorings.
At every time t we choose a random vertex v = v(t), and update v in both chains X_t and Y_t. We couple the available colors for v so as to maximize the probability that X_{t+1}(v) = Y_{t+1}(v). With the remaining probability, each chain colors v independently from the remaining distribution over its available colors for v.
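One standard way to realize such a one-step coupling is the maximal coupling of the two uniform distributions over the available color sets. The Python sketch below is an illustration under that assumption, not a transcription of the paper's coupling; `coupled_colors` is a hypothetical helper, and exact fractions are used so that identical color sets agree with probability exactly 1.

```python
import random
from fractions import Fraction

def coupled_colors(avail_x, avail_y, rng=random):
    """Sample (c_x, c_y), each uniform on its own set, agreeing as often as possible."""
    ax, ay = set(avail_x), set(avail_y)
    colors = sorted(ax | ay)
    px = {c: Fraction(int(c in ax), len(ax)) for c in colors}
    py = {c: Fraction(int(c in ay), len(ay)) for c in colors}
    overlap = {c: min(px[c], py[c]) for c in colors}
    p_agree = sum(overlap.values())                 # 1 - total variation distance
    if rng.random() < p_agree:
        c = rng.choices(colors, weights=[float(overlap[c]) for c in colors])[0]
        return c, c
    # Remaining probability: each chain draws from its own residual distribution.
    cx = rng.choices(colors, weights=[float(px[c] - overlap[c]) for c in colors])[0]
    cy = rng.choices(colors, weights=[float(py[c] - overlap[c]) for c in colors])[0]
    return cx, cy

# Usage: sets sharing two of three colors agree with probability 2/3.
rng = random.Random(0)
pairs = [coupled_colors([1, 2, 3], [2, 3, 4], rng) for _ in range(500)]
```

The residual distributions have disjoint supports, so in the second branch the two chains always receive different colors.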
The heart of our coupling analysis will be to show that for a pair of "nice" neighboring colorings the expected Hamming distance after O(n) steps is small. By "nice" neighboring colorings we mean colorings that have certain local uniformity properties in the local neighborhood around the disagreement. This is formalized in Section 3.2. We first use that any coloring, after O(n log Δ) steps of the Glauber dynamics, is likely to be "nice"; we can then use that the Hamming distance is likely to contract after O(n) steps.

Local Uniformity Properties
A key element in our proof is that for a "nice" initial coloring, after O(n) steps of the Glauber dynamics, a vertex will have certain local uniformity properties with high probability. To this end, we use the following definition of heaviness from Hayes [9]. The rough idea is that if no color appears too often in the local neighborhood of a vertex v then we only need to recolor most (all but a small constant fraction) of the local neighborhood of v in order for the coloring of N(v) to appear close to random. To recolor most of N(v) requires O(n) steps, rather than the O(n log Δ) steps that would be required if we needed to recolor all of the local neighborhood of v.

Definition 4.
We say that a coloring X is ρ-heavy for color c at a vertex v if at least ρΔ vertices within distance 2 of v receive color c under X, or at least ρΔ/log Δ neighbors of v receive color c under X.
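The heaviness test can be phrased as a small predicate. The Python sketch below reads the two thresholds as ρΔ and ρΔ/log Δ and counts v itself among the vertices within distance 2; both readings, the natural logarithm, and the helper name `is_heavy` are assumptions made for illustration.

```python
import math

def is_heavy(coloring, adj, v, c, rho, max_deg):
    """Check the two heaviness conditions for color c at vertex v (max_deg >= 2)."""
    nbrs = set(adj[v])
    within2 = {v} | nbrs                 # assumption: the ball includes v itself
    for w in nbrs:
        within2.update(adj[w])
    count_ball = sum(1 for w in within2 if coloring[w] == c)
    count_nbrs = sum(1 for w in nbrs if coloring[w] == c)
    return (count_ball >= rho * max_deg
            or count_nbrs >= rho * max_deg / math.log(max_deg))

# Usage: a star with 8 leaves, all leaves colored 1 and the center colored 2.
adj = {0: list(range(1, 9)), **{i: [0] for i in range(1, 9)}}
coloring = {0: 2, **{i: 1 for i in range(1, 9)}}
heavy_for_1 = is_heavy(coloring, adj, 0, 1, 0.5, 8)
heavy_for_3 = is_heavy(coloring, adj, 0, 3, 0.5, 8)
```

In this example color 1 is heavy at the center (all 8 neighbors carry it) while the unused color 3 is not.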
To be considered "nice" at a vertex v, a coloring should not be heavy for any colors at any vertices too close to v. We formalize this notion as follows.
Definition 5. Let X be a coloring, let ρ > 0, and let v be a vertex. We say v is ρ-suspect for radius R if there exists a vertex w within distance R of v and a color c such that X is ρ-heavy at w for c. Otherwise, we say that, in X, v is ρ-above suspicion for radius R.
For a pair of colorings X, Y and a vertex v where X(v) ≠ Y(v), we say that v is a ρ-suspect disagreement for radius R if there exists a vertex w within distance R of v and a color c such that either X or Y is ρ-heavy at w for c. Otherwise, we say that v is a ρ-above suspicion disagreement for radius R.
We next make an easy but crucial observation about the above definitions. For a pair of colorings X and Y, recall that for the purposes of path coupling we consider the shortest path between X and Y along neighboring colorings. The colorings Z_1, . . . , Z_ℓ on this path are what we called interpolated colorings for X and Y. A key aspect of the above definitions is that "niceness" is automatically inherited by interpolated colorings, as we now formally state.
Observation 6. If X and Y are colorings, neither of which is ρ-heavy for color c at vertex v, then no interpolated coloring is 2ρ-heavy at v. Likewise, if v is a ρ-above suspicion disagreement for radius R, then in every interpolated coloring v is 2ρ-above suspicion for radius R.
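The interpolated colorings are easy to generate explicitly. In the Python sketch below, which is only an illustration, colorings are dictionaries from vertices to colors, disagreeing vertices are switched from X's color to Y's color one at a time in sorted vertex order, and `interpolate` is a hypothetical name. Any order of the disagreeing vertices gives a shortest path.

```python
def interpolate(x, y):
    """Return [Z_0 = x, Z_1, ..., Z_h = y] with consecutive colorings differing at one vertex."""
    path = [dict(x)]
    current = dict(x)
    for v in sorted(x):
        if current[v] != y[v]:
            current[v] = y[v]            # switch one disagreement to Y's color
            path.append(dict(current))
    return path

# Usage: X and Y differ at vertices 0 and 2, so the path has 3 colorings.
x = {0: 1, 1: 1, 2: 2}
y = {0: 2, 1: 1, 2: 3}
path = interpolate(x, y)
```

Every interpolated coloring assigns each vertex either its color in X or its color in Y, so the number of vertices of a given color in any fixed set is at most the sum of the corresponding counts in X and Y. This is the counting behind the factor 2 in Observation 6.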
The first basic local uniformity result says that from any initial coloring X_0, for any vertex v, after O(n log Δ) steps of the Glauber dynamics, the coloring is not 200-heavy at v with high probability. In this paper, an event is said to occur with high probability if its failure probability is exp(−Ω(Δ^γ)) for some positive constant γ.
Lemma 7 ((17) of Lemma 26 in Hayes [9]). Let δ > 0, let . Let X_0 be an arbitrary coloring. Then,
Moreover, as the next lemma states, if the initial coloring is not 400-heavy at vertex v, this property is maintained, and even improves slightly after O(n) steps with high probability.
Lemma 8. Let X_0 be a coloring that is 400-above suspicion for radius R ≤ Δ^{9/10} at v. Then, is the set of available colors for v in X_t. The local uniformity properties concern the available colors and the number of "unblocked" neighbors for a pair of colors.
be the indicator variable for the event that w is unblocked for c_1 or c_2, i.e., at least one of c_1 and c_2 does not appear on N(w) \ {v}. Finally, we describe the main burn-in result. For an initial coloring X_0 which is 400-above suspicion at a vertex v for a sufficiently large constant radius, after O(n) steps of the Glauber dynamics certain local uniformity properties hold for v with high probability. In particular, v has close to the expected number of available colors as if its neighbors were colored independently, and close to the expected number of neighbors that are unblocked for a pair of colors c_1 and c_2.
The following is from Hayes [9]. Part 1 of the following lemma is Lemma 25 in [9], and Part 2 is the second part of Corollary 34 in [9].
1. If the girth of G is ≥ 5, then for arbitrary X 0 : 2. If the girth of G is ≥ 7, and X 0 is 400-above suspicion for radius R = R(γ , δ) at v, then for every pair of colors c 1 , c 2 :

Disagreement Percolation
A basic tool used in several of our proofs will be the notion of propagation of disagreements, see [1]. If for some then there exists a neighbor w of v which propagates its disagreement to v in the following sense: in chain X we chose color c(t + 1) = Y t (w) or in chain Y we chose c(t + 1) = X t (w). In this way, if we initially had a single disagreement X 0 ⊕ Y 0 = {v 0 }, then a disagreement at time t can be traced back via a path of disagreements to v 0 .

Weak Analysis for Worst-Case Pair of Colorings
For a worst-case pair of neighboring colorings, the following result states some upper bounds on the Hamming distance after O(n) and O(n log Δ) steps of the coupling. Part 4 of the lemma states that after O(n log Δ) steps, any remaining disagreements are likely to be "nice".
Lemma 10. For every 0 < ε < 1, every C ≥ 3, there exists Δ0 > 0 such that for any graph G on n vertices with maximum degree Δ > Δ0 and girth g ≥ 5, any k > 1.45Δ, the following hold. Let X_0, Y_0 be colorings which disagree at a single vertex v. Let T = Cn/ε. Then,
Proofs of Lemmas 10.1 and 10.2. For parts 1 and 2, we will just bound the rate of spreading of disagreements. In each time step, the expected number of disagreements increases by at most a factor of 1 + Δ/(n(k − Δ)) ≤ exp(3/n). This holds regardless of the history on previous steps. (There is a ≤ Δ/n chance that v(t) is the neighbor of a particular disagreement and then a ≤ 1/(k − Δ) chance that the disagreement spreads to v(t).) Hence, expanding out the conditional probabilities, it follows by induction that, after t steps, the expected number of disagreements is at most exp(3t/n). Plugging in the values t = T = Cn/ε and t = T log Δ = Cn log Δ/ε establishes parts 1 and 2 respectively.
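The growth-rate bound in this proof can be checked numerically. The Python sketch below assumes the per-step factor reads 1 + Δ/(n(k − Δ)) and uses k > 1.45Δ, which gives Δ/(k − Δ) < 1/0.45 < 3; the function names and the sample values of n, k and Δ are illustrative only.

```python
import math

def one_step_growth(n, k, max_deg):
    """Upper bound on the factor by which expected disagreements grow in one step."""
    return 1 + max_deg / (n * (k - max_deg))

def disagreement_bound(t, n):
    """The bound exp(3t/n) on expected disagreements after t steps."""
    return math.exp(3 * t / n)

# Usage: n = 1000 vertices, maximum degree 100, k = 146 > 1.45 * 100 colors.
n, max_deg, k = 1000, 100, 146
factor = one_step_growth(n, k, max_deg)
t = 5 * n
compounded = factor ** t
```

Here `compounded` is what the induction produces after t steps, and it stays below `disagreement_bound(t, n)` because each factor is at most exp(3/n).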

Let S T
Proof of Lemma 10.3. Recall, for X_t, Y_t ∈ Ω, their difference is denoted by D_t = X_t ⊕ Y_t. Denote their Hamming distance by H_t = |D_t|. Also, denote their cumulative difference by D_{≤t} = ∪_{s ≤ t} D_s and denote their cumulative Hamming distance by H_{≤t} = |D_{≤t}|. We will prove that for every integer 1 ≤ ℓ ≤ n, for T = Cn/ε, For 1 ≤ i ≤ ℓ, let t_i be the time at which the i'th disagreement is generated (possibly counting the same vertex multiple times). Denote t_0 = 0. Let η_i := t_i − t_{i−1} be the waiting time for the formation of the i'th disagreement. Conditioned on the evolution at all times in [0, t_i], the distribution of η_i stochastically dominates a geometric distribution with success probability ρ_i and range {1, 2, . . .}, where This is because at all times prior to t_i we have H_t ≤ i and thus the set D_{≤t} increases with probability at most ρ_i at each step, regardless of the history. The numerator in the expression for ρ_i is an upper bound on the number of vertices that are non-disagreeing neighbors of disagreements and the denominator is a lower bound on the probability of choosing a fixed such vertex and then choosing a color that increases the number of disagreements. Hence η_1 + · · · + η_ℓ stochastically dominates the sum of independent geometrically distributed random variables with success probabilities ρ_1, ρ_2, . . . , ρ_ℓ. Now for any real x ≥ 0, Thus η_1 + · · · + η_ℓ stochastically dominates the sum of exponential random variables with parameters 2ρ_1, 2ρ_2, . . . , 2ρ_ℓ. Now ρ_i ≤ iρ where ρ = Δ/((k − Δ)n) and so η_1 + · · · + η_ℓ stochastically dominates the sum of exponential random variables ζ_1, ζ_2, . . . , ζ_ℓ with parameters 2ρ, 4ρ, . . . , 2ℓρ. Now consider the problem of collecting ℓ coupons, assuming each coupon is generated by a Poisson process with rate 2ρ. The delay between collecting the i'th coupon and the (i + 1)'st coupon is exponentially distributed with rate 2(ℓ − i)ρ. Hence the time to collect all ℓ coupons has the same distribution as ζ_1 + · · · + ζ_ℓ.
But the event that the total delay is less than T is nothing but the intersection of the (independent) events that all ℓ coupons are generated in [0, T]. Each coupon is generated in [0, T] with probability 1 − exp(−2ρT), so the probability of this is (1 − exp(−2ρT))^ℓ. This completes the proof of (7). We can now bound the expected Hamming distance at time T_m as follows:

Pr(H ≤T = )
Pr(H_{≤T} ≥ ℓ) The above quantity is at most exp(−√Δ), for Δ sufficiently large. This completes the proof of Lemma 10.3.
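The coupon-collector step in the proof of Lemma 10.3 can be sanity-checked by simulation. Writing ℓ for the number of coupons (an assumed symbol), the independence argument gives probability (1 − e^{−2ρT})^ℓ that all coupons arrive by time T. The Python sketch below compares that closed form with the sum of exponential delays of rates 2ℓρ, 2(ℓ − 1)ρ, . . . , 2ρ; all names and parameter values are illustrative.

```python
import math
import random

def prob_all_collected(ell, rho, T):
    """Closed form: each of ell independent rate-2*rho coupons arrives by time T."""
    return (1 - math.exp(-2 * rho * T)) ** ell

def simulate_total_delay(ell, rho, T, trials, rng):
    """Fraction of trials in which the summed inter-collection delays are at most T."""
    hits = 0
    for _ in range(trials):
        # With i coupons already collected, the next arrives at rate 2*(ell - i)*rho.
        total = sum(rng.expovariate(2 * (ell - i) * rho) for i in range(ell))
        hits += total <= T
    return hits / trials

# Usage: 5 coupons, each arriving at rate 2 * rho = 1, time horizon T = 2.
exact = prob_all_collected(5, 0.5, 2.0)
estimate = simulate_total_delay(5, 0.5, 2.0, 20000, random.Random(0))
```

The two numbers should agree up to Monte Carlo error, reflecting that the sum of the delays has the same law as the maximum of ℓ independent exponential arrival times.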
Proof of Lemma 10.4. To prove part 4, we will use the burn-in result of Lemma 7. We divide the analysis into two cases: those vertices inside and outside B_R(v) for R = √Δ. Let us start with the vertices inside B_R(v). We apply Lemma 7 to each vertex w ∈ B_R(v), at time T′ := T log Δ < n exp(Δ/C_b) for Δ sufficiently large, concluding that w is 4-above suspicion for radius 2Δ^{3/5} in X_{T′} and Y_{T′} with probability at least 1 − 2 exp(−Δ/C_b) > 1 − exp(−Δ^{3/4}) for Δ sufficiently large. Hence, if D′_{T′} is the set of disagreements of (X_{T′}, Y_{T′}) that are 200-suspect for radius 2Δ^{3/5}, we have shown that: To bound the number of disagreements outside B_R(v), we observe that each disagreement in D_{T′} \ B_R(v) comes from a path of disagreements starting at v, and having length at least R. Hence, by a union bound, we have: Summing the above bounds (8) and (9) on E(|D′_{T′} ∩ B_R(v)|) and E(|D′_{T′} \ B_R(v)|), respectively, gives the desired upper bound on |D′_{T′}|, assuming Δ0 is sufficiently large.

Analysis for "Nice" Pairs of Colorings
Lemma 10.4 shows that from a worst-case pair of colorings that differ at a single vertex, after O(n log Δ) steps, all disagreements are likely to be "nice" in the sense of being above suspicion. The heart of our rapid mixing proof will be the following result, which shows that for a pair of neighboring colorings that are "nice" (namely, above suspicion), there is a coupling of O(n) steps of the Glauber dynamics where the expected Hamming distance decreases. Also, at the end of this O(n)-step coupling, it is extremely unlikely that there are any disagreements that are not "nice".
Theorem 11. There exists C ≥ 3, and for every ε > 0, there exists Δ0, such that for every graph G = (V, E) on n vertices with maximum degree Δ > Δ0 and girth g, if either: (a) k ≥ (1 + ε)α*Δ and g ≥ 5, or (b) k ≥ (1 + ε)β*Δ and g ≥ 7, then the following hold. Suppose X_0, Y_0 differ only at v and v is 400-above suspicion for R, where Δ^{3/5} ≤ R ≤ 2Δ^{3/5}. Let T_m = Cn/ε. Then,

Pr there exists a 200-suspect disagreement for R
We will prove the above theorem in the next section.

Proving "Contraction" of the Coupling for a Worst-Case Pair of Colorings
Tying together Lemma 10 and Theorem 11, we show that for a worst-case initial pair of colorings that differ at a single vertex, after O(n log Δ) steps of the coupling, the expected Hamming distance is small.

Lemma 12.
There exists a constant C > 0, and for every ε > 0, there exists Δ0, such that for every graph G = (V, E) on n vertices with maximum degree Δ ≥ Δ0 and girth g, if either: (a) k ≥ (1 + ε)α*Δ and g ≥ 5, or (b) k ≥ (1 + ε)β*Δ and g ≥ 7, then the following holds. Let X_0, Y_0 be colorings which disagree at a single vertex v that is 400-above suspicion for R = 2Δ^{3/5}. Let T = Cn log Δ. Then,
Proof. The high level idea is to apply Theorem 11 a number of times. Let T_m = Cn/ε. First, we start from (X_0, Y_0), and run T_m steps. We use Theorem 11 to analyze the coupling for these first T_m steps. In the event that the number of disagreements has not dropped to zero after these T_m steps, we interpolate a sequence of intermediate colorings, Z_0, . . . , Z_d, so that each Z_i, Z_{i+1} differ at a single vertex, and then apply path coupling. Then for each pair of colorings Z_i, Z_{i+1} that differ at a single vertex v_i, to analyze the performance of the coupling over the next T_m steps, we apply Theorem 11 if v_i is 400-above suspicion, and otherwise we apply Lemma 10. At the end of these T_m steps we apply path coupling again and repeat the above procedure.
For colorings interpolated at time iT_m, we will use R = R_i = 2Δ^{3/5} − 2i√Δ in our applications of Theorem 11.
Let E_i denote the event that, at some time t ≤ iT_m, the Hamming distance between X_t and Y_t exceeds Δ^{2i/3}. (Note, E_1 = E_{T_m}, where E_T was defined in the statement of Lemma 10.3.) Let S_i denote the event that, at some time t ≤ iT_m, there exists a 200-suspect (for radius R_i) disagreement of X_t and Y_t. Recall that if X_t and Y_t have no 200-suspect disagreements, then the interpolated pairs of neighboring colorings have no 400-suspect disagreements.
Let H i = |X t ⊕ Y t | be the total number of disagreements at time t = iT m . We will bound the Hamming distance by considering the above events in the following manner: We now consider the summands on the right-hand side of (10) one by one.
In the following, the phrase "by path coupling" conveys the idea that if there are k disagreements at time iT m , then by applying the path coupling approach, we can bound the expected number of disagreements at time (i + 1)T m by kL where L is the bound obtained by assuming that k = 1.
It remains to bound the two terms in the right-hand side of (11). The first term, E(H_i 1(E_{i−1})), we will handle by induction. Now observe that, for E_i to occur while E_{i−1} does not, at least one of the pairs of neighboring colorings at time (i − 1)T_m must expand to Hamming distance ≥ Δ^{2/3} by time iT_m. Hence we can bound the second term by using Lemma 10.3 in the following manner. Recall that for the pair of colorings X_{(i−1)T_m}, Y_{(i−1)T_m}, our coupling applies path coupling to this pair, so that we consider a sequence of neighboring colorings, namely, pairs of colorings that differ at a single vertex. Let W_0, W_1, . . . , W_{H_{i−1}} denote this sequence of neighboring colorings. Let H_{i,j} denote the Hamming distance at time iT_m from the j-th disagreement at time (i − 1)T_m (i.e., from the pair W_{j−1}, W_j). Then we have by the triangle inequality, and where E_{i,j} is the event that H_{i,j} ≥ Δ^{2/3}, in other words, the event E_{T_m} from the statement of Lemma 10.3. Therefore, we have the following: (12) and (13) where the last step follows from Lemma 3, applied with X_j = H_{i,j}, and τ = Δ^{2/3}. Now applying Lemma 10.3 to each disagreement, starting at time (i − 1)T_m for T_m steps, we have: Returning to (11), we now have that: Therefore, by induction, since H_0 = 1, it follows that
Now for the second summand in the right-hand side of (10): by path coupling and Theorem 11.1
Now for the third and final summand in the right-hand side of (10): To bound Pr(S_i \ E_i) we apply Theorem 11.2 to each pair of neighboring colorings that arises at times jT_m for all j = 0, 1, . . . , i − 1. Since we assume event E_i does not occur, there are at most Δ^{2i/3} neighboring pairs of colorings that we need to consider for each j. For each of these neighboring pairs of colorings, we use Theorem 11.2 to bound the probability that it creates a 200-suspect disagreement within T_m steps.
Then taking a union bound over all of the ≤ iΔ^{2i/3} neighboring pairs that we need to consider, we then have that: Plugging (14), (15), and (16) into (10) we have For sufficiently large Δ0 = Δ0(ε), for Δ > Δ0 and i = log Δ, the right-hand side of (17) is at most 1/√Δ, which completes the proof of the lemma.

Proof of Main Theorem 1
Finally, we prove the mixing time is O(n log n) by analyzing the coupling for O(n log n) steps for an arbitrary pair of initial colorings.
Proof of Theorem 1. Let us define a weighted Hamming metric ρ on the space of colorings as follows. ρ(X_t, Y_t) equals the sum of the usual Hamming distance plus A times the number of 200-suspect disagreements for radius 2Δ^{3/5}. Here A = 3C / +1/2 , and we will require Δ to be large enough that which is always true for sufficiently large Δ.

Let T = Cn(log Δ)/ε.
Claim. For any i ≥ 0,
Proof of Claim. Let s denote the number of 200-suspect disagreements for radius 2Δ^{3/5} for X_{iT}, Y_{iT}, and let t denote the total number of disagreements. So ρ(X_{iT}, Y_{iT}) = t + As. Similarly, let s′ denote the number of 200-suspect disagreements for radius 2Δ^{3/5} for X_{(i+1)T}, Y_{(i+1)T}, and t′ the total number of disagreements. So ρ(X_{(i+1)T}, Y_{(i+1)T}) = t′ + As′. By Lemma 10.4 and path coupling, we have the following bound on s′: By path coupling, and applying Lemma 10.2 to the ≤ s 200-suspect disagreements and Lemma 12 to the ≤ (t − s) non-suspect disagreements we have Putting these together, we have This completes the proof of the Claim.
Now, by induction and the Claim, we have, for all i ≥ 0, where ρ_max = n + nA is the maximum possible value of ρ. Finally, we observe that if C_1 is sufficiently large relative to C then for i* = C_1 log(n/δ)/(C log Δ), we have that E(ρ(X_{i*T}, Y_{i*T})) ≤ δ. Since, by Markov's inequality, Pr(X_{iT} ≠ Y_{iT}) ≤ E(ρ(X_{iT}, Y_{iT})), the theorem follows with T_mix(δ) ≤ i*T = C_1 n log(n/δ)/ε.

PROOF OF THEOREM 11: ANALYSIS OF A "NICE" PAIR OF COLORINGS
Fix v and R as defined in the statement of Theorem 11. Recall, for X_t, Y_t ∈ Ω, their difference is denoted by D_t = X_t ⊕ Y_t. Denote their Hamming distance by H_t = |D_t|. Also, denote their cumulative difference by D_{≤t} = ∪_{s ≤ t} D_s and denote their cumulative Hamming distance by H_{≤t} = |D_{≤t}|.
The main work is to prove part 1.
Proof of Theorem 11.1. Let δ = .45, γ = ε/20 and let C_b = C_b(δ, γ) from Lemma 9. Finally, let T_b = max{C_b n, n ln(1/γ)}. Since T_m ≤ n exp(Δ/C_b) for all Δ sufficiently large, we can apply Lemma 9 to conclude that the desired local uniformity properties hold with high probability for all t ∈ I := [T_b, T_m]. For times t ∈ I we will prove that the expected Hamming distance decreases.
Hence, for t ≥ T_b, we define the following bad events:
• Let E(t) denote the event that at some time s ≤ t, H_s > Δ^{2/3}.
• For part (a) of Theorem 11, let B_2(t) denote the event that there exists a time T_b ≤ τ ≤ t and z ∈ B_{√Δ}(v) such that For part (b) of Theorem 11, let B_2(t) denote the event that there exists a time T_b ≤ τ ≤ t, z ∈ B_{√Δ}(v) and colors c_1, c_2 such that
Then we let and finally we define our good event to be For all of these events when the time t is dropped, we are referring to the event at time T_m. We will bound the Hamming distance by conditioning on the above events in the following manner, From Lemma 10.3 we know that: For the second term in the right-hand side of (18) we will prove that: Finally, for the third term in the right-hand side of (18) we will prove that: Plugging (19), (20), and (21) into (18) we have that E(H_{T_m}) < 1/3 for Δ sufficiently large, which completes the proof of Part 1 of Theorem 11.
Proof of (20). We can bound the probability of the event B_1 by a standard paths of disagreement argument. Let ℓ = √Δ. Recall, we are looking at the probability of a path of disagreements from v of length at least ℓ within T_m = Cn/ε steps, hence: for Δ sufficiently large since k > 1.45Δ and ℓ = √Δ. To bound B_2 we will use Lemma 9. Recall from the beginning of the proof of Theorem 11.1, we set δ = .45, γ = ε/20, and for C_b = C_b(δ, γ) from Lemma 9 we set T_b = max{C_b n, n ln(1/γ)}. Note the interval of times I := [T_b, T_m] we are interested in is covered by Lemma 9. Moreover, the hypothesis of Theorem 11 says v is 400-above suspicion for radius R ≥ Δ^{3/5}. Hence, for Δ sufficiently large, every z ∈ B_{√Δ}(v) is 400-above suspicion for the constant radius R(γ, δ) required by the hypothesis of Part 2 of Lemma 9. Therefore, the desired bound on the local uniformity property of a vertex z fails with probability that is exponentially small in Δ. More precisely, we have that: Summing the bounds in (22) and (23) implies (20).
Proof of (21). We will bound the expected change in H(X_t, Y_t) using path coupling. Thus, let W_0 = X_t, W_1, W_2, . . . , W_h = Y_t be a sequence of colorings where h = H(X_t, Y_t) and W_{i+1} is obtained from W_i by changing the color of one vertex w_i from X_t(w_i) to Y_t(w_i). We maximally couple W_i and W_{i+1} in one step of the Glauber dynamics to obtain W′_i, W′_{i+1}. More precisely, both chains recolor the same vertex, and maximize the probability of choosing the same new color for the chosen vertex. Consider a pair W_i, W_{i+1}. With probability 1/n both chains recolor w_i to the same color, and the distance decreases by one. Consider z ∈ N(w_i), and let c_1 = W_i(w_i) and c_2 = W_{i+1}(w_i). Note, color c_1 is not valid for z in W_i, however, it is valid in If at least one of these two cases hold, with probability at most 1/(n min{A(W_i, z), A(W_{i+1}, z)}), vertex z is recolored to different colors in the two chains. Otherwise z will be recolored the same in both chains. Therefore, given W_i, W_{i+1}, In any coloring every vertex has at least k − Δ available colors. Since k − Δ ≥ Δ/3, we have the following trivial bound. Given W_i, W_{i+1}, Therefore, given X_t, Y_t, This bound will only be used for the burn-in phase of T_b steps. We will need to do significantly better for the remaining T_m − T_b steps of an epoch. Assume that G(t) holds. We will bound the distance in (24) separately for part (a) and part (b) of Theorem 1.
Suppose G has girth ≥ 5 and k = (1 + ε)α*Δ, ε < .3. (Note, the choice of the constant .3 is arbitrary and to prove the theorem it suffices to consider the case when ε is upper bounded by any constant.) For all 0 ≤ i ≤ h, z ∈ B_R(v), all t ∈ [T_b, T_m − 1], assuming G(t) occurs, we have that: where the first inequality comes from assuming E(t) does not occur, and the second inequality comes from assuming B_2(t) does not occur. Hence, for t ∈ [T_b, T_m], given W_i, W_{i+1}, and assuming G(t) occurs we have that: Similarly, suppose G has girth ≥ 7 and k = (1 + ε)β*Δ, ε < .3. For all 0 ≤ i ≤ h, z ∈ B_R(v), c_1, c_2 ∈ [k], all t ∈ [T_b, T_m − 1], assuming G(t) occurs: since A(X_t, y) ≥ k − Δ > Δ/3. Plugging (28) into (24) proves (27) for part (b) of the theorem.

E(H t+1 1(G(t))) = E(E(H
The above derivation deserves some words of explanation. In brief, the first equality is Fubini's Theorem, the second is because G(t) is determined by X 0 , Y 0 , . . . , X t , Y t . The first inequality uses (29), and the second inequality uses G(t) ⊂ G(t − 1). By induction, it follows that The result follows from the choice of constants (note, H 0 = 1).
This completes the proof of Theorem 11.1. Now we will prove Part 2 of Theorem 11.
Proof of Theorem 11.2. Consider the event B_1 that D_{T_m} ⊄ B_{√Δ}(v). Recall (from the proof of part 1 of this theorem) the event B_1 is defined as the event D_{≤T_m} ⊄ B_{√Δ}(v). Hence, by (22) we have: Hence, we can assume the disagreements are contained in B_{√Δ}(v). By the hypothesis of Theorem 11, each vertex w ∈ B_{√Δ}(v) is 400-above suspicion for radius R − √Δ in both X_0 and Y_0. Therefore, by Lemma 8, each vertex w ∈ B_{√Δ}(v) is 4-above suspicion for radius R − √Δ − 2 in X_{T_m} and Y_{T_m} with probability at least 1 − exp(−Δ/C_b). Therefore, all w ∈ B_{√Δ}(v) are 4-above suspicion for radius R − √Δ − 2 in X_{T_m} and Y_{T_m} with probability at least 1 − exp(−√Δ). We have proven that all disagreements between X_{T_m} and Y_{T_m} are 4-above suspicion for radius R − √Δ − 2 with probability ≥ 1 − 2 exp(−√Δ), which proves Theorem 11.2.