Stochastic Steiner Trees without a Root

This paper considers the Steiner tree problem in the model of two-stage stochastic optimization with recourse. This model, the focus of much recent research [1–4], tries to capture the fact that many infrastructure planning problems must be solved in the presence of uncertainty, and that we have to make decisions knowing merely market forecasts (and not the precise set of demands); by the time the actual demands arrive, the costs may be higher due to inflation. In the context of the stochastic Steiner tree problem on a graph G = (V, E), the model can be paraphrased thus: on Monday, we are given a probability distribution π on subsets of vertices, and can build some subset E_M of edges. On Tuesday, a set of terminals D materializes (drawn from this distribution π). We now have to buy edges E_T so that the set E_M ∪ E_T forms a Steiner tree on D. The goal is to minimize the expected cost of the solution. We give the first constant-factor approximation algorithm for this problem. This is, to the best of our knowledge, the first O(1)-approximation for the stochastic version of a non-sub-additive problem. In fact, algorithms for the unrooted stochastic Steiner tree problem we consider in this paper are powerful enough to solve the Multicommodity Rent-or-Buy problem, itself a topic of much recent interest [6–8].


Introduction
Real-world planning problems often have a significant component of uncertainty. For instance, when designing networks, the precise demand patterns and future costs of building capacity are often unknown to begin with, and only become clear as time progresses. However, with our increasing ability to collect statistical data, and the development of sophisticated and realistic forecast models, the paradigm of stochastic optimization has gained much traction. Indeed, we can now aim to solve a wider class of problems: given not a single input, but a distribution over inputs, we want to find a solution that is good in expectation (taken with respect to the randomness in the model).
In this paper, we study the problem of connecting a group of terminals by a Steiner tree in a stochastic setting. In the classical Steiner tree problem, we are given an undirected graph G = (V, E) with edge costs c_e, and a group of terminals g = {t_1, t_2, ..., t_k}; the goal is to find a minimum-cost subset of edges that connects all these terminals. We consider this problem when the group g is not deterministically given in advance; instead, it is given by a random variable Γ, with Pr[Γ = g] being the probability that we will be required to build a network that connects a particular group g ⊆ V of terminals. As sketched in the abstract, we work in the model of two-stage stochastic optimization with recourse.
- In the first stage, we assume we have (some) knowledge of the distribution of the random variable Γ. Armed with this information, we construct a network F_0 ⊆ E of edges bought as the first, anticipatory part of the solution.
- In the second stage, we learn a group g ⊆ V of terminals that is a realization of the random variable Γ. We have to purchase an additional augmenting set F_1(g) of edges to ensure that F_0 ∪ F_1(g) connects the terminals of g. The problem is interesting when the edges bought in the second stage have a higher cost (due to inflation, or because the second phase has to be built on short notice). We use σ > 1 to denote the inflation factor by which the edges are more expensive.
Our goal is to minimize the expected cost of the two-stage solution. If we define c(F) = Σ_{e∈F} c_e, and denote the first and second stage solutions by F_0 ⊆ E and F_1(g) ⊆ E, the objective is to minimize

  c(F_0) + σ · E_{g←Γ}[c(F_1(g))].

Our results. The main quantitative result of this paper is the following:

Theorem 1. There is a 12.6-approximation algorithm for the two-stage stochastic (unrooted) Steiner tree problem.
Note that while the stochastic Steiner tree problem has been considered in previous papers [1,3,5], their model is subtly but significantly different. All these works make the crucial assumption that there is a fixed root r, and the goal is to connect the group g to the root r. This assumption, while a trifling detail in the deterministic case, turns out to make a big difference in the stochastic setting, requiring us to develop new techniques. For example, a fact used in one way or another by all previous results was that the first stage solution F_0 in the rooted case can be assumed to be a connected tree containing the root; this is just not true in the unrooted case: in fact, insisting on a connected first stage network may cost arbitrarily more than the optimum solution. Indeed, our result is the first approximation algorithm given for a problem that is not sub-additive, and it requires us to interpret and use cost-sharing ideas in a novel way.
A note on the distributions. The distribution π of the random variable Γ is an object whose size may be exponential in |V|, but there are ways to cope with this fact. There may be succinct representations of π: in the independent decisions model, each vertex v ∈ V independently has a probability p_v of being included in Γ, which gives us an easy-to-represent product distribution. In the scenario model, the distribution π is given by an explicit list of pairs (g_i, p_i), with Σ_i p_i = 1; here p_i is the probability that the group g_i appears. Note that the algorithm is now allowed to run in time polynomial in the length of the list. In the sampling oracle model, the distribution π can be arbitrary; the algorithm accesses it only through a sampling oracle. Upon request, the oracle outputs a group g that is drawn from the distribution π (or, equivalently, is a realization of the random variable Γ). Our algorithm works in the most general, sampling oracle model.
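For concreteness, the two succinct models can both be wrapped behind the sampling oracle interface that our algorithm assumes. The sketch below is illustrative; the function names are ours, not the paper's:

```python
import random

def independent_decisions_oracle(p):
    """Sampling oracle for the independent decisions model: vertex v joins
    the group independently with probability p[v]."""
    def sample():
        return frozenset(v for v, pv in p.items() if random.random() < pv)
    return sample

def scenario_oracle(scenarios):
    """Sampling oracle for the scenario model: `scenarios` is an explicit
    list of (group, probability) pairs whose probabilities sum to 1."""
    groups, probs = zip(*scenarios)
    def sample():
        return random.choices(groups, weights=probs)[0]
    return sample
```

Either oracle, called repeatedly, yields independent realizations of Γ, which is all the boosted sampling framework below requires.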
(We can also handle the case when the inflation parameter σ is random; for simplicity of exposition, we defer that discussion to the final version of the paper.)

Related work.
As already mentioned, several papers studied the rooted version of the stochastic Steiner tree problem. Immorlica et al. [1] give an O(log n)-approximation in the independent decisions model, while [3] and [5] give constant-factor approximation algorithms for the oracle and scenario models, respectively. Karger and Minkoff [9] and Hayrapetyan et al. [10] study the maybecast problem, where one is to output a single tree T, minimizing the expected size of the smallest subtree of T spanning a random set of terminals. While technically this is also a stochastic problem, the recourse action is fixed, and the only randomness present is in the objective function. Gupta et al. [3] give a simple boosted sampling framework to convert an algorithm for a deterministic minimization problem into an algorithm for its stochastic counterpart. Their framework relies crucially on two ingredients: the deterministic version of the problem at hand has to be sub-additive, and it has to admit an approximation algorithm with a strict cost-sharing function. Since the unrooted Steiner tree problem is not sub-additive (i.e., if T_1 is a solution for terminal set g_1, and T_2 for g_2, then T_1 ∪ T_2 may not be a solution for g_1 ∪ g_2), we cannot apply their techniques directly here.
The general area of stochastic optimization is studied heavily in the operations research community, dating back to the seminal works of Dantzig [11] and Beale [12] in the 1950s; the books [13,14] and the monograph [15] can serve as introductions for the interested reader. Much of the work related to combinatorial optimization problems in this area has been concerned with finding and characterizing optimal solutions, either for restricted classes of inputs or with algorithms without polynomial running time guarantees. Recently, there has been some work on taking solutions to stochastic linear programs and rounding them to obtain approximation algorithms for the stochastic problems [4]; however, it is not clear how to apply those techniques to the Steiner tree problem.
The Boosted Sampling Framework. Gupta et al. [3] propose the Boosted Sampling framework of Figure 1.1 to solve any two-stage stochastic problem Π where the set Γ of demand points is stochastic.
One would naturally expect that, in the case of stochastic Steiner tree, the deterministic algorithm of Step 2 would build a Steiner tree on the set of terminals g_1 ∪ g_2 ∪ ... ∪ g_σ. In fact, if the support of Γ were only on sets containing a fixed root r, the analysis of [3] shows that this is enough to obtain a 3.55-approximation algorithm for stochastic Steiner tree.
1: Boosted Sampling: Take σ independent samples g_1, g_2, ..., g_σ from the sampling oracle for Γ.
2: Building First Stage Solution: Use an algorithm A to find a solution to the deterministic equivalent of the problem Π on the groups g_1, g_2, ..., g_σ. Use this solution as the first stage solution to the stochastic problem.
3: Building Recourse: Once the group g of required terminals materializes, use an augmenting algorithm Aug_A to augment the first stage solution to a valid solution that satisfies g.

Fig. 1.1. The Boosted Sampling framework of [3].

Unfortunately, building a Steiner tree fails in the unrooted case. For an example, consider two groups g_1 and g_2 that are very far apart relative to their diameters; assume that Pr[Γ = g_i] · σ is large. In this case, the optimum solution must connect up each group g_i in the first stage to avoid a high second stage cost, but it should not build a link between g_1 and g_2 (to make F_0 span g_1 ∪ g_2) if it wants to avoid a high first stage cost. On the other hand, if the two groups are interspersed in the same region of the graph, the optimum solution may benefit from link sharing and hence build a single Steiner tree spanning both groups. Hence it seems natural to suggest that the algorithm A should build a forest ensuring that each group lies within a single connected component; different groups may or may not be in the same component. As it turns out, building a Steiner forest on the groups g_i is a suitable deterministic equivalent of stochastic unrooted Steiner tree; however, proving this requires a lot more work.
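Schematically, the three steps of Figure 1.1 can be written as follows; `sample_oracle`, `solve_deterministic`, and `augment` are illustrative stand-ins for the sampling oracle, the algorithm A, and the augmenting algorithm Aug_A:

```python
def boosted_sampling(sample_oracle, sigma, solve_deterministic, augment):
    """Sketch of the Boosted Sampling framework of [3].  The four arguments
    are placeholders for the components described in Figure 1.1."""
    samples = [sample_oracle() for _ in range(sigma)]   # Step 1: sample sigma groups
    first_stage = solve_deterministic(samples)          # Step 2: algorithm A
    def recourse(realized_group):                       # Step 3: augment with Aug_A
        return augment(first_stage, realized_group)
    return first_stage, recourse
```

The first stage solution is committed before Γ is realized; `recourse` is the second-stage rule applied once the actual group materializes.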
To this end, we have to show that the main theorem of [3], which relates the performance of the boosted sampling framework to the notion of strictness of certain cost-sharing functions, can be proved in our case, even though our problem is not sub-additive. The proof of this is simple, and we sketch it in Section 2. We then define the cost shares in Section 3, and prove them to be strict in Section 4.

Notation and preliminaries
Let G = (V, E) be an undirected graph with edge weights c_e. A network is simply a subset of the edges. We say that a network F is feasible for (or connects) a group of terminals g = {t_1, t_2, ..., t_k} if all the terminals of g lie in the same connected component of F. The cost of a network F is simply the sum of the costs of its edges; that is, c(F) = Σ_{e∈F} c_e.
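These two definitions are straightforward to operationalize; the sketch below (our own helper functions, not part of the paper) checks feasibility with a union-find pass and sums edge costs:

```python
def connects(n, F, g):
    """True iff the network F (a list of edges on vertices 0..n-1) places
    all terminals of the group g in one connected component."""
    parent = list(range(n))
    def find(v):                       # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v in F:
        parent[find(u)] = find(v)      # union the endpoints of each edge
    return len({find(t) for t in g}) == 1

def cost(F, c):
    """c(F) = sum of the costs of the edges of F."""
    return sum(c[e] for e in F)
```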
In the Steiner Forest problem, given a weighted undirected graph G and a list of groups of terminals D = {g_1, g_2, ..., g_n} with each g_i = {t_{i1}, ..., t_{ik_i}}, we want to construct a network F of minimum cost that is feasible for each group g_i. For a set D of terminal groups, let Sols(D) denote the set of networks that are feasible for each of the groups in D, and let OPT(D) be the network in Sols(D) of minimum cost. An algorithm A is an α-approximation algorithm for the Steiner Forest problem if, for any set D of terminal groups, it finds a network F ∈ Sols(D) with c(F) ≤ α · c(OPT(D)).

Given a group g of terminals and an existing network F ⊆ E, the goal of an augmenting algorithm is to buy a set of extra edges F' so that F ∪ F' is a network that connects the group g. For instance, given a network F_D ∈ Sols(D) that connects each of the groups in D, and a new group g ∉ D, the augmenting algorithm Aug_A seeks to find a set of edges F' of minimum cost so that F_D ∪ F' ∈ Sols(D ∪ {g}).

Definition 1. A cost-sharing function ξ is a function that, for any instance (G, D) of the Steiner forest problem, assigns a non-negative real number ξ(G, D, g_i) to every participating group g_i ∈ D.
We shall drop the reference to the graph G if it is clear from the context. Note that the cost-sharing function assigns shares to groups, and not to individual terminals.
Since the above definition is so general, let us specify some properties of these functions that we would like to have. A cost-sharing function ξ is competitive if Σ_{g∈D} ξ(D, g) ≤ c(OPT(D)) holds for any Steiner forest instance (G, D). Thus, competitive cost shares serve as a lower bound on the cost of the optimal solution. The following notion is crucial to the development of the paper, and implicitly places lower bounds on the cost shares themselves.

Definition 2. A cost-sharing function ξ is β-strict with respect to an algorithm A if there exists an augmenting algorithm Aug_A such that, for any set of demand groups D and any group g ∉ D,

  c(Aug_A(F_D, g)) ≤ β · ξ(D + g, g),    (2.2)

where F_D is the network built by A on the instance (G, D).
Remark 1. There is a fine distinction between the notion of strictness we use here and strictness as defined in [7,3]. In [7], strictness was defined only for augmentations with groups of size 2; in this paper, we allow for groups of larger sizes. However, the strictness in [3] is stronger than our notion, and allows for multiple group augmentations; the question of proving strictness by that definition remains open despite much effort.
Given all these definitions, we can now state the following theorem, which can be derived from the proof of [3, Theorem 3.1].
Theorem 2. Suppose that A is an α-approximation algorithm for deterministic Steiner forest. Then the boosted sampling algorithm of Figure 1.1 is an (α + β)-approximation algorithm for unrooted stochastic Steiner tree whenever there is a cost-sharing function ξ that is β-strict with respect to A and single group augmentations.
The proof of this theorem is simple, and closely follows the arguments in the aforementioned paper; we defer the details to the final version of the paper.

The Algorithm A and the Cost Shares ξ
In this section we review the Steiner forest algorithm of [7], although the algorithm of Becchetti et al. [6] would serve our purpose equally well. Both algorithms are extensions of the algorithms of Agrawal, Klein, and Ravi (AKR) [16] and of Goemans and Williamson (GW) [17], and are designed to "build a few extra edges" over and above the AKR-GW algorithms, while keeping the overall cost of the solution within a constant factor of the cost of the optimum. We also describe our cost-sharing method.
Recall that we are given a graph G = (V, E) and a set D of groups g_1, ..., g_n of terminals, where each group g_i = {t_{i1}, t_{i2}, ..., t_{ik_i}} ⊆ V. Before defining our algorithm, we review the dual of the standard LP relaxation of the Steiner forest problem that was used in [17]:

  max  Σ_{S⊆V} f(S) · y_S
  s.t. Σ_{S : e ∈ δ(S)} y_S ≤ c_e   for all e ∈ E,    (3.3)
       y_S ≥ 0                      for all S ⊆ V,

where f(S) is equal to 1 if S separates g_i for some i (that is, if both S ∩ g_i and (V − S) ∩ g_i are nonempty), and is 0 otherwise. Note that while the variables y_S for sets S that do not separate any group do not contribute to the dual objective function, they still play an important role in our algorithm.
We now describe a general way to define primal-dual algorithms for the Steiner forest problem. As is standard for the primal-dual approach, the algorithm will maintain a feasible (fractional) dual, initially the all-zero dual, and an integral primal solution (a set of edges), initially the empty set. The algorithm will terminate with a feasible Steiner forest, which is proved approximately optimal by comparison with the dual solution (a lower bound on the optimal cost by weak LP duality). The algorithms of [16,17] arise as particular instantiations of the following algorithm. Our presentation is closer to [16], where the "reverse delete step" of Goemans and Williamson [17] is implicit; this version of the algorithm is more suitable for our analysis.
Our algorithm has a notion of time, initially 0 and increasing at a uniform rate. At any point in time, some terminals will be active and others inactive. All terminals are initially active and eventually become inactive. At any point in time, the vertex set is also partitioned into clusters, which can again be either active or inactive. In our algorithm, a cluster will be one or more connected components (with respect to the currently built edges). Initially, each vertex is a cluster by itself, and the active clusters are exactly the terminals. We will consider different rules by which terminals and clusters become active or inactive, which we describe shortly. To maintain dual feasibility, whenever the constraint (3.3) for some edge e between two clusters S and S' becomes tight (i.e., first holds with equality), the clusters are merged and replaced by the cluster S ∪ S'. We raise the dual variables of active clusters until there are no more such clusters.
We have not yet specified how an edge can get built. Towards this end, let us define a (time-varying) equivalence relation R on the set of terminals. Initially, each terminal lies in its own equivalence class; these classes will only merge with time. When two active clusters are merged, we merge the equivalence classes of all active terminals in the two clusters. Since inactive terminals cannot become active, this rule ensures that all active terminals in a cluster are in the same equivalence class. (Note that if an active cluster merges with an inactive one, this merging of equivalence classes does not happen.) We build enough edges to maintain the following invariant: the terminals in the same equivalence class are connected by built edges. This clearly holds at the beginning, since the equivalence classes are all singletons. When two active clusters meet, the invariant ensures that, in each cluster, all active terminals lie in a common connected component. To maintain the invariant, we join these two components by adding a path between them. Building such paths without incurring a large cost is simple but somewhat subtle; Agrawal et al. [16] (and implicitly, Goemans and Williamson [17]) show how to do this. We refer the reader to [16] for the details of this procedure, instead of repeating it here. Specifying the rule by which clusters are deemed active or inactive now gives us two different algorithms:

1. Algorithm GW(G, D): A terminal t_{ij} ∈ g_i is active if the current cluster containing it does not contain the entire group g_i. A cluster is active as long as it contains at least one active terminal. This implementation of the algorithm is equivalent to the algorithms of Agrawal et al. [16] and Goemans and Williamson [17].

2. Algorithm Timed(G, D, T): This algorithm takes as an additional input a function T : V → R≥0 which assigns a stopping time to each vertex. (We can also view T as a vector with coordinates indexed by V.) A vertex j is active at time τ if j ∈ D and τ ≤ T(j). (T is defined for vertices not in D for future convenience, but such values are irrelevant, and can be imagined to be set to 0 for the rest of the paper.) As before, a cluster is said to be active if at least one terminal in it is active.
To get a feeling for Timed(G, D, T), consider the following procedure: run the algorithm GW(G, D) and set T_D(j) to be the time at which vertex j becomes inactive during this execution. (If j ∉ D, then T_D(j) is set to zero.) Since a vertex stays active for exactly the same duration of time in the two algorithms GW(G, D) and Timed(G, D, T_D), the two algorithms clearly have identical outputs. Similarly, if for each t_{ij} ∈ g_i we set T(t_{ij}) = max_{t,t' ∈ g_i} d_G(t, t'), we obtain the recent algorithm of Könemann et al. [18].
It turns out that the Timed algorithm gives us a principled way to force the GW algorithm to build additional edges: run the Timed algorithm with a vector of terminal activity times that is larger than the one naturally induced by the GW algorithm.
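To make the mechanics concrete, here is a small event-driven simulation (our own sketch, not code from the paper) of the moat-growing phase under the GW activity rule; it records which edges go tight and the deactivation times T_D that the Timed algorithm takes as input. The careful path-building and pruning of [16] is omitted:

```python
def gw_forest(n, edges, groups):
    """Sketch of moat growing in algorithm GW(G, D).  Vertices are 0..n-1,
    `edges` maps (u, v) pairs to costs, `groups` is a list of frozensets of
    terminals.  Returns the tight edges and the times T_D at which the
    terminals became inactive."""
    parent = list(range(n))
    def find(v):                       # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    def members(r):
        return {v for v in range(n) if find(v) == r}
    def active(r):                     # GW rule: cluster r splits some group
        m = members(r)
        return any(m & g and not g <= m for g in groups)
    load = dict.fromkeys(edges, 0.0)   # dual already loaded onto each edge
    T_D = {t: 0.0 for g in groups for t in g}
    done, tight, now = set(), [], 0.0
    while True:
        # next event: the inter-cluster edge with the least remaining slack
        cand = []
        for (u, v), c in edges.items():
            ru, rv = find(u), find(v)
            if ru != rv and (active(ru) or active(rv)):
                rate = int(active(ru)) + int(active(rv))
                cand.append(((c - load[(u, v)]) / rate, (u, v)))
        if not cand:
            break
        dt, (u, v) = min(cand)
        for e in edges:                # advance every edge's load to event time
            ru, rv = find(e[0]), find(e[1])
            if ru != rv:
                load[e] += dt * (int(active(ru)) + int(active(rv)))
        now += dt
        tight.append((u, v))
        parent[find(u)] = find(v)      # the edge went tight: merge the clusters
        m = members(find(v))
        for i, g in enumerate(groups): # groups now fully inside go inactive
            if i not in done and g <= m:
                done.add(i)
                for t in g:
                    T_D[t] = now
    return tight, T_D
```

Feeding γT_D back into a Timed run, as in the next paragraph, is then just a matter of scaling the returned time vector.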
The Algorithm A: The algorithm A(G, D) that we use to build the first stage solution is:

1: Run GW(G, D), and let T_D(v) be the time at which v becomes inactive.
2: Run Timed(G, D, γT_D), i.e., the timed algorithm with the above time vector T_D scaled up by a parameter γ ≥ 1, and output the resulting forest F_D.

(A technical point: when γ > 1, algorithm A may raise the dual variables of vertex sets that do not separate any group, and hence do not contribute to the value of the dual objective function. However, this will not hinder our analysis.) The fact that F_D is a feasible Steiner network for D is easily verified, using the fact that the terminals of each group g_i became inactive at the same time T_D(g_i) (equal to T_D(t_{ij}) for any t_{ij} ∈ g_i) when g_i became connected, and that γ ≥ 1. We now define the cost shares ξ.
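The two-step structure of A can be sketched as follows; `gw` and `timed` are stand-ins for the two subroutines described above, not APIs from the paper:

```python
def algorithm_A(G, D, gamma, gw, timed):
    """First-stage algorithm A (sketch): run GW to obtain the deactivation
    times T_D, then rerun the Timed algorithm with the times scaled by
    gamma >= 1 to build the inflated forest F_D."""
    _, T_D = gw(G, D)                        # step 1: record deactivation times
    scaled = {v: gamma * t for v, t in T_D.items()}
    return timed(G, D, scaled)               # step 2: build the forest F_D
```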
The Cost Shares ξ: We want the cost share of a group g_i to account for the growth of components that grow solely because they contain terminals from g_i. Let a(g_i, τ) be the number of active clusters at time τ in the execution of GW(G, D) that contain a terminal from g_i but do not contain any active terminals outside g_i. We define the cost share of g_i to be

  ξ(G, D, g_i) = ∫ a(g_i, τ) dτ,    (3.4)

where the integral is over the entire execution of the algorithm. Note that the cost shares defined by Equation (3.4) do not account for the full cost of the dual solution y, as the cost of growing clusters that contain active terminals from more than one group is not reflected at all. We could fix this by dividing the cost of growing such mixed clusters among the participating groups in some way; however, we do not see how to use this to improve our approximation ratio.
Augmentation Algorithm Aug_A: A practical augmenting algorithm Aug_A would simply contract all edges of F_D, and then find an approximate Steiner tree on the terminals of g in this contracted graph G/F_D. However, in order to bound the second stage cost, we build a specific Steiner tree on g in G/F_D, and argue that the cost of this tree can be bounded by β · ξ(D + g, g) for some β ∈ R. The construction of this tree is implicit in the proof of Theorem 4, and the tree can be found in polynomial time if required. In the following, we let Aug_A be the algorithm that constructs this implicit tree. Our main technical result is thus the following.
Theorem 3. For any γ > 2, A is an α = (γ + 1)-approximation for the Steiner network problem, and ξ is a β = 4γ/(γ − 2)-strict cost-sharing method with respect to the algorithms A and Aug_A.
Proof. The fact that A is a (γ + 1)-approximation can be proved along the lines of [6, Lemma 3.1]; we postpone the details to the full version of the paper. The proof of strictness (Theorem 4) is the analytical heart of this paper, and is given in the following section.

Proving strictness
Our analysis follows a fairly natural line that was also used in [7]. We start by fixing a set D of demand groups and a group g ∉ D. To prove strictness of our cost shares, we compare two executions of the GW algorithm: the inflated algorithm A(G, D) on the set of groups D that results in the forest F_D, and the uninflated algorithm GW(G, D + g), which is responsible for computing the cost share ξ(D + g, g).
Recall that we have to show that g can be connected in F_D with cost at most O(ξ(D + g, g)). We prove this in the following theorem, which also implicitly describes the augmenting algorithm Aug_A. In the rest of the discussion, we will assume that γ > 2.
Theorem 4. There is a tree in the graph G/F_D that spans all terminals of g and has cost at most 4γ/(γ − 2) · ξ(D + g, g); this tree can be constructed in polynomial time.
The main difficulty in proving Theorem 4 arises from the fact that the two executions A(G, D) and GW(G, D + g) may be very different. Hence it is not immediately clear how to relate the cost of augmenting the forest F_D produced by the former to the cost share ξ(D + g, g) computed by the latter. To make a direct comparison possible, we work through some transformations that allow us to find a mapping between dual variables in these two executions. In the grand finale, we produce a tree T that spans the terminals of g, and show that a 1/β fraction of its edges is covered by dual variables corresponding to the cost share of g, which will complete the proof. Let us introduce some time vectors to facilitate this comparison.
- Let T_D be the time vector obtained by running GW(G, D). Recall that F_D is the forest constructed by Timed(G, D, γT_D); we also let R_D be the equivalence relation constructed by the latter algorithm.
- Let T_{D+g} be the time vector generated by the execution GW(G, D + g), and let τ = T_{D+g}(g) be the time when the terminals of g got connected in this execution.
- Let T be the vector obtained by truncating T_{D+g} at time τ; that is, T(v) = min(T_{D+g}(v), τ). (The intuition for T is loosely this: we do not care about time after g has been connected, and this truncation captures that fact.)
- Finally, let T_{-g} be the vector T with g "taken out"; that is, T_{-g}(v) = T(v) for v ∉ g, and T_{-g}(v) = 0 for v ∈ g.

A side-by-side comparison of the executions GW(G, D) and GW(G, D + g) shows that

  T_{-g}(v) ≤ T_D(v)  for all v ∈ V;    (4.5)

the simple inductive proof is omitted. Hence, we will use the forest constructed by Timed(G, D, γT_{-g}) as a proxy for the forest F_D created by Timed(G, D, γT_D); intuitively, since T_{-g} is smaller than T_D, it should also produce a forest with fewer edges. We will make this intuition precise in Lemma 1 below.
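The two vector operations used here are elementary; written out as code (with our illustrative names `truncate` and `take_out`, time vectors as dictionaries):

```python
def truncate(T, tau):
    """The vector T truncated at time tau: each coordinate capped at tau."""
    return {v: min(t, tau) for v, t in T.items()}

def take_out(T, g):
    """The vector T with the group g 'taken out': coordinates of g zeroed."""
    return {v: (0.0 if v in g else t) for v, t in T.items()}
```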
To state the lemma in a general form that will be useful later, we need some more notation. For two weighted graphs G and G' on the same vertex set V, we write G' ≤ G if the shortest path distance between any pair of vertices (u, v) in G' is no more than their distance in G. For a graph G = (V, E) and a set F ⊆ (V × V), the graph G' = G/F is a contraction of G, obtained by adding a zero-cost edge to G between every pair (u, v) ∈ F. Since a relation R is a subset of V × V, we can define G/R in the same way. It immediately follows that if G' is a contraction of G, then G' ≤ G. For time vectors, let T' ≤ T denote coordinate-wise inequality (and hence we can rewrite (4.5) as T_{-g} ≤ T_D).

Lemma 1 ([7]). Let G' ≤ G be two weighted graphs and T' ≤ T be two time vectors. Then, for the equivalence relation R produced by the execution Timed(G, D, T') and the equivalence relation R' produced by the execution Timed(G', D, T), it holds that R ⊆ R'.
A simpler graph H: We now define a simpler graph H = G/R_{-g}, where R_{-g} is the equivalence relation produced by Timed(G, D, γT_{-g}); this graph H will act as a proxy for G/F_D in the following sense. For two vertices u, v connected by a zero-cost path in H, we know that u and v are connected by a path in F_D. This is because the inequality T_{-g} ≤ T_D used with Lemma 1 implies that R_{-g} ⊆ R_D; now the invariant maintained by the algorithm Timed implies that there is a path connecting u and v in F_D whenever (u, v) ∈ R_D.
Thus, to prove Theorem 4, it suffices to exhibit a tree T in H that spans all terminals of g and has cost at most 4γ/(γ − 2) · ξ(D + g, g). By the properties of the graph H, it then follows that the network T ∪ F_D is feasible for the group g.
Note that each equivalence class of R_{-g} can also be thought of as a single (super)vertex of the graph H; this view may be more useful in some contexts. To complete the correspondence between the two views, let us extend the definition of a time vector to supernodes in the natural way: if w_C is the supernode corresponding to an equivalence class C of the relation R_{-g}, we let T(w_C) = max_{v∈C} T(v); this allows us to talk about running the Timed algorithm on H with the vector T.

The tree T spanning terminals of g
We will obtain the desired Steiner tree on the group g in H by considering the execution of the algorithm Timed(H, D + g, T); we denote this execution by E. Recall that the time vector T was defined to ensure that in the execution Timed(G, D + g, T) on the original graph G, the terminals of g eventually merge into a single equivalence class of the respective relation R. Since the graph H is a contraction of G, it follows from Lemma 1 that the terminals of g must end up in the same equivalence class in E, and hence in the same connected component of the forest constructed by E. There is a unique minimal tree that spans the terminals of g in this forest; let T denote this tree.
Since T was constructed by the execution E, all of its edges must be fully tight with the dual grown in E. Our plan of attack is to show that the dual variables corresponding to the terminals of g account for a significant fraction of this dual, and hence the cost share of g must be large enough to pay for a 1/β fraction of the tree. To pursue this plan, we introduce the following notion of layers, as in [7]; this terminology is just a convenient way of talking about "dual moats".
In an execution of an algorithm, a layer (C, I) corresponds to an active cluster C whose dual variable y_C has been growing during the time interval I = [τ_1, τ_2); the thickness of this layer is |I| = τ_2 − τ_1. A layering L of an execution is a set of layers such that, for every time τ and every active cluster C, there is exactly one layer (C, I) ∈ L such that τ ∈ I.
Lonely layers: A layer (C, I) is lonely if it does not contain any active terminals except terminals belonging to g. Thus, the cost share of g can be expressed as the total thickness of lonely layers in any layering of Timed(G, D + g, T). Using Lemma 1, we can argue that the total thickness of lonely layers in the execution E is no more than in Timed(G, D + g, T) (see [7] for details). Hence the total thickness of lonely layers in the execution E is a lower bound on the cost share of g.
We lower bound the thickness of lonely layers by arguing that the thickness of non-lonely layers intersecting T is significantly smaller than the length of T: since all of T has to be covered, this leaves a considerable fraction of the tree to be covered by lonely layers. Hence our overall goal can be reduced to giving an upper bound on the thickness of non-lonely layers that intersect the tree T.
To get a hold on this quantity, we compare a layering L of the execution E (recall that E = Timed(H, D + g, T)) with a layering L' of its inflated counterpart E' = Timed(H, D, γT_{-g}). We construct a mapping that maps every non-lonely layer ℓ = (C, I) ∈ L to a distinct layer ℓ' = (C', γI) ∈ L' that is γ times thicker. (Note that lonely layers do not have a natural counterpart, as the terminals of g do not appear at all in the execution E'.) To ensure the existence of such a mapping, we align the two layerings so that the layer boundaries of L' fall exactly at γ times the layer boundaries of L.

Mapping non-lonely layers of L to layers of L': Every non-lonely layer ℓ = (C, [τ_1, τ_2)) must contain a terminal t ∈ C with t ∉ g that was active during the interval [τ_1, τ_2). Since T_{-g} agrees with T on terminals outside g, the terminal t must have been active in the interval [γτ_1, γτ_2) in the execution E', and hence there is a unique layer ℓ' = (C', [γτ_1, γτ_2)) ∈ L' such that t ∈ C'. We map ℓ to ℓ'. A layer may contain multiple active terminals outside g; in that case, pick one of them arbitrarily.
The following two lemmas supply us with all the ammunition we will need to finish our argument. In the next lemma, let V(T) denote the vertex set of the tree T.
Lemma 2. The mapping from non-lonely layers of L to layers of L' is one to one; that is, distinct layers of L map to distinct layers of L'.

Lemma 3. Let ℓ = (C, I) ∈ L be a non-lonely layer such that V(T) ∩ C ≠ ∅. Then, for its corresponding layer ℓ' = (C', γI), we have that V(T) ∩ C' ≠ ∅.
The proof of the former is in Appendix A; the latter follows from [7, Lemmas 4.16 and 4.17].

The bookkeeping
Let L and N denote the total thickness of lonely and non-lonely layers that intersect the tree T. Note that we count every layer only once, irrespective of how many edges of T it cuts. Since every edge of T is fully tight, we can express the total length of the tree as

  |T| = L + N + X,    (4.6)

where X represents the "extra" contributions of layers that intersect T more than once.
(For example, if a lonely layer intersects T in three edges, it is counted once in L and twice in X.)
At any time instant τ, consider all the active clusters in the execution E that have a non-empty intersection with the tree T. We claim that any such cluster C "carves out" a connected portion of the tree T, that is, C ∩ T is a connected graph. Hence, if we construct a graph with a node for every cluster intersecting T and an edge between every pair of clusters connected by a direct path along T, this graph will also be a tree. The number of layers intersecting T is equal to the number of nodes in this graph; the number of times each layer intersects T is equal to the degree of the corresponding vertex. Since the average vertex degree in a tree is at most 2, the number of intersections at any time is bounded by twice the number of layers intersecting T. Integrating over the course of the execution E, we obtain

  L + N + X ≤ 2(L + N).    (4.7)

A non-lonely layer ℓ is considered wasted if ℓ intersects T but its image ℓ' does not. According to Lemma 3, this happens only if T is fully contained inside ℓ'. Let W denote the total thickness of wasted layers. The total thickness of layers of L' intersecting T is a lower bound on the length of T. Since the image of every non-lonely layer that intersects T and is not wasted also intersects T, and because images of distinct layers do not overlap, we get the following lower bound on the length of T:

  γ(N − W) ≤ |T|.    (4.8)

The final piece of our argument is the following claim: for every layer that is wasted, there must be a lonely layer growing at the same time, and hence W ≤ L. To see this claim, suppose that a non-lonely layer ℓ = (C, I) intersects T but is wasted; hence, for its inflated image ℓ' = (C', γI), we have V(T) ⊆ C'. Since ℓ intersects T, there must be a terminal t ∈ g such that t ∉ C.
We now claim that during the interval I, the terminal t must have been part of a lonely cluster. Indeed, suppose not; let t be inside a non-lonely layer ℓ_1 = (C_1, I) together with some other active terminal t_1 ∉ g. But then, by Lemma 3, the inflated image ℓ'_1 = (C'_1, γI) of this layer must contain some vertex of T, and since V(T) ⊆ C', the layers ℓ' and ℓ'_1 have a nonempty intersection. This is possible only if ℓ' and ℓ'_1 are the same inflated layer, which contradicts Lemma 2, as the clearly distinct layers ℓ and ℓ_1 would then map to the same layer ℓ' = ℓ'_1. Thus,

  W ≤ L.    (4.9)

Combining the inequalities (4.6)-(4.9), we obtain (γ − 2)|T| ≤ 4γL, thus proving Theorem 4.
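For completeness, the arithmetic that combines (4.6)-(4.9) can be spelled out, writing |T| for the length of the tree:

```latex
\begin{align*}
  |T| &= L + N + X, & &\text{(4.6)}\\
  L + N + X &\le 2(L + N), & &\text{(4.7)}\\
  \gamma\,(N - W) &\le |T|, & &\text{(4.8)}\\
  W &\le L. & &\text{(4.9)}
\end{align*}
% From (4.6) and (4.7): |T| <= 2(L+N), i.e. N >= |T|/2 - L.
% Plugging this and (4.9) into (4.8):
\[
  |T| \;\ge\; \gamma\,(N - W) \;\ge\; \gamma\Bigl(\tfrac{|T|}{2} - 2L\Bigr)
  \qquad\Longrightarrow\qquad (\gamma - 2)\,|T| \;\le\; 4\gamma L .
\]
```

Dividing by |T| shows that lonely layers cover at least a (γ − 2)/4γ = 1/β fraction of the tree, as required.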