Where’s the Winner? Max-Finding and Sorting with Metric Costs

Abstract. Traditionally, a fundamental assumption in evaluating the performance of algorithms for sorting and selection has been that comparing any two elements costs one unit (of time, work, etc.); the goal of an algorithm is to minimize the total cost incurred. However, a body of recent work has attempted to find ways to weaken this assumption; in particular, new algorithms have been given for these basic problems of searching, sorting and selection when comparisons between different pairs of elements have different associated costs. In this paper, we further these investigations, and address the questions of max-finding and sorting when the comparison costs form a metric; i.e., the comparison costs c_uv respect the triangle inequality c_uv + c_vw ≥ c_uw for all input elements u, v and w. We give the first results for these problems. Specifically, we present
– An O(log n)-competitive algorithm for max-finding on general metrics, which we improve to an O(1)-competitive algorithm for max-finding in constant-dimensional spaces.
– An O(log^2 n)-competitive randomized algorithm for sorting with metric costs.


Introduction
The questions of optimal searching, sorting, and selection lie at the very basis of the field of algorithms, with a vast literature on algorithms for these fundamental problems [1]. Traditionally, the fundamental assumption in evaluating the performance of these algorithms has been the unit-cost comparison model: comparing any two elements costs one unit, and one of the goals is to devise algorithms that minimize the total cost of the comparisons performed. Recently, Charikar et al. [2] posed the following problem: given a set V of n elements, where the cost of comparing u and v is c_uv, how should we design algorithms for sorting and selection so as to minimize the total cost of all the comparisons performed? Note that these costs c_uv are known to the algorithm, which can use them to decide on the sequence of comparisons. The case where all comparison costs are identical is just the unit-cost model. To measure the performance of algorithms in this model, Charikar et al. [2] used the framework of competitive analysis: they compared the cost incurred by the algorithm to the cost incurred by an optimal algorithm to prove the output correct.
The paper of Charikar et al. [2], and subsequent work by the authors [3] and by Kannan and Khanna [4], considered sorting, searching, and selection for special cost functions (which are described in the discussion on related work). However, there seems to be no work on the case where the comparison costs form a metric space, i.e., where costs respect the triangle inequality c_uv + c_vw ≥ c_uw for all u, v, w ∈ V. Such situations may arise if the elements reside at different places in a communication network, and the communication cost of comparing two elements is proportional to the distance between them. An equivalent, and perhaps more natural, way of enforcing the metric constraint on the costs is to say that the vertices in V lie in an ambient metric space (X, d) (where V ⊆ X), and the cost c_ij of comparing two vertices i and j is the distance d(i, j) between them.
Our Results. In this paper, we initiate the study of these problems: in particular, we consider the problems of max-finding and sorting with metric comparison costs. For the max-finding problem, our results show that the lower bound of Ω(n) on the competitive ratio arises only for arguably pathological scenarios, and substantially better guarantees can be given for metric costs.

Theorem 1. Max-finding with metric costs has an O(log n)-competitive algorithm.
We improve the result for the special case when the points are located in d-dimensional Euclidean space:

Theorem 2. There is an O(d^3)-competitive randomized algorithm for max-finding when the nodes lie on the d-dimensional grid and the distance between points is given by the ℓ_∞ metric; this yields an O(d^{3(1+1/p)})-competitive algorithm for d-dimensional ℓ_p space.
For the problem of sorting n elements in a metric, we give an O(log n)-competitive algorithm for the case of hierarchically well-separated trees (HSTs). We then use standard results of Bartal [5] and Fakcharoenphol et al. [6] from the theory of metric embeddings to extend our results to general metrics. Our main theorem is the following:

Theorem 3. There is an O(log^2 n)-competitive randomized algorithm for sorting with metric costs.
It can be seen that any algorithm for sorting with metric costs must be Ω(log n)-competitive even when the points lie on a star or a line; indeed, one can model unit-cost sorting and searching of sorted lists in these cases. The question of closing the logarithmic gap between the upper and lower bounds remains an intriguing one.
Our Techniques. For the max-finding algorithm for general metrics, we use a simple algorithm that runs in O(log n) rounds, eliminating half the elements in each round while paying at most OPT per round. Getting better results turns out to be non-trivial even for very simple metrics: an illuminating example is the case where the comparison costs for elements V = {v_1, v_2, ..., v_n} are given by c_{v_i v_j} = |i − j|; i.e., when the metric is generated by a path. (Note that this path has no relationship to the total order on V: it merely specifies the costs.) Indeed, an O(1)-competitive algorithm for the path requires some work: one natural idea is to divide the line into two pieces, recursively find the maximum in each of these pieces, and then compare the two maxima to compute the overall maximum. However, a closer look indicates that this algorithm also gives us a competitive ratio of Ω(log n). To fix this problem and reduce the expected cost to O(OPT), we make a simple yet important change: we run two copies of the above algorithm in parallel, transferring a small amount of information between the two copies after every round of comparisons. Remarkably, this subtle change in the algorithm gives us the claimed O(1)-competitive ratio for the line. In fact, this idea extends to the d-dimensional grid to give us O(d)-competitive algorithms; while the algorithm remains virtually unchanged, the proof becomes quite non-trivial in d dimensions.
For the results on sorting, we first develop an algorithm for the case when the metric is a k-HST (a tree where the edges from each vertex to its children are k times shorter than the edge to its parent). For these HSTs, we show how to implement a "bottom-up mergesort" to get an (existentially optimal) O(log n)-competitive algorithm; this is then combined with standard techniques to get the O(log^2 n)-competitiveness for general metrics.
Previous Work. The study of the arbitrary cost model for sorting and selection was initiated by Charikar et al. [2]. They showed an O(n)-competitive algorithm for finding the maximum for general metrics. Tighter upper and matching lower bounds (up to constants) for finding the maximum were shown by Hartline et al. [7] and independently by the authors [3].
In the latter paper [3], the authors considered the special case of structured costs, where each element v_i is assumed to have an inherent size s_i, and the cost c_{v_i v_j} of comparing two elements v_i and v_j is of the form f(s_i, s_j) for some function f; as expected, better results could be proved if the function f was "well-behaved". Indeed, for monotone functions f, they gave O(log n)-competitive algorithms for sorting, O(1)-competitive algorithms for max-finding, and O(1)-competitive algorithms for selection for the special cases of f being addition and multiplication. Subsequently, Kannan and Khanna [4] gave an O(log^2 n)-competitive algorithm for selection with monotone functions f, and an O(1)-competitive algorithm when f is the min function.
Formal Problem Definition. The input to our problems is a complete graph G = (V, E), with |V| = n vertices. These vertices represent the elements of the total order, and hence each vertex v ∈ V has a distinct key value denoted by key(v). (We use x ≺ y to denote that key(x) ≤ key(y).) Each edge e = (u, v) has a non-negative length or cost c_e = c_uv, which is the cost of comparing the elements u and v. We assume that these edge lengths satisfy the triangle inequality, and hence form a metric space.
In this paper, we consider the problem of finding the element in V with the maximum key value, and the problem of sorting the elements in V according to their key values. We work in the framework of competitive analysis, and compare the cost of the comparisons performed by our algorithm to the cost incurred by the optimal algorithm, which knows the results of all pairwise comparisons and just has to prove that the solution produced by it is correct. We shall denote the optimal solution by OPT ⊆ E. Given a set of edges E' ⊆ E, we will let c(E') = Σ_{e ∈ E'} c_e, and hence c(OPT) is the optimal cost. Note that a proof for max-finding is a rooted spanning tree of G with the maximum element at the root, where the key values of vertices monotonically increase when moving from any leaf to the root; for sorting, a proof is the Hamiltonian path where key values monotonically increase from one end to the other.
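To make the notion of a proof concrete, here is a small Python sketch (our own illustration, not from the paper; the function name and the parent-map representation are ours) that checks a claimed max-finding certificate and reports its cost. Since keys are distinct and must strictly increase along every child-to-parent edge, a valid certificate is automatically acyclic.

```python
def verify_max_proof(parent, key, cost):
    """Verify a max-finding certificate: `parent` maps each vertex to the
    vertex that beat it (the claimed maximum maps to None).  Valid iff there
    is exactly one root and keys strictly increase along every child->parent
    edge; returns the certificate's total comparison cost."""
    roots = [v for v, p in parent.items() if p is None]
    assert len(roots) == 1, "certificate must have a unique root"
    total = 0.0
    for v, p in parent.items():
        if p is None:
            continue
        assert key(v) < key(p), "each comparison must be won by the parent"
        total += cost(v, p)
    return total
```

For instance, on the unit line with five elements in increasing key order, the star rooted at the maximum is a valid (though not optimal) certificate.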

Max-Finding in Arbitrary Metrics
For arbitrary metrics, we give an algorithm for finding the maximum element v_max of the nodes in V; the algorithm incurs cost at most O(log n) × c(OPT). Our algorithm proceeds in stages. In stage i, we have a subgraph G_i = (V_i, E_i) such that V_i contains v_max; here V_i ⊆ V, and E_i = V_i × V_i with the same costs as in G. We start with G_0 = G; in stage i, if G_i has a single node, then it must be v_max, else we perform the following steps.
1. Find a minimum-cost almost-perfect matching M_i in G_i. (I.e., at most one node remains unmatched.)
2. For every edge e = (u, v) ∈ M_i, compare the end-points of e. (If u is greater than v, then u "wins" and v "loses".)
3. Delete the nodes which lost in the comparisons above to get the new set V_{i+1} from V_i.
It is clear that the above algorithm correctly finds v_max; there are O(log n) stages since |V_i| ≤ ⌈n/2^i⌉, and hence the following lemma immediately implies that the cost incurred is O(log n) × c(OPT), thus proving Theorem 1.
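The three steps above can be sketched as follows. This is our own illustrative Python code: the minimum-cost matching is found by brute force, which is only suitable for tiny inputs (a real implementation would use a polynomial-time matching algorithm), and all names are ours.

```python
def min_cost_matching(nodes, cost):
    """Brute-force minimum-cost almost-perfect matching (at most one node
    left unmatched); exponential time, fine for small illustrative inputs."""
    best = [None]
    def rec(rem, pairs, c):
        if len(rem) <= 1:
            if best[0] is None or c < best[0][0]:
                best[0] = (c, list(pairs))
            return
        u = rem[0]
        for j in range(1, len(rem)):
            v = rem[j]
            pairs.append((u, v))
            rec(rem[1:j] + rem[j + 1:], pairs, c + cost(u, v))
            pairs.pop()
    rec(list(nodes), [], 0.0)
    return best[0][1]

def find_max(nodes, cost, key):
    """Stage-based max-finding: repeatedly match the surviving nodes at
    minimum cost, compare matched pairs, and keep only the winners."""
    live = list(nodes)
    total = 0.0
    while len(live) > 1:
        matching = min_cost_matching(live, cost)
        matched = {x for pair in matching for x in pair}
        survivors = [x for x in live if x not in matched]  # at most one unmatched node
        for u, v in matching:
            total += cost(u, v)                            # pay for this comparison
            survivors.append(u if key(u) > key(v) else v)
        live = survivors
    return live[0], total
```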

Lemma 1. The cost of the edges in M_i is at most c(OPT).
Proof. Given any set of 2k vertices in a tree T, one can find a pairing of these vertices into k pairs so that the paths in T between these pairs are edge-disjoint (see, e.g., [8, Lemma 2.4]). We use this to pair off the vertices of V_i in the tree OPT; the total cost of the paths between them is an upper bound on the cost of a min-cost almost-perfect matching of V_i. Finally, since the paths between the pairs are edge-disjoint, their total cost is at most c(OPT), and hence the cost of M_i is at most c(OPT) as well.
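The pairing fact used in this proof can be realized by a standard DFS construction; the following Python sketch (our own code, with a hypothetical `children` adjacency map) pairs the marked vertices so that each subtree hands at most one unpaired vertex up to its parent, and vertices arriving at a node via distinct child edges are paired there, which makes the connecting paths edge-disjoint.

```python
def pair_in_tree(children, root, marked):
    """Pair up an even-size set of marked vertices of a rooted tree so that
    the tree paths between paired vertices are edge-disjoint.  `children`
    maps each vertex to the list of its children."""
    pairs = []
    def dfs(v):
        pending = [v] if v in marked else []
        for c in children.get(v, []):
            up = dfs(c)                    # each child returns <= 1 unpaired vertex
            if up is not None:
                pending.append(up)
        while len(pending) >= 2:
            pairs.append((pending.pop(), pending.pop()))
        return pending[0] if pending else None
    leftover = dfs(root)
    assert leftover is None, "marked set must have even size"
    return pairs
```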

Finding the maximum on a line
To improve on the results of the previous section, let us consider the case of the line metric; i.e., where the vertices V = {1, 2, ..., n} lie on a line, and the cost of comparing two elements in V is just the distance between them on this line. Let us assume that the line is unweighted and consecutive points in V are at unit distance from each other, so that c_ij = |i − j|; we will indicate how to remove this simplifying assumption at the end of this section. We also assume that n is a power of 2. For an element x ∈ V which is not the maximum, let g(x) be a nearest element to x which has a key greater than key(x), and let d(x) be the distance between x and g(x). Observe that in OPT, the parent of x must be at distance at least d(x) from x (and attaching each x to g(x) gives a valid proof), and hence c(OPT) = Σ_{x ≠ v_max} d(x).
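For intuition, c(OPT) on the unit line can be computed directly from this characterization; the following brute-force Python sketch (our own helper, not from the paper) does so.

```python
def line_opt_cost(keys):
    """c(OPT) for max-finding on the unit line: each non-maximum position x
    is charged d(x), the distance to the nearest position holding a larger
    key.  Positions are 0..n-1 at unit spacing; brute force for clarity."""
    n = len(keys)
    vmax = max(range(n), key=lambda i: keys[i])
    total = 0
    for i in range(n):
        if i == vmax:
            continue
        total += min(abs(i - j) for j in range(n) if keys[j] > keys[i])
    return total
```

For the sorted instance 1 ≺ 2 ≺ ··· ≺ n, every element has d(x) = 1 and the optimal cost is n − 1.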
Let us first look at a naïve scheme: we start off with a division D_1 of the line into two-node segments {[1,2], [3,4], ..., [n−1,n]}; in general, the division D_i consists of segments, each with 2^i nodes. We maintain the invariant that we know the maximum-key element in each segment of D_i; when merging two segments, we compute the maximum of the merged segment by comparing the maxima of the two segments. However, this is just the algorithm of Section 2, and it can incur a cost of Ω(log n) × c(OPT): for the instance 1 ≺ 2 ≺ ··· ≺ n, each stage costs Ω(n), for a total of Ω(n log n), while c(OPT) = n − 1.

An algorithm which almost works.
A natural next attempt is to introduce randomization. To form the division D_2 from D_1, we toss an unbiased coin: if the result is "heads", we merge [1,2] and [3,4] into one segment (which we denote by [1−4]), merge [5,6] and [7,8] into the segment [5−8], and so on. If the coin comes up "tails", we shift over by one: [1,2] forms a segment by itself, and from then on we merge every two consecutive segments of D_1. Hence, with probability 1/2 the segments in D_2 are {[1−4], [5−8], ...}, and with probability 1/2 they are {[1−2], [3−6], [7−10], ...}. To get division D_{i+1} from D_i, we flip an unbiased coin and either merge every pair of consecutive segments of D_i beginning with the first segment, or merge every pair of consecutive segments starting at the second one. It is easy to see that all segments in D_i, except perhaps the first and last ones, have 2^i nodes. Again, the natural randomized algorithm is to maintain the maximum element in each segment of D_i: when combining segments of D_i to form segments of D_{i+1}, we compare the two maxima to find the maximum of the newly formed segment. (We use stage i to refer to the comparisons performed whilst forming D_i; note that stages begin at 1, and there are no comparisons in the first stage.) The correctness of the procedure is immediate; to calculate the expected cost incurred, we charge the cost of a comparison to the loser; note that each node except v_max pays for exactly one comparison. We would like to show that the expected cost paid by x ∈ V in our algorithm is O(d(x)). Let S_i(x) denote the segment of D_i containing x; its size |S_i(x)| ≤ 2^i, and its length is at most 2^i − 1. If 2^k ≤ d(x) < 2^{k+1}, then x definitely wins (and hence does not pay) in stages 1 through k; the following lemma bounds the probability that it loses in any stage t ≥ k + 1. (Recall that, depending on the coin tosses, x may or may not lose to g(x).)

Lemma 2. The probability that x loses in stage t is at most 2^{−(t−k−2)}.

Proof. Note that the lemma is vacuously true for t ≤ k + 2.
Since d(x) < 2^{k+1}, nodes x and g(x) must lie in the same or in consecutive segments in stage k + 1. Now for x to lose in stage t, it must not have lost in stages {k+2, k+3, ..., t−1}, and hence the segments containing x and g(x) must not have merged in these stages. Since we make independent decisions at each stage, the probability of this event is at most (1/2)^{t−k−2} = 2^{−(t−k−2)}. Since x may have to pay as much as 2^{t+1} if it loses in stage t, the expected cost for x is Σ_{t>k} 2^{t+1} · 2^{−(t−k−2)} = Σ_{t>k} Θ(2^k), which may be as large as Θ(2^k · (log n − k)). Before we indicate how to fix the problem, let us note that our analysis is tight: for the example with 1 ≺ 2 ≺ ··· ≺ n, this randomized algorithm incurs a cost of Ω(n log n).
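The "almost works" scheme can be sketched as follows. This is our own simulation code (it starts from singleton segments and skips the shift when only two segments remain, a harmless simplification), and it returns the true maximum whatever the coin flips, since the maximum never loses a comparison.

```python
import random

def randomized_line_max(keys, rng=random.Random(0)):
    """Random shifted merging of segments on the unit line.  Each segment is
    ((lo, hi), m) where m is the position of its maximum; merging two
    segments compares their maxima and charges the loser the cost |i - j|."""
    n = len(keys)
    segs = [((i, i), i) for i in range(n)]
    total = 0
    while len(segs) > 1:
        merged, idx = [], 0
        if len(segs) > 2 and rng.random() < 0.5:
            merged.append(segs[0]); idx = 1      # shift: first segment stands alone
        while idx < len(segs):
            if idx + 1 == len(segs):
                merged.append(segs[idx]); break
            (lo1, _), m1 = segs[idx]
            (_, hi2), m2 = segs[idx + 1]
            total += abs(m1 - m2)                # cost of comparing the two maxima
            merged.append(((lo1, hi2), m1 if keys[m1] > keys[m2] else m2))
            idx += 2
        segs = merged
    return segs[0][1], total
```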
Two Copies Help: The Double-Random Algorithm. Let us modify the above algorithm to maintain two independent copies of the line, which we call L and L'. The partitions in L will be denoted by D_1, D_2, ..., while those in L' will be called D'_1, D'_2, .... The partitions in the two lines are chosen independently of each other. Again, we maintain the maximum element of each segment in D_i and D'_i, but we also exchange some information between the lines. Consider the step of merging segments S_1 and S_2 to get a segment S in D_{i+1}, and let x_i be the maximum element of S_i. Before we compare x_1 and x_2, we check whether x_1 has lost to an element y ∈ S_2 in some previous stage in L': in this case, we know that x_1 ≺ y ≺ x_2, and hence can avoid comparing x_1 and x_2. Similarly, if x_2 has previously lost in L' to some element z ∈ S_1, we can declare x_1 to be the maximum element of S. Only if neither of these fortuitous events occurs do we compare x_1 and x_2. (The process for L' is similar, and uses the information from L's previous rounds.) The correctness of the algorithm follows immediately.
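A sketch of Double-Random on the line, in our own Python (not the paper's code): the bookkeeping of "who lost to whom" via a first-loss map per copy is our simplification of the information exchanged between the two lines, and the shift is skipped when only two segments remain, as in the previous sketch.

```python
import random

def double_random_max(keys, rng=random.Random(1)):
    """Two independent copies of the randomized partition scheme run side by
    side; before comparing two segment maxima in one copy, check whether one
    of them already lost, in the *other* copy, to an element of the opposing
    segment -- in that case the comparison is free."""
    n = len(keys)
    lost_to = [{}, {}]            # per copy: position -> first element it lost to
    segs = [[((i, i), i) for i in range(n)] for _ in range(2)]
    total = 0

    def in_seg(e, seg):
        (lo, hi), _ = seg
        return lo <= e <= hi

    while len(segs[0]) > 1 or len(segs[1]) > 1:
        for copy in (0, 1):
            cur = segs[copy]
            if len(cur) == 1:
                continue
            other = 1 - copy
            merged, idx = [], 0
            if len(cur) > 2 and rng.random() < 0.5:
                merged.append(cur[0]); idx = 1   # random shift
            while idx < len(cur):
                if idx + 1 == len(cur):
                    merged.append(cur[idx]); break
                s1, s2 = cur[idx], cur[idx + 1]
                (lo1, _), m1 = s1
                (_, hi2), m2 = s2
                y1, y2 = lost_to[other].get(m1), lost_to[other].get(m2)
                if y1 is not None and in_seg(y1, s2):
                    win, lose = m2, m1           # free: m1 lost to y1 in s2 already
                elif y2 is not None and in_seg(y2, s1):
                    win, lose = m1, m2
                else:
                    total += abs(m1 - m2)        # a real comparison; charge the loser
                    win, lose = (m1, m2) if keys[m1] > keys[m2] else (m2, m1)
                lost_to[copy].setdefault(lose, win)
                merged.append(((lo1, hi2), win))
                idx += 2
            segs[copy] = merged
    return segs[0][0][1], total
```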
Notice that each element x now loses exactly twice, once in each line, but the second loss may be implicit (without an actual comparison being performed). As before, we say that a node x loses in stage i of L (or L') if this is the first stage in which x loses in L (or L'). The node x pays in stage i of L (or L') if x loses in stage i of L (or L') and an actual comparison was made. While x loses twice, it may possibly pay only once.

Lemma 3. If x, y ∈ V are at distance d(x, y), then the probability (in either line) that x and y lie in different segments of D_i is at most d(x, y)/2^{i−1}.
Proof. Let 2^k ≤ d(x, y) < 2^{k+1}; the statement is vacuously true for i − 1 ≤ k. For stage i with i − 1 > k, the nodes x and y must lie in either the same or consecutive segments of D_{i−1}. Now, if they were in different segments of D_{i−1} (which inductively happens with probability at most d(x, y)/2^{i−2}), the chance that these segments do not merge in stage i is exactly 1/2, giving us the bound of d(x, y)/2^{i−2} × 1/2 = d(x, y)/2^{i−1}.
Let the node g(x) lie to the left of x; the other case is symmetric, and proved identically.
Let the distance d(x) = d(x, g(x)) satisfy 2^k ≤ d(x) < 2^{k+1}. Let h(x) be the nearest point to the right of x such that x ≺ h(x), and let 2^m ≤ d(x, h(x)) < 2^{m+1}. Note that if x pays in stage t of L, then t ≤ m + 3. Indeed, if the segment S_{m+3}(x) is the leftmost or the rightmost segment, then it contains g(x) or h(x), so x must have paid by then. Else, the length of S_{m+3}(x) is 2^{m+3} − 1, and since d(g(x), h(x)) < 2^{m+1} + 2^{k+1} ≤ 2^{m+2}, the segment S_{m+3}(x) must contain one of g(x) or h(x), so t ≤ m + 3. Moreover, since S_t(x) must contain either g(x) or h(x), it follows that t > k. The following key lemma shows us that the probability of paying in stage t ∈ [k + 1, m] is small. (An identical lemma holds for L'.)

Lemma 4. For t ∈ [k + 1, m], the probability that x pays in stage t of L is at most 2^{−(2(t−k)−5)}.

Proof (Lemma 4). Note that if x pays in stage t ≤ m of L, then x must have lost to some element to its left in L, since d(x, h(x)) ≥ 2^m. Depending on whether x loses in L' before stage t or not, there are two cases.

Case I:
x has not lost in L' by stage t − 1. This implies that x and g(x) lie in different segments of D'_{t−1}, which by Lemma 3 has probability at most d(x, g(x))/2^{t−2} ≤ 2^{−(t−k−3)}. Now the chance that x loses in L in stage t is at most 2^{−(t−k−2)} (by Lemma 2). Since the partitions are independently chosen, the two events are independent, which proves the lemma.
Case II: x has lost in some stage l ≤ t − 1 in L'. Since l ≤ m, x must have lost to some element y to its left in L'; this y is either g(x), or lies to the left of g(x). Consider stage t − 1 in L: since the distance d(y, x) < 2^l ≤ 2^{t−1}, the three elements x, g(x) and y lie in the union of two adjacent segments of D_{t−1}. Furthermore, x must lie in a different segment from y and g(x), for otherwise x would already have lost in L by stage t − 1. Recall that if x loses in stage t of L, it must lose to a node to its left, since t ≤ m and h(x) is too far to the right. But this implies that S_{t−1}(x) must merge in L with S_{t−1}(y) = S_{t−1}(g(x)); in this case, no comparison would be performed, since x had already lost to y in L'. Hence x never pays in this case.
Note that this lemma implies that the expected payment of x for stages k + 1 through m is at most Σ_{t=k+1}^{m} 2^{t+1} · 2^{−(2(t−k)−5)} = O(2^k) = O(d(x)).

Max-Finding for Euclidean metrics
In this section, we extend our algorithm for the line metric to arbitrary Euclidean metrics: the basic idea of running two copies of the algorithm and judiciously exchanging information will be used again, but the proof becomes substantially more involved. We give the proof for the 2-d case; the proof for the general case is deferred to the final version of the paper.
The General Double-Random Algorithm. As in the case of the line, we begin with the simplifying assumption that the nodes in V form a subset of the unit-weight n × n grid; we refer to this underlying grid as M = {1, 2, ..., n} × {1, 2, ..., n}. (This assumption can be easily discharged, as for the case of the line; we omit the details here.) Hence each point v ∈ V corresponds to a point (v_x, v_y) ∈ M, with 1 ≤ v_x, v_y ≤ n. In fact, if P_x denotes the path along the x-axis from 1 to n, and P_y denotes a similar path along the y-axis, then we can identify the grid M with the Cartesian product P_x × P_y. To construct partitions D_1, D_2, ... of the grid, we build stage-i partitions D^x_i and D^y_i for the paths P_x and P_y: the rectangles in M's partition correspond to the products of the segments in D^x_i and D^y_i, and hence a square in D_{i+1} is formed by merging at most four squares in D_i. The random partitioning schemes for P_x and P_y evolve independently of each other.
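The product construction can be sketched as follows; this is our own illustrative code for building the random line divisions and taking their product (the segment representation and all names are ours).

```python
import random

def line_divisions(n, stages, rng):
    """Random shifted divisions D_1..D_stages of the path {0,...,n-1}:
    D_1 has two-node segments, and D_{i+1} merges consecutive pairs of D_i's
    segments, starting at the first or the second segment by a coin flip."""
    div = [(j, min(j + 1, n - 1)) for j in range(0, n, 2)]
    out = [div]
    for _ in range(stages - 1):
        nxt, idx = [], 0
        if len(div) > 2 and rng.random() < 0.5:
            nxt.append(div[0]); idx = 1          # shifted merge
        while idx < len(div):
            if idx + 1 == len(div):
                nxt.append(div[idx]); break
            nxt.append((div[idx][0], div[idx + 1][1]))
            idx += 2
        div = nxt
        out.append(div)
    return out

# A stage-i partition of the n x n grid is the product of two independently
# evolving line divisions: every cell of the grid partition is xseg x yseg.
rng = random.Random(2)
n, stages = 16, 4
dx = line_divisions(n, stages, rng)
dy = line_divisions(n, stages, rng)
grid_stage_3 = [(xseg, yseg) for xseg in dx[2] for yseg in dy[2]]
```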
Again, we maintain the invariant that we know the maximum element in each square of the partition D_i, and as in the case of the line, we do not want to perform three comparisons when merging squares of D_i to get D_{i+1}. Hence we maintain two independent copies M and M' of the grid, with D_i and D'_i being the partitions of the two grids at stage i. Suppose {x_1, x_2, x_3, x_4} are the maxima of the four squares S_i being merged into a new square S in M: for each i ∈ {1, ..., 4}, we check whether x_i has lost to some y ∈ S in a previous stage in M', and if so, we remove x_i from consideration; we finally compare the x_i's that remain. The correctness of the algorithm is immediate, and we just have to bound the costs incurred.

Cost of the Double-Random Algorithm in Two Dimensions
We charge the cost of each comparison to the node that loses that comparison, and wish to upper bound the cost charged to any node p ∈ M. Let G(p) = {q | p ≺ q} be the set of nodes with keys greater than p's; fix a vertex g(p) ∈ G(p) closest to p, and let d(p) be the distance between p and g(p), with 2^ℓ ≤ d(p) < 2^{ℓ+1}. Since we focus on the node p for the entire argument, we shift our coordinate system to be centered at p: we renumber the vertices on the paths P_x and P_y so that the vertex p lies at the "origin" of the 2-d grid. Formally, we label the nodes p_x ∈ P_x and p_y ∈ P_y as 0; the other vertices on the paths are labeled accordingly. This naturally defines four quadrants as well. Let D^x_i be the projection of the partition D_i on the line P_x, and D^y_i its projection on P_y. (D'^x_i and D'^y_i are defined similarly for the grid M'.) Let us note an easy lemma, bounding the chance that p and g(p) are separated in the partition D_i of M. (Such a lemma holds for the partition D'_i of M', of course.)

Lemma 5. Pr[p and g(p) lie in different squares of D_i] ≤ 2d(p)/2^{i−1} < 2^{−(i−ℓ−3)}.
Proof. Let the distances from p to g(p) along the two axes be d_x = d(p_x, g(p)_x) and d_y = d(p_y, g(p)_y), with max{d_x, d_y} = d(p). By Lemma 3, the projections p_x and g(p)_x do not lie in the same stage-i interval of P_x with probability at most d_x/2^{i−1}. Similarly, p_y and g(p)_y do not lie in the same stage-i interval of P_y with probability at most d_y/2^{i−1}; a trivial union bound implies that the probability that p and g(p) are separated is at most (d_x + d_y)/2^{i−1} ≤ 2d(p)/2^{i−1} < 2^{−(i−ℓ−3)}.

We now prove that the expected charge to a node p is O(2^ℓ), where 2^ℓ ≤ d(p) < 2^{ℓ+1} for the distance d(p) between p and a closest point g(p) of the set G(p); let H(p) = G(p) \ {g(p)}. Let S_i(p) and S'_i(p) be the squares of D_i and D'_i respectively that contain p. We will consider two events of interest:
1. Let A_i be the event that p pays in M in stage i, and the square S_i(p) contains at least one point from H(p), but does not contain g(p).
2. Let B_i be the event that p pays in M in stage i, and S_i(p) contains g(p).
Note that A_i ∩ B_i = ∅; also, if p pays in stage i in M, then either A_i or B_i must occur. Moreover, Pr[A_i ∪ B_i] > 0 only when p and some element of G(p) lie in the same square in stage i in M: since any two points in such a square are at ℓ_∞ distance at most 2^i − 1 from each other, and each element of G(p) is at ℓ_∞ distance at least 2^ℓ from p, it suffices to consider the case i > ℓ. Theorems 5 and 6 will show that Σ_i Pr[A_i] × 2^i and Σ_i Pr[B_i] × 2^i are both O(2^ℓ). This shows that p pays only O(2^ℓ) in expectation in M; a similar bound holds for M', which proves the claim that Double-Random is O(1)-competitive in the case of two-dimensional grids.

Theorem 5. Σ_i Pr[A_i] × 2^i = O(2^ℓ).

Proof. Let us define two events E_x and E_y. Let E_x be the event that p_x and g(p)_x lie in different segments of D^x_i, and E_y be the event that p_y and g(p)_y lie in different segments of D^y_i.

Since A_i implies that p and g(p) lie in different squares of D_i, their projections must be separated along at least one axis; i.e., A_i ⊆ E_x ∪ E_y, and hence Pr[A_i] ≤ Pr[A_i ∩ E_x] + Pr[A_i ∩ E_y].
Let us now estimate Pr[A_i ∩ E_x]; the argument for the other term is similar. Assume (w.l.o.g.) that g(p)_x lies to the left of p_x, and let the points between g(p)_x and p_x (including p_x, but not including g(p)_x) in P_x be labeled p^1_x, p^2_x, ..., p^k_x = p_x from left to right. Define F_j as the event that the segment S^x_i(p) of D^x_i containing p_x has p^j_x as its left end-point. Note that the events F_j are disjoint, and E_x ⊆ ∪_j F_j. If F_j occurs, then the end-points of the edge connecting p^j_x and p^{j−1}_x (where p^0_x = g(p)_x) lie in different segments of D^x_i; Lemma 3 implies that this can happen with probability at most 1/2^{i−1}, and thus Pr[F_j] ≤ 1/2^{i−1}. Define I^j_i as the segment of length 2^i in P_x containing p_x and having p^j_x as its left end-point. Let q(i, j) ∈ H(p) be such that q(i, j)_x ∈ I^j_i and |q(i, j)_y − p_y| is minimum; in other words, q(i, j) is the point of H(p) closest to the x-axis whose projection lies in I^j_i. If no such point exists, then we say that q(i, j) is undefined. Let δ(i, j) = |q(i, j)_y − p_y| if q(i, j) is defined, and ∞ otherwise. Notice that for fixed j, δ(i, j) is a decreasing function of i.
Assume F_j occurs for some fixed j: for A_i to occur, S_i(p) must contain a point of H(p), and hence δ(i, j) ≤ 2^i; let i(j) be the smallest value of i for which δ(i, j) ≤ 2^i. Since δ(i, j) is decreasing in i, we have δ(i, j) > 2^i for all i < i(j), and δ(i, j) ≤ 2^i for all i ≥ i(j). Now suppose i > i(j); note the strict inequality, which ensures that q(i − 1, j) exists. Again assume that F_j occurs: for A_i to occur, the square S_{i−1}(p) cannot contain any point of H(p); in particular, it cannot contain q(i − 1, j).

Lemma 6. If F_j occurs and i > ℓ + 1, then p_x and q(i − 1, j)_x lie in the same segment of D^x_{i−1}.

Proof. It suffices to show that the segment containing p_x in D^x_{i−1} also has p^j_x as its left end-point; since q(i − 1, j)_x also lies in this segment, the lemma follows.
To prove the claim, note that the distance |p_x − p^j_x| ≤ d_x ≤ d(p) < 2^{ℓ+1} ≤ 2^{i−1}. It follows that p_x and p^j_x must lie in the same or in adjacent segments of D^x_{i−1}; we claim the former is true. Indeed, suppose they were in different segments: since the segment of D^x_{i−1} containing p^j_x has width 2^{i−1} > |p^j_x − p_x|, the node p^j_x would lie in the interior of this segment, and hence F_j could not occur. Thus p_x and p^j_x lie in the same segment of D^x_{i−1}, and this segment, being contained in S^x_i(p) whose left end-point is p^j_x, must have p^j_x as its left end-point.
Since the projections of p and q(i − 1, j) on the x-axis lie in the same segment, the projections p_y and q(i − 1, j)_y on the y-axis must lie in different segments of D^y_{i−1} (for otherwise S_{i−1}(p) would contain q(i − 1, j)). Since this event is independent of F_j, we can use Lemma 3 to bound its probability: indeed, we get that for i > i(j), Pr[A_i ∩ F_j] ≤ Pr[F_j] · δ(i − 1, j)/2^{i−2} ≤ (1/2^{i−1}) · δ(i − 1, j)/2^{i−2}. We are now ready to prove the theorem.
For each j and each i > i(j), the bound above gives 2^i · Pr[A_i ∩ F_j] ≤ δ(i − 1, j) · 2^{3−i} ≤ 2^{i(j)−i+3}, where we used the fact that δ(i, j) is a decreasing function of i, so that δ(i − 1, j) ≤ δ(i(j), j) ≤ 2^{i(j)}; hence the sum over i > i(j) is dominated by a geometric sum and is O(1). (The boundary case i = i(j) contributes at most 2^{i(j)} · Pr[F_j] ≤ 2.) Summing over the k ≤ d(p) values of j gives Σ_i 2^i · Pr[A_i ∩ E_x] = O(d(p)) = O(2^ℓ). A similar calculation shows that Σ_i 2^i · Pr[A_i ∩ E_y] is O(2^ℓ), which completes the proof of Theorem 5. Now that we have bounded the charge to p due to the events A_i by O(2^ℓ), we turn our attention to the events B_i, and claim a similar result for this case.
Theorem 6. Σ_i Pr[B_i] × 2^i = O(2^ℓ).

Proof (Theorem 6). Recall that if p pays in stage i, then i > ℓ: hence we define a set of events E'_{ℓ+1}, ..., E'_{i−3}, where E'_j occurs if p loses in stage j of M'. Also, define the event E'_0 to occur if p does not lose in M' till stage i − 3. Note that exactly one of these events can occur, and hence Pr[B_i] = Σ_j Pr[B_i ∩ E'_j]. The next two lemmas give us bounds on the probability of each of the terms in this summation.

Lemma 7. Pr[B_i ∩ E'_0] ≤ 2^{−(2(i−ℓ)−10)}.

Proof. If p does not lose in M' till stage i − 3, then p and g(p) lie in different squares of D'_{i−3}, which by (the analogue of) Lemma 5 has probability at most 2^{−((i−3)−ℓ−3)}. Now given E'_0, p must not lose till stage i − 1 in M for B_i to occur. But this event is independent of E'_0, and hence Lemma 5 implies that its probability is at most 2^{−((i−1)−ℓ−3)}. Multiplying the two bounds completes the proof.

Lemma 8. For ℓ < j ≤ i − 3, Pr[B_i ∩ E'_j] ≤ 2^{−(2(i−ℓ)−10)}.

Proof. For the event E'_j to occur, p must not lose till stage j − 1 in M'; now applying Lemma 5 gives us that Pr[E'_j] ≤ 2^{−((j−1)−ℓ−3)}. Also, note that ℓ < j ≤ i − 3 for us to be in this case. Now let us condition on E'_j: let p lose to some q in stage j of M', so that |p_x − q_x|, |p_y − q_y| < 2^j. Now consider stage i − 1 of M. We claim that p_x, q_x and g(p)_x do not all lie in the same segment of D^x_{i−1}. Indeed, since the distance |p_y − g(p)_y| < 2^{ℓ+1} ≤ 2^{i−2}, the triangle inequality ensures that |q_y − g(p)_y| ≤ |p_y − g(p)_y| + |p_y − q_y| ≤ 2^{i−1}, and hence the distance between any two points in the set {p_y, q_y, g(p)_y} is at most 2^{i−1}. Thus two of these points must lie in the same segment of D^y_{i−1} in M. If all three of p_x, q_x, g(p)_x lay in the same segment of D^x_{i−1}, two of the points p, q, g(p) would lie in the same square of D_{i−1}. Now if p were one of these two points, then p would lose before stage i, and B_i would not occur. And if g(p) and q lay in the same square of D_{i−1}, then p and q would be in the same square of D_i, and then p would not pay. Therefore, all three of p_x, q_x and g(p)_x cannot lie in the same segment of D^x_{i−1}; similarly, p_y, q_y and g(p)_y cannot all lie in the same segment of D^y_{i−1}. Hence one of the following two events must happen: either (1) p_x, g(p)_x lie in different segments of D^x_{i−1} and p_y, q_y lie in different segments of D^y_{i−1}, or (2) p_x, q_x lie in different segments of D^x_{i−1} and p_y, g(p)_y lie in different segments of D^y_{i−1}. Lemma 3 implies that the probability of each of these events is at most 2^{−(2i−ℓ−j−5)}. Finally, multiplying the sum of these with Pr[E'_j] ≤ 2^{−((j−1)−ℓ−3)} completes the proof.

Now combining the decomposition Pr[B_i] = Σ_j Pr[B_i ∩ E'_j] with Lemmas 7 and 8, we see that Pr[B_i] ≤ (i − ℓ) · 2^{−(2(i−ℓ)−10)}, and hence Σ_i Pr[B_i] × 2^i = O(2^ℓ). This completes the proof of Theorem 6.

Sorting with Metric Comparison Costs
We now consider the problem of sorting the points in V according to their key values. Let OPT be the set of n − 1 edges going between consecutive nodes in sorted order. A rooted tree T is called a 2-HST if the lengths of all edges at any level of T are the same, and the lengths of consecutive edges on any root-leaf path decrease by a factor of exactly 2. We assume that each internal node of T has at least 2 children; indeed, if a node has exactly one child, we can contract this edge, which changes distances between leaves by at most a constant factor. Let us denote the set of leaves of the 2-HST T by V, and let |V| = n. The following theorem is the main technical result of this section.
Theorem 7. Given n elements, and the metric generated by the leaves of a 2-HST, there is an algorithm to sort the elements with a cost of O(log n) × c(OP T ).
Using standard results on approximating arbitrary metrics by probability distributions on metrics generated by HSTs [5,6], the above theorem immediately implies Theorem 3.
Proof (Theorem 7). For any rooted subtree H of T, let OPT(H) denote the optimal set of comparisons to sort the leaves of H, and let c(OPT(H)) be their cost. Let h be the root of H, let h's children be h_1, ..., h_r, and let the subtree rooted at h_i be H_i. Consider OPT(H), and let a segment of OPT(H) be a maximal sequence of consecutive vertices in OPT(H) belonging to the same subtree H_i for some i. Clearly, we can divide OPT(H) uniquely into node-disjoint segments; let segs(H) denote the number of these disjoint segments. Let d(H) denote the cost of an edge joining h to one of its children; recall that all these edges have the same cost. We omit the proof of the following simple lemma.
Lemma 9. c(OPT(H)) ≥ Σ_{i=1}^{r} c(OPT(H_i)) + (segs(H) − 1) · d(H).

Our algorithm sorts the leaves of T in a bottom-up manner, by sorting the leaves of various subtrees, and then merging the results. For subtrees which consist of just a leaf, there is nothing to do. Now let H, h, H_i, h_i be as above, and assume we have sorted the leaves of H_i for all i: we want to merge these sorted lists to get the sorted list for the leaves of H. The following lemma, whose proof we omit, shows that we can do this without paying too much.
Lemma 10. There is an algorithm to merge the sorted lists for the H_i while incurring a cost of O(segs(H) · log n · d(H)).
We now complete the proof of Theorem 7. If cost(H) is the cost incurred to sort the subtree H, we claim that cost(H) ≤ α · log n · c(OPT(H)) for some constant α. The proof is by induction on the height of the tree: the base case is when H is a leaf, and cost(H) = c(OPT(H)) = 0. If H, H_i are as above, and our claim is true for each H_i, then Lemma 10 implies that cost(H) ≤ Σ_i cost(H_i) + O(segs(H) · log n · d(H)) ≤ α · log n · Σ_i c(OPT(H_i)) + α · log n · (segs(H) − 1) · d(H) ≤ α · log n · c(OPT(H)), where the second inequality uses the inductive hypothesis and a large enough choice of α (note that segs(H) ≥ 2, so segs(H) ≤ 2(segs(H) − 1)), and the last inequality is Lemma 9.
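The bottom-up strategy can be sketched in Python as follows. This is our own illustration: it uses a naive heap-based k-way merge that does not implement the segs(H)-sensitive merging of Lemma 10; it only shows the recursive structure and the cost accounting, charging each comparison made at a node Θ(d), where d is that node's child-edge length.

```python
import heapq

def hst_sort(tree, d_root):
    """Bottom-up mergesort on a 2-HST given as nested lists whose leaves are
    keys: every child edge of a node has the same length, halving with depth
    (d_root at the top).  Returns (sorted leaves, total charged cost)."""
    total = [0.0]

    def sort(node, d):
        if not isinstance(node, list):       # a leaf: a single key, already sorted
            return [node]
        lists = [sort(child, d / 2) for child in node]
        heap = [(lst[0], i, 0) for i, lst in enumerate(lists)]
        heapq.heapify(heap)
        merged = []
        while heap:
            val, i, j = heapq.heappop(heap)
            merged.append(val)
            total[0] += d                    # one comparison charged at this node
            if j + 1 < len(lists[i]):
                heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
        return merged

    return sort(tree, d_root), total[0]
```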

By Lemma 2, the expected payment of x in the remaining stages t ∈ (m, m + 3] is at most Σ_t 2^{t+1} · 2^{−(t−k−2)} = O(2^k) = O(d(x)) as well, which proves:

Theorem 4. The Double-Random algorithm is an O(1)-competitive algorithm for max-finding on the line.