An Efficient Riemannian Gradient Based Algorithm for Max-Cut Problems

The max-cut problem asks for a cut of a graph that splits its vertices into two subsets such that the number of edges between the two subsets is as large as possible. Since this problem is NP-hard, it is commonly solved by suboptimal algorithms. In this brief, we propose a fast and accurate Riemannian optimization algorithm for solving the max-cut problem. To this end, we develop a Riemannian gradient descent algorithm and prove its convergence. Our simulation results show that the proposed method is extremely efficient on several well-studied graphs. Specifically, our method is on average 35 times faster than the best well-known techniques at the cost of a slight loss in performance, attaining on average 0.96 of the max-cut value of the others.


I. INTRODUCTION
THE PROBLEM of minimizing the number of vias (holes on a printed circuit board, or contacts on a chip) subject to pin pre-assignments and layer preferences arises in very-large-scale integration (VLSI) and printed circuit board (PCB) design. To give more intuition, when the number of available layers in a PCB design is limited, it may become difficult or even impossible to assign a routing net to one available layer. Owing to the non-crossing constraint on the routing nets in the limited existing layers, a routing net may have to be separated into several partial nets, and vias must be introduced to connect the partial routing nets across the available layers [1]. To reduce the number of vias used on the limited layers of a PCB design, first a graph called the conflict graph is constructed. Then, clustering is used to partition this conflict graph into two or more subgraphs, which correspond to the vias in each layer. Consequently, the problem of minimizing the number of vias can be formulated as a clustering problem, which can be solved by applying the max-cut to the conflict graph [2].
The max-cut problem has diverse applications in VLSI circuit design, statistical physics [2], social networks [3], semisupervised learning [4], and biology [5]. The problem is defined on an undirected graph G = (V, E, A), where V denotes the set of vertices, E is the set of edges connecting the vertices in V, and A denotes the adjacency matrix with nonnegative elements. Without loss of generality and for simplicity of presentation, we consider simple graphs, which have finitely many vertices and no loops or multiple edges. A cut of the graph G splits the vertices into two subsets S and \bar{S}, where S \cap \bar{S} = \emptyset and S \cup \bar{S} = V. The corresponding cut value is calculated as

\mathrm{cut}(S, \bar{S}) = \sum_{(i,j) \in E(S, \bar{S})} a_{ij},   (1)

where E(S, \bar{S}) is the collection of edges with one endpoint in S and the other in \bar{S}, and a_{ij} is the (i, j)th element of the adjacency matrix A. The max-cut problem is then to find a cut with the maximum cut value.
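As a small illustration of the definition above, the cut value can be computed directly from the adjacency matrix by summing the weights of edges with one endpoint on each side. The function name `cut_value` and the triangle graph are our own illustrative choices, not from the brief.

```python
import numpy as np

def cut_value(A, S):
    """Cut value: sum of weights a_ij over edges with exactly one
    endpoint in S.

    A is a symmetric adjacency matrix; S is a boolean membership vector
    (True -> vertex in S, False -> vertex in the complement).
    """
    S = np.asarray(S, dtype=bool)
    # Each crossing edge (i, j) is counted once: i in S, j in the complement.
    return A[np.ix_(S, ~S)].sum()

# Triangle graph: any nontrivial bipartition cuts 2 of the 3 edges.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
print(cut_value(A, [True, False, False]))  # -> 2
```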
A naive exact solution to the max-cut problem is to calculate the cut values of all 2^n possible cuts and compare them to find the max-cut, where n is the number of the graph's vertices. This approach demands O(m 2^n) computations, where m is the number of edges, which is in general unaffordable. This exact algorithm can be viewed as a complete traversal of a binary decision tree in which each branch corresponds to either allocating a vertex to a given subset or not. To reduce the computational complexity of this algorithm, upper and lower bounds on the max-cut are used to prune the traversal of the decision tree. Such bounds can be obtained by utilizing heuristic functions [6].
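The exhaustive enumeration described above can be sketched as follows; this is our own toy implementation (fixing one vertex to halve the search, since a cut and its complement are identical), practical only for very small n.

```python
import itertools
import numpy as np

def max_cut_bruteforce(A):
    """Exhaustive max-cut: tries all 2^(n-1) bipartitions.

    Vertex 0 is fixed in S, since swapping S and its complement leaves
    the cut unchanged. Runtime is exponential in n.
    """
    n = A.shape[0]
    best_val, best_S = -1, None
    for bits in itertools.product([True, False], repeat=n - 1):
        S = np.array((True,) + bits)
        val = A[np.ix_(S, ~S)].sum()   # weight of edges crossing the cut
        if val > best_val:
            best_val, best_S = val, S
    return best_val, best_S

# 4-cycle: separating opposite vertices cuts all 4 edges.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
val, S = max_cut_bruteforce(A)
print(val)  # -> 4
```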
To cope with the NP-hardness of the max-cut problem, polynomial-time ρ-approximation algorithms may be utilized. A ρ-approximation algorithm finds a solution that is at least ρ times the optimal value, where the constant ρ is called the performance guarantee of the algorithm. For randomized algorithms, ρ-approximation means that the solution is on average ρ times the optimal value. In [7], a 0.5-approximation algorithm is presented for the max-cut problem. This algorithm can be simply described as the random assignment of the vertices of the graph to two subsets according to the uniform bipolar distribution. Following this method, some other approximation algorithms have been presented with slight improvements, such as [8] with a performance guarantee of 0.5 + 0.5/Δ, where Δ is the maximum degree of the graph.
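The random 0.5-approximation of [7] is simple enough to sketch: each edge is cut with probability 1/2, so the expected cut value is half the total edge weight, which is at least half the max-cut. The code and the K6 example below are our own illustration.

```python
import numpy as np

def random_cut(A, rng):
    """Uniform random bipartition: each vertex joins S with probability 1/2.

    Each edge is cut with probability 1/2, so the expected cut value is
    half the total edge weight -- a 0.5-approximation in expectation.
    """
    S = rng.random(A.shape[0]) < 0.5
    return A[np.ix_(S, ~S)].sum(), S

rng = np.random.default_rng(0)
A = np.ones((6, 6)) - np.eye(6)     # complete graph K6, 15 unit-weight edges
vals = [random_cut(A, rng)[0] for _ in range(2000)]
print(np.mean(vals))  # close to 15/2 = 7.5 on average
```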
In [9], the authors develop a polynomial-time algorithm based on semidefinite programming (SDP) and randomized rounding which achieves a 0.878-approximation of the max-cut problem. There, the max-cut problem is formulated as

\max_{y \in \{-1,1\}^n} \; \frac{1}{4} \sum_{i,j} a_{ij} (1 - y_i y_j),   (2)

where y = (y_1, \ldots, y_n) encodes a cut of G, with v_i \in S for y_i = 1 and v_i \in \bar{S} for y_i = -1. However, since (2) is NP-complete, the following semidefinite relaxation is used:

\max_{x_1, \ldots, x_n \in S^{n-1}} \; \frac{1}{4} \sum_{i,j} a_{ij} (1 - \langle x_i, x_j \rangle).   (3)

Then, a randomized rounding (RR) approach is used to map the optimal solution (x_1^*, \ldots, x_n^*) of (3) to a cut. To perform the RR, first a random vector r is chosen with uniform distribution on the unit sphere S^{n-1}; then the label y_i is estimated as sign(\langle x_i^*, r \rangle). In [10], it is proved that the method of [9] yields the best possible polynomial-time ρ-approximation algorithm for the max-cut, provided that the unique games conjecture [11] holds. Also, Håstad proved that approximating the max-cut problem with a performance guarantee better than 16/17 ≈ 0.9412 is NP-hard [12].
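The randomized rounding step of [9] can be sketched in isolation, given a feasible point of the SDP relaxation (solving the SDP itself requires a solver and is omitted). The function name `gw_round` and the hand-built toy relaxation below are our own illustrative assumptions.

```python
import numpy as np

def gw_round(X, A, rng):
    """Goemans-Williamson randomized rounding step.

    X has one unit vector per row (an assumed feasible point of the SDP
    relaxation, X[i] = x_i*). Each vertex receives the label
    y_i = sign(<x_i*, r>) for a random hyperplane normal r.
    """
    r = rng.standard_normal(X.shape[1])   # its direction is uniform on the sphere
    y = np.sign(X @ r)
    y[y == 0] = 1.0                        # break ties (probability-zero event)
    S = y > 0
    return A[np.ix_(S, ~S)].sum(), y

# Toy relaxation for a 4-cycle: opposite vertices share a vector, adjacent
# vertices get antipodal vectors, so every hyperplane cuts all 4 edges.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
rng = np.random.default_rng(1)
print(gw_round(X, A, rng)[0])  # -> 4 for any hyperplane
```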
Apart from the exact and the ρ-approximation algorithms for solving the max-cut problem, a set of inexact methods with no performance guarantee have also been proposed. In [13], the authors use the solution of [9] to initialize a local search that improves the max-cut solution, and also investigate heuristic methods such as Metropolis and simulated annealing (SA). The results show that as the graph grows, the SA performs effectively in terms of solution quality. In [14], the quality of the SDP method of [9], the random strategy of [7], genetic algorithms, the combinatorial algorithm of [8], and a divide-and-conquer (D&C) strategy are compared. The results show that genetic algorithms give solutions quite similar to those of the random strategy [7], but with much longer running time. Moreover, although the combinatorial method performs worse than the SDP, it is much faster. The D&C strategy is shown to offer a good compromise between quality and running time.
In [15], a heuristic rank-2 relaxation is developed and the CirCut algorithm is introduced for solving the max-cut problem. In [16], a variable neighborhood search (VNS) method, which is a local-search-based metaheuristic [17], is combined with the path-relinking (PR) method [18]. The resulting method, called VNSPR, performs well in both runtime and accuracy. In [19], an efficient implementation of the scatter search (SS) method of [20] is presented, with satisfying approximation results. Also, the dynamic convexized (DC) method of [21] performs a local search on a dynamically updated auxiliary function and shows good performance on the max-cut problem.
However, most of the previous works do not establish convergence properties for their algorithms. In addition, all of the above methods suffer from high computational complexity, especially for graphs with a very large number of vertices. These algorithms are also sensitive to the problem characteristics: the more edges the graph has, the longer they take to execute.

A. Contribution
In this research, we develop a Riemannian gradient-based algorithm for solving the max-cut problem, which we call the Riemannian gradient-based max-cut (RGMC) algorithm. We show how the max-cut problem can be formulated as an unconstrained optimization problem over a Riemannian manifold. By restricting the problem to a Riemannian manifold, we prove the convergence of the proposed gradient-based algorithm. Simulations show that the RGMC method is much faster than the others at the cost of a slight decrease in performance: on average, it is 35 times faster than the best well-known methods and finds a cut whose value is approximately 0.96 of the max-cut value found by some recent algorithms. Moreover, as opposed to the previous algorithms, the runtime of the proposed algorithm does not increase greatly with the number of vertices, which is verified by the simulation results.

B. Outline
The remainder of this brief is organized as follows. Section II presents our main results. Simulation results are given in Section III. Section IV concludes the brief, and the definitions and proofs are presented in the Appendix.

II. MAIN RESULTS
The max-cut problem in (2) can be rewritten as

\max_{y \in \{-1,1\}^n} \; \frac{1}{4} y^T L y,   (4)

where L denotes the Laplacian of the graph, defined as

L = D - A,   (5)

with D being the degree matrix of the graph. It can be shown that (4) is equivalent to the problem

\max_{x \in \mathbb{R}^n} \; \frac{1}{4} \,\mathrm{sgn}(x)^T L \,\mathrm{sgn}(x),   (6)

where sgn(·) acts elementwise. Then, for the optimal solutions of (4) and (6), denoted respectively by y^* and x^*, we have y^* = sgn(x^*). In this way, we have formulated the max-cut problem as a nonsmooth optimization problem over the Euclidean space.
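The identity cut(y) = (1/4) y^T L y, with y in {-1, 1}^n and L = D - A, can be checked numerically on a small graph; the example graph and labeling below are our own.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A, with D the diagonal degree matrix."""
    return np.diag(A.sum(axis=1)) - A

# Graph with edges {0-1, 0-2, 1-2, 1-3}; take the cut S = {0, 2, 3}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
L = laplacian(A)
y = np.array([1, -1, 1, 1])          # y_i = 1 iff vertex i is in S
quad = y @ L @ y / 4.0               # quadratic-form value of the cut
S = y > 0
direct = A[np.ix_(S, ~S)].sum()      # cut value counted edge by edge
print(quad, direct)  # both equal 3: edges 0-1, 1-2, 1-3 cross the cut
```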
To make (6) differentiable, we approximate sgn(q) by the differentiable function \frac{2}{\pi}\tan^{-1}(\alpha q), which becomes more accurate as α increases. Then, (6) can be written as

\max_{x \in \mathbb{R}^n} \; \frac{1}{4} h_\alpha(x)^T L \, h_\alpha(x),   (7)

where h_\alpha(\cdot) applies q \mapsto \frac{2}{\pi}\tan^{-1}(\alpha q) elementwise. Moreover, to incorporate the approximation accuracy into the optimization, we add a penalty term with positive regularization parameters γ_i acting on each element x_i of x, which yields the maximization problem (8). The max-cut solution is then estimated as y^* = sgn(x^*), where (x^*, \alpha^*) is a local optimum of (8). Since this solution is independent of α, we can absorb α into the optimization variable and rewrite (8) as problem (9) in the variable z, whose local optimum z^* gives y^* = sgn(z^*). Next, by expanding (9), we rewrite it as (10), where γ is the vector of the γ_i's and diag(·) maps a vector to a diagonal matrix. To optimize the regularization parameters as well, we incorporate γ into the problem as the joint minimization (11) over z \in \mathbb{R}^n and \gamma \in \mathbb{R}^n_+, where \mathbb{R}_+ is the set of positive real numbers. To solve (11), we propose a gradient descent method over the product manifold M = \mathbb{R}^n \times \mathbb{R}^n_+, as presented in Algorithm 1. The max-cut solution is then y^* = sgn(z^*), where X^* = (z^*, \gamma^*) \in M denotes the optimum obtained from Algorithm 1 [22]. The convergence of Algorithm 1 for the problem defined by (11) is investigated in the following.

Algorithm 1: Riemannian Gradient Descent
Input: differentiable cost function F, manifold M, inner product \langle \cdot, \cdot \rangle, retraction function R.
Result: X^* \in M.
for j = 0, 1, 2, ... do
  Step 1: Set \xi_j as the negative direction of the gradient, \xi_j := -\mathrm{grad}\, F(X_j).
  Step 2: Convergence evaluation: if \|\xi_j\| < \tau, then break.
  Step 3: Find the smallest nonnegative integer m satisfying the sufficient-decrease (Armijo) line-search condition.
  Step 4: Set the updated point as X_{j+1} := R_{X_j}(\bar{\alpha}\beta^m \xi_j).
end for
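Algorithm 1 can be sketched in code on a single sphere. This is a minimal illustration under assumptions: the Armijo condition in Step 3 is the standard one of [22] (the brief's exact inequality is not reproduced here), and the demo cost is a simple stand-in quadratic, not the paper's cost (11); all function names are ours.

```python
import numpy as np

def sphere_grad(x, egrad):
    """Riemannian gradient on the unit sphere: project the Euclidean
    gradient onto the tangent space at x."""
    return egrad - (x @ egrad) * x

def retract(x, v):
    """Retraction on the sphere: step in the tangent space, renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def riemannian_gd(F, egradF, x0, alpha=1.0, beta=0.5, sigma=1e-4,
                  tau=1e-8, max_iter=500):
    """Steepest descent with Armijo backtracking on S^(n-1), in the
    pattern of Algorithm 1 (assumed standard line-search condition)."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        xi = -sphere_grad(x, egradF(x))          # Step 1: descent direction
        if np.linalg.norm(xi) < tau:             # Step 2: convergence test
            break
        m = 0                                    # Step 3: backtracking search
        while F(retract(x, alpha * beta**m * xi)) > \
                F(x) - sigma * alpha * beta**m * (xi @ xi):
            m += 1
        x = retract(x, alpha * beta**m * xi)     # Step 4: retracted update
    return x

# Stand-in cost F(x) = -x^T M x / 2 on the sphere: its minimizers are the
# dominant eigenvectors of M (here +-e1, since M is diagonal).
M = np.diag([3.0, 1.0, 0.5])
x = riemannian_gd(lambda x: -0.5 * x @ M @ x, lambda x: -M @ x,
                  np.array([0.4, 0.8, 0.45]))
print(np.round(np.abs(x), 3))  # -> [1. 0. 0.]
```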
Lemma 1: A sequence generated by Algorithm 1 for (11) does not necessarily converge to a point in M = \mathbb{R}^n \times \mathbb{R}^n_+.
Proof: See Appendix B.
To guarantee the convergence of Algorithm 1, we modify problem (11) so that any sequence generated by Algorithm 1 lies in a compact set, and then present a theorem on the convergence of Algorithm 1. To realize the aforementioned compactness, the feasible set must be bounded and closed. For this purpose, we modify problem (11) in two steps. In the first step, we impose on (11) the constraints \|z\|_2 = 1 and \sum_i \gamma_i = 1, which make the feasible set bounded, where \|\cdot\|_2 is the \ell_2-norm. Then, (11) turns into problem (13) over S^{n-1} \times \mathcal{S}^{n-1}, where S^{n-1} is the (n-1)-dimensional sphere and \mathcal{S}^{n-1} is the (n-1)-dimensional statistical manifold (see Definition 5).
In the second step, we use a logarithmic-barrier function to confine γ within a compact set, modifying (13) into problem (14), where υ is an infinitesimal positive value and ε is a very small value in the interval (0, 1). By this construction, any sequence generated by Algorithm 1 for (14) belongs to the compact set S^{n-1} \times [\varepsilon, 1-\varepsilon]^{n-1}. However, since γ_i = ε is not in the feasible set of (14), we present the following theorem to derive a convergence guarantee for (13).
Theorem 1: Let υ be an infinitesimal positive value and ε a very small value in the interval (0, 1) in (14). Also, assume that Algorithm 1 generates an infinite sequence \{X_j\}_{j=0}^{\infty} for (14). Then, this sequence has a limit point X^* \in N = S^{n-1} \times \mathcal{S}^{n-1} for problem (13).
Proof: See Appendix C.
III. SIMULATION RESULTS

To evaluate the performance of the proposed algorithm, we compare it with four other approaches: SS, CirCut, VNSPR, and DC. Our evaluation criteria are the value of the max-cut solution and the runtime. All experiments are performed in MATLAB 2015 on a computer with an Intel Core i5-4210U CPU (1.7 GHz) and 6 GB of RAM.
In the main experiment, we compare the methods on the test graphs G1, G2, G3, G4, G5, G12, G13, G14, G15, G16, G17, G22, G23, G24, G25, G26, G35, G36, G37, G38, G44, G45, G46, G47, G48, G49, G50, G51, G52, G53, G54, which have binary weights and were generated by Helmberg and Rendl [23]. The results presented in Table I show that our algorithm is both fast and accurate: it finds cut values with an average ratio of 0.9597 to the best cut, with an average speedup factor of 35 over the best of the other algorithms. To run the algorithm for solving (14), we use random initialization, and the results are averaged over 10 independent trials. For further evaluation, we also consider the graphs G55, G58, G60, and G63 from [23], which have 5000 and 7000 nodes; the resulting runtimes are satisfying. Note that, since the competing algorithms do not converge on these graphs in a reasonable time, their cut values have not been reported here or in other works. These results are also presented in Table I.
Finally, to justify the proposed algorithm for the very large graphs arising in circuits and systems, we consider a random unweighted, undirected graph with 15000 nodes and 34468 edges. The convergence time is 37.45 seconds, which is a reasonable value and demonstrates the computational efficiency of the proposed method. This is underlined by the fact that the other methods could produce no satisfying results, as they require either a large amount of memory or a very long convergence time. Furthermore, the reason for not considering more than 15000 nodes is that the PC ran out of memory when generating a graph Laplacian larger than 15000 × 15000.

IV. CONCLUSION
We proposed an efficient Riemannian optimization algorithm for the max-cut problem. To solve this problem, a Riemannian gradient descent method was developed and its convergence was proved. Simulation results showed the high accuracy and speed of the proposed algorithm in terms of cut value and runtime. Specifically, by inspecting the runtimes, we concluded that our method is almost 35 times faster than some well-known algorithms at the cost of a slight loss in the max-cut value, attaining 0.9597 of the others.

APPENDIX A
PRELIMINARY ON MANIFOLD GEOMETRY

In the following, some definitions required for Algorithm 1 are presented. More information may be found in [22].
Definition 1: Consider a manifold M. The set of all tangent vectors at a point x ∈ M, denoted by T_x M, is called the tangent space at x. The disjoint union of the tangent spaces is called the tangent bundle, denoted by TM.
Definition 2: Let M be a manifold. A smooth function that assigns a tangent vector to each point of M is called a tangent vector field ξ on M. An example of a tangent vector field is the gradient of a real-valued smooth function f over M, denoted by grad f.
Definition 3: A Riemannian manifold is a manifold equipped with a smoothly varying inner product. The inner product over a manifold is denoted by \langle \cdot, \cdot \rangle, and its restriction to the tangent space T_x M is denoted by \langle \cdot, \cdot \rangle_x.
Definition 4: Let M be a Riemannian manifold with tangent bundle TM. The exponential map, denoted by Exp(·), maps elements of TM to M such that, for v ∈ TM, Exp(v) = h(1), where h is the unique geodesic starting from the base point of v with initial velocity v at time 0. Moreover, the retraction function, denoted by R(·), brings a point of the tangent bundle back to the manifold while satisfying the properties of [22, Definition 4.1.1]. The restriction of the retraction function to the tangent space T_x M is denoted by R_x(·). A first-order approximation of the exponential map is one way of constructing a retraction function.
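On the unit sphere, both maps of Definition 4 have closed forms, and one can check numerically how closely the normalize-after-stepping retraction approximates the exponential map. This example is our own sketch; on the sphere this particular retraction happens to agree with Exp to second order, so the gap is O(|v|^3).

```python
import numpy as np

def exp_map(x, v):
    """Exponential map on the unit sphere: follow the great circle
    (geodesic) from x with initial velocity v for unit time."""
    t = np.linalg.norm(v)
    if t < 1e-15:
        return x.copy()
    return np.cos(t) * x + np.sin(t) * (v / t)

def retract(x, v):
    """A retraction on the sphere: step in the tangent space, renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1e-3, 0.0])       # small tangent vector (v is orthogonal to x)
print(np.linalg.norm(exp_map(x, v) - retract(x, v)))  # O(|v|^3), about 3e-10
```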
Definition 5: Let \mathcal{S}^n be the set of all positive measures p = (p_0, \ldots, p_{n-1}) with p_i > 0. Then, the subspace \mathcal{S}^{n-1} of \mathcal{S}^n with the property \sum_{i=0}^{n-1} p_i = 1 is called the statistical manifold.
APPENDIX B
PROOF OF LEMMA 1

Let \{X_j\}_{j=0}^{\infty} be a sequence generated by Algorithm 1 for (11), where X_j = (z_j, \gamma_j) \in M. Also, let X^* be the limit point of this sequence, and let the subsequence \{X_j\}_{j=N}^{\infty} belong to B_\epsilon(X^*), defined as the \ell_2-norm ball of radius \epsilon around X^*. We prove by contradiction that X^* does not necessarily belong to M. Assume that X^* \in M is a local optimal point for (11); in other words, there exists an arbitrarily small \rho-neighborhood B_\rho(X^*) for which

F(X^*) \le F(X), \quad \forall X \in B_\rho(X^*) \cap M.   (15)

It is easy to verify that a sequence of the form \{(z^*, \cdot)\} whose γ-component approaches the boundary of \mathbb{R}^n_+ contradicts this local optimality, so the limit point need not belong to M. ∎

APPENDIX C
PROOF OF THEOREM 1

Let \{X_j\}_{j=0}^{\infty} be any sequence generated by Algorithm 1 for (14); by Step 3 of Algorithm 1, it is a decreasing sequence. By the construction of (14), the elements of \{X_j\}_{j=0}^{\infty} belong to a subset of the compact set S^{n-1} \times [\varepsilon, 1-\varepsilon]^{n-1}.

We now show that \{X_j\}_{j=0}^{\infty} is also a valid sequence for (13), since it is decreasing for the objective function of (13), i.e., F(X_{j+1}) \le F(X_j) for all j \ge 0. Because \{X_j\}_{j=0}^{\infty} is decreasing for (14), the corresponding inequality (16) holds for the barrier-augmented objective. From (16), we see that for an infinitesimal value of υ, a decreasing sequence for (14) is also a decreasing sequence for (13). This means that, unlike for (14), for (13) we can consider the sequence \{X_j\}_{j=0}^{\infty} as belonging to the compact set S^{n-1} \times [\varepsilon, 1-\varepsilon]^{n-1}, because γ_i = ε belongs to the feasible set of (13). Since any sequence in a compact set has a convergent subsequence [24], \{X_j\}_{j=0}^{\infty} has a limit point X^*. By [22, Th. 4.3.1], which states that every limit point of a sequence generated by Algorithm 1 is a critical point, we conclude the convergence of Algorithm 1. ∎