Numerical estimation of the relative entropy of entanglement

Yuriy Zinchenko, ∗ Shmuel Friedland, † and Gilad Gour 3, ‡ Department of Mathematics and Statistics, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4 Department of Mathematics, Statistics and Computer Science University of Illinois at Chicago, 851 S. Morgan Street, Chicago, IL 60607-7045 Institute for Quantum Information Science, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4 (Dated: 26 August, 2010)


I. INTRODUCTION
With the emergence of quantum information science, entanglement was recognized as a valuable resource for a variety of quantum information tasks, such as teleportation and superdense coding (for review, see e.g.[1,2]).With this recognition, much of the research in the field focused on how to quantify this quantum resource.Among the different proposed measures quantifying entanglement, the relative entropy of entanglement (REE) is one of the most important ones.Its importance comes from the fact that its asymptotic version provides the unique rate for reversible transformations [3] -recently it was discovered that the regularized REE is the unique function that quantify the rate of interconversion between states in a reversible theory of entanglement, where all types of non-entangling operations are allowed [4].
The REE is defined by [5] E R (ρ) = min σ ∈D S(ρ σ ) ≡ S(ρ σ) , where D is the set of separable states or, as will be considered in this paper, the set of positive partial transpose (PPT) states, and σ is the corresponding (possibly nonunique) minimizer.Recall the relative entropy is defined by S(ρ σ ) ≡ Tr (ρ log ρ − ρ log σ ) .
The REE quantifies to what extent a given state can be operationally distinguished from the closest PPT state.Besides of being an entanglement monotone it also has nice properties such as asymptotic continuity.
The state σ = σ(ρ) in Eq. ( 1) is called the closest PPT state (or closest separable state).Recently, the inverse problem to the long standing problem [6] of finding the formula for the closest PPT state σ(ρ) was solved in [7] for the case of two qubits and in [8] for all dimensions and for any number of parties.In [7,8] the authors found a closed formula for the inverse problem.That is, for a given state σ on the boundary of PPT states (or separable states), ∂D, an explicit formula was found describing all entangled states for which σ is the closest PPT state.However, despite the complete solution in [8] for the inverse problem, the original problem remained unsolved.Moreover, the results on the inverse problem in [7,8], suggests that the solution to the original problem may not be analytical.Hence the need for numerical estimation of the REE.
In this paper, we provide, for the first time (to our knowledge), an algorithm for the calculation of the REE, although other authors have also used semidefinite programming to the separability problem (see e.g.[9][10][11]).At the core of our approach lies a classical geometric idea that a convex set may be, in principle, approximated by its supporting hyperplanes.Our algorithm is based on the so-called cutting planes combined with positive semi-definite optimization.Namely, we attempt to successively refine the epigraph of the relative entropy function S(ρ σ ) using supporting hyperplanes, building a piece-wise linear lower-approximation to S(ρ σ ), while using positive semi-definite optimization techniques to characterize our feasible set D. In low dimensions where D corresponds to qubit-qubit and qubit-qutrit pairs, our numerical approach appears to recover the REE value up to small absolute error in relatively short time without much of a difficulty.For example, on a modest desktop station we could compute REE for 50 randomly generated qubit-qutrit problems with an average run-time of under 1 second per problem with pre-set precision of 10 −2 for REE, see the last computational section for more details.In addition, in higher dimensions, our approach readily generalizes to computing an approximate REE over PPT states rather than the more elusive separable states set.
The paper is organized as follows.In the next section we introduce notations and preliminary results concerning the REE.We then discuss our algorithm in Sec-tion III.In Section IV we discuss our computational results.Finally, in last section we summarize our findings with a few conclusions.In addition, in [12] there is a link to the MATLAB driver code.

II. PRELIMINARIES
In what follows we use the following notation.Let C m×n denote the space of m×n complex-valued matrices.
-the set of Hermitian positive semi-definite matrices of trace 1 (i.e.density matrices).
We write A ≥ 0, A 0, A > 0 if A is positive semidefinite, nonzero positive semi-definite, positive definite, respectively.More generally for A, B ∈ H k we let We denote by I k the identity matrix of order k, which is also denoted by I when no ambiguity arises.A standard inner product on H k × H k is the trace-inner product defined as A, B = A • B = trace (A † B); for concreteness, we may fix our norm choice A = A, A .Remark: Unlike the introduction, for the remainder of the manuscript we adapt more math-and-optimization friendly notations, as in detailing our approach we generously borrow on optimization techniques and expect the remaining audience to be fairly optimization savvy.Particularly, for an excellent introduction to the area of the so-called convex optimization see [13].
eig(X) = {ξ i , x (i) } i=1,k denotes the set of eigenvalueeigenvector pairs of X ∈ H k .
For X ∈ H k,++ denote by log X the logarithm function of X.In the eigenbasis of X, log X is the diagonal matrix Diag(log ξ) = Diag(log ξ 1 , . . ., log ξ k ).It is well known that log X is order preserving, i.e., log X ≥ log Y for X ≥ Y > 0, and matrix-concave, i.e., log(tX+ on H k,++ ; see, for example [14].Moreover, log X is also strongly order preserving [8]: if X Y > 0 then log X log Y (more results about strict concavity of log X can be found in [8]).
Assume we are given fixed integers m, n > 1 and a fixed non-zero A ∈ H mn,+,1 .We are interested in finding the REE of A, that is, in finding where the set of separable states D m,n is defined as with conv(•) denoting a convex hull and ⊗ denoting the standard Kronecker product.Recall the Kronecker product A ⊗ B of two matrices A, B may be defined using a block-matrix notation as a matrix with (i, j) blocks (A⊗B) i,j = a i,j B, ∀i, j.The following basic fact justifies the inclusion D m,n ⊂ H mn,+,1 and is readily established from the definition: Since D m,n is compact and the objective −A • log X is continuous, the infimum is achieved and may as well be replaced by the minimum.We turn our attention to the objective function in (2).For a moment, if we assume that X is a diagonal matrix Diag(ξ) ≥ 0 and A 0, then −A • log X = k i=1 A i,i log ξ i .Therefore, in the limiting sense, −A • log X = ∞ if and only if A i,i > 0 and ξ i = 0 for some i.Likewise, if A > 0 and X ≥ 0 is singular, then A • log X = −∞.In the same limiting sense we may write 0 log ξ = 0 for any ξ ≥ 0. So, our objective function −A • log X may approach infinity or stay bounded as X approaches the boundary of D m,n .We may state our first elementary proposition.
Proposition II.1.The objective function of (2), −A • log X, is convex with respect to X on H mn,++ .Moreover, the minimum of −A•log X over H mn,++ is −A•log A and is achieved at X = A. Consequently, if A / ∈ int D m,nthe interior of D m,n , then the solution to (2) lies on the boundary of D m,n .
For brevity we only sketch the proof; using a limiting argument one has to carefully work out the boundary case where −A • log X approaches infinity.Convexity follows from log X being matrix-concave on H mn,++ .The first differential of −A • log X vanishes at affine-feasible X = A + ∆ -a sufficient condition for the minimum of a smooth convex function.Namely, borrowing from Subsection III A, expressions (4), (3), we know that for small ∆ ∈ H mn with trace ∆ = 0, up to first order , the last part follows from both the tlevel sets of the objective function {X ∈ H mn,+,1 : −A • log X ≤ t}, as well as D m,n , being convex.
In fact, one may state an even stronger statement about the log X and the objective as in [8, Appendix A].
Theorem II.2.The function log X is strongly matrixconcave on H k,++ .Consequently, the function −A•log X is strictly convex on H k,++ for each fixed A ∈ H k,++ .
Therefore, if A > 0 then the solution to ( 2) is given by a unique boundary point [8].If A is singular, then X(A) does not have to be unique [7].In general, the set of minimizing separable states form a compact convex subset of D m,n which intersects the boundary of D m,n .
The problem (2) may look deceptively simple.In general, even deciding if a certain X is separable is NPhard [15]; see [16] for a similar problem pertaining to the graph isomorphism.Thus, despite the fact that ( 2) is a convex optimization problem -the objective function is convex-we do not attempt to solve (2) simply relying on, for example, the so-called (polynomial) ellipsoid method, as finding a separating oracle for D m,n might be already quite difficult.To emphasize the distinction between D m,n and its containing superset H mn,+,1 we state our next simple proposition.
Proposition II.3.D m,n is a proper subset of H mn,+,1 , and, in particular, has flat faces besides those that coincide with faces of H mn,+,1 .
Likewise, let cannot be written as A⊗B for any A, B ∈ H 2,+,1 ; the last construction may be easily generalized to arbitrary m, n > 1.
Surprisingly, in small dimensions m = 2, n = 2, 3 the set D m,n admits very easy alternative characterization using the so-called partial transpose [17].The partial transpose (with respect to the second factor) is the linear map PT 2 : H mn → H mn , which is given by partial exchange of matrix entry indices, and may be conveniently stated as PT Fact II.4.For m = 2, n = 2, 3, X ∈ D m,n if and only if X ∈ H mn,+,1 and the corresponding partial transpose PT 2 (X) ∈ H mn,+ .Moreover, for higher dimensions m, n the partial transpose condition is necessary.
(The necessity easily follows from the earlier mentioned property of the eigenvalues for the Kronecker product.) The above characterization of D m,n , m = 2, n = 2, 3, is equivalent to viewing the set as an affine slice of H k,+,1 for some k > 0, which in turn allows us to capture D m,n via positive semi-definite optimization [18,19] techniques.Thus, in the latter case of m = 2, n = 2, 3 where the feasible set is relatively simple and the derivative of the objective function −A • log X is relatively easy to compute, see the next section, we do expect to find an efficient approach to solving (2), as in, for example, [13].

III. THE APPROACH
Our main goal is solving (2), and the proposed technique is fairly simple.Since the objective function is convex, we rely on constructing a successively refined sequence of approximations to the epigraph of the objective restricted to int D m,n ⊂ H mn,+,1 , defined as which in turn may be viewed as a convex set.The resulting relaxations to (2) corresponding to the positive semi-definite optimization may be efficiently solved with off-the-shelf (and even free) numerical software.
We do not attempt to prove the convergence of the proposed approach.Instead, we provide numerical evidence of its efficiency.Lastly, we want to remark that our approach is consistent with the classical cutting planes approach in non-linear optimization, with the latter being already well established in many practical settings.
Recall that log(U XU † ) = U log(X)U † for unitary U and any X.Thus, for arbitrary X ∈ H k,++ , the firstorder expansion of log(X + ∆) may be written by applying spectral decomposition of X = U X Diag(λ X )U † X with the unitary matrix of eigenvectors U X and the matrix Diag(λ X ) of the corresponding eigenvalues λ X of X, Using the above and a sequence of fixed points X (i) ∈ intD m,n , i = 0, . . ., N , we may proceed with constructing an approximation to epi(−A • log X)| int Dm,n .Namely, since −A • log X is convex on int D m,n , its epigraph is supported by tangent hyperplanes at every X (i) , and so epi( where for brevity of notation we write with s i ≥ 0, which is the same as with s i ≥ 0 and since . Finally, recalling that from Proposition II.1 we have t ≥ −A•log A, the above approximation to epi(−A • log X)| int Dm,n may be refined to Note that (4) contains only linear inequalities.Intuitively, if N is large and the sequence X (i) forms a nearlydense cover of D m,n , then the approximation (4) to the epigraph of the objective would be fairly accurate.Consequently, in order to find an approximate solution to our problem (2), we might as well work with the above approximation to epi(−A • log X)| int Dm,n , minimizing t.That is, instead of (2) we want to find the minimum of t satisfying (4) over all X ∈ D m,n .It turns out that the latter problem may be handled efficiently numerically as described in Subsection III C, where we also propose a scheme for generating X (i) incrementally with the end goal of keeping N small.From now, D m,n represents the set of separable states only for the cases of m = 2 and n = 2 or 3.In higher dimensions, our approach will work if we take D m,n to be the set of PPT states.We adapt a somewhat unusual re-parametrization of D m,n relying on Fact II.4,which will prove to be useful in Subsection III C.
Recall that X ∈ D m,n if and only if X ∈ H mn,+,1 and PT 2 (X) ∈ H mn,+ .That is, X ∈ H mn,+ , traceX = 1, and PT 2 (X) ∈ H mn+ .These conditions may be re-written in the following way.Denote ⊗ e (o) e (p) , G (i,j,o,p) = e (i) e (j) ⊗ e (p) e (o) . ( Here e (i) , e (j) and e (o) , e (p) are the standard unit vectors in R m and R n respectively, i.e., e i = 1 and e (i) j = 0, j = i.Then the conditions that X ∈ D m,n can be written as follows: (a) Unit trace of X: (c) Strictly upper-triangular blocks of PT 2 (X): Here the matrix Y corresponds to PT 2 (X).Note that due to X, Y being Hermitian, it suffices to specify the re-ordering of the entries of X into Y = PT 2 (X) only for the upper-triangular part of X.

C. Sequential positive semi-definite optimization approach
A problem corresponds to positive semi-definite optimization [18] if it may be written as inf where k × k matrices C, A (i) are fixed.Note that the above includes block-diagonally structured matrices by choosing appropriate A (i) , b i to zero-out off-diagonal blocks, nonnegativity constraints x ≥ 0 by considering diagonal 1 × 1 positive semi-definite blocks, and the socalled free variables by taking x = y − z where y, z ≥ 0. Without loss of generality we may assume C, A (i) ∈ H k ; indeed, for any k × k matrix C and X ∈ H k we have So, even if C, A (i) are non-Hermitian initially, we may easily symmetrize those by following the procedure above.
Efficient numerical algorithms exist for solving this type of problems.In particular, the so-called interiorpoint methods [18] provide both the theoretical polynomial bound on the number of iterations needed to find an -approximate solution and decent practical performance.There is a number of freely available interiorpoint method solvers, e.g., SeDuMi, SDPT3, CSDP, DSDP, etc., suitable for solving these type of problems.For our purposes we rely on MATLAB-based SeDuMi solver.
The approximation of epi(−A • log X)| int Dm,n given by ( 4) combined with the parametrization of D m,n given by ( 6),( 7),( 8),( 9) form a basis for our numerical scheme to compute an approximate solution to (2).Based on refining our approximation of epi(−A • log X)| int Dm,n , we build a sequence of positive semi-definite optimization problems, with each problem resulting in finer and finer approximation to a solution of (2).
Let * denote the optimal value of (2), As we explained in §II, r * is attainable and we may replace inf by min.Let r ≤ r * ≤ r denote lower and upper bounds on r * respectively.For a given ε > 0, we say that Recalling that 1 mn I ∈ D m,n , the basic approach to find X * and approximate the true value r * may be summarized as follows: [0] if A ∈ D m,n then return X * = A, r = r = 0 (recall Fact II.4) and stop, else [2] solve positive semi-definite optimization problem corresponding to min t,X {t : (t, X) satisfy ( 4), ( 6), ( 7), ( 8), ( 9) and t ≥ r}, (11) and store its optimal solution pair (t, X), [3] update r = t, [4] if r − r ≤ ε then return X * , r, r and stop, else set X (N +1) to be the next X-refinement point for epi(−A • log X)| int Dm,n and increment To justify our termination criteria in [4], observe that since the approximation to (convex) epi(−A • log X)| int Dm,n is based on supporting hyperplanes, the solution to (11) always corresponds to a lower bound on the optimal value of −A • log X in (2).Note that r and r are non-decreasing and non-increasing respectively at each iteration of our scheme.
It is left to explain how we construct X-refinement points.

D. X-refinement point for epi(
Given a sequence of points X (0) , . . ., X (N ) in int D m,n , a point X solving (11), and some fixed reference point Z ∈ int D m,n , we generate X (N +1) simply by minimizing −A • log X on the linear segment [Z, X], that is, In fact, we are interested in finding the approximate minimizer in (12) with a given precision, so the equality above should be interpreted in approximate sense.
Given the tolerance δ > 0, the one-dimensional minimization may be efficiently performed using, say, the standard derivative-based bisection scheme as follows: • set the end points X start = Z, X end = X of the interval containing the approximate minimizer of (12) X min [X start , X end ], set the initial value of This scheme quickly terminates with an interval [X start , X end ] X min such that X end − X start < δ; note that the directional derivative of −A • log X may be easily evaluated using (3).We set X (N +1) = X min .
Particularly, we experiment with two choices of the reference point Z: A. (conservative strategy) Z = 1 mn I, B. (aggressive strategy) Z = X * , where X * corresponds to the best upper bound r on r * obtained so far.
Recall that generally in the absence of degeneracies the solution to (11) lies in the boundary of semi-definite cone.Therefore, intuitively, choice A may encourage the next refinement point X (N +1) to be more centered, while B. would be more aggressive choice that in a sense targets "minimizing the discrepancy" between the lower and upper bounds r, r, and so we would generally expect B to result in smaller N needed, i.e., a fewer number of cutting planes added.Indeed, our intuition seems to be concurred by numerics, see the next section.Despite its seeming simplicity, our choice of X (N +1) proves to be very effective numerically in refining the objective epigraph epi(−A • log X)| int Dm,n near the region of interest, i.e., close to an ε-approximate solution to (2); see the next section for some illustrations.
Next we comment on a few possible reasons for this surprising efficiency.
Remark III.1.We solve (11) only approximately since, in general, presently this is the only option for handling positive semi-definite optimization problems efficiently, i.e., in polynomial time.However, this does not seem to negatively affect our approach to (2) as practically optimization problem ( 11) is solved to a very high (nearmachine) precision.
Remark III.2.It is only for notational convenience that in Subsection III A and from that point on we restrict X (i) to belong to the interior of D m,n rather than the whole set itself.Observe that in general −A • log X may approach infinity when X approaches the boundary of D m,n .Since we want to use (4) in forming (11) and subsequently solve the problem numerically, such a situation is clearly undesirable.In practice, to overcome such a tendency for X (i) we implement a simple numerical safeguard re-centering procedure, briefly outlined at the end of the subsection.
In principal, one can prove the convergence of a minimization algorithm if it is possible to show that the objective function gets reduced by some non-trivial amount at each iteration of the algorithm.If we have a smooth objective function with no constraints, one natural choice is to follow the steepest descent direction -the direction of the negative gradient-at each iteration, until a significant decrease in the objective is achieved.However, when constraints are present, such as the case of (2), following the steepest descent direction might not work well as the direction might send us back to the boundary of the feasible region over and over again, thus, hindering the progress of our algorithm.Therefore, intuitively, we want to (i) make long enough steps that guarantee a sufficient decrease of the objective at each iteration, and, at the same time, (ii) stay away from the boundary of D m,n whenever possible.
The first goal is accomplished by considering (11) instead of, say, merely minimizing −A • log X along the steepest descent direction.Potentially, this allows us to traverse the whole feasible region D m,n in search of a better next iterate.In other words, solving (11) rather than minimizing the objective function along the gradient hopefully prevents the algorithm from being stuck near the boundary of D m,n and allows for long enough steps.Recall that to guarantee that we actually decrease the objective at each iteration rather than overshoot towards a point with even larger objective value as compared to the one we started with, we combine solving (11) with backtracking line-search (12) along [Z, X].
The second goal is in part accomplished by relying on the maximal-rank property of the path-following IPM used to solve (11).Here, interestingly, a seemingly trivial refinement in (4) that replaces unrestricted t with t ≥ −A • log A makes a dramatic difference in the numerics.That is, while t ≥ −A • log A probably does not have such a huge impact on refining the epigraph itself, it certainly helps to produce a well-centered solution X * to (11) by finding a full-rank solution X * -hopefully far away from the boundary of D m,n -if such exists at least during the very first iterations of our scheme.
Furthermore, if the addition of next point X (N +1) to the approximate description of the epi(−A • log X)| int Dm,n according to (4) causes numerical problems for solving (11), we augment X (N +1) (as many times as necessary) by re-centering it towards 1 mn I, that is, by setting X (N +1) = βX (N +1) +(1−β) 1  mn I where β ∈ (0, 1) is some pre-set constant.In our experiments we use the default value of β = .5.The rationale for the latter is that 1 mn I is a well-centered point in D m,n and hence does not cause any numerical problems in (4).

IV. COMPUTATIONAL RESULTS
Our numerical approach was implemented in MAT-LAB, using SeDuMi solver version 1.1R3 and tested with MATALB version R2007a.We tested our algorithm on Intel Xeon 2.4 Ghz machine with 2GB RAM.
We ran our numerical scheme on a set of 100 randomly generated matrices A: 50 instances corresponding to m = 2, n = 2 with targeted default solution precision of ε = 10 −3 , and 50 instances corresponding to m = 2, n = 3 with targeted default solution precision of ε = 10 −2 , setting δ = 10 −10 .All the runs were successful.
Comparing strategies A and B, indeed, the latter aggressive approach appears to produce superior numerical results.For example, on the 50 problems corresponding to m = 2, n = 3, the average solution times per problem instance were decreased almost 5-fold, with average 65.14 outer iterations per problem in case of A resulting in slightly over 5 seconds run-time per instance, compared to only average 13.8 outer iterations and under 1 second run-time per problem while using B.
For the randomly generated problems, our choice of the termination precision level ε was stipulated by the following performance metric that may be used to judge the quality of the approximation.For a given problem instance, one may introduce relative gap defined as g r = r − r r * .
Note that since A•log A is negative due to the eigenvalues of A being non-negative and adding up to 1, and so r > 0, we can bound g r ≤ r − r r .
So, for example, despite a seemingly high value of of ε = 10 −2 in case of m = 2, n = 3, most of the problems were terminated with g r < 10%.For a more detailed illustration, we report on the following three examples, where, in addition, one may compute the relative entropy with ≈ meaning that the actual computed values in X * differ form the above by no more than 10 −5 , relying on strategy A, in case of B the results are similar with solution being found in 5 iterations, • 4 × 4 full-rank (entangled) Werner state, here m = n = 2 and with ≈ meaning that the actual computed values in X * differ form the above by no more than 6 • 10 −5 , relying on strategy A, in case of B the results are similar with solution being found in 25 iterations.
For the above examples, in an attempt to make it even more convincing case, we set the targeted precision to ε = 10 −7 (with the corresponding δ = 10 −12 ). Ref.
[12] contains MATLAB driver routine that calls SeDuMi solver as its internal subroutine; note that functions vec and mat are internal to SeDuMi but are trivial to re-implement as stand-alone routines.For more details on the implementation of SeDuMi see [21] and the references therein.If needed, one may replace SeDuMi by some other appropriate solver such as SDPT3, CSDP, etc.The driver routine is written with the primal goal of having the code transparent and readable, rather than achieving maximal computational efficiency.For example, although the use of MATLAB's cells allows for a greater transparency, it is definitely not the most efficient and fast approach from the computational point of view.

V. CONCLUSION
We propose the first, to our knowledge, numerical algorithm for computing REE E R (σ) in small dimensions for general σ.Our numerical experiments support the viability of our approach.The efficiency of our numerical scheme may potentially be improved by considering the second-order-based approximation to epi(−A • log X)| int Dm,n and consequently relying on the so-called convex quadratic positive semi-definite optimization problems instead of (11), which are still amenable to the interior-point methods.We do not pursue this route since we are quite satisfied with the performance of our numerical approach so far.
Optimistically, we hope that one may actually establish a (polynomial) complexity bound on finding an εapproximate solution to (2) by following a variant of the proposed scheme.Also, we hypothesize that in turn ε gives rise to a bound on the proximity to the true solution of (2).Presently, both are under further investigation.
Note that our approach is also applicable in a straightforward fashion to computing the lower bound on (2) where D m,n is replaced by positive partial transpose states.We experimented with larger matrix sizes going up to m = 5, n = 10 and still were able to get the 10 −2 -approximate solutions, although, the actual computations took much longer.If desired, in general in low dimensions we could achieve far greater numerical precision than 10 −2 as illustrated in the computational section.