THE CONVERGENCE OF THE METHOD OF CONJUGATE GRADIENTS AT ISOLATED EXTREME POINTS OF THE SPECTRUM

Let A be a positive definite matrix with a simple eigenvalue λ₁ that lies outside an interval [α,β] containing the remaining eigenvalues. Let the method of conjugate gradients be applied to the solution of the linear system Az = b, producing a sequence of iterates z₁, z₂, ... and an associated sequence of error vectors eₖ = z − zₖ. In this paper bounds are obtained for the component of the error vector lying along the eigenvector associated with λ₁. The bounds imply that, provided λ₁ is well separated from [α,β], this component will decrease rapidly, even when the matrix A is moderately ill conditioned.


Introduction
In this paper we shall be concerned with the method of conjugate gradients [2] for solving the system of linear equations Az = b, where A is positive definite. The algorithm (1.3) will work for any set of directions satisfying the conjugacy condition (1.2), that is, dᵢᵀA dⱼ = 0 for i ≠ j (see [7] for a proof and generalizations). What distinguishes the method of conjugate gradients from other conjugate direction algorithms is that the direction dₖ is taken to be a linear combination of the first k members of the Krylov sequence (1.4). The usual convergence theory associates rapid convergence of the method with the well conditioning of A [1]. Even for the moderately ill conditioned systems associated with the numerical solution of partial differential equations, the method may produce acceptably accurate solutions surprisingly quickly [1,6], a phenomenon which is not covered by the existing theory and is not well understood. The behavior of the method in these applications apparently depends rather delicately on the spectrum of the matrix A and its relation to the solution.
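For reference, the iteration can be sketched as follows. This is a minimal textbook formulation in Python; the function name, the zero starting vector, and the residual-based stopping rule are our own choices, not the paper's.

```python
import numpy as np

def conjugate_gradients(A, b, k_max=50, tol=1e-12):
    """Conjugate gradients for a symmetric positive definite A.

    A textbook sketch: each direction d is conjugate (A-orthogonal) to
    its predecessors, and the k-th direction lies in the Krylov space
    spanned by b, Ab, ..., A^(k-1) b.
    """
    z = np.zeros_like(b)      # starting iterate z_0 = 0
    r = b.copy()              # residual r = b - A z
    d = r.copy()              # first direction d_1 = r_0
    for _ in range(k_max):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)        # step length along d
        z = z + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)  # keeps the directions conjugate
        d = r_new + beta * d
        r = r_new
    return z
```

In exact arithmetic this reaches the solution in at most n steps; the interest of the paper lies in what happens much earlier.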
The purpose of this paper is to make a start toward a more refined theory by examining the special case where A has a largest or smallest eigenvalue that is isolated from the rest of the spectrum. For the case of a largest isolated eigenvalue we shall show that the component of the error along the corresponding eigenvector must diminish rapidly at a rate that is independent of the condition number of A. Unfortunately, the order constant depends on a number that is bounded by the square root of the condition number of A; however, the derivation suggests that this bound will in many cases be an overestimate. If Pₖ were an orthogonal projector, this expectation could be easily verified.
However, Pₖ is an oblique projector (potentially quite oblique, since A may be ill conditioned), and the proper treatment of this obliqueness accounts for most of the detail in the sequel.
In the next section we shall use standard techniques to estimate how accurate an approximation to x₁ we can expect to find in ℛ(Dₖ). In Section 3 we shall examine the structure of the projector Pₖ. These results will be applied in Section 4 to prove a general theorem bounding the component of the error along x₁. We shall follow Householder's notational conventions [3,8], and we shall use the Euclidean vector norm defined by

    ‖x‖ = (xᵀx)^(1/2),

as well as the subordinate spectral matrix norm defined by

    ‖A‖ = max_{‖x‖=1} ‖Ax‖.
Part of this work was done while I was visiting the IBM Thomas J. Watson Research Center, where I was particularly encouraged by a series of stimulating discussions with Dr. Philip Wolfe.

Obtaining an Accurate Eigenvector
Let the matrix A have eigenvalues λ₁, λ₂, ..., λₙ corresponding to the orthonormal system of eigenvectors x₁, x₂, ..., xₙ. Since we shall be concerned with the behavior of the method of conjugate gradients at either the largest or the smallest eigenvalue, we shall denote that eigenvalue by λ₁ and let [α,β] be the smallest interval containing the remaining eigenvalues. Set

    X₂ = (x₂, x₃, ..., xₙ),   Λ₂ = diag(λ₂, λ₃, ..., λₙ).

It follows that Ax₁ = λ₁x₁ and AX₂ = X₂Λ₂, and hence for any polynomial π

    π(A)x₁ = π(λ₁)x₁,   π(A)X₂ = X₂π(Λ₂).

In this section we shall attempt to determine how accurate an approximation to x₁ can be found in ℛ(Dₖ), where Dₖ was defined in the last section.
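The two polynomial identities are easy to check numerically. The sketch below uses an illustrative 6×6 matrix and the sample polynomial π(t) = 2t² − 3t + 1; all concrete values are our own, chosen only to exercise the identities.

```python
import numpy as np

# Build A = Q diag(lam) Q^T with orthonormal eigenvectors in Q.
rng = np.random.default_rng(1)
lam = np.array([10.0, 1.0, 1.5, 2.0, 2.5, 3.0])   # lambda_1 = 10 isolated
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag(lam) @ Q.T
x1, X2 = Q[:, 0], Q[:, 1:]                         # x1 and X2 = (x2,...,xn)
Lam2 = np.diag(lam[1:])

def pi(M):
    """The sample polynomial pi(t) = 2 t^2 - 3 t + 1 applied to a matrix."""
    return 2 * M @ M - 3 * M + np.eye(M.shape[0])

pi_lam1 = 2 * lam[0]**2 - 3 * lam[0] + 1
assert np.allclose(pi(A) @ x1, pi_lam1 * x1)       # pi(A) x1 = pi(lambda_1) x1
assert np.allclose(pi(A) @ X2, X2 @ pi(Lam2))      # pi(A) X2 = X2 pi(Lam2)
```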
This is equivalent to finding a linear combination of the members of the Krylov sequence (1.4) that is a good approximation to x₁.
If we write such a linear combination in the form π(A)d₁, where π is a polynomial of degree k−1, then we must determine π so that π(A)d₁ is a good approximation to x₁.

Assume that x₁ᵀd₁ is nonzero, and normalize π so that π(λ₁)x₁ᵀd₁ = 1. Then the vector π(A)d₁ has its component along x₁ equal to unity. The remaining components are given by

    X₂ᵀπ(A)d₁ = π(Λ₂)X₂ᵀd₁,

so that

    ‖X₂ᵀπ(A)d₁‖ ≤ { max_{α≤λ≤β} |π(λ)| } ‖X₂ᵀd₁‖.
The quantity in braces is well known to be minimized, over all polynomials of degree k−1 satisfying the normalization condition, by a suitably shifted and scaled Chebyshev polynomial. Theorem 2.1 has been given implicitly in [5] and explicitly in [4]. It implies that there is a vector in ℛ(Dₖ) whose 2-norm along X₂ (as opposed to its individual components) decreases to zero in proportion to the k-th power of (2.2).

The Structure of the Projector
The projector P may be defined by any matrix of full rank whose column space is the same as that of D. We shall find it convenient to take the columns of U to form an orthonormal basis for ℛ(D). From the results of the last section, we know that some vector in ℛ(D) can be expected to be a good approximation to x₁, and without loss of generality we may assume that this vector is u₁, the first column of U.
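That ℛ(Dₖ) contains a good approximation to x₁ when λ₁ is well separated is easy to see numerically. In the sketch below (the matrix and the starting vector are illustrative choices of ours, with d₁ playing the role of the first direction) we build an orthonormal basis U for a Krylov space and project x₁ onto it:

```python
import numpy as np

# Illustrative setup: lambda_1 = 50 isolated, remaining eigenvalues
# in [0.01, 1]; d1 is a random starting direction.
rng = np.random.default_rng(2)
n = 100
lam = np.concatenate(([50.0], np.linspace(0.01, 1.0, n - 1)))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T
x1 = Q[:, 0]
d1 = rng.standard_normal(n)

k = 8
vs, v = [], d1.copy()
for _ in range(k):                       # Krylov sequence d1, A d1, ...
    vs.append(v / np.linalg.norm(v))     # normalized for stability
    v = A @ v
U, _ = np.linalg.qr(np.column_stack(vs)) # orthonormal basis for R(D_k)
best = U @ (U.T @ x1)                    # orthogonal projection of x1
err = np.linalg.norm(x1 - best)          # distance from x1 to R(D_k)
# err is already tiny for modest k because lambda_1 is well separated.
```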
Partition U in the form (u₁, U₂). The inverses in the expressions that follow must exist, since B is positive definite. The bound for the error component depends on the fine structure of P.

Bounding the Error Component
It follows from the results of Section 2 that the method of conjugate gradients must ultimately reduce the component of the error along x₁ at a rate at least as great as the approach of the k-th power of (2.2) to zero. Unfortunately, the order constant depends on the conditioning of A. In the case where λ₁ < β, the factor in question becomes one of order unity.

If we define ε in this way, then when ε is small the term in braces in (4.2) becomes approximately equal to κ(A). The quantity ε itself, of course, continues to decrease at a rate that is independent of κ(A).