CONJUGATE DIRECTION METHODS FOR SOLVING SYSTEMS OF LINEAR EQUATIONS

A generalization of the notion of a set of directions conjugate to a matrix is shown to lead to a variety of finitely terminating iterations for solving systems of linear equations. The errors in the iterates are characterized in terms of projectors constructible from the conjugate directions. The natural relations of the algorithms to well known matrix decompositions are pointed out. Some of the algorithms can be used to solve linear least squares problems.


Introduction
The purpose of this paper is to describe a general class of algorithms for solving the equation

(1.1)    Ax = b,

where A is a nonsingular matrix of order n and x and b are n-vectors. The algorithms improve an approximate solution x_0 by stepping along a set of "conjugate" directions u_1, u_2, ..., u_n in such a way that the n-th vector so produced is the solution of (1.1). We shall show that with a suitable definition of conjugacy many well known methods, and some less well known ones, can be derived as special cases of our general algorithm.
The prototype for the class of conjugate direction algorithms was described by Fox, Huskey, and Wilkinson [1]. They take A to be Hermitian and define the linearly independent vectors u_1, u_2, ..., u_n to be A-conjugate if

    u_i^H A u_j = 0,    i ≠ j.

Given a starting approximation x_0, their algorithm is the following.

(1.2)    For k = 1, 2, ..., n
         1) r_{k-1} = b - A x_{k-1}
         2) γ_k = u_k^H r_{k-1} / u_k^H A u_k
         3) x_k = x_{k-1} + γ_k u_k

It should be noted that the division in statement 2 of (1.2) can always be carried out, since u_k^H A u_k is the k-th diagonal entry of U^H A U, which is diagonal and nonsingular.
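In modern notation, iteration (1.2) may be sketched as follows. This is a minimal illustration of ours, not code from the paper; the function and variable names are invented, and an orthonormal eigenvector basis of the Hermitian matrix A serves as a convenient set of A-conjugate directions.

```python
import numpy as np

def conjugate_direction_solve(A, U, b, x0):
    # Sketch of iteration (1.2): A Hermitian, the columns u_1, ..., u_n of U
    # A-conjugate, i.e. u_i^H A u_j = 0 for i != j.
    x = x0.astype(complex)
    for k in range(U.shape[1]):
        u = U[:, k]
        r = b - A @ x                                  # r_{k-1} = b - A x_{k-1}
        gamma = (u.conj() @ r) / (u.conj() @ (A @ u))  # statement 2 of (1.2)
        x = x + gamma * u                              # x_k = x_{k-1} + gamma_k u_k
    return x
```

Orthonormal eigenvectors u_i of a Hermitian A satisfy u_i^H A u_j = λ_j u_i^H u_j = 0 for i ≠ j, so they form one convenient A-conjugate set; in exact arithmetic x_n solves Ax = b after n steps.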
The above algorithm would be useless without a method for generating conjugate directions. Fox, Huskey, and Wilkinson show how a set of conjugate directions can be constructed recursively as linear combinations of the unit vectors e_1, e_2, ..., e_n. At each step the conjugation algorithm requires no more work than the solution of a triangular system, and the algorithm as a whole is therefore not an unreasonable method for solving linear systems.
The same algorithm was rediscovered, apparently independently, by Hestenes and Stiefel [3]. They showed that the conjugation algorithm could be regarded as a variant of Gaussian elimination on the matrix A.
Moreover, they pointed out that the set of directions generated by the conjugate gradient algorithm is A-conjugate, thus exhibiting the method of conjugate gradients as a special conjugate direction algorithm.
Section 3 is devoted to the description of a general conjugation algorithm and its consequences. In particular it is shown that different choices of the parameters in the algorithm lead to various methods, some well known, for solving linear systems, and that these methods are closely related to well known matrix factorizations.
Ideally the paper should end with a section detailing the author's extensive numerical experience with these algorithms. But the number of conjugate direction algorithms is quite literally infinite, and the choice of any single algorithm will probably be indicated by its suitability for the problem at hand. It is hoped that this paper will encourage independent workers to experiment with specific algorithms in various applications.

In conjugate direction schemes all that is required of A is that one be able to evaluate Ap for any vector p. In solving nonlinear equations this value may be approximated by the difference quotient

    [F(x + αp) - F(x)] / α

for some suitable value of α, which circumvents the need of calculating F' explicitly. We shall not pursue this line here; however, those who do may find the theory of §2 useful in constructing local convergence proofs.
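The difference-quotient approximation above is easy to realize in code. The following sketch is ours rather than the paper's; the names F, x, p, and alpha are illustrative only.

```python
import numpy as np

def approx_matvec(F, x, p, alpha=1e-7):
    # Approximates the product F'(x) p by the difference quotient
    # [F(x + alpha p) - F(x)] / alpha, so that a conjugate direction
    # scheme can be applied without forming F' explicitly.
    return (F(x + alpha * p) - F(x)) / alpha
```

For smooth F the error in this one-sided quotient is O(α), so α must balance truncation against rounding error.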
Throughout the paper we shall use Householder's notational conventions.
In addition ℂ^n will denote complex n-space, and ℂ^{m×n} the set of m×n matrices. The column space of A will be denoted by R(A) and the null space by N(A). Given any matrix A, the matrices A^{k|}, A^{|k}, A_{k|}, and A_{|k} will denote the submatrices consisting of, respectively, the first k rows, the first k columns, the last k rows, and the last k columns of A. Thus (A^{|k})^{k|} is the leading principal submatrix of A of order k.

Conjugacy
The proof that the vector x_n generated by (1.2) is a solution of (1.1) consists of verifying inductively that the k-th residual r_k = b - Ax_k is orthogonal to u_1, u_2, ..., u_k. Since U is nonsingular, r_n must be the null vector; i.e. b - Ax_n = 0.

The point to be noted is that the vectors u_1, u_2, ..., u_n serve two purposes: first they provide directions along which the approximate solutions x_k are to be altered, and second they delineate the subspaces in which the residuals r_k are forced to lie. The essential part of our generalization of the notion of conjugacy is to provide a second set of vectors to serve the second purpose.
Definition 2.1. Let A, U, V ∈ ℂ^{n×n} be nonsingular. Then (U,V) is an A-conjugate pair if L = V^H A U is lower triangular.
The generalized algorithm for solving (1.1) is a slight variant of (1.2).

Algorithm 2.2. Let b, x_0 ∈ ℂ^n.

    For k = 1, 2, ..., n
    1) r_{k-1} = b - A x_{k-1}
    2) γ_k = v_k^H r_{k-1} / v_k^H A u_k
    3) x_k = x_{k-1} + γ_k u_k

Again it should be noted that the algorithm can always be carried to completion; for the denominator v_k^H A u_k in statement 2 is the k-th diagonal of the lower triangular matrix L and must be nonzero since L is nonsingular. The last vector x_n produced by the algorithm is the solution of (1.1).
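A direct transcription of Algorithm 2.2 may be sketched as follows (our own sketch, with invented names). Note that with U = V = I and A lower triangular, L = V^H A U is A itself and the iteration reduces to forward substitution.

```python
import numpy as np

def conjugate_pair_solve(A, U, V, b, x0):
    # Sketch of Algorithm 2.2: (U, V) is an A-conjugate pair, so
    # L = V^H A U is lower triangular with nonzero diagonal.
    x = x0.astype(complex)
    for k in range(A.shape[0]):
        u, v = U[:, k], V[:, k]
        r = b - A @ x                                  # r_{k-1} = b - A x_{k-1}
        gamma = (v.conj() @ r) / (v.conj() @ (A @ u))  # denominator: k-th diagonal of L
        x = x + gamma * u
    return x
```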
Theorem 2.3. In Algorithm 2.2, x_n = A^{-1} b.

This theorem can be proved in three ways, each of which has advantages. The simplest way is to show inductively that r_k is orthogonal to v_1, v_2, ..., v_k, which implies that r_n = 0.
A second proof may be had by regarding A as a linear transformation of ℂ^n into ℂ^n. If the domain of A is equipped with the basis formed from the columns of U and the range with the basis formed from the columns of V, then the matrix representing the transformation A is the lower triangular matrix L. Moreover, in this coordinate system, Algorithm 2.2 becomes nothing more than the usual recursive algorithm for solving lower triangular systems.

The third proof follows from a detailed investigation of the errors in the x_k, which we now give. Let e_k = x - x_k. Then since r_{k-1} = A e_{k-1}, statement 3 of Algorithm 2.2 gives

(2.1)    e_k = (I - P_k) e_{k-1},    P_k = u_k v_k^H A / (v_k^H A u_k),

and hence

(2.2)    e_k = (I - P_k)(I - P_{k-1}) ... (I - P_1) e_0.

Thus the problem of characterizing e_k becomes one of characterizing the matrix (I - P_k)(I - P_{k-1}) ... (I - P_1). It is easily verified that

    P_k^2 = P_k;

i.e., P_k is a projector. In fact from (2.1) it is seen that P_k is the rank one projector onto the space spanned by u_k along the orthogonal complement of the space spanned by A^H v_k. Moreover, by the A-conjugacy of U and V, we have

    v_i^H A u_k = 0,    i < k,

whence P_i P_k = 0 for i < k; projectors satisfying this relation will be called conjugate.
Theorem 2.6. Let P_1, P_2, ..., P_k be conjugate projectors. Then

    Q_k = (I - P_k)(I - P_{k-1}) ... (I - P_1)

is a projector with R(Q_k) = N(P_1) ∩ N(P_2) ∩ ... ∩ N(P_k) and N(Q_k) = R(P_1) ⊕ R(P_2) ⊕ ... ⊕ R(P_k).

Proof. The theorem is trivial for k = 1. Assume its truth for P_1, P_2, ..., P_{k-1}, and let Q_k = (I - P_k)(I - P_{k-1}) ... (I - P_1) = (I - P_k) Q_{k-1}. Now from Lemma 2.5 it follows that Q_{k-1} P_k = P_k, and hence that Q_k^2 = Q_k; i.e. Q_k is a projector.
The column space of Q_k is given by

    R(Q_k) = N(P_1) ∩ N(P_2) ∩ ... ∩ N(P_k).

For let Q_k x = x. Then by Lemma 2.5, P_i x = P_i Q_k x = 0 for i = 1, 2, ..., k; conversely, if P_i x = 0 for i = 1, 2, ..., k, then Q_k x = x. We must still justify the use of the direct sum in the characterization of N(Q_k). By the induction hypothesis, it is sufficient to show that R(P_k) ∩ N(Q_{k-1}) = {0}. Now from the conjugacy conditions it follows that Q_{k-1} P_k = P_k, or P^{(k-1)} P_k = 0, where P^{(k-1)} = I - Q_{k-1}. Hence if P_k x = x (x ∈ R(P_k)) and P^{(k-1)} x = x (x ∈ N(Q_{k-1})), then

    x = P^{(k-1)} P_k x = 0.

Returning to the characterization (2.2) of e_k, we obtain the following result as a consequence of Theorem 2.6.
Theorem 2.7. In Algorithm 2.2, the errors e_k = x - x_k are given by

    e_k = (I - P^{(k)}) e_0,

where P^{(k)} is the projector onto R(U^{|k}) along R(U_{|n-k}) (or equivalently along the orthogonal complement of R(A^H V^{|k})).

Proof. It follows immediately from Theorem 2.6 and the form of the projectors P_k that P^{(k)} is the projector onto R(U^{|k}) along the orthogonal complement of R(A^H V^{|k}). By the conjugacy of (U,V), the orthogonal complement of R(A^H V^{|k}) is R(U_{|n-k}), which establishes the theorem.
Since P^{(n)} = I, it follows that e_n = 0, which proves Theorem 2.3.

The projector P^{(k)} can be represented as follows. Let W = U^{-1}. Then it is easy to verify that

    P^{(k)} = U^{|k} W^{k|}

and

    I - P^{(k)} = U_{|n-k} W_{n-k|}.

Since the residual vector r_k is simply A e_k, we have

    r_k = (I - A P^{(k)} A^{-1}) r_0.
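The representation of P^{(k)} is easy to check numerically. The following sketch (ours, with an arbitrary test matrix) verifies that U^{|k} W^{k|} is idempotent, fixes R(U^{|k}), and annihilates R(U_{|n-k}).

```python
import numpy as np

# W = U^{-1}; P = U^{|k} W^{k|} should be the projector onto the span of the
# first k columns of U along the span of the last n - k columns.
rng = np.random.default_rng(0)
n, k = 5, 3
U = rng.standard_normal((n, n))
W = np.linalg.inv(U)
P = U[:, :k] @ W[:k, :]
assert np.allclose(P @ P, P)                            # idempotent: a projector
assert np.allclose(P @ U[:, :k], U[:, :k])              # onto R(U^{|k})
assert np.allclose(P @ U[:, k:], np.zeros((n, n - k)))  # along R(U_{|n-k})
```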
The matrix I - A P^{(k)} A^{-1} is a projector. In fact, we have the following easy corollary of Theorem 2.7.
Corollary 2.8. The residuals in Algorithm 2.2 are given by

    r_k = (I - R^{(k)}) r_0,

where R^{(k)} = A P^{(k)} A^{-1} is the projector onto R(A U^{|k}) along the orthogonal complement of R(V^{|k}).
We conclude this section with two extensions of the notion of conjugacy.

The conjugation algorithm

The idea of the conjugation technique is simple. Given nonsingular matrices V, A, and P, we attempt to determine u_k as a linear combination of p_1, p_2, ..., p_k in such a way that U and V are A-conjugate. We shall call this process the A-conjugation of P with respect to V.
To determine when conjugation can be carried out, note that the process is equivalent to finding a nonsingular upper triangular matrix, which we shall denote by S^{-1}, such that U = P S^{-1}. The A-conjugacy of (U,V) requires that L = V^H A U = V^H A P S^{-1} be lower triangular. In other words V^H A P = L S must be factorizable into the product of a lower triangular matrix and an upper triangular matrix (V^H A P has an "LU factorization"). Since V, A, and P are nonsingular, the matrix S is uniquely determined up to the scaling of its rows (see [5, §1.4]), which implies that U is uniquely determined up to the scaling of its columns.
We summarize these results in the following theorem.
Theorem 3.1. Let V, A, and P be nonsingular. Then a necessary and sufficient condition that P can be A-conjugated with respect to V is that V^H A P have an LU factorization. In this case the conjugate vectors so obtained are unique up to scaling.
A reasonably efficient conjugation algorithm may be derived as follows.

    For k = 1, 2, ..., n
    1) solve the lower triangular system (V^{|k-1})^H A U^{|k-1} s_k = (V^{|k-1})^H A p_k
    2) u_k = σ_kk^{-1} (p_k - U^{|k-1} s_k),    σ_kk ≠ 0

Of course when k = 1, statement 1 is skipped and u_1 is determined as a scalar multiple of p_1.
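The conjugation steps above can be sketched in code as follows. This transcription is ours: the normalization σ_kk = 1 is chosen for simplicity, and np.linalg.solve stands in for a true triangular solver.

```python
import numpy as np

def a_conjugate(A, V, P):
    # A-conjugation of P with respect to V: build U column by column so that
    # L = V^H A U is lower triangular.  Requires V^H A P to have an LU
    # factorization (Theorem 3.1).
    n = A.shape[0]
    U = np.zeros((n, n), dtype=complex)
    for k in range(n):
        p = P[:, k].astype(complex)
        if k > 0:
            # Solve the lower triangular system formed from the columns
            # computed so far, then subtract U^{|k-1} s_k from p_k.
            Lk = V[:, :k].conj().T @ A @ U[:, :k]
            s = np.linalg.solve(Lk, V[:, :k].conj().T @ (A @ p))
            p = p - U[:, :k] @ s
        U[:, k] = p        # normalization sigma_kk = 1
    return U
```

By construction (V^{|k-1})^H A u_k = (V^{|k-1})^H A p_k - (V^{|k-1})^H A U^{|k-1} s_k = 0, which is exactly the strict upper triangle of L vanishing.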
An important feature of the conjugation technique is that the vectors v_k, v_{k+1}, ..., v_n are not needed to determine u_1, u_2, ..., u_k. This means that the choice of v_k can be deferred until after u_k has been computed, and thus can be made to depend on u_1, u_2, ..., u_k.
We shall now consider some of the algorithms that may be obtained by varying V and P in the conjugation algorithm. Each choice leads to a well-known matrix decomposition, and it is convenient to list the choices by the decompositions they determine.

Similarly, from the equation A^H V = U H^H, it follows that the columns of V may be calculated by recurrences analogous to those for the columns of U, where η_{kk} and η_{k,k+1} are defined as above and η_{k,k+1} is chosen so that v_{k+1}^H A u_{k+1} = 1. The vectors u_k and v_k can be generated simultaneously. At no stage is it necessary to retain more than two of the vectors u_k and two of the vectors v_k, which suggests that the algorithm may find application to large sparse linear systems.
When A is Hermitian and p_k = v_k = r_{k-1}, the above method, combined with Algorithm 2.2, yields a variant of the method of conjugate gradients [3].

Now suppose that the columns of U and V are required to be orthonormal, so that the equation A U = V L expresses the reduction of A to lower bidiagonal form. The columns of U and V may be generated successively by an algorithm which may be derived in much the same way as the biconjugation algorithm. From the equation A^H V = U L^H it follows that

    λ_{kk} u_k = A^H v_k - λ̄_{k,k-1} u_{k-1},

where orthonormality requires that

(3.6)    λ_{k,k-1} = v_k^H A u_{k-1},

and λ_{kk} is chosen so that u_k is of length unity:

(3.7)    λ_{kk} = ∥A^H v_k - λ̄_{k,k-1} u_{k-1}∥.

From the equation AU = VL
it follows that

    λ_{k+1,k} v_{k+1} = A u_k - λ_{kk} v_k.

The orthonormality conditions for V give the same values for the λ's as (3.6) and (3.7). Again the reduction can proceed stepwise and only the most recent vectors need be stored. This reduction was proposed by Golub and Kahan [2].
If v_1 ∈ R(A), then R(V) = R(A), and the algorithm can be used to solve least squares problems.
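The stepwise reduction can be sketched as follows. This is our own sketch of the recurrences (3.6)-(3.7) together with the recurrence for V; breakdown checks (a vanishing λ) are omitted, and the names are invented.

```python
import numpy as np

def bidiagonalize(A, v1, steps):
    # Generates orthonormal columns u_k, v_k satisfying
    #   lambda_kk u_k          = A^H v_k - lambda_{k,k-1} u_{k-1}
    #   lambda_{k+1,k} v_{k+1} = A u_k   - lambda_kk v_k,
    # so that V^H A U is lower bidiagonal.  Only the most recent u and v
    # need be retained at each step.
    v = v1 / np.linalg.norm(v1)
    u_prev = np.zeros(A.shape[1])
    lam_off = 0.0                        # lambda_{k,k-1}
    us, vs = [], [v]
    for _ in range(steps):
        w = A.conj().T @ v - lam_off * u_prev
        lam_diag = np.linalg.norm(w)     # lambda_kk: makes u_k of unit length (3.7)
        u = w / lam_diag
        w = A @ u - lam_diag * v
        lam_off = np.linalg.norm(w)      # lambda_{k+1,k}
        v = w / lam_off
        us.append(u)
        vs.append(v)
        u_prev = u
    return np.column_stack(us), np.column_stack(vs)
```

In exact arithmetic the normalization constants coincide with the inner products (3.6), so full orthonormality of the u's and v's follows from the two short recurrences alone.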