.EQ
delim @@
.EN
.TL
A Preconditioned Conjugate Gradient Method for Solving a
Class of Non-Symmetric Linear Systems
.AU
J.J. Dongarra, G.K. Leaf, and M. Minkoff
.AI
Argonne National Laboratory
.QS
Summary - This report describes a conjugate gradient preconditioning
scheme for solving a certain system of equations which arises in the
solution of a three-dimensional partial differential equation.
The problem involves solving systems of equations where the matrices
are large, sparse, and non-symmetric.
.QE
.sp 2
.FS
* Work supported in part by the Applied Mathematical Sciences Research
Program (KC-04-02) of the Office of Energy Research of the
U.S. Department of Energy under Contract W-31-109-Eng-38.
.FE
.NH
STATEMENT OF THE PROBLEM
.PP
We are concerned with the solution of large-scale linear systems where
the coefficient matrix arises from a finite difference approximation
(seven point) to an elliptic partial differential equation in three
dimensions.
Such systems can arise in the modeling of three-dimensional, transient,
two-phase flow systems [1,2].
The elliptic equation governs the pressure (or pressure difference) and
has to be solved numerically at each time step in the course of the
transient analysis.
Because the physical problem involves convection, the elliptic equation
contains first order partial derivatives, which implies that the usual
seven point finite difference approximation leads to a non-symmetric
coefficient matrix.
Since these matrices arise from a finite difference approximation in
three dimensions, the matrices will tend to be large.
For example, a finite difference mesh of @15 times 15 times 30@ mesh
cells leads to a linear system of order 6750, with about 45,000
non-zero coefficients in the matrix for a density of about .001.
Thus the class of matrices will be large, sparse, and non-symmetric.
In this report we review several possible approaches for solving
systems of this type, describe an implementation based on the use of
incomplete LU factorization coupled with a conjugate gradient method,
and finally describe some numerical results for this algorithm.
.NH
MATRIX GENERATION
.PP
In general, we are interested in solving linear systems @A x ~=~ b@
where @A@ is large, sparse, and non-symmetric.
A simple finite difference approximation to a Poisson equation with
convection (or transport) terms can be used as the model problem for
generating such matrices.
The matrices used in this report will be generated in the following
manner:
.PP
Consider the rectangular domain
.EQ
D ~=~ [ 0 , ~ L sub x ] ~ times ~ [ 0 , ~ L sub y ] ~times~ [ 0 , ~ L sub z ]
.EN
in three dimensions and the elliptic equation
.EQ I (2.1)
-{ partial sup 2 phi } over { partial x sup 2 } ~-~
{ partial sup 2 phi } over { partial y sup 2 } ~-~
{ partial sup 2 phi } over { partial z sup 2 } ~+~
OMEGA ( x,y,z) phi ~+~ V vec ( x,y,z ) cdot grad phi ~=~ F (x,y,z)
.EN
subject to the boundary conditions
@{ partial phi } over { partial n } ~=~ 0 @ on the four vertical faces
( @ partial over { partial n } @ denotes the normal derivative), and
either
.EQ I (2.2)
phi ~=~ G sup L (x,y,0) ~~or~~ { partial phi } over {partial n } ~=~ 0
.EN
on the bottom face, and either
.EQ I (2.3)
phi ~=~ G sup U ( x,y , L sub z ) ~~or~~ {partial phi } over { partial n } ~=~ 0
.EN
on the top face.
If a Neumann condition is imposed on all faces, then we shall specify
the value of @ phi @ at some point in order to ensure uniqueness.
We shall assume that @ OMEGA , V vec , F @ are given bounded functions
with @ OMEGA ~>=~ 0 @ everywhere in @D@.
.PP
For any three integers @ N sub x , N sub y , N sub z @ we generate a
mesh-centered difference approximation of the seven point type, using
centered differences on the convective terms.
Let @ DELTA x ~=~ L sub x / N sub x @, @ DELTA y ~=~ L sub y / N sub y @,
@ DELTA z ~=~ L sub z / N sub z @, @ x sub i ~=~ i DELTA x @,
@ 0 <= i <= N sub x @, @ y sub j ~=~ j DELTA y , ~ 0 <= j <= N sub y @,
@ z sub k ~=~ k DELTA z @, @ 0 <= k <= N sub z @.
We use an overbar to denote the midpoint of the mesh, for example
@ x bar sub i ~=~ half ( x sub i-1 ~+~ x sub i ) @.
We evaluate @ OMEGA , F , G sup L , ~and~ G sup U @ at the mesh centers;
thus for example
@ OMEGA sub ijk ~=~ OMEGA ( x bar sub i , y bar sub j , z bar sub k ) @.
The convection velocity @ V vec ~=~ ( V sup x , V sup y , V sup z ) @ is
evaluated at the midpoints of the faces of the mesh cells.
Thus for example,
.EQ
V sub i sup x ~=~ V sup x ( x sub i , y bar sub j , z bar sub k ) , ~
V sub j sup y ~=~ V sup y ( x bar sub i , y sub j , z bar sub k ) , ~and~
V sub k sup z ~=~ V sup z ( x bar sub i , y bar sub j , z sub k ) .
.EN
Let any quantity associated with the @ ( i , j , k ) sup th @ cell be
denoted with a zero subscript, the @ (i +- 1 , j , k ) sup th @ cell
with a @ left { pile { 1 above 2 } right } @, the
@ ( i , j +- 1 , k ) sup th @ cell with a
@ left { pile { 3 above 4 } right } @, and the
@ ( i , j , k +- 1 ) sup th @ cell with a
@ left { pile { 5 above 6 } right } @ subscript.
With this notation a seven point approximation for equation (2.1) has
the form
.EQ I (2.4)
a sub 0 phi sub 0 ~+~ sum from l=1 to 6 a sub l phi sub l ~=~ f sub 0
.EN
where at an interior cell
.EQ I
a)~~~a sub 0 ~=~ 2 ( 1 / DELTA x sup 2 ~+~ 1 / DELTA y sup 2 ~+~
1 / DELTA z sup 2 ) ~+~ OMEGA sub 0
.EN
.EQ I (2.5)
b)~~~a sub 1 ~=~ - 1 / DELTA x sup 2 ~-~ V sub i-1 sup x / 2 DELTA x , ~~
a sub 2 ~=~ - 1 / DELTA x sup 2 ~+~ V sub i sup x / 2 DELTA x
.EN
.EQ I
c)~~~ a sub 3 ~=~ - 1 / DELTA y sup 2 ~-~ V sub j-1 sup y / 2 DELTA y , ~~
a sub 4 ~=~ - 1/ DELTA y sup 2 ~+~ V sub j sup y / 2 DELTA y
.EN
.EQ I
d)~~~ a sub 5 ~=~ - 1 / DELTA z sup 2 ~-~ V sub k-1 sup z / 2 DELTA z , ~~
a sub 6 ~=~ - 1/ DELTA z sup 2 ~+~ V sub k sup z / 2 DELTA z
.EN
At the boundary cells these coefficients are modified according to the
boundary conditions.
For example, if the @ ( i , j , k ) sup th @ cell is on the bottom
interior face
@ ( k = 1 , ~ 2 <= i <= N sub x -1 , ~ 2 <= j <= N sub y -1 ) @,
then we use
.EQ I (2.6)
a sub 0 ~=~ 2 ( 1/ DELTA x sup 2 ~+~ 1 / DELTA y sup 2 ) ~+~
3 / DELTA z sup 2 ~+~ OMEGA sub 0 ~+~ V bar sub 0 sup z / 2 DELTA z , ~~
a sub 5 ~=~ 0
.EN
for Dirichlet conditions and
.EQ I (2.7)
a sub 0 ~=~ 2 ( 1/ DELTA x sup 2 ~+~ 1 / DELTA y sup 2 ) ~+~
1 / DELTA z sup 2 ~+~ OMEGA sub 0 ~-~ V bar sub 0 sup z / 2 DELTA z , ~~
a sub 5 ~=~ 0
.EN
for Neumann conditions, where
@ V bar sub 0 sup z ~=~ V sup z ( x bar sub i , y bar sub j , 0 ) @.
Similar modifications occur on the other faces, edges, and corners
depending on the imposed boundary conditions.
We do not illustrate the modifications to the right hand side since we
are primarily interested in generating matrices.
We order the unknowns @ phi sub ijk @ using a @( kij )@ ordering with
the linear index
.EQ I (2.8)
m ~=~ k ~+~ (i - 1 ) N sub z ~+~ (j-1) N sub z N sub x , ~~
1 <= k <= N sub z , ~ 1 <= i <= N sub x , ~ 1 <= j <= N sub y .
.EN
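.PP
As an illustration of this construction, the following sketch (in
present-day Python with NumPy and SciPy; the names are ours, and this
is not the code used in the study) assembles the stencil (2.5) under
the ordering (2.8).
For brevity the velocity is taken constant and the boundary
modifications (2.6) and (2.7) are omitted; coefficients belonging to
neighbors outside the mesh are simply dropped.
.DS L
.ft CW
# Sketch: seven point matrix of section 2, interior stencil (2.5)
# with the (kij) ordering (2.8).  Boundary modifications omitted.
import numpy as np
from scipy.sparse import coo_matrix

def index(i, j, k, Nx, Nz):
    # equation (2.8), shifted to 0-based indexing
    return (k - 1) + (i - 1) * Nz + (j - 1) * Nz * Nx

def assemble(Nx, Ny, Nz, Lx=1.0, Ly=1.0, Lz=1.0, V=(0.0, 0.0, 0.0),
             omega=0.0):
    dx, dy, dz = Lx / Nx, Ly / Ny, Lz / Nz
    Vx, Vy, Vz = V                       # constant velocity for simplicity
    rows, cols, vals = [], [], []
    for j in range(1, Ny + 1):
        for i in range(1, Nx + 1):
            for k in range(1, Nz + 1):
                m = index(i, j, k, Nx, Nz)
                rows.append(m); cols.append(m)          # a0 of (2.5a)
                vals.append(2.0 * (1/dx**2 + 1/dy**2 + 1/dz**2) + omega)
                nbrs = [((i-1, j, k), -1/dx**2 - Vx/(2*dx)),   # a1
                        ((i+1, j, k), -1/dx**2 + Vx/(2*dx)),   # a2
                        ((i, j-1, k), -1/dy**2 - Vy/(2*dy)),   # a3
                        ((i, j+1, k), -1/dy**2 + Vy/(2*dy)),   # a4
                        ((i, j, k-1), -1/dz**2 - Vz/(2*dz)),   # a5
                        ((i, j, k+1), -1/dz**2 + Vz/(2*dz))]   # a6
                for (ii, jj, kk), a in nbrs:
                    if 1 <= ii <= Nx and 1 <= jj <= Ny and 1 <= kk <= Nz:
                        rows.append(m)
                        cols.append(index(ii, jj, kk, Nx, Nz))
                        vals.append(a)
    n = Nx * Ny * Nz
    return coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
.ft R
.DE
For example, assemble(3, 3, 3) produces a @ 27 times 27 @ matrix with
the sparsity pattern of Figure 1, and assemble(15, 15, 30) produces a
system of the order 6750 size mentioned in section 1.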
.PP
In this manner we generate a class of large sparse matrices @ A @ which
are typical of the class of matrices we are concerned with in this
report.
We observe that the convective terms ensure that in general these
matrices will be non-symmetric.
This particular class of matrices is rather simple in that the second
order terms in the elliptic operator come from the Laplacian.
However, in the applications which motivated this study, the elliptic
operator had variable coefficients in the second order terms.
When considering methods, it is this larger and more general class of
matrices which we have in mind.
To illustrate the matrix structure we are dealing with, we note that a
@ 3 times 3 times 3 @ mesh leads to a @ 27 times 27 @ matrix @ A @ with
the sparsity pattern shown in Figure 1.
.sp 30
.EQ
FIGURE~1.
.EN
.EQ
SPARSITY~PATTERN~FOR~3~ times ~3~ times ~3~~ CASE.
.EN
.NH
BACKGROUND
.PP
As background to some of the work going on in this area, we briefly
sketch a few of the alternative approaches for solving this problem.
.PP
In a paper by Concus, Golub, and O'Leary [8] a generalized conjugate
gradient method is presented.
The original matrix @A@ is split into two matrices of the form
@ M ~=~ half ( A ~+~ A sup T ) @ and @ N ~=~ half ( A sup T ~-~ A ) @,
such that @ M ~=~ M sup T @, @ N ~=~ - N sup T @, @ A ~=~ M ~-~ N @,
and @ M x ~=~ N x ~+~ f @.
The form of the matrix @M@, it is hoped, is such that the system
@ M eta ~=~ b @ is easy to solve.
If this is not the case, then one must resort to a scheme in which an
inner and an outer iteration are used.
The inner iteration is used to solve @ M eta ~=~ b @, and the outer
iteration to find the solution of the original problem.
.PP
In a paper by Manteuffel [9], a method is described based on Tchebychev
polynomials in the complex plane.
For the solution of the linear system @ A x ~=~ f @ the following
iterative method can be used:
.EQ I
x sub i+1 ~=~ - alpha sub i A x sub i ~+~ ( 1 ~+~ beta sub i ) x sub i
~-~ beta sub i x sub i-1 ~+~ alpha sub i f .
.EN
Manteuffel shows that this method converges to the solution of
@ A x ~=~ f @ if the spectrum of @A@ is enclosed in an ellipse, with
foci @d-c@, @d+c@, in the right half plane, and if @ alpha sub i @ and
@ beta sub i @ are chosen to be
.EQ I
alpha sub i ~=~ { 2 T sub i (d/c) } over { c T sub i+1 (d/c) }
.EN
.EQ I
beta sub i ~=~ { T sub i-1 (d/c ) } over { T sub i+1 ( d/c ) }
.EN
where @ T sub i @ is the @ i sup th @ Tchebychev polynomial
.EQ I
T sub i (z) ~=~ cosh ( i cosh sup -1 (z) )
.EN
for @z@ complex.
Manteuffel [10] has provided an algorithm for estimating the parameters
@d@ and @c@ adaptively.
.PP
In order to improve the speed of convergence of the algorithm one may
use a preconditioning matrix and solve the resulting system.
Results involving a preconditioning of Manteuffel's algorithm are given
by van der Vorst and van Kats [11].
They considered the use of a fast Poisson solver, an incomplete Crout
factorization (with a parameter included to avoid partial pivoting),
and an incomplete Cholesky factorization as preconditioners.
Fill-in on stripes adjacent to the bands in the original matrix is also
allowed in the Cholesky approach.
Results for a discretization of a convection-diffusion equation in two
space dimensions are provided.
van der Vorst and van Kats conclude that Manteuffel's algorithm with
incomplete Crout preconditioning provides a competitive approach.
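.PP
To make the Tchebychev recurrence above concrete, the following sketch
(ours; Manteuffel's actual code estimates @d@ and @c@ adaptively)
carries out the iteration for fixed ellipse parameters, assumed real
with @ d/c ~>~ 1 @, forming @ alpha sub i @ and @ beta sub i @ from the
three-term recurrence for the Tchebychev polynomials.
.DS L
.ft CW
# Sketch of the Tchebychev iteration with fixed ellipse parameters
# d and c, using T_{i+1}(z) = 2 z T_i(z) - T_{i-1}(z).
import numpy as np

def chebyshev(A, f, d, c, x0, tol=1e-10, maxit=500):
    x_old = x0.copy()
    r = f - A.dot(x_old)
    x = x_old + r / d                       # classical first step
    z = d / c
    T_old, T, T_new = 1.0, z, 2*z*z - 1.0   # T_0, T_1, T_2 at z = d/c
    for i in range(1, maxit):
        r = f - A.dot(x)
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            break
        alpha = 2.0 * T / (c * T_new)       # alpha_i = 2 T_i / (c T_{i+1})
        beta = T_old / T_new                # beta_i  = T_{i-1} / T_{i+1}
        x, x_old = x + alpha * r + beta * (x - x_old), x
        T_old, T, T_new = T, T_new, 2.0 * z * T_new - T
    return x
.ft R
.DE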
.PP
In two papers [12,13] Axelsson has studied iterative methods for the
numerical solution of boundary value problems.
In particular, he extends the use of conjugate gradient methods to
nonlinear problems and to constrained boundary value problems.
In [13] he studies various incomplete factorizations, i.e., LU
factorizations which approximate @A@.
If, during factorization, we allow no fill-in or accept fill-in only in
certain entries, we might expect to obtain a good incomplete
factorization.
However, it is possible that the factorization process may be unstable,
since the dropping of fill-in may cause pivot elements to become
negative (assuming the matrix was originally positive definite).
In [14] Meijerink and van der Vorst show that for M-matrices the
process is in fact stable.
This is, however, not the case in general.
To overcome this problem, Gustafsson [15] presents a modification in
which neglected fill-in elements are added to the diagonal.
He proves this method to be stable for diagonally dominant matrices
(although the method is again unstable for general matrices).
Munksgaard and Axelsson [16] extend this approach by adding neglected
elements to the diagonal or to other elements in the row.
A major property of this technique is that row sums are preserved.
This is important since for second order Laplace problems using
constant mesh spacing @h@ [15] the condition number of @A@ is
@O( h sup -2 )@, whereas the condition number of the preconditioned
matrix is @O( h sup -1 )@.
This result holds if the row sums of @A@ and the preconditioning matrix
differ by no more than @O(h)@.
Thus by preserving row sums we reduce the condition number of the
preconditioned matrix which, in turn, reduces the number of conjugate
gradient iterations.
In addition to the above preselection of fill-in elements, Munksgaard
and Axelsson [16] have also considered the use of dynamic fill-in.
In this approach fill-in elements are selected during factorization
based on their magnitude.
.PP
Another approach to dealing with the instability of incomplete
decomposition is presented by Kershaw [5].
Kershaw identifies the unstable pivots as the decomposition proceeds,
and these pivots are perturbed to create a stable approximate
factorization.
He characterizes these pivots in terms of the number of significant
digits available on the machine and determines a correction to provide
a "best possible" inverse.
A bound on the resulting error matrix is also provided.
His approach is applicable to both complete and incomplete
factorizations.
Finally, Kershaw shows that for a complete factorization, if @p@ pivots
have been perturbed by his procedure then, with exact arithmetic, only
@2p ~+~ 1@ conjugate gradient iterations are needed.
.PP
When considering splittings of the matrix @ A ~=~ M ~-~ N @ to be used
with conjugate gradient algorithms, we would like to have a criterion
for selecting a "best" splitting.
In [3] Greenbaum considers splittings when @A@ is symmetric and
positive definite.
A sharp upper bound on the error is given, and a comparison test is
presented for determining when one splitting always yields a smaller
error than another splitting.
.PP
In [17] Greenbaum considers the effect of finite precision arithmetic
on the conjugate gradient algorithm.
She shows that the effect of finite precision versus exact arithmetic
is to cause the finite precision conjugate gradient algorithm to
periodically resemble the exact arithmetic algorithm associated with an
earlier step and a different starting point.
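.PP
The incomplete factorizations discussed above are easy to state in
code.
The sketch below (ours; dense storage for clarity, not a production
implementation) restricts elimination to a prescribed sparsity pattern
and, when modified=True, lumps the dropped fill-in onto the diagonal so
that row sums are preserved, in the spirit of Gustafsson [15].
.DS L
.ft CW
# Sketch: incomplete LU restricted to a given pattern (no fill-in when
# the pattern is that of A itself).  modified=True preserves row sums.
import numpy as np

def ilu0(A, modified=False, pattern=None):
    n = A.shape[0]
    LU = A.astype(float).copy()
    if pattern is None:
        pattern = A != 0                  # fill-in allowed only here
    for k in range(n - 1):
        for i in range(k + 1, n):
            if not pattern[i, k]:
                continue
            m = LU[i, k] / LU[k, k]       # multiplier, stored in L part
            LU[i, k] = m
            for j in range(k + 1, n):
                if LU[k, j] == 0.0:
                    continue
                if pattern[i, j]:
                    LU[i, j] -= m * LU[k, j]
                elif modified:
                    LU[i, i] -= m * LU[k, j]  # lump dropped fill on diagonal
    return LU      # unit lower triangle L and upper triangle U, packed
.ft R
.DE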
.sp
.NH
BACKGROUND TO CONJUGATE GRADIENT AND PRECONDITIONING
.PP
The method of conjugate gradients has been known for some time [7].
In theory the algorithm has finite termination after @n@ steps when the
matrix, say @A@, of order @n@ is symmetric positive definite.
Much of the early interest in the method faded because the finite
termination of the algorithm no longer holds in the presence of
roundoff errors, and as a direct method the algorithm is not
competitive with Gaussian elimination with respect to either accuracy
or number of operations.
The algorithm has a simple form and can be easily translated into a
computer program.
A conjugate gradient algorithm can be stated as follows:
.sp 2
Given a system @ A x ~=~ b @ of @n@ linear equations where the matrix
@A@ is symmetric positive definite and an initial vector @ x sub 0 @,
form the corresponding residual
.EQ
r sub 0 ~=~ b ~-~ A x sub 0 .
.EN
Set @ p sub 0 ~=~ r sub 0 @ and for @ i ~=~ 0,1,2, ... @ find
@ x sub i+1 , r sub i+1 , p sub i+1 , a sub i , ~and~ b sub i @ using
the equations
.EQ
a sub i ~=~ { r sub i sup T r sub i } over { p sub i sup T A p sub i }
.EN
.EQ
x sub i+1 ~=~ x sub i ~+~ a sub i p sub i
.EN
.EQ (4.1)
r sub i+1 ~=~ r sub i ~-~ a sub i A p sub i
.EN
.EQ
b sub i ~=~ { r sub i+1 sup T r sub i+1 } over { r sub i sup T r sub i }
.EN
.EQ
p sub i+1 ~=~ r sub i+1 ~+~ b sub i p sub i .
.EN
Termination is controlled by monitoring @ r sub i ~or~ p sub i @.
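.PP
Transcribed directly into code, the algorithm (4.1) reads as follows
(a sketch; the variable names are ours):
.DS L
.ft CW
# Sketch: the conjugate gradient recurrences (4.1) for symmetric
# positive definite A.
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-13, maxit=None):
    x = x0.copy()
    r = b - A.dot(x)                  # r0 = b - A x0
    p = r.copy()                      # p0 = r0
    rr = np.dot(r, r)
    for _ in range(maxit or len(b)):
        Ap = A.dot(p)
        a = rr / np.dot(p, Ap)        # a_i = r_i^T r_i / p_i^T A p_i
        x += a * p                    # x_{i+1} = x_i + a_i p_i
        r -= a * Ap                   # r_{i+1} = r_i - a_i A p_i
        rr_new = np.dot(r, r)
        if np.sqrt(rr_new) < tol:     # terminate by monitoring r_i
            break
        b_i = rr_new / rr             # b_i = ||r_{i+1}||^2 / ||r_i||^2
        p = r + b_i * p               # p_{i+1} = r_{i+1} + b_i p_i
        rr = rr_new
    return x
.ft R
.DE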
.PP
Some of the basic properties of the C-G method are [7]:
.EQ I
(i)~~~~~ mark r sub i sup T p sub j ~=~ 0 ~~~~~ i > j ,
.EN
.EQ I
(ii)~~~~ lineup p sub i sup T A p sub j ~=~ 0 ~~~~~ i != j ,
.EN
.EQ I
(iii)~~~ lineup when~ p sub 0 ~=~ r sub 0 ~~ then ~ r sub i sup T r sub j ~=~ 0 ~~~~~ i != j ,
.EN
.EQ I
(iv)~~~ lineup when ~ p sub 0 ~=~ r sub 0 , ~the~ sets~ "{" p sub j "}" sub j=0 sup i-1 ~ and~ "{" r sub j "}" sub j=0 sup i-1
.EN
.I
.in +2
each form a basis for
@ S sub i ~=~ Span "{" r sub 0 , A r sub 0 , ... , A sup i-1 r sub 0 "}" @
and the residual vector @ r sub i @ minimizes the quadratic form
@ Q ( r ) ~=~ r sup T A sup -1 r @ over all vectors of the form
@ r ~=~ r sub 0 ~+~ A s , ~ s \(mo S sub i @.
Since @ S sub i+1 \(sp S sub i @ we see that @ Q ( r sub i ) @ will
decrease monotonically as @ i @ increases.
.in
.R
.PP
When dealing with very large and sparse matrices, C-G has a very
attractive property from an algorithmic standpoint, namely that only
inner products need be performed.
The inner product calculations involve a row of the matrix and a full
vector.
When the matrix is sparse, the amount of work is diminished
substantially.
.PP
Another attractive feature of the C-G method is that, unlike some other
iterative methods for solving linear systems, one does not need an
estimate of the extreme eigenvalues or other parameters for
convergence.
.PP
In order to accelerate the convergence of the C-G method, a widely used
technique is to precondition the matrix @A@ in such a way as to cluster
the eigenvalues or, said another way, to produce a matrix which is in
some sense close to the identity matrix.
.PP
We turn now to the use of preconditioning techniques combined with C-G
applied to non-symmetric matrices.
We first observe from the C-G algorithm (4.1) that if @ A ~=~ I @, then
the C-G method converges in one step.
In this case we have
.EQ I
r sub 0 ~=~ b ~-~ x sub 0 ,
.EN
.EQ I (4.2)
p sub 0 ~=~ r sub 0 ,
.EN
.EQ I
a sub 0 ~=~ { || r sub 0 || sup 2 } over { || r sub 0 || sup 2 } ~=~ 1 ,
.EN
.EQ I
x sub 1 ~=~ x sub 0 ~+~ p sub 0 ~=~ x sub 0 ~+~ b ~-~ x sub 0 ~=~ b .
.EN
.sp
Thus, intuitively, we will converge rapidly when the coefficient matrix
is close, in some sense, to the identity matrix.
This is the guiding principle in the use of preconditioning.
.PP
Consider a linear system
.EQ I (4.3)
B x ~=~ f ,
.EN
where @B@ is not necessarily symmetric but is well conditioned.
Suppose we have an approximate factorization
.EQ I (4.4)
B ~ approx ~ L U ,
.EN
then the following relations are immediate
.EQ I (4.5)
B ( L U ) sup -1 ~ approx ~ I , ~~ ( L U ) sup -T B sup T ~ approx ~ I ,
.EN
.EQ I (4.6)
( L U ) sup -1 B ~ approx ~ I , ~~ B sup T ( L U ) sup -T ~ approx ~ I ,
.EN
.EQ I (4.7)
L sup -1 B U sup -1 ~ approx ~ I , ~~ U sup -T B sup T L sup -T ~ approx ~ I .
.EN
These three sets of relations can be used to generate six variations of
a preconditioned C-G method for non-symmetric matrices.
.PP
For the first three variations we start from a system of the form
.EQ I (4.8)
B sup T B x ~=~ B sup T f .
.EN
Our goal is to derive a system @ A y ~=~ g @ where @ A ~=~ D sup T D @
with @ D ~ approx ~ I @.
We observe that equation (4.8) is equivalent to the system
.EQ I
( L U ) sup -T B sup T B ( L U ) sup -1 L U x ~=~ ( L U ) sup -T B sup T f .
.EN
Hence, setting
.EQ I (4.9)
A ~=~ ( L U ) sup -T B sup T B ( L U ) sup -1 ~=~
[ B ( L U ) sup -1 ] sup T [ B ( L U ) sup -1 ]
.EN
.EQ I
y ~=~ L U x
.EN
.EQ I
g ~=~ ( L U ) sup -T B sup T f
.EN
we have the system
.EQ I
A y ~=~ g ,
.EN
with @ A ~ approx ~ I @ and symmetric positive definite.
.PP
The second variation uses relation (4.6) and starts from equation (4.3)
by multiplying by @ ( L U ) sup -1 @.
Then
.EQ I
( L U ) sup -1 B x ~=~ ( L U ) sup -1 f ,
.EN
from which we find
.EQ I
B sup T ( L U ) sup -T ( L U ) sup -1 B x ~=~
B sup T ( L U ) sup -T ( L U ) sup -1 f .
.EN
Setting
.EQ I
A ~=~ B sup T ( L U ) sup -T ( L U ) sup -1 B ~=~
[ ( L U ) sup -1 B ] sup T [ ( L U ) sup -1 B ]
.EN
.EQ I (4.10)
y ~=~ x
.EN
.EQ I
g ~=~ B sup T ( L U ) sup -T ( L U ) sup -1 f
.EN
we have a system @ A y ~=~ g @ of the desired type.
.PP
The third variation starts with equation (4.3) and uses (4.7) to obtain
.EQ I
U sup -T B sup T L sup -T L sup -1 B U sup -1 U x ~=~
U sup -T B sup T L sup -T L sup -1 f .
.EN
Thus, setting
.EQ I (4.11)
A ~=~ U sup -T B sup T L sup -T L sup -1 B U sup -1 ~=~
[ L sup -1 B U sup -1 ] sup T [ L sup -1 B U sup -1 ]
.EN
.EQ I
y ~=~ U x
.EN
.EQ I
g ~=~ U sup -T B sup T L sup -T L sup -1 f
.EN
we have a system of the desired type.
This variation was suggested in the appendix of [5].
.PP
The last three variations are based on the equations
.EQ I (4.12)
B B sup T z ~=~ f , ~~~ x ~=~ B sup T z .
.EN
We are drawn to this formulation by an error analysis of equation
(4.12) by Paige [19].
In the analysis, Paige shows that the square of the condition number of
@B@ does not enter into the error bounds, only the condition number
itself.
.PP
In this case our goal is to obtain a system @ A y ~=~ g @ where
@ A ~=~ D D sup T @ with @ D ~ approx ~ I @.
Starting with @ B x ~=~ f @, it follows that
.EQ I
B ( L U ) sup -1 L U x ~=~ f ,
.EN
and setting
.EQ I
L U x ~=~ ( L U ) sup -T B sup T y ,
.EN
we have
.EQ I
B ( L U ) sup -1 ( L U ) sup -T B sup T y ~=~ f .
.EN
Hence, setting
.EQ I
A ~=~ B ( L U ) sup -1 ( L U ) sup -T B sup T ~=~
[ B ( L U ) sup -1 ] [ B ( L U ) sup -1 ] sup T
.EN
.EQ I (4.13)
x ~=~ ( L U ) sup -1 ( L U ) sup -T B sup T y
.EN
.EQ I
g ~=~ f
.EN
we have a system @ A y ~=~ g @.
.PP
Next, starting with equation (4.12) and using relation (4.6), we find
.EQ I
( L U ) sup -1 B B sup T ( L U ) sup -T ( L U ) sup T z ~=~
( L U ) sup -1 f .
.EN
Thus we can set
.EQ I (4.14)
A ~=~ ( L U ) sup -1 B B sup T ( L U ) sup -T ~=~
[ ( L U ) sup -1 B ] [ ( L U ) sup -1 B ] sup T
.EN
.EQ I
y ~=~ ( L U ) sup T z , ~~~~~ x ~=~ B sup T z
.EN
.EQ I
g ~=~ ( L U ) sup -1 f ,
.EN
to obtain a system of the desired type.
.PP
Finally, starting with @ B x ~=~ f @ and having relation (4.7) in mind,
we find
.EQ I
L sup -1 B U sup -1 U x ~=~ L sup -1 f ,
.EN
then setting @ U x ~=~ U sup -T B sup T L sup -T y @, we find
.EQ I
L sup -1 B U sup -1 U sup -T B sup T L sup -T y ~=~ L sup -1 f .
.EN
Setting
.EQ I
A ~=~ L sup -1 B U sup -1 U sup -T B sup T L sup -T ~=~
[ L sup -1 B U sup -1 ] [ L sup -1 B U sup -1 ] sup T
.EN
.EQ I (4.15)
g ~=~ L sup -1 f ,
.EN
.EQ I
x ~=~ U sup -1 U sup -T B sup T L sup -T y
.EN
we have a system @ A y ~=~ g @ of the desired type.
.PP
For each of the six variants, the basic C-G algorithm (4.1) can be
applied.
For the first three variants @ A ~=~ D sup T D @, while for the last
three @ A ~=~ D D sup T @, where the matrix @D@ is the preconditioned
matrix of the form @ B ( L U ) sup -1 @, @ ( L U ) sup -1 B @, or
@ L sup -1 B U sup -1 @.
To summarize, when we apply the C-G algorithm to each of these
variants, we obtain the following algorithms.
.TS
center,doublebox,tab(#);
c | c | c | c.
#I#II#III
_
#@D ~=~ B ( L U ) sup -1@#@D ~=~ ( L U ) sup -1 B@#@D ~=~ L sup -1 B U sup -1@
#@y ~=~ L U x@#@y ~=~ x@#@y ~=~ U x@
#@g ~=~ D sup T f@#@g ~=~ D sup T ( L U ) sup -1 f@#@g ~=~ D sup T L sup -1 f@
_
.T&
c | c s s
c | c | c | c
c | c | c | c.
initial#@y sub 0 ~=~ L U x sub 0 , ~~ r sub 0 ~=~ f ~-~ B x sub 0@
phase#@R sub 0 ~=~ D sup T r sub 0@#@R sub 0 ~=~ D sup T ( L U ) sup -1 r sub 0@#@R sub 0 ~=~ D sup T L sup -1 r sub 0@
#@p sub 0 ~=~ R sub 0@#@p sub 0 ~=~ R sub 0@#@p sub 0 ~=~ R sub 0@
_
.T&
c | c s s
c | c s s
c | c s s
c | c s s
c | c s s.
#@a sub i ~=~ || R sub i || sup 2 / || D p sub i || sup 2@
iterative#@y sub i+1 ~=~ y sub i ~+~ a sub i p sub i@
phase#@R sub i+1 ~=~ R sub i ~-~ a sub i D sup T D p sub i@
#@b sub i ~=~ || R sub i+1 || sup 2 / || R sub i || sup 2@
#@p sub i+1 ~=~ R sub i+1 ~+~ b sub i p sub i@
_
.T&
c | c | c | c.
final phase#@x ~=~ ( L U ) sup -1 y@#@x ~=~ y@#@x ~=~ U sup -1 y@
.TE
.I
.ce
Table 4.1
.R
.TS
center,doublebox,tab(#);
c | c | c | c.
#IV#V#VI
_
#@D ~=~ B ( L U ) sup -1@#@D ~=~ ( L U ) sup -1 B@#@D ~=~ L sup -1 B U sup -1@
#@y ~=~ z@#@y ~=~ ( L U ) sup T z@#@y ~=~ z@
#@g ~=~ f@#@g ~=~ ( L U ) sup -1 f@#@g ~=~ L sup -1 f@
#@x ~=~ ( L U ) sup -1 ( L U ) sup -T B sup T z@#@x ~=~ B sup T z@#@x ~=~ U sup -1 U sup -T B sup T L sup -T z@
_
initial#@y sub 0 ~=~ x sub 0 , ~ g ~=~ f@#@y sub 0 ~=~ x sub 0 , ~ g ~=~ ( L U ) sup -1 f@#@y sub 0 ~=~ x sub 0 , ~ g ~=~ L sup -1 f@
phase#@R sub 0 ~=~ g ~-~ D D sup T y sub 0@#@R sub 0 ~=~ g ~-~ D D sup T y sub 0@#@R sub 0 ~=~ g ~-~ D D sup T y sub 0@
#@p sub 0 ~=~ R sub 0@#@p sub 0 ~=~ R sub 0@#@p sub 0 ~=~ R sub 0@
_
.T&
c | c s s
c | c s s
c | c s s
c | c s s
c | c s s.
#@a sub i ~=~ || R sub i || sup 2 / || D sup T p sub i || sup 2@
iterative#@y sub i+1 ~=~ y sub i ~+~ a sub i p sub i@
phase#@R sub i+1 ~=~ R sub i ~-~ a sub i D D sup T p sub i@
#@b sub i ~=~ || R sub i+1 || sup 2 / || R sub i || sup 2@
#@p sub i+1 ~=~ R sub i+1 ~+~ b sub i p sub i@
_
.T&
c | c | c | c.
final phase#@x ~=~ ( L U ) sup -1 ( L U ) sup -T B sup T y@#@x ~=~ B sup T ( L U ) sup -T y@#@x ~=~ U sup -1 U sup -T B sup T L sup -T y@
.TE
.I
.ce
Table 4.2
.R
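.PP
As a concrete sketch of variant II, the variant used in the experiments
of section 7 (ours, with dense factors for clarity, e.g. from the ilu0
sketch at the end of section 3), note that the preconditioned matrix
@D@ is never formed; it is applied only through triangular solves.
.DS L
.ft CW
# Sketch of variant II of Table 4.1: C-G on A = D^T D with
# D = (LU)^{-1} B; L unit lower triangular, U upper triangular.
import numpy as np
from scipy.linalg import solve_triangular

def lu_solve(L, U, v):                # (LU)^{-1} v
    w = solve_triangular(L, v, lower=True, unit_diagonal=True)
    return solve_triangular(U, w, lower=False)

def apply_D(B, L, U, v):              # D v = (LU)^{-1} (B v)
    return lu_solve(L, U, B.dot(v))

def apply_DT(B, L, U, v):             # D^T v = B^T (LU)^{-T} v
    w = solve_triangular(U, v, trans='T')
    w = solve_triangular(L, w, trans='T', lower=True, unit_diagonal=True)
    return B.T.dot(w)

def variant_II(B, L, U, f, x0, tol=1e-13, maxit=1000):
    x = x0.copy()                     # y = x for variant II
    R = apply_DT(B, L, U, lu_solve(L, U, f - B.dot(x)))
    p = R.copy()                      # R0 = D^T (LU)^-1 r0, p0 = R0
    RR = np.dot(R, R)
    for _ in range(maxit):
        Dp = apply_D(B, L, U, p)
        a = RR / np.dot(Dp, Dp)       # a_i = ||R_i||^2 / ||D p_i||^2
        x += a * p                    # y_{i+1} = y_i + a_i p_i
        R -= a * apply_DT(B, L, U, Dp)  # R_{i+1} = R_i - a_i D^T D p_i
        RR_new = np.dot(R, R)
        if np.sqrt(RR_new) < tol:
            break
        p = R + (RR_new / RR) * p     # p_{i+1} = R_{i+1} + b_i p_i
        RR = RR_new
    return x                          # final phase: x = y for variant II
.ft R
.DE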
.PP
We note that when the algorithms are written in this form, the
iterative portion of each algorithm is the same.
Note also that the last three variants operate in the @y@-space and
thereby require an initial estimate @ y sub 0 @.
If these latter three algorithms were embedded in some larger iterative
procedure which was based in the @x@-space, the @y@-vector would have
to be stored between calls in order to use the previous time step
solution as an initial approximation for the next time step.
We also observe that for the last three variants we have equated a
guess @ x sub 0 @ in the @x@-space to a guess @ y sub 0 @ in the
@y@-space, ignoring the fact that @ x sub 0 ~!=~ y sub 0 @; in fact
they are related by the expressions shown in the final phase of
Table 4.2.
.PP
Considering the first three algorithms in Table 4.1, we observe that in
each case five vectors are needed: the given vectors @x@ and @f@,
together with the three auxiliary vectors @r@, @p@, and @s@.
In terms of computational work we see that the first three variants
differ only in the final phase, with variant II having the least work.
For the last three variants in Table 4.2, we observe that in general we
will have to keep six vectors: @x@ and @f@, together with the four
auxiliary vectors @y@, @r@, @p@, and @s@.
The last three variants do differ in the work involved in the initial
and final phases.
However, multiplying a vector by the matrices @ L sup -1 @ or
@ U sup -1 @ involves only @ half n sup 2 @ operations, so all three
variants involve about the same amount of work.
.NH
PROVIDING FILL-IN FOR INCOMPLETE FACTORIZATION
.PP
The incomplete factorization described in section 4 allowed storage for
elements of @L@ and @U@ in the same positions as in the original matrix
@A@.
No additional elements were allowed to form during the decomposition.
If this condition is relaxed, and fill-in is allowed to occur in
positions other than those occupied by the original matrix, a family of
factorizations can be produced [4].
.PP
Intuitively, it is hoped that the approximate factors @L@ and @U@ will
more closely approximate the "true" factors of the original matrix as
more and more fill-in is provided.
Thus @ D sup T D @ or @ D D sup T @ will more closely approximate the
identity, and fewer iterations will be expected in order to obtain a
solution.
On the other hand, as the fill-in increases, the computation time per
iteration increases, so that the overall computation time may not
decrease even though the number of iterations is reduced.
When dealing with fill-in, there are three aspects to be considered:
first, the amount of fill-in; second, the location of the fill-in; and
third, the nature of the fill-in.
We have restricted our fill-in to occur on diagonal stripes adjacent to
the original pattern, and we have provided additional storage for the
fill-in which occurs on these stripes.
With this fill-in pattern, we have considered the effect of providing
additional fill-in stripes.
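.PP
Our reading of this striped pattern can be sketched as follows; the
resulting boolean pattern can be handed to the incomplete factorization
sketch at the end of section 3 as its pattern argument.
.DS L
.ft CW
# Sketch (our interpretation): for each nonzero diagonal of A, allow
# m extra "inner" stripes toward the main diagonal and n extra "outer"
# stripes away from it.
import numpy as np

def striped_pattern(A, m, n):
    N = A.shape[0]
    nz = A != 0
    offsets = [d for d in range(-N + 1, N) if np.diagonal(nz, d).any()]
    allowed = set(offsets)
    for d in offsets:
        if d == 0:
            continue
        for s in range(1, m + 1):     # stripes toward the main diagonal
            allowed.add(d - s if d > 0 else d + s)
        for s in range(1, n + 1):     # stripes away from the main diagonal
            allowed.add(d + s if d > 0 else d - s)
    P = np.zeros((N, N), dtype=bool)
    for d in allowed:
        P |= np.eye(N, N, k=d, dtype=bool)
    return P

# e.g. ilu0(A, pattern=striped_pattern(A, 1, 1)) corresponds to the
# (1,1) fill-in pattern used in the experiments of section 7.
.ft R
.DE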
.NH
ERROR ANALYSIS
.PP
The iterative phase of the above algorithms is centered around the
computation of @ f ~=~ D sup T D p @ or @ f ~=~ D D sup T p @, where
@D@ is one of the following three quantities: @ A ( L U ) sup -1 @,
@ ( L U ) sup -1 A @, or @ L sup -1 A U sup -1 @.
In examining the behavior of the algorithm it is important to
understand the nature of the errors made in this computation.
.PP
We will concentrate on the equation @ f ~=~ D sup T D p @ where
@ D ~=~ A ( L U ) sup -1 @; the analysis follows in a similar fashion
for the other definitions of @D@ and @f@.
We let @ B ~=~ L U @; then the equation for @f@ can be written as
.sp
.EQ I
f ~=~ B sup -T A sup T A B sup -1 p .
.EN
.sp
We are interested in determining how the computed @f@, say @ f bar @,
is affected by the condition of @B@.
.PP
We will perform the following operations to compute @f@:
.sp
.EQ I
B p sub 1 ~=~ p ~;~~ solve~for~ p sub 1
.EN
.EQ I
p sub 2 ~=~ A p sub 1 ~;~~ form~ p sub 2
.EN
.EQ I
p sub 3 ~=~ A sup T p sub 2 ~;~~ form~ p sub 3
.EN
.EQ I
B sup T f ~=~ p sub 3 ~;~~ solve~for~ f .
.EN
.sp
There are four sources of error, one in each of the above steps.
Using a rounding error analysis in the style of Wilkinson [18], we know
that the computed solution @ f bar @ will satisfy
.sp
.EQ I
( B ~+~ E sub 1 ) p bar sub 1 ~=~ p
.EN
.EQ I
p bar sub 2 ~=~ ( A ~+~ E sub 2 ) p bar sub 1
.EN
.EQ I
p bar sub 3 ~=~ ( A sup T ~+~ E sub 3 ) p bar sub 2
.EN
.EQ I
( B sup T ~+~ E sub 4 ) f bar ~=~ p bar sub 3 ,
.EN
where @ || E sub i || ~<=~ epsilon sub i || A || @ for @ i ~=~ 2, 3 @
and @ || E sub i || ~<=~ epsilon sub i || B || @ for @ i ~=~ 1, 4 @.
.sp
The error bound for @ f bar @ can be stated as:
.sp
.I
Theorem: If @ || A || ~<=~ sigma || B || @ then
@ { || f ~-~ f bar || } over { || p || } ~<=~
6 xi kappa sigma || A B sup -1 || ~+~
9 xi sup 2 kappa sup 2 sigma sup 2 @,
where @ kappa ~=~ || B || ~ || B sup -1 || @, @ sigma ~<=~ 1 @, and
@ xi @ reflects the machine precision.
.sp
Proof:
.sp
.R
The computed @f@ can be written as
.sp
.EQ I
f bar ~=~ ( B sup -T ~+~ F sub 1 ) ( A sup T ~+~ E sub 3 )
( A ~+~ E sub 2 ) ( B sup -1 ~+~ F sub 2 ) p ,
.EN
.sp
where @ || E sub i || ~<=~ epsilon sub i || A || @,
@ || F sub 1 || ~<=~ epsilon sub 4 || B sup -1 || @,
@ || F sub 2 || ~<=~ epsilon sub 1 || B sup -1 || @, and
@ epsilon sub i ~<=~ g(n) epsilon @, where @g(n)@ is a modest function
of @n@, the order of the matrix, and @ epsilon @ is the machine
precision.
Expanding, we get
.sp
.EQ I
f bar ~=~ ( B sup -T A sup T ~+~ B sup -T E sub 3 ~+~ F sub 1 A sup T
~+~ F sub 1 E sub 3 )( A B sup -1 ~+~ A F sub 2 ~+~ E sub 2 B sup -1
~+~ E sub 2 F sub 2 ) p .
.EN
.sp
If we let
.sp
.EQ I
H ~=~ A B sup -1
.EN
.EQ I
G sub 1 ~=~ B sup -T E sub 3 ~+~ F sub 1 A sup T ~+~ F sub 1 E sub 3
.EN
.EQ I
G sub 2 ~=~ A F sub 2 ~+~ E sub 2 B sup -1 ~+~ E sub 2 F sub 2
.EN
.sp
then
.EQ I
f bar ~=~ ( H sup T ~+~ G sub 1 )( H ~+~ G sub 2 ) p .
.EN
.sp
We can now determine bounds for @ G sub 1 @ and @ G sub 2 @.
If we assume @ || A || ~<=~ sigma || B || @, then
.sp
.EQ I
|| G sub 1 || mark ~<=~ epsilon sub 3 || A || ~ || B sup -1 || ~+~
epsilon sub 4 || A || ~ || B sup -1 || ~+~
epsilon sub 4 epsilon sub 3 || A || ~ || B sup -1 ||
.EN
.EQ I
lineup ~<=~ ( 2 xi kappa ~+~ xi sup 2 kappa ) sigma
.EN
.EQ I
lineup ~<=~ 3 xi kappa sigma
.EN
.sp
.EQ I
|| G sub 2 || mark ~<=~ epsilon sub 2 || A || ~ || B sup -1 || ~+~
epsilon sub 1 || A || ~ || B sup -1 || ~+~
epsilon sub 2 epsilon sub 1 || A || ~ || B sup -1 ||
.EN
.EQ I
lineup ~<=~ ( 2 xi kappa ~+~ xi sup 2 kappa ) sigma
.EN
.EQ I
lineup ~<=~ 3 xi kappa sigma
.EN
.sp
where @ kappa ~=~ || B || ~ || B sup -1 || @ and
@ xi ~=~ max from i ~ { epsilon sub i } @.
Then we can write the error made in computing @f@ as
.sp
.EQ I
{ || f ~-~ f bar || } over { || p || } ~~<=~~
6 xi kappa sigma || H || ~+~ 9 xi sup 2 kappa sup 2 sigma sup 2 .
~~~~~~~~~~~~~~~~~~~~~~\(sq
.EN
.sp
.PP
If @ xi kappa ~<~ 1 @ then the order of magnitude of the error bound is
the same as that for the direct solution.
Whenever the square of the condition number occurs in the error bound
for the solution, it is effectively multiplied by the square of the
machine precision or something smaller.
The quantity @ || H || @ is a reflection of how effective the
preconditioner is.
This quantity is expected to be close to order unity.
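.PP
The computation and the bound can be exercised numerically.
In the following sketch (ours) the four steps are carried out in single
precision and compared against a double precision reference; the modest
factor @g(n)@ is ignored in the printed bound.
.DS L
.ft CW
# Small numerical check (ours) of f = B^{-T} A^T A B^{-1} p computed
# by the four steps above, single vs. double precision.
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n = 100
L = np.tril(rng.standard_normal((n, n)), -1) + np.eye(n)
U = np.triu(rng.standard_normal((n, n))) + 5.0 * np.eye(n)
B = L.dot(U)
A = rng.standard_normal((n, n))
p = rng.standard_normal(n)

def four_steps(A, L, U, p):
    p1 = solve_triangular(U, solve_triangular(L, p, lower=True))  # B p1 = p
    p3 = A.T.dot(A.dot(p1))                   # p2 = A p1, p3 = A^T p2
    w = solve_triangular(U, p3, trans='T')    # B^T f = p3, B^T = U^T L^T
    return solve_triangular(L, w, trans='T', lower=True)

f64 = four_steps(A, L, U, p)
f32 = four_steps(A.astype(np.float32), L.astype(np.float32),
                 U.astype(np.float32), p.astype(np.float32))
xi = np.finfo(np.float32).eps
kappa = np.linalg.cond(B)
sigma = np.linalg.norm(A, 2) / np.linalg.norm(B, 2)
H_norm = np.linalg.norm(A.dot(np.linalg.inv(B)), 2)
print("observed ||f - fbar|| / ||p|| :",
      np.linalg.norm(f64 - f32) / np.linalg.norm(p))
print("first-order bound 6 xi kappa sigma ||H|| :",
      6 * xi * kappa * sigma * H_norm)
.ft R
.DE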
.sp
.NH
RESULTS
.PP
The matrices used in this study were generated from the finite
difference equations described in section 2.
We chose the domain to be the unit cube
( @ L sub x ~=~ L sub y ~=~ L sub z ~=~ 1 @ ), and throughout the study
we used the following data:
.EQ I
V sup x ( x,y,z ) ~=~ 800 x ( 1 ~-~ x ) y ( 1 ~-~ y ) z R sup x ( x,y,z )
.EN
.EQ I (7.1)
V sup y ( x,y,z ) ~=~ 800 x ( 1 ~-~ x ) y ( 1 ~-~ y ) z R sup y ( x,y,z )
.EN
.EQ I
V sup z ( x,y,z ) ~=~ 4 x y z sup 2
.EN
where
@ R sup x ( x,y,z ) ~=~ left { pile { 1 above { x ~-~ half } } right } @,
@ R sup y ( x,y,z ) ~=~ left { pile { 1 above { y ~-~ half } } right } @.
When @ R sup x ~=~ R sup y ~==~ 1 @, the problem type will be referred
to as non-rotational, while the other case will be referred to as
rotational.
For our problem the absorption distribution @ OMEGA ( x,y,z ) @ is
zero, and the source distribution is @ F ( x,y,z ) ~=~ x sup 2 y z @.
For the boundary conditions, when Dirichlet conditions are used,
.EQ
G sup L ( x,y,0 ) ~==~ 1 ~~and~~ G sup U ( x,y,1 ) ~==~ 2 .
.EN
We also allow Neumann conditions to be imposed on the top, the bottom,
or both.
When Neumann conditions are used on both the top and bottom, the
resulting singular matrix is modified by zeroing out the first row and
column and placing a constant on the diagonal.
This corresponds to fixing the value of the solution in the first mesh
cell.
.PP
Within the class of matrices generated in this manner, we have the
following parameters at our disposal:
.sp
.in 1
.I
a) Size of the mesh, i.e., the order of the matrix.
.sp
b) Combination of boundary conditions used.
For example, Dirichlet top and bottom, Dirichlet top and Neumann
bottom, Neumann top and bottom, etc.
.sp
c) Use of a rotational or non-rotational velocity field.
.R
.in
.PP
In this numerical study we have focused our attention on the following
three areas:
.br
1. Differences in the behavior of the six variants discussed in
section 4.
.br
2. Effects of fill-in along stripes on the computational times for
achieving a solution.
.br
3. Comparison of these six variants with an algorithm designed and
implemented by Manteuffel [9,10] based on the use of Tchebychev
iteration for non-symmetric linear systems.
.br
.PP
To illustrate the differences among the six variants we used a problem
based on a @ 7 times 7 times 7 @ mesh with no rotation in the velocity
field.
We used two types of boundary conditions: the first case used Dirichlet
conditions on the top and bottom, while the second case used Neumann
conditions on the top and bottom with the solution value fixed in the
first mesh cell.
In both cases we compared the six variants using no fill-in and using
one inner and one outer stripe for fill-in.
The results are shown in Tables 7.1 and 7.2.
(The computations were run on a VAX 11/780 under UNIX, and the timings
are reported in seconds.
For all the tables, @D-D@ refers to Dirichlet conditions imposed on the
top and bottom, and @N-N@ refers to Neumann conditions imposed on the
top and bottom.)
.sp
.TS
center,box,tab(#);
c | c s || c s
c | c s || c s
c | c | c || c | c
n | n | n || n | n.
#no fill-in#one inner and outer
#storage - 2287#fill-in storage 3551
_
variant#iterations#time#iterations#time
_
1#40#31#32#30
2#36#27#28#27
3#46#33#36#33
4#39#29#31#30
5#35#27#27#27
6#45#32#35#33
.TE
.EQ
Table~7.1
.EN
.EQ
D-D
.EN
.TS
center,box,tab(#);
c | c s || c s
c | c s || c s
c | c | c || c | c
n | n | n || n | n.
#no fill-in#one inner and outer
#storage - 2287#fill-in storage 3551
_
variant#iterations#time#iterations#time
_
1#62#43#50#46
2#50#36#42#38
3#70#48#57#50
4#60#41#49#44
5#48#35#41#38
6#66#46#53#48
.TE
.EQ
Table~7.2
.EN
.EQ
N-N
.EN
.sp
In all cases we see that the @ 2 sup nd @ and @ 5 sup th @ variants
perform best while the @ 3 sup rd @ and @ 6 sup th @ variants perform
worst.
Moreover, the variations in performance are significant: in all cases
the best variant requires roughly three-quarters as many iterations as
the worst.
.PP
Using the same two test problems we studied the effects of fill-in on
the number of iterations and ultimately on the iteration time.
The results for the @ 2 sup nd @ variant are shown in Tables 7.3 and
7.4, which give the number of iterations needed to achieve convergence
for a given pair @ (m,n) @, where @m@ is the number of inner fill-in
stripes and @n@ is the number of outer fill-in stripes.
The results are surprising and discouraging.
They show that the number of iterations needed for convergence is
extremely insensitive to the number of fill-in stripes.
Clearly there is no point in considering fill-in beyond the @(1,1)@
pattern.
From Tables 7.5 and 7.6 it is clear that for these cases, when
iteration time is considered, there is no point in using any fill-in.
.sp
.KS
.TS
center,box,tab(#);
c | c s s s s s
c | n n n n n n
n | n n n n n n.
number#inner stripes
outer#0#1#2#3#4#5
_
0#36#34#34#34#34#33
1#34#28#29#29#29#29
2#34#29#28#29#29#29
3#34#29#29#29#29#29
4#34#29#29#29#29#29
5#34#29#29#29#29#29
.TE
.EQ
Table~7.3
.EN
.EQ
Number~of~iterations~for~D-D ,~ 2 sup nd ~variant
.EN
.KE
.KS
.TS
center,box,tab(#);
c | c s s s s s
c | n n n n n n
n | n n n n n n.
number#inner stripes
outer#0#1#2#3#4#5
_
0#50#47#47#47#47#47
1#47#42#42#42#42#43
2#47#42#42#42#43#43
3#47#42#42#43#43#43
4#47#42#42#43#43#43
5#47#42#42#42#42#43
.TE
.EQ
Table~7.4
.EN
.EQ
Number~of~iterations~for~N-N ,~ 2 sup nd ~variant
.EN
.KE
.sp
One observation made in the course of these experiments is that
although the number of iterations decreases as the amount of fill-in
allowed in the matrix increases, the overall time to solve the problem
never decreases.
This can be understood by looking at the iterative portion of the
algorithm, which is based on matrix-vector products.
As the matrix is allowed to have more non-zero elements, the time to do
a matrix-vector product increases.
This can be seen in Tables 7.5 and 7.6.
.sp
.KS
.TS
center,box,tab(#);
c | c s
c | n n
n | n n.
number#inner
outer#0#1
_
0#27#29
1#29#27
.TE
.EQ
Table~7.5
.EN
.EQ
Iteration~Time~for~D-D ,~ 2 sup nd ~variant
.EN
.KE
.KS
.TS
center,box,tab(#);
c | c s
c | n n
n | n n.
number#inner
outer#0#1
_
0#36#38
1#37#38
.TE
.EQ
Table~7.6
.EN
.EQ
Iteration~Time~for~N-N ,~ 2 sup nd ~variant
.EN
.KE
.sp
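.PP
For reference, the problem data (7.1) used throughout these experiments
transcribes directly into code (a sketch; the rotational switch selects
@ R sup x @ and @ R sup y @):
.DS L
.ft CW
# Direct transcription (ours) of the test problem data (7.1):
# velocity components, source F, and OMEGA = 0.
def V_x(x, y, z, rotational=False):
    R = (x - 0.5) if rotational else 1.0       # R^x of (7.1)
    return 800.0 * x * (1.0 - x) * y * (1.0 - y) * z * R

def V_y(x, y, z, rotational=False):
    R = (y - 0.5) if rotational else 1.0       # R^y of (7.1)
    return 800.0 * x * (1.0 - x) * y * (1.0 - y) * z * R

def V_z(x, y, z):
    return 4.0 * x * y * z**2

def F(x, y, z):                                # source distribution
    return x**2 * y * z
.ft R
.DE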
.PP
We have compared the method described above with a non-symmetric linear
equation solver based on the use of Tchebychev polynomials in the
complex plane.
The Tchebychev iteration is based on work of Manteuffel [9,10], and it
adaptively estimates the optimal iteration parameters.
This method will converge whenever the spectrum of @A@ can be enclosed
in an ellipse that does not contain the origin.
It can be shown that the Tchebychev iteration is optimal over all other
polynomial-based methods.
The software to do the Tchebychev iteration was provided by Manteuffel.
.PP
In the comparison we took a finite difference mesh of
@ 15 times 15 times 30 @ mesh cells, which led to a system of order
6750 with 46288 non-zero coefficients in the matrix, for a density of
about 0.1 per cent.
In the first case we used Dirichlet conditions on the top and bottom
@(D-D)@, and in the second case we used Neumann conditions on the top
and bottom @(N-N)@, with a point fixed to ensure a unique solution as
before.
The preconditioner was the same in both cases.
.sp
.KS
.TS
center,box,tab(#);
c | c s
c | c c
c | c c.
#method
conditions#CG#Tchebychev
_
D-D#65 min#51 min
#168 iter#144 iter
N-N#82 min#346 min
#248 iter#2686 iter
.TE
.EQ
Table~7.7
.EN
.EQ
Comparison~of~CG~vs.~Tchebychev
.EN
.KE
.sp
Table 7.7 presents results for the total computation time and the
number of iterations.
When considering the times, note that the Tchebychev method requires
periodic reevaluation of the eigenvalue estimates.
In Figure 7.1, graphs of the iteration count versus the residual are
given.
Figure 7.1a gives results for @(D-D)@ and Figure 7.1b gives results for
@(N-N)@.
In Figure 7.1c the @(N-N)@ results are graphed for just the first 400
iterations.
Note that the CG residual is not a monotonically decreasing function,
since the only residual guaranteed to decrease is @ R sub i @ for the
second variant in Table 4.1, and the graphs deal with the residual in
equation (4.1).
.sp
.KS
.sp 23
.EQ
Figure~7.1a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Figure~7.1b
.EN
.KE
.KS
.sp 23
.EQ
Figure~7.1c
.EN
.KE
.sp
With both methods we required the residual to be less than
@ 10 sup -13 @ before convergence was signaled.
The results from this problem show that the CG variant used (variant II
of Table 4.1) is competitive with the Tchebychev iteration for
Dirichlet conditions.
When Neumann conditions are imposed, the Tchebychev approach is very
slow to converge.
(It should be noted that the default parameters were used in the
Tchebychev code.)
If no information is known about the problem, the CG variant would be
the better choice.
.sp
.NH
REFERENCES
.R
.sp 1
1. F.H. Harlow and A.A. Amsden,
.I
Flow of Interpenetrating Material Phases,
.R
J. Comp. Phys. 18, 440-464 (1975).
.sp 1
2. V.L. Shah, et al.,
.I
Some Numerical Results with the COMMIX-2 Computer Code,
.R
Technical Memorandum ANL-CT-79-30 (1979).
.sp 1
3. A. Greenbaum,
.I
Comparison of Splittings Used with the Conjugate Gradient Algorithm,
.R
Numer. Math. 33, 181-194 (1979).
.sp 1
4. O. Axelsson and N. Munksgaard,
.I
A Class of Preconditioned Conjugate Gradient Methods for the Solution
of a Mixed Finite Element Discretization of the Biharmonic Operator,
.R
Inter. J. for Num. Meth. Eng. 14, 1001-1019 (1979).
.sp 1
5. D. Kershaw,
.I
On the Problem of Unstable Pivots in the Incomplete LU-Conjugate
Gradient Method,
.R
J. Comp. Phys. 38, 114-123 (1980).
.sp 1
6. J. Daniel,
.I
The Approximate Minimization of Functionals,
.R
Prentice-Hall, 1971.
.sp 1
7. M.R. Hestenes and E. Stiefel,
.I
Methods of Conjugate Gradients for Solving Linear Systems,
.R
NBS J. Res. 49, 409-436 (1952).
.sp 1
8. P. Concus, G. Golub, and D. O'Leary,
.I
A Generalized Conjugate Gradient Method for the Numerical Solution of
Elliptic Partial Differential Equations,
.R
Sparse Matrix Computations, Ed. J. Bunch and D. Rose, Academic Press
(1976).
.sp 1
9. T. Manteuffel,
.I
The Tchebychev Iteration for Nonsymmetric Linear Systems,
.R
Numer. Math. 28, 307-327 (1977).
.sp 1
10. T. Manteuffel,
.I
Adaptive Procedure for Estimating Parameters for the Nonsymmetric
Tchebychev Iteration,
.R
Numer. Math. 31, 183-208 (1978).
.sp 1
11. H.A. van der Vorst and J.M.
van Kats,
.I
Manteuffel's Algorithm with Preconditioning for the Iterative Solution
of Certain Sparse Linear Systems with a Nonsymmetric Matrix,
.R
Academisch Computer Centrum Report TR-11, Utrecht, The Netherlands,
August 1979.
.sp 1
12. O. Axelsson,
.I
On Optimization Methods in the Numerical Solution of Boundary Value
Problems: A Survey,
.R
Univ. of Texas Center for Numerical Analysis Report CNA-137, Austin,
Texas, 1978.
.sp 1
13. O. Axelsson,
.I
Conjugate Gradient Type Methods for Unsymmetric and Inconsistent
Systems of Linear Equations,
.R
Linear Algebra and its Applications 29, 1-16 (1980).
.sp 1
14. J.A. Meijerink and H.A. van der Vorst,
.I
An Iterative Solution Method for Linear Systems of which the
Coefficient Matrix is a Symmetric M-matrix,
.R
Math. of Comp. 31, 148-162 (1977).
.sp 1
15. I. Gustafsson,
.I
A Class of First Order Factorization Methods,
.R
BIT 18, 142-156 (1978).
.sp 1
16. N. Munksgaard and O. Axelsson,
.I
Analysis of Incomplete Factorizations with Fixed Storage Allocation,
.R
submitted to SIAM Jour. on Scientific and Statistical Computing.
.sp 1
17. A. Greenbaum,
.I
Behavior of the Conjugate Gradient Algorithm in Finite Precision
Arithmetic,
.R
Lawrence Livermore Laboratory, UCRL-85752, March 1981.
.sp 1
18. J.H. Wilkinson,
.I
Rounding Errors in Algebraic Processes,
.R
Notes on Applied Science No. 32, Her Majesty's Stationery Office,
London; Prentice-Hall, New Jersey (1963).
.sp 1
19. C.C. Paige,
.I
An Error Analysis of a Method for Solving Matrix Equations,
.R
Math. Comp. 27, 355-359, April 1973.