- Introduction

        tms2:   sparse svd via trace minimization using A'A eigensystems.

	tms2.c is an ANSI-C code designed to find several of the largest
        eigenvalues and eigenvectors of a real symmetric positive definite
	matrix B.  The matrix B is assumed to be of the form
  
           B = (alpha*alpha)*I -A'A,  where A is nrow by ncol (nrow>>ncol)
                                       and sparse,

        and alpha is an upper bound for the largest singular value of
        the matrix A. Hence, the singular triplets of A are computed as
        the eigenpairs of B.  If lambda is an eigenvalue of B, then
        sqrt(lambda-alpha*alpha) is a singular value of A.  The eigenvectors
        of B correspond to the right singular vectors of A only.
        The left singular vectors of A are then determined by

                          u = 1/sigma A*v,

        where {u,sigma,v} is a singular triplet of A.
        This implementation is discussed in "Multiprocessor
        Sparse SVD Algorithms and Applications", Ph.D. Thesis by M. Berry,
        University of Illinois at Urbana-Champaign, October 1990.  This
        version uses Ritz-shifting and (optionally) Chebyshev polynomials
        to accelerate convergence.

        This is a parallel method which permits concurrent iterations of the
        classical Conjugate Gradient method.  The loops which can be
        parallelized are the for-loops containing calls to subroutines
        cgt() and cgts().

- Calling sequence

	The calling sequence for procedure tsvd2 is

        long tsvd2(FILE *fp_out1, long n, long p, long s, long job,
                  double tol, double red, double *sig, long maxi,
                  long *mem, long *itcgt, long *titer, double *time,
                  double *res, long *mxv, double **work1, double **work2,
                  double **work3, double **work4, double **work5,
                  double **y, long **iwork, long *lwork, long maxd)

        The user specifies as part of the parameter list:
         
        fp_out1         ... a pointer to output file {FILE *}.
        n               ... order of matrix B for SVD problem {long}.
                            (n must not be greater than ncol, assuming
                            A is nrow by ncol with nrow >> ncol.)
        p               ... number of desired singular triplets (largest)
                            of matrix A. {long}. 
        s               ... dimension of initial subspace {long}.
                            (s should be greater than p; s=2*p is usually
                            safe but more optimal results may be obtained
                            if s is closer to p)
        job             ... acceleration strategy switch {long}.
                            job = 0, no acceleration is used.
                            job = 1, ritz-shifting   is used.
                            job = 2, Chebyshev polynomials and 
                                     Ritz-shifting used.
        maxi            ... maximum number of trace minimization steps
                            allowed {long}.
        tol             ... user-supplied tolerance for residuals of
                            B-eigenpairs which approximate A-singular
                            triplets {double}.
        red             ... user-supplied tolerance for residual reduction
                            to invoke Ritz-shifting (job = 1 or 2) {double}.
        lwork           ... one-dimensional array of length s used for
                            for logic tests (values are 0 or 1).
	 
        The following are work arrays malloc'ed within tsvd2:

                            double **work1, double **work2,
                            double **work3, double **work4, 
                            double **work5, double **y

                            long **iwork

	tsvd2 returns via its parameter list the following items:
	 
        ierr            ... error flag for job parameter {long}.
                            ierr=99, input parameter invalid.
                            ierr= 0, input parameter   valid.
        mem             ... estimate (in bytes) of memory required {long}.
        maxd            ... maximum polynomial degree used (job =2).
        mxv             ... 1-dim. array of length 3 containing matrix
                            times vector counts {long}.
                            mxv[0] = number of A *x. (x is a vector)
                            mxv[1] = number of A'*x. 
                            mxv[2] = mxv[0] + mxv[1].
        sig             ... 1-dim. array of length s containing the desired
                            singular values of A {double}.
        y               ... 2-dim. array containing the corresponding
                            left and right singular vectors of matrix A
                            {double}.  Each column of y stores
                            the left singular vector in the first nrow
                            elements and the right singular vector in the
                            last ncol elements, where nrow is the number of
                            rows of A and ncol is the number of columns of A.
        titer           ... 1-dim. array of length s containing the number
                            of trace min. steps required for each singular
                            triplet of a {long}.
        itcgt           ... 1-dim. array of length s containing the number
                            of Conjugate Gradient steps taken for each 
                            singular triplet approximation of A {long}.
        time            ... 1-dim. array of length 5 specifying timing
                            breakdown (user cpu seconds) {double}.
                            time[0] = Gram-Schmidt orthogonalization.
                            time[1] = spectral decomposition.
                            time[2] = convergence criteria.
                            time[3] = Conjugate Gradient method.
                            time[4] = total time (sum of the above).
        res             ... 1-dim. array of length s containing the 2-norms
                            of residuals for the singular triplets of A
                            {double}.

- User-supplied routines

        For tms2.c, the user must specify multiplication by matrices
        B, A, and A' (subroutines opb, opa, and opat, respectively).

        The specification of opb should look something like

         void opb(long n, double *x, double *y, double shift)

        so that opb takes a vector x and returns y = B*x, where
        B = [(alpha*alpha-shift)*I - A'A].  Here, alpha is an
        upper bound for the largest singular value of A,
        and shift is an approximate squared singular value of A.

        The specification of opa should look something like

                  void opa(double *x, double *y)

        so that opa  takes an n by 1 vector x and returns the m by 1
        vector y = A*x, where A is m by n (m >> n).

        The specification of opat should look something like

                  void opat(double *x, double *y)

        so that opat takes an m by 1 vector x and returns the n by 1
        vector y = A'*x, where A is m by n (m >> n).

	In tms2.c, we use the Harwell-Boeing sparse matrix format for
	accessing elements of the sparse matrix A and its transpose (A').
        Other sparse matrix formats can be used, of course.

- Information

        Please address all questions, comments, or corrections to:

        M. W. Berry
        Department of Computer Science
        University of Tennessee
        107 Ayres Hall
        Knoxville, TN  37996-1301
        email: berry@cs.utk.edu
        phone: (615) 974-5067

- File descriptions

       tms2.c requires the include files tmsc.h and tmsg.h for
       compilation.  Constants are defined in tmsc.h and all
       global variables are defined in tmsg.h.  The input and
       output files associated with tms2.c are listed below.

             Code           Input         Output
            ------      ------------    ---------
            tms2.c      tmp2, matrix    tmo2,tmv2


       The binary output file tmv2 containing approximate left
       and right singular vectors will be created by tms2.c
       if it does not already exist.  If you are running on
       a Unix-based workstation you should uncomment the line

                 /*   #define  UNIX_CREAT */

       in the declarations prior to main() in tms2.c.
       UNIX_CREAT specifies the use of the UNIX "creat" system
       routine with the permissions defined by the PERMS constant

                  #define PERMS 0664

       You may adjust PERMS for the desired permissions on the
       tmv2 file (default is Read/Write for user and group,
       and Read for others).  Subsequent runs will be able to
       open and overwrite these files with the default permissions.

       tms2.c obtains its parameters specifying the
       sparse SVD problem to be solved from the input file
       tmp2. This parameter file contains the single line

        <name>   p    s    job   tol    red    v    maxi

       where

        <name>     is the name of the data set containing nonzeros of A.
        p          is an integer specifying the number of desired
                   triplets;
        s          is an integer specifying the subspace dimension to use.
        job        is an integer specifying the type of acceleration to be
                   used:  
                         job := 0, no acceleration used;
                         job := 1, Ritz-shifting used;
                         job := 2, Chebyshev polynomials
                                   and Ritz-shifting used;
        tol        is a double specifying the residual tolerance for 
                   approximated singular triplets of A.
        red        is a double specifying the residual reduction factor
                   to inititate Ritz-shifting (when job = 1 or 2).
        v          contains the string TRUE or FALSE to indicate when
                   singular triplets are needed (TRUE) and when only
                   singular values are needed (FALSE);
        maxi       is an integer specifying the maximum number of iterations.


        If the parameter "v" from tmp2 is set to "TRUE",
        the unformatted output file tmv2 will contain the approximate
        singular vectors written in the order u[1], v[1], u[2], v[2],
        ..., u[p], v[p].  Here u[i] and v[i] denote the left and right
        singular vectors, respectively, corresponding to the i-th
        approximate singular value of A.

        tms2.c is primarily designed to approximate the p-largest
        singular triplets of A.  Users interested in the p-smallest
        singular triplets via trace minimization should define B=A'A
        rather than (alpha*alpha)*I-A'A, and modify opb accordingly.

- Sparse matrix format

        tms2.c is designed to read input matrices that are stored
        in the Harwell-Boeing sparse matrix format.  The nonzeros
        of such matrices are stored in a compressed column-oriented
        format.  The row indices and corresponding nonzero values
        are stored by columns with a column start index array
        whose entries contain pointers to the nonzero starting each
        column.  tms2.c reads the sparse matrix data from the input
        file called "matrix".

        Each input file "matrix" should begin with a four-line header
        record followed by three more records containing, in order, 
        the column-start pointers, the row indices, and the nonzero
        numerical values.

        The first line of the header consists of a 72-character title
        and an 8-character key by which the matrices are referenced.
        The second line can be used for comments or to indicate record
        length for each index or value array.  Although this line is 
        generally ignored, A CHARACTER MUST BE PLACED ON THAT LINE.
        The third line contains a three-character string denoting the
        matrix type and the three integers specifying the number of rows,
        columns, and nonzeros.  The fourth line which usually contains
        input format for Fortran-77 I/O is ignored by our ANSI-C code.
        The exact format is

		"%72c %*s %*s %*s %d %d %d %*d"

	for the first three lines of the header,

		line 1      <title>         <key>
		 	(col.  1 - 72) (col. 73 - 80)

		line 2   <string>

		line 3   <matrix type> nrow ncol nnzero 

	and 

		"%*s %*s %*s %*s"

	for the last line of the header.

		line 4   <string1> <string2> <string3> <string4>

        Even though only the title and the integers specifying the
        number of rows, columns, and nonzero elements are read, other
        strings of input must be present in indicated positions.
        Otherwise, the format of the "fscanf" statements must be 
        changed accordingly.

- Reference 

        Sameh, A. H. and  Wisniewski, J. A., A trace minimization strategy
        for the generalized eigevalue problem, SIAM J. Numer. Anal. 19:6,
        1243-1259, 1982.
