=-=-=-=-=-=-=-=-=-=-
LAPACK3E README file
=-=-=-=-=-=-=-=-=-=-

Version 1.0:  September 27, 2002
Version 1.1:  November 13, 2002

LAPACK3E is an update to LAPACK version 3.0 enhanced with selected
features of Fortran 90.  It is compatible with both the Fortran 77
interfaces of LAPACK 3 and the Fortran 90 interfaces of LAPACK 95.

!!! Disclaimer !!!

LAPACK3E, like LAPACK itself, is free software.  It is made available
in the hope that it will be useful, but WITHOUT ANY WARRANTY, either
expressed or implied, including, but not limited to, the implied
warranties of merchantability and fitness for a particular purpose.
The entire risk as to the quality and performance of this software is
with you; should the library prove defective, you assume the cost of all
necessary servicing, repair, or correction.

1.  What is included

The LAPACK3E source tree includes the following subdirectories:

BLAS:  netlib BLAS with an improved version of xNRM2
INSTALL:  a directory with a few subroutines that need to be installed
   before you can compile the SRC directory
SRC:   the LAPACK3E source code
TESTING:  the LAPACK test code

2.  What is not included

Timing code from the original LAPACK source tree.  Most of the old
timing software would work fine if linked with LAPACK3E, but to
evaluate the performance of the iterative algorithms for eigenvalue
problems, instrumented versions of the LAPACK3E source code that count
operations would be needed.  These have not (yet) been written.

Testing and timing code for the BLAS.  This has been omitted mostly
just to avoid duplication.

The high-level Makefile.  Step-by-step installation instructions are
given in this file instead, which should make it easier to recover if
problems occur.

3.  What is new (compared to LAPACK 3)

The BLAS/SRC directory contains new versions of SNRM2, SCNRM2, DNRM2,
and DZNRM2 that call the LAPACK auxiliary routines SLASSQ, DLASSQ,
CLASSQ, or ZLASSQ.  These versions of the BLAS 2-norm routines are
significantly faster than the versions found on netlib, and they fix
scaling problems found in both the netlib versions and in most vendor
libraries.  LAPACK3E relies on proper scaling of the xNRM2 routines as
implemented here to pass its tests.
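
To illustrate the scaling issue, the following shell sketch (awk does
the arithmetic) accumulates a scaled sum of squares using the classic
xLASSQ recurrence from LAPACK 3.  It is not the LAPACK3E source, which
uses an improved algorithm, and the function name scaled_norm is
hypothetical.  A naive sum of x(i)**2 overflows in 64-bit IEEE
arithmetic once |x(i)| exceeds roughly 1.0E+154, even when the norm
itself is representable.

```sh
#!/bin/sh
# scaled_norm X1 X2 ...: print the 2-norm of the arguments, computed as
# scale*sqrt(ssq) so that no intermediate square overflows or underflows.
# (Hypothetical helper for illustration; not part of LAPACK3E.)
scaled_norm() {
    printf '%s\n' "$@" | awk '
        BEGIN { scale = 0; ssq = 1 }
        NF > 0 {
            x = $1 + 0                       # force numeric conversion
            a = (x < 0) ? -x : x             # a = |x(i)|
            if (a != 0) {
                if (scale < a) {
                    # New largest entry: rescale the running sum to it.
                    ssq = 1 + ssq * (scale / a) * (scale / a)
                    scale = a
                } else {
                    ssq += (a / scale) * (a / scale)
                }
            }
        }
        END { printf "%.6g\n", scale * sqrt(ssq) }'
}
```

For example, "scaled_norm 3 4" prints 5, and "scaled_norm 1e200 1e200"
prints 1.41421e+200, a case in which squaring the entries directly
would overflow.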

The INSTALL directory contains a C routine in "rounding_mode.c" which
is referenced by the LAPACK3E version of xLAMCH.  It must be installed
to ensure proper Fortran to C calling conventions.  If this proves too
difficult, just assume the rounding mode is 1 (most machines have
rounded arithmetic).  Also, the sanity checks for SECOND and DSECND,
which formerly did a simple SAXPY operation, have been replaced with the
LINPACK 100 benchmark to provide a more meaningful test.

The SRC directory is mostly what is new in LAPACK3E.  A separate report
will describe the LAPACK3E features in more detail.  In summary, they
are

* Eliminate SAVE statements for thread safety
* Use PARAMETERs for replicated constants
* Parameterize KIND to allow a common source for single and double
  precision
* Create Fortran 90 interface modules for all LAPACK computational,
  driver, and auxiliary routines, extending LAPACK95
* Use generic interfaces defined in modules for all subroutine calls
  to allow compile-time argument checking
* Use the preprocessor for renaming at compile time
* Include bug fixes and performance improvements

Notable algorithmic improvements can be found in xGEBAL, xLARTG, xLARGV,
xLASSQ, xRSCL, and xHGEQZ.

The latest versions of xSTEGR from Inderjit Dhillon for computing
eigenvectors to high relative accuracy are included in both real and
complex versions, but are currently disabled due to test failures.

The TESTING/LIN directory contains updated test code for the driver
routines CPTSV and CPTSVX, which have an extra argument UPLO in LAPACK3E
to specify whether the subdiagonal or the superdiagonal of the complex
Hermitian tridiagonal matrix is stored.  The old LAPACK interface, which
assumed the subdiagonal was stored, is only supported in the generic
interfaces in LAPACK3E.

The TESTING/EIG directory contains new test code for the balancing and
back transformation routines in LAPACK.  The old test code returned
atypical test ratios that were not well documented and were generally
ignored by LAPACK testers, even when they pointed to a problem.  The
new test code rigorously tests new features of the balancing routines,
and will break the old LAPACK 3 versions of xGEBAL.

The TESTING directory contains scripts called sgo, cgo, dgo, and zgo
to run the eigenvalue tests in each precision with all their different
input files.

4.  How to install and test LAPACK3E

If no machine-specific subroutines need to be installed and if all four
precisions are desired, the instructions for installing LAPACK3E could
be as simple as the following, where LAPACKPATH is a shell variable set
to the location of your LAPACK3E directory:

   cd $LAPACKPATH/BLAS/SRC; make
   cd $LAPACKPATH/SRC; make
   cd $LAPACKPATH/TESTING/MATGEN; make
   cd $LAPACKPATH/TESTING/LIN; make
   cd $LAPACKPATH/TESTING/EIG; make

On a CRAY T3E system, replace "make" with "make -f Makefile.t3e".
More details are provided in the following sections.
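
The sequence above can be wrapped in a small script that stops at the
first directory that fails to build.  This is only a sketch: the
function name build_all and the use of a MAKE environment variable to
carry the make command are assumptions, not part of the LAPACK3E
distribution.

```sh
#!/bin/sh
# build_all ROOT: run make in each LAPACK3E directory under ROOT, in the
# order given in this README, stopping at the first failure.
# MAKE (an assumed environment variable) overrides the make command,
# for example MAKE="make -f Makefile.t3e" on a CRAY T3E.
build_all() {
    root=$1
    for dir in BLAS/SRC SRC TESTING/MATGEN TESTING/LIN TESTING/EIG; do
        echo "==> $root/$dir"
        # Subshell keeps the caller's working directory unchanged.
        # ${MAKE:-make} is deliberately unquoted so options are split.
        ( cd "$root/$dir" && ${MAKE:-make} ) || {
            echo "build failed in $root/$dir" >&2
            return 1
        }
    done
}
```

A typical call would be: build_all "$LAPACKPATH"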

4.1  make.inc and other machine-specific settings

All the makefiles in the LAPACK3E source tree reference the make.inc
file in the main LAPACK3E directory.  Locate this file and set the
compiler names, options, and path names appropriately for your platform.
Sample make.inc files are provided for CRAY T3E, Sun Solaris, and IBM
AIX environments as guidelines (for example, make.inc.t3e).  Be sure to
set the path LAPACKPATH to point to the correct location of your
LAPACK3E directory.

Certain compiler options are needed to implement the common source
features of LAPACK3E.  On the IBM SP, they include

REAL32 = -WF,-DLA_REALSIZE=4

This setting passes the option "-DLA_REALSIZE=4" to the Fortran
preprocessor, which causes subroutine names and Fortran modules to be
properly renamed for 32-bit processing.  Renaming is specified by
#define statements; on some systems, compiler options for "macro
expansion" must also be specified in order for the #define's to be
substituted into source code lines.  For example, on the CRAY T3E the
equivalent option is

REAL32 = -F -DLA_REALSIZE=4

where the -F option enables macro expansion.

OpenMP directives are included in certain routines to solve linear
systems with multiple right-hand sides in parallel.  These directives
can be enabled with the preprocessing compiler option -D_OPENMP.

Next you need to install some routines with machine-specific features
in $(LAPACKPATH)/INSTALL.  Alternate versions of the auxiliary routines
second.f, dsecnd.f, and rounding_mode.c are located in this directory
if needed.  First test the Fortran modules that define parameters for
use in LAPACK3E.  Enter

make testparams testparamd
./testparams
./testparamd

Typical output for testparamd (here from an IBM SP) is as follows:

 Values set in LA_CONSTANTS:
  WP      =  8
  ZERO    =  0.000000000000000000E+00
  HALF    =  0.500000000000000000
  ONE     =  1.00000000000000000
  TWO     =  2.00000000000000000
  THREE   =  3.00000000000000000
  FOUR    =  4.00000000000000000
  EIGHT   =  8.00000000000000000
  TEN     =  10.0000000000000000
  CZERO   =  (0.000000000000000000E+00,0.000000000000000000E+00)
  CHALF   =  (0.500000000000000000,0.000000000000000000E+00)
  CONE    =  (1.00000000000000000,0.000000000000000000E+00)
  EPS     =  0.222044604925031308E-15
  ULP     =  0.444089209850062616E-15
  SAFMIN  =  0.222507385850720138E-307
  SAFMAX  =  0.449423283715578977E+308
  SMLNUM  =  0.501042090002243194E-292
  BIGNUM  =  0.199584030953471981E+293
  RTMIN   =  0.707843266551461421E-146
  RTMAX   =  0.141274212421613572E+147
  SPREFIX = 'D'
  CPREFIX = 'Z'

Zeros or exceptional values for any of the machine parameters might
require modification to the Fortran modules la_constants32.F or
la_constants.F.  Copy any changes to la_constants.F and la_constants32.F
to $(LAPACKPATH)/SRC.

Next test the Fortran 90 version of SLAMCH and/or DLAMCH by entering

make testslamch testdlamch
./testslamch
./testdlamch

or, on a Cray system,

make -f Makefile.t3e testhlamch testslamch
./testhlamch
./testslamch

Typical output from testdlamch is

  Epsilon                      =  0.222044604925031308E-15
  Safe minimum                 =  0.222507385850720138E-307
  Base                         =  2.00000000000000000
  Precision                    =  0.444089209850062616E-15
  Number of digits in mantissa =  53.0000000000000000
  Rounding mode                =  1.00000000000000000
  Minimum exponent             =  -1021.00000000000000
  Underflow threshold          =  0.222507385850720138E-307
  Largest exponent             =  1024.00000000000000
  Overflow threshold           =  0.179769313486231571E+309
  Reciprocal of safe minimum   =  0.449423283715578977E+308

SLAMCH and DLAMCH are no longer called by LAPACK3E, but are still used
in the test code.  If there are differences between xLAMCH and the
machine parameters returned by testparams and testparamd, these should
be resolved.  In general, the values reported by testslamch and
testdlamch should be used to set la_constants32.F and la_constants.F,
because they use the compiler's Fortran 90 model parameters.
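
The comparison can be done by eye, or with a sketch like the one
below.  It assumes the output of each program was saved to a file and
that both programs print values in the format of the samples shown in
this file, so that equal values are textually identical.  The pairing
of labels (EPS with Epsilon, ULP with Precision, and so on) is inferred
from those samples, and the function names are hypothetical.

```sh
#!/bin/sh
# value_of LABEL FILE: print the text after "=" on the first line whose
# label (the text before "=", with blanks trimmed) equals LABEL.
value_of() {
    awk -F= -v label="$1" '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == label) {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
                exit
            }
        }' "$2"
}

# compare_params PARAM_OUTPUT LAMCH_OUTPUT: report each paired entry and
# return 1 if any pair differs.  The pairs below are an assumption.
compare_params() {
    rc=0
    while IFS='|' read -r pname lname; do
        p=$(value_of "$pname" "$1")
        l=$(value_of "$lname" "$2")
        if [ "$p" = "$l" ]; then
            echo "ok       $pname = $p"
        else
            echo "MISMATCH $pname = $p, $lname = $l"
            rc=1
        fi
    done <<EOF
EPS|Epsilon
ULP|Precision
SAFMIN|Safe minimum
SAFMAX|Reciprocal of safe minimum
EOF
    return $rc
}
```

For example, after "./testparamd > testparamd.out" and
"./testdlamch > testdlamch.out", run
"compare_params testparamd.out testdlamch.out".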

The C routine "rounding_mode.c" used in xLAMCH is a potential source of
difficulty depending on how Fortran interfaces with C.  It may be
necessary to modify the #define statement in rounding_mode.c to add an
underscore or capitalize the name of the function.  If the loader fails
to find this routine, it should give a clue about the name it was
expecting.  Copy any changes to slamch.F or rounding_mode.c to
$(LAPACKPATH)/SRC.

The original LAPACK package tested the timing routines SECOND and DSECND
by timing a small SAXPY loop, which was occasionally optimized away by
aggressive compilers.  In LAPACK3E, this test is replaced by the LINPACK
100 benchmark, which solves a linear system for which the correct result
is a vector of 1's.  The expected values X(1) = X(N) = 1.0 can be
checked visually in the output.  Run these tests by

make testsecond testdsecnd
./testsecond
./testdsecnd

or, on a Cray system,

make -f Makefile.t3e testsecond
./testsecond

Zero values in the timings of xGEFA or xGESL, or a floating-point
divide-by-zero condition, indicate that your timer is not accurate
enough.  Alternate versions of SECOND and DSECND for other
architectures are provided in this directory (see the RELEASE_NOTES for
instructions on their use).  On Cray platforms, the system SECOND
routine can be used.  If changes have been made to second.f and/or
dsecnd.f, copy them to $(LAPACKPATH)/TESTING/LIN and
$(LAPACKPATH)/TESTING/EIG.

The INSTALL directory makefile also includes the test program "testieee"
to test if your machine supports operations on IEEE exceptional values.
Because of test failures, I do not recommend that you enable the
IEEE-specific code.  The versions of the LAPACK auxiliary routine
ilaenv.f in $(LAPACKPATH)/SRC, $(LAPACKPATH)/TESTING/LIN, and
$(LAPACKPATH)/TESTING/EIG currently return ILAENV=0 for ISPEC=10 and
ISPEC=11, indicating that IEEE arithmetic is not supported.  This
setting causes tests of xSTEGR to be skipped.  The code for xSTEGR and
its auxiliary routines is provided in $(LAPACKPATH)/SRC anyway for
experimentation and further study.

4.2  Compile the BLAS library

As described previously, new versions of the BLAS 2-norm routines
xNRM2 are provided in LAPACK3E that fix long-standing problems with
performance and improper scaling.  They should be compiled into the
LAPACK3E library.  You may also wish to compile them into a supplemental
BLAS library for use in other applications that do not use LAPACK.
Optimized vendor's libraries can generally be used for the remaining
BLAS.  See the file RELEASE_NOTES for some known exceptions.  To compile
the recommended replacement BLAS into the LAPACK3E library, go to
$(LAPACKPATH)/BLAS/SRC and enter

make

Several platform-specific makefiles are provided in this directory.
For example, on a CRAY T3E system, use the command

make -f Makefile.t3e

If you are compiling fewer than all four precisions, you will need to
specify the ones you want on the command line as described in the
Makefile.  For example, to compile only 32-bit real and complex
routines, the command would be

make single complex

4.3  Choose your subset of LAPACK3E

In previous releases of LAPACK, it was possible to compile a subset of
the LAPACK library using options to make.  This is not so easy with
LAPACK3E because the generic interfaces reference the specific names
for all the precisions that are expected to be in the library.  By
default, all four precisions (32-bit real, 64-bit real, 32-bit complex,
and 64-bit complex) are assumed to be included.  If you want all four
precisions (the recommended choice for non-Cray platforms), go to
section 4.4.  If you want just a subset of these (for example, only the
64-bit routines), then you must modify the modules defining the generic
interfaces.  A Fortran program called "rename" is provided in
$(LAPACKPATH)/rename.f for this purpose.

To use the rename facility, first compile the program:

f90 -o rename rename.f

The program "rename" reads a file called "rename.opts" containing the
following lines:

F       Put T to use the 32-bit REAL type
T       Put T to use the 64-bit REAL type
F       Put T to use the 32-bit COMPLEX type
T       Put T to use the 64-bit COMPLEX type

In this example, the file has been set to select the 64-bit real and
complex types.  Then each of the interface files (those files in
$(LAPACKPATH)/SRC beginning with "la_" and ending with ".f") must be
modified by rename, for example,

rename < la_blas1.f > la_blas1.f_new;  mv la_blas1.f_new la_blas1.f

This is best done in a script.  The makefile in LAPACK3E/SRC must also
be modified to specify the subset you want.  Combinations other than
the setting of rename.opts shown above haven't been thoroughly tested,
so if you want to try this, you're on your own.
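
A script for this step might look like the following sketch.  The
function name rename_all is hypothetical.  The sketch assumes that the
rename program reads rename.opts from the current directory, so it
should be run from wherever that file lives, and it replaces an
interface file only when the filter succeeds and writes non-empty
output.

```sh
#!/bin/sh
# rename_all SRCDIR RENAME_CMD: filter every la_*.f file in SRCDIR
# through RENAME_CMD (for example, the path to the compiled rename
# program), replacing each file in place.
rename_all() {
    srcdir=$1
    rename_cmd=$2
    for f in "$srcdir"/la_*.f; do
        [ -f "$f" ] || continue    # no match: the pattern stays literal
        # $rename_cmd is unquoted so a command with options also works.
        if $rename_cmd < "$f" > "$f"_new && [ -s "$f"_new ]; then
            mv "$f"_new "$f"
        else
            echo "rename failed for $f; original left untouched" >&2
            rm -f "$f"_new
            return 1
        fi
    done
}
```

For example: rename_all "$LAPACKPATH/SRC" "$LAPACKPATH/rename"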

For CRAY users, the modification to the module files to include only the
64-bit REAL and COMPLEX versions has already been done for you.  A set
of Fortran modules for CRAY users can be found in the directory
$(LAPACKPATH)/SRC/lafiles.t3e.  To use these modules, copy the files
from $(LAPACKPATH)/SRC/lafiles.t3e to $(LAPACKPATH)/SRC before going on
to section 4.4.

4.4  Compile the LAPACK3E library

Go to $(LAPACKPATH)/SRC and enter

make

or, on a CRAY system,

make -f Makefile.t3e

The Fortran 90 module files are compiled first by the makefiles.  On
many systems, module information is placed in <filename>.mod while
object code is placed in <filename>.o.  On CRAY systems, the .mod and .o
files are combined.  The module information must be referenced by
subsequent compiles.

4.5  Compile the LAPACK test code

LAPACK3E doesn't yet have common source for the test code, so the old
LAPACK test code is used.  If you compiled all four precisions in step
4.4, you can just enter

make

in the test directories, or

make -f Makefile.t3e

for the CRAY T3E.  The Makefiles have further examples if needed.
Enter make in each of the directories

$(LAPACKPATH)/TESTING/MATGEN
$(LAPACKPATH)/TESTING/LIN
$(LAPACKPATH)/TESTING/EIG

4.6  Run the tests

To run the linear systems tests, go to $(LAPACKPATH)/TESTING and enter
the commands

./xlintsts <stest.in > stest.out
./xlintstd <dtest.in > dtest.out
./xlintstc <ctest.in > ctest.out
./xlintstz <ztest.in > ztest.out

To run the eigensystem tests, use the following scripts:

./sgo
./dgo
./cgo
./zgo

Use only the "S" and "C" tests on a CRAY T3E.

Output of these scripts is concatenated together in one output file for
each precision, for example, seig.out.  Read the output files and
report any test failures, except those for xGGESX and xGGEVX as noted
previously.  Descriptions of the tests run and the meaning of the test
ratios can be found in the LAPACK installation guide.  The most recent
of these, LAPACK Working Note 81, can be found in the
$(LAPACKPATH)/INSTALL directory.
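
As a first pass over the output files, a sketch like the following
lists every line that mentions a failure.  It assumes that the test
programs report failures with the word "failed" (as in "failed to pass
the threshold"), which is the wording of the standard LAPACK test
programs and is not specified in this file.  The function name
report_failures is hypothetical, and the output files should still be
read in full.

```sh
#!/bin/sh
# report_failures FILE...: print each line containing "failed", prefixed
# by file name and line number.
# Returns 0 if no such line exists, 1 if any were found, and 2 if grep
# itself could not read a file.
report_failures() {
    rc=0
    # /dev/null forces grep to print file names even for a single file.
    grep -n 'failed' "$@" /dev/null || rc=$?
    case $rc in
        0) return 1 ;;   # at least one matching line was printed
        1) return 0 ;;   # no matches
        *) return 2 ;;   # grep error, for example a missing file
    esac
}
```

For example: report_failures stest.out dtest.out ctest.out ztest.out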

5.  How to report problems

If you have any problems installing LAPACK3E, first consult the file
$(LAPACKPATH)/RELEASE_NOTES for suggestions and workarounds.  Please
send any reports of problems specific to LAPACK3E to me.  If you have
general questions about LAPACK, they should be directed to
lapack@cs.utk.edu.


Ed Anderson
eanderso@cs.utk.edu
September 26, 2002

Updates to this file:
October 14, 2002:  Recommend against testing xSTEGR in this version
11-04-02:  Fix INTENT in slasq3, la_xlasqx
11-08-02:  Fix bugs in slartg, slargv
11-13-02:  Improve performance of sgebal, cgebal
