These subdirectories contain the code for the PRISM matrix multiply
routine called BiMMeR.  The code uses MPI as its communication layer.

All files have the following notice:
----------------------------------------------------------------------
   COPYRIGHT U.S. GOVERNMENT 
   
   This software is distributed without charge and comes with
   no warranty.

   Please feel free to send questions, comments, and problem reports
   to prism@super.org. 
----------------------------------------------------------------------


To make the standard version of the code:

1) The Makefiles are similar to the ones provided by Argonne National
Laboratory/Mississippi State.  The easiest way to get the code running
on your system is to set certain environment variables.
Alternatively, you can edit the Makefile and the appropriate *.include
file.  Below are the machines we have run on and the settings for the
environment variables (in csh syntax).  Where the value is a "?", you
need to fill in the location on your machine.  See the Machine
Specific Information section in ./code/mm_test/README for known
messages you may get during the make process.  If you want to build
the codes a second time for a different ARCH or COMM, you need to do a
"make cleanobjs" after resetting ARCH and/or COMM.  See 2) for a
script that builds the codes and libraries.
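
For example, switching a previously built tree over to the Paragon
settings might look like the following (the values shown are only
illustrative; use the settings for your machine from the list below):

```csh
# csh sketch: rebuild for a different ARCH/COMM (illustrative values)
make cleanobjs                  # remove objects built for the old ARCH/COMM
setenv PRISM_MPI_ARCH paragon   # new architecture
setenv PRISM_MPI_COMM ch_nx     # new communication layer
create_bimmer                   # rebuild the libraries and test code
```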

sun4 using ANL/MS mpich with p4: 
setenv PRISM_MPI_ARCH sun4
setenv PRISM_MPI_COMM ch_p4
setenv PRISM_MPI_HOME ?
setenv PRISM_LAPACK_LIB ?
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

IBM SP1 using ANL/MS mpich:
setenv PRISM_MPI_ARCH rs6000
setenv PRISM_MPI_COMM ch_eui
setenv PRISM_MPI_HOME ?
setenv PRISM_LAPACK_LIB ?
# usually done with "-lessl -lblas -bnso -bI:/lib/syscalls.exp"
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

IBM SP1 using IBM's MPIF version of MPI:
setenv PRISM_MPI_ARCH rs6000
setenv PRISM_MPI_COMM mpif
# this assumes your mpif is in the standard location
setenv PRISM_MPI_HOME /usr/lpp/mpif
setenv PRISM_LAPACK_LIB ?
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

Intel Paragon using ANL/MS mpich:
setenv PRISM_MPI_ARCH paragon
setenv PRISM_MPI_COMM ch_nx
setenv PRISM_MPI_HOME ?
# use "/lapack_dir/lapack_paragon.a -lf"
setenv PRISM_LAPACK_LIB ?
# usually done with "-lkmath"
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

Intel Delta using ANL/MS mpich:
setenv PRISM_MPI_ARCH intelnx
setenv PRISM_MPI_COMM ch_nx
setenv PRISM_MPI_HOME ?
# use "/lapack_dir/lapack_delta.a -lf"
setenv PRISM_LAPACK_LIB ?
# usually done with "-lkmath"
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

Meiko CS2 using ANL/MS mpich:
setenv PRISM_MPI_ARCH meiko
setenv PRISM_MPI_COMM ch_meiko
# location at Livermore
setenv PRISM_MPI_HOME /u1/lusk/mpich
setenv PRISM_LAPACK_LIB ?
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

Thinking Machines CM5 using cmmd:
setenv PRISM_MPI_ARCH sun4
setenv PRISM_MPI_COMM ch_cmmd
setenv PRISM_MPI_HOME ?
setenv PRISM_LAPACK_LIB ?
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

Fujitsu AP1000:
setenv PRISM_MPI_ARCH AP1000
setenv PRISM_MPI_COMM 
setenv PRISM_MPI_HOME 
setenv PRISM_LAPACK_LIB ?
setenv PRISM_BLAS_LIB ?
setenv PRISM_MAKE_INCLUDE ../../make_include/

NOTE: The AP1000 version is supplied for ANU.  We have not run on this
machine and do not know if things work correctly.  You should contact
us or ANU to see what the current status is.

2) To create the matrix multiply library and test code, run the script:
create_bimmer

If you want to rebuild the code from scratch use:
create_bimmer new

The BiMMeR test code is created in mm_test.  The libraries are created
in code/lib/"ARCH"/"COMM"/lib*.a.  (The BiMMeR code requires two
libraries, libmm.a and libutility.a, which are linked in by the create
scripts.)  See the README files in mm for general information on the
BiMMeR code and the README file in mm_test for information on running
the test code.  README files in the other subdirectories describe
their contents.  The test codes use libmmt.a; this version of the
library (created by the script) has timers enabled.  For your own
linking you should use libmm.a, since it is slightly faster.
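
For reference, a link line for your own driver might look roughly like
this (the compiler name and object file are hypothetical, and the
exact flags vary by machine; the create scripts handle all of this for
the test code):

```csh
cc -o my_driver my_driver.o \
   -Lcode/lib/$PRISM_MPI_ARCH/$PRISM_MPI_COMM -lmm -lutility \
   $PRISM_LAPACK_LIB $PRISM_BLAS_LIB
```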

3) You will probably want to run some jobs next.  The standard scripts
are in the directory code/mm_test, with names of the form
"ARCH"-"COMM".run.  ARCH and COMM have the same meanings as in the
Makefiles.  These scripts use the environment variable PRISM_PROG_DIR,
which is set in the script.  The output files go in the directory
$PRISM_PROG_DIR/outmm, which is created for you if it does not already
exist.  Edit the script if you want a different location.  Several
ARCHs do not depend on the COMM, so the files for the different COMMs
are identical.  The scripts run the MPI collective operations and the
PRISM collective operations using MPI.  For the sun4 and CS2, the
scripts use mpirun from ANL/MS.  You need $PRISM_MPI_HOME/util in your
path, and $PRISM_MPI_HOME/util/machines/machines."arch" set to the
appropriate machines (this is usually done by the MPI administrator).
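
The output-directory handling can be sketched in plain sh (the outmm
directory name comes from this README; the variable handling is
illustrative, so see the actual "ARCH"-"COMM".run scripts for what
they really do):

```shell
# Sketch of the run scripts' output-directory handling (illustrative).
PRISM_PROG_DIR=${PRISM_PROG_DIR:-$PWD}   # default used only in this sketch
outdir="$PRISM_PROG_DIR/outmm"
mkdir -p "$outdir"                       # created if it does not already exist
echo "run output will be written to $outdir"
```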

You will get one file per run.  A description of the input arguments
is given in the mm_test README file.  The test runs are split into
five types of tests, denoted m1 through m5.  Before running the
complete script (which has over 370 cases), run just a few cases first
to make sure everything is OK.

The m1 tests produce files such as:
m1_"c"_"p"_"ldim"_"log".m  where

c    = mc, if it used only the MPI collective operations, or
       pc, if it used the PRISM collective operation algorithms (designed
           for the Intel machines) but used MPI send/recv calls.
p    = size of the square mesh on which the code logically ran.  Thus, it
       ran on p^2 processors and treated them as a pxp mesh.
ldim = the local dimension of the matrix.  Since the code ran on a pxp
       mesh, the full matrix size is N = p * ldim.
log  = ff -> tested A*B,
       ft -> tested A*B^t,
       tf -> tested A^t*B.  A^t*B^t is not an option at this time.
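
The fields of an m1 filename can be pulled apart in plain sh; the
filename used here is a made-up example, but the naming scheme is the
one described above:

```shell
# Decode an m1 result filename into its fields.
name="m1_pc_8_250_ft.m"           # hypothetical example filename
base="${name%.m}"                 # strip the .m extension
IFS=_ read -r tag c p ldim log <<EOF
$base
EOF
# p=8 means an 8x8 mesh (64 nodes); ldim=250 means N = 8 * 250 = 2000.
echo "collectives=$c mesh=${p}x${p} nodes=$((p * p)) N=$((p * ldim)) log=$log"
```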

The m2 tests produce files such as:
m2_"c"_"p"_"log".m where "c", "p", and "log" mean the same as above.  The
                   code runs a fixed-size matrix N = 2200 on a varying
                   number of nodes.

The m3 tests produce files such as:
m3_"c"_"p"_"log".m where "c", "p", and "log" mean the same as above.  The
                   code runs a fixed-size matrix N = 6600 on a varying
                   number of nodes.

The m4 tests produce files such as:
m4_"c"_"i"_"w"_"log".m where
c   = as above.
i   = counter.  As the counter varies, the size of the logical mesh also
      changes.  The code runs on a 16xp logical mesh which corresponds
      to 16 * p nodes,  p = 4, 6, 8, 16 when i = 1, 2, 3, 4.
w   = the panelwidth (size of local matrix blocks in the torus wrap).  
      w is set equal to 1, 4, 16.
log = as above.

The m5 tests produce files such as:
m5_"c"_"i"_"w"_"log".m where
c   = as above.
i   = counter.  As the counter varies, the size of the logical mesh also
      changes.  The code runs on a px16 logical mesh which corresponds
      to p * 16 nodes,  p = 4, 6, 8, 16 when i = 1, 2, 3, 4.
w   = the panelwidth (size of local matrix blocks in the torus wrap). 
      w is set equal to 1, 4, 16.
log = as above.
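
The counter-to-mesh mapping for the m4 and m5 runs can be written out
explicitly (the i -> p mapping is the one given above; i=3 is just an
example value):

```shell
# Compute the logical mesh and node count for an m4 run from its counter.
i=3
case $i in
  1) p=4 ;;
  2) p=6 ;;
  3) p=8 ;;
  4) p=16 ;;
esac
nodes=$((16 * p))        # m4 uses a 16xp mesh; m5 uses px16 (same node count)
echo "m4 counter $i: 16x$p logical mesh, $nodes nodes"
```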


The routines have a large number of cpp options.  Whenever you change
options you should do a "create_bimmer new" to make sure that the new
cpp option(s) are compiled into all routines.  THIS IS NOT AUTOMATIC.
The general options are:

-DPRISM_DELTA: Used to define Intel Delta specific code.

-DPRISM_PARAGON: Used to define Intel Paragon specific code.

-DPRISM_SP1: Used to define IBM SP1 specific code.

-DPRISM_SUN: Used to define Sun specific code.

-DPRISM_MEIKO: Used to define Meiko CS2 specific code.

-DPRISM_AP1000: Used to define Fujitsu AP1000 specific code.

-DPRISM_NX: Use Intel NX routines.  The code then reverts to a version
similar to what existed before the MPI conversion.  The BiMMeR code
DOES NOT support this option any longer.  You should not use it for
these PRISM routines.

-DPRISM_MPI_COLL: If set, then use MPI routines to perform the
collective operations.  You cannot use this with PRISM_NX.

Make sure that all routines are compiled with the same general CPP
options.
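
For example, after changing a general option, force a full rebuild
(the variable name PRISM_CPP_FLAGS below is hypothetical; check the
Makefile and *.include files for where your build actually takes cpp
options):

```csh
# csh sketch: full rebuild after changing a cpp option.
# PRISM_CPP_FLAGS is a hypothetical name for illustration only.
setenv PRISM_CPP_FLAGS "-DPRISM_MPI_COLL"
create_bimmer new    # rebuilds everything; option changes are NOT picked up automatically
```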
