

			  MPICH2 Release 0.94

MPICH2 is an all-new implementation of MPI from the group at Argonne
National Laboratory.  It shares many goals with the original MPICH but
no actual code.  It is intended to become a portable, high-performance
implementation of the entire MPI-2 standard.  This release has many
MPI-2 features but is not quite complete (see below for the status of MPI-2 
features in MPICH2).  It also currently supports only a few communication
methods. 

This is an early release of MPICH2.  It has been tested by us on a
variety of machines in our own environment, but not extensively tested
by outside users.  If you have problems, please report them to
mpich2-maint@mcs.anl.gov; we are interested in incorporating user
experiences as quickly as possible.  The web site 
http://www.mcs.anl.gov/mpi/mpich2 will also have information on 
bug fixes and new releases.  This release does not include user or 
installation manuals.

Getting Started
===============

The following instructions take you through a sequence of steps to get
the default configuration (TCP communication, MPD process management) of
MPICH2 up and running.  Alternate configuration options are described
later, in the section "Alternative configurations".  

1.  You will need the following prerequisites.

      This tar file mpich2.tar.gz
      A C compiler (gcc is sufficient)
      A Fortran compiler if Fortran applications are to be used (g77 is
	sufficient) 
      A C++ compiler for the C++ MPI bindings (g++ is sufficient)
      Python 2.2 or later (for the default MPD process manager)
      PyXML and an XML parser like expat (in order to use mpiexec)

    Configure will check for these prerequisites and try to work around
    deficiencies if possible.  (If you don't have Fortran, you will
    still be able to use MPICH2, just not with Fortran applications.)

2.  Unpack the tar file and go to the top level directory:

      tar xfz mpich2.tar.gz
      cd mpich2-0.94

    If your tar doesn't accept the z option, use

      gunzip mpich2.tar.gz
      tar xf mpich2.tar
      cd mpich2-0.94

3.  Choose an installation directory (the default is /usr/local/bin):

      mkdir /home/you/mpich2-install

    It will be most convenient if this directory is shared by all of the
    machines where you intend to run processes.  If not, you will have
    to duplicate it on the other machines after installation.

4.  Configure MPICH2, specifying the installation directory:

      ./configure -prefix=/home/you/mpich2-install >& configure.log

    (On sh and its derivatives, use > configure.log 2>&1 instead of >&).
    Other configure options are described below.  You might also prefer
    to do a VPATH build (see below).  Check the configure.log file to
    make sure everything went well.  Problems should be
    self-explanatory, but if not, send configure.log to
    mpich2-maint@mcs.anl.gov.

5.  Build MPICH2:

      make >& make.log

    This step should succeed if there were no problems with the
    preceding step.  Check make.log.  If there were problems, send
    make.log to mpich2-maint@mcs.anl.gov.

6.  Install the MPICH2 commands:

      make install >& install.log

    This step collects all required executables and scripts in the bin
    subdirectory of the directory specified by the prefix argument to
    configure. 

7.  Add the bin subdirectory of the installation directory to your path:

      setenv PATH /home/you/mpich2-install/bin:$PATH

    for csh and tcsh, or 

      export PATH=/home/you/mpich2-install/bin:$PATH

    for bash and sh.  Check that everything is in order at this point by
    doing 

      which mpd
      which mpiexec
      which mpirun

    All should refer to the commands in the bin subdirectory of your
    install directory.  It is at this point that you will need to
    duplicate this directory on your other machines if it is not
    in a shared file system such as NFS.

8.  MPICH2, unlike MPICH, uses an external process manager for
    scalable startup of large MPI jobs.  The default process manager is
    called MPD, which is a ring of daemons on the machines where you
    will run your MPI programs.  In the next few steps, you will get this
    ring up and tested.  More details on interacting with MPD can be
    found in the README file in mpich2/src/pm/mpd, but the instructions
    given here should be enough to get you started.

    Begin by placing in your home directory a file named .mpd.conf,
    containing the line 

      password=<passwd>

    where <passwd> is a string known only to yourself.  It should not be
    your normal Unix password.  Make this file readable and writable
    only by you:

      chmod 600 .mpd.conf

9.  The first sanity check consists of bringing up a ring of one mpd on
    the local machine, testing one mpd command, and bringing the "ring"
    down. 

      mpd &
      mpdtrace
      mpdallexit

    The output of mpdtrace should be the hostname of the machine you are
    running on.  The mpdallexit command causes the mpd daemon to exit.

10. Now we will bring up a ring of mpd's on a set of machines.  Create a
    file consisting of a list of machine names, one per line.  Name this
    file mpd.hosts.  These hostnames will be used as targets for ssh or
    rsh, so include full domain names if necessary.  Check that you can
    reach these machines with ssh or rsh without entering a password.
    You can test by doing

      ssh othermachine date

    or

      rsh othermachine date

    If you cannot get this to work without entering a password, you will
    need to configure ssh or rsh so that this can be done, or else use
    the workaround for mpdboot in the next step.

11. Start the daemons on (some of) the hosts in the file mpd.hosts

      mpdboot

    By default, mpdboot starts one mpd on each of the machines in the
    file mpd.hosts, plus one on the local machine even if it is not in
    the file.  You can start fewer by doing:

      mpdboot -n <number to start>  

    The number to start may be less than, but not greater than, 1 + the
    number of hosts in the file.  One mpd is always started on the
    machine where mpdboot is run, and is counted in the number to start,
    whether or not that machine occurs in the file.

    There is a workaround if you cannot get mpdboot to work because of
    difficulties with ssh or rsh setup.  You can start the daemons "by
    hand" as follows:

       mpd &                # starts the local daemon
       mpdtrace -l          # makes the local daemon print its port
       hostname             # remind yourself of this host's name

    Then log into each of the other machines, put the install/bin
    directory in your path, and do:

       mpd -h <hostname> -p <port> &

    where the hostname and port belong to the original mpd that you
    started.  From each machine, after starting the mpd, you can do 

       mpdtrace

    to see which machines are in the ring so far.  More details on
    mpdboot and other options for starting the mpd's are in
    mpich2/src/pm/mpd/README.


12. Test the ring you have just created:

      mpdtrace

    The output should consist of the hosts where MPD daemons are now
    running.  You can see how long it takes a message to circle this
    ring with 

      mpdringtest

    That was quick.  You can see how long it takes a message to go
    around many times by giving mpdringtest an argument:

      mpdringtest 100
      mpdringtest 1000

13. Test that the ring can run a multiprocess job:

      mpdrun -n <number> hostname

    The number of processes need not match the number of hosts in the
    ring;  if there are more, they will wrap around.  You can see the
    effect of this by getting rank labels on the stdout:

      mpdrun -l -n 30 hostname

    You probably didn't have to give the full pathname of the hostname
    command because it is in your path.  If not, use the full pathname:

      mpdrun -l -n 30 /bin/hostname

14. Now we will run an MPI job, using the mpiexec command as specified
    in the MPI-2 standard.  There are some examples in the install
    directory, which you have already put in your path, as well as in
    the directory mpich2/examples.  One of them is the classic cpi
    example, which computes the value of pi by numerical integration in
    parallel.   

      mpiexec -n 5 cpi

    As with mpdrun (which is used internally by mpiexec), the number of
    processes need not match the number of hosts.  The cpi example will
    tell you which hosts it is running on.

    There are many options for mpiexec, by which multiple executables
    can be run, separate command-line arguments and environment
    variables can be passed to different processes, and working
    directories and search paths for executables can be specified.  Do

      mpiexec --help

    for details. A typical example is:

      mpiexec -n 1 master : -n 19 slave

    The mpirun command from the original MPICH is still available,
    although it does not support as many options as mpiexec.  You might
    want to use it in the case where you do not have the XML parser
    required for the use of mpiexec.

If you have completed all of the above steps, you have successfully
installed MPICH2 and run an MPI example.  

Alternatives
============

The above steps utilized the MPICH2 defaults, which included choosing
TCP for communication (the "sock" channel) and the MPD process manager.
Other alternatives are available.  You can find out about configuration
alternatives with

   ./configure --help

in the mpich2 directory.  The alternatives described below are
configured by adding arguments to the configure step.

Alternative process managers
----------------------------

mpd
---

MPD is the default process manager.  Its setup and use have been
described above.  The file mpich2/src/pm/mpd/README has more 
information about interactive commands for managing the ring of mpds.


forker
------

Forker is a process manager that creates processes on a single machine,
by having mpiexec itself fork and exec them.  It is useful for
shared-memory multiprocessors (SMPs) where you want to create all the
processes on the same machine, and also for debugging, where it is often
convenient to run all the processes on a single machine, since parallel
performance is less important.  The forker version of mpiexec may not
support all the standard options and arguments.

Specify that you want to use forker by doing

  --with-pm=forker  

on the configure command line.

remsh (not yet available)
-------------------------

Remsh is a process manager that starts the processes via remote shell.
It does not require any pre-existing user daemons, since it uses the
remote shell daemon.  It does not provide as scalable a startup speed as
MPD, and its version of mpiexec may not support all of the standard
options and arguments.  It is useful if you are having difficulties with
MPD, since it does not require a python installation.  This process 
manager is still under development.


Alternative channels and devices
--------------------------------

The communication mechanisms in MPICH2 are called "devices", paired with
specific "channels".  The most thoroughly tested device is the "ch3"
device.  The default configuration chooses the "sock" channel in the ch3
device (all communication goes over TCP sockets), which would be
specified explicitly by putting

  --with-device=ch3:sock

on the configure command line.  The ch3 device has two other channels:
"shm" (shared memory) for use on SMPs (all communication goes through
shared memory instead of over TCP sockets) and "ssm" (sockets and shared
memory) for use on clusters of SMPs (communication between processes on
the same machine goes through shared memory; communication between
processes on different machines goes over sockets).  Configure these by
putting

  --with-device=ch3:shm

or 

  --with-device=ch3:ssm

on the configure command line.

A third channel in the ch3 device is the "rdma" channel, which is still
experimental.  To specify it, use

  --with-device=ch3:rdma --with-rdma=shm
   
on the configure command line.  


VPATH Builds
============

MPICH2 supports building in a different directory tree from the one
containing the MPICH2 sources.  This often allows faster building, as
the sources can be placed in a shared filesystem and the builds done in
a local (and hence usually much faster) filesystem.  To make this
clear, the following example assumes that the sources are placed in
/usr/me/mpich2-<VERSION>, the build is done in /tmp/me/mpich2, and the
installed version goes into /usr/local/mpich2-test:

  cd /usr/me
  tar xzf mpich2-<VERSION>.tar.gz
  cd /tmp/me
  # Assume /tmp/me already exists
  mkdir mpich2
  cd mpich2
  /usr/me/mpich2-<VERSION>/configure --prefix=/usr/local/mpich2-test
  make
  make install


Optional Features
-----------------

MPICH2 has a number of optional features.  If you are exploring MPICH2
as part of a development project the following configure options are
important:

Performance Options:

 --enable-fast - Turns off error checking and collection of internal 
                 timing information
 --enable-timing=no - Turns off just the collection of internal timing
                 information
MPI Features:
  --enable-romio - Build the ROMIO implementation of MPI-IO.  This is
                 the default

  --with-file-system - When used with --enable-romio, specifies filesystems
                 ROMIO should support.  See README.romio

Language bindings:

  --enable-f77 - Build the Fortran 77 bindings.  This is the default.  
                 It has been tested with the Fortran parts of the Intel
		 test suite.
  --enable-f90 - Build the Fortran 90 bindings.  This is not on by
                 default, since these have not yet been tested. 
  --enable-cxx - Build the C++ bindings.  This has been tested with the
                 Notre Dame C++ test suite.

Cross compilation:

  --with-cross=filename - Provide values for the tests that require
                 running a program, such as the tests that configure
                 uses to determine the sizes of the basic types.
                 This should be a file in Bourne shell format containing
                 variable assignments of the form
                       CROSS_SIZEOF_INT=2
                 for all of the CROSS_xxx variables.  A list will
		 be provided in later releases; for now, look at the 
		 configure.in files.

Error checking and reporting:

  --enable-error-checking=level - Control the amount of error checking.
                 Currently, only "no" and "all" are supported; "all" is
                 the default.
  --enable-error-messages=level - Control the amount of detail in error
                 messages.  By default, MPICH2 provides instance-specific
		 error messages, but with this option, MPICH2 can be
		 configured to provide less detailed messages.  This
		 may be desirable on small systems, such as clusters
		 built from game consoles or high-density massively
		 parallel systems.

Compilation options for development:

  --enable-g=value - Controls the amount of debugging information
                 collected by the code.  The most useful choice here is
		 dbg, which compiles with -g.
  --enable-coverage - An experimental option that enables GNU coverage
                 analysis.
  --with-logging=name - Select a logging library for recording the 
                 timings of the internal routines.  We have used this
		 to understand the performance of the internals of MPICH2.
		 Wait for the next release for detailed instructions.

Status of MPI-2 Features in MPICH2
==================================
MPICH2 includes all of MPI-1 and the following parts of MPI-2:

MPI-I/O, except for the external data representations (i.e., MPICH2 includes
    all of ROMIO)

Active target RMA is functional, including for derived datatypes at the target

Name publishing is supported on shared file systems (other name publishers
are under development)

Routines not implemented include the dynamic process routines (MPI_Comm_spawn
and friends); other routines, such as the intercommunicator extensions to the
collective routines, have not been extensively tested.
