OSU MPI MVAPICH-0.9.7 in OFED 1.0 Release Notes
===============================================

June 2006

Overview
--------
These are the release notes for OSU MPI MVAPICH-0.9.7, Rev 0.9.7-mlx-2.1.0,
the OFED edition of the OSU MPI MVAPICH-0.9.7 release. OSU MPI is an MPI
channel implementation over InfiniBand from Ohio State University (OSU)
(http://nowlab.cse.ohio-state.edu/projects/mpi-iba/).

Software Dependencies
---------------------
OSU MPI requires the OFED Distribution stack to be installed and OpenSM to be
running. The MPI module also requires a configured network interface (either
InfiniBand IPoIB or Ethernet).
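
As a quick sanity check, the presence of an IPoIB interface can be verified
from the shell (a sketch only: the interface name ib0 is an assumption and may
differ on your system):

```shell
#!/bin/sh
# Report whether the IPoIB interface is visible to the kernel.
# "ib0" is a placeholder; substitute your actual interface name.
if ip link show ib0 >/dev/null 2>&1; then
    echo "ib0 present"
else
    echo "ib0 missing"
fi
```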

New Features
------------
This module is based on the MVAPICH-0.9.7 (MPI-1 over OpenIB/Gen2) module from
openib.org gen2.  This version for OFED has the following additional features:
- A default configuration file, mvapich.conf
- SRQ changed to a run-time option (SRQ is enabled by default)
- Multi-HCA and multi-port support (but not multi-rail): each job can select
  which HCA and which port to use, and jobs on the same node can use different
  ports/HCAs. The user can control all settings via a hostfile.
- An MPI_Alltoall tuning option for large clusters
- OFED packaging and installation scripts

Bug Fixes
---------
- Fix from mvapich trunk
  (https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk) revision: 34:36
- Fix from mvapich trunk
  (https://mvapich.cse.ohio-state.edu/svn/mpi/mvapich/trunk) revision: 110:137
- Fix for compilation problem on IA64 + Intel C compiler
  (http://openib.org/bugzilla/show_bug.cgi?id=121)
- Fix for shared library support on PPC64 platform 
- Fix for compilation on SUSE10 platform

Known Issues
------------
- An MPI process must not call fork after MPI_Init; doing so might cause a
  segmentation fault.
- Using mpirun with ssh has a signal-collection problem. Killing a run
  (using CTRL-C) might leave some of the processes running on some of the
  nodes. The same can happen if one of the processes exits with an error.
  Note: This problem does not occur with rsh.
- The MPD job launcher feature of the OSU MPI module has not been tested by
  Mellanox Technologies. See http://nowlab.cse.ohio-state.edu/projects/mpi-iba/
  for details.
- The Pallas test reports a long latency for 0-byte messages because the
  algorithm for small messages does not warm up the software. To reduce the
  latency, set VIADEV_ADAPTIVE_RDMA_THRESHOLD=0 in mvapich.conf.
  Note: This setting impacts scalability and is therefore not recommended for
  large clusters.
- For users of Mellanox Technologies firmware fw-23108 or fw-25208 only:
  OSU MPI may fail in its default configuration if your HCA is burnt with an
  fw-23108 version that is earlier than 3.4.000, or with an fw-25208 version
  4.7.400 or earlier.
  Workaround:
  - Option 1: Update the firmware.
  - Option 2: In mvapich.conf, set VIADEV_SRQ_ENABLE=0.
- MVAPICH does not run on RHEL4 U3 ppc64.
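
Both mvapich.conf workarounds above amount to single-line settings, sketched
here as an illustration (the file location depends on the OFED installation;
apply each line only if the corresponding issue affects you):

```
# Pallas 0-byte latency workaround (impacts scalability; not recommended
# for large clusters):
VIADEV_ADAPTIVE_RDMA_THRESHOLD=0

# Workaround for older fw-23108 / fw-25208 firmware (Option 2 above):
VIADEV_SRQ_ENABLE=0
```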

Main Verification Flows
-----------------------
To verify the correctness of OSU MPI, the following tests were run.

==================  =================================
Test                Description
==================  =================================
Intel's test suite  1400 Intel tests
BW/LT               OSU's bandwidth and latency tests
Pallas              Intel's Pallas test
mpitest             b_eff test
Presta              Presta multicast test
Linpack             Linpack benchmark
NAS2.3              NAS NPB2.3 tests
SuperLU             SuperLU benchmark (NERSC edition)
NAMD                NAMD application
CAM                 CAM application
==================  =================================

