
   ==================================================================
   ===                                                            ===
   ===      PARKBENCH Distributed Memory Benchmarks               ===
   ===                                                            ===
   ===                           COMMS1                           ===
   ===                                                            ===
   ===                          Pingpong                          ===
   ===                                                            ===
   ===               Versions:  PVM + Fortran 77                  ===
   ===                                                            ===
   ===        Original Author:  Roger Hockney                     ===
   ===                          Department of Electronics and     ===
   ===                          Computer Science                  ===
   ===                          University of Southampton         ===
   ===                          Southampton SO17 1BJ, UK          ===
   ===                                                            ===
   ===       Modifications by:  Ian Glendinning & Ade Miller      ===
   ===                          Southampton HPC Centre            ===
   ===                          Computing Services                ===
   ===                          University of Southampton         ===
   ===                          Southampton SO17 1BJ, UK          ===
   ===                                                            ===
   ==================================================================


1. Description
--------------
This benchmark measures the basic communication properties of a computer
network by performing the 'pingpong' experiment between a neighbouring pair of
nodes.  A message of varying length is sent to a neighbouring node, and
immediately returned after the data has become available to the receiving user
program. Half the time for this pingpong exchange is recorded as the time to
send a message from one node to a neighbour.  This time is fitted by
least-squares to the straight line relation:

                     tn = (n + nhalf) / rinf                   (1)

where  rinf  = the asymptotic stream rate (Byte/s), and
       nhalf = the message length (Byte) giving half the 
               asymptotic performance
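
Equation (1) is linear in n: tn = nhalf/rinf + n/rinf. An ordinary least-squares
line tn = t0 + s*n therefore gives rinf = 1/s and nhalf = t0/s. The Python
sketch below shows only this arithmetic. The function name fit_rinf_nhalf is
hypothetical, it is not the benchmark's own Fortran fitting routine, and the
data it is run on are synthetic.

```python
def fit_rinf_nhalf(lengths, times):
    """Least-squares fit of tn = (n + nhalf) / rinf to measured one-way times.

    lengths -- message lengths n (Byte); times -- one-way times tn (s).
    Returns (rinf, nhalf) in Byte/s and Byte.
    """
    k = len(lengths)
    if k < 2 or len(times) != k:
        raise ValueError("need at least two (length, time) pairs")
    mean_n = sum(lengths) / k
    mean_t = sum(times) / k
    sxx = sum((n - mean_n) ** 2 for n in lengths)
    if sxx == 0:
        raise ValueError("message lengths must not all be equal")
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(lengths, times))
    slope = sxy / sxx                    # s per Byte, i.e. 1 / rinf
    intercept = mean_t - slope * mean_n  # s, i.e. nhalf / rinf
    return 1.0 / slope, intercept / slope


# Synthetic, noise-free data generated from rinf = 1.0e6 Byte/s, nhalf = 100 Byte.
ns = [0, 100, 1000, 10000]
ts = [(n + 100) / 1.0e6 for n in ns]
rinf, nhalf = fit_rinf_nhalf(ns, ts)
print(f"rinf = {rinf:.4g} Byte/s, nhalf = {nhalf:.4g} Byte")
```

With real measurements the points scatter about the line, and the fit gives
the best straight-line estimates of rinf and nhalf rather than exact values.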

Since the average performance for a message of length n is r = n/tn,
equation (1) corresponds to

                            rinf
                    r = -------------                          (2)
                        (1 + nhalf/n)

In equation (2), rinf is the asymptotic stream rate that must be used together
with the fitted value of nhalf to calculate the average bandwidth; it is not
the rate achieved by any particular message.  For short messages (n much less
than nhalf) the achieved rate is far below rinf, because the factor
(1 + nhalf/n) is large; when n = nhalf, exactly half of rinf is obtained.
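
The behaviour of equation (2) can be checked numerically. The hypothetical
helper average_rate below evaluates (2), and the loop compares it with n/tn
computed from (1); the values rinf = 1.0e6 Byte/s and nhalf = 100 Byte are
illustrative, not measured.

```python
def average_rate(n, rinf, nhalf):
    """Average rate r = rinf / (1 + nhalf/n) from equation (2), in Byte/s."""
    if n < 0:
        raise ValueError("message length must be non-negative")
    if n == 0:
        return 0.0  # a zero-length message transfers no data
    return rinf / (1.0 + nhalf / n)


rinf, nhalf = 1.0e6, 100.0  # illustrative values only
for n in (10, 100, 1000, 10000):
    tn = (n + nhalf) / rinf  # equation (1)
    # Equation (2) and n / tn agree up to floating-point rounding.
    print(n, average_rate(n, rinf, nhalf), n / tn)
```

For example, a 10-byte message achieves 1.0e6/11, about 9.1e4 Byte/s, which is
less than a tenth of rinf, while a 100-byte message (n = nhalf) achieves
exactly 5.0e5 Byte/s.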

The benchmark has been deliberately kept simple by restricting the test to
asynchronous communication. This is the most favourable case and gives a lower
bound on the time for the communication of a message.  Asynchronous, here,
means that a send returns to the calling program when the user data array
being sent may be safely reused.  This, however, may be before the message has
been received by the receiving node.  The receiving node program stops (i.e.
blocks) until the data is available for use by the user's program.
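
To make the pingpong pattern concrete, here is an in-process Python sketch in
which two threads stand in for the two nodes and queue.Queue objects stand in
for PVM messages.  Queue.put on an unbounded queue returns as soon as the
message is enqueued (the analogue of the asynchronous send described above),
and Queue.get blocks until a message is available (the analogue of the
blocking receive).  The function pingpong is hypothetical; it measures thread
hand-off within one process, not a network, and does not copy the data.

```python
import queue
import threading
import time


def pingpong(n_bytes, niter):
    """Mean one-way time (s): half the average round-trip time over niter exchanges."""
    if niter < 1:
        raise ValueError("niter must be at least 1")
    to_echo = queue.Queue()    # "master" -> "slave"
    to_master = queue.Queue()  # "slave" -> "master"

    def echo():
        for _ in range(niter):
            msg = to_echo.get()  # blocks until the message is available
            to_master.put(msg)   # send it straight back; put() does not block

    slave = threading.Thread(target=echo)
    slave.start()
    payload = bytes(n_bytes)
    start = time.perf_counter()
    for _ in range(niter):
        to_echo.put(payload)     # "send": returns once the message is queued
        reply = to_master.get()  # "receive": blocks until the echo arrives
    elapsed = time.perf_counter() - start
    slave.join()
    if len(reply) != n_bytes:
        raise RuntimeError("echoed message has the wrong length")
    return elapsed / (2 * niter)


print(pingpong(1024, 1000))
```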


2. Operating Instructions
-------------------------
To compile and link the code, type:    make

On some systems it may be necessary to allocate the appropriate resources
before running the benchmark; e.g. on the iPSC/860, to reserve a cube of 2
processors, type:    getcube -t2

The message length of each test is defined by a file called 'comms1.dat'.  If
you wish to obtain a benchmark result for comparison with results from other
machines, you should use the standard version of comms1.dat provided with this
release.  Alternatively, if you wish to investigate the detailed variation of
communication speed with message length for your particular machine, you can
edit the file before running the benchmark.  The format is one integer value
per line, each defining the message length of a test case.  The values should
be in ascending order.  You can specify any number of values, up to a
compile-time limit set by the parameter MAXTST, which is defined in the file
'comms1.inc'.
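
The rules for the message-length entries can be expressed as a small
validation routine.  This Python sketch is hypothetical: it assumes the
length lines have already been separated from any other entries in comms1.dat
(whose layout is not described here), takes MAXTST as an argument instead of
reading comms1.inc, and ignores blank lines, which is also an assumption.

```python
def parse_lengths(lines, maxtst):
    """Read message lengths, one integer per line, in ascending order."""
    lengths = []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are ignored
        n = int(text)
        if n < 0:
            raise ValueError(f"line {lineno}: negative message length {n}")
        if lengths and n < lengths[-1]:
            raise ValueError(f"line {lineno}: {n} is smaller than {lengths[-1]}")
        lengths.append(n)
    if len(lengths) > maxtst:
        raise ValueError(f"{len(lengths)} test cases exceed MAXTST = {maxtst}")
    return lengths


# Hypothetical input; MAXTST = 30 is an illustrative value, not the real one.
print(parse_lengths(["0", "8", "64", "512", "4096"], maxtst=30))
```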

To run the benchmark executable, type:    comms1

The default number of node processes is 2, but you may allocate any number up
to a maximum defined by the compile-time parameter MAXNOD, declared in the
file comms1.inc.  If you choose more than 2 processes, you must specify in the
file comms1.dat which slave node the master (node 0) is to communicate with.
This option can be used to study how the communication time varies with
separation within the network.

Many message-passing computers have different timings for short and long
messages, so you should also define in comms1.dat the number of bytes in the
longest short message, or zero if there is no difference between short and
long messages.  If you specify a non-zero value, the program will
automatically add test cases for the longest short message and the shortest
long message, if they are not already defined in comms1.dat.
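
Adding the short/long boundary cases amounts to merging two lengths into the
sorted list of test cases.  In this hypothetical Python sketch the shortest
long message is assumed to be exactly one byte longer than the longest short
message; the benchmark's own rule is not stated here.  Duplicate lengths in
the input are merged.

```python
def add_boundary_cases(lengths, longest_short):
    """Return sorted test lengths, adding the short/long boundary cases if missing."""
    if longest_short < 0:
        raise ValueError("longest short-message length must be non-negative")
    cases = set(lengths)
    if longest_short > 0:
        cases.add(longest_short)      # longest short message
        cases.add(longest_short + 1)  # shortest long message (assumed 1 Byte longer)
    return sorted(cases)


print(add_boundary_cases([0, 64, 1024], longest_short=100))
```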

You must also define whether or not timings for zero-length messages are to be
used in computing the least-squares fits to the data; this choice is offered
because such timings can be anomalous.  Finally, you must specify the
approximate measurement time that you want for each test case (i.e. for each
message length).  The actual number of times a message is ping-ponged in each
case is calculated to give approximately that execution time.  This means
that, for any particular system, you can ensure that each test runs for long
enough to average out disturbances caused by spurious operating system
effects.  It also gives you direct control over the total time the benchmark
will take to run.
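
The exact rule used to choose the number of ping-pongs is not given here.  A
plausible sketch, assuming the one-way time is predicted from equation (1)
using preliminary estimates of rinf and nhalf: each ping-pong costs about two
one-way times, so the count is the target time divided by the predicted
round-trip time, rounded up, with a minimum of one.  The function
iterations_for and the estimates used below are hypothetical.

```python
import math


def iterations_for(target_time, n, rinf_est, nhalf_est):
    """Ping-pong count giving roughly target_time seconds for a test of length n."""
    one_way = (n + nhalf_est) / rinf_est  # equation (1) with estimated parameters
    round_trip = 2.0 * one_way            # one ping-pong = two messages
    if round_trip <= 0:
        raise ValueError("estimated message time must be positive")
    return max(1, math.ceil(target_time / round_trip))


# Estimates rinf = 2**20 Byte/s and nhalf = 1024 Byte are illustrative only.
for n in (0, 1024, 65536):
    print(n, iterations_for(1.0, n, 2**20, 1024))
```

For example, with these estimates a zero-length message has a predicted
one-way time of 1/1024 s, so a target of 1 s needs 512 ping-pongs.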

When you have answered the above questions, the program proceeds to make
estimates of the loop overhead and communication parameters.  It uses these to
calculate the number of ping-pongs needed for each test case, to obtain the
requested execution time per test.  It should be noted that the loop overhead
is re-measured for each test, and that the measurement takes approximately the
same time as the ping-pong part of the test, so the total elapsed time for
each test case is actually about twice the specified execution time.
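
The overhead correction itself can be sketched as timing the ping-pong loop,
timing the same loop with an empty body, and dividing the difference by twice
the iteration count.  In this hypothetical Python sketch, do_pingpong is any
callable performing one complete exchange.  It only illustrates the
subtraction: its no-op overhead loop runs much faster than the benchmark's
overhead measurement, which takes about as long as the ping-pong loop itself.

```python
import time


def time_one_way(do_pingpong, niter):
    """Mean one-way time with the timing-loop overhead subtracted.

    do_pingpong -- hypothetical callable performing one complete ping-pong.
    """
    start = time.perf_counter()
    for _ in range(niter):
        do_pingpong()
    t_loop = time.perf_counter() - start

    # Overhead: the same loop and call structure with a body that does nothing.
    def noop():
        pass

    start = time.perf_counter()
    for _ in range(niter):
        noop()
    t_overhead = time.perf_counter() - start

    return (t_loop - t_overhead) / (2 * niter)


# Stand-in "ping-pong" that just sleeps for 1 ms, for illustration only.
print(time_one_way(lambda: time.sleep(0.001), 50))
```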

Once the timing parameters have been estimated, the benchmark test cases are
executed.  To enable their progress to be followed, a line is written to the
standard output, showing the test number and message length, when each test
starts.  When it finishes, the measurement of the time taken to send one
message is written out, together with the number of iterations the test used.

A permanent copy of the full benchmark results is written to a file called
'comms1.res'.  If the run is successful and a permanent record is required,
this file should be copied to another file before the next run overwrites
it.

$Id: ReadMe,v 1.6 1994/06/08 14:52:23 igl Exp igl $
