From owner-pbwg-compactapp@CS.UTK.EDU Fri May 21 08:42:24 1993
Date: Fri, 21 May 1993 08:42:55 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9305211242.AA18681@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Compact applications

Dear Compact Applications People,

At last I have roughed out some notes on compact applications to serve as a
basis for discussion at next week's meeting in Knoxville.

See you there,
David

------------------ Latex file below --------------------------------

%file: compac2.tex

\chapter{Compact Applications}
\footnote{assembled by David Walker for Compact Applications subcommittee}

\section{Introduction}
\label{sec:compact.intro}

While kernel applications, such as those described in Chapter 4, provide a fairly straightforward way of assessing the performance of parallel systems, they are not representative of scientific applications in general since they do not reflect certain types of system behavior. In particular, many scientific applications involve data movement between phases of an application, and may also require significant amounts of I/O. These types of behavior are difficult to gauge using kernel applications.

One factor that has hindered the use of full application codes for benchmarking parallel computers in the past is that such codes are difficult to parallelize and to port between target architectures. In addition, full application codes that have been successfully parallelized are often proprietary and/or subject to distribution restrictions. To minimize the negative impact of these factors we propose to make use of compact applications in our benchmarking effort. Compact applications are typical of those found in research environments (as opposed to production or engineering environments), and usually consist of up to a few thousand lines of source code. Compact applications are distinct from kernel applications since they are capable of producing scientifically useful results. In many cases, compact applications are made up of several kernels, interspersed with data movements and I/O operations between the kernels.

In this chapter we discuss a number of compact applications in terms of their purpose, the algorithms used, the types of data movement required, the memory requirements, and the amount of I/O. The compact applications below are not meant to form a definitive or complete list.

\section{Proposed Compact Application Benchmarks}
\label{sec:compact.proposed}

To ensure that those areas of scientific computing that make the most use of high performance computers are adequately represented in the benchmark suite, we shall classify compact applications by scientific field.

\subsection{Plasma Physics}
\label{subsec:plasmas}

Plasma physics is a large consumer of high performance computer cycles. Among the areas studied are the design of tokamaks, high power microwave devices, and astrophysical plasmas.
It would be nice to have a compact application from each of these three fields in the benchmark suite. Currently we have Hockney's device simulation, LPM1, from the GENESIS suite.

\subsubsection{Electronic Device Simulation with LPM1}
\label{subsubsec:lpm1}

LPM1 is a time-dependent simulation of an electronic device using a particle-mesh or PIC-type algorithm. It uses a two-dimensional $(r,z)$ geometry with the fields being computed on a regular mesh of size $33\times 75\cdot\alpha$, where $\alpha$ is a size parameter that can take the values 1, 2, 4, or 8, corresponding to runs with between about 700 and 6000 particles.

\subsection{Quantum Chromodynamics}
\label{subsec:qcd}

Quantum Chromodynamics (QCD) is the gauge theory of the strong interaction, which binds quarks and gluons into hadrons, the constituents of nuclear matter. Analytical perturbation methods can be applied to QCD only at high energies, hence computer simulations are necessary to study QCD at lower, more realistic, energies. In these lattice gauge theory simulations the quantum field is discretized onto a periodic, four-dimensional, space-time lattice. Quarks are located at the lattice sites, and the gluons that bind them are associated with the lattice links. The gluons are represented by SU(3) matrices, which are a particular type of $3\!\times\! 3$ complex matrix. A major component of the QCD code involves updating these matrices.

\subsubsection{Quenched QCD}
\label{subsubsec:quenched}

The QCD code in the Perfect benchmark suite is derived from the work of Fox, Flower, Otto, and Stolorz at Caltech. The Perfect QCD code uses the Cabibbo-Marinari pseudo heat bath algorithm to update the SU(3) matrices on the lattice links. This algorithm uses a Monte Carlo technique to generate a chain of configurations which are distributed with a probability proportional to $\exp{(-S(U))}$, where $S(U)$ is the action of the configuration $U$. If the only contributions to the action come from the gauge field then the action is local. The inclusion of dynamical fermions gives rise to a nonlocal action. This code ignores the effects of dynamical fermions, and so represents a pure-gauge model in the quenched approximation. A major component of this QCD code is the updating of the SU(3) matrices associated with each link in the lattice, and it is this operation which is benchmarked in the Perfect timings. Two basic operations are involved in updating the lattice: the multiplication of SU(3) matrices, and the generation of pseudo-random numbers (an illustrative sketch of the former is given below, following the general relativity codes).

\subsubsection{Genesis QCD}
\label{subsubsec:genesis_qcd}

Is the Genesis benchmark QCD1 similar to the Caltech QCD code? If so, which one should be used?

\subsection{General Relativity}
\label{subsec:gr}

\subsubsection{Evolution of a Gravitational Field}

The Genesis code GR1 solves a system of hyperbolic PDEs, derived from general relativity, which describe the evolution of a gravitational field from an initial state. Although conceptually similar to the solution of the wave equation, the equations are long and complicated. This application treats the axisymmetric case to reduce the problem to a manageable size. Solution of the general problem requires three orders of magnitude more compute power, and is likely to become of substantial interest as more powerful parallel machines are developed.
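To make the dominant QCD kernel mentioned above concrete, the following is a minimal illustrative routine for multiplying two $3\times 3$ complex matrices in Fortran 77. It is a sketch only; the routine name is hypothetical and the code is not taken from the Perfect or Genesis QCD benchmarks.

\begin{verbatim}
      SUBROUTINE SU3MUL(A, B, C)
C     Illustrative only: form C = A*B for 3x3 complex matrices,
C     the basic operation in an SU(3) link update.
      COMPLEX A(3,3), B(3,3), C(3,3)
      INTEGER I, J, K
      DO 30 J = 1, 3
         DO 20 I = 1, 3
            C(I,J) = (0.0, 0.0)
            DO 10 K = 1, 3
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
   10       CONTINUE
   20    CONTINUE
   30 CONTINUE
      RETURN
      END
\end{verbatim}

In practice such a kernel is used together with a pseudo-random number generator for the heat bath update, the other basic operation involved in updating the lattice.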
\subsubsection{Quantum Theory of Gravity}
\label{subsubsec:gravity}

This code, which derives from the work of Sorkin and Daughton of Syracuse University, is part of an effort to provide a satisfactory quantum theory of gravity through the use of causal set theory$\ldots$whatever that is. The main computational task is the LU factorization of large, dense matrices ($10000\times 10000$).

\subsection{Climate and Weather Prediction}
\label{subsec:climate}

Mesoscale weather prediction and global climate modeling have become important application areas in recent years. They typically involve the solution of nonlinear PDEs.

\subsubsection{Spectral Solver for the Shallow Water Equations}
\label{subsubsec:swe}

The spectral transform method is the standard numerical technique used to solve partial differential equations on the sphere in global climate modeling. For example, it is used in CCM1 (the Community Climate Model 1) and its successor, CCM2. The solution of the shallow water equations on a sphere constitutes an important component of such global climate models. The SSWMSB code uses the spectral transform method to solve the shallow water equations on the surface of a sphere, which is discretized as a regular longitude-latitude grid. In each timestep the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. This transformation involves first the evaluation of FFTs along lines of constant latitude, followed by Legendre integration (i.e., weighted summation) over latitude (a schematic form of this transform is given below, following the geophysics codes).

\subsubsection{Helmholtz Solvers for Meteorological Modeling}
\label{subsubsec:helmholtz}

The Genesis suite includes two meteorological applications based on Helmholtz solvers. One uses a pseudo-spectral solution method, and the other a multigrid algorithm.

\subsection{Molecular Dynamics}
\label{subsec:moldyn}

\subsubsection{Dislocation Studies in Crystals}
\label{subsubsec:dislocation}

A parallel Fortran 77 plus message-passing code has been developed at ORNL to study dislocation phenomena in crystals. This three-dimensional code divides space into cells, with each processor being assigned a rectangular block of cells. Each cell contains a set of particles. Communication is necessary to exchange particles lying in cells on the boundary of a processor with a neighboring processor. Particles must also be migrated between processors as they move in space.

\subsubsection{The Genesis Molecular Dynamics Code}
\label{subsubsec:genesis_md}

I don't know much about this, but I expect it's similar to the ORNL code.

\subsubsection{The PERFECT Molecular Dynamics Codes}
\label{subsubsec:perfect_md}

The Perfect benchmark suite includes two molecular dynamics codes, both of which use data sets that are too small to be used to evaluate current parallel computers. BDNA, which simulates the hydration structure of potassium counterions and water in a B-DNA molecule, involves 1500 water molecules and 20 counterions. MDG performs a molecular dynamics calculation on 343 water molecules in the liquid state.

\subsection{Geophysics}

Two important geophysics computations are flow through porous media and seismic migration. The Perfect suite includes a seismic migration code, MG3D. This code is dominated by FFTs. A parallel code for modeling groundwater flow is under development at ORNL and may be a good code to include in the suite as an example of a flow-through-porous-media code.
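A schematic of the shallow water spectral transform described above may help fix ideas; the notation here is illustrative and is not taken from the SSWMSB source. For a field $\xi$ sampled at grid points $(\lambda_i,\mu_j)$ on the longitude-latitude grid, the forward transform computes Fourier coefficients along each latitude circle by FFT and then applies Gaussian (Legendre) quadrature over latitude:
\[
\hat{\xi}^m(\mu_j) = \frac{1}{I}\sum_{i=1}^{I} \xi(\lambda_i,\mu_j)\, e^{-im\lambda_i},
\qquad
\xi_n^m = \sum_{j=1}^{J} w_j\, \hat{\xi}^m(\mu_j)\, P_n^m(\mu_j),
\]
where the $w_j$ are the Gauss weights and the $P_n^m$ are the associated Legendre functions. The inverse transform evaluates the spherical harmonic sums and inverse FFTs in the reverse order.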
\subsection{Other Codes}

Clearly we would want to include CFD codes, astrophysics codes such as the tree-based simulations of gravitating systems, quantum chemistry codes, and superconductor simulations. We also need to include codes from the NAS, NPAC, PERFECT2, and SLALOM benchmark suites, as well as provide better descriptions of the codes above.

\section{Concluding Remarks}

There are probably two or three dozen compact applications that we might consider for inclusion in the benchmark suite. We should consider what is a reasonable number of codes to include, and the criteria for accepting a code in terms of documentation, usefulness, and software quality.

From owner-pbwg-compactapp@CS.UTK.EDU Fri May 21 09:06:07 1993
Message-Id: <9305211306.AA01842@berry.cs.utk.edu>
To: walker@rios2.epm.ornl.gov (David Walker)
Cc: pbwg-compactapp@cs.utk.edu
Subject: Re: Compact applications
In-Reply-To: Your message of "Fri, 21 May 1993 08:42:55 EDT." <9305211242.AA18681@rios2.epm.ornl.gov>
Date: Fri, 21 May 1993 09:06:39 -0400
From: "Michael W. Berry"

Fellow Compact Applic. Members:

Here is a copy of the minutes from the SPEC/Perfect meeting I attended in Huntsville. Some of this information may be useful to PBWG.

Mike B.

---------------------------------------------------------------

Draft Minutes: The SPEC Perfect Group, 11-13 May 1993

The Perfect Club Steering Committee voted to merge with the SPEC organization. The first joint meeting with SPEC occurred during 11-13 May 1993. The original SPEC organization has been modified so that the name "SPEC" refers to the non-profit corporation which acts as a financial umbrella for benchmarking subgroups. The original SPEC group is now known as the SPEC Open Systems Group. The Perfect Club is now known as the SPEC Perfect Group.

In accordance with the vote taken by David Schneider in April, the initial SPEC Perfect Steering Committee includes Margaret Simmons (LANL), George Cybenko (Dartmouth), David Schneider (CSRD), John Larson (CSRD), Mike Berry (U. of Tenn), Satish Rege (DEC), Joanne Martin (IBM), and Philip Tannenbaum (HNSX). This meeting was attended by David Schneider (CSRD), Mike Berry (U. of Tenn), Satish Rege (DEC), Philip Tannenbaum (HNSX), Leo Boelhouwer (IBM-Kingston, representing Joanne Martin), Jacob Thomas (IBM-Austin), Larry Gray (Chairman, SPEC BOD), and Rod Skinner (Treasurer, SPEC). Hwa Lai (Fujitsu) attended as an observer. Various SPEC Open Systems members periodically sat in. David Schneider indicated that he anticipated Cray Research would rejoin because of marketing necessity.

The meeting began with David Schneider, Larry Gray, and Rod Skinner presenting the framework for the merger. The SPEC Open Systems Group and the SPEC Perfect Group will be autonomous subgroups within SPEC. SPEC itself will act as a business umbrella organization. Each Group will assess dues and allocate budgets independently.
The overhead for which the SPEC Perfect Group will be responsible includes the legal retainer and accounting fees for NCGA, and the additional costs of printing, duplication, distribution, or other services that the SPEC Perfect Group may elect to use in the future. It was also stated that the SPEC organization was flexible on many issues, but the underlying requirement was to ensure that corporate non-profit status regulations are not violated. SPEC is incorporated as a non-profit organization in California. It was generally agreed by all that mutual trust would be required from the SPEC Open Systems Group and the SPEC Perfect Group to minimize formality and unnecessary bureaucracy.

The Perfect Group will be given one SPEC BOD seat on a temporary basis until January 1994. The SPEC BOD currently consists of 5 members: HP, Intel, Sun, ATT/NCR, and IBM. The Perfect Group seat will add 1 member to the BOD. In January 1994 this 6th BOD seat will be open for voting by the entire SPEC membership (SPEC Perfect Group and SPEC Open Systems Group). A discussion about who should fill the temporary SPEC Perfect Group BOD seat resulted in agreement that university people could not practically take the position because of travel expense. IBM was already represented on the SPEC BOD, so David Schneider nominated Satish Rege (DEC) and Philip Tannenbaum (HNSX) as candidates for the BOD seat. Leo Boelhouwer seconded the nomination for Philip Tannenbaum; Mike Berry seconded the nomination for Satish Rege. A vote will be conducted by email on/about 1 June 1993. The initial 7 Steering Committee members are the eligible voters. During June a press announcement about the merger would be jointly written.

There was discussion about inclusion of academic and government members. As a result of SPEC non-profit requirements, all members must be either full members ($5,000/year) or associate members ($1,000/year). It was agreed that few academic or government members could acquire funding for membership. The SPEC Perfect Group Steering Committee could elect to sponsor the memberships of selected individuals, and certain individuals could be included by creation of "SPEC Fellows" or "SPEC Affiliates", whereby specific services could be paid for with membership. Seeking industrial sponsorship for academic participation was discussed as desirable. Each member will initiate a "check is in the mail" process for their membership fees. Diane Dean, NCGA, 2722 Merrilee Drive, Fairfax, VA 22301-4499 (703-698-9600 x318) is our contact in this regard. SPEC Open Systems Group members received 6 free pages for SPEC/OSG reporting in the publications; additional pages were billed at $500 each. It was noted that DEC purchased 60 extra pages in the last publication to kick off a new product line.

The SPEC Perfect Group organization was discussed. It was agreed that the SPEC Perfect Group should have a Chairman, a Secretary, and a Technical Coordinator. The Chairman would be responsible for interfacing with SPEC and the SPEC Open Systems Group, organizing meetings, and general management. The Secretary would be responsible for generating minutes and handling correspondence. The Technical Coordinator would be responsible for benchmarking status, benchmark production and distribution, coordinating the benchmark subgroups, and being the focal point for technical issues. Each benchmark subgroup would have its own leadership.
Temporary assignments were accepted to fill these positions until the next SPEC Perfect Group meeting, targeted for August at ATT (Chicago). Satish Rege is the temporary Chairman, Philip Tannenbaum the temporary Secretary, and Leo Boelhouwer the temporary Technical Coordinator. Specific action items for the period include:

 - Completing the benchmark codes
 - Generating verification tests and timing instrumentation
 - Publishing minutes
 - Writing a solicitation for vendors and industry to attract membership or sponsorship support

A discussion about the benchmark rules and reporting resulted in general agreement that there would be baseline ("As Is") executions which allowed only the minimal changes required to obtain correct results. There would also be an optimized or alternative solution execution which would allow unlimited use of standard vendor libraries and unlimited rewriting in a high-level language. It was agreed that the benchmark programs would be distributed via netlib or anonymous ftp. Text would be added to each benchmark program requiring that any use of benchmark results from the program which are not formally accepted and published by the SPEC Perfect Group must state "these results are not officially approved and reported by the SPEC Perfect Group Steering Committee. They may not be directly comparable to accepted and verified results." Only actual execution results would be permitted. All executions must be on hardware and software systems that are current products or which will be generally available in the market within 6 months.

There was a spirited debate on the metrics to be used for reporting results. Discussion about the pros and cons of using normalized ratings, MFLOPS, wall clock times, and absolute numbers took place. The discussion resulted in the benchmark publications including, per program, 1) elapsed wall clock time, 2) startup time, 3) time step timing, 4) cleanup time, 5) total user CPU time accumulated, and 6) total system CPU time accumulated. No MFLOPS rate will be reported. This was agreed to be the most scientifically sound approach that would be meaningful and unambiguous.

All execution results presented for approval and publication must include sufficient detail of the hardware and software configuration such that the run could be essentially duplicated with comparable timings. Acceptable results will have valid answers and meet SPEC Perfect Group standards for code changes and execution requirements. Optimized and alternative solution results must include the entire program code as executed, and a statement that the code may be used, without restriction, as a SPEC Perfect Group baseline benchmark code. All vendor library codes used must include copies of the relevant vendor documentation pages that include sufficient detail to describe the processes performed within the library routine. New vendor library routines must have copies of equivalent preliminary documentation. All library routines used must be generally available to all vendor customers, and must either be documented products or become documented products within 6 months of benchmark submission. Results on prototype or preproduction systems could be removed from publication if the benchmarked products were not released within the 6-month window.

The goal is to provide all codes in a FORTRAN77 version, a FORTRAN90 version, and a message-passing version. It was agreed that version control should be instituted so that all results would be grouped according to benchmark version.
If any one code in a benchmark group changed, all codes would receive a new version number. The benchmark groups will be aligned to address vertical industrial areas such as petroleum, chemistry, finance, etc. The codes available for the initial release include FDMOD, FKMIG, and SEIS from the ARCO suite, QCD, FALSE, PUEBLO, and TURB3D. The ARCO suite codes are farthest along. All codes are expected to represent scalable problem solutions that are appropriate to vector, vector-parallel, and MPP architectures. A goal is to maintain the benchmark set at a level whereby only supercomputer-class systems and extreme high-end workstations/clusters could reasonably execute the problems. There is no specific exclusion intended; this goal was stated in order to maintain the SPEC Perfect Group focus on true supercomputing rather than the broader high performance computing classification. The goals may not all be addressed initially because of practical limitations in how much can be accomplished with available resources.

Coding and language standards were discussed and proposals were made. John Larson's work in this area will be circulated. Leo Boelhouwer will edit the V1 execution rules and present an updated draft for approval during the next meeting. Language standards were presented by David Schneider as a basis for creating a benchmark code standard. They included numerous items that were accepted by the group, and a few (noted below) where no final conclusion was reached:

 - Variable names may not exceed 31 characters
 - No pointers
 - No DOUBLE PRECISION; REAL*8 and COMPLEX*16 should be used
 - No CHARACTER-floating point equivalences
 - No Hollerith constants or data
 - No 128-bit requirements (REAL*16, COMPLEX*32)
 - All 64-bit constants should be specified in D format
 - All 32-bit constants should be specified in E format
 - Machine constant limitations were discussed -- no conclusions agreed
 - INTEGER*8 and LOGICAL*8 should not be used unless necessary for execution
 - Tests for floating point equality were discussed -- no conclusions agreed
 - Known vector directive information will be translated to a "C*PERFECT" syntax to preserve the information; compilers will be explicitly prohibited from recognizing "C*PERFECT" information
 - DO WHILE and DO-ENDDO syntax is allowed
 - "!" inlined comments were discussed -- no conclusions agreed

(A short code fragment illustrating several of these items is given after these minutes.)

Additional action items were summarized:

 - Distribute old by-laws for review (DS)
 - Review old by-laws and offer suggestions for revision (all)
 - Contact NCGA regarding our new status (DS)
 - Present our proposals for membership-specific issues to the SPEC BOD (SR)
 - Identify manpower requirements to complete the V2 benchmark suite (all)
 - Transfer "Perfect Benchmark" trademark from U. Ill. to SPEC (DS)
 - Distribute minutes (PT)
 - Set up address and email lists (DS)
 - Next meeting at ATT, Chicago, in August (with SPEC Open Systems Group) (all)
 - Schedule a benchathon to finalize all V2 initial codes (all)
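The following fragment illustrates several of the coding items listed above (REAL*8 in place of DOUBLE PRECISION, a 64-bit constant written in D format, DO-ENDDO syntax, and vector directive information preserved only as a "C*PERFECT" comment). It is a hypothetical sketch for illustration only; it is not taken from any of the benchmark codes, and the directive text shown is just an example of "known vector directive information".

      SUBROUTINE SCALE8(N, X)
C     Illustrative only: scale a REAL*8 vector in place by a
C     64-bit constant written in D format.
      INTEGER N, I
      REAL*8 X(N), SCALE
      PARAMETER (SCALE = 2.5D0)
C*PERFECT IVDEP
      DO I = 1, N
         X(I) = SCALE*X(I)
      ENDDO
      RETURN
      END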
--
Michael W. Berry                        berry@cs.utk.edu
Department of Computer Science          (615) 974-3838 [OFF]
University of Tennessee                 (615) 974-4404 [FAX]
Ayres 114, Knoxville, TN 37996-1301

From owner-pbwg-compactapp@CS.UTK.EDU Wed May 26 17:49:18 1993
Message-Id: <9305262149.AA11808@berry.cs.utk.edu>
To: pbwg-compactapp@cs.utk.edu
Subject: We can get ARCO
Date: Wed, 26 May 1993 17:49:39 -0400
From: "Michael W. Berry"

Here's a note I received from Mosher at ARCO - looks pretty good!

Mike

Date: Wed, 26 May 93 13:48:55 CDT
From: ccm@Arco.COM (Chuck Mosher (214)754-6468)
Message-Id: <9305261848.AA06937@Arco.COM>
To: berry@cs.utk.edu
Subject: ARCO/Perfect Seismic Benchmark

Version 1.0 of SeisPerf is due for Beta release June 1. The suite provides a working seismic processing executive with examples of common industry algorithms. Version 1.0 is built over a simple message-passing layer, which calls PVM, P4, or native message-passing services. The applications call several of the kernel routines mentioned in the PBWG minutes, including 3D FFTs, tridiagonal and Toeplitz matrix solvers, convolutions, and integral methods. The codes are designed to be scalable from single-processor workstations to ~1000-processor MPP systems. Verification tools include a simple X-windows frame viewer, and a checksum table that is printed at the end of each run. The 1.0 release is based on Fortran 77. MasPar has provided a Fortran 90 port of the codes for their systems, which could form the base for an HPF version of the codes.

I'd be happy to participate in PARKBENCH and provide support for including SeisPerf results.

Regards,
Chuck Mosher
ccm@arco.com
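As an illustration of one of the kernel classes mentioned above, the following is a minimal Fortran 77 sketch of a tridiagonal solve (the Thomas algorithm). It is not taken from SeisPerf; the routine name and the fixed work-array size are assumptions made for the sketch.

      SUBROUTINE TRIDI(N, A, B, C, D, X)
C     Illustrative only: solve a tridiagonal system by the Thomas
C     algorithm.  A = sub-diagonal (A(1) unused), B = diagonal,
C     C = super-diagonal (C(N) unused), D = right-hand side,
C     X = solution.  Assumes N <= 1000 for the local work arrays.
      INTEGER N, I
      REAL A(N), B(N), C(N), D(N), X(N), CP(1000), DP(1000), DEN
C     Forward elimination
      CP(1) = C(1)/B(1)
      DP(1) = D(1)/B(1)
      DO 10 I = 2, N
         DEN = B(I) - A(I)*CP(I-1)
         CP(I) = C(I)/DEN
         DP(I) = (D(I) - A(I)*DP(I-1))/DEN
   10 CONTINUE
C     Back substitution
      X(N) = DP(N)
      DO 20 I = N-1, 1, -1
         X(I) = DP(I) - CP(I)*X(I+1)
   20 CONTINUE
      RETURN
      END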
--
Michael W. Berry                        berry@cs.utk.edu
Department of Computer Science          (615) 974-3838 [OFF]
University of Tennessee                 (615) 974-4404 [FAX]
Ayres 114, Knoxville, TN 37996-1301

From owner-pbwg-compactapp@CS.UTK.EDU Thu May 27 12:54:03 1993
Message-Id: <9305271654.AA13805@berry.cs.utk.edu>
To: ccm@arco.com (Chuck Mosher (214)754-6468)
Cc: pbwg-compactapp@cs.utk.edu
Subject: Re: ARCO/Perfect Seismic Benchmark
In-Reply-To: Your message of "Thu, 27 May 1993 06:59:31 CDT." <9305271159.AA15941@Arco.COM>
Date: Thu, 27 May 1993 12:54:24 -0400
From: "Michael W. Berry"

> An earlier release of the codes is available on the U of Illinois
> anonymous ftp server 'csrd.uiuc.edu' in the directory '/pub/perfect'.
> The file 'arco_beta.tar.Z' contains code, installation scripts,
> and documentation for an earlier f77 version for uniprocessors.
> You might want to get this file and have a look at the documentation
> and source structure. The message-passing source is pretty close
> in structure to the f77 version.
>
> We have a mailing list for discussion of the codes:
> 'perfect_seismic@csrd.uiuc.edu'
> Let me know if you want to be on the list. We'll announce the
> new codes there.
>
> Regards,
> Chuck Mosher

Yes, please add my email address and pbwg-compactapp@cs.utk.edu to the mailing list.

Thanks
Mike

--
Michael W. Berry                        berry@cs.utk.edu
Department of Computer Science          (615) 974-3838 [OFF]
University of Tennessee                 (615) 974-4404 [FAX]
Ayres 114, Knoxville, TN 37996-1301

From owner-pbwg-compactapp@CS.UTK.EDU Thu Sep 16 11:20:48 1993
Date: Thu, 16 Sep 93 11:19:06 EDT
From: worley@sun4.epm.ornl.gov (Pat Worley)
Message-Id: <9309161519.AA00634@sun4.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: potential compact benchmark

Ian Foster and I are just finishing version 1.0 of PSTSWM, a parallel algorithm testbed and benchmark code developed for the climate modelling community. It will be made available to this community via netlib, but it may also be interesting as a PARKBENCH compact application. There are a few difficulties with this, though, and I would like some feedback/suggestions on how to proceed.
Description
-----------

PSTSWM is a parallel implementation of a serial code (STSWM 2.0) written by Jim Hack and Rudy Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. It was originally developed as a numerical algorithm testbed, to allow comparison of spectral methods with finite difference methods, finite element methods, etc., and has 6 runtime-selectable test cases in the code. These test cases specify initial conditions, forcing, and analytic solutions (for error analysis), and were chosen to test the ability of the numerical methods to simulate important flow phenomena.

For PSTSWM, we completely rewrote STSWM to add vertical levels, in order to get the correct communication and computation granularity for 3-D climate codes, and to allow the problem size to be selected at runtime without depending on such nonportable features as dynamic memory. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). To enable this PSTSWM supports:

a) 4 classes of parallel algorithms (distributed or transpose based for each of the two major parallel phases);
b) 3-4 specific parallel algorithms within each class (e.g. using a recursive-halving vector sum, using a pipelined ring vector sum, etc.);
c) 2-4 variants of each algorithm;
d) two communication constructs, swap and sendrecv, on which each algorithm is built, each with 5-6 different communication protocol options (synchronous, blocking, nonblocking, forcetypes, etc.).

We are quite happy with the code, and are getting good results with it. Most interesting to us is how the best algorithm changes across platforms and as the problem size changes on the same platform.

Problems
--------

There are a couple of issues to be dealt with in using this code as part of PARKBENCH.

1) The code currently is in single precision with double precision parts. Single precision is sufficient for the problem sizes of interest, but the Legendre polynomial values and Gauss quadrature weights and nodes must be calculated in higher precision. For larger problem sizes, double precision computation will be appropriate, but the Gauss weights, etc. will then need to be calculated in quadruple precision. I do not think that this sort of mixed case has been discussed yet.

2) In one sense, PSTSWM is not a single benchmark, but many of them. We can fix the problem and parallel algorithm specifications by providing (a set of) default input files, but which ones should we choose? All of them are arguably good algorithms in some setting, and I would hate to compare two machines when the algorithm is good for one and inappropriate for another.

3) PSTSWM is currently written using PICL (because that is what I normally use and because I have embedded instrumentation in the research version of the code). I made a real effort to isolate the message-passing bits, so porting to anything else will be trivial. But the message-passing interface that is used does affect the parallel algorithms that are supported. For example, PICL supports nonblocking send and receive and passes through forcetype message types. These are important to performance on some Intel machines.
This is not a problem so much as something to be aware of. PSTSWM will also be available in its original form, but a pointer to some of the issues in cross-machine comparisons should be made. This may be an issue that should be mentioned in the methodology section as it pertains to compact applications. Unlike low-level benchmarks, compact applications are less likely to be "done right" by the vendor for their particular machines.

Comments and suggestions would be appreciated. I imagine every proposed compact application will be unsuitable in one form or another when it is first submitted, and precise guidelines on what should or should not be permitted are important. On the other hand, as a developer, I will not be interested in doing too much work in modifying the code in order to include it in the benchmark suite. Even with the best intentions, it will not be a high priority item for me and is likely to be put off (forever) if not fairly simple.

Thanks.

Pat Worley

From owner-pbwg-compactapp@CS.UTK.EDU Tue Sep 21 11:49:13 1993
Date: Tue, 21 Sep 1993 11:47:07 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9309211547.AA12782@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Application submission form

I'm trying to put together a submission form for people to use to submit applications for inclusion in the ParkBench Compact Applications suite. Also I'd like to establish a procedure for submission. Below is a first stab at these 2 things. Please send me feedback. Later this week I intend to send out a filled-in version of the submission form as an example.

David

PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow the procedure below:

1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite.

2. If the ParkBench Compact Applications Subcommittee decides to consider your application further, you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email or ftp, unless the files are very large, in which case a tar file on a 1/4-inch cassette tape should be mailed. Wherever possible, email submission is preferred for all documents, in man page, LaTeX, and/or PostScript format. These files, documents, and papers together constitute your application package. Your application package should be sent to:

   David Walker
   Oak Ridge National Laboratory
   Bldg. 6012/MS-6367
   P. O. Box 2008
   Oak Ridge, TN 37831-6367
   (615) 574-7401/0680 (phone/fax)
   walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.
The subcommittee will then make a final decision on whether to include your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite, you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program :
-------------------------------------------------------------------------------
Submitter's Name        :
Submitter's Organization:
Submitter's Address     :
Submitter's Telephone # :
Submitter's Fax #       :
Submitter's Email       :
-------------------------------------------------------------------------------
Cognizant Expert(s) :
CE's Organization   :
CE's Address        :
CE's Telephone #    :
CE's Fax #          :
CE's Email          :
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :
-------------------------------------------------------------------------------
Major Application Field :
Application Subfield(s) :
-------------------------------------------------------------------------------
Application "pedigree" :
-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :
-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:
   Integers :       bytes
   Floats   :       bytes
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :
-------------------------------------------------------------------------------
Other relevant research papers :
-------------------------------------------------------------------------------
Application available in the following languages (give message passing
system used, if applicable, and machines application runs on) :
-------------------------------------------------------------------------------
Total number of lines in source code:
Number of lines excluding comments  :
Size in bytes of source code        :
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
-------------------------------------------------------------------------------
Brief, high-level description of what application does :
-------------------------------------------------------------------------------
Main algorithms used :
-------------------------------------------------------------------------------
Skeleton sketch of application :
-------------------------------------------------------------------------------
Brief description of I/O behavior :
-------------------------------------------------------------------------------
Brief description of load balance behavior :
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
-------------------------------------------------------------------------------
Give memory as function of problem size :
-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :
-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :
-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large, for which the benchmark
should be run (give parameters for problem size, sizes of I/O files, memory
required, and number of floating point operations) :
-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :
-------------------------------------------------------------------------------

From owner-pbwg-compactapp@CS.UTK.EDU Tue Oct 5 15:29:11 1993
Message-Id: <9310051928.AA20677@rios2.epm.ornl.gov>
To: spb@epcc.edinburgh.ac.uk, mia@unixa.nerc-bidston.ac.uk, pbwg-compactapp@cs.utk.edu
Subject: Submission form for ParkBench compact applications
Date: Tue, 05 Oct 93 15:28:20 -0500
From: David W. Walker

Below is an example (prepared by Pat Worley of Oak Ridge National Lab) of the use of the ParkBench Compact Applications submission form. This form (or something like it) is intended to be used by all persons wishing to submit an application to be included in the suite. The first page or so explains the submission procedure. Pat has been very thorough in filling out the form. I don't think it practical to expect every submission to be this detailed.

If you have applications that you would like to submit please go ahead and fill in the form. Also, any comments on the form would be appreciated. I hope to give the form wider distribution in a couple of weeks so we can (I hope) get a good number of submissions before the SC93 ParkBench meeting.

David

PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow the procedure below:

1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite.
2. If the ParkBench Compact Applications Subcommittee decides to consider your application further, you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email or ftp, unless the files are very large, in which case a tar file on a 1/4-inch cassette tape should be mailed. Wherever possible, email submission is preferred for all documents, in man page, LaTeX, and/or PostScript format. These files, documents, and papers together constitute your application package. Your application package should be sent to:

   David Walker
   Oak Ridge National Laboratory
   Bldg. 6012/MS-6367
   P. O. Box 2008
   Oak Ridge, TN 37831-6367
   (615) 574-7401/0680 (phone/fax)
   walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.

   The subcommittee will then make a final decision on whether to include your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite, you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program : PSTSWM (Parallel Spectral Transform Shallow Water Model)
-------------------------------------------------------------------------------
Submitter's Name        : Patrick H. Worley
Submitter's Organization: Oak Ridge National Laboratory
Submitter's Address     : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
Submitter's Telephone # : (615) 574-3128
Submitter's Fax #       : (615) 574-0680
Submitter's Email       : worley@msr.epm.ornl.gov
-------------------------------------------------------------------------------
Cognizant Expert(s) : Patrick H. Worley
CE's Organization   : Oak Ridge National Laboratory
CE's Address        : Bldg. 6012/MS-6367
                      P. O. Box 2008
                      Oak Ridge, TN 37831-6367
CE's Telephone #    : (615) 574-3128
CE's Fax #          : (615) 574-0680
CE's Email          : worley@msr.epm.ornl.gov

Cognizant Expert(s) : Ian T. Foster
CE's Organization   : Argonne National Laboratory
CE's Address        : MCS 221/D-235
                      9700 S. Cass Avenue
                      Argonne, IL 60439
CE's Telephone #    : (708) 252-4619
CE's Fax #          : (708) 252-5986
CE's Email          : itf@mcs.anl.gov
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Modulo other commitments, Worley is prepared to respond quickly to questions and bug reports, but expects to be kept informed as to the results of experiments and modifications to the code.
-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Climate Modeling
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations using the spectral transform method. The spectral transform algorithm of the code follows closely how CCM2, the NCAR Community Climate Model, handles the dynamical part of the primitive equations, and the parallel algorithms implemented in the model include those currently used in the message-passing parallel implementation of CCM2.
PSTSWM was written by Patrick Worley of Oak Ridge National Laboratory and Ian Foster of Argonne National Laboratory, and is based partly on previous parallel algorithm research by John Drake, David Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code development and the parallel algorithms research were funded by the DOE Computer Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. The features of version 1.0 were frozen on 8/1/93, and it is this version we would offer initially as a benchmark.

PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. STSWM evolved from a spectral shallow water model written by Hack (NCAR/CGD) to compare numerical schemes designed to solve the divergent barotropic equations in spherical geometry. STSWM was written partially to provide the reference solutions to the test cases proposed by Williamson et al. (see citation [4] below), which were chosen to test the ability of numerical methods to simulate important flow phenomena. These test cases are embedded in the code and are selectable at run-time via input parameters, specifying initial conditions, forcing, and analytic solutions (for error analysis). The solutions are also published in a Technical Note by Jakob et al. [3]. In addition, this code is meant to serve as an educational tool for numerical studies of the shallow water equations. A detailed description of the spectral transform method, and a derivation of the equations used in this software, can be found in the Technical Note by Hack and Jakob [2].

For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the correct communication and computation granularity for 3-D weather and climate codes), to increase modularity and support code reuse, and to allow the problem size to be selected at runtime without depending on dynamic memory allocation. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). The code has also been written in such a way that linking in optimized library functions for common operations instead of the "portable" code will be simple.
-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

Yes, but users are requested to acknowledge the authors (Worley and Foster) and the program that supported the development of the code (the DOE CHAMMP program) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. Finally, the code has been written to allow easy reuse of code in other applications, and for educational purposes. The authors encourage this, but also request that they be notified when pieces of the code are used.
-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION variables.
The code should work correctly for any system in which COMPLEX is represented as 2 REALs. The include file params.i has parameters that can be used to specify the length of these. Also, some REAL and DOUBLE PRECISION parameter values may need to be modified for floating point number systems with large mantissas, e.g., PI, TWOPI. PSTSWM is currently being used on systems where

   Integers : 4 bytes
   Floats   : 4 bytes

The use of two precisions can be eliminated, but at the cost of a significant loss of precision. (For 4-byte REALs, not using DOUBLE PRECISION increases the error by approximately three orders of magnitude.) DOUBLE PRECISION results are only used in set-up (computing Gauss weights and nodes and Legendre polynomial values), and are not used in the body of the computation.
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

The sequential code is documented in a file included in the distribution of the code from NCAR:

Jakob, Ruediger, Description of Software for the Spectral Transform Shallow Water Model Version 2.0, National Center for Atmospheric Research, Boulder, CO 80307-3000, August 1992

and in

Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992.

Documentation of the parallel code is in preparation, but extensive documentation is present in the code.
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of three numerical methods for solving differential equations on the sphere, Monthly Weather Review, 117:1058-1075, 1989.

2) Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992.

3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to shallow water test set using the spectral transform method, NCAR Technical Note TN-388+STR (in preparation).

4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.S. Swarztrauber, A standard test set for numerical approximations to the shallow water equations in spherical geometry, Journal of Computational Physics, Vol. 102, pp. 211-224, 1992.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method, Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), pp. 269-291.

6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral Transform Method. Part II, Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), pp. 509-531.

7) Foster, I. T. and P. H. Worley, Parallelizing the Spectral Transform Method: A Comparison of Alternative Parallel Algorithms, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March 22-24, 1993), pp. 100-107.

8) Foster, I. T. and P. H. Worley, Parallel Algorithms for the Spectral Transform Method (in preparation).

9) Worley, P. H. and I. T. Foster, PSTSWM: A Parallel Algorithm Testbed and Benchmark (in preparation).
-------------------------------------------------------------------------------
Other relevant research papers:
10) Foster, I., W. Gropp, and R. Stevens, The parallel scalability of the spectral transform method, Mon. Wea. Rev., 120(5), 1992, pp. 835-850.

11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes, R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. Worley, The Message-Passing Version of the Parallel Community Climate Model, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 500-513.

12) Sato, R. K. and R. D. Loft, Implementation of the NCAR CCM2 on the Connection Machine, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 371-393.

13) Barros, S. R. M. and T. Kauranne, On the Parallelization of Global Spectral Eulerian Shallow-Water Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 36-43.

14) Kauranne, T. and S. R. M. Barros, Scalability Estimates of Parallel Spectral Atmospheric Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 312-328.

15) Pelz, R. B. and W. F. Stern, A Balanced Parallel Algorithm for Parallel Processing, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March 22-24, 1993), pp. 126-128.
-------------------------------------------------------------------------------
Application available in the following languages (give message passing
system used, if applicable, and machines application runs on) :

The model code is primarily written in Fortran 77, but also uses DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in common and parameter declarations). It has been compiled and run on the Intel iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation, IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code).

Message passing is implemented using the PICL message passing system. All message passing is encapsulated in 3 high-level routines:

   BCAST0 (broadcast)
   GMIN0  (global minimum)
   GMAX0  (global maximum)

two classes of low-level routines:

   SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3
      (variants and/or pieces of the swap operation)

   SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3
      (variants and/or pieces of the send/recv operation)

and one synchronization primitive:

   CLOCKSYNC0

PICL instrumentation commands are also embedded in the code. Porting the code to another message passing library will be simple, although some of the runtime communication options may then become illegal. The PICL instrumentation calls can be stubbed out (or removed) without changing the functionality of the code, but some sort of synchronization is needed when timing short benchmark runs.
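As a minimal illustration of the kind of protocol issue the swap and send/recv options above expose, the following sketch exchanges a buffer with a partner processor, ordering the blocking send and receive by processor number so that the exchange cannot deadlock when sends are synchronous. SEND0 and RECV0 are hypothetical stand-ins for whatever message-passing library is actually used; they are not the PICL interface.

      SUBROUTINE SWAPEX(ME, PARTNR, SBUF, RBUF, N, MSGTAG)
C     Sketch only: swap N reals with processor PARTNR using
C     hypothetical blocking SEND0/RECV0 primitives.  Ordering the
C     calls by processor number avoids deadlock for synchronous sends.
      INTEGER ME, PARTNR, N, MSGTAG
      REAL SBUF(N), RBUF(N)
      IF (ME .LT. PARTNR) THEN
         CALL SEND0(SBUF, N, MSGTAG, PARTNR)
         CALL RECV0(RBUF, N, MSGTAG, PARTNR)
      ELSE
         CALL RECV0(RBUF, N, MSGTAG, PARTNR)
         CALL SEND0(SBUF, N, MSGTAG, PARTNR)
      ENDIF
      RETURN
      END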
-------------------------------------------------------------------------------
Total number of lines in source code: 28,204
Number of lines excluding comments : 12,434
Size in bytes of source code : 994,299
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
problem: 23 lines, 559 bytes, ascii
algorithm: 33 lines, 874 bytes, ascii
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
standard output: Number of lines and bytes is a function of the input specifications, but for benchmarking would normally be 63 lines (2000 bytes) of meaningful output. (On the Intel machine, FORTRAN STOP messages are sent from each processor at the end of the run, increasing this number.)
timings: Each run produces one line of output, containing approx. 150 bytes.
Both files are ascii.
-------------------------------------------------------------------------------
Brief, high-level description of what application does:
(P)STSWM solves the nonlinear shallow water equations on the sphere. The nonlinear shallow water equations constitute a simplified atmospheric-like fluid prediction model that exhibits many of the features of more complete models, and that has been used to investigate numerical methods and benchmark a number of machines. Each run of PSTSWM uses one of 6 embedded initial conditions and forcing functions. These cases were chosen to stress test numerical methods for this problem, and to represent important flows that develop in atmospheric modeling. STSWM also supports reading in arbitrary initial conditions, but this was removed from the parallel code to simplify the development of the initial implementation.
-------------------------------------------------------------------------------
Main algorithms used:
PSTSWM uses the spectral transform method to solve the shallow water equations. During each timestep, the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. The physical domain is a tensor product longitude-latitude grid. The spectral domain is the set of spectral coefficients in a spherical harmonic expansion of the state variables, and is normally characterized as a triangular array (using a "triangular" truncation of spectral coefficients). Transforming from physical coordinates to spectral coordinates involves performing a real FFT for each line of constant latitude, followed by integration over latitude using Gaussian quadrature (approximating the Legendre transform) to obtain the spectral coefficients. The inverse transformation involves evaluating sums of spectral harmonics and inverse real FFTs, analogous to the forward transform. Parallel algorithms are used to compute the FFTs and to compute the vector sums used to approximate the forward and inverse Legendre transforms. Two major alternatives are available for both transforms: distributed algorithms, which use a fixed data decomposition and compute results where they are assigned, and transpose algorithms, which remap the domains to allow the transforms to be calculated sequentially.
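As a rough serial illustration of the forward transform just described (this is not code from PSTSWM; the array layout and the RFFT routine are placeholders assumed for the example), a real FFT is applied along each line of constant latitude and the Legendre transform is approximated by a Gaussian quadrature over latitude against precomputed associated Legendre polynomial values:

C     Illustrative sketch only -- not PSTSWM code.  G holds gridpoint
C     values for one vertical level, F the Fourier coefficients
C     (real and imaginary parts), S the spectral coefficients, P the
C     associated Legendre polynomial values, and W the Gauss weights.
      SUBROUTINE FWDTRN (G, F, S, P, W, NLON, NLAT, MM, NN)
      INTEGER NLON, NLAT, MM, NN, J, M, N
      REAL G(NLON,NLAT), F(2,0:MM,NLAT), S(2,0:MM,0:NN)
      REAL P(0:MM,0:NN,NLAT), W(NLAT)
C     real FFT along each line of constant latitude (RFFT is a
C     placeholder for whatever FFT routine is actually linked in)
      DO 10 J = 1, NLAT
         CALL RFFT (G(1,J), F(1,0,J), NLON, MM)
   10 CONTINUE
C     Legendre transform approximated by Gaussian quadrature:
C     S(m,n) = sum over latitudes j of W(j)*P(m,n,j)*F(m,j)
      DO 40 N = 0, NN
         DO 30 M = 0, MIN(N,MM)
            S(1,M,N) = 0.0
            S(2,M,N) = 0.0
            DO 20 J = 1, NLAT
               S(1,M,N) = S(1,M,N) + W(J)*P(M,N,J)*F(1,M,J)
               S(2,M,N) = S(2,M,N) + W(J)*P(M,N,J)*F(2,M,J)
   20       CONTINUE
   30    CONTINUE
   40 CONTINUE
      RETURN
      END

In the parallel code, the per-latitude FFTs and the latitude sums are precisely the operations that are either computed in a distributed fashion or remapped by a transpose.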
These two alternatives for the two transforms translate to four major parallel algorithms:
a) distributed FFT/distributed Legendre transform (LT)
b) transpose FFT/distributed LT
c) distributed FFT/transpose LT
d) transpose FFT/transpose LT
Multiple implementations are supported for each type of algorithm, and the assignment of processors to transforms is also determined by input parameters. For example, input parameters specify a logical 2-D processor grid and define the data decomposition of the physical and spectral domains onto this grid. If 16 processors are used, these can be arranged as a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid. This specification determines how many processors are used to calculate each parallel FFT and how many are used to calculate each parallel LT.
-------------------------------------------------------------------------------
Skeleton sketch of application:
The main program calls INPUT to read problem and algorithm parameters and set up arrays for spectral transformations, and then calls INIT to set up the test case parameters. Routines ERRANL and NRGTCS are called once before the main timestepping loop for error normalization, once after the main timestepping for calculating energetics data and errors, and periodically during the timestepping, as requested. The prognostic fields are initialized using routine ANLYTC, which provides the analytic solution. Each call to STEP advances the computed fields by a timestep DT. Timing logic surrounds the timestepping loop, so the initialization phase is not timed. Also, a fake timestep is calculated before beginning timing to eliminate the first time "paging" effect currently seen on the Intel Paragon systems. STEP computes the first two time levels by two semi-implicit timesteps; normal time-stepping is by a centered leapfrog scheme. STEP calls COMP1, which chooses between an explicit numerical algorithm, a semi-implicit algorithm, and a simplified algorithm associated with solving the advection equation, one of the embedded test cases. The numerical algorithm used is an input parameter. The basic outline of each timestep is the following:
1) Evaluate non-linear product and forcing terms.
2) Fourier transform non-linear terms in place as a block transform.
3) Compute and update divergence, geopotential, and vorticity spectral coefficients. (Much of the calculation of the time update is "bundled" with the Legendre transform.)
4) Compute velocity fields and transform divergence, geopotential, and vorticity back to gridpoint space using a) an inverse Legendre transform and associated computations and b) an inverse real block FFT.
PSTSWM has "fictitious" vertical levels, and all computations are duplicated on the different levels, potentially significantly increasing the granularity of the computation. (The number of vertical levels is an input parameter.) For error analysis, a single vertical level is extracted and analyzed.
-------------------------------------------------------------------------------
Brief description of I/O behavior:
Processor 0 reads in the input parameters and broadcasts them to the rest of the processors. Processor 0 also receives the error analysis and timing results from the other processors and writes them out.
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
The processors are treated as a logical 2-D grid.
There are 3 domains to be distributed:
a) physical domain: tensor product longitude-latitude grid
b) Fourier domain: tensor product wavenumber-latitude grid
c) spectral domain: triangular array, where each column contains the spectral coefficients associated with a given wavenumber. The larger the wavenumber is, the shorter the column is.
An unordered FFT is used, and the Fourier and spectral domains use the "unordered" permutation when the data is being distributed.
I) distributed FFT/distributed LT
1) The tensor-product longitude-latitude grid is mapped onto the processor grid by assigning a block of contiguous longitudes to each processor column and by assigning one or two blocks of contiguous latitudes to each processor row. The vertical dimension is not distributed.
2) After the FFT, the subsequent wavenumber-latitude grid is similarly distributed over the processor grid, with a block of the permuted wavenumbers assigned to each processor column.
3) After the LT, the wavenumbers are distributed as before and the spectral coefficients associated with any given wavenumber are either distributed evenly over the processors in the column containing that wavenumber, or are duplicated over the column. What happens is a function of the particular distributed LT algorithm used.
II) transpose FFT/distributed LT
1) same as in (I)
2) Before the FFT, the physical domain is first remapped to a vertical layer-latitude decomposition, with a block of contiguous vertical layers assigned to each processor column and the longitude dimension not distributed. After the transform, the vertical level-latitude grid is distributed as before, and the wavenumber dimension is not distributed.
3) After the LT, the spectral coefficients for a given vertical layer are either distributed evenly over the processors in a column, or are duplicated over that column. What happens is a function of the particular distributed LT algorithm used.
III) distributed FFT/transpose LT
1) same as (I)
2) same as (I)
3) Before the LT, the wavenumber-latitude grid is first remapped to a wavenumber-vertical layer decomposition, with a block of contiguous vertical layers assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
IV) transpose FFT/transpose LT
1) same as (I)
2) same as (II)
3) Before the LT, the vertical level-latitude grid is first remapped to a vertical level-wavenumber decomposition, with a block of the permuted wavenumbers now assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
The distribution is a function of the problem size (longitude, latitude, vertical levels), the logical processor grid (PX, PY), and the algorithm (transpose vs. distributed for FFT and LT).
-------------------------------------------------------------------------------
Brief description of load balance behavior :
The load is fairly well balanced.
If PX and PY evenly divide the number of longitudes, latitudes, and vertical levels, then all load imbalances are due to the unequal distribution of spectral coefficients. As described above, the spectral coefficients are laid out as a triangular array in most runs, where each column corresponds to a different Fourier wavenumber. The wavenumbers are partitioned among the processors in most of the parallel algorithms. Since each column is a different length, a wrap mapping of the columns will approximately balance the load. Instead, the natural "unordered" ordering of the FFT is used with a block partitioning, which does a reasonable job of load balancing without any additional data movement. The load imbalance is quantified in Walker, et al [5]. If PX and PY do not evenly divide the dimensions of the physical domain, then other load imbalances may be as large as a factor of 2 in the worst case.
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
MM, NN, KK - specify the number of Fourier wavenumbers and the spectral truncation used. For a triangular truncation, MM = NN = KK.
NLON, NLAT, NVER - number of longitudes, latitudes, and vertical levels. There are required relationships between NLON, NLAT, and NVER, and between these and MM. These relationships are checked in the code. We will also provide a selection of input files that specify legal (and interesting) problems.
DT - timestep (in seconds). (Must be small enough to satisfy the Courant stability condition. Code warns if too large, but does not abort.)
TAUE - end of model run (in hours)
-------------------------------------------------------------------------------
Give memory as function of problem size :
Executable size is determined at compile time by setting the parameter COMPSZ in params.i. Per node memory requirements are approximately (in REALs)
associated Legendre polynomial values: MM*MM*NLAT/(PX*PY)
physical grid fields: 8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 3*MM*MM*NVER/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 3*MM*MM*NVER/PX
work space: 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX
where BUFS1 and BUFS2 are input parameters (number of communication buffers). BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY. In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory requirements are approximately:
(2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
or
(2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX)
-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :
for a serial run per timestep (very rough):
nonlinear terms: 10*NLON*NLAT*NVER
forward FFT: 40*NLON*NLAT*NVER*LOG2(NLON)
forward LT and time update: 48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
inverse LT and calculation of velocities: 20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
inverse FFT: 25*NLON*NLAT*NVER*LOG2(NLON)
Using standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1):
approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep.
For a total run, multiply by TAUE/DT.
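For convenience, the simplified estimates above can be collected into a small routine. The following sketch is illustrative only (it is not part of PSTSWM, and the routine and argument names are invented); it simply encodes the two formulas quoted above for the case in which the spectral coefficients are not duplicated:

C     Illustrative only -- encodes the simplified per-node memory
C     estimate (in REALs) and the per-timestep flop estimate for a
C     serial run, under the standard test case assumptions.
      SUBROUTINE ESTIM (MM, PX, PY, BUFS1, BUFS2, RMEM, FLOPS)
      INTEGER MM, PX, PY, BUFS1, BUFS2
      REAL RMEM, FLOPS, RM
      RM = REAL(MM)
C     (2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
      RMEM = (2.0 + 108.0*(1.0+BUFS1) + 3.0*(1.0+BUFS2))
     &       * RM**3 / (4.0*PX*PY)
C     460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4)
      FLOPS = 460.0*RM**3 + 348.0*RM**3*LOG(RM)/LOG(2.0)
     &        + 24.0*RM**4
      RETURN
      END

As a rough check, MM=42 gives about 2.5*10**8 flops per timestep, and a run with DT=2400.0 and TAUE=120.0 (180 timesteps) therefore needs roughly 4.5*10**10 flops, consistent with the T42 figure quoted below.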
-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :
This is a function of the algorithm chosen.
I) transpose FFT
a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
   2*(PX-1) steps, D volume
   or 2*LOG2(PX) steps, D*LOG2(PX) volume
II) distributed FFT
a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
   2*LOG2(PX) steps, D*LOG2(PX) volume
III) transpose LT
a) forward LT: let D = 8*NLON*NLAT*NVER/(PX*PY)
   2*(PY-1) steps, D volume
   or 2*LOG2(PY) steps, D*LOG2(PY) volume
b) inverse LT: let D = (3/2)*(MM**2)*NVER/(PX*PY)
   (PY-1) steps, D volume
   or LOG2(PY) steps, D*PY volume
IV) distributed LT
a) forward + inverse LT: let D = 3*(MM**2)*NVER/(PX*PY)
   2*(PY-1) steps, D*PY volume
   or 2*LOG2(PY) steps, D*PY volume
These are per timestep costs. Multiply by TAUE/DT for total communication overhead.
-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) :
Standard input files will be provided for
T21: MM=KK=NN=21, NLAT=32,  NLON=64,  NVER=8,  ICOND=2, DT=4800.0, TAUE=120.0
T42: MM=KK=NN=42, NLAT=64,  NLON=128, NVER=16, ICOND=2, DT=2400.0, TAUE=120.0
T85: MM=NN=KK=85, NLAT=128, NLON=256, NVER=32, ICOND=2, DT=1200.0, TAUE=120.0
These are 5 day runs of the "benchmark" case specified in Williamson, et al [4]. Flops and memory requirements for serial runs are as follows (approx.):
T21:     500,000 REALs        2,000,000,000 flops
T42:   4,000,000 REALs       45,000,000,000 flops
T85:  34,391,000 REALs    1,000,000,000,000 flops
Both memory and flops scale well, so, for example, the T42 run fits in approx. 4MB of memory for a 4 processor run. But different algorithms and different aspect ratios of the processor grid use different amounts of memory.
-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) :
Count by hand (looking primarily at inner loops, but eliminating common subexpressions that the compiler is expected to find).
-------------------------------------------------------------------------------
Other relevant information:
-------------------------------------------------------------------------------
From owner-pbwg-compactapp@CS.UTK.EDU Fri Oct 8 09:17:11 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA29750; Fri, 8 Oct 93 09:17:11 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA00426; Fri, 8 Oct 93 09:16:23 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Fri, 8 Oct 1993 09:16:22 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA00418; Fri, 8 Oct 93 09:16:20 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA20027; Fri, 8 Oct 1993 09:16:19 -0400 Message-Id: <9310081316.AA20027@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: Compact applications chapter Date: Fri, 08 Oct 93 09:16:19 -0500 From: David W. Walker
I just sent the following to Mike Berry, but some of you might also like to make suggestions.
David

Mike,
I am a bit at a loss as to what to put into the ParkBench report for Compact Applications since we haven't had any codes submitted (except for maybe 2 or 3). It seems to me that we can't really say much without the codes, apart from very general requirements.

David

From owner-pbwg-compactapp@CS.UTK.EDU Fri Oct 8 10:17:35 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA00610; Fri, 8 Oct 93 10:17:35 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA06069; Fri, 8 Oct 93 10:17:05 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Fri, 8 Oct 1993 10:17:03 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from haven.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA06059; Fri, 8 Oct 93 10:17:02 -0400 Received: by haven.EPM.ORNL.GOV (4.1/1.34) id AA15407; Fri, 8 Oct 93 10:16:56 EDT Date: Fri, 8 Oct 93 10:16:56 EDT From: worley@haven.EPM.ORNL.GOV (Pat Worley) Message-Id: <9310081416.AA15407@haven.EPM.ORNL.GOV> To: walker@rios2.epm.ornl.gov, pbwg-compactapp@cs.utk.edu Subject: Re: Compact applications chapter In-Reply-To: Mail from 'David W. Walker ' dated: Fri, 08 Oct 93 09:16:19 -0500 Cc: worley@haven.EPM.ORNL.GOV
>I just sent the following to Mike Berry, but some of you might also like to make
>suggestions.
>
>David
>
>Mike,
>>I am a bit at a loss as to what to put into the ParkBench report
>for Compact Applications since we haven't had any codes submitted (except
>for maybe 2 or 3). It seems to me that we can't really say much without
>the codes, apart from very general requirements.
>
>David
Since I imagine that there will always be a dearth of (good) compact applications, a requirements document (or, at least, a wish list) would be a useful contribution, particularly if the wishlist were prioritized by what is most important for the code to have, e.g.,
1) scientific relevance (does anyone care about this type of problem)
2) numerical relevance (are the numerical algorithms representative or interesting)
3) algorithmic relevance (are the parallel algorithms representative or interesting)
4) portability (language, parallel programming model, etc.)
5) runability (easy to run, easy to validate results, easy to use for benchmarking)
6) ...
This can probably be broken into requirements and desirable features.

Pat

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 14 13:38:54 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA16662; Thu, 14 Oct 93 13:38:54 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA04580; Thu, 14 Oct 93 13:37:31 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 14 Oct 1993 13:37:29 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA04571; Thu, 14 Oct 93 13:37:28 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA19646; Thu, 14 Oct 1993 13:37:27 -0400 Date: Thu, 14 Oct 1993 13:37:27 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310141737.AA19646@rios2.epm.ornl.gov> To: berry@cs.utk.edu Subject: ParkBench compact applications Cc: pbwg-compactapp@cs.utk.edu
Mike,
Below is the latest version of the Compact Applications section of the ParkBench document. I also intend to send a latex version of the submission form to you later today for inclusion as Appendix A.
I hope there will be some comments back from the other members of the subcommittee about this section, and that there will be an opportunity to update it.

David

%file: compac3.tex
%date: October 14, 1993
\chapter{Compact Applications}
\footnote{assembled by David Walker for Compact Applications subcommittee}
\section{Introduction}
\label{sec:compact.intro}
While kernel applications, such as those described in Chapter 3, provide a fairly straightforward way of assessing the performance of parallel systems, they are not representative of scientific applications in general since they do not reflect certain types of system behavior. In particular, many scientific applications involve data movement between phases of an application, and may also require significant amounts of I/O. These types of behavior are difficult to gauge using kernel applications. One factor that has hindered the use of full application codes for benchmarking parallel computers in the past is that such codes are difficult to parallelize and to port between target architectures. In addition, full application codes that have been successfully parallelized are often proprietary, and/or subject to distribution restrictions. To minimize the negative impact of these factors we propose to make use of compact applications in our benchmarking effort. Compact applications are typical of those found in research environments (as opposed to production or engineering environments), and usually consist of up to a few thousand lines of source code. Compact applications are distinct from kernel applications since they are capable of producing scientifically useful results. In many cases, compact applications are made up of several kernels, interspersed with data movements and I/O operations between the kernels. In this chapter the criteria for selecting compact applications for the ParkBench suite will be discussed. In addition, the general research areas that will be represented in the suite are outlined.
%In this chapter we will discuss a number of compact applications in terms of
%their purpose, the algorithms used, the types of data movements required,
%the memory requirements, and
%the amount of I/O. The compact application below are not meant to form a
%definite or complete list.
\section{Criteria for Selection}
\label{sec:criteria}
The three main criteria for inclusion of a parallel code in the Compact Applications suite are:
\begin{enumerate}
\item The code must be a complete application and be capable of producing results of research interest. These two points distinguish a compact application from a kernel. For example, a code that only solves a randomly-generated, dense, linear system by LU factorization should be considered a kernel. Even though the code is complete, it does not produce results of research interest. However, if the LU factorization is embedded in an application that uses the boundary element method to solve, for example, a two-dimensional elastodynamics problem, then such an application could legitimately be considered a compact application. Compact applications and full production codes are distinguished by their software complexity, which is difficult to quantify. Software complexity gives an indication of how hard it is to write, port and maintain an application, and may be gauged very roughly by the length of the source code. However, there is no hard upper limit on the length of a code in the Compact Applications suite.
It is expected that the source code (excluding comments and repeated common blocks) for most compact applications will be between 2000 and 10000 lines, but some may be longer.
\item The code must be of high quality. This means it must have been extensively tested and validated, preferably on a wide selection of different parallel architectures. The problem size and number of processors used must not be hard-coded into the application, and should be specified at runtime as input to the program. Ideally, the parallel code should not impose restrictions on the problem size that are not applicable for the corresponding sequential code. Thus, the parallel code should not require that the problem size be exactly divisible by the number of processors, or that the number of processors be a power of two. In some cases this latter requirement may have to be relaxed. For example, most parallel fast Fourier transform routines require the number of processors to be a power of two. It is preferable that the code be written so that it works correctly for an arbitrary one-to-one mapping between the logical process topology of the application and the hardware topology of the parallel computer. This is desirable so that the assignment of a location in the logical process topology to a physical processor can be easily adjusted when porting the application between platforms. For example, a Gray code assignment may be best for a hypercube, and a natural ordering for a mesh architecture.
\item The application must be well documented. The source code itself should contain an adequate number of comments, and each module should begin with a comment section that describes what the routine does, and the arguments passed to it. In addition, there should be a ``Users' Guide'' to the application that describes the input and output, the parameterization of the problem size and processor layout, and details of what the application does. The Users' Guide should also contain a bibliography of related papers.
\end{enumerate}
In addition to the three criteria discussed above, there are a number of other desirable features that a ParkBench Compact Application should have. These are discussed in the following subsections.
\subsection{Self Checking Applications}
\label{subsec:checking}
The application should be self-checking. That is, at the end of the computation the application should perform a check to validate the results of the run. The application may also output a summary of performance results for the run, such as the Mflop rate, and other pertinent information.
\subsection{Programming Languages}
\label{subsec:languages}
The code should be written in Fortran 77, Fortran 90, High Performance Fortran, or C. Data should be passed between processors by explicit message passing. ParkBench does not specify which message passing system should be used, but one that is available on a number of parallel platforms is preferable. Eventually it is expected that MPI will become the message passing system of choice, but in the meantime portable systems such as PVM, PICL, Express, PARMACS, and P4 are acceptable alternatives. The codes in the Compact Applications suite should not contain any assembly coded portions, although assembly code may be used in optimized versions of the code.
\section{Proposed Compact Application Benchmarks}
\label{sec:compact.proposed}
At the time of writing (October 1993) the ParkBench organization is in the process of soliciting submission of applications for inclusion in the Compact Applications suite.
Thus, the applications that comprise the suite cannot yet be listed here. However, in this section the main application areas that are expected to be in the suite are outlined. The intention is that these areas should be representative of the fields in which parallel computers are actually used. The codes should exercise a number of different algorithms, and possess different communication and I/O characteristics. Initially the Compact Applications suite will consist of no more than ten codes. This restriction is imposed so that the resources needed to manage and distribute the suite can be assessed. The suite may be enlarged in the future if this seems manageable. Below is a list of the application areas that are expected to be represented in the suite. This is not meant to be an exclusive list; submissions from other application areas will be considered for inclusion in the suite.
\begin{itemize}
\item Climate and meteorological modeling
\item Computational fluid dynamics (CFD)
\item Finance, e.g., portfolio optimization
\item Molecular dynamics
\item Plasma physics
\item Quantum chemistry
\item Quantum chromodynamics (QCD)
\item Reservoir modeling
\end{itemize}
\section{Submitting to the Compact Applications Suite}
\label{sec:submit}
The procedure for submitting codes to the ParkBench Compact Applications suite is as follows.
\begin{enumerate}
\item Complete the submission form in Appendix A, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and the submitter will be notified if the application is to be considered further for inclusion in the ParkBench suite.
\item If the ParkBench Compact Applications Subcommittee decides to consider the application further, the submitter will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file should be sent on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files, documents, and papers together constitute the application package. The application package should be sent to the following address, and the subcommittee will then make a final decision on whether to include the application in the ParkBench suite.\par
\smallskip
\indent David W. Walker\par
\indent Oak Ridge National Laboratory\par
\indent Bldg.~6012/MS-6367\par
\indent P. O. Box 2008\par
\indent Oak Ridge, TN 37831-6367\par
\indent (615) 574-7401/0680 (phone/fax)\par
\indent walker@msr.epm.ornl.gov\par
\item If the application is approved for inclusion in the ParkBench suite, an authorized person from the submitting organization will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), the application package.
\end{enumerate}
From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:51:57 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11600; Thu, 28 Oct 93 08:51:57 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07295; Thu, 28 Oct 93 08:51:33 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:51:32 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07287; Thu, 28 Oct 93 08:51:31 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA13437; Thu, 28 Oct 1993 08:51:41 -0400 Date: Thu, 28 Oct 1993 08:51:41 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281251.AA13437@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: Compact Appl. Submissions
So far I've received 3 submissions for the ParkBench Compact Applications suite. I'm sending you the completed forms in 3 separate email messages.

David

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:52:38 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11616; Thu, 28 Oct 93 08:52:38 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07341; Thu, 28 Oct 93 08:52:14 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:52:13 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07333; Thu, 28 Oct 93 08:52:11 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA11913; Thu, 28 Oct 1993 08:52:21 -0400 Date: Thu, 28 Oct 1993 08:52:21 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281252.AA11913@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: POLMP Compact Application
-------------------------------------------------------------------------------
Name of Program : POLMP (Proudman Oceanographic Laboratory Multiprocessing Program)
-------------------------------------------------------------------------------
Submitter's Name : Mike Ashworth
Submitter's Organization: NERC Computer Services
Submitter's Address : Bidston Observatory, Birkenhead, L43 7RA, UK
Submitter's Telephone # : +44-51-653-8633
Submitter's Fax # : +44-51-653-6269
Submitter's Email : mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Cognizant Expert : Mike Ashworth
CE's Organization : NERC Computer Services
CE's Address : Bidston Observatory, Birkenhead, L43 7RA, UK
CE's Telephone # : +44-51-653-8633
CE's Fax # : +44-51-653-6269
CE's Email : mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench :
Bearing in mind other commitments, Mike Ashworth is prepared to respond quickly to questions and bug reports, and expects to be kept informed as to results of experiments and modifications to the code.
-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Ocean and Shallow Sea Modeling
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :
The POLMP project was created to develop numerical algorithms for shallow sea 3D hydrodynamic models that run efficiently on modern parallel computers. A code was developed, using a set of portable programming conventions based upon standard Fortran 77, which follows the wind induced flow in a closed rectangular basin including a number of arbitrary land areas. The model solves a set of hydrodynamic partial differential equations, subject to a set of initial conditions, using a mixed explicit/implicit forward time integration scheme. The explicit component corresponds to a horizontal finite difference scheme and the implicit to a functional expansion in the vertical (Davies, Grzonka and Stephens, 1989).

By the end of 1989 the code had been implemented on the RAL 4 processor Cray X-MP using Cray's microtasking system, which provides parallel processing at the level of the Fortran DO loop. Acceptable parallel performance was achieved by integrating each of the vertical modes in parallel, referred to in Ashworth and Davies (1992) as vertical partitioning. In particular, a speed-up of 3.15 over single processor execution was obtained, with an execution rate of 548 Megaflops corresponding to 58 per cent of the peak theoretical performance of the machine. Execution on an 8 processor Cray Y-MP gave a speed-up efficiency of 7.9 and 1768 Megaflops or 67 per cent of the peak (Davies, Proctor and O'Neill, 1991). The latter resulted in Davies and Grzonka being awarded a prize in the 1990 Cray Gigaflop Performance Awards.

The project has been extended by implementing the shallow sea model in a form which is more appropriate to a variety of parallel architectures, especially distributed memory machines, and to a larger number of processors. It is especially desirable to be able to compare shared memory parallel architectures with distributed memory architectures. Such a comparison is currently relevant to NERC science generally and will be a factor in the considerations for the purchase of new machines, bids for allocations on other academic machines, and for the design of new codes and the restructuring of existing codes.

In order to simplify development of the new code and to ensure a proper comparison between machines, a restructured version of the Davies and Grzonka rectangle code was designed to perform partitioning of the region in the horizontal dimension. This has the advantage over vertical partitioning that the communication between processors is limited to a few points at the boundaries of each sub-domain. The ratio of interior points to boundary points, which determines the ratio of computation to communication and hence the efficiency on message-passing, distributed-memory machines, may be increased by increasing the size of the individual sub-domains. This design may also improve the efficiency on shared memory machines by reducing the time of the critical section and reducing memory conflicts between processors. In addition, the required number of vertical modes is only about 16, which, though well suited to a 4 or 8 processor machine, does not contain sufficient parallelism for more highly parallel machines.
The code has been designed with portability in mind, so that essentially the same code may be run on parallel computers with a range of architectures.
-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :
Yes, but users are requested to acknowledge the authors (Ashworth and Davies) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code.
-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be used in this application:
Some 8 byte floating point numbers are used in some of the initialization code, but calculations on the main field arrays may be done using 4 byte floating point variables without grossly affecting the solution. Nevertheless, precision conversion is facilitated by a switch supplied to the C preprocessor. By specifying -DSINGLE, variables will be declared as REAL, normally 4 bytes, whereas -DDOUBLE will cause declarations to be DOUBLE PRECISION, normally 8 bytes.
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module level, or lower) :
The README file supplied with the code describes how the various versions of the code should be built. Extensive documentation, including the definition of all variables in COMMON, is present as comments in the code.
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :
1) Davies, A.M., Formulation of a linear three-dimensional hydrodynamic sea model using a Galerkin-eigenfunction method, Int. J. Num. Meth. in Fluids, 1983, Vol. 3, 33-60.
2) Davies, A.M., Solution of the 3D linear hydrodynamic equations using an enhanced eigenfunction approach, Int. J. Num. Meth. in Fluids, 1991, Vol. 13, 235-250.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :
1) Ashworth, M. and Davies, A.M., Restructuring three-dimensional hydrodynamic models for computers with low and high degrees of parallelism, in Parallel Computing '91, eds D.J.Evans, G.R.Joubert and H.Liddell (North Holland, 1992), 553-560.
2) Ashworth, M., Parallel Processing in Environmental Modelling, in Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993.
3) Ashworth, M. and Davies, A.M., Performance of a Three Dimensional Hydrodynamic Model on a Range of Parallel Computers, in Proceedings of the Euromicro Workshop on Parallel and Distributed Computing, Gran Canaria 27-29 January 1993, pp 383-390, (IEEE Computer Society Press)
4) Davies, A.M., Ashworth, M., Lawrence, J., O'Neill, M., Implementation of three dimensional shallow sea models on vector and parallel computers, 1992a, CFD News, Vol. 3, No. 1, 18-30.
5) Davies, A.M., Grzonka, R.G. and Stephens, C.V., The implementation of hydrodynamic numerical sea models on the Cray X-MP, 1992b, in Advances in Parallel Computing, Vol. 2, edited D.J. Evans.
6) Davies, A.M., Proctor, R. and O'Neill, M., "Shallow Sea Hydrodynamic Models in Environmental Science", Cray Channels, Winter 1991.
-------------------------------------------------------------------------------
Other relevant research papers:
-------------------------------------------------------------------------------
Application available in the following languages (give message passing system used, if applicable, and machines application runs on) :
Code is initially passed through the C preprocessor, allowing a number of versions with different programming styles, precisions and machine dependencies to be generated.

Fortran 77 version
A sequential version of POLMP is available, which conforms to the Fortran 77 standard. This version has been run on a large number of machines from workstations to supercomputers and any code which caused problems, even if it conformed to the standard, has been changed or removed. Thus its conformance to the Fortran 77 standard is well established. In order to allow the code to run on a wide range of problem sizes without recompilation, the major arrays are defined dynamically by setting up pointers, with names starting with IX, which point to locations in a single large data array: SA. Most pointers are allocated in subroutine MODSUB and the starting location passed down into subroutines in which they are declared as arrays. For example :

      IX1 = 1
      IX2 = IX1 + N*M
      CALL SUB ( SA(IX1), SA(IX2), N, M )

      SUBROUTINE SUB ( A1, A2, N, M )
      DIMENSION A1(N,M), A2(N,M)
      END

Although this is probably against the spirit of the Fortran 77 standard, it is considered the best compromise between portability and utility, and has caused no problems on any of the machines on which it has been tried. The code has been run on a number of traditional vector supercomputers, mainframes and workstations. In addition, key loops can be parallelized automatically by some compilers on shared (or virtual shared) memory MIMD machines, allowing parallel execution on the Convex C2 and C3, Cray X-MP, Y-MP, and Y-MP/C90, and Kendall Square Research KSR-1. Cray macrotasking calls may also be enabled for an alternative mode of parallel execution on Cray multiprocessors.

Message passing version
POLMP has been implemented on a number of message-passing machines: Intel iPSC/2 and iPSC/860, Meiko CS-1 i860 and CS-2, and nCUBE 2. Code is also present for the PVM and Parmacs portable message passing systems, and POLMP has run successfully, though not efficiently, on a network of Silicon Graphics workstations. Calls to message passing routines are concentrated in a small number of routines for ease of portability and maintenance. POLMP performs housekeeping tasks on one node of the parallel machine, usually node zero, referred to in the code as the driver process, the remaining processes being workers. For Parmacs version 5, which requires a host program, a simple host program has been provided which loads the node program onto a two dimensional torus and then takes no further part in the run, other than to receive a completion code from the driver, in case terminating the host early would interfere with execution of the nodes.

Data parallel versions
A data parallel version of the code has been run on the Thinking Machines CM-2, CM-200 and MasPar MP-1 machines. High Performance Fortran (HPF) defines extensions to the Fortran 90 language in order to provide support for parallel execution on a wide variety of machines using a data parallel programming model.
The subset-HPF version of the POLMP code has been written to the draft standard specified by the High Performance Fortran Forum in the HPF Language Specification version 0.4 dated November 6, 1992. Fortran 90 code was developed on a Thinking Machines CM-200 machine and checked for conformance with the Fortran 90 standard using the NAGWare Fortran 90 compiler. HPF directives were inserted by translating from the CM Fortran directives, but have not been tested due to the lack of access to an HPF compiler. The only HPF features used are the PROCESSORS, TEMPLATE, ALIGN and DISTRIBUTE directives and the system inquiry intrinsic function NUMBER_OF_PROCESSORS.
-------------------------------------------------------------------------------
Total number of lines in source code: 26,699
Number of lines excluding comments : 11,313
Size in bytes of source code : 756,107
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
steering file: 13 lines, 250 bytes, ascii (typical size)
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
standard output: 700 lines, 62,000 bytes, ascii (typical size)
-------------------------------------------------------------------------------
Brief, high-level description of what application does:
POLMP solves the linear three-dimensional hydrodynamic equations for the wind induced flow in a closed rectangular basin of constant depth which may include an arbitrary number of land areas.
-------------------------------------------------------------------------------
Main algorithms used:
The discretized form of the hydrodynamic equations is solved for the field variables z (surface elevation) and u and v (horizontal components of velocity). The fields are represented in the horizontal by a staggered finite difference grid. The profile of vertical velocity with depth is represented by the superposition of a number of spectral components. The functions used in the vertical are arbitrary, although the computational advantages of using eigenfunctions (modes) of the eddy viscosity profile have been demonstrated (Davies, 1983). Velocities at the closed boundaries are set to zero. Each timestep in the forward time integration of the model involves successive updates to the three fields, z, u and v. New field values computed in each update are used in the subsequent calculations. A five point finite difference stencil is used, requiring only nearest neighbours on the grid. A number of different data storage and data processing methods are included, mainly for handling cases with significant amounts of land, e.g. index array, packed data. In particular, the program may be switched between masked operation, more suitable for vector processors, in which computation is done on all points, but land and boundary points are masked out, and strip-mining, more suitable for scalar and RISC processors, in which calculations are only done for sea points.
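To illustrate the distinction between masked and strip-mined operation, the following sketch shows a generic five-point-stencil update written both ways. This is not code from POLMP; the array names, the NSEA/ISEA/JSEA index lists, and the update formula itself are placeholders chosen only to show the two loop structures.

C     Illustrative only -- a generic five-point stencil update, not
C     the POLMP equations.  AMASK is 1.0 at sea points and 0.0 at
C     land and boundary points.
      SUBROUTINE MASKUP (FNEW, FOLD, AMASK, NX, NY, C)
      INTEGER NX, NY, I, J
      REAL FNEW(NX,NY), FOLD(NX,NY), AMASK(NX,NY), C
      DO 20 J = 2, NY-1
         DO 10 I = 2, NX-1
            FNEW(I,J) = AMASK(I,J)*( FOLD(I,J)
     &         + C*( FOLD(I+1,J) + FOLD(I-1,J)
     &             + FOLD(I,J+1) + FOLD(I,J-1) - 4.0*FOLD(I,J) ) )
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

C     Strip-mined form: the same update applied only to the NSEA sea
C     points, whose (I,J) locations are held in the index lists.
      SUBROUTINE STRPUP (FNEW, FOLD, ISEA, JSEA, NSEA, NX, NY, C)
      INTEGER NX, NY, NSEA, ISEA(NSEA), JSEA(NSEA), I, J, K
      REAL FNEW(NX,NY), FOLD(NX,NY), C
      DO 30 K = 1, NSEA
         I = ISEA(K)
         J = JSEA(K)
         FNEW(I,J) = FOLD(I,J)
     &      + C*( FOLD(I+1,J) + FOLD(I-1,J)
     &          + FOLD(I,J+1) + FOLD(I,J-1) - 4.0*FOLD(I,J) )
   30 CONTINUE
      RETURN
      END

The masked loops vectorize cleanly over the full grid, while the indexed form avoids wasted work on land points on scalar and RISC processors, which is the trade-off described above.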
-------------------------------------------------------------------------------
Skeleton sketch of application:
The call chart of the major subroutines is represented thus:
AAAPOL -> APOLMP -> INIT -> RUNPOL -> INIT2 -> MAP -> DIVIDE -> PRMAP -> GENSTP -> SPEC -> ROOTS -> TRANS -> SNDWRK -> RCVWRK -> SETUP -> MODSUB -> MODEL -> ASSIGN -> GENMSK -> GENSTP -> GENIND -> GENPAC -> METRIC -> CLRFLD -> TIME* -> SNDBND -> RCVBND -> RESULT -> SNDRES -> RCVRES -> MODOUT -> OZUVW -> OUTFLD -> GETRES -> OUTARR -> GRYARR -> WSTATE
AAAPOL is a dummy main program calling APOLMP. APOLMP calls INIT which reads parameters from the steering file, checks and monitors them. RUNPOL is then called, which calls another initialization routine, INIT2. Called from INIT2, MAP forms a map of the domain to be modelled, DIVIDE divides the domain between processors, PRMAP maps sub-domains onto processors, GENSTP counts indexes for strip-mining and SPEC, ROOTS and TRANS set up the coefficients for the spectral expansion. SNDWRK on the driver process sends details of the sub-domain to be worked on to each worker. RCVWRK receives that information. SETUP does some array allocation and MODSUB does the main allocation of array space to the field and ancillary arrays. MODEL is the main driver subroutine for the model. ASSIGN calls routines to generate masks, strip-mining indexes, packing indexes and measurement metrics. CLRFLD initializes the main data arrays. Then one of seven time-stepping routines, TIME*, is chosen, depending on the vectorization and packing/indexing method used to cope with the presence of land. SNDBND and RCVBND handle the sending and reception of boundary data between sub-domains. After the required number of time-steps is complete, RESULT saves results from the desired region, and SNDRES on the workers and RCVRES on the driver collect the result data. MODOUT handles the writing of model output to standard output and disk files, as required. For a non-trivial run, 99% of time is spent in whichever of the timestepping routines, TIME*, has been chosen.
-------------------------------------------------------------------------------
Brief description of I/O behavior:
The driver process, usually processor 0, reads in the input parameters and broadcasts them to the rest of the processors. The driver also receives the results from the other processors and writes them out.
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
The processors are treated as a logical 2-D grid. The simulation domain is divided into a number of sub-domains, which are allocated one sub-domain per processor.
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
The number of processors, p, and the number of sub-domains are provided as steering parameters, as is a switch which requests either one-dimensional or two-dimensional partitioning. Partitioning is only actually carried out for the message passing versions of the code. For two-dimensional partitioning, p is factored into px and py where px and py are as close as possible to sqrt(p). For the data parallel version the number of sub-domains is set to one and decomposition is performed by the compiler via data distribution directives.
-------------------------------------------------------------------------------
Brief description of load balance behavior :
Unless land areas are specified, the load is fairly well balanced.
If px and py evenly divide the number of grid points, then the model is perfectly balanced except that boundary sub-domains have fewer communications. No tests with land areas have yet been performed with the parallel code, and more sophisticated domain decomposition algorithms have not yet been included.
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
nx, ny   Size of horizontal grid
m        Number of vertical modes
nts      Number of timesteps to be performed
-------------------------------------------------------------------------------
Give memory as function of problem size :
See below for specific examples.
-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :
Assuming standard compiler optimizations, there is a requirement for 29 floating point operations (18 add/subtracts and 11 multiplies) per grid point, so the total computational load is
29 * nx * ny * m * nts
-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :
During each timestep each sub-domain of size nsubx=nx/px by nsuby=ny/py requires the following communications in words :
nsubx * m  from N
nsubx      from S
nsubx * m  from S
nsuby * m  from W
nsuby      from E
nsuby * m  from E
m          from NE
m          from SW
making a total of (2 * m + 1)*(nsubx + nsuby) + 2*m words in eight messages from six directions.
-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) :
The data sizes and computational requirements for the various problems supplied are :
Name    nx x ny x m x nts        Computational Load (Gflop)   Memory (Mword)
dbg     10 x 10 x 1 x 2          Small debugging test case
dbg2d   10 x 10 x 1 x 2          Small debugging test case for a 2 x 2 decomposition
v200    512 x 512 x 16 x 200     24                           14
wa200   1024 x 1024 x 40 x 200   226                          126
xb200   2048 x 2048 x 80 x 200   1812                         984
The memory sizes are the number of Fortran real elements (words) required for the strip-mined case on a single processor. For the masked case the memory requirement is approximately doubled for the extra mask arrays. For the message passing versions, the total memory requirement will also tend to increase slightly (<10%) with the number of processors employed.
-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) :
Count by hand looking at inner loops and making reasonable assumptions about common compiler optimizations.
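As a quick consistency check on the table above (the agreement is only approximate, since the operation count itself is approximate), the v200 case gives
29 * 512 * 512 * 16 * 200 = 2.4 x 10**10 floating point operations,
i.e. roughly the 24 Gflop quoted.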
-------------------------------------------------------------------------------
Other relevant information:
-------------------------------------------------------------------------------
--
Dr Mike Ashworth, NERC Supercomputing Consultant
NERC Computer Services, Bidston Observatory, BIRKENHEAD L43 7RA, United Kingdom
Tel: +44 51 653 8633   Fax: +44 51 653 6269
email: mia@ua.nbi.ac.uk   alternative: M.Ashworth@ncs.nerc.ac.uk

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:52:55 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11653; Thu, 28 Oct 93 08:52:55 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07365; Thu, 28 Oct 93 08:52:35 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:52:34 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07357; Thu, 28 Oct 93 08:52:32 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA16524; Thu, 28 Oct 1993 08:52:41 -0400 Date: Thu, 28 Oct 1993 08:52:41 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281252.AA16524@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: PSTSWM Compact Application Received: from msr.EPM.ORNL.GOV by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA20602; Tue, 5 Oct 1993 09:58:22 -0400 Received: from haven.EPM.ORNL.GOV by msr.epm.ornl.gov (4.1/1.34) id AA09050; Tue, 5 Oct 93 09:58:21 EDT Received: by haven.EPM.ORNL.GOV (4.1/1.34) id AA13369; Tue, 5 Oct 93 09:58:14 EDT Date: Tue, 5 Oct 93 09:58:14 EDT From: worley@haven.epm.ornl.gov (Pat Worley) Message-Id: <9310051358.AA13369@haven.EPM.ORNL.GOV> To: walker@msr.epm.ornl.gov
PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM
To submit a compact application to the ParkBench suite you must follow this procedure:
1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite.
2. If the ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file should be sent on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files, documents, and papers together constitute your application package. Your application package should be sent to:
David Walker
Oak Ridge National Laboratory
Bldg. 6012/MS-6367
P. O. Box 2008
Oak Ridge, TN 37831-6367
(615) 574-7401/0680 (phone/fax)
walker@msr.epm.ornl.gov
The street address is "Bethel Valley Road" if Fedex insists on this. The subcommittee will then make a final decision on whether to include your application in the ParkBench suite.
3.
If your application is approved for inclusion in the ParkBench suite you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package.
-------------------------------------------------------------------------------
Name of Program : PSTSWM (Parallel Spectral Transform Shallow Water Model)
-------------------------------------------------------------------------------
Submitter's Name : Patrick H. Worley
Submitter's Organization: Oak Ridge National Laboratory
Submitter's Address : Bldg. 6012/MS-6367, P. O. Box 2008, Oak Ridge, TN 37831-6367
Submitter's Telephone # : (615) 574-3128
Submitter's Fax # : (615) 574-0680
Submitter's Email : worley@msr.epm.ornl.gov
-------------------------------------------------------------------------------
Cognizant Expert(s) : Patrick H. Worley
CE's Organization : Oak Ridge National Laboratory
CE's Address : Bldg. 6012/MS-6367, P. O. Box 2008, Oak Ridge, TN 37831-6367
CE's Telephone # : (615) 574-3128
CE's Fax # : (615) 574-0680
CE's Email : worley@msr.epm.ornl.gov
Cognizant Expert(s) : Ian T. Foster
CE's Organization : Argonne National Laboratory
CE's Address : MCS 221/D-235, 9700 S. Cass Avenue, Argonne, IL 60439
CE's Telephone # : (708) 252-4619
CE's Fax # : (708) 252-5986
CE's Email : itf@mcs.anl.gov
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench :
Modulo other commitments, Worley is prepared to respond quickly to questions and bug reports, but expects to be kept informed as to results of experiments and modifications to the code.
-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Climate Modeling
-------------------------------------------------------------------------------
Application "pedigree" :
PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations using the spectral transform method. The spectral transform algorithm of the code follows closely how CCM2, the NCAR Community Climate Model, handles the dynamical part of the primitive equations, and the parallel algorithms implemented in the model include those currently used in the message-passing parallel implementation of CCM2. PSTSWM was written by Patrick Worley of Oak Ridge National Laboratory and Ian Foster of Argonne National Laboratory, and is based partly on previous parallel algorithm research by John Drake, David Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code development and parallel algorithms research were funded by the DOE Computer Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. The features of version 1.0 were frozen on 8/1/93, and it is this version we would offer initially as a benchmark. PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. STSWM evolved from a spectral shallow water model written by Hack (NCAR/CGD) to compare numerical schemes designed to solve the divergent barotropic equations in spherical geometry. STSWM was written partially to provide the reference solutions to the test cases proposed by Williamson et al.
(see citation [4] below), which were chosen to test the ability of numerical methods to simulate important flow phenomena. These test cases are embedded in the code and are selectable at run-time via input parameters specifying initial conditions, forcing, and analytic solutions (for error analysis). The solutions are also published in a Technical Note by Jakob et al. [3]. In addition, this code is meant to serve as an educational tool for numerical studies of the shallow water equations. A detailed description of the spectral transform method, and a derivation of the equations used in this software, can be found in the Technical Note by Hack and Jakob [2]. For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the correct communication and computation granularity for 3-D weather and climate codes), to increase modularity and support code reuse, and to allow the problem size to be selected at runtime without depending on dynamic memory allocation. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). The code has also been written in such a way that linking in optimized library functions for common operations instead of the "portable" code will be simple. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : Yes, but users are requested to acknowledge the authors (Worley and Foster) and the program that supported the development of the code (the DOE CHAMMP program) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. Finally, the code has been written to allow easy reuse of code in other applications, and for educational purposes. The authors encourage this, but also request that they be notified when pieces of the code are used. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION variables. The code should work correctly for any system in which COMPLEX is represented as 2 REALs. The include file params.i has parameters that can be used to specify the length of these. Also, some REAL and DOUBLE PRECISION parameter values may need to be modified for floating point number systems with large mantissas, e.g., PI, TWOPI. PSTSWM is currently being used on systems where Integers : 4 bytes Floats : 4 bytes The use of two precisions can be eliminated, but at the cost of a significant loss of precision. (For 4-byte REALs, not using DOUBLE PRECISION increases the error by approximately three orders of magnitude.) DOUBLE PRECISION results are only used in set-up (computing Gauss weights and nodes and Legendre polynomial values), and are not used in the body of the computation.
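As an illustration of that set-up step (this is not part of the submitted code), the sketch below computes Gauss nodes and weights in double precision and casts them to single precision for use in the body of the computation; the use of numpy and the value of nlat are assumptions made only for the example.

    import numpy as np

    # Gauss weights and nodes are the kind of quantity the submission says is
    # computed in DOUBLE PRECISION during set-up only.
    nlat = 32                                               # illustrative value
    nodes, weights = np.polynomial.legendre.leggauss(nlat)  # float64
    nodes32 = nodes.astype(np.float32)                      # single precision for
    weights32 = weights.astype(np.float32)                  # the timestepping
    print(weights.sum())                                    # weights sum to 2.0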
------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : The sequential code is documented in a file included in the distribution of the code from NCAR: Jakob, Ruediger, Description of Software for the Spectral Transform Shallow Water Model Version 2.0. National Center for Atmospheric Research, Boulder, CO 80307-3000, August 1992 and in Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. Documentation of the parallel code is in preparation, but extensive documentation is present in the code. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : 1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of three numerical methods for solving differential equations on the sphere, Monthly Weather Review, 117:1058-1075, 1989. 2) Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. 3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to shallow water test set using the spectral transform method, NCAR Technical Note TN-388+STR (in preparation). 4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.S. Swarztrauber, A standard test set for numerical approximations to the shallow water equations in spherical geometry, Journal of Computational Physics, Vol. 102, pp.211-224, 1992. ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : 5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method, Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), pp. 269-291. 6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral Transform Method. Part II, Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), pp. 509-531. 7) Foster, I. T. and P. H. Worley, Parallelizing the Spectral Transform Method: A Comparison of Alternative Parallel Algorithms, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March22-24, 1993), pp. 100-107. 8) Foster, I. T. and P. H. Worley, Parallel Algorithms for the Spectral Transform Method, (in preparation) 9) Worley, P. H. and I. T. Foster, PSTSWM: A Parallel Algorithm Testbed and Benchmark. (in preparation) ------------------------------------------------------------------------------- Other relevent research papers: 10) I. Foster, W. Gropp, and R. Stevens, The parallel scalability of the spectral transform method, Mon. Wea. Rev., 120(5), 1992, pp. 835--850. 11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes, R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. Worley, The Message-Passing Version of the Parallel Community Climate Model, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 500-513. 12) Sato, R. K. and R. D. Loft, Implementation of the NCAR CCM2 on the Connection Machine, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 371-393. 
13) Barros, S. R. M. and Kauranne, T., On the Parallelization of Global Spectral Eulerian Shallow-Water Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 36-43. 14) Kauranne, T. and S. R. M. Barros, Scalability Estimates of Parallel Spectral Atmospheric Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 312-328. 15) Pelz, R. B. and W. F. Stern, A Balanced Parallel Algorithm for Parallel Processing, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March 22-24, 1993), pp. 126-128. ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : The model code is primarily written in Fortran 77, but also uses DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in common and parameter declarations). It has been compiled and run on the Intel iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation, IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code). Message passing is implemented using the PICL message passing system. All message passing is encapsulated in three high-level routines: BCAST0 (broadcast), GMIN0 (global minimum), GMAX0 (global maximum); two classes of low-level routines: SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3 (variants and/or pieces of the swap operation) and SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3 (variants and/or pieces of the send/recv operation); and one synchronization primitive: CLOCKSYNC0. PICL instrumentation commands are also embedded in the code. Porting the code to another message passing library will be simple, although some of the runtime communication options may then no longer be available. The PICL instrumentation calls can be stubbed out (or removed) without changing the functionality of the code, but some sort of synchronization is needed when timing short benchmark runs.
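Because the message passing is encapsulated in this small set of routines, a port amounts to supplying thin wrappers in the target library. The sketch below is purely illustrative and is not part of PSTSWM: it maps the high-level routines onto a different message-passing interface (MPI via mpi4py, chosen here only for the illustration); everything other than the routine names listed above is an assumption.

    from mpi4py import MPI           # illustrative stand-in for PICL

    comm = MPI.COMM_WORLD

    def bcast0(value):               # BCAST0: broadcast from node 0
        return comm.bcast(value, root=0)

    def gmin0(value):                # GMIN0: global minimum
        return comm.allreduce(value, op=MPI.MIN)

    def gmax0(value):                # GMAX0: global maximum
        return comm.allreduce(value, op=MPI.MAX)

    def clocksync0():                # CLOCKSYNC0: synchronize before timing
        comm.Barrier()

    def swap(sendbuf, partner):      # SWAP: pairwise exchange with one partner
        return comm.sendrecv(sendbuf, dest=partner, source=partner)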
------------------------------------------------------------------------------- Total number of lines in source code: 28,204 Number of lines excluding comments : 12,434 Size in bytes of source code : 994,299 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : problem: 23 lines, 559 bytes, ascii algorithm: 33 lines, 874 bytes, ascii ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: Number of lines and bytes is a function of the input specifications, but for benchmarking would normally be 63 lines (2000 bytes) of meaningful output. (On the Intel machine, FORTRAN STOP messages are sent from each processor at the end of the run, increasing this number.) timings: Each run produces one line of output, containing approx. 150 bytes. Both files are ascii. ------------------------------------------------------------------------------- Brief, high-level description of what application does: (P)STSWM solves the nonlinear shallow water equations on the sphere. The nonlinear shallow water equations constitute a simplified atmospheric-like fluid prediction model that exhibits many of the features of more complete models, and that has been used to investigate numerical methods and to benchmark a number of machines. Each run of PSTSWM uses one of six embedded initial conditions and forcing functions. These cases were chosen to stress-test numerical methods for this problem, and to represent important flows that develop in atmospheric modeling. STSWM also supports reading in arbitrary initial conditions, but this was removed from the parallel code to simplify the development of the initial implementation. ------------------------------------------------------------------------------- Main algorithms used: PSTSWM uses the spectral transform method to solve the shallow water equations. During each timestep, the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. The physical domain is a tensor product longitude-latitude grid. The spectral domain is the set of spectral coefficients in a spherical harmonic expansion of the state variables, and is normally characterized as a triangular array (using a "triangular" truncation of spectral coefficients). Transforming from physical coordinates to spectral coordinates involves performing a real FFT for each line of constant latitude, followed by integration over latitude using Gaussian quadrature (approximating the Legendre transform) to obtain the spectral coefficients. The inverse transformation involves evaluating sums of spectral harmonics and inverse real FFTs, analogous to the forward transform. Parallel algorithms are used to compute the FFTs and to compute the vector sums used to approximate the forward and inverse Legendre transforms. Two major alternatives are available for both transforms: distributed algorithms, which use a fixed data decomposition and compute results where they are assigned, and transpose algorithms, which remap the domains so that the transforms can be calculated sequentially. This translates to four major parallel algorithms:
a) distributed FFT/distributed Legendre transform (LT)
b) transpose FFT/distributed LT
c) distributed FFT/transpose LT
d) transpose FFT/transpose LT
Multiple implementations are supported for each type of algorithm, and the assignment of processors to transforms is also determined by input parameters. For example, input parameters specify a logical 2-D processor grid and define the data decomposition of the physical and spectral domains onto this grid. If 16 processors are used, these can be arranged as a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid. This specification determines how many processors are used to calculate each parallel FFT and how many are used to calculate each parallel LT.
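As a small illustration of these run-time choices (not part of the submitted code), the sketch below enumerates the possible PX x PY arrangements of a given number of processors and the four FFT/LT combinations; the function name is invented for the example.

    # Possible logical PX x PY processor grids for P processors, and the four
    # parallel algorithm combinations selected through input parameters.
    def processor_grids(p):
        return [(px, p // px) for px in range(1, p + 1) if p % px == 0]

    print(processor_grids(16))   # [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]

    ALGORITHM_CHOICES = [
        ("distributed FFT", "distributed LT"),
        ("transpose FFT",   "distributed LT"),
        ("distributed FFT", "transpose LT"),
        ("transpose FFT",   "transpose LT"),
    ]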
------------------------------------------------------------------------------- Skeleton sketch of application: The main program calls INPUT to read problem and algorithm parameters and set up arrays for spectral transformations, and then calls INIT to set up the test case parameters. Routines ERRANL and NRGTCS are called once before the main timestepping loop for error normalization, once after the main timestepping loop for calculating energetics data and errors, and periodically during the timestepping, as requested. The prognostic fields are initialized using routine ANLYTC, which provides the analytic solution. Each call to STEP advances the computed fields by a timestep DT. Timing logic surrounds the timestepping loop, so the initialization phase is not timed. Also, a fake timestep is calculated before beginning timing to eliminate the first-time "paging" effect currently seen on the Intel Paragon systems. STEP computes the first two time levels by two semi-implicit timesteps; normal time-stepping is by a centered leapfrog scheme. STEP calls COMP1, which chooses between an explicit numerical algorithm, a semi-implicit algorithm, and a simplified algorithm associated with solving the advection equation, one of the embedded test cases. The numerical algorithm used is an input parameter. The basic outline of each timestep is the following:
1) Evaluate non-linear product and forcing terms.
2) Fourier transform non-linear terms in place as a block transform.
3) Compute and update divergence, geopotential, and vorticity spectral coefficients. (Much of the calculation of the time update is "bundled" with the Legendre transform.)
4) Compute velocity fields and transform divergence, geopotential, and vorticity back to gridpoint space using a) an inverse Legendre transform and associated computations and b) an inverse real block FFT.
PSTSWM has "fictitious" vertical levels, and all computations are duplicated on the different levels, potentially significantly increasing the granularity of the computation. (The number of vertical levels is an input parameter.) For error analysis, a single vertical level is extracted and analyzed. ------------------------------------------------------------------------------- Brief description of I/O behavior: Processor 0 reads in the input parameters and broadcasts them to the rest of the processors. Processor 0 also receives the error analysis and timing results from the other processors and writes them out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : The processors are treated as a logical 2-D grid. There are three domains to be distributed:
a) physical domain: tensor product longitude-latitude grid
b) Fourier domain: tensor product wavenumber-latitude grid
c) spectral domain: triangular array, where each column contains the spectral coefficients associated with a given wavenumber. The larger the wavenumber, the shorter the column.
An unordered FFT is used, and the Fourier and spectral domains use the "unordered" permutation when the data is distributed.
I) distributed FFT/distributed LT 1) The tensor-product longitude-latitude grid is mapped onto the processor grid by assigning a block of contiguous longitudes to each processor column and by assigning one or two blocks of contiguous latitudes to each processor row. The vertical dimension is not distributed. 2) After the FFT, the subsequent wavenumber-latitude grid is similarly distributed over the processor grid, with a block of the permuted wavenumbers assigned to each processor column. 3) After the LT, the wavenumbers are distributed as before and the spectral coefficients associated with any given wavenumber are either distributed evenly over the processors in the column containing that wavenumber, or are duplicated over the column. What happens is a function of the particular distributed LT algorithm used.
II) transpose FFT/distributed LT 1) same as in (I) 2) Before the FFT, the physical domain is first remapped to a vertical layer-latitude decomposition, with a block of contiguous vertical layers assigned to each processor column and the longitude dimension not distributed. After the transform, the vertical level-latitude grid is distributed as before, and the wavenumber dimension is not distributed. 3) After the LT, the spectral coefficients for a given vertical layer are either distributed evenly over the processors in a column, or are duplicated over that column. What happens is a function of the particular distributed LT algorithm used.
III) distributed FFT/transpose LT 1) same as (I) 2) same as (I) 3) Before the LT, the wavenumber-latitude grid is first remapped to a wavenumber-vertical layer decomposition, with a block of contiguous vertical layers assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
IV) transpose FFT/transpose LT 1) same as (I) 2) same as (II) 3) Before the LT, the vertical level-latitude grid is first remapped to a vertical level-wavenumber decomposition, with a block of the permuted wavenumbers now assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The distribution is a function of the problem size (longitude, latitude, vertical levels), the logical processor grid (PX, PY), and the algorithm (transpose vs. distributed for FFT and LT). ------------------------------------------------------------------------------- Brief description of load balance behavior : The load is fairly well balanced. If PX and PY evenly divide the number of longitudes, latitudes, and vertical levels, then all load imbalances are due to the unequal distribution of spectral coefficients. As described above, the spectral coefficients are laid out as a triangular array in most runs, where each column corresponds to a different Fourier wavenumber. The wavenumbers are partitioned among the processors in most of the parallel algorithms. Since each column is a different length, a wrap mapping of the columns would approximately balance the load. Instead, the natural "unordered" ordering of the FFT is used with a block partitioning, which does a reasonable job of load balancing without any additional data movement. The load imbalance is quantified in Walker et al. [5]. If PX and PY do not evenly divide the dimensions of the physical domain, then other load imbalances may be as large as a factor of 2 in the worst case. ------------------------------------------------------------------------------- Give parameters that determine the problem size : MM, NN, KK - specify the number of Fourier wavenumbers and the spectral truncation used. For a triangular truncation, MM = NN = KK. NLON, NLAT, NVER - number of longitudes, latitudes, and vertical levels. There are required relationships between NLON, NLAT, and NVER, and between these and MM. These relationships are checked in the code.
We will also provide a selection of input files that specify legal (and interesting) problems. DT - timestep (in seconds). (Must be small enough to satisfy the Courant stability condition. The code warns if it is too large, but does not abort.) TAUE - end of model run (in hours) ------------------------------------------------------------------------------- Give memory as function of problem size : Executable size is determined at compile time by setting the parameter COMPSZ in params.i. Per-node memory requirements are approximately (in REALs):
associated Legendre polynomial values: MM*MM*NLAT/(PX*PY)
physical grid fields: 8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 3*MM*MM*NVER/(PX*PY) or (if spectral coefficients are duplicated within a processor column) 3*MM*MM*NVER/PX
work space: 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY) or (if spectral coefficients are duplicated within a processor column) 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX
where BUFS1 and BUFS2 are input parameters (number of communication buffers). BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY. In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory requirements are approximately (2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY) or (2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX). ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : For a serial run, per timestep (very rough):
nonlinear terms: 10*NLON*NLAT*NVER
forward FFT: 40*NLON*NLAT*NVER*LOG2(NLON)
forward LT and time update: 48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
inverse LT and calculation of velocities: 20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
inverse FFT: 25*NLON*NLAT*NVER*LOG2(NLON)
Using the standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1): approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep. For a total run, multiply by TAUE/DT. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : This is a function of the algorithm chosen.
I) transpose FFT a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY); 2*(PX-1) steps, D volume, or 2*LOG2(PX) steps, D*LOG2(PX) volume
II) distributed FFT a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY); 2*LOG2(PX) steps, D*LOG2(PX) volume
III) transpose LT a) forward LT: let D = 8*NLON*NLAT*NVER/(PX*PY); 2*(PY-1) steps, D volume, or 2*LOG2(PY) steps, D*LOG2(PY) volume b) inverse LT: let D = (3/2)*(MM**2)*NVER/(PX*PY); (PY-1) steps, D volume, or LOG2(PY) steps, D*PY volume
IV) distributed LT a) forward + inverse LT: let D = 3*(MM**2)*NVER/(PX*PY); 2*(PY-1) steps, D*PY volume, or 2*LOG2(PY) steps, D*PY volume
These are per-timestep costs. Multiply by TAUE/DT for the total communication overhead. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : Standard input files will be provided for
T21:  MM=NN=KK=21   NLON=64    NLAT=32    NVER=8    ICOND=2   DT=4800.0   TAUE=120.0
T42:  MM=NN=KK=42   NLON=128   NLAT=64    NVER=16   ICOND=2   DT=2400.0   TAUE=120.0
T85:  MM=NN=KK=85   NLON=256   NLAT=128   NVER=32   ICOND=2   DT=1200.0   TAUE=120.0
These are 5-day runs of the "benchmark" case specified in Williamson et al. [3].
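Purely as a cross-check (not part of the submission), the per-timestep estimate above can be evaluated for these three cases. The only added assumption is that, since TAUE is given in hours and DT in seconds, the number of timesteps is 3600*TAUE/DT.

    from math import log2

    def pstswm_total_flops(mm, dt, taue):
        """Rough serial flop count from the per-timestep estimate above."""
        per_step = 460 * mm**3 + 348 * mm**3 * log2(mm) + 24 * mm**4
        steps = 3600.0 * taue / dt        # TAUE in hours, DT in seconds
        return per_step * steps

    for mm, dt in ((21, 4800.0), (42, 2400.0), (85, 1200.0)):
        print(f"T{mm}: {pstswm_total_flops(mm, dt, 120.0):.1e} flops")
    # T21: 2.1e+09   T42: 4.5e+10   T85: 1.0e+12

These values agree with the serial flop totals quoted next.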
Flops and memory requirements for serial runs are as follows (approx.): T21: 500,000 REALs 2,000,000,000 flops T42: 4,000,000 REALs 45,000,000,000 flops T85: 34,391,000 REALs 1,000,000,000,000 flops Both memory and flops scale well, so, for example, the T42 run fits in approx. 4MB of memory for a 4 processor run. But different algorithms and different aspect ratios of the processor grid use different amounts of memory. ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count by hand (looking primarily at inner loops, but eliminating common subexpressions that compiler is expected to find). ------------------------------------------------------------------------------- From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:53:23 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11659; Thu, 28 Oct 93 08:53:23 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07386; Thu, 28 Oct 93 08:52:54 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:52:53 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07372; Thu, 28 Oct 93 08:52:51 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA13457; Thu, 28 Oct 1993 08:52:59 -0400 Date: Thu, 28 Oct 1993 08:52:59 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281252.AA13457@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: SOLVER Compact Application Received: from sun2.nsfnet-relay.ac.uk by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA21681; Mon, 18 Oct 1993 01:55:44 -0400 Via: uk.ac.edinburgh.castle; Mon, 18 Oct 1993 06:31:49 +0100 Received: from epcc.ed.ac.uk by castle.ed.ac.uk id aa21204; 18 Oct 93 6:31 BST Received: from subnode.epcc.ed.ac.uk (feldspar.epcc.ed.ac.uk) by epcc.ed.ac.uk; Sun, 17 Oct 93 16:28:48 BST Date: Sun, 17 Oct 93 16:28:46 BST Message-Id: <2567.9310171528@subnode.epcc.ed.ac.uk> From: S P Booth Subject: Re: ParkBench applications To: "David W. Walker" In-Reply-To: David W. Walker's message of Fri, 15 Oct 93 13:23:46 -0500 Sorry I took so long to reply to this. If any of this needs any futher clarification don't hesitate to send me some email. spb ------------------------------------------------------------------------- PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM To submit a compact application to the ParkBench suite you must follow the following procedure: 1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite. 2. If ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files documents and papers together constitute your application package. 
Your application package should be sent to: David Walker Oak Ridge National Laboratory Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 (615) 574-7401/0680 (phone/fax) walker@msr.epm.ornl.gov The street address is "Bethel Valley Road" if Fedex insists on this. The subcommittee will then make a final decision on whether to include your application in the ParkBench suite. 3. If your application is approved for inclusion in the ParkBench suite you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package. ------------------------------------------------------------------------------- Name of Program : SOLVER : ------------------------------------------------------------------------------- Submitter's Name : Stephen P. Booth Submitter's Organization: UKQCD collaboration Submitter's Address : EPCC The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland Submitter's Telephone # : +44 (0)31 650 5746 Submitter's Fax # : +44 (0)31 622 4712 Submitter's Email : spb@epcc.ed.ac.uk ------------------------------------------------------------------------------- Cognizant Expert(s) : Dr S.P. Booth CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5746 CE's Fax # : +44 (0)31 622 4712 CE's Email : spb@epcc.ed.ac.uk Cognizant Expert(s) : Dr R.D. Kenway CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5245 CE's Fax # : +44 (0)31 622 4712 CE's Email : rdk@epcc.ed.ac.uk ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : S. Booth is prepared to respond quickly to questions and bug reports. We have a strong interest in the portability and performance of this code. ------------------------------------------------------------------------------- Major Application Field : Lattice gauge theory Application Subfield(s) : QCD ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : SOLVER is part of an ongoing software development exercise carried out by UKQCD (the United Kingdom Quantum Chromo-Dynamics collaboration) to develop a new generation of simulation codes. The current generation of codes was highly tuned for a particular machine architecture, so a software development exercise was started to design and develop a set of portable codes. This code was developed by S. Booth and N. Stanford of the University of Edinburgh during the course of 1993. SOLVER is a benchmark code derived from the codes used to generate quark propagators. It is designed to benchmark and validate the computational sections of this operation. It differs from the production code in that it self-initialises to non-trivial test data rather than performing file access. This is because there is no accepted standard for parallel file access. The benchmark was originally developed as part of a national UK procurement exercise.
------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : The code may be freely distributed for benchmarking purposes, but the code remains the property of UKQCD and we ask to be contacted if anyone wishes to use it as an application code. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: All floating point numbers are defined as macros (either Fpoint or Dpoint). The majority of the variables are Fpoint; Dpoint is only used for accumulation values that may require higher precision. This allows the precision of the program to be changed easily. For small and intermediate problem sizes, 4-byte Fpoints and 8-byte Dpoints should be sufficient. For large problems higher precision may be required. INTEGERs must be large enough to hold the number of sites allocated to a processor (4 bytes is almost certainly sufficient). The COMPLEX type is not used. ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : Documentation exists for all program routines except some low-level routines local to a single source file. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : ------------------------------------------------------------------------------- Other relevant research papers: ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : Two versions of the application were developed in parallel: 1) an HPF version (both CMF and HPF directives), and 2) a message passing version. The message passing version uses ANSI Fortran 77 with the following extensions: a) CPP is used for include files and some simple macros and build-time conditionals; b) the F77 restrictions on variable names are not adhered to, though the authors have tools to convert the code to conform. All of the message passing operations are confined to a small number of routines. These routines were designed to be implementable in as many different message passing systems as possible. Current versions are: 1) fake - converts the program to a single-processor code; 2) PARMACS - the original parallel version; 3) PVM - under development. ------------------------------------------------------------------------------- Total number of lines in source code: 15567 Number of lines excluding comments : 10679 Size in bytes of source code : 432398 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : None ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: formatted text ------------------------------------------------------------------------------- Brief, high-level description of what application does: The application generates quark propagators from a background gauge configuration and a fermionic source.
This is equivalent to solving M psi = source, where psi is the quark propagator and M (a function operating on psi) depends on the gauge fields. The benchmark performs a cut-down version of this operation. ------------------------------------------------------------------------------- Main algorithms used: Conjugate gradient least norm with red-black pre-conditioning. ------------------------------------------------------------------------------- Skeleton sketch of application: The benchmark code initialises the gauge field to a unit gauge configuration. (The results for a unit gauge can be calculated analytically, allowing a check on the results.) A gauge transformation is then applied to the gauge field. A unit gauge field consists only of zeros and ones; applying a gauge transformation generates non-trivial values. Quantities corresponding to physical observables should be unchanged by such a transformation. In application code the gauge field would have been read in from disk. The source field is initialised to a point source (a single non-zero point on one lattice site). An iterative solver is called to generate the quark propagator. The solver routine also generates timing information. In application code this would then be dumped to disk. In the benchmark we use the quark propagator to generate a physically significant quantity (the pion propagator). This generates a single real number for each timeslice of the lattice. These values are printed to standard out. This procedure requires a large number of iterations. For benchmarking we are only interested in the time per iteration and some check on the validity of the results. We therefore usually perform only a fixed number of iterations (say 50) to generate accurate timing information, and verify the results by comparison with other machines. ------------------------------------------------------------------------------- Brief description of I/O behaviour: Unless an error occurs, a single processor outputs to standard out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : A spatial decomposition is used to distribute the 4-D arrays over a 4-D grid of processors. Each dimension is distributed independently. The program supports non-regular decomposition, e.g. a lattice of width 22 will be distributed across a processor-grid of width 4 as (6, 6, 5, 5). ------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : Lattice size: NX NY NZ NT processor grid: NPX NPY NPZ NPT ------------------------------------------------------------------------------- Brief description of load balance behavior : Load balancing depends only on the distribution: if the lattice size can be exactly divided by the processor grid size, all processors will have the same workload. In practice it is often useful to trade load balancing for a larger number of processors. ------------------------------------------------------------------------------- Give parameters that determine the problem size : Lattice size, NX NY NZ NT; problem size is NX*NY*NZ*NT.
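A minimal sketch (not part of SOLVER) of the non-regular decomposition described above, under the assumption that the first few processors in a dimension each receive one extra site; the helper name is invented for the example.

    def block_sizes(n, p):
        """Split n lattice sites in one dimension over p processors as evenly
        as possible; the first n % p processors get one extra site."""
        base, extra = divmod(n, p)
        return [base + 1 if i < extra else base for i in range(p)]

    print(block_sizes(22, 4))      # -> [6, 6, 5, 5], as in the example above

    # Global problem size for an NX x NY x NZ x NT lattice, e.g. 24^3 * 48:
    NX = NY = NZ = 24; NT = 48
    print(NX * NY * NZ * NT)       # -> 663552 lattice sites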
------------------------------------------------------------------------------- Give memory as function of problem size : In a production environment there are build-time parameters that set the array sizes, and problem/machine sizes can be set at runtime. When creating a benchmark program it seemed less confusing to set lattice and processor-grid sizes at build time and derive all other quantities from them. The appropriate parameters for memory use are Max_body (maximum number of data points per processor) and Max_bound (maximum number of data points on a single boundary between two processors). If LX, LY, LZ, LT are the local lattice sizes obtained by dividing the lattice size by the processor-grid size and rounding up to the nearest integer, then
Max_body = (LX*LY*LZ*LT)/2
Max_bound = MAX( LX*LY*LZ/2, LY*LZ*LT/2, LX*LZ*LT/2, LX*LY*LT/2 )
The code contains a number of build-time switches for variations in the implementation that may be beneficial on some machines. The memory usage depends on these switches, but typical values are: 108 * Max_body + 36 * Max_bound Fpoints and 16 * (Max_body + Max_bound) INTEGERs. ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Each iteration performs 2760 floating point operations per lattice site, i.e., 50 iterations on a 24^3*48 lattice = 9.16e+10 floating point operations. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : For each iteration every processor sends 24 messages to each of its 8 neighbours; each message contains one floating point number for each lattice point in the common boundary. Two global sum operations are also performed for each iteration. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) :
18^3*36   2.90e+10 fp operations
24^3*48   9.16e+10 fp operations
36^3*72   4.64e+11 fp operations
------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count operations in each loop by hand. The code contains a counter to sum these values. ------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- From owner-pbwg-compactapp@CS.UTK.EDU Wed Nov 3 09:19:23 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA22427; Wed, 3 Nov 93 09:19:23 -0500 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA27464; Wed, 3 Nov 93 09:18:54 -0500 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Wed, 3 Nov 1993 09:18:53 EST Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA27455; Wed, 3 Nov 93 09:18:52 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA15591; Wed, 3 Nov 1993 09:18:51 -0500 Date: Wed, 3 Nov 1993 09:18:51 -0500 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9311031418.AA15591@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: ARCO Compact Application Submission ------------------------------------------------------------------------------- Name of Program : ARCO Parallel Seismic Processing Benchmarks ------------------------------------------------------------------------------- Submitter's Name : Charles C. Mosher
Submitter's Organization: ARCO Exploration and Production Technology Submitter's Address : 2300 West Plano Parkway Plano, TX 75075-8499 Submitter's Telephone # : (214)754-6468 Submitter's Fax # : (214)754-3016 Submitter's Email : ccm@arco.com ------------------------------------------------------------------------------- Cognizant Expert(s) : Charles C. Mosher Cognizant Expert(s) : Siamak Hassanzadeh (co-author) CE's Organization : Fujitsu America CE's Email : siamak@fai.com ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : Will handle reasonable requests in a timely fashion. ------------------------------------------------------------------------------- Major Application Field : Seismic Data Processing Application Subfield(s) : Parallel I/O, signal processing, solution of PDE's ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : The application began as a prototype system for seismic data processing on parallel computing architectures. The prototype was used to design and implement production seismic processing on ARCO's Intel iPSC/860, where it is used today. Like other companies, ARCO continues to upgrade its HPC facilities. We found that we were spending a large amount of time on benchmarking, as were other companies in the oil industry. We decided to place our system in the public domain as a benchmark suite, in the hope that the benchmarking effort could be spread across many participants. In addition, we hope to use the system as a mechanism for code development and sharing between academia, national labs, and industry. Our first attempt was to work with the Perfect Benchmark Club at the University of Illinois Center for Supercomputing Research and Development. Many members of that group provided valuable input that significantly improved the structure and content of the suite. Special thanks to David Schneider for his work on organizing and managing the Perfect effort. Perfect has since disbanded, which leads us to the ParkBench submission. A consulting organization (Resource 2000) has also picked up the code and is providing newsletter subscriptions to participants in the oil industry describing both benchmark numbers and commentary on the usability of the systems tested. Thanks to Randy Premont, Gary Montry, and Clive Bailley of Resource 2000 for their continuing work to make the ARCO suite a viable benchmark. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : The code may be freely distributed. We request that ARCO and the authors be acknowledged in publications. In order to ensure the relevance of the codes in the suite, the authors plan to retain control of the source and algorithms contained therein, and request that suggestions for changes and updates be directed to the authors only.
------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: Integers : 4 bytes Floats : 4 bytes ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : High level: ARCO Seismic Benchmark Suite User's Guide Low level: source comments ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : Yilmaz, Ozdogan, 1990, Seismic Data Processing: Investigations in Geophysics vol. 2, Society of Exploration Geophysicists, P.O. Box 702740, Tulsa, Oklahoma, 74170 ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : Mosher, C., Hassanzadeh, S., and Schneider, D., 1992, A Benchmark Suite for Parallel Seismic Processing, Supercomputing 1992 proceedings. ------------------------------------------------------------------------------- Other relevant research papers: ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : Language: Fortran 77 Message Passing: Yet Another Message Passing Layer (YAMPL) Sample implementations for PVM, Intel NX, TCGMSG Machines Supported: Workstation clusters and multiprocessors (e.g. Sun, DEC, HP, IBM, SGI), Cray YMP, Intel iPSC/860 ------------------------------------------------------------------------------- Total number of lines in source code: ~ 20000 Number of lines excluding comments : ~ 15000 Size in bytes of source code : ~ 1 MByte ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : ASCII parameter files, 10-100 lines ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : Binary seismic data files, 1 MByte (small), 1 GByte (medium), 10 GByte (large), 100 GByte (huge) ------------------------------------------------------------------------------- Brief, high-level description of what application does: Synthetic seismic data for small, medium and large test cases are generated in the native format of the target machine. The test data are read and processed in parallel, and the output is written to disk. Simple checksum and timing tables are printed to standard output. A simple X Windows image display tool is used to verify correctness of results. ------------------------------------------------------------------------------- Main algorithms used: Signal processing (FFTs, Toeplitz equation solvers, interpolation); Seismic imaging (Fourier domain, Kirchhoff integral, finite difference algorithms) ------------------------------------------------------------------------------- Skeleton sketch of application: Processing modules are applied in a pipeline fashion to 2D arrays of seismic data read from disk. Processing flows are of the form READ-FLTR-MIGR-WRIT. The same flow is executed on all processors. Individual modules communicate via message passing to implement parallel algorithms.
Nearly all message passing is hidden via transpose operations that change the parallel data distribution as appropriate for each algorithm. ------------------------------------------------------------------------------- Brief description of I/O behavior: 2D arrays are read/written from HDF-style files on disk. Parallel I/O is supported both for a single large file read by multiple processors and for a separate file read by each processor. A significant part of the seismic processing flow requires data to be read in transposed fashion across all processors. ------------------------------------------------------------------------------- Brief description of load balance behavior : Assumes a homogeneous array of processors with similar capabilities. Load balance is rudimentary, with an attempt to distribute equal-sized 'workstation' chunks of work. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : Seismic data is inherently parallel, with large data sets that offer multiple opportunities for parallel operation. Typically, the data is treated as a collection of 2D arrays, with each processor owning a 'slab' of data. ------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The data is defined as a 4-dimensional array with Fortran dimensions (sample, trace, frame, volume). The third dimension (frame) is typically spread across the processors. ------------------------------------------------------------------------------- Give parameters that determine the problem size : The ASCII parameter files define the data set size in terms of the number of samples per seismic trace, the number of traces per shot, the number of shooting lines, and the number of 3D volumes. ------------------------------------------------------------------------------- Give memory as function of problem size : Requires enough memory to hold 2 frames on each node, and a 3D volume spread across the nodes. ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Reported by the code as appropriate. On a Cray YMP, medium-sized problems with 750 MB of output run at 30-100 Mflops for about an hour. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : On an Intel iPSC/860, there are parts of the suite that have comp/comm ratios ranging from near-infinite to 1/10. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : small: 1 MB output, 10 sec on YMP medium: 1 GB output, 1 hour on YMP large: 10 GB output, 10 hours on YMP ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Hand count for simple operations; regression analysis of Cray HPM results for more complex operations.
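To make the slab distribution described above concrete, here is a small sketch that is not part of the ARCO suite; the array sizes, processor count, and helper name are invented for the example, and only the idea of spreading the frame dimension across processors comes from the submission.

    import numpy as np

    # Invented sizes: Fortran dimensions (sample, trace, frame, volume).
    nsamp, ntrace, nframe, nvol = 512, 60, 32, 2
    nproc = 4

    def my_frames(rank, nframe=nframe, nproc=nproc):
        """Frame indices owned by one processor under a block 'slab' split."""
        per_proc = nframe // nproc
        return range(rank * per_proc, (rank + 1) * per_proc)

    # Local slab held by processor 1; a transpose step would redistribute the
    # data along a different axis before the next module runs, which is where
    # the message passing described above is hidden.
    rank = 1
    local_slab = np.zeros((nsamp, ntrace, len(my_frames(rank)), nvol),
                          dtype=np.float32)
    print(local_slab.shape)        # -> (512, 60, 8, 2)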
------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 09:57:45 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id JAA13757; Tue, 22 Mar 1994 09:57:44 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id JAA09199; Tue, 22 Mar 1994 09:57:20 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 09:57:19 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id JAA09186; Tue, 22 Mar 1994 09:57:17 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA24475; Tue, 22 Mar 1994 09:57:26 -0500 Message-Id: <9403221457.AA24475@rios2.epm.ornl.gov> To: ccm@arco.com Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 09:57:26 -0500 From: "David W. Walker" Dear Dr. Mosher, Thank you for submitting the ARCO Parallel Seismic Processing Benchmarks for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. I would be grateful if you would arrange for the source code, input, and output files to be sent to me. To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of the following papers (if available) Mosher, C., Hassanzadeh, S., and Schneider, D., 1992, A Benchmark Suite for Parallel Seismic Processing, Supercomputing 1992 proceedings. ARCO Seismic Benchark Suite Users's Guide and any other relevant papers you may have online. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. All the above will probably come to several Mbytes, so it is probably not appropriate to email it to me. Do you have an anonymous ftp site where I could copy the files from? Best Regards, David Walker From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:12:48 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id KAA13948; Tue, 22 Mar 1994 10:12:48 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10288; Tue, 22 Mar 1994 10:11:05 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:10:55 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10257; Tue, 22 Mar 1994 10:10:50 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA18866; Tue, 22 Mar 1994 10:07:46 -0500 Message-Id: <9403221507.AA18866@rios2.epm.ornl.gov> To: mia@unixa.nerc-bidston.ac.uk Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 10:07:46 -0500 From: "David W. Walker" Dear Dr. Ashworth, Thank you for submitting the POLMP code for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. 
I would be grateful if you would arrange for the source code, input, and output files to be sent to me. To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission (v200, wa200, xb200) 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of the following papers mentioned in your submission describing the sequential and parallel codes (if available). Also the users guide if there is one. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. If the above files are too large to email to me, please let me know if there is an anonymous ftp site where I can copy them from. Best Regards, David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- ------------------------------------------------------------------------------- Name of Program : POLMP (Proudman Oceanographic Laboratory Multiprocessing Program) ------------------------------------------------------------------------------- Submitter's Name : Mike Ashworth Submitter's Organization: NERC Computer Services Submitter's Address : Bidston Observatory Birkenhead, L43 7RA, UK Submitter's Telephone # : +44-51-653-8633 Submitter's Fax # : +44-51-653-6269 Submitter's Email : mia@ua.nbi.ac.uk ------------------------------------------------------------------------------- Cognizant Expert : Mike Ashworth CE's Organization : NERC Computer Services CE's Address : Bidston Observatory Birkenhead, L43 7RA, UK CE's Telephone # : +44-51-653-8633 CE's Fax # : +44-51-653-6269 CE's Email : mia@ua.nbi.ac.uk ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : Bearing in mind other commitments, Mike Ashworth is prepared to respond quickly to questions and bug reports, and expects to be kept informed as to results of experiments and modifications to the code. ------------------------------------------------------------------------------- Major Application Field : Fluid Dynamics Application Subfield(s) : Ocean and Shallow Sea Modeling ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : The POLMP project was created to develop numerical algorithms for shallow sea 3D hydrodynamic models that run efficiently on modern parallel computers. A code was developed, using a set of portable programming conventions based upon standard Fortran 77, which follows the wind induced flow in a closed rectangular basin including a number of arbitrary land areas. The model solves a set of hydrodynamic partial differential equations, subject to a set of initial conditions, using a mixed explicit/implicit forward time integration scheme. 
The explicit component corresponds to a horizontal finite difference scheme and the implicit to a functional expansion in the vertical (Davies, Grzonka and Stephens, 1989). By the end of 1989 the code had been implemented on the RAL 4 processor Cray X-MP using Cray's microtasking system, which provides parallel processing at the level of the Fortran DO loop. Acceptable parallel performance was achieved by integrating each of the vertical modes in parallel, referred to in Ashworth and Davies (1992) as vertical partitioning. In particular, a speed-up of 3.15 over single processor execution was obtained, with an execution rate of 548 Megaflops corresponding to 58 per cent of the peak theoretical performance of the machine. Execution on an 8 processor Cray Y-MP gave a speed-up efficiency of 7.9 and 1768 Megaflops or 67 per cent of the peak (Davies, Proctor and O'Neill, 1991). The latter resulted in Davies and Grzonka being awarded a prize in the 1990 Cray Gigaflop Performance Awards . The project has been extended by implementing the shallow sea model in a form which is more appropriate to a variety of parallel architectures, especially distributed memory machines, and to a larger number of processors. It is especially desirable to be able to compare shared memory parallel architectures with distributed memory architectures. Such a comparison is currently relevant to NERC science generally and will be a factor in the considerations for the purchase of new machines, bids for allocations on other academic machines, and for the design of new codes and the restructuring of existing codes. In order to simplify development of the new code and to ensure a proper comparison between machines, a restructured version of the Davies and Grzonka rectangle was designed which will perform partitioning of the region in the horizontal dimension. This has the advantage over vertical partitioning that the communication between processors is limited to a few points at the boundaries of each sub-domain. The ratio of interior points to boundary points, which determines the ratio of computation to communication and hence the efficiency on message passing, distributed memory machines, may be increased by increasing the size of the individual sub-domains. This design may also improve the efficiency on shared memory machines by reducing the time of the critical section and reducing memory conflicts between processors. In addition, the required number of vertical modes is only about 16, which, though well suited to a 4 or 8 processor machine, does not contain sufficient parallelism for more highly parallel machines. The code has been designed with portability in mind, so that essentially the same code may be run on parallel computers with a range of architectures. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : Yes, but users are requested to acknowledge the authors (Ashworth and Davies) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. 
------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: Some 8 byte floating point numbers are used in some of the initialization code, but calculations on the main field arrays may be done using 4 byte floating point variables without grossly affecting the solution. Nevertheless, precision conversion is facilitated by a switch supplied to the C preprocessor. By specifying -DSINGLE, variables will be declared as REAL, normally 4 bytes, whereas -DDOUBLE will cause declarations to be DOUBLE PRECISION, normally 8 bytes. ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : The README file supplied with the code describes how the various versions of the code should be built. Extensive documentation, including the definition of all variables in COMMON, is present as comments in the code. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms :
1) Davies, A.M., Formulation of a linear three-dimensional hydrodynamic sea model using a Galerkin-eigenfunction method, Int. J. Num. Meth. in Fluids, 1983, Vol. 3, 33-60.
2) Davies, A.M., Solution of the 3D linear hydrodynamic equations using an enhanced eigenfunction approach, Int. J. Num. Meth. in Fluids, 1991, Vol. 13, 235-250.
------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms :
1) Ashworth, M. and Davies, A.M., Restructuring three-dimensional hydrodynamic models for computers with low and high degrees of parallelism, in Parallel Computing '91, eds D.J.Evans, G.R.Joubert and H.Liddell (North Holland, 1992), 553-560.
2) Ashworth, M., Parallel Processing in Environmental Modelling, in Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993.
3) Ashworth, M. and Davies, A.M., Performance of a Three Dimensional Hydrodynamic Model on a Range of Parallel Computers, in Proceedings of the Euromicro Workshop on Parallel and Distributed Computing, Gran Canaria 27-29 January 1993, pp 383-390, (IEEE Computer Society Press)
4) Davies, A.M., Ashworth, M., Lawrence, J., O'Neill, M., Implementation of three dimensional shallow sea models on vector and parallel computers, 1992a, CFD News, Vol. 3, No. 1, 18-30.
5) Davies, A.M., Grzonka, R.G. and Stephens, C.V., The implementation of hydrodynamic numerical sea models on the Cray X-MP, 1992b, in Advances in Parallel Computing, Vol. 2, edited D.J. Evans.
6) Davies, A.M., Proctor, R. and O'Neill, M., "Shallow Sea Hydrodynamic Models in Environmental Science", Cray Channels, Winter 1991.
------------------------------------------------------------------------------- Other relevant research papers: ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : Code is initially passed through the C preprocessor, allowing a number of versions with different programming styles, precisions and machine dependencies to be generated.
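The -DSINGLE / -DDOUBLE precision switch described above is a standard C-preprocessor idiom; a minimal sketch of one way such a switch could be realized is given below. It is illustrative only and is not taken from the POLMP source; the macro name FLOAT and the routine and array names are invented for the example.

#ifdef SINGLE
#define FLOAT REAL
#endif
#ifdef DOUBLE
#define FLOAT DOUBLE PRECISION
#endif
C     A dummy-argument array declared with the selected precision;
C     compiling with -DSINGLE gives REAL, with -DDOUBLE gives
C     DOUBLE PRECISION.
      SUBROUTINE DEMO ( ZETA, N, M )
      INTEGER N, M
      FLOAT ZETA(N,M)
C     ... calculations on ZETA are unchanged whichever precision is chosen
      END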
Fortran 77 version

A sequential version of POLMP is available, which conforms to the Fortran 77 standard. This version has been run on a large number of machines from workstations to supercomputers and any code which caused problems, even if it conformed to the standard, has been changed or removed. Thus its conformance to the Fortran 77 standard is well established. In order to allow the code to run on a wide range of problem sizes without recompilation, the major arrays are defined dynamically by setting up pointers, with names starting with IX, which point to locations in a single large data array: SA. Most pointers are allocated in subroutine MODSUB and the starting location passed down into subroutines in which they are declared as arrays. For example :

      IX1 = 1
      IX2 = IX1 + N*M
      CALL SUB ( SA(IX1), SA(IX2), N, M )

      SUBROUTINE SUB ( A1, A2, N, M )
      DIMENSION A1(N,M), A2(N,M)
      END

Although this is probably against the spirit of the Fortran 77 standard, it is considered the best compromise between portability and utility, and has caused no problems on any of the machines on which it has been tried. The code has been run on a number of traditional vector supercomputers, mainframes and workstations. In addition, key loops are able to be parallelized automatically by some compilers on shared (or virtual shared) memory MIMD machines, allowing parallel execution on the Convex C2 and C3, Cray X-MP, Y-MP, and Y-MP/C90, and Kendall Square Research KSR-1. Cray macrotasking calls may also be enabled for an alternative mode of parallel execution on Cray multiprocessors.

Message passing version

POLMP has been implemented on a number of message-passing machines: Intel iPSC/2 and iPSC/860, Meiko CS-1 i860 and CS-2 and nCUBE 2. Code is also present for the PVM and Parmacs portable message passing systems, and POLMP has run successfully, though not efficiently, on a network of Silicon Graphics workstations. Calls to message passing routines are concentrated in a small number of routines for ease of portability and maintenance. POLMP performs housekeeping tasks on one node of the parallel machine, usually node zero, referred to in the code as the driver process, the remaining processes being workers. For Parmacs version 5 which requires a host program, a simple host program has been provided which loads the node program onto a two dimensional torus and then takes no further part in the run, other than to receive a completion code from the driver, in case terminating the host early would interfere with execution of the nodes.

Data parallel versions

A data parallel version of the code has been run on the Thinking Machines CM-2, CM-200 and MasPar MP-1 machines. High Performance Fortran (HPF) defines extensions to the Fortran 90 language in order to provide support for parallel execution on a wide variety of machines using a data parallel programming model. The subset-HPF version of the POLMP code has been written to the draft standard specified by the High Performance Fortran Forum in the HPF Language Specification version 0.4 dated November 6, 1992. Fortran 90 code was developed on a Thinking Machines CM-200 machine and checked for conformance with the Fortran 90 standard using the NAGWare Fortran 90 compiler. HPF directives were inserted by translating from the CM Fortran directives, but have not been tested due to the lack of access to an HPF compiler. The only HPF features used are the PROCESSORS, TEMPLATE, ALIGN and DISTRIBUTE directives and the system inquiry intrinsic function NUMBER_OF_PROCESSORS.
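For readers unfamiliar with the HPF features just listed, the following generic fragment shows the kind of data mapping they express. It is a sketch only, not the actual POLMP declarations: the template name GRID, the processors name PROCS, the array names and the extents are all invented for the illustration.

!     Illustrative subset-HPF fragment (not from the POLMP source)
      INTEGER, PARAMETER :: NX = 512, NY = 512
      REAL Z(NX,NY), U(NX,NY), V(NX,NY)
!HPF$ PROCESSORS PROCS(NUMBER_OF_PROCESSORS())
!HPF$ TEMPLATE GRID(NX,NY)
!HPF$ DISTRIBUTE GRID(*,BLOCK) ONTO PROCS
!HPF$ ALIGN Z(I,J) WITH GRID(I,J)
!HPF$ ALIGN U(I,J) WITH GRID(I,J)
!HPF$ ALIGN V(I,J) WITH GRID(I,J)

Here each field array is aligned with a template whose second dimension is block-distributed across however many processors are available at run time; the decomposition itself is then left entirely to the compiler, as described above.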
------------------------------------------------------------------------------- Total number of lines in source code: 26,699 Number of lines excluding comments : 11,313 Size in bytes of source code : 756,107 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : steering file: 13 lines, 250 bytes, ascii (typical size) ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: 700 lines, 62,000 bytes, ascii (typical size) ------------------------------------------------------------------------------- Brief, high-level description of what application does: POLMP solves the linear three-dimensional hydrodynamic equations for the wind induced flow in a closed rectangular basin of constant depth which may include an arbitrary number of land areas. ------------------------------------------------------------------------------- Main algorithms used: The discretized form of the hydrodynamic equations are solved for field variables, z, surface elevation, and u and v, horizontal components of velocity. The fields are represented in the horizontal by a staggered finite difference grid. The profile of vertical velocity with depth is represented by the superposition of a number of spectral components. The functions used in the vertical are arbitrary, although the computational advantages of using eigenfunctions (modes) of the eddy viscosity profile have been demonstrated (Davies, 1983). Velocities at the closed boundaries are set to zero. Each timestep in the forward time integration of the model, involves successive updates to the three fields, z, u and v. New field values computed in each update are used in the subsequent calculations. A five point finite difference stencil is used, requiring only nearest neighbours on the grid. A number of different data storage and data processing methods is included mainly for handling cases with significant amounts of land, e.g. index array, packed data. In particular the program may be switched between masked operation, more suitable for vector processors, in which computation is done on all points, but land and boundary points are masked out, and strip-mining, more suitable for scalar and RISC processors, in which calculations are only done for sea points. ------------------------------------------------------------------------------- Skeleton sketch of application: The call chart of the major subroutines is represented thus: AAAPOL -> APOLMP -> INIT -> RUNPOL -> INIT2 -> MAP -> DIVIDE -> PRMAP -> GENSTP -> SPEC -> ROOTS -> TRANS -> SNDWRK -> RCVWRK -> SETUP -> MODSUB -> MODEL -> ASSIGN -> GENMSK -> GENSTP -> GENIND -> GENPAC -> METRIC -> CLRFLD -> TIME* -> SNDBND -> RCVBND -> RESULT -> SNDRES -> RCVRES -> MODOUT -> OZUVW -> OUTFLD -> GETRES -> OUTARR -> GRYARR -> WSTATE AAAPOL is a dummy main program calling APOLMP. APOLMP calls INIT which reads parameters from the steering file, checks and monitors them. RUNPOL is then called which calls another initialization routine INIT2. Called from INIT2, MAP forms a map of the domain to be modelled, DIVIDE divides the domain between processors, PRMAP maps sub-domains onto processors, GENSTP counts indexes for strip-mining and SPEC, ROOTS and TRANS set up the coefficients for the spectral expansion. SNDWRK on the driver process sends details of the sub-domain to be worked on to each worker. 
RCVWRK receives that information. SETUP does some array allocation and MODSUB does the main allocation of array space to the field and ancillary arrays. MODEL is the main driver subroutine for the model. ASSIGN calls routines to generate masks, strip-mining indexes, packing indexes and measurement metrics. CLRFLD initializes the main data arrays. Then one of seven time-stepping routines, TIME*, is chosen dependent on the vectorization and packing/indexing method used to cope with the presence of land. SNDBND and RCVBND handle the sending and reception of boundary data between sub-domains. After the required number of time-steps is complete, RESULT saves results from the desired region, and SNDRES on the workers and RCVRES on the driver collect the result data. MODOUT handles the writing of model output to standard output and disk files, as required. For a non-trivial run, 99% of time is spent in whichever of the timestepping routines, TIME*, has been chosen. ------------------------------------------------------------------------------- Brief description of I/O behavior: The driver process, usually processor 0, reads in the input parameters and broadcasts them to the rest of the processors. The driver also receives the results from the other processors and writes them out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : The processors are treated as a logical 2-D grid. The simulation domain is divided into a number of sub-domains which are allocated, one sub-domain per processor. ------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The number of processors, p, and the number of sub-domains are provided as steering parameters, as is a switch which requests either one-dimensional or two-dimensional partitioning. Partitioning is only actually carried out for the message passing versions of the code. For two-dimensional partitioning p is factored into px and py where px and py are as close as possible to sqrt(p) (a small illustrative sketch is given below). For the data parallel version the number of sub-domains is set to one and decomposition is performed by the compiler via data distribution directives. ------------------------------------------------------------------------------- Brief description of load balance behavior : Unless land areas are specified, the load is fairly well balanced. If px and py evenly divide the number of grid points, then the model is perfectly balanced except that boundary sub-domains have fewer communications. No tests with land areas have yet been performed with the parallel code, and more sophisticated domain decomposition algorithms have not yet been included. ------------------------------------------------------------------------------- Give parameters that determine the problem size :
nx, ny   Size of horizontal grid
m        Number of vertical modes
nts      Number of timesteps to be performed
------------------------------------------------------------------------------- Give memory as function of problem size : See below for specific examples.
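Returning to the two-dimensional partitioning described under the data distribution parameters above: factoring p into px and py as close as possible to sqrt(p) takes only a few lines. The sketch below is illustrative only; it is not the POLMP routine, and the name FACTP is invented for this example.

C     Illustrative sketch (not from the POLMP source): factor the
C     processor count P into PX*PY with PX as close as possible to
C     SQRT(P) from below, so that PX <= PY.
      SUBROUTINE FACTP ( P, PX, PY )
      INTEGER P, PX, PY
      PX = INT( SQRT( REAL(P) ) )
   10 CONTINUE
      IF ( MOD( P, PX ) .NE. 0 ) THEN
         PX = PX - 1
         GOTO 10
      END IF
      PY = P / PX
      END

For example, p = 16 gives px = py = 4, while p = 12 gives px = 3 and py = 4.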
------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Assuming standard compiler optimizations, there is a requirement for 29 floating point operations (18 add/subtracts and 11 multiplies) per grid point, so the total computational load is 29 * nx * ny * m * nts ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : During each timestep each sub-domain of size nsubx=nx/px by nsuby=ny/py requires the following communications in words :
nsubx * m  from N
nsubx      from S
nsubx * m  from S
nsuby * m  from W
nsuby      from E
nsuby * m  from E
m          from NE
m          from SW
making a total of (2 * m + 1)*(nsubx + nsuby) + 2*m words in eight messages from six directions. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : The data sizes and computational requirements for the various problems supplied are :
Name    nx x ny x m x nts          Computational    Memory
                                   Load (Gflop)     (Mword)
dbg     10 x 10 x 1 x 2            Small debugging test case
dbg2d   10 x 10 x 1 x 2            Small debugging test case for a 2 x 2 decomposition
v200    512 x 512 x 16 x 200         24               14
wa200   1024 x 1024 x 40 x 200      226              126
xb200   2048 x 2048 x 80 x 200     1812              984
The memory sizes are the number of Fortran real elements (words) required for the strip-mined case on a single processor. For the masked case the memory requirement is approximately doubled for the extra mask arrays. For the message passing versions, the total memory requirement will also tend to increase slightly (<10%) with the number of processors employed. ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count by hand looking at inner loops and making reasonable assumptions about common compiler optimizations.
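As an added cross-check of the operation-count formula above (this arithmetic is not part of the original submission): for the v200 case, 29 * nx * ny * m * nts = 29 * 512 * 512 * 16 * 200, which is roughly 2.4 x 10**10 floating-point operations, i.e. about 24 Gflop, in agreement with the computational load quoted for v200 in the table above.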
------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- -- ,?, (o o) |------------------------------oOO--(_)--OOo----------------------------| | | | Dr Mike Ashworth NERC Computer Services | | NERC Supercomputing Consultant Bidston Observatory | | Tel: +44 51 653 8633 BIRKENHEAD | | Fax: +44 51 653 6269 L43 7RA | | email: mia@ua.nbi.ac.uk United Kingdom | | alternative: M.Ashworth@ncs.nerc.ac.uk | |-----------------------------------------------------------------------| From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:14:36 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id KAA13973; Tue, 22 Mar 1994 10:14:35 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10524; Tue, 22 Mar 1994 10:14:19 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:14:18 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10516; Tue, 22 Mar 1994 10:14:14 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA18130; Tue, 22 Mar 1994 10:14:23 -0500 Message-Id: <9403221514.AA18130@rios2.epm.ornl.gov> To: worley@rios2.epm.ornl.gov Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 10:14:23 -0500 From: "David W. Walker" Dear Pat, Thank you for submitting the PSTSWM for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. I would be grateful if you would arrange for the source code, input, and output files to be sent to me. To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission (T21, T42, and T85) 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of any papers describing the sequential and parallel algorithms that you may have available. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. Best Regards, David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM To submit a compact application to the ParkBench suite you must follow the following procedure: 1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite. 2. If ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. 
Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscipt format. These files documents and papers together constitute your application package. Your application package should be sent to: David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- ------------------------------------------------------------------------------- Name of Program : PSTSWM : (Parallel Spectral Transform Shallow Water Model) ------------------------------------------------------------------------------- Submitter's Name : Patrick H. Worley Submitter's Organization: Oak Ridge National Laboratory Submitter's Address : Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 Submitter's Telephone # : (615) 574-3128 Submitter's Fax # : (615) 574-0680 Submitter's Email : worley@msr.epm.ornl.gov ------------------------------------------------------------------------------- Cognizant Expert(s) : Patrick H. Worley CE's Organization : Oak Ridge National Laboratory CE's Address : Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 CE's Telephone # : (615) 574-3128 CE's Fax # : (615) 574-0680 CE's Email : worley@msr.epm.ornl.gov Cognizant Expert(s) : Ian T. Foster CE's Organization : Argonne National Laboratory CE's Address : MCS 221/D-235 9700 S. Cass Avenue Argonne, IL 60439 CE's Telephone # : (708) 252-4619 CE's Fax # : (708) 252-5986 CE's Email : itf@mcs.anl.gov ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : Modulo other commitments, Worley is prepared to respond quickly to questions and bug reports, but expects to be kept informed as to results of experiments and modifications to the code. ------------------------------------------------------------------------------- Major Application Field : Fluid Dynamics Application Subfield(s) : Climate Modeling ------------------------------------------------------------------------------- Application "pedigree" : PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations using the spectral transform method. The spectral transform algorithm of the code follows closely how CCM2, the NCAR Community Climate Model, handles the dynamical part of the primitive equations, and the parallel algorithms implemented in the model include those currently used in the message-passing parallel implementation of CCM2. PSTSWM was written by Patrick Worley of Oak Ridge National Laboratory and Ian Foster of Argonne National Laboratory, and is based partly on previous parallel algorithm research by John Drake, David Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code development and parallel algorithms research were funded by the DOE Computer Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. 
The features of version 1.0 were frozen on 8/1/93, and it is this version we would offer initially as a benchmark. PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. STSWM evolved from a spectral shallow water model written by Hack (NCAR/CGD) to compare numerical schemes designed to solve the divergent barotropic equations in spherical geometry. STSWM was written partially to provide the reference solutions to the test cases proposed by Williamson et al. (see citation [4] below), which were chosen to test the ability of numerical methods to simulate important flow phenomena. These test cases are embedded in the code and are selectable at run-time via input parameters, specifying initial conditions, forcing, and analytic solutions (for error analysis). The solutions are also published in a Technical Note by Jakob et al. [3]. In addition, this code is meant to serve as an educational tool for numerical studies of the shallow water equations. A detailed description of the spectral transform method, and a derivation of the equations used in this software, can be found in the Technical Note by Hack and Jakob [2]. For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the correct communication and computation granularity for 3-D weather and climate codes), to increase modularity and support code reuse, and to allow the problem size to be selected at runtime without depending on dynamic memory allocation. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). The code has also been written in such a way that linking in optimized library functions for common operations instead of the "portable" code will be simple. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : Yes, but users are requested to acknowledge the authors (Worley and Foster) and the program that supported the development of the code (DOE CHAMMP program) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. Finally, the code has been written to allow easy reuse of code in other applications, and for educational purposes. The authors encourage this, but also request that they be notified when pieces of the code are used. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION variables. The code should work correctly for any system in which COMPLEX is represented as 2 REALs. The include file params.i has parameters that can be used to specify the length of these. Also, some REAL and DOUBLE parameter values may need to be modified for floating point number systems with large mantissas, e.g., PI, TWOPI.
PSTSWM is currently being used on systems where Integers : 4 bytes Floats : 4 bytes The use of two precisions can be eliminated, but at the cost of a significant loss of precision. (For 4 bytes REALs, not using DOUBLE PRECISION increases the error by approximately three orders of magnitude.) DOUBLE PRECISION results are only used in set-up (computing Gauss weights and nodes and Legendre polynomial values), and are not used in the body of the computation. ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : The sequential code is documented in a file included in the distribution of the code from NCAR: Jakob, Ruediger, Description of Software for the Spectral Transform Shallow Water Model Version 2.0. National Center for Atmospheric Research, Boulder, CO 80307-3000, August 1992 and in Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. Documentation of the parallel code is in preparation, but extensive documentation is present in the code. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : 1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of three numerical methods for solving differential equations on the sphere, Monthly Weather Review, 117:1058-1075, 1989. 2) Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. 3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to shallow water test set using the spectral transform method, NCAR Technical Note TN-388+STR (in preparation). 4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.S. Swarztrauber, A standard test set for numerical approximations to the shallow water equations in spherical geometry, Journal of Computational Physics, Vol. 102, pp.211-224, 1992. ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : 5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method, Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), pp. 269-291. 6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral Transform Method. Part II, Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), pp. 509-531. 7) Foster, I. T. and P. H. Worley, Parallelizing the Spectral Transform Method: A Comparison of Alternative Parallel Algorithms, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March22-24, 1993), pp. 100-107. 8) Foster, I. T. and P. H. Worley, Parallel Algorithms for the Spectral Transform Method, (in preparation) 9) Worley, P. H. and I. T. Foster, PSTSWM: A Parallel Algorithm Testbed and Benchmark. (in preparation) ------------------------------------------------------------------------------- Other relevent research papers: 10) I. Foster, W. Gropp, and R. Stevens, The parallel scalability of the spectral transform method, Mon. Wea. Rev., 120(5), 1992, pp. 835--850. 11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes, R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. 
Worley, The Message-Passing Version of the Parallel Community Climate Model, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 500-513. 12) Sato, R. K. and R. D. Loft, Implementation of the NCAR CCM2 on the Connection Machine, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 371-393. 13) Barros, S. R. M. and Kauranne, T., On the Parallelization of Global Spectral Eulerian Shallow-Water Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 36-43. 14) Kauranne, T. and S. R. M. Barros, Scalability Estimates of Parallel Spectral Atmospheric Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 312-328. 15) Pelz, R. B. and W. F. Stern, A Balanced Parallel Algorithm for Parallel Processing, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March22-24, 1993), pp. 126-128. ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : The model code is primarily written in Fortran 77, but also uses DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in common and parameter declarations). It has been compiled and run on the Intel iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation, IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code). Message passing is implemented using the PICL message passing system. All message passing is encapsulated in 3 highlevel routines: BCAST0 (broadcast) GMIN0 (global minimum) GMAX0 (global maximum) two classes of low level routines: SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3 (variants and/or pieces of the swap operation) and SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3 (variants and/or pieces of the send/recv operation) and one synchronization primitive: CLOCKSYNC0 PICL instrumentation commands are also embedded in the code. Porting the code to another message passing library will be simple, although some of the runtime communication options may become illegal then. The PICL instrumentation calls can be stubbed out (or removed) without changing the functionality of the code, but some sort of synchronization is needed when timing short benchmark runs. 
------------------------------------------------------------------------------- Total number of lines in source code: 28,204 Number of lines excluding comments : 12,434 Size in bytes of source code : 994,299 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : problem: 23 lines, 559 bytes, ascii algorithm: 33 lines, 874 bytes, ascii ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: Number of lines and bytes is a function of the input specifications, but for benchmarking would normally be 63 lines (2000 bytes) of meaningful output. (On the Intel machine, FORTRAN STOP messages are sent from each processor at the end of the run, increasing this number.) timings: Each run produces one line of output, containing approx. 150 bytes. Both files are ascii. ------------------------------------------------------------------------------- Brief, high-level description of what application does: (P)STSWM solves the nonlinear shallow water equations on the sphere. The nonlinear shallow water equations constitute a simplified atmospheric-like fluid prediction model that exhibits many of the features of more complete models, and that has been used to investigate numerical methods and benchmark a number of machines. Each run of PSTSWM uses one of 6 embedded initial conditions and forcing functions. These cases were chosen to stress test numerical methods for this problem, and to represent important flows that develop in atmospheric modeling. STSWM also supports reading in arbitrary initial conditions, but this was removed from the parallel code to simplify the development of the initial implementation. ------------------------------------------------------------------------------- Main algorithms used: PSTSWM uses the spectral transform method to solve the shallow water equations. During each timestep, the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. The physical domain is a tensor product longitude-latitude grid. The spectral domain is the set of spectral coefficients in a spherical harmonic expansion of the state variables, and is normally characterized as a triangular array (using a "triangular" truncation of spectral coefficients). Transforming from physical coordinates to spectral coordinates involves performing a real FFT for each line of constant latitude, followed by integration over latitude using Gaussian quadrature (approximating the Legendre transform) to obtain the spectral coefficients. The inverse transformation involves evaluating sums of spectral harmonics and inverse real FFTs, analogous to the forward transform. Parallel algorithms are used to compute the FFTs and to compute the vector sums used to approximate the forward and inverse Legendre transforms. Two major alternatives are available for both transforms: distributed algorithms, which use a fixed data decomposition and compute results where they are assigned, and transpose algorithms, which remap the domains to allow the transforms to be calculated sequentially.
This translates to four major parallel algorithms: a) distributed FFT/distributed Legendre transform (LT) b) transpose FFT/distributed LT c) distributed FFT/transpose LT d) transpose FFT/transpose LT Multiple implementations are supported for each type of algorithm, and the assignment of processors to transforms is also determined by input parameters. For example, input parameters specify a logical 2-D processor grid and define the data decomposition of the physical and spectral domains onto this grid. If 16 processors are used, these can be arranged as a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid. This specification determines how many processors are used to calculate each parallel FFT and how many are used to calculate each parallel LT. ------------------------------------------------------------------------------- Skeleton sketch of application: The main program calls INPUT to read problem and algorithm parameters and set up arrays for spectral transformations, and then calls INIT to set up the test case parameters. Routines ERRANL and NRGTCS are called once before the main timestepping loop for error normalization, once after the main timestepping for calculating energetics data and errors, and periodically during the timestepping, as requested. The prognostic fields are initialized using routine ANLYTC, which provides the analytic solution. Each call to STEP advances the computed fields by a timestep DT. Timing logic surrounds the timestepping loop, so the initialization phase is not timed. Also, a fake timestep is calculated before beginning timing to eliminate the first time "paging" effect currently seen on the Intel Paragon systems. STEP computes the first two time levels by two semi-implicit timesteps; normal time-stepping is by a centered leapfrog-scheme. STEP calls COMP1, which choses between an explicit numerical algorithm, a semi-implicit algorithm, and a simplified algorithm associated with solving the advection equation, one of the embedded test cases. The numerical algorithm used is an input parameter. The basic outline of each timestep is the following: 1) Evaluate non-linear product and forcing terms. 2) Fourier transform non-linear terms in place as a block transform. 3) Compute and update divergence, geopotential, and vorticity spectral coefficients. (Much of the calculation of the time update is "bundled" with the Legendre transform.) 4) Compute velocity fields and transform divergence, geopotential, and vorticity back to gridpoint space using a) an inverse Legendre transform and associated computations and b) an inverse real block FFT. PSTSWM has "fictitious" vertical levels, and all computations are duplicated on the different levels, potentially significantly increasing the granularity of the computation. (The number of vertical levels is an input parameter.) For error analysis, a single vertical level is extracted and analyzed. ------------------------------------------------------------------------------- Brief description of I/O behavior: Processor 0 reads in the input parameters and broadcasts them to the rest of the processors. Processor 0 also receives the error analysis and timing results from the other processors and writes them out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : The processors are treated as a logical 2-D grid. 
There are 3 domains to be distributed:
a) physical domain: tensor product longitude-latitude grid
b) Fourier domain: tensor product wavenumber-latitude grid
c) spectral domain: triangular array, where each column contains the spectral coefficients associated with a given wavenumber. The larger the wavenumber is, the shorter the column is.
An unordered FFT is used, and the Fourier and spectral domains use the "unordered" permutation when the data is being distributed.
I) distributed FFT/distributed LT
1) The tensor-product longitude-latitude grid is mapped onto the processor grid by assigning a block of contiguous longitudes to each processor column and by assigning one or two blocks of contiguous latitudes to each processor row. The vertical dimension is not distributed.
2) After the FFT, the subsequent wavenumber-latitude grid is similarly distributed over the processor grid, with a block of the permuted wavenumbers assigned to each processor column.
3) After the LT, the wavenumbers are distributed as before and the spectral coefficients associated with any given wavenumber are either distributed evenly over the processors in the column containing that wavenumber, or are duplicated over the column. What happens is a function of the particular distributed LT algorithm used.
II) transpose FFT/distributed LT
1) same as in (I)
2) Before the FFT, the physical domain is first remapped to a vertical layer-latitude decomposition, with a block of contiguous vertical layers assigned to each processor column and the longitude dimension not distributed. After the transform, the vertical level-latitude grid is distributed as before, and the wavenumber dimension is not distributed.
3) After the LT, the spectral coefficients for a given vertical layer are either distributed evenly over the processors in a column, or are duplicated over that column. What happens is a function of the particular distributed LT algorithm used.
III) distributed FFT/transpose LT
1) same as (I)
2) same as (I)
3) Before the LT, the wavenumber-latitude grid is first remapped to a wavenumber-vertical layer decomposition, with a block of contiguous vertical layers assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
IV) transpose FFT/transpose LT
1) same as (I)
2) same as (II)
3) Before the LT, the vertical level-latitude grid is first remapped to a vertical level-wavenumber decomposition, with a block of the permuted wavenumbers now assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The distribution is a function of the problem size (longitude, latitude, vertical levels), the logical processor grid (PX, PY), and the algorithm (transpose vs. distributed for FFT and LT). ------------------------------------------------------------------------------- Brief description of load balance behavior : The load is fairly well balanced.
If PX and PY evenly divide the number of longitudes, latitudes, and vertical levels, then all load imbalances are due to the unequal distribution of spectral coefficients. As described above, the spectral coefficients are laid out as a triangular array in most runs, where each column corresponds to a different Fourier wavenumber. The wavenumbers are partitioned among the processors in most of the parallel algorithms. Since each column is a different length, a wrap mapping of the columns will approximately balance the load. Instead, the natural "unordered" ordering of the FFT is used with a block partitioning, which does a reasonable job of load balancing without any additional data movement. The load imbalance is quantified in Walker et al. [5]. If PX and PY do not evenly divide the dimensions of the physical domain, then other load imbalances may be as large as a factor of 2 in the worst case. ------------------------------------------------------------------------------- Give parameters that determine the problem size :
MM, NN, KK - specifies the number of Fourier wavenumbers and the spectral truncation used. For a triangular truncation, MM = NN = KK.
NLON, NLAT, NVER - number of longitudes, latitudes, and vertical levels. There are required relationships between NLON, NLAT, and NVER, and between these and MM. These relationships are checked in the code. We will also provide a selection of input files that specify legal (and interesting) problems.
DT - timestep (in seconds). (Must be small enough to satisfy the Courant stability condition. Code warns if too large, but does not abort.)
TAUE - end of model run (in hours)
------------------------------------------------------------------------------- Give memory as function of problem size : Executable size is determined at compile time by setting the parameter COMPSZ in params.i. Per node memory requirements are approximately (in REALs)
associated Legendre polynomial values: MM*MM*NLAT/PX*PY
physical grid fields: 8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 3*MM*MM*NVER/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 3*MM*MM*NVER/PX
work space: 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX
where BUFS1 and BUFS2 are input parameters (number of communication buffers). BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY. In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory requirements are approximately:
(2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
or
(2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX)
------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : for a serial run per timestep (very rough):
nonlinear terms: 10*NLON*NLAT*NVER
forward FFT: 40*NLON*NLAT*NVER*LOG2(NLON)
forward LT and time update: 48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
inverse LT and calculation of velocities: 20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
inverse FFT: 25*NLON*NLAT*NVER*LOG2(NLON)
Using standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1): approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep. For a total run, multiply by TAUE/DT.
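As an added cross-check, not part of the original submission: for the T21 case given below (M = MM = 21, DT = 4800 s, TAUE = 120 h, i.e. 90 timesteps), the formula gives approximately 460*9261 + 348*9261*4.4 + 24*194481, or roughly 2.3 x 10**7 flops per timestep, and hence about 2 x 10**9 flops for the full run, consistent with the serial T21 figure quoted below.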
------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : This is a function of the algorithm chosen.
I) transpose FFT
 a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
    2*(PX-1) steps, D volume
    or 2*LOG2(PX) steps, D*LOG2(PX) volume
II) distributed FFT
 a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
    2*LOG2(PX) steps, D*LOG2(PX) volume
III) transpose LT
 a) forward LT: let D = 8*NLON*NLAT*NVER/(PX*PY)
    2*(PY-1) steps, D volume
    or 2*LOG2(PY) steps, D*LOG2(PY) volume
 b) inverse LT: let D = (3/2)*(MM**2)*NVER/(PX*PY)
    (PY-1) steps, D volume
    or LOG2(PY) steps, D*PY volume
IV) distributed LT
 a) forward + inverse LT: let D = 3*(MM**2)*NVER/(PX*PY)
    2*(PY-1) steps, D*PY volume
    or 2*LOG2(PY) steps, D*PY volume
These are per timestep costs. Multiply by TAUE/DT for total communication overhead. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : Standard input files will be provided for
T21: MM=KK=NN=21     T42: MM=KK=NN=42     T85: MM=NN=KK=85
     NLON=32              NLON=64              NLON=128
     NLAT=64              NLAT=128             NVER=256
     NVER=8               NVER=16              NVER=32
     ICOND=2              ICOND=2              ICOND=2
     DT=4800.0            DT=2400.0            DT=1200.0
     TAUE=120.0           TAUE=120.0           TAUE=120.0
These are 5 day runs of the "benchmark" case specified in Williamson, et al [3]. Flops and memory requirements for serial runs are as follows (approx.):
T21:    500,000 REALs        2,000,000,000 flops
T42:  4,000,000 REALs       45,000,000,000 flops
T85: 34,391,000 REALs    1,000,000,000,000 flops
Both memory and flops scale well, so, for example, the T42 run fits in approx. 4MB of memory for a 4 processor run. But different algorithms and different aspect ratios of the processor grid use different amounts of memory. ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count by hand (looking primarily at inner loops, but eliminating common subexpressions that compiler is expected to find). ------------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:19:48 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id KAA14012; Tue, 22 Mar 1994 10:19:45 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10903; Tue, 22 Mar 1994 10:19:27 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:19:17 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10892; Tue, 22 Mar 1994 10:19:14 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA23268; Tue, 22 Mar 1994 10:18:26 -0500 Message-Id: <9403221518.AA23268@rios2.epm.ornl.gov> To: spb@epcc.ed.ac.uk Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 10:18:26 -0500 From: "David W. Walker" Dear Dr. Booth, Thank you for submitting the SOLVER code for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. I would be grateful if you would arrange for the source code, input, and output files to be sent to me.
To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of the following papers mentioned in your submission describing the sequential and parallel codes (if available). Also the users guide if there is one. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. If the above files are too large to email to me, please let me know if there is an anonymous ftp site where I can copy them from. Best Regards, David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- ------------------------------------------------------------------------- PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM To submit a compact application to the ParkBench suite you must follow the following procedure: 1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite. 2. If ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files documents and papers together constitute your application package. Your application package should be sent to: David Walker Oak Ridge National Laboratory Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 (615) 574-7401/0680 (phone/fax) walker@msr.epm.ornl.gov The street address is "Bethal Valley Road" if Fedex insists on this. The subcommittee will then make a final decision on whether to include your application in the ParkBench suite. 3. If your application is approved for inclusion in the ParkBench suite you (or some authorized person from your organization) will be asked in complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package. ------------------------------------------------------------------------------- Name of Program : SOLVER : ------------------------------------------------------------------------------- Submitter's Name : Stephen P. 
Booth Submitter's Organization: UKQCD collaboration Submitter's Address : EPCC The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland Submitter's Telephone # : +44 (0)31 650 5746 Submitter's Fax # : +44 (0)31 622 4712 Submitter's Email : spb@epcc.ed.ac.uk ------------------------------------------------------------------------------- Cognizant Expert(s) : Dr S.P.Booth CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5746 CE's Fax # : +44 (0)31 622 4712 CE's Email : spb@epcc.ed.ac.uk Cognizant Expert(s) : Dr R.D. Kenway CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5245 CE's Fax # : +44 (0)31 622 4712 CE's Email : rdk@epcc.ed.ac.uk ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : S.Booth is prepared to respond quickly to questions and bug reports. We have a strong interest in the portability and performance of this code. ------------------------------------------------------------------------------- Major Application Field : Lattice gauge theory Application Subfield(s) : QCD ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : SOLVER is part of an ongoing software development exercise carried out by UKQCD (the United Kingdom Quantum Chromo-Dynamics collaboration) to develop a new generation of simulation codes. The current generation of codes was highly tuned for a particular machine architecture, so a software development exercise was started to design and develop a set of portable codes. This code was developed by S.Booth and N.Stanford of the University of Edinburgh during the course of 1993. SOLVER is a benchmark code derived from the codes used to generate quark propagators. It is designed to benchmark and validate the computational sections of this operation. It differs from the production code in that it self-initialises to non-trivial test data rather than performing file access. This is because there is no accepted standard for parallel file access. The benchmark was originally developed as part of a national UK procurement exercise. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : The code may be freely distributed for benchmarking purposes but the code remains the property of UKQCD and we ask to be contacted if anyone wishes to use it as an application code. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: All floating point numbers are defined as macros (either Fpoint or Dpoint). The majority of the variables are Fpoint. Dpoint is only used for accumulation values that may require higher precision. This allows the precision of the program to be changed easily. For small and intermediate problem sizes 4 byte Fpoints and 8 byte Dpoints should be sufficient. For large problems higher precision may be required.
INTEGERS must be large enough to hold the number of sites allocated to a processor (4 bytes is almost certainly sufficient). The COMPLEX type is not used.
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module level, or lower) :
Documentation exists for all program routines except some low-level routines local to a single source file.
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :
-------------------------------------------------------------------------------
Other relevant research papers:
-------------------------------------------------------------------------------
Application available in the following languages (give message passing system used, if applicable, and machines application runs on) :
Two versions of the application were developed in parallel:
1) An HPF version (with both CMF and HPF directives).
2) A message passing version.
The message passing version uses ANSI F77 with the following extensions:
a) CPP is used for include files, some simple macros, and build-time conditionals.
b) The F77 restrictions on variable names are not adhered to, though the authors have tools to convert the code to conform.
All of the message passing operations are confined to a small number of routines. These routines were designed to be implementable in as many different message passing systems as possible. Current versions are:
1) fake - converts the program to a single-processor code.
2) PARMACS - the original parallel version.
3) PVM - under development.
-------------------------------------------------------------------------------
Total number of lines in source code: 15567
Number of lines excluding comments : 10679
Size in bytes of source code : 432398
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
None
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
standard output: formatted text
-------------------------------------------------------------------------------
Brief, high-level description of what application does:
The application generates quark propagators from a background gauge configuration and a fermionic source. This is equivalent to solving M psi = source, where psi is the quark propagator and M (a function operating on psi) depends on the gauge fields. The benchmark performs a cut-down version of this operation.
-------------------------------------------------------------------------------
Main algorithms used:
Conjugate gradient least norm with red-black pre-conditioning.
-------------------------------------------------------------------------------
Skeleton sketch of application:
The benchmark code initialises the gauge field to a unit gauge configuration. (The results for a unit gauge can be calculated analytically, allowing a check on the results.) A gauge transformation is then applied to the gauge field. A unit gauge field consists only of zeros and ones; applying a gauge transformation generates non-trivial values. Quantities corresponding to physical observables should be unchanged by such a transformation.
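(Illustration only: "conjugate gradient least norm" is read here as conjugate gradient applied to the normal equations, i.e. solving (A^T A) X = A^T B. The self-contained toy below shows only the general shape of that iteration for a small dense real matrix; the actual UKQCD solver applies the fermion matrix M matrix-free over the red-black ordered lattice, which is not shown.)

      PROGRAM CGNRX
C     Illustrative conjugate gradient on the normal equations
C     (A^T A) X = A^T B for a small dense matrix A.  This sketch
C     shows the general iteration only, not the UKQCD solver:
C     there the operator is the fermion matrix M, applied
C     matrix-free over a red-black ordered lattice.
      INTEGER N, MAXIT, I, IT
      PARAMETER (N = 3, MAXIT = 50)
      REAL*8 A(N,N), B(N), X(N), R(N), Z(N), P(N), W(N)
      REAL*8 ZZ, ZZNEW, ALPHA, BETA, TOL
      DATA A / 4.0D0, 1.0D0, 0.0D0,
     &         1.0D0, 3.0D0, 1.0D0,
     &         0.0D0, 1.0D0, 2.0D0 /
      DATA B / 1.0D0, 2.0D0, 3.0D0 /
      TOL = 1.0D-12
C     X = 0, R = B - A*X = B, Z = A^T * R, P = Z
      DO 10 I = 1, N
         X(I) = 0.0D0
         R(I) = B(I)
   10 CONTINUE
      CALL ATMUL(N, A, R, Z)
      ZZ = 0.0D0
      DO 20 I = 1, N
         P(I) = Z(I)
         ZZ = ZZ + Z(I)*Z(I)
   20 CONTINUE
      DO 60 IT = 1, MAXIT
         IF (ZZ .LT. TOL) GO TO 70
C        W = A*P, ALPHA = (Z,Z) / (W,W)
         CALL AMUL(N, A, P, W)
         ALPHA = 0.0D0
         DO 30 I = 1, N
            ALPHA = ALPHA + W(I)*W(I)
   30    CONTINUE
         ALPHA = ZZ / ALPHA
         DO 40 I = 1, N
            X(I) = X(I) + ALPHA*P(I)
            R(I) = R(I) - ALPHA*W(I)
   40    CONTINUE
C        Z = A^T * R, BETA = (Znew,Znew)/(Zold,Zold), P = Z + BETA*P
         CALL ATMUL(N, A, R, Z)
         ZZNEW = 0.0D0
         DO 45 I = 1, N
            ZZNEW = ZZNEW + Z(I)*Z(I)
   45    CONTINUE
         BETA = ZZNEW / ZZ
         ZZ = ZZNEW
         DO 50 I = 1, N
            P(I) = Z(I) + BETA*P(I)
   50    CONTINUE
   60 CONTINUE
   70 WRITE(*,*) 'SOLUTION ', (X(I), I = 1, N)
      END

      SUBROUTINE AMUL(N, A, V, AV)
C     AV = A * V
      INTEGER N, I, J
      REAL*8 A(N,N), V(N), AV(N)
      DO 20 I = 1, N
         AV(I) = 0.0D0
         DO 10 J = 1, N
            AV(I) = AV(I) + A(I,J)*V(J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

      SUBROUTINE ATMUL(N, A, V, ATV)
C     ATV = A^T * V
      INTEGER N, I, J
      REAL*8 A(N,N), V(N), ATV(N)
      DO 20 I = 1, N
         ATV(I) = 0.0D0
         DO 10 J = 1, N
            ATV(I) = ATV(I) + A(J,I)*V(J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END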
In the application code the gauge field would instead have been read in from disk. The source field is initialised to a point source (a single non-zero point on one lattice site). An iterative solver is called to generate the quark propagator. The solver routine also generates timing information. In the application code this would then be dumped to disk. In the benchmark we use the quark propagator to generate a physically significant quantity (the pion propagator). This generates a single real number for each timeslice of the lattice. These values are printed to standard out. This procedure requires a large number of iterations. For benchmarking we are only interested in the time per iteration and some check on the validity of the results. We therefore usually perform only a fixed number of iterations (say 50) to generate accurate timing information and verify the results by comparison with other machines.
-------------------------------------------------------------------------------
Brief description of I/O behaviour:
Unless an error occurs, a single processor outputs to standard out.
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
A spatial decomposition is used to distribute the 4-D arrays over a 4-D grid of processors. Each dimension is distributed independently. The program supports non-regular decomposition; e.g. a lattice of width 22 will be distributed across a processor-grid of width 4 as (6, 6, 5, 5).
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
Lattice size: NX NY NZ NT; processor grid: NPX NPY NPZ NPT
-------------------------------------------------------------------------------
Brief description of load balance behavior :
Load balancing depends only on the distribution: if the lattice size is exactly divisible by the processor grid size, all processors will have the same workload. In practice it is often useful to trade load balancing for a larger number of processors.
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
Lattice size NX NY NZ NT; the problem size is NX*NY*NZ*NT.
-------------------------------------------------------------------------------
Give memory as function of problem size :
In a production environment there are build-time parameters that set the array sizes, and problem/machine sizes can be set at runtime. When creating a benchmark program it seemed less confusing to set lattice and processor-grid sizes at build time and derive all other quantities from them. The appropriate parameters for memory use are Max_body (the maximum number of data points per processor) and Max_bound (the maximum number of data points on a single boundary between two processors). If LX LY LZ LT are the local lattice sizes, obtained by dividing the lattice size by the processor grid size and rounding up to the nearest integer, then:
Max_body  = (LX*LY*LZ*LT)/2
Max_bound = MAX( LX*LY*LZ/2, LY*LZ*LT/2, LX*LZ*LT/2, LX*LY*LT/2 )
The code contains a number of build-time switches for variations in the implementation that may be beneficial on some machines.
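(Illustration only, not benchmark source: the small program below shows one simple splitting rule that reproduces the (6, 6, 5, 5) example quoted above, and evaluates the Max_body/Max_bound formulas for a 24^3*48 lattice on a hypothetical 2*2*2*2 processor grid; the decomposition routine actually used by SOLVER may differ.)

      PROGRAM DECOMP
C     Illustrative sketch: one simple way to split a lattice
C     dimension of width N over NP processors so that the extents
C     differ by at most one, e.g. width 22 over 4 gives (6,6,5,5),
C     plus the Max_body / Max_bound formulas quoted in the form,
C     evaluated for a 24^3*48 lattice on a 2*2*2*2 processor grid.
      INTEGER NX, NY, NZ, NT, NPX, NPY, NPZ, NPT
      INTEGER LX, LY, LZ, LT, MBODY, MBOUND, I, IEXT
      PARAMETER (NX=24, NY=24, NZ=24, NT=48)
      PARAMETER (NPX=2, NPY=2, NPZ=2, NPT=2)
C     Extents for the width-22 example over 4 processors
      DO 10 I = 0, 3
         IEXT = 22/4
         IF (I .LT. MOD(22,4)) IEXT = IEXT + 1
         WRITE(*,*) 'PROCESSOR ', I, ' GETS WIDTH ', IEXT
   10 CONTINUE
C     Local sizes = lattice size / grid size, rounded up
      LX = (NX + NPX - 1)/NPX
      LY = (NY + NPY - 1)/NPY
      LZ = (NZ + NPZ - 1)/NPZ
      LT = (NT + NPT - 1)/NPT
C     Memory parameters as given in the form (the /2 presumably
C     reflects the red-black site ordering)
      MBODY  = (LX*LY*LZ*LT)/2
      MBOUND = MAX(LX*LY*LZ/2, LY*LZ*LT/2, LX*LZ*LT/2, LX*LY*LT/2)
      WRITE(*,*) 'LX LY LZ LT = ', LX, LY, LZ, LT
      WRITE(*,*) 'MAX_BODY    = ', MBODY
      WRITE(*,*) 'MAX_BOUND   = ', MBOUND
      END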
The memory usage depends on these switches but typical values are: 108 * Max_body + 36 * Max_bound Fpoints 16 * (Max_body + Max_bound) INTEGERS ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Each iteration performs 2760 floating point operations per lattice site. ie. 50 iteration using a 24^3*48 lattice = 9.16e+10 floating point operations. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : For each iteration every processor sends 24 messages to each of its 8 neighbours each message contains one floating point number for each lattice point in the common boundary. Two global sum operations are also performed for each iteration. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : 18^3*36 2.90e+10 fp operations 24^3*48 9.16e+10 fp operations 36^3*72 4.64e+11 fp operations ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : count operations in each loop by hand. The code contains a counter to sum these values. ------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Mon Mar 13 08:44:32 1995 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA15646; Mon, 13 Mar 1995 08:44:31 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA14363; Mon, 13 Mar 1995 08:45:00 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Mon, 13 Mar 1995 08:44:57 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from vax.darpa.mil by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA14339; Mon, 13 Mar 1995 08:44:55 -0500 Received: from next63.darpa.mil (next63.darpa.mil) by vax.darpa.mil (5.65c/5.61+local-5) id ; Mon, 13 Mar 1995 08:44:53 -0500 Received: by next63.darpa.mil (NX5.67d/NeXT-2.0) id AA00427; Mon, 13 Mar 95 08:43:24 -0500 Message-Id: <9503131343.AA00427@ next63.darpa.mil > Content-Type: text/plain Mime-Version: 1.0 (NeXT Mail 3.3 v118.2) Received: by NeXT.Mailer (1.118.2) From: Jose Munoz Date: Mon, 13 Mar 95 08:43:22 -0500 To: pbwg-compactapp@CS.UTK.EDU Subject: realtime? Hello, I'm interested in identifying a set of realtime benchmarks for embedded appls. Is this a good place to start (I thinkk so)? Im in the process of dl a copy of the report (as I write) and hopefully will have more focused questions. In general I'm interested in (1) has a benchmark std. been def'd, (2) are metrics id'd, (3) how is the underlying hw id'd? Thanks. Jose --- <<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>> < Dr. Jose L. Munoz | email: jmunoz@arpa.mil > < ARPA/CSTO | > < 3701 N. Fairfax Dr. 
| Phone: (703)696-4468 > < Arlington, VA 22203-1714 | FAX: (703)696-2202 > <<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From owner-parkbench-compactapp@CS.UTK.EDU Mon Mar 13 12:10:57 1995 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA19933; Mon, 13 Mar 1995 12:10:56 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA25609; Mon, 13 Mar 1995 11:08:01 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Mon, 13 Mar 1995 11:07:59 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id LAA25596; Mon, 13 Mar 1995 11:07:56 -0500 Received: (from walker@localhost) by rios2.EPM.ORNL.GOV (8.6.10/8.6.10) id LAA18850; Mon, 13 Mar 1995 11:07:20 -0500 From: David Walker Message-Id: <199503131607.LAA18850@rios2.EPM.ORNL.GOV> To: Jose Munoz Cc: pbwg-compactapp@CS.UTK.EDU Subject: Re: realtime? In-reply-to: (Your message of Mon, 13 Mar 95 08:43:22 EST.) <9503131343.AA00427@ next63.darpa.mil > Date: Mon, 13 Mar 95 11:07:19 -0500 Jose, ParkBench is a proposed set of standard benchmarks, but has not be officially sanctioned by any standrads body such as ISO. Several metrics, detailed in the Parkbench report have been identified. For more information, please take a look at the www page at: http://www.epm.ornl.gov/~walker/parkbench/ Regards, David -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | | WEB: http://www.epm.ornl.gov/~walker/ | -------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Fri Sep 8 16:36:42 1995 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA14450; Fri, 8 Sep 1995 16:36:42 -0400 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA04473; Fri, 8 Sep 1995 16:36:21 -0400 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Fri, 8 Sep 1995 16:36:20 EDT Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from franklin.seas.gwu.edu by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA04465; Fri, 8 Sep 1995 16:36:18 -0400 Received: from felix.seas.gwu.edu (abdullah@felix.seas.gwu.edu [128.164.9.3]) by franklin.seas.gwu.edu (v8) with ESMTP id QAA10099 for ; Fri, 8 Sep 1995 16:36:16 -0400 Received: (from abdullah@localhost) by felix.seas.gwu.edu (8.6.12/8.6.12) id QAA07113 for parkbench-compactapp@cs.utk.edu; Fri, 8 Sep 1995 16:36:12 -0400 Date: Fri, 8 Sep 1995 16:36:12 -0400 From: Abdullah Meajil Message-Id: <199509082036.QAA07113@felix.seas.gwu.edu> To: parkbench-compactapp@CS.UTK.EDU Subject: subscribe subscribe From owner-parkbench-compactapp@CS.UTK.EDU Fri Jun 28 10:51:58 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA09606; Fri, 28 Jun 1996 10:51:57 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA20519; Fri, 28 Jun 1996 10:51:17 -0400 Received: from convex.convex.com (convex.convex.com [130.168.1.1]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA20506; Fri, 28 Jun 1996 10:51:07 -0400 Received: from bach.convex.com by convex.convex.com (8.6.4.2/1.35) id JAA01420; Fri, 28 Jun 1996 09:50:28 -0500 Received: from localhost by bach.convex.com (8.6.4/1.28) id JAA09161; Fri, 28 
Jun 1996 09:50:27 -0500 From: hari@bach.convex.com (Harikumar Sivaraman) Message-Id: <199606281450.JAA09161@bach.convex.com> Subject: Bug report on COMMS3.f in PARKBENCH2.0 To: parkbench-comments@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Date: Fri, 28 Jun 96 9:50:26 CDT Cc: romero@bach.convex.com (Paco Romero) X-Mailer: ELM [version 2.3 PL11] DISCLAIMER: The contents of this mail are not an official HP position. I do not speak for HP. The COMMS3 benchmark in PARKBENCH2.0 is in apparent violation of the specifications in the MPI standard. The benchmark attempts to do an MPI_RECV into the same buffer on which it has posted an MPI_ISEND before it does an MPI_WAIT. The relevant code fragment is as below: COMMS3 (This code fragments applies in the case of two processors) ------ CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... CALL MPI_WAIT(request(NSLAVE), status, ierr) COMMS3 (Multiple processors) ------ do i = 1, #processors CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... enddo // The MPI_ISEND statements in the loop violate the MPI standard since the buffer "A" // is reused inside the loop. do i = 1, #processors CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... enddo do i = 1, #processors CALL MPI_WAIT(request(NSLAVE), status, ierr) enddo Comments: --------- The MPI standards (page 40, last but one paragraph) says "the sender should not access any part of the send buffer after a nonblocking send operation is called, until the send completes." Page 41, line 1 of the MPI standards says "the functions MPI_WAIT and MPI_TEST are used to complete a nonblocking communication". Clearly the reuse of buffer "A" in the code fragments above is in violation of the standard. ------- H. Sivaraman (214) 497 - 4374 HP; 3000 Waterview Pk.way Dallas, TX - 75080 From owner-parkbench-compactapp@CS.UTK.EDU Mon Sep 9 20:31:06 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id UAA24848; Mon, 9 Sep 1996 20:31:05 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id UAA10076; Mon, 9 Sep 1996 20:29:21 -0400 Received: from convex.convex.com (convex.convex.com [130.168.1.1]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id UAA10069; Mon, 9 Sep 1996 20:29:17 -0400 Received: from brittany.rsn.hp.com by convex.convex.com (8.6.4.2/1.35) id PAA25214; Mon, 9 Sep 1996 15:42:49 -0500 Received: from localhost by brittany.rsn.hp.com with SMTP (1.38.193.4/16.2) id AA16691; Mon, 9 Sep 1996 15:39:52 -0500 Sender: sercely@convex.convex.com Message-Id: <32348098.3BF5@convex.com> Date: Mon, 09 Sep 1996 15:39:52 -0500 From: Ron Sercely Organization: Hewlett-Packard Convex Technology Center X-Mailer: Mozilla 2.0 (X11; I; HP-UX A.09.05 9000/710) Mime-Version: 1.0 To: parkbench-lowlevel@CS.UTK.EDU Cc: wallach@convex.convex.com, romero@convex.convex.com, sercely@convex.convex.com Subject: comms2 and comms3 bugs, mpi release Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit HP/Convex wants to release lowlevel numbers in two weeks, but we are trying to figure out what to do about the bugs we have reported in these codes. Options are: Submitting results without these tests HP/Convex Re-writing the benchmarks to "do the right thing" other ? I would appreciate a phone call to discuss these issues. 
-- Ron Sercely 214.497.4667 HP/CXTC Toolsmith From owner-parkbench-compactapp@cs.utk.edu Tue Sep 10 07:23:38 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA00602; Tue, 10 Sep 1996 07:23:36 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA24084; Tue, 10 Sep 1996 05:20:31 -0400 Received: from postoffice.npac.syr.edu (postoffice.npac.syr.edu [128.230.7.30]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id FAA24037; Tue, 10 Sep 1996 05:20:22 -0400 Received: from yosemite (pc280.sis.port.ac.uk [148.197.205.60]) by postoffice.npac.syr.edu (8.7.5/8.7.1) with SMTP id FAA00584; Tue, 10 Sep 1996 05:13:39 -0400 (EDT) From: Mark Baker Date: Tue, 10 Sep 96 10:10:24 Subject: RE: comms2 and comms3 bugs, mpi release To: parkbench-lowlevel@cs.utk.edu, Ron Sercely Cc: wallach@convex.convex.com, romero@convex.convex.com, sercely@convex.convex.com, erich@cs.utk.edu, dongarra@cs.utk.edu, ajgh@ecs.soton.ac.uk X-PRIORITY: 3 (Normal) X-Mailer: Chameleon notFound, TCP/IP for Windows, NetManage Inc. Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=us-ascii Ron, Ian Glendenning and I produced the first MPI port of the low-level codes for Parkbench approximately a year ago. Erich Strohmaier (who works for Jack Dongarra at UTK) has been managing and maintaining all the parkbench codes since then. I would suggest he reply to you on the subject. If you do not get a reply I am willing to help. Regards Mark On Mon, 09 Sep 1996 15:39:52 -0500 Ron Sercely wrote: >HP/Convex wants to release lowlevel numbers in two weeks, but we are >trying to >figure out what to do about the bugs we have reported in these codes. > >Options are: >Submitting results without these tests >HP/Convex Re-writing the benchmarks to "do the right thing" >other ? > >I would appreciate a phone call to discuss these issues. >-- >Ron Sercely >214.497.4667 > >HP/CXTC Toolsmith > ------------------------------------- Dr Mark Baker DIS, University of Portsmouth, Hants, UK E-mail: mab@npac.syr.edu Date: 10/09/96 - Time: 10:10:24 URL http://www.npac.syr.edu/ ------------------------------------- From owner-parkbench-compactapp@cs.utk.edu Tue Sep 10 07:27:37 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA00650; Tue, 10 Sep 1996 07:27:37 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id EAA15736; Tue, 10 Sep 1996 04:02:25 -0400 Received: from beech.soton.ac.uk (beech.soton.ac.uk [152.78.128.78]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id DAA15421; Tue, 10 Sep 1996 03:59:42 -0400 Received: from bright.ecs.soton.ac.uk (bright.ecs.soton.ac.uk [152.78.64.201]) by beech.soton.ac.uk (8.6.12/hub-8.5a) with SMTP id IAA22959; Tue, 10 Sep 1996 08:57:52 +0100 Received: from landlord.ecs.soton.ac.uk by bright.ecs.soton.ac.uk; Tue, 10 Sep 96 08:57:21 BST From: Vladimir Getov Received: from caesar.ecs.soton.ac.uk by landlord.ecs.soton.ac.uk; Tue, 10 Sep 96 08:59:09 BST Date: Tue, 10 Sep 96 08:58:36 BST Message-Id: <2546.9609100758@caesar.ecs.soton.ac.uk> To: parkbench-comm@cs.utk.edu, parkbench-lowlevel@cs.utk.edu, sercely@convex.convex.com Subject: Re: comms2 and comms3 bugs, mpi release Cc: wallach@convex.convex.com, romero@convex.convex.com Hi Ron, Are you talking about the same or similar bugs as the ones reported for the comms3 benchmark by Harikumar Sivaraman at the end of June (see the included message below)? -Vladimir Getov p.s. 
Apologies if you receive this message more than once - I have included parkbench-comm@CS.UTK.EDU on the "To:" line but do not know the cross membership. > > HP/Convex wants to release lowlevel numbers in two weeks, but we are > trying to > figure out what to do about the bugs we have reported in these codes. > > Options are: > Submitting results without these tests > HP/Convex Re-writing the benchmarks to "do the right thing" > other ? > > I would appreciate a phone call to discuss these issues. > -- > Ron Sercely > 214.497.4667 > > HP/CXTC Toolsmith > ____________________________ included message _______________________ >From owner-parkbench-compactapp@CS.UTK.EDU Fri Jun 28 15:54:32 1996 From: hari@bach.convex.com (Harikumar Sivaraman) Subject: Bug report on COMMS3.f in PARKBENCH2.0 To: parkbench-comments@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Date: Fri, 28 Jun 96 9:50:26 CDT Cc: romero@bach.convex.com (Paco Romero) X-Mailer: ELM [version 2.3 PL11] Content-Length: 1559 X-Status: DISCLAIMER: The contents of this mail are not an official HP position. I do not speak for HP. The COMMS3 benchmark in PARKBENCH2.0 is in apparent violation of the specifications in the MPI standard. The benchmark attempts to do an MPI_RECV into the same buffer on which it has posted an MPI_ISEND before it does an MPI_WAIT. The relevant code fragment is as below: COMMS3 (This code fragments applies in the case of two processors) ------ CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... CALL MPI_WAIT(request(NSLAVE), status, ierr) COMMS3 (Multiple processors) ------ do i = 1, #processors CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... enddo // The MPI_ISEND statements in the loop violate the MPI standard since the buffer "A" // is reused inside the loop. do i = 1, #processors CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... enddo do i = 1, #processors CALL MPI_WAIT(request(NSLAVE), status, ierr) enddo Comments: --------- The MPI standards (page 40, last but one paragraph) says "the sender should not access any part of the send buffer after a nonblocking send operation is called, until the send completes." Page 41, line 1 of the MPI standards says "the functions MPI_WAIT and MPI_TEST are used to complete a nonblocking communication". Clearly the reuse of buffer "A" in the code fragments above is in violation of the standard. ------- H. Sivaraman (214) 497 - 4374 HP; 3000 Waterview Pk.way Dallas, TX - 75080 From owner-parkbench-compactapp@CS.UTK.EDU Tue Sep 10 08:46:41 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA01821; Tue, 10 Sep 1996 08:46:40 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA13971; Tue, 10 Sep 1996 08:41:06 -0400 Received: from rudolph.cs.utk.edu (RUDOLPH.CS.UTK.EDU [128.169.92.87]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA13960; Tue, 10 Sep 1996 08:40:59 -0400 From: Erich Strohmaier Received: by rudolph.cs.utk.edu (cf v2.11c-UTK) id IAA13912; Tue, 10 Sep 1996 08:40:58 -0400 Date: Tue, 10 Sep 1996 08:40:58 -0400 Message-Id: <199609101240.IAA13912@rudolph.cs.utk.edu> To: parkbench-lowlevel@CS.UTK.EDU, sercely@convex.convex.com Subject: Re: comms2 and comms3 bugs, mpi release Cc: romero@convex.convex.com, wallach@convex.convex.comh Ron, We fixed the two bugs you mentioned and we are currently testing the new codes. The new version should be out by end of this week. If you would like to get it earlier, please let me know. 
Best Regards Erich =========================================================================== Erich Strohmaier email: erich@cs.utk.edu Department of Computer Science phone: ++ 1 (423) 974 0293 104 Ayres Hall fax : ++ 1 (423) 974 8296 Knoxville TN, 37996 - USA http://www.cs.utk.edu/~erich/ From owner-parkbench-compactapp@CS.UTK.EDU Tue Sep 10 18:13:11 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id SAA06946; Tue, 10 Sep 1996 18:13:11 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id SAA05907; Tue, 10 Sep 1996 18:12:17 -0400 Received: from VNET.IBM.COM (vnet.ibm.com [199.171.26.4]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id SAA05894; Tue, 10 Sep 1996 18:12:13 -0400 Message-Id: <199609102212.SAA05894@CS.UTK.EDU> Received: from PKEDVM9 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 2875; Tue, 10 Sep 96 18:12:14 EDT Date: Tue, 10 Sep 96 18:11:11 EDT From: "C. George Hsi" To: parkbench-lowlevel@CS.UTK.EDU Hi, could you please add my name to the ParkBench Low-Level mailing list? I work in the RS/6000 SP performance measurement area at IBM Poughkeepsie, and have been involved in using the ParkBench Low-Level code recently. My address is: hsi@pkedvm9.vnet.ibm.com Thanks for your help, C. George Hsi From owner-parkbench-compactapp@CS.UTK.EDU Mon Sep 16 15:02:05 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA24616; Mon, 16 Sep 1996 15:02:04 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA17941; Mon, 16 Sep 1996 14:51:47 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id OAA17934; Mon, 16 Sep 1996 14:51:45 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA05937; Mon, 16 Sep 1996 18:49:20 GMT From: "Erich Strohmaier" Message-Id: <9609161449.ZM5935@blueberry.cs.utk.edu> Date: Mon, 16 Sep 1996 14:49:20 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-comm@@CS.UTK.EDU, cs.utk.edu@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Subject: ParKBench Release 2.1 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Hello, The release 2.1 of ParKBench is available at netlib: http://www.netlib.org/parkbench/ It contains the following bug fixes: - Comms2 for MPI made to be a true exchange benchmark using MPI_SENDRECV. - Comms3 for MPI using wild-card and second buffer. - Added missing mpif.f for the MPI2PVM library. - Fixed Makefiles. - make.local.def modifications. - Updated conf/make.def.SP2MPI. - LU Solver fixed though the use of a flag to the Blacs build in the Bmakes. - Addition of the definition for mpi_group_translate_ranks in Bdef.h. - PBLAS bug solved with new BLACS compilation. 
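(Illustration only: the "wild-card and second buffer" fix for comms3 listed above amounts to receiving into a buffer distinct from the one named in the outstanding MPI_ISEND, so the send buffer is not touched before MPI_WAIT completes the send. The self-contained fragment below sketches that pattern for two processes; it is not the actual ParkBench 2.1 code, and the message length and tag are arbitrary.)

      PROGRAM XCHG
C     Illustrative sketch of the corrected comms3-style exchange:
C     the nonblocking send uses buffer A, the receive uses a second
C     buffer B and a wild-card source, and MPI_WAIT completes the
C     send before A is touched again.  Not the ParkBench source.
      INCLUDE 'mpif.h'
      INTEGER IWORD
      PARAMETER (IWORD = 1024)
      DOUBLE PRECISION A(IWORD), B(IWORD)
      INTEGER IERR, RANK, NPROC, OTHER, REQ, I
      INTEGER STATUS(MPI_STATUS_SIZE)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
      IF (NPROC .NE. 2) THEN
         IF (RANK .EQ. 0) WRITE(*,*) 'RUN ON EXACTLY 2 PROCESSES'
         CALL MPI_FINALIZE(IERR)
         STOP
      END IF
      OTHER = 1 - RANK
      DO 10 I = 1, IWORD
         A(I) = DBLE(RANK)
   10 CONTINUE
      CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, OTHER, 0,
     &               MPI_COMM_WORLD, REQ, IERR)
      CALL MPI_RECV(B, IWORD, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE,
     &              0, MPI_COMM_WORLD, STATUS, IERR)
      CALL MPI_WAIT(REQ, STATUS, IERR)
      IF (RANK .EQ. 0) WRITE(*,*) 'RECEIVED B(1) = ', B(1)
      CALL MPI_FINALIZE(IERR)
      END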
Best Regards Erich Strohmaier email: erich@cs.utk.edu From owner-parkbench-compactapp@CS.UTK.EDU Mon Oct 14 14:28:34 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA06896; Mon, 14 Oct 1996 14:28:34 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA07493; Mon, 14 Oct 1996 14:22:58 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id OAA07485; Mon, 14 Oct 1996 14:22:53 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA13307; Mon, 14 Oct 1996 18:20:29 GMT From: "Erich Strohmaier" Message-Id: <9610141420.ZM13305@blueberry.cs.utk.edu> Date: Mon, 14 Oct 1996 14:20:27 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-comm@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Subject: ParkBench Workshop: Tentative Agenda Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on October 31th, 1996. The format of the meeting is: Thursday October 31th 9:00 - 12.00 Full group meeting 12.00 - 1.30 Lunch 1.30 - 5.00 Full group meeting The tentative agenda for the meeting is: 1. Minutes of last meeting Current release: 2. Status report and experience with the current release 3. Examine the results obtained Next release: 4. New HPF Low Level benchmarks 5. New shared memory Low Level benchmarks 6. New performance database design and new benchmark output format 7. Update of GBIS with new Web front-end 8. Report from other benchmark activities ParkBench: 9. Discussion of ParkBench group structure 10. ParkBench Bibliography 11. Status of ParkBench funding Other Activities: 12. Discussion of the Supercomputing'96 activities 13. "Electronic Benchmarking Journal" - status report 14. Miscellaneous 15. Date and venue for next meeting The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. When making arrangements tell the hotel you are associated with the Parallel Benchmarking or ParkBench or Park. The rate about $75.00/night. Hilton Hotel 501 W. Church Street Knoxville, TN Phone: 423-523-2300 ==> Please make your reservation as soon as possible! 
Jack Dongarra Erich Strohmaier From owner-parkbench-compactapp@CS.UTK.EDU Mon Oct 21 16:14:12 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA11230; Mon, 21 Oct 1996 16:14:11 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA21293; Mon, 21 Oct 1996 15:57:23 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id PAA20796; Mon, 21 Oct 1996 15:54:50 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id TAA16003; Mon, 21 Oct 1996 19:52:28 GMT From: "Erich Strohmaier" Message-Id: <9610211552.ZM16001@blueberry.cs.utk.edu> Date: Mon, 21 Oct 1996 15:52:27 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU Subject: ParKBench Workshop Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, All of you who are planning to come to the next meeting --- http://www.netlib.org/parkbench/ --- please send email to us so we can make local arrangements. Thank you very much Erich Strohmaier
From owner-parkbench-compactapp@CS.UTK.EDU Wed Apr 23 16:40:22 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA10091; Wed, 23 Apr 1997 16:40:22 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA02831; Wed, 23 Apr 1997 16:40:25 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA02732; Wed, 23 Apr 1997 16:40:00 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA12213; Wed, 23 Apr 1997 18:36:17 GMT From: "Erich Strohmaier" Message-Id: <9704231436.ZM12211@blueberry.cs.utk.edu> Date: Wed, 23 Apr 1997 14:36:16 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU Subject: ParkBench Committee Meeting - tentative Agenda Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on May 9th, 1997. The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. Hilton Hotel 501 W. Church Street Knoxville, TN Phone: 423-523-2300 When making arrangements tell the hotel you are associated with the 'ParkBench'. The rate about $79.00/night. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. ---------------- The format of the meeting is: Friday May 9th, 1997. 9:00 - 12.00 Full group meeting 12.00 - 1.30 Lunch 1.30 - 5.00 Full group meeting There might be also a joint session with the SPEC/HPG group on Thursday 8th at about 3pm-5pm ---------------- Please send us your comments about the tentative agenda: 1. Minutes of last meeting (MBe) Changes to Current release: 2. Low Level (ES, VG, RS) comms1, comms2, comms3, poly2 3. Linear Algebra (ES) 4. Compact Applications - NPBs (SS, ES) New benchmarks: 5. HPF Low Level benchmarks (MBa) ? 6. New shared memory Low Level benchmarks (MBa) ? 7. New performance database design and new benchmark output format (MBa,VG) ? 8. Update of GBIS with new Web front-end (MBa,VG) Report from other benchmark activities 9. ASCI Benchmark Codes (RS) 10. SPEC (RE) ParkBench: 11. ParkBench Bibliography 12. ParkBench Report 2 Other Activities: 13. Discussion of the ParkBench Workshop 11/12 September, UK 14. "Electronic Benchmarking Journal" - status report - 15. Miscellaneous - 16. Date and venue for next meeting - (MBa) Mark Baker Univ. of Portsmouth (MBe) Michael Berry Univ. of Tennessee (JD) Jack Dongarra Univ. of Tenn./ORNL (RE) Rudi Eigenmann SPEC (VG) Vladimir Getov Univ. of Westminister (TH) Tony Hey Univ. of Southampton (SS) Subhash Saini NASA Ames (RS) Ron Sercely HP/CXTC (ES) Erich Strohmaier Univ. 
of Tennessee Jack Dongarra Erich Strohmaier From owner-parkbench-compactapp@CS.UTK.EDU Wed Apr 23 19:11:02 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id TAA12012; Wed, 23 Apr 1997 19:11:01 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id TAA16877; Wed, 23 Apr 1997 19:10:25 -0400 Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id TAA16794; Wed, 23 Apr 1997 19:09:55 -0400 Received: from mordillo (node3.remote.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA29461; Thu, 24 Apr 97 00:10:42 BST Date: Wed, 23 Apr 97 23:56:13 From: Mark Baker Subject: RE: ParkBench Committee Meeting - tentative Agenda To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, Erich Strohmaier X-Priority: 3 (Normal) X-Mailer: Chameleon 5.0.1, TCP/IP for Windows, NetManage Inc. Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=us-ascii Erich, Some corrections... --- On Wed, 23 Apr 1997 14:36:16 -0400 Erich Strohmaier wrote: >Please send us your comments about the tentative agenda: > > 1. Minutes of last meeting (MBe) > > Changes to Current release: > 2. Low Level (ES, VG, RS) > comms1, comms2, comms3, poly2 > 3. Linear Algebra (ES) > 4. Compact Applications - NPBs (SS, ES) > > New benchmarks: > 5. HPF Low Level benchmarks (MBa) >? 6. New shared memory Low Level benchmarks (MBa) Can you change this to report on our I/O benchmark efforts. >? 7. New performance database design and new benchmark output format (MBa,VG) >? 8. Update of GBIS with new Web front-end (MBa,VG) Tony or I will update the committe on the new back/fronts ends of GBIS + hopefully also give a demo. VG, as far as I know, is not involved in this activity. Regards Mark ------------------------------------- DIS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 4/23/97 - Time: 11:56:13 PM URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-compactapp@cs.utk.edu Sat Apr 26 06:40:56 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA20901; Sat, 26 Apr 1997 06:40:55 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA18130; Wed, 23 Apr 1997 14:37:56 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id OAA18062; Wed, 23 Apr 1997 14:36:39 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA12213; Wed, 23 Apr 1997 18:36:17 GMT From: "Erich Strohmaier" Message-Id: <9704231436.ZM12211@blueberry.cs.utk.edu> Date: Wed, 23 Apr 1997 14:36:16 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-lowlevel@cs.utk.edu, parkbench-comm@cs.utk.edu, parkbench-hpf@cs.utk.edu Subject: ParkBench Committee Meeting - tentative Agenda Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on May 9th, 1997. The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. Hilton Hotel 501 W. 
Church Street Knoxville, TN Phone: 423-523-2300 When making arrangements tell the hotel you are associated with the 'ParkBench'. The rate about $79.00/night. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. ---------------- The format of the meeting is: Friday May 9th, 1997. 9:00 - 12.00 Full group meeting 12.00 - 1.30 Lunch 1.30 - 5.00 Full group meeting There might be also a joint session with the SPEC/HPG group on Thursday 8th at about 3pm-5pm ---------------- Please send us your comments about the tentative agenda: 1. Minutes of last meeting (MBe) Changes to Current release: 2. Low Level (ES, VG, RS) comms1, comms2, comms3, poly2 3. Linear Algebra (ES) 4. Compact Applications - NPBs (SS, ES) New benchmarks: 5. HPF Low Level benchmarks (MBa) ? 6. New shared memory Low Level benchmarks (MBa) ? 7. New performance database design and new benchmark output format (MBa,VG) ? 8. Update of GBIS with new Web front-end (MBa,VG) Report from other benchmark activities 9. ASCI Benchmark Codes (RS) 10. SPEC (RE) ParkBench: 11. ParkBench Bibliography 12. ParkBench Report 2 Other Activities: 13. Discussion of the ParkBench Workshop 11/12 September, UK 14. "Electronic Benchmarking Journal" - status report - 15. Miscellaneous - 16. Date and venue for next meeting - (MBa) Mark Baker Univ. of Portsmouth (MBe) Michael Berry Univ. of Tennessee (JD) Jack Dongarra Univ. of Tenn./ORNL (RE) Rudi Eigenmann SPEC (VG) Vladimir Getov Univ. of Westminister (TH) Tony Hey Univ. of Southampton (SS) Subhash Saini NASA Ames (RS) Ron Sercely HP/CXTC (ES) Erich Strohmaier Univ. of Tennessee Jack Dongarra Erich Strohmaier From owner-parkbench-comm@CS.UTK.EDU Fri May 2 15:53:02 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA00358; Fri, 2 May 1997 15:53:02 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA13341; Fri, 2 May 1997 15:44:43 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id PAA13327; Fri, 2 May 1997 15:44:36 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id TAA08348; Fri, 2 May 1997 19:44:04 GMT From: "Erich Strohmaier" Message-Id: <9705021544.ZM8346@blueberry.cs.utk.edu> Date: Fri, 2 May 1997 15:44:03 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-comm@CS.UTK.EDU Subject: ParkBench Committee Meeting Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, Here is the revised agenda. Please send me ASAP a short email if you come so that we can arrange for a meeting room. ------------------- The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on May 9th, 1997. The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. Hilton Hotel 501 W. Church Street Knoxville, TN Phone: 423-523-2300 When making arrangements tell the hotel you are associated with the 'ParkBench'. The rate about $79.00/night. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. ---------------- The tentative agenda for the meeting is: 1. Minutes of last meeting (MBe) Changes to Current release: 2. Low Level (ES, VG, RS) comms1, comms2, comms3, poly2 3. 
Linear Algebra (ES) 4. Compact Applications - NPBs (SS, ES) New benchmarks: 5. HPF Low Level benchmarks (MBa) 6. Java Low-Level Benchmarks (VG) 7. New I/O benchmark benchmarks (MBa) 8. New performance database design and new benchmark output format Update of GBIS with new Web front-end (MBa,TH) Report from other benchmark activities 9. ASCI Benchmark Codes (AH) 10. SPEC-HPG (RE, JD) ParkBench: 11. ParkBench Bibliography 12. ParkBench Report 2 Other Activities: 13. Discussion of the ParkBench Workshop 11/12 September, UK (TH, MBa) 14. PEMCS - "Electronic Benchmarking Journal" - status report - (TH, MBa) 15. Status of Funding proposals (JD, TH) 15. Miscellaneous - 16. Date and venue for next meeting - (MBa) Mark Baker Univ. of Portsmouth (MBe) Michael Berry Univ. of Tennessee (JD) Jack Dongarra Univ. of Tenn./ORNL (RE) Rudi Eigenmann SPEC (VG) Vladimir Getov Univ. of Westminister (TH) Tony Hey Univ. of Southampton (AH) Adolfy Hoisie LLNL (SS) Subhash Saini NASA Ames (RS) Ron Sercely HP/CXTC (ES) Erich Strohmaier Univ. of Tennessee Jack Dongarra Erich Strohmaier From owner-parkbench-comm@CS.UTK.EDU Tue May 6 14:46:45 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA04480; Tue, 6 May 1997 14:46:45 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA25737; Tue, 6 May 1997 14:34:05 -0400 Received: from punt-2.mail.demon.net (relay-11.mail.demon.net [194.217.242.137]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA25715; Tue, 6 May 1997 14:33:58 -0400 Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-2.mail.demon.net id aa1000641; 6 May 97 19:07 BST Message-ID: Date: Tue, 6 May 1997 19:06:15 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Parkbench Meeting Documents In-Reply-To: <9705021544.ZM8346@blueberry.cs.utk.edu> MIME-Version: 1.0 X-Mailer: Turnpike Version 3.01 AGENDA ITEM: > Changes to Current release: > 2. Low Level (VG) > comms1, comms2, Two documents will be submitted to the committee on this item by Roger Hockney and Vladimir Getov (Westminster University, UK). They can be downloaded as postscript files from: "New COMMS1 Benchmark: Results and Recommendations" http://www.minow.demon.co.uk/Pbench/comms1/PBPAPER2.PS "New COMMS1 Benchmark: The Details" http://www.minow.demon.co.uk/Pbench/comms1/PBPAPER3.PS The papers will be presented by Vladimir who will bring some paper copies with him. Best wishes Roger and Vladimir -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? 
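(Background note, illustration only: COMMS1 is the ping-pong, i.e. simplex, message benchmark that these papers analyse. The self-contained fragment below sketches the basic measurement pattern in MPI; it is not the ParkBench COMMS1 source, and the message lengths, repeat count, and two-process pairing are arbitrary choices for the sketch.)

      PROGRAM PINGPG
C     Schematic ping-pong (simplex) timing loop in the spirit of
C     COMMS1.  Illustration only, not the ParkBench code; message
C     lengths and repeat count are arbitrary.  The contents of BUF
C     are irrelevant to the timing.
      INCLUDE 'mpif.h'
      INTEGER MAXLEN, NREP
      PARAMETER (MAXLEN = 8192, NREP = 100)
      DOUBLE PRECISION BUF(MAXLEN), T0, T1
      INTEGER IERR, RANK, NPROC, NLEN, I
      INTEGER STATUS(MPI_STATUS_SIZE)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
      IF (NPROC .LT. 2) THEN
         CALL MPI_FINALIZE(IERR)
         STOP
      END IF
      NLEN = 1
   20 CONTINUE
      CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
      T0 = MPI_WTIME()
      DO 30 I = 1, NREP
         IF (RANK .EQ. 0) THEN
            CALL MPI_SEND(BUF, NLEN, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, IERR)
            CALL MPI_RECV(BUF, NLEN, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
         ELSE IF (RANK .EQ. 1) THEN
            CALL MPI_RECV(BUF, NLEN, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
            CALL MPI_SEND(BUF, NLEN, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, IERR)
         END IF
   30 CONTINUE
      T1 = MPI_WTIME()
C     One-way time per message is half the measured round trip
      IF (RANK .EQ. 0) WRITE(*,*) NLEN, (T1 - T0)/(2.0D0*NREP)
      NLEN = NLEN*2
      IF (NLEN .LE. MAXLEN) GO TO 20
      CALL MPI_FINALIZE(IERR)
      END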
From owner-parkbench-comm@CS.UTK.EDU Tue May 6 17:54:47 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA07526; Tue, 6 May 1997 17:54:46 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA17012; Tue, 6 May 1997 17:48:50 -0400 Received: from punt-1.mail.demon.net (relay-7.mail.demon.net [194.217.242.9]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA17003; Tue, 6 May 1997 17:48:47 -0400 Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-1.mail.demon.net id aa0623986; 6 May 97 21:37 BST Message-ID: Date: Tue, 6 May 1997 21:26:50 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Parkbench Meeting Documents (Correction) MIME-Version: 1.0 X-Mailer: Turnpike Version 3.01 I am resending this because there was a typo in the URLs: There are two MM in "minnow". Also if you took PBPAPER2.PS before receiving this repeat message, please take it again as I have corrected two errors in the graphs. SORRY Roger ************************ AGENDA ITEM: > Changes to Current release: > 2. Low Level (VG) > comms1, comms2, Two documents will be submitted to the committee on this item by Roger Hockney and Vladimir Getov (Westminster University, UK). They can be downloaded as postscript files from: CORRECTED URLs: "New COMMS1 Benchmark: Results and Recommendations" http://www.minnow.demon.co.uk/Pbench/comms1/PBPAPER2.PS "New COMMS1 Benchmark: The Details" http://www.minnow.demon.co.uk/Pbench/comms1/PBPAPER3.PS The papers will be presented by Vladimir who will bring some paper copies with him. Best wishes Roger and Vladimir -- -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? From owner-parkbench-comm@CS.UTK.EDU Mon May 12 05:36:41 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA24086; Mon, 12 May 1997 05:36:41 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA10068; Mon, 12 May 1997 05:18:21 -0400 Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id FAA10051; Mon, 12 May 1997 05:18:18 -0400 Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id FAA29262; Mon, 12 May 1997 05:18:16 -0400 (EDT) Date: Mon, 12 May 1997 05:18:16 -0400 (EDT) From: Pat Worley Message-Id: <199705120918.FAA29262@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Gordon Conference on HPC and NII Forwarding: Mail from 'Tony Skjellum ' dated: Sat, 10 May 1997 16:32:12 -0500 (CDT) Cc: worley@haven.EPM.ORNL.GOV Just in case you haven't received information on this already, here is a blurb on the 1997 Gordon conference in high performance computing. Unlike previous years, there is not an explicit emphasis on performance evaluation in this year's stated themes, but you can't (shouldn't) discuss future architectures and their impacts without discussing how to evaluate performance, and I am hoping that some benchmarking-minded people will show up and keep the discussion honest. ---------- Begin Forwarded Message ---------- The deadline for applying to attend the 1997 Gordon conference in high performance computing is June 1. If you are interested in attending, please apply as soon as possible. 
The simplest way to apply is to download the application form from the web site indicated below, or to use the online registration option. If you have any problems with either of these, please contact the organizers at tony@cs.msstate.edu and worleyph@ornl.gov. ------------------------------------------------------------------------------- The 1997 Gordon Conference on High Performance Computing and Information Infrastructure: "Practical Revolutions in HPC and NII" Chair, Anthony Skjellum, Mississippi State University, tony@cs.msstate.edu, 601-325-8435 Co-Chair, Pat Worley, Oak Ridge National Laboratory, worleyph@ornl.gov, 615-574-3128 Conference web page: http://www.erc.msstate.edu/conferences/gordon97 July 13-17, 1997 Plymouth State College Plymouth NH The now bi-annual Gordon conference series in HPC and NII commenced in 1992 and has had its second meeting in 1995. The Gordon conferences are an elite series of conferences designed to advance the state-of-the-art in covered disciplines. Speakers are assured of anonymity and referencing presentations done at Gordon conferences is prohibited by conference rules in order to promote science, rather than publication lists. Previous meetings have had good international participation, and this is always encouraged. Experts, novices, and technically interested parties from other fields interested in HPC and NII are encouraged to apply to attend. All attendees, including speakers, poster presenters, and session chairs must apply to attend. We *strongly* encourage all poster presenters to have their poster proposals in by May 13, 1997, though we will consider poster presentations up to six weeks prior to the conference. Application to attend the conference is also six weeks in advance. More information on the conference can be found at the web page listed above, including the list of speakers and poster presenters and information on applying for attendance. ----------- End Forwarded Message ----------- From owner-parkbench-comm@CS.UTK.EDU Tue May 13 13:58:00 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA20879; Tue, 13 May 1997 13:57:59 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA11997; Tue, 13 May 1997 13:33:14 -0400 Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA11983; Tue, 13 May 1997 13:33:10 -0400 Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.5/CRI-gate-news-1.3) with ESMTP id MAA20939 for ; Tue, 13 May 1997 12:33:07 -0500 (CDT) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id MAA16428 for ; Tue, 13 May 1997 12:33:06 -0500 (CDT) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id RAA20181; Tue, 13 May 1997 17:33:04 GMT Message-Id: <199705131733.RAA20181@magnet.cray.com> Subject: Parkbench directions To: parkbench-comm@CS.UTK.EDU Date: Tue, 13 May 1997 12:33:04 -0500 (CDT) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit To: ParkBench Group From: Charles Grassl Date: May 13, 1997 (Long) I appreciated the meeting this past week and wish to thank Eric and Jack for hosting it. I am aware of the great effort of many individuals have contributed to developing and implementing the ParkBench suite. 
In spite of this, I feel that we need to evaluate and correct our course. ParkBench should not merge with or use benchmarks from SPEC/HPG (the SPEC High Performance Group). SGI/Cray and IBM have already withdrawn from the SPEC/HPG group, and Fujitsu and NEC are no longer participating. The reasons for these companies and other institutions no longer participating should indicate to us (ParkBench) that something is amiss with the SPEC/HPG benchmarks and paradigm. Several of the reasons for the supercomputer manufacturers not supporting the SPEC/HPG effort are listed below. I list these reasons so that the ParkBench group can learn from them and avoid the same problems.

- Relevance. The particular benchmark programs being used by SPEC/HPG are not relevant or appropriate for supercomputing. The programs in the current SPEC/HPG suite do not represent any of the leading-edge software that is more typical of usage on high performance systems.

- Redundancy. The programs being developed by SPEC/HPG are not qualitatively or quantitatively different from the SPEC/OSG programs, and as such the effort is viewed as redundant and expensive.

- Methodology. The methodology being used by SPEC/HPG to procure, develop and run benchmarks lacks scientific and technical basis, and hence results have a vague and arbitrary interpretation.

- Programming model. Designing benchmarks for portability across systems is a convenient idea but does not reflect actual constraints or usage. More often than not, compatibility with a PREVIOUS model of computer is more important than compatibility ACROSS computers.

- Expense. Some of the large data cases for the SPEC/HPG programs will require hours or days to run, with little new data or information gained by the exercise. These exercises are extremely expensive in time, capital equipment, and logistics.

- Ergonomics. The cumbersome design of SPEC/HPG Makefiles and build procedures makes the programs difficult and expensive to test, maintain and analyze.

We in the ParkBench group must acknowledge the above items if we are to maintain interest and participation from computer vendors. I believe that reorganizing and refocusing the group could revitalize high performance computer benchmarking and re-invigorate the ParkBench group. As the ParkBench suite now stands, there are too many programs and they are difficult to build, test and maintain. This situation impedes usage and participation. Here are a few suggestions for our future practices and directions:

- Design and write benchmark programs. Don't borrow or solicit old code. The borrowed or solicited code is never quite appropriate and is usually obsolete. Our greatest asset is that we have scientists who are capable of designing experiments (benchmarks). (Build value.)

- Monitor and evaluate accuracy. Though we mention accuracy in ParkBench Report 1, we haven't applied it to the current programs. (Scientifically validate, or invalidate, our experiments.)

- Make it simple. Write and develop simple programs which do not need elaborate build procedures and which are easier to test and to maintain. (Keep It Simple, Stupid.)

- Build a better user interface. The belabored "run rules" and the interface with layers of Makefiles, includes and embedded relative file paths are unacceptable. An acceptable interface might require binary distribution and hence a desirable emphasis on designing and running rather than building and porting the benchmarks. (Make the product more attractive to more users.)

- Make the suite truly modular.
The current structure makes the simplest one CPU program as difficult to build and run as the most complicated program with Makefile includes, special compilers, source file includes, special libraries, suite libraries, etc. (Make it manageable.) - Drop the connection with SPEC/HPG and with NPB. This "grand unifying" scheme make redundant code. It has had the opposite effect of focusing benchmarking attention on ParkBench because it is yet another collection of benchmarks used by other organizations. (Be distinguishable and identifiable.) - Emphasis what ParkBench is associated with: benchmarking distributed memory parallel computers. We should write and develop benchmark programs which measure and instrument the parallel processing aspect of MPP systems. (Keep our focus.) I volunteer to develop and write a suite of message passing test programs which measure the performance and variance of message passing communication schemes. I have much experience with writing such a programs and believe that such suite would be useful for others and for the computer industry in general. I hesitate to contribute such programs to the present structure for several reasons: - The network test suite does not logically fit into the current "hierarchy" and hence might further clutter the ParkBench suite and make it further unfocused. - The current ParkBench structure is not manageable. Testing and maintenance would be extremely expensive in the current structure. - My company's effort may be interpreted as an endorsement of the current structure and model. The suite is not popular with vendors for reasons outlined above. Participation is currently discouraged. Discussion? Regards, Charles Grassl SGI/Cray Eagan, Minnesota USA From owner-parkbench-comm@CS.UTK.EDU Wed May 21 17:25:15 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA27513; Wed, 21 May 1997 17:25:15 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA07579; Wed, 21 May 1997 17:18:07 -0400 Received: from rastaman.rmt.utk.edu (root@TCHM11A6.RMT.UTK.EDU [128.169.27.188]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id RAA07571; Wed, 21 May 1997 17:18:02 -0400 Received: from rastaman.rmt.utk.edu (localhost [127.0.0.1]) by rastaman.rmt.utk.edu (8.7.6/8.7.3) with SMTP id RAA01108; Wed, 21 May 1997 17:24:43 -0400 Sender: mucci@CS.UTK.EDU Message-ID: <3383681A.D98C5FB@cs.utk.edu> Date: Wed, 21 May 1997 17:24:42 -0400 From: "Philip J. Mucci" Organization: University of Tennessee, Knoxville X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.28 i586) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU CC: "PVM Developer's Mailing List" Subject: Mesg Passing Benchmarks References: <199705131733.RAA20181@magnet.cray.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi all, Charles Grassl in his last message to this committee volunteered to write a suite of message passing benchmarks to replace the Low Levels...Before any action on his or this committee's part, I would recommend that you all have a look at version 3 of my pvmbench package. It now does MPI as well and can easily support other message passing primitives with a few #defines. Version 3 along with some sample results can be found at http://www.cs.utk.edu/~mucci/pvmbench. Note that this has not been tested on any MPP's with UTK PVM. 
The pvmbench benchmark will generate and graph the following: bandwidth, gap time (to buffer an outgoing message), roundtrip (latency/2), barrier/sec, broadcast, and summation reduction. Other tests can easily be added...I would highly recommend, before any action is taken, that this code be examined. It is less than a year old; version 3, available on that page, is in beta, i.e. it has not been released to the general public. Let me know what you think... -Phil -- /%*\ Philip J. Mucci | GRA in CS under Dr. JJ Dongarra /*%\ \*%/ http://www.cs.utk.edu/~mucci PVM/Active Messages \%*/

From owner-parkbench-comm@CS.UTK.EDU Fri May 23 12:03:04 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA06549; Fri, 23 May 1997 12:03:03 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA15901; Fri, 23 May 1997 11:05:32 -0400 Received: from berry.cs.utk.edu (BERRY.CS.UTK.EDU [128.169.94.70]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id LAA15895; Fri, 23 May 1997 11:05:30 -0400 Received: from cs.utk.edu by berry.cs.utk.edu with ESMTP (cf v2.11c-UTK) id LAA01370; Fri, 23 May 1997 11:05:31 -0400 Message-Id: <199705231505.LAA01370@berry.cs.utk.edu> to: parkbench-comm@CS.UTK.EDU Subject: Minutes of May ParkBench Meeting Date: Fri, 23 May 1997 11:05:31 -0400 From: "Michael W. Berry"

Here are the minutes from the recent ParkBench meeting in Knoxville. Best regards, Mike

-----------------------------------------------------------------
Minutes of ParkBench Meeting - Knoxville Hilton, May 9, 1997
-----------------------------------------------------------------

ParkBench Attendee List:
(MBa) Mark Baker        Univ. of Portsmouth    mab@sis.port.ac.uk
(MBe) Michael Berry     Univ. of Tennessee     berry@cs.utk.edu
      Shirley Browne    Univ. of Tennessee     browne@cs.utk.edu
(JD)  Jack Dongarra     Univ. of Tenn./ORNL    dongarra@cs.utk.edu
      Jeff Durachta     Army Res. Lab MSRC     durachta@arl.mil
(VG)  Vladimir Getov    Univ. of Westminster   getovv@wmin.ac.uk
(CG)  Charles Grassl    SGI/Cray               cmg@cray.com
(TH)  Tony Hey          Univ. of Southampton   ajgh@ecs.soton.ac.uk
(AH)  Adolfy Hoisie     Los Alamos Nat'l Lab   hoisie@lanl.gov
(CK)  Charles Koelbel   Rice University        chk@cs.rice.edu
(PM)  Phil Mucci        Univ. of Tennessee     mucci@cs.utk.edu
      Erik Riedel       GENIAS Software GmbH   erik@genias.de
(SS)  Subhash Saini     NASA Ames              saini@nas.nasa.gov
(RS)  Ron Sercely       HP-Convex              sercely@convex.hp.com
      Alan Stagg        CEWES                  stagga@wes.army.mil
(ES)  Erich Strohmaier  Univ. of Tennessee     erich@cs.utk.edu
(PW)  Pat Worley        Oak Ridge Nat'l Lab    worleyph@ornl.gov

SPEC-HPG Visitors:
      Don Dossa         DEC                    dossa@eng.pko.dec.com
(RE)  Rudi Eigenmann    Purdue University      eigenman@ecn.purdue.edu
      Greg Gaertner     DEC                    ggg@zko.dec.com
      Jean Suplick      HP                     suplick@rsn.hp.com
      Joe Throp         Kuck & Associates      throp@kai.com

At 9:05am EST, TH opened the meeting and asked that all the attendees introduce themselves. After a brief overview of the proposed agenda, MBe reviewed the minutes from the last ParkBench meeting in October of '96. The minutes were unanimously accepted and TH asked VG to present the proposed changes to the low-level benchmarks (9:20am). VG reviewed the original COMMS1 (ping-pong or simplex communication) and the COMMS2 (duplex communication) low-level benchmarks. He discussed some of the problems with the previous versions. These included the omission of calculated bandwidth, large message length problems, and large errors in the asymptotic fit. In collaboration with RS and CG, a number of improvements have been made to these benchmarks:
1. Measured bandwidth is provided in the output.
2. The time for the shortest message is provided.
3. The maximum measured bandwidth and the corresponding message length are now provided.
4. The accuracy of the least-squares 2-parameter fit has been improved (the sum of squares of the "relative", not the absolute, error is now used).
5. A new 3-parameter variable-power fit has been added for certain cases.
6. Parametric fits can be reported if the error is less than some user-specified tolerance.
7. A KDIAG parameter has been introduced to invoke diagnostic outputs.
8. Modifications to ESTCOM.f (as suggested by RS).

CG pointed out that it may not always be possible to interpret zero-length messages for these codes. On the Cray machines, such messages force an immediate return (i.e., no synchronization). He proposed that allowing zero-length messages be removed from the COMMS benchmarks. RS showed an actual COMMS1 performance graph demonstrating the difficulty of data extrapolation (if used to get the latency for zero-length message-passing). RS pointed out, however, that zero-length messages are defined w/in MPI, and suggested that a simple return (as in the case of the Cray machines) is not standard. VG displayed some of the observed COMMS1/2 performance obtained on the Cray T3E. The 3-parameter fit yielded a 7% relative error for messages ranging from 8 to 1.E+7 bytes. CG questioned how the breakpoints were determined. He indicated that the input parameters to the program required previous knowledge of where breakpoints occur (although implementations could change constantly). TH suggested that the parametric fitting should not be the default for these benchmarks, i.e., separate the analysis from the actual benchmarking (this concept was seconded by CG). RS suggested that the fitting routines could be placed on the WWW/Internet and that the COMMS1/2 codes simply produce data. CK, however, stressed that the codes should maintain some minimal parametric fitting for clarity and consistency of output interpretations. The minimal message length for the T3E results shown by VG was 8, and the corresponding minimal message length for a Convex CXD set of COMMS benchmarks was 1. The lack of similar ranges of messages could pose problems for comparisons. JD strongly felt that users will return to the notion of "latency" and want zero-length message overheads. Users may be primarily interested in start-up time for message-passing. RS pointed out that MPI does process zero-length messages. JD suggested that the minimal message length for the COMMS benchmarks be 8 bytes and RS proposed that the minimal message-passing time and corresp. message length be an output. After more discussion, the following COMMS changes/outputs were unanimously agreed upon:
1. Maximum bandwidth with corresp. message size.
2. Minimum message-passing time with corresp. message size.
3. Time for the minimum message length (could be 0, 1, 8, or 32 bytes but must be specified).
4. The software will be split into two programs: one to report the spot measurements and the other for the analysis.

At 10:00 am, SPEC-HPG members joined the ParkBench meeting for a joint session. CK reviewed the DoD Modernization Program. He indicated that the program is based on 3 primary components:
1. CHSSI (Commonly Highly Scalable Software Initiative)
2. DREN (Defense Research & Engineering Network)
3. Shared Resource Centers (4 Major Shared Resource Centers or MSRC's and 20 Distributed Centers or DC's)

Benchmarking is part of the mission of the MSRC's, especially for system integration and the Programming Environment & Training (PET) team. CK mentioned that the resources available at the MSRC's include: a 256-proc. Cray T3E and SGI Power Challenge (CEWES), a 256-proc. IBM SP/2 and SGI Origin 2000 at ASC, an SGI 790 at NAVO, and a collection of {SGI Origin, Cray Titan, J90} at the Army Research Lab. The benchmarking needs of the DoD program can be categorized as either contractual or training. The contractual needs are specified as PL1 (evaluation of initial machines), PL2 (upgrade to gain 3 times the performance of PL1), and PL3 (upgrade to gain 10 times the performance of PL1). CK mentioned that the MSRC's are planning for the PL2 phase later this year with PL3 scheduled in approx. 3 years. The training needs include: the evaluation of programming paradigms, the evaluation of performance trade-offs, templates for designing new codes, and benchmarks for training examples. The contractual benchmarks comprise 30 benchmarks (22 programs), some of which are export-controlled or proprietary (data may not be used in the public domain in some cases). The run rules specify the number of iterations for each benchmark in the suite. Each MSRC uses a different number of iterations per benchmark. Code modifications are allowed (parallel directives and message-passing can be used but no assembler) and algorithm substitutions are permitted provided the problem does not become specialized. The only performance metric reported for these benchmarks is the elapsed time for the entire suite. Benchmarks can be upgraded to reflect current workloads of the MSRC's but they must be compared head-to-head with previous systems. Example codes included in the DoD benchmark suite are: CTH (finite volume shock simulation), X3D (explicit finite element code), OCEAN-O2 (an ocean modeling code), NIKE3D (implicit nonlinear 3D FEM), and an Aggregate I/O benchmark. Planned benchmarking activities for the DoD Modernization Program include:
1. benchmarks for evaluating programming techniques (determine what works; develop decision trees)
2. benchmarks for teaching (classes on "worked" examples; template modification)
This effort currently has 1 FTE and over 50 University personnel (in the PET program) involved (although they are not primarily responsible for benchmarking work).

At 10:35am, TH asked AH from Los Alamos Nat'l Lab to overview their ASCI benchmark suite. He began by pointing out that these codes form the "Los Alamos set of" ASCI Benchmarks. Before presenting the list of codes, AH noted that the philosophy of this activity was to achieve "experiment ahead" capability, especially with immature computing platforms. Los Alamos is also interested in developing performance models as well as kernels.
The list of active/research codes and compact applications comprising this suite is:

Code          Language(s)   Parallelism          Description
*HEAT(RAGE)   f77, f90      MPI (f90),           Eulerian adaptive mesh refinement based on
                            MPI/SM (f77)         Riemann solvers; coupled physics-CFD;
                                                 particle & radiative transport
EULER         f90           MPI (for SIMD);      Admissable fluid; unstructured mesh,
                            SIMD (SP vector)     explicit solution; high-speed fluids;
                                                 SP = single processor
NEUT          f77           MPI, SM, SHMEM       Monte-Carlo, particle
SWEEP3D       f90           MPI, SHMEM           Inner/outer iteration (kernel)
                                                 (compact application)
HYDRO(T)      f77           Serial               (compact application)
TBON          f77           MPI                  Material science; quantum mechanics;
                                                 polymer age simulation
*TECOLOTE     C++           MPI                  Mixed cell hydro. with regular structured grid
*TELURIDE     f90           MPI                  Casting simulation; irregular structured grid;
                                                 Krylov solution methods
*DANTE        HPF           MPI

* = export controlled

The codes and compact apps above vary in size from 2,000 to 35,000 lines. AH noted that LANL could provide support for future ASCI-based ParkBench codes. The ASCI benchmark suite presented might in the future include tri-lab (Livermore, Sandia, Los Alamos) contributions. The ASCI application suite can be set up with data sets leading to varying run-times. AH mentioned that Los Alamos' ASCI benchmarking efforts are focused on high performance computing, leading edge architectures, algorithms, and applications. They are particularly concentrating on developing expertise in distributed shared-memory performance evaluation and modeling. AH expressed the hope that the efforts of ParkBench will follow similar directions.

At 11:05am, SS reviewed some of the most recent NAS Parallel Benchmarks results. He began with vendor-optimized CG Class B results using row and column distribution blocking. Results for different numbers of processors of the T3D were reported along with results for the NEC SX-4, SGI Origin 2K, Convex SPP2K, Fujitsu VPP700, and IBM P2SC. He also showed results for FT Class B and BT Class B (all machines reported performed well on this benchmark). For BT, it was pointed out that 4 of the machines (Cray T3E, DEC Alpha, IBM P2SC, and NEC SX-4) are essentially based on the same processor but achieve widely-varying results. SS also reported HPF Class A MG results on 16 processors of the IBM SP2. The HPF version (APR-HPF/Portland Group compiled) was only 3 times slower than the MPI-based (f77) implementation. This is indeed a significant result given that two years ago the HPF version was as much as 10 times slower than the comparable MPI version. An HPF version of the Class A FT benchmark on 64 processors was shown to be faster than the MPI version (1.6 times faster) when optimized libraries are used in both versions. For the Class A SP benchmark (on 64 processors of the SP/2), the APR- and PGI-compiled HPF versions were within a factor of 2 of the MPI versions. Finally, the HPF Class A BT code on 64 processors of the Cray T3D was within a factor of 0.5 of the MPI version.

At 11:35am, TH invited RE to overview current SPEC-HPG activities. The SPEC-HPG benchmarks define a suite of real-world high-performance computing applications designed for comparisons across different platforms (serial and message-passing). RE pointed out the history of the SPEC-HPG effort as a merger between the PERFECT and SPEC benchmarking activities. The current SPEC-HPG suite is comprised of 2 codes: SPECchem96 and SPECseis96. The SPECchem96 code evolved from the GAMES code used in the pharmaceutical and chemical industries.
It comprises 109,389 lines of f77 (21% comments) and 865 subroutines and functions. The wave functions are written to disk. The SPECseis96 code is derived from the ARCO benchmark suite, which consists of four phases: data generation, stack data, time migration, and depth migration. This code decomposes the domain into n equal parts (for n processors), with each part processed independently. It has over 15K lines of code, made up of 230 Fortran subroutines and 199 C functions for I/O and systems utilities. SPECseis96 uses 32-bit precision, FFT's, Kirchhoff integrals, and finite differences. The very first set of SPEC-HPG benchmark results was approved on May 8, 1997 (the preceding day). New benchmarks being considered are PMD (Parallel Molecular Dynamics) and MM5 (NCAR Weather Processing C code). The decision on whether or not to accept these two potential SPEC-HPG codes will be made in about 5 months. The SPEC-HPG run rules permit the use of compiler switches, source code changes, and optimized libraries (which have been disclosed to customers). Only approved algorithmic changes will be disclosed. RE gave the URL for the SPEC-HPG effort: http://www.specbench.org/hpg. He also referred to a recent article by himself and S. Hassanzadeh in "IEEE Computational Science & Engineering" and two email reflectors for SPEC-HPG communication: comments@specbench.org and info@specbench.org. JD then gave a brief history of ParkBench and SPEC-HPG interactions and suggested that the two efforts might consider sharing results and software. The biggest difference between the two efforts is in the availability of software, as ParkBench code is freely available and SPEC-HPG software has some restrictions. A forum to publish both sets of results was discussed and it was agreed that both efforts should at least share links on their respective webpages. RE pointed out that anyone can get the SPEC-HPG CD of benchmarks without actually being a SPEC member. JD stressed that the process of running codes (for any suite) needs to be simplified so that building executables for different platforms is not problematic. Modifications for porting should be restricted to driver programs. RS indicated that he has Perl scripts that run all the low_level benchmarks, including COMMS3 for 2 to N procs, and produce a summary of the results. *** ACTION ITEM *** JD, RE, AH, and CK will discuss a potential joint effort to simplify the running of benchmark codes (contact RS also about his Perl scripts). MBa noted that the SPEC-HPG members should be added to the ParkBench email list (parkbench-comm@cs.utk.edu). He also indicated that the European benchmarking workshop scheduled for next Fall might coordinate with the European SPEC group (scheduled for Sept. 11-12). At 12:10pm, the attendees went to lunch (Soup Kitchen). After lunch (1:30pm), TH asked ES and VG to coordinate the changes to the COMMS benchmarks discussed above (*** ACTION ITEM ***). ES then discussed modifications to poly2 for the ParkBench V2.2 suite. The proposed changes include:
1. enlarged arrays A(1000000), B(1000000)
2. removal of arrays C and D
3. avoiding cache flushes by using a sliding vector, i.e., the loop DO I=1,N becomes DO I=NMIN,NMAX, with NMIN advanced by NMIN=NMIN+N+INC after each pass, where INC=17 by default (this avoids reuse of the old cache lines).
PM then discussed a program for determining parameters for memory subsystems. Characteristics of this software include the use of tight loops, independent memory references, and maximized register use.
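The measurement idea can be sketched as follows. This is not the code PM presented - the array sizes, repetition counts, accumulator scheme and use of gettimeofday are assumptions made purely for illustration:

-------------------------------------------------------
/* Illustrative memory-hierarchy read-bandwidth sweep (not PM's program).
 * Working sets from 4 KB to 4 MB are read repeatedly in a tight loop with
 * several independent accumulators; small sets stay resident in cache, so
 * the MB/s figure reflects the level of the hierarchy being exercised. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static double now(void)                       /* wall-clock seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    const size_t maxwords = 4 * 1024 * 1024 / sizeof(double);  /* 4 MB */
    double *a = malloc(maxwords * sizeof(double));
    volatile double sink = 0.0;               /* defeats dead-code removal */

    for (size_t i = 0; i < maxwords; i++) a[i] = 1.0;

    for (size_t bytes = 4096; bytes <= 4 * 1024 * 1024; bytes *= 2) {
        size_t n = bytes / sizeof(double);
        int reps = (int)(64 * 1024 * 1024 / bytes) + 1;   /* ~64 MB of traffic */
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        double t0 = now();
        for (int r = 0; r < reps; r++)
            for (size_t i = 0; i < n; i += 4) {           /* independent references */
                s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
            }
        double t = now() - t0;
        sink += s0 + s1 + s2 + s3;
        printf("%8zu bytes : %8.1f MB/s\n",
               bytes, (double)bytes * reps / t / 1e6);
    }
    free(a);
    return 0;
}
-------------------------------------------------------

Plotting MB/s against working-set size produces the kind of curves described next, with drops visible at each cache boundary.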
He showed graphs of memory hierarchy bandwidth (reads and writes) depicting memory size (ranging from 4 KB to 4 MB) versus MB/sec transfer rates. Some curves illustrated the effective cache size quite well. PM pointed out that dynamically-scheduled processors pose a significant problem for this type of modeling. The program can be run with or without a calibration loop exploiting known memory transfer data. CG suggested that it would be nice to have such a program to measure latency at all levels of the hierarchy. PM's webpages for this program are: http://www.cs.utk.edu/~mucci/cachebench and http://www.cs.utk.edu/~mucci/parkbench. CK suggested that an uncalibrated version of PM's benchmark would be more useful to users (more reflective of real codes). JD pointed out that the output of the program could be tabulated bandwidths, latencies, etc. CG felt this program would be a very useful tool. PM noted that the calibration will not be used by default. TH suggested that the ParkBench effort might want to develop a future "ParkBench Tool Set" which contains programs like this one developed by PM. With regard to the Linalg Kernels, ES noted that although many of the routines have calls to Scalapack routines, Scalapack will not be included in future software releases. Users will have to get their own copies of the source (or binaries) for Scalapack. The size of these particular kernel benchmarks drops by about one-third when Scalapack is removed. *** ACTION ITEM *** ES will report the most recent Linalg benchmark performance results at the next ParkBench meeting. TH then asked for discussions on new benchmarks, with MBa leading the discussion on HPF benchmarks. MBa indicated that a new mail reflector (parkbench-hpf@cs.utk.edu) had been set up for this cause, with himself as moderator for low-level codes (CK will moderate kernels and SS will moderate discussions on HPF compact applications). MBa noted that there is limited manpower for the HPF benchmarking activities. CK noted that he had discussed this effort at the recent HPFF meeting (and other users' meetings). A draft document on the ParkBench HPF benchmarks is available at http://www.sis.port.ac.uk/~mab/ParkBench. MBa felt strongly that without manpower support this particular activity will die and that a lead site is needed. *** ACTION ITEM *** CK and SS will investigate interest in HPF compact application development. JD indicated that wrappers are being used to create HPF versions of the Linalg kernels. The procedure involves writing wrappers for the current Scalapack driver programs. Eventually, these programs may be completely rewritten in HPF (this will start in the summer). TH suggested that HPF kernel benchmark performance be reported at the ParkBench meeting in September (at the Southampton Performance Workshop). MBa went on to report on the status of I/O benchmarks. Basically, not much progress has been made on the ParkBench I/O initiative. A new I/O project between ECMWF, FECIT, and the Univ. of Southampton was launched this past February. They are looking at the I/O in the IFS code from the ECMWF (the European Centre for Medium-Range Weather Forecasts). David Snelling is the FECIT leader, and he has also participated in ParkBench activities. This I/O project has 1 FTE at Southampton and 1.5 FTE at FECIT, along with several personnel at ECMWF. One workshop and two technical meetings are planned for the 1-year project. The goals are: to develop instrumented I/O benchmarks and build on top of MPI-IO (test, characterize parallel systems).
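To make the idea of an instrumented I/O benchmark built on MPI-IO concrete, a minimal timed collective write might look like the sketch below. It is not code from the ECMWF/FECIT/Southampton project; the file name, transfer size and access pattern are assumptions for illustration:

-------------------------------------------------------
/* Minimal timed MPI-IO write sketch (illustrative only).  Each rank writes
 * a contiguous block at its own offset with a collective call, and rank 0
 * reports the aggregate bandwidth, including the cost of closing the file. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 1 << 20;                    /* 1M doubles per rank: assumption */
    int rank, nprocs;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *buf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) buf[i] = (double)rank;

    MPI_File_open(MPI_COMM_WORLD, "iobench.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * n * sizeof(double);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    MPI_File_write_at_all(fh, offset, buf, n, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);                      /* close so the data is flushed */
    double t = MPI_Wtime() - t0;

    if (rank == 0)
        printf("%d ranks wrote %.1f MB in %.3f s (%.1f MB/s aggregate)\n",
               nprocs, (double)nprocs * n * sizeof(double) / 1e6, t,
               (double)nprocs * n * sizeof(double) / 1e6 / t);

    free(buf);
    MPI_Finalize();
    return 0;
}
-------------------------------------------------------

An instrumented version would additionally record per-rank times and trace events (e.g. for VAMPIR or PABLO, as mentioned below).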
Their methodology is very similar to that of ParkBench. Codes in f90 and ANSI C are being considered (stubs for VAMPIR and PABLO). Regular reports to Fujitsu (the sponsor of the activity) are planned and a full I/O test suite is planned by February 1998. MBa also reported on the status of the ParkBench graphical database. Currently, the performance data is kept in a relational DBMS. A frontend Java applet has been written to query the DBMS on-the-fly. A backend is also in development which will automate the extraction of new performance data and its insertion into the DBMS (via an http server). By September, a more complete prototype which will allow MS Access and JDBC between 2 different machines should be ready. VG then discussed the development of Java-based low-level benchmarks. He presented a Java-to-C Interface Generator which would allow Java benchmarks to call existing C libraries on remote machines. He presented sample Java+C NAS PB results on a 16-processor IBM SP/2 (Class A IS Benchmark):

Version    1 Proc   2 Procs   4 Procs   8 Procs   16 Procs
NASA (C)     29.1      17.4       9.4       5.2       2.8
C            40.5      24.9      13.1       9.3      15.6
Java         ----     132.5      64.7      37.9      33.5

At 2:50pm, TH reported on other ParkBench activities including the new PEMCS (Performance Evaluation and Modeling for Computer Systems) electronic journal. Suggested articles/authors include:
*1. ParkBench Report No. 2 (ES, MBe)
*2. NAS PB
3. SPEC-HPG
*4. Top 500
5. AutoBench (M. Ginsburg)
*6. Euroben (van der Steen)
7. RAPS
8. Europort
*9. Cache benchmarks
10. ASCI benchmarks (DoD)
*11. PERFORM
12. R. Hockney
*13. PEPS
14. C3I/Rome Labs
Those articles possible for Summer '97 are marked with *. JD suggested that articles be available in Encapsulated Postscript, PDF (Adobe), and HTML. TH noted that EU funding will provide a host computer and some administration. Possible publishers are Oxford Univ. Press and Elsevier. At 3:10pm, ES requested more items for the ParkBench bibliography which will be available on the WWW. PW suggested that authors should be able to submit links to ParkBench-related applications. JD then briefly discussed WebBench, which is a website focused on benchmarking and performance evaluation. Data is presented on platforms, applications, organizations, vendors, conferences, papers, newsgroups, FAQ's, and repositories (PDS, Top500, Linpack, etc.). The WebBench URL is http://www.netlib.org/benchweb. MBa reminded attendees of the Fall Performance Workshop/ParkBench meeting on (Thursday and Friday) Sept. 11 and 12. This meeting will be held at the County Hotel, Southampton, UK. Invited and contributed talks will be presented. With regard to ParkBench funding, JD indicated that the UT/ORNL/NASA Ames proposal was not selected for funding but that it could be re-submitted next year. Expected funding from Rome Lab was not received. TH and VG did not succeed this past year either, although some funding from Fujitsu is possible. TH adjourned the meeting at 3:25pm EST.
From owner-parkbench-comm@CS.UTK.EDU Tue May 27 10:32:45 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA25239; Tue, 27 May 1997 10:32:45 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA05022; Tue, 27 May 1997 10:12:02 -0400 Received: from exu.inf.puc-rio.br (exu.inf.puc-rio.br [139.82.16.3]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA05013; Tue, 27 May 1997 10:11:53 -0400 Received: from obaluae (obaluae.inf.puc-rio.br) by exu.inf.puc-rio.br (4.1/SMI-4.1) id AA20170; Tue, 27 May 97 11:11:00 EST From: maira@inf.puc-rio.br (Maira Tres Medina) Received: by obaluae (SMI-8.6/client-1.3) id LAA16226; Tue, 27 May 1997 11:10:58 -0300 Date: Tue, 27 May 1997 11:10:58 -0300 Message-Id: <199705271410.LAA16226@obaluae> To: parkbench-comments@CS.UTK.EDU Subject: Benchmarks Cc: parkbench-comm@CS.UTK.EDU, maira@CS.UTK.EDU, victal@CS.UTK.EDU X-Sun-Charset: US-ASCII

Hello, I'm a graduate student at the Computer Science Department of PUC-Rio (Catholic University of Rio de Janeiro). I'm currently studying the Low_Level benchmarks for measuring basic computer characteristics. I have had some problems trying to run some of the benchmarks. For example, the benchmark comms1 for PVM prints the following error messages and stops:

n05.sp1.lncc.br:/u/renata/maira/ParkBench/bin/RS6K>comms1_pvm Number of nodes = 2 Front End System (1=yes, 0=no) = 0 Spawning done by process (1=yes, 0=no) = 1 Spawned 0 processes OK... libpvm [t4000c]: pvm_mcast(): Bad parameter TIDs sent...benchmark progressing... n05.sp1.lncc.br:/u/renata/maira/ParkBench> bin/RS6K/comms1_pvm 1525-006 The OPEN request cannot be processed because STATUS=OLD was coded in the OPEN statement but the file comms1.dat does not exist. The program will continue if ERR= or IOSTAT= has been coded in the OPEN statement. 1525-099 Program is stopping because errors have occurred in an I/O request and ERR= or IOSTAT= was not coded in the I/O statement.

I would like to know how I can execute the benchmarks only for PVM. Can you help me? I have not had problems with the sequential benchmarks (tick1, tick2 ...). Thank you very much for your attention. Maira Tres Medina, Ph.D. Student, Pontifical Catholic University, Rio de Janeiro, Brazil

From owner-parkbench-comm@CS.UTK.EDU Wed May 28 16:36:07 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA15377; Wed, 28 May 1997 16:36:06 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA16158; Wed, 28 May 1997 16:26:41 -0400 Received: from rastaman.rmt.utk.edu (root@TCHM03A16.RMT.UTK.EDU [128.169.27.60]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA16150; Wed, 28 May 1997 16:26:37 -0400 Received: from rastaman.rmt.utk.edu (localhost [127.0.0.1]) by rastaman.rmt.utk.edu (8.7.6/8.7.3) with SMTP id QAA00226; Wed, 28 May 1997 16:33:33 -0400 Sender: mucci@CS.UTK.EDU Message-ID: <338C968B.124F15AA@cs.utk.edu> Date: Wed, 28 May 1997 16:33:33 -0400 From: "Philip J. Mucci" Organization: University of Tennessee, Knoxville X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.28 i586) MIME-Version: 1.0 To: Maira Tres Medina CC: parkbench-comments@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU Subject: Re: Benchmarks References: <199705271410.LAA16226@obaluae> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

Hi, You need to make sure the dat files are in the executable directory. They should be installed in $PVM_ROOT/bin/$PVM_ARCH. -Phil -- /%*\ Philip J.
Mucci | GRA in CS under Dr. JJ Dongarra /*%\ \*%/ http://www.cs.utk.edu/~mucci PVM/Active Messages \%*/ From owner-parkbench-comm@CS.UTK.EDU Thu Jun 5 11:30:41 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id LAA11302; Thu, 5 Jun 1997 11:30:41 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA14227; Thu, 5 Jun 1997 10:53:09 -0400 Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA14220; Thu, 5 Jun 1997 10:53:07 -0400 Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id KAA06499; Thu, 5 Jun 1997 10:53:06 -0400 (EDT) Date: Thu, 5 Jun 1997 10:53:06 -0400 (EDT) From: Pat Worley Message-Id: <199706051453.KAA06499@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Gordon conference deadline extended Forwarding: Mail from 'Pat Worley ' dated: Thu, 5 Jun 1997 10:48:07 -0400 (EDT) Cc: worley@haven.EPM.ORNL.GOV, tony@cs.msstate.edu (Our apologies if you receive this multiple times.) There is still room for additional attendees at the Gordon Conference on High Performance Computing, and the Gordon Research Conference administration has agreed to extend the application deadline. As a practical matter, applications need to be submitted no later than JULY 1. We will also stop accepting applications before that date if the maximum meeting size is reached, so please apply as soon as possible if you are interested in attending. The simplest way to apply is to download the application form from the web site http://www.erc.msstate.edu/conferences/gordon97 or to use the online registration option available at the same site. If you have any problems with either of these, please contact the organizers at tony@cs.msstate.edu and worleyph@ornl.gov. Complete information on the meeting is available from the Web site or its links, but a short summary of the meeting follows: -------------------------------------------------------------------------- The 1997 Gordon Conference on High Performance Computing and Information Infrastructure: "Practical Revolutions in HPC and NII" Chair, Anthony Skjellum, Mississippi State University, tony@cs.msstate.edu, 601-325-8435 Co-Chair, Pat Worley, Oak Ridge National Laboratory, worleyph@ornl.gov, 615-574-3128 Conference web page: http://www.erc.msstate.edu/conferences/gordon97 July 13-17, 1997 Plymouth State College Plymouth NH The now bi-annual Gordon conference series in HPC and NII commenced in 1992 and has had its second meeting in 1995. The Gordon conferences are an elite series of conferences designed to advance the state-of-the-art in covered disciplines. Speakers are assured of anonymity and referencing presentations done at Gordon conferences is prohibited by conference rules in order to promote science, rather than publication lists. Previous meetings have had good international participation, and this is always encouraged. Experts, novices, and technically interested parties from other fields interested in HPC and NII are encouraged to apply to attend. The conference consists of technical sessions in the morning and evening, with afternoons free for discussion and recreation. Each session consists of 2 or 3 one hour talks, with ample time for questions and discussion. All speakers are invited and there are no parallel sessions. All attendees are both encouraged and expected to actively participate, via discussions during the technical sessions or via poster presentations. 
All attendees, including speakers, poster presenters, and session chairs, must apply to attend. Poster presenters should indicate their poster proposals on their applications. While all posters must be approved, successful applicants should assume that their posters have been accpeted unless they hear otherwise. Meeting Themes: Networks: Emerging capabilities and the practical implications : New types of networking Real-Time Issues Multilevel Multicomputers Processors-in-Memory and Other Fine Grain Computational Architectures Impact of Evolving Hardware on Applications Impact of Software Abstractions on Performance Confirmed Speakers: Ashok K. Agrawala University of Maryland Kirstie Bellman DARPA/SISTO James C. Browne University of Texas at Austin Andrew Chien University of Illiniois, Urbana-Champaign Thomas H. Cormen Dartmouth College Jean-Dominique Decotignie CSEM David Greenberg Sandia National Laboratories William Gropp Argonne National Laboratory Don Heller Ames Laboratory Jeff Koller Information Sciences Institute Peter Kogge University of Notre Dame Chris Landauer The Aerospace Corporation Olaf M. Lubeck Los Alamos National Laboratory Andrew Lumsdaine University of Notre Dame Lenore Mullins SUNY, Albany Paul Plassmann Argonne National Laboratory Lui Sha Carnegie Mellon Univeristy Paul Woodward University of Minnesota From owner-parkbench-comm@CS.UTK.EDU Tue Jul 1 17:06:52 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA20550; Tue, 1 Jul 1997 17:06:51 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA21503; Tue, 1 Jul 1997 17:03:35 -0400 Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA21438; Tue, 1 Jul 1997 17:02:42 -0400 Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA10168; Tue, 1 Jul 97 22:00:22 BST Date: Tue, 1 Jul 97 20:55:49 From: Mark Baker Subject: Fall 97 Parkbench Workshop - Southampton, UK To: ejz@ecs.soton.ac.uk, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, William Gropp , Antoine Hyaric , gent@genias.de, gcf@npac.syr.edu, geerd.hoffman@ecmwf.co.uk, reed@cs.uiuc.edu, david@cs.cf.ac.uk, clemens-august.thole@gmd.de, klaus.stueben@gmd.de, "J.C.T. Pool" , Paul Messina , foster@mcs.anl.gov, idh@soton.ac.uk, rjc@soton.ac.uk, plg@pac.soton.ac.uk, Graham.Nudd@dcs.warwick.ac.uk Cc: lec@ecs.soton.ac.uk, rjr@ecs.soton.ac.uk, "MATRAVERS Prof. D R STAF" , wilsona@sis.port.ac.uk, grant , hwyau@epcc.ed.ac.uk X-Priority: 3 (Normal) X-Mailer: Chameleon 5.0.1, TCP/IP for Windows, NetManage Inc. Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, This is to let you know that the Department of Electronics and Computer Science at the University of Southampton is organising a Fall 97 Parkbench Workshop on the 11th and 12th of September 1997. See http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ for futher details. The workshop will include a number of talks from researchers working in th field of performance evaluation and modelling of computer systems, a panel discussion session and a Parkbench committee meeting. The Workshop is free to attend - workshop delegates need only cover their own travel and accommodation expenses. Attendance is limited and so the availability of places at the Workshop will be allocated on a first come basis. 
It is planned to turn the talks given at the Workshop into a series of short papers which will be put together and published as a Special Issue of the electronic journal Performance Evaluation and Modelling of Computer Systems (PEMCS). For further information or registration details refer to the Web pages - (http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/registration.html). I would appreciate it if you would kindly pass this email onto colleges who may be interested in the event. Regards Mark ------------------------------------- Dr Mark Baker CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 7/1/97 - Time: 8:55:49 PM URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Wed Jul 23 17:19:23 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA04434; Wed, 23 Jul 1997 17:19:23 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA28191; Wed, 23 Jul 1997 17:10:39 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA28171; Wed, 23 Jul 1997 17:10:24 -0400 (EDT) Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA14190; Wed, 23 Jul 97 22:10:30 BST Date: Wed, 23 Jul 97 22:01:41 +0000 From: Mark Baker Subject: PEMCS Web Site To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Face: "3@c]&iv:nfs&\mp6nN90ioxbQ-Eu:]}^MyviIL7YjwT,Cl)|TYpTQ})PP'&O=V`~)JQRWjM?H;'`q\"3mv "j@5vs)}!WC3pG9q:;rpe0\LoLQfY"1?1A.\(f=E*&QAW8WK+)*)T0[Bv=[{.-f7<6Ddv!2XaWhH X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, The Web site that will host the Journal of "Performance Evaluation and Modelling of Computer Systems (PEMCS)" can be found at: http://hpc-journals.ecs.soton.ac.uk/PEMCS/ The pages I have put up are at the present still in a "draft/under-construction" state. I would appreciate any comments or feedback about the pages. Regards Mark ------------------------------------- Dr Mark Baker DIS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 07/23/97 - Time: 22:01:41 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Thu Jul 24 08:26:42 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA12708; Thu, 24 Jul 1997 08:26:42 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA04617; Thu, 24 Jul 1997 08:21:55 -0400 (EDT) Received: from berry.cs.utk.edu (BERRY.CS.UTK.EDU [128.169.94.70]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA04599; Thu, 24 Jul 1997 08:21:23 -0400 (EDT) Received: from cs.utk.edu by berry.cs.utk.edu with ESMTP (cf v2.11c-UTK) id IAA13817; Thu, 24 Jul 1997 08:21:24 -0400 Message-Id: <199707241221.IAA13817@berry.cs.utk.edu> To: Mark Baker cc: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU Subject: Re: PEMCS Web Site In-reply-to: Your message of Wed, 23 Jul 1997 22:01:41 -0000. Date: Thu, 24 Jul 1997 08:21:24 -0400 From: "Michael W. 
Berry" > Dear All, > > The Web site that will host the Journal of "Performance > Evaluation and Modelling of Computer Systems (PEMCS)" can > be found at: > > http://hpc-journals.ecs.soton.ac.uk/PEMCS/ > > The pages I have put up are at the present still in a > "draft/under-construction" state. > > I would appreciate any comments or feedback about the > pages. > > Regards > > Mark > > > > ------------------------------------- > Dr Mark Baker > DIS, University of Portsmouth, Hants, UK > Tel: +44 1705 844285 Fax: +44 1705 844006 > E-mail: mab@sis.port.ac.uk > Date: 07/23/97 - Time: 22:01:41 > URL http://www.sis.port.ac.uk/~mab/ > ------------------------------------- > Mark, the webpages are well organized. You might reconsider the red text on the green background of the menu frame. It was difficult to read on my machine at home. Nice work! Mike ------------------------------------------------------------------- Michael W. Berry Ayres Hall 114 berry@cs.utk.edu Department of Computer Science OFF:(423) 974-3838 University of Tennessee FAX:(423) 974-4404 Knoxville, TN 37996-1301 URL:http://www.cs.utk.edu/~berry/ ------------------------------------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Fri Aug 1 12:59:29 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA05831; Fri, 1 Aug 1997 12:59:27 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA01387; Fri, 1 Aug 1997 12:38:00 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA01337; Fri, 1 Aug 1997 12:37:24 -0400 (EDT) Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA15842; Fri, 1 Aug 97 17:36:11 BST Date: Fri, 1 Aug 97 17:17:51 +0000 From: Mark Baker Subject: Reminder - Fall Parkbench Workshop To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Face: ,<'y31|nlb,jCP5?km9\KD+>p9/e?:|$RRhY]e;#`awGHh=mrY.T??#]-*rt}l0*u`k2A7n KlqNG"u'-%cS@3|G[%=m%bSB[lfSn5n"gD4CU(j?1y?#SOkm!qw_=p%c#"6g&(+\Oy6T{4CEShal?z M)&Gd'Pb6Qc~>SPx{m[F55=]yY>cN>|/m5)T?q`OTjdQL=7-n%NT({;;$P*2[#7ZWL8baLoI_/F89, x'u`*$'<|ctKNYTSJuLV=!$QT3bN*>91V,a0Cc"_UsxwMKg\;#W2LZ$!`j?ZWp;byz~;y}2Dz6i7y% E&;gfnmI_~}+oifmWXJMHfWeezBL1("ZnFe!rnX[Q|,:IJ?iq+PePa/[3R4 X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, This email is a reminder about the: ---------------------------------------------------------------------------------------------------- Fall ParkBench Workshop Thursday 11th/Friday 12th September 1997 at the University of Southampton, UK See http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ ---------------------------------------------------------------------------------------------------- If you are interested in attending the Workshop you should register now and reserve accommodation as hotel rooms in Southampton during the workshop period will be in short supply due to the "International Southampton Boat Show" which will also be taking place. At present we have a preliminary reservation on rooms at the County Hotel where the Workshop is being held. Without concrete delegate reservations we can only hold onto there rooms for approximately another week. Thereafter, accommodation at the Hotel, or around the city, may be more problematic in getting and reserving. 
So, I encourage potential Workshop delegates to register ASAP. Mark ------------------------------------- Dr Mark Baker University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 08/01/97 - Time: 17:17:52 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Aug 11 13:13:12 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA20171; Mon, 11 Aug 1997 13:13:11 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA06842; Mon, 11 Aug 1997 13:02:59 -0400 (EDT) Received: from MIT.EDU (SOUTH-STATION-ANNEX.MIT.EDU [18.72.1.2]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA06808; Mon, 11 Aug 1997 13:02:42 -0400 (EDT) Received: from MIT.MIT.EDU by MIT.EDU with SMTP id AA27349; Mon, 11 Aug 97 13:02:14 EDT Received: from HOCKEY.MIT.EDU by MIT.MIT.EDU (5.61/4.7) id AA01161; Mon, 11 Aug 97 13:02:12 EDT Message-Id: <9708111702.AA01161@MIT.MIT.EDU> X-Sender: mmccarth@po9.mit.edu X-Mailer: Windows Eudora Pro Version 2.1.2 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Mon, 11 Aug 1997 13:02:12 -0400 To: alison.wall@rl.ac.uk, weber@scripps.edu, schauser@cs.ucsb.edu, dewombl@sandia.gov, edgorha@sandia.gov, rdskocy@sandia.gov, sales@pgroup.com, utpds@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, pancake@cs.orst.edu, johnreed@ghost.CS.ORST.EDU, levesque@apri.com, davida@cit.gu.edu.au, gddt@gup.uni-linz.ac.at, atempt@gup.uni-linz.ac.at, rileyba@ornl.gov, bac@ccs.ornl.gov From: "Michael F. McCarthy" Subject: For Sale: CM-5 PLEASE FORWARD THIS NOTE TO ANYONE THAT YOU BELIEVE MAY HAVE AN INTEREST IN PURCHASING THIS SYSTEM! __________________________________________________________________________ Case #3971 -- FOR SALE - CM5 with 128 nodes and SDA -- __________________________________________________________________________ The MIT Lab for Computer Science offers for bid sale a Thinking Machines CM-5 Connection Machine (described below). Bids to purchase this system are requested from all interested parties, (with a minimum expected Bid of $25,000). All bids must be received at the MIT property office by 5:00 PM (EDT) on Monday, 8/Sept/97. The machine must be moved from MIT within 10 business days of acceptance of the bid. All expenses and arrangements for moving will be made by purchaser. The system consists of: 1) 128 PN CM-5 w/ Vector Units, 256 Network addresses-Part No.CM5-128V-32F 2) Scalable Disk Array with Twenty-four(24) 1.2 GB Drives-Part No.CM5-SA25F 3) Control Processor Interface-Part No. CM5-CPI 4) S-Bus to Diagnostics Network Interface-Part No. CM5-SDN 5) S-Bus Network Interface Board(5)-Part No. CM5-SNI [N.B. On July 16 1997 power was turned off.The machine can be turned back on in its present location only until Friday, 22/AUG/97 when wiring changes are planned in that machine room.] "The Institute reserves the right to reject any or all offers.MIT makes no warranty of any kind, express or implied, with respect to this equipment. This includes fitness for a particular purpose. It is the responsibility of those making an offer to determine, before making an offer, that the equipment meets any conditions required by those making that offer.Thank you." __________________________________________________________________________ Submit bids for Case #3971 before Monday, 8/Sept/97, 5:00 PM (EDT) to: ***************************************************************** * Michael F. 
McCarthy * Phone: (617)253-2779 * * MIT Property Office * FAX: (617)253-2444 * * E19-429 * E-Mail: mmccarth@MIT.EDU * * 77 Massachusetts Ave. * * * Cambridge, MA 02139 * * ***************************************************************** __________________________________________________________________________ SYSTEM HISTORY The Project SCOUT CM-5 is housed in M.I.T's Laboratory for Computer Science (L.C.S). The machine was acquired in 1993 as part of the the ARPA sponsored project SCOUT, and used to accomplish the stated aim of the project of "fermenting collaborations between users, builders and networkers of massively parallel computers". The CM-5 computer, developed and manufactured by Thinking Machines Corporation, evolved from earlier T.M.C. computers (the CM-2 and the CM-200)with an architecture targeted toward teraflops performance for large, complex data intensive applications. The MIT hardware consists of a total of 128 32MHz SPARC microprocessors, each with 4 proprietary floating point arithmetic units and 32Mb of local memory attached to it. The system also includes a subsidiary 25Gb parallel file system for handling high volume parallel application I/O. The system was operated under full maintenance contract from May of 1993 until March 20 1997. On July 16 1997 power was turned off. The machine can be turned back on in its present location only until Friday, 22/AUG/97 when wiring changes are planned in that machine room. The system was used primarily for research but a description of an instructional use made of the machine can be found at http://www-erl.mit.edu/eaps/seminar/iap95/cnh/CM5Intro.html Web sites about other CM5 sites and general information include: http://www.math.uic.edu/~hanson/cmg.html http://www.acl.lanl.gov/UserInfo/cm5admin.html http://ec.msc.edu/CM5/ __________________________________________________________________________ FUTURE MAINTENANCE People submitting bids may wish to discuss future maintenance issues with a company that is a present maintainer of CM5 Equipment, Connection Machine Services. ***************************************************************** * Larry Stewart * Phone: (505) 820-1470 * * * Cell: (505) 690-7799 * * Account Executive * FAX: (505) 820-0810 * * Connection Machines Services * Home: (505) 983-9670 * * 1373 Camino Sin Salida * Pager (888) 712-4143 * * Santa Fe, NM 87501 * E-Mail: stewart@ix.netcom.com * ***************************************************************** __________________________________________________________________________ Michael F. McCarthy MIT Property Office E19-429 77 Massachusetts Ave. Cambridge, MA 02139 Ph (617)253-2779 Fax (617)253-2444 From owner-parkbench-comm@CS.UTK.EDU Mon Sep 1 05:44:50 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA11838; Mon, 1 Sep 1997 05:44:50 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA07176; Mon, 1 Sep 1997 05:35:14 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA07160; Mon, 1 Sep 1997 05:34:44 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA14311; Mon, 1 Sep 97 10:33:06 BST Date: Mon, 1 Sep 97 10:19:23 +0000 From: Mark Baker Subject: Final Announcement: Fall ParkBench Workshop To: "Daniel A. Reed" , "J.C.T. 
Pool" , a.j.grant@mcc.ac.uk, Antoine Hyaric , Ed Zaluska , Fritz Ferstl , Hon W Yau , idh@soton.ac.uk, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, Paul Messina , R.Rankin@Queens-Belfast.AC.UK, rjc@soton.ac.uk, topic@mcc.ac.uk, Wolfgang Genzsch Cc: lec@ecs.soton.ac.uk X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Face: ,<'y31|nlb,jCP5?km9\KD+>p9/e?:|$RRhY]e;#`awGHh=mrY.T??#]-*rt}l0*u`k2A7n KlqNG"u'-%cS@3|G[%=m%bSB[lfSn5n"gD4CU(j?1y?#SOkm!qw_=p%c#"6g&(+\Oy6T{4CEShal?z M)&Gd'Pb6Qc~>SPx{m[F55=]yY>cN>|/m5)T?q`OTjdQL=7-n%NT({;;$P*2[#7ZWL8baLoI_/F89, x'u`*$'<|ctKNYTSJuLV=!$QT3bN*>91V,a0Cc"_UsxwMKg\;#W2LZ$!`j?ZWp;byz~;y}2Dz6i7y% E&;gfnmI_~}+oifmWXJMHfWeezBL1("ZnFe!rnX[Q|,:IJ?iq+PePa/[3R4 X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear all, This is the FINAL ANNOUNCEMENT: If you would like to attend this workshop please let Lesley Courtney (lec@ecs.soton.ac.uk) know by Friday 5th September 1997 at the latest as we need to confirm numbers. Workshop details can be found at http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ Regards Mark ------------------------------------- Dr Mark Baker University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 09/01/97 - Time: 10:19:23 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Wed Sep 3 15:37:55 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA20262; Wed, 3 Sep 1997 15:37:55 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA08273; Wed, 3 Sep 1997 15:19:14 -0400 (EDT) Received: from punt-2.mail.demon.net (punt-2b.mail.demon.net [194.217.242.6]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA08262; Wed, 3 Sep 1997 15:19:10 -0400 (EDT) Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-2.mail.demon.net id aa0626941; 3 Sep 97 17:35 BST Message-ID: Date: Wed, 3 Sep 1997 16:31:07 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Prototype PICT release 1.0 MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a At their last meeting the Parkbench Committee recommended that an interactive curve fitting tool be produced for the postprocessing and parametrisation of Parkbench results using the latest Internet Web technology. I have produced a prototype of such a tool as a Java applet running on a Web page on the user's machine and called it PICT (Parkbench Interactive Curve-fitting Tool). This is now ready for evaluation and testing by the committee. The tool provides the following features: (1) Automatic plotting of Low-Level Parkbench output files from a URL anywhere on the Web (At present limited to New COMMS1 and Raw data, but easily extended to original COMMS1 and RINF1). This is useful for a quick comparison of raw data. (2) Automatic plotting of both 2 and 3-parameter curve-fits which are produce by the benchmarks. Good for checking the quality of the fits. (3) Allows manual rescaling of the graph range to suit the data, either by typing in the required range values or by dragging out a range box with the mouse. (4) Allows the 2-parameter and 3-parameter performance curves to be manually moved about the graph in order to fine tune the fits. The curve follows the mouse and the RMS and MAX percentage errors are shown as the curve moves. 
Alternatively, parameter values can be typed in and the Manual button pressed, after which the curve for these values will be plotted. (5) The data file being plotted can be VIEWed, and a HELP button provides a description of the action of each button in a separate window. The PICT applet has been built on top of Leigh Brookshaw's 2D plotting package, the URL for which is given at the bottom of the HELP window. The features under the RESTART button are in his original code; I have just added the 2-PARA and 3-PARA features. The applet was developed using JDK1.0 beta on a PC with a 1600x1200 display and works on the PC both locally and from my Web page with appletviewer, MSIE 3.02 and Netscape 3.01. It has also been successfully run on a Solaris Sun with NS3.01, but another Sun user has reported no graphs and errors due to "wrong applet version". So please report your experiences (both success and failure please) to me with all the details. To play with PICT turn your browser to: http://www.minnow.demon.co.uk/pict/source/pict1.html or pict1a.html. pict1.html asks for 1000x732 pixels and suits PCs best (it's about the minimum useful size). pict1a.html asks for 1020x900 pixels and was necessary for the whole applet to be visible on the Sun. For those wishing to look closer, all the source is provided and should be downloadable. Suggestions for improvement, corrections or constructive criticism are solicited. I have asked for an agenda item to be included for the Parkbench meeting on 11 Sept in Southampton so that PICT can be discussed. I look forward to seeing some of you there. -- Roger Hockney, University of Westminster, UK. Check out my new Web page at URL http://www.minnow.demon.co.uk and the link to my new book "The Science of Computer Benchmarking"; suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-lowlevel@CS.UTK.EDU Wed Sep 10 06:29:15 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA21129; Wed, 10 Sep 1997 06:29:14 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA20815; Wed, 10 Sep 1997 06:31:30 -0400 (EDT) Received: from sun3.nsfnet-relay.ac.uk (sun3.nsfnet-relay.ac.uk [128.86.8.50]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id GAA20791; Wed, 10 Sep 1997 06:30:47 -0400 (EDT) Received: from bright.ecs.soton.ac.uk by sun3.nsfnet-relay.ac.uk with JANET SMTP (PP); Wed, 10 Sep 1997 11:30:44 +0100 Received: from landlord.ecs.soton.ac.uk by bright.ecs.soton.ac.uk; Wed, 10 Sep 97 11:32:57 BST From: Vladimir Getov Received: from bill.ecs.soton.ac.uk by landlord.ecs.soton.ac.uk; Wed, 10 Sep 97 11:33:16 BST Date: Wed, 10 Sep 97 11:33:13 BST Message-Id: <2458.9709101033@bill.ecs.soton.ac.uk> To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU Subject: ParkBench Committee Meeting - tentative Agenda

Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Southampton, U.K. on September 11th, 1997 as part of the ParkBench Workshop. The Workshop site will be the County Hotel in Southampton. County Hotel Highfield Lane Southampton, U.K.
Phone: +44-(0)1703-359955 Please send us your comments about the tentative agenda: 14:30 Finalize meeting agenda Minutes of last meeting (Erich Strohmaier) 14:45 Changes to Current release: - Low Level COMMS benchmarks (Vladimir Getov) - NAS Parallel Benchmarks (Subhash Saini) 15:15 New benchmarks: - HPF Low Level benchmarks (Mark Baker) 15:30 ParkBench Performance Analysis Tools: - ParkBench Result Templates (Vladimir Getov and Mark Papiani) - Visualization of Parallel Benchmark Results - new GBIS (Mark Papiani and Flavio Bergamaschi) - Interactive Web-page Curve-fitting of Parallel Performance Measurements (Roger Hockney) 16:15 Demonstrations: - Java Low-Level Benchmarks (Vladimir Getov) - BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio Bergamaschi) - PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney) 16:45 Other activities: - "Electronic Benchmarking Journal" - status report (Mark Baker) Miscellaneous Date and venue for next meeting 17:00 Adjourn Tony Hey Vladimir Getov Erich Strohmaier
From owner-parkbench-lowlevel@CS.UTK.EDU Thu Sep 18 18:27:19 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id SAA12991; Thu, 18 Sep 1997 18:27:18 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id SAA29359; Thu, 18 Sep 1997 18:26:21 -0400 (EDT) Received: from k2.llnl.gov (zosel@k2.llnl.gov [134.9.1.1]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id SAA29352; Thu, 18 Sep 1997 18:26:19 -0400 (EDT) Received: (from zosel@localhost) by k2.llnl.gov (8.8.5/8.8.5/LLNL-Jun96) id PAA07246 for parkbench-lowlevel@cs.utk.edu; Thu, 18 Sep 1997 15:26:16 -0700 (PDT) Date: Thu, 18 Sep 1997 15:26:16 -0700 (PDT) From: Mary E Zosel Message-Id: <199709182226.PAA07246@k2.llnl.gov> To: parkbench-lowlevel@CS.UTK.EDU Subject: any pthreads tests???

Does anyone know of any low-level performance tests for pthreads libraries??? I'm interested in both single-processor performance of pthreads calls - and also multiprocessor (shared memory) calls ... to measure the overhead of the calls. -mary zosel- zosel@llnl.gov

From owner-parkbench-lowlevel@CS.UTK.EDU Sun Sep 21 09:13:20 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA08699; Sun, 21 Sep 1997 09:13:20 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15884; Sun, 21 Sep 1997 09:15:32 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15877; Sun, 21 Sep 1997 09:15:30 -0400 (EDT) Received: from mordillo (p41.ascend3.is2.bb.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA10322; Sun, 21 Sep 97 14:15:58 BST Date: Sun, 21 Sep 97 13:32:56 +0000 From: Mark Baker Subject: Re: any pthreads tests??? To: Mary E Zosel , parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <199709182226.PAA07246@k2.llnl.gov> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Mary, This has been talked about as one of the activities that Parkbench would be interested in pursuing. But, so far we have not had the time or man-power to follow up our interests. Ron Sercely at HP/CTCX was particularly interested in this area. Also, I know the people at Manchester University wrote a bunch of Pthreads codes - some were benchmarks - for their KSR machine. Hope this helps.
From owner-parkbench-lowlevel@CS.UTK.EDU Sun Sep 21 09:13:20 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA08699; Sun, 21 Sep 1997 09:13:20 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15884; Sun, 21 Sep 1997 09:15:32 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15877; Sun, 21 Sep 1997 09:15:30 -0400 (EDT) Received: from mordillo (p41.ascend3.is2.bb.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA10322; Sun, 21 Sep 97 14:15:58 BST Date: Sun, 21 Sep 97 13:32:56 +0000 From: Mark Baker Subject: Re: any pthreads tests??? To: Mary E Zosel , parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <199709182226.PAA07246@k2.llnl.gov> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Mary,

This has been talked about as one of the activities that Parkbench would be interested in pursuing. But so far we have not had the time or manpower to follow up our interests. Ron Sercely at HP/CTCX was particularly interested in this area. Also, I know the people at Manchester University wrote a bunch of Pthreads codes - some were benchmarks - for their KSR machine.

Hope this helps.

Regards

Mark

--- On Thu, 18 Sep 1997 15:26:16 -0700 (PDT) Mary E Zosel wrote:
> Does anyone know of any low-level performance tests for pthreads libraries???
> I'm interested in both single processor performance of pthreads calls -
> and also multiprocessor (shared memory) calls ... to measure the overhead
> of the calls.
> -mary zosel- zosel@llnl.gov
> ---------------End of Original Message-----------------

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/21/97 - Time: 13:32:57
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Wed Sep 24 06:04:19 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA23913; Wed, 24 Sep 1997 06:04:18 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA23163; Wed, 24 Sep 1997 05:46:35 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA23156; Wed, 24 Sep 1997 05:46:26 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA29780; Wed, 24 Sep 97 10:47:01 BST Date: Wed, 24 Sep 97 10:38:39 +0000 From: Mark Baker Subject: PC timers To: parkbench-comm@CS.UTK.EDU, parkbench-low-level@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Can someone suggest an appropriate PC-based timer function (MS Visual C++ or Digital Visual Fortran) to replace the usual gettimeofday call?

Cheers

Mark

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/24/97 - Time: 10:38:39
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Thu Sep 25 10:11:01 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA20147; Thu, 25 Sep 1997 10:11:01 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA18087; Thu, 25 Sep 1997 09:24:56 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA18080; Thu, 25 Sep 1997 09:24:53 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA12457; Thu, 25 Sep 97 14:25:35 BST Date: Thu, 25 Sep 97 14:11:59 +0000 From: Mark Baker Subject: PC Time function To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Thanks to all for the timer info. I used the C function _ftime() in the end because it has millisecond resolution. I just had to get my head around using INTERFACE in F90 to include the external C function. I've inserted my version of the _ftime() timer below - I don't think there are any obvious errors in it :-) I also implemented the dflib F90 function CALL GETTIM(hour, min, sec, hund) - this function passed tick2 testing but only has 1/100 sec resolution.
-------------------------------------------------------
#include <sys/timeb.h>   /* struct _timeb and _ftime() (MS Visual C) */

/* Wall-clock time in seconds, with millisecond resolution. */
double dwalltime00()
{
    struct _timeb timebuf;
    _ftime( &timebuf );
    return (double) timebuf.time + (double) timebuf.millitm / 1000.0;
}

/* Alternative entry points so the same routine can be linked from Fortran
   compilers that expect trailing-underscore or upper-case external names. */
double dwalltime00_() { return dwalltime00(); }
double DWALLTIME00()  { return dwalltime00(); }
-------------------------------------------------------

Cheers

Mark

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/25/97 - Time: 14:11:59
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------
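The resolution question raised above (millisecond _ftime() versus the 1/100-second GETTIM, and the tick testing used to check them) can be probed directly by watching how the returned value advances between calls. A minimal sketch in C, assuming the dwalltime00() routine just shown is linked in; this is only an illustration of the idea, not the actual ParkBench tick benchmark code.

-------------------------------------------------------
/* Sketch: estimate the effective resolution of dwalltime00() by recording
   the smallest jump between successive distinct readings.
   Illustrative only - not the ParkBench tick benchmarks. */
#include <stdio.h>

double dwalltime00(void);        /* the timer routine from the message above */

int main(void)
{
    double last = dwalltime00();
    double min_step = 1.0e9;
    int ticks;

    for (ticks = 0; ticks < 100; ) {     /* sample 100 timer transitions */
        double now = dwalltime00();
        if (now > last) {                /* the clock has ticked */
            if (now - last < min_step)
                min_step = now - last;
            last = now;
            ticks++;
        }
    }
    printf("smallest observed tick: %g seconds\n", min_step);
    return 0;
}
-------------------------------------------------------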
From owner-parkbench-comm@CS.UTK.EDU Tue Oct 7 06:35:04 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA26560; Tue, 7 Oct 1997 06:35:04 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA25697; Tue, 7 Oct 1997 06:10:11 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA25668; Tue, 7 Oct 1997 06:09:40 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA05125; Tue, 7 Oct 97 11:09:53 BST Date: Tue, 7 Oct 97 10:43:49 +0000 From: Mark Baker Subject: Workshop Papers To: "Aad J. van der Steen" , Charles Grassl , Clemens Thole , David Snelling , Erich Strohmaier , Grapham Nudd , Klaus Stueben , parkbench-comm@CS.UTK.EDU, Roger Hockney , Saini Subhash , Vladimir Getov , William Gropp X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I am now back in the office and have a small amount of time to follow up the Parkbench Workshop that took place a few weeks ago. I would firstly like to thank everyone who attended - especially all the speakers. Even though we did not attract hundreds of delegates to the workshop, I think the event was very successful - but I may be biased...

So, the plan is that in the first instance I will collect the slides from all the speakers, package them up and put them on the PEMCS Web site. We also decided that we would encourage all the speakers to produce short papers on their talks and put all the workshop papers together to create a special issue of the PEMCS journal. Can the speakers therefore send me their slides (I would prefer PowerPoint or Word versions if possible). I will harass you further about the short papers in the near future.

Thanks in advance for your help.

Regards

Mark

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/07/97 - Time: 10:43:49
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Sun Oct 12 09:55:57 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA28908; Sun, 12 Oct 1997 09:55:57 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA08800; Sun, 12 Oct 1997 09:44:23 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA08793; Sun, 12 Oct 1997 09:44:20 -0400 (EDT) Received: from mordillo (p26.nas4.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA11347; Sun, 12 Oct 97 14:45:07 BST Date: Sun, 12 Oct 97 14:35:10 +0000 From: Mark Baker Subject: Equivalent to comms1 To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Can someone point me at the equivalent of comms1 written in C - either MPI or sockets (or even PVM if it's out there).

Cheers

Mark

-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/12/97 - Time: 14:35:10
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Mon Oct 13 16:30:04 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA17020; Mon, 13 Oct 1997 16:29:59 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA24297; Mon, 13 Oct 1997 16:02:05 -0400 (EDT) Received: from dancer.cs.utk.edu (DANCER.CS.UTK.EDU [128.169.92.77]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA24288; Mon, 13 Oct 1997 16:02:03 -0400 (EDT) From: Philip Mucci Received: by dancer.cs.utk.edu (cf v2.11c-UTK) id QAA02925; Mon, 13 Oct 1997 16:02:00 -0400 Date: Mon, 13 Oct 1997 16:02:00 -0400 Message-Id: <199710132002.QAA02925@dancer.cs.utk.edu> To: mab@sis.port.ac.uk, parkbench-comm@CS.UTK.EDU Subject: Re: Equivalent to comms1 In-Reply-To: X-Mailer: [XMailTool v3.1.2b]

I would check out my mpbench on my web page.... It does PVM and MPI for now...

> Can someone point me at the equivalent of comms1 written in
> C - either MPI or sockets (or even PVM if it's out there).
>
> Cheers
>
> Mark
>
> -------------------------------------
> Dr Mark Baker
> CSM, University of Portsmouth, Hants, UK
> Tel: +44 1705 844285 Fax: +44 1705 844006
> E-mail: mab@sis.port.ac.uk
> Date: 10/12/97 - Time: 14:35:10
> URL http://www.sis.port.ac.uk/~mab/
> -------------------------------------

/%*\ Philip J. Mucci | GRA in CS under Dr. JJ Dongarra /*%\
\*%/ http://www.cs.utk.edu/~mucci   PVM/Active Messages  \%*/
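For reference, COMMS1 is ParkBench's point-to-point ping-pong benchmark, so a C equivalent of the basic measurement comes down to timing message round trips between two processes and halving the result. The sketch below is a minimal MPI version of that idea; the message sizes and repeat count are arbitrary, and it is not the actual COMMS1 or mpbench source.

-------------------------------------------------------
/* Sketch of a COMMS1-style ping-pong in C with MPI.
   Illustrative only - run with at least two processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, reps = 100, len, i;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (len = 1; len <= 1 << 20; len *= 2) {       /* 1 byte .. 1 MB */
        char *buf = malloc(len);
        double t0, t1;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {            /* master: send, then wait for echo */
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {     /* echo process */
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)                  /* one-way time = half the round trip */
            printf("%8d bytes  %12.3f microseconds one-way\n",
                   len, (t1 - t0) / (2.0 * reps) * 1.0e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
-------------------------------------------------------

Fitting the resulting (length, time) pairs to Hockney's (r_inf, n_half) model then gives the usual COMMS1-style output parameters.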
From owner-parkbench-comm@CS.UTK.EDU Mon Oct 20 10:37:14 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA15359; Mon, 20 Oct 1997 10:37:14 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA07990; Mon, 20 Oct 1997 10:19:41 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA07691; Mon, 20 Oct 1997 10:17:09 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA16636; Mon, 20 Oct 97 15:17:33 BST Date: Mon, 20 Oct 97 15:02:39 +0000 From: Mark Baker Subject: PEMCS Short Article To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I've just put up (at last!!) the first PEMCS short article at

http://hpc-journals.ecs.soton.ac.uk/PEMCS/Articles/

At the moment there is not much of a "house style" for the format of the papers and articles - this will hopefully be developed over the coming months. I expect to put the first full paper up on the Web in the next week or so.

Comments, ideas and help with the journal and its Web site are most welcome.

Regards

Mark

------------------------------------------------------------------------------------------

COMPARING COMMUNICATION PERFORMANCE OF MPI ON THE CRAY RESEARCH T3E-600 AND IBM SP-2

by Glenn R. Luecke and James J. Coyle
Iowa State University
Ames, Iowa 50011-2251, USA

Waqar ul Haque
University of Northern British Columbia
Prince George, British Columbia, Canada V2N 4Z9

Abstract

This paper reports the performance of the Cray Research T3E and IBM SP-2 on a collection of communication tests that use MPI for the message passing. These tests have been designed to evaluate the performance of communication patterns that we feel are likely to occur in scientific programs. Communication tests were performed for messages of sizes 8 Bytes (B), 1 KB, 100 KB, and 10 MB with 2, 4, 8, 16, 32 and 64 processors. Both machines provided a high level of concurrency for the nearest-neighbor communication tests and moderate concurrency on the broadcast operations. On the tests used, the T3E significantly outperformed the SP-2, with most tests running at least three times faster.
-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/20/97 - Time: 15:02:42
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Sat Oct 25 08:52:33 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA12875; Sat, 25 Oct 1997 08:52:33 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA05256; Sat, 25 Oct 1997 08:41:15 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA05244; Sat, 25 Oct 1997 08:41:05 -0400 (EDT) Received: from mordillo (p16.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA01764; Sat, 25 Oct 97 13:41:26 BST Date: Sat, 25 Oct 97 13:27:24 +0000 From: Mark Baker Subject: Parkbench Workshop Talks - On line To: Chuck Koelbel , Clemens Thole , Grapham Nudd , Guy Robinson , Klaus Stueben , parkbench-comm@CS.UTK.EDU, William Gropp X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 2 (High) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I have put the talks received so far up at...

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/abstracts.html

Could the speakers who have not yet passed their talks on to me please do so. Thanks in advance.

Regards

Mark

-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/25/97 - Time: 13:27:25
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Fri Oct 31 08:22:47 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA19412; Fri, 31 Oct 1997 08:22:46 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA15140; Fri, 31 Oct 1997 07:44:09 -0500 (EST) Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA15133; Fri, 31 Oct 1997 07:44:05 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa2017784; 31 Oct 97 12:25 GMT Message-ID: Date: Fri, 31 Oct 1997 12:22:33 +0000 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Announcing PICT2 MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a

ANNOUNCING PICT2
++++++++++++++++

The prototype Parkbench Interactive Curve Fitting Tool (PICT1) that was demonstrated at the Southampton meeting of Parkbench in September was difficult to use on small screens because the image was too large and could not be reduced to suit the users' screen size. Sorry, I had developed it on my own 1600x1200 display without realising that most users considered 800x600 as large! Well, the new version PICT2 that is now on my web page allows for the full range of screen sizes: 640x480, 800x600, 1024x768, >=1600x1200, and also allows the user to customise his own display by selecting a font size and screen width and height. So the new version should be usable by all -- I hope!

Another problem at Southampton was that the display workstation was very old and too slow in MHz to do the job.
I use a P133 Pentium and the graph lines move around instantly, but if you only have a 20MHz machine, for example, the response will probably be too slow to be useful for real interactive curve fitting. There is nothing I can do about this except to suggest that you use the need to use PICT as an excuse (I mean justification) to upgrade your equipment.

PICT2 still relies on the use of New COMMS1 to compute the least-squares 2-para fit and the 3-point fit for the 3-para. The next step will be to put these features in PICT, but that is a fair amount of code to get right and I thought it best to solve the screen-size problem first. But remember, the key point about PICT is that it allows interactive manual fitting and display that is not otherwise available.

To try out PICT2 turn your browser to:

http://www.minnow.demon.co.uk/pict/source/pict2a.html

and follow the instructions. When you have a good PICT Frame displayed, press the HELP button for a description of the button actions.

Please report problems, experiences (good and bad), and suggestions to me at: roger@minnow.demon.co.uk

I need feedback in order to improve the tool.

Best wishes to you all

Roger

--
Roger Hockney        Checkout my new Web page at URL http://www.minnow.demon.co.uk
University of        and link to my new book: "The Science of Computer Benchmarking"
Westminster UK       Suggestions welcome. Know any fish movies or suitable links?
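The "least-squares 2-para fit" mentioned above refers to Hockney's two-parameter (r_inf, n_half) communication model, in which the time to send a message of n bytes is t(n) = (n + n_half)/r_inf, i.e. a straight line in n whose slope is 1/r_inf and whose intercept is n_half/r_inf. Below is a minimal sketch in C of that fit, assuming arrays of measured message lengths and times; it is only an illustration of the idea, not the New COMMS1 or PICT fitting code, and the sample data are hypothetical.

-------------------------------------------------------
/* Sketch: least-squares fit of t(n) = t0 + n/r_inf to ping-pong timings,
   giving r_inf = 1/slope and n_half = t0 * r_inf.
   Illustrative only - not the New COMMS1 or PICT routines. */
#include <stdio.h>

void fit_rinf_nhalf(int m, const double n[], const double t[],
                    double *rinf, double *nhalf)
{
    double sn = 0.0, st = 0.0, snn = 0.0, snt = 0.0;
    double slope, t0;
    int i;

    for (i = 0; i < m; i++) {
        sn  += n[i];
        st  += t[i];
        snn += n[i] * n[i];
        snt += n[i] * t[i];
    }
    /* standard least-squares slope and intercept for t = t0 + slope*n */
    slope = (m * snt - sn * st) / (m * snn - sn * sn);
    t0    = (st - slope * sn) / m;

    *rinf  = 1.0 / slope;      /* asymptotic bandwidth, bytes per second */
    *nhalf = t0 * (*rinf);     /* message length giving half of r_inf   */
}

int main(void)
{
    /* hypothetical measurements: message length (bytes), one-way time (s) */
    double n[] = { 1.0e2, 1.0e3, 1.0e4, 1.0e5, 1.0e6 };
    double t[] = { 6.0e-5, 7.0e-5, 1.6e-4, 1.1e-3, 1.0e-2 };
    double rinf, nhalf;

    fit_rinf_nhalf(5, n, t, &rinf, &nhalf);
    printf("r_inf = %g bytes/s   n_half = %g bytes\n", rinf, nhalf);
    return 0;
}
-------------------------------------------------------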
From owner-parkbench-comm@CS.UTK.EDU Tue Nov 11 06:21:05 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA18373; Tue, 11 Nov 1997 06:21:05 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA27963; Tue, 11 Nov 1997 06:06:45 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA27930; Tue, 11 Nov 1997 06:06:15 -0500 (EST) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA23083; Tue, 11 Nov 97 11:07:22 GMT Date: Tue, 11 Nov 97 11:00:36 GMT From: Mark Baker Subject: Couple of Announcements To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

A couple of announcements...

Firstly, the majority of the papers presented at the Fall ParkBench Workshop on Thursday 11th/Friday 12th September 1997 at the University of Southampton are now on-line and can be found at

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/abstracts.html

or from http://hpc-journals.ecs.soton.ac.uk/PEMCS/ (click on News in the left frame).

Secondly, the first full paper for the electronic journal Performance Evaluation and Modelling of Computer Systems (PEMCS),

"PERFORM - A Fast Simulator For Estimating Program Execution Time"
by Alistair Dunlop and Tony Hey, Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, U.K.

can be found at

http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/vol1.html

See you all at the Parkbench BOF at SC'97...

Mark

-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 11/11/97 - Time: 11:00:36
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-lowlevel@CS.UTK.EDU Wed Nov 12 21:30:42 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id VAA13985; Wed, 12 Nov 1997 21:30:42 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id VAA06841; Wed, 12 Nov 1997 21:31:46 -0500 (EST) Received: from rudolph.cs.utk.edu (RUDOLPH.CS.UTK.EDU [128.169.92.87]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id VAA06806; Wed, 12 Nov 1997 21:31:01 -0500 (EST) Received: from localhost by rudolph.cs.utk.edu with SMTP (cf v2.11c-UTK) id VAA24812; Wed, 12 Nov 1997 21:31:01 -0500 Date: Wed, 12 Nov 1997 21:31:00 -0500 (EST) From: Erich Strohmaier To: parkbench-hpf@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU Subject: ParkBench BOF session at the SC'97 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear Colleague,

The ParkBench (PARallel Kernels and BENCHmarks) committee has organized a BOF session at SC'97 in San Jose.

Room: Convention Center Room C1
Time: Wednesday 5:30pm

We will talk about the latest release, new results available and future plans.

Tentative Agenda of the BOF
- Introduction, background, WWW-Server
- Current Release of ParkBench
- Low Level Performance Evaluation Tools
- LinAlg Kernel Benchmarks
- NAS Parallel Benchmarks, including latest results
- Plans for the next Release
- Electronic Journal of Performance Evaluation and Modeling for Computer Systems
- Questions from the floor / discussion

Please mark your calendar and plan to attend.

Jack Dongarra
Tony Hey
Erich Strohmaier
From owner-parkbench-lowlevel@CS.UTK.EDU Thu Nov 13 06:30:40 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA07097; Thu, 13 Nov 1997 06:30:40 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01844; Thu, 13 Nov 1997 05:55:24 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01835; Thu, 13 Nov 1997 05:55:18 -0500 (EST) Received: from mordillo (p19.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA18430; Thu, 13 Nov 97 10:56:11 GMT Date: Thu, 13 Nov 97 10:48:53 GMT From: Mark Baker Subject: Fall 97 Parkbench Committee Meeting Minutes To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: Message-Id: Mime-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="mordillo:879418490:877:126:21579" --mordillo:879418490:877:126:21579 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

Here are the minutes of the Parkbench committee meeting held at The County Hotel in Southampton during the Fall 97 Parkbench Workshop. For those of you with a MIME-compliant mail-reader I've attached a formatted Word 7 doc.

Regards

Mark

-----------------------------------------------------------------------------

Parkbench Committee Meeting
Held during the Fall Parkbench Workshop
The County Hotel, Southampton, UK
1515, 11th September 1997

Meeting Participation List:

Mark Baker - Univ. of Portsmouth (mab@sis.port.ac.uk)
Flavio Bergamaschi - Univ of Southampton (fab@ecs.soton.ac.uk)
Jack Dongarra - Univ. of Tenn./ORNL (dongarra@cs.utk.edu)
Vladimir Getov - Univ. of Westminster (getovv@wmin.ac.uk)
Charles Grassl - SGI/Cray (cmg@cray.com)
William Gropp - ANL (gropp@mcs.anl.gov)
Tony Hey - Univ. of Southampton (ajgh@ecs.soton.ac.uk)
Roger Hockney - Univ. of Westminster (roger@minnow.demon.co.uk)
Mark Papiani - Univ of Southampton (mp@ecs.soton.ac.uk)
Subhash Saini - NASA Ames (saini@nas.nasa.gov)
Dave Snelling - FECIT (snelling@fecit.co.uk)
Aad J. van der Steen - RUU (steen@fys.ruu.nl)
Erich Strohmaier - Univ. of Tennessee (erich@cs.utk.edu)
Klaus Stueben - GMD (klaus.stueben@gmd.de)

Meeting Activities and Actions

Tony Hey chaired the meeting.

The minutes from the last meeting were seven pages long, so it was decided that only the actions from the last meeting would be reviewed. The actions were reviewed, with a short discussion about each. A discussion about interaction with SPEC-HPG was initiated.

Comms Low-Level Benchmarks

Vladimir Getov gave a short presentation on the current status of the Parkbench Comms benchmarks. Charles Grassl was asked to explain how his new Comms programs work and the rationale behind them. A long discussion ensued.

Action - Create a formal proposal of alternatives or additions to the comms low-level benchmarks for SC'97 - Charles Grassl.

Action - Members should look at the PALLAS version of the low-level benchmarks (based on Genesis/RAPS).

Action - Erich Strohmaier and Vladimir Getov will discuss the effort needed to split up Parkbench and add in the new Comms1 benchmark (with the new curve-fitting routine).

NPB - Subhash Saini reported on the status of the NAS Parallel Benchmarks.

HPF - Mark Baker read Chuck Koelbel's email about CEWES HPCM HPF efforts.
Action - Subhash Saini will let RICE know that Gina should start off from the single NAS codes.

Electronic Journal - Mark Baker and Tony Hey reported on the electronic journal PEMCS and its Web site. It was agreed that this would be discussed further informally.

Parkbench Report - Erich Strohmaier reported on the efforts of creating a new Parkbench report. A short discussion about this ensued.

Action - Jack Dongarra / Tony Hey will talk to other members about the potential effort that could be put into a Parkbench Report II by SC'97.

Funding Efforts

Jack Dongarra's recent benchmarking proposal was turned down. Tony Hey mentioned the possibility of entering a proposal to the EU. Possibility of a joint EU / NSF bid.

Mark Baker asked if SIO would be interested in being more closely involved. William Gropp reported that SIO was actually winding down and so formal association was not really an option.

AOB

The participants were then invited by Tony to move to the University of Southampton (bldg. 16) for the Parkbench demonstrations, which included:

-- Java Low-Level Benchmarks (Vladimir Getov)
-- BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio Bergamaschi)
-- PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney)

Jack Dongarra informed the committee of the Parkbench BOF at SC'97 (Wednesday at 3.30PM).

The meeting was wound up by Tony Hey at 1630.

-----------------------------------------------------------------------------

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 11/13/97 - Time: 10:48:53
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AABAAAAAADhSnMW+vAFAAAAAANR+ciLwvAEDAAAAAgAAAAMAAAD0AQAAAwAA ACcLAAADAAAAAAAAAAAAAAD+/wAABAACAAAAAAAAAAAAAAAAAAAAAAABAAAA AtXN1ZwuGxCTlwgAKyz5rjAAAAC4AAAACAAAAAEAAABIAAAADwAAAFAAAAAE AAAAdAAAAAUAAAB8AAAABgAAAIQAAAALAAAAjAAAABAAAACUAAAADAAAAJwA AAACAAAA5AQAAB4AAAAZAAAAVW5pdmVyc2l0eSBvZiBQb3J0c21vdXRoAAAA AAMAAAAAOgAAAwAAABcAAAADAAAABQAAAAsAAAAAAAAACwAAAAAAAAAMEAAA AgAAAB4AAAABAAAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA --mordillo:879418490:877:126:21579-- From owner-parkbench-comm@CS.UTK.EDU Thu Nov 13 06:31:53 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA07105; Thu, 13 Nov 1997 06:31:52 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01880; Thu, 13 Nov 1997 05:56:05 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01835; Thu, 13 Nov 1997 05:55:18 -0500 (EST) Received: from mordillo (p19.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA18430; Thu, 13 Nov 97 10:56:11 GMT Date: Thu, 13 Nov 97 10:48:53 GMT From: Mark Baker Subject: Fall 97 Parkbench Committee Meeting Minutes To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: Message-Id: Mime-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="mordillo:879418490:877:126:21579" --mordillo:879418490:877:126:21579 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, Here are the minutes of the Parkbench committee meeting held The County Hotel in Southampton during the Fall 97 Parkbench Workshop. For those of you with a MIME-compliant mail-reader I've attached a formatted word 7 doc. Regards Mark ----------------------------------------------------------------------------- Parkbench Committee Meeting Held during the Fall Parkbench Workshop The County Hotel Southampton, UK 1515, 11th September 1997 Meeting Participation List: Mark Baker - Univ. of Portsmouth (mab@sis.port.ac.uk) Flavio Bergamaschi - Univ of Southampton (fab@ecs.soton.ac.uk) Jack Dongarra - Univ. of Tenn./ORNL (dongarra@cs.utk.edu) Vladimir Getov - Univ. of Westminister (getovv@wmin.ac.uk) Charles Grassl - SGI/Cray (cmg@cray.com) William Gropp - ANL (gropp@mcs.anl.gov) Tony Hey - Univ. 
of Southampton (ajgh@ecs.soton.ac.uk) Roger Hockney - Univ. of Westminister (roger@minnow.demon.co.uk) Mark Papiani - Univ of Southampton (mp@ecs.soton.ac.uk) Subhash Saini - NASA Ames (saini@nas.nasa.gov) Dave Snelling - FECIT (snelling@fecit.co.uk) Aad J. van der Steen - RUU (steen@fys.ruu.nl) Erich Strohmaier - Univ. of Tennessee (erich@cs.utk.edu) Klaus Stueben - GMD (klaus.stueben@gmd.de) Meeting Activities and Actions Tony Hey chaired the meeting. Minutes from last meeting were seven pages long and it was decided that only the actions from the last meeting would be reviewed. The actions from last meeting were reviewed - a short discussion about each took place. A discussion about interaction with SPEC-HPG was initiated. Comms Low-Level Benchmarks Vladimir Getov gave a short presentation on the current status of the Parkbench Comms benchmarks. Charles Grassl was asked to explained how his new Comms programs worked and the rationale behind it. A long discussion ensued. Action - Create a formal proposal of alternative or additions to the comms low-level benchmarks for SC'97 - Charles Grassl. Action - Members should look at the PALLAS version of the low-level benchmarks (based on Genesis/RAPS). Action - Erich Strohmaier and Vladimir Getov will discuss the efforts needed to split up Parkbench and add in the new Comms1 benchmark (with new curve fitting routine). NPB - Subhash Siani reported on the status of the NAS Parallel Benchmarks HPF - Mark Baker read Chuck Koebel's email about CEWES HPCM HPF efforts. Action - Subhash Siani will let RICE know that Gina should start of from the single NAS codes Electronic Journal - Mark Baker and Tony Hey reported on the electronic journal PEMCS and its Web site. It was agreed that this would be discussed further informally. Parkbench Report -Erich Strohmaier reported on the efforts of creating a new Parkbench report. A short discussion about this ensued. Action - Jack Dongarra /Tony Hey will talk to other members about the potential efforts that could be put into a Parkbench report II by SC'97. Funding Efforts Jack Dongarra's recent benchmarking proposal was turned down. Tony Hey mentioned the possibly of entering a proposal to the EU. Possibility of a joint EU / NSF bid. Mark Baker asked if SIO would be interested in being more closely involved. William Gropp reported that SIO was actually winding down and so formal association was not really an option. AOB The participants were then invited by Tony to move to the University of Southampton (bldg. 16) for the Parkbench demonstrations which included: -- Java Low-Level Benchmarks (Vladimir Getov) -- BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio Bergamaschi) -- PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney) Jack Dongarra informed the committee of Parkbench BOF at SC'97 (Wednesday at 3.30PM). The meeting was wound up by Tony Hey at 1630. 
----------------------------------------------------------------------------- ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 11/13/97 - Time: 10:48:53 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- --mordillo:879418490:877:126:21579 Content-Type: APPLICATION/msword; name="minutes-fall-97.doc" Content-Transfer-Encoding: BASE64 Content-Description: minutes-fall-97.doc 0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAAB AAAAEQAAAAAAAAAAEAAAEgAAAAEAAAD+////AAAAABAAAAD///////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// ///////////////////////cpWgAY+AJBAAAAABlAAAAAAAAAAAAAAAAAwAA hxAAABAeAAAAAAAAAAAAAAAAAAAAAAAAhw0AAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAABgAAGoAAAAAGAAAagAAAGoYAAAAAAAAahgAAAAA AABqGAAAAAAAAGoYAAAAAAAAahgAABQAAACkGAAAAAAAAKQYAAAAAAAApBgA AAAAAACkGAAAAAAAAKQYAAAAAAAApBgAAAoAAACuGAAAEAAAAKQYAAAAAAAA Eh0AAHwAAAC+GAAAAAAAAL4YAAAAAAAAvhgAAAAAAAC+GAAAAAAAAL4YAAAA AAAAvhgAAAAAAAC+GAAAAAAAAL4YAAAAAAAABxoAAAIAAAAJGgAAAAAAAAka AAAAAAAACRoAAEsAAABUGgAAUAEAAKQbAABQAQAA9BwAAB4AAACOHQAAWAAA AOYdAAAqAAAAEh0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAahgAAAAAAAC+GAAA AAAAAAAACQAKAAEAAgC+GAAAAAAAAL4YAAAAAAAAAAAAAAAAAAAAAAAAAAAA AL4YAAAAAAAAvhgAAAAAAAASHQAAAAAAANQYAAAAAAAAahgAAAAAAABqGAAA AAAAAL4YAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL4YAAAAAAAA1BgAAAAAAADU GAAAAAAAANQYAAAAAAAAvhgAABYAAABqGAAAAAAAAL4YAAAAAAAAahgAAAAA AAC+GAAAAAAAAAcaAAAAAAAAAAAAAAAAAAAQq9KCIvC8AX4YAAAOAAAAjBgA ABgAAABqGAAAAAAAAGoYAAAAAAAAahgAAAAAAABqGAAAAAAAAL4YAAAAAAAA BxoAAAAAAADUGAAAMwEAANQYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAABQYXJrYmVuY2ggQ29tbWl0dGVlIE1lZXRp bmcNDUhlbGQgZHVyaW5nIHRoZSBGYWxsIFBhcmtiZW5jaCBXb3Jrc2hvcA0N VGhlIENvdW50eSBIb3RlbA0NU291dGhhbXB0b24sIFVLDQ0xNTE1LCAgMTF0 aCBTZXB0ZW1iZXIgMTk5Nw0NDU1lZXRpbmcgUGFydGljaXBhdGlvbiBMaXN0 Og0NTWFyayBCYWtlciAtIFVuaXYuIG9mIFBvcnRzbW91dGggKG1hYkBzaXMu cG9ydC5hYy51aykNRmxhdmlvIEJlcmdhbWFzY2hpICAtIFVuaXYgb2YgU291 dGhhbXB0b24gKGZhYkBlY3Muc290b24uYWMudWspDUphY2sgRG9uZ2FycmEg LSBVbml2LiBvZiBUZW5uLi9PUk5MIChkb25nYXJyYUBjcy51dGsuZWR1KQ1W bGFkaW1pciBHZXRvdiAgLSBVbml2LiBvZiBXZXN0bWluaXN0ZXIgKGdldG92 dkB3bWluLmFjLnVrKQ1DaGFybGVzIEdyYXNzbCAtIFNHSS9DcmF5IChjbWdA Y3JheS5jb20pDVdpbGxpYW0gR3JvcHAgLSBBTkwgKGdyb3BwQG1jcy5hbmwu Z292KQ1Ub255IEhleSAtIFVuaXYuIG9mIFNvdXRoYW1wdG9uIChhamdoQGVj cy5zb3Rvbi5hYy51aykNUm9nZXIgSG9ja25leSAtIFVuaXYuIG9mIFdlc3Rt aW5pc3RlciAocm9nZXJAbWlubm93LmRlbW9uLmNvLnVrKQ1NYXJrIFBhcGlh bmkgLSBVbml2IG9mIFNvdXRoYW1wdG9uIChtcEBlY3Muc290b24uYWMudWsp DVN1Ymhhc2ggU2FpbmkgLSBOQVNBIEFtZXMgKHNhaW5pQG5hcy5uYXNhLmdv dikNRGF2ZSBTbmVsbGluZyAtIEZFQ0lUIChzbmVsbGluZ0BmZWNpdC5jby51 aykNQWFkIEouIHZhbiBkZXIgU3RlZW4gIC0gUlVVIChzdGVlbkBmeXMucnV1 Lm5sKQ1FcmljaCBTdHJvaG1haWVyIC0gVW5pdi4gb2YgVGVubmVzc2VlIChl cmljaEBjcy51dGsuZWR1KQ1LbGF1cyBTdHVlYmVuIC0gR01EICAoa2xhdXMu c3R1ZWJlbkBnbWQuZGUpDQ1NZWV0aW5nIEFjdGl2aXRpZXMgYW5kIEFjdGlv bnMNDVRvbnkgSGV5IGNoYWlyZWQgdGhlIG1lZXRpbmcuDQ1NaW51dGVzIGZy 
--mordillo:879418490:877:126:21579-- From owner-parkbench-comm@CS.UTK.EDU Mon Nov 17 08:32:09 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA28026; Mon, 17 Nov 1997 08:32:09 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA07698; Mon, 17 Nov 1997 07:58:13 -0500 (EST) Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA07665; Mon, 17 Nov 1997 07:57:54 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa2024828; 17 Nov 97 12:43 GMT Message-ID: <06u4dCAfsDc0Ew8p@minnow.demon.co.uk> Date: Mon, 17 Nov 1997 12:39:59 +0000 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: To the PARKBENCH97 BOF MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a

GREETINGS TO THE PARKBENCH 1997 BOF ----------------------------------- I am not able to attend the Parkbench BOF this year but would like to make the following input: Chairman: Please express my apologies for absence to the meeting.

Agenda Item: Low-Level Performance Evaluation tools. -------------------------------------- The latest version of the Parkbench Interactive Curve Fitting Tool (PICT2) is on my Web page at: http://www.minnow.demon.co.uk/pict/source/pict2a.html I believe that this solves the problem of displaying on different sized screens. Please try it and give me feedback (I have had little so far, so I don't know how worthwhile it is!). This plots and allows manual interactive curve fitting of data anywhere on the Web in raw-data, Original COMMS1, and New COMMS1 format. However, it still relies on COMMS1 calculating the least squares 2-Para and 3-Point 3-Para fits.

Agenda Item: Plans for the next Release. -------------------------- Just a reminder that New COMMS1, as announced in my email to the committee of 16 Feb 1997, was designed as the minimum necessary changes to the existing release to solve the problems raised at the beginning of the year. It involves new versions of 5 routines and 2 new routines. In addition, the Make files need the 2 new routines added where appropriate. We have incorporated these changes at Westminster in the existing release without trouble. I believe that these should be incorporated in the next release. In summary: New COMMS1 In directory: http://www.minnow.demon.co.uk/Pbench/comms1/ The 5 Changed Routines: (1) File COMMS1_1.F replaces ParkBench/Low_Level/comms1/src_mpi/COMMS1.f (2) File COMMS1_1.INC replaces ParkBench/Low_Level/comms1/src_mpi/comms1.inc (3) File ESTCOM_1.F replaces ParkBench/Low_Level/comms1/src_mpi/ESTCOM.f (4) File LSTSQ_1.F replaces ParkBench/lib/Low_Level/LSTSQ.f (5) File CHECK_1.F replaces Parkbench/lib/Low_Level/CHECK.f The 2 New Routines: (6) File LINERR_1.F add as ParkBench/lib/Low_Level/LINERR.f (7) File VPOWER_1.F add as ParkBench/lib/Low_Level/VPOWER.f

HAVE A NICE MEETING, and best wishes to you all, Roger Hockney -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links?
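As background for the 2-Para fits mentioned above: the two-parameter COMMS1 model, in Hockney's notation as used later in this thread, is t(n) = t0 + n/rinf, so rinf is the asymptotic bandwidth, t0 is the extrapolated startup time, and the half-performance message length works out to n_half = rinf * t0. The following is a minimal, self-contained C sketch of that model (illustrative only; the function names and example numbers are not taken from the ParkBench sources).

#include <stdio.h>

/* Hockney's two-parameter model for point-to-point message time:
 *     t(n) = t0 + n / rinf
 * where t0 is the startup time (s) and rinf the asymptotic bandwidth
 * (byte/s).  The half-performance length, at which the achieved
 * bandwidth n/t(n) reaches rinf/2, is  n_half = rinf * t0.
 * Illustrative sketch only -- not the ParkBench source code.          */

static double model_time(double t0, double rinf, double n)
{
    return t0 + n / rinf;
}

static double model_bandwidth(double t0, double rinf, double n)
{
    return n / model_time(t0, rinf, n);
}

int main(void)
{
    double t0   = 20.0e-6;   /* hypothetical 20 microsecond startup */
    double rinf = 1.0e8;     /* hypothetical 100 Mbyte/s asymptote  */

    printf("n_half = %.0f bytes\n", rinf * t0);
    for (double n = 8.0; n <= 1.0e7; n *= 10.0)
        printf("n = %10.0f B   t = %12.3e s   r = %8.2f Mbyte/s\n",
               n, model_time(t0, rinf, n),
               model_bandwidth(t0, rinf, n) / 1.0e6);
    return 0;
}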
From owner-parkbench-comm@CS.UTK.EDU Mon Dec 1 08:38:55 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA05062; Mon, 1 Dec 1997 08:38:55 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA20432; Mon, 1 Dec 1997 08:03:34 -0500 (EST) Received: from hermes.lsi.usp.br (hermes.lsi.usp.br [143.107.161.220]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA20425; Mon, 1 Dec 1997 08:03:30 -0500 (EST) Received: from cali.lsi.usp.br (cali.lsi.usp.br [10.0.161.7]) by hermes.lsi.usp.br (8.8.5/8.7.3) with SMTP id LAA05866; Mon, 1 Dec 1997 11:03:20 -0200 (BDB) Message-ID: <34830ABD.487C@lsi.usp.br> Date: Mon, 01 Dec 1997 11:06:37 -0800 From: Martha Torres Organization: LSI X-Mailer: Mozilla 3.01Gold (Win95; I) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU CC: mxtd@lsi.usp.br Subject: compiling ParkBench for MPICH Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

Sirs ParkBench Committee Dear Sirs, I am a Ph.D. student working with collective communication operations. In particular, I am interested in quantifying the influence of collective communication operations on the total execution time of several MPI programs. My platform is a cluster of 8 Dual Pentium Pro processors interconnected by 100 Mb/s Fast Ethernet. I use MPICH version 1.1 and the fort77 and cc compilers.

I have downloaded ParkBench.tar from netlib. I followed all instructions but there are some programs that did not work:

1. Low_Level/poly1 poly2 rinf1 tick1 tick2 They did not compile. The following message appears: ParkBench/lib/LINUX/ParkBench_misc.a: No such file or directory. How do I create this library?

2. Kernels/LU_solver QR TRD They also did not compile. The following messages appear: ParkBench/lib/LINUX/pblas_subset.a: In function 'pberror_' undefined reference to 'blacs_gridinfo_' undefined reference to 'blacs_abort_'

3. Comp_Apps/PSTSWM and Kernels/MATMUL They compiled but they did not run.

Thanks in advance, Best Regards Martha Torres Laboratorio de Sistema Integraveis University of Sao Paulo Sao Paulo - S.P. Brazil
From owner-parkbench-comm@CS.UTK.EDU Wed Jan 7 16:49:19 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA19963; Wed, 7 Jan 1998 16:49:19 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA17461; Wed, 7 Jan 1998 16:30:05 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA17452; Wed, 7 Jan 1998 16:30:02 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id PAA16817 for ; Wed, 7 Jan 1998 15:30:03 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id PAA27253; Wed, 7 Jan 1998 15:30:00 -0600 (CST) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id VAA26077; Wed, 7 Jan 1998 21:29:59 GMT Message-Id: <199801072129.VAA26077@magnet.cray.com> Subject: Low Level benchmarks To: parkbench-comm@CS.UTK.EDU Date: Wed, 7 Jan 1998 15:29:59 -0600 (CST) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit -- Charles Grassl

From owner-parkbench-comm@CS.UTK.EDU Wed Jan 7 16:56:40 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA19981; Wed, 7 Jan 1998 16:56:40 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA17784; Wed, 7 Jan 1998 16:36:27 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA17776; Wed, 7 Jan 1998 16:36:24 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id PAA17087 for ; Wed, 7 Jan 1998 15:36:24 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id PAA28449 for ; Wed, 7 Jan 1998 15:36:22 -0600 (CST) Received: from magnet by magnet.cray.com (8.8.0/btd-b3) via SMTP id VAA26107; Wed, 7 Jan 1998 21:36:21 GMT Sender: cmg@cray.com Message-ID: <34B3F553.167E@cray.com> Date: Wed, 07 Jan 1998 15:36:19 -0600 From: Charles Grassl Organization: Cray Research X-Mailer: Mozilla 3.01SC-SGI (X11; I; IRIX 6.2 IP22) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU Subject: Low Level benchmark errors and differences Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

To: Parkbench Low Level interests From: Charles Grassl Subject: Low Level benchmark errors and differences Date: 7 January, 1998

We should not produce or publish Parkbench Low Level benchmark results with the current suite of programs because the programs are inaccurate and unreliable. I ran the Low Level programs and compared the results with the same metrics as recorded from other benchmark programs. The differences range from less than 5% (acceptable) to a factor of 6, which is unacceptable.

The differences, or "errors", are summarized in the table below.
The recorded differences in results from the Low Level programs were arrived at by comparing the metrics reported by the Parkbench programs with the same metrics as measured by alternative programs.

Table. Differences in Low Level benchmark results for two systems. System A is an Origin 2000. System B is a CRAY T3E.

               System A              System B
            Rinf    Startup       Rinf    Startup
   -----------------------------------------
   COMMS1    <10%      6x          <5%      6x
   COMMS2     2x       3x          <5%     <5%
   COMMS3    <5%      <5%
   POLY1     <5%      60%           2x     <5%
   POLY2     <5%      60%           2x     <5%
   POLY3      -        -            2x     80x

The Parkbench Low Level programs are occasionally requested for benchmarking computer systems, but the results are usually rejected because of their inaccuracy and unreliability. If not rejected, they cause confusion and consternation because the results do not agree with other measurements of the same variables. I emphasize that this is not a case of obtaining optimization and favorable results for a computer system. The problem is with the inaccuracy and unreliability of the results.

The Low Level programs measure and report low level parameters. Therefore their value is in accuracy and utility. The programs do not constitute definitions of the reported metrics and hence the results should correlate with other measurements of the same variables.

The Low Level programs are obsolete and need to be replaced. I have written seven simple programs, with MPI and PVM versions, and offer them as a replacement for the Low Level suite.

I strongly suggest that we delete or withdraw from distribution the current Low Level suite.

From owner-parkbench-comm@CS.UTK.EDU Thu Jan 8 05:40:28 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA01529; Thu, 8 Jan 1998 05:40:28 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA00442; Thu, 8 Jan 1998 05:20:21 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id FAA00380; Thu, 8 Jan 1998 05:20:13 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id LAA28869; Thu, 8 Jan 1998 11:20:05 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id LAA24864; Thu, 8 Jan 1998 11:18:48 +0100 Date: Thu, 8 Jan 1998 11:18:48 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801081018.LAA24864@sgi7.ccrl-nece.technopark.gmd.de> To: parkbench-comm@CS.UTK.EDU Subject: Low Level benchmark errors and differences Cc: ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de, eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, tbeckers@ess.nec.de Reply-To: hempel@ccrl-nece.technopark.gmd.de

To: Parkbench Low Level interests From: Rolf Hempel Subject: Low Level benchmark errors and differences, Note from Charles Grassl of January 7th Date: 8 January, 1998

Thank you, Charles, for your note on the Low Level benchmarks. It could not have come at a better time, because at NEC we just recently ran into problems with COMMS1. This code had been specified by a customer as a test case in a current procurement. When we ran COMMS1 with our current MPI library, the results for rinfinity and latency were completely wrong. In particular, the latency values were off by more than a factor of two when compared with other ping-pong test programs.
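For readers who have not seen one, the independent ping-pong measurements referred to here take, in outline, the following form. This is a hedged C/MPI sketch of the general technique, not the MPPTEST or COMMS1 source; the message length and repetition count are arbitrary.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal two-process ping-pong timing sketch (illustrative only).
 * Rank 0 times NREPS round trips of an n-byte message and reports the
 * mean round-trip time; no model fitting or halving is attempted here. */
#define NREPS 1000

int main(int argc, char **argv)
{
    int rank, size, n = 1024;            /* arbitrary message length */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    buf = calloc(n, 1);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < NREPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("n = %d bytes: mean round-trip time = %g s over %d reps\n",
               n, (t1 - t0) / NREPS, NREPS);
    free(buf);
    MPI_Finalize();
    return 0;
}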
The following turned out to be the main reasons for the errors: 1. The performance model is completely inadequate. A linear dependency between time and message length, fitted to the measurements by least squares, is bound to fail in the presence of discontinuities caused by protocol changes. Most MPI implementations change protocols for different message lengths for an overall performance optimization. 2. To make things worse, the least square fit overweighs the data points for very long messages, because the differences "model minus measurement" are largest there in absolute terms. The fitted line, therefore, more or less ignores the short message measurements. As a result, the latencies are completely up to chance. 3. The correction for internal measurement overhead (e.g., for subroutine calls) is programmed in a sloppy way, to say the least. We discovered several subroutine calls which were not taken into account, and the overhead is measured with low precision. For our implementation, this alone introduced a latency error of about 25%. The result in our case was that, instead of the 13.5 usec latency measured by the MPICH MPPTEST routine, COMMS1 initially reported some 28 usec. My colleague Hubert Ritzdorf then made an interesting experiment: he removed some optimization from our MPI library for long messages, thus INCREASING the communication times for messages longer than 128000 bytes, and not changing anything for shorter messages. The resulting DROP in latency from 28 to under 22 usec clearly shows how ridiculous the COMMS1 benchmark is. Thus, I strongly agree with Charles in that the COMMS* benchmarks must be removed from PARKBENCH. They don't help anybody, and they only cause confusion on the side of customers and frustration on the side of benchmarkers. Let's get rid of this long-standing nuisance as quickly as possible. Best regards, Rolf Hempel ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Thu Jan 8 08:07:54 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA02383; Thu, 8 Jan 1998 08:07:53 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA05392; Thu, 8 Jan 1998 07:50:13 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA05383; Thu, 8 Jan 1998 07:50:03 -0500 (EST) Received: from mordillo (p108.nas1.is4.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA03072; Thu, 8 Jan 98 12:48:32 GMT Date: Thu, 8 Jan 98 12:10:55 GMT From: Mark Baker Subject: Re: Low Level benchmark errors and differences To: Charles Grassl , parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <34B3F553.167E@cray.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII I am in agreement with Charles and Rolf about the low-level codes. We've known for some time that they (the codes) are less than perfect, if not in some cases flawed. At the SC'97 Parkbench meeting it was mooted that Parkbench should concentrate on producing, supporting, analysing and recording Low-Level codes and results. 
If this is the case then we should certainly ensure that the codes we support are soundly written and produce consistent and reliable results. I certainly believe that a set of codes, akin to the low-level ones, should be part of the Parkbench suite. Maybe this is a good time to replace the current codes with those that Charles has produced!?

As a side issue, I think we should produce C versions of whatever low-level codes we produce. Charles, I'd be interested in your thoughts on the codes that Pallas produce - ftp://ftp.pallas.de/pub/PALLAS/PMB/PMB10.tar.gz. These are C benchmark codes that run:

PingPong - like comms1
PingPing - like comms2
Xover
Cshift
Exchange
Allreduce
Bcast
Barrier - like synch1

Obviously, I wouldn't like to comment on how well written they are or how reliable the results that they produce are. I'm relatively impressed with them. I also like the fact that they try to produce results for commonly used MPI functions - cshift/exchange/etc. I've run the codes on NT boxes and they appear to produce results close to what I would expect.

Regards Mark

------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 01/08/98 - Time: 12:10:55 URL http://www.sis.port.ac.uk/~mab/ -------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Mon Jan 12 16:02:28 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA26216; Mon, 12 Jan 1998 16:02:28 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA16631; Mon, 12 Jan 1998 15:38:05 -0500 (EST) Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA16588; Mon, 12 Jan 1998 15:37:38 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa2012292; 12 Jan 98 17:34 GMT Message-ID: Date: Mon, 12 Jan 1998 17:33:01 +0000 To: hempel@ccrl-nece.technopark.gmd.de Cc: parkbench-comm@CS.UTK.EDU, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de, eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, tbeckers@ess.nec.de From: Roger Hockney Subject: Re: Low Level benchmark errors and differences In-Reply-To: <199801081018.LAA24864@sgi7.ccrl-nece.technopark.gmd.de> MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a

To: Rolf, Charles, Mark and others, From: Roger

I too am distressed to see the original COMMS1 code (written and tested for message lengths only up to 10^4) is still being issued by Parkbench and being used well outside its range of proven validity (message lengths now typically up to 10^7 or even 10^8). These problems were pointed out about one year ago by Charles and Ron, and as a result I worked on the code and issued to the committee a minimum set of changes to the current release that would solve many of the problems. These involve replacing five existing routines and adding two to the existing release. The routines involved have been downloadable from my Web site since about 12 March 1997 and have been used successfully at Westminster University in our work. The New COMMS1, as I called it, was the subject of two printed reports to the May 1997 meeting of Parkbench and further results were shown at the Sept 1997 meeting. There were also extensive discussions in this email group during 1997.

Unfortunately my simple fixes were not inserted into the Parkbench release and as a result we are still getting a bad press from benchmarkers. After all the effort I put into solving this problem a year ago, I feel rather let down that my work was never used. If my changes had been incorporated into the Parkbenchmarks when they were offered, at least as an interim measure, I believe we could have avoided much of the current bad publicity.

I emphasise that the New COMMS1 was written as a minimum patch to the existing release to solve an urgent problem in the simplest way. I am not against a complete rethink of the low level benchmarks and, now that MPI has become a recognised standard, benchmarks timing the principal software primitives of MPI would seem to be the most useful. Quite possibly Charles's or Mucci's codes could be used. However, I am still firmly convinced of the value of approximate parametric representation of all the benchmark measurements based on a simple performance model.
Most of the existing low-level benchmarks were written primarily to determine such parameters and hence include both raw measurements and least squares curve fitting to obtain the parameters. I have yet to see data that cannot be satisfactorily fitted by 2 or 3 parameters, or two sets of 2-paras. And remember that I am talking here about fitting ALL the measured data by some simple formulae. After the decision of the May 1997 meeting to separate the raw measurements from the parametric curve fitting, the curve fitting will eventually become part of the "Parkbench Interactive Curve Fitting Tool" (PICT). At present this applet can be used to produce a manual curve fit, but eventually I will put up on my Web site a version in which the least squares and 3-point buttons are active. But PICT as it is can now be used manually to see how good or bad the 2-para and 3-para fits are. Turn your browser to: http://www.minnow.demon.co.uk/pict/source/pict2a.html and insert your raw data. I would be very interested to see what the NEC data looks like. To answer some of Rolf's points: Rolf Hempel writes > >1. The performance model is completely inadequate. A linear dependency > between time and message length, fitted to the measurements by > least squares, is bound to fail in the presence of discontinuities > caused by protocol changes. Most MPI implementations change > protocols for different message lengths for an overall performance > optimization. > Note that the original COMMS1 that you are using allows you to insert one break point to take account of one major discontinuity. Have you tried this? In any case, to make t_0 a good measure of startup it is sensible ALWAYS to make a breakpoint at say 100 or 1000 Byte, then the short message t_0 should be a good measure of startup. The long message t_0 is then not of interest and should be ignored. In this way one is using the straight- line fit over a short range of lengths, and the resulting t_0 should be a better estimate of latency because it is derived from several measurements rather than just selecting a single measurement (e.g. the time for the shortest message) -- surely a better experimental procedure. I emphasise that this procedure can be used now with the original COMMS1 to get sensible results. If there are many small discontinuities or changes of protocol then I expect you data is rather like that shown by Charles this time last year and used as an example in PICT. In this case the 3-para fit may give good results for your data as it did for Charles's. >2. To make things worse, the least square fit overweighs the data points > for very long messages, because the differences "model minus > measurement" are largest there in absolute terms. The fitted line, > therefore, more or less ignores the short message measurements. > As a result, the latencies are completely up to chance. > This is absolutely true and was discovered to be the problem one year ago. My solution, used in the New COMMS1, was and is to minimise the sum of the squares of the relative (rather than absolute) error. If this is done the values for short messages are not ignored in the way described, and t_0 is held much closer to the time for the smallest message length. Note also that the 3-parameter fit provided by New COMMS1 can be fitted exactly to the time for the shortest message, to the bandwidth for the longest message, and to the bandwidth near the mid point. This is the so-called 3-point fit, but it does require a third parameter. 
Can you please email me the output file for the NEC from the original COMMS1. I can then put this data through the New COMMS1 and see what two and three parameter fits are produced. Otherwise you could update your version of Parkbenchmarks with the 7 subroutines and rerun using New COMMS1. See the instructions at the end of this email. >28 usec. My colleague Hubert Ritzdorf then made an interesting >experiment: he removed some optimization from our MPI library for >long messages, thus INCREASING the communication times for messages >longer than 128000 bytes, and not changing anything for shorter >messages. The resulting DROP in latency from 28 to under 22 usec >clearly shows how ridiculous the COMMS1 benchmark is. > Hubert's results are just what one would expect from minimising the absolute error. I suspect you would not see this effect with New COMMS1 which does not over-emphasise the long message measurements. Please remember that the t_0 reported by COMMS1 is not a measurement of the time for any particular message length. It is the constant term in the fitted curve: t = t_0 + n/rinf which is an approximation to ALL the measured data. If you want to know the time, say for the smallest message length, then that is listed in the table of lengths and times reported in the benchmark output. If you mean by latency the time for the shortest message (hopefully zero or 1 Byte) then the COMMS1 measurements of this are in this table not in t_0. For those who missed my two earlier emailings on using the New COMMS1, I copy my earlier email below: Agenda Item : Plans for the next Release. -------------------------- Just a reminder that New COMMS1 as announced in my email to the committee of 16 Feb 1997, was designed as the minimum necessary changes to the existing release to solve the problems raised at the beginning of the year. It involves new versions of 5 routines and 2 new routines. In addition, the Make files need the 2 new routines added where appropriate. We have incorporated these changes at Westminster in the existing release without trouble. I believe that these should be incorported in the next release. In summary: New COMMS1 In directory: http://www.minnow.demon.co.uk/Pbench/comms1/ The 5 Changed Routines: (1) File COMMS1_1.F replaces the following file in the current release: ParkBench/Low_Level/comms1/src_mpi/COMMS1.f (2) File COMMS1_1.INC replaces ParkBench/Low_Level/comms1/src_mpi/comms1.inc (3) File ESTCOM_1.F replaces ParkBench/Low_Level/comms1/src_mpi/ESTCOM.f (4) File LSTSQ_1.F replaces ParkBench/lib/Low_Level/LSTSQ.f (5) File CHECK_1.F replaces Parkbench/lib/Low_Level/CHECK.f The 2 New Routines: (6) File LINERR_1.F add as ParkBench/lib/Low_Level/LINERR.f (7) File VPOWER_1.F add as ParkBench/lib/Low_Level/VPOWER.f Best wishes to you all Roger -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? 
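To make Roger's absolute-versus-relative error point concrete: fitting the measured pairs (n_i, t_i) to t = t_0 + n/rinf is an ordinary linear fit with intercept t_0 and slope 1/rinf, and the only real question is how each residual is weighted. The sketch below, in self-contained C, is not the ParkBench LSTSQ or LINERR source, and the sample data are invented; it simply fits the same synthetic data twice, once minimising absolute residuals and once minimising relative residuals, so the effect on t_0 can be seen directly.

#include <stdio.h>

/* Fit t = t0 + n/rinf to measured (n_i, t_i) pairs by weighted least
 * squares on the linear form  t = a + b*n  (a = t0, b = 1/rinf).
 *   relative = 0 : w_i = 1        -- minimise absolute residuals
 *                                    (long messages dominate the fit)
 *   relative = 1 : w_i = 1/t_i^2  -- minimise relative residuals
 * Illustrative sketch only, not the ParkBench LSTSQ/LINERR routines.  */
static void fit(int m, const double *n, const double *t, int relative,
                double *t0, double *rinf)
{
    double Sw = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (int i = 0; i < m; i++) {
        double w = relative ? 1.0 / (t[i] * t[i]) : 1.0;
        Sw  += w;
        Sx  += w * n[i];
        Sy  += w * t[i];
        Sxx += w * n[i] * n[i];
        Sxy += w * n[i] * t[i];
    }
    double b = (Sw * Sxy - Sx * Sy) / (Sw * Sxx - Sx * Sx);
    *t0   = (Sy - b * Sx) / Sw;
    *rinf = 1.0 / b;
}

int main(void)
{
    /* invented ping-pong data: 20 us startup, 100 Mbyte/s asymptote,
       with a protocol change that slows messages above 100 kbyte by 50% */
    double n[6] = { 8.0, 64.0, 1024.0, 16384.0, 262144.0, 4194304.0 };
    double t[6], t0, rinf;

    for (int i = 0; i < 6; i++) {
        t[i] = 20.0e-6 + n[i] / 1.0e8;
        if (n[i] > 1.0e5) t[i] *= 1.5;
    }
    fit(6, n, t, 0, &t0, &rinf);
    printf("absolute-error fit: t0 = %g s   rinf = %g byte/s\n", t0, rinf);
    fit(6, n, t, 1, &t0, &rinf);
    printf("relative-error fit: t0 = %g s   rinf = %g byte/s\n", t0, rinf);
    return 0;
}

With the unweighted fit the long-message points pull the intercept away from the short-message times; with relative weighting the short-message points dominate and t_0 stays close to the measured startup, which is the behaviour Roger describes for New COMMS1.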
From owner-parkbench-comm@CS.UTK.EDU Tue Jan 13 08:38:07 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA17513; Tue, 13 Jan 1998 08:38:07 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA03191; Tue, 13 Jan 1998 08:20:10 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA03184; Tue, 13 Jan 1998 08:20:07 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id OAA04953; Tue, 13 Jan 1998 14:19:47 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id OAA02202; Tue, 13 Jan 1998 14:18:30 +0100 Date: Tue, 13 Jan 1998 14:18:30 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801131318.OAA02202@sgi7.ccrl-nece.technopark.gmd.de> To: roger@minnow.demon.co.uk Subject: COMMS1 Benchmark Cc: tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, eckhard@ess.nec.de, clantwin@ess.nec.de, parkbench-comm@CS.UTK.EDU Reply-To: hempel@ccrl-nece.technopark.gmd.de Dear Roger, thank you for your note on the COMMS1 benchmark. We didn't try the NEW COMMS1 code yet with our MPI library, so I cannot comment on its accuracy. I just would like to answer some of the issues you raised in your mail. Of course we have seen that in COMMS1 you can select a transition point between a short and a long model. For this choice, however, you have to be able to change the input data. In our case (a benchmark suite used in a procurement) our customer had provided the input dataset, and we were not allowed to change it. So, the only way for us to correct the results was to tune our MPI library to make it fit to the benchmark program. I don't think that this is what you had in mind when you wrote COMMS1. You didn't comment on the inaccuracies we found in the raw measurements. We ran several ping-pong benchmarks before, as, for example, the MPPTEST routine of MPICH, and they consistently give better latencies for short messages (difference approx. 25%). As I explained in my previous mail, we found the reason to be an improper correction for measurement overheads in COMMS1. Thus, the raw data are flawed, and this cannot be resolved by any parameter fitting. This is also the reason that I hesitate to send you the raw data reported by COMMS1 on our machine. I agree with you that it would be nice to have a few parameters to characterize the performance of any given system. The values for "n1/2" and "rinfinity" have been quite successful for vector arithmetic operations. The situation is, however, much more complicated for communication operations. As an example, let's take the famous ping-pong benchmark. We already discussed the problem of discontinuities caused by protocol changes. If you want to do a parameter fitting, the only reasonable solution seems to me that your test program automatically detects such points and handles the different protocols separately. If you leave the selection to an input parameter, you will inevitably run into the problem I discussed above. Even if you solve this problem, there remain many others. In modern (i.e. highly optimized) MPI implementations, the performance of a ping-pong operation crucially depends on the status of the two processes involved. 
Is the receiving process already waiting for the message? In a ping-pong, it usually is. This can make a huge difference! Also, the performance can also depend on the global number of processes active in the application. Not only do search lists in communication progress engines become shorter if there are fewer processes, but some implementers even went as far as writing special code for the case where you just have two processes. Ping-pong codes such as COMMS1 almost always just use two communicating processes, so they measure the best case. Another effect which is too often ignored is that messages can interfere with each other (both at the hardware and software level) if they are sent at the same time between different process pairs. All those effects combined cause a substantial difference between ping-pong results and measurements in real applications. In this situation the apparent precision of performance parameters can be quite misleading. If I want to judge the quality of an MPI implementation, I don't trust in best fit parameters so much. For the ping-pong code, I just look at a graphic representation of time versus message length for short messages, and another one of bandwidth versus message length for long messages. This way I can study discontinuities and other minor effects in detail. And then, take real applications and measure the communication times there. Then you will often find surprising results which you have never seen in a ping-pong benchmark. Best wishes, Rolf ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Thu Jan 15 14:17:57 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA00690; Thu, 15 Jan 1998 14:17:56 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA23858; Thu, 15 Jan 1998 13:55:08 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA23830; Thu, 15 Jan 1998 13:54:57 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id LAA11159 for ; Thu, 15 Jan 1998 11:11:42 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id LAA08650 for ; Thu, 15 Jan 1998 11:11:41 -0600 (CST) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id RAA07227; Thu, 15 Jan 1998 17:11:40 GMT Message-Id: <199801151711.RAA07227@magnet.cray.com> Subject: Low Level Benchmarks To: parkbench-comm@CS.UTK.EDU Date: Thu, 15 Jan 1998 11:11:39 -0600 (CST) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit To: Parkbench interests From: Charles Grassl Subject: Low Level benchmarks Date: 15 January, 1998 Mark, thank you for pointing us to the PMB benchmark. It is well written and coded, but has some discrepancies and shortcomings. My comments lead to suggestions and recommendation regarding low level communication benchmarks. 
First, in program PMB the PingPong tests are twice as fast (in time) as the corresponding message length tests in the PingPing tests (as run on a CRAY T3E). The calculation of the time and bandwidth is incorrect by a factor of 100% in one of the programs. This error can be fixed by recording, using and reporting the actual time, the amount of data sent and their ratio. That is, the time should not be divided by two in order to correct for a round trip. This recorded time is for a round trip message, and is not precisely the time for two messages. Half the round trip message passing time, as reported in the PMB tests, is not the time for a single message and should not be reported as such. This same erroneous technique is used in the COMMS1 and COMMS2 benchmarks. (Is Parkbench responsible for propagating this incorrect methodology?)

In program PMB, the testing procedure performs a "warm up". This procedure is a poor testing methodology because it discards important data. Testing programs such as this should record all times and calculate the variance and other statistics in order to perform error analysis.

Program PMB does not measure contention or allow extraction of network contention data. Tests "Allreduce" and "Bcast" and several others stress the inter-PE communication network with multiple messages, but it is not possible to extract information about the contention from these tests. The MPI routines for Allreduce and Bcast have algorithms which change with respect to the number of PEs and message lengths. Hence, without detailed information about the specific algorithms used, we cannot extract information about network performance or further characterize the inter-PE network.

Basic measurements must be separated from algorithms. Tests PingPong, PingPing, Barrier, Xover, Cshift and Exchange are low level. Tests Allreduce and Bcast are algorithms. The algorithms Allreduce and Bcast need additional (algorithmic) information in order to be described in terms of the basic level benchmarks.

With respect to low level testing, the round trip exchange of messages, as per PingPing and PingPong in PMB or COMMS1 and COMMS2, is not characteristic of the lowest level of communication. This pattern is actually rather rare in programming practice. It is more common for tasks to send single messages and/or to receive single messages. In this scheme, messages do not make a round trip and there are not necessarily caching or other coherency effects. The single message passing is a distinctly different case from that of round trip tests. We should be worried that the round trip testing might introduce artifacts not characteristic of actual (low level) usage. We need a better test of basic bandwidth and latency in order to measure and characterize message passing performance.

Here are suggestions and requirements, in outline form, for a low level benchmark design:

I. Single and double (bidirectional) messages.

  A. Test single messages, not round trips.

    1. The round trip test is an algorithm and a pattern. As such it should not be used as the basic low level test of bandwidth.

    2. Use direct measurements where possible (which is nearly always). For experimental design, the simplest method is the most desirable and best.

    3. Do not perform least squares fits A PRIORI. We know that the various message passing mechanisms are not linear or analytic because different mechanisms are used for different message sizes. It is not necessarily known beforehand where this transition occurs.
    Some computer systems have more than two regimes and their boundaries are dynamic.

    4. Our discussion of least squares fitting is losing track of experimental design versus modeling. For example, the least squares parameter for t_0 from COMMS1 is not a better estimate of latency than actual measurements (assuming that the timer resolution is adequate). A "better" way to measure latency is to perform additional DIRECT measurements, repetitions or otherwise, and hence decrease the statistical error. The fitting as used in the COMMS programs SPREADS error. It does not reduce error and hence it is not a good technique for measuring such an important parameter as latency.

  B. Do not test zero length messages. Though valid, zero length messages are likely to take special paths through library routines. This special case is not particularly interesting or important.

    1. In practice, the most common and important message size is 64 bits (one word). The time for this message is the starting point for bandwidth characterization.

  D. Record all times and use statistics to characterize the message passing time. That is, do not prime or warm up caches or buffers. Timings for unprimed caches and buffers give interesting and important bounds. These timings are also the nearest to typical usage.

    1. Characterize message rates by a minimum, maximum, average and standard deviation (a sketch of such a measurement loop follows at the end of this message).

  E. Test inhomogeneity of the communication network. The basic message test should be performed for all pairs of PEs.

II. Contention.

  A. Measure network contention relative to all PEs sending and/or receiving messages.

  B. Do not use high level routines where the algorithm is not known.

    1. With high level algorithms, we cannot deduce which component of the timing is attributable to the "operation count" and which is attributable to the actual system (hardware) performance.

III. Barrier.

  A. Simple test of barrier time for all numbers of processors.

Additionally, the suite should be easy to use. C and Fortran programs for direct measurements of message passing times are short and simple. These simple tests are of order 100 lines of code and, at least in Fortran 90, can be written in a portable and reliable manner.

The current Parkbench low level suite does not satisfy the above requirements. It is inaccurate, as pointed out by previous letters, and uses questionable techniques and methodologies. It is also difficult to use; witness the proliferation of files, patches, directories, libraries and the complexity and size of the Makefiles. This Low Level suite is a burden for those who are expecting a tool to evaluate and investigate computer performance. The suite is becoming a liability for our group. As such, it should be withdrawn from distribution.

I offer to write, test and submit a new set of programs which satisfy most of the above requirements.
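The following is a rough C/MPI sketch of the kind of measurement loop that item I.D above argues for: every repetition is kept, nothing is discarded as warm-up, the raw round-trip time is reported without being halved or fitted, and only minimum, maximum, mean and standard deviation are printed. It is an illustration of the outline, not Charles's proposed programs, and for simplicity it still times a round trip (the difficulty of timing a true one-way message is raised by Pat Worley later in this thread).

#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Statistics-based timing sketch: record every individual round-trip
 * time (no warm-up discarded, no halving, no least-squares fit) and
 * report min / max / mean / standard deviation.  Illustrative only.   */
#define NREPS 500

int main(int argc, char **argv)
{
    int rank, size, n = 8;               /* one 64-bit word = 8 bytes  */
    double sample[NREPS];
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    buf = calloc(n, 1);
    MPI_Barrier(MPI_COMM_WORLD);
    for (int i = 0; i < NREPS; i++) {
        double t = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
        sample[i] = MPI_Wtime() - t;     /* keep the raw sample        */
    }

    if (rank == 0) {
        double min = sample[0], max = sample[0], sum = 0.0, var = 0.0;
        for (int i = 0; i < NREPS; i++) {
            if (sample[i] < min) min = sample[i];
            if (sample[i] > max) max = sample[i];
            sum += sample[i];
        }
        double mean = sum / NREPS;
        for (int i = 0; i < NREPS; i++)
            var += (sample[i] - mean) * (sample[i] - mean);
        printf("round trip, n = %d bytes: min %g  max %g  mean %g  stddev %g (s)\n",
               n, min, max, mean, sqrt(var / NREPS));
    }
    free(buf);
    MPI_Finalize();
    return 0;
}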
Charles Grassl SGI/Cray Research Eagan, Minnesota USA From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 09:12:18 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA11774; Fri, 16 Jan 1998 09:12:18 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA16130; Fri, 16 Jan 1998 08:53:07 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA16123; Fri, 16 Jan 1998 08:53:06 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id IAA01963; Fri, 16 Jan 1998 08:52:17 -0500 (EST) Date: Fri, 16 Jan 1998 08:52:17 -0500 (EST) From: Pat Worley Message-Id: <199801161352.IAA01963@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level Benchmarks In-Reply-To: Mail from 'Charles Grassl ' dated: Thu, 15 Jan 1998 11:11:39 -0600 (CST) Cc: worley@haven.EPM.ORNL.GOV, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de, eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, tbeckers@ess.nec.de I have not been paying close attention to the current Low Level communication suite discussions, having confidence in capabilities and resolve of the current participants, but have decided to muddy the waters with a few personal observations. 1) I do not use the Low Level suite in my own performnace-related work. I find that the interpretation of results is much easier if the experiments are designed to answer (my) specific performance questions. Producing numbers that are accurate enough and whose experiments are well-enough understood to be used to answer arbitrary performance questions is much more difficult. 2) It may be time to revisit the goals of the Low Level suite. There are two obvious extremes. a) Determine some (hopefully representative) metrics of point-to-point communication performance, concentrating on making the measurements fair when comparing across platforms, but not requiring that the underlying architecture parameters be derivable from these numbers, or that they agree exactly with any other group's measurements. In this situation, a two (or more) parameter model fit to the data can be useful, if only as a shorthand for the raw data, but the model should not be expected to explain the data. b) Characterize the low level communication performance for each platform. Charles Grassl's latest recommendation is a first step in that direction. As a personal aside, I attempted such an exercise a few years ago (on the T3D, looking at the effect of common usage patterns on performance, not just ping-pong between nearest neighbors). I quickly became swamped by the amount of data and by the number of ways of presenting it (and the work was never written up). I realize now that my problem was trying to address too many evaluation questions simultaneously. In addition to the large amount of data required, an accurate characterization is likely to require more platform-specific elements, and will continue to evolve as new machines are added, in order to be as fair to the new machines as it is to the old ones. (The two parameter models are very acurrate for some of the previous generation of homogeneous message-passing platforms.) In case my sympathies are not clear, I prefer to revisit and fix the current suite, "dumbing it down", if only in presentation, making it clear what it does and does not measure. 
In my own work, the point-to-point measurements are only for establishing a general performance baseline. The important measures are the performance observed in the kernel and full application codes. The baseline measurements are simply to assess the "peak achieveable" communication performance. While a full characterization is an important thing to do, I do not believe that this group has the manpower, resources, or staying power to do it right. At one time in the past, we proposed to simply be a clearinghouse for the best of the performance measurement codes. If Charles wants to write and submit such an extensive low level suite, we can consider it, but in the meantime we should address the problems in the current suite, and not claim more than is appropriate. In particular, make sure that the customer does not become concerned that the vendor-stated latency and bandwidth does not match the PARKBENCH reported values. A discrepancy does not necessarily mean that someone is lying, simply that different aspects are being measured. But we should also be sure that intermachine comparisons using PARKBENCH measurements are valid, otherwise, they serve no purpose. Pat Worley PS. - I may be in the fringe, but all my codes are written using variants of SWAP and SENDRECV, and most of the codes I see can be written in such a fashion (and could gain something from it). So, ping-pong and ping-ping are not irrelevant to me. PPS. - Of course the real reason for using ping-pong is the difficulty in measuring the time for one-way messaging. I was not aware that this was a solved problem, at least at the MPI or PVM level. Perhaps system instrumentation can answer it, but I didn't know that portable measurement codes could be guaranteed to do so across the different platforms. From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 10:57:55 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA13381; Fri, 16 Jan 1998 10:57:55 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA20483; Fri, 16 Jan 1998 10:38:52 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA20468; Fri, 16 Jan 1998 10:38:45 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id QAA09438; Fri, 16 Jan 1998 16:38:41 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id QAA04930; Fri, 16 Jan 1998 16:37:14 +0100 Date: Fri, 16 Jan 1998 16:37:14 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801161537.QAA04930@sgi7.ccrl-nece.technopark.gmd.de> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level Benchmarks Cc: tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, eckhard@ess.nec.de, clantwin@ess.nec.de, zimmermann@ccrl-nece.technopark.gmd.de, ritzdorf@ccrl-nece.technopark.gmd.de, hempel@ccrl-nece.technopark.gmd.de Reply-To: hempel@ccrl-nece.technopark.gmd.de I would like to send some remarks to the notes by Charles Grassl and Pat Worley on the problem of low-level communication benchmarks. As Pat pointed out, the ping-pong benchmark has been invented because generally there is no global clock by which you could measure the time for a single message. 
Everybody knows that this is no perfect solution, and in my previous mail I already explained some aspects of why ping-pong results can differ substantially from times found in real applications. So, I think we will have to use ping-pong tests in the future, with the caveat that they only measure a very special case of message-passing. If Charles knows a way to measure single messages, I would like to learn about it. In most other points I agree with Charles. I'm strongly convinced that the COMMS* routines are obsolete and should be replaced with something reasonable. In particular, the current routines are far too complicated to use, and give completely meaningless results. Therefore, I think one should not even try to correct the COMMS* routines, especially as there are already better alternatives available. One example is the PMB suite of PALLAS. It is relatively easy to use, but the documentation should provide more information than the internal calling tree given in the README file. What is missing is a precise definition of the underlying measuring methodology. I strongly prefer the output of timing tables (perhaps translated in good graphical representations) over crude parametrizations like the ones in the COMMS* benchmarks. Those can only frustrate the experts and confuse all other people. As to the definition of latency, Charles is right in saying that zero byte messages are dangerous because they often use special algorithms. The straightforward solution to use 1 byte messages instead is bad because usually messages are sent as multiples of 4 or 8 bytes, and for other message lengths some overhead by additional copying or even subroutine calls may be introduced. Since the lengths of most real messages are multiples of 4 or 8 bytes, I support Charles' proposal to measure the time for an 8 byte message and call it the latency. I think the warm-up phase before the actual benchmarking is important in order not to smear out initialization overheads over some number of messages. The time for the first ping-pong (or other operation), however, should be measured and compared with the time found for the following operations. I very much welcome Charles Grassl's kind offer to write a new benchmark suite. Perhaps there are even other suites available which could also be candidates for getting adopted by PARKBENCH. This forum meanwhile is quite well-known, which gives them considerable responsibility. PARKBENCH's choice of benchmark programs influences procurements of new machines world-wide, and the availability of a good set of low level benchmarks could give PARKBENCH a good reputation. I'm afraid that the current set of routines has the opposite effect. 
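As a minimal illustration of the warm-up point, the sketch below (purely illustrative; the repetition count, message size and ranks are arbitrary) times every ping-pong round trip, reports the round-trip time without halving it, and prints the first (cold) iteration separately so that it can be compared with the average of the following ones.

/* Sketch: ping-pong that keeps the first (cold) round trip as data
   instead of silently discarding it in a warm-up phase. Illustrative only. */
#include <stdio.h>
#include <mpi.h>

#define NREP   100
#define NBYTES 8

int main(int argc, char **argv)
{
    int rank, i;
    char buf[NBYTES] = {0};
    double t[NREP];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < NREP; i++) {
        double t0 = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
        t[i] = MPI_Wtime() - t0;       /* full round-trip time, not halved */
    }

    if (rank == 0) {
        double rest = 0.0;
        for (i = 1; i < NREP; i++) rest += t[i];
        printf("first round trip %.3e s, mean of remaining %d: %.3e s\n",
               t[0], NREP - 1, rest / (NREP - 1));
    }
    MPI_Finalize();
    return 0;
}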
- Rolf Hempel ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 12:46:04 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA14801; Fri, 16 Jan 1998 12:46:04 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA27007; Fri, 16 Jan 1998 12:29:03 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id MAA27000; Fri, 16 Jan 1998 12:29:01 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id MAA02149; Fri, 16 Jan 1998 12:29:01 -0500 (EST) Date: Fri, 16 Jan 1998 12:29:01 -0500 (EST) From: Pat Worley Message-Id: <199801161729.MAA02149@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level Benchmarks In-Reply-To: Mail from 'hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel)' dated: Fri, 16 Jan 1998 16:37:14 +0100 Cc: worley@haven.EPM.ORNL.GOV In most other points I agree with Charles. I'm strongly convinced that the COMMS* routines are obsolete and should be replaced with something reasonable. I have no problem with this. As I indicated, I have no experience with these. What is missing is a precise definition of the underlying measuring methodology. Perhaps this is the point that I was trying to make. Not only must the codes be easy to use, but the results should be easy to interpret. Every code should have a simple description of what it is measuring, what the data can be used for (and what it shouldn't be used for), and how to use the data. PARKBENCH needs to provide guidance in what data to collect, not just carefully crafted benchmark codes. And we need to describe clearly what low level communication tests are good for. For example, I have problems with low level contention tests. Understanding hotspots is an interesting exercise, but the connection to "real" codes is more subtle. Do we stress test, look at contention for given algorithms/global operators (and which algorithms), use some standard workload characterization as the background job, ...? For any given performance question, what should be used may be clear, but it is difficult to do this a priori. A simultaneous send/receive stress test may very well be something interesting to present, but we also need to be able to explain why (because it is typical in synchronous global communication operations?). In summary, I would like to see a prioritized list of what low level information is worth collecting, and why. We can then use this to choose or generate codes to do the testing. I apologize for being lazy. This may have already been laid out in the original ParkBench document, but I never worried about the low level tests before and don't have a copy of the document in front of me. 
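For what a "simultaneous send/receive stress test" might look like in its simplest form, here is one possible sketch (purely illustrative, not a proposed PARKBENCH code; the ring pattern, the 64 KB message size and the repetition count are arbitrary choices). Every PE exchanges a message with its ring neighbours at the same time, which is roughly the pattern of a synchronous shift operation; comparing the per-PE time under full load with the same exchange run on a single pair gives a crude measure of contention.

/* Sketch: all PEs exchange messages with ring neighbours simultaneously. */
#include <stdio.h>
#include <mpi.h>

#define NBYTES 65536
#define NREP   100

int main(int argc, char **argv)
{
    int rank, size, right, left, i;
    static char sbuf[NBYTES], rbuf[NBYTES];
    double t0, t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    right = (rank + 1) % size;
    left  = (rank + size - 1) % size;

    MPI_Barrier(MPI_COMM_WORLD);       /* start everyone together */
    t0 = MPI_Wtime();
    for (i = 0; i < NREP; i++)
        MPI_Sendrecv(sbuf, NBYTES, MPI_BYTE, right, 0,
                     rbuf, NBYTES, MPI_BYTE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    t = (MPI_Wtime() - t0) / NREP;

    /* compare with the same exchange measured between a single pair of PEs
       to gauge how much the loaded network slows each exchange down */
    printf("rank %d: %.3e s per %d-byte exchange with all PEs active\n",
           rank, t, NBYTES);
    MPI_Finalize();
    return 0;
}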
Pat Worley From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 13:45:53 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA15447; Fri, 16 Jan 1998 13:45:52 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA29375; Fri, 16 Jan 1998 13:15:58 -0500 (EST) Received: from c3serve.c3.lanl.gov (root@c3serve-f0.c3.lanl.gov [128.165.20.100]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA29368; Fri, 16 Jan 1998 13:15:55 -0500 (EST) Received: from risc.c3.lanl.gov (risc.c3.lanl.gov [128.165.21.76]) by c3serve.c3.lanl.gov (8.8.5/1995112301) with ESMTP id LAA04436 for ; Fri, 16 Jan 1998 11:16:08 -0700 (MST) Received: from localhost (hoisie@localhost) by risc.c3.lanl.gov (950413.SGI.8.6.12/c93112801) with SMTP id LAA13115 for ; Fri, 16 Jan 1998 11:14:30 -0700 Date: Fri, 16 Jan 1998 11:14:30 -0700 (MST) From: Adolfy Hoisie To: parkbench-comm@CS.UTK.EDU Subject: Low Level Benchmarks Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Just to amplify some of the numerous excellent points made by Pat and Charles and Rolf, the emphasis of the Parkbench group, as I see it, should be on defining the methodology for benchmarking at this level. A string of numbers says very little about machine performance in absence of a solid, scientifcally defined underlying base for the programs utilized for benchmarking. COMMS is obsolete in methodology, coding and generation and analysis of results. As such, I have used it quite some time ago only to reach the conclusions above. Instead, I always chose to write my own benchmarking programs in order to extract meaningful data for the applications I was working on. I would like to see the debate heading towards what is it that we need to measure in a suite of general use that is applicable to machines of interest. For example, very little or no attention is being paid to benchmarking DSM architectures, where quite a few architectural parameters become harder to define and subtler to interpret. Including, but not limited to, message passing characterization on these architectures. Adolfy ====================================================================== Adolfy Hoisie \ Los Alamos National Laboratory \Scientific Computing, CIC-19, MS B256 hoisie@lanl.gov \ Los Alamos, NM 87545 USA \ Phone: 505-667-5216 http://www.c3.lanl.gov/~hoisie/hoisie.html FAX: 505-667-1126 From owner-parkbench-comm@CS.UTK.EDU Sun Jan 18 07:38:42 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA20627; Sun, 18 Jan 1998 07:38:42 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA21662; Sun, 18 Jan 1998 07:28:22 -0500 (EST) Received: from post.mail.demon.net (post-10.mail.demon.net [194.217.242.154]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA21655; Sun, 18 Jan 1998 07:28:20 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa1002926; 18 Jan 98 12:25 GMT Message-ID: Date: Sun, 18 Jan 1998 12:24:20 +0000 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Low Level Benchmarks MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a To: the low-level discussion group From: Roger I comment below on recent emailings on this topic which arrived on the 16 Jan 1998. Pat Worley writes: >2) It may be time to revisit the goals of the Low Level suite. There ar > are two obvious extremes. 
>
> a) Determine some (hopefully representative) metrics of point-to-point
>    communication performance, concentrating on making the measurements
> SNIP
> In this situation, a two (or more) parameter model fit to the
> data can be useful, if only as a shorthand for the raw data,
> but the model should not be expected to explain the data.

This is of course what COMMS1 sets out to do. But please, when judging this point, use the New COMMS1 revised code that DOES give much more sensible answers in difficult cases. Please do not base your opinions on results from the Original COMMS1 code that is still unfortunately being issued by Parkbench. Instructions for getting the new code were given in my email to this group on 12 Jan 1998.

> (The two parameter models are very accurate for some of the
> previous generation of homogeneous message-passing platforms.)

It is nice to have confirmation of this from an independent source. In addition, the 3-parameter model is available in New COMMS1 for cases where the 2-para fails.

> In case my sympathies are not clear, I prefer to revisit and fix
> the current suite, "dumbing it down", if only in presentation,
> making it clear what it does and does not measure.

Again this was my objective in writing the New COMMS1 as a minimum fix to the existing Original COMMS1. However I don't think I would call this "Dumbing Down". In fact New COMMS1 is a "Smartening UP" of the benchmark because it provides a 3-parameter fit for those cases for which the 2-para fit fails. It also reports the key spot values of "time for shortest message" (which Charles and Rolf want to call the Latency) and bandwidth for longest message (this could equally well be the maximum measured bandwidth). It also compares the fitted values with measured values at these key points. The fit formulae are also given in the output for completeness.

Please note that COMMS1 has always reported ALL the measured lengths and times in the output file as the basic data, and ALL spot bandwidths were printed to the screen as measured, and could be captured in a file if required. In New COMMS1 the spot bandwidths are more conveniently included in the standard output file, as they should have been in the first place. Unfortunately the above additions make the new output file more complex (which I am not happy about). An example of New COMMS1 output is attached at the end of this email.

>PPS. - Of course the real reason for using ping-pong is the difficulty
>       in measuring the time for one-way messaging. I was not aware
>       that this was a solved problem, at least at the MPI or PVM
>       level. Perhaps system instrumentation can answer it, but I
>       didn't know that portable measurement codes could be guaranteed
>       to do so across the different platforms.

Exactly so.

*******************************

Rolf Hempel writes:

>of message-passing. If Charles knows a way to measure single messages,
>I would like to learn about it.

Me too.

>In most other points I agree with Charles. I'm strongly convinced that
>the COMMS* routines are obsolete and should be replaced with something
>reasonable. In particular, the current routines are far too complicated
>to use, and give completely meaningless results. Therefore, I think one

Please base your judgement on the results from New COMMS1, which has a much more satisfactory fitting procedure (see the examples in the PICT tool mentioned below). I believe that the revised program New COMMS1 gives reasonable results and is not obsolete.

>README file. What is missing is a precise definition of the underlying
>measuring methodology.

In contrast, the methodology of the COMMS1 curve fitting is given in the Parkbench Report and in detail in my book "The Science of Computer Benchmarking", see:

http://www.siam.org/catalog/mcc07/hockney.htm

>I strongly prefer the output of timing tables (perhaps translated in
>good graphical representations) over crude parametrizations like the
>ones in the COMMS* benchmarks. Those can only frustrate the experts
>and confuse all other people.

You seem to have failed to notice that both the Original COMMS1 and the New COMMS1 report the timing table as the FIRST part of their output files. Further, a good graphical representation is available using the database tool from Southampton and my own PICT tool (see below).

The COMMS1 fitting procedure is not crude. On the contrary, it uses least-squares fitting of a performance model that is quite satisfactory for a lot of data. In minimising relative rather than absolute error, New COMMS1 spreads the error in a much more satisfactory way and allows the fitting to be used over a much longer range of message lengths. Furthermore, where the 2-parameter model is unsuitable, New COMMS1 provides a 3-parameter model which fits the Cray T3E (Charles's data of 17 Dec 96) very well. I don't think one can call all this crude.

To see how good the 2- and 3-parameter fits produced by New COMMS1 are to recent data, check out the examples on my Parkbench Interactive Curve Fitting Tool (PICT) at:

http://www.minnow.demon.co.uk/pict/source/pict2a.html

For the most part these show that 2 parameters fit the data surprisingly well. The parameters are not meaningless and useless, but often a rather good summary of the measurements. The 3-parameter fit is described quite fully in my talk of 11 Sep 1997. I have finally written this up with pretty pictures for the PEMCS Web Journal. Look at:

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/talks/Roger-Hockney/perfprof1.html

In truth we need to see a lot more data before judging the usefulness of parametric fitting. That is why I would like to look at your NEC results. These need not be the timings from COMMS1, but any pingpong measurements that you regard as "good". Please do not base your opinion on the results produced by the Original COMMS1 which is presently in the Parkbench suite. This will only produce satisfactory results for message lengths up to about 4*10^4. When used outside this range it may produce useless numbers.

>messages are multiples of 4 or 8 bytes, I support Charles' proposal to
>measure the time for an 8 byte message and call it the latency.

I am STRONGLY opposed to this. Latency is an ambiguous term that has different meanings to different people. If we wish to report the time for an 8-byte message we should call it what it is, no more no less, eg:

t(n=8B) = 45.6 us

To call this latency only leads to confusion and senseless misunderstanding and argument.
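For readers who want to see what "minimising relative rather than absolute error" amounts to for the linear-time model t(n) = t0*(1 + n/nhalf), here is a small self-contained sketch of one way such a fit can be done in closed form. It is not the New COMMS1 code; it is simply weighted linear least squares with weights 1/t^2, applied to a handful of the (length, time) points from the example output attached below.

/* Sketch: two-parameter linear-time fit  t(n) = t0 + n/rinf  (nhalf = rinf*t0),
   minimising the sum of squared RELATIVE errors in time.  This is weighted
   linear least squares with weights 1/t_i^2, solved via 2x2 normal equations.
   Illustrative only -- not the New COMMS1 implementation. */
#include <stdio.h>

void fit2(const double *n, const double *t, int m,
          double *t0, double *rinf, double *nhalf)
{
    double sw = 0, swn = 0, swnn = 0, swt = 0, swnt = 0;
    int i;
    for (i = 0; i < m; i++) {
        double w = 1.0 / (t[i] * t[i]);           /* relative-error weighting */
        sw   += w;
        swn  += w * n[i];
        swnn += w * n[i] * n[i];
        swt  += w * t[i];
        swnt += w * n[i] * t[i];
    }
    /* solve [sw swn; swn swnn][a b]' = [swt swnt]'  for  t = a + b*n */
    {
        double det = sw * swnn - swn * swn;
        double a = (swt * swnn - swn * swnt) / det;   /* a = t0     */
        double b = (sw * swnt - swn * swt) / det;     /* b = 1/rinf */
        *t0 = a;
        *rinf = 1.0 / b;
        *nhalf = a / b;                               /* nhalf = rinf * t0 */
    }
}

int main(void)
{
    /* a few (length, time) points taken from the example output below */
    double n[] = { 8.0, 1.0e2, 1.0e3, 1.0e4, 1.0e5, 1.0e6 };
    double t[] = { 1.260e-5, 1.802e-5, 3.253e-5, 8.579e-5, 4.534e-4, 3.276e-3 };
    double t0, rinf, nhalf;
    fit2(n, t, 6, &t0, &rinf, &nhalf);
    printf("t0 = %.3e s, rinf = %.3e B/s, nhalf = %.3e B\n", t0, rinf, nhalf);
    return 0;
}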
****************************************************************
EXAMPLE NEW COMMS1 OUTPUT FILE:
T3E Results from Grassl's 17 Dec 1996 email to Parkbench committee
****************************************************************

      =================================================
      ===                                           ===
      ===   GENESIS / ParkBench Parallel Benchmarks ===
      ===                                           ===
      ===                 comms1_mpi                ===
      ===                                           ===
      =================================================

 Pingpong Benchmark:
 -------------------
 Measures time to send a message between two nodes on a multi-processor
 computer (MPP or network) as a function of the message length.
 It also characterises the time and corresponding bandwidth by both
 two and three performance parameters.

 Original code by Roger Hockney (1986/7), modified by Ian Glendinning
 and Ade Miller (1993/4), and by Roger Hockney and Ron Sercely (1997).
-----------------------------------------------------------------------
 You are running the VERSION dated: RWH-12-Mar-1997
-----------------------------------------------------------------------
 The measurement time requested for each test case was 1.00E+00 seconds.
 No distinction was made between long and short messages.
 Zero length messages were not used in least squares fitting.

-----------------------------------------------
(1) PRIMARY MEASUREMENTS (BW=Bandwidth, B=Byte)
-----------------------------------------------------------------------
        SPOT MEASURED VALUES          |   EVOLVING TWO-PARAMETER FIT
--------------------------------------|--------------------------------
POINT LENGTH(n)   TIME(t)   BW(r=n/t) |   rinf       nhalf     RMS rel
         B           s         B/s    |   B/s          B       error %
*SPOT1*-------------------------------|--------------------------------
    1 8.000E+00  1.260E-05  6.349E+05 | 0.000E+00  0.000E+00  0.000E+00
    2 1.000E+01  1.348E-05  7.418E+05 | 2.273E+06  2.064E+01 -1.255E-06
    3 2.000E+01  1.380E-05  1.449E+06 | 1.237E+07  1.516E+02  2.277E+00
    4 3.000E+01  1.590E-05  1.887E+06 | 7.798E+06  9.157E+01  2.762E+00
    5 4.000E+01  1.561E-05  2.562E+06 | 1.020E+07  1.237E+02  3.267E+00
    6 5.000E+01  1.648E-05  3.034E+06 | 1.115E+07  1.366E+02  3.126E+00
    7 6.000E+01  1.618E-05  3.708E+06 | 1.364E+07  1.711E+02  3.796E+00
    8 7.000E+01  1.773E-05  3.948E+06 | 1.356E+07  1.699E+02  3.552E+00
    9 8.000E+01  1.694E-05  4.723E+06 | 1.562E+07  1.992E+02  4.072E+00
   10 9.000E+01  1.793E-05  5.020E+06 | 1.634E+07  2.095E+02  3.954E+00
   11 1.000E+02  1.802E-05  5.549E+06 | 1.741E+07  2.249E+02  3.983E+00
   12 1.100E+02  1.889E-05  5.823E+06 | 1.776E+07  2.300E+02  3.841E+00
   13 1.200E+02  1.780E-05  6.742E+06 | 1.983E+07  2.607E+02  4.483E+00
   14 1.300E+02  1.917E-05  6.781E+06 | 2.034E+07  2.682E+02  4.368E+00
   15 1.400E+02  1.902E-05  7.361E+06 | 2.131E+07  2.828E+02  4.405E+00
   16 1.500E+02  1.941E-05  7.728E+06 | 2.209E+07  2.946E+02  4.389E+00
   17 1.600E+02  1.896E-05  8.439E+06 | 2.353E+07  3.167E+02  4.644E+00
   18 1.700E+02  2.057E-05  8.264E+06 | 2.362E+07  3.179E+02  4.514E+00
   19 1.800E+02  1.911E-05  9.419E+06 | 2.526E+07  3.434E+02  4.887E+00
   20 1.900E+02  2.125E-05  8.941E+06 | 2.517E+07  3.420E+02  4.765E+00
   21 2.000E+02  1.894E-05  1.056E+07 | 2.730E+07  3.754E+02  5.382E+00
   22 2.100E+02  2.091E-05  1.004E+07 | 2.767E+07  3.812E+02  5.282E+00
   23 2.200E+02  2.011E-05  1.094E+07 | 2.885E+07  3.998E+02  5.393E+00
   24 2.300E+02  2.136E-05  1.077E+07 | 2.915E+07  4.047E+02  5.296E+00
   25 2.400E+02  2.015E-05  1.191E+07 | 3.053E+07  4.268E+02  5.496E+00
   26 2.500E+02  2.228E-05  1.122E+07 | 3.047E+07  4.258E+02  5.390E+00
   27 2.600E+02  2.144E-05  1.213E+07 | 3.110E+07  4.360E+02  5.365E+00
   28 2.700E+02  2.212E-05  1.221E+07 | 3.142E+07  4.412E+02  5.290E+00
   29 2.800E+02  2.111E-05  1.326E+07 | 3.249E+07  4.588E+02  5.417E+00
   30 2.900E+02  2.259E-05  1.284E+07 | 3.272E+07  4.626E+02  5.337E+00
   31 3.000E+02  2.284E-05  1.313E+07 | 3.294E+07  4.663E+02  5.262E+00
   32 4.000E+02  2.256E-05  1.773E+07 | 3.550E+07  5.098E+02  5.818E+00
   33 6.000E+02  2.549E-05  2.354E+07 | 4.022E+07  5.921E+02  6.632E+00
   34 8.000E+02  2.817E-05  2.840E+07 | 4.567E+07  6.883E+02  7.296E+00
   35 1.000E+03  3.253E-05  3.074E+07 | 4.887E+07  7.452E+02  7.451E+00
   36 2.000E+03  4.496E-05  4.448E+07 | 5.553E+07  8.657E+02  8.013E+00
   37 5.000E+03  6.135E-05  8.150E+07 | 7.983E+07  1.312E+03  1.090E+01
   38 1.000E+04  8.579E-05  1.166E+08 | 1.070E+08  1.814E+03  1.284E+01
   39 2.000E+04  1.294E-04  1.546E+08 | 1.339E+08  2.315E+03  1.426E+01
   40 3.000E+04  1.722E-04  1.742E+08 | 1.523E+08  2.659E+03  1.493E+01
   41 4.000E+04  2.161E-04  1.851E+08 | 1.647E+08  2.890E+03  1.524E+01
   42 5.000E+04  2.594E-04  1.928E+08 | 1.735E+08  3.056E+03  1.539E+01
   43 1.000E+05  4.534E-04  2.206E+08 | 1.847E+08  3.266E+03  1.575E+01
   44 2.000E+05  7.784E-04  2.569E+08 | 1.996E+08  3.548E+03  1.648E+01
   45 3.000E+05  1.110E-03  2.703E+08 | 2.123E+08  3.787E+03  1.701E+01
   46 5.000E+05  1.697E-03  2.946E+08 | 2.256E+08  4.039E+03  1.762E+01
   47 1.000E+06  3.276E-03  3.053E+08 | 2.370E+08  4.255E+03  1.806E+01
   48 2.000E+06  6.373E-03  3.138E+08 | 2.468E+08  4.440E+03  1.839E+01
   49 3.000E+06  9.489E-03  3.162E+08 | 2.547E+08  4.590E+03  1.858E+01
   50 5.000E+06  1.569E-02  3.187E+08 | 2.612E+08  4.714E+03  1.870E+01
   51 1.000E+07  3.134E-02  3.191E+08 | 2.666E+08  4.816E+03  1.874E+01
*SPOT2*----------------------------------------------------------------

                     ------------------------
                     COMMS1: Message Pingpong
                     ------------------------
                          Result Summary
                          --------------

-------------------
(2) KEY SPOT VALUES
-------------------
                                 -----------------------
*KEY1* Shortest n = 8.000E+00 B, | t = 1.260E-05 s     |   ******
                                 |                     |   ******
*KEY2* Longest  n = 1.000E+07 B, | r = 3.191E+08 B/s   |   ******
                                 -----------------------
-------------------------------------------------------------------------

------------------------------------------
(3) BEST TWO-PARAMETER LINEAR-(t vs n) FIT
------------------------------------------
(Minimises sum of squares of relative error at all points being fitted)

  Root Mean Square (RMS) Relative Error in time = 18.74 %
  Maximum Relative Error in time                = 43.61 % at POINT = 1

This is a fit to ALL points. Even though different expressions are given
for short and long messages, they are algebraically identical and either
may be used for any message length in the full range.

--------------
Short Messages
--------------
Best expressions to use if nhalf > 0 and n <= nhalf = 4.816E+03 B
  Bandwidth fitted to: r = pi0*n/(1+n/nhalf)
  Time fitted to:      t = t0*(1+n/nhalf)
          --------------------------------------------
*LIN1*    | pi0 = 5.536E+04 Hz,  nhalf = 4.816E+03 B |   ******
          |                                          |   ******
*LIN2*    | t0 = 1/pi0 = 1.807E-05 s                 |   ******
          --------------------------------------------
Spot comparison at POINT = 1, n = 8.000E+00 B
  t(fit) = 1.810E-05 s, t(measured) = 1.260E-05 s,
  relative error in time = 43.6 %

-------------
Long Messages
-------------
Best expressions to use if n > nhalf = 4.816E+03 B, or nhalf=0
  Bandwidth fitted to: r = rinf/(1+nhalf/n)
  Time fitted to:      t = (n+nhalf)/rinf
          -----------------------------------------------
*LIN3*    | rinf = 2.666E+08 B/s,  nhalf = 4.816E+03 B  |   ******
          -----------------------------------------------
Spot comparison at POINT = 51, n = 1.000E+07 B
  r(fit) = 2.665E+08 B/s, r(measured) = 3.191E+08 B/s,
  relative error in B/W = -16.5 %
-------------------------------------------------------------------------

---------------------------------------
(4) BEST 3-PARAMETER VARIABLE-POWER FIT
---------------------------------------
  Root Mean Square (RMS) Relative Error in B/W = 6.89 %
  Maximum Relative Error in B/W                = -13.41 % at POINT = 39

This fit is to ALL data points

  Bandwidth is fitted to: rvp = rivp/(1+(navp/n)^gamvp)^(1/gamvp)
  Time is fitted to:      tvp = t0vp*(1+(n/navp)^gamvp)^(1/gamvp)
  where t0vp = navp/rivp and navp = t0vp*rivp

When gamvp = 1.0, this form reduces to the linear-time form (3) above,
navp becomes nhalf, and rivp becomes rinf.

The three independent parameters are (t0vp is derived):
        -----------------------------------------------------------------
*VPWR1* | rivp = 3.475E+08 B/s, navp = 3.670E+03 B, gamvp = 4.190E-01    |
        |                                                                |
*VPWR2* | t0vp = navp/rivp = 1.056E-05 s                                 |
        -----------------------------------------------------------------

This function is guaranteed to fit the first and last measured values of
time and bandwidth. It also fits the (interpolated) time and bandwidth
at n = navp.

--
Roger Hockney.  Checkout my new Web page at URL http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?
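As a reading aid for the example output above, the few lines below evaluate the fitted expressions from sections (3) and (4) at the two KEY SPOT message lengths, using the parameter values printed there. This is not part of the benchmark itself; it only shows how the printed formulae and parameters are meant to be used.

/* Sketch: evaluating the fitted curves printed in the example output above,
   with the T3E parameter values reported there.  Illustrative only. */
#include <stdio.h>
#include <math.h>

/* two-parameter linear-time fit:  t = t0*(1 + n/nhalf),  r = n/t */
static double t_lin(double n, double t0, double nhalf)
{
    return t0 * (1.0 + n / nhalf);
}

/* three-parameter variable-power fit:  t = t0vp*(1 + (n/navp)^gamvp)^(1/gamvp) */
static double t_vp(double n, double rivp, double navp, double gamvp)
{
    double t0vp = navp / rivp;
    return t0vp * pow(1.0 + pow(n / navp, gamvp), 1.0 / gamvp);
}

int main(void)
{
    /* parameters copied from sections (3) and (4) of the output file above */
    double t0 = 1.807e-5, nhalf = 4.816e3;                   /* 2-parameter */
    double rivp = 3.475e8, navp = 3.670e3, gamvp = 4.190e-1; /* 3-parameter */
    double lengths[] = { 8.0, 1.0e7 };          /* the two KEY SPOT lengths */
    int i;

    for (i = 0; i < 2; i++) {
        double n = lengths[i];
        double t2 = t_lin(n, t0, nhalf);
        double t3 = t_vp(n, rivp, navp, gamvp);
        printf("n = %.3e B:  2-para t = %.3e s (r = %.3e B/s),"
               "  3-para t = %.3e s (r = %.3e B/s)\n",
               n, t2, n / t2, t3, n / t3);
    }
    /* the 3-parameter curve is constructed to pass through the first and last
       measured points, so its values at n = 8 B and n = 1e7 B should agree
       closely with the measured 1.260E-05 s and 3.191E+08 B/s above */
    return 0;
}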
From owner-parkbench-comm@CS.UTK.EDU Mon Jan 19 13:10:51 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA16306; Mon, 19 Jan 1998 13:10:50 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA21116; Mon, 19 Jan 1998 12:53:17 -0500 (EST) Received: from haze.vcpc.univie.ac.at (haze.vcpc.univie.ac.at [131.130.186.138]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id MAA21105; Mon, 19 Jan 1998 12:53:14 -0500 (EST) Received: (from smap@localhost) by haze.vcpc.univie.ac.at (8.8.6/8.8.6) id SAA21164 for ; Mon, 19 Jan 1998 18:53:11 +0100 (MET) From: Ian Glendinning Received: from fidelio(131.130.186.155) by haze via smap (V2.0beta) id xma021162; Mon, 19 Jan 98 18:52:48 +0100 Received: (from ian@localhost) by fidelio.vcpc.univie.ac.at (8.7.5/8.7.3) id SAA03411 for parkbench-comm@CS.UTK.EDU; Mon, 19 Jan 1998 18:52:48 +0100 (MET) Date: Mon, 19 Jan 1998 18:52:48 +0100 (MET) Message-Id: <199801191752.SAA03411@fidelio.vcpc.univie.ac.at> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level benchmark errors and differences X-Sun-Charset: US-ASCII Dear parkbench-comm subscriber, I have been following the discussions regarding the low-level ParkBench benchmarks over the last couple of weeks with intertest, but so far I have been content to keep my head below the parapet, as most of the things I would have said have been said by others anyway. However, there is one thing that I would like to point out. On Wed Jan 7 22:56:04 1998, Charles Grassl wrote: > The Low Level programs are obsolete and need to be replaced. I agree that the existing code could use some improvement, though most of the discussion seems to have revolved around the version in the "current release", which as Roger has pointed out several times is very old, and he has written an improved version. Have people tried that version out? > I have > written seven simple programs, with MPI and PVM versions, and offer them > as a replacement for the Low Level suite. I have tried a version of Charles's "comms1" code that he sent me, on our CS-2 system, and found that it reported approximately half the expected asymptotic bandwidth, so this code is not without its problems either! By "expected", I mean the bandwidth reported by various versions of (the ParkBench version of) COMMS1 over the years, coded using first PARMACS, then PVM, and more recently MPI, as a message-passing library. This value corresponds closely to what one would expect for the peak performance, given the performance figures for the underlying hardware. For an explanation of what I think is happening, please read on... On Thu Jan 15 20:20:36 1998, Charles Grassl wrote: > This recorded > time is for a round trip message, and is not precisely the time for > two messages. Half the round trip message passing time, as reported in > the PMB tests, is not the time for a single message and should not be > reported and such. This same erroneous technique is used in the COMMS1 > and COMMS2 two benchmarks. (Is Parkbench is responsible for propagating > this incorrect methodology.) As Pat Worley and Rolf Hempel pointed out, the ping-pong is used because of the difficulty in measuring the time for one-way messages, and I believe that this is illustrated in this instance, as it seems that Charles's attempt to time one-way messages has caused the unexpectedly low asymptotic bandwidth measurement... 
Charles's code executes a send, and then as fast as possible executes another one, without any concern as to whether the data has left the sending processor, or has arrived at the receiving processor, and what I think is happening is that his code is queuing requests to send, before the previous messages have left the sending processor, forcing the MPI implementation to buffer them, at the cost of an extra copy operation, which would not otherwise have been necessary, thus reducing the effective bandwidth! > With respect to low level testing, the round trip exchange of messages, > as per PingPing and PingPong in PMB or COMMS1 and COMMS2, is not > characteristic of the lowest level of communication. This pattern > is actually rather rare in programming practice. It is more common > for tasks to send single messages and/or to receive single messages. It seems to me that it is not very common programming practice to send a sequence of messages to the same destination in rapid fire, without having either done some intermediate processing, or waiting to get some response back. If you were trying to code efficiently, you would doubtless merge the messages into one, and send the data all together in one message, if it was all available already, which it must have been if you were able to execute the sends so rapidly one after another! > The single message passing is a distinctly different case from that > of round trip tests. We should be worried that the round trip testing > might introduce artifacts not characteristic of actual (low level) usage. > We need a better test of basic bandwidth and latency in order to measure > and characterize message passing performance. Well, it seems that in this case, the attempt to measure the single message passing case has introduced an artifact. To an extent it depends what you are trying to measure of course, but it has always been my understanding that the COMMS1 benchmark was trying to measure the peak performance that you could reasonably expect to obtain using a portable message-passing library interface, which, for a good implementation of MPI, ought to come close to the theoretical hardware limit, which is precisely what the existing COMMS1 ping-pong code does on our system. I would therefore argue in favour of retaining the ping-pong technique for obtaining timings. Ian -- Ian Glendinning European Centre for Parallel Computing at Vienna (VCPC) ian@vcpc.univie.ac.at Liechtensteinstr. 
22, A-1090 Vienna, Austria Tel: +43 1 310 939612 WWW: http://www.vcpc.univie.ac.at/~ian/ From owner-parkbench-comm@CS.UTK.EDU Tue Jan 20 08:50:06 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA06977; Tue, 20 Jan 1998 08:50:06 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA01200; Tue, 20 Jan 1998 08:28:44 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA01193; Tue, 20 Jan 1998 08:28:39 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id OAA12945; Tue, 20 Jan 1998 14:19:53 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id OAA09828; Tue, 20 Jan 1998 14:19:52 +0100 Date: Tue, 20 Jan 1998 14:19:52 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801201319.OAA09828@sgi7.ccrl-nece.technopark.gmd.de> To: cmg@cray.com Subject: Re: Low Level Benchmarks Cc: hempel@ccrl-nece.technopark.gmd.de, parkbench-comm@CS.UTK.EDU Reply-To: hempel@ccrl-nece.technopark.gmd.de Dear Charles, thank you for your note, and for sending me your simple test program. One thing I like about the program is that it's easy to install and run; no complicated makefiles, include files and sophisticated driver software. We had the code running in five minutes. In many points I agree with Ian Glendinning who already reported about his tests with your code on the Meiko system. When we ran the test on our SX-4, however, the results were very similar to ping-pong figures. With the particular MPI version I used for my measurements, the classical ping-pong test as implemented in MPPTEST of the MPICH distribution gives about 4 usec less time in latency and about 4% higher throughput than your test program. The reason for the increase in latency as reported by your code is fully explained by the fact that you forgot to correct for the time spent in the timer routine (see below). So, we would have no problem with adopting a corrected version of your code as the basic communication test. However, I think that this is not the point. The question we have to answer is what communication pattern we want to measure with our benchmark code. In my view the ping-pong technique, with all its problems, is much closer to a typical application than your program. Of course, the situation "receiver already waiting" implemented by the ping-pong, is a special case which will not be found for all messages in an application. In this situation, the MPI implementation can use a more efficient protocol, which will lead to a best case measurement of latency and throughput. I agree with Ian that the rapid succession of messages in one direction is very untypical. Only a stupid programmer would do it this way in an application, and not aggregate the messages to a larger one. What you really measure with this benchmark is how well the MPI library can deal with this kind of congestion. As you see, our library is not affected at all by this, but, as Ian reported, the Meiko shows a much different behaviour. In a sense, you measure a kind of worst case scenario, as opposed to the best case one in the ping-pong. One technical detail of your program: You time every send operation separately, and then sum up the individual times. 
This requires a quite accurate clock. I would expect that some machines could run into trouble with this approach. Also, you don't correct for the time needed for calling the timer twice for every send/receive. On machines with highly optimized MPI libraries this is not at all negligible. On our machine two timer calls require as much time as 25% of a complete send-receive sequence! As a summary, your basic communication program does not convince me as a better alternative to ping-pong programs such as MPPTEST. The only thing I really like about it is its simplicity. Best regards, Rolf ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Wed Jan 21 11:22:06 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id LAA27346; Wed, 21 Jan 1998 11:22:06 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA20207; Wed, 21 Jan 1998 10:55:58 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA20176; Wed, 21 Jan 1998 10:55:44 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id QAA01123; Wed, 21 Jan 1998 16:50:13 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id QAA11663; Wed, 21 Jan 1998 16:54:00 +0100 Date: Wed, 21 Jan 1998 16:54:00 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801211554.QAA11663@sgi7.ccrl-nece.technopark.gmd.de> To: parkbench-comm@CS.UTK.EDU Subject: NEW COMMS1 benchmark Cc: eckhard@ess.nec.de, tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, maciej@ccrl-nece.technopark.gmd.de, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, springstubbe@gmd.de, hempel@ccrl-nece.technopark.gmd.de Reply-To: hempel@ccrl-nece.technopark.gmd.de In the recent discussion on the low-level benchmarks, Roger repeatedly asked us to base our evaluation of the COMMS1 benchmark on his new version, and not on the one which is still in the official PARKBENCH distribution. At NEC we now have repeated the tests on the NEC SX-4 machine, and I would like to make a few comments on the results. First of all, the raw data as reported by the table Primary Measurements more closely match the figures given by other ping-pong tests than the older version. The correction for oeverheads, however, is still problematic for the following reasons: 1. In every loop iteration, the returned message is compared with the message sent. If one is concerned with the correctnes of the MPI library, this could be checked in a separate loop before the timing loop. The check inside the timing loop, done only by the sender process, delays the sender and thus makes sure that the receiver is already waiting in the receive for the next message. This aggravates the "Receiver ready" situation which I discussed in an earlier mail. 2. The authors take great care in correcting for the overhead introduced by the do loop. This is done by the loop over the dummy routine before the main loop. 
On the other hand, the correction for the check routine call introduces an overhead of one timer call which is NOT taken into account. (Here I assume that the internal clock is read out at a fixed point in time during every call of DWALLTIME00().) I would argue that on most machines the loop overhead per iteration is negligible as compared to a function call. On our machine, MPI_Wtime calls a C function which in turn calls an assembly language routine. The time needed for this is about 10% of our message latency! Another problem in the measuring procedure is that the test message contains a single constant, repeated as many times as there are words in the message. Did the authors never think about the possibility of data compression in interconnect systems? I would not be surprised to see bandwidths of Terabytes/sec on some Ethernet connection between workstations. Apart from this, the raw data are much better now than they were before, and when the above points were fixed, the resulting table would be satisfactory. The interesting question is, however, how much added value we get from the parameter fitting. In my earlier note, I called the fitting procedure in the earlier COMMS1 benchmark "crude". I cannot find a more appropriate word for a model which in cases deviates from the measured values by more than 100%. So, how much improvement do we get from the revised COMMS1 version? As Roger said himself, the increase in modeling sophistication led to a more complicated output file. Results are now given for two models, the first one using two parameters, and the second one three. As could be expected, the two-parameter model does not work better than in the previous version. For our machine, latency is over-estimated by 18.9 percent, and the bandwidth at the last data point is off by 27%. Since a linear model is just too simple to be applied to modern message-passing libraries, I wonder why these results are still in the output file at all. The three-parameter fit is better than the two-parameter one. The major advantage is that it exactly matches the first data point in time, and the last data point in bandwidth. That is what people would look at, if there were no parameter fitting at all. So, the reported latency is the time measured for a zero-byte message, and is as good or as bad as this measurement. For our MPI library, the RMS fitting error for the whole data set is 14.04%, and the maximum relative error is 33.4%. We now can discuss the meaning of the word "crude" (and I apologize if as a non-native speaker I don't use the right word here), but I would at least call it unsatisfactory. Given those differences between model and measurements, I was not surprised to see the projected RINFINITY as being too high. The 7.65 GBytes/s are well beyond a memcpy operation in our shared memory, and measured rates never exceeded 7.1 GBytes/s. To summarize, in my opinion there is no added value given by the parameter fitting. The latency value is the first entry in the raw data table, and the asymptotic bandwidth is easy to figure out by just looking at the bandwidths as measured for very long messages. As explained above, the extrapolation by the parametrized model does not add any precision as compared with a guess based on the long-message table entries. For message lengths in between, what does a model help me if it deviates from the measurements by up to 33%? So, my conclusion would be to drop the whole parameter fitting from the PARKBENCH low-level routines. 
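On the timer-call overhead raised in point 2 above: the per-call cost of the clock routine is easy to estimate directly, and can then be subtracted from any timing that brackets a single operation with two calls. A minimal sketch (illustrative only; the call count is an arbitrary choice):

/* Sketch: estimating the cost of the timer itself, so that per-message
   timings can be corrected for the two MPI_Wtime calls they contain. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    enum { NCALL = 100000 };
    double t0, t1, dummy = 0.0;
    int i;

    MPI_Init(&argc, &argv);
    t0 = MPI_Wtime();
    for (i = 0; i < NCALL; i++)
        dummy += MPI_Wtime();          /* back-to-back timer calls */
    t1 = MPI_Wtime();
    printf("approx. cost per MPI_Wtime call: %.3e s (tick %.3e s)\n",
           (t1 - t0) / NCALL, MPI_Wtick());
    MPI_Finalize();
    return (int)(dummy < 0.0);         /* keeps dummy from being optimised away */
}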
In a separate mail I will send the COMMS1 benchmark output, as produced with our MPI library, to Roger. I don't want to swamp the whole PARKBENCH forum with the detailed data. Best regards, Rolf ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Fri Jan 23 12:24:12 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA07290; Fri, 23 Jan 1998 12:24:11 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA06737; Fri, 23 Jan 1998 12:04:42 -0500 (EST) Received: from post.mail.demon.net (post-10.mail.demon.net [194.217.242.154]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA06686; Fri, 23 Jan 1998 12:04:23 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa1003594; 23 Jan 98 16:49 GMT Message-ID: <1GgxMFAgVMy0EwfI@minnow.demon.co.uk> Date: Fri, 23 Jan 1998 16:29:20 +0000 To: hempel@ccrl-nece.technopark.gmd.de Cc: parkbench-comm@CS.UTK.EDU, eckhard@ess.nec.de, tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, maciej@ccrl-nece.technopark.gmd.de, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, springstubbe@gmd.de From: Roger Hockney Subject: Re: NEW COMMS1 benchmark In-Reply-To: <199801211554.QAA11663@sgi7.ccrl-nece.technopark.gmd.de> MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a To: The Parkbench discussion group From: Roger Hockney First the 3-parameter fit that is produced by New COMMS1 and discussed by Rolf can be found in the html version of this reply at: www.minnow.demon.co.uk/Pbench/emails/hempel1.htm Or by bringing up the PICT tool on your browser at: www.minnow.demon.co.uk/pict/source/pict2a.html Then: (1) select a suitable frame size for the PICT display (2) change the data URL at top from .../data/t3e.res to .../data/sx4.res (3) press the "GET DATA at URL" button, and the data should download. (4) press the 3-PARA button then the APPLY3 button, and the 3-para curve should be drawn. ************************************************************************ Rolf has especialy asked me to point out that the results that he has supplied are for the SX4 using Release 7.2 MPI software which is will soon be replaced by a newer version with significantly better latency and bandwidth. This data does not therefore represent the best that can be achieved on the SX4. ************************************************************************ I now reply to specific points in Rolf Hempel's email to group on 21 Jan 1998. >In the recent discussion on the low-level benchmarks, Roger repeatedly >asked us to base our evaluation of the COMMS1 benchmark on his new >version, and not on the one which is still in the official PARKBENCH >distribution. At NEC we now have repeated the tests on the NEC SX-4 >machine, and I would like to make a few comments on the results. > Thank you, Rolf, for taking the trouble to install New COMMS1 and sending me the results. I discuss the results below. In answer to your other points: >First of all, the raw data as reported by the table Primary Measurements >more closely match the figures given by other ping-pong tests than the >older version. 
The correction for oeverheads, however, is still The two points you raise could easily be incorporated in the code. I was reluctant to tamper with the measurement part of the COMMS1 code because it would introduce systematic differences in the measurements and make comparison with older measurements invalid. But of course this has to be done from time to time. My changes were deliberately kept to a minimum and confined largely to the parameter fitting part which was causing the main problems being reported. >Another problem in the measuring procedure is that the test message >contains a single constant, repeated as many times as there are words in >the message. Did the authors never think about the possibility of >data compression in interconnect systems? I would not be surprised to >see bandwidths of Terabytes/sec on some Ethernet connection between >workstations. Yes I did think about this, but decided I did not know enough about compression to devise a way to prevent it. Compression algorithms are so clever now that this may be impossible to do. Anyway this is not yet a problem, so I suggest we leave it until it becomes one. Perhaps software should get benefit in its performance numbers for the use of compression but then we need something more difficult than a sequence of constants to use as a standard test. > >Apart from this, the raw data are much better now than they were before, >and when the above points were fixed, the resulting table would be >satisfactory. I would have no objection to this. >The interesting question is, however, how much added value >we get from the parameter fitting. In my earlier note, I called the The added value provided in the case of the NEC SX4 results is that the 3-parameter fit (see graph) gives a satisfactory fit to ALL the data. This reduces 112 numbers to 3 numbers and an analytic formula that can be manipulated. This is called "Performance Characterisation" and provides very useful data compression. Furthermore the parameters themselves can be interpreted as characterising various aspects of the shape and asymptotes of the performance curve. In contrast reporting just the first time and last performance value and calling them the Latency and Bandwidth only tells us about these two points. Further the choice of which message lengths to use for this type of definition is entirely arbitrary and open to much argument at both ends. However, New COMMS1 does provide this type of output in the lines marked KEY SPOT VALUES but I deliberately avoided calling them values of Latency and Bandwidth in order to avoid senseless argument. Some people are very interested in the parametric representations, others not. One is not obliged to use or look at the parametric representations, but they are there for those who want them. For those interested just in the Raw data those are reported first in the output file of New COMMS1. >As could be expected, the two-parameter model does not work better than >in the previous version. For our machine, latency is over-estimated >by 18.9 percent, and the bandwidth at the last data point is off by >27%. Since a linear model is just too simple to be applied to modern >message-passing libraries, I wonder why these results are still in the >output file at all. The 2-PARA results are reported just so that one can see that they are unsatisfactory, and that therefore one must lose simplicity and consider a 3-para fit. 
Actually there is a switch that can be set in the comms1.inc file to suppress reporting of output if the errors exceed specified values. Every time I have used this, however, I have tended to rerun with the output on, in order to see just what the 2-para gave. If the 2-para can be accepted it is much preferable to the 3-para because of its simplicity and clearer interpretation of the significance of the parameters. >as bad as this measurement. For our MPI library, the RMS fitting >error for the whole data set is 14.04%, and the maximum relative error >is 33.4%. We now can discuss the meaning of the word "crude" (and I If you look at the graph itself (see above), I think you will find the agreement much more satisfactory than is apparent from the reported errors. You also may have too high an expectation of what parametric fitting can reasonably be expected to provide, especially for data with discontinuities. In my experience agreement in RMS error rarely is better than 7% and anything up to 30% is probably still useful. A maximum error of 30% is not bad at all, and may be due to a single rogue point or an isolated discontinuity. Although error numbers are reported in the output, one really has to look at the graph of all data before drawing conclusions. >Given those differences >between model and measurements, I was not surprised to see the >projected RINFINITY as being too high. The 7.65 GBytes/s are well >beyond a memcpy operation in our shared memory, and measured rates never >exceeded 7.1 GBytes/s. Actually 7.65 differs from 7.1 by 8% which is very good agreement indeed. >To summarize, in my opinion there is no added value given by the >parameter fitting. The latency value is the first entry in the raw >data table, and the asymptotic bandwidth is easy to figure out by just >looking at the bandwidths as measured for very long messages. As Your definitions of Latency and Bandwidth will have to be more precise than the above. What does "by looking at the B/W for very long messages" actually mean. What are "very long messages?". "What message length should the first entry in the Raw data table be for?" ... etc. >explained above, the extrapolation by the parametrized model does not >add any precision as compared with a guess based on the long-message >table entries. Strictly-speaking it is invalid to extrapolate the fitted curve outside the range of measured values. However we will always do this, and in this case the fit predicts the known hardware limit as well as can be reasonably expected. >For message lengths in between, what does a model help >me if it deviates from the measurements by up to 33%? So, my conclusion >would be to drop the whole parameter fitting from the PARKBENCH >low-level routines. I think the graph of the results and the 3-para fit shows remarkably good and useful agreement. But this is a subjective personal opinion. What do others think? Best wishes Roger -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? 
From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 06:39:21 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA02920; Mon, 26 Jan 1998 06:39:21 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA09063; Mon, 26 Jan 1998 06:22:47 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA09055; Mon, 26 Jan 1998 06:22:36 -0500 (EST) Received: from mordillo (p112.nas1.is3.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA12226; Mon, 26 Jan 98 11:19:15 GMT Date: Mon, 26 Jan 98 10:14:38 GMT From: Mark Baker Subject: Re: Low Level Benchmarks To: Charles Grassl , parkbench-comm@CS.UTK.EDU Cc: solchenbach@pallas.de X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <199801151711.RAA07227@magnet.cray.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Charles, Thanks for your thoughts and experiences with the Pallas PMB codes - I will forward them to the authors... The main points in favour of the PMB codes are that they are in C and potentially produce results for a variety of MPI calls... Obviously if the results they produce are flawed... Regarding new low-level codes I would be in favour of taking up your kind offer of writing a set of codes in C/Fortran. I guess the main problem is getting a concensus with regards methodology and measurements that are used with these codes. Maybe we can decide that a number of actions should be undertaken... 1) It seems clear that no one is 100% happy with the current version of the low-level codes. So, this implies that they need to be replaced !? 2) If we are going to replace the codes we can go down a couple of routes; start from scratch, replace with Roger's new codes or some combination of both... 3) I would be happy to see us start from scratch and create C/Fortran codes where the methodology and design of each can be "hammered out" by discussion first and then implemented (and iterated as necessary). 4) Assuming that we want to go down this route, I suggest we make a starting point of Charles' "suggestions and requirements for the low level benchmark design" - towards the end of this email. I am happy to put these words on the web and update/change them as our dicussions evolve... 5) Charles has offered his services to help write/design/test these new codes - I'm willing to offer my services in a similar fashion. I'm sure that others interested in the low-level codes could contribute something here as well. Overall, it seems clear to me that we have enough energy and manpower to produce a new set low-level codes whose methodology and design is correct and relevant to todays systems... I look forward to your comments... Regards Mark --- On Thu, 15 Jan 1998 11:11:39 -0600 (CST) Charles Grassl wrote: > > To: Parkbench interests > From: Charles Grassl > Subject: Low Level benchmarks > > Date: 15 January, 1998 > > > Mark, thank you for pointing us to the PMB benchmark. It is well written > and coded, but has some discrepancies and shortcomings. My comments > lead to suggestions and recommendation regarding low level communication > benchmarks. > > First, in program PMB the PingPong tests are twice as fast (in time) > as the corresponding message length tests in the PingPing tests (as run > on a CRAY T3E). 
The calculation of the time and bandwidth is incorrect > by a factor of 100% in one of the programs. > > This error can be fixed by recording, using and reporting the actual > time, the amount of data sent and their ratio. That is, the time should not > be divided by two in order to correct for a round trip. This recorded > time is for a round trip message, and is not precisely the time for > two messages. Half the round trip message passing time, as reported in > the PMB tests, is not the time for a single message and should not be > reported as such. This same erroneous technique is used in the COMMS1 > and COMMS2 benchmarks. (Parkbench is responsible for propagating > this incorrect methodology.) > > In program PMB, the testing procedure performs a "warm up". This > procedure is a poor testing methodology because it discards important > data. Testing programs such as this should record all times and calculate > the variance and other statistics in order to perform error analysis. > > Program PMB does not measure contention or allow extraction of network > contention data. Tests "Allreduce" and "Bcast" and several others > stress the inter-PE communication network with multiple messages, > but it is not possible to extract information about the contention from > these tests. The MPI routines for Allreduce and Bcast have algorithms > which change with respect to the number of PEs and message lengths. Hence, > without detailed information about the specific algorithms used, we cannot > extract information about network performance or further characterize > the inter-PE network. > > Basic measurements must be separated from algorithms. Tests PingPong, > PingPing, Barrier, Xover, Cshift and Exchange are low level. Tests > Allreduce and Bcast are algorithms. The algorithms Allreduce and Bcast > need additional (algorithmic) information in order to be described in > terms of the basic level benchmarks. > > > With respect to low level testing, the round trip exchange of messages, > as per PingPing and PingPong in PMB or COMMS1 and COMMS2, is not > characteristic of the lowest level of communication. This pattern > is actually rather rare in programming practice. It is more common > for tasks to send single messages and/or to receive single messages. > In this scheme, messages do not make a round trip and there are not > necessarily caching or other coherency effects. > > Single message passing is a distinctly different case from that > of round trip tests. We should be worried that the round trip testing > might introduce artifacts not characteristic of actual (low level) usage. > We need a better test of basic bandwidth and latency in order to measure > and characterize message passing performance. > > > Here are suggestions and requirements, in outline form, for a low > level benchmark design: > > > > I. Single and double (bidirectional) messages. > > A. Test single messages, not round trips. > 1. The round trip test is an algorithm and a pattern. As > such it should not be used as the basic low level test of > bandwidth. > 2. Use direct measurements where possible (which is nearly > always). For experimental design, the simplest method is > the most desirable and best. > 3. Do not perform least squares fits A PRIORI. We know that > the various message passing mechanisms are not linear or > analytic because different mechanisms are used for different > message sizes. It is not necessarily known beforehand > where this transition occurs.
Some computer systems have > more than two regimes and their boundaries are dynamic. > 4. Our discussion of least squares fitting is losing track > of experimental design versus modeling. For example, the > least squares parameter for t_0 from COMMS1 is not a better > estimate of latency than actual measurements (assuming > that the timer resolution is adequate). A "better" way to > measure latency is to perform additional DIRECT measurements, > repetitions or otherwise, and hence decrease the statistical > error. The fitting as used in the COMMS programs SPREADS > error. It does not reduce error and hence it is not a > good technique for measuring such an important parameter > as latency. > > B. Do not test zero length messages. Though valid, zero length > messages are likely to take special paths through library > routines. This special case is not particularly interesting or > important. > 1. In practice, the most common and important message size is 64 > bits (one word). The time for this message is the starting > point for bandwidth characterization. > > D. Record all times and use statistics to characterize the message > passing time. That is, do not prime or warm up caches > or buffers. Timings for unprimed caches and buffers give > interesting and important bounds. These timings are also the > nearest to typical usage. > 1. Characterize message rates by a minimum, maximum, average > and standard deviation. > > E. Test inhomogeneity of the communication network. The basic > message test should be performed for all pairs of PEs. > > > II. Contention. > > A. Measure network contention relative to all PEs sending and/or > receiving messages. > > B. Do not use high level routines where the algorithm is not known. > 1. With high level algorithms, we cannot deduce which component > of the timing is attributable to the "operation count" > and which is attributable to the actual system (hardware) > performance. > > > III. Barrier. > > A. Simple test of barrier time for all numbers of processors. > > > > > Additionally, the suite should be easy to use. C and Fortran programs > for direct measurements of message passing times are short and simple. > These simple tests are of order 100 lines of code and, at least in > Fortran 90, can be written in a portable and reliable manner. > > The current Parkbench low level suite does not satisfy the above > requirements. It is inaccurate, as pointed out by previous letters, and > uses questionable techniques and methodologies. It is also difficult to > use; witness the proliferation of files, patches, directories, libraries > and the complexity and size of the Makefiles. > > This Low Level suite is a burden for those who are expecting a tool to > evaluate and investigate computer performance. The suite is becoming > a liability for our group. As such, it should be withdrawn from > distribution. > > I offer to write, test and submit a new set of programs which satisfy > most of the above requirements.
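As a concrete illustration of item I.D.1 in the outline above (characterising the recorded times by minimum, maximum, average and standard deviation rather than by a fitted model), a minimal sketch in C could look as follows; the type and function names are invented for illustration and are not taken from any existing Parkbench source.

#include <math.h>
#include <stddef.h>

/* Summary statistics over all recorded repetitions for one message size.
   All raw times are kept; the reduction happens only at analysis time.
   Illustrative names, not Parkbench code. */
struct time_stats {
    double min, max, mean, stddev;
};

struct time_stats characterize(const double *t, size_t n)
{
    struct time_stats s = { t[0], t[0], 0.0, 0.0 };
    double sum = 0.0, sumsq = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
        sum   += t[i];
        sumsq += t[i] * t[i];
    }
    s.mean = sum / (double) n;
    if (n > 1)   /* sample standard deviation, guarded against rounding */
        s.stddev = sqrt(fmax(0.0, (sumsq - n * s.mean * s.mean) / (n - 1)));
    return s;
}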
> > > Charles Grassl > SGI/Cray Research > Eagan, Minnesota USA > ---------------End of Original Message----------------- ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 01/26/98 - Time: 10:14:38 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 11:54:37 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id LAA07118; Mon, 26 Jan 1998 11:54:37 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA18845; Mon, 26 Jan 1998 11:21:36 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id LAA18837; Mon, 26 Jan 1998 11:21:33 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id KAA23428 for ; Mon, 26 Jan 1998 10:21:26 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id KAA29079 for ; Mon, 26 Jan 1998 10:21:24 -0600 (CST) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id QAA29329; Mon, 26 Jan 1998 16:21:23 GMT Message-Id: <199801261621.QAA29329@magnet.cray.com> Subject: Low Level Benchmarks To: parkbench-comm@CS.UTK.EDU Date: Mon, 26 Jan 1998 10:21:23 -0600 (CST) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit To: Parkbench interests From: Charles Grassl Subject: Low Level benchmarks Date: 26, January, 1998 A short review of where we have been and decided: Last year we agreed (via email exchanges) that the Parkbench Low Level benchmark suite is not intended to be an -MPI- -test- suite. There was a consensus that we intended to measure low level performance, not algorithm design or implementation. This is why the Pallas benchmark, though useful for testing the performance of several important MPI functions, is not the basic low level test which we desire. (I believe that the performance measurement of the MPI functions is a worthwhile project for this group, but it needs to be separate from the low level benchmarks.) At the May, 1997 Parkbench meeting in Knoxville, TN, we unanimously decided that the measurement and analysis (fitting) portions of the COMMS programs would be made into a separate program. This from Michael Berry's minutes (23 May 1997): After more discussion, the following COMMS changes/outputs were unanimously agreed upon: 1. Maximum bandwidth with corresp. message size. 2. Minimum message-passing time with corresp. message size. 3. Time for minimum message length (could be 0, 1, 8, or 32 bytes but must be specified). 4. The software will be split into two program: one to report the spot measurements and the other for the analysis. Some of the objections with the Parkbench Low Level codes are that they are difficult to build, run and analyze. This attributable to their organization and design. Separating the analysis would greatly simply the programs, but the programs still need to be rewritten. I include in this email message a simple replacement code for COMMS1. It uses the "back and forth" methodology, reports maximum and minimum times with corresponding sizes and and does not include "analysis". 
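To make the agreed spot-value outputs concrete (maximum bandwidth and minimum message-passing time, each with its corresponding message size, plus the time for the minimum message length), the selection step can be as simple as the sketch below; it is written in C, the array and function names are invented for illustration, and the arrays are assumed to be ordered by increasing message size.

#include <stdio.h>
#include <stddef.h>

/* sizes[i] in bytes (increasing), times[i] in seconds (e.g. the minimum
   observed time for that size). Illustrative only; not taken from COMMS1
   or from the replacement program given below. */
void report_spot_values(const double *sizes, const double *times, size_t n)
{
    size_t i_bw = 0, i_t = 0;
    for (size_t i = 1; i < n; i++) {
        if (sizes[i] / times[i] > sizes[i_bw] / times[i_bw])
            i_bw = i;                    /* maximum bandwidth    */
        if (times[i] < times[i_t])
            i_t = i;                     /* minimum message time */
    }
    printf("Maximum bandwidth: %.1f MByte/s at %.0f bytes\n",
           sizes[i_bw] / times[i_bw] / 1.0e6, sizes[i_bw]);
    printf("Minimum time     : %.1f us at %.0f bytes\n",
           times[i_t] * 1.0e6, sizes[i_t]);
    printf("Time for minimum message length (%.0f bytes): %.1f us\n",
           sizes[0], times[0] * 1.0e6);
}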
It is equivalent to the measurement portion of COMMS1, though it is much simpler and easier to use. I will comment on the experimental methodology used in this program.

- The reported times in standard out are actual round trip times. It is a poor experimental practice to modify raw measurements too early. We should not mix measured times with derived times. The practice leads to confusion and errors (witness the Pallas benchmark code and an earlier version of Parkbench). If we desire to divide the times by two (because of the round trip), then this should be done in an analysis portion. Otherwise we misrepresent round trip times as actual single trip times, which they are not.

- All times are saved and written to unit 7. The reported times in standard out are the first and the last measurements for each message size. The experimental principle is that no data should be discarded without analysis. We can use statistical analysis or graphics or fitting routines to analyze the raw output. (I favor graphics and statistical analysis.) If we look at the raw output, we will see interesting features, such as the actual "warm up" count (usually five or fewer repetitions) and the distribution of times (not Gaussian!).

- Each repetition is individually timed. If the timer does not have adequate resolution, then the times for a number of repetitions, from two to all, can be aggregated and used. This aggregation can be done in the analysis phase. (Most computers should be able to time and resolve single round trip messages.) This aggregation should not be done before adequate analysis or evidence shows that it needs to be done.

- Each message size is tested for the same number of repetitions. We prefer to keep this number constant so that the experimental sampling error (proportional to 1/sqrt[repetitions]) is the same for each message size. Also, it is difficult to cleanly and simply adjust the repetition count relative to the message size.

I also have one replacement program for both COMMS2 and COMMS3 (note that the COMMS2 measurement is a subset of the COMMS3 measurements). More on that later.

Charles Grassl
SGI/Cray Research
Eagan, Minnesota USA

-----------------------------------------------------------------------

      program Single
!     Compile: f90 file.f -l mpi
!     timer() is an external, user-supplied wall-clock timer returning
!     seconds; it is not included in this message.
      character*40 Title
      data Title/' Single Messages --- MPI'/
      integer log2nmax,nmax,n_repetitions
      parameter (log2nmax=18,nmax=2**log2nmax,n_repetitions=50)
      integer n_starts,n_mess
      parameter (n_starts=2,n_mess=2)
      include 'mpif.h'
      integer ier,status(MPI_STATUS_SIZE)
      integer my_pe,npes
      integer log2n,n,nrep,i
      real*8 t_call,timer,tf(0:n_repetitions)
      real*8 A(0:nmax-1)
      save A

      call mpi_init( ier )
      call mpi_comm_rank(MPI_COMM_WORLD, my_pe, ier)
      call mpi_comm_size(MPI_COMM_WORLD, npes, ier)

!     initialise the message buffer (acos(1.0) = 0)
      radian=1
      do i=0,nmax-1
        A(i) = acos(radian)*i
      end do

!     estimate the overhead of a single timer call
      tf(0) = timer()
      do nrep=1,n_repetitions
        tf(nrep) = timer()
      end do
      t_call=(tf(n_repetitions)-tf(0))/n_repetitions

      if (my_pe.eq.0) then
        call table_top(Title,npes,n_starts,n_mess,n_repetitions,t_call)
      end if

!     each repetition of each message size is individually timed
      do log2n=0,log2nmax
        n = 2**log2n
        call mpi_barrier( MPI_COMM_WORLD, ier )
        tf(0) = timer()
        do nrep=1,n_repetitions
          if (my_pe.eq.1) then
            call MPI_SEND(A,8*n,MPI_BYTE,0,10,MPI_COMM_WORLD,ier)
            call MPI_RECV(A,8*n,MPI_BYTE,0,20,MPI_COMM_WORLD,status,ier)
          end if
          if (my_pe.eq.0) then
            call MPI_RECV(A,8*n,MPI_BYTE,1,10,MPI_COMM_WORLD,status,ier)
            call MPI_SEND(A,8*n,MPI_BYTE,1,20,MPI_COMM_WORLD,ier)
          end if
          tf(nrep) = timer()
        end do
        if (my_pe.eq.0) then
          call table_body(8*n,n_mess,n_repetitions,tf,t_call)
        end if
      end do

      call mpi_finalize(ier)
      end

      subroutine table_top( Title,npes,
     .     n_starts,n_mess,n_repetitions,t_call)
      integer M
      parameter (M = 1 000 000)
      character*40 Title
      integer npes,n_starts,n_mess,n_repetitions
      real*8 t_call
      write(6,9010) Title,npes,n_starts,n_mess,n_repetitions,t_call*M
      return
 9010 format(//a40,
     . // ' Number of PEs: ',i8
     . // ' Starts: ',i8,
     . /  ' Messages: ',i8,
     . /  ' Repetitions: ',i8,
     . /  ' Timer overhead: ',f8.3,' microsecond',
     . // 8x,' First ',
     .       ' Last ',
     . /' Length',2x,2(' Time Rate ',1x),
     . /' [Bytes]',2x,2(' [Microsec.] [Mbyte/s]',1x),
     . /' ',8('-'),2x,2(21('-'),2x))
      end

      subroutine table_body(n_byte,n_mess,n_repetitions,tf,t_call)
      integer M
      parameter (M = 1 000 000)
      integer n_byte,n_mess,n_repetitions,i
      real*8 tf(0:n_repetitions)
      real*8 t_call
      real*8 t_first,t_last
!     subtract the timer overhead from each individually timed round trip
      t_first = (tf(1)-tf(0))-t_call
      t_last  = (tf(n_repetitions)-tf(n_repetitions-1))-t_call
      write(6,9020) n_byte,t_first*M,n_mess*n_byte/(t_first*M),
     .              t_last *M,n_mess*n_byte/(t_last *M)
!     all raw times go to unit 7 for later analysis
      write(7) n_byte,n_repetitions,n_mess
      write(7) ((tf(i)-tf(i-1))-t_call,i=1,n_repetitions)
      return
 9020 format(i8, 2x,2(f10.1,1x,f10.0,2x))
      end

From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 13:06:36 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA08767; Mon, 26 Jan 1998 13:06:36 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA23400; Mon, 26 Jan 1998 12:31:16 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA23166; Mon, 26 Jan 1998 12:30:05 -0500 (EST) Received: from mordillo ([195.102.195.125]) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA15447; Mon, 26 Jan 98 17:31:15 GMT Date: Mon, 26 Jan 98 17:23:08 GMT From: Mark Baker Subject: Fw: Re: Low Level Benchmarks To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <34CCB99F.2B3C3D63@cumbria.eng.sun.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII This came direct to me... The rest of Parkbench are probably interested in Bodo's comments.
Mark --- On Mon, 26 Jan 1998 08:28:15 -0800 Bodo Parady - SMCC Performance Development wrote: > The key items to find are: > > Lock time (defined as time to release a lock remotely) > Example would be reader spinning on memory, waiting > for change in memory word, or receipt of interrupt. > This is the effective ping-pong half time. Sadly > subroutine and library call overhead can render > this result meaningless. > > Measuring one way rates is no good here since the response > time must be factored in. This is a two-way transfer > > Channel rate (defined as large block transfer rate). > > Block size at half channel rate. > > Block size at twice lock time latency. > > Full curve, stepping at 1, 2, 4, 8, 16, ..., 2*n byte block sizes > at full issue rate. This is probably the least important > since it involves coalescence of transmitted data. > > The fear is that given the limitations of MPI/PVM, and to some degree > of C and Fortran that accurate measures of these quantities may > not be practical. > > Regards. > > Bodo Parady > > Mark Baker wrote: > > > Charles, > > > > Thanks for your thoughts and experiences with the Pallas PMB codes - > > I will forward them to the authors... The main points in favour of > > the PMB codes are that they are in C and potentially produce results > > for a variety of MPI calls... Obviously if the results they produce are > > flawed... > > > > Regarding new low-level codes I would be in favour of taking up your > > kind offer of writing a set of codes in C/Fortran. I guess the main > > problem is getting a concensus with regards methodology and measurements > > that are used with these codes. > > > > Maybe we can decide that a number of actions should be undertaken... > > > > 1) It seems clear that no one is 100% happy with the current version > > of the low-level codes. So, this implies that they need to be > > replaced !? > > > > 2) If we are going to replace the codes we can go down a couple of routes; > > start from scratch, replace with Roger's new codes or some combination of > > both... > > > > 3) I would be happy to see us start from scratch and create > > C/Fortran codes where the methodology and design of each can be > > "hammered out" by discussion first and then implemented > > (and iterated as necessary). > > > > 4) Assuming that we want to go down this route, I suggest we make a starting > > point of Charles' "suggestions and requirements for the low level > > benchmark design" - towards the end of this email. I am happy to > > put these words on the web and update/change them as our dicussions > > evolve... > > > > 5) Charles has offered his services to help write/design/test these new codes - > > I'm willing to offer my services in a similar fashion. I'm sure that others > > interested in the low-level codes could contribute something here as well. > > > > Overall, it seems clear to me that we have enough energy and manpower to > > produce a new set low-level codes whose methodology and design is correct > > and relevant to todays systems... > > > > I look forward to your comments... > > > > Regards > > > > Mark > > > > --- On Thu, 15 Jan 1998 11:11:39 -0600 (CST) Charles Grassl wrote: > > > > > > To: Parkbench interests > > > From: Charles Grassl > > > Subject: Low Level benchmarks > > > > > > Date: 15 January, 1998 > > > > > > > > > Mark, thank you for pointing us to the PMB benchmark. It is well written > > > and coded, but has some discrepancies and shortcomings. 
> > > > > > This Low Level suite is a burden for those who are expecting a tool to > > > evaluate and investigate computer performance. The suite is becoming > > > a liability for our group. As such, it should be withdrawn from > > > distribution. > > > > > > I offer to write, test and submit a new set of programs which satisfy > > > most of the above requirements. > > > > > > > > > Charles Grassl > > > SGI/Cray Research > > > Eagan, Minnesota USA > > > > > > > ---------------End of Original Message----------------- > > > > ------------------------------------- > > CSM, University of Portsmouth, Hants, UK > > Tel: +44 1705 844285 Fax: +44 1705 844006 > > E-mail: mab@sis.port.ac.uk > > Date: 01/26/98 - Time: 10:14:38 > > URL http://www.sis.port.ac.uk/~mab/ > > ------------------------------------- > > > > ---------------End of Original Message----------------- ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 01/26/98 - Time: 17:23:08 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 14:08:38 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA11289; Mon, 26 Jan 1998 14:08:37 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA02837; Mon, 26 Jan 1998 13:52:54 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA02817; Mon, 26 Jan 1998 13:52:50 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id NAA11755; Mon, 26 Jan 1998 13:52:49 -0500 (EST) Date: Mon, 26 Jan 1998 13:52:49 -0500 (EST) From: Pat Worley Message-Id: <199801261852.NAA11755@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Fw: Re: Low Level Benchmarks In-Reply-To: Mail from 'Mark Baker ' dated: Mon, 26 Jan 98 17:23:08 GMT Cc: worley@haven.EPM.ORNL.GOV (From Charles Grassl) > Last year we agreed (via email exchanges) that the Parkbench Low Level > benchmark suite is not intended to be an -MPI- -test- suite. There was a > consensus that we intended to measure low level performance, not algorithm > design or implementation. (From Bodo Parady via Mark Baker) > The fear is that given the limitations of MPI/PVM, and to some degree > of C and Fortran that accurate measures of these quantities may > not be practical. > I have a problem with attempting to determine low level communication performance parameters independent of the communication library when it a) is such a difficult task (I doubt that any portable program will be "accurate enough" across all the interesting platforms.) b) does not reflect what users would see in practice (since they will be using MPI or PVM in C or Fortran). Am I missing something? The primary utility (for me) of the low level benchmarks is to help explain the performance observed in the Parkbench kernels and compact applications, or in my own codes. What level of accuracy is required for such an application? Are more accurate or detailed measurements useful or doable? Upon reflection, such low(er) level performance data would be useful to the developer of a communication library, to help evaluate its performance, but that appears to require system-specific measurements (and system-specific interpretation). Is this really something we want to attempt? 
Pat Worley From owner-parkbench-comm@CS.UTK.EDU Thu Jan 29 16:29:33 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA19023; Thu, 29 Jan 1998 16:29:33 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA09768; Thu, 29 Jan 1998 16:18:48 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA09756; Thu, 29 Jan 1998 16:18:45 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id QAA01325; Thu, 29 Jan 1998 16:18:43 -0500 (EST) Date: Thu, 29 Jan 1998 16:18:43 -0500 (EST) From: Pat Worley Message-Id: <199801292118.QAA01325@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Fw: Re: Low Level Benchmarks In-Reply-To: Mail from 'Mark Baker ' dated: Mon, 26 Jan 98 17:23:08 GMT Cc: worley@haven.EPM.ORNL.GOV In a private exchange, Charles Grassl made a comment that he may come to regret: " We need more input, such as yours, as to what are the important parameters and what accuracy is needed. " so here are some random comments. I have been organizing my own performance data over the last couple of weeks. I never paid too much attention to the detailed output of my own ping-ping and ping-pong tests because it was not the end product of the research. It has been enlightening to look at it now. The entry point is http://www.epm.ornl.gov/~worley/studies/pt2pt.html I tried a couple of different fitting techniques, but decided that fits told me nothing that I was interested in. What I have found mildly interesting is to measure statistics of the data, and try to build a performance model using those. The difference is that the interpretation and value of the statistics (maximum observed bandwidth, time to send 0 length message, etc.) are not functions of any model error. The problem with fitting the data is that, no matter how often I tell myself that it is simply a compact representation of the data, I keep wanting to use assign meaning to the model parameters and use them in interplatform comparisons. In summary, I have changed my mind. I no longer support even simple fits to the data unless well-defined statistical measures of the data are also included (and emphasized). 
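One simple way to turn such statistics into a performance estimate, in the spirit of the paragraph above (this is an illustrative form only, not the model used on the page cited), is to combine the two quantities directly:

$$ t_{\mathrm{model}}(n) \;=\; t_{\min}(0) \;+\; \frac{n}{B_{\max}} $$

where $t_{\min}(0)$ is the smallest observed time for the shortest message and $B_{\max}$ is the largest observed bandwidth over all message sizes. Both quantities keep their direct interpretation as observed best-case values, so the estimate is an optimistic bound on expected performance rather than a least-squares compromise, and its inputs carry none of the model error discussed above.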
Pat Worley From owner-parkbench-comm@CS.UTK.EDU Mon Feb 9 05:05:12 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA29859; Mon, 9 Feb 1998 05:05:11 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id EAA10483; Mon, 9 Feb 1998 04:57:14 -0500 (EST) Received: from gatekeeper.pallas.de (gatekeeper.pallas.de [194.45.33.1]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id EAA10476; Mon, 9 Feb 1998 04:57:07 -0500 (EST) Received: from mailhost.pallas.de (gatekeeper [194.45.33.1]) by gatekeeper.pallas.de (SMI-8.6/SMI-SVR4) with SMTP id KAA18803; Mon, 9 Feb 1998 10:50:10 +0100 Received: from schubert.pallas.de by mailhost.pallas.de (SMI-8.6/SMI-SVR4) id KAA03909; Mon, 9 Feb 1998 10:50:07 +0100 Received: from localhost by schubert.pallas.de (SMI-8.6/SMI-SVR4) id KAA11268; Mon, 9 Feb 1998 10:46:57 +0100 Date: Mon, 9 Feb 1998 10:46:45 +0100 (MET) From: Hans Plum X-Sender: hans@schubert Reply-To: Hans Plum To: cmg@cray.com, mab@sis.port.ac.uk, parkbench-comm@CS.UTK.EDU cc: snelling@fecit.co.uk Subject: Re: Low Level Benchmarks (fwd) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="MimeMultipartBoundary" --MimeMultipartBoundary Content-Type: TEXT/PLAIN; charset=US-ASCII Hi, I am the "PMB person" at PALLAS Gmbh. I have heard about your discussions. First note that there is a new version PMB1.2, see http://www.pallas.de/pages/pmb.htm Also look at the PMB1.2_doc.ps.gz where we try to give the reasoning for all decisions made in PMB. We think nothing has been designed sloppy .. PMB has been developed from point of view of an application developer which I am. Of course a single person's view is limited, but for myself the information given by PMB provides a solid base for algorithmic estimates and decisions. That exactly what we wanted: Something EASY (and not COMPLETE) that covers may be 80% of the realistic situations. ------------------------------------------------------------- ---/--- Dr Hans-Joachim Plum phone : +49-2232-1896-0 / / PALLAS GmbH direct line: +49-2232-1896-18 / / / Hermuelheimer Strasse 10 fax : +49-2232-1896-29 / / / / D-50321 Bruehl email : plum@pallas.de / / / Germany URL : http://www.pallas.de / / PALLAS ------------------------------------------------------------- ---/--- --MimeMultipartBoundary-- From owner-parkbench-comm@CS.UTK.EDU Wed Apr 22 07:43:42 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA03238; Wed, 22 Apr 1998 07:43:41 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA03111; Wed, 22 Apr 1998 07:05:23 -0400 (EDT) Received: from post.mail.demon.net (post-10.mail.demon.net [194.217.242.39]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA03104; Wed, 22 Apr 1998 07:05:21 -0400 (EDT) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa1028865; 22 Apr 98 11:00 GMT Message-ID: Date: Wed, 22 Apr 1998 11:59:51 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Announcing PICT2.1 - Now fully Operational MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a To: the Parkbench discussion group From: Roger ANNOUNCING PICT 2.1 (1 Mar 1998) -------------------------------- I am pleased to announce the first fully-functional version of the Parkbench Interactive Curve-Fitting Tool (PICT). Provision is made for a wide range of screen sizes in pixels by allowing the user to make a suitable choice in the opening HTML page. All buttons now work. 
In particular Jack can have his least-squares fitting of the 2-parameters direct from the tool, and this can be performed over partial ranges of the data as required. The same applies to the Three-point fitting procedure to obtain the 3-parameter fits. There is also a nice "Temperature Gauge" feature that helps you minimise the error during manual fitting. The results of these fits can be assembled in a results file and annotated using the SAVE buttons. Under MSIE I find I am able to store these results in my local disk file system using SAVE as ... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The methodology of the 2-parameter curve fitting is given in detail in my book "The Science of Computer Benchmarking", see: http://www.siam.org/catalog/mcc07/hockney.htm The 3-parameter fit was described quite fully in my talk to the 11 Sep 1997 Parkbench meeting. I have finally written this up with pretty pictures for the PEMCS Web Journal. Look at: http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ talks/Roger-Hockney/perfprof1.html To try out PICT 2.1 please first try my own Demon Web space which has a counter from which I can judge usage: http://www.minnow.demon.co.uk/pict/source/pict2a.html If this gives problems, it is also mounted on the University of Westminster server: http://perun.hscs.wmin.ac.uk/LocalInfo/pict/source/pict2a.html We expect soon to make it available on the Southampton server. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ PICT 2.1 has been tested by a small number of friends. Most problems and frustrations arise from either slowness of the server or of the users' computer. If download from Demon is slow or appears to hang, try the other server or try Demon later. Please do not conclude the applet is broken. I am confident it is not. A 10 to 20 second wait is normal when bringing up the requested graphical window/frame even on a good day. Once the graphical window is on your computer and the applet is running, the speed is determined by the speed of your computer. You may even disconnect from the Web at this stage and continue curve fitting with the applet with the data displayed. If you want new data, you must, of course, reconnect to the Web and use the GET DATA at URL button. Experience shows that the PICT applet will not respond satisfactorily on a computer with slower than a 100 MHz clock. This is because a lot of complex calculations must be performed as you drag the curves around the data. MSIE seems to work noticeably faster than Netscape on my Win95 PC. There is no cure for this except to use a faster computer. But again please do not think the applet is brocken. Please report experiences good or bad to: roger@minnow.demon.co.uk Constructive suggestions for improvement are also welcome. -- Roger Hockney. 
Checkout my new Web page at URL http://www.minnow.demon.co.uk From owner-parkbench-comm@CS.UTK.EDU Sun Jun 21 10:02:47 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA22167; Sun, 21 Jun 1998 10:02:47 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA06272; Sun, 21 Jun 1998 09:47:56 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA06265; Sun, 21 Jun 1998 09:47:54 -0400 (EDT) Received: from mordillo (p4.nas1.is5.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA17767; Sun, 21 Jun 98 14:50:05 BST Date: Sun, 21 Jun 98 14:43:42 +0000 From: Mark Baker Subject: New PEMCS papers To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, Two new papers have just been published by the PEMCS journal... 3.Comparing The Performance of MPI on the Cray T3E-900, The Cray Origin2000 And The IBM P2SC, by Glenn R. Luecke and James J. Coyle Iowa State University, Ames, Iowa 50011-2251, USA. 4.EuroBen Experiences with the SGI Origin 2000 and the Cray T3E, by A.J. van der Steen, Computational Physics, Utrecht University, Holland* See http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/ Regards Mark ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 06/21/98 - Time: 14:43:42 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Fri Sep 11 12:05:18 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA19578; Fri, 11 Sep 1998 12:05:18 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA20703; Fri, 11 Sep 1998 11:54:29 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA20636; Fri, 11 Sep 1998 11:53:20 -0400 (EDT) Received: from mordillo (p36.nas1.is5.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA11111; Fri, 11 Sep 98 16:48:17 BST Date: Fri, 11 Sep 98 14:38:08 +0000 From: Mark Baker Subject: CPE - Call for papers - Message Passing Interface-based Parallel Programming with Java To: javagrandeforum@npac.syr.edu, "'mpi-nt-users@erc.msstate.edu'" , "Dr. Kenneth A. Williams" , "Stephen L. Scott" , "Aad J. van der Steen" , Advanced Java , Alexander Reinefeld , Andy Grant , Anne Trefethen , Bryan Capenter , Charles Grassl , Dave Beckett , David Snelling , DIS Everyone , fagg@CS.UTK.EDU, gentzsch@genias.de, Guy Robinson , Hon W Yau , hpvm@cs.uiuc.edu, Jack Dongarra , java-for-cse@npac.syr.edu, Joao Gabriel Silva , jtap-club-clusters@mailbase.ac.uk, Ken Hawick , Mike Berry , mpijava-users@npac.syr.edu, owner-grounds@mail.software.ibm.com, parkbench-comm@CS.UTK.EDU, partners@globus.org, Paul Messina , Roland Wismueller , Steve Larkin - AVS , Terri Canzian , Tony Hey , topic@mcc.ac.uk, Vaidy Sunderam , Vladimir Getov , William Gropp X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear Colleague,, Firstly, I apologise for any cross-posting of this email. 
If this CFP is not in your field we would appreciate you forwarding it to your colleagues who may be in the field. This CFP can be found at http://hpc-journals.ecs.soton.ac.uk/CPE/Special/MPI-Java/ Regards Dr Mark Baker University of Portsmouth, UK ---------------------------------------------------------------------------- Call For Papers Special Issue of Concurrency: Practice and Experience Message Passing Interface-based Parallel Programming with Java Guest Editors Anthony Skjellum (MPI Software Technology, Inc.) Mark Baker (University of Portsmouth) A special issue of Concurrency: Practice and Experience (CPE) is being planned for Fall of 1999. Papers submitted and accepted for this issue will be published by John Wiley & Sons Ltd. in the CPE Journal and in addition will be made available electronically via the WWW. Background Recently there has been a great deal of interest in the idea that Java may be a good language for scientific and engineering computation, and in particular for parallel computing. The claims made on behalf of Java, that it is simple, efficient and platform-neutral - a natural language for network programming - make it potentially attractive to scientific programmers hoping to harness the collective computational power of networks of workstations and PCs, or even of the Internet. A basic prerequisite for parallel programming is a good communication API. Java comes with various ready-made packages for communication, notably an easy-to-use interface to BSD sockets, and the Remote Method Invocation (RMI) mechanism. Interesting as these interfaces are, it is questionable whether parallel programmers will find them especially convenient. Sockets and remote procedure calls have been around for about as long as parallel computing has been fashionable, and neither of them has been popular in that field. Both communication models are optimized for client-server programming, whereas the parallel computing world is mainly concerned with "symmetric" communication, occurring in groups of interacting peers. This symmetric model of communication is captured in the successful Message Passing Interface standard (MPI), established a few years ago. MPI directly supports the Single Program Multiple Data (SPMD) model of parallel computing, wherein a group of processes cooperate by executing identical program images on local data values. Reliable point-to-point communication is provided through a shared, group-wide communicator, instead of socket pairs. MPI allows numerous blocking, non-blocking, buffered or synchronous communication modes. It also provides a library of true collective operations (broadcast is the most trivial example). An extended standard, MPI 2, allows for dynamic process creation and access to memory in remote processes. Call For Papers This is a call for papers about the designs, experience, and results concerning the use of the Message Passing Interface (MPI) with Java are sought for a special issue of Concurrency Practice and Experience. Development of clear understanding of the opportunities, challenges, and state-of-the-art in scalable, peer-oriented messaging with Java are of interest and value to both the distributed computing and high performance computing communities. Topics of interest for this special issue include but are not limited to: -- Practical systems that use MPI and Java to solve real distributed high performance computing problems. -- Designs of systems for combining MPI-type functionality with Java. 
-- Approaches to APIs for object-oriented, group-oriented message passing with Java. -- Efforts to combine MPI with CORBA in a Java environment. -- Efforts to utilize aspects of the emerging MPI/RT standard are also of interest in the Java context. -- Efforts to do MPI interoperability (IMPI) using Java. -- Issues and both tactical and strategic solutions concerning MPI-1 and MPI-2 standard and features in conjunction with Java. -- Performance results and performance-enhancing techniques for such systems. -- Flexible frameworks and techniques for enabling High-Performance communication in Java Timescales for Submission There is a deadline of 15th December 1998 for submitted papers. Publication is currently scheduled for the third quarter of 1999. Activity Deadline Call For Papers 1st September 1998 Paper Submission 15th December 1998 Papers Returned 15th March 1999 Papers Approved 1st April 1999 Publication Q3 1999 Further details about this special issue can be found at: http://hpc-journals.ecs.soton.ac.uk/CPE/Special/MPI-Java/ ---------------------------------------------------------------------------- ------------------------------------- Dr Mark baker CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 09/11/98 - Time: 14:38:08 URL http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Tue Sep 15 22:24:43 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id WAA13441; Tue, 15 Sep 1998 22:24:43 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id WAA25353; Tue, 15 Sep 1998 22:23:26 -0400 (EDT) Received: from octane11.nas.nasa.gov (octane11.nas.nasa.gov [129.99.34.116]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id WAA25343; Tue, 15 Sep 1998 22:23:08 -0400 (EDT) Received: (from saini@localhost) by octane11.nas.nasa.gov (8.8.7/NAS8.8.7) id TAA24915; Tue, 15 Sep 1998 19:17:45 -0700 (PDT) From: "Subhash Saini" Message-Id: <9809151917.ZM24910@octane11.nas.nasa.gov> Date: Tue, 15 Sep 1998 19:17:44 -0700 In-Reply-To: Mark Baker "CPE - Call for papers - Message Passing Interface-based Parallel Programming with Java" (Sep 11, 2:38pm) References: X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: "'mpi-nt-users@erc.msstate.edu'" , "Aad J. van der Steen" , "Dr. Kenneth A. Williams" , "Stephen L. Scott" , Advanced Java , Alexander Reinefeld , Andy Grant , Anne Trefethen , Bryan Capenter , Charles Grassl , DIS Everyone , Dave Beckett , David Snelling , Guy Robinson , Hon W Yau , Jack Dongarra , Joao Gabriel Silva , Ken Hawick , Mark Baker , Mike Berry , Paul Messina , Roland Wismueller , Steve Larkin - AVS , Terri Canzian , Tony Hey , Vaidy Sunderam , Vladimir Getov , William Gropp , fagg@CS.UTK.EDU, gentzsch@genias.de, hpvm@cs.uiuc.edu, java-for-cse@npac.syr.edu, javagrandeforum@npac.syr.edu, jtap-club-clusters@mailbase.ac.uk, mpijava-users@npac.syr.edu, owner-grounds@mail.software.ibm.com, parkbench-comm@CS.UTK.EDU, partners@globus.org, topic@mcc.ac.uk Subject: AD _ Workshop Cc: mab@sis.port.ac.uk, saini@octane11.nas.nasa.gov Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii You are invited to attend the workshop (see below). Best regards, subhash ============================================================================== ***** REGISTER NOW ***** *** NO REGISTRATION FEE *** **** Last Day to Register is Sept. 
23, 1998 **** "First NASA Workshop on Performance-Engineered Information Systems" ----------------------------------------------------------------- Sponsored by Numerical Aerospace Simulation Systems Division NASA Ames Research Center Moffett Field, California, USA September 28-29, 1998 Workshop Chairman: Dr. Subhash Saini http://science.nas.nasa.gov/Services/Training Invited Speakers: ------------------ Adve, Vikram (Rice University) Aida, K. (Tokyo Institute of Technology, JAPAN) Bagrodia, Rajive (University of California, Los Angeles) Becker, Monique (Institute Nationale des Tele. FRANCE) Berman, Francine (University of California, San Diego) Browne, James C. (University of Texas) Darema, Frederica (U.S. National Science Foundation-CISE) Dongarra, Jack (Oak Ridge National Laboratory) Feiereisen, Bill (NASA Ames Research Center) Fox, Geoffrey (Syracuse University) Gannon, Dennis (Indiana University) Gerasoulis, Apostolos (Rutgers University) Gunther, Neil J. (Performance Dynamics Company) Hey, Tony (University of Southampton UK) Hollingsworth, Jeff (University of Maryland) Jain, Raj (Ohio State University) Keahy, Kate (Los Alamos National Laboratory) Mackenzie, Lewis M. (University of Glasgow, Scotland UK) McCalpin, John (Silicon Graphics) Menasce, Daniel A. (George Mason University) Nudd, Graham (University of Warwick UK) Reed, Dan (University of Illinois) Saltz, Joel (University of Maryland) Simmons, Margaret (San Diego Supercomputer Center) Vernon, Mary (University of Wisconsin) Topics include: -------------- - Performance-by-design techniques for high-performance distributed information systems - Large transients in packet-switched and circuit-switched networks - Workload characterization techniques - Integrated performance measurement, analysis, and prediction - Performance measurement and modeling in IPG - Performance models for threads and distributed objects - Application emulators and simulation models - Performance prediction engineering of Information Systems including IPG - Performance characterization of scientific and engineering applications of interest to NASA, DoE, DoD, and industry - Scheduling tools for performance prediction of parallel programs - Multi-resolution simulations for large-scale I/O-intensive applications - Capacity planning for Web performance: metrics, models, and methods Contact: Marcia Redmond, redmond@nas.nasa.gov, (650) 604-4373 Registration: Advanced registration is required. Registration Fee: NONE. Registration Deadlines: Friday, September 23, 1998 There will be no onsite registration. Contact: Send registration information and direct questions to Marcia Redmond, redmond@nas.nasa.gov, (650) 604-4373. DESCRIPTION: The basic goal of performance modeling is to predict and understand the performance of a computer program or set of programs on a computer system. The applications of performance modeling are numerous, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. The most reliable technique for determining the performance of a program on a computer system is to run and time the program multiple times, but this can be very expensive and it rarely leads to any deep understanding of the performance issues. It also does not provide information on how performance will change under different circumstances, for example with scaling the problem or system parameters or porting to a different machine. 
The complexity of new parallel supercomputer systems presents a daunting challenge to the application scientists who must understand the system's behavior to achieve a reasonable fraction of the peak performance. The NAS Parallel Benchmarks (NPB) have exposed a large difference between peak and achievable performance. Such a dismal performance is not surprising, considering the complexity of these parallel distributed memory systems. At present, performance modeling, measurement, and analysis tools are inadequate for distributed/networked systems such as Information Power Grid (IPG). The purpose of performance-based engineering is to develop new methods and tools that will enable development of these information systems faster, better and cheaper. ================================================================================ Registration "First NASA Workshop on Performance-Engineered Information Systems" Send the following information to redmond@nas.nasa.gov Name _____________________________________________ Organization _____________________________________ Street Address ___________________________________ City ____________________ State __________________ Zip/Mail Code ___________ Country ________________ Phone ___________________ Fax ____________________ Email address ____________________________________ U.S. Citizen __________ Permanent Resident with Green Card ________ ******************************************************************************* Foreign National ________ (non-U.S. Citizen). Must complete the following information: Passport number ______________________ Name as it appears on passport _______________________________________ Date issued _____________ Date expires _________________ Country of citizenship____________________________ From owner-parkbench-lowlevel@CS.UTK.EDU Wed Oct 21 02:28:56 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id CAA13270; Wed, 21 Oct 1998 02:28:55 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id CAA23157; Wed, 21 Oct 1998 02:24:45 -0400 (EDT) Received: from mail2.one.net (mail2.one.net [206.112.192.100]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id CAA23150; Wed, 21 Oct 1998 02:24:43 -0400 (EDT) Received: from port-29-44.access.one.net ([206.112.210.106] HELO aol.com ident: IDENT-NOT-QUERIED [port 22788]) by mail2.one.net with SMTP id <17237-27384>; Wed, 21 Oct 1998 02:10:48 -0400 From: Online@nj.com To: Online@nj.com Subject: Advertise with Bulk Email! Message-Id: <19981021061048Z17237-27384+1398@mail2.one.net> Date: Wed, 21 Oct 1998 02:10:42 -0400 ___________________________________________________________ Anouncing a Bulk Friendly Isp! We Bulk Email! Are you tired of getting kicked offline for Bulk Emailing? Well now you can bulk email without getting kicked offline. Call Online Direct a Bulk Friendly ISP. 513 874 7437 For only 125$ a month plus a 50$ setup fee we will send out 35,000 emails a week for you. Plus provide you with a bullet proof pop 3 email acount so you can recieve all of your mail. Ask About our special offers up to 100,000 emails per day! Any type of bulk adversting! We Do it Right! Advertise Smart Bulk Email Today! Call Online Direct at 513 874 7437 We can also Provide bullet pop 3 email acounts! CALL TODAY! 
513 874 7437 if you wish to be removed from this list please type remove in reply box From owner-parkbench-comm@CS.UTK.EDU Sun Oct 25 12:41:55 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA29754; Sun, 25 Oct 1998 12:41:54 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA29327; Sun, 25 Oct 1998 12:36:00 -0500 (EST) Received: from pan.ch.intel.com (pan.ch.intel.com [143.182.246.24]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id MAA29319; Sun, 25 Oct 1998 12:35:58 -0500 (EST) Received: from sedona.intel.com (sedona.ch.intel.com [143.182.218.21]) by pan.ch.intel.com (8.8.6/8.8.5) with ESMTP id RAA16591; Sun, 25 Oct 1998 17:35:56 GMT Received: from ccm.intel.com ([143.182.69.127]) by sedona.intel.com (8.9.1a/8.9.1a-chandler01) with ESMTP id KAA27181; Sun, 25 Oct 1998 10:35:54 -0700 (MST) Message-ID: <36336126.B26DEE2C@ccm.intel.com> Date: Sun, 25 Oct 1998 10:34:30 -0700 From: Anjaneya Chagam X-Mailer: Mozilla 4.05 [en] (Win95; I) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU, Anjaneya.Chagam@intel.com Subject: Question on parkbench source code in c X-Priority: 1 (Highest) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi: I am looking for the parkbench benchmarking programs' source code in C to do a benchmarking comparison of Chime and PVM on the NT platform at Arizona State University. Could you please let me know if the parkbench programs have been ported to C, and if so, where can I find them? Thanks a million. Name: Anjaneya R. Chagam Email: Anjaneya.Chagam@intel.com From owner-parkbench-comm@CS.UTK.EDU Mon Oct 26 06:36:26 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA11147; Mon, 26 Oct 1998 06:36:25 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA07390; Mon, 26 Oct 1998 06:27:13 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA07383; Mon, 26 Oct 1998 06:27:10 -0500 (EST) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA15115; Mon, 26 Oct 98 11:29:54 GMT Date: Mon, 26 Oct 98 11:14:06 GMT From: Mark Baker Subject: Re: Question on parkbench source code in c To: Anjaneya.Chagam@intel.com, parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <36336126.B26DEE2C@ccm.intel.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Anjaneya, The official Parkbench codes are only available in Fortran 77. I vaguely remember hearing some time back about a graduate student's attempt to "port" some of the low-level codes to C. Charles Grassl (Cray) and I did a little work on some simple C PingPong codes. You can check these out at... http://www.sis.port.ac.uk/~mab/TOPIC/ Regards Mark --- On Sun, 25 Oct 1998 10:34:30 -0700 Anjaneya Chagam wrote: > Hi: > I am looking for the parkbench benchmarking programs' source code in C > to do a benchmarking comparison of Chime and PVM on the NT platform at > Arizona State University. Could you please let me know if the parkbench > programs have been ported to C, and if so, where can I find them? > > Thanks a million. > > Name: Anjaneya R.
Chagam > Email: Anjaneya.Chagam@intel.com > > ---------------End of Original Message----------------- ------------------------------------- DCS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 10/26/98 - Time: 11:14:07 URL: http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Nov 16 10:06:23 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA11375; Mon, 16 Nov 1998 10:06:23 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA08949; Mon, 16 Nov 1998 09:01:33 -0500 (EST) Received: from del2.vsnl.net.in (del2.vsnl.net.in [202.54.15.30]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id JAA08936; Mon, 16 Nov 1998 09:01:28 -0500 (EST) Received: from sameer.myasa.com ([202.54.106.39]) by del2.vsnl.net.in (8.9.1a/8.9.1) with SMTP id TAA13392 for ; Mon, 16 Nov 1998 19:30:37 -0500 (GMT) From: "Kashmir Kessar Mart" To: Subject: Information Date: Mon, 16 Nov 1998 19:30:48 +0530 Message-ID: <01be1169$838dd020$276a36ca@sameer.myasa.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_006D_01BE1197.9D460C20" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 4.71.1712.3 X-MimeOLE: Produced By Microsoft MimeOLE V4.71.1712.3 This is a multi-part message in MIME format. ------=_NextPart_000_006D_01BE1197.9D460C20 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Dear Sir, I have seen your Web Site but could not understand what your company is. Please let me know if you can provide me information regarding Walnut Kernels. Regards Azad.
------=_NextPart_000_006D_01BE1197.9D460C20-- From owner-parkbench-comm@CS.UTK.EDU Fri Dec 4 15:44:53 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA21941; Fri, 4 Dec 1998 15:44:53 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA21231; Fri, 4 Dec 1998 15:18:56 -0500 (EST) Received: from gimli.genias.de (qmailr@GIMLI.genias.de [192.129.37.12]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA21223; Fri, 4 Dec 1998 15:18:51 -0500 (EST) From: Received: (qmail 14706 invoked by uid 233); 4 Dec 1998 20:14:46 -0000 Date: 4 Dec 1998 20:14:46 -0000 Message-ID: <19981204201446.14705.qmail@gimli.genias.de> Reply-to: majordomo@genias.de To: parkbench-comm@CS.UTK.EDU Subject: Newsletter on Distributed and Parallel Computing Dear Colleague, as already announced a few weeks ago, this is now the very first issue of our bi-monthly electronic Newsletter on Distributed and Parallel Computing, DPC NEWS. !! If you want to receive DPC NEWS regularly, please just return this !! !! e-mail to majordomo@genias.de with !! !! !! !! subscribe newsletter or subscribe newsletter !! !! end end !! !! !! !! in the first two lines of the email-body (text area). !! This newsletter is a FREE service to the DPC (Distributed and Parallel Computing) community. It regularly informs about new developments and results in DPC, e.g. conferences, important weblinks, new books and other relevant news in distributed and parallel computing. We also keep all the information in the special newsletter section of our webpage ( http://www.genias.de/dpcnews/ ) to provide a wealth of information for the DPC community. If you have any information which might fit into these DPC subjects, please send it to me together with the corresponding weblink, for publication in DPC News. We aim to reach a very broad community with this DPC Newsletter. With Season's Greetings from GENIAS Wolfgang Gentzsch, CEO and President ===================================================================== DPC NEWSletter on Distributed and Parallel Computing GENIAS Software, December 1998 ------------------------------ http://www.genias.de/dpcnews/ GENIAS NEWS: EASTMAN CHEMICAL USES CODINE FOR MOLECULAR MODELING Eastman Chemical uses commercial quantum chemistry programs, like Gaussian, Jaguar, and Cerius2, to model chemical products, intermediates, catalysts, etc. The simulation jobs take between 1 hour and 6 days to complete. Queuing software is an important part of keeping the processors working at full utilization, without being overloaded. Since October, with the new CODINE release 4.2, Eastman has maintained over 95% CPU utilization on the available systems: http://www.genias.de/dpcnews/ BMW USES CODINE AND GRD FOR CRASH-SIMULATION At the BMW crash department, very complex compute-intensive PAM-CRASH simulations are performed on a cluster of 11 compute servers and more than 100 workstations, altogether over 370 CPUs.
CODINE and GRD have optimized the utilization of this big cluster by distributing the load equally, dynamically and in an application-oriented way, transparently to the 45 users: http://www.genias.de/dpcnews/ GRD MANAGES ACADEMIC COMPUTER CENTER http://www.genias.de/dpcnews/ QUEUING UP FOR GRD AT ARL ARMY RESEARCH LAB http://www.genias.de/dpcnews/ GENIAS ADDS DYNAMIC RESOURCE & POLICY MGMT TO LINUX http://www.genias.de/dpcnews/ GRD STOPS FLOODING SYSTEM WITH MANY JOBS http://www.genias.de/dpcnews/ PaTENT MPI ACCELERATES MARC K7.3 FE ANALYSIS CODE http://www.genias.de/dpcnews/ + http://www.marc.com/Techniques/ CONFERENCES on DPC, Dec'98 - March'99: - Workshop on Performance Evaluation with Realistic Applications (sponsored by SPEC), San Jose, CA USA, Jan 25 1999: http://www.spec.org/news/specworkshop.html - ACPC99, 4th Int. Conf. on Parallel Computation, ACPC Salzburg, Austria, February 16-18 1999: http://www.coma.sbg.ac.at/acpc99/index.html - MPIDC'99, Message Passing Interface Developer's and User's Conference, Atlanta, Georgia USA, March 10-12 1999: http://www.mpidc.org - 9th SIAM Conf. on Parallel Processing for Scientific Computing, San Antonio, Texas USA, March 22-24 1999: http://www.siam.org/meetings/pp99/ - 25th Speedup Workshop, Lugano, Switzerland, March 25-26 1999: http://www.speedup.ch/Workshops/Workshop25Ann.html - CC99, 2nd German Workshop on Cluster Computing, Karlsruhe, Germany, March 25-26 1999: http://www.tu-chemnitz.de/informati/RA/CC99 More on GENIAS Webpage: http://www.genias.de/dpcnews/ NEW DPC BOOKS: - Parallel Computing Using Optimal Interconnections, Kequin Li, Yi Pan, Si Qing Zheng. Kluwer Publ. 1998: http://www.mcs.newpaltz.edu/~li/pcuoi.html - High-Performance Computing, Contributions to Society, T. Tabor (Ed.), 1998: http://www.tgc.com - Special Issue on Metacomputing, W. Nagel, R. Williams (Eds.), Int. J. Parallel Computing, Vol. 24, No. 12-13, Elsevier Science 1998: http://www.elsevier.nl/locate/parco More books on DPC on GENIAS Webpage: http://www.genias.de/dpcnews/ DPC WEBPAGES: - PRIMEUR: HPC electronic news magazine: http://www.hoise.com - PRIMEUR List of ESPRIT Projects: http://www.hoise.com/CECupdate/contentscecdec98.html - HPCwire, Email Newsletter: http://www.tgc.com/hpcwire.html/ - EuroTools, European HPCN Tools Working Group http://www.irisa.fr/eurotools - PTOOLS, Parallel Tools Consortium: http://www.ptools.org - TOP500: 500 fastest supercomputers: http://www.top500.org - PROSOMA: Technology fair describing hundreds of European CEC funded projects: http://www.prosoma.lu/ - Links to Linux Cluster Projects: http://www.linux-magazin.de/cluster/ More DPC WebLinks: http://www.genias.de/dpcnews/ NEWS ON HPC BENCHMARKS: - STREAM, Memory Performance Benchmark from John McCalpin: http://www.cs.virginia.edu/stream/ GENIAS JOBS: - For our CODINE/GRD Devel. Team: Software engineer with experience in GUI development under OSF/Motif, Java and Windows, distributed computing, resource mgmt systems under Unix and NT: http://www.genias.de/jobs/ CALL FOR PAPERS in upcoming Journals: - Message Passing Interface-based Parallel Programming with Java: deadline Dec.
15 1999: http://hpc-journals.ecs.soton.ac.uk/CPE/Special/MPI-Java End of DPC Newsletter ========================================================================== From owner-parkbench-comm@CS.UTK.EDU Sun Jan 24 12:15:02 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA26189; Sun, 24 Jan 1999 12:15:01 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA08151; Sun, 24 Jan 1999 12:08:47 -0500 (EST) Received: from serv1.is4.u-net.net ([195.102.240.252]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA08144; Sun, 24 Jan 1999 12:08:44 -0500 (EST) Received: from mordillo [195.102.198.114] by serv1.is4.u-net.net with smtp (Exim 1.73 #1) id 104T1E-0003IJ-00; Sun, 24 Jan 1999 17:08:17 +0000 Date: Sun, 24 Jan 1999 17:05:53 +0000 From: Mark Baker Subject: New PEMCS paper. To: parkbench-comm@CS.UTK.EDU X-Mailer: Z-Mail Pro 6.2, NetManage Inc. [ZM62_16H] X-Priority: 3 (Normal) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=ISO-8859-1 A new PEMCS paper has just been accepted and published... Comparing the Scalability of the Cray T3E-600 and the Cray Origin 2000 Using SHMEM Routines, by Glenn R. Luecke, Bruno Raffin and James J. Coyle, Iowa State University, Ames, Iowa USA Check out... http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/ Regards Mark ------------------------------------- DCS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: Mark.Baker@port.ac.uk Date: 01/24/1999 - Time: 17:05:53 URL: http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Tue Feb 2 08:17:19 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA08459; Tue, 2 Feb 1999 08:17:19 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA01393; Tue, 2 Feb 1999 07:42:18 -0500 (EST) Received: from serv1.is1.u-net.net (serv1.is1.u-net.net [195.102.240.129]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id HAA01386; Tue, 2 Feb 1999 07:42:16 -0500 (EST) Received: from [148.197.205.63] (helo=mordillo) by serv1.is1.u-net.net with smtp (Exim 2.00 #2) for parkbench-comm@cs.utk.edu id 107f7u-0005uS-00; Tue, 2 Feb 1999 12:40:22 +0000 Date: Tue, 2 Feb 1999 12:40:29 +0000 From: Mark Baker Subject: New PEMCS Paper - resend... To: parkbench-comm@CS.UTK.EDU X-Mailer: Z-Mail Pro 6.2, NetManage Inc. [ZM62_16H] X-Priority: 3 (Normal) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=ISO-8859-1 Apologies for the resend - I think this email got lost when I sent it a couple of weeks back. --------------------------------------------------------------------------- A new PEMCS paper has just been accepted and published... "Comparing the Scalability of the Cray T3E-600 and the Cray Origin 2000 Using SHMEM Routines", by Glenn R. Luecke, Bruno Raffin and James J. Coyle, Iowa State University, Ames, Iowa USA Check out...
http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/ Regards Mark ------------------------------------- DCS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: Mark.Baker@port.ac.uk Date: 02/02/1999 - Time: 12:40:29 URL: http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Tue Mar 2 10:35:47 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA18531; Tue, 2 Mar 1999 10:35:46 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA01804; Tue, 2 Mar 1999 10:18:56 -0500 (EST) Received: from gimli.genias.de (qmailr@GIMLI.genias.de [192.129.37.12]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA01781; Tue, 2 Mar 1999 10:18:49 -0500 (EST) Received: (qmail 8905 invoked from network); 2 Mar 1999 15:19:10 -0000 Received: from fangorn.genias.de (192.129.37.74) by gimli.genias.de with SMTP; 2 Mar 1999 15:19:10 -0000 Received: (from daemon@localhost) by FANGORN.genias.de (8.8.8/8.8.8) id QAA13715; Tue, 2 Mar 1999 16:19:05 +0100 Date: Tue, 2 Mar 1999 16:19:05 +0100 Message-Id: <199903021519.QAA13715@FANGORN.genias.de> To: parkbench-comm@CS.UTK.EDU From: Majordomo@genias.de Subject: Welcome to newsletter Reply-To: Majordomo@genias.de -- Welcome to the newsletter mailing list! Please save this message for future reference. Thank you. If you ever want to remove yourself from this mailing list, send the following command in email to : unsubscribe Or you can send mail to with the following command in the body of your email message: unsubscribe newsletter or from another account, besides parkbench-comm@CS.UTK.EDU: unsubscribe newsletter parkbench-comm@CS.UTK.EDU If you ever need to get in contact with the owner of the list, (if you have trouble unsubscribing, or have questions about the list itself) send email to . This is the general rule for most mailing lists when you need to contact a human. Here's the general information for the list you've subscribed to, in case you don't already have it: The GENIAS Newsletter keeps you informed about new products, services and information about High Performance Computing. It serves as an addition to our printed newsletter that is distributed to our customers. To see our printed version, just visit our web-site http://www.genias.de and follow the link 'newsletter'. From owner-parkbench-comm@CS.UTK.EDU Wed Mar 3 02:34:35 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id CAA01271; Wed, 3 Mar 1999 02:34:35 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id CAA00679; Wed, 3 Mar 1999 02:32:28 -0500 (EST) Received: from gimli.genias.de (qmailr@GIMLI.genias.de [192.129.37.12]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id CAA00668; Wed, 3 Mar 1999 02:32:25 -0500 (EST) Received: (qmail 10306 invoked from network); 3 Mar 1999 07:32:58 -0000 Received: from gandalf.genias.de (192.129.37.10) by gimli.genias.de with SMTP; 3 Mar 1999 07:32:58 -0000 Received: by GANDALF.genias.de (Smail3.1.28.1 #30) id m10I69J-000B10C; Wed, 3 Mar 99 08:32 MET Message-Id: From: gentzsch@genias.de (Wolfgang Gentzsch) Subject: sorry! 
To: parkbench-comm@CS.UTK.EDU Date: Wed, 3 Mar 99 8:32:57 MET Cc: gent@genias.de (Wolfgang Gentzsch) Reply-To: gentzsch@genias.de X-Mailer: ELM [version 2.3 PL11] Dear colleagues, I just discovered that the parkbench-comm@CS.UTK.EDU address has been collected into the mailing list for our electronic DPC Newsletter. I very much apologize for this mistake. Thank you for your understanding! Kind regards Wolfgang -- -- subscribe now to http://www.genias.de/dpcnews/ -- - - - - - - - - - - - - - - - - - - - - - - - - - - - Wolfgang Gentzsch, CEO Tel: +49 9401 9200-0 GENIAS Software GmbH & Inc Fax: +49 9401 9200-92 Erzgebirgstr. 2 http://www.geniasoft.com D-93073 Neutraubling, Germany gentzsch@geniasoft.com - - - - - - - - - - - - - - - - - - - - - - - - - - - GENIAS Software Inc. Tel: 410 455 5580 UMBC Technology Center Fax: 410 455 5567 1450 S. Rolling Road http://www.geniasoft.com Baltimore, MD 21227, USA gentzsch@geniasoft.com = = = = = = = = = = = = = = = = = = = = = = = = = = = .