From owner-pbwg-compactapp@CS.UTK.EDU Fri May 21 08:42:24 1993
Date: Fri, 21 May 1993 08:42:55 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9305211242.AA18681@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Compact applications

Dear Compact Applications People,

At last I have roughed out some notes on compact applications to serve as a
basis for discussion at next week's meeting in Knoxville.

See you there,
David

------------------ Latex file below --------------------------------

%file: compac2.tex

\chapter{Compact Applications}
\footnote{assembled by David Walker for Compact Applications subcommittee}

\section{Introduction}
\label{sec:compact.intro}

While kernel applications, such as those described in Chapter 4, provide a fairly straightforward way of assessing the performance of parallel systems, they are not representative of scientific applications in general since they do not reflect certain types of system behavior. In particular, many scientific applications involve data movement between phases of an application, and may also require significant amounts of I/O. These types of behavior are difficult to gauge using kernel applications.

One factor that has hindered the use of full application codes for benchmarking parallel computers in the past is that such codes are difficult to parallelize and to port between target architectures. In addition, full application codes that have been successfully parallelized are often proprietary and/or subject to distribution restrictions. To minimize the negative impact of these factors we propose to make use of compact applications in our benchmarking effort. Compact applications are typical of those found in research environments (as opposed to production or engineering environments), and usually consist of up to a few thousand lines of source code. Compact applications are distinct from kernel applications since they are capable of producing scientifically useful results. In many cases, compact applications are made up of several kernels, interspersed with data movements and I/O operations between the kernels.

In this chapter we discuss a number of compact applications in terms of their purpose, the algorithms used, the types of data movement required, the memory requirements, and the amount of I/O. The compact applications below are not meant to form a definitive or complete list.

\section{Proposed Compact Application Benchmarks}
\label{sec:compact.proposed}

To ensure that those areas of scientific computing that make the most use of high performance computers are adequately represented in the benchmark suite, we shall classify compact applications by scientific field.

\subsection{Plasma Physics}
\label{subsec:plasmas}

Plasma physics is a large consumer of high performance computer cycles. Among the areas studied are the design of tokamaks, high power microwave devices, and astrophysical plasmas.
It would be nice to have a compact application from each of these three fields in the benchmark suite. Currently we have Hockney's device simulation, LPM1, from the GENESIS suite.

\subsubsection{Electronic Device Simulation with LPM1}
\label{subsubsec:lpm1}

LPM1 is a time-dependent simulation of an electronic device using a particle-mesh or PIC-type algorithm. It uses a two-dimensional $(r,z)$ geometry with the fields being computed on a regular mesh of size $33\times 75\cdot\alpha$, where $\alpha$ is a size parameter that can take the values 1, 2, 4, or 8, corresponding to runs with between about 700 and 6000 particles.

\subsection{Quantum Chromodynamics}
\label{subsec:qcd}

Quantum Chromodynamics (QCD) is the gauge theory of the strong interaction, which binds quarks and gluons into hadrons, the constituents of nuclear matter. Analytical perturbation methods can be applied to QCD only at high energies, hence computer simulations are necessary to study QCD at lower, more realistic, energies. In these lattice gauge theory simulations the quantum field is discretized onto a periodic, four-dimensional, space-time lattice. Quarks are located at the lattice sites, and the gluons that bind them are associated with the lattice links. The gluons are represented by SU(3) matrices, which are a particular type of $3\!\times\! 3$ complex matrix. A major component of the QCD code involves updating these matrices.

\subsubsection{Quenched QCD}
\label{subsubsec:quenched}

The QCD code in the Perfect benchmark suite is derived from the work of Fox, Flower, Otto, and Stolorz at Caltech. The Perfect QCD code uses the Cabibbo-Marinari pseudo heat bath algorithm to update the SU(3) matrices on the lattice links. This algorithm uses a Monte Carlo technique to generate a chain of configurations which are distributed with a probability proportional to $\exp{(-S(U))}$, where $S(U)$ is the action of the configuration $U$. If the only contributions to the action come from the gauge field then the action is local. The inclusion of dynamical fermions gives rise to a nonlocal action. This code ignores the effects of dynamical fermions, and so represents a pure-gauge model in the quenched approximation. A major component of this QCD code is the updating of the SU(3) matrices associated with each link in the lattice, and it is this operation which is benchmarked in the Perfect timings. Two basic operations are involved in updating the lattice: the multiplication of SU(3) matrices, and the generation of pseudo-random numbers (an illustrative sketch of the former is given below, following the general relativity codes).

\subsubsection{Genesis QCD}
\label{subsubsec:genesis_qcd}

Is the Genesis benchmark QCD1 similar to the Caltech QCD code? If so, which one should be used?

\subsection{General Relativity}
\label{subsec:gr}

\subsubsection{Evolution of a Gravitational Field}

The Genesis code GR1 solves a system of hyperbolic PDEs, derived from general relativity, which describe the evolution of a gravitational field from an initial state. Although conceptually similar to the solution of the wave equation, the equations are long and complicated. This application treats the axisymmetric case to reduce the problem to a manageable size. Solution of the general problem requires three orders of magnitude more compute power, and is likely to become of substantial interest as more powerful parallel machines are developed.
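To make the dominant QCD kernel mentioned above concrete, the following is a minimal illustrative routine for multiplying two $3\times 3$ complex matrices in Fortran 77. It is a sketch only; the routine name is hypothetical and the code is not taken from the Perfect or Genesis QCD benchmarks.

\begin{verbatim}
      SUBROUTINE SU3MUL(A, B, C)
C     Illustrative only: form C = A*B for 3x3 complex matrices,
C     the basic operation in an SU(3) link update.
      COMPLEX A(3,3), B(3,3), C(3,3)
      INTEGER I, J, K
      DO 30 J = 1, 3
         DO 20 I = 1, 3
            C(I,J) = (0.0, 0.0)
            DO 10 K = 1, 3
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
   10       CONTINUE
   20    CONTINUE
   30 CONTINUE
      RETURN
      END
\end{verbatim}

In practice such a kernel is used together with a pseudo-random number generator for the heat bath update, the other basic operation involved in updating the lattice.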
\subsubsection{Quantum Theory of Gravity}
\label{subsubsec:gravity}

This code, which derives from the work of Sorkin and Daughton of Syracuse University, is part of an effort to provide a satisfactory quantum theory of gravity through the use of causal set theory$\ldots$whatever that is. The main computational task is the LU factorization of large, dense matrices ($10000\times 10000$).

\subsection{Climate and Weather Prediction}
\label{subsec:climate}

Mesoscale weather prediction and global climate modeling have become important application areas in recent years. They typically involve the solution of nonlinear PDEs.

\subsubsection{Spectral Solver for the Shallow Water Equations}
\label{subsubsec:swe}

The spectral transform method is the standard numerical technique used to solve partial differential equations on the sphere in global climate modeling. For example, it is used in CCM1 (the Community Climate Model 1) and its successor, CCM2. The solution of the shallow water equations on a sphere constitutes an important component of such global climate models. The SSWMSB code uses the spectral transform method to solve the shallow water equations on the surface of a sphere, which is discretized as a regular longitude-latitude grid. In each timestep the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. This transformation involves first the evaluation of FFTs along lines of constant latitude, followed by Legendre integration (i.e., weighted summation) over latitude (a schematic form of this transform is given below, following the geophysics codes).

\subsubsection{Helmholtz Solvers for Meteorological Modeling}
\label{subsubsec:helmholtz}

The Genesis suite includes two meteorological applications based on Helmholtz solvers. One uses a pseudo-spectral solution method, and the other a multigrid algorithm.

\subsection{Molecular Dynamics}
\label{subsec:moldyn}

\subsubsection{Dislocation Studies in Crystals}
\label{subsubsec:dislocation}

A parallel Fortran 77 plus message-passing code has been developed at ORNL to study dislocation phenomena in crystals. This three-dimensional code divides space into cells, with each processor being assigned a rectangular block of cells. Each cell contains a set of particles. Communication is necessary to exchange particles lying in cells on the boundary of a processor with a neighboring processor. Particles must also be migrated between processors as they move in space.

\subsubsection{The Genesis Molecular Dynamics Code}
\label{subsubsec:genesis_md}

I don't know much about this, but I expect it's similar to the ORNL code.

\subsubsection{The PERFECT Molecular Dynamics Codes}
\label{subsubsec:perfect_md}

The Perfect benchmark suite includes two molecular dynamics codes, both of which use data sets that are too small to be used to evaluate current parallel computers. BDNA, which simulates the hydration structure of potassium counterions and water in a B-DNA molecule, involves 1500 water molecules and 20 counterions. MDG performs a molecular dynamics calculation on 343 water molecules in the liquid state.

\subsection{Geophysics}

Two important geophysics computations are flow through porous media and seismic migration. The Perfect suite includes a seismic migration code, MG3D. This code is dominated by FFTs. A parallel code for modeling groundwater flow is under development at ORNL and may be a good code to include in the suite as an example of a flow-through-porous-media code.
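A schematic of the shallow water spectral transform described above may help fix ideas; the notation here is illustrative and is not taken from the SSWMSB source. For a field $\xi$ sampled at grid points $(\lambda_i,\mu_j)$ on the longitude-latitude grid, the forward transform computes Fourier coefficients along each latitude circle by FFT and then applies Gaussian (Legendre) quadrature over latitude:
\[
\hat{\xi}^m(\mu_j) = \frac{1}{I}\sum_{i=1}^{I} \xi(\lambda_i,\mu_j)\, e^{-im\lambda_i},
\qquad
\xi_n^m = \sum_{j=1}^{J} w_j\, \hat{\xi}^m(\mu_j)\, P_n^m(\mu_j),
\]
where the $w_j$ are the Gauss weights and the $P_n^m$ are the associated Legendre functions. The inverse transform evaluates the spherical harmonic sums and inverse FFTs in the reverse order.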
\subsection{Other Codes}

Clearly we would want to include CFD codes, astrophysics codes such as the tree-based simulations of gravitating systems, quantum chemistry codes, and superconductor simulations. We also need to include codes from the NAS, NPAC, PERFECT2, and SLALOM benchmark suites, as well as provide better descriptions of the codes above.

\section{Concluding Remarks}

There are probably two or three dozen compact applications that we might consider for inclusion in the benchmark suite. We should consider what is a reasonable number of codes to include, and the criteria for accepting a code in terms of documentation, usefulness, and software quality.

From owner-pbwg-compactapp@CS.UTK.EDU Fri May 21 09:06:07 1993
Message-Id: <9305211306.AA01842@berry.cs.utk.edu>
To: walker@rios2.epm.ornl.gov (David Walker)
Cc: pbwg-compactapp@cs.utk.edu
Subject: Re: Compact applications
In-Reply-To: Your message of "Fri, 21 May 1993 08:42:55 EDT." <9305211242.AA18681@rios2.epm.ornl.gov>
Date: Fri, 21 May 1993 09:06:39 -0400
From: "Michael W. Berry"

Fellow Compact Applic. Members:

Here is a copy of the minutes from the SPEC/Perfect meeting I attended in Huntsville. Some of this information may be useful to PBWG.

Mike B.

---------------------------------------------------------------

Draft Minutes: The SPEC Perfect Group, 11-13 May 1993

The Perfect Club Steering Committee voted to merge with the SPEC organization. The first joint meeting with SPEC occurred during 11-13 May 1993. The original SPEC organization has been modified so that the name "SPEC" refers to the non-profit corporation which acts as a financial umbrella for benchmarking subgroups. The original SPEC group is now known as the SPEC Open Systems Group. The Perfect Club is now known as the SPEC Perfect Group.

In accordance with the vote taken by David Schneider in April, the initial SPEC Perfect Steering Committee includes Margaret Simmons (LANL), George Cybenko (Dartmouth), David Schneider (CSRD), John Larson (CSRD), Mike Berry (U. of Tenn), Satish Rege (DEC), Joanne Martin (IBM), and Philip Tannenbaum (HNSX). This meeting was attended by David Schneider (CSRD), Mike Berry (U. of Tenn), Satish Rege (DEC), Philip Tannenbaum (HNSX), Leo Boelhouwer (IBM-Kingston, representing Joanne Martin), Jacob Thomas (IBM-Austin), Larry Gray (Chairman, SPEC BOD), and Rod Skinner (Treasurer, SPEC). Hwa Lai (Fujitsu) attended as an observer. Various SPEC Open Systems members periodically sat in. David Schneider indicated that he anticipated Cray Research would rejoin because of marketing necessity.

The meeting began with David Schneider, Larry Gray, and Rod Skinner presenting the framework for the merger. The SPEC Open Systems Group and the SPEC Perfect Group will be autonomous subgroups within SPEC. SPEC itself will act as a business umbrella organization. Each Group will assess dues and allocate budgets independently.
The overhead for which the SPEC Perfect Group will be responsible includes the legal retainer and accounting fees for NCGA, and the additional costs of printing, duplication, distribution, or other services that the SPEC Perfect Group may elect to use in the future. It was also stated that the SPEC organization was flexible on many issues, but the underlying requirement was to ensure that corporate non-profit status regulations are not violated. SPEC is incorporated as a non-profit organization in California. It was generally agreed by all that mutual trust would be required from the SPEC Open Systems Group and the SPEC Perfect Group to minimize formality and unnecessary bureaucracy.

The Perfect Group will be given one SPEC BOD seat on a temporary basis until January 1994. The SPEC BOD currently consists of 5 members: HP, Intel, Sun, ATT/NCR, and IBM. The Perfect Group seat will add 1 member to the BOD. In January 1994 this 6th BOD seat will be open for voting by the entire SPEC membership (SPEC Perfect Group and SPEC Open Systems Group). A discussion about who should fill the temporary SPEC Perfect Group BOD seat resulted in agreement that university people could not practically take the position because of travel expense. IBM was already represented on the SPEC BOD, so David Schneider nominated Satish Rege (DEC) and Philip Tannenbaum (HNSX) as candidates for the BOD seat. Leo Boelhouwer seconded the nomination for Philip Tannenbaum; Mike Berry seconded the nomination for Satish Rege. A vote will be conducted by email on/about 1 June 1993. The initial 7 Steering Committee members are the eligible voters. During June a press announcement about the merger would be jointly written.

There was discussion about inclusion of academic and government members. As a result of SPEC non-profit requirements, all members must be either full members ($5,000/year) or associate members ($1,000/year). It was agreed that few academic or government members could acquire funding for membership. The SPEC Perfect Group Steering Committee could elect to sponsor the memberships of selected individuals, and certain individuals could be included by creation of "SPEC Fellows" or "SPEC Affiliates", whereby specific services could be paid for with membership. Seeking industrial sponsorship for academic participation was discussed as desirable. Each member will initiate a "check is in the mail" process for their membership fees. Diane Dean, NCGA, 2722 Merrilee Drive, Fairfax, VA 22301-4499 (703-698-9600 x318) is our contact in this regard. SPEC Open Systems Group members received 6 free pages for SPEC/OSG reporting in the publications; additional pages were billed at $500 each. It was noted that DEC purchased 60 extra pages in the last publication to kick off a new product line.

The SPEC Perfect Group organization was discussed. It was agreed that the SPEC Perfect Group should have a Chairman, a Secretary, and a Technical Coordinator. The Chairman would be responsible for interfacing with SPEC and the SPEC Open Systems Group, organizing meetings, and general management. The Secretary would be responsible for generating minutes and handling correspondence. The Technical Coordinator would be responsible for benchmarking status, benchmark production and distribution, coordinating the benchmark subgroups, and being the focal point for technical issues. Each benchmark subgroup would have its own leadership.
Temporary assignments were accepted to fill these positions until the next SPEC Perfect Group meeting, targeted for August at ATT (Chicago). Satish Rege is the temporary Chairman, Philip Tannenbaum the temporary Secretary, and Leo Boelhouwer the temporary Technical Coordinator. Specific action items for the period include:

 - Completing the benchmark codes
 - Generating verification tests and timing instrumentation
 - Publishing minutes
 - Writing a solicitation for vendors and industry to attract membership or sponsorship support

A discussion about the benchmark rules and reporting resulted in general agreement that there would be baseline ("As Is") executions which allowed only the minimal changes required to obtain correct results. There would also be an optimized or alternative solution execution which would allow unlimited use of standard vendor libraries and unlimited rewriting in a high-level language. It was agreed that the benchmark programs would be distributed via netlib or anonymous ftp. Text would be added to each benchmark program requiring that any use of benchmark results from the program which are not formally accepted and published by the SPEC Perfect Group must state "these results are not officially approved and reported by the SPEC Perfect Group Steering Committee. They may not be directly comparable to accepted and verified results." Only actual execution results would be permitted. All executions must be on hardware and software systems that are current products or which will be generally available in the market within 6 months.

There was a spirited debate on the metrics to be used for reporting results. Discussion about the pros and cons of using normalized ratings, MFLOPS, wall clock times, and absolute numbers took place. The discussion resulted in the benchmark publications including, per program, 1) elapsed wall clock time, 2) startup time, 3) time step timing, 4) cleanup time, 5) total user CPU time accumulated, and 6) total system CPU time accumulated. No MFLOPS rate will be reported. This was agreed to be the most scientifically sound approach that would be meaningful and unambiguous.

All execution results presented for approval and publication must include sufficient detail of the hardware and software configuration such that the run could be essentially duplicated with comparable timings. Acceptable results will have valid answers and meet SPEC Perfect Group standards for code changes and execution requirements. Optimized and alternative solution results must include the entire program code as executed, and a statement that the code may be used, without restriction, as a SPEC Perfect Group baseline benchmark code. All vendor library codes used must include copies of the relevant vendor documentation pages that include sufficient detail to describe the processes performed within the library routine. New vendor library routines must have copies of equivalent preliminary documentation. All library routines used must be generally available to all vendor customers, and must either be documented products or become documented products within 6 months of benchmark submission. Results on prototype or preproduction systems could be removed from publication if the benchmarked products were not released within the 6-month window.

The goal is to provide all codes in a FORTRAN77 version, a FORTRAN90 version, and a message-passing version. It was agreed that version control should be instituted so that all results would be grouped according to benchmark version.
If any one code in a benchmark group changed, all codes would receive a new version number. The benchmark groups will be aligned to address vertical industrial areas such as petroleum, chemistry, finance, etc. The codes available for the initial release include FDMOD, FKMIG, and SEIS from the ARCO suite, QCD, FALSE, PUEBLO, and TURB3D. The ARCO suite codes are farthest along. All codes are expected to represent scalable problem solutions that are appropriate to vector, vector-parallel, and MPP architectures. A goal is to maintain the benchmark set at a level whereby only supercomputer-class systems and extreme high-end workstations/clusters could reasonably execute the problems. There is no specific exclusion intended; this goal was stated in order to maintain the SPEC Perfect Group focus on true supercomputing rather than the broader high performance computing classification. The goals may not all be addressed initially because of practical limitations in how much can be accomplished with available resources.

Coding and language standards were discussed and proposals were made. John Larson's work in this area will be circulated. Leo Boelhouwer will edit the V1 execution rules and present an updated draft for approval during the next meeting. Language standards were presented by David Schneider as a basis for creating a benchmark code standard. They included numerous items that were accepted by the group, and a few (noted below) where no final conclusion was reached:

 - Variable names may not exceed 31 characters
 - No pointers
 - No DOUBLE PRECISION; REAL*8 and COMPLEX*16 should be used
 - No CHARACTER-floating point equivalences
 - No Hollerith constants or data
 - No 128-bit requirements (REAL*16, COMPLEX*32)
 - All 64-bit constants should be specified in D format
 - All 32-bit constants should be specified in E format
 - Machine constant limitations were discussed -- no conclusions agreed
 - INTEGER*8 and LOGICAL*8 should not be used unless necessary for execution
 - Tests for floating point equality were discussed -- no conclusions agreed
 - Known vector directive information will be translated to a "C*PERFECT" syntax to preserve the information; compilers will be explicitly prohibited from recognizing "C*PERFECT" information
 - DO WHILE and DO-ENDDO syntax is allowed
 - "!" inlined comments were discussed -- no conclusions agreed

(A short code fragment illustrating several of these items is given after these minutes.)

Additional action items were summarized:

 - Distribute old by-laws for review (DS)
 - Review old by-laws and offer suggestions for revision (all)
 - Contact NCGA regarding our new status (DS)
 - Present our proposals for membership-specific issues to the SPEC BOD (SR)
 - Identify manpower requirements to complete the V2 benchmark suite (all)
 - Transfer "Perfect Benchmark" trademark from U. Ill. to SPEC (DS)
 - Distribute minutes (PT)
 - Set up address and email lists (DS)
 - Next meeting at ATT, Chicago, in August (with SPEC Open Systems Group) (all)
 - Schedule a benchathon to finalize all V2 initial codes (all)
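The following fragment illustrates several of the coding items listed above (REAL*8 in place of DOUBLE PRECISION, a 64-bit constant written in D format, DO-ENDDO syntax, and vector directive information preserved only as a "C*PERFECT" comment). It is a hypothetical sketch for illustration only; it is not taken from any of the benchmark codes, and the directive text shown is just an example of "known vector directive information".

      SUBROUTINE SCALE8(N, X)
C     Illustrative only: scale a REAL*8 vector in place by a
C     64-bit constant written in D format.
      INTEGER N, I
      REAL*8 X(N), SCALE
      PARAMETER (SCALE = 2.5D0)
C*PERFECT IVDEP
      DO I = 1, N
         X(I) = SCALE*X(I)
      ENDDO
      RETURN
      END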
--
Michael W. Berry                        berry@cs.utk.edu
Department of Computer Science          (615) 974-3838 [OFF]
University of Tennessee                 (615) 974-4404 [FAX]
Ayres 114, Knoxville, TN 37996-1301

From owner-pbwg-compactapp@CS.UTK.EDU Wed May 26 17:49:18 1993
Message-Id: <9305262149.AA11808@berry.cs.utk.edu>
To: pbwg-compactapp@cs.utk.edu
Subject: We can get ARCO
Date: Wed, 26 May 1993 17:49:39 -0400
From: "Michael W. Berry"

Here's a note I received from Mosher at ARCO - looks pretty good!

Mike

Date: Wed, 26 May 93 13:48:55 CDT
From: ccm@Arco.COM (Chuck Mosher (214)754-6468)
Message-Id: <9305261848.AA06937@Arco.COM>
To: berry@cs.utk.edu
Subject: ARCO/Perfect Seismic Benchmark

Version 1.0 of SeisPerf is due for Beta release June 1. The suite provides a working seismic processing executive with examples of common industry algorithms. Version 1.0 is built over a simple message-passing layer, which calls PVM, P4, or native message-passing services. The applications call several of the kernel routines mentioned in the PBWG minutes, including 3D FFTs, tridiagonal and Toeplitz matrix solvers, convolutions, and integral methods. The codes are designed to be scalable from single-processor workstations to ~1000-processor MPP systems. Verification tools include a simple X-windows frame viewer, and a checksum table that is printed at the end of each run. The 1.0 release is based on Fortran 77. MasPar has provided a Fortran 90 port of the codes for their systems, which could form the base for an HPF version of the codes.

I'd be happy to participate in PARKBENCH and provide support for including SeisPerf results.

Regards,
Chuck Mosher
ccm@arco.com
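As an illustration of one of the kernel classes mentioned above, the following is a minimal Fortran 77 sketch of a tridiagonal solve (the Thomas algorithm). It is not taken from SeisPerf; the routine name and the fixed work-array size are assumptions made for the sketch.

      SUBROUTINE TRIDI(N, A, B, C, D, X)
C     Illustrative only: solve a tridiagonal system by the Thomas
C     algorithm.  A = sub-diagonal (A(1) unused), B = diagonal,
C     C = super-diagonal (C(N) unused), D = right-hand side,
C     X = solution.  Assumes N <= 1000 for the local work arrays.
      INTEGER N, I
      REAL A(N), B(N), C(N), D(N), X(N), CP(1000), DP(1000), DEN
C     Forward elimination
      CP(1) = C(1)/B(1)
      DP(1) = D(1)/B(1)
      DO 10 I = 2, N
         DEN = B(I) - A(I)*CP(I-1)
         CP(I) = C(I)/DEN
         DP(I) = (D(I) - A(I)*DP(I-1))/DEN
   10 CONTINUE
C     Back substitution
      X(N) = DP(N)
      DO 20 I = N-1, 1, -1
         X(I) = DP(I) - CP(I)*X(I+1)
   20 CONTINUE
      RETURN
      END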
--
Michael W. Berry                        berry@cs.utk.edu
Department of Computer Science          (615) 974-3838 [OFF]
University of Tennessee                 (615) 974-4404 [FAX]
Ayres 114, Knoxville, TN 37996-1301

From owner-pbwg-compactapp@CS.UTK.EDU Thu May 27 12:54:03 1993
Message-Id: <9305271654.AA13805@berry.cs.utk.edu>
To: ccm@arco.com (Chuck Mosher (214)754-6468)
Cc: pbwg-compactapp@cs.utk.edu
Subject: Re: ARCO/Perfect Seismic Benchmark
In-Reply-To: Your message of "Thu, 27 May 1993 06:59:31 CDT." <9305271159.AA15941@Arco.COM>
Date: Thu, 27 May 1993 12:54:24 -0400
From: "Michael W. Berry"

> An earlier release of the codes is available on the U of Illinois
> anonymous ftp server 'csrd.uiuc.edu' in the directory '/pub/perfect'.
> The file 'arco_beta.tar.Z' contains code, installation scripts,
> and documentation for an earlier f77 version for uniprocessors.
> You might want to get this file and have a look at the documentation
> and source structure. The message-passing source is pretty close
> in structure to the f77 version.
>
> We have a mailing list for discussion of the codes:
> 'perfect_seismic@csrd.uiuc.edu'
> Let me know if you want to be on the list. We'll announce the
> new codes there.
>
> Regards,
> Chuck Mosher

Yes, please add my email address and pbwg-compactapp@cs.utk.edu to the mailing list.

Thanks
Mike

--
Michael W. Berry                        berry@cs.utk.edu
Department of Computer Science          (615) 974-3838 [OFF]
University of Tennessee                 (615) 974-4404 [FAX]
Ayres 114, Knoxville, TN 37996-1301

From owner-pbwg-compactapp@CS.UTK.EDU Thu Sep 16 11:20:48 1993
Date: Thu, 16 Sep 93 11:19:06 EDT
From: worley@sun4.epm.ornl.gov (Pat Worley)
Message-Id: <9309161519.AA00634@sun4.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: potential compact benchmark

Ian Foster and I are just finishing version 1.0 of PSTSWM, a parallel algorithm testbed and benchmark code developed for the climate modelling community. It will be made available to this community via netlib, but it may also be interesting as a PARKBENCH compact application. There are a few difficulties with this, though, and I would like some feedback/suggestions on how to proceed.
Description
-----------

PSTSWM is a parallel implementation of a serial code (STSWM 2.0) written by Jim Hack and Rudy Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. It was originally developed as a numerical algorithm testbed, to allow comparison of spectral methods with finite difference methods, finite element methods, etc., and has 6 runtime-selectable test cases in the code. These test cases specify initial conditions, forcing, and analytic solutions (for error analysis), and were chosen to test the ability of the numerical methods to simulate important flow phenomena.

For PSTSWM, we completely rewrote STSWM to add vertical levels, in order to get the correct communication and computation granularity for 3-D climate codes, and to allow the problem size to be selected at runtime without depending on such nonportable features as dynamic memory. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). To enable this PSTSWM supports:

a) 4 classes of parallel algorithms (distributed or transpose based for each of the two major parallel phases);
b) 3-4 specific parallel algorithms within each class (e.g. using a recursive-halving vector sum, using a pipelined ring vector sum, etc.);
c) 2-4 variants of each algorithm;
d) two communication constructs, swap and sendrecv, on which each algorithm is built, each with 5-6 different communication protocol options (synchronous, blocking, nonblocking, forcetypes, etc.).

We are quite happy with the code, and are getting good results with it. Most interesting to us is how the best algorithm changes across platforms and as the problem size changes on the same platform.

Problems
--------

There are a couple of issues to be dealt with in using this code as part of PARKBENCH.

1) The code currently is in single precision with double precision parts. Single precision is sufficient for the problem sizes of interest, but the Legendre polynomial values and Gauss quadrature weights and nodes must be calculated in higher precision. For larger problem sizes, double precision computation will be appropriate, but the Gauss weights, etc. will then need to be calculated in quadruple precision. I do not think that this sort of mixed case has been discussed yet.

2) In one sense, PSTSWM is not a single benchmark, but many of them. We can fix the problem and parallel algorithm specifications by providing (a set of) default input files, but which ones should we choose? All of them are arguably good algorithms in some setting, and I would hate to compare two machines when the algorithm is good for one and inappropriate for another.

3) PSTSWM is currently written using PICL (because that is what I normally use and because I have embedded instrumentation in the research version of the code). I made a real effort to isolate the message-passing bits, so porting to anything else will be trivial. But the message-passing interface that is used does affect the parallel algorithms that are supported. For example, PICL supports nonblocking send and receive and passes through forcetype message types. These are important to performance on some Intel machines.
This is not a problem so much as something to be aware of. PSTSWM will also be available in its original form, but a pointer to some of the issues in cross-machine comparisons should be made. This may be an issue that should be mentioned in the methodology section as it pertains to compact applications. Unlike low-level benchmarks, compact applications are less likely to be "done right" by the vendor for their particular machines.

Comments and suggestions would be appreciated. I imagine every proposed compact application will be unsuitable in one form or another when it is first submitted, and precise guidelines on what should or should not be permitted are important. On the other hand, as a developer, I will not be interested in doing too much work in modifying the code in order to include it in the benchmark suite. Even with the best intentions, it will not be a high priority item for me and is likely to be put off (forever) if not fairly simple.

Thanks.

Pat Worley

From owner-pbwg-compactapp@CS.UTK.EDU Tue Sep 21 11:49:13 1993
Date: Tue, 21 Sep 1993 11:47:07 -0400
From: walker@rios2.epm.ornl.gov (David Walker)
Message-Id: <9309211547.AA12782@rios2.epm.ornl.gov>
To: pbwg-compactapp@cs.utk.edu
Subject: Application submission form

I'm trying to put together a submission form for people to use to submit applications for inclusion in the ParkBench Compact Applications suite. Also I'd like to establish a procedure for submission. Below is a first stab at these 2 things. Please send me feedback. Later this week I intend to send out a filled-in version of the submission form as an example.

David

PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow the procedure below:

1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite.

2. If the ParkBench Compact Applications Subcommittee decides to consider your application further, you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email or ftp, unless the files are very large, in which case a tar file on a 1/4-inch cassette tape should be mailed. Wherever possible, email submission is preferred for all documents, in man page, LaTeX, and/or PostScript format. These files, documents, and papers together constitute your application package. Your application package should be sent to:

   David Walker
   Oak Ridge National Laboratory
   Bldg. 6012/MS-6367
   P. O. Box 2008
   Oak Ridge, TN 37831-6367
   (615) 574-7401/0680 (phone/fax)
   walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.
The subcommittee will then make a final decision on whether to include your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite, you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program :
-------------------------------------------------------------------------------
Submitter's Name        :
Submitter's Organization:
Submitter's Address     :
Submitter's Telephone # :
Submitter's Fax #       :
Submitter's Email       :
-------------------------------------------------------------------------------
Cognizant Expert(s) :
CE's Organization   :
CE's Address        :
CE's Telephone #    :
CE's Fax #          :
CE's Email          :
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :
-------------------------------------------------------------------------------
Major Application Field :
Application Subfield(s) :
-------------------------------------------------------------------------------
Application "pedigree" :
-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :
-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:
   Integers :       bytes
   Floats   :       bytes
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :
-------------------------------------------------------------------------------
Other relevant research papers :
-------------------------------------------------------------------------------
Application available in the following languages (give message passing
system used, if applicable, and machines application runs on) :
-------------------------------------------------------------------------------
Total number of lines in source code:
Number of lines excluding comments  :
Size in bytes of source code        :
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
-------------------------------------------------------------------------------
Brief, high-level description of what application does :
-------------------------------------------------------------------------------
Main algorithms used :
-------------------------------------------------------------------------------
Skeleton sketch of application :
-------------------------------------------------------------------------------
Brief description of I/O behavior :
-------------------------------------------------------------------------------
Brief description of load balance behavior :
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
-------------------------------------------------------------------------------
Give memory as function of problem size :
-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :
-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :
-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large, for which the benchmark
should be run (give parameters for problem size, sizes of I/O files, memory
required, and number of floating point operations) :
-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware
monitor, count by hand, etc.) :
-------------------------------------------------------------------------------

From owner-pbwg-compactapp@CS.UTK.EDU Tue Oct 5 15:29:11 1993
Message-Id: <9310051928.AA20677@rios2.epm.ornl.gov>
To: spb@epcc.edinburgh.ac.uk, mia@unixa.nerc-bidston.ac.uk, pbwg-compactapp@cs.utk.edu
Subject: Submission form for ParkBench compact applications
Date: Tue, 05 Oct 93 15:28:20 -0500
From: David W. Walker

Below is an example (prepared by Pat Worley of Oak Ridge National Lab) of the use of the ParkBench Compact Applications submission form. This form (or something like it) is intended to be used by all persons wishing to submit an application to be included in the suite. The first page or so explains the submission procedure. Pat has been very thorough in filling out the form. I don't think it practical to expect every submission to be this detailed.

If you have applications that you would like to submit please go ahead and fill in the form. Also, any comments on the form would be appreciated. I hope to give the form wider distribution in a couple of weeks so we can (I hope) get a good number of submissions before the SC93 ParkBench meeting.

David

PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM

To submit a compact application to the ParkBench suite you must follow the procedure below:

1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite.
2. If the ParkBench Compact Applications Subcommittee decides to consider your application further, you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email or ftp, unless the files are very large, in which case a tar file on a 1/4-inch cassette tape should be mailed. Wherever possible, email submission is preferred for all documents, in man page, LaTeX, and/or PostScript format. These files, documents, and papers together constitute your application package. Your application package should be sent to:

   David Walker
   Oak Ridge National Laboratory
   Bldg. 6012/MS-6367
   P. O. Box 2008
   Oak Ridge, TN 37831-6367
   (615) 574-7401/0680 (phone/fax)
   walker@msr.epm.ornl.gov

   The street address is "Bethel Valley Road" if Fedex insists on this.

   The subcommittee will then make a final decision on whether to include your application in the ParkBench suite.

3. If your application is approved for inclusion in the ParkBench suite, you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package.

-------------------------------------------------------------------------------
Name of Program : PSTSWM (Parallel Spectral Transform Shallow Water Model)
-------------------------------------------------------------------------------
Submitter's Name        : Patrick H. Worley
Submitter's Organization: Oak Ridge National Laboratory
Submitter's Address     : Bldg. 6012/MS-6367
                          P. O. Box 2008
                          Oak Ridge, TN 37831-6367
Submitter's Telephone # : (615) 574-3128
Submitter's Fax #       : (615) 574-0680
Submitter's Email       : worley@msr.epm.ornl.gov
-------------------------------------------------------------------------------
Cognizant Expert(s) : Patrick H. Worley
CE's Organization   : Oak Ridge National Laboratory
CE's Address        : Bldg. 6012/MS-6367
                      P. O. Box 2008
                      Oak Ridge, TN 37831-6367
CE's Telephone #    : (615) 574-3128
CE's Fax #          : (615) 574-0680
CE's Email          : worley@msr.epm.ornl.gov

Cognizant Expert(s) : Ian T. Foster
CE's Organization   : Argonne National Laboratory
CE's Address        : MCS 221/D-235
                      9700 S. Cass Avenue
                      Argonne, IL 60439
CE's Telephone #    : (708) 252-4619
CE's Fax #          : (708) 252-5986
CE's Email          : itf@mcs.anl.gov
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and
bug reports from ParkBench :

Modulo other commitments, Worley is prepared to respond quickly to questions and bug reports, but expects to be kept informed as to the results of experiments and modifications to the code.
-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Climate Modeling
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :

PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations using the spectral transform method. The spectral transform algorithm of the code follows closely how CCM2, the NCAR Community Climate Model, handles the dynamical part of the primitive equations, and the parallel algorithms implemented in the model include those currently used in the message-passing parallel implementation of CCM2.
PSTSWM was written by Patrick Worley of Oak Ridge National Laboratory and Ian Foster of Argonne National Laboratory, and is based partly on previous parallel algorithm research by John Drake, David Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code development and the parallel algorithms research were funded by the DOE Computer Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. The features of version 1.0 were frozen on 8/1/93, and it is this version we would offer initially as a benchmark.

PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. STSWM evolved from a spectral shallow water model written by Hack (NCAR/CGD) to compare numerical schemes designed to solve the divergent barotropic equations in spherical geometry. STSWM was written partially to provide the reference solutions to the test cases proposed by Williamson et al. (see citation [4] below), which were chosen to test the ability of numerical methods to simulate important flow phenomena. These test cases are embedded in the code and are selectable at run-time via input parameters, specifying initial conditions, forcing, and analytic solutions (for error analysis). The solutions are also published in a Technical Note by Jakob et al. [3]. In addition, this code is meant to serve as an educational tool for numerical studies of the shallow water equations. A detailed description of the spectral transform method, and a derivation of the equations used in this software, can be found in the Technical Note by Hack and Jakob [2].

For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the correct communication and computation granularity for 3-D weather and climate codes), to increase modularity and support code reuse, and to allow the problem size to be selected at runtime without depending on dynamic memory allocation. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). The code has also been written in such a way that linking in optimized library functions for common operations instead of the "portable" code will be simple.
-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :

Yes, but users are requested to acknowledge the authors (Worley and Foster) and the program that supported the development of the code (the DOE CHAMMP program) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. Finally, the code has been written to allow easy reuse of code in other applications, and for educational purposes. The authors encourage this, but also request that they be notified when pieces of the code are used.
-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be
used in this application:

The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION variables.
The code should work correctly for any system in which COMPLEX is represented as 2 REALs. The include file params.i has parameters that can be used to specify the length of these. Also, some REAL and DOUBLE PRECISION parameter values may need to be modified for floating point number systems with large mantissas, e.g., PI, TWOPI. PSTSWM is currently being used on systems where

   Integers : 4 bytes
   Floats   : 4 bytes

The use of two precisions can be eliminated, but at the cost of a significant loss of precision. (For 4-byte REALs, not using DOUBLE PRECISION increases the error by approximately three orders of magnitude.) DOUBLE PRECISION results are only used in set-up (computing Gauss weights and nodes and Legendre polynomial values), and are not used in the body of the computation.
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module
level, or lower) :

The sequential code is documented in a file included in the distribution of the code from NCAR:

Jakob, Ruediger, Description of Software for the Spectral Transform Shallow Water Model Version 2.0, National Center for Atmospheric Research, Boulder, CO 80307-3000, August 1992

and in

Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992.

Documentation of the parallel code is in preparation, but extensive documentation is present in the code.
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :

1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of three numerical methods for solving differential equations on the sphere, Monthly Weather Review, 117:1058-1075, 1989.

2) Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992.

3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to shallow water test set using the spectral transform method, NCAR Technical Note TN-388+STR (in preparation).

4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.S. Swarztrauber, A standard test set for numerical approximations to the shallow water equations in spherical geometry, Journal of Computational Physics, Vol. 102, pp. 211-224, 1992.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :

5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method, Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), pp. 269-291.

6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral Transform Method. Part II, Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), pp. 509-531.

7) Foster, I. T. and P. H. Worley, Parallelizing the Spectral Transform Method: A Comparison of Alternative Parallel Algorithms, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March 22-24, 1993), pp. 100-107.

8) Foster, I. T. and P. H. Worley, Parallel Algorithms for the Spectral Transform Method (in preparation).

9) Worley, P. H. and I. T. Foster, PSTSWM: A Parallel Algorithm Testbed and Benchmark (in preparation).
-------------------------------------------------------------------------------
Other relevant research papers:
10) Foster, I., W. Gropp, and R. Stevens, The parallel scalability of the spectral transform method, Mon. Wea. Rev., 120(5), 1992, pp. 835-850.

11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes, R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. Worley, The Message-Passing Version of the Parallel Community Climate Model, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 500-513.

12) Sato, R. K. and R. D. Loft, Implementation of the NCAR CCM2 on the Connection Machine, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 371-393.

13) Barros, S. R. M. and T. Kauranne, On the Parallelization of Global Spectral Eulerian Shallow-Water Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 36-43.

14) Kauranne, T. and S. R. M. Barros, Scalability Estimates of Parallel Spectral Atmospheric Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992), Hoffman, G.-R. and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 312-328.

15) Pelz, R. B. and W. F. Stern, A Balanced Parallel Algorithm for Parallel Processing, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March 22-24, 1993), pp. 126-128.
-------------------------------------------------------------------------------
Application available in the following languages (give message passing
system used, if applicable, and machines application runs on) :

The model code is primarily written in Fortran 77, but also uses DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in common and parameter declarations). It has been compiled and run on the Intel iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation, IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code).

Message passing is implemented using the PICL message passing system. All message passing is encapsulated in 3 high-level routines:

   BCAST0 (broadcast)
   GMIN0  (global minimum)
   GMAX0  (global maximum)

two classes of low-level routines:

   SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3
      (variants and/or pieces of the swap operation)

   SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3
      (variants and/or pieces of the send/recv operation)

and one synchronization primitive:

   CLOCKSYNC0

PICL instrumentation commands are also embedded in the code. Porting the code to another message passing library will be simple, although some of the runtime communication options may then become illegal. The PICL instrumentation calls can be stubbed out (or removed) without changing the functionality of the code, but some sort of synchronization is needed when timing short benchmark runs.
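As a minimal illustration of the kind of protocol issue the swap and send/recv options above expose, the following sketch exchanges a buffer with a partner processor, ordering the blocking send and receive by processor number so that the exchange cannot deadlock when sends are synchronous. SEND0 and RECV0 are hypothetical stand-ins for whatever message-passing library is actually used; they are not the PICL interface.

      SUBROUTINE SWAPEX(ME, PARTNR, SBUF, RBUF, N, MSGTAG)
C     Sketch only: swap N reals with processor PARTNR using
C     hypothetical blocking SEND0/RECV0 primitives.  Ordering the
C     calls by processor number avoids deadlock for synchronous sends.
      INTEGER ME, PARTNR, N, MSGTAG
      REAL SBUF(N), RBUF(N)
      IF (ME .LT. PARTNR) THEN
         CALL SEND0(SBUF, N, MSGTAG, PARTNR)
         CALL RECV0(RBUF, N, MSGTAG, PARTNR)
      ELSE
         CALL RECV0(RBUF, N, MSGTAG, PARTNR)
         CALL SEND0(SBUF, N, MSGTAG, PARTNR)
      ENDIF
      RETURN
      END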
-------------------------------------------------------------------------------
Total number of lines in source code: 28,204
Number of lines excluding comments : 12,434
Size in bytes of source code : 994,299
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
problem: 23 lines, 559 bytes, ascii
algorithm: 33 lines, 874 bytes, ascii
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
standard output: Number of lines and bytes is a function of the input specifications, but for benchmarking would normally be 63 lines (2000 bytes) of meaningful output. (On the Intel machine, FORTRAN STOP messages are sent from each processor at the end of the run, increasing this number.)
timings: Each run produces one line of output, containing approx. 150 bytes.
Both files are ascii.
-------------------------------------------------------------------------------
Brief, high-level description of what application does:
(P)STSWM solves the nonlinear shallow water equations on the sphere. The nonlinear shallow water equations constitute a simplified atmospheric-like fluid prediction model that exhibits many of the features of more complete models, and that has been used to investigate numerical methods and benchmark a number of machines. Each run of PSTSWM uses one of 6 embedded initial conditions and forcing functions. These cases were chosen to stress test numerical methods for this problem, and to represent important flows that develop in atmospheric modeling. STSWM also supports reading in arbitrary initial conditions, but this was removed from the parallel code to simplify the development of the initial implementation.
-------------------------------------------------------------------------------
Main algorithms used:
PSTSWM uses the spectral transform method to solve the shallow water equations. During each timestep, the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. The physical domain is a tensor product longitude-latitude grid. The spectral domain is the set of spectral coefficients in a spherical harmonic expansion of the state variables, and is normally characterized as a triangular array (using a "triangular" truncation of spectral coefficients). Transforming from physical coordinates to spectral coordinates involves performing a real FFT for each line of constant latitude, followed by integration over latitude using Gaussian quadrature (approximating the Legendre transform) to obtain the spectral coefficients. The inverse transformation involves evaluating sums of spectral harmonics and inverse real FFTs, analogous to the forward transform. Parallel algorithms are used to compute the FFTs and to compute the vector sums used to approximate the forward and inverse Legendre transforms. Two major alternatives are available for both transforms: distributed algorithms, which use a fixed data decomposition and compute results where they are assigned, and transpose algorithms, which remap the domains to allow the transforms to be calculated sequentially.
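As a rough serial illustration of the forward transform just described (this is not code from PSTSWM; the array layout and the RFFT routine are placeholders assumed for the example), a real FFT is applied along each line of constant latitude and the Legendre transform is approximated by a Gaussian quadrature over latitude against precomputed associated Legendre polynomial values:

C     Illustrative sketch only -- not PSTSWM code.  G holds gridpoint
C     values for one vertical level, F the Fourier coefficients
C     (real and imaginary parts), S the spectral coefficients, P the
C     associated Legendre polynomial values, and W the Gauss weights.
      SUBROUTINE FWDTRN (G, F, S, P, W, NLON, NLAT, MM, NN)
      INTEGER NLON, NLAT, MM, NN, J, M, N
      REAL G(NLON,NLAT), F(2,0:MM,NLAT), S(2,0:MM,0:NN)
      REAL P(0:MM,0:NN,NLAT), W(NLAT)
C     real FFT along each line of constant latitude (RFFT is a
C     placeholder for whatever FFT routine is actually linked in)
      DO 10 J = 1, NLAT
         CALL RFFT (G(1,J), F(1,0,J), NLON, MM)
   10 CONTINUE
C     Legendre transform approximated by Gaussian quadrature:
C     S(m,n) = sum over latitudes j of W(j)*P(m,n,j)*F(m,j)
      DO 40 N = 0, NN
         DO 30 M = 0, MIN(N,MM)
            S(1,M,N) = 0.0
            S(2,M,N) = 0.0
            DO 20 J = 1, NLAT
               S(1,M,N) = S(1,M,N) + W(J)*P(M,N,J)*F(1,M,J)
               S(2,M,N) = S(2,M,N) + W(J)*P(M,N,J)*F(2,M,J)
   20       CONTINUE
   30    CONTINUE
   40 CONTINUE
      RETURN
      END

In the parallel code, the per-latitude FFTs and the latitude sums are precisely the operations that are either computed in a distributed fashion or remapped by a transpose.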
These two alternatives for the two transforms translate to four major parallel algorithms:
a) distributed FFT/distributed Legendre transform (LT)
b) transpose FFT/distributed LT
c) distributed FFT/transpose LT
d) transpose FFT/transpose LT
Multiple implementations are supported for each type of algorithm, and the assignment of processors to transforms is also determined by input parameters. For example, input parameters specify a logical 2-D processor grid and define the data decomposition of the physical and spectral domains onto this grid. If 16 processors are used, these can be arranged as a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid. This specification determines how many processors are used to calculate each parallel FFT and how many are used to calculate each parallel LT.
-------------------------------------------------------------------------------
Skeleton sketch of application:
The main program calls INPUT to read problem and algorithm parameters and set up arrays for spectral transformations, and then calls INIT to set up the test case parameters. Routines ERRANL and NRGTCS are called once before the main timestepping loop for error normalization, once after the main timestepping for calculating energetics data and errors, and periodically during the timestepping, as requested. The prognostic fields are initialized using routine ANLYTC, which provides the analytic solution. Each call to STEP advances the computed fields by a timestep DT. Timing logic surrounds the timestepping loop, so the initialization phase is not timed. Also, a fake timestep is calculated before beginning timing to eliminate the first time "paging" effect currently seen on the Intel Paragon systems. STEP computes the first two time levels by two semi-implicit timesteps; normal time-stepping is by a centered leapfrog scheme. STEP calls COMP1, which chooses between an explicit numerical algorithm, a semi-implicit algorithm, and a simplified algorithm associated with solving the advection equation, one of the embedded test cases. The numerical algorithm used is an input parameter. The basic outline of each timestep is the following:
1) Evaluate non-linear product and forcing terms.
2) Fourier transform non-linear terms in place as a block transform.
3) Compute and update divergence, geopotential, and vorticity spectral coefficients. (Much of the calculation of the time update is "bundled" with the Legendre transform.)
4) Compute velocity fields and transform divergence, geopotential, and vorticity back to gridpoint space using a) an inverse Legendre transform and associated computations and b) an inverse real block FFT.
PSTSWM has "fictitious" vertical levels, and all computations are duplicated on the different levels, potentially significantly increasing the granularity of the computation. (The number of vertical levels is an input parameter.) For error analysis, a single vertical level is extracted and analyzed.
-------------------------------------------------------------------------------
Brief description of I/O behavior:
Processor 0 reads in the input parameters and broadcasts them to the rest of the processors. Processor 0 also receives the error analysis and timing results from the other processors and writes them out.
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
The processors are treated as a logical 2-D grid.
There are 3 domains to be distributed:
a) physical domain: tensor product longitude-latitude grid
b) Fourier domain: tensor product wavenumber-latitude grid
c) spectral domain: triangular array, where each column contains the spectral coefficients associated with a given wavenumber. The larger the wavenumber is, the shorter the column is.
An unordered FFT is used, and the Fourier and spectral domains use the "unordered" permutation when the data is being distributed.
I) distributed FFT/distributed LT
1) The tensor-product longitude-latitude grid is mapped onto the processor grid by assigning a block of contiguous longitudes to each processor column and by assigning one or two blocks of contiguous latitudes to each processor row. The vertical dimension is not distributed.
2) After the FFT, the subsequent wavenumber-latitude grid is similarly distributed over the processor grid, with a block of the permuted wavenumbers assigned to each processor column.
3) After the LT, the wavenumbers are distributed as before and the spectral coefficients associated with any given wavenumber are either distributed evenly over the processors in the column containing that wavenumber, or are duplicated over the column. What happens is a function of the particular distributed LT algorithm used.
II) transpose FFT/distributed LT
1) same as in (I)
2) Before the FFT, the physical domain is first remapped to a vertical layer-latitude decomposition, with a block of contiguous vertical layers assigned to each processor column and the longitude dimension not distributed. After the transform, the vertical level-latitude grid is distributed as before, and the wavenumber dimension is not distributed.
3) After the LT, the spectral coefficients for a given vertical layer are either distributed evenly over the processors in a column, or are duplicated over that column. What happens is a function of the particular distributed LT algorithm used.
III) distributed FFT/transpose LT
1) same as (I)
2) same as (I)
3) Before the LT, the wavenumber-latitude grid is first remapped to a wavenumber-vertical layer decomposition, with a block of contiguous vertical layers assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
IV) transpose FFT/transpose LT
1) same as (I)
2) same as (II)
3) Before the LT, the vertical level-latitude grid is first remapped to a vertical level-wavenumber decomposition, with a block of the permuted wavenumbers now assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
The distribution is a function of the problem size (longitude, latitude, vertical levels), the logical processor grid (PX, PY), and the algorithm (transpose vs. distributed for FFT and LT).
-------------------------------------------------------------------------------
Brief description of load balance behavior :
The load is fairly well balanced.
If PX and PY evenly divide the number of longitudes, latitudes, and vertical levels, then all load imbalances are due to the unequal distribution of spectral coefficients. As described above, the spectral coefficients are laid out as a triangular array in most runs, where each column corresponds to a different Fourier wavenumber. The wavenumbers are partitioned among the processors in most of the parallel algorithms. Since each column is a different length, a wrap mapping of the columns will approximately balance the load. Instead, the natural "unordered" ordering of the FFT is used with a block partitioning, which does a reasonable job of load balancing without any additional data movement. The load imbalance is quantified in Walker, et al [5]. If PX and PY do not evenly divide the dimensions of the physical domain, then other load imbalances may be as large as a factor of 2 in the worst case.
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
MM, NN, KK - specify the number of Fourier wavenumbers and the spectral truncation used. For a triangular truncation, MM = NN = KK.
NLON, NLAT, NVER - number of longitudes, latitudes, and vertical levels. There are required relationships between NLON, NLAT, and NVER, and between these and MM. These relationships are checked in the code. We will also provide a selection of input files that specify legal (and interesting) problems.
DT - timestep (in seconds). (Must be small enough to satisfy the Courant stability condition. Code warns if too large, but does not abort.)
TAUE - end of model run (in hours)
-------------------------------------------------------------------------------
Give memory as function of problem size :
Executable size is determined at compile time by setting the parameter COMPSZ in params.i. Per node memory requirements are approximately (in REALs)
associated Legendre polynomial values: MM*MM*NLAT/(PX*PY)
physical grid fields: 8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 3*MM*MM*NVER/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 3*MM*MM*NVER/PX
work space: 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX
where BUFS1 and BUFS2 are input parameters (number of communication buffers). BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY. In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory requirements are approximately:
(2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
or
(2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX)
-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :
for a serial run per timestep (very rough):
nonlinear terms: 10*NLON*NLAT*NVER
forward FFT: 40*NLON*NLAT*NVER*LOG2(NLON)
forward LT and time update: 48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
inverse LT and calculation of velocities: 20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
inverse FFT: 25*NLON*NLAT*NVER*LOG2(NLON)
Using standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1):
approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep.
For a total run, multiply by TAUE/DT.
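For convenience, the simplified estimates above can be collected into a small routine. The following sketch is illustrative only (it is not part of PSTSWM, and the routine and argument names are invented); it simply encodes the two formulas quoted above for the case in which the spectral coefficients are not duplicated:

C     Illustrative only -- encodes the simplified per-node memory
C     estimate (in REALs) and the per-timestep flop estimate for a
C     serial run, under the standard test case assumptions.
      SUBROUTINE ESTIM (MM, PX, PY, BUFS1, BUFS2, RMEM, FLOPS)
      INTEGER MM, PX, PY, BUFS1, BUFS2
      REAL RMEM, FLOPS, RM
      RM = REAL(MM)
C     (2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
      RMEM = (2.0 + 108.0*(1.0+BUFS1) + 3.0*(1.0+BUFS2))
     &       * RM**3 / (4.0*PX*PY)
C     460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4)
      FLOPS = 460.0*RM**3 + 348.0*RM**3*LOG(RM)/LOG(2.0)
     &        + 24.0*RM**4
      RETURN
      END

As a rough check, MM=42 gives about 2.5*10**8 flops per timestep, and a run with DT=2400.0 and TAUE=120.0 (180 timesteps) therefore needs roughly 4.5*10**10 flops, consistent with the T42 figure quoted below.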
-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :
This is a function of the algorithm chosen.
I) transpose FFT
a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
   2*(PX-1) steps, D volume
   or 2*LOG2(PX) steps, D*LOG2(PX) volume
II) distributed FFT
a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
   2*LOG2(PX) steps, D*LOG2(PX) volume
III) transpose LT
a) forward LT: let D = 8*NLON*NLAT*NVER/(PX*PY)
   2*(PY-1) steps, D volume
   or 2*LOG2(PY) steps, D*LOG2(PY) volume
b) inverse LT: let D = (3/2)*(MM**2)*NVER/(PX*PY)
   (PY-1) steps, D volume
   or LOG2(PY) steps, D*PY volume
IV) distributed LT
a) forward + inverse LT: let D = 3*(MM**2)*NVER/(PX*PY)
   2*(PY-1) steps, D*PY volume
   or 2*LOG2(PY) steps, D*PY volume
These are per timestep costs. Multiply by TAUE/DT for total communication overhead.
-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) :
Standard input files will be provided for
T21: MM=KK=NN=21, NLAT=32,  NLON=64,  NVER=8,  ICOND=2, DT=4800.0, TAUE=120.0
T42: MM=KK=NN=42, NLAT=64,  NLON=128, NVER=16, ICOND=2, DT=2400.0, TAUE=120.0
T85: MM=NN=KK=85, NLAT=128, NLON=256, NVER=32, ICOND=2, DT=1200.0, TAUE=120.0
These are 5 day runs of the "benchmark" case specified in Williamson, et al [4]. Flops and memory requirements for serial runs are as follows (approx.):
T21:     500,000 REALs        2,000,000,000 flops
T42:   4,000,000 REALs       45,000,000,000 flops
T85:  34,391,000 REALs    1,000,000,000,000 flops
Both memory and flops scale well, so, for example, the T42 run fits in approx. 4MB of memory for a 4 processor run. But different algorithms and different aspect ratios of the processor grid use different amounts of memory.
-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) :
Count by hand (looking primarily at inner loops, but eliminating common subexpressions that the compiler is expected to find).
-------------------------------------------------------------------------------
Other relevant information:
-------------------------------------------------------------------------------
From owner-pbwg-compactapp@CS.UTK.EDU Fri Oct 8 09:17:11 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA29750; Fri, 8 Oct 93 09:17:11 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA00426; Fri, 8 Oct 93 09:16:23 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Fri, 8 Oct 1993 09:16:22 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA00418; Fri, 8 Oct 93 09:16:20 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA20027; Fri, 8 Oct 1993 09:16:19 -0400 Message-Id: <9310081316.AA20027@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: Compact applications chapter Date: Fri, 08 Oct 93 09:16:19 -0500 From: David W. Walker
I just sent the following to Mike Berry, but some of you might also like to make suggestions.
David

Mike,
I am a bit at a loss as to what to put into the ParkBench report for Compact Applications since we haven't had any codes submitted (except for maybe 2 or 3). It seems to me that we can't really say much without the codes, apart from very general requirements.

David

From owner-pbwg-compactapp@CS.UTK.EDU Fri Oct 8 10:17:35 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA00610; Fri, 8 Oct 93 10:17:35 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA06069; Fri, 8 Oct 93 10:17:05 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Fri, 8 Oct 1993 10:17:03 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from haven.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA06059; Fri, 8 Oct 93 10:17:02 -0400 Received: by haven.EPM.ORNL.GOV (4.1/1.34) id AA15407; Fri, 8 Oct 93 10:16:56 EDT Date: Fri, 8 Oct 93 10:16:56 EDT From: worley@haven.EPM.ORNL.GOV (Pat Worley) Message-Id: <9310081416.AA15407@haven.EPM.ORNL.GOV> To: walker@rios2.epm.ornl.gov, pbwg-compactapp@cs.utk.edu Subject: Re: Compact applications chapter In-Reply-To: Mail from 'David W. Walker ' dated: Fri, 08 Oct 93 09:16:19 -0500 Cc: worley@haven.EPM.ORNL.GOV
>I just sent the following to Mike Berry, but some of you might also like to make
>suggestions.
>
>David
>
>Mike,
>>I am a bit at a loss as to what to put into the ParkBench report
>for Compact Applications since we haven't had any codes submitted (except
>for maybe 2 or 3). It seems to me that we can't really say much without
>the codes, apart from very general requirements.
>
>David
Since I imagine that there will always be a dearth of (good) compact applications, a requirements document (or, at least, a wish list) would be a useful contribution, particularly if the wishlist were prioritized by what is most important for the code to have, e.g.,
1) scientific relevance (does anyone care about this type of problem)
2) numerical relevance (are the numerical algorithms representative or interesting)
3) algorithmic relevance (are the parallel algorithms representative or interesting)
4) portability (language, parallel programming model, etc.)
5) runability (easy to run, easy to validate results, easy to use for benchmarking)
6) ...
This can probably be broken into requirements and desirable features.

Pat

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 14 13:38:54 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA16662; Thu, 14 Oct 93 13:38:54 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA04580; Thu, 14 Oct 93 13:37:31 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 14 Oct 1993 13:37:29 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA04571; Thu, 14 Oct 93 13:37:28 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA19646; Thu, 14 Oct 1993 13:37:27 -0400 Date: Thu, 14 Oct 1993 13:37:27 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310141737.AA19646@rios2.epm.ornl.gov> To: berry@cs.utk.edu Subject: ParkBench compact applications Cc: pbwg-compactapp@cs.utk.edu
Mike,
Below is the latest version of the Compact Applications section of the ParkBench document. I also intend to send a latex version of the submission form to you later today for inclusion as Appendix A.
I hope there will be some comments back from the other members of the subcommittee about this section, and that there will be an opportunity to update it.

David

%file: compac3.tex
%date: October 14, 1993
\chapter{Compact Applications}
\footnote{assembled by David Walker for Compact Applications subcommittee}
\section{Introduction}
\label{sec:compact.intro}
While kernel applications, such as those described in Chapter 3, provide a fairly straightforward way of assessing the performance of parallel systems, they are not representative of scientific applications in general since they do not reflect certain types of system behavior. In particular, many scientific applications involve data movement between phases of an application, and may also require significant amounts of I/O. These types of behavior are difficult to gauge using kernel applications. One factor that has hindered the use of full application codes for benchmarking parallel computers in the past is that such codes are difficult to parallelize and to port between target architectures. In addition, full application codes that have been successfully parallelized are often proprietary, and/or subject to distribution restrictions. To minimize the negative impact of these factors we propose to make use of compact applications in our benchmarking effort. Compact applications are typical of those found in research environments (as opposed to production or engineering environments), and usually consist of up to a few thousand lines of source code. Compact applications are distinct from kernel applications since they are capable of producing scientifically useful results. In many cases, compact applications are made up of several kernels, interspersed with data movements and I/O operations between the kernels. In this chapter the criteria for selecting compact applications for the ParkBench suite will be discussed. In addition, the general research areas that will be represented in the suite are outlined.
%In this chapter we will discuss a number of compact applications in terms of
%their purpose, the algorithms used, the types of data movements required,
%the memory requirements, and
%the amount of I/O. The compact application below are not meant to form a
%definite or complete list.
\section{Criteria for Selection}
\label{sec:criteria}
The three main criteria for inclusion of a parallel code in the Compact Applications suite are:
\begin{enumerate}
\item The code must be a complete application and be capable of producing results of research interest. These two points distinguish a compact application from a kernel. For example, a code that only solves a randomly-generated, dense, linear system by LU factorization should be considered a kernel. Even though the code is complete, it does not produce results of research interest. However, if the LU factorization is embedded in an application that uses the boundary element method to solve, for example, a two-dimensional elastodynamics problem, then such an application could legitimately be considered a compact application. Compact applications and full production codes are distinguished by their software complexity, which is difficult to quantify. Software complexity gives an indication of how hard it is to write, port and maintain an application, and may be gauged very roughly by the length of the source code. However, there is no hard upper limit on the length of a code in the Compact Applications suite.
It is expected that the source code (excluding comments and repeated common blocks) for most compact applications will be between 2000 and 10000 lines, but some may be longer.
\item The code must be of high quality. This means it must have been extensively tested and validated, preferably on a wide selection of different parallel architectures. The problem size and number of processors used must not be hard-coded into the application, and should be specified at runtime as input to the program. Ideally, the parallel code should not impose restrictions on the problem size that are not applicable for the corresponding sequential code. Thus, the parallel code should not require that the problem size be exactly divisible by the number of processors, or that the number of processors be a power of two. In some cases this latter requirement may have to be relaxed. For example, most parallel fast Fourier transform routines require the number of processors to be a power of two. It is preferable that the code be written so that it works correctly for an arbitrary one-to-one mapping between the logical process topology of the application and the hardware topology of the parallel computer. This is desirable so that the assignment of a location in the logical process topology to a physical processor can be easily adjusted when porting the application between platforms. For example, a Gray code assignment may be best for a hypercube, and a natural ordering for a mesh architecture.
\item The application must be well documented. The source code itself should contain an adequate number of comments, and each module should begin with a comment section that describes what the routine does, and the arguments passed to it. In addition, there should be a ``Users' Guide'' to the application that describes the input and output, the parameterization of the problem size and processor layout, and details of what the application does. The Users' Guide should also contain a bibliography of related papers.
\end{enumerate}
In addition to the three criteria discussed above, there are a number of other desirable features that a ParkBench Compact Application should have. These are discussed in the following subsections.
\subsection{Self Checking Applications}
\label{subsec:checking}
The application should be self-checking. That is, at the end of the computation the application should perform a check to validate the results of the run. The application may also output a summary of performance results for the run, such as the Mflop rate, and other pertinent information.
\subsection{Programming Languages}
\label{subsec:languages}
The code should be written in Fortran 77, Fortran 90, High Performance Fortran, or C. Data should be passed between processors by explicit message passing. ParkBench does not specify which message passing system should be used, but one that is available on a number of parallel platforms is preferable. Eventually it is expected that MPI will become the message passing system of choice, but in the meantime portable systems such as PVM, PICL, Express, PARMACS, and P4 are acceptable alternatives. The codes in the Compact Applications suite should not contain any assembly coded portions, although assembly code may be used in optimized versions of the code.
\section{Proposed Compact Application Benchmarks}
\label{sec:compact.proposed}
At the time of writing (October 1993) the ParkBench organization is in the process of soliciting submission of applications for inclusion in the Compact Applications suite.
Thus, the applications that comprise the suite cannot yet be listed here. However, in this section the main application areas that are expected to be in the suite are outlined. The intention is that these areas should be representative of the fields in which parallel computers are actually used. The codes should exercise a number of different algorithms, and possess different communication and I/O characteristics. Initially the Compact Applications suite will consist of no more than ten codes. This restriction is imposed so that the resources needed to manage and distribute the suite can be assessed. The suite may be enlarged in the future if this seems manageable. Below is a list of the application areas that are expected to be represented in the suite. This is not meant to be an exclusive list; submissions from other application areas will be considered for inclusion in the suite.
\begin{itemize}
\item Climate and meteorological modeling
\item Computational fluid dynamics (CFD)
\item Finance, e.g., portfolio optimization
\item Molecular dynamics
\item Plasma physics
\item Quantum chemistry
\item Quantum chromodynamics (QCD)
\item Reservoir modeling
\end{itemize}
\section{Submitting to the Compact Applications Suite}
\label{sec:submit}
The procedure for submitting codes to the ParkBench Compact Applications suite is as follows.
\begin{enumerate}
\item Complete the submission form in Appendix A, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and the submitter will be notified if the application is to be considered further for inclusion in the ParkBench suite.
\item If the ParkBench Compact Applications Subcommittee decides to consider the application further, the submitter will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file should be sent on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files, documents, and papers together constitute the application package. The application package should be sent to the following address, and the subcommittee will then make a final decision on whether to include the application in the ParkBench suite.\par
\smallskip
\indent David W. Walker\par
\indent Oak Ridge National Laboratory\par
\indent Bldg.~6012/MS-6367\par
\indent P. O. Box 2008\par
\indent Oak Ridge, TN 37831-6367\par
\indent (615) 574-7401/0680 (phone/fax)\par
\indent walker@msr.epm.ornl.gov\par
\item If the application is approved for inclusion in the ParkBench suite, an authorized person from the submitting organization will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), the application package.
\end{enumerate}
From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:51:57 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11600; Thu, 28 Oct 93 08:51:57 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07295; Thu, 28 Oct 93 08:51:33 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:51:32 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07287; Thu, 28 Oct 93 08:51:31 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA13437; Thu, 28 Oct 1993 08:51:41 -0400 Date: Thu, 28 Oct 1993 08:51:41 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281251.AA13437@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: Compact Appl. Submissions
So far I've received 3 submissions for the ParkBench Compact Applications suite. I'm sending you the completed forms in 3 separate email messages.

David

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:52:38 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11616; Thu, 28 Oct 93 08:52:38 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07341; Thu, 28 Oct 93 08:52:14 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:52:13 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07333; Thu, 28 Oct 93 08:52:11 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA11913; Thu, 28 Oct 1993 08:52:21 -0400 Date: Thu, 28 Oct 1993 08:52:21 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281252.AA11913@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: POLMP Compact Application
-------------------------------------------------------------------------------
Name of Program : POLMP (Proudman Oceanographic Laboratory Multiprocessing Program)
-------------------------------------------------------------------------------
Submitter's Name : Mike Ashworth
Submitter's Organization: NERC Computer Services
Submitter's Address : Bidston Observatory, Birkenhead, L43 7RA, UK
Submitter's Telephone # : +44-51-653-8633
Submitter's Fax # : +44-51-653-6269
Submitter's Email : mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Cognizant Expert : Mike Ashworth
CE's Organization : NERC Computer Services
CE's Address : Bidston Observatory, Birkenhead, L43 7RA, UK
CE's Telephone # : +44-51-653-8633
CE's Fax # : +44-51-653-6269
CE's Email : mia@ua.nbi.ac.uk
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench :
Bearing in mind other commitments, Mike Ashworth is prepared to respond quickly to questions and bug reports, and expects to be kept informed as to results of experiments and modifications to the code.
-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Ocean and Shallow Sea Modeling
-------------------------------------------------------------------------------
Application "pedigree" (origin, history, authors, major mods) :
The POLMP project was created to develop numerical algorithms for shallow sea 3D hydrodynamic models that run efficiently on modern parallel computers. A code was developed, using a set of portable programming conventions based upon standard Fortran 77, which follows the wind induced flow in a closed rectangular basin including a number of arbitrary land areas. The model solves a set of hydrodynamic partial differential equations, subject to a set of initial conditions, using a mixed explicit/implicit forward time integration scheme. The explicit component corresponds to a horizontal finite difference scheme and the implicit to a functional expansion in the vertical (Davies, Grzonka and Stephens, 1989).

By the end of 1989 the code had been implemented on the RAL 4 processor Cray X-MP using Cray's microtasking system, which provides parallel processing at the level of the Fortran DO loop. Acceptable parallel performance was achieved by integrating each of the vertical modes in parallel, referred to in Ashworth and Davies (1992) as vertical partitioning. In particular, a speed-up of 3.15 over single processor execution was obtained, with an execution rate of 548 Megaflops corresponding to 58 per cent of the peak theoretical performance of the machine. Execution on an 8 processor Cray Y-MP gave a speed-up efficiency of 7.9 and 1768 Megaflops or 67 per cent of the peak (Davies, Proctor and O'Neill, 1991). The latter resulted in Davies and Grzonka being awarded a prize in the 1990 Cray Gigaflop Performance Awards.

The project has been extended by implementing the shallow sea model in a form which is more appropriate to a variety of parallel architectures, especially distributed memory machines, and to a larger number of processors. It is especially desirable to be able to compare shared memory parallel architectures with distributed memory architectures. Such a comparison is currently relevant to NERC science generally and will be a factor in the considerations for the purchase of new machines, bids for allocations on other academic machines, and for the design of new codes and the restructuring of existing codes.

In order to simplify development of the new code and to ensure a proper comparison between machines, a restructured version of the Davies and Grzonka rectangle code was designed to perform partitioning of the region in the horizontal dimension. This has the advantage over vertical partitioning that the communication between processors is limited to a few points at the boundaries of each sub-domain. The ratio of interior points to boundary points, which determines the ratio of computation to communication and hence the efficiency on message-passing, distributed-memory machines, may be increased by increasing the size of the individual sub-domains. This design may also improve the efficiency on shared memory machines by reducing the time of the critical section and reducing memory conflicts between processors. In addition, the required number of vertical modes is only about 16, which, though well suited to a 4 or 8 processor machine, does not contain sufficient parallelism for more highly parallel machines.
The code has been designed with portability in mind, so that essentially the same code may be run on parallel computers with a range of architectures.
-------------------------------------------------------------------------------
May this code be freely distributed (if not specify restrictions) :
Yes, but users are requested to acknowledge the authors (Ashworth and Davies) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code.
-------------------------------------------------------------------------------
Give length in bytes of integers and floating-point numbers that should be used in this application:
Some 8 byte floating point numbers are used in some of the initialization code, but calculations on the main field arrays may be done using 4 byte floating point variables without grossly affecting the solution. Nevertheless, precision conversion is facilitated by a switch supplied to the C preprocessor. By specifying -DSINGLE, variables will be declared as REAL, normally 4 bytes, whereas -DDOUBLE will cause declarations to be DOUBLE PRECISION, normally 8 bytes.
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module level, or lower) :
The README file supplied with the code describes how the various versions of the code should be built. Extensive documentation, including the definition of all variables in COMMON, is present as comments in the code.
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :
1) Davies, A.M., Formulation of a linear three-dimensional hydrodynamic sea model using a Galerkin-eigenfunction method, Int. J. Num. Meth. in Fluids, 1983, Vol. 3, 33-60.
2) Davies, A.M., Solution of the 3D linear hydrodynamic equations using an enhanced eigenfunction approach, Int. J. Num. Meth. in Fluids, 1991, Vol. 13, 235-250.
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :
1) Ashworth, M. and Davies, A.M., Restructuring three-dimensional hydrodynamic models for computers with low and high degrees of parallelism, in Parallel Computing '91, eds D.J.Evans, G.R.Joubert and H.Liddell (North Holland, 1992), 553-560.
2) Ashworth, M., Parallel Processing in Environmental Modelling, in Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993.
3) Ashworth, M. and Davies, A.M., Performance of a Three Dimensional Hydrodynamic Model on a Range of Parallel Computers, in Proceedings of the Euromicro Workshop on Parallel and Distributed Computing, Gran Canaria 27-29 January 1993, pp 383-390, (IEEE Computer Society Press)
4) Davies, A.M., Ashworth, M., Lawrence, J., O'Neill, M., Implementation of three dimensional shallow sea models on vector and parallel computers, 1992a, CFD News, Vol. 3, No. 1, 18-30.
5) Davies, A.M., Grzonka, R.G. and Stephens, C.V., The implementation of hydrodynamic numerical sea models on the Cray X-MP, 1992b, in Advances in Parallel Computing, Vol. 2, edited D.J. Evans.
6) Davies, A.M., Proctor, R. and O'Neill, M., "Shallow Sea Hydrodynamic Models in Environmental Science", Cray Channels, Winter 1991.
-------------------------------------------------------------------------------
Other relevant research papers:
-------------------------------------------------------------------------------
Application available in the following languages (give message passing system used, if applicable, and machines application runs on) :
Code is initially passed through the C preprocessor, allowing a number of versions with different programming styles, precisions and machine dependencies to be generated.

Fortran 77 version
A sequential version of POLMP is available, which conforms to the Fortran 77 standard. This version has been run on a large number of machines from workstations to supercomputers and any code which caused problems, even if it conformed to the standard, has been changed or removed. Thus its conformance to the Fortran 77 standard is well established. In order to allow the code to run on a wide range of problem sizes without recompilation, the major arrays are defined dynamically by setting up pointers, with names starting with IX, which point to locations in a single large data array: SA. Most pointers are allocated in subroutine MODSUB and the starting location passed down into subroutines in which they are declared as arrays. For example :

      IX1 = 1
      IX2 = IX1 + N*M
      CALL SUB ( SA(IX1), SA(IX2), N, M )

      SUBROUTINE SUB ( A1, A2, N, M )
      DIMENSION A1(N,M), A2(N,M)
      END

Although this is probably against the spirit of the Fortran 77 standard, it is considered the best compromise between portability and utility, and has caused no problems on any of the machines on which it has been tried. The code has been run on a number of traditional vector supercomputers, mainframes and workstations. In addition, key loops can be parallelized automatically by some compilers on shared (or virtual shared) memory MIMD machines, allowing parallel execution on the Convex C2 and C3, Cray X-MP, Y-MP, and Y-MP/C90, and Kendall Square Research KSR-1. Cray macrotasking calls may also be enabled for an alternative mode of parallel execution on Cray multiprocessors.

Message passing version
POLMP has been implemented on a number of message-passing machines: Intel iPSC/2 and iPSC/860, Meiko CS-1 i860 and CS-2, and nCUBE 2. Code is also present for the PVM and Parmacs portable message passing systems, and POLMP has run successfully, though not efficiently, on a network of Silicon Graphics workstations. Calls to message passing routines are concentrated in a small number of routines for ease of portability and maintenance. POLMP performs housekeeping tasks on one node of the parallel machine, usually node zero, referred to in the code as the driver process, the remaining processes being workers. For Parmacs version 5, which requires a host program, a simple host program has been provided which loads the node program onto a two dimensional torus and then takes no further part in the run, other than to receive a completion code from the driver, in case terminating the host early would interfere with execution of the nodes.

Data parallel versions
A data parallel version of the code has been run on the Thinking Machines CM-2, CM-200 and MasPar MP-1 machines. High Performance Fortran (HPF) defines extensions to the Fortran 90 language in order to provide support for parallel execution on a wide variety of machines using a data parallel programming model.
The subset-HPF version of the POLMP code has been written to the draft standard specified by the High Performance Fortran Forum in the HPF Language Specification version 0.4 dated November 6, 1992. Fortran 90 code was developed on a Thinking Machines CM-200 machine and checked for conformance with the Fortran 90 standard using the NAGWare Fortran 90 compiler. HPF directives were inserted by translating from the CM Fortran directives, but have not been tested due to the lack of access to an HPF compiler. The only HPF features used are the PROCESSORS, TEMPLATE, ALIGN and DISTRIBUTE directives and the system inquiry intrinsic function NUMBER_OF_PROCESSORS.
-------------------------------------------------------------------------------
Total number of lines in source code: 26,699
Number of lines excluding comments : 11,313
Size in bytes of source code : 756,107
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
steering file: 13 lines, 250 bytes, ascii (typical size)
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
standard output: 700 lines, 62,000 bytes, ascii (typical size)
-------------------------------------------------------------------------------
Brief, high-level description of what application does:
POLMP solves the linear three-dimensional hydrodynamic equations for the wind induced flow in a closed rectangular basin of constant depth which may include an arbitrary number of land areas.
-------------------------------------------------------------------------------
Main algorithms used:
The discretized form of the hydrodynamic equations is solved for the field variables z (surface elevation) and u and v (horizontal components of velocity). The fields are represented in the horizontal by a staggered finite difference grid. The profile of vertical velocity with depth is represented by the superposition of a number of spectral components. The functions used in the vertical are arbitrary, although the computational advantages of using eigenfunctions (modes) of the eddy viscosity profile have been demonstrated (Davies, 1983). Velocities at the closed boundaries are set to zero. Each timestep in the forward time integration of the model involves successive updates to the three fields, z, u and v. New field values computed in each update are used in the subsequent calculations. A five point finite difference stencil is used, requiring only nearest neighbours on the grid. A number of different data storage and data processing methods are included, mainly for handling cases with significant amounts of land, e.g. index array, packed data. In particular, the program may be switched between masked operation, more suitable for vector processors, in which computation is done on all points, but land and boundary points are masked out, and strip-mining, more suitable for scalar and RISC processors, in which calculations are only done for sea points.
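To illustrate the distinction between masked and strip-mined operation, the following sketch shows a generic five-point-stencil update written both ways. This is not code from POLMP; the array names, the NSEA/ISEA/JSEA index lists, and the update formula itself are placeholders chosen only to show the two loop structures.

C     Illustrative only -- a generic five-point stencil update, not
C     the POLMP equations.  AMASK is 1.0 at sea points and 0.0 at
C     land and boundary points.
      SUBROUTINE MASKUP (FNEW, FOLD, AMASK, NX, NY, C)
      INTEGER NX, NY, I, J
      REAL FNEW(NX,NY), FOLD(NX,NY), AMASK(NX,NY), C
      DO 20 J = 2, NY-1
         DO 10 I = 2, NX-1
            FNEW(I,J) = AMASK(I,J)*( FOLD(I,J)
     &         + C*( FOLD(I+1,J) + FOLD(I-1,J)
     &             + FOLD(I,J+1) + FOLD(I,J-1) - 4.0*FOLD(I,J) ) )
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

C     Strip-mined form: the same update applied only to the NSEA sea
C     points, whose (I,J) locations are held in the index lists.
      SUBROUTINE STRPUP (FNEW, FOLD, ISEA, JSEA, NSEA, NX, NY, C)
      INTEGER NX, NY, NSEA, ISEA(NSEA), JSEA(NSEA), I, J, K
      REAL FNEW(NX,NY), FOLD(NX,NY), C
      DO 30 K = 1, NSEA
         I = ISEA(K)
         J = JSEA(K)
         FNEW(I,J) = FOLD(I,J)
     &      + C*( FOLD(I+1,J) + FOLD(I-1,J)
     &          + FOLD(I,J+1) + FOLD(I,J-1) - 4.0*FOLD(I,J) )
   30 CONTINUE
      RETURN
      END

The masked loops vectorize cleanly over the full grid, while the indexed form avoids wasted work on land points on scalar and RISC processors, which is the trade-off described above.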
-------------------------------------------------------------------------------
Skeleton sketch of application:
The call chart of the major subroutines is represented thus:
AAAPOL -> APOLMP -> INIT -> RUNPOL -> INIT2 -> MAP -> DIVIDE -> PRMAP -> GENSTP -> SPEC -> ROOTS -> TRANS -> SNDWRK -> RCVWRK -> SETUP -> MODSUB -> MODEL -> ASSIGN -> GENMSK -> GENSTP -> GENIND -> GENPAC -> METRIC -> CLRFLD -> TIME* -> SNDBND -> RCVBND -> RESULT -> SNDRES -> RCVRES -> MODOUT -> OZUVW -> OUTFLD -> GETRES -> OUTARR -> GRYARR -> WSTATE
AAAPOL is a dummy main program calling APOLMP. APOLMP calls INIT which reads parameters from the steering file, checks and monitors them. RUNPOL is then called, which calls another initialization routine, INIT2. Called from INIT2, MAP forms a map of the domain to be modelled, DIVIDE divides the domain between processors, PRMAP maps sub-domains onto processors, GENSTP counts indexes for strip-mining and SPEC, ROOTS and TRANS set up the coefficients for the spectral expansion. SNDWRK on the driver process sends details of the sub-domain to be worked on to each worker. RCVWRK receives that information. SETUP does some array allocation and MODSUB does the main allocation of array space to the field and ancillary arrays. MODEL is the main driver subroutine for the model. ASSIGN calls routines to generate masks, strip-mining indexes, packing indexes and measurement metrics. CLRFLD initializes the main data arrays. Then one of seven time-stepping routines, TIME*, is chosen, depending on the vectorization and packing/indexing method used to cope with the presence of land. SNDBND and RCVBND handle the sending and reception of boundary data between sub-domains. After the required number of time-steps is complete, RESULT saves results from the desired region, and SNDRES on the workers and RCVRES on the driver collect the result data. MODOUT handles the writing of model output to standard output and disk files, as required. For a non-trivial run, 99% of time is spent in whichever of the timestepping routines, TIME*, has been chosen.
-------------------------------------------------------------------------------
Brief description of I/O behavior:
The driver process, usually processor 0, reads in the input parameters and broadcasts them to the rest of the processors. The driver also receives the results from the other processors and writes them out.
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
The processors are treated as a logical 2-D grid. The simulation domain is divided into a number of sub-domains, which are allocated one sub-domain per processor.
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
The number of processors, p, and the number of sub-domains are provided as steering parameters, as is a switch which requests either one-dimensional or two-dimensional partitioning. Partitioning is only actually carried out for the message passing versions of the code. For two-dimensional partitioning, p is factored into px and py where px and py are as close as possible to sqrt(p). For the data parallel version the number of sub-domains is set to one and decomposition is performed by the compiler via data distribution directives.
-------------------------------------------------------------------------------
Brief description of load balance behavior :
Unless land areas are specified, the load is fairly well balanced.
If px and py evenly divide the number of grid points, then the model is perfectly balanced except that boundary sub-domains have fewer communications. No tests with land areas have yet been performed with the parallel code, and more sophisticated domain decomposition algorithms have not yet been included.
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
nx, ny   Size of horizontal grid
m        Number of vertical modes
nts      Number of timesteps to be performed
-------------------------------------------------------------------------------
Give memory as function of problem size :
See below for specific examples.
-------------------------------------------------------------------------------
Give number of floating-point operations as function of problem size :
Assuming standard compiler optimizations, there is a requirement for 29 floating point operations (18 add/subtracts and 11 multiplies) per grid point, so the total computational load is
29 * nx * ny * m * nts
-------------------------------------------------------------------------------
Give communication overhead as function of problem size and data distribution :
During each timestep each sub-domain of size nsubx=nx/px by nsuby=ny/py requires the following communications in words :
nsubx * m  from N
nsubx      from S
nsubx * m  from S
nsuby * m  from W
nsuby      from E
nsuby * m  from E
m          from NE
m          from SW
making a total of (2 * m + 1)*(nsubx + nsuby) + 2*m words in eight messages from six directions.
-------------------------------------------------------------------------------
Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) :
The data sizes and computational requirements for the various problems supplied are :
Name    nx x ny x m x nts        Computational Load (Gflop)   Memory (Mword)
dbg     10 x 10 x 1 x 2          Small debugging test case
dbg2d   10 x 10 x 1 x 2          Small debugging test case for a 2 x 2 decomposition
v200    512 x 512 x 16 x 200     24                           14
wa200   1024 x 1024 x 40 x 200   226                          126
xb200   2048 x 2048 x 80 x 200   1812                         984
The memory sizes are the number of Fortran real elements (words) required for the strip-mined case on a single processor. For the masked case the memory requirement is approximately doubled for the extra mask arrays. For the message passing versions, the total memory requirement will also tend to increase slightly (<10%) with the number of processors employed.
-------------------------------------------------------------------------------
How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) :
Count by hand looking at inner loops and making reasonable assumptions about common compiler optimizations.
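As a quick consistency check on the table above (the agreement is only approximate, since the operation count itself is approximate), the v200 case gives
29 * 512 * 512 * 16 * 200 = 2.4 x 10**10 floating point operations,
i.e. roughly the 24 Gflop quoted.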
-------------------------------------------------------------------------------
Other relevant information:
-------------------------------------------------------------------------------
--
Dr Mike Ashworth, NERC Supercomputing Consultant
NERC Computer Services, Bidston Observatory, BIRKENHEAD L43 7RA, United Kingdom
Tel: +44 51 653 8633   Fax: +44 51 653 6269
email: mia@ua.nbi.ac.uk   alternative: M.Ashworth@ncs.nerc.ac.uk

From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:52:55 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11653; Thu, 28 Oct 93 08:52:55 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07365; Thu, 28 Oct 93 08:52:35 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:52:34 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07357; Thu, 28 Oct 93 08:52:32 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA16524; Thu, 28 Oct 1993 08:52:41 -0400 Date: Thu, 28 Oct 1993 08:52:41 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281252.AA16524@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: PSTSWM Compact Application Received: from msr.EPM.ORNL.GOV by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA20602; Tue, 5 Oct 1993 09:58:22 -0400 Received: from haven.EPM.ORNL.GOV by msr.epm.ornl.gov (4.1/1.34) id AA09050; Tue, 5 Oct 93 09:58:21 EDT Received: by haven.EPM.ORNL.GOV (4.1/1.34) id AA13369; Tue, 5 Oct 93 09:58:14 EDT Date: Tue, 5 Oct 93 09:58:14 EDT From: worley@haven.epm.ornl.gov (Pat Worley) Message-Id: <9310051358.AA13369@haven.EPM.ORNL.GOV> To: walker@msr.epm.ornl.gov
PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM
To submit a compact application to the ParkBench suite you must follow this procedure:
1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite.
2. If the ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file should be sent on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files, documents, and papers together constitute your application package. Your application package should be sent to:
David Walker
Oak Ridge National Laboratory
Bldg. 6012/MS-6367
P. O. Box 2008
Oak Ridge, TN 37831-6367
(615) 574-7401/0680 (phone/fax)
walker@msr.epm.ornl.gov
The street address is "Bethel Valley Road" if Fedex insists on this. The subcommittee will then make a final decision on whether to include your application in the ParkBench suite.
3.
If your application is approved for inclusion in the ParkBench suite you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package.
-------------------------------------------------------------------------------
Name of Program : PSTSWM (Parallel Spectral Transform Shallow Water Model)
-------------------------------------------------------------------------------
Submitter's Name : Patrick H. Worley
Submitter's Organization: Oak Ridge National Laboratory
Submitter's Address : Bldg. 6012/MS-6367, P. O. Box 2008, Oak Ridge, TN 37831-6367
Submitter's Telephone # : (615) 574-3128
Submitter's Fax # : (615) 574-0680
Submitter's Email : worley@msr.epm.ornl.gov
-------------------------------------------------------------------------------
Cognizant Expert(s) : Patrick H. Worley
CE's Organization : Oak Ridge National Laboratory
CE's Address : Bldg. 6012/MS-6367, P. O. Box 2008, Oak Ridge, TN 37831-6367
CE's Telephone # : (615) 574-3128
CE's Fax # : (615) 574-0680
CE's Email : worley@msr.epm.ornl.gov
Cognizant Expert(s) : Ian T. Foster
CE's Organization : Argonne National Laboratory
CE's Address : MCS 221/D-235, 9700 S. Cass Avenue, Argonne, IL 60439
CE's Telephone # : (708) 252-4619
CE's Fax # : (708) 252-5986
CE's Email : itf@mcs.anl.gov
-------------------------------------------------------------------------------
Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench :
Modulo other commitments, Worley is prepared to respond quickly to questions and bug reports, but expects to be kept informed as to results of experiments and modifications to the code.
-------------------------------------------------------------------------------
Major Application Field : Fluid Dynamics
Application Subfield(s) : Climate Modeling
-------------------------------------------------------------------------------
Application "pedigree" :
PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations using the spectral transform method. The spectral transform algorithm of the code follows closely how CCM2, the NCAR Community Climate Model, handles the dynamical part of the primitive equations, and the parallel algorithms implemented in the model include those currently used in the message-passing parallel implementation of CCM2. PSTSWM was written by Patrick Worley of Oak Ridge National Laboratory and Ian Foster of Argonne National Laboratory, and is based partly on previous parallel algorithm research by John Drake, David Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code development and parallel algorithms research were funded by the DOE Computer Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. The features of version 1.0 were frozen on 8/1/93, and it is this version we would offer initially as a benchmark. PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. STSWM evolved from a spectral shallow water model written by Hack (NCAR/CGD) to compare numerical schemes designed to solve the divergent barotropic equations in spherical geometry. STSWM was written partially to provide the reference solutions to the test cases proposed by Williamson et al.
(see citation [4] below), which were chosen to test the ability of numerical methods to simulate important flow phenomena. These test cases are embedded in the code and are selectable at run-time via input parameters specifying initial conditions, forcing, and analytic solutions (for error analysis). The solutions are also published in a Technical Note by Jakob et al. [3]. In addition, this code is meant to serve as an educational tool for numerical studies of the shallow water equations. A detailed description of the spectral transform method, and a derivation of the equations used in this software, can be found in the Technical Note by Hack and Jakob [2]. For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the correct communication and computation granularity for 3-D weather and climate codes), to increase modularity and support code reuse, and to allow the problem size to be selected at runtime without depending on dynamic memory allocation. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). The code has also been written in such a way that linking in optimized library functions for common operations instead of the "portable" code will be simple. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : Yes, but users are requested to acknowledge the authors (Worley and Foster) and the program that supported the development of the code (the DOE CHAMMP program) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. Finally, the code has been written to allow easy reuse of code in other applications, and for educational purposes. The authors encourage this, but also request that they be notified when pieces of the code are used. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION variables. The code should work correctly for any system in which COMPLEX is represented as 2 REALs. The include file params.i has parameters that can be used to specify the length of these. Also, some REAL and DOUBLE PRECISION parameter values may need to be modified for floating point number systems with large mantissas, e.g., PI, TWOPI. PSTSWM is currently being used on systems where Integers : 4 bytes Floats : 4 bytes The use of two precisions can be eliminated, but at the cost of a significant loss of precision. (For 4-byte REALs, not using DOUBLE PRECISION increases the error by approximately three orders of magnitude.) DOUBLE PRECISION results are only used in set-up (computing Gauss weights and nodes and Legendre polynomial values), and are not used in the body of the computation.
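As an illustration of that set-up step (this is not part of the submitted code), the sketch below computes Gauss nodes and weights in double precision and casts them to single precision for use in the body of the computation; the use of numpy and the value of nlat are assumptions made only for the example.

    import numpy as np

    # Gauss weights and nodes are the kind of quantity the submission says is
    # computed in DOUBLE PRECISION during set-up only.
    nlat = 32                                               # illustrative value
    nodes, weights = np.polynomial.legendre.leggauss(nlat)  # float64
    nodes32 = nodes.astype(np.float32)                      # single precision for
    weights32 = weights.astype(np.float32)                  # the timestepping
    print(weights.sum())                                    # weights sum to 2.0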
------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : The sequential code is documented in a file included in the distribution of the code from NCAR: Jakob, Ruediger, Description of Software for the Spectral Transform Shallow Water Model Version 2.0. National Center for Atmospheric Research, Boulder, CO 80307-3000, August 1992 and in Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. Documentation of the parallel code is in preparation, but extensive documentation is present in the code. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : 1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of three numerical methods for solving differential equations on the sphere, Monthly Weather Review, 117:1058-1075, 1989. 2) Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. 3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to shallow water test set using the spectral transform method, NCAR Technical Note TN-388+STR (in preparation). 4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.S. Swarztrauber, A standard test set for numerical approximations to the shallow water equations in spherical geometry, Journal of Computational Physics, Vol. 102, pp.211-224, 1992. ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : 5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method, Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), pp. 269-291. 6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral Transform Method. Part II, Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), pp. 509-531. 7) Foster, I. T. and P. H. Worley, Parallelizing the Spectral Transform Method: A Comparison of Alternative Parallel Algorithms, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March22-24, 1993), pp. 100-107. 8) Foster, I. T. and P. H. Worley, Parallel Algorithms for the Spectral Transform Method, (in preparation) 9) Worley, P. H. and I. T. Foster, PSTSWM: A Parallel Algorithm Testbed and Benchmark. (in preparation) ------------------------------------------------------------------------------- Other relevent research papers: 10) I. Foster, W. Gropp, and R. Stevens, The parallel scalability of the spectral transform method, Mon. Wea. Rev., 120(5), 1992, pp. 835--850. 11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes, R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. Worley, The Message-Passing Version of the Parallel Community Climate Model, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 500-513. 12) Sato, R. K. and R. D. Loft, Implementation of the NCAR CCM2 on the Connection Machine, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 371-393. 
13) Barros, S. R. M. and Kauranne, T., On the Parallelization of Global Spectral Eulerian Shallow-Water Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 36-43. 14) Kauranne, T. and S. R. M. Barros, Scalability Estimates of Parallel Spectral Atmospheric Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 312-328. 15) Pelz, R. B. and W. F. Stern, A Balanced Parallel Algorithm for Parallel Processing, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March 22-24, 1993), pp. 126-128. ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : The model code is primarily written in Fortran 77, but also uses DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in common and parameter declarations). It has been compiled and run on the Intel iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation, IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code). Message passing is implemented using the PICL message passing system. All message passing is encapsulated in three high-level routines: BCAST0 (broadcast), GMIN0 (global minimum), GMAX0 (global maximum); two classes of low-level routines: SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3 (variants and/or pieces of the swap operation) and SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3 (variants and/or pieces of the send/recv operation); and one synchronization primitive: CLOCKSYNC0. PICL instrumentation commands are also embedded in the code. Porting the code to another message passing library will be simple, although some of the runtime communication options may then no longer be available. The PICL instrumentation calls can be stubbed out (or removed) without changing the functionality of the code, but some sort of synchronization is needed when timing short benchmark runs.
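Because the message passing is encapsulated in this small set of routines, a port amounts to supplying thin wrappers in the target library. The sketch below is purely illustrative and is not part of PSTSWM: it maps the high-level routines onto a different message-passing interface (MPI via mpi4py, chosen here only for the illustration); everything other than the routine names listed above is an assumption.

    from mpi4py import MPI           # illustrative stand-in for PICL

    comm = MPI.COMM_WORLD

    def bcast0(value):               # BCAST0: broadcast from node 0
        return comm.bcast(value, root=0)

    def gmin0(value):                # GMIN0: global minimum
        return comm.allreduce(value, op=MPI.MIN)

    def gmax0(value):                # GMAX0: global maximum
        return comm.allreduce(value, op=MPI.MAX)

    def clocksync0():                # CLOCKSYNC0: synchronize before timing
        comm.Barrier()

    def swap(sendbuf, partner):      # SWAP: pairwise exchange with one partner
        return comm.sendrecv(sendbuf, dest=partner, source=partner)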
------------------------------------------------------------------------------- Total number of lines in source code: 28,204 Number of lines excluding comments : 12,434 Size in bytes of source code : 994,299 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : problem: 23 lines, 559 bytes, ascii algorithm: 33 lines, 874 bytes, ascii ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: Number of lines and bytes is a function of the input specifications, but for benchmarking would normally be 63 lines (2000 bytes) of meaningful output. (On the Intel machine, FORTRAN STOP messages are sent from each processor at the end of the run, increasing this number.) timings: Each run produces one line of output, containing approx. 150 bytes. Both files are ascii. ------------------------------------------------------------------------------- Brief, high-level description of what application does: (P)STSWM solves the nonlinear shallow water equations on the sphere. The nonlinear shallow water equations constitute a simplified atmospheric-like fluid prediction model that exhibits many of the features of more complete models, and that has been used to investigate numerical methods and to benchmark a number of machines. Each run of PSTSWM uses one of six embedded initial conditions and forcing functions. These cases were chosen to stress-test numerical methods for this problem, and to represent important flows that develop in atmospheric modeling. STSWM also supports reading in arbitrary initial conditions, but this was removed from the parallel code to simplify the development of the initial implementation. ------------------------------------------------------------------------------- Main algorithms used: PSTSWM uses the spectral transform method to solve the shallow water equations. During each timestep, the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. The physical domain is a tensor product longitude-latitude grid. The spectral domain is the set of spectral coefficients in a spherical harmonic expansion of the state variables, and is normally characterized as a triangular array (using a "triangular" truncation of spectral coefficients). Transforming from physical coordinates to spectral coordinates involves performing a real FFT for each line of constant latitude, followed by integration over latitude using Gaussian quadrature (approximating the Legendre transform) to obtain the spectral coefficients. The inverse transformation involves evaluating sums of spectral harmonics and inverse real FFTs, analogous to the forward transform. Parallel algorithms are used to compute the FFTs and to compute the vector sums used to approximate the forward and inverse Legendre transforms. Two major alternatives are available for both transforms: distributed algorithms, which use a fixed data decomposition and compute results where they are assigned, and transpose algorithms, which remap the domains so that the transforms can be calculated sequentially. This translates to four major parallel algorithms:
a) distributed FFT/distributed Legendre transform (LT)
b) transpose FFT/distributed LT
c) distributed FFT/transpose LT
d) transpose FFT/transpose LT
Multiple implementations are supported for each type of algorithm, and the assignment of processors to transforms is also determined by input parameters. For example, input parameters specify a logical 2-D processor grid and define the data decomposition of the physical and spectral domains onto this grid. If 16 processors are used, these can be arranged as a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid. This specification determines how many processors are used to calculate each parallel FFT and how many are used to calculate each parallel LT.
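As a small illustration of these run-time choices (not part of the submitted code), the sketch below enumerates the possible PX x PY arrangements of a given number of processors and the four FFT/LT combinations; the function name is invented for the example.

    # Possible logical PX x PY processor grids for P processors, and the four
    # parallel algorithm combinations selected through input parameters.
    def processor_grids(p):
        return [(px, p // px) for px in range(1, p + 1) if p % px == 0]

    print(processor_grids(16))   # [(1, 16), (2, 8), (4, 4), (8, 2), (16, 1)]

    ALGORITHM_CHOICES = [
        ("distributed FFT", "distributed LT"),
        ("transpose FFT",   "distributed LT"),
        ("distributed FFT", "transpose LT"),
        ("transpose FFT",   "transpose LT"),
    ]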
------------------------------------------------------------------------------- Skeleton sketch of application: The main program calls INPUT to read problem and algorithm parameters and set up arrays for spectral transformations, and then calls INIT to set up the test case parameters. Routines ERRANL and NRGTCS are called once before the main timestepping loop for error normalization, once after the main timestepping loop for calculating energetics data and errors, and periodically during the timestepping, as requested. The prognostic fields are initialized using routine ANLYTC, which provides the analytic solution. Each call to STEP advances the computed fields by a timestep DT. Timing logic surrounds the timestepping loop, so the initialization phase is not timed. Also, a fake timestep is calculated before beginning timing to eliminate the first-time "paging" effect currently seen on the Intel Paragon systems. STEP computes the first two time levels by two semi-implicit timesteps; normal time-stepping is by a centered leapfrog scheme. STEP calls COMP1, which chooses between an explicit numerical algorithm, a semi-implicit algorithm, and a simplified algorithm associated with solving the advection equation, one of the embedded test cases. The numerical algorithm used is an input parameter. The basic outline of each timestep is the following:
1) Evaluate non-linear product and forcing terms.
2) Fourier transform non-linear terms in place as a block transform.
3) Compute and update divergence, geopotential, and vorticity spectral coefficients. (Much of the calculation of the time update is "bundled" with the Legendre transform.)
4) Compute velocity fields and transform divergence, geopotential, and vorticity back to gridpoint space using a) an inverse Legendre transform and associated computations and b) an inverse real block FFT.
PSTSWM has "fictitious" vertical levels, and all computations are duplicated on the different levels, potentially significantly increasing the granularity of the computation. (The number of vertical levels is an input parameter.) For error analysis, a single vertical level is extracted and analyzed. ------------------------------------------------------------------------------- Brief description of I/O behavior: Processor 0 reads in the input parameters and broadcasts them to the rest of the processors. Processor 0 also receives the error analysis and timing results from the other processors and writes them out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : The processors are treated as a logical 2-D grid. There are three domains to be distributed:
a) physical domain: tensor product longitude-latitude grid
b) Fourier domain: tensor product wavenumber-latitude grid
c) spectral domain: triangular array, where each column contains the spectral coefficients associated with a given wavenumber. The larger the wavenumber, the shorter the column.
An unordered FFT is used, and the Fourier and spectral domains use the "unordered" permutation when the data is distributed.
I) distributed FFT/distributed LT 1) The tensor-product longitude-latitude grid is mapped onto the processor grid by assigning a block of contiguous longitudes to each processor column and by assigning one or two blocks of contiguous latitudes to each processor row. The vertical dimension is not distributed. 2) After the FFT, the subsequent wavenumber-latitude grid is similarly distributed over the processor grid, with a block of the permuted wavenumbers assigned to each processor column. 3) After the LT, the wavenumbers are distributed as before and the spectral coefficients associated with any given wavenumber are either distributed evenly over the processors in the column containing that wavenumber, or are duplicated over the column. What happens is a function of the particular distributed LT algorithm used.
II) transpose FFT/distributed LT 1) same as in (I) 2) Before the FFT, the physical domain is first remapped to a vertical layer-latitude decomposition, with a block of contiguous vertical layers assigned to each processor column and the longitude dimension not distributed. After the transform, the vertical level-latitude grid is distributed as before, and the wavenumber dimension is not distributed. 3) After the LT, the spectral coefficients for a given vertical layer are either distributed evenly over the processors in a column, or are duplicated over that column. What happens is a function of the particular distributed LT algorithm used.
III) distributed FFT/transpose LT 1) same as (I) 2) same as (I) 3) Before the LT, the wavenumber-latitude grid is first remapped to a wavenumber-vertical layer decomposition, with a block of contiguous vertical layers assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
IV) transpose FFT/transpose LT 1) same as (I) 2) same as (II) 3) Before the LT, the vertical level-latitude grid is first remapped to a vertical level-wavenumber decomposition, with a block of the permuted wavenumbers now assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The distribution is a function of the problem size (longitude, latitude, vertical levels), the logical processor grid (PX, PY), and the algorithm (transpose vs. distributed for FFT and LT). ------------------------------------------------------------------------------- Brief description of load balance behavior : The load is fairly well balanced. If PX and PY evenly divide the number of longitudes, latitudes, and vertical levels, then all load imbalances are due to the unequal distribution of spectral coefficients. As described above, the spectral coefficients are laid out as a triangular array in most runs, where each column corresponds to a different Fourier wavenumber. The wavenumbers are partitioned among the processors in most of the parallel algorithms. Since each column is a different length, a wrap mapping of the columns would approximately balance the load. Instead, the natural "unordered" ordering of the FFT is used with a block partitioning, which does a reasonable job of load balancing without any additional data movement. The load imbalance is quantified in Walker et al. [5]. If PX and PY do not evenly divide the dimensions of the physical domain, then other load imbalances may be as large as a factor of 2 in the worst case. ------------------------------------------------------------------------------- Give parameters that determine the problem size : MM, NN, KK - specify the number of Fourier wavenumbers and the spectral truncation used. For a triangular truncation, MM = NN = KK. NLON, NLAT, NVER - number of longitudes, latitudes, and vertical levels. There are required relationships between NLON, NLAT, and NVER, and between these and MM. These relationships are checked in the code.
We will also provide a selection of input files that specify legal (and interesting) problems. DT - timestep (in seconds). (Must be small enough to satisfy the Courant stability condition. The code warns if it is too large, but does not abort.) TAUE - end of model run (in hours) ------------------------------------------------------------------------------- Give memory as function of problem size : Executable size is determined at compile time by setting the parameter COMPSZ in params.i. Per-node memory requirements are approximately (in REALs):
associated Legendre polynomial values: MM*MM*NLAT/(PX*PY)
physical grid fields: 8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 3*MM*MM*NVER/(PX*PY) or (if spectral coefficients are duplicated within a processor column) 3*MM*MM*NVER/PX
work space: 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY) or (if spectral coefficients are duplicated within a processor column) 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX
where BUFS1 and BUFS2 are input parameters (number of communication buffers). BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY. In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory requirements are approximately (2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY) or (2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX). ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : For a serial run, per timestep (very rough):
nonlinear terms: 10*NLON*NLAT*NVER
forward FFT: 40*NLON*NLAT*NVER*LOG2(NLON)
forward LT and time update: 48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
inverse LT and calculation of velocities: 20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
inverse FFT: 25*NLON*NLAT*NVER*LOG2(NLON)
Using the standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1): approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep. For a total run, multiply by TAUE/DT. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : This is a function of the algorithm chosen.
I) transpose FFT a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY); 2*(PX-1) steps, D volume, or 2*LOG2(PX) steps, D*LOG2(PX) volume
II) distributed FFT a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY); 2*LOG2(PX) steps, D*LOG2(PX) volume
III) transpose LT a) forward LT: let D = 8*NLON*NLAT*NVER/(PX*PY); 2*(PY-1) steps, D volume, or 2*LOG2(PY) steps, D*LOG2(PY) volume b) inverse LT: let D = (3/2)*(MM**2)*NVER/(PX*PY); (PY-1) steps, D volume, or LOG2(PY) steps, D*PY volume
IV) distributed LT a) forward + inverse LT: let D = 3*(MM**2)*NVER/(PX*PY); 2*(PY-1) steps, D*PY volume, or 2*LOG2(PY) steps, D*PY volume
These are per-timestep costs. Multiply by TAUE/DT for the total communication overhead. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : Standard input files will be provided for
T21:  MM=NN=KK=21   NLON=64    NLAT=32    NVER=8    ICOND=2   DT=4800.0   TAUE=120.0
T42:  MM=NN=KK=42   NLON=128   NLAT=64    NVER=16   ICOND=2   DT=2400.0   TAUE=120.0
T85:  MM=NN=KK=85   NLON=256   NLAT=128   NVER=32   ICOND=2   DT=1200.0   TAUE=120.0
These are 5-day runs of the "benchmark" case specified in Williamson et al. [3].
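Purely as a cross-check (not part of the submission), the per-timestep estimate above can be evaluated for these three cases. The only added assumption is that, since TAUE is given in hours and DT in seconds, the number of timesteps is 3600*TAUE/DT.

    from math import log2

    def pstswm_total_flops(mm, dt, taue):
        """Rough serial flop count from the per-timestep estimate above."""
        per_step = 460 * mm**3 + 348 * mm**3 * log2(mm) + 24 * mm**4
        steps = 3600.0 * taue / dt        # TAUE in hours, DT in seconds
        return per_step * steps

    for mm, dt in ((21, 4800.0), (42, 2400.0), (85, 1200.0)):
        print(f"T{mm}: {pstswm_total_flops(mm, dt, 120.0):.1e} flops")
    # T21: 2.1e+09   T42: 4.5e+10   T85: 1.0e+12

These values agree with the serial flop totals quoted next.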
Flops and memory requirements for serial runs are as follows (approx.): T21: 500,000 REALs 2,000,000,000 flops T42: 4,000,000 REALs 45,000,000,000 flops T85: 34,391,000 REALs 1,000,000,000,000 flops Both memory and flops scale well, so, for example, the T42 run fits in approx. 4MB of memory for a 4 processor run. But different algorithms and different aspect ratios of the processor grid use different amounts of memory. ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count by hand (looking primarily at inner loops, but eliminating common subexpressions that compiler is expected to find). ------------------------------------------------------------------------------- From owner-pbwg-compactapp@CS.UTK.EDU Thu Oct 28 08:53:23 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA11659; Thu, 28 Oct 93 08:53:23 -0400 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07386; Thu, 28 Oct 93 08:52:54 -0400 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Thu, 28 Oct 1993 08:52:53 EDT Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA07372; Thu, 28 Oct 93 08:52:51 -0400 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA13457; Thu, 28 Oct 1993 08:52:59 -0400 Date: Thu, 28 Oct 1993 08:52:59 -0400 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9310281252.AA13457@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: SOLVER Compact Application Received: from sun2.nsfnet-relay.ac.uk by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA21681; Mon, 18 Oct 1993 01:55:44 -0400 Via: uk.ac.edinburgh.castle; Mon, 18 Oct 1993 06:31:49 +0100 Received: from epcc.ed.ac.uk by castle.ed.ac.uk id aa21204; 18 Oct 93 6:31 BST Received: from subnode.epcc.ed.ac.uk (feldspar.epcc.ed.ac.uk) by epcc.ed.ac.uk; Sun, 17 Oct 93 16:28:48 BST Date: Sun, 17 Oct 93 16:28:46 BST Message-Id: <2567.9310171528@subnode.epcc.ed.ac.uk> From: S P Booth Subject: Re: ParkBench applications To: "David W. Walker" In-Reply-To: David W. Walker's message of Fri, 15 Oct 93 13:23:46 -0500 Sorry I took so long to reply to this. If any of this needs any futher clarification don't hesitate to send me some email. spb ------------------------------------------------------------------------- PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM To submit a compact application to the ParkBench suite you must follow the following procedure: 1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite. 2. If ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files documents and papers together constitute your application package. 
Your application package should be sent to: David Walker Oak Ridge National Laboratory Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 (615) 574-7401/0680 (phone/fax) walker@msr.epm.ornl.gov The street address is "Bethel Valley Road" if Fedex insists on this. The subcommittee will then make a final decision on whether to include your application in the ParkBench suite. 3. If your application is approved for inclusion in the ParkBench suite you (or some authorized person from your organization) will be asked to complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package. ------------------------------------------------------------------------------- Name of Program : SOLVER : ------------------------------------------------------------------------------- Submitter's Name : Stephen P. Booth Submitter's Organization: UKQCD collaboration Submitter's Address : EPCC The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland Submitter's Telephone # : +44 (0)31 650 5746 Submitter's Fax # : +44 (0)31 622 4712 Submitter's Email : spb@epcc.ed.ac.uk ------------------------------------------------------------------------------- Cognizant Expert(s) : Dr S.P. Booth CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5746 CE's Fax # : +44 (0)31 622 4712 CE's Email : spb@epcc.ed.ac.uk Cognizant Expert(s) : Dr R.D. Kenway CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5245 CE's Fax # : +44 (0)31 622 4712 CE's Email : rdk@epcc.ed.ac.uk ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : S. Booth is prepared to respond quickly to questions and bug reports. We have a strong interest in the portability and performance of this code. ------------------------------------------------------------------------------- Major Application Field : Lattice gauge theory Application Subfield(s) : QCD ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : SOLVER is part of an ongoing software development exercise carried out by UKQCD (the United Kingdom Quantum Chromo-Dynamics collaboration) to develop a new generation of simulation codes. The current generation of codes was highly tuned for a particular machine architecture, so a software development exercise was started to design and develop a set of portable codes. This code was developed by S. Booth and N. Stanford of the University of Edinburgh during the course of 1993. SOLVER is a benchmark code derived from the codes used to generate quark propagators. It is designed to benchmark and validate the computational sections of this operation. It differs from the production code in that it self-initialises to non-trivial test data rather than performing file access. This is because there is no accepted standard for parallel file access. The benchmark was originally developed as part of a national UK procurement exercise.
------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : The code may be freely distributed for benchmarking purposes, but the code remains the property of UKQCD and we ask to be contacted if anyone wishes to use it as an application code. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: All floating point numbers are defined as macros (either Fpoint or Dpoint). The majority of the variables are Fpoint; Dpoint is only used for accumulation values that may require higher precision. This allows the precision of the program to be changed easily. For small and intermediate problem sizes, 4-byte Fpoints and 8-byte Dpoints should be sufficient. For large problems higher precision may be required. INTEGERs must be large enough to hold the number of sites allocated to a processor (4 bytes is almost certainly sufficient). The COMPLEX type is not used. ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : Documentation exists for all program routines except some low-level routines local to a single source file. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : ------------------------------------------------------------------------------- Other relevant research papers: ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : Two versions of the application were developed in parallel: 1) an HPF version (both CMF and HPF directives), and 2) a message passing version. The message passing version uses ANSI Fortran 77 with the following extensions: a) CPP is used for include files and some simple macros and build-time conditionals; b) the F77 restrictions on variable names are not adhered to, though the authors have tools to convert the code to conform. All of the message passing operations are confined to a small number of routines. These routines were designed to be implementable in as many different message passing systems as possible. Current versions are: 1) fake - converts the program to a single-processor code; 2) PARMACS - the original parallel version; 3) PVM - under development. ------------------------------------------------------------------------------- Total number of lines in source code: 15567 Number of lines excluding comments : 10679 Size in bytes of source code : 432398 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : None ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: formatted text ------------------------------------------------------------------------------- Brief, high-level description of what application does: The application generates quark propagators from a background gauge configuration and a fermionic source.
This is equivalent to solving M psi = source, where psi is the quark propagator and M (a function operating on psi) depends on the gauge fields. The benchmark performs a cut-down version of this operation. ------------------------------------------------------------------------------- Main algorithms used: Conjugate gradient least norm with red-black pre-conditioning. ------------------------------------------------------------------------------- Skeleton sketch of application: The benchmark code initialises the gauge field to a unit gauge configuration. (The results for a unit gauge can be calculated analytically, allowing a check on the results.) A gauge transformation is then applied to the gauge field. A unit gauge field consists only of zeros and ones; applying a gauge transformation generates non-trivial values. Quantities corresponding to physical observables should be unchanged by such a transformation. In application code the gauge field would have been read in from disk. The source field is initialised to a point source (a single non-zero point on one lattice site). An iterative solver is called to generate the quark propagator. The solver routine also generates timing information. In application code this would then be dumped to disk. In the benchmark we use the quark propagator to generate a physically significant quantity (the pion propagator). This generates a single real number for each timeslice of the lattice. These values are printed to standard out. This procedure requires a large number of iterations. For benchmarking we are only interested in the time per iteration and some check on the validity of the results. We therefore usually perform only a fixed number of iterations (say 50) to generate accurate timing information, and verify the results by comparison with other machines. ------------------------------------------------------------------------------- Brief description of I/O behaviour: Unless an error occurs, a single processor outputs to standard out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : A spatial decomposition is used to distribute the 4-D arrays over a 4-D grid of processors. Each dimension is distributed independently. The program supports non-regular decomposition, e.g. a lattice of width 22 will be distributed across a processor-grid of width 4 as (6, 6, 5, 5). ------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : Lattice size: NX NY NZ NT processor grid: NPX NPY NPZ NPT ------------------------------------------------------------------------------- Brief description of load balance behavior : Load balancing depends only on the distribution: if the lattice size can be exactly divided by the processor grid size, all processors will have the same workload. In practice it is often useful to trade load balancing for a larger number of processors. ------------------------------------------------------------------------------- Give parameters that determine the problem size : Lattice size, NX NY NZ NT; problem size is NX*NY*NZ*NT.
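A minimal sketch (not part of SOLVER) of the non-regular decomposition described above, under the assumption that the first few processors in a dimension each receive one extra site; the helper name is invented for the example.

    def block_sizes(n, p):
        """Split n lattice sites in one dimension over p processors as evenly
        as possible; the first n % p processors get one extra site."""
        base, extra = divmod(n, p)
        return [base + 1 if i < extra else base for i in range(p)]

    print(block_sizes(22, 4))      # -> [6, 6, 5, 5], as in the example above

    # Global problem size for an NX x NY x NZ x NT lattice, e.g. 24^3 * 48:
    NX = NY = NZ = 24; NT = 48
    print(NX * NY * NZ * NT)       # -> 663552 lattice sites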
------------------------------------------------------------------------------- Give memory as function of problem size : In a production environment there are build-time parameters that set the array sizes, and problem/machine sizes can be set at runtime. When creating a benchmark program it seemed less confusing to set lattice and processor-grid sizes at build time and derive all other quantities from them. The appropriate parameters for memory use are Max_body (maximum number of data points per processor) and Max_bound (maximum number of data points on a single boundary between two processors). If LX, LY, LZ, LT are the local lattice sizes obtained by dividing the lattice size by the processor-grid size and rounding up to the nearest integer, then
Max_body = (LX*LY*LZ*LT)/2
Max_bound = MAX( LX*LY*LZ/2, LY*LZ*LT/2, LX*LZ*LT/2, LX*LY*LT/2 )
The code contains a number of build-time switches for variations in the implementation that may be beneficial on some machines. The memory usage depends on these switches, but typical values are: 108 * Max_body + 36 * Max_bound Fpoints and 16 * (Max_body + Max_bound) INTEGERs. ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Each iteration performs 2760 floating point operations per lattice site, i.e., 50 iterations on a 24^3*48 lattice = 9.16e+10 floating point operations. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : For each iteration every processor sends 24 messages to each of its 8 neighbours; each message contains one floating point number for each lattice point in the common boundary. Two global sum operations are also performed for each iteration. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) :
18^3*36   2.90e+10 fp operations
24^3*48   9.16e+10 fp operations
36^3*72   4.64e+11 fp operations
------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count operations in each loop by hand. The code contains a counter to sum these values. ------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- From owner-pbwg-compactapp@CS.UTK.EDU Wed Nov 3 09:19:23 1993 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with SMTP (5.61+IDA+UTK-930125/2.8t-netlib) id AA22427; Wed, 3 Nov 93 09:19:23 -0500 Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA27464; Wed, 3 Nov 93 09:18:54 -0500 X-Resent-To: pbwg-compactapp@CS.UTK.EDU ; Wed, 3 Nov 1993 09:18:53 EST Errors-To: owner-pbwg-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930922/2.8s-UTK) id AA27455; Wed, 3 Nov 93 09:18:52 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA15591; Wed, 3 Nov 1993 09:18:51 -0500 Date: Wed, 3 Nov 1993 09:18:51 -0500 From: walker@rios2.epm.ornl.gov (David Walker) Message-Id: <9311031418.AA15591@rios2.epm.ornl.gov> To: pbwg-compactapp@cs.utk.edu Subject: ARCO Compact Application Submission ------------------------------------------------------------------------------- Name of Program : ARCO Parallel Seismic Processing Benchmarks ------------------------------------------------------------------------------- Submitter's Name : Charles C. Mosher
Submitter's Organization: ARCO Exploration and Production Technology Submitter's Address : 2300 West Plano Parkway Plano, TX 75075-8499 Submitter's Telephone # : (214)754-6468 Submitter's Fax # : (214)754-3016 Submitter's Email : ccm@arco.com ------------------------------------------------------------------------------- Cognizant Expert(s) : Charles C. Mosher Cognizant Expert(s) : Siamak Hassanzadeh (co-author) CE's Organization : Fujitsu America CE's Email : siamak@fai.com ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : Will handle reasonable requests in a timely fashion. ------------------------------------------------------------------------------- Major Application Field : Seismic Data Processing Application Subfield(s) : Parallel I/O, signal processing, solution of PDE's ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : The application began as a prototype system for seismic data processing on parallel computing architectures. The prototype was used to design and implement production seismic processing on ARCO's Intel iPSC/860, where it is used today. Like other companies, ARCO continues to upgrade its HPC facilities. We found that we were spending a large amount of time on benchmarking, as were other companies in the oil industry. We decided to place our system in the public domain as a benchmark suite, in the hope that the benchmarking effort could be spread across many participants. In addition, we hope to use the system as a mechanism for code development and sharing between academia, national labs, and industry. Our first attempt was to work with the Perfect Benchmark Club at the University of Illinois Center for Supercomputing Research and Development. Many members of that group provided valuable input that significantly improved the structure and content of the suite. Special thanks to David Schneider for his work on organizing and managing the Perfect effort. Perfect has since disbanded, which leads us to the ParkBench submission. A consulting organization (Resource 2000) has also picked up the code and is providing newsletter subscriptions to participants in the oil industry describing both benchmark numbers and commentary on the usability of the systems tested. Thanks to Randy Premont, Gary Montry, and Clive Bailley of Resource 2000 for their continuing work to make the ARCO suite a viable benchmark. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : The code may be freely distributed. We request that ARCO and the authors be acknowledged in publications. In order to ensure the relevance of the codes in the suite, the authors plan to retain control of the source and algorithms contained therein, and request that suggestions for changes and updates be directed to the authors only.
------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: Integers : 4 bytes Floats : 4 bytes ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : High level: ARCO Seismic Benchmark Suite User's Guide Low level: source comments ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : Yilmaz, Ozdogan, 1990, Seismic Data Processing: Investigations in Geophysics vol. 2, Society of Exploration Geophysicists, P.O. Box 702740, Tulsa, Oklahoma, 74170 ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : Mosher, C., Hassanzadeh, S., and Schneider, D., 1992, A Benchmark Suite for Parallel Seismic Processing, Supercomputing 1992 proceedings. ------------------------------------------------------------------------------- Other relevant research papers: ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : Language: Fortran 77 Message Passing: Yet Another Message Passing Layer (YAMPL) Sample implementations for PVM, Intel NX, TCGMSG Machines Supported: Workstation clusters and multiprocessors (e.g. Sun, DEC, HP, IBM, SGI), Cray YMP, Intel iPSC/860 ------------------------------------------------------------------------------- Total number of lines in source code: ~ 20000 Number of lines excluding comments : ~ 15000 Size in bytes of source code : ~ 1 MByte ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : ASCII parameter files, 10-100 lines ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : Binary seismic data files, 1 MByte (small), 1 GByte (medium), 10 GByte (large), 100 GByte (huge) ------------------------------------------------------------------------------- Brief, high-level description of what application does: Synthetic seismic data for small, medium and large test cases are generated in the native format of the target machine. The test data are read and processed in parallel, and the output is written to disk. Simple checksum and timing tables are printed to standard output. A simple X Windows image display tool is used to verify correctness of results. ------------------------------------------------------------------------------- Main algorithms used: Signal processing (FFTs, Toeplitz equation solvers, interpolation); Seismic imaging (Fourier domain, Kirchhoff integral, finite difference algorithms) ------------------------------------------------------------------------------- Skeleton sketch of application: Processing modules are applied in a pipeline fashion to 2D arrays of seismic data read from disk. Processing flows are of the form READ-FLTR-MIGR-WRIT. The same flow is executed on all processors. Individual modules communicate via message passing to implement parallel algorithms.
Nearly all message passing is hidden via transpose operations that change the parallel data distribution as appropriate for each algorithm. ------------------------------------------------------------------------------- Brief description of I/O behavior: 2D arrays are read/written from HDF-style files on disk. Parallel I/O is supported both for a single large file read by multiple processors and for a separate file read by each processor. A significant part of the seismic processing flow requires data to be read in transposed fashion across all processors. ------------------------------------------------------------------------------- Brief description of load balance behavior : Assumes a homogeneous array of processors with similar capabilities. Load balance is rudimentary, with an attempt to distribute equal-sized 'workstation' chunks of work. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : Seismic data is inherently parallel, with large data sets that offer multiple opportunities for parallel operation. Typically, the data is treated as a collection of 2D arrays, with each processor owning a 'slab' of data. ------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The data is defined as a 4-dimensional array with Fortran dimensions (sample, trace, frame, volume). The third dimension (frame) is typically spread across the processors. ------------------------------------------------------------------------------- Give parameters that determine the problem size : The ASCII parameter files define the data set size in terms of the number of samples per seismic trace, the number of traces per shot, the number of shooting lines, and the number of 3D volumes. ------------------------------------------------------------------------------- Give memory as function of problem size : Requires enough memory to hold 2 frames on each node, and a 3D volume spread across the nodes. ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Reported by the code as appropriate. On a Cray YMP, medium-sized problems with 750 MB of output run at 30-100 Mflops for about an hour. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : On an Intel iPSC/860, there are parts of the suite that have comp/comm ratios ranging from near-infinite to 1/10. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : small: 1 MB output, 10 sec on YMP medium: 1 GB output, 1 hour on YMP large: 10 GB output, 10 hours on YMP ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Hand count for simple operations; regression analysis of Cray HPM results for more complex operations.
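To make the slab distribution described above concrete, here is a small sketch that is not part of the ARCO suite; the array sizes, processor count, and helper name are invented for the example, and only the idea of spreading the frame dimension across processors comes from the submission.

    import numpy as np

    # Invented sizes: Fortran dimensions (sample, trace, frame, volume).
    nsamp, ntrace, nframe, nvol = 512, 60, 32, 2
    nproc = 4

    def my_frames(rank, nframe=nframe, nproc=nproc):
        """Frame indices owned by one processor under a block 'slab' split."""
        per_proc = nframe // nproc
        return range(rank * per_proc, (rank + 1) * per_proc)

    # Local slab held by processor 1; a transpose step would redistribute the
    # data along a different axis before the next module runs, which is where
    # the message passing described above is hidden.
    rank = 1
    local_slab = np.zeros((nsamp, ntrace, len(my_frames(rank)), nvol),
                          dtype=np.float32)
    print(local_slab.shape)        # -> (512, 60, 8, 2)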
------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 09:57:45 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id JAA13757; Tue, 22 Mar 1994 09:57:44 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id JAA09199; Tue, 22 Mar 1994 09:57:20 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 09:57:19 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id JAA09186; Tue, 22 Mar 1994 09:57:17 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA24475; Tue, 22 Mar 1994 09:57:26 -0500 Message-Id: <9403221457.AA24475@rios2.epm.ornl.gov> To: ccm@arco.com Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 09:57:26 -0500 From: "David W. Walker" Dear Dr. Mosher, Thank you for submitting the ARCO Parallel Seismic Processing Benchmarks for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. I would be grateful if you would arrange for the source code, input, and output files to be sent to me. To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of the following papers (if available) Mosher, C., Hassanzadeh, S., and Schneider, D., 1992, A Benchmark Suite for Parallel Seismic Processing, Supercomputing 1992 proceedings. ARCO Seismic Benchark Suite Users's Guide and any other relevant papers you may have online. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. All the above will probably come to several Mbytes, so it is probably not appropriate to email it to me. Do you have an anonymous ftp site where I could copy the files from? Best Regards, David Walker From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:12:48 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id KAA13948; Tue, 22 Mar 1994 10:12:48 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10288; Tue, 22 Mar 1994 10:11:05 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:10:55 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10257; Tue, 22 Mar 1994 10:10:50 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA18866; Tue, 22 Mar 1994 10:07:46 -0500 Message-Id: <9403221507.AA18866@rios2.epm.ornl.gov> To: mia@unixa.nerc-bidston.ac.uk Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 10:07:46 -0500 From: "David W. Walker" Dear Dr. Ashworth, Thank you for submitting the POLMP code for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. 
I would be grateful if you would arrange for the source code, input, and output files to be sent to me. To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission (v200, wa200, xb200) 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of the following papers mentioned in your submission describing the sequential and parallel codes (if available). Also the users guide if there is one. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. If the above files are too large to email to me, please let me know if there is an anonymous ftp site where I can copy them from. Best Regards, David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- ------------------------------------------------------------------------------- Name of Program : POLMP (Proudman Oceanographic Laboratory Multiprocessing Program) ------------------------------------------------------------------------------- Submitter's Name : Mike Ashworth Submitter's Organization: NERC Computer Services Submitter's Address : Bidston Observatory Birkenhead, L43 7RA, UK Submitter's Telephone # : +44-51-653-8633 Submitter's Fax # : +44-51-653-6269 Submitter's Email : mia@ua.nbi.ac.uk ------------------------------------------------------------------------------- Cognizant Expert : Mike Ashworth CE's Organization : NERC Computer Services CE's Address : Bidston Observatory Birkenhead, L43 7RA, UK CE's Telephone # : +44-51-653-8633 CE's Fax # : +44-51-653-6269 CE's Email : mia@ua.nbi.ac.uk ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : Bearing in mind other commitments, Mike Ashworth is prepared to respond quickly to questions and bug reports, and expects to be kept informed as to results of experiments and modifications to the code. ------------------------------------------------------------------------------- Major Application Field : Fluid Dynamics Application Subfield(s) : Ocean and Shallow Sea Modeling ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : The POLMP project was created to develop numerical algorithms for shallow sea 3D hydrodynamic models that run efficiently on modern parallel computers. A code was developed, using a set of portable programming conventions based upon standard Fortran 77, which follows the wind induced flow in a closed rectangular basin including a number of arbitrary land areas. The model solves a set of hydrodynamic partial differential equations, subject to a set of initial conditions, using a mixed explicit/implicit forward time integration scheme. 
The explicit component corresponds to a horizontal finite difference scheme and the implicit to a functional expansion in the vertical (Davies, Grzonka and Stephens, 1989). By the end of 1989 the code had been implemented on the RAL 4 processor Cray X-MP using Cray's microtasking system, which provides parallel processing at the level of the Fortran DO loop. Acceptable parallel performance was achieved by integrating each of the vertical modes in parallel, referred to in Ashworth and Davies (1992) as vertical partitioning. In particular, a speed-up of 3.15 over single processor execution was obtained, with an execution rate of 548 Megaflops corresponding to 58 per cent of the peak theoretical performance of the machine. Execution on an 8 processor Cray Y-MP gave a speed-up efficiency of 7.9 and 1768 Megaflops or 67 per cent of the peak (Davies, Proctor and O'Neill, 1991). The latter resulted in Davies and Grzonka being awarded a prize in the 1990 Cray Gigaflop Performance Awards . The project has been extended by implementing the shallow sea model in a form which is more appropriate to a variety of parallel architectures, especially distributed memory machines, and to a larger number of processors. It is especially desirable to be able to compare shared memory parallel architectures with distributed memory architectures. Such a comparison is currently relevant to NERC science generally and will be a factor in the considerations for the purchase of new machines, bids for allocations on other academic machines, and for the design of new codes and the restructuring of existing codes. In order to simplify development of the new code and to ensure a proper comparison between machines, a restructured version of the Davies and Grzonka rectangle was designed which will perform partitioning of the region in the horizontal dimension. This has the advantage over vertical partitioning that the communication between processors is limited to a few points at the boundaries of each sub-domain. The ratio of interior points to boundary points, which determines the ratio of computation to communication and hence the efficiency on message passing, distributed memory machines, may be increased by increasing the size of the individual sub-domains. This design may also improve the efficiency on shared memory machines by reducing the time of the critical section and reducing memory conflicts between processors. In addition, the required number of vertical modes is only about 16, which, though well suited to a 4 or 8 processor machine, does not contain sufficient parallelism for more highly parallel machines. The code has been designed with portability in mind, so that essentially the same code may be run on parallel computers with a range of architectures. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : Yes, but users are requested to acknowledge the authors (Ashworth and Davies) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. 
------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: Some 8 byte floating point numbers are used in some of the initialization code, but calculations on the main field arrays may be done using 4 byte floating point variables without grossly affecting the solution. Nevertheless, precision conversion is facilitated by a switch supplied to the C preprocessor. By specifying -DSINGLE, variables will be declared as REAL, normally 4 bytes, whereas -DDOUBLE will cause declarations to be DOUBLE PRECISION, normally 8 bytes. ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : The README file supplied with the code describes how the various versions of the code should be built. Extensive documentation, including the definition of all variables in COMMON, is present as comments in the code. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms :
1) Davies, A.M., Formulation of a linear three-dimensional hydrodynamic sea model using a Galerkin-eigenfunction method, Int. J. Num. Meth. in Fluids, 1983, Vol. 3, 33-60.
2) Davies, A.M., Solution of the 3D linear hydrodynamic equations using an enhanced eigenfunction approach, Int. J. Num. Meth. in Fluids, 1991, Vol. 13, 235-250.
------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms :
1) Ashworth, M. and Davies, A.M., Restructuring three-dimensional hydrodynamic models for computers with low and high degrees of parallelism, in Parallel Computing '91, eds D.J.Evans, G.R.Joubert and H.Liddell (North Holland, 1992), 553-560.
2) Ashworth, M., Parallel Processing in Environmental Modelling, in Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993.
3) Ashworth, M. and Davies, A.M., Performance of a Three Dimensional Hydrodynamic Model on a Range of Parallel Computers, in Proceedings of the Euromicro Workshop on Parallel and Distributed Computing, Gran Canaria 27-29 January 1993, pp 383-390, (IEEE Computer Society Press)
4) Davies, A.M., Ashworth, M., Lawrence, J., O'Neill, M., Implementation of three dimensional shallow sea models on vector and parallel computers, 1992a, CFD News, Vol. 3, No. 1, 18-30.
5) Davies, A.M., Grzonka, R.G. and Stephens, C.V., The implementation of hydrodynamic numerical sea models on the Cray X-MP, 1992b, in Advances in Parallel Computing, Vol. 2, edited D.J. Evans.
6) Davies, A.M., Proctor, R. and O'Neill, M., "Shallow Sea Hydrodynamic Models in Environmental Science", Cray Channels, Winter 1991.
------------------------------------------------------------------------------- Other relevant research papers: ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : Code is initially passed through the C preprocessor, allowing a number of versions with different programming styles, precisions and machine dependencies to be generated.
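The -DSINGLE / -DDOUBLE precision switch described above is a standard C-preprocessor idiom; a minimal sketch of one way such a switch could be realized is given below. It is illustrative only and is not taken from the POLMP source; the macro name FLOAT and the routine and array names are invented for the example.

#ifdef SINGLE
#define FLOAT REAL
#endif
#ifdef DOUBLE
#define FLOAT DOUBLE PRECISION
#endif
C     A dummy-argument array declared with the selected precision;
C     compiling with -DSINGLE gives REAL, with -DDOUBLE gives
C     DOUBLE PRECISION.
      SUBROUTINE DEMO ( ZETA, N, M )
      INTEGER N, M
      FLOAT ZETA(N,M)
C     ... calculations on ZETA are unchanged whichever precision is chosen
      END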
Fortran 77 version

A sequential version of POLMP is available, which conforms to the Fortran 77 standard. This version has been run on a large number of machines from workstations to supercomputers and any code which caused problems, even if it conformed to the standard, has been changed or removed. Thus its conformance to the Fortran 77 standard is well established. In order to allow the code to run on a wide range of problem sizes without recompilation, the major arrays are defined dynamically by setting up pointers, with names starting with IX, which point to locations in a single large data array: SA. Most pointers are allocated in subroutine MODSUB and the starting location passed down into subroutines in which they are declared as arrays. For example :

      IX1 = 1
      IX2 = IX1 + N*M
      CALL SUB ( SA(IX1), SA(IX2), N, M )

      SUBROUTINE SUB ( A1, A2, N, M )
      DIMENSION A1(N,M), A2(N,M)
      END

Although this is probably against the spirit of the Fortran 77 standard, it is considered the best compromise between portability and utility, and has caused no problems on any of the machines on which it has been tried. The code has been run on a number of traditional vector supercomputers, mainframes and workstations. In addition, key loops are able to be parallelized automatically by some compilers on shared (or virtual shared) memory MIMD machines, allowing parallel execution on the Convex C2 and C3, Cray X-MP, Y-MP, and Y-MP/C90, and Kendall Square Research KSR-1. Cray macrotasking calls may also be enabled for an alternative mode of parallel execution on Cray multiprocessors.

Message passing version

POLMP has been implemented on a number of message-passing machines: Intel iPSC/2 and iPSC/860, Meiko CS-1 i860 and CS-2 and nCUBE 2. Code is also present for the PVM and Parmacs portable message passing systems, and POLMP has run successfully, though not efficiently, on a network of Silicon Graphics workstations. Calls to message passing routines are concentrated in a small number of routines for ease of portability and maintenance. POLMP performs housekeeping tasks on one node of the parallel machine, usually node zero, referred to in the code as the driver process, the remaining processes being workers. For Parmacs version 5 which requires a host program, a simple host program has been provided which loads the node program onto a two dimensional torus and then takes no further part in the run, other than to receive a completion code from the driver, in case terminating the host early would interfere with execution of the nodes.

Data parallel versions

A data parallel version of the code has been run on the Thinking Machines CM-2, CM-200 and MasPar MP-1 machines. High Performance Fortran (HPF) defines extensions to the Fortran 90 language in order to provide support for parallel execution on a wide variety of machines using a data parallel programming model. The subset-HPF version of the POLMP code has been written to the draft standard specified by the High Performance Fortran Forum in the HPF Language Specification version 0.4 dated November 6, 1992. Fortran 90 code was developed on a Thinking Machines CM-200 machine and checked for conformance with the Fortran 90 standard using the NAGWare Fortran 90 compiler. HPF directives were inserted by translating from the CM Fortran directives, but have not been tested due to the lack of access to an HPF compiler. The only HPF features used are the PROCESSORS, TEMPLATE, ALIGN and DISTRIBUTE directives and the system inquiry intrinsic function NUMBER_OF_PROCESSORS.
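For readers unfamiliar with the HPF features just listed, the following generic fragment shows the kind of data mapping they express. It is a sketch only, not the actual POLMP declarations: the template name GRID, the processors name PROCS, the array names and the extents are all invented for the illustration.

!     Illustrative subset-HPF fragment (not from the POLMP source)
      INTEGER, PARAMETER :: NX = 512, NY = 512
      REAL Z(NX,NY), U(NX,NY), V(NX,NY)
!HPF$ PROCESSORS PROCS(NUMBER_OF_PROCESSORS())
!HPF$ TEMPLATE GRID(NX,NY)
!HPF$ DISTRIBUTE GRID(*,BLOCK) ONTO PROCS
!HPF$ ALIGN Z(I,J) WITH GRID(I,J)
!HPF$ ALIGN U(I,J) WITH GRID(I,J)
!HPF$ ALIGN V(I,J) WITH GRID(I,J)

Here each field array is aligned with a template whose second dimension is block-distributed across however many processors are available at run time; the decomposition itself is then left entirely to the compiler, as described above.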
------------------------------------------------------------------------------- Total number of lines in source code: 26,699 Number of lines excluding comments : 11,313 Size in bytes of source code : 756,107 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : steering file: 13 lines, 250 bytes, ascii (typical size) ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: 700 lines, 62,000 bytes, ascii (typical size) ------------------------------------------------------------------------------- Brief, high-level description of what application does: POLMP solves the linear three-dimensional hydrodynamic equations for the wind induced flow in a closed rectangular basin of constant depth which may include an arbitrary number of land areas. ------------------------------------------------------------------------------- Main algorithms used: The discretized form of the hydrodynamic equations are solved for field variables, z, surface elevation, and u and v, horizontal components of velocity. The fields are represented in the horizontal by a staggered finite difference grid. The profile of vertical velocity with depth is represented by the superposition of a number of spectral components. The functions used in the vertical are arbitrary, although the computational advantages of using eigenfunctions (modes) of the eddy viscosity profile have been demonstrated (Davies, 1983). Velocities at the closed boundaries are set to zero. Each timestep in the forward time integration of the model, involves successive updates to the three fields, z, u and v. New field values computed in each update are used in the subsequent calculations. A five point finite difference stencil is used, requiring only nearest neighbours on the grid. A number of different data storage and data processing methods is included mainly for handling cases with significant amounts of land, e.g. index array, packed data. In particular the program may be switched between masked operation, more suitable for vector processors, in which computation is done on all points, but land and boundary points are masked out, and strip-mining, more suitable for scalar and RISC processors, in which calculations are only done for sea points. ------------------------------------------------------------------------------- Skeleton sketch of application: The call chart of the major subroutines is represented thus: AAAPOL -> APOLMP -> INIT -> RUNPOL -> INIT2 -> MAP -> DIVIDE -> PRMAP -> GENSTP -> SPEC -> ROOTS -> TRANS -> SNDWRK -> RCVWRK -> SETUP -> MODSUB -> MODEL -> ASSIGN -> GENMSK -> GENSTP -> GENIND -> GENPAC -> METRIC -> CLRFLD -> TIME* -> SNDBND -> RCVBND -> RESULT -> SNDRES -> RCVRES -> MODOUT -> OZUVW -> OUTFLD -> GETRES -> OUTARR -> GRYARR -> WSTATE AAAPOL is a dummy main program calling APOLMP. APOLMP calls INIT which reads parameters from the steering file, checks and monitors them. RUNPOL is then called which calls another initialization routine INIT2. Called from INIT2, MAP forms a map of the domain to be modelled, DIVIDE divides the domain between processors, PRMAP maps sub-domains onto processors, GENSTP counts indexes for strip-mining and SPEC, ROOTS and TRANS set up the coefficients for the spectral expansion. SNDWRK on the driver process sends details of the sub-domain to be worked on to each worker. 
RCVWRK receives that information. SETUP does some array allocation and MODSUB does the main allocation of array space to the field and ancillary arrays. MODEL is the main driver subroutine for the model. ASSIGN calls routines to generate masks, strip-mining indexes, packing indexes and measurement metrics. CLRFLD initializes the main data arrays. Then one of seven time-stepping routines, TIME*, is chosen dependent on the vectorization and packing/indexing method used to cope with the presence of land. SNDBND and RCVBND handle the sending and reception of boundary data between sub-domains. After the required number of time-steps is complete, RESULT saves results from the desired region, and SNDRES on the workers and RCVRES on the driver collect the result data. MODOUT handles the writing of model output to standard output and disk files, as required. For a non-trivial run, 99% of time is spent in whichever of the timestepping routines, TIME*, has been chosen. ------------------------------------------------------------------------------- Brief description of I/O behavior: The driver process, usually processor 0, reads in the input parameters and broadcasts them to the rest of the processors. The driver also receives the results from the other processors and writes them out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : The processors are treated as a logical 2-D grid. The simulation domain is divided into a number of sub-domains which are allocated, one sub-domain per processor. ------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The number of processors, p, and the number of sub-domains are provided as steering parameters, as is a switch which requests either one-dimensional or two-dimensional partitioning. Partitioning is only actually carried out for the message passing versions of the code. For two-dimensional partitioning p is factored into px and py where px and py are as close as possible to sqrt(p) (a small illustrative sketch is given below). For the data parallel version the number of sub-domains is set to one and decomposition is performed by the compiler via data distribution directives. ------------------------------------------------------------------------------- Brief description of load balance behavior : Unless land areas are specified, the load is fairly well balanced. If px and py evenly divide the number of grid points, then the model is perfectly balanced except that boundary sub-domains have fewer communications. No tests with land areas have yet been performed with the parallel code, and more sophisticated domain decomposition algorithms have not yet been included. ------------------------------------------------------------------------------- Give parameters that determine the problem size :
nx, ny   Size of horizontal grid
m        Number of vertical modes
nts      Number of timesteps to be performed
------------------------------------------------------------------------------- Give memory as function of problem size : See below for specific examples.
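Returning to the two-dimensional partitioning described under the data distribution parameters above: factoring p into px and py as close as possible to sqrt(p) takes only a few lines. The sketch below is illustrative only; it is not the POLMP routine, and the name FACTP is invented for this example.

C     Illustrative sketch (not from the POLMP source): factor the
C     processor count P into PX*PY with PX as close as possible to
C     SQRT(P) from below, so that PX <= PY.
      SUBROUTINE FACTP ( P, PX, PY )
      INTEGER P, PX, PY
      PX = INT( SQRT( REAL(P) ) )
   10 CONTINUE
      IF ( MOD( P, PX ) .NE. 0 ) THEN
         PX = PX - 1
         GOTO 10
      END IF
      PY = P / PX
      END

For example, p = 16 gives px = py = 4, while p = 12 gives px = 3 and py = 4.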
------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Assuming standard compiler optimizations, there is a requirement for 29 floating point operations (18 add/subtracts and 11 multiplies) per grid point, so the total computational load is 29 * nx * ny * m * nts ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : During each timestep each sub-domain of size nsubx=nx/px by nsuby=ny/py requires the following communications in words :
nsubx * m  from N
nsubx      from S
nsubx * m  from S
nsuby * m  from W
nsuby      from E
nsuby * m  from E
m          from NE
m          from SW
making a total of (2 * m + 1)*(nsubx + nsuby) + 2*m words in eight messages from six directions. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : The data sizes and computational requirements for the various problems supplied are :
Name    nx x ny x m x nts          Computational    Memory
                                   Load (Gflop)     (Mword)
dbg     10 x 10 x 1 x 2            Small debugging test case
dbg2d   10 x 10 x 1 x 2            Small debugging test case for a 2 x 2 decomposition
v200    512 x 512 x 16 x 200         24               14
wa200   1024 x 1024 x 40 x 200      226              126
xb200   2048 x 2048 x 80 x 200     1812              984
The memory sizes are the number of Fortran real elements (words) required for the strip-mined case on a single processor. For the masked case the memory requirement is approximately doubled for the extra mask arrays. For the message passing versions, the total memory requirement will also tend to increase slightly (<10%) with the number of processors employed. ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count by hand looking at inner loops and making reasonable assumptions about common compiler optimizations.
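As an added cross-check of the operation-count formula above (this arithmetic is not part of the original submission): for the v200 case, 29 * nx * ny * m * nts = 29 * 512 * 512 * 16 * 200, which is roughly 2.4 x 10**10 floating-point operations, i.e. about 24 Gflop, in agreement with the computational load quoted for v200 in the table above.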
------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- -- ,?, (o o) |------------------------------oOO--(_)--OOo----------------------------| | | | Dr Mike Ashworth NERC Computer Services | | NERC Supercomputing Consultant Bidston Observatory | | Tel: +44 51 653 8633 BIRKENHEAD | | Fax: +44 51 653 6269 L43 7RA | | email: mia@ua.nbi.ac.uk United Kingdom | | alternative: M.Ashworth@ncs.nerc.ac.uk | |-----------------------------------------------------------------------| From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:14:36 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id KAA13973; Tue, 22 Mar 1994 10:14:35 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10524; Tue, 22 Mar 1994 10:14:19 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:14:18 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10516; Tue, 22 Mar 1994 10:14:14 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA18130; Tue, 22 Mar 1994 10:14:23 -0500 Message-Id: <9403221514.AA18130@rios2.epm.ornl.gov> To: worley@rios2.epm.ornl.gov Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 10:14:23 -0500 From: "David W. Walker" Dear Pat, Thank you for submitting the PSTSWM for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. I would be grateful if you would arrange for the source code, input, and output files to be sent to me. To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission (T21, T42, and T85) 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of any papers describing the sequential and parallel algorithms that you may have available. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. Best Regards, David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM To submit a compact application to the ParkBench suite you must follow the following procedure: 1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite. 2. If ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. 
Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscipt format. These files documents and papers together constitute your application package. Your application package should be sent to: David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- ------------------------------------------------------------------------------- Name of Program : PSTSWM : (Parallel Spectral Transform Shallow Water Model) ------------------------------------------------------------------------------- Submitter's Name : Patrick H. Worley Submitter's Organization: Oak Ridge National Laboratory Submitter's Address : Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 Submitter's Telephone # : (615) 574-3128 Submitter's Fax # : (615) 574-0680 Submitter's Email : worley@msr.epm.ornl.gov ------------------------------------------------------------------------------- Cognizant Expert(s) : Patrick H. Worley CE's Organization : Oak Ridge National Laboratory CE's Address : Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 CE's Telephone # : (615) 574-3128 CE's Fax # : (615) 574-0680 CE's Email : worley@msr.epm.ornl.gov Cognizant Expert(s) : Ian T. Foster CE's Organization : Argonne National Laboratory CE's Address : MCS 221/D-235 9700 S. Cass Avenue Argonne, IL 60439 CE's Telephone # : (708) 252-4619 CE's Fax # : (708) 252-5986 CE's Email : itf@mcs.anl.gov ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : Modulo other commitments, Worley is prepared to respond quickly to questions and bug reports, but expects to be kept informed as to results of experiments and modifications to the code. ------------------------------------------------------------------------------- Major Application Field : Fluid Dynamics Application Subfield(s) : Climate Modeling ------------------------------------------------------------------------------- Application "pedigree" : PSTSWM Version 1.0 is a message-passing benchmark code and parallel algorithm testbed that solves the nonlinear shallow water equations using the spectral transform method. The spectral transform algorithm of the code follows closely how CCM2, the NCAR Community Climate Model, handles the dynamical part of the primitive equations, and the parallel algorithms implemented in the model include those currently used in the message-passing parallel implementation of CCM2. PSTSWM was written by Patrick Worley of Oak Ridge National Laboratory and Ian Foster of Argonne National Laboratory, and is based partly on previous parallel algorithm research by John Drake, David Walker, and Patrick Worley of Oak Ridge National Laboratory. Both the code development and parallel algorithms research were funded by the DOE Computer Hardware, Advanced Mathematics, and Model Physics (CHAMMP) program. 
The features of version 1.0 were frozen on 8/1/93, and it is this version we would offer initially as a benchmark. PSTSWM is a parallel implementation of a sequential code (STSWM 2.0) written by James Hack and Ruediger Jakob at NCAR to solve the shallow water equations on a sphere using the spectral transform method. STSWM evolved from a spectral shallow water model written by Hack (NCAR/CGD) to compare numerical schemes designed to solve the divergent barotropic equations in spherical geometry. STSWM was written partially to provide the reference solutions to the test cases proposed by Williamson et al. (see citation [4] below), which were chosen to test the ability of numerical methods to simulate important flow phenomena. These test cases are embedded in the code and are selectable at run-time via input parameters, specifying initial conditions, forcing, and analytic solutions (for error analysis). The solutions are also published in a Technical Note by Jakob et al. [3]. In addition, this code is meant to serve as an educational tool for numerical studies of the shallow water equations. A detailed description of the spectral transform method, and a derivation of the equations used in this software, can be found in the Technical Note by Hack and Jakob [2]. For PSTSWM, we rewrote STSWM to add vertical levels (in order to get the correct communication and computation granularity for 3-D weather and climate codes), to increase modularity and support code reuse, and to allow the problem size to be selected at runtime without depending on dynamic memory allocation. PSTSWM is meant to be a compromise between paper benchmarks and the usual fixed benchmarks by allowing a significant amount of runtime-selectable algorithm tuning. Thus, the goal is to see how quickly the numerical simulation can be run on different machines without fixing the parallel implementation, but forcing all implementations to execute the same numerical code (to guarantee fairness). The code has also been written in such a way that linking in optimized library functions for common operations instead of the "portable" code will be simple. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : Yes, but users are requested to acknowledge the authors (Worley and Foster) and the program that supported the development of the code (DOE CHAMMP program) in any resulting research or publications, and are encouraged to send reprints of their work with this code to the authors. Also, the authors would appreciate being notified of any modifications to the code. Finally, the code has been written to allow easy reuse of code in other applications, and for educational purposes. The authors encourage this, but also request that they be notified when pieces of the code are used. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: The program currently uses INTEGER, REAL, COMPLEX, and DOUBLE PRECISION variables. The code should work correctly for any system in which COMPLEX is represented as 2 REALs. The include file params.i has parameters that can be used to specify the length of these. Also, some REAL and DOUBLE parameter values may need to be modified for floating point number systems with large mantissas, e.g., PI, TWOPI.
PSTSWM is currently being used on systems where Integers : 4 bytes Floats : 4 bytes The use of two precisions can be eliminated, but at the cost of a significant loss of precision. (For 4 bytes REALs, not using DOUBLE PRECISION increases the error by approximately three orders of magnitude.) DOUBLE PRECISION results are only used in set-up (computing Gauss weights and nodes and Legendre polynomial values), and are not used in the body of the computation. ------------------------------------------------------------------------------- Documentation describing the implementation of the application (at module level, or lower) : The sequential code is documented in a file included in the distribution of the code from NCAR: Jakob, Ruediger, Description of Software for the Spectral Transform Shallow Water Model Version 2.0. National Center for Atmospheric Research, Boulder, CO 80307-3000, August 1992 and in Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. Documentation of the parallel code is in preparation, but extensive documentation is present in the code. ------------------------------------------------------------------------------- Research papers describing sequential code and/or algorithms : 1) Browning, G.L., J.J. Hack and P.N. Swarztrauber, A comparison of three numerical methods for solving differential equations on the sphere, Monthly Weather Review, 117:1058-1075, 1989. 2) Hack, J.J. and R. Jakob, Description of a global shallow water model based on the spectral transform method, NCAR Technical Note TN-343+STR, January 1992. 3) Jakob, R., J.J. Hack and D.L. Williamson, Reference solutions to shallow water test set using the spectral transform method, NCAR Technical Note TN-388+STR (in preparation). 4) Williamson, D.L., J.B. Drake, J.J. Hack, R. Jakob and P.S. Swarztrauber, A standard test set for numerical approximations to the shallow water equations in spherical geometry, Journal of Computational Physics, Vol. 102, pp.211-224, 1992. ------------------------------------------------------------------------------- Research papers describing parallel code and/or algorithms : 5) Worley, P. H. and J. B. Drake, Parallelizing the Spectral Transform Method, Concurrency: Practice and Experience, Vol. 4, No. 4 (June 1992), pp. 269-291. 6) Walker, D. W., P. H. Worley, and J. B. Drake, Parallelizing the Spectral Transform Method. Part II, Concurrency: Practice and Experience, Vol. 4, No. 7 (October 1992), pp. 509-531. 7) Foster, I. T. and P. H. Worley, Parallelizing the Spectral Transform Method: A Comparison of Alternative Parallel Algorithms, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March22-24, 1993), pp. 100-107. 8) Foster, I. T. and P. H. Worley, Parallel Algorithms for the Spectral Transform Method, (in preparation) 9) Worley, P. H. and I. T. Foster, PSTSWM: A Parallel Algorithm Testbed and Benchmark. (in preparation) ------------------------------------------------------------------------------- Other relevent research papers: 10) I. Foster, W. Gropp, and R. Stevens, The parallel scalability of the spectral transform method, Mon. Wea. Rev., 120(5), 1992, pp. 835--850. 11) Drake, J. B., R. E. Flanery, I. T. Foster, J. J. Hack, J. G. Michalakes, R. L. Stevens, D. W. Walker, D. L. Williamson, and P. H. 
Worley, The Message-Passing Version of the Parallel Community Climate Model, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 500-513. 12) Sato, R. K. and R. D. Loft, Implementation of the NCAR CCM2 on the Connection Machine, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 371-393. 13) Barros, S. R. M. and Kauranne, T., On the Parallelization of Global Spectral Eulerian Shallow-Water Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 36-43. 14) Kauranne, T. and S. R. M. Barros, Scalability Estimates of Parallel Spectral Atmospheric Models, Proceedings of the Fifth ECMWF Workshop on Use of Parallel Processors in Meteorology (Nov. 23-27, 1992) Hoffman, G.-R and T. Kauranne, ed., World Scientific Publishing Co. Pte. Ltd, Singapore, 1993, pp. 312-328. 15) Pelz, R. B. and W. F. Stern, A Balanced Parallel Algorithm for Parallel Processing, Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing (March22-24, 1993), pp. 126-128. ------------------------------------------------------------------------------- Application available in the following languages (give message passing system used, if applicable, and machines application runs on) : The model code is primarily written in Fortran 77, but also uses DO ... ENDDO and DO WHILE ... ENDDO, and the INCLUDE extension (to pull in common and parameter declarations). It has been compiled and run on the Intel iPSC/2, iPSC/860, Delta, and Paragon, the IBM SP1, and on Sun Sparcstation, IBM RS/6000, and Stardent 3000/1500 workstations (as a sequential code). Message passing is implemented using the PICL message passing system. All message passing is encapsulated in 3 highlevel routines: BCAST0 (broadcast) GMIN0 (global minimum) GMAX0 (global maximum) two classes of low level routines: SWAP, SWAP_SEND, SWAP_RECV, SWAP_RECVBEGIN, SWAP_RECVEND, SWAP1, SWAP2, SWAP3 (variants and/or pieces of the swap operation) and SENDRECV, SRBEGIN, SREND, SR1, SR2, SR3 (variants and/or pieces of the send/recv operation) and one synchronization primitive: CLOCKSYNC0 PICL instrumentation commands are also embedded in the code. Porting the code to another message passing library will be simple, although some of the runtime communication options may become illegal then. The PICL instrumentation calls can be stubbed out (or removed) without changing the functionality of the code, but some sort of synchronization is needed when timing short benchmark runs. 
------------------------------------------------------------------------------- Total number of lines in source code: 28,204 Number of lines excluding comments : 12,434 Size in bytes of source code : 994,299 ------------------------------------------------------------------------------- List input files (filename, number of lines, size in bytes, and if formatted) : problem: 23 lines, 559 bytes, ascii algorithm: 33 lines, 874 bytes, ascii ------------------------------------------------------------------------------- List output files (filename, number of lines, size in bytes, and if formatted) : standard output: Number of lines and bytes is a function of the input specifications, but for benchmarking would normally be 63 lines (2000 bytes) of meaningful output. (On the Intel machine, FORTRAN STOP messages are sent from each processor at the end of the run, increasing this number.) timings: Each run produces one line of output, containing approx. 150 bytes. Both files are ascii. ------------------------------------------------------------------------------- Brief, high-level description of what application does: (P)STSWM solves the nonlinear shallow water equations on the sphere. The nonlinear shallow water equations constitute a simplified atmospheric-like fluid prediction model that exhibits many of the features of more complete models, and that has been used to investigate numerical methods and benchmark a number of machines. Each run of PSTSWM uses one of 6 embedded initial conditions and forcing functions. These cases were chosen to stress test numerical methods for this problem, and to represent important flows that develop in atmospheric modeling. STSWM also supports reading in arbitrary initial conditions, but this was removed from the parallel code to simplify the development of the initial implementation. ------------------------------------------------------------------------------- Main algorithms used: PSTSWM uses the spectral transform method to solve the shallow water equations. During each timestep, the state variables of the problem are transformed between the physical domain, where most of the physical forces are calculated, and the spectral domain, where the terms of the differential equation are evaluated. The physical domain is a tensor product longitude-latitude grid. The spectral domain is the set of spectral coefficients in a spherical harmonic expansion of the state variables, and is normally characterized as a triangular array (using a "triangular" truncation of spectral coefficients). Transforming from physical coordinates to spectral coordinates involves performing a real FFT for each line of constant latitude, followed by integration over latitude using Gaussian quadrature (approximating the Legendre transform) to obtain the spectral coefficients. The inverse transformation involves evaluating sums of spectral harmonics and inverse real FFTs, analogous to the forward transform. Parallel algorithms are used to compute the FFTs and to compute the vector sums used to approximate the forward and inverse Legendre transforms. Two major alternatives are available for both transforms: distributed algorithms, which use a fixed data decomposition and compute results where they are assigned, and transpose algorithms, which remap the domains to allow the transforms to be calculated sequentially.
This translates to four major parallel algorithms: a) distributed FFT/distributed Legendre transform (LT) b) transpose FFT/distributed LT c) distributed FFT/transpose LT d) transpose FFT/transpose LT Multiple implementations are supported for each type of algorithm, and the assignment of processors to transforms is also determined by input parameters. For example, input parameters specify a logical 2-D processor grid and define the data decomposition of the physical and spectral domains onto this grid. If 16 processors are used, these can be arranged as a 4x4 grid, an 8x2 grid, a 16x1 grid, a 2x8 grid, or a 1x16 grid. This specification determines how many processors are used to calculate each parallel FFT and how many are used to calculate each parallel LT. ------------------------------------------------------------------------------- Skeleton sketch of application: The main program calls INPUT to read problem and algorithm parameters and set up arrays for spectral transformations, and then calls INIT to set up the test case parameters. Routines ERRANL and NRGTCS are called once before the main timestepping loop for error normalization, once after the main timestepping for calculating energetics data and errors, and periodically during the timestepping, as requested. The prognostic fields are initialized using routine ANLYTC, which provides the analytic solution. Each call to STEP advances the computed fields by a timestep DT. Timing logic surrounds the timestepping loop, so the initialization phase is not timed. Also, a fake timestep is calculated before beginning timing to eliminate the first time "paging" effect currently seen on the Intel Paragon systems. STEP computes the first two time levels by two semi-implicit timesteps; normal time-stepping is by a centered leapfrog-scheme. STEP calls COMP1, which choses between an explicit numerical algorithm, a semi-implicit algorithm, and a simplified algorithm associated with solving the advection equation, one of the embedded test cases. The numerical algorithm used is an input parameter. The basic outline of each timestep is the following: 1) Evaluate non-linear product and forcing terms. 2) Fourier transform non-linear terms in place as a block transform. 3) Compute and update divergence, geopotential, and vorticity spectral coefficients. (Much of the calculation of the time update is "bundled" with the Legendre transform.) 4) Compute velocity fields and transform divergence, geopotential, and vorticity back to gridpoint space using a) an inverse Legendre transform and associated computations and b) an inverse real block FFT. PSTSWM has "fictitious" vertical levels, and all computations are duplicated on the different levels, potentially significantly increasing the granularity of the computation. (The number of vertical levels is an input parameter.) For error analysis, a single vertical level is extracted and analyzed. ------------------------------------------------------------------------------- Brief description of I/O behavior: Processor 0 reads in the input parameters and broadcasts them to the rest of the processors. Processor 0 also receives the error analysis and timing results from the other processors and writes them out. ------------------------------------------------------------------------------- Describe the data distribution (if appropriate) : The processors are treated as a logical 2-D grid. 
There are 3 domains to be distributed:
a) physical domain: tensor product longitude-latitude grid
b) Fourier domain: tensor product wavenumber-latitude grid
c) spectral domain: triangular array, where each column contains the spectral coefficients associated with a given wavenumber. The larger the wavenumber is, the shorter the column is.
An unordered FFT is used, and the Fourier and spectral domains use the "unordered" permutation when the data is being distributed.
I) distributed FFT/distributed LT
1) The tensor-product longitude-latitude grid is mapped onto the processor grid by assigning a block of contiguous longitudes to each processor column and by assigning one or two blocks of contiguous latitudes to each processor row. The vertical dimension is not distributed.
2) After the FFT, the subsequent wavenumber-latitude grid is similarly distributed over the processor grid, with a block of the permuted wavenumbers assigned to each processor column.
3) After the LT, the wavenumbers are distributed as before and the spectral coefficients associated with any given wavenumber are either distributed evenly over the processors in the column containing that wavenumber, or are duplicated over the column. What happens is a function of the particular distributed LT algorithm used.
II) transpose FFT/distributed LT
1) same as in (I)
2) Before the FFT, the physical domain is first remapped to a vertical layer-latitude decomposition, with a block of contiguous vertical layers assigned to each processor column and the longitude dimension not distributed. After the transform, the vertical level-latitude grid is distributed as before, and the wavenumber dimension is not distributed.
3) After the LT, the spectral coefficients for a given vertical layer are either distributed evenly over the processors in a column, or are duplicated over that column. What happens is a function of the particular distributed LT algorithm used.
III) distributed FFT/transpose LT
1) same as (I)
2) same as (I)
3) Before the LT, the wavenumber-latitude grid is first remapped to a wavenumber-vertical layer decomposition, with a block of contiguous vertical layers assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
IV) transpose FFT/transpose LT
1) same as (I)
2) same as (II)
3) Before the LT, the vertical level-latitude grid is first remapped to a vertical level-wavenumber decomposition, with a block of the permuted wavenumbers now assigned to each processor row and the latitude dimension not distributed. After the transform, the spectral coefficients associated with a given wavenumber and vertical layer are all on one processor, and the wavenumbers and vertical layers are distributed as before.
------------------------------------------------------------------------------- Give parameters of the data distribution (if appropriate) : The distribution is a function of the problem size (longitude, latitude, vertical levels), the logical processor grid (PX, PY), and the algorithm (transpose vs. distributed for FFT and LT). ------------------------------------------------------------------------------- Brief description of load balance behavior : The load is fairly well balanced.
If PX and PY evenly divide the number of longitudes, latitudes, and vertical levels, then all load imbalances are due to the unequal distribution of spectral coefficients. As described above, the spectral coefficients are laid out as a triangular array in most runs, where each column corresponds to a different Fourier wavenumber. The wavenumbers are partitioned among the processors in most of the parallel algorithms. Since each column is a different length, a wrap mapping of the columns will approximately balance the load. Instead, the natural "unordered" ordering of the FFT is used with a block partitioning, which does a reasonable job of load balancing without any additional data movement. The load imbalance is quantified in Walker et al. [5]. If PX and PY do not evenly divide the dimensions of the physical domain, then other load imbalances may be as large as a factor of 2 in the worst case. ------------------------------------------------------------------------------- Give parameters that determine the problem size :
MM, NN, KK - specifies the number of Fourier wavenumbers and the spectral truncation used. For a triangular truncation, MM = NN = KK.
NLON, NLAT, NVER - number of longitudes, latitudes, and vertical levels. There are required relationships between NLON, NLAT, and NVER, and between these and MM. These relationships are checked in the code. We will also provide a selection of input files that specify legal (and interesting) problems.
DT - timestep (in seconds). (Must be small enough to satisfy the Courant stability condition. Code warns if too large, but does not abort.)
TAUE - end of model run (in hours)
------------------------------------------------------------------------------- Give memory as function of problem size : Executable size is determined at compile time by setting the parameter COMPSZ in params.i. Per node memory requirements are approximately (in REALs)
associated Legendre polynomial values: MM*MM*NLAT/PX*PY
physical grid fields: 8*NLON*NLAT*NVER/(PX*PY)
spectral grid fields: 3*MM*MM*NVER/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 3*MM*MM*NVER/PX
work space: 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/(PX*PY)
  or (if spectral coefficients duplicated within a processor column) 8*NLON*NLAT*NVER*BUFS1/(PX*PY) + 3*MM*MM*NVER*BUFS2/PX
where BUFS1 and BUFS2 are input parameters (number of communication buffers). BUFS1 and BUFS2 can be as small as 0 and as large as PX or PY. In standard test cases, NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1, so memory requirements are approximately:
(2 + 108*(1+BUFS1) + 3*(1+BUFS2))*(M**3)/(4*PX*PY)
or
(2 + 108*(1+BUFS1))*(M**3)/(4*PX*PY) + 3*(1+BUFS2)*(M**3)/(4*PX)
------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : for a serial run per timestep (very rough):
nonlinear terms: 10*NLON*NLAT*NVER
forward FFT: 40*NLON*NLAT*NVER*LOG2(NLON)
forward LT and time update: 48*MM*NLAT*NVER + 7*(MM**2)*NLAT*NVER
inverse LT and calculation of velocities: 20*MM*NLAT*NVER + 14*(MM**2)*NLAT*NVER
inverse FFT: 25*NLON*NLAT*NVER*LOG2(NLON)
Using standard assumptions (NLON=2*NLAT, NLON=4*NVER, and NLON=3*MM+1): approx. 460*(M**3) + 348*(M**3)*LOG2(M) + 24*(M**4) flops per timestep. For a total run, multiply by TAUE/DT.
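As an added cross-check, not part of the original submission: for the T21 case given below (M = MM = 21, DT = 4800 s, TAUE = 120 h, i.e. 90 timesteps), the formula gives approximately 460*9261 + 348*9261*4.4 + 24*194481, or roughly 2.3 x 10**7 flops per timestep, and hence about 2 x 10**9 flops for the full run, consistent with the serial T21 figure quoted below.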
------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : This is a function of the algorithm chosen.
I) transpose FFT
 a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
    2*(PX-1) steps, D volume
    or 2*LOG2(PX) steps, D*LOG2(PX) volume
II) distributed FFT
 a) forward + inverse FFT: let D = 13*NLON*NLAT*NVER/(PX*PY)
    2*LOG2(PX) steps, D*LOG2(PX) volume
III) transpose LT
 a) forward LT: let D = 8*NLON*NLAT*NVER/(PX*PY)
    2*(PY-1) steps, D volume
    or 2*LOG2(PY) steps, D*LOG2(PY) volume
 b) inverse LT: let D = (3/2)*(MM**2)*NVER/(PX*PY)
    (PY-1) steps, D volume
    or LOG2(PY) steps, D*PY volume
IV) distributed LT
 a) forward + inverse LT: let D = 3*(MM**2)*NVER/(PX*PY)
    2*(PY-1) steps, D*PY volume
    or 2*LOG2(PY) steps, D*PY volume
These are per timestep costs. Multiply by TAUE/DT for total communication overhead. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : Standard input files will be provided for
T21: MM=KK=NN=21     T42: MM=KK=NN=42     T85: MM=NN=KK=85
     NLON=32              NLON=64              NLON=128
     NLAT=64              NLAT=128             NVER=256
     NVER=8               NVER=16              NVER=32
     ICOND=2              ICOND=2              ICOND=2
     DT=4800.0            DT=2400.0            DT=1200.0
     TAUE=120.0           TAUE=120.0           TAUE=120.0
These are 5 day runs of the "benchmark" case specified in Williamson, et al [3]. Flops and memory requirements for serial runs are as follows (approx.):
T21:    500,000 REALs        2,000,000,000 flops
T42:  4,000,000 REALs       45,000,000,000 flops
T85: 34,391,000 REALs    1,000,000,000,000 flops
Both memory and flops scale well, so, for example, the T42 run fits in approx. 4MB of memory for a 4 processor run. But different algorithms and different aspect ratios of the processor grid use different amounts of memory. ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : Count by hand (looking primarily at inner loops, but eliminating common subexpressions that compiler is expected to find). ------------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Tue Mar 22 10:19:48 1994 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.8t-netlib) id KAA14012; Tue, 22 Mar 1994 10:19:45 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10903; Tue, 22 Mar 1994 10:19:27 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Tue, 22 Mar 1994 10:19:17 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.epm.ornl.gov by CS.UTK.EDU with SMTP (cf v2.8s-UTK) id KAA10892; Tue, 22 Mar 1994 10:19:14 -0500 Received: by rios2.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03) id AA23268; Tue, 22 Mar 1994 10:18:26 -0500 Message-Id: <9403221518.AA23268@rios2.epm.ornl.gov> To: spb@epcc.ed.ac.uk Cc: pbwg-compactapp@CS.UTK.EDU Subject: ParkBench code Date: Tue, 22 Mar 94 10:18:26 -0500 From: "David W. Walker" Dear Dr. Booth, Thank you for submitting the SOLVER code for inclusion in the ParkBench Compact Applications benchmark suite. After due consideration the Compact Applications subcommittee has decided to include the code in the benchmark suite. I would be grateful if you would arrange for the source code, input, and output files to be sent to me.
To submit your code please send me the following: 1. The complete source code 2. Input files corresponding to the small, medium, and large cases described in your submission 3. An output file corresponding to the small case to be used for validation purposes 4. PostScript files of the following papers mentioned in your submission describing the sequential and parallel codes (if available). Also the users guide if there is one. If you have versions of the code using different message passing packages please supply multiple versions of the source code. Ultimately we would like the codes to be self-validating. Please can you let me have any suggestions on what quantities might be checked to validate the code. If the above files are too large to email to me, please let me know if there is an anonymous ftp site where I can copy them from. Best Regards, David Walker -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | -------------------------------------------------------------------------- ------------------------------------------------------------------------- PARKBENCH COMPACT APPLICATIONS SUBMISSION FORM To submit a compact application to the ParkBench suite you must follow the following procedure: 1. Complete the submission form below, and email it to David Walker at walker@msr.epm.ornl.gov. The data on this form will be reviewed by the ParkBench Compact Applications Subcommittee, and you will be notified if the application is to be considered further for inclusion in the ParkBench suite. 2. If ParkBench Compact Applications Subcommittee decides to consider your application further you will be asked to submit the source code and input and output files, together with any documentation and papers about the application. Source code and input and output files should be submitted by email, or ftp, unless the files are very large, in which case a tar file on a 1/4 inch cassette tape. Wherever possible email submission is preferred for all documents in man page, Latex and/or Postscript format. These files documents and papers together constitute your application package. Your application package should be sent to: David Walker Oak Ridge National Laboratory Bldg. 6012/MS-6367 P. O. Box 2008 Oak Ridge, TN 37831-6367 (615) 574-7401/0680 (phone/fax) walker@msr.epm.ornl.gov The street address is "Bethal Valley Road" if Fedex insists on this. The subcommittee will then make a final decision on whether to include your application in the ParkBench suite. 3. If your application is approved for inclusion in the ParkBench suite you (or some authorized person from your organization) will be asked in complete and sign a form giving ParkBench authority to distribute, and modify (if necessary), your application package. ------------------------------------------------------------------------------- Name of Program : SOLVER : ------------------------------------------------------------------------------- Submitter's Name : Stephen P. 
Booth Submitter's Organization: UKQCD collaboration Submitter's Address : EPCC The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland Submitter's Telephone # : +44 (0)31 650 5746 Submitter's Fax # : +44 (0)31 622 4712 Submitter's Email : spb@epcc.ed.ac.uk ------------------------------------------------------------------------------- Cognizant Expert(s) : Dr S.P.Booth CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5746 CE's Fax # : +44 (0)31 622 4712 CE's Email : spb@epcc.ed.ac.uk Cognizant Expert(s) : Dr R.D. Kenway CE's Organization : EPCC/UKQCD CE's Address : The University of Edinburgh James Clerk Maxwell Building The King's Buildings Mayfield Road Edinburgh EH9 3JZ Scotland CE's Telephone # : +44 (0)31 650 5245 CE's Fax # : +44 (0)31 622 4712 CE's Email : rdk@epcc.ed.ac.uk ------------------------------------------------------------------------------- Extent and timeliness with which CE is prepared to respond to questions and bug reports from ParkBench : S.Booth is prepared to respond quickly to questions and bug reports. We have a strong interest in the portability and performance of this code. ------------------------------------------------------------------------------- Major Application Field : Lattice gauge theory Application Subfield(s) : QCD ------------------------------------------------------------------------------- Application "pedigree" (origin, history, authors, major mods) : SOLVER is part of an ongoing software development exercise carried out by UKQCD (the United Kingdom Quantum Chromo-Dynamics collaboration) to develop a new generation of simulation codes. The current generation of codes was highly tuned for a particular machine architecture, so a software development exercise was started to design and develop a set of portable codes. This code was developed by S.Booth and N.Stanford of the University of Edinburgh during the course of 1993. SOLVER is a benchmark code derived from the codes used to generate quark propagators. It is designed to benchmark and validate the computational sections of this operation. It differs from the production code in that it self-initialises to non-trivial test data rather than performing file access. This is because there is no accepted standard for parallel file access. The benchmark was originally developed as part of a national UK procurement exercise. ------------------------------------------------------------------------------- May this code be freely distributed (if not specify restrictions) : The code may be freely distributed for benchmarking purposes but the code remains the property of UKQCD and we ask to be contacted if anyone wishes to use it as an application code. ------------------------------------------------------------------------------- Give length in bytes of integers and floating-point numbers that should be used in this application: All floating point numbers are defined as macros (either Fpoint or Dpoint). The majority of the variables are Fpoint. Dpoint is only used for accumulation values that may require higher precision. This allows the precision of the program to be changed easily. For small and intermediate problem sizes 4 byte Fpoints and 8 byte Dpoints should be sufficient. For large problems higher precision may be required.
INTEGERS must be large enough to hold the number of sites allocated to a processor (4 bytes is almost certainly sufficient). The COMPLEX type is not used.
-------------------------------------------------------------------------------
Documentation describing the implementation of the application (at module level, or lower) :
Documentation exists for all program routines except some low-level routines local to a single source file.
-------------------------------------------------------------------------------
Research papers describing sequential code and/or algorithms :
-------------------------------------------------------------------------------
Research papers describing parallel code and/or algorithms :
-------------------------------------------------------------------------------
Other relevant research papers:
-------------------------------------------------------------------------------
Application available in the following languages (give message passing system used, if applicable, and machines application runs on) :
Two versions of the application were developed in parallel:
1) An HPF version (with both CMF and HPF directives).
2) A message passing version.
The message passing version uses ANSI F77 with the following extensions:
a) CPP is used for include files, some simple macros, and build-time conditionals.
b) The F77 restrictions on variable names are not adhered to, though the authors have tools to convert the code to conform.
All of the message passing operations are confined to a small number of routines. These routines were designed to be implementable in as many different message passing systems as possible. Current versions are:
1) fake - converts the program to a single-processor code.
2) PARMACS - the original parallel version.
3) PVM - under development.
-------------------------------------------------------------------------------
Total number of lines in source code: 15567
Number of lines excluding comments : 10679
Size in bytes of source code : 432398
-------------------------------------------------------------------------------
List input files (filename, number of lines, size in bytes, and if formatted) :
None
-------------------------------------------------------------------------------
List output files (filename, number of lines, size in bytes, and if formatted) :
standard output: formatted text
-------------------------------------------------------------------------------
Brief, high-level description of what application does:
The application generates quark propagators from a background gauge configuration and a fermionic source. This is equivalent to solving M psi = source, where psi is the quark propagator and M (a function operating on psi) depends on the gauge fields. The benchmark performs a cut-down version of this operation.
-------------------------------------------------------------------------------
Main algorithms used:
Conjugate gradient least norm with red-black pre-conditioning.
-------------------------------------------------------------------------------
Skeleton sketch of application:
The benchmark code initialises the gauge field to a unit gauge configuration. (The results for a unit gauge can be calculated analytically, allowing a check on the results.) A gauge transformation is then applied to the gauge field. A unit gauge field consists only of zeros and ones; applying a gauge transformation generates non-trivial values. Quantities corresponding to physical observables should be unchanged by such a transformation.
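(Illustration only: "conjugate gradient least norm" is read here as conjugate gradient applied to the normal equations, i.e. solving (A^T A) X = A^T B. The self-contained toy below shows only the general shape of that iteration for a small dense real matrix; the actual UKQCD solver applies the fermion matrix M matrix-free over the red-black ordered lattice, which is not shown.)

      PROGRAM CGNRX
C     Illustrative conjugate gradient on the normal equations
C     (A^T A) X = A^T B for a small dense matrix A.  This sketch
C     shows the general iteration only, not the UKQCD solver:
C     there the operator is the fermion matrix M, applied
C     matrix-free over a red-black ordered lattice.
      INTEGER N, MAXIT, I, IT
      PARAMETER (N = 3, MAXIT = 50)
      REAL*8 A(N,N), B(N), X(N), R(N), Z(N), P(N), W(N)
      REAL*8 ZZ, ZZNEW, ALPHA, BETA, TOL
      DATA A / 4.0D0, 1.0D0, 0.0D0,
     &         1.0D0, 3.0D0, 1.0D0,
     &         0.0D0, 1.0D0, 2.0D0 /
      DATA B / 1.0D0, 2.0D0, 3.0D0 /
      TOL = 1.0D-12
C     X = 0, R = B - A*X = B, Z = A^T * R, P = Z
      DO 10 I = 1, N
         X(I) = 0.0D0
         R(I) = B(I)
   10 CONTINUE
      CALL ATMUL(N, A, R, Z)
      ZZ = 0.0D0
      DO 20 I = 1, N
         P(I) = Z(I)
         ZZ = ZZ + Z(I)*Z(I)
   20 CONTINUE
      DO 60 IT = 1, MAXIT
         IF (ZZ .LT. TOL) GO TO 70
C        W = A*P, ALPHA = (Z,Z) / (W,W)
         CALL AMUL(N, A, P, W)
         ALPHA = 0.0D0
         DO 30 I = 1, N
            ALPHA = ALPHA + W(I)*W(I)
   30    CONTINUE
         ALPHA = ZZ / ALPHA
         DO 40 I = 1, N
            X(I) = X(I) + ALPHA*P(I)
            R(I) = R(I) - ALPHA*W(I)
   40    CONTINUE
C        Z = A^T * R, BETA = (Znew,Znew)/(Zold,Zold), P = Z + BETA*P
         CALL ATMUL(N, A, R, Z)
         ZZNEW = 0.0D0
         DO 45 I = 1, N
            ZZNEW = ZZNEW + Z(I)*Z(I)
   45    CONTINUE
         BETA = ZZNEW / ZZ
         ZZ = ZZNEW
         DO 50 I = 1, N
            P(I) = Z(I) + BETA*P(I)
   50    CONTINUE
   60 CONTINUE
   70 WRITE(*,*) 'SOLUTION ', (X(I), I = 1, N)
      END

      SUBROUTINE AMUL(N, A, V, AV)
C     AV = A * V
      INTEGER N, I, J
      REAL*8 A(N,N), V(N), AV(N)
      DO 20 I = 1, N
         AV(I) = 0.0D0
         DO 10 J = 1, N
            AV(I) = AV(I) + A(I,J)*V(J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END

      SUBROUTINE ATMUL(N, A, V, ATV)
C     ATV = A^T * V
      INTEGER N, I, J
      REAL*8 A(N,N), V(N), ATV(N)
      DO 20 I = 1, N
         ATV(I) = 0.0D0
         DO 10 J = 1, N
            ATV(I) = ATV(I) + A(J,I)*V(J)
   10    CONTINUE
   20 CONTINUE
      RETURN
      END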
In the application code the gauge field would instead have been read in from disk. The source field is initialised to a point source (a single non-zero point on one lattice site). An iterative solver is called to generate the quark propagator. The solver routine also generates timing information. In the application code this would then be dumped to disk. In the benchmark we use the quark propagator to generate a physically significant quantity (the pion propagator). This generates a single real number for each timeslice of the lattice. These values are printed to standard out. This procedure requires a large number of iterations. For benchmarking we are only interested in the time per iteration and some check on the validity of the results. We therefore usually perform only a fixed number of iterations (say 50) to generate accurate timing information and verify the results by comparison with other machines.
-------------------------------------------------------------------------------
Brief description of I/O behaviour:
Unless an error occurs, a single processor outputs to standard out.
-------------------------------------------------------------------------------
Describe the data distribution (if appropriate) :
A spatial decomposition is used to distribute the 4-D arrays over a 4-D grid of processors. Each dimension is distributed independently. The program supports non-regular decomposition; e.g. a lattice of width 22 will be distributed across a processor-grid of width 4 as (6, 6, 5, 5).
-------------------------------------------------------------------------------
Give parameters of the data distribution (if appropriate) :
Lattice size: NX NY NZ NT; processor grid: NPX NPY NPZ NPT
-------------------------------------------------------------------------------
Brief description of load balance behavior :
Load balancing depends only on the distribution: if the lattice size is exactly divisible by the processor grid size, all processors will have the same workload. In practice it is often useful to trade load balancing for a larger number of processors.
-------------------------------------------------------------------------------
Give parameters that determine the problem size :
Lattice size NX NY NZ NT; the problem size is NX*NY*NZ*NT.
-------------------------------------------------------------------------------
Give memory as function of problem size :
In a production environment there are build-time parameters that set the array sizes, and problem/machine sizes can be set at runtime. When creating a benchmark program it seemed less confusing to set lattice and processor-grid sizes at build time and derive all other quantities from them. The appropriate parameters for memory use are Max_body (the maximum number of data points per processor) and Max_bound (the maximum number of data points on a single boundary between two processors). If LX LY LZ LT are the local lattice sizes, obtained by dividing the lattice size by the processor grid size and rounding up to the nearest integer, then:
Max_body  = (LX*LY*LZ*LT)/2
Max_bound = MAX( LX*LY*LZ/2, LY*LZ*LT/2, LX*LZ*LT/2, LX*LY*LT/2 )
The code contains a number of build-time switches for variations in the implementation that may be beneficial on some machines.
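(Illustration only, not benchmark source: the small program below shows one simple splitting rule that reproduces the (6, 6, 5, 5) example quoted above, and evaluates the Max_body/Max_bound formulas for a 24^3*48 lattice on a hypothetical 2*2*2*2 processor grid; the decomposition routine actually used by SOLVER may differ.)

      PROGRAM DECOMP
C     Illustrative sketch: one simple way to split a lattice
C     dimension of width N over NP processors so that the extents
C     differ by at most one, e.g. width 22 over 4 gives (6,6,5,5),
C     plus the Max_body / Max_bound formulas quoted in the form,
C     evaluated for a 24^3*48 lattice on a 2*2*2*2 processor grid.
      INTEGER NX, NY, NZ, NT, NPX, NPY, NPZ, NPT
      INTEGER LX, LY, LZ, LT, MBODY, MBOUND, I, IEXT
      PARAMETER (NX=24, NY=24, NZ=24, NT=48)
      PARAMETER (NPX=2, NPY=2, NPZ=2, NPT=2)
C     Extents for the width-22 example over 4 processors
      DO 10 I = 0, 3
         IEXT = 22/4
         IF (I .LT. MOD(22,4)) IEXT = IEXT + 1
         WRITE(*,*) 'PROCESSOR ', I, ' GETS WIDTH ', IEXT
   10 CONTINUE
C     Local sizes = lattice size / grid size, rounded up
      LX = (NX + NPX - 1)/NPX
      LY = (NY + NPY - 1)/NPY
      LZ = (NZ + NPZ - 1)/NPZ
      LT = (NT + NPT - 1)/NPT
C     Memory parameters as given in the form (the /2 presumably
C     reflects the red-black site ordering)
      MBODY  = (LX*LY*LZ*LT)/2
      MBOUND = MAX(LX*LY*LZ/2, LY*LZ*LT/2, LX*LZ*LT/2, LX*LY*LT/2)
      WRITE(*,*) 'LX LY LZ LT = ', LX, LY, LZ, LT
      WRITE(*,*) 'MAX_BODY    = ', MBODY
      WRITE(*,*) 'MAX_BOUND   = ', MBOUND
      END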
The memory usage depends on these switches but typical values are: 108 * Max_body + 36 * Max_bound Fpoints 16 * (Max_body + Max_bound) INTEGERS ------------------------------------------------------------------------------- Give number of floating-point operations as function of problem size : Each iteration performs 2760 floating point operations per lattice site. ie. 50 iteration using a 24^3*48 lattice = 9.16e+10 floating point operations. ------------------------------------------------------------------------------- Give communication overhead as function of problem size and data distribution : For each iteration every processor sends 24 messages to each of its 8 neighbours each message contains one floating point number for each lattice point in the common boundary. Two global sum operations are also performed for each iteration. ------------------------------------------------------------------------------- Give three problem sizes, small, medium, and large for which the benchmark should be run (give parameters for problem size, sizes of I/O files, memory required, and number of floating point operations) : 18^3*36 2.90e+10 fp operations 24^3*48 9.16e+10 fp operations 36^3*72 4.64e+11 fp operations ------------------------------------------------------------------------------- How did you determine the number of floating-point operations (hardware monitor, count by hand, etc.) : count operations in each loop by hand. The code contains a counter to sum these values. ------------------------------------------------------------------------------- Other relevant information: ------------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Mon Mar 13 08:44:32 1995 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA15646; Mon, 13 Mar 1995 08:44:31 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA14363; Mon, 13 Mar 1995 08:45:00 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Mon, 13 Mar 1995 08:44:57 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from vax.darpa.mil by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA14339; Mon, 13 Mar 1995 08:44:55 -0500 Received: from next63.darpa.mil (next63.darpa.mil) by vax.darpa.mil (5.65c/5.61+local-5) id ; Mon, 13 Mar 1995 08:44:53 -0500 Received: by next63.darpa.mil (NX5.67d/NeXT-2.0) id AA00427; Mon, 13 Mar 95 08:43:24 -0500 Message-Id: <9503131343.AA00427@ next63.darpa.mil > Content-Type: text/plain Mime-Version: 1.0 (NeXT Mail 3.3 v118.2) Received: by NeXT.Mailer (1.118.2) From: Jose Munoz Date: Mon, 13 Mar 95 08:43:22 -0500 To: pbwg-compactapp@CS.UTK.EDU Subject: realtime? Hello, I'm interested in identifying a set of realtime benchmarks for embedded appls. Is this a good place to start (I thinkk so)? Im in the process of dl a copy of the report (as I write) and hopefully will have more focused questions. In general I'm interested in (1) has a benchmark std. been def'd, (2) are metrics id'd, (3) how is the underlying hw id'd? Thanks. Jose --- <<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>> < Dr. Jose L. Munoz | email: jmunoz@arpa.mil > < ARPA/CSTO | > < 3701 N. Fairfax Dr. 
| Phone: (703)696-4468 > < Arlington, VA 22203-1714 | FAX: (703)696-2202 > <<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From owner-parkbench-compactapp@CS.UTK.EDU Mon Mar 13 12:10:57 1995 Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA19933; Mon, 13 Mar 1995 12:10:56 -0500 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA25609; Mon, 13 Mar 1995 11:08:01 -0500 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Mon, 13 Mar 1995 11:07:59 EST Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from rios2.EPM.ORNL.GOV by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id LAA25596; Mon, 13 Mar 1995 11:07:56 -0500 Received: (from walker@localhost) by rios2.EPM.ORNL.GOV (8.6.10/8.6.10) id LAA18850; Mon, 13 Mar 1995 11:07:20 -0500 From: David Walker Message-Id: <199503131607.LAA18850@rios2.EPM.ORNL.GOV> To: Jose Munoz Cc: pbwg-compactapp@CS.UTK.EDU Subject: Re: realtime? In-reply-to: (Your message of Mon, 13 Mar 95 08:43:22 EST.) <9503131343.AA00427@ next63.darpa.mil > Date: Mon, 13 Mar 95 11:07:19 -0500 Jose, ParkBench is a proposed set of standard benchmarks, but has not be officially sanctioned by any standrads body such as ISO. Several metrics, detailed in the Parkbench report have been identified. For more information, please take a look at the www page at: http://www.epm.ornl.gov/~walker/parkbench/ Regards, David -------------------------------------------------------------------------- | David W. Walker | Office : (615) 574-7401 | | Oak Ridge National Laboratory | Fax : (615) 574-0680 | | Building 6012/MS-6367 | Messages : (615) 574-1936 | | P. O. Box 2008 | Email : walker@msr.epm.ornl.gov | | Oak Ridge, TN 37831-6367 | | | WEB: http://www.epm.ornl.gov/~walker/ | -------------------------------------------------------------------------- From owner-parkbench-compactapp@CS.UTK.EDU Fri Sep 8 16:36:42 1995 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA14450; Fri, 8 Sep 1995 16:36:42 -0400 Received: from localhost by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA04473; Fri, 8 Sep 1995 16:36:21 -0400 X-Resent-To: parkbench-compactapp@CS.UTK.EDU ; Fri, 8 Sep 1995 16:36:20 EDT Errors-to: owner-parkbench-compactapp@CS.UTK.EDU Received: from franklin.seas.gwu.edu by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA04465; Fri, 8 Sep 1995 16:36:18 -0400 Received: from felix.seas.gwu.edu (abdullah@felix.seas.gwu.edu [128.164.9.3]) by franklin.seas.gwu.edu (v8) with ESMTP id QAA10099 for ; Fri, 8 Sep 1995 16:36:16 -0400 Received: (from abdullah@localhost) by felix.seas.gwu.edu (8.6.12/8.6.12) id QAA07113 for parkbench-compactapp@cs.utk.edu; Fri, 8 Sep 1995 16:36:12 -0400 Date: Fri, 8 Sep 1995 16:36:12 -0400 From: Abdullah Meajil Message-Id: <199509082036.QAA07113@felix.seas.gwu.edu> To: parkbench-compactapp@CS.UTK.EDU Subject: subscribe subscribe From owner-parkbench-compactapp@CS.UTK.EDU Fri Jun 28 10:51:58 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA09606; Fri, 28 Jun 1996 10:51:57 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA20519; Fri, 28 Jun 1996 10:51:17 -0400 Received: from convex.convex.com (convex.convex.com [130.168.1.1]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA20506; Fri, 28 Jun 1996 10:51:07 -0400 Received: from bach.convex.com by convex.convex.com (8.6.4.2/1.35) id JAA01420; Fri, 28 Jun 1996 09:50:28 -0500 Received: from localhost by bach.convex.com (8.6.4/1.28) id JAA09161; Fri, 28 
Jun 1996 09:50:27 -0500 From: hari@bach.convex.com (Harikumar Sivaraman) Message-Id: <199606281450.JAA09161@bach.convex.com> Subject: Bug report on COMMS3.f in PARKBENCH2.0 To: parkbench-comments@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Date: Fri, 28 Jun 96 9:50:26 CDT Cc: romero@bach.convex.com (Paco Romero) X-Mailer: ELM [version 2.3 PL11] DISCLAIMER: The contents of this mail are not an official HP position. I do not speak for HP. The COMMS3 benchmark in PARKBENCH2.0 is in apparent violation of the specifications in the MPI standard. The benchmark attempts to do an MPI_RECV into the same buffer on which it has posted an MPI_ISEND before it does an MPI_WAIT. The relevant code fragment is as below: COMMS3 (This code fragments applies in the case of two processors) ------ CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... CALL MPI_WAIT(request(NSLAVE), status, ierr) COMMS3 (Multiple processors) ------ do i = 1, #processors CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... enddo // The MPI_ISEND statements in the loop violate the MPI standard since the buffer "A" // is reused inside the loop. do i = 1, #processors CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... enddo do i = 1, #processors CALL MPI_WAIT(request(NSLAVE), status, ierr) enddo Comments: --------- The MPI standards (page 40, last but one paragraph) says "the sender should not access any part of the send buffer after a nonblocking send operation is called, until the send completes." Page 41, line 1 of the MPI standards says "the functions MPI_WAIT and MPI_TEST are used to complete a nonblocking communication". Clearly the reuse of buffer "A" in the code fragments above is in violation of the standard. ------- H. Sivaraman (214) 497 - 4374 HP; 3000 Waterview Pk.way Dallas, TX - 75080 From owner-parkbench-compactapp@CS.UTK.EDU Mon Sep 9 20:31:06 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id UAA24848; Mon, 9 Sep 1996 20:31:05 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id UAA10076; Mon, 9 Sep 1996 20:29:21 -0400 Received: from convex.convex.com (convex.convex.com [130.168.1.1]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id UAA10069; Mon, 9 Sep 1996 20:29:17 -0400 Received: from brittany.rsn.hp.com by convex.convex.com (8.6.4.2/1.35) id PAA25214; Mon, 9 Sep 1996 15:42:49 -0500 Received: from localhost by brittany.rsn.hp.com with SMTP (1.38.193.4/16.2) id AA16691; Mon, 9 Sep 1996 15:39:52 -0500 Sender: sercely@convex.convex.com Message-Id: <32348098.3BF5@convex.com> Date: Mon, 09 Sep 1996 15:39:52 -0500 From: Ron Sercely Organization: Hewlett-Packard Convex Technology Center X-Mailer: Mozilla 2.0 (X11; I; HP-UX A.09.05 9000/710) Mime-Version: 1.0 To: parkbench-lowlevel@CS.UTK.EDU Cc: wallach@convex.convex.com, romero@convex.convex.com, sercely@convex.convex.com Subject: comms2 and comms3 bugs, mpi release Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit HP/Convex wants to release lowlevel numbers in two weeks, but we are trying to figure out what to do about the bugs we have reported in these codes. Options are: Submitting results without these tests HP/Convex Re-writing the benchmarks to "do the right thing" other ? I would appreciate a phone call to discuss these issues. 
-- Ron Sercely 214.497.4667 HP/CXTC Toolsmith From owner-parkbench-compactapp@cs.utk.edu Tue Sep 10 07:23:38 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA00602; Tue, 10 Sep 1996 07:23:36 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA24084; Tue, 10 Sep 1996 05:20:31 -0400 Received: from postoffice.npac.syr.edu (postoffice.npac.syr.edu [128.230.7.30]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id FAA24037; Tue, 10 Sep 1996 05:20:22 -0400 Received: from yosemite (pc280.sis.port.ac.uk [148.197.205.60]) by postoffice.npac.syr.edu (8.7.5/8.7.1) with SMTP id FAA00584; Tue, 10 Sep 1996 05:13:39 -0400 (EDT) From: Mark Baker Date: Tue, 10 Sep 96 10:10:24 Subject: RE: comms2 and comms3 bugs, mpi release To: parkbench-lowlevel@cs.utk.edu, Ron Sercely Cc: wallach@convex.convex.com, romero@convex.convex.com, sercely@convex.convex.com, erich@cs.utk.edu, dongarra@cs.utk.edu, ajgh@ecs.soton.ac.uk X-PRIORITY: 3 (Normal) X-Mailer: Chameleon notFound, TCP/IP for Windows, NetManage Inc. Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=us-ascii Ron, Ian Glendenning and I produced the first MPI port of the low-level codes for Parkbench approximately a year ago. Erich Strohmaier (who works for Jack Dongarra at UTK) has been managing and maintaining all the parkbench codes since then. I would suggest he reply to you on the subject. If you do not get a reply I am willing to help. Regards Mark On Mon, 09 Sep 1996 15:39:52 -0500 Ron Sercely wrote: >HP/Convex wants to release lowlevel numbers in two weeks, but we are >trying to >figure out what to do about the bugs we have reported in these codes. > >Options are: >Submitting results without these tests >HP/Convex Re-writing the benchmarks to "do the right thing" >other ? > >I would appreciate a phone call to discuss these issues. >-- >Ron Sercely >214.497.4667 > >HP/CXTC Toolsmith > ------------------------------------- Dr Mark Baker DIS, University of Portsmouth, Hants, UK E-mail: mab@npac.syr.edu Date: 10/09/96 - Time: 10:10:24 URL http://www.npac.syr.edu/ ------------------------------------- From owner-parkbench-compactapp@cs.utk.edu Tue Sep 10 07:27:37 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA00650; Tue, 10 Sep 1996 07:27:37 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id EAA15736; Tue, 10 Sep 1996 04:02:25 -0400 Received: from beech.soton.ac.uk (beech.soton.ac.uk [152.78.128.78]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id DAA15421; Tue, 10 Sep 1996 03:59:42 -0400 Received: from bright.ecs.soton.ac.uk (bright.ecs.soton.ac.uk [152.78.64.201]) by beech.soton.ac.uk (8.6.12/hub-8.5a) with SMTP id IAA22959; Tue, 10 Sep 1996 08:57:52 +0100 Received: from landlord.ecs.soton.ac.uk by bright.ecs.soton.ac.uk; Tue, 10 Sep 96 08:57:21 BST From: Vladimir Getov Received: from caesar.ecs.soton.ac.uk by landlord.ecs.soton.ac.uk; Tue, 10 Sep 96 08:59:09 BST Date: Tue, 10 Sep 96 08:58:36 BST Message-Id: <2546.9609100758@caesar.ecs.soton.ac.uk> To: parkbench-comm@cs.utk.edu, parkbench-lowlevel@cs.utk.edu, sercely@convex.convex.com Subject: Re: comms2 and comms3 bugs, mpi release Cc: wallach@convex.convex.com, romero@convex.convex.com Hi Ron, Are you talking about the same or similar bugs as the ones reported for the comms3 benchmark by Harikumar Sivaraman at the end of June (see the included message below)? -Vladimir Getov p.s. 
Apologies if you receive this message more than once - I have included parkbench-comm@CS.UTK.EDU on the "To:" line but do not know the cross membership. > > HP/Convex wants to release lowlevel numbers in two weeks, but we are > trying to > figure out what to do about the bugs we have reported in these codes. > > Options are: > Submitting results without these tests > HP/Convex Re-writing the benchmarks to "do the right thing" > other ? > > I would appreciate a phone call to discuss these issues. > -- > Ron Sercely > 214.497.4667 > > HP/CXTC Toolsmith > ____________________________ included message _______________________ >From owner-parkbench-compactapp@CS.UTK.EDU Fri Jun 28 15:54:32 1996 From: hari@bach.convex.com (Harikumar Sivaraman) Subject: Bug report on COMMS3.f in PARKBENCH2.0 To: parkbench-comments@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Date: Fri, 28 Jun 96 9:50:26 CDT Cc: romero@bach.convex.com (Paco Romero) X-Mailer: ELM [version 2.3 PL11] Content-Length: 1559 X-Status: DISCLAIMER: The contents of this mail are not an official HP position. I do not speak for HP. The COMMS3 benchmark in PARKBENCH2.0 is in apparent violation of the specifications in the MPI standard. The benchmark attempts to do an MPI_RECV into the same buffer on which it has posted an MPI_ISEND before it does an MPI_WAIT. The relevant code fragment is as below: COMMS3 (This code fragments applies in the case of two processors) ------ CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... CALL MPI_WAIT(request(NSLAVE), status, ierr) COMMS3 (Multiple processors) ------ do i = 1, #processors CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, ..... enddo // The MPI_ISEND statements in the loop violate the MPI standard since the buffer "A" // is reused inside the loop. do i = 1, #processors CALL MPI_RECV(A, IWORD, MPI_DOUBLE_PRECISION, ...... enddo do i = 1, #processors CALL MPI_WAIT(request(NSLAVE), status, ierr) enddo Comments: --------- The MPI standards (page 40, last but one paragraph) says "the sender should not access any part of the send buffer after a nonblocking send operation is called, until the send completes." Page 41, line 1 of the MPI standards says "the functions MPI_WAIT and MPI_TEST are used to complete a nonblocking communication". Clearly the reuse of buffer "A" in the code fragments above is in violation of the standard. ------- H. Sivaraman (214) 497 - 4374 HP; 3000 Waterview Pk.way Dallas, TX - 75080 From owner-parkbench-compactapp@CS.UTK.EDU Tue Sep 10 08:46:41 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA01821; Tue, 10 Sep 1996 08:46:40 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA13971; Tue, 10 Sep 1996 08:41:06 -0400 Received: from rudolph.cs.utk.edu (RUDOLPH.CS.UTK.EDU [128.169.92.87]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA13960; Tue, 10 Sep 1996 08:40:59 -0400 From: Erich Strohmaier Received: by rudolph.cs.utk.edu (cf v2.11c-UTK) id IAA13912; Tue, 10 Sep 1996 08:40:58 -0400 Date: Tue, 10 Sep 1996 08:40:58 -0400 Message-Id: <199609101240.IAA13912@rudolph.cs.utk.edu> To: parkbench-lowlevel@CS.UTK.EDU, sercely@convex.convex.com Subject: Re: comms2 and comms3 bugs, mpi release Cc: romero@convex.convex.com, wallach@convex.convex.comh Ron, We fixed the two bugs you mentioned and we are currently testing the new codes. The new version should be out by end of this week. If you would like to get it earlier, please let me know. 
Best Regards Erich =========================================================================== Erich Strohmaier email: erich@cs.utk.edu Department of Computer Science phone: ++ 1 (423) 974 0293 104 Ayres Hall fax : ++ 1 (423) 974 8296 Knoxville TN, 37996 - USA http://www.cs.utk.edu/~erich/ From owner-parkbench-compactapp@CS.UTK.EDU Tue Sep 10 18:13:11 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id SAA06946; Tue, 10 Sep 1996 18:13:11 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id SAA05907; Tue, 10 Sep 1996 18:12:17 -0400 Received: from VNET.IBM.COM (vnet.ibm.com [199.171.26.4]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id SAA05894; Tue, 10 Sep 1996 18:12:13 -0400 Message-Id: <199609102212.SAA05894@CS.UTK.EDU> Received: from PKEDVM9 by VNET.IBM.COM (IBM VM SMTP V2R3) with BSMTP id 2875; Tue, 10 Sep 96 18:12:14 EDT Date: Tue, 10 Sep 96 18:11:11 EDT From: "C. George Hsi" To: parkbench-lowlevel@CS.UTK.EDU Hi, could you please add my name to the ParkBench Low-Level mailing list? I work in the RS/6000 SP performance measurement area at IBM Poughkeepsie, and have been involved in using the ParkBench Low-Level code recently. My address is: hsi@pkedvm9.vnet.ibm.com Thanks for your help, C. George Hsi From owner-parkbench-compactapp@CS.UTK.EDU Mon Sep 16 15:02:05 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA24616; Mon, 16 Sep 1996 15:02:04 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA17941; Mon, 16 Sep 1996 14:51:47 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id OAA17934; Mon, 16 Sep 1996 14:51:45 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA05937; Mon, 16 Sep 1996 18:49:20 GMT From: "Erich Strohmaier" Message-Id: <9609161449.ZM5935@blueberry.cs.utk.edu> Date: Mon, 16 Sep 1996 14:49:20 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-comm@@CS.UTK.EDU, cs.utk.edu@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Subject: ParKBench Release 2.1 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Hello, The release 2.1 of ParKBench is available at netlib: http://www.netlib.org/parkbench/ It contains the following bug fixes: - Comms2 for MPI made to be a true exchange benchmark using MPI_SENDRECV. - Comms3 for MPI using wild-card and second buffer. - Added missing mpif.f for the MPI2PVM library. - Fixed Makefiles. - make.local.def modifications. - Updated conf/make.def.SP2MPI. - LU Solver fixed though the use of a flag to the Blacs build in the Bmakes. - Addition of the definition for mpi_group_translate_ranks in Bdef.h. - PBLAS bug solved with new BLACS compilation. 
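(Illustration only: the "wild-card and second buffer" fix for comms3 listed above amounts to receiving into a buffer distinct from the one named in the outstanding MPI_ISEND, so the send buffer is not touched before MPI_WAIT completes the send. The self-contained fragment below sketches that pattern for two processes; it is not the actual ParkBench 2.1 code, and the message length and tag are arbitrary.)

      PROGRAM XCHG
C     Illustrative sketch of the corrected comms3-style exchange:
C     the nonblocking send uses buffer A, the receive uses a second
C     buffer B and a wild-card source, and MPI_WAIT completes the
C     send before A is touched again.  Not the ParkBench source.
      INCLUDE 'mpif.h'
      INTEGER IWORD
      PARAMETER (IWORD = 1024)
      DOUBLE PRECISION A(IWORD), B(IWORD)
      INTEGER IERR, RANK, NPROC, OTHER, REQ, I
      INTEGER STATUS(MPI_STATUS_SIZE)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
      IF (NPROC .NE. 2) THEN
         IF (RANK .EQ. 0) WRITE(*,*) 'RUN ON EXACTLY 2 PROCESSES'
         CALL MPI_FINALIZE(IERR)
         STOP
      END IF
      OTHER = 1 - RANK
      DO 10 I = 1, IWORD
         A(I) = DBLE(RANK)
   10 CONTINUE
      CALL MPI_ISEND(A, IWORD, MPI_DOUBLE_PRECISION, OTHER, 0,
     &               MPI_COMM_WORLD, REQ, IERR)
      CALL MPI_RECV(B, IWORD, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE,
     &              0, MPI_COMM_WORLD, STATUS, IERR)
      CALL MPI_WAIT(REQ, STATUS, IERR)
      IF (RANK .EQ. 0) WRITE(*,*) 'RECEIVED B(1) = ', B(1)
      CALL MPI_FINALIZE(IERR)
      END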
Best Regards Erich Strohmaier email: erich@cs.utk.edu From owner-parkbench-compactapp@CS.UTK.EDU Mon Oct 14 14:28:34 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA06896; Mon, 14 Oct 1996 14:28:34 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA07493; Mon, 14 Oct 1996 14:22:58 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id OAA07485; Mon, 14 Oct 1996 14:22:53 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA13307; Mon, 14 Oct 1996 18:20:29 GMT From: "Erich Strohmaier" Message-Id: <9610141420.ZM13305@blueberry.cs.utk.edu> Date: Mon, 14 Oct 1996 14:20:27 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-comm@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU Subject: ParkBench Workshop: Tentative Agenda Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on October 31th, 1996. The format of the meeting is: Thursday October 31th 9:00 - 12.00 Full group meeting 12.00 - 1.30 Lunch 1.30 - 5.00 Full group meeting The tentative agenda for the meeting is: 1. Minutes of last meeting Current release: 2. Status report and experience with the current release 3. Examine the results obtained Next release: 4. New HPF Low Level benchmarks 5. New shared memory Low Level benchmarks 6. New performance database design and new benchmark output format 7. Update of GBIS with new Web front-end 8. Report from other benchmark activities ParkBench: 9. Discussion of ParkBench group structure 10. ParkBench Bibliography 11. Status of ParkBench funding Other Activities: 12. Discussion of the Supercomputing'96 activities 13. "Electronic Benchmarking Journal" - status report 14. Miscellaneous 15. Date and venue for next meeting The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. When making arrangements tell the hotel you are associated with the Parallel Benchmarking or ParkBench or Park. The rate about $75.00/night. Hilton Hotel 501 W. Church Street Knoxville, TN Phone: 423-523-2300 ==> Please make your reservation as soon as possible! 
Jack Dongarra Erich Strohmaier From owner-parkbench-compactapp@CS.UTK.EDU Mon Oct 21 16:14:12 1996 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA11230; Mon, 21 Oct 1996 16:14:11 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA21293; Mon, 21 Oct 1996 15:57:23 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id PAA20796; Mon, 21 Oct 1996 15:54:50 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id TAA16003; Mon, 21 Oct 1996 19:52:28 GMT From: "Erich Strohmaier" Message-Id: <9610211552.ZM16001@blueberry.cs.utk.edu> Date: Mon, 21 Oct 1996 15:52:27 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU Subject: ParKBench Workshop Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, All of you who are planning to come to the next meeting --- http://www.netlib.org/parkbench/ --- please send email to us so we can make local arrangements. Thank you very much Erich Strohmaier
From owner-parkbench-compactapp@CS.UTK.EDU Wed Apr 23 16:40:22 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA10091; Wed, 23 Apr 1997 16:40:22 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA02831; Wed, 23 Apr 1997 16:40:25 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA02732; Wed, 23 Apr 1997 16:40:00 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA12213; Wed, 23 Apr 1997 18:36:17 GMT From: "Erich Strohmaier" Message-Id: <9704231436.ZM12211@blueberry.cs.utk.edu> Date: Wed, 23 Apr 1997 14:36:16 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU Subject: ParkBench Committee Meeting - tentative Agenda Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on May 9th, 1997. The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. Hilton Hotel 501 W. Church Street Knoxville, TN Phone: 423-523-2300 When making arrangements tell the hotel you are associated with the 'ParkBench'. The rate about $79.00/night. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. ---------------- The format of the meeting is: Friday May 9th, 1997. 9:00 - 12.00 Full group meeting 12.00 - 1.30 Lunch 1.30 - 5.00 Full group meeting There might be also a joint session with the SPEC/HPG group on Thursday 8th at about 3pm-5pm ---------------- Please send us your comments about the tentative agenda: 1. Minutes of last meeting (MBe) Changes to Current release: 2. Low Level (ES, VG, RS) comms1, comms2, comms3, poly2 3. Linear Algebra (ES) 4. Compact Applications - NPBs (SS, ES) New benchmarks: 5. HPF Low Level benchmarks (MBa) ? 6. New shared memory Low Level benchmarks (MBa) ? 7. New performance database design and new benchmark output format (MBa,VG) ? 8. Update of GBIS with new Web front-end (MBa,VG) Report from other benchmark activities 9. ASCI Benchmark Codes (RS) 10. SPEC (RE) ParkBench: 11. ParkBench Bibliography 12. ParkBench Report 2 Other Activities: 13. Discussion of the ParkBench Workshop 11/12 September, UK 14. "Electronic Benchmarking Journal" - status report - 15. Miscellaneous - 16. Date and venue for next meeting - (MBa) Mark Baker Univ. of Portsmouth (MBe) Michael Berry Univ. of Tennessee (JD) Jack Dongarra Univ. of Tenn./ORNL (RE) Rudi Eigenmann SPEC (VG) Vladimir Getov Univ. of Westminister (TH) Tony Hey Univ. of Southampton (SS) Subhash Saini NASA Ames (RS) Ron Sercely HP/CXTC (ES) Erich Strohmaier Univ. 
of Tennessee Jack Dongarra Erich Strohmaier From owner-parkbench-compactapp@CS.UTK.EDU Wed Apr 23 19:11:02 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id TAA12012; Wed, 23 Apr 1997 19:11:01 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id TAA16877; Wed, 23 Apr 1997 19:10:25 -0400 Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id TAA16794; Wed, 23 Apr 1997 19:09:55 -0400 Received: from mordillo (node3.remote.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA29461; Thu, 24 Apr 97 00:10:42 BST Date: Wed, 23 Apr 97 23:56:13 From: Mark Baker Subject: RE: ParkBench Committee Meeting - tentative Agenda To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, Erich Strohmaier X-Priority: 3 (Normal) X-Mailer: Chameleon 5.0.1, TCP/IP for Windows, NetManage Inc. Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=us-ascii Erich, Some corrections... --- On Wed, 23 Apr 1997 14:36:16 -0400 Erich Strohmaier wrote: >Please send us your comments about the tentative agenda: > > 1. Minutes of last meeting (MBe) > > Changes to Current release: > 2. Low Level (ES, VG, RS) > comms1, comms2, comms3, poly2 > 3. Linear Algebra (ES) > 4. Compact Applications - NPBs (SS, ES) > > New benchmarks: > 5. HPF Low Level benchmarks (MBa) >? 6. New shared memory Low Level benchmarks (MBa) Can you change this to report on our I/O benchmark efforts. >? 7. New performance database design and new benchmark output format (MBa,VG) >? 8. Update of GBIS with new Web front-end (MBa,VG) Tony or I will update the committe on the new back/fronts ends of GBIS + hopefully also give a demo. VG, as far as I know, is not involved in this activity. Regards Mark ------------------------------------- DIS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 4/23/97 - Time: 11:56:13 PM URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-compactapp@cs.utk.edu Sat Apr 26 06:40:56 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA20901; Sat, 26 Apr 1997 06:40:55 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA18130; Wed, 23 Apr 1997 14:37:56 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id OAA18062; Wed, 23 Apr 1997 14:36:39 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id SAA12213; Wed, 23 Apr 1997 18:36:17 GMT From: "Erich Strohmaier" Message-Id: <9704231436.ZM12211@blueberry.cs.utk.edu> Date: Wed, 23 Apr 1997 14:36:16 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-lowlevel@cs.utk.edu, parkbench-comm@cs.utk.edu, parkbench-hpf@cs.utk.edu Subject: ParkBench Committee Meeting - tentative Agenda Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on May 9th, 1997. The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. Hilton Hotel 501 W. 
Church Street Knoxville, TN Phone: 423-523-2300 When making arrangements tell the hotel you are associated with the 'ParkBench'. The rate about $79.00/night. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. ---------------- The format of the meeting is: Friday May 9th, 1997. 9:00 - 12.00 Full group meeting 12.00 - 1.30 Lunch 1.30 - 5.00 Full group meeting There might be also a joint session with the SPEC/HPG group on Thursday 8th at about 3pm-5pm ---------------- Please send us your comments about the tentative agenda: 1. Minutes of last meeting (MBe) Changes to Current release: 2. Low Level (ES, VG, RS) comms1, comms2, comms3, poly2 3. Linear Algebra (ES) 4. Compact Applications - NPBs (SS, ES) New benchmarks: 5. HPF Low Level benchmarks (MBa) ? 6. New shared memory Low Level benchmarks (MBa) ? 7. New performance database design and new benchmark output format (MBa,VG) ? 8. Update of GBIS with new Web front-end (MBa,VG) Report from other benchmark activities 9. ASCI Benchmark Codes (RS) 10. SPEC (RE) ParkBench: 11. ParkBench Bibliography 12. ParkBench Report 2 Other Activities: 13. Discussion of the ParkBench Workshop 11/12 September, UK 14. "Electronic Benchmarking Journal" - status report - 15. Miscellaneous - 16. Date and venue for next meeting - (MBa) Mark Baker Univ. of Portsmouth (MBe) Michael Berry Univ. of Tennessee (JD) Jack Dongarra Univ. of Tenn./ORNL (RE) Rudi Eigenmann SPEC (VG) Vladimir Getov Univ. of Westminister (TH) Tony Hey Univ. of Southampton (SS) Subhash Saini NASA Ames (RS) Ron Sercely HP/CXTC (ES) Erich Strohmaier Univ. of Tennessee Jack Dongarra Erich Strohmaier From owner-parkbench-comm@CS.UTK.EDU Fri May 2 15:53:02 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA00358; Fri, 2 May 1997 15:53:02 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA13341; Fri, 2 May 1997 15:44:43 -0400 Received: from blueberry.cs.utk.edu (BLUEBERRY.CS.UTK.EDU [128.169.92.34]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id PAA13327; Fri, 2 May 1997 15:44:36 -0400 Received: by blueberry.cs.utk.edu (cf v2.11c-UTK) id TAA08348; Fri, 2 May 1997 19:44:04 GMT From: "Erich Strohmaier" Message-Id: <9705021544.ZM8346@blueberry.cs.utk.edu> Date: Fri, 2 May 1997 15:44:03 -0400 X-Face: ,v?vp%=2zU8m.23T00H*9+qjCVLwK{V3T{?1^Bua(Ud:|%?@D!~^v^hoA@Z5/*TU[RFq_n'n"}z{qhQ^Q3'Mexsxg0XW>+CbEOca91voac=P/w]>n_nS]V_ZL>XRSYWi:{MzalK9Hb^=B}Y*[x*MOX7R=*V}PI.HG~2 X-Mailer: Z-Mail (3.2.0 26oct94 MediaMail) To: parkbench-comm@CS.UTK.EDU Subject: ParkBench Committee Meeting Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Dear Colleague, Here is the revised agenda. Please send me ASAP a short email if you come so that we can arrange for a meeting room. ------------------- The ParkBench (Parallel Benchmark Working Group) will meet in Knoxville, Tennessee on May 9th, 1997. The meeting site will be the Knoxville Downtown Hilton Hotel. We have made arrangements with the Hilton Hotel in Knoxville. Hilton Hotel 501 W. Church Street Knoxville, TN Phone: 423-523-2300 When making arrangements tell the hotel you are associated with the 'ParkBench'. The rate about $79.00/night. You can download a postscript map of the area by looking at http://www.netlib.org/utk/people/JackDongarra.html. ---------------- The tentative agenda for the meeting is: 1. Minutes of last meeting (MBe) Changes to Current release: 2. Low Level (ES, VG, RS) comms1, comms2, comms3, poly2 3. 
Linear Algebra (ES) 4. Compact Applications - NPBs (SS, ES) New benchmarks: 5. HPF Low Level benchmarks (MBa) 6. Java Low-Level Benchmarks (VG) 7. New I/O benchmark benchmarks (MBa) 8. New performance database design and new benchmark output format Update of GBIS with new Web front-end (MBa,TH) Report from other benchmark activities 9. ASCI Benchmark Codes (AH) 10. SPEC-HPG (RE, JD) ParkBench: 11. ParkBench Bibliography 12. ParkBench Report 2 Other Activities: 13. Discussion of the ParkBench Workshop 11/12 September, UK (TH, MBa) 14. PEMCS - "Electronic Benchmarking Journal" - status report - (TH, MBa) 15. Status of Funding proposals (JD, TH) 15. Miscellaneous - 16. Date and venue for next meeting - (MBa) Mark Baker Univ. of Portsmouth (MBe) Michael Berry Univ. of Tennessee (JD) Jack Dongarra Univ. of Tenn./ORNL (RE) Rudi Eigenmann SPEC (VG) Vladimir Getov Univ. of Westminister (TH) Tony Hey Univ. of Southampton (AH) Adolfy Hoisie LLNL (SS) Subhash Saini NASA Ames (RS) Ron Sercely HP/CXTC (ES) Erich Strohmaier Univ. of Tennessee Jack Dongarra Erich Strohmaier From owner-parkbench-comm@CS.UTK.EDU Tue May 6 14:46:45 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA04480; Tue, 6 May 1997 14:46:45 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA25737; Tue, 6 May 1997 14:34:05 -0400 Received: from punt-2.mail.demon.net (relay-11.mail.demon.net [194.217.242.137]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id OAA25715; Tue, 6 May 1997 14:33:58 -0400 Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-2.mail.demon.net id aa1000641; 6 May 97 19:07 BST Message-ID: Date: Tue, 6 May 1997 19:06:15 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Parkbench Meeting Documents In-Reply-To: <9705021544.ZM8346@blueberry.cs.utk.edu> MIME-Version: 1.0 X-Mailer: Turnpike Version 3.01 AGENDA ITEM: > Changes to Current release: > 2. Low Level (VG) > comms1, comms2, Two documents will be submitted to the committee on this item by Roger Hockney and Vladimir Getov (Westminster University, UK). They can be downloaded as postscript files from: "New COMMS1 Benchmark: Results and Recommendations" http://www.minow.demon.co.uk/Pbench/comms1/PBPAPER2.PS "New COMMS1 Benchmark: The Details" http://www.minow.demon.co.uk/Pbench/comms1/PBPAPER3.PS The papers will be presented by Vladimir who will bring some paper copies with him. Best wishes Roger and Vladimir -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? 
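(Background note, illustration only: COMMS1 is the ping-pong, i.e. simplex, message benchmark that these papers analyse. The self-contained fragment below sketches the basic measurement pattern in MPI; it is not the ParkBench COMMS1 source, and the message lengths, repeat count, and two-process pairing are arbitrary choices for the sketch.)

      PROGRAM PINGPG
C     Schematic ping-pong (simplex) timing loop in the spirit of
C     COMMS1.  Illustration only, not the ParkBench code; message
C     lengths and repeat count are arbitrary.  The contents of BUF
C     are irrelevant to the timing.
      INCLUDE 'mpif.h'
      INTEGER MAXLEN, NREP
      PARAMETER (MAXLEN = 8192, NREP = 100)
      DOUBLE PRECISION BUF(MAXLEN), T0, T1
      INTEGER IERR, RANK, NPROC, NLEN, I
      INTEGER STATUS(MPI_STATUS_SIZE)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
      IF (NPROC .LT. 2) THEN
         CALL MPI_FINALIZE(IERR)
         STOP
      END IF
      NLEN = 1
   20 CONTINUE
      CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
      T0 = MPI_WTIME()
      DO 30 I = 1, NREP
         IF (RANK .EQ. 0) THEN
            CALL MPI_SEND(BUF, NLEN, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, IERR)
            CALL MPI_RECV(BUF, NLEN, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
         ELSE IF (RANK .EQ. 1) THEN
            CALL MPI_RECV(BUF, NLEN, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, STATUS, IERR)
            CALL MPI_SEND(BUF, NLEN, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, IERR)
         END IF
   30 CONTINUE
      T1 = MPI_WTIME()
C     One-way time per message is half the measured round trip
      IF (RANK .EQ. 0) WRITE(*,*) NLEN, (T1 - T0)/(2.0D0*NREP)
      NLEN = NLEN*2
      IF (NLEN .LE. MAXLEN) GO TO 20
      CALL MPI_FINALIZE(IERR)
      END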
From owner-parkbench-comm@CS.UTK.EDU Tue May 6 17:54:47 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA07526; Tue, 6 May 1997 17:54:46 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA17012; Tue, 6 May 1997 17:48:50 -0400 Received: from punt-1.mail.demon.net (relay-7.mail.demon.net [194.217.242.9]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA17003; Tue, 6 May 1997 17:48:47 -0400 Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-1.mail.demon.net id aa0623986; 6 May 97 21:37 BST Message-ID: Date: Tue, 6 May 1997 21:26:50 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Parkbench Meeting Documents (Correction) MIME-Version: 1.0 X-Mailer: Turnpike Version 3.01 I am resending this because there was a typo in the URLs: There are two MM in "minnow". Also if you took PBPAPER2.PS before receiving this repeat message, please take it again as I have corrected two errors in the graphs. SORRY Roger ************************ AGENDA ITEM: > Changes to Current release: > 2. Low Level (VG) > comms1, comms2, Two documents will be submitted to the committee on this item by Roger Hockney and Vladimir Getov (Westminster University, UK). They can be downloaded as postscript files from: CORRECTED URLs: "New COMMS1 Benchmark: Results and Recommendations" http://www.minnow.demon.co.uk/Pbench/comms1/PBPAPER2.PS "New COMMS1 Benchmark: The Details" http://www.minnow.demon.co.uk/Pbench/comms1/PBPAPER3.PS The papers will be presented by Vladimir who will bring some paper copies with him. Best wishes Roger and Vladimir -- -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? From owner-parkbench-comm@CS.UTK.EDU Mon May 12 05:36:41 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA24086; Mon, 12 May 1997 05:36:41 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA10068; Mon, 12 May 1997 05:18:21 -0400 Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id FAA10051; Mon, 12 May 1997 05:18:18 -0400 Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id FAA29262; Mon, 12 May 1997 05:18:16 -0400 (EDT) Date: Mon, 12 May 1997 05:18:16 -0400 (EDT) From: Pat Worley Message-Id: <199705120918.FAA29262@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Gordon Conference on HPC and NII Forwarding: Mail from 'Tony Skjellum ' dated: Sat, 10 May 1997 16:32:12 -0500 (CDT) Cc: worley@haven.EPM.ORNL.GOV Just in case you haven't received information on this already, here is a blurb on the 1997 Gordon conference in high performance computing. Unlike previous years, there is not an explicit emphasis on performance evaluation in this year's stated themes, but you can't (shouldn't) discuss future architectures and their impacts without discussing how to evaluate performance, and I am hoping that some benchmarking-minded people will show up and keep the discussion honest. ---------- Begin Forwarded Message ---------- The deadline for applying to attend the 1997 Gordon conference in high performance computing is June 1. If you are interested in attending, please apply as soon as possible. 
The simplest way to apply is to download the application form from the web site indicated below, or to use the online registration option. If you have any problems with either of these, please contact the organizers at tony@cs.msstate.edu and worleyph@ornl.gov. ------------------------------------------------------------------------------- The 1997 Gordon Conference on High Performance Computing and Information Infrastructure: "Practical Revolutions in HPC and NII" Chair, Anthony Skjellum, Mississippi State University, tony@cs.msstate.edu, 601-325-8435 Co-Chair, Pat Worley, Oak Ridge National Laboratory, worleyph@ornl.gov, 615-574-3128 Conference web page: http://www.erc.msstate.edu/conferences/gordon97 July 13-17, 1997 Plymouth State College Plymouth NH The now bi-annual Gordon conference series in HPC and NII commenced in 1992 and has had its second meeting in 1995. The Gordon conferences are an elite series of conferences designed to advance the state-of-the-art in covered disciplines. Speakers are assured of anonymity and referencing presentations done at Gordon conferences is prohibited by conference rules in order to promote science, rather than publication lists. Previous meetings have had good international participation, and this is always encouraged. Experts, novices, and technically interested parties from other fields interested in HPC and NII are encouraged to apply to attend. All attendees, including speakers, poster presenters, and session chairs must apply to attend. We *strongly* encourage all poster presenters to have their poster proposals in by May 13, 1997, though we will consider poster presentations up to six weeks prior to the conference. Application to attend the conference is also six weeks in advance. More information on the conference can be found at the web page listed above, including the list of speakers and poster presenters and information on applying for attendance. ----------- End Forwarded Message ----------- From owner-parkbench-comm@CS.UTK.EDU Tue May 13 13:58:00 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA20879; Tue, 13 May 1997 13:57:59 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA11997; Tue, 13 May 1997 13:33:14 -0400 Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA11983; Tue, 13 May 1997 13:33:10 -0400 Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.5/CRI-gate-news-1.3) with ESMTP id MAA20939 for ; Tue, 13 May 1997 12:33:07 -0500 (CDT) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id MAA16428 for ; Tue, 13 May 1997 12:33:06 -0500 (CDT) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id RAA20181; Tue, 13 May 1997 17:33:04 GMT Message-Id: <199705131733.RAA20181@magnet.cray.com> Subject: Parkbench directions To: parkbench-comm@CS.UTK.EDU Date: Tue, 13 May 1997 12:33:04 -0500 (CDT) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit To: ParkBench Group From: Charles Grassl Date: May 13, 1997 (Long) I appreciated the meeting this past week and wish to thank Eric and Jack for hosting it. I am aware of the great effort of many individuals have contributed to developing and implementing the ParkBench suite. 
In spite of this, I feel that we need to evaluate and correct our course. ParkBench should not merge with or use benchmarks from SPEC/HPG (the SPEC High Performance Group). SGI/Cray and IBM have already withdrawn from the SPEC/HPG group, and Fujitsu and NEC are no longer participating. The reasons for these companies and other institutions no longer participating should indicate to us (ParkBench) that something is amiss with the SPEC/HPG benchmarks and paradigm. Several of the reasons for the supercomputer manufacturers not supporting the SPEC/HPG effort are listed below. I list these reasons so that the ParkBench group can learn from them and avoid the same problems.

- Relevance. The particular benchmark programs being used by SPEC/HPG are not relevant or appropriate for supercomputing. The programs in the current SPEC/HPG suite do not represent any of the leading-edge software that is more typical of usage on high performance systems.

- Redundancy. The programs being developed by SPEC/HPG are not qualitatively or quantitatively different from the SPEC/OSG programs, and as such the effort is viewed as redundant and expensive.

- Methodology. The methodology being used by SPEC/HPG to procure, develop and run benchmarks lacks scientific and technical basis, and hence results have a vague and arbitrary interpretation.

- Programming model. Designing benchmarks for portability across systems is a convenient idea but does not reflect actual constraints or usage. More often than not, compatibility with a PREVIOUS model of computer is more important than compatibility ACROSS computers.

- Expense. Some of the large data cases for the SPEC/HPG programs will require hours or days to run, with little new data or information gained by the exercise. These exercises are extremely expensive in time, capital equipment, and logistics.

- Ergonomics. The cumbersome design of SPEC/HPG Makefiles and build procedures makes the programs difficult and expensive to test, maintain and analyze.

We in the ParkBench group must acknowledge the above items if we are to maintain interest and participation from computer vendors. I believe that reorganizing and refocusing the group could revitalize high performance computer benchmarking and re-invigorate the ParkBench group. As the ParkBench suite now stands, there are too many programs and they are difficult to build, test and maintain. This situation impedes usage and participation. Here are a few suggestions for our future practices and directions:

- Design and write benchmark programs. Don't borrow or solicit old code. The borrowed or solicited code is never quite appropriate and is usually obsolete. Our greatest asset is that we have scientists who are capable of designing experiments (benchmarks). (Build value.)

- Monitor and evaluate accuracy. Though we mention accuracy in ParkBench Report 1, we haven't applied it to the current programs. (Scientifically validate, or invalidate, our experiments.)

- Make it simple. Write and develop simple programs which do not need elaborate build procedures and which are easier to test and to maintain. (Keep It Simple, Stupid.)

- Build a better user interface. The belabored "run rules" and the interface with layers of Makefiles, includes and embedded relative file paths are unacceptable. An acceptable interface might require binary distribution and hence a desirable emphasis on designing and running rather than building and porting the benchmarks. (Make the product more attractive to more users.)

- Make the suite truly modular.
The current structure makes the simplest one CPU program as difficult to build and run as the most complicated program with Makefile includes, special compilers, source file includes, special libraries, suite libraries, etc. (Make it manageable.) - Drop the connection with SPEC/HPG and with NPB. This "grand unifying" scheme make redundant code. It has had the opposite effect of focusing benchmarking attention on ParkBench because it is yet another collection of benchmarks used by other organizations. (Be distinguishable and identifiable.) - Emphasis what ParkBench is associated with: benchmarking distributed memory parallel computers. We should write and develop benchmark programs which measure and instrument the parallel processing aspect of MPP systems. (Keep our focus.) I volunteer to develop and write a suite of message passing test programs which measure the performance and variance of message passing communication schemes. I have much experience with writing such a programs and believe that such suite would be useful for others and for the computer industry in general. I hesitate to contribute such programs to the present structure for several reasons: - The network test suite does not logically fit into the current "hierarchy" and hence might further clutter the ParkBench suite and make it further unfocused. - The current ParkBench structure is not manageable. Testing and maintenance would be extremely expensive in the current structure. - My company's effort may be interpreted as an endorsement of the current structure and model. The suite is not popular with vendors for reasons outlined above. Participation is currently discouraged. Discussion? Regards, Charles Grassl SGI/Cray Eagan, Minnesota USA From owner-parkbench-comm@CS.UTK.EDU Wed May 21 17:25:15 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA27513; Wed, 21 May 1997 17:25:15 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA07579; Wed, 21 May 1997 17:18:07 -0400 Received: from rastaman.rmt.utk.edu (root@TCHM11A6.RMT.UTK.EDU [128.169.27.188]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id RAA07571; Wed, 21 May 1997 17:18:02 -0400 Received: from rastaman.rmt.utk.edu (localhost [127.0.0.1]) by rastaman.rmt.utk.edu (8.7.6/8.7.3) with SMTP id RAA01108; Wed, 21 May 1997 17:24:43 -0400 Sender: mucci@CS.UTK.EDU Message-ID: <3383681A.D98C5FB@cs.utk.edu> Date: Wed, 21 May 1997 17:24:42 -0400 From: "Philip J. Mucci" Organization: University of Tennessee, Knoxville X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.28 i586) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU CC: "PVM Developer's Mailing List" Subject: Mesg Passing Benchmarks References: <199705131733.RAA20181@magnet.cray.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi all, Charles Grassl in his last message to this committee volunteered to write a suite of message passing benchmarks to replace the Low Levels...Before any action on his or this committee's part, I would recommend that you all have a look at version 3 of my pvmbench package. It now does MPI as well and can easily support other message passing primitives with a few #defines. Version 3 along with some sample results can be found at http://www.cs.utk.edu/~mucci/pvmbench. Note that this has not been tested on any MPP's with UTK PVM. 
The pvmbench benchmark will generate and graph the following: bandwidth, gap time (to buffer an outgoing message), roundtrip (latency/2), barrier/sec, broadcast, and summation reduction. Other tests can easily be added...I would highly recommend, before any action is taken, that this code be examined. It is less than a year old; version 3, available on that page, is in beta, i.e. it has not been released to the general public. Let me know what you think... -Phil -- /%*\ Philip J. Mucci | GRA in CS under Dr. JJ Dongarra /*%\ \*%/ http://www.cs.utk.edu/~mucci PVM/Active Messages \%*/

From owner-parkbench-comm@CS.UTK.EDU Fri May 23 12:03:04 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA06549; Fri, 23 May 1997 12:03:03 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA15901; Fri, 23 May 1997 11:05:32 -0400 Received: from berry.cs.utk.edu (BERRY.CS.UTK.EDU [128.169.94.70]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id LAA15895; Fri, 23 May 1997 11:05:30 -0400 Received: from cs.utk.edu by berry.cs.utk.edu with ESMTP (cf v2.11c-UTK) id LAA01370; Fri, 23 May 1997 11:05:31 -0400 Message-Id: <199705231505.LAA01370@berry.cs.utk.edu> to: parkbench-comm@CS.UTK.EDU Subject: Minutes of May ParkBench Meeting Date: Fri, 23 May 1997 11:05:31 -0400 From: "Michael W. Berry"

Here are the minutes from the recent ParkBench meeting in Knoxville. Best regards, Mike

-----------------------------------------------------------------
Minutes of ParkBench Meeting - Knoxville Hilton, May 9, 1997
-----------------------------------------------------------------

ParkBench Attendee List:
(MBa) Mark Baker        Univ. of Portsmouth    mab@sis.port.ac.uk
(MBe) Michael Berry     Univ. of Tennessee     berry@cs.utk.edu
      Shirley Browne    Univ. of Tennessee     browne@cs.utk.edu
(JD)  Jack Dongarra     Univ. of Tenn./ORNL    dongarra@cs.utk.edu
      Jeff Durachta     Army Res. Lab MSRC     durachta@arl.mil
(VG)  Vladimir Getov    Univ. of Westminster   getovv@wmin.ac.uk
(CG)  Charles Grassl    SGI/Cray               cmg@cray.com
(TH)  Tony Hey          Univ. of Southampton   ajgh@ecs.soton.ac.uk
(AH)  Adolfy Hoisie     Los Alamos Nat'l Lab   hoisie@lanl.gov
(CK)  Charles Koelbel   Rice University        chk@cs.rice.edu
(PM)  Phil Mucci        Univ. of Tennessee     mucci@cs.utk.edu
      Erik Riedel       GENIAS Software GmbH   erik@genias.de
(SS)  Subhash Saini     NASA Ames              saini@nas.nasa.gov
(RS)  Ron Sercely       HP-Convex              sercely@convex.hp.com
      Alan Stagg        CEWES                  stagga@wes.army.mil
(ES)  Erich Strohmaier  Univ. of Tennessee     erich@cs.utk.edu
(PW)  Pat Worley        Oak Ridge Nat'l Lab    worleyph@ornl.gov

SPEC-HPG Visitors:
      Don Dossa         DEC                    dossa@eng.pko.dec.com
(RE)  Rudi Eigenmann    Purdue University      eigenman@ecn.purdue.edu
      Greg Gaertner     DEC                    ggg@zko.dec.com
      Jean Suplick      HP                     suplick@rsn.hp.com
      Joe Throp         Kuck & Associates      throp@kai.com

At 9:05am EST, TH opened the meeting and asked that all the attendees introduce themselves. After a brief overview of the proposed agenda, MBe reviewed the minutes from the last ParkBench meeting in October of '96. The minutes were unanimously accepted and TH asked VG to present the proposed changes to the low-level benchmarks (9:20am). VG reviewed the original COMMS1 (ping-pong or simplex communication) and the COMMS2 (duplex communication) low-level benchmarks. He discussed some of the problems with the previous versions. These included the omission of calculated bandwidth, large message length problems, and large errors in the asymptotic fit. In collaboration with RS and CG, a number of improvements have been made to these benchmarks:
1. Measured bandwidth is provided in the output.
2. The time for the shortest message is provided.
3. The maximum measured bandwidth and the corresponding message length are now provided.
4. The accuracy of the least-squares 2-parameter fit has been improved (the sum of squares of the "relative", not the absolute, error is now used).
5. A new 3-parameter variable-power fit has been added for certain cases.
6. Parametric fits can be reported if the error is less than some user-specified tolerance.
7. A KDIAG parameter has been introduced to invoke diagnostic outputs.
8. Modifications to ESTCOM.f (as suggested by RS).

CG pointed out that it may not always be possible to interpret zero-length messages for these codes. On the Cray machines, such messages force an immediate return (i.e., no synchronization). He proposed that allowing zero-length messages be removed from the COMMS benchmarks. RS showed an actual COMMS1 performance graph demonstrating the difficulty of data extrapolation (if used to get the latency for zero-length message-passing). RS pointed out, however, that zero-length messages are defined w/in MPI, and suggested that a simple return (as in the case of the Cray machines) is not standard. VG displayed some of the observed COMMS1/2 performance obtained on the Cray T3E. The 3-parameter fit yielded a 7% relative error for messages ranging from 8 to 1.E+7 bytes. CG questioned how the breakpoints were determined. He indicated that the input parameters to the program required previous knowledge of where breakpoints occur (although implementations could change constantly). TH suggested that the parametric fitting should not be the default for these benchmarks, i.e., separate the analysis from the actual benchmarking (this concept was seconded by CG). RS suggested that the fitting routines could be placed on the WWW/Internet and that the COMMS1/2 codes simply produce data. CK, however, stressed that the codes should maintain some minimal parametric fitting for clarity and consistency of output interpretations. The minimal message length for the T3E results shown by VG was 8, and the corresponding minimal message length for a Convex CXD set of COMMS benchmarks was 1. The lack of similar ranges of messages could pose problems for comparisons. JD strongly felt that users will return to the notion of "latency" and want zero-length message overheads. Users may be primarily interested in start-up time for message-passing. RS pointed out that MPI does process zero-length messages. JD suggested that the minimal message length for the COMMS benchmarks be 8 bytes and RS proposed that the minimal message-passing time and corresp. message length be an output. After more discussion, the following COMMS changes/outputs were unanimously agreed upon:
1. Maximum bandwidth with corresp. message size.
2. Minimum message-passing time with corresp. message size.
3. Time for the minimum message length (could be 0, 1, 8, or 32 bytes but must be specified).
4. The software will be split into two programs: one to report the spot measurements and the other for the analysis.

At 10:00 am, SPEC-HPG members joined the ParkBench meeting for a joint session. CK reviewed the DoD Modernization Program. He indicated that the program is based on 3 primary components:
1. CHSSI (Commonly Highly Scalable Software Initiative)
2. DREN (Defense Research & Engineering Network)
3. Shared Resource Centers (4 Major Shared Resource Centers or MSRC's and 20 Distributed Centers or DC's)

Benchmarking is part of the mission of the MSRC's, especially for system integration and the Programming Environment & Training (PET) team. CK mentioned that the resources available at the MSRC's include: a 256-proc. Cray T3E and SGI Power Challenge (CEWES), a 256-proc. IBM SP/2 and SGI Origin 2000 at ASC, an SGI 790 at NAVO, and a collection of {SGI Origin, Cray Titan, J90} at the Army Research Lab. The benchmarking needs of the DoD program can be categorized as either contractual or training. The contractual needs are specified as PL1 (evaluation of initial machines), PL2 (upgrade to gain 3 times the performance of PL1), and PL3 (upgrade to gain 10 times the performance of PL1). CK mentioned that the MSRC's are planning for the PL2 phase later this year with PL3 scheduled in approx. 3 years. The training needs include: the evaluation of programming paradigms, the evaluation of performance trade-offs, templates for designing new codes, and benchmarks for training examples. The contractual benchmarks comprise 30 benchmarks (22 programs), some of which are export-controlled or proprietary (data may not be used in the public domain in some cases). The run rules specify the number of iterations for each benchmark in the suite. Each MSRC uses a different number of iterations per benchmark. Code modifications are allowed (parallel directives and message-passing can be used but no assembler) and algorithm substitutions are permitted provided the problem does not become specialized. The only performance metric reported for these benchmarks is the elapsed time for the entire suite. Benchmarks can be upgraded to reflect current workloads of the MSRC's but they must be compared head-to-head with previous systems. Example codes included in the DoD benchmark suite are: CTH (finite volume shock simulation), X3D (explicit finite element code), OCEAN-O2 (an ocean modeling code), NIKE3D (implicit nonlinear 3D FEM), and an Aggregate I/O benchmark. Planned benchmarking activities for the DoD Modernization Program include:
1. benchmarks for evaluating programming techniques (determine what works; develop decision trees)
2. benchmarks for teaching (classes on "worked" examples; template modification)
This effort currently has 1 FTE and over 50 University personnel (in the PET program) involved (although they are not primarily responsible for benchmarking work).

At 10:35am, TH asked AH from Los Alamos Nat'l Lab to overview their ASCI benchmark suite. He began by pointing out that these codes form the "Los Alamos set of" ASCI Benchmarks. Before presenting the list of codes, AH noted that the philosophy of this activity was to achieve "experiment ahead" capability, especially with immature computing platforms. Los Alamos is also interested in developing performance models as well as kernels.
The list of active/research codes and compact applications comprising this suite is:

Code          Language(s)   Parallelism          Description
*HEAT(RAGE)   f77, f90      MPI (f90),           Eulerian adaptive mesh refinement based on
                            MPI/SM (f77)         Riemann solvers; coupled physics-CFD;
                                                 particle & radiative transport
EULER         f90           MPI (for SIMD);      Admissable fluid; unstructured mesh,
                            SIMD (SP vector)     explicit solution; high-speed fluids;
                                                 SP = single processor
NEUT          f77           MPI, SM, SHMEM       Monte-Carlo, particle
SWEEP3D       f90           MPI, SHMEM           Inner/outer iteration (kernel)
                                                 (compact application)
HYDRO(T)      f77           Serial               (compact application)
TBON          f77           MPI                  Material science; quantum mechanics;
                                                 polymer age simulation
*TECOLOTE     C++           MPI                  Mixed cell hydro. with regular structured grid
*TELURIDE     f90           MPI                  Casting simulation; irregular structured grid;
                                                 Krylov solution methods
*DANTE        HPF           MPI

* = export controlled

The codes and compact apps above vary in size from 2,000 to 35,000 lines. AH noted that LANL could provide support for future ASCI-based ParkBench codes. The ASCI benchmark suite presented might in the future include tri-lab (Livermore, Sandia, Los Alamos) contributions. The ASCI application suite can be set up with data sets leading to varying run-times. AH mentioned that Los Alamos' ASCI benchmarking efforts are focused on high performance computing, leading edge architectures, algorithms, and applications. They are particularly concentrating on developing expertise in distributed shared-memory performance evaluation and modeling. AH expressed the hope that the efforts of ParkBench will follow similar directions.

At 11:05am, SS reviewed some of the most recent NAS Parallel Benchmarks results. He began with vendor-optimized CG Class B results using row and column distribution blocking. Results for different numbers of processors of the T3D were reported along with results for the NEC SX-4, SGI Origin 2K, Convex SPP2K, Fujitsu VPP700, and IBM P2SC. He also showed results for FT Class B and BT Class B (all machines reported performed well on this benchmark). For BT, it was pointed out that 4 of the machines (Cray T3E, DEC Alpha, IBM P2SC, and NEC SX-4) are essentially based on the same processor but achieve widely-varying results. SS also reported HPF Class A MG results on 16 processors of the IBM SP2. The HPF version (APR-HPF/Portland Group compiled) was only 3 times slower than the MPI-based (f77) implementation. This is indeed a significant result given that two years ago the HPF version was as much as 10 times slower than the comparable MPI version. An HPF version of the Class A FT benchmark on 64 processors was shown to be faster than the MPI version (1.6 times faster) when optimized libraries are used in both versions. For the Class A SP benchmark (on 64 processors of the SP/2), the APR- and PGI-compiled HPF versions were within a factor of 2 of the MPI versions. Finally, the HPF Class A BT code on 64 processors of the Cray T3D was within a factor of 0.5 of the MPI version.

At 11:35am, TH invited RE to overview current SPEC-HPG activities. The SPEC-HPG benchmarks define a suite of real-world high-performance computing applications designed for comparisons across different platforms (serial and message-passing). RE pointed out the history of the SPEC-HPG effort as a merger between the PERFECT and SPEC benchmarking activities. The current SPEC-HPG suite is comprised of 2 codes: SPECchem96 and SPECseis96. The SPECchem96 code evolved from the GAMES code used in the pharmaceutical and chemical industries.
It comprises 109,389 lines of f77 (21% comments) and 865 subroutines and functions. The wave functions are written to disk. The SPECseis96 code is derived from the ARCO benchmark suite, which consists of four phases: data generation, stack data, time migration, and depth migration. This code decomposes the domain into n equal parts (for n processors), with each part processed independently. It has over 15K lines of code, made up of 230 Fortran subroutines and 199 C functions for I/O and systems utilities. SPECseis96 uses 32-bit precision, FFT's, Kirchhoff integrals, and finite differences. The very first set of SPEC-HPG benchmark results was approved on May 8, 1997 (the preceding day). New benchmarks being considered are PMD (Parallel Molecular Dynamics) and MM5 (NCAR Weather Processing C code). The decision on whether or not to accept these two potential SPEC-HPG codes will be made in about 5 months. The SPEC-HPG run rules permit the use of compiler switches, source code changes, and optimized libraries (which have been disclosed to customers). Only approved algorithmic changes will be disclosed. RE gave the URL for the SPEC-HPG effort: http://www.specbench.org/hpg. He also referred to a recent article by himself and S. Hassanzadeh in "IEEE Computational Science & Engineering" and two email reflectors for SPEC-HPG communication: comments@specbench.org and info@specbench.org. JD then gave a brief history of ParkBench and SPEC-HPG interactions and suggested that the two efforts might consider sharing results and software. The biggest difference between the two efforts is in the availability of software, as ParkBench code is freely available and SPEC-HPG software has some restrictions. A forum to publish both sets of results was discussed and it was agreed that both efforts should at least share links on their respective webpages. RE pointed out that anyone can get the SPEC-HPG CD of benchmarks without actually being a SPEC member. JD stressed that the process of running codes (for any suite) needs to be simplified so that building executables for different platforms is not problematic. Modifications for porting should be restricted to driver programs. RS indicated that he has Perl scripts that run all the low_level benchmarks, including COMMS3 for 2 to N procs, and produce a summary of the results. *** ACTION ITEM *** JD, RE, AH, and CK will discuss a potential joint effort to simplify the running of benchmark codes (contact RS also about his Perl scripts). MBa noted that the SPEC-HPG members should be added to the ParkBench email list (parkbench-comm@cs.utk.edu). He also indicated that the European benchmarking workshop scheduled for next Fall might coordinate with the European SPEC group (scheduled for Sept. 11-12). At 12:10pm, the attendees went to lunch (Soup Kitchen). After lunch (1:30pm), TH asked ES and VG to coordinate the changes to the COMMS benchmarks discussed above (*** ACTION ITEM ***). ES then discussed modifications to poly2 for the ParkBench V2.2 suite. The proposed changes include:
1. enlarged arrays A(1000000), B(1000000)
2. removal of arrays C and D
3. avoiding cache flushes by using a sliding vector, i.e., the loop DO I=1,N becomes DO I=NMIN,NMAX, with NMIN advanced by NMIN=NMIN+N+INC after each pass, where INC=17 by default (this avoids reuse of the old cache lines).
PM then discussed a program for determining parameters for memory subsystems. Characteristics of this software include the use of tight loops, independent memory references, and maximized register use.
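The measurement idea can be sketched as follows. This is not the code PM presented - the array sizes, repetition counts, accumulator scheme and use of gettimeofday are assumptions made purely for illustration:

-------------------------------------------------------
/* Illustrative memory-hierarchy read-bandwidth sweep (not PM's program).
 * Working sets from 4 KB to 4 MB are read repeatedly in a tight loop with
 * several independent accumulators; small sets stay resident in cache, so
 * the MB/s figure reflects the level of the hierarchy being exercised. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

static double now(void)                       /* wall-clock seconds */
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void)
{
    const size_t maxwords = 4 * 1024 * 1024 / sizeof(double);  /* 4 MB */
    double *a = malloc(maxwords * sizeof(double));
    volatile double sink = 0.0;               /* defeats dead-code removal */

    for (size_t i = 0; i < maxwords; i++) a[i] = 1.0;

    for (size_t bytes = 4096; bytes <= 4 * 1024 * 1024; bytes *= 2) {
        size_t n = bytes / sizeof(double);
        int reps = (int)(64 * 1024 * 1024 / bytes) + 1;   /* ~64 MB of traffic */
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        double t0 = now();
        for (int r = 0; r < reps; r++)
            for (size_t i = 0; i < n; i += 4) {           /* independent references */
                s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
            }
        double t = now() - t0;
        sink += s0 + s1 + s2 + s3;
        printf("%8zu bytes : %8.1f MB/s\n",
               bytes, (double)bytes * reps / t / 1e6);
    }
    free(a);
    return 0;
}
-------------------------------------------------------

Plotting MB/s against working-set size produces the kind of curves described next, with drops visible at each cache boundary.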
He showed graphs of memory hierarchy bandwidth (reads and writes) depicting memory size (ranging from 4 KB to 4 MB) versus MB/sec transfer rates. Some curves illustrated the effective cache size quite well. PM pointed out that dynamically-scheduled processors pose a significant problem for this type of modeling. The program can be run with or without a calibration loop exploiting known memory transfer data. CG suggested that it would be nice to have such a program to measure latency at all levels of the hierarchy. PM's webpages for this program are: http://www.cs.utk.edu/~mucci/cachebench and http://www.cs.utk.edu/~mucci/parkbench. CK suggested that an uncalibrated version of PM's benchmark would be more useful to users (more reflective of real codes). JD pointed out that the output of the program could be tabulated bandwidths, latencies, etc. CG felt this program would be a very useful tool. PM noted that the calibration will not be used by default. TH suggested that the ParkBench effort might want to develop a future "ParkBench Tool Set" which contains programs like this one developed by PM. With regard to the Linalg Kernels, ES noted that although many of the routines have calls to Scalapack routines, Scalapack will not be included in future software releases. Users will have to get their own copies of the source (or binaries) for Scalapack. The size of these particular kernel benchmarks drops by about one-third when Scalapack is removed. *** ACTION ITEM *** ES will report the most recent Linalg benchmark performance results at the next ParkBench meeting. TH then asked for discussions on new benchmarks, with MBa leading the discussion on HPF benchmarks. MBa indicated that a new mail reflector (parkbench-hpf@cs.utk.edu) had been set up for this cause, with himself as moderator for low-level codes (CK will moderate kernels and SS will moderate discussions on HPF compact applications). MBa noted that there is limited manpower for the HPF benchmarking activities. CK noted that he had discussed this effort at the recent HPFF meeting (and other users' meetings). A draft document on the ParkBench HPF benchmarks is available at http://www.sis.port.ac.uk/~mab/ParkBench. MBa felt strongly that without manpower support this particular activity will die and that a lead site is needed. *** ACTION ITEM *** CK and SS will investigate interest in HPF compact application development. JD indicated that wrappers are being used to create HPF versions of the Linalg kernels. The procedure involves writing wrappers for the current Scalapack driver programs. Eventually, these programs may be completely rewritten in HPF (this will start in the summer). TH suggested that HPF kernel benchmark performance be reported at the ParkBench meeting in September (at the Southampton Performance Workshop). MBa went on to report on the status of I/O benchmarks. Basically, not much progress has been made on the ParkBench I/O initiative. A new I/O project between ECMWF, FECIT, and the Univ. of Southampton was launched this past February. They are looking at the I/O in the IFS code from the ECMWF (the European Centre for Medium-Range Weather Forecasts). David Snelling is the FECIT leader, and he has also participated in ParkBench activities. This I/O project has 1 FTE at Southampton and 1.5 FTE at FECIT, along with several personnel at ECMWF. One workshop and two technical meetings are planned for the 1-year project. The goals are: to develop instrumented I/O benchmarks and build on top of MPI-IO (test, characterize parallel systems).
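To make the idea of an instrumented I/O benchmark built on MPI-IO concrete, a minimal timed collective write might look like the sketch below. It is not code from the ECMWF/FECIT/Southampton project; the file name, transfer size and access pattern are assumptions for illustration:

-------------------------------------------------------
/* Minimal timed MPI-IO write sketch (illustrative only).  Each rank writes
 * a contiguous block at its own offset with a collective call, and rank 0
 * reports the aggregate bandwidth, including the cost of closing the file. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 1 << 20;                    /* 1M doubles per rank: assumption */
    int rank, nprocs;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *buf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) buf[i] = (double)rank;

    MPI_File_open(MPI_COMM_WORLD, "iobench.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * n * sizeof(double);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    MPI_File_write_at_all(fh, offset, buf, n, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);                      /* close so the data is flushed */
    double t = MPI_Wtime() - t0;

    if (rank == 0)
        printf("%d ranks wrote %.1f MB in %.3f s (%.1f MB/s aggregate)\n",
               nprocs, (double)nprocs * n * sizeof(double) / 1e6, t,
               (double)nprocs * n * sizeof(double) / 1e6 / t);

    free(buf);
    MPI_Finalize();
    return 0;
}
-------------------------------------------------------

An instrumented version would additionally record per-rank times and trace events (e.g. for VAMPIR or PABLO, as mentioned below).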
Their methodology is very similar to that of ParkBench. Codes in f90 and ANSI C are being considered (stubs for VAMPIR and PABLO). Regular reports to Fujitsu (the sponsor of the activity) are planned and a full I/O test suite is planned by February 1998. MBa also reported on the status of the ParkBench graphical database. Currently, the performance data is kept in a relational DBMS. A frontend Java applet has been written to query the DBMS on-the-fly. A backend is also in development which will automate the extraction of new performance data and its insertion into the DBMS (via an http server). By September, a more complete prototype which will allow MS Access and JDBC between 2 different machines should be ready. VG then discussed the development of Java-based low-level benchmarks. He presented a Java-to-C Interface Generator which would allow Java benchmarks to call existing C libraries on remote machines. He presented sample Java+C NAS PB results on a 16-processor IBM SP/2 (Class A IS Benchmark):

Version    1 Proc   2 Procs   4 Procs   8 Procs   16 Procs
NASA (C)     29.1      17.4       9.4       5.2       2.8
C            40.5      24.9      13.1       9.3      15.6
Java         ----     132.5      64.7      37.9      33.5

At 2:50pm, TH reported on other ParkBench activities including the new PEMCS (Performance Evaluation and Modeling for Computer Systems) electronic journal. Suggested articles/authors include:
*1. ParkBench Report No. 2 (ES, MBe)
*2. NAS PB
3. SPEC-HPG
*4. Top 500
5. AutoBench (M. Ginsburg)
*6. Euroben (van der Steen)
7. RAPS
8. Europort
*9. Cache benchmarks
10. ASCI benchmarks (DoD)
*11. PERFORM
12. R. Hockney
*13. PEPS
14. C3I/Rome Labs
Those articles possible for Summer '97 are marked with *. JD suggested that articles be available in Encapsulated Postscript, PDF (Adobe), and HTML. TH noted that EU funding will provide a host computer and some administration. Possible publishers are Oxford Univ. Press and Elsevier. At 3:10pm, ES requested more items for the ParkBench bibliography which will be available on the WWW. PW suggested that authors should be able to submit links to ParkBench-related applications. JD then briefly discussed WebBench, which is a website focused on benchmarking and performance evaluation. Data is presented on platforms, applications, organizations, vendors, conferences, papers, newsgroups, FAQ's, and repositories (PDS, Top500, Linpack, etc.). The WebBench URL is http://www.netlib.org/benchweb. MBa reminded attendees of the Fall Performance Workshop/ParkBench meeting on (Thursday and Friday) Sept. 11 and 12. This meeting will be held at the County Hotel, Southampton, UK. Invited and contributed talks will be presented. With regard to ParkBench funding, JD indicated that the UT/ORNL/NASA Ames proposal was not selected for funding but that it could be re-submitted next year. Expected funding from Rome Lab was not received. TH and VG did not succeed this past year either, although some funding from Fujitsu is possible. TH adjourned the meeting at 3:25pm EST.
From owner-parkbench-comm@CS.UTK.EDU Tue May 27 10:32:45 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA25239; Tue, 27 May 1997 10:32:45 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA05022; Tue, 27 May 1997 10:12:02 -0400 Received: from exu.inf.puc-rio.br (exu.inf.puc-rio.br [139.82.16.3]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA05013; Tue, 27 May 1997 10:11:53 -0400 Received: from obaluae (obaluae.inf.puc-rio.br) by exu.inf.puc-rio.br (4.1/SMI-4.1) id AA20170; Tue, 27 May 97 11:11:00 EST From: maira@inf.puc-rio.br (Maira Tres Medina) Received: by obaluae (SMI-8.6/client-1.3) id LAA16226; Tue, 27 May 1997 11:10:58 -0300 Date: Tue, 27 May 1997 11:10:58 -0300 Message-Id: <199705271410.LAA16226@obaluae> To: parkbench-comments@CS.UTK.EDU Subject: Benchmarks Cc: parkbench-comm@CS.UTK.EDU, maira@CS.UTK.EDU, victal@CS.UTK.EDU X-Sun-Charset: US-ASCII

Hello, I'm a graduate student at the Computer Science Department of PUC-Rio (Catholic University of Rio de Janeiro). I'm currently studying the Low_Level benchmarks for measuring basic computer characteristics. I have had some problems trying to run some of the benchmarks. For example, the benchmark comms1 for PVM prints the following error messages and stops:

n05.sp1.lncc.br:/u/renata/maira/ParkBench/bin/RS6K>comms1_pvm Number of nodes = 2 Front End System (1=yes, 0=no) = 0 Spawning done by process (1=yes, 0=no) = 1 Spawned 0 processes OK... libpvm [t4000c]: pvm_mcast(): Bad parameter TIDs sent...benchmark progressing... n05.sp1.lncc.br:/u/renata/maira/ParkBench> bin/RS6K/comms1_pvm 1525-006 The OPEN request cannot be processed because STATUS=OLD was coded in the OPEN statement but the file comms1.dat does not exist. The program will continue if ERR= or IOSTAT= has been coded in the OPEN statement. 1525-099 Program is stopping because errors have occurred in an I/O request and ERR= or IOSTAT= was not coded in the I/O statement.

I would like to know how I can execute the benchmarks only for PVM. Can you help me? I have not had problems with the sequential benchmarks (tick1, tick2 ...). Thank you very much for your attention. Maira Tres Medina, Ph.D. Student, Pontifical Catholic University, Rio de Janeiro, Brazil

From owner-parkbench-comm@CS.UTK.EDU Wed May 28 16:36:07 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA15377; Wed, 28 May 1997 16:36:06 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA16158; Wed, 28 May 1997 16:26:41 -0400 Received: from rastaman.rmt.utk.edu (root@TCHM03A16.RMT.UTK.EDU [128.169.27.60]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA16150; Wed, 28 May 1997 16:26:37 -0400 Received: from rastaman.rmt.utk.edu (localhost [127.0.0.1]) by rastaman.rmt.utk.edu (8.7.6/8.7.3) with SMTP id QAA00226; Wed, 28 May 1997 16:33:33 -0400 Sender: mucci@CS.UTK.EDU Message-ID: <338C968B.124F15AA@cs.utk.edu> Date: Wed, 28 May 1997 16:33:33 -0400 From: "Philip J. Mucci" Organization: University of Tennessee, Knoxville X-Mailer: Mozilla 3.01 (X11; I; Linux 2.0.28 i586) MIME-Version: 1.0 To: Maira Tres Medina CC: parkbench-comments@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU Subject: Re: Benchmarks References: <199705271410.LAA16226@obaluae> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

Hi, You need to make sure the dat files are in the executable directory. They should be installed in $PVM_ROOT/bin/$PVM_ARCH. -Phil -- /%*\ Philip J.
Mucci | GRA in CS under Dr. JJ Dongarra /*%\ \*%/ http://www.cs.utk.edu/~mucci PVM/Active Messages \%*/ From owner-parkbench-comm@CS.UTK.EDU Thu Jun 5 11:30:41 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id LAA11302; Thu, 5 Jun 1997 11:30:41 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA14227; Thu, 5 Jun 1997 10:53:09 -0400 Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA14220; Thu, 5 Jun 1997 10:53:07 -0400 Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id KAA06499; Thu, 5 Jun 1997 10:53:06 -0400 (EDT) Date: Thu, 5 Jun 1997 10:53:06 -0400 (EDT) From: Pat Worley Message-Id: <199706051453.KAA06499@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Gordon conference deadline extended Forwarding: Mail from 'Pat Worley ' dated: Thu, 5 Jun 1997 10:48:07 -0400 (EDT) Cc: worley@haven.EPM.ORNL.GOV, tony@cs.msstate.edu (Our apologies if you receive this multiple times.) There is still room for additional attendees at the Gordon Conference on High Performance Computing, and the Gordon Research Conference administration has agreed to extend the application deadline. As a practical matter, applications need to be submitted no later than JULY 1. We will also stop accepting applications before that date if the maximum meeting size is reached, so please apply as soon as possible if you are interested in attending. The simplest way to apply is to download the application form from the web site http://www.erc.msstate.edu/conferences/gordon97 or to use the online registration option available at the same site. If you have any problems with either of these, please contact the organizers at tony@cs.msstate.edu and worleyph@ornl.gov. Complete information on the meeting is available from the Web site or its links, but a short summary of the meeting follows: -------------------------------------------------------------------------- The 1997 Gordon Conference on High Performance Computing and Information Infrastructure: "Practical Revolutions in HPC and NII" Chair, Anthony Skjellum, Mississippi State University, tony@cs.msstate.edu, 601-325-8435 Co-Chair, Pat Worley, Oak Ridge National Laboratory, worleyph@ornl.gov, 615-574-3128 Conference web page: http://www.erc.msstate.edu/conferences/gordon97 July 13-17, 1997 Plymouth State College Plymouth NH The now bi-annual Gordon conference series in HPC and NII commenced in 1992 and has had its second meeting in 1995. The Gordon conferences are an elite series of conferences designed to advance the state-of-the-art in covered disciplines. Speakers are assured of anonymity and referencing presentations done at Gordon conferences is prohibited by conference rules in order to promote science, rather than publication lists. Previous meetings have had good international participation, and this is always encouraged. Experts, novices, and technically interested parties from other fields interested in HPC and NII are encouraged to apply to attend. The conference consists of technical sessions in the morning and evening, with afternoons free for discussion and recreation. Each session consists of 2 or 3 one hour talks, with ample time for questions and discussion. All speakers are invited and there are no parallel sessions. All attendees are both encouraged and expected to actively participate, via discussions during the technical sessions or via poster presentations. 
All attendees, including speakers, poster presenters, and session chairs, must apply to attend. Poster presenters should indicate their poster proposals on their applications. While all posters must be approved, successful applicants should assume that their posters have been accpeted unless they hear otherwise. Meeting Themes: Networks: Emerging capabilities and the practical implications : New types of networking Real-Time Issues Multilevel Multicomputers Processors-in-Memory and Other Fine Grain Computational Architectures Impact of Evolving Hardware on Applications Impact of Software Abstractions on Performance Confirmed Speakers: Ashok K. Agrawala University of Maryland Kirstie Bellman DARPA/SISTO James C. Browne University of Texas at Austin Andrew Chien University of Illiniois, Urbana-Champaign Thomas H. Cormen Dartmouth College Jean-Dominique Decotignie CSEM David Greenberg Sandia National Laboratories William Gropp Argonne National Laboratory Don Heller Ames Laboratory Jeff Koller Information Sciences Institute Peter Kogge University of Notre Dame Chris Landauer The Aerospace Corporation Olaf M. Lubeck Los Alamos National Laboratory Andrew Lumsdaine University of Notre Dame Lenore Mullins SUNY, Albany Paul Plassmann Argonne National Laboratory Lui Sha Carnegie Mellon Univeristy Paul Woodward University of Minnesota From owner-parkbench-comm@CS.UTK.EDU Tue Jul 1 17:06:52 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA20550; Tue, 1 Jul 1997 17:06:51 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA21503; Tue, 1 Jul 1997 17:03:35 -0400 Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA21438; Tue, 1 Jul 1997 17:02:42 -0400 Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA10168; Tue, 1 Jul 97 22:00:22 BST Date: Tue, 1 Jul 97 20:55:49 From: Mark Baker Subject: Fall 97 Parkbench Workshop - Southampton, UK To: ejz@ecs.soton.ac.uk, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, William Gropp , Antoine Hyaric , gent@genias.de, gcf@npac.syr.edu, geerd.hoffman@ecmwf.co.uk, reed@cs.uiuc.edu, david@cs.cf.ac.uk, clemens-august.thole@gmd.de, klaus.stueben@gmd.de, "J.C.T. Pool" , Paul Messina , foster@mcs.anl.gov, idh@soton.ac.uk, rjc@soton.ac.uk, plg@pac.soton.ac.uk, Graham.Nudd@dcs.warwick.ac.uk Cc: lec@ecs.soton.ac.uk, rjr@ecs.soton.ac.uk, "MATRAVERS Prof. D R STAF" , wilsona@sis.port.ac.uk, grant , hwyau@epcc.ed.ac.uk X-Priority: 3 (Normal) X-Mailer: Chameleon 5.0.1, TCP/IP for Windows, NetManage Inc. Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, This is to let you know that the Department of Electronics and Computer Science at the University of Southampton is organising a Fall 97 Parkbench Workshop on the 11th and 12th of September 1997. See http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ for futher details. The workshop will include a number of talks from researchers working in th field of performance evaluation and modelling of computer systems, a panel discussion session and a Parkbench committee meeting. The Workshop is free to attend - workshop delegates need only cover their own travel and accommodation expenses. Attendance is limited and so the availability of places at the Workshop will be allocated on a first come basis. 
It is planned to turn the talks given at the Workshop into a series of short papers which will be put together and published as a Special Issue of the electronic journal Performance Evaluation and Modelling of Computer Systems (PEMCS). For further information or registration details refer to the Web pages - (http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/registration.html). I would appreciate it if you would kindly pass this email onto colleges who may be interested in the event. Regards Mark ------------------------------------- Dr Mark Baker CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 7/1/97 - Time: 8:55:49 PM URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Wed Jul 23 17:19:23 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id RAA04434; Wed, 23 Jul 1997 17:19:23 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA28191; Wed, 23 Jul 1997 17:10:39 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id RAA28171; Wed, 23 Jul 1997 17:10:24 -0400 (EDT) Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA14190; Wed, 23 Jul 97 22:10:30 BST Date: Wed, 23 Jul 97 22:01:41 +0000 From: Mark Baker Subject: PEMCS Web Site To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Face: "3@c]&iv:nfs&\mp6nN90ioxbQ-Eu:]}^MyviIL7YjwT,Cl)|TYpTQ})PP'&O=V`~)JQRWjM?H;'`q\"3mv "j@5vs)}!WC3pG9q:;rpe0\LoLQfY"1?1A.\(f=E*&QAW8WK+)*)T0[Bv=[{.-f7<6Ddv!2XaWhH X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, The Web site that will host the Journal of "Performance Evaluation and Modelling of Computer Systems (PEMCS)" can be found at: http://hpc-journals.ecs.soton.ac.uk/PEMCS/ The pages I have put up are at the present still in a "draft/under-construction" state. I would appreciate any comments or feedback about the pages. Regards Mark ------------------------------------- Dr Mark Baker DIS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 07/23/97 - Time: 22:01:41 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Thu Jul 24 08:26:42 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA12708; Thu, 24 Jul 1997 08:26:42 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA04617; Thu, 24 Jul 1997 08:21:55 -0400 (EDT) Received: from berry.cs.utk.edu (BERRY.CS.UTK.EDU [128.169.94.70]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA04599; Thu, 24 Jul 1997 08:21:23 -0400 (EDT) Received: from cs.utk.edu by berry.cs.utk.edu with ESMTP (cf v2.11c-UTK) id IAA13817; Thu, 24 Jul 1997 08:21:24 -0400 Message-Id: <199707241221.IAA13817@berry.cs.utk.edu> To: Mark Baker cc: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU Subject: Re: PEMCS Web Site In-reply-to: Your message of Wed, 23 Jul 1997 22:01:41 -0000. Date: Thu, 24 Jul 1997 08:21:24 -0400 From: "Michael W. 
Berry" > Dear All, > > The Web site that will host the Journal of "Performance > Evaluation and Modelling of Computer Systems (PEMCS)" can > be found at: > > http://hpc-journals.ecs.soton.ac.uk/PEMCS/ > > The pages I have put up are at the present still in a > "draft/under-construction" state. > > I would appreciate any comments or feedback about the > pages. > > Regards > > Mark > > > > ------------------------------------- > Dr Mark Baker > DIS, University of Portsmouth, Hants, UK > Tel: +44 1705 844285 Fax: +44 1705 844006 > E-mail: mab@sis.port.ac.uk > Date: 07/23/97 - Time: 22:01:41 > URL http://www.sis.port.ac.uk/~mab/ > ------------------------------------- > Mark, the webpages are well organized. You might reconsider the red text on the green background of the menu frame. It was difficult to read on my machine at home. Nice work! Mike ------------------------------------------------------------------- Michael W. Berry Ayres Hall 114 berry@cs.utk.edu Department of Computer Science OFF:(423) 974-3838 University of Tennessee FAX:(423) 974-4404 Knoxville, TN 37996-1301 URL:http://www.cs.utk.edu/~berry/ ------------------------------------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Fri Aug 1 12:59:29 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA05831; Fri, 1 Aug 1997 12:59:27 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA01387; Fri, 1 Aug 1997 12:38:00 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA01337; Fri, 1 Aug 1997 12:37:24 -0400 (EDT) Received: from baker (baker.npac.syr.edu) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA15842; Fri, 1 Aug 97 17:36:11 BST Date: Fri, 1 Aug 97 17:17:51 +0000 From: Mark Baker Subject: Reminder - Fall Parkbench Workshop To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Face: ,<'y31|nlb,jCP5?km9\KD+>p9/e?:|$RRhY]e;#`awGHh=mrY.T??#]-*rt}l0*u`k2A7n KlqNG"u'-%cS@3|G[%=m%bSB[lfSn5n"gD4CU(j?1y?#SOkm!qw_=p%c#"6g&(+\Oy6T{4CEShal?z M)&Gd'Pb6Qc~>SPx{m[F55=]yY>cN>|/m5)T?q`OTjdQL=7-n%NT({;;$P*2[#7ZWL8baLoI_/F89, x'u`*$'<|ctKNYTSJuLV=!$QT3bN*>91V,a0Cc"_UsxwMKg\;#W2LZ$!`j?ZWp;byz~;y}2Dz6i7y% E&;gfnmI_~}+oifmWXJMHfWeezBL1("ZnFe!rnX[Q|,:IJ?iq+PePa/[3R4 X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, This email is a reminder about the: ---------------------------------------------------------------------------------------------------- Fall ParkBench Workshop Thursday 11th/Friday 12th September 1997 at the University of Southampton, UK See http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ ---------------------------------------------------------------------------------------------------- If you are interested in attending the Workshop you should register now and reserve accommodation as hotel rooms in Southampton during the workshop period will be in short supply due to the "International Southampton Boat Show" which will also be taking place. At present we have a preliminary reservation on rooms at the County Hotel where the Workshop is being held. Without concrete delegate reservations we can only hold onto there rooms for approximately another week. Thereafter, accommodation at the Hotel, or around the city, may be more problematic in getting and reserving. 
So, I encourage potential Workshop delegates to register ASAP. Mark ------------------------------------- Dr Mark Baker University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 08/01/97 - Time: 17:17:52 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Aug 11 13:13:12 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA20171; Mon, 11 Aug 1997 13:13:11 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA06842; Mon, 11 Aug 1997 13:02:59 -0400 (EDT) Received: from MIT.EDU (SOUTH-STATION-ANNEX.MIT.EDU [18.72.1.2]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA06808; Mon, 11 Aug 1997 13:02:42 -0400 (EDT) Received: from MIT.MIT.EDU by MIT.EDU with SMTP id AA27349; Mon, 11 Aug 97 13:02:14 EDT Received: from HOCKEY.MIT.EDU by MIT.MIT.EDU (5.61/4.7) id AA01161; Mon, 11 Aug 97 13:02:12 EDT Message-Id: <9708111702.AA01161@MIT.MIT.EDU> X-Sender: mmccarth@po9.mit.edu X-Mailer: Windows Eudora Pro Version 2.1.2 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Mon, 11 Aug 1997 13:02:12 -0400 To: alison.wall@rl.ac.uk, weber@scripps.edu, schauser@cs.ucsb.edu, dewombl@sandia.gov, edgorha@sandia.gov, rdskocy@sandia.gov, sales@pgroup.com, utpds@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, pancake@cs.orst.edu, johnreed@ghost.CS.ORST.EDU, levesque@apri.com, davida@cit.gu.edu.au, gddt@gup.uni-linz.ac.at, atempt@gup.uni-linz.ac.at, rileyba@ornl.gov, bac@ccs.ornl.gov From: "Michael F. McCarthy" Subject: For Sale: CM-5 PLEASE FORWARD THIS NOTE TO ANYONE THAT YOU BELIEVE MAY HAVE AN INTEREST IN PURCHASING THIS SYSTEM! __________________________________________________________________________ Case #3971 -- FOR SALE - CM5 with 128 nodes and SDA -- __________________________________________________________________________ The MIT Lab for Computer Science offers for bid sale a Thinking Machines CM-5 Connection Machine (described below). Bids to purchase this system are requested from all interested parties, (with a minimum expected Bid of $25,000). All bids must be received at the MIT property office by 5:00 PM (EDT) on Monday, 8/Sept/97. The machine must be moved from MIT within 10 business days of acceptance of the bid. All expenses and arrangements for moving will be made by purchaser. The system consists of: 1) 128 PN CM-5 w/ Vector Units, 256 Network addresses-Part No.CM5-128V-32F 2) Scalable Disk Array with Twenty-four(24) 1.2 GB Drives-Part No.CM5-SA25F 3) Control Processor Interface-Part No. CM5-CPI 4) S-Bus to Diagnostics Network Interface-Part No. CM5-SDN 5) S-Bus Network Interface Board(5)-Part No. CM5-SNI [N.B. On July 16 1997 power was turned off.The machine can be turned back on in its present location only until Friday, 22/AUG/97 when wiring changes are planned in that machine room.] "The Institute reserves the right to reject any or all offers.MIT makes no warranty of any kind, express or implied, with respect to this equipment. This includes fitness for a particular purpose. It is the responsibility of those making an offer to determine, before making an offer, that the equipment meets any conditions required by those making that offer.Thank you." __________________________________________________________________________ Submit bids for Case #3971 before Monday, 8/Sept/97, 5:00 PM (EDT) to: ***************************************************************** * Michael F. 
McCarthy * Phone: (617)253-2779 * * MIT Property Office * FAX: (617)253-2444 * * E19-429 * E-Mail: mmccarth@MIT.EDU * * 77 Massachusetts Ave. * * * Cambridge, MA 02139 * * ***************************************************************** __________________________________________________________________________ SYSTEM HISTORY The Project SCOUT CM-5 is housed in M.I.T's Laboratory for Computer Science (L.C.S). The machine was acquired in 1993 as part of the the ARPA sponsored project SCOUT, and used to accomplish the stated aim of the project of "fermenting collaborations between users, builders and networkers of massively parallel computers". The CM-5 computer, developed and manufactured by Thinking Machines Corporation, evolved from earlier T.M.C. computers (the CM-2 and the CM-200)with an architecture targeted toward teraflops performance for large, complex data intensive applications. The MIT hardware consists of a total of 128 32MHz SPARC microprocessors, each with 4 proprietary floating point arithmetic units and 32Mb of local memory attached to it. The system also includes a subsidiary 25Gb parallel file system for handling high volume parallel application I/O. The system was operated under full maintenance contract from May of 1993 until March 20 1997. On July 16 1997 power was turned off. The machine can be turned back on in its present location only until Friday, 22/AUG/97 when wiring changes are planned in that machine room. The system was used primarily for research but a description of an instructional use made of the machine can be found at http://www-erl.mit.edu/eaps/seminar/iap95/cnh/CM5Intro.html Web sites about other CM5 sites and general information include: http://www.math.uic.edu/~hanson/cmg.html http://www.acl.lanl.gov/UserInfo/cm5admin.html http://ec.msc.edu/CM5/ __________________________________________________________________________ FUTURE MAINTENANCE People submitting bids may wish to discuss future maintenance issues with a company that is a present maintainer of CM5 Equipment, Connection Machine Services. ***************************************************************** * Larry Stewart * Phone: (505) 820-1470 * * * Cell: (505) 690-7799 * * Account Executive * FAX: (505) 820-0810 * * Connection Machines Services * Home: (505) 983-9670 * * 1373 Camino Sin Salida * Pager (888) 712-4143 * * Santa Fe, NM 87501 * E-Mail: stewart@ix.netcom.com * ***************************************************************** __________________________________________________________________________ Michael F. McCarthy MIT Property Office E19-429 77 Massachusetts Ave. Cambridge, MA 02139 Ph (617)253-2779 Fax (617)253-2444 From owner-parkbench-comm@CS.UTK.EDU Mon Sep 1 05:44:50 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA11838; Mon, 1 Sep 1997 05:44:50 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA07176; Mon, 1 Sep 1997 05:35:14 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA07160; Mon, 1 Sep 1997 05:34:44 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA14311; Mon, 1 Sep 97 10:33:06 BST Date: Mon, 1 Sep 97 10:19:23 +0000 From: Mark Baker Subject: Final Announcement: Fall ParkBench Workshop To: "Daniel A. Reed" , "J.C.T. 
Pool" , a.j.grant@mcc.ac.uk, Antoine Hyaric , Ed Zaluska , Fritz Ferstl , Hon W Yau , idh@soton.ac.uk, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, Paul Messina , R.Rankin@Queens-Belfast.AC.UK, rjc@soton.ac.uk, topic@mcc.ac.uk, Wolfgang Genzsch Cc: lec@ecs.soton.ac.uk X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Face: ,<'y31|nlb,jCP5?km9\KD+>p9/e?:|$RRhY]e;#`awGHh=mrY.T??#]-*rt}l0*u`k2A7n KlqNG"u'-%cS@3|G[%=m%bSB[lfSn5n"gD4CU(j?1y?#SOkm!qw_=p%c#"6g&(+\Oy6T{4CEShal?z M)&Gd'Pb6Qc~>SPx{m[F55=]yY>cN>|/m5)T?q`OTjdQL=7-n%NT({;;$P*2[#7ZWL8baLoI_/F89, x'u`*$'<|ctKNYTSJuLV=!$QT3bN*>91V,a0Cc"_UsxwMKg\;#W2LZ$!`j?ZWp;byz~;y}2Dz6i7y% E&;gfnmI_~}+oifmWXJMHfWeezBL1("ZnFe!rnX[Q|,:IJ?iq+PePa/[3R4 X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear all, This is the FINAL ANNOUNCEMENT: If you would like to attend this workshop please let Lesley Courtney (lec@ecs.soton.ac.uk) know by Friday 5th September 1997 at the latest as we need to confirm numbers. Workshop details can be found at http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ Regards Mark ------------------------------------- Dr Mark Baker University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 09/01/97 - Time: 10:19:23 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Wed Sep 3 15:37:55 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA20262; Wed, 3 Sep 1997 15:37:55 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA08273; Wed, 3 Sep 1997 15:19:14 -0400 (EDT) Received: from punt-2.mail.demon.net (punt-2b.mail.demon.net [194.217.242.6]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA08262; Wed, 3 Sep 1997 15:19:10 -0400 (EDT) Received: from minnow.demon.co.uk ([158.152.73.63]) by punt-2.mail.demon.net id aa0626941; 3 Sep 97 17:35 BST Message-ID: Date: Wed, 3 Sep 1997 16:31:07 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Prototype PICT release 1.0 MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a At their last meeting the Parkbench Committee recommended that an interactive curve fitting tool be produced for the postprocessing and parametrisation of Parkbench results using the latest Internet Web technology. I have produced a prototype of such a tool as a Java applet running on a Web page on the user's machine and called it PICT (Parkbench Interactive Curve-fitting Tool). This is now ready for evaluation and testing by the committee. The tool provides the following features: (1) Automatic plotting of Low-Level Parkbench output files from a URL anywhere on the Web (At present limited to New COMMS1 and Raw data, but easily extended to original COMMS1 and RINF1). This is useful for a quick comparison of raw data. (2) Automatic plotting of both 2 and 3-parameter curve-fits which are produce by the benchmarks. Good for checking the quality of the fits. (3) Allows manual rescaling of the graph range to suit the data, either by typing in the required range values or by dragging out a range box with the mouse. (4) Allows the 2-parameter and 3-parameter performance curves to be manually moved about the graph in order to fine tune the fits. The curve follows the mouse and the RMS and MAX percentage errors are shown as the curve moves. 
Alternatively, parameter values can be typed in and the Manual button pressed, after which the curve for these values will be plotted. (5) The data file being plotted can be VIEWed, and a HELP button provides a description of the action of each button in a separate window. The PICT applet has been built on top of Leigh Brookshaw's 2D plotting package, the URL for which is given at the bottom of the HELP window. The features under the RESTART button are in his original code; I have just added the 2-PARA and 3-PARA features. The applet was developed using JDK1.0 beta on a PC with a 1600x1200 display and works on the PC both locally and from my Web page with appletviewer, MSIE 3.02 and Netscape 3.01. It has also been successfully run on a Solaris Sun with NS3.01, but another Sun user has reported no graphs and errors due to "wrong applet version". So please report your experiences (both success and failure please) to me with all the details. To play with PICT turn your browser to: http://www.minnow.demon.co.uk/pict/source/pict1.html or pict1a.html. pict1.html asks for 1000x732 pixels and suits PCs best (it's about the minimum useful size). pict1a.html asks for 1020x900 pixels and was necessary for the whole applet to be visible on the Sun. For those wishing to look closer, all the source is provided and should be downloadable. Suggestions for improvement, corrections or constructive criticism are solicited. I have asked for an agenda item to be included for the Parkbench meeting on 11 Sept in Southampton so that PICT can be discussed. I look forward to seeing some of you there. -- Roger Hockney, University of Westminster, UK. Check out my new Web page at URL http://www.minnow.demon.co.uk and the link to my new book "The Science of Computer Benchmarking"; suggestions welcome. Know any fish movies or suitable links?

From owner-parkbench-lowlevel@CS.UTK.EDU Wed Sep 10 06:29:15 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA21129; Wed, 10 Sep 1997 06:29:14 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA20815; Wed, 10 Sep 1997 06:31:30 -0400 (EDT) Received: from sun3.nsfnet-relay.ac.uk (sun3.nsfnet-relay.ac.uk [128.86.8.50]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id GAA20791; Wed, 10 Sep 1997 06:30:47 -0400 (EDT) Received: from bright.ecs.soton.ac.uk by sun3.nsfnet-relay.ac.uk with JANET SMTP (PP); Wed, 10 Sep 1997 11:30:44 +0100 Received: from landlord.ecs.soton.ac.uk by bright.ecs.soton.ac.uk; Wed, 10 Sep 97 11:32:57 BST From: Vladimir Getov Received: from bill.ecs.soton.ac.uk by landlord.ecs.soton.ac.uk; Wed, 10 Sep 97 11:33:16 BST Date: Wed, 10 Sep 97 11:33:13 BST Message-Id: <2458.9709101033@bill.ecs.soton.ac.uk> To: parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU Subject: ParkBench Committee Meeting - tentative Agenda

Dear Colleague, The ParkBench (Parallel Benchmark Working Group) will meet in Southampton, U.K. on September 11th, 1997 as part of the ParkBench Workshop. The Workshop site will be the County Hotel in Southampton. County Hotel Highfield Lane Southampton, U.K.
Phone: +44-(0)1703-359955 Please send us your comments about the tentative agenda: 14:30 Finalize meeting agenda Minutes of last meeting (Erich Strohmaier) 14:45 Changes to Current release: - Low Level COMMS benchmarks (Vladimir Getov) - NAS Parallel Benchmarks (Subhash Saini) 15:15 New benchmarks: - HPF Low Level benchmarks (Mark Baker) 15:30 ParkBench Performance Analysis Tools: - ParkBench Result Templates (Vladimir Getov and Mark Papiani) - Visualization of Parallel Benchmark Results - new GBIS (Mark Papiani and Flavio Bergamaschi) - Interactive Web-page Curve-fitting of Parallel Performance Measurements (Roger Hockney) 16:15 Demonstrations: - Java Low-Level Benchmarks (Vladimir Getov) - BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio Bergamaschi) - PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney) 16:45 Other activities: - "Electronic Benchmarking Journal" - status report (Mark Baker) Miscellaneous Date and venue for next meeting 17:00 Adjourn Tony Hey Vladimir Getov Erich Strohmaier
From owner-parkbench-lowlevel@CS.UTK.EDU Thu Sep 18 18:27:19 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id SAA12991; Thu, 18 Sep 1997 18:27:18 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id SAA29359; Thu, 18 Sep 1997 18:26:21 -0400 (EDT) Received: from k2.llnl.gov (zosel@k2.llnl.gov [134.9.1.1]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id SAA29352; Thu, 18 Sep 1997 18:26:19 -0400 (EDT) Received: (from zosel@localhost) by k2.llnl.gov (8.8.5/8.8.5/LLNL-Jun96) id PAA07246 for parkbench-lowlevel@cs.utk.edu; Thu, 18 Sep 1997 15:26:16 -0700 (PDT) Date: Thu, 18 Sep 1997 15:26:16 -0700 (PDT) From: Mary E Zosel Message-Id: <199709182226.PAA07246@k2.llnl.gov> To: parkbench-lowlevel@CS.UTK.EDU Subject: any pthreads tests???

Does anyone know of any low-level performance tests for pthreads libraries??? I'm interested in both single-processor performance of pthreads calls - and also multiprocessor (shared memory) calls ... to measure the overhead of the calls. -mary zosel- zosel@llnl.gov

From owner-parkbench-lowlevel@CS.UTK.EDU Sun Sep 21 09:13:20 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA08699; Sun, 21 Sep 1997 09:13:20 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15884; Sun, 21 Sep 1997 09:15:32 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15877; Sun, 21 Sep 1997 09:15:30 -0400 (EDT) Received: from mordillo (p41.ascend3.is2.bb.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA10322; Sun, 21 Sep 97 14:15:58 BST Date: Sun, 21 Sep 97 13:32:56 +0000 From: Mark Baker Subject: Re: any pthreads tests??? To: Mary E Zosel , parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <199709182226.PAA07246@k2.llnl.gov> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Mary, This has been talked about as one of the activities that Parkbench would be interested in pursuing. But, so far we have not had the time or man-power to follow up our interests. Ron Sercely at HP/CTCX was particularly interested in this area. Also, I know the people at Manchester University wrote a bunch of Pthreads codes - some were benchmarks - for their KSR machine. Hope this helps.
From owner-parkbench-lowlevel@CS.UTK.EDU Sun Sep 21 09:13:20 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA08699; Sun, 21 Sep 1997 09:13:20 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15884; Sun, 21 Sep 1997 09:15:32 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA15877; Sun, 21 Sep 1997 09:15:30 -0400 (EDT) Received: from mordillo (p41.ascend3.is2.bb.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA10322; Sun, 21 Sep 97 14:15:58 BST Date: Sun, 21 Sep 97 13:32:56 +0000 From: Mark Baker Subject: Re: any pthreads tests??? To: Mary E Zosel , parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <199709182226.PAA07246@k2.llnl.gov> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Mary,

This has been talked about as one of the activities that Parkbench would be interested in pursuing. But so far we have not had the time or manpower to follow up our interests. Ron Sercely at HP/CTCX was particularly interested in this area. Also, I know the people at Manchester University wrote a bunch of Pthreads codes - some were benchmarks - for their KSR machine.

Hope this helps.

Regards

Mark

--- On Thu, 18 Sep 1997 15:26:16 -0700 (PDT) Mary E Zosel wrote:
> Does anyone know of any low-level performance tests for pthreads libraries???
> I'm interested in both single processor performance of pthreads calls -
> and also multiprocessor (shared memory) calls ... to measure the overhead
> of the calls.
> -mary zosel- zosel@llnl.gov
> ---------------End of Original Message-----------------

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/21/97 - Time: 13:32:57
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Wed Sep 24 06:04:19 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA23913; Wed, 24 Sep 1997 06:04:18 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA23163; Wed, 24 Sep 1997 05:46:35 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA23156; Wed, 24 Sep 1997 05:46:26 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA29780; Wed, 24 Sep 97 10:47:01 BST Date: Wed, 24 Sep 97 10:38:39 +0000 From: Mark Baker Subject: PC timers To: parkbench-comm@CS.UTK.EDU, parkbench-low-level@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Can someone suggest an appropriate PC-based timer function (MS Visual C++ or Digital Visual Fortran) to replace the usual gettimeofday call?

Cheers

Mark

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/24/97 - Time: 10:38:39
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Thu Sep 25 10:11:01 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA20147; Thu, 25 Sep 1997 10:11:01 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA18087; Thu, 25 Sep 1997 09:24:56 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA18080; Thu, 25 Sep 1997 09:24:53 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA12457; Thu, 25 Sep 97 14:25:35 BST Date: Thu, 25 Sep 97 14:11:59 +0000 From: Mark Baker Subject: PC Time function To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Thanks to all for the timer info. I used the C function _ftime() in the end because it has millisecond resolution. I just had to get my head around using INTERFACE in F90 to include the external C function. I've inserted my version of the _ftime() timer below - I don't think there are any obvious errors in it :-) I also implemented the dflib F90 function CALL GETTIM(hour, min, sec, hund) - this function passed tick2 testing but only has 1/100 sec resolution.
-------------------------------------------------------
#include <sys/timeb.h>   /* struct _timeb and _ftime() (MS Visual C) */

/* Wall-clock time in seconds, with millisecond resolution. */
double dwalltime00()
{
    struct _timeb timebuf;
    _ftime( &timebuf );
    return (double) timebuf.time + (double) timebuf.millitm / 1000.0;
}

/* Alternative entry points so the same routine can be linked from Fortran
   compilers that expect trailing-underscore or upper-case external names. */
double dwalltime00_() { return dwalltime00(); }
double DWALLTIME00()  { return dwalltime00(); }
-------------------------------------------------------

Cheers

Mark

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 09/25/97 - Time: 14:11:59
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------
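The resolution question raised above (millisecond _ftime() versus the 1/100-second GETTIM, and the tick testing used to check them) can be probed directly by watching how the returned value advances between calls. A minimal sketch in C, assuming the dwalltime00() routine just shown is linked in; this is only an illustration of the idea, not the actual ParkBench tick benchmark code.

-------------------------------------------------------
/* Sketch: estimate the effective resolution of dwalltime00() by recording
   the smallest jump between successive distinct readings.
   Illustrative only - not the ParkBench tick benchmarks. */
#include <stdio.h>

double dwalltime00(void);        /* the timer routine from the message above */

int main(void)
{
    double last = dwalltime00();
    double min_step = 1.0e9;
    int ticks;

    for (ticks = 0; ticks < 100; ) {     /* sample 100 timer transitions */
        double now = dwalltime00();
        if (now > last) {                /* the clock has ticked */
            if (now - last < min_step)
                min_step = now - last;
            last = now;
            ticks++;
        }
    }
    printf("smallest observed tick: %g seconds\n", min_step);
    return 0;
}
-------------------------------------------------------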
From owner-parkbench-comm@CS.UTK.EDU Tue Oct 7 06:35:04 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA26560; Tue, 7 Oct 1997 06:35:04 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA25697; Tue, 7 Oct 1997 06:10:11 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA25668; Tue, 7 Oct 1997 06:09:40 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA05125; Tue, 7 Oct 97 11:09:53 BST Date: Tue, 7 Oct 97 10:43:49 +0000 From: Mark Baker Subject: Workshop Papers To: "Aad J. van der Steen" , Charles Grassl , Clemens Thole , David Snelling , Erich Strohmaier , Grapham Nudd , Klaus Stueben , parkbench-comm@CS.UTK.EDU, Roger Hockney , Saini Subhash , Vladimir Getov , William Gropp X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I am now back in the office and have a small amount of time to follow up the Parkbench Workshop that took place a few weeks ago. I would firstly like to thank everyone who attended - especially all the speakers. Even though we did not attract hundreds of delegates to the workshop, I think the event was very successful - but I may be biased...

So, the plan is that in the first instance I will collect the slides from all the speakers, package them up and put them on the PEMCS Web site. We also decided that we would encourage all the speakers to produce short papers on their talks and put all the workshop papers together to create a special issue of the PEMCS journal. Can the speakers therefore send me their slides (I would prefer PowerPoint or Word versions if possible). I will harass you further about the short papers in the near future.

Thanks in advance for your help.

Regards

Mark

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/07/97 - Time: 10:43:49
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Sun Oct 12 09:55:57 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA28908; Sun, 12 Oct 1997 09:55:57 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA08800; Sun, 12 Oct 1997 09:44:23 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA08793; Sun, 12 Oct 1997 09:44:20 -0400 (EDT) Received: from mordillo (p26.nas4.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA11347; Sun, 12 Oct 97 14:45:07 BST Date: Sun, 12 Oct 97 14:35:10 +0000 From: Mark Baker Subject: Equivalent to comms1 To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Can someone point me at the equivalent of comms1 written in C - either MPI or sockets (or even PVM if it's out there).

Cheers

Mark

-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/12/97 - Time: 14:35:10
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Mon Oct 13 16:30:04 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA17020; Mon, 13 Oct 1997 16:29:59 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA24297; Mon, 13 Oct 1997 16:02:05 -0400 (EDT) Received: from dancer.cs.utk.edu (DANCER.CS.UTK.EDU [128.169.92.77]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA24288; Mon, 13 Oct 1997 16:02:03 -0400 (EDT) From: Philip Mucci Received: by dancer.cs.utk.edu (cf v2.11c-UTK) id QAA02925; Mon, 13 Oct 1997 16:02:00 -0400 Date: Mon, 13 Oct 1997 16:02:00 -0400 Message-Id: <199710132002.QAA02925@dancer.cs.utk.edu> To: mab@sis.port.ac.uk, parkbench-comm@CS.UTK.EDU Subject: Re: Equivalent to comms1 In-Reply-To: X-Mailer: [XMailTool v3.1.2b]

I would check out my mpbench on my web page.... It does PVM and MPI for now...

> Can someone point me at the equivalent of comms1 written in
> C - either MPI or sockets (or even PVM if it's out there).
>
> Cheers
>
> Mark
>
> -------------------------------------
> Dr Mark Baker
> CSM, University of Portsmouth, Hants, UK
> Tel: +44 1705 844285 Fax: +44 1705 844006
> E-mail: mab@sis.port.ac.uk
> Date: 10/12/97 - Time: 14:35:10
> URL http://www.sis.port.ac.uk/~mab/
> -------------------------------------

/%*\ Philip J. Mucci | GRA in CS under Dr. JJ Dongarra /*%\
\*%/ http://www.cs.utk.edu/~mucci   PVM/Active Messages  \%*/
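For reference, COMMS1 is ParkBench's point-to-point ping-pong benchmark, so a C equivalent of the basic measurement comes down to timing message round trips between two processes and halving the result. The sketch below is a minimal MPI version of that idea; the message sizes and repeat count are arbitrary, and it is not the actual COMMS1 or mpbench source.

-------------------------------------------------------
/* Sketch of a COMMS1-style ping-pong in C with MPI.
   Illustrative only - run with at least two processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, reps = 100, len, i;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (len = 1; len <= 1 << 20; len *= 2) {       /* 1 byte .. 1 MB */
        char *buf = malloc(len);
        double t0, t1;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < reps; i++) {
            if (rank == 0) {            /* master: send, then wait for echo */
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {     /* echo process */
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)                  /* one-way time = half the round trip */
            printf("%8d bytes  %12.3f microseconds one-way\n",
                   len, (t1 - t0) / (2.0 * reps) * 1.0e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
-------------------------------------------------------

Fitting the resulting (length, time) pairs to Hockney's (r_inf, n_half) model then gives the usual COMMS1-style output parameters.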
From owner-parkbench-comm@CS.UTK.EDU Mon Oct 20 10:37:14 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA15359; Mon, 20 Oct 1997 10:37:14 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA07990; Mon, 20 Oct 1997 10:19:41 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA07691; Mon, 20 Oct 1997 10:17:09 -0400 (EDT) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA16636; Mon, 20 Oct 97 15:17:33 BST Date: Mon, 20 Oct 97 15:02:39 +0000 From: Mark Baker Subject: PEMCS Short Article To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I've just put up (at last!!) the first PEMCS short article at

http://hpc-journals.ecs.soton.ac.uk/PEMCS/Articles/

At the moment there is not much of a "house style" for the format of the papers and articles - this will hopefully be developed over the coming months. I expect to put the first full paper up on the Web in the next week or so.

Comments, ideas and help with the journal and its Web site are most welcome.

Regards

Mark

------------------------------------------------------------------------------------------

COMPARING COMMUNICATION PERFORMANCE OF MPI ON THE CRAY RESEARCH T3E-600 AND IBM SP-2

by Glenn R. Luecke and James J. Coyle
Iowa State University
Ames, Iowa 50011-2251, USA

Waqar ul Haque
University of Northern British Columbia
Prince George, British Columbia, Canada V2N 4Z9

Abstract

This paper reports the performance of the Cray Research T3E and IBM SP-2 on a collection of communication tests that use MPI for the message passing. These tests have been designed to evaluate the performance of communication patterns that we feel are likely to occur in scientific programs. Communication tests were performed for messages of sizes 8 Bytes (B), 1 KB, 100 KB, and 10 MB with 2, 4, 8, 16, 32 and 64 processors. Both machines provided a high level of concurrency for the nearest-neighbor communication tests and moderate concurrency on the broadcast operations. On the tests used, the T3E significantly outperformed the SP-2, with most tests running at least three times faster.
-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/20/97 - Time: 15:02:42
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Sat Oct 25 08:52:33 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA12875; Sat, 25 Oct 1997 08:52:33 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA05256; Sat, 25 Oct 1997 08:41:15 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA05244; Sat, 25 Oct 1997 08:41:05 -0400 (EDT) Received: from mordillo (p16.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA01764; Sat, 25 Oct 97 13:41:26 BST Date: Sat, 25 Oct 97 13:27:24 +0000 From: Mark Baker Subject: Parkbench Workshop Talks - On line To: Chuck Koelbel , Clemens Thole , Grapham Nudd , Guy Robinson , Klaus Stueben , parkbench-comm@CS.UTK.EDU, William Gropp X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 2 (High) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

I have put the talks received so far up at...

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/abstracts.html

Could the speakers who have not yet passed their talks on to me please do so. Thanks in advance.

Regards

Mark

-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 10/25/97 - Time: 13:27:25
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Fri Oct 31 08:22:47 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA19412; Fri, 31 Oct 1997 08:22:46 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA15140; Fri, 31 Oct 1997 07:44:09 -0500 (EST) Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA15133; Fri, 31 Oct 1997 07:44:05 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa2017784; 31 Oct 97 12:25 GMT Message-ID: Date: Fri, 31 Oct 1997 12:22:33 +0000 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Announcing PICT2 MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a

ANNOUNCING PICT2
++++++++++++++++

The prototype Parkbench Interactive Curve Fitting Tool (PICT1) that was demonstrated at the Southampton meeting of Parkbench in September was difficult to use on small screens because the image was too large and could not be reduced to suit the users' screen size. Sorry, I had developed it on my own 1600x1200 display without realising that most users considered 800x600 as large! Well, the new version PICT2 that is now on my web page allows for the full range of screen sizes: 640x480, 800x600, 1024x768, >=1600x1200, and also allows the user to customise his own display by selecting a font size and screen width and height. So the new version should be usable by all -- I hope!

Another problem at Southampton was that the display workstation was very old and too slow in MHz to do the job.
I use a P133 Pentium and the graph lines move around instantly, but if you only have a 20MHz machine, for example, the response will probably be too slow to be useful for real interactive curve fitting. There is nothing I can do about this except to suggest that you use the need to use PICT as an excuse (I mean justification) to upgrade your equipment.

PICT2 still relies on the use of New COMMS1 to compute the least-squares 2-para fit and the 3-point fit for the 3-para. The next step will be to put these features in PICT, but that is a fair amount of code to get right and I thought it best to solve the screen-size problem first. But remember, the key point about PICT is that it allows interactive manual fitting and display that is not otherwise available.

To try out PICT2 turn your browser to:

http://www.minnow.demon.co.uk/pict/source/pict2a.html

and follow the instructions. When you have a good PICT Frame displayed, press the HELP button for a description of the button actions.

Please report problems, experiences (good and bad), and suggestions to me at: roger@minnow.demon.co.uk

I need feedback in order to improve the tool.

Best wishes to you all

Roger

--
Roger Hockney        Checkout my new Web page at URL http://www.minnow.demon.co.uk
University of        and link to my new book: "The Science of Computer Benchmarking"
Westminster UK       Suggestions welcome. Know any fish movies or suitable links?
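The "least-squares 2-para fit" mentioned above refers to Hockney's two-parameter (r_inf, n_half) communication model, in which the time to send a message of n bytes is t(n) = (n + n_half)/r_inf, i.e. a straight line in n whose slope is 1/r_inf and whose intercept is n_half/r_inf. Below is a minimal sketch in C of that fit, assuming arrays of measured message lengths and times; it is only an illustration of the idea, not the New COMMS1 or PICT fitting code, and the sample data are hypothetical.

-------------------------------------------------------
/* Sketch: least-squares fit of t(n) = t0 + n/r_inf to ping-pong timings,
   giving r_inf = 1/slope and n_half = t0 * r_inf.
   Illustrative only - not the New COMMS1 or PICT routines. */
#include <stdio.h>

void fit_rinf_nhalf(int m, const double n[], const double t[],
                    double *rinf, double *nhalf)
{
    double sn = 0.0, st = 0.0, snn = 0.0, snt = 0.0;
    double slope, t0;
    int i;

    for (i = 0; i < m; i++) {
        sn  += n[i];
        st  += t[i];
        snn += n[i] * n[i];
        snt += n[i] * t[i];
    }
    /* standard least-squares slope and intercept for t = t0 + slope*n */
    slope = (m * snt - sn * st) / (m * snn - sn * sn);
    t0    = (st - slope * sn) / m;

    *rinf  = 1.0 / slope;      /* asymptotic bandwidth, bytes per second */
    *nhalf = t0 * (*rinf);     /* message length giving half of r_inf   */
}

int main(void)
{
    /* hypothetical measurements: message length (bytes), one-way time (s) */
    double n[] = { 1.0e2, 1.0e3, 1.0e4, 1.0e5, 1.0e6 };
    double t[] = { 6.0e-5, 7.0e-5, 1.6e-4, 1.1e-3, 1.0e-2 };
    double rinf, nhalf;

    fit_rinf_nhalf(5, n, t, &rinf, &nhalf);
    printf("r_inf = %g bytes/s   n_half = %g bytes\n", rinf, nhalf);
    return 0;
}
-------------------------------------------------------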
From owner-parkbench-comm@CS.UTK.EDU Tue Nov 11 06:21:05 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA18373; Tue, 11 Nov 1997 06:21:05 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA27963; Tue, 11 Nov 1997 06:06:45 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA27930; Tue, 11 Nov 1997 06:06:15 -0500 (EST) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA23083; Tue, 11 Nov 97 11:07:22 GMT Date: Tue, 11 Nov 97 11:00:36 GMT From: Mark Baker Subject: Couple of Announcements To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

A couple of announcements...

Firstly, the majority of the papers presented at the Fall ParkBench Workshop on Thursday 11th/Friday 12th September 1997 at the University of Southampton are now on-line and can be found at

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/abstracts.html

or from http://hpc-journals.ecs.soton.ac.uk/PEMCS/ (click on News in the left frame).

Secondly, the first full paper for the electronic journal Performance Evaluation and Modelling of Computer Systems (PEMCS),

"PERFORM - A Fast Simulator For Estimating Program Execution Time"
by Alistair Dunlop and Tony Hey, Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, U.K.

can be found at

http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/vol1.html

See you all at the Parkbench BOF at SC'97...

Mark

-------------------------------------
Dr Mark Baker
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 11/11/97 - Time: 11:00:36
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------

From owner-parkbench-lowlevel@CS.UTK.EDU Wed Nov 12 21:30:42 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id VAA13985; Wed, 12 Nov 1997 21:30:42 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id VAA06841; Wed, 12 Nov 1997 21:31:46 -0500 (EST) Received: from rudolph.cs.utk.edu (RUDOLPH.CS.UTK.EDU [128.169.92.87]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id VAA06806; Wed, 12 Nov 1997 21:31:01 -0500 (EST) Received: from localhost by rudolph.cs.utk.edu with SMTP (cf v2.11c-UTK) id VAA24812; Wed, 12 Nov 1997 21:31:01 -0500 Date: Wed, 12 Nov 1997 21:31:00 -0500 (EST) From: Erich Strohmaier To: parkbench-hpf@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU, parkbench-comm@CS.UTK.EDU Subject: ParkBench BOF session at the SC'97 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear Colleague,

The ParkBench (PARallel Kernels and BENCHmarks) committee has organized a BOF session at SC'97 in San Jose.

Room: Convention Center Room C1
Time: Wednesday 5:30pm

We will talk about the latest release, new results available and future plans.

Tentative Agenda of the BOF
- Introduction, background, WWW-Server
- Current Release of ParkBench
- Low Level Performance Evaluation Tools
- LinAlg Kernel Benchmarks
- NAS Parallel Benchmarks, including latest results
- Plans for the next Release
- Electronic Journal of Performance Evaluation and Modeling for Computer Systems
- Questions from the floor / discussion

Please mark your calendar and plan to attend.

Jack Dongarra
Tony Hey
Erich Strohmaier
From owner-parkbench-lowlevel@CS.UTK.EDU Thu Nov 13 06:30:40 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA07097; Thu, 13 Nov 1997 06:30:40 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01844; Thu, 13 Nov 1997 05:55:24 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01835; Thu, 13 Nov 1997 05:55:18 -0500 (EST) Received: from mordillo (p19.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA18430; Thu, 13 Nov 97 10:56:11 GMT Date: Thu, 13 Nov 97 10:48:53 GMT From: Mark Baker Subject: Fall 97 Parkbench Committee Meeting Minutes To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: Message-Id: Mime-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="mordillo:879418490:877:126:21579" --mordillo:879418490:877:126:21579 Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear All,

Here are the minutes of the Parkbench committee meeting held at The County Hotel in Southampton during the Fall 97 Parkbench Workshop. For those of you with a MIME-compliant mail-reader I've attached a formatted Word 7 doc.

Regards

Mark

-----------------------------------------------------------------------------

Parkbench Committee Meeting
Held during the Fall Parkbench Workshop
The County Hotel, Southampton, UK
1515, 11th September 1997

Meeting Participation List:

Mark Baker - Univ. of Portsmouth (mab@sis.port.ac.uk)
Flavio Bergamaschi - Univ of Southampton (fab@ecs.soton.ac.uk)
Jack Dongarra - Univ. of Tenn./ORNL (dongarra@cs.utk.edu)
Vladimir Getov - Univ. of Westminster (getovv@wmin.ac.uk)
Charles Grassl - SGI/Cray (cmg@cray.com)
William Gropp - ANL (gropp@mcs.anl.gov)
Tony Hey - Univ. of Southampton (ajgh@ecs.soton.ac.uk)
Roger Hockney - Univ. of Westminster (roger@minnow.demon.co.uk)
Mark Papiani - Univ of Southampton (mp@ecs.soton.ac.uk)
Subhash Saini - NASA Ames (saini@nas.nasa.gov)
Dave Snelling - FECIT (snelling@fecit.co.uk)
Aad J. van der Steen - RUU (steen@fys.ruu.nl)
Erich Strohmaier - Univ. of Tennessee (erich@cs.utk.edu)
Klaus Stueben - GMD (klaus.stueben@gmd.de)

Meeting Activities and Actions

Tony Hey chaired the meeting.

The minutes from the last meeting were seven pages long, so it was decided that only the actions from the last meeting would be reviewed. The actions were reviewed, with a short discussion about each. A discussion about interaction with SPEC-HPG was initiated.

Comms Low-Level Benchmarks

Vladimir Getov gave a short presentation on the current status of the Parkbench Comms benchmarks. Charles Grassl was asked to explain how his new Comms programs work and the rationale behind them. A long discussion ensued.

Action - Create a formal proposal of alternatives or additions to the comms low-level benchmarks for SC'97 - Charles Grassl.

Action - Members should look at the PALLAS version of the low-level benchmarks (based on Genesis/RAPS).

Action - Erich Strohmaier and Vladimir Getov will discuss the effort needed to split up Parkbench and add in the new Comms1 benchmark (with the new curve-fitting routine).

NPB - Subhash Saini reported on the status of the NAS Parallel Benchmarks.

HPF - Mark Baker read Chuck Koelbel's email about CEWES HPCM HPF efforts.
Action - Subhash Saini will let RICE know that Gina should start off from the single NAS codes.

Electronic Journal - Mark Baker and Tony Hey reported on the electronic journal PEMCS and its Web site. It was agreed that this would be discussed further informally.

Parkbench Report - Erich Strohmaier reported on the efforts of creating a new Parkbench report. A short discussion about this ensued.

Action - Jack Dongarra / Tony Hey will talk to other members about the potential effort that could be put into a Parkbench Report II by SC'97.

Funding Efforts

Jack Dongarra's recent benchmarking proposal was turned down. Tony Hey mentioned the possibility of entering a proposal to the EU. Possibility of a joint EU / NSF bid.

Mark Baker asked if SIO would be interested in being more closely involved. William Gropp reported that SIO was actually winding down and so formal association was not really an option.

AOB

The participants were then invited by Tony to move to the University of Southampton (bldg. 16) for the Parkbench demonstrations, which included:

-- Java Low-Level Benchmarks (Vladimir Getov)
-- BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio Bergamaschi)
-- PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney)

Jack Dongarra informed the committee of the Parkbench BOF at SC'97 (Wednesday at 3.30PM).

The meeting was wound up by Tony Hey at 1630.

-----------------------------------------------------------------------------

-------------------------------------
CSM, University of Portsmouth, Hants, UK
Tel: +44 1705 844285 Fax: +44 1705 844006
E-mail: mab@sis.port.ac.uk
Date: 11/13/97 - Time: 10:48:53
URL http://www.sis.port.ac.uk/~mab/
-------------------------------------
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AABAAAAAADhSnMW+vAFAAAAAANR+ciLwvAEDAAAAAgAAAAMAAAD0AQAAAwAA ACcLAAADAAAAAAAAAAAAAAD+/wAABAACAAAAAAAAAAAAAAAAAAAAAAABAAAA AtXN1ZwuGxCTlwgAKyz5rjAAAAC4AAAACAAAAAEAAABIAAAADwAAAFAAAAAE AAAAdAAAAAUAAAB8AAAABgAAAIQAAAALAAAAjAAAABAAAACUAAAADAAAAJwA AAACAAAA5AQAAB4AAAAZAAAAVW5pdmVyc2l0eSBvZiBQb3J0c21vdXRoAAAA AAMAAAAAOgAAAwAAABcAAAADAAAABQAAAAsAAAAAAAAACwAAAAAAAAAMEAAA AgAAAB4AAAABAAAAAAMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAA --mordillo:879418490:877:126:21579-- From owner-parkbench-comm@CS.UTK.EDU Thu Nov 13 06:31:53 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA07105; Thu, 13 Nov 1997 06:31:52 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01880; Thu, 13 Nov 1997 05:56:05 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA01835; Thu, 13 Nov 1997 05:55:18 -0500 (EST) Received: from mordillo (p19.nas2.is2.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA18430; Thu, 13 Nov 97 10:56:11 GMT Date: Thu, 13 Nov 97 10:48:53 GMT From: Mark Baker Subject: Fall 97 Parkbench Committee Meeting Minutes To: parkbench-comm@CS.UTK.EDU, parkbench-hpf@CS.UTK.EDU, parkbench-lowlevel@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: Message-Id: Mime-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="mordillo:879418490:877:126:21579" --mordillo:879418490:877:126:21579 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, Here are the minutes of the Parkbench committee meeting held The County Hotel in Southampton during the Fall 97 Parkbench Workshop. For those of you with a MIME-compliant mail-reader I've attached a formatted word 7 doc. Regards Mark ----------------------------------------------------------------------------- Parkbench Committee Meeting Held during the Fall Parkbench Workshop The County Hotel Southampton, UK 1515, 11th September 1997 Meeting Participation List: Mark Baker - Univ. of Portsmouth (mab@sis.port.ac.uk) Flavio Bergamaschi - Univ of Southampton (fab@ecs.soton.ac.uk) Jack Dongarra - Univ. of Tenn./ORNL (dongarra@cs.utk.edu) Vladimir Getov - Univ. of Westminister (getovv@wmin.ac.uk) Charles Grassl - SGI/Cray (cmg@cray.com) William Gropp - ANL (gropp@mcs.anl.gov) Tony Hey - Univ. 
of Southampton (ajgh@ecs.soton.ac.uk) Roger Hockney - Univ. of Westminister (roger@minnow.demon.co.uk) Mark Papiani - Univ of Southampton (mp@ecs.soton.ac.uk) Subhash Saini - NASA Ames (saini@nas.nasa.gov) Dave Snelling - FECIT (snelling@fecit.co.uk) Aad J. van der Steen - RUU (steen@fys.ruu.nl) Erich Strohmaier - Univ. of Tennessee (erich@cs.utk.edu) Klaus Stueben - GMD (klaus.stueben@gmd.de) Meeting Activities and Actions Tony Hey chaired the meeting. Minutes from last meeting were seven pages long and it was decided that only the actions from the last meeting would be reviewed. The actions from last meeting were reviewed - a short discussion about each took place. A discussion about interaction with SPEC-HPG was initiated. Comms Low-Level Benchmarks Vladimir Getov gave a short presentation on the current status of the Parkbench Comms benchmarks. Charles Grassl was asked to explained how his new Comms programs worked and the rationale behind it. A long discussion ensued. Action - Create a formal proposal of alternative or additions to the comms low-level benchmarks for SC'97 - Charles Grassl. Action - Members should look at the PALLAS version of the low-level benchmarks (based on Genesis/RAPS). Action - Erich Strohmaier and Vladimir Getov will discuss the efforts needed to split up Parkbench and add in the new Comms1 benchmark (with new curve fitting routine). NPB - Subhash Siani reported on the status of the NAS Parallel Benchmarks HPF - Mark Baker read Chuck Koebel's email about CEWES HPCM HPF efforts. Action - Subhash Siani will let RICE know that Gina should start of from the single NAS codes Electronic Journal - Mark Baker and Tony Hey reported on the electronic journal PEMCS and its Web site. It was agreed that this would be discussed further informally. Parkbench Report -Erich Strohmaier reported on the efforts of creating a new Parkbench report. A short discussion about this ensued. Action - Jack Dongarra /Tony Hey will talk to other members about the potential efforts that could be put into a Parkbench report II by SC'97. Funding Efforts Jack Dongarra's recent benchmarking proposal was turned down. Tony Hey mentioned the possibly of entering a proposal to the EU. Possibility of a joint EU / NSF bid. Mark Baker asked if SIO would be interested in being more closely involved. William Gropp reported that SIO was actually winding down and so formal association was not really an option. AOB The participants were then invited by Tony to move to the University of Southampton (bldg. 16) for the Parkbench demonstrations which included: -- Java Low-Level Benchmarks (Vladimir Getov) -- BenchView: Java Tool for Visualization of Parallel Benchmark Results (Mark Papiani and Flavio Bergamaschi) -- PICT: An Interactive Web-page Curve-fitting Tool (Roger Hockney) Jack Dongarra informed the committee of Parkbench BOF at SC'97 (Wednesday at 3.30PM). The meeting was wound up by Tony Hey at 1630. 
----------------------------------------------------------------------------- ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 11/13/97 - Time: 10:48:53 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- --mordillo:879418490:877:126:21579 Content-Type: APPLICATION/msword; name="minutes-fall-97.doc" Content-Transfer-Encoding: BASE64 Content-Description: minutes-fall-97.doc 0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAAB AAAAEQAAAAAAAAAAEAAAEgAAAAEAAAD+////AAAAABAAAAD///////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////// ///////////////////////cpWgAY+AJBAAAAABlAAAAAAAAAAAAAAAAAwAA hxAAABAeAAAAAAAAAAAAAAAAAAAAAAAAhw0AAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAABgAAGoAAAAAGAAAagAAAGoYAAAAAAAAahgAAAAA AABqGAAAAAAAAGoYAAAAAAAAahgAABQAAACkGAAAAAAAAKQYAAAAAAAApBgA AAAAAACkGAAAAAAAAKQYAAAAAAAApBgAAAoAAACuGAAAEAAAAKQYAAAAAAAA Eh0AAHwAAAC+GAAAAAAAAL4YAAAAAAAAvhgAAAAAAAC+GAAAAAAAAL4YAAAA AAAAvhgAAAAAAAC+GAAAAAAAAL4YAAAAAAAABxoAAAIAAAAJGgAAAAAAAAka AAAAAAAACRoAAEsAAABUGgAAUAEAAKQbAABQAQAA9BwAAB4AAACOHQAAWAAA AOYdAAAqAAAAEh0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAahgAAAAAAAC+GAAA AAAAAAAACQAKAAEAAgC+GAAAAAAAAL4YAAAAAAAAAAAAAAAAAAAAAAAAAAAA AL4YAAAAAAAAvhgAAAAAAAASHQAAAAAAANQYAAAAAAAAahgAAAAAAABqGAAA AAAAAL4YAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL4YAAAAAAAA1BgAAAAAAADU GAAAAAAAANQYAAAAAAAAvhgAABYAAABqGAAAAAAAAL4YAAAAAAAAahgAAAAA AAC+GAAAAAAAAAcaAAAAAAAAAAAAAAAAAAAQq9KCIvC8AX4YAAAOAAAAjBgA ABgAAABqGAAAAAAAAGoYAAAAAAAAahgAAAAAAABqGAAAAAAAAL4YAAAAAAAA BxoAAAAAAADUGAAAMwEAANQYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAABQYXJrYmVuY2ggQ29tbWl0dGVlIE1lZXRp bmcNDUhlbGQgZHVyaW5nIHRoZSBGYWxsIFBhcmtiZW5jaCBXb3Jrc2hvcA0N VGhlIENvdW50eSBIb3RlbA0NU291dGhhbXB0b24sIFVLDQ0xNTE1LCAgMTF0 aCBTZXB0ZW1iZXIgMTk5Nw0NDU1lZXRpbmcgUGFydGljaXBhdGlvbiBMaXN0 Og0NTWFyayBCYWtlciAtIFVuaXYuIG9mIFBvcnRzbW91dGggKG1hYkBzaXMu cG9ydC5hYy51aykNRmxhdmlvIEJlcmdhbWFzY2hpICAtIFVuaXYgb2YgU291 dGhhbXB0b24gKGZhYkBlY3Muc290b24uYWMudWspDUphY2sgRG9uZ2FycmEg LSBVbml2LiBvZiBUZW5uLi9PUk5MIChkb25nYXJyYUBjcy51dGsuZWR1KQ1W bGFkaW1pciBHZXRvdiAgLSBVbml2LiBvZiBXZXN0bWluaXN0ZXIgKGdldG92 dkB3bWluLmFjLnVrKQ1DaGFybGVzIEdyYXNzbCAtIFNHSS9DcmF5IChjbWdA Y3JheS5jb20pDVdpbGxpYW0gR3JvcHAgLSBBTkwgKGdyb3BwQG1jcy5hbmwu Z292KQ1Ub255IEhleSAtIFVuaXYuIG9mIFNvdXRoYW1wdG9uIChhamdoQGVj cy5zb3Rvbi5hYy51aykNUm9nZXIgSG9ja25leSAtIFVuaXYuIG9mIFdlc3Rt aW5pc3RlciAocm9nZXJAbWlubm93LmRlbW9uLmNvLnVrKQ1NYXJrIFBhcGlh bmkgLSBVbml2IG9mIFNvdXRoYW1wdG9uIChtcEBlY3Muc290b24uYWMudWsp DVN1Ymhhc2ggU2FpbmkgLSBOQVNBIEFtZXMgKHNhaW5pQG5hcy5uYXNhLmdv dikNRGF2ZSBTbmVsbGluZyAtIEZFQ0lUIChzbmVsbGluZ0BmZWNpdC5jby51 aykNQWFkIEouIHZhbiBkZXIgU3RlZW4gIC0gUlVVIChzdGVlbkBmeXMucnV1 Lm5sKQ1FcmljaCBTdHJvaG1haWVyIC0gVW5pdi4gb2YgVGVubmVzc2VlIChl cmljaEBjcy51dGsuZWR1KQ1LbGF1cyBTdHVlYmVuIC0gR01EICAoa2xhdXMu c3R1ZWJlbkBnbWQuZGUpDQ1NZWV0aW5nIEFjdGl2aXRpZXMgYW5kIEFjdGlv bnMNDVRvbnkgSGV5IGNoYWlyZWQgdGhlIG1lZXRpbmcuDQ1NaW51dGVzIGZy 
--mordillo:879418490:877:126:21579-- From owner-parkbench-comm@CS.UTK.EDU Mon Nov 17 08:32:09 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA28026; Mon, 17 Nov 1997 08:32:09 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA07698; Mon, 17 Nov 1997 07:58:13 -0500 (EST) Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA07665; Mon, 17 Nov 1997 07:57:54 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa2024828; 17 Nov 97 12:43 GMT Message-ID: <06u4dCAfsDc0Ew8p@minnow.demon.co.uk> Date: Mon, 17 Nov 1997 12:39:59 +0000 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: To the PARKBENCH97 BOF MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a

GREETINGS TO THE PARKBENCH 1997 BOF ----------------------------------- I am not able to attend the Parkbench BOF this year but would like to make the following input: Chairman: Please express my apologies for absence to the meeting.

Agenda Item: Low-Level Performance Evaluation tools. -------------------------------------- The latest version of the Parkbench Interactive Curve Fitting Tool (PICT2) is on my Web page at: http://www.minnow.demon.co.uk/pict/source/pict2a.html I believe that this solves the problem of displaying on different sized screens. Please try it and give me feedback (I have had little so far, so I don't know how worthwhile it is!). This plots and allows manual interactive curve fitting of data anywhere on the Web in raw-data, Original COMMS1, and New COMMS1 format. However, it still relies on COMMS1 calculating the least squares 2-Para and 3-Point 3-Para fits.

Agenda Item: Plans for the next Release. -------------------------- Just a reminder that New COMMS1, as announced in my email to the committee of 16 Feb 1997, was designed as the minimum necessary changes to the existing release to solve the problems raised at the beginning of the year. It involves new versions of 5 routines and 2 new routines. In addition, the Make files need the 2 new routines added where appropriate. We have incorporated these changes at Westminster in the existing release without trouble. I believe that these should be incorporated in the next release. In summary: New COMMS1 In directory: http://www.minnow.demon.co.uk/Pbench/comms1/ The 5 Changed Routines: (1) File COMMS1_1.F replaces ParkBench/Low_Level/comms1/src_mpi/COMMS1.f (2) File COMMS1_1.INC replaces ParkBench/Low_Level/comms1/src_mpi/comms1.inc (3) File ESTCOM_1.F replaces ParkBench/Low_Level/comms1/src_mpi/ESTCOM.f (4) File LSTSQ_1.F replaces ParkBench/lib/Low_Level/LSTSQ.f (5) File CHECK_1.F replaces Parkbench/lib/Low_Level/CHECK.f The 2 New Routines: (6) File LINERR_1.F add as ParkBench/lib/Low_Level/LINERR.f (7) File VPOWER_1.F add as ParkBench/lib/Low_Level/VPOWER.f

HAVE A NICE MEETING, and best wishes to you all, Roger Hockney -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links?
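As background for the 2-Para fits mentioned above: the two-parameter COMMS1 model, in Hockney's notation as used later in this thread, is t(n) = t0 + n/rinf, so rinf is the asymptotic bandwidth, t0 is the extrapolated startup time, and the half-performance message length works out to n_half = rinf * t0. The following is a minimal, self-contained C sketch of that model (illustrative only; the function names and example numbers are not taken from the ParkBench sources).

#include <stdio.h>

/* Hockney's two-parameter model for point-to-point message time:
 *     t(n) = t0 + n / rinf
 * where t0 is the startup time (s) and rinf the asymptotic bandwidth
 * (byte/s).  The half-performance length, at which the achieved
 * bandwidth n/t(n) reaches rinf/2, is  n_half = rinf * t0.
 * Illustrative sketch only -- not the ParkBench source code.          */

static double model_time(double t0, double rinf, double n)
{
    return t0 + n / rinf;
}

static double model_bandwidth(double t0, double rinf, double n)
{
    return n / model_time(t0, rinf, n);
}

int main(void)
{
    double t0   = 20.0e-6;   /* hypothetical 20 microsecond startup */
    double rinf = 1.0e8;     /* hypothetical 100 Mbyte/s asymptote  */

    printf("n_half = %.0f bytes\n", rinf * t0);
    for (double n = 8.0; n <= 1.0e7; n *= 10.0)
        printf("n = %10.0f B   t = %12.3e s   r = %8.2f Mbyte/s\n",
               n, model_time(t0, rinf, n),
               model_bandwidth(t0, rinf, n) / 1.0e6);
    return 0;
}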
From owner-parkbench-comm@CS.UTK.EDU Mon Dec 1 08:38:55 1997 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA05062; Mon, 1 Dec 1997 08:38:55 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA20432; Mon, 1 Dec 1997 08:03:34 -0500 (EST) Received: from hermes.lsi.usp.br (hermes.lsi.usp.br [143.107.161.220]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA20425; Mon, 1 Dec 1997 08:03:30 -0500 (EST) Received: from cali.lsi.usp.br (cali.lsi.usp.br [10.0.161.7]) by hermes.lsi.usp.br (8.8.5/8.7.3) with SMTP id LAA05866; Mon, 1 Dec 1997 11:03:20 -0200 (BDB) Message-ID: <34830ABD.487C@lsi.usp.br> Date: Mon, 01 Dec 1997 11:06:37 -0800 From: Martha Torres Organization: LSI X-Mailer: Mozilla 3.01Gold (Win95; I) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU CC: mxtd@lsi.usp.br Subject: compiling ParkBench for MPICH Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

Sirs ParkBench Committee Dear Sirs, I am a Ph.D. student working with collective communication operations. In particular, I am interested in quantifying the influence of collective communication operations on the total execution time of several MPI programs. My platform is a cluster of 8 Dual Pentium Pro processors interconnected by 100 Mb/s Fast Ethernet. I use MPICH version 1.1 and the fort77 and cc compilers.

I have downloaded ParkBench.tar from netlib. I followed all instructions but there are some programs that did not work:

1. Low_Level/poly1 poly2 rinf1 tick1 tick2 They did not compile. The following message appears: ParkBench/lib/LINUX/ParkBench_misc.a: No such file or directory. How do I create this library?

2. Kernels/LU_solver QR TRD They also did not compile. The following messages appear: ParkBench/lib/LINUX/pblas_subset.a: In function 'pberror_' undefined reference to 'blacs_gridinfo_' undefined reference to 'blacs_abort_'

3. Comp_Apps/PSTSWM and Kernels/MATMUL They compiled but they did not run.

Thanks in advance, Best Regards Martha Torres Laboratorio de Sistema Integraveis University of Sao Paulo Sao Paulo - S.P. Brazil
From owner-parkbench-comm@CS.UTK.EDU Wed Jan 7 16:49:19 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA19963; Wed, 7 Jan 1998 16:49:19 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA17461; Wed, 7 Jan 1998 16:30:05 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA17452; Wed, 7 Jan 1998 16:30:02 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id PAA16817 for ; Wed, 7 Jan 1998 15:30:03 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id PAA27253; Wed, 7 Jan 1998 15:30:00 -0600 (CST) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id VAA26077; Wed, 7 Jan 1998 21:29:59 GMT Message-Id: <199801072129.VAA26077@magnet.cray.com> Subject: Low Level benchmarks To: parkbench-comm@CS.UTK.EDU Date: Wed, 7 Jan 1998 15:29:59 -0600 (CST) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit -- Charles Grassl

From owner-parkbench-comm@CS.UTK.EDU Wed Jan 7 16:56:40 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA19981; Wed, 7 Jan 1998 16:56:40 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA17784; Wed, 7 Jan 1998 16:36:27 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA17776; Wed, 7 Jan 1998 16:36:24 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id PAA17087 for ; Wed, 7 Jan 1998 15:36:24 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id PAA28449 for ; Wed, 7 Jan 1998 15:36:22 -0600 (CST) Received: from magnet by magnet.cray.com (8.8.0/btd-b3) via SMTP id VAA26107; Wed, 7 Jan 1998 21:36:21 GMT Sender: cmg@cray.com Message-ID: <34B3F553.167E@cray.com> Date: Wed, 07 Jan 1998 15:36:19 -0600 From: Charles Grassl Organization: Cray Research X-Mailer: Mozilla 3.01SC-SGI (X11; I; IRIX 6.2 IP22) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU Subject: Low Level benchmark errors and differences Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

To: Parkbench Low Level interests From: Charles Grassl Subject: Low Level benchmark errors and differences Date: 7 January, 1998

We should not produce or publish Parkbench Low Level benchmark results with the current suite of programs because the programs are inaccurate and unreliable. I ran the Low Level programs and compared the results with the same metrics as recorded from other benchmark programs. The differences range from less than 5% (acceptable) to a factor of 6, which is unacceptable.

The differences, or "errors", are summarized in the table below.
The recorded differences in results from the Low Level programs were arrived at by comparing the metrics reported by the Parkbench programs with the same metrics as measured by alternative programs.

Table. Differences in Low Level benchmark results for two systems. System A is an Origin 2000. System B is a CRAY T3E.

               System A              System B
            Rinf    Startup       Rinf    Startup
   -----------------------------------------
   COMMS1    <10%      6x          <5%      6x
   COMMS2     2x       3x          <5%     <5%
   COMMS3    <5%      <5%
   POLY1     <5%      60%           2x     <5%
   POLY2     <5%      60%           2x     <5%
   POLY3      -        -            2x     80x

The Parkbench Low Level programs are occasionally requested for benchmarking computer systems, but the results are usually rejected because of their inaccuracy and unreliability. If not rejected, they cause confusion and consternation because the results do not agree with other measurements of the same variables. I emphasize that this is not a case of obtaining optimization and favorable results for a computer system. The problem is with the inaccuracy and unreliability of the results.

The Low Level programs measure and report low level parameters. Therefore their value is in accuracy and utility. The programs do not constitute definitions of the reported metrics and hence the results should correlate with other measurements of the same variables.

The Low Level programs are obsolete and need to be replaced. I have written seven simple programs, with MPI and PVM versions, and offer them as a replacement for the Low Level suite.

I strongly suggest that we delete or withdraw from distribution the current Low Level suite.

From owner-parkbench-comm@CS.UTK.EDU Thu Jan 8 05:40:28 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA01529; Thu, 8 Jan 1998 05:40:28 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id FAA00442; Thu, 8 Jan 1998 05:20:21 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id FAA00380; Thu, 8 Jan 1998 05:20:13 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id LAA28869; Thu, 8 Jan 1998 11:20:05 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id LAA24864; Thu, 8 Jan 1998 11:18:48 +0100 Date: Thu, 8 Jan 1998 11:18:48 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801081018.LAA24864@sgi7.ccrl-nece.technopark.gmd.de> To: parkbench-comm@CS.UTK.EDU Subject: Low Level benchmark errors and differences Cc: ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de, eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, tbeckers@ess.nec.de Reply-To: hempel@ccrl-nece.technopark.gmd.de

To: Parkbench Low Level interests From: Rolf Hempel Subject: Low Level benchmark errors and differences, Note from Charles Grassl of January 7th Date: 8 January, 1998

Thank you, Charles, for your note on the Low Level benchmarks. It could not have come at a better time, because at NEC we just recently ran into problems with COMMS1. This code had been specified by a customer as a test case in a current procurement. When we ran COMMS1 with our current MPI library, the results for rinfinity and latency were completely wrong. In particular, the latency values were off by more than a factor of two when compared with other ping-pong test programs.
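For readers who have not seen one, the independent ping-pong measurements referred to here take, in outline, the following form. This is a hedged C/MPI sketch of the general technique, not the MPPTEST or COMMS1 source; the message length and repetition count are arbitrary.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal two-process ping-pong timing sketch (illustrative only).
 * Rank 0 times NREPS round trips of an n-byte message and reports the
 * mean round-trip time; no model fitting or halving is attempted here. */
#define NREPS 1000

int main(int argc, char **argv)
{
    int rank, size, n = 1024;            /* arbitrary message length */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    buf = calloc(n, 1);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < NREPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("n = %d bytes: mean round-trip time = %g s over %d reps\n",
               n, (t1 - t0) / NREPS, NREPS);
    free(buf);
    MPI_Finalize();
    return 0;
}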
The following turned out to be the main reasons for the errors: 1. The performance model is completely inadequate. A linear dependency between time and message length, fitted to the measurements by least squares, is bound to fail in the presence of discontinuities caused by protocol changes. Most MPI implementations change protocols for different message lengths for an overall performance optimization. 2. To make things worse, the least square fit overweighs the data points for very long messages, because the differences "model minus measurement" are largest there in absolute terms. The fitted line, therefore, more or less ignores the short message measurements. As a result, the latencies are completely up to chance. 3. The correction for internal measurement overhead (e.g., for subroutine calls) is programmed in a sloppy way, to say the least. We discovered several subroutine calls which were not taken into account, and the overhead is measured with low precision. For our implementation, this alone introduced a latency error of about 25%. The result in our case was that, instead of the 13.5 usec latency measured by the MPICH MPPTEST routine, COMMS1 initially reported some 28 usec. My colleague Hubert Ritzdorf then made an interesting experiment: he removed some optimization from our MPI library for long messages, thus INCREASING the communication times for messages longer than 128000 bytes, and not changing anything for shorter messages. The resulting DROP in latency from 28 to under 22 usec clearly shows how ridiculous the COMMS1 benchmark is. Thus, I strongly agree with Charles in that the COMMS* benchmarks must be removed from PARKBENCH. They don't help anybody, and they only cause confusion on the side of customers and frustration on the side of benchmarkers. Let's get rid of this long-standing nuisance as quickly as possible. Best regards, Rolf Hempel ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Thu Jan 8 08:07:54 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA02383; Thu, 8 Jan 1998 08:07:53 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA05392; Thu, 8 Jan 1998 07:50:13 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA05383; Thu, 8 Jan 1998 07:50:03 -0500 (EST) Received: from mordillo (p108.nas1.is4.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA03072; Thu, 8 Jan 98 12:48:32 GMT Date: Thu, 8 Jan 98 12:10:55 GMT From: Mark Baker Subject: Re: Low Level benchmark errors and differences To: Charles Grassl , parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <34B3F553.167E@cray.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII I am in agreement with Charles and Rolf about the low-level codes. We've known for some time that they (the codes) are less than perfect, if not in some cases flawed. At the SC'97 Parkbench meeting it was mooted that Parkbench should concentrate on producing, supporting, analysing and recording Low-Level codes and results. 
If this is the case then we should certainly ensure that the codes we support are soundly written and produce consistent and reliable results. I certainly believe that a set of codes, akin to the low-level ones, should be part of the Parkbench suite. Maybe this is a good time to replace the current codes with those that Charles has produced!?

As a side issue, I think we should produce C versions of whatever low-level codes we produce. Charles, I'd be interested in your thoughts on the codes that Pallas produce - ftp://ftp.pallas.de/pub/PALLAS/PMB/PMB10.tar.gz. These are C benchmark codes that run:

PingPong - like comms1
PingPing - like comms2
Xover
Cshift
Exchange
Allreduce
Bcast
Barrier - like synch1

Obviously, I wouldn't like to comment on how well written they are or how reliable the results that they produce are. I'm relatively impressed with them. I also like the fact that they try to produce results for commonly used MPI functions - cshift/exchange/etc. I've run the codes on NT boxes and they appear to produce results close to what I would expect.

Regards Mark

------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 01/08/98 - Time: 12:10:55 URL http://www.sis.port.ac.uk/~mab/ -------------------------------------

From owner-parkbench-comm@CS.UTK.EDU Mon Jan 12 16:02:28 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA26216; Mon, 12 Jan 1998 16:02:28 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA16631; Mon, 12 Jan 1998 15:38:05 -0500 (EST) Received: from post.mail.demon.net (post-20.mail.demon.net [194.217.242.27]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA16588; Mon, 12 Jan 1998 15:37:38 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa2012292; 12 Jan 98 17:34 GMT Message-ID: Date: Mon, 12 Jan 1998 17:33:01 +0000 To: hempel@ccrl-nece.technopark.gmd.de Cc: parkbench-comm@CS.UTK.EDU, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de, eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, tbeckers@ess.nec.de From: Roger Hockney Subject: Re: Low Level benchmark errors and differences In-Reply-To: <199801081018.LAA24864@sgi7.ccrl-nece.technopark.gmd.de> MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a

To: Rolf, Charles, Mark and others, From: Roger

I too am distressed to see the original COMMS1 code (written and tested for message lengths only up to 10^4) is still being issued by Parkbench and being used well outside its range of proven validity (message lengths now typically up to 10^7 or even 10^8). These problems were pointed out about one year ago by Charles and Ron, and as a result I worked on the code and issued to the committee a minimum set of changes to the current release that would solve many of the problems. These involve replacing five existing routines and adding two to the existing release. The routines involved have been downloadable from my Web site since about 12 March 1997 and have been used successfully at Westminster University in our work. The New COMMS1, as I called it, was the subject of two printed reports to the May 1997 meeting of Parkbench and further results were shown at the Sept 1997 meeting. There were also extensive discussions in this email group during 1997.

Unfortunately my simple fixes were not inserted into the Parkbench release and as a result we are still getting a bad press from benchmarkers. After all the effort I put into solving this problem a year ago, I feel rather let down that my work was never used. If my changes had been incorporated into the Parkbenchmarks when they were offered, at least as an interim measure, I believe we could have avoided much of the current bad publicity.

I emphasise that the New COMMS1 was written as a minimum patch to the existing release to solve an urgent problem in the simplest way. I am not against a complete rethink of the low level benchmarks and, now that MPI has become a recognised standard, benchmarks timing the principal software primitives of MPI would seem to be the most useful. Quite possibly Charles's or Mucci's codes could be used. However, I am still firmly convinced of the value of approximate parametric representation of all the benchmark measurements based on a simple performance model.
Most of the existing low-level benchmarks were written primarily to determine such parameters and hence include both raw measurements and least squares curve fitting to obtain the parameters. I have yet to see data that cannot be satisfactorily fitted by 2 or 3 parameters, or two sets of 2-paras. And remember that I am talking here about fitting ALL the measured data by some simple formulae. After the decision of the May 1997 meeting to separate the raw measurements from the parametric curve fitting, the curve fitting will eventually become part of the "Parkbench Interactive Curve Fitting Tool" (PICT). At present this applet can be used to produce a manual curve fit, but eventually I will put up on my Web site a version in which the least squares and 3-point buttons are active. But PICT as it is can now be used manually to see how good or bad the 2-para and 3-para fits are. Turn your browser to: http://www.minnow.demon.co.uk/pict/source/pict2a.html and insert your raw data. I would be very interested to see what the NEC data looks like. To answer some of Rolf's points: Rolf Hempel writes > >1. The performance model is completely inadequate. A linear dependency > between time and message length, fitted to the measurements by > least squares, is bound to fail in the presence of discontinuities > caused by protocol changes. Most MPI implementations change > protocols for different message lengths for an overall performance > optimization. > Note that the original COMMS1 that you are using allows you to insert one break point to take account of one major discontinuity. Have you tried this? In any case, to make t_0 a good measure of startup it is sensible ALWAYS to make a breakpoint at say 100 or 1000 Byte, then the short message t_0 should be a good measure of startup. The long message t_0 is then not of interest and should be ignored. In this way one is using the straight- line fit over a short range of lengths, and the resulting t_0 should be a better estimate of latency because it is derived from several measurements rather than just selecting a single measurement (e.g. the time for the shortest message) -- surely a better experimental procedure. I emphasise that this procedure can be used now with the original COMMS1 to get sensible results. If there are many small discontinuities or changes of protocol then I expect you data is rather like that shown by Charles this time last year and used as an example in PICT. In this case the 3-para fit may give good results for your data as it did for Charles's. >2. To make things worse, the least square fit overweighs the data points > for very long messages, because the differences "model minus > measurement" are largest there in absolute terms. The fitted line, > therefore, more or less ignores the short message measurements. > As a result, the latencies are completely up to chance. > This is absolutely true and was discovered to be the problem one year ago. My solution, used in the New COMMS1, was and is to minimise the sum of the squares of the relative (rather than absolute) error. If this is done the values for short messages are not ignored in the way described, and t_0 is held much closer to the time for the smallest message length. Note also that the 3-parameter fit provided by New COMMS1 can be fitted exactly to the time for the shortest message, to the bandwidth for the longest message, and to the bandwidth near the mid point. This is the so-called 3-point fit, but it does require a third parameter. 
Can you please email me the output file for the NEC from the original COMMS1. I can then put this data through the New COMMS1 and see what two and three parameter fits are produced. Otherwise you could update your version of Parkbenchmarks with the 7 subroutines and rerun using New COMMS1. See the instructions at the end of this email. >28 usec. My colleague Hubert Ritzdorf then made an interesting >experiment: he removed some optimization from our MPI library for >long messages, thus INCREASING the communication times for messages >longer than 128000 bytes, and not changing anything for shorter >messages. The resulting DROP in latency from 28 to under 22 usec >clearly shows how ridiculous the COMMS1 benchmark is. > Hubert's results are just what one would expect from minimising the absolute error. I suspect you would not see this effect with New COMMS1 which does not over-emphasise the long message measurements. Please remember that the t_0 reported by COMMS1 is not a measurement of the time for any particular message length. It is the constant term in the fitted curve: t = t_0 + n/rinf which is an approximation to ALL the measured data. If you want to know the time, say for the smallest message length, then that is listed in the table of lengths and times reported in the benchmark output. If you mean by latency the time for the shortest message (hopefully zero or 1 Byte) then the COMMS1 measurements of this are in this table not in t_0. For those who missed my two earlier emailings on using the New COMMS1, I copy my earlier email below: Agenda Item : Plans for the next Release. -------------------------- Just a reminder that New COMMS1 as announced in my email to the committee of 16 Feb 1997, was designed as the minimum necessary changes to the existing release to solve the problems raised at the beginning of the year. It involves new versions of 5 routines and 2 new routines. In addition, the Make files need the 2 new routines added where appropriate. We have incorporated these changes at Westminster in the existing release without trouble. I believe that these should be incorported in the next release. In summary: New COMMS1 In directory: http://www.minnow.demon.co.uk/Pbench/comms1/ The 5 Changed Routines: (1) File COMMS1_1.F replaces the following file in the current release: ParkBench/Low_Level/comms1/src_mpi/COMMS1.f (2) File COMMS1_1.INC replaces ParkBench/Low_Level/comms1/src_mpi/comms1.inc (3) File ESTCOM_1.F replaces ParkBench/Low_Level/comms1/src_mpi/ESTCOM.f (4) File LSTSQ_1.F replaces ParkBench/lib/Low_Level/LSTSQ.f (5) File CHECK_1.F replaces Parkbench/lib/Low_Level/CHECK.f The 2 New Routines: (6) File LINERR_1.F add as ParkBench/lib/Low_Level/LINERR.f (7) File VPOWER_1.F add as ParkBench/lib/Low_Level/VPOWER.f Best wishes to you all Roger -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? 
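To make Roger's absolute-versus-relative error point concrete: fitting the measured pairs (n_i, t_i) to t = t_0 + n/rinf is an ordinary linear fit with intercept t_0 and slope 1/rinf, and the only real question is how each residual is weighted. The sketch below, in self-contained C, is not the ParkBench LSTSQ or LINERR source, and the sample data are invented; it simply fits the same synthetic data twice, once minimising absolute residuals and once minimising relative residuals, so the effect on t_0 can be seen directly.

#include <stdio.h>

/* Fit t = t0 + n/rinf to measured (n_i, t_i) pairs by weighted least
 * squares on the linear form  t = a + b*n  (a = t0, b = 1/rinf).
 *   relative = 0 : w_i = 1        -- minimise absolute residuals
 *                                    (long messages dominate the fit)
 *   relative = 1 : w_i = 1/t_i^2  -- minimise relative residuals
 * Illustrative sketch only, not the ParkBench LSTSQ/LINERR routines.  */
static void fit(int m, const double *n, const double *t, int relative,
                double *t0, double *rinf)
{
    double Sw = 0, Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
    for (int i = 0; i < m; i++) {
        double w = relative ? 1.0 / (t[i] * t[i]) : 1.0;
        Sw  += w;
        Sx  += w * n[i];
        Sy  += w * t[i];
        Sxx += w * n[i] * n[i];
        Sxy += w * n[i] * t[i];
    }
    double b = (Sw * Sxy - Sx * Sy) / (Sw * Sxx - Sx * Sx);
    *t0   = (Sy - b * Sx) / Sw;
    *rinf = 1.0 / b;
}

int main(void)
{
    /* invented ping-pong data: 20 us startup, 100 Mbyte/s asymptote,
       with a protocol change that slows messages above 100 kbyte by 50% */
    double n[6] = { 8.0, 64.0, 1024.0, 16384.0, 262144.0, 4194304.0 };
    double t[6], t0, rinf;

    for (int i = 0; i < 6; i++) {
        t[i] = 20.0e-6 + n[i] / 1.0e8;
        if (n[i] > 1.0e5) t[i] *= 1.5;
    }
    fit(6, n, t, 0, &t0, &rinf);
    printf("absolute-error fit: t0 = %g s   rinf = %g byte/s\n", t0, rinf);
    fit(6, n, t, 1, &t0, &rinf);
    printf("relative-error fit: t0 = %g s   rinf = %g byte/s\n", t0, rinf);
    return 0;
}

With the unweighted fit the long-message points pull the intercept away from the short-message times; with relative weighting the short-message points dominate and t_0 stays close to the measured startup, which is the behaviour Roger describes for New COMMS1.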
From owner-parkbench-comm@CS.UTK.EDU Tue Jan 13 08:38:07 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA17513; Tue, 13 Jan 1998 08:38:07 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA03191; Tue, 13 Jan 1998 08:20:10 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA03184; Tue, 13 Jan 1998 08:20:07 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id OAA04953; Tue, 13 Jan 1998 14:19:47 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id OAA02202; Tue, 13 Jan 1998 14:18:30 +0100 Date: Tue, 13 Jan 1998 14:18:30 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801131318.OAA02202@sgi7.ccrl-nece.technopark.gmd.de> To: roger@minnow.demon.co.uk Subject: COMMS1 Benchmark Cc: tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, eckhard@ess.nec.de, clantwin@ess.nec.de, parkbench-comm@CS.UTK.EDU Reply-To: hempel@ccrl-nece.technopark.gmd.de Dear Roger, thank you for your note on the COMMS1 benchmark. We didn't try the NEW COMMS1 code yet with our MPI library, so I cannot comment on its accuracy. I just would like to answer some of the issues you raised in your mail. Of course we have seen that in COMMS1 you can select a transition point between a short and a long model. For this choice, however, you have to be able to change the input data. In our case (a benchmark suite used in a procurement) our customer had provided the input dataset, and we were not allowed to change it. So, the only way for us to correct the results was to tune our MPI library to make it fit to the benchmark program. I don't think that this is what you had in mind when you wrote COMMS1. You didn't comment on the inaccuracies we found in the raw measurements. We ran several ping-pong benchmarks before, as, for example, the MPPTEST routine of MPICH, and they consistently give better latencies for short messages (difference approx. 25%). As I explained in my previous mail, we found the reason to be an improper correction for measurement overheads in COMMS1. Thus, the raw data are flawed, and this cannot be resolved by any parameter fitting. This is also the reason that I hesitate to send you the raw data reported by COMMS1 on our machine. I agree with you that it would be nice to have a few parameters to characterize the performance of any given system. The values for "n1/2" and "rinfinity" have been quite successful for vector arithmetic operations. The situation is, however, much more complicated for communication operations. As an example, let's take the famous ping-pong benchmark. We already discussed the problem of discontinuities caused by protocol changes. If you want to do a parameter fitting, the only reasonable solution seems to me that your test program automatically detects such points and handles the different protocols separately. If you leave the selection to an input parameter, you will inevitably run into the problem I discussed above. Even if you solve this problem, there remain many others. In modern (i.e. highly optimized) MPI implementations, the performance of a ping-pong operation crucially depends on the status of the two processes involved. 
Is the receiving process already waiting for the message? In a ping-pong, it usually is. This can make a huge difference! Also, the performance can also depend on the global number of processes active in the application. Not only do search lists in communication progress engines become shorter if there are fewer processes, but some implementers even went as far as writing special code for the case where you just have two processes. Ping-pong codes such as COMMS1 almost always just use two communicating processes, so they measure the best case. Another effect which is too often ignored is that messages can interfere with each other (both at the hardware and software level) if they are sent at the same time between different process pairs. All those effects combined cause a substantial difference between ping-pong results and measurements in real applications. In this situation the apparent precision of performance parameters can be quite misleading. If I want to judge the quality of an MPI implementation, I don't trust in best fit parameters so much. For the ping-pong code, I just look at a graphic representation of time versus message length for short messages, and another one of bandwidth versus message length for long messages. This way I can study discontinuities and other minor effects in detail. And then, take real applications and measure the communication times there. Then you will often find surprising results which you have never seen in a ping-pong benchmark. Best wishes, Rolf ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Thu Jan 15 14:17:57 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA00690; Thu, 15 Jan 1998 14:17:56 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA23858; Thu, 15 Jan 1998 13:55:08 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA23830; Thu, 15 Jan 1998 13:54:57 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id LAA11159 for ; Thu, 15 Jan 1998 11:11:42 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id LAA08650 for ; Thu, 15 Jan 1998 11:11:41 -0600 (CST) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id RAA07227; Thu, 15 Jan 1998 17:11:40 GMT Message-Id: <199801151711.RAA07227@magnet.cray.com> Subject: Low Level Benchmarks To: parkbench-comm@CS.UTK.EDU Date: Thu, 15 Jan 1998 11:11:39 -0600 (CST) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit To: Parkbench interests From: Charles Grassl Subject: Low Level benchmarks Date: 15 January, 1998 Mark, thank you for pointing us to the PMB benchmark. It is well written and coded, but has some discrepancies and shortcomings. My comments lead to suggestions and recommendation regarding low level communication benchmarks. 
First, in program PMB the PingPong tests are twice as fast (in time) as the corresponding message length tests in the PingPing tests (as run on a CRAY T3E). The calculation of the time and bandwidth is incorrect by a factor of 100% in one of the programs. This error can be fixed by recording, using and reporting the actual time, the amount of data sent and their ratio. That is, the time should not be divided by two in order to correct for a round trip. This recorded time is for a round trip message, and is not precisely the time for two messages. Half the round trip message passing time, as reported in the PMB tests, is not the time for a single message and should not be reported as such. This same erroneous technique is used in the COMMS1 and COMMS2 benchmarks. (Is Parkbench responsible for propagating this incorrect methodology?)

In program PMB, the testing procedure performs a "warm up". This procedure is a poor testing methodology because it discards important data. Testing programs such as this should record all times and calculate the variance and other statistics in order to perform error analysis.

Program PMB does not measure contention or allow extraction of network contention data. Tests "Allreduce" and "Bcast" and several others stress the inter-PE communication network with multiple messages, but it is not possible to extract information about the contention from these tests. The MPI routines for Allreduce and Bcast have algorithms which change with respect to the number of PEs and message lengths. Hence, without detailed information about the specific algorithms used, we cannot extract information about network performance or further characterize the inter-PE network.

Basic measurements must be separated from algorithms. Tests PingPong, PingPing, Barrier, Xover, Cshift and Exchange are low level. Tests Allreduce and Bcast are algorithms. The algorithms Allreduce and Bcast need additional (algorithmic) information in order to be described in terms of the basic level benchmarks.

With respect to low level testing, the round trip exchange of messages, as per PingPing and PingPong in PMB or COMMS1 and COMMS2, is not characteristic of the lowest level of communication. This pattern is actually rather rare in programming practice. It is more common for tasks to send single messages and/or to receive single messages. In this scheme, messages do not make a round trip and there are not necessarily caching or other coherency effects. The single message passing is a distinctly different case from that of round trip tests. We should be worried that the round trip testing might introduce artifacts not characteristic of actual (low level) usage. We need a better test of basic bandwidth and latency in order to measure and characterize message passing performance.

Here are suggestions and requirements, in outline form, for a low level benchmark design:

I. Single and double (bidirectional) messages.

  A. Test single messages, not round trips.

    1. The round trip test is an algorithm and a pattern. As such it should not be used as the basic low level test of bandwidth.

    2. Use direct measurements where possible (which is nearly always). For experimental design, the simplest method is the most desirable and best.

    3. Do not perform least squares fits A PRIORI. We know that the various message passing mechanisms are not linear or analytic because different mechanisms are used for different message sizes. It is not necessarily known beforehand where this transition occurs.
    Some computer systems have more than two regimes and their boundaries are dynamic.

    4. Our discussion of least squares fitting is losing track of experimental design versus modeling. For example, the least squares parameter for t_0 from COMMS1 is not a better estimate of latency than actual measurements (assuming that the timer resolution is adequate). A "better" way to measure latency is to perform additional DIRECT measurements, repetitions or otherwise, and hence decrease the statistical error. The fitting as used in the COMMS programs SPREADS error. It does not reduce error and hence it is not a good technique for measuring such an important parameter as latency.

  B. Do not test zero length messages. Though valid, zero length messages are likely to take special paths through library routines. This special case is not particularly interesting or important.

    1. In practice, the most common and important message size is 64 bits (one word). The time for this message is the starting point for bandwidth characterization.

  D. Record all times and use statistics to characterize the message passing time. That is, do not prime or warm up caches or buffers. Timings for unprimed caches and buffers give interesting and important bounds. These timings are also the nearest to typical usage.

    1. Characterize message rates by a minimum, maximum, average and standard deviation (a sketch of such a measurement loop follows at the end of this message).

  E. Test inhomogeneity of the communication network. The basic message test should be performed for all pairs of PEs.

II. Contention.

  A. Measure network contention relative to all PEs sending and/or receiving messages.

  B. Do not use high level routines where the algorithm is not known.

    1. With high level algorithms, we cannot deduce which component of the timing is attributable to the "operation count" and which is attributable to the actual system (hardware) performance.

III. Barrier.

  A. Simple test of barrier time for all numbers of processors.

Additionally, the suite should be easy to use. C and Fortran programs for direct measurements of message passing times are short and simple. These simple tests are of order 100 lines of code and, at least in Fortran 90, can be written in a portable and reliable manner.

The current Parkbench low level suite does not satisfy the above requirements. It is inaccurate, as pointed out by previous letters, and uses questionable techniques and methodologies. It is also difficult to use; witness the proliferation of files, patches, directories, libraries and the complexity and size of the Makefiles. This Low Level suite is a burden for those who are expecting a tool to evaluate and investigate computer performance. The suite is becoming a liability for our group. As such, it should be withdrawn from distribution.

I offer to write, test and submit a new set of programs which satisfy most of the above requirements.
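The following is a rough C/MPI sketch of the kind of measurement loop that item I.D above argues for: every repetition is kept, nothing is discarded as warm-up, the raw round-trip time is reported without being halved or fitted, and only minimum, maximum, mean and standard deviation are printed. It is an illustration of the outline, not Charles's proposed programs, and for simplicity it still times a round trip (the difficulty of timing a true one-way message is raised by Pat Worley later in this thread).

#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Statistics-based timing sketch: record every individual round-trip
 * time (no warm-up discarded, no halving, no least-squares fit) and
 * report min / max / mean / standard deviation.  Illustrative only.   */
#define NREPS 500

int main(int argc, char **argv)
{
    int rank, size, n = 8;               /* one 64-bit word = 8 bytes  */
    double sample[NREPS];
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    buf = calloc(n, 1);
    MPI_Barrier(MPI_COMM_WORLD);
    for (int i = 0; i < NREPS; i++) {
        double t = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
        sample[i] = MPI_Wtime() - t;     /* keep the raw sample        */
    }

    if (rank == 0) {
        double min = sample[0], max = sample[0], sum = 0.0, var = 0.0;
        for (int i = 0; i < NREPS; i++) {
            if (sample[i] < min) min = sample[i];
            if (sample[i] > max) max = sample[i];
            sum += sample[i];
        }
        double mean = sum / NREPS;
        for (int i = 0; i < NREPS; i++)
            var += (sample[i] - mean) * (sample[i] - mean);
        printf("round trip, n = %d bytes: min %g  max %g  mean %g  stddev %g (s)\n",
               n, min, max, mean, sqrt(var / NREPS));
    }
    free(buf);
    MPI_Finalize();
    return 0;
}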
Charles Grassl SGI/Cray Research Eagan, Minnesota USA From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 09:12:18 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id JAA11774; Fri, 16 Jan 1998 09:12:18 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA16130; Fri, 16 Jan 1998 08:53:07 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA16123; Fri, 16 Jan 1998 08:53:06 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id IAA01963; Fri, 16 Jan 1998 08:52:17 -0500 (EST) Date: Fri, 16 Jan 1998 08:52:17 -0500 (EST) From: Pat Worley Message-Id: <199801161352.IAA01963@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level Benchmarks In-Reply-To: Mail from 'Charles Grassl ' dated: Thu, 15 Jan 1998 11:11:39 -0600 (CST) Cc: worley@haven.EPM.ORNL.GOV, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, clantwin@ess.nec.de, eckhard@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, tbeckers@ess.nec.de I have not been paying close attention to the current Low Level communication suite discussions, having confidence in capabilities and resolve of the current participants, but have decided to muddy the waters with a few personal observations. 1) I do not use the Low Level suite in my own performnace-related work. I find that the interpretation of results is much easier if the experiments are designed to answer (my) specific performance questions. Producing numbers that are accurate enough and whose experiments are well-enough understood to be used to answer arbitrary performance questions is much more difficult. 2) It may be time to revisit the goals of the Low Level suite. There are two obvious extremes. a) Determine some (hopefully representative) metrics of point-to-point communication performance, concentrating on making the measurements fair when comparing across platforms, but not requiring that the underlying architecture parameters be derivable from these numbers, or that they agree exactly with any other group's measurements. In this situation, a two (or more) parameter model fit to the data can be useful, if only as a shorthand for the raw data, but the model should not be expected to explain the data. b) Characterize the low level communication performance for each platform. Charles Grassl's latest recommendation is a first step in that direction. As a personal aside, I attempted such an exercise a few years ago (on the T3D, looking at the effect of common usage patterns on performance, not just ping-pong between nearest neighbors). I quickly became swamped by the amount of data and by the number of ways of presenting it (and the work was never written up). I realize now that my problem was trying to address too many evaluation questions simultaneously. In addition to the large amount of data required, an accurate characterization is likely to require more platform-specific elements, and will continue to evolve as new machines are added, in order to be as fair to the new machines as it is to the old ones. (The two parameter models are very acurrate for some of the previous generation of homogeneous message-passing platforms.) In case my sympathies are not clear, I prefer to revisit and fix the current suite, "dumbing it down", if only in presentation, making it clear what it does and does not measure. 
In my own work, the point-to-point measurements are only for establishing a general performance baseline. The important measures are the performance observed in the kernel and full application codes. The baseline measurements are simply to assess the "peak achieveable" communication performance. While a full characterization is an important thing to do, I do not believe that this group has the manpower, resources, or staying power to do it right. At one time in the past, we proposed to simply be a clearinghouse for the best of the performance measurement codes. If Charles wants to write and submit such an extensive low level suite, we can consider it, but in the meantime we should address the problems in the current suite, and not claim more than is appropriate. In particular, make sure that the customer does not become concerned that the vendor-stated latency and bandwidth does not match the PARKBENCH reported values. A discrepancy does not necessarily mean that someone is lying, simply that different aspects are being measured. But we should also be sure that intermachine comparisons using PARKBENCH measurements are valid, otherwise, they serve no purpose. Pat Worley PS. - I may be in the fringe, but all my codes are written using variants of SWAP and SENDRECV, and most of the codes I see can be written in such a fashion (and could gain something from it). So, ping-pong and ping-ping are not irrelevant to me. PPS. - Of course the real reason for using ping-pong is the difficulty in measuring the time for one-way messaging. I was not aware that this was a solved problem, at least at the MPI or PVM level. Perhaps system instrumentation can answer it, but I didn't know that portable measurement codes could be guaranteed to do so across the different platforms. From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 10:57:55 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA13381; Fri, 16 Jan 1998 10:57:55 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA20483; Fri, 16 Jan 1998 10:38:52 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA20468; Fri, 16 Jan 1998 10:38:45 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id QAA09438; Fri, 16 Jan 1998 16:38:41 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id QAA04930; Fri, 16 Jan 1998 16:37:14 +0100 Date: Fri, 16 Jan 1998 16:37:14 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801161537.QAA04930@sgi7.ccrl-nece.technopark.gmd.de> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level Benchmarks Cc: tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, eckhard@ess.nec.de, clantwin@ess.nec.de, zimmermann@ccrl-nece.technopark.gmd.de, ritzdorf@ccrl-nece.technopark.gmd.de, hempel@ccrl-nece.technopark.gmd.de Reply-To: hempel@ccrl-nece.technopark.gmd.de I would like to send some remarks to the notes by Charles Grassl and Pat Worley on the problem of low-level communication benchmarks. As Pat pointed out, the ping-pong benchmark has been invented because generally there is no global clock by which you could measure the time for a single message. 
Everybody knows that this is no perfect solution, and in my previous mail I already explained some aspects of why ping-pong results can differ substantially from times found in real applications. So, I think we will have to use ping-pong tests in the future, with the caveat that they only measure a very special case of message-passing. If Charles knows a way to measure single messages, I would like to learn about it. In most other points I agree with Charles. I'm strongly convinced that the COMMS* routines are obsolete and should be replaced with something reasonable. In particular, the current routines are far too complicated to use, and give completely meaningless results. Therefore, I think one should not even try to correct the COMMS* routines, especially as there are already better alternatives available. One example is the PMB suite of PALLAS. It is relatively easy to use, but the documentation should provide more information than the internal calling tree given in the README file. What is missing is a precise definition of the underlying measuring methodology. I strongly prefer the output of timing tables (perhaps translated in good graphical representations) over crude parametrizations like the ones in the COMMS* benchmarks. Those can only frustrate the experts and confuse all other people. As to the definition of latency, Charles is right in saying that zero byte messages are dangerous because they often use special algorithms. The straightforward solution to use 1 byte messages instead is bad because usually messages are sent as multiples of 4 or 8 bytes, and for other message lengths some overhead by additional copying or even subroutine calls may be introduced. Since the lengths of most real messages are multiples of 4 or 8 bytes, I support Charles' proposal to measure the time for an 8 byte message and call it the latency. I think the warm-up phase before the actual benchmarking is important in order not to smear out initialization overheads over some number of messages. The time for the first ping-pong (or other operation), however, should be measured and compared with the time found for the following operations. I very much welcome Charles Grassl's kind offer to write a new benchmark suite. Perhaps there are even other suites available which could also be candidates for getting adopted by PARKBENCH. This forum meanwhile is quite well-known, which gives them considerable responsibility. PARKBENCH's choice of benchmark programs influences procurements of new machines world-wide, and the availability of a good set of low level benchmarks could give PARKBENCH a good reputation. I'm afraid that the current set of routines has the opposite effect. 
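As a minimal illustration of the warm-up point, the sketch below (purely illustrative; the repetition count, message size and ranks are arbitrary) times every ping-pong round trip, reports the round-trip time without halving it, and prints the first (cold) iteration separately so that it can be compared with the average of the following ones.

/* Sketch: ping-pong that keeps the first (cold) round trip as data
   instead of silently discarding it in a warm-up phase. Illustrative only. */
#include <stdio.h>
#include <mpi.h>

#define NREP   100
#define NBYTES 8

int main(int argc, char **argv)
{
    int rank, i;
    char buf[NBYTES] = {0};
    double t[NREP];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < NREP; i++) {
        double t0 = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
        t[i] = MPI_Wtime() - t0;       /* full round-trip time, not halved */
    }

    if (rank == 0) {
        double rest = 0.0;
        for (i = 1; i < NREP; i++) rest += t[i];
        printf("first round trip %.3e s, mean of remaining %d: %.3e s\n",
               t[0], NREP - 1, rest / (NREP - 1));
    }
    MPI_Finalize();
    return 0;
}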
- Rolf Hempel ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 12:46:04 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA14801; Fri, 16 Jan 1998 12:46:04 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA27007; Fri, 16 Jan 1998 12:29:03 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id MAA27000; Fri, 16 Jan 1998 12:29:01 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id MAA02149; Fri, 16 Jan 1998 12:29:01 -0500 (EST) Date: Fri, 16 Jan 1998 12:29:01 -0500 (EST) From: Pat Worley Message-Id: <199801161729.MAA02149@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level Benchmarks In-Reply-To: Mail from 'hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel)' dated: Fri, 16 Jan 1998 16:37:14 +0100 Cc: worley@haven.EPM.ORNL.GOV In most other points I agree with Charles. I'm strongly convinced that the COMMS* routines are obsolete and should be replaced with something reasonable. I have no problem with this. As I indicated, I have no experience with these. What is missing is a precise definition of the underlying measuring methodology. Perhaps this is the point that I was trying to make. Not only must the codes be easy to use, but the results should be easy to interpret. Every code should have a simple description of what it is measuring, what the data can be used for (and what it shouldn't be used for), and how to use the data. PARKBENCH needs to provide guidance in what data to collect, not just carefully crafted benchmark codes. And we need to describe clearly what low level communication tests are good for. For example, I have problems with low level contention tests. Understanding hotspots is an interesting exercise, but the connection to "real" codes is more subtle. Do we stress test, look at contention for given algorithms/global operators (and which algorithms), use some standard workload characterization as the background job, ...? For any given performance question, what should be used may be clear, but it is difficult to do this a priori. A simultaneous send/receive stress test may very well be something interesting to present, but we also need to be able to explain why (because it is typical in synchronous global communication operations?). In summary, I would like to see a prioritized list of what low level information is worth collecting, and why. We can then use this to choose or generate codes to do the testing. I apologize for being lazy. This may have already been laid out in the original ParkBench document, but I never worried about the low level tests before and don't have a copy of the document in front of me. 
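For what a "simultaneous send/receive stress test" might look like in its simplest form, here is one possible sketch (purely illustrative, not a proposed PARKBENCH code; the ring pattern, the 64 KB message size and the repetition count are arbitrary choices). Every PE exchanges a message with its ring neighbours at the same time, which is roughly the pattern of a synchronous shift operation; comparing the per-PE time under full load with the same exchange run on a single pair gives a crude measure of contention.

/* Sketch: all PEs exchange messages with ring neighbours simultaneously. */
#include <stdio.h>
#include <mpi.h>

#define NBYTES 65536
#define NREP   100

int main(int argc, char **argv)
{
    int rank, size, right, left, i;
    static char sbuf[NBYTES], rbuf[NBYTES];
    double t0, t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    right = (rank + 1) % size;
    left  = (rank + size - 1) % size;

    MPI_Barrier(MPI_COMM_WORLD);       /* start everyone together */
    t0 = MPI_Wtime();
    for (i = 0; i < NREP; i++)
        MPI_Sendrecv(sbuf, NBYTES, MPI_BYTE, right, 0,
                     rbuf, NBYTES, MPI_BYTE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    t = (MPI_Wtime() - t0) / NREP;

    /* compare with the same exchange measured between a single pair of PEs
       to gauge how much the loaded network slows each exchange down */
    printf("rank %d: %.3e s per %d-byte exchange with all PEs active\n",
           rank, t, NBYTES);
    MPI_Finalize();
    return 0;
}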
Pat Worley From owner-parkbench-comm@CS.UTK.EDU Fri Jan 16 13:45:53 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA15447; Fri, 16 Jan 1998 13:45:52 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA29375; Fri, 16 Jan 1998 13:15:58 -0500 (EST) Received: from c3serve.c3.lanl.gov (root@c3serve-f0.c3.lanl.gov [128.165.20.100]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA29368; Fri, 16 Jan 1998 13:15:55 -0500 (EST) Received: from risc.c3.lanl.gov (risc.c3.lanl.gov [128.165.21.76]) by c3serve.c3.lanl.gov (8.8.5/1995112301) with ESMTP id LAA04436 for ; Fri, 16 Jan 1998 11:16:08 -0700 (MST) Received: from localhost (hoisie@localhost) by risc.c3.lanl.gov (950413.SGI.8.6.12/c93112801) with SMTP id LAA13115 for ; Fri, 16 Jan 1998 11:14:30 -0700 Date: Fri, 16 Jan 1998 11:14:30 -0700 (MST) From: Adolfy Hoisie To: parkbench-comm@CS.UTK.EDU Subject: Low Level Benchmarks Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Just to amplify some of the numerous excellent points made by Pat and Charles and Rolf, the emphasis of the Parkbench group, as I see it, should be on defining the methodology for benchmarking at this level. A string of numbers says very little about machine performance in absence of a solid, scientifcally defined underlying base for the programs utilized for benchmarking. COMMS is obsolete in methodology, coding and generation and analysis of results. As such, I have used it quite some time ago only to reach the conclusions above. Instead, I always chose to write my own benchmarking programs in order to extract meaningful data for the applications I was working on. I would like to see the debate heading towards what is it that we need to measure in a suite of general use that is applicable to machines of interest. For example, very little or no attention is being paid to benchmarking DSM architectures, where quite a few architectural parameters become harder to define and subtler to interpret. Including, but not limited to, message passing characterization on these architectures. Adolfy ====================================================================== Adolfy Hoisie \ Los Alamos National Laboratory \Scientific Computing, CIC-19, MS B256 hoisie@lanl.gov \ Los Alamos, NM 87545 USA \ Phone: 505-667-5216 http://www.c3.lanl.gov/~hoisie/hoisie.html FAX: 505-667-1126 From owner-parkbench-comm@CS.UTK.EDU Sun Jan 18 07:38:42 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA20627; Sun, 18 Jan 1998 07:38:42 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA21662; Sun, 18 Jan 1998 07:28:22 -0500 (EST) Received: from post.mail.demon.net (post-10.mail.demon.net [194.217.242.154]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA21655; Sun, 18 Jan 1998 07:28:20 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa1002926; 18 Jan 98 12:25 GMT Message-ID: Date: Sun, 18 Jan 1998 12:24:20 +0000 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Low Level Benchmarks MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a To: the low-level discussion group From: Roger I comment below on recent emailings on this topic which arrived on the 16 Jan 1998. Pat Worley writes: >2) It may be time to revisit the goals of the Low Level suite. There ar > are two obvious extremes. 
>
> a) Determine some (hopefully representative) metrics of point-to-point
>    communication performance, concentrating on making the measurements
> SNIP
> In this situation, a two (or more) parameter model fit to the
> data can be useful, if only as a shorthand for the raw data,
> but the model should not be expected to explain the data.

This is of course what COMMS1 sets out to do. But please, when judging this point, use the New COMMS1 revised code that DOES give much more sensible answers in difficult cases. Please do not base your opinions on results from the Original COMMS1 code that is still unfortunately being issued by Parkbench. Instructions for getting the new code were given in my email to this group on 12 Jan 1998.

> (The two parameter models are very accurate for some of the
> previous generation of homogeneous message-passing platforms.)

It is nice to have confirmation of this from an independent source. In addition, the 3-parameter model is available in New COMMS1 for cases where the 2-para fails.

> In case my sympathies are not clear, I prefer to revisit and fix
> the current suite, "dumbing it down", if only in presentation,
> making it clear what it does and does not measure.

Again this was my objective in writing the New COMMS1 as a minimum fix to the existing Original COMMS1. However I don't think I would call this "Dumbing Down". In fact New COMMS1 is a "Smartening UP" of the benchmark because it provides a 3-parameter fit for those cases for which the 2-para fit fails. It also reports the key spot values of "time for shortest message" (which Charles and Rolf want to call the Latency) and bandwidth for longest message (this could equally well be the maximum measured bandwidth). It also compares the fitted values with measured values at these key points. The fit formulae are also given in the output for completeness.

Please note that COMMS1 has always reported ALL the measured lengths and times in the output file as the basic data, and ALL spot bandwidths were printed to the screen as measured, and could be captured in a file if required. In New COMMS1 the spot bandwidths are more conveniently included in the standard output file, as they should have been in the first place. Unfortunately the above additions make the new output file more complex (which I am not happy about). An example of New COMMS1 output is attached at the end of this email.

>PPS. - Of course the real reason for using ping-pong is the difficulty
>       in measuring the time for one-way messaging. I was not aware
>       that this was a solved problem, at least at the MPI or PVM
>       level. Perhaps system instrumentation can answer it, but I
>       didn't know that portable measurement codes could be guaranteed
>       to do so across the different platforms.

Exactly so.

*******************************

Rolf Hempel writes:

>of message-passing. If Charles knows a way to measure single messages,
>I would like to learn about it.

Me too.

>In most other points I agree with Charles. I'm strongly convinced that
>the COMMS* routines are obsolete and should be replaced with something
>reasonable. In particular, the current routines are far too complicated
>to use, and give completely meaningless results. Therefore, I think one

Please base your judgement on the results from New COMMS1, which has a much more satisfactory fitting procedure (see the examples in the PICT tool mentioned below). I believe that the revised program New COMMS1 gives reasonable results and is not obsolete.

>README file. What is missing is a precise definition of the underlying
>measuring methodology.

In contrast, the methodology of the COMMS1 curve fitting is given in the Parkbench Report and in detail in my book "The Science of Computer Benchmarking", see:

http://www.siam.org/catalog/mcc07/hockney.htm

>I strongly prefer the output of timing tables (perhaps translated in
>good graphical representations) over crude parametrizations like the
>ones in the COMMS* benchmarks. Those can only frustrate the experts
>and confuse all other people.

You seem to have failed to notice that both the Original COMMS1 and the New COMMS1 report the timing table as the FIRST part of their output files. Further, a good graphical representation is available using the database tool from Southampton and my own PICT tool (see below).

The COMMS1 fitting procedure is not crude. On the contrary, it uses least-squares fitting of a performance model that is quite satisfactory for a lot of data. In minimising relative rather than absolute error, New COMMS1 spreads the error in a much more satisfactory way and allows the fitting to be used over a much longer range of message lengths. Furthermore, where the 2-parameter model is unsuitable, New COMMS1 provides a 3-parameter model which fits the Cray T3E (Charles's data of 17 Dec 96) very well. I don't think one can call all this crude.

To see how good the 2- and 3-parameter fits produced by New COMMS1 are to recent data, check out the examples on my Parkbench Interactive Curve Fitting Tool (PICT) at:

http://www.minnow.demon.co.uk/pict/source/pict2a.html

For the most part these show that 2 parameters fit the data surprisingly well. The parameters are not meaningless and useless, but often a rather good summary of the measurements. The 3-parameter fit is described quite fully in my talk of 11 Sep 1997. I have finally written this up with pretty pictures for the PEMCS Web Journal. Look at:

http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/talks/Roger-Hockney/perfprof1.html

In truth we need to see a lot more data before judging the usefulness of parametric fitting. That is why I would like to look at your NEC results. These need not be the timings from COMMS1, but any pingpong measurements that you regard as "good". Please do not base your opinion on the results produced by the Original COMMS1 which is presently in the Parkbench suite. This will only produce satisfactory results for message lengths up to about 4*10^4. When used outside this range it may produce useless numbers.

>messages are multiples of 4 or 8 bytes, I support Charles' proposal to
>measure the time for an 8 byte message and call it the latency.

I am STRONGLY opposed to this. Latency is an ambiguous term that has different meanings to different people. If we wish to report the time for an 8-byte message we should call it what it is, no more no less, eg:

t(n=8B) = 45.6 us

To call this latency only leads to confusion and senseless misunderstanding and argument.
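For readers who want to see what "minimising relative rather than absolute error" amounts to for the linear-time model t(n) = t0*(1 + n/nhalf), here is a small self-contained sketch of one way such a fit can be done in closed form. It is not the New COMMS1 code; it is simply weighted linear least squares with weights 1/t^2, applied to a handful of the (length, time) points from the example output attached below.

/* Sketch: two-parameter linear-time fit  t(n) = t0 + n/rinf  (nhalf = rinf*t0),
   minimising the sum of squared RELATIVE errors in time.  This is weighted
   linear least squares with weights 1/t_i^2, solved via 2x2 normal equations.
   Illustrative only -- not the New COMMS1 implementation. */
#include <stdio.h>

void fit2(const double *n, const double *t, int m,
          double *t0, double *rinf, double *nhalf)
{
    double sw = 0, swn = 0, swnn = 0, swt = 0, swnt = 0;
    int i;
    for (i = 0; i < m; i++) {
        double w = 1.0 / (t[i] * t[i]);           /* relative-error weighting */
        sw   += w;
        swn  += w * n[i];
        swnn += w * n[i] * n[i];
        swt  += w * t[i];
        swnt += w * n[i] * t[i];
    }
    /* solve [sw swn; swn swnn][a b]' = [swt swnt]'  for  t = a + b*n */
    {
        double det = sw * swnn - swn * swn;
        double a = (swt * swnn - swn * swnt) / det;   /* a = t0     */
        double b = (sw * swnt - swn * swt) / det;     /* b = 1/rinf */
        *t0 = a;
        *rinf = 1.0 / b;
        *nhalf = a / b;                               /* nhalf = rinf * t0 */
    }
}

int main(void)
{
    /* a few (length, time) points taken from the example output below */
    double n[] = { 8.0, 1.0e2, 1.0e3, 1.0e4, 1.0e5, 1.0e6 };
    double t[] = { 1.260e-5, 1.802e-5, 3.253e-5, 8.579e-5, 4.534e-4, 3.276e-3 };
    double t0, rinf, nhalf;
    fit2(n, t, 6, &t0, &rinf, &nhalf);
    printf("t0 = %.3e s, rinf = %.3e B/s, nhalf = %.3e B\n", t0, rinf, nhalf);
    return 0;
}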
****************************************************************
EXAMPLE NEW COMMS1 OUTPUT FILE:
T3E Results from Grassl's 17 Dec 1996 email to Parkbench committee
****************************************************************

      =================================================
      ===                                           ===
      ===   GENESIS / ParkBench Parallel Benchmarks ===
      ===                                           ===
      ===                 comms1_mpi                ===
      ===                                           ===
      =================================================

 Pingpong Benchmark:
 -------------------
 Measures time to send a message between two nodes on a multi-processor
 computer (MPP or network) as a function of the message length.
 It also characterises the time and corresponding bandwidth by both
 two and three performance parameters.

 Original code by Roger Hockney (1986/7), modified by Ian Glendinning
 and Ade Miller (1993/4), and by Roger Hockney and Ron Sercely (1997).
-----------------------------------------------------------------------
 You are running the VERSION dated: RWH-12-Mar-1997
-----------------------------------------------------------------------
 The measurement time requested for each test case was 1.00E+00 seconds.
 No distinction was made between long and short messages.
 Zero length messages were not used in least squares fitting.

-----------------------------------------------
(1) PRIMARY MEASUREMENTS (BW=Bandwidth, B=Byte)
-----------------------------------------------------------------------
        SPOT MEASURED VALUES          |   EVOLVING TWO-PARAMETER FIT
--------------------------------------|--------------------------------
POINT LENGTH(n)   TIME(t)   BW(r=n/t) |   rinf       nhalf     RMS rel
         B           s         B/s    |   B/s          B       error %
*SPOT1*-------------------------------|--------------------------------
    1 8.000E+00  1.260E-05  6.349E+05 | 0.000E+00  0.000E+00  0.000E+00
    2 1.000E+01  1.348E-05  7.418E+05 | 2.273E+06  2.064E+01 -1.255E-06
    3 2.000E+01  1.380E-05  1.449E+06 | 1.237E+07  1.516E+02  2.277E+00
    4 3.000E+01  1.590E-05  1.887E+06 | 7.798E+06  9.157E+01  2.762E+00
    5 4.000E+01  1.561E-05  2.562E+06 | 1.020E+07  1.237E+02  3.267E+00
    6 5.000E+01  1.648E-05  3.034E+06 | 1.115E+07  1.366E+02  3.126E+00
    7 6.000E+01  1.618E-05  3.708E+06 | 1.364E+07  1.711E+02  3.796E+00
    8 7.000E+01  1.773E-05  3.948E+06 | 1.356E+07  1.699E+02  3.552E+00
    9 8.000E+01  1.694E-05  4.723E+06 | 1.562E+07  1.992E+02  4.072E+00
   10 9.000E+01  1.793E-05  5.020E+06 | 1.634E+07  2.095E+02  3.954E+00
   11 1.000E+02  1.802E-05  5.549E+06 | 1.741E+07  2.249E+02  3.983E+00
   12 1.100E+02  1.889E-05  5.823E+06 | 1.776E+07  2.300E+02  3.841E+00
   13 1.200E+02  1.780E-05  6.742E+06 | 1.983E+07  2.607E+02  4.483E+00
   14 1.300E+02  1.917E-05  6.781E+06 | 2.034E+07  2.682E+02  4.368E+00
   15 1.400E+02  1.902E-05  7.361E+06 | 2.131E+07  2.828E+02  4.405E+00
   16 1.500E+02  1.941E-05  7.728E+06 | 2.209E+07  2.946E+02  4.389E+00
   17 1.600E+02  1.896E-05  8.439E+06 | 2.353E+07  3.167E+02  4.644E+00
   18 1.700E+02  2.057E-05  8.264E+06 | 2.362E+07  3.179E+02  4.514E+00
   19 1.800E+02  1.911E-05  9.419E+06 | 2.526E+07  3.434E+02  4.887E+00
   20 1.900E+02  2.125E-05  8.941E+06 | 2.517E+07  3.420E+02  4.765E+00
   21 2.000E+02  1.894E-05  1.056E+07 | 2.730E+07  3.754E+02  5.382E+00
   22 2.100E+02  2.091E-05  1.004E+07 | 2.767E+07  3.812E+02  5.282E+00
   23 2.200E+02  2.011E-05  1.094E+07 | 2.885E+07  3.998E+02  5.393E+00
   24 2.300E+02  2.136E-05  1.077E+07 | 2.915E+07  4.047E+02  5.296E+00
   25 2.400E+02  2.015E-05  1.191E+07 | 3.053E+07  4.268E+02  5.496E+00
   26 2.500E+02  2.228E-05  1.122E+07 | 3.047E+07  4.258E+02  5.390E+00
   27 2.600E+02  2.144E-05  1.213E+07 | 3.110E+07  4.360E+02  5.365E+00
   28 2.700E+02  2.212E-05  1.221E+07 | 3.142E+07  4.412E+02  5.290E+00
   29 2.800E+02  2.111E-05  1.326E+07 | 3.249E+07  4.588E+02  5.417E+00
   30 2.900E+02  2.259E-05  1.284E+07 | 3.272E+07  4.626E+02  5.337E+00
   31 3.000E+02  2.284E-05  1.313E+07 | 3.294E+07  4.663E+02  5.262E+00
   32 4.000E+02  2.256E-05  1.773E+07 | 3.550E+07  5.098E+02  5.818E+00
   33 6.000E+02  2.549E-05  2.354E+07 | 4.022E+07  5.921E+02  6.632E+00
   34 8.000E+02  2.817E-05  2.840E+07 | 4.567E+07  6.883E+02  7.296E+00
   35 1.000E+03  3.253E-05  3.074E+07 | 4.887E+07  7.452E+02  7.451E+00
   36 2.000E+03  4.496E-05  4.448E+07 | 5.553E+07  8.657E+02  8.013E+00
   37 5.000E+03  6.135E-05  8.150E+07 | 7.983E+07  1.312E+03  1.090E+01
   38 1.000E+04  8.579E-05  1.166E+08 | 1.070E+08  1.814E+03  1.284E+01
   39 2.000E+04  1.294E-04  1.546E+08 | 1.339E+08  2.315E+03  1.426E+01
   40 3.000E+04  1.722E-04  1.742E+08 | 1.523E+08  2.659E+03  1.493E+01
   41 4.000E+04  2.161E-04  1.851E+08 | 1.647E+08  2.890E+03  1.524E+01
   42 5.000E+04  2.594E-04  1.928E+08 | 1.735E+08  3.056E+03  1.539E+01
   43 1.000E+05  4.534E-04  2.206E+08 | 1.847E+08  3.266E+03  1.575E+01
   44 2.000E+05  7.784E-04  2.569E+08 | 1.996E+08  3.548E+03  1.648E+01
   45 3.000E+05  1.110E-03  2.703E+08 | 2.123E+08  3.787E+03  1.701E+01
   46 5.000E+05  1.697E-03  2.946E+08 | 2.256E+08  4.039E+03  1.762E+01
   47 1.000E+06  3.276E-03  3.053E+08 | 2.370E+08  4.255E+03  1.806E+01
   48 2.000E+06  6.373E-03  3.138E+08 | 2.468E+08  4.440E+03  1.839E+01
   49 3.000E+06  9.489E-03  3.162E+08 | 2.547E+08  4.590E+03  1.858E+01
   50 5.000E+06  1.569E-02  3.187E+08 | 2.612E+08  4.714E+03  1.870E+01
   51 1.000E+07  3.134E-02  3.191E+08 | 2.666E+08  4.816E+03  1.874E+01
*SPOT2*----------------------------------------------------------------

                     ------------------------
                     COMMS1: Message Pingpong
                     ------------------------
                          Result Summary
                          --------------

-------------------
(2) KEY SPOT VALUES
-------------------
                                 -----------------------
*KEY1* Shortest n = 8.000E+00 B, | t = 1.260E-05 s     |   ******
                                 |                     |   ******
*KEY2* Longest  n = 1.000E+07 B, | r = 3.191E+08 B/s   |   ******
                                 -----------------------
-------------------------------------------------------------------------

------------------------------------------
(3) BEST TWO-PARAMETER LINEAR-(t vs n) FIT
------------------------------------------
(Minimises sum of squares of relative error at all points being fitted)

  Root Mean Square (RMS) Relative Error in time = 18.74 %
  Maximum Relative Error in time                = 43.61 % at POINT = 1

This is a fit to ALL points. Even though different expressions are given
for short and long messages, they are algebraically identical and either
may be used for any message length in the full range.

--------------
Short Messages
--------------
Best expressions to use if nhalf > 0 and n <= nhalf = 4.816E+03 B
  Bandwidth fitted to: r = pi0*n/(1+n/nhalf)
  Time fitted to:      t = t0*(1+n/nhalf)
          --------------------------------------------
*LIN1*    | pi0 = 5.536E+04 Hz,  nhalf = 4.816E+03 B |   ******
          |                                          |   ******
*LIN2*    | t0 = 1/pi0 = 1.807E-05 s                 |   ******
          --------------------------------------------
Spot comparison at POINT = 1, n = 8.000E+00 B
  t(fit) = 1.810E-05 s, t(measured) = 1.260E-05 s,
  relative error in time = 43.6 %

-------------
Long Messages
-------------
Best expressions to use if n > nhalf = 4.816E+03 B, or nhalf=0
  Bandwidth fitted to: r = rinf/(1+nhalf/n)
  Time fitted to:      t = (n+nhalf)/rinf
          -----------------------------------------------
*LIN3*    | rinf = 2.666E+08 B/s,  nhalf = 4.816E+03 B  |   ******
          -----------------------------------------------
Spot comparison at POINT = 51, n = 1.000E+07 B
  r(fit) = 2.665E+08 B/s, r(measured) = 3.191E+08 B/s,
  relative error in B/W = -16.5 %
-------------------------------------------------------------------------

---------------------------------------
(4) BEST 3-PARAMETER VARIABLE-POWER FIT
---------------------------------------
  Root Mean Square (RMS) Relative Error in B/W = 6.89 %
  Maximum Relative Error in B/W                = -13.41 % at POINT = 39

This fit is to ALL data points

  Bandwidth is fitted to: rvp = rivp/(1+(navp/n)^gamvp)^(1/gamvp)
  Time is fitted to:      tvp = t0vp*(1+(n/navp)^gamvp)^(1/gamvp)
  where t0vp = navp/rivp and navp = t0vp*rivp

When gamvp = 1.0, this form reduces to the linear-time form (3) above,
navp becomes nhalf, and rivp becomes rinf.

The three independent parameters are (t0vp is derived):
        -----------------------------------------------------------------
*VPWR1* | rivp = 3.475E+08 B/s, navp = 3.670E+03 B, gamvp = 4.190E-01    |
        |                                                                |
*VPWR2* | t0vp = navp/rivp = 1.056E-05 s                                 |
        -----------------------------------------------------------------

This function is guaranteed to fit the first and last measured values of
time and bandwidth. It also fits the (interpolated) time and bandwidth
at n = navp.

--
Roger Hockney.  Checkout my new Web page at URL http://www.minnow.demon.co.uk
University of   and link to my new book: "The Science of Computer Benchmarking"
Westminster UK  suggestions welcome. Know any fish movies or suitable links?
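As a reading aid for the example output above, the few lines below evaluate the fitted expressions from sections (3) and (4) at the two KEY SPOT message lengths, using the parameter values printed there. This is not part of the benchmark itself; it only shows how the printed formulae and parameters are meant to be used.

/* Sketch: evaluating the fitted curves printed in the example output above,
   with the T3E parameter values reported there.  Illustrative only. */
#include <stdio.h>
#include <math.h>

/* two-parameter linear-time fit:  t = t0*(1 + n/nhalf),  r = n/t */
static double t_lin(double n, double t0, double nhalf)
{
    return t0 * (1.0 + n / nhalf);
}

/* three-parameter variable-power fit:  t = t0vp*(1 + (n/navp)^gamvp)^(1/gamvp) */
static double t_vp(double n, double rivp, double navp, double gamvp)
{
    double t0vp = navp / rivp;
    return t0vp * pow(1.0 + pow(n / navp, gamvp), 1.0 / gamvp);
}

int main(void)
{
    /* parameters copied from sections (3) and (4) of the output file above */
    double t0 = 1.807e-5, nhalf = 4.816e3;                   /* 2-parameter */
    double rivp = 3.475e8, navp = 3.670e3, gamvp = 4.190e-1; /* 3-parameter */
    double lengths[] = { 8.0, 1.0e7 };          /* the two KEY SPOT lengths */
    int i;

    for (i = 0; i < 2; i++) {
        double n = lengths[i];
        double t2 = t_lin(n, t0, nhalf);
        double t3 = t_vp(n, rivp, navp, gamvp);
        printf("n = %.3e B:  2-para t = %.3e s (r = %.3e B/s),"
               "  3-para t = %.3e s (r = %.3e B/s)\n",
               n, t2, n / t2, t3, n / t3);
    }
    /* the 3-parameter curve is constructed to pass through the first and last
       measured points, so its values at n = 8 B and n = 1e7 B should agree
       closely with the measured 1.260E-05 s and 3.191E+08 B/s above */
    return 0;
}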
From owner-parkbench-comm@CS.UTK.EDU Mon Jan 19 13:10:51 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA16306; Mon, 19 Jan 1998 13:10:50 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA21116; Mon, 19 Jan 1998 12:53:17 -0500 (EST) Received: from haze.vcpc.univie.ac.at (haze.vcpc.univie.ac.at [131.130.186.138]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id MAA21105; Mon, 19 Jan 1998 12:53:14 -0500 (EST) Received: (from smap@localhost) by haze.vcpc.univie.ac.at (8.8.6/8.8.6) id SAA21164 for ; Mon, 19 Jan 1998 18:53:11 +0100 (MET) From: Ian Glendinning Received: from fidelio(131.130.186.155) by haze via smap (V2.0beta) id xma021162; Mon, 19 Jan 98 18:52:48 +0100 Received: (from ian@localhost) by fidelio.vcpc.univie.ac.at (8.7.5/8.7.3) id SAA03411 for parkbench-comm@CS.UTK.EDU; Mon, 19 Jan 1998 18:52:48 +0100 (MET) Date: Mon, 19 Jan 1998 18:52:48 +0100 (MET) Message-Id: <199801191752.SAA03411@fidelio.vcpc.univie.ac.at> To: parkbench-comm@CS.UTK.EDU Subject: Re: Low Level benchmark errors and differences X-Sun-Charset: US-ASCII Dear parkbench-comm subscriber, I have been following the discussions regarding the low-level ParkBench benchmarks over the last couple of weeks with intertest, but so far I have been content to keep my head below the parapet, as most of the things I would have said have been said by others anyway. However, there is one thing that I would like to point out. On Wed Jan 7 22:56:04 1998, Charles Grassl wrote: > The Low Level programs are obsolete and need to be replaced. I agree that the existing code could use some improvement, though most of the discussion seems to have revolved around the version in the "current release", which as Roger has pointed out several times is very old, and he has written an improved version. Have people tried that version out? > I have > written seven simple programs, with MPI and PVM versions, and offer them > as a replacement for the Low Level suite. I have tried a version of Charles's "comms1" code that he sent me, on our CS-2 system, and found that it reported approximately half the expected asymptotic bandwidth, so this code is not without its problems either! By "expected", I mean the bandwidth reported by various versions of (the ParkBench version of) COMMS1 over the years, coded using first PARMACS, then PVM, and more recently MPI, as a message-passing library. This value corresponds closely to what one would expect for the peak performance, given the performance figures for the underlying hardware. For an explanation of what I think is happening, please read on... On Thu Jan 15 20:20:36 1998, Charles Grassl wrote: > This recorded > time is for a round trip message, and is not precisely the time for > two messages. Half the round trip message passing time, as reported in > the PMB tests, is not the time for a single message and should not be > reported and such. This same erroneous technique is used in the COMMS1 > and COMMS2 two benchmarks. (Is Parkbench is responsible for propagating > this incorrect methodology.) As Pat Worley and Rolf Hempel pointed out, the ping-pong is used because of the difficulty in measuring the time for one-way messages, and I believe that this is illustrated in this instance, as it seems that Charles's attempt to time one-way messages has caused the unexpectedly low asymptotic bandwidth measurement... 
Charles's code executes a send, and then as fast as possible executes another one, without any concern as to whether the data has left the sending processor, or has arrived at the receiving processor, and what I think is happening is that his code is queuing requests to send, before the previous messages have left the sending processor, forcing the MPI implementation to buffer them, at the cost of an extra copy operation, which would not otherwise have been necessary, thus reducing the effective bandwidth! > With respect to low level testing, the round trip exchange of messages, > as per PingPing and PingPong in PMB or COMMS1 and COMMS2, is not > characteristic of the lowest level of communication. This pattern > is actually rather rare in programming practice. It is more common > for tasks to send single messages and/or to receive single messages. It seems to me that it is not very common programming practice to send a sequence of messages to the same destination in rapid fire, without having either done some intermediate processing, or waiting to get some response back. If you were trying to code efficiently, you would doubtless merge the messages into one, and send the data all together in one message, if it was all available already, which it must have been if you were able to execute the sends so rapidly one after another! > The single message passing is a distinctly different case from that > of round trip tests. We should be worried that the round trip testing > might introduce artifacts not characteristic of actual (low level) usage. > We need a better test of basic bandwidth and latency in order to measure > and characterize message passing performance. Well, it seems that in this case, the attempt to measure the single message passing case has introduced an artifact. To an extent it depends what you are trying to measure of course, but it has always been my understanding that the COMMS1 benchmark was trying to measure the peak performance that you could reasonably expect to obtain using a portable message-passing library interface, which, for a good implementation of MPI, ought to come close to the theoretical hardware limit, which is precisely what the existing COMMS1 ping-pong code does on our system. I would therefore argue in favour of retaining the ping-pong technique for obtaining timings. Ian -- Ian Glendinning European Centre for Parallel Computing at Vienna (VCPC) ian@vcpc.univie.ac.at Liechtensteinstr. 
22, A-1090 Vienna, Austria Tel: +43 1 310 939612 WWW: http://www.vcpc.univie.ac.at/~ian/ From owner-parkbench-comm@CS.UTK.EDU Tue Jan 20 08:50:06 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA06977; Tue, 20 Jan 1998 08:50:06 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id IAA01200; Tue, 20 Jan 1998 08:28:44 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id IAA01193; Tue, 20 Jan 1998 08:28:39 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id OAA12945; Tue, 20 Jan 1998 14:19:53 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id OAA09828; Tue, 20 Jan 1998 14:19:52 +0100 Date: Tue, 20 Jan 1998 14:19:52 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801201319.OAA09828@sgi7.ccrl-nece.technopark.gmd.de> To: cmg@cray.com Subject: Re: Low Level Benchmarks Cc: hempel@ccrl-nece.technopark.gmd.de, parkbench-comm@CS.UTK.EDU Reply-To: hempel@ccrl-nece.technopark.gmd.de Dear Charles, thank you for your note, and for sending me your simple test program. One thing I like about the program is that it's easy to install and run; no complicated makefiles, include files and sophisticated driver software. We had the code running in five minutes. In many points I agree with Ian Glendinning who already reported about his tests with your code on the Meiko system. When we ran the test on our SX-4, however, the results were very similar to ping-pong figures. With the particular MPI version I used for my measurements, the classical ping-pong test as implemented in MPPTEST of the MPICH distribution gives about 4 usec less time in latency and about 4% higher throughput than your test program. The reason for the increase in latency as reported by your code is fully explained by the fact that you forgot to correct for the time spent in the timer routine (see below). So, we would have no problem with adopting a corrected version of your code as the basic communication test. However, I think that this is not the point. The question we have to answer is what communication pattern we want to measure with our benchmark code. In my view the ping-pong technique, with all its problems, is much closer to a typical application than your program. Of course, the situation "receiver already waiting" implemented by the ping-pong, is a special case which will not be found for all messages in an application. In this situation, the MPI implementation can use a more efficient protocol, which will lead to a best case measurement of latency and throughput. I agree with Ian that the rapid succession of messages in one direction is very untypical. Only a stupid programmer would do it this way in an application, and not aggregate the messages to a larger one. What you really measure with this benchmark is how well the MPI library can deal with this kind of congestion. As you see, our library is not affected at all by this, but, as Ian reported, the Meiko shows a much different behaviour. In a sense, you measure a kind of worst case scenario, as opposed to the best case one in the ping-pong. One technical detail of your program: You time every send operation separately, and then sum up the individual times. 
This requires a quite accurate clock. I would expect that some machines could run into trouble with this approach. Also, you don't correct for the time needed for calling the timer twice for every send/receive. On machines with highly optimized MPI libraries this is not at all negligible. On our machine two timer calls require as much time as 25% of a complete send-receive sequence! As a summary, your basic communication program does not convince me as a better alternative to ping-pong programs such as MPPTEST. The only thing I really like about it is its simplicity. Best regards, Rolf ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Wed Jan 21 11:22:06 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id LAA27346; Wed, 21 Jan 1998 11:22:06 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA20207; Wed, 21 Jan 1998 10:55:58 -0500 (EST) Received: from sun1.ccrl-nece.technopark.gmd.de (sun1.ccrl-nece.technopark.gmd.de [193.175.160.67]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id KAA20176; Wed, 21 Jan 1998 10:55:44 -0500 (EST) Received: from sgi7.ccrl-nece.technopark.gmd.de (sgi7.ccrl-nece.technopark.gmd.de [193.175.160.89]) by sun1.ccrl-nece.technopark.gmd.de (8.7/3.4W296021412) with SMTP id QAA01123; Wed, 21 Jan 1998 16:50:13 +0100 (MET) Received: (from hempel@localhost) by sgi7.ccrl-nece.technopark.gmd.de (950413.SGI.8.6.12/950213.SGI.AUTOCF) id QAA11663; Wed, 21 Jan 1998 16:54:00 +0100 Date: Wed, 21 Jan 1998 16:54:00 +0100 From: hempel@ccrl-nece.technopark.gmd.de (Rolf Hempel) Message-Id: <199801211554.QAA11663@sgi7.ccrl-nece.technopark.gmd.de> To: parkbench-comm@CS.UTK.EDU Subject: NEW COMMS1 benchmark Cc: eckhard@ess.nec.de, tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, maciej@ccrl-nece.technopark.gmd.de, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, springstubbe@gmd.de, hempel@ccrl-nece.technopark.gmd.de Reply-To: hempel@ccrl-nece.technopark.gmd.de In the recent discussion on the low-level benchmarks, Roger repeatedly asked us to base our evaluation of the COMMS1 benchmark on his new version, and not on the one which is still in the official PARKBENCH distribution. At NEC we now have repeated the tests on the NEC SX-4 machine, and I would like to make a few comments on the results. First of all, the raw data as reported by the table Primary Measurements more closely match the figures given by other ping-pong tests than the older version. The correction for oeverheads, however, is still problematic for the following reasons: 1. In every loop iteration, the returned message is compared with the message sent. If one is concerned with the correctnes of the MPI library, this could be checked in a separate loop before the timing loop. The check inside the timing loop, done only by the sender process, delays the sender and thus makes sure that the receiver is already waiting in the receive for the next message. This aggravates the "Receiver ready" situation which I discussed in an earlier mail. 2. The authors take great care in correcting for the overhead introduced by the do loop. This is done by the loop over the dummy routine before the main loop. 
On the other hand, the correction for the check routine call introduces an overhead of one timer call which is NOT taken into account. (Here I assume that the internal clock is read out at a fixed point in time during every call of DWALLTIME00().) I would argue that on most machines the loop overhead per iteration is negligible as compared to a function call. On our machine, MPI_Wtime calls a C function which in turn calls an assembly language routine. The time needed for this is about 10% of our message latency! Another problem in the measuring procedure is that the test message contains a single constant, repeated as many times as there are words in the message. Did the authors never think about the possibility of data compression in interconnect systems? I would not be surprised to see bandwidths of Terabytes/sec on some Ethernet connection between workstations. Apart from this, the raw data are much better now than they were before, and when the above points were fixed, the resulting table would be satisfactory. The interesting question is, however, how much added value we get from the parameter fitting. In my earlier note, I called the fitting procedure in the earlier COMMS1 benchmark "crude". I cannot find a more appropriate word for a model which in cases deviates from the measured values by more than 100%. So, how much improvement do we get from the revised COMMS1 version? As Roger said himself, the increase in modeling sophistication led to a more complicated output file. Results are now given for two models, the first one using two parameters, and the second one three. As could be expected, the two-parameter model does not work better than in the previous version. For our machine, latency is over-estimated by 18.9 percent, and the bandwidth at the last data point is off by 27%. Since a linear model is just too simple to be applied to modern message-passing libraries, I wonder why these results are still in the output file at all. The three-parameter fit is better than the two-parameter one. The major advantage is that it exactly matches the first data point in time, and the last data point in bandwidth. That is what people would look at, if there were no parameter fitting at all. So, the reported latency is the time measured for a zero-byte message, and is as good or as bad as this measurement. For our MPI library, the RMS fitting error for the whole data set is 14.04%, and the maximum relative error is 33.4%. We now can discuss the meaning of the word "crude" (and I apologize if as a non-native speaker I don't use the right word here), but I would at least call it unsatisfactory. Given those differences between model and measurements, I was not surprised to see the projected RINFINITY as being too high. The 7.65 GBytes/s are well beyond a memcpy operation in our shared memory, and measured rates never exceeded 7.1 GBytes/s. To summarize, in my opinion there is no added value given by the parameter fitting. The latency value is the first entry in the raw data table, and the asymptotic bandwidth is easy to figure out by just looking at the bandwidths as measured for very long messages. As explained above, the extrapolation by the parametrized model does not add any precision as compared with a guess based on the long-message table entries. For message lengths in between, what does a model help me if it deviates from the measurements by up to 33%? So, my conclusion would be to drop the whole parameter fitting from the PARKBENCH low-level routines. 
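On the timer-call overhead raised in point 2 above: the per-call cost of the clock routine is easy to estimate directly, and can then be subtracted from any timing that brackets a single operation with two calls. A minimal sketch (illustrative only; the call count is an arbitrary choice):

/* Sketch: estimating the cost of the timer itself, so that per-message
   timings can be corrected for the two MPI_Wtime calls they contain. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    enum { NCALL = 100000 };
    double t0, t1, dummy = 0.0;
    int i;

    MPI_Init(&argc, &argv);
    t0 = MPI_Wtime();
    for (i = 0; i < NCALL; i++)
        dummy += MPI_Wtime();          /* back-to-back timer calls */
    t1 = MPI_Wtime();
    printf("approx. cost per MPI_Wtime call: %.3e s (tick %.3e s)\n",
           (t1 - t0) / NCALL, MPI_Wtick());
    MPI_Finalize();
    return (int)(dummy < 0.0);         /* keeps dummy from being optimised away */
}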
In a separate mail I will send the COMMS1 benchmark output, as produced with our MPI library, to Roger. I don't want to swamp the whole PARKBENCH forum with the detailed data. Best regards, Rolf ------------------------------------------------------------------------ Rolf Hempel (email: hempel@ccrl-nece.technopark.gmd.de) Senior Research Staff Member C&C Research Laboratories, NEC Europe Ltd., Rathausallee 10, 53757 Sankt Augustin, Germany Tel.: +49 (0) 2241 - 92 52 - 95 Fax: +49 (0) 2241 - 92 52 - 99 From owner-parkbench-comm@CS.UTK.EDU Fri Jan 23 12:24:12 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA07290; Fri, 23 Jan 1998 12:24:11 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA06737; Fri, 23 Jan 1998 12:04:42 -0500 (EST) Received: from post.mail.demon.net (post-10.mail.demon.net [194.217.242.154]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA06686; Fri, 23 Jan 1998 12:04:23 -0500 (EST) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa1003594; 23 Jan 98 16:49 GMT Message-ID: <1GgxMFAgVMy0EwfI@minnow.demon.co.uk> Date: Fri, 23 Jan 1998 16:29:20 +0000 To: hempel@ccrl-nece.technopark.gmd.de Cc: parkbench-comm@CS.UTK.EDU, eckhard@ess.nec.de, tbeckers@ess.nec.de, lonsdale@ccrl-nece.technopark.gmd.de, maciej@ccrl-nece.technopark.gmd.de, ritzdorf@ccrl-nece.technopark.gmd.de, zimmermann@ccrl-nece.technopark.gmd.de, springstubbe@gmd.de From: Roger Hockney Subject: Re: NEW COMMS1 benchmark In-Reply-To: <199801211554.QAA11663@sgi7.ccrl-nece.technopark.gmd.de> MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a To: The Parkbench discussion group From: Roger Hockney First the 3-parameter fit that is produced by New COMMS1 and discussed by Rolf can be found in the html version of this reply at: www.minnow.demon.co.uk/Pbench/emails/hempel1.htm Or by bringing up the PICT tool on your browser at: www.minnow.demon.co.uk/pict/source/pict2a.html Then: (1) select a suitable frame size for the PICT display (2) change the data URL at top from .../data/t3e.res to .../data/sx4.res (3) press the "GET DATA at URL" button, and the data should download. (4) press the 3-PARA button then the APPLY3 button, and the 3-para curve should be drawn. ************************************************************************ Rolf has especialy asked me to point out that the results that he has supplied are for the SX4 using Release 7.2 MPI software which is will soon be replaced by a newer version with significantly better latency and bandwidth. This data does not therefore represent the best that can be achieved on the SX4. ************************************************************************ I now reply to specific points in Rolf Hempel's email to group on 21 Jan 1998. >In the recent discussion on the low-level benchmarks, Roger repeatedly >asked us to base our evaluation of the COMMS1 benchmark on his new >version, and not on the one which is still in the official PARKBENCH >distribution. At NEC we now have repeated the tests on the NEC SX-4 >machine, and I would like to make a few comments on the results. > Thank you, Rolf, for taking the trouble to install New COMMS1 and sending me the results. I discuss the results below. In answer to your other points: >First of all, the raw data as reported by the table Primary Measurements >more closely match the figures given by other ping-pong tests than the >older version. 
The correction for oeverheads, however, is still The two points you raise could easily be incorporated in the code. I was reluctant to tamper with the measurement part of the COMMS1 code because it would introduce systematic differences in the measurements and make comparison with older measurements invalid. But of course this has to be done from time to time. My changes were deliberately kept to a minimum and confined largely to the parameter fitting part which was causing the main problems being reported. >Another problem in the measuring procedure is that the test message >contains a single constant, repeated as many times as there are words in >the message. Did the authors never think about the possibility of >data compression in interconnect systems? I would not be surprised to >see bandwidths of Terabytes/sec on some Ethernet connection between >workstations. Yes I did think about this, but decided I did not know enough about compression to devise a way to prevent it. Compression algorithms are so clever now that this may be impossible to do. Anyway this is not yet a problem, so I suggest we leave it until it becomes one. Perhaps software should get benefit in its performance numbers for the use of compression but then we need something more difficult than a sequence of constants to use as a standard test. > >Apart from this, the raw data are much better now than they were before, >and when the above points were fixed, the resulting table would be >satisfactory. I would have no objection to this. >The interesting question is, however, how much added value >we get from the parameter fitting. In my earlier note, I called the The added value provided in the case of the NEC SX4 results is that the 3-parameter fit (see graph) gives a satisfactory fit to ALL the data. This reduces 112 numbers to 3 numbers and an analytic formula that can be manipulated. This is called "Performance Characterisation" and provides very useful data compression. Furthermore the parameters themselves can be interpreted as characterising various aspects of the shape and asymptotes of the performance curve. In contrast reporting just the first time and last performance value and calling them the Latency and Bandwidth only tells us about these two points. Further the choice of which message lengths to use for this type of definition is entirely arbitrary and open to much argument at both ends. However, New COMMS1 does provide this type of output in the lines marked KEY SPOT VALUES but I deliberately avoided calling them values of Latency and Bandwidth in order to avoid senseless argument. Some people are very interested in the parametric representations, others not. One is not obliged to use or look at the parametric representations, but they are there for those who want them. For those interested just in the Raw data those are reported first in the output file of New COMMS1. >As could be expected, the two-parameter model does not work better than >in the previous version. For our machine, latency is over-estimated >by 18.9 percent, and the bandwidth at the last data point is off by >27%. Since a linear model is just too simple to be applied to modern >message-passing libraries, I wonder why these results are still in the >output file at all. The 2-PARA results are reported just so that one can see that they are unsatisfactory, and that therefore one must lose simplicity and consider a 3-para fit. 
Actually there is a switch that can be set in the comms1.inc file to suppress reporting of output if the errors exceed specified values. Every time I have used this, however, I have tended to rerun with the output on, in order to see just what the 2-para gave. If the 2-para can be accepted it is much preferable to the 3-para because of its simplicity and clearer interpretation of the significance of the parameters. >as bad as this measurement. For our MPI library, the RMS fitting >error for the whole data set is 14.04%, and the maximum relative error >is 33.4%. We now can discuss the meaning of the word "crude" (and I If you look at the graph itself (see above), I think you will find the agreement much more satisfactory than is apparent from the reported errors. You also may have too high an expectation of what parametric fitting can reasonably be expected to provide, especially for data with discontinuities. In my experience agreement in RMS error rarely is better than 7% and anything up to 30% is probably still useful. A maximum error of 30% is not bad at all, and may be due to a single rogue point or an isolated discontinuity. Although error numbers are reported in the output, one really has to look at the graph of all data before drawing conclusions. >Given those differences >between model and measurements, I was not surprised to see the >projected RINFINITY as being too high. The 7.65 GBytes/s are well >beyond a memcpy operation in our shared memory, and measured rates never >exceeded 7.1 GBytes/s. Actually 7.65 differs from 7.1 by 8% which is very good agreement indeed. >To summarize, in my opinion there is no added value given by the >parameter fitting. The latency value is the first entry in the raw >data table, and the asymptotic bandwidth is easy to figure out by just >looking at the bandwidths as measured for very long messages. As Your definitions of Latency and Bandwidth will have to be more precise than the above. What does "by looking at the B/W for very long messages" actually mean. What are "very long messages?". "What message length should the first entry in the Raw data table be for?" ... etc. >explained above, the extrapolation by the parametrized model does not >add any precision as compared with a guess based on the long-message >table entries. Strictly-speaking it is invalid to extrapolate the fitted curve outside the range of measured values. However we will always do this, and in this case the fit predicts the known hardware limit as well as can be reasonably expected. >For message lengths in between, what does a model help >me if it deviates from the measurements by up to 33%? So, my conclusion >would be to drop the whole parameter fitting from the PARKBENCH >low-level routines. I think the graph of the results and the 3-para fit shows remarkably good and useful agreement. But this is a subjective personal opinion. What do others think? Best wishes Roger -- Roger Hockney. Checkout my new Web page at URL http://www.minnow.demon.co.uk University of and link to my new book: "The Science of Computer Benchmarking" Westminster UK suggestions welcome. Know any fish movies or suitable links? 
From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 06:39:21 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA02920; Mon, 26 Jan 1998 06:39:21 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA09063; Mon, 26 Jan 1998 06:22:47 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA09055; Mon, 26 Jan 1998 06:22:36 -0500 (EST) Received: from mordillo (p112.nas1.is3.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA12226; Mon, 26 Jan 98 11:19:15 GMT Date: Mon, 26 Jan 98 10:14:38 GMT From: Mark Baker Subject: Re: Low Level Benchmarks To: Charles Grassl , parkbench-comm@CS.UTK.EDU Cc: solchenbach@pallas.de X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <199801151711.RAA07227@magnet.cray.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Charles, Thanks for your thoughts and experiences with the Pallas PMB codes - I will forward them to the authors... The main points in favour of the PMB codes are that they are in C and potentially produce results for a variety of MPI calls... Obviously if the results they produce are flawed... Regarding new low-level codes I would be in favour of taking up your kind offer of writing a set of codes in C/Fortran. I guess the main problem is getting a concensus with regards methodology and measurements that are used with these codes. Maybe we can decide that a number of actions should be undertaken... 1) It seems clear that no one is 100% happy with the current version of the low-level codes. So, this implies that they need to be replaced !? 2) If we are going to replace the codes we can go down a couple of routes; start from scratch, replace with Roger's new codes or some combination of both... 3) I would be happy to see us start from scratch and create C/Fortran codes where the methodology and design of each can be "hammered out" by discussion first and then implemented (and iterated as necessary). 4) Assuming that we want to go down this route, I suggest we make a starting point of Charles' "suggestions and requirements for the low level benchmark design" - towards the end of this email. I am happy to put these words on the web and update/change them as our dicussions evolve... 5) Charles has offered his services to help write/design/test these new codes - I'm willing to offer my services in a similar fashion. I'm sure that others interested in the low-level codes could contribute something here as well. Overall, it seems clear to me that we have enough energy and manpower to produce a new set low-level codes whose methodology and design is correct and relevant to todays systems... I look forward to your comments... Regards Mark --- On Thu, 15 Jan 1998 11:11:39 -0600 (CST) Charles Grassl wrote: > > To: Parkbench interests > From: Charles Grassl > Subject: Low Level benchmarks > > Date: 15 January, 1998 > > > Mark, thank you for pointing us to the PMB benchmark. It is well written > and coded, but has some discrepancies and shortcomings. My comments > lead to suggestions and recommendation regarding low level communication > benchmarks. > > First, in program PMB the PingPong tests are twice as fast (in time) > as the corresponding message length tests in the PingPing tests (as run > on a CRAY T3E). 
The calculation of the time and bandwidth is incorrect > by a factor of 100% in one of the programs. > > This error can be fixed by recording, using and reporting the actual > time, the amount of data sent and their ratio. That is, the time should not > be divided by two in order to correct for a round trip. This recorded > time is for a round trip message, and is not precisely the time for > two messages. Half the round trip message passing time, as reported in > the PMB tests, is not the time for a single message and should not be > reported as such. This same erroneous technique is used in the COMMS1 > and COMMS2 benchmarks. (Parkbench is responsible for propagating > this incorrect methodology.) > > In program PMB, the testing procedure performs a "warm up". This > procedure is a poor testing methodology because it discards important > data. Testing programs such as this should record all times and calculate > the variance and other statistics in order to perform error analysis. > > Program PMB does not measure contention or allow extraction of network > contention data. Tests "Allreduce" and "Bcast" and several others > stress the inter-PE communication network with multiple messages, > but it is not possible to extract information about the contention from > these tests. The MPI routines for Allreduce and Bcast have algorithms > which change with respect to the number of PEs and message lengths. Hence, > without detailed information about the specific algorithms used, we cannot > extract information about network performance or further characterize > the inter-PE network. > > Basic measurements must be separated from algorithms. Tests PingPong, > PingPing, Barrier, Xover, Cshift and Exchange are low level. Tests > Allreduce and Bcast are algorithms. The algorithms Allreduce and Bcast > need additional (algorithmic) information in order to be described in > terms of the basic level benchmarks. > > > With respect to low level testing, the round trip exchange of messages, > as per PingPing and PingPong in PMB or COMMS1 and COMMS2, is not > characteristic of the lowest level of communication. This pattern > is actually rather rare in programming practice. It is more common > for tasks to send single messages and/or to receive single messages. > In this scheme, messages do not make a round trip and there are not > necessarily caching or other coherency effects. > > Single message passing is a distinctly different case from that > of round trip tests. We should be worried that the round trip testing > might introduce artifacts not characteristic of actual (low level) usage. > We need a better test of basic bandwidth and latency in order to measure > and characterize message passing performance. > > > Here are suggestions and requirements, in outline form, for a low > level benchmark design: > > > > I. Single and double (bidirectional) messages. > > A. Test single messages, not round trips. > 1. The round trip test is an algorithm and a pattern. As > such it should not be used as the basic low level test of > bandwidth. > 2. Use direct measurements where possible (which is nearly > always). For experimental design, the simplest method is > the most desirable and best. > 3. Do not perform least squares fits A PRIORI. We know that > the various message passing mechanisms are not linear or > analytic because different mechanisms are used for different > message sizes. It is not necessarily known beforehand > where this transition occurs.
Some computer systems have > more than two regimes and their boundaries are dynamic. > 4. Our discussion of least squares fitting is losing track > of experimental design versus modeling. For example, the > least squares parameter for t_0 from COMMS1 is not a better > estimate of latency than actual measurements (assuming > that the timer resolution is adequate). A "better" way to > measure latency is to perform additional DIRECT measurements, > repetitions or otherwise, and hence decrease the statistical > error. The fitting as used in the COMMS programs SPREADS > error. It does not reduce error and hence it is not a > good technique for measuring such an important parameter > as latency. > > B. Do not test zero length messages. Though valid, zero length > messages are likely to take special paths through library > routines. This special case is not particularly interesting or > important. > 1. In practice, the most common and important message size is 64 > bits (one word). The time for this message is the starting > point for bandwidth characterization. > > D. Record all times and use statistics to characterize the message > passing time. That is, do not prime or warm up caches > or buffers. Timings for unprimed caches and buffers give > interesting and important bounds. These timings are also the > nearest to typical usage. > 1. Characterize message rates by a minimum, maximum, average > and standard deviation. > > E. Test inhomogeneity of the communication network. The basic > message test should be performed for all pairs of PEs. > > > II. Contention. > > A. Measure network contention relative to all PEs sending and/or > receiving messages. > > B. Do not use high level routines where the algorithm is not known. > 1. With high level algorithms, we cannot deduce which component > of the timing is attributable to the "operation count" > and which is attributable to the actual system (hardware) > performance. > > > III. Barrier. > > A. Simple test of barrier time for all numbers of processors. > > > > > Additionally, the suite should be easy to use. C and Fortran programs > for direct measurements of message passing times are short and simple. > These simple tests are of order 100 lines of code and, at least in > Fortran 90, can be written in a portable and reliable manner. > > The current Parkbench low level suite does not satisfy the above > requirements. It is inaccurate, as pointed out by previous letters, and > uses questionable techniques and methodologies. It is also difficult to > use; witness the proliferation of files, patches, directories, libraries > and the complexity and size of the Makefiles. > > This Low Level suite is a burden for those who are expecting a tool to > evaluate and investigate computer performance. The suite is becoming > a liability for our group. As such, it should be withdrawn from > distribution. > > I offer to write, test and submit a new set of programs which satisfy > most of the above requirements.
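As a concrete illustration of item I.D.1 in the outline above (characterising the recorded times by minimum, maximum, average and standard deviation rather than by a fitted model), a minimal sketch in C could look as follows; the type and function names are invented for illustration and are not taken from any existing Parkbench source.

#include <math.h>
#include <stddef.h>

/* Summary statistics over all recorded repetitions for one message size.
   All raw times are kept; the reduction happens only at analysis time.
   Illustrative names, not Parkbench code. */
struct time_stats {
    double min, max, mean, stddev;
};

struct time_stats characterize(const double *t, size_t n)
{
    struct time_stats s = { t[0], t[0], 0.0, 0.0 };
    double sum = 0.0, sumsq = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
        sum   += t[i];
        sumsq += t[i] * t[i];
    }
    s.mean = sum / (double) n;
    if (n > 1)   /* sample standard deviation, guarded against rounding */
        s.stddev = sqrt(fmax(0.0, (sumsq - n * s.mean * s.mean) / (n - 1)));
    return s;
}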
> > > Charles Grassl > SGI/Cray Research > Eagan, Minnesota USA > ---------------End of Original Message----------------- ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 01/26/98 - Time: 10:14:38 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 11:54:37 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id LAA07118; Mon, 26 Jan 1998 11:54:37 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA18845; Mon, 26 Jan 1998 11:21:36 -0500 (EST) Received: from timbuk.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id LAA18837; Mon, 26 Jan 1998 11:21:33 -0500 (EST) Received: from ironwood.cray.com (root@ironwood-fddi.cray.com [128.162.21.36]) by timbuk.cray.com (8.8.7/CRI-gate-news-1.3) with ESMTP id KAA23428 for ; Mon, 26 Jan 1998 10:21:26 -0600 (CST) Received: from magnet.cray.com (magnet [128.162.173.162]) by ironwood.cray.com (8.8.4/CRI-ironwood-news-1.0) with ESMTP id KAA29079 for ; Mon, 26 Jan 1998 10:21:24 -0600 (CST) From: Charles Grassl Received: by magnet.cray.com (8.8.0/btd-b3) id QAA29329; Mon, 26 Jan 1998 16:21:23 GMT Message-Id: <199801261621.QAA29329@magnet.cray.com> Subject: Low Level Benchmarks To: parkbench-comm@CS.UTK.EDU Date: Mon, 26 Jan 1998 10:21:23 -0600 (CST) X-Mailer: ELM [version 2.4 PL24-CRI-d] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit To: Parkbench interests From: Charles Grassl Subject: Low Level benchmarks Date: 26, January, 1998 A short review of where we have been and decided: Last year we agreed (via email exchanges) that the Parkbench Low Level benchmark suite is not intended to be an -MPI- -test- suite. There was a consensus that we intended to measure low level performance, not algorithm design or implementation. This is why the Pallas benchmark, though useful for testing the performance of several important MPI functions, is not the basic low level test which we desire. (I believe that the performance measurement of the MPI functions is a worthwhile project for this group, but it needs to be separate from the low level benchmarks.) At the May, 1997 Parkbench meeting in Knoxville, TN, we unanimously decided that the measurement and analysis (fitting) portions of the COMMS programs would be made into a separate program. This from Michael Berry's minutes (23 May 1997): After more discussion, the following COMMS changes/outputs were unanimously agreed upon: 1. Maximum bandwidth with corresp. message size. 2. Minimum message-passing time with corresp. message size. 3. Time for minimum message length (could be 0, 1, 8, or 32 bytes but must be specified). 4. The software will be split into two program: one to report the spot measurements and the other for the analysis. Some of the objections with the Parkbench Low Level codes are that they are difficult to build, run and analyze. This attributable to their organization and design. Separating the analysis would greatly simply the programs, but the programs still need to be rewritten. I include in this email message a simple replacement code for COMMS1. It uses the "back and forth" methodology, reports maximum and minimum times with corresponding sizes and and does not include "analysis". 
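To make the agreed spot-value outputs concrete (maximum bandwidth and minimum message-passing time, each with its corresponding message size, plus the time for the minimum message length), the selection step can be as simple as the sketch below; it is written in C, the array and function names are invented for illustration, and the arrays are assumed to be ordered by increasing message size.

#include <stdio.h>
#include <stddef.h>

/* sizes[i] in bytes (increasing), times[i] in seconds (e.g. the minimum
   observed time for that size). Illustrative only; not taken from COMMS1
   or from the replacement program given below. */
void report_spot_values(const double *sizes, const double *times, size_t n)
{
    size_t i_bw = 0, i_t = 0;
    for (size_t i = 1; i < n; i++) {
        if (sizes[i] / times[i] > sizes[i_bw] / times[i_bw])
            i_bw = i;                    /* maximum bandwidth    */
        if (times[i] < times[i_t])
            i_t = i;                     /* minimum message time */
    }
    printf("Maximum bandwidth: %.1f MByte/s at %.0f bytes\n",
           sizes[i_bw] / times[i_bw] / 1.0e6, sizes[i_bw]);
    printf("Minimum time     : %.1f us at %.0f bytes\n",
           times[i_t] * 1.0e6, sizes[i_t]);
    printf("Time for minimum message length (%.0f bytes): %.1f us\n",
           sizes[0], times[0] * 1.0e6);
}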
It is equivalent to the measurement portion of COMMS1, though it is much simpler and easier to use. I will comment on the experimental methodology used in this program.

- The reported times in standard out are actual round trip times. It is a poor experimental practice to modify raw measurements too early. We should not mix measured times with derived times. The practice leads to confusion and errors (witness the Pallas benchmark code and an earlier version of Parkbench). If we desire to divide the times by two (because of the round trip), then this should be done in an analysis portion. Otherwise we misrepresent round trip times as actual single trip times, which they are not.

- All times are saved and written to unit 7. The reported times in standard out are the first and the last measurements for each message size. The experimental principle is that no data should be discarded without analysis. We can use statistical analysis or graphics or fitting routines to analyze the raw output. (I favor graphics and statistical analysis.) If we look at the raw output, we will see interesting features, such as the actual "warm up" count (usually five or fewer repetitions) and the distribution of times (not Gaussian!).

- Each repetition is individually timed. If the timer does not have adequate resolution, then the times for a number of repetitions, from two to all, can be aggregated and used. This aggregation can be done in the analysis phase. (Most computers should be able to time and resolve single round trip messages.) This aggregation should not be done before adequate analysis or evidence shows that it needs to be done.

- Each message size is tested for the same number of repetitions. We prefer to keep this number constant so that the experimental sampling error (proportional to 1/sqrt[repetitions]) is the same for each message size. Also, it is difficult to cleanly and simply adjust the repetition count relative to the message size.

I also have one replacement program for both COMMS2 and COMMS3 (note that the COMMS2 measurement is a subset of the COMMS3 measurements). More on that later.

Charles Grassl
SGI/Cray Research
Eagan, Minnesota USA

-----------------------------------------------------------------------

      program Single
!     Compile: f90 file.f -l mpi
!     timer() is an external, user-supplied wall-clock timer returning
!     seconds; it is not included in this message.
      character*40 Title
      data Title/' Single Messages --- MPI'/
      integer log2nmax,nmax,n_repetitions
      parameter (log2nmax=18,nmax=2**log2nmax,n_repetitions=50)
      integer n_starts,n_mess
      parameter (n_starts=2,n_mess=2)
      include 'mpif.h'
      integer ier,status(MPI_STATUS_SIZE)
      integer my_pe,npes
      integer log2n,n,nrep,i
      real*8 t_call,timer,tf(0:n_repetitions)
      real*8 A(0:nmax-1)
      save A

      call mpi_init( ier )
      call mpi_comm_rank(MPI_COMM_WORLD, my_pe, ier)
      call mpi_comm_size(MPI_COMM_WORLD, npes, ier)

!     initialise the message buffer (acos(1.0) = 0)
      radian=1
      do i=0,nmax-1
        A(i) = acos(radian)*i
      end do

!     estimate the overhead of a single timer call
      tf(0) = timer()
      do nrep=1,n_repetitions
        tf(nrep) = timer()
      end do
      t_call=(tf(n_repetitions)-tf(0))/n_repetitions

      if (my_pe.eq.0) then
        call table_top(Title,npes,n_starts,n_mess,n_repetitions,t_call)
      end if

!     each repetition of each message size is individually timed
      do log2n=0,log2nmax
        n = 2**log2n
        call mpi_barrier( MPI_COMM_WORLD, ier )
        tf(0) = timer()
        do nrep=1,n_repetitions
          if (my_pe.eq.1) then
            call MPI_SEND(A,8*n,MPI_BYTE,0,10,MPI_COMM_WORLD,ier)
            call MPI_RECV(A,8*n,MPI_BYTE,0,20,MPI_COMM_WORLD,status,ier)
          end if
          if (my_pe.eq.0) then
            call MPI_RECV(A,8*n,MPI_BYTE,1,10,MPI_COMM_WORLD,status,ier)
            call MPI_SEND(A,8*n,MPI_BYTE,1,20,MPI_COMM_WORLD,ier)
          end if
          tf(nrep) = timer()
        end do
        if (my_pe.eq.0) then
          call table_body(8*n,n_mess,n_repetitions,tf,t_call)
        end if
      end do

      call mpi_finalize(ier)
      end

      subroutine table_top( Title,npes,
     .     n_starts,n_mess,n_repetitions,t_call)
      integer M
      parameter (M = 1 000 000)
      character*40 Title
      integer npes,n_starts,n_mess,n_repetitions
      real*8 t_call
      write(6,9010) Title,npes,n_starts,n_mess,n_repetitions,t_call*M
      return
 9010 format(//a40,
     . // ' Number of PEs: ',i8
     . // ' Starts: ',i8,
     . /  ' Messages: ',i8,
     . /  ' Repetitions: ',i8,
     . /  ' Timer overhead: ',f8.3,' microsecond',
     . // 8x,' First ',
     .       ' Last ',
     . /' Length',2x,2(' Time Rate ',1x),
     . /' [Bytes]',2x,2(' [Microsec.] [Mbyte/s]',1x),
     . /' ',8('-'),2x,2(21('-'),2x))
      end

      subroutine table_body(n_byte,n_mess,n_repetitions,tf,t_call)
      integer M
      parameter (M = 1 000 000)
      integer n_byte,n_mess,n_repetitions,i
      real*8 tf(0:n_repetitions)
      real*8 t_call
      real*8 t_first,t_last
!     subtract the timer overhead from each individually timed round trip
      t_first = (tf(1)-tf(0))-t_call
      t_last  = (tf(n_repetitions)-tf(n_repetitions-1))-t_call
      write(6,9020) n_byte,t_first*M,n_mess*n_byte/(t_first*M),
     .              t_last *M,n_mess*n_byte/(t_last *M)
!     all raw times go to unit 7 for later analysis
      write(7) n_byte,n_repetitions,n_mess
      write(7) ((tf(i)-tf(i-1))-t_call,i=1,n_repetitions)
      return
 9020 format(i8, 2x,2(f10.1,1x,f10.0,2x))
      end

From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 13:06:36 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id NAA08767; Mon, 26 Jan 1998 13:06:36 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA23400; Mon, 26 Jan 1998 12:31:16 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA23166; Mon, 26 Jan 1998 12:30:05 -0500 (EST) Received: from mordillo ([195.102.195.125]) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA15447; Mon, 26 Jan 98 17:31:15 GMT Date: Mon, 26 Jan 98 17:23:08 GMT From: Mark Baker Subject: Fw: Re: Low Level Benchmarks To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <34CCB99F.2B3C3D63@cumbria.eng.sun.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII This came direct to me... The rest of Parkbench are probably interested in Bodo's comments.
Mark --- On Mon, 26 Jan 1998 08:28:15 -0800 Bodo Parady - SMCC Performance Development wrote: > The key items to find are: > > Lock time (defined as time to release a lock remotely) > Example would be reader spinning on memory, waiting > for change in memory word, or receipt of interrupt. > This is the effective ping-pong half time. Sadly > subroutine and library call overhead can render > this result meaningless. > > Measuring one way rates is no good here since the response > time must be factored in. This is a two-way transfer > > Channel rate (defined as large block transfer rate). > > Block size at half channel rate. > > Block size at twice lock time latency. > > Full curve, stepping at 1, 2, 4, 8, 16, ..., 2*n byte block sizes > at full issue rate. This is probably the least important > since it involves coalescence of transmitted data. > > The fear is that given the limitations of MPI/PVM, and to some degree > of C and Fortran that accurate measures of these quantities may > not be practical. > > Regards. > > Bodo Parady > > Mark Baker wrote: > > > Charles, > > > > Thanks for your thoughts and experiences with the Pallas PMB codes - > > I will forward them to the authors... The main points in favour of > > the PMB codes are that they are in C and potentially produce results > > for a variety of MPI calls... Obviously if the results they produce are > > flawed... > > > > Regarding new low-level codes I would be in favour of taking up your > > kind offer of writing a set of codes in C/Fortran. I guess the main > > problem is getting a concensus with regards methodology and measurements > > that are used with these codes. > > > > Maybe we can decide that a number of actions should be undertaken... > > > > 1) It seems clear that no one is 100% happy with the current version > > of the low-level codes. So, this implies that they need to be > > replaced !? > > > > 2) If we are going to replace the codes we can go down a couple of routes; > > start from scratch, replace with Roger's new codes or some combination of > > both... > > > > 3) I would be happy to see us start from scratch and create > > C/Fortran codes where the methodology and design of each can be > > "hammered out" by discussion first and then implemented > > (and iterated as necessary). > > > > 4) Assuming that we want to go down this route, I suggest we make a starting > > point of Charles' "suggestions and requirements for the low level > > benchmark design" - towards the end of this email. I am happy to > > put these words on the web and update/change them as our dicussions > > evolve... > > > > 5) Charles has offered his services to help write/design/test these new codes - > > I'm willing to offer my services in a similar fashion. I'm sure that others > > interested in the low-level codes could contribute something here as well. > > > > Overall, it seems clear to me that we have enough energy and manpower to > > produce a new set low-level codes whose methodology and design is correct > > and relevant to todays systems... > > > > I look forward to your comments... > > > > Regards > > > > Mark > > > > --- On Thu, 15 Jan 1998 11:11:39 -0600 (CST) Charles Grassl wrote: > > > > > > To: Parkbench interests > > > From: Charles Grassl > > > Subject: Low Level benchmarks > > > > > > Date: 15 January, 1998 > > > > > > > > > Mark, thank you for pointing us to the PMB benchmark. It is well written > > > and coded, but has some discrepancies and shortcomings. 
> > > > > > This Low Level suite is a burden for those who are expecting a tool to > > > evaluate and investigate computer performance. The suite is becoming > > > a liability for our group. As such, it should be withdrawn from > > > distribution. > > > > > > I offer to write, test and submit a new set of programs which satisfy > > > most of the above requirements. > > > > > > > > > Charles Grassl > > > SGI/Cray Research > > > Eagan, Minnesota USA > > > > > > > ---------------End of Original Message----------------- > > > > ------------------------------------- > > CSM, University of Portsmouth, Hants, UK > > Tel: +44 1705 844285 Fax: +44 1705 844006 > > E-mail: mab@sis.port.ac.uk > > Date: 01/26/98 - Time: 10:14:38 > > URL http://www.sis.port.ac.uk/~mab/ > > ------------------------------------- > > > > ---------------End of Original Message----------------- ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 01/26/98 - Time: 17:23:08 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Jan 26 14:08:38 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id OAA11289; Mon, 26 Jan 1998 14:08:37 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id NAA02837; Mon, 26 Jan 1998 13:52:54 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id NAA02817; Mon, 26 Jan 1998 13:52:50 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id NAA11755; Mon, 26 Jan 1998 13:52:49 -0500 (EST) Date: Mon, 26 Jan 1998 13:52:49 -0500 (EST) From: Pat Worley Message-Id: <199801261852.NAA11755@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Fw: Re: Low Level Benchmarks In-Reply-To: Mail from 'Mark Baker ' dated: Mon, 26 Jan 98 17:23:08 GMT Cc: worley@haven.EPM.ORNL.GOV (From Charles Grassl) > Last year we agreed (via email exchanges) that the Parkbench Low Level > benchmark suite is not intended to be an -MPI- -test- suite. There was a > consensus that we intended to measure low level performance, not algorithm > design or implementation. (From Bodo Parady via Mark Baker) > The fear is that given the limitations of MPI/PVM, and to some degree > of C and Fortran that accurate measures of these quantities may > not be practical. > I have a problem with attempting to determine low level communication performance parameters independent of the communication library when it a) is such a difficult task (I doubt that any portable program will be "accurate enough" across all the interesting platforms.) b) does not reflect what users would see in practice (since they will be using MPI or PVM in C or Fortran). Am I missing something? The primary utility (for me) of the low level benchmarks is to help explain the performance observed in the Parkbench kernels and compact applications, or in my own codes. What level of accuracy is required for such an application? Are more accurate or detailed measurements useful or doable? Upon reflection, such low(er) level performance data would be useful to the developer of a communication library, to help evaluate its performance, but that appears to require system-specific measurements (and system-specific interpretation). Is this really something we want to attempt? 
Pat Worley From owner-parkbench-comm@CS.UTK.EDU Thu Jan 29 16:29:33 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id QAA19023; Thu, 29 Jan 1998 16:29:33 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id QAA09768; Thu, 29 Jan 1998 16:18:48 -0500 (EST) Received: from haven.EPM.ORNL.GOV (haven.epm.ornl.gov [134.167.12.69]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id QAA09756; Thu, 29 Jan 1998 16:18:45 -0500 (EST) Received: (from worley@localhost) by haven.EPM.ORNL.GOV (8.8.3/8.8.3) id QAA01325; Thu, 29 Jan 1998 16:18:43 -0500 (EST) Date: Thu, 29 Jan 1998 16:18:43 -0500 (EST) From: Pat Worley Message-Id: <199801292118.QAA01325@haven.EPM.ORNL.GOV> To: parkbench-comm@CS.UTK.EDU Subject: Re: Fw: Re: Low Level Benchmarks In-Reply-To: Mail from 'Mark Baker ' dated: Mon, 26 Jan 98 17:23:08 GMT Cc: worley@haven.EPM.ORNL.GOV In a private exchange, Charles Grassl made a comment that he may come to regret: " We need more input, such as yours, as to what are the important parameters and what accuracy is needed. " so here are some random comments. I have been organizing my own performance data over the last couple of weeks. I never paid too much attention to the detailed output of my own ping-ping and ping-pong tests because it was not the end product of the research. It has been enlightening to look at it now. The entry point is http://www.epm.ornl.gov/~worley/studies/pt2pt.html I tried a couple of different fitting techniques, but decided that fits told me nothing that I was interested in. What I have found mildly interesting is to measure statistics of the data, and try to build a performance model using those. The difference is that the interpretation and value of the statistics (maximum observed bandwidth, time to send 0 length message, etc.) are not functions of any model error. The problem with fitting the data is that, no matter how often I tell myself that it is simply a compact representation of the data, I keep wanting to use assign meaning to the model parameters and use them in interplatform comparisons. In summary, I have changed my mind. I no longer support even simple fits to the data unless well-defined statistical measures of the data are also included (and emphasized). 
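One simple way to turn such statistics into a performance estimate, in the spirit of the paragraph above (this is an illustrative form only, not the model used on the page cited), is to combine the two quantities directly:

$$ t_{\mathrm{model}}(n) \;=\; t_{\min}(0) \;+\; \frac{n}{B_{\max}} $$

where $t_{\min}(0)$ is the smallest observed time for the shortest message and $B_{\max}$ is the largest observed bandwidth over all message sizes. Both quantities keep their direct interpretation as observed best-case values, so the estimate is an optimistic bound on expected performance rather than a least-squares compromise, and its inputs carry none of the model error discussed above.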
Pat Worley From owner-parkbench-comm@CS.UTK.EDU Mon Feb 9 05:05:12 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id FAA29859; Mon, 9 Feb 1998 05:05:11 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id EAA10483; Mon, 9 Feb 1998 04:57:14 -0500 (EST) Received: from gatekeeper.pallas.de (gatekeeper.pallas.de [194.45.33.1]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id EAA10476; Mon, 9 Feb 1998 04:57:07 -0500 (EST) Received: from mailhost.pallas.de (gatekeeper [194.45.33.1]) by gatekeeper.pallas.de (SMI-8.6/SMI-SVR4) with SMTP id KAA18803; Mon, 9 Feb 1998 10:50:10 +0100 Received: from schubert.pallas.de by mailhost.pallas.de (SMI-8.6/SMI-SVR4) id KAA03909; Mon, 9 Feb 1998 10:50:07 +0100 Received: from localhost by schubert.pallas.de (SMI-8.6/SMI-SVR4) id KAA11268; Mon, 9 Feb 1998 10:46:57 +0100 Date: Mon, 9 Feb 1998 10:46:45 +0100 (MET) From: Hans Plum X-Sender: hans@schubert Reply-To: Hans Plum To: cmg@cray.com, mab@sis.port.ac.uk, parkbench-comm@CS.UTK.EDU cc: snelling@fecit.co.uk Subject: Re: Low Level Benchmarks (fwd) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="MimeMultipartBoundary" --MimeMultipartBoundary Content-Type: TEXT/PLAIN; charset=US-ASCII Hi, I am the "PMB person" at PALLAS Gmbh. I have heard about your discussions. First note that there is a new version PMB1.2, see http://www.pallas.de/pages/pmb.htm Also look at the PMB1.2_doc.ps.gz where we try to give the reasoning for all decisions made in PMB. We think nothing has been designed sloppy .. PMB has been developed from point of view of an application developer which I am. Of course a single person's view is limited, but for myself the information given by PMB provides a solid base for algorithmic estimates and decisions. That exactly what we wanted: Something EASY (and not COMPLETE) that covers may be 80% of the realistic situations. ------------------------------------------------------------- ---/--- Dr Hans-Joachim Plum phone : +49-2232-1896-0 / / PALLAS GmbH direct line: +49-2232-1896-18 / / / Hermuelheimer Strasse 10 fax : +49-2232-1896-29 / / / / D-50321 Bruehl email : plum@pallas.de / / / Germany URL : http://www.pallas.de / / PALLAS ------------------------------------------------------------- ---/--- --MimeMultipartBoundary-- From owner-parkbench-comm@CS.UTK.EDU Wed Apr 22 07:43:42 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id HAA03238; Wed, 22 Apr 1998 07:43:41 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA03111; Wed, 22 Apr 1998 07:05:23 -0400 (EDT) Received: from post.mail.demon.net (post-10.mail.demon.net [194.217.242.39]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA03104; Wed, 22 Apr 1998 07:05:21 -0400 (EDT) Received: from minnow.demon.co.uk ([158.152.73.63]) by post.mail.demon.net id aa1028865; 22 Apr 98 11:00 GMT Message-ID: Date: Wed, 22 Apr 1998 11:59:51 +0100 To: parkbench-comm@CS.UTK.EDU From: Roger Hockney Subject: Announcing PICT2.1 - Now fully Operational MIME-Version: 1.0 X-Mailer: Turnpike Version 3.03a To: the Parkbench discussion group From: Roger ANNOUNCING PICT 2.1 (1 Mar 1998) -------------------------------- I am pleased to announce the first fully-functional version of the Parkbench Interactive Curve-Fitting Tool (PICT). Provision is made for a wide range of screen sizes in pixels by allowing the user to make a suitable choice in the opening HTML page. All buttons now work. 
In particular Jack can have his least-squares fitting of the 2-parameters direct from the tool, and this can be performed over partial ranges of the data as required. The same applies to the Three-point fitting procedure to obtain the 3-parameter fits. There is also a nice "Temperature Gauge" feature that helps you minimise the error during manual fitting. The results of these fits can be assembled in a results file and annotated using the SAVE buttons. Under MSIE I find I am able to store these results in my local disk file system using SAVE as ... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The methodology of the 2-parameter curve fitting is given in detail in my book "The Science of Computer Benchmarking", see: http://www.siam.org/catalog/mcc07/hockney.htm The 3-parameter fit was described quite fully in my talk to the 11 Sep 1997 Parkbench meeting. I have finally written this up with pretty pictures for the PEMCS Web Journal. Look at: http://hpc-journals.ecs.soton.ac.uk/Workshops/PEMCS/fall-97/ talks/Roger-Hockney/perfprof1.html To try out PICT 2.1 please first try my own Demon Web space which has a counter from which I can judge usage: http://www.minnow.demon.co.uk/pict/source/pict2a.html If this gives problems, it is also mounted on the University of Westminster server: http://perun.hscs.wmin.ac.uk/LocalInfo/pict/source/pict2a.html We expect soon to make it available on the Southampton server. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ PICT 2.1 has been tested by a small number of friends. Most problems and frustrations arise from either slowness of the server or of the users' computer. If download from Demon is slow or appears to hang, try the other server or try Demon later. Please do not conclude the applet is broken. I am confident it is not. A 10 to 20 second wait is normal when bringing up the requested graphical window/frame even on a good day. Once the graphical window is on your computer and the applet is running, the speed is determined by the speed of your computer. You may even disconnect from the Web at this stage and continue curve fitting with the applet with the data displayed. If you want new data, you must, of course, reconnect to the Web and use the GET DATA at URL button. Experience shows that the PICT applet will not respond satisfactorily on a computer with slower than a 100 MHz clock. This is because a lot of complex calculations must be performed as you drag the curves around the data. MSIE seems to work noticeably faster than Netscape on my Win95 PC. There is no cure for this except to use a faster computer. But again please do not think the applet is brocken. Please report experiences good or bad to: roger@minnow.demon.co.uk Constructive suggestions for improvement are also welcome. -- Roger Hockney. 
Checkout my new Web page at URL http://www.minnow.demon.co.uk From owner-parkbench-comm@CS.UTK.EDU Sun Jun 21 10:02:47 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA22167; Sun, 21 Jun 1998 10:02:47 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA06272; Sun, 21 Jun 1998 09:47:56 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA06265; Sun, 21 Jun 1998 09:47:54 -0400 (EDT) Received: from mordillo (p4.nas1.is5.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA17767; Sun, 21 Jun 98 14:50:05 BST Date: Sun, 21 Jun 98 14:43:42 +0000 From: Mark Baker Subject: New PEMCS papers To: parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear All, Two new papers have just been published by the PEMCS journal... 3.Comparing The Performance of MPI on the Cray T3E-900, The Cray Origin2000 And The IBM P2SC, by Glenn R. Luecke and James J. Coyle Iowa State University, Ames, Iowa 50011-2251, USA. 4.EuroBen Experiences with the SGI Origin 2000 and the Cray T3E, by A.J. van der Steen, Computational Physics, Utrecht University, Holland* See http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/ Regards Mark ------------------------------------- CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 06/21/98 - Time: 14:43:42 URL http://www.sis.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Fri Sep 11 12:05:18 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA19578; Fri, 11 Sep 1998 12:05:18 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA20703; Fri, 11 Sep 1998 11:54:29 -0400 (EDT) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id LAA20636; Fri, 11 Sep 1998 11:53:20 -0400 (EDT) Received: from mordillo (p36.nas1.is5.u-net.net) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA11111; Fri, 11 Sep 98 16:48:17 BST Date: Fri, 11 Sep 98 14:38:08 +0000 From: Mark Baker Subject: CPE - Call for papers - Message Passing Interface-based Parallel Programming with Java To: javagrandeforum@npac.syr.edu, "'mpi-nt-users@erc.msstate.edu'" , "Dr. Kenneth A. Williams" , "Stephen L. Scott" , "Aad J. van der Steen" , Advanced Java , Alexander Reinefeld , Andy Grant , Anne Trefethen , Bryan Capenter , Charles Grassl , Dave Beckett , David Snelling , DIS Everyone , fagg@CS.UTK.EDU, gentzsch@genias.de, Guy Robinson , Hon W Yau , hpvm@cs.uiuc.edu, Jack Dongarra , java-for-cse@npac.syr.edu, Joao Gabriel Silva , jtap-club-clusters@mailbase.ac.uk, Ken Hawick , Mike Berry , mpijava-users@npac.syr.edu, owner-grounds@mail.software.ibm.com, parkbench-comm@CS.UTK.EDU, partners@globus.org, Paul Messina , Roland Wismueller , Steve Larkin - AVS , Terri Canzian , Tony Hey , topic@mcc.ac.uk, Vaidy Sunderam , Vladimir Getov , William Gropp X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Dear Colleague,, Firstly, I apologise for any cross-posting of this email. 
If this CFP is not in your field we would appreciate you forwarding it to your colleagues who may be in the field. This CFP can be found at http://hpc-journals.ecs.soton.ac.uk/CPE/Special/MPI-Java/ Regards Dr Mark Baker University of Portsmouth, UK ---------------------------------------------------------------------------- Call For Papers Special Issue of Concurrency: Practice and Experience Message Passing Interface-based Parallel Programming with Java Guest Editors Anthony Skjellum (MPI Software Technology, Inc.) Mark Baker (University of Portsmouth) A special issue of Concurrency: Practice and Experience (CPE) is being planned for Fall of 1999. Papers submitted and accepted for this issue will be published by John Wiley & Sons Ltd. in the CPE Journal and in addition will be made available electronically via the WWW. Background Recently there has been a great deal of interest in the idea that Java may be a good language for scientific and engineering computation, and in particular for parallel computing. The claims made on behalf of Java, that it is simple, efficient and platform-neutral - a natural language for network programming - make it potentially attractive to scientific programmers hoping to harness the collective computational power of networks of workstations and PCs, or even of the Internet. A basic prerequisite for parallel programming is a good communication API. Java comes with various ready-made packages for communication, notably an easy-to-use interface to BSD sockets, and the Remote Method Invocation (RMI) mechanism. Interesting as these interfaces are, it is questionable whether parallel programmers will find them especially convenient. Sockets and remote procedure calls have been around for about as long as parallel computing has been fashionable, and neither of them has been popular in that field. Both communication models are optimized for client-server programming, whereas the parallel computing world is mainly concerned with "symmetric" communication, occurring in groups of interacting peers. This symmetric model of communication is captured in the successful Message Passing Interface standard (MPI), established a few years ago. MPI directly supports the Single Program Multiple Data (SPMD) model of parallel computing, wherein a group of processes cooperate by executing identical program images on local data values. Reliable point-to-point communication is provided through a shared, group-wide communicator, instead of socket pairs. MPI allows numerous blocking, non-blocking, buffered or synchronous communication modes. It also provides a library of true collective operations (broadcast is the most trivial example). An extended standard, MPI 2, allows for dynamic process creation and access to memory in remote processes. Call For Papers This is a call for papers about the designs, experience, and results concerning the use of the Message Passing Interface (MPI) with Java are sought for a special issue of Concurrency Practice and Experience. Development of clear understanding of the opportunities, challenges, and state-of-the-art in scalable, peer-oriented messaging with Java are of interest and value to both the distributed computing and high performance computing communities. Topics of interest for this special issue include but are not limited to: -- Practical systems that use MPI and Java to solve real distributed high performance computing problems. -- Designs of systems for combining MPI-type functionality with Java. 
-- Approaches to APIs for object-oriented, group-oriented message passing with Java. -- Efforts to combine MPI with CORBA in a Java environment. -- Efforts to utilize aspects of the emerging MPI/RT standard are also of interest in the Java context. -- Efforts to do MPI interoperability (IMPI) using Java. -- Issues and both tactical and strategic solutions concerning MPI-1 and MPI-2 standard and features in conjunction with Java. -- Performance results and performance-enhancing techniques for such systems. -- Flexible frameworks and techniques for enabling High-Performance communication in Java Timescales for Submission There is a deadline of 15th December 1998 for submitted papers. Publication is currently scheduled for the third quarter of 1999. Activity Deadline Call For Papers 1st September 1998 Paper Submission 15th December 1998 Papers Returned 15th March 1999 Papers Approved 1st April 1999 Publication Q3 1999 Further details about this special issue can be found at: http://hpc-journals.ecs.soton.ac.uk/CPE/Special/MPI-Java/ ---------------------------------------------------------------------------- ------------------------------------- Dr Mark baker CSM, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 09/11/98 - Time: 14:38:08 URL http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Tue Sep 15 22:24:43 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id WAA13441; Tue, 15 Sep 1998 22:24:43 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id WAA25353; Tue, 15 Sep 1998 22:23:26 -0400 (EDT) Received: from octane11.nas.nasa.gov (octane11.nas.nasa.gov [129.99.34.116]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id WAA25343; Tue, 15 Sep 1998 22:23:08 -0400 (EDT) Received: (from saini@localhost) by octane11.nas.nasa.gov (8.8.7/NAS8.8.7) id TAA24915; Tue, 15 Sep 1998 19:17:45 -0700 (PDT) From: "Subhash Saini" Message-Id: <9809151917.ZM24910@octane11.nas.nasa.gov> Date: Tue, 15 Sep 1998 19:17:44 -0700 In-Reply-To: Mark Baker "CPE - Call for papers - Message Passing Interface-based Parallel Programming with Java" (Sep 11, 2:38pm) References: X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: "'mpi-nt-users@erc.msstate.edu'" , "Aad J. van der Steen" , "Dr. Kenneth A. Williams" , "Stephen L. Scott" , Advanced Java , Alexander Reinefeld , Andy Grant , Anne Trefethen , Bryan Capenter , Charles Grassl , DIS Everyone , Dave Beckett , David Snelling , Guy Robinson , Hon W Yau , Jack Dongarra , Joao Gabriel Silva , Ken Hawick , Mark Baker , Mike Berry , Paul Messina , Roland Wismueller , Steve Larkin - AVS , Terri Canzian , Tony Hey , Vaidy Sunderam , Vladimir Getov , William Gropp , fagg@CS.UTK.EDU, gentzsch@genias.de, hpvm@cs.uiuc.edu, java-for-cse@npac.syr.edu, javagrandeforum@npac.syr.edu, jtap-club-clusters@mailbase.ac.uk, mpijava-users@npac.syr.edu, owner-grounds@mail.software.ibm.com, parkbench-comm@CS.UTK.EDU, partners@globus.org, topic@mcc.ac.uk Subject: AD _ Workshop Cc: mab@sis.port.ac.uk, saini@octane11.nas.nasa.gov Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii You are invited to attend the workshop (see below). Best regards, subhash ============================================================================== ***** REGISTER NOW ***** *** NO REGISTRATION FEE *** **** Last Day to Register is Sept. 
23, 1998 **** "First NASA Workshop on Performance-Engineered Information Systems" ----------------------------------------------------------------- Sponsored by Numerical Aerospace Simulation Systems Division NASA Ames Research Center Moffett Field, California, USA September 28-29, 1998 Workshop Chairman: Dr. Subhash Saini http://science.nas.nasa.gov/Services/Training Invited Speakers: ------------------ Adve, Vikram (Rice University) Aida, K. (Tokyo Institute of Technology, JAPAN) Bagrodia, Rajive (University of California, Los Angeles) Becker, Monique (Institute Nationale des Tele. FRANCE) Berman, Francine (University of California, San Diego) Browne, James C. (University of Texas) Darema, Frederica (U.S. National Science Foundation-CISE) Dongarra, Jack (Oak Ridge National Laboratory) Feiereisen, Bill (NASA Ames Research Center) Fox, Geoffrey (Syracuse University) Gannon, Dennis (Indiana University) Gerasoulis, Apostolos (Rutgers University) Gunther, Neil J. (Performance Dynamics Company) Hey, Tony (University of Southampton UK) Hollingsworth, Jeff (University of Maryland) Jain, Raj (Ohio State University) Keahy, Kate (Los Alamos National Laboratory) Mackenzie, Lewis M. (University of Glasgow, Scotland UK) McCalpin, John (Silicon Graphics) Menasce, Daniel A. (George Mason University) Nudd, Graham (University of Warwick UK) Reed, Dan (University of Illinois) Saltz, Joel (University of Maryland) Simmons, Margaret (San Diego Supercomputer Center) Vernon, Mary (University of Wisconsin) Topics include: -------------- - Performance-by-design techniques for high-performance distributed information systems - Large transients in packet-switched and circuit-switched networks - Workload characterization techniques - Integrated performance measurement, analysis, and prediction - Performance measurement and modeling in IPG - Performance models for threads and distributed objects - Application emulators and simulation models - Performance prediction engineering of Information Systems including IPG - Performance characterization of scientific and engineering applications of interest to NASA, DoE, DoD, and industry - Scheduling tools for performance prediction of parallel programs - Multi-resolution simulations for large-scale I/O-intensive applications - Capacity planning for Web performance: metrics, models, and methods Contact: Marcia Redmond, redmond@nas.nasa.gov, (650) 604-4373 Registration: Advanced registration is required. Registration Fee: NONE. Registration Deadlines: Friday, September 23, 1998 There will be no onsite registration. Contact: Send registration information and direct questions to Marcia Redmond, redmond@nas.nasa.gov, (650) 604-4373. DESCRIPTION: The basic goal of performance modeling is to predict and understand the performance of a computer program or set of programs on a computer system. The applications of performance modeling are numerous, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. The most reliable technique for determining the performance of a program on a computer system is to run and time the program multiple times, but this can be very expensive and it rarely leads to any deep understanding of the performance issues. It also does not provide information on how performance will change under different circumstances, for example with scaling the problem or system parameters or porting to a different machine. 
The complexity of new parallel supercomputer systems presents a daunting challenge to the application scientists who must understand the system's behavior to achieve a reasonable fraction of the peak performance. The NAS Parallel Benchmarks (NPB) have exposed a large difference between peak and achievable performance. Such a dismal performance is not surprising, considering the complexity of these parallel distributed memory systems. At present, performance modeling, measurement, and analysis tools are inadequate for distributed/networked systems such as Information Power Grid (IPG). The purpose of performance-based engineering is to develop new methods and tools that will enable development of these information systems faster, better and cheaper. ================================================================================ Registration "First NASA Workshop on Performance-Engineered Information Systems" Send the following information to redmond@nas.nasa.gov Name _____________________________________________ Organization _____________________________________ Street Address ___________________________________ City ____________________ State __________________ Zip/Mail Code ___________ Country ________________ Phone ___________________ Fax ____________________ Email address ____________________________________ U.S. Citizen __________ Permanent Resident with Green Card ________ ******************************************************************************* Foreign National ________ (non-U.S. Citizen). Must complete the following information: Passport number ______________________ Name as it appears on passport _______________________________________ Date issued _____________ Date expires _________________ Country of citizenship____________________________ From owner-parkbench-lowlevel@CS.UTK.EDU Wed Oct 21 02:28:56 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id CAA13270; Wed, 21 Oct 1998 02:28:55 -0400 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id CAA23157; Wed, 21 Oct 1998 02:24:45 -0400 (EDT) Received: from mail2.one.net (mail2.one.net [206.112.192.100]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id CAA23150; Wed, 21 Oct 1998 02:24:43 -0400 (EDT) Received: from port-29-44.access.one.net ([206.112.210.106] HELO aol.com ident: IDENT-NOT-QUERIED [port 22788]) by mail2.one.net with SMTP id <17237-27384>; Wed, 21 Oct 1998 02:10:48 -0400 From: Online@nj.com To: Online@nj.com Subject: Advertise with Bulk Email! Message-Id: <19981021061048Z17237-27384+1398@mail2.one.net> Date: Wed, 21 Oct 1998 02:10:42 -0400 ___________________________________________________________ Anouncing a Bulk Friendly Isp! We Bulk Email! Are you tired of getting kicked offline for Bulk Emailing? Well now you can bulk email without getting kicked offline. Call Online Direct a Bulk Friendly ISP. 513 874 7437 For only 125$ a month plus a 50$ setup fee we will send out 35,000 emails a week for you. Plus provide you with a bullet proof pop 3 email acount so you can recieve all of your mail. Ask About our special offers up to 100,000 emails per day! Any type of bulk adversting! We Do it Right! Advertise Smart Bulk Email Today! Call Online Direct at 513 874 7437 We can also Provide bullet pop 3 email acounts! CALL TODAY! 
513 874 7437 if you wish to be removed from this list please type remove in reply box From owner-parkbench-comm@CS.UTK.EDU Sun Oct 25 12:41:55 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA29754; Sun, 25 Oct 1998 12:41:54 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA29327; Sun, 25 Oct 1998 12:36:00 -0500 (EST) Received: from pan.ch.intel.com (pan.ch.intel.com [143.182.246.24]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id MAA29319; Sun, 25 Oct 1998 12:35:58 -0500 (EST) Received: from sedona.intel.com (sedona.ch.intel.com [143.182.218.21]) by pan.ch.intel.com (8.8.6/8.8.5) with ESMTP id RAA16591; Sun, 25 Oct 1998 17:35:56 GMT Received: from ccm.intel.com ([143.182.69.127]) by sedona.intel.com (8.9.1a/8.9.1a-chandler01) with ESMTP id KAA27181; Sun, 25 Oct 1998 10:35:54 -0700 (MST) Message-ID: <36336126.B26DEE2C@ccm.intel.com> Date: Sun, 25 Oct 1998 10:34:30 -0700 From: Anjaneya Chagam X-Mailer: Mozilla 4.05 [en] (Win95; I) MIME-Version: 1.0 To: parkbench-comm@CS.UTK.EDU, Anjaneya.Chagam@intel.com Subject: Question on parkbench source code in c X-Priority: 1 (Highest) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi: I am looking for the parkbench benchmarking programs' source code in C to do a benchmarking comparison of Chime and PVM on the NT platform at Arizona State University. Could you please let me know if the parkbench programs have been ported to C, and if so, where can I find them? Thanks a million. Name: Anjaneya R. Chagam Email: Anjaneya.Chagam@intel.com From owner-parkbench-comm@CS.UTK.EDU Mon Oct 26 06:36:26 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id GAA11147; Mon, 26 Oct 1998 06:36:25 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA07390; Mon, 26 Oct 1998 06:27:13 -0500 (EST) Received: from osiris.sis.port.ac.uk (root@osiris.sis.port.ac.uk [148.197.100.10]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id GAA07383; Mon, 26 Oct 1998 06:27:10 -0500 (EST) Received: from mordillo (pc297.sis.port.ac.uk) by osiris.sis.port.ac.uk (4.1/SMI-4.1) id AA15115; Mon, 26 Oct 98 11:29:54 GMT Date: Mon, 26 Oct 98 11:14:06 GMT From: Mark Baker Subject: Re: Question on parkbench source code in c To: Anjaneya.Chagam@intel.com, parkbench-comm@CS.UTK.EDU X-Mailer: Chameleon ATX 6.0.1, Standards Based IntraNet Solutions, NetManage Inc. X-Priority: 3 (Normal) References: <36336126.B26DEE2C@ccm.intel.com> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Anjaneya, The official Parkbench codes are only available in Fortran 77. I vaguely remember hearing some time back about a graduate student's attempt to "port" some of the low-level codes to C. Charles Grassl (Cray) and I did a little work on some simple C PingPong codes. You can check these out at... http://www.sis.port.ac.uk/~mab/TOPIC/ Regards Mark --- On Sun, 25 Oct 1998 10:34:30 -0700 Anjaneya Chagam wrote: > Hi: > I am looking for the parkbench benchmarking programs' source code in C > to do a benchmarking comparison of Chime and PVM on the NT platform at > Arizona State University. Could you please let me know if the parkbench > programs have been ported to C, and if so, where can I find them? > > Thanks a million. > > Name: Anjaneya R.
Chagam > Email: Anjaneya.Chagam@intel.com > > ---------------End of Original Message----------------- ------------------------------------- DCS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: mab@sis.port.ac.uk Date: 10/26/98 - Time: 11:14:07 URL: http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Mon Nov 16 10:06:23 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA11375; Mon, 16 Nov 1998 10:06:23 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id JAA08949; Mon, 16 Nov 1998 09:01:33 -0500 (EST) Received: from del2.vsnl.net.in (del2.vsnl.net.in [202.54.15.30]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id JAA08936; Mon, 16 Nov 1998 09:01:28 -0500 (EST) Received: from sameer.myasa.com ([202.54.106.39]) by del2.vsnl.net.in (8.9.1a/8.9.1) with SMTP id TAA13392 for ; Mon, 16 Nov 1998 19:30:37 -0500 (GMT) From: "Kashmir Kessar Mart" To: Subject: Information Date: Mon, 16 Nov 1998 19:30:48 +0530 Message-ID: <01be1169$838dd020$276a36ca@sameer.myasa.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_006D_01BE1197.9D460C20" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 4.71.1712.3 X-MimeOLE: Produced By Microsoft MimeOLE V4.71.1712.3 This is a multi-part message in MIME format. ------=_NextPart_000_006D_01BE1197.9D460C20 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Dear Sir, I have seen your Web Site but could not understand what your company is. Please let me know if you can provide me information regarding Walnut Kernels. Regards Azad.
------=_NextPart_000_006D_01BE1197.9D460C20-- From owner-parkbench-comm@CS.UTK.EDU Fri Dec 4 15:44:53 1998 Return-Path: Received: from CS.UTK.EDU by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id PAA21941; Fri, 4 Dec 1998 15:44:53 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA21231; Fri, 4 Dec 1998 15:18:56 -0500 (EST) Received: from gimli.genias.de (qmailr@GIMLI.genias.de [192.129.37.12]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id PAA21223; Fri, 4 Dec 1998 15:18:51 -0500 (EST) From: Received: (qmail 14706 invoked by uid 233); 4 Dec 1998 20:14:46 -0000 Date: 4 Dec 1998 20:14:46 -0000 Message-ID: <19981204201446.14705.qmail@gimli.genias.de> Reply-to: majordomo@genias.de To: parkbench-comm@CS.UTK.EDU Subject: Newsletter on Distributed and Parallel Computing Dear Colleague, as already announced a few weeks ago, this is now the very first issue of our bi-monthly electronic Newsletter on Distributed and Parallel Computing, DPC NEWS. !! If you want to receive DPC NEWS regularly, please just return this !! !! e-mail to majordomo@genias.de with !! !! !! !! subscribe newsletter or subscribe newsletter !! !! end end !! !! !! !! in the first two lines of the email-body (text area). !! This newsletter is a FREE service to the DPC (Distributed and Parallel Computing) community. It regularly informs about new developments and results in DPC, e.g. conferences, important weblinks, new books and other relevant news in distributed and parallel computing. We also keep all the information in the special newsletter section of our webpage ( http://www.genias.de/dpcnews/ ) to provide a wealth of information for the DPC community. If you have any information which might fit into these DPC subjects, please send it to me together with the corresponding weblink, for publication in DPC News. We aim to reach a very broad community with this DPC Newsletter. With Season's Greetings from GENIAS Wolfgang Gentzsch, CEO and President ===================================================================== DPC NEWSletter on Distributed and Parallel Computing GENIAS Software, December 1998 ------------------------------ http://www.genias.de/dpcnews/ GENIAS NEWS: EASTMAN CHEMICAL USES CODINE FOR MOLECULAR MODELING Eastman Chemical uses commercial quantum chemistry programs, like Gaussian, Jaguar, and Cerius2, to model chemical products, intermediates, catalysts, etc. The simulation jobs take between 1 hour and 6 days to complete. Queuing software is an important part of keeping the processors working at full utilization, without being overloaded. Since October, with the new CODINE release 4.2, Eastman has maintained over 95% CPU utilization on the available systems: http://www.genias.de/dpcnews/ BMW USES CODINE AND GRD FOR CRASH-SIMULATION At the BMW crash department, very complex compute-intensive PAM-CRASH simulations are performed on a cluster of 11 compute servers and more than 100 workstations, altogether over 370 CPUs.
CODINE and GRD have optimized the utilization of this big cluster by distributing the load equally, dynamically and in an application-oriented way, transparently to the 45 users: http://www.genias.de/dpcnews/ GRD MANAGES ACADEMIC COMPUTER CENTER http://www.genias.de/dpcnews/ QUEUING UP FOR GRD AT ARL ARMY RESEARCH LAB http://www.genias.de/dpcnews/ GENIAS ADDS DYNAMIC RESOURCE & POLICY MGMT TO LINUX http://www.genias.de/dpcnews/ GRD STOPS FLOODING SYSTEM WITH MANY JOBS http://www.genias.de/dpcnews/ PaTENT MPI ACCELERATES MARC K7.3 FE ANALYSIS CODE http://www.genias.de/dpcnews/ + http://www.marc.com/Techniques/ CONFERENCES on DPC, Dec'98 - March'99: - Workshop on Performance Evaluation with Realistic Applications (sponsored by SPEC), San Jose, CA USA, Jan 25 1999: http://www.spec.org/news/specworkshop.html - ACPC99, 4th Int. Conf. on Parallel Computation, ACPC Salzburg, Austria, February 16-18 1999: http://www.coma.sbg.ac.at/acpc99/index.html - MPIDC'99, Message Passing Interface Developer's and User's Conference, Atlanta, Georgia USA, March 10-12 1999: http://www.mpidc.org - 9th SIAM Conf. on Parallel Processing for Scientific Computing, San Antonio, Texas USA, March 22-24 1999: http://www.siam.org/meetings/pp99/ - 25th Speedup Workshop, Lugano, Switzerland, March 25-26 1999: http://www.speedup.ch/Workshops/Workshop25Ann.html - CC99, 2nd German Workshop on Cluster Computing, Karlsruhe, Germany, March 25-26 1999: http://www.tu-chemnitz.de/informati/RA/CC99 More on GENIAS Webpage: http://www.genias.de/dpcnews/ NEW DPC BOOKS: - Parallel Computing Using Optimal Interconnections, Kequin Li, Yi Pan, Si Qing Zheng. Kluwer Publ. 1998: http://www.mcs.newpaltz.edu/~li/pcuoi.html - High-Performance Computing, Contributions to Society, T. Tabor (Ed.), 1998: http://www.tgc.com - Special Issue on Metacomputing, W. Nagel, R. Williams (Eds.), Int. J. Parallel Computing, Vol. 24, No. 12-13, Elsevier Science 1998: http://www.elsevier.nl/locate/parco More books on DPC on GENIAS Webpage: http://www.genias.de/dpcnews/ DPC WEBPAGES: - PRIMEUR: HPC electronic news magazine: http://www.hoise.com - PRIMEUR List of ESPRIT Projects: http://www.hoise.com/CECupdate/contentscecdec98.html - HPCwire, Email Newsletter: http://www.tgc.com/hpcwire.html/ - EuroTools, European HPCN Tools Working Group http://www.irisa.fr/eurotools - PTOOLS, Parallel Tools Consortium: http://www.ptools.org - TOP500: 500 fastest supercomputers: http://www.top500.org - PROSOMA: Technology fair describing hundreds of European CEC funded projects: http://www.prosoma.lu/ - Links to Linux Cluster Projects: http://www.linux-magazin.de/cluster/ More DPC WebLinks: http://www.genias.de/dpcnews/ NEWS ON HPC BENCHMARKS: - STREAM, Memory Performance Benchmark from John McCalpin: http://www.cs.virginia.edu/stream/ GENIAS JOBS: - For our CODINE/GRD Devel. Team: Software engineer with experience in GUI development under OSF/Motif, Java and Windows, distributed computing, resource mgmt systems under Unix and NT: http://www.genias.de/jobs/ CALL FOR PAPERS in upcoming Journals: - Message Passing Interface-based Parallel Programming with Java: deadline Dec.
15 1999: http://hpc-journals.ecs.soton.ac.uk/CPE/Special/MPI-Java End of DPC Newsletter ========================================================================== From owner-parkbench-comm@CS.UTK.EDU Sun Jan 24 12:15:02 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id MAA26189; Sun, 24 Jan 1999 12:15:01 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA08151; Sun, 24 Jan 1999 12:08:47 -0500 (EST) Received: from serv1.is4.u-net.net ([195.102.240.252]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id MAA08144; Sun, 24 Jan 1999 12:08:44 -0500 (EST) Received: from mordillo [195.102.198.114] by serv1.is4.u-net.net with smtp (Exim 1.73 #1) id 104T1E-0003IJ-00; Sun, 24 Jan 1999 17:08:17 +0000 Date: Sun, 24 Jan 1999 17:05:53 +0000 From: Mark Baker Subject: New PEMCS paper. To: parkbench-comm@CS.UTK.EDU X-Mailer: Z-Mail Pro 6.2, NetManage Inc. [ZM62_16H] X-Priority: 3 (Normal) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=ISO-8859-1 A new PEMCS paper has just been accepted and published... Comparing the Scalability of the Cray T3E-600 and the Cray Origin 2000 Using SHMEM Routines, by Glenn R. Luecke, Bruno Raffin and James J. Coyle, Iowa State University, Ames, Iowa USA Check out... http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/ Regards Mark ------------------------------------- DCS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: Mark.Baker@port.ac.uk Date: 01/24/1999 - Time: 17:05:53 URL: http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Tue Feb 2 08:17:19 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id IAA08459; Tue, 2 Feb 1999 08:17:19 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id HAA01393; Tue, 2 Feb 1999 07:42:18 -0500 (EST) Received: from serv1.is1.u-net.net (serv1.is1.u-net.net [195.102.240.129]) by CS.UTK.EDU with ESMTP (cf v2.9s-UTK) id HAA01386; Tue, 2 Feb 1999 07:42:16 -0500 (EST) Received: from [148.197.205.63] (helo=mordillo) by serv1.is1.u-net.net with smtp (Exim 2.00 #2) for parkbench-comm@cs.utk.edu id 107f7u-0005uS-00; Tue, 2 Feb 1999 12:40:22 +0000 Date: Tue, 2 Feb 1999 12:40:29 +0000 From: Mark Baker Subject: New PEMCS Paper - resend... To: parkbench-comm@CS.UTK.EDU X-Mailer: Z-Mail Pro 6.2, NetManage Inc. [ZM62_16H] X-Priority: 3 (Normal) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=ISO-8859-1 Apologies for the resend - I think this email got lost when I sent it a couple of weeks back. --------------------------------------------------------------------------- A new PEMCS paper has just been accepted and published... "Comparing the Scalability of the Cray T3E-600 and the Cray Origin 2000 Using SHMEM Routines", by Glenn R. Luecke, Bruno Raffin and James J. Coyle, Iowa State University, Ames, Iowa USA Check out...
http://hpc-journals.ecs.soton.ac.uk/PEMCS/Papers/ Regards Mark ------------------------------------- DCS, University of Portsmouth, Hants, UK Tel: +44 1705 844285 Fax: +44 1705 844006 E-mail: Mark.Baker@port.ac.uk Date: 02/02/1999 - Time: 12:40:29 URL: http://www.dcs.port.ac.uk/~mab/ ------------------------------------- From owner-parkbench-comm@CS.UTK.EDU Tue Mar 2 10:35:47 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id KAA18531; Tue, 2 Mar 1999 10:35:46 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA01804; Tue, 2 Mar 1999 10:18:56 -0500 (EST) Received: from gimli.genias.de (qmailr@GIMLI.genias.de [192.129.37.12]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id KAA01781; Tue, 2 Mar 1999 10:18:49 -0500 (EST) Received: (qmail 8905 invoked from network); 2 Mar 1999 15:19:10 -0000 Received: from fangorn.genias.de (192.129.37.74) by gimli.genias.de with SMTP; 2 Mar 1999 15:19:10 -0000 Received: (from daemon@localhost) by FANGORN.genias.de (8.8.8/8.8.8) id QAA13715; Tue, 2 Mar 1999 16:19:05 +0100 Date: Tue, 2 Mar 1999 16:19:05 +0100 Message-Id: <199903021519.QAA13715@FANGORN.genias.de> To: parkbench-comm@CS.UTK.EDU From: Majordomo@genias.de Subject: Welcome to newsletter Reply-To: Majordomo@genias.de -- Welcome to the newsletter mailing list! Please save this message for future reference. Thank you. If you ever want to remove yourself from this mailing list, send the following command in email to : unsubscribe Or you can send mail to with the following command in the body of your email message: unsubscribe newsletter or from another account, besides parkbench-comm@CS.UTK.EDU: unsubscribe newsletter parkbench-comm@CS.UTK.EDU If you ever need to get in contact with the owner of the list, (if you have trouble unsubscribing, or have questions about the list itself) send email to . This is the general rule for most mailing lists when you need to contact a human. Here's the general information for the list you've subscribed to, in case you don't already have it: The GENIAS Newsletter keeps you informed about new products, services and information about High Performance Computing. It serves as an addition to our printed newsletter that is distributed to our customers. To see our printed version, just visit our web-site http://www.genias.de and follow the link 'newsletter'. From owner-parkbench-comm@CS.UTK.EDU Wed Mar 3 02:34:35 1999 Return-Path: Received: from CS.UTK.EDU (CS.UTK.EDU [128.169.94.1]) by netlib2.cs.utk.edu with ESMTP (cf v2.9t-netlib) id CAA01271; Wed, 3 Mar 1999 02:34:35 -0500 Received: from localhost (root@localhost) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id CAA00679; Wed, 3 Mar 1999 02:32:28 -0500 (EST) Received: from gimli.genias.de (qmailr@GIMLI.genias.de [192.129.37.12]) by CS.UTK.EDU with SMTP (cf v2.9s-UTK) id CAA00668; Wed, 3 Mar 1999 02:32:25 -0500 (EST) Received: (qmail 10306 invoked from network); 3 Mar 1999 07:32:58 -0000 Received: from gandalf.genias.de (192.129.37.10) by gimli.genias.de with SMTP; 3 Mar 1999 07:32:58 -0000 Received: by GANDALF.genias.de (Smail3.1.28.1 #30) id m10I69J-000B10C; Wed, 3 Mar 99 08:32 MET Message-Id: From: gentzsch@genias.de (Wolfgang Gentzsch) Subject: sorry! 
To: parkbench-comm@CS.UTK.EDU Date: Wed, 3 Mar 99 8:32:57 MET Cc: gent@genias.de (Wolfgang Gentzsch) Reply-To: gentzsch@genias.de X-Mailer: ELM [version 2.3 PL11] Dear colleagues, I just discovered that the parkbench-comm@CS.UTK.EDU address has been collected into the mailing list for our electronic DPC Newsletter. I very much apologize for this mistake. Thank you for your understanding! Kind regards Wolfgang -- -- subscribe now to http://www.genias.de/dpcnews/ -- - - - - - - - - - - - - - - - - - - - - - - - - - - - Wolfgang Gentzsch, CEO Tel: +49 9401 9200-0 GENIAS Software GmbH & Inc Fax: +49 9401 9200-92 Erzgebirgstr. 2 http://www.geniasoft.com D-93073 Neutraubling, Germany gentzsch@geniasoft.com - - - - - - - - - - - - - - - - - - - - - - - - - - - GENIAS Software Inc. Tel: 410 455 5580 UMBC Technology Center Fax: 410 455 5567 1450 S. Rolling Road http://www.geniasoft.com Baltimore, MD 21227, USA gentzsch@geniasoft.com = = = = = = = = = = = = = = = = = = = = = = = = = = = .