   Single Precision                                         January-1994


                   GEMM-Based Level 3 BLAS Benchmark


   The GEMM-Based Level 3 BLAS Benchmark is a tool for performance
   evaluation of Level 3 BLAS kernel programs. With the announcement of
   LAPACK, the need for high performance Level 3 BLAS kernels became
   apparent. LAPACK is based on calls to the Level 3 BLAS kernels. This
   benchmark measures and compares performance of a set of user-supplied
   Level 3 BLAS implementations and of the GEMM-Based Level 3 BLAS
   implementations permanently included in the benchmark. The purpose of
   the benchmark is to help the user assess the quality of different
   Level 3 BLAS implementations. The included GEMM-Based Level 3 BLAS
   routines provide a lower limit on the performance to be expected from
   a highly optimized Level 3 BLAS library.

   The user supplies a set of Level 3 BLAS implementations to be
   evaluated. These are linked with the benchmark program. When the
   benchmark executes, timings are performed according to specifications
   given in an input file. An example input file is provided as
   'example.in'. The user may design custom tests, use the example input
   file, or use the enclosed input files that specify the proposed
   standard tests.

   The output optionally presents the following results:

      A   A collected mean value, calculated from the performance
          results of the individual user-supplied Level 3 BLAS routines
          for specified problem configurations.

      B   Tables showing performance results in megaflops, and
          comparisons between different routines, calculated as the
          performance result of one Level 3 BLAS routine divided by the
          performance result of another Level 3 BLAS routine.
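
   As a rough illustration of the quantities reported in the tables B,
   the sketch below converts a timing into megaflops and forms the ratio
   of two routines. It is not taken from the benchmark source: the
   function names, the timings, and the operation count 2*m*n*k for a
   matrix multiply are assumptions made for this example only.

```python
def gemm_flops(m, n, k):
    """Assumed operation count for C := alpha*A*B + beta*C (2*m*n*k)."""
    return 2 * m * n * k


def megaflops(flops, seconds):
    """Convert an operation count and an elapsed time into megaflops."""
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return flops / seconds / 1.0e6


def comparison(mflops_a, mflops_b):
    """Performance of routine A divided by performance of routine B."""
    return mflops_a / mflops_b


# Hypothetical timings (in seconds) for the same 256 x 256 x 256 problem.
flops = gemm_flops(256, 256, 256)
user = megaflops(flops, 0.5)        # user-supplied routine
gemm_based = megaflops(flops, 1.0)  # GEMM-Based routine
ratio = comparison(user, gemm_based)
```

   With these made-up timings, a ratio greater than 1 means that the
   user-supplied routine ran faster than the GEMM-Based routine.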

   The purpose of the collected result A is to provide a performance
   result for the user-supplied routines that can easily be compared
   between different machines. We propose two standard tests with
   different problem configurations, SMARK01 and SMARK02 (see the input
   files 'smark01.in' and 'smark02.in').
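
   The idea of a single collected value can be sketched as follows. This
   text does not define how the mean is formed, so the sketch assumes a
   plain arithmetic mean over per-routine results purely for
   illustration. The megaflops figures are hypothetical; only the
   routine names are the standard single precision Level 3 BLAS names.

```python
def collected_mean(mflops_results):
    """Arithmetic mean of megaflops results (assumed definition)."""
    values = list(mflops_results)
    if not values:
        raise ValueError("need at least one performance result")
    return sum(values) / len(values)


# Hypothetical per-routine results in megaflops for one test.
results = {
    "SGEMM": 40.0,
    "SSYMM": 30.0,
    "SSYRK": 35.0,
    "SSYR2K": 35.0,
    "STRMM": 30.0,
    "STRSM": 40.0,
}
mean = collected_mean(results.values())
```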

   The tables B are intended for program developers and others who are
   interested in detailed performance information from the routines.

   All routines are written in Fortran 77 for portability. No changes to
   the code should be necessary in order to run the programs correctly
   on different target machines. In fact, we strongly recommend that the
   user avoid changes, except to the user-specified parameters and to
   the UNIT numbers for input and output. This ensures that performance
   results from different target machines are comparable.

   We intend to collect benchmark results from different machines.
   Please send us the results obtained with the proposed standard input
   files 'smarkXX.in'.

   For further information see: Users' and Installation guide for the
   GEMM-Based Level 3 BLAS Benchmark, enclosed in the INSTALL file.



   Per Ling
   Institute of Information Processing
   University of Umea
   S-901 87 Umea, Sweden
   E-mail: pol@cs.umu.se



   For further information see:

   Anderson E., Bai Z., Bischof C., Demmel J., Dongarra J.,
       DuCroz J., Greenbaum A., Hammarling S., McKenney A.,
       Ostrouchov S., and Sorensen D., "LAPACK Users' Guide", Society
       for Industrial and Applied Mathematics, Philadelphia, 1992.

   Dongarra J. J., DuCroz J., Duff I., and Hammarling S., "A Set of
       Level 3 Basic Linear Algebra Subprograms", ACM Trans. Math.
       Softw., Vol. 16, No. 1, 1990, pp.1-17.

   Dongarra J. J., DuCroz J., Duff I., and Hammarling S., "Algorithm
       679: A Set of Level 3 Basic Linear Algebra Subprograms: Model
       Implementation and Test Programs", ACM Trans. Math. Softw.,
       Vol. 16, No. 1, 1990, pp.18-28.

   Kagstrom B., Ling P., and Van Loan C., "High Performance GEMM-Based
       Level-3 BLAS: Sample Routines for Double Precision Real Data",
       in High Performance Computing II, Durand M. and El Dabaghi F.,
       eds., Amsterdam, 1991, North-Holland, pp.269-281.

   Kagstrom B., Ling P., and Van Loan C., "Portable High Performance
       GEMM-Based Level-3 BLAS", in R. F. Sincovec et al., eds.,
       Parallel Processing for Scientific Computing, SIAM Publications,
       1993.

   Kagstrom B. and Van Loan C., "GEMM-Based Level-3 BLAS", Tech. rep.
       CTC91TR47, Department of Computer Science, Cornell University,
       Dec. 1989.
