   Double Complex                                            August-1994



                     Users' and Installation guide
               for the GEMM-Based Level 3 BLAS Benchmark

                               Per Ling
                  Institute of Information Processing
                      University of Umea, Sweden

                            August 11, 1994



   Please see the Users' and Installation Guide for the double precision
   version of this benchmark program. Installing the complex version is
   similar. Apart from the obvious differences, note the differences in
   the flop count and the GEMM-Efficiency, which are described below.

   GEMM-Efficiency is measured with problem configurations for ZGEMM
   that correspond to the problem configurations used for the remaining
   Level 3 BLAS routines. Performance of the Level 3
   BLAS problems

      ZSYMM(  side, uplo,  m, n, alpha, A, lda, B, ldb, beta, C, ldc ),
      ZSYRK(  uplo, trans, n, k, alpha, A, lda, beta, C, ldc ),
      ZSYR2K( uplo, trans, n, k, alpha, A, lda, B, ldb, beta, C, ldc ),
      ZTRMM(  side, uplo, trans, diag, m, n, alpha, A, lda, C, ldc ),
      ZTRSM(  side, uplo, trans, diag, m, n, alpha, A, lda, C, ldc )

   where alpha = ( 0.9, 0.05 ), beta = ( 1.1, 0.03 ), and
   lda = ldb = ldc, is compared with the performance of the following
   problems for ZGEMM:

      -----------------------------------------------------------
      Level 3 BLAS        |      Input parameters for ZGEMM
                          |
      routine  side trans | transa transb  m  n  k  A  B  C  beta
      -----------------------------------------------------------
                          |
      ZSYMM     'L'       |   'N'    'N'   m  n  m  A  B  C  beta
                'R'       |   'N'    'N'   m  n  n  B  A  C  beta
                          |
      ZSYRK          'N'  |   'N'    'T'   n  n  k  A  A  C  beta
                     'T'  |   'T'    'N'   n  n  k  A  A  C  beta
                          |
      ZSYR2K         'N'  |   'N'    'T'   n  n  k  A  B  C  beta
                     'T'  |   'T'    'N'   n  n  k  A  B  C  beta
                          |
      ZTRMM,              |
      ZTRSM     'L'       |  trans   'N'   m  n  m  A  B  C  one
                'R'       |   'N'   trans  m  n  n  B  A  C  one
                          |
      -----------------------------------------------------------
      (Parameters for ZGEMM not shown in the table equal the
      parameters of the Level 3 BLAS routine that ZGEMM is compared
      with. The value one for beta is one = ( 1.0, 0.0 ).)
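
   The table above can be read as a lookup from a Level 3 BLAS problem
   to its ZGEMM problem. The following is a minimal sketch in Python,
   not part of the benchmark itself; the names GemmProblem and
   zgemm_problem are hypothetical, and only the columns transa, transb,
   m, n, k and beta are transcribed.

```python
from collections import namedtuple

# Field names follow the column headings of the table above.
GemmProblem = namedtuple("GemmProblem", "transa transb m n k beta")


def zgemm_problem(routine, m=0, n=0, k=0, side=None, trans=None):
    """Return the ZGEMM problem the table pairs with a Level 3 BLAS problem."""
    if routine == "ZSYMM":
        inner = m if side == "L" else n  # order of the symmetric matrix
        return GemmProblem("N", "N", m, n, inner, "beta")
    if routine in ("ZSYRK", "ZSYR2K"):
        if trans == "N":
            return GemmProblem("N", "T", n, n, k, "beta")
        return GemmProblem("T", "N", n, n, k, "beta")
    if routine in ("ZTRMM", "ZTRSM"):
        # beta is fixed to one = ( 1.0, 0.0 ) for the triangular routines.
        if side == "L":
            return GemmProblem(trans, "N", m, n, m, "one")
        return GemmProblem("N", trans, m, n, n, "one")
    raise ValueError("routine not covered by the table: " + routine)


print(zgemm_problem("ZTRSM", m=300, n=200, side="R", trans="T"))
```

   For side = 'R' the inner dimension k of the ZGEMM problem is n, and
   the trans argument of ZTRSM moves to transb.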

   The number of floating point operations (flop) performed by a Level 3
   BLAS routine is divided by the execution time in seconds multiplied
   by 1 000 000, that is, flop / ( seconds * 1 000 000 ), to obtain the
   performance in megaflops. The number of floating point operations
   performed is calculated as follows:

      ----------------------------------------------------------------
      Level 3 BLAS        |  nops : number of operations for a Level 3
                          |         BLAS problem.
                          |  gops : number of operations for the
      routine  side diag  |         corresponding ZGEMM problem.
      ----------------------------------------------------------------
                          |
      ZSYMM     'L'       |  nops:
                          |   mult = ( m+1 )mn + min( mn, m( m+1 )/2 )
                          |   add  = mmn
                          |  gops:
                          |   mult = ( m+1 )mn + min( mn, mm )
                          |   add  = mmn
                          |
                'R'       |  nops:
                          |   mult = ( n+1 )mn + min( mn, n( n+1 )/2 )
                          |   add  = mnn
                          |  gops:
                          |   mult = ( n+1 )mn + min( mn, nn )
                          |   add  = mnn
                          |
      ZSYRK               |  nops:
                          |   mult = ( k+1 )( n( n+1 )/2 ) +
                          |                      min( nk, n( n+1 )/2 )
                          |   add  = k( n( n+1 )/2 )
                          |  gops:
                          |   mult = ( k+1 )nn + min( nk, nn )
                          |   add  = knn
                          |
      ZSYR2K              |  nops:
                          |   mult = ( 2k+1 )( n( n+1 )/2 ) +
                          |                       min( 2nk, n( n+1 ) )
                          |   add  = kn(n+1)
                          |  gops:
                          |   mult = ( k+1 )nn + min( nk, nn )
                          |   add  = knn
                          |
      ZTRMM,              |
      ZTRSM     'L'  'N'  |  nops:
                          |   mult = ( m( m+1 )/2 )n +
                          |                      min( mn, m( m+1 )/2 )
                          |   add  = ( m( m-1 )/2 )n
                'L'  'U'  |  nops:
                          |   mult = ( m( m-1 )/2 )n +
                          |                      min( mn, m( m+1 )/2 )
                          |   add  = ( m( m-1 )/2 )n
                'L'       |  gops:
                          |   mult = mmn + min( mn, mm )
                          |   add  = m( m-1 )n
                          |
                'R'  'N'  |  nops:
                          |   mult = m( n( n+1 )/2 ) +
                          |                      min( mn, n( n+1 )/2 )
                          |   add  = m( n( n-1 )/2 )
                'R'  'U'  |  nops:
                          |   mult = m( n( n-1 )/2 ) +
                          |                      min( mn, n( n+1 )/2 )
                          |   add  = m( n( n-1 )/2 )
                'R'       |  gops:
                          |   mult = mnn + min( mn, nn )
                          |   add  = m( n-1 )n
                          |
      ----------------------------------------------------------------

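   The total number of operations (NOP) is

   o   NOP = 6*mult + 2*add,

   since a complex multiplication costs 6 real operations (4
   multiplications and 2 additions) and a complex addition costs 2.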
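
   As an illustration, the ZSYMM rows of the table and the NOP formula
   can be evaluated as follows. This is a sketch in Python under the
   assumption that megaflops means flop / ( seconds * 1 000 000 ); the
   function names are hypothetical and do not appear in the benchmark.

```python
def zsymm_counts(side, m, n):
    """Return ((mult, add) for ZSYMM, (mult, add) for the ZGEMM problem)."""
    p = m if side == "L" else n  # order of the symmetric matrix A
    nops = ((p + 1) * m * n + min(m * n, p * (p + 1) // 2), p * m * n)
    gops = ((p + 1) * m * n + min(m * n, p * p), p * m * n)
    return nops, gops


def total_ops(mult, add):
    """NOP = 6*mult + 2*add, counting complex operations as real ones."""
    return 6 * mult + 2 * add


def megaflops(ops, seconds):
    """Performance in megaflops for ops operations done in seconds."""
    return ops / (seconds * 1.0e6)


nops, gops = zsymm_counts("L", 2, 3)
print(total_ops(*nops), total_ops(*gops))
print(megaflops(total_ops(*nops), 0.5))
```

   For m = 2 and n = 3 with side = 'L', the counts are mult = 21 and
   add = 12 for ZSYMM, and mult = 22 and add = 12 for ZGEMM, giving
   NOP = 150 and NOP = 156 respectively.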

   For Hermitian matrices, the imaginary parts of the diagonal
   elements are always zero. Therefore it is not necessary to involve
   the imaginary parts of the diagonal elements in the computations.
   We can assume they are zero when reading a matrix and explicitly
   assign 0.0D+0 to them when writing a matrix.
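
   The convention for the diagonal can be sketched as below. This is an
   illustration in Python using nested lists of complex numbers, not
   the Fortran code of the benchmark; both helper names are
   hypothetical.

```python
def read_hermitian_diag(a, i):
    """Read diagonal element i, taking its imaginary part to be zero."""
    return a[i][i].real


def write_hermitian_diag(a, i, value):
    """Store a diagonal element with its imaginary part set to 0.0."""
    a[i][i] = complex(value.real, 0.0)


# The stored imaginary part of a[0][0] is never referenced.
a = [[complex(2.0, 7.0), complex(1.0, -1.0)],
     [complex(1.0, 1.0), complex(3.0, 0.0)]]
print(read_hermitian_diag(a, 0))
write_hermitian_diag(a, 1, complex(5.0, 0.25))
print(a[1][1])
```

   Because a diagonal element is then a real number, multiplying by it
   needs fewer real operations than a general complex multiplication.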

   Performance of the Level 3 BLAS routines

      ZHEMM(  side, uplo,  m, n, alpha, A, lda, B, ldb, beta, C, ldc ),
      ZHERK(  uplo, trans, n, k, alpha, A, lda, beta, C, ldc ),
      ZHER2K( uplo, trans, n, k, alpha, A, lda, B, ldb, beta, C, ldc ),

   where alpha = ( 0.9, 0.05 ), beta = ( 1.1, 0.03 ), and
   lda = ldb = ldc, is compared with the performance of the following
   problems for ZGEMM:

      -----------------------------------------------------------
      Level 3 BLAS        |      Input parameters for ZGEMM
                          |
      routine  side trans | transa transb  m  n  k  A  B  C
      -----------------------------------------------------------
                          |
      ZHEMM     'L'       |   'N'    'N'   m  n  m  A  B  C
                'R'       |   'N'    'N'   m  n  n  B  A  C
                          |
      ZHERK          'N'  |   'N'    'C'   n  n  k  A  A  C
                     'C'  |   'C'    'N'   n  n  k  A  A  C
                          |
      ZHER2K         'N'  |   'N'    'C'   n  n  k  A  B  C
                     'C'  |   'C'    'N'   n  n  k  A  B  C
                          |
      -----------------------------------------------------------
      (Parameters for ZGEMM not shown in the table equal the
      parameters of the Level 3 BLAS routine that ZGEMM is
      compared with.)

   The number of floating point operations (flop) performed by the
   Level 3 BLAS routines involving a Hermitian matrix is calculated
   as follows:

      ----------------------------------------------------------------
      Level 3 BLAS        |  nops : number of operations for a Level 3
                          |         BLAS problem.
                          |
                          |  gops : number of operations for the
      routine  side       |         corresponding ZGEMM problem.
      ----------------------------------------------------------------
                          |
      ZHEMM     'L'       |  nops:
                          |   mult = ( 6m+2 )mn + min( 6mn, 3mm-m )
                          |   add  = 2mmn
                          |  gops:
                          |   mult = ( 6m+6 )mn + min( 6mn, 6mm )
                          |   add  = 2mmn
                          |
                'R'       |  nops:
                          |   mult = ( 6n+2 )mn + min( 6mn, 3nn-n )
                          |   add  = 2mnn
                          |  gops:
                          |   mult = ( 6n+6 )mn + min( 6mn, 6nn )
                          |   add  = 2mnn
                          |
      ZHERK               |  nops:
                          |   mult = ( 3k+1 )nn + min( 2nk, nn )
                          |   add  = knn
                          |  gops:
                          |   mult = ( 6k+6 )nn + min( 6nk, 6nn )
                          |   add  = 2knn
                          |
      ZHER2K              |  nops:
                          |   mult = ( 6k+1 )nn + min( 12nk, 6nn-2n )
                          |   add  = 2knn
                          |  gops:
                          |   mult = ( 6k+6 )nn + min( 6nk, 6nn )
                          |   add  = 2knn
                          |
      ----------------------------------------------------------------

   where the total number of operations (NOP) is calculated as

   o   NOP = mult + add.
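
   In the Hermitian table the entries are summed directly, without the
   weights 6 and 2. A sketch in Python for the ZHEMM rows follows; the
   function name zhemm_nop is hypothetical.

```python
def zhemm_nop(side, m, n):
    """Return (NOP for ZHEMM, NOP for the corresponding ZGEMM problem)."""
    p = m if side == "L" else n  # order of the Hermitian matrix A
    nops_mult = (6 * p + 2) * m * n + min(6 * m * n, 3 * p * p - p)
    nops_add = 2 * p * m * n
    gops_mult = (6 * p + 6) * m * n + min(6 * m * n, 6 * p * p)
    gops_add = 2 * p * m * n
    # NOP = mult + add for the Hermitian routines.
    return nops_mult + nops_add, gops_mult + gops_add


print(zhemm_nop("L", 2, 3))
```

   For m = 2 and n = 3 with side = 'L' this gives 94 + 24 = 118
   operations for ZHEMM and 132 + 24 = 156 for ZGEMM. The ZGEMM total
   agrees with the symmetric table, where mult = 22 and add = 12 give
   6*22 + 2*12 = 156.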

   Notice also that the type of the scalars alpha and beta is not
   always complex for the Hermitian Level 3 BLAS routines.

                      alpha      beta

           ZHEMM:     complex    complex
           ZHERK:     real       real
           ZHER2K:    complex    real

   This of course affects the flop count.
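
   A rough illustration of the effect, in Python: the scalar types are
   taken from the list above, and the cost per element assumes the
   usual convention that a complex element times a complex scalar takes
   6 real operations and a complex element times a real scalar takes 2.
   The names SCALAR_TYPES and scaling_flops are hypothetical.

```python
# (type of alpha, type of beta) for each Hermitian routine.
SCALAR_TYPES = {
    "ZHEMM": ("complex", "complex"),
    "ZHERK": ("real", "real"),
    "ZHER2K": ("complex", "real"),
}


def scaling_flops(kind):
    """Real operations needed to scale one complex element."""
    return 6 if kind == "complex" else 2


for routine, (alpha_kind, beta_kind) in SCALAR_TYPES.items():
    print(routine, scaling_flops(alpha_kind), scaling_flops(beta_kind))
```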
