From dbailey@ew11.nas.nasa.gov Tue May 3 17:38:03 1988
Return-Path:
Received: from anl-mcs.ARPA by antares.mcs.anl (3.2/SMI-3.2)
        id AA09315; Tue, 3 May 88 17:38:01 CDT
Received: from ew11.nas.nasa.gov (3f166680) by anl-mcs.ARPA (4.12/4.9)
        id AA02534; Tue, 3 May 88 17:43:34 cdt
Received: Tue, 3 May 88 15:40:25 PDT by ew11.nas.nasa.gov (5.52/1.2)
Date: Tue, 3 May 88 15:40:25 PDT
From: David Bailey
Message-Id: <8805032240.AA08850@ew11.nas.nasa.gov>
To: ahh@lanl.gov, brooks@maddog.llnl.gov, dongarra@anl-mcs.arpa,
        lyon@icst-cmr.arpa, mth@ornl-msr.arpa, rc@icst-cmr.arpa
Subject: ITA
Status: R

I have prepared a vector logical benchmark test (attached).  Please let me
know if you have any comments.

DHB
---------------------------------
      PROGRAM LOGICB
C
C   This is a vector logical benchmark test proposed to be a part of the ITA
C   benchmark suite.  It measures the long vector performance of a system
C   in performing full-word bit-wise logical operations.  Because the current
C   Fortran standard does not provide an efficient means of specifying such
C   operations, they are specified in this program using these functions:
C
C   IAND (I1, I2)   64-bit bit-wise "and" of I1 and I2
C   IOR (I1, I2)    64-bit bit-wise "or" of I1 and I2
C   IPAK (I1, I2)   Packs the 32-bit I1 and 32-bit I2 into a 64-bit result
C                   (the hardware format of this packing is immaterial)
C   IUPK1 (I3)      Unpacks the 64-bit I3 to obtain the equivalent of I1
C                   in the definition of IPAK
C   IUPK2 (I3)      Unpacks the 64-bit I3 to obtain the equivalent of I2
C                   in the definition of IPAK
C
C   It is anticipated that some revision will be necessary to execute the
C   following code on a particular computer system.  The revised code need not
C   conform to Fortran-77; indeed, any language that utilizes the full
C   power of the hardware may be used, provided the same operations are
C   performed.
C
C   This version assumes that the INTEGER data type can hold 64 bits of data,
C   and that arithmetic operations on positive integers are valid for results
C   up to 2^46.
C
C   David H. Bailey     May 3, 1988
C
      PARAMETER (N1 = 1024, N2 = 128, NN = N1 * N2)
      DIMENSION IA(N1,N2), IB(N1,N2), IC(N1,N2), ID(N1,N2), IE(N1,N2)
C>
C   The following in-line definitions suffice for Cray computers:
C
      IAND (I1, I2) = AND (I1, I2)
      IOR (I1, I2) = OR (I1, I2)
      IPAK (I1, I2) = OR (SHIFTL (I1, 32), I2)
      IUPK1 (I3) = SHIFTR (I3, 32)
      IUPK2 (I3) = AND (I3, 2 ** 32 - 1)
C>
C   Fill the arrays with random bits.
C
      CALL RANDL (0, IA)
      CALL RANDL (NN, IA)
      CALL RANDL (NN, IB)
      CALL RANDL (NN, IC)
      CALL RANDL (NN, ID)
      WRITE (6, 1) N1, N2
 1    FORMAT ('VECTOR LOGICAL PERFORMANCE TEST (64 BITS PER WORD)'//
     $ 'ARRAY DIMENSIONS =', 2I8// 'CHECK VALUES:')
C
C   Begin timing tests.  The SECOND function is assumed to be the CPU
C   timing function on the given computer system.
C
      T10 = SECOND ()
C
      DO 100 J = 1, N2
        DO 100 I = 1, N1
          IE(I,J) = AND (IA(1,1), IB(I,J))
 100  CONTINUE
C
      T11 = SECOND ()
      WRITE (6, '(2I15)') IUPK1 (IE(2,3)), IUPK2 (IE(2,3))
      T20 = SECOND ()
C
      DO 110 J = 1, N2
        DO 110 I = 1, N1
          IE(I,J) = OR (IA(1,1), IB(I,J))
 110  CONTINUE
C
      T21 = SECOND ()
      WRITE (6, '(2I15)') IUPK1 (IE(5,7)), IUPK2 (IE(5,7))
      T30 = SECOND ()
C
      DO 120 J = 1, N2
        DO 120 I = 1, N1
          IE(I,J) = AND (IA(I,J), IB(I,J))
 120  CONTINUE
C
      T31 = SECOND ()
      WRITE (6, '(2I15)') IUPK1 (IE(11,13)), IUPK2 (IE(11,13))
      T40 = SECOND ()
C
      DO 130 J = 1, N2
        DO 130 I = 1, N1
          IE(I,J) = OR (IA(I,J), IB(I,J))
 130  CONTINUE
C
      T41 = SECOND ()
      WRITE (6, '(2I15)') IUPK1 (IE(17,19)), IUPK2 (IE(17,19))
      T50 = SECOND ()
C
      DO 150 J = 1, N2
        DO 150 I = 1, N1
          IE(I,J) = OR (IA(I,J), AND (IA(1,1), IB(I,J)))
 150  CONTINUE
C
      T51 = SECOND ()
      WRITE (6, '(2I15)') IUPK1 (IE(23,29)), IUPK2 (IE(23,29))
      T60 = SECOND ()
C
      DO 160 J = 1, N2
        DO 160 I = 1, N1
          IE(I,J) = OR (IA(I,J), AND (IB(I,J), IC(I,J)))
 160  CONTINUE
C
      T61 = SECOND ()
      WRITE (6, '(2I15)') IUPK1 (IE(31,37)), IUPK2 (IE(31,37))
      T70 = SECOND ()
C
      DO 170 J = 1, N2
        DO 170 I = 1, N1
          IE(I,J) = OR (AND (IA(I,J), IB(I,J)), AND (IC(I,J), ID(I,J)))
 170  CONTINUE
C
      T71 = SECOND ()
      WRITE (6, '(2I15)') IUPK1 (IE(41,43)), IUPK2 (IE(41,43))
C
C   Output results.
C
      R1 = NN * 64. * 1E-6
      R2 = 2. * R1
      R3 = 3. * R1
      WRITE (6, 2) R1 / (T11 - T10), R1 / (T21 - T20),
     $ R1 / (T31 - T30), R1 / (T41 - T40), R2 / (T51 - T50),
     $ R2 / (T61 - T60), R3 / (T71 - T70)
 2    FORMAT (/'PERFORMANCES IN MLOPS:'/ 'V = S a V', F22.2/'V = S o V',
     $ F22.2/ 'V = V a V', F22.2/ 'V = V o V', F22.2/'V = V o (S a V)',
     $ F16.2/ 'V = V o (V a V)', F16.2/ 'V = (V a V) o (V a V)', F10.2)
C
      STOP
      END
C
      SUBROUTINE RANDL (N, IA)
C
C   This pseudorandom number generator for vector computers is based on a
C   lagged Fibonacci scheme with lags 5 and 17:
C
C   IB(K) = (IB(K-5) + IB(K-17)) MOD 2^32
C
C   The IB array is actually a 128 x 17 array (in order to facilitate
C   vector processing).  The array IA is obtained from IB.
C
C   This version assumes that N is a multiple of 64.  Subsequent calls
C   generate additional pseudorandom data in a continuous Fibonacci
C   sequence.  It is initialized by calling with N equal to zero.  This
C   routine should produce the same pseudorandom sequence on any system
C   that supports 64-bit INTEGER data with arithmetic valid on positive
C   integers of size up to 2^46.
C
C   David H. Bailey     May 2, 1988
C
      PARAMETER (IBS = 2176, M32 = 2 ** 32 - 1, M = 5 ** 6, L = 2222,
     $ IT0 = 3141592653)
      DIMENSION IA(N)
      COMMON /RANP/ IP1, IP2, IB(IBS)
C>>
C   The following in-line definitions suffice for Cray computers:
C
      IAND (I1, I2) = AND (I1, I2)
      IOR (I1, I2) = OR (I1, I2)
      IPAK (I1, I2) = OR (SHIFTL (I1, 32), I2)
      IUPK1 (I3) = SHIFTR (I3, 32)
      IUPK2 (I3) = AND (I3, 2 ** 32 - 1)
C>>
C   This section is executed only during initialization.
C
      IF (N .EQ. 0) THEN
        IP1 = 0
        IP2 = 1536
        IB(1) = IT0
C
C   Use a linear congruential pseudorandom number generator to initialize IB.
C
        DO 100 I = 2, IBS
          IB(I) = AND (M * IB(I-1) + L, M32)
 100    CONTINUE
      ENDIF
C
C   For a normal call, use a vectorizable lagged Fibonacci scheme.
C   Two 32-bit results are combined to generate one 64-bit output value.
C
      DO 130 K = 0, N - 64, 64
C
C   Both of the next two loops are vectorizable.
C>
CDIR$ IVDEP
        DO 110 I = 1, 128
          IB(I+IP1) = AND (IB(I+IP1) + IB(I+IP2), M32)
 110    CONTINUE
C
        DO 120 I = 1, 64
          IA(I+K) = IPAK (IB(I+IP1), IB(I+IP1+64))
 120    CONTINUE
C
        IP1 = IP1 + 128
        IF (IP1 .EQ. IBS) IP1 = 0
        IP2 = IP2 + 128
        IF (IP2 .EQ. IBS) IP2 = 0
 130  CONTINUE
C
      RETURN
      END
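
When the code is revised for a system that lacks the Cray AND, OR, SHIFTL and SHIFTR intrinsics, an independent model of IPAK, IUPK1, IUPK2 and RANDL may help in confirming that the port generates the same data. The following is a minimal Python sketch and is not part of the benchmark as submitted. The class name RandlModel and the method name fill are hypothetical, chosen only for illustration. It assumes that words are treated as unsigned 64-bit quantities and that the right shift in IUPK1 is logical, as SHIFTR is on Cray systems; Python integers are unbounded, so the 2^46 arithmetic limit never arises here.

```python
# Illustrative model of the packing functions and of RANDL.
# Names RandlModel and fill are hypothetical (not in the original code).

M32 = 2**32 - 1      # mask for the low 32 bits (M32 in RANDL)
IBS = 2176           # size of the IB state array, 128 x 17
M = 5**6             # multiplier of the linear congruential initializer
L = 2222             # increment of the linear congruential initializer
IT0 = 3141592653     # initial seed value


def ipak(i1, i2):
    """Pack two 32-bit values into one 64-bit word, I1 in the upper half."""
    return (i1 << 32) | i2


def iupk1(i3):
    """Recover the upper 32 bits (the I1 argument of ipak)."""
    return i3 >> 32


def iupk2(i3):
    """Recover the lower 32 bits (the I2 argument of ipak)."""
    return i3 & M32


class RandlModel:
    """Holds the state that RANDL keeps in COMMON /RANP/."""

    def __init__(self):
        # Equivalent of CALL RANDL (0, IA): reset pointers and seed IB
        # with the linear congruential generator.
        self.ip1 = 0
        self.ip2 = 1536
        self.ib = [IT0]
        for _ in range(1, IBS):
            self.ib.append((M * self.ib[-1] + L) & M32)

    def fill(self, n):
        """Equivalent of CALL RANDL (N, IA) for N a positive multiple of 64."""
        if n % 64 != 0:
            raise ValueError("n must be a multiple of 64")
        ib = self.ib
        out = []
        for _ in range(n // 64):
            # Lagged Fibonacci update of one 128-word block (loop 110).
            for i in range(128):
                ib[self.ip1 + i] = (ib[self.ip1 + i] + ib[self.ip2 + i]) & M32
            # Combine pairs of 32-bit results into 64-bit outputs (loop 120).
            for i in range(64):
                out.append(ipak(ib[self.ip1 + i], ib[self.ip1 + i + 64]))
            # Advance both block pointers circularly.
            self.ip1 = (self.ip1 + 128) % IBS
            self.ip2 = (self.ip2 + 128) % IBS
        return out
```

In this model, constructing RandlModel() plays the role of CALL RANDL (0, IA), and each later fill(NN) call corresponds to one CALL RANDL (NN, ...) in LOGICB, in the same order (IA, IB, IC, ID). Because the Fortran arrays are stored in column-major order, element IA(I,J) corresponds to index (I-1) + (J-1)*N1 of the returned list.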