SUBROUTINE RANKCI(X, M, Y, N, PERC, DPOINT, DLOW, DHIGH, RAN 10 * IERR, ILOW, IHIGH) C SUBROUTINE TO FIND THE POINT ESTIMATE AND CONFIDENCE INTERVAL C RELATED TO THE TWO-SAMPLE RANK TEST (MANN-WHITNEY, WILCOXON), C FOR THE PARAMETER D = MU(Y) - MU(X), WHERE MU IS A MEASURE OF C LOCATION. C THE STATISTICAL ASSUMPTIONS ARE THAT X AND Y ARE RANDOM SAMPLES C TAKEN INDEPENDENTLY FROM TWO DIFFERENT POPULATIONS, AND THAT C THE POPULATIONS HAVE THE SAME DISTRIBUTION EXCEPT FOR LOCATION. C IN PARTICUAR, IT IS ASSUMED THAT THE SCALE PARAMETERS (VARIANCES, C IF THEY EXIST) ARE EQUAL. C THIS ROUTINE FINDS THE POINT ESTIMATE FOR D AS THAT VALUE DPOINT C SUCH THAT THE MANN-WHITNEY U STATISTIC ACHIEVES ITS EXPECTED C VALUE FOR THE TEST OF THE NULL HYPOTHESIS D = DPOINT (THE MIDDLE C SUCH VALUE IF IT IS NOT UNIQUE). C THE CONFIDENCE INTERVAL IS FOUND AS THOSE VALUES OF DD SUCH THAT C THE NULL HYPOTHESIS D = DD IS NOT REJECTED BY THE TWO SAMPLE RANK C TEST. C THE METHOD OF CALCULATING THE TEST STATISTIC U IS THE ORIGINAL C METHOD OF MANN AND WHITNEY, MODIFIED TO TAKE ADVANTAGE OF HAVING C THE DATA IN ORDER. THIS SPEEDS CALCULATIONS CONSIDERABLY, WHICH C IS IMPORTANT SINCE THE METHODS USED MUST EVALUATE THE STATISTIC C REPEATEDLY. C THE POINT AND CONFIDENCE INTERVALS ARE FOUND BY ITERATION, USING C TRIMMED MEANS AS STARTING POINTS. THE BASIC ITERATION PROCEDURE C IS A MODIFICATION OF THE REGULA FALSI (LINEAR INTERPOLATION) C METHOD, (DOWELL + JARRATT, 1971,BIT) WHICH CONVERGES QUICKLY C BECAUSE OF THE ASYMTOTIC LINEARITY OF THE U STATISTIC AS A C FUNCTION OF D (JURECKOVA, 1971, ANN. MATH. STAT.). C THE PERCENTAGE OF THE CONFIDENCE INTERVAL IS OBTAINED BY A NORMAL C APPROXIMATION, USING CONTINUITY CORRECTION. C INPUT IS AS FOLLOWS C X - ARRAY OF DATA OF DIMENSION M IN NON-DECREASING ORDER C M - NUMBER OF OBSERVATIONS IN X SAMPLE (AT LEAST 2) C Y - ARRAY OF DATA OF DIMENSION N IN NON-DECREASING ORDER C N - NUMBER OF OBSERVATIONS IN Y SAMPLE (AT LEAST 2) C (X,M,Y,AND N ARE UNCHANGED) C PERC - DESIRED CONFIDENCE PERCENT (BETWEEN 49.999 AND 99.999) C (PERC IS CHANGED TO THE NEAREST ATTAINABLE CONFIDENCE) C OUTPUT IS AS FOLLOWS C IERR - 0 IF NORMAL COMPLETION C 1 IF TOO LITTLE DATA (M OR N LESS THAN 2) C 2 IF INVALID PERCENTAGE C 3 IF DATA IS NOT IN ORDER C PERC - ACTUAL (APPROX.) CONFIDENCE OF THE INTERVAL C DPOINT - POINT ESTIMATE OF D C DLOW - LOWER CONFIDENCE LIMIT FOR D C ILOW - THE ORDER OF THE DIFFERENCE DLOW C DHIGH - UPPER CONFIDENCE LIMIT FOR D C IHIGH - THE ORDER OF THE DIFFERENCE DHIGH C WRITTEN JUNE 1975 BY T. RYAN AND J. MCKEAN DIMENSION X(M), Y(N) C INITIALIZE DPOINT = 0.0 DLOW = 0.0 DHIGH = 0.0 IERR = 0 XM = M XN = N C ERROR CHECKING CALL CHECK(X, M, Y, N, PERC, IERR) IF (IERR.GT.0) RETURN C ********************************************************************** C PRELIMINARY ESTIMATE CALL TMEAN(X, M, XBAR, VARXB) CALL TMEAN(Y, N, YBAR, VARYB) D = YBAR - XBAR V = VARXB + VARYB SD = SQRT(V) SD = AMAX1(SD,1.0E-20) T = 2.0 IF (M.LT.10 .OR. N.LT.10 .OR. PERC.GT.96.0) T = 3.0 DL = D - T*SD DH = D + T*SD C ********************************************************************** C DEFINE TOLERANCES EPS1 AND EPS2. THESE C TOLERANCES WERE SELECTED FOR IBM 360-370 C ACCURACY, APPROXIMATELY 6.5 DECIMAL PLACES. C DEPENDING ON MACHINE BEING USED AND WHETHER C DATA WARRANTS SUCH PRECISION, THE TOLERANCE C MAY BE LOWERED. C BASE TOLERANCE ON RANGE OF DATA Z = X(M) - X(1) ZZ = Y(N) - Y(1) EPS1 = (1.0E-7)*AMAX1(Z,ZZ) C BASE TOLERANCE ON THE MAGNITUDE OF THE DATA K = M/2 Z = X(K) K = N/2 ZZ = Y(K) EPS2 = (1.0E-6)*AMAX1(Z,-Z,ZZ,-ZZ) EPS1 = AMAX1(EPS1,EPS2) C BASE TOLERANCE ON WIDTH OF CONFIDENCE INTERVAL C (BASED ON PRELIMINARY ESTIMATES). THE POINT C ESTIMATE SHOULD BE ACCURATE TO 4 SIG. DIGITS IN C THE WIDTH (APPROX.). EPS2 = SD*4.0E-4 EPS1 = AMAX1(EPS1,EPS2) C MAXIMUM ACCURACY SET TO 1.E-20 EPS1 = AMAX1(EPS1,1.0E-20) C ONE LESS SIGNIFICANT DIGIT IN END POINTS OF C THE CONFIDENCE INTERVAL THAN THE POINT ESTIMATE. EPS2 = 10.0*EPS1 C ********************************************************************** C GET CRITICAL VALUES OF THE U STATISTIC C ODD OR EVEN M*N DETERMINES HOW POINT ESTIMATE C IS CALCULATED IODD = 1 IF (MOD(M*N,2).EQ.0) GO TO 10 UMED1 = XM*XN/2.0 GO TO 20 10 UMED1 = XM*XN/2.0 - 0.5 UMED2 = UMED1 + 1.0 IODD = 0 C P IS ONE-TAILED PROB. 20 P = (100.0-PERC)/200.0 C FIND MEAN AND VARIANCE OF THE U-STATISTIC C FOR USE IN A NORMAL APPROXIMATION. UMU = XM*XN/2.0 UVAR = XM*XN*(XM+XN+1.0)/12.0 USIGMA = SQRT(UVAR) C NORMAL APPROX. (WITH CONTINUITY CORRECTION) IS C P = PHI ( (ULOW +.5 -UMU)/USIGMA ) C (WHERE PHI IS STANDARD NORMAL DIST. FUNCTION, C AND PHINV IS ITS INVERSE). ULOW = USIGMA*PHINV(P) + UMU - 0.5 C ROUND CRITICAL VALUE DOWN TO INTEGER C (THIS GIVES CONSERVATIVE BOUNDS), AND FIND P. IU = ULOW IF (IU.LT.0) IU = 0 ULOW = IU Z = (ULOW+0.5-UMU)/USIGMA P = PHI(Z) PERC = 100.0*(1.0-2.0*P) C WANT TO INVERT FUNCTION U AT A HALF INTEGER ULOW = ULOW + 0.5 C UHIGH UHIGH = XM*XN - ULOW ILOW = ULOW + .5 IHIGH = UHIGH + .5 C ILOW AND IHIGH GIVE THE ORDERED DIFFERENCES C WHICH FORM THE CONFIDENCE INTERVAL. C AN ESTIMATE OF THE SLOPE OF THE LINEAR C APPROXIMATION TO FMANN. SLOPE = (ABS(Z)*SQRT(XM*XN*(XM+XN)))/((DH-DL)*SQRT(3.)) C X1 AND X2 BRACKET THE LOWER CONFIDENCE LIMIT. C THEN BY CALLING THE ROUTINE ILL THE LOWER CONFI- C DENCE LIMIT WITHIN EPS2 IS RETURNED VIA DLOW. CALL BRACK(DL, ULOW, SLOPE, X, M, Y, N, X1, FX1, X2, FX2) CALL ILL(DLOW, ULOW, X1, FX1, X2, FX2, X, M, Y, N, EPS2) C SAME FOR UPPER END VIA DHIGH. CALL BRACK(DH, UHIGH, SLOPE, X, M, Y, N, X1, FX1, X2, FX2) CALL ILL(DHIGH, UHIGH, X1, FX1, X2, FX2, X, M, Y, N, EPS2) C A NEW ESTIMATE OF SLOPE BASED ON THE CONFIDENCE C INTERVAL (DLOW,DHIGH), UNLESS THE LENGTH OF C THE INTERVAL IS SMALLER THAN EPS2. IF (DHIGH.GT.DLOW+EPS2) SLOPE = ((DH-DL)/(DHIGH-DLOW))*SLOPE C THE SAME ROUTINES ARE USED FOR THE ESTIMATE C DPOINT EXCEPT EPS1 IS USED. THE MIDPOINT C OF THE CONFIDENCE INTERVAL WILL BE THE INITIAL C ESTIMATE OF DPOINT. D = (DLOW+DHIGH)/2.0 CALL BRACK(D, UMED1, SLOPE, X, M, Y, N, X1, FX1, X2, FX2) CALL ILL(DPOINT, UMED1, X1, FX1, X2, FX2, X, M, Y, N, EPS1) C IF M*N IS ODD THE ESTIMATE IS DPOINT, IF (IODD.EQ.1) RETURN C IF EVEN THEN THE VALUE DPOINT IS THE LOWER C CENTER ESTIMATE. THE UPPER CENTER ESTIMATE IS D2, C AND THE FINAL ESTIMATE IS THE AVERAGE OF THE TWO. CALL BRACK(DPOINT, UMED2, SLOPE, X, M, Y, N, X1, FX1, X2, FX2) CALL ILL(D2, UMED2, X1, FX1, X2, FX2, X, M, Y, N, EPS1) DPOINT = (DPOINT+D2)/2. RETURN END SUBROUTINE CHECK(X, M, Y, N, PERC, IERR) CHE 10 C SUBROUTINE TO DO ERROR CHECKING FOR RANKCI C WRITTEN JUNE 1975 BY T. RYAN DIMENSION X(M), Y(N) C CHECK FOR INSUFFICIENT DATA IF (M.GE.2 .AND. N.GE.2) GO TO 10 IERR = 1 RETURN C CHECK FOR PROPER PERCENT CONFIDENCE 10 IF (PERC.LT.99.999 .AND. PERC.GT.49.999) GO TO 20 IERR = 2 RETURN C CHECK FOR X AND Y ARRAYS IN NON-DECREASING ORDER 20 DO 30 I=2,M IF (X(I-1).GT.X(I)) GO TO 50 30 CONTINUE DO 40 I=2,N IF (Y(I-1).GT.Y(I)) GO TO 50 40 CONTINUE RETURN C X OR Y ARRAY OUT OF ORDER 50 IERR = 3 RETURN END SUBROUTINE TMEAN(Z, N, ZBAR, VARZB) TME 10 C GIVEN DATA Z(1), Z(2), ..., Z(N) IN NON-DECREASING ORDER, C THIS ROUTINE FINDS THE ALPHA-TRIMMED MEAN ZBAR, AND THE C ESTIMATED VARIANCE OF ZBAR, VARZB. C WRITTEN JUNE 1975 BY T. RYAN DIMENSION Z(N) DATA ALPHA /0.10/ XN = N C NT IS NUMBER TRIMMED FROM EACH END C N1 IS NUMBER OF LOWEST OBSERVATION NOT TRIMMED C N2 IS NUMBER OF HIGHEST OBSERVATION NOT TRIMMED NT = ALPHA*XN N1 = NT + 1 N2 = N - NT C TRIMMED MEAN SUM = 0.0 DO 10 I=N1,N2 SUM = SUM + Z(I) 10 CONTINUE X = N - 2*NT ZBAR = SUM/X C WINSORIZED SUM OF SQUARES SUM = 0.0 DO 20 I=N1,N2 SUM = SUM + (Z(I)-ZBAR)**2 20 CONTINUE IF (NT.EQ.0) GO TO 30 XNT = NT SUM = SUM + XNT*(Z(N1-1)-ZBAR)**2 + XNT*(Z(N2+1)-ZBAR)**2 30 VARZB = SUM/(XN*XN) RETURN END SUBROUTINE ILL(XVAL, FVAL, X1, F1, X2, F2, X, M, Y, N, EPS) ILL 10 C THIS ROUTINE SOLVES THE EQUATION C F(T)-FVAL=0 C FOR A MONOTONE FUNCTION F, IN THIS INSTANCE FMANN. THE METHOD C USED IS THE ILLINOIS METHOD AS DESCRIBED BY DOWELL AND JARRATT C (1971,BIT), EXCEPT FOR A MODIFICATION WHEN CLOSE TO THE ROOT. C THEN IF THE ROOT WAS NOT TRAPPED ON THE LAST ITERATION THE ROU- C TINE SWITCHES TO THE BISECTION METHOD. C INPUT - C FVAL = THE VALUE OF THE FUNCTION AT THE ROOT. C X1 AND X2 ARE VALUES WHICH BRACKET THE ROOT, THAT IS C EITHER C F(X1) .LT. FVAL .LT. F(X2) C OR C F(X2) .LT. FVAL .LT. F(X1) C FX1 = F(X1) C FX2 = F(X2) C X, M, Y, AND N ARE QUANTITIES USED BY THE FUNCTION FMANN C EPS = THE ACCURACY OF THE SOLUTION. C X1, X2, AND THEIR FUNCTIONAL VALUES ARE CHANGED THROUGHOUT THE C ROUTINE. C OUTPUT - C XVAL = THE ROOT WITHIN EPS. DIMENSION X(M), Y(N) F1 = F1 - FVAL F2 = F2 - FVAL IBISEC = 0 10 CONTINUE IF (ABS(X2-X1).LT.EPS) GO TO 40 C X3 IS THE INTERSECTION OF THE SECANT LINE C FORMED BY (X1,F1), (X2,F2) AND THE X-AXIS. X3 = X2 - (F2*(X2-X1))/(F2-F1) IF (IBISEC.EQ.1) X3 = (X1+X2)/2. IBISEC = 0 CALL FMANN(X3, X, M, Y, N, F3) F3 = F3 - FVAL IF (F3*F2) 20, 20, 30 C ROOT WAS TRAPPED, SO USE REGULA FALSI 20 X1 = X2 F1 = F2 X2 = X3 F2 = F3 GO TO 10 C ROOT WAS NOT TRAPPED, SO USE ILLINOIS MOD- C IFICATION. 30 X2 = X3 F2 = F3 F1 = F1/2. IF (ABS(F2).LE.ABS(F1)) GO TO 10 C IF ILLINOIS MODIFICATION IS MORE RADICAL C THAN BISECTION, THEN USE BISECTION. F1 = 2.0*F1 IBISEC = 1 GO TO 10 40 XVAL = (X1+X2)/2. RETURN END SUBROUTINE BRACK(XINIT, FVAL, SLOPE, X, M, Y, N, X1, FX1, X2, BRA 10 * FX2) C SUPPOSE THE FUNCTION F IS MONOTONE AND IT IS DESIRED TO SOLVE C THE EQUATION C F(T) - FVAL = 0. C THIS ROUTINE RETURNS TWO VALUES WHICH BRACKET THE ROOT. C INPUT - C XINIT = INITIAL ESTIMATE OF THE ROOT. C FVAL = THE VALUE OF THE FUNCTION AT THE ROOT. C SLOPE = APPROXIMATE SLOPE OF THE FUNCTION IN A C NEIGHBORHOOD OF THE ROOT. C X, M, Y, AND N ARE QUANTITIES USED BY THE FUNCTION WHICH C IN THIS INSTANCE IS FMANN. C NONE OF THE ABOVE QUANTITIES IS CHANGED THROUGHOUT THIS ROUTINE C OUTPUT - C X1 AND X2 BRACKET THE ROOT, THAT IS EITHER C F(X1) .LT. FVAL .LT. F(X2) C OR C F(X2) .LT. FVAL .LT. F(X1) C FX1 = F(X1) C FX2 = F(X2) DIMENSION X(M), Y(N) X1 = XINIT CALL FMANN(X1, X, M, Y, N, FX1) DELTA = 1.5*((FVAL-FX1)/SLOPE) 10 X2 = X1 + DELTA CALL FMANN(X2, X, M, Y, N, FX2) IF ((FX1-FVAL)*(FX2-FVAL).LT.0.) RETURN X1 = X2 GO TO 10 END SUBROUTINE FMANN(D, X, MM, Y, NN, U) FMA 10 C ROUTINE TO CALCULATE THE MANN-WHITNEY STATISTIC U. C U DIFFERS ONLY BY A CONSTANT FROM THE WILCOXON 2-SAMPLE RANK C STATISTIC W. C INPUT - C D = NULL HYPOTHESIS VALUE OF THE SHIFT OF THE Y C POPULATION FROM THE X POPULATION (I.E. THE NULL C HYPOTHESIS BEING TESTED IS MU(Y) = MU(X) + D). C X = ARRAY CONTAINING THE SAMPLE FROM ONE POPULATION, C WHICH MUST BE IN NON-DECREASING ORDER. C M = DIMENSION (SAMPLE SIZE) OF X C Y = ARRAY CONTAINING THE SAMPLE FROM THE OTHER POPULATION, C WHICH MUST BE IN NON-DECREASING ORDER. C N = DIMENSION (SAMPLE SIZE) OF Y C NONE OF THE ABOVE IS ALTERED BY THIS ROUTINE. C OUTPUT - C U = MANN-WHITNEY TEST STATISTIC. C THIS ROUTINE IS INTENDED TO BE USED IN AN ITERATION ROUTINE FOR C ESTIMATION ONLY, SINCE NO CHECKS OR ADJUSTMENTS ARE MADE FOR TIES. C THIS ROUTINE IS DESIGNED TO BE VERY FAST, SINCE IT IS TO BE USED C IN AN ITERATION PROCEDURE, SO NO CHECKS ARE MADE TO INSURE THAT C M AND N ARE POSITIVE, OR THAT X AND Y ARE NON-DECREASING. C ALL CHECKING SHOULD BE DONE IN CALLING PROGRAM. C U= SUM (NO. OF Y(J) LESS THAN (OR EQUAL) TO X(I)+DELTA) WHERE C THE SUM IS OVER I. C SINCE ARRAYS X AND Y ARE ORDERED, IF X(I) IS GREATER THAN Y(J), C X(I+1) MUST ALSO BE. C WRITTEN 3/75 BY T. RYAN. LAST UPDATED 6/75 BY T. RYAN DIMENSION X(MM), Y(NN) DELTA = D M = MM N = NN C JLE IS THE NUMBER OF Y VALUES LESS THAN OR C EQUAL TO X(I). C IU IS ACCUMULATED AS THE VALUE OF THE U C STATISTIC. JLE = 0 IU = 0 C MAIN LOOP DO 30 I=1,M XI = X(I) + DELTA 10 IF (XI.LT.Y(JLE+1)) GO TO 20 JLE = JLE + 1 IF (JLE.GE.N) GO TO 50 GO TO 10 20 IU = IU + JLE 30 CONTINUE 40 U = IU RETURN C X(I) IS GREATER THAN (OR EQUAL) TO ALL Y(J). C THEREFORE X(I+1), ..., X(M) ARE ALL GREATER C THAN ALL Y(J). 50 IU = IU + (M-I+1)*N GO TO 40 END FUNCTION PHINV(P) PHI 10 C INVERSE NORMAL DISTRIBUTION FUNCTION. C IF Z IS N(0,1), THIS FUNCTION RETURNS PHINV DEFINED BY C P(A .LT. PHINV) = P, (0.0 .LT. P AND P .LT. 1.0). C REF. HANDBOOK OF MATHEMATICAL FUNCTIONS , 1964, USDC, NATIONAL C BUREAU OF STANDARDS, WASH. DC. P. 933, FORMULA 26.2.23. C ERROR WILL BE LESS THAN 4.5E-04. C WRITTEN 4/76 BY T. RYAN BASED ON ROUTINE BY H. D. KNOBLE (1966). IF (P.LE.0. .OR. P.GE.1.0) GO TO 30 IF (P.LT.0.5) GO TO 10 SIGN = 1.0 PTAIL = 1.0 - P GO TO 20 10 SIGN = -1.0 PTAIL = P 20 T = SQRT(ALOG(1.0/(PTAIL*PTAIL))) Z = ABS(T-(2.515517+T*(0.802853+T*0.010328))/(1.0+T* * (1.432788+T*(0.189269+T*0.001308)))) PHINV = Z*SIGN RETURN 30 CONTINUE RETURN END FUNCTION PHI(X) PHI 10 C NORMAL DISTRIBUTION FUNCTION C LET Z BE N(0,1), THEN PHI IS DEFINED BY PHI = P(Z .LE. X). C REF. HANDBOOK OF MATHEMATICAL FUNCTIONS , 1964, USDC, NATIONAL C BUREAU OF STANDARDS, WASH. DC, P. 933, FORMULA 26.2.19. C ERROR IS LESS THAN 1.5E-07. C WRITTEN 4/76 BY T. RYAN, BASED ON ROUTINE BY H. D. KNOBLE (1966). Z = X IF (Z.LT.0.0) GO TO 10 ISIGN = 1 GO TO 20 10 Z = -X ISIGN = -1 20 P = 0.5*(1.0+Z*(0.049867347+Z*(0.0211410061+Z*(3.2776263D-3+Z* * (3.80036E-05+Z*(4.88906E-05+Z*5.383E-06))))))**(-16) IF (ISIGN.EQ.1) P = 1.0 - P PHI = P RETURN END .