From: blovitts@nsf.gov
Date: Fri, 28 Oct 94 09:27:21 EST
To: socgrad@UCSD.EDU
Subject: Myth of the Bell Curve

== For Your Information ==

MAIL VIA INTERNET FROM @VTBIT.CC.VT.EDU:owner-edpolyan@ASUVM.INRE.ASU.EDU
TUESDAY 10/25/94 6:19:36 P.M.

Date: Tue, 25 Oct 1994 14:55:48 MST
From: Gene Glass
Subject: Myth of the Bell Curve

The reprint below came in today on AERA-GSL. It certainly must be taken as the final word on the "bell curve." And posting it here might forestall discussion of a lot of irrelevant stuff like whether IQ is REALLY, REALLY distributed a la Gauss, or whether teachers ought to grade on the curve.

Herrnstein and Murray used the term Bell Curve in their title, but only metonymically. The nub of their book is hinted at in the subtitle: (approximately) IQ, Race and Class in America. They appear to maintain that IQ is largely inherited (between 40% and 80%), that the racial differences in IQ are largely genetic, that IQ largely determines social class (hence income, quality of life, etc.), and that attempts at social amelioration of class disadvantages will fail because of the intransigence of IQ.

I think that their argument is littered with non sequiturs (e.g., highly heritable traits can still yield to environmental influences; witness age at death over the last 150 years), and that they ignore the mediating role that education plays in the IQ --> Social Class chain. Schools at all levels continue to worship IQ scores, and sort and admit on the basis of them, with the result that diplomas, licenses, and opportunities go to those who test high. Education's hands are dirty in this business.

What do others think?   GVG

-------**********======================================**********--------
Gene V Glass                      glass@asu.edu
College of Education              atgvg@asuacad.bitnet
Arizona State University          602-965-2692
Box 872411
Tempe, AZ 85287-2411

----------------------------Original message----------------------------

The following may be of interest, in view of the interest in The Bell Curve book by Herrnstein and Murray.

The Myth of the Bell Curve
by Ted Goertzel

Adapted and condensed from: Ted Goertzel and Joseph Fashing, "The Myth of the Normal Curve: A Theoretical Critique and Examination of its Role in Teaching and Research," Humanity and Society 5:14-31 (1981), reprinted in Readings in Humanist Sociology (General Hall, 1986).

   Surely the hallowed bell-shaped curve has cracked from top to bottom. Perhaps, like the Liberty Bell, it should be enshrined somewhere as a memorial to more heroic days.
      - Earnest Ernest, Philadelphia Inquirer, 10 November 1974

The myth of the bell curve has occupied a central place in the theory of inequality (Walker, 1929; Bradley, 1968). Apologists for inequality in all spheres of social life have used the theory of the bell curve, explicitly and implicitly, in developing moral rationalizations to justify the status quo. While the misuse of the bell curve has perhaps been most frequent in the field of education, it is also common in other areas of social science and social welfare.

When Abraham de Moivre made the first recorded discovery of the normal curve of error (to give the bell curve its proper name) in 1733, his immediate concern was with games of chance. The normal distribution, which is nothing more than the limiting case of the binomial distribution resulting from random operations such as flipping coins or rolling dice, was a natural discovery for anyone interested in the mathematics of gambling.
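De Moivre's observation can be illustrated with a few lines of simulation. The sketch below, in Python, is purely illustrative; the coin counts and trial numbers are arbitrary choices, not anything from the original article. It tallies the heads obtained in repeated runs of fair coin flips and prints a crude text histogram; the familiar symmetric bell shape emerges from nothing but chance.

    # Illustrative sketch: the binomial count of heads in n fair coin flips
    # approaches the bell curve as n grows, which is all de Moivre's 1733
    # result asserts.
    import random
    from statistics import mean, stdev

    n_flips = 100        # coins flipped per trial
    n_trials = 20000     # number of trials

    scores = [sum(random.random() < 0.5 for _ in range(n_flips))
              for _ in range(n_trials)]

    m, s = mean(scores), stdev(scores)
    print(f"observed mean {m:.2f} (theory {n_flips * 0.5}), "
          f"sd {s:.2f} (theory {(n_flips * 0.25) ** 0.5:.2f})")

    # Crude text histogram: the symmetric bell shape appears.
    for k in range(35, 66, 2):
        count = sum(k <= x < k + 2 for x in scores)
        print(f"{k:3d} | {'#' * (count // 100)}")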
De Moivre was unhappy, however, with the lowly origins of his discovery. He proceeded to raise its status by attributing to it an importance beyond its literal meaning. In his age, this could best be done by claiming that it was a proof of the existence of God. He announced:

   And thus in all cases it will be found, that although Chance produces irregularities, still the Odds will be infinitely great, that in process of Time, those irregularities will bear no proportion to the recurrency of that Order which naturally results from Original Design .... (Walker, 1929:17)

De Moivre's discovery of the bell curve did not attract much attention. Gamblers are perhaps better served with discrete distributions. Theologians, for their part, no doubt preferred to base their case for God's existence on less probabilistic grounds. Serious interest in the distribution of errors on the part of mathematicians such as Laplace and Gauss awaited the early nineteenth century, when astronomers found the bell curve to be a useful tool for taking into account the errors they made in their observations of the orbits of the planets.

Further developments in the myth of the bell curve were left not to the astronomers or theologians but to the early quantitative social scientists. Systematic collection of population statistics began in the late eighteenth and early nineteenth centuries as a response to the social upheavals of the time and the consequent concern with understanding the dynamics of mass behavior. These early sociologists were not concerned with theology, but they were seeking proof of the orderliness of society. Relying on the justifiably great prestige of Laplace and Gauss as mathematicians, they took the bell curve as proof of the existence of order in the seemingly chaotic social world.

Unfortunately, the early social scientists often had a poor understanding of the fact that the mathematical formulas of Gauss and Laplace were based on assumptions not often met in the empirical world. As Fisher (1923, Vol. 1:181) points out:

   the Gaussian error law came to act as a veritable Procrustean bed to which all possible measurements should be made to fit. The belief in authority so typical of modern German learning and which has also spread to America was too great to question the supposed generality of the law discovered by the great Gauss.

The mathematicians, on the other hand, did not feel that it was their domain to check whether or not the empirical world happened to fit their postulates. The bell curve came to be generally accepted, as M. Lippmann remarked to Poincare (Bradley, 1968:8), because "...the experimenters fancy that it is a theorem in mathematics and the mathematicians that it is an experimental fact."

Adolphe Quetelet, the father of quantitative social science, was the first to claim that the bell curve could be applied not only to random errors but also to the distributions of social phenomena (Landau and Lazarsfeld, 1968; Wechsler, 1935:30-31). The myth of the bell curve was part of Quetelet's theory of the Average Man (Quetelet, 1969). He assumed that nature aimed at a fixed point in forming human beings, but made a certain frequency of errors. The mean in any distribution of human phenomena was to him not merely a descriptive tool but a statement of the ideal. Extremes in all things were undesirable deviations. His doctrine was a quantification of Aristotle's doctrine of the Golden Mean, and it is susceptible to the same criticisms.
While there may be traits where the average can reasonably be considered to be the ideal, the argument's application is severely limited. One might argue, for example, that average vision is ideal, whereas nearsightedness and farsightedness are undesirable deviations. But is this true of physical strength or of mental abilities, or even of physical stature (one variable for which there is actually substantial evidence of an approximately normal distribution)? Quetelet, like Aristotle, exempted mental abilities, arguing that those who were superior to the average in intelligence were mere forerunners of a new average that was to come.

Quetelet's doctrine of the Average Man was ill suited to a society that was more in need of a rationalization for inequality than a glorification of the common man. His use of the bell curve, however, was useful as part of the social Darwinist ideology that was emerging as a justification for the inequities of laissez-faire capitalism. The myth of the bell curve found its most enthusiastic and effective champion in Francis Galton and the eugenics movement, of which he was a major founder. The importance that he attributed to the bell curve can be illustrated by the following quotation (Galton, 1889:66):

   I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error." The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along. The tops of the marshalled row form a flowing curve of invariable proportions; and each element, as it is sorted into place, finds, as it were, a preordained niche, accurately adapted to fit it.

Galton went beyond Quetelet not only in his enthusiasm for the bell curve but also in his attempt to gather data to demonstrate its general applicability. He obtained data on a number of physical traits that he was interested in improving, such as height, weight, strength of the arms and of the grip, swiftness of the blow, and keenness of eyesight. The variables tended to be approximately normally distributed, but the fit was not perfect. He consequently converted his data into a type of standard score and averaged the standard scores together (Galton, 1889:201). These average scores fit the normal curve very well, as might be expected, since he had averaged together a number of largely unrelated variables and created a mean score that reflected little more than random error.

Karl Pearson (best known today for the invention of the product-moment correlation coefficient) was Galton Professor of Eugenics at the University of London and Galton's biographer. He accepted the ideology of the eugenics movement and was preoccupied with curing social problems by creating a race of superior blue-eyed and golden-haired people (Pearson, 1912). He was, however, too good a statistician to repeat Galton's methodological errors or to accept the Gaussian model on the basis of authority. He used his newly developed chi-square test to check how closely a number of empirical distributions of supposedly random errors fitted the bell curve.
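The artifact Galton introduced by averaging is easy to reproduce with made-up numbers. In the hypothetical sketch below (the three "traits" are arbitrary stand-ins, not Galton's measurements), several unrelated and clearly non-normal variables are converted to standard scores and averaged; the composite comes out much closer to the normal shape than any of its ingredients, just as his composite did.

    # Hypothetical illustration, not Galton's data: averaging the standard
    # scores of unrelated, non-normal variables yields a composite that is
    # much nearer the bell curve than its components.
    import random
    from statistics import mean, stdev

    def moments(xs):
        """Return (skewness, excess kurtosis); both are about 0 for a normal curve."""
        m, s = mean(xs), stdev(xs)
        z = [(x - m) / s for x in xs]
        return mean(v ** 3 for v in z), mean(v ** 4 for v in z) - 3

    def standard_scores(xs):
        m, s = mean(xs), stdev(xs)
        return [(x - m) / s for x in xs]

    N = 10000
    traits = {
        "skewed":  [random.expovariate(1.0) for _ in range(N)],
        "flat":    [random.uniform(0, 1) for _ in range(N)],
        "bimodal": [random.choice([0.0, 1.0]) for _ in range(N)],
    }

    composite = [mean(vals)
                 for vals in zip(*(standard_scores(t) for t in traits.values()))]

    for name, t in traits.items():
        sk, ku = moments(t)
        print(f"{name:9s} skewness {sk:+.2f}   excess kurtosis {ku:+.2f}")
    sk, ku = moments(composite)
    print(f"composite skewness {sk:+.2f}   excess kurtosis {ku:+.2f}")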
He found that many of the distributions that had been cited in the literature as fitting the normal curve were actually significantly different from it, and concluded that "the normal curve of error possesses no special fitness for describing errors or deviations such as arise either in observing practice or in nature" (Pearson, 1900:174).

The Myth in Testing Theory

Pearson's conclusions were not sufficient to stop the application of the normal curve of error as a norm in assigning classroom grades or in psychological testing. Most objective tests that are in practical use today rely on summated scaling techniques. This means that the person taking the test answers a large number of items and receives a total score corresponding to the number of items that he or she answers correctly. This type of measurement, which is also used in Likert scaling in sociological research, has an inherent bias toward the normal distribution in that it is essentially an averaging process, and the central limit theorem shows that distributions of means tend to be normally distributed even if the underlying distribution is not (if the means are based on large random samples). This inherent bias is most likely to be realized if the responses to the test items are poorly intercorrelated (i.e., if the test or scale is poorly constructed to measure a central factor).

If a large number of people fill out a typical multiple choice test such as the Scholastic Aptitude Test (or a typical sociological questionnaire with precoded responses such as "strongly agree, agree") at random using a perfect die, the scores are very likely to be normally distributed. This is true because many more combinations of responses give a sum that is close to the theoretical mean than give a score that is close to either extreme. This characteristic of the averaging process is useful in calculating probable errors in random sampling and is consequently discussed in elementary statistics books (e.g., Blalock, 1960:138-141). When averaging is used in testing or measurement, however, it means that the greater the amount of error present, the greater the likelihood of a normal distribution of scores, even if the variable being measured is not normally distributed.

All objective tests contain a certain amount of error, in that the chance of a respondent's getting a given item right depends not only on the central factor being measured but also on other general factors and on characteristics idiosyncratic to that item (not to mention the element of luck). Thus it is not surprising that summated scaling devices tend to give normal distributions. The problem comes when this tendency is interpreted not as a result of unavoidable error, but as a confirmation of a preconceived idea that the variable being measured is in fact normally distributed.

The early developers of standardized intelligence tests were pleased to find that their distributions of scores were approximately normal, although they were disturbed by the fact that perfect normal distributions were rarely, if ever, achieved. Thorndike (1926:521-555) went so far as to average together scores achieved by the same respondents on eleven different intelligence tests in order to achieve a more normal distribution. He thus repeated Galton's mistake by averaging together somewhat diverse measures and then assuming that the resultant distribution was due to the normality of the underlying variable rather than to the increased measurement error.
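This effect of item error can be demonstrated with simulated data. In the hypothetical sketch below (the latent "trait," the item counts, and the noise levels are invented for illustration, not taken from any real test), each examinee's score is the sum of sixty error-laden items; when the items are mostly noise, the score distribution loses the pronounced skew of the underlying trait and settles into the bell shape the central limit theorem predicts.

    # Hypothetical simulation: a summated test score is the sum of many
    # error-laden items.  Even when the underlying trait is strongly skewed,
    # noisier items push the score distribution toward the bell shape.
    import random
    from statistics import mean, stdev

    def skewness(xs):
        m, s = mean(xs), stdev(xs)
        return mean(((x - m) / s) ** 3 for x in xs)

    def simulate_scores(n_people=5000, n_items=60, noise=0.5):
        # Latent trait: heavily skewed (most people low, a long right tail).
        traits = [random.expovariate(1.0) for _ in range(n_people)]
        hi = max(traits)
        scores = []
        for t in traits:
            p_trait = t / hi                     # ability-driven success chance
            score = 0
            for _ in range(n_items):
                # With probability `noise` the item outcome is pure chance.
                p = 0.5 if random.random() < noise else p_trait
                score += random.random() < p
            scores.append(score)
        return traits, scores

    traits, clean = simulate_scores(noise=0.1)
    _, noisy = simulate_scores(noise=0.9)
    print(f"latent trait skewness:        {skewness(traits):+.2f}")
    print(f"score skewness, little error: {skewness(clean):+.2f}")
    print(f"score skewness, much error:   {skewness(noisy):+.2f}")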
(The importance of this, of course, depends on how different the various tests were.) He also discounted the fact that the intelligence tests themselves were standardized in such a way as to give a normal distribution.

Despite the efforts of prominent psychometricians such as David Wechsler (1935:34) to counter it, the myth of the bell curve was widely disseminated in psychological texts (Goodenough, 1949:148-149; Vernon, 1940:16-17; Anastasi, 1968:27) and is widely used as a criterion for test construction. More modern texts usually recognize that there is no theoretical justification for the use of the normal curve, but justify using it as a convenience (Cronbach, 1970:99-100). The clear assertion by prominent psychologists such as Wechsler and Cronbach that psychological phenomena are not somehow inherently normally distributed is a clear advance over the type of indoctrination that students of educational psychology typically received in the 1930s and 1940s. This methodological advance coincided with a general trend in the social sciences away from sociobiological arguments.

The close tie between methodological presuppositions and ideological concerns is illustrated by the fact that the myth of the bell curve has recently been reactivated precisely as part of an attempt to reassert racist arguments about the biological determinants of human abilities. In his highly controversial article on genetics and I.Q., Arthur Jensen (1969) went to considerable length in an attempt to demonstrate that I.Q. scores are approximately normally distributed. In 1994, Richard Herrnstein and Charles Murray used the phrase "The Bell Curve" as the title of their widely reviewed book on Intelligence and Class Structure in American Life. While their book presents elaborate statistical justifications for most of its assertions, the claim that intelligence is normally distributed is defended on common-sense grounds. Herrnstein and Murray (1994:557) simply assert that "it makes sense that most things will be arranged in bell-shaped curves. Extremes tend to be rarer than averages." They note that the bell curve "has a close mathematical affinity to the meaning of the standard deviation," a concept which they use extensively in the book, and remark that:

   It is worth pausing a moment over this link between a relatively simple measure of spread in a distribution and the way things in everyday life vary, for it is one of nature's more remarkable uniformities.

In reality, there is nothing remarkable about the fact that measures which contain a good deal of random variation will fit a distribution designed to describe random variation. The question whether intelligence is or is not normally distributed is actually irrelevant to the thesis that observed differences in I.Q. scores between racial groups reflect innate biological differences. Jensen, Herrnstein, and Murray apparently introduce the topic of the normality of I.Q. score distributions because readers who have been led to accept the myth of the normal curve in other contexts may assume that a normal distribution proves that the measurement was valid. If the normal distribution were properly understood as nothing more than a distribution of random errors, it would not lend any weight to their arguments.

The Myth of the Bell Curve in Grading

The myth of the normal bell curve also lives on in educational institutions, where students and faculty often casually refer to "grading on the curve" or "curving the grades."
Many administrators resemble the superintendent of schools in "Elmtown" (Hollingshead, 1961) in assuming that a normal distribution of scores indicates that a good job of grading was done. Often, instructors are expected to turn in an approximately normal distribution of grades, and any substantial deviations must be justified.

In a 1970-1972 dispute at a large state university, conflict over grading and other issues led to a situation in which all but one of the full-time junior faculty members were fired, denied tenure, or resigned under pressure (Goertzel and Fashing, 1981). The initial controversy arose when some administrators became concerned about the tendency toward "grade inflation" on campus, an issue that has been of some national concern as well (Jencks and Riesman, 1968). The dean of the college distributed statistics showing that the mean grade point average had been increasing over time and in comparison to other institutions. There was also considerable difference in the average grades given out by departments on campus. The Sociology Department was particularly singled out for its high average grades, and pressure was put on the department chair to bring his faculty members into line. One junior faculty member was told that he must use "common sense" standards in grading that would result in a "more or less normal distribution" of grades. The teaching assistants in the chairman's introductory sociology class were given more explicit instructions: the combined average grade for each of their four classes was not to exceed 2.6 (or a low B-). Five teaching assistants were summarily dismissed after they refused to sign a document declaring their willingness to carry out the intent of the chairman's directive.

The issue became a major focus of conflict on campus, leading the dean and other senior faculty and administrators to enunciate assumptions which are not often stated so clearly. They made it clear that their concern went beyond the question of the "average" or mean grade. They were also concerned that the number of As be relatively small. Indeed, they insisted that the usual distribution of grades should approximate a normal distribution, in that most grades should be clustered around the mean (or C) with relatively few at the extremes. Most of the spokesmen who supported a normal distribution said they thought that such a distribution was the "usual," "natural," or "common sense" result to be obtained from correct grading procedures.

In a more traditional view of grading as representing objective academic standards, instructors should grade papers according to their intrinsic merit and give out whatever grades result, even if the distribution includes a lot of A's or F's. On tests, an instructor should know, before looking at the results, what score will be required for each grade. This practice, however, may be administratively inconvenient for several reasons. Enrollments may drop if too many students fail. Admissions to elite programs may be too large if too many students receive high grades. The myth of the bell curve serves administrative convenience by assuring that a predictable proportion of students can be channeled into each stratum of the educational and occupational system.

The Bell Curve in Theory and Research

The use of the myth of the bell curve in research serves to reinforce some persistent biases, as well as to disguise sloppy research practices.
These biased research findings may then be used to justify the assumption that abilities and talents are normally distributed and that grades and other social rewards should be distributed according to the bell curve.

The assumption that social phenomena should be normally distributed is consistent with pluralist or other multicausal theoretical models, since a large number of unrelated and equipotent causes lead to a normal distribution. Indeed, the early pluralists in political science expected political attitudes to be normally distributed, since they believed them to be caused by numerous, equipotent independent factors (Rice, 1928:72). Similarly, if social status is determined by a number of independent factors, we would expect it to be normally distributed. If, as Marxists and others argue, it is largely determined by a single variable, such as the relationship to the means of production, there would be no reason for this to be the case. In point of fact, income is not normally distributed in the United States or any other known society. Income can be measured easily in monetary units, and this measurement is well accepted. A graph of the income distribution in the United States can even be found in Herrnstein and Murray's book (1994:100), and it is not a bell curve.

Other measurements used by social scientists, however, provide only a rough index of the underlying trait. If sufficient error is present in these measuring instruments, a normal distribution may well result. Lundberg and Friedman (1943), for example, compared three measures of socioeconomic status in a rural community. These tests measured social status by arbitrarily assigning points to the furniture and other objects observed in the respondents' living rooms. After applying several tests to the same families and plotting the resulting distributions, the authors noted: "assuming that in a random sample, socioeconomic status is normally distributed, the distortion of the normality of the distribution by the Guttman version of the Chapin scale suggests the presence of spurious factors ...." In other words, the bell curve was used as a standard for deciding which test was valid. The commentators on the article (Knupfer and Merton, 1943) were quick to point out that this was an unjustified assumption. Income, property, education, and occupational status are not normally distributed; why should socioeconomic status as measured by a summated scale of the paraphernalia in the respondents' living rooms be?

Yet the assumption that the distribution should be normal is widely used, perhaps in the absence of any other criterion to demonstrate that a good job of measurement has been done. A U.S. Forest Service report (1973:24a), for example, reports with satisfaction that scores on an index of the wilderness quality of roadless areas were quite normally distributed. There is no reason why this should be the case except that the Forest Service has averaged together a number of possibly unrelated variables (scenic character, isolation, variety). (In fact, the distribution found by the Forest Service deviates significantly from normality; but, as is often the case, they did not check the goodness of fit.) The use of normality as a criterion reinforces sloppiness in scale construction, since a sloppy scale has more error and is thus more likely to approximate a normal distribution.
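Checking the fit is not difficult. The sketch below uses invented stand-in data, not the Forest Service's actual index, and simply runs the kind of chi-square goodness-of-fit comparison Pearson introduced: the observations are binned, and the bin counts are compared with those a normal distribution fitted to the same data would predict.

    # Minimal sketch of a goodness-of-fit check against the normal curve,
    # using hypothetical data built, like the real index, by summing a few
    # arbitrary component ratings.
    import random
    from statistics import NormalDist, mean, stdev

    data = [random.expovariate(1.0) + random.uniform(0, 2) for _ in range(500)]

    fitted = NormalDist(mean(data), stdev(data))
    k = 10                                   # equal-probability bins
    edges = [fitted.inv_cdf(i / k) for i in range(1, k)]

    observed = [0] * k
    for x in data:
        observed[sum(x > e for e in edges)] += 1

    expected = len(data) / k                 # same count expected in each bin
    chi2 = sum((o - expected) ** 2 / expected for o in observed)

    # Degrees of freedom: k bins minus 1, minus the 2 estimated parameters.
    df = k - 3
    print(f"chi-square = {chi2:.1f} on {df} df "
          f"(the 5% critical value for 7 df is about 14.1)")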
The myth of the bell curve is also consistent with theories that assume that social behavior is a reflection of individual differences (provided, also, that it is assumed that individual differences are normally distributed). Stuart Dodd (1942:251-262), for example, used the bell curve in developing his theory of social problems. A social problem, to Dodd, consisted in a deficit of some characteristic that is socially desirable. The 2% of the population that falls below two standard deviations from the mean on a desirable characteristic are the "minimals," and they constitute the social problems. These "minimals" include divorcees, prostitutes, illegitimates; the sick, blind, crippled, or insane; the poor and unemployed; criminals and political refugees; inferior races such as Bushmen and Pygmies; the illiterate or ignorant; the overworked and underprivileged; the offensively vulgar; atheists; foreign language minorities; hermits and social isolates.

Dodd was certainly aware that not all phenomena are normally distributed, and he realized that the two percent figure may not always be appropriate. Yet only the assumption of normality led him to even suggest this figure (under a normal distribution, roughly 2.3 percent of cases fall more than two standard deviations below the mean); otherwise, what possible reason could there be for suggesting that the divorce rate, poverty rate, or unemployment rate, to say nothing of the proportion of foreign language minorities, should fall at 2%?

Dodd also used the bell curve to estimate the possible range of human characteristics, determining that it was unlikely for the range to exceed 12.5 standard deviations (Dodd, 1942:261-262). He noted, however, that the range of incomes in our "capitalistic culture" exceeded 2000 standard deviations. His suggestion that the variance in incomes should be limited to correspond to the variance in abilities is perhaps a good one, but more rigorous data show that the assumption of normality cannot be used in determining the range of these abilities. Wechsler (1935) shows, on the basis of much better data, that the range of human traits rarely exceeds a ratio of 3:1 (the range ratio of Binet Mental Age scores is 2.30:1).

Nothing in this paper should be taken as questioning the use of the normal distribution where it is appropriate (e.g., in estimating confidence intervals from random samples). To make this correct usage clear, it might be wise to revert to the earlier phrase, "normal curve of error." This would make it clear that the normal bell curve is "normal" only if we are dealing with random errors. Social life, however, is not a lottery, and there is no reason to expect sociological variables to be normally distributed. Nor is there any reason to expect psychological variables to be, if they are influenced by social factors. Certain physiological traits, such as length of the extremities, are often approximately normally distributed within homogeneous populations. Other traits, such as weight, which are affected by social behaviors, are not. Indeed, if a phenomenon is found to be normally distributed, this is very likely an indication that it is caused by random individual variations rather than by social forces.

The myth that social variables are normally distributed has been shown to be invalid by those methodologists who have taken the trouble to check it out. Its persistence in the folklore and procedures of social institutions is a reflection of institutionalized bias, not scientific rigor.

References

Anastasi, A. 1968 Psychological Testing. New York: Macmillan.
Blalock, H. 1960 Social Statistics. New York: McGraw-Hill.
Bohrnstedt, E. and C. Bohrnstedt 1972 "How One Normally Constructs Good Measures." Sociological Methods and Research 1, 3-12.
Bradley, J.V. 1968 Distribution-Free Statistical Tests. Englewood Cliffs, N.J.: Prentice-Hall.
Cronbach, L. 1970 Essentials of Psychological Testing. New York: Harper & Row.
Dodd, S. 1942 Dimensions of Society. New York: Macmillan.
Fisher, A. 1922 The Mathematical Theory of Probability. New York: Macmillan.
Forest Service, U.S.D.A. 1973 Roadless and Undeveloped Areas Within National Forests. Springfield, Va.: National Technical Information Service.
Galton, F. 1889 Natural Inheritance. London: Macmillan.
Goertzel, T. and J. Fashing 1981 "The Myth of the Normal Curve: A Theoretical Critique and Examination of its Role in Teaching and Research." Humanity and Society 5, 14-31.
Goodenough, F. 1949 Mental Testing. New York: Rinehart.
Herrnstein, R. and C. Murray 1994 The Bell Curve: Intelligence and Class Structure in American Life. New York: Free Press.
Hollingshead, A. 1961 Elmtown's Youth. New York: Wiley.
Hoyt, D.P. 1965 "The Relationship Between College Grades and Adult Achievement." Iowa City: American College Testing Program, Research Report No. 7.
Jencks, C. and D. Riesman 1968 The Academic Revolution. New York: Doubleday.
Jencks, C., et al. 1972 Inequality. New York: Basic Books.
Jensen, A. 1969 "How Much Can We Boost I.Q. and Scholastic Achievement?" Harvard Educational Review 39, 1-123.
Knupfer, G. and R. Merton 1943 "Discussion." Rural Sociology 8, 236-239.
Landau, D. and P.F. Lazarsfeld 1968 "Adolphe Quetelet." In Vol. 13 of International Encyclopedia of the Social Sciences. New York: Macmillan and Free Press.
Lundberg, G. and P. Friedman 1943 "A Comparison of Three Measures of Socioeconomic Status." Rural Sociology 8, 227-236.
Pearson, K. 1912 Social Problems: Their Treatment, Past, Present and Future. London: Dulau.
Pearson, K. 1900 "On the Criterion That a Given System of Deviations From the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling." The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 50, 157-175.
Quetelet, L.A.J. 1969 A Treatise on Man. Gainesville, Fla.: Scholar's Facsimiles and Reprints.
Rice, S. 1928 Quantitative Methods in Politics. New York: Knopf.
Thorndike, E.L., et al. 1927 The Measurement of Intelligence. New York: Columbia University Press.
Thurstone, L.L. 1959 The Vectors of the Mind. Chicago: University of Chicago Press.
Vernon, P. 1940 The Measurement of Abilities. London: University of London Press.
Walker, H. 1929 Studies in the History of Statistical Method. Baltimore: Williams and Wilkins.
Wechsler, D. 1935 The Range of Human Abilities. Baltimore: Williams and Wilkins.