2025-04-01 - On Binary Digits And Human Habits by Frederik Pohl
===============================================================

When an astronomer wants to know where the planet Neptune is going to be on July 4th, 2753 A.D., he can if he wishes spend the rest of his life working out sums on paper with pencil. Given good health and fast reflexes, he may live long enough to come up with the answer. But he is more likely to employ the services of an electronic computer, which--once properly programmed and set in motion--will click out the answer in a matter of hours. Meanwhile the astronomer can catch up on his gin rummy or, alternatively, start thinking about the next problem he wants to set the computer.

It isn't only astronomers, of course, who let the electrons do their arithmetic. More and more, in industry, financial institutions, organs of government and nearly every area of life, computers are regularly used to supply quick answers to hard questions.

A big problem in facilitating this use, and one which costs computer-users many millions of dollars each year, is that computers are mostly adapted to a diet of binary numbers. The familiar decimal digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 which we use every day upset their digestions. They prefer simple binary digits, 0 and 1, no more. With them the computers can represent any finite quantity quite as unambiguously as we can with five times as many digits in the decimal scheme; what's more, they can "write" their two digits electronically by following such simple rules as, "A current flowing through here means 1, no current flowing through here means 0."

In practice, many computers are now equipped with automatic translators which, before anything else happens, convert the decimal information they are fed into the binary arithmetic they can digest. A few, even--clumsy brutes they are!--actually work directly with decimal numbers. But intrinsically binary arithmetic has substantial advantages over decimal...
once it is mastered! It is only because it has not been entirely easy to master it that we have been required to take the additional, otherwise unnecessary step of conversion.

The principal difficulty in binary arithmetic is in the appearance of the binary numbers themselves. They are homely, awkward and strange. They look like a string of stutters by an electric typewriter with a slipping key; and they pronounce only with difficulty. For example, the figure 11110101000 defies quick recognition by most humans, although most digital computers know it to be the sum of 2^10 plus 2^9 plus 2^8 plus 2^7 plus 2^5 plus 2^3--i.e., in decimal notation, 1960.

To cope with this problem some workers have devised their own conventions of writing and pronouncing such numbers. A system in use at the Bell Telephone Laboratories would set off the above figure in groups of three digits:

    11,110,101,000

and would then pronounce each group of three (or less) separately as its decimal equivalent. The first binary group, 11, is the equivalent of the decimal 3; the second, 110, of the decimal 6; the third, 101, of the decimal 5. (000 is zero in any notation.) The above would then be read, "Three, six, five, zero."

Another suggestion, made by the writer, is to set off binary digits in groups of five and pronounce them according to the "dits" and "dahs" of Morse code, "dit" standing for 1 and "dah" for zero. Thus the date 1960 would be written:

    1,11101,01000

and pronounced, "Dit, didididahdit, dahdidahdahdah."

Obviously both of these proposals offer some advantages over the employment of no special system at all, i.e., writing the number as one unit and pronouncing it, "One one one one oh one oh one oh oh oh."

Yet it seems, if only on heuristic principles, that conventions devised especially for binary notation might offer attractions. Such a convention would probably prove somewhat more difficult to learn than those, like the above, which employ features derived from other vocabularies.
It might be so devised, however, that it could provide economy and a lessening of ambiguity.

One such convention has already been proposed by Joshua Stern of the National Bureau of Standards, who would set off binary numbers in groups of four--

    111,1010,1000

and give names to selected quantities, so that binary 10 would become "ap", 100 "bru", 1000 "cid", 1,0000 "dag" and, finally, 1,0000,0000 "hi". The only other names used in this system would be "one" for 1 and "zero" for 0, as in decimal notation. In this way the binary equivalent of 1960 would be read as, "bruaponehi, cidapdag, cid."

It will be seen that the Stern proposal has a built-in mnemonic aid, in that the new names are arranged alphabetically. "Ap" contains one zero, "bru" two zeroes, "cid" three zeroes and so on. Such a proposal provides prospects of additions and refinements which could well approach those features of convenience some thousands of years of working with decimal notation have given us.

Yet it may nevertheless be profitable to explore some of the other ways in which a suitable naming system can be constructed.

As a starting point, let us elect to write our binary numbers in double groups of three, set off by a comma after (more accurately, before) each pair of groups. Our binary representation of the decimal year 1960 then becomes:

    011 110, 101 000

and we may proceed to a consideration of how to pronounce it.

Taking one semi-group of three digits as the unit "root" of the number-word, we find that there are eight possible "roots" to be pronounced:

    Binary
    ------
    000
    001
    010
    011
    100
    101
    110
    111

The full double group of six digits represents 8x8 or 64 possible cases. By assigning a pronunciation to each of the eight roots, and by affixing what other sounds may prove necessary as aids in pronunciation, we should then be able to construct a sixty-four word vocabulary with which we can pronounce any finite binary number.
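
The grouping conventions discussed so far are mechanical enough to sketch in a few lines of Python. What follows is only an illustration and no part of the original article: the function names to_roots, bell_reading and double_groups are invented here, and the leading group is padded with zeros to a full three digits, as the article itself does later when it writes 1960 as 011 110, 101 000.

```python
def to_roots(n: int) -> list[str]:
    """Split a non-negative integer into three-digit binary "roots",
    most significant first, zero-padding the leading root."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    bits = format(n, "b")
    width = -(-len(bits) // 3) * 3  # round the length up to a multiple of 3
    bits = bits.zfill(width)
    return [bits[i:i + 3] for i in range(0, width, 3)]


def bell_reading(n: int) -> list[int]:
    """Bell Laboratories style: read each three-digit group as a decimal digit."""
    return [int(root, 2) for root in to_roots(n)]


def double_groups(n: int) -> str:
    """Write the roots in pairs, pairing from the right, commas between pairs."""
    roots = to_roots(n)
    pairs = []
    while roots:
        pairs.insert(0, " ".join(roots[-2:]))
        roots = roots[:-2]
    return ", ".join(pairs)


if __name__ == "__main__":
    print(bell_reading(1960))   # the Bell reading of 1960
    print(double_groups(1960))  # 1960 in double groups of three
```

Under these assumptions bell_reading(1960) yields 3, 6, 5, 0, matching the reading "Three, six, five, zero," and double_groups(1960) yields 011 110, 101 000.
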

The problem of pronunciation does not, of course, limit itself to the case of one worker reading results aloud to another. It has been suggested that we all, in some degree, subvocalize as we read. To see the word "cat" is to hear the sound of the word "cat" in the mind, and when the mind is not able instantly to produce an appropriate sound reading falters. (This point is one which probably needs no belaboring for any person who has ever attempted to learn a foreign language out of books.) Reading is, indeed, accompanied often by a faint mutter of the larynx which can be detected and amplified to recognizable speech.

Our first suggestion for pronunciation is that there is no need to assign word equivalents to each digit in binary notation, whether the words be "one" and "zero", "dit" and "dah" or any other sounds. We may if we choose regard each digit as a letter, and pronounce the word they construct as a unit in its own right--as, indeed, the written word "cat" is pronounced as a unit and not as "see, ay, tee."

Perhaps the adoption of vowel sounds for the digit 0 (if only because 0 looks like a vowel) and consonant sounds for 1 will give us a starting point in this attempt. A useful consonant would be one which takes different sounds as it is moved forward and backward in the mouth: "t" and "d" are one such group in English. We can then assign one of these values for the single 1, one for a double 1, etc.

Let us assign to the single 1 the sound "t" and to the double 1 the sound "d." Postponing for a moment the consideration of the triple 1, and representing the as yet unassigned vowel sounds of 0 by the neutral "uh," we can pronounce the first seven basic binary groups as follows:

    Binary    Pronunciation
    ------    -------------
    000       (uh)
    001       (uh) t
    010       (uh) t (uh)
    011       (uh) d
    100       t (uh)
    101       t (uh) t
    110       d (uh)

The eighth case, 111, can be represented in various ways but, by and large, it seems reasonable to represent it by the sound "tee."
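
The rule just described treats each root as a sequence of runs: a run of 1s becomes "t," "d" or "tee" according to its length, and any run of 0s becomes the placeholder "(uh)." A minimal Python sketch of that reading follows; the names ONE_RUNS and first_pass are invented for this illustration, and the output keeps the article's provisional "(uh)" spelling without attempting any smoother pronunciation.

```python
from itertools import groupby

# Sounds for runs of 1s, as assigned in the text: single, double, triple.
ONE_RUNS = {1: "t", 2: "d", 3: "tee"}


def first_pass(root: str) -> str:
    """Spell a three-digit root by its runs of identical digits.

    Every run of 0s, whatever its length, becomes the neutral "(uh)",
    since the vowel sounds are not yet assigned at this stage.
    """
    if len(root) != 3 or set(root) - {"0", "1"}:
        raise ValueError("expected exactly three binary digits")
    parts = []
    for digit, run in groupby(root):
        length = len(list(run))
        parts.append(ONE_RUNS[length] if digit == "1" else "(uh)")
    return " ".join(parts)


if __name__ == "__main__":
    for value in range(8):
        root = format(value, "03b")
        print(root, first_pass(root))
```

Run over all eight roots, this reproduces the seven rows of the table above and gives "tee" for 111.
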

We now have a phonemic representation of each of the eight possible cases, as follows:

    Decimal   Written binary   Pronounced binary
    -------   --------------   -----------------
    0         000              uh
    1         001              ut
    2         010              uttuh
    3         011              ud
    4         100              tuh
    5         101              tut
    6         110              duh
    7         111              tee

We can then continue to count, by compounding groups, ut-uh (001 000), ut-ut (001 001), ut-uttuh (001 010), etc. In this way the binary equivalent of the date 1960 might be read as "Ud-duh, tut-uh."

Conceivably such a system, spoken clearly and listened to with attention, might serve in some applications, but merely by applying similar principles in assigning variable vowel sounds to the digit 0 we can much improve it.

One such series of sounds which suggests itself is the set belonging to the letter O itself: the single o as in "hot," the double o as in "cool," and, for the triple o, by the same substitution as in the case of "tee," the sound of the letter itself: "oh." Our first few groups then become:

    Decimal   Written binary   Pronounced binary
    -------   --------------   -----------------
    0         000              Oh
    1         001              Oot
    2         010              Ahtah
    3         011              Odd
    4         100              Too
    5         101              Tot
    6         110              Dah
    7         111              Tee

followed by:

    8         001 000          Oot-oh
    9         001 001          Oot-oot
    10        001 010          Oot-ahtah

et cetera.

Pronounceability has been somewhat improved, although we are still conscious of some lacks. The phonemic distinction between "ahtah" and, say, "odd-dah," the pronunciation of 011 110, is not great enough to be entirely satisfactory. At any rate, on the above principles our binary equivalent of 1960 is now pronounced, "odd-dah, tot-oh."

At this point we may introduce some new considerations. We have proceeded thus far on mainly logical grounds, but it is difficult to support the thesis that logic is the only, or even the principal, basis for constructing any sort of language. Irregularities and exceptions may in themselves be good things, as promoting recognition and lessening ambiguity.
It may suffice to have only an approximate correlation between the symbols and the sounds they represent, as, indeed, it does in the written English language.

As the author feels an arbitrary choice of supplemental sounds will enhance recognition, he has taken certain personal liberties in their selection. The consonant L is added to "oh," making it "ohl"; all root-sounds beginning with a vowel are given a consonant prefix when they occur alone or as the second part of a compound word; certain additional sounds are added at the end of the same roots when they occur as the first part of a compound word; "dah" becomes the diphthong "dye"; etc. The final list of pronunciations, then, becomes:

    Binary      Pronunciation when    Pronunciation when alone
    quantity    in first group        or in second group
    --------    ------------------    ------------------------
    000         ohly                  pohl
    001         ooty                  poot
    010         ahtah                 pahtah
    011         oddy                  pod
    100         too                   too
    101         totter                tot
    110         dye                   dye
    111         teeter                tee

Decimal 1960 is now in its binary conversion pronounced, "Oddy-dye, totter-pohl."

We may yet make one additional amendment to that pronunciation, however. The conventions of reading decimal notation aloud provide a further convenience for reading large numbers or for stating approximations. In a number such as:

    1,864,191,933,507

we customarily read "trillions," "billions," "millions," etc., despite the fact that there is no specific symbol calling for these words: we read them because we determine, by counting the number of three-digit groups, what the order of magnitude of the entire quantity is. The spoken phrase "two million" is a way of saying what we sometimes write as 2x10^6. In the above number we might say that it amounts to "nearly two trillion" (in America, at least), which would be equivalent to "nearly 2x10^12."

A similar convention for binary notation might well be merely to announce the number of groups (i.e., double groups of six digits) yet to come.
An inconveniently long number might be improved in this way, a number such as

    101 001, 111 011, 001 010, 000 100

being read as "Totter-poot three groups, teeter-pod, ooty-pahtah, ohly-too."

By the phrase "one group" we would understand that the quantity of the entire number is approximately the product of the number spoken before it times 2^6 or 64 (1,000 000). "Two groups" would indicate that the previous number was to be multiplied by 2^12, "three groups" by 2^18, etc. Or, in binary notation:

    One group:      x 1,000 000^1
    Two groups:     x 1,000 000^10
    Three groups:   x 1,000 000^11
    Four groups:    x 1,000 000^100

And so on, so that as we say, in round decimal numbers: "Oh, about three million," we might say in round binary numbers, "Oh, about ooty-poot three groups."

It may be considered that there is an impropriety in using a term borrowed from decimal notation to indicate binary magnitudes. Such an impropriety would not be entirely without precedent--the word "thousand," for example, etymologically related to "dozen," being an apparent decimal borrowing from a duodecimal system. In any event, as logic and propriety are not our chief considerations, we may reflect that the group-numbering will occur only when, sandwiched between binary terms, there will be small chance for confusion, and elect to retain it.

With this final emendation our spoken term for the binary equivalent of 1960 becomes, "Oddy-dye one group, totter-pohl."

Let us now consider that we have achieved a satisfactory pronouncing system for binary numbers and take up the question of whether similar principles can lead us to a more compact and recognizable method of graphically representing these quantities. The numbing impact on the senses of even a fairly large number expressed in binary terms is well known.
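
The pronouncing system as it now stands can be sketched in Python. This is an illustration under stated assumptions rather than the author's own procedure: the names FIRST, SECOND, COUNTS and pronounce are invented here, the two word lists are copied from the final list of pronunciations, a root left over at the front of the number is given its "alone" form, and the count of groups still to come is always announced after the first word. Following that list strictly, a root in second position takes its "p" form, so 101 001 comes out as "totter-poot."

```python
# Final pronunciation list: root in first position of a double group ...
FIRST = {"000": "ohly", "001": "ooty", "010": "ahtah", "011": "oddy",
         "100": "too", "101": "totter", "110": "dye", "111": "teeter"}
# ... and root standing alone or in second position.
SECOND = {"000": "pohl", "001": "poot", "010": "pahtah", "011": "pod",
          "100": "too", "101": "tot", "110": "dye", "111": "tee"}
# Spelled-out group counts; larger counts fall back to digits (an assumption).
COUNTS = ["zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten"]


def pronounce(n: int) -> str:
    """Spell a non-negative integer in double groups of three binary digits."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    bits = format(n, "b")
    bits = bits.zfill(-(-len(bits) // 3) * 3)  # pad to a multiple of 3
    roots = [bits[i:i + 3] for i in range(0, len(bits), 3)]

    words = []
    if len(roots) % 2:  # an unpaired leading root is pronounced "alone"
        words.append(SECOND[roots.pop(0)])
    for i in range(0, len(roots), 2):
        words.append(FIRST[roots[i]] + "-" + SECOND[roots[i + 1]])

    remaining = len(words) - 1  # double groups still to come
    if remaining == 0:
        return words[0]
    count = COUNTS[remaining] if remaining < len(COUNTS) else str(remaining)
    unit = "group" if remaining == 1 else "groups"
    return f"{words[0]} {count} {unit}, " + ", ".join(words[1:])


if __name__ == "__main__":
    print(pronounce(1960))
    print(pronounce(0b101001111011001010000100))
```

With these choices pronounce(1960) returns "oddy-dye one group, totter-pohl," and the long example of four double groups returns "totter-poot three groups, teeter-pod, ooty-pahtah, ohly-too."
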
Although conventions for setting off groups and stating approximations, as outlined above, may be helpful, there would appear to be intrinsic opportunities for error in writing precise measurements, for example, in binary notation: One of the most common writing errors in numbering arises from transposing digits, and as binary numbers have in general about three times as many digits as their decimal equivalents, they can be assumed to furnish three times as many opportunities for error.

We have previously chosen to write binary numbers in double groups of three digits each. As each group represents eight possible cases we would, in the manner of the Bell workers, represent each group by its decimal equivalent, so that decimal 1960, which we have given as binary 011 110, 101 000, could be written in some such fashion as B36,50.

Yet again we may hope to find advantages in a uniquely designed system for binary numbers. The author has experimented with a system which has little in common with notations intended for human use but some resemblance to records prepared for, or by, machines. For example, a rather rudimentary "reading" machine intended to grade school examinations or similar marked papers does so by taking note not of the symbol written but of its position on the paper--check marks, blacked-in squares, etc. An abacus, too, denotes quantities by in effect recording the position of the space between the beads at the top of the wire and the beads at the bottom.

Indeed, one such system might be designed after the model of the abacus, requiring the use of paper pre-printed with a design like this:

    *  *  *  *  *  *  *  *  *  *  *  *
    *  *  *  *  *  *  *  *  *  *  *  *
    *  *  *  *  *  *  *  *  *  *  *  *
    *  *  *  *  *  *  *  *  *  *  *  *
    *  *  *  *  *  *  *  *  *  *  *  *
    *  *  *  *  *  *  *  *  *  *  *  *
    *  *  *  *  *  *  *  *  *  *  *  *
    *  *  *  *  *  *  *  *  *  *  *  *

Each vertical column of eight dots can represent one three-digit group.
To represent the binary equivalent of 1960, one would circle the fourth dot in the first column, the seventh dot in the second, the sixth dot in the third and the first dot in the fourth. The first dot in each column would represent 000, the second 001 and so on.

If we permit the use of preprinted paper, a more compact design might be a series of drawings of two squares, one above the other, thus:

     __   __   __   __   __
    |  | |  | |  | |  | |  |
    `--' `--' `--' `--' `--'
     __   __   __   __   __
    |  | |  | |  | |  | |  |
    `--' `--' `--' `--' `--'

Each pair of two squares is made up of eight lines. By assigning to each side of a square a value from 000 through 111, a simple check mark or dot would show the value for that group:

                 010 (2)
                +-------+
                |       |
        001 (1) |       | 011 (3)
                |       |
                +-------+
                 100 (4)

                 110 (6)
                +-------+
                |       |
        101 (5) |       | 111 (7)
                |       |
                +-------+
                 000 (0)

A vertical pair of squares in which the left-hand side of the lower square was checked would then show that the value 101, or 5, was recorded for that group.

Some readers will have remarked that the orderly system of representing the groups from 000 to 111 has been abandoned, the 000 group coming last in this representation. The reason for this is that we have a perfectly adequate symbol for a zero quantity already--that is, 0, which is unambiguous in numbering to almost all bases. (The sole exception is the monadic system, to the base 1. As this does not permit positional notation it does not require any zero.) Using the 0 symbol here does not, of course, fit in with our plan of checking off lines on squares, but this is itself only a way-station in the attempt to find a suitable notation which will not require the use of preprinted paper.

We might find such a notation by drawing the squares ourselves as needed, or at least drawing such parts of them as are required. For 001 we would have to show only the first line of the first square. For 010 we need two lines, the left-hand side and the top. For 011 we require three lines.
At this point we begin to find the drawing of lines laborious and reflect that, as it is only the last line drawn which conveys the necessary information, we may be able to find a way of omitting the earlier ones.

Can we do this? We can if, for example, we explode the square and give it a clockwise indicated motion by means of arrowheads:

    A------->
    |       |
    |       |
    |       |
    <-------V

Each arrow has a unique meaning. A is always 001; < is always 100. In order to distinguish the arrows of the first and second squares, we can run a short line through the arrows of the lower square; thus V+ is 111, not 011.

Having come this far, we discover that we are still carrying excess baggage; the shaft of the arrow is superfluous, the head alone will give us all the information we need. We are then able to draw up this table:

    Decimal   Binary, digital   Binary, graphic
    -------   ---------------   ---------------
    0         000               0
    1         001               A
    2         010               >
    3         011               V
    4         100               <
    5         101               A+
    6         110               >+
    7         111               V+

and our binary equivalent for 1960 may be written as V>+,A+0.

The writer does not suggest that in this discussion all problems have been solved and binary notation has been given all the flexibility and compactness of the decimal system. Even if this were so, the decimal system possesses many useful devices which have no parallel here--the distinction between cardinal and ordinal numbers, accepted conventions for pronouncing fractions, etc. It seems probable, however, that many nuisances found in conversion between binary and decimal systems can be alleviated by the application of principles such as these, and that, with relatively little difficulty, additional gains can be made--for example, by adapting existing computers to print and scan binary numbers in a graphic representation similar to the above.
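
What "printing" and "scanning" the graphic notation might involve can be sketched in Python, using the plain-text stand-ins of the table above (A, >, V, < and a trailing + for the lower square). The names GLYPHS, ROOTS, to_graphic and from_graphic are invented for this sketch, and the comma between double groups is an assumption about layout rather than part of the notation itself. Applied root by root to 011 110, 101 000, the table gives V, >+, A+ and 0.

```python
# Plain-text stand-ins for the arrowheads; "+" marks the lower square.
GLYPHS = {"000": "0", "001": "A", "010": ">", "011": "V",
          "100": "<", "101": "A+", "110": ">+", "111": "V+"}
ROOTS = {glyph: int(root, 2) for root, glyph in GLYPHS.items()}


def to_graphic(n: int) -> str:
    """ "Print": write a non-negative integer as glyphs, in double groups."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    bits = format(n, "b")
    bits = bits.zfill(-(-len(bits) // 3) * 3)  # pad to a multiple of 3
    glyphs = [GLYPHS[bits[i:i + 3]] for i in range(0, len(bits), 3)]
    pairs = []
    while glyphs:  # pair the glyphs from the right
        pairs.insert(0, "".join(glyphs[-2:]))
        glyphs = glyphs[:-2]
    return ",".join(pairs)


def from_graphic(text: str) -> int:
    """ "Scan": read glyphs back into an integer, ignoring commas."""
    value = 0
    i = 0
    while i < len(text):
        if text[i] == ",":
            i += 1
            continue
        glyph = text[i]
        if i + 1 < len(text) and text[i + 1] == "+":
            glyph += "+"
        if glyph not in ROOTS:
            raise ValueError(f"unknown glyph {glyph!r}")
        value = value * 8 + ROOTS[glyph]  # each glyph is one base-8 digit
        i += len(glyph)
    return value


if __name__ == "__main__":
    written = to_graphic(1960)
    print(written, from_graphic(written))
```

Because each glyph stands for one three-digit root, scanning amounts to reading a base-8 numeral, and from_graphic(to_graphic(n)) returns n for any non-negative n.
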

Indeed, it further seems only a short step to the spoken dictation to the computer of binary numbers and instructions, and spoken answers in return if desired; but as the author wishes not to confuse his present contribution with his prior publications in the field of science fiction he will defer such prospects to the examination of more academic minds.

    Decimal   Binary, digital    Binary, graphic   Pronounced
    -------   ---------------    ---------------   ----------
    0         000                0                 pohl
    1         001                A                 poot
    2         010                >                 pahtah
    3         011                V                 pod
    4         100                <                 too
    5         101                A+                tot
    6         110                >+                dye
    7         111                V+                tee
    8         001 000            A0                ooty-pohl
    9         001 001            AA                ooty-poot
    10        001 010            A>                ooty-pahtah
    11        001 011            AV                ooty-pod
    12        001 100            A<                ooty-too
    13        001 101            AA+               ooty-tot
    14        001 110            A>+               ooty-dye
    15        001 111            AV+               ooty-tee
    16        010 000            >0                ahtah-pohl
    17        010 001            >A                ahtah-poot
    18        010 010            >>                ahtah-pahtah
    19        010 011            >V                ahtah-pod
    20        010 100            ><                ahtah-too
    30        011 110            V>+               oddy-dye
    40        101 000            A+0               totter-pohl
    50        110 010            >+>               dye-pahtah
    100       001,100 100        A,<<              poot one group too-too
    1,000     001 111,101 000    AV+,A+0           ooty-tee one group totter-pohl
    ...       ...                ...               ...

    Decimal:          1,000,000
    Binary, digital:  011,110 100,001 001,000 000
    Binary, graphic:  V,>+<,AA,00
    Pronounced:       pod three groups dye-too ooty-poot ohly-pohl
                      OR "Oh, about pod three groups."

From: On Binary Digits And Human Habits by Frederik Pohl (1962)

See also:

    How To Count On Your Fingers by Frederik Pohl (1956)
    BCR Bytewords
    RFC 1751 S/KEY, Another Human Readable Encoding (1994)
    PGP Word List (1995)

* * *

The number 31337 could be represented as:

    111,101 001,101 001

And spoken as:

    tee two groups totter-poot totter-poot

Below is an AWK script to convert between decimal and Pohl code.

    pohlcode.awk

Extra credit ideas:

* Floating point numbers
* Equally amusing hand signs for a signed version

tags: article,retrocomputing,technical