I. Introduction

TYPO3.MAL is a typo finder/spelling checker that runs in about 30% of the time TYPO2C takes and uses slightly less memory. TYPO3 is subject to some limits (explained in more detail below):

 - Do not use it on a document longer than about 1400 words (6000 bytes). If your document is longer, split it up and check each part separately.
 - The program requires between about 4k and 5.5k of free memory, depending on the document's length.
 - For about 10% of documents the program will stop with a Bad Subscript (Error=9), Out of String (Error=14) or Out of Memory (Error=7) error during execution. Why, and what to do, are described below.

TYPO3 looks for two kinds of typos. First, it produces a list of words found once in the document that appear nowhere else in memory. The theory is that typos (e.g., transpositional errors) will occur only once, but good words are likely to appear more than once. Some of the words on the list will be fine, but your job in reviewing the document will be simplified, typically to reviewing about 1/3 of the words in the document. (The list will include both the front and the back of any word hyphenated at the end of a line.) Second, the program will display a list of doubled words ("Paris in the the spring").

Here's a sense of what it takes to run the program:

                        FILE 1   FILE 2   FILE 3

 Bytes long               2330     5009     3885
 Words long                383     1280     1083
 Distinct words            194      285      238

 FRE(0) (holding 2500 in TYPO2C and 3000 in TYPO3 for non-string)
 TYPO2C                   1247      497      442
 TYPO3                    1569      744      689

 Time to review document being proofed (MIN:SEC)
 TYPO2C                   5:19    20:36    12:39
 TYPO3                    3:00     7:31     5:16

 Total time until display of results (MIN:SEC)
 TYPO2C                  49:27    65:04    53:45
 TYPO3                   15:11    18:23    16:14

Your mileage may vary. Use for comparison only. (Done with little else in memory besides this version.)

II. Runtime.

The program displays a copyright/liability notice. Then it displays a list of files and asks "Which file to check?"
You can answer in upper or lower case, and with or without the ".DO" extension. The program will convert the name to upper case and add ".DO" if it's missing. If you enter an illegal file name, the program will crash; it has no error correction. Because of the way the program allocates memory, if it crashes here OR FOR ANY REASON, after fixing the problem do *NOT* simply hit F4 [RUN]. Go back to the menu and re-enter the program from there.

Assuming you've entered a legal name, the screen will clear and numbers will flash near the left margin on the second line of the display. These numbers are a running count of the spaces in the document, which gives a rough estimate of the number of words. The starting time will appear near the bottom of the screen. Using a formula based on this count, the program will create arrays to hold the different words and count their frequency.

If you get "ERROR= 7" (out of memory) or "ERROR=14" (out of string space) here, you have a couple of choices. For an Out of Memory error, increase the "3000" in line 1. For an Out of String error, either increase your free memory by killing a file in memory or decrease the "3000" in line 1. If these don't help, cut the document in two with CUT and PASTE. After making any of these changes, remember to empty the paste buffer and to restart by running from the main menu.

If you don't get an error, the word "WAIT" will switch back and forth between normal and reverse video on the line beneath the count described above. Beneath "WAIT" a new count (of the number of different words) will appear. When the program has finished the file to be checked and moves on to search other files, the name of the file being searched will appear on the line below this second counter, and a new time will be printed to the right of the time already there.
If instead you get "ERROR=9" (Bad Subscript), change the "150" in line 5 to "180" or "200" and rerun from the menu.

If you are checking a file whose name is the first part of the name of another file (e.g., "Memo" and "Memo1"), the second file will not be used to look for common words. (But "Memo1" will be checked against "Memo2".)

After the last document file is checked, "SEARCHING ROM" will appear in place of a file name. The screen will then clear, say "STANDBY FOR RESULTS", list the ending time, and begin to display "unique" words, all in lower case regardless of how they appear in the document. Every seven lines the screen will say "HIT ANY KEY TO CONTINUE" and wait for a key. When the list is finished the screen may say (if applicable) "The following words appear to be doubled:" and print a list of doubled words. The screen will then say "*****ALL DONE*****", "Break in 29", "Ok". If you want to see the word list again, type "GOTO 26".

As the program displays unique words, you should write down any that are typos. When the program is finished, enter the document from the menu, use the F1 [FIND] function, type in the typo, and hit enter. Then fix the typo!

III. The program.

This section is intended only for people who want to understand the approach of the program or fiddle with (and not just use) the program.

Line 0 prints a copyright and liability notice.

Line 1 allocates all but 3000 free bytes to string use and defines all variables through "T" as integers, the rest as strings. It clears the screen, displays FILES, and assigns the name of the file to be checked to U and X and U's length to M. If the file name is not in all caps, the program changes it to all caps.

Line 2 adds ".DO" to the file name, if needed.

Lines 3 and 4 do the space (word) count.

Line 5 uses the space count to guesstimate the number of different words in the file.
It then divides this number by 10, and creates two arrays with 11 rows (don't forget 0) and a number of columns equal to one-tenth the estimated number of words. (One array is for words, the other for frequencies.)

ASIDE: This two-dimensional array is the key to TYPO3's speed. I am indebted to my colleague, Howard Smith, for the idea of using it. The basic notion was this: the first thing one does in a real dictionary is go to the section based on the first letter. This looks like a 26-row array. (Sorting the words within each row would be very time consuming, but picking a row isn't.) But the array would need as many columns as there are words starting with the most frequent first letter, a gross waste of columns in the rows for infrequent first letters. TYPO3 instead uses ten rows, each holding one or more first letters. Based on letter frequency, each row should be about the same length. There is also a general spill-over row.

Back to the program by lines.

Line 6 opens the file.

Line 7 takes the ASCII value of the next character in the document and checks it. If it is a control code, the line takes another character; if it is a space or carriage return, it declares the end of a word and goes to line 9; otherwise it makes sure the character is lower case and goes to line 8, which adds the character to the word and goes back to line 7. (ASCII values instead of letters are used where possible to avoid garbage collection.)

Line 9 strips punctuation from the right of a word, line 10 from the left.

Line 11 checks this word against the last, and if they are the same adds it to the doubled word array, U.

Line 12 computes which of the ten main rows of the array contains words that start with the same letter as the new test word.

Line 13 checks to see if this word is already in that row of the array. If so, the frequency is upped; if not, and the file being processed (see below) is the one to be checked, the word is added to the list and the screen count is upped by line 18.
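The row-per-first-letter table can be sketched in Python rather than Model 100 BASIC. This is only an illustration under stated assumptions, not TYPO3's actual code: the grouping of letters into rows (ROW_LETTERS), the choice to send non-letters to the spill-over row, the error raised when the spill-over row fills up, and the names BucketTable, row_for, add and singletons are all invented for the example. The document does not give the real letter groupings.

```python
# Hypothetical grouping of first letters into the ten main rows (1..10).
# TYPO3's real grouping is based on letter frequency and is not given in
# this document; these groups are made up for illustration.
ROW_LETTERS = ["a", "b", "c", "de", "fgh", "ijklm", "nopqr", "s", "t", "uvwxyz"]
SPILL_ROW = 0  # the general spill-over row


def row_for(word):
    """Return the main row (1..10) for a non-empty lower-case word."""
    first = word[0]
    for row, letters in enumerate(ROW_LETTERS, start=1):
        if first in letters:
            return row
    return SPILL_ROW  # assumption: digits and other characters go to row 0


class BucketTable:
    """Two parallel 11-row tables: one for words, one for frequencies."""

    def __init__(self, columns):
        self.columns = columns  # maximum number of words per row
        self.words = [[] for _ in range(11)]
        self.counts = [[] for _ in range(11)]

    def add(self, word, insert=True):
        """Count one occurrence. Return True if the word was already stored.

        insert=False models the cross-check pass over other files, where
        known words are counted but new words are not added.
        """
        home = row_for(word)
        rows = [home] if home == SPILL_ROW else [home, SPILL_ROW]
        for row in rows:
            if word in self.words[row]:
                col = self.words[row].index(word)
                self.counts[row][col] += 1
                return True
        if insert:
            # Use the home row unless it is full, then fall back to row 0.
            target = home if len(self.words[home]) < self.columns else SPILL_ROW
            if len(self.words[target]) >= self.columns:
                raise IndexError("spill-over row is full")  # sketch-only behavior
            self.words[target].append(word)
            self.counts[target].append(1)
        return False

    def singletons(self):
        """Return, sorted, every stored word that was counted exactly once."""
        return sorted(
            w
            for row_words, row_counts in zip(self.words, self.counts)
            for w, n in zip(row_words, row_counts)
            if n == 1
        )
```

Only one short row (plus the spill-over row) is scanned per word, instead of the whole word list, which is where the speed-up described in the aside comes from. Calling add on each word of the document and then singletons yields the "unique" word list.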
If the end of the row is reached, the word is added to the general overflow row (0), which from then on is also checked before line 18 is reached.

Line 19 alternates the "WAIT".

Line 20 checks to make sure it's a file, not ROM, being processed; if so it goes to get the next word.

Line 21 is reached on an End of File situation. It sets the file flag from 0 to 1 on the first pass through; otherwise it returns to the calling routine.

Lines 22-23 (borrowed from "FILEN" on this SIG) find other ".DO" files to use in cross checking. They also make sure the file to be corrected (X) is not rechecked.

Lines 24-25 search ROM words. Both the other-file lines and the ROM lines call the main word-checking part of the program as a subroutine.

Lines 26-27 print out the unique words.

Lines 28-29 print out the doubled words.

Lines 40-42 are the ON ERROR routine. They print the error number (see your manual) and line, and sound 7 beeps once a minute until interrupted by hitting any key.

Good luck!

&mike&