\ TR PROGRAM
\ TRANSLATES SOURCE FILE INTO DESTINATION FILE.
\ WORKS LIKE UNIX tr WITH FOLLOWING EXCEPTIONS:
\ 1. -A OPTION NEEDED FOR ASCII FILES.
\ 2. HANDLES NULL CHARACTERS!

\ This program Copyright (C) 1985 by Thomas Almy.
\ Permission is granted to registered users of ForthCMP to sell or distribute
\ computer programs incorporating the compiled contents of this file.


0 [IF]

( note -- program has been modified since writing this paper, and
  this paper benchmarks the original CP/M version of the compiler)

                              TRANSLATE PROGRAM
                                 by Tom Almy
                                 August 1985

PROGRAM DESCRIPTION     

     This program was designed to mimic the functionality of the "tr"
program provided on UNIX (tm Bell Labs) systems.  While written to be
compiled with the author's ForthCMP Forth Compiler, it can be
utilized on any 83 Standard system by providing an appropriate file
system interface.

     TR is used to make one or more single character substitutions in
a file.  ForthCMP's FILTER file interface allows specifying an input
file and an optional output file (if no output file is specified,
output goes to the display).  The file name(s) are followed by an
optional option specification and one or two character specifying
strings. 

     Characters in the strings may be any character except "\"
(backslash) or "-" (hyphen).  Any of the 256 possible character codes
can be specified by backslash followed by one, two, or three octal
digits.  Backslash followed by a lower case character becomes an
upper-case character (done to allow putting the string on a CP/M
command line).  Backslash followed by any other character is that
character, so "-" can be represented as "\-" and "\" can be
represented as "\\".  A range of characters can be represented by the
first character followed by a hyphen followed by the last character.

     If no options are specified, a translation occurs in which each
character which is in the first specification string is replaced with
the character in the same position in the second specification
string.  If the second string is shorter than the first then the
string is extended by appending copies of its last character.

     The option specifcation can contain any or all of the following
characters:

A ASCII mode: On input CR is deleted (leaving just LF of CR-LF pairs)
     and on output LF is replaced with CR-LF pairs.  This allows 
     translating to or from CR-LF pairs.  Additionally, Control-Z denotes
     end of file.

C Complement first string: The first string is replaced with a new
     string consisting of the characters in the range 0 through 255
     not in the first string.  This string is sorted.

D Delete instead of translate: No translation takes place; 
     instead any characters in the first specification string are deleted.

S Squeeze output: Sequential occurrences in the character stream
     (after translation/deletion) of two or more identical
     characters in the second specification string are squeezed to a
     single occurrence.

Example commands:

Options   String1        String2        Function
          a-z            A-Z            Upcase file
          a-zA-Z         A-Za-z         Swap case file
AS        \12            \12            Delete blank lines
ACS       !-~            \12            Put all words on separate lines
AS        \40            \12            Put all words on separate lines
ACDS      A-Za-z\12\40                  Delete all non alphabetics, except
                                        spaces and newlines.
          \200-\377      \0-\177        Clear parity bits.

PERFORMANCE

     I compared the performance of the Forth program, using the
ForthCMP compiler, with that of C, using the MANX (AZTEC) compiler.
The system used was a LOBO MAX-80, which has a 5-Mhz Z-80 processor,
1.2 MByte 8" floppy drives, and runs CP/M+.


Characteristic                          Forth          C

Source file lines (not blank)           163            139

Compilation time    Compile Step        44             44   seconds
                    Assemble Step       none           32
                    Link Step           none           38         
                    TOTAL               44             114

COM file size                           3584           9984 bytes

Test case execution time                21             138  seconds

     The test case involved upcasing a 14k byte file.  The PIP
program (which is written in assembly language) took 16 seconds.

     The ForthCMP compiler compiles and links in a single step.  5
seconds was spent producing a load map (not done in the C example),
so the ForthCMP compilation time could really be considered to be 39
seconds.  



READING THE LISTING

     First, ignore the INCLUDE, ROMABLE, and IN/OUT commands, as they
are directives for the compiler.  The definition of CARRAY is "CREATE
ALLOT DOES> +".  The definition of C<- is "SWAP C!". The non-standard
words ?DO " <= >= ON OFF ASCII CONTROL SKIP and SCAN and Eaker' case
statement (CASE OF ENDOF ENDCASE) have their usual definitions.

     The file interface redefines KEY and EXPECT to read from the
input file.  KEY returns -1 on end of file; otherwise it returns the
next character as an integer in the range 0 through 255.  Because the
new EXPECT does not echo and has no editing, OLD- EXPECT (which is
system dependent) had to be provided.  Output (EMIT and words which
call it) is rewritten to send output to the output file when FILTER
is executed, or to the display when CONSOLE is executed.

     SETFILES is used to initialize the input and output files, and
returns TRUE if successful.  The double variable OPTIONSTRING is set
to contain a pointer to and length of the command tail (that part
excluding the file specifications).  ENDFILES does any necessary file
closing.



[THEN]



\ Modified for new filter August, 1986
\ Modified for newest DOS interface 12/91
100 MSDOS
HEX 4000 DECIMAL CONSTANT BUFSIZ
INCLUDE DOS1


256 CARRAY TRTABLE  \ translation table 
256 CARRAY SQTABLE  \ squeeze duplicates table 
CREATE  INLIST  512 ALLOT  \ instring values 
CREATE  OUTLIST 512 ALLOT  \ outstring values 
VARIABLE DEL-FLAG   \ deletion flag specified 
VARIABLE COM-FLAG   \ reverse sense flag specified 
VARIABLE SQU-FLAG   \ squeeze output string flag 
VARIABLE ASC-FLAG   \ ascii-mode --> CR dropped on input, added
                    \ before LF's on output , CONTROL-Z terminates file 
VARIABLE ^LIST
VARIABLE LASTCHAR

CTRL M CONSTANT ACR	\ Carriage Return
CTRL J CONSTANT ALF	\ Line Feed


\ OUTPUT FILE HANDLING ( basically filter.4th )

VARIABLE outhandle  stderr outhandle !
VARIABLE outbuffer
VARIABLE outbufptr
VARIABLE outbufend

0 0 IN/OUT 
: flushout   outbuffer @ outbufptr @ <> IF
	outhandle @ outbuffer @  outbufptr @ outbuffer @ - DUP >R write
	outbuffer @ outbufptr ! R> <> IF  stderr outhandle !
		." DISK FULL " flushout 4 RETURN THEN 
	THEN ;

: EMIT  outbufptr @ DUP outbuffer @ BUFSIZ + = IF flushout
   DROP outbuffer @ THEN C! 1 outbufptr +! ;

0 0 IN/OUT 
: CONSOLE flushout stderr outhandle ! ;

0 0 IN/OUT 
: FILTER  flushout stdout outhandle ! ;

0 0 IN/OUT : BYE flushout  bye ;

0 0 IN/OUT : ABORT flushout 4 RETURN ;

\ INPUT FILE PROCESSING
VARIABLE inbuffer  ( pointer to allocated buffer )
VARIABLE inbufptr  
VARIABLE inbufend

0 0 IN/OUT
: SETBUFS  ( must execute before any I/O to allocate buffers )
  129 128 C@ >BUFFER
  HERE inbuffer !
  BUFSIZ ALLOT
  HERE DUP outbuffer ! outbufptr !
  BUFSIZ ALLOT
  ; 



\ This version of KEY returns -1 on end of file!
: KEY  inbufptr @ inbufend @ = IF ( fetch block )
	stdin inbuffer @ BUFSIZ read ?DUP 0= IF ( EOF/ERROR ) -1  EXIT THEN
	inbuffer @ + inbufend !
	inbuffer @ inbufptr ! THEN
    inbufptr @ C@  1 inbufptr +!  ;


\ Commentary
0 0 IN/OUT
: HELLO
  ." TRANSLATE PROGRAM" CR
  ." Copyright (C) 1985 by Thomas Almy." CR ;

0 0 IN/OUT
: USAGE
	CONSOLE 
	CR ." [-[A][C][D][S]] str1 [str2]"
	CR  ." Options are Ascii Complement-str1 Delete Squeeze"
	CR  ." strings may have \octal or range specifications."
	ABORT
;

\ List Accessing
1 0 IN/OUT
: ISLIST ( list -- ) ^LIST ! ;

1 0 IN/OUT
: !LIST ( char -- ) ^LIST @ !  2 ^LIST +! ;

0 1 IN/OUT
: @LIST ( -- char ) ^LIST @ @  2 ^LIST +! ;

\ Miscellaneous Subroutines
1 1 IN/OUT
: UPCASE  ( char -- char )
   DUP [CHAR] a >= IF DUP [CHAR] z <= IF BL - THEN THEN ;

PRIMITIVE
: NEXT-CHAR ( addr len -- addr+1 len-1 char, or zero if end )
   DUP IF 1- SWAP COUNT ROT SWAP ELSE FALSE THEN ;

PRIMITIVE
: OCTAL? ( addr len -- addr len boolean )
   OVER C@ DUP [CHAR] 0 >= SWAP [CHAR] 7 <= AND ;

: ?BACKSLASH ( addr len char -- addr' len' value )
   DUP [CHAR] \ = IF DROP
   OCTAL? IF NEXT-CHAR [CHAR] 0 - >R
        OCTAL? IF NEXT-CHAR [CHAR] 0 - R> 8 * + >R
          OCTAL? IF NEXT-CHAR [CHAR] 0 - R> 8 * + >R
        THEN THEN R>
    ELSE
        NEXT-CHAR  
   THEN THEN ;

: FILL-LIST ( string length list -- )
  ISLIST
  BEGIN  NEXT-CHAR ?DUP  WHILE
    DUP [CHAR] - = IF DROP NEXT-CHAR ?BACKSLASH 1+
        ^LIST @ CELL- @ 1+  DO I !LIST LOOP  ELSE
    ?BACKSLASH !LIST  THEN  REPEAT
  -1 !LIST ( delimit list )
  2DROP ;


\ Handle option string
0 0 IN/OUT
: DO-OPTION-STRING
	HERE COUNT SWAP 1+ SWAP 1 ?DO
		COUNT UPCASE CASE
			[CHAR] A OF  ASC-FLAG ON ENDOF
			[CHAR] D OF  DEL-FLAG ON ENDOF
			[CHAR] C OF  COM-FLAG ON ENDOF
			[CHAR] S OF  SQU-FLAG ON ENDOF
			." UNKNOWN OPTION -- " EMIT USAGE ENDCASE
		LOOP 
	DROP
	BL WORD DROP  ( scan next word )
	;

0 0 IN/OUT
: SET-OPTIONS
	ASC-FLAG OFF
	DEL-FLAG OFF
	COM-FLAG OFF
	SQU-FLAG OFF
	BL WORD COUNT 0> SWAP C@ [CHAR] - = AND IF ( an option string )
		DO-OPTION-STRING  
	THEN
	;


\ Various Table handling routines
1 0 IN/OUT 
: SET-SQUTABLE    ( hostlist -- )  ISLIST
  ['] SQTABLE >BODY 256 0 FILL
   BEGIN  @LIST DUP 0< 0= WHILE
            SQTABLE TRUE C<- ( set flag in byte )
   REPEAT DROP ;

0 0 IN/OUT 
: COMPLEMENT-LIST ( complements INLIST )
  INLIST  SET-SQUTABLE  INLIST ISLIST
  256 0 DO I SQTABLE C@ 0= IF I !LIST THEN LOOP
  -1 !LIST  ;

0 0 IN/OUT
: FILL-TRTABLE   ( TRTABLE gets filled from INLIST )
  ['] TRTABLE >BODY 256 0 FILL
     INLIST ISLIST BEGIN  @LIST DUP 0< 0= WHILE
           TRTABLE TRUE C<-  ( set flag in byte )
      REPEAT  DROP ;

0 0 IN/OUT
: SET-TRTABLE  ( TRTABLE is translation table from INLIST to OUTLIST )
   256 0 DO I DUP TRTABLE C! LOOP   INLIST ISLIST
   OUTLIST BEGIN  ^LIST @ @ 0< 0= WHILE
      DUP @ 0< IF DUP CELL- @ ELSE DUP @ SWAP CELL+ SWAP THEN
      @LIST TRTABLE C! REPEAT
   DROP ;

\ Information from user?
0 0 IN/OUT
: GET-RANGES
	HERE COUNT INLIST FILL-LIST
	COM-FLAG @ IF 
		COMPLEMENT-LIST 
	THEN
	
	BL WORD COUNT OUTLIST FILL-LIST
	SQU-FLAG @ IF 
		OUTLIST SET-SQUTABLE 
	THEN
	DEL-FLAG @ IF 
		FILL-TRTABLE 
	ELSE 
		SET-TRTABLE 
	THEN
;

\ Translate functions
PRIMITIVE
: NOT-DELETED? ( key -- key TRUE OR FALSE )
	DUP TRTABLE C@  IF DROP FALSE  ELSE TRUE THEN ;

1 0 IN/OUT
: SEND-IT SQU-FLAG @ IF
           DUP SQTABLE C@ IF
             DUP LASTCHAR @ = IF  ( a duplicate! )
                DROP EXIT THEN THEN
           DUP LASTCHAR ! THEN
    DUP ALF = IF
	ASC-FLAG @ IF
		ACR EMIT THEN THEN
   EMIT ;

: NEW-KEY? ( -- key TRUE OR FALSE )
	ASC-FLAG @ IF 
  		BEGIN KEY DUP ACR = WHILE DROP REPEAT
		DUP 0< OVER [CTRL] Z = OR   
	ELSE
		KEY DUP 0<  
	THEN  
	IF DROP FALSE ELSE TRUE THEN ;

0 0 IN/OUT
: TRANSLATE  
	LASTCHAR ON
	BEGIN 
		NEW-KEY? 
	WHILE
		DEL-FLAG @ IF 
			NOT-DELETED? IF SEND-IT THEN
		ELSE  
			TRTABLE C@  SEND-IT 
		THEN
	REPEAT
   ;

\ TOP LEVEL
: MAIN   
	SETBUFS
	HELLO
	FILTER
	SET-OPTIONS
	GET-RANGES
	TRANSLATE
	BYE
;

INCLUDE DOS2
INCLUDE FORTHLIB
END
