Notes on AWK-to-C Translator Design
===================================
(would probably only make sense to those who are/were involved with the
 development of GAWK)


For those who are interested in the design of the translator (and perhaps
want to contribute to further development):

First of all, please read the comments in the source files (eval.c, eval2.c,
main.c, driver.c, and awk.h in particular). 

The basic scheme starts off by using a slightly modified parser from GAWK
which produces a parse tree.  The original recursive-descent interpreter
and expression evaluator (interpret() and tree_eval() in eval.c) were then
EXTENSIVELY revised to walk the parse tree and emit C code on the fly.
To emit optimized C code, the translator first pre-walks the parse tree and
records information about user variables and the like in the parse tree.
Then, on the actual translation walk, the expression evaluator emits
optimized C code based on how variables are used in the AWK program (e.g.
if a variable is only ever used as a number in the life of a program and
never as a string, then the emitted C code can be simpler and therefore
faster at runtime).  I won't go into the various types of optimization the
translator attempts and instead direct you to the comments in 'eval.c'.
This is also one area for future additions/improvements.
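
As a rough illustration of the idea above (and not the actual code in
eval.c), the sketch below records how each variable is used during a
pre-walk and then picks the simpler C form for a number-only variable.
All names here (var_info, note_use, emit_increment, and the emitted
helpers assign_num and force_num) are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical usage flags recorded by the pre-walk (not GAWK's names). */
enum { USED_AS_NUM = 1, USED_AS_STR = 2 };

struct var_info {
    const char *name;
    int usage;              /* bitwise OR of the flags above */
};

/* Pre-walk step: note that variable `name` is used in context `how`. */
void note_use(struct var_info *tab, size_t n, const char *name, int how)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            tab[i].usage |= how;
}

/* Translation step: emit C for the AWK statement "v++" into buf. */
void emit_increment(const struct var_info *v, char *buf, size_t len)
{
    assert(v != NULL && buf != NULL);
    if (v->usage == USED_AS_NUM)
        /* Number-only variable: a plain C double is enough. */
        snprintf(buf, len, "%s += 1.0;", v->name);
    else
        /* Mixed use: go through hypothetical runtime helpers. */
        snprintf(buf, len, "assign_num(&%s, force_num(&%s) + 1.0);",
                 v->name, v->name);
}
```

With a variable 'count' that the pre-walk only ever saw in numeric
contexts, emit_increment() produces "count += 1.0;", whereas a variable
also seen in a string context gets the slower helper-based form.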

Any utility routines (for assoc. arrays, regexp, etc) were pretty much
left alone except for those that depended on the parse tree as one of their
inputs.  These had to be modified to become parse-tree independent.

A host of new routines for runtime use (e.g. to perform variable
assignments, etc) are housed in eval2.c.

The single most difficult part of designing the translator was simulating
the number/string duality of AWK in a static C program and performing
appropriate conversions on the fly.  The other difficult part was that
AWK allows practically any expression to be an operand in any other
expression (e.g. 'g = ($(5 < 6) ~ /[a-z]/) * aa[gg && -1]').
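
To make the number/string duality concrete, here is a minimal sketch of
a runtime value that carries both representations plus flags saying
which one is current.  This is an assumption about how such a cell could
look, not the layout used in awk.h or eval2.c; the names cell, set_num,
set_str, force_num and force_str are made up for the example.  It uses
strtod() and "%.6g" (AWK's default CONVFMT) as approximations and
ignores AWK's special handling of integer values.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical validity flags: which representation is current. */
enum { NUM_VALID = 1, STR_VALID = 2 };

struct cell {
    double num;
    char   str[32];
    int    valid;
};

void set_num(struct cell *c, double d)
{
    assert(c != NULL);
    c->num = d;
    c->valid = NUM_VALID;           /* string part is now stale */
}

void set_str(struct cell *c, const char *s)
{
    assert(c != NULL && s != NULL);
    strncpy(c->str, s, sizeof c->str - 1);
    c->str[sizeof c->str - 1] = '\0';
    c->valid = STR_VALID;           /* number part is now stale */
}

/* Make the number part current, converting from the string if needed. */
double force_num(struct cell *c)
{
    if (!(c->valid & NUM_VALID)) {
        c->num = strtod(c->str, NULL);  /* numeric prefix, else 0 */
        c->valid |= NUM_VALID;
    }
    return c->num;
}

/* Make the string part current, converting from the number if needed. */
const char *force_str(struct cell *c)
{
    if (!(c->valid & STR_VALID)) {
        snprintf(c->str, sizeof c->str, "%.6g", c->num);
        c->valid |= STR_VALID;
    }
    return c->str;
}
```

The conversions are lazy: a value assigned as a string is only parsed
as a number the first time something demands its numeric part, and the
result is cached until the next assignment.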

My solution was basically to modify the expression evaluation function
(tree_eval) to accept a flag from the parent caller and return a flag to
the parent, where these flags indicate various types (number constant,
number/string node, boolean, etc).  This way, the parent expression
operation can pass a flag to the child expression, requiring it to emit
C code whose result conforms to the type specified in the flag.

e.g.: for an expression like "45.3" + a, where 'a' is a variable which is
      possibly a number, string, or both.  Here, the arithmetic addition
      routine (parent) would call tree_eval() on the two operands demanding them
      to conform to the number type.  Then, the routine which handles constants
      will see this flag and convert "45.3" to a number and emit the number 
      45.3.  The routine for handling variable access will also see this
      flag and emit C code to force the number part of 'a' to become current.

      Now the addition operation itself would examine its parent's flag
      and if the flag is demanding something other than a number (which 
      this operation naturally produces), it will emit C code to convert
      the result accordingly. 
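
The walk described in this example can be sketched roughly as follows.
This is a toy emitter written for these notes, not tree_eval() itself:
the node layout, the WANT_* flags, and the emitted helper names
(force_num, force_str, num_to_str) are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demand flags and node kinds, for illustration only. */
enum type_flag { WANT_ANY, WANT_NUM, WANT_STR };
enum node_kind { N_STRCONST, N_VAR, N_ADD };

struct node {
    enum node_kind kind;
    const char *text;              /* constant text or variable name */
    const struct node *left, *right;
};

/* Append s to the NUL-terminated text already in out. */
static void append(char *out, size_t len, const char *s)
{
    size_t used = strlen(out);
    if (used < len)
        snprintf(out + used, len - used, "%s", s);
}

/* Emit C for n so that its value conforms to the type in `want`. */
void emit_expr(const struct node *n, enum type_flag want,
               char *out, size_t len)
{
    char tmp[64];
    assert(n != NULL && out != NULL);
    switch (n->kind) {
    case N_STRCONST:
        if (want == WANT_NUM)      /* convert at translation time */
            snprintf(tmp, sizeof tmp, "%g", strtod(n->text, NULL));
        else
            snprintf(tmp, sizeof tmp, "\"%s\"", n->text);
        append(out, len, tmp);
        break;
    case N_VAR:
        if (want == WANT_NUM)      /* make the number part current */
            snprintf(tmp, sizeof tmp, "force_num(&%s)", n->text);
        else if (want == WANT_STR)
            snprintf(tmp, sizeof tmp, "force_str(&%s)", n->text);
        else
            snprintf(tmp, sizeof tmp, "%s", n->text);
        append(out, len, tmp);
        break;
    case N_ADD:
        /* Addition naturally yields a number; convert only on demand. */
        if (want == WANT_STR)
            append(out, len, "num_to_str(");
        append(out, len, "(");
        emit_expr(n->left, WANT_NUM, out, len);
        append(out, len, " + ");
        emit_expr(n->right, WANT_NUM, out, len);
        append(out, len, ")");
        if (want == WANT_STR)
            append(out, len, ")");
        break;
    }
}
```

Starting from an empty buffer, the tree for "45.3" + a yields
(45.3 + force_num(&a)) when the parent wants a number, and the same
text wrapped in num_to_str(...) when the parent wants a string.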

All operation routines also return a flag signifying their natural type.
This is used by a parent operation to "peek" ahead and decide how it
should behave.  For example, operations like comparison operations (>=, !=)
are dependent on the type of their operands (number or string) for correct
behaviour.
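
A small sketch of that "peek", under the assumption that a child can
report one of three natural types: a comparison picks a plain C
comparison when both sides are known numbers, a string comparison when
either side is a known string, and defers to the runtime otherwise.
The enum names and the emitted helpers cmp_str and cmp_any (imagined to
return <0, 0 or >0 like strcmp) are hypothetical, not the translator's
own.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical "natural type" flags returned by a child expression. */
enum nat_type { NAT_NUM, NAT_STR, NAT_UNKNOWN };
enum cmp_kind { CMP_NUMERIC, CMP_STRING, CMP_RUNTIME };

/* Decide how to compare from the operands' natural types. */
enum cmp_kind choose_compare(enum nat_type l, enum nat_type r)
{
    if (l == NAT_STR || r == NAT_STR)
        return CMP_STRING;      /* a known string forces string compare */
    if (l == NAT_NUM && r == NAT_NUM)
        return CMP_NUMERIC;     /* both known numbers: plain C compare */
    return CMP_RUNTIME;         /* type only known when the program runs */
}

/* Emit C for "lhs OP rhs"; returns the length reported by snprintf. */
int emit_compare(const char *lhs, enum nat_type l, const char *op,
                 const char *rhs, enum nat_type r, char *buf, size_t len)
{
    assert(lhs && op && rhs && buf);
    switch (choose_compare(l, r)) {
    case CMP_NUMERIC:
        return snprintf(buf, len, "(%s %s %s)", lhs, op, rhs);
    case CMP_STRING:
        return snprintf(buf, len, "(cmp_str(%s, %s) %s 0)", lhs, rhs, op);
    default:
        return snprintf(buf, len, "(cmp_any(%s, %s) %s 0)", lhs, rhs, op);
    }
}
```

Only the CMP_RUNTIME case pays for a type check while the translated
program runs; the other two are decided once, at translation time.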
        


Leonard Theivendra
IBM Toronto Software lab
------------------------------------------------------------------------------
E-Mail:  theiven@skule.ecf.toronto.edu (<= 08/31/96)
         theivend@torolab6.vnet.ibm.com (>= 07/01/96)

Standard Disclaimer:  Any opinions expressed are solely my own and not
                      that of IBM Corp.

