Subj : Re: How-to on writing an interpreter?
To   : comp.programming
From : Rob Thorpe
Date : Mon Jul 18 2005 12:43 pm

Flavius Vespasianus wrote:
> "C. Rebert" <cvrebert@gmail.com> wrote in news:1121485564.371721.318960
> @g43g2000cwa.googlegroups.com:
>
> > I'm trying to implement an interpreter for a programming language, but
> > so far haven't been able to find any good materials on how to write a
> > simple stack-based interpreter. I would appreciate it if someone could
> > point me to some resources on writing such an interpreter. I'd be open
> > to buying such materials, as long as they aren't ludicrously expensive.
> > Thanks for your time.
>
> There are about 5 books on compilers and interpreters available on
> Amazon.
>
> (Apparently Aho et all have a new compiler book coming out in Nov.)
>
> Having done it many times, I will give a brief outline.....
>
> The basic process is to define the language and to define the
> intermediate structure to be interpreted. The latter will be symbol
> tables and trees/graphs representing the code to execute.  (OO languages,
> such as C++ are EXTREMELY well suited for such applications)

Actually I can't think of anything less useful for contructing trees
than OO.  It can be done very simply using procedural programming.

Each node of the tree is a little struct like this: -

struct node_t
{
  tree_t type; /* This is an enum describing what this tree node is */
  operand_t value; /* A union/struct storing data vals in some way,
                      or ptrs to them.  */
  struct node_t *left; /* next child node to the left */
  struct node_t *right; /* next child node to the right */
} node_t;

Then two little functions can be written to make the tree.
First a function to create sub-trees representing "command" concepts
(keywords, operators, etc) in the language:-

/* Construct node */
node_t *cons(tree_t op, node_t *left, node_t *right)
{
  /* get node, for instance by calling malloc */
  node_t n = node_alloc();

  n->type = op;
  n->left = left;
  n->right = right;
  return n;
}

op is an variable describing one of the "command" concepts in the
language.  A node for the ADD operation is then constructed like this:-

sub_tree = cons(ADD, x, y);

This function can only create binary trees. This isn't much of a
problem, unary operation can be stored by leaving one field blank,
M-ary operations by chaining together several operations.
Alternatively, the function could be expanded to support M-ary
operations.

Then a function is needed to construct the nodes at the leaf, those
that contain operands: -

/* make a leaf node */
node_t *atom(tree_t type /* type of new node */, operand_t operand)
{
  node_t n = node_alloc();

  switch(n->type = type)
  {
    case NUM:
      n->value.number = operand.number;
      break;
    case STRING:
      new->value.string = operand.string;
      break;
    /* And so one for variable names, character constants  etc ... */
  }
  return n;
}

And that is fairly much it.  An expression like :-

4 * 9

Would be stored by executing: -

node_t a, b;
a.num = 4;
b.num = 9;
sub_tree = cons(MULTIPLY, atom(NUM, a), atom(NUM, b));

sub_tree is then passed upwards to become part of the next sub-tree.

You could build all these function into an object if you really wanted
to but it wouldn't be much more useful than the above.

(BTW The above code is not my idea, it's a way some of Eric Raymond's
interpreters work; inspired by lisp)

> Then you have to write a parser to convert the language into the
> intermediate form.
>
> Then you have to write the interpeter itself that "execute" the
> intermediate form.

Actually you don't even have to do it that way.  A sufficiently simple
grammar can be executed without even being translated into an
intermediate form (something I didn't mention up-thread but probably
should have done).

If you're doing something really simple that can be sufficient.

.