https://rahul.gopinath.org/post/2019/12/07/python-mci/

Python Meta Circular Interpreter

Published: Dec 7, 2019

Contents
1. Uses of a Meta Circular Interpreter
   1. The meta-circular-interpreter base
   2. The semantics class
      1. Modules
         1. Module(stmt* body)
      2. The expressions
         1. Expr(expr value)
      3. The Pythonic data structures
         1. Constant(constant value, string? kind)
         2. Number(object n)
      4. Containers
         1. List(elts)
         2. Tuple(elts)
         3. Subscript(expr value, slice slice, expr_context ctx)
         4. Attribute(expr value, identifier attr, expr_context ctx)
      5. Simple control flow statements
      6. Implementation
      7. Major control flow statements
         1. While(expr test, stmt* body, stmt* orelse)
         2. If(expr test, stmt* body, stmt* orelse)
      8. The scope and symbol table
      9. Hooking up the symbol table
      10. The following statements use the symbol table
         1. Name(identifier id, expr_context ctx)
         2. Assign(expr* targets, expr value)
         3. Call(expr func, expr* args, keyword* keywords)
         4. FunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list, expr? returns, string? type_comment)
         5. Import(alias* names)
      11. Arithmetic Expressions
         1. Expr(expr value)
   3. A Complete Example

I had previously discussed how one can implement a programming language using big step semantics. In this post, I want to do something similar. Here, we implement a meta-circular interpreter over Python. The code is available here.

A meta-circular interpreter is an interpreter for a language that is written in that language itself. The MCI can implement a subset or a superset of the host language.

Uses of a Meta Circular Interpreter

Why should one want to write a meta-circular interpreter? Writing such an interpreter gives you a large amount of control over how the code is executed. Further, it also gives you a way to track the execution as it proceeds. Using the same machinery as the meta-circular interpreter, one can:

* Write a concolic interpreter that tracks the concrete execution
* Extract coverage
* Extract the control flow graph, and the execution path through the graph
* Extract the call graph of the execution
* Write a symbolic execution engine
* Write a taint tracker that will not be confused by indirect control flow
* Extend the host language with new features
* Reduce the capabilities exposed by the host language (e.g. no system calls).

I will be showing how to do these things in the upcoming posts.

First, we import everything we need.

import ast
import textwrap
import builtins
from functools import reduce

The basic idea is to make use of the Python infrastructure as much as possible. That is, we do not want to implement things that are not related to the actual interpretation. Hence, we use the Python parsing infrastructure exposed by the ast module, which parses Python source files and returns the AST. The AST (Abstract Syntax Tree), as the name indicates, is a data structure in the format of a tree. Once we have the AST, we simply walk the tree and interpret the statements as we find them.
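For instance, the following small snippet (not part of the original post) uses the standard ast module to dump the tree for a tiny assignment, showing the kind of nodes the interpreter will have to handle:

import ast

# The interpreter will walk trees like this one, dispatching each node
# (Module, Assign, Name, BinOp, Constant, ...) to a correspondingly
# named on_... handler. Output details vary across Python versions.
print(ast.dump(ast.parse("a = 1 + 2")))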
The meta-circular-interpreter base

The walk() method is at the heart of our interpreter. Given the AST, it iterates through the statements and evaluates each by invoking the corresponding method. If the method is not implemented, it raises a SynErr, which is derived from SyntaxError.

class SynErr(SyntaxError): pass

class PyMCInterpreter:
    def walk(self, node):
        if node is None: return None
        fname = "on_%s" % node.__class__.__name__.lower()
        if hasattr(self, fname): return getattr(self, fname)(node)
        raise SynErr('walk: Not Implemented in %s' % type(node))

We provide eval(), which converts a given string to its AST and calls walk(). It is possible to write a parser of our own, as I have shown before, which can get us the AST. However, as I mentioned earlier, we use the Python infrastructure where possible.

class PyMCInterpreter(PyMCInterpreter):
    def parse(self, src):
        return ast.parse(src)

    def eval(self, src):
        return self.walk(self.parse(src))

The semantics class

The methods that correspond to particular AST elements may be interpreted differently based on what semantics we are interested in. So, we define a new class for the semantics.

class PySemantics(PyMCInterpreter): pass

Modules

We first define modules because every other statement is enclosed in a module when we use parse.

Module(stmt* body)

The complete AST is wrapped in a Module statement.

class PySemantics(PySemantics):
    def on_module(self, node):
        # the return value of a module is the value of its last statement
        res = None
        for p in node.body:
            res = self.walk(p)
        return res

Example

An empty module with a comment.

expr = PySemantics()
print(expr.eval('#a'))

The expressions

An expression is implemented as follows.

Expr(expr value)

class PySemantics(PySemantics):
    def on_expr(self, node):
        return self.walk(node.value)

The Pythonic data structures

We need to define data. For the primitive data types, we only implement string and number for now. These are trivial, as they are a direct translation of the AST values.

Constant(constant value, string? kind)

class PySemantics(PySemantics):
    def on_constant(self, node):
        return node.value

Example

expr = PySemantics()
print(expr.eval('"s"'))

Number(object n)

class PySemantics(PySemantics):
    def on_num(self, node):
        return node.n

Example

expr = PySemantics()
print(expr.eval('10'))

Containers

Essentially, we want to be able to make use of all Pythonic container data structures such as lists, tuples, sets and dictionaries. For demonstration, however, we have implemented only list and tuple. Unlike the primitives, the containers may be defined such that the values inside them are the result of evaluating some expression. Hence, we first walk() the elements, and add the results.

List(elts)

class PySemantics(PySemantics):
    def on_list(self, node):
        res = []
        for p in node.elts:
            res.append(self.walk(p))
        return res

Example

expr = PySemantics()
print(expr.eval('[0,1]'))

Tuple(elts)

class PySemantics(PySemantics):
    def on_tuple(self, node):
        res = []
        for p in node.elts:
            res.append(self.walk(p))
        return tuple(res)

Example

expr = PySemantics()
print(expr.eval('(0,1)'))

Containers provide the ability to access their contained items via Subscript.

Subscript(expr value, slice slice, expr_context ctx)

The tuple and the list provide a means to access their elements via subscript. In older Python versions the subscript carries a special Index node as input, which is also handled below.

class PySemantics(PySemantics):
    def on_subscript(self, node):
        value = self.walk(node.value)
        slic = self.walk(node.slice)
        return value[slic]

    def on_index(self, node):
        return self.walk(node.value)

Example

expr = PySemantics()
print(expr.eval('[1,2,3,4][3]'))

Attribute(expr value, identifier attr, expr_context ctx)

Similar to subscript for arrays, objects provide attribute access. Attributes require symbol tables. Hence, we do not provide an example here.

class PySemantics(PySemantics):
    def on_attribute(self, node):
        obj = self.walk(node.value)
        attr = node.attr
        return getattr(obj, attr)
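As an additional small check (not part of the original post), the container handlers above compose: a nested container with chained subscripts evaluates as expected.

expr = PySemantics()
# ([1,2],[3,4]) is built by on_tuple and on_list, and then indexed
# twice through on_subscript; this prints 3.
print(expr.eval('([1,2],[3,4])[1][0]'))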
Simple control flow statements

The return, break and continue statements are implemented as exceptions.

class Return(Exception):
    def __init__(self, val): self.__dict__.update(locals())

class Break(Exception): pass

class Continue(Exception): pass

Implementation

class PySemantics(PySemantics):
    def on_return(self, node):
        raise Return(self.walk(node.value))
    def on_break(self, node): raise Break()
    def on_continue(self, node): raise Continue()
    def on_pass(self, node): pass

Example

expr = PySemantics()
print(expr.eval('pass'))

The difference between break and continue is in how they are handled in the loop statements, as in While below. The return is handled in the Call part.

Major control flow statements

Only basic loops and conditionals, while() and if(), are implemented.

While(expr test, stmt* body, stmt* orelse)

Implementing the While loop is fairly straightforward. The while.body is a list of statements that need to be interpreted if the while.test is True. The break and continue statements provide a way to either stop the execution or to restart it. As can be seen, these are statements rather than expressions, which means that their return value is not important. Hence, we do not return anything.

class PySemantics(PySemantics):
    def on_while(self, node):
        while self.walk(node.test):
            try:
                for b in node.body: self.walk(b)
            except Break: break
            except Continue: continue

Example

expr = PySemantics()
print(expr.eval("""while 1: break"""))

If(expr test, stmt* body, stmt* orelse)

The If statement is similar to While. We check if.test and, if True, execute the if.body. If False, we execute the if.orelse.

class PySemantics(PySemantics):
    def on_if(self, node):
        v = self.walk(node.test)
        body = node.body if v else node.orelse
        for b in body: self.walk(b)

Example

expr = PySemantics()
print(expr.eval("""if 1: 100"""))

The scope and symbol table

Now we come to a slightly more complex part. We want to define a symbol table. The reason this is complicated is that the symbol table interacts with the scope, which is a nested data structure, and we need to provide a way to look up symbols in enclosing scopes. We have a choice to make here. Essentially, what variables does the calling program have access to? Historically, the most common conventions are lexical and dynamic scoping. The most intuitive is the lexical scoping convention. Hence, we implement lexical scoping, but with a restriction: if we modify a variable from a parent scope, a new variable is created in the current scope instead.

class Scope:
    def __init__(self, parent=None, table=None):
        self.table = {} if table is None else table
        self.children = []
        self.parent = parent
    def __getitem__(self, key):  # look up through the enclosing scopes
        return self.table[key] if key in self.table else self.parent[key]
    def __setitem__(self, key, val): self.table[key] = val

Hooking up the symbol table

We allow the user to load a pre-defined symbol table. We have a choice to make here. Should we allow access to the Python default symbol table? And if we do, what should form the root: the Python symbol table, or what the user supplied? Here, we assume that the default Python symbol table is the root.

class PySemantics(PySemantics):
    def __init__(self, symtable, args):
        self.symtable = self.create_scope(symtable, args)
    def create_scope(self, symtable, args):
        # the Python builtins form the root scope; the user-supplied table
        # is nested inside it, and args is exposed to the program as sys.argv
        root = Scope(table={k: getattr(builtins, k) for k in dir(builtins)})
        root.table['sys'] = type('sys', (), {'argv': ['<script>'] + list(args)})
        return Scope(parent=root, table=symtable)

The following statements use the symbol table.

Name(identifier id, expr_context ctx)

Retrieving a referenced symbol is simple enough.

class PySemantics(PySemantics):
    def on_name(self, node):
        return self.symtable[node.id]

Example

expr = PySemantics({'a': 1}, [])
print(expr.eval("a"))
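As a quick additional check (not part of the original post), a name that is not in the user-supplied table falls through to the Python builtins in the root scope.

expr = PySemantics({}, [])
# 'len' is not in the user table, so the lookup continues in the
# enclosing root scope, which holds the Python builtins.
print(expr.eval("len"))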
Assign(expr* targets, expr value)

Python allows multi-target assignments. The problem is that the type of the value received may be different based on whether the statement is multi-target or single-target. Hence, we handle both kinds separately.

class PySemantics(PySemantics):
    def on_assign(self, node):
        value = self.walk(node.value)
        tgts = [t.id for t in node.targets]
        if len(tgts) == 1:  # a single target receives the value directly
            self.symtable[tgts[0]] = value
        else:               # multiple targets unpack the value element-wise
            for t, v in zip(tgts, value): self.symtable[t] = v

Example

symtbl = {}
expr = PySemantics(symtbl, [])
expr.eval("a=101")
print(symtbl)

Call(expr func, expr* args, keyword* keywords)

During function calls, we need to make sure that the functions that are implemented in C are proxied directly. For the others, we want to correctly bind the arguments and create a new scope. The big question is how the scopes should be nested. We use lexical scopes. So, we recover the symbol table used at the time of definition, use it for the call, and reset it back to the current one after the call. Note that we handle the return exception here.

class PySemantics(PySemantics):
    def on_call(self, node):
        func = self.walk(node.func)
        args = [self.walk(a) for a in node.args]
        if callable(func): return func(*args)  # functions implemented in C are proxied directly
        fargs, fbody, fscope = func  # an interpreted function: bind the arguments in the definition scope
        caller, self.symtable = self.symtable, Scope(parent=fscope, table=dict(zip(fargs, args)))
        try:
            for b in fbody: self.walk(b)
        except Return as e: return e.val
        finally: self.symtable = caller

Example

expr = PySemantics({}, [])
print(expr.eval("len([1,2,3])"))

FunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list, expr? returns, string? type_comment)

The function definition itself is quite simple. We simply update the symbol table with the given values. Note that because we implement lexical scoping, we have to maintain the scoping references during creation.

class PySemantics(PySemantics):
    def on_functiondef(self, node):
        fname = node.name
        args = node.args
        # remember the parameter names, the body, and the defining scope
        self.symtable[fname] = ([a.arg for a in args.args], node.body, self.symtable)

Example

s = """
def a(b):
    return b
a(1)
"""
expr = PySemantics({}, [])
print(expr.eval(s))

Import(alias* names)

Import is similar to a definition, except that we want to update the symbol table with predefined values.

class PySemantics(PySemantics):
    def on_import(self, node):
        for im in node.names:
            if im.name == 'sys': continue  # sys is already installed in the root scope
            self.symtable[im.name] = __import__(im.name)

Arithmetic Expressions

The arithmetic expressions are proxied directly to the corresponding Python operators. The unary operators are shown here; the tables for binary and comparison operators follow the same pattern (see the sketch at the end of this post), and the triangle examples below rely on the comparison handlers.

Expr(expr value)

UnaryOP = {
    ast.Invert: lambda a: ~a,
    ast.Not: lambda a: not a,
    ast.UAdd: lambda a: +a,
    ast.USub: lambda a: -a,
}

Example

s = """\
def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'Equilateral'
        return 'Isosceles'
    return 'Scalene'
triangle(1, 1, 1)
"""
expr = PySemantics({}, [])
print(expr.eval(s))

A Complete Example

triangle_py = """\
import sys
def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'Equilateral'
        return 'Isosceles'
    if b == c: return 'Isosceles'
    if a == c: return 'Isosceles'
    return 'Scalene'
print(triangle(sys.argv[1], sys.argv[2], sys.argv[3]))
"""

The source code of this notebook is available here.
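To actually run the two triangle examples above, handlers for the comparison (and binary) operators are also needed. They can be proxied to Python in the same style as UnaryOP. The following is a minimal sketch; the names BinOP, CmpOP, on_unaryop, on_binop and on_compare are illustrative choices rather than necessarily the post's own.

BinOP = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}

CmpOP = {
    ast.Eq: lambda a, b: a == b,
    ast.NotEq: lambda a, b: a != b,
    ast.Lt: lambda a, b: a < b,
    ast.Gt: lambda a, b: a > b,
}

class PySemantics(PySemantics):
    def on_unaryop(self, node):
        return UnaryOP[type(node.op)](self.walk(node.operand))
    def on_binop(self, node):
        return BinOP[type(node.op)](self.walk(node.left), self.walk(node.right))
    def on_compare(self, node):
        # only single comparisons such as a == b are handled in this sketch
        return CmpOP[type(node.ops[0])](self.walk(node.left), self.walk(node.comparators[0]))

With these handlers in place, the complete example can be exercised, for instance:

# the args list becomes sys.argv[1:] inside the interpreted program
expr = PySemantics({}, [4, 4, 4])
expr.eval(triangle_py)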