https://beyondloom.com/blog/lila.html
Lila: a Lil Interpreter in POSIX AWK
AWK is among the most ubiquitous programming languages in the world.
Much like ed, the standard text editor, awk is a mandatory component
of any POSIX operating system. On a fresh-out-of-the-box Mac,
tunneling into the embedded Linux environment on your router, or
scrabbling away at Git Bash on a Windows machine, you may not have
access to Python, Perl, or even Tcl, but you can rest assured that
some flavor of awk is already installed. This is a powerful
proposition.
In our modern age, AWK is frequently regarded as a domain-specific
language akin to sed, constrained in its usefulness to batch
processing of semi-structured text files and streams. While many
features of the language are indeed tailored to this ecological
niche, AWK is a fully general-purpose language: it provides concise
and familiar control structures, floating-point arithmetic, a small
but flexible toolkit of string manipulation operations, and powerful
associative arrays. I could simply tell you that AWK can do nearly
anything, but perhaps it would be more persuasive to show you?
The Name's Lila
Lila is an implementation of Lil in AWK. Lil is a multi-paradigm
language with a rich set of primitive operators and a
"batteries-included" standard library featuring such goodies as JSON,
XML, CSV and structured binary parsing, first-class database-style
tables with a SQL-like query syntax, and functional niceties like a
REPL and tail-call elimination.
% ./lila.awk
select k:first value v:count value by value from "AABABBBBBBABACBA"
+-----+---+
| k   | v |
+-----+---+
| "A" | 6 |
| "B" | 9 |
| "C" | 1 |
+-----+---+
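For comparison, the same grouping can be sketched in plain AWK with an associative array. This is only an illustration of the language features mentioned earlier, not a piece of Lila; count_chars is a hypothetical helper name.

```shell
#!/bin/sh
# count_chars (hypothetical name): tally each character of a string,
# reporting groups in order of first appearance.
count_chars() {
    printf '%s\n' "$1" | awk '
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (!(c in count)) order[++kinds] = c   # remember first-seen order
            count[c]++
        }
    }
    END {
        for (j = 1; j <= kinds; j++) print order[j], count[order[j]]
    }'
}

count_chars "AABABBBBBBABACBA"
# prints:
# A 6
# B 9
# C 1
```

The order array is there because iterating an AWK array with for-in yields keys in an unspecified order.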
A complete Lil system demonstrates by construction solutions to a
broad range of interesting and practical programming tasks. It also
provides some intriguing possibilities for making the Lil ecosystem
more self-hosting and portable: Lila can subsist anywhere there's an
AWK, even places C compilers and JavaScript interpreters dare not
venture!
In broad strokes, Lila's interpreter closely follows the structure of
the C-based Lil interpreter: It tokenizes and parses source code
strings, assembling them in a single pass into a stream of bytecode
instructions which are executed by a stack-based virtual machine.
Primitive operators and the guts of Lil's "Interface" values are AWK
functions written in terms of boxed "Lil-values" which represent
Lil's core data structures: numbers, strings, lists, dictionaries,
tables, and functions.
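To make the stack-based virtual machine idea concrete, here is a minimal sketch of a bytecode loop in AWK. The opcodes (PUSH, ADD, MUL), the one-instruction-per-line encoding, and the run_bytecode name are all assumptions made up for this illustration; Lila's real instruction set is richer and differs in its details.

```shell
#!/bin/sh
# run_bytecode (hypothetical name): execute a toy instruction stream
# read from standard input, one instruction per line.
run_bytecode() {
    awk '
    # Load the program, e.g. "PUSH 2" or "ADD".
    { op[NR] = $1; arg[NR] = $2 }

    END {
        sp = 0                                   # stack[1..sp] holds live values
        for (pc = 1; pc <= NR; pc++) {
            if (op[pc] == "PUSH") {
                stack[++sp] = arg[pc] + 0        # force a numeric value
            } else if (op[pc] == "ADD") {
                b = stack[sp--]; stack[sp] = stack[sp] + b
            } else if (op[pc] == "MUL") {
                b = stack[sp--]; stack[sp] = stack[sp] * b
            } else {
                print "unknown opcode: " op[pc]; exit 1
            }
        }
        print stack[sp]                          # result is left on top of the stack
    }'
}

# Bytecode for (2 + 3) * 4
printf 'PUSH 2\nPUSH 3\nADD\nPUSH 4\nMUL\n' | run_bytecode
# prints 20
```

A real interpreter would emit such instructions from its parser rather than reading them as text, but the dispatch loop has the same shape.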
Since AWK associative arrays cannot be recursively nested, an AWK
associative array cannot directly represent, say, a Lil dictionary.
Instead, Lila uses a collection of global associative arrays to
represent a heap of Lil-value "structs" keyed by numeric indices.
From the perspective of most of the AWK code that uses Lil-values,
these indices are simply pointers! All the members of Lil-values are
represented as AWK numbers or strings.
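The heap-of-structs approach can be sketched in a few lines. Everything named here (heap_type, heap_val, heap_len, heap_elt, box, new_list, list_push, show) is hypothetical and chosen for illustration; Lila's actual arrays and field layout may differ. The point is only that parallel global arrays indexed by a number can stand in for nested data.

```shell
#!/bin/sh
# heap_demo (hypothetical name): build the nested list (1,(2,"three"))
# out of flat global arrays, then print it by following "pointers".
heap_demo() {
    awk '
    # Allocate a number or string; the returned heap index acts as a pointer.
    function box(type, val) {
        heap_type[++heap_top] = type
        heap_val[heap_top] = val
        return heap_top
    }

    # Allocate an empty list.
    function new_list(   p) { p = box("list", ""); heap_len[p] = 0; return p }

    # Append pointer x to list p; elements sit in one flat array under a compound key.
    function list_push(p, x) { heap_elt[p, heap_len[p]++] = x }

    # Render any value, recursing through list elements.
    function show(p,   i, s) {
        if (heap_type[p] == "number") return heap_val[p] ""
        if (heap_type[p] == "string") return "\"" heap_val[p] "\""
        s = "("
        for (i = 0; i < heap_len[p]; i++) {
            if (i > 0) s = s ","
            s = s show(heap_elt[p, i])
        }
        return s ")"
    }

    BEGIN {
        inner = new_list()
        list_push(inner, box("number", 2))
        list_push(inner, box("string", "three"))
        outer = new_list()
        list_push(outer, box("number", 1))
        list_push(outer, inner)              # nesting is just storing another index
        print show(outer)
    }'
}

heap_demo
# prints (1,(2,"three"))
```

The compound subscript heap_elt[p, i] is ordinary AWK: the two parts are joined with SUBSEP into a single string key, so no true nesting is needed.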
Performance Anxiety
Computers, one often hears, are pretty fast. Lila's interpreter,
however, is doing quite a bit of indirection and string acrobatics
with each expression. Is it still usably performant, or are we
looking at speeds rivaled by dead snails in molasses?
I ran an experiment on my 2020 M1 MacBook Air comparing the C- and
JS-based reference implementations of Lilt with Lila running
under a variety of AWK implementations: mawk, gawk, goawk, and the
BSD awk which ships with macOS (v20200816 in this case). As my
benchmarks, I used Mandel.lil, a naive Mandelbrot set renderer
designed to hammer the Lil interpreter and garbage collector, and the
Lil integration test suite:
Interpreter         Mandel     Tests
c-lilt (release)     0.08s     0.27s
c-lilt (debug)       0.36s     0.79s
js-lilt              0.42s     1.58s
mawk                 3.61s     0.95s
gawk                 6.58s     2.47s
goawk               10.66s     4.32s
awk                 15.09s   162.97s
Of the AWKs tested, mawk makes a truly impressive showing. The
"one-true" AWK fares much worse, particularly with the integration
suite. I ran a similar experiment on my 9-year-old Intel-based
MacBook Air, which provides awk version 20070501, and, curiously, even
on that far humbler machine the older awk drastically outstripped the
M1 and its more "modern" AWK release, completing the integration suite
successfully in about 16 seconds.
Your mileage may vary, but overall I found that even on my slowest
computers and their most ancient AWKs, Lila offers a viably
interactive REPL experience.
There aren't many monolithic AWK scripts on the internet of a similar
scale and complexity to lila.awk. If you're the maintainer of an AWK,
or you're thinking of becoming one, Lila might be a useful tool for
investigating performance and correctness.
Conclusion
Hopefully this article has mildly piqued your interest in AWK, Lil,
or their combination. The next time you find yourself drawing in a
Python dependency for simple command-line automation or struggling to
express complex logic in a shell script, consider reaching for AWK.
Sometimes the best tool is the tool you already have.