Zero Feet: a proposal for a systems-free Lisp
or, I don't know what I've been told, rustaceans done stole Guy
Steele's gold
Table of Contents
* Why it would be desirable
* How to achieve it
+ Implementing method lookup
+ Writing a garbage collector
* Conclusion
Some make the claim that "all languages depend on C";^1 not only is
this statement ill-typed, as implementations may depend on other
languages while languages themselves rarely do, but the statement
would still be wrong if we corrected it. Some will retort by writing
parts of an implementation in another "systems language"; more
recently a Lisp-like systems language has been proposed (again). But
it is entirely possible to create a working implementation without a
systems language, and we think such an implementation is better than
the alternative.
We intend to use the proposed techniques to write an implementation
of the Utena programming system, which is based on the Newspeak
object model and Lisp evaluation semantics. The language also
inherits an object-capability model from Newspeak, which we believe
is very beneficial for a language for writing language
implementations; a "systems" language, as it stands, is a collection
of things that don't fit into a modular language. There shouldn't be
one.
Why it would be desirable
We believe an implementation written entirely in the language it
implements has some engineering benefits. The first is that a
programmer does not need to know any other language in order to
modify any part of their implementation.
Implementations which are written in another language and use an
interpreter also have a very large difference in performance between
code written in the implementation language and code written in the
implemented language. A common approach to improving the performance
of the implementation is to write more modules in the host language;
for example, the dictionaries and sets of CPython are written in C,
rather than Python. Even if a programmer has identified a data
structure with better performance in theory,^2 finding a performance
improvement in practice may be very difficult, because the
performance of every individual operation is worse, regardless of the
asymptotic complexity of particular algorithms.
This form of optimisation can also lead to security bugs: errors in
the implemented language are checked and handled, whereas the
implementation language may be unsafe by default. The aforementioned
performance hacks would require more code to be written in the
implementation language, exacerbating the problem: there would be
more ways for the code to fail, and less of the implementation would
be approachable by a programmer who is not familiar with the host
language.
When it is necessary to use unsafe operations, we may introduce them
into a high-level, safe language in a gradual manner, and all other
operations can remain safe. The Jikes RVM, a Java virtual machine
implemented in Java, includes "magic" operations for this purpose.
Newer "systems" languages, such as Rust, also allow for marking code
as "unsafe", with all other code remaining safe, but we can provide
very fine-grained control over magic using the object-capability
model.
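As a rough illustration, in Common Lisp rather than Utena (and with
entirely hypothetical names), capability-controlled magic could look
like the following: the unsafe operations are methods on an object
which must be passed around explicitly, so only code which has been
handed the capability can touch raw memory.
;; A sketch only; RAW-MEMORY, PEEK-BYTE and POKE-BYTE are made-up
;; names for the magic operations.
(defclass raw-memory () ())

(defgeneric peek-byte (memory address)
  (:documentation "Read the byte at ADDRESS (magic)."))
(defgeneric poke-byte (memory address byte)
  (:documentation "Write BYTE to ADDRESS (magic)."))

;; Code which needs magic must be handed the capability explicitly;
;; code which is never given a RAW-MEMORY cannot misbehave with one.
(defun zero-fill (memory start length)
  (loop for address from start below (+ start length)
        do (poke-byte memory address 0)))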
The object-capability model also allows for replacing magic
operations with simulations. When an implementation is written in two
languages (say, Lisp and C), some information invariably gets
duplicated between the Lisp code and the C code. The genesis stage of
bootstrapping SBCL manages this duplication brilliantly: some Lisp
code run during bootstrapping generates C headers describing
constants and structures, so that constants and object layouts only
need to be defined once, in Lisp. But more complex algorithms cannot
be generated in the same way. As SBCL is designed to be bootstrapped
from any Common Lisp implementation, genesis is able to dump out a
"native" image from just Common Lisp; this requires a Lisp
implementation of the heap allocator to be used during bootstrapping,
whereas the allocator in the C runtime is used after bootstrapping.
The allocator implemented in genesis does differ from the C
allocator, as the former is designed to pack more objects into pages
than the latter, but essentially the same algorithm is implemented
twice.
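The header-generation trick is simple enough to sketch. The following
is not SBCL's actual genesis code, merely the shape of the idea, and
the constant names are illustrative:
;; The constants are defined once, in Lisp, and a C header is
;; generated from them during bootstrapping.
(defvar *exported-constants*
  '(("LOWTAG_MASK" . 7)
    ("WIDETAG_MASK" . 255)))

(defun write-c-header (pathname)
  (with-open-file (stream pathname :direction :output
                                   :if-exists :supersede)
    (dolist (constant *exported-constants*)
      (format stream "#define ~A ~D~%"
              (car constant) (cdr constant)))))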
As all modules are implicitly parameterised and all dependencies can
be replaced transparently, making the allocator manipulate a
simulated heap rather than a real heap only requires instantiating
the allocator with a module that operates on a simulated memory,
rather than a module that operates on raw memory. Other parts of the
runtime could be tested in a similar way, by having operations on raw
memory modify a simulated memory instead, and these parts could then
be debugged from the comfort of the usual debugging tools used by the
user, rather than a separate debugger like GDB, or a "low-level"
debugger like ldb in SBCL.
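Continuing the earlier Common Lisp sketch, a simulated memory is just
another implementation of the same (hypothetical) interface, backed
here by an ordinary Lisp vector; an allocator or collector written
against peek-byte and poke-byte never notices the difference.
(defclass simulated-memory ()
  ((bytes :initarg :bytes :reader memory-bytes)))

(defun make-simulated-memory (size)
  (make-instance 'simulated-memory
                 :bytes (make-array size
                                    :element-type '(unsigned-byte 8)
                                    :initial-element 0)))

(defmethod peek-byte ((memory simulated-memory) address)
  (aref (memory-bytes memory) address))

(defmethod poke-byte ((memory simulated-memory) address byte)
  (setf (aref (memory-bytes memory) address) byte))

;; e.g. (zero-fill (make-simulated-memory 4096) 0 4096) runs entirely
;; in the host Lisp, and can be stepped through with the usual
;; debugger.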
This technique is used in the Jikes RVM: the memory manager can be
"rehosted" to target a simulated heap, by using a different
implementation of the memory interface. However, the technique is
used there for debugging rather than bootstrapping (which, granted,
isn't too important), and modules can be replaced transparently with
the module system inherited from Newspeak, whereas they cannot be
with the Java module system. The implementation of Squeak can also be
debugged inside a Smalltalk environment, because the implementation
is written in a subset of Smalltalk (which can "conveniently" also be
translated to C with little effort), and thus is a perfectly fine
Smalltalk program.
How to achieve it
There are at least three problems that could cause such an
implementation endeavour to go wrong. Fortunately, there are
solutions which are not too complex, and which are interesting in
that they are also useful for improving the performance of user code.
The first problem is that, obviously, we must have a compiler, as we
cannot produce a self-hosted implementation with only interpreters. A
processor has to run someone's machine code. But the compiler does
not have to be too sophisticated; it just has to emit something for
the processor to interpret. The compiler can be similar to the
compiler we proposed in I don't want to go to Chel-C, which merely
replaces each bytecode instruction with a sequence of native
instructions.
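Such a template compiler can be sketched in a few lines; the bytecode
set and the x86-64-flavoured templates below are made up for
illustration, and a real compiler would emit machine code rather than
assembly text.
;; Each bytecode is replaced by a fixed sequence of (here symbolic)
;; native instructions.
(defvar *templates*
  '((:push-constant "mov rax, ~D"     ; operand is the constant
                    "push rax")
    (:add           "pop rbx"
                    "pop rax"
                    "add rax, rbx"
                    "push rax")
    (:return        "pop rax"
                    "ret")))

(defun compile-bytecode (instructions)
  "Translate a list of (OPCODE . OPERANDS) into assembly lines."
  (loop for (opcode . operands) in instructions
        append (loop for template in (rest (assoc opcode *templates*))
                     collect (apply #'format nil template operands))))

;; e.g. (compile-bytecode '((:push-constant 1) (:push-constant 2)
;;                          (:add) (:return)))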
Implementing method lookup
A more interesting problem is that, if almost every construct in the
language is a message send, and all of the implementation is written
in the language, handling a message send may cause infinite
recursion. The Art of the Metaobject Protocol addresses several
similar issues in the Common Lisp Object System and its proposed
metaobject protocol, in the appendix "Living with Circularity". For
example, accessing a slot of an object requires looking up the
appropriate slot definition in the class of the object, so that the
location of the slot in the object may be determined. The issue is
that a slot definition is also an object, and information in slot
definitions is also stored in slots; thus the same lookup is required
in order to retrieve information from the slot definition, causing
infinite recursion. The appendix suggests hard-coding tests for base
cases to break the infinite recursion, which preserves the semantics
of the object system, but is somewhat displeasing to read; and more
importantly, it cannot be implemented in a language where all control
flow is itself composed of message sends.
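To make the circularity concrete, here is a runnable miniature in
Common Lisp (nothing like AMOP's actual code, with made-up names
throughout): instances are vectors tagged with a class, slot
definitions are themselves instances, and without the hard-coded test
for the slot-definition class, slot-location would recurse forever.
(defstruct (instance (:constructor make-inst (class storage)))
  class storage)

(defstruct mini-class name (slots '()))

;; The class of slot definitions, whose instances have slots NAME
;; (stored at index 0) and LOCATION (stored at index 1). Its slot
;; definitions are instances of itself.
(defvar *slot-definition-class*
  (make-mini-class :name 'slot-definition))

(defun make-slot-definition (name location)
  (make-inst *slot-definition-class* (vector name location)))

(setf (mini-class-slots *slot-definition-class*)
      (list (make-slot-definition 'name 0)
            (make-slot-definition 'location 1)))

(defun slot-location (object slot-name)
  (if (eq (instance-class object) *slot-definition-class*)
      ;; The hard-coded base case: the layout of slot definitions is
      ;; known in advance, so no further slot lookups are needed.
      (ecase slot-name (name 0) (location 1))
      ;; The general case: find the slot definition in the class and
      ;; read its LOCATION slot, itself a slot access, which would
      ;; recurse forever without the base case above.
      (let ((slot (find slot-name
                        (mini-class-slots (instance-class object))
                        :key (lambda (s) (my-slot-value s 'name)))))
        (my-slot-value slot 'location))))

(defun my-slot-value (object slot-name)
  (svref (instance-storage object) (slot-location object slot-name)))

;; e.g.:
;; (defvar *point-class*
;;   (make-mini-class :name 'point
;;                    :slots (list (make-slot-definition 'x 0)
;;                                 (make-slot-definition 'y 1))))
;; (my-slot-value (make-inst *point-class* (vector 3 4)) 'y) => 4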
The compiler could inline enough message sends to allow the base case
to be checked without performing another message send, but this would
require a fairly complex compiler. Robert Strandh proposed the
technique of satiation to allow the implementation to automatically
compute the base cases. The technique involves installing the methods
which are required to break infinite recursion into caches, so that
no further computation needs to be done while calling the associated
generic functions. The use of inline caches can improve the
performance of object-oriented languages, by caching the method to
use for a previously observed class of receiver. Similarly to
Strandh's technique, we could install the methods for the various
base cases into inline caches, avoiding infinite recursion.
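A (monomorphic) inline cache, and satiation as pre-filling it, might
be sketched as follows; full-lookup stands in for the real method
lookup, which in a system like Utena would itself be built from
message sends.
(defstruct send-site selector cached-class cached-method)

(defun full-lookup (class selector)
  ;; Stand-in for the real lookup; in the scenario described above,
  ;; reaching this during a base-case send would mean infinite
  ;; recursion, so we simply signal an error.
  (declare (ignore class))
  (error "Full lookup needed for ~S" selector))

(defun send (site receiver &rest arguments)
  (let ((class (class-of receiver)))
    (unless (eq class (send-site-cached-class site))
      ;; Cache miss: fall back to the full lookup and memoise it.
      (setf (send-site-cached-class site) class
            (send-site-cached-method site)
            (full-lookup class (send-site-selector site))))
    (apply (send-site-cached-method site) receiver arguments)))

;; Satiation, in this sketch, is just pre-filling a site's cache, so
;; that FULL-LOOKUP is never called for the hard-coded base cases.
(defun satiate (site class method)
  (setf (send-site-cached-class site) class
        (send-site-cached-method site) method))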
Writing a garbage collector
A similar infinite regress problem is implementing a garbage
collector in a language which typically requires a garbage collector
to run. The implementation could similarly use escape analysis to
avoid allocating to the heap, but the analysis could be complicated,
and would likely not achieve much without sufficient inlining or
inter-method analysis.
Another solution is to use the lazy allocation model proposed by
Henry Baker. The implementation attempts to allocate all objects on a
stack, but moves objects to the heap when an assignment would cause
older stack frames to reference newer stack frames, or the heap to
reference the stack. As all objects in the heap are older than the
stack frames used by the garbage collector, it should be possible to
implement a garbage collector which does not cause any heap
allocation, and any garbage generated during collection can be
collected by stack allocation. This is most important when control
flow is expressed with message sends and closures; the closures would
be garbage, but the amount of garbage is constant if the callee frame
on the stack used for allocation is popped just before performing
another loop iteration. Resetting the allocation stack after every
loop iteration has been proposed for Java, but as all loops are
implemented using tail-recursion^3 in Utena, it suffices to pop the
stack just before performing a tail-call.
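A very small sketch of the write barrier this scheme implies, with a
made-up object representation: objects remember which stack frame
they were allocated in, and storing an object into an older object or
into the heap first transports it, and everything it references, to
the heap.
(defstruct obj
  region                              ; a stack-frame depth, or :HEAP
  slots)                              ; a vector of slot values

(defun heap-p (object) (eq (obj-region object) :heap))

(defun would-escape-p (value container)
  ;; Storing VALUE into CONTAINER would make the heap reference the
  ;; stack, or an older frame reference a newer (deeper) frame.
  (and (not (heap-p value))
       (or (heap-p container)
           (> (obj-region value) (obj-region container)))))

(defun transport (object)
  ;; Move OBJECT, and transitively everything it references, to the
  ;; heap. (Here we merely retag; a real implementation would copy.)
  (unless (heap-p object)
    (setf (obj-region object) :heap)
    (map nil (lambda (slot) (when (obj-p slot) (transport slot)))
         (obj-slots object))))

(defun barriered-store (container index value)
  (when (and (obj-p value) (would-escape-p value container))
    (transport value))
  (setf (aref (obj-slots container) index) value))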
A subsequent problem, as mentioned prior, is that it may be useful to
prove that the garbage collector can never cause heap allocation by
accident. This may be ensured by using another language without
garbage collection, but it may also be ensured using a separate
static analysis tool for the same language. The latter approach is
used in the Go runtime, which has a garbage collector written in Go.
The compiler can be made to log whenever it generates code for heap
allocation, so that an absence of log entries proves the collector
does not perform heap allocation. We could implement this analysis in
a separate program, similar to Gilad Bracha's conception of pluggable
types, but this analysis would be no different from an escape
analysis pass in a compiler, reducing the appeal of lazy allocation
as an implementation technique. Still, the proof is not necessary to
bootstrap a working implementation,^4 and lazy allocation can produce
less conservative results than escape analysis, as the former reacts
to what actually happens at runtime, whereas the latter has to
consider everything that could happen ahead of time.
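The logging side could be as simple as the following sketch (the hook
and the compiler interface are invented, not those of any real
system): the compiler records every function in which it emits
heap-allocation code, and a checker compiles the collector and fails
if anything was recorded.
(defvar *heap-allocation-sites* '())

;; The (hypothetical) compiler calls this whenever it emits code
;; which allocates on the heap.
(defun note-heap-allocation (function-name)
  (push function-name *heap-allocation-sites*))

;; COMPILE-FUNCTION stands in for the compiler's entry point.
(defun check-allocation-free (compile-function function-names)
  (let ((*heap-allocation-sites* '()))
    (mapc compile-function function-names)
    (when *heap-allocation-sites*
      (error "Heap allocation would be performed in ~S"
             *heap-allocation-sites*))))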
Conclusion
We have given some reasons for implementing a high-level language in
itself, and proposed some techniques for doing so. It is interesting
that these techniques are not new, and would generally appear to be
good ideas for performance, as well as being useful for
bootstrapping.
More recently, the mysticism around what a "systems" language is has
been narrowed to the definition that such a language should be
"low-level enough to bootstrap a runtime". The conceptual question
"What is a systems language?" has then changed into the question
"What systems?" With "systems language" one looks for the concept;
with "systems" there is no longer any question at all; the question
answers itself.
Footnotes:
^1
An operating system, or some other "bare metal" code could be
produced using similar techniques, with a high-level language. Thus
"obvious" statements like "an operating system cannot be written in
Python" are much less obvious.
^2
There is already the sense that the "theoretical" big-O complexity of
algorithms may be less relevant, when the "constant factors" of
performance of operations vary substantially, and the size of data is
bounded to some extent. But we mean specifically that interpretation
substantially slows down every operation, violating even the model
the programmer has of constant factors associated with operations, be
it of latency or processor instructions or something else.
^3
Something like
(defun %while (test body)
  (cond
    ((funcall test)
     (funcall body)
     ;; This recursive call is a tail call, so the allocation stack
     ;; may be popped just before it.
     (%while test body))
    (t nil)))

(defmacro while (test &body body)
  `(%while (lambda () ,test) (lambda () ,@body)))
^4
One would be hard-pressed to prove that parts of an implementation
written in C or C++ are safe, after all. Another interesting
comparison is that the mrustc Rust implementation cannot perform
borrow checking, but will still compile valid programs correctly.