Notes
2025/07/10

RC week 8: an efficient VM for a functional language

Over the past weeks at the Recurse Center I've increasingly been thinking about how to implement an efficient bytecode VM for Kombucha, a minimal and malleable programming language for symbiotic end-user programming. Kombucha is a strict, functional language (supporting side effects and effect handlers) that is desugared to an intermediate language similar to lambda calculus.

I have tried to leave optimizations for later, but it turns out that the semantics of a functional language are in some regards closely tied to the nature of the instruction set of a functional bytecode VM, which is why it makes sense to put some thought into the instruction set before writing large parts of the standard library.

Tail call optimization

Let's start with a well known example: tail call optimization. Whenever a function call appears in “tail position”, in other words as the last instruction before a RETURN instruction, it is unnecessary to push a new call frame onto the call stack, because the current call frame will be popped by the following RETURN instruction. Instead, the current call frame can be reused as the call frame of the tail call.

This is not just a useful optimization, but something that functional programmers depend on as part of the semantics of their language. Tail call optimization guarantees that recursion can be used to implement loop-like behavior without running out of stack space, which is a guarantee that every functional language worth its salt must provide.

Since tail calls are determined statically (just by looking at the syntactic structure of the code), a compiler could emit a special TAIL_CALL instruction. Alternatively, since a tail call is a function call immediately followed by a RETURN instruction, the VM could also check the next instruction after a function call and implement tail call optimization in case the following instruction is a RETURN.

Multi-argument functional calls

More interestingly, multi-argument function calls pose an interesting problem for functional bytecode VMs, as lambda calculus does not provide “true” multi-argument functions. Instead, functions with multiple arguments are just curried single-argument functions.

A naive VM implementation would turn multi-argument functions into nested closures, allocating heap space for all function arguments except the last one. This is obviously quite inefficient and it would be desirable to avoid heap allocation in cases where a curried multi-argument function is called with enough argument to “saturate” its arity, because in such a case all arguments can be accessed from the stack.

River Dillon Keefer pointed me to Xavier Leroy's work on the ZINC Abstract Machine (slides and dissertation), which is an instruction set for strict functional languages like ML that avoids heap allocation for the case mentioned above.

The basic idea is to “mark” how many arguments are applied to a function (ZINC uses a marker value, but it should also be possible to use a stack of arities) so that whenever a function is called, it can tell whether it needs to return a closure or not: If a marker has been reached (meaning there are no more arguments) but the function expects more arguments, it captures its current environment as a closure. If a marker is reached and the function has all its arguments, it can simply return a value without creating a closure. If a marker has not been reached but the function has all its arguments, whatever the function returns will be applied to the remaining arguments.

Leroy's ZINC (and ML implementations in general) seem to use right-to-left evaluation order of arguments, which is more elegant in terms of implementation but is something that I would like to avoid for Kombucha, as I find left-to-right evaluation order more natural in the presence of side effects. As far as I can tell, it should be possible to adapt the basic idea of ZINC to use left-to-right evaluation order though, if one is willing to accept a slightly less elegant/efficient implementation.

Basic data structure accessors

The most fundamental and far-reaching decision, however, is which basic data structure to use and how to access and destructure it. Traditionally, functional languages have used cons lists in one form or another (even if they are built out of algebraic data types such as in ML-like languages), because cons lists make it remarkably easy to share the tail of a list, making them the simplest persistent data structure.

Persistent data structures are great if their sub parts are shared and used in many different places, but they are not especially cache-friendly and are even more wasteful in scenarios where there's just a single owner of the data structure that wants to modify it in-place. In such a scenario, mutable data structures such as (dynamic) arrays fare much better.

An interesting development in this space is Perceus reference counting, which is a clever way to mutate persistent data structures in place if there's just a single owner. Crucially, what the paper calls the “functional but in place (FBIP)” paradigm is a semantic guarantee just like tail call optimization, which allows programmers to reason about code in a way that lets them know where and when persistent data structures are mutated in place.

The Perceus algorithm rests on the idea that the problem isn't mutation in itself, it's mutation of shared data. This is the same insight that lies at the root of Rust's borrow checker, which allows either mutation (if there's a single reference) or sharing (if there are multiple borrowed references), but never both at the same time. Perceus mutates a data structure in place if there's a single owner (a refcount of 1), otherwise it behaves like other functional languages and shares the data persistently.

Because data can be shared persistently, Perceus still assumes the same data model as other functional languages, in other words, cons lists (or other persistent data structures with a lot of pointers and bad cache locality, similar to cons lists).

An interesting alternative, proposed in the context of imperative languages, are Mutable Value Semantics, or MVS for short. MVS guarantee that if there's a single owner of a data structure, it can be mutated in place, similar to Perceus. But MVS and Perceus differ in how they treat the case of multiple references to the same data structure, because instead of sharing it using persistent data structures like Perceus, MVS has copy-on-write semantics where the entire data structure is copied. This might sound wasteful, but makes it possible to use (dynamic) arrays, which have a better cache locality than the persistent data structures used by Perceus.

I am not aware of any functional language that uses MVS (to be fair, both Perceus and MVS are relatively new approaches), but I see no reason why MVS would be tied to imperative languages. At the end of the day, the difference lies merely in what happens in the case of multiple references to the same data. Whether or not Perceus or MVS is a better option seems to depend on how common it is to mutate data in place instead of sharing it: If single-owner mutation is the common case, MVS is likely to be more efficient, because it can use (dynamic) arrays under the hood and doesn't need to do any pointer chasing. If most data is shared, Perceus is like to be more efficient, because it can avoid copying large chunks of data.

Going back to the topic of bytecode instruction sets, Perceus and MVS support different instruction sets, because cons lists and (dynamic) arrays support different access methods: While both data structures support pushing a new element to the end of the structure in O(1) and both support accessing the last element in O(1), only (dynamic) arrays support accessing accessing the first element in O(1) (or any element in the array for that matter).

An instruction set that can rely on MVS instead of Perceus could thus expose both a LAST and a FIRST instruction, whereas a VM built on persistent data structures doesn't have that luxury, which leads to the classic pattern of having to reverse a cons list after accumulating it using tail calls.

Fold considered harmful?

It is still unclear to me whether building a VM for a functional language using MVS is actually a good idea. Such an idea might turn out to involve too much copying, whereas using Perceus seems to be the “safer” choice, as it has been used in Koka and Roc with some success (and appears to be a strict improvement over the usual functional model of using persistent data structures everywhere).

However, one thing I dislike about using cons lists is that especially in a Lisp-like language such as Kombucha, where algebraic data types are built out of a basic list-like data structure, the programming pattern that emerges is very dependent on fold, because all data is accessed in terms of first and rest (to borrow Clojure's names for car and cdr). As Guy Steele points out, this is a fundamentally sequential approach to programming that results in data structures with a long spine (which is also bad for cache locality).

What I find appealing about MVS is that the basic data structure could be (dynamic) arrays. Could that lead to a working instruction set for a functional bytecode VM? Well, let's see!