Notes
2025/12/29

Flat bytecode as lists and arrays

As outlined in a note on bytecode as a flat AST, bytecode can be seen as a flat representation of tree-like data. A list of cons cells of the form (x, (y, (z, ()))) could be represented as the following flat bytecode:

x y z () CONS CONS CONS

When the bytecode is evaluated, we can either view it as a sequence of stack operations that manipulate a separate value representation (such as boxed cons lists), or we can view the bytecode as the value representation itself.

Bytecode as lists

However, using bytecode as the value representation has the disadvantage that we need to traverse the list from the end whenever we want to access an element of the list. Since x, y, or z could be represented as bytecode sequences of varying length, we need to read and traverse these child elements. In other words, we cannot easily skip over elements.

Such an approach is even worse in terms of time complexity than normal cons lists, because not only is accessing a random element O(n) in the size of the list, but each element has to be fully traversed as well. Is there a better bytecode representation? One that would give us O(1) array-like random access?

Bytecode as arrays

The easiest solution is to pad each element so that all elements have the same size and then store both the number of items and the size of individual items:

// n = number of items
// m = max size of item
x y z () CONS<n, m>

Since a CONS instruction can now cons together multiple elements, we could additionally make the empty list () implicit and treat a CONS of 0 elements as the empty list:

CONS<0, m>       // ()
x y z CONS<3, m> // (x, (y, (z, ())))

Could this be a viable runtime representation for a bytecode evaluator? At the very least, it sounds like an idea worth exploring to see where it breaks. At best, it might result in bytecode that supports efficient random access traversal.