Notes
2025/06/01

RC week 2: pattern matching, VM instruction set

Another week has flown by at the Recurse Center, where I managed to tie up a couple of loose threads related to Kombucha, the programming language I'm building.

Kombucha is now in a state where I can use it to write programs with a syntax close to what I'm eventually envisioning. This requires a bit of bootstrapping, as Kombucha's core language does not provide any operator for deep equality, let alone anything resembling pattern matching. This is by design, however, because one of the goals is to be extensible enough that such language constructs can be added in the language using macros.

Bootstrapping pattern matching

That's what I did this week, going from a level of abstraction similar to lambda calculus to pattern matching with unification. Here's the end result:

match: Pair(Foo, Bar) with: [
    Pair('x, 'x) -> { Twice(x, x) }
    Pair('x, Foo) -> { SecondIsFoo }
    Pair('x, 'y) -> { y }
    '_ -> { throw!(InvalidPair) }
]

The prelude (the small set of functions that are by default included in every Kombucha program) now defines the two functions match-with (which can be called using a Smalltalk-like keyword call syntax as match: ... with: ...) and -> (which expects a pattern and a body and returns a function that takes a value and executes the body if the pattern matches the value).

Bytecode instruction format

I settled on an instruction set for Kombucha's stack-based bytecode VM. It's definitely not the most efficient one, but simple to implement, which is more important for my purposes. There are currently 11 instructions (with plans to stick to a maximum of 16 instructions in the long term) with variable lengths ranging from 1 to 4 bytes:

| NAME       | BYTE 1   | BYTE 2   | BYTE 3   | BYTE 4   |
+------------+----------+----------+----------+----------+
| LOAD_FN    | TAG | --FN_INDEX-------------- | --VARS-- |
| LOAD_STR   | TAG | --STRING_INDEX---------- |
| LOAD_EFF   | TAG | --STRING_INDEX---------- |
| LOAD_VAR   | TAG | --VAR_INDEX-- |
| APPLY_PRE  | TAG | -- |
| APPLY_POST | TAG | -- |
| RETURN     | TAG | -- |
| CMP        | TAG | -- |
| UNPACK     | TAG | -- |
| TRY        | TAG | -- |
| UNWIND     | TAG | -- |

Here's what the instructions do:

Fixed-point combinator + recursion

I got rid of a special instruction for recursion. There is still a recursion combinator in the intermediate (lambda-like) calculus, but the combinator is then compiled into a simple fixed-point function using the following code, which is guaranteed to be placed at offset 0 in the list of operators:

// fix = f => x => f(fix(f))(x)
let mut bytecode = vec![
    // 0: f => ...
    Op::LoadFn { code: 2, fvars: 1 },
    Op::Return,
    // 2: ... x => f(fix(f))(x)
    Op::LoadVar(1),                   // f
    Op::LoadFn { code: 0, fvars: 0 }, // f, fix
    Op::LoadVar(1),                   // f, fix, f
    Op::ApplyPrefix,                  // f, fix(f)
    Op::ApplyPrefix,                  // f(fix(f))
    Op::LoadVar(0),                   // f(fix(f)), x
    Op::ApplyPrefix,                  // f(fix(f))(x)
    Op::Return,
];

Next steps

Next week I want to...

We'll see how it goes.