Notes

RC Week 2:
Pattern Matching, VM Instruction Set

Another week has flown by at the Recurse Center, where I managed to tie up a couple of loose threads related to Vorpal, the programming language that I'm building.

Vorpal is now in a state where I can use it to write programs with a syntax close to what I'm eventually envisioning. This requires a bit of bootstrapping, as Vorpal's core language does not provide any operator for deep equality, let alone anything resembling pattern matching. This is by design, however, because one of the goals is to be extensible enough that such language constructs can be added in the language using macros.

Bootstrapping Pattern Matching

That's what I did this week, going from a level of abstraction similar to lambda calculus to pattern matching with unification. Here's the end result:

match: Pair(Foo, Bar) with: [
    Pair('x, 'x) -> { Twice(x, x) }
    Pair('x, Foo) -> { SecondIsFoo }
    Pair('x, 'y) -> { y }
    '_ -> { throw!(InvalidPair) }
]

The prelude (the small set of functions that are by default included in every Vorpal program) now defines the two functions “match-with” (which can be called using a Smalltalk-like keyword call syntax as “match: ... with: ...”) and “->” (which expects a pattern and a body and returns a function that takes a value and executes the body if the pattern matches the value).

Bytecode Instruction Format

I settled on an instruction set for Vorpal's stack-based bytecode VM. It's definitely not the most efficient one, but simple to implement, which is more important for my purposes. There are currently 11 instructions (with plans to stick to a maximum of 16 instructions in the long term) with variable lengths ranging from 1 to 4 bytes:

| NAME       | BYTE 1   | BYTE 2   | BYTE 3   | BYTE 4   |
+------------+----------+----------+----------+----------+
| LOAD_FN    | TAG | --FN_INDEX-------------- | --VARS-- |
| LOAD_STR   | TAG | --STRING_INDEX---------- |
| LOAD_EFF   | TAG | --STRING_INDEX---------- |
| LOAD_VAR   | TAG | --VAR_INDEX-- |
| APPLY_PRE  | TAG | -- |
| APPLY_POST | TAG | -- |
| RETURN     | TAG | -- |
| CMP        | TAG | -- |
| UNPACK     | TAG | -- |
| TRY        | TAG | -- |
| UNWIND     | TAG | -- |

Here's what the instructions do:

  1. LOAD_FN looks up a function in the bytecode stream (using a 20 bit FN_INDEX), associates it with the number of variables (VARS, up to 255) that the function might capture as a closure, then pushes the function onto the intermediate stack.
  2. LOAD_STR looks up a string in the string section of the bytecode (using a 20 bit STRING_INDEX) and pushes it onto the intermediate stack.
  3. LOAD_EFF looks up the name of an effect in the string section of the bytecode (using a 20 bit STRING_INDEX) and pushes it onto the intermediate stack.
  4. LOAD_VAR clones a variable from the variable stack (using a 12 bit index) and pushes it onto the intermediate stack.
  5. APPLY_PRE pops an argument from the intermediate stack, pops a function from the intermediate stack, then applies the function to the argument.
  6. APPLY_POST pops a function from the intermediate stack, pops an argument from the intermediate stack, then applies the function to the argument.
  7. RETURN marks the end of the current function, pops the current call frame and jumps to the return address stored there.
  8. CMP pops an if-false branch, pops an if-true branch, pops two values (all from the intermediate stack) and then applies if-true to the nil value if the two values are the same strings, otherwise applies if-false to the nil value. (This acts as a simple if-then-else without deep equality.)
  9. UNPACK pops an if-false branch, pops an if-true branch, pops a compound value (all from the intermediate stack), then applies the if-true branch to the compound value split into its last part and everything before it (as two separate arguments), or applies if-false to the nil value if the value is not a compound value.
  10. TRY pops an effect handler, an effect and a function (all from the intermediate stack), pushes the effect handler to the handler stack, then applies the function to the nil value (while the handler is active).
  11. UNWIND pops an effect handler from the handler stack.

Fixed-Point Combinator + Recursion

I got rid of a special instruction for recursion. There is still a recursion combinator in the intermediate (lambda-like) calculus, but the combinator is then compiled into a simple fixed-point function using the following code, which is guaranteed to be placed at offset 0 in the list of operators:

// fix = f => x => f(fix(f))(x)
let mut bytecode = vec![
    // 0: f => ...
    Op::LoadFn { code: 2, fvars: 1 },
    Op::Return,
    // 2: ... x => f(fix(f))(x)
    Op::LoadVar(1),                   // f
    Op::LoadFn { code: 0, fvars: 0 }, // f, fix
    Op::LoadVar(1),                   // f, fix, f
    Op::ApplyPrefix,                  // f, fix(f)
    Op::ApplyPrefix,                  // f(fix(f))
    Op::LoadVar(0),                   // f(fix(f)), x
    Op::ApplyPrefix,                  // f(fix(f))(x)
    Op::Return,
];

Next Steps

Next week I want to...

Want to become a better programmer? Join the Recurse Center!