Notes
2025/08/29
This is a proposal for explicit bindings in a programming language: by distinguishing variables that are being bound from variables that are being used, we get macros that can be reasoned about statically.
(This note continues the explorations of part 1 and part 2 but is self-contained. There's also a paper that goes into more technical detail.)
Let's assume that we want to build a programming language where every function or control structure can be redefined and reimplemented, including something as fundamental as variable assignment. Our language might have a built-in way of expressing something like a = 5, but we want to ensure that user-defined functions can do the same, for example by defining a function let that is used as let(a, 5).
One option is to use Lisp-like macros: in a language like Lisp, let can be a macro, which does not evaluate a but instead treats it as a symbol, evaluates 5, and then calls the built-in = function to set the symbol a to the value 5. This has two disadvantages:
First, if we see a function call like foo(2 + 2), we can no longer be sure that it can be simplified to foo(4), because foo could be a macro, which can observe the syntactic difference between 2 + 2 and 4, even though they might evaluate to the same value. Second, and as a consequence, macros must usually be resolved at compile time, before any optimizations can take place.

Is there another option? An option that gives us the flexibility of macros (allowing us to define our own control structures), while preserving our ability to reason about a program statically (without first resolving all macros)? Yes, if we distinguish variables that are being bound from variables that are being used. Here's our guiding principle:
Every construct in the language can be redefined; there are no privileged language constructs. But the scope of variables remains immediately obvious, just by looking at the source code, without evaluating any function or macro.
Let's start with an example that does not distinguish between using and binding a variable:
a + b // the variable a is being used
a = b // the variable a is being bound
In the first case, a + b, the variable a is looked up in the current scope, whereas in the second case, a = b, a is a fresh variable that will be bound in the current scope. That they play such different roles is not visible just by looking at the variables, nor by looking at the operators, because + and = are both infix symbols, distinguished only by their name.
Let's distinguish these two cases by making the different roles explicit. Whenever a variable is resolved in the current scope and used, we will continue to just write its name. But whenever a fresh variable is being bound, we prefix it with a colon:
a + b // the variable a is being used
:a = b // the variable a is being bound
This is only half of the solution, however. We still need a way to track the scope of variables syntactically, so that we know which variables-being-used correspond to which variables-being-bound. That's where explicit blocks come in:
Let's make the scope of variables explicit by writing { ... } for the block where a binding is active. A variable that is bound inside the block goes out of scope at the end of the block. Since each program is conceptually a block of definitions, we assume that a whole program is implicitly wrapped in { ... }. Explicit bindings have the effect of binding the symbols prefixed with : for the remainder of the block:
{
  a + b  // the variable a is being used
  f(a)   // f uses the a from the outer scope
  :a = b // the variable a is being bound
  f(a)   // f uses the a that was just bound
}
Without knowing what = does, we can immediately tell where a variable is bound and where it is used, just by looking at the syntax. We don't know which value the variable a will have after :a = b without looking up how = is implemented, but the scope of variables can be fully tracked statically, even in the presence of macros.
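This kind of scope tracking is simple enough to sketch in a few lines. Here is a minimal sketch in Python (the toy representation, with bare names as uses and :-prefixed names as bindings, is my assumption, not the paper's):

# A block is a list whose items are either names (a string like "a" is a
# use, ":a" is a binding) or nested lists (inner { ... } blocks).
def check_scopes(block, outer=frozenset()):
    """Track scopes statically; nothing is ever evaluated."""
    scope = set(outer)
    for item in block:
        if isinstance(item, list):                 # an inner { ... } block
            check_scopes(item, frozenset(scope))   # its bindings don't escape
        elif item.startswith(":"):                 # ":a" binds a for the rest of the block
            scope.add(item[1:])
        elif item not in scope:                    # "a" uses a; must already be bound
            print(f"unbound use of {item!r}")
    return scope

# Using a before it is bound, then after, then after its block has closed:
check_scopes(["a", [":a", "a"], "a"])   # flags the first and last use of a

Nothing in this checker needs to know what = (or any other macro) does; the : prefix alone determines the scopes.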
So far we've seen how explicit bindings work within an enclosing scope, using the : prefix. But many control structures need to bind variables in separate scopes, for example pattern matching, where matched variables should be available in specific branches, or function definitions, where parameters should be available in the function body.

For these cases, we use block arguments written as { ... }, and here's where things get interesting: the distinction between bound and used variables is flipped! When we're naming variables that will be used inside a block argument, we use unmarked names for new bindings and ^ for variables we want to resolve from the outer scope.
Let's look at pattern matching:
match (point) [
  (x, y) -> {
    // x and y are bound here
    x + y
  }
  (x, ^z) -> {
    // x is bound, z comes from outer scope
    x + z
  }
]
In the pattern (x, y), both x and y are being bound: they're fresh variables that will be available inside the { ... } block. But in (x, ^z), the ^z means "use the variable z from the outer scope" rather than binding a fresh variable.
We can tell that (x, y) is a pattern just by looking at the syntax. The -> infix function uses an explicit block { ... } as an argument and thus behaves like a macro, without us needing to know how -> is defined. We've now seen how macros are distinguished from regular functions:

If f is called with an explicit binding like :x as an argument, then f is considered to be a macro that binds x in the enclosing scope. If f is called with an explicit block { ... } as an argument, then f is considered to be a macro that binds all the variables in the block argument that are not explicitly pinned using ^.

This dual syntax makes the intent clear: when you see :x in the enclosing scope, you know a variable is being bound. When you see ^x in the arguments of a function with an explicit block argument, you know a variable is being used. Everything else follows the natural default for its context.
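Spelled out as code, the second rule is just a partition of the names appearing in the arguments (a toy sketch; the string representation is an assumption):

def block_bindings(arg_names):
    """Split pattern names into block bindings and outer-scope pins."""
    bound = [n for n in arg_names if not n.startswith("^")]
    pinned = [n[1:] for n in arg_names if n.startswith("^")]
    return bound, pinned

print(block_bindings(["x", "y"]))    # (['x', 'y'], []): both bound in the block
print(block_bindings(["x", "^z"]))   # (['x'], ['z']): z resolved outside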
Here's the key insight that makes this all work: when a function is called with explicit bindings (like :x) or block arguments (like { ... }), the system automatically treats it as a macro and wraps the arguments so the function can observe their syntactic structure.

For a call like (:x, :y) = point, the = function doesn't just receive the values of x and y. Instead it receives a representation that says "this is a binding pattern with two variables named x and y." Meanwhile, point is evaluated normally and passed as a value. (See the paper for more details on how arguments are wrapped.)
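Here is a minimal sketch in Python of what such wrapping could look like (the Binding and Value classes and wrap_args are illustrative assumptions, not the paper's API):

from dataclasses import dataclass

@dataclass
class Binding:          # :x in the source
    name: str

@dataclass
class Value:            # an ordinary argument, already evaluated
    value: object

def wrap_args(args, env):
    """Evaluate plain arguments; preserve binding patterns as syntax."""
    wrapped = []
    for a in args:
        if isinstance(a, Binding) or (
            isinstance(a, tuple) and any(isinstance(x, Binding) for x in a)
        ):
            wrapped.append(a)               # keep the syntactic structure
        else:
            wrapped.append(Value(env[a]))   # evaluate normally
    return wrapped

# (:x, :y) = point: the = macro sees the pattern itself, but point's value.
env = {"point": (1, 2)}
print(wrap_args([(Binding("x"), Binding("y")), "point"], env))
# [(Binding(name='x'), Binding(name='y')), Value(value=(1, 2))]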
The beauty is that this happens automatically based on the syntax you use. You don't need to declare macros separately, because the presence of binding markers tells the system when syntactic structure should be preserved.
Here's an example that combines both types of bindings. When defining a recursive function, we need the function name to be available both inside its own body (for recursive calls) and in the surrounding scope (so we can call it later):
:factorial(n) = {
  if (n == 0) {
    1
  } else {
    n * factorial(n - 1)
  }
}
factorial(5)
Here, :factorial(n) creates a binding in the enclosing scope through the :factorial syntax (so we can call factorial(5) afterward), while the { ... } block receives factorial and n as parameters (so we can make the recursive call factorial(n - 1)).
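Before spelling out what = does here, a minimal sketch in Python may help (define, env, and the dict-based scopes are assumptions for illustration, not the paper's design); it mimics = by binding the name in the enclosing scope and handing the block both the function and its parameter:

def define(env, name, params, body):
    """Bind a function in env so that its body can also call it."""
    def fn(*args):
        local = dict(env)                # the block's scope
        local[name] = fn                 # the block sees the function itself...
        local.update(zip(params, args))  # ...and its parameters
        return body(local)
    env[name] = fn                       # the enclosing scope sees it too

env = {}
define(env, "factorial", ["n"],
       lambda s: 1 if s["n"] == 0 else s["n"] * s["factorial"](s["n"] - 1))
print(env["factorial"](5))  # 120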
This works because = is implemented as a macro that receives the binding pattern :factorial(n), binds factorial in the enclosing scope, and passes factorial and n as bindings to the block.

What we've built is a macro system where the scope and binding behavior is always statically visible, but individual constructs can still be completely redefined. You can implement your own assignment operator, your own pattern matching, your own control flow, all while preserving the compiler's (and your) ability to reason about and optimize your code.
The key insight is that macros only need to observe syntactic structure when you explicitly mark it with binding syntax. Everything else can be evaluated and optimized normally, giving us the best of both worlds: the flexibility of macros with the reasoning power of static analysis.