Notes
2025/06/21

RC week 5: hot reload with replay semantics?

For the past week at the Recurse Center I've been thinking about how to implement hot reload as part of a programming language in such a way that the language can guarantee the absence of stale state.

Hot reload

Before talking about why stale state might be a problem (and how to fix it), let's talk about why a language should support hot reload on the language level as opposed to hot reload for specific libraries, such as web frameworks or game engines. After all, this is the main way that hot reload is used in mainstream programming languages today and it is much easier to get hot reload right when you know more about the specific context where it's used.

The simple answer is: The language that I'm building is aimed at rapid iteration and explorative programming, but it's also meant to be used together with Rust. It is a symbiotic language that does not provide any IO itself, but uses effects and effect handlers to call out to a Rust host, which then handles IO. Having to recompile an entire program every time a small change is made (which requires recompiling the Rust wrapper that is providing the effect handlers for the language) would instantly kill the idea of explorative programming with rapid iteration speed.

More importantly, I want to support the feeling of Lisp-like REPL-driven development without some of the downsides, which brings us to the problem of stale state.

Stale state and restarts

In all the languages with the ability to hot reload code using a REPL or an image (the classic examples being Lisp and Smalltalk), the source of truth isn't the source code, it's the live environment. In other words, it is possible to define a function, call it and store some of its results somewhere, then change the function defintion in such a way that it wouldn't be possible to generate the same state again. The function has been changed, but the state that it previously produced still lives on. Other code might depend on this state, which can result in a situation where some well-defined state cannot be reached again if the whole program is restarted.

A (live) state that wouldn't be reachable by restarting the program (no matter how the program interacts with the outside world) is stale, it is an observable effect of reloading (and thus changing) some code.

If a live system reaches a stale state, the best thing we can do (if we want to avoid the disadvantages that might result from stale state) is to restart the parts of the program that have caused the stale state. In the most extreme case (and in the absence of a sufficiently smart compiler that could just “patch up” stale state) this might be equivalent to restarting the entire program.

Replay semantics

A language that supports hot reload but prohibits stale state supports what I call replay semantics:A language with replay semantics guarantees that the live state of a program after a hot reload is the same state that would have resulted if the program had been restarted and replayed with the new code from the beginning (ignoring any differences that resulted from side effects).

The last part about side effects is important to ensure that replay semantics remain possible in practice: If a program reads from a file and the contents of the file change either between reloads or between running the program the first time and restarting it later, there is no way to “roll back” the world and get the previous file contents back. Replay semantics do not guarantee that restarting a program will always lead to the same results, they only guarantee that a difference in program behavior cannot have been caused by old state left over after hot reloading.

A little bit more formally, replay semantics guarantee that if a program (or fragment of a program) p_0 is first executed in the presence of a sequence of side effects e_0...e_m producing the state s_0, then after hot reloading a new version p_1 and continuing to run the program in the presence of a sequence of effects e_m+1...e_n producing the state s_1 this state is identical to the state s_2 that would have resulted from running the program p_1 in the presence of the sequence of effects e_0...e_n, for all possible sequences of effects.

A chain of modules

The previous paragraphs have talked about a single program fragment, but in larger systems there will usually be a graph of module dependencies which could result in a chain of restarts being triggered by hot reloading a module. After all, if a module A is hot reloaded and replay semantics require restarting the module to get rid of stale state, the parent module's state could depend on the child module and might be invalidated after restarting the child, which would then trigger a restart of the parent module and so on.

But how can replay semantics be implemented in practice? After some very productive discussions at the Recurse Center (especially with Florian and Cyrene), it seems that there are 3 different approaches to tackle replay semantics, by either caching all effects, by rerunning modules to check state differences, or by doing some form of liveness analysis. Let's look at them in turn.

Caching effects

Given that replay semantics come down to comparing the current state after hot reloading with the (hypothetical) state that would have resulted if the new version had been running from the beginning, the simplest approach would be to just cache all side effects of the program, so that it becomes possible to literally replay the hot reloaded version of the program with effects “in the past”. This works, but is almost certainly prohibitively expensive. Worse, it pushes caching into the invisible layer of the system, instead of exposing it in the language, making it impossible for users of the language to build their own caching systems.

Rerun and compare

Another simple approach would be to just assume that most or at least some effects are being cached in parent modules explicitly by the language user and to just restart and rerun a hot reloaded module and its immediate parent from the start, accepting that this might cause some “old” effects to happen “again”. This would effectively rebuild the state from scratch, then compare the new state with the state from before the reload and trigger restarts of parent modules if the state ends up being different. Since every restart of a parent (or a parent's parent) module would need to rerun all of its descendants, the hot reloaded module might be executed multiple times, which could end up being quite unintuitive.

Having effects happen multiple times doesn't seem ideal. Worse, such an approach in its most trivial form wouldn't even guarantee replay semantics, because a module's state higher up in the chain could depend on the reloaded module, which would potentially require all modules in the dependency chain to be restarting, which would be equivalent to a cold restart.

Liveness analysis

The most promising option seems to be some form of static analysis, which would check a hot reloaded module and its parent to decide (conservatively) whether the parent's state could possible depend on the hot reloaded module. If so, the parent must restart itself and walk up the chain to check its parents, and so on. Such a static analysis would look similar to borrow checking, because a restart of the parent module must be triggered if the output (of the old version) of the hot reloaded module is still being used at the point of the next reload, in other words if the lifetime of the old module could outlive it.

What's next?

It still seems unclear to me what the best option is or whether replay semantics are even worth the considerable effort. Using some form of static analysis seems to be the most appealing but also the most complex option. For now, my plan is to implement a module system without replay semantics first and to explore the approaches mentioned here at a later point.