Notes
2025/06/30
I'm at the half-batch point at the Recurse Center, where I've mostly been focusing on Kombucha, a minimal and malleable programming language for symbiotic end-user programming. But as much as I've enjoyed working on the language, it is not the best project to pair on and so I've been looking for a smaller project that I can perhaps work on more collaboratively. Inspired by the wonderful linguistics group at RC (and a great discussion with Bret), here's the idea:
I want to build a simulation that can model the evolution of a very simple language between communicating agents.
Imagine a 2D grid, similar to Conway's Game of Life, say 1024 by 1024 cells. However, instead of simple cells with on/off states, cells can hold an “agent”: a Turing machine with a finite tape that reacts to its environment based on its current state. Agents can move around on the grid, emit sounds and hear sounds. All agents share the same code, which “mutates” randomly after a certain number of simulation steps. Mutations are rolled back if they don't improve the fitness of the agents.
My hope is that with the right setup (and a lot of simulation steps) the agents might evolve to exhibit some form of very rudimentary communication, perhaps even something that we could describe as language-like.
Just like in Conway's Game of Life, the simulated environment is a grid of cells, which are empty by default. Cells can be inhabited by agents, which can move on the grid in 4 directions (up, down, left, right). Besides moving, agents can emit sounds (one sound per simulation step), which can be heard by other agents up to R cells away. Whenever an agent moves, it also emits a sound.
To create some evolutionary pressure, let's assume that there are 3 different types of agents, which “eat” each other using a rock-paper-scissors dynamic whenever two agents of different types end up on the same square or directly adjacent squares. In other words, there are agents of type A, B, C, with A eating B, B eating C and C eating A. The three types of agents have different programs, but each agent of the same type is executing the same program (which frees us from having to store an agent's program inside the grid, which we can use for state instead). Whenever an agent “eats” an agent of another type, the cell of the agent that was eaten is replaced with a copy of the agent that ate it.
The “fitness” of each type of agent is judged by the size of its population. Every N simulation steps, the program of one type of agent is mutated randomly and then executed in a copy of the simulation, alongside a simulation running the previous program. After another N steps, the “better” program (in terms of fitness) is chosen and the other is permanently discarded.
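The mutate-and-compare loop could look roughly like this. It is only a sketch: I'm assuming a program is a flat byte string, that a mutation is a single bit flip, and that there is some function `run_trial(program, world)` (hypothetical, not defined here) which copies the world, runs the trial and returns the resulting population size.

```python
import random

def mutate(program, rng):
    """Return a copy of `program` with one randomly chosen bit flipped."""
    mutated = bytearray(program)
    index = rng.randrange(len(mutated))
    mutated[index] ^= 1 << rng.randrange(8)
    return bytes(mutated)

def select(program, world, run_trial, rng):
    """Trial the old and the mutated program on the same starting world
    and return whichever ends up fitter."""
    candidate = mutate(program, rng)
    # `run_trial` is a stand-in for the real simulation; it is expected to
    # work on its own copy of `world` and return a fitness score.
    baseline_fitness = run_trial(program, world)
    candidate_fitness = run_trial(candidate, world)
    # Ties keep the old program, so only strict improvements survive.
    return candidate if candidate_fitness > baseline_fitness else program
```

Keeping the old program on a tie is one way to read “rolled back if they don't improve the fitness”; accepting neutral mutations would be an equally plausible variant.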
Agents perceive their environment purely through the sounds emitted by other agents. Each type of agent makes a unique sound whenever it moves, which makes it possible for agents to listen for sounds emitted by their “prey” or their “predators”. There is no sense of direction: agents can only detect the presence of a sound, not the direction it came from. Sound is instantly audible within a fixed radius of cells, but completely inaudible outside of it.
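A sketch of how hearing could work, with assumptions of my own: sounds are numbered from 0, the “radius” is measured as Chebyshev distance (a square neighbourhood), and an agent does not hear its own sound. The function name `heard_sounds` is made up.

```python
def heard_sounds(listener, emissions, radius):
    """Return a bitset with bit i set if sound i is audible at `listener`.

    `listener` is a (y, x) position; `emissions` is a list of
    ((y, x), sound_index) pairs for the current simulation step.
    """
    ly, lx = listener
    heard = 0
    for (y, x), sound in emissions:
        if (y, x) == (ly, lx):
            continue  # assumption: an agent does not hear itself
        # Assumption: Chebyshev distance, i.e. a square hearing range.
        if max(abs(y - ly), abs(x - lx)) <= radius:
            heard |= 1 << sound
    return heard
```

The result carries no positional information at all, only which sounds are present, which matches the “no sense of direction” constraint.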
It might make sense to restrict the “use of language” (in other words the ability to emit sound) to just one of the three types of agents. (Another option would be to give each type of agent a distinct set of sounds.)
Since the goal is to evolve agents towards “using language”, each agent needs to have a (finite) memory of what it has heard, so that it can base its decision of how to interpret a sound on previous sounds. One option is to turn each agent into a Turing machine with a finite tape, another option would be to model an agent as a pushdown automaton with a finite stack. Since context-free languages are probably enough, a pushdown automaton seems like a good starting point.
What is the alphabet that each agent uses to push/pop to/from the stack (or write to the tape)? There seem to be a couple of options here, but maybe it's easiest to restrict the stack to “language sounds”, in other words sounds that can be emitted by the type of language-using agent.
Listening for sounds is the equivalent of observing a 0 or 1 in a regular Turing machine: an agent can listen for the bitset of all observable sounds. Assuming that there are 4 sounds (2 sounds emitted by the other types of agents when they move and 2 sounds that the language-using type of agent can choose from when moving), there are 16 possible “environment patterns” for each agent state.
But an agent can also decide what to do based on the top stack element, which is one of the 2 language sounds. This doubles the count, for a total of 32 “input patterns”.
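Combining the 4-bit sound bitset with the top stack element into a single index could be sketched as follows. The bit order (stack bit above the sound bits) is an arbitrary choice of mine, and the sketch ignores the empty-stack case.

```python
NUM_SOUNDS = 4  # 2 movement sounds of the other types + 2 language sounds

def input_pattern(heard, stack_top):
    """Combine the heard-sound bitset (0..15) and the top stack element
    (0 or 1) into one input pattern in the range 0..31."""
    if not 0 <= heard < (1 << NUM_SOUNDS):
        raise ValueError("heard must fit in NUM_SOUNDS bits")
    if stack_top not in (0, 1):
        raise ValueError("stack_top must be 0 or 1")
    # Assumed layout: bit 4 holds the stack top, bits 0..3 the sounds.
    return (stack_top << NUM_SOUNDS) | heard
```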
Each language-using agent has a choice between 4 directions to move (2 bits), which sound to emit (1 bit), whether to pop the current sound (1 bit) and which sound to then push (1 bit). If we allow agents to have 2048 possible states (11 bits), this brings us to a total of 2 + 1 + 1 + 1 + 11 = 16 bits, or 2 bytes, to encode a state change.
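Packing these fields into a single 16-bit word might look like this. The field order is my own assumption (direction in the lowest bits, next state in the highest); any fixed layout would do.

```python
def pack_transition(direction, emit, pop, push, next_state):
    """Pack one state change into 16 bits.

    Assumed layout, lowest bit first:
    direction (2 bits), emit (1), pop (1), push (1), next_state (11).
    """
    if not (0 <= direction < 4 and 0 <= next_state < 2048):
        raise ValueError("direction or next_state out of range")
    if emit not in (0, 1) or pop not in (0, 1) or push not in (0, 1):
        raise ValueError("emit, pop and push must each be 0 or 1")
    return direction | (emit << 2) | (pop << 3) | (push << 4) | (next_state << 5)

def unpack_transition(word):
    """Inverse of pack_transition; returns
    (direction, emit, pop, push, next_state)."""
    return (
        word & 0b11,
        (word >> 2) & 1,
        (word >> 3) & 1,
        (word >> 4) & 1,
        (word >> 5) & 0x7FF,
    )
```

The largest encodable value, `pack_transition(3, 1, 1, 1, 2047)`, is 65535, so every state change fits exactly into 2 bytes with no bits to spare.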
A complete program for agents with 2048 states, 32 possible inputs and 2 bytes to encode a state change per input would then have a size of 2048 × 32 × 2 = 131,072 bytes, or 128 KiB.
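One way to lay out such a program is as a flat lookup table indexed by state and input pattern. This is a sketch under my own assumptions (row-major order, little-endian 16-bit entries); with the figures above the table comes to 131,072 bytes.

```python
import struct

NUM_STATES = 2048
NUM_INPUTS = 32
BYTES_PER_TRANSITION = 2

# Total table size in bytes for one agent type.
PROGRAM_SIZE = NUM_STATES * NUM_INPUTS * BYTES_PER_TRANSITION

def lookup(program, state, pattern):
    """Return the 16-bit transition word for (state, pattern)."""
    # Assumed layout: one row of NUM_INPUTS entries per state.
    offset = (state * NUM_INPUTS + pattern) * BYTES_PER_TRANSITION
    (word,) = struct.unpack_from("<H", program, offset)
    return word
```

Since all agents of one type share the table, only one such table per type has to be kept around, regardless of how many agents are on the grid.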
Since the idea is to evolve agents as quickly as possible, a headless program that simulates cells in parallel, for example on a GPU, seems worth exploring. On the other hand, the whole exercise would be a bit pointless without some sort of viewer that can run the simulation at a slower speed, so a combination of a headless GPU compute shader for evolution and a slower viewer for inspection could be a good fit.
I have no idea whether this would actually result in anything interesting; probably not. At the very least it would require a lot of experimentation to find the right environmental pressure and the right balance between ease of movement and ease of using and understanding sounds.
But I find the idea of combining (somewhat) “intelligent” agent cells and a grid-based approach interesting and haven't been able to find anything similar. There are approaches that simulate individual cells or organisms, but these systems simulate evolutionary pressure at such a low level that it is unlikely that language-like processes would evolve. Instead, I'd be interested in finding a simulation that already provides the structure necessary for language to be useful and which applies evolutionary algorithms at a “higher level”, similar to how human language can evolve much more quickly than through genetic adaptation.