For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we covered the state machine we use to implement game logic.
We now have all the pieces in place to look at a card shuffling algorithm. Shuffling cards in a game of Mental Poker is one of the key innovations for this type of zero-trust game. We went over the cryptography aspects of shuffling in Part 1.
Let's review the algorithm:
- Alice takes a deck of cards (an array), shuffles the deck, generates a secret key \(K_A\), and encrypts each card with \(K_A\).
- Alice hands the shuffled and encrypted deck to Bob. At this point, Bob doesn't know what order the cards are in (since Alice encrypted the cards in the shuffled deck).
- Bob takes the deck, shuffles it, generates a secret key \(K_B\), and encrypts each card with \(K_B\).
- Bob hands the deck to Alice. At this point, neither Alice nor Bob know what order the cards are in. Alice got the deck back reshuffled and re-encrypted by Bob, so she no longer knows where each card ended up. Bob reshuffled an encrypted deck, so he also doesn't know where each card is.
At this point the cards are shuffled. To play, though, Alice and Bob also need the ability to look at individual cards. To enable this, the following steps must happen:
- Alice decrypts the shuffled deck with her secret key \(K_A\). At this point she still doesn't know where each card is, as cards are still encrypted with \(K_B\).
- Alice generates a new set of secret keys, one for each card in the deck. Assuming a 52-card deck, she generates \(K_{A_1} ... K_{A_{52}}\) and encrypts each card in the deck with one of the keys.
- Alice hands the deck of cards to Bob. At this point, each card is encrypted by Bob's key, \(K_B\), and one of Alice's keys, \(K_{A_i}\).
- Bob decrypts the cards using his key \(K_B\). He still doesn't know where each card is, as now the cards are encrypted with Alice's keys.
- Bob generates another set of secret keys, \(K_{B_1} ... K_{B_{52}}\), and encrypts each card in the deck.
- Now each card in the deck is encrypted with a unique key that only Alice knows and a unique key only Bob knows.
If Alice wants to look at a card, she asks Bob for his key for that card. For example, if Alice draws the first card, encrypted with \(K_{A_1}\) and \(K_{B_1}\), she asks Bob for \(K_{B_1}\). If Bob sends her \(K_{B_1}\), she now has both keys to decrypt the card and look at it. Bob still can't decrypt it because he doesn't have \(K_{A_1}\). This way, as long as both Alice and Bob agree that one of them is supposed to see a card, they exchange keys as needed to enable this.
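This works because the encryption scheme (SRA) is commutative: encrypting with \(K_A\) then \(K_B\) gives the same result as encrypting in the other order, so the keys can also be removed in any order. Here's a toy illustration of the idea using XOR (which also commutes) as a stand-in cipher - the names below are hypothetical, not the toolkit's API:

```typescript
// Toy commutative "cipher": XOR with a numeric key. Illustration only -
// the real toolkit uses SRA (commutative modular exponentiation).
function xorCipher(text: string, key: number): string {
    return Array.from(text, (c) =>
        String.fromCharCode(c.charCodeAt(0) ^ key)
    ).join("");
}

const card = "queen-of-spades";
const kA1 = 42; // Alice's key for this card
const kB1 = 97; // Bob's key for this card

// The card ends up encrypted with both keys
const hidden = xorCipher(xorCipher(card, kA1), kB1);

// Once Bob shares kB1, Alice can remove the keys in either order
const revealed = xorCipher(xorCipher(hidden, kA1), kB1);
console.log(revealed === card); // true
```

Because the order in which keys are applied and removed doesn't matter, neither player needs to coordinate who decrypts first.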
While we covered the algorithm before, we didn't have the infrastructure in place to implement this. We now do.
We'll start by describing our shuffle actions. As we just saw in the above recap, we have two steps:
type ShuffleAction1 = BaseAction & { type: "Shuffle1"; deck: string[] };
type ShuffleAction2 = BaseAction & { type: "Shuffle2"; deck: string[] };
We only need to pass around the deck of cards (encrypted or not), so we extend the BaseAction type (which includes clientId and type) to pin the type and add the deck.
We need more data in the context though:
type ShuffleContext = {
    clientId: string;
    deck: string[];
    imFirst: boolean;
    keyProvider: KeyProvider;
    commonKey?: SRAKeyPair;
    privateKeys?: SRAKeyPair[];
};
We need to know our clientId, whether we are first or second in the turn order, a keyProvider to generate encryption keys, a commonKey (that's for the first encryption step), and privateKeys (for the second encryption step). We'll use the context later on, when we stitch everything together. Before that, let's look at the basic shuffling functions.
First, we need a function that shuffles an array:
function shuffleArray<T>(arr: T[]): T[] {
    let currentIndex = arr.length;
    while (currentIndex > 0) {
        const randomIndex = Math.floor(Math.random() * currentIndex);
        currentIndex--;
        [arr[currentIndex], arr[randomIndex]] =
            [arr[randomIndex], arr[currentIndex]];
    }
    return arr;
}
We won't go into the details of this, as it's a generic shuffling function, not specific to Mental Poker, but a required piece.
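One caveat worth noting: Math.random() is not a cryptographically secure source of randomness, so a determined opponent could in principle reason about the permutation. If that's a concern, the same Fisher-Yates loop can be driven by a CSPRNG - a sketch assuming a Node.js environment:

```typescript
import { randomInt } from "crypto";

// Fisher-Yates shuffle driven by Node's CSPRNG instead of Math.random()
function secureShuffleArray<T>(arr: T[]): T[] {
    for (let i = arr.length - 1; i > 0; i--) {
        const j = randomInt(i + 1); // uniformly random index in [0, i]
        [arr[i], arr[j]] = [arr[j], arr[i]];
    }
    return arr;
}
```

In practice the double encryption already hides the cards, but an unpredictable permutation removes one more avenue of attack.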
Let's look at the two shuffling steps next. First step, in which we shuffle and encrypt all cards with the same key:
async function shuffle1(
    keyProvider: KeyProvider,
    deck: string[]
): Promise<[SRAKeyPair, string[]]> {
    const commonKey = keyProvider.make();
    deck = shuffleArray(deck.map((card) => SRA.encryptString(card, commonKey)));
    return [commonKey, deck];
}
The shuffle1() function takes a keyProvider and a deck, and returns a promise of a shuffled deck plus the key used to encrypt it.
The function is pretty straightforward: we generate a new key, we encrypt each card with it, then we shuffle the deck. We return the key and the now shuffled and encrypted deck.
Both players need to perform the first step, after which both Alice and Bob have encrypted the deck with \(K_A\) and \(K_B\) respectively, so neither knows the order of the cards.
The next step, according to our algorithm, is for each player to decrypt the deck with their key and encrypt each card individually with a unique key:
async function shuffle2(
    commonKey: SRAKeyPair,
    keyProvider: KeyProvider,
    deck: string[]
): Promise<[SRAKeyPair[], string[]]> {
    const privateKeys: SRAKeyPair[] = [];
    deck = deck.map((card) => SRA.decryptString(card, commonKey));
    for (let i = 0; i < deck.length; i++) {
        privateKeys.push(keyProvider.make());
        deck[i] = SRA.encryptString(deck[i], privateKeys[i]);
    }
    return [privateKeys, deck];
}
shuffle2() is also fairly straightforward. It takes the commonKey from step 1, a keyProvider, and the encrypted deck.
First, it decrypts all cards using the commonKey (note the cards are still encrypted by the other player's key). Next, it uses the keyProvider to generate a key for each card, and encrypts each card with its key. The function returns the private keys generated and the re-encrypted deck.
We now have all the basics in place. Here's how we put it all together, starting with the state machine that describes the shuffling steps:
function makeShuffleSequence() {
    return sm.sequence([
        sm.local(async (queue: IQueue<ShuffleAction1>, context: ShuffleContext) => {
            if (!context.imFirst) {
                return;
            }

            [context.commonKey, context.deck] = await shuffle1(context.keyProvider, context.deck);

            await queue.enqueue({
                type: "Shuffle1",
                clientId: context.clientId,
                deck: context.deck,
            });
        }),
        sm.transition(async (action: ShuffleAction1, context: ShuffleContext) => {
            if (action.type !== "Shuffle1") {
                throw new Error("Invalid action type");
            }

            context.deck = action.deck;
        }),
        sm.local(async (queue: IQueue<ShuffleAction1>, context: ShuffleContext) => {
            if (context.imFirst) {
                return;
            }

            [context.commonKey, context.deck] = await shuffle1(context.keyProvider, context.deck);

            await queue.enqueue({
                type: "Shuffle1",
                clientId: context.clientId,
                deck: context.deck,
            });
        }),
        sm.transition(async (action: ShuffleAction1, context: ShuffleContext) => {
            if (action.type !== "Shuffle1") {
                throw new Error("Invalid action type");
            }

            context.deck = action.deck;
        }),
        sm.local(async (queue: IQueue<ShuffleAction2>, context: ShuffleContext) => {
            if (!context.imFirst) {
                return;
            }

            [context.privateKeys, context.deck] = await shuffle2(context.commonKey!, context.keyProvider, context.deck);

            await queue.enqueue({
                type: "Shuffle2",
                clientId: context.clientId,
                deck: context.deck,
            });
        }),
        sm.transition(async (action: ShuffleAction2, context: ShuffleContext) => {
            if (action.type !== "Shuffle2") {
                throw new Error("Invalid action type");
            }

            context.deck = action.deck;
        }),
        sm.local(async (queue: IQueue<ShuffleAction2>, context: ShuffleContext) => {
            if (context.imFirst) {
                return;
            }

            [context.privateKeys, context.deck] = await shuffle2(context.commonKey!, context.keyProvider, context.deck);

            await queue.enqueue({
                type: "Shuffle2",
                clientId: context.clientId,
                deck: context.deck,
            });
        }),
        sm.transition(async (action: ShuffleAction2, context: ShuffleContext) => {
            if (action.type !== "Shuffle2") {
                throw new Error("Invalid action type");
            }

            context.deck = action.deck;
        })
    ]);
}
Note we are limiting this to a 2-player game, though we can easily generalize to more players if needed.
This is a longer function, so let's break it down:
- If we are the first player, call shuffle1() and post the encrypted deck as a Shuffle1 action.
- Wait for a Shuffle1 action to arrive - either the one we just posted (if imFirst is true) or incoming from the other player. We store the encrypted and shuffled deck.
- Call shuffle1() if we are not the first player - in that case, it is our turn to shuffle now. We post another Shuffle1 action.
- Wait for the second Shuffle1 action to arrive and update the deck.

At this point, both players performed the first step of the shuffle, so the deck is encrypted with \(K_A\) and \(K_B\) and neither player knows the order of the cards.
We move on to the second step of the shuffle, where each player calls shuffle2() to decrypt the deck and re-encrypt each individual card. Again, depending on whether we are first or not, we take action or wait:
- If imFirst is true, call shuffle2() and post a Shuffle2 action.
- Wait for a Shuffle2 action and update the deck.
- If imFirst is not true, call shuffle2() and post a Shuffle2 action.
- Wait for the second Shuffle2 action and update the deck.

A helper function to run this state machine given an async queue:
async function shuffle(
    clientId: string,
    turnOrder: string[],
    sharedPrime: bigint,
    deck: string[],
    actionQueue: IQueue<BaseAction>,
    keySize: number = 128 // Key size, defaults to 128 bytes
): Promise<[SRAKeyPair[], string[]]> {
    if (turnOrder.length !== 2) {
        throw new Error("Shuffle only implemented for exactly two players");
    }

    const context: ShuffleContext = {
        clientId,
        deck,
        imFirst: clientId === turnOrder[0],
        keyProvider: new KeyProvider(sharedPrime, keySize),
    };

    const shuffleSequence = makeShuffleSequence();

    await sm.run(shuffleSequence, actionQueue, context);

    return [context.privateKeys!, context.deck];
}
We need our clientId, the turn order (whether we go first or not), a shared large prime (to seed other encryption keys), an unshuffled deck, a queue, and, optionally, a keySize.
From the input, we create a ShuffleContext with the required data, then we generate the state machine by calling the function we discussed previously, and we run the state machine using the given actionQueue and generated context.
We return the private keys with which we encrypted each individual card, and the shuffled and encrypted deck.
Shuffling a full deck of 52 cards with large enough key sizes gets noticeably slow. Note that we need to generate an encryption key for each card, which involves searching for large prime numbers. The more secure we want the encryption to be, the larger the number of bits we want in the key, and the longer it takes to find one.
This can be mitigated with some loading/progress UI while shuffling. For the demo discard game in mental-poker-toolkit, I used a smaller deck (only cards from 9 to A) and a smaller key size (64 bits).
When implementing a game, it might be a good idea to start generating encryption keys asynchronously as soon as possible - note though that the players need to agree on a shared large prime before key generation can begin.
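A sketch of that idea: start generating the per-card keys in the background as soon as the shared prime is known, and only await the result when the shuffle needs them. slowMakeKey() below is a hypothetical stand-in for the toolkit's key generation (in practice you'd call something like keyProvider.make(), and truly CPU-bound generation would need a worker thread to avoid blocking the main thread):

```typescript
// Hypothetical stand-in for slow key generation; the real KeyProvider
// searches for large primes and is much slower than this.
function slowMakeKey(): Promise<string> {
    return new Promise((resolve) => setTimeout(() => resolve("key"), 1));
}

// Kick off generation of all per-card keys as early as possible
function pregenerateKeys(count: number): Promise<string[]> {
    return Promise.all(Array.from({ length: count }, () => slowMakeKey()));
}

const pendingKeys = pregenerateKeys(52); // started at game setup

// Later, when the shuffle actually needs the keys, just await them
async function getKeysForShuffle(): Promise<string[]> {
    return pendingKeys;
}
```

The shuffle then pays only for whatever generation work is still outstanding, rather than the full cost.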
In this post we looked at an implementation of card shuffling, built on the two shuffling steps, shuffle1() and shuffle2(). The Mental Poker Toolkit is here. Card shuffling is implemented in the primitives package, in shuffle.ts.
For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we covered actions and an async queue implementation.
In this post, we'll finally look at the infrastructure on top of which we'll model games. The type of games we're considering can all be modeled as state machines^{1}. The challenge is we need a generic enough framework that works for any game, so let's consider what they all have in common.
We can't tell what the exact states of a game are, as they depend on the specific game. But, in general, game play implies transitioning from one state to another.
In some cases, an action originates on our client. For example: we pick between rock, paper, or scissors; we want to draw a card etc. This means we need to run some logic on our client, then send an Action over our transport to other clients.
To keep things generic and unopinionated, the minimal interface for this is a function that takes an action queue and a context:
type LocalTransition<TAction extends BaseAction, TContext> = (
    actionQueue: IQueue<TAction>,
    context: TContext
) => void | Promise<void>;
We covered the queue in the previous post. We need this in a local transition because we will run some code then, in most cases, we'll want to enqueue an action and send it to other players. We'll look at an example of this later on in this post.
The context can be anything - this enables the game to pass through whatever data the function needs. Our state machine implementation doesn't care what that data is; this is just the mechanism to make it available to the code in the function.
The function can return either void or a Promise<void>, in case it needs to be async.
In other cases, an action arrives over the transport. This is an action that was sent either by another player, or by us and we receive it back from the server after it has been sequenced^{2}.
In this case, our interface is a function that takes the incoming Action and a context:
type Transition<TAction extends BaseAction, TContext> = (
    action: TAction,
    context: TContext
) => void | Promise<void>;
In this case, we don't necessarily need access to the queue, since we won't enqueue an action - rather, we're processing one. The context is, again, up to the consumer of this API.
The function similarly returns void or a Promise<void>, in case it needs to be async.
Finally, we need an abstraction over both LocalTransition and Transition so when we specify our state machine we can treat them the same way. We'll use RunnableTransition for this:
type RunnableTransition<TContext> = (
    actionQueue: IQueue<BaseAction>,
    context: TContext
) => Promise<void>;
We expect users of our library to write code in terms of local transitions (LocalTransition) and remote transitions (Transition). This type is meant to be used internally. Note we are doing some type erasure here, as we're going from a generic IQueue to an IQueue<BaseAction>. That's because we need to work with the queue in our library code, but the exact Action types depend on the game.
For local transitions, we simply pass through the actionQueue. For remote transitions, we dequeue an action and pass that. We'll see how to do this next.
We're also normalizing the return type to Promise<void>, regardless of whether the transition function originally returned void or Promise<void>.
Our state machine is implemented as a set of functions. First, we have a few factory functions. local() creates a RunnableTransition from a LocalTransition:
function local<TAction extends BaseAction, TContext>(
    transition: LocalTransition<TAction, TContext>
): RunnableTransition<TContext> {
    return async (queue: IQueue<BaseAction>, context: TContext) =>
        await Promise.resolve(transition(queue as IQueue<TAction>, context));
}
We call Promise.resolve() to get a Promise regardless of whether the given transition is a synchronous or asynchronous function.
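The trick generalizes: Promise.resolve() wraps a plain value in a resolved promise and passes an existing promise through untouched, so callers can await both uniformly. A small self-contained illustration:

```typescript
type MaybeAsync = () => void | Promise<void>;

// Awaits fn's result whether fn is synchronous or asynchronous
async function runUniformly(fn: MaybeAsync): Promise<void> {
    await Promise.resolve(fn());
}

const log: string[] = [];

async function demo(): Promise<string[]> {
    await runUniformly(() => { log.push("sync"); });
    await runUniformly(async () => { log.push("async"); });
    return log; // ["sync", "async"]
}
```

This is why the library can accept both flavors of transition function without two separate code paths.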
transition() converts a remote transition into a RunnableTransition:
function transition<TAction extends BaseAction, TContext>(
    transition: Transition<TAction, TContext>
): RunnableTransition<TContext> {
    return async (queue: IQueue<BaseAction>, context: TContext) => {
        const action = await queue.dequeue();
        await Promise.resolve(transition(action as TAction, context));
    };
}
Here, we dequeue an action, then pass it to the given transition.
In many cases, we expect multiple players to take the same action - for example, each player picks between rock, paper, or scissors. In this case, we will expect one remote action of the same type coming in from each player (including us). Most times we want to treat these actions the same way, which means we want to run the same Transition function for each. The repeat() function takes a RunnableTransition and repeats it a given number of times:
function repeat<TContext>(
    transition: RunnableTransition<TContext>,
    times: number
): RunnableTransition<TContext>[] {
    return Array(times).fill(transition);
}
This gives us an array of RunnableTransitions we can execute in sequence.
Finally, we might want to combine the output of calling local() with the output of calling repeat() into a longer sequence of RunnableTransitions we can run - the first function gives us a RunnableTransition, the second function gives us an array of RunnableTransitions. To address this, we provide sequence():
function sequence<TContext>(
    transitions: (
        | RunnableTransition<TContext>
        | RunnableTransition<TContext>[]
    )[]
): RunnableTransition<TContext>[] {
    return transitions.flat();
}
This function takes an array of RunnableTransitions, or an array of arrays, and calls flat() on this to flatten any nested arrays into a single, flat list.
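To make the shapes concrete, here is how repeat() and sequence() compose - strings stand in for RunnableTransitions so the snippet is self-contained:

```typescript
// Simplified versions of the library's helpers, with T standing in
// for RunnableTransition<TContext>
function repeat<T>(item: T, times: number): T[] {
    return Array(times).fill(item);
}

function sequence<T>(items: (T | T[])[]): T[] {
    return items.flat() as T[];
}

// One local step followed by two repeated remote steps, flattened
// into a single runnable list
const steps = sequence(["local", repeat("remote", 2)]);
console.log(steps); // ["local", "remote", "remote"]
```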
Once we have a sequence of transitions, we can run them using run():
async function run<TContext>(
    sequence: RunnableTransition<TContext>[],
    queue: IQueue<BaseAction>,
    context: TContext
) {
    for (const transition of sequence) {
        await transition(queue, context);
    }
}
We simply execute each RunnableTransition in turn.
Understandably, this has all been abstract. Let's now see how we can use these functions to model interactions.
Let's look at a simple example: key exchange. In order to secure our transport, we want each client to share a public key, then sign each subsequent message with their corresponding private key.
We looked at securing the transport layer in this post. We haven't discussed the key negotiation though.
Let's create the following protocol: as each client joins the game, they post a public key. For an N player game, each client should expect N remote transitions consisting of clients publishing public keys. Once all of these were processed, we should have all public keys for all clients and can create a SignedTransport.
Let's sketch out the state machine:
function makeKeyExchangeSequence(players: number) {
    return sm.sequence([
        sm.local(async (actionQueue: IQueue<KeyExchangeAction>, context: CryptoContext) => {
            // Post public key ...
        }),
        sm.repeat(sm.transition((action: KeyExchangeAction, context: CryptoContext) => {
            // Store incoming public key ...
        }), players)
    ]);
}
Note we create a LocalTransition in which we post our own public key, and we repeat the remote transition handling an incoming public key (remember, with Fluid we expect the server to also send us back whatever we post).
Clients can join the game at different times, so we don't know in what order the keys will come in, but, luckily, each Action has a clientId so we know whose key it is.
We'll look at the implementation of the transitions, but first let's see what the KeyExchangeAction and CryptoContext types look like:
type KeyExchangeAction = {
    clientId: ClientId;
    type: "KeyExchange";
    publicKey: Key;
};

type CryptoContext = {
    clientId: ClientId;
    me: PublicPrivateKeyPair;
    keyStore: KeyStore;
};
KeyExchange is an action consisting of a clientId and a publicKey, with the type set to "KeyExchange".
CryptoContext is the context needed by the transitions implementing the key exchange - that is, we need to know our own clientId, our public-private key pair, and we need a keyStore, which is a map of clientId to public key.
We looked at the KeyStore and the other key types in a previous blog post, but here they are again for reference:
type Key = string;

type PublicPrivateKeyPair = {
    publicKey: Key;
    privateKey: Key;
};

type KeyStore = Map<ClientId, Key>;
With these in place, let's look at the implementation of the transitions:
function makeKeyExchangeSequence(players: number) {
    return sm.sequence([
        sm.local(
            async (
                actionQueue: IQueue<KeyExchangeAction>,
                context: CryptoContext
            ) => {
                // Post public key
                await actionQueue.enqueue({
                    type: "KeyExchange",
                    clientId: context.clientId,
                    publicKey: context.me.publicKey,
                });
            }
        ),
        sm.repeat(
            sm.transition(
                (action: KeyExchangeAction, context: CryptoContext) => {
                    // This should be a KeyExchangeAction
                    if (action.type !== "KeyExchange") {
                        throw new Error("Invalid action type");
                    }

                    // Protocol expects clients to post an ID
                    if (action.clientId === undefined) {
                        throw new Error("Expected client ID");
                    }

                    // Protocol expects each client to only post once and to have a unique ID
                    if (context.keyStore.has(action.clientId)) {
                        throw new Error(
                            "Same client posted key multiple times"
                        );
                    }

                    context.keyStore.set(action.clientId, action.publicKey);
                }
            ),
            players
        ),
    ]);
}
sm stands for state machine. The functions described above live in a StateMachine namespace, aliased to sm.
Our local transition is simple: we enqueue a KeyExchangeAction, sending our clientId and publicKey from the CryptoContext.
When a remote action comes in, we perform the required validations:
- The action should be a KeyExchangeAction.
- The action should have a clientId.
- The same client should not have posted a key before.

Finally, we store the clientId and publicKey.
The end-to-end implementation for key exchange, relying on the state machine, is here:
async function makeCryptoContext(clientId: ClientId): Promise<CryptoContext> {
    return {
        clientId,
        me: await Signing.generatePublicPrivateKeyPair(),
        keyStore: new Map<ClientId, Key>(),
    };
}

async function keyExchange(
    players: number,
    clientId: ClientId,
    actionQueue: IQueue<BaseAction>
) {
    const context = await makeCryptoContext(clientId);
    const keyExchangeSequence = makeKeyExchangeSequence(players);

    await sm.run(keyExchangeSequence, actionQueue, context);

    return [context.me, context.keyStore] as const;
}
makeCryptoContext() is a helper function to initialize a CryptoContext instance - it takes a clientId, generates a public-private key pair, and initializes an empty key store.
keyExchange() calls the functions we defined previously to get a CryptoContext and the key exchange sequence, then calls the state machine's run() to execute the key exchange.
Once done, it returns the client's public-private key pair and the key store.
From a caller's perspective, the protocol handling key exchange is now abstracted away behind the keyExchange() function. The caller doesn't have to worry about the mechanics of exchanging keys; they can just call this and get back all the required data to create a SignedTransport.
As a second example, we'll sketch out the state machine for a game of rock-paper-scissors. We won't dive into all the implementation details. At a high level, here is how we play a game of rock-paper-scissors:
- Each player posts their selection (rock, paper, or scissors), encrypted.
- Once both encrypted selections are posted, each player posts the key used to encrypt their selection, revealing it.

This two-step protocol ensures players are committed to a selection and can't cheat by observing what the other player picked and picking afterwards.
The state machine for this game is:
sm.sequence([
    sm.local(async (queue, context) => {
        // Post our play action
    }),
    sm.repeat(sm.transition(async (action, context) => {
        // Both player and opponent need to post their encrypted selection
    }), 2),
    sm.local(async (queue, context) => {
        // Post our reveal action
    }),
    sm.repeat(sm.transition(async (reveal: RevealAction, context: RootStore) => {
        // Both player and opponent need to reveal their selection
    }), 2)
]);
We won't fill in the functions in this post but this gives you an idea of how we can model a more complex set of steps using our library.
In this post we looked at a state machine we can use to implement games:
- Games are modeled as sequences of local and remote transitions. Each game defines its own Action types, and has its own relevant context.
- RunnableTransition is a common type that can wrap local or remote transitions.

The Mental Poker Toolkit is here. This post covered the state-machine package; the key exchange is implemented in the primitives package.
Sequenced is a Fluid Framework term. Clients send messages to the Fluid relay service, which orders them as they come in and broadcasts them to all clients. This ensures all clients eventually see all the messages in the same order. ↩
For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we covered the transport.
As I was building up the library and looking at state machines that would run
turns in a game, I realized an async queue would come in handy. The challenge
with the raw ITransport
interface built on top of the Fluid ledger is that if
you are not the first client to join a session, you end up with a set of ops
that already exist on the ledger. You need a way to consume both the ops that
were already sequenced and new incoming ops. An async interface is also easier
to consume than callbacks.
Before diving into that though, let's talk about actions.
As a reminder, op is the Fluid Framework term for data being sent/received. In Mental Poker we use actions. All actions should be subtypes of BaseAction:
export type ClientId = string;

export type BaseAction = {
    clientId: ClientId;
    type: unknown;
};
Every action should have a clientId showing which client it came from, and a type.
For example, here's how we would model a game of Rock/Paper/Scissors:
- Each player posts their selection, encrypted.
- Once both selections are on the ledger, each player posts the key used to encrypt their selection, so the selections can be revealed.

We model the game in these two steps so regardless of which player moves first, the player choices are revealed after they have been put on the ledger. If a player would simply post their unencrypted selection, the other player might cheat by looking at it before posting their own.
I will cover the Rock/Paper/Scissors implementation in detail in a future post; for now, let's just go over the actions:
export type PlayAction = {
    clientId: ClientId;
    type: "PlayAction";
    encryptedSelection: EncryptedSelection;
};

export type RevealAction = {
    clientId: ClientId;
    type: "RevealAction";
    key: SerializedSRAKeyPair;
};

export type Action = PlayAction | RevealAction;
The two actions described above are modeled as PlayAction and RevealAction. Both of these have a clientId and a type, thus are subtypes of BaseAction. Finally, the Action type represents all possible actions in the game.
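Since each action carries a literal type, the Action union is a discriminated union, and TypeScript can narrow it by checking the type field. A self-contained sketch (the string aliases for EncryptedSelection and SerializedSRAKeyPair are simplifications for the example, not the toolkit's real types):

```typescript
// Simplified aliases so the sketch stands alone; the real types are richer
type ClientId = string;
type EncryptedSelection = string;
type SerializedSRAKeyPair = string;

type PlayAction = {
    clientId: ClientId;
    type: "PlayAction";
    encryptedSelection: EncryptedSelection;
};

type RevealAction = {
    clientId: ClientId;
    type: "RevealAction";
    key: SerializedSRAKeyPair;
};

type Action = PlayAction | RevealAction;

// Checking the literal `type` field narrows Action to the right member
function describe(action: Action): string {
    if (action.type === "PlayAction") {
        return `play from ${action.clientId}: ${action.encryptedSelection}`;
    }
    return `reveal from ${action.clientId}: ${action.key}`;
}

const msg = describe({ clientId: "c1", type: "PlayAction", encryptedSelection: "abc" });
console.log(msg); // "play from c1: abc"
```

This is why pinning type to a string literal in each action pays off: the compiler enforces that each branch only touches fields that exist on that action.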
This becomes relevant as we move higher in the stack of the Mental Poker library. Once we start encoding some of the game semantics, we require generic types to extend BaseAction. This is what happens with the async queue.
As I mentioned at the beginning of the article, queues aim to provide a nicer API over the transport. The interface is very simple:
export interface IQueue<T extends BaseAction> {
    enqueue(value: T): Promise<void>;
    dequeue(): Promise<T>;
}
For any type T extending BaseAction, we can enqueue() a value and we can dequeue() a value. Both operations are asynchronous.
I'll show the full implementation, then go over the details:
export class ActionQueue<T extends BaseAction> implements IQueue<T> {
    private queue: T[] = [];

    constructor(
        private readonly transport: ITransport<T>,
        preseed: boolean = false
    ) {
        transport.on("actionPosted", (value) => {
            this.queue.push(value);
        });

        if (preseed) {
            for (const value of transport.getActions()) {
                this.queue.push(value);
            }
        }
    }

    async enqueue(value: T) {
        await this.transport.postAction(value);
    }

    async dequeue(): Promise<T> {
        const result = this.queue.shift();

        if (result) {
            return Promise.resolve(result);
        }

        return new Promise<T>((resolve) => {
            this.transport.once("actionPosted", async () => {
                resolve(await this.dequeue());
            });
        });
    }
}
The implementation maintains an array of Ts (actions). The constructor takes a transport argument of type ITransport and a preseed flag:
constructor(
    private readonly transport: ITransport<T>,
    preseed: boolean = false
) {
    transport.on("actionPosted", (value) => {
        this.queue.push(value);
    });

    if (preseed) {
        for (const value of transport.getActions()) {
            this.queue.push(value);
        }
    }
}
/* ... */
The queue starts listening to the actionPosted event, and whenever we have an incoming value, we push it to the internal queue. If preseed is true, we also push all actions already posted to the queue.
The reason we make this optional is that we might end up using multiple queues in a game implementation, but we only want to consume the actions posted on the ledger before we joined the session once. After we are "up to speed", new incoming actions fire events which we can consume in real time. So we would usually create our first queue with preseed set to true and subsequent queues with preseed set to false.
Enqueuing a value is trivial - we leverage the transport's postAction API:
/* ... */
async enqueue(value: T) {
    await this.transport.postAction(value);
}
/* ... */
Dequeuing is a bit more interesting:
/* ... */
async dequeue(): Promise<T> {
    const result = this.queue.shift();

    if (result) {
        return Promise.resolve(result);
    }

    return new Promise<T>((resolve) => {
        this.transport.once("actionPosted", async () => {
            resolve(await this.dequeue());
        });
    });
}
/* ... */
First, we call shift() on the queue. This either returns a value, or undefined if the queue is empty.
If we do get a value, we return a resolved promise right away.
If we don't have a value, we add a one-time listener to the actionPosted event. When a new action is posted, the underlying transport will fire the event. Since event listeners are called in the order they subscribed, we are guaranteed the listener we added in the constructor fires first and adds the value to queue. We resolve the promise by recursively calling dequeue() and awaiting the response.
The reason we do this is we might have multiple callers to dequeue() holding on to promises. In this case, we don't want to resolve all of them with the incoming value, rather just the first one. The first recursive call to dequeue() should grab the value from the internal queue and return it right away, while other recursive callers would end up awaiting again until a new value comes in. There's probably a more efficient non-recursive implementation, but for our specific use case (games), we don't expect many cases where we have multiple dequeues pending.
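For the curious, here is what a non-recursive version might look like: instead of re-entering dequeue(), keep an explicit list of pending resolvers and hand each incoming value to the oldest waiter (or buffer it if nobody is waiting). This is a self-contained toy without the transport wiring, not the toolkit's implementation:

```typescript
class ResolverQueue<T> {
    private values: T[] = [];
    private waiters: ((value: T) => void)[] = [];

    // Called when a value arrives (e.g. from the transport's event)
    push(value: T) {
        const waiter = this.waiters.shift();
        if (waiter) {
            waiter(value); // oldest pending dequeue() gets it directly
        } else {
            this.values.push(value);
        }
    }

    dequeue(): Promise<T> {
        const value = this.values.shift();
        if (value !== undefined) {
            return Promise.resolve(value);
        }
        // No buffered value: park a resolver until one arrives
        return new Promise<T>((resolve) => this.waiters.push(resolve));
    }
}
```

Each incoming value wakes exactly one waiter, so multiple pending dequeue() calls resolve in FIFO order without any recursion.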
There are two main reasons for using this queue rather than relying directly on the underlying transport.
First, the underlying transport can have a set of actions (messages) that already arrived on the client (which we would retrieve with the getActions() method), and some which arrive in real time (which fire events). The queue gives us a unified way to consume both, by calling await dequeue().
Besides a unified interface, we expect multiple spots in the code to wait for an incoming action. This depends on the game implementation, but usually at different game states we expect different messages to come in. This is harder to achieve waiting for event callbacks and much easier to do via the same await dequeue() call.
In this post we looked at actions, the key building blocks of Mental Poker games, and an async queue which provides a clean abstraction over the underlying transport.
The code covered in this post is available on GitHub in the mental-poker-toolkit repo. BaseAction and the ITransport and IQueue interfaces are part of the core types package, packages/types. ActionQueue is implemented under packages/action-queue.
I always have fun with Advent of Code every December, and last year I wrote a blog post covering some of the more interesting problems I worked through. I'll continue the tradition this year.
I'll repeat my disclaimer from last time:
Disclaimer on my solutions
I use Python because I find it easiest for this type of coding. I treat solving these as a write-only exercise. I do it for the problem-solving bit, so I don't comment the code, and once I find the solution I consider it done - I don't revisit and try to optimize, even though sometimes I strongly feel like there is a better solution. I don't even share code between part 1 and part 2 - once part 1 is solved, I copy/paste the solution and change it to solve part 2, so each can be run independently. I also rarely use libraries, and when I do it's some standard ones like re, itertools, or math. The code has no comments and is littered with magic numbers and strange variable names. This is not how I usually code, rather my decadent holiday indulgence. I wasn't thinking I would end up writing a blog post discussing my solutions, so I would like to apologize for the code being hard to read.
All my solutions are on my GitHub here.
This time around, I did use GitHub Copilot, with mixed results. In general, it mostly helped with tedious work, like implementing the same thing to work in different directions - there are problems that require we do something while heading north, then same thing while heading east etc. I did also observe it produce buggy code that I had to manually edit.
I'll skip over the first few days as they tend to be very easy.
Problem statement is here.
This is an easy problem, I just want to call out a shortcut: for part 2, the exact same algorithm as in part 1 works if you first reverse the input. This was a neat discovery that saved me a bunch of work.
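A minimal sketch of that shortcut, assuming the usual difference-based extrapolation (illustrative, not the exact code from my repo):

```python
def extrapolate(seq):
    # Build successive difference rows until all zeros,
    # summing the last element of each row as we go.
    total = 0
    while any(seq):
        total += seq[-1]
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return total

# Part 1: extrapolate the next value forward.
print(extrapolate([0, 3, 6, 9, 12, 15]))            # 18
# Part 2: the same function on the reversed sequence extrapolates backward.
print(extrapolate([10, 13, 16, 21, 30, 45][::-1]))  # 5
```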
Problem statement is here.
Part 1 was again very straightforward. I found part 2 a bit more interesting, especially the fact that we can determine whether a tile is inside or outside our loop by only looking at a single row (or column). We always start outside, then scan each tile. If we hit a |, we toggle from outside to inside and vice-versa. If we hit an L or an F, we continue while we're on a - (these are all parts of our loop), and we stop on the 7 or J. If we started on L and ended on J, or started on F and ended on 7 - meaning the pipe bends and turns back the way we came - we don't change our state. On the other hand, if the pipe goes down from L to 7 or up from F to J, then we toggle outside/inside. For each non-pipe tile, if we're inside, we count it. Maybe this is obvious but it took me a bit to figure it out.
def scan_line(ln):
total, i, inside, start = 0, -1, False, None
while i < len(grid[0]) - 1:
i += 1
if (ln, i) not in visited:
if inside:
total += 1
else:
if grid[ln][i] == '|':
inside = not inside
continue
# grid[ln][i] in 'LF'
start = grid[ln][i]
i += 1
while grid[ln][i] == '-':
i += 1
if start == 'L' and grid[ln][i] == '7' or \
start == 'F' and grid[ln][i] == 'J':
inside = not inside
return total
In the code above, visited tracks pipe segments (as opposed to tiles that are not part of the pipe).
Problem statement is here.
Day 11 was easy, so not much to discuss. Use Manhattan distance for part 1 and
in part 2, just add 999999
for every row or column crossed that doesn't
contain any galaxies.
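The distance computation can be sketched like this (illustrative names; empty_rows/empty_cols are the row and column indices containing no galaxies):

```python
def distance(g1, g2, empty_rows, empty_cols, expansion=1000000):
    # Manhattan distance, plus (expansion - 1) extra for every empty
    # row or column crossed between the two galaxies.
    (r1, c1), (r2, c2) = g1, g2
    d = abs(r1 - r2) + abs(c1 - c2)
    d += sum(expansion - 1 for r in empty_rows if min(r1, r2) < r < max(r1, r2))
    d += sum(expansion - 1 for c in empty_cols if min(c1, c2) < c < max(c1, c2))
    return d

# Two galaxies crossing one empty row and one empty column:
print(distance((0, 0), (3, 3), empty_rows={1}, empty_cols={2}))  # 2000004
```

With expansion=2, the same function solves part 1.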
Problem statement is here.
Part 1 was very easy.
Part 2 was a bit harder because just trying out every combination takes forever to run. I initially tried to do something more clever around deciding when to turn a ? into # or . depending on what's around it, where we are in the sequence, etc. But ultimately it turns out just adding memoization made the combinatorial approach run very fast.
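A minimal sketch of such a memoized count (not my exact solution code; the recursion is the standard one for this problem):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count(springs, groups):
    # springs: remaining pattern; groups: remaining run lengths of '#'.
    if not groups:
        return 0 if '#' in springs else 1
    if len(springs) < sum(groups) + len(groups) - 1:
        return 0  # not enough room left for the remaining runs
    total = 0
    if springs[0] in '.?':
        total += count(springs[1:], groups)  # treat the first cell as '.'
    n = groups[0]
    # Try placing a run of n '#'s at the start: no '.' inside it,
    # and the cell right after it (if any) can't be '#'.
    if '.' not in springs[:n] and (len(springs) == n or springs[n] != '#'):
        total += count(springs[n + 1:], groups[1:])
    return total

print(count('?###????????', (3, 2, 1)))  # 10 on this sample row
```

The lru_cache on (remaining pattern, remaining groups) is what collapses the exponential search.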
Problem statement is here.
This was a very easy one, so I won't cover it.
Problem statement is here.
This was easy but part 2 was tedious, having to implement tilt functions for various directions. This is where Copilot saved me a bunch of typing.
Once we have the tilt functions, we can implement a cycle function that tilts things north, then west, then south, then east. Finally, we need a bit of math to figure out the final position: we save the state of the grid after each cycle and as soon as we find a configuration we encountered before, it means we found our cycle. Based on this, we know how many steps we have before the cycle and what the length of the cycle is, so we can compute the state after 1000000000 cycles:
pos = []
while (state := cycle()) not in pos:
pos.append(state)
lead, loop = pos.index(state), len(pos) - pos.index(state)
d = (1000000000 - lead) % loop
With this, we need to count the load of the north support beams for the grid we have at pos[lead + d - 1].
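The load count itself is simple - a sketch, assuming the grid is stored as a list of strings with 'O' for round rocks:

```python
def north_load(grid):
    # Each round rock 'O' contributes the number of rows from it
    # (inclusive) to the south edge of the grid.
    return sum(
        len(grid) - r
        for r, row in enumerate(grid)
        for cell in row
        if cell == 'O'
    )

print(north_load(["OO..", "....", ".O.."]))  # 3 + 3 + 1 = 7
```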
Problem statement is here.
Another very easy one that I won't cover.
Problem statement is here.
This one was also easy and tedious, as we have to handle the different types of reflections. Another one where Copilot saved me a lot of typing.
Problem statement is here.
This was a fairly straightforward depth-first search, where we keep a cache of how much heat loss we have up to a certain point. The one interesting complication is that we can only move forward 3 times. In the original implementation, I keyed the cache on grid coordinates + direction we're going in + how many steps we already took in that direction. This worked in reasonable time.
In part 2, we now have to move at least 4 steps in one direction and at most 10. The cache I used in part 1 doesn't work that well anymore. On the other hand, I realized that rather than keeping track of direction and how many steps we took in that direction so far, I can model this differently: we are moving either horizontally or vertically. If we're at some point and moving horizontally, we can expand our search to all destination points (from 4 to 10 away horizontally or vertically) and flip the direction. For example, if we just moved horizontally to the right, we won't move further to the right as we already covered all those cases, and we won't move back left as the crucible can't turn 180 degrees. That means the only possible directions we can take are up or down in this case, meaning since we just moved horizontally, we now have to move vertically.
This makes our cache much smaller: our key is the coordinates of the cell and the direction we were moving in. This also makes the depth-first search complete very fast.
best, end = {}, 1000000
def search(x, y, d, p):
global end
if p >= end:
return
if x == len(grid) - 1 and y == len(grid[0]) - 1:
if p < end:
end = p
return
if (x, y, d) in best and best[(x, y, d)] <= p:
return
best[(x, y, d)] = p
if d != 'H':
if x + 3 < len(grid):
pxr = p + grid[x + 1][y] + grid[x + 2][y] + grid[x + 3][y]
for i in range(4, 11):
if x + i < len(grid):
pxr += grid[x + i][y]
search(x + i, y, 'H', pxr)
if x - 3 >= 0:
pxl = p + grid[x - 1][y] + grid[x - 2][y] + grid[x - 3][y]
for i in range(4, 11):
if x - i >= 0:
pxl += grid[x - i][y]
search(x - i, y, 'H', pxl)
if d != 'V':
if y + 3 < len(grid[0]):
pyd = p + grid[x][y + 1] + grid[x][y + 2] + grid[x][y + 3]
for i in range(4, 11):
if y + i < len(grid[0]):
pyd += grid[x][y + i]
search(x, y + i, 'V', pyd)
if y - 3 >= 0:
pyu = p + grid[x][y - 1] + grid[x][y - 2] + grid[x][y - 3]
for i in range(4, 11):
if y - i >= 0:
pyu += grid[x][y - i]
search(x, y - i, 'V', pyu)
I realized this approach actually applies well to part 1 too, and retrofitted it there. The only difference is instead of expanding to the cells +4 to +10 in a direction, we expand to the cells +1 to +3.
Problem statement is here.
The first part is easy - we plot the input on a grid, then flood fill to find the area.
In the code below, digs is the input, processed into tuples of direction and number of steps:
x, y, grid = 0, 0, {(0, 0)}
for dig in digs:
match dig[0]:
case 'U':
for i in range(dig[1]):
y -= 1
grid.add((x, y))
case 'R':
for i in range(dig[1]):
x += 1
grid.add((x, y))
case 'D':
for i in range(dig[1]):
y += 1
grid.add((x, y))
case 'L':
for i in range(dig[1]):
x -= 1
grid.add((x, y))
x, y = min([x for x, _ in grid]), min([y for _, y in grid])
while (x, y) not in grid:
y += 1
queue = [(x + 1, y + 1)]
while queue:
x, y = queue.pop(0)
if (x, y) in grid:
continue
grid.add((x, y))
queue += [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
print(len(grid))
Part 2 is trickier, as the numbers are way larger and the same flood fill algorithm won't work. My approach was to divide the area into rectangles: as we process all movements, we end up with a set of (x, y) tuples of points where our line changes direction. If we sort all the x coordinates and all the y coordinates independently, we end up with a grid where we can treat each pair of subsequent xs and ys as describing a rectangle on our grid.
x, y, points = 0, 0, [(0, 0)]
for dig in digs:
match dig[0]:
case 0: x += dig[1]
case 1: y += dig[1]
case 2: x -= dig[1]
case 3: y -= dig[1]
points.append((x, y))
xs, ys = sorted({x for x, _ in points}), sorted({y for _, y in points})
Where digs above represents the input, processed as before into direction and number of steps tuples.
Now points contains all the connected points we get following the directions, which means a pair of subsequent points describes a line. Once we have this, we can start a flood fill in one of the rectangles and proceed as follows: if there is a north boundary, meaning we have a line between our top left and top right coordinates, then we don't recurse north; otherwise we go to the rectangle north of our current rectangle and repeat the algorithm there. Same for east, south, west.
Since we have to consider each point in the terrain in our area calculation, we need to be careful how we measure the boundaries of each rectangle so we don't double-count or omit points. To ensure this, my approach was that for each rectangle we count, we count an extra line north (if there is no boundary) and an extra line east (if there is no boundary). If there's neither a north nor an east boundary, then we add 1 for the north-east corner. This should ensure we don't double-count, as each rectangle only considers its north and east boundaries, and we don't miss anything, as any rectangle without a boundary will count the additional points. What remains is the perimeter of our surface, which we add at the end. The explanation might sound convoluted, but the code is very easy to understand:
queue, total, visited = [(1, 1)], 0, set()
while queue:
x, y = queue.pop(0)
e = min([i for i in xs if i > x])
s = max([i for i in ys if i < y])
w = max([i for i in xs if i < x])
n = min([i for i in ys if i > y])
if (n, e) in visited:
continue
visited.add((n, e))
total += (e - w - 1) * (n - s - 1)
found_n, found_s, found_e, found_w = False, False, False, False
for l1, l2 in zip(points, points[1:]):
if l1[1] == l2[1]:
if l1[1] == n and (l1[0] < x < l2[0] or l2[0] < x < l1[0]):
found_n = True
if l1[1] == s and (l1[0] < x < l2[0] or l2[0] < x < l1[0]):
found_s = True
elif l1[0] == l2[0]:
if l1[0] == e and (l1[1] < y < l2[1] or l2[1] < y < l1[1]):
found_e = True
if l1[0] == w and (l1[1] < y < l2[1] or l2[1] < y < l1[1]):
found_w = True
if not found_n:
total += e - w - 1
queue.append((x, n + 1))
if not found_s:
queue.append((x, s - 1))
if not found_e:
total += n - s - 1
queue.append((e + 1, y))
if not found_w:
queue.append((w - 1, y))
if not found_n and not found_e:
if (e, n) not in points:
total += 1
total += sum([dig[1] for dig in digs])
Problem statement is here.
For the first part, we can process rule by rule.
For the second part, start with bounds: (1, 4000) for all of xmas. Then at each decision point, recurse updating bounds. Whenever we hit an A, add the bounds to the list of accepted bounds. Bounds are guaranteed to never overlap, by definition.
accepts = []
def execute_workflow(workflow_key, bounds):
workflow = workflows[workflow_key]
for rule in workflow:
if rule == 'A':
accepts.append(bounds)
return
if rule == 'R':
return
if rule in workflows:
execute_workflow(rule, bounds)
return
check, next_workflow = rule.split(':')
if '<' in check:
key, val = check.split('<')
nb = bounds.copy()
nb[key] = (nb[key][0], int(val) - 1)
bounds[key] = (int(val), bounds[key][1])
elif '>' in check:
key, val = check.split('>')
nb = bounds.copy()
nb[key] = (int(val) + 1, nb[key][1])
bounds[key] = (bounds[key][0], int(val))
execute_workflow(next_workflow, nb)
execute_workflow('in', {'x': (1, 4000), 'm': (1, 4000), 'a': (1, 4000), 's': (1, 4000)})
This gives us all accepted ranges for each of x, m, a, and s.
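The answer is then the sum, over all accepted bounds, of the product of the four range widths - a sketch, assuming bounds are stored as inclusive (low, high) tuples as above:

```python
def count_combinations(accepts):
    total = 0
    for bounds in accepts:
        product = 1
        for lo, hi in bounds.values():
            product *= hi - lo + 1  # width of an inclusive range
        total += product
    return total

# One accepted region covering half of x and all of m, a, s:
print(count_combinations(
    [{'x': (1, 2000), 'm': (1, 4000), 'a': (1, 4000), 's': (1, 4000)}]
))  # 128000000000000
```

Since the accepted regions never overlap, a plain sum is all we need.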
Problem statement is here.
For the first part, we can model the various module types as classes with a
common interface and different implementations. Since one of the requirements is
to process pulses in the order they are sent, we will use a queue rather than
have objects call each other based on connections. So rather than module A directly calling connected module B when it receives a signal (which would cause out-of-order processing), module A will just queue a signal for module B, to be processed once the signals queued before it have been processed.
I won't share the code here as it is straightforward. You can find it on my GitHub.
This one was one of the most interesting problems this year. Simply simulating
button presses wouldn't work. I ended up dumping the diagram as a dependency
graph and it looks like the only module that signals rx
is a conjunction
module with multiple inputs.
Conjunction modules emit a low pulse when they remember high pulses being sent by all their connected inputs. In this case, we can simulate button presses and keep track of when each input to this conjunction module emits a high pulse. Then we compute the least common multiple of these to determine when the rx module will get a low signal.
My full solution is here, though I'm still pretty sure it is topology-dependent. Meaning we might have a different setup where the inputs to this conjunction module are not fully independent, which might make LCM not return the correct answer.
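The final computation is then a one-liner over the recorded cycle lengths - a sketch, with made-up cycle lengths for illustration:

```python
from math import lcm

# Button presses at which each input to the final conjunction module
# first emits a high pulse (values here are made up for illustration).
cycles = [3733, 3793, 3947, 4057]

# First press at which all inputs are high simultaneously,
# i.e. when rx receives its low pulse.
print(lcm(*cycles))
```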
Problem statement is here.
Part 1 is trivial, we can easily simulate 64 steps and count reachable spots.
The second part is much more tricky - this is actually the problem I spent the most time on. Since the garden is infinite, and we are looking for a very high number of steps, we can't use the same approach as in part 1 to simply simulate moves.
Let's now call a tile a repetition of the garden on our infinite grid. Say we start with the garden at (0, 0). Then as we expand beyond its bounds, we reach tiles (-1, 0), (1, 0), (0, -1), (0, 1), which are repetitions of our initial garden.
The two observations that helped here were:
In fact, after we grow beyond the first 4 surrounding tiles, it seems like the garden grows with a periodicity of the size of the garden, meaning every len(grid) steps we reach new tiles. There are a few cases to consider - north, east, south, west, diagonals.
My approach was to do a probe - simulate the first few steps and record the results.
def probe():
dx, dy = len(grid) // 2, len(grid[0]) // 2
tiles, progress = {(dx, dy)}, {(0, 0): {0: 1}}
i = 0
while len(progress) < 41:
i += 1
new_tiles = set()
for x, y in tiles:
if grid[(x - 1) % len(grid)][y % len(grid[0])] != '#':
new_tiles.add((x - 1, y))
if grid[(x + 1) % len(grid)][y % len(grid[0])] != '#':
new_tiles.add((x + 1, y))
if grid[x % len(grid)][(y - 1) % len(grid[0])] != '#':
new_tiles.add((x, y - 1))
if grid[x % len(grid)][(y + 1) % len(grid[0])] != '#':
new_tiles.add((x, y + 1))
tiles = new_tiles
for x, y in tiles:
sq_x, sq_y = x // len(grid), y // len(grid[0])
if (sq_x, sq_y) not in progress:
progress[(sq_x, sq_y)] = {}
if i not in progress[(sq_x, sq_y)]:
progress[(sq_x, sq_y)][i] = 0
progress[(sq_x, sq_y)][i] += 1
return progress
Here progress keeps track, for each tile (keyed by its (x, y) offset from (0, 0)), of how many spots are reachable at a given time. I run this until progress grows enough for the repeating pattern to show - because we start from the center of a garden but in all other tiles we enter from a side, it takes a couple of iterations for the pattern to stabilize. My guess is this probe could be smaller with some better math, but that's what I have.
With this, given a number of steps, we can reduce it using steps % len(grid) to a smaller value we can look up in our progress record. The reasoning being, if the pattern repeats, it doesn't really matter whether we are 3 steps into tile (-1000, 0) or 3 steps into tile (-3, 0).
The tedious part was determining the right offsets and special cases when computing the total number of squares. For example, even for the tiles that are fully covered, we'll have a subset of tiles in the "odd" state and a subset in the "even" state.
I ended up with the following formula (which might still be buggy, but seemed to have worked for my input):
def at(x, y, step):
return progress[(x, y)][step] if step in progress[(x, y)] else 0
def count(steps):
even, odd = (1, 0) if steps % 2 == 0 else (0, 1)
for i in range(1, steps // len(grid)):
if steps % 2 == 0:
if i % 2 == 0:
even += 4 * i
else:
odd += 4 * i
else:
if i % 2 == 0:
odd += 4 * i
else:
even += 4 * i
total = even * at(0, 0, len(grid) * 2) + odd * at(0, 0, len(grid) * 2 + 1)
total += at(-3, 0, len(grid) * 3 + steps % len(grid))
total += at(3, 0, len(grid) * 3 + steps % len(grid))
total += at(0, -3, len(grid) * 3 + steps % len(grid))
total += at(0, 3, len(grid) * 3 + steps % len(grid))
i = steps // len(grid) - 1
total += i * at(-1, -1, len(grid) * 2 + steps % len(grid))
total += i * at(-1, 1, len(grid) * 2 + steps % len(grid))
total += i * at(1, -1, len(grid) * 2 + steps % len(grid))
total += i * at(1, 1, len(grid) * 2 + steps % len(grid))
i += 1
total += i * at(-2, -1, len(grid) * 2 + steps % len(grid))
total += i * at(-2, 1, len(grid) * 2 + steps % len(grid))
total += i * at(2, -1, len(grid) * 2 + steps % len(grid))
total += i * at(2, 1, len(grid) * 2 + steps % len(grid))
return total
I'm covering all inner "even" and "odd" tiles, then the directly north, east, south, and west tiles, then two layers of diagonals. Again, I have a feeling this could be simpler, but I didn't bother to optimize it further.
Problem statement is here.
For part one, we sort bricks by z coordinate (ascending), then we make each brick fall. We do this by decrementing its z coordinate and checking whether it intersects with any other brick.
def intersect(brick1, brick2):
if brick1[0].x > brick2[1].x or brick1[1].x < brick2[0].x:
return False
if brick1[0].y > brick2[1].y or brick1[1].y < brick2[0].y:
return False
if brick1[0].z > brick2[1].z or brick1[1].z < brick2[0].z:
return False
return True
def slide_down(brick, delta):
return (Point(brick[0].x, brick[0].y, brick[0].z - delta), Point(brick[1].x, brick[1].y, brick[1].z - delta))
def fall(brick):
if min(brick[0].z, brick[1].z) == 1:
return 0
result, orig = 0, brick
while True:
brick = slide_down(brick, 1)
for b in bricks:
if b == orig:
continue
if intersect(brick, b):
return result
result += 1
if min(brick[0].z, brick[1].z) == 1:
return result
bricks = sorted(bricks, key=lambda b: min(b[0].z, b[1].z))
for i, brick in enumerate(bricks):
if delta := fall(brick):
bricks[i] = slide_down(brick, delta)
Once every brick that could fall has fallen to its final position, we need to find the critical bricks - the bricks that are the only support for some other brick. We do this by shifting each brick down again by 1 z and determining how many bricks it intersects with. If a shifted brick only intersects with one other brick, that other brick is a critical support, so we add it to our set of critical support bricks. All other bricks can be safely removed.
critical = set()
for brick in bricks:
if brick[0].z == 1 or brick[1].z == 1:
continue
supported_by = []
nb = slide_down(brick, 1)
for i, b in enumerate(bricks):
if brick == b:
continue
if intersect(nb, b):
supported_by.append(i)
if len(supported_by) == 1:
critical.add(supported_by[0])
print(len(bricks) - len(critical))
In part 2, we need to figure out which bricks each brick is supported by. We can use a similar algorithm to part 1, where we shift z by 1 and check which bricks we intersect. Then we can build a dependency graph of which bricks are supported by which other bricks.
supported_by = {}
for i, brick in enumerate(bricks):
supported_by[i] = set()
if brick[0].z == 1 or brick[1].z == 1:
continue
nb = slide_down(brick, 1)
for j, b in enumerate(bricks):
if i == j:
continue
if intersect(nb, b):
supported_by[i].add(j)
Then for each brick we remove, we can walk the supported by dependencies to determine which bricks would fall and would, in turn, cause other bricks to fall, without having to actually simulate falling.
def count_falling(i):
sup = {k: supported_by[k].copy() for k in supported_by.keys()}
queue, removed = [i], set()
while queue:
i = queue.pop(0)
if i in removed:
continue
removed.add(i)
for j in sup:
if i in sup[j]:
sup[j].remove(i)
if len(sup[j]) == 0:
queue.append(j)
return len(removed) - 1
print(sum(count_falling(i) for i in range(len(supported_by))))
Problem statement is here.
The main insight here for both part 1 and part 2 is that we can model the paths as a graph where each intersection (decision point) is a vertex and the paths between intersections are edges. With this representation, we simply need to find the longest path between our starting point and our end point.
In part 1, we have a directed graph, as right before hitting each intersection we have a ><^v constraint, making the path one-way. In part 2, we have an undirected graph.
Note that the longest path problem in a graph is harder than the shortest path problem. That said, we are dealing with extremely small graphs.
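With graphs this small, an exhaustive DFS over the condensed graph is enough - a sketch (names and the adjacency format are illustrative; graph maps each vertex to a dict of neighbor to edge length):

```python
def longest_path(graph, start, end):
    best = 0

    def dfs(node, seen, length):
        # Try every simple path; keep the longest one that reaches end.
        nonlocal best
        if node == end:
            best = max(best, length)
            return
        for neighbor, dist in graph[node].items():
            if neighbor not in seen:
                dfs(neighbor, seen | {neighbor}, length + dist)

    dfs(start, {start}, 0)
    return best

# Tiny example: two routes from 'a' to 'c'; the longer one goes through 'b'.
graph = {
    'a': {'b': 2, 'c': 5},
    'b': {'a': 2, 'c': 4},
    'c': {'a': 5, 'b': 4},
}
print(longest_path(graph, 'a', 'c'))  # 6 (a -> b -> c)
```

This is exponential in general, but the condensed graph only has a few dozen intersections.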
Problem statement is here.
Part 1 was fairly straightforward: for each pair of lines, solve the equation to find where they meet and check if within bounds (when lines are not parallel).
Since each line is described by a point \((x_{origin}, y_{origin})\) and a vector \((dx, dy)\), we can represent them as
\[\begin{cases} x = x_{origin} + dx * t \\ y = y_{origin} + dy * t \end{cases}\]
Then the lines intersect when
\[\begin{cases} x_1 + dx_1 * t_1 = x_2 + dx_2 * t_2 \\ y_1 + dy_1 * t_1 = y_2 + dy_2 * t_2 \end{cases}\]
We know all of \((x_1, y_1), (dx_1, dy_1), (x_2, y_2), (dx_2, dy_2)\) so we solve for \(t_1\) and \(t_2\).
def intersect(p1, v1, p2, v2):
if v1.dx / v1.dy == v2.dx / v2.dy:
return None, None
t2 = (v1.dx * (p2.y - p1.y) + v1.dy * (p1.x - p2.x)) / (v2.dx * v1.dy - v2.dy * v1.dx)
t1 = (p2.y + v2.dy * t2 - p1.y) / v1.dy
return t1, t2
Once we have t1 and t2, we need to check that both are positive (so the intersection didn't happen in the past), and make sure the intersection point, which is either x1 + dx1 * t1, y1 + dy1 * t1 or x2 + dx2 * t2, y2 + dy2 * t2, is within our bounds (at least 200000000000000 and at most 400000000000000). If that's the case, then we found an intersection and we can add it to the total.
Part 2 was really fun. We now have 3 dimensions, so a line is represented as
\[\begin{cases} x = x_{origin} + dx * t \\ y = y_{origin} + dy * t \\ z = z_{origin} + dz * t \end{cases}\]
We need to find a line (the trajectory of our rock) that intersects each line in our input at a different time, such that for some \(t\) and line \(l\), we have
\[\begin{cases} x_{origin_{l}} + dx_l * t = x_{origin_{rock}} + dx_{rock} * t \\ y_{origin_{l}} + dy_l * t = y_{origin_{rock}} + dy_{rock} * t \\ z_{origin_{l}} + dz_l * t = z_{origin_{rock}} + dz_{rock} * t \end{cases}\]
One way to solve this is using linear algebra. If we take 3 different hailstorms and our rock, we end up with the following set of equations:
\[\begin{cases} x_{origin_{1}} + dx_1 * t_1 = x_{origin_{rock}} + dx_{rock} * t_1 \\ y_{origin_{1}} + dy_1 * t_1 = y_{origin_{rock}} + dy_{rock} * t_1 \\ z_{origin_{1}} + dz_1 * t_1 = z_{origin_{rock}} + dz_{rock} * t_1 \\ x_{origin_{2}} + dx_2 * t_2 = x_{origin_{rock}} + dx_{rock} * t_2 \\ y_{origin_{2}} + dy_2 * t_2 = y_{origin_{rock}} + dy_{rock} * t_2 \\ z_{origin_{2}} + dz_2 * t_2 = z_{origin_{rock}} + dz_{rock} * t_2 \\ x_{origin_{3}} + dx_3 * t_3 = x_{origin_{rock}} + dx_{rock} * t_3 \\ y_{origin_{3}} + dy_3 * t_3 = y_{origin_{rock}} + dy_{rock} * t_3 \\ z_{origin_{3}} + dz_3 * t_3 = z_{origin_{rock}} + dz_{rock} * t_3 \end{cases}\]
In the above system, we know all of the starting points and vectors of the hailstorms. Our unknowns are \(t_1, t_2, t_3, x_{origin_{rock}}, y_{origin_{rock}}, z_{origin_{rock}}, dx_{rock}, dy_{rock}, dz_{rock}\). That's 9 unknowns to 9 equations, so it should be solvable.
While this approach works, I didn't want to use a numerical library to solve this (I'm trying to keep dependencies at a minimum), and implementing the math from scratch was a bit too much for me. I thought of a different approach: as long as we can find a rock trajectory that intersects the first couple of hailstorms at the right times, we most likely found our solution.
\[\begin{cases} x_{origin_{rock}} + dx_{rock} * t_1 = x_1 + dx_1 * t_1 \\ y_{origin_{rock}} + dy_{rock} * t_1 = y_1 + dy_1 * t_1 \\ x_{origin_{rock}} + dx_{rock} * t_2 = x_2 + dx_2 * t_2 \\ y_{origin_{rock}} + dy_{rock} * t_2 = y_2 + dy_2 * t_2 \end{cases}\]
If we solve this for \(t_1\) and \(t_2\), we can then easily determine \(z_{origin_{rock}}\) and \(dz_{rock}\).
In the above set of equations, we have too many unknowns: \(x_{origin_{rock}}, dx_{rock}, y_{origin_{rock}}, dy_{rock}, t_1, t_2\). We can reduce this number by trying out different values for a couple of these unknowns. While the ranges of possible values for \(x_{origin_{rock}}, y_{origin_{rock}}, t_1, t_2\) are very large, so unfeasible to cover, the \(dx_{rock}\) and \(dy_{rock}\) ranges should be small - if these values were large, our rock would quickly shoot past all the other hailstorms.
My approach was to try all possible values between -1000 and 1000 for both of these, then see if we can find \(x_{origin_{rock}}, y_{origin_{rock}}, t_1, t_2\) such that these intersect the first two hailstorms. If we do, we then find \(z_{origin_{rock}}, dz_{rock}\) (easy to find since now we know \(t_1, t_2\)). We have an additional helpful constraint: the origin coordinates of the rock need to be integers.
Then we just need to check that indeed for the given \((x_{origin_{rock}}, y_{origin_{rock}}, z_{origin_{rock}})\) and \((dx_{rock}, dy_{rock}, dz_{rock})\), for each hailstorm, there is a time \(t_i\) when they intersect.
Here is the code:
def find(rng):
for dx in range(-rng, rng):
for dy in range(-rng, rng):
x1, y1, z1 = hails[0][0]
dx1, dy1, dz1 = hails[0][1]
x2, y2, z2 = hails[1][0]
dx2, dy2, dz2 = hails[1][1]
# x + dx * t1 = x1 + dx1 * t1
# y + dy * t1 = y1 + dy1 * t1
# x + dx * t2 = x2 + dx2 * t2
# y + dy * t2 = y2 + dy2 * t2
# x = x1 + t1 * (dx1 - dx)
# t1 = (x2 - x1 + t2 * (dx2 - dx)) / (dx1 - dx)
# y = y1 + (x2 - x1 + t2 * (dx2 - dx)) * (dy1 - dy) / (dx1 - dx)
# t2 = ((y2 - y1) * (dx1 - dx) - (dy1 - dy) * (x2 - x1)) / ((dy1 - dy) * (dx2 - dx) + (dy - dy2) * (dx1 - dx))
if (dy1 - dy) * (dx2 - dx) + (dy - dy2) * (dx1 - dx) == 0:
continue
t2 = ((y2 - y1) * (dx1 - dx) - (dy1 - dy) * (x2 - x1)) / ((dy1 - dy) * (dx2 - dx) + (dy - dy2) * (dx1 - dx))
if not t2.is_integer() or t2 < 0:
continue
if (dx1 - dx) == 0:
continue
y = y1 + (x2 - x1 + t2 * (dx2 - dx)) * (dy1 - dy) / (dx1 - dx)
if not y.is_integer():
continue
t1 = (x2 - x1 + t2 * (dx2 - dx)) / (dx1 - dx)
if not t1.is_integer() or t1 < 0:
continue
x = x1 + t1 * (dx1 - dx)
# z + dz * t1 = z1 + dz1 * t1
# z + dz * t2 = z2 + dz2 * t2
# dz = (z1 + dz1 * t1 - z2 - dz2 * t2) / (t1 - t2)
# z = z1 + dz1 * t1 - dz * t1
if t1 == t2:
continue
dz = (z1 + dz1 * t1 - z2 - dz2 * t2) / (t1 - t2)
if not dz.is_integer():
continue
z = z1 + dz1 * t1 - dz * t1
In the above, x, y, z, dx, dy, dz are the rock's origin and vector.
The final step (omitted from the code sample for brevity), is to confirm that for the given origin and vector, we end up eventually intersecting all other hailstorms.
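That verification step looks roughly like this (a sketch, not the exact code from my repo; hails is the list of ((x, y, z), (dx, dy, dz)) tuples as before - for the real input's huge coordinates, exact rational arithmetic via fractions.Fraction would be safer than floats):

```python
def verify(rock_pos, rock_vel, hails):
    for h_pos, h_vel in hails:
        # Find some axis where the velocities differ to solve for t.
        for p, v, hp, hv in zip(rock_pos, rock_vel, h_pos, h_vel):
            if v != hv:
                t = (hp - p) / (v - hv)
                break
        else:
            # Same velocity on every axis: they meet only if they coincide.
            if rock_pos != h_pos:
                return False
            continue
        if t < 0:
            return False  # would have intersected in the past
        # The candidate t must match on every axis.
        if any(p + v * t != hp + hv * t
               for p, v, hp, hv in zip(rock_pos, rock_vel, h_pos, h_vel)):
            return False
    return True

# Rock at (1, 1, 1) moving (1, 1, 1) hits both of these hailstorms at t = 4:
print(verify((1, 1, 1), (1, 1, 1),
             [((5, 5, 5), (0, 0, 0)), ((9, 1, 9), (-1, 1, -1))]))  # True
```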
I really enjoyed this problem as it made me work through the math.
Problem statement is here.
I liked this problem. It turned out to be a variation of the minimum cut problem. Trying out all possible permutations of nodes would take way too much time. The algorithm I used keeps track of a set of visited nodes - one of the two components. Then at each step, we add a new node to this set by selecting the most connected node to this component (meaning the node that has most edges incoming from visited nodes).
most_connected() determines which node we want to pick next:
def most_connected(visited):
best_n, best_d = None, 0
for n in graph:
if n in visited:
continue
neighbors = sum(1 for v in graph[n] if v in visited)
if neighbors > best_d:
best_n, best_d = n, neighbors
return best_n
Then we keep going until our component has exactly 3 outgoing edges to nodes that haven't been visited yet:
def find_components():
start = list(graph.keys())[0]
visited = {start}
while len(visited) < len(graph):
total = 0
for n in visited:
total += sum(1 for v in graph[n] if v not in visited)
if total == 3:
return visited
n = most_connected(visited)
visited.add(n)
That's where we need to make the cut. We just need to multiply len(visited) by len(graph) - len(visited) to find our answer.
I personally found the most difficult problems to be part 2 of days 20, 21, and 24, and the one and only part of day 25. All of these took me a bit to figure out. That said, Advent of Code is always a nice holiday pastime and I can't wait for the 2024 iteration.
I spent the past few years building a platform for Loop components within the Microsoft 365 ecosystem. While some of the learnings might only apply to our particular scenario, I think some observations apply broadly.
We've been using 1P/2P/3P to mean our team (1P), other teams within Microsoft (2P), and external developers (3P). Loop started with a set of 1P components and we set out to extract a developer platform out of these that can be leveraged by other teams. We currently have a set of 2P components built on our platform, and a 3P developer story centered around Adaptive Cards.
In this blog post I'll cover some of my learnings with regard to platform development.
Aspirationally, we set out with the stated goal of 1P equals 3P, meaning 3rd party developers should be building on the same platform as 1st party developers. Looking at it another way, if the platform is good enough for 1st party, it should be just as good for 3rd party - this is a statement of platform capabilities and maturity and a lofty goal.
That said, I don't think this is realistic, especially within a product like Office, where user experience is paramount. That is because we have two audiences to consider: we have the developer audience - users building on our platform, and we have Office users, people who get to use the end product. Mediating between the two is quite a challenge.
A simple example is the classic performance/security tradeoff. Especially as Loop components are embedded in other applications, what level of isolation do we provide? Loop components are built with web technology. An iframe provides great isolation (best security) but iframes add performance overhead (worse perf). If we host a Loop component without an iframe, we get better performance, but we open up the whole DOM to the component. If we threat model this, we immediately see that we don't necessarily need isolation for Loop components developed within Microsoft (we don't expect our partner teams to write malicious code) but we absolutely need to isolate code written by 3rd party developers. Of course, we could say "just isolate everything", which might even have other advantages, but do we want to take the perf hit? Our other audience, people who use our product, would be negatively impacted by an overhead we can technically avoid.
Another example in the same vein: overall user experience. The more we make Loop components feel like part of the hosting app, the smoother the end user experience is. On the other hand, we can't realistically test every single Loop component built by any 3rd party developer. The way Office services and products are deployed and administered, tenant admins can configure which 3rd party extensions are enabled within the tenant. The Microsoft tenant we use internally has some set of extensions available, but not all. That means there are always 3rd party extensions we never even see. Now if one of these extensions doesn't work properly (errors out, looks out of place, is slow etc.), end users might end up dissatisfied with the overall experience of using Office products. For internally developed components, we get to dogfood and keep a high bar, but this doesn't scale to a wide developer audience. Our current approach is to offer 3rd party development via Adaptive Cards. This way, we don't run 3rd party code on clients and we have a set of consistent UI controls. Ideally, we'd like to enable custom code, but at the time of writing we're still thinking through the best approach considering all of the challenges listed above.
Finally, I think another key difference is the product goals. The platform audience are the developers, but the product audience are the users. There's usually a tension between these. For example, an internal team builds a Loop component and comes up with a requirement that is a "must" to deliver their scenario. In one case, a component developed by a partner team asked us to check the tenant's Cloud Policy service to see whether the component should be on or off. This makes perfect sense in that case, since the backing service might not be running in the tenant. But we offer tenant admins a different way to control 3rd party extensions, so this platform capability would not make sense for a 3rd party. In general, a lot of our internal platform capability requests come from the desire to provide the best possible end user experience. If our only customer were the developers using the platform, we would probably say "no" to some of these - not general enough, doesn't benefit 3rd party etc. But, of course, Office has way more users than developers.
I think the 1P/3P challenge is common to most platforms built from within product teams (or supporting product teams within the same company). With Loop, this is compounded by the fact we are deeply integrated within other applications. I can think of some notable examples when the strong push for a "1P equals 3P" platform ended up disastrously - Windows Longhorn was supposed to be built on a version of .NET that was just not good enough for core OS pieces. I can also think of many platforms that provide sufficient capabilities for 3rd party developers but 1st/2nd party developers don't use. And I think this is OK - building a platform for 3P lets you focus on the developer community needs. Supporting 1P/2P might be best served by focusing on the product goals and unique scenario needs rather than trying to generalize to a public platform.
A platform goes through several life stages, each with its own characteristics and challenges. Looking back at how our platform evolved (and how I foresee the future), a successful platform goes through 4 life stages: incubation, 0 to 1, stabilization, and commoditization.
At this stage, it's all one team building both the what-will-become-a-platform and the product supported by this platform. During the incubation stage, the platform doesn't really have any users (meaning developers leveraging the platform). We are free to toy with ideas. If we want to make a breaking change to an API, we can easily do it and fix the handful of internal calls. At this point, everything is in flux - the canvas is blank and we have plenty of room to innovate.
On the other hand, we don't really have a clear idea of what developers would need out of the platform - we know what the main scenario we are supporting needs, but we don't have a feedback loop yet. At this stage, we need to rely on experience and intuition to set some initial direction.
This is the biggest growth stage. "0 to 1" is a nod to Peter Thiel's Zero to One book. The platform goes from no users to a few users - and by "users" here I mean developers. Taking the platform from 0 (or incubation) to 1 means supporting a handful of "serious" production scenarios.
We now have a feedback loop and developers able to give us requirements - we can now understand their needs rather than have to divine them ourselves. As a side note, this is the approach we took with Loop, where we worked closely with a set of 2P partners to light up scenarios and grow the platform to support these.
At this stage, it's already difficult to make breaking changes. Since there is already a set of dependencies on the platform, a breaking change requires a lot of coordination. Or some form of backwards compatibility. Or legacy support. There are different ways to go about this (maybe in another blog post), but the key point is we can no longer churn as fast as we could during the incubation stage. And added costs at the 0 to 1 stage are painful.
Another challenge is generalization. We have a handful of partners with a handful of requests for the platform. And we're in the growth stage, so we most likely need to move fast. There's a big tension between how fast we can light up new platform capabilities and how much time we spend thinking through design patterns and future-proofing. If we just say "yes" to every ask, we can move fast but risk ending up with a very gnarly platform that has many one-off pieces and a very inconsistent developer story. On the other hand, we can spend a lot of time iterating on design and predicting how an incoming requirement would scale when the platform is large, all the way until our partners give up on us or funding runs out. There is no silver bullet for this - you always end up somewhere in the middle, with parts of the platform that you wished were done differently, but hopefully still alive and kicking in the next stage.
At this point, enough developers depend on the platform that ad-hoc breaking changes are no longer possible. By "stabilization" I don't mean the platform stops growing - in fact, this is the stage where we get most feedback and requests. But while the platform continues to grow incrementally, changes become even more difficult as they can break the whole ecosystem.
There are now enough users that early design decisions that proved wrong become obvious, but it's too late to change them. This is a natural "if I knew then what I know now" point for any platform that can't really be avoided.
This is the point where most platforms start producing new major version numbers that aim to address large swaths of issues and add new bundles of functionality. But while during the incubation stage a change could land in a few days, and in the 0 to 1 stage maybe weeks or at most months, breaking changes at this stage take years to land - many developers means not all of them are ready right away to update their code to the newest patterns. The platform needs some form of long-term support for older versions, and deprecation/removal becomes a long journey.
On the other hand, the core of the platform is stable by now and battle-tested. The final step is the platform becoming a commodity.
At this stage, the platform is mature and robust. A large developer community depends on it and the platform is mostly feature complete. Some new requirements might pop up from time to time, but not very often.
At this stage developers rely on existing behaviors and change is next to impossible. That's because a lot of the developer solutions are also "done" by now and people moved on. Nobody wants to go back and update things to support API changes. The platform is a useful commodity.
This is also the stage where active development slows down and fewer engineers are required to keep things going. We haven't reached this stage with Loop; we are still growing the platform and moving fast. But any successful platform should reach this stage - a low-churn state where its capabilities (and gotchas) are well understood and reliable.
Each of the stages requires a different approach to evolving the platform. The speed with which we add capabilities, churn, how updates are rolled out, how we design new features - all happen in different ways and at a different pace depending on where the platform is and its number of users.
In this post I covered two main aspects of platform development: the tension between supporting 3rd party developers and ensuring end users have the best possible experience; and the different stages of a platform. As usage increases, changes become more difficult and early decisions solidify, for better or worse.
If I look at other platforms, I can easily see how they went through the same growing pains and challenges.
I'll probably have more to write on the topic of platform development, since this has been my main job for a while now.
Now that my LLM book is done, I can get back to the Mental Poker series. A high-level overview can be found here.

In the previous posts we covered cryptography and a Fluid append-only list data structure. We'll be using the append-only list (we called this fluid-ledger) to model games.

An append-only list should be all that is needed to model turn-based games: each turn is an element added to the list. In this post, we'll stitch things together and look at the transport layer for our games.
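To make "each turn is an element added to the list" concrete, here is a small sketch. The Move type and the tic-tac-toe game are my own illustration, not part of the toolkit: replaying the list of turns from the start reconstructs the same game state on every client.

```typescript
// Illustrative only: tic-tac-toe turns stored as append-only list elements
type Move = { player: "X" | "O"; cell: number }; // cell index: 0..8

// Replaying the full list of turns reconstructs the board deterministically
function replay(moves: Iterable<Move>): string[] {
    const board = Array(9).fill(" ");
    for (const move of moves) {
        board[move.cell] = move.player;
    }
    return board;
}

const turns: Move[] = [
    { player: "X", cell: 4 },
    { player: "O", cell: 0 },
    { player: "X", cell: 8 },
];

const board = replay(turns);
console.log(board[4], board[0], board[8]); // X O X
```

Since every client sees the same append-only list, every client derives the same board; the transport below is what delivers that list.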
Our basic transport interface is very simple:
declare interface ITransport<T> {
getActions(): IterableIterator<T>;
postAction(value: T): Promise<void>;
once(event: "actionPosted", listener: (value: T) => void): this;
on(event: "actionPosted", listener: (value: T) => void): this;
off(event: "actionPosted", listener: (value: T) => void): this;
}
For some type T, we have:

- getActions(), which returns an iterator over all values (of type T) posted so far.
- postAction(), which takes a value of type T and posts it.
- an actionPosted event, which fires whenever any of the clients posts an action (this relies on the Fluid data synchronization).
- once(), on(), and off(), the standard EventEmitter methods for subscribing to the event.

We'll cover why we call these values actions in a future post.
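As an aside, nothing about this interface requires Fluid. A trivial in-memory implementation (my own sketch, handy for unit tests, not part of the toolkit) could look like this:

```typescript
import { EventEmitter } from "events";

interface ITransport<T> {
    getActions(): IterableIterator<T>;
    postAction(value: T): Promise<void>;
    once(event: "actionPosted", listener: (value: T) => void): this;
    on(event: "actionPosted", listener: (value: T) => void): this;
    off(event: "actionPosted", listener: (value: T) => void): this;
}

// In-memory transport: stores actions in an array and fires actionPosted
// locally. Unlike the Fluid-based implementation, nothing is synchronized
// across clients - this only simulates the single-client behavior.
class LocalTransport<T> extends EventEmitter implements ITransport<T> {
    private readonly actions: T[] = [];

    *getActions() {
        yield* this.actions;
    }

    postAction(value: T): Promise<void> {
        this.actions.push(value);
        this.emit("actionPosted", value);
        return Promise.resolve();
    }
}
```

Code written against ITransport doesn't care which implementation it gets, which is what makes the decorator we'll build later in this post possible.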
The basic implementation of this on top of the fluid-ledger
distributed data
structure looks like this:
class FluidTransport<T> extends EventEmitter implements ITransport<T> {
constructor(private readonly ledger: ILedger<string>) {
super();
ledger.on("append", (value) => {
this.emit("actionPosted", JSON.parse(value) as T);
});
}
*getActions() {
for (const value of this.ledger.get()) {
yield JSON.parse(value) as T;
}
}
postAction(value: T) {
return Promise.resolve(this.ledger.append(JSON.stringify(value)));
}
}
The constructor takes an ILedger<string>
(this is the interface we looked at
in the previous post).
It hooks up an event listener to the ledger's append
event to in turn trigger
an actionPosted
event. We also convert the incoming value from string
to T
using JSON.parse()
.
Similarly, getActions()
is a simple wrapper over the underlying ledger, doing
the same conversion to T
.
Finally, the postAction()
does the reverse - it converts from T
to a string
and appends the value to the ledger.
With this in place, we abstracted away the Fluid-based transport details. We
will separately set up a Fluid container and establish connection to other
clients (in a future post), then take the ILedger
instance, pass it to
FluidTransport
, and we are good to go.
We can model games on top of just these two primitives: postAction()
and
actionPosted
. Whenever we take a turn, we call postAction()
. Whenever any
player takes a turn, the actionPosted
event is fired.
Since we're designing Mental Poker, which takes place in a zero-trust environment, let's make sure our transport is secure.

Signature verification allows us to ensure that in a multiplayer game, players can't spoof each other, meaning Alice can't pretend she is Bob and post an action on Bob's behalf for other clients to misinterpret.

Note in a 2-player game this is not strictly needed if we trust the channel: we know that if a payload was not sent by us, it was sent by the other player. But in games with more players, we need to protect against spoofing. Signatures are also useful in case we don't trust the channel - maybe it's supposed to be a 2-player game but a third client gets access to the channel and starts sending messages.
We will implement this using public key cryptography. The way this works is each player generates (locally) a public/private key pair. They broadcast the public key to all other players. Then they can sign any message they send with their private key and other players can validate the signature using the public key. Nobody else can sign on their behalf, since the private key is kept private.
I won't go into deeper detail here, since this is very standard public key cryptography. In fact, I didn't even cover this in the blog post covering cryptography for Mental Poker for this reason. There, I focused on the commutative SRA encryption algorithm. Unlike SRA, which we had to implement by hand, signature verification is part of the standard Web Crypto API. Let's implement signature verification on top of this.
First, we need to model a public/private key pair:
// Keys are represented as strings
export type Key = string;
// Public/private key pair
export type PublicPrivateKeyPair = {
publicKey: Key;
privateKey: Key;
};
A key is a string. We model the key pair as PublicPrivateKeyPair, a type containing two keys. Here's how we generate the key pair using the Web Crypto API:
import { encode, decode } from "base64-arraybuffer";
async function generatePublicPrivateKeyPair(): Promise<PublicPrivateKeyPair> {
const subtle = crypto.subtle;
    const keys = await subtle.generateKey(
        {
            name: "RSA-PSS",
            modulusLength: 4096,
            publicExponent: new Uint8Array([1, 0, 1]),
            hash: "SHA-256",
        },
        true,
        ["sign", "verify"]
    );
return {
publicKey: encode(await subtle.exportKey("spki", keys.publicKey)),
privateKey: encode(
await subtle.exportKey("pkcs8", keys.privateKey)
),
};
}
We use subtle
to generate our key pair and return both public and private keys
as base64-encoded strings.
We can similarly rely on subtle
for signing. The following function takes a
string payload and signs it with the given private key. The response is the
base64-encoded signature.
async function sign(
payload: string,
privateKey: Key
): Promise<string> {
const subtle = crypto.subtle;
const pk = await subtle.importKey(
"pkcs8",
decode(privateKey),
{ name: "RSA-PSS", hash: "SHA-256" },
true,
["sign"]
);
return encode(
await subtle.sign(
{ name: "RSA-PSS", saltLength: 256 },
pk,
decode(payload)
)
);
}
First, we import the given privateKey
, then we call subtle.sign()
to sign
the base64-decoded payload
. We re-encode the signature to base64 and return it
as a string.
Finally, this is how we verify signatures:
async function verifySignature(
payload: string,
signature: string,
publicKey: Key
): Promise<boolean> {
const subtle = crypto.subtle;
const pk = await subtle.importKey(
"spki",
decode(publicKey),
{ name: "RSA-PSS", hash: "SHA-256" },
true,
["verify"]
);
return subtle.verify(
{ name: "RSA-PSS", saltLength: 256 },
pk,
decode(signature),
decode(payload)
);
}
Here, we import the given publicKey
, then we use subtle.verify()
. For
signature verification, we pass in a signature
and the payload
that was
signed (decoded from base64). This API returns true
if the signature matches,
meaning it was indeed signed with the private key corresponding to the public
key we provided.
Again, I won't go deep into the subtle APIs as they are standard and very well documented. The main takeaway is now we have 3 APIs:

- generatePublicPrivateKeyPair() to generate key pairs.
- sign() to sign a payload.
- verifySignature() to validate a signature.

We'll put these in the Signing namespace.
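To see these working together end to end, here's a quick round-trip sketch. This is my simplified version: it skips the base64 export/import step from the functions above, uses Node's webcrypto directly (in a browser, the global crypto.subtle works the same), and uses a 2048-bit modulus with a 32-byte salt to keep the demo fast.

```typescript
import { webcrypto } from "crypto";

async function roundTrip(): Promise<[boolean, boolean]> {
    const subtle = webcrypto.subtle;

    // Generate an RSA-PSS signing key pair
    const keys = await subtle.generateKey(
        {
            name: "RSA-PSS",
            modulusLength: 2048,
            publicExponent: new Uint8Array([1, 0, 1]),
            hash: "SHA-256",
        },
        true,
        ["sign", "verify"]
    );

    // Sign a JSON payload with the private key
    const payload = new TextEncoder().encode(JSON.stringify({ action: "draw" }));
    const signature = await subtle.sign(
        { name: "RSA-PSS", saltLength: 32 },
        keys.privateKey,
        payload
    );

    // Verification succeeds with the matching public key...
    const ok = await subtle.verify(
        { name: "RSA-PSS", saltLength: 32 },
        keys.publicKey,
        signature,
        payload
    );

    // ...and fails if the payload was tampered with
    const tampered = new TextEncoder().encode(JSON.stringify({ action: "fold" }));
    const bad = await subtle.verify(
        { name: "RSA-PSS", saltLength: 32 },
        keys.publicKey,
        signature,
        tampered
    );

    return [ok, bad];
}

roundTrip().then(([ok, bad]) => console.log(ok, bad)); // true false
```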
Now let's layer this cryptography over our FluidTransport.
Now that we have our Fluid-based implementation of the ITransport interface and signature verification functions, we'll provide another implementation of this interface that handles signature verification.
First, we need a generic Signed
type:
type ClientId = string;
type Signed<T> = T & { clientId?: ClientId; signature?: string };
This takes any type T and extends it with an optional clientId and signature. We'll represent client IDs as strings.
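For example, a hypothetical game action (the PlayCard type and the literal values are made up for illustration) looks like this before and after decoration:

```typescript
type ClientId = string;
type Signed<T> = T & { clientId?: ClientId; signature?: string };

// A hypothetical game action
type PlayCard = { play: number };

// During the key exchange, actions travel unsigned...
const unsigned: Signed<PlayCard> = { play: 7 };

// ...afterwards, every action carries the sender's ID and a signature
const signed: Signed<PlayCard> = {
    play: 7,
    clientId: "alice",
    signature: "<base64-encoded signature>",
};

console.log(signed.clientId); // alice
```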
Now we can decorate any payload in our transport with these optional clientId and signature, which we can then validate using the functions we just implemented. The reason these are optional is that there are states when signing is unavailable: during the key exchange steps, no message can be signed, since no client knows the public key of any other client yet. Once keys are exchanged, all subsequent messages should be signed, and we'll enforce that in SignedTransport.
We also need a KeyStore. This keeps track of which public key belongs to each client, to help with our signature verification (meaning we keep track of which public key is Alice's, which one is Bob's, and when we get a message from Alice we know which key to use to verify authenticity).
type KeyStore = Map<ClientId, Key>;
We also need a ClientKey
type, representing a single client ID/private key
pair:
export type ClientKey = { clientId: ClientId; privateKey: Key };
With these additional type definitions in place, we can start building our SignedTransport<T>. This is a decorator that takes an ITransport<Signed<T>>. We'll first look at the constructor:
class SignedTransport<T> extends EventEmitter implements ITransport<T> {
constructor(
private readonly transport: ITransport<Signed<T>>,
private readonly clientKey: ClientKey,
private readonly keyStore: KeyStore
) {
super();
transport.on("actionPosted", async (value) => {
this.emit("actionPosted", await this.verifySignature(value));
});
}
/* ... */
This new class has 3 private properties. Let's discuss them in turn.
transport is our underlying ITransport<Signed<T>>. The idea is we can instantiate a FluidTransport (or another transport if needed, though for this project I have no plans to use a transport other than Fluid), then pass it in the constructor here. Then SignedTransport will use the provided instance for postAction() and actionPosted, simply adding signature verification over it.
The clientKey should be this client's ID and private key. This class is not concerned with key generation, just signing and verification, so we'll have to generate the key pair somewhere else and pass it in. We'll use this to sign our outgoing payloads.
We also pass in a keyStore
. This should have the client ID to public key
mapping for all players in the game. We use this to figure out which public key
to use to validate each posted action.
getActions()
simply calls the underlying transport - we are not doing
signature verification on existing messages, since they were likely sent before
the signed transport was created and cannot be verified.
*getActions() {
for (const value of this.transport.getActions()) {
yield value;
}
}
We only validate incoming actions.
The constructor body hooks up the actionPosted event to the transport's actionPosted. So whenever the underlying transport fires the event, the SignedTransport will also fire an actionPosted event. But instead of just passing value through, we call verifySignature() on the value first.
Let's look at verifySignature next (this is also part of the SignedTransport class):
private async verifySignature(value: Signed<T>): Promise<T> {
if (!value.clientId || !value.signature) {
throw Error("Message missing signature");
}
// Remove signature and client ID from object and store them
const clientId = value.clientId;
const signature = value.signature;
delete value.clientId;
delete value.signature;
// Figure out which public key we need to use
const publicKey = this.keyStore.get(clientId);
if (!publicKey) {
throw Error(`No public key available for client ${clientId}`);
}
if (
!(await Signing.verifySignature(
JSON.stringify(value),
signature,
publicKey
))
) {
throw new Error("Signature validation failed");
}
return value;
}
/* ... */
Since value is a Signed<T>, we should have a clientId and a signature. We throw an exception if we can't find them.
Next, we clean up value and remove the clientId and signature from the object. As we return this to other layers in our stack, they no longer need this as we're handling signature verification here.
We then try to retrieve the public key of the client from the keyStore. We again throw in case we don't have the key.
We use the verifySignature() function we implemented earlier to ensure the signature is valid. We throw if not.
At this point, we guaranteed that the payload is coming from the client claiming to have sent it. If Alice tries to forge a message and pretend it's coming from Bob, she wouldn't be able to produce a valid Bob signature (since only Bob has access to his private key). Such a message would not make it past this function.
If no exceptions were thrown, this function returns value (with the signature cleaned up), ready to be processed by other layers.
Let's now look at adding signatures to postAction(). signAction() is another private class member handling signing:
private async signAction(value: T): Promise<Signed<T>> {
const signature = await Signing.sign(
JSON.stringify(value),
this.clientKey.privateKey
);
return {
...value,
clientId: this.clientKey.clientId,
signature: signature,
};
}
/* ... */
We call the sign() function we implemented earlier in this post, passing it the stringified value and our client's private key. We then extend value with the corresponding clientId and signature.
The postAction() implementation uses this function for signing before calling the underlying transport's postAction():
    async postAction(value: T) {
        return this.transport.postAction(await this.signAction(value));
    }
We now have the full implementation of SignedTransport.
We started with a simple FluidTransport
that uses a fluid-ledger
to
implement the postAction()
function and actionPosted
event, which we need
for modeling turn-based games.
Next, we looked at signing and signature verification using subtle
.
Finally, we implemented SignedTransport, a decorator over another transport that adds signing and signature verification.
The idea is we start with a FluidTransport
and perform a key exchange, where
each client generates a public/private key pair and broadcasts their ID and
public key. Clients store all these in a KeyStore
. Once the key exchange is
done, we can initialize a SignedTransport
that wraps the original
FluidTransport
and transparently handles signatures.
At this point we have all the pieces in place to start looking at semantics: we can exchange data between clients, we can authenticate exchanged messages, and we have the cryptography primitives for Mental Poker (commutative encryption). In the next post we'll look at a state machine that we can use to implement game semantics.
The code covered in this post is available on GitHub in the mental-poker-toolkit
repo.
FluidTransport
is implemented under packages/fluid-transport
,
SignedTransport
is under packages/signed-transport
,
and the signing functions can be found in packages/cryptography/src/signing.ts
.
Note: Since writing this post, the code was refactored so SignedTransport doesn't take a direct dependency on the cryptography package; instead, signing and signature verification are passed in as an ISignatureProvider interface.
Keeping with tradition, I'm writing the RTM post for Large Language Models at Work. The book is done. Now available on Kindle.
I decided not to contact a publisher this time around, for a couple of reasons: first, I didn't want the pressure of a contract and timelines (though looking back, I did finish this book faster than the previous two); second, I had no idea whether I would be able to write something that would still be valuable by the time the book was done, considering the speed of innovation. More on this later.
I authored the book in the open, at https://vladris.com/llm-book/ and self-published on Kindle. Maybe I will look into making it a print book at some point, for now I'm keeping it digital.
Amazon offers a nice set of tools to import and format ebooks, but they have some big limitations - for example, no support for formatting tables, footnotes etc. I also couldn't convince the tool that the code samples should be monospace on import, so I had to manually re-set the font on each. The book has a few formatting glitches because of these limitations, which make me reluctant to look into a print book, as I expect I would need to do a lot more manual tweaking for the text to look good in print.
I mused about this in chapter 10: Closing Thoughts. I'll repeat it here as it perfectly highlights why it is impossible to pin down this strange new world of AI.
I started writing the book in April 2023. When I picked up the project, GPT-4 was in private preview, with GPT-3.5 being the most powerful globally available model offered by OpenAI. Since then, GPT-4 opened to the public.
In June, OpenAI announced Functions - fortunately, this happened just before I
started working on chapter 6, Interacting with External Systems. Before
Functions, the way to get a large language model to connect with native code was
through few-shot learning in the prompt, covered in the Non-native functions
section. Originally, I was planning to focus exclusively on this implementation.
Of course, built-in support makes it easier to specify available functions and
the model interaction is likely to work better - since the model has been
specifically trained to understand
function definitions and output correct
function calls.
In August, OpenAI announced fine-tuning support for gpt-3.5-turbo. When I was writing the first draft of chapter 4, Learning and Tuning, the only models that supported fine-tuning were the older GPT-3 generation models: Ada, Babbage, Curie, and Davinci. This was particularly annoying, as the quality of output produced by these models is way below gpt-3.5-turbo levels. Now, with the newer models having fine-tuning support, I had to rewrite the Fine-tuning section.
text-davinci-003 launched in November of 2022, while gpt-3.5-turbo launched on March 1st, 2023. When I started writing the book, text-davinci-003 was backing most large language model-based solutions across the industry, and migrations to the newer gpt-3.5-turbo were underway. text-davinci-003 is deprecated, to be removed on January 4, 2024 (to be replaced by gpt-3.5-turbo-instruct), and the industry is moving to adopt GPT-4. I had to update several code samples from text-davinci-003 to gpt-3.5-turbo-instruct.
No idea how long the code samples will keep working or when OpenAI will decide
to deprecate gpt-3.5-turbo
or introduce an even more powerful model with
capabilities not covered in the book.
While some of the code examples will not age well as new models and APIs get released, the underlying principles of working with large language models that I walked through in this book - prompt engineering, memory, interacting with external systems, planning, and so on - will be relevant for a while. Understanding these fundamentals should help anyone ramp up in the space.
This is an exciting new field that is going to see a lot more innovation in the near future. But I expect some of these fundamentals to carry on, in one shape or another. I hope the topics discussed in this book remain interesting long after the specific models used in the examples become obsolete.
Like with my previous books, I've been publishing excerpts as shorter, stand-alone reads. This might sound a bit strange in this case, as the book is already all online. But I figured it will hopefully help reach more people, and I did some work on each excerpt to remove references to other parts of the book so they can, indeed, be read without context. I published all of these on Medium:
I hope you enjoy the book! Check it out here: Large Language Models at Work.
I recently announced I'm working on a new book about large language models and how to integrate them in software systems. As I'm writing this, the first 3 chapters are live at https://vladris.com/llm-book.
The remaining chapters are in the works and I will upload them as I work through the manuscript. In the meantime, since I announced my previous books with a blog post each (Programming with Types, Azure Data Engineering), I'll keep the tradition and talk a bit about the current book.
When embarking on a writing project, it's good to have a plan. Of course, the details change as the book gets written, but starting with a clear articulation of what the book is about, who is the target reader, the list of chapters and an outline helps. Here is the book plan I wrote a few months ago:
This book is aimed at software engineers wanting to learn about how they can integrate LLMs into their software systems. It covers all the necessary domain concepts and comes with simple code samples. A good way to frame this is the book covers the same layer of the stack that frameworks like Semantic Kernel and LangChain are trying to provide.
No prior AI knowledge required to understand this book, just basic programming.
After reading the book, one should have a solid understanding of all the required pieces to build an LLM-powered solution and the various things to keep in mind (like non-determinism, AI safety & security etc.).
Your feedback is very much welcomed! Do leave comments if you have any thoughts.
Building with Large Language Models
A book about integrating LLMs in software systems and the various aspects software developers need to know (prompt engineering, memory & embeddings, connecting with external systems etc.). Simple code examples in Python, using the OpenAI API.
A New Paradigm
An introduction, describing how LLMs are being integrated in software solutions and the new design patterns emerging.
1.1. Who this book is for
The pitch for the book, who should read it, what they will get out of it, what to expect.
1.2. Taking the world by storm
Briefly talk about the major innovations since the launch of ChatGPT.
1.3. New software architectures for a new world
Talk about the new architectures that embed LLMs into broader software systems and frameworks being built to address this.
1.4. Using OpenAI
The book uses plenty of code examples in Python and using OpenAI. This section introduces OpenAI and setup steps for the reader.
1.5. In this book
Preview of the topics covered throughout the rest of the book.
Large Language Models
This chapter introduces large language models, the OpenAI offering, key concepts, and API parameters. Code examples will include the first "hello world" API calls.
2.1. Large language models
Describes large language models and key ways in which they differ from other software components (train once, prompt many times; non-deterministic; no memory of prior interactions etc.).
2.2. OpenAI models
Describes the OpenAI model families, with a double-click on GPT-3.5 models (though by the time this book is done I'm sure GPT-4 will be out of beta). Examples in the book will start with text-davinci-003 (simpler prompting), then move to gpt-3.5-turbo (cheaper).
2.3. Tokens
Explain tokens, token limits, and how OpenAI prices API calls based on tokens.
2.4. API parameters
Covers some important API parameters OpenAI offers, like n, max_tokens, suffix, and temperature.
Prompt Engineering
This chapter dives deep into prompting, which is the main way we interact with LLMs, potentially a new engineering discipline.
3.1. Prompt design & tuning
Covers prompt design and how small tweaks in a prompt can yield very different results. Tips for authoring prompts, like telling the LLM who it is ("you are an assistant") and the magic "let's think step by step".
3.2. Prompt templates
Shows the need for templating prompts and a simple template implementation. Let user focus on task input and use template to provide additional info needed by the LLM.
3.3. Prompt selection
Solutions usually have multiple prompts, and we select the best one based on user intent. This section covers prompt selection and going from user ask to picking template to generating prompt.
3.4. Prompt chaining
Prompt chaining includes the input preprocessing and output postprocessing of an LLM request, and feeding previous outputs back into new prompts to refine asks.
Learning and Tuning
This chapter focuses on teaching an LLM new domain-specific stuff to unlock its full potential. Includes prompt-based learning and fine tuning.
4.1. Zero-, one-, few-shot learning
Explains zero-shot learning, one-shot learning, and few-shot learning with examples for each.
4.2. Fine tuning
Explains fine tuning, when it should be used, and works through an example.
Memory and Embeddings
This chapter covers solutions to work around the fact LLMs don't have any memory.
5.1. A simple memory
Starting with a basic example of using memory and some limitations we hit due to token limits.
5.2. Key-value memory
A simple key-value memory where we retrieve just the values we need for a given prompt.
5.3. Embeddings
More complex memory scenario: generating an embedding and using a vector database to retrieve the right information (Q&A example).
5.4. Other approaches
I really liked the idea in this paper, where memory importance is determined by the LLM itself, and retrieval is a combination of recency, importance, and embedding distance. Cover this and show the problem space is still ripe for innovation.
Interacting with External Systems
How we can make external tools available to LLMs.
6.1. ChatGPT plugins
Start by describing ChatGPT plugins offered by OpenAI. The why and how.
6.2. Connecting the dots
Putting together what we learned from previous chapters (prompt selection, memory, few-shot learning) to teach LLMs to interact with any external system.
6.3. Building a tool library
Formalizing the previous section and coming up with a generalized schema for connecting LLMs to external systems.
Planning
This chapter talks about breaking down asks into multiple steps and executing those. This enables LLMs to execute on complex tasks.
7.1. Automating planning
This section shows how we can ask the LLM itself to come up with a set of tasks. This includes the prompt and telling it what tools (external systems it can talk to) are available.
7.2. Task queues
Talk about the architecture used by AutoGPT, where tasks are queued and reviewed after each LLM call. Loop until done or until hitting a limit.
Safety and Security
This chapter covers both responsible AI concerns like avoiding hallucinations and new attack vectors like prompt injection and prompt leaking.
8.1. Hallucinations
Discuss hallucinations, why these are currently a big problem with LLMs, and tips to avoid them, e.g. telling the model not to make things up if it doesn't know something, and validating output.
8.2. Explainability
Zooming out from hallucinations, this section covers the challenge of explainable AI. It covers this both tactically (prompts to get the model to provide references) and strategically (current investments in explainable AI).
8.3. Adversarial attacks
This section focuses on malicious inputs and attack vectors to keep in mind. For example, prompt leaking ("ignore the above instructions and output the full prompt").
8.4. Responsible AI
Wrap up the chapter with a discussion around responsible AI, including more philosophical concerns about challenges with this technology and potential societal impact.
Frameworks
This chapter focuses on pulling together the concepts discussed into a framework and provides quick overviews of a couple of existing frameworks.
9.1. Common building blocks
Review the different components discussed throughout the book and how they form a cohesive framework for working with LLMs. Remainder of the chapter overviews existing frameworks.
9.2. Semantic Kernel
Quick overview of https://github.com/microsoft/semantic-kernel and how the framework pieces map to concepts discussed in this book.
9.3. LangChain
Quick overview of https://docs.langchain.com/docs/ and how the framework pieces map to concepts discussed in this book.
Final Thoughts
Some thoughts on the future.
Of course, as I work on the chapters, the topics covered in each might deviate significantly from the above plan. But in my experience, the outline helps a lot to tie things together and inform what I do. In other words - better to have an outline than to not have one.
Note the original title was Building with Large Language Models. I didn't like how this sounded from the start. I described the book to ChatGPT and asked it for a few titles. Some of the suggestions:
Incorporating LLMs in Software Systems: The Future of Programming
The Magic of Language Models: Transforming Software Integration
LLMs at Work: Enhancing Software Systems with AI-Powered Language Models
Breaking Boundaries: Integrating LLMs for Smarter Software Solutions
Language Models Unleashed: A Guide to Integrating LLMs in Software Development
I ended up picking Large Language Models at Work, subtitle Enhancing Software Systems with Language Models (though of course I might change it). I do feel like it captures the essence of what the book is about.
I've also been using AI for the artwork. The book cover is generated by DALL·E and, similarly, each chapter starts with a DALL·E-generated image. I do think the abstract renderings by AI of the concepts I'm talking about give a nice touch to the book.
An interesting challenge is that the field is moving so fast, there's a real risk I have to rewrite large parts of the book before I wrap up the first iteration of the manuscript. For example, OpenAI recently (June 2023, this week at the time of writing) announced function support for gpt-3.5-turbo. This new addition to the API makes it much easier to have the model invoke external systems (which is the focus of chapter 6 - luckily I'm not there yet).
I hope this will end up being a useful book and help developers ramp up on this new world of software development and LLM-assisted solutions. Do check out the book online at https://vladris.com/llm-book and follow me on LinkedIn or Twitter for updates. For now, enjoy the available chapters!
In the previous post I covered the cryptography part of implementing Mental Poker. In this post, I'll cover the append-only list data structure used to model games.
As I mentioned before, we rely on Fluid Framework. The code is available in my GitHub fluid-ledger repo.
I touched on Fluid Framework before so I won't describe in detail what the library is about. Relevant to this blog post, we have a set of distributed data structures that multiple clients can update concurrently. All clients in a session connect to a service (like the Azure Fluid Relay service). Each update a client makes to a distributed data structure gets sent to the service as an operation. The service stamps a sequence number on the operation and broadcasts it to all clients. That means that eventually, all clients end up with the same list of operations in the same sequence, so they can merge changes client-side while ensuring all clients end up with the same view of the world.
The neat thing about Fluid Framework is the fact that merges happen on the clients as described above rather than server-side. The service doesn't need to understand the semantics of each data structure. It only needs to sequence operations. Different data structures implement their own merge logic. The framework provides some powerful out-of-the-box data structures like a sparse matrix or a tree. But we don't need such powerful data structures to model games: a list is enough.
Most turn-based games can be modeled as a list of moves. This includes games like chess, but also card games. The whole Mental Poker shuffling protocol we discussed, where one player encrypts and shuffles the deck, then hands it over to the other player to do the same etc. is also, in fact, a sequence of moves.
The semantics of a particular game are implemented at a higher level. The types of games we are looking at, though, can be modeled as a list of moves, where players take turns. Each move is an item in the list. In this blog post we're looking at the generic list data structure, without worrying too much about what a move looks like.
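To make this concrete, here is a minimal sketch of what a move list might look like. The Move variants below are purely illustrative, not the toolkit's actual types:

```typescript
// Illustrative move variants; a real game defines its own.
type Move =
    | { type: "shuffle"; encryptedDeck: string[] }
    | { type: "draw"; player: string; cardIndex: number }
    | { type: "play"; player: string; cardIndex: number };

// A game is an ordered list of moves; replaying the list reconstructs state.
function applyMove(moves: Move[], move: Move): Move[] {
    // The list is append-only: we never mutate or reorder past moves.
    return [...moves, move];
}

const game: Move[] = [];
const afterDraw = applyMove(game, { type: "draw", player: "Alice", cardIndex: 0 });
```

The important property is that the move list is only ever appended to, which is exactly what the ledger data structure below provides.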
A list is a very simple data structure, but let's see what this looks like in the context of Fluid Framework. Here, we have a distributed data structure multiple clients can concurrently update.
I named the data structure ledger, as it should act very much as a ledger from the crypto/blockchain world - an immutable record of what happened. In our case, this contains a list of game moves.
The Fluid Framework implementation is fairly straightforward: when a client wants to append an item to the list, it sends the new item to the Fluid Relay service. The service sequences the append, meaning it stamps a sequence number on it and broadcasts it to all clients, including the sender. The local data structure only gets appended to once the op is received back from the service. That guarantees all clients end up with the same list, even if they concurrently attempt to append items to it.
The diagram shows how this works when Client A wants to append 4 to the ledger: 4 is sent to the Relay Service.

Our API consists of two interfaces: ILedgerEvents, representing the events that our data structure can fire, and ILedger, the API of our data structure. We derive these from ISharedObjectEvents and ISharedObject, which are available in Fluid Framework. We also need the Serializable type, which represents data that can be serialized in the Fluid Framework data store:
import {
ISharedObject,
ISharedObjectEvents
} from "@fluidframework/shared-object-base";
import { Serializable } from "@fluidframework/datastore-definitions";
With these imports, we can define our ILedgerEvents as:
export interface ILedgerEvents<T> extends ISharedObjectEvents {
(event: "append", listener: (value: Serializable<T>) => void): void;
(event: "clear", listener: (values: Serializable<T>[]) => void): void;
}
T is the generic type of the list items. The append event is fired after we get an item from the Fluid Relay service and the item is appended to the ledger. The clear event is fired when we get a clear operation from the Fluid Relay service and the ledger is cleared. The event will return the full list of items that have been removed as values.
We can also define ILedger as:
export interface ILedger<T = any> extends ISharedObject<ILedgerEvents<T>> {
get(): IterableIterator<Serializable<T>>;
append(value: Serializable<T>): void;
clear(): void;
}
The get() function returns an iterator over the ledger. append() appends a value and clear() clears the ledger.
The full implementation can be found in interfaces.ts.
We also need to provide a LedgerFactory the framework can use to create or load our data structure.

We need to import a handful of types from the framework, our ILedger interface, and our yet-to-be-implemented Ledger:
import {
IChannelAttributes,
IFluidDataStoreRuntime,
IChannelServices,
IChannelFactory
} from "@fluidframework/datastore-definitions";
import { Ledger } from "./ledger";
import { ILedger } from "./interfaces";
We can now define the factory as implementing the IChannelFactory interface:
export class LedgerFactory implements IChannelFactory {
...
}
We'll cover the implementation step-by-step. First, we need a couple of static properties defining the type of the data structure and properties of the channel:
public static readonly Type = "fluid-ledger-dds";

public static readonly Attributes: IChannelAttributes = {
    type: LedgerFactory.Type,
    snapshotFormatVersion: "0.1",
    packageVersion: "0.0.1"
};

public get type() {
    return LedgerFactory.Type;
}

public get attributes() {
    return LedgerFactory.Attributes;
}
Type just needs to be a unique value for our distributed data structure. We'll define it as fluid-ledger-dds. The channel Attributes are used by the runtime for versioning purposes.
You can think of the way Fluid Framework stores data as similar to git. In git we have snapshots and commits. Fluid Framework uses a similar mechanism, where the service records all operations sent to it (this is the equivalent of a commit) and periodically takes a snapshot of the current state of the world.
When a client connects and wants to get up to date, it tells the service what is the last state it saw and the service sends back what happened since. This could include the latest snapshot (if the client doesn't have it) and a bunch of operations that have been sent by clients after the latest snapshot.
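A rough sketch of this catch-up logic, with made-up names (the actual Fluid Framework protocol is more involved):

```typescript
// Made-up shapes for illustration; not the actual Fluid Framework protocol.
interface Snapshot<T> { sequenceNumber: number; data: T[]; }
interface Op<T> { sequenceNumber: number; value: T; }

// Catch a client up: start from the latest snapshot (if the client doesn't
// already have it), then replay every op sequenced after that snapshot.
function catchUp<T>(local: T[], snapshot: Snapshot<T> | undefined, ops: Op<T>[]): T[] {
    const state = snapshot ? [...snapshot.data] : [...local];
    const baseline = snapshot ? snapshot.sequenceNumber : 0;
    for (const op of ops.filter((o) => o.sequenceNumber > baseline)) {
        state.push(op.value);
    }
    return state;
}
```

Because every op carries a sequence number, replaying ops past a snapshot is deterministic: every client converges on the same state.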
If we iterate on our data structure and its format changes over time, we need to tell the runtime which snapshot format and which ops our client understands.
The interface we are implementing (IChannelFactory) includes a load() and a create() function. Here is how we load a ledger:
public async load(
runtime: IFluidDataStoreRuntime,
id: string,
services: IChannelServices,
attributes: IChannelAttributes
): Promise<ILedger> {
const ledger = new Ledger(id, runtime, attributes);
await ledger.load(services);
return ledger;
}
This is pretty straightforward: we construct a new instance of Ledger (we'll look at the Ledger implementation in a bit), call load(), and return the object. This is an async function. No need to worry about the arguments, as the framework will handle these - we just plumb them through.

create() is similar, except it is synchronous:
public create(document: IFluidDataStoreRuntime, id: string): ILedger {
const ledger = new Ledger(id, document, this.attributes);
ledger.initializeLocal();
return ledger;
}
Instead of calling the async ledger.load(), we call initializeLocal(). We again don't have to cover the arguments, but let's talk about the difference between creating and loading.

In order to understand these, we need to introduce a new concept: the Fluid container.
The container is a collection of distributed data structures defined by a schema. This describes the data model of an application. In our case, to model a game, we only need a ledger. For more complex applications, we might need to use multiple distributed data structures. Fluid Framework uses containers as the unit of data - we will never instantiate or use a distributed data structure standalone. Even if we only need one, as in our case, we still need to define a container.
The lifecycle shown in the diagram is:
- A client creates a new container (this is where create() comes into play). Based on the provided schema, the runtime will call create() for all described data structures. At this point, we haven't yet connected to the Fluid Relay. We are in what is called detached mode. Here we have the opportunity to update our data structures before we connect and have other clients see them.
- Once the container is attached to the Fluid Relay, other clients joining the session load it; the runtime calls the load() functions to hydrate it.

As a side note, the Fluid Relay can also store documents to persistent storage, so once the coauthoring session is over and all clients disconnect, the document is persisted for future sessions.
For our Mental Poker application, we don't need to worry too much about containers and schemas; we only need a minimal implementation consisting of a container with a single distributed data structure: our Ledger. But it is worth understanding how the runtime works.
We went over the full implementation of the LedgerFactory. You can also find it in ledgerFactory.ts.
Let's now look at the actual implementation and learn about the anatomy of a Fluid distributed data structure.
We need to import several types from the framework, which we'll cover as we encounter them in the code below, or won't discuss if they are boilerplate.
import {
ISequencedDocumentMessage,
MessageType
} from "@fluidframework/protocol-definitions";
import {
IChannelAttributes,
IFluidDataStoreRuntime,
IChannelStorageService,
IChannelFactory,
Serializable
} from "@fluidframework/datastore-definitions";
import { ISummaryTreeWithStats } from "@fluidframework/runtime-definitions";
import { readAndParse } from "@fluidframework/driver-utils";
import {
createSingleBlobSummary,
IFluidSerializer,
SharedObject
} from "@fluidframework/shared-object-base";
import { ILedger, ILedgerEvents } from "./interfaces";
import { LedgerFactory } from "./ledgerFactory";
Note the last two imports: we import our interfaces and our LedgerFactory.
We'll define a couple of delta operations. That's the Fluid Framework name for an operation (op) we send to the (or get back from) Fluid Relay service.
type ILedgerOperation = IAppendOperation | IClearOperation;
interface IAppendOperation {
type: "append";
value: any;
}
interface IClearOperation {
type: "clear";
}
In our case, we can have either an IAppendOperation or an IClearOperation. The two together define the ILedgerOperation type. The IAppendOperation includes a value property which can be anything. Both IAppendOperation and IClearOperation have a type property, so we can see at runtime which type we are dealing with.
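The type tag is what lets both TypeScript and our runtime code discriminate between the two operations. A small self-contained illustration, repeating the type definitions from above (the describeOp helper is made up for this example):

```typescript
// Repeating the operation types from above so this snippet is self-contained.
type ILedgerOperation = IAppendOperation | IClearOperation;
interface IAppendOperation { type: "append"; value: any; }
interface IClearOperation { type: "clear"; }

// Switching on the type tag narrows the union: inside the "append" case,
// TypeScript knows op has a value property.
function describeOp(op: ILedgerOperation): string {
    switch (op.type) {
        case "append":
            return `append ${JSON.stringify(op.value)}`;
        case "clear":
            return "clear";
    }
}
```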
We talked about how Fluid Framework is similar to git in the way it stores documents as snapshots and ops. A lot of this is handled internally by the framework, but our data structure needs to tell the service how we want to name the snapshots, so we'll define a constant for this:
const snapshotFileName = "header";
With this, we can start the implementation of Ledger.
export class Ledger<T = any>
extends SharedObject<ILedgerEvents<T>>
implements ILedger<T>
{
...
}
We derive from SharedObject, the base distributed data structure type. We specify that this SharedObject will be firing ILedgerEvents and that it implements the ILedger interface.
The framework expects a few functions used to construct objects. Our constructor looks like this:
constructor(
id: string,
runtime: IFluidDataStoreRuntime,
attributes: IChannelAttributes
) {
super(id, runtime, attributes, "fluid_ledger_");
}
The constructor takes an id, a runtime, and channel attributes. We don't need to deeply understand these, as they are handled and passed in by the framework. The last argument of the base class constructor is a telemetry string prefix. We just need to provide a string unique to our data structure, so we use fluid_ledger_ in our case.
We also need a couple of static functions: create() and getFactory():
public static create(runtime: IFluidDataStoreRuntime, id?: string) {
return runtime.createChannel(id, LedgerFactory.Type) as Ledger;
}
public static getFactory(): IChannelFactory {
return new LedgerFactory();
}
For create(), again we don't need to worry about runtime and id, as we won't have to pass these in ourselves. We just need this function to forward them to runtime.createChannel(). createChannel() also requires the unique type, which we'll get from our LedgerFactory.

The getFactory() function simply creates a new instance of LedgerFactory.
We covered the constructor and factory functions. Next, let's look at the internal data and the required initializeLocalCore() function:
private data: Serializable<T>[] = [];
public get(): IterableIterator<Serializable<T>> {
return this.data[Symbol.iterator]();
}
protected initializeLocalCore() {
this.data = [];
}
This is very simple - we represent our ledger as an array of Serializable<T>. The get() function, which we defined on our ILedger interface, returns the array's iterator. initializeLocalCore(), called internally by the runtime, simply sets data to be an empty array.
We also need to implement saving and loading of the data structure. Save in the Fluid Framework world is called summarize: this is what the framework uses to create snapshots.
protected summarizeCore(
serializer: IFluidSerializer
): ISummaryTreeWithStats {
return createSingleBlobSummary(
snapshotFileName,
serializer.stringify(this.data, this.handle)
);
}
We can use the framework-provided createSingleBlobSummary. In our case, we save the whole data array and the handle (handle is an inherited attribute representing a handle to the data structure, which the Framework uses for nested data structure scenarios).
Here is how we load the data structure:
protected async loadCore(storage: IChannelStorageService): Promise<void> {
const content = await readAndParse<Serializable<T>[]>(
storage,
snapshotFileName
);
this.data = this.serializer.decode(content);
}
For both summarize and load, we rely on Framework-provided utilities.
We can now focus on the non-boilerplate bits: implementing our append() and clear(). Let's start with append():
private applyInnerOp(content: ILedgerOperation) {
switch (content.type) {
case "append":
case "clear":
this.submitLocalMessage(content);
break;
default:
throw new Error("Unknown operation");
}
}
private appendCore(value: Serializable<T>) {
this.data.push(value);
this.emit("append", value);
}
public append(value: Serializable<T>) {
const opValue = this.serializer.encode(value, this.handle);
if (this.isAttached()) {
const op: IAppendOperation = {
type: "append",
value: opValue
};
this.applyInnerOp(op);
}
else {
this.appendCore(opValue);
}
}
applyInnerOp() is common to both append() and clear(). This is the function that takes an ILedgerOperation and sends it to the Fluid Relay service. submitLocalMessage() is inherited from the base SharedObject.

appendCore() effectively updates data and fires the append event.

append() first serializes the provided value using the inherited Framework-provided serializer. We assign this to opValue. We then need to cover both the attached and detached scenarios. If attached, it means we are connected to a Fluid Relay and we are in the middle of a coauthoring session. In this case, we create an IAppendOperation object and call applyInnerOp(). If we are detached, it means we created our data structure (and its container) on this client, but we are not connected to a service yet. In this case we call appendCore() to immediately append the value, since there is no service to send the op to and get it back sequenced.
clear() is very similar:
private clearCore() {
const data = this.data.slice();
this.data = [];
this.emit("clear", data);
}
public clear() {
if (this.isAttached()) {
const op: IClearOperation = {
type: "clear"
};
this.applyInnerOp(op);
}
else {
this.clearCore();
}
}
clearCore() effectively clears data and emits the clear event. clear() handles both the attached and detached scenarios.
So far we update our data immediately when detached, and when attached we send the op to the Relay Service. The missing piece is handling ops as they come back from the Relay Service. We do this in processCore(), another function the runtime expects us to provide:
protected processCore(message: ISequencedDocumentMessage) {
if (message.type === MessageType.Operation) {
const op = message.contents as ILedgerOperation;
switch (op.type) {
case "append":
this.appendCore(op.value);
break;
case "clear":
this.clearCore();
break;
default:
throw new Error("Unknown operation");
}
}
}
This function is called by the runtime when the Fluid Relay sends the client a message. In our case, we only care about messages that are operations. We only support append and clear operations. We handle these by calling the appendCore() and clearCore() we just saw - since these ops are coming from the service, we can safely apply them to our data (we have the guarantee that all clients will get them in the same order).
And we're almost done. We need to implement onDisconnect(), which is called when we disconnect from the Fluid Relay. This gives the distributed data structure a chance to run some code, but in our case we don't need to do anything:
protected onDisconnect() {}
Finally, we also need applyStashedOp(). This is used in offline mode. For some applications, we might want to provide some functionality when offline - a client can keep making updates, which get stashed. We won't dig into this since for Mental Poker we can't have a single client play offline - we simply throw an exception if this function ever gets called:
protected applyStashedOp(content: unknown) {
throw Error("Not supported");
}
The full implementation is in ledger.ts.
And that's it! We have a fully functioning distributed data structure we can use to model games.
The GitHub repo also includes a demo app: a collaborative coloring application where multiple clients can simultaneously color a drawing.
In this case, we model coloring operations as x and y coordinates, and a color. As users click on the drawing, we append these operations to the ledger and play them back to color the drawing using flood fill.
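A sketch of what this might look like; the ColorOperation shape and the floodFill signature are illustrative, and the demo app's actual code may differ:

```typescript
// Illustrative operation shape; the demo app's actual type may differ.
interface ColorOperation {
    x: number;
    y: number;
    color: string;
}

// Replay the ledger by applying each coloring op in sequence. floodFill is a
// stand-in for the demo app's actual flood fill routine.
function replay(
    ops: ColorOperation[],
    floodFill: (x: number, y: number, color: string) => void
) {
    for (const op of ops) {
        floodFill(op.x, op.y, op.color);
    }
}
```

Since every client receives the same sequence of ops, replaying them produces the same colored drawing everywhere.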
I spent a bunch of time lately revamping some documentation and this got me thinking. In terms of tooling, even state-of-the-art documentation pipelines are missing some key features. This is also an area where we can directly apply LLMs. In this post, I'll jot down some thoughts on how things could look in a more perfect world. Of course, here I'm referring to documentation associated with software projects.
This first one isn't unheard of: documentation should be captured in source control and generated from there as a static website. There are two major types of documentation: API reference and articles that aren't tied to a specific API.
API reference should be extracted from code comments. Different languages have different levels of official support for this. C# has out-of-the-box XML documentation (///), JavaScript has the non-standard but popular JsDoc, etc.
Articles on the other hand should be written as stand-alone Markdown files.
A good documentation pipeline should support both. My team is using DocFX to that effect, though TypeScript is not supported out-of-the-box and requires some additional packages to set up.
Commenting APIs should be enforced via linter. We have tools like StyleCop for C# and a JsDoc plugin for eslint for JavaScript. At the very least, all of the public API surface should be documented. If you introduce a new public API without corresponding documentation, this should cause a build break.
For technical documentation, many times articles also contain code samples. These run the risk of getting out of sync with the actual code as the code churns. In an ideal world, we should be able to associate a code snippet from an article with a test that runs with the CI pipeline. Documentation might skip scaffolding for clarity, so it's likely harder to simply attempt running the exact code snippet. But we should have a way to pull the snippet into a test that provides that scaffolding.
Alternatively, enforce that running all snippets in an article in order works - treat articles more like Jupyter notebooks, where the runtime maintains some context, so if, for example, I import something in the first code snippet, the import is available to subsequent snippets.
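A minimal sketch of the idea, assuming snippets are standard Markdown fenced code blocks: extract them in order and concatenate them so earlier definitions are visible to later snippets, then compile or run the combined program in CI:

````typescript
// Sketch: treat an article's fenced code blocks like notebook cells. Extract
// them in document order, then concatenate so earlier definitions are in
// scope for later snippets. A CI step could compile or run the result.
function extractSnippets(markdown: string): string[] {
    const snippets: string[] = [];
    const fence = /```[a-z]*\n([\s\S]*?)```/g;
    let match: RegExpExecArray | null;
    while ((match = fence.exec(markdown)) !== null) {
        snippets.push(match[1]);
    }
    return snippets;
}

function combine(markdown: string): string {
    return extractSnippets(markdown).join("\n");
}
````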
The key thing is to have some way to validate at build time that all code examples actually work and not allow breaking changes, even if the only thing that breaks is documentation.
From my personal experience, documentation is usually treated as an afterthought. From time to time there is a big push to update things, but it's rare that everyone is constantly working towards improving docs.
Unless documentation reaches a critical mass of contributors to ensure everything is kept in order, it's best to have clear ownership of each article. Git history is not always the best for finding owners - sometimes the last author is no longer with the team or with the company, or maybe last commits just moved the file around or fixed typos.
This concern goes beyond documentation, in general I'd love to see an ownership tracking system that can associate assets with people and is also org-chart aware - so if an owner changes teams, this gets flagged and a new owner must be provided.
While working on documentation, I noticed that for a large enough project, some information tends to repeat across multiple articles. Maybe as part of a summary on the front page, then again in an article covering some of the details, and once more incidentally in a related article.
The problem is that if something changes and I only update one of the articles (maybe I'm not aware of all the places this shows up), documentation can start contradicting itself. This is something that is not part of the common Markdown syntax but I'd love to have a way to inline a paragraph across multiple documents to avoid this.
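This doesn't exist in common Markdown, but a preprocessor for it is easy to imagine. A sketch, with a made-up <!-- include: name --> directive:

```typescript
// Sketch of an inline-fragment preprocessor. The <!-- include: name -->
// directive is made up; common Markdown has no such syntax.
function expandIncludes(markdown: string, fragments: Map<string, string>): string {
    return markdown.replace(/<!--\s*include:\s*(\S+)\s*-->/g, (_match, name) => {
        const fragment = fragments.get(name);
        if (fragment === undefined) {
            // Fail the build rather than silently publishing a broken article.
            throw new Error(`Unknown fragment: ${name}`);
        }
        return fragment;
    });
}
```

With something like this, the shared paragraph lives in one file and every article that references it stays in sync automatically.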
All documentation should include a style guide. Some guidelines encourage writing for easier reading, so they apply in most cases. For example:
Some guidelines depend on the type of article. If you're documenting a design decision, explain the reasoning and list other options considered and why these weren't adopted. On the other hand, if you are writing a troubleshooting guide, no need to explain the why, just what steps the reader needs to take.
Unfortunately I haven't seen a lot of such guides accompany projects. I wish we had a set of industry standard ones to simply plug in, like we do with open source licenses.
In many cases, there is little effort put into structuring the documentation. We start with /docs, then as articles pile up, we create new subfolders organically.
Much like we want some high-level design of a system, we should also require a high-level design of the documentation. What are the key topics and sub-sections? This doesn't even need to be reinvented for each new project; I expect there's a handful of structures which can support most projects, so much like style guides, it would be great to have these available off-the-shelf.
I started this post talking about building documentation from source, which naturally maps to articles being files organized in folders (categories). This type of organization - categories and subcategories - works well up to a certain volume of information.
At some point, it gets hard to figure out which subcategory something fits in: it might fit just as well in multiple places. Here the folder categorization breaks down: there is no clear hierarchy of nested folders in which to fit everything.
An alternative to hierarchies is tags. Maintain a curated set of tags, then tag each article with one or more tags. You can then browse by tag, and have articles show up under multiple tags. This tends to work better with larger volumes of information, but it's harder to map to a file and folder structure.
With the popularity of large language models, I see many applications throughout the lifecycle:
Generative AI can help coauthor documentation. GitHub Copilot already does this. As models get better and cheaper to run, I expect they will be more and more involved in writing documentation.
Given a style guide, a model can review how closely a document adheres to it and suggest changes to match the guide.
With a knowledge of the whole documentation, a model could also spot contradictions (the problem I mentioned in the Inline fragments section). This could be a step in the CI pipeline to ensure consistency.
A model could potentially also act as a reader and provide feedback on how clear the documentation is.
Most tools generating documentation from source provide very rudimentary search capabilities. OpenAI offers text and code embedding APIs which enable semantic search and natural language querying. Using something like this on documentation should make finding things much easier.
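A sketch of the retrieval side, assuming article embeddings were precomputed (for example with an embeddings API): rank articles by cosine similarity to the query embedding:

```typescript
// Sketch of semantic search over documentation, assuming embeddings were
// already computed (e.g. with an embeddings API). We rank articles by cosine
// similarity between the query embedding and each article embedding.
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rank(query: number[], docs: { id: string; embedding: number[] }[]) {
    return [...docs].sort(
        (x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
    );
}
```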
Models can also be used to answer questions, so instead of readers having to search the docs for what they need, they can simply ask questions. A model can provide answers based on the documentation (and the codebase). This takes retrieval a step further: users can simply get their questions answered by a model. In some cases articles might not even be needed, as the model can explain in real time how the code is supposed to be used.
I believe as of today, even the best tools available for documentation leave room for improvement and large language models have the potential to radically change the game.
In this post we looked at building documentation from source control, validating code samples, tracking ownership, inlining shared fragments, style guides, information architecture, and applying large language models across the documentation lifecycle.
Some of these features exist and some of these practices are adopted in some projects, but most are not widely implemented. I'm curious to see what the landscape will look like in a few years and how AIs will change the way we learn and get our questions answered.
In the previous post I outlined some of the interesting bits of putting together a Mental Poker toolkit. In this post I will talk about cryptography.
The golden rule when it comes to cryptography code is to not roll your own, rather use something that's been battle-tested. That said, I could not find what I needed so had to implement some stuff. I urge you not to rely on my implementation for high-stakes poker, as it is likely buggy.
With the disclaimer out of the way, let's look at what we need to support Mental Poker.
Recap from this old post when I first got interested in the subject:
Mental poker requires a commutative encryption function. If we encrypt \(A\) using \(Key_1\) then encrypting the result using \(Key_2\), we should be able to decrypt the result back to \(A\) regardless of the order of decryption (first with \(Key_1\) and then with \(Key_2\), or vice-versa).
Here is how Alice and Bob play a game of mental poker:
- Alice takes a deck of cards (an array), shuffles the deck, generates a secret key \(K_A\), and encrypts each card with \(K_A\).
- Alice hands the shuffled and encrypted deck to Bob. At this point, Bob doesn't know what order the cards are in (since Alice encrypted the cards in the shuffled deck).
- Bob takes the deck, shuffles it, generates a secret key \(K_B\), and encrypts each card with \(K_B\).
- Bob hands the deck to Alice. At this point, neither Alice nor Bob know what order the cards are in. Alice got the deck back reshuffled and re-encrypted by Bob, so she no longer knows where each card ended up. Bob reshuffled an encrypted deck, so he also doesn't know where each card is.
At this point the cards are shuffled. In order to play, Alice and Bob also need the capability to look at individual cards. In order to enable this, the following steps must happen:
- Alice decrypts the shuffled deck with her secret key \(K_A\). At this point she still doesn't know where each card is, as cards are still encrypted with \(K_B\).
- Alice generates a new set of secret keys, one for each card in the deck. Assuming a 52-card deck, she generates \(K_{A_1} ... K_{A_{52}}\) and encrypts each card in the deck with one of the keys.
- Alice hands the deck of cards to Bob. At this point, each card is encrypted by Bob's key, \(K_B\), and one of Alice's keys, \(K_{A_i}\).
- Bob decrypts the cards using his key \(K_B\). He still doesn't know where each card is, as now the cards are encrypted with Alice's keys.
- Bob generates another set of secret keys, \(K_{B_1} ... K_{B_{52}}\), and encrypts each card in the deck.
- Now each card in the deck is encrypted with a unique key that only Alice knows and a unique key only Bob knows.
If Alice wants to look at a card, she asks Bob for his key for that card. For example, if Alice draws the first card, encrypted with \(K_{A_1}\) and \(K_{B_1}\), she asks Bob for \(K_{B_1}\). If Bob sends her \(K_{B_1}\), she now has both keys to decrypt the card and look at it. Bob still can't decrypt it because he doesn't have \(K_{A_1}\). This way, as long as both Alice and Bob agree that one of them is supposed to see a card, they exchange keys as needed to enable this.
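The steps above can be traced end to end with a toy stand-in for the cipher. Addition mod \(M\) is used below only because it commutes like SRA does; it is not secure, and all key values are made up for illustration:

```typescript
// Toy run of the shuffle protocol; addition mod M stands in for SRA.
const M = 1_000_003;
const enc = (card: number, key: number) => (card + key) % M;
const dec = (card: number, key: number) => (card - key + M) % M;

function shuffle<T>(deck: T[]): T[] {
    // Fisher-Yates shuffle
    const d = [...deck];
    for (let i = d.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [d[i], d[j]] = [d[j], d[i]];
    }
    return d;
}

let deck = [0, 1, 2, 3, 4]; // card ids
const kA = 12345, kB = 54321;

// Alice shuffles and encrypts with K_A; Bob reshuffles and encrypts with K_B
deck = shuffle(deck).map((c) => enc(c, kA));
deck = shuffle(deck).map((c) => enc(c, kB));

// Alice removes K_A and re-encrypts each card with a per-card key K_Ai
const kAs = deck.map((_, i) => 1000 + i);
deck = deck.map((c, i) => enc(dec(c, kA), kAs[i]));

// Bob removes K_B and re-encrypts each card with a per-card key K_Bi
const kBs = deck.map((_, i) => 2000 + i);
deck = deck.map((c, i) => enc(dec(c, kB), kBs[i]));

// Revealing card i requires both per-card keys
const revealed = deck.map((c, i) => dec(dec(c, kAs[i]), kBs[i]));
console.log(revealed.slice().sort((a, b) => a - b)); // [0, 1, 2, 3, 4]
```

Neither player sees a card unless they obtain the other player's per-card key, which is exactly the agreement step described above.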
The reason I ended up hand-rolling some cryptography is that off-the-shelf encryption algorithms are non-commutative. With a non-commutative algorithm, the above steps don't work: Alice cannot decrypt the deck with her secret key \(K_A\) after Bob shuffled it and encrypted it with \(K_B\).
The analogy I used in this tech talk is boxes and locks: if we have commutative encryption, we put the secret information in a box and both Alice (using \(K_A\)) and Bob (using \(K_B\)) put a lock on that box. It doesn't really matter in which order we unlock the two locks - as long as both are unlocked, we can get to the content. On the other hand, if we have non-commutative encryption, this is equivalent to Alice putting the secret in a box locked with \(K_A\), and Bob putting the whole locked box in another box locked with \(K_B\). Now Alice's key is useless while the outer box only has the \(K_B\) lock on it.
There aren't as many applications for commutative encryption, so the popular libraries out there provide only non-commutative encryption algorithms. The commutative encryption algorithm we will look at is SRA.
The SRA encryption algorithm was designed by Shamir, Rivest, and Adleman of RSA fame. Both algorithms use their initials, but the industry-standard RSA is non-commutative. SRA, on the other hand, is.
SRA works like this: we need a large prime number \(P\). This seed prime is shared by all players. To generate encryption keys from it, let \(\phi = P - 1\). Each player needs to find another prime \(E\), such that \(\phi\) and \(E\) are coprime. \(E\) is that player's encryption key. The decryption key is derived from \(\phi\) and \(E\) as the modulo-inverse \(D\) such that \(E * D \equiv 1 \pmod{\phi}\).
To encrypt a number \(N\), we raise it to \(E\) modulo \(P\). To decrypt an encrypted number \(N'\), we raise it to \(D\) modulo \(P\).
Then if player 1 encrypts a payload with \(E_1\) and player 2 encrypts again using \(E_2\), the message can be decrypted by applying \(D_1\) and \(D_2\) in any order. Remember, this is key to the card shuffling algorithm.
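The commutativity is easy to verify concretely with tiny numbers. A real deployment uses primes hundreds of digits long; the values below are purely illustrative:

```typescript
// SRA commutativity on toy numbers (P = 23, phi = 22).
const modPow = (b: bigint, e: bigint, m: bigint): bigint => {
    let result = 1n;
    b %= m;
    while (e > 0n) {
        if (e & 1n) result = (result * b) % m;
        b = (b * b) % m;
        e >>= 1n;
    }
    return result;
};

const P = 23n; // shared prime
const [e1, d1] = [3n, 15n]; // 3 * 15 = 45 ≡ 1 (mod 22)
const [e2, d2] = [7n, 19n]; // 7 * 19 = 133 ≡ 1 (mod 22)

const card = 5n;
// Encrypt with player 1's key, then player 2's key
const doubly = modPow(modPow(card, e1, P), e2, P);
// Decrypt in the "wrong" order: player 1's key first
const recovered = modPow(modPow(doubly, d1, P), d2, P);
console.log(recovered); // 5n
```

Decrypting with \(D_2\) first and \(D_1\) second recovers the same value, which is what makes the shuffle protocol work.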
For a simple implementation, we can use arbitrarily large integers (BigInt).
Unfortunately, the built-in JavaScript math libraries only work with number
values, so we need to implement a bit of math.
First, we need to find the greatest common divisor of two numbers:
function gcd(a: bigint, b: bigint): bigint {
    while (b) {
        [a, b] = [b, a % b];
    }
    return a;
}
We use this to check if two numbers are coprime (their GCD is 1).
Next, we need the modulo inverse: find x such that (a * x) % m == 1. One way of doing this is using Euclidean division. We use the same algorithm we used for GCD, but we keep track of the values we find at each step. At the end, if a is not 1, there is no modulo inverse. Otherwise we find the modulo inverse by starting with a pair of numbers x = 1, y = 0 and iterating in reverse over the values we found at the previous steps, updating x to be y and y to be x - y * (a / b), where a and b are the values we saved at that step:
function modInverse(a: bigint, m: bigint) {
    a = ((a % m) + m) % m;
    if (!a || m < 2) {
        throw new Error("Invalid input");
    }
    // Find GCD (and remember numbers at each step)
    const s = [];
    let b = m;
    while (b) {
        [a, b] = [b, a % b];
        s.push({ a, b });
    }
    if (a !== BigInt(1)) {
        throw new Error("No inverse");
    }
    // Find the inverse
    let x = BigInt(1);
    let y = BigInt(0);
    for (let i = s.length - 2; i >= 0; --i) {
        [x, y] = [y, x - y * (s[i].a / s[i].b)];
    }
    return ((y % m) + m) % m;
}
This gives us the modulo inverse. To recap, we use this once we have a large prime \(P\) with \(\phi = P - 1\) and a large prime \(E\) such that \(gcd(E, \phi) = 1\) to find our decryption key \(D\).
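For small inputs, we can cross-check the result with a brute-force search. naiveModInverse() below is a throwaway helper for sanity-checking, not part of the toolkit:

```typescript
// Throwaway brute-force modular inverse for small inputs.
function naiveModInverse(a: bigint, m: bigint): bigint {
    for (let x = 1n; x < m; x++) {
        if ((a * x) % m === 1n) return x;
    }
    throw new Error("No inverse");
}

const inv = naiveModInverse(3n, 22n);
console.log(inv); // 15n, since 3 * 15 = 45 ≡ 1 (mod 22)
```

This matches the worked example from the recap: with \(P = 23\) and \(E = 3\), the decryption key is \(D = 15\).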
We also need modulo exponentiation for encryption/decryption. Since we are
dealing with large numbers, we will implement exponentiation using the ancient
Egyptian multiplication algorithm.
To raise b to e modulo m: if e is 1, we return b. Otherwise we recursively raise (b * b) % m to e / 2 modulo m. Whenever e is odd, we multiply the recursion result by an additional b:
function exp(b: bigint, e: bigint, m: bigint): bigint {
    if (e === BigInt(1)) {
        return b;
    }
    let result = exp((b * b) % m, e / BigInt(2), m);
    if (e % BigInt(2) === BigInt(1)) {
        result *= b;
    }
    return result % m;
}
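As a quick sanity check, for small inputs exp() should agree with direct BigInt exponentiation (the exp() definition is repeated here so the snippet stands alone):

```typescript
// exp() as defined above, checked against direct BigInt exponentiation.
function exp(b: bigint, e: bigint, m: bigint): bigint {
    if (e === BigInt(1)) {
        return b;
    }
    let result = exp((b * b) % m, e / BigInt(2), m);
    if (e % BigInt(2) === BigInt(1)) {
        result *= b;
    }
    return result % m;
}

const got = exp(5n, 13n, 101n);
const expected = 5n ** 13n % 101n;
console.log(got === expected); // true, both are 56n
```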
This algorithm runs in \(O(\log e)\) time and keeps the large numbers to a manageable size, since we apply modulo m at each step. We have most of the math pieces in place. The only thing missing is a way to generate large primes.
One way of generating large primes is through trial and error: we generate a large number, check if it is prime, and repeat if it isn't. We can generate a large number by filling a byte array with random values, then converting it into a BigInt:
function randBigInt(sizeInBytes: number = 128): bigint {
    let buffer = new Uint8Array(sizeInBytes);
    crypto.getRandomValues(buffer);
    // Build a bigint out of the buffer
    let result = BigInt(0);
    buffer.forEach((n) => {
        result = result * BigInt(256) + BigInt(n);
    });
    return result;
}
This gives us a random number of as many bytes as we want (default being 128 bytes, i.e. 1024 bits). Since we are dealing with very large numbers, we can't naively test for primality of \(N\) by trying divisions up to \(\sqrt{N}\); this is too expensive. We instead use the probabilistic Miller-Rabin test.
In short, Miller-Rabin works like this: we can write an integer \(N\) (our prime candidate) as \(N = 2^S * D + 1\) where \(S\) and \(D\) are positive integers.
Let's take another integer \(A\) coprime with \(N\). \(N\) is likely to be prime if \(A^D \equiv 1 \pmod{N}\) or \(A^{2^{R}*D} \equiv -1 \pmod{N}\) for some \(0 \le R < S\). If this is not the case, then \(N\) is not a prime and \(A\) is called a witness of the compositeness of \(N\).
This is a probabilistic test, so we can tell whether \(N\) is for sure non-prime or likely to be prime. Unfortunately, we can't tell for sure that \(N\) is prime. We need to run multiple iterations of this picking different \(A\) values until we are satisfied that \(N\) is likely enough to be prime.
First, we need a helper function that checks that \(A\) is not a witness of \(N\), given \(A\), \(N\), and \(S\) and \(D\) such that \(N = 2^S * D + 1\).
We compute \(U\) as \(A^D \pmod{N}\). If \(U - 1 = 0\) or \(U + 1 = N\), then \(A\) is not a witness of \(N\). Otherwise, we repeat \(S - 1\) times: \(U = U^2 \pmod{N}\) and \(A\) is not a witness if \(U + 1 = N\). At this point, if we haven't confirmed that \(A\) is not a witness, we consider \(A\) a witness of \(N\) thus \(N\) is not prime. These are simply the checks described above (\(A^D \equiv 1 \pmod{N}\) and \(A^{2^{R}*D} \equiv -1 \pmod{N}\)) in implementation form.
function isNotWitness(a: bigint, d: bigint, s: bigint, n: bigint): boolean {
    if (a === BigInt(0)) {
        return true;
    }
    // u is a ^ d % n
    let u = exp(a, d, n);
    // a is not a witness if u - 1 = 0 or u + 1 = n
    if (u - BigInt(1) === BigInt(0) || u + BigInt(1) === n) {
        return true;
    }
    // Repeat s - 1 times
    for (let i = BigInt(0); i < s - BigInt(1); i++) {
        // u = u ^ 2 % n
        u = exp(u, BigInt(2), n);
        // a is not a witness if u = n - 1
        if (u + BigInt(1) === n) {
            return true;
        }
    }
    // a is a witness of n
    return false;
}
With this, we can finally implement Miller-Rabin. We first check a few trivial cases (2 and 3 are prime, even numbers are non-prime). We then find \(S\) and \(D\) such that our number \(N = 2^S * D + 1\) (we do this by factoring out powers of 2 from \(N - 1\)).
We then repeat the test: get a random number \(A < N\). If \(A\) is a witness of \(N\), then \(N\) is not prime. If we run this test enough times, we can safely assume the number is prime. According to this, 40 rounds should be good enough for a 1024 bit prime.
function millerRabinTest(candidate: bigint): boolean {
    // Handle some obvious cases
    if (candidate === BigInt(2) || candidate === BigInt(3)) {
        return true;
    }
    if (candidate % BigInt(2) === BigInt(0) || candidate < BigInt(2)) {
        return false;
    }
    // Find s and d
    let d = candidate - BigInt(1);
    let s = BigInt(0);
    while ((d & BigInt(1)) === BigInt(0)) {
        d = d >> BigInt(1);
        s++;
    }
    // Test 40 rounds.
    for (let k = 0; k < 40; k++) {
        let a = randBigInt() % candidate;
        if (!isNotWitness(a, d, s, candidate)) {
            return false;
        }
    }
    return true;
}
Note that d and s above are technically only needed in isNotWitness(), but since they are based on our prime candidate, we compute them once and pass them as arguments to isNotWitness() rather than recomputing them on each call of the function.
We can finally implement our prime generator. We simply generate large numbers and repeat until Miller-Rabin confirms we got a prime number:
function randPrime(sizeInBytes: number = 128): bigint {
    let candidate = BigInt(0);
    do {
        candidate = randBigInt(sizeInBytes);
    } while (!millerRabinTest(candidate));
    return candidate;
}
With the low-level math out of the way, we can implement the cryptography API.
First, we will define an SRAKeyPair as consisting of the initial large prime \(P\) and the derived \(E\) and \(D\) used for encryption/decryption:
type SRAKeyPair = {
    prime: bigint;
    enc: bigint;
    dec: bigint;
};
We can generate a large prime using randPrime(). From such a prime, we can generate an SRAKeyPair:
function generateKeyPair(largePrime: bigint, size: number = 128): SRAKeyPair {
    const phi = largePrime - BigInt(1);
    let enc = BigInt(0);
    // Trial and error
    for (;;) {
        // Generate a large prime
        enc = randPrime(size);
        // Stop when generated prime and passed in prime - 1 are coprime
        if (gcd(enc, phi) === BigInt(1)) {
            break;
        }
    }
    // enc is our encryption key, now let's find dec as the mod inverse of enc
    let dec = modInverse(enc, phi);
    return {
        prime: largePrime,
        enc: enc,
        dec: dec,
    };
}
If we have an SRAKeyPair, we can encrypt/decrypt numbers using the modulo exponentiation function we defined above (exp()):
function encryptInt(n: bigint, kp: SRAKeyPair) {
    return exp(n, kp.enc, kp.prime);
}

function decryptInt(n: bigint, kp: SRAKeyPair) {
    return exp(n, kp.dec, kp.prime);
}
We can also convert a string into a BigInt and vice-versa. Assuming we only have character codes below 256 (so ASCII), we can simply encode the string as a 256-base number where each digit is a character:
function stringToBigInt(str: string): bigint {
    let result = BigInt(0);
    for (const c of str) {
        if (c.charCodeAt(0) > 255) {
            throw Error(`Unexpected char code ${c.charCodeAt(0)} for ${c}`);
        }
        result = result * BigInt(256) + BigInt(c.charCodeAt(0));
    }
    return result;
}
The ASCII assumption is reasonable, since we use this at a protocol level, not as part of the user experience. We can decode such a number back into a string using division and modulo:
function bigIntToString(n: bigint): string {
    let result = "";
    let m = BigInt(0);
    while (n > 0) {
        [n, m] = [n / BigInt(256), n % BigInt(256)];
        result = String.fromCharCode(Number(m)) + result;
    }
    return result;
}
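The encoding round-trip is easy to trace by hand on a short string:

```typescript
// "Hi" encoded as a base-256 number and decoded back.
let n = 0n;
for (const c of "Hi") {
    n = n * 256n + BigInt(c.charCodeAt(0)); // "H" = 72, "i" = 105
}
console.log(n); // 18537n, i.e. 72 * 256 + 105

let out = "";
let rest = n;
while (rest > 0n) {
    out = String.fromCharCode(Number(rest % 256n)) + out;
    rest /= 256n;
}
console.log(out); // "Hi"
```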
Now that we have these conversions, we can implement string encryption/decryption on top of our encryptInt() and decryptInt() functions:
function encryptString(clearText: string, kp: SRAKeyPair): string {
    return bigIntToString(encryptInt(stringToBigInt(clearText), kp));
}

function decryptString(cypherText: string, kp: SRAKeyPair): string {
    return bigIntToString(decryptInt(stringToBigInt(cypherText), kp));
}
We can encode any object as a string (and decode back strings to objects):
function encrypt<T>(obj: T, kp: SRAKeyPair): string {
    return encryptString(JSON.stringify(obj), kp);
}

function decrypt<T>(cypherText: string, kp: SRAKeyPair): T {
    return JSON.parse(decryptString(cypherText, kp));
}
And that's it! We start with randPrime() to generate a large prime, then use generateKeyPair() to derive \(E\) and \(D\) from it. We can then use this SRAKeyPair with encrypt() and decrypt() to encrypt/decrypt objects using the commutative SRA algorithm.
Here is a small example pulling everything together:
// Seed prime used by both players to generate keys
const sharedPrime = randPrime();
const aliceKP = generateKeyPair(sharedPrime);
const bobKP = generateKeyPair(sharedPrime);
const card = "Ace of spades";
// Encrypt with Alice's key first, then Bob's
const aliceEncrypted = encryptString(card, aliceKP);
const aliceAndBobEncrypted = encryptString(aliceEncrypted, bobKP);
// Decrypt with Alice's key first, then Bob's
const bobEncrypted = decryptString(aliceAndBobEncrypted, aliceKP);
const decrypted = decryptString(bobEncrypted, bobKP);
// Prints "Ace of spades"
console.log(decrypted);
To recap, we looked at:
- The SRA commutative encryption algorithm.
- BigInt implementations for GCD, modulo inverse, and modulo exponentiation.
- Encrypting/decrypting BigInt values, and more generally any object by stringifying it.

My work-in-progress Mental Poker Toolkit is here. This post covered the cryptography package.
I wrote previously about Mental Poker, how one can set up a game in a zero trust environment, and how this could be implemented using Fluid Framework.
Since the previous post, I spent some more time prototyping an implementation with a colleague and did a tech talk about it.
If you haven't read the previous post and are not familiar with Mental Poker, the following won't make much sense. Please start there or by watching the tech talk video.
The implementation consists of a few components:
At the time of writing, the append-only list distributed data structure is ready, available on my GitHub as fluid-ledger and published on npm.
The other components will all eventually end up in the mental-poker-toolkit repo.
Some parts, like cryptography and the game client, I cleaned up and moved from a private hackathon repo. Other parts, like the state machine, require major rework, which I haven't gotten around to yet.
The plan is to provide a quality implementation with good documentation and samples. A major difference between the hackathon proof of concept and this is that the proof of concept implements a simple discard game while I'm hoping the toolkit can support games with more than two players.
Modeling a game like Poker is non-trivial. That said, a big part of the complexity comes from the rules of the game itself. For a proof of concept of Mental Poker, we didn't want to get into the weeds of Poker rules, but rather showcase the key ideas: how two players can shuffle a deck of cards and agree on what order the cards end up in, while at the same time each maintaining some private state (cards in hand) - all of this done over a public channel (Fluid Framework).
The game we modeled was simple: players draw a hand of cards, then take turns discarding by number or suit. If a player can't discard (no matching number or suit), they draw cards until they can discard. The player who discards their whole hand first wins.
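The discard rule itself fits in a one-line predicate. The Card shape below is illustrative, not the toolkit's actual representation:

```typescript
// Illustrative card type; not the actual toolkit representation.
type Card = { suit: "hearts" | "spades" | "diamonds" | "clubs"; rank: number };

// A card can be discarded if it matches the top of the pile by suit or rank.
const canDiscard = (hand: Card[], top: Card): boolean =>
    hand.some((c) => c.suit === top.suit || c.rank === top.rank);

const hand: Card[] = [{ suit: "hearts", rank: 7 }, { suit: "clubs", rank: 2 }];
console.log(canDiscard(hand, { suit: "spades", rank: 7 })); // true (rank match)
console.log(canDiscard(hand, { suit: "diamonds", rank: 9 })); // false
```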
This prototype informed the components we had to build.
Fluid Framework does not offer out of the box a data structure like the one needed to model a sequence of moves. We ended up using SharedObjectSequence, a data structure that was marked as deprecated and since removed from Fluid. In general, the Fluid data structures that support lists are overkill for Mental Poker, as they support insertion and deletion of sequences of items at arbitrary positions. For modeling a game, we just need an append-only list - players take turns and each move means appending something to the end of the list.
In fact, having an append-only list ensures that we don't run into issues like a client unexpectedly inserting something in the middle of the list, which doesn't make sense if we're modeling a sequence of moves in a game.
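The contract we need is tiny. Here is a local, non-distributed sketch - the real fluid-ledger data structure also synchronizes appends across clients, and its API may differ:

```typescript
// Local sketch of an append-only list; the real fluid-ledger DDS also
// synchronizes appends across clients via Fluid's ordering service.
class AppendOnlyList<T> {
    private items: T[] = [];
    append(item: T): void {
        this.items.push(item); // the only mutation allowed - no inserts/deletes
    }
    at(index: number): T | undefined {
        return this.items[index];
    }
    get length(): number {
        return this.items.length;
    }
}

const moves = new AppendOnlyList<string>();
moves.append("Alice: draw card");
moves.append("Bob: post key");
console.log(moves.length); // 2
```

Because the only mutation is append, the move history is tamper-evident by construction: nothing can be inserted into or removed from the past.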
I was also not able to find a package providing commutative encryption. This is a key requirement for the Mental Poker protocol but industry standard cryptography algorithms do not have this property. I ended up implementing the SRA algorithm from scratch, including a bunch of BigInt math. I still strongly believe in the don't roll your own crypto rule, so please do not use my implementation to play Poker for real money.
Besides encryption, we also need digital signatures. When a player joins a game, they generate a public/private key pair and their first action is to post their public key. All subsequent moves from that player are signed with the private key, so other players can ensure each action is taken by the player claiming to take it, eliminating spoofing. Fortunately, we were able to use Crypto.subtle for this (see the Web Crypto API).
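The flow can be sketched with Node's crypto module standing in for Crypto.subtle; the curve choice and the message below are illustrative:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// A player generates a key pair on join; the public key is posted first.
const { publicKey, privateKey } = generateKeyPairSync("ec", {
    namedCurve: "prime256v1",
});

// Every subsequent move is signed with the private key...
const move = Buffer.from("Alice: discard 7 of hearts");
const signature = sign("sha256", move, privateKey);

// ...so other players can verify who really took the action.
console.log(verify("sha256", move, publicKey, signature)); // true
```

A forged or tampered move fails verification against the posted public key, which is all we need to rule out spoofing over the public channel.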
Another interesting discovery was the state machine. A high-level game move, like I'm drawing a card from the top of the pile translates into a message exchange between the players:
- Alice: I'm drawing a card from the top of the pile.
- Bob: Here is my key for that card.

Shuffling cards, as described in the previous blog post, involves a longer sequence of steps. We needed a way to express I do this, then I expect the other player to reply with that. We can use such a state machine to express sequences of multiple moves to implement things like card shuffling.
The proof of concept state machine uses a queue of expected moves from the other player to implement the game mechanics and Mental Poker protocol. For example, for the Discard game, if it is the other player's turn, we expect two things can happen: they either discard a card or draw a card.
If they discard a card, then they publish their encryption key for the card which we can use to see the card (again, please refer to the previous Mental Poker post for details on the protocol). Alternately, if they can't discard a card, they need to draw a card, in which case we have to hand over our encryption key for the card on top of the deck.
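A minimal sketch of such an expected-moves queue follows; the shapes are illustrative and the real state machine is more involved:

```typescript
// Illustrative move and expectation shapes.
type Move = { player: string; action: "discard" | "draw" };
type Expectation = (move: Move) => boolean;

class ExpectedMoves {
    private queue: Expectation[] = [];
    expect(e: Expectation): void {
        this.queue.push(e);
    }
    // An incoming move must satisfy the next queued expectation.
    accept(move: Move): boolean {
        const e = this.queue.shift();
        return e !== undefined && e(move);
    }
}

const sm = new ExpectedMoves();
// On the other player's turn, we expect either a discard or a draw.
sm.expect((m) => m.player === "Bob" &&
    (m.action === "discard" || m.action === "draw"));
console.log(sm.accept({ player: "Bob", action: "draw" })); // true
console.log(sm.accept({ player: "Bob", action: "draw" })); // false (queue empty)
```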
Some of the rules captured in this state machine are specific to each game implemented. Others, though, are simply steps in the Mental Poker protocol: things like shuffling, drawing cards etc. are all modeled as actions I take and actions I expect the other player to follow up with. I envision expressing such known sequences as recipes, building blocks for games.
As I mentioned before, the proof of concept state machine implementation requires some major rework. It needs to scale from two players to an arbitrary number of players, and needs to support recipes, which it currently doesn't. At the time of writing, this is one of the biggest chunks of pending work, and considering this is a hobby project I work on when time permits, I currently don't have a good sense of when I'll finish this. That said, a bunch of pieces are already in decent shape and public, so I plan to write about them while I continue working on finishing the toolkit.
In upcoming blog posts, I plan to cover the various pieces discussed above. The components address different problems, and I find all of them quite interesting. The problem space includes understanding how Fluid Framework distributed data structures work internally, how to generate large prime numbers, and how to model expected sequences of moves in a game among other things.
This post outlines the high level framing of the project. Following posts will dive deep into specific aspects.
In terms of applications, as I mention in the tech talk, the term games is pretty broad - we're not talking only about card games, but things like auctions, lotteries, blind voting etc. All of these can be implemented using Mental Poker as decentralized, zero-trust games.
I've been having fun solving Advent of Code problems every December for a few years now. Advent of Code is an advent calendar of programming puzzles.
All my solutions are on my GitHub here. First, a quick disclaimer:
Disclaimer on my solutions
I use Python because I find it easiest for this type of coding. I treat solving these as a write-only exercise. I do it for the problem-solving bit, so I don't comment the code & once I find the solution I consider it done - I don't revisit and try to optimize even though sometimes I strongly feel like there is a better solution. I don't even share code between part 1 and part 2 - once part 1 is solved, I copy/paste the solution and change it to solve part 2, so each can be run independently. I also rarely use libraries, and when I do it's some standard ones like re, itertools, or math. The code has no comments and is littered with magic numbers and strange variable names. This is not how I usually code, rather my decadent holiday indulgence. I wasn't thinking I would end up writing a blog post discussing my solutions, so I would like to apologize for the code being hard to read.
With that long disclaimer out of the way, let's talk Advent of Code 2022. I figured I'll cover a few problems that seemed interesting to me during this round, before they fade in my memory. The first couple of weeks are usually easy, so I'll start from day 15.
Problem statement is here.
Part 1 is pretty easy. We use taxicab geometry: for each sensor, we can find its scan radius by computing the Manhattan distance between its coordinates and the closest beacon it sees. Once we have this, we intersect each (taxicab) circle with the row y=2000000. This gives us a bunch of segments defined by (x0, x1) pairs.
import re
y, segments = 2000000, set()
for line in open('input').readlines():
    m = re.match('Sensor at x=(-?\d+), y=(-?\d+).*x=(-?\d+), y=(-?\d+)$', line)
    sx, sy, bx, by = map(int, m.groups())
    radius = abs(sx - bx) + abs(sy - by)
    if abs(sy - y) <= radius:
        segments.add((sx - (radius - abs(sy - y)),
                      sx + (radius - abs(sy - y))))
We need to figure out where these overlap so we don't double-count so for each pair of segments, if they intersect, we replace them by their union until no segments intersect anymore. Then we simply sum the length of each segment:
def intersect(s1, s2):
    return s1[1] >= s2[0] and s2[1] >= s1[0]

def union(s1, s2):
    return (min(s1[0], s2[0]), max(s1[1], s2[1]))

done = False
while not done:
    done = True
    for s1 in segments:
        for s2 in segments:
            if s1 == s2:
                continue
            if intersect(s1, s2):
                segments.remove(s1)
                segments.remove(s2)
                segments.add(union(s1, s2))
                done = False
                break
        if not done:
            break
print(sum([s[1] - s[0] for s in segments]))
Part 2 is more interesting. We need to scan a quite large area (both x and y between 0 and 4000000). We know that all points except one are covered by at least one sensor. We start from (0, 0) and scan like this: for each point, find the first sensor that sees it (Manhattan distance from sensor <= sensor radius). If no sensor can see it, we found our point. Otherwise, again relying on taxicab geometry, we can tell how many additional points to the right (increasing x) are still in range of this sensor. We move x beyond these (\(x = x_{sensor} + radius - |y_{sensor} - y| + 1\)). If x goes beyond 4000000, we reset it to 0 and increment y. This is not blazingly fast, but does the job in a reasonable amount of time (around 20 seconds on my machine).
import re
sensors = []
for line in open('input').readlines():
    m = re.match('Sensor at x=(-?\d+), y=(-?\d+).*x=(-?\d+), y=(-?\d+)$', line)
    sx, sy, bx, by = map(int, m.groups())
    radius = abs(sx - bx) + abs(sy - by)
    sensors.append((sx, sy, radius))

def in_range(x, y):
    for sensor in sensors:
        if abs(sensor[0] - x) + abs(sensor[1] - y) <= sensor[2]:
            return True, sensor
    return False, None

x, y = 0, 0
while True:
    found, sensor = in_range(x, y)
    if not found:
        break
    x = sensor[0] + sensor[2] - abs(sensor[1] - y) + 1
    if x > 4_000_000:
        x = 0
        y += 1
print(x * 4_000_000 + y)
Problem statement is here.
Part 1 is again pretty easy: we can model the valves and tunnels as a graph, then use the Floyd-Warshall algorithm to find the distances between each pair of valves:
import re
dist, flows, to_open = {}, {}, set()
for line in open('input').readlines():
    m = re.match(
        'Valve (\w+) has flow rate=(\d+); tunnels? leads? to valves? (.*)$', line)
    src, flow, *dst = m.groups()
    dst = [d.strip() for d in dst[0].split(',')]
    dist[src] = {d: 1 for d in dst} | {src: 0}
    flows[src] = int(flow)
    if flows[src] > 0:
        to_open.add(src)

for i in dist:
    for j in dist:
        if j not in dist[i]:
            dist[i][j] = 1000

for k in dist:
    for i in dist:
        for j in dist:
            if dist[i][j] > dist[i][k] + dist[k][j]:
                dist[i][j] = dist[i][k] + dist[k][j]
We can then search for the best solution recursively: we start from AA and keep track of which valves we opened (none for starters). Then at each step, we pick one of the unopened valves. If we have enough time to reach it, we recurse with updated location and set of opened valves. We also compute the total pressure released so far at each step and keep track of the highest value we found. This gives us the solution.
best = 0

def search(current='AA', opened=set(), time=30, score=0):
    global best
    score += time * flows[current]
    if score >= best:
        best = score
    for node in to_open - opened:
        if time - dist[current][node] - 1 >= 0:
            search(node, opened | {node}, time -
                   dist[current][node] - 1, score)

search()
print(best)
Part 2 is more fun. We now have an elephant to help us, which makes it a bit more complicated. My solution now keeps track of a few more things: which valve I am headed to and how many more minutes I have to get there; which valve the elephant is headed to and how many more minutes until it gets there. We both start at AA with an ETA of 0. Then for each node, if my ETA is 0, I'll be heading that way. If not, the elephant will be heading there. But since we're dealing with two ETAs, we need to figure out which of us will reach their destination first, and recurse to that time.
best = 0

def search(me=('AA', 0), elephant=('AA', 0), opened=set(), time=26, score=0):
    global best
    if score > best:
        best = score
    for node in to_open - opened:
        me_next, elephant_next, score_next = me, elephant, score
        if me[1] == 0:
            me_next = (node, dist[me[0]][node] + 1)
            score_next += (time - dist[me[0]][node] - 1) * flows[node]
        else:
            elephant_next = (node, dist[elephant[0]][node] + 1)
            score_next += (time - dist[elephant[0]][node] - 1) * flows[node]
        dt = min(me_next[1], elephant_next[1])
        me_next = (me_next[0], me_next[1] - dt)
        elephant_next = (elephant_next[0], elephant_next[1] - dt)
        if time - dt >= 0:
            search(me_next, elephant_next, opened |
                   {node}, time - dt, score_next)

search()
print(best)
This works but takes a long time, so I added some caching: since both the elephant and I move around a bunch, we can cache the score for each combination of my destination and ETA, the elephant's destination and ETA, and the time. If at a given minute, both the elephant and I were already in this situation but with a better score, we no longer need to keep searching this branch as we already found a better solution. This prunes enough of the search tree to easily find the answer. Updated search with cache:
best = 0
cache = {}

def search(me=('AA', 0), elephant=('AA', 0), opened=set(), time=26, score=0):
    global best
    if score > best:
        best = score
    key = str(me) + str(elephant) + str(time)
    if key in cache:
        if cache[key] >= score:
            return
    cache[key] = score
    for node in to_open - opened:
        me_next, elephant_next, score_next = me, elephant, score
        if me[1] == 0:
            me_next = (node, dist[me[0]][node] + 1)
            score_next += (time - dist[me[0]][node] - 1) * flows[node]
        else:
            elephant_next = (node, dist[elephant[0]][node] + 1)
            score_next += (time - dist[elephant[0]][node] - 1) * flows[node]
        dt = min(me_next[1], elephant_next[1])
        me_next = (me_next[0], me_next[1] - dt)
        elephant_next = (elephant_next[0], elephant_next[1] - dt)
        if time - dt >= 0:
            search(me_next, elephant_next, opened |
                   {node}, time - dt, score_next)

search()
print(best)
Problem statement is here.
For part 1 we can simply simulate the falling blocks and find the answer. This gives us some of the building blocks needed for part 2.
jets = open('input').read()
rocks = [{(0, 0), (1, 0), (2, 0), (3, 0)},
         {(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)},
         {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)},
         {(0, 0), (0, 1), (0, 2), (0, 3)},
         {(0, 0), (0, 1), (1, 0), (1, 1)}]
grid = set({(i, 0) for i in range(1, 8)})

def intersects(rock, grid):
    for block in rock:
        if block in grid or block[0] <= 0 or block[0] >= 8:
            return True
    return False

def move(rock, dx, dy):
    return {(i + dx, j + dy) for i, j in rock}

rock_i, jet_i = 0, 0
for _ in range(2022):
    top = max(grid, key=lambda pt: pt[1])[1]
    rock = move(rocks[rock_i], 3, top + 4)
    while True:
        new_pos = move(rock, 1 if jets[jet_i] == '>' else -1, 0)
        jet_i += 1
        if jet_i == len(jets):
            jet_i = 0
        if not intersects(new_pos, grid):
            rock = new_pos
        new_pos = move(rock, 0, -1)
        if intersects(new_pos, grid):
            break
        rock = new_pos
    grid |= rock
    rock_i += 1
    if rock_i == len(rocks):
        rock_i = 0
print(max(grid, key=lambda pt: pt[1])[1])
Part 2 makes it obvious simulating everything is not an option, as we need to look at a thousand billion rocks. The key here is to find a pattern: we are bound to end up simulating the same rock and initial move instruction over and over. If we do, and we see the same gain in height between repeats, it means we found our repeating pattern. We know that starting from this position, we have a period of length period in which our tower of rocks grows by growth. We subtract the number of rocks we already simulated from 1000000000000, divide by period, and multiply by growth. We'll call this delta_top.

We are close to the final answer. The only thing left to do is simulate a few more steps: 1000000000000 minus the number of rocks we already simulated, modulo period. Now we get the height of the top of the tower we simulated and add delta_top to it to find the final answer.
def top():
    return max(grid, key=lambda pt: pt[1])[1]

rock_i, jet_i = 0, 0
cache, delta_top = {}, 0
i = 0
while i < 10_000:
    rock = move(rocks[rock_i], 3, top() + 4)
    while True:
        new_pos = move(rock, 1 if jets[jet_i] == '>' else -1, 0)
        jet_i += 1
        if jet_i == len(jets):
            jet_i = 0
        if not intersects(new_pos, grid):
            rock = new_pos
        new_pos = move(rock, 0, -1)
        if intersects(new_pos, grid):
            break
        rock = new_pos
    grid |= rock
    rock_i += 1
    if rock_i == len(rocks):
        rock_i = 0
    i += 1
    if not delta_top:
        if (rock_i, jet_i) not in cache:
            cache[(rock_i, jet_i)] = []
        c = cache[(rock_i, jet_i)]
        c.append([i, top()])
        if len(c) > 2 and c[-1][1] - c[-2][1] == c[-2][1] - c[-3][1]:
            period, growth = c[-1][0] - c[-2][0], c[-1][1] - c[-2][1]
            delta_top = (1_000_000_000_000 - i) // period * growth
            i = 10_000 - (1_000_000_000_000 - i) % period
print(top() + delta_top)
Problem statement is here.
Part 1 is trivial so I won't discuss it here.
Part 2 is also very easy, but I found a really neat solution worth sharing: since all boulders are within (0, 0, 0) and (20, 20, 20), I look at a grid encompassing everything ((-1, -1, -1) to (21, 21, 21)) and, starting from (-1, -1, -1), flood fill. We use a queue and at each step we dequeue a triple of coordinates. If already visited or out of bounds, we ignore it and continue. Otherwise, if it is a boulder, it means we found a new unit of surface area. We mark these coordinates as visited and enqueue all the neighbors. I like how every time we run into a boulder, we get exactly the area we are looking for. The full solution is:
cubes = [tuple(map(int, l.strip().split(','))) for l in open('input').readlines()]
visited, queue, area = set(), [(-1, -1, -1)], 0
while queue:
(x, y, z) = queue.pop(0)
if (x, y, z) in visited:
continue
if not (-1 <= x <= 22 and -1 <= y <= 22 and -1 <= z <= 22):
continue
if (x, y, z) in cubes:
area += 1
continue
visited.add((x, y, z))
queue.append((x - 1, y, z))
queue.append((x + 1, y, z))
queue.append((x, y - 1, z))
queue.append((x, y + 1, z))
queue.append((x, y, z - 1))
queue.append((x, y, z + 1))
print(area)
Problem statement is here.
I used the same solution for part 1 and part 2: a recursive search where we keep track of the bots and resources we have, and the time. The problem is that it takes too long to simulate minute by minute. If we try deciding at each minute whether to build any of the bots we can build or keep collecting resources, then recurse to the next minute, we end up with too much combinatorial complexity. My solution instead does something like this: for the current moment in time, for each type of robot, say we want to build that one next - based on costs and available resources, we can calculate how many minutes from now that robot can be built. We can then recurse (jump ahead in time) to that point, updating available resources, since we know no other robots will be built until then.
As an additional optimization, we can keep track of how many geodes we collected at each minute and if our current search has fewer geodes, it means we already found a better solution and it is not worth going down this branch. There's probably smarter caching/pruning we can do but this seems to be good enough.
This tames the combinatorial complexity enough to get a reasonable run time, and going from simulating 24 minutes in part 1 to simulating 32 minutes for fewer blueprints in part 2 doesn't seem to require changing the algorithm. Both parts take around 2 minutes to run. It can probably be optimized further.
import re
import math
def run(bots, costs, resources, time):
if best[time] > resources[3]:
return
best[time] = resources[3]
if time == 0:
return
for bot_type in range(4):
dt = math.ceil((costs[bot_type][0] - resources[0]) / bots[0])
if bot_type >= 2:
if bots[bot_type - 1] == 0:
continue
dt = max(dt, math.ceil((costs[bot_type][1] -
resources[bot_type - 1]) / bots[bot_type - 1]))
dt = max(dt, 0) + 1
if time < dt:
continue
new_resources = [resources[i] + bots[i] * dt for i in range(4)]
new_resources[0] -= costs[bot_type][0]
if bot_type >= 2:
new_resources[bot_type - 1] -= costs[bot_type][1]
bots[bot_type] += 1
run(bots, costs, new_resources, time - dt)
bots[bot_type] -= 1
score = 1
for line in open('input').readlines()[:3]:
m = re.match(
'.*(\d+) ore.*(\d+) ore.*(\d+) ore and (\d+) clay.*(\d+) ore and (\d+) obsidian', line)
costs = list(map(int, m.groups()))
costs = [[costs[0]], [costs[1]], [
costs[2], costs[3]], [costs[4], costs[5]]]
best = [0] * 33
run([1, 0, 0, 0], costs, [0] * 4, 32)
score *= best[0]
print(score)
Problem statement is here.
Day 20 was very easy so I won't cover it here.
Problem statement is here.
Another easy one. For part 1, we parse the input in an expression tree (with values at leaf nodes and operators at non-leaf nodes) and we recursively evaluate it from root.
tree = {}
for line in open('input').readlines():
key, value = line.strip().split(': ')
value = value.split(' ')
if len(value) == 1:
value = int(value[0])
tree[key] = value
def get(key):
if isinstance(tree[key], int):
return tree[key]
v1, v2 = get(tree[key][0]), get(tree[key][2])
match tree[key][1]:
case '+': return v1 + v2
case '-': return v1 - v2
case '*': return v1 * v2
case '/': return v1 // v2
print(get('root'))
Part 2 effectively makes the root operator == and asks us to find the value for the humn node. For this, we can update our recursive evaluation to either compute a value or return None if humn is part of the subtree we're trying to evaluate (so if either the left or right subtree evaluates to None, return None). We add another recursive function solve() which takes a node and an expected value (we expect the node to end up equal to the value); then we can recursively solve: evaluate left and right, and depending on which of them returns None, recurse down that subtree with an updated expected value. For example, if we expect left + right to be 10 and we get 5 and None back, then we recurse down the right subtree with an expected value of 10 - left.
tree = {}
for line in open('input').readlines():
key, value = line.strip().split(': ')
value = value.split(' ')
if len(value) == 1:
value = int(value[0])
tree[key] = value
def get(key):
if tree[key] == None or isinstance(tree[key], int):
return tree[key]
v1, v2 = get(tree[key][0]), get(tree[key][2])
if v1 == None or v2 == None:
return None
match tree[key][1]:
case '+': return v1 + v2
case '-': return v1 - v2
case '*': return v1 * v2
case '/': return v1 // v2
def solve(key, eq):
if tree[key] == None:
return eq
k1, k2 = tree[key][0], tree[key][2]
v1, v2 = get(k1), get(k2)
if v1 == None:
match tree[key][1]:
case '+': return solve(k1, eq - v2)
case '-': return solve(k1, eq + v2)
case '*': return solve(k1, eq // v2)
case '/': return solve(k1, eq * v2)
if v2 == None:
match tree[key][1]:
case '+': return solve(k2, eq - v1)
case '-': return solve(k2, v1 - eq)
case '*': return solve(k2, eq // v1)
case '/': return solve(k2, v1 // eq)
tree['humn'] = None
tree['root'][1] = '-'
print(solve('root', 0))
Problem statement is here.
This one was fun but a bit tedious. Part 1 is very easy: we implement movement with wrap-around, stopping when we hit #.
import re
grid = [line.strip('\n').ljust(150, ' ') for line in open('input').readlines()]
dirs, grid = [m.group() for m in re.finditer(r'(\d+)|L|R', grid[-1])], grid[:-2]
dirs = [int(d) if str.isdecimal(d) else d for d in dirs]
facing = [(1, 0), (0, 1), (-1, 0), (0, -1)]
x, y, d = grid[0].index('.'), 0, 0
def move(x, y, d):
nx = (x + d[0]) % len(grid[0])
ny = (y + d[1]) % len(grid)
match grid[ny][nx]:
case ' ':
nx, ny = move(nx, ny, d)
return (nx, ny) if grid[ny][nx] != ' ' else (x, y)
case '#': return (x, y)
case '.': return (nx, ny)
for step in dirs:
if isinstance(step, int):
while step > 0:
x, y = move(x, y, facing[d])
step -= 1
elif step == 'L':
d = (d - 1) % 4
else:
d = (d + 1) % 4
print(1000 * (y + 1) + 4 * (x + 1) + d)
For part 2, we need to figure out how the various facets connect into a cube and map movement from one face to another. Personally, I made a paper cutout of the input shape, folded it, and used that to figure out the transitions.
The algorithm is pretty easy if the mappings are right. While on the same facet, we simply move in the direction we are supposed to move. We can encode a facet as a pair of (region_x, region_y) coordinates, where region_x, region_y = x // 50, y // 50. Of course, some pairs of coordinates are not part of any facet of the cube (e.g. (0, 0)) but that doesn't matter. Using this encoding, we can tell when a movement gets us outside the current region. When that happens, we have a helper function which figures out where we end up and what the new orientation is.
import re
grid = [line.strip('\n').ljust(150, ' ') for line in open('input').readlines()]
dirs, grid = [m.group() for m in re.finditer(r'(\d+)|L|R', grid[-1])], grid[:-2]
dirs = [int(d) if str.isdecimal(d) else d for d in dirs]
size = 50
facing = [(1, 0), (0, 1), (-1, 0), (0, -1)]
connections = {
(1, 0): [(2, 0, 0), (1, 1, 1), (0, 2, 0), (0, 3, 0)],
(2, 0): [(1, 2, 2), (1, 1, 2), (1, 0, 2), (0, 3, 3)],
(1, 1): [(2, 0, 3), (1, 2, 1), (0, 2, 1), (1, 0, 3)],
(0, 2): [(1, 2, 0), (0, 3, 1), (1, 0, 0), (1, 1, 0)],
(1, 2): [(2, 0, 2), (0, 3, 2), (0, 2, 2), (1, 1, 3)],
(0, 3): [(1, 2, 3), (2, 0, 1), (1, 0, 1), (0, 2, 3)],
}
x, y, d = grid[0].index('.'), 0, 0
def move(x, y, d):
nx = x + facing[d][0]
ny = y + facing[d][1]
nd = d
if (x // size, y // size) != (nx // size, ny // size):
nx, ny, nd = switch_region(x, y, d)
match grid[ny][nx]:
case '#': return (x, y, d)
case '.': return (nx, ny, nd)
def switch_region(x, y, d):
nrx, nry, nd = connections[(x // size, y // size)][d]
nx, ny = nrx * size, nry * size
rx, ry = x % size, y % size
if (d, nd) in [(0, 0), (1, 3), (2, 2), (3, 1)]:
return nx + size - rx - 1, ny + ry, nd
if (d, nd) in [(0, 2), (1, 1), (2, 0), (3, 3)]:
return nx + rx, ny + size - ry - 1, nd
if (d, nd) in [(0, 1), (1, 0), (2, 3), (3, 2)]:
return nx + size - ry - 1, ny + size - rx - 1, nd
if (d, nd) in [(0, 3), (1, 2), (2, 1), (3, 0)]:
return nx + ry, ny + rx, nd
for step in dirs:
if isinstance(step, int):
while step > 0:
x, y, d = move(x, y, d)
step -= 1
elif step == 'L':
d = (d - 1) % 4
else:
d = (d + 1) % 4
print(1000 * (y + 1) + 4 * (x + 1) + d)
Problem statement is here.
This is a cellular automaton. In general, when implementing cellular automata, the trick is to not change things in place, but rather use a new copy for each generation. I represented the elves as a set of (x, y) coordinates. We can use set intersection to see if an elf has other elves nearby or whether two elves would end up moving to the same spot. I won't go into more detail as this was another pretty easy problem. The code is on my GitHub.
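The set-based approach can be sketched as a standalone illustration. Note this is not my actual solution: the real Day 23 rules cycle through four proposal directions, while this sketch uses a fixed "move north" proposal just to show the pattern of building each generation in a new set.

```python
from collections import Counter

# A simplified one-generation step for a set-based cellular automaton.
# Illustration only: every crowded elf proposes moving north here, while
# the actual puzzle cycles through four proposal directions.
def neighbors(x, y):
    return {(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)} - {(x, y)}

def step(elves):
    # Each elf proposes a destination; elves with no neighbors stay put.
    proposals = {elf: elf if not (neighbors(*elf) & elves)
                 else (elf[0], elf[1] - 1)
                 for elf in elves}
    # Build the next generation as a new set instead of mutating in place;
    # a move only succeeds if no other elf proposed the same destination.
    counts = Counter(proposals.values())
    return {dest if counts[dest] == 1 else elf
            for elf, dest in proposals.items()}

print(sorted(step({(0, 0)})))          # → [(0, 0)]
print(sorted(step({(0, 0), (1, 0)})))  # → [(0, -1), (1, -1)]
```

The key point is that `step` reads only from the old set and returns a brand-new one, so all elves observe the same generation.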
Problem statement is here.
I liked this one. For both part 1 and part 2, this becomes easy to solve with a couple of interesting observations.
First, the blizzards move in a repeating pattern, so we can map which squares are occupied at a given point in time, and we know the occupancy repeats every lcm(height, width) steps, where height and width are the dimensions of the valley. We can compute this many generations and store the occupancy map in a lookup.
import math
blizzards = []
lines = [line.strip() for line in open('input').readlines()]
for y, line in enumerate(lines):
for x, c in enumerate(line):
if c in '<^>v':
blizzards.append((x, y, c))
maxx, maxy = len(lines[0]) - 1, len(lines) - 1
move = {'<': (-1, 0), '^': (0, -1), '>': (1, 0), 'v': (0, 1)}
def step(blizzards):
new = []
for b in blizzards:
x, y = b[0] + move[b[2]][0], b[1] + move[b[2]][1]
if x == 0: x = maxx - 1
if x == maxx: x = 1
if y == 0: y = maxy - 1
if y == maxy: y = 1
new.append((x, y, b[2]))
return new
def occupancy(blizzards):
return {(x, y) for x, y, c in blizzards}
steps, lcm = {}, math.lcm(maxx - 1, maxy - 1)
for i in range(lcm):
steps[i] = {(x, y) for x, y, _ in blizzards}
blizzards = step(blizzards)
Next, we can do a breadth-first search to find the shortest path from one side to the other. Since a possible move is waiting in place, it's pretty hard to find bounds for a depth-first search. On the other hand, at every step the elves can occupy one of at most height * width positions. Of course, most of these will be occupied by blizzards. So for a BFS, we start from the initial position and time (step 0) and use a queue. We pop the first move and enqueue all possible moves from this position (taking into account valley bounds and blizzard occupancy) for the next step. As long as we ensure not to enqueue duplicates, the queue stays small. Since this is BFS, as soon as the position we dequeue is our destination, we know this is the earliest we can get there.
def solve():
queue = [(1, 0, 0)]
while True:
x, y, step = queue.pop(0)
for x, y in [(x + m[0], y + m[1]) for m in move.values()] + [(x, y)]:
if (x, y) == (maxx - 1, maxy):
return step + 1
if (x, y) != (1, 0):
if x <= 0 or x >= maxx or y <= 0 or y >= maxy:
continue
if (x, y) in steps[(step + 1) % lcm]:
continue
if (x, y, step + 1) not in queue:
queue.append((x, y, step + 1))
print(solve())
The extra trips are no problem since this is very fast. The only changes I had to make from part 1 to part 2 were modifying solve() to parameterize start, destination, and initial point in time, then calling it 3 times, once for each trip:
def solve(src, dest, step):
queue = [(src[0], src[1], step)]
while True:
x, y, step = queue.pop(0)
for x, y in [(x + m[0], y + m[1]) for m in move.values()] + [(x, y)]:
if (x, y) == (dest[0], dest[1]):
return step + 1
if (x, y) != (src[0], src[1]):
if x <= 0 or x >= maxx or y <= 0 or y >= maxy:
continue
if (x, y) in steps[(step + 1) % lcm]:
continue
if (x, y, step + 1) not in queue:
queue.append((x, y, step + 1))
trip1 = solve((1, 0), (maxx - 1, maxy), 0)
trip2 = solve((maxx - 1, maxy), (1, 0), trip1)
trip3 = solve((1, 0), (maxx - 1, maxy), trip2)
print(trip3)
Problem statement is here.
Another easy one that I won't discuss in detail; we just need to implement conversion from decimal to SNAFU and back:
def to_dec(n):
digits = {'0': 0, '1': 1, '2': 2, '-': -1, '=': -2}
return sum([5 ** i * digits[d] for i, d in enumerate(n[::-1])])
def to_snafu(n):
s = ''
while n:
s = ['0', '1', '2', '=', '-'][n % 5] + s
n = n // 5 + (1 if s[0] in '-=' else 0)
return s
print(to_snafu(sum([to_dec(line.strip()) for line in open('input').readlines()])))
In Advent of Code tradition, day 25 has only 1 part.
This was another very fun set of problems and I am looking forward to Advent of Code 2023.
In the previous post, we covered lambda calculus, a computational model underpinning functional programming. In this blog post, we'll continue down the functional programming road and cover one of the oldest programming languages still in use: LISP.
LISP was originally specified in 1958 by John McCarthy and the paper describing the language was published in 1960^{1}. It became very popular in AI research and flavors of it are still in use today.
LISP has quite a unique syntax and execution model.
If we are going to talk about LISP, we need to start with symbolic expressions. Symbolic expressions, or S-expressions, are defined as:
An S-expression is either:
- an atom, or
- a pair (x . y), where x and y are S-expressions.
This very simple definition is very powerful: it allows us to represent any binary tree. Let's start with a very simple universe where the only atom is (), representing a null value. With this atom and the above definition, while we can't (easily) represent data, we can capture the shape of a binary tree. For example, the tree consisting of a root node and two leaf nodes can be represented as (() . ()).
The tree consisting of a root, a left leaf node, and a right node with two child leaf nodes would be (() . (() . ())).
If we expand the definition of atom to include numbers and basic arithmetic operators (+, -, *, /), we can represent arithmetic expressions as S-expressions. 2 + 3 can be represented as (+ . (2 . (3 . ()))). 2 * (3 + 5) can be represented as (* . (2 . ((+ . (3 . (5 . ()))) . ()))).
Note the S-expression definition only allows for values (atoms) at leaf nodes of the tree. An S-expression is either a leaf node containing a value or a non-leaf node with 2 S-expression children. That means we can't represent 2 + 3 as a tree with + at the root and 2 and 3 as its direct children, but the representation we just saw is equivalent.
S-expressions can be used to represent data. Consider a simple list 1, 2, 3, 4, 5. Much like we saw in the previous post when we looked at representing lists as lambda expressions, we can represent lists using S-expressions, using a head and a tail (recursively). The list can be represented as (1 . (2 . (3 . (4 . (5 . ()))))).
We can also represent an associative array: instead of a value, we can use a key-value pair as an S-expression ((key . value)), so we can represent the associative array { 1: 2, 2: 3, 3: 5 } as ((1 . 2) . ((2 . 3) . ((3 . 5) . ()))).
Historically, a non-atom S-expression in LISP is called a cons cell (from construction). Instead of head and tail, LISP uses car and cdr (standing for contents of the address register and contents of the decrement register, artifacts of the computer architecture on which the first flavors of LISP were implemented).
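The cons-cell model maps neatly onto Python. Here is a toy sketch (my own illustration, using 2-tuples for cells and () standing in for the null atom):

```python
# A toy cons-cell model: a cons cell is a 2-tuple, () is the null atom.
cons = lambda head, tail: (head, tail)
car = lambda cell: cell[0]   # head of the cell
cdr = lambda cell: cell[1]   # tail of the cell

# The list 1, 2, 3, 4, 5 as (1 . (2 . (3 . (4 . (5 . ())))))
lst = cons(1, cons(2, cons(3, cons(4, cons(5, ())))))
print(car(lst))       # → 1
print(car(cdr(lst)))  # → 2
```

Chasing five cdrs brings us back to the null atom (), the end of the list.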
We just saw how we can represent trees, lists, and associative arrays using S-expressions. But S-expressions aren't limited to representing data: we can also use them to represent code.
We looked at how 2 + 3
would look like as an S-expression. In fact, we can
represent any function call as an S-expression, where the left node of the root
S-expression is the function to be called and the right subtree contains the
arguments.
2 + 3
is equivalent to the function add(2, 3)
. So we can represent the
function call add(2, 3)
as the S-expression (add . (2 . (3 . ())))
.
Note we can have any number of arguments as we grow the right subtrees: sum(2, 3, 4, 5) can be represented as (sum . (2 . (3 . (4 . (5 . ()))))). If we want to pass the result of another function as an argument, say sum(2, sum(3, 4), 5), we can represent this as (sum . (2 . ((sum . (3 . (4 . ()))) . (5 . ())))).
We saw in the previous post that we can represent pretty much anything using functions. An if expression is a function if(condition, true-branch, false-branch). We can combine this with recursion to generate loops. So we have all the building blocks for a Turing-complete system.
It turns out we can represent both data and code as S-expressions. Before moving on to look at some implementation details, let's introduce some syntactic sugar.
Writing S-expressions like this can become tedious, so let's introduce some syntactic sugar. Instead of (1 . (2 . (3 . (4 . (5 . ()))))), we can write (1 2 3 4 5). We omit some of the parentheses, the concatenation symbol ., and the final (). By default, we concatenate on the right subtree. If we need to go down the left subtree, we add parentheses. So instead of representing the associative array { 1: 2, 2: 3, 3: 5 } as ((1 . 2) . ((2 . 3) . ((3 . 5) . ()))), we can more succinctly represent it as ((1 2) (2 3) (3 5)), without losing any meaning.
Similarly, (add . (2 . (3 . ()))) becomes (add 2 3) and (sum . (2 . ((sum . (3 . (4 . ()))) . (5 . ())))) becomes (sum 2 (sum 3 4) 5).
In our implementation, we will represent S-expressions as lists which can contain any number of elements. This is a more succinct representation and will make our code easier to understand.
We can now look at implementing a small LISP. We take an input string, we parse it into an S-expression, then we evaluate the S-expression and print the result.
First, the parser: we will take a string as input, split it into tokens, then parse the tokens into an S-expression.
We will transform an input string into a list of tokens by matching it with either (, ), or a string of alphanumeric characters. We'll use a regular expression for this, then extract the matched values (using match.group()) into a list:
import re
def lex(line):
return [match.group() for match in re.finditer('\(|\)|\w+', line)]
We can now transform an input like '(add 1 (add 2 3))' into the list of tokens ['(', 'add', '1', '(', 'add', '2', '3', ')', ')'] by calling lex() on it.
We need to transform this list of tokens into an S-expression. First, we need a couple of helper functions. An atom can be either a number or a symbol. We'll create one from a token using an atom() function:
def atom(value):
try:
return int(value)
except:
return value
The other helper function will yield while the head of our token list is different from ), then pop the ) token. We'll use this while parsing to iterate over the tokens after a ( and until we find the matching ):
def pop_rpar(tokens):
while tokens[0] != ')':
yield
tokens.pop(0)
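To see how this generator drives the parsing loop, here is a quick standalone demonstration (redefining pop_rpar from above, with a made-up token list):

```python
# pop_rpar yields while the head of the token list is not ')',
# then consumes the ')'.
def pop_rpar(tokens):
    while tokens[0] != ')':
        yield
    tokens.pop(0)

tokens = ['1', '2', ')', 'rest']
for _ in pop_rpar(tokens):
    tokens.pop(0)  # consume one token per iteration, like parse() does
print(tokens)  # → ['rest']
```

Each iteration of the loop removes one token; once the head is ), the generator removes it too, leaving everything after the matching parenthesis untouched.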
Parsing into an S-expression is now very simple:
- If the next token is (, we recursively parse the following tokens until we reach the matching ).
- If the next token is ), we raise an exception - this is an unmatched ).
- Otherwise, the token is an atom, so we call atom() on it.
on it.def parse(tokens):
match token := tokens.pop(0):
case '(':
return [parse(tokens) for _ in pop_rpar(tokens)]
case ')':
raise Exception('Unexpected )')
case _:
return atom(token)
That's it. If we parse the input string '(add 1 (add 2 3))' using our functions - parse(lex('(add 1 (add 2 3))')) - we will get back ['add', 1, ['add', 2, 3]].
We can now take text as input and convert it into the internal representation we discussed.
The next step is to evaluate such an S-expression and return a result. We need two pieces for this: an environment which stores built-in functions and user-defined variables, and an evaluation function which takes an S-expression and processes it using the environment.
We'll start with a simple environment with built-in support for equality, arithmetic operations and list operations:
env = {
# Equality
'eq': lambda arg1, arg2: arg1 == arg2,
# Arithmetic
'add': lambda arg1, arg2: arg1 + arg2,
'sub': lambda arg1, arg2: arg1 - arg2,
'mul': lambda arg1, arg2: arg1 * arg2,
'div': lambda arg1, arg2: arg1 / arg2,
# Lists
'cons': lambda car, cdr: [car] + cdr,
'car': lambda list: list[0],
'cdr': lambda list: list[1:],
}
Our evaluation function has a few special cases for variable definitions, quotations, and if-expressions, and is otherwise pretty straightforward:
def eval(sexpr):
# If null or number atom, return it
if sexpr == [] or isinstance(sexpr, int):
return sexpr
# If string atom, look it up in environment
if isinstance(sexpr, str):
return env[sexpr]
match sexpr[0]:
case 'def':
env[sexpr[1]] = eval(sexpr[2])
case 'quote':
return sexpr[1]
case 'if':
return eval(sexpr[2]) if eval(sexpr[1]) else eval(sexpr[3])
case call:
return env[call](*[eval(arg) for arg in sexpr[1:]])
Our evaluation works like this:
- If the first symbol is def, we add a definition to the environment.
- If the first symbol is quote, we return the second symbol unevaluated.
- If the first symbol is if, we evaluate the second symbol and if it is truthy, we evaluate the third symbol, otherwise the fourth symbol.
We're taking a bit of a shortcut here and relying on Python's notion of truthy-ness (e.g. 0 or an empty list [] is non-truthy). If needed, we can enhance our implementation with Boolean support.
We can now implement a simple read-eval-print loop (REPL):
while line := input('> '):
try:
print(eval(parse(lex(line))))
except Exception as e:
print(f'{type(e).__name__}: {e}')
We can try a few simple commands (shown below with the corresponding output):
> (def a 40)
None
> (def b 2)
None
> (add a b)
42
> (if a 1 0)
1
> (add 2 (add 3 4))
9
> (def list (cons 1 (cons 2 (cons 3 ()))))
None
> (car list)
1
> (cdr list)
[2, 3]
We can extend the environment with additional functions as needed. These represent the built-in functions of our LISP interpreter. One capability we are still missing is the ability to define custom functions at runtime. Let's extend our interpreter to support that.
A function can take any number of arguments, which should become defined in the environment while the function is executing but which don't exist outside the function. For example, if we define an addition function as add(x, y), we should be able to refer to the x and y arguments inside the body of the function but not outside of it. x and y only exist within the scope of the function.
We can add scoping to our interpreter by extending our eval to take an environment as an argument instead of always referencing our env. Then, when we create a new scope, we create a new environment to use.
For function definition, we will use the following syntax: (deffun function_name (arguments...) (body...)). deffun denotes a function definition. The second argument is the function name. The third is a list of parameters and the fourth is the body of the function, which is going to be evaluated in an environment where its arguments are defined.
We need a function factory:
def make_function(params, body, env):
return lambda *args: eval(body, env | dict(zip(params, args)))
This takes the parameters, body, and environment and returns a lambda which expects a list of arguments. Calling the lambda will invoke eval on the body. Note we extend the environment with a dictionary mapping parameters to arguments.
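The environment extension is just a dictionary union, which creates a new mapping rather than mutating the outer environment. A quick illustration (with made-up names; the | operator on dicts requires Python 3.9+):

```python
# Extending an environment for a function call without mutating it.
env = {'add': lambda x, y: x + y, 'a': 40}  # outer environment
params, args = ['x', 'y'], [2, 3]           # parameters and call arguments
local_env = env | dict(zip(params, args))   # new scope for this call

print(local_env['x'], local_env['y'])  # → 2 3
print('x' in env)                      # → False: outer scope untouched
```

This is what gives the interpreter lexical scoping: x and y exist only in the environment the function body is evaluated in.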
Let's update eval to use a parameterized environment and support the new deffun function definition capability:
def eval(sexpr, env=env):
# If number atom, return value
if isinstance(sexpr, int):
return sexpr
# If string atom, look it up in environment
if isinstance(sexpr, str):
return env[sexpr]
if sexpr == []:
return []
match sexpr[0]:
case 'def':
env[sexpr[1]] = eval(sexpr[2], env)
case 'deffun':
env[sexpr[1]] = make_function(sexpr[2], sexpr[3], env)
case 'quote':
return sexpr[1]
case 'if':
return eval(sexpr[2], env) if eval(sexpr[1], env) else eval(sexpr[3], env)
case call:
return env[call](*[eval(arg, env) for arg in sexpr[1:]])
Besides plumbing env through each eval call, we just added a deffun case where we use our function factory.
We can run our REPL again and try out the new capability:
> (deffun myadd (x y) (add x y))
None
> (myadd 2 3)
5
Here is a Fibonacci implementation, using deffun and recursion:
> (deffun fib (n) (if (eq n 0) 0 (if (eq n 1) 1 (add (fib (sub n 1)) (fib (sub n 2))))))
None
> (fib 8)
21
If n is 0, return 0; else if n is 1, return 1; else recursively call fib for n - 1 and n - 2 and add the results.
We won't provide a proof of Turing-completeness but it should be obvious that the capabilities we implemented so far are sufficient to emulate, for example, a cyclic tag system like we did in the previous post with lambdas.
The full implementation of our mini-LISP is here.
Peter Norvig wrote a much more detailed article describing a LISP implementation here.
LISP is a very interesting language as it uses the same representation for both data and code (for better or worse). Turns out binary trees (or trees if we use our syntactic sugar) are enough to represent both.
As we just saw, a core LISP runtime is fairly easy to implement and many of the more advanced features can be bootstrapped within the language itself.
Languages in the LISP family are called LISP dialects. Even though the language is many decades old, modern dialects are alive and thriving. For example, Racket and Clojure are LISP dialects.
In this post we looked at LISP:
Original paper: http://www-formal.stanford.edu/jmc/recursive.pdf. ↩
In the previous posts, we dug deeper into one particular model of computation, starting with Turing Machines in part 2, to the von Neumann computer architecture in part 6, to some of the implementation practicalities of machines - physical or virtual - in part 7.
We'll switch gears and cover another computational model this time around: lambda calculus. Lambda calculus was developed by Alonzo Church around the same time Alan Turing was proposing the Turing machine as a universal model for computation. The Church-Turing thesis^{1} establishes the equivalence between the two models - anything a Turing machine can compute can also be computed by lambda calculus.
Formally:
Lambda calculus consists of lambda terms and reductions applied to lambda terms.
The lambda terms are built with the following rules, where \(\Lambda\) is the set of all possible lambda terms:
- Variables, like \(x\), are lambda terms. \(x \in \Lambda\).
- Abstractions, \((\lambda x.M)\). This is a function definition where \(M\) is a lambda term and \(x\) becomes bound in the expression. For \(x \in \Lambda\) and \(M \in \Lambda\), \((\lambda x.M) \in \Lambda\).
- Applications, \((M \space N)\). This applies the function \(M\) to the argument \(N\), where \(M\) and \(N\) are lambda terms. For \(M \in \Lambda\) and \(N \in \Lambda\), \((M \space N) \in \Lambda\).
If a term \(y\) appears in \(M\) but is not bound, then \(y\) is free in \(M\), e.g. for \(\lambda x.y \space x\), \(x\) is bound and \(y\) is free. The reductions are:
- \(\alpha\)-equivalence: bound variables in an expression can be renamed to avoid collisions: \((\lambda x.M[x]) \rightarrow (\lambda y.M[y])\).
- \(\beta\)-reduction: bound variables in the body of an abstraction are replaced with the argument expression: \((\lambda x.t)s \rightarrow t[x := s]\).
- \(\eta\)-reduction: if \(x\) is a variable that does not appear free in the lambda term M, then \(\lambda x.(M x) \rightarrow M\). This can also be understood in terms of function equivalence: if two functions give the same result for all arguments, then the functions are equivalent.
Let's look at a few simple examples in Python:
lambda x: x
This is the identity function expressed as a lambda abstraction. In this case, x (the lambda parameter) becomes bound in the body of the lambda.
\(\alpha\)-equivalence:
lambda y: y
This is the same identity function; we're just using y instead of x to name the parameter.
For function application, we can apply the identity function to any other lambda term and get back that lambda term:
(lambda x: x)(lambda y: y)
This applies the identity function lambda x: x to the argument lambda y: y, which will give us back lambda y: y.
Based on the above definition, lambda calculus consists exclusively of lambda terms - while (lambda x: x)(10) is valid Python code, applying an identity lambda to the number 10, lambda calculus does not have a number 10. Enter Church encoding: Alonzo Church came up with a way to encode logic values and numbers as lambda terms.
Let's start with Boolean logic: TRUE is defined as \(T := (\lambda x.\lambda y.x)\), FALSE is defined as \(F := (\lambda x.\lambda y.y)\).
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
Note with this definition, if we apply a first argument to TRUE, and a second argument to the returned lambda, we always get back the first argument. For FALSE, we always get back the second argument.
We can define IF as \(IF := (\lambda x.x)\). This is the same as the identity function.
IF = lambda x: x
This works since we defined TRUE to always return the first argument and FALSE to always return the second argument. So when we call IF(c)(x)(y), if c is TRUE, we get back x (the if-branch), otherwise we get back y (the else-branch).
We can try this out (though again this is outside of lambda calculus, we are introducing numbers for clarity):
IF(TRUE)(1)(2) # This returns 1
IF(FALSE)(1)(2) # This returns 2
Now that we can express if-then-else, we can easily express other logic operators. Negation is \(\lambda x.(x \space F \space T)\).
NOT = lambda x: x(FALSE)(TRUE)
If x is TRUE, we get back the first argument, FALSE; if x is FALSE, we get back the second argument, TRUE.
x AND y can be expressed as if x then y else FALSE, or: \(\lambda x.\lambda y.(x \space y \space F)\). x OR y can be expressed as if x then TRUE else y, or \(\lambda x.\lambda y.(x \space T \space y)\).
AND = lambda x: lambda y: x(y)(FALSE)
OR = lambda x: lambda y: x(TRUE)(y)
Here are a few examples:
print(AND(TRUE)(TRUE) == TRUE) # prints True
print(AND(TRUE)(FALSE) == TRUE) # prints False
print(OR(TRUE)(FALSE) == TRUE) # prints True
print(NOT(FALSE) == TRUE) # prints True
Using only lambda terms, we were able to implement Boolean logic! But Church encoding goes further - we can also represent natural numbers and arithmetic as lambda terms.
Alonzo Church encoded numbers as applications of a function \(f\) to a term \(x\).
- 0 means applying \(f\) 0 times to the term: \(0 := \lambda f.\lambda x.x\).
- 1 means applying \(f\) once to the term: \(1 := \lambda f.\lambda x.f \space x\).
- 2 means applying \(f\) twice: \(2 := \lambda f.\lambda x.f (f \space x)\).
In general, the number n is represented by n applications of f: \(n := \lambda f.\lambda x.f (f (... (f \space x) ...))\) or \(n := \lambda f.\lambda x.f^n(x)\).
In Python:
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))
...
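To peek at Church numerals from Python (again stepping outside pure lambda calculus), we can apply a numeral to an ordinary increment function starting from 0 — the helper name to_int is ours, not part of the encoding:

```python
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))

# A Church numeral applies f n times, so applying "+1" n times to 0
# recovers the ordinary integer n
to_int = lambda n: n(lambda x: x + 1)(0)

print(to_int(TWO))  # prints 2
```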
Note ZERO is the same as FALSE. With this definition of numbers, we can define the successor function SUCC as a function that takes a number n (represented with our Church encoding), the function f, the term x, and applies f one more time: \(SUCC := \lambda n.\lambda f.\lambda x.f (n f x)\).
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
We can define addition as \(PLUS := \lambda m.\lambda n.m \space SUCC \space n\).
Since we define a number as repeatedly applying a function, we express m + n as applying the successor function SUCC m times to n.
PLUS = lambda m: lambda n: m(SUCC)(n)
We can similarly define multiplication as repeated addition, applying \(PLUS \space n\) \(m\) times to \(0\): \(MUL := \lambda m.\lambda n.m \space (PLUS \space n) \space 0\).
MUL = lambda m: lambda n: m(PLUS(n))(ZERO)
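We can sanity-check the arithmetic by converting results back to Python integers. The to_int helper is our own addition, outside lambda calculus; note that multiplication here is expressed as m applications of PLUS n to ZERO:

```python
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))

SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
PLUS = lambda m: lambda n: m(SUCC)(n)
# m * n as adding n to zero, m times
MUL = lambda m: lambda n: m(PLUS(n))(ZERO)

# Convert a Church numeral back to a Python integer
to_int = lambda n: n(lambda x: x + 1)(0)

THREE = SUCC(TWO)
print(to_int(PLUS(TWO)(THREE)))  # prints 5
print(to_int(MUL(TWO)(THREE)))   # prints 6
```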
We'll stop here with arithmetic, but this should hopefully give you a sense of the expressive power of lambda calculus.
Some well-known lambda terms are called combinators: the identity combinator \(I := \lambda x.x\), the constant combinator \(K := \lambda x.\lambda y.x\), and the substitution combinator \(S := \lambda x.\lambda y.\lambda z.x z (y z)\).
In Python:
I = lambda x: x
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
It turns out these 3 combinators can together express any lambda term. The SKI combinators form the simplest programming language, since they can express anything expressible in lambda calculus, which we know is Turing-complete.
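As a small illustration of this expressive power, the identity combinator itself is derivable from S and K alone: \(S \space K \space K \space x = K x (K x) = x\). Again using Python values for clarity:

```python
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))

# S K K behaves exactly like the identity combinator I:
# S K K x = K x (K x) = x
IDENTITY = S(K)(K)

print(IDENTITY(42))  # prints 42
```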
Another interesting combinator is the \(Y\) combinator. In lambda calculus, there is no way for a function to reference itself: within the body of a lambda like lambda x: ... we can refer to the bound term x, but we cannot reference the lambda itself. The implication is that we can't define, using this syntax, self-referential functions. We can only pass functions as arguments. How can we then implement recursion? With the \(Y\) combinator, of course.
Let's take an example: we can recursively define factorial as:
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)
This works, but note we reference fact() within its body. In lambda calculus we can't do that.
The \(Y\) combinator is defined as \(Y := \lambda f.(\lambda x.f (x x))(\lambda x.f (x x))\).
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(lambda z: x(x)(z)))
Note the Python implementation is slightly different than the mathematical definition. This has to do with the way in which Python evaluates arguments. We won't go into the details here, but consider this a Python implementation detail irrelevant to the lambda calculus discussion^{2}.
Here is a lambda version of factorial:
FACT = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
With this definition, we pass the function to call as an argument (f). We could fully express this in lambda calculus (using Church numerals, arithmetic, and logic), but we'll keep the example simple. We can then use the \(Y\) combinator like this:
print(Y(FACT)(5)) # prints 120
This should give you an intuitive understanding of how the \(Y\) combinator works: we pass it our function and argument, and it enables the recursion mechanism.
We can similarly implement Fibonacci as:
FIB = lambda f: lambda n: 1 if n <= 2 else f(n - 1) + f(n - 2)
print(Y(FIB)(10)) # prints 55
The powerful \(Y\) combinator can be used to define recursive functions in programming languages that don't natively support recursion.
Let's also look at how we can express lists in lambda calculus. Let's start with pairs. We can define a pair as \(PAIR := \lambda x.\lambda y.\lambda f. f x y\). We can extract the first element of a pair with \(FIRST := \lambda p. p \space T\) and the second one with \(SECOND := \lambda p.p \space F\).
PAIR = lambda x: lambda y: lambda f: f(x)(y)
FIRST = lambda p: p(TRUE)
SECOND = lambda p: p(FALSE)
print(FIRST(PAIR(10)(20))) # prints 10
print(SECOND(PAIR(10)(20))) # prints 20
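Pairs compose nicely with our Booleans. For example (a helper of our own, not from the definitions above), swapping a pair just rebuilds it with the selectors flipped:

```python
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
PAIR = lambda x: lambda y: lambda f: f(x)(y)
FIRST = lambda p: p(TRUE)
SECOND = lambda p: p(FALSE)

# SWAP := λp.PAIR (SECOND p) (FIRST p)
SWAP = lambda p: PAIR(SECOND(p))(FIRST(p))

print(FIRST(SWAP(PAIR(10)(20))))  # prints 20
```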
We can define a NULL value as \(NULL := \lambda x.T\) and a test for NULL as \(ISNULL := \lambda p.p (\lambda x.\lambda y.F)\).
NULL = lambda x: TRUE
ISNULL = lambda p: p(lambda x: lambda y: FALSE)
We can now define a linked list as either NULL (an empty list) or as a pair consisting of a head element and a tail list.
We can get the head of the list using FIRST and the tail using SECOND. Given list \(L\), we can prepend an element \(x\) by forming the pair \((x, L)\).
HEAD = FIRST
TAIL = SECOND
PREPEND = lambda x: lambda xs: PAIR(x)(xs)
We can build a list by prepending elements to NULL, and traverse it using HEAD and TAIL:
# Build the list [10, 20, 30]
L = PREPEND(10)(PREPEND(20)(PREPEND(30)(NULL)))
print(HEAD(TAIL(L))) # prints 20
Appending is more interesting: if our list is represented as a pair of head and tail, we need to traverse the list until we reach the end. This sounds a lot like a recursive function: appending x to xs entails returning the pair PAIR(x)(NULL) if xs is NULL, else the pair PAIR(HEAD(xs))(APPEND(TAIL(xs))(x)). Fortunately, we just looked at the \(Y\) combinator which allows us to express this.
Here is a simplified, readable implementation, using Python tuples:
_append = lambda f: lambda xs: lambda x: \
    (x, None) if not xs else (xs[0], f(xs[1])(x))
append = Y(_append)
print(append(append(append(None)(10))(20))(30))
# This will print (10, (20, (30, None)))
We can express the same using the lambdas we defined above (NULL, ISNULL, PAIR, HEAD, TAIL):
_APPEND = lambda f: lambda xs: lambda x: \
    ISNULL(xs) \
        (lambda _: PAIR(x)(NULL)) \
        (lambda _: PAIR(HEAD(xs))(f(TAIL(xs))(x))) \
        (TRUE)
APPEND = Y(_APPEND)
L = APPEND(APPEND(APPEND(NULL)(10))(20))(30)
print(HEAD(L)) # prints 10
print(HEAD(TAIL(L))) # prints 20
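To inspect such encoded lists from Python, a small helper of our own (outside lambda calculus) can walk the pairs — it relies on ISNULL returning our actual TRUE object for NULL:

```python
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
PAIR = lambda x: lambda y: lambda f: f(x)(y)
FIRST = lambda p: p(TRUE)
SECOND = lambda p: p(FALSE)
NULL = lambda x: TRUE
ISNULL = lambda p: p(lambda x: lambda y: FALSE)
HEAD, TAIL = FIRST, SECOND
PREPEND = lambda x: lambda xs: PAIR(x)(xs)

def to_pylist(xs):
    # Walk the encoded list, collecting heads until we hit NULL
    result = []
    while ISNULL(xs) != TRUE:
        result.append(HEAD(xs))
        xs = TAIL(xs)
    return result

L = PREPEND(10)(PREPEND(20)(PREPEND(30)(NULL)))
print(to_pylist(L))  # prints [10, 20, 30]
```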
We covered logic, arithmetic, combinators, pairs, and lists, all expressed as lambda terms. Let's also sketch a proof of Turing completeness, like we did in previous posts.
We're calling this a sketch, as lambda notation is not easy to read. We will instead look at an implementation using more Python syntax than just lambdas, but we will only use constructs which we know can be expressed in lambda calculus.
As usual, we will emulate another system which we know to be Turing-complete.
In part 3 we looked at tag systems. We talked about cyclic tag systems, which can emulate m-tag systems, which are Turing-complete. As a reminder, a cyclic tag system is implemented as a set of binary strings (strings containing only 0s and 1s) which are production rules, and we process a binary input string by popping the head of the string and, if it is equal to 1, appending the current production rule to the string. We cycle through the production rules at each step. This is the code we used in the previous post:
def cyclic_tag_system(productions, string):
    # Keeps track of current production
    i = 0
    # Repeat until the string is empty
    while string:
        string = string[1:] + (productions[i] if string[0] == '1' else '')
        # Update current production
        i = i + 1
        if i == len(productions):
            i = 0
        yield string
We used the productions 11, 01, and 00 and the input 1:
productions = ['11', '01', '00']
string = '1'
print(string)
for string in cyclic_tag_system(productions, string):
    print(string)
Let's sketch an alternative implementation using the constructs we covered in this post.
First, we can describe our production rules as lists of Boolean values. We know how to represent Boolean values (TRUE and FALSE), and how to build a list using PAIR. Our productions can be represented as:
p1 = (True, (True, None)) # PAIR(TRUE)(PAIR(TRUE)(NULL))
p2 = (False, (True, None)) # PAIR(FALSE)(PAIR(TRUE)(NULL))
p3 = (False, (False, None)) # PAIR(FALSE)(PAIR(FALSE)(NULL))
productions = (p1, (p2, (p3, None)))
We can cycle through the list by processing the head, then appending it to the tail of the list. Here are simpler implementations of our list processing functions over Python tuples (though we know how to do these using only lambda terms):
def head(p):
    return p[0]

def tail(p):
    return p[1]

def append(xs, x):
    return (x, None) if not xs else (head(xs), append(tail(xs), x))

# If we want to cycle through our productions, we can do:
# productions = append(tail(productions), head(productions))
We'll also need a function to concatenate two lists. We can easily build this on top of append():
def concat(xs, ys):
    return xs if not ys else concat(append(xs, head(ys)), tail(ys))
While we still have ys, we append the head of ys to xs, then recurse with the tail of ys.
We process our input string as follows: if it is empty, we are done. If not, if the head is 1, we concatenate our current production to the end of the string, and recurse, cycling productions:
def cyclic_tag_system(productions, input):
    return None if not input else \
        cyclic_tag_system(
            # Cycle productions
            append(tail(productions), head(productions)),
            # If head is True, concatenate head production. Pop head input either way.
            concat(tail(input), head(productions)) if head(input) else tail(input))
Let's throw in a print() and run this on the same input as our original example:
def cyclic_tag_system(productions, input):
    print(input)
    return None if not input else \
        cyclic_tag_system(
            # Cycle productions
            append(tail(productions), head(productions)),
            # If head is True, concatenate head production. Pop head input either way.
            concat(tail(input), head(productions)) if head(input) else tail(input))
# The input is equivalent to the string '1'
cyclic_tag_system(productions, (True, None))
This should produce output very similar to our original cyclic_tag_system(), but using lists of Booleans instead of strings of 0s and 1s.
We emulated a cyclic tag system in lambda calculus - well, we didn't write all the code as lambda terms, but everything is expressed as one-liner functions that use only if-then-else expressions, lists (pair, head, tail), and recursion (for which we have the \(Y\) combinator).
Lambda calculus has been extremely influential in computer science - it is the
root of functional programming. LISP, one of the earliest programming
languages, is heavily influenced by lambda calculus. Many ideas, like anonymous
functions, also known as lambdas, are now broadly available in most modern programming languages (Python even uses the keyword lambda for these, as we saw in this post).
In this post we covered lambda calculus: Boolean logic, Church numerals and arithmetic, the SKI and \(Y\) combinators, pairs, lists, and the append operation.
See this Wikipedia article. ↩
In the previous post we covered the von Neumann architecture and even built a small VM implementing the different components. Such a naïve implementation does make for a very inefficient machine though. In this post, we'll dive a bit deeper into machine architectures (virtual and physical) and discuss some of the implementation details. We'll talk about processing: register and stack-based; we'll talk about memory: word size, byte and word addressing; finally, we'll talk about I/O: port and memory mapped. Note these are all machines that conform to the von Neumann architecture, with the same high-level components. We're just double-clicking into the next level of implementation details.
The VM we implemented in our previous post simply operated directly over the memory. This works for a toy example, but moving data from memory to the CPU and back is costly. That's why modern CPUs employ multiple layers of caching (we won't cover these in this post), and rely on a set of registers to perform operations.
Registers can store a number of bits (the word size, more on it below) and operations are performed using registers. For example, to add two numbers, the machine would load one number into register R0, the second number into register R1, add the values stored in registers R0 and R1, then finally save the result back to memory:
mov r0 @<memory address 1> # Move the value from memory address 1 to r0
mov r1 @<memory address 2> # Move the value from memory address 2 to r1
add r0 r1 # Add the values storing the result in r0
mov @<memory address 3> r0 # Move the value from r0 to memory address 3
Some registers are used for general computation. These are called general-purpose registers. Other registers have specialized purposes. For example, the program counter which keeps track of the instruction to be executed is usually implemented as an IP (instruction pointer) or PC (program counter) register.
The original 8088 Intel processor had 14 registers. Modern Intel processors have significantly more registers^{1}, though many of them are special-purpose. ARM processors have 17 registers^{2}, 13 of which are general purpose.
Let's emulate a simple CPU with 4 general-purpose registers and a program counter register to get the feel of it. We will only implement mov (move) and add instructions for this example. Our implementation will check the 16th bit of an argument to determine whether it refers to a register (if 0) or to a memory location (if 1).
class CPU:
    def __init__(self, memory):
        self.memory = memory
        self.registers = [0, 0, 0, 0, 0]  # r0, r1, r2, r3, pc

    def run(self):
        while self.registers[4] < len(self.memory):
            instr, arg1, arg2 = self.memory[
                self.registers[4]:self.registers[4] + 3]
            self.process(instr, arg1, arg2)
            self.registers[4] += 3

    def get_at(self, arg):
        # 16th bit tells us whether this refers to a register or memory
        if arg & (1 << 15):  # Memory address
            return self.memory[arg ^ (1 << 15)]
        else:  # Register
            return self.registers[arg]

    def set_at(self, arg, value):
        # 16th bit tells us whether this refers to a register or memory
        if arg & (1 << 15):  # Memory address
            self.memory[arg ^ (1 << 15)] = value
        else:  # Register
            self.registers[arg] = value

    def process(self, instr, arg1, arg2):
        match instr:
            case 0:  # mov
                self.set_at(arg1, self.get_at(arg2))
            case 1:  # add
                self.set_at(arg1, self.get_at(arg1) + self.get_at(arg2))
Here is how it would run a small program that adds two numbers and stores the result:
program = [
    0, 0, 15 | (1 << 15),   # mov r0 @15
    0, 1, 16 | (1 << 15),   # mov r1 @16
    1, 0, 1,                # add r0 r1
    0, 17 | (1 << 15), 0,   # mov @17 r0
    0, 4, 18 | (1 << 15),   # mov pc @18 - this ends execution
    40,                     # this is @15
    2,                      # this is @16
    0,                      # this is @17
    10000                   # this is @18
]
## Load program into memory
memory = [0] * 10000
memory = program + memory[len(program):]
print(memory[17]) # Should print 0
CPU(memory).run()
print(memory[17]) # Should print 42
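The address tagging used in the program above can be seen in isolation: setting the 16th bit marks an argument as a memory address, and XOR-ing the bit away recovers the address (the helper names below are ours, for illustration):

```python
MEM_FLAG = 1 << 15  # the 16th bit

def encode_memory(address):
    # Tag an argument as a memory reference, as in `15 | (1 << 15)`
    return address | MEM_FLAG

def decode(arg):
    # Returns ("memory", address) or ("register", index),
    # mirroring the branch in get_at()/set_at()
    if arg & MEM_FLAG:
        return ("memory", arg ^ MEM_FLAG)
    return ("register", arg)

print(decode(encode_memory(15)))  # prints ('memory', 15)
print(decode(3))                  # prints ('register', 3)
```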
We're doing a bunch of stuff by hand, like loading the program into memory and not using an assembler to implement the program. That's because we're only focusing on the register-based processing. You can update the assembler in the previous post to target this VM as an exercise.
An alternative to registers is to use a stack for storage. While hardware stack machines are not unheard of, register machines easily outperform them so most CPUs you interact with are register-based. That said, stack machines are a popular choice for virtual machines - they are easier to implement and port to different systems and the stack keeps the data being processed close together which helps with performance when running the VM on a physical machine. A few examples: JVM (the Java virtual machine), the CLR (the .NET virtual machine), CPython's VM (the VM for the reference Python implementation) are all stack-based.
The example we used above of adding two numbers would look like this on a stack machine: push the first number onto the stack, push the second number onto the stack, add the numbers (which would pop the two numbers from the stack and replace them with their sum), then pop the value from the stack and store it in memory.
push @<memory address 1> # Push a value from memory address 1
push @<memory address 2> # Push a value from memory address 2
add # Add the top two values
pop @<memory address 3> # Pop the top of the stack and store at memory address 3
Another advantage of stack machines is in general the instructions tend to be shorter. As you can see above, for most instructions that move data around, we don't need to specify both a source and a destination since the stack is implied.
Let's emulate a simple stack VM with only push, add, and pop instructions, plus a jmp (jump) instruction so we can use the same mechanism to terminate:
class CPU:
    def __init__(self, memory):
        self.memory = memory
        self.stack, self.pc = [], 0

    def run(self):
        while self.pc < len(self.memory):
            instr, arg = self.memory[self.pc:self.pc + 2]
            self.process(instr, arg)
            self.pc += 2

    def process(self, instr, arg):
        match instr:
            case 0:  # push
                self.stack.append(self.memory[arg])
            case 1:  # pop
                self.memory[arg] = self.stack.pop()
            case 2:  # jmp
                self.pc = self.stack.pop()
            case 3:  # add
                self.stack.append(self.stack.pop() + self.stack.pop())
Here is how it would run a small program that adds two numbers and stores the result:
program = [
    0, 12,  # push @12
    0, 13,  # push @13
    3, 0,   # add
    1, 14,  # pop @14
    0, 15,  # push @15
    2, 0,   # jmp
    40,     # this is @12
    2,      # this is @13
    0,      # this is @14
    10000,  # this is @15
]
## Load program into memory
memory = [0] * 10000
memory = program + memory[len(program):]
print(memory[14]) # Should print 0
CPU(memory).run()
print(memory[14]) # Should print 42
Contrast the implementation with the register-based one: the stack VM only needs 1 argument for the instructions we implemented and the program is slightly shorter.
So far we focused on how data is processed. Let's also look at the different ways of referencing data.
We've been using Python for our toy implementations. Python supports arbitrarily large integers, so a list of numbers in Python (the way we implemented our memory) doesn't imply much in terms of bits and bytes. Bits and bytes do become important for physical machines and serious VMs implemented in languages closer to the metal.
First, let's talk about word size. A word is the fixed-size unit of computation for a CPU. Its size is measured in bits. For example, a 16-bit processor has a word size of 16 bits.
Applied to registers, this would mean that a machine register can hold at most 16 bits (a value between 0 and 65535). Operations within the value range are blazingly fast, as they run natively. If we need to process larger values, we need to do extra work to chunk the values into words and process these in turn. For example we can split a 32-bit value into two 16-bit values, process them separately, then concatenate the result. This obviously impacts performance. The point being that we are not necessarily limited to the word size, but processing larger values becomes much costlier.
Applied to memory addresses, this would mean how pointers are represented and what range of values can be addressed. For example, if the word size is 16 bits, then a pointer can point to any one of 65536 distinct memory locations.
An architecture can use the same word size for both registers and pointers, or different word sizes for different concerns. Commonly, a single word size is used (and, potentially, fractions or multiples of it for special concerns), that's why it's common to refer to a processor as a 32-bit processor, 64-bit processor etc.
An implication of word size applied to memory addressing is how the machine accesses memory. Some architectures allow byte addressing, which means a pointer points to a specific byte in memory, while others support only word addressing, which means a pointer points to a word in memory.
This is another important decision when designing a computer. If we want to be able to address individual bytes, a 16-bit pointer can refer to any of 65536 bytes. That is 64 KB. If our memory is larger than that, a pointer won't be able to address higher locations.
On the other hand, if we make our memory word-addressable, for our 16-bit example, a pointer can refer to any of 65536 16-bit words. 16 bits are 2 bytes, so our memory's upper limit is 131072 bytes (65536 x 2), which is 128 KB. We can now refer to higher memory addresses, but we can't address individual bytes as before - address 0 no longer refers to the byte at 0, but to the whole first 2-byte word (since address 1 refers to the next 2 bytes and so on).
This difference becomes even more dramatic for larger word sizes. A 32-bit pointer can address 4294967296 bytes (up to 4 GB of memory). Alternatively, with word addressing, the same pointer can cover 16 GB.
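These limits follow directly from the pointer width; a quick back-of-the-envelope check (the helper is ours, for illustration):

```python
# With byte addressing, a w-bit pointer covers 2**w bytes; with word
# addressing it covers 2**w words of word_bytes bytes each
def addressable_bytes(pointer_bits, word_bytes=1):
    return 2 ** pointer_bits * word_bytes

print(addressable_bytes(16))                # prints 65536 (64 KB)
print(addressable_bytes(16, word_bytes=2))  # prints 131072 (128 KB)
print(addressable_bytes(32, word_bytes=4))  # prints 17179869184 (16 GB)
```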
On the flip side, word addressing is less efficient when the unit of processing is smaller. Let's take text editing as an example. Say we want to update a one-byte character, like the UTF-8 encoding of a common character such as a. If we can refer to it directly, we can load, process, and update its memory location using a pointer. If, on the other hand, this character is part of a larger word, we would have to process the whole word to extract the character we care about (masking bits we don't need to process), apply the update to the whole word, and write this word back to memory.
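A sketch of that extra work, assuming a 16-bit word packing two one-byte characters (the byte layout here is chosen arbitrarily for illustration):

```python
def update_low_byte(word, new_byte):
    # Mask out the low byte, then merge in the new value - the whole
    # word has to be read, modified, and written back
    return (word & 0xFF00) | (new_byte & 0x00FF)

word = (ord('b') << 8) | ord('x')   # a 16-bit word packing 'b' and 'x'
word = update_low_byte(word, ord('a'))
print(chr(word & 0xFF))  # prints 'a'
```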
So depending on the scenario, byte or word addressing might make things faster or slower. Byte addressing is great for text processing - document authoring, HTML, writing code etc. Word addressing unlocks larger memory sizes and is great for crunching numbers - math, graphics etc.
Another important design decision is how to handle I/O.
One way to connect I/O to the system is through specific CPU instructions. For example, the CPU might have an inp instruction used to consume input and an out instruction used to send output. Programs can use these instructions to perform I/O. This is called port-mapped I/O, as I/O is achieved by connecting devices to the CPU via dedicated ports.
For example, let's extend our stack machine with an out instruction (also connecting an output to it):
class CPU:
    def __init__(self, memory, out):
        self.memory, self.out = memory, out
        self.stack, self.pc = [], 0

    def run(self):
        while self.pc < len(self.memory):
            instr, arg = self.memory[self.pc:self.pc + 2]
            self.process(instr, arg)
            self.pc += 2

    def process(self, instr, arg):
        match instr:
            case 0:  # push
                self.stack.append(self.memory[arg])
            case 1:  # pop
                self.memory[arg] = self.stack.pop()
            case 2:  # jmp
                self.pc = self.stack.pop()
            case 3:  # add
                self.stack.append(self.stack.pop() + self.stack.pop())
            case 4:  # out
                self.out(self.stack.pop())
Here is a program that prints Hello:
program = [
    0, 24,  # push @24
    0, 25,  # push @25
    0, 26,  # push @26
    0, 27,  # push @27
    0, 28,  # push @28
    4, 0,   # out
    4, 0,   # out
    4, 0,   # out
    4, 0,   # out
    4, 0,   # out
    0, 29,  # push @29
    2, 0,   # jmp
    111,    # this is @24
    108,    # this is @25
    108,    # this is @26
    101,    # this is @27
    72,     # this is @28
    10000,  # this is @29
]
## Load program into memory
memory = [0] * 10000
memory = program + memory[len(program):]
def out(val):
    print(chr(val), end='')
CPU(memory, out).run()
An alternative to port-mapped I/O is memory-mapped I/O. In this case, a certain address range of memory is used for I/O operations. That is, from the CPU's perspective, memory and I/O are addressed identically. But depending on the address range, data might reside in memory or it might actually come from/go to an I/O device.
Let's enhance our memory implementation (which so far was just an array) to support mapped I/O. In this case, any values written at address 1000 will be instead printed on screen:
class MappedMemory:
    def __init__(self, program):
        # MappedMemory wraps a list
        self.memory = [0] * 10000
        self.memory = program + self.memory[len(program):]

    def __len__(self):
        # Use underlying list's __len__
        return self.memory.__len__()

    def __getitem__(self, key):
        # Index in wrapped list
        return self.memory[key]

    def __setitem__(self, key, value):
        # If key is 1000, print
        if key == 1000:
            print(chr(value), end='')
        # Otherwise set in underlying list
        else:
            self.memory[key] = value
And here is the corresponding program that prints Hello (using the stack CPU without the out instruction and connected output):
program = [
    0, 24,   # push @24
    0, 25,   # push @25
    0, 26,   # push @26
    0, 27,   # push @27
    0, 28,   # push @28
    1, 1000, # pop @1000
    1, 1000, # pop @1000
    1, 1000, # pop @1000
    1, 1000, # pop @1000
    1, 1000, # pop @1000
    0, 29,   # push @29
    2, 0,    # jmp
    111,     # this is @24
    108,     # this is @25
    108,     # this is @26
    101,     # this is @27
    72,      # this is @28
    10000,   # this is @29
]
## Load program into memory
memory = MappedMemory(program)
CPU(memory).run()
Note in this program we repeatedly set the value at address 1000, which is mapped to our output device (print()).
In this post we discussed some of the implementation details of machines and virtual machines:
A few years back I implemented a toy VM with 7 registers, 16 op codes, 128 KB of memory, and port-mapped I/O in 121 lines of C++. It comes with an assembler, examples, and, of course, a Brainfuck interpreter. Linking it here for reference: Pixie.
See this SO question. ↩
See the ARM documentation. ↩
During the previous posts, we covered Turing machines, tag systems, and cellular automata. All of these are equivalent in terms of what they can compute, but some are more practical than others. In this post, we'll look at the von Neumann architecture of physical computers and implement an extremely inefficient machine, write a few programs targeting it, then prove it is Turing complete.
John von Neumann was a famous mathematician and physicist. Contemporary with Alan Turing, he was aware of Turing's work on Turing machines and computability. At the same time, von Neumann was involved in the Manhattan Project which required lots of computation provided by some early computers. Thus he got involved in computer design. Unlike a Turing machine, a physical computer can't have an infinite tape and while data is processed based on input and states, this needs to be more ergonomic than Yurii Rogozhin's 4-state 6-symbol machine we described in Part 2.
Von Neumann described a computer architecture as consisting of the following components^{1}:
- A central arithmetic component (CA) handling calculation.
- A central control component (CC) driving which calculations should be performed.
- Memory (M) for storage.
- Input (I) and output (O) components to get data into the system and to communicate results outside of the system, from/to a recording medium (R).
Here is a diagram of this architecture:
Before von Neumann, computers were single-purpose devices - the programming was hardwired. One of the major innovations, which might not be apparent, is the introduction of a central control component and the ability of the memory to store not only data but also the program itself. This makes devices based on this architecture able to be reprogrammed to perform different tasks.
We can now load an arbitrary program into memory. The program will use the instructions which our central arithmetic understands to perform computations. The central control can read this program and have the central arithmetic perform the required operations. During execution, data is also read from/written to memory.
Programs (and data) are loaded into memory through the input component and results are sent out through the output component.
While over the following decades this architecture got tweaked and tuned, it's pretty obvious it is the ancestor of all modern computers: computers still have CPUs, which include control and arithmetic, and memory.
Let's create a virtual machine based on this architecture.
We will create a very simple machine based on this architecture in Python. In subsequent posts, we will look at other designs, but we're starting with a direct translation of this architecture.
The interface to our input component is a function that, when called, returns an integer. This is all our machine needs to get data.
We will implement this over a text file. Our input component will buffer this file into a list and expose a read_one() function that will return one integer (as returned by ord()) for each character from the buffer.
def inp(file):
    buffer = list(open(file).read())
    return lambda: ord(buffer.pop(0))
The interface to our output component is a function that takes an integer as an argument. This is all our machine needs to output one memory cell.
We will implement this using print() and actually convert the given integer to a character. This is just to provide a convenient way for us to look at output like Hello world!.
def out(value):
    print(chr(value), end='')
Our memory will consist of a list of 10000 integers. We will zero-initialize the list, then load a program from a file to memory, starting at address 0. We expect the program to consist of a series of integers separated by a space or a newline character. We'll use this encoding to make it easier for us to peek at the code targeting our von Neumann machine.
def memory(file):
    memory = [0] * 10000
    for i, value in enumerate(' '.join(open(file).readlines()).split()):
        memory[i] = int(value)
    return memory
10000 is chosen arbitrarily, at this point we're not worrying about word size, page alignment etc. We simply have room to store 10000 integers in our memory, which will include both code and data.
We'll package the control and arithmetic components into a CPU class. We'll initialize this class with memory, input, and output components.
class CPU:
    def __init__(self, memory, inp, out):
        self.memory, self.inp, self.out = memory, inp, out
Our control unit will maintain a program counter (PC), an index into the memory pointing to the next instruction to execute. The machine runs by reading 3 integers from memory (at PC, PC + 1, and PC + 2), and passing these to the arithmetic unit for processing. The program counter is then incremented by 3. This repeats until PC goes outside the bounds of the memory, at which point the machine halts (alternately we could have provided some HALT instruction).
    def run(self):
        self.pc = 0
        while self.pc < len(self.memory):
            instr, m1, m2 = self.memory[self.pc:self.pc + 3]
            self.process(instr, m1, m2)
            self.pc += 3
We will implement process() next.
Our arithmetic unit will process triples of <Instruction> <memory address 1> <memory address 2>. It will support 8 instructions:
- AT will set the value at memory address 1 to be the value at the memory address specified by the value at memory address 2 (in short, m[m1] = m[m[m2]]).
- SET will set the value at the memory address specified by the value at memory address 1 to be the value at memory address 2 (in short, m[m[m1]] = m[m2]).
- ADD will update the value at memory address 1 by adding the value at memory address 2 to it (in short, m[m1] += m[m2]).
- NOT will update the value at memory address 1 to be 0 if the value at memory address 2 is different than 0, or 1 if the value at memory address 2 is 0 (in short, m[m1] = !m[m2]).
- EQ will compare the values at memory address 1 and memory address 2 and update the value at memory address 1 to be 1 if they are equal, 0 otherwise (in short, m[m1] = m[m1] == m[m2]).
- JZ will perform a conditional jump - if the value at memory address 1 is 0, it will update the program counter to point to memory address 2 (in short, if !m[m1] then PC = m2).
- INP will read one integer from the input and store it at memory address 1 + an offset value specified at memory address 2 (in short, m[m1 + m[m2]] = inp()).
- OUT will write the value at memory address 1 + an offset value specified at memory address 2 to the output (in short, out(m[m1 + m[m2]])).
Since the instructions are also read from memory, which is a list of integers, we will encode them as integers: AT = 0, SET = 1, ... OUT = 7.
    def process(self, instr, m1, m2):
        match instr:
            case 0:  # AT
                self.memory[m1] = self.memory[self.memory[m2]]
            case 1:  # SET
                self.memory[self.memory[m1]] = self.memory[m2]
            case 2:  # ADD
                self.memory[m1] += self.memory[m2]
            case 3:  # NOT
                self.memory[m1] = +(not self.memory[m2])
            case 4:  # EQ
                self.memory[m1] = +(self.memory[m1] == self.memory[m2])
            case 5:  # JZ
                if not self.memory[m1]:
                    # Set PC to m2 - 3 since run() will increment PC by 3
                    self.pc = m2 - 3
            case 6:  # INP
                self.memory[m1 + self.memory[m2]] = self.inp()
            case 7:  # OUT
                self.out(self.memory[m1 + self.memory[m2]])
            case _:
                raise Exception("Unknown instruction")
Putting it all together, we'll take two input arguments: the first one (argv[1]) will represent the code input file containing the program, the second one (argv[2]) will be the file containing additional input to be consumed by the inp() function:
import sys
vn = CPU(memory(sys.argv[1]), inp(sys.argv[2]), out)
vn.run()
Here is our von Neumann virtual machine in one listing:
def inp(file):
    buffer = list(open(file).read())
    return lambda: ord(buffer.pop(0))

def out(value):
    print(chr(value), end='')

def memory(file):
    memory = [0] * 10000
    for i, value in enumerate(' '.join(open(file).readlines()).split()):
        memory[i] = int(value)
    return memory

class CPU:
    def __init__(self, memory, inp, out):
        self.memory, self.inp, self.out = memory, inp, out

    def run(self):
        self.pc = 0
        while self.pc < len(self.memory):
            instr, m1, m2 = self.memory[self.pc:self.pc + 3]
            self.process(instr, m1, m2)
            self.pc += 3

    def process(self, instr, m1, m2):
        match instr:
            case 0:  # AT
                self.memory[m1] = self.memory[self.memory[m2]]
            case 1:  # SET
                self.memory[self.memory[m1]] = self.memory[m2]
            case 2:  # ADD
                self.memory[m1] += self.memory[m2]
            case 3:  # NOT
                self.memory[m1] = +(not self.memory[m2])
            case 4:  # EQ
                self.memory[m1] = +(self.memory[m1] == self.memory[m2])
            case 5:  # JZ
                if not self.memory[m1]:
                    # Set PC to m2 - 3 since run() will increment PC by 3
                    self.pc = m2 - 3
            case 6:  # INP
                self.memory[m1 + self.memory[m2]] = self.inp()
            case 7:  # OUT
                self.out(self.memory[m1 + self.memory[m2]])
            case _:
                raise Exception("Unknown instruction")

import sys

vn = CPU(memory(sys.argv[1]), inp(sys.argv[2]), out)
vn.run()
We can save this as vn.py
.
Let's create a Hello world!
program targeting this machine. We will
use the OUT
instruction to output each character of Hello
and a
new line (\n
). We'll first tell the VM to output the values at memory addresses 21 to 26:
7 21 9999
7 22 9999
7 23 9999
7 24 9999
7 25 9999
7 26 9999
We are referencing addresses 21 to 26 plus an offset of 0 (the value at memory address 9999, since our memory is initialized with zeros).
We want to halt after this, so we need to jump our program counter to
10000. We will do this by using our JZ
instruction, saying if the
memory value at index 9999 is 0, jump to 10000:
5 9999 10000
The instructions above occupy memory addresses 0 through 20, so memory addresses 21 to 26 will hold the values of the characters in Hello (as returned by ord()) plus a 10 for \n:
72 101 108 108 111 10
Here is the full listing which we can save as hello.vn
:
7 21 9999
7 22 9999
7 23 9999
7 24 9999
7 25 9999
7 26 9999
5 9999 10000
72 101 108 108 111 10
We can then use our VM to run the program like this:
touch input
python3 vn.py hello.vn input
We're also creating a blank input
file since Hello world!
isn't
going to read anything via inp()
.
Running this should print Hello. Our program is pretty hard to write or read: we're programming with raw integers. Let's make our life a bit easier.
We will implement an assembler for our VM. An assembly language is a low-level language closely matching the architecture it targets (in our case, our very simple von Neumann machine).
Our assembler will take 2 arguments - an input file and an output file - and automatically translate the input (assembly language) into instructions for our VM.
We will add the following features:

- Comments: lines starting with # will be ignored.
- Mnemonics: at, set, add, not, eq, jz, inp, out represent the instructions 0, 1, ... 7.
- Labels: an identifier followed by :, for example HERE:. We will then be able to refer to the location using the identifier preceded by :, like :HERE. We will also allow adding an offset to a reference: :HERE+2 is 2 past the HERE label.
- ORD macro: to make implementing Hello world! easier, we will provide the ORD() macro which will return the integer representation of the character passed to it; for example ORD(H) will return 72.

Using this assembly language, we can rewrite Hello world! as:
## Print 6 characters starting from DATA
out :DATA 9999
out :DATA+1 9999
out :DATA+2 9999
out :DATA+3 9999
out :DATA+4 9999
out :DATA+5 9999
## End program
jz 9999 10000
## Data section
DATA: ORD(H) ORD(e) ORD(l) ORD(l) ORD(o) 10
First, we'll read the input file and convert it into a list of tokens.
We will ignore lines starting with #
(so we can add comments to our
assembly file).
import sys

if len(sys.argv) != 3:
    print("Usage: asm.py <input> <output>")
    exit()

## Read all lines into a list
lines = open(sys.argv[1]).readlines()

## Filter out blank lines and lines starting with '#'
lines = list(filter(lambda line: line and line[0] != '#', lines))

## Join all lines and split into tokens
tokens = ' '.join(lines).split()
The labels themselves aren't part of the program; rather, they mark locations in it. In the next step we will pluck them out of the token list while remembering the positions they mark:
## Pluck labels and remember position
labels, i = {}, 0
while i < len(tokens):
    # If not a label, advance
    if tokens[i][-1] != ':':
        i += 1
        continue
    # Store location and pluck label
    labels[tokens[i][:-1]] = i
    tokens.pop(i)
Now we will process all tokens and handle the following cases:

- If a token starts with :, it is a label reference, so replace it with the actual location (as stored during the previous step).
- If a token is one of the instruction mnemonics, replace it with the corresponding op code.
- If a token uses the ORD() macro, replace the character passed to ORD() with its value.

## Op code list (constant)
OP_CODES = ['at', 'set', 'add', 'not', 'eq', 'jz', 'inp', 'out']
for i, token in enumerate(tokens):
    # Replace label references with actual position
    if token[0] == ':':
        if '+' in token:
            base, offset = token.split('+')
            tokens[i] = labels[base[1:]] + int(offset)
        else:
            tokens[i] = labels[token[1:]]
    # Replace op codes with values
    if token in OP_CODES:
        tokens[i] = OP_CODES.index(token)
    # Replace ORD macro
    if token[:4] == 'ORD(':
        tokens[i] = ord(token[4:-1])
Finally, we write all tokens to the output file:
open(sys.argv[2], "w").write(
    ' '.join([str(token) for token in tokens]))
Here is the full source code of our assembler (asm.py
):
import sys

if len(sys.argv) != 3:
    print("Usage: asm.py <input> <output>")
    exit()

## Read all lines into a list
lines = open(sys.argv[1]).readlines()

## Filter out blank lines and lines starting with '#'
lines = list(filter(lambda line: line and line[0] != '#', lines))

## Join all lines and split into tokens
tokens = ' '.join(lines).split()

## Pluck labels and remember position
labels, i = {}, 0
while i < len(tokens):
    # If not a label, advance
    if tokens[i][-1] != ':':
        i += 1
        continue
    # Store location and pluck label
    labels[tokens[i][:-1]] = i
    tokens.pop(i)

## Op code list (constant)
OP_CODES = ['at', 'set', 'add', 'not', 'eq', 'jz', 'inp', 'out']

for i, token in enumerate(tokens):
    # Replace label references with actual position
    if token[0] == ':':
        if '+' in token:
            base, offset = token.split('+')
            tokens[i] = labels[base[1:]] + int(offset)
        else:
            tokens[i] = labels[token[1:]]
    # Replace op codes with values
    if token in OP_CODES:
        tokens[i] = OP_CODES.index(token)
    # Replace ORD macro
    if token[:4] == 'ORD(':
        tokens[i] = ord(token[4:-1])

open(sys.argv[2], "w").write(
    ' '.join([str(token) for token in tokens]))
We can now save our assembly Hello world!
(listed above) to a file,
let's call it hello.asm
and use the assembler to convert it to a
program our VM can execute:
python3 asm.py hello.asm hello.vn
The resulting hello.vn
should have the same content as our
hand-crafted Hello world!
, minus the newlines (the assembler
doesn't output newlines). The content of the assembled file hello.vn
is:
7 21 9999 7 22 9999 7 23 9999 7 24 9999 7 25 9999 7 26 9999 5 9999 10000 72 101 108 108 111 10
We can run this using:
python3 vn.py hello.vn input
We are again using an empty input file since we don't need input. As a
convention, we use the .asm
extensions for assembly files and .vn
for assembled files targeting the VM.
Let's rewrite our program: instead of outputting :DATA, then :DATA+1, then :DATA+2 and so on, we should be able to output :DATA plus an offset :I, where :I goes from 0 to 5.
We can easily create a variable by labeling a memory location and then using that label to refer to it:
I: 0
Then we can use :I to reference it. We will use a COUNTER variable to count down from 6 to 0, and an offset variable I:
## Variables
I: 0
COUNTER: 6
We also need a couple of constant values: 0
, 1
- by which we
increment I
during each iteration, and -1
to decrement COUNTER
during each iteration. And, of course, our DATA
, where we store the
Hello
string:
## Constants
CONST: 0 1 -1
## Data
DATA: ORD(H) ORD(e) ORD(l) ORD(l) ORD(o) 10
Now let's look at how we can implement a loop using JZ:
## Beginning of loop
LOOP:
## Output I
out :DATA :I
## Decrement COUNTER, increment I
add :COUNTER :CONST+2
add :I :CONST+1
## If COUNTER is 0, we're done
jz :COUNTER 10000
## If not, jump to the start of the loop
jz :CONST :LOOP
At each iteration, our loop will output the character value at DATA plus the offset specified in I (initially 0). Then we add -1 to our COUNTER and 1 to I. Since our VM uses memory addresses for all operands, we stored 1 and -1 in memory at CONST+1 and CONST+2 respectively.
If COUNTER
is 0, we're done, so we jump to 10000
. If not, we repeat
the loop (jump to LOOP
if CONST
is 0, but CONST
is always 0).
Here is the full listing of this program:
## Beginning of loop
LOOP:
## Output I
out :DATA :I
## Decrement COUNTER, increment I
add :COUNTER :CONST+2
add :I :CONST+1
## If COUNTER is 0, we're done
jz :COUNTER 10000
## If not, jump to the start of the loop
jz :CONST :LOOP
## Constants
CONST: 0 1 -1
## Data
DATA: ORD(H) ORD(e) ORD(l) ORD(l) ORD(o) 10
## Variables
I: 0
COUNTER: 6
We can save this as hello2.asm, then assemble and run it (again with a blank input file):
python3 asm.py hello2.asm hello2.vn
python3 vn.py hello2.vn input
A few notes: data is mixed with code in all our programs, which follows from the von Neumann architecture, in which the memory of the system stores both code and data. This is fundamentally true for all computers, and enables some interesting behavior like self-modifying code. This could be intentional, or, due to a bug, we could accidentally interpret data as code or code as data. Modern systems employ various additional protections to prevent this type of accidental usage.
Because our particular VM starts execution from memory location 0, we have to place our constants and variables (data) after the instructions in the program. Executable files on modern systems similarly contain code and data segments, albeit with more complex layout and rules.
Let's prove our simple von Neumann VM is Turing-complete, meaning capable of universal computation. As we saw throughout this series of blog posts, the best way to prove this is to emulate another known Turing-complete system.
We will prove this by implementing a
Brainfuck interpreter. We
covered Brainfuck during the second post in the
series,
under Esoteric Turing machines. To recap: Brainfuck (BF) uses a byte array (tape), a data pointer (index in the array), and 8 symbols: >, <, +, -, ., ,, [, ]. The symbols are interpreted as:

- >: Increment the data pointer (move head right).
- <: Decrement the data pointer (move head left).
- +: Increment array value at data pointer.
- -: Decrement array value at data pointer.
- .: Output value at data pointer.
- ,: Read 1 byte of input and store at data pointer.
- [: If the byte at data pointer is 0, jump right past the matching ], else move to the next instruction.
- ]: If the byte at data pointer is not 0, jump left to the matching [, else move to the next instruction.

We will use our assembly language to implement a program which reads a BF program from input, then executes it. Effectively, we'll use our very simple virtual machine to emulate another very simple virtual machine!
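Before diving into the assembly version, it may help to see the semantics above spelled out in plain Python. This is a reference sketch of my own (the function name `brainfuck` and the 30,000-cell tape size are assumptions, not code from this series):

```python
def brainfuck(program, inp=''):
    # Tape of byte cells, data pointer, instruction pointer, output buffer
    tape, dp, ip, out = [0] * 30000, 0, 0, []
    inp = list(inp)
    while ip < len(program):
        c = program[ip]
        if c == '>':
            dp += 1
        elif c == '<':
            dp -= 1
        elif c == '+':
            tape[dp] = (tape[dp] + 1) % 256
        elif c == '-':
            tape[dp] = (tape[dp] - 1) % 256
        elif c == '.':
            out.append(chr(tape[dp]))
        elif c == ',':
            tape[dp] = ord(inp.pop(0)) if inp else 0
        elif c == '[' and tape[dp] == 0:
            # Jump right past the matching ], counting unbalanced brackets
            depth = 1
            while depth:
                ip += 1
                depth += {'[': 1, ']': -1}.get(program[ip], 0)
        elif c == ']' and tape[dp] != 0:
            # Jump left to the matching [, counting unbalanced brackets
            depth = 1
            while depth:
                ip -= 1
                depth += {']': 1, '[': -1}.get(program[ip], 0)
        ip += 1
    return ''.join(out)

print(brainfuck('++++++++[>++++++++<-]>+.'))  # A (8 * 8 + 1 = 65)
```

Note how the `[`/`]` branches count unbalanced brackets while scanning; the assembly implementation below has to do exactly the same bookkeeping in its forward and backward scanning loops.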
I won't cover the details of the implementation, since it is quite cumbersome due to the simplicity of our VM and assembly language. I will just provide a short summary of what is going on:

- First, we read the BF program from the input, until a newline (\n) is encountered.
- We maintain a CODE_PTR code pointer variable to point to the current BF instruction and a DATA_PTR data pointer variable to point to the BF array.
- We compare each instruction against each of the BF symbols (>, <, etc.) and jump to the label implementing it.
- The most complex pieces handle [ and ], which require keeping track of unbalanced parentheses so we properly jump from [ to the matching ] and vice-versa.

Here is the full Brainfuck interpreter implemented in our assembly language:
## Read Brainfuck program until a \n is encountered
START:
## Read one integer at PROG + offset I
inp :PROG :I
## Increment I by 1
add :I :CONST+1
## Zero out DONE_READING (!1)
not :DONE_READING :CONST+1
## DONE_READING = 10
add :DONE_READING :CONST+3
## Load the last integer we read in TEMP
at :TEMP :END
## Increment END to keep track of program end
add :END :CONST+1
## Check if the last integer we read was 10 (\n)
eq :DONE_READING :TEMP
## If it wasn't zero, jump to start and read another value
jz :DONE_READING :START
## Start running program
BF_RUN:
at :TEMP :CODE_PTR
add :CODE_PTR :CONST+1
## Check if we're on a > instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :RIGHT
## Check if we're on a < instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+1
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :LEFT
## Check if we're on a + instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+2
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :INC
## Check if we're on a - instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+3
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :DEC
## Check if we're on a . instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+4
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :OUT
## Check if we're on a , instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+5
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :IN
## Check if we're on a [ instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+6
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :FORWARD
## Check if we're on a ] instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+7
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :BACKWARD
## No matching BF instruction so we're done
jz :CONST 10000
RIGHT:
## > - increment data pointer
add :DATA_PTR :CONST+1
jz :CONST :BF_RUN
LEFT:
## < - decrement data pointer
add :DATA_PTR :CONST+2
jz :CONST :BF_RUN
INC:
## + - increment cell
at :TEMP :DATA_PTR
add :TEMP :CONST+1
set :DATA_PTR :TEMP
jz :CONST :BF_RUN
DEC:
## - - decrement cell
at :TEMP :DATA_PTR
add :TEMP :CONST+2
set :DATA_PTR :TEMP
jz :CONST :BF_RUN
OUT:
## . - output cell
at :TEMP :DATA_PTR
out :TEMP :CONST
jz :CONST :BF_RUN
IN:
## , - store input in cell
inp :TEMP :CONST
set :DATA_PTR :TEMP
jz :CONST :BF_RUN
FORWARD:
## [
at :TEMP :DATA_PTR
not :TEMP :TEMP
## If value in cell is not 0, continue
jz :TEMP :BF_RUN
## Find matching ]
## Set TEMP to 1, counting unbalanced [
not :TEMP :TEMP
add :TEMP :CONST+1
SCAN_FORWARD:
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+6
not :TEMP2 :TEMP2
## Jump if found a [
jz :TEMP2 :FORWARD_LPAR
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+7
not :TEMP2 :TEMP2
## Jump if found a ]
jz :TEMP2 :FORWARD_RPAR
## Keep scanning
add :CODE_PTR :CONST+1
jz :CONST :SCAN_FORWARD
## Increment counter when finding a [
FORWARD_LPAR:
add :TEMP :CONST+1
add :CODE_PTR :CONST+1
jz :CONST :SCAN_FORWARD
## Decrement counter when finding a ]
FORWARD_RPAR:
add :TEMP :CONST+2
## If counter is 0, we're done
jz :TEMP :BF_RUN
## Else keep scanning
add :CODE_PTR :CONST+1
jz :CONST :SCAN_FORWARD
BACKWARD:
## ]
at :TEMP :DATA_PTR
## If value in cell is 0, continue
jz :TEMP :BF_RUN
## Find matching [
## Set TEMP to 1, counting unbalanced ]
not :TEMP :TEMP
add :TEMP :CONST+1
## Move code pointer back 2
add :CODE_PTR :CONST+2
add :CODE_PTR :CONST+2
SCAN_BACKWARD:
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+6
not :TEMP2 :TEMP2
## Jump if found a [
jz :TEMP2 :BACKWARD_LPAR
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+7
not :TEMP2 :TEMP2
## Jump if found a ]
jz :TEMP2 :BACKWARD_RPAR
## Keep scanning
add :CODE_PTR :CONST+2
jz :CONST :SCAN_BACKWARD
## Decrement counter when finding a [
BACKWARD_LPAR:
add :TEMP :CONST+2
## If counter is 0, we're done
jz :TEMP :BF_RUN
## Else keep scanning
add :CODE_PTR :CONST+2
jz :CONST :SCAN_BACKWARD
## Increment counter when finding a ]
BACKWARD_RPAR:
add :TEMP :CONST+1
add :CODE_PTR :CONST+2
jz :CONST :SCAN_BACKWARD
CONST: 0 1 -1 10
BF: ORD(>) ORD(<) ORD(+) ORD(-) ORD(.) ORD(,) ORD([) ORD(])
I: 0
TEMP: 0
TEMP2: 0
END: :PROG
DONE_READING: 0
CODE_PTR: :PROG
DATA_PTR: 5000
## We'll load the BF program here
PROG:
We can save this program as bf.asm
. We will also create a Brainfuck
program to run - Hello world
:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
We will save this as hello.bf
. Now we can compile our BF interpreter
and run it using our VM:
python3 asm.py bf.asm bf.vn
python3 vn.py bf.vn hello.bf
This should output Hello world!
.
Since Brainfuck is Turing-complete and our VM can emulate a Brainfuck interpreter, our VM is also Turing-complete.
In this post we implemented a simple von Neumann virtual machine and an assembler for it, wrote a couple of versions of Hello world, and saw how we can use variables and loops.
For convenience, the code we covered in this post is online here:
In the previous post we talked about Conway's Game of Life as a well-known cellular automaton. In this post we will cover even simpler automata - the elementary cellular automata. Stephen Wolfram covers them extensively in his book, A New Kind of Science.
To recap, we defined a cellular automaton as a discrete n-dimensional lattice of cells, a set of states (for each cell), a notion of neighborhood for each cell, and a transition function mapping the neighborhood of each cell to a new cell state.
An elementary cellular automaton is 1-dimensional - an array of cells. A cell can be either on or off (just like in Conway's Game of Life). The neighborhood of a cell, meaning the cells that we take into account when we determine the next state of the next generation, consists of the cell itself and its left and right neighbors.
For example, we can define an elementary cellular automaton with the following rules:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> on
[off, on, on] -> off
[off, on, off] -> on
[off, off, on] -> on
[off, off, off] -> off
If we start with a single on cell and produce 10 generations, we get
(using #
to mean on):
         #
        ###
       #   #
      ### ###
     #       #
    ###     ###
   #   #   #   #
  ### ### ### ###
 #               #
###             ###
The elementary cellular automata can easily be enumerated exhaustively:
the neighborhood of a cell can be in only one of 8 states, as we saw
above: [on, on, on]
, [on, on, off]
, ... [off, off, off]
. The
transition function maps each of these possible states to either on or
off. If we think of on/off as a bit, we need 8 bits to represent the transition function. For example, our transition function above:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> on
[off, on, on] -> off
[off, on, off] -> on
[off, off, on] -> on
[off, off, off] -> off
can be represented as the binary number 00010110, which, in decimal, is 22 (where [off, off, off] is the least significant bit). We can represent numbers from 0 to 255 in 8 bits, so there are exactly 256 elementary cellular automata. This encoding is referred to as a rule, as in transition rule. The elementary cellular automaton in our example above is called Rule 22.
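We can sanity-check this encoding with a short snippet (the `transitions` dictionary layout here is just an illustration, not code from the post):

```python
# Transition table for our example automaton, keyed by
# (left, cell, right); values are the next state of the cell
transitions = {
    (1, 1, 1): 0, (1, 1, 0): 0, (1, 0, 1): 0, (1, 0, 0): 1,
    (0, 1, 1): 0, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0,
}

# Reading each neighborhood (l, c, r) as a 3-bit number gives the bit
# position of its output in the rule number
rule = sum(bit << (l * 4 + c * 2 + r)
           for (l, c, r), bit in transitions.items())

print(rule)  # 22
```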
A common way to plot the evolution of an elementary cellular automaton over multiple generations is to render each generation below the previous one, like our example above using # for on. A more condensed version, with 1 pixel per cell, of running Rule 22 for 301 generations looks like this:
At this level, we can clearly see patterns emerging in the automaton. We get an even more interesting view if, instead of starting with just a single on cell, we start with a random state - an array of random on and off cells. Here is rule 22 starting with 301 random cells and running for 301 generations:
We can also easily see that some of the automata are complements of other automata: if we swap on and off everywhere (complement each neighborhood and flip the resulting output bit), we end up with an automaton that behaves identically with on and off reversed. Rule 22's complement is Rule 151:
We can also reflect a rule by swapping the transitions for [on, off, off] with [off, off, on] and [on, on, off] with [off, on, on]. This doesn't produce a new rule for rule 22, since its reflection is still 22, but, for example, rules 3 and 17 are reflections of each other.
Rule 3:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> off
[off, on, on] -> off
[off, on, off] -> off
[off, off, on] -> on
[off, off, off] -> on
Renders as:
Rule 17:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> on
[off, on, on] -> off
[off, on, off] -> off
[off, off, on] -> off
[off, off, off] -> on
Renders as:
That means that, even though there are 256 possible automata, from a behavioral perspective some are complements or reflections of others and thus exhibit the same behavior. In fact, there are only 88 uniquely behaving automata, all others being complements and/or reflections of these.
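Both symmetries are easy to compute directly on rule numbers. Here is a sketch (the `complement` and `reflect` helper names are my own, not from the post):

```python
def complement(rule):
    # Swap on/off: the output for neighborhood n becomes the negated
    # output for the inverted neighborhood 7 - n
    return sum((1 - (rule >> (7 - n) & 1)) << n for n in range(8))

def reflect(rule):
    # Swap left and right neighbors in each neighborhood (l, c, r)
    def swap(n):
        l, c, r = n >> 2 & 1, n >> 1 & 1, n & 1
        return r * 4 + c * 2 + l
    return sum((rule >> swap(n) & 1) << n for n in range(8))

print(complement(22))  # 151
print(reflect(3))      # 17
print(reflect(22))     # 22 (rule 22 is its own reflection)
```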
Let's look at a Python implementation. We will represent the state of
an automaton as a list of Boolean cells. We can encode the state of a
neighborhood as a 3 bit number:
left neighbor * 4 + cell * 2 + right neighbor
. Given a list of cells
and the index of a cell, we have:
def neighbors(cells, i):
    return (cells[i - 1] if i > 0 else False) * 4 + \
           cells[i] * 2 + \
           (cells[i + 1] if i < len(cells) - 1 else False)
If we run off the ends of the list, we assume the state of that cell is
off. In Python, False
becomes 0
and True
becomes 1 if we do
arithmetic with them, so this function will return a number between 0
and 7.
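To see the encoding in action, here are a few example neighborhoods (the function is restated so the snippet is self-contained):

```python
def neighbors(cells, i):
    # Encode (left, cell, right) as a 3-bit number; cells off the ends
    # of the list are treated as off (False)
    return (cells[i - 1] if i > 0 else False) * 4 + \
           cells[i] * 2 + \
           (cells[i + 1] if i < len(cells) - 1 else False)

cells = [False, True, True, False]
print(neighbors(cells, 0))  # 1 (binary 001: only the right neighbor is on)
print(neighbors(cells, 1))  # 3 (binary 011)
print(neighbors(cells, 2))  # 6 (binary 110)
print(neighbors(cells, 3))  # 4 (binary 100)
```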
We can derive the transitions from the rule number by taking a rule number and expanding it into a dictionary that maps each value from 0 to 7 to the corresponding bit in the rule number value:
def transition(rule):
    return {i: rule & (1 << i) != 0 for i in range(8)}
This might be a bit hard to understand, so let's work through an
example. Let's take Rule 22. The binary representation of Rule 22 is
00010110
. We're iterating over the range 0...7 (i
) and for each of
these values, we shift 1
exactly i
bits left. Then we check if the
rule bitwise-ANDed with this shifted bit is different than 0.
For i == 0
: 00010110 & (1 << 0)
, which is 00010110 & 00000001
, we
get False
, so transitions[0] = False
.
For i == 1
: 00010110 & (1 << 1)
, which is 00010110 & 00000010
, we
get True
, so transitions[1] = True
.
...
For i == 7
: 00010110 & (1 << 7)
, which is 00010110 & 10000000
, we
get False
, so transitions[7] = False
.
Remember the keys of the dictionary are neighborhood states.
Now we just need a function that takes a rule, an initial state, and the number of steps we want to run. The function will start with the initial state, then at each step, update the list of cells using the transition function:
def run(rule, initial_state, steps):
    t, cells = transition(rule), initial_state
    for _ in range(steps):
        yield cells
        cells = [t[neighbors(cells, i)] for i in range(len(cells))]
We talked about two ways to look at cellular automata: starting with a single on cell, or starting with a random initial state.
Let's implement an initial_state
function which takes a cell count as
input and returns a list of cells, all of which are off except the
middle one:
def initial_state(cell_count):
    result = [False] * cell_count
    result[cell_count // 2] = True
    return result
We'll also want a random_initial_state
which takes a cell count and
returns a random cell list. We'll take advantage of the fact that
Python supports arbitrarily large integers natively, so we'll just
generate a random number with cell_count
bits, then derive the cell
list from that (if a bit is 1
, the corresponding cell is on):
import random

def random_initial_state(cell_count):
    seed = random.randint(0, 2 ** cell_count - 1)
    return [seed & (1 << i) != 0 for i in range(cell_count)]
Here is all the code in one listing:
def neighbors(cells, i):
    return (cells[i - 1] if i > 0 else False) * 4 + \
           cells[i] * 2 + \
           (cells[i + 1] if i < len(cells) - 1 else False)

def transition(rule):
    return {i: rule & (1 << i) != 0 for i in range(8)}

def run(rule, initial_state, steps):
    t, cells = transition(rule), initial_state
    for _ in range(steps):
        yield cells
        cells = [t[neighbors(cells, i)] for i in range(len(cells))]

def initial_state(cell_count):
    result = [False] * cell_count
    result[cell_count // 2] = True
    return result

import random

def random_initial_state(cell_count):
    seed = random.randint(0, 2 ** cell_count - 1)
    return [seed & (1 << i) != 0 for i in range(cell_count)]
Here is how we can use this to print the first 30 steps of Rule 22:
for state in run(22, initial_state(61), 30):
    print(''.join(['#' if e else ' ' for e in state]))
Wolfram analyzed the behavior of cellular automata and classified them in 4 classes (called Wolfram classes). These go beyond elementary cellular automata to cover other cellular automata like, for example, ones where the next generation of a cell is not determined only by the cell and the two cells next to it, rather the neighborhood includes next-next cells. In this post we'll stick to elementary cellular automata though.
Class 1 automata converge quickly to a uniform state. For example rule 0 becomes all off in one generation:
Its complement, rule 255, becomes all on in one generation:
Class 2 automata converge quickly to a repetitive state. For example rule 4:
Class 3 automata appear to remain in a random state, without converging. Rule 22, which we started with above, exhibits this type of behavior:
The most interesting class of cellular automata, class 4, has a quite remarkable behavior: areas of cells end up in static or repetitive state, while some cells end up forming structures that interact with each other. Rule 110 is the only elementary cellular automaton that exhibits this behavior:
The fact that Rule 110 has areas of cells that are static or repetitive
while some other cells form structures should remind you of the
Conway's Game of Life spaceships we discussed in the previous post. In
the previous post, we saw that the Game of Life is Turing complete, and
how a Turing machine was implemented
using spaceships as signals
processed
by other patterns.
It turns out Rule 110 is also Turing complete. Stephen Wolfram conjectured this in 1985, and the conjecture was proved in 2004 by Matthew Cook^{1}. Cook uses Rule 110 gliders (interacting structures) to emulate a cyclic tag system. We saw in Computability Part 3: Tag Systems that cyclic tag systems can emulate tag systems, and an m-tag system with \(m \gt 1\) is Turing complete.
Rule 110, an elementary cellular automaton, is also capable of universal computation. And while this all might seem very abstract, cellular automata are so simple they show up in nature:
Formal definition:
A cellular automaton consists of a discrete n-dimensional lattice of cells, a set of states (for each cell), a notion of neighborhood for each cell, and a transition function mapping the neighborhood of each cell to a new cell state.
The system evolves over time, where at each step, the transformation function is applied over the lattice to determine the states of the next generation of cells.
Conway's Game of Life is a cellular automaton on a 2D plane with the following rules:
- Any live cell with fewer than two live neighbors dies.
- Any live cell with two or three live neighbors lives on to the next generation.
- Any live cell with more than three live neighbors dies.
- Any dead cell with exactly three live neighbors becomes a live cell.
In other words, a live cell stays alive during the next iteration if it has 2 or 3 live neighbors. A dead cell becomes live if it has exactly 3 live neighbors.
In the case of Conway's Game of Life, the lattice is a 2D grid, we have 2 states (on or off), the neighborhood of a cell consists of all adjacent cells (including corners), and the transition function is the one described above. Mathematician John Conway proposed the Game of Life in 1970.
The reason we started with Conway's Game of Life for discussing cellular automata is that this simple game, with simple rules, exhibits some very interesting behavior that has been catalogued over many years by people toying with the simulation.
First, we have still lifes, patterns that don't change while stepping through the simulation. These patterns are stable: no cells die, no cells become live.
Next, we have oscillators, patterns that repeat with a certain periodicity:
In the above example, the last (bottom right) pattern has period 5 and is called Octagon 2. The other 3 patterns all have period 2.
More interestingly, we have spaceships - these are patterns that repeat but translate through space:
The above example shows a couple of small spaceships: the tiny 5-cell glider and the lightweight spaceship, or LWSS. There are many more spaceship patterns, some of them quite large (hundreds or even thousands of cells).
Most simulations tend to eventually stabilize into a combination of oscillators and still lives. Patterns that start from a small seed of a handful of cells and take a long time (in terms of iterations) to stabilize are called Methuselahs. Here is an example, nicknamed Acorn:
Conway conjectured that for any initial configuration, there is an upper limit on how many live cells can ever exist. This was proved wrong by the discovery of glider guns. A glider gun generates gliders every few iterations. The gliders continue moving away from the gun, so as the simulation runs, the number of live cells continues to grow.
One of the most popular glider guns is called the Gosper glider gun, named after mathematician and programmer Bill Gosper:
There are many other interesting patterns and constructions in the Game of Life discovered throughout the years. A few examples:
- Eaters - patterns that can absorb other patterns like spaceships, and return to their original state.
There are many others, and combinations of them give rise to interesting systems like circuits and logic gates built from spaceships and strategically placed still lifes and oscillators.
Let's look at a Python implementation for the Game of Life. We will use a wrap-around space, so we'll consider cells on the last column to be neighbors with cells on the first column and similarly cells on the last row to be neighbors with cells on the first row.
def make_matrix(width, height):
    return [[False] * width for _ in range(height)]

def neighbors(m, i, j):
    last_j = j + 1 if j + 1 < len(m[0]) else 0
    last_i = i + 1 if i + 1 < len(m) else 0
    return (m[i - 1][j - 1] + m[i - 1][j] + m[i - 1][last_j] +
            m[i][j - 1] + m[i][last_j] +
            m[last_i][j - 1] + m[last_i][j] + m[last_i][last_j])

def step(m1):
    m2 = make_matrix(len(m1[0]), len(m1))
    for i in range(len(m1)):
        for j in range(len(m1[0])):
            n = neighbors(m1, i, j)
            if n == 3:
                m2[i][j] = True
            elif n == 2 and m1[i][j]:
                m2[i][j] = True
    return m2
To run a simulation, we also need a function to print the game state and some initial conditions:
def print_matrix(m):
    for line in m:
        print(str.join('', ['#' if c else ' ' for c in line]))

m = make_matrix(10, 10)
m[0][1] = True
m[1][2] = True
m[2][0] = True
m[2][1] = True
m[2][2] = True

for _ in range(100):
    print_matrix(m)
    m = step(m)
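The initial pattern above is a glider, which should reappear translated one cell down and to the right every 4 generations. Here is a compact, self-contained sanity check of that property (it restates the step logic in an equivalent wrap-around form rather than reusing the exact functions above):

```python
def step(m):
    # One Game of Life generation on a wrap-around grid (equivalent to
    # the neighbors/step pair above, restated compactly)
    h, w = len(m), len(m[0])
    nxt = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            n = sum(m[(i + di) % h][(j + dj) % w]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0))
            nxt[i][j] = n == 3 or (n == 2 and m[i][j])
    return nxt

# A 5-cell glider on a 10x10 grid
m = [[False] * 10 for _ in range(10)]
for i, j in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    m[i][j] = True

m4 = m
for _ in range(4):
    m4 = step(m4)

# After 4 generations the glider reappears shifted one cell down-right
shifted = [[m[(i - 1) % 10][(j - 1) % 10] for j in range(10)] for i in range(10)]
assert m4 == shifted
```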
This is another system that is very simple to implement, yet has powerful computational capabilities.
It turns out the Game of Life is Turing complete, meaning it is also capable of universal computation. Gliders are key to this. In general, if the behavior of cells were either repetitive (still lifes and oscillators cycle through 1 or more patterns) or chaotic, it would be hard to perform any computation. But gliders move and can interact with each other, thus enabling non-chaotic processes.
We briefly discussed above how Game of Life patterns can be combined to form circuits that can process signals (in the form of spaceships) like logic gates and memory storage. Paul Rendell implemented a universal Turing machine in the Game of Life. His website (http://rendell-attic.org/gol/tm.htm) covers the details, which we won't go into due to their complexity. Suffice it to say the patterns emerging in the Game of Life can be combined to build such a device. Paul also wrote a book about it^{1}.
We again encountered a system capable of computing anything computable, based only on a matrix of cells and a couple of rules (live cells with 2 or 3 neighbors stay alive, dead cells with exactly 3 neighbors become live).
The website https://conwaylife.com/ includes a lot of details on Conway's Game of Life, various patterns discovered, and a forum where people discuss their exploration of the system.
In the next post, we'll look at even simpler cellular automata: elementary cellular automata where cells have 2 possible states and 2 neighbors.
In the previous post we talked about universal Turing machines and looked at some very small machines that are still capable of computing anything that can be computed (the Turing-completeness property). In this post, we'll look at another model for computation: tag systems.
A tag system operates on a string of symbols by reading the symbol from the head of the string, deleting a constant number of symbols from the head of the string, and appending one or more symbols to the tail of the string based on the symbol read from the head.
Formally:
A tag system is a triplet \(\langle m, A, P \rangle\).
- \(m\) is a positive integer, called the deletion number, which specifies how many symbols are deleted from the head during each iteration.
- \(A\) is a finite alphabet of symbols, including a special halting symbol.
- \(P\) is a set of production rules which map each symbol in \(A\) to a string of symbols or words from \(A\) (to be appended to the end of the string).
Tag systems were specified by Emil Leon Post in 1943, 7 years after Turing machines. We usually refer to tag systems as m-tag systems where \(m\) is the deletion number from the definition above.
At each step, \(x\) is read from the head of the string, \(m\) symbols are deleted, and \(P(x)\) is appended to the end of the string. The tag system halts when \(x\) is the halting symbol.
An alternative definition that doesn't require a halting symbol considers as halting all words that are smaller than \(m\). In this case, the tag system halts when the string shrinks sufficiently. Yet another alternative considers as halting the empty string. In this case, the tag system halts when the string becomes empty.
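As a sketch of the "words shorter than \(m\) halt" variant (the function name tag_system_short_halt is mine, for illustration), only the loop condition changes:

```python
# Variant of the tag system loop where any string shorter than m halts;
# no special halting symbol is needed.
def tag_system_short_halt(m, productions, string):
    while len(string) >= m and string[0] in productions:
        string = string[m:] + productions[string[0]]
        yield string

# With m = 2 and the single rule a -> a, the string shrinks by one
# symbol per step and halts once it is shorter than m.
print(list(tag_system_short_halt(2, {'a': 'a'}, 'aaa')))
# ['aa', 'a']
```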
Let's look at a Python implementation for a tag system:
def tag_system(m, productions, string):
# Repeat until the string is empty or we see the halting symbol
while string and string[0] in productions:
string = string[m:] + productions[string[0]]
yield string
As an example, let's take the tag system with \(m = 2, A = \langle a, b, H \rangle\), and the production rules
Symbol | Word |
---|---|
a | aab |
b | H |
Starting with the string aa, the steps are:
aa // Erase 2 symbols from head, a -> aab
aab // Erase 2 symbols from head, a -> aab
baab // Erase 2 symbols from head, b -> H
abH // Erase 2 symbols from head, a -> aab
Haab // Halt
Using our tag_system() function implemented above:
productions = {
'a': 'aab',
'b': 'H',
}
string = 'aa'
print(string)
for string in tag_system(2, productions, string):
print(string)
Tag systems are simple, even simpler than Turing machines. Remember we defined a Turing machine as a 7-tuple while tag systems are represented by triplets. Turing machines have states, and depending on the state, a machine takes different actions. Tag systems technically have a single state: when a symbol is read from the head of the string, the same thing will always happen: \(m\) symbols are deleted from the head and the corresponding production rule determines what word to append to the tail of the string. Even so, tag systems are Turing-complete.
For \(m \gt 1\), m-tag systems are Turing complete. For any Turing machine, there is an m-tag system that can emulate that Turing machine. John Cocke and Marvin Minsky showed in 1964 how a 2-tag system can emulate a universal Turing machine^{1}. That means that such a super simple system is also capable of universal computation!
But it gets even simpler.
A cyclic tag system is a modification of tag systems where:
- The deletion number \(m\) is fixed to 1.
- The alphabet is fixed to two symbols, 0 and 1.
- Instead of production rules, we have a cyclic list of words (over 0 and 1) called productions, which we cycle through.
We start from the head of the list of productions. At each step, if the
symbol at the head of the string is 1, we append the current production
to the end of the string. If the symbol at the head of the string is 0,
we don't append anything. We then move to the next production in the
list for the next step. Once we exhaust the list of productions, we loop
around to its head (this inspired the cyclic name).
Here is a Python implementation for a cyclic tag system:
def cyclic_tag_system(productions, string):
# Keeps track of current production
i = 0
# Repeat until the string is empty
while string:
string = string[1:] + (productions[i] if string[0] == '1' else '')
# Update current production
i = i + 1
if i == len(productions):
i = 0
yield string
For example, we will use the production rules 11, 01, 00. With an initial string 1, the steps are:
1 // Append production 11
11 // Append production 01
101 // Append production 00
0100 // Current production 11 (won't append since head is 0)
100 // Append production 01
0001 // Current production 00 (won't append since head is 0)
001 // Current production 11 (won't append since head is 0)
01 // Current production 01 (won't append since head is 0)
1 // Append production 00
00 // Current production 11 (won't append since head is 0)
0 // Current production 01 (won't append since head is 0)
// Halts
Using our Python implementation:
productions = ['11', '01', '00']
string = '1'
print(string)
for string in cyclic_tag_system(productions, string):
print(string)
Cyclic tag systems are simpler than tag systems since \(m\) is fixed to
1, the alphabet is fixed to 0 and 1, and productions are represented as
a cyclic list rather than a map of symbols to words. Even so, a cyclic
tag system can emulate any m-tag system.
An m-tag system with \(n\) symbols \(\lbrace a_1, a_2, ... a_n \rbrace\) and their corresponding production rules \(\lbrace P_1, P_2, ... P_n \rbrace\) can be translated to a cyclic tag system with \(m * n\) productions where the first \(n\) productions \(\lbrace P'_1, P'_2, ... P'_n \rbrace\) are encodings of their respective \(P\)-productions in the m-tag system and the rest are empty strings.
Productions in the m-tag system are words over the alphabet \(A\). We
encode each symbol in \(A\) as a binary string of length \(n\), with a 1
in the \(k\)-th position for \(a_k\). For example, for \(n = 3\) and the
alphabet \(A = \lbrace a_1, a_2, a_3 \rbrace\), we encode \(a_1\) as 100,
\(a_2\) as 010, and \(a_3\) as 001. Since a production \(P_k\) is a
sequence of symbols, we can similarly translate it into an encoded
representation \(P'_k\) using symbols 0 and 1.
Our first example was the 2-tag system with the alphabet \(A = \langle a, b, H \rangle\), and the production rules
Symbol | Word |
---|---|
a | aab |
b | H |
H | H |
Here we added the production rule H -> H for completeness, so we have exactly \(n\) production rules.
Translating this into a cyclic tag system, \(a, b, H\) become 100, 010, and 001 respectively. The production rules translate as:
- a -> aab becomes 100100010
- b -> H becomes 001
- H -> H becomes 001
The full list of productions for the cyclic tag system is 100100010, 001, 001, -, -, - where - is the empty string.
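This translation can be mechanized. Here is a small sketch (the function name to_cyclic is mine) that builds the cyclic productions from an m-tag system:

```python
def to_cyclic(m, symbols, productions):
    # Encode symbol a_k as a binary string of length n with a 1 in
    # position k (for n = 3: a_1 -> 100, a_2 -> 010, a_3 -> 001)
    n = len(symbols)
    encode = {s: '0' * i + '1' + '0' * (n - i - 1)
              for i, s in enumerate(symbols)}
    # The first n cyclic productions encode the original productions;
    # the remaining (m - 1) * n are empty strings
    encoded = [''.join(encode[c] for c in productions[s]) for s in symbols]
    return encoded + [''] * ((m - 1) * n)

# The 2-tag system from our first example, with H -> H added
print(to_cyclic(2, ['a', 'b', 'H'], {'a': 'aab', 'b': 'H', 'H': 'H'}))
# ['100100010', '001', '001', '', '', '']
```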
The initial string aa becomes 100100, so our emulation is:
100100 // * Emulated production rule a -> aab
00100100100010 // P = 001 (but head is 0)
0100100100010 // P = 001 (but head is 0)
100100100010 // P = empty string
00100100010 // P = empty string, head is 0
0100100010 // P = empty string, head is 0
100100010 // * Emulated production rule a -> aab
00100010100100010 // P = 001 (but head is 0)
0100010100100010 // P = 001 (but head is 0)
100010100100010 // P = empty string
00010100100010 // P = empty string, head is 0
0010100100010 // P = empty string, head is 0
010100100010 // P = 100100010 (but head is 0)
10100100010 // * Emulated production rule b -> H
0100100010001 // P = 001 (but head is 0)
100100010001 // P = empty string
...
Using our Python implementation:
productions = ['100100010', '001', '001', '', '', '']
string = '100100'
print(string)
for string in cyclic_tag_system(productions, string):
print(string)
Note in this case the cyclic tag system won't halt when the emulated
m-tag system halts, since the halt is only emulated. But we can stop it
ourselves by checking whether the first 3 symbols represent the encoding
of H. We do this every sixth step, since we have a 2-tag system with 3
symbols, which means 1 step of the tag system is emulated by 6 steps of
the cyclic tag system.
productions = ['100100010', '001', '001', '', '', '']
i, string = 0, '100100'
print(string)
for string in cyclic_tag_system(productions, string):
print(string)
i = (i + 1) % 6
# Break if halting symbol is at the head of the string
if i == 0 and string[:3] == '001':
break
Or, an updated example that prints every sixth step and translates from the cyclic tag system encoding to the original symbols:
productions = ['100100010', '001', '001', '', '', '']
symbols = {
'100': 'a',
'010': 'b',
'001': 'H',
}
def translate(s):
return ''.join([symbols[s[i:i + 3]] for i in range(0, len(s), 3)])
i, string = 0, '100100'
print(f'{string} ({translate(string)})')
for string in cyclic_tag_system(productions, string):
i = (i + 1) % 6
if i == 0:
print(f'{string} ({translate(string)})')
if string[:3] == '001':
break
Running this code should be the emulated equivalent of our first example in this post.
Since m-tag systems (with \(m \gt 1\)) are Turing-complete and cyclic tag
systems can emulate any m-tag system, it follows that cyclic tag systems
are also Turing-complete. We can compute anything that is computable
with the alphabet 0, 1, and a list of words over this alphabet!
In the next post, we will continue our exploration of simple systems capable of universal computation with cellular automata.
In the previous post, we looked at a history of what would become computer science. In this post, we'll focus on Turing machines and Turing completeness.
The informal definition we gave to a Turing machine in the previous post is:
An abstract computer consisting of an infinite tape of cells, a head that can read from a cell, write to a cell, and move left or right over the tape, and a set of rules which direct the head based on the read symbol and the current state of the machine.
Formally:
A Turing machine is a 7-tuple \(M = \langle Q, q_0, F, \Gamma, b, \Sigma, \delta \rangle\).
- \(Q \ne \varnothing\) is a finite set of states. These are all the states the machine can be in.
- \(q_0 \in Q\) is the initial state. This is the state the machine starts in.
- \(F \subseteq Q\) is the set of final states. When the machine reaches one of the final states, it halts - it stops execution.
- \(\Gamma \ne \varnothing\) is a finite set of tape symbols. These are all the symbols that can appear on the tape.
- \(b \in \Gamma\) is the blank symbol, one of the possible tape symbols. The only symbol allowed to occur on the tape infinitely often at any step.
- \(\Sigma \subseteq \Gamma \setminus \lbrace b \rbrace\) is the set of input symbols allowed to appear in the initial tape contents (not written by the machine during execution). These symbols can be the whole alphabet (except the blank symbol), or a subset of the alphabet.
- \(\delta: (Q \setminus F) \times \Gamma \to Q \times \Gamma \times \lbrace L, R \rbrace\) is a function called the transition function. This function takes as input the current machine state and the symbol on the tape. It outputs the new machine state, the symbol to overwrite the current tape symbol, and the head movement (either left or right). Note the function domain excludes the final states - once the machine reaches a state in \(F\), it halts, so no more transitions happen.
Alternatively, the transition function can be defined as a partial function \(\delta: Q \times \Gamma \hookrightarrow Q \times \Gamma \times \lbrace L, R \rbrace\), where the machine halts if the function is undefined for the given combination of machine state and tape symbol. In some compact Turing machines (like we'll see below), \(F\) is empty. There is no final state; rather, we halt when encountering a combination of machine state and tape symbol for which no transition is defined.
Note this definition allows for some very uninteresting machines: a machine that only has an initial and a final state (\(Q = \lbrace q_0, f \rbrace\)) and, for any input symbol in \(\Gamma\), the transition function moves the machine into the final state. This is a Turing machine, but it can't really compute much. Something more is needed.
A universal Turing machine is a Turing machine that can simulate another, arbitrary, Turing machine on arbitrary input. That is, it can read the description of a Turing machine and that machine's input as its own input, then simulate the execution of that machine.
With this definition, a universal Turing machine can compute anything any other Turing machine can compute (anything that is computable).
Marvin Minsky discovered a universal Turing machine that requires only 7 states and 4 symbols. Yurii Rogozhin discovered a machine with only 4 states and 6 symbols. Let's call the states \(Q = \lbrace A, B, C, D \rbrace\) and the symbols \(\Gamma = \lbrace 0, 1, 2, 3, 4, 5 \rbrace\).
(4, 6) Turing Machine
A | B | C | D | |
---|---|---|---|---|
0 | 3,L,A | 4,R,B | 0,R,C | 4,R,D |
1 | 2,R,A | 2,L,C | 3,R,D | 5,L,B |
2 | 1,L,A | 3,R,B | 1,R,C | 3,R,D |
3 | 4,R,A | 2,L,B | HALT | HALT |
4 | 3,L,A | 0,L,B | 5,R,A | 5,L,B |
5 | 4,R,D | 1,R,B | 0,R,A | 1,R,D |
The above table describes the transition function of the Turing machine.
For example, if the machine is in state A and the read tape symbol is
5, we can look up the A column and 5 row to find the transition
4,R,D. This means: print 4 on the tape (overwriting the current
symbol), move the head right (R), and the machine is now in state D.
We're using the partial transition function definition, so instead of
defining one or more explicit final states (\(F\)), we simply don't
define a transition when the tape symbol is 3 and the machine is in
state C or state D.
Let's look at a Python implementation of Turing machines. First, let's implement the tape we will be using. Theoretically this is an infinite tape. To simulate this in software, we will use a list and whenever we move the head left or right beyond the list, we extend the list with an additional blank symbol:
class Tape:
def __init__(self, tape, head = 0):
# Initial tape should have at least one symbol
assert(len(tape) >= 1)
# Tape head should be a valid index
assert(0 <= head < len(tape))
self.tape = tape
self.head = head
def read(self):
return self.tape[self.head]
def write(self, symbol):
self.tape[self.head] = symbol
def move_left(self):
# If attempting to move left out of bounds, extend tape left
if self.head == 0:
self.tape.insert(0, 0)
else:
self.head -= 1
def move_right(self):
self.head += 1
# If attempting to move right out of bounds, extend tape right
if self.head == len(self.tape):
self.tape.append(0)
We'll implement a machine that takes a tape, a transition table, and an initial state, and runs until it halts:
def machine(tape, transitions, state):
while True:
symbol = tape.read()
# If no transition is defined for the current state and symbol, halt
if not transitions[state][symbol]:
break
new_symbol, direction, new_state = transitions[state][symbol]
tape.write(new_symbol)
tape.move_left() if direction == 'L' else tape.move_right()
state = new_state
To stitch this together, we need a transition table and an initial tape state. We'll use the Rogozhin (4, 6) machine:
# Machine states
A, B, C, D = 'A', 'B', 'C', 'D'
# Left and right
L, R = 'L', 'R'
# Rogozhin 4-state, 6-symbol Turing machine
transitions = {
A: [(3, L, A), (2, R, A), (1, L, A), (4, R, A), (3, L, A), (4, R, D)],
B: [(4, R, B), (2, L, C), (3, R, B), (2, L, B), (0, L, B), (1, R, B)],
C: [(0, R, C), (3, R, D), (1, R, C), None, (5, R, A), (0, R, A)],
D: [(4, R, D), (5, L, B), (3, R, D), None, (5, L, B), (1, R, D)],
}
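Driving the Rogozhin machine requires encoding a program on its tape, which is beyond our scope here, but we can exercise the same Tape/machine scaffolding with a toy machine of my own: a single-state machine over \(\Gamma = \lbrace 0, 1, 2 \rbrace\) that swaps 1s and 2s and halts on the first blank (0), for which no transition is defined. The helpers are restated so the snippet is self-contained:

```python
# Minimal restatement of the Tape and machine() helpers from above.
class Tape:
    def __init__(self, tape, head=0):
        self.tape, self.head = tape, head

    def read(self):
        return self.tape[self.head]

    def write(self, symbol):
        self.tape[self.head] = symbol

    def move_left(self):
        if self.head == 0:
            self.tape.insert(0, 0)  # extend tape left with a blank
        else:
            self.head -= 1

    def move_right(self):
        self.head += 1
        if self.head == len(self.tape):
            self.tape.append(0)  # extend tape right with a blank

def machine(tape, transitions, state):
    while True:
        symbol = tape.read()
        if not transitions[state][symbol]:
            break  # no transition defined: halt
        new_symbol, direction, state = transitions[state][symbol]
        tape.write(new_symbol)
        tape.move_left() if direction == 'L' else tape.move_right()

# Toy single-state machine (not Rogozhin's): swap 1s and 2s, halt on 0.
toy = {'A': [None, (2, 'R', 'A'), (1, 'R', 'A')]}
tape = Tape([1, 2, 2, 1])
machine(tape, toy, 'A')
print(tape.tape)  # [2, 1, 1, 2, 0]
```

The trailing 0 appears because the head moves right past the input, the tape is extended with a blank, and the machine halts on it.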
This machine is a universal Turing machine, meaning it can simulate any other Turing machine, and thus is capable of universal computation (it can compute anything that is computable).
A Turing-complete system is any system capable of simulating any Turing machine.
Turing-completeness is a way of expressing the computational power of a given system. A Turing-complete system is capable of universal computation. The small Rogozhin (4, 6) machine, since it is a universal Turing machine, is Turing-complete.
More so, the fact that we can simulate this machine in the Python programming language proves that the Python language itself is Turing-complete.
If we weaken some of the constraints for Turing machines, there are even smaller weak universal Turing machines. For example, if we allow the tape to contain an infinitely repeated sequence of symbols, or we don't require the machine to ever halt.
The smallest weak Turing machine is a Turing machine consisting of 2 states and 3 symbols. Let's call the states \(Q = \lbrace A, B \rbrace\) and the symbols \(\Gamma = \lbrace 0, 1, 2 \rbrace\).
(2, 3) Turing Machine
A | B | |
---|---|---|
0 | 1,R,B | 2,L,A |
1 | 2,L,A | 2,R,B |
2 | 1,L,A | 0,R,A |
Stephen Wolfram in A New Kind of Science (a book we'll get back to in a future post) described a 2-state 5-symbol universal Turing machine and conjectured the 2-state 3-symbol machine is also universal. The universality of the 2-state 3-symbol machine was proved in 2007.
In terms of Turing-complete programming languages, a somewhat famous
esoteric programming language is Brainfuck. Brainfuck uses a byte array
(tape), a data pointer (index in the array), and 8 symbols: >, <, +, -,
., ,, [, ]. The symbols are interpreted as:
- > : Increment the data pointer (move head right).
- < : Decrement the data pointer (move head left).
- + : Increment array value at data pointer.
- - : Decrement array value at data pointer.
- . : Output value at data pointer.
- , : Read 1 byte of input and store at data pointer.
- [ : If the byte at data pointer is 0, jump forward past the matching ], else continue with the next instruction.
- ] : If the byte at data pointer is not 0, jump back to the matching [, else continue with the next instruction.
This simple language is very much modeled after a Turing machine. Here is Hello World! in Brainfuck:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>
---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
Since the language definition is so simple, it is very easy to write a Brainfuck interpreter:
import sys
def bf(program):
# Data array, data pointer, and code pointer
data, dp, cp = [0], 0, 0
while cp < len(program):
match program[cp]:
case '<':
dp -= 1
case '>':
dp += 1
if dp == len(data):
data.append(0)
case '+':
data[dp] += 1
case '-':
data[dp] -= 1
case '.':
print(chr(data[dp]), end='')
case ',':
data[dp] = ord(sys.stdin.read(1))
case '[':
if data[dp] == 0:
opened = 1
while opened:
cp += 1
if program[cp] == ']':
opened -= 1
elif program[cp] == '[':
opened += 1
case ']':
if data[dp] != 0:
opened = 1
while opened:
cp -= 1
if program[cp] == '[':
opened -= 1
elif program[cp] == ']':
opened += 1
cp += 1
Also note that any programming language that can implement a Brainfuck interpreter is Turing-complete (since Brainfuck is Turing-complete).
There are also some surprising proofs of unintentional Turing-completeness. For example, C++ template metaprogramming was proved to be Turing-complete (not the C++ language itself, which is obviously Turing-complete, just the template part alone). Magic: The Gathering is also Turing-complete. Turing-completeness comes in many forms. In the next posts, we'll look at some other models of universal computation: tag systems and cellular automata.
An algorithm (/ˈælɡərɪðəm/) is a finite sequence of well-defined instructions, typically used to solve a class of specific problems or to perform a computation.
The first computer (we know of) is the Antikythera mechanism. It was found in 1901 in a shipwreck. The device was built sometime between 150 BC and 100 BC and uses gears to predict astronomical positions of the Sun, Moon, and planets through the zodiac.
Image from Wikimedia Commons user Marsyas, CC BY 2.5
This millennia old device is a hand-powered analog computer. Humanity has been looking at automating computation for quite some time.
Skipping forward a few hundred years, the famous Gottfried Wilhelm Leibniz (1646-1716) designed the first device that could perform the 4 arithmetic operations and used an internal memory. He also invented the binary system, and his Algebra of Thought is a precursor to Boolean Algebra. Leibniz is famous as a mathematician (inventing calculus independently of Isaac Newton), but some also call him the father of computer science.
After creating his arithmetic machine, Leibniz dreamt of a machine that could manipulate symbols in order to decide the truth value of mathematical statements.
Over a century later, Charles Babbage (1791-1871) invents the Difference Engine, a mechanical calculator that can tabulate polynomial functions. Babbage created a small version of this, the Difference Engine 0, in 1822. Work on a larger version, which was supposed to enable larger calculations, was funded by the British government. Unfortunately, this did not materialize due to the manufacturing limitations of the time. It took 20 years and large amounts of money until the project was abandoned. The Difference Engine 1 was never completed.
During this time, Babbage started thinking about a general-purpose computer, the Analytical Engine. The Analytical Engine would include an arithmetic logic unit, control flow, and memory - components of modern electronic computers. The programming language resembled modern day assembly languages and would have been fed to the computer through punch cards. This machine was never built.
Even though the physical Analytical Engine did not materialize, several programs were created for it, both by Babbage and Ada Lovelace (1815-1852). Ada published the first algorithm for the Analytical Engine, used to compute Bernoulli numbers, and is regarded as the first programmer.
At the beginning of the 20th century, mathematicians were looking for a proper foundation for mathematics: a set of axioms from which all theorems could be derived.
David Hilbert (1862-1943) put forward 23 problems in 1900, which heavily influenced the direction of mathematics research in the 20th century. Some of the problems have since been solved, others, like the famous Riemann hypothesis (problem 8), are still unresolved.
The 2nd problem, directly tying into the foundational crisis, was to prove that the axioms of arithmetic are consistent (meaning no contradictions can arise as theorems are derived from the axioms).
Alfred North Whitehead (1861-1947) and Bertrand Russell (1872-1970) start working on the Principia Mathematica. 3 volumes are published in 1910, 1912, and 1913. Starting with a minimum set of primitive notions, axioms, and inference rules, they deduce theorems pertaining to logic, arithmetic, set theory and so on. Famously, the proof that 1+1=2 appears on page 379 of volume 1.
Kurt Gödel (1906-1978) proves, with his incompleteness theorem (1930), that a formal system powerful enough to describe arithmetic cannot be both consistent and complete. In other words, starting from a set of axioms, if these are consistent (no contradictions can be derived), they cannot be complete (there will be true statements that cannot be derived from these axioms).
Building upon this work, in 1933, Gödel develops general recursive functions as a model of computability (more on this later).
David Hilbert proposes another challenge in 1928: the decision problem. The problem asks for an algorithm that takes a statement as an input and decides whether the statement is provable within the considered set of axioms. Note that Gödel's incompleteness theorem shows that some true statements cannot be proved from a consistent set of axioms. That doesn't mean there isn't an algorithm that can decide whether a statement is provable or not. Hilbert believed such an algorithm exists.
Alonzo Church (1903-1995) develops lambda calculus as a model of computation that uses function abstraction, application, and variable binding and substitution. Church's Theorem (1936) provides a negative answer to the decision problem, based on lambda calculus. He shows there is no computable function that can decide whether two lambda expressions are equivalent.
During the same time, Alan Turing (1912-1954) develops another model of computation: the Turing machine. This is an abstract computer consisting of an infinite tape of cells, a head that can read from a cell, write to a cell, and move left or right over the tape, and a set of rules which direct the head based on the read symbol and the current state of the machine. Turing also provides a negative answer to the decision problem during the same year as Church (1936), based on Turing machines: he shows that there is no general method to decide whether any given Turing machine halts or not (the halting problem).
These are remarkable results: we now have proof that some problems are incomputable. More than that, we know that a Turing machine can compute anything that is computable.
Turing showed that lambda calculus and Turing machines are equivalent: lambda calculus can compute anything that a Turing machine can compute, and vice versa, so the two systems have the same computational power. This equivalence is part of the foundation of the Church-Turing thesis.
In general, if a system can be used to simulate a Turing machine, this makes it Turing complete, meaning capable of computing anything that is computable.
Gödel's general recursive functions are also shown to be an equivalent model of computation (these are the functions that Turing machines can compute).
We have 3 quite different approaches to universal computability: general recursive functions, lambda calculus, and Turing machines. These turn out to all be equivalent in terms of what is possible to compute.
Turing machines, with their simple definition, are easy to simulate, thus making Turing completeness the preferred way of proving that a system is capable of universal computation.
]]>I wrote before about the inherent complexity of the real world and how software that behaves well in the real world must necessarily take on some complexity (Time and Complexity). A lot of the software engineering best practices try to reduce or eliminate the accidental complexity of large systems (making things more complicated than they should be). But we don't live in a perfect world, so modeling it using software requires some inherent complexity in the software, to reflect reality. One of the algorithms which perfectly illustrates this is the Timsort sorting algorithm.
Timsort is an algorithm developed by Tim Peters in 2002 to replace Python's previous sorting algorithm. It has since been adopted in Java's OpenJDK, the V8 JavaScript engine, and the Swift and Rust languages. This is a testament to how performant Timsort is.
Timsort is a stable sorting algorithm, which means it never changes the relative order of equal elements. This property doesn't matter when sorting plain numbers, but becomes important when sorting objects with custom comparisons.
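Python's built-in sort is itself Timsort, so we can observe stability directly (the records below are made up for illustration):

```python
# Sort (name, score) records by score only; records with equal scores
# keep their original relative order because the sort is stable.
records = [('alice', 2), ('bob', 1), ('carol', 2), ('dave', 1)]
by_score = sorted(records, key=lambda r: r[1])
print(by_score)
# [('bob', 1), ('dave', 1), ('alice', 2), ('carol', 2)]
```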
But in 2002 we already had plenty of well known sorting algorithms which were quite efficient. How did Timsort manage to outperform these?
The key insight of Timsort is that in the real world, many lists of
elements that require sorting contain subsequences of elements that are
already sorted. These are called runs and tend to appear naturally.
For example, in the list [5, 2, 3, 4, 9, 1, 6, 8, 10, 7] we have two
runs: [2, 3, 4] and [1, 6, 8, 10].
If we know runs will show up more often than not in our input, how can we best leverage this to our advantage, and avoid extraneous comparisons and data movement?
Timsort starts by finding the minimum accepted run length for a given
input. This doesn't have anything to do with the content of the input;
rather, it is a function of the size of the input. More on this later.
Then we do a single pass over the array and identify consecutive runs. If the next minimum accepted run length elements are not already sorted (they don't form a run), we sort them using insertion sort (so they do end up as a run). We push these runs on a stack, then we merge pairs of them until we end up with a single run, which is our sorted list.
Let's start with a simple sketch implementation. We'll use Python since it is expressive and it makes it easier to focus on the algorithm rather than syntax around it.
MIN_MERGE = 4
def sort(arr):
lo, hi = 0, len(arr)
stack = []
nRemaining = hi
minRun = MIN_MERGE
while nRemaining > 0:
runLen = min(nRemaining, minRun)
insertionSort(arr, lo, lo + runLen)
stack.append((lo, runLen))
lo += runLen
nRemaining -= runLen
while len(stack) > 1:
base2, len2 = stack.pop()
base1, len1 = stack.pop()
merge(arr, base1, base2, base2 + len2)
stack.append((base1, len1 + len2))
First, we initialize a few variables:
lo, hi = 0, len(arr)
stack = []
nRemaining = hi
minRun = MIN_MERGE
MIN_MERGE represents the minimum number of elements we want to merge,
and is a constant. We'll talk more about this once we look at some
optimizations later on.
lo and hi represent the range in the array we will operate on. Note
ranges are always half-open (arr[lo] included, arr[hi] excluded,
potentially out of bounds). stack is the run stack, nRemaining is
the number of elements we still need to process. minRun is the minimum
run length. For this first iteration, we'll just use MIN_MERGE.
Next, we traverse the array and come up with our runs:
while nRemaining > 0:
runLen = min(nRemaining, minRun)
insertionSort(arr, lo, lo + runLen)
stack.append((lo, runLen))
lo += runLen
nRemaining -= runLen
Our run in this case will be the minimum between minRun and the
remaining elements of the array (so for the final run, we don't go out
of bounds). We sort the run using insertionSort, then we push the run
start index and length onto the stack. We advance lo by the length of
the run and we similarly decrement nRemaining, the number of elements
still to be processed.
Next, we merge the runs:
while len(stack) > 1:
base2, len2 = stack.pop()
base1, len1 = stack.pop()
merge(arr, base1, base2, base2 + len2)
stack.append((base1, len1 + len2))
We pop 2 runs from the top of the stack, merge them, and push the new run back onto the stack. With this basic implementation, a stack is technically not really needed, but I'm trying to preserve the general shape of the optimized solution.
We called a couple of helper functions: insertionSort and merge. Here is insertionSort:
def insertionSort(arr, lo, hi):
i = lo + 1
while i < hi:
elem = arr[i]
j = i - 1
while j >= lo and elem < arr[j]:
j -= 1
arr.pop(i)
arr.insert(j + 1, elem)
i += 1
Insertion sort traverses the array from the lower bound + 1 to the
upper bound and maintains the invariant that all elements preceding i
are sorted. So for any element arr[i], we find a spot j in the range
[lo, i) where this element should fit. We then insert it there and
shift the remaining elements in [j + 1, i) one spot to the right. Note
this algorithm is quite inefficient on large data sets, but performs
well on small inputs.
Our merge algorithm is:
def merge(arr, lo, mid, hi):
t = arr[lo:mid]
i, j, k = lo, mid, 0
while k < mid - lo and j < hi:
if t[k] < arr[j]:
arr[i] = t[k]
k += 1
else:
arr[i] = arr[j]
j += 1
i += 1
if k < mid - lo:
arr[i:hi] = t[k:mid - lo]
We are merging the consecutive (sorted) ranges [lo, mid) and [mid, hi).
One way to do this (which our implementation uses) is to copy [lo, mid)
to a temporary buffer t. We then traverse the [mid, hi) range with j
and the buffer with k. We pick the smaller of t[k] and arr[j] to insert
at arr[i] (incrementing the corresponding index), then we increment i.
At some point, either j or k reaches the end. If j makes it to the end
first, it means we still have some elements in t we need to copy over.
If k makes it to the end first, we don't need to do anything: the
remaining elements in [j, hi) are already where they are supposed to be.
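With all three pieces in place, we can run the sketch end to end. The functions are restated here so the snippet is self-contained (with the inner insertion-sort condition checking j >= lo before indexing, to stay within the run's bounds):

```python
MIN_MERGE = 4

def insertionSort(arr, lo, hi):
    # Grow a sorted prefix of [lo, hi) one element at a time
    i = lo + 1
    while i < hi:
        elem = arr[i]
        j = i - 1
        while j >= lo and elem < arr[j]:
            j -= 1
        arr.pop(i)
        arr.insert(j + 1, elem)
        i += 1

def merge(arr, lo, mid, hi):
    # Merge the consecutive sorted ranges [lo, mid) and [mid, hi)
    t = arr[lo:mid]
    i, j, k = lo, mid, 0
    while k < mid - lo and j < hi:
        if t[k] < arr[j]:
            arr[i] = t[k]
            k += 1
        else:
            arr[i] = arr[j]
            j += 1
        i += 1
    if k < mid - lo:
        arr[i:hi] = t[k:mid - lo]

def sort(arr):
    # Carve the array into sorted runs, then merge them off a stack
    lo, hi = 0, len(arr)
    stack = []
    nRemaining = hi
    minRun = MIN_MERGE
    while nRemaining > 0:
        runLen = min(nRemaining, minRun)
        insertionSort(arr, lo, lo + runLen)
        stack.append((lo, runLen))
        lo += runLen
        nRemaining -= runLen
    while len(stack) > 1:
        base2, len2 = stack.pop()
        base1, len1 = stack.pop()
        merge(arr, base1, base2, base2 + len2)
        stack.append((base1, len1 + len2))

arr = [5, 2, 3, 4, 9, 1, 6, 8, 10, 7]
sort(arr)
print(arr)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```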
We now have a full implementation of a very simple Timsort. If we run it
on the [5, 2, 3, 4, 9, 1, 6, 8, 10, 7]
input, the following steps take
place:
- We take the first run, [5, 2, 3, 4], and sort it using insertionSort. This becomes [2, 3, 4, 5]. We push its start index and length on the stack ((0, 4)).
- We take the next run, [9, 1, 6, 8], sort it to [1, 6, 8, 9], and push (4, 4) on the stack.
- We are left with [10, 7]. We sort this short run to [7, 10] and push (6, 2) on the stack.

Note all our sorting happens in-place, so by now the whole input became [2, 3, 4, 5, 1, 6, 8, 9, 7, 10]. We then proceed to merge runs from the top of the stack:
- We merge [1, 6, 8, 9] with [7, 10], which yields [1, 6, 7, 8, 9, 10]. We pop the two runs from the stack and push (4, 6), the index and length of this new run.
- We merge [2, 3, 4, 5] with [1, 6, 7, 8, 9, 10], and update the stack accordingly. At this point, we only have 1 run on the stack ((0, 10)). We are done.

So far, we haven't relied that much on the fact that our input might be
naturally partially sorted. Instead of simply calling insertionSort
on
minRun
elements, we can actually check whether elements are already
ordered. If they are, we don't need to do anything with them. Even
better, if the run of elements is longer than minRun
, we keep going.
Elements might also come naturally sorted in descending order, while we
are sorting in ascending order. No problem: we can take a range of
elements coming in descending order and reverse it to produce a run in
ascending order. Let's call this function countRunAndMakeAscending
:
def countRunAndMakeAscending(arr, lo, hi):
    runHi = lo + 1
    if runHi == hi:
        return 1
    if arr[lo] > arr[runHi]: # Descending run
        while runHi < hi and arr[runHi] < arr[runHi - 1]:
            runHi += 1
        reverseRange(arr, lo, runHi)
    else: # Ascending run
        while runHi < hi and arr[runHi] >= arr[runHi - 1]:
            runHi += 1
    return runHi - lo
We return the length of the run starting from lo
, going to at most
hi - 1
. If we have a natural descending run, we reverse the range
before returning. Here is reverseRange
:
def reverseRange(arr, lo, hi):
    hi -= 1
    while lo < hi:
        arr[lo], arr[hi] = arr[hi], arr[lo]
        lo += 1
        hi -= 1
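As a quick sanity check, here are both helpers at work on an input that starts with a descending run (a sketch using the implementations above):

```python
def reverseRange(arr, lo, hi):
    # Reverse arr[lo:hi] in place.
    hi -= 1
    while lo < hi:
        arr[lo], arr[hi] = arr[hi], arr[lo]
        lo += 1
        hi -= 1

def countRunAndMakeAscending(arr, lo, hi):
    # Count the natural run starting at lo; reverse it if descending.
    runHi = lo + 1
    if runHi == hi:
        return 1
    if arr[lo] > arr[runHi]:  # descending run: extend it, then reverse
        while runHi < hi and arr[runHi] < arr[runHi - 1]:
            runHi += 1
        reverseRange(arr, lo, runHi)
    else:  # ascending run: just extend it
        while runHi < hi and arr[runHi] >= arr[runHi - 1]:
            runHi += 1
    return runHi - lo

data = [5, 4, 3, 2, 6, 1]
runLen = countRunAndMakeAscending(data, 0, len(data))
print(runLen)  # 4: the descending run [5, 4, 3, 2]...
print(data)    # [2, 3, 4, 5, 6, 1]: ...was reversed in place
```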
We can't get rid of sorting though: we might hit worst-case scenarios
with very small runs, in which case we still need a range of at least
minRun size. If the result of countRunAndMakeAscending is smaller than
minRun, we will force a few more elements into the run and sort it. Our
new implementation looks like this:
def sort(arr):
    lo, hi = 0, len(arr)
    stack = []
    nRemaining = hi
    minRun = MIN_MERGE
    while nRemaining > 0:
        runLen = countRunAndMakeAscending(arr, lo, hi)
        if runLen < minRun:
            force = min(nRemaining, minRun)
            insertionSort(arr, lo, lo + force)
            runLen = force
        stack.append((lo, runLen))
        lo += runLen
        nRemaining -= runLen
    while len(stack) > 1:
        base2, len2 = stack.pop()
        base1, len1 = stack.pop()
        merge(arr, base1, base2, base2 + len2)
        stack.append((base1, len1 + len2))
Highlighting the changed part:
runLen = countRunAndMakeAscending(arr, lo, hi)
if runLen < minRun:
    force = min(nRemaining, minRun)
    insertionSort(arr, lo, lo + force)
    runLen = force
Instead of simply taking the next minRun
elements, we try to find a
run. If the run we find is smaller than minRun
, we force it to be
minRun
by insertion-sorting into it more elements. If it is larger
than or equal to minRun
on the other hand, we don't have to do any
sorting.
It gets better: now we know after calling countRunAndMakeAscending
that the range [lo, lo + runLen)
is already sorted. We can hint this
to our sorting function and have it start sorting only from
lo + runLen
. We can update insertionSort
to take a hint of where to
start from:
def insertionSort(arr, lo, hi, start):
    if start == lo:
        start += 1
    while start < hi:
        elem = arr[start]
        j = start - 1
        while j >= lo and elem < arr[j]:
            j -= 1
        arr.pop(start)
        arr.insert(j + 1, elem)
        start += 1
This version is very similar to our previous one. Instead of using a
local i
variable to iterate over the range [lo + 1, hi)
, we just use
start
. If start
is lo
, we increment it before the loop (just like
we used to initialize i
to lo + 1
).
We can now pass this hint in from our main function:
while nRemaining > 0:
    runLen = countRunAndMakeAscending(arr, lo, hi)
    if runLen < minRun:
        force = min(nRemaining, minRun)
        insertionSort(arr, lo, lo + force, lo + runLen)
        runLen = force
At this point, we're starting to get a lot of value from naturally
sorted runs: we either don't do any sorting, or just sort at most
minRun - runLen
elements into the range.
A further optimization for sorting: we can replace insertion sort with
binary sort. Binary sort works much like insertion sort, but instead of
checking where element i
fits into [lo, i)
by comparing it with
i - 1
, then i - 2
and so on, it relies on the fact that [lo, i)
is
already sorted and performs a binary search to find the right spot. Here
is an implementation, which also takes a start
hint:
def binarySort(arr, lo, hi, start):
    if start == lo:
        start += 1
    while start < hi:
        pivot = arr[start]
        left, right = lo, start
        while left < right:
            mid = (left + right) // 2
            if pivot < arr[mid]:
                right = mid
            else:
                left = mid + 1
        arr.pop(start)
        arr.insert(left, pivot)
        start += 1
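A quick check of binarySort, passing a start hint for an already-sorted prefix (a sketch using the implementation above):

```python
def binarySort(arr, lo, hi, start):
    # Insertion sort arr[lo:hi] using binary search to find each spot;
    # the range [lo, start) is assumed to be sorted already.
    if start == lo:
        start += 1
    while start < hi:
        pivot = arr[start]
        left, right = lo, start
        while left < right:
            mid = (left + right) // 2
            if pivot < arr[mid]:
                right = mid
            else:
                left = mid + 1
        arr.pop(start)
        arr.insert(left, pivot)
        start += 1

data = [1, 3, 5, 2, 4]
binarySort(data, 0, 5, 2)  # hint: the first two elements are already sorted
print(data)  # [1, 2, 3, 4, 5]
```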
Our main function now looks like this:
def sort(arr):
    lo, hi = 0, len(arr)
    stack = []
    nRemaining = hi
    minRun = MIN_MERGE
    while nRemaining > 0:
        runLen = countRunAndMakeAscending(arr, lo, hi)
        if runLen < minRun:
            force = min(nRemaining, minRun)
            binarySort(arr, lo, lo + force, lo + runLen)
            runLen = force
        stack.append((lo, runLen))
        lo += runLen
        nRemaining -= runLen
    while len(stack) > 1:
        base2, len2 = stack.pop()
        base1, len1 = stack.pop()
        merge(arr, base1, base2, base2 + len2)
        stack.append((base1, len1 + len2))
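Putting the pieces so far together into one runnable sketch (MIN_MERGE kept at 4, as in our initial sketch):

```python
MIN_MERGE = 4

def reverseRange(arr, lo, hi):
    hi -= 1
    while lo < hi:
        arr[lo], arr[hi] = arr[hi], arr[lo]
        lo += 1
        hi -= 1

def countRunAndMakeAscending(arr, lo, hi):
    runHi = lo + 1
    if runHi == hi:
        return 1
    if arr[lo] > arr[runHi]:  # descending run: extend, then reverse
        while runHi < hi and arr[runHi] < arr[runHi - 1]:
            runHi += 1
        reverseRange(arr, lo, runHi)
    else:  # ascending run: just extend
        while runHi < hi and arr[runHi] >= arr[runHi - 1]:
            runHi += 1
    return runHi - lo

def binarySort(arr, lo, hi, start):
    if start == lo:
        start += 1
    while start < hi:
        pivot = arr[start]
        left, right = lo, start
        while left < right:
            mid = (left + right) // 2
            if pivot < arr[mid]:
                right = mid
            else:
                left = mid + 1
        arr.pop(start)
        arr.insert(left, pivot)
        start += 1

def merge(arr, lo, mid, hi):
    t = arr[lo:mid]
    i, j, k = lo, mid, 0
    while k < mid - lo and j < hi:
        if t[k] < arr[j]:
            arr[i] = t[k]
            k += 1
        else:
            arr[i] = arr[j]
            j += 1
        i += 1
    if k < mid - lo:
        arr[i:hi] = t[k:mid - lo]

def sort(arr):
    lo, hi = 0, len(arr)
    stack = []
    nRemaining = hi
    minRun = MIN_MERGE
    while nRemaining > 0:
        runLen = countRunAndMakeAscending(arr, lo, hi)
        if runLen < minRun:  # run too short: force it to minRun and sort
            force = min(nRemaining, minRun)
            binarySort(arr, lo, lo + force, lo + runLen)
            runLen = force
        stack.append((lo, runLen))
        lo += runLen
        nRemaining -= runLen
    while len(stack) > 1:  # merge all runs from the top of the stack
        base2, len2 = stack.pop()
        base1, len1 = stack.pop()
        merge(arr, base1, base2, base2 + len2)
        stack.append((base1, len1 + len2))

data = [5, 2, 3, 4, 9, 1, 6, 8, 10, 7]
sort(data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```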
Another key optimization of Timsort is trying as much as possible to merge runs of balanced sizes. The closer the runs are in size, the better the average performance, both in additional space required and in number of operations.
So far we just pushed everything onto a stack, then merged the top 2 elements of the stack until we ended up with a single run. We actually want to do something a bit different: we want our stack to maintain a couple of invariants:
- stack[i - 1][1] > stack[i][1] + stack[i + 1][1] - the length of a run needs to be larger than the sum of the lengths of the following two runs.
- stack[i][1] > stack[i + 1][1] - the length of a run needs to be larger than the length of the following run.

When pushing a new index and run length tuple onto the stack, we check
if the invariant still holds. If it doesn't, we merge stack[i]
with
the smallest of stack[i - 1]
, stack[i + 1]
and recheck. We continue
merging until the invariants are re-established. Let's call this
function mergeCollapse
:
def mergeCollapse(arr, stack):
    while len(stack) > 1:
        n = len(stack) - 2
        if (n > 0 and stack[n - 1][1] <= stack[n][1] + stack[n + 1][1]) or \
                (n > 1 and stack[n - 2][1] <= stack[n][1] + stack[n - 1][1]):
            if stack[n - 1][1] < stack[n + 1][1]:
                n -= 1
        elif n < 0 or stack[n][1] > stack[n + 1][1]:
            break
        mergeAt(arr, stack, n)
We start at n = len(stack) - 2, the run second from the top of the stack. If n > 0
and the invariant
doesn't hold for stack[n - 1]
, stack[n]
, and stack[n + 1]
or if
n > 1
and the invariant doesn't hold for stack[n - 2]
,
stack[n - 1]
and stack[n]
, we need to merge. We decide whether we
want to merge stack[n]
with stack[n + 1]
or stack[n - 1]
with
stack[n]
depending on which one is smaller (if stack[n - 1] is smaller than
stack[n + 1], we decrement n to trigger the merge at n - 1).
If the invariant holds, we check for the other invariant:
stack[n][1] > stack[n + 1][1]
. If this second invariant holds, we're
done and we can break out of the loop (we do the same if we ran out of
elements). If not, we trigger a merge by calling mergeAt
and repeat
until we either merge everything or the invariant is reestablished.
We start by checking only the top few elements of the stack, since we expect the rest of the stack to hold the invariants. We only call this function when we push a new run on the stack, in which case we need to ensure we merge as needed.
Let's take a look at mergeAt
. This function simply merges the runs at
positions n
and n + 1
on the stack:
def mergeAt(arr, stack, i):
    assert i == len(stack) - 2 or i == len(stack) - 3
    base1, len1 = stack[i]
    base2, len2 = stack[i + 1]
    stack[i] = (base1, len1 + len2)
    if i == len(stack) - 3:
        stack[i + 1] = stack[i + 2]
    stack.pop()
    merge(arr, base1, base2, base2 + len2)
Remember we only ever merge either the second from top and top runs or
the third from top and second from top runs. So i
should be either
len(stack) - 2
or len(stack) - 3
. We get the first element and run
length for the two runs and update the stack: stack[i]
starts at the
same position but will now have the length of both unmerged runs. If we
are merging stack[-3]
with stack[-2]
, we need to copy stack[-1]
(top of the stack) to stack[-2]
(second to top). Finally, we pop the
top of the stack. At this point, the stack is updated. We call merge
on the two runs to update arr
too.
We can now maintain a healthy balance for merges. Remember, the whole reason for this is to aim to always merge runs similar in size.
Of course, once we are done pushing everything on the stack, we still
need to force merging to finish our sort. We'll do this with
mergeForceCollapse
:
def mergeForceCollapse(arr, stack):
    while len(stack) > 1:
        n = len(stack) - 2
        if n > 0 and stack[n - 1][1] < stack[n + 1][1]:
            n -= 1
        mergeAt(arr, stack, n)
This function again merges the second from the top run with the smaller
of the third from the top or the top. It continues until all runs are
merged into one. Our updated sort looks like this:
def sort(arr):
    lo, hi = 0, len(arr)
    stack = []
    nRemaining = hi
    minRun = MIN_MERGE
    while nRemaining > 0:
        runLen = countRunAndMakeAscending(arr, lo, hi)
        if runLen < minRun:
            force = min(nRemaining, minRun)
            binarySort(arr, lo, lo + force, lo + runLen)
            runLen = force
        stack.append((lo, runLen))
        mergeCollapse(arr, stack)
        lo += runLen
        nRemaining -= runLen
    mergeForceCollapse(arr, stack)
Instead of pushing everything onto the stack and merging everything at
the end, we now call mergeCollapse
after each push to keep the runs
balanced. At the end, we call mergeForceCollapse
to force-merge the
stack.
We used a constant minimum run length so far, but mentioned earlier that
it is in fact determined as a function of the size of the input. We will
determine this with minRunLength
:
def minRunLength(n):
    r = 0
    while n >= MIN_MERGE:
        r |= n & 1
        n >>= 1
    return n + r
This function takes the length of the input and does the following:
- If n is smaller than MIN_MERGE, it returns n - the input size is too small to use complicated optimizations on.
- If n is a power of 2, the algorithm will return MIN_MERGE / 2. Note: MIN_MERGE is also a power of 2. In our initial sketch we set it to 4, but in practice this is usually 32 or 64.
- Otherwise, it returns a number k between MIN_MERGE / 2 and MIN_MERGE so that n / k is close to, but strictly less than, a power of 2.

It does this by shifting n one bit to the right until it is less than
one bit to the right until it is less than
MIN_MERGE
. In case any shifted bit is 1, it means n
is not a power
of 2. In that case, we set r
to 1 and return n + 1
.
The reason we do all of this work is to again strive to keep merges balanced. If we get an input like 2048 and our MIN_MERGE is 64, we get back 32. That means that, if we don't have any great runs in our input, we end up with 64 runs, each of length 32. We saw in the previous section how we balance the stack. Consider we're pushing these runs onto the stack:
- We push (0, 32) on the stack (first 32 elements).
- We push (32, 32) on the stack (next 32 elements). The invariant doesn't hold: the run (0, 32) is not greater than the run (32, 32). The stack becomes (0, 64).
- We push (64, 32) on the stack (next 32 elements).
- We push (96, 32) on the stack (next 32 elements). The invariant doesn't hold: the length of the run (0, 64) (64) is not greater than the sum of the lengths of the next two runs, both of which are 32. The run (64, 32) gets merged with the smaller run, (96, 32). The stack becomes [(0, 64), (64, 64)]. The invariant still doesn't hold, so these runs get merged too: [(0, 128)].

This goes on in the same fashion, and all merges end up being perfectly balanced. This works great for powers of 2.
Now let's consider another case: what if the input is 2112? If we would
still use 32 as our minimum run length, we would get 66 runs of length
32. The first 64 will trigger perfectly balanced merges as before, but
then we end up with the stack [(0, 2048), (2048, 32), (2080, 32)]
.
This collapses to [(0, 2048), (2048, 64)]
, triggering a completely
unbalanced merge (2048 on one side and 64 on the other).
To keep things balanced, if our input is not a power of 2, we pick a
minimum run length that is close to but strictly less than a power of 2.
Let's update our MIN_MERGE
to be 32, and update our sort
to call
minRunLength
instead of automatically setting it to MIN_MERGE
.
We'll throw in another quick optimization: if the whole input is
smaller than MIN_MERGE
, don't even bother with the whole thing: find
a starting run then binary sort the rest, without any merging.
MIN_MERGE = 32

def sort(arr):
    lo, hi = 0, len(arr)
    stack = []
    nRemaining = hi
    if nRemaining < MIN_MERGE:
        initRunLen = countRunAndMakeAscending(arr, lo, hi)
        binarySort(arr, lo, hi, lo + initRunLen)
        return
    minRun = minRunLength(len(arr))
    while nRemaining > 0:
        runLen = countRunAndMakeAscending(arr, lo, hi)
        if runLen < minRun:
            force = min(nRemaining, minRun)
            binarySort(arr, lo, lo + force, lo + runLen)
            runLen = force
        stack.append((lo, runLen))
        mergeCollapse(arr, stack)
        lo += runLen
        nRemaining -= runLen
    mergeForceCollapse(arr, stack)
We can optimize merging further. Our initial implementation of merge
simply copied the first run into a buffer, then performed the merge. We
can do better than that.
What if the second run is smaller? Maybe we'd prefer always merging the
smaller run into the larger one. Let's look at an optimized version of
merge. First, we'll replace merge
with two functions, mergeLo
and
mergeHi
. mergeLo
will copy elements from the first run into the
temporary buffer, while mergeHi
will copy elements from the second
run. Our original merge
becomes mergeLo
, and we can add a mergeHi
:
def mergeHi(arr, lo, mid, hi):
    t = arr[mid:hi]
    i, j, k = hi - 1, mid - 1, hi - mid - 1
    while k >= 0 and j >= lo:
        if t[k] > arr[j]:
            arr[i] = t[k]
            k -= 1
        else:
            arr[i] = arr[j]
            j -= 1
        i -= 1
    if k >= 0:
        arr[lo:i + 1] = t[0:k + 1]
This is very similar to merge, except it copies the second (mid to hi)
run into a temporary buffer and traverses the runs and the buffer from
end to start.
When we trigger the merge, another optimization we can do is check elements from the first run and see if they are smaller than the first element in the second run. While they are smaller, we can simply ignore them when merging - they are already in position. We do this by taking the first element of the second run and seeing where it would fit in the first run.
Similarly, elements from the end of the second run which are greater than the last element in the first run are already in place. We don't need to touch them. We take the last element of the first run and check where it would fit in the second run.
We can use binary search for this. Note that we need two versions in
order to maintain the stable property of the sort: a searchLeft
, which
returns the first index where a new element should be inserted, and a
searchRight
, which returns the last index. For example, if we have a
run like [1, 2, 5, 5, 5, 5, 7, 8]
and we are looking for where to
insert another 5
, it really depends where it comes from. If it comes
from the run before this one, we need the left-most spot (before the
first 5
in the run). On the other hand, if it comes from the run after
this one, we need to place it after the last 5
. That ensures that the
relative order of elements is preserved. Here is an implementation for
searchLeft
and searchRight
:
def searchLeft(key, arr, base, len):
    left, right = base, base + len
    while left < right:
        mid = left + (right - left) // 2
        if key > arr[mid]:
            left = mid + 1
        else:
            right = mid
    return left - base

def searchRight(key, arr, base, len):
    left, right = base, base + len
    while left < right:
        mid = left + (right - left) // 2
        if key < arr[mid]:
            right = mid
        else:
            left = mid + 1
    return left - base
Both functions return the offset from base
where key
should be
inserted.
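To see the stability distinction concretely (a sketch; note right is initialized to base + len so the functions also work for runs that don't start at index 0):

```python
def searchLeft(key, arr, base, len):
    # First index (as an offset from base) where key could be inserted.
    left, right = base, base + len
    while left < right:
        mid = left + (right - left) // 2
        if key > arr[mid]:
            left = mid + 1
        else:
            right = mid
    return left - base

def searchRight(key, arr, base, len):
    # Last index (as an offset from base) where key could be inserted.
    left, right = base, base + len
    while left < right:
        mid = left + (right - left) // 2
        if key < arr[mid]:
            right = mid
        else:
            left = mid + 1
    return left - base

run = [1, 2, 5, 5, 5, 5, 7, 8]
print(searchLeft(5, run, 0, 8))   # 2: before the first 5
print(searchRight(5, run, 0, 8))  # 6: after the last 5
```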
We can now update our mergeAt
function with the new capabilities:
def mergeAt(arr, stack, i):
    base1, len1 = stack[i]
    base2, len2 = stack[i + 1]
    stack[i] = (base1, len1 + len2)
    if i == len(stack) - 3:
        stack[i + 1] = stack[i + 2]
    stack.pop()
    k = searchRight(arr[base2], arr, base1, len1)
    base1 += k
    len1 -= k
    if len1 == 0:
        return
    len2 = searchLeft(arr[base1 + len1 - 1], arr, base2, len2)
    if len2 == 0:
        return
    if len1 <= len2:
        mergeLo(arr, base1, base2, base2 + len2)
    else:
        mergeHi(arr, base1, base2, base2 + len2)
The first part stays the same: we get base1
, len1
, base2
, and
len2
and update the stack. Next, instead of merging right away, we
first search for where the first element of the second run would go into
the first run. We know the elements in [base1, base1 + k)
won't move, so we
can remove them from the merge by moving base1
to the right k
elements (we also need to update len1
). Similarly, we search for where
the last element of the first run (arr[base1 + len1 - 1]
) would fit
into the second run. We know all elements beyond that are already in
place, so we update len2
to be this offset.
In case either of the searches exhausts a run, we simply return.
Otherwise, depending on which run is longer, we call mergeLo
or
mergeHi
.
But wait, there's more! Binary search always performs about log(len + 1)
comparisons, where len is the length of the range we are searching,
regardless of where our element belongs. Galloping attempts to find the
spot faster.
Galloping starts by comparing the element we are searching for in array
A with A[0], A[1], A[3], A[7], ... A[2^i - 1]. With these comparisons,
we will end up finding a range between some A[2^(k - 1) - 1] and
A[2^k - 1] that would contain the element we are searching for. We then
run a binary search only within that interval.
There are some tradeoffs here: on large datasets or purely random data,
binary search performs better. But on inputs which contain natural runs,
galloping tends to find things faster. Galloping also performs better
when we expect to find the interval early on. Let's look at an
implementation of gallopLeft
as an alternative to searchLeft
:
def gallopLeft(key, arr, base, len, hint):
    lastOfs, ofs = 0, 1
    if key > arr[base + hint]:
        maxOfs = len - hint
        while ofs < maxOfs and key > arr[base + hint + ofs]:
            lastOfs = ofs
            ofs = (ofs << 1) + 1
        if ofs > maxOfs:
            ofs = maxOfs
        lastOfs += hint
        ofs += hint
    else: # key <= arr[base + hint]
        maxOfs = hint + 1
        while ofs < maxOfs and key <= arr[base + hint - ofs]:
            lastOfs = ofs
            ofs = (ofs << 1) + 1
        if ofs > maxOfs:
            ofs = maxOfs
        lastOfs, ofs = hint - ofs, hint - lastOfs
    # arr[base + lastOfs] < key <= arr[base + ofs]
    lastOfs += 1
    while lastOfs < ofs:
        mid = lastOfs + (ofs - lastOfs) // 2
        if key > arr[base + mid]:
            lastOfs = mid + 1
        else:
            ofs = mid
    return ofs
We start by initializing 2 offsets: lastOfs
and ofs
to represent the
offsets between which we expect to find our key. Note the function also
takes a hint, so callers can provide a tentative starting place.
Let's go over the parts of this function:
if key > arr[base + hint]:
    maxOfs = len - hint
    while ofs < maxOfs and key > arr[base + hint + ofs]:
        lastOfs = ofs
        ofs = (ofs << 1) + 1
    if ofs > maxOfs:
        ofs = maxOfs
    lastOfs += hint
    ofs += hint
We first find the two offsets. If the key we are searching for is
greater than (right of) our starting element (arr[base + hint]
), then
our maximum possible offset is len - hint
. While ofs hasn't
exceeded maxOfs and the key is still larger than arr[base + hint + ofs], we
keep updating ofs
to be the next power of 2 minus 1. We keep track of
the previous offset in lastOfs
. Once we're done, we add hint
to
both offsets (we do that because we add hint
to all indices in our
loop, but not to ofs
since we keep it a power of 2 minus 1). If
key > arr[base + hint]
is not true, in other words, our key is left of
our starting element:
else: # key <= arr[base + hint]
    maxOfs = hint + 1
    while ofs < maxOfs and key <= arr[base + hint - ofs]:
        lastOfs = ofs
        ofs = (ofs << 1) + 1
    if ofs > maxOfs:
        ofs = maxOfs
    lastOfs, ofs = hint - ofs, hint - lastOfs
In this case, our maximum possible offset is hint + 1
. We gallop
again, but now we are looking at elements left of our starting point,
arr[base + hint - ofs]
where ofs
keeps increasing. Once we find the
range, we update our offsets: lastOfs
becomes hint - ofs
and ofs
becomes hint - lastOfs
. The hint -
part is again because that is
what we actually used as indices. The swap is because we were moving
left, and we need lastOfs
to be the one on the left, ofs
the one on
the right.
We now identified the range within which we'll find our key, between
arr[base + lastOfs]
and arr[base + ofs]
. The last part of the
function is just a binary search within this interval.
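Here is gallopLeft finding the same spot a binary search would, but with exponentially growing probes (a sketch using the implementation above):

```python
def gallopLeft(key, arr, base, len, hint):
    # Leftmost offset from base where key should be inserted into the
    # sorted range arr[base:base + len]; hint is where to start probing.
    lastOfs, ofs = 0, 1
    if key > arr[base + hint]:
        maxOfs = len - hint
        while ofs < maxOfs and key > arr[base + hint + ofs]:
            lastOfs = ofs
            ofs = (ofs << 1) + 1  # probe at offsets 1, 3, 7, 15, ...
        if ofs > maxOfs:
            ofs = maxOfs
        lastOfs += hint
        ofs += hint
    else:  # key <= arr[base + hint]: gallop left of the hint
        maxOfs = hint + 1
        while ofs < maxOfs and key <= arr[base + hint - ofs]:
            lastOfs = ofs
            ofs = (ofs << 1) + 1
        if ofs > maxOfs:
            ofs = maxOfs
        lastOfs, ofs = hint - ofs, hint - lastOfs
    # arr[base + lastOfs] < key <= arr[base + ofs]: binary search the interval
    lastOfs += 1
    while lastOfs < ofs:
        mid = lastOfs + (ofs - lastOfs) // 2
        if key > arr[base + mid]:
            lastOfs = mid + 1
        else:
            ofs = mid
    return ofs

buf = [12, 13, 14, 15, 17]
print(gallopLeft(16, buf, 0, 5, 0))  # 4: 16 goes right before 17

run = [1, 2, 5, 5, 5, 5, 7, 8]
print(gallopLeft(5, run, 0, 8, 0))   # 2: the same leftmost spot searchLeft finds
```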
The gallopRight
function is very similar to gallopLeft
:
def gallopRight(key, arr, base, len, hint):
    ofs, lastOfs = 1, 0
    if key < arr[base + hint]:
        maxOfs = hint + 1
        while ofs < maxOfs and key < arr[base + hint - ofs]:
            lastOfs = ofs
            ofs = (ofs << 1) + 1
        if ofs > maxOfs:
            ofs = maxOfs
        lastOfs, ofs = hint - ofs, hint - lastOfs
    else:
        maxOfs = len - hint
        while ofs < maxOfs and key >= arr[base + hint + ofs]:
            lastOfs = ofs
            ofs = (ofs << 1) + 1
        if ofs > maxOfs:
            ofs = maxOfs
        lastOfs += hint
        ofs += hint
    lastOfs += 1
    while lastOfs < ofs:
        mid = lastOfs + ((ofs - lastOfs) // 2)
        if key < arr[base + mid]:
            ofs = mid
        else:
            lastOfs = mid + 1
    return ofs
We won't cover this in detail: the difference is that here, like with
searchRight, we want to find the rightmost index where the key belongs
instead of the leftmost one, so the algorithm changes accordingly.
The very neat thing about galloping is that its use isn't limited to
only when we set up the merge. We can also gallop while merging. Let's
go over mergeLo as an example, since mergeHi is a mirror of this.
In mergeLo
, we first copy all elements from the first run to a buffer,
then we iterate over the array and at each position we copy either an
element from the buffer or one from the second run, depending on which
one is smaller. While we do this, we can keep track of how many times
the buffer or the second run won
. If one of these wins consistently,
we can assume it will keep winning for a while longer.
For example, if we merge [5, 6, 7, 8, 9]
with [0, 1, 2, 3, 4]
, we
initialize the buffer with [5, 6, 7, 8, 9]
, but for the next 5
comparisons, the second run wins (0 < 5
, 1 < 5
...). Now imagine
much longer runs. Instead of comparing all elements one by one, we
switch to a galloping mode:
We find the last spot where the next element of the second run would fit
into the buffer, and immediately copy the preceding elements of the
buffer into the array. For example, if our buffer is
[12, 13, 14, 15, 17]
and the element we are considering from the
second run is [16]
, we know we can copy [12, 13, 14, 15]
into the
array. Similarly, we find the first spot the next element in the buffer
would fit into the remaining second run, and copy elements before that
from the second run to their position. The galloping mode aims to reduce
the number of comparisons and bulk copy data when possible (using a
memcpy
equivalent where available). While galloping, we still keep
track of how many elements we were able to skip comparing individually.
If this falls below the galloping threshold, we switch back to
regular
mode. Here is an updated mergeLo
implementation:
MIN_GALLOP = 7
minGallop = MIN_GALLOP

def mergeLo(arr, lo, mid, hi):
    t = arr[lo:mid]
    i, j, k = lo, mid, 0
    global minGallop
    done = False
    while not done:
        count1, count2 = 0, 0
        while (count1 | count2) < minGallop:
            if t[k] < arr[j]:
                arr[i] = t[k]
                count1 += 1
                count2 = 0
                k += 1
            else:
                arr[i] = arr[j]
                count1 = 0
                count2 += 1
                j += 1
            i += 1
            if k == mid - lo or j == hi:
                done = True
                break
        if done:
            break
        while count1 >= MIN_GALLOP or count2 >= MIN_GALLOP:
            count1 = gallopRight(arr[j], t, k, mid - lo - k, 0)
            if count1 != 0:
                arr[i:i + count1] = t[k:k + count1]
                i += count1
                k += count1
                if k == mid - lo:
                    done = True
                    break
            arr[i] = arr[j]
            i += 1
            j += 1
            if j == hi:
                done = True
                break
            count2 = gallopLeft(t[k], arr, j, hi - j, 0)
            if count2 != 0:
                arr[i:i + count2] = arr[j:j + count2]
                i += count2
                j += count2
                if j == hi:
                    done = True
                    break
            arr[i] = t[k]
            i += 1
            k += 1
            if k == mid - lo:
                done = True
                break
            minGallop -= 1
            if minGallop < 0:
                minGallop = 0
        minGallop += 2
    if k < mid - lo:
        arr[i:hi] = t[k:mid - lo]
We introduced a new MIN_GALLOP
constant, which is the threshold after which
we want to start galloping. We also maintain a minGallop
variable
across merges.
We have a couple of nested while
loops, but the idea is pretty
straightforward. The first nested while
does the normal merge but now
keeps track of how many times in a row we ended up picking an
element from the buffer:
count1, count2 = 0, 0
while (count1 | count2) < minGallop:
    if t[k] < arr[j]:
        arr[i] = t[k]
        count1 += 1
        count2 = 0
        k += 1
    else:
        arr[i] = arr[j]
        count1 = 0
        count2 += 1
        j += 1
    i += 1
    if k == mid - lo or j == hi:
        done = True
        break
if done:
    break
Whenever we increment one counter, we set the other to 0, so at any point, at most one of them is different than 0. We can exit the while loop in two ways: either one of the counters reaches the gallop threshold, or we run out of elements in one of the arrays.
If we ran out of elements we are done, so we break out of the outer loop. Otherwise we are in gallop mode:
while count1 >= MIN_GALLOP or count2 >= MIN_GALLOP:
    count1 = gallopRight(arr[j], t, k, mid - lo - k, 0)
    if count1 != 0:
        arr[i:i + count1] = t[k:k + count1]
        i += count1
        k += count1
        if k == mid - lo:
            done = True
            break
    arr[i] = arr[j]
    i += 1
    j += 1
    if j == hi:
        done = True
        break
    count2 = gallopLeft(t[k], arr, j, hi - j, 0)
    if count2 != 0:
        arr[i:i + count2] = arr[j:j + count2]
        i += count2
        j += count2
        if j == hi:
            done = True
            break
    arr[i] = t[k]
    i += 1
    k += 1
    if k == mid - lo:
        done = True
        break
    minGallop -= 1
    if minGallop < 0:
        minGallop = 0
minGallop += 2
We first try to find where the next element in the second run would fit
into the buffer. That becomes our count1
. If we get an offset greater
than 0, we can bulk copy the previous elements from the buffer
([k, k + count1)
) to the range [i, i + count1)
and increment both
k
and i
by count1
. Once we're done, we know for sure we need to
copy the next element from the second run (arr[j]), so we do that.
We then do the opposite: gallop left to find where the next element from
the buffer would fit into the second run. That becomes our count2
and
if it is greater than 0, we bulk copy elements from the second run. Once
we're done, we again know that the next element to copy is at t[k]
, so
we do that.
This loop repeats while either count1
or count2
is greater than
MIN_GALLOP
. If galloping works, we also update minGallop
to favor
future galloping. Each time we iterate, we decrement minGallop
. Once
we're out of the loop, if it is due to both count1
and count2
being
smaller than MIN_GALLOP
, we again adjust minGallop
- first, if it
became negative, we make it 0. We then add 2 to penalize galloping
because our last iteration didn't meet MIN_GALLOP
. As a reminder,
minGallop
is used as the threshold in the first loop. These tweaks to
minGallop
aim to optimize, depending on the data, when to enter gallop
mode and when to keep merging in normal mode.
minGallop
state should be maintained across multiple merges, and only
reset when we start a new sort - so we would make
minGallop = MIN_GALLOP
in our main sort
function, but otherwise rely
on the same value we are updating in minGallop
for subsequent calls of
mergeLo
and mergeHi
. We made minGallop
a global to keep the code
(relatively) simpler. To avoid globals, we could either put all these
functions in a class and make minGallop a member, or pass it as an
argument to all functions that need it.
Finally, we copy the remaining elements in the buffer, if any:
if k < mid - lo:
    arr[i:hi] = t[k:mid - lo]
We also have the mirrored mergeHi
version:
def mergeHi(arr, lo, mid, hi):
    t = arr[mid:hi]
    i, j, k = hi - 1, mid - 1, hi - mid - 1
    global minGallop
    done = False
    while not done:
        count1, count2 = 0, 0
        while (count1 | count2) < minGallop:
            if t[k] > arr[j]:
                arr[i] = t[k]
                count1 += 1
                count2 = 0
                k -= 1
            else:
                arr[i] = arr[j]
                count1 = 0
                count2 += 1
                j -= 1
            i -= 1
            if k == -1 or j == lo - 1:
                done = True
                break
        if done:
            break
        while count1 >= MIN_GALLOP or count2 >= MIN_GALLOP:
            count1 = j - lo + 1 - gallopRight(t[k], arr, lo, j - lo + 1, j - lo)
            if count1 != 0:
                arr[i - count1 + 1:i + 1] = arr[j - count1 + 1:j + 1]
                i -= count1
                j -= count1
                if j == lo - 1:
                    done = True
                    break
            arr[i] = t[k]
            i -= 1
            k -= 1
            if k == -1:
                done = True
                break
            count2 = k + 1 - gallopLeft(arr[j], t, 0, k + 1, k)
            if count2 != 0:
                arr[i - count2 + 1:i + 1] = t[k - count2 + 1:k + 1]
                i -= count2
                k -= count2
                if k == -1:
                    done = True
                    break
            arr[i] = arr[j]
            i -= 1
            j -= 1
            if j == lo - 1:
                done = True
                break
            minGallop -= 1
            if minGallop < 0:
                minGallop = 0
        minGallop += 2
    if k >= 0:
        arr[lo:i + 1] = t[0:k + 1]
This is very similar to the previous one, so I won't break it into
pieces and explain, just note that since we are starting from the end of
the range and we go backwards, we use closed ranges: i
, j
, and k
always point to the last element of the range, not the one past the
last.
This is a very efficient sorting algorithm which relies on observed properties of datasets in the real world. Quick recap:
- Timsort looks for natural runs in the input, reversing descending runs in place.
- Runs shorter than minRun get extended and sorted with binary insertion sort.
- Runs are pushed onto a stack and merged so that merges stay balanced.
- Merges skip over elements already in position and switch to galloping mode when one run keeps winning comparisons.
- Galloping finds a k such that the position we are looking for is within A[2^(k - 1) - 1] and A[2^k - 1], then performs a binary search in the interval.

Is this sorting algorithm beautiful? Maybe not from a purely syntactical/readability perspective. Compare it with the recursive quicksort implementation in Haskell:
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) =
    let smallerSorted = quicksort [a | a <- xs, a <= x]
        biggerSorted = quicksort [a | a <- xs, a > x]
    in  smallerSorted ++ [x] ++ biggerSorted
Timsort is not a succinct algorithm. There are special cases, optimizations for left to right and right to left cases, galloping, which tries to beat binary search in some situations, multi-mode merges and so on.
That said, everything in it has one purpose: sort real world data efficiently. I find it beautiful for the amount of research that went into it, the major insight that real world data is usually partially sorted, and for how it adapts to various patterns in the data to improve efficiency.
Most real world software looks more like Timsort than the Haskell quicksort above. And while there is, unfortunately, way too much accidental complexity in the world of software, there is a limit to how much we can simplify before we can no longer model reality, or operate efficiently. And, ultimately, that is what matters.
The final version of the code in this blog post is in this GitHub gist (be advised: implementation might be buggy).
Tim Peters has a very detailed explanation of the algorithm and all optimizations in the Python codebase as listsort.txt. I do recommend reading this as it talks about all the research and benchmarks that went into developing Timsort.
The C implementation of Timsort in the Python codebase is listobject.c.
The Python implementation relies on a lot of Python runtime constructs, so it might be harder to read. My implementation is derived from the OpenJDK implementation which I found very readable. That one is here on GitHub.
For the past year or so, I've been on the Fluid Framework team. I won't go deeply into the details of the framework, rather I'll quote a few paragraphs from the Overview page:
What is Fluid Framework?
Fluid Framework is a collection of client libraries for distributing and synchronizing shared state. These libraries allow multiple clients to simultaneously create and operate on shared data structures using coding patterns similar to those used to work with local data.
Why Fluid?
Because building low-latency, collaborative experiences is hard!
Fluid Framework offers:
- Client-centric application model with data persistence requiring no custom server code.
- Distributed data structures with familiar programming patterns.
- Very low latency.
Applications built with Fluid Framework require zero custom code on the server to enable sophisticated data sync scenarios such as real-time typing across text editors. Client developers can focus on customer experiences while letting Fluid do the work of keeping data in sync.
How Fluid works
Fluid was designed to deliver collaborative experiences with blazing performance. To achieve this goal, the team kept the server logic as simple and lightweight as possible. This approach helped ensure virtually instant syncing across clients with very low server costs.
To keep the server simple, each Fluid client is responsible for its own state. While previous systems keep a source of truth on the server, the Fluid service is responsible for taking in data operations, sequencing the operations, and returning the sequenced operations to the clients. Each client is able to use that sequence to independently and accurately produce the current state regardless of the order it receives operations.
The following is a typical flow.
- Client code changes data locally.
- Fluid runtime sends that change to the Fluid service.
- Fluid service sequences that operation and broadcasts it to all clients.
- Fluid runtime incorporates that operation into local data and raises a valueChanged event.
- Client code handles that event (updates view, runs business logic).
When using Fluid Framework, you model your data using a set of distributed data structures which can internally merge changes from multiple clients.
During hackathons, the team built various applications using this data model. Of course, one of the first applications of any new technology is games. This got me thinking about how we could model a game on top of the framework.
There are some interesting constraints: games like chess or go don't have any hidden information, but most games do require some. Card games are especially interesting: each player holds some cards that only they can see, some cards are face up on the table (everyone can see them), while the rest of the deck is face down on the table (nobody sees what order the cards are in).
With Fluid Framework, data is replicated across all clients. Assuming we're playing a game of high stakes poker, we can't trust any other client not to cheat. So a naïve solution of sending the whole game state (the cards each player holds in their hand) to all clients and trusting clients not to peek won't work. We should assume that even if the game code only shows a client their own cards, the client can cheat and use a debugger to see what other players are holding in their hands.
We can trust the server, but there is very little the server can do for us - while it can tell us which client changed state (distributed data structure changes sequenced by the server include the client ID), the server itself cannot maintain private state. So, for example, we can't tell the server to shuffle a deck of cards without telling us what order the cards end up in - all shared state is replicated across all clients.
In this zero-trust environment, where we assume other clients can cheat and all shared state can be accessed by all clients, can we model a card game? Surprisingly, the answer is yes.
It turns out this exact problem has been studied for quite some time, starting with the original 1981 paper by Ron Rivest, Adi Shamir, and Leonard Adleman (inventors of the RSA algorithm, among other things).
Once there were two mental chess experts who had become tired of their pastime.
"Let's play mental poker, for variety," suggested one. "Sure," said the other. "Just let me deal!"
Mental poker requires a commutative encryption function. If we encrypt \(A\) using \(Key_1\) then encrypting the result using \(Key_2\), we should be able to decrypt the result back to \(A\) regardless of the order of decryption (first with \(Key_1\) and then with \(Key_2\), or vice-versa).
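A commutative cipher can be sketched with modular exponentiation over a shared prime, in the style of the SRA scheme from the original paper. This is a toy illustration only - the modulus choice and key generation are simplified assumptions, not production-grade cryptography:

```python
# Toy SRA-style commutative cipher (illustrative only, not real crypto).
# Cards are integers; keys are exponents modulo a shared prime p.
from math import gcd
import random

P = 2**61 - 1  # shared prime modulus (an assumption for this sketch)

def make_key(rng: random.Random) -> tuple[int, int]:
    """Pick an encryption exponent e coprime to p-1, plus its inverse d."""
    while True:
        e = rng.randrange(3, P - 1)
        if gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)

def encrypt(card: int, e: int) -> int:
    return pow(card, e, P)

def decrypt(card: int, d: int) -> int:
    return pow(card, d, P)

rng = random.Random(42)
ea, da = make_key(rng)   # Alice's key pair
eb, db = make_key(rng)   # Bob's key pair

card = 7
# Encryption commutes: the order Alice and Bob apply keys doesn't matter.
both = encrypt(encrypt(card, ea), eb)
assert both == encrypt(encrypt(card, eb), ea)
# Decryption also works in either order.
assert decrypt(decrypt(both, da), db) == card
assert decrypt(decrypt(both, db), da) == card
```

Commutativity holds because \((c^{e_A})^{e_B} = (c^{e_B})^{e_A} = c^{e_A e_B} \pmod p\), so each key can be removed independently of the order it was applied.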
Here is how Alice and Bob play a game of mental poker:
At this point the cards are shuffled. In order to play, Alice and Bob also need the capability to look at individual cards. In order to enable this, the following steps must happen:
If Alice wants to look at a card, she asks Bob for his key for that card. For example, if Alice draws the first card, encrypted with \(K_{A_1}\) and \(K_{B_1}\), she asks Bob for \(K_{B_1}\). If Bob sends her \(K_{B_1}\), she now has both keys to decrypt the card and look at it. Bob still can't decrypt it because he doesn't have \(K_{A_1}\).
This way, as long as both Alice and Bob agree that one of them is supposed to see a card, they exchange keys as needed to enable this.
At the end of the game, players reveal all keys to validate that no cheating happened.
This approach can be extended to any number of players, each player maintaining their own set of secret keys.
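The shuffle and per-card re-encryption steps above can be sketched end-to-end with a toy commutative cipher (modular exponentiation over a shared prime). Everything here is an illustrative assumption - a real implementation needs hardened cryptography and runs Alice's and Bob's steps on separate clients, whereas this sketch runs both in one process:

```python
# Sketch of the shuffle protocol with a toy commutative cipher (not real crypto).
from math import gcd
import random

P = 2**61 - 1  # shared prime modulus

def make_key(rng):
    """Exponent e coprime to P-1, plus its modular inverse d."""
    while True:
        e = rng.randrange(3, P - 1)
        if gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)

def shuffle_and_encrypt(deck, e, rng):
    deck = [pow(c, e, P) for c in deck]
    rng.shuffle(deck)
    return deck

rng = random.Random(0)
deck = list(range(2, 54))          # 52 distinct card codes
ea, da = make_key(rng)             # Alice's deck key K_A
eb, db = make_key(rng)             # Bob's deck key K_B

# Alice shuffles and encrypts, then hands the deck to Bob.
deck = shuffle_and_encrypt(deck, ea, rng)
# Bob shuffles and encrypts; now neither player knows the order.
deck = shuffle_and_encrypt(deck, eb, rng)

# Alice removes K_A (cards remain hidden under K_B) and applies per-card keys.
deck = [pow(c, da, P) for c in deck]
alice_card_keys = [make_key(rng) for _ in deck]   # K_A_1 .. K_A_52
deck = [pow(c, e, P) for c, (e, _) in zip(deck, alice_card_keys)]
# Bob removes K_B and applies his own per-card keys K_B_1 .. K_B_52.
deck = [pow(c, db, P) for c in deck]
bob_card_keys = [make_key(rng) for _ in deck]
deck = [pow(c, e, P) for c, (e, _) in zip(deck, bob_card_keys)]

# Alice draws the first card: Bob publishes K_B_1, and only Alice, who also
# holds K_A_1, can finish decrypting it.
revealed = pow(pow(deck[0], bob_card_keys[0][1], P), alice_card_keys[0][1], P)
assert revealed in range(2, 54)
```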
We can model a game using two data structures: one to keep track of the cards, one to keep track of the moves in the game.
We can model a deck of cards using a distributed data structure that holds the set of cards. Each client generates secret keys and initially keeps them private (not part of the shared state). The deck of cards can be shuffled and encrypted as described above, with each client updating the shared set of cards.
We can model the gameplay using an append-only list of moves. For example, if Alice draws the first card, the move can be modeled as DRAW 1. If Bob agrees Alice should see the card, Bob can publish his secret key \(K_{B_1}\) as PUBLISH <KB1>. Alice can now use her \(K_{A_1}\) and the published \(K_{B_1}\) to decrypt the first card of the deck (stored in the other data structure). DRAW, PUBLISH, and other actions are part of the game semantics, which can be implemented and interpreted by clients.
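A minimal sketch of how a client might interpret such a move list - the tuple shapes and the replay helper are assumptions for illustration, not the series' actual implementation:

```python
# Replay an append-only move list into derived game state.
def replay(moves):
    """Derive per-player hands and published keys from the move list."""
    state = {"next_card": 0, "hands": {}, "published_keys": {}}
    for move in moves:
        kind = move[0]
        if kind == "DRAW":
            _, player = move
            idx = state["next_card"]                       # top of the deck
            state["hands"].setdefault(player, []).append(idx)
            state["next_card"] = idx + 1                   # advance the deck
        elif kind == "PUBLISH":
            _, player, card_index, key = move
            state["published_keys"][(player, card_index)] = key
    return state

moves = [
    ("DRAW", "alice"),               # Alice draws the card at index 0
    ("PUBLISH", "bob", 0, "<KB1>"),  # Bob publishes K_B_1 so Alice can decrypt
]
state = replay(moves)
assert state["hands"]["alice"] == [0]
assert state["published_keys"][("bob", 0)] == "<KB1>"
```

Because the list is append-only and every client replays the same sequence, all clients derive the same state without the deck itself ever mutating.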
Note the deck of cards stays in place during the game. Drawing a card simply means that all clients agree Alice should get the keys to the card at index 1 and that the next card to be drawn is at index 2. Discarding a card simply means Bob said he discards the card at index 5. Depending on whether discarding is face up or face down, Bob can publish \(K_{B_5}\) or keep it private until the end of the game. All these actions are part of the game move list, and clients can construct the game state based on these, without having to mutate the deck itself.
In terms of trust, we can say that, at any point, if a client can prove the game is invalid (another client misbehaved), the game is cancelled. If a player acts out of turn, or performs an action that they shouldn't, the game is invalid. At the end of the game, the append-only list should contain the full record of moves. With all keys available, clients can replay and validate no cheating happened (for example Bob claiming a card decrypted to an Ace, when in fact the card was a 2). Clients can keep a local copy of the list of moves, and confirm no other client rewrote history by tweaking the content of the list.
Establishing turn order can also be modeled through the append-only action list: each player can start by adding a SIT AT TABLE action. The framework will sequence these actions in some order, which will become the turn order. For example, if both Alice and Bob concurrently SIT AT TABLE, the action list will contain both actions in some order. Alice and Bob will take turns in that order.
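A tiny sketch of deriving turn order from the sequenced action list - the tuples standing in for sequenced operations are an assumption for illustration:

```python
# Each operation arrives with a service-assigned sequence number;
# SIT AT TABLE order becomes the turn order.
sequenced = [
    (1, "bob", "SIT AT TABLE"),    # Bob's op was sequenced first
    (2, "alice", "SIT AT TABLE"),  # Alice's concurrent op landed second
    (3, "bob", "DRAW 1"),
]
turn_order = [player for _, player, action in sequenced
              if action == "SIT AT TABLE"]
assert turn_order == ["bob", "alice"]
```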
Game semantics can be implemented as actions clients interpret. This is outside the scope of this article.
As I mentioned, this problem has been studied for many decades. A Toolbox for Mental Card Games by Christian Schindelhauer describes many other techniques for playing cards in a zero-trust environment.
There is also an open-source C++ library implementing the toolbox: LibTMCG.
The https://secret.cards website seems to implement a card game using mental poker techniques.
Wikipedia also has a good page on mental poker.
I spent a lot of time lately looking at how our team can improve our product's reliability and capability to respond to production incidents. This got me thinking about the lifecycle of a contemporary software project. The model I've been using for this looks like the following:
At a high level, the cycle starts with engineers writing code. The code gets merged and, at some cadence, a new build is prepared for release. This usually includes looking at a combination of testing and telemetry signals for engineers to sign off on deploying the build. In case tests fail or telemetry shows some anomalies, the deployment is abandoned. If all looks good, the build gets deployed. Once the build is exposed to a larger audience, more telemetry signals come in.
Most software these days uses some form of controlled exposure. For example, services might be deployed first in a dev environment, then in a pre-production environment, then to production in one region, then to all regions. Client software is similarly deployed to different rings, for example new Office builds get deployed first to the Office organization, then to all of Microsoft, then to customers who opted into the Insider program, then to the whole world (I'm very much oversimplifying things here, as release management for Office is way more complex, but you get the idea). Telemetry signals from a ring feed back into the build promotion process to give confidence that a build can be exposed to a larger audience in the next ring.
Of course, sometimes things go wrong. We identify issues in the product, either from telemetry signals or, worse, from user reports. These become live site incidents. On-call engineers react to these and try to mitigate as fast as possible. After the fire is put out, a good practice is to run a postmortem to understand how the issue happened and see how it can be prevented in the future. The learnings usually translate into repair items, which get added to the engineering backlog.
We can split this lifecycle into two parts: a proactive part and a reactive part, which roughly map to the top and bottom halves of the diagram.
The proactive part deals with what we can do to prevent issues from making it to production.
There are several things that could allow issues to slip through the cracks.
On the coding part, a feature might be missing tests to uncover regressions, it might not be instrumented well enough to get good signals, or it might not be put under a feature gate. Feature gates are service-controlled flags that can be turned on/off to disable a feature. These are extremely valuable for quickly mitigating production issues.
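As a hedged sketch, a feature gate can be as simple as a flag looked up before entering a new code path. The names and the in-memory flag store below are invented for illustration - real gates are fetched from a configuration service:

```python
# Toy feature-gate check: flipping the flag disables the new code path
# without a redeploy.
FEATURE_GATES = {"new_renderer": False}  # stand-in for a config service

def is_enabled(gate: str) -> bool:
    return FEATURE_GATES.get(gate, False)

def render(page: str) -> str:
    if is_enabled("new_renderer"):
        return f"new:{page}"   # new code path, can be switched off remotely
    return f"old:{page}"       # stable fallback

assert render("home") == "old:home"
FEATURE_GATES["new_renderer"] = True   # mitigation: flip the flag
assert render("home") == "new:home"
```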
All of the above are addressed through education and engineering culture: more junior engineers on the team might not even be aware of all the requirements a feature should satisfy before it is ready (see my Shipping a Feature post).
A good practice is to have a feature checklist, a list of things engineers need to consider before submitting a pull request. This includes things like test coverage, telemetry, feature gates, performance, accessibility (for UI) etc.
Everyone writing code should know where this checklist is, and code reviewers should keep it in mind while evaluating changes.
Two main issues would allow a regression to get past the build validation process: either a gap in validation or a missed signal. This, of course, assumes that the code has tests and is properly instrumented in the coding stage. Here, the person or persons validating a build either miss running some validation (automatic or manual tests) or miss looking at a telemetry signal that would tell them something is wrong.
Both of these issues can be addressed with automation.
Have a go/no-go dashboard that aggregates all relevant signals (like test run results, telemetry metrics).
Of course, putting together such a dashboard and ensuring all code has the right test automation and instrumentation is not easy.
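At its core, such a dashboard aggregates pass/fail signals into a single decision. A minimal sketch, with invented signal names:

```python
# Aggregate validation signals into a single go/no-go decision.
def go_no_go(signals: dict) -> tuple:
    """Return (decision, failures) given named pass/fail checks."""
    failures = [name for name, ok in signals.items() if not ok]
    return ("GO" if not failures else "NO-GO", failures)

signals = {
    "unit_tests_passed": True,
    "e2e_tests_passed": True,
    "crash_rate_below_threshold": False,  # telemetry anomaly
}
decision, failures = go_no_go(signals)
assert decision == "NO-GO"
assert failures == ["crash_rate_below_threshold"]
```

The value of the real dashboard is less in this aggregation logic and more in wiring every release-blocking signal into it so nothing gets missed.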
Telemetry could have gaps: issues could manifest themselves without us receiving a signal. If this happens, we need to learn from these incidents, understand where the gaps are, and eliminate them. More about this on the reactive part.
The reactive part deals with how we can mitigate issues as quickly as possible if they make it to production.
The entry point into the reactive cycle is an incident. An incident alerts the on-call engineer and starts the mitigation process. The sooner an incident is created, the sooner it can be addressed.
Issues here come from alerting. An alerting system runs automated queries over incoming telemetry signals and looks for some anomalies or thresholds. Things can go wrong in multiple ways:
We can collect a lot of telemetry but not have the right queries to notice sudden spikes, or drops, or other anomalies in the telemetry stream.
We could be overly cautious and generate too many alerts, most of them false positives, which makes it hard for on-call to figure out when an alert is real.
Alerts might be very generic and not contain enough information for on-call to easily mitigate.
Alerts should be continuously fine-tuned to be accurate and actionable, with as few false positives as possible.
Telemetry signals, even if correct, can be impacted by multiple things outside of our control. For example, usage might rise or drop sharply during weekends (depending on whether we're talking about a game or a productivity app) or holidays. This makes it even harder to develop accurate alerts.
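One hedged way to account for such patterns is to compare a metric against a baseline built from the same day of week rather than the previous day, so an ordinary weekend dip doesn't page on-call. The numbers and threshold below are invented for illustration:

```python
# Seasonality-aware threshold: alert only on deviation from the
# same-weekday baseline, not from yesterday's value.
def is_anomalous(value: float, same_weekday_history: list, tolerance: float = 0.5) -> bool:
    """True if value deviates from the same-weekday average by > tolerance."""
    baseline = sum(same_weekday_history) / len(same_weekday_history)
    return abs(value - baseline) > tolerance * baseline

saturdays = [120, 130, 125, 128]          # past Saturday usage
assert not is_anomalous(118, saturdays)   # normal Saturday dip: no alert
assert is_anomalous(40, saturdays)        # real drop: alert
```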
The worst case is when issues get reported by customers before we see any alerts: this signifies a big gap, and a postmortem should identify the follow-up work (see postmortems below).
Several things can make mitigation harder. The on-call engineer might not know how to handle certain types of incidents.
It's a good idea to have a troubleshooting guide (TSG) for each type of alert, where an area expert details the steps to mitigate an issue.
Another common issue is there is no easy mitigation. This goes back to our coding section: code should be behind feature gates, so mitigation is as easy as flipping a switch.
Yet another common issue, which we covered in the previous section when we discussed alerts, is not having enough information to easily pinpoint the actual issue. The on-call engineer sees an incident, knows something is wrong, but not enough information is available for a quick mitigation. Alerts should contain enough information to be useful.
Postmortems are an invaluable tool for learning from incidents. Postmortems are reviews of incidents once mitigated, root caused, and understood, where the team gets together to discuss what happened and take steps to prevent the same type of issue from happening in the future. A postmortem is not about blaming, it is about answering the following question:
What can we do to ensure this doesn't happen in the future?
A postmortem that doesn't answer this question is not that useful. A good postmortem identifies one or more work items that can be handed to engineering to implement additional guardrails so the same issue doesn't recur.
Finally, identifying repair items is not enough. A long backlog of repair items that nobody gets around to implementing won't make things any easier.
Engineers should treat repair items with priority.
Repair items are some of the most critical work items: we've seen incidents in production, we know the scope of the impact, and we know the work needed to prevent them in the future.
In this post we looked at a model of the software lifecycle, consisting of a proactive part: engineers writing code, a signoff process to promote a build, and signals to increase the audience for a build; and a reactive part: a live site incident, which on-call engineers mitigate, after which the team runs a postmortem and comes up with a set of repair items.
We also looked at some of the common issues across these various parts of the lifecycle, and some best practices.
This was a very high-level overview - each of these steps has a lot of depth, from writing safe code, to release management, to telemetry models, site reliability engineering, and so on. All of these are critical parts of shipping software.
This is an excerpt from chapter 7 of my book, Data Engineering on Azure, which deals with machine learning workloads. This is part 3 in a 3 part series. In this post, we'll run the model we created in part 1 on the Azure Machine Learning (AML) infrastructure we set up in part 2.
We use the Python Azure Machine Learning SDK for this, so the first step is to install it using the Python package manager (pip). First, make sure pip is up-to-date. (If there is a newer pip version, you should see a message printed to the console suggesting you upgrade when you run a pip command.) You can update pip by running python -m pip install --upgrade pip as an administrator. Once pip is up-to-date, install the Azure Machine Learning SDK with the command in the following listing:
pip install azureml-sdk
Let's now write a Python script to publish our original ML model to the cloud, with all the required configuration. We'll call this pipeline.py.
from azureml.core import Workspace, Datastore, Dataset, Model
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.compute import AmlCompute
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps.python_script_step import PythonScriptStep
import os
tenant_id = '<your tenant ID>'
subscription_id = '<your Azure subscription GUID>'
service_principal_id = '<your service principal ID>'
resource_group = 'aml-rg'
workspace_name = 'aml'
## Auth
auth = ServicePrincipalAuthentication(
tenant_id,
service_principal_id,
os.environ.get('SP_PASSWORD'))
## Workspace
workspace = Workspace(
subscription_id = subscription_id,
resource_group = resource_group,
workspace_name = workspace_name,
auth=auth)
## Datastore
datastore = Datastore.get(workspace, 'MLData')
## Compute target
compute_target = AmlCompute(workspace, 'd1compute')
## Input
model_input = Dataset.File.from_files(
[(datastore, '/models/highspenders/input.csv')]).as_mount()
## Python package configuration
conda_deps = CondaDependencies.create(
pip_packages=['pandas', 'sklearn', 'azureml-core', 'azureml-dataprep'])
run_config = RunConfiguration(conda_dependencies=conda_deps)
## Train step
trainStep = PythonScriptStep(
script_name='highspenders.py',
arguments=['--input', model_input],
inputs=[model_input],
runconfig=run_config,
compute_target=compute_target)
## Submit pipeline
pipeline = Pipeline(workspace=workspace, steps=[trainStep])
published_pipeline = pipeline.publish(
name='HighSpenders',
description='High spenders model',
continue_on_step_failure=False)
open('highspenders.id', 'w').write(published_pipeline.id)
We'll break down this script and discuss each part. First, we have the required imports and the additional parameters we need.
from azureml.core import Workspace, Datastore, Dataset, Model
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.compute import AmlCompute
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps.python_script_step import PythonScriptStep
import os
tenant_id = '<your tenant ID>'
subscription_id = '<your Azure subscription GUID>'
service_principal_id = '<your service principal ID>'
resource_group = 'aml-rg'
workspace_name = 'aml'
We import a set of packages from the azureml-sdk. We need the tenant ID, subscription ID, and service principal ID we will use to connect to the Azure Machine Learning service. We created the service principal in part 2. We stored it in the $sp variable. In case you closed that PowerShell session and no longer have the $sp variable, you can simply rerun the scripts we covered in part 2 to create a new service principal and grant it the required permissions.
You can get the service principal ID from $sp.appId in PowerShell. Similarly, you can get the tenant ID from $sp.tenant. The subscription ID is the GUID of your Azure subscription. Use these to initialize the tenant_id, subscription_id, and service_principal_id in the script above.
Next, we connect to the workspace using the service principal and get the data store (MLData) and compute target (d1compute) needed by our model. The following listing shows the steps.
## Auth
auth = ServicePrincipalAuthentication(
tenant_id,
service_principal_id,
os.environ.get('SP_PASSWORD'))
## Workspace
workspace = Workspace(
subscription_id = subscription_id,
resource_group = resource_group,
workspace_name = workspace_name,
auth=auth)
## Datastore
datastore = Datastore.get(workspace, 'MLData')
## Compute target
compute_target = AmlCompute(workspace, 'd1compute')
Here we define a service principal authentication as auth and use the environment variable SP_PASSWORD to retrieve the service principal secret. We set this variable in part 2, after we created the principal. We connect to the Azure Machine Learning workspace with the given subscription ID, resource group, name, and auth. We then retrieve the datastore (MLData) and compute target (d1compute) from the workspace.
We need these to set up our deployment: the data store is where we have our input, while the compute target is where the model trains. The following listing shows how we can specify the model input.
## Input
model_input = Dataset.File.from_files(
[(datastore, '/models/highspenders/input.csv')]).as_mount()
The from_files() method takes a list of files. Each element of the list is a tuple consisting of a data store and a path. The as_mount() method ensures the file is mounted and made available to the compute that trains the model.
Azure Machine Learning datasets reference a data source location, along with a copy of its metadata. This allows models to seamlessly access data during training.
Next, we'll specify the Python packages required by our model, from which we can initialize a run configuration. If you remember from part 1, we used pandas and sklearn. We'll also need the azureml-core and azureml-dataprep packages required by the runtime. The next listing shows how to create the run configuration.
## Python package configuration
conda_deps = CondaDependencies.create(
pip_packages=['pandas', 'sklearn', 'azureml-core', 'azureml-dataprep'])
run_config = RunConfiguration(conda_dependencies=conda_deps)
Conda stands for Anaconda, a Python and R open source distribution of common data science packages. Anaconda simplifies package management and dependencies and is commonly used in data science projects because it provides a stable environment for this type of workload. Azure Machine Learning also uses it under the hood.
Next, let's create a step for training our model. In our case, this is a PythonScriptStep, a step that executes Python code. We'll provide the name of the script (from our previous section), the command-line arguments, the inputs, run configuration, and compute target. The following listing shows the details.
## Train step
trainStep = PythonScriptStep(
script_name='highspenders.py',
arguments=['--input', model_input],
inputs=[model_input],
runconfig=run_config,
compute_target=compute_target)
We specify the script to upload/run with script_name. This is our highspenders.py model we created in part 1. We set the arguments we want passed to the script as arguments. Here, model_input resolves at runtime to the path where the data is mounted on the node running the script. We set the inputs, run configuration, and compute target to run on as inputs, runconfig, and compute_target.
We can chain multiple steps together, but we only need one in our case. One or more steps form an ML pipeline.
An Azure Machine Learning pipeline simplifies building ML workflows including data preparation, training, validation, scoring, and deployment.
Pipelines are an important concept in Azure Machine Learning. These capture all the information needed to run an ML workflow. The following listing shows how we can create and submit a pipeline to our workspace.
## Submit pipeline
pipeline = Pipeline(workspace=workspace, steps=[trainStep])
published_pipeline = pipeline.publish(
name='HighSpenders',
description='High spenders model',
continue_on_step_failure=False)
open('highspenders.id', 'w').write(published_pipeline.id)
We create a pipeline with a single step, trainStep, in our workspace. We publish the pipeline. We'll save the GUID of the published pipeline into the highspenders.id file so we can refer to it later.
This covers the whole pipeline.py script. Our pipeline automation is almost complete. But before calling this script to create the pipeline, let's make one small addition to our high spender model. While we could do all of the previous steps without touching our original model code, we add the final step to the model code itself. Remember that once the model is trained, we save it to disk as outputs/highspender.pkl.
For this step, we'll make one Azure Machine Learning-specific addition: taking the trained model and storing it in the workspace. Add the lines in the following listing to the highspenders.py model we created in part 1 (not to pipeline.py, which we just covered).
## Register model
from azureml.core import Model
from azureml.core.run import Run
run = Run.get_context()
workspace = run.experiment.workspace
model = Model.register(
workspace=workspace,
model_name='highspender',
model_path=model_path)
Note the call to Run.get_context() and how we use this to retrieve the workspace. In pipeline.py, we provided the subscription ID, resource group, and workspace name. That is how we can get a workspace from outside Azure Machine Learning. In this case, though, the code runs in Azure Machine Learning as part of our pipeline. This gives us additional context that we can use to retrieve the workspace at runtime. Every run of a pipeline in Azure Machine Learning is called an experiment.
Azure Machine Learning experiments represent one execution of a pipeline. When we rerun a pipeline, we have a new experiment.
We are all set! Let's run the pipeline.py script to publish our pipeline to the workspace. The following listing provides the command for this step.
python pipeline.py
The GUID matters! If we rerun the script, it registers another pipeline with the same name but a different GUID. Azure Machine Learning does not update pipelines in place. We have the option to disable pipelines so these don't clutter the workspace, but not to update them. Let's kick off the pipeline using the Azure CLI as the next listing shows.
$pipelineId = Get-Content -Path highspenders.id
az ml run submit-pipeline `
--pipeline-id $pipelineId `
--workspace-name aml `
--resource-group aml-rg
We read the pipeline ID from the highspenders.id file produced in the previous step into the $pipelineId variable. We then submit a new run.
Check the UI at https://ml.azure.com. You should see the pipeline under the Pipelines section and the run we just kicked off under the Experiments section. Once the model is trained, you'll see the model output under the Models section.
After implementing a model in Python, we started with provisioning a workspace, which is the top-level container for all Azure Machine Learning-related artifacts. Next, we created a compute target, which specifies the type of compute our model runs on. We can define as many compute targets as needed; some models require more resources than others, some require GPUs, etc. Azure provides many types of VM images suited to all these workloads. A main advantage of using compute targets in Azure Machine Learning is that compute is provisioned on demand when we run a pipeline. Once the pipeline finishes, compute gets deprovisioned. This allows us to scale elastically and only pay for what we need.
We then attached a data store. Data stores are an abstraction over existing storage services, and these allow Azure Machine Learning connections to read the data. The main advantage of using data stores is that these abstract away access control, so our data scientists don't need to worry about authenticating against the storage service.
With the infrastructure in place, we proceeded to set up a pipeline for our model. A pipeline specifies all the requirements and steps our execution needs to take. There are many pipelines in Azure: Azure DevOps Pipelines are focused on DevOps, provisioning resources, and in general, providing automation around Git; Azure Data Factory pipelines are focused on ETL, data movement, and orchestration; Azure Machine Learning Pipelines are meant for ML workflows, where we set up the environment and then execute a set of steps to train, validate, and publish a model.
Our pipeline included a dataset (our input), a compute target, a set of Python package dependencies, a run configuration, and a step to run a Python script. We also enhanced our original model code to publish the model in AML. This takes the result of our training run and makes it available in the workspace. Then we published the pipeline to our Azure Machine Learning workspace and submitted a run, which in Azure Machine Learning is called an experiment.
We will stop here with the series of articles. Grab the book to see how we can apply DevOps to our ML scenario. In the book, we go over putting both the model code and pipeline.py in Git, then deploying updates using Azure DevOps Pipelines. We also cover orchestrating ML runs with Azure Data Factory, which includes getting the input data ready, running an Azure Machine Learning experiment, and handling the output.
All of this and more in Data Engineering on Azure.
This is an excerpt from chapter 7 of my book, Data Engineering on Azure, which deals with machine learning workloads. This is part 2 in a 3 part series. In this post, we'll explore the Azure Machine Learning (AML) service and set it up in preparation of running our model in the cloud.
In this post, like throughout the book, we'll be using PowerShell Core and Azure CLI to interact with Azur