For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we looked at building a simple game of rock-paper-scissors. In this post we'll look at implementing a card game.
We'll build a discard game - players take turns discarding a card that must match either the suit or the value of the card on top of the discard pile. The player who discards their whole hand first wins.
We're implementing a simple game since the focus is not on game-specific logic, but rather on how to leverage the Mental Poker toolkit.
The full code for this is in the demos/discard app. The best way to read this post is side by side with the code.
We'll follow a similar structure to the rock-paper-scissors game we looked at in the previous post.
First, let's look at how we implement the deck of cards and associated logic.
We'll represent a card as a string; for example, "9:hearts" is the 9 of hearts.
The function getDeck() initializes an unshuffled deck of cards:
function getDeck() {
    const deck: string[] = [];
    for (const value of ["9", "10", "J", "Q", "K", "A"]) {
        for (const suit of ["hearts", "diamonds", "clubs", "spades"]) {
            deck.push(value + ":" + suit);
        }
    }
    return deck;
}
We're using fewer cards (9 through Ace) for this demo: the more cards we have, the more prime numbers we need to find to encrypt them, which slows things down. Rather than implementing a loading UI, we'll just use fewer cards for the example.
We need a helper function to tell us whether two cards match either value or suit:
function matchSuitOrValue(a: string, b: string) {
    const [aValue, aSuit] = a.split(":");
    const [bValue, bSuit] = b.split(":");
    return aValue === bValue || aSuit === bSuit;
}
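As a quick sanity check, the two helpers above can be exercised standalone (they are redefined here so the snippet is self-contained):

```typescript
function getDeck() {
    const deck: string[] = [];
    for (const value of ["9", "10", "J", "Q", "K", "A"]) {
        for (const suit of ["hearts", "diamonds", "clubs", "spades"]) {
            deck.push(value + ":" + suit);
        }
    }
    return deck;
}

function matchSuitOrValue(a: string, b: string) {
    const [aValue, aSuit] = a.split(":");
    const [bValue, bSuit] = b.split(":");
    return aValue === bValue || aSuit === bSuit;
}

// 6 values x 4 suits = 24 cards
console.log(getDeck().length);                          // 24
console.log(matchSuitOrValue("9:hearts", "9:spades"));  // true (same value)
console.log(matchSuitOrValue("9:hearts", "K:hearts"));  // true (same suit)
console.log(matchSuitOrValue("9:hearts", "K:spades"));  // false
```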
Finally, we want a class to wrap a deck and implement the functions needed for using it:
class Deck {
    private myCards: number[] = [];
    private othersCards: number[] = [];
    private drawPile: number[] = [];
    private discardPile: number[] = [];
    private decryptedCards: (string | undefined)[] = [];
    private othersKeys: SRAKeyPair[] = [];

    constructor(
        private encryptedCards: string[],
        private myKeys: SRAKeyPair[],
        private store: RootStore
    ) {
        this.drawPile = encryptedCards.map((_, i) => i);
    }
...
We initialize the class with an array of encrypted cards (the shuffled deck) as encryptedCards, our set of SRA keys (myKeys), and the Redux store (store).
We also need to track cards (by index):

- Our hand (myCards).
- The other player's hand (othersCards).
- The draw pile (drawPile).
- The discard pile (discardPile).

As the other player shares their encryption keys (when they reveal a card to us), we'll store them in the othersKeys array. Similarly, as we decrypt cards, we'll store them in decryptedCards - this is just for convenience, so we don't have to keep decrypting the same values over and over.
We assume we're starting with a shuffled deck of cards as a draw pile, with no player having cards in hand - so we initialize drawPile to the indexes of encryptedCards.
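The drawPile initialization is just the index of every encrypted card; a minimal standalone sketch (with placeholder strings standing in for real SRA-encrypted values):

```typescript
// Stand-ins for encrypted card strings; the real values are SRA-encrypted.
const encryptedCards = ["enc0", "enc1", "enc2", "enc3"];

// The draw pile starts out referencing every card by index.
const drawPile = encryptedCards.map((_, i) => i);
console.log(drawPile); // [0, 1, 2, 3]
```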
Some helper functions:
...
getKey(index: number) {
    return SRAKeySerializationHelper.serializeSRAKeyPair(
        this.myKeys[index]
    );
}

getKeyFromHand(index: number) {
    return SRAKeySerializationHelper.serializeSRAKeyPair(
        this.myKeys[this.myCards[index]]
    );
}

cardAt(index: number) {
    if (!this.decryptedCards[index]) {
        const partial = SRA.decryptString(
            this.encryptedCards[index],
            this.myKeys[index]
        );
        this.decryptedCards[index] = SRA.decryptString(
            partial,
            this.othersKeys[index]
        );
    }
    return this.decryptedCards[index]!;
}

getDrawIndex() {
    return this.drawPile[0];
}

canIMove() {
    if (this.discardPile.length === 0) {
        return true;
    }
    return (
        this.drawPile.length > 0 ||
        this.myCards.some((index) =>
            matchSuitOrValue(
                this.cardAt(index),
                this.cardAt(this.discardPile[this.discardPile.length - 1])
            )
        )
    );
}
...
These are pretty self-explanatory:

- getKey() returns our SRA key for the card at index.
- getKeyFromHand() returns our SRA key for a card in our hand (at index in myCards).
- cardAt() returns the decrypted card at index. This assumes we can decrypt the card. If we already have it in decryptedCards, we return it from there; otherwise we decrypt it using our key and the other player's key, then store it in decryptedCards.
- getDrawIndex() returns the index of the card at the top of the draw pile.
- canIMove() returns true if we can make a move. If the discard pile is empty, we can discard anything; otherwise we can move if the draw pile is not empty (we can draw a card) or if at least one of the cards in our hand matches the suit or value of the card on top of the discard pile.

We also need to implement some functions that mutate the deck (in which case we also need to update our view-model so our UI reflects the changes):
...
async myDraw(serializedSRAKeyPair: SerializedSRAKeyPair) {
    const index = this.drawPile.shift()!;
    this.myCards.push(index);
    this.othersKeys[index] =
        SRAKeySerializationHelper.deserializeSRAKeyPair(
            serializedSRAKeyPair
        );
    await this.updateViewModel();
}

async othersDraw() {
    this.othersCards.push(this.drawPile.shift()!);
    await this.updateViewModel();
}

async myDiscard(index: number) {
    const cardIndex = this.myCards.splice(index, 1)[0];
    this.discardPile.push(cardIndex);
    await this.updateViewModel();
}

async othersDiscard(
    index: number,
    serializedSRAKeyPair: SerializedSRAKeyPair
) {
    const cardIndex = this.othersCards.splice(index, 1)[0];
    this.othersKeys[cardIndex] =
        SRAKeySerializationHelper.deserializeSRAKeyPair(
            serializedSRAKeyPair
        );
    this.discardPile.push(cardIndex);
    await this.updateViewModel();
}
...
The actions are:

- myDraw() - we draw a card from the top of the draw pile. We need the other player's key for this card, given as the serializedSRAKeyPair argument.
- othersDraw() - the other player draws a card from the top of the draw pile. Note the Deck class just maintains state, so it is not responsible for sharing our key for that card with the other player - we just update the state (othersCards and drawPile).
- myDiscard() - we discard a card. We take the index of the card as an argument.
- othersDiscard() - the other player discards a card. We take the index of the card and the other player's SRA key as arguments.

Note all these functions end up calling updateViewModel(). That's because all of them change state, so we need to update our Redux store and reflect the changes on the UI:
...
private async updateViewModel() {
    await this.store.dispatch(
        updateDeckViewModel({
            drawPile: this.drawPile.length,
            discardPile: this.discardPile.map((i) => this.cardAt(i)),
            myCards: this.myCards.map((i) => this.cardAt(i)),
            othersHand: this.othersCards.length,
        })
    );
}
}
We haven't looked at the Redux store yet. We'll cover it later on, but here we dispatch a deck view-model update. The deck view-model contains the size of the draw pile, the cards in the discard pile and in our hand, and the number of cards in the other player's hand.
type DeckViewModel = {
    drawPile: number;
    discardPile: string[];
    myCards: string[];
    othersHand: number;
};

const defaultDeckViewModel: DeckViewModel = {
    drawPile: 0,
    discardPile: [],
    myCards: [],
    othersHand: 0,
};
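To make the shape concrete, here is a standalone sketch of how updateViewModel() maps deck state into this view-model (with decryption replaced by a plain lookup, since this snippet has no keys):

```typescript
type DeckViewModel = {
    drawPile: number;
    discardPile: string[];
    myCards: string[];
    othersHand: number;
};

const cards = ["9:hearts", "9:spades", "10:clubs", "K:diamonds"];
const cardAt = (i: number) => cards[i]; // stands in for decryption

const drawPile = [3];           // indexes still in the draw pile
const discardPile = [2];        // indexes discarded so far
const myCards = [0, 1];         // indexes in our hand
const othersCards: number[] = [];

// Same shaping as updateViewModel(): counts for hidden cards,
// decrypted strings for visible ones.
const viewModel: DeckViewModel = {
    drawPile: drawPile.length,
    discardPile: discardPile.map(cardAt),
    myCards: myCards.map(cardAt),
    othersHand: othersCards.length,
};
console.log(viewModel.discardPile); // ["10:clubs"]
```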
This is all the deck management logic we need. Let's move on to game actions.
We'll be using the library-provided shuffle. We covered this in part 6, so we won't go over it again. It is exposed as a shuffle() function. Assuming our deck is shuffled, the first action we need to handle is dealing cards. In Mental Poker, dealing a card to Bob means Alice needs to share her key to that card. Then Bob can use his key and Alice's key to see the card, while Alice cannot see it since she doesn't have Bob's key. This is the equivalent of Bob holding a card in his hand.
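This works because SRA encryption is commutative. Here is a toy illustration with small numbers - this is not the toolkit's API, just SRA-style commutative encryption via modular exponentiation (real implementations use large primes):

```typescript
// Modular exponentiation over BigInt.
function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
    let result = 1n;
    base %= mod;
    while (exp > 0n) {
        if (exp & 1n) result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1n;
    }
    return result;
}

// Modular inverse via the extended Euclidean algorithm (assumes gcd(a, m) = 1).
function modInverse(a: bigint, m: bigint): bigint {
    let [old_r, r] = [a % m, m];
    let [old_s, s] = [1n, 0n];
    while (r !== 0n) {
        const q = old_r / r;
        [old_r, r] = [r, old_r - q * r];
        [old_s, s] = [s, old_s - q * s];
    }
    return ((old_s % m) + m) % m;
}

const p = 467n; // a toy-sized shared prime
// Each player picks e coprime to p - 1 and computes d = e^-1 mod (p - 1).
const aliceE = 3n, aliceD = modInverse(aliceE, p - 1n);
const bobE = 5n, bobD = modInverse(bobE, p - 1n);

const card = 42n; // a card encoded as a number

// Alice encrypts, then Bob encrypts on top.
const both = modPow(modPow(card, aliceE, p), bobE, p);

// Decryption works in either order - which is what lets Alice hand Bob
// her key so that only Bob can see the card.
const viaAliceFirst = modPow(modPow(both, aliceD, p), bobD, p);
const viaBobFirst = modPow(modPow(both, bobD, p), aliceD, p);
console.log(viaAliceFirst === 42n && viaBobFirst === 42n); // true
```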
We define a DealAction:
type DealAction = {
    clientId: ClientId;
    type: "DealAction";
    cards: number[];
    keys: SerializedSRAKeyPair[];
};
Here, cards are the indexes of the cards in the deck and keys are the corresponding SRA keys for each card. Here's the state machine for dealing cards to both players:
async function deal(imFirst: boolean, count: number) {
    const queue = store.getState().queue.value!;
    await store.dispatch(updateGameStatus("Dealing"));

    const cards = new Array(count).fill(0).map((_, i) => imFirst ? i + count : i);
    const keys = cards.map((card) => store.getState().deck.value!.getKey(card)!);

    await sm.run(sm.sequence([
        sm.local(async (queue: IQueue<Action>, context: RootStore) => {
            await queue.enqueue({
                clientId: context.getState().id.value,
                type: "DealAction",
                cards,
                keys });
        }),
        sm.repeat(sm.transition(async (action: DealAction, context: RootStore) => {
            if (action.type !== "DealAction") {
                throw new Error("Invalid action type");
            }

            if (action.clientId === context.getState().id.value) {
                return;
            }

            const deck = context.getState().deck.value!;

            for (let i = 0; i < action.cards.length; i++) {
                if (imFirst) {
                    if (action.cards[i] !== i) {
                        throw new Error("Unexpected card index");
                    }
                    await deck.myDraw(action.keys[i]);
                } else {
                    await deck.othersDraw();
                }
            }

            for (let i = 0; i < action.cards.length; i++) {
                if (imFirst) {
                    await deck.othersDraw();
                } else {
                    if (action.cards[i] !== i + action.cards.length) {
                        throw new Error("Unexpected card index");
                    }
                    await deck.myDraw(action.keys[i]);
                }
            }
        }), 2)
    ]), queue, store);
}
In preparation for dealing, we:

- Get the async queue from the store.
- Update the game status to Dealing (more details on this later).
- Determine which cards the other player gets and collect our keys for them: if we're first, we get the first count cards, so the other player gets the next count; otherwise they get the first count cards and we get the next ones.

With this done, our state machine consists of:

- A local transition in which we enqueue a DealAction with the cards and keys we determined the other player gets.
- Repeated twice, a transition handling DealAction actions. If we see the one we sent out (the clientId matches our clientId), we can ignore it. If we see the DealAction from the other player, we update the deck. If we are first to draw, we call deck.myDraw() count times, then deck.othersDraw() count times; otherwise we do it the other way around - call deck.othersDraw() count times, then deck.myDraw() count times.

Local transitions and remote transitions are explained in part 5, in which we talked about the state machine.
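The index split used above (imFirst ? i + count : i) can be checked in isolation - these are the indexes dealt to the other player:

```typescript
// Which card indexes go to the other player, given whether we go first.
// Mirrors the expression in deal(); count is the hand size.
function cardsForOtherPlayer(imFirst: boolean, count: number): number[] {
    return new Array(count).fill(0).map((_, i) => (imFirst ? i + count : i));
}

console.log(cardsForOtherPlayer(true, 5));  // [5, 6, 7, 8, 9] - we keep 0..4
console.log(cardsForOtherPlayer(false, 5)); // [0, 1, 2, 3, 4] - we keep 5..9
```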
Drawing a card is a two-step process. We need to tell the other player we intend to draw a card (from the draw pile), and they need to give us their key to that card. Similarly, if the other player tells us they want to draw a card, we give them our key to that card.
We need two actions:
type DrawRequestAction = {
    clientId: ClientId;
    type: "DrawRequest";
    cardIndex: number;
};

type DrawResponseAction = {
    clientId: ClientId;
    type: "DrawResponse";
    cardIndex: number;
    key: SerializedSRAKeyPair;
};
If we want to draw a card, here is our state machine:
async function drawCard() {
    const queue = store.getState().queue.value!;
    await store.dispatch(updateGameStatus("Waiting"));

    await sm.run([
        sm.local(async (queue: IQueue<Action>, context: RootStore) => {
            await queue.enqueue({
                clientId: context.getState().id.value,
                type: "DrawRequest",
                cardIndex: context.getState().deck.value!.getDrawIndex() });
        }),
        sm.transition((action: DrawRequestAction) => {
            if (action.type !== "DrawRequest") {
                throw new Error("Invalid action type");
            }
        }),
        sm.transition(async (action: DrawResponseAction, context: RootStore) => {
            if (action.type !== "DrawResponse") {
                throw new Error("Invalid action type");
            }
            await context.getState().deck.value!.myDraw(action.key);
        }),
    ], queue, store);

    await store.dispatch(updateGameStatus("OthersTurn"));
    await waitForOpponent();
}
We again get the async queue from the store and update the game status. Then we run a state machine consisting of 3 transitions:

- A local transition in which we enqueue a DrawRequest action.
- A transition in which we expect to see our own DrawRequest round-tripped.
- A transition in which we expect the other player's DrawResponse action, giving us the key and allowing us to draw the card.

Finally, after running the state machine and drawing the card, we update the game status to the other player's turn and call waitForOpponent(), which we'll cover later.
This fully implements us drawing a card from the top of the draw pile and updating the deck.
Similar to drawing cards, we need to implement discarding. Discarding a card is easier - we don't need a key from the other player; instead, we provide the key to the card we're discarding so that the other player can see it.
type DiscardRequestAction = {
    clientId: ClientId;
    type: "DiscardRequest";
    cardIndex: number;
    key: SerializedSRAKeyPair;
};
Our DiscardRequestAction contains the card index and our key.
The corresponding state machine:
async function discardCard(index: number) {
    const queue = store.getState().queue.value!;
    await store.dispatch(updateGameStatus("Waiting"));

    await sm.run([
        sm.local(async (queue: IQueue<Action>, context: RootStore) => {
            await queue.enqueue({
                clientId: context.getState().id.value,
                type: "DiscardRequest",
                cardIndex: index,
                key: context.getState().deck.value!.getKeyFromHand(index) });
        }),
        sm.transition(async (action: DiscardRequestAction, context: RootStore) => {
            if (action.type !== "DiscardRequest") {
                throw new Error("Invalid action type");
            }
            await context.getState().deck.value!.myDiscard(action.cardIndex);
        }),
    ], queue, store);

    if (store.getState().deckViewModel.value.myCards.length === 0) {
        await store.dispatch(updateGameStatus("Win"));
    } else {
        await store.dispatch(updateGameStatus("OthersTurn"));
        await waitForOpponent();
    }
}
As usual, we get the queue and update the game state. Then we run the state machine:

- A local transition in which we enqueue a DiscardRequest with the card index and key.
- A transition in which we expect to see our own DiscardRequest - since it round-tripped, we can now update the deck.

After running the state machine, we need to check whether we discarded the last card in our hand. If we did, we update the game state to us winning. Otherwise we wait for the other player's move.
The last action we need to look at is the situation in which we can't discard any card (no matching suit or value) and we also can't draw a card (the draw pile is empty). In this case we lose the game. Since it is our turn, we need to let the other player know that we're not pondering our next move - we can't do anything, so we lose. We'll model this as a simple CantMoveAction:
type CantMoveAction = {
    clientId: ClientId;
    type: "CantMove";
};
This action has no payload. The state machine is also very simple:
async function cantMove() {
    const queue = store.getState().queue.value!;
    await queue.enqueue({
        clientId: store.getState().id.value,
        type: "CantMove" });
    await store.dispatch(updateGameStatus("Loss"));
}
At the end of it, we update the game status to us losing.
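Taken together, when each of these moves is available follows directly from the game rules. A hypothetical helper (not part of the demo, where the UI and canIMove() drive this) summarizing the choices:

```typescript
// Which turn actions are available, per the rules described above.
// canDiscard: at least one card in hand matches the top of the discard
// pile (or the pile is empty); drawPileSize: cards left in the draw pile.
type TurnAction = "draw" | "discard" | "cantMove";

function availableActions(canDiscard: boolean, drawPileSize: number): TurnAction[] {
    const actions: TurnAction[] = [];
    if (canDiscard) actions.push("discard");
    if (drawPileSize > 0) actions.push("draw");
    // If we can neither discard nor draw, we must declare we can't move.
    return actions.length > 0 ? actions : ["cantMove"];
}

console.log(availableActions(true, 3));  // ["discard", "draw"]
console.log(availableActions(false, 0)); // ["cantMove"]
```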
So far, we have the 3 possible actions we can take when it is our turn:

- Draw a card (drawCard()).
- Discard a card (discardCard()).
- Declare that we can't move (cantMove()).

Next, we need to model responding to the other player's move.
The opponent can take the same actions as we can, so we don't need to declare any new action types, rather we need a state machine that responds to actions incoming from the other player:
async function waitForOpponent() {
    const queue = store.getState().queue.value!;
    const othersAction = await queue.dequeue();

    switch (othersAction.type) {
        case "DrawRequest":
            await sm.run([
                sm.local(async (queue: IQueue<Action>, context: RootStore) => {
                    if (othersAction.cardIndex !== store.getState().deck.value!.getDrawIndex()) {
                        throw new Error("Invalid card index for draw");
                    }
                    await queue.enqueue({
                        clientId: store.getState().id.value,
                        type: "DrawResponse",
                        cardIndex: othersAction.cardIndex,
                        key: store.getState().deck.value!.getKey(othersAction.cardIndex)
                    })}),
                sm.transition(async (action: DrawResponseAction, context: RootStore) => {
                    if (action.type !== "DrawResponse") {
                        throw new Error("Invalid action type");
                    }
                    await context.getState().deck.value!.othersDraw();
                })], queue, store);
            await store.dispatch(updateGameStatus("MyTurn"));
            break;
        case "DiscardRequest":
            await store.getState().deck.value!.othersDiscard(othersAction.cardIndex, othersAction.key);
            if (store.getState().deckViewModel.value.othersHand === 0) {
                await store.dispatch(updateGameStatus("Loss"));
            } else if (store.getState().deck.value?.canIMove()) {
                await store.dispatch(updateGameStatus("MyTurn"));
            } else {
                await cantMove();
            }
            break;
        case "CantMove":
            await store.dispatch(updateGameStatus("Win"));
            break;
    }
}
We dequeue an action, then respond based on its type:

- If it is a DrawRequest, we send a DrawResponse. We implement this as a simple state machine with a local transition (our DrawResponse) and a remote transition in which we expect to see our response round-tripped. We also check that the draw request's card index matches the top of the draw pile (otherwise the other player might trick us and draw some other card).
- If it is a DiscardRequest, we update the deck. If the other player discarded their last card, we lose. Otherwise, if we can move, we update the game status to MyTurn and let the user pick which card to discard. But if we can't move - can't discard anything and can't draw - we automatically call cantMove() to mark the fact that we lost.
- If it is a CantMove, the other player lost, so we update the game status to Win.

Note that for the discard request, to keep things simple, we aren't checking whether the move is legal. If we want to secure the implementation, we should check that the card the other player is discarding matches either the suit or value of the card on top of the discard pile.
We already covered all possible actions:
type Action = DealAction | DrawRequestAction | DrawResponseAction | DiscardRequestAction | CantMoveAction;
The possible game statuses:
type GameStatus = "Waiting" | "Shuffling" | "Dealing" | "MyTurn" | "OthersTurn" | "Win" | "Loss" | "Draw";
We just implemented all the game logic - the possible actions a player can take, and the request/response needed to model the game of discard. We have the full model, so let's move on to the Redux store.
Like in the previous post, we will be using Redux and the Redux Toolkit.
The state we'll be maintaining:

- Our client ID.
- The other player's client ID.
- The async queue.
- The game status (GameStatus in our model).
- The deck (Deck).
- The deck view-model (DeckViewModel).

Using createAction from the Redux Toolkit:
const updateId = createAction<string>("id/update");
const updateOtherPlayer = createAction<string>("otherPlayer/update");
const updateQueue = createAction<IQueue<Action>>("queue/update");
const updateGameStatus = createAction<GameStatus>("gameStatus/update");
const updateDeck = createAction<Deck>("deck/update");
const updateDeckViewModel = createAction<DeckViewModel>("deckViewModel/update");
We'll also use the same helper to create Redux reducers as for rock-paper-scissors:
function makeUpdateReducer<T>(
    initialValue: T,
    updateAction: ReturnType<typeof createAction>
) {
    return createReducer({ value: initialValue }, (builder) => {
        builder.addCase(updateAction, (state, action) => {
            state.value = action.payload;
        });
    });
}
Our Redux store is:
const store = configureStore({
    reducer: {
        id: makeUpdateReducer("", updateId),
        otherPlayer: makeUpdateReducer("Not joined", updateOtherPlayer),
        queue: makeUpdateReducer<IQueue<Action> | undefined>(
            undefined,
            updateQueue
        ),
        gameStatus: makeUpdateReducer("Waiting", updateGameStatus),
        deck: makeUpdateReducer<Deck | undefined>(undefined, updateDeck),
        deckViewModel: makeUpdateReducer<DeckViewModel>(
            defaultDeckViewModel,
            updateDeckViewModel
        ),
    },
    middleware: (getDefaultMiddleware) =>
        getDefaultMiddleware({
            serializableCheck: false,
        }),
});
This is all we need to connect the model with the view.
We'll use React.
The first component we need is a card:
type CardViewProps = {
    card: string | undefined;
    onClick?: () => void;
};

const suiteMap = new Map([
    ["hearts", "♥"],
    ["diamonds", "♦"],
    ["clubs", "♣"],
    ["spades", "♠"]
]);

const CardView: React.FC<CardViewProps> = ({ card, onClick }) => {
    const number = card?.split(":")[0];
    const suite = card ? suiteMap.get(card.split(":")[1]) : undefined;
    const color = suite === "♥" || suite === "♦" ? "red" : "black";

    return <div style={{ width: 70, height: 100, borderColor: "black", borderWidth: 1, borderStyle: "solid", borderRadius: 5,
        backgroundColor: card ? "white" : "darkred" }} onClick={onClick}>
        <div style={{ display: card ? "block" : "none", paddingLeft: 15, paddingRight: 15, color }}>
            <p style={{ marginTop: 20, marginBottom: 0, textAlign: "left", fontSize: 25 }}>{number}</p>
            <p style={{ marginTop: 0, textAlign: "right", fontSize: 30 }}>{suite}</p>
        </div>
    </div>
}
This renders a card, which can be a string or undefined. If it is a string, we render the value and suit. Otherwise we render the back of the card - a dark red rectangle. Cards have an optional onClick() event.
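The suit lookup and color choice can be pulled out and checked standalone (mirroring the component's logic; note the code spells "suit" as "suite"):

```typescript
// Mirrors CardView's suit symbol lookup and red/black color choice.
const suiteMap = new Map([
    ["hearts", "♥"],
    ["diamonds", "♦"],
    ["clubs", "♣"],
    ["spades", "♠"]
]);

function cardColor(card: string): "red" | "black" {
    const suite = suiteMap.get(card.split(":")[1]);
    return suite === "♥" || suite === "♦" ? "red" : "black";
}

console.log(cardColor("9:hearts")); // "red"
console.log(cardColor("A:spades")); // "black"
```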
A HandView renders several cards:
type HandViewProps = {
    prefix: string;
    cards: (string | undefined)[];
    onClick?: (index: number) => void;
};

const HandView: React.FC<HandViewProps> = ({ cards, prefix, onClick }) => {
    return <div style={{ display: "flex", flexDirection: "row", justifyContent: "center" }}>{
        cards.map((card, i) => <CardView key={prefix + ":" + i} card={ card } onClick={() => { if (onClick) { onClick(i) } }} />)
    }
    </div>
}
This can be the player's hand, where we should have string values for each card and an onClick() event hooked up for when the player clicks a card to discard it. It can also be the other player's hand, in which case we should have undefined values for each card and just show their backs.
MainView implements a view of the whole table:
const useSelector: TypedUseSelectorHook<RootState> = useReduxSelector;

const MainView = () => {
    const idSelector = useSelector((state) => state.id);
    const otherPlayer = useSelector((state) => state.otherPlayer);
    const gameStateSelector = useSelector((state) => state.gameStatus);
    const deckViewModel = useSelector((state) => state.deckViewModel);

    const myTurn = gameStateSelector.value === "MyTurn";

    const canDiscard = (index: number) => {
        if (deckViewModel.value.discardPile.length === 0) {
            return true;
        }
        return matchSuitOrValue(
            deckViewModel.value.myCards[index],
            deckViewModel.value.discardPile[deckViewModel.value.discardPile.length - 1]);
    }

    return <div>
        <div>
            <p>Id: {idSelector.value}</p>
            <p>Other player: {otherPlayer.value}</p>
            <p>Status: {gameStateSelector.value}</p>
        </div>
        <div style={{ height: 200, textAlign: "center" }}>
            <HandView prefix={"others"} cards={ new Array(deckViewModel.value.othersHand).fill(undefined) } />
        </div>
        <div style={{ height: 200, display: "flex", flexDirection: "row", justifyContent: "center" }}>
            <div style={{ display: deckViewModel.value.drawPile > 0 ? "block" : "none", margin: 5 }} onClick={() => { if (myTurn) { drawCard() } }}>
                <span>{deckViewModel.value.drawPile} card{deckViewModel.value.drawPile !== 1 ? "s" : ""}</span>
                <CardView card={ undefined } />
            </div>
            <div style={{ display: deckViewModel.value.discardPile.length > 0 ? "block" : "none", margin: 5 }}>
                <span>{deckViewModel.value.discardPile.length} card{deckViewModel.value.discardPile.length !== 1 ? "s" : ""}</span>
                <CardView card={ deckViewModel.value.discardPile[deckViewModel.value.discardPile.length - 1] } />
            </div>
        </div>
        <div style={{ height: 200, textAlign: "center" }}>
            <HandView
                prefix={"mine"}
                cards={ deckViewModel.value.myCards }
                onClick={(index) => { if (myTurn && canDiscard(index)) { discardCard(index) } }} />
        </div>
    </div>
}
This consists of:

- A header showing our client ID, the other player's ID, and the game status.
- The other player's hand, rendered as card backs.
- The draw pile and the discard pile.
- Our hand.

If it is our turn, we hook up drawCard() to the draw pile's onClick() and, for each card we can discard, we hook up discardCard() to the card's onClick().
And that's it. Rendering it all on the page:
const root = ReactDOM.createRoot(document.getElementById("root")!);

root.render(
    <Provider store={store}>
        <MainView />
    </Provider>
);
Here, Provider comes from the react-redux package and makes the Redux store available to the React components.
Like with rock-paper-scissors, let's look at how we initialize the game:
getLedger<Action>().then(async (ledger) => {
    const id = randomClientId();
    await store.dispatch(updateId(id));

    const queue = await upgradeTransport(2, id, ledger);
    await store.dispatch(updateQueue(queue));

    for (const action of ledger.getActions()) {
        if (action.clientId !== id) {
            await store.dispatch(updateOtherPlayer(action.clientId));
            break;
        }
    }

    const [sharedPrime, turnOrder] = await establishTurnOrder(2, id, queue);
    await store.dispatch(updateGameStatus("Shuffling"));

    const [keys, deck] = await shuffle(id, turnOrder, sharedPrime, getDeck(), queue, 64);

    const imFirst = turnOrder[0] === id;
    await store.dispatch(updateDeck(new Deck(deck, keys, store)));

    await deal(imFirst, 5);
    await store.dispatch(updateGameStatus(imFirst ? "MyTurn" : "OthersTurn"));

    if (!imFirst) {
        await waitForOpponent();
    }
});
- We get the ledger, as we saw in part 7.
- We generate a random client ID (randomClientId() lives in packages/primitives/src/randomClientId.ts).
- We upgrade the transport with upgradeTransport() (also discussed in part 7).
- We look through the ledger for an action with a different client ID to find the other player.
- We establish the turn order and a shared prime.
- We update the game status to Shuffling.
- We shuffle the deck using the shuffle() primitive and get back our keys and the encrypted cards.
- We determine whether we go first (imFirst).
- We create a Deck and store it in the Redux store.
- We deal 5 cards to each player with deal().
- We update the game status to MyTurn or OthersTurn.
- If the other player goes first, we call waitForOpponent().

This initialization is a bit longer than the one for rock-paper-scissors, since we have to shuffle and deal cards, and the order in which the players go is important.
We looked at implementing a discard card game using the Mental Poker toolkit. The full source code for the demo is under demos/discard:

- README.md covers how to build and run the demo.
- The deck implementation is in deck.ts.
- The game model is in model.ts.
- The Redux store is in store.ts.
- The React components are in cardView.tsx, handView.tsx, and mainView.tsx.
- Initialization is in index.tsx.

We finally put the whole toolkit to its intended use and built an end-to-end interactive, 2-player card game.
]]>For an overview on Mental Poker, seeÂ Mental Poker Part 0: An Overview. Other articles in this series:Â https://vladris.com/writings/index.html#mental-poker. In the previousÂ post in the seriesÂ we looked at some low-level building blocks. It this post, weâll finally see how to implement a game end-to-end using the toolkit. Weâll start with a simple game: rock-paper-scissors.
Weâll build this game as a React app, using the toolkit. Weâll be using Redux for state management - Redux provides a good way of binding game state to the UI, which works well with our toolkit.
The full code for this is in the
demos/rock-paper-scissors
app.
Since we got a lot of the primitives out of the way in the previous post (Fluid
connection, getting a SignedTransport
etc.), in this post we can focus on the
higher level semantics of modeling the game.
Weâll play a round of rock-paper-scissors as follows:
rock
or paper
or scissors
) encrypted.This 2-step protects against cheating: before the game proceeds, both players need to make a selection. But the other player doesnât know what the selection is until the decryption key is provided. Note for this particular game, turn order doesnât matter.
Weâll start with a few type definitions:
type PlaySelection = "Rock" | "Paper" | "Scissors";
type EncryptedSelection = string;
PlaySelection
represents the possible plays, EncryptedSelection
is the
string representation of an encrypted PlaySelection
.
Our game model will have 2 actions:
type PlayAction = {
clientId: ClientId;
type: "PlayAction";
encryptedSelection: EncryptedSelection;
};
type RevealAction = {
clientId: ClientId;
type: "RevealAction";
key: SerializedSRAKeyPair;
};
type Action = PlayAction | RevealAction;
PlayAction
is the first step, when players post their encrypted choice.
RevealAction
is the second step, revealing the encryption key. Weâll use the
SRA algorithm for encryption since we have it in our toolkit, but for this game
any encryption algorithm would work.
Weâll also need a couple more type definitions for the game state:
type GameStatus = "Waiting" | "Ready" | "Win" | "Loss" | "Draw";
type PlayValue =
| { type: "Selection"; value: PlaySelection }
| { type: "Encrypted"; value: EncryptedSelection }
| { type: "None"; value: undefined };
The GameStatus
represents the different states the client can be in:
Waiting
for another player to connect or for round to finish.Ready
to play.Win
, Loss
, Draw
- the result after playing a round.The PlayValue
represents the current state of a playerâs pick. It can be
either an encrypted selection, a revealed selection, or nothing (at the start of
the game).
Before implementing the game state machine, letâs look at the Redux store.
I wonât go into the details of Redux in this post - please refer to the Redux documentation for that. Weâll be using the Redux Toolkit to streamline setting up our store.
We will maintain 6 pieces of state:
GameStatus
above).PlayValue
above).PlayValue
).Weâll use the Redux Toolkit createAction
helper to define the update functions
for these:
const updateId = createAction<string>("id/update");
const updateOtherPlayer = createAction<string>("otherPlayer/update");
const updateQueue = createAction<IQueue<Action>>("queue/update");
const updateGameStatus = createAction<GameStatus>("gameStatus/update");
const updateMyPlay = createAction<PlayValue>("myPlay/update");
const updateTheirPlay = createAction<PlayValue>("theirPlay/update");
Weâll also need reducers (another Redux concept) for updating the values. We can implement a helper function to create these:
function makeUpdateReducer<T>(
initialValue: T,
updateAction: ReturnType<typeof createAction>
) {
return createReducer({ value: initialValue }, (builder) => {
builder.addCase(updateAction, (state, action) => {
state.value = action.payload;
});
});
}
Finally, we set up our Redux store as:
const store = configureStore({
reducer: {
id: makeUpdateReducer("", updateId),
otherPlayer: makeUpdateReducer("Not joined", updateOtherPlayer),
queue: makeUpdateReducer<IQueue<Action> | undefined>(
undefined,
updateQueue
),
myPlay: makeUpdateReducer<PlayValue>(
{ type: "None", value: undefined },
updateMyPlay
),
theirPlay: makeUpdateReducer<PlayValue>(
{ type: "None", value: undefined },
updateTheirPlay
),
gameStatus: makeUpdateReducer("Waiting", updateGameStatus),
},
middleware: (getDefaultMiddleware) =>
getDefaultMiddleware({
serializableCheck: false,
}),
});
We initialize the store with the default values:
Waiting
(for other player to connect).Thatâs about it for Redux setup - again, I wonât cover what reducers are, how Redux manages state changes etc.
Weâll implement playing a round of rock-paper-scissors in the function async
function playRound(selection: PlaySelection)
. We invoke this with our selection
(rock, paper, or scissors).
First, we need to get a few references:
const context = store;
await context.dispatch(updateGameStatus("Waiting"));
const queue = context.getState().queue.value!;
const kp = SRA.genereateKeyPair(BigIntUtils.randPrime());
First, we get a reference to the Redux store. Then we update the game status to
Waiting
. We get a reference to the async queue from the Redux store and,
finally, we generate an SRA key pair. The generateKeyPair()
and randPrime()
functions we discussed all the way in part
1,
when we covered cryptography. The dispatch()
and getState()
are standard
Redux calls.
Now letâs look at the state machine modeling a round. It consists of the following sequence:
We can run this state machine with the Redux store as context:
await sm.run(sm.sequence([
sm.local(async (queue) => {
const playAction = {
clientId: context.getState().id.value,
type: "PlayAction",
encryptedSelection: SRA.encryptString(selection, kp),
};
await queue.enqueue(playAction);
}),
sm.repeat(sm.transition(async (play: PlayAction, context: RootStore) => {
const action =
play.clientId === context.getState().id.value
? updateMyPlay
: updateTheirPlay;
await context.dispatch(
action({ type: "Encrypted", value: play.encryptedSelection })
);
}), 2),
sm.local(async (queue) => {
const revealAction = {
clientId: context.getState().id.value,
type: "RevealAction",
key: SRAKeySerializationHelper.serializeSRAKeyPair(kp),
};
await queue.enqueue(revealAction);
}),
sm.repeat(sm.transition(async (reveal: RevealAction, context: RootStore) => {
const action =
reveal.clientId === context.getState().id.value
? updateMyPlay
: updateTheirPlay;
const originalValue =
reveal.clientId === context.getState().id.value
? context.getState().myPlay.value
: context.getState().theirPlay.value;
await context.dispatch(
action({
type: "Selection",
value: SRA.decryptString(
originalValue.value as EncryptedSelection,
SRAKeySerializationHelper.deserializeSRAKeyPair(reveal.key)
) as PlaySelection,
})
);
}), 2)
]), queue, context);
We first define a local
transition - we enqueue our PlayAction
.
We then repeat 2 times a transition
. We update the Redux store accordingly: if
the received client ID is ours, we call updateMyPlay()
, otherwise we call
updateTheirPlay()
with the encrypted value.
Next, we enqueue our RevealAction
.
We then again repeat a transition 2 times. If the incoming client ID is ours,
we call updateMyPlay()
and decrypt the originalValue
(myPlay.value
) with
the received key, otherwise we call updateTheirPlay()
and decrypt the
originalValue
(theirPlay.value
) with the received key.
Note how this code updates the Redux store directly, by using it as the context for the state machine.
Once the state machine finishes, we should have both our play and the opponent's play, so we can determine the winner and update the game state accordingly:
const myPlay = context.getState().myPlay.value;
const theirPlay = context.getState().theirPlay.value;
if (myPlay.value === theirPlay.value) {
await context.dispatch(updateGameStatus("Draw"));
} else if (
(myPlay.value === "Rock" && theirPlay.value === "Scissors") ||
(myPlay.value === "Paper" && theirPlay.value === "Rock") ||
(myPlay.value === "Scissors" && theirPlay.value === "Paper")
) {
await context.dispatch(updateGameStatus("Win"));
} else {
await context.dispatch(updateGameStatus("Loss"));
}
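As an aside, the chain of conditions above can be collapsed into a small lookup table. The sketch below is an illustration only, not the demo's actual code - the `beats` map and `outcome()` helper are hypothetical names:

```typescript
type PlaySelection = "Rock" | "Paper" | "Scissors";
type GameStatus = "Win" | "Loss" | "Draw";

// Lookup table: each selection maps to the selection it beats
const beats: Record<PlaySelection, PlaySelection> = {
    Rock: "Scissors",
    Paper: "Rock",
    Scissors: "Paper",
};

function outcome(mine: PlaySelection, theirs: PlaySelection): GameStatus {
    if (mine === theirs) {
        return "Draw";
    }
    return beats[mine] === theirs ? "Win" : "Loss";
}
```

The table makes the win condition data rather than control flow, which is easier to extend to games with more moves.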
And that's it in terms of game mechanics. Finally, let's look at a simple UI for the game.
We'll build the UI using React. First, let's create a component that provides the rock-paper-scissors options as 3 buttons:
type ButtonsViewProps = {
disabled: boolean;
onPlay: (play: PlaySelection) => void;
}
const ButtonsView = ({ disabled, onPlay }: ButtonsViewProps) => {
return <div>
<button disabled={disabled} onClick={() => onPlay("Rock")} style={{ width: 200 }}>🪨</button>
<button disabled={disabled} onClick={() => onPlay("Paper")} style={{ width: 200 }}>📄</button>
<button disabled={disabled} onClick={() => onPlay("Scissors")} style={{ width: 200 }}>✂️</button>
</div>
}
Our properties are a boolean that determines whether buttons should be enabled
or disabled and an onPlay()
callback.
Our view is also very simple:
const useSelector: TypedUseSelectorHook<RootState> = useReduxSelector;
const MainView = () => {
const idSelector = useSelector((state) => state.id);
const otherPlayer = useSelector((state) => state.otherPlayer);
const gameStateSelector = useSelector((state) => state.gameStatus);
return <div>
<div>
<p>Id: {idSelector.value}</p>
<p>Other player: {otherPlayer.value}</p>
<p>Status: {gameStateSelector.value}</p>
</div>
<ButtonsView disabled={gameStateSelector.value === "Waiting"} onPlay={playRound}></ButtonsView>
</div>
}
The first line is some React-Redux plumbing (via the react-redux package), which allows us to grab data from the Redux store and put it in the UI.
We'll be showing our ID, the other player's ID, the game status, and the 3 buttons. The buttons are enabled as long as the game state is not Waiting. Once the user clicks a button, we simply call the playRound() function we looked at in the previous section.
Rendering all of this on the page:
const root = ReactDOM.createRoot(document.getElementById("root")!);
root.render(
<Provider store={store}>
<MainView />
</Provider>
);
Here, Provider
comes from the react-redux
package and makes the Redux store
available to the React components.
We now have all the pieces in place; the only bit of code we haven't covered is initializing the game:
getLedger<Action>().then(async (ledger) => {
const id = randomClientId();
await store.dispatch(updateId(id));
const queue = await upgradeTransport(2, id, ledger);
await store.dispatch(updateQueue(queue));
for (const action of ledger.getActions()) {
if (action.clientId !== id) {
store.dispatch(updateOtherPlayer(action.clientId));
break;
}
}
await store.dispatch(updateGameStatus("Ready"));
});
The steps are:
- Get a ledger, as we saw in the previous post.
- Generate a random client ID (we won't cover the randomClientId() function in this post, but you can find the implementation in packages/primitives/src/randomClientId.ts).
- Upgrade the transport with upgradeTransport() (also discussed in the previous post).
- Update the game status to Ready (from the default, which is Waiting).
The steps are pretty self-explanatory, maybe except getting the other player's ID. The way that works is as follows: getActions() returns all actions posted on the ledger so far. We look for an action where the client ID is different from our client ID and store that as the other player's ID. We are guaranteed to see at least one action from the other player, as we ran upgradeTransport(), which under the hood performs a public key exchange.
And that's it - we have an end-to-end game of rock-paper-scissors.
We looked at implementing rock-paper-scissors using the Mental Poker toolkit.
The full source code for the demo is under demos/rock-paper-scissors:
- README.md
- model.ts
- store.ts
- buttonsView.tsx and mainView.tsx
- index.tsx
Note how easy it is to model a game if we rely on the toolkit's primitives. We implement the game logic in the model, relying on the toolkit's capabilities. We use Redux to store game state, which we can easily bind to a React view. That said, this was a very simple game. In the next post we'll look at implementing a card game.
For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we saw how to implement shuffling on top of our primitives.
In this post, we'll look at a few other primitives useful for implementing a game on top of this toolkit.
We talked about Fluid Framework in previous
posts. In part
2,
we discussed the Fluid ledger, a distributed data structure which forms the
basis of our game message exchange. In part
3, we
talked about our ITransport
interface and how we can implement it given a
ledger. We haven't covered how to get a ledger yet.
Let's go back down the stack, all the way to Fluid Framework. Fluid Framework expects clients to agree on the basic layout of the distributed data structures they're working with. These data structures are packaged in a container. Note this container has nothing to do with Docker containers, it's simply a definition for a set of data structures.
We'll look at a simple implementation of joining a Fluid session and using a container that includes only a ledger. We won't even try to connect to an instance of the Azure Fluid Relay service, rather we'll use a local server.
Instructions for connecting to a service hosted in Azure are
here.
For our local server, we need a stub user and an AzureLocalConnectionConfig
including an InsecureTokenProvider
- this is all plumbing to connect to a
local instance of the Fluid Relay service:
const user = {
id: "userId",
name: "userName",
};
const localConnectionConfig: AzureLocalConnectionConfig = {
type: "local",
tokenProvider: new InsecureTokenProvider("", user),
endpoint: "http://localhost:7070",
};
With this connection config, we can now define a simple container containing a Ledger
:
export async function getLedger<T>(): Promise<ITransport<T>> {
const client = new AzureClient({ connection: localConnectionConfig });
const containerSchema = {
initialObjects: { myLedger: Ledger },
};
let container: IFluidContainer;
const containerId = window.location.hash.substring(1);
if (containerId) {
({ container } = await client.getContainer(
containerId,
containerSchema
));
} else {
({ container } = await client.createContainer(containerSchema));
const id = await container.attach();
window.location.hash = id;
}
const ledger = container.initialObjects.myLedger as Ledger<string>;
return makeFluidClient(ledger);
}
We check the browser window's URL: if it ends with a GUID, we load the container; if not, we create a new container and add its GUID to the browser window's URL. This makes it easy to connect two local clients to the same session:
The code above can be found in the
demos/transport
package. This is used by the other demo apps. Note you need to run the Fluid
Framework local service: npx @fluidframework/azure-local-service@latest
.
We now have a simple abstraction, getLedger()
, that wraps all the Fluid
Framework-specifics and gives us back an ITransport
interface (implemented as
a FluidTransport
).
We are building a turn-based, cryptographically secure game, so the first step is to ensure our channel is secure and clients can't spoof each other.
In part
3 we
looked at the ITransport
interface, the FluidTransport
implementation which
leverages the Fluid protocol for communication, and the SignedTransport
implementation which wraps the FluidTransport
and enhances it with signature
verification.
Recap of signing: in cryptography, we do signing using a public/private key pair. These are both generated from a shared seed. Alice can sign a message using her private key and anyone that has the public key, including Bob, can verify that the signature is indeed Alice's.
So given a public/private key pair \((K_{private}, K_{public})\) and some payload \(P\), signing is a function that produces a signature given the payload and private key: \(sign(P, K_{private}) \rightarrow signature\). Signature verification is a function that takes a payload, signature, and public key and tells us whether the signature was indeed produced by the corresponding private key: \(verify(P, signature, K_{public}) \rightarrow true/false\).
The neat thing about public/private key cryptography is that the public key, which is required for validation, is not a secret - only the private key is. Nobody can spoof a signature unless they have the private key (which isn't shared), but everyone with the public key can verify that the signature comes from the private key owner.
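To make this concrete, here is a minimal sketch of sign/verify using Node's built-in crypto module with Ed25519 keys. This is an illustration only - the toolkit's own Signing primitives wrap their own implementation:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Alice generates a public/private key pair
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Alice signs a payload with her private key
const payload = Buffer.from("action from Alice");
const signature = sign(null, payload, privateKey);

// Anyone holding the public key can verify the signature
const valid = verify(null, payload, publicKey, signature);

// Tampering with the payload invalidates the signature
const forged = verify(null, Buffer.from("tampered"), publicKey, signature);
```

Here `valid` ends up `true` and `forged` ends up `false` - exactly the property we rely on to stop clients from spoofing each other.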
So if we start with a FluidTransport
, we need our clients to exchange public
keys. Each client generates a public/private key pair, and posts its client ID
and public key. We use these to populate the key store.
We can implement this on top of the state machine we saw in part 5. First, we define our action and context. As a reminder, the action is what we send over the wire and expect to receive. The context is an object we make available to the code we run whenever an action appears over the transport.
type KeyExchangeAction = {
clientId: ClientId;
type: "KeyExchange";
publicKey: Key;
};
type CryptoContext = {
clientId: ClientId;
me: PublicPrivateKeyPair;
keyStore: KeyStore;
};
In our case our action contains the ClientId
, the type
(which is
KeyExchange
), and a public key. Each client is expected to post this over the
transport. The context contains our ClientId
(so we can tell whether the
message came from us or someone else), our public/private key pair, and the
KeyStore
in which we put all ClientId
-to-Key
mappings.
A helper function to create the CryptoContext
:
async function makeCryptoContext(clientId: ClientId): Promise<CryptoContext> {
return {
clientId,
me: await Signing.generatePublicPrivateKeyPair(),
keyStore: new Map<ClientId, Key>(),
};
}
This leverages the cryptography primitives in our toolkit to generate a public/private key pair.
Our sequence to be executed by the state machine is:
function makeKeyExchangeSequence(players: number) {
return sm.sequence([
sm.local(
async (
actionQueue: IQueue<KeyExchangeAction>,
context: CryptoContext
) => {
await actionQueue.enqueue({
type: "KeyExchange",
clientId: context.clientId,
publicKey: context.me.publicKey,
});
}
),
sm.repeat(
sm.transition(
(action: KeyExchangeAction, context: CryptoContext) => {
if (action.type !== "KeyExchange") {
throw new Error("Invalid action type");
}
if (action.clientId === undefined) {
throw new Error("Expected client ID");
}
if (context.keyStore.has(action.clientId)) {
throw new Error(
"Same client posted key multiple times"
);
}
context.keyStore.set(action.clientId, action.publicKey);
}
),
players
),
]);
}
Refer to part
5
for the state machine details and a more in-depth explanation of local
actions/transitions etc. Our sequence starts with a local action, meaning
originating from our client: we post our client ID and public key. Then, for the
given number of players
we expect in the session, we repeatedly expect an
incoming action of type KeyExchangeAction
.
In other words, our protocol requires each client to start by posting their public key, and each client should expect as many such key postings as there are clients in the game.
We handle some error cases:
- If the incoming action is not a KeyExchangeAction, one of the clients didn't respect the protocol, so we bail.
- If the action is missing a client ID, we bail.
- If the same client posts a key multiple times, we bail.
If we didn't hit any of these issues, then we store the client ID and key in the KeyStore instance. Once the state machine executes this sequence, each client has enough information to create a SignedTransport. Here is a helper function to perform the whole key exchange:
async function keyExchange(
players: number,
clientId: ClientId,
actionQueue: IQueue<BaseAction>
) {
const context = await makeCryptoContext(clientId);
const keyExchangeSequence = makeKeyExchangeSequence(players);
await sm.run(keyExchangeSequence, actionQueue, context);
return [context.me, context.keyStore] as const;
}
This function takes as input the expected number of players, the ID of this client, and an action queue (as discussed in part 4). The implementation is straightforward:
- We create the crypto context.
- We create the key exchange sequence.
- We run the state machine over the sequence.
- We return our key pair and the populated KeyStore (the key store contains only public keys).
And here is a helper function that upgrades a transport to a signed one:
export async function upgradeTransport<T extends BaseAction>(
players: number,
clientId: ClientId,
transport: ITransport<T>
): Promise<IQueue<T>> {
const [keyPair, keyStore] = await keyExchange(
players,
clientId,
new ActionQueue(
transport as unknown as ITransport<BaseAction>,
true
)
);
return new ActionQueue(
new SignedTransport(
transport,
{ clientId, privateKey: keyPair.privateKey },
keyStore,
new SignatureProvider()
)
);
}
This function takes the number of players, our client ID, and an ITransport
which doesnât support signature verification. It executes the key exchange, then
creates a SignedTransport
since it now has all the pieces needed for that.
This function goes a step further, and also initializes an async queue on top of the signed transport.
A game that uses the toolkit can go from start to a queue over a signed transport in 3 steps:
const ledger = await getLedger<Action>();
const id = randomClientId();
const queue = await upgradeTransport(2, id, ledger);
In this example, we call getLedger()
, which we discussed in the first part of
this post, we generate a unique client ID, then we call upgradeTransport()
.
With these 3 lines of code, we get an ActionQueue
over a SignedTransport
.
The last primitive weâll look at in this post is another key component of Mental Poker: having clients agree who goes first, and agree on a shared large prime (this shared prime is used to generate SRA keys, as discussed in part 1).
These can be separate steps but we can combine them to be more efficient. To establish turn order, we can leverage the ledger distributed data structure which guarantees all clients get all ops in the same sequence: each client posts something, then we simply use the order in which clients see these posts as the turn order.
Here's a sketch of the state machine for this:
type EstablishTurnOrderAction = BaseAction;
type EstablishTurnOrderContext = {
clientId: ClientId;
turnOrder: ClientId[];
};
function makeEstablishTurnOrderSequence(players: number) {
return sm.sequence([
sm.local(async (actionQueue: IQueue<EstablishTurnOrderAction>, context: EstablishTurnOrderContext) => {
await actionQueue.enqueue({
type: "EstablishTurnOrder",
clientId: context.clientId,
});
}),
sm.repeat(sm.transition((action: EstablishTurnOrderAction, context: EstablishTurnOrderContext) => {
if (action.type !== "EstablishTurnOrder") {
throw new Error("Invalid action type");
}
if (context.turnOrder.find((id) => id === action.clientId)) {
throw new Error("Same client posted prime multiple times");
}
context.turnOrder.push(action.clientId);
}), players)
]);
}
Our EstablishTurnOrderAction is an alias for BaseAction, as it doesn't contain any additional information, just the client ID. The context contains our clientId and the turn order array we need to populate.
The state machine posts our clientId as an action of type EstablishTurnOrder. Then, for the given number of players, we expect an action of this type. We check that the incoming action is of this type, then we check that we don't see the same action coming multiple times from the same client. Finally, we add the received clientId to the turnOrder array.
And that's it - once this executes, all clients will end up with the same turnOrder array and will know whether it is their turn to act, or they should be waiting for another client to take a turn.
We can extend this implementation to also establish a shared prime: each client posts a prime, then the first one to arrive to others "wins" and becomes the shared prime.
We'll update our EstablishTurnOrderAction to include a prime:
to include a prime:
type SerializedPrime = string;
type EstablishTurnOrderAction = BaseAction & { prime: SerializedPrime };
We need to define a SerializedPrime (as a string) to work around the fact that we can't serialize BigInts using JSON.stringify(), which is what we're using to serialize actions.
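A round-trip sketch of what such serialization helpers might look like - these are hypothetical implementations; the toolkit's actual BigIntUtils functions may differ:

```typescript
type SerializedPrime = string;

// BigInt values do not survive JSON.stringify(), so we serialize them
// to a hex string (sufficient for non-negative values, which is all we
// need for primes)
function bigIntToString(n: bigint): SerializedPrime {
    return n.toString(16);
}

function stringToBigInt(s: SerializedPrime): bigint {
    return BigInt("0x" + s);
}
```

Any reversible text encoding works here; the only requirement is that both clients use the same one.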
We extend our context to also include the shared prime:
type EstablishTurnOrderContext = {
clientId: ClientId;
prime: bigint | undefined;
turnOrder: ClientId[];
};
Our state machine also gets updated:
function makeEstablishTurnOrderSequence(players: number) {
return sm.sequence([
sm.local(async (actionQueue: IQueue<EstablishTurnOrderAction>, context: EstablishTurnOrderContext) => {
await actionQueue.enqueue({
type: "EstablishTurnOrder",
clientId: context.clientId,
prime: BigIntUtils.bigIntToString(BigIntUtils.randPrime()),
});
}),
sm.repeat(sm.transition((action: EstablishTurnOrderAction, context: EstablishTurnOrderContext) => {
if (action.type !== "EstablishTurnOrder") {
throw new Error("Invalid action type");
}
if (context.turnOrder.length === 0) {
context.prime = BigIntUtils.stringToBigInt(action.prime);
}
if (context.turnOrder.find((id) => id === action.clientId)) {
throw new Error("Same client posted prime multiple times");
}
context.turnOrder.push(action.clientId);
}), players)
]);
}
The only changes are:
- Our local action now also posts a freshly generated prime, serialized to a string.
- If the turnOrder array is empty, meaning we just received the first action, we set the prime in the context.
With these changes, after we run this state machine we have both the turn order and a prime all clients agree on.
To make calling this easier, we provide a function to initialize the context:
function makeEstablishTurnOrderContext(
clientId: ClientId
): EstablishTurnOrderContext {
return {
clientId,
prime: undefined,
turnOrder: [],
};
}
Then putting it all together:
export async function establishTurnOrder(
players: number,
clientId: ClientId,
actionQueue: IQueue<BaseAction>
) {
const context = makeEstablishTurnOrderContext(clientId);
const establishTurnOrderSequence = makeEstablishTurnOrderSequence(players);
await sm.run(establishTurnOrderSequence, actionQueue, context);
return [context.prime!, context.turnOrder] as const;
}
We create a context, we create the state machine, then we run it. The function returns the shared prime and the turn order.
In this post we covered a few primitives or building blocks we can use for building games:
- Getting a Fluid ledger with the getLedger() function. The code for this is in the demos/transport package, in container.ts.
- Upgrading a transport to a SignedTransport, which signs outbound actions and verifies signatures of incoming actions. The code for this is in packages/primitives/upgradeTransport.ts.
- Establishing turn order and a shared prime. The code for this is in packages/primitives/establishTurnOrder.ts.
With the primitives out of the way, in the next post we'll look at the high-level modeling of a game using the toolkit.
For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we covered the state machine we use to implement game logic.
We now have all the pieces in place to look at a card shuffling algorithm. Shuffling cards in a game of Mental Poker is one of the key innovations for this type of zero-trust games. We went over the cryptography aspects of shuffling in Part 1.
Let's review the algorithm:
- Alice takes a deck of cards (an array), shuffles the deck, generates a secret key \(K_A\), and encrypts each card with \(K_A\).
- Alice hands the shuffled and encrypted deck to Bob. At this point, Bob doesn't know what order the cards are in (since Alice encrypted the cards in the shuffled deck).
- Bob takes the deck, shuffles it, generates a secret key \(K_B\), and encrypts each card with \(K_B\).
- Bob hands the deck to Alice. At this point, neither Alice nor Bob know what order the cards are in. Alice got the deck back reshuffled and re-encrypted by Bob, so she no longer knows where each card ended up. Bob reshuffled an encrypted deck, so he also doesn't know where each card is.
At this point the cards are shuffled. In order to play, Alice and Bob also need the capability to look at individual cards. In order to enable this, the following steps must happen:
- Alice decrypts the shuffled deck with her secret key \(K_A\). At this point she still doesn't know where each card is, as cards are still encrypted with \(K_B\).
- Alice generates a new set of secret keys, one for each card in the deck. Assuming a 52-card deck, she generates \(K_{A_1} ... K_{A_{52}}\) and encrypts each card in the deck with one of the keys.
- Alice hands the deck of cards to Bob. At this point, each card is encrypted by Bob's key, \(K_B\), and one of Alice's keys, \(K_{A_i}\).
- Bob decrypts the cards using his key \(K_B\). He still doesn't know where each card is, as now the cards are encrypted with Alice's keys.
- Bob generates another set of secret keys, \(K_{B_1} ... K_{B_{52}}\), and encrypts each card in the deck.
- Now each card in the deck is encrypted with a unique key that only Alice knows and a unique key only Bob knows.
If Alice wants to look at a card, she asks Bob for his key for that card. For example, if Alice draws the first card, encrypted with \(K_{A_1}\) and \(K_{B_1}\), she asks Bob for \(K_{B_1}\). If Bob sends her \(K_{B_1}\), she now has both keys to decrypt the card and look at it. Bob still can't decrypt it because he doesn't have \(K_{A_1}\).
This way, as long as both Alice and Bob agree that one of them is supposed to see a card, they exchange keys as needed to enable this.
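The reason this key exchange is all it takes comes down to the commutativity of SRA encryption: encrypting with two keys and decrypting in any order recovers the card. A toy illustration with deliberately tiny, insecure numbers - the prime and exponents below are made up for intuition only:

```typescript
// Modular exponentiation: base^exp mod m
function modPow(base: bigint, exp: bigint, m: bigint): bigint {
    let result = 1n;
    base %= m;
    while (exp > 0n) {
        if (exp & 1n) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1n;
    }
    return result;
}

const p = 2357n; // shared prime (toy size, completely insecure)
// Alice's and Bob's key pairs satisfy e * d ≡ 1 (mod p - 1)
const [eA, dA] = [3n, 1571n];
const [eB, dB] = [5n, 1885n];

const card = 42n;
// Both players encrypt; because exponents commute, order doesn't matter
const locked = modPow(modPow(card, eA, p), eB, p);
// Decrypting with both private exponents, in any order, recovers the card
const recovered = modPow(modPow(locked, dA, p), dB, p);
```

Here `recovered` equals the original `card`, even though neither key alone can decrypt `locked` - which is exactly why handing over a single per-card key reveals a card to one player only.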
While we covered the algorithm before, we didn't have the infrastructure in place to implement this. We now do.
We'll start by describing our shuffle actions. As we just saw in the above recap, we have 2 steps:
type ShuffleAction1 = BaseAction & { type: "Shuffle1"; deck: string[] };
type ShuffleAction2 = BaseAction & { type: "Shuffle2"; deck: string[] };
We only need to pass around the deck of cards (encrypted or not), so we extend
the BaseAction
type (which includes ClientId
and type
) to pin the type
and add the deck.
We need more data in the context though:
type ShuffleContext = {
clientId: string;
deck: string[];
imFirst: boolean;
keyProvider: KeyProvider;
commonKey?: SRAKeyPair;
privateKeys?: SRAKeyPair[];
};
We need to know our clientId
, whether we are first or second in the turn
order, we need a keyProvider
to generate encryption keys, a commonKey
(that's for the first encryption step) and privateKeys
(for the second
encryption step). We'll use the context later on, when we stitch everything
together. Before that, let's look at the basic shuffling functions.
First, we need a function that shuffles an array:
function shuffleArray<T>(arr: T[]): T[] {
let currentIndex = arr.length, randomIndex;
while (currentIndex > 0) {
randomIndex = Math.floor(Math.random() * currentIndex);
currentIndex--;
[arr[currentIndex], arr[randomIndex]] = [arr[randomIndex], arr[currentIndex]];
}
return arr;
};
We won't go into the details of this, as it's a generic shuffling function, not specific to Mental Poker, but a required piece.
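One caveat: Math.random() is not cryptographically secure, so in principle a determined opponent could try to predict a shuffle. A variant backed by a CSPRNG is sketched below using Node's crypto.randomInt - an assumption for illustration, not what the demo uses:

```typescript
import { randomInt } from "node:crypto";

// Fisher-Yates shuffle with a CSPRNG-backed index
function secureShuffleArray<T>(arr: T[]): T[] {
    for (let i = arr.length - 1; i > 0; i--) {
        const j = randomInt(i + 1); // uniform in [0, i], cryptographically random
        [arr[i], arr[j]] = [arr[j], arr[i]];
    }
    return arr;
}
```

In the Mental Poker protocol the other player re-shuffles the encrypted deck anyway, so a single weak shuffle is not fatal - but using a strong source of randomness on each client costs little.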
Let's look at the two shuffling steps next. First step, in which we shuffle and encrypt all cards with the same key:
async function shuffle1(keyProvider: KeyProvider, deck: string[]): Promise<[SRAKeyPair, string[]]> {
const commonKey = keyProvider.make();
deck = shuffleArray(deck.map((card) => SRA.encryptString(card, commonKey)));
return [commonKey, deck];
};
The shuffle1()
function takes a keyProvider
, a deck
, and returns a
promise of a shuffled deck plus the key used to encrypt it.
The function is pretty straight-forward: we generate a new key, we encrypt each card with it, then we shuffle the deck. We return the key and the now shuffled and encrypted deck.
Both players need to perform the first step, after which both Alice and Bob have encrypted the deck with \(K_A\) and \(K_B\) respectively, so neither knows the order of the cards.
The next step, according to our algorithm, is for each player to decrypt the deck with their key and encrypt each card individually with a unique key:
async function shuffle2(commonKey: SRAKeyPair, keyProvider: KeyProvider, deck: string[]): Promise<[SRAKeyPair[], string[]]> {
const privateKeys: SRAKeyPair[] = [];
deck = deck.map((card) => SRA.decryptString(card, commonKey));
for (let i = 0; i < deck.length; i++) {
privateKeys.push(keyProvider.make());
deck[i] = SRA.encryptString(deck[i], privateKeys[i]);
}
return [privateKeys, deck];
}
shuffle2()
is also fairly straight-forward. It takes the commonKey
from
step 1, a keyProvider
, and the encrypted deck
.
First, it decrypts all cards using the commonKey
(note the cards are still
encrypted by the other player). Next, it uses the keyProvider
to generate
a key for each card, and encrypts each card with the key. The function returns
the private keys generated, and the re-encrypted deck.
We now have all the basics in place. Here's how we put it all together:
Here is the state machine that describes the shuffling steps:
function makeShuffleSequence() {
return sm.sequence([
sm.local(async (queue: IQueue<ShuffleAction1>, context: ShuffleContext) => {
if (!context.imFirst) {
return;
}
[context.commonKey, context.deck] = await shuffle1(context.keyProvider, context.deck);
await queue.enqueue({
type: "Shuffle1",
clientId: context.clientId,
deck: context.deck,
});
}),
sm.transition(async (action: ShuffleAction1, context: ShuffleContext) => {
if (action.type !== "Shuffle1") {
throw new Error("Invalid action type");
}
context.deck = action.deck;
}),
sm.local(async (queue: IQueue<ShuffleAction1>, context: ShuffleContext) => {
if (context.imFirst) {
return;
}
[context.commonKey, context.deck] = await shuffle1(context.keyProvider, context.deck);
await queue.enqueue({
type: "Shuffle1",
clientId: context.clientId,
deck: context.deck,
});
}),
sm.transition(async (action: ShuffleAction1, context: ShuffleContext) => {
if (action.type !== "Shuffle1") {
throw new Error("Invalid action type");
}
context.deck = action.deck;
}),
sm.local(async (queue: IQueue<ShuffleAction2>, context: ShuffleContext) => {
if (!context.imFirst) {
return;
}
[context.privateKeys, context.deck] = await shuffle2(context.commonKey!, context.keyProvider, context.deck);
await queue.enqueue({
type: "Shuffle2",
clientId: context.clientId,
deck: context.deck,
});
}),
sm.transition(async (action: ShuffleAction2, context: ShuffleContext) => {
if (action.type !== "Shuffle2") {
throw new Error("Invalid action type");
}
context.deck = action.deck;
}),
sm.local(async (queue: IQueue<ShuffleAction2>, context: ShuffleContext) => {
if (context.imFirst) {
return;
}
[context.privateKeys, context.deck] = await shuffle2(context.commonKey!, context.keyProvider, context.deck);
await queue.enqueue({
type: "Shuffle2",
clientId: context.clientId,
deck: context.deck,
});
}),
sm.transition(async (action: ShuffleAction2, context: ShuffleContext) => {
if (action.type !== "Shuffle2") {
throw new Error("Invalid action type");
}
context.deck = action.deck;
})
]);
}
Note we are limiting this to a 2-player game, though we can easily generalize to more players if needed.
This is a longer function so let's break it down:
- If we are the first player, we call shuffle1() and post the encrypted deck as a Shuffle1 action.
- We wait for a Shuffle1 action to arrive - either the one we just posted (if imFirst is true) or incoming from the other player. We store the encrypted and shuffled deck.
- We call shuffle1() if we are not the first player - if we did not shuffle first, then it is our turn to shuffle now. We post another Shuffle1 action.
- We wait for the second Shuffle1 action to arrive and update the deck.
At this point, both players performed the first step of the shuffle, so the deck is encrypted with \(K_A\) and \(K_B\) and neither player knows the order of the cards.
We move on to the second step of the shuffle, where each player calls shuffle2() to decrypt the deck and re-encrypt each individual card. Again, depending on whether we are first or not, we take action or wait:
- If imFirst is true, we call shuffle2() and post a Shuffle2 action.
- We wait for a Shuffle2 action and update the deck.
- If imFirst is not true, we call shuffle2() and post another Shuffle2 action.
- We wait for the second Shuffle2 action and update the deck.
A helper function to run this state machine given an async queue:
async function shuffle(
clientId: string,
turnOrder: string[],
sharedPrime: bigint,
deck: string[],
actionQueue: IQueue<BaseAction>,
keySize: number = 128 // Key size, defaults to 128 bytes
): Promise<[SRAKeyPair[], string[]]> {
if (turnOrder.length !== 2) {
throw new Error("Shuffle only implemented for exactly two players");
}
const context: ShuffleContext = {
clientId,
deck,
imFirst: clientId === turnOrder[0],
keyProvider: new KeyProvider(sharedPrime, keySize)
};
const shuffleSequence = makeShuffleSequence();
await sm.run(shuffleSequence, actionQueue, context);
return [context.privateKeys!, context.deck];
}
We need our clientId
, the turn order (whether we go first or not), a shared
large prime (to seed other encryption keys), an unshuffled deck, a queue, and,
optionally, a keySize
.
From the input, we create a ShuffleContext
with the required data, then we
generate the state machine by calling the function we discussed previously,
and we run the state machine using the given actionQueue
and generated
context
.
We return the private keys with which we encrypted each individual card, and the shuffled and encrypted deck.
Shuffling a full deck of 52 cards with large enough key sizes gets noticeably slow. Note that we need to generate an encryption key for each card, which involves searching for large prime numbers. The more secure we want the encryption to be, the larger the number of bits we want in the key, the longer it takes to find a key.
This can be mitigated with some loading/progress UI while shuffling. For the
demo discard game
in mental-poker-toolkit
, I used a smaller deck (only cards from 9
to A
)
and a smaller key size (64 bits).
When implementing a game, it might be a good idea to start generating encryption keys asynchronously as soon as possible - note though that the players need to agree on a shared large prime before key generation can begin.
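One way to sketch that idea: once the shared prime is agreed on, kick off key generation in the background and await the keys at shuffle time. Everything below - pregenerateKeys(), the KeyProviderLike shape, the SRAKeyPair fields - is hypothetical, not part of the toolkit:

```typescript
type SRAKeyPair = { enc: bigint; dec: bigint }; // assumed shape, for illustration

interface KeyProviderLike {
    make(): SRAKeyPair;
}

// Kick off key generation early; each make() call is deferred to its own
// task so the UI thread gets a chance to breathe between prime searches
function pregenerateKeys(
    provider: KeyProviderLike,
    count: number
): Promise<SRAKeyPair[]> {
    const keys: Promise<SRAKeyPair>[] = [];
    for (let i = 0; i < count; i++) {
        keys.push(
            new Promise((resolve) => setTimeout(() => resolve(provider.make()), 0))
        );
    }
    return Promise.all(keys);
}
```

By the time the shuffle starts, many (or all) of the per-card keys may already be sitting in the resolved promises, hiding the prime-search latency behind the rest of the game setup.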
In this post we looked at an implementation of card shuffling, built around the two shuffling steps, shuffle1() and shuffle2(). The Mental Poker Toolkit is here.
This post covered card shuffling, which is implemented in the primitives
package in shuffle.ts.
For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we covered actions and an async queue implementation.
In this post, we'll finally look at the infrastructure on top of which we'll model games. The type of games we're considering can all be modeled as state machines^{1}. The challenge is we need a generic enough framework that works for any game, so let's consider what they all have in common.
We can't tell what the exact states of a game are, as they depend on the specific game. But, in general, game play implies transitioning from one state to another.
In some cases, an action originates on our client. For example: we pick between rock, paper, or scissors; we want to draw a card etc. This means we need to run some logic on our client, then send an Action over our transport to other clients.
To keep things generic and unopinionated, the minimal interface for this is a
function that takes an action queue and a context
.
type LocalTransition<TAction extends BaseAction, TContext> = (
actionQueue: IQueue<TAction>,
context: TContext
) => void | Promise<void>;
We covered the queue in the previous post. We need this in a local transition because we will run some code then, in most cases, we'll want to enqueue an action and send it to other players. We'll look at an example of this later on in this post.
The context can be anything - this enables the game to pass through whatever data the function needs. Our state machine implementation doesn't care about what that data is; this is just the mechanism to make it available to the code in the function.
The function can return either void
or a Promise<void>
in case it needs to
be async.
In other cases, an action arrives over the transport. This is an action that was sent either by another player, or by us and we receive it back from the server after it has been sequenced^{2}.
In this case, our interface is a function that takes the incoming Action
and
a context.
type Transition<TAction extends BaseAction, TContext> = (
action: TAction,
context: TContext
) => void | Promise<void>;
In this case, we don't necessarily need access to the queue, since we won't
enqueue an action, rather we're processing one. The context
is, again, up to
the consumer of this API.
The function similarly returns void
or a Promise<void>
in case it needs to
be async.
Finally, we need an abstraction over both LocalTransition
and Transition
so
when we specify our state machine we can treat them the same way. We'll use
RunnableTransition
for this:
type RunnableTransition<TContext> = (
actionQueue: IQueue<BaseAction>,
context: TContext
) => Promise<void>;
We expect users of our library to write code in terms of local transitions
(LocalTransition
) and remote transitions (Transition
). This type is meant
to be used internally. Note we are doing some type erasure here as we're going from a generic IQueue<TAction> to an IQueue<BaseAction>. That's because we need to
work with the queue in our library code, but the exact Action
types depend on
the game.
For local transitions, we simply pass through the actionQueue
. For remote
transitions, we dequeue an action and pass that. We'll see how to do this next.
We're also normalizing return to be Promise<void>
regardless of whether the
transition function originally returned void
or Promise<void>
.
Our state machine is implemented as a set of functions. First, we have a few
factory functions. local()
creates a RunnableTransition
from a
LocalTransition
:
function local<TAction extends BaseAction, TContext>(
transition: LocalTransition<TAction, TContext>
): RunnableTransition<TContext> {
return async (queue: IQueue<BaseAction>, context: TContext) =>
await Promise.resolve(
transition(queue as IQueue<TAction>, context)
);
}
We call Promise.resolve()
to get a Promise
regardless of whether the given
transition
is a synchronous or asynchronous function.
transition() converts a remote transition into a RunnableTransition:
function transition<TAction extends BaseAction, TContext>(
transition: Transition<TAction, TContext>
): RunnableTransition<TContext> {
return async (queue: IQueue<BaseAction>, context: TContext) => {
const action = await queue.dequeue();
await Promise.resolve(transition(action as TAction, context));
};
}
Here, we dequeue an action, then pass it to the given transition.
In many cases, we expect multiple players to take the same action, for example
each player picks between rock, paper, or scissors - in this case, we will
expect one remote action coming in from each player (including us), of the same
type. Most times we want to treat these actions the same way, which means we
want to run the same Transition
function for each. The repeat()
function
takes a RunnableTransition
and repeats it a given number of times:
function repeat<TContext>(
transition: RunnableTransition<TContext>,
times: number
): RunnableTransition<TContext>[] {
return Array(times).fill(transition);
}
This gives us an array of RunnableTransitions
we can execute in sequence.
Finally, we might want to combine the output of calling local()
with the
output of calling repeat()
into a longer sequence of RunnableTransitions
we
can run - the first function gives us a RunnableTransition
, the second
function gives us an array of RunnableTransition
s. To address this, we
provide sequence
:
function sequence<TContext>(
transitions: (
| RunnableTransition<TContext>
| RunnableTransition<TContext>[]
)[]
): RunnableTransition<TContext>[] {
return transitions.flat();
}
This function takes an array whose elements are either RunnableTransitions or arrays of RunnableTransitions, and calls flat() on it to flatten the nested arrays into a single, flat list.
Once we have a sequence of transitions, we can run them using run()
:
async function run<TContext>(
sequence: RunnableTransition<TContext>[],
queue: IQueue<BaseAction>,
context: TContext
) {
for (const transition of sequence) {
await transition(queue, context);
}
}
We simply execute each RunnableTransition
in turn.
Understandably, this has all been abstract. Let's now see how we can use these functions to model interactions.
Let's look at a simple example: key exchange. In order to secure our transport, we want each client to share a public key, then sign each subsequent message with their corresponding private key.
We looked at securing the transport layer in this post. We haven't discussed the key negotiation though.
Let's create the following protocol: as each client joins the game, they post
a public key. For an N
player game, each client should expect N
remote
transitions consisting of clients publishing public keys. Once all of these
were processed, we should have all public keys for all clients and can create
a SignedTransport
.
Let's sketch out the state machine:
function makeKeyExchangeSequence(players: number) {
return sm.sequence([
sm.local(async (actionQueue: IQueue<KeyExchangeAction>, context: CryptoContext) => {
// Post public key ...
}),
sm.repeat(sm.transition((action: KeyExchangeAction, context: CryptoContext) => {
// Store incoming public key ...
}), players)
]);
}
Note we create a LocalTransition
in which we post our own public key, and we
repeat the remote transition handling an incoming public key (remember with
Fluid we expect the server to also send us back whatever we post).
Clients can join the game at different times, so we don't know in what order the keys will come in but, luckily, each Action has a clientId so we know whose key it is.
We'll look at the implementation of the transitions, but first let's see what the KeyExchangeAction and CryptoContext types are:
type KeyExchangeAction = {
clientId: ClientId;
type: "KeyExchange";
publicKey: Key;
};
type CryptoContext = {
clientId: ClientId;
me: PublicPrivateKeyPair;
keyStore: KeyStore;
};
KeyExchange
is an action consisting of clientId
and publicKey
, with the
type
set to "KeyExchange"
.
CryptoContext
is the context needed by the transitions implementing the key
exchange - that is we need to know our own clientId
, our public-private
key pair, and we need a keyStore
, which is a map of clientId
to public key.
We looked at the KeyStore
and the other key types in a previous blog post, but
here they are again for reference:
type Key = string;
type PublicPrivateKeyPair = {
publicKey: Key;
privateKey: Key;
};
type KeyStore = Map<ClientId, Key>;
With these in place, let's look at the implementation of the transitions:
function makeKeyExchangeSequence(players: number) {
return sm.sequence([
sm.local(
async (
actionQueue: IQueue<KeyExchangeAction>,
context: CryptoContext
) => {
// Post public key
await actionQueue.enqueue({
type: "KeyExchange",
clientId: context.clientId,
publicKey: context.me.publicKey,
});
}
),
sm.repeat(
sm.transition(
(action: KeyExchangeAction, context: CryptoContext) => {
// This should be a KeyExchangeAction
if (action.type !== "KeyExchange") {
throw new Error("Invalid action type");
}
// Protocol expects clients to post an ID
if (action.clientId === undefined) {
throw new Error("Expected client ID");
}
// Protocol expects each client to only post once and to have a unique ID
if (context.keyStore.has(action.clientId)) {
throw new Error(
"Same client posted key multiple times"
);
}
context.keyStore.set(action.clientId, action.publicKey);
}
),
players
),
]);
}
sm
stands for state machine
. The functions described above live in a
StateMachine
namespace aliased to sm
.
Our local transition is simple: we enqueue a KeyExchangeAction
, sending our
clientId
and publicKey
from the CryptoContext
.
When a remote action comes in, we perform the required validations: the action should be a KeyExchangeAction, it should include a clientId, and the same client should not have posted a key before. Finally, we store the clientId and publicKey.
The end-to-end implementation for key exchange, relying on the state machine, is here:
async function makeCryptoContext(clientId: ClientId): Promise<CryptoContext> {
return {
clientId,
me: await Signing.generatePublicPrivateKeyPair(),
keyStore: new Map<ClientId, Key>(),
};
}
async function keyExchange(
players: number,
clientId: ClientId,
actionQueue: IQueue<BaseAction>
) {
const context = await makeCryptoContext(clientId);
const keyExchangeSequence = makeKeyExchangeSequence(players);
await sm.run(keyExchangeSequence, actionQueue, context);
return [context.me, context.keyStore] as const;
}
makeCryptoContext()
is a helper function to initialize a CryptoContext
instance - it takes a clientId
, generates a public-private key pair, and
initializes an empty key store.
keyExchange()
calls the functions we defined previously to get a
CryptoContext
, the key exchange sequence, and calls the state machine's
run()
to execute the key exchange.
Once done, it returns the client's public-private key pair, and the key store.
From a caller's perspective, the protocol handling key exchange is now
abstracted away behind the keyExchange()
function. The caller doesn't have to
worry about the mechanics of exchanging keys, rather can just call this and get
back all the required data to create a SignedTransport
.
As a second example, we'll sketch out the state machine for a game of rock-paper-scissors. We won't dive into all the implementation details. At a high level, here is how we play a game of rock-paper-scissors: first, each player posts their encrypted selection; then, once both encrypted selections are on the ledger, each player posts their encryption key so the selections can be revealed.
This two-step process ensures players are committed to a selection and can't cheat by observing what the other player picked and picking afterwards.
The state machine for this game is:
sm.sequence([
sm.local(async (queue, context) => {
// Post our play action
}),
sm.repeat(sm.transition(async (action, context) => {
// Both player and opponent need to post their encrypted selection
}), 2),
sm.local(async (queue, context) => {
// Post our reveal action
}),
sm.repeat(sm.transition(async (reveal: RevealAction, context: RootStore) => {
// Both player and opponent need to reveal their selection
}), 2)
]);
We won't fill in the functions in this post but this gives you an idea of how we can model a more complex set of steps using our library.
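To see the mechanics end to end, here's a toy, self-contained version of the state machine running a single local and a single remote transition. The types and helpers below are simplified re-declarations, not imports from the toolkit, and the in-memory queue stands in for the real Fluid-backed transport:

```typescript
type BaseAction = { clientId: string; type: unknown };

interface IQueue<T extends BaseAction> {
    enqueue(value: T): Promise<void>;
    dequeue(): Promise<T>;
}

// Minimal in-memory queue; enqueued actions are immediately available
// to dequeue, mimicking the server echoing our own actions back to us.
class InMemoryQueue implements IQueue<BaseAction> {
    private items: BaseAction[] = [];
    async enqueue(value: BaseAction) { this.items.push(value); }
    async dequeue(): Promise<BaseAction> { return this.items.shift()!; }
}

type RunnableTransition<TContext> = (
    queue: IQueue<BaseAction>,
    context: TContext
) => Promise<void>;

function local<TContext>(
    f: (q: IQueue<BaseAction>, c: TContext) => void | Promise<void>
): RunnableTransition<TContext> {
    return async (q, c) => { await Promise.resolve(f(q, c)); };
}

function transition<TContext>(
    f: (a: BaseAction, c: TContext) => void | Promise<void>
): RunnableTransition<TContext> {
    return async (q, c) => { await Promise.resolve(f(await q.dequeue(), c)); };
}

async function run<TContext>(
    sequence: RunnableTransition<TContext>[],
    queue: IQueue<BaseAction>,
    context: TContext
) {
    for (const t of sequence) await t(queue, context);
}

// The local transition posts an action; the remote transition consumes
// it back and records the sender in the context.
const seen: string[] = [];
const steps = [
    local<string[]>((q) => q.enqueue({ clientId: "me", type: "Hello" })),
    transition<string[]>((a, c) => { c.push(a.clientId); }),
];
const done = run(steps, new InMemoryQueue(), seen);
```

After the run completes, seen contains the clientId of the posted action, which is exactly the local-post/remote-consume loop the key exchange above relies on.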
In this post we looked at a state machine we can use to implement games: a game consists of local and remote transitions over game-specific Action types, and has its own relevant context. RunnableTransition is a common type that can wrap local or remote transitions. The Mental Poker Toolkit is here. This post covered the state-machine package; the key exchange is implemented in the primitives package.
Sequenced is a Fluid Framework term. Clients send messages to the Fluid relay service, which orders them in the order they came in and broadcasts them to all clients. This is to ensure all clients eventually see all the messages sent in the same order. ↩
For an overview on Mental Poker, see Mental Poker Part 0: An Overview. Other articles in this series: https://vladris.com/writings/index.html#mental-poker. In the previous post in the series we covered the transport.
As I was building up the library and looking at state machines that would run
turns in a game, I realized an async queue would come in handy. The challenge
with the raw ITransport
interface built on top of the Fluid ledger is that if
you are not the first client to join a session, you end up with a set of ops
that already exist on the ledger. You need a way to consume both the ops that
were already sequenced and new incoming ops. An async interface is also easier
to consume than callbacks.
Before diving into that though, let's talk about actions.
As a reminder, op is the Fluid Framework term for data being sent/received. In
Mental Poker we use actions. All actions should be subtypes of BaseAction
:
export type ClientId = string;
export type BaseAction = {
clientId: ClientId;
type: unknown;
};
Every action should have a clientId
showing which client it came from, and a
type
.
For example, here's how we would model a game of Rock/Paper/Scissors: first, each player posts their encrypted selection; then each player posts the key used to encrypt it, so both selections can be revealed.
We model the game in these two steps so regardless of which player moves first, the player choices are revealed after they have been put on the ledger. If a player would simply post their unencrypted selection, the other player might cheat by looking at it before posting their own.
I will cover the Rock/Paper/Scissors implementation in detail in a future post; for now, let's just go over the actions:
export type PlayAction = {
clientId: ClientId;
type: "PlayAction";
encryptedSelection: EncryptedSelection;
};
export type RevealAction = {
clientId: ClientId;
type: "RevealAction";
key: SerializedSRAKeyPair;
};
export type Action = PlayAction | RevealAction;
The two actions described above are modeled as PlayAction
and RevealAction
.
Both of these have a clientId
and type
, thus are subtypes of BaseAction
.
Finally, the Action
type represents all possible actions in the game.
This becomes relevant as we move higher in the stack of the Mental Poker
library. Once we start encoding some of the game semantics, we require generic
types to extend BaseAction
. This is what happens with the async queue.
As I mentioned at the beginning of the article, queues aim to provide a nicer API over the transport. The interface is very simple:
export interface IQueue<T extends BaseAction> {
enqueue(value: T): Promise<void>;
dequeue(): Promise<T>;
}
For any type T
extending BaseAction
, we can enqueue()
a value and we can
dequeue()
a value. Both of the operations are asynchronous.
I'll show the full implementation, then go over the details:
export class ActionQueue<T extends BaseAction> implements IQueue<T> {
private queue: T[] = [];
constructor(
private readonly transport: ITransport<T>,
preseed: boolean = false
) {
transport.on("actionPosted", (value) => {
this.queue.push(value);
});
if (preseed) {
for (const value of transport.getActions()) {
this.queue.push(value);
}
}
}
async enqueue(value: T) {
await this.transport.postAction(value);
}
async dequeue(): Promise<T> {
const result = this.queue.shift();
if (result) {
return Promise.resolve(result);
}
return new Promise<T>((resolve) => {
this.transport.once("actionPosted", async () => {
resolve(await this.dequeue());
});
});
}
}
The implementation maintains an array of T
s (actions). The constructor takes a
transport
argument of type ITransport
and preseed
flag:
constructor(
private readonly transport: ITransport<T>,
preseed: boolean = false
) {
transport.on("actionPosted", (value) => {
this.queue.push(value);
});
if (preseed) {
for (const value of transport.getActions()) {
this.queue.push(value);
}
}
}
/* ... */
The queue starts listening to the actionPosted
event and whenever we have an
incoming value, we push it to the internal queue. If preseed
is true
, we
also push all actions already posted to the queue.
The reason we make this optional is that we might end up using multiple queues in a game implementation, but we only want to consume the actions posted on the ledger before we joined the session once. After we are "up to speed", new incoming actions fire events which we can consume in real time. So we would usually create our first queue with preseed set to true and subsequent queues with preseed set to false.
Enqueuing a value is trivial - we leverage the transport's postAction
API:
/* ... */
async enqueue(value: T) {
await this.transport.postAction(value);
}
/* ... */
Dequeuing is a bit more interesting:
/* ... */
async dequeue(): Promise<T> {
const result = this.queue.shift();
if (result) {
return Promise.resolve(result);
}
return new Promise<T>((resolve) => {
this.transport.once("actionPosted", async () => {
resolve(await this.dequeue());
});
});
}
/* ... */
First, we call shift()
on the queue. This either returns a value or
undefined
if the queue is empty.
If we do get a value, we return a resolved promise right away.
If we don't have a value, we add a one-time listener to the actionPosted
event. When a new action is posted, the underlying transport will fire the
event. Since event listeners are called in the order they subscribed, we are
guaranteed the listener we added in the constructor fires first, and adds the
value to queue
. We resolve the promise by recursively calling dequeue()
and
awaiting the response.
The reason we do this is that we might have multiple callers to dequeue() holding on to promises. In this case, we don't want to resolve all of them with the incoming value, rather just the first one. The first recursive call to dequeue() should grab the value from the internal queue and return it right away, while other recursive callers would end up awaiting again until a new value comes in. There's probably a more efficient non-recursive implementation, but for our specific use case (games), we don't expect many cases where we have multiple dequeues pending.
There are two main reasons for using this queue rather than relying directly on the underlying transport.
First, the underlying transport can have a set of actions (messages) that
already arrived on the client (which we would retrieve with the getActions()
method), and some which arrive in real time (which would fire events). The
queue gives us a unified way to consume both, by calling await dequeue()
.
Besides a unified interface, we expect multiple spots in the code to wait for
an incoming action. This depends on the game implementation, but usually at
different game states we expect different messages to come in. This is harder
to achieve waiting for event callbacks and much easier to do via the same
await dequeue()
call.
In this post we looked at actions, the key building blocks of Mental Poker games, and an async queue which provides a clean abstraction over the underlying transport.
The code covered in this post is available on GitHub in the mental-poker-toolkit repo.
BaseAction and the ITransport and IQueue interfaces are part of the core types package, packages/types.
ActionQueue is implemented under packages/action-queue.
I always have fun with Advent of Code every December, and last year I did write a blog post covering some of the more interesting problems I worked through. I'll continue the tradition this year.
I'll repeat my disclaimer from last time:
Disclaimer on my solutions
I use Python because I find it easiest for this type of coding. I treat solving these as a write-only exercise. I do it for the problem-solving bit, so I don't comment the code & once I find the solution I consider it done - I don't revisit and try to optimize even though sometimes I strongly feel like there is a better solution. I don't even share code between part 1 and part 2 - once part 1 is solved, I copy/paste the solution and change it to solve part 2, so each can be run independently. I also rarely use libraries, and when I do it's some standard ones like re, itertools, or math. The code has no comments and is littered with magic numbers and strange variable names. This is not how I usually code, rather my decadent holiday indulgence. I wasn't thinking I would end up writing a blog post discussing my solutions so I would like to apologize for the code being hard to read.
All my solutions are on my GitHub here.
This time around, I did use GitHub Copilot, with mixed results. In general, it mostly helped with tedious work, like implementing the same thing to work in different directions - there are problems that require we do something while heading north, then same thing while heading east etc. I did also observe it produce buggy code that I had to manually edit.
I'll skip over the first few days as they tend to be very easy.
Problem statement is here.
This is an easy problem, I just want to call out a shortcut: for part 2, the exact same algorithm as in part 1 works if you first reverse the input. This was a neat discovery that saved me a bunch of work.
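To illustrate the trick (a reconstruction of the idea, not my exact code - extrapolate is a hypothetical name):

```python
def extrapolate(seq):
    # Part 1: repeatedly take differences until all zeros, then sum the
    # last elements back up to extrapolate the next value.
    if all(v == 0 for v in seq):
        return 0
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    return seq[-1] + extrapolate(diffs)

# Part 1: next value of the sequence.
print(extrapolate([0, 3, 6, 9, 12, 15]))            # 18
# Part 2: previous value - same algorithm on the reversed input.
print(extrapolate([10, 13, 16, 21, 30, 45][::-1]))  # 5
```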
Problem statement is here.
Part 1 was again very straightforward. I found part 2 a bit more interesting,
especially the fact that we can determine whether a tile is inside
or
outside
our loop by only looking at a single row (or column). We always start
outside
, then scan each tile. If we hit a |
, then we toggle from outside
to inside
and vice-versa. If we hit an L or an F, we continue while we're on a - (these are all parts of our loop), and we stop on the 7 or J. If we started on L and ended on J, or started on F and ended on 7 - meaning the pipe bends and turns back the way we came - we don't change our state. On the
other hand, if the pipe goes down
from L
to 7
or up
from F
to J
,
then we toggle outside
/inside
. For each non-pipe tile, if we're inside
, we
count it. Maybe this is obvious but it took me a bit to figure it out.
def scan_line(ln):
total, i, inside, start = 0, -1, False, None
while i < len(grid[0]) - 1:
i += 1
if (ln, i) not in visited:
if inside:
total += 1
else:
if grid[ln][i] == '|':
inside = not inside
continue
# grid[ln][i] in 'LF'
start = grid[ln][i]
i += 1
while grid[ln][i] == '-':
i += 1
if start == 'L' and grid[ln][i] == '7' or \
start == 'F' and grid[ln][i] == 'J':
inside = not inside
return total
In the code above, visited
tracks pipe segments (as opposed to tiles that are
not part of the pipe).
Problem statement is here.
Day 11 was easy, so not much to discuss. Use Manhattan distance for part 1 and
in part 2, just add 999999
for every row or column crossed that doesn't
contain any galaxies.
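In code, the part-2 distance might look like this (a sketch, not my actual solution; distance, empty_rows, and empty_cols are hypothetical names, with the galaxy-free rows and columns precomputed from the grid):

```python
def distance(a, b, empty_rows, empty_cols, factor):
    # Manhattan distance plus (factor - 1) extra steps for every empty
    # row or column crossed between the two galaxies.
    (r1, c1), (r2, c2) = a, b
    d = abs(r1 - r2) + abs(c1 - c2)
    d += (factor - 1) * sum(1 for r in empty_rows if min(r1, r2) < r < max(r1, r2))
    d += (factor - 1) * sum(1 for c in empty_cols if min(c1, c2) < c < max(c1, c2))
    return d

# factor=2 doubles empty rows/columns (part 1); part 2 uses factor=1000000.
print(distance((0, 0), (0, 3), [], [1, 2], 2))   # 5
print(distance((0, 0), (3, 0), [1, 2], [], 10))  # 21
```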
Problem statement is here.
Part 1 was very easy.
Part 2 was a bit harder because just trying out every combination takes forever
to run. I initially tried to do something more clever around deciding when to
turn a ?
into #
or .
depending on what's around it, where we are in the
sequence, etc. But ultimately it turns out just adding memoization made the
combinatorial approach run very fast.
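Here's a sketch of what the memoized count can look like (a reconstruction to show the idea, not my exact code; count is a hypothetical name, with functools.cache doing the memoization over the (pattern, groups) arguments):

```python
from functools import cache

@cache
def count(pattern, groups):
    # groups must be a tuple so it's hashable for the cache.
    # No groups left to place: valid iff no damaged spring ('#') remains.
    if not groups:
        return 0 if "#" in pattern else 1
    # Not enough room left for the remaining groups plus separators.
    if len(pattern) < sum(groups) + len(groups) - 1:
        return 0
    total = 0
    # Option 1: treat the first cell as operational and skip it.
    if pattern[0] in ".?":
        total += count(pattern[1:], groups)
    # Option 2: place the first group right here, followed by a gap.
    g = groups[0]
    if "." not in pattern[:g] and (len(pattern) == g or pattern[g] != "#"):
        total += count(pattern[g + 1:], groups[1:])
    return total

print(count("???.###", (1, 1, 3)))         # 1
print(count(".??..??...?##.", (1, 1, 3)))  # 4
```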
Problem statement is here.
This was a very easy one, so I won't cover it.
Problem statement is here.
This was easy but part 2 was tedious, having to implement tilt
functions for
various directions. This is where Copilot saved me a bunch of typing.
Once we have the tilt
functions, we can implement a cycle
function that
tilts things north, then west, then south, then east. Finally, we need a bit of
math to figure out the final position: we save the state of the grid after each
cycle and as soon as we find a configuration we encountered before, it means we
found our cycle. Based on this, we know how many steps we have before the cycle,
what the length of the cycle is, so we can compute the state after 1000000000
cycles:
pos = []
while (state := cycle()) not in pos:
pos.append(state)
lead, loop = pos.index(state), len(pos) - pos.index(state)
d = (1000000000 - lead) % loop
With this, we need to count the load of the north support beams for the grid we
have at pos[lead + d - 1]
.
Problem statement is here.
Another very easy one that I won't cover.
Problem statement is here.
This one was also easy and tedious, as we have to handle the different types of reflections. Another one where Copilot saved me a lot of typing.
Problem statement is here.
This was a fairly straightforward depth-first search, where we keep a cache of how much heat loss we have up to a certain point. The one interesting complication is that we can only move forward 3 times. In the original implementation, I keyed the cache on grid coordinates + direction we're going in + how many steps we already took in that direction. This worked in reasonable time.
In part 2, we now have to move at least 4 steps in one direction and at most 10. The cache I used in part 1 doesn't work that well anymore. On the other hand, I realized that rather than keeping track of direction and how many steps we took in that direction so far, I can model this differently: we are moving either horizontally or vertically. If we're at some point and moving horizontally, we can expand our search to all destination points (from 4 to 10 away horizontally or vertically) and flip the direction. For example, if we just moved horizontally to the right, we won't move further to the right as we already covered all those cases, and we won't move back left as the crucible can't turn 180 degrees. That means the only possible directions we can take are up or down in this case, meaning since we just moved horizontally, we now have to move vertically.
This makes our cache much smaller: our key is the coordinates of the cell and the direction we were moving in. This also makes the depth-first search complete very fast.
best, end = {}, 1000000
def search(x, y, d, p):
global end
if p >= end:
return
if x == len(grid) - 1 and y == len(grid[0]) - 1:
if p < end:
end = p
return
if (x, y, d) in best and best[(x, y, d)] <= p:
return
best[(x, y, d)] = p
if d != 'H':
if x + 3 < len(grid):
pxr = p + grid[x + 1][y] + grid[x + 2][y] + grid[x + 3][y]
for i in range(4, 11):
if x + i < len(grid):
pxr += grid[x + i][y]
search(x + i, y, 'H', pxr)
if x - 3 >= 0:
pxl = p + grid[x - 1][y] + grid[x - 2][y] + grid[x - 3][y]
for i in range(4, 11):
if x - i >= 0:
pxl += grid[x - i][y]
search(x - i, y, 'H', pxl)
if d != 'V':
if y + 3 < len(grid[0]):
pyd = p + grid[x][y + 1] + grid[x][y + 2] + grid[x][y + 3]
for i in range(4, 11):
if y + i < len(grid[0]):
pyd += grid[x][y + i]
search(x, y + i, 'V', pyd)
if y - 3 >= 0:
pyu = p + grid[x][y - 1] + grid[x][y - 2] + grid[x][y - 3]
for i in range(4, 11):
if y - i >= 0:
pyu += grid[x][y - i]
search(x, y - i, 'V', pyu)
I realized this approach actually applies well to part 1 too, and retrofitted it there. The only difference is instead of expanding to the cells +4 to +10 in a direction, we expand to the cells +1 to +3.
Problem statement is here.
The first part is easy - we plot the input on a grid, then flood fill to find the area.
In the below code, digs is the input, processed as a list of (direction, number of steps) tuples:
x, y, grid = 0, 0, {(0, 0)}
for dig in digs:
match dig[0]:
case 'U':
for i in range(dig[1]):
y -= 1
grid.add((x, y))
case 'R':
for i in range(dig[1]):
x += 1
grid.add((x, y))
case 'D':
for i in range(dig[1]):
y += 1
grid.add((x, y))
case 'L':
for i in range(dig[1]):
x -= 1
grid.add((x, y))
x, y = min([x for x, _ in grid]), min([y for _, y in grid])
while (x, y) not in grid:
y += 1
queue = [(x + 1, y + 1)]
while queue:
x, y = queue.pop(0)
if (x, y) in grid:
continue
grid.add((x, y))
queue += [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
print(len(grid))
Part 2 is trickier, as the numbers are way larger and the same flood fill
algorithm won't work. My approach was to divide the area into rectangles: as we
process all movements, we end up with a set of (x, y)
tuples of points where
our line changes direction. If we sort all the x
coordinates and all y
coordinates independently, we end up with a grid where we can treat each pair of
subsequent x
s and y
s as describing a rectangle on our grid.
x, y, points = 0, 0, [(0, 0)]
for dig in digs:
match dig[0]:
case 0: x += dig[1]
case 1: y += dig[1]
case 2: x -= dig[1]
case 3: y -= dig[1]
points.append((x, y))
xs, ys = sorted({x for x, _ in points}), sorted({y for _, y in points})
Where digs
above represents the input, processed as before into direction and
number of steps tuples.
Now points
contains all the connected points we get following the directions,
which means a pair of subsequent points describes a line. Once we have this, we
can start a flood fill in one of the rectangles and proceed as follows: if there
is a north boundary, meaning we have a line between our top left and top right
coordinates, then we don't recurse north; otherwise we go to the rectangle north
of our current rectangle and repeat the algorithm there. Same for east, south,
west.
Since we have to consider each point in the terrain in our area calculation, we need to be careful how we measure the boundaries of each rectangle so we don't double-count or omit points. To ensure this, my approach was that for each rectangle we count, we count an extra line north (if there is no boundary) and an extra line east (if there is no boundary). If there's neither a north nor an east boundary, then we add 1 for the north-east corner. This should ensure we don't double-count, as each rectangle only considers its north and east boundaries, and we don't miss anything, as any rectangle without a boundary will count the additional points. What remains is the perimeter of our surface, which we add at the end. The explanation might sound convoluted, but the code is very easy to understand:
queue, total, visited = [(1, 1)], 0, set()
while queue:
x, y = queue.pop(0)
e = min([i for i in xs if i > x])
s = max([i for i in ys if i < y])
w = max([i for i in xs if i < x])
n = min([i for i in ys if i > y])
if (n, e) in visited:
continue
visited.add((n, e))
total += (e - w - 1) * (n - s - 1)
found_n, found_s, found_e, found_w = False, False, False, False
for l1, l2 in zip(points, points[1:]):
if l1[1] == l2[1]:
if l1[1] == n and (l1[0] < x < l2[0] or l2[0] < x < l1[0]):
found_n = True
if l1[1] == s and (l1[0] < x < l2[0] or l2[0] < x < l1[0]):
found_s = True
elif l1[0] == l2[0]:
if l1[0] == e and (l1[1] < y < l2[1] or l2[1] < y < l1[1]):
found_e = True
if l1[0] == w and (l1[1] < y < l2[1] or l2[1] < y < l1[1]):
found_w = True
if not found_n:
total += e - w - 1
queue.append((x, n + 1))
if not found_s:
queue.append((x, s - 1))
if not found_e:
total += n - s - 1
queue.append((e + 1, y))
if not found_w:
queue.append((w - 1, y))
if not found_n and not found_e:
if (e, n) not in points:
total += 1
total += sum([dig[1] for dig in digs])
Problem statement is here.
For the first part, we can process rule by rule.
For the second part, start with bounds: (1, 4000)
for all of xmas
. Then at
each decision point, recurse updating bounds. Whenever we hit an A
, add the
bounds to the list of accepted bounds.
Bounds are guaranteed to never overlap, by definition.
accepts = []
def execute_workflow(workflow_key, bounds):
workflow = workflows[workflow_key]
for rule in workflow:
if rule == 'A':
accepts.append(bounds)
return
if rule == 'R':
return
if rule in workflows:
execute_workflow(rule, bounds)
return
check, next_workflow = rule.split(':')
if '<' in check:
key, val = check.split('<')
nb = bounds.copy()
nb[key] = (nb[key][0], int(val) - 1)
bounds[key] = (int(val), bounds[key][1])
elif '>' in check:
key, val = check.split('>')
nb = bounds.copy()
nb[key] = (int(val) + 1, nb[key][1])
bounds[key] = (bounds[key][0], int(val))
execute_workflow(next_workflow, nb)
execute_workflow('in', {'x': (1, 4000), 'm': (1, 4000), 'a': (1, 4000), 's': (1, 4000)})
This gives us all accepted ranges for each of x
, m
, a
, and s
.
Problem statement is here.
For the first part, we can model the various module types as classes with a
common interface and different implementations. Since one of the requirements is
to process pulses in the order they are sent, we will use a queue rather than
have objects call each other based on connections. So rather than module A
directly calling connected module B
when it receives a signal (which would
cause out-of-order processing), model A
will just queue a signal for module
B
, which will be processed once the signals queued before this one are already
processed.
I won't share the code here as it is straightforward. You can find it on my GitHub.
This one was one of the most interesting problems this year. Simply simulating
button presses wouldn't work. I ended up dumping the diagram as a dependency
graph and it looks like the only module that signals rx
is a conjunction
module with multiple inputs.
Conjunction modules emit a low pulse when they remember high pulses being sent
by all their connected inputs. In this case, we can simulate button presses and
keep track when each input to this conjunction module emits a high pulse. Then
we compute the least common multiple of these to determine when the rx
module
will get a low signal.
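The combination step is just a least common multiple. A tiny sketch with made-up cycle lengths (in the real solution these are the button-press counts at which each input to the final conjunction first goes high):

```python
from math import lcm

# Made-up cycle lengths for illustration, not real puzzle values
cycles = [4, 6, 9]
presses = lcm(*cycles)
print(presses)  # 36
```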
My full solution is here, though I'm pretty sure it is topology-dependent. Meaning we might have a different setup where the inputs to this conjunction module are not fully independent, which might make LCM not return the correct answer.
Problem statement is here.
Part 1 is trivial: we can easily simulate 64 steps and count reachable spots.
The second part is much more tricky - this is actually the problem I spent the most time on. Since the garden is infinite, and we are looking for a very high number of steps, we can't use the same approach as in part 1 to simply simulate moves.
Let's now call a tile a repetition of the garden on our infinite grid. Say we start with the garden at (0, 0). Then as we expand beyond its bounds, we reach tiles (-1, 0), (1, 0), (0, -1), (0, 1), which are repetitions of our initial garden.
The two observations that helped here were:
In fact, after we grow beyond the first 4 surrounding tiles, it seems like the
garden grows with a periodicity of the size of the garden. Meaning every
len(grid)
steps, we reach new tiles. There are a few cases to consider -
north, east, south, west, diagonals.
My approach was to do a probe - simulate the first few steps and record the results.
def probe():
    dx, dy = len(grid) // 2, len(grid[0]) // 2
    tiles, progress = {(dx, dy)}, {(0, 0): {0: 1}}
    i = 0
    while len(progress) < 41:
        i += 1
        new_tiles = set()
        for x, y in tiles:
            if grid[(x - 1) % len(grid)][y % len(grid[0])] != '#':
                new_tiles.add((x - 1, y))
            if grid[(x + 1) % len(grid)][y % len(grid[0])] != '#':
                new_tiles.add((x + 1, y))
            if grid[x % len(grid)][(y - 1) % len(grid[0])] != '#':
                new_tiles.add((x, y - 1))
            if grid[x % len(grid)][(y + 1) % len(grid[0])] != '#':
                new_tiles.add((x, y + 1))
        tiles = new_tiles
        for x, y in tiles:
            sq_x, sq_y = x // len(grid), y // len(grid[0])
            if (sq_x, sq_y) not in progress:
                progress[(sq_x, sq_y)] = {}
            if i not in progress[(sq_x, sq_y)]:
                progress[(sq_x, sq_y)][i] = 0
            progress[(sq_x, sq_y)][i] += 1
    return progress
Here progress keeps track, for each tile (keyed by its (x, y) coordinates offset from (0, 0)), of how many spots are reachable at a given time. I run this until progress grows enough for the repeating pattern to show - because we start from the center of a garden but in all other tiles we enter from a side, it takes a couple of iterations for the pattern to stabilize. My guess is this probe could be smaller with some better math, but that's what I have.
With this, given a number of steps, we can reduce it using steps % len(grid) to a smaller value we can look up in our progress record. The reasoning being, if the pattern repeats, it doesn't really matter whether we are 3 steps into tile (-1000, 0) or 3 steps into tile (-3, 0).
The tedious part was determining the right offsets and special cases when computing the total number of squares. For example, even for the tiles that are fully covered, we'll have a subset where tiles are on the "odd" state of squares and a subset where tiles are on the "even" state.
I ended up with the following formula (which might still be buggy, but seemed to have worked for my input):
def at(x, y, step):
    return progress[(x, y)][step] if step in progress[(x, y)] else 0

def count(steps):
    even, odd = (1, 0) if steps % 2 == 0 else (0, 1)
    for i in range(1, steps // len(grid)):
        if steps % 2 == 0:
            if i % 2 == 0:
                even += 4 * i
            else:
                odd += 4 * i
        else:
            if i % 2 == 0:
                odd += 4 * i
            else:
                even += 4 * i
    total = even * at(0, 0, len(grid) * 2) + odd * at(0, 0, len(grid) * 2 + 1)
    total += at(-3, 0, len(grid) * 3 + steps % len(grid))
    total += at(3, 0, len(grid) * 3 + steps % len(grid))
    total += at(0, -3, len(grid) * 3 + steps % len(grid))
    total += at(0, 3, len(grid) * 3 + steps % len(grid))
    i = steps // len(grid) - 1
    total += i * at(-1, -1, len(grid) * 2 + steps % len(grid))
    total += i * at(-1, 1, len(grid) * 2 + steps % len(grid))
    total += i * at(1, -1, len(grid) * 2 + steps % len(grid))
    total += i * at(1, 1, len(grid) * 2 + steps % len(grid))
    i += 1
    total += i * at(-2, -1, len(grid) * 2 + steps % len(grid))
    total += i * at(-2, 1, len(grid) * 2 + steps % len(grid))
    total += i * at(2, -1, len(grid) * 2 + steps % len(grid))
    total += i * at(2, 1, len(grid) * 2 + steps % len(grid))
    return total
I'm covering all inner "even" and "odd" tiles, then the directly north, east, south, and west tiles, then two layers of diagonals. Again, I have a feeling this could be simpler, but I didn't bother to optimize it further.
Problem statement is here.
For part one, we sort bricks by z coordinate (ascending), then we make each brick fall. We do this by decrementing their z coordinate and checking whether they intersect with any other brick.
def intersect(brick1, brick2):
    if brick1[0].x > brick2[1].x or brick1[1].x < brick2[0].x:
        return False
    if brick1[0].y > brick2[1].y or brick1[1].y < brick2[0].y:
        return False
    if brick1[0].z > brick2[1].z or brick1[1].z < brick2[0].z:
        return False
    return True

def slide_down(brick, delta):
    return (Point(brick[0].x, brick[0].y, brick[0].z - delta), Point(brick[1].x, brick[1].y, brick[1].z - delta))

def fall(brick):
    if min(brick[0].z, brick[1].z) == 1:
        return 0
    result, orig = 0, brick
    while True:
        brick = slide_down(brick, 1)
        for b in bricks:
            if b == orig:
                continue
            if intersect(brick, b):
                return result
        result += 1
        if min(brick[0].z, brick[1].z) == 1:
            return result

bricks = sorted(bricks, key=lambda b: min(b[0].z, b[1].z))
for i, brick in enumerate(bricks):
    if delta := fall(brick):
        bricks[i] = slide_down(brick, delta)
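As a quick sanity check of the helpers, here they are (in a compact restatement) run on two made-up bricks:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y", "z"])

# Compact restatement of the intersect/slide_down helpers above
def intersect(b1, b2):
    return not (b1[0].x > b2[1].x or b1[1].x < b2[0].x
                or b1[0].y > b2[1].y or b1[1].y < b2[0].y
                or b1[0].z > b2[1].z or b1[1].z < b2[0].z)

def slide_down(brick, delta):
    return (Point(brick[0].x, brick[0].y, brick[0].z - delta),
            Point(brick[1].x, brick[1].y, brick[1].z - delta))

ground = (Point(0, 0, 1), Point(2, 0, 1))  # lies along x at z=1
hover = (Point(1, 0, 2), Point(1, 2, 2))   # lies along y at z=2

print(intersect(ground, hover))                 # False - different z levels
print(intersect(ground, slide_down(hover, 1)))  # True - they meet at (1, 0, 1)
```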
Once every brick that could fall has fallen to its final position, we need to find the critical bricks - the bricks that are the only support for some other bricks. We do this by shifting each brick down again by 1 on the z axis and determining how many bricks it intersects with. If a shifted brick only intersects with one other brick, that one is a "critical" support brick, so we add it to our set of critical support bricks. All other bricks can be safely removed.
critical = set()
for brick in bricks:
    if brick[0].z == 1 or brick[1].z == 1:
        continue
    supported_by = []
    nb = slide_down(brick, 1)
    for i, b in enumerate(bricks):
        if brick == b:
            continue
        if intersect(nb, b):
            supported_by.append(i)
    if len(supported_by) == 1:
        critical.add(supported_by[0])
print(len(bricks) - len(critical))
In part 2, we need to figure out which bricks each brick is supported by. We can use a similar algorithm to part 1, where we shift z by 1 and check which bricks we intersect. Then we can build a dependency graph of which bricks are supported by which other bricks.
supported_by = {}
for i, brick in enumerate(bricks):
    supported_by[i] = set()
    if brick[0].z == 1 or brick[1].z == 1:
        continue
    nb = slide_down(brick, 1)
    for j, b in enumerate(bricks):
        if i == j:
            continue
        if intersect(nb, b):
            supported_by[i].add(j)
Then for each brick we remove, we can walk the supported by
dependencies to
determine which bricks would fall and would, in turn, cause other bricks to
fall, without having to actually simulate falling.
def count_falling(i):
    sup = {k: supported_by[k].copy() for k in supported_by.keys()}
    queue, removed = [i], set()
    while queue:
        i = queue.pop(0)
        if i in removed:
            continue
        removed.add(i)
        for j in sup:
            if i in sup[j]:
                sup[j].remove(i)
                if len(sup[j]) == 0:
                    queue.append(j)
    return len(removed) - 1

print(sum(count_falling(i) for i in range(len(supported_by))))
Problem statement is here.
The main insight here for both part 1 and part 2 is that we can model the paths as a graph where each intersection (decision point) is a vertex and the paths between intersections are edges. With this representation, we simply need to find the longest path between our starting point and our end point.
In part 1, we have a directed graph, as right before hitting each intersection,
we have a ><^v
constraint, making the path one-way. In part 2, we have an
undirected graph.
Note that the longest path problem in a graph is harder than the shortest path problem. That said, we are dealing with extremely small graphs.
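A minimal longest-path search over such a condensed graph might look like this (the graph below is made up for illustration, not the puzzle input):

```python
# Hypothetical condensed graph: vertices are intersections, edge weights are
# the number of steps along the corridor between them.
graph = {
    "start": [("a", 3), ("b", 5)],
    "a": [("end", 10)],
    "b": [("a", 1), ("end", 4)],
    "end": [],
}

def longest_path(node, target, seen):
    # DFS over simple paths, tracking visited vertices to avoid revisiting
    if node == target:
        return 0
    best = None
    seen.add(node)
    for nxt, dist in graph[node]:
        if nxt in seen:
            continue
        sub = longest_path(nxt, target, seen)
        if sub is not None and (best is None or dist + sub > best):
            best = dist + sub
    seen.remove(node)
    return best

print(longest_path("start", "end", set()))  # 16, via start -> b -> a -> end
```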
Problem statement is here.
Part 1 was fairly straightforward: for each pair of lines, solve the equation to find where they meet and check if within bounds (when lines are not parallel).
Since each line is described by a point \((x_{origin}, y_{origin})\) and a vector \((dx, dy)\), we can represent them as
\[\begin{cases} x = x_{origin} + dx * t \\ y = y_{origin} + dy * t \end{cases}\]
Then the lines intersect when
\[\begin{cases} x_1 + dx_1 * t_1 = x_2 + dx_2 * t_2 \\ y_1 + dy_1 * t_1 = y_2 + dy_2 * t_2 \end{cases}\]
We know all of \((x_1, y_1), (dx_1, dy_1), (x_2, y_2), (dx_2, dy_2)\) so we solve for \(t_1\) and \(t_2\).
def intersect(p1, v1, p2, v2):
    if v1.dx / v1.dy == v2.dx / v2.dy:
        return None, None
    t2 = (v1.dx * (p2.y - p1.y) + v1.dy * (p1.x - p2.x)) / (v2.dx * v1.dy - v2.dy * v1.dx)
    t1 = (p2.y + v2.dy * t2 - p1.y) / v1.dy
    return t1, t2
Once we have t1 and t2, we need to check that both are positive (so the intersection didn't happen in the past), and make sure the intersection point, which is either x1 + dx1 * t1, y1 + dy1 * t1 or x2 + dx2 * t2, y2 + dy2 * t2, is within our bounds (at least 200000000000000 and at most 400000000000000).
If that's the case, then we found an intersection and we can add it to the total.
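Putting it together, the counting loop might look like this (the hailstorms and bounds below are toy values, not the puzzle input; intersect() is the same function as above):

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
Vector = namedtuple("Vector", ["dx", "dy"])

# Same intersect() as above
def intersect(p1, v1, p2, v2):
    if v1.dx / v1.dy == v2.dx / v2.dy:
        return None, None
    t2 = (v1.dx * (p2.y - p1.y) + v1.dy * (p1.x - p2.x)) / (v2.dx * v1.dy - v2.dy * v1.dx)
    t1 = (p2.y + v2.dy * t2 - p1.y) / v1.dy
    return t1, t2

# Toy hailstorms and a toy test area
hails = [(Point(0, 0), Vector(1, 1)),
         (Point(4, 0), Vector(-1, 1)),
         (Point(0, 6), Vector(1, -1))]
lo, hi = 1, 2

total = 0
for i in range(len(hails)):
    for j in range(i + 1, len(hails)):
        p1, v1 = hails[i]
        p2, v2 = hails[j]
        t1, t2 = intersect(p1, v1, p2, v2)
        if t1 is None or t1 < 0 or t2 < 0:
            continue  # parallel, or crossed in the past
        x, y = p1.x + v1.dx * t1, p1.y + v1.dy * t1
        if lo <= x <= hi and lo <= y <= hi:
            total += 1
print(total)  # 1: only the first two lines cross inside the test area
```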
Part 2 was really fun. We now have 3 dimensions, so a line is represented as
\[\begin{cases} x = x_{origin} + dx * t \\ y = y_{origin} + dy * t \\ z = z_{origin} + dz * t \end{cases}\]
We need to find a line (the trajectory of our rock) that intersects each line in our input at a different time, such that for some \(t\) and line \(l\), we have
\[\begin{cases} x_{origin_{l}} + dx_l * t = x_{origin_{rock}} + dx_{rock} * t \\ y_{origin_{l}} + dy_l * t = y_{origin_{rock}} + dy_{rock} * t \\ z_{origin_{l}} + dz_l * t = z_{origin_{rock}} + dz_{rock} * t \end{cases}\]
One way to solve this is using linear algebra. If we take 3 different hailstorms and our rock, we end up with the following set of equations:
\[\begin{cases} x_{origin_{1}} + dx_1 * t_1 = x_{origin_{rock}} + dx_{rock} * t_1 \\ y_{origin_{1}} + dy_1 * t_1 = y_{origin_{rock}} + dy_{rock} * t_1 \\ z_{origin_{1}} + dz_1 * t_1 = z_{origin_{rock}} + dz_{rock} * t_1 \\ x_{origin_{2}} + dx_2 * t_2 = x_{origin_{rock}} + dx_{rock} * t_2 \\ y_{origin_{2}} + dy_2 * t_2 = y_{origin_{rock}} + dy_{rock} * t_2 \\ z_{origin_{2}} + dz_2 * t_2 = z_{origin_{rock}} + dz_{rock} * t_2 \\ x_{origin_{3}} + dx_3 * t_3 = x_{origin_{rock}} + dx_{rock} * t_3 \\ y_{origin_{3}} + dy_3 * t_3 = y_{origin_{rock}} + dy_{rock} * t_3 \\ z_{origin_{3}} + dz_3 * t_3 = z_{origin_{rock}} + dz_{rock} * t_3 \end{cases}\]
In the above system, we know all of the starting points and vectors of the hailstorms. Our unknowns are \(t_1, t_2, t_3, x_{origin_{rock}}, y_{origin_{rock}}, z_{origin_{rock}}, dx_{rock}, dy_{rock}, dz_{rock}\). That's 9 unknowns to 9 equations, so it should be solvable.
While this approach works, I didn't want to use a numerical library to solve this (I'm trying to keep dependencies at a minimum), and implementing the math from scratch was a bit too much for me. I thought of a different approach: as long as we can find a rock trajectory that intersects the first couple of hailstorms at the right times, we most likely found our solution.
\[\begin{cases} x_{origin_{rock}} + dx_{rock} * t_1 = x_1 + dx_1 * t_1 \\ y_{origin_{rock}} + dy_{rock} * t_1 = y_1 + dy_1 * t_1 \\ x_{origin_{rock}} + dx_{rock} * t_2 = x_2 + dx_2 * t_2 \\ y_{origin_{rock}} + dy_{rock} * t_2 = y_2 + dy_2 * t_2 \end{cases}\]
If we solve this for \(t_1\) and \(t_2\), we can then easily determine \(z_{origin_{rock}}\) and \(dz_{rock}\).
In the above set of equations, we have too many unknowns: \(x_{origin_{rock}}, dx_{rock}, y_{origin_{rock}}, dy_{rock}, t_1, t_2\). We can reduce this number by trying out different values for a couple of these unknowns. While the ranges of possible values for \(x_{origin_{rock}}, y_{origin_{rock}}, t_1, t_2\) are very large, so unfeasible to cover, the \(dx_{rock}\) and \(dy_{rock}\) ranges should be small - if these values are large, our rock will quickly shoot past all the other hailstorms.
My approach was to try all possible values between -1000 and 1000 for both of these, then see if we can find \(x_{origin_{rock}}, y_{origin_{rock}}, t_1, t_2\) such that these intersect the first two hailstorms. If we do, we then find \(z_{origin_{rock}}, dz_{rock}\) (easy to find since now we know \(t_1, t_2\)). We have an additional helpful constraint: the origin coordinates of the rock need to be integers.
Then we just need to check that indeed for the given \((x_{origin_{rock}}, y_{origin_{rock}}, z_{origin_{rock}})\) and \((dx_{rock}, dy_{rock}, dz_{rock})\), for each hailstorm, there is a time \(t_i\) when they intersect.
Here is the code:
def find(rng):
    for dx in range(-rng, rng):
        for dy in range(-rng, rng):
            x1, y1, z1 = hails[0][0]
            dx1, dy1, dz1 = hails[0][1]
            x2, y2, z2 = hails[1][0]
            dx2, dy2, dz2 = hails[1][1]
            # x + dx * t1 = x1 + dx1 * t1
            # y + dy * t1 = y1 + dy1 * t1
            # x + dx * t2 = x2 + dx2 * t2
            # y + dy * t2 = y2 + dy2 * t2
            # x = x1 + t1 * (dx1 - dx)
            # t1 = (x2 - x1 + t2 * (dx2 - dx)) / (dx1 - dx)
            # y = y1 + (x2 - x1 + t2 * (dx2 - dx)) * (dy1 - dy) / (dx1 - dx)
            # t2 = ((y2 - y1) * (dx1 - dx) - (dy1 - dy) * (x2 - x1)) / ((dy1 - dy) * (dx2 - dx) + (dy - dy2) * (dx1 - dx))
            if (dy1 - dy) * (dx2 - dx) + (dy - dy2) * (dx1 - dx) == 0:
                continue
            t2 = ((y2 - y1) * (dx1 - dx) - (dy1 - dy) * (x2 - x1)) / ((dy1 - dy) * (dx2 - dx) + (dy - dy2) * (dx1 - dx))
            if not t2.is_integer() or t2 < 0:
                continue
            if (dx1 - dx) == 0:
                continue
            y = y1 + (x2 - x1 + t2 * (dx2 - dx)) * (dy1 - dy) / (dx1 - dx)
            if not y.is_integer():
                continue
            t1 = (x2 - x1 + t2 * (dx2 - dx)) / (dx1 - dx)
            if not t1.is_integer() or t1 < 0:
                continue
            x = x1 + t1 * (dx1 - dx)
            # z + dz * t1 = z1 + dz1 * t1
            # z + dz * t2 = z2 + dz2 * t2
            # dz = (z1 + dz1 * t1 - z2 - dz2 * t2) / (t1 - t2)
            # z = z1 + dz1 * t1 - dz * t1
            if t1 == t2:
                continue
            dz = (z1 + dz1 * t1 - z2 - dz2 * t2) / (t1 - t2)
            if not dz.is_integer():
                continue
            z = z1 + dz1 * t1 - dz * t1
In the above, x, y, z, dx, dy, dz are the rock's origin and vector.
The final step (omitted from the code sample for brevity), is to confirm that for the given origin and vector, we end up eventually intersecting all other hailstorms.
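One possible shape of that omitted verification is sketched below (a hypothetical helper, shown with toy values, not the original code):

```python
def meet_time(p, v, hp, hv):
    # Solve p + v * t == hp + hv * t on the first axis where velocities differ,
    # then verify the remaining axes agree; returns None if they never meet.
    for pi, vi, hpi, hvi in zip(p, v, hp, hv):
        if vi != hvi:
            t = (hpi - pi) / (vi - hvi)
            break
    else:
        return 0 if p == hp else None  # identical velocities
    if t < 0:
        return None  # would have met in the past
    for pi, vi, hpi, hvi in zip(p, v, hp, hv):
        if pi + vi * t != hpi + hvi * t:
            return None
    return t

# Toy rock at the origin moving along the diagonal, two toy hailstorms it hits
print(meet_time((0, 0, 0), (1, 1, 1), (2, 0, 0), (0, 1, 1)))  # 2.0
print(meet_time((0, 0, 0), (1, 1, 1), (0, 3, 3), (1, 0, 0)))  # 3.0
```

The real check would run this against every hailstorm and confirm no None comes back.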
I really enjoyed this problem as it made me work through the math.
Problem statement is here.
I liked this problem. It turned out to be a variation of the minimum cut problem. Trying out all possible permutations of nodes would take way too much time. The algorithm I used keeps track of a set of visited nodes - one of the two components. Then at each step, we add a new node to this set by selecting the most connected node to this component (meaning the node that has most edges incoming from visited nodes).
most_connected() determines which node we want to pick next:
def most_connected(visited):
    best_n, best_d = None, 0
    for n in graph:
        if n in visited:
            continue
        neighbors = sum(1 for v in graph[n] if v in visited)
        if neighbors > best_d:
            best_n, best_d = n, neighbors
    return best_n
Then we keep going until our component has exactly 3 outgoing edges to nodes that haven't been visited yet:
def find_components():
    start = list(graph.keys())[0]
    visited = {start}
    while len(visited) < len(graph):
        total = 0
        for n in visited:
            total += sum(1 for v in graph[n] if v not in visited)
        if total == 3:
            return visited
        n = most_connected(visited)
        visited.add(n)
That's where we need to make the cut. We just need to multiply len(visited)
with len(graph) - len(visited)
to find our answer.
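To sanity-check the heuristic, here is the whole thing run on a tiny made-up graph: two 5-cliques joined by exactly 3 edges (most_connected() and find_components() repeated from above):

```python
# Hypothetical graph: two K5 cliques (a1..a5 and b1..b5) joined by 3 edges
graph = {}

def add_edge(u, v):
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

for side in "ab":
    for i in range(1, 6):
        for j in range(i + 1, 6):
            add_edge(f"{side}{i}", f"{side}{j}")
for i in range(1, 4):
    add_edge(f"a{i}", f"b{i}")  # the 3 edges to cut

def most_connected(visited):
    best_n, best_d = None, 0
    for n in graph:
        if n in visited:
            continue
        neighbors = sum(1 for v in graph[n] if v in visited)
        if neighbors > best_d:
            best_n, best_d = n, neighbors
    return best_n

def find_components():
    start = list(graph.keys())[0]
    visited = {start}
    while len(visited) < len(graph):
        total = 0
        for n in visited:
            total += sum(1 for v in graph[n] if v not in visited)
        if total == 3:
            return visited
        visited.add(most_connected(visited))

component = find_components()
print(len(component) * (len(graph) - len(component)))  # 5 * 5 = 25
```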
I personally found the most difficult problems to be part 2 of days 20, 21, 24 and the one and only part of day 25. All of these took me a bit to figure out. That said, Advent of Code is always a nice holiday pastime and I can't wait for the 2024 iteration.
I spent the past few years building a platform for Loop components within the Microsoft 365 ecosystem. While some of the learnings might only apply to our particular scenario, I think some observations apply broadly.
We've been using 1P/2P/3P to mean our team (1P), other teams within Microsoft (2P), and external developers (3P). Loop started with a set of 1P components and we set out to extract a developer platform out of these that can be leveraged by other teams. We currently have a set of 2P components built on our platform, and a 3P developer story centered around Adaptive Cards.
In this blog post I'll cover some of my learnings with regard to platform development.
Aspirationally, we set out with the stated goal of 1P equals 3P, meaning 3rd party developers should be building on the same platform as 1st party developers. Looking at it another way, if the platform is good enough for 1st party, it should be just as good for 3rd party - this is a statement of platform capabilities and maturity and a lofty goal.
That said, I don't think this is realistic, especially within a product like Office, where user experience is paramount. That is because we have two audiences to consider: we have the developer audience - users building on our platform, and we have Office users, people who get to use the end product. Mediating between the two is quite a challenge.
A simple example is the classic performance/security tradeoff. Especially as Loop components are embedded in other applications, what level of isolation do we provide? Loop components are built with web technology. An iframe provides great isolation (best security) but iframes add performance overhead (worse perf). If we host a Loop component without an iframe, we get better performance, but we open up the whole DOM to the component. If we threat model this, we immediately see that we don't necessarily need isolation for Loop components developed within Microsoft (we don't expect our partner teams to write malicious code) but we absolutely need to isolate code written by 3rd party developers. Of course, we could say "just isolate everything", which might even have other advantages, but do we want to take the perf hit? Our other audience, people who use our product, would be negatively impacted by an overhead we can technically avoid.
Another example in the same vein: overall user experience. The more we make Loop components feel like part of the hosting app, the smoother the end user experience is. On the other hand, we can't realistically test every single Loop component built by any 3rd party developer. The way Office services and products are deployed and administered, tenant admins can configure which 3rd party extensions are enabled within the tenant. The Microsoft tenant we use internally has some set of extensions available, but not all. That means there are always 3rd party extensions we never even see. Now if one of these extensions doesn't work properly (errors out, looks out of place, is slow etc.), end users might end up dissatisfied with the overall experience of using Office products. For internally developed components, we get to dogfood and keep a high bar, but this doesn't scale to a wide developer audience. Our current approach is to offer 3rd party development via Adaptive Cards. This way, we don't run 3rd party code on clients and we have a set of consistent UI controls. Ideally, we'd like to enable custom code, but at the time of writing we're still thinking through the best approach considering all of the challenges listed above.
Finally, I think another key difference is the product goals. The platform audience are the developers, but the product audience are the users. There's usually a tension between these. For example, an internal team builds a Loop component. They come up with a requirement that is a "must" to deliver their scenario. For example, we had a component developed by a partner team that asked us to check the tenant's Cloud Policy service to see whether the component should be on or off. This makes perfect sense in this case, since the backing service might not be running in the tenant. We offer tenant admins a different way to control 3rd party extensions, so this platform capability would not make sense for a 3rd party. In general, a lot of our internal platform capability requests come from the desire to provide the best possible end user experience. If our only customer were the developers using the platform, we would probably say "no" to some of these - not general enough, doesn't benefit 3rd party etc. But, of course, Office has way more users than developers.
I think the 1P/3P challenge is common to most platforms built from within product teams (or supporting product teams within the same company). With Loop, this is compounded by the fact we are deeply integrated within other applications. I can think of some notable examples when the strong push for a "1P equals 3P" platform ended up disastrously - Windows Longhorn was supposed to be built on a version of .NET that was just not good enough for core OS pieces. I can also think of many platforms that provide sufficient capabilities for 3rd party developers but 1st/2nd party developers don't use. And I think this is OK - building a platform for 3P lets you focus on the developer community needs. Supporting 1P/2P might be best served by focusing on the product goals and unique scenario needs rather than trying to generalize to a public platform.
A platform goes through several life stages, each with its own characteristics and challenges. Looking back at how our platform evolved (and how I foresee the future), a successful platform goes through 4 life stages: incubation, 0 to 1, stabilization, and commoditization.
At this stage, it's all one team building both the what-will-become-a-platform and the product supported by this platform. During the incubation stage, the platform doesn't really have any users (meaning developers leveraging the platform). We are free to toy with ideas. If we want to make a breaking change to an API, we can easily do it and fix the handful of internal calls. At this point, everything is in flux - the canvas is blank and we have plenty of room to innovate.
On the other hand, we don't really have a clear idea of what developers would need out of the platform - we know what the main scenario we are supporting needs, but we don't have a feedback loop yet. At this stage, we need to rely on experience and intuition to set some initial direction.
This is the biggest growth stage. "0 to 1" is a nod to Peter Thiel's Zero to One book. The platform goes from no users to a few users - and by "users" here I mean developers. Taking the platform from 0 (or incubation) to 1 means supporting a handful of "serious" production scenarios.
We now have a feedback loop and developers able to give us requirements - we can now understand their needs rather than have to divine them ourselves. As a side note, this is the approach we took with Loop, where we worked closely with a set of 2P partners to light up scenarios and grow the platform to support these.
At this stage, it's already difficult to make breaking changes. Since there are already a set of dependencies on the platform, a breaking change requires a lot of coordination. Or some form of backwards compatibility. Or legacy support. There are different ways to go about this (maybe in another blog post), but the key point is we can no longer churn as fast as we could during the incubation stage. And added costs at the 0 to 1 stage are painful.
Another challenge is generalization. We have a handful of partners with a handful of requests for the platform. And we're in the growth stage, so we most likely need to move fast. There's a big tension between how fast we can light up new platform capabilities and how much time we spend thinking through design patterns and future-proofing. If we just say "yes" to every ask, we can move fast but risk ending up with a very gnarly platform that has many one-off pieces and a very inconsistent developer story. On the other hand, we can spend a lot of time iterating on design and predicting how an incoming requirement would scale when the platform is large, all the way until our partners give up on us or funding runs out. There is no silver bullet for this - you always end up somewhere in the middle, with parts of the platform that you wished were done differently, but hopefully still alive and kicking in the next stage.
At this point, enough developers depend on the platform that ad-hoc breaking changes are no longer possible. By "stabilization" I don't mean the platform stops growing - in fact, this is the stage where we get most feedback and requests. But while the platform continues to grow incrementally, changes become even more difficult as they can break the whole ecosystem.
There are now enough users that early design decisions that proved wrong become obvious, but it's too late to change them. This is a natural "if I knew then what I know now" point for any platform that can't really be avoided.
This is the point where most platforms start producing new major version numbers that aim to address large swaths of issues and add new bundles of functionality. But while during the incubation stage a change could land in a few days, and in the 0 to 1 stage maybe weeks or at most months, breaking changes at this stage take years to land - many developers means not all of them are ready right away to update their code to the newest patterns. The platform needs some form of long-term support for older versions and deprecation/removal becomes a long journey.
On the other hand, the core of the platform is stable by now and battle-tested. The final step is the platform becoming a commodity.
At this stage, the platform is mature and robust. A large developer community depends on it and the platform is mostly feature complete. Some new requirements might pop up from time to time, but not very often.
At this stage developers rely on existing behaviors and change is next to impossible. That's because a lot of the developer solutions are also "done" by now and people moved on. Nobody wants to go back and update things to support API changes. The platform is a useful commodity.
This is also the stage where active development slows down and fewer engineers are required to keep things going. We haven't reached this stage with Loop, we are still growing the platform and moving fast. But any successful platform should reach this stage - a low-churn state where its capabilities (and gotchas) are well understood and reliable.
Each of the stages requires a different approach to evolving the platform. The speed with which we add capabilities, churn, how updates are rolled out, how we design new features - all happen in different ways and at a different pace depending on where the platform is and its number of users.
In this post I covered two main aspects of platform development: the tension between supporting 3rd party developers and ensuring end users have the best possible experience; and the different stages of a platform. As usage increases, changes become more difficult and early decisions solidify, for better or worse.
If I look at other platforms, I can easily see how they went through the same growing pains and challenges.
I'll probably have more to write on the topic of platform development, since this has been my main job for a while now.
Now that my LLM book is done,
I can get back to the Mental Poker series. A high-level overview can be found
here.
In the previous posts we covered
cryptography
and a Fluid append-only list data
structure.
We'll be using the append-only list (we called this fluid-ledger) to model games.
An append-only list should be all that is needed to model turn-based games: each turn is an element added to the list. In this post, we'll stitch things together and look at the transport layer for our games.
Our basic transport interface is very simple:
declare interface ITransport<T> {
    getActions(): IterableIterator<T>;
    postAction(value: T): Promise<void>;
    once(event: "actionPosted", listener: (value: T) => void): this;
    on(event: "actionPosted", listener: (value: T) => void): this;
    off(event: "actionPosted", listener: (value: T) => void): this;
}
For some type T, we have:

- getActions(), which returns an iterator over all values (of type T) posted so far.
- postAction(), which takes a value of type T and posts it.
- an actionPosted event, which fires whenever any of the clients posts an action (this relies on the Fluid data synchronization).
- once(), on(), and off(), the standard EventEmitter methods.

We'll cover why we call these values actions in a future post.
The basic implementation of this on top of the fluid-ledger
distributed data
structure looks like this:
class FluidTransport<T> extends EventEmitter implements ITransport<T> {
    constructor(private readonly ledger: ILedger<string>) {
        super();
        ledger.on("append", (value) => {
            this.emit("actionPosted", JSON.parse(value) as T);
        });
    }

    *getActions() {
        for (const value of this.ledger.get()) {
            yield JSON.parse(value) as T;
        }
    }

    postAction(value: T) {
        return Promise.resolve(this.ledger.append(JSON.stringify(value)));
    }
}
The constructor takes an ILedger<string>
(this is the interface we looked at
in the previous post).
It hooks up an event listener to the ledger's append event, which in turn triggers an actionPosted event. We also convert the incoming value from string to T using JSON.parse().
Similarly, getActions()
is a simple wrapper over the underlying ledger, doing
the same conversion to T
.
Finally, postAction() does the reverse - it converts from T to a string and appends the value to the ledger.
With this in place, we abstracted away the Fluid-based transport details. We
will separately set up a Fluid container and establish connection to other
clients (in a future post), then take the ILedger
instance, pass it to
FluidTransport
, and we are good to go.
We can model games on top of just these two primitives: postAction()
and
actionPosted
. Whenever we take a turn, we call postAction()
. Whenever any
player takes a turn, the actionPosted
event is fired.
Since we're designing Mental Poker, which takes place in a zero-trust environment, let's make sure our transport is secure.
Signature verification allows us to ensure that in a multiplayer game, players can't spoof each other, meaning Alice can't pretend she is Bob and post an action on Bob's behalf for other clients to misinterpret.
Note in a 2-player game this is not strictly needed if we trust the channel: we know that if a payload was not sent by us, it was sent by the other player. But in games with more players, we need to protect against spoofing. Signatures are also useful in case we don't trust the channel - maybe it's supposed to be a 2-player game but a third client gets access to the channel and starts sending messages.
We will implement this using public key cryptography. The way this works is each player generates (locally) a public/private key pair. They broadcast the public key to all other players. Then they can sign any message they send with their private key and other players can validate the signature using the public key. Nobody else can sign on their behalf, since the private key is kept private.
I won't go into deeper detail here, since this is very standard public key cryptography. In fact, I didn't even cover this in the blog post covering cryptography for Mental Poker for this reason. There, I focused on the commutative SRA encryption algorithm. Unlike SRA, which we had to implement by hand, signature verification is part of the standard Web Crypto API. Let's implement signature verification on top of this.
First, we need to model a public/private key pair:
// Keys are represented as strings
export type Key = string;
// Public/private key pair
export type PublicPrivateKeyPair = {
publicKey: Key;
privateKey: Key;
};
A key is a string. We model the key pair as PublicPrivateKeyPair
, a type
containing two keys. Here's how we generate the key pair using the Web Crypto
API:
import { encode, decode } from "base64-arraybuffer";
async function generatePublicPrivateKeyPair(): Promise<PublicPrivateKeyPair> {
const subtle = crypto.subtle;
const keys = await subtle.generateKey(
{
name: "RSA-PSS",
modulusLength: 4096,
publicExponent: new Uint8Array([1, 0, 1]),
hash: "SHA-256",
},
true,
["sign", "verify"]
);
return {
publicKey: encode(await subtle.exportKey("spki", keys.publicKey)),
privateKey: encode(
await subtle.exportKey("pkcs8", keys.privateKey)
),
};
}
We use subtle
to generate our key pair and return both public and private keys
as base64-encoded strings.
We can similarly rely on subtle
for signing. The following function takes a
string payload and signs it with the given private key. The response is the
base64-encoded signature.
async function sign(
payload: string,
privateKey: Key
): Promise<string> {
const subtle = crypto.subtle;
const pk = await subtle.importKey(
"pkcs8",
decode(privateKey),
{ name: "RSA-PSS", hash: "SHA-256" },
true,
["sign"]
);
return encode(
await subtle.sign(
{ name: "RSA-PSS", saltLength: 256 },
pk,
new TextEncoder().encode(payload)
)
);
}
First, we import the given privateKey, then we call subtle.sign() to sign the UTF-8 encoding of the payload. We encode the signature to base64 and return it as a string.
Finally, this is how we verify signatures:
async function verifySignature(
payload: string,
signature: string,
publicKey: Key
): Promise<boolean> {
const subtle = crypto.subtle;
const pk = await subtle.importKey(
"spki",
decode(publicKey),
{ name: "RSA-PSS", hash: "SHA-256" },
true,
["verify"]
);
return subtle.verify(
{ name: "RSA-PSS", saltLength: 256 },
pk,
decode(signature),
new TextEncoder().encode(payload)
);
}
Here, we import the given publicKey, then we use subtle.verify(). For signature verification, we pass in the signature (decoded from base64) and the UTF-8-encoded payload that was signed. This API returns true if the signature matches, meaning it was indeed signed with the private key corresponding to the public key we provided.
Again, I won't go deep into the subtle APIs as they are standard and very well documented. The main takeaway is that we now have 3 APIs:
generatePublicPrivateKeyPair() to generate key pairs.
sign() to sign a payload.
verifySignature() to validate a signature.
We'll put these in the Signing namespace.
Now let's layer this cryptography over our FluidTransport
.
Now that we have our Fluid-based implementation of the ITransport
interface
and signature verification functions, weâll provide another implementation of
this interface that handles signature verification.
First, we need a generic Signed
type:
type ClientId = string;
type Signed<T> = T & { clientId?: ClientId; signature?: string };
This takes any type T
and extends it with an optional clientId
and
signature
. We'll represent client IDs as strings.
Now we can decorate any payload in our transport with the optional clientId and signature, which we can then validate using the functions we just implemented. The reason these are optional is that there is a window when signing is unavailable: during the key exchange steps, no message can be signed, since no client yet knows the public key of any other client. Once keys are exchanged, all subsequent messages should be signed, and we'll enforce that in SignedTransport.
We also need a KeyStore
. This keeps track of which public key belongs to each
client, to help with our signature verification (meaning we keep track of which
public key is Alice's, which one is Bob's, and when we get a message from Alice
we know which key to use to verify authenticity).
type KeyStore = Map<ClientId, Key>;
We also need a ClientKey
type, representing a single client ID/private key
pair:
export type ClientKey = { clientId: ClientId; privateKey: Key };
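To make the Signed&lt;T&gt; envelope concrete, here's a small sketch of decorating a payload and stripping the envelope back off. The Move type and the placeholder signature string are made up for illustration; the real code produces the signature with Signing.sign().

```typescript
type ClientId = string;
type Signed<T> = T & { clientId?: ClientId; signature?: string };

// A hypothetical game action; any serializable type works here.
type Move = { action: string; card: string };

// Decorate a payload with the sender's ID and a (placeholder) signature.
function decorate(value: Move, clientId: ClientId): Signed<Move> {
    return { ...value, clientId, signature: "placeholder-signature" };
}

// Strip the envelope before handing the value to higher layers,
// mirroring what verifySignature() does after validation.
function strip(value: Signed<Move>): Move {
    const { clientId, signature, ...rest } = value;
    return rest;
}

const signed = decorate({ action: "discard", card: "9:hearts" }, "alice");
console.log(signed.clientId); // alice
console.log(strip(signed)); // logs the move without clientId/signature
```

Because the extra fields are optional on Signed&lt;T&gt;, the same type also describes the unsigned key-exchange messages sent before signing is possible.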
With these additional type definitions in place, we can start building our
SignedTransport<T>
. This is a decorator that takes an ITransport<Signed<T>>
.
We'll first look at the constructor:
class SignedTransport<T> extends EventEmitter implements ITransport<T> {
constructor(
private readonly transport: ITransport<Signed<T>>,
private readonly clientKey: ClientKey,
private readonly keyStore: KeyStore
) {
super();
transport.on("actionPosted", async (value) => {
this.emit("actionPosted", await this.verifySignature(value));
});
}
/* ... */
This new class has 3 private properties. Let's discuss them in turn.
transport
is our underlying ITransport<Signed<T>>
. The idea is we can
instantiate a FluidTransport
(or another transport if needed, though for this
project I have no plans to use anything other than Fluid), then pass it into
the constructor here. Then SignedTransport
will use the provided instance for
postAction()
and actionPosted
, simply adding signature verification over it.
The clientKey should be this client's ID and private key. This class is not concerned with key generation, just signing and verification, so we'll have to generate the key pair somewhere else and pass it in. We'll use this to sign our outgoing payloads.
We also pass in a keyStore
. This should have the client ID to public key
mapping for all players in the game. We use this to figure out which public key
to use to validate each posted action.
getActions()
simply calls the underlying transport - we are not doing
signature verification on existing messages, since they were likely sent before
the signed transport was created and cannot be verified.
*getActions() {
for (const value of this.transport.getActions()) {
yield value;
}
}
We only validate incoming actions.
The constructor body hooks up the actionPosted
event to the transport's actionPosted. So whenever the underlying transport fires the event, the
SignedTransport
will also fire an actionPosted
event. But instead of just
passing value
through, we call verifySignature()
on the value
first.
Let's look at verifySignature
next (this is also part of the SignedTransport
class):
private async verifySignature(value: Signed<T>): Promise<T> {
if (!value.clientId || !value.signature) {
throw Error("Message missing signature");
}
// Remove signature and client ID from object and store them
const clientId = value.clientId;
const signature = value.signature;
delete value.clientId;
delete value.signature;
// Figure out which public key we need to use
const publicKey = this.keyStore.get(clientId);
if (!publicKey) {
throw Error(`No public key available for client ${clientId}`);
}
if (
!(await Signing.verifySignature(
JSON.stringify(value),
signature,
publicKey
))
) {
throw new Error("Signature validation failed");
}
return value;
}
/* ... */
Since value
is a Signed<T>
, we should have a clientId
and a signature
.
We throw an exception if we can't find them.
Next, we clean up value
and remove the clientId
and signature
from the
object. As we return this to other layers in our stack, they no longer need these fields, since we're handling signature verification here.
We then try to retrieve the public key of the client from the keyStore
. We
again throw in case we don't have the key.
We use the verifySignature() function we implemented earlier to ensure the
function we implemented earlier to ensure the
signature is valid. We throw if not.
At this point, we have guaranteed that the payload comes from the client claiming to have sent it. If Alice tries to forge a message and pretend it's coming from Bob, she won't be able to produce a valid Bob signature (since only Bob has access to his private key). Such a message would not make it past this function.
If no exceptions were thrown, this function returns a value
(with signature
cleaned up), ready to be processed by other layers.
Let's now look at adding signatures to postAction(). signAction() is another
. signAction()
is another
private class member handling signing:
private async signAction(value: T): Promise<Signed<T>> {
const signature = await Signing.sign(
JSON.stringify(value),
this.clientKey.privateKey
);
return {
...value,
clientId: this.clientKey.clientId,
signature: signature,
};
}
/* ... */
We call the sign()
function we implemented earlier in this post, passing it
the stringified value
and our client's private key. We then extend value
with the corresponding clientId
and signature
.
The postAction()
implementation uses this function for signing, before calling
the underlying transport's postAction():
async postAction(value: T) {
this.transport.postAction(await this.signAction(value));
}
We now have the full implementation of SignedTransport.
We started with a simple FluidTransport
that uses a fluid-ledger
to
implement the postAction()
function and actionPosted
event, which we need
for modeling turn-based games.
Next, we looked at signing and signature verification using subtle
.
Finally, we implemented SignedTransport, a decorator over another transport that adds signing and signature verification.
The idea is we start with a FluidTransport
and perform a key exchange, where
each client generates a public/private key pair and broadcasts their ID and
public key. Clients store all these in a KeyStore
. Once the key exchange is
done, we can initialize a SignedTransport
that wraps the original
FluidTransport
and transparently handles signatures.
At this point we have all the pieces in place to start looking at semantics: we can exchange data between clients, we can authenticate exchanged messages, and we have the cryptography primitives for Mental Poker (commutative encryption). In the next post weâll look at a state machine that we can use to implement game semantics.
The code covered in this post is available on GitHub in the mental-poker-toolkit
repo.
FluidTransport
is implemented under packages/fluid-transport
,
SignedTransport
is under packages/signed-transport
,
and the signing functions can be found in packages/cryptography/src/signing.ts
.
Note: Since writing this post, the code was refactored so SignedTransport doesn't take a direct dependency on the cryptography package; instead, signing and signature verification are passed in as an ISignatureProvider interface.
Keeping with tradition, I'm writing the RTM post for Large Language Models at Work. The book is done. Now available on Kindle.
I decided not to contact a publisher this time around, for a couple of reasons: first, I didn't want the pressure of a contract and timelines (though looking back, I did finish this book faster than the previous two); second, I had no idea if I would be able to write something still valuable by the time the book was done, considering the speed of innovation. More on this later.
I authored the book in the open, at https://vladris.com/llm-book/ and self-published on Kindle. Maybe I will look into making it a print book at some point, for now I'm keeping it digital.
Amazon offers a nice set of tools to import and format ebooks, but they have some big limitations - for example, no support for formatting tables, footnotes etc. I also couldn't convince the tool that the code samples should be monospace on import, so I had to manually re-set the font on each. The book has a few formatting glitches because of these limitations, which make me reluctant to look into a print book as I expect I would need to do a lot more manual tweaking for the text to look good in print.
I mused about this in chapter 10: Closing Thoughts. I'll repeat it here as it perfectly highlights why it is impossible to pin down this strange new world of AI.
I started writing the book in April 2023. When I picked up the project, GPT-4 was in private preview, with GPT-3.5 being the most powerful globally available model offered by OpenAI. Since then, GPT-4 opened to the public.
In June, OpenAI announced Functions - fortunately, this happened just before I
started working on chapter 6, Interacting with External Systems. Before
Functions, the way to get a large language model to connect with native code was
through few-shot learning in the prompt, covered in the Non-native functions
section. Originally, I was planning to focus exclusively on this implementation.
Of course, built-in support makes it easier to specify available functions and
the model interaction is likely to work better - since the model has been
specifically trained to understand
function definitions and output correct
function calls.
In August, OpenAI announced fine-tuning support for gpt-3.5-turbo
. When I was
writing the first draft of chapter 4, Learning and Tuning, the only models that
used to support fine-tuning were the older GPT-3 generation models: Ada,
Babbage, Curie, and Davinci. This was particularly annoying, as the quality of
output produced by these models is way below gpt-3.5-turbo
levels. Now, with
the newer models having fine-tuning support, I had to rewrite the Fine-tuning
section.
text-davinci-003
launched in November of 2022, while gpt-3.5-turbo
launched
on March 1st 2023. When I started writing the book, text-davinci-003
was
backing most large language model-based solutions across the industry, and
migrations to the newer gpt-3.5-turbo
were underway. text-davinci-003
is
deprecated, to be removed by January 4, 2024 (replaced by
gpt-3.5-turbo-instruct
), and the industry is moving to adopt GPT-4. I had to
update several code samples from text-davinci-003
to gpt-3.5-turbo-instruct
.
No idea how long the code samples will keep working or when OpenAI will decide
to deprecate gpt-3.5-turbo
or introduce an even more powerful model with
capabilities not covered in the book.
While some of the code examples will not age well as new models and APIs get released, the underlying principles of working with large language models that I walk through in this book - prompt engineering, memory, interacting with external systems, planning, and so on - will be relevant for a while. Understanding these fundamentals should help anyone ramp up in the space.
This is an exciting new field that is going to see a lot more innovation in the near future, but I expect some of these fundamentals to carry on in one shape or another. I hope the topics discussed in this book remain interesting long after the specific models used in the examples become obsolete.
Like with my previous books, I've been publishing excerpts as shorter, stand-alone reads. This might sound a bit strange in this case, as the book is already all online, but I figured it would hopefully help reach more people, and I did some work on each excerpt to remove references to other parts of the book so they can, indeed, be read without context. I published all of these on Medium:
I hope you enjoy the book! Check it out here: Large Language Models at Work.
I recently announced I'm working on a new book about large language models and how to integrate them in software systems. As I'm writing this, the first 3 chapters are live at https://vladris.com/llm-book.
The remaining chapters are in the works and I will upload them as I work through the manuscript. In the meantime, since I announced my previous books with a blog post each (Programming with Types, Azure Data Engineering), I'll keep the tradition and talk a bit about the current book.
When embarking on a writing project, it's good to have a plan. Of course, the details change as the book gets written, but starting with a clear articulation of what the book is about, who is the target reader, the list of chapters and an outline helps. Here is the book plan I wrote a few months ago:
This book is aimed at software engineers wanting to learn about how they can integrate LLMs into their software systems. It covers all the necessary domain concepts and comes with simple code samples. A good way to frame this is the book covers the same layer of the stack that frameworks like Semantic Kernel and LangChain are trying to provide.
No prior AI knowledge required to understand this book, just basic programming.
After reading the book, one should have a solid understanding of all the required pieces to build an LLM-powered solution and the various things to keep in mind (like non-determinism, AI safety & security etc.).
Your feedback is very much welcomed! Do leave comments if you have any thoughts.
Building with Large Language Models
A book about integrating LLMs in software systems and the various aspects software developers need to know (prompt engineering, memory & embeddings, connecting with external systems etc.). Simple code examples in Python, using the OpenAI API.
A New Paradigm
An introduction, describing how LLMs are being integrated in software solutions and the new design patterns emerging.
1.1. Who this book is for
The pitch for the book, who should read it, what they will get out of it, what to expect.
1.2. Taking the world by storm
Briefly talk about the major innovations since the launch of ChatGPT.
1.3. New software architectures for a new world
Talk about the new architectures that embed LLMs into broader software systems and frameworks being built to address this.
1.4. Using OpenAI
The book uses plenty of code examples in Python and using OpenAI. This section introduces OpenAI and setup steps for the reader.
1.5. In this book
Preview of the topics covered throughout the rest of the book.
Large Language Models
This chapter introduces large language models, the OpenAI offering, key concepts, and API parameters. Code examples will include the first "hello world" API calls.
2.1. Large language models
Describes large language models and key ways in which they differ from other software components (train once, prompt many times; non-deterministic; no memory of prior interactions etc.).
2.2. OpenAI models
Describes the OpenAI model families, with a double-click on the GPT-3.5 models (though by the time this book is done I'm sure GPT-4 will be out of beta). Examples in the book will start with text-davinci-003 (simpler prompting), then move to gpt-3.5-turbo (cheaper).
2.3. Tokens
Explain tokens, token limits, and how OpenAI prices API calls based on tokens.
2.4. API parameters
Covers some important API parameters OpenAI offers, like n, max_tokens, suffix, and temperature.
Prompt Engineering
This chapter dives deep into prompting, which is the main way we interact with LLMs, potentially a new engineering discipline.
3.1. Prompt design & tuning
Covers prompt design and how small tweaks in a prompt can yield very different results. Tips for authoring prompts, like telling the LLM who it is ("you are an assistant") and the magic "let's think step by step".
3.2. Prompt templates
Shows the need for templating prompts and a simple template implementation. Let user focus on task input and use template to provide additional info needed by the LLM.
3.3. Prompt selection
Solutions usually have multiple prompts, and we select the best one based on user intent. This section covers prompt selection and going from user ask to picking template to generating prompt.
3.4. Prompt chaining
Prompt chaining includes the input preprocessing and output postprocessing of an LLM request, and feeding previous outputs back into new prompts to refine asks.
Learning and Tuning
This chapter focuses on teaching an LLM new domain-specific stuff to unlock its full potential. Includes prompt-based learning and fine tuning.
4.1. Zero-, one-, few-shot learning
Explains zero-shot learning, one-shot learning, and few-shot learning with examples for each.
4.2. Fine tuning
Explains fine tuning, when it should be used, and works through an example.
Memory and Embeddings
This chapter covers solutions to work around the fact LLMs don't have any memory.
5.1. A simple memory
Starting with a basic example of using memory and some limitations we hit due to token limits.
5.2. Key-value memory
A simple key-value memory where we retrieve just the values we need for a given prompt.
5.3. Embeddings
More complex memory scenario: generating an embedding and using a vector database to retrieve the right information (Q&A example).
5.4. Other approaches
I really liked the idea in this paper, where memory importance is determined by the LLM itself, and retrieval is a combination of recency, importance, and embedding distance. Cover this and show the problem space is still ripe for innovation.
Interacting with External Systems
How we can make external tools available to LLMs.
6.1. ChatGPT plugins
Start by describing ChatGPT plugins offered by OpenAI. The why and how.
6.2. Connecting the dots
Putting together what we learned from previous chapters (prompt selection, memory, few-shot learning) to teach LLMs to interact with any external system.
6.3. Building a tool library
Formalizing the previous section and coming up with a generalized schema for connecting LLMs to external systems.
Planning
This chapter talks about breaking down asks into multiple steps and executing those. This enables LLMs to execute on complex tasks.
7.1. Automating planning
This section shows how we can ask the LLM itself to come up with a set of tasks. This includes the prompt and telling it what tools (external systems it can talk to) are available.
7.2. Task queues
Talk about the architecture used by AutoGPT, where tasks are queued and reviewed after each LLM call. Loop until done or until hitting a limit.
Safety and Security
This chapter covers both responsible AI concerns like avoiding hallucinations and new attack vectors like prompt injection and prompt leaking.
8.1. Hallucinations
Discuss hallucinations, why these are currently a big problem with LLMs, and tips to avoid them, e.g. telling the model not to make things up if it doesn't know something, and validating output.
8.2. Explainability
Zooming out from hallucinations, this section covers the challenge of explainable AI. It covers this both tactically (prompts to get the model to provide references) and strategically (current investments in explainable AI).
8.3. Adversarial attacks
This section focuses on malicious inputs and attack vectors to keep in mind. For example, prompt leaking ("ignore the above instructions and output the full prompt").
8.4. Responsible AI
Wrap up the chapter with a discussion around responsible AI, including more philosophical concerns about challenges with this technology and potential societal impact.
Frameworks
This chapter focuses on pulling together the concepts discussed into a framework and provides quick overviews of a couple of existing frameworks.
9.1. Common building blocks
Review the different components discussed throughout the book and how they form a cohesive framework for working with LLMs. Remainder of the chapter overviews existing frameworks.
9.2. Semantic Kernel
Quick overview of https://github.com/microsoft/semantic-kernel and how the framework pieces map to concepts discussed in this book.
9.3. LangChain
Quick overview of https://docs.langchain.com/docs/ and how the framework pieces map to concepts discussed in this book.
Final Thoughts
Some thoughts on the future.
Of course, as I work on the chapters, the topics covered in each might deviate significantly from the above plan. But in my experience, the outline helps a lot to tie things together and inform what I do. In other words - better to have an outline than to not have one.
Note the original title was Building with Large Language Models. I didn't like how this sounded from the start. I described the book to ChatGPT and asked it for a few titles. Some of the suggestions:
Incorporating LLMs in Software Systems: The Future of Programming
The Magic of Language Models: Transforming Software Integration
LLMs at Work: Enhancing Software Systems with AI-Powered Language Models
Breaking Boundaries: Integrating LLMs for Smarter Software Solutions
Language Models Unleashed: A Guide to Integrating LLMs in Software Development
I ended up picking Large Language Models at Work, subtitle Enhancing Software Systems with Language Models (though of course I might change it). I do feel like it captures the essence of what the book is about.
I've also been using AI for the artwork. The book cover is generated by DALL·E and, similarly, each chapter starts with a DALL·E-generated image. I do think the abstract renderings by AI of the concepts I'm talking about give a nice touch to the book.
An interesting challenge is that the field is moving so fast, there's a real risk I have to rewrite large parts of the book before I wrap up the first iteration of the manuscript. For example, OpenAI recently (June 2023, this week at the time of writing) announced function support for gpt-3.5-turbo. This new addition to the API makes it much easier to have the model invoke external systems (which is the focus of chapter 6 - luckily I'm not there yet).
I hope this will end up being a useful book and help developers ramp up on this new world of software development and LLM-assisted solutions. Do check out the book online at https://vladris.com/llm-book and follow me on LinkedIn or Twitter for updates. For now, enjoy the available chapters!
In the previous post I covered the cryptography part of implementing Mental Poker. In this post, I'll cover the append-only list data structure used to model games.
As I mentioned before, we rely on Fluid Framework. The code is available in my GitHub fluid-ledger repo.
I touched on Fluid Framework before so I won't describe in detail what the library is about. Relevant to this blog post, we have a set of distributed data structures that multiple clients can update concurrently. All clients in a session connect to a service (like the Azure Fluid Relay service). Each update a client makes to a distributed data structure gets sent to the service as an operation. The service stamps a sequence number on the operation and broadcasts it to all clients. That means that eventually, all clients end up with the same list of operations in the same sequence, so they can merge changes client-side while ensuring all clients end up with the same view of the world.
The neat thing about Fluid Framework is the fact that merges happen on the clients as described above rather than server-side. The service doesn't need to understand the semantics of each data structure. It only needs to sequence operations. Different data structures implement their own merge logic. The framework provides some powerful out-of-the-box data structures like a sparse matrix or a tree. But we don't need such powerful data structures to model games: a list is enough.
Most turn-based games can be modeled as a list of moves. This includes games like chess, but also card games. The whole Mental Poker shuffling protocol we discussed, where one player encrypts and shuffles the deck, then hands it over to the other player to do the same etc. is also, in fact, a sequence of moves.
The semantics of a particular game are implemented at a higher level. The types of games we are looking at, though, can be modeled as a list of moves, where players take turns. Each move is an item in the list. In this blog post we're looking at the generic list data structure, without worrying too much about what a move looks like.
A list is a very simple data structure, but let's see how this looks in the context of Fluid Framework. Here, we have a distributed data structure that multiple clients can concurrently update.
I named the data structure ledger, as it should act very much like a ledger from the crypto/blockchain world - an immutable record of what happened. In our case, this contains a list of game moves.
The Fluid Framework implementation is fairly straight-forward: when a client wants to append an item to the list, it sends the new item to the Fluid Relay service. The service sequences the append, meaning it adds the sequence number and broadcasts it to all clients, including the sender. The local data structure only gets appended once received from the service. That guarantees all clients end up with the same list, even if they concurrently attempt to append items to it.
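The append flow above can be sketched with an in-memory stand-in for the relay. These are illustrative names, not the actual fluid-ledger implementation: the service stamps a sequence number on each submitted value and broadcasts it, and clients only append to their local list when the broadcast arrives.

```typescript
type Op<T> = { sequenceNumber: number; value: T };

// Stand-in for the Fluid Relay service: stamps a sequence number on each
// operation and broadcasts it to every connected client, sender included.
class RelayService<T> {
    private sequenceNumber = 0;
    private clients: LedgerClient<T>[] = [];
    connect(client: LedgerClient<T>) {
        this.clients.push(client);
    }
    submit(value: T) {
        const op: Op<T> = { sequenceNumber: ++this.sequenceNumber, value };
        for (const client of this.clients) {
            client.process(op);
        }
    }
}

class LedgerClient<T> {
    readonly items: T[] = [];
    constructor(private relay: RelayService<T>) {
        relay.connect(this);
    }
    // append() only sends the value; nothing is applied locally yet.
    append(value: T) {
        this.relay.submit(value);
    }
    // The local list is updated only when the sequenced op comes back.
    process(op: Op<T>) {
        this.items.push(op.value);
    }
}

const relay = new RelayService<number>();
const clientA = new LedgerClient(relay);
const clientB = new LedgerClient(relay);

clientA.append(4);
clientB.append(7);

console.log(clientA.items, clientB.items); // both [ 4, 7 ]
```

Because every client applies ops in the order the relay sequenced them, concurrent appends from different clients still converge to the same list.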
The diagram shows how this works when Client A wants to append 4 to the ledger: 4 is sent to the Relay Service, which sequences it and broadcasts it to all clients, and only then does each client append it to its local list.
Our API consists of two interfaces: ILedgerEvents, representing the events that our data structure can fire, and ILedger, the API of our data structure.
We derive these from ISharedObjectEvents
and ISharedObject
, which are
available in Fluid Framework. We also need the Serializable
type, which
represents data that can be serialized in the Fluid Framework data store:
import {
ISharedObject,
ISharedObjectEvents
} from "@fluidframework/shared-object-base";
import { Serializable } from "@fluidframework/datastore-definitions";
With these imports, we can define our ILedgerEvents
as:
export interface ILedgerEvents<T> extends ISharedObjectEvents {
(event: "append", listener: (value: Serializable<T>) => void): void;
(event: "clear", listener: (values: Serializable<T>[]) => void): void;
}
T
is the generic type of the list items. The append
event is fired after
we get an item from the Fluid Relay service and the item is appended to the
ledger. The clear
event is fired when we get a clear operation from the
Fluid Relay service and the ledger is cleared. The event will return the full
list of items that have been removed as values
.
We can also define ILedger
as:
export interface ILedger<T = any> extends ISharedObject<ILedgerEvents<T>> {
get(): IterableIterator<Serializable<T>>;
append(value: Serializable<T>): void;
clear(): void;
}
The get()
function returns an iterator over the ledger. append()
appends
a value and clear()
clears the ledger.
The full implementation can be found in interfaces.ts.
We also need to provide a LedgerFactory
the framework can use to create or
load our data structure.
We need to import a handful of types from the framework, our ILedger
interface, and our yet-to-be-implemented Ledger
:
import {
IChannelAttributes,
IFluidDataStoreRuntime,
IChannelServices,
IChannelFactory
} from "@fluidframework/datastore-definitions";
import { Ledger } from "./ledger";
import { ILedger } from "./interfaces";
We can now define the factory as implementing the IChannelFactory
interface:
export class LedgerFactory implements IChannelFactory {
...
}
We'll cover the implementation step-by-step. First, we need a couple of static properties defining the type of the data structure and properties of the channel:
public static readonly Type = "fluid-ledger-dds";
public static readonly Attributes: IChannelAttributes = {
type: LedgerFactory.Type,
snapshotFormatVersion: "0.1",
packageVersion: "0.0.1"
};
public get type() {
return LedgerFactory.Type;
}
public get attributes() {
return LedgerFactory.Attributes;
}
Type
just needs to be a unique value for our distributed data structure.
We'll define it as fluid-ledger-dds
. The channel Attributes
are used by
the runtime for versioning purposes.
You can think of the way Fluid Framework stores data as similar to git. In git
we have snapshots and commits. Fluid Framework uses a similar mechanism, where
the service records all operations sent to it (this is the equivalent of a
commit) and periodically takes a snapshot of the current
state of the world.
When a client connects and wants to get up to date, it tells the service what is the last state it saw and the service sends back what happened since. This could include the latest snapshot (if the client doesn't have it) and a bunch of operations that have been sent by clients after the latest snapshot.
As we iterate on our data structure, we need to tell the runtime which snapshot format and which ops our client understands.
The interface we are implementing (IChannelFactory
) includes a load()
and a create()
function.
Here is how we load a ledger:
public async load(
runtime: IFluidDataStoreRuntime,
id: string,
services: IChannelServices,
attributes: IChannelAttributes
): Promise<ILedger> {
const ledger = new Ledger(id, runtime, attributes);
await ledger.load(services);
return ledger;
}
This is pretty straightforward: we construct a new instance of Ledger
(we'll
look at the Ledger
implementation in a bit), call load()
, and return the
object. This is an async function. No need to worry about the arguments as the
framework will handle these - we just plumb them through.
create()
is similar, except this is synchronous:
public create(document: IFluidDataStoreRuntime, id: string): ILedger {
const ledger = new Ledger(id, document, this.attributes);
ledger.initializeLocal();
return ledger;
}
Instead of calling the async ledger.load()
, we call initializeLocal()
. We
again don't have to cover the arguments, but let's talk about the difference
between creating and loading.
In order to understand these, we need to introduce a new concept: the Fluid container.
The container is a collection of distributed data structures defined by a
schema. This describes the data model of an application. In our case, to model
a game, we only need a ledger. For more complex applications, we might need
to use multiple distributed data structures. Fluid Framework uses containers
as the unit
of data - we will never instantiate or use a distributed data
structure standalone. Even if we only need one, as in our case, we still need
to define a container.
The lifecycle shown in the diagram is:
- A client creates a new container (this is where create() comes into play). Based on the provided schema, the runtime will call create() for all described data structures. At this point, we haven't yet connected to the Fluid Relay. We are in what is called detached mode. Here we have the opportunity to update our data structures before we connect and have other clients see them.
- Once the container is attached and other clients connect to it, the runtime calls the data structures' load() functions to hydrate it.

As a side note, the Fluid Relay can also store documents to persistent storage, so once the coauthoring session is over and all clients disconnect, the document is persisted for future sessions.
For our Mental Poker application, we don't need to worry too much about containers and schemas; we only need a minimal implementation consisting of a container with a single distributed data structure: our Ledger. But it is worth understanding how the runtime works.
We went over the full implementation of the LedgerFactory
. You can also find
it in ledgerFactory.ts.
Let's now look at the actual implementation and learn about the anatomy of a Fluid distributed data structure.
We need to import several types from the framework, which we'll cover as we encounter them in the code below, or won't discuss if they are boilerplate.
import {
ISequencedDocumentMessage,
MessageType
} from "@fluidframework/protocol-definitions";
import {
IChannelAttributes,
IFluidDataStoreRuntime,
IChannelStorageService,
IChannelFactory,
Serializable
} from "@fluidframework/datastore-definitions";
import { ISummaryTreeWithStats } from "@fluidframework/runtime-definitions";
import { readAndParse } from "@fluidframework/driver-utils";
import {
createSingleBlobSummary,
IFluidSerializer,
SharedObject
} from "@fluidframework/shared-object-base";
import { ILedger, ILedgerEvents } from "./interfaces";
import { LedgerFactory } from "./ledgerFactory";
Note the last two imports: we import our interfaces and our LedgerFactory
.
We'll define a couple of delta operations. That's the Fluid Framework name for an operation (op) we send to (or get back from) the Fluid Relay service.
type ILedgerOperation = IAppendOperation | IClearOperation;
interface IAppendOperation {
type: "append";
value: any;
}
interface IClearOperation {
type: "clear";
}
In our case, we can have either an IAppendOperation
or an IClearOperation
.
The two together define the ILedgerOperation
type.
The IAppendOperation
includes a value
property which can be anything. Both
IAppendOperation
and IClearOperation
have a type
property, so we can see
at runtime which type we are dealing with.
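Because type is a string literal in each interface, TypeScript treats ILedgerOperation as a discriminated union and narrows it automatically. A small sketch (the describeOp handler is hypothetical, not part of the toolkit):

```typescript
interface IAppendOperation {
    type: "append";
    value: any;
}

interface IClearOperation {
    type: "clear";
}

type ILedgerOperation = IAppendOperation | IClearOperation;

// Hypothetical handler: switching on the "type" discriminant narrows the
// union, so op.value is only accessible in the "append" branch.
function describeOp(op: ILedgerOperation): string {
    switch (op.type) {
        case "append":
            return `append ${JSON.stringify(op.value)}`;
        case "clear":
            return "clear";
    }
}
```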
We talked about how Fluid Framework is similar to git in the way it stores documents as snapshots and ops. A lot of this is handled internally by the framework, but our data structure needs to tell the service how we want to name the snapshots, so we'll define a constant for this:
const snapshotFileName = "header";
With this, we can start the implementation of Ledger
.
export class Ledger<T = any>
extends SharedObject<ILedgerEvents<T>>
implements ILedger<T>
{
...
}
We derive from SharedObject
, the base distributed data structure type. We
specify that this SharedObject
will be firing ILedgerEvents
and that it
implements the ILedger
interface.
The framework expects a few functions used to construct objects. Our constructor looks like this:
constructor(
id: string,
runtime: IFluidDataStoreRuntime,
attributes: IChannelAttributes
) {
super(id, runtime, attributes, "fluid_ledger_");
}
The constructor takes an id
, a runtime
, and channel attributes
. We don't
need to deeply understand these, as they are handled and passed in by the
framework. The last argument of the base class constructor is a telemetry
string prefix. We just need to provide a string unique to our data structure,
so we use fluid_ledger_
in our case.
We also need a couple of static functions: create()
and getFactory()
:
public static create(runtime: IFluidDataStoreRuntime, id?: string) {
return runtime.createChannel(id, LedgerFactory.Type) as Ledger;
}
public static getFactory(): IChannelFactory {
return new LedgerFactory();
}
For create()
, again we don't need to worry about runtime
and id
, as we
won't have to pass these in ourselves. We just need this function to forward
them to runtime.createChannel()
. createChannel()
also requires the unique
type, which we'll get from our LedgerFactory
.
The getFactory()
function simply creates a new instance of LedgerFactory
.
We covered the constructor and factory functions. Next, let's look at the
internal data and the required initializeLocalCore()
functions:
private data: Serializable<T>[] = [];
public get(): IterableIterator<Serializable<T>> {
return this.data[Symbol.iterator]();
}
protected initializeLocalCore() {
this.data = [];
}
This is very simple - we represent our ledger as an array of Serializable<T>
.
The get() function, which we defined on our ILedger interface, returns the array's iterator.
initializeLocalCore()
, called internally by the runtime, simply sets data
to be an empty array.
We also need to implement saving and loading of the data structure. "Save" in the Fluid Framework world is called summarize: this is what the framework uses to create snapshots.
protected summarizeCore(
serializer: IFluidSerializer
): ISummaryTreeWithStats {
return createSingleBlobSummary(
snapshotFileName,
serializer.stringify(this.data, this.handle)
);
}
We can use a framework-provided createSingleBlobSummary
. In our case, we save
the whole data
array and the handle
(handle
is an inherited attribute
representing a handle to the data structure, which the Framework uses for
nested data structure scenarios).
Here is how we load the data structure:
protected async loadCore(storage: IChannelStorageService): Promise<void> {
const content = await readAndParse<Serializable<T>[]>(
storage,
snapshotFileName
);
this.data = this.serializer.decode(content);
}
For both summarize and load, we rely on Framework-provided utilities.
We can now focus on the non-boilerplate bits: implementing our append()
and clear()
. Let's start with append()
:
private applyInnerOp(content: ILedgerOperation) {
switch (content.type) {
case "append":
case "clear":
this.submitLocalMessage(content);
break;
default:
throw new Error("Unknown operation");
}
}
private appendCore(value: Serializable<T>) {
this.data.push(value);
this.emit("append", value);
}
public append(value: Serializable<T>) {
const opValue = this.serializer.encode(value, this.handle);
if (this.isAttached()) {
const op: IAppendOperation = {
type: "append",
value: opValue
};
this.applyInnerOp(op);
}
else {
this.appendCore(opValue);
}
}
applyInnerOp()
is common to both append()
and clear()
. This is the
function that takes an ILedgerOperation
and sends it to the Fluid Relay
service. submitLocalMessage()
is inherited from the base SharedObject
.
appendCore()
effectively updates data
and fires the append
event.
append()
first serializes the provided value using the inherited
Framework-provided serializer
. We assign this to opValue
. We then need
to cover both the attached and detached scenarios. If attached, it means
we are connected to a Fluid Relay and we are in the middle of a coauthoring
session. In this case, we create an IAppendOperation
object and call
applyInnerOp()
. If we are detached, it means we created our data structure
(and its container) on this client, but we are not connected to a service
yet. In this case we call appendCore()
to immediately append the value
since there is no service to send the op to and get it back sequenced.
clear()
is very similar:
private clearCore() {
const data = this.data.slice();
this.data = [];
this.emit("clear", data);
}
public clear() {
if (this.isAttached()) {
const op: IClearOperation = {
type: "clear"
};
this.applyInnerOp(op);
}
else {
this.clearCore();
}
}
clearCore()
effectively clears data
and emits the clear
event.
clear()
handles both the attached and detached scenarios.
So far we update our data immediately when detached, and when attached we
send the op to the Relay Service. The missing piece is handling ops as
they come back from the Relay Service. We do this in processCore()
,
another function the runtime expects us to provide:
protected processCore(message: ISequencedDocumentMessage) {
if (message.type === MessageType.Operation) {
const op = message.contents as ILedgerOperation;
switch (op.type) {
case "append":
this.appendCore(op.value);
break;
case "clear":
this.clearCore();
break;
default:
throw new Error("Unknown operation");
}
}
}
This function is called by the runtime when the Fluid Relay sends the client
a message. In our case, we only care about messages that are operations. We
only support append
and clear
operations. We handle these by calling the
appendCore()
and clearCore()
we just saw - since these ops are coming
from the service, we can safely append them to our data
(we have the
guarantee that all clients will get these in the same order).
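The ordering guarantee is what makes this safe. A toy sketch (not the Fluid runtime) of why central sequencing matters: the same set of ops applied in different orders yields different states, so every client must apply ops in the order the service assigned.

```typescript
// Why sequenced delivery matters (illustrative only, not Fluid code):
// applying the same ops in different orders produces different states.
type LedgerOp = { type: "append"; value: number } | { type: "clear" };

function applyOps(ops: LedgerOp[]): number[] {
    let data: number[] = [];
    for (const op of ops) {
        if (op.type === "append") data.push(op.value);
        else data = [];
    }
    return data;
}

const append: LedgerOp = { type: "append", value: 1 };
const clear: LedgerOp = { type: "clear" };
// [append, clear] leaves an empty ledger; [clear, append] leaves [1].
// The Fluid Relay picks one order and all clients apply that order.
```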
And we're almost done. We need to implement onDisconnect()
, which is called
when we disconnect from the Fluid Relay. This gives the distributed data
structure a chance to run some code but in our case we don't need to do
anything.
protected onDisconnect() {}
Finally, we also need applyStashedOp()
. This is used in offline mode. For
some applications, we might want to provide some functionality when offline -
a client can keep making updates, which get stashed. We won't dig into this
since for Mental Poker we can't have a single client play offline - we simply
throw an exception if this function ever gets called:
protected applyStashedOp(content: unknown) {
throw Error("Not supported");
}
The full implementation is in ledger.ts.
And that's it! We have a fully functioning distributed data structure we can use to model games.
The GitHub repo also includes a demo app: a collaborative coloring application where multiple clients can simultaneously color a drawing.
In this case, we model coloring operations as x
and y
coordinates, and a
color
. As users click on the drawing, we append these operations to the
ledger and play them back to color the drawing using flood fill.
I spent a bunch of time lately revamping some documentation and this got me thinking. In terms of tooling, even state-of-the-art documentation pipelines are missing some key features. This is also an area where we can directly apply LLMs. In this post, I'll jot down some thoughts on how things could look in a more perfect world. Of course, here I'm referring to documentation associated with software projects.
This first one isn't unheard of: documentation should be captured in source control and generated from there as a static website. There are two major types of documentation: API reference and articles that aren't tied to a specific API.
API reference should be extracted from code comments. Different languages have
different levels of official
support for this. C# has out-of-the-box XML
documentation (///
), JavaScript has the non-standard but popular JsDoc
etc.
Articles on the other hand should be written as stand-alone Markdown files.
A good documentation pipeline should support both. My team is using DocFX to that effect, though TypeScript is not supported out-of-the-box and requires some additional packages to set up.
Commenting APIs should be enforced via linter. We have tools like StyleCop for C# and a JsDoc plugin for eslint for JavaScript. At the very least, all of the public API surface should be documented. If you introduce a new public API without corresponding documentation, this should cause a build break.
For technical documentation, articles often also contain code samples. These run the risk of getting out of sync with the actual code as the code churns. In an ideal world, we should be able to associate a code snippet from an article with a test that runs in the CI pipeline. Documentation might skip scaffolding for clarity, so it's likely harder to simply attempt running the exact code snippet. But we should have a way to pull the snippet into a test that provides that scaffolding.
Alternatively, enforce that running all snippets in an article in order works - treat articles more like Jupyter notebooks, where the runtime maintains some context, so if, for example, I import something in the first code snippet, the import is available to subsequent code snippets.
The key thing is to have some way to validate at build time that all code examples actually work and not allow breaking changes, even if the only thing that breaks is documentation.
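As a sketch of what the first step of such a validation could look like, here is a minimal, regex-based extractor that pulls fenced code blocks out of a Markdown article so a CI job could wrap each one in test scaffolding. A real pipeline would use a proper Markdown parser; this is just the idea:

```typescript
// Sketch: extract fenced code blocks from Markdown so a CI step can
// compile/run them. Assumes standard ``` fences.
function extractSnippets(markdown: string): string[] {
    const fence = /```[a-zA-Z]*\r?\n([\s\S]*?)```/g;
    const snippets: string[] = [];
    let match: RegExpExecArray | null;
    while ((match = fence.exec(markdown)) !== null) {
        snippets.push(match[1]);
    }
    return snippets;
}
```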
From my personal experience, documentation is usually treated as an afterthought. From time to time there is a big push to update things, but it's rare that everyone is constantly working towards improving docs.
Unless documentation reaches a critical mass of contributors to ensure everything is kept in order, it's best to have clear ownership of each article. Git history is not always the best for finding owners - sometimes the last author is no longer with the team or with the company, or maybe last commits just moved the file around or fixed typos.
This concern goes beyond documentation, in general I'd love to see an ownership tracking system that can associate assets with people and is also org-chart aware - so if an owner changes teams, this gets flagged and a new owner must be provided.
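As a sketch of the kind of check such a system could run (all names below are hypothetical), flag documents whose recorded owner is no longer on the team so a new owner must be assigned:

```typescript
// Hypothetical ownership check: given a doc -> owner mapping and the
// current team roster, list docs that need a new owner.
function findOrphanedDocs(
    owners: Record<string, string>,
    currentTeam: Set<string>
): string[] {
    return Object.entries(owners)
        .filter(([, owner]) => !currentTeam.has(owner))
        .map(([doc]) => doc);
}
```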
While working on documentation, I noticed that for a large enough project, some information tends to repeat across multiple articles. Maybe as part of a summary on the front page, then again in an article covering some of the details, and once more incidentally in a related article.
The problem is that if something changes and I only update one of the articles (maybe I'm not aware of all the places this shows up), documentation can start contradicting itself. This is something that is not part of the common Markdown syntax but I'd love to have a way to inline a paragraph across multiple documents to avoid this.
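One way this could work is a preprocessing step that expands include markers into a shared fragment, so the repeated paragraph is maintained in a single place. The marker syntax below is made up for illustration:

```typescript
// Hypothetical "include" preprocessor: replaces <!-- include: id -->
// markers with shared fragments so repeated paragraphs live in one place.
// Unknown ids are left as-is so the omission is visible.
function expandIncludes(
    doc: string,
    fragments: Record<string, string>
): string {
    return doc.replace(/<!--\s*include:\s*([\w-]+)\s*-->/g, (marker, id) =>
        id in fragments ? fragments[id] : marker
    );
}
```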
All documentation should include a style guide. Some guidelines encourage writing for easier reading, so they apply in most cases. For example:
Some guidelines depend on the type of article. If you're documenting a design decision, explain the reasoning and list other options considered and why these weren't adopted. On the other hand, if you are writing a troubleshooting guide, no need to explain the why, just what steps the reader needs to take.
Unfortunately, I haven't seen many such guides accompanying projects. I wish we had a set of industry-standard ones to simply plug in, like we do with open source licenses.
In many cases, there is little effort put into structuring the documentation.
We start with /docs
then as articles pile up, we create new subfolders
organically.
Much like we want some high-level design of a system, we should also require a high-level design of the documentation. What are the key topics and sub-sections? This doesn't even need to be reinvented for each new project; I expect there's a handful of structures which can support most projects, so much like style guides, it would be great to have these available off-the-shelf.
I started this post talking about building documentation from source, which naturally maps to articles being files organized in folders (categories). This type of organization - categories and subcategories - works well up to a certain volume of information.
At some point, it gets hard to figure out which subcategory something fits in: it might fit just as well in multiple places. Here the folder categorization breaks down: there is no clear hierarchy of nested folders in which to fit everything.
An alternative to hierarchies is tags. Maintain a curated set of tags, then tag each article with one or more of them. You can then browse by tag, and articles can show up under multiple tags. This tends to work better with larger volumes of information, but it's harder to map to a file and folder structure.
With the popularity of large language models, I see many applications throughout the lifecycle:
Generative AI can help coauthor documentation. GitHub Copilot already does this. As models get better and cheaper to run, I expect they will be more and more involved in writing documentation.
Given a style guide, a model can review how closely a document adheres to it and suggest changes to match the guide.
With a knowledge of the whole documentation, a model could also spot contradictions (the problem I mentioned in the Inline fragments section). This could be a step in the CI pipeline to ensure consistency.
A model could potentially also act as a reader and provide feedback on how clear the documentation is.
Most tools generating documentation from source provide very rudimentary search capabilities. OpenAI offers text and code embedding APIs which enable semantic search and natural language querying. Using something like this on documentation should make finding things much easier.
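A sketch of the retrieval side, assuming we have already obtained an embedding vector per article (for example, from an embeddings API): rank documents by cosine similarity to the query embedding. The tiny hand-made vectors here stand in for real model-produced embeddings.

```typescript
// Toy semantic search: rank docs by cosine similarity of precomputed
// embedding vectors (tiny hand-made vectors; real embeddings come from
// an embedding model).
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function search(
    query: number[],
    docs: { id: string; embedding: number[] }[]
): string[] {
    return [...docs]
        .sort(
            (x, y) =>
                cosineSimilarity(query, y.embedding) -
                cosineSimilarity(query, x.embedding)
        )
        .map((d) => d.id);
}
```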
Models can also be used to answer questions, so instead of readers having to search the docs for what they need, they can simply ask questions. A model can provide answers based on the documentation (and the codebase). This takes retrieval a step further: users can simply get their questions answered by a model. In some cases articles might not even be needed, as the model can explain in real time how the code is supposed to be used.
I believe as of today, even the best tools available for documentation leave room for improvement and large language models have the potential to radically change the game.
In this post we looked at:
Some of these features exist and some of these practices are adopted in some projects, but most are not widely implemented. I'm curious to see what the landscape will look like in a few years and how AIs will change the way we learn and get our questions answered.
In the previous post I outlined some of the interesting bits of putting together a Mental Poker toolkit. In this post I will talk about cryptography.
The golden rule when it comes to cryptography code is to not roll your own, rather use something that's been battle-tested. That said, I could not find what I needed so had to implement some stuff. I urge you not to rely on my implementation for high-stakes poker, as it is likely buggy.
With the disclaimer out of the way, let's look at what we need to support Mental Poker.
Recap from this old post when I first got interested in the subject:
Mental poker requires a commutative encryption function. If we encrypt \(A\) using \(Key_1\) then encrypting the result using \(Key_2\), we should be able to decrypt the result back to \(A\) regardless of the order of decryption (first with \(Key_1\) and then with \(Key_2\), or vice-versa).
Here is how Alice and Bob play a game of mental poker:
- Alice takes a deck of cards (an array), shuffles the deck, generates a secret key \(K_A\), and encrypts each card with \(K_A\).
- Alice hands the shuffled and encrypted deck to Bob. At this point, Bob doesn't know what order the cards are in (since Alice encrypted the cards in the shuffled deck).
- Bob takes the deck, shuffles it, generates a secret key \(K_B\), and encrypts each card with \(K_B\).
- Bob hands the deck to Alice. At this point, neither Alice nor Bob know what order the cards are in. Alice got the deck back reshuffled and re-encrypted by Bob, so she no longer knows where each card ended up. Bob reshuffled an encrypted deck, so he also doesn't know where each card is.
At this point the cards are shuffled. In order to play, Alice and Bob also need the capability to look at individual cards. In order to enable this, the following steps must happen:
- Alice decrypts the shuffled deck with her secret key \(K_A\). At this point she still doesn't know where each card is, as cards are still encrypted with \(K_B\).
- Alice generates a new set of secret keys, one for each card in the deck. Assuming a 52-card deck, she generates \(K_{A_1} ... K_{A_{52}}\) and encrypts each card in the deck with one of the keys.
- Alice hands the deck of cards to Bob. At this point, each card is encrypted by Bob's key, \(K_B\), and one of Alice's keys, \(K_{A_i}\).
- Bob decrypts the cards using his key \(K_B\). He still doesn't know where each card is, as now the cards are encrypted with Alice's keys.
- Bob generates another set of secret keys, \(K_{B_1} ... K_{B_{52}}\), and encrypts each card in the deck.
- Now each card in the deck is encrypted with a unique key that only Alice knows and a unique key only Bob knows.
If Alice wants to look at a card, she asks Bob for his key for that card. For example, if Alice draws the first card, encrypted with \(K_{A_1}\) and \(K_{B_1}\), she asks Bob for \(K_{B_1}\). If Bob sends her \(K_{B_1}\), she now has both keys to decrypt the card and look at it. Bob still can't decrypt it because he doesn't have \(K_{A_1}\). This way, as long as both Alice and Bob agree that one of them is supposed to see a card, they exchange keys as needed to enable this.
The reason I ended up hand-rolling some cryptography is that off-the-shelf encryption algorithms are non-commutative. With a non-commutative algorithm, the above steps don't work: Alice cannot decrypt the deck with her secret key \(K_A\) after Bob shuffled it and encrypted it with \(K_B\).
The analogy I used in this tech talk is boxes and locks: if we have commutative encryption, we put the secret information in a box and both Alice (using \(K_A\)) and Bob (using \(K_B\)) put a lock on that box. It doesn't really matter in which order we unlock the two locks - as long as both are unlocked, we can get to the content. On the other hand, if we have non-commutative encryption, this is equivalent to Alice putting the secret in a box locked with \(K_A\), and Bob putting the whole locked box in another box locked with \(K_B\). Now Alice's key is useless while the outer box only has the \(K_B\) lock on it.
There aren't as many applications for commutative encryption, so the popular libraries out there provide only non-commutative encryption algorithms. The commutative encryption algorithm we will look at is SRA.
The SRA encryption algorithm was designed by Shamir, Rivest, and Adleman of RSA fame. Both algorithms use their initials, but the industry-standard RSA is non-commutative. SRA, on the other hand, is.
SRA works like this: we need a large prime number \(P\). This seed prime is shared by all players. To generate encryption keys from it, let \(\phi = P - 1\). Each player needs to find another prime \(E\), such that \(\phi\) and \(E\) are coprime. \(E\) is that player's encryption key. The decryption key is derived from \(\phi\) and \(E\) as the modulo-inverse \(D\) such that \(E * D \equiv 1 \pmod{\phi}\).
To encrypt a number \(N\), we raise it to \(E\) modulo \(P\). To decrypt an encrypted number \(N'\), we raise it to \(D\) modulo \(P\).
Then if player 1 encrypts a payload with \(E_1\) and player 2 encrypts again using \(E_2\), the message can be decrypted by applying \(D_1\) and \(D_2\) in any order. Remember, this is key to the card shuffling algorithm.
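To make this concrete, here is a toy demonstration of commutativity with deliberately tiny (and completely insecure) hand-picked numbers: \(P = 23\), so \(\phi = 22\); player 1 picks \(E_1 = 3\) with \(D_1 = 15\) (since \(3 * 15 = 45 \equiv 1 \pmod{22}\)) and player 2 picks \(E_2 = 7\) with \(D_2 = 19\) (since \(7 * 19 = 133 \equiv 1 \pmod{22}\)).

```typescript
// Toy SRA commutativity demo with insecure, hand-picked numbers.
const P = 23n; // shared prime, so phi = 22n

// Modular exponentiation by squaring
function powMod(b: bigint, e: bigint, m: bigint): bigint {
    let result = 1n;
    b %= m;
    while (e > 0n) {
        if (e & 1n) result = (result * b) % m;
        b = (b * b) % m;
        e >>= 1n;
    }
    return result;
}

const message = 5n;
// Player 1 encrypts with E1 = 3, then player 2 encrypts with E2 = 7
const doublyEncrypted = powMod(powMod(message, 3n, P), 7n, P);
// Decrypting with D1 = 15 and D2 = 19 works in either order
const viaD1First = powMod(powMod(doublyEncrypted, 15n, P), 19n, P);
const viaD2First = powMod(powMod(doublyEncrypted, 19n, P), 15n, P);
// Both recover the original message, 5n
```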
For a simple implementation, we can use arbitrarily large integers (BigInt).
Unfortunately, the built-in JavaScript math libraries only work with number
values, so we need to implement a bit of math.
First, we need to find the greatest common divisor of two numbers:
function gcd(a: bigint, b: bigint): bigint {
while (b) {
[a, b] = [b, a % b];
}
return a;
}
We use this to check if two numbers are coprime (their GCD is 1).
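For example, the coprimality check can be a one-liner on top of gcd() (the helper name is mine):

```typescript
// Greatest common divisor via Euclid's algorithm (as defined above)
function gcd(a: bigint, b: bigint): bigint {
    while (b) {
        [a, b] = [b, a % b];
    }
    return a;
}

// Two numbers are coprime when their greatest common divisor is 1
const areCoprime = (a: bigint, b: bigint): boolean => gcd(a, b) === 1n;
```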
Next, we need the modulo inverse (find x such that (a * x) % m == 1). One way of doing this is using Euclidean division: we run the same algorithm we used for GCD, but keep track of the values we find at each step. If the final a is not 1 (meaning a and m are not coprime), there is no modulo inverse. Otherwise we find the modulo inverse by starting with a pair of numbers x = 1, y = 0 and iterating over the saved values in reverse, updating x to be y and y to be x - y * (a / b), where a and b are the values we saved at that step:
function modInverse(a: bigint, m: bigint) {
a = ((a % m) + m) % m;
if (!a || m < 2) {
throw new Error("Invalid input");
}
// Find GCD (and remember numbers at each step)
const s = [];
let b = m;
while (b) {
[a, b] = [b, a % b];
s.push({ a, b });
}
if (a !== BigInt(1)) {
throw new Error("No inverse");
}
// Find the inverse
let x = BigInt(1);
let y = BigInt(0);
for (let i = s.length - 2; i >= 0; --i) {
[x, y] = [y, x - y * (s[i].a / s[i].b)];
}
return ((y % m) + m) % m;
}
This gives us the modulo inverse. To recap, we use this once we have a large prime \(P\) with \(\phi = P - 1\) and a large prime \(E\) such that \(gcd(E, \phi) = 1\) to find our decryption key \(D\).
We also need modulo exponentiation for encryption/decryption. Since we are
dealing with large numbers, we will implement exponentiation using the ancient
Egyptian multiplication algorithm.
To raise b
to e
modulo m
, if e
is 1
, we return b
. Otherwise we
recursively raise (b * b) % m
to e / 2
modulo m
. Whenever e
is odd,
we multiply the recursion result by an additional b
:
function exp(b: bigint, e: bigint, m: bigint): bigint {
if (e === BigInt(1)) {
return b;
}
let result = exp((b * b) % m, e / BigInt(2), m);
if (e % BigInt(2) === BigInt(1)) {
result *= b;
}
return result % m;
}
This algorithm runs in log e
time and keeps the large numbers to a manageable
size since we apply modulo m
at each step. We have most of the math pieces in
place. The only thing missing is a way to generate large primes.
One way of generating large primes is through trial and error: we generate a
large number, check if it is prime, and repeat if it isn't. We can generate a
large number by filling a byte array with random values, then converting it
into a BigInt
:
function randBigInt(sizeInBytes: number = 128): bigint {
let buffer = new Uint8Array(sizeInBytes);
crypto.getRandomValues(buffer);
// Build a bigint out of the buffer
let result = BigInt(0);
buffer.forEach((n) => {
result = result * BigInt(256) + BigInt(n);
});
return result;
}
This gives us a random number of as many bytes as we want (the default being 128 bytes, i.e. 1024 bits). Since we are dealing with very large numbers, we can't naively test for primality of \(N\) by trying divisions up to \(\sqrt{N}\); this is too expensive. We instead use the probabilistic Miller-Rabin test.
In short, Miller-Rabin works like this: we can write an integer \(N\) (our prime candidate) as \(N = 2^S * D + 1\) where \(S\) and \(D\) are positive integers.
Let's take another integer \(A\) coprime with \(N\). \(N\) is likely to be prime if \(A^D \equiv 1 \pmod{N}\) or \(A^{2^{R}*D} \equiv -1 \pmod{N}\) for some \(0 <= R <= S\). If this is not the case, then \(N\) is not a prime and \(A\) is called a witness of the compositeness of \(N\).
This is a probabilistic test, so we can tell whether \(N\) is for sure non-prime or likely to be prime. Unfortunately, we can't tell for sure that \(N\) is prime. We need to run multiple iterations of this picking different \(A\) values until we are satisfied that \(N\) is likely enough to be prime.
First, we need a helper function that checks \(A\) is not a witness of \(N\), given \(A\), \(N\), and \(S\) and \(D\) such that \(N = 2^S * D + 1\).
We compute \(U\) as \(A^D \pmod{N}\). If \(U - 1 = 0\) or \(U + 1 = N\), then \(A\) is not a witness of \(N\). Otherwise, we repeat \(S - 1\) times: \(U = U^2 \pmod{N}\) and \(A\) is not a witness if \(U + 1 = N\). At this point, if we haven't confirmed that \(A\) is not a witness, we consider \(A\) a witness of \(N\) thus \(N\) is not prime. These are simply the checks described above (\(A^D \equiv 1 \pmod{N}\) and \(A^{2^{R}*D} \equiv -1 \pmod{N}\)) in implementation form.
function isNotWitness(a: bigint, d: bigint, s: bigint, n: bigint): boolean {
if (a === BigInt(0)) {
return true;
}
// u is a ^ d % n
let u = exp(a, d, n);
// a is not a witness if u - 1 = 0 or u + 1 = n
if (u - BigInt(1) === BigInt(0) || u + BigInt(1) === n) {
return true;
}
// Repeat s - 1 times
for (let i = BigInt(0); i < s - BigInt(1); i++) {
// u = u ^ 2 % n
u = exp(u, BigInt(2), n);
// a is not a witness if u = n - 1
if (u + BigInt(1) === n) {
return true;
}
}
// a is a witness of n
return false;
}
With this, we can finally implement Miller-Rabin. We first check a few trivial
cases (2
and 3
are prime, even numbers are non-prime). We then find \(S\) and
\(D\) such that our number \(N = 2^S * D + 1\) (we do this by factoring out powers
of 2 from \(N - 1\)).
We then repeat the test: get a random number \(A < N\). If \(A\) is a witness of \(N\), then \(N\) is not prime. If we run this test enough times, we can safely assume the number is prime. According to this, 40 rounds should be good enough for a 1024 bit prime.
function millerRabinTest(candidate: bigint): boolean {
// Handle some obvious cases
if (candidate === BigInt(2) || candidate === BigInt(3)) {
return true;
}
if (candidate % BigInt(2) === BigInt(0) || candidate < BigInt(2)) {
return false;
}
// Find s and d
let d = candidate - BigInt(1);
let s = BigInt(0);
while ((d & BigInt(1)) === BigInt(0)) {
d = d >> BigInt(1);
s++;
}
// Test 40 rounds.
for (let k = 0; k < 40; k++) {
let a = randBigInt() % candidate;
if (!isNotWitness(a, d, s, candidate)) {
return false;
}
}
return true;
}
Note d
and s
above are technically only needed in isNotWitness()
, but
since they are based on our prime candidate, we compute them once and pass them
as arguments to isNotWitness()
rather than having to recompute them on each
call of the function.
We can finally implement our prime generator. We simply generate large numbers and repeat until Miller-Rabin confirms we got a prime number:
function randPrime(sizeInBytes: number = 128): bigint {
let candidate = BigInt(0);
do {
candidate = randBigInt(sizeInBytes);
} while (!millerRabinTest(candidate));
return candidate;
}
With the low-level math out of the way, we can implement the cryptography API.
First, we will define an SRAKeyPair
as consisting of the initial large prime
\(P\) and the derived \(E\) and \(D\) used for encryption/decryption:
type SRAKeyPair = {
prime: bigint;
enc: bigint;
dec: bigint;
};
We can generate a large prime using randPrime()
. From such a prime, we can
generate an SRAKeyPair
:
function generateKeyPair(largePrime: bigint, size: number = 128): SRAKeyPair {
const phi = largePrime - BigInt(1);
let enc = BigInt(0);
// Trial and error
for (;;) {
// Generate a large prime
enc = randPrime(size);
// Stop when generated prime and passed in prime - 1 are coprime
if (gcd(enc, phi) === BigInt(1)) {
break;
}
}
// enc is our encryption key, now let's find dec as the mod inverse of enc
let dec = modInverse(enc, phi);
return {
prime: largePrime,
enc: enc,
dec: dec,
};
}
If we have an SRAKeyPair
, we can encrypt/decrypt numbers using the modulo
exponentiation function we defined above (exp()
):
function encryptInt(n: bigint, kp: SRAKeyPair) {
return exp(n, kp.enc, kp.prime);
}
function decryptInt(n: bigint, kp: SRAKeyPair) {
return exp(n, kp.dec, kp.prime);
}
We can also convert a string into a BigInt and vice-versa. Assuming we only have character codes below 256 (so ASCII), we can simply encode the string as a 256-base number where each digit is a character:
function stringToBigInt(str: string): bigint {
let result = BigInt(0);
for (const c of str) {
if (c.charCodeAt(0) > 255) {
throw Error(`Unexpected char code ${c.charCodeAt(0)} for ${c}`);
}
result = result * BigInt(256) + BigInt(c.charCodeAt(0));
}
return result;
}
The ASCII assumption is reasonable, since we use this at a protocol level, not as part of the user experience. We can decode such a number back into a string using division and modulo:
function bigIntToString(n: bigint): string {
let result = "";
let m = BigInt(0);
while (n > 0) {
[n, m] = [n / BigInt(256), n % BigInt(256)];
result = String.fromCharCode(Number(m)) + result;
}
return result;
}
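The same pair of conversions can be sketched in Python, which makes the base-256 round-trip easy to check by hand (illustration only; the post's implementation is the TypeScript above):

```python
def string_to_int(s: str) -> int:
    # Encode the string as a base-256 number, one digit per character
    n = 0
    for c in s:
        if ord(c) > 255:
            raise ValueError(f"unexpected char code {ord(c)}")
        n = n * 256 + ord(c)
    return n

def int_to_string(n: int) -> str:
    # Decode back using division and modulo
    s = ""
    while n > 0:
        n, m = divmod(n, 256)
        s = chr(m) + s
    return s

print(string_to_int("Hi"))   # 72 * 256 + 105 = 18537
print(int_to_string(18537))  # Hi
```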
Now that we have these conversions, we can implement string
encryption/decryption on top of our encryptInt()
and decryptInt()
functions:
function encryptString(clearText: string, kp: SRAKeyPair): string {
return bigIntToString(encryptInt(stringToBigInt(clearText), kp));
}
function decryptString(cypherText: string, kp: SRAKeyPair): string {
return bigIntToString(decryptInt(stringToBigInt(cypherText), kp));
}
We can encode any object as a string (and decode back strings to objects):
function encrypt<T>(obj: T, kp: SRAKeyPair): string {
return encryptString(JSON.stringify(obj), kp);
}
function decrypt<T>(cypherText: string, kp: SRAKeyPair): T {
return JSON.parse(decryptString(cypherText, kp));
}
And that's it! We start with randPrime()
to generate a large prime, then
use generateKeyPair()
to derive \(E\) and \(D\) from it. We can then use this
SRAKeyPair
with encrypt()
and decrypt()
to encrypt/decrypt objects using
the commutative SRA algorithm.
Here is a small example pulling everything together:
// Seed prime used by both players to generate keys
const sharedPrime = randPrime();
const aliceKP = generateKeyPair(sharedPrime);
const bobKP = generateKeyPair(sharedPrime);
const card = "Ace of spades";
// Encrypt with Alice's key first, then Bob's
const aliceEncrypted = encryptString(card, aliceKP);
const aliceAndBobEncrypted = encryptString(aliceEncrypted, bobKP);
// Decrypt with Alice's key first, then Bob's
const bobEncrypted = decryptString(aliceAndBobEncrypted, aliceKP);
const decrypted = decryptString(bobEncrypted, bobKP);
// Prints "Ace of spades"
console.log(decrypted);
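The same flow can be sketched in Python with a fixed small prime, which makes the commutativity easy to verify by hand (toy-sized keys for illustration only -- do not use anything like this for real secrets):

```python
from math import gcd

P = 2147483647  # a fixed Mersenne prime standing in for the shared large prime

def make_keypair(prime: int, enc: int) -> tuple[int, int]:
    phi = prime - 1
    assert gcd(enc, phi) == 1   # enc must be coprime with prime - 1
    return enc, pow(enc, -1, phi)  # (enc, dec)

alice = make_keypair(P, 65537)
bob = make_keypair(P, 101)

card = 12345  # a card encoded as a number below P

# Encrypt with Alice's key first, then Bob's
both = pow(pow(card, alice[0], P), bob[0], P)
# Decrypt with Alice's key first, then Bob's -- order doesn't matter
plain = pow(pow(both, alice[1], P), bob[1], P)
print(plain)  # 12345
```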
To summarize, in this post we covered:

- BigInt implementations for GCD, modulo inverse, and modulo exponentiation.
- Generating large primes using the Miller-Rabin primality test.
- The SRA algorithm: generating key pairs, then encrypting/decrypting a BigInt, and more generally any object by stringifying it.

My work-in-progress Mental Poker Toolkit is here. This post covered the cryptography package.
I wrote previously about Mental Poker, how one can set up a game in a zero trust environment, and how this could be implemented using Fluid Framework.
Since the previous post, I spent some more time prototyping an implementation with a colleague and did a tech talk about it.
If you haven't read the previous post and are not familiar with Mental Poker, the following won't make much sense. Please start there or by watching the tech talk video.
The implementation consists of a few components:
At the time of writing, the append-only list distributed data structure is ready, available on my GitHub as fluid-ledger and published on npm.
The other components will all eventually end up in the mental-poker-toolkit repo.
Some parts, like cryptography and the game client, I cleaned up and moved from a private hackathon repo. Other parts, like the state machine, require major rework, which I haven't gotten around to yet.
The plan is to provide a quality implementation with good documentation and samples. A major difference between the hackathon proof of concept and this is that the proof of concept implements a simple discard game while I'm hoping the toolkit can support games with more than two players.
Modeling a game like Poker is non-trivial. That said, a big part of the complexity comes from the rules of the game itself. For a proof of concept of Mental Poker, we didn't want to get in the weeds of Poker rules, rather showcase the key ideas of how two players can shuffle a deck of cards, agree on what order the cards end up in, while at the same time each being able to maintain some private state (cards in hand). All of this done over a public channel (Fluid Framework).
The game we modeled was simple: players draw a hand of cards, then take turns discarding by number or suit. If a player can't discard (no matching number or suit), they draw cards until they can discard. The player who discards their whole hand first wins.
This prototype informed the components we had to build.
Fluid Framework does not offer, out of the box, a data structure like the one needed
to model a sequence of moves. We ended up using SharedObjectSequence
, a data
structure that was marked as deprecated and since removed from Fluid. In
general, the Fluid data structures that support lists are overkill for Mental
Poker as they support insertion and deletion of sequences of items at arbitrary
positions. For modeling a game, we just need an append only list - players take
turns and each move means appending something to the end of the list.
In fact, having an append-only list ensures that we don't run into issues like a client unexpectedly inserting something in the middle of the list, which doesn't make sense if we're modeling a sequence of moves in a game.
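The shape of this data structure is tiny, something like the following Python sketch (hypothetical, for illustration; the real fluid-ledger DDS has a different API and also handles distribution and merging):

```python
# Minimal append-only list sketch: appending is the ONLY mutation,
# so no client can insert or delete in the middle of the move log.
class AppendOnlyList:
    def __init__(self):
        self._items = []

    def append(self, move):
        self._items.append(move)
        return len(self._items) - 1  # sequence number of the move

    def __iter__(self):
        return iter(self._items)

log = AppendOnlyList()
seq = log.append({"player": "alice", "action": "draw"})
print(seq)  # 0 -- first move in the game
```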
I was also not able to find a package providing commutative encryption. This is a key requirement for the Mental Poker protocol but industry standard cryptography algorithms do not have this property. I ended up implementing the SRA algorithm from scratch, including a bunch of BigInt math. I still strongly believe in the don't roll your own crypto rule, so please do not use my implementation to play Poker for real money.
Besides encryption, we also need digital signatures. When a player joins a
game, they generate a public/private key pair and their first action is to post
their public key. All subsequent moves from that player are signed with the
private key, so other players can ensure the action is taken by the player
claiming to take that action, eliminating spoofing. Fortunately we were able to
use Crypto.subtle
for this (see Crypto Web API).
Another interesting discovery was the state machine. A high-level game move, like I'm drawing a card from the top of the pile translates into a message exchange between the players:
- Alice: I'm drawing a card from the top of the pile.
- Bob: Here is my key for that card.

Shuffling cards, as described in the previous blog post, includes a longer
sequence of steps. We needed a way to express "I do this, then I expect the
other player to reply with that". We can use such a state machine to express
sequences of multiple moves to implement things like card shuffling.
The proof of concept state machine uses a queue of expected moves from the other player to implement the game mechanics and Mental Poker protocol. For example, for the Discard game, if it is the other player's turn, we expect two things can happen: they either discard a card or draw a card.
If they discard a card, then they publish their encryption key for the card which we can use to see the card (again, please refer to the previous Mental Poker post for details on the protocol). Alternately, if they can't discard a card, they need to draw a card, in which case we have to hand over our encryption key for the card on top of the deck.
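The queue-of-expected-moves idea can be sketched like this (names and API here are invented for illustration, not the toolkit's actual interface):

```python
# Hypothetical sketch of a queue of expected moves from the other player.
class MoveStateMachine:
    def __init__(self):
        self._expected = []  # each entry maps allowed move types to handlers

    def expect(self, handlers):
        self._expected.append(handlers)

    def on_move(self, move_type, payload):
        if not self._expected:
            raise RuntimeError("unexpected move")
        handlers = self._expected.pop(0)
        if move_type not in handlers:
            raise RuntimeError(f"expected one of {list(handlers)}, got {move_type}")
        return handlers[move_type](payload)

# On the other player's turn we expect either a discard or a draw
sm = MoveStateMachine()
sm.expect({"discard": lambda card: f"reveal key for {card}",
           "draw": lambda _: "hand over our key for the top card"})
result = sm.on_move("discard", "9:hearts")
print(result)  # reveal key for 9:hearts
```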
Some of the rules captured in this state machine are specific to each game
implemented. Others though are simply steps in the Mental Poker protocol:
things like shuffling, drawing cards etc. are all modeled as actions I take
and actions I expect the other player to follow up with. I envision
expressing such known sequences as recipes
, building blocks for games.
As I mentioned before, the proof of concept state machine implementation requires some major rework. It needs to scale from two players to an arbitrary number of players, and needs to support recipes, which it currently doesn't. At the time of writing, this is one of the biggest chunks of pending work, and considering this is a hobby project I work on when time permits, I currently don't have a good sense of when I'll finish this. That said, a bunch of pieces are already in decent shape and public, so I plan to write about them while I continue working on finishing the toolkit.
In upcoming blog posts, I plan to cover the various pieces discussed above. The components address different problems, and I find all of them quite interesting. The problem space includes understanding how Fluid Framework distributed data structures work internally, how to generate large prime numbers, and how to model expected sequences of moves in a game among other things.
This post outlines the high level framing of the project. Following posts will dive deep into specific aspects.
In terms of applications, as I mention in the tech talk, the term games is pretty broad - we're not talking only about card games, but things like auctions, lotteries, blind voting etc. All of these can be implemented using Mental Poker as decentralized, zero-trust games.
I've been having fun solving Advent of Code problems every December for a few years now. Advent of Code is an advent calendar of programming puzzles.
All my solutions are on my GitHub here. First, a quick disclaimer:
Disclaimer on my solutions
I use Python because I find it easiest for this type of coding. I treat solving these as a write-only exercise. I do it for the problem-solving bit, so I don't comment the code, and once I find the solution I consider it done - I don't revisit and try to optimize even though sometimes I strongly feel like there is a better solution. I don't even share code between part 1 and part 2 - once part 1 is solved, I copy/paste the solution and change it to solve part 2, so each can be run independently. I also rarely use libraries, and when I do it's some standard ones like re, itertools, or math. The code has no comments and is littered with magic numbers and strange variable names. This is not how I usually code, rather my decadent holiday indulgence. I wasn't expecting to end up writing a blog post discussing my solutions, so I apologize for the code being hard to read.
With that long disclaimer out of the way, let's talk Advent of Code 2022. I figured I'd cover a few problems that seemed interesting to me during this round, before they fade from my memory. The first couple of weeks are usually easy, so I'll start from day 15.
Problem statement is here.
Part 1 is pretty easy. We use taxicab geometry and for each sensor, we can find
its scan radius by computing the Manhattan distance between its coordinates and
the closest beacon it sees. Once we have this, we intersect each (taxicab)
circle with the row y=2000000
. This gives us a bunch of segments defined by
(x0, x1)
pairs.
import re
y, segments = 2000000, set()
for line in open('input').readlines():
m = re.match('Sensor at x=(-?\d+), y=(-?\d+).*x=(-?\d+), y=(-?\d+)$', line)
sx, sy, bx, by = map(int, m.groups())
radius = abs(sx - bx) + abs(sy - by)
if abs(sy - y) <= radius:
segments.add(((sx - (radius - abs(sy - y)),
(sx + (radius - abs(sy - y))))))
We need to figure out where these overlap so we don't double-count so for each pair of segments, if they intersect, we replace them by their union until no segments intersect anymore. Then we simply sum the length of each segment:
def intersect(s1, s2):
return s1[1] >= s2[0] and s2[1] >= s1[0]
def union(s1, s2):
return (min(s1[0], s2[0]), max(s1[1], s2[1]))
done = False
while not done:
done = True
for s1 in segments:
for s2 in segments:
if s1 == s2:
continue
if intersect(s1, s2):
segments.remove(s1)
segments.remove(s2)
segments.add(union(s1, s2))
done = False
break
if not done:
break
print(sum([s[1] - s[0] for s in segments]))
Part 2 is more interesting. We need to scan a quite large area (both x
and y
between 0
and 4000000
). We know that all points except one are covered by at
least one sensor. We start from (0, 0)
and scan like this: for each point,
find the first sensor that sees
it (Manhattan distance from sensor <= sensor
radius). If no sensor can see it, we found our point. Otherwise, again relying
on taxicab geometry, we can tell how many additional points to the right
(increasing x
) are still in range of this sensor. We move x
beyond these
(\(x = x_{sensor} + radius - abs(y_{sensor} - y) + 1\)). If x
goes beyond
4000000
, we reset it to 0
and increment y
. This is not blazingly fast, but
does the job in a reasonable amount of time (around 20 seconds on my machine).
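As a quick check of the jump formula, take the sensor highlighted in the problem statement's example: sensor at (8, 7) with closest beacon at (2, 10), so radius 9.

```python
# Sensor at (8, 7), radius 9; on row y = 10 it covers x in [2, 14],
# so the first x to the right of its coverage is 15
sx, sy, radius = 8, 7, 9
y = 10
jump = sx + radius - abs(sy - y) + 1
print(jump)  # 15
```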
import re
sensors = []
for line in open('input').readlines():
m = re.match('Sensor at x=(-?\d+), y=(-?\d+).*x=(-?\d+), y=(-?\d+)$', line)
sx, sy, bx, by = map(int, m.groups())
radius = abs(sx - bx) + abs(sy - by)
sensors.append((sx, sy, radius))
def in_range(x, y):
for sensor in sensors:
if abs(sensor[0] - x) + abs(sensor[1] - y) <= sensor[2]:
return True, sensor
return False, None
x, y = 0, 0
while True:
found, sensor = in_range(x, y)
if not found:
break
x = sensor[0] + sensor[2] - abs(sensor[1] - y) + 1
if x > 4_000_000:
x = 0
y += 1
print(x * 4_000_000 + y)
Problem statement is here.
Part 1 is again pretty easy: we can model the valves and tunnels as a graph, then use the Floyd-Warshall algorithm to find the distances between each pair of valves:
import re
dist, flows, to_open = {}, {}, set()
for line in open('input').readlines():
m = re.match(
'Valve (\w+) has flow rate=(\d+); tunnels? leads? to valves? (.*)$', line)
src, flow, *dst = m.groups()
dst = [d.strip() for d in dst[0].split(',')]
dist[src] = {d: 1 for d in dst} | {src: 0}
flows[src] = int(flow)
if flows[src] > 0:
to_open.add(src)
for i in dist:
for j in dist:
if j not in dist[i]:
dist[i][j] = 1000
for k in dist:
for i in dist:
for j in dist:
if dist[i][j] > dist[i][k] + dist[k][j]:
dist[i][j] = dist[i][k] + dist[k][j]
We can then search for the best solution recursively: we start from AA
and
keep track of which valves we opened (none for starters). Then at each step,
pick one of the unopened valves. If we have enough time to reach them, recurse
with updated location and set of opened nodes. We also compute the total
pressure released so far at each step and keep track of the highest value we
found. This gives us the solution.
best = 0
def search(current='AA', opened=set(), time=30, score=0):
global best
score += time * flows[current]
if score >= best:
best = score
for node in to_open - opened:
if time - dist[current][node] - 1 >= 0:
search(node, opened | {node}, time -
dist[current][node] - 1, score)
search()
print(best)
Part 2 is more fun. We now have an elephant to help us, which makes it a bit
more complicated. My solution now keeps track of a few more things: which valve
am I headed to and how many more minutes I have to get there; which valve is the
elephant headed to and how many more minutes until it gets there. We both start
at AA
with an ETA of 0
. Then for each node, if my ETA is 0, I'll be heading
that way. If not, the elephant will be heading there. But since we're dealing
with two ETAs, we need to figure out which of us will reach their destination
first, and recurse to that time.
best = 0
def search(me=('AA', 0), elephant=('AA', 0), opened=set(), time=26, score=0):
global best
if score > best:
best = score
for node in to_open - opened:
me_next, elephant_next, score_next = me, elephant, score
if me[1] == 0:
me_next = (node, dist[me[0]][node] + 1)
score_next += (time - dist[me[0]][node] - 1) * flows[node]
else:
elephant_next = (node, dist[elephant[0]][node] + 1)
score_next += (time - dist[elephant[0]][node] - 1) * flows[node]
dt = min(me_next[1], elephant_next[1])
me_next = (me_next[0], me_next[1] - dt)
elephant_next = (elephant_next[0], elephant_next[1] - dt)
if time - dt >= 0:
search(me_next, elephant_next, opened |
{node}, time - dt, score_next)
search()
print(best)
This works but takes a long time, so I added some caching: since both the elephant and I move around a bunch, we can cache the score for each combination of my destination and ETA, the elephant's destination and ETA, and the time. If at a given minute, both the elephant and I were already in this situation but with a better score, we no longer need to keep searching this branch as we already found a better solution. This prunes enough of the search tree to easily find the answer. Updated search with cache:
best = 0
cache = {}
def search(me=('AA', 0), elephant=('AA', 0), opened=set(), time=26, score=0):
global best
if score > best:
best = score
key = str(me) + str(elephant) + str(time)
if key in cache:
if cache[key] >= score:
return
cache[key] = score
for node in to_open - opened:
me_next, elephant_next, score_next = me, elephant, score
if me[1] == 0:
me_next = (node, dist[me[0]][node] + 1)
score_next += (time - dist[me[0]][node] - 1) * flows[node]
else:
elephant_next = (node, dist[elephant[0]][node] + 1)
score_next += (time - dist[elephant[0]][node] - 1) * flows[node]
dt = min(me_next[1], elephant_next[1])
me_next = (me_next[0], me_next[1] - dt)
elephant_next = (elephant_next[0], elephant_next[1] - dt)
if time - dt >= 0:
search(me_next, elephant_next, opened |
{node}, time - dt, score_next)
search()
print(best)
Problem statement is here.
For part 1 we can simply simulate the falling blocks and find the answer. This gives us some of the building blocks needed for part 2.
jets = open('input').read()
rocks = [{(0, 0), (1, 0), (2, 0), (3, 0)},
{(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)},
{(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)},
{(0, 0), (0, 1), (0, 2), (0, 3)},
{(0, 0), (0, 1), (1, 0), (1, 1)}]
grid = set({(i, 0) for i in range(1, 8)})
def intersects(rock, grid):
for block in rock:
if block in grid or block[0] <= 0 or block[0] >= 8:
return True
return False
def move(rock, dx, dy):
return {(i + dx, j + dy) for i, j in rock}
rock_i, jet_i = 0, 0
for _ in range(2022):
top = max(grid, key=lambda pt: pt[1])[1]
rock = move(rocks[rock_i], 3, top + 4)
while True:
new_pos = move(rock, 1 if jets[jet_i] == '>' else -1, 0)
jet_i += 1
if jet_i == len(jets):
jet_i = 0
if not intersects(new_pos, grid):
rock = new_pos
new_pos = move(rock, 0, -1)
if intersects(new_pos, grid):
break
rock = new_pos
grid |= rock
rock_i += 1
if rock_i == len(rocks):
rock_i = 0
print(max(grid, key=lambda pt: pt[1])[1])
Part 2 makes it obvious simulating everything is not an option as we need to
look at a thousand billion rocks. The key here is to find a pattern: we are
bound to end up simulating the same rock and initial move instruction over and
over. If we do and we see the same gain in height between repeats, it means we
found our repeating pattern. We know that starting from this position, we have a
period of length period
in which our tower of rocks grows by growth
. We
subtract the number of rocks we already simulated from 1000000000000, we divide
by period
and multiply by growth
. We'll call this delta_top
.
We are close to the final answer. The only thing left to do is simulate a few
more steps: 1000000000000 minus the number of rocks we already simulated modulo
period
. Now we get the height of the top of the tower we simulated and add
delta_top
to it to find the final answer.
def top():
return max(grid, key=lambda pt: pt[1])[1]
rock_i, jet_i = 0, 0
cache, delta_top = {}, 0
i = 0
while i < 10_000:
rock = move(rocks[rock_i], 3, top() + 4)
while True:
new_pos = move(rock, 1 if jets[jet_i] == '>' else -1, 0)
jet_i += 1
if jet_i == len(jets):
jet_i = 0
if not intersects(new_pos, grid):
rock = new_pos
new_pos = move(rock, 0, -1)
if intersects(new_pos, grid):
break
rock = new_pos
grid |= rock
rock_i += 1
if rock_i == len(rocks):
rock_i = 0
i += 1
if not delta_top:
if (rock_i, jet_i) not in cache:
cache[(rock_i, jet_i)] = []
c = cache[(rock_i, jet_i)]
c.append([i, top()])
if len(c) > 2 and c[-1][1] - c[-2][1] == c[-2][1] - c[-3][1]:
period, growth = c[-1][0] - c[-2][0], c[-1][1] - c[-2][1]
delta_top = (1_000_000_000_000 - i) // period * growth
i = 10_000 - (1_000_000_000_000 - i) % period
print(top() + delta_top)
Problem statement is here.
Part 1 is trivial so I won't discuss it here.
Part 2 is also very easy, but I found a really neat solution worth sharing:
since all boulders are within (0, 0, 0)
and (20, 20, 20)
, I look at a grid
encompassing everything ((-1, -1, -1) to (21, 21, 21)
) and starting from (-1,
-1, -1)
, flood fill. We use a queue and at each step we dequeue a triple of
coordinates. If already visited or out of bounds, we ignore it and continue.
Otherwise if it is a boulder, it means we found a new surface area. We mark
these coordinates as visited and enqueue all the neighbors. I like how whenever
we run into a boulder, it gives us exactly the area we are looking for. The full
solution is:
cubes = [tuple(map(int, l.strip().split(','))) for l in open('input').readlines()]
visited, queue, area = set(), [(-1, -1, -1)], 0
while queue:
(x, y, z) = queue.pop(0)
if (x, y, z) in visited:
continue
if not (-1 <= x <= 22 and -1 <= y <= 22 and -1 <= z <= 22):
continue
if (x, y, z) in cubes:
area += 1
continue
visited.add((x, y, z))
queue.append((x - 1, y, z))
queue.append((x + 1, y, z))
queue.append((x, y - 1, z))
queue.append((x, y + 1, z))
queue.append((x, y, z - 1))
queue.append((x, y, z + 1))
print(area)
Problem statement is here.
I used the same solution for part 1 and part 2: a recursive search where we keep track of the bots and resources we have, and the time. The problem is it takes too long to simulate minute by minute. If we try deciding at each minute whether to build any of the bots we can build or keep collecting resources, then recurse to the next minute, we end up with too much combinatorial complexity. My solution instead does something like this: for the current moment in time, for each type of robot, say we want to build that one next - based on costs and available resources, we can calculate how many minutes from now that robot can be built. We can then recurse (jump ahead in time) to that point, updating available resources, since we know no other robots will be built until then.
As an additional optimization, we can keep track of how many geodes we collected at each minute and if our current search has fewer geodes, it means we already found a better solution and it is not worth going down this branch. There's probably smarter caching/pruning we can do but this seems to be good enough.
This tames the combinatorial complexity enough to get a reasonable run time, and going from simulating 24 minutes in part 1 to simulating 32 minutes for fewer blueprints in part 2 doesn't seem to require changing the algorithm. Both parts take around 2 minutes to run. It can probably be optimized further.
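For example (hypothetical numbers), the jump-ahead computation for a single bot looks like this: if an ore bot costs 4 ore and we currently have 1 ore and 2 ore-collecting bots, we can afford it in ceil((4 - 1) / 2) = 2 minutes, plus 1 minute to build it.

```python
import math

cost, have, bots = 4, 1, 2
# minutes until we can afford the bot, plus one minute to build it
dt = math.ceil((cost - have) / bots) + 1
print(dt)  # 3
```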
import re
import math
def run(bots, costs, resources, time):
if best[time] > resources[3]:
return
best[time] = resources[3]
if time == 0:
return
for bot_type in range(4):
dt = math.ceil((costs[bot_type][0] - resources[0]) / bots[0])
if bot_type >= 2:
if bots[bot_type - 1] == 0:
continue
dt = max(dt, math.ceil((costs[bot_type][1] -
resources[bot_type - 1]) / bots[bot_type - 1]))
dt = max(dt, 0) + 1
if time < dt:
continue
new_resources = [resources[i] + bots[i] * dt for i in range(4)]
new_resources[0] -= costs[bot_type][0]
if bot_type >= 2:
new_resources[bot_type - 1] -= costs[bot_type][1]
bots[bot_type] += 1
run(bots, costs, new_resources, time - dt)
bots[bot_type] -= 1
score = 1
for line in open('input').readlines()[:3]:
m = re.match(
'.*(\d+) ore.*(\d+) ore.*(\d+) ore and (\d+) clay.*(\d+) ore and (\d+) obsidian', line)
costs = list(map(int, m.groups()))
costs = [[costs[0]], [costs[1]], [
costs[2], costs[3]], [costs[4], costs[5]]]
best = [0] * 33
run([1, 0, 0, 0], costs, [0] * 4, 32)
score *= best[0]
print(score)
Problem statement is here.
Day 20 was very easy so I won't cover it here.
Problem statement is here.
Another easy one. For part 1, we parse the input into an expression tree (with values at leaf nodes and operators at non-leaf nodes) and we recursively evaluate it from the root.
tree = {}
for line in open('input').readlines():
key, value = line.strip().split(': ')
value = value.split(' ')
if len(value) == 1:
value = int(value[0])
tree[key] = value
def get(key):
if isinstance(tree[key], int):
return tree[key]
v1, v2 = get(tree[key][0]), get(tree[key][2])
match tree[key][1]:
case '+': return v1 + v2
case '-': return v1 - v2
case '*': return v1 * v2
case '/': return v1 // v2
print(get('root'))
Part 2 effectively makes the root be ==
and asks us to find the value for the
humn
node. For this, we can update our recursive evaluation to either compute
a value or return None
if humn
is part of the subtree we're trying to
evaluate (so if either left or right subtree evaluates to None
, return
None
). We add another recursive function solve()
which takes a node and an
expected value (we expect the node to end up equal to the value) then we can
recursively solve: evaluate left and right. Depending on which of them returns
None
, we recurse down that subtree with an updated expected value. For
example, if we expect left + right
to be 10
and we get 5
and None
back,
then we recurse down the right
subtree, with an expected value of 10 - left
.
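The four inversion rules for the case where the left value v1 is known and the unknown (humn-containing) subtree is on the right can be sketched as a small helper (hypothetical name, for illustration):

```python
def invert_right(op: str, v1: int, eq: int) -> int:
    # Solve "v1 OP right == eq" for right
    return {'+': eq - v1,
            '-': v1 - eq,
            '*': eq // v1,
            '/': v1 // eq}[op]

print(invert_right('+', 5, 10))  # 5
print(invert_right('-', 12, 5))  # 7, since 12 - 7 == 5
```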
tree = {}
for line in open('input').readlines():
key, value = line.strip().split(': ')
value = value.split(' ')
if len(value) == 1:
value = int(value[0])
tree[key] = value
def get(key):
if tree[key] == None or isinstance(tree[key], int):
return tree[key]
v1, v2 = get(tree[key][0]), get(tree[key][2])
if v1 == None or v2 == None:
return None
match tree[key][1]:
case '+': return v1 + v2
case '-': return v1 - v2
case '*': return v1 * v2
case '/': return v1 // v2
def solve(key, eq):
if tree[key] == None:
return eq
k1, k2 = tree[key][0], tree[key][2]
v1, v2 = get(k1), get(k2)
if v1 == None:
match tree[key][1]:
case '+': return solve(k1, eq - v2)
case '-': return solve(k1, eq + v2)
case '*': return solve(k1, eq // v2)
case '/': return solve(k1, eq * v2)
if v2 == None:
match tree[key][1]:
case '+': return solve(k2, eq - v1)
case '-': return solve(k2, v1 - eq)
case '*': return solve(k2, eq // v1)
case '/': return solve(k2, v1 // eq)
tree['humn'] = None
tree['root'][1] = '-'
print(solve('root', 0))
Problem statement is here.
This one was fun but a bit tedious. Part 1 is very easy, we implement movement with wrap-around and stopping when we hit #
.
import re
grid = [line.strip('\n').ljust(150, ' ') for line in open('input').readlines()]
dirs, grid = [m.group() for m in re.finditer('(\d+)|L|R', grid[-1])], grid[:-2]
dirs = [int(d) if str.isdecimal(d) else d for d in dirs]
facing = [(1, 0), (0, 1), (-1, 0), (0, -1)]
x, y, d = grid[0].index('.'), 0, 0
def move(x, y, d):
nx = (x + d[0]) % len(grid[0])
ny = (y + d[1]) % len(grid)
match grid[ny][nx]:
case ' ':
nx, ny = move(nx, ny, d)
return (nx, ny) if grid[ny][nx] != ' ' else (x, y)
case '#': return (x, y)
case '.': return (nx, ny)
for step in dirs:
if isinstance(step, int):
while step > 0:
x, y = move(x, y, facing[d])
step -= 1
elif step == 'L':
d = (d - 1) % 4
else:
d = (d + 1) % 4
print(1000 * (y + 1) + 4 * (x + 1) + d)
For part 2, we need to figure out how the various facets connect into a cube and map movement from one face to another. Personally, I made a paper cutout of the input shape, folded it, and used that to figure out the transitions.
The algorithm is pretty easy if the mappings are right. While on the same facet,
we simply move in the direction we are supposed to move. We can encode a facet
as a pair of (region_x, region_y)
coordinates where region_x, region_y = x //
50, y // 50
. Of course, some pairs of coordinates are not part of any facet of
the cube (e.g. (0, 0)
) but that doesn't matter. Using this encoding, we can
tell when a movement gets us outside the current region. When that happens, a
helper function figures out where we end up and what the new orientation is.
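A quick check of the facet encoding (the real input uses 50x50 facets):

```python
size = 50
x, y = 120, 30       # an arbitrary example point
region = (x // size, y // size)
print(region)  # (2, 0) -- this point sits on facet (2, 0)
```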
import re
grid = [line.strip('\n').ljust(150, ' ') for line in open('input').readlines()]
dirs, grid = [m.group() for m in re.finditer('(\d+)|L|R', grid[-1])], grid[:-2]
dirs = [int(d) if str.isdecimal(d) else d for d in dirs]
size = 50
facing = [(1, 0), (0, 1), (-1, 0), (0, -1)]
connections = {
(1, 0): [(2, 0, 0), (1, 1, 1), (0, 2, 0), (0, 3, 0)],
(2, 0): [(1, 2, 2), (1, 1, 2), (1, 0, 2), (0, 3, 3)],
(1, 1): [(2, 0, 3), (1, 2, 1), (0, 2, 1), (1, 0, 3)],
(0, 2): [(1, 2, 0), (0, 3, 1), (1, 0, 0), (1, 1, 0)],
(1, 2): [(2, 0, 2), (0, 3, 2), (0, 2, 2), (1, 1, 3)],
(0, 3): [(1, 2, 3), (2, 0, 1), (1, 0, 1), (0, 2, 3)],
}
x, y, d = grid[0].index('.'), 0, 0
def move(x, y, d):
nx = x + facing[d][0]
ny = y + facing[d][1]
nd = d
if (x // size, y // size) != (nx // size, ny // size):
nx, ny, nd = switch_region(x, y, d)
match grid[ny][nx]:
case '#': return (x, y, d)
case '.': return (nx, ny, nd)
def switch_region(x, y, d):
nrx, nry, nd = connections[(x // size, y // size)][d]
nx, ny = nrx * size, nry * size
rx, ry = x % size, y % size
if (d, nd) in [(0, 0), (1, 3), (2, 2), (3, 1)]:
return nx + size - rx - 1, ny + ry, nd
if (d, nd) in [(0, 2), (1, 1), (2, 0), (3, 3)]:
return nx + rx, ny + size - ry - 1, nd
if (d, nd) in [(0, 1), (1, 0), (2, 3), (3, 2)]:
return nx + size - ry - 1, ny + size - rx - 1, nd
if (d, nd) in [(0, 3), (1, 2), (2, 1), (3, 0)]:
return nx + ry, ny + rx, nd
for step in dirs:
if isinstance(step, int):
while step > 0:
x, y, d = move(x, y, d)
step -= 1
elif step == 'L':
d = (d - 1) % 4
else:
d = (d + 1) % 4
print(1000 * (y + 1) + 4 * (x + 1) + d)
Problem statement is here.
This is a cellular automaton. In general, when implementing cellular automata,
the trick
is to not change things in place, rather use a new copy for each
generation. I represented the elves as a set of (x, y)
coordinates. We can use
set intersection to see if an elf has other elves nearby or whether two elves
would end up moving into the same spot. I won't go into more detail as this was
another pretty easy problem. The code is on my GitHub.
Problem statement is here.
I liked this one. For both part 1 and part 2, this becomes easy to solve with a couple of interesting observations.
First, the blizzards move in a repeating pattern, so we can map which squares are
occupied at a given point in time and we know the occupancy repeats every
lcm(height, width)
where height
and width
are the height and width of the
valley. We can compute this many generations and store the occupancy map in a
lookup.
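For instance, with a hypothetical valley interior 120 squares wide and 25 squares tall, the blizzard occupancy repeats every lcm(120, 25) = 600 steps:

```python
import math

interior_w, interior_h = 120, 25  # hypothetical interior dimensions
period = math.lcm(interior_w, interior_h)
print(period)  # 600
```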
import math
blizzards = []
lines = [line.strip() for line in open('input').readlines()]
for y, line in enumerate(lines):
for x, c in enumerate(line):
if c in '<^>v':
blizzards.append((x, y, c))
maxx, maxy = len(lines[0]) - 1, len(lines) - 1
move = {'<': (-1, 0), '^': (0, -1), '>': (1, 0), 'v': (0, 1)}
def step(blizzards):
new = []
for b in blizzards:
x, y = b[0] + move[b[2]][0], b[1] + move[b[2]][1]
if x == 0: x = maxx - 1
if x == maxx: x = 1
if y == 0: y = maxy - 1
if y == maxy: y = 1
new.append((x, y, b[2]))
return new
def occupancy(blizzards):
return {(x, y) for x, y, c in blizzards}
steps, lcm = {}, math.lcm(maxx - 1, maxy - 1)
for i in range(lcm):
steps[i] = {(x, y) for x, y, _ in blizzards}
blizzards = step(blizzards)
Next, we can do a breadth-first search to find the shortest path from one side to
the other. Since a possible move is waiting in place, it's pretty hard to find
bounds for a depth-first search. On the other hand, at every step the elves can occupy
one of the at most height * width
positions. Of course, most of these will be
occupied by blizzards. So for a BFS, we start from the initial position and time
(step 0
) and use a queue. We pop the first move and enqueue all possible moves
from this position (taking into account valley bounds and blizzard occupancy)
for the next step. As long as we ensure not to enqueue duplicates, the queue
stays small. Since this is BFS, as soon as the position we dequeue is our
destination, we know this is the earliest we can get there.
def solve():
    queue = [(1, 0, 0)]
    while True:
        x, y, step = queue.pop(0)
        for x, y in [(x + m[0], y + m[1]) for m in move.values()] + [(x, y)]:
            if (x, y) == (maxx - 1, maxy):
                return step + 1
            if (x, y) != (1, 0):
                if x <= 0 or x >= maxx or y <= 0 or y >= maxy:
                    continue
                if (x, y) in steps[(step + 1) % lcm]:
                    continue
            if (x, y, step + 1) not in queue:
                queue.append((x, y, step + 1))

print(solve())
The extra trips are no problem since this is very fast. The only changes I had
to make from part 1 to part 2 were modifying solve()
to parameterize start,
destination, and initial point in time, then call it 3 times for each trip:
def solve(src, dest, step):
    queue = [(src[0], src[1], step)]
    while True:
        x, y, step = queue.pop(0)
        for x, y in [(x + m[0], y + m[1]) for m in move.values()] + [(x, y)]:
            if (x, y) == (dest[0], dest[1]):
                return step + 1
            if (x, y) != (src[0], src[1]):
                if x <= 0 or x >= maxx or y <= 0 or y >= maxy:
                    continue
                if (x, y) in steps[(step + 1) % lcm]:
                    continue
            if (x, y, step + 1) not in queue:
                queue.append((x, y, step + 1))

trip1 = solve((1, 0), (maxx - 1, maxy), 0)
trip2 = solve((maxx - 1, maxy), (1, 0), trip1)
trip3 = solve((1, 0), (maxx - 1, maxy), trip2)
print(trip3)
Problem statement is here.
Another easy one that I won't discuss in detail; we just need to implement conversion from decimal to SNAFU and back:
def to_dec(n):
    digits = {'0': 0, '1': 1, '2': 2, '-': -1, '=': -2}
    return sum([5 ** i * digits[d] for i, d in enumerate(n[::-1])])

def to_snafu(n):
    s = ''
    while n:
        s = ['0', '1', '2', '=', '-'][n % 5] + s
        n = n // 5 + (1 if s[0] in '-=' else 0)
    return s

print(to_snafu(sum([to_dec(line.strip()) for line in open('input').readlines()])))
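As a quick sanity check (my own addition, not part of the original solution), converting to SNAFU and back should round-trip for any positive integer:

```python
def to_dec(n):
    digits = {'0': 0, '1': 1, '2': 2, '-': -1, '=': -2}
    return sum([5 ** i * digits[d] for i, d in enumerate(n[::-1])])

def to_snafu(n):
    s = ''
    while n:
        s = ['0', '1', '2', '=', '-'][n % 5] + s
        n = n // 5 + (1 if s[0] in '-=' else 0)
    return s

# Converting to SNAFU and back should be the identity
for n in range(1, 1000):
    assert to_dec(to_snafu(n)) == n

print(to_snafu(8))  # prints 2=
```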
In Advent of Code tradition, day 25 has only 1 part.
This was another very fun set of problems and I am looking forward to Advent of Code 2023.
In the previous post, we covered lambda calculus, a computational model underpinning functional programming. In this blog post, we'll continue down the functional programming road and cover one of the oldest programming languages still in use: LISP.
LISP was originally specified in 1958 by John McCarthy and the paper describing the language was published in 1960^{1}. It became very popular in AI research and flavors of it are still in use today.
LISP has a quite unique syntax and execution model.
If we are going to talk about LISP, we need to start with symbolic expressions. Symbolic expressions, or S-expressions, are defined as:
An S-expression is either an atom, or an expression of the form (x . y), where x and y are S-expressions.
This very simple definition is very powerful: it allows us to represent any
binary tree. Let's start with a very simple universe where the only atom is
()
, representing a null value. With this atom and the above definition, while
we can't (easily) represent data, we can capture the shape of a binary tree. For
example, the tree consisting of a root node and two leaf nodes can be represented as (() . ()).
The tree consisting of a root, a left leaf node, and a right node with two child leaf nodes would be (() . (() . ())).
If we expand the definition of atom to include numbers and basic arithmetic
(+
, -
, *
, /
), we can represent arithmetic expressions as S-expressions.
2 + 3 can be represented as (+ . (2 . (3 . ()))).
2 * (3 + 5) can be represented as (* . (2 . ((+ . (3 . (5 . ()))) . ()))).
Note the S-expression definition only allows for values
(atoms) at leaf nodes
of the tree. An S-expression is either a leaf node containing a value or a
non-leaf node with 2 S-expression children. That means we can't represent 2 + 3 as a tree with + in an internal node and 2 and 3 as its children, but the representation we just saw is equivalent.
S-expressions can be used to represent data. Consider a simple list 1, 2, 3, 4,
5
. Much like we saw in the previous post when we looked at representing lists
as lambda expressions, we can represent lists using S-expressions using a head
and a tail (recursively): the list can be represented as
(1 . (2 . (3 . (4 . (5 . ()))))).
We can also represent an associative array: instead of a value, we can represent
a key-value pair as an S-expression ((key . value)
), so we can represent the
associative array { 1: 2, 2: 3, 3: 5 }
as ((1 . 2) . ((2 . 3) . ((3 . 5) .
())))
.
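To make this concrete, here's a small sketch (my own illustration, not from the original post) that models cons cells as Python 2-tuples, with None standing in for (), and looks up a key in the associative array encoding:

```python
# { 1: 2, 2: 3, 3: 5 } encoded as ((1 . 2) . ((2 . 3) . ((3 . 5) . ())))
# with cons cells as 2-tuples and None standing in for ()
assoc = ((1, 2), ((2, 3), ((3, 5), None)))

def lookup(pairs, key):
    # Walk the right spine; each left child is a (key . value) cell
    while pairs is not None:
        (k, v), pairs = pairs
        if k == key:
            return v
    return None

print(lookup(assoc, 2))  # prints 3
```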
Historically, a non-atom S-expression in LISP is called a cons cell (from
construction
). Instead of head and tail, LISP uses car and cdr
(standing for contents of the address register and contents of the decrement
register, which are artifacts of the computer architecture first flavors of
LISP were implemented in).
We just saw how we can represent trees, lists, and associative arrays using S-expressions. But S-expressions aren't limited to representing data: we can also use them to represent code.
We looked at how 2 + 3
would look like as an S-expression. In fact, we can
represent any function call as an S-expression, where the left node of the root
S-expression is the function to be called and the right subtree contains the
arguments.
2 + 3
is equivalent to the function add(2, 3)
. So we can represent the
function call add(2, 3)
as the S-expression (add . (2 . (3 . ())))
.
Note we can have any number of arguments as we grow the right subtrees: sum(2,
3, 4, 5)
can be represented as (sum . (2 . (3 . (4 . (5 . ())))))
. If we want
to pass the result of another function as an argument, say sum(2, sum(3, 4),
5)
, we can represent this as (sum . (2 . ((sum . (3 . (4 . ()))) . (5 . ())))).
We saw in the previous post that we can represent pretty much anything using
functions. An if expression is a function if(condition, true-branch,
false-branch)
. We can combine this with recursion to generate loops. So we have
all the building blocks for a Turing-complete system.
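As a sketch of this idea (again modeling cons cells as Python 2-tuples with None for (); the FUNCS table is a hypothetical stand-in, not the interpreter built later in this post), we can evaluate such function-call S-expressions directly:

```python
# Minimal evaluator over cons-cell S-expressions: cells are 2-tuples,
# () is None; FUNCS is a hypothetical table of built-in functions
FUNCS = {'add': lambda a, b: a + b, 'sum': lambda *args: sum(args)}

def eval_cons(sexpr):
    if not isinstance(sexpr, tuple):  # an atom evaluates to itself
        return sexpr
    fn, args = sexpr
    values = []
    while args is not None:           # walk the right spine, evaluating arguments
        car, args = args
        values.append(eval_cons(car))
    return FUNCS[fn](*values)

# (sum . (2 . ((sum . (3 . (4 . ()))) . (5 . ())))) is sum(2, sum(3, 4), 5)
print(eval_cons(('sum', (2, (('sum', (3, (4, None))), (5, None))))))  # prints 14
```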
It turns out we can represent both data and code as S-expressions. Before moving on to look at some implementation details, let's introduce some syntactic sugar.
Writing S-expression like this can become tedious, so let's introduce some
syntactic sugar. Instead of (1 . (2 . (3 . (4 . (5 . ())))))
, we can write
(1 2 3 4 5)
. We omit some of the parentheses, the concatenation symbol .
,
and the final ()
. By default, we concatenate on the right subtree. If we need
to go down the left subtree, we add parentheses. So instead of representing the
associative array { 1: 2, 2: 3, 3: 5 }
as ((1 . 2) . ((2 . 3) . ((3 . 5) .
())))
, we can more succinctly represent it as ((1 2) (2 3) (3 5))
, without
losing any meaning.
Similarly, (add . (2 . (3 . ())))
becomes (add 2 3)
and (sum . (2 . ((sum .
(3 . (4 . ()))) . (5 . ()))))
becomes (sum 2 (sum 3 4) 5)
.
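This desugaring can be mechanized. Here's a small sketch (a helper of my own, assuming cons cells as Python 2-tuples and None for ()) that flattens a cons-cell tree into the succinct list form:

```python
# Flatten a cons-cell tree (2-tuples, None for ()) into the sugared list form
def to_list(sexpr):
    if not isinstance(sexpr, tuple):  # atoms stay as-is
        return sexpr
    result = []
    while isinstance(sexpr, tuple):   # collect left children along the right spine
        car, sexpr = sexpr
        result.append(to_list(car))
    return result

# (add . (2 . (3 . ()))) becomes (add 2 3)
print(to_list(('add', (2, (3, None)))))  # prints ['add', 2, 3]
```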
In our implementation, we will represent S-expressions as lists which can contain any number of elements. This is a more succinct representation and will make our code easier to understand.
We can now look at implementing a small LISP. We take an input string, we parse it into an S-expression, then we evaluate the S-expression and print the result.
First, the parser: we will take a string as input, split it into tokens, then parse the tokens into an S-expression.
We will transform an input string into a list of tokens by matching it with
either (
, )
, or a string of alphanumeric characters. We'll use a regular
expression for this, then extract the matched values (using match.group()
)
into a list:
import re

def lex(line):
    return [match.group() for match in re.finditer(r'\(|\)|\w+', line)]
We can now transform an input like '(add 1 (add 2 3))'
into the list of tokens
['(', 'add', '1', '(', 'add', '2', '3', ')', ')']
by calling lex()
on it.
We need to transform this list of tokens into an S-expression. First, we need a
couple of helper functions. An atom can be either a number or a symbol. We'll
create one from a token using an atom()
function:
def atom(value):
    try:
        return int(value)
    except:
        return value
The other helper function will yield while the head of our token list is
different than )
, then pop the )
token. We'll use this while parsing to
iterate over the tokens after a (
and until we find the matching )
:
def pop_rpar(tokens):
    while tokens[0] != ')':
        yield
    tokens.pop(0)
Parsing into an S-expression is now very simple:
If the token is (, we recursively parse the following tokens until we reach the matching ). If the token is ), we raise an exception - this is an unmatched ). Otherwise, the token is an atom and we call atom() on it.

def parse(tokens):
    match token := tokens.pop(0):
        case '(':
            return [parse(tokens) for _ in pop_rpar(tokens)]
        case ')':
            raise Exception('Unexpected )')
        case _:
            return atom(token)
That's it. If we parse the input string '(add 1 (add 2 3))'
using our
functions - parse(lex('(add 1 (add 2 3))'))
- we will get back
['add', 1, ['add', 2, 3]]
.
We can now take text as input and convert it into the internal representation we discussed.
The next step is to evaluate such an S-expression and return a result. We need two pieces for this: an environment which stores built-in functions and user-defined variables, and an evaluation function which takes an S-expression and processes it using the environment.
We'll start with a simple environment with built-in support for equality, arithmetic operations and list operations:
env = {
    # Equality
    'eq': lambda arg1, arg2: arg1 == arg2,
    # Arithmetic
    'add': lambda arg1, arg2: arg1 + arg2,
    'sub': lambda arg1, arg2: arg1 - arg2,
    'mul': lambda arg1, arg2: arg1 * arg2,
    'div': lambda arg1, arg2: arg1 / arg2,
    # Lists
    'cons': lambda car, cdr: [car] + cdr,
    'car': lambda list: list[0],
    'cdr': lambda list: list[1:],
}
Our evaluation function has a few special-case handling for variable definitions, quotations, and if-expressions, and is otherwise pretty straightforward:
def eval(sexpr):
    # If null or number atom, return it
    if sexpr == [] or isinstance(sexpr, int):
        return sexpr
    # If string atom, look it up in environment
    if isinstance(sexpr, str):
        return env[sexpr]
    match sexpr[0]:
        case 'def':
            env[sexpr[1]] = eval(sexpr[2])
        case 'quote':
            return sexpr[1]
        case 'if':
            return eval(sexpr[2]) if eval(sexpr[1]) else eval(sexpr[3])
        case call:
            return env[call](*[eval(arg) for arg in sexpr[1:]])
Our evaluation works like this: if the first symbol is def, we add a definition to the environment; if it is quote, we return the second symbol unevaluated; if it is if, we evaluate the second symbol and, if it is truthy, we evaluate the third symbol, otherwise the fourth symbol; for any other symbol, we look up the corresponding function in the environment and call it, passing the evaluated remaining symbols as arguments.

We're taking a bit of a shortcut here and relying on Python's notion of
truthy-ness (e.g. 0
or an empty list []
is non-truthy). If needed, we can
enhance our implementation with Boolean support.
We can now implement a simple read-eval-print loop (REPL):
while line := input('> '):
    try:
        print(eval(parse(lex(line))))
    except Exception as e:
        print(f'{type(e).__name__}: {e}')
We can try a few simple commands (shown below with the corresponding output):
> (def a 40)
None
> (def b 2)
None
> (add a b)
42
> (if a 1 0)
1
> (add 2 (add 3 4))
9
> (def list (cons 1 (cons 2 (cons 3 ()))))
None
> (car list)
1
> (cdr list)
[2, 3]
We can extend the environment with additional functions as needed. These
represent the built-in
functions of our LISP interpreter. One capability we
are still missing is the ability to define custom functions at runtime. Let's
extend our interpreter to support that.
A function can take any number of arguments, which should become defined in
the environment while the function is executing but which don't exist outside
the function. For example, if we define an addition function as add(x, y)
,
we should be able to refer to the x
and y
arguments inside the body of the
function but not outside of it. x
and y
only exist within the scope of
the function.
We can add scoping to our interpreter by extending our eval
to take an
environment as an argument instead of always referencing our env
. Then when
we create a new scope, we create a new environment to use.
For function definition, we will use the following syntax: (deffun
function_name (arguments...) (body...))
. deffun
denotes a function
definition. The second argument is the function name. The third is a list of
parameters and the fourth is the body of the function, which is going to be
evaluated in an environment where its arguments are defined.
We need a function factory:
def make_function(params, body, env):
    return lambda *args: eval(body, env | dict(zip(params, args)))
This takes the parameters, body, and environment and returns a lambda which
expects a list of arguments. Calling the lambda will invoke eval
on the
body. Note we extend the environment with a dictionary mapping parameters to
arguments.
Let's update eval
to use a parameterized environment and support the new
deffun
function definition capability:
def eval(sexpr, env=env):
    # If number atom, return value
    if isinstance(sexpr, int):
        return sexpr
    # If string atom, look it up in environment
    if isinstance(sexpr, str):
        return env[sexpr]
    if sexpr == []:
        return []
    match sexpr[0]:
        case 'def':
            env[sexpr[1]] = eval(sexpr[2], env)
        case 'deffun':
            env[sexpr[1]] = make_function(sexpr[2], sexpr[3], env)
        case 'quote':
            return sexpr[1]
        case 'if':
            return eval(sexpr[2], env) if eval(sexpr[1], env) else eval(sexpr[3], env)
        case call:
            return env[call](*[eval(arg, env) for arg in sexpr[1:]])
Besides plumbing env
through each eval
call, we just added a deffun
case where
we use our function factory.
We can run our REPL again and try out the new capability:
> (deffun myadd (x y) (add x y))
None
> (myadd 2 3)
5
Here is a Fibonacci implementation, using deffun
and recursion:
> (deffun fib (n) (if (eq n 0) 0 (if (eq n 1) 1 (add (fib (sub n 1)) (fib (sub n 2))))))
None
> (fib 8)
21
If n
is 0, return 0
else if n
is 1, return 1
, else recursively call
fib
for n - 1
and n - 2
and add the results.
We won't provide a proof of Turing-completeness but it should be obvious that the capabilities we implemented so far are sufficient to emulate, for example, a cyclic tag system like we did in the previous post with lambdas.
The full implementation of our mini-LISP is here.
Peter Norvig wrote a much more detailed article describing a LISP implementation here.
LISP is a very interesting language as it uses the same representation for both data and code (for better or worse). Turns out binary trees (or trees if we use our syntactic sugar) are enough to represent both.
As we just saw, a core LISP runtime is fairly easy to implement and many of the more advanced features can be bootstrapped within the language itself.
Languages in the LISP family are called LISP dialects. Even though the language is many decades old, modern dialects are alive and thriving. For example, Racket and Clojure are LISP dialects.
In this post we looked at LISP:
Original paper: http://www-formal.stanford.edu/jmc/recursive.pdf. ↩
In the previous posts, we dug deeper into one particular model of computation, starting with Turing Machines in part 2, to the von Neumann computer architecture in part 6, to some of the implementation practicalities of machines - physical or virtual - in part 7.
We'll switch gears and cover another computational model this time around: lambda calculus. Lambda calculus was developed by Alonzo Church around the same time Alan Turing was proposing the Turing machine as a universal model for computation. The two models were proven equivalent - anything a Turing machine can compute can also be computed by lambda calculus - a result closely associated with the Church-Turing thesis^{1}.
Formally:
Lambda calculus consists of lambda terms and reductions applied to lambda terms.
The lambda terms are built with the following rules, where \(\Lambda\) is the set of all possible lambda terms:
- Variables, like \(x\), are lambda terms. \(x \in \Lambda\).
- Abstractions, \((\lambda x.M)\). This is a function definition where \(M\) is a lambda term and \(x\) becomes bound in the expression. For \(x \in \Lambda\) and \(M \in \Lambda\), \((\lambda x.M) \in \Lambda\).
- Applications, \((M \space N)\). This applies the function \(M\) to the argument \(N\), where \(M\) and \(N\) are lambda terms. For \(M \in \Lambda\) and \(N \in \Lambda\), \((M \space N) \in \Lambda\).
If a term \(y\) appears in \(M\) but is not bound, then \(y\) is free in \(M\), e.g. for \(\lambda x.y \space x\), \(x\) is bound and \(y\) is free. The reductions are:
- \(\alpha\)-equivalence: bound variables in an expression can be renamed to avoid collisions: \((\lambda x.M[x]) \rightarrow (\lambda y.M[y])\).
- \(\beta\)-reduction: bound variables in the body of an abstraction are replaced with the argument expression: \((\lambda x.t)s \rightarrow t[x := s]\).
- \(\eta\)-reduction: if \(x\) is a variable that does not appear free in the lambda term M, then \(\lambda x.(M x) \rightarrow M\). This can also be understood in terms of function equivalence: if two functions give the same result for all arguments, then the functions are equivalent.
Let's look at a few simple examples in Python:
lambda x: x
This is the identity function expressed as a lambda abstraction. In this case,
x
(the lambda parameter), becomes bound in the body of the lambda.
\(\alpha\)-equivalence:
lambda y: y
This is the same identity function, we're just using y
instead of x
to name
the parameter.
For function application, we can apply the identity function to any other lambda term and get back that lambda term:
(lambda x: x)(lambda y: y)
This applies the identity function lambda x: x
to the argument lambda y: y
,
which will give us back lambda y: y
.
Based on the above definition, lambda calculus consists exclusively of lambda
terms - while (lambda x: x)(10)
is valid Python code, applying an identity
lambda to the number 10
, lambda calculus does not have a number 10
. Enter
Church encoding: Alonzo Church came up with a way to encode logic values and
numbers as lambda terms.
Let's start with Boolean logic: TRUE
is defined as \(T := (\lambda x.\lambda
y.x)\), FALSE
is defined as \(F := (\lambda x.\lambda y.y)\).
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
Note with this definition, if we apply a first argument to TRUE
, and a second
argument to the returned lambda, we always get back the first argument. For
FALSE
, we always get back the second argument.
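We can check this directly in Python (using numbers only as convenient labels to observe the result, outside of pure lambda calculus):

```python
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y

# TRUE selects its first argument, FALSE its second
print(TRUE(1)(2))   # prints 1
print(FALSE(1)(2))  # prints 2
```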
We can define IF
as \(IF := (\lambda x.x)\). This is the same as the identity
function.
IF = lambda x: x
This works since we defined TRUE
to always return the first argument and
FALSE
to always return the second argument. So when we call IF(c)(x)(y)
,
if c
is TRUE
, we get back x
(the if-branch), otherwise we get back y
(the else-branch).
We can try this out (though again this is outside of lambda calculus, we are introducing numbers for clarity):
IF(TRUE)(1)(2) # This returns 1
IF(FALSE)(1)(2) # This returns 2
Now that we can express if-then-else, we can easily express other logic operators. Negation is \(\lambda x.(x \space F \space T)\).
NOT = lambda x: x(FALSE)(TRUE)
If x
is TRUE
, we get back the first argument, FALSE
; if x
is FALSE
,
we get back the second argument, TRUE
.
x AND y
can be expressed as if x then y else FALSE, or: \(\lambda x.\lambda
y.(x \space y \space F)\). x OR y
can be expressed as if x then TRUE else y,
or \(\lambda x.\lambda y.(x \space T \space y)\).
AND = lambda x: lambda y: x(y)(FALSE)
OR = lambda x: lambda y: x(TRUE)(y)
Here are a few examples:
print(AND(TRUE)(TRUE) == TRUE) # prints True
print(AND(TRUE)(FALSE) == TRUE) # prints False
print(OR(TRUE)(FALSE) == TRUE) # prints True
print(NOT(FALSE) == TRUE) # prints True
Using only lambda terms, we were able to implement Boolean logic! But Church encoding goes further - we can also represent natural numbers and arithmetic as lambda terms.
Alonzo Church encoded numbers as applications of a function \(f\) to a term \(x\).
0
means applying \(f\) 0 times to the term: \(0 := \lambda f.\lambda x.x\).1
means applying \(f\) once to the term: \(1 := \lambda f.\lambda x.f x\).2
means applying \(f\) twice: \(2 := \lambda f.\lambda x.f (f x)\).In general, the number n
is represented by n
applications of f
: \(n :=
\lambda f.\lambda x.f (f (... (f x)) ... ))\) or \(n := \lambda f.\lambda x.
f^n(x)\).
In Python:
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
TWO = lambda f: lambda x: f(f(x))
...
Note ZERO
is the same as FALSE
. With this definition of numbers, we can
define the successor function SUCC
as a function that takes a number n
(represented with our Church encoding), the function f
, the term x
, and
applies f
one more time. \(SUCC := \lambda n.\lambda f.\lambda x.f (n f x)\).
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
We can define addition as \(PLUS := \lambda m.\lambda n.m \space SUCC \space n\).
Since we define a number as repeatedly applying a function, we express m + n
as applying m
times the successor function SUCC
to n
.
PLUS = lambda m: lambda n: m(SUCC)(n)
We can similarly define multiplication as repeated addition: m * n means applying PLUS n (that is, the function "add n") to ZERO a total of m times, \(MUL := \lambda m.\lambda n.m \space (PLUS \space n) \space 0\):
MUL = lambda m: lambda n: m(PLUS(n))(ZERO)
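To test these numeral encodings, we can convert a Church numeral back to a Python int by applying it to the ordinary integer successor (a helper of my own, outside of pure lambda calculus):

```python
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
PLUS = lambda m: lambda n: m(SUCC)(n)

# Convert a Church numeral to a Python int (testing helper only):
# apply the numeral to integer successor, starting from 0
to_int = lambda n: n(lambda x: x + 1)(0)

TWO = SUCC(SUCC(ZERO))
THREE = SUCC(TWO)

print(to_int(PLUS(TWO)(THREE)))  # prints 5
```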
We'll stop here with arithmetic, but this should hopefully give you a sense of the expressive power of lambda calculus.
Some well-known lambda terms are called combinators:
In Python:
I = lambda x: x
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
Turns out these 3 combinators can together express any lambda term. The SKI
combinators are the simplest programming language
since they can express
anything expressible in lambda calculus, which we know is Turing-complete.
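As a classic illustration (my own example, not from the original post), the identity combinator can be derived from S and K alone, since S K K z reduces to K z (K z) = z:

```python
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

# S K K behaves as the identity combinator
I = S(K)(K)

print(I(42))  # prints 42
```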
Another interesting combinator is the \(Y\) combinator. In lambda calculus, there
is no way for a function to reference itself: within the body of a lambda like
lambda x: ...
we can refer to the bound term x
, but we cannot reference the
lambda itself. The implication is that we can't define, using this syntax,
self-referential functions. We can only pass functions as arguments. How can we
then implement recursion? With the \(Y\) combinator, of course.
Let's take an example: we can recursively define factorial as:
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)
This works, but note we reference fact()
within its body. In lambda calculus
we can't do that.
The \(Y\) combinator is defined as \(Y := \lambda f.(\lambda x.f (x x))(\lambda x.f (x x))\).
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(lambda z: x(x)(z)))
Note the Python implementation is slightly different than the mathematical definition. This has to do with the way in which Python evaluates arguments. We won't go into the details here, but consider this a Python implementation detail irrelevant to the lambda calculus discussion^{2}.
Here is a lambda version of factorial:
FACT = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)
With this definition, we pass the function to call as an argument (f
). We can
fully express this in lambda calculus (using Church numerals, arithmetic and
logic), but we'll keep the example simple. We can then use the \(Y\) combinator
like this:
print(Y(FACT)(5)) # prints 120
This should give you an intuitive understanding of how the \(Y\) combinator works: we pass it our function and argument, and it enables the recursion mechanism.
We can similarly implement Fibonacci as:
FIB = lambda f: lambda n: 1 if n <= 2 else f(n - 1) + f(n - 2)
print(Y(FIB)(10)) # prints 55
The powerful \(Y\) combinator can be used to define recursive functions in programming languages that don't natively support recursion.
Let's also look at how we can express lists in lambda calculus. Let's start with pairs. We can define a pair as \(PAIR := \lambda x.\lambda y.\lambda f. f x y\). We can extract the first element of a pair with \(FIRST := \lambda p. p \space T\) and the second one with \(SECOND := \lambda p.p \space F\).
PAIR = lambda x: lambda y: lambda f: f(x)(y)
FIRST = lambda p: p(TRUE)
SECOND = lambda p: p(FALSE)
print(FIRST(PAIR(10)(20))) # prints 10
print(SECOND(PAIR(10)(20))) # prints 20
We can define a NULL
value as \(NULL := \lambda x.T\) and a test for NULL
as
\(ISNULL := \lambda p.p (\lambda x.\lambda y.FALSE)\).
NULL = lambda x: TRUE
ISNULL = lambda p: p(lambda x: lambda y: FALSE)
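A quick check of these definitions (my own, comparing the resulting lambda objects directly, which works here because NULL and ISNULL return the global TRUE/FALSE objects):

```python
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
PAIR = lambda x: lambda y: lambda f: f(x)(y)
NULL = lambda x: TRUE
ISNULL = lambda p: p(lambda x: lambda y: FALSE)

# NULL ignores its argument and returns TRUE; a PAIR feeds its two
# elements to the constant-FALSE function
print(ISNULL(NULL) == TRUE)           # prints True
print(ISNULL(PAIR(1)(NULL)) == TRUE)  # prints False
```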
We can now define a linked list as either NULL
(an empty list) or as a pair
consisting of a pair of elements - a head element and a tail list.
We can get the head of the list using FIRST
and the tail using SECOND
. Given
list \(L\), we can prepend an element \(x\) by forming the pair \((x, L)\).
HEAD = FIRST
TAIL = SECOND
PREPEND = lambda x: lambda xs: PAIR(x)(xs)
We can build a list by prepending elements to NULL
, and traverse it using
HEAD
and TAIL
:
# Build the list [10, 20, 30]
L = PREPEND(10)(PREPEND(20)(PREPEND(30)(NULL)))
print(HEAD(TAIL(L))) # prints 20
Appending is more interesting: if our list is represented as a pair of head and
tail, we need to traverse
the list until we reach the end. This sounds a lot
like a recursive function: appending x
to xs
entails returning the pair
PAIR(x, NULL)
if xs
is NULL
, else the pair PAIR(HEAD(xs), APPEND(TAIL(xs), x))
. Fortunately, we just looked at the \(Y\) combinator which allows us
to express this.
Here is a simplified, readable implementation, using Python tuples:
_append = lambda f: lambda xs: lambda x: \
    (x, None) if not xs else (xs[0], f(xs[1])(x))
append = Y(_append)
print(append(append(append(None)(10))(20))(30))
# This will print (10, (20, (30, None)))
We can express the same using the lambdas we defined above (NULL
, ISNULL
,
PAIR
, HEAD
, TAIL
):
_APPEND = lambda f: lambda xs: lambda x: \
    ISNULL(xs) (lambda _: PAIR(x)(NULL)) (lambda _: PAIR(HEAD(xs))(f(TAIL(xs))(x))) (TRUE)
APPEND = Y(_APPEND)
L = APPEND(APPEND(APPEND(NULL)(10))(20))(30)
print(HEAD(L)) # prints 10
print(HEAD(TAIL(L))) # prints 20
We covered logic, arithmetic, combinators, pairs, and lists, all expressed as lambda terms. Let's also sketch a proof of Turing completeness, like we did in previous posts.
We're calling this a sketch
, as lambda notation is not easy to read. We will
instead look at an implementation using more Python syntax than just lambdas,
but we will only use constructs which we know can be expressed in lambda
calculus.
As usual, we will emulate another system which we know to be Turing-complete.
In part 3
we looked at tag systems. We talked about cyclic tag systems, which can emulate
m-tag systems, which are Turing-complete. As a reminder, a cyclic tag system is
implemented as a set of binary strings (strings containing only 0
s and 1
s)
which are production rules, and we process a binary input string by popping the
head of the string and, if it is equal to 1
, appending the current production
rule to the string. We cycle through the production rules at each step. This is
the code we used in the previous post:
def cyclic_tag_system(productions, string):
    # Keeps track of current production
    i = 0
    # Repeat until the string is empty
    while string:
        string = string[1:] + (productions[i] if string[0] == '1' else '')
        # Update current production
        i = i + 1
        if i == len(productions):
            i = 0
        yield string
We used the productions 11
, 01
, and 00
and the input 1
:
productions = ['11', '01', '00']
string = '1'
print(string)
for string in cyclic_tag_system(productions, string):
    print(string)
Let's sketch an alternative implementation using the constructs we covered in this post.
First, we can describe our production rules as lists of Boolean values. We
know how to represent Boolean values (TRUE
and FALSE
), and how to build
a list using PAIR
. Our productions can be represented as:
p1 = (True, (True, None)) # PAIR(TRUE)(PAIR(TRUE)(NULL))
p2 = (False, (True, None)) # PAIR(FALSE)(PAIR(TRUE)(NULL))
p3 = (False, (False, None)) # PAIR(FALSE)(PAIR(FALSE)(NULL))
productions = (p1, (p2, (p3, None)))
We can cycle through the list by processing the head, then appending it to the tail of the list. Here are simpler implementations of our list processing functions over Python tuples (though we know how to do these using only lambda terms):
def head(p):
    return p[0]

def tail(p):
    return p[1]

def append(xs, x):
    return (x, None) if not xs else (head(xs), append(tail(xs), x))
# If we want to cycle through our productions, we can do:
# productions = append(tail(productions), head(productions))
We'll also need a function to concatenate two lists. We can easily build this
on top of append()
:
def concat(xs, ys):
    return xs if not ys else concat(append(xs, head(ys)), tail(ys))
While we still have ys
, we append the head of ys
to xs
, then recurse
with the tail of ys
.
We process our input string as follows: if it is empty, we are done. If not,
if the head is 1
, we concatenate our current production to the end of the
string, and recurse, cycling productions:
def cyclic_tag_system(productions, input):
    return None if not input else \
        cyclic_tag_system(
            # Cycle productions
            append(tail(productions), head(productions)),
            # If head is True, concatenate head production. Pop head input either way.
            concat(tail(input), head(productions)) if head(input) else tail(input))
Let's throw in a print()
and run this on the same input as our original
example:
def cyclic_tag_system(productions, input):
    print(input)
    return None if not input else \
        cyclic_tag_system(
            # Cycle productions
            append(tail(productions), head(productions)),
            # If head is True, concatenate head production. Pop head input either way.
            concat(tail(input), head(productions)) if head(input) else tail(input))

# The input is equivalent to the string '1'
cyclic_tag_system(productions, (True, None))
This should produce output very similar to our original cyclic_tag_system()
,
but using lists of Booleans instead of strings of 0
s and 1
s.
We emulated a cyclic tag system in lambda calculus - well, we didn't write all the code as lambda terms, but everything is expressed as one-liner functions that use only if-then-else expressions, lists (pair, head, tail), and recursion (for which we have the \(Y\) combinator).
Lambda calculus has been extremely influential in computer science - it is the
root of functional programming. LISP, one of the earliest programming
languages, is heavily influenced by lambda calculus. Many ideas, like anonymous
functions, also known as lambdas, are now broadly available in most modern
programming languages (Python even uses the keyword lambda
for these, as we
saw in this post).
In this post we covered lambda calculus:
append
operation.
See this Wikipedia article. ↩
In the previous post we covered the von Neumann architecture and even built a small VM implementing the different components. Such a naïve implementation does make for a very inefficient machine though. In this post, we'll dive a bit deeper into machine architectures (virtual and physical) and discuss some of the implementation details. We'll talk about processing: register and stack-based; we'll talk about memory: word size, byte and word addressing; finally, we'll talk about I/O: port and memory mapped. Note these are all machines that conform to the von Neumann architecture, with the same high-level components. We're just double-clicking to the next level of implementation details.
The VM we implemented in our previous post simply operated directly over the memory. This works for a toy example, but moving data from memory to the CPU and back is costly. That's why modern CPUs employ multiple layers of caching (we won't cover these in this post), and rely on a set of registers to perform operations.
Registers can store a number of bits (the word size, more on it below)
and operations are performed using registers. For example, to add two
numbers, the machine would load one number into register R0
, the
second number into register R1
, add the values stored in registers
R0
and R1
, then finally save the result back to memory:
mov r0 @<memory address 1> # Move the value from memory address 1 to r0
mov r1 @<memory address 2> # Move the value from memory address 2 to r1
add r0 r1 # Add the values storing the result in r0
mov @<memory address 3> r0 # Move the value from r0 to memory address 3
Some registers are used for general computation. These are called
general-purpose registers. Other registers have specialized purposes.
For example, the program counter which keeps track of the instruction to
be executed is usually implemented as an IP
(instruction pointer) or
PC
(program counter) register.
The original 8088 Intel processor had 14 registers. Modern Intel processors have significantly more registers^{1}, though many of them are special-purpose. ARM processors have 17 registers^{2}, 13 of which are general purpose.
Let's emulate a simple CPU with 4 general-purpose registers and a program counter register to get the feel of it. We will only implement mov (move) and add instructions for this example. Our implementation will check the 16th bit of an argument to determine whether it refers to a register (if 0) or to a memory location (if 1).
class CPU:
    def __init__(self, memory):
        self.memory = memory
        self.registers = [0, 0, 0, 0, 0]  # r0, r1, r2, r3, pc

    def run(self):
        while self.registers[4] < len(self.memory):
            instr, arg1, arg2 = self.memory[
                self.registers[4]:self.registers[4] + 3]
            self.process(instr, arg1, arg2)
            self.registers[4] += 3

    def get_at(self, arg):
        # 16th bit tells us whether this refers to a register or memory
        if arg & (1 << 15):  # Memory address
            return self.memory[arg ^ (1 << 15)]
        else:  # Register
            return self.registers[arg]

    def set_at(self, arg, value):
        # 16th bit tells us whether this refers to a register or memory
        if arg & (1 << 15):  # Memory address
            self.memory[arg ^ (1 << 15)] = value
        else:  # Register
            self.registers[arg] = value

    def process(self, instr, arg1, arg2):
        match instr:
            case 0:  # mov
                self.set_at(arg1, self.get_at(arg2))
            case 1:  # add
                self.set_at(arg1, self.get_at(arg1) + self.get_at(arg2))
Here is how it would run a small program that adds two numbers and stores the result:
program = [
0, 0, 15 | (1 << 15), # mov r0 @15
0, 1, 16 | (1 << 15), # mov r1 @16
1, 0, 1, # add r0 r1
0, 17 | (1 << 15), 0, # mov @17 r0
0, 4, 18 | (1 << 15), # mov pc @18 - this ends execution
40, # this is @15
2, # this is @16
0, # this is @17
10000 # this is @18
]
## Load program into memory
memory = [0] * 10000
memory = program + memory[len(program):]
print(memory[17]) # Should print 0
CPU(memory).run()
print(memory[17]) # Should print 42
We're doing a bunch of stuff by hand, like loading the program into memory and not using an assembler to implement the program. That's because we're only focusing on the register-based processing. You can update the assembler in the previous post to target this VM as an exercise.
An alternative to registers is to use a stack for storage. While hardware stack machines are not unheard of, register machines easily outperform them so most CPUs you interact with are register-based. That said, stack machines are a popular choice for virtual machines - they are easier to implement and port to different systems and the stack keeps the data being processed close together which helps with performance when running the VM on a physical machine. A few examples: JVM (the Java virtual machine), the CLR (the .NET virtual machine), CPython's VM (the VM for the reference Python implementation) are all stack-based.
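We can actually peek at one of these stack VMs from Python itself: the standard library's dis module disassembles a function into the stack-based bytecode CPython runs. Exact opcode names vary between Python versions, so the output below is indicative rather than exact.

```python
import dis

# CPython compiles this into stack bytecode:
# push a, push b, add the top two values, return the result.
add = lambda a, b: a + b

names = [instr.opname for instr in dis.get_instructions(add)]
print(names)  # includes LOAD_FAST (push), an add opcode, and a return opcode
```

Note that no instruction names a destination register - operands are implicitly the top of the stack, just like in our toy stack machine below.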
The example we used above of adding two numbers would look like this on a stack machine: push the first number onto the stack, push the second number onto the stack, add the numbers (which would pop the two numbers from the stack and replace them with their sum), then pop the value from the stack and store it in memory.
push @<memory address 1> # Push a value from memory address 1
push @<memory address 2> # Push a value from memory address 2
add # Add the top two values
pop @<memory address 3> # Pop the top of the stack and store at memory address 3
Another advantage of stack machines is that, in general, instructions tend to be shorter. As you can see above, for most instructions that move data around, we don't need to specify both a source and a destination since the stack is implied.
Let's emulate a simple stack VM with only push, add, and pop instructions, plus a jmp (jump) instruction so we can use the same mechanism to terminate:
class CPU:
    def __init__(self, memory):
        self.memory = memory
        self.stack, self.pc = [], 0

    def run(self):
        while self.pc < len(self.memory):
            instr, arg = self.memory[self.pc:self.pc + 2]
            self.process(instr, arg)
            self.pc += 2

    def process(self, instr, arg):
        match instr:
            case 0:  # push
                self.stack.append(self.memory[arg])
            case 1:  # pop
                self.memory[arg] = self.stack.pop()
            case 2:  # jmp
                self.pc = self.stack.pop()
            case 3:  # add
                self.stack.append(self.stack.pop() + self.stack.pop())
Here is how it would run a small program that adds two numbers and stores the result:
program = [
0, 12, # push @12
0, 13, # push @13
3, 0, # add
1, 14, # pop @14
0, 15, # push @15
2, 0, # jmp
40, # this is @12
2, # this is @13
0, # this is @14
10000, # this is @15
]
## Load program into memory
memory = [0] * 10000
memory = program + memory[len(program):]
print(memory[14]) # Should print 0
CPU(memory).run()
print(memory[14]) # Should print 42
Contrast this implementation with the register-based one: the stack VM only needs one argument per instruction and the program is slightly shorter.
So far we focused on how data is processed. Let's also look at the different ways of referencing data.
We've been using Python for our toy implementations. Python supports arbitrarily large integers, so a list of numbers in Python (the way we implemented our memory) doesn't imply much in terms of bits and bytes. Bits and bytes do become important for physical machines and serious VMs implemented in languages closer to the metal.
First, let's talk about word size. A word is the fixed-size unit of computation for a CPU. Its size is measured in bits. For example, a 16-bit processor has a word size of 16 bits.
Applied to registers, this would mean that a machine register can hold at most 16 bits (a value between 0 and 65535). Operations within the value range are blazingly fast, as they run natively. If we need to process larger values, we need to do extra work to chunk the values into words and process these in turn. For example, we can split a 32-bit value into two 16-bit values, process them separately, then combine the results. This obviously impacts performance. The point is that we are not necessarily limited to the word size, but processing larger values becomes much costlier.
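As a sketch of what that extra work looks like, here is 32-bit addition done with only 16-bit operations: split each value into low and high words, add the low words, and carry into the high-word addition. The helper name is ours, not from any real instruction set.

```python
MASK = 0xFFFF  # one 16-bit word

def add32_on_16bit(a, b):
    # Split each 32-bit value into two 16-bit words
    a_lo, a_hi = a & MASK, (a >> 16) & MASK
    b_lo, b_hi = b & MASK, (b >> 16) & MASK
    # Add the low words; anything above bit 15 is the carry
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 16
    # Add the high words plus the carry, discarding overflow past 32 bits
    hi_sum = (a_hi + b_hi + carry) & MASK
    return (hi_sum << 16) | (lo_sum & MASK)

print(hex(add32_on_16bit(0x0001FFFF, 0x00000001)))  # 0x20000
```

A real 16-bit CPU would do this with an add-with-carry instruction; the point is that one 32-bit addition costs several word-sized operations.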
Applied to memory addresses, this would mean how pointers are represented and what range of values can be addressed. For example, if the word size is 16 bits, then a pointer can point to any one of 65536 distinct memory locations.
An architecture can use the same word size for both registers and pointers, or different word sizes for different concerns. Commonly, a single word size is used (and, potentially, fractions or multiples of it for special concerns), that's why it's common to refer to a processor as a 32-bit processor, 64-bit processor etc.
An implication of word size applied to memory addressing is how the machine accesses memory. Some architectures allow byte addressing, which means a pointer points to a specific byte in memory, while others support only word addressing, which means a pointer points to a word in memory.
This is another important decision when designing a computer. If we want to be able to address individual bytes, a 16-bit pointer can refer to any of 65536 bytes. That is 64 KB. If our memory is larger than that, a pointer won't be able to address the higher locations.
On the other hand, if we make our memory word-addressable, for our 16-bit example, a pointer can refer to any of 65536 16-bit words. 16 bits are 2 bytes, so our memory's upper limit is 131072 bytes (65536 x 2), which is 128 KB. We can now refer to higher memory addresses, but we can't address individual bytes as before - address 0 no longer refers to the byte at 0, it refers to the whole first 2-byte word (since address 1 refers to the next 2 bytes and so on).
This difference becomes even more dramatic for larger word sizes. A 32-bit pointer can address 4294967296 bytes (up to 4 GB of memory). Alternatively, with word addressing, the same pointer can cover 16 GB.
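These ranges are easy to recompute - a quick sketch, where addressable_bytes is our own helper:

```python
def addressable_bytes(pointer_bits, unit_bytes):
    # A pointer with n bits can name 2**n distinct units;
    # each unit covers unit_bytes bytes of memory.
    return (2 ** pointer_bits) * unit_bytes

print(addressable_bytes(16, 1))  # byte addressing: 65536 bytes (64 KB)
print(addressable_bytes(16, 2))  # 16-bit word addressing: 131072 bytes (128 KB)
print(addressable_bytes(32, 1))  # byte addressing: 4 GB
print(addressable_bytes(32, 4))  # 32-bit word addressing: 16 GB
```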
On the flip side, word addressing is less efficient when the unit of processing is smaller. Let's take text editing as an example. Say we want to update a one-byte character, like a UTF-8 encoded common character like a. If we can refer to it directly, we can load, process, and update its memory location using a pointer. If, on the other hand, this character is part of a larger word, we would have to process the whole word to extract the character we care about (masking bits we don't need to process), apply the update to the whole word, and write this word back to memory.
So depending on the scenario, byte or word addressing might make things faster or slower. Byte addressing is great for text processing - document authoring, HTML, writing code etc. Word addressing unlocks larger memory sizes and is great for crunching numbers - math, graphics etc.
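The read-modify-write dance described above for word-addressed memory can be sketched in a few lines: mask out the byte we want to change inside a word, then OR in the new value. The helper name and layout are our own, assuming little-endian byte order within the word.

```python
def set_byte_in_word(word, byte_index, new_byte, word_bytes=2):
    # Read-modify-write: clear the target byte with a mask,
    # then OR in the new value at the right bit position.
    assert 0 <= byte_index < word_bytes and 0 <= new_byte <= 0xFF
    shift = byte_index * 8
    cleared = word & ~(0xFF << shift)
    return cleared | (new_byte << shift)

# Update the low byte of a 16-bit word holding two characters
word = (ord('b') << 8) | ord('x')     # 'x' in the low byte, 'b' in the high byte
word = set_byte_in_word(word, 0, ord('a'))
print(chr(word & 0xFF), chr(word >> 8))  # a b
```

With byte addressing, the same update is a single store; with word addressing it is a load, two bitwise operations, and a store.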
Another important design decision is how to handle I/O.
One way to connect I/O to the system is through specific CPU instructions. For example, the CPU might have an inp instruction used to consume input and an out instruction used to send output. Programs can use these instructions to perform I/O. This is called port-mapped I/O, as I/O is achieved by connecting devices to the CPU via dedicated ports.
For example, let's extend our stack machine with an out instruction (also connecting an output to it):
class CPU:
    def __init__(self, memory, out):
        self.memory, self.out = memory, out
        self.stack, self.pc = [], 0

    def run(self):
        while self.pc < len(self.memory):
            instr, arg = self.memory[self.pc:self.pc + 2]
            self.process(instr, arg)
            self.pc += 2

    def process(self, instr, arg):
        match instr:
            case 0:  # push
                self.stack.append(self.memory[arg])
            case 1:  # pop
                self.memory[arg] = self.stack.pop()
            case 2:  # jmp
                self.pc = self.stack.pop()
            case 3:  # add
                self.stack.append(self.stack.pop() + self.stack.pop())
            case 4:  # out
                self.out(self.stack.pop())
Here is a program that prints Hello:
program = [
0, 24, # push @24
0, 25, # push @25
0, 26, # push @26
0, 27, # push @27
0, 28, # push @28
4, 0, # out
4, 0, # out
4, 0, # out
4, 0, # out
4, 0, # out
0, 29, # push @29
2, 0, # jmp
111, # this is @24
108, # this is @25
108, # this is @26
101, # this is @27
72, # this is @28
10000, # this is @29
]
## Load program into memory
memory = [0] * 10000
memory = program + memory[len(program):]
def out(val):
    print(chr(val), end='')
CPU(memory, out).run()
An alternative to port-mapped I/O is memory-mapped I/O. In this case, a certain address range of memory is used for I/O operations. That is, from the CPU's perspective, memory and I/O are addressed identically. But depending on the address range, data might reside in memory or it might actually come from/go to an I/O device.
Let's enhance our memory implementation (which so far was just an array) to support mapped I/O. In this case, any values written at address 1000 will be instead printed on screen:
class MappedMemory:
    def __init__(self, program):
        # MappedMemory wraps a list
        self.memory = [0] * 10000
        self.memory = program + self.memory[len(program):]

    def __len__(self):
        # Use underlying list's __len__
        return self.memory.__len__()

    def __getitem__(self, key):
        # Index in wrapped list
        return self.memory[key]

    def __setitem__(self, key, value):
        # If key is 1000, print
        if key == 1000:
            print(chr(value), end='')
        # Otherwise set in underlying list
        else:
            self.memory[key] = value
And here is the corresponding program that prints Hello (using the stack CPU without the out instruction and connected output):
program = [
0, 24, # push @24
0, 25, # push @25
0, 26, # push @26
0, 27, # push @27
0, 28, # push @28
1, 1000, # pop @1000
1, 1000, # pop @1000
1, 1000, # pop @1000
1, 1000, # pop @1000
1, 1000, # pop @1000
0, 29, # push @29
2, 0, # jmp
111, # this is @24
108, # this is @25
108, # this is @26
101, # this is @27
72, # this is @28
10000, # this is @29
]
## Load program into memory
memory = MappedMemory(program)
CPU(memory).run()
Note in this program we repeatedly set the value at address 1000, which is mapped to our output device (print()).
In this post we discussed some of the implementation details of machines and virtual machines:
- Processing: register-based and stack-based machines.
- Memory: word size, byte and word addressing.
- I/O: port-mapped and memory-mapped.
A few years back I implemented a toy VM with 7 registers, 16 op codes, 128 KB of memory, and port-mapped I/O in 121 lines of C++. It comes with an assembler, examples, and, of course, a Brainfuck interpreter. Linking it here for reference: Pixie.
See this SO question. ↩
See the ARM documentation. ↩
During the previous posts, we covered Turing machines, tag systems, and cellular automata. All of these are equivalent in terms of what they can compute, but some are more practical than others. In this post, we'll look at the von Neumann architecture of physical computers and implement an extremely inefficient machine, write a few programs targeting it, then prove it is Turing complete.
John von Neumann was a famous mathematician and physicist. Contemporary with Alan Turing, he was aware of Turing's work on Turing machines and computability. At the same time, von Neumann was involved in the Manhattan Project which required lots of computation provided by some early computers. Thus he got involved in computer design. Unlike a Turing machine, a physical computer can't have an infinite tape and while data is processed based on input and states, this needs to be more ergonomic than Yurii Rogozhin's 4-state 6-symbol machine we described in Part 2.
Von Neumann described a computer architecture as consisting of the following components^{1}:
- A central arithmetic component (CA) handling calculation.
- A central control component (CC) driving which calculations should be performed.
- Memory (M) for storage.
- Input (I) and output (O) components to get data into the system and to communicate results outside of the system, from/to a recording medium (R).
Here is a diagram of this architecture:
Before von Neumann, computers were single-purpose devices - the programming was hardwired. One of the major innovations, which might not be apparent, is the introduction of a central control component and the ability of the memory to store not only data but also the program itself. This makes devices based on this architecture able to be reprogrammed to perform different tasks.
We can now load an arbitrary program into memory. The program will use the instructions which our central arithmetic understands to perform computations. The central control can read this program and have the central arithmetic perform the required operations. During execution, data is also read from/written to memory.
Programs (and data) are loaded into memory through the input component and results are sent through the output component.
While over the following decades this architecture got tweaked and tuned, it's pretty obvious it is the ancestor of all modern computers: computers still have CPUs, which include control and arithmetic, and memory.
Let's create a virtual machine based on this architecture.
We will create a very simple machine based on this architecture in Python. In subsequent posts, we will look at other designs, but we're starting with a direct translation of this architecture.
The interface to our input component is a function that, when called, returns an integer. This is all our machine needs to get data.
We will implement this over a text file. Our input component will buffer this file into a list and expose a read_one() function that will return one integer (as returned by ord()) for each character from the buffer.
def inp(file):
    buffer = list(open(file).read())
    return lambda: ord(buffer.pop(0))
The interface to our output component is a function that takes an integer as an argument. This is all our machine needs to output one memory cell.
We will implement this using print() and actually convert the given integer to a character. This is just to provide a convenient way for us to look at output like Hello world!.
def out(value):
    print(chr(value), end='')
Our memory will consist of a list of 10000 integers. We will zero-initialize the list, then load a program from a file to memory, starting at address 0. We expect the program to consist of a series of integers separated by a space or a newline character. We'll use this encoding to make it easier for us to peek at the code targeting our von Neumann machine.
def memory(file):
    memory = [0] * 10000
    for i, value in enumerate(' '.join(open(file).readlines()).split()):
        memory[i] = int(value)
    return memory
10000 is chosen arbitrarily, at this point we're not worrying about word size, page alignment etc. We simply have room to store 10000 integers in our memory, which will include both code and data.
We'll package the control and arithmetic components into a CPU class. We'll initialize this class with memory, input, and output components.
class CPU:
    def __init__(self, memory, inp, out):
        self.memory, self.inp, self.out = memory, inp, out
Our control unit will maintain a program counter (PC), an index into the memory pointing to the next instruction to execute. The machine runs by reading 3 integers from memory (at PC, PC + 1 and PC + 2), and passing these to the arithmetic unit for processing. The program counter is then incremented by 3. This repeats until PC goes outside the bounds of the memory, at which point the machine halts (alternately we could have provided some HALT instruction).
    def run(self):
        self.pc = 0
        while self.pc < len(self.memory):
            instr, m1, m2 = self.memory[self.pc:self.pc + 3]
            self.process(instr, m1, m2)
            self.pc += 3
We will implement process() next.
Our arithmetic unit will process triples of <Instruction> <memory address 1> <memory address 2>. It will support 8 instructions:
- AT will set the value at memory address 1 to be the value at the memory address specified by the value at memory address 2 (in short, m[m1] = m[m[m2]]).
- SET will set the value at the memory address specified by the value at memory address 1 to be the value at memory address 2 (in short, m[m[m1]] = m[m2]).
- ADD will update the value at memory address 1 by adding the value at memory address 2 to it (in short, m[m1] += m[m2]).
- NOT will update the value at memory address 1 to be 0 if the value at memory address 2 is different than 0, or 1 if the value at memory address 2 is 0 (in short, m[m1] = !m[m2]).
- EQ will compare the values at memory address 1 and memory address 2 and update the value at memory address 1 to be 1 if they are equal, 0 otherwise (in short, m[m1] = m[m1] == m[m2]).
- JZ will perform a conditional jump - if the value at memory address 1 is 0, it will update the program counter to point to memory address 2 (in short, if !m[m1] then PC = m2).
- INP will read one integer from the input and store it at memory address 1 + an offset value specified at memory address 2 (in short, m[m1 + m[m2]] = inp()).
- OUT will write the value at memory address 1 + an offset value specified at memory address 2 to the output (in short, out(m[m1 + m[m2]])).

Since the instructions are also read from memory, which is a list of integers, we will encode them as integers: AT = 0, SET = 1, ... OUT = 7.
    def process(self, instr, m1, m2):
        match instr:
            case 0:  # AT
                self.memory[m1] = self.memory[self.memory[m2]]
            case 1:  # SET
                self.memory[self.memory[m1]] = self.memory[m2]
            case 2:  # ADD
                self.memory[m1] += self.memory[m2]
            case 3:  # NOT
                self.memory[m1] = +(not self.memory[m2])
            case 4:  # EQ
                self.memory[m1] = +(self.memory[m1] == self.memory[m2])
            case 5:  # JZ
                if not self.memory[m1]:
                    # Set PC to m2 - 3 since run() will increment PC by 3
                    self.pc = m2 - 3
            case 6:  # INP
                self.memory[m1 + self.memory[m2]] = self.inp()
            case 7:  # OUT
                self.out(self.memory[m1 + self.memory[m2]])
            case _:
                raise Exception("Unknown instruction")
Putting it all together, we'll take two input arguments: the first one (argv[1]) will represent the code input file containing the program, the second one (argv[2]) will be the file containing additional input to be consumed by the inp() function:
import sys
vn = CPU(memory(sys.argv[1]), inp(sys.argv[2]), out)
vn.run()
Here is our von Neumann virtual machine in one listing:
def inp(file):
    buffer = list(open(file).read())
    return lambda: ord(buffer.pop(0))

def out(value):
    print(chr(value), end='')

def memory(file):
    memory = [0] * 10000
    for i, value in enumerate(' '.join(open(file).readlines()).split()):
        memory[i] = int(value)
    return memory

class CPU:
    def __init__(self, memory, inp, out):
        self.memory, self.inp, self.out = memory, inp, out

    def run(self):
        self.pc = 0
        while self.pc < len(self.memory):
            instr, m1, m2 = self.memory[self.pc:self.pc + 3]
            self.process(instr, m1, m2)
            self.pc += 3

    def process(self, instr, m1, m2):
        match instr:
            case 0:  # AT
                self.memory[m1] = self.memory[self.memory[m2]]
            case 1:  # SET
                self.memory[self.memory[m1]] = self.memory[m2]
            case 2:  # ADD
                self.memory[m1] += self.memory[m2]
            case 3:  # NOT
                self.memory[m1] = +(not self.memory[m2])
            case 4:  # EQ
                self.memory[m1] = +(self.memory[m1] == self.memory[m2])
            case 5:  # JZ
                if not self.memory[m1]:
                    # Set PC to m2 - 3 since run() will increment PC by 3
                    self.pc = m2 - 3
            case 6:  # INP
                self.memory[m1 + self.memory[m2]] = self.inp()
            case 7:  # OUT
                self.out(self.memory[m1 + self.memory[m2]])
            case _:
                raise Exception("Unknown instruction")

import sys

vn = CPU(memory(sys.argv[1]), inp(sys.argv[2]), out)
vn.run()
We can save this as vn.py.
Let's create a Hello world! program targeting this machine. We will use the OUT instruction to output each character of Hello and a new line (\n). We'll first tell the VM to output the values at memory addresses 21 to 26:
7 21 9999
7 22 9999
7 23 9999
7 24 9999
7 25 9999
7 26 9999
We are referencing addresses 21 to 26 plus the offset 0 (the value at memory address 9999, since our memory is initialized with zeros).
We want to halt after this, so we need to jump our program counter to 10000. We will do this by using our JZ instruction, saying if the memory value at index 9999 is 0, jump to 10000:
5 9999 10000
Now we get to memory address 21, so we will set the values of memory 21 to 26 to the values of the characters in Hello (as returned by ord()) plus a 10 for \n:
72 101 108 108 111 10
Here is the full listing, which we can save as hello.vn:
7 21 9999
7 22 9999
7 23 9999
7 24 9999
7 25 9999
7 26 9999
5 9999 10000
72 101 108 108 111 10
We can then use our VM to run the program like this:
touch input
python3 vn.py hello.vn input
We're also creating a blank input file since Hello world! isn't going to read anything via inp().
Running this should print Hello. Our program is pretty hard to write or read - we're programming with integers. Let's make our life a bit easier.
We will implement an assembler for our VM. An assembly language is a low-level language closely matching the architecture it targets (in our case, our very simple von Neumann machine).
Our assembler will take 2 arguments - an input file and an output file - and automatically translate the input (assembly language) into instructions for our VM.
We will add the following features:
- Comments - lines starting with # will be ignored.
- Mnemonics - we can use at, set, add, not, eq, jz, inp, out to represent the instructions 0, 1, ... 7.
- Labels - we can mark a location in the program with an identifier followed by :, for example HERE:. We will then be able to refer to the location using the identifier preceded by :, like :HERE. We will also allow adding an offset to a reference: :HERE+2 is 2 past the HERE label.
- ORD macro - to make implementing Hello world! easier, we will provide the ORD() macro which will return the integer representation of the character passed to it; for example ORD(H) will return 72.

Using this assembly language, we can rewrite Hello world! as:
## Print 6 characters starting from DATA
out :DATA 9999
out :DATA+1 9999
out :DATA+2 9999
out :DATA+3 9999
out :DATA+4 9999
out :DATA+5 9999
## End program
jz 9999 10000
## Data section
DATA: ORD(H) ORD(e) ORD(l) ORD(l) ORD(o) 10
First, we'll read the input file and convert it into a list of tokens. We will ignore lines starting with # (so we can add comments to our assembly file).
import sys
if len(sys.argv) != 3:
    print("Usage: asm.py <input> <output>")
    exit()
## Read all lines into a list
lines = open(sys.argv[1]).readlines()
## Filter out blank lines and lines starting with '#'
lines = list(filter(lambda line: line and line[0] != '#', lines))
## Join all lines and split into tokens
tokens = ' '.join(lines).split()
The labels themselves aren't part of the program, rather mark locations in the program, so in the next step we will pluck these out from the list of tokens but retain the index they are referencing:
## Pluck labels and remember position
labels, i = {}, 0
while i < len(tokens):
    # If not a label, advance
    if tokens[i][-1] != ':':
        i += 1
        continue
    # Store location and pluck label
    labels[tokens[i][:-1]] = i
    tokens.pop(i)
Now we will process all tokens and handle the following cases:
- If a token starts with :, it is a label reference, so replace it with the actual location (as stored during the previous step).
- If a token is one of the op code mnemonics, replace it with the corresponding integer value.
- If a token is an ORD() macro, replace the character passed to ORD() with its value.

## Op code list (constant)
OP_CODES = ['at', 'set', 'add', 'not', 'eq', 'jz', 'inp', 'out']
for i, token in enumerate(tokens):
    # Replace label references with actual position
    if token[0] == ':':
        if '+' in token:
            base, offset = token.split('+')
            tokens[i] = labels[base[1:]] + int(offset)
        else:
            tokens[i] = labels[token[1:]]
    # Replace op codes with values
    if token in OP_CODES:
        tokens[i] = OP_CODES.index(token)
    # Replace ORD macro
    if token[:4] == 'ORD(':
        tokens[i] = ord(token[4:-1])
Finally, we write all tokens to the output file:
open(sys.argv[2], "w").write(
    ' '.join([str(token) for token in tokens]))
Here is the full source code of our assembler (asm.py):
import sys

if len(sys.argv) != 3:
    print("Usage: asm.py <input> <output>")
    exit()

## Read all lines into a list
lines = open(sys.argv[1]).readlines()

## Filter out blank lines and lines starting with '#'
lines = list(filter(lambda line: line and line[0] != '#', lines))

## Join all lines and split into tokens
tokens = ' '.join(lines).split()

## Pluck labels and remember position
labels, i = {}, 0
while i < len(tokens):
    # If not a label, advance
    if tokens[i][-1] != ':':
        i += 1
        continue
    # Store location and pluck label
    labels[tokens[i][:-1]] = i
    tokens.pop(i)

## Op code list (constant)
OP_CODES = ['at', 'set', 'add', 'not', 'eq', 'jz', 'inp', 'out']

for i, token in enumerate(tokens):
    # Replace label references with actual position
    if token[0] == ':':
        if '+' in token:
            base, offset = token.split('+')
            tokens[i] = labels[base[1:]] + int(offset)
        else:
            tokens[i] = labels[token[1:]]
    # Replace op codes with values
    if token in OP_CODES:
        tokens[i] = OP_CODES.index(token)
    # Replace ORD macro
    if token[:4] == 'ORD(':
        tokens[i] = ord(token[4:-1])

open(sys.argv[2], "w").write(
    ' '.join([str(token) for token in tokens]))
We can now save our assembly Hello world! (listed above) to a file, let's call it hello.asm, and use the assembler to convert it to a program our VM can execute:
python3 asm.py hello.asm hello.vn
The resulting hello.vn should have the same content as our hand-crafted Hello world!, minus the newlines (the assembler doesn't output newlines). The content of the assembled file hello.vn is:
7 21 9999 7 22 9999 7 23 9999 7 24 9999 7 25 9999 7 26 9999 5 9999 10000 72 101 108 108 111 10
We can run this using:
python3 vn.py hello.vn input
We are again using an empty input file since we don't need input. As a convention, we use the .asm extension for assembly files and .vn for assembled files targeting the VM.
Let's rewrite our program: instead of outputting :DATA, then :DATA+1, then :DATA+2... we should be able to output :DATA + :I where :I goes from 0 to 5.
We can easily create a variable by labeling a location in the program, then using that label to refer to the variable:
I: 0
Then we can use :I to reference it. We will use a COUNTER variable to count down from 6 to 0, and an offset variable I:
## Variables
I: 0
COUNTER: 6
We also need a few constant values: 0, 1 (by which we increment I during each iteration), and -1 (by which we decrement COUNTER during each iteration). And, of course, our DATA, where we store the Hello string:
## Constants
CONST: 0 1 -1
## Data
DATA: ORD(H) ORD(e) ORD(l) ORD(l) ORD(o) 10
Now let's look at how we can implement a loop using JZ:
## Beginning of loop
LOOP:
## Output I
out :DATA :I
## Decrement COUNTER, increment I
add :COUNTER :CONST+2
add :I :CONST+1
## If COUNTER is 0, we're done
jz :COUNTER 10000
## If not, jump to the start of the loop
jz :CONST :LOOP
At each iteration, our loop will output the character value at DATA plus the offset specified in I (initially 0). Then we add -1 to our COUNTER and 1 to I. Since our VM uses memory addresses for all operations, we stored 1 and -1 in memory at CONST+1 and CONST+2 respectively.
If COUNTER is 0, we're done, so we jump to 10000. If not, we repeat the loop (jump to LOOP if CONST is 0, but CONST is always 0).
Here is the full listing of this program:
## Beginning of loop
LOOP:
## Output I
out :DATA :I
## Decrement COUNTER, increment I
add :COUNTER :CONST+2
add :I :CONST+1
## If COUNTER is 0, we're done
jz :COUNTER 10000
## If not, jump to the start of the loop
jz :CONST :LOOP
## Constants
CONST: 0 1 -1
## Data
DATA: ORD(H) ORD(e) ORD(l) ORD(l) ORD(o) 10
## Variables
I: 0
COUNTER: 6
We can save this as hello2.asm
, then assemble and run it:
python3 asm.py hello2.asm hello2.vn
python3 vn.py hello2.vn input
A few notes: data is mixed with code in all our programs, which follows from the von Neumann architecture, in which the memory of the system stores both code and data. This is fundamentally true for all computers, and enables some interesting behavior like self-modifying code. This could be intentional, or we could, accidentally due to a bug, interpret data as code or vice-versa, code as data. Modern systems employ various additional protections to prevent this type of accidental usage.
Because our particular VM starts execution from memory location 0, we have to place our constants and variables (data) after the instructions in the program. Executable files on modern systems similarly contain code and data segments, albeit with more complex layout and rules.
Let's prove our simple von Neumann VM is Turing-complete, meaning capable of universal computation. As we saw throughout this series of blog posts, the best way to prove this is to emulate another known Turing-complete system.
We will prove this by implementing a Brainfuck interpreter. We covered Brainfuck during the second post in the series, under Esoteric Turing machines. To recap: Brainfuck (BF) uses a byte array (tape), a data pointer (index in the array), and 8 symbols: >, <, +, -, ., ,, [, ]. The symbols are interpreted as:
- >: Increment the data pointer (move head right).
- <: Decrement the data pointer (move head left).
- +: Increment array value at data pointer.
- -: Decrement array value at data pointer.
- .: Output value at data pointer.
- ,: Read 1 byte of input and store at data pointer.
- [: If the byte at data pointer is 0, jump right past the matching ], else move to the next instruction.
- ]: If the byte at data pointer is not 0, jump left to the matching [, else move to the next instruction.

We will use our assembly language to implement a program which reads a BF program from input, then executes it. Effectively, we'll use our very simple virtual machine to emulate another very simple virtual machine!
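As a sanity check on these semantics, here is a minimal BF interpreter in plain Python - a reference for what each symbol should do, not the assembly implementation below (bf_run and its fixed 30000-cell tape are our own choices):

```python
def bf_run(code, inp=""):
    tape, dp, ip, out = [0] * 30000, 0, 0, []
    inp = list(inp)
    while ip < len(code):
        c = code[ip]
        if c == '>': dp += 1
        elif c == '<': dp -= 1
        elif c == '+': tape[dp] = (tape[dp] + 1) % 256
        elif c == '-': tape[dp] = (tape[dp] - 1) % 256
        elif c == '.': out.append(chr(tape[dp]))
        elif c == ',': tape[dp] = ord(inp.pop(0)) if inp else 0
        elif c == '[' and tape[dp] == 0:
            depth = 1
            while depth:  # skip forward past the matching ]
                ip += 1
                depth += {'[': 1, ']': -1}.get(code[ip], 0)
        elif c == ']' and tape[dp] != 0:
            depth = 1
            while depth:  # scan back to the matching [
                ip -= 1
                depth += {']': 1, '[': -1}.get(code[ip], 0)
        ip += 1
    return ''.join(out)

# Cell 0 counts down from 8; each pass adds 8 to cell 1, reaching 64, then +1 = 'A'
print(bf_run("++++++++[>++++++++<-]>+."))  # A
```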
I won't cover the details of the implementation, since it is quite cumbersome due to the simplicity of our VM and assembly language. I will just provide a short summary of what is going on:
- We read the BF program from input until a newline (\n) is encountered.
- We maintain a CODE_PTR code pointer variable to point to the current BF instruction and a DATA_PTR data pointer variable to point to the BF array.
- We compare the current instruction against each of the BF symbols (>, <, etc.) and jump to the code handling it.
- The trickiest instructions to handle are [ and ], which require keeping track of unbalanced parentheses so we properly jump from [ to the matching ] and vice-versa.

Here is the full Brainfuck interpreter implemented in our assembly language:
## Read Brainfuck program until a \n is encountered
START:
## Read one integer at PROG + offset I
inp :PROG :I
## Increment I by 1
add :I :CONST+1
## Zero out DONE_READING (!1)
not :DONE_READING :CONST+1
## DONE_READING = 10
add :DONE_READING :CONST+3
## Load the last integer we read in TEMP
at :TEMP :END
## Increment END to keep track of program end
add :END :CONST+1
## Check if the last integer we read was 10 (\n)
eq :DONE_READING :TEMP
## If it wasn't \n (eq result is 0), jump to START and read another value
jz :DONE_READING :START
## Start running program
BF_RUN:
at :TEMP :CODE_PTR
add :CODE_PTR :CONST+1
## Check if we're on a > instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :RIGHT
## Check if we're on a < instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+1
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :LEFT
## Check if we're on a + instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+2
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :INC
## Check if we're on a - instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+3
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :DEC
## Check if we're on a . instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+4
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :OUT
## Check if we're on a , instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+5
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :IN
## Check if we're on a [ instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+6
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :FORWARD
## Check if we're on a ] instruction
not :TEMP2 :CONST+1
add :TEMP2 :BF+7
eq :TEMP2 :TEMP
not :TEMP2 :TEMP2
jz :TEMP2 :BACKWARD
## No matching BF instruction so we're done
jz :CONST 10000
RIGHT:
## > - increment data pointer
add :DATA_PTR :CONST+1
jz :CONST :BF_RUN
LEFT:
## < - decrement data pointer
add :DATA_PTR :CONST+2
jz :CONST :BF_RUN
INC:
## + - increment cell
at :TEMP :DATA_PTR
add :TEMP :CONST+1
set :DATA_PTR :TEMP
jz :CONST :BF_RUN
DEC:
## - - decrement cell
at :TEMP :DATA_PTR
add :TEMP :CONST+2
set :DATA_PTR :TEMP
jz :CONST :BF_RUN
OUT:
## . - output cell
at :TEMP :DATA_PTR
out :TEMP :CONST
jz :CONST :BF_RUN
IN:
## , - store input in cell
inp :TEMP :CONST
set :DATA_PTR :TEMP
jz :CONST :BF_RUN
FORWARD:
## [
at :TEMP :DATA_PTR
not :TEMP :TEMP
## If value in cell is not 0, continue
jz :TEMP :BF_RUN
## Find matching ]
## Set TEMP to 1, counting unbalanced [
not :TEMP :TEMP
add :TEMP :CONST+1
SCAN_FORWARD:
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+6
not :TEMP2 :TEMP2
## Jump if found a [
jz :TEMP2 :FORWARD_LPAR
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+7
not :TEMP2 :TEMP2
## Jump if found a ]
jz :TEMP2 :FORWARD_RPAR
## Keep scanning
add :CODE_PTR :CONST+1
jz :CONST :SCAN_FORWARD
## Increment counter when finding a [
FORWARD_LPAR:
add :TEMP :CONST+1
add :CODE_PTR :CONST+1
jz :CONST :SCAN_FORWARD
## Decrement counter when finding a ]
FORWARD_RPAR:
add :TEMP :CONST+2
## If counter is 0, we're done
jz :TEMP :BF_RUN
## Else keep scanning
add :CODE_PTR :CONST+1
jz :CONST :SCAN_FORWARD
BACKWARD:
## ]
at :TEMP :DATA_PTR
## If value in cell is 0, continue
jz :TEMP :BF_RUN
## Find matching [
## Set TEMP to 1, counting unbalanced ]
not :TEMP :TEMP
add :TEMP :CONST+1
## Move code pointer back 2
add :CODE_PTR :CONST+2
add :CODE_PTR :CONST+2
SCAN_BACKWARD:
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+6
not :TEMP2 :TEMP2
## Jump if found a [
jz :TEMP2 :BACKWARD_LPAR
at :TEMP2 :CODE_PTR
eq :TEMP2 :BF+7
not :TEMP2 :TEMP2
## Jump if found a ]
jz :TEMP2 :BACKWARD_RPAR
## Keep scanning
add :CODE_PTR :CONST+2
jz :CONST :SCAN_BACKWARD
## Decrement counter when finding a [
BACKWARD_LPAR:
add :TEMP :CONST+2
## If counter is 0, we're done
jz :TEMP :BF_RUN
## Else keep scanning
add :CODE_PTR :CONST+2
jz :CONST :SCAN_BACKWARD
## Increment counter when finding a ]
BACKWARD_RPAR:
add :TEMP :CONST+1
add :CODE_PTR :CONST+2
jz :CONST :SCAN_BACKWARD
CONST: 0 1 -1 10
BF: ORD(>) ORD(<) ORD(+) ORD(-) ORD(.) ORD(,) ORD([) ORD(])
I: 0
TEMP: 0
TEMP2: 0
END: :PROG
DONE_READING: 0
CODE_PTR: :PROG
DATA_PTR: 5000
## We'll load the BF program here
PROG:
We can save this program as bf.asm. We will also create a Brainfuck program to run - Hello world:
++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
We will save this as hello.bf. Now we can compile our BF interpreter and run it using our VM:
python3 asm.py bf.asm bf.vn
python3 vn.py bf.vn hello.bf
This should output Hello world!
Since Brainfuck is Turing-complete and our VM can emulate a Brainfuck interpreter, our VM is also Turing-complete.
In this post we implemented a simple von Neumann VM and an assembler for it, wrote Hello world, and saw how we can use variables and loops. We then proved the VM is Turing-complete by implementing a Brainfuck interpreter.
For convenience, the code we covered in this post is online here:
In the previous post we talked about Conway's Game of Life as a well-known cellular automaton. In this post we will cover even simpler automata - the elementary cellular automata. Stephen Wolfram covers them extensively in his book, A New Kind of Science.
To recap, we defined a cellular automaton as a discrete n-dimensional lattice of cells, a set of states (for each cell), a notion of neighborhood for each cell, and a transition function mapping the neighborhood of each cell to a new cell state.
An elementary cellular automaton is 1-dimensional - an array of cells. A cell can be either on or off (just like in Conway's Game of Life). The neighborhood of a cell, meaning the cells that we take into account when we determine the next state of the next generation, consists of the cell itself and its left and right neighbors.
For example, we can define an elementary cellular automaton with the following rules:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> on
[off, on, on] -> off
[off, on, off] -> on
[off, off, on] -> on
[off, off, off] -> off
If we start with a single on cell and produce 10 generations, we get (using # to mean on):
         #
        ###
       #   #
      ### ###
     #       #
    ###     ###
   #   #   #   #
  ### ### ### ###
 #               #
###             ###
The elementary cellular automata can easily be enumerated exhaustively: the neighborhood of a cell can be in only one of 8 states, as we saw above: [on, on, on], [on, on, off], ... [off, off, off]. The transition function maps each of these possible states to either on or off. If we think of the on/off as a bit, we need 8 bits to represent the transition function. Our example transition function:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> on
[off, on, on] -> off
[off, on, off] -> on
[off, off, on] -> on
[off, off, off] -> off
can be represented as the binary number 00010110, which, in decimal, is 22 (where [off, off, off] is the least significant bit). We can represent numbers from 0 to 255 in 8 bits, so there are exactly 256 elementary cellular automata. This encoding is referred to as a Rule, as in transition rule. The elementary cellular automaton in our above example is called Rule 22.
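The encoding can be checked in a couple of lines of Python; the outputs dictionary below just restates the table above, keyed by the neighborhood value (7 is [on, on, on], 0 is [off, off, off]):

```python
# Rule 22's transition outputs, keyed by neighborhood encoded as a 3-bit number.
outputs = {
    7: False,  # [ on,  on,  on]
    6: False,  # [ on,  on, off]
    5: False,  # [ on, off,  on]
    4: True,   # [ on, off, off]
    3: False,  # [off,  on,  on]
    2: True,   # [off,  on, off]
    1: True,   # [off, off,  on]
    0: False,  # [off, off, off]
}

# Pack the 8 output bits into one number: bit i is the output for neighborhood i.
rule = sum(on << i for i, on in outputs.items())
print(rule)  # 22
```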
A common way to plot the evolution of an elementary cellular automaton over multiple generations is to render each generation below the previous one, like our above example using # for on. A more condensed version with 1 pixel per cell of running rule 22 for 301 generations looks like this:
At this level, we can clearly see patterns emerging in the automaton. We get an even more interesting view if, instead of starting with just a single on cell, we start with a random state - an array of random on and off cells. Here is rule 22 starting with 301 random cells and running for 301 generations:
We can also easily see that some of the automata are complements of other automata: if we swap the roles of on and off (in both the neighborhood patterns and the resulting states), we end up with a complementary version. Rule 22's complement is Rule 151:
We can also reflect a rule by swapping the transitions for [on, off, off] with [off, off, on] and [on, on, off] with [off, on, on]. This doesn't work for rule 22, since its reflection is still 22, but, for example, rules 3 and 17 are reflections of each other.
Rule 3:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> off
[off, on, on] -> off
[off, on, off] -> off
[off, off, on] -> on
[off, off, off] -> on
Renders as:
Rule 17:
[ on, on, on] -> off
[ on, on, off] -> off
[ on, off, on] -> off
[ on, off, off] -> on
[off, on, on] -> off
[off, on, off] -> off
[off, off, on] -> off
[off, off, off] -> on
Renders as:
That means that, even though there are 256 possible automata, from a behavioral perspective some are complements or reflections of others and thus exhibit the same behavior. In fact, there are only 88 uniquely behaving automata, all others being complements and/or reflections of these.
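As a sketch, both transformations can be computed directly on the rule number; the function names complement and reflect are mine, not standard:

```python
def complement(rule):
    # Swap the roles of on and off: flip both the neighborhood
    # bits (index i -> 7 - i) and the resulting output bit.
    return sum((1 - (rule >> (7 - i) & 1)) << i for i in range(8))

def reflect(rule):
    # Mirror each neighborhood left-to-right: swap bits 4 <-> 1
    # ([on, off, off] <-> [off, off, on]) and 6 <-> 3
    # ([on, on, off] <-> [off, on, on]); 0, 2, 5, 7 are palindromes.
    mirrored = {0: 0, 1: 4, 2: 2, 3: 6, 4: 1, 5: 5, 6: 3, 7: 7}
    return sum((rule >> i & 1) << mirrored[i] for i in range(8))

print(complement(22))  # 151
print(reflect(3))      # 17
print(reflect(22))     # 22 (its own reflection)
```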
Let's look at a Python implementation. We will represent the state of an automaton as a list of Boolean cells. We can encode the state of a neighborhood as a 3-bit number: left neighbor * 4 + cell * 2 + right neighbor. Given a list of cells and the index of a cell, we have:
def neighbors(cells, i):
return (cells[i - 1] if i > 0 else False) * 4 + \
cells[i] * 2 + \
(cells[i + 1] if i < len(cells) - 1 else False)
If we run off the ends of the list, we assume the state of that cell is off. In Python, False becomes 0 and True becomes 1 if we do arithmetic with them, so this function will return a number between 0 and 7.
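A couple of worked calls make the encoding concrete:

```python
def neighbors(cells, i):
    # Encode the neighborhood [left, cell, right] as a 3-bit number.
    return (cells[i - 1] if i > 0 else False) * 4 + \
           cells[i] * 2 + \
           (cells[i + 1] if i < len(cells) - 1 else False)

cells = [True, False, True]
print(neighbors(cells, 1))  # 5: [on, off, on] is 101 in binary
print(neighbors(cells, 0))  # 2: left edge treated as off, so [off, on, off]
print(neighbors(cells, 2))  # 2: right edge treated as off, so [off, on, off]
```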
We can derive the transitions from the rule number by taking a rule number and expanding it into a dictionary that maps each value from 0 to 7 to the corresponding bit in the rule number value:
def transition(rule):
return {i: rule & (1 << i) != 0 for i in range(8)}
This might be a bit hard to understand, so let's work through an example. Let's take Rule 22. The binary representation of Rule 22 is 00010110. We're iterating over the range 0...7 (i) and for each of these values, we shift 1 exactly i bits left. Then we check whether the rule bitwise ANDed with this shifted bit is different from 0.
For i == 0: 00010110 & (1 << 0), which is 00010110 & 00000001, we get False, so transitions[0] = False.
For i == 1: 00010110 & (1 << 1), which is 00010110 & 00000010, we get True, so transitions[1] = True.
...
For i == 7: 00010110 & (1 << 7), which is 00010110 & 10000000, we get False, so transitions[7] = False.
Remember the keys of the dictionary are neighborhood states.
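For example, applying transition to Rule 22 marks exactly the neighborhoods 001, 010 and 100 (values 1, 2 and 4) as on, matching the table we started with:

```python
def transition(rule):
    # Map each neighborhood state (0-7) to the corresponding bit of the rule.
    return {i: rule & (1 << i) != 0 for i in range(8)}

t = transition(22)
print(sorted(i for i in t if t[i]))  # [1, 2, 4]
```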
Now we just need a function that takes a rule, an initial state, and the number of steps we want to run. The function will start with the initial state, then at each step, update the list of cells using the transition function:
def run(rule, initial_state, steps):
t, cells = transition(rule), initial_state
for _ in range(steps):
yield cells
cells = [t[neighbors(cells, i)] for i in range(len(cells))]
We talked about two ways to look at cellular automata: starting with a single on cell, or starting with a random initial state.
Let's implement an initial_state function which takes a cell count as input and returns a list of cells, all of which are off except the middle one:
def initial_state(cell_count):
result = [False] * cell_count
result[cell_count // 2] = True
return result
We'll also want a random_initial_state which takes a cell count and returns a random cell list. We'll take advantage of the fact that Python supports arbitrarily large integers natively, so we'll just generate a random number with cell_count bits, then derive the cell list from that (if a bit is 1, the corresponding cell is on):
import random
def random_initial_state(cell_count):
seed = random.randint(0, 2 ** cell_count - 1)
return [seed & (1 << i) != 0 for i in range(cell_count)]
Here is all the code in one listing:
def neighbors(cells, i):
return (cells[i - 1] if i > 0 else False) * 4 + \
cells[i] * 2 + \
(cells[i + 1] if i < len(cells) - 1 else False)
def transition(rule):
return {i: rule & (1 << i) != 0 for i in range(8)}
def run(rule, initial_state, steps):
t, cells = transition(rule), initial_state
for _ in range(steps):
yield cells
cells = [t[neighbors(cells, i)] for i in range(len(cells))]
def initial_state(cell_count):
result = [False] * cell_count
result[cell_count // 2] = True
return result
import random
def random_initial_state(cell_count):
seed = random.randint(0, 2 ** cell_count - 1)
return [seed & (1 << i) != 0 for i in range(cell_count)]
Here is how we can use this to print the first 30 steps of Rule 22:
for state in run(22, initial_state(61), 30):
print(''.join(['#' if e else ' ' for e in state]))
Wolfram analyzed the behavior of cellular automata and classified them in 4 classes (called Wolfram classes). These go beyond elementary cellular automata to cover other cellular automata like, for example, ones where the next generation of a cell is not determined only by the cell and the two cells next to it, rather the neighborhood includes next-next cells. In this post we'll stick to elementary cellular automata though.
Class 1 automata converge quickly to a uniform state. For example rule 0 becomes all off in one generation:
Its complement, rule 255, becomes all on in one generation:
Class 2 automata converge quickly to a repetitive state. For example rule 4:
Class 3 automata appear to remain in a random state, without converging. Rule 22, which we started with above, exhibits this type of behavior:
The most interesting class of cellular automata, class 4, has quite remarkable behavior: areas of cells end up in a static or repetitive state, while some cells end up forming structures that interact with each other. Rule 110 is the best-known elementary cellular automaton exhibiting this behavior:
The fact that Rule 110 has areas of cells that are static or repetitive while some other cells form structures should remind you of the Conway's Game of Life spaceships we discussed in the previous post. There, we saw that the Game of Life is Turing complete, and how a Turing machine was implemented using spaceships as signals processed by other patterns.
Turns out Rule 110 is also Turing complete. Stephen Wolfram conjectured this in 1985, and the conjecture was proved in 2004 by Matthew Cook^{1}. Cook uses Rule 110 gliders (interacting structures) to emulate a cyclic tag system. We saw in Computability Part 3: Tag Systems that cyclic tag systems can emulate tag systems, and an m-tag system with \(m \gt 1\) is Turing complete.
Rule 110, an elementary cellular automaton, is also capable of universal computation. And while this all might seem very abstract, cellular automata are so simple they show up in nature:
Formal definition:
A cellular automaton consists of a discrete n-dimensional lattice of cells, a set of states (for each cell), a notion of neighborhood for each cell, and a transition function mapping the neighborhood of each cell to a new cell state.
The system evolves over time, where at each step, the transformation function is applied over the lattice to determine the states of the next generation of cells.
Conway's Game of Life is a cellular automaton on a 2D plane with the following rules:
- Any live cell with fewer than two live neighbors dies.
- Any live cell with two or three live neighbors lives on to the next generation.
- Any live cell with more than three live neighbors dies.
- Any dead cell with exactly three live neighbors becomes a live cell.
In other words, a live cell stays alive during the next iteration if it has 2 or 3 live neighbors. A dead cell becomes live if it has exactly 3 live neighbors.
In the case of Conway's Game of Life, the lattice is a 2D grid, we have 2 states (on or off), the neighborhood of a cell consists of all adjacent cells (including corners), and the transition function is the one described above. Mathematician John Conway proposed the Game of Life in 1970.
The reason we started with Conway's Game of Life for discussing cellular automata is that this simple game with simple rules exhibits some very interesting behavior that has been classified for many years by people toying with the simulation.
First, we have still lives, patterns that don't change while stepping through the simulation. These patterns are stable: no cells die, no cells become live.
Next, we have oscillators, patterns that repeat with a certain periodicity:
In the above example, the last (bottom right) pattern has period 5 and is called Octagon 2. The other 3 patterns all have period 2.
More interestingly, we have spaceships - these are patterns that repeat but translate through space:
The above example shows a couple of small spaceships, the tiny 5-cell glider and the lightweight spaceship or LWSS. There are many more spaceship patterns, some of them quite large (hundreds or even thousands of cells).
Most simulations tend to eventually stabilize into a combination of oscillators and still lives. Patterns that start from a small seed of a handful of cells and take a long time (in terms of iterations) to stabilize are called Methuselahs. Here is an example, nicknamed Acorn:
Conway conjectured that for any initial configuration, there is an upper limit on how many live cells can ever exist. This was proved wrong by the discovery of glider guns. A glider gun generates gliders every few iterations. The gliders continue moving away from the gun, so as the simulation runs, the number of live cells continues to grow.
One of the most popular glider guns is the Gosper glider gun, named after mathematician and programmer Bill Gosper:
There are many other interesting patterns and constructions in the Game of Life discovered throughout the years. A few examples:
Eaters - patterns that absorb other patterns like spaceships, and return to their original state.
There are many others, and combinations of them which give rise to interesting systems like circuits and logic gates based on spaceships and strategically placed still lives and oscillators.
Let's look at a Python implementation for the Game of Life. We will use a wrap-around space, so we'll consider cells on the last column to be neighbors with cells on the first column and similarly cells on the last row to be neighbors with cells on the first row.
def make_matrix(width, height):
return [[False] * width for _ in range(height)]
def neighbors(m, i, j):
last_j = j + 1 if j + 1 < len(m[0]) else 0
last_i = i + 1 if i + 1 < len(m) else 0
return (m[i - 1][j - 1] + m[i - 1][j] + m[i - 1][last_j] +
m[i][j - 1] + m[i][last_j] +
m[last_i][j - 1] + m[last_i][j] + m[last_i][last_j])