# Fold

Most object-oriented programming languages represent object instances in memory in two separate chunks. One of them contains the instance-specific state - the class attributes. The other one contains the methods, which are actually shared across instances. Let's take as an example a simple type with a couple of properties and a method:

```
class Point {
public:
double X;
double Y;
double distanceToOrigin() const { /* ... */ }
};
```

If we take two distinct instances of `Point`

, like `new Point(10, 5)`

and `new Point(20, 20)`

, they have different `X`

and `Y`

coordinates, so
those need to be stored individually. On the other hand, the logic of
`distanceToOrigin`

is the same for both objects. The method evaluates to
different results, because `this->X`

and `this->Y`

are different for the
two objects, but the code is the same.

Instead of storing separate copies of the code for each instance, the
method implementations are shared across objects. `distanceToOrigin`

can
return different results for different objects because each invocation
gets a different `this`

pointer to the state of each object. In fact,
under the hood, every class method gets an implicit `this`

argument
which represents the object on which the method is invoked. Python makes
that explicit, requiring all class methods to implement a `self`

argument in order to reference instance-specific state.

There isn't much difference between a method like
`Point::distanceToOrigin`

and a free function which takes a `Point`

as
an argument:

```
double distanceToOrigin(Point p)
{
/* ... */
}
```

The only difference is that a class member gets access to privates,
while an outside function doesn't. Visibility aside, methods of a class
are an independent chunk

shared across object instances and the
state of each object is another, separate chunk

.

An interpretation of this is that, even though classes contain methods,
object instances of a given type can still be thought of as pure state.
Our `new Point(10, 5)`

is represented by a combination of the `Point`

class, which includes the method implementations and their memory layout
etc. and instance-specific state. The `Point`

class includes the method
`distanceToOrigin`

with an implicit `this`

argument (and some additional
metadata).

With this view, a `Point`

instance is a pair of values `X`

and `Y`

. Any
particular instance of `Point`

is a member of the set
`{ (X, Y) : X ∈ double, Y ∈ double }`

. The function `distanceToOrigin`

can be viewed as a function from this set (from the implicit `this`

argument) to a `double`

value.

`disntaceToOrigin : { (X, Y) : X ∈ double, Y ∈ double } → double.`

## Operations and Closures

A function that takes two arguments of type `Point`

is called a binary
operation on the set. We can represent it as a function
`f(Point, Point)`

or, equivalently, as an operation `Point ⊙ Point`

. If
the codomain of the function is also a `Point`

, for example let's take a
function like `Point midPoint(Point x, Point y)`

, which computes the
middle point between two given points, we call the operation *closed*
over the set.

The notion of closure in algebra is different from the notion closure in computer science, which deals with context captured in lambdas. In this case it simply means an operation that combines a number of elements of a set into another element of the set.

Since we can view any type as a set, and any function taking two arguments of a type and returning another instance of that type as a closed binary operation on the set, we can start talking about algebraic structures.

## Magmas

A magma is just a set with a closed operation, without any other
constraints imposed. If we have a magma, we can implement an algorithm
like `fold`

which, given a set of values in the magma, combines them
into a single value.

For example we can fold three `Point`

instances `a`

, `b`

, and `c`

into a
single `Point`

using the `midPoint`

function like this:

```
Point result = midPoint(midPoint(a, b), c);
```

We can sketch out a function that, given a set of objects of any type
`T`

and an operation closed over `T`

, produces a final value of type
`T`

:

```
template <typename It, typename Op,
typename T=typename std::iterator_traits<It>::value_type>
T fold(It begin, It end, T init, Op op)
{
T result = init;
for (auto it = begin; it != end; ++it) {
result = op(result, *it);
}
return result;
}
```

This function takes a pair of iterators which we use to traverse the set
of values of type `T`

, an initial value of type `T`

, and a binary
operation `Op`

. We need an initial value of type `T`

because if the set
is empty, we still need to return something from the function. That is,
if the `for`

loop never executes, and we never apply `op`

, we still need
a `T`

to return. In this case we will simply return `init`

.

Since a magma doesn't impose any constraints on the operation, the
order in which we combine elements might be important. For example, for
integers and the subtraction operation, if we start with the set of
number `1`

, `2`

, and `3`

, and an initial value of `1`

, we can fold from
left to right and get `((1 - 1) - 2) - 3`

, or `-5`

. On the other hand,
we can start from right to left, and fold `1 - (2 - (3 - 1))`

, or `1`

.
Let's call this verison `foldRight`

and look at a possible
implementation:

```
template <typename It, typename Op
typename T=typename std::iterator_traits<It>::value_type>
T fold_right(It begin, It end, T init, Op op)
{
T result = init;
for (auto it = end; it != begin; ) {
result = op(*(--it), result);
}
return result;
}
```

Note that for an operation like addition, this is not the case: if we
fold `1`

, `2`

, and `3`

, with an initial value of `1`

, we get `7`

regardless of the direction and whether we make the initial value a left
or a right operand. Let's zoom in on this.

## Semigroups

If we have a set with a closed operation that is also associative, we
have a *semigroup*. For an *associative* operation, it doesn't really
matter what order we apply multiple operations on. For example, for the
set of strings and the string concatenation operation, it doesn't
metter whether we append `foo`

to `bar`

, then the result to `baz`

or
whether we concatenate `bar`

with `baz`

first, then prepend `foo`

to it:

```
("foo"s + "bar"s) + "baz"s == "foo"s + ("bar"s + "baz"s)
```

More formally, an opeartion is associative if, for any `a`

, `b`

, and
`c`

, `(a ⊙ b) ⊙ c == a ⊙ (b ⊙ c)`

. That being said, that's not enough
to make our right fold redundant:

```
std::vector<std::string> elems = { "foo"s, "bar"s, "baz"s };
std::cout << fold(elems.begin(), elems.end(), "!"s,
[](const std::string& x, const std::string& y) {
return x + y;
});
// Prints !foobarbaz
std::cout << fold_right(elems.begin(), elems.end(), "!"s,
[](const std::string& x, const std::string& y) {
return x + y;
});
// Prints foobarbaz!
```

It still matters whether our initial value is a left or right argument
to our operation. If for an operation we get the same result regardless
of how we arrange the operands, then we have a *commutative* operation.
Integer addition is an example of a commutative operation, where `1 + 2`

is the same as `2 + 1`

. Note this is different than string
concatenation, where `"foo"s + "bar"s != "bar"s + "foo"s`

. A semigroup
with a commutative operation is called an *abelian semigroup*. Here we
finally no longer need to distinguish between `fold`

and `fold_right`

as
they both produce the same result.

Let's also look at that mandatory initial value.

## Monoids

The reason we require an initial value is that we need to return
something in case our set is empty, and we don't always have a good
default. That's not always the case. There are operations for which we
have such a default, called the *identity* of the opeartion. This value,
combined with any other value using the operation, leaves the other
value unchanged. For string concatenation, the identity is the empty
string. For addition, the identity is `0`

. In general, we have
`a ⊙ id == id ⊙ a == a`

for any `a`

. This is what makes `id`

an
identity.

A semigroup with an identity is a *monoid*. That's a set with an
associative operation and an identity element. Note commutativity is not
required. If the operation is also commutative, then we have a
*commutative monoid*.

If the default constructor of our type `T`

creates an identiy, then we
can reimplement `fold`

like this:

```
template <typename It, typename Op,
typename T=typename std::iterator_traits<It>::value_type>
T fold(It begin, It end, Op op)
{
T result{ };
for (auto it = begin; it != end; ++it) {
result = op(result, *it);
}
return result;
}
```

We no longer require an initial value to be passed in. That's because,
if we really want to combine all elements with a certain value, we can
do it after calling `fold`

. For example we can concatenate our strings
and then append or prepend our !

s:

```
std::vector<std::string> elems = { "foo"s, "bar"s, "baz"s };
std::string folded = fold(elems.begin(), elems.end(),
[](const std::string& x, const std::string& y) { return x + y; });
// folded is "foobarbaz"s
```

Unlike the first implementation, this works for empty sets too:

```
std::vector<std::string> elems = { };
std::string folded = fold(elems.begin(), elems.end(),
[](const std::string& x, const std::string& y) { return x + y; });
// folded is the empty string
```

This version only works if we have an identity and if the default
constructor actually creates that identity. Note identity is a property
related to the operation, so we need to be careful. The default integer
value is `0`

, which is the identity for addition, so we can very well
sum numbers using the second version of `fold`

. On the other hand, we
can't use it for product:

```
std::vector<int> elems = { 1, 2, 3 };
int sum = fold(elems.begin(), elems.end(),
[](int x, int y) { return x + y; });
// sum is 6
int product = fold(elems.begin(), elems.end(),
[](int x, int y) { return x * y; });
// product is 0
```

If we default to `0`

and multiply all numbers with it, our product
becomes `0`

. On the other hand, if we do have a default identity, then
`fold`

and `fold_right`

give us the same result even if the operation is
only associative, without necessarily being commutative. That's
because, if our initial value is an identity, it doesn't matter whether
it is a left or right argument to our operation. By definition,
`a ⊙ id == id ⊙ a == a`

so for a set of values like `a`

, `b`

, and `c`

,
we get:

```
((id ⊙ a) ⊙ b) ⊙ c == (a ⊙ b) ⊙ c
```

From associativity, we get:

```
(a ⊙ b) ⊙ c == a ⊙ (b ⊙ c)
```

We can add another identity in there, since `c == c ⊙ id`

:

```
a ⊙ (b ⊙ c) == (a ⊙ (b ⊙ (c ⊙ id)))
```

We started with how `fold`

evaluates and ended up with how `fold_right`

evaluates. That means we don't distinguish between a left to right or
right to left fold is either:

- The operation is associative and commutative (abelian semigroup).
- The operation is associative and the initial value is the identity (monoid with identity as initial value).

If neither of these holds, it matters which way the fold happens as we get different results.

## Parallelized Fold

Wihout going into too many details, we can also divide & conquer a fold
operation, fold subsets in parallel, then merge the results.
Associativity allows us to do so, as we can split the set of values `a`

,
`b`

, `c`

, and `d`

into `left_half = a ⊙ b`

and `right_half = c ⊙ d`

,
then combine the two halves for the final result
`left_half ⊙ right_half`

. This is the same thing as `(a ⊙ b) ⊙ (c ⊙ d)`

.
As long as the operation is associative, this is the same as
`((a ⊙ b) ⊙ c) ⊙ d`

or `a ⊙ (b ⊙ (c ⊙ d))`

.

Not all operations are associative though, as we saw before. Subtraction
isn't for example. `(1 - 2) - (3 - 4)`

is `0`

. `((1 - 2) - 3) - 4`

is
`-8`

. We can only parallelize folding a semigroup, not any magma.

Here, since we are talking about dividing the input set, we can assume we have more than zero elements in a subset (otherwise we wouldn't divide it), so whether we have an identity or not or whether we apply it to the left or the right can be left to the top function which combines the partial results.

## Alternatives

We covered traditional implementations of `fold`

but let's go over a
couple of alternative implementations. We could say that if the set is
empty, we don't return anything, otherwise we take the first element as
our initial value:

```
template <typename It, typename Op,
typename T=typename std::iterator_traits<It>::value_type>
std::optional<T> fold(It begin, It end, Op op)
{
if (begin == end) return std::nullopt;
It it = begin;
T result = *(it++);
for (; it != end; ++it) {
result = op(result, *it);
}
return result;
}
```

As long as the operation is associative, the fold direction doesn't
matter. In case we don't have any value at all, we return `nullopt`

to
signal the absence of a value.

Another option is to take a default value as an argument and return that in case we have no elements to combine, but if we do, simply ignore that value instead of combining it with the input elements:

```
template <typename It, typename Op,
typename T=typename std::iterator_traits<It>::value_type>
T fold(It begin, It end, T def, Op op)
{
if (begin == end) return def;
It it = begin;
T result = *(it++);
for (; it != end; ++it) {
result = op(result, *it);
}
return result;
}
```

This is similar to the previous implementation, but instead of an empty
optional we return a supplied value in case we don't have anything to
fold. This would be interpreted as *fold or return def*, as opposed to
our original implementation, which was *fold with init*.

## Summary

This post started with a discussion of how types can be viewed as sets
and functions as functions over those sets. We then covered a few
abstract algebra concepts as applied to the `fold`

higher order
function, looking in which situations do `fold`

and `fold_right`

return
the same result.

- A function
`T f(T, T)`

is a closed binary operation on the set`T`

. - A set with a binary operation is called a magma. We can implement a
`fold`

and a`fold_right`

if we have a magma. Both functions need an initial value. - If the operation is also associative, we have a semigroup. Here it still matters whether the initial value is applied on the left or the right, as the results might be different.
- If the operation is also commutative, we have an abelian semigroup. There is no distinction between a left-to-right and a right-to-left fold for an abelian semigroup.
- If the operation has an identity, then we have a monoid. If we use the identity as an initial value, then we again have no distinctino between a left-to-right and a right-to-left fold.
- We can paralllize fold to take subsets of the input set, combine them in parallel, then combine the results. Associativity is the only requirement for this, so we can parallelize folding a semigroup.
- We also looked at a couple of alternative implementations which only require a semigroup for the fold direction not to matter. The first one returns an empty optional if there are no values to combine; the second returns a value supplied as argument, without otherwise combining it.