December 31, 2018

Fold

Most object-oriented programming languages represent object instances in memory in two separate chunks. One of them contains the instance-specific state - the class attributes. The other one contains the methods, which are actually shared across instances. Let's take as an example a simple type with a couple of properties and a method:

class Point {
public:
    double X;
    double Y; 

    double distanceToOrigin() const { /* ... */ }
};

If we take two distinct instances of Point, like new Point(10, 5) and new Point(20, 20), they have different X and Y coordinates, so those need to be stored individually. On the other hand, the logic of distanceToOrigin is the same for both objects. The method evaluates to different results, because this->X and this->Y are different for the two objects, but the code is the same.

Instead of storing separate copies of the code for each instance, the method implementations are shared across objects. distanceToOrigin can return different results for different objects because each invocation gets a different this pointer to the state of each object. In fact, under the hood, every class method gets an implicit this argument which represents the object on which the method is invoked. Python makes that explicit, requiring all class methods to implement a self argument in order to reference instance-specific state.

There isn't much difference between a method like Point::distanceToOrigin and a free function which takes a Point as an argument:

double distanceToOrigin(Point p)
{
    /* ... */
}

The only difference is that a class member gets access to privates, while an outside function doesn't. Visibility aside, methods of a class are an independent chunk shared across object instances and the state of each object is another, separate chunk.

An interpretation of this is that, even though classes contain methods, object instances of a given type can still be thought of as pure state. Our new Point(10, 5) is represented by a combination of the Point class, which includes the method implementations and their memory layout etc. and instance-specific state. The Point class includes the method distanceToOrigin with an implicit this argument (and some additional metadata).

With this view, a Point instance is a pair of values X and Y. Any particular instance of Point is a member of the set { (X, Y) : X ∈ double, Y ∈ double }. The function distanceToOrigin can be viewed as a function from this set (from the implicit this argument) to a double value.

disntaceToOrigin : { (X, Y) : X ∈ double, Y ∈ double } → double.

Operations and Closures

A function that takes two arguments of type Point is called a binary operation on the set. We can represent it as a function f(Point, Point) or, equivalently, as an operation Point ⊙ Point. If the codomain of the function is also a Point, for example let's take a function like Point midPoint(Point x, Point y), which computes the middle point between two given points, we call the operation closed over the set.

The notion of closure in algebra is different from the notion closure in computer science, which deals with context captured in lambdas. In this case it simply means an operation that combines a number of elements of a set into another element of the set.

Since we can view any type as a set, and any function taking two arguments of a type and returning another instance of that type as a closed binary operation on the set, we can start talking about algebraic structures.

Magmas

A magma is just a set with a closed operation, without any other constraints imposed. If we have a magma, we can implement an algorithm like fold which, given a set of values in the magma, combines them into a single value.

For example we can fold three Point instances a, b, and c into a single Point using the midPoint function like this:

Point result = midPoint(midPoint(a, b), c);

We can sketch out a function that, given a set of objects of any type T and an operation closed over T, produces a final value of type T:

template <typename It, typename Op,
    typename T=typename std::iterator_traits<It>::value_type>
T fold(It begin, It end, T init, Op op)
{
    T result = init;

    for (auto it = begin; it != end; ++it) {
        result = op(result, *it);
    }

    return result;
}

This function takes a pair of iterators which we use to traverse the set of values of type T, an initial value of type T, and a binary operation Op. We need an initial value of type T because if the set is empty, we still need to return something from the function. That is, if the for loop never executes, and we never apply op, we still need a T to return. In this case we will simply return init.

Since a magma doesn't impose any constraints on the operation, the order in which we combine elements might be important. For example, for integers and the subtraction operation, if we start with the set of number 1, 2, and 3, and an initial value of 1, we can fold from left to right and get ((1 - 1) - 2) - 3, or -5. On the other hand, we can start from right to left, and fold 1 - (2 - (3 - 1)), or 1. Let's call this verison foldRight and look at a possible implementation:

template <typename It, typename Op
    typename T=typename std::iterator_traits<It>::value_type>
T fold_right(It begin, It end, T init, Op op)
{
    T result = init;

    for (auto it = end; it != begin; ) {
        result = op(*(--it), result);
    }

    return result;
}

Note that for an operation like addition, this is not the case: if we fold 1, 2, and 3, with an initial value of 1, we get 7 regardless of the direction and whether we make the initial value a left or a right operand. Let's zoom in on this.

Semigroups

If we have a set with a closed operation that is also associative, we have a semigroup. For an associative operation, it doesn't really matter what order we apply multiple operations on. For example, for the set of strings and the string concatenation operation, it doesn't metter whether we append foo to bar, then the result to baz or whether we concatenate bar with baz first, then prepend foo to it:

("foo"s + "bar"s) + "baz"s == "foo"s + ("bar"s + "baz"s)

More formally, an opeartion is associative if, for any a, b, and c, (a ⊙ b) ⊙ c == a ⊙ (b ⊙ c). That being said, that's not enough to make our right fold redundant:

std::vector<std::string> elems = { "foo"s, "bar"s, "baz"s };

std::cout << fold(elems.begin(), elems.end(), "!"s,
    [](const std::string& x, const std::string& y) { 
        return x + y; 
    });
// Prints !foobarbaz

std::cout << fold_right(elems.begin(), elems.end(), "!"s,
    [](const std::string& x, const std::string& y) { 
        return x + y;
    });
// Prints foobarbaz!

It still matters whether our initial value is a left or right argument to our operation. If for an operation we get the same result regardless of how we arrange the operands, then we have a commutative operation. Integer addition is an example of a commutative operation, where 1 + 2 is the same as 2 + 1. Note this is different than string concatenation, where "foo"s + "bar"s != "bar"s + "foo"s. A semigroup with a commutative operation is called an abelian semigroup. Here we finally no longer need to distinguish between fold and fold_right as they both produce the same result.

Let's also look at that mandatory initial value.

Monoids

The reason we require an initial value is that we need to return something in case our set is empty, and we don't always have a good default. That's not always the case. There are operations for which we have such a default, called the identity of the opeartion. This value, combined with any other value using the operation, leaves the other value unchanged. For string concatenation, the identity is the empty string. For addition, the identity is 0. In general, we have a ⊙ id == id ⊙ a == a for any a. This is what makes id an identity.

A semigroup with an identity is a monoid. That's a set with an associative operation and an identity element. Note commutativity is not required. If the operation is also commutative, then we have a commutative monoid.

If the default constructor of our type T creates an identiy, then we can reimplement fold like this:

template <typename It, typename Op, 
    typename T=typename std::iterator_traits<It>::value_type>
T fold(It begin, It end, Op op)
{
    T result{ };

    for (auto it = begin; it != end; ++it) {
        result = op(result, *it);
    }

    return result;
}

We no longer require an initial value to be passed in. That's because, if we really want to combine all elements with a certain value, we can do it after calling fold. For example we can concatenate our strings and then append or prepend our !s:

std::vector<std::string> elems = { "foo"s, "bar"s, "baz"s };

std::string folded = fold(elems.begin(), elems.end(),
    [](const std::string& x, const std::string& y) { return x + y; });
// folded is "foobarbaz"s

Unlike the first implementation, this works for empty sets too:

std::vector<std::string> elems = { };

std::string folded = fold(elems.begin(), elems.end(),
    [](const std::string& x, const std::string& y) { return x + y; });
// folded is the empty string

This version only works if we have an identity and if the default constructor actually creates that identity. Note identity is a property related to the operation, so we need to be careful. The default integer value is 0, which is the identity for addition, so we can very well sum numbers using the second version of fold. On the other hand, we can't use it for product:

std::vector<int> elems = { 1, 2, 3 };

int sum = fold(elems.begin(), elems.end(),
    [](int x, int y) { return x + y; });
// sum is 6

int product = fold(elems.begin(), elems.end(),
    [](int x, int y) { return x * y; });
// product is 0

If we default to 0 and multiply all numbers with it, our product becomes 0. On the other hand, if we do have a default identity, then fold and fold_right give us the same result even if the operation is only associative, without necessarily being commutative. That's because, if our initial value is an identity, it doesn't matter whether it is a left or right argument to our operation. By definition, a ⊙ id == id ⊙ a == a so for a set of values like a, b, and c, we get:

((id ⊙ a) ⊙ b) ⊙ c == (a ⊙ b) ⊙ c

From associativity, we get:

(a ⊙ b) ⊙ c == a ⊙ (b ⊙ c)

We can add another identity in there, since c == c ⊙ id:

a ⊙ (b ⊙ c) == (a ⊙ (b ⊙ (c ⊙ id)))

We started with how fold evaluates and ended up with how fold_right evaluates. That means we don't distinguish between a left to right or right to left fold is either:

If neither of these holds, it matters which way the fold happens as we get different results.

Parallelized Fold

Wihout going into too many details, we can also divide & conquer a fold operation, fold subsets in parallel, then merge the results. Associativity allows us to do so, as we can split the set of values a, b, c, and d into left_half = a ⊙ b and right_half = c ⊙ d, then combine the two halves for the final result left_half ⊙ right_half. This is the same thing as (a ⊙ b) ⊙ (c ⊙ d). As long as the operation is associative, this is the same as ((a ⊙ b) ⊙ c) ⊙ d or a ⊙ (b ⊙ (c ⊙ d)).

Not all operations are associative though, as we saw before. Subtraction isn't for example. (1 - 2) - (3 - 4) is 0. ((1 - 2) - 3) - 4 is -8. We can only parallelize folding a semigroup, not any magma.

Here, since we are talking about dividing the input set, we can assume we have more than zero elements in a subset (otherwise we wouldn't divide it), so whether we have an identity or not or whether we apply it to the left or the right can be left to the top function which combines the partial results.

Alternatives

We covered traditional implementations of fold but let's go over a couple of alternative implementations. We could say that if the set is empty, we don't return anything, otherwise we take the first element as our initial value:

template <typename It, typename Op,
    typename T=typename std::iterator_traits<It>::value_type>
std::optional<T> fold(It begin, It end, Op op)
{
    if (begin == end) return std::nullopt;

    It it = begin;
    T result = *(it++);

    for (; it != end; ++it) {
        result = op(result, *it);
    }

    return result;
}

As long as the operation is associative, the fold direction doesn't matter. In case we don't have any value at all, we return nullopt to signal the absence of a value.

Another option is to take a default value as an argument and return that in case we have no elements to combine, but if we do, simply ignore that value instead of combining it with the input elements:

template <typename It, typename Op,
    typename T=typename std::iterator_traits<It>::value_type>
T fold(It begin, It end, T def, Op op)
{
    if (begin == end) return def;

    It it = begin;
    T result = *(it++);

    for (; it != end; ++it) {
        result = op(result, *it);
    }

    return result;
}

This is similar to the previous implementation, but instead of an empty optional we return a supplied value in case we don't have anything to fold. This would be interpreted as fold or return def, as opposed to our original implementation, which was fold with init.

Summary

This post started with a discussion of how types can be viewed as sets and functions as functions over those sets. We then covered a few abstract algebra concepts as applied to the fold higher order function, looking in which situations do fold and fold_right return the same result.