Notes on Types
Type Systems
A good type system can eliminate entire categories of errors from a program by making invalid code fail to compile. Before digging into types themselves, below are a few common distinctions between the type systems of various programming languages:
Dynamic Typing vs Static Typing
In a statically typed language, types are determined at compile time, so if a function is declared as accepting only a certain type T but we attempt to pass it an unrelated type U, the program is considered invalid. This is invalid C++:
int sqr(int x)
{
    return x * x;
}
...
sqr("not an int"); // does not compile
On the other hand, in a dynamically typed language, no compile-time checks are performed; if the data we get is of an unexpected type, we treat it as a runtime error. Below is a Python function that squares a number:
def sqr(x):
    return x ** 2
...
sqr("not an int")
## runs but fails with TypeError: unsupported operand type(s)
## for ** or pow(): 'str' and 'int'
Some of the interesting features of dynamic languages are duck typing (if it walks like a duck, and it quacks like a duck...) and monkey patching. Duck typing means that accessing a member of an object works as long as that object has such a member, regardless of the type of the object. In Python:
class Foo:
    def func(self):
        print("foo")

class Bar:
    def func(self):
        print("bar")

[obj.func() for obj in [Foo(), Bar()]] # prints "foo" and "bar"
This can't work in a statically typed language where, at the bare minimum, we would have to add some form of constraint for the types in the list to ensure they contain a func() method we can call.
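As a rough sketch (my own example, not from the original post), one way to approximate the Python list above in C++ is to erase the concrete types behind a common callable type such as std::function - which is exactly the kind of constraint mentioned above:
#include <functional>
#include <iostream>
#include <vector>

struct Foo { void func() { std::cout << "foo" << std::endl; } };
struct Bar { void func() { std::cout << "bar" << std::endl; } };

int main()
{
    Foo foo;
    Bar bar;
    // The constraint: every element has to be adapted to one common callable type.
    std::vector<std::function<void()>> objs{
        [&] { foo.func(); },
        [&] { bar.func(); }
    };
    for (auto& f : objs)
        f(); // prints "foo" and "bar"
}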
Monkey patching refers to the ability to change the structure of an object at runtime. For example, we can swap the func method of an instance of Foo with another function like this:
class Foo:
    def func(self):
        print("foo")

def func_bar():
    print("bar")

obj = Foo()
obj.func() # prints "foo"
obj.func = func_bar
obj.func() # prints "bar"
These are useful capabilities, but the tradeoff is a whole class of type errors which a statically typed language would've caught.
As a side note, the fact that dynamic languages don't need type annotations makes them more terse. That being said, type inference can recover much of that terseness in static languages: the Hindley-Milner type inference algorithm (Algorithm W) can deduce the types of an entire program without annotations, and runs in near-linear time in practice. So while Python is starting to support type annotations for better static analysis and TypeScript provides a way of writing type-safe JavaScript, C++ type inference keeps getting better, and in Haskell (which has one of the strongest static type systems) type annotations are mostly optional.
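As a small illustration (a sketch of my own, not from the original text), C++'s auto lets the compiler deduce types that would otherwise have to be spelled out:
#include <string>
#include <vector>

int main()
{
    auto i = 42;                       // deduced as int
    auto s = std::string{"inference"}; // deduced as std::string
    std::vector<int> v{1, 2, 3};
    // The iterator type never has to be written out explicitly.
    for (auto it = v.begin(); it != v.end(); ++it)
        *it += i;
    (void)s; // silence unused-variable warnings
}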
Strong Typing vs Weak Typing
At a high level, a strongly typed language does not implicitly convert between unrelated types. This is good in most situations, as implicit conversions are often meaningless or have surprising effects - for example, adding a number to a list of characters, which can result in either runtime errors or garbage data. A strongly typed language will simply not accept code that attempts to do this.
In Python, which is strongly typed, this doesn't work:
foo = "foo" # foo is "foo"
foo = foo + " bar" # foo is "foo bar"
foo = foo + 5 # TypeError: Can't convert 'int' object to str implicitly
It works just fine in JavaScript though:
var foo = "foo"; // foo is "foo"
foo = foo + " bar"; // foo is "foo bar"
foo = foo + 5; // foo is "foo bar5"
Note that type strength is not an either/or - C++, while considered strongly typed, still allows several implicit conversions between types (e.g. pointer to bool). Some languages are more strict about converting between types implicitly, others less so.
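Here is a small sketch (my own, not from the original post) of where C++ sits on this spectrum - some implicit conversions are allowed, others are rejected:
#include <iostream>
#include <string>

int main()
{
    int x = 5;
    int* p = &x;
    if (p)        // a pointer implicitly converts to bool
        std::cout << "non-null" << std::endl;
    double d = x; // int implicitly converts to double
    bool b = d;   // even double converts to bool
    std::cout << d << " " << b << std::endl;
    // std::string s = x; // does not compile: no implicit conversion
    //                    // from int to std::string
}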
Dynamic Polymorphism vs Static Polymorphism
Another difference to note is between static and dynamic polymorphism. Dynamic polymorphism happens at runtime, when a call to a function on a base type gets resolved to the actual implementation in the derived type:
struct base
{
    virtual void func() = 0;
};

struct foo : base
{
    void func() override
    {
        std::cout << "foo" << std::endl;
    }
};

struct bar : base
{
    void func() override
    {
        std::cout << "bar" << std::endl;
    }
};

void call_func(base& obj)
{
    obj.func();
}
...
foo f;
bar b;
call_func(f); // prints "foo"
call_func(b); // prints "bar"
In the above case, we effectively have a single function call_func which takes a reference to a base struct. The compiler generates a v-table for struct base, and a call to func() on base involves a v-table jump to the actual implementation of the function, which is different between the inheriting types foo and bar.
Contrast this with the static alternative:
struct foo
{
    void func()
    {
        std::cout << "foo" << std::endl;
    }
};

struct bar
{
    void func()
    {
        std::cout << "bar" << std::endl;
    }
};

template <typename T> void call_func(T& obj)
{
    obj.func();
}
...
foo f;
bar b;
call_func(f); // prints "foo"
call_func(b); // prints "bar"
In this case there is no relationship between foo and bar, and no v-table is needed. On the other hand, we no longer have a single call_func; we have a templated function which is instantiated for both the foo and bar types. This is all done at compile time, the advantage being faster code, the drawback being that the compiler needs to be aware of all the types involved - we can no longer inject types implementing an interface at runtime. When calling call_func, we need to have both the definition of the function and the declaration of the type we're passing in visible.
Types
During the rest of this post, I will talk about types in the context of a statically and strongly typed language, with a focus on static polymorphism. This pushes as much of the type checking as possible to the compilation stage, so many of the runtime issues of less strict languages simply fail to compile.
I will focus on C++ and cover some of the new C++17 features which enable these concepts or make them easier to work with. That being said, since this post focuses on types, I will also provide examples in Haskell, as Haskell can express these concepts much more succinctly.
Type Basics
Let's start with the definition of a type: a type represents the set of possible values. For example, the C++ type uint8_t represents the set of integers from 0 to 255. Effectively this means that a variable of a given type can only have values from within that set.
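As a quick illustration (my own snippet, not from the original text), we can ask the compiler for the bounds of that set:
#include <cstdint>
#include <iostream>
#include <limits>

int main()
{
    // uint8_t can only hold values from the set {0, ..., 255};
    // the unary + prints the bounds as numbers rather than characters.
    std::cout << +std::numeric_limits<std::uint8_t>::min() << std::endl; // 0
    std::cout << +std::numeric_limits<std::uint8_t>::max() << std::endl; // 255
}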
Interesting Types
Since we defined a type as a set of possible values, we can talk about the cardinality of a type, in other words the number of values in the set. Based on cardinality, there are a few interesting classes of types to talk about:
Empty Type
The first interesting type to talk about is the type that represents the empty set, with |T| = 0.
In Haskell, this type is named Void. Since Haskell is a functional language, all functions must return a value, so it does not make sense to have a function that returns Void - the same way it doesn't make sense to define a mathematical function with the empty set as its codomain. We do have an absurd function though, which maps the empty set to any value:
absurd :: Void -> a
This function can never actually be called though, since there is no value of type Void to pass to it.
In C++, the absence of a value is represented as the void type. Since C++ is not purely functional, we can define functions that don't return anything. We can even say that a function does not take any arguments by putting a void between the parentheses:
void foo(void);
This is the equivalent of:
void foo();
Note though that we cannot have a real argument of type void - that is a compile error, as it doesn't make any sense: we would be mandating that the function take a value from the empty set. So we can say foo(void) but not foo(void arg), nor even foo(int arg, void).
Unit Type
The next interesting class consists of types with cardinality 1. A type T with |T| = 1 is called a unit or singleton type. A variable of such a type can only ever have a single possible value. In Haskell, the anonymous representation is the empty tuple (). Here is an example of a function that maps anything to this type:
unit :: a -> ()
unit _ = ()
Of course, we can declare our own singleton types. Below is a custom Singleton type and an equivalent unit function:
data Singleton = Singleton
unit :: a -> Singleton
unit _ = Singleton
In C++, the anonymous representation of a singleton is an empty std::tuple:
template <typename T> std::tuple<> unit(T)
{
    return { };
}
As can be seen from the above, Haskell makes it easier to define a function that takes an argument of any type, as it provides syntactic sugar for type parameters (a in our example). In C++, the equivalent declaration involves a template, but they boil down to the same thing. The non-anonymous C++ representation is a struct which doesn't contain anything. All instances of such a struct are equivalent:
struct singleton { };

template <typename T> singleton unit(T)
{
    return { };
}
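A short usage sketch (my addition, reusing the definitions above) - no matter what we pass in, every call collapses to the same single value:
#include <string>

struct singleton { };

template <typename T> singleton unit(T)
{
    return { };
}

int main()
{
    singleton a = unit(42);
    singleton b = unit(std::string{"hello"});
    (void)a; // both a and b are the one and only possible value
    (void)b;
}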
Sum Types
Here, things get a bit more interesting: a sum type is a type which can represent a value from any of the types it sums. So given type A and type B, the type S summing up A and B is S = { i : i ∈ A ∪ B } (strictly speaking a disjoint union - each value also remembers which of the two types it came from). A variable of type S could have any value in A or any value in B. S is called a sum type because its cardinality is the sum of the cardinalities of A and B: |S| = |A| + |B|.
Sum types are great, because they allow us to build up more complex types from simpler ones. Once we have unit types, we can build up more complex types out of them by summing them. For example, a boolean type which can be either true or false can be thought of as the sum of the singleton true type and the singleton false type. In Haskell, a boolean is defined as:
data Bool = False | True
Similarly, a Weekday type can be defined as:
data Weekday = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
Theoretically, numerical types could also be defined as huge sum types of every possible value they can represent. Of course, this is impractical, but we can reason about them the same way we reason about other sum types; we don't have to treat them as a special case.
In C++, an equivalent of the above is an enum class. bool is a built-in type with special syntax, but we could define an equivalent as:
enum class Boolean
{
    True,
    False
};
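For completeness, here is a sketch (my addition) of the C++ equivalent of the Weekday sum type shown earlier in Haskell:
enum class Weekday
{
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday
};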
It's easy to see what a Weekday definition would look like (a sketch is given above). Things get more interesting when we throw type parameters into the mix. In Haskell, we have the Either type, which is declared as follows:
data Either a b = Left a | Right b
An instance of this could be either a Left a, where a is a type itself, which means it can hold any of the values of a, or it can be a Right b, with any of the values of b. In Haskell we use pattern matching to operate on such a type, so we can declare a simple function that tells us whether we were given a Left a like this:
isLeft :: Either a b -> Bool
isLeft (Left a) = True
isLeft (Right b) = False
This might not look like much, but of course we can compose more complex functions. For example, say we have a function foo that takes an a and returns an a, and a function bar that takes a b and returns a b. We can then write a transform function which takes an Either a b and, depending on the contained type, applies the appropriate function:
-- Implementation of foo and bar not provided in this example
foo :: a -> a
bar :: b -> b
transform :: Either a b -> Either a b
transform (Left a) = Left (foo a)
transform (Right b) = Right (bar b)
This is way beyond the capabilities of a C++ enum class. The old way of implementing something like this in C++ was using a union and a tag enum to keep track of which is the actual type we're working with:
// A and B stand in for any two simple (trivially copyable) types;
// their actual definitions are not important in this example
struct A { };
struct B { };

struct Either
{
    Either(A a)
    {
        ab.left = a;
        tag = tag::isA;
    }

    Either(B b)
    {
        ab.right = b;
        tag = tag::isB;
    }

    union
    {
        A left;
        B right;
    } ab;

    enum class tag
    {
        isA,
        isB
    } tag;
};
Our implementation of transform would look like this:
// Implementations of foo and bar not provided in this example
A foo(A);
B bar(B);

Either transform(Either either)
{
    switch (either.tag)
    {
    case Either::tag::isA:
        return foo(either.ab.left);
    case Either::tag::isB:
        return bar(either.ab.right);
    }
}
Our Either type definition is obviously much more verbose than what we had in Haskell, and it doesn't even support type parameters - at this point it only works with struct A and struct B, while the Haskell version works for any a and b types. The other major problem is that, while unions provide efficient storage for different types (the size of the union is the size of the largest contained type), it is up to the implementer to make sure we don't try to read an A as a B or vice-versa. That means we need to keep our tag in sync with what we put in the union and respect it when accessing the value of the union.
C++17 introduces a better, safer, parameterized type for this: std::variant. A variant takes any number of types as template arguments and stores an instance of any one of those types. Using variant, we can rewrite the above as:
std::variant<A, B> transform(std::variant<A, B> either)
{
    return std::visit([](auto&& e) -> std::variant<A, B> {
        if constexpr (std::is_same_v<std::decay_t<decltype(e)>, A>)
            return foo(e);
        else
            return bar(e);
    }, either);
}
This is a lot of new syntax, so let's break it down:
- std::variant<A, B> is the new C++17 sum type. In this case, we specify that it holds either an A or a B (but it can hold an arbitrary number of types).
- std::visit is a function that applies the visitor given as its first argument to the variant(s) given as its subsequent arguments. In our example, the lambda is instantiated for both alternatives and invoked on whichever one either currently holds - conceptually, on std::get<0>(either) or std::get<1>(either).
- if constexpr is also a new C++17 construct which evaluates its condition at compile time and discards the branch that is not taken from the final object code. So in this example, we determine at compile time whether the type we are being called with is A or B and apply the correct function based on that. Something very similar can be achieved with templates and enable_if, but this syntax makes for more readable code.
Note that with this version we can simply prepend a template <typename A, typename B> and make the whole function generic, as in the Haskell example (a sketch is given below). It doesn't read as nicely (because we don't have good pattern matching syntax in the language), but this is the new, type-safe way of implementing and working with sum types, which is a major improvement.
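Here is a sketch of that generic version (my own code - foo and bar below are just illustrative stand-ins for any functions mapping A to A and B to B):
#include <iostream>
#include <string>
#include <type_traits>
#include <variant>

// Illustrative stand-ins: any functions mapping A -> A and B -> B will do.
int foo(int a) { return a * a; }
std::string bar(std::string b) { return b + "!"; }

template <typename A, typename B>
std::variant<A, B> transform(std::variant<A, B> either)
{
    return std::visit([](auto&& e) -> std::variant<A, B> {
        if constexpr (std::is_same_v<std::decay_t<decltype(e)>, A>)
            return foo(e);
        else
            return bar(e);
    }, either);
}

int main()
{
    std::variant<int, std::string> v1 = 3;
    std::variant<int, std::string> v2 = std::string{"hi"};
    std::cout << std::get<int>(transform(v1)) << std::endl;         // prints 9
    std::cout << std::get<std::string>(transform(v2)) << std::endl; // prints hi!
}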
Product Types
With sum types out of the way, the remaining interesting category is that of product types. Product types combine the values of several other types into one. For types A and B, we have P = { (a, b) : a ∈ A, b ∈ B }, so |P| = |A| x |B|.
In Haskell, the anonymous version of product types is represented by tuples, while the named version is represented by records. Here is an example of a perimeter function which computes the perimeter of a rectangle defined by two points, where each point is a tuple of numbers:
perimeter :: (Num n) => (n, n) -> (n, n) -> n
perimeter (x1, y1) (x2, y2) = 2 * (abs(x1 - x2) + abs(y1 - y2))
The named version would declare a Point type with Int coordinates and use that instead:
data Point = Point { x, y :: Int }
perimeter :: Point -> Point -> Int
perimeter (Point x1 y1) (Point x2 y2) = 2 * (abs(x1 - x2) + abs(y1 - y2))
The C++ equivalents are std::tuple for anonymous product types and struct for named types:
// Anonymous
int perimeter(std::tuple<int, int> p1, std::tuple<int, int> p2)
{
    return 2 * (abs(std::get<0>(p1) - std::get<0>(p2))
        + abs(std::get<1>(p1) - std::get<1>(p2)));
}

// Named
struct point
{
    int x;
    int y;
};

int perimeter(point p1, point p2)
{
    return 2 * (abs(p1.x - p2.x) + abs(p1.y - p2.y));
}
While sum types allow us to express a value that can come from any one of several types, product types allow us to express values from several types taken together. Empty, unit, sum, and product types are the building blocks of a type system.
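To make that composition concrete, here is a small sketch (my own example - the names circle, rectangle, and shape are purely illustrative) that builds a sum of two product types:
#include <iostream>
#include <type_traits>
#include <variant>

struct circle    { double radius; };         // a product of one double
struct rectangle { double width, height; };  // a product of two doubles

using shape = std::variant<circle, rectangle>; // a sum of the two products

double area(const shape& s)
{
    return std::visit([](auto&& sh) -> double {
        if constexpr (std::is_same_v<std::decay_t<decltype(sh)>, circle>)
            return 3.14159265 * sh.radius * sh.radius;
        else
            return sh.width * sh.height;
    }, s);
}

int main()
{
    std::cout << area(circle{1.0}) << std::endl;         // ~3.14159
    std::cout << area(rectangle{2.0, 3.0}) << std::endl; // 6
}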
Bonus: Optional Types
An optional type is a type that can hold any of the values of another type, or hold no value at all, where the latter case is usually represented as a singleton. So an optional is effectively a sum type between a given type and a singleton representing doesn't hold a value. In other words, the cardinality of an optional O over a type T is |O| = |T| + 1.
In Haskell, an optional is the famous Maybe type:
data Maybe a = Nothing | Just a
A function that operates on Maybe could, say, only apply foo if the optional contains an a:
-- Implementation of foo not provided in this example
foo :: a -> a
transform :: Maybe a -> Maybe a
transform (Just a) = Just (foo a)
transform Nothing = Nothing
The new C++17 equivalent is the optional type:
// Implementation of foo not provided in this example
A foo(A);

std::optional<A> transform(std::optional<A> a)
{
    return a != std::nullopt ? std::optional<A>{foo(*a)} : std::nullopt;
}
This might read a bit like the pointer implementation:
A* transform(A* a)
{
    return a != nullptr ? new A{foo(*a)} : nullptr; // the result has to be allocated
}
There is a key difference though: the value contained in the optional is part of the optional object itself, so it is not allocated dynamically the way we would allocate the result behind a pointer. nullopt is a helper object of the singleton type nullopt_t.
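A short usage sketch (my addition, with int and a trivial foo standing in for the A and foo above):
#include <iostream>
#include <optional>

// Illustrative stand-ins for A and foo from the example above.
int foo(int a) { return a + 1; }

std::optional<int> transform(std::optional<int> a)
{
    return a != std::nullopt ? std::optional<int>{foo(*a)} : std::nullopt;
}

int main()
{
    auto some = transform(41);           // holds 42
    auto none = transform(std::nullopt); // still holds nothing
    std::cout << some.value_or(-1) << std::endl; // prints 42
    std::cout << none.value_or(-1) << std::endl; // prints -1
}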
Types are important because a big part of programming effectively consists of designing and composing types. Having a good understanding of the fundamentals leads to better, safer, and saner code.
Summary
We started by outlining some of the basic principles of type systems:
- Static and dynamic typing.
- Weak and strong typing.
- Static and dynamic polymorphism.
Then we went over the building block types of a type system, with Haskell and C++ examples:
- Empty types (cardinality 0).
- Unit types (cardinality 1).
- Sum types (S of A and B has cardinality |A| + |B|).
- Product types (P of A and B has cardinality |A| x |B|).
- Optional types (O of A has cardinality |A| + 1).
Further Reading
Ben Deane had an excellent talk at CppCon this year, Using Types Effectively. Another great talk about type design from CppCon is C++, Abstract Algebra and Practical Applications by Robert Ramey. And then there is Bartosz Milewski's blog about Haskell, C++, and category theory.