Clean Code - Part 1

These posts are based on a Clean Code talk I did for my team a few months ago, which, in turn, was inspired by the advice I gave to some of our summer interns as four rules of thumb for writing cleaner code:

Write less code
Write simple code
Write stateless code
Write readable code

I will cover the first two points in this post and the remaining two in Part 2. I'm talking about C++ and C# throughout, but most of this should be applicable to any object-oriented or multi-paradigm language.

Write Less Code

The number of defects found in open source projects was 0.45 defects/1,000 lines of code, while the industry's average is around 1 defect per 1,000 lines of code for companies not using automated testing such as static analysis.

Source: InfoQ

A great way to have fewer bugs is to have fewer lines of code! What I mean by this is that churning out many lines of code is by no means a measure of productivity yet, unfortunately, most developers still feel great when, at the end of the day, we wrote insert high number LoC.

How to write less code?

Two points to keep in mind: first, don't reinvent the wheel - don't write code if there is an existing library, internal to your company or open-source, that already does what needs to be done. Case in point, we refactored some code for a project (C#), extracted some interfaces, componentized things, and wrote a bunch of unit tests. All of this was great, except we ended up with a bunch of handcrafted stub classes: for

interface IMyComponent
{
    void Foo();
    int Bar();
    void Baz();
}

we had

class MyComponentStub : IMyComponent
{
    public void Foo()
    {
    }

    public int Bar()
    {
        return 0;
    }

    public void Baz()
    {
    }
}
...
var myComponentStub = new MyComponentStub();

and so on. Implementing these stubs was tedious, needless work - we integrated Moq , a mocking library, and the above code turned into:

using Moq;
...
var myComponentStub = Mock.Of<IMyComponent>();

Moq uses reflection to stub out interfaces at run-time, so simply adopting the library helped us get rid of a lot of code.

The second way to write less code is to know the standard library of your language of choice. Many times, a block of code can be replaced with a simple library call. For C++, pay particular attention to the STL <algorithm> header and for C#, System.Linq. Both contain many of useful algorithms which can replace a lot of code.

I also recommend watching Sean Parent's C++ Seasoning talk, one of the best tech talks I've seen. The example he gives in the talk (from the Chrome codebase) shows how a couple of lines of STL code can be used instead of a whole convoluted function:

void PanelBar::RepositionExpandedPanels(Panel* fixed_panel) {
  CHECK(fixed_panel);

  // First, find the index of the fixed panel.
  int fixed_index = GetPanelIndex(expanded_panels_, *fixed_panel);
  CHECK_LT(fixed_index, expanded_panels_.size());

  // Next, check if the panel has moved to the other side of another panel.
  const int center_x = fixed_panel->cur_panel_center();
  for (size_t i = 0; i < expanded_panels_.size(); ++i) {
    Panel* panel = expanded_panels_[i].get();
    if (center_x <= panel->cur_panel_center() ||
        i == expanded_panels_.size() - 1) {
      if (panel != fixed_panel) {
        // If it has, then we reorder the panels.
        ref_ptr<Panel> ref = expanded_panels_[fixed_index];
        expanded_panels_.erase(expanded_panels_.begin() + fixed_index);
        if (i < expanded_panels_.size()) {
          expanded_panels_.insert(expanded_panels_.begin() + i, ref);
        } else {
          expanded_panels_.push_back(ref);
        }
      }
    break;
    }
  }

  // Find the total width of the panels to the left of the fixed panel.
  int total_width = 0;
  fixed_index = -1;
  for (int i = 0; i < static_cast<int>(expanded_panels_.size()); ++i) {
    Panel* panel = expanded_panels_[i].get();
    if (panel == fixed_panel) {
      fixed_index = i;
      break;
    }
    total_width += panel->panel_width();
  }

  CHECK_NE(fixed_index, -1);
  int new_fixed_index = fixed_index;

  // Move panels over to the right of the fixed panel until all of the ones
  // on the left will fit.
  int avail_width = max(fixed_panel->cur_panel_left() - kBarPadding, 0);
  while (total_width > avail_width) {
    new_fixed_index--;
    CHECK_GE(new_fixed_index, 0);
    total_width -= expanded_panels_[new_fixed_index]->panel_width();
  }

  // Reorder the fixed panel if its index changed.
  if (new_fixed_index != fixed_index) {
    Panels::iterator it = expanded_panels_.begin() + fixed_index;
    ref_ptr<Panel> ref = *it;
    expanded_panels_.erase(it);
    expanded_panels_.insert(expanded_panels_.begin() + new_fixed_index, ref);
    fixed_index = new_fixed_index;
  }

  // Now find the width of the panels to the right, and move them to the
  // left as needed.
  total_width = 0;
  for (Panels::iterator it = expanded_panels_.begin() + fixed_index + 1;
          it != expanded_panels_.end(); ++it) {
    total_width += (*it)->panel_width();
  }
  avail_width = max(wm_->width() - (fixed_panel->cur_right() + kBarPadding), 0);

  while (total_width > avail_width) {
    new_fixed_index++;
    CHECK_LT(new_fixed_index, expanded_panels_.size());
    total_width -= expanded_panels_[new_fixed_index]->panel_width();
  }

  // Do the reordering again.
  if (new_fixed_index != fixed_index) {
    Panels::iterator it = expanded_panels_.begin() + fixed_index;
    ref_ptr<Panel> ref = *it;
    expanded_panels_.erase(it);
    expanded_panels_.insert(expanded_panels_.begin() + new_fixed_index, ref);
    fixed_index = new_fixed_index;
  }

  // Finally, push panels to the left and the right so they don't overlap.
  int boundary = expanded_panels_[fixed_index]->cur_panel_left() - kBarPadding;
  for (Panels::reverse_iterator it =
       // Start at the panel to the left of 'new_fixed_index'.
       expanded_panels_.rbegin() + (expanded_panels_.size() - new_fixed_index);
       it != expanded_panels_.rend(); ++it) {
    Panel* panel = it->get();
    if (panel->cur_right() > boundary) {
      panel->Move(boundary, kAnimMs);
    } else if (panel->cur_panel_left() < 0) {
      panel->Move(min(boundary, panel->panel_width() + kBarPadding), kAnimMs);
    }
    boundary = panel->cur_panel_left() - kBarPadding;
  }
  boundary = expanded_panels_[fixed_index]->cur_right() + kBarPadding;

  for (Panels::iterator it = expanded_panels_.begin() + new_fixed_index + 1;
       it != expanded_panels_.end(); ++it) {
    Panel* panel = it->get();
    if (panel->cur_panel_left() < boundary) {
      panel->Move(boundary + panel->panel_width(), kAnimMs);
    } else if (panel->cur_right() > wm_->width()) {
      panel->Move(max(boundary + panel->panel_width(),
                  wm_->width() - kBarPadding),
                  kAnimMs);
    }
    boundary = panel->cur_right() + kBarPadding;
  }
}

becomes:

void PanelBar::RepositionExpandedPanels(Panel* fixed_panel) {
  CHECK(fixed_panel);

  // First, find the index of the fixed panel.
  int fixed_index = GetPanelIndex(expanded_panels_, *fixed_panel);
  CHECK_LT(fixed_index, expanded_panels_.size());

  // Next, check if the panel has moved to the left side of another panel.
  auto f = begin(expanded_panels_) + fixed_index;
  auto p = lower_bound(begin(expanded_panels_), f, center_x,
    [](const ref_ptr<Panel>& e, int x){ return e->cur_panel_center() < x; });

  // If it has, then we reorder the panels.
  rotate(p, f, f + 1);
}

Code snippets borrowed from Sean Parent's slides, I highly recommend watching the whole talk.

The key takeaway here is that there could be a standard library implementation or an external module that can greatly simplify your work and it's a good practice to always ask yourself do I really need to write this?

Write Simple Code

First, a few notes on cyclomatic complexity from Wikipedia:

Cyclomatic complexity is a software metric (measurement), used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code.

The complexity M is then defined as

M = E − N + 2P

where

E = the number of edges of the graph, N = the number of nodes of the graph, P = the number of connected components.

A control flow graph of a simple program. The program begins executing at the red node, then enters a loop (group of three nodes immediately below the red node). On exiting the loop, there is a conditional statement (group below the loop), and finally the program exits at the blue node. This graph has 9 edges, 8 nodes, and 1 connected component, so the cyclomatic complexity of the program is 9 - 8 + 2 * 1 = 3.

Source: Wikipedia

The cyclomatic complexity of any piece of code should be minimized. This can be achieved by avoiding branching, namely, whenever possible, avoiding conditional statements and loops. Linear code is easier to read and maintain, and provides less opportunities for bugs.

Avoiding conditional statements

One way to avoid conditional statements is to, whenever feasible, throw exceptions instead of propagating errors through return values.

Here is an example of error code propagation through return values using the Windows API's HRESULT:

HRESULT foo(); // Does some work and returns an HRESULT
HRESULT bar(); // Does some work and returns an HRESULT

HRESULT baz()
{
    HRESULT hr = S_OK;

    hr = foo();
    if (FAILED(hr))
        return hr;

    hr = baz();
    if (FAILED(hr))
        return hr;

    ... // Some more work here which might fail

    return hr;
}

if (SUCCEEDED(baz()))
{
    // :)
}
else
{
    // :(
}

This can be replaced with the more concise and much easier to read:

void foo(); // Does some work, might throw
void bar(); // Does some work, might throw

void baz()
{
    foo();
    baz();

    ... // Some more work here which might fail (and throw)
}

try
{
    baz();
    // :)
}
catch (...)
{
    // :(
}

Error code return values come from the old days when exceptions didn't exist and make code harder to read. That being said, for C++ specifically, you should be careful about throwing exceptions across DLL boundaries. In practice though, a lot of code in the shape of the above example appears within the same executable for no good reason. If cross-DLL boundary is a problem, I would actually recommend using exceptions internally and switching to return codes at the public API boundary.

Another way to avoid conditional statements is to use the Null Object pattern instead of checking for null. For example, take an IActivity interface on which we can log success or failure, and an ActivityScope which can retrieve the current activity from a context:

interface IActivity
{
    void LogSuccess();
    void LogFailure();
}

class ActivityScope
{
    ...

    public IActivity GetCurrentActivity()
    {
        if (!_context.HasCurrentActivity())
        {
            return null;
        }
        return _context.GetActivity();
    }
}

With this implementation, all clients of the API have to make sure GetCurrentActivity() returns an object as opposed to null. All callers look like this:

ActivityScope activityScope = new ActivityScope();
activityScope.CreateActivity();

... // Do a bunch of stuff

var activity = activityScope.GetCurrentActivity();
if (activity != null)
{
    activity.LogSuccess();
}

While there is a single ActivityScope implementation, there are hundreds of calls to GetCurrentActivity, all coming with a boilerplate null check. The Null Object alternative for this is to provide a NullActivity, for which LogSuccess and LogFailure don't do anything. ActivityScope can return NullActivity instead of null if there is no Activity in the context:

class NullActivity : IActivity
{
    public void LogSuccess() { }
    public void LogFailure() { }
}

class ActivityScope
{
    ...
    private static NullActivity _nullActivity = new NullActivity();

    public IActivity GetCurrentActivity()
    {
        if (!_context.HasCurrentActivity())
        {
            return _nullActivity;
        }
        return _context.GetActivity();
    }
}

Now callers don't need to worry about getting back a null, and can use the API like this:

activityScope.GetCurrentActivity().LogSuccess();

Yet another way to reduce branching is when it used for mapping between two types:

if (a == IdType.Foo)
{
    b = "Foo string";
}
else if (a == IdType.Bar)
{
    b = "Bar string";
}
else if ...

A pattern like this (which can also take the form of a big switch/case statement) can usually be replaced with indexing into an array or looking up the corresponding value in a hash map:

Dictionary<IdType, string> IdTypeToStringMap = new Dictionary<IdType, string>()
{
    { IdType.Foo, "Foo" },
    { IdType.Bar, "Bar" },
    ...
};
...
b = IdTypeToStringMap[a];

This is, again, easier to maintain, since it is declarative - the mapping is given as data (IdTypeToStringMap), not as code (long series of if/else).

Avoid loops

This goes back to the C++ Seasoning talk, namely the No Raw Loops guideline. Here's a C# example: given a list of numbers, we want to get the square of all the odd numbers in the list.

var numbers = new List<int> { 6, 1, 2, 7, 3, 4, 9, 5, 8 };

// Get the squares of all odd numbers

One way to do this is to maintain a list of numbers, iterate over the list, check if numbers are odd, and if so, square them and add them to the list:

// Get the squares of all odd numbers
IEnumerable<int> SquareOdds(IEnumerable<int> numbers)
{
    var squares = new List<int>();

    foreach (int number in numbers)
    {
        if (number % 2 != 0)
        {
            squares.Add(number * number);
        }
    }

    return squares;
}

var squares = SquareOdds(numbers);

A neater way to do this is to use a generator instead of manually maintaining the list of squares:

// Get the squares of all odd numbers
IEnumerable<int> SquareOdds(IEnumerable<int> numbers)
{
    foreach (int number in numbers)
    {
        if (number % 2 != 0)
        {
            yield return number * number;
        }
    }
}

var squares = SquareOdds(numbers);

That being said, what I would actually recommend is using Linq:

// Get the squares of all odd numbers
var squares = numbers.
                Where(number => number % 2 != 0).
                Select(number => number * number);

Fewer lines of code and no branching whatsoever¹. Where and Select are generic algorithms, and their arguments are the predicates we use. This makes the intent of the code clear at a glance - we are filtering the collection with a predicate (number => number % 2 != 0) and applying a transformation to it with another predicate (number => number * number). Also, the filtering and transformation are library functions, so we can be fairly certain they work well, and only need to worry about maintaining our predicates.

It might not look like a big deal in this simple made-up example, but as code evolves, it becomes harder and harder to follow the iteration logic, as the code gets littered with break, continue, and return statements (see the Chrome example quoted in the Write Less Code section).

Key takeaways:

Try to keep functions linear (or as linear as possible)
Default to throwing instead of propagating errors up the call stack
Consider creating a null object when code is littered with null checks
Separate algorithm logic from predicates to make the intent of the code clear (in other words, no raw loops).

Notes on performance

The most interesting question I was asked is what are the performance implications of using an STL algorithm or Linq.

The default answer is, of course, you have to measure for your particular case! Blanket statements cannot be made about performance, as there are many factors involved: compiler, runtime, standard library, OS, architecture, whether code is on a hot path or not, and so on and so forth.

Still, my recommendation is to use the library algorithms and, only if they become the bottleneck (which in most cases shouldn't happen), look into replacing them with handcrafted code. Another thing to keep in mind is that standard library authors know what they're doing, so it's very likely that library code is already pretty well optimized. I ran a simple wall clock benchmark for 1M iterations for some of the examples I used throughout the presentation (both the handcrafted and the library versions), and in all cases the code leveraging library functions ran slightly faster.

Cyclomatic complexity of this is actually higher when computed by looking at basic blocks (eg. from Visual Studio's Analyze menu), since the compiler will automatically add a finally block to dispose of the Linq-returned IEnumerables in case of exception. That being said, I prefer compiler-generated complexity to developer-generated complexity. ↩