Develop Easy-to-debug Interfaces

When you design an interface, have you ever considered what would happen if a client misused it? Misused, as in putting your interface into an unstable state where it can’t possibly do anything of value without causing the program to stop functioning properly? How can you notify the client code of the problem as soon as it happens, so it’s detected early? Let’s look at a typical example of what part of the interface for a vector might look like (in C++), using the TryGet() pattern (no idea if it is considered an official pattern):

bool DoubleVector::TryGet(int index, double& result) const {
   if(index >= 0 && index < Length()) {
     result = ElementAt(index);
     return true;
   } else {
     return false;
   }
}

Here we have some pseudoish C++ for a vector we’ve created.
And here’s some client code using it:

void ClientCode::SaveFirstDouble(const DoubleVector& vector) {
   double result;
   if (vector.TryGet(0, result)) {
     FileSystem.SaveMyDouble(result);
   }
}

All is fine and dandy. But have you considered that a programmer may make a mistake (I know right? Totes cray cray.) and not check whether the value could actually be fetched? Like this:

void ClientCode::SaveFirstDouble(const DoubleVector& vector) {
   double result;
   vector.TryGet(0, result);
   FileSystem.SaveMyDouble(result);
}

This would work fine if the vector was guaranteed to be non-empty at all times. But what if that’s not the case? What would the variable ‘result’ hold if the vector was empty? The TryGet() method wouldn’t change the value of the variable, after all, so it’s still an uninitialized local variable of a primitive type. Reading it is undefined behavior, and in practice we’d save garbage data. God knows what would happen with this value later on. Maybe the user wants to open the file and read it, and as they do, a segfault suddenly appears or some other strange behavior occurs. How long will it take to trace the bug back to this client code we’ve written? How can we inform the client of this grave mistake at an early stage so we won’t need to spend time trying to find the source of the problem? Hmm, our first thought could be to log the error by changing the TryGet() method in this way:

bool DoubleVector::TryGet(int index, double& result) const {
   if(index >= 0 && index < Length()) {
     result = ElementAt(index);
     return true;
   } else {
     LogError("Tried to fetch element outside the range of valid indices.");
     return false;
   }
}

Cool! Now we’ll totally know what happened. But oh my, how many messages are we printing to the logging system in other parts of the codebase? How do we know what error message to look for when trying to fix a particular bug? How do we know that message is even relevant to our problem? Oh, and this method will log an error each time it doesn’t find the element, even though that doesn’t necessarily mean there’s something wrong. It’d log an error message even if the client correctly checked whether an element existed before trying to fetch it. Lord save our helpless small souls. Let’s also not forget that the message will be invisible to the end user. How will they report the bug if we don’t catch it before they do? Will we be able to easily reproduce it?

Wouldn’t it be great if we could just stop executing the program and avoid entering an unstable state at this point? Why let this bug propagate through the program and cause it to behave in very strange ways while possibly corrupting data? Hmm, maybe we could throw an exception like this?

bool DoubleVector::TryGet(int index, double& result) const {
   if(index >= 0 && index < Length()) {
     result = ElementAt(index);
     return true;
   } else {
     throw Exception("Sinful deeds have been made this day");
     return false; // never reached: the throw above always leaves the method first
   }
}

Nice, so now we won’t need to look through any logging messages; if execution ends here, we’re sure it’s the source of the problem. Or is it? We still have the problem that this exception will be thrown even when the client correctly checks for the element’s existence before reading it. And now the return value of the TryGet() method loses its meaning: why return a boolean at all if the method throws an exception in the one case where it would return false? At this point, why not just return the element immediately if it exists and throw an exception if it doesn’t? Like this:

double DoubleVector::Get(int index) const {
   if(index >= 0 && index < Length()) {
      return ElementAt(index);
   }

   throw Exception("Sinful deeds have been made this day");
}

Nice, but we still have a problem left: how can we let the client check whether the element exists without forcing them to use some ugly try-catch code? It’s quite simple, really: you can move the logic for checking whether the element exists to a new method:

bool DoubleVector::ElementExistsAt(int index) const {
    return index >= 0 && index < Length();
}

Now we’ve separated the logic for checking the existence of an element from the logic for fetching it. Clients can also choose to only check whether some element exists without fetching it, so they won’t always need to create a new variable to hold the value in. And we also help the client debug their code should they ever make a mistake when using it. How? We throw an exception when they attempt to read outside the vector, which prevents the program from entering an unstable state and immediately tells the developer where the source of the bug is so they can fix it. They just need to add a call to ElementExistsAt() and they’re done.
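To see the pieces working together, here’s a minimal sketch of the whole class plus a client that checks before fetching. Keep in mind that storing the values in a std::vector, throwing std::out_of_range, and the FirstOrDefault() helper are all assumptions made for this example; the snippets above don’t show how DoubleVector is actually implemented.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Sketch of the final interface. The std::vector storage and the choice of
// std::out_of_range are assumptions made for this example.
class DoubleVector {
public:
    explicit DoubleVector(std::vector<double> values)
        : values_(std::move(values)) {}

    int Length() const { return static_cast<int>(values_.size()); }

    // Lets the client ask first, without fetching anything.
    bool ElementExistsAt(int index) const {
        return index >= 0 && index < Length();
    }

    // Fails fast: an invalid index is treated as a programming mistake.
    double Get(int index) const {
        if (!ElementExistsAt(index)) {
            throw std::out_of_range("DoubleVector::Get: index out of range");
        }
        return values_[static_cast<std::size_t>(index)];
    }

private:
    std::vector<double> values_;
};

// Hypothetical client that checks before fetching. Returning a fallback is
// only for illustration; the client earlier in the post saved the value.
double FirstOrDefault(const DoubleVector& vector, double fallback) {
    if (vector.ElementExistsAt(0)) {
        return vector.Get(0);
    }
    return fallback;
}
```

A client that forgets the ElementExistsAt() check and calls Get() on an empty vector gets an exception right at the call site instead of a garbage value.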

Now you might think, “but wouldn’t they need to catch the exception?”
They could, but they don’t need to, nor should they. They should use the ElementExistsAt() method instead. Using try-catch for controlling the flow is much less readable. Just think about it: how do you use std::vector, or some equivalent construct in your programming language’s standard library? Do you put a try-catch around code calling the std::vector::at() method, or do you check the boundaries first? Maybe it’s a matter of taste, but I certainly check the boundaries rather than using try-catch for controlling the execution flow. The point isn’t how you check for the existence of the element; the point is how you can help the debugging process when the existence isn’t checked at all before fetching the element.
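Here are the two styles side by side with std::vector, as a rough sketch. Both helper functions are hypothetical names made up for this comparison; only std::vector::at() and std::out_of_range come from the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: checks the bounds first, so a missing element is
// ordinary control flow and at() never gets a chance to throw here.
double ValueOrFallback(const std::vector<double>& values,
                       std::size_t index, double fallback) {
    if (index < values.size()) {
        return values.at(index);
    }
    return fallback;
}

// Hypothetical helper: same result, but the exception is used for control
// flow. It works, yet the "element missing" case hides in a catch block.
double ValueOrFallbackWithCatch(const std::vector<double>& values,
                                std::size_t index, double fallback) {
    try {
        return values.at(index);
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

Both return the same values. The first one just reads like what it does, and it leaves the exception free to mean “somebody forgot to check”.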

Summary

Some of you may recognize what I’m talking about here: it’s the principle of failing fast. Most of the time, when the flow of execution reaches a point where nothing can be done to prevent the program from entering an unstable state, you should crash the program immediately or save yourself from it somehow. That’s better than letting the program fail slowly, because who knows what will happen. Some data becomes corrupted? Segfaults appear at some point far from the source of the bug? Or the program starts to behave in strange ways, which makes the end user think “Man, what a piece of trash this program is. It never works as intended.”

Not only that, you may catch those bugs earlier in the development phase if you fail fast, reducing the probability of bugs sneaking into a release. After all, you can’t be certain you’ll find a bug right away, not even if you have lots of unit tests and integration tests.

Of course, failing fast isn’t always the right call, but for programming mistakes that nobody has handled, it usually is. Now imagine a user tries to open a file and read it into the program, and the program notices that the file contains corrupt data that would leave it in an unstable state if it used the data anyway. In this case the best thing is to display an error message, in a GUI dialog for example, and not open the file for any further processing. This way, no corrupt data enters the program and it can continue executing perfectly fine.
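As a rough sketch of that idea, the loader below rejects the whole input as soon as it sees a bad line and hands back an error message instead of crashing or passing on half-read data. The one-number-per-line format, the LoadResult struct and the LoadDoubles() name are all made up for this example, and showing the message to the user (GUI dialog or otherwise) is left to the caller.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: either the values or a message for the user.
struct LoadResult {
    bool ok;
    std::vector<double> values;
    std::string error;
};

// Hypothetical loader, assuming a text format with one number per line.
// Corrupt input is an expected situation, so it is reported, not thrown.
LoadResult LoadDoubles(std::istream& input) {
    LoadResult result{true, {}, ""};
    std::string line;
    int lineNumber = 0;
    while (std::getline(input, line)) {
        ++lineNumber;
        std::istringstream parser(line);
        double value;
        // The line must hold exactly one number and nothing else.
        if (!(parser >> value) || !(parser >> std::ws).eof()) {
            return LoadResult{false, {},
                "Corrupt data on line " + std::to_string(lineNumber)};
        }
        result.values.push_back(value);
    }
    return result;
}
```

The caller checks ok, shows error to the user if it’s false, and carries on with its existing, untouched state.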

In short, when the program is about to enter an unstable state, execution should stop immediately to avoid any corruption of data. If the program is left in an unstable state, it’ll stop behaving correctly, the end user becomes frustrated, and it becomes much harder for the developers to track down the bug.

Also, remember to consider refactoring your interface in such a way that it’s harder for client code to make mistakes while using it, so you won’t need to think so much about what happens in those scenarios where the program may be left in an unstable state. It’s not always possible, but remember to at least consider it.
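One possible refactoring along those lines, which isn’t something this post has used so far, is to return a std::optional<double> (C++17) instead of filling in an out-parameter. The sketch below uses a free function over a std::vector as a stand-in for DoubleVector::TryGet(); that stand-in is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for DoubleVector::TryGet(). The value only exists
// inside the returned optional, so there is no uninitialized out-parameter
// for a careless caller to read.
std::optional<double> TryGet(const std::vector<double>& values, int index) {
    if (index >= 0 && static_cast<std::size_t>(index) < values.size()) {
        return values[static_cast<std::size_t>(index)];
    }
    return std::nullopt;
}
```

A caller in a hurry can still write TryGet(values, 0).value(), but on an empty result that throws std::bad_optional_access at the call site, which is the fail-fast behavior we wanted in the first place.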

So what are exceptions used for? They’re used for telling the developer that, at this point, you need to recover from this problem or else the program will close in order to avoid entering an unstable state.