Code isn’t just code – it’s knowledge
Ever since I started reading about software design and practicing what I’ve been taught, I have begun to see a pattern in the problems we face while maintaining our codebase. It all has to do with how we organize the knowledge in our code. What knowledge am I talking about here? Technological knowledge? Coding best practices? Not really; I am talking about our solutions to problems. Code doesn’t just solve a problem; it also represents knowledge about that problem and the environment it resides in. It knows how a particular problem has been solved. This knowledge should preferably exist in exactly one place in our code and not leak uncontrollably into the rest of the codebase. If it is spread out and then changes sometime in the future, we’ll have some pretty gnarly challenges ahead of us.
Since this can be rather abstract to talk about, I suggest you ask yourself the following questions about the code you write:
- What problem does it solve? What does it know about it?
- Does my code make any assumptions about how some other part of the codebase works in order to solve the problem?
- Does my interface clearly communicate its intent? Are there any implicit rules when using it? If so, how can I communicate those more clearly?
- Can my solution or parts of it be reused?
Those are a handful of questions I usually reflect upon when writing code, and the list could go on for quite a while. Much of the code we write should communicate its intent and guide readers towards the places in the code that may be interesting to them. They’d rather not wade through tons of code just to find what they’re looking for. Code should communicate not only its intent, but also when it should be used.
Since those questions can be interpreted in a variety of ways, I thought I’d share what I mean by each of them.
What problem does it solve? What does it know about it?
The first step is to think about your solution. What knowledge did you need in order to solve the problem? If the problem was to, for example, bake an apple pie, maybe your code needed to solve one part of it: cutting an apple into pieces. So maybe your code would look something like this (written in Lua):
function BakeAnApplePie()
    local apple = CreateApple()
    local appleBites = {}
    -- Cut the apple into ten bites.
    for i = 1, 10 do
        local knife = CreateKnife()
        local bite = knife:Cut(apple)
        table.insert(appleBites, bite)
    end
    -- ...
end
So what knowledge does this function have as of right now? Well, obviously the knowledge of how to cut an apple. Is this knowledge reusable by some other part of the code? Not right now. It is mixed with the rest of the activity of baking an apple pie, but some other code may only want to cut an apple into pieces and maybe use it in a smoothie instead. That other code would then cut the apple the exact same way, introducing duplicate code. And if our knowledge of how to cut an apple changes in the future, how would that affect our code? Both the function for baking a pie and the one for preparing a smoothie would have to change.
The point here, however, is not to discuss whether this is good or not; the point is to describe what we mean when we talk about knowledge in code: the knowledge of how something is done, all the knowledge needed to solve a particular problem.
Does my code make any assumptions about some other part of the codebase?
This can be a bit more subtle in your code. When you wrote your solution, did you study the implementation of some other class, for example? Did you use any knowledge from that implementation in your solution? For example, you may notice that some class returns an array of integers that happens to be, let’s say by coincidence, sorted in ascending order. So you decide to write some code like the following:
function IsLotteryTicketValid(ticketNumber)
    local validTicketNumbers = GetValidTicketNumbers()
    local ticket = BinarySearch(validTicketNumbers, ticketNumber)
    return ticket ~= nil
end
Now then, what knowledge does this particular function have? Well, certainly the knowledge of what it means for a lottery ticket to be valid or not. Not only that, it also assumes that the array of valid ticket numbers is sorted. This assumption is made on the line where we look up the given ticket number with a binary search. As we know, a binary search requires a sorted collection, of integers in this case. If the ordering of the ticket numbers ever changes so that the array is no longer sorted, this function will stop working properly, and that will only become apparent when the problem manifests itself as a bug. Who knows how long it’ll take to find out what’s wrong?
Don’t make any assumptions about how some other class or module works. Rules on how a particular class or module should be used should be communicated clearly by the interface itself and by the programmer(s) who designed it. Do not make your code depend upon implementation details as we did in this case. Depending on implementation details isn’t just about depending upon private members of some class; it can be a lot more subtle than that, as we’ve seen in this example.
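One possible way to remove the hidden assumption, sketched here in C++ rather than Lua, is to let a small type own the ordering rule itself. The ValidTicketNumbers class below is hypothetical and not part of the original example; it sorts its data on construction, so the binary search depends only on an invariant the class establishes, never on how the caller's data happened to be ordered.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical wrapper: sorted order is an invariant this class establishes
// itself, so lookups do not rely on the ordering of the incoming data.
class ValidTicketNumbers {
public:
    explicit ValidTicketNumbers(std::vector<int> numbers)
        : m_numbers(std::move(numbers)) {
        std::sort(m_numbers.begin(), m_numbers.end());
    }

    // The binary search is safe because the constructor sorted the data.
    bool Contains(int ticketNumber) const {
        return std::binary_search(m_numbers.begin(), m_numbers.end(), ticketNumber);
    }

private:
    std::vector<int> m_numbers;
};

// The validity check no longer knows anything about ordering.
bool IsLotteryTicketValid(const ValidTicketNumbers& validNumbers, int ticketNumber) {
    return validNumbers.Contains(ticketNumber);
}
```

If the lookup strategy changes later (a hash set, say), only ValidTicketNumbers has to change; IsLotteryTicketValid stays as it is.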
Does my interface clearly communicate its intent?
When you’ve created a class (or something equivalent) with a public interface, you should consider how clients will use this interface. Here, a client can be either a programmer or some client code. Are there any implicit rules, for example? One example is temporal coupling, where code depends on a sequence of events that must happen in a certain order or at some particular point in time. A typical case, in languages like C, is a data structure that the client must initialize before using it. If the client forgets to call the initialization function, it will probably end up working with an object that is in an inconsistent state. Maybe there are several functions that must be called in a specific order for the object to be in a consistent state? Those rules (invariants) should preferably be encapsulated in the object itself. Or else beautiful things will happen, such as segfaults.
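As a rough illustration of encapsulating such a rule, here is a minimal C++ sketch. BoundedStack is a made-up class for this purpose. There is no separate initialization function to forget, because the constructor is the only way to obtain an object, and the object guards its own invariants afterwards.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical fixed-capacity stack. A constructed object is already in a
// consistent state, so no call order has to be remembered by the client.
class BoundedStack {
public:
    explicit BoundedStack(std::size_t capacity) : m_capacity(capacity) {
        m_items.reserve(capacity);
    }

    // Refuses to grow past the capacity instead of corrupting memory.
    void Push(int value) {
        if (m_items.size() == m_capacity) {
            throw std::length_error("BoundedStack is full");
        }
        m_items.push_back(value);
    }

    // Refuses to pop from an empty stack.
    int Pop() {
        if (m_items.empty()) {
            throw std::out_of_range("BoundedStack is empty");
        }
        int value = m_items.back();
        m_items.pop_back();
        return value;
    }

    std::size_t Size() const { return m_items.size(); }

private:
    std::size_t m_capacity;
    std::vector<int> m_items;
};
```

Misuse now shows up as an exception at the point of the mistake rather than as a segfault somewhere else.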
Another example is a function that is both a command and a query (a command is a subroutine that causes side effects; a query simply returns data) without communicating the side effect. One thing to reflect upon is whether this side effect is okay or not. Can it potentially cause problems for the client? For example, consider the following pseudo-ish C++ code:
std::string Class::RetrieveUserName() {
    m_connection = new Connection("imaginary.host");
    m_name = m_connection->GetUserName(m_user_id);
    return m_name;
} // assume variables with the 'm_' prefix to be member variables
What happens if this method is called twice? The immediate problem we can notice is that there’ll probably be a memory leak, if we look at the line where m_connection is assigned a new connection. What happens to the previous connection? This could be seen as the implicit rule, “Don’t call this function more than once.”
Another thing we notice is that the member variable m_name gets assigned a new name on every call. One may assume that the user name associated with m_user_id never changes, in which case fetching it again is wasted work. It could be that it changes, though, who knows. And if the Connection class establishes a connection over the network, every call pays for a new connection, which is a major problem as well. There are many flaws with this particular method.
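One way these flaws could be addressed is sketched below. This is an assumption-laden illustration, not the one true fix: the Connection class is a local stand-in that fakes its answer, the User class name and the int user ID are made up, and the connection is handed in once through the constructor instead of being created inside the query.

```cpp
#include <memory>
#include <string>
#include <utility>

// Stand-in for the Connection class from the example above. A real one would
// presumably talk to "imaginary.host"; this one fabricates a name locally.
class Connection {
public:
    explicit Connection(std::string host) : m_host(std::move(host)) {}

    std::string GetUserName(int userId) const {
        return "user" + std::to_string(userId) + "@" + m_host;
    }

private:
    std::string m_host;
};

// Hypothetical owner of the user ID and the connection.
class User {
public:
    // The connection is created once by the caller, and std::unique_ptr
    // releases it automatically, so repeated queries cannot leak.
    User(int userId, std::unique_ptr<Connection> connection)
        : m_user_id(userId), m_connection(std::move(connection)) {}

    // A pure query: const, no member is reassigned, safe to call repeatedly.
    std::string RetrieveUserName() const {
        return m_connection->GetUserName(m_user_id);
    }

private:
    int m_user_id;
    std::unique_ptr<Connection> m_connection;
};
```

With this shape, the implicit rule “Don’t call this function more than once” disappears, because calling the query twice no longer changes anything.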
A function that is both a query and a command is not always bad. The side effect may be invisible to the client, for example. Maybe you perform a heavy calculation and store the result in a cache. The client doesn’t necessarily care where the result comes from; it only cares about getting it at all.
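A minimal C++ sketch of such an invisible side effect follows. The Report class and its summing loop are placeholders for whatever heavy calculation you actually have, and the Computations counter exists only to make the caching observable in this example. Note that this simple version is not thread-safe.

```cpp
// Hypothetical class whose query caches its result on first use.
class Report {
public:
    explicit Report(int input) : m_input(input) {}

    // Looks like a pure query to the client. The only side effect is filling
    // the cache, which never changes the value that is returned.
    long long Total() const {
        if (!m_hasCachedTotal) {
            m_cachedTotal = ComputeTotal();
            m_hasCachedTotal = true;
            ++m_computations;
        }
        return m_cachedTotal;
    }

    // Exposed only so the example can show how often the work was done.
    int Computations() const { return m_computations; }

private:
    // Placeholder for a heavy calculation: the sum of 1..m_input.
    long long ComputeTotal() const {
        long long sum = 0;
        for (int i = 1; i <= m_input; ++i) {
            sum += i;
        }
        return sum;
    }

    int m_input;
    mutable bool m_hasCachedTotal = false;
    mutable long long m_cachedTotal = 0;
    mutable int m_computations = 0;
};
```

Calling Total() any number of times gives the same answer, while the calculation itself runs only once.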
The point is to reflect upon those things. How can things go terribly wrong when a client uses your interface? What does the client need to know in order to use it? Can I isolate this knowledge so the client doesn’t need to know about it, and would that make sense? Is my interface intuitive? Use the method signature, separation of concerns and various other design principles to define a better interface that communicates its intent.
Can my solution or parts of it be reused?
Sharing solutions is important; sharing knowledge of how something was solved is not (in terms of code, of course). The only time we want to know how something was solved is when that particular piece of knowledge changes. And when it changes, only one well-defined part of the code should be affected. Knowledge about a particular concept shouldn’t lie around sprinkled all over the codebase, because who’s going to make sure those pieces of knowledge stay in sync with each other in the face of change? How will new developers find those puzzle pieces and put them together in their heads in order to understand the concept?
To make your solution reusable, think about the previous questions, especially whether your interface is intuitive and whether it leaks knowledge of how the problem was solved. Divide the problem into tiny pieces, and let your code be divided in the same way. Create a one-to-one mapping between those pieces. Small steps should be represented by functions; concepts should be represented by classes or something equivalent. Someone may not need every bit of your solution, but they may need to solve a smaller problem which you’ve already solved as part of your total solution. With good naming and organization of your knowledge, you increase the chance that people will know to reuse parts of your code for some other problem.
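Applied to the apple pie example from earlier, the division could look roughly like the C++ sketch below. The Apple and AppleBite types, the CutApple and PrepareSmoothie functions, and the string return values are all invented for illustration; only BakeAnApplePie echoes a name from the original Lua snippet.

```cpp
#include <string>
#include <vector>

// Hypothetical types for illustration.
struct Apple { std::string variety; };
struct AppleBite { std::string variety; };

// The knowledge of how an apple is cut lives here and nowhere else.
std::vector<AppleBite> CutApple(const Apple& apple, int pieces) {
    std::vector<AppleBite> bites;
    for (int i = 0; i < pieces; ++i) {
        bites.push_back(AppleBite{apple.variety});
    }
    return bites;
}

// Both recipes reuse the solution without knowing how the cutting is done.
std::string BakeAnApplePie(const Apple& apple) {
    std::vector<AppleBite> bites = CutApple(apple, 10);
    return "pie with " + std::to_string(bites.size()) + " " + apple.variety + " bites";
}

std::string PrepareSmoothie(const Apple& apple) {
    std::vector<AppleBite> bites = CutApple(apple, 4);
    return "smoothie with " + std::to_string(bites.size()) + " " + apple.variety + " bites";
}
```

If the way apples are cut changes, CutApple is the single place to edit, and both recipes pick up the change.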
Summary
View your code not only as a solution to some problem, but as knowledge as well. This knowledge is tightly coupled to the problem at hand, and as we all know, our understanding of the problem tends to change. New features come, old features go and some features change. It is important that we organize our understanding of the problem in an intelligent way. And because our code is a tangible form of that understanding, it has to be continuously adapted to reflect our new understanding.