My main code quality guideline
…is not succinctness, removing duplication, reducing the number of nested loops, any particular naming scheme, or a style guide. It is, instead, asking myself one question: how tricky will this code be to understand if I have to fix it after switching to something else for half a year?
Or to put it another way: assuming I forget this code completely and have to return to fix a bug in some function, how much of the surrounding module|library|app do I have to get in my head to see what is going on?
[W]e want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute.
Abelson & Sussman, Structure and Interpretation of Computer Programs
I know for a fact that most of the code I write will at some point be modified, fixed, and extended by someone who does not know it. In many cases that someone will be me after switching to something else for long enough to forget the code completely. It makes perfect sense to optimise for that by attempting to minimise the cost of understanding it.
It sounds blurry, but that’s only because it is. From this angle, and with a bit of a squint, coding starts to look like writing prose with the additional requirement that the result be executable, and suddenly we get all the blurriness and difficulty of writing English¹. There are no strict rules that guarantee readability, just general guidelines, rules of thumb, and the hope that I can tap into some context I share with future maintainers.
Assumptions
In the case of prose you can assume the reader understands the language and has a certain level of common knowledge. In the case of code, I assume that they know the language fairly well; not just the syntax, but also the conventions used by its standard library.
These assumptions are not always met, but that’s okay: if I manage to make my code conform to conventions, it will help the reader build the right intuition for the community’s code in general; otherwise I would be one of the people responsible for the deterioration of quality within it, and would actively get in the way of someone improving their skills.
Conventions
Conventions are good. Conventions are decisions for free. Conventions are lower cognitive load when reading code and when writing it. Conventions are patterns and people are great at recognising those. Use them. Don’t clash with them.
If something looks similar to what people already know and understand, they will naturally assume it behaves similarly. Use that whenever possible; treat it as a way to avoid having to explain things.
You have multiple levels of conventions at your disposal: language (idioms and patterns established by the standard library), framework (Django introduced some good conventions early on), project, module. The higher the level you manage to use, the easier it will be for a random programmer to understand the reference.
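A quick language-level illustration: any class that supports the with statement by implementing the standard context-manager protocol is immediately legible to a Python programmer, without a line of documentation. The Session class below is made up for the example.

```python
class Session:
    # Implementing the standard context-manager protocol means every
    # reader already knows when setup and cleanup happen.
    def __enter__(self):
        self.connected = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self.connected = False


with Session() as session:
    assert session.connected
```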
Vocabulary
You create it whether you want to or not, so you might as well use that to your advantage. Function and variable names are roughly verbs and nouns; when someone tries to understand your code, that’s how they are going to read it.
Good names make it easier to reason about the code. Bad names require the reader to go read some other piece of code to understand what’s going on. Very bad names are actively misleading.
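A made-up example of the last kind, just to show what “actively misleading” means in practice:

```python
import time

_last_seen = {}


# The name reads like a side-effect-free lookup...
def get_user(users, user_id):
    _last_seen[user_id] = time.time()  # ...but it also quietly records activity.
    return users[user_id]

# A name that admits the side effect, or splitting the two concerns apart,
# would not send the reader down the wrong path.
```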
Inheritance
It’s a big one. Many class hierarchies obscure the flow of code and make discovery more difficult.
And sure, it’s just as easy to come up with examples of bad function composition, but the common approach of “here’s a class that implements the top-level code flow, override pieces in subclasses to tweak or configure it” requires the reader to have a fairly good understanding of the superclass, and makes it trickier to follow the code.
It’s sometimes worth it (the entire unittest family is a particularly good example: test cases allow customization by overriding a very limited set of methods, and those are mostly consistent across a large number of languages), more often it is not (most modules are not unittest), and the key point is to recognize that the reversal of flow comes with a high cognitive cost you need to consider when using it.
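For the unittest case that limited set is essentially setUp, tearDown, and a couple of class-level hooks; a minimal sketch:

```python
import unittest


class SplittingTests(unittest.TestCase):
    # The top-level flow lives in unittest.TestCase; we only override the
    # well-known hooks it promises to call around each test_* method.
    def setUp(self):
        self.line = 'a,b,c'

    def test_split(self):
        self.assertEqual(self.line.split(','), ['a', 'b', 'c'])


if __name__ == '__main__':
    unittest.main()
```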
Example: wrapping a well-known function
Let’s say you find yourself writing the same piece of code around some standard library function often and want to refactor that out. For example, my app does a lot of work on a set of data files in a particular location and encoding, some reads and some writes, something that might start as multiple calls to open like this one:
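Something along these lines; the directory, file names, and encoding below are made up for illustration:

```python
import os

DATA_DIR = '/var/lib/myapp'


def load_defaults():
    with open(os.path.join(DATA_DIR, 'defaults.cfg'), encoding='iso-8859-2') as f:
        return f.read()


def save_local(contents):
    with open(os.path.join(DATA_DIR, 'local.cfg'), 'w', encoding='iso-8859-2') as f:
        f.write(contents)
```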
The first thing to go would probably be that repeated call to os.path.join:
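Continuing the same made-up example:

```python
def path_to(filename):
    # DATA_DIR as in the previous snippet.
    return os.path.join(DATA_DIR, filename)


def load_defaults():
    with open(path_to('defaults.cfg'), encoding='iso-8859-2') as f:
        return f.read()
```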
And if it wasn’t for that encoding parameter I would leave it as is, without wrapping that open(path_to(filename), <mode>) in another function, because it’s idiomatic and most Python programmers will immediately understand what’s going on. That’s much more value than removing this particular piece of duplication.
Having to specify the encoding changes things because, unlike with the file name and mode, omitting it by mistake leads to a bug that might go unnoticed for a while. So let’s create a wrapper here.
The combination of function name and interface is going to be important. The name in particular might take a while, but it’s worth it to get it right because this is how you lead the reader to understanding what is going on.
I want something based on open to use associations with the built-in. open_data_file might work, but the word data is very ambiguous and in most situations you’re better off with something more specific. Are our files part of configuration? Cache? Static files for a website? Let’s assume configuration files.
Next thing to do is get the interface right. Write how you would like to use the function before implementing it.
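For this made-up example, something like:

```python
# Read one file, write another, never mentioning paths or encodings.
with open_config_file('defaults.cfg') as f:
    defaults = f.read()

with open_config_file('local.cfg', 'w') as f:
    f.write(defaults)
```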
Does it read right? If I see it in code I open for the first time in my life, will it make it easier for me to understand it? Hopefully.
I pushed encoding and path_to into open_config_file, tried to choose a name that might convey some idea about what’s going on, but otherwise I kept the interface as close to open as possible to make it easier to understand by anyone who already knows that one.
So now I can implement it.
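A sketch of that, assuming the path_to helper and the encoding from the earlier snippets:

```python
def open_config_file(filename, mode='r'):
    # Same shape as the open() call it replaces, with the path and
    # encoding filled in so they cannot be forgotten.
    return open(path_to(filename), mode, encoding='iso-8859-2')
```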
It ended up being a one-liner that builds the right file name and overrides the default value of one of open’s arguments.
Example: unit tests
This is fairly common in projects I’ve seen:
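A sketch of the pattern, assuming Django-style tests and a made-up Invoice model:

```python
from django.test import TestCase

from billing.models import Invoice  # hypothetical model


class InvoiceTests(TestCase):
    def setUp(self):
        self.instance = Invoice.objects.create(amount=100, currency='EUR')

    def test_amount(self):
        # Depends on setUp directly, through self.instance.
        self.assertEqual(self.instance.amount, 100)

    def test_count(self):
        # Depends on setUp indirectly, through the DB state it created.
        self.assertEqual(Invoice.objects.count(), 1)
```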
There’s a piece of initialisation that is [probably] used by multiple test cases. Some of them might depend on it by using self.instance, others might depend on it indirectly by assuming it’s in the DB and taking it into account when counting objects or checking results of non-ORM queries. Those test cases are now coupled, which makes changes more difficult.
Sometimes that assignment will migrate to a middle class in the hierarchy, in which case it’s even less convenient to get all the code necessary to understand a test case on the screen at the same time. And then, since the assignment is in one of the superclasses, you might want to introduce a way to customize it, and sometimes this happens:
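For example, a hook method that subclasses override, continuing the made-up Invoice tests:

```python
class BaseInvoiceTests(TestCase):
    def setUp(self):
        self.instance = self.create_instance()

    def create_instance(self):
        return Invoice.objects.create(amount=100, currency='EUR')


class PaidInvoiceTests(BaseInvoiceTests):
    def create_instance(self):
        # Overridden only to tweak what the inherited setUp creates.
        return Invoice.objects.create(amount=100, currency='EUR', paid=True)

    def test_is_paid(self):
        self.assertTrue(self.instance.paid)
```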
Or, alternatively:
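The same idea expressed through class attributes consumed by an inherited setUp:

```python
class BaseInvoiceTests(TestCase):
    instance_kwargs = {'amount': 100, 'currency': 'EUR'}

    def setUp(self):
        self.instance = Invoice.objects.create(**self.instance_kwargs)


class PaidInvoiceTests(BaseInvoiceTests):
    # "Configuration" spread across the hierarchy instead of a plain call.
    instance_kwargs = {'amount': 100, 'currency': 'EUR', 'paid': True}

    def test_is_paid(self):
        self.assertTrue(self.instance.paid)
```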
At this point those attempts to reduce code duplication end up creating more complexity than they prevent. What I try to do instead is avoid passing data between setUp and test_*.
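With the same made-up Invoice model, that might look like this:

```python
class InvoiceTests(TestCase):
    def create_instance(self):
        return Invoice.objects.create(amount=100, currency='EUR')

    def test_amount(self):
        instance = self.create_instance()
        self.assertEqual(instance.amount, 100)

    def test_count(self):
        self.create_instance()
        self.assertEqual(Invoice.objects.count(), 1)
```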
At first glance this might look like going the wrong way: my test is now longer, and having to do additional work per test is definitely some cost. On the other hand, I gain:
- Readability: I have the whole data flow in one place, and with the right naming for helper methods like create_instance (this one is generic, but for non-generic initialization data I would choose a name that gives a hint what is special about the instance) I have more information in the test.
- Decoupling: in the first version, setUp controlled the data that went into instance. In the second, individual tests do that; if some of them need slightly different data I can easily add that by having create_instance allow overriding defaults, e.g. create_instance(some_field='override') (see the sketch after this list).
- Clarity: now I know for certain which tests really use this instance. It’s opt-in.
- Flexibility: now I can have a test that creates multiple instances.
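The overridable-defaults variant of create_instance could be as small as this (field names still made up):

```python
class InvoiceTests(TestCase):
    def create_instance(self, **overrides):
        kwargs = dict(amount=100, currency='EUR')
        kwargs.update(overrides)
        return Invoice.objects.create(**kwargs)

    def test_is_paid(self):
        instance = self.create_instance(paid=True)
        self.assertTrue(instance.paid)

    def test_count_across_currencies(self):
        self.create_instance()
        self.create_instance(currency='USD')
        self.assertEqual(Invoice.objects.count(), 2)
```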
To me it’s a win.
Least maintainer astonishment
The UX world has the Principle of Least Astonishment. It’s about understanding that users build expectations based on what the system is supposed to do, and on how it looks and behaves, and then trying to match those expectations.
I like that rule a lot, and try to apply it to programming: people reading my code will use patterns and conventions matching what I wrote to reason about it. I can try to work with that to make things easier to understand.
¹ In one of my previous lives I had to [briefly] deal with code created by the merger of three teams who used English, German, and French respectively for their code and comments. I don’t remember trying to make sense of that mess fondly. Stick to English, please.