Good Coding In Academia


Coding is an essential part of science for a lot of us: whether it is purely for automation purposes, or a data analysis tool (or even your text editor — LaTeX I see you), most of us have to code.

For the amount of programming we do, good practice in academia tends to be lacking. I see two main reasons for that:

  • Most of us are self-taught

  • In research, coding is just a means to an end: the science. In most people’s minds, less time spent polishing code means more time for science.

There are already plenty of courses and guides that can tell you about best coding practices, both in general and for your language of choice. But implementing them all may not be practical or even feasible in an academic environment.

It’s important to find a balance between cost and benefit, but the good news is that small changes in the way you code can have a tremendous impact on productivity and reproducibility.

My aim here is to provide you with Good Code Lite (TM) guidelines that are achievable and maintainable in the academic sphere. I have also split these recommendations into 3 tiers of effort, depending on the application: each new tier builds upon the previous ones.

1 - Unique Scripts

By this I mean code that has one application: for example making a plot or performing a specific data analysis task. This is code you may not intend to use again… but you might do (in my experience data analysis is an iterative process).

Good Names

Even for something small and simple that (you think) you’ll use only once, make sure you use proper variable names - you don’t have to ponder the best possible name, just use something meaningful so that reading and debugging your code doesn’t cause unnecessary headache.

A quick rule of thumb is that you should name your variables with nouns and your functions with verbs — because variables are things and functions do things.
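To make the rule of thumb concrete, here is a minimal Python sketch. The data and every name in it (f, select_above_threshold, temperatures_kelvin and so on) are made up for illustration. Both functions do exactly the same thing, but only the second one tells you what that is.

```python
# Hard to follow: what are f, a and b?
def f(a, b):
    return [x for x in a if x > b]


# Same logic, with a verb for the function and nouns for the variables.
def select_above_threshold(measurements, threshold):
    return [value for value in measurements if value > threshold]


# Made-up data, purely for illustration.
temperatures_kelvin = [271.3, 280.1, 265.7, 290.4]
freezing_point_kelvin = 273.15

temperatures_above_freezing = select_above_threshold(
    temperatures_kelvin, freezing_point_kelvin
)
print(temperatures_above_freezing)
```

Including the unit in the variable name (the _kelvin suffix) is an optional extra, but it saves a lot of guessing later.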

Comments

Using sensible variable names will drastically reduce your need for comments, but please add a bit of English that could help your future self.

Use comments to describe what your code is supposed to do - this is particularly important for big loops or big chains of calculations.
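As an illustration, here is a short chain of calculations in Python where each step starts with a comment saying what it is meant to achieve. The readings and the 50% cut are invented for this sketch and do not come from any real analysis.

```python
# Made-up detector readings; one of them is clearly an outlier.
readings = [10.2, 9.8, 10.5, 55.0, 10.1]

# Use the median as the reference value because it is robust to outliers.
# (Taking the middle element is enough here: the list has an odd length.)
sorted_readings = sorted(readings)
median_reading = sorted_readings[len(sorted_readings) // 2]

# Discard readings that are more than 50% away from the median.
# The 50% cut is an arbitrary choice for this illustration.
max_relative_deviation = 0.5
clean_readings = []
for reading in readings:
    relative_deviation = abs(reading - median_reading) / median_reading
    if relative_deviation <= max_relative_deviation:
        clean_readings.append(reading)

# Average what is left to get the final estimate.
mean_reading = sum(clean_readings) / len(clean_readings)
print(f"Estimate from {len(clean_readings)} readings: {mean_reading:.2f}")
```

Notice that the comments explain the intent (why the median, why the cut) rather than repeating what each line literally does.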

2 - Code you’ll use over and over

This is code you created to automate a task or build a general version of a tool you’re tired of coding over and over.

Docstrings

Docstrings are the backbone of code documentation. “Docstring” is a Python-specific term, but there are bound to be equivalents in whatever language you favour. It is defined as follows:

“A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.” —PEP 257

Essentially, they tell you what you need to know about a function (or class, etc.) and how to use it. Writing good docstrings can be time-consuming, but they are essential to usability. If you are going to re-use that function, it should have a docstring.
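For a sense of what this looks like in practice, here is a sketch of a small function with a docstring. It assumes the NumPy docstring layout (Parameters, Returns, Raises), which is only one of several common conventions, and convert_celsius_to_kelvin is a made-up example function.

```python
def convert_celsius_to_kelvin(temperature_celsius):
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature_celsius : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in kelvin.

    Raises
    ------
    ValueError
        If the temperature is below absolute zero (-273.15 degrees Celsius).
    """
    if temperature_celsius < -273.15:
        raise ValueError("Temperature is below absolute zero.")
    return temperature_celsius + 273.15


# The docstring is stored on the function as its __doc__ attribute.
print(convert_celsius_to_kelvin.__doc__)
```

Calling help(convert_celsius_to_kelvin) in an interactive session displays the same text, which is exactly what a future user (or future you) will be looking for.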

Don’t know where to start when writing docstrings? Take a look at this quick guide from DataCamp.

Version Control

In short, version control keeps track of changes in your code so that you can revert to previous iterations. Rather than keeping multiple copies of your code with different version or subversion IDs (e.g. “v1.py”, “v2.py”, etc.), you should learn to use tools like Git and GitHub.

The basics will take you 2 hours to learn, and they are really essential for tools you are going to re-use or build upon: you can save each version as you add new features and fix bugs.

To get started you can check out this tutorial and the online GitHub resources.

Unit Tests

A unit test is a snippet of code that tests one element of your code: whether it breaks, whether it does what it’s supposed to do, etc.

These tests, once written, can be run every time you make a change or add a feature to your code, to check you haven’t broken anything.
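As a minimal sketch of the idea, here is a tiny function with two tests: one checks that it returns the right answer, the other that it fails loudly on bad input. The function compute_mean and the test names are hypothetical. The tests are plain functions with assert statements, a style that a test runner such as pytest can pick up from a test file (pytest is my choice for illustration; you can also just call the functions yourself, as done at the bottom).

```python
def compute_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if len(values) == 0:
        raise ValueError("Cannot compute the mean of an empty list.")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # Does the function do what it is supposed to do?
    assert compute_mean([1.0, 2.0, 3.0]) == 2.0


def test_empty_input_raises_error():
    # Does the function break in the way we expect on bad input?
    try:
        compute_mean([])
    except ValueError:
        return  # Expected behaviour: the test passes.
    raise AssertionError("compute_mean([]) should raise ValueError")


# Run the tests by hand; each call raises an error if the test fails.
test_mean_of_known_values()
test_empty_input_raises_error()
print("All tests passed.")
```

Each test checks one behaviour only, so when something fails the name of the test already tells you what broke.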

Here is a really cool and pretty exhaustive tutorial to get started on code testing ;)

3 - Open Source Code

Have you got a great application or package that your community would benefit from? Making it open source is the next step up. Here are a few additional things you’ll need.

License

This is not just a recommendation.

Code is not open source if it doesn’t have an open source license. Just putting it online is not enough.

Learn more about open source licenses here.

Documentation

By this point you should already have some good docstrings. If not, you really need to tighten them up. If you are using Python, tools like Sphinx can automate creating documentation for your code by reading your docstrings.

There are a few other things you’ll need to add (I like to have them in the README of my GitHub):

  • Who is this code for

  • How to install it (including dependencies, especially if you’re not distributing it through a package index like PyPI)

  • Tutorials (if you have the time, it’s really best to have basic recipes for users).

Continuous Integration

Once you have a GitHub repository set up for your code and unit tests written, you can automate the testing process by having tools like Travis CI run your tests in different virtual machine configurations (e.g. different versions of NumPy) every time you push code to GitHub!


I hope you found this useful!

Happy Coding!
