Writing tests for a codebase that has none

June 09, 2021

/images/automation.jpg

In this post, I will share with you my decision making process when I took over a legacy code base with no tests, and how I started added tests to it.

Understanding the value and pitfalls of tests

It is easy to get into writing automated tests because there are so many benefits to the codebase and the team, but one conversation that is missing is choosing the type of test to write. There are unit tests and integration tests and it is important to know which one to write:

  • Unit tests: Small units of tests that test the inputs and outputs of a singular module or class. They are easy to write, easy to set up and run very fast.
  • Integration tests: Tests that test several parts together. In the context of web applications, this will often be your controller tests. They are more involved to set up, and often much slower.

Which should you then pick? Which has value?

Consider this: Often a legacy code base has a lot of old code that was probably written by people who have left the company or pieces of code that are terribly written. What will give you the most value now is the certainty and confidence that changes to code doesn't introduce regressions (i.e. break existing behaviour). On top of it, if it helps you identify and specify previously unknown edge cases, that will be even better.

For this example, let's have a UserController that accepts a some params and then creates database object and sends an email:

UserController
  - it calls database create_user function
  - it calls mailer function
  - it redirects to confirmation page
DatabaseConnection
  - it creates new user record
EntireMailer
  - it queues a 'new user created' email for sending

If you add more unit tests, what you are doing is testing parts that you may eventually be throwing away. On top of it, you will be introducing rigidity to your code because you can't restructure your code effectively. In the example above, DatabaseConnection class could be written using SQL statements directly instead of a more sensible ORM library. However you can't easily change the code without also changing the test specifications.

Rigidity is bad for a legacy code base you don't yet understand fully, as you are making it even harder for others to change code.

Let's see how we can make this an integration test:

UserController
  - it creates new user record
  - it queues a 'new user created' email for sending
  - it redirects to confirmation page

If you add more integration tests, you will be testing the system inputs and the resulting side effects / combined outputs. This helps you because each test is closer to how the actual features work.

Notice that there are now no longer dependent on DatabaseConnection or EntireMailer classes, so if you feel like these need refactoring you are free to make them. Maybe you decided to introduce a User class to make it easier to make database changes; or maybe extract the 'new user created' into a UserMailer class. You can introduce these classes and then swap them out in UserController without having to change anything in the test.

Early when you have no tests, you should aim to get your first integration tests up as these provide the most value. On top of it, the more involved set up forces you to understand the system deeper.

Write tests as you go along

I was tempted to write a lot of tests at the start but I quickly realise there is little value in that. The reason is because that whatever process that got the codebase to where it is today works. Tests aim to make your work in the future easier, but it doesn't necessarily make today any easier apart from wasting time that can be otherwise used to implement features.

The sensible thing to do is to write test for what you will be changing. The way to go about this is to do a characterisation of the existing behaviour. You do this by setting up the integration test and write the known expectations that you can observe from the code. Using the above example:

class UserController
  def create
    if params[:email].blank? || params[:password].blank?
      return render :new 
    end
    
    DatabaseConnection.new.create_user(params[:email], params[:password])
    EntireMailer.new_user_email(params[:email]).enqueue_mail
    
    redirect_to :confirmation
  end
end
UserController
- given email and password
  - it creates new user record
  - it queues a 'new user created' email for sending
  - it redirects to confirmation page
- given email and NO password
  - it renders the error
- given NO email and NO password
  - it renders the error

Once you have the tests for the above behaviours passing, you can proceed with regular test-driven development and add the specification for your new feature:

UserController
...
- given password too short # << newly added feature
  - it renders the error

There are several benefits to the controlled approach above:

  • a greater understanding for the area you are changing
  • focus writing tests at where it matters

Skip tests if the value is not justified

Of course you don't necessarily have to follow this approach. In my course of maintaining the legacy code base, there are many times I have chose to not write tests at all for some parts of the code.

In a codebase with no tests, it can be both tempting to write lots of tests or can even be confusing to decide where to start from. A discussion with the product owner and other developers will help you establish a better idea where to start from. Some of the factors that I take into account include:

  • ✅ very common code execution path
  • ✅ high business value
  • ✅ poor performance
  • ❌ slated for removal
  • ❌ the change is trivial enough
  • ❌ existing processes will not be improved by the addition of tests

Finally, remember to do housekeeping!

As you go about happily writing your tests, pushing new features and refactoring code, one key thing to do is to start 'pushing your tests down'.

This means to convert your integration tests into unit tests. This is often done when you are confident of the code quality and you wish to improve the test performance. Waiting 30 minutes for a test run can be really painful when you need fast test feedback.

A good time to do this is after refactoring a piece of legacy code. If you find yourself touching a piece of code functionality very often and the tests are too thick, consider converting them into unit tests for better performance.

Like my content? Subscribe to receive tips on software engineering.
© Alvin Ng