Speeding Up a Test Suite in Java

Keywords: #Java #Spring #Tests

I once had the chance to speed up a test suite spanning multiple Maven projects that took over 4 hours to run. The suite had around 8100 tests, many of which were integration tests using a Spring ApplicationContext. I managed to get this down to around 1 hour 20-30 minutes, until other priorities prevented further reductions.

What follows is how this was achieved, and what things to watch out for.

Using a single JVM for the tests in a project

The very first point is reusing forks in your maven-surefire-plugin and maven-failsafe-plugin configurations. This is the default, but many of the projects had <reuseForks>false</reuseForks>. Setting it to false means that a new JVM is forked for each test class. This makes it easier to keep tests isolated, since no JVM-global state (e.g. static variables) is carried over between classes. The downside is that this is costlier than using a single JVM for all tests, and Spring’s ApplicationContext caching cannot work, since each fork starts with an empty cache. This leads us to the next point.
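
For reference, a minimal sketch of a surefire configuration that keeps fork reuse enabled could look like this (the same options exist for maven-failsafe-plugin; the values shown are the plugin defaults):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <!-- fork a single JVM... -->
        <forkCount>1</forkCount>
        <!-- ...and reuse it for every test class (the default); setting this to
             false forks a fresh JVM per test class instead -->
        <reuseForks>true</reuseForks>
      </configuration>
    </plugin>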

Spring ApplicationContext caching

By default Spring caches each application context it needs to create and reuses it whenever possible. For two tests to use the same application context, they not only need to point to the same configuration (using the same locations or classes in @ContextConfiguration), but a number of other attributes must also match. For instance, if a test specifies a different initializer, then that implies using a different application context (for further details refer to MergedContextConfiguration, which is what Spring uses as the key for the map of application contexts it uses as a cache).

Thus, one should try to use the same application context for all tests in a project that run under the same JVM. Sometimes a different application context makes sense, but the fewer there are, the better, performance-wise.
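
One way to encourage this is to funnel all integration tests of a project through a single shared base class, so that they all resolve to the same MergedContextConfiguration. A minimal sketch, where SharedTestConfig, AbstractIntegrationTest and OrderServiceIT are all hypothetical names:

    import org.springframework.context.annotation.Configuration;
    import org.springframework.test.context.junit.jupiter.SpringJUnitConfig;

    // Hypothetical shared configuration: the single source of truth for the
    // test ApplicationContext of this project.
    @Configuration
    class SharedTestConfig {
        // ... bean definitions / component scanning used by the tests ...
    }

    // Every integration test extends this base class, so all of them resolve to
    // the same MergedContextConfiguration and share one cached ApplicationContext.
    @SpringJUnitConfig(SharedTestConfig.class)
    abstract class AbstractIntegrationTest {
    }

    // A concrete test just extends the base class instead of declaring its own,
    // slightly different @ContextConfiguration, which would force a second context.
    class OrderServiceIT extends AbstractIntegrationTest {
        // ... tests autowiring beans from the shared context ...
    }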

Watch out for JVM-global state

Once the tests run under the same JVM, global state matters more. As always, we need to minimize its usage, and keep it in mind to avoid flaky tests. An obvious example is System properties. Another example is EhCache when used as a singleton (e.g. with Hibernate’s SingletonEhCacheRegionFactory, or by calling CacheManager.create()).

Any JVM singleton carries issues, as changes to the state by one test can affect subsequent ones if we don’t properly clean up after it. By JVM singleton I mean a class using a standard implementation of the Singleton pattern, such as an enum (as recommended by Bloch in Effective Java) or a static field. Singleton Spring beans can still be a problem (if they have state), but are a bit less susceptible because different ApplicationContexts can have different instances, so at least tests running with different ApplicationContexts won’t affect each other through that bean.

Also, relying on such state complicates running tests in parallel: whether test A passes can then depend on whether some other test B that also modifies the same state happens to run at the same time.
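
For System properties, for instance, a minimal safeguard is to save whatever value a test touches and restore it afterwards. A sketch, where the property name is just an example:

    import org.junit.jupiter.api.AfterEach;
    import org.junit.jupiter.api.BeforeEach;
    import org.junit.jupiter.api.Test;

    class FeatureFlagTest {

        private String previousValue;

        @BeforeEach
        void saveAndSetProperty() {
            // Remember the previous value before changing JVM-global state.
            previousValue = System.getProperty("app.feature.enabled");
            System.setProperty("app.feature.enabled", "true");
        }

        @AfterEach
        void restoreProperty() {
            // Restore the original value so later tests in the same JVM are unaffected.
            if (previousValue == null) {
                System.clearProperty("app.feature.enabled");
            } else {
                System.setProperty("app.feature.enabled", previousValue);
            }
        }

        @Test
        void usesTheFlag() {
            // ... exercise code that reads the property ...
        }
    }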

Watch out for thread-bound values

When relying on a ThreadLocal value, you need to make sure your tests clear it after being done, and that tests that depend on it set the appropriate value at the beginning, even if this value is the default one (say, null). Both are solutions to the same issue: if each test cleans up after using the thread-bound resource, then later tests don’t need to set the default again, and if tests set the needed value at the start (even if it’s the default), then previous tests don’t need to clean up. Having both is mostly for better resilience, and setting the right value at the beginning (even if it’s the default) is being explicit about a test’s prerequisites, so it also works as documentation.
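
A minimal sketch of both safeguards, assuming a hypothetical CurrentTenant holder backed by a ThreadLocal:

    import org.junit.jupiter.api.AfterEach;
    import org.junit.jupiter.api.BeforeEach;
    import org.junit.jupiter.api.Test;

    // Hypothetical thread-bound holder.
    final class CurrentTenant {
        private static final ThreadLocal<String> TENANT = new ThreadLocal<>();
        static void set(String tenant) { TENANT.set(tenant); }
        static String get() { return TENANT.get(); }
        static void clear() { TENANT.remove(); }
    }

    class TenantAwareServiceTest {

        @BeforeEach
        void setKnownValue() {
            // Be explicit about the prerequisite, even if it is the default.
            CurrentTenant.set("test-tenant");
        }

        @AfterEach
        void clearThreadBoundValue() {
            // Clean up so tests that later run on this thread are not affected.
            CurrentTenant.clear();
        }

        @Test
        void usesTheTenant() {
            // ... code under test reads CurrentTenant.get() ...
        }
    }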

Reusing database between tests

Starting up a database is costly, even when using an in-memory one such as HSQLDB. Using a Docker instance of the same DBMS as in production with Testcontainers is better, and costlier still.
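
When Testcontainers is used, the startup cost can at least be paid only once per JVM with the singleton container pattern. A sketch, where the image tag and class name are illustrative:

    import org.testcontainers.containers.PostgreSQLContainer;

    // One database container per JVM, shared by every test class that extends
    // this base class, instead of starting a fresh container per class.
    public abstract class AbstractDatabaseTest {

        static final PostgreSQLContainer<?> POSTGRES =
                new PostgreSQLContainer<>("postgres:16");

        static {
            // Started once; Testcontainers removes the container when the JVM exits.
            POSTGRES.start();
        }

        // Tests (or e.g. a Spring DynamicPropertySource) can point the DataSource at
        // POSTGRES.getJdbcUrl(), POSTGRES.getUsername() and POSTGRES.getPassword().
    }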

It thus makes sense from a performance perspective to try to use a single database for all tests in a suite. This of course introduces another point through which different tests can affect each other. For example, if a test creates two instances of an entity in a database, and then asserts that there are actually two such instances in there, it can fail if some other test created another instance before. This means we cannot:

  1. Run assertions about the whole state of a database table (aggregations such as count, max or min can be problematic).
  2. Run statements that modify the whole state of a table, such as deleting all rows.

As a practical example of (2), when initializing the application context, we had some logic to create default entries in the database (if missing) which we could later expect to always be there (e.g. default currency, default language, etc.). A test that deleted all data from one of these tables ended up breaking subsequent tests that expected such defaults to be in the database but didn’t find them (because they reused the ApplicationContext, so the defaults weren’t re-created).

How should different tests use the same database, then? One option is expecting each test to clean up after itself. This can work, but is of course error-prone, as it’s easy for a developer to forget it (or not do it thoroughly enough) and end up causing hard-to-track flaky tests. The other side of the same coin is expecting each test to make sure the database is in the right state before running (e.g. by clearing the tables it needs), but this suffers from the same downsides.
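
A sketch of the clean-up-after-yourself style, assuming Spring’s JdbcTemplate is available in the test and using example table names:

    import org.junit.jupiter.api.AfterEach;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.jdbc.core.JdbcTemplate;

    abstract class CleansUpOrderDataTest {

        @Autowired
        JdbcTemplate jdbcTemplate;

        @AfterEach
        void cleanUpOwnData() {
            // Clear only the tables this test populates; tables holding defaults
            // created at context startup (currencies, languages, ...) are untouched.
            jdbcTemplate.update("DELETE FROM order_line");
            jdbcTemplate.update("DELETE FROM orders");
        }
    }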

Another frequently used approach is recreating the whole DB when a test starts and dropping it at the end. This can be easier to automate, and it avoids the previously mentioned issue. All three approaches, though, complicate parallelizing tests, since having two tests that need the same database table run in parallel can easily lead to race conditions.

A better option is randomizing test data. If each test creates fresh, unique entries in the database that are sufficiently random, then:

  1. Tests are less likely to affect each other. Of course the initial restrictions still apply: running aggregation functions on tables or performing statements that alter state that other tests need (such as deleting all entries in a table) are still problematic. Tests need to run assertions specific to the data they create, but this is seldom a problem.
  2. There is no need to clean up between tests. Tests can freely pollute the database, and some argue this is even more realistic, as it’s closer to how it would be in production (it’s artificial to always have a clean empty database for each use-case).
  3. Lastly, such tests are amenable to parallelization, because no clean up is performed, and because with sufficiently random data no unique constraint should be violated.

To support this, the creation of frequently used entities can be encapsulated in helper classes. JUnit 5 extensions can also reduce a lot of the boilerplate (so that registering an extension creates a set of randomized entities and their needed relations), and there are even libraries that can simplify this further (e.g. Instancio).
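
A minimal sketch of such a helper, where Customer and its fields are assumptions about the domain model:

    import java.util.UUID;
    import java.util.concurrent.ThreadLocalRandom;

    // Minimal stand-in for the project's entity (assumed).
    class Customer {
        String email;
        String name;
        int age;
    }

    // Every call produces a customer whose unique fields are randomized, so tests
    // can insert freely without colliding with data created by other tests or by
    // parallel runs.
    public final class TestCustomers {

        private TestCustomers() {
        }

        public static Customer randomCustomer() {
            String unique = UUID.randomUUID().toString();
            Customer customer = new Customer();
            customer.email = "customer-" + unique + "@example.test";
            customer.name = "Customer " + unique;
            customer.age = ThreadLocalRandom.current().nextInt(18, 90);
            return customer;
        }
    }

Tests then assert only on the entities they created (e.g. by looking up the customer by its random email) rather than on table-wide aggregates.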