Speeding Up a Test Suite in Java
I once had the chance to speed up a test suite spanning multiple Maven projects which took over 4 hours to run. The suite had around 8100 tests, many of which were integration tests using a Spring ApplicationContext. I managed to get this down to around 1 hour 20-30 minutes before other priorities prevented further reductions.
What follows is how this was achieved, and what to watch out for.
Using a single JVM for the tests in a project
The very first point is reusing forks in your maven-surefire-plugin and maven-failsafe-plugin configurations. This is the default, but many of the projects had <reuseForks>false</reuseForks>. Setting it to false means that a new JVM is spawned for each test class. This makes it easier to keep tests isolated, since no JVM-global state is preserved (e.g. static variables). The downside is that this is costlier than using a single JVM for all tests, and Spring's ApplicationContext caching cannot work. This leads us to the next point.
Spring ApplicationContext caching
By default Spring caches each application context it needs to create and reuses it whenever possible. For two tests to share the same application context they not only need to point to the same configuration (the same locations or classes in @ContextConfiguration), but a number of other things must match as well. For instance, if a test specifies a different initializer, that implies a different application context (for further details refer to MergedContextConfiguration, which is what Spring uses as the key for the map of application contexts it keeps as a cache).
Thus, one should try to use the same application context for all tests in a project that run under the same JVM. Sometimes a different application context makes sense, but the fewer there are, the better, performance-wise.
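As a minimal sketch of what "same configuration" means in practice (AppConfig, OrderService and InvoiceService are hypothetical names), two test classes that declare exactly the same configuration classes will share one cached ApplicationContext when they run in the same JVM:

```java
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit.jupiter.SpringExtension;

// Both classes declare the same configuration classes (and nothing else that
// affects the MergedContextConfiguration), so the second test class reuses the
// ApplicationContext created and cached for the first one.
@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = AppConfig.class) // AppConfig is hypothetical
class OrderServiceIT {

    @Autowired
    private OrderService orderService; // hypothetical bean

    @Test
    void createsOrder() {
        // ... exercise orderService against the shared context ...
    }
}

@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = AppConfig.class) // same cache key as above
class InvoiceServiceIT {

    @Autowired
    private InvoiceService invoiceService; // hypothetical bean

    @Test
    void createsInvoice() {
        // ... exercise invoiceService against the same cached context ...
    }
}
```

Anything that changes the merged configuration (an extra active profile, different property overrides, a different initializer) results in a different cache key, and @DirtiesContext evicts the cached context altogether, so it pays to standardize on as few test configurations as possible.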
Watch out for JVM-global state
Once the tests run under the same JVM, global state matters more. As always, we need to minimize its usage, and keep it in mind to avoid flaky tests. An obvious example is system properties. Another is EhCache, which supports being used as a singleton (e.g. with Hibernate's SingletonEhCacheRegionFactory, or by calling CacheManager.create()).
Any JVM singleton carries issues, as changes to its state by one test can affect subsequent ones if we don't properly clean up afterwards. By JVM singleton I mean a class using a standard implementation of the Singleton pattern, such as an enum (as recommended by Bloch in Effective Java) or a static field. Singleton Spring beans can still be a problem (if they have state), but they are a bit less susceptible, because different ApplicationContexts can have different instances, so at least tests running with different ApplicationContexts won't affect each other through that bean.
Relying on such state also complicates running tests in parallel: whether test A passes can then depend on whether some other test B that modifies the same state happens to run at the same time.
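As a sketch of the cleanup this requires (the property name is made up for illustration), a test that flips a system property should restore whatever value was there before:

```java
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;

class DateFormatFeatureTest {

    // Hypothetical property name, used only to illustrate the cleanup pattern.
    private static final String FLAG = "feature.newDateFormat";

    private String previousValue;

    @Test
    void usesNewDateFormatWhenFlagIsSet() {
        previousValue = System.getProperty(FLAG);
        System.setProperty(FLAG, "true");
        // ... exercise code that reads the flag ...
    }

    @AfterEach
    void restoreGlobalState() {
        // The property is JVM-global: restore it so tests running later in the
        // same fork don't silently inherit this test's configuration.
        if (previousValue == null) {
            System.clearProperty(FLAG);
        } else {
            System.setProperty(FLAG, previousValue);
        }
    }
}
```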
Watch out for thread-bound values
When relying on a ThreadLocal value, you need to make sure your tests clear it after being done, and tests that depend on it should set the appropriate value at the beginning, even if this value is the default one (say, null). Both are solutions to the same issue: if each test cleans up after using the thread-bound resource, then later tests don't need to set the default again; and if every test sets the value it needs at the start (even if it's the default), then previous tests don't need to clean up. Having both is mostly for resilience, and setting the right value at the beginning (even if it's the default) makes a test's prerequisites explicit, so it also works as documentation.
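A minimal sketch of both sides of that contract, using a hypothetical CurrentTenant holder backed by a ThreadLocal:

```java
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class TenantAwareServiceTest {

    @BeforeEach
    void setUp() {
        // Be explicit about the prerequisite, even if it happens to be the default.
        CurrentTenant.set("tenant-a");
    }

    @AfterEach
    void tearDown() {
        // Clean up so later tests running on the same (pooled) thread
        // don't inherit this value.
        CurrentTenant.clear();
    }

    @Test
    void readsTenantBoundToCurrentThread() {
        // ... exercise code that calls CurrentTenant.get() ...
    }
}

// Minimal sketch of a thread-bound holder (hypothetical).
final class CurrentTenant {
    private static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    static void set(String tenant) { TENANT.set(tenant); }
    static String get() { return TENANT.get(); }
    static void clear() { TENANT.remove(); }

    private CurrentTenant() { }
}
```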
Reusing database between tests
Starting up a database is costly, even when using an in-memory one such as HSQLDB. Using a Docker instance of the same DBMS as in production via Testcontainers is better, and costlier still.
It thus makes sense from a performance perspective to try to use a single database for all tests in a suite. This of course introduces another point through which different tests can affect each other. For example, if a test creates two instances of an entity in the database and then asserts that there are exactly two such instances in there, it can fail if some other test created another instance before. This means we cannot:
1. Run assertions about the whole state of a database table (aggregations such as count, max or min can be problematic).
2. Run statements that modify the whole state of a table, such as deleting all rows.
As a practical example of (2): when initializing the application context, we had some logic to create default entries in the database (if missing) which we could later expect to always be there (e.g. default currency, default language, etc.). A test that deleted all data from one of these tables ended up corrupting subsequent tests, which expected those defaults to be in the database but didn't find them (because the ApplicationContext was reused, the defaults weren't re-created).
How should different tests share the same database then? One option is expecting each test to clean up after itself. This can work, but it is of course error-prone: it's easy for a developer to forget (or not do it thoroughly enough) and end up causing hard-to-track flaky tests. The other side of the same coin is expecting each test to put the database in the right state before running (i.e. by clearing the tables it needs), but this suffers from the same downsides.
Another frequently used approach is recreating the whole database when a test starts and dropping it at the end. This is easier to automate and avoids the previously mentioned issue. All three approaches, though, complicate parallelizing tests, since two tests that need the same database table running in parallel can easily lead to race conditions.
A better option is randomizing test data. By creating fresh, unique entries in the database that are sufficiently random:
- Tests are less likely to affect each other. Of course the initial restrictions still apply: running aggregation functions on tables or performing statements that alter state that other tests need (such as deleting all entries in a table) are still problematic. Tests need to run assertions specific to the data they create, but this is seldom a problem.
- There is no need to clean up between tests. Tests can freely pollute the database, and some argue this is even more realistic, as it’s closer to how it would be in production (it’s artificial to always have a clean empty database for each use-case).
- Lastly, they are amenable to parallelization, because no clean up is performed, and because with sufficiently random data no unique constraint should be violated.
To support this, the creation of frequently used entities can be encapsulated in helper classes. JUnit 5 extensions can also remove a lot of the boilerplate (so that registering an extension creates a bunch of randomized entities and their required relations), and there are even libraries that can simplify this further (e.g. Instancio).
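As a sketch of such a helper (the Customer entity and its setters are hypothetical), every call produces a fresh, unique entity, so tests neither collide on unique constraints nor depend on rows created elsewhere:

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: each call builds a customer with random, unique values,
// so tests can insert freely without cleaning up or clashing with each other.
final class TestCustomers {

    static Customer randomCustomer() {
        String suffix = UUID.randomUUID().toString();
        Customer customer = new Customer();
        customer.setName("Customer " + suffix);
        customer.setEmail("customer-" + suffix + "@example.test"); // unique per call
        customer.setAge(ThreadLocalRandom.current().nextInt(18, 100));
        return customer;
    }

    private TestCustomers() { }
}
```

A test then saves the customer it gets from the helper and asserts only on that row (e.g. looking it up by its random email), rather than counting rows in the whole table.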