Stories About Flaky Tests
While working on a legacy codebase, I had the chance to track down several random test failures: tests that worked or not depending on the time of the day, or depending on the order the test classes were running, etc.
Following are some such stories.
Test working in the IDE but failing with maven
A test was failing as part of the Jenkins pipeline, but working fine locally from my IDE. Even reproducing the execution order of the tests as a test suite wouldn’t trigger the problem. Running the tests with maven though did cause it to fail.
Unfortunately the assertion’s error message wasn’t good, but improving it was enough to pinpoint the issue. After switching the assertion to use AssertJ, the error was something as follows:
Expecting map: {"$jacocoData"=[true, false, true, false, true, false, true], "field1"="value1", "field2"=123, "field3"=true} to contain only: ["field1"="value1", "field3"=true, "field2"=123] but the following map entries were unexpected: ["$jacocoData"=[true, false, true, false, true, false, true]]
We had recently added Jacoco to our build for code coverage. The problem was that we were using reflection to get the declared fields of a class, and when running the tests with Jacoco for code coverage, Jacoco adds a synthetic field to the class named $jacocoData, which was making the assertion fail. The solution was simply ignoring synthetic fields, which is in any case good practice.
Tests failing depending on execution speed
Some tests were asserting that the time stored in an entity was strictly before the time when the assertion ran. When the test was fast enough, the time when the entity was created and the time when the assertion ran didn’t differ, causing the test to fail. Debugging it would cause it to pass, as running it with a debugger caused enough of a slowdown that the times always differed.
This issue is trivially noticed and fixed with the right assertion messages. If doing, e.g.:
assertTrue(LocalDateTime.now().isAfter(entity.getCreationDateTime()));
having the assertion fail doesn’t give us a meaningful message. Using AssertJ with the right methods provides a lot more information. Fixing the assertion to consider equality:
assertThat(entity.getCreationDateTime())
.as("creationDateTime")
.isBeforeOrEqualTo(now);
Tests failing due to EhCache’s singleton CacheManager
For a long time we sometimes had a cascade of failing tests saying that a Cache
had been shut down. We noticed that it seemed to be that after some test failed,
something was left in an invalid state and affected subsequent tests. In the
end, what was happening was that the code was using EhCache as a singleton, and
when there was an issue initializing an ApplicationContext, Spring was calling
destroy()
on the CacheManager
. This CacheManager
, being a singleton for
the whole JVM, was reused by any other tests needing it, but had already been
shutdown previously. The final fix was changing the code to use separate
CacheManager
instances per application context.
Somewhat related, there was a test calling
CacheManager cacheManger = CacheManager.create();
cacheManger.removeAllCaches();
CacheManager.create()
reuses the singleton CacheManager
if there is one
(otherwise it creates it), so it was effectively removing all caches for
subsequent tests, also causing issues.
Tests failing depending on the date or time of execution
Dates and time are full of complexity and provide a myriad of pitfalls. The options usually are two:
- Either making it so that the test can always use the current time and work anyway, or
- Use a fixed time by having a
java.time.Clock
injected wherever time is needed, and using a fixedClock
for the test.
Test failing the 29th of February
We had a test for a method that given a birthday date, would return when the birthday should be on the current year. The test was using the current time and shifting it a year in the past to construct a date, something as follows:
final Date lastYear = DateUtils.addYears(now, -1);
assertEquals(now, DateTools.getBirthdayDate(lastYear));
When running on the 29th of February, the previous year had no 29th of February (since it’s not a leap year); the assertion would then fail because it expected the birthday date to be today, 29th, but was the 28th.
Test failing near midnight
Part of the code was using java’s legacy Date
class, and to better support
caching, it was rounding the Date
to the next hour. The idea was that calling
a service method with today at 16:03:00, or at 16:03:01, or at 16:59, would all
hit the same cache entry, instead of each one creating new cache entries but
always missing them.
The test was assigning permissions to a user for some action starting tomorrow,
and at some point asserting that the user did not have permissions for that
action now (today). However, the code checking these permissions was using
this rounding logic, so that asking between 23:00 and midnight would return
true
(i.e. that the persons did in fact have permissions today at the time of
running), because when rounding to the next hour, it would end up asking about
permissions on the following day.
Test failing only between 00:00 and 02:00
Some tests had been sporadically failing for years without anyone knowing why. Behind not understanding hides a teaching, so I spent some time to figure it out. I noticed the failures only happened between 00:00 and 02:00, and could reproduce this locally by changing the system’s clock to a time inside that range. Having a reproduction, the rest was a matter of time.
I’ll focus on one test, as the others were variations of the same issue. The test started with creating an entity and asserting that it could be found. This failed between 00:00 and 02:00.
The problem was that the entity had a validity (two Date
properties specifying
from when until when that entity was valid), and the service method being used
to find the entity was taking validity into account.
When creating the entity, it would have as validFrom
the date when the test
was running at start of the day, say 2021-09-19 00:00:00.0
. For the HQL query,
the problematic part of the where clause was:
validFrom <= current_timestamp()
at first sight, this looks fine. What’s the problem? The test was using HSQLDB,
and Hibernate translates current_timestamp()
to HSQLDB’s current_timestamp
,
which includes a time zone. Running the test in Vienna with time zone UTC+2
meant the condition would look something as follows:
2021-09-19 00:00:00.0 <= 2021-09-19 00:13:39.611+02
HSQLDB would then take the time zone into account, and use UTC for the
validFrom
date. Shifting to UTC, the condition would look like:
2021-09-19 00:00:00.0 <= 2021-09-18 22:13:39.611
which makes the issue more obvious. The condition was thus true between 02:00 and midnight, making the test normally pass, and false between 00:00 and 02:00, making it fail during that time.
The solution was using current_date()
which returns a value of type DATE, i.e.
does not include time zone, but read the next one about current_date()
!
HQL Query Parameter vs. current_date()
After some code changes a test started failing. The test was creating an entity
with a validity: two Date
properties called validFrom
and validTo
, where
the first one was “now”, and the second one “infinity” (some big sentinel value
for the date). At some point we were querying for this entity, and the result
set was coming empty. Eventually I narrowed down the problem to a change in the
HQL query. I had replaced a Date
parameter by current_date()
, i.e. the
query used to be something like:
session.createQuery(
"select ... from ... "
+ "where :date between validFrom and validTo")
.setParameter("date", new Date())
.list();
and I replaced the where condition by current_date() between validFrom and validTo
.
After some time, I found out that the time reported by the dockerized SQL
Server DB (using testcontainers) was an hour in the past. This was because by
default the docker container was using UTC as timezone, but I was running the
test in Vienna, which had UTC+1 at the time. I fixed this, so that the timezone
of the container matched the one used by the JVM (using the TZ
environment
variable).
I thought I was done, but then the test started randomly failing. When debugging the test though, it always passed (a so-called “Heisenbug”).
Eventually, I noticed that the time reported by SQL Server was some milliseconds
behind the time reported by Java. Hibernate translates current_date()
to SQL
Server’s GetDate()
, which includes the time apart from the date. Thus, when
the test executed fast enough, sometimes the date-time returned by using
current_date()
was slightly in the past compared to the time as seen in Java,
making the query fail (because the validFrom
of the created entity was some
milliseconds after the current_date()
time when executing the query). By then
I had spent enough figuring out the issue, that I couldn’t spend more figuring
out why the millisecond mismatch. I simply changed the query to use a parameter,
in that way the time was sure to be right.
Modifying in-memory state
Sometimes a test would change a setting in a Spring singleton (say, changing a flag to enable a feature), and wasn’t resetting it to its default. Then another test class using the same application context would run, and implicitly rely on the default value being used, failing.
A variation of this was a test modifying a ThreadLocal
variable and not
resetting it, making a later test fail because it was implicitly relying on such
variable being null when it wasn’t.
Using a static reference of the ApplicationContext
We had a static reference to the ApplicationContext
, which was meant to be
used in classes that couldn’t be Spring beans, that is classes where we couldn’t
rely on Spring wiring the needed dependencies 1 (think for instance a Log4J2
plugin such as a custom appender). Unfortunately, throughout the years this was
misused, used in places where it shouldn’t have been. Then weird things would
happen with tests. A typical example would be 3 tests running as follows:
- Test1, using ApplicationContext A.
- Test2, using ApplicationContext B.
- Test3, using ApplicationContext A.
When Test1 initialized the application context, it would set the static reference; then Test2 would do the same. Test3 wouldn’t, though, because it didn’t initialize the application context, it reused the one that had already been initialized before (because Spring caches it). Then Test3 would try to use a bean from a different application context, leading to errors.
Even when later this was changed so that each test would reset the static
reference, it was not enough, as some classes were getting a dependency from
the application context and caching it, or storing it in a static final
field,
which would still lead to a test using a bean from the wrong application
context.
All this is of course avoided by following standard dependency injection recommendations.
MetadataExtractorIntegrator
In one of his articles, Vlad Mihalcea shows how to get access to database metadata
in Hibernate 5. The idea is registering a Hibernate Integrator
and reading the
SessionFactory
's metadata. In that article the class is a singleton, keeping
the sole instance in a static final
field.
As described, that works fine in production, but can cause issues during
testing. Different application contexts can have different Hibernate entities
(depending on which entities are registered while building the
SessionFactory
). What I saw happening was again a problem with test running
order and Spring’s application context caching. Again, given:
- Test1, using ApplicationContext A.
- Test2, using ApplicationContext B.
- Test3, using ApplicationContext A.
If ApplicationContext B had a smaller set of entities, then Test3 could fail because:
- Test1 runs, intializes ApplicationContext A, and the
MetadataExtractorIntegrator extracts the metadata and keeps it in its
INSTANCE
field. - Test2 runs, intializes ApplicationContext B, and the
MetadataExtractorIntegrator extracts the metadata and overwrites the one kept
on its
INSTANCE
field. - Test3 runs, using the cached ApplicationContext A, with the metadata from
ApplicationContext B. At some point the test requests metadata for an entity
for which we have no data, since the
SessionFactory
in ApplicationContext B isn’t aware of it.
The fix for this was having a singleton Spring bean which stored the metadata
read by the MetadataExtractorIntegrator
, and having the rest of the code use
that bean instead of the MetadataExtractorIntegrator
INSTANCE
. In that way,
each application context had access to the right metadata.
Essentially a mixture of Ambient Context and Service Locator anti-patterns, as defined in “Dependency Injection: Principles, Practices and Patterns”, by Steven van Deursen and Mark Seemann. ↩︎