Stories About Flaky Tests

Keywords: #Java #Tests

While working on a legacy codebase, I had the chance to track down several random test failures: tests that worked or not depending on the time of the day, or depending on the order the test classes were running, etc.

Following are some such stories.

Test working in the IDE but failing with maven

A test was failing as part of the Jenkins pipeline, but working fine locally from my IDE. Even reproducing the execution order of the tests as a test suite wouldn’t trigger the problem. Running the tests with maven though did cause it to fail.

Unfortunately the assertion’s error message wasn’t good, but improving it was enough to pinpoint the issue. After switching the assertion to use AssertJ, the error was something as follows:

    Expecting map:
      {"$jacocoData"=[true, false, true, false, true, false, true], "field1"="value1", "field2"=123, "field3"=true}
    to contain only:
      ["field1"="value1", "field3"=true, "field2"=123]
    but the following map entries were unexpected:
      ["$jacocoData"=[true, false, true, false, true, false, true]]

We had recently added Jacoco to our build for code coverage. The problem was that we were using reflection to get the declared fields of a class, and when running the tests with Jacoco for code coverage, Jacoco adds a synthetic field to the class named $jacocoData, which was making the assertion fail. The solution was simply ignoring synthetic fields, which is in any case good practice.

Tests failing depending on execution speed

Some tests were asserting that the time stored in an entity was strictly before the time when the assertion ran. When the test was fast enough, the time when the entity was created and the time when the assertion ran didn’t differ, causing the test to fail. Debugging it would cause it to pass, as running it with a debugger caused enough of a slowdown that the times always differed.

This issue is trivially noticed and fixed with the right assertion messages. If doing, e.g.:

assertTrue(LocalDateTime.now().isAfter(entity.getCreationDateTime()));

having the assertion fail doesn’t give us a meaningful message. Using AssertJ with the right methods provides a lot more information. Fixing the assertion to consider equality:

assertThat(entity.getCreationDateTime())
    .as("creationDateTime")
    .isBeforeOrEqualTo(now);

Tests failing due to EhCache’s singleton CacheManager

For a long time we sometimes had a cascade of failing tests saying that a Cache had been shut down. We noticed that it seemed to be that after some test failed, something was left in an invalid state and affected subsequent tests. In the end, what was happening was that the code was using EhCache as a singleton, and when there was an issue initializing an ApplicationContext, Spring was calling destroy() on the CacheManager. This CacheManager, being a singleton for the whole JVM, was reused by any other tests needing it, but had already been shutdown previously. The final fix was changing the code to use separate CacheManager instances per application context.

Somewhat related, there was a test calling

CacheManager cacheManger = CacheManager.create();
cacheManger.removeAllCaches();

CacheManager.create() reuses the singleton CacheManager if there is one (otherwise it creates it), so it was effectively removing all caches for subsequent tests, also causing issues.

Tests failing depending on the date or time of execution

Dates and time are full of complexity and provide a myriad of pitfalls. The options usually are two:

  1. Either making it so that the test can always use the current time and work anyway, or
  2. Use a fixed time by having a java.time.Clock injected wherever time is needed, and using a fixed Clock for the test.

Test failing the 29th of February

We had a test for a method that given a birthday date, would return when the birthday should be on the current year. The test was using the current time and shifting it a year in the past to construct a date, something as follows:

final Date lastYear = DateUtils.addYears(now, -1);
assertEquals(now, DateTools.getBirthdayDate(lastYear));

When running on the 29th of February, the previous year had no 29th of February (since it’s not a leap year); the assertion would then fail because it expected the birthday date to be today, 29th, but was the 28th.

Test failing near midnight

Part of the code was using java’s legacy Date class, and to better support caching, it was rounding the Date to the next hour. The idea was that calling a service method with today at 16:03:00, or at 16:03:01, or at 16:59, would all hit the same cache entry, instead of each one creating new cache entries but always missing them.

The test was assigning permissions to a user for some action starting tomorrow, and at some point asserting that the user did not have permissions for that action now (today). However, the code checking these permissions was using this rounding logic, so that asking between 23:00 and midnight would return true (i.e. that the persons did in fact have permissions today at the time of running), because when rounding to the next hour, it would end up asking about permissions on the following day.

Test failing only between 00:00 and 02:00

Some tests had been sporadically failing for years without anyone knowing why. Behind not understanding hides a teaching, so I spent some time to figure it out. I noticed the failures only happened between 00:00 and 02:00, and could reproduce this locally by changing the system’s clock to a time inside that range. Having a reproduction, the rest was a matter of time.

I’ll focus on one test, as the others were variations of the same issue. The test started with creating an entity and asserting that it could be found. This failed between 00:00 and 02:00.

The problem was that the entity had a validity (two Date properties specifying from when until when that entity was valid), and the service method being used to find the entity was taking validity into account.

When creating the entity, it would have as validFrom the date when the test was running at start of the day, say 2021-09-19 00:00:00.0. For the HQL query, the problematic part of the where clause was:

    validFrom <= current_timestamp()

at first sight, this looks fine. What’s the problem? The test was using HSQLDB, and Hibernate translates current_timestamp() to HSQLDB’s current_timestamp, which includes a time zone. Running the test in Vienna with time zone UTC+2 meant the condition would look something as follows:

    2021-09-19 00:00:00.0 <= 2021-09-19 00:13:39.611+02

HSQLDB would then take the time zone into account, and use UTC for the validFrom date. Shifting to UTC, the condition would look like:

    2021-09-19 00:00:00.0 <= 2021-09-18 22:13:39.611

which makes the issue more obvious. The condition was thus true between 02:00 and midnight, making the test normally pass, and false between 00:00 and 02:00, making it fail during that time.

The solution was using current_date() which returns a value of type DATE, i.e. does not include time zone, but read the next one about current_date()!

HQL Query Parameter vs. current_date()

After some code changes a test started failing. The test was creating an entity with a validity: two Date properties called validFrom and validTo, where the first one was “now”, and the second one “infinity” (some big sentinel value for the date). At some point we were querying for this entity, and the result set was coming empty. Eventually I narrowed down the problem to a change in the HQL query. I had replaced a Date parameter by current_date(), i.e. the query used to be something like:

session.createQuery(
    "select ... from ... "
    + "where :date between validFrom and validTo")
.setParameter("date", new Date())
.list();

and I replaced the where condition by current_date() between validFrom and validTo.

After some time, I found out that the time reported by the dockerized SQL Server DB (using testcontainers) was an hour in the past. This was because by default the docker container was using UTC as timezone, but I was running the test in Vienna, which had UTC+1 at the time. I fixed this, so that the timezone of the container matched the one used by the JVM (using the TZ environment variable).

I thought I was done, but then the test started randomly failing. When debugging the test though, it always passed (a so-called “Heisenbug”).

Eventually, I noticed that the time reported by SQL Server was some milliseconds behind the time reported by Java. Hibernate translates current_date() to SQL Server’s GetDate(), which includes the time apart from the date. Thus, when the test executed fast enough, sometimes the date-time returned by using current_date() was slightly in the past compared to the time as seen in Java, making the query fail (because the validFrom of the created entity was some milliseconds after the current_date() time when executing the query). By then I had spent enough figuring out the issue, that I couldn’t spend more figuring out why the millisecond mismatch. I simply changed the query to use a parameter, in that way the time was sure to be right.

Modifying in-memory state

Sometimes a test would change a setting in a Spring singleton (say, changing a flag to enable a feature), and wasn’t resetting it to its default. Then another test class using the same application context would run, and implicitly rely on the default value being used, failing.

A variation of this was a test modifying a ThreadLocal variable and not resetting it, making a later test fail because it was implicitly relying on such variable being null when it wasn’t.

Using a static reference of the ApplicationContext

We had a static reference to the ApplicationContext, which was meant to be used in classes that couldn’t be Spring beans, that is classes where we couldn’t rely on Spring wiring the needed dependencies 1 (think for instance a Log4J2 plugin such as a custom appender). Unfortunately, throughout the years this was misused, used in places where it shouldn’t have been. Then weird things would happen with tests. A typical example would be 3 tests running as follows:

  1. Test1, using ApplicationContext A.
  2. Test2, using ApplicationContext B.
  3. Test3, using ApplicationContext A.

When Test1 initialized the application context, it would set the static reference; then Test2 would do the same. Test3 wouldn’t, though, because it didn’t initialize the application context, it reused the one that had already been initialized before (because Spring caches it). Then Test3 would try to use a bean from a different application context, leading to errors.

Even when later this was changed so that each test would reset the static reference, it was not enough, as some classes were getting a dependency from the application context and caching it, or storing it in a static final field, which would still lead to a test using a bean from the wrong application context.

All this is of course avoided by following standard dependency injection recommendations.

MetadataExtractorIntegrator

In one of his articles, Vlad Mihalcea shows how to get access to database metadata in Hibernate 5. The idea is registering a Hibernate Integrator and reading the SessionFactory's metadata. In that article the class is a singleton, keeping the sole instance in a static final field.

As described, that works fine in production, but can cause issues during testing. Different application contexts can have different Hibernate entities (depending on which entities are registered while building the SessionFactory). What I saw happening was again a problem with test running order and Spring’s application context caching. Again, given:

  1. Test1, using ApplicationContext A.
  2. Test2, using ApplicationContext B.
  3. Test3, using ApplicationContext A.

If ApplicationContext B had a smaller set of entities, then Test3 could fail because:

  1. Test1 runs, intializes ApplicationContext A, and the MetadataExtractorIntegrator extracts the metadata and keeps it in its INSTANCE field.
  2. Test2 runs, intializes ApplicationContext B, and the MetadataExtractorIntegrator extracts the metadata and overwrites the one kept on its INSTANCE field.
  3. Test3 runs, using the cached ApplicationContext A, with the metadata from ApplicationContext B. At some point the test requests metadata for an entity for which we have no data, since the SessionFactory in ApplicationContext B isn’t aware of it.

The fix for this was having a singleton Spring bean which stored the metadata read by the MetadataExtractorIntegrator, and having the rest of the code use that bean instead of the MetadataExtractorIntegrator INSTANCE. In that way, each application context had access to the right metadata.


  1. Essentially a mixture of Ambient Context and Service Locator anti-patterns, as defined in “Dependency Injection: Principles, Practices and Patterns”, by Steven van Deursen and Mark Seemann. ↩︎