Oct 12, 2023

Finding and Fixing Jest Phantom Failures

My journey of debugging a phantom Jest test failure in my NX Angular Monorepo

Recently, I came across a problem in my Angular application tests that I had never run into before. Some previously written tests were failing in CI, but the test output didn’t tell me why! I had a phantom failure somewhere, haunting my tests. ’Tis the season, after all. This is a developer’s worst nightmare: a failure with no error message. I couldn’t just ignore it, though; it was failing my CI! I took a few steps to find the culprit. In this article, I’ll walk you through my debugging process for identifying flaky, unexplained failures in Jest unit tests.

TL;DR

- Asynchronous code can cause tests to fail outside of the normal Jest process if it isn’t handled properly
- Running Jest in a single process can improve performance in CI and reduce complications
- You can detect leaked handles with the leaked-handles package

What Was Failing?

My project is a monorepo built with NX. I have multiple Angular applications being tested with Jest. In my continuous integration (CI) job, I run nx affected to run only the tests affected by my code changes. This allows for faster, more efficient CI runs that test only what is required. Here is a sample of the test output I got:

I ran my tests, and everything appeared to pass. However, when my job finished, NX told me that one of my projects, my-app1-settings-feature-lib:test, had failed. Looking at the output, though, I found no reason for the failure, not even the test summary that the other projects printed. Subsequent runs showed that sometimes the tests passed and sometimes they failed, but I never got any output telling me what was failing. A hidden, flaky failure, and no idea where to look?

So I started debugging this mystery.

Debugging Steps

It may seem odd to debug your tests, but the tests could be revealing a real problem in the code under test, so you shouldn’t ignore failures like this. I took a few steps to find the culprit, ranging from small, easy checks to more involved methods. You may not need all of them to find your phantom failure, so try them in this order and you may find your failing test faster than I did.

Run In Band

The first thing I did was try to force my tests to run in a single process. By default, Jest parallelizes tests across multiple workers, and I wanted to rule out the possibility that my test output was being lost to a terminated worker process. I added the --runInBand flag to my NX test command. According to the Jest CLI documentation, this flag “runs all tests serially in the current process, rather than creating a worker pool of child processes that run tests.” --runInBand is equivalent to running --maxWorkers=1 (also covered in the Jest CLI docs), so you can use whichever flag you prefer.
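
If you would rather not pass the flag on every run, Jest’s maxWorkers configuration option gives you the same single-worker behavior from a config file. Here is a minimal sketch of a jest.config.ts. It assumes you merge the option into the project’s existing Jest config rather than replacing it, and it leaves out any presets or other options from an Nx setup:

```typescript
// jest.config.ts: a minimal sketch showing only the option discussed above.
// In an Nx project, merge this into the existing Jest config instead of
// replacing it.
const config = {
  // Allow a single worker so test files run one after another, the config
  // counterpart of passing --maxWorkers=1 on the command line.
  maxWorkers: 1,
};

export default config;
```

A config file applies to every run that uses it, including local ones. The command-line flag only affects the runs you pass it to.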

NX actually advises using this flag when in a CI environment to improve performance. So I updated my nx affected command like so:

Unfortunately, this did not solve my issue, but I did see a marginal improvement in my CI execution time, so I kept it! Let’s try something else.

Leaked Handles?

Running in a single process did not identify my phantom failure, so there must be something going on inside a single process that is not being caught and displayed by Jest. Jest isn’t displaying the error, but the error is causing a test to fail.

I switched to my local machine and ran the entire test suite with --runInBand enabled. It’s quicker to iterate locally than in CI, and I wanted to see whether I could reproduce the failure in another environment. I saved the console output to examine it in detail. When I ran locally without nx affected, I noticed a new error in many of my test suites that I hadn’t seen in CI.

A leaked handle? That’s not good. Jest doesn’t tell me which tests have leaked handles, only that I’ve got some. So I ran the test suite for the NX project in question with the flag Jest suggested, --detectOpenHandles.

This… did nothing. I got no helpful output detecting any open handles. What’s worse, the previous message telling me I had open handles didn’t display this time. So I got less information than in my previous runs.

The Silver Bullet

I scoured the internet for some way to have these leaked handles shown to me. There were far too many tests in this project to read through them all looking for the culprit. Then I found the silver bullet: a package called leaked-handles, which states that it will “detect any handles leaked in Node.”

I installed the package and got to work. Following the leaked-handles documentation, I went through the project one spec file at a time: I imported leaked-handles at the top of the spec file, ran the suite, and checked the output.

At first, I got nothing. No errors, no optimizations. And then, I found it.

“Let’s see who this really is…”

"And I would have gotten away with it, too, if it weren't for leaked handles."

It’s Asynchronous Code!

When I ran this test with leaked-handles active, the test passed in Jest, but leaked-handles showed that my assertions had actually failed with an “undefined access” error. The async code, where I subscribed to the result of an RxJS Observable, was executing after Jest considered my it block finished, and the subscription was never cleaned up. My assertions ran too late to count toward the test result, and the open subscription from that call to .subscribe was the leaked handle that leaked-handles detected. I added a done() callback (see Jest’s docs on testing asynchronous code for more detail) so that the it block would not finish until done() was called, and the test passed.
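
The sketch below makes this failure mode concrete without pulling in Jest or RxJS. runTest is a hypothetical toy runner, not Jest itself. It mimics two Jest behaviors:

- A test body that declares a done parameter is not finished until done() is called or the timeout fires.
- Any other synchronous body counts as finished the moment it returns.

leakyTest and fixedTest are hypothetical stand-ins for the spec, and a setTimeout plays the role of the Observable emission.

```typescript
type Done = (error?: Error) => void;
type TestBody = (done: Done) => void;

// Hypothetical toy runner. A body that declares a `done` parameter is awaited
// until done() is called (or the timeout fires); any other body is considered
// finished as soon as it returns (unlike Jest, it ignores returned Promises).
function runTest(name: string, body: TestBody, timeoutMs = 5000): Promise<string> {
  if (body.length === 0) {
    try {
      body(() => {});
      return Promise.resolve(`${name}: passed`);
    } catch (error) {
      return Promise.resolve(`${name}: failed (${(error as Error).message})`);
    }
  }
  return new Promise<string>((resolve) => {
    // Jest's default test timeout is also 5 seconds.
    const timer = setTimeout(() => resolve(`${name}: failed (timed out)`), timeoutMs);
    body((error) => {
      clearTimeout(timer);
      resolve(error ? `${name}: failed (${error.message})` : `${name}: passed`);
    });
  });
}

const log: string[] = [];

// Leaky: returns immediately, so the runner reports a pass before the
// callback (where the real assertions would live) has even run.
const leakyTest: TestBody = () => {
  setTimeout(() => log.push('leaky callback ran'), 10);
};

// Fixed: the callback calls done(), so the runner waits for it.
const fixedTest: TestBody = (done) => {
  setTimeout(() => {
    log.push('fixed callback ran');
    done();
  }, 10);
};
```

Running leakyTest reports a pass before its callback has run, which is how assertions slip outside the test. fixedTest is reported only after its callback calls done().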

And that’s not all! I kept testing every file with leaked-handles imported, suspecting there might be more than one leaked handle, and there were! I fixed a window.setTimeout that wasn’t being waited on properly, more leaked Observable subscriptions, and even a few references to document after the test execution context was gone. After fixing all of these leaked handles, I ran the tests in CI, and the pipeline was green.

How to Prevent This From Happening to You

If you are getting phantom failures, odds are the problem is asynchronous code that is still running after the test that started it has finished. Here are some ways to make sure your asynchronous code is tested properly.

Use async / await

You can make your it blocks async and await any Promises your assertions depend on.

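Execution pauses at each await until the Promise settles, and Jest waits for the Promise returned by an async it block before it finishes the test. You can do this with RxJS Observables too: firstValueFrom turns an Observable into a Promise that resolves with its first emitted value, so you can pause execution while you wait for the Observable to emit.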
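
Here is a framework-free sketch of the same idea. loadSettings is a hypothetical async API that resolves after a timer. loadsTheDarkTheme is an async test body that awaits it before asserting. In a Jest spec you would pass the body to it and use expect instead of the small assertEqual helper assumed here.

The point is the shape. The assertion runs before the returned Promise settles, so a failed assertion rejects that Promise. A runner that awaits it, as Jest does, then reports the failure on the right test.

```typescript
interface Settings {
  theme: string;
}

// Hypothetical async API that resolves after a timer, like code waiting on setTimeout.
function loadSettings(): Promise<Settings> {
  return new Promise<Settings>((resolve) => {
    setTimeout(() => resolve({ theme: 'dark' }), 10);
  });
}

// Minimal assertion helper standing in for Jest's expect(...).toBe(...).
function assertEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Async test body: execution pauses at `await` until loadSettings() settles,
// so the assertion runs before the Promise returned by this function resolves.
async function loadsTheDarkTheme(): Promise<void> {
  const settings = await loadSettings();
  assertEqual(settings.theme, 'dark');
}
```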

Use Jest done()

With done(), a given it block finishes only when you call done(). This is especially useful when you need to set up assertions inside an RxJS Subscription and then trigger the Observable emission that reaches it. More often than not, firstValueFrom will do, but sometimes a Subscription is the best tool for the job. Even if you use done(), you should still clean up that subscription afterward.
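
Here is a dependency-free sketch of that pattern. SettingsStore is a hypothetical minimal event source standing in for an RxJS Subject; it is not RxJS. Its subscribe method returns an unsubscribe function, much like a Subscription’s unsubscribe().

The test body does four things in order:

- sets up its assertion in the subscription,
- triggers the emission,
- unsubscribes,
- calls a Jest-style done callback.

To fail the test, it passes an Error to done, as Jest’s done(error) does.

```typescript
type Listener<T> = (value: T) => void;
type Done = (error?: Error) => void;

// Hypothetical minimal event source standing in for an RxJS Subject.
class SettingsStore<T> {
  private readonly listeners = new Set<Listener<T>>();

  // Returns a function that removes the listener, like Subscription.unsubscribe().
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  emit(value: T): void {
    this.listeners.forEach((listener) => listener(value));
  }

  get subscriberCount(): number {
    return this.listeners.size;
  }
}

// Jest-style test body: in a real spec this would be it('...', (done) => { ... }).
function emitsTheNewTheme(store: SettingsStore<string>, done: Done): void {
  const unsubscribe = store.subscribe((theme) => {
    // Clean up first, so nothing is left listening after the test ends.
    unsubscribe();
    if (theme === 'dark') {
      done();
    } else {
      done(new Error(`expected dark, got ${theme}`));
    }
  });
  // Trigger the emission only after the assertion is in place.
  store.emit('dark');
}
```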

Always Clean Up Async Handlers

I mentioned this in the previous section, but always make sure you clean up your handles when your test finishes. Some examples:

- Unsubscribe from any Observable subscriptions created in your test.
- Unsubscribe from any long-lived Observable subscriptions created by your “unit under test.”
- Trigger Angular’s ngOnDestroy(), or the equivalent destroy lifecycle hook in your JS framework, inside your tests.
- Mock real wait times created with RxJS delay() and debounce(), or with native window.setTimeout. It is crucial that you test this waiting behavior, but you don’t want to wait for real! Some options:
  - Use Jest fake timers.
  - Use Angular’s fakeAsync and tick.
  - Use jest.spyOn to fake calls to window.setTimeout and call the given timeout handler immediately.
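
Jest’s fake timers (jest.useFakeTimers() with jest.advanceTimersByTime()) and Angular’s fakeAsync with tick do this for you. The sketch below shows the underlying idea without either framework, using a different technique:

- The hypothetical saveAfterDelay function receives an injectable scheduler.
- A hand-rolled FakeClock lets the test move time forward instantly.
- The test can then confirm that no timer is left pending when it finishes.

```typescript
type Schedule = (callback: () => void, delayMs: number) => void;

// Real scheduler used by production code.
const realSchedule: Schedule = (callback, delayMs) => {
  setTimeout(callback, delayMs);
};

// Hypothetical unit under test: waits before saving, the way RxJS delay()
// or window.setTimeout would.
function saveAfterDelay(save: () => void, delayMs: number, schedule: Schedule = realSchedule): void {
  schedule(save, delayMs);
}

// Hand-rolled fake clock: callbacks run only when the test advances time.
// Simplification: timers scheduled by a callback during advance() wait for
// the next advance() call.
class FakeClock {
  private now = 0;
  private pending: { dueAt: number; callback: () => void }[] = [];

  readonly schedule: Schedule = (callback, delayMs) => {
    this.pending.push({ dueAt: this.now + delayMs, callback });
  };

  advance(ms: number): void {
    this.now += ms;
    const due = this.pending
      .filter((timer) => timer.dueAt <= this.now)
      .sort((a, b) => a.dueAt - b.dueAt);
    this.pending = this.pending.filter((timer) => timer.dueAt > this.now);
    due.forEach((timer) => timer.callback());
  }

  get pendingCount(): number {
    return this.pending.length;
  }
}
```

A test would work through these steps, all without waiting half a second for real:

- call saveAfterDelay(save, 500, clock.schedule),
- assert that nothing has happened yet,
- call clock.advance(500),
- assert that the save ran.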

Conclusion

Leaked handles in asynchronous code can cause sneaky, unexplained failures. They are hard to track down and often leave no trace. Follow the practices above to safeguard your tests from flakiness, and if you find yourself in the same situation I did, try these debugging steps to identify the phantom culprit. A huge shoutout to Raynos, the creator of leaked-handles. Without his work, I never would have found my failing test. Don’t let your code releases get bogged down by flaky tests you can’t trust. Shore up your test suite today so you can ship your code knowing it will work for your users.
