Debugging from the Field: Sudden CI Test Failures
This is one in a series of posts about debugging stories from the field. The goal of this series is to help demonstrate how to debug issues in a Big Data or Cloud application, as well as to show some of the decision-making that goes into diagnosing issues. I encourage you to look past the problem and the solution and focus instead on the process. Feel free to post comments on possibilities you think we missed, or questions on why we went a certain direction, so that we can all learn from each other! All names are omitted in order to protect the innocent and not-so-innocent.
I am currently working with a client on an AWS-native application. This application is based on a microservices architecture, using Lambda for the processing and Kinesis streams for passing messages between functions. It also scrapes data from a government website using a Lambda function that runs once a day.
We have a set of unit and integration tests that can run locally and that also run automatically before a PR can be merged, and again after any changes are pushed to main. One test pulls from the government website mentioned above, but otherwise the tests are meant to be completely self-contained.
During deployment of a minor change to the end-to-end testing scripts, we started to see a failure in the one test that reaches out to the government website. My immediate suspicion was that the website had changed its formatting, breaking our scraper. This is always a risk with web scraping, but it was a risk we had decided to accept for this project.
I started by narrowing down when the test succeeded and when it failed. Rerunning the same change a second time resulted in a failure, as did rerunning the last deployment that had succeeded the previous day. I also verified that creating a new branch with an insignificant change caused a failure in the PR Verification action, separate from the Main CI workflow that was already failing. Running the test locally, however, passed.
This last part was important, since it ruled out a change in the website's code. If the website had changed, the test would fail everywhere, but it was failing only in the CI process.
Looking through the log from the failures in the CI process, I came across these two lines:
Both of these call stacks are located within the responses package, which we do not explicitly require. Responses is a package that mocks out calls made through the requests library, but in this system we use requests-mock instead, and this particular test was not mocking requests at all.
Will Someone Claim This Package?
This left us with two questions: why was this package being installed at all, and why had it stopped working?
For the first question, the pip show command proved invaluable. Running pip show responses, we can see from the "Required-by" field that the moto package is the culprit.
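pip show is the right tool for a one-off check. If you want to script the same question, a rough equivalent can be built on the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch with hypothetical function names, not pip's own implementation; unlike pip show, it also counts requirements that apply only to optional extras, so it may list more packages.

```python
import re
from importlib import metadata


def canonical_name(name):
    """Normalize a project name as PEP 503 does: lowercase, runs of -, _ and . collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def required_by(package):
    """Return the sorted names of installed distributions that declare `package` as a requirement."""
    target = canonical_name(package)
    dependents = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # dist.requires is a list of requirement strings such as
        # "responses>=0.9.0" or "foo; extra == 'bar'", or None if there are none.
        for requirement in dist.requires or []:
            # The project name is the leading run of letters, digits, '.', '_' and '-'.
            match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
            if name and match and canonical_name(match.group(1)) == target:
                dependents.add(name)
    return sorted(dependents)


if __name__ == "__main__":
    # In an environment like the CI one described here, moto would be expected in this list.
    print(required_by("responses"))
```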
With that answered, the reason the failure had only just started came into focus. For each package in our requirements file, we set the exact version to install, otherwise known as pinning the version. This keeps breaking changes from making their way into a production system like this one. But if a package is pulled in by one of our dependencies (a transitive dependency) and its version is not pinned, we can still get newer versions that introduce bugs.
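Pinning only the packages listed in the requirements file leaves transitive dependencies free to move. A minimal sketch of a check for that gap, assuming you have the requirements file text and the pip freeze output from the same environment, could look like this. The function names are hypothetical, and the parsing handles only simple name==version lines.

```python
import re


def canonical_name(name):
    """PEP 503 normalization: lowercase, runs of -, _ and . collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def exact_pins(text):
    """Names that appear as 'name==version' lines (extras stripped, comments and options skipped)."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, including inline ones
        if not line or line.startswith("-") or "==" not in line:
            continue  # blank lines, options such as -r or -e, and non-exact specifiers
        name = line.split("==", 1)[0].split("[", 1)[0].strip()
        names.add(canonical_name(name))
    return names


def unpinned_installed(requirements_text, freeze_text):
    """Installed packages (per pip freeze) that the requirements file does not pin exactly.

    These are usually transitive dependencies, whose versions can change between
    installs even when every direct dependency is pinned.
    """
    return sorted(exact_pins(freeze_text) - exact_pins(requirements_text))
```

Run in CI against the project's requirements file and the pip freeze output, a check like this would have listed responses, since it was never pinned directly.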
To test this theory, we compared the version of responses in the CI environment with the one in my local environment. Locally I had responses==0.14.0, while the CI run was installing responses==0.17.0. Uninstalling responses locally and then reinstalling moto caused the error to start appearing on my machine as well, confirming the suspicion.
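Comparing one package by hand is enough here, but for a whole environment a sketch like the following diffs two saved pip freeze outputs. It assumes each environment's pip freeze output has been captured as text; the function names and example-pkg are hypothetical, and only the responses versions come from this investigation.

```python
import re


def canonical_name(name):
    """PEP 503 normalization: lowercase, runs of -, _ and . collapsed to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text):
    """Map normalized package name -> version for each 'name==version' line of pip freeze output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and entries without an exact version (editable installs, URLs).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        versions[canonical_name(name.strip())] = version.strip()
    return versions


def diff_environments(local_freeze, ci_freeze):
    """Return {package: (local_version, ci_version)} for every package whose version differs.

    A version of None means the package is missing from that environment.
    """
    local, ci = parse_freeze(local_freeze), parse_freeze(ci_freeze)
    return {
        name: (local.get(name), ci.get(name))
        for name in sorted(local.keys() | ci.keys())
        if local.get(name) != ci.get(name)
    }


local = "responses==0.14.0\nexample-pkg==1.0.0\n"  # example-pkg is hypothetical
ci = "responses==0.17.0\nexample-pkg==1.0.0\n"
print(diff_environments(local, ci))  # {'responses': ('0.14.0', '0.17.0')}
```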
The final nail in the coffin was confirming that moto specifies only a minimum version for responses, which you can see in its source code.
Solving the Issue
The best solution would be to fix the issue in responses itself. A less ideal but still acceptable solution would be to make sure moto doesn't pull in such a new version of responses. Either would solve the issue, but neither would do so fast enough to get production up and running again quickly.
For our project, however, we can simply pin responses to 0.14.0, the version we were using locally that worked correctly. To do this, we add responses==0.14.0 to the requirements file above the line that includes moto. (With pip's current dependency resolver the position does not actually matter, since every line of the file is treated as a constraint on the same resolution; what matters is that 0.14.0 still satisfies moto's minimum-version requirement, so pip installs it instead of the latest release.)
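Why the pin works can be seen with a toy model of dependency resolution: each requirer contributes a specifier, and the installer picks the newest available version that satisfies all of them. This is a simplified sketch, not how pip is implemented. It handles only == and >= on plain numeric versions, and moto's minimum of >=0.9.0 is a hypothetical stand-in for the real bound in moto's package metadata.

```python
def parse_version(text):
    """Turn a plain release version such as '0.17.0' into a tuple of ints.

    Simplification: pre-releases, post-releases and local tags are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a single '==X' or '>=X' specifier, padding with zeros as PEP 440 does."""
    op, target = spec[:2], spec[2:]
    v, t = parse_version(version), parse_version(target)
    width = max(len(v), len(t))
    v, t = v + (0,) * (width - len(v)), t + (0,) * (width - len(t))
    if op == "==":
        return v == t
    if op == ">=":
        return v >= t
    raise ValueError(f"unsupported specifier: {spec!r}")


def resolve(available, specs):
    """Return the newest available version meeting every specifier, or None on conflict."""
    candidates = [v for v in available if all(satisfies(v, s) for s in specs)]
    return max(candidates, key=parse_version) if candidates else None


available = ["0.14.0", "0.17.0"]  # the two responses releases discussed in this post
moto_minimum = ">=0.9.0"          # hypothetical: the real bound is in moto's package metadata
our_pin = "==0.14.0"

print(resolve(available, [moto_minimum]))           # 0.17.0, what CI installed before the fix
print(resolve(available, [moto_minimum, our_pin]))  # 0.14.0, the pin narrows the choice
print(resolve(available, [">=0.15.0", our_pin]))    # None, a pin below the minimum would conflict
```

The third case is the caveat of this fix: the pin holds only as long as 0.14.0 still satisfies moto's minimum, so a future moto upgrade that raises that minimum would turn it into a resolution error.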
Sure enough, after making this single change and rerunning CI, the test started passing again. Once again, the more complex the issue, the simpler the fix.
To Pin or not to Pin?
I had always been taught that pinning versions was a best practice. Pinning package versions gives you complete control over the versions of all of your dependencies, so that the behavior matches what you tested with. This keeps new bugs from creeping into otherwise stable systems, like what we saw here. If you need a newer version of a package, you update it in your spec and rerun your tests before deploying the change.
As I was writing this post, however, a thread came up on Twitter from Kelsey Hightower, a self-proclaimed minimalist and all-around tech guru.
I bring up this thread because the rationales given in it by Adam Jacob and Luis Villa are pretty solid, and they raise something I hadn't considered. Using minimum versions or fuzzy versions lets you pick up security updates automatically, reducing the chance that something like the recent log4j vulnerability requires changes on your end. That comes at a cost, though: you will have to deal with issues like this one more often. It also requires more trust in your dependencies: that they test new versions rigorously before releasing, and (in the case of fuzzy version matching) that they follow their version numbering scheme consistently.
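The difference between exact pins, minimum versions, and fuzzy versions can be made concrete by checking which releases each style accepts. This sketch uses pip's ~= "compatible release" operator as one example of fuzzy matching, implements only ==, >= and ~= for plain numeric versions (a small subset of PEP 440), and the release numbers are hypothetical.

```python
def parse_version(text):
    """Turn a plain release version such as '0.14.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pad(version, width):
    """Right-pad a version tuple with zeros so comparisons follow PEP 440 release padding."""
    return version + (0,) * (width - len(version))


def satisfies(version, spec):
    """Check '==X', '>=X' or '~=X' for plain numeric release versions."""
    for op in ("~=", "==", ">="):
        if spec.startswith(op):
            target = parse_version(spec[len(op):])
            break
    else:
        raise ValueError(f"unsupported specifier: {spec!r}")
    v = parse_version(version)
    width = max(len(v), len(target))
    pv, pt = pad(v, width), pad(target, width)
    if op == "==":
        return pv == pt
    if op == ">=":
        return pv >= pt
    # '~=X.Y.Z' means '>=X.Y.Z' and '==X.Y.*': every component but the last must match.
    prefix = target[:-1]
    return pv >= pt and pad(v, len(prefix))[:len(prefix)] == prefix


def accepted(releases, spec):
    """The releases a given specifier would allow an installer to choose from."""
    return [r for r in releases if satisfies(r, spec)]


releases = ["0.14.0", "0.14.1", "0.15.0", "1.0.0"]  # hypothetical release numbers
for spec in ("==0.14.0", "~=0.14.0", ">=0.14.0"):
    print(spec, accepted(releases, spec))
# ==0.14.0 ['0.14.0']
# ~=0.14.0 ['0.14.0', '0.14.1']
# >=0.14.0 ['0.14.0', '0.14.1', '0.15.0', '1.0.0']
```

The ~= row is the middle ground the fuzzy-version argument relies on: patch releases (where security fixes typically land) flow in automatically, but only if the dependency's maintainers number their releases consistently.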
I still find myself in Kelsey's camp on this topic, but it's worth understanding both sides and weighing them as you build a system. The particular system in this post (like most data engineering systems you'll handle) is an internal system with a limited attack surface, so accepting that risk in exchange for higher maintainability is probably the right move. That may not always be the case, though, so it is worth settling on a strategy when you start a new project.