During routine monitoring, Squiz identified operational issues with multiple Funnelback servers, leading to search function disruptions and latency for several customers.
A subset of UK customers may have experienced delays in search results and encountered 500 errors when attempting to use the Funnelback search function.
Squiz engineers were alerted to errors and timeouts originating from our Squiz-hosted Funnelback services.
Investigation by Squiz engineers revealed that certain filter configurations were causing background optimisation calculations to take an excessive amount of time to compute.
As a result, these calculations held a large share of compute resources for extended periods, leaving less capacity for the main search queries.
In response, we lowered the time limits and the amount of computing power these calculations are allowed to use, in order to protect query performance.
These restrictions apply only to optional background calculations and will not disrupt search traffic.
As part of our standard process, we initiated a period of heightened monitoring, leading to resolution on May 8th at 11:03 BST.
We have added new monitoring checks to flag excessive computation delays, as well as utility scripts to help us debug slow performance in the future. Our Product team is investigating ways to speed up filter background calculations in order to improve overall query performance.
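For readers who want a concrete picture, a check of this kind could look roughly like the following. It is a minimal sketch rather than the check Squiz deployed: the function name, the threshold and the 5% tolerance are hypothetical.

```python
def excess_delay_alert(durations_seconds, threshold_seconds,
                       max_slow_fraction=0.05):
    """Return True when too many recent calculations exceeded the threshold.

    durations_seconds: recent calculation times, in seconds.
    threshold_seconds: duration above which a calculation counts as slow.
    max_slow_fraction: hypothetical tolerated share of slow calculations.
    """
    if not durations_seconds:
        # Nothing was measured, so there is nothing to flag.
        return False
    slow = sum(1 for d in durations_seconds if d > threshold_seconds)
    return slow / len(durations_seconds) > max_slow_fraction
```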