Major Incident - Funnelback DXP UK
Incident Report for Squiz
Postmortem

Summary

Squiz identified operational issues with the query processing resources in the UK DXP, which led to queuing and slower response times for searches processed in the UK region.

Customer impact

A subset of UK customers may have experienced delayed search results and timeouts when using the search function.

Issue and Resolution

Squiz engineers identified an increase in query volumes on the processing compute layer. Although each client has independent query processing capacity, the overall compute layer that powers it has a finite ceiling. The increased traffic was identified as a potential distributed denial-of-service (DDoS) attack masquerading as legitimate search traffic. At no stage was there any breach of our systems.

Once this was identified, Squiz Cloud Engineering pinpointed the traffic pattern and reconfigured our web application firewall (WAF) to stop its negative effect on our compute layer. This immediately improved search performance and restored service.

Normal service was restored at 11:08 GMT on 26 June 2024.

Mitigation

In light of this incident, we are reviewing our alerting thresholds so that we can detect and mitigate similar issues sooner.

Posted Jul 01, 2024 - 18:58 AEST

Resolved
We are pleased to confirm that the previously reported issue affecting the performance of our Funnelback DXP system has been successfully resolved.
Our team closely monitored the situation and applied a fix, which significantly improved performance.
We will continue to monitor the system closely to ensure optimal performance and stability. We appreciate your patience and understanding during this time and apologise for any inconvenience caused.

A postmortem will be made available on https://status.squiz.cloud/ in the coming days.
Posted Jun 26, 2024 - 21:29 AEST

Monitoring
We have identified the root cause of this issue and have implemented a fix.

A further update will be provided via https://status.squiz.cloud in 15 minutes, or earlier if the situation or information changes.
Posted Jun 26, 2024 - 21:08 AEST

Investigating
Squiz monitoring has detected a degradation of service impacting some Funnelback DXP customers in the UK only.

Some customers are experiencing slow response times and/or timeouts.

We are currently working to identify the root cause of this issue.

A further update will be provided via https://status.squiz.cloud in 15 minutes, or earlier if the situation or information changes.
Posted Jun 26, 2024 - 20:59 AEST

This incident affected: Squiz Funnelback Hosted Instances.