UK DXP Funnelback Customers - Service Degradation

Incident Report for Squiz

Postmortem

Summary

On March 26, 2025, Squiz observed a service degradation affecting several Funnelback DXP customers in the UK. The incident began at 11:57 GMT and lasted a little over four hours, during which customers experienced slow search responses and degraded service. Engineers identified a potential cause at 12:24 GMT, and signs of recovery were observed by 13:20 GMT. Full service restoration was confirmed at 16:12 GMT the same day, and the issue was resolved.

The degradation was primarily caused by unusually high traffic levels coming from a few specific actors, which overwhelmed the system and led to service disruptions. This traffic was subsequently rate-limited to mitigate the impact and restore service functionality.

Customer Impact

A subset of UK customers using Funnelback DXP services experienced slow response times and degraded search functionality. Some users also reported intermittent HTTP 504 (Gateway Timeout) errors. The impact was localised to the UK, with no other regions affected.

Issue, Resolution, and Mitigation

Upon investigation, Squiz engineers identified unusually high levels of traffic coming from a small group of specific actors. This high traffic load caused congestion in the system, resulting in delays and intermittent service disruptions for affected customers.

The issue was addressed in the following steps:

  1. The traffic from the specific actors was identified and rate-limited to reduce the load on the system.
  2. Once the rate limiting was applied, service performance began to improve, and the degraded functionality started to recover.
  3. Engineers continued to monitor the system for further anomalies and ensured that the rate-limiting measures were effectively restoring normal operations.
  4. By 16:12 GMT, the system was fully stabilised, and all affected services were restored.
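
For readers unfamiliar with the technique, the per-actor rate limiting described in step 1 can be illustrated with a token bucket. This is only a minimal sketch under stated assumptions: this report does not describe the mechanism, thresholds, or tooling Squiz actually used, and every name and number below (`TokenBucket`, `PerClientRateLimiter`, the client identifier, the rate and capacity values) is hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    """Permits bursts of up to `capacity` requests, refilled at `rate` tokens per second."""
    rate: float
    capacity: float
    tokens: float
    updated_at: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        # Spend one token per request; reject the request when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientRateLimiter:
    """Keeps one bucket per client so a heavy client cannot starve the others.

    `client_id` is a hypothetical key (for example an IP address or API key);
    the report does not say how the actors were identified.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(client_id)
        if bucket is None:
            # A client seen for the first time starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)
```

In a sketch like this, a request for which `allow` returns `False` would typically be rejected (for example with HTTP 429) rather than forwarded to the search backend, which is what reduces load while leaving other clients unaffected.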
Posted Mar 31, 2025 - 19:47 AEDT

Resolved

The issue has been addressed, and services are fully restored.
We will continue to monitor system performance, but the incident has now been resolved.
A Post-Incident Review (PIR) will be available in the coming days.
Posted Mar 27, 2025 - 03:12 AEDT

Update

Our engineers continue to monitor the situation, as we see signs of recovery.
Posted Mar 27, 2025 - 00:55 AEDT

Monitoring

We've started to see improvements in search functionality as we continue to monitor the situation.
Posted Mar 27, 2025 - 00:20 AEDT

Update

Our engineers are continuing to work through the current incident and are actively testing solutions to resolve the issue.
Posted Mar 26, 2025 - 23:53 AEDT

Identified

Our engineers have identified a potential source of the issue and are working on implementing a fix.
Posted Mar 26, 2025 - 23:24 AEDT

Update

We are continuing to investigate this issue.
Posted Mar 26, 2025 - 23:08 AEDT

Investigating

Squiz has detected a degradation of service impacting some Funnelback DXP customers in the UK only.
Some customers may experience slow responses and/or degraded service.
We are working to mitigate this problem. More updates to follow shortly.
A further update will be provided via https://status.squiz.cloud in 15 minutes, or earlier if the situation or information changes.
Posted Mar 26, 2025 - 22:57 AEDT
This incident affected: Squiz SaaS Hosted Instances.