Cloud Hosted Funnelback Incident - AU only

Incident Report for Squiz

Postmortem

Summary

On 16 July 2025 at approximately 13:00 AEST, Squiz observed an issue affecting our Funnelback AU platform. Some customers’ search data appeared to have reverted to an earlier state (30 June 2025), impacting the accuracy of their search results.

This was triggered by an admin server restart that inadvertently caused an outdated recovery volume to be mounted. While this older volume was not intended to be active, a technical conflict led it to override the correct data volume mount during startup.

Squiz engineers took corrective action by detaching the incorrect volume and restarting the affected system. Full service was restored and verified by 13:54 AEST the same day.

Impact

Customers hosted on the affected Pod 01 may have experienced outdated or missing search data. This included:

  • Reverted content in admin server screens/searches - live search was unaffected
  • Delays in document updates being reflected in search
  • Temporary inconsistencies between backend systems and frontend content

While the issue did not affect all collections or customers equally, those with recent content updates were most likely to notice discrepancies within their admin interface.

Root Cause Analysis

The incident was caused by a recovery volume that had been used in a prior restore task but was not properly detached after use. When the affected server restarted as part of a scheduled release, it mounted this older volume instead of the correct one due to a conflict between system identifiers.

This resulted in the search system presenting data that was current as of 30 June 2025, rather than the most recent state.

Although the issue was initially reported by developers working on non-live content, it quickly escalated as production data inconsistencies were identified.

Resolution Actions

Once the problem was confirmed, Squiz engineers initiated a structured response to diagnose and resolve the issue.

Key actions included:

  • Declaring a technical incident and halting other ongoing deployments to focus on resolution
  • Identifying that an outdated recovery volume was still attached
  • Shutting down the affected server and detaching the incorrect volume
  • Restarting the server, allowing it to mount the correct, current data volume
  • Verifying that the latest data was restored and consistent with expectations
  • Reviewing push API access logs to assist customers in re-sending any missing updates
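
The check for a leftover recovery volume could be automated along the following lines. This is a minimal sketch, not the procedure Squiz engineers used: it assumes a volume inventory shaped like the `Volumes` list returned by the EC2 `describe_volumes` call, and a hand-maintained allow-list of the volumes each server is expected to have attached. The function name, the allow-list, and all instance and volume IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_unexpected_attachments(
    volumes: Iterable[dict],
    expected: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Return (instance_id, volume_id, device) for every attached volume
    that is not on the allow-list for the instance it is attached to."""
    unexpected = []
    for volume in volumes:
        # Unattached volumes have an empty "Attachments" list and are skipped.
        for attachment in volume.get("Attachments", []):
            instance_id = attachment["InstanceId"]
            if volume["VolumeId"] not in expected.get(instance_id, set()):
                unexpected.append(
                    (instance_id, volume["VolumeId"], attachment.get("Device", "?"))
                )
    return sorted(unexpected)


# Hypothetical inventory: one current data volume, one leftover restore
# volume still attached to the same server, and one unattached spare.
volumes = [
    {"VolumeId": "vol-current",
     "Attachments": [{"InstanceId": "i-admin", "Device": "/dev/sdf"}]},
    {"VolumeId": "vol-restore",
     "Attachments": [{"InstanceId": "i-admin", "Device": "/dev/sdg"}]},
    {"VolumeId": "vol-spare", "Attachments": []},
]
expected = {"i-admin": {"vol-current"}}

print(find_unexpected_attachments(volumes, expected))
# prints [('i-admin', 'vol-restore', '/dev/sdg')]
```

Run before a scheduled restart, a check like this would flag any volume that is still attached but no longer expected, regardless of how the volume is named.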

Recommendations and Follow-up Actions

Enforce Cleanup After Restore Operations

Ensure that any recovery volumes or temporary resources used during troubleshooting are fully removed when no longer needed.

Improve Volume Identification

Change the way data volumes are mounted by using different device IDs that are unique and immutable, reducing the risk of volume conflicts after a reboot.
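
One way to audit for this is to scan the mount table for entries that refer to kernel-assigned device names, which can change between boots. The sketch below is illustrative only: it assumes mounts are defined in an `/etc/fstab`-style file, and the function name, sample entries, and mount points are hypothetical. It does not describe the actual configuration of the affected server.

```python
import re
from typing import List

# Kernel-assigned names such as /dev/sdf, /dev/xvdf1 or /dev/nvme1n1
# depend on device discovery order and may differ after a reboot.
UNSTABLE_DEVICE = re.compile(
    r"^/dev/(sd[a-z]+\d*|xvd[a-z]+\d*|nvme\d+n\d+(p\d+)?)$"
)


def unstable_fstab_entries(fstab_text: str) -> List[str]:
    """Return a message for each fstab entry mounted by a kernel device name."""
    flagged = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 2 and UNSTABLE_DEVICE.match(fields[0]):
            flagged.append(f"{fields[1]} is mounted from {fields[0]}")
    return flagged


# Hypothetical fstab content used only to exercise the check.
sample = """\
# <device> <mount point> <type> <options> <dump> <pass>
UUID=0a1b2c3d-0000-4000-8000-000000000001 / ext4 defaults 0 1
/dev/xvdf /data ext4 defaults,nofail 0 2
/dev/disk/by-id/example-volume-id /backup ext4 defaults,nofail 0 2
"""

print(unstable_fstab_entries(sample))
# prints ['/data is mounted from /dev/xvdf']
```

The check only flags kernel device names. Filesystem UUIDs and labels are copied when a volume is cloned or restored from a block-level snapshot, so they may not be unique between a data volume and its recovery copy. Which identifier is truly unique and immutable depends on the environment.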

Review and Adjust Backup Naming Conventions

Eliminate automated naming systems that obscure the purpose of EBS volumes, so restore drives can be more easily identified.

Posted Jul 30, 2025 - 14:00 AEST

Resolved

This incident is now declared as resolved.
Our team have identified the clients who need to be contacted in order to push their collections again.
Feel free to let us know if you have any outstanding issues via My.Squiz.Net
Posted Jul 16, 2025 - 13:55 AEST

Monitoring

We have rectified the issue and are now trying to identify any outstanding items left to wrap up this incident.
If you believe you are affected, you should now be able to re-push your collections to rectify your instance.
Posted Jul 16, 2025 - 13:47 AEST

Identified

We have identified the cause of the issue and are now working on a fix.

If you believe you are affected then feel free to log a case with us on My.Squiz.Net
Posted Jul 16, 2025 - 13:28 AEST

Investigating

Hello Customers.

We have raised an incident regarding configurations reverting to old versions and failures in pushing new collections. It affects clients whose Funnelback is hosted in Australia on SaaS/Cloud infrastructure.

Our team are now working on rectifying Funnelback functionality.
Posted Jul 16, 2025 - 13:03 AEST
This incident affected: Squiz Cloud Hosted Instances and Squiz SaaS Hosted Instances.