DXP 6.61.0 Safe Edit Issue causing error 500 for Public users

Incident Report for Squiz

Postmortem

Summary

On 2 April 2025 at approximately 14:49 (GMT+10), Squiz received indications of HTTP 500 errors on customer pages.

Squiz’s Support teams, working alongside our Product team, quickly identified that the issue was introduced by the recent DXP upgrade to version 6.61.0. Hot patches were released while, in parallel, Matrix DXP was rolled back to version 6.60.1, which restored services.

Customer Impact

Incident Duration: 02 Apr 2025, 14:49 - 23:14 (GMT+10)

Impact: some customers experienced HTTP 500 errors on site pages.

Impact times and service restoration times varied over the course of the incident.

The effect of this issue was limited to clients who changed asset statuses during a specific period of time, so impact was felt only by users who were editing assets at the time of the incident.

Root Cause Analysis

An asset property was removed in Matrix version 6.61.0. This affected assets that were placed into Safe Edit, because Matrix raised errors when it attempted to serialise those objects.
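As a rough illustration of this class of failure, the sketch below shows how removing a property from a schema can break serialisation of objects that were created before the change. It is written in Python for brevity and is not Matrix code: the property name `legacy_flag`, the property sets and both functions are hypothetical, and the report does not name the property that was removed or describe how version 6.61.1 handles it.

```python
import json

# Hypothetical property sets for illustration only; the report does not name
# the property that was removed, and this is not Matrix's actual asset model.
PROPERTIES_BEFORE_UPGRADE = {"id", "status", "legacy_flag"}
PROPERTIES_AFTER_UPGRADE = {"id", "status"}  # "legacy_flag" removed


def serialise_strict(asset: dict, known_properties: set) -> str:
    """Serialise an asset, failing if it carries a property the schema lacks."""
    unknown = set(asset) - known_properties
    if unknown:
        raise ValueError(f"unknown asset properties: {sorted(unknown)}")
    return json.dumps(asset, sort_keys=True)


def serialise_tolerant(asset: dict, known_properties: set) -> str:
    """Serialise an asset, silently dropping properties the schema lacks."""
    kept = {k: v for k, v in asset.items() if k in known_properties}
    return json.dumps(kept, sort_keys=True)


# An asset copy created before the upgrade still carries the old property.
safe_edit_copy = {"id": 42, "status": "safe_edit", "legacy_flag": True}

try:
    serialise_strict(safe_edit_copy, PROPERTIES_AFTER_UPGRADE)
except ValueError as error:
    print(f"serialisation failed: {error}")

# The tolerant variant succeeds by ignoring the removed property.
print(serialise_tolerant(safe_edit_copy, PROPERTIES_AFTER_UPGRADE))
```

The strict variant fails only for objects that still carry the removed property, which is consistent with impact being limited to assets that changed state during a specific window. The tolerant variant is one common way to absorb such a change and is shown purely as an assumption.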

Resolution Actions

  1. Identification

The Squiz Support team identified a trend in the logs while investigating reported problems. Product teams were engaged and quickly isolated the cause.

  2. Hot Patch

Squiz developed, tested and deployed hot patches while, in parallel, assessing a version rollback versus a roll forward.

  3. Downgrade

To fully resolve the issue, Matrix was downgraded to version 6.60.1.

Follow-up Actions

Squiz has deployed monitoring enhancements to detect similar events, including identification during testing. - completed

Squiz has successfully rolled out Matrix version 6.61.1, which introduces a change that circumvents this issue. - completed

Posted Apr 10, 2025 - 15:56 AEST

Resolved

Dear Customers,

Following an extended period of monitoring, we are pleased to confirm that this issue has now been resolved.

We appreciate your patience and understanding during this time and apologise for any inconvenience caused.

A postmortem will be made available on https://status.squiz.cloud/ in the coming days.
Posted Apr 03, 2025 - 00:40 AEDT

Update

We are continuing to monitor for any outstanding issues. A further update will be provided once the rollback has been completed.
Posted Apr 02, 2025 - 23:26 AEDT

Update

We are continuing to see recovery for more affected customers and believe we are nearing resolution.
We are continuing to be vigilant for any further issues.
Posted Apr 02, 2025 - 21:58 AEDT

Update

We are continuing to progress with the rollback of the changes. We are continuing to see recovery for some affected customers. We are actively checking reported outages for recovery and are continuing to monitor for developments.
Posted Apr 02, 2025 - 20:10 AEDT

Update

Rollback of the changes continues, and we are now seeing recovery for some affected customers. We are actively checking reported outages for recovery and are continuing to monitor for developments.
Posted Apr 02, 2025 - 19:07 AEDT

Update

Deployment of the rollback is now underway. We will continue to monitor for any issues during the deployment.
Posted Apr 02, 2025 - 18:36 AEDT

Update

We are continuing to work on the rollback to resolve the issues.
Posted Apr 02, 2025 - 18:18 AEDT

Update

We are continuing to work on rolling back the changes and are continuing to monitor the situation.
Posted Apr 02, 2025 - 17:46 AEDT

Monitoring

We have begun rolling back clients on the affected version of DXP and are monitoring them for any further issues.
Posted Apr 02, 2025 - 17:18 AEDT

Update

We are now implementing a rollback for the affected version of DXP.
We will have another update ready once the operation is complete.
Posted Apr 02, 2025 - 16:53 AEDT

Identified

Squiz is continuing to work on a fix for this issue and will implement it as soon as we have positive test results.
Posted Apr 02, 2025 - 16:38 AEDT

Update

We are continuing to investigate this issue.
Posted Apr 02, 2025 - 16:25 AEDT

Investigating

Squiz has been made aware of an error affecting customers on the latest version of DXP. When an asset is placed into Safe Edit, attempts to access that asset return an error 500.
Squiz is now working on rolling back affected clients.
Posted Apr 02, 2025 - 16:25 AEDT
This incident affected: Squiz SaaS Hosted Instances.