Major Incident - Sydney Datacenter
Incident Report for Squiz
Postmortem

Executive Summary

At ~06:38 AEDT on 2 November 2022, Squiz internal monitoring started generating alerts reporting a degradation of service for a few customers hosted in our Sydney Data Centre. Investigations by our Data Centre team indicated that some of the Virtual Machines (VMs) hosted in the Sydney Data Centre had entered a “paused” state due to an “unknown storage” error. The impacted VMs were restarted, which led to a complete recovery at ~08:06 AEDT. No data issues were observed.

Customer impact

Between ~06:38 AEDT and ~08:06 AEDT on 2 November 2022, some customers may have experienced a degradation of service when accessing their applications hosted in the Sydney Data Centre.

Root cause

Investigations by our Data Centre team and technical engineers indicated that the compute node on the storage cluster experienced issues, which caused the affected VMs to enter a paused state.

Mitigation and Follow-up actions

The Squiz Data Centre team is continuing to assess the situation and, as a result, is undertaking the following actions:

  1. A ticket has been opened with our external provider to further investigate the cause of the “unknown storage” error.
  2. The storage cluster will be upgraded to a newer version.
  3. Internal monitoring will be improved to better detect changes in VM status.
Posted Nov 03, 2022 - 18:02 AEDT

Resolved
This service issue has now been resolved. If your Squiz Cloud hosted instance was affected, a report will be forwarded to your nominated contact within 7 business days. If you do not receive the report, please contact Squiz Support.
Posted Nov 02, 2022 - 08:06 AEDT
Monitoring
The service issue with the Sydney Data Centre has been rectified and all services have been restored. We are continuing to monitor the situation closely to ensure services remain fully operational.

If you are still experiencing issues, please contact Squiz Support.
Posted Nov 02, 2022 - 07:53 AEDT
Identified
We have identified the service issue with the Sydney Data Centre. Our technical teams are working to rectify the issue as quickly as possible.

We will post updates as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: 30 minutes
Posted Nov 02, 2022 - 07:44 AEDT
Update
Our teams are continuing to investigate this issue. We will provide additional updates once we can confirm that the affected services are beginning to return.
Posted Nov 02, 2022 - 07:23 AEDT
Investigating
Squiz monitoring has detected an incident affecting customers hosted in our Sydney Data Centre. Some customers may experience a degradation of service. Multiple Squiz teams are currently investigating.

A further update will be provided in ~15 minutes.
Posted Nov 02, 2022 - 07:07 AEDT
This incident affected: Squiz Cloud Hosted Instances.