Major Incident - ASH Data Centre

Incident Report for Squiz

Postmortem

Executive Summary

At ~18:12 UTC on the 17th of November 2022, Squiz internal monitoring started generating alerts reporting a degradation of service for a few customers hosted in our ASH Data Centre. Investigations by our Data Centre team indicated that a host Virtual Machine (VM) became unresponsive for between 2 and 10 minutes for a few customers.

The VMs hosted on the impacted host were observed to be still running. No recovery actions were performed, and the issue resolved itself at ~18:34 UTC.

Customer impact

Between ~18:12 UTC and ~18:34 UTC on the 17th of November 2022, some customers may have experienced a degradation of service when accessing their applications hosted in the ASH Data Centre.

 

Root cause

Investigations by our Data Centre team indicated that the host VM became unresponsive and, because of the Cluster Fencing Policy, was not restarted.

Mitigation and Follow-up actions

The Squiz Data Centre team has performed the action below and continues to monitor the situation closely:

  1. An emergency maintenance activity was successfully completed in the ASH Data Centre on the 21st of November 2022, as a result of which the impacted host VM has been restarted. Customer VMs hosted on this host VM are running normally.

No further actions are required at this stage; hosted applications in the ASH Data Centre remain stable.

Posted Nov 23, 2022 - 11:25 AEDT

Resolved

Squiz monitoring detected an issue leading to a degradation of service for a subset of customers in our ASH Data Centre. This service issue has now been resolved.

A postmortem will be provided via https://status.squiz.cloud.
Posted Nov 18, 2022 - 06:00 AEDT

Investigating

Squiz monitoring has detected a degradation of service incident affecting customers hosted in our ASH Data Centre. Some customers may have experienced a brief degradation of service. Multiple Squiz teams are currently investigating.
Posted Nov 18, 2022 - 05:52 AEDT
This incident affected: Squiz Cloud Hosted Instances.