Squiz Cloud Incident - 20201029
Incident Report for Squiz
Resolved
Dear Customer,
SQUIZ is pleased to confirm that services have been restored and the incident is now marked as recovered.

Incident Start: 00:38 AEDT Oct 30, 2020
Incident End: 20:31 AEDT Oct 30, 2020

What was the known extent of the impact:
A small number of customers with primary production servers residing in the Melbourne Datacentre experienced outages or degraded search functionality.
What is the current status of the investigation:
All services have been restored.
We apologise for any inconvenience this may have caused. Should you encounter any further problems, please do not hesitate to contact our Customer Support Team.
Posted Oct 30, 2020 - 20:40 AEDT
Update
Dear Customer,
SQUIZ is pleased to confirm that almost all services have been restored and the incident is close to full recovery.

Incident Start: 00:38 AEDT Oct 30, 2020
Expected Incident End: 20:45 AEDT Oct 30, 2020

What was the known extent of the impact:
A small number of customers with primary production servers residing in the Melbourne Datacentre experienced outages or degraded search functionality.
We apologise for any inconvenience this may have caused. Should you encounter any further problems, please do not hesitate to contact our Customer Support Team.
Posted Oct 30, 2020 - 20:27 AEDT
Update
Dear Customer,

Please be advised that the critical incident, which started at 00:38 AEDT on Oct 30, 2020, is still ongoing and is affecting a small number of customers with primary production services residing in the Melbourne Datacentre.

Status of investigation:
All of our expert teams are still working to restore the service as the highest priority.


Posted Oct 30, 2020 - 16:15 AEDT
Update
Dear Customer,

SQUIZ would like to provide more visibility into our critical incident, which may prevent some of our Melbourne-hosted customers from connecting to their Squiz Cloud instances.

Incident Start: 00:38 AEDT Oct 30, 2020

What is the known extent of the impact:
A small number of customers with primary production servers residing in Melbourne Datacentre are experiencing outages or degraded search functionality.

What is the current status of the investigation:
All our expert teams are still investigating the issue and restoring the services.

What is the estimated recovery time?
An estimated recovery time is not available at the moment.

Is there a workaround available?
There is no workaround available.


Posted Oct 30, 2020 - 10:40 AEDT
Update
Application Support Engineers are still working to restore from backup the systems that could not be recovered due to corrupted image files.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Unfortunately, we are still unable to provide an estimated time to restoration (ETR) at this stage.
Posted Oct 30, 2020 - 09:46 AEDT
Update
Our teams are still working to restore from backup the systems that could not be recovered due to corrupted image files.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Unfortunately, we are still unable to provide an estimated time to restoration (ETR) at this stage.
Posted Oct 30, 2020 - 08:32 AEDT
Update
Our teams are still working to restore from backup the systems that could not be recovered due to corrupted image files. As previously noted, this affects only Melbourne DC-hosted clients.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Unfortunately, we are still unable to provide an estimated time to restoration (ETR) at this stage.
Posted Oct 30, 2020 - 07:32 AEDT
Update
Our teams are working to restore from backup the systems that could not be recovered due to corrupted image files.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: Undefined, waiting for vendor updates.

Next update ETA: 60 minutes
Posted Oct 30, 2020 - 06:09 AEDT
Update
Some VMs remain in a “paused” state due to corrupted image files, and we are continuing to work with the vendor to recover them. If the data cannot be recovered, the VMs will be restored from backup.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Next update ETA: 30 minutes
Posted Oct 30, 2020 - 05:17 AEDT
Update
The storage array is back online, and most production VMs are now running.

Some VMs remain in a "paused" state due to corrupted image files, and we are continuing to work with the vendor to recover these. If data cannot be recovered, the VMs will be restored from backup.

Our vendor is continuing to troubleshoot the root cause, and has identified:
- a high number of errors in the storage array's intent log RAM (this has now been disabled)
- no SMART errors on the physical disks
At present, the vendor believes there may be a backplane fault in the chassis containing the physical disks; however, this is still being confirmed.

We are waiting on the vendor to provide an ETA.

Next update ETA: 30 minutes
Posted Oct 30, 2020 - 04:43 AEDT
Update
Based on information from the vendor, the VMs are about to be brought up. Storage parameters are normal, and the SSD SMART data is not showing anything out of the ordinary.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Next update ETA: 30 minutes
Posted Oct 30, 2020 - 04:09 AEDT
Update
The updates are complete, and the vendor is now using diagnostic tools from the most recent release to continue troubleshooting.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Next update ETA: 30 minutes
Posted Oct 30, 2020 - 03:29 AEDT
Update
Our technical team is working to resolve the issue. The vendor has finished the initial upgrades and moved on to the next stage of the process.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Next update ETA: 30 minutes
Posted Oct 30, 2020 - 03:05 AEDT
Update
We are continuing to work on resolving the issue. The vendor is still working on the updates related to a recent software version.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: Undefined, waiting for vendor updates.
Posted Oct 30, 2020 - 02:46 AEDT
Update
We are continuing to work on resolving the issue. The vendor is still working on the updates related to a recent software version.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: Undefined, waiting for vendor updates.
Posted Oct 30, 2020 - 02:31 AEDT
Update
Our technical teams are working together with the vendor and are currently applying software updates to the array.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: Undefined, waiting for updates.
Posted Oct 30, 2020 - 02:12 AEDT
Update
Our technical teams are continuing to work with vendor support to resolve the issue affecting Squiz Cloud storage.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: Undefined, waiting for vendor updates.
Posted Oct 30, 2020 - 01:51 AEDT
Update
Our technical teams are continuing to work with vendor support to resolve the issue affecting Squiz Cloud storage.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: Undefined, waiting for vendor updates.
Posted Oct 30, 2020 - 01:30 AEDT
Identified
We have identified the service issue affecting Squiz Cloud instances hosted in the Melbourne DC (AU). Our technical teams are working with the vendor to rectify the issue with the Squiz Cloud storage subsystem as quickly as possible.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.

Current ETA to resolution: Undefined, waiting for vendor updates.
Posted Oct 30, 2020 - 01:12 AEDT
Update
We are continuing to investigate a service issue affecting only Melbourne DC (AU) customers with Squiz Cloud hosted instances.

The issue has been identified and is related to connectivity to the Squiz Cloud storage subsystem.

We will post updates to StatusPage as soon as they become available. If you have any questions, please contact Squiz Support.


Posted Oct 30, 2020 - 00:52 AEDT
Investigating
Posted Oct 30, 2020 - 00:38 AEDT
This incident affected: Squiz Cloud Hosted Instances.