Outage on Europe (West 2)

Incident Report for Platform.sh

Postmortem

On November 20, 2020, Platform.sh monitoring detected stuck cron jobs on a number of projects, which may have caused site issues. As we investigated, we found that the event could affect projects across multiple regions, most notably EU-4 and EU-2.

At 16:05 UTC, an incident was created and escalated to our internal teams for further troubleshooting. The teams quickly traced the issues to a previously deployed feature update to the handling of cron tasks.

At 16:06 UTC, a status page update was posted, and our team began work on a multi-phased approach that included, but was not limited to, the following:

Disable the new cron feature on any affected region,
Troubleshoot any stuck cron jobs and any related issues for individual projects in that region,
Re-enable existing cron services for all projects in the affected regions,
Verify the cron service for each region was fully restored.
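
As an illustration only, the four phases above can be sketched as an ordered procedure. This is a minimal sketch, not Platform.sh tooling: the `Region` class, its fields, and the `remediate` function are hypothetical names invented for this example, and the real work on each phase was done by our teams rather than by a single script.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Region:
    """Hypothetical model of a region's cron state (illustrative only)."""
    name: str
    new_cron_enabled: bool = True
    existing_cron_enabled: bool = False
    # Maps a project ID to the IDs of its stuck cron jobs.
    stuck_jobs: Dict[str, List[str]] = field(default_factory=dict)


def remediate(region: Region) -> List[str]:
    """Run the four phases in order and return a log of the actions taken."""
    log = []

    # Phase 1: disable the new cron feature in the affected region.
    region.new_cron_enabled = False
    log.append(f"{region.name}: new cron feature disabled")

    # Phase 2: clear stuck cron jobs project by project.
    for project, jobs in sorted(region.stuck_jobs.items()):
        log.append(f"{region.name}: cleared {len(jobs)} stuck job(s) on {project}")
        jobs.clear()

    # Phase 3: re-enable the existing cron service for all projects.
    region.existing_cron_enabled = True
    log.append(f"{region.name}: existing cron service re-enabled")

    # Phase 4: verify that the region is fully restored before moving on.
    restored = (
        not region.new_cron_enabled
        and region.existing_cron_enabled
        and not any(region.stuck_jobs.values())
    )
    if not restored:
        raise RuntimeError(f"{region.name}: cron service not fully restored")
    log.append(f"{region.name}: cron service verified")
    return log
```

The ordering matters in this sketch: the new feature is switched off before stuck jobs are cleared, so that no new jobs get stuck while cleanup is in progress, and verification runs last so a region is only considered restored once every earlier phase has completed.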

Around 16:49 UTC, our team began executing the plan outlined above to restore services. Not all projects were affected, and those that were may have experienced varying impact. Services may have been restored for individual projects at different times throughout the evening as work progressed.

This incident was resolved for the EU-2 and EU-4 regions on November 21, 2020, at around 02:30 UTC, and was monitored until it was officially closed at 03:15 UTC.

Posted Dec 01, 2020 - 16:58 UTC

Resolved

This incident is now resolved. If you are still experiencing issues, please raise a support ticket at https://support.platform.sh/

Posted Nov 21, 2020 - 03:15 UTC

Update

We are continuing to monitor for any further issues.

Posted Nov 21, 2020 - 02:36 UTC

Monitoring

The issue with crons has been resolved and projects should now be fully functional. If you are still experiencing issues, please raise a support ticket at https://support.platform.sh/

Posted Nov 21, 2020 - 02:36 UTC

Update

Sites in the region are now responding. Our Operations and Engineering teams are currently working on an issue with the cron system. Further updates will be provided once the issue with crons is resolved.

Posted Nov 21, 2020 - 01:11 UTC

Identified

Our Operations and Engineering teams have identified the problem and are working to restore services.

Posted Nov 21, 2020 - 00:01 UTC

Update

We have found that a small number of projects in the AU region are affected as well. We're continuing our investigation.

Posted Nov 20, 2020 - 22:40 UTC

Update

We are continuing to investigate this issue. We will post an update as soon as we have one.

Posted Nov 20, 2020 - 22:33 UTC

Investigating

We have detected an issue affecting service on the EU-2 region. We are currently working to restore service.
This issue affects multiple production sites as well as development environments. Access to your site and project UI, as well as Git and SSH access, may be affected.

This outage does not affect Dedicated Enterprise Clusters.

We will update you as soon as we have further information.

Posted Nov 20, 2020 - 21:45 UTC

This incident affected: Australia (au.platform.sh) and Europe (West 2) (eu-2.platform.sh).