Partial Outage on EU-4
Incident Report for Platform.sh
Postmortem

On November 20, 2020, Platform.sh monitoring detected stuck cron jobs on a number of projects, which may have caused site issues. As we investigated, we discovered that this event had the potential to affect projects across multiple regions, most notably EU-4 and EU-2.

At 16:05 UTC, an incident was created and escalated to our internal teams for further troubleshooting. The issue was immediately traced to a previously deployed feature update that changed how cron tasks are handled.

At 16:06 UTC, a status page update was posted, and our team began work on a multi-phase approach that included, but was not limited to, the following:

  1. Disable the new cron feature in any affected region,
  2. Troubleshoot any stuck cron jobs and related issues for individual projects in that region,
  3. Re-enable existing cron services for all projects in the affected regions,
  4. Verify that the cron service for each region was fully restored.

Around 16:49 UTC, our team began executing the plan outlined above to restore services. Not all projects were affected during this event, and the impact varied among those that were. Services may have been restored for individual projects at different times throughout the evening as progress was made.

This incident was resolved for the EU-2 and EU-4 regions on November 21, 2020, at around 02:30 UTC, and was monitored until it was officially closed at 03:15 UTC.

Posted Dec 01, 2020 - 16:58 UTC

Resolved
This incident has been resolved.
Posted Nov 20, 2020 - 23:10 UTC
Update
We are continuing to work on this issue and are restoring scheduled crons. All sites should be online at this point. If you see an issue with your site, please open a new support ticket.
Posted Nov 20, 2020 - 21:31 UTC
Update
Impacted projects have been restored; however, cron scheduling continues to be affected. Our operations and engineering teams are continuing to work on a fix for this issue.
Posted Nov 20, 2020 - 19:42 UTC
Update
Our operations and engineering teams are still working on a fix for this issue.
Posted Nov 20, 2020 - 18:31 UTC
Update
We have identified the same issue affecting the DE-2 and EU-2 regions. This issue will not affect any projects that have not already been impacted, and we are working on recovering the affected ones.
Posted Nov 20, 2020 - 17:02 UTC
Identified
We have identified the issue, and we are working on the fix.
Posted Nov 20, 2020 - 16:12 UTC
Investigating
We have detected an issue affecting service in the EU-4 region. Our Operations team has been notified and is currently working to restore service. Projects in affected regions may experience site outages.
We will update you as soon as we have further information.
Posted Nov 20, 2020 - 16:08 UTC
This incident affected: Europe (Germany) (de-2.platform.sh), Europe (West 2) (eu-2.platform.sh), and Europe (West 4) (eu-4.platform.sh).