EU-2 Orchestration system
Incident Report for Platform.sh
Postmortem

This incident was caused by resource contention in our orchestration layer. One of its components had a memory leak that caused a host to use large amounts of swap, which put additional stress on the other services running on that host. As a result, several hosts rebooted after losing their connection to the orchestration layer. This incident affected one third of the hosts in the EU-2 region.

The following fixes were deployed during maintenance on Wednesday, November 21st:

  • Split orchestration components onto dedicated hosts.
  • Fix for the memory leak in the coordinator process.
  • Fix for a bug in an upstream library used in the orchestration software.
  • Fix for a networking bug preventing project recovery after reallocation.

We are continuing to monitor the situation closely to ensure the issue does not recur.

Posted Nov 22, 2018 - 19:34 UTC

Resolved
This incident has been resolved.
Posted Nov 20, 2018 - 03:16 UTC
Update
We are continuing to monitor services in the EU-2 region. Please open a support ticket if you are still experiencing issues.
Posted Nov 20, 2018 - 02:48 UTC
Monitoring
All services have been restored. If your site is not back up, please redeploy or create a support ticket.
Posted Nov 20, 2018 - 02:11 UTC
Identified
Services are recovering.
Posted Nov 20, 2018 - 01:55 UTC
Investigating
We are currently investigating an outage affecting the orchestration system in the Platform.sh EU-2 region.
Live and dev sites are affected.
Posted Nov 20, 2018 - 01:45 UTC
This incident affected: Europe (West 2) (eu-2.platform.sh).