Partial Outage on EU-2
Incident Report for Platform.sh
Postmortem

This incident was caused by resource contention in our orchestration layer. One of the components had a memory leak which resulted in a host using large amounts of swap which put additional stress on the other services run on the host. As a result, some hosts rebooted as they lost their connection to the orchestration layer. This incident affected 16% of the EU-2 region hosts.

Fixes deployed during maintenance on Wednesday November 21st:

  • Split orchestration components on dedicated hosts
  • Fix for the memory leak in the coordinator process.
  • Fix for a bug in an upstream library used in the orchestration software.
  • Fix for a networking bug preventing project recovery after reallocation.

We are continuing to monitor the situation closely to ensure the issue does not present again.

Posted Nov 22, 2018 - 19:23 UTC

Resolved
This incident is resolved. All services are operating normally.
Posted Nov 18, 2018 - 19:09 UTC
Monitoring
We have recovered services for EU-2 and are monitoring its status.
Posted Nov 18, 2018 - 18:02 UTC
Identified
We have identified the cause of the outage and are recovering services.
Posted Nov 18, 2018 - 17:45 UTC
Investigating
We are currently investigating an outage affecting a small number of sites in the EU-2 region.
Posted Nov 18, 2018 - 17:25 UTC
This incident affected: Europe (West 2) (eu-2.platform.sh).