US region coordinator issue
Incident Report for Platform.sh
Postmortem

Platform.sh monitoring detected a sitedown on the US region hosted on Amazon Web Services infrastructure and launched the recovery procedure. Multiple other projects started to misbehave. The engineers identified the errors as being caused by an orchestration software reboot and a following incomplete recovery. To prevent more sites from being affected a full region wide recovery was re-initiated. The incident affected 1.5% of live projects on our Professional offering US region where actions such as deployments or snapshots were triggered while the region’s coordinator was in an incomplete recovery state. No enterprise clusters were affected by the outage. We have implemented a permanent fix for the issue.

Posted Mar 12, 2018 - 20:11 UTC

Resolved
This incident has been resolved.
Posted Mar 11, 2018 - 13:18 UTC
Monitoring
The incident is over. We are recovering the environments that had failed during deployments and/or snapshots.
Posted Mar 11, 2018 - 12:57 UTC
Update
Deployments and snapshots are unable to proceed on US region, please do not trigger such actions.
The incident within our coordinator is still ongoing. Our engineers have identified a possible cause and are investigating mitigations.
Posted Mar 11, 2018 - 12:52 UTC
Update
Deployments and snapshots are unable to proceed on US region, please do not trigger such actions.
The incident within our coordinator is still ongoing. Our engineers have identified a possible cause and are investigating mitigations
Posted Mar 11, 2018 - 12:27 UTC
Identified
The incident within our internal APIs is still ongoing. Our engineers have identified a possible cause and are investigating mitigations. We will update you as soon as we have confirmed the diagnosis and validated the mitigation method
Posted Mar 11, 2018 - 11:58 UTC
Investigating
We detected an issue within our coordinator. We are investigating currently.
Posted Mar 11, 2018 - 11:43 UTC