Console offline

Incident Report for Platform.sh

Postmortem

What happened

An issue during an upgrade to our API caused parts of it to go offline. The containers and services on one node of the cluster were not accessible, which caused intermittent failures on some requests to parts of the API.

Customer Impact

The Console and UI were intermittently offline for approximately 3 hours. There was no impact on production site availability.

What was done to resolve the incident

The issue was limited to one node of the cluster, and that node was disabled to fix the immediate customer-facing issues. The underlying issue was then resolved, and the node was returned to service.

Posted Nov 14, 2023 - 22:53 UTC

Resolved

This incident has been resolved.

Posted Nov 14, 2023 - 22:35 UTC

Update

Our Teams will continue to closely monitor the Organizations API cluster. Our next update will follow in 24 hours.

Posted Nov 14, 2023 - 14:03 UTC

Update

Our team has restored service to all nodes in the cluster. We are monitoring the Organizations API cluster and the console.

Posted Nov 14, 2023 - 12:10 UTC

Monitoring

The Engineering Team have shifted the traffic to the working nodes in the cluster and are troubleshooting the issue.

We are monitoring the Organizations API cluster and the console.

Posted Nov 14, 2023 - 11:12 UTC

Identified

We are seeing failures on the Organizations API cluster and our Engineering team is working to correct the problem.

Posted Nov 14, 2023 - 09:49 UTC

Monitoring

Our Team have restored service to the Organizations API cluster and the console is now responding. We are monitoring the Organizations API cluster and the console.

Posted Nov 14, 2023 - 08:21 UTC

Update

Our Teams are still working on this issue. We have engaged our Engineering Team to assist with the troubleshooting process.

Posted Nov 14, 2023 - 08:12 UTC

Identified

Our Operations Team have performed an upgrade on our Organizations API cluster and have encountered an issue.

At this time, the console and the API are not responding correctly to all calls.

We are working with the team to bring this endpoint back online.

Posted Nov 14, 2023 - 07:20 UTC

Investigating

We are currently investigating this issue.

Posted Nov 14, 2023 - 07:01 UTC

This incident affected: Accounts (Console).