Unable to load Platform
Incident Report for Ellingham Innovations Ltd
Postmortem

A short period of downtime occurred during a deployment because a database migration failed. This affected all Platform services. The incident was detected quickly and resolved by a second, hotfix deployment containing a corrected migration.

Issues like this are rare. The timing of the incident and the speed of resolution limited the number of failures on the Platform.

No data was lost and everything is secure.

An investigation is underway to determine why this issue was not detected before or during deployment. The deployment returned a success status for every service, which is why it was not cancelled.

When a deployment fails, our normal process is to cancel it immediately and remove all affected servers from the load balancer. This prevents any downtime and gives us time to investigate. Additional checks need to be added to the deployment pipeline to help prevent issues like this in the future.
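
As an illustration of the kind of check described above, the following Python sketch shows a deployment gate that does not rely on the deployment's own success status alone. It is not our actual pipeline: `LoadBalancer`, `DeployResult`, `deploy`, and the three callbacks are hypothetical names, and the sketch assumes that the migration runs before any server is updated and that every server touched so far counts as "affected".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoadBalancer:
    """Hypothetical in-memory stand-in for a load balancer pool."""
    servers: List[str] = field(default_factory=list)

    def remove(self, server: str) -> None:
        # Removing a server that is not in the pool is a no-op.
        if server in self.servers:
            self.servers.remove(server)


@dataclass
class DeployResult:
    succeeded: bool
    removed_servers: List[str]
    reason: str = ""


def deploy(
    servers: List[str],
    lb: LoadBalancer,
    run_migration: Callable[[], bool],
    deploy_to_server: Callable[[str], bool],
    health_check: Callable[[str], bool],
) -> DeployResult:
    """Deploy to each server, cancelling at the first failed check."""
    touched: List[str] = []  # servers the deployment has reached so far

    def cancel(reason: str) -> DeployResult:
        # Cancel the deployment: pull every affected server out of the
        # load balancer so it stops receiving traffic.
        for server in touched:
            lb.remove(server)
        return DeployResult(False, list(touched), reason)

    # Assumption: the migration is verified explicitly, before any server
    # is updated, rather than inferred from the deployment status.
    if not run_migration():
        return cancel("migration failed")

    for server in servers:
        touched.append(server)
        if not deploy_to_server(server):
            return cancel(f"deployment failed on {server}")
        # Extra check: a "successful" deployment must also pass a
        # health check before the pipeline moves on.
        if not health_check(server):
            return cancel(f"health check failed on {server}")

    return DeployResult(True, [], "")
```

In this sketch, a server that reports a successful deployment but then fails its health check still cancels the deployment, and only servers the deployment has not yet reached stay in the load balancer.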

Posted Oct 24, 2020 - 04:23 UTC

Resolved
A hotfix was applied and the migration has run successfully. All services are back up.
Posted Oct 24, 2020 - 04:13 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Oct 24, 2020 - 04:11 UTC
Identified
There has been a database outage due to a failed migration. We are working on the situation. No data has been lost and everything is secure.
Posted Oct 24, 2020 - 04:10 UTC
This incident affected: Ellingham Platform (Platform Sites, Platform Emails, Reporting Services, Mobile Backend API).