Service Outage detected

Incident Report for The Plugin People

Resolved

Larger app nodes appear to have chewed through the webhook backlog; mail is flowing.

Posted Jan 31, 2020 - 17:28 UTC

Monitoring

That was gnarly! We believe stability has now been restored - the en masse webhook updates from Jira have subsided and we're back to BAU. No code changes caused this or were required to fix it. We've upped the database size again to allow for increased capacity in similar situations in future, and work is underway to further improve resilience. Again, apologies to those affected - we believe no data was lost here. We continue to monitor.

Posted Jan 31, 2020 - 17:22 UTC

Update

We believe the cause of the problems is database connection exhaustion driven by repeated high-volume 'missed' webhooks from Jira Cloud. Every time a new node comes up, it's buried by events. We have some options to increase capacity, which we will apply soon.

Posted Jan 31, 2020 - 07:16 UTC

Update

Mail is being sent and received intermittently and slowly. UI access is erratic. Root cause is still elusive; we will continue in 8 hours - must sleep now.

Posted Jan 30, 2020 - 23:14 UTC

Update

Sorry for the long-running incident. We haven't made any code or infrastructure changes that could have caused this problem - it's data-driven. We are working to mitigate the problem and will post updates when we have them.

Posted Jan 30, 2020 - 20:26 UTC

Identified

The issue has been identified and a fix is being implemented.

Posted Jan 30, 2020 - 15:44 UTC

Investigating

We are currently investigating this issue.

Posted Jan 30, 2020 - 13:48 UTC

This incident affected: Enterprise Mail Handler for Jira Cloud (JEMHC) (Inbound mail processing, Outbound notification processing).