Root cause
As part of our SOC 2 journey we were ‘hardening’ the network connectivity of the app's internal infrastructure. Whilst we were cleaning up and documenting the IP connectivity within the app, a change was made that broke outbound connectivity via NAT. This caused a cascading failure across all nodes, which took some time to track down.
Lessons learned
Our Security Group rules needed to be fully documented. That work is now under way.
Impact
We expect no data loss from this outage. Events will be retried (from your Jira to JEMHC), and no inbound mail will have been lost.