Sarah Chen
INC-4821 · P1 · api-gateway · prod · us-east
SLA · 26m
Sarah Chen
Hey, our API gateway is completely down: 5xx errors on every endpoint, and all customers in us-east are affected right now.
14:31
Luma Agent
Flagged as INC-4821 · P1 and raised in ServiceNow. Pulling signals from logs, recent deploys, and metrics; RCA is running now.
14:31
Luma Agent · RCA complete
Root cause identified
3 signals cross-correlated: git · a3f91c, Jira · JAM-1201, pod-3 logs
Deployment v2.4.1, rolled out at 14:28 UTC, introduced a null-pointer bug in the auth middleware. A memory leak is causing out-of-memory (OOM) crashes on 3 of 5 gateway pods.
Confidence: 92%
14:32
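To confirm the OOM part of this diagnosis independently, here is a minimal sketch: `kubectl get pods` with a JSONPath template can print each pod's name and the reason its container last terminated, and a small shell function can tally how many report `OOMKilled`. The `app=api-gateway` label selector and the `oom_summary` helper are hypothetical (the real pod labels aren't shown in this thread), and only the first container of each pod is checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<pod-name><TAB><last-termination-reason>" lines
# on stdin and prints how many pods last terminated with OOMKilled.
oom_summary() {
  awk -F'\t' '
    NF > 0 { total++ }            # count every non-empty pod line
    $2 == "OOMKilled" { oom++ }   # count pods whose last exit was an OOM kill
    END { printf "%d of %d pods last terminated with OOMKilled\n", oom, total }'
}

# Example invocation against a live cluster (label selector is an assumption):
#   kubectl get pods -l app=api-gateway -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}' | oom_summary
```

A result like "3 of 5" would match the RCA; a lower count would suggest some crashes have a different cause.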
Sarah Chen
What's the fastest fix? Need this back up immediately.
14:32
Luma Agent
Fastest path to recovery (estimated ~4 minutes):
Recovery runbook
1. Roll back with:
   kubectl rollout undo deployment/api-gateway --to-revision=142
2. Drain and restart the 3 affected pods in us-east.
3. Monitor the 5xx rate for 10 minutes before marking the incident resolved.
4. File a post-mortem in Jira and notify impacted enterprise accounts.
14:32
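Step 3 can be made mechanical. Here is a minimal sketch, assuming gateway access logs can be reduced to lines of `<epoch-seconds> <http-status>` (a hypothetical format; the real log layout isn't shown here). `five_xx_rate` reports the share of 5xx responses in the 10-minute window ending at a given time, and `ok_to_resolve` succeeds only when that share is at or below a threshold. Both function names and the 1% default threshold are illustrative assumptions, not part of the runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<epoch_seconds> <http_status>" lines on stdin and
# prints the percentage of 5xx responses with timestamps in (now-window, now].
# Prints "n/a" if no requests fall inside the window.
five_xx_rate() {
  local now="$1" window="${2:-600}"   # default window: 600 s = the runbook's 10 minutes
  awk -v now="$now" -v win="$window" '
    BEGIN { now += 0; win += 0 }      # force numeric comparisons below
    $1 > now - win && $1 <= now {
      total++
      if ($2 >= 500 && $2 <= 599) errors++
    }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.2f\n", 100 * errors / total
    }'
}

# Hypothetical gate: exit status 0 only if the 10-minute 5xx rate is at or
# below the threshold percentage (assumed default: 1).
ok_to_resolve() {
  local now="$1" threshold="${2:-1}" rate
  rate=$(five_xx_rate "$now" 600)
  [ "$rate" != "n/a" ] && awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !((r + 0) <= (t + 0)) }'
}
```

A single window only approximates "monitor for 10 minutes"; a stricter check would re-run the gate periodically over that period and require every run to pass.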