Conversations
Active · 4
SC
Sarah Chen · now
API gateway completely down…
P1 · RCA active
MH
Marcus Hill · 3m
Redis OOM in eu-west cluster
P2
Dev Team Bot · 18m
CI pipeline failing on main
P3
PN
Priya Nair · 41m
Slack webhooks timing out
P3
Resolved · 9
JT
Jake Torres · 1h
DB connection pool exhausted
Resolved
OA
Ops Alert · 2h
Memory spike on checkout
Resolved
AL
Alex Liu · 3h
CDN cache invalidation delay
Resolved
SC
Sarah Chen
INC-4821 · P1 · api-gateway · prod · us-east
SLA · 26m
SC
Hey, our API gateway is completely down. 5xx errors on every endpoint — all customers in us-east are affected right now.
14:31
Luma Agent
Flagged as INC-4821 · P1 and raised in ServiceNow. Pulling signals from logs, recent deploys, and metrics — RCA running now.
14:31
Luma Agent · RCA complete
Root cause identified
3 signals cross-correlated
git · a3f91c
Jira · JAM-1201
pod-3 logs
Deployment v2.4.1 at 14:28 UTC introduced a null pointer in the auth middleware. Memory leak is causing OOM crashes on 3 of 5 gateway pods.
Confidence
92%
14:32
SC
What's the fastest fix? Need this back up immediately.
14:32
Luma Agent
Fastest path to recovery — estimated ~4 minutes:
Recovery runbook
1. Roll back with kubectl rollout undo deployment/api-gateway --to-revision=14
2. Drain and restart the 3 affected pods in us-east
3. Monitor 5xx rate for 10 minutes before marking resolved
4. File post-mortem in Jira and notify impacted enterprise accounts
14:32
Show rollback command · Post to Slack · Escalate to L2 · Notify customers
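Steps 1 and 3 of the runbook above could be scripted roughly as follows. This is a minimal sketch, not the agent's actual implementation: the helper names `rollback_command`, `Sample`, and `safe_to_resolve` are hypothetical, and the 1% error-rate threshold is an illustrative assumption that the runbook does not specify.

```python
from dataclasses import dataclass
from typing import List, Sequence


def rollback_command(deployment: str, revision: int) -> List[str]:
    """Build the kubectl argument list for step 1 of the runbook."""
    return [
        "kubectl", "rollout", "undo",
        f"deployment/{deployment}",
        f"--to-revision={revision}",
    ]


@dataclass
class Sample:
    minute: int       # minutes elapsed since the rollback finished
    total: int        # requests observed during that minute
    errors_5xx: int   # of those, responses with a 5xx status


def safe_to_resolve(samples: Sequence[Sample],
                    window_minutes: int = 10,
                    max_error_rate: float = 0.01) -> bool:
    """Step 3: True only if every minute in the window is present and healthy.

    max_error_rate (1%) is an assumed threshold, not a value from the runbook.
    """
    by_minute = {s.minute: s for s in samples}
    for minute in range(window_minutes):
        s = by_minute.get(minute)
        if s is None or s.total == 0:
            return False  # missing data is treated as "not yet safe"
        if s.errors_5xx / s.total > max_error_rate:
            return False
    return True
```

The list returned by `rollback_command("api-gateway", 14)` can be passed to `subprocess.run(cmd, check=True)`. Treating a missing minute as unhealthy keeps the incident open until the full 10-minute window has been observed.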
Incident
ID: INC-4821
Priority: P1 Critical
Service: api-gateway
Environment: prod · us-east
Opened: 4 min ago
SLA breach: in 26 min
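The two timing fields are consistent with a 30-minute SLA window (4 minutes elapsed plus 26 remaining). The sketch below shows that arithmetic; the 30-minute default is inferred from those two numbers rather than stated anywhere in the panel, and `sla_remaining` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def sla_remaining(opened_at: datetime,
                  now: datetime,
                  sla_window: timedelta = timedelta(minutes=30)) -> timedelta:
    """Time left before the SLA is breached; negative once it has been breached.

    The 30-minute default is an assumption inferred from "4 min ago" + "in 26 min".
    """
    return (opened_at + sla_window) - now


# Placeholder date; only the 14:31 opening time comes from the chat log,
# and treating it as UTC is an assumption.
opened = datetime(2024, 1, 1, 14, 31, tzinfo=timezone.utc)
now = opened + timedelta(minutes=4)
remaining = sla_remaining(opened, now)
print(int(remaining.total_seconds() // 60), "min until SLA breach")
```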
Linked items
JAM-1201
Deploy v2.4.1 to prod
a3f91c · main
fix: auth middleware null check
INC0041023
ServiceNow · auto-created
#incidents
3 messages · 2 min ago
Agent insights
Similar incident
INC-4710 — same pattern 18 days ago. Resolved in 6 min via rollback.
Impact
~2,400 active sessions. 6 enterprise accounts in us-east affected.
Error rate
97.3% of requests failing since 14:28 UTC — correlates with deploy time.