# Pitfall: Reactive Debugging Without Architecture Context
## The Anti-Pattern
Diagnosing CI/CD, deployment, or infrastructure failures by chasing error messages (fix error → next error → next error) instead of first consulting the architecture documentation to understand:

- What infrastructure actually exists
- What the deployment flow is designed to be
- What the machine's capabilities are
- What services should be running
## Incident: 2026-04-09
A full session was spent debugging CI pipeline failures reactively. The actual root causes were:

1. An expired ArgoCD token (5-minute fix)
2. Admin-UI pod in CrashLoopBackOff (missing `.next` build)
Instead of diagnosing from the architecture doc (`production-deployment-architecture.md`), the session produced:

- A failed self-contained DAST approach (commit + revert), which was unnecessary because DAST was already correctly configured to scan the deployed app
- An incorrect "needs dedicated runner" conclusion, when fort-knox-dev IS the dedicated 8-core/16-thread, 64 GB development environment
- Six deferrals of the `deploy:staging` fix as a "separate concern", when it was a 5-minute ArgoCD token refresh
## The Rule
BEFORE diagnosing any infrastructure issue:

1. Read `ge-ops/wiki/docs/development/standards/production-deployment-architecture.md`
2. Check the actual infrastructure state: `kubectl get pods`, service health endpoints
3. Verify the deployment target is running (admin-ui, orchestrator, etc.)
4. THEN look at the CI job logs
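Step 3 of the rule can be scripted so that it always runs before anyone opens a CI log. The helper below is a minimal sketch, not part of the documented tooling: the name `target_running` is hypothetical, and it assumes the default `kubectl get pods` table layout (a header row, then NAME, READY and STATUS as the first three columns).

```shell
#!/bin/sh
# Hypothetical helper (not part of the documented tooling).
# target_running PREFIX: read `kubectl get pods` table output on stdin and
# succeed only if some pod whose name starts with PREFIX has STATUS
# "Running" and every container ready (READY column "n/n", n > 0).
target_running() {
  awk -v p="$1" '
    NR == 1 { next }                  # skip the header row
    index($1, p) == 1 {               # pod name starts with the prefix
      split($2, r, "/")               # READY column, e.g. "1/1" or "0/1"
      if ($3 == "Running" && r[1] == r[2] && r[2] > 0) ok = 1
    }
    END { exit(ok ? 0 : 1) }          # unset ok means no healthy pod found
  '
}
```

For example, `kubectl get pods -n ge-system | target_running admin-ui || echo "admin-ui is down"` flags a crash-looping admin-ui before any CI log is read. Checking the READY ratio, not just STATUS, matters because a crash-looping pod can briefly report `Running` with `0/1` ready.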
## Verification Checklist
When a CI pipeline fails on deploy/integration/e2e:

- [ ] Is the target app running? (`kubectl get pods -n ge-system`)
- [ ] Is ArgoCD healthy? (`kubectl get pods -n argocd`)
- [ ] Is the ArgoCD token valid? (check `ARGOCD_AUTH_TOKEN` in CI variables)
- [ ] Is the app reachable? (`curl http://admin-ui.ge.internal/api/system/health`)
- [ ] Is DNS working from pods? (CoreDNS `coredns-custom` ConfigMap)
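The checklist can also be run as one script so every box is ticked in the same order each time. This is a minimal sketch under stated assumptions: `ARGOCD_SERVER` (the Argo CD API host) is a hypothetical variable, the `coredns-custom` ConfigMap is assumed to live in `kube-system` (the k3s convention), and the throwaway `dns-probe` pod and `busybox:1.36` image are illustrative. It uses `kubectl wait` rather than plain `kubectl get pods`, because listing pods succeeds even when they are crash-looping.

```shell
#!/bin/sh
# Sketch of the CI-failure checklist. Assumptions (not from the wiki):
#   - ARGOCD_SERVER is a hypothetical variable holding the Argo CD API host
#   - coredns-custom lives in kube-system (k3s convention)
#   - the dns-probe pod name and busybox image are illustrative

# check LABEL CMD...: run CMD silently, print a ticked or unticked box,
# and return non-zero when CMD fails.
check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "[x] $label"
  else
    echo "[ ] $label"
    return 1
  fi
}

run_checklist() {
  failed=0
  # Deployments must be Available; crash-looping pods keep this false.
  check "target app running (ge-system)" \
    kubectl wait --for=condition=Available deployment --all \
      -n ge-system --timeout=15s || failed=1
  check "ArgoCD pods Ready (argocd)" \
    kubectl wait --for=condition=Ready pod --all \
      -n argocd --timeout=15s || failed=1
  # An expired token should make this authenticated call fail with 401.
  check "ARGOCD_AUTH_TOKEN accepted" \
    curl -fsS --max-time 10 \
      -H "Authorization: Bearer ${ARGOCD_AUTH_TOKEN:-}" \
      "https://${ARGOCD_SERVER:-}/api/v1/applications" || failed=1
  check "admin-ui health endpoint reachable" \
    curl -fsS --max-time 10 \
      http://admin-ui.ge.internal/api/system/health || failed=1
  check "coredns-custom ConfigMap present (kube-system)" \
    kubectl get configmap coredns-custom -n kube-system || failed=1
  # Resolve the app's name from inside the cluster, then delete the pod.
  check "DNS resolves admin-ui.ge.internal from a pod" \
    kubectl run dns-probe --attach --rm --restart=Never \
      --image=busybox:1.36 -n ge-system -- nslookup admin-ui.ge.internal \
      || failed=1
  return "$failed"
}
```

Running `run_checklist` on fort-knox-dev prints one `[x]` or `[ ]` line per item, mirroring the markdown checklist; the first unticked box is where diagnosis starts, before the CI job log.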
## Key Facts (read, don't guess)
- fort-knox-dev = 8 cores / 16 threads, 64 GB RAM, 1 TB SSD
- fort-knox-dev IS the development/staging environment; there is no separate staging
- The CI runner runs ON this machine with `concurrent = 10`
- The Admin-UI deployment uses a hostPath mount (`/home/claude/ge-bootstrap/admin-ui` → `/app`)
- A `.next` build must exist on disk for `npm start` to work
- DAST scans the deployed app at `admin-ui.ge.internal`, not a self-built server
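Because the hostPath mount serves whatever is on disk, the `.next` requirement can be checked on the host before anyone inspects the pod. A minimal sketch: the function name `next_build_present` is hypothetical, and it relies on `next build` writing `.next/BUILD_ID`, the file `next start` needs in order to serve a production build. It also assumes that `npm start` wraps `next start` and that `package.json` defines a `build` script.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR holds a Next.js production build.
# `next build` writes .next/BUILD_ID; an empty or missing .next directory
# is not enough for `next start` to serve the app.
next_build_present() {
  [ -f "$1/.next/BUILD_ID" ]
}

# Example (path from the wiki; assumes a `build` script in package.json):
# next_build_present /home/claude/ge-bootstrap/admin-ui \
#   || (cd /home/claude/ge-bootstrap/admin-ui && npm run build)
```

Checking for `BUILD_ID` rather than the directory itself catches the case where `.next` exists but a build was interrupted.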
Captured from INC-20260409 session. Applies to all agents and future Claude sessions.