Infrastructure Overview¶
Host: fort-knox-dev — Ubuntu 24.04, single-node k3s v1.34.3, IP 192.168.1.85
Codebase: /home/claude/ge-bootstrap/ on the host filesystem
Port assignments: Source of truth is config/ports.yaml — NEVER hardcode ports
Kubernetes Namespaces¶
| Namespace | Purpose |
|---|---|
| `ge-agents` | ge-orchestrator (2 replicas), shared executor (2 replicas, HPA 2-5), completion service, GitLab bridge |
| `ge-system` | Admin UI (Next.js), PostgreSQL, Redis, Vault |
| `ge-monitoring` | Grafana, Loki, Promtail |
| `ge-wiki` | MkDocs documentation wiki |
| `ge-ingress` | Traefik ingress controller (2 replicas) |
| `ge-hosting` | Client-facing sites (landing page, www.growing-europe.com) |
| `ge-gitlab` | Self-hosted GitLab (CI/CD). Owner: Gerco |
Core Services¶
ge-system¶
- Admin UI: Next.js, SSOT control plane. External: `office.growing-europe.com`
- PostgreSQL: Admin UI database, SSOT for all agent/task/discussion data
- Redis: Port defined in `config/ports.yaml`. Redis Streams for agent triggers. StatefulSet `redis-0`
- Vault: HashiCorp Vault for all secrets. Auto-unseals via CronJob every 2min
ge-agents¶
- ge-orchestrator: Event-driven HA routing (2 replicas, consumer group failover). Routes work via Redis Streams. NEVER makes LLM calls. See Orchestrator.
- Shared Executor (`ge-executor`): 2 replicas (HPA: 2-5). Runs ALL 54 active agents via PTY capture
- Completion Service: Processes completion signals
- GitLab Bridge: 2 replicas, bridges GitLab CI events to agent system
- Per-agent deployments: ALL scaled to 0. Legacy. All work goes through shared executor
All monitoring agents active (ron, annegreet, mira, eltjo re-enabled 2026-02-15 after hook loop fix). See agent-system pitfalls.
Execution Flow¶
TaskService → Redis XADD to triggers.{agent} → executor consumer group picks up
→ fetches agent config from admin-ui API → loads identity (CORE + ROLE + REFERENCE)
→ builds prompt (constitution + identity + task + JIT learnings) → provider CLI executes via PTY
→ completion file written → host cron syncs to DB
(XADD to ge:work:completed is PLANNED but not yet implemented)
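The naming and prompt-assembly steps of this flow can be sketched in plain Python. This is a minimal illustration, not the executor's actual code: the `Identity` class, the `build_prompt` signature, and the blank-line separators between sections are assumptions; only the `triggers.{agent}` key pattern and the ordering (constitution, then CORE + ROLE + REFERENCE, then task, then JIT learnings) come from the flow above.

```python
from dataclasses import dataclass


def trigger_stream(agent: str) -> str:
    """Return the per-agent trigger stream key, e.g. 'triggers.ron'."""
    return f"triggers.{agent}"


@dataclass
class Identity:
    """Hypothetical container for the three identity files."""
    core: str
    role: str
    reference: str

    def render(self) -> str:
        # Identity is loaded as CORE + ROLE + REFERENCE, in that order.
        return "\n\n".join([self.core, self.role, self.reference])


def build_prompt(constitution: str, identity: Identity, task: str,
                 learnings: list[str]) -> str:
    """Assemble constitution + identity + task + JIT learnings.

    The separator and the handling of empty learnings are assumptions.
    """
    sections = [constitution, identity.render(), task]
    if learnings:
        sections.append("\n".join(learnings))
    return "\n\n".join(sections)
```

A caller would publish the task to `trigger_stream(agent)` and hand the string returned by `build_prompt` to the provider CLI.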
Deploy code changes: Always rebuild executor image. See ge-ops/infrastructure/local/k3s/executor/build-executor.sh. NEVER use kubectl cp.
Executor Probe Configuration (2026-02-15)¶
The executor's k8s probes are tuned for long-running agent sessions (1-5 minutes):
- Startup probe: HTTP `/health`, 12 attempts x 10s = 120s max startup
- Liveness probe: `exec kill -0 1` (checks PID 1 is alive), period=30s, failureThreshold=6 (180s tolerance). Uses exec instead of HTTP because the health server's asyncio event loop blocks during PTY agent execution, so HTTP liveness probes would time out and healthy pods would be killed.
- Readiness probe: HTTP `/ready`, period=10s, failureThreshold=3
Source of truth: k8s/base/agents/executor.yaml
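The tolerance figures quoted for the probes follow from period x failureThreshold. The sketch below reproduces that arithmetic; the `Probe` class and `tolerances` helper are illustrative names only, and the calculation ignores `timeoutSeconds` and `initialDelaySeconds`, which are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    period_s: int
    failure_threshold: int

    @property
    def tolerance_s(self) -> int:
        # Approximate span of consecutive failures before Kubernetes acts.
        return self.period_s * self.failure_threshold


# Values taken from the probe list above.
PROBES = [
    Probe("startup", period_s=10, failure_threshold=12),
    Probe("liveness", period_s=30, failure_threshold=6),
    Probe("readiness", period_s=10, failure_threshold=3),
]


def tolerances() -> dict[str, int]:
    """Map each probe name to its failure tolerance in seconds."""
    return {p.name: p.tolerance_s for p in PROBES}


if __name__ == "__main__":
    for name, seconds in tolerances().items():
        print(f"{name}: {seconds}s")
```

Under these assumptions the liveness window (180s) is shorter than the longest agent session (up to 5 minutes), which is workable only because `kill -0 1` does not depend on the blocked event loop.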
Consumer Group Configuration¶
Each executor pod uses Redis consumer groups for trigger delivery:
- Group name: executor-group (shared across all pods)
- Consumer name: $POD_NAME (unique per pod, from k8s downward API)
- On startup: claims and ACKs orphaned messages from dead consumers (pods killed by restarts)
- Dedup: exec_dedup:{work_item_id} Redis key with 5-min TTL prevents double execution
Source of truth: ge_agent/listener.py
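One common way to implement a dedup key like this is an atomic "set if absent, with expiry" write. The sketch below assumes that approach; it is not taken from `ge_agent/listener.py`. `try_claim` accepts any client whose `set(name, value, nx=..., ex=...)` behaves like redis-py's, and `InMemoryKV` is a hypothetical stand-in so the example runs without a Redis server.

```python
import time

DEDUP_TTL_SECONDS = 300  # 5-minute TTL from the configuration above


class InMemoryKV:
    """Tiny stand-in for the subset of Redis SET used here (NX + EX)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # key -> absolute expiry time

    def set(self, name, value, nx=False, ex=None):
        now = self._clock()
        expires_at = self._expiry.get(name)
        alive = expires_at is not None and expires_at > now
        if nx and alive:
            return None  # redis-py returns None when NX blocks the write
        self._expiry[name] = (now + ex) if ex is not None else float("inf")
        return True


def dedup_key(work_item_id: str) -> str:
    return f"exec_dedup:{work_item_id}"


def try_claim(client, work_item_id: str, consumer: str) -> bool:
    """Return True only for the first caller within the TTL window."""
    result = client.set(dedup_key(work_item_id), consumer,
                        nx=True, ex=DEDUP_TTL_SECONDS)
    return bool(result)
```

With a real `redis.Redis` client passed in place of `InMemoryKV`, a pod that receives `False` would skip execution and acknowledge the message; that handling is likewise an assumption.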
Network & DNS¶
External Domains (Traefik + Let's Encrypt)¶
| Domain | Service |
|---|---|
| `office.growing-europe.com` | Admin UI |
| `wiki.growing-europe.com` | MkDocs Wiki |
| `grafana.growing-europe.com` | Grafana |
| `gitlab.growing-europe.com` | GitLab |
| `www.growing-europe.com` | Company website |
Internal Service DNS¶
| DNS Name | Service |
|---|---|
| `redis.ge-system.svc.cluster.local` | Redis (port from config/ports.yaml) |
| `vault.ge-system.svc.cluster.local` | Vault (8200) |
| `admin-ui.ge-system.svc.cluster.local` | Admin UI (80) |
| `wiki.ge-wiki.svc.cluster.local` | Wiki (80) |
Known Issues¶
See Infrastructure Pitfalls for ClusterIP, DNS, and deployment gotchas.
Agent System (56 registered, 54 active)¶
By provider: Claude (51), OpenAI/Codex (4: margot, benjamin, jouke, dinand), Gemini (1: felice)
Source of truth: ge-ops/master/AGENT-REGISTRY.json
Identity files: ge-ops/master/agent-configs/{name}/IDENTITY-CORE.md, IDENTITY-ROLE.md, IDENTITY-REFERENCE.md
Agent learnings: Primary: ge-ops/master/agent-configs/{name}/LEARNINGS.md, fallback: ge-ops/agents/{name}/LEARNINGS.md (per ge_agent/identity/loader.py)
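The primary/fallback lookup for learnings can be sketched as follows. This is an illustration of the lookup order only, not the code in `ge_agent/identity/loader.py`; the function names and the `repo_root` parameter are assumptions, and the paths are taken to be relative to the repository root.

```python
from pathlib import Path
from typing import Optional


def learnings_candidates(repo_root: Path, agent: str) -> list[Path]:
    """Return candidate LEARNINGS.md paths, primary first."""
    return [
        repo_root / "ge-ops" / "master" / "agent-configs" / agent / "LEARNINGS.md",
        repo_root / "ge-ops" / "agents" / agent / "LEARNINGS.md",
    ]


def resolve_learnings(repo_root: Path, agent: str) -> Optional[Path]:
    """Return the first existing learnings file, or None if neither exists."""
    for candidate in learnings_candidates(repo_root, agent):
        if candidate.is_file():
            return candidate
    return None
```

For example, `resolve_learnings(Path("/home/claude/ge-bootstrap"), "ron")` would return the primary file when it is present and fall back to the `ge-ops/agents/` copy otherwise.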
Secrets Management¶
All secrets in Vault at vault.ge-system.svc.cluster.local:8200. Vault unseal keys at /home/claude/ge-bootstrap/vault.keys (host file — NEVER move this).
Key k8s secrets: anthropic-api-key, internal-api-token, vault-bootstrap, admin-ui-app-secrets, admin-ui-postgres-credentials.