Infrastructure Overview

Host: fort-knox-dev (Ubuntu 24.04, single-node k3s v1.34.3, IP 192.168.1.85)

Codebase: /home/claude/ge-bootstrap/ on the host filesystem

Port assignments: the source of truth is config/ports.yaml. NEVER hardcode ports.
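
As a rough illustration of looking ports up instead of hardcoding them, the sketch below parses a flat `name: port` mapping. The schema of config/ports.yaml is not shown on this page, so the flat layout and the helper names `parse_flat_ports` and `get_port` are assumptions; a real loader would more likely use a YAML library.

```python
def parse_flat_ports(text: str) -> dict[str, int]:
    """Parse lines of the form `name: 1234` into a dict (hypothetical flat schema)."""
    ports: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, _, value = line.partition(":")
        ports[name.strip()] = int(value.strip())
    return ports


def get_port(ports: dict[str, int], service: str) -> int:
    """Fail loudly rather than fall back to a hardcoded default."""
    if service not in ports:
        raise KeyError(f"no port configured for {service!r} in config/ports.yaml")
    return ports[service]
```

Raising on a missing key, rather than returning a default, keeps config/ports.yaml as the only place a port number can come from.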

Kubernetes Namespaces

Namespace      Purpose
ge-agents      ge-orchestrator (2 replicas), shared executor (2 replicas, HPA 2-5), completion service, GitLab bridge
ge-system      Admin UI (Next.js), PostgreSQL, Redis, Vault
ge-monitoring  Grafana, Loki, Promtail
ge-wiki        MkDocs documentation wiki
ge-ingress     Traefik ingress controller (2 replicas)
ge-hosting     Client-facing sites (landing page, www.growing-europe.com)
ge-gitlab      Self-hosted GitLab (CI/CD). Owner: Gerco

Core Services

ge-system

  • Admin UI — Next.js, SSOT control plane. External: office.growing-europe.com
  • PostgreSQL — Admin UI database, SSOT for all agent/task/discussion data
  • Redis — Port defined in config/ports.yaml. Redis Streams for agent triggers. StatefulSet redis-0
  • Vault — HashiCorp Vault for all secrets. Auto-unseals via CronJob every 2min

ge-agents

  • ge-orchestrator — Event-driven HA routing (2 replicas, consumer group failover). Routes work via Redis Streams. NEVER makes LLM calls. See Orchestrator.
  • Shared Executor (ge-executor) — 2 replicas (HPA: 2-5). Runs ALL 54 active agents via PTY capture
  • Completion Service — Processes completion signals
  • GitLab Bridge — 2 replicas, bridges GitLab CI events to agent system
  • Per-agent deployments — ALL scaled to 0. Legacy. All work goes through shared executor

All monitoring agents active (ron, annegreet, mira, eltjo re-enabled 2026-02-15 after hook loop fix). See agent-system pitfalls.

Execution Flow

TaskService → Redis XADD to triggers.{agent} → executor consumer group picks up
→ fetches agent config from admin-ui API → loads identity (CORE + ROLE + REFERENCE)
→ builds prompt (constitution + identity + task + JIT learnings) → provider CLI executes via PTY
→ completion file written → host cron syncs to DB
(XADD to ge:work:completed is PLANNED but not yet implemented)
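
The ordering in the flow above can be sketched in plain Python. The stream name `triggers.{agent}` and the section order come from the text; the class and function names, and the choice to join sections with blank lines, are assumptions made for illustration. The real executor reads from Redis and the admin-ui API, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Identity:
    core: str
    role: str
    reference: str

    def render(self) -> str:
        # Identity is loaded as CORE + ROLE + REFERENCE, in that order.
        return "\n\n".join([self.core, self.role, self.reference])


def trigger_stream(agent: str) -> str:
    """Name of the Redis stream that receives XADD triggers for one agent."""
    return f"triggers.{agent}"


def build_prompt(constitution: str, identity: Identity, task: str, learnings: list[str]) -> str:
    """Assemble the prompt as constitution + identity + task + JIT learnings."""
    parts = [constitution, identity.render(), task]
    parts.extend(learnings)
    # Skip empty sections so the prompt has no stray blank blocks (an assumption).
    return "\n\n".join(p for p in parts if p)
```

For example, `trigger_stream("ron")` yields `triggers.ron`, the stream the executor consumer group would read for that agent.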

Deploy code changes: Always rebuild executor image. See ge-ops/infrastructure/local/k3s/executor/build-executor.sh. NEVER use kubectl cp.

Executor Probe Configuration (2026-02-15)

The executor's k8s probes are tuned for long-running agent sessions (1-5 minutes):

  • Startup probe: HTTP /health, 12 attempts x 10s = 120s max startup
  • Liveness probe: exec kill -0 1 (checks PID 1 is alive), period=30s, failureThreshold=6 (180s tolerance). Uses exec instead of HTTP because the health server's asyncio event loop blocks during PTY agent execution, so HTTP probes would time out and healthy pods would be killed.
  • Readiness probe: HTTP /ready, period=10s, failureThreshold=3

Source of truth: k8s/base/agents/executor.yaml
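
The tolerance figures quoted for the probes follow from period × failureThreshold. The short sketch below reproduces that arithmetic. The startup and liveness values are taken from the text, the readiness window is derived the same way, and the `Probe` class is a hypothetical helper, not part of the executor. It ignores initialDelaySeconds and timeoutSeconds, so treat the results as approximations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    period_seconds: int
    failure_threshold: int

    @property
    def tolerance_seconds(self) -> int:
        # Approximate window of consecutive failures before Kubernetes acts.
        return self.period_seconds * self.failure_threshold


STARTUP = Probe(period_seconds=10, failure_threshold=12)
LIVENESS = Probe(period_seconds=30, failure_threshold=6)
READINESS = Probe(period_seconds=10, failure_threshold=3)

if __name__ == "__main__":
    for name, probe in [("startup", STARTUP), ("liveness", LIVENESS), ("readiness", READINESS)]:
        print(f"{name}: {probe.tolerance_seconds}s")
```

The windows work out to 120s for startup, 180s for liveness, and 30s for readiness.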

Consumer Group Configuration

Each executor pod uses Redis consumer groups for trigger delivery:

  • Group name: executor-group (shared across all pods)
  • Consumer name: $POD_NAME (unique per pod, from k8s downward API)
  • On startup: claims and ACKs orphaned messages from dead consumers (pods killed by restarts)
  • Dedup: exec_dedup:{work_item_id} Redis key with 5-min TTL prevents double execution

Source of truth: ge_agent/listener.py
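
One common way to get the dedup behaviour described above is an atomic "set if absent, with expiry" write. The sketch below assumes a redis-py style client whose `set(..., nx=True, ex=...)` returns a truthy value when the key was created and `None` when it already existed. Whether ge_agent/listener.py does exactly this is not stated here, and `try_claim` and `FakeRedis` are hypothetical names used only so the example runs without a Redis server.

```python
DEDUP_TTL_SECONDS = 5 * 60  # 5-minute TTL from the text


def dedup_key(work_item_id: str) -> str:
    return f"exec_dedup:{work_item_id}"


def try_claim(client, work_item_id: str, consumer: str) -> bool:
    """Return True only for the first caller within the TTL window."""
    created = client.set(dedup_key(work_item_id), consumer, nx=True, ex=DEDUP_TTL_SECONDS)
    return bool(created)


class FakeRedis:
    """In-memory stand-in for a Redis client (does not implement expiry)."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # key exists: another consumer already claimed it
        self.store[key] = value
        return True
```

With this shape, a pod that receives the same work item a second time sees `try_claim` return False and can ACK the message without executing it again.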

Network & DNS

External Domains (Traefik + Let's Encrypt)

Domain                      Service
office.growing-europe.com   Admin UI
wiki.growing-europe.com     MkDocs Wiki
grafana.growing-europe.com  Grafana
gitlab.growing-europe.com   GitLab
www.growing-europe.com      Company website

Internal Service DNS

DNS Name                              Service
redis.ge-system.svc.cluster.local     Redis (port from config/ports.yaml)
vault.ge-system.svc.cluster.local     Vault (8200)
admin-ui.ge-system.svc.cluster.local  Admin UI (80)
wiki.ge-wiki.svc.cluster.local        Wiki (80)
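
The names above follow the standard Kubernetes pattern `<service>.<namespace>.svc.<cluster-domain>`. A small helper makes the pattern explicit. `service_dns` is a hypothetical name, and the sketch assumes the default cluster domain `cluster.local`, which matches the entries listed.

```python
def service_dns(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    print(service_dns("redis", "ge-system"))
    print(service_dns("wiki", "ge-wiki"))
```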

Known Issues

See Infrastructure Pitfalls for ClusterIP, DNS, and deployment gotchas.

Agent System (56 registered, 54 active)

By provider: Claude (51), OpenAI/Codex (4: margot, benjamin, jouke, dinand), Gemini (1: felice)

Source of truth: ge-ops/master/AGENT-REGISTRY.json

Identity files: ge-ops/master/agent-configs/{name}/IDENTITY-CORE.md, IDENTITY-ROLE.md, IDENTITY-REFERENCE.md

Agent learnings: primary ge-ops/master/agent-configs/{name}/LEARNINGS.md, fallback ge-ops/agents/{name}/LEARNINGS.md (per ge_agent/identity/loader.py)
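
The primary/fallback lookup for learnings can be sketched as follows. This is not the code in ge_agent/identity/loader.py. The function name `resolve_learnings` and the assumption that both paths are relative to a single repository root are illustrative only.

```python
from pathlib import Path
from typing import Optional


def resolve_learnings(repo_root: Path, name: str) -> Optional[Path]:
    """Return the first existing LEARNINGS.md for an agent, or None if neither exists."""
    candidates = [
        repo_root / "ge-ops" / "master" / "agent-configs" / name / "LEARNINGS.md",  # primary
        repo_root / "ge-ops" / "agents" / name / "LEARNINGS.md",  # fallback
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Returning None when neither file exists leaves it to the caller to decide whether a missing learnings file is an error or simply means the agent has none yet.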

Secrets Management

All secrets in Vault at vault.ge-system.svc.cluster.local:8200. Vault unseal keys at /home/claude/ge-bootstrap/vault.keys (host file — NEVER move this).

Key k8s secrets: anthropic-api-key, internal-api-token, vault-bootstrap, admin-ui-app-secrets, admin-ui-postgres-credentials.