Risk Acceptance: Single-Node k3s in Zone 1 (Development)

Document type: Formal Risk Acceptance
ISO 27001 reference: A.8.14 (Redundancy of information processing facilities)
Date: 2026-03-26
Risk owner: Dirk-Jan (Founder/Owner)
Review cadence: Annually or on infrastructure change


Risk Description

Zone 1 (development environment, fort-knox-dev) runs a single-node k3s cluster on a single Minisforum physical server. This configuration provides no high availability (HA) — if the node or hardware fails, all GE agent infrastructure in Zone 1 becomes unavailable until manual recovery.


Risk Classification

| Factor | Assessment |
| --- | --- |
| Likelihood | Medium: hardware failure is possible but mitigated by modern SSD/NVMe reliability and UPS |
| Impact | Low: Zone 1 contains NO client data and NO production workloads, only GE internal agent development and orchestration |
| Residual risk level | LOW |

Why This Is Accepted

Zone 1 is a development-only environment. It processes no client data and serves no production traffic. Downtime in Zone 1 means GE agents stop working temporarily — no client SLA is breached, no data is at risk.

The cost of multi-node HA for a development environment (additional hardware, networking, etcd clustering) is disproportionate to the risk it mitigates.


Mitigating Controls

  1. Daily backups (Otto) — all persistent data backed up with tested restore procedures
  2. Zone separation enforced (ISO 27001 A.8.31) — Zone 1 has no path to client production data
  3. Zones 2 and 3 use UpCloud Managed Kubernetes — multi-node, HA-capable, with auto-scaling
  4. Hardware monitoring (Gerco) — temperature, disk, memory, CPU monitored every 5 minutes with alerting
  5. Rebuild procedure documented — k3s can be reinstalled and state restored from backups within 4 hours (RTO)
  6. Redis persistence — AOF + RDB snapshots ensure message queue state survives restarts
  7. Git as code SSOT — all agent configs, manifests, and code are in git; nothing is lost if Zone 1 hardware fails
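
Control 6 depends on both AOF and RDB snapshots being switched on in the Redis configuration. As a minimal sketch of how that could be verified, the Python function below scans the text of a `redis.conf` file for the `appendonly` and `save` directives. The function name, the example configuration, and its values are illustrative assumptions, not the actual Zone 1 settings. The check is deliberately conservative: it only reports RDB as enabled when a `save` directive is written out explicitly, even though Redis applies default save points when none is given.

```python
def persistence_status(conf_text: str) -> dict:
    """Report whether AOF and RDB snapshots are explicitly enabled in redis.conf text."""
    aof_enabled = False
    rdb_enabled = False
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        directive, args = parts[0].lower(), parts[1:]
        if directive == "appendonly":
            # AOF is on only when the directive is set to "yes"
            aof_enabled = bool(args) and args[0].lower() == "yes"
        elif directive == "save":
            # `save ""` clears all save points (RDB off);
            # `save <seconds> <changes>` defines a snapshot rule (RDB on)
            rdb_enabled = bool(args) and args != ['""']
    return {"aof": aof_enabled, "rdb": rdb_enabled}


# Hypothetical excerpt; the real Zone 1 values are not given in this document.
EXAMPLE_CONF = """
# example redis.conf excerpt (assumed values)
appendonly yes
appendfsync everysec
save 900 1
save 300 10
"""

if __name__ == "__main__":
    print(persistence_status(EXAMPLE_CONF))
```

A check like this could run alongside the existing monitoring so that a configuration change that disables persistence is noticed before a restart loses queue state.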

Zones 2 and 3 (Staging + Production)

Zones 2 and 3 run on UpCloud Managed Kubernetes with:

- Multi-node clusters (control plane HA managed by UpCloud)
- Auto-scaling node pools
- Cross-zone redundancy
- Client data present: full HA is REQUIRED and implemented

These zones are NOT subject to this risk acceptance. They meet A.8.14 redundancy requirements fully.


Acceptance

| Field | Value |
| --- | --- |
| Risk accepted by | Dirk-Jan (Founder/Owner) |
| Acceptance date | 2026-03-26 |
| Valid until | 2027-03-26 (or until Zone 1 architecture changes) |
| Condition | This acceptance is VOID if Zone 1 ever processes client data |
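
The acceptance terms above combine a date window with two voiding conditions. The sketch below expresses them as a single Python check that could, for example, be run during the annual review. The function and parameter names are hypothetical, and treating the "valid until" date as inclusive is an assumption; the document does not say whether the last day counts.

```python
from datetime import date

# Dates taken from the acceptance record above.
ACCEPTANCE_DATE = date(2026, 3, 26)
VALID_UNTIL = date(2027, 3, 26)  # assumed inclusive


def acceptance_in_force(today: date,
                        processes_client_data: bool,
                        architecture_changed: bool) -> bool:
    """Return True only while every condition of the risk acceptance holds."""
    if processes_client_data:
        return False  # acceptance is VOID if Zone 1 ever processes client data
    if architecture_changed:
        return False  # validity ends when the Zone 1 architecture changes
    return ACCEPTANCE_DATE <= today <= VALID_UNTIL


if __name__ == "__main__":
    print(acceptance_in_force(date(2026, 6, 1), False, False))
```

The two boolean inputs would have to come from a human assessment or an inventory system; nothing in this sketch detects client data or architecture changes by itself.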

License Scanning Gap (A.5.32)

Related risk: Automated license scanning (ISO 27001 A.5.32 — Intellectual property rights) is planned but not yet implemented in the CI/CD pipeline.

| Factor | Assessment |
| --- | --- |
| Likelihood | Low: GE uses well-known open-source stacks (Next.js, Hono, Drizzle, PostgreSQL) with permissive licenses |
| Impact | Medium: license violation could create legal liability |
| Residual risk level | MEDIUM |
| Owner | Koen (Code Quality Automation) |
| Target implementation | Before first client project ships to production |
| Mitigating control | Manual license review during dependency addition (developer responsibility until automated) |
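
Until automated scanning is in place, the manual review could be supported by a small allowlist check. The Python sketch below flags any dependency whose declared license identifier is missing or not on a list of permissive licenses. The allowlist contents, the function name, and the example package names are illustrative assumptions; the actual list of acceptable licenses is a decision for the risk owner, and this sketch does not replace a real license scanner.

```python
from typing import Dict, Mapping, Optional

# Assumed allowlist of permissive SPDX license identifiers; adjust to GE policy.
PERMISSIVE_LICENSES = frozenset({
    "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "PostgreSQL",
})


def licenses_needing_review(dependencies: Mapping[str, Optional[str]]) -> Dict[str, str]:
    """Return dependencies whose license is unknown or not on the allowlist."""
    flagged: Dict[str, str] = {}
    for name in sorted(dependencies):
        license_id = dependencies[name]
        if license_id is None or license_id not in PERMISSIVE_LICENSES:
            # Missing metadata is reported as UNKNOWN so it is reviewed, not ignored
            flagged[name] = license_id or "UNKNOWN"
    return flagged


# Hypothetical dependency list; package names are placeholders, not real GE dependencies.
EXAMPLE_DEPENDENCIES = {
    "example-permissive-lib": "MIT",
    "example-copyleft-lib": "GPL-3.0-only",
    "example-unlabelled-lib": None,
}

if __name__ == "__main__":
    for name, license_id in licenses_needing_review(EXAMPLE_DEPENDENCIES).items():
        print(f"review needed: {name} ({license_id})")
```

An empty result means every listed dependency declares an allowlisted license; it says nothing about transitive dependencies, which a proper scanner in the CI/CD pipeline would also need to cover.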

This document serves as the formal risk acceptance record an ISO 27001 auditor requires for A.8.14 and A.5.32.