Risk Acceptance: Single-Node k3s in Zone 1 (Development)
- Document type: Formal Risk Acceptance
- ISO 27001 reference: A.8.14 (Redundancy of information processing facilities)
- Date: 2026-03-26
- Risk owner: Dirk-Jan (Founder/Owner)
- Review cadence: Annually or on infrastructure change
Risk Description
Zone 1 (development environment, fort-knox-dev) runs a single-node k3s cluster on a single Minisforum physical server. This configuration provides no high availability (HA) — if the node or hardware fails, all GE agent infrastructure in Zone 1 becomes unavailable until manual recovery.
Risk Classification
| Factor | Assessment |
|---|---|
| Likelihood | Medium — hardware failure is possible but mitigated by modern SSD/NVMe reliability and UPS |
| Impact | Low — Zone 1 contains NO client data, NO production workloads. Only GE internal agent development and orchestration. |
| Residual risk level | LOW |
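The classification above can be read as a lookup from likelihood and impact to a residual level. This record does not state the scoring matrix it uses, so the sketch below assumes a hypothetical one: only the two cells marked as taken from this record are grounded in the text, and `RISK_MATRIX` and `residual_risk` are illustrative names.

```python
# Hypothetical scoring matrix: (likelihood, impact) -> residual risk level.
# Only the two cells marked "from this record" come from the document;
# every other cell is an assumption made for illustration.
RISK_MATRIX = {
    ("low", "low"): "LOW",
    ("medium", "low"): "LOW",        # from this record: single-node k3s in Zone 1
    ("high", "low"): "MEDIUM",
    ("low", "medium"): "MEDIUM",     # from this record: license scanning gap
    ("medium", "medium"): "MEDIUM",
    ("high", "medium"): "HIGH",
    ("low", "high"): "MEDIUM",
    ("medium", "high"): "HIGH",
    ("high", "high"): "HIGH",
}


def residual_risk(likelihood: str, impact: str) -> str:
    """Look up the residual level for a likelihood/impact pair (case-insensitive)."""
    key = (likelihood.strip().lower(), impact.strip().lower())
    if key not in RISK_MATRIX:
        raise ValueError(f"unknown likelihood/impact pair: {key}")
    return RISK_MATRIX[key]


print(residual_risk("Medium", "Low"))
```

Keeping the matrix as explicit data, rather than a formula, makes it easy for the risk owner to review and adjust each cell at the annual review.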
Why This Is Accepted
Zone 1 is a development-only environment. It processes no client data and serves no production traffic. Downtime in Zone 1 means GE agents stop working temporarily: no client SLA is breached and no client data is at risk.
The cost of multi-node HA for a development environment (additional hardware, networking, etcd clustering) is disproportionate to the risk it mitigates.
Mitigating Controls
- Daily backups (Otto) — all persistent data backed up with tested restore procedures
- Zone separation enforced (ISO 27001 A.8.31) — Zone 1 has no path to client production data
- Zones 2 and 3 use UpCloud Managed Kubernetes — multi-node, HA-capable, with auto-scaling
- Hardware monitoring (Gerco) — temperature, disk, memory, CPU monitored every 5 minutes with alerting
- Rebuild procedure documented — k3s can be reinstalled and state restored from backups within 4 hours (RTO)
- Redis persistence — AOF + RDB snapshots ensure message queue state survives restarts
- Git as code SSOT — all agent configs, manifests, and code are in git; nothing is lost if Zone 1 hardware fails
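Several of the controls above carry concrete thresholds (daily backups, a 4-hour RTO, AOF plus RDB persistence) that could be checked automatically. The following is a minimal sketch under stated assumptions, not the checks Otto or Gerco actually run: the function names are hypothetical, "daily" is interpreted as at most 24 hours with no grace period, and the Redis settings are assumed to arrive as a plain dictionary of the `appendonly` and `save` directives (for example, as returned by a client's `CONFIG GET`).

```python
from datetime import datetime, timedelta, timezone

MAX_BACKUP_AGE = timedelta(hours=24)  # "daily backups"; assumes no grace period
RTO = timedelta(hours=4)              # rebuild target stated in this record


def backup_is_fresh(last_backup: datetime, now: datetime) -> bool:
    """True if the most recent backup is no older than one day."""
    return now - last_backup <= MAX_BACKUP_AGE


def rebuild_within_rto(rebuild_duration: timedelta) -> bool:
    """True if a measured rebuild (for example, from a restore drill) met the RTO."""
    return rebuild_duration <= RTO


def redis_persistence_enabled(config: dict[str, str]) -> bool:
    """True if both AOF and RDB snapshots are configured.

    `appendonly` must be "yes" (AOF) and `save` must list at least one
    snapshot rule (RDB); an empty `save` value disables snapshots.
    """
    aof_on = config.get("appendonly", "no") == "yes"
    rdb_on = config.get("save", "").strip() != ""
    return aof_on and rdb_on


# Example values are made up for illustration.
now = datetime(2026, 3, 26, 12, 0, tzinfo=timezone.utc)
print(backup_is_fresh(now - timedelta(hours=6), now))
print(rebuild_within_rto(timedelta(hours=3, minutes=30)))
print(redis_persistence_enabled({"appendonly": "yes", "save": "3600 1 300 100"}))
```

Each check is a pure function over values collected elsewhere, so the same logic can be reused in a monitoring job or as evidence gathering for the annual review.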
Zones 2 and 3 (Staging + Production)
Zones 2 and 3 run on UpCloud Managed Kubernetes with:
- Multi-node clusters (control plane HA managed by UpCloud)
- Auto-scaling node pools
- Cross-zone redundancy
- Client data present: full HA is REQUIRED and implemented
These zones are NOT subject to this risk acceptance. They meet A.8.14 redundancy requirements fully.
Acceptance
| Field | Value |
|---|---|
| Risk accepted by | Dirk-Jan (Founder/Owner) |
| Acceptance date | 2026-03-26 |
| Valid until | 2027-03-26 (or until Zone 1 architecture changes) |
| Condition | This acceptance is VOID if Zone 1 ever processes client data |
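The validity rules in the table above can be expressed as a single check. This is a sketch only: the function and parameter names are hypothetical, and it assumes the "Valid until" date is inclusive, which the record does not state.

```python
from datetime import date

VALID_UNTIL = date(2027, 3, 26)  # from the acceptance table; assumed inclusive


def acceptance_is_valid(
    today: date,
    processes_client_data: bool,
    architecture_changed: bool,
) -> bool:
    """Return True while this risk acceptance still applies.

    The acceptance is void if Zone 1 processes client data, and it lapses
    when the Zone 1 architecture changes or the validity date has passed.
    """
    if processes_client_data:
        return False
    if architecture_changed:
        return False
    return today <= VALID_UNTIL


print(acceptance_is_valid(date(2026, 9, 1), False, False))
```

Checking the client-data condition first mirrors the table: that condition voids the acceptance regardless of date.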
License Scanning Gap (A.5.32)
Related risk: Automated license scanning (ISO 27001 A.5.32 — Intellectual property rights) is planned but not yet implemented in the CI/CD pipeline.
| Factor | Assessment |
|---|---|
| Likelihood | Low — GE uses well-known open-source stacks (Next.js, Hono, Drizzle, PostgreSQL) with permissive licenses |
| Impact | Medium — license violation could create legal liability |
| Residual risk level | MEDIUM |
| Owner | Koen (Code Quality Automation) |
| Target implementation | Before first client project ships to production |
| Mitigating control | Manual license review during dependency addition (developer responsibility until automated) |
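Until automated scanning is in place, the manual review amounts to comparing each new dependency's license against an allowlist. The sketch below illustrates that comparison; it is not the planned CI/CD implementation. The allowlist contents, the function name, and the package names are assumptions, and license identifiers are assumed to be SPDX strings.

```python
# Hypothetical allowlist of permissive SPDX license identifiers.
PERMISSIVE_LICENSES = frozenset(
    {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}
)


def find_license_violations(dependencies: dict[str, str]) -> list[str]:
    """Return the names of dependencies whose license is not on the allowlist."""
    return sorted(
        name
        for name, license_id in dependencies.items()
        if license_id not in PERMISSIVE_LICENSES
    )


# Package names below are placeholders, not real dependencies.
deps = {
    "example-ui": "MIT",
    "example-orm": "Apache-2.0",
    "example-copyleft": "GPL-3.0-only",
}
print(find_license_violations(deps))  # ['example-copyleft']
```

An allowlist fails closed: a license nobody has reviewed yet is flagged rather than silently accepted, which suits a control whose purpose is to avoid legal liability.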
This document serves as the formal risk acceptance record an ISO 27001 auditor requires for A.8.14 and A.5.32.