# Kubernetes — Manifests
OWNER: gerco (dev), thijmen (staging), rutger (production)
ALSO_USED_BY: arjan, alex, tjitte
LAST_VERIFIED: 2026-03-26
GE_STACK_VERSION: k3s v1.34.x (Zone 1), UpCloud Managed K8s (Zones 2+3)
## Overview
All GE Kubernetes resources are defined as YAML manifests stored in git.
Manifests live under k8s/base/ with zone-specific overlays in k8s/overlays/{zone}/.
This page covers Deployment, Service, Ingress, PDB, HPA, and resource limit conventions.
## Deployment
Every GE Deployment follows this baseline structure:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ge-{service}
  namespace: ge-{namespace}
  labels:
    app.kubernetes.io/name: ge-{service}
    app.kubernetes.io/part-of: growing-europe
    app.kubernetes.io/managed-by: ge-agents
    ge.zone: "{zone}"
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ge-{service}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ge-{service}
        app.kubernetes.io/part-of: growing-europe
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: ge-{service}
          image: ge-bootstrap-{service}:latest
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```
CHECK: every Deployment has both livenessProbe and readinessProbe
CHECK: every container has resources.requests AND resources.limits
CHECK: securityContext.runAsNonRoot: true is set
CHECK: maxUnavailable: 0 for zero-downtime rolling updates
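The CHECK rules above can be enforced mechanically in CI. A minimal sketch of such a linter (the `check_deployment` helper is illustrative, not an existing GE tool):

```python
# Minimal sketch of a linter for the Deployment CHECK rules above.
# Illustrative only -- not an existing GE tool.

def check_deployment(manifest: dict) -> list[str]:
    """Return a list of violated CHECK rules for a parsed Deployment manifest."""
    problems = []
    spec = manifest["spec"]
    pod = spec["template"]["spec"]

    # CHECK: maxUnavailable: 0 for zero-downtime rolling updates
    if spec.get("strategy", {}).get("rollingUpdate", {}).get("maxUnavailable") != 0:
        problems.append("maxUnavailable must be 0 for zero-downtime updates")

    # CHECK: securityContext.runAsNonRoot: true is set
    if not pod.get("securityContext", {}).get("runAsNonRoot"):
        problems.append("securityContext.runAsNonRoot must be true")

    for c in pod.get("containers", []):
        # CHECK: both livenessProbe and readinessProbe
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in c:
                problems.append(f"{c['name']}: missing {probe}")
        # CHECK: resources.requests AND resources.limits
        res = c.get("resources", {})
        if not res.get("requests") or not res.get("limits"):
            problems.append(f"{c['name']}: requests AND limits required")
    return problems
```

Feed it the output of `yaml.safe_load` on each manifest; an empty list means all four checks pass.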
## GE Label Convention
All GE resources MUST carry these labels:
| Label | Value | Required |
|---|---|---|
| `app.kubernetes.io/name` | Service name (e.g., `ge-executor`) | Yes |
| `app.kubernetes.io/part-of` | `growing-europe` | Yes |
| `app.kubernetes.io/managed-by` | `ge-agents` or `terraform` | Yes |
| `app.kubernetes.io/component` | `api`, `worker`, `database`, `monitoring` | Yes |
| `app.kubernetes.io/version` | Semver or git SHA | Recommended |
| `ge.zone` | `dev`, `staging`, `production` | Yes |
| `ge.team` | `alfa`, `bravo`, `zulu`, `shared` | Recommended |
ANTI_PATTERN: using custom label keys like service: foo
FIX: use the app.kubernetes.io/ standard labels
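Applied together, a compliant labels block looks like this (the concrete values are illustrative):

```yaml
# Correct: standard app.kubernetes.io/ label keys (values are illustrative)
metadata:
  labels:
    app.kubernetes.io/name: ge-executor
    app.kubernetes.io/part-of: growing-europe
    app.kubernetes.io/managed-by: ge-agents
    app.kubernetes.io/component: worker
    ge.zone: production
    ge.team: alfa
```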
## GE Annotations

| Annotation | Purpose | Example |
|---|---|---|
| `ge.growing-europe.com/owner` | Agent responsible | `gerco` |
| `ge.growing-europe.com/cost-center` | Billing context | `infrastructure` |
| `ge.growing-europe.com/last-deployed` | ISO timestamp | `2026-03-26T10:00:00Z` |
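Combined in a manifest, the annotations look like this (values illustrative):

```yaml
metadata:
  annotations:
    ge.growing-europe.com/owner: gerco
    ge.growing-europe.com/cost-center: infrastructure
    ge.growing-europe.com/last-deployed: "2026-03-26T10:00:00Z"
```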
## Service
GE uses ClusterIP by default. NodePort only for LAN-accessible services in Zone 1.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ge-{service}
  namespace: ge-{namespace}
  labels:
    app.kubernetes.io/name: ge-{service}
    app.kubernetes.io/part-of: growing-europe
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: ge-{service}
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
```
CHECK: Service selector matches Deployment pod labels exactly
CHECK: port names are descriptive (http, grpc, metrics), not just numbers
IF: service must be accessible from host machine (Zone 1 dev)
THEN: use NodePort with port from config/ports.yaml
ANTI_PATTERN: hardcoding NodePort values — always read from config
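For the Zone 1 case, a NodePort override can be a Kustomize patch like the sketch below; the `nodePort` value shown is a placeholder, not a real assignment — the actual value comes from config/ports.yaml:

```yaml
# k8s/overlays/dev/ ... NodePort patch (illustrative sketch)
apiVersion: v1
kind: Service
metadata:
  name: ge-{service}
spec:
  type: NodePort
  ports:
    - name: http
      port: 80
      targetPort: 8080
      nodePort: 30080  # placeholder -- read the real value from config/ports.yaml
```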
## Ingress
Zone 1 uses Traefik (bundled with k3s). Zones 2+3 use UpCloud Managed Load Balancer.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ge-{service}
  namespace: ge-{namespace}
  annotations:
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  tls:
    - hosts:
        - "{service}.ge.local"
      secretName: ge-{service}-tls
  rules:
    - host: "{service}.ge.local"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ge-{service}
                port:
                  name: http
```
IF: Zone 2 or 3 (UpCloud)
THEN: use UpCloud-specific Ingress annotations
THEN: TLS termination at load balancer level
CHECK: TLS is configured on every Ingress — no plain HTTP in any zone
## PodDisruptionBudget
Every production Deployment with 2+ replicas MUST have a PDB:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ge-{service}
  namespace: ge-{namespace}
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ge-{service}
```
CHECK: PDB exists for every Deployment with replicas >= 2
CHECK: minAvailable is at least 1
## HorizontalPodAutoscaler
GE caps HPA at 5 replicas to prevent token burn from runaway scaling.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ge-{service}
  namespace: ge-{namespace}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ge-{service}
  minReplicas: 2
  maxReplicas: 5
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
CHECK: maxReplicas never exceeds 5
CHECK: scaleUp.stabilizationWindowSeconds is at least 120
ANTI_PATTERN: setting maxReplicas > 5
FIX: 5 is the hard cap — see token burn prevention rules in CLAUDE.md
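The two HPA CHECK rules are simple enough to guard in CI; a minimal sketch (illustrative helper, not an existing GE tool):

```python
# Guard the HPA CHECK rules above. Illustrative sketch, not an existing GE tool.
MAX_REPLICAS_CAP = 5        # hard cap -- token burn prevention
MIN_SCALEUP_WINDOW_S = 120  # minimum scaleUp stabilization window

def check_hpa(manifest: dict) -> list[str]:
    """Return a list of violated HPA CHECK rules for a parsed HPA manifest."""
    problems = []
    spec = manifest["spec"]
    if spec["maxReplicas"] > MAX_REPLICAS_CAP:
        problems.append(
            f"maxReplicas {spec['maxReplicas']} exceeds hard cap {MAX_REPLICAS_CAP}")
    window = spec.get("behavior", {}).get("scaleUp", {}).get(
        "stabilizationWindowSeconds", 0)
    if window < MIN_SCALEUP_WINDOW_S:
        problems.append(
            f"scaleUp stabilization window {window}s below {MIN_SCALEUP_WINDOW_S}s")
    return problems
```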
## Resource Limits
GE resource tier guidelines:
| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Use For |
|---|---|---|---|---|---|
| Tiny | 50m | 200m | 64Mi | 256Mi | CronJobs, sidecars |
| Small | 100m | 500m | 128Mi | 512Mi | API services, admin UI |
| Medium | 250m | 1000m | 256Mi | 1Gi | Executors, orchestrator |
| Large | 500m | 2000m | 512Mi | 2Gi | Database proxies, heavy processing |
CHECK: every container has both requests and limits
CHECK: limit-to-request ratio does not exceed 4x (prevents noisy-neighbour issues)
ANTI_PATTERN: setting no resource limits
FIX: OOMKilled and CPU throttling are worse than conservative limits
ANTI_PATTERN: setting requests equal to limits
FIX: allow burst headroom — set limits at 2-4x requests
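The burst-ratio rule can be checked with a small helper that normalizes Kubernetes quantity strings. This sketch handles only the `m`, `Mi`, and `Gi` suffixes used in the tier table, not the full Kubernetes quantity grammar:

```python
# Normalize the quantity suffixes used in the tier table and compute burst ratio.
# Sketch only: handles m / Mi / Gi, not the full Kubernetes quantity grammar.

def to_base(qty: str) -> float:
    """Convert a quantity string to a common base unit (cores or MiB)."""
    if qty.endswith("m"):
        return float(qty[:-1]) / 1000.0   # millicores -> cores
    if qty.endswith("Mi"):
        return float(qty[:-2])            # MiB
    if qty.endswith("Gi"):
        return float(qty[:-2]) * 1024.0   # GiB -> MiB
    return float(qty)                     # bare number of cores

def burst_ratio(request: str, limit: str) -> float:
    """Limit-to-request ratio; the CHECK above says this must not exceed 4x."""
    return to_base(limit) / to_base(request)
```

For example, the Medium tier's CPU ratio is `burst_ratio("250m", "1000m")`, i.e. exactly 4x.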
## Manifest Organization

```
k8s/
├── base/                 # Base manifests (all zones)
│   ├── agents/           # Agent executor, orchestrator
│   ├── data/             # PostgreSQL, Redis
│   ├── system/           # Admin UI, wiki
│   └── monitoring/       # Health checks
├── overlays/
│   ├── dev/              # Zone 1 overrides (k3s-specific)
│   ├── staging/          # Zone 2 overrides (UpCloud)
│   └── production/       # Zone 3 overrides (UpCloud)
└── kustomization.yaml
```
IF: resource differs between zones
THEN: put the common parts in base/, zone-specific in overlays/
THEN: use Kustomize patches, not separate full manifests
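A minimal sketch of an overlay that patches only what differs per zone (the `ge-executor` target and replica count are illustrative):

```yaml
# k8s/overlays/production/kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/agents
patches:
  - target:
      kind: Deployment
      name: ge-executor   # illustrative target
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3          # illustrative zone-specific override
```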
## Cross-References
READ_ALSO: wiki/docs/stack/kubernetes/index.md
READ_ALSO: wiki/docs/stack/kubernetes/networking.md
READ_ALSO: wiki/docs/stack/kubernetes/security.md
READ_ALSO: wiki/docs/stack/kubernetes/operations.md
READ_ALSO: wiki/docs/stack/kubernetes/pitfalls.md
READ_ALSO: wiki/docs/stack/kubernetes/checklist.md