Kubernetes — Manifests

OWNER: gerco (dev), thijmen (staging), rutger (production)
ALSO_USED_BY: arjan, alex, tjitte
LAST_VERIFIED: 2026-03-26
GE_STACK_VERSION: k3s v1.34.x (Zone 1), UpCloud Managed K8s (Zones 2+3)


Overview

All GE Kubernetes resources are defined as YAML manifests stored in git.
Manifests live under k8s/base/ with zone-specific overlays in k8s/overlays/{zone}/.
This page covers Deployment, Service, Ingress, PDB, HPA, and resource limit conventions.


Deployment

Every GE Deployment follows this baseline structure:

apiVersion: apps/v1  
kind: Deployment  
metadata:  
  name: ge-{service}  
  namespace: ge-{namespace}  
  labels:  
    app.kubernetes.io/name: ge-{service}  
    app.kubernetes.io/part-of: growing-europe  
    app.kubernetes.io/managed-by: ge-agents  
    ge.zone: "{zone}"  
spec:  
  replicas: 2  
  strategy:  
    type: RollingUpdate  
    rollingUpdate:  
      maxUnavailable: 0  
      maxSurge: 1  
  selector:  
    matchLabels:  
      app.kubernetes.io/name: ge-{service}  
  template:  
    metadata:  
      labels:  
        app.kubernetes.io/name: ge-{service}  
        app.kubernetes.io/part-of: growing-europe  
    spec:  
      securityContext:  
        runAsNonRoot: true  
        runAsUser: 1000  
        fsGroup: 1000  
      containers:  
        - name: ge-{service}  
          image: ge-bootstrap-{service}:latest  
          resources:  
            requests:  
              cpu: "100m"  
              memory: "128Mi"  
            limits:  
              cpu: "500m"  
              memory: "512Mi"  
          livenessProbe:  
            httpGet:  
              path: /healthz  
              port: 8080  
            initialDelaySeconds: 10  
            periodSeconds: 30  
          readinessProbe:  
            httpGet:  
              path: /readyz  
              port: 8080  
            initialDelaySeconds: 5  
            periodSeconds: 10  

CHECK: every Deployment has both livenessProbe and readinessProbe
CHECK: every container has resources.requests AND resources.limits
CHECK: securityContext.runAsNonRoot: true is set
CHECK: maxUnavailable: 0 for zero-downtime rolling updates


GE Label Convention

All GE resources MUST carry these labels:

| Label | Value | Required |
|---|---|---|
| app.kubernetes.io/name | Service name (e.g., ge-executor) | Yes |
| app.kubernetes.io/part-of | growing-europe | Yes |
| app.kubernetes.io/managed-by | ge-agents or terraform | Yes |
| app.kubernetes.io/component | api, worker, database, monitoring | Yes |
| app.kubernetes.io/version | Semver or git SHA | Recommended |
| ge.zone | dev, staging, production | Yes |
| ge.team | alfa, bravo, zulu, shared | Recommended |

ANTI_PATTERN: using custom label keys like service: foo
FIX: use the app.kubernetes.io/ standard labels
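
Putting the table and the fix together, a fully labelled metadata block looks like this (the name, namespace, version, zone, and team values are illustrative):

metadata:
  name: ge-executor
  namespace: ge-agents
  labels:
    app.kubernetes.io/name: ge-executor
    app.kubernetes.io/part-of: growing-europe
    app.kubernetes.io/managed-by: ge-agents
    app.kubernetes.io/component: worker
    app.kubernetes.io/version: "1.4.2"   # semver; a git SHA also works
    ge.zone: "production"
    ge.team: alfa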


GE Annotations

| Annotation | Purpose | Example |
|---|---|---|
| ge.growing-europe.com/owner | Agent responsible | gerco |
| ge.growing-europe.com/cost-center | Billing context | infrastructure |
| ge.growing-europe.com/last-deployed | ISO timestamp | 2026-03-26T10:00:00Z |
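
Together these render as a metadata.annotations block, using the example values from the table:

metadata:
  annotations:
    ge.growing-europe.com/owner: gerco
    ge.growing-europe.com/cost-center: infrastructure
    ge.growing-europe.com/last-deployed: "2026-03-26T10:00:00Z"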


Service

GE Services use ClusterIP by default; NodePort is permitted only for LAN-accessible services in Zone 1.

apiVersion: v1  
kind: Service  
metadata:  
  name: ge-{service}  
  namespace: ge-{namespace}  
  labels:  
    app.kubernetes.io/name: ge-{service}  
    app.kubernetes.io/part-of: growing-europe  
spec:  
  type: ClusterIP  
  selector:  
    app.kubernetes.io/name: ge-{service}  
  ports:  
    - name: http  
      port: 80  
      targetPort: 8080  
      protocol: TCP  

CHECK: Service selector matches Deployment pod labels exactly
CHECK: port names are descriptive (http, grpc, metrics), not just numbers

IF: a service must be accessible from the host machine (Zone 1 dev)
THEN: use NodePort with a port from config/ports.yaml
ANTI_PATTERN: hardcoding NodePort values — always read from config
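
As a sketch, the Zone 1 NodePort variant changes only the Service type and adds a nodePort field; the 30080 value below is a placeholder for whatever config/ports.yaml assigns:

spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: ge-{service}
  ports:
    - name: http
      port: 80
      targetPort: 8080
      nodePort: 30080   # placeholder, read the real value from config/ports.yaml
      protocol: TCP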


Ingress

Zone 1 uses Traefik (bundled with k3s). Zones 2+3 use UpCloud Managed Load Balancer.

apiVersion: networking.k8s.io/v1  
kind: Ingress  
metadata:  
  name: ge-{service}  
  namespace: ge-{namespace}  
  annotations:  
    traefik.ingress.kubernetes.io/router.tls: "true"  
    traefik.ingress.kubernetes.io/router.entrypoints: websecure  
spec:  
  tls:  
    - hosts:  
        - "{service}.ge.local"  
      secretName: ge-{service}-tls  
  rules:  
    - host: "{service}.ge.local"  
      http:  
        paths:  
          - path: /  
            pathType: Prefix  
            backend:  
              service:  
                name: ge-{service}  
                port:  
                  name: http  

IF: Zone 2 or 3 (UpCloud)
THEN: use UpCloud-specific Ingress annotations instead of the Traefik ones above
THEN: terminate TLS at the load balancer

CHECK: TLS is configured on every Ingress — no plain HTTP in any zone


PodDisruptionBudget

Every production Deployment with 2+ replicas MUST have a PDB:

apiVersion: policy/v1  
kind: PodDisruptionBudget  
metadata:  
  name: ge-{service}  
  namespace: ge-{namespace}  
spec:  
  minAvailable: 1  
  selector:  
    matchLabels:  
      app.kubernetes.io/name: ge-{service}  

CHECK: PDB exists for every Deployment with replicas >= 2
CHECK: minAvailable is at least 1


HorizontalPodAutoscaler

GE caps HPA at 5 replicas to prevent token burn from runaway scaling.

apiVersion: autoscaling/v2  
kind: HorizontalPodAutoscaler  
metadata:  
  name: ge-{service}  
  namespace: ge-{namespace}  
spec:  
  scaleTargetRef:  
    apiVersion: apps/v1  
    kind: Deployment  
    name: ge-{service}  
  minReplicas: 2  
  maxReplicas: 5  
  behavior:  
    scaleUp:  
      stabilizationWindowSeconds: 120  
      policies:  
        - type: Pods  
          value: 1  
          periodSeconds: 60  
    scaleDown:  
      stabilizationWindowSeconds: 300  
  metrics:  
    - type: Resource  
      resource:  
        name: cpu  
        target:  
          type: Utilization  
          averageUtilization: 70  

CHECK: maxReplicas never exceeds 5
CHECK: scaleUp.stabilizationWindowSeconds is at least 120

ANTI_PATTERN: setting maxReplicas > 5
FIX: 5 is the hard cap — see token burn prevention rules in CLAUDE.md


Resource Limits

GE resource tier guidelines:

| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Use For |
|---|---|---|---|---|---|
| Tiny | 50m | 200m | 64Mi | 256Mi | CronJobs, sidecars |
| Small | 100m | 500m | 128Mi | 512Mi | API services, admin UI |
| Medium | 250m | 1000m | 256Mi | 1Gi | Executors, orchestrator |
| Large | 500m | 2000m | 512Mi | 2Gi | Database proxies, heavy processing |
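
For example, a Medium-tier container (executors, orchestrator) declares the following; both limits sit exactly at the 4x ratio cap checked below:

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"     # 4x the CPU request
    memory: "1Gi"    # 4x the memory request (1Gi = 1024Mi)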

CHECK: every container has both requests and limits
CHECK: limit-to-request ratio does not exceed 4x (prevents noisy-neighbour issues)

ANTI_PATTERN: setting no resource limits
FIX: always set limits; node-level OOM kills and CPU contention from unbounded containers are worse than conservative limits

ANTI_PATTERN: setting requests equal to limits
FIX: allow burst headroom — set limits at 2-4x requests


Manifest Organization

k8s/  
├── base/                    # Base manifests (all zones)  
│   ├── agents/              # Agent executor, orchestrator  
│   ├── data/                # PostgreSQL, Redis  
│   ├── system/              # Admin UI, wiki  
│   └── monitoring/          # Health checks  
├── overlays/  
│   ├── dev/                 # Zone 1 overrides (k3s-specific)  
│   ├── staging/             # Zone 2 overrides (UpCloud)  
│   └── production/          # Zone 3 overrides (UpCloud)  
└── kustomization.yaml  

IF: a resource differs between zones
THEN: put the common parts in base/ and the zone-specific parts in overlays/
THEN: use Kustomize patches, not separate full manifests
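
A minimal overlay sketch, assuming a production overlay that only raises the replica count (the base path and target name are illustrative):

# k8s/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/agents
patches:
  - target:
      kind: Deployment
      name: ge-executor
    patch: |-
      # JSON6902 patch: override only the replica count from base
      - op: replace
        path: /spec/replicas
        value: 3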


Cross-References

READ_ALSO: wiki/docs/stack/kubernetes/index.md
READ_ALSO: wiki/docs/stack/kubernetes/networking.md
READ_ALSO: wiki/docs/stack/kubernetes/security.md
READ_ALSO: wiki/docs/stack/kubernetes/operations.md
READ_ALSO: wiki/docs/stack/kubernetes/pitfalls.md
READ_ALSO: wiki/docs/stack/kubernetes/checklist.md