Kubernetes — Checklist

OWNER: gerco (dev), thijmen (staging), rutger (production)
ALSO_USED_BY: arjan, alex, tjitte
LAST_VERIFIED: 2026-03-26
GE_STACK_VERSION: k3s v1.34.x (Zone 1), UpCloud Managed K8s (Zones 2+3)


DEPLOYMENT CHECKLIST (new service)

  • [ ] CHECK: Deployment has livenessProbe and readinessProbe
    IF_SKIPPED: unhealthy pods receive traffic, no automatic restart
  • [ ] CHECK: Every container has resources.requests AND resources.limits
    IF_SKIPPED: resource starvation, OOMKilled without warning
    ADDED_FROM: redis-oom-2026-02, unbounded memory usage
  • [ ] CHECK: securityContext.runAsNonRoot: true is set
    IF_SKIPPED: ISO 27001 non-compliance, security audit failure
  • [ ] CHECK: allowPrivilegeEscalation: false on every container
    IF_SKIPPED: container escape risk
  • [ ] CHECK: capabilities.drop: [ALL] on every container
    IF_SKIPPED: unnecessary kernel capabilities exposed
  • [ ] CHECK: GE standard labels applied (app.kubernetes.io/name, part-of, managed-by, component, ge.zone)
    IF_SKIPPED: monitoring and service mesh routing break
  • [ ] CHECK: PodDisruptionBudget created for every workload with replicas >= 2
    IF_SKIPPED: all replicas can be evicted simultaneously during node drain
  • [ ] CHECK: HPA maxReplicas does not exceed 5
    IF_SKIPPED: token burn from runaway scaling
    ADDED_FROM: token-burn-prevention-2026-02
  • [ ] CHECK: HPA behavior.scaleUp.stabilizationWindowSeconds >= 120
    IF_SKIPPED: flapping scale events
  • [ ] CHECK: Service selector matches Deployment pod labels exactly
    IF_SKIPPED: traffic goes nowhere — silent failure
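
EXAMPLE (sketch, not a GE template): the manifests below show one way the deployment checks above could look together. The service name example-api, the image tag, probe paths, resource figures, runAsUser, and every label value are hypothetical placeholders; only the field names and the limits (maxReplicas 5, stabilization window 120) come from the checklist. The label keys part-of, managed-by, and component are assumed to carry the app.kubernetes.io/ prefix.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api                      # hypothetical service name
  labels:
    app.kubernetes.io/name: example-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: example-api
  template:
    metadata:
      labels:                            # GE standard labels; values are placeholders
        app.kubernetes.io/name: example-api
        app.kubernetes.io/part-of: example-system
        app.kubernetes.io/managed-by: kubectl
        app.kubernetes.io/component: api
        ge.zone: "1"                     # value format is an assumption
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001                 # assumption: any non-zero UID the image supports
      containers:
        - name: api
          image: example-api:1.0.0       # hypothetical image and tag
          ports:
            - name: http
              containerPort: 8080        # placeholder; real value comes from config/ports.yaml
          resources:
            requests: {cpu: 100m, memory: 128Mi}
            limits: {cpu: 500m, memory: 256Mi}
          livenessProbe:
            httpGet: {path: /healthz, port: http}   # hypothetical endpoint
          readinessProbe:
            httpGet: {path: /ready, port: http}     # hypothetical endpoint
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-api
spec:
  minAvailable: 1                        # keeps one replica up during a node drain
  selector:
    matchLabels:
      app.kubernetes.io/name: example-api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-api
  minReplicas: 2
  maxReplicas: 5                         # checklist ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 70}   # placeholder target
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120    # checklist minimum
```

The Deployment selector, the pod template labels, and the PodDisruptionBudget selector all use the same app.kubernetes.io/name value; a Service for this workload would need the identical selector to pass the last check.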

NETWORKING CHECKLIST (new service)

  • [ ] CHECK: Service uses ClusterIP (NodePort only for Zone 1 LAN access)
    IF_SKIPPED: unnecessary host port exposure
  • [ ] CHECK: Port numbers read from config/ports.yaml, not hardcoded
    IF_SKIPPED: port conflicts, configuration drift
    ADDED_FROM: redis-port-6381-2026-02
  • [ ] CHECK: Ingress has TLS configured — no plain HTTP
    IF_SKIPPED: traffic interceptable, compliance violation
  • [ ] CHECK: NetworkPolicy default-deny-ingress exists in namespace
    IF_SKIPPED: all pods accept traffic from anywhere
  • [ ] CHECK: Cross-namespace flows have explicit NetworkPolicy allow rules
    IF_SKIPPED: traffic blocked after default-deny is applied
  • [ ] CHECK: No hostNetwork: true in pod spec
    IF_SKIPPED: port conflicts on rolling updates
    ADDED_FROM: executor-scaling-2026-02
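
EXAMPLE (sketch under assumptions): the networking checks above could translate into manifests along these lines. The namespaces example-ns and caller-ns, the host name, the TLS secret name, and the service name are hypothetical; the port number 80 is a placeholder for a value that would be read from config/ports.yaml. The allow rule relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-api                 # hypothetical
  namespace: example-ns             # hypothetical
spec:
  type: ClusterIP                   # no host port exposure
  selector:
    app.kubernetes.io/name: example-api   # must match the pod labels exactly
  ports:
    - name: http
      port: 80                      # placeholder; source of truth is config/ports.yaml
      targetPort: http              # named container port
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example-ns
spec:
  podSelector: {}                   # selects every pod in the namespace
  policyTypes:
    - Ingress                       # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-caller-ns        # hypothetical cross-namespace flow
  namespace: example-ns
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: example-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: caller-ns
      ports:
        - protocol: TCP
          port: http                # named container port on the selected pods
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-api
  namespace: example-ns
spec:
  tls:
    - hosts: [example-api.example.com]   # hypothetical host
      secretName: example-api-tls        # hypothetical TLS secret
  rules:
    - host: example-api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-api
                port:
                  name: http
```

If an ingress controller runs in a separate namespace, it would need its own allow rule of the same shape as allow-from-caller-ns once default-deny-ingress is in place.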

SECRETS CHECKLIST (new service)

  • [ ] CHECK: All secrets come via ExternalSecret from Vault
    IF_SKIPPED: secrets in git — compliance violation
  • [ ] CHECK: No Secret manifests with plain-text values committed
    IF_SKIPPED: credential leak
  • [ ] CHECK: ServiceAccount can only read its own Vault path
    IF_SKIPPED: cross-service secret access
  • [ ] CHECK: Redis auth uses password from ge-secrets secret, key redis-password
    IF_SKIPPED: Redis connection refused
    ADDED_FROM: orchestrator-redis-auth-2026-03
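
EXAMPLE (sketch under assumptions): a possible shape for the secrets checks above, using the External Secrets Operator ExternalSecret resource. The apiVersion depends on the installed operator version (newer releases also serve external-secrets.io/v1). The store name vault-backend, the Vault path, the property name, and the namespace are hypothetical. Whether ge-secrets is itself produced by an ExternalSecret is an assumption; the checklist only states the secret name and key.

```yaml
apiVersion: external-secrets.io/v1beta1   # assumption: matches the installed operator
kind: ExternalSecret
metadata:
  name: ge-secrets
  namespace: example-ns                   # hypothetical
spec:
  refreshInterval: 1h                     # placeholder
  secretStoreRef:
    kind: SecretStore
    name: vault-backend                   # hypothetical store pointing at Vault
  target:
    name: ge-secrets                      # Kubernetes Secret that gets created
  data:
    - secretKey: redis-password           # key named in the checklist
      remoteRef:
        key: example-api/redis            # hypothetical Vault path, scoped to this service
        property: password                # hypothetical field inside that Vault entry
---
# Fragment of a container spec (not a complete manifest):
# the Redis password is injected from the secret, never written into the manifest.
env:
  - name: REDIS_PASSWORD                  # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: ge-secrets
        key: redis-password
```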

IMAGE CHECKLIST (Zone 1)

  • [ ] CHECK: Image built with build-executor.sh (or equivalent build script)
    IF_SKIPPED: stale code deployed
  • [ ] CHECK: Image imported to k3s via k3s ctr images import
    IF_SKIPPED: ImagePullBackOff
  • [ ] CHECK: imagePullPolicy is Never or IfNotPresent in Zone 1
    IF_SKIPPED: ImagePullBackOff — no registry exists
    ADDED_FROM: executor-deployment-2026-02
  • [ ] CHECK: No kubectl cp used to patch running pods
    IF_SKIPPED: Python module caching makes patches invisible
    ADDED_FROM: executor-hotpatch-failure-2026-02
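
EXAMPLE (sketch under assumptions): a pod template fragment for a Zone 1 image that was imported with k3s ctr images import. The container name, image name, and tag are hypothetical; the image reference has to match the name under which the image was imported.

```yaml
# Fragment of a pod template (not a complete manifest)
containers:
  - name: executor                    # hypothetical
    image: ge-executor:1.0.0          # hypothetical; must equal the imported image reference
    imagePullPolicy: IfNotPresent     # or Never; Zone 1 has no registry to pull from
```

A code change then ships as a rebuilt and re-imported image followed by a rollout, not as a kubectl cp into the running pod.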

PRE-PROMOTION CHECKLIST (Zone 1 → Zone 2)

  • [ ] CHECK: Service has run stably in Zone 1 for at least 1 week
    IF_SKIPPED: untested in production-like conditions
  • [ ] CHECK: Resource limits are validated against actual usage in Zone 1
    IF_SKIPPED: over- or under-provisioned in staging
  • [ ] CHECK: NetworkPolicies tested in Zone 1
    IF_SKIPPED: unexpected traffic blocks in Zone 2
  • [ ] CHECK: Image uses immutable tag (semver or SHA), not :latest
    IF_SKIPPED: deployment reproducibility lost
  • [ ] CHECK: thijmen (staging) has approved the promotion
    IF_SKIPPED: uncoordinated deployment

CROSS-REFERENCES

READ_ALSO: wiki/docs/stack/kubernetes/index.md
READ_ALSO: wiki/docs/stack/kubernetes/manifests.md
READ_ALSO: wiki/docs/stack/kubernetes/pitfalls.md
READ_ALSO: wiki/docs/stack/kubernetes/security.md