
Deployment Pipeline

The full deploy chain from merged code to running production. Every step is automated, auditable, and reversible.


Pipeline Overview

Code Merged (Marta/Iwona approval), then:

  1. Leon (Deployment Coordinator) — orchestrates the entire chain
  2. Boris / Yoanna (DB Migration) — schema changes first
  3. Arjan (Infrastructure) — Terraform apply if infra changes
  4. Container Build (CI) — image built, tagged, scanned
  5. Thijmen (Staging Verify) — deploy to staging, verify
  6. Rutger (Production Apply) — deploy to production
  7. Stef (Network) — DNS, certificates, routing
  8. Karel (CDN) — edge cache invalidation, asset deploy
  9. Otto (Backup) — post-deploy backup verification

Step 1: Deployment Orchestration

Owner: Leon (Deployment Coordinator)
Input: Merged PR with approved changes
Output: Deployment plan with step sequence

What Leon does

Leon receives notification of a merged PR and creates a deployment plan. The plan determines:

  • Which steps are needed (not every deploy needs DB migration or infra changes)
  • The order of execution
  • Rollback procedures for each step
  • Success criteria for each step

What blocks this step

  • PR not approved by Marta/Iwona (merge gate)
  • Container image build failed
  • Previous deployment still in progress
  • Active incident in production

Rollback

Leon coordinates rollback across all steps. If any downstream step fails, Leon triggers rollback in reverse order.
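The plan-and-rollback logic above can be sketched as follows. This is a hypothetical illustration, not the real orchestrator's API: the step names and the `needs_db`/`needs_infra` flags are assumptions, but the reverse-order rollback rule comes straight from this section.

```python
def build_plan(needs_db: bool, needs_infra: bool) -> list[str]:
    """Return the ordered list of pipeline steps for this deploy.

    Not every deploy needs a DB migration or infra changes, so those
    steps are included conditionally.
    """
    plan = ["orchestration"]
    if needs_db:
        plan.append("db_migration")
    if needs_infra:
        plan.append("infrastructure")
    plan += ["container_build", "staging_verify", "production_apply",
             "network", "cdn", "backup_verify"]
    return plan

def rollback_order(plan: list[str], failed_step: str) -> list[str]:
    """Roll back every step that ran, in reverse order of execution."""
    executed = plan[: plan.index(failed_step) + 1]
    return list(reversed(executed))
```

For example, if staging verification fails, the rollback sequence starts with staging itself and unwinds back to the orchestration step.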


Step 2: Database Migration

Owner: Boris (DBA, Alfa) / Yoanna (DBA, Bravo)
Input: Migration files from the merged PR
Output: Schema changes applied to the target environment

What Boris/Yoanna do

Database migrations run before application deployment. This ensures the new code deploys against the new schema. The migration process:

  1. Review migration SQL — verify it is reversible
  2. Run migration on staging — verify it succeeds
  3. Verify data integrity — run constraint checks
  4. Run migration on production — within a maintenance window if destructive
  5. Verify production schema — matches expected state

Migration rules

  • Every migration must be reversible. Include both up and down SQL. If a migration cannot be reversed (e.g., dropping a column with data), document the data recovery procedure.
  • Never run destructive migrations without a backup. Otto verifies backup exists before Boris/Yoanna proceed.
  • Migrations are additive first. Add the new column, deploy new code, then remove the old column in a separate deployment. Never rename/remove and deploy simultaneously.
  • Use Drizzle migrations. Migrations live in drizzle/migrations/. No hand-written SQL applied directly.
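The reversibility rule above can be expressed as a pre-flight check. A minimal sketch, assuming a migration is represented as a dict with `up`, `down`, and `recovery_procedure` keys; this dict shape is illustrative and is not Drizzle's actual migration format.

```python
def check_reversible(migration: dict) -> None:
    """Reject a migration that has no down path and no documented recovery.

    Per the rules: every migration ships both up and down SQL; if it truly
    cannot be reversed (e.g. dropping a column with data), a data recovery
    procedure must be documented instead.
    """
    if not migration.get("up"):
        raise ValueError("migration has no up SQL")
    if not migration.get("down") and not migration.get("recovery_procedure"):
        raise ValueError(
            "irreversible migration: provide down SQL or document "
            "a data recovery procedure"
        )
```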

What blocks this step

  • Migration is not reversible and no recovery procedure documented
  • Backup not verified by Otto
  • Migration fails on staging
  • Data integrity check fails after migration

Rollback

Run the down migration. If the down migration fails, restore from the pre-migration backup (Otto provides).


Step 3: Infrastructure Changes

Owner: Arjan (Infrastructure Provisioner)
Input: Terraform changes from the merged PR
Output: Infrastructure updated to match desired state

What Arjan does

When the deployment includes infrastructure changes (new services, scaling changes, network policy updates), Arjan applies them via Terraform:

  1. terraform plan — review the diff, verify no unintended changes
  2. Apply to staging — verify infrastructure is functional
  3. Apply to production — after staging verification

Infrastructure-as-code rules

  • All infrastructure is defined in Terraform. No ClickOps, no manual kubectl apply, no imperative commands.
  • State is remote. Terraform state is stored centrally, not on anyone's machine.
  • Plan before apply. Every terraform apply is preceded by a terraform plan review. No blind applies.
  • Blast radius limits. Changes affecting more than 5 resources require human approval.
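The blast-radius gate can be sketched against Terraform's machine-readable plan (the JSON produced by `terraform show -json plan.out`, whose `resource_changes` list is part of Terraform's documented plan format). The 5-resource limit comes from the rule above; treating anything other than `no-op` as a change is our assumption.

```python
BLAST_RADIUS_LIMIT = 5  # changes above this count need human approval

def needs_human_approval(plan_json: dict) -> bool:
    """Count resources the plan would actually change (ignoring no-ops)."""
    changed = [
        rc for rc in plan_json.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    ]
    return len(changed) > BLAST_RADIUS_LIMIT
```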

What blocks this step

  • terraform plan shows unintended changes
  • Staging infrastructure verification fails
  • Blast radius exceeds limit without human approval
  • State lock held by another process

Rollback

Re-apply the previous Terraform state. Terraform tracks state history, making rollback deterministic.


Step 4: Container Build

Owner: CI pipeline (automated)
Input: Source code from the merged commit
Output: Tagged container image, security-scanned

What happens

  1. Build — Multi-stage Docker build from Dockerfile
  2. Tag — Image tagged with commit SHA (never latest)
  3. Scan — Security scan for known vulnerabilities
  4. Import — Image imported into the k3s containerd image store (docker save | k3s ctr images import)

Build rules

  • Build runs in CI, not on developer machines
  • Base images pinned to digest
  • Build cache used for speed, but cache is never shipped
  • Image size enforced (no build tools in runtime image)
  • Scan must pass — no critical or high vulnerabilities
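The tagging rule ("commit SHA, never latest") is easy to enforce mechanically. A sketch, assuming the tag is a full 40-character Git SHA; a short-SHA convention would only change the length check.

```python
import re

def valid_image_tag(tag: str) -> bool:
    """Accept only a full lowercase-hex commit SHA; reject `latest` and
    human-readable version tags."""
    if tag == "latest":
        return False
    return re.fullmatch(r"[0-9a-f]{40}", tag) is not None
```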

What blocks this step

  • Dockerfile syntax error
  • Build failure (dependency resolution, compilation)
  • Security scan finds critical/high vulnerability
  • Image exceeds size limit

Rollback

Not applicable — the image exists or it does not. If the build fails, deployment stops.


Step 5: Staging Verification

Owner: Thijmen (Kubernetes Operator)
Input: Container image, deployment manifests
Output: Staging environment running and verified

What Thijmen does

Thijmen deploys to the staging environment and runs verification:

  1. Deploy — Apply Kubernetes manifests to staging namespace
  2. Health check — All pods running, readiness probes passing
  3. Smoke tests — Core functionality works end-to-end
  4. Integration tests — Full test suite against staging
  5. Performance baseline — Response times within expected range
  6. Compare — Staging behavior matches previous staging deployment

Staging rules

  • Staging mirrors production topology. Same services, same configuration structure (different values), same network policies.
  • Staging uses production-equivalent data. Anonymized production data snapshot, not synthetic data.
  • Staging runs for at least 15 minutes before production deploy is approved. Some issues only appear under sustained load.

What blocks this step

  • Pod crash loops
  • Readiness probe failures
  • Smoke test failures
  • Integration test failures
  • Performance regression (> 20% slower than baseline)
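The 20% regression threshold above can be sketched as a simple gate. Comparing on p95 latency is our assumption; the threshold itself comes from the blocking rule.

```python
REGRESSION_THRESHOLD = 0.20  # block if > 20% slower than baseline

def performance_regressed(baseline_p95_ms: float, current_p95_ms: float) -> bool:
    """True when staging response time exceeds the baseline by more
    than the allowed threshold."""
    return current_p95_ms > baseline_p95_ms * (1 + REGRESSION_THRESHOLD)
```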

Rollback

Redeploy previous staging manifests. Staging rollback is practice for production rollback.


Step 6: Production Apply

Owner: Rutger (Production Operations Engineer)
Input: Staging-verified container image and manifests
Output: Production running new version

What Rutger does

Rutger is the final human gate. Before applying to production:

  1. Verify image match — Production image tag matches staging
  2. Check maintenance window — Deploy during low-traffic period for high-risk changes
  3. Verify rollback procedure — Tested in staging
  4. Apply — Rolling update to production
  5. Monitor — Watch health metrics for 15 minutes post-deploy
  6. Declare success — Or trigger rollback

Rolling update strategy

  • maxSurge: 1 — At most 1 extra pod during update
  • maxUnavailable: 0 — Never fewer pods than desired
  • minReadySeconds: 30 — Pod must be healthy for 30s before old pod is terminated
  • progressDeadlineSeconds: 300 — Fail if update takes > 5 min
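The arithmetic behind these settings, as a sketch: with maxSurge: 1 and maxUnavailable: 0, a rollout never drops below the desired replica count and runs at most one extra pod at any moment.

```python
def rollout_pod_bounds(replicas: int, max_surge: int = 1,
                       max_unavailable: int = 0) -> tuple[int, int]:
    """Return (minimum available, maximum total) pods during a rolling
    update, given absolute maxSurge/maxUnavailable values."""
    return replicas - max_unavailable, replicas + max_surge
```

With 3 replicas and the strategy above, capacity stays at 3 throughout the update while a 4th pod cycles in.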

What blocks this step

  • Staging verification not passed
  • Active incident in production
  • No rollback procedure documented
  • Deploy outside maintenance window for high-risk changes
  • Otto has not confirmed backup exists

Rollback

kubectl rollout undo deployment/{name} -n {namespace}

Rollback to the previous ReplicaSet. The previous container image is still in the registry. Rollback takes < 60 seconds.


Step 7: Network Configuration

Owner: Stef (Network + DNS + Certificates Engineer)
Input: Network changes from the deployment plan
Output: DNS records, TLS certificates, routing rules updated

What Stef does

When the deployment includes network changes:

  1. DNS updates — New records, changed records, TTL adjustments
  2. TLS certificates — New certificates issued, renewals processed
  3. Ingress rules — Routing updates for new endpoints
  4. Network policies — Firewall rules for new services
  5. Load balancer — Backend pool updates

Network rules

  • DNS through TransIP API — no manual DNS edits
  • Certificates via Let's Encrypt — automated issuance and renewal
  • All traffic TLS — no plaintext HTTP in production
  • Network policies default-deny — explicit allow rules only
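Default-deny evaluation reduces to: traffic passes only when an explicit allow rule matches, and everything else is dropped. A minimal sketch; the flat rule shape (source, destination, ports) is illustrative, not the actual policy format.

```python
def allowed(src: str, dst: str, port: int, rules: list[dict]) -> bool:
    """Default-deny: return True only if an explicit allow rule matches."""
    return any(
        r["src"] == src and r["dst"] == dst and port in r["ports"]
        for r in rules
    )
```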

What blocks this step

  • DNS propagation failure
  • Certificate issuance failure
  • Network policy blocks required traffic
  • Load balancer health check failure

Rollback

Revert DNS records (TTL-dependent), revert ingress rules, revert network policies. Certificate rollback not needed (old certificate remains valid).


Step 8: CDN and Edge

Owner: Karel (Edge Platform Specialist)
Input: Static assets and cache invalidation requirements
Output: Edge caches updated, assets deployed

What Karel does

  1. Asset deployment — Push new static assets to CDN origin
  2. Cache invalidation — Purge stale cached content
  3. Edge rules — Update edge-side routing or transformation rules
  4. Verification — Confirm assets are served from edge with correct headers and content

CDN rules

  • EU-only routing — bunny.net routing filter ensures traffic stays within EU (see EU Data Sovereignty)
  • Cache-busting via content hash — filenames include hash, eliminating stale cache issues
  • Immutable assets — Static files are never overwritten, only new versions are deployed
  • Origin shield — Reduce origin load by routing through a shield PoP
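Content-hash cache busting works because a changed asset gets a new filename, and therefore a new URL that no edge cache has seen. A sketch; the 12-character digest prefix is an assumption, as is using SHA-256.

```python
import hashlib
from pathlib import PurePosixPath

def hashed_asset_name(path: str, content: bytes) -> str:
    """Embed a digest of the file's bytes in its deployed filename,
    e.g. js/app.js -> js/app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name is derived from content, identical bytes always map to the same URL, and a one-byte change produces a fresh one, which is why assets can be immutable and never overwritten.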

What blocks this step

  • Asset verification failure (wrong content, missing files)
  • Cache invalidation incomplete
  • Edge rules syntax error

Rollback

Deploy previous asset version. Since assets are content-hashed and never overwritten, the previous version is still on the origin.


Step 9: Backup Verification

Owner: Otto (Backup Guardian + BCP)
Input: Post-deployment production state
Output: Backup verification report

What Otto does

After every production deployment, Otto verifies:

  1. Pre-deploy backup exists — Taken before Step 6
  2. Post-deploy backup runs — Captures the new production state
  3. Backup restore test — Verifies the backup can be restored (monthly full test, per-deploy spot check)
  4. Retention compliance — Backup retention meets policy
  5. Cross-zone replication — Backup exists in a different availability zone

Backup rules

  • Pre-deploy backup is mandatory. Rutger will not apply without Otto's confirmation.
  • Database and persistent volumes are backed up
  • Backup encryption — All backups encrypted at rest
  • Backup location — EU-only storage (see EU Data Sovereignty)
  • Retention — 30 daily, 12 monthly, 7 yearly

What blocks this step

  • Backup creation failure
  • Backup verification failure
  • Restore test failure (monthly)

Rollback

Not applicable to this step — Otto verifies, does not deploy. If backup verification fails, the deployment is flagged for rollback review.


Three-Zone Separation

GE operates three zones with strict boundaries:

Development

  • Local k3s cluster on developer machines
  • Full stack runs locally
  • No access to staging or production data
  • Free to experiment, break things, iterate

Staging

  • k3s cluster mirroring production topology
  • Anonymized production data snapshot
  • Same network policies as production
  • Performance testing runs here
  • No customer-facing traffic

Production

  • k3s production cluster
  • Real customer data, real traffic
  • Changes only through deployment pipeline
  • No direct access — all operations through tooling
  • 24/7 monitoring and alerting

Zone boundaries

| What       | Dev       | Staging         | Prod                    |
|------------|-----------|-----------------|-------------------------|
| Data       | Synthetic | Anonymized prod | Real                    |
| Access     | Open      | Restricted      | Pipeline only           |
| Changes    | Direct    | Pipeline        | Pipeline                |
| Monitoring | Optional  | Required        | Required + alerting     |
| Backups    | None      | Daily           | Pre/post-deploy + daily |

Pipeline Timing

| Step             | Owner        | Typical Duration | Parallel        |
|------------------|--------------|------------------|-----------------|
| Orchestration    | Leon         | 1 min            | No              |
| DB Migration     | Boris/Yoanna | 2-10 min         | No              |
| Infrastructure   | Arjan        | 3-15 min         | No              |
| Container Build  | CI           | 3-5 min          | With Steps 2-3  |
| Staging Verify   | Thijmen      | 15-30 min        | No              |
| Production Apply | Rutger       | 5-10 min         | No              |
| Network          | Stef         | 2-5 min          | With Step 6     |
| CDN              | Karel        | 2-5 min          | With Step 6     |
| Backup Verify    | Otto         | 5-10 min         | After Step 6    |

Total typical deployment time: 30-60 minutes for standard changes. Critical changes with extended staging verification: up to 2 hours.


Ownership

| Role                   | Agent   | Responsibility                          |
|------------------------|---------|-----------------------------------------|
| Deployment Coordinator | Leon    | Pipeline orchestration, sequencing      |
| Production Operations  | Rutger  | Production apply, monitoring, rollback  |
| DBA (Alfa)             | Boris   | Database migrations                     |
| DBA (Bravo)            | Yoanna  | Database migrations                     |
| Infrastructure         | Arjan   | Terraform, infrastructure-as-code       |
| Kubernetes Operator    | Thijmen | Staging deploy and verification         |
| Network Engineer       | Stef    | DNS, TLS, routing, network policies     |
| Edge Specialist        | Karel   | CDN, edge caching, asset deployment     |
| Backup Guardian        | Otto    | Pre/post-deploy backups, restore testing |
| Sysadmin               | Gerco   | Host-level operations, OS updates       |