Deployment Pipeline¶
The full deploy chain from merged code to running production. Every step is automated, auditable, and reversible.
Pipeline Overview¶
Code Merged (Marta/Iwona approval)
↓
Leon (Deployment Coordinator) — orchestrates entire chain
↓
Boris / Yoanna (DB Migration) — schema changes first
↓
Arjan (Infrastructure) — Terraform apply if infra changes
↓
Container Build (CI) — image built, tagged, scanned
↓
Thijmen (Staging Verify) — deploy to staging, verify
↓
Rutger (Production Apply) — deploy to production
↓
Stef (Network) — DNS, certificates, routing
↓
Karel (CDN) — edge cache invalidation, asset deploy
↓
Otto (Backup) — post-deploy backup verification
Step 1: Deployment Orchestration¶
Owner: Leon (Deployment Coordinator)
Input: Merged PR with approved changes
Output: Deployment plan with step sequence
What Leon does¶
Leon receives notification of a merged PR and creates a deployment plan. The plan determines:
- Which steps are needed (not every deploy needs DB migration or infra changes)
- The order of execution
- Rollback procedures for each step
- Success criteria for each step
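Leon's step selection can be sketched as a pure function of what the merged PR touches. This is an illustrative sketch, not the real plan schema: the flag names and step identifiers are assumptions.

```python
# Sketch: derive the step sequence from what the merged PR actually changes.
# Flag names and step identifiers are illustrative assumptions.
def plan_steps(has_migrations: bool, has_terraform: bool, has_network: bool,
               has_assets: bool) -> list[str]:
    steps: list[str] = []
    if has_migrations:
        steps.append("db_migration")        # Step 2: schema changes first
    if has_terraform:
        steps.append("infrastructure")      # Step 3: terraform apply
    steps += ["container_build", "staging_verify", "production_apply"]
    if has_network:
        steps.append("network")             # Step 7: DNS, certs, routing
    if has_assets:
        steps.append("cdn")                 # Step 8: edge cache, assets
    steps.append("backup_verify")           # Step 9: always runs
    return steps
```

A code-only change thus skips straight from build to staging verification, while a full-stack change walks the entire chain.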
What blocks this step¶
- PR not approved by Marta/Iwona (merge gate)
- Container image build failed
- Previous deployment still in progress
- Active incident in production
Rollback¶
Leon coordinates rollback across all steps. If any downstream step fails, Leon triggers rollback in reverse order.
Step 2: Database Migration¶
Owner: Boris (DBA, Alfa) / Yoanna (DBA, Bravo)
Input: Migration files from the merged PR
Output: Schema changes applied to the target environment
What Boris/Yoanna do¶
Database migrations run before application deployment. This ensures the new code deploys against the new schema. The migration process:
- Review migration SQL — verify it is reversible
- Run migration on staging — verify it succeeds
- Verify data integrity — run constraint checks
- Run migration on production — within a maintenance window if destructive
- Verify production schema — matches expected state
Migration rules¶
- Every migration must be reversible. Include both `up` and `down` SQL. If a migration cannot be reversed (e.g., dropping a column with data), document the data recovery procedure.
- Never run destructive migrations without a backup. Otto verifies that a backup exists before Boris/Yoanna proceed.
- Migrations are additive first. Add the new column, deploy new code, then remove the old column in a separate deployment. Never rename/remove and deploy simultaneously.
- Use Drizzle migrations. Migrations live in `drizzle/migrations/`. No hand-written SQL is applied directly.
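An additive, reversible migration under these rules might look like the following sketch. The table and column names are illustrative, not from the real schema.

```sql
-- up: add the new column alongside the existing data (additive; old code keeps working)
ALTER TABLE users ADD COLUMN display_name text;

-- down: reverse the change exactly
ALTER TABLE users DROP COLUMN display_name;
```

Because the change is additive, the old application version and the new one can both run against the migrated schema, which is what makes the separate "remove the old column" deployment safe.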
What blocks this step¶
- Migration is not reversible and no recovery procedure documented
- Backup not verified by Otto
- Migration fails on staging
- Data integrity check fails after migration
Rollback¶
Run the down migration. If the down migration fails, restore from the pre-migration backup (Otto provides it).
Step 3: Infrastructure Changes¶
Owner: Arjan (Infrastructure Provisioner)
Input: Terraform changes from the merged PR
Output: Infrastructure updated to match desired state
What Arjan does¶
When the deployment includes infrastructure changes (new services, scaling changes, network policy updates), Arjan applies them via Terraform:
- `terraform plan` — review the diff, verify no unintended changes
- Apply to staging — verify the infrastructure is functional
- Apply to production — after staging verification
Infrastructure-as-code rules¶
- All infrastructure is defined in Terraform. No ClickOps, no manual `kubectl apply`, no imperative commands.
- State is remote. Terraform state is stored centrally, not on anyone's machine.
- Plan before apply. Every `terraform apply` is preceded by a `terraform plan` review. No blind applies.
- Blast radius limits. Changes affecting more than 5 resources require human approval.
What blocks this step¶
- `terraform plan` shows unintended changes
- Staging infrastructure verification fails
- Blast radius exceeds limit without human approval
- State lock held by another process
Rollback¶
Re-apply the previous known-good Terraform configuration. Because state is stored remotely and versioned, rollback is deterministic.
Step 4: Container Build¶
Owner: CI pipeline (automated)
Input: Source code from the merged commit
Output: Tagged container image, security-scanned
What happens¶
- Build — Multi-stage Docker build from Dockerfile
- Tag — Image tagged with the commit SHA (never `latest`)
- Scan — Security scan for known vulnerabilities
- Import — Image imported into the k3s container runtime (`docker save | k3s ctr images import`)
Build rules¶
- Build runs in CI, not on developer machines
- Base images pinned to digest
- Build cache used for speed, but cache is never shipped
- Image size enforced (no build tools in runtime image)
- Scan must pass — no critical or high vulnerabilities
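A minimal multi-stage Dockerfile consistent with these rules might look like this sketch. It assumes a Node.js service; the paths and commands are illustrative, and a real build would pin the base image to a digest.

```dockerfile
# Build stage: toolchain and dev dependencies stay here and never ship.
# In practice, pin the base image to a digest (e.g., node:20-alpine@sha256:...).
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: production dependencies and built output only — no build tools.
FROM node:20-alpine
ENV NODE_ENV=production
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
```

The runtime stage never sees the compiler or dev dependencies, which is what keeps the image size down and the scan surface small.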
What blocks this step¶
- Dockerfile syntax error
- Build failure (dependency resolution, compilation)
- Security scan finds critical/high vulnerability
- Image exceeds size limit
Rollback¶
Not applicable — the image exists or it does not. If the build fails, deployment stops.
Step 5: Staging Verification¶
Owner: Thijmen (Kubernetes Operator)
Input: Container image, deployment manifests
Output: Staging environment running and verified
What Thijmen does¶
Thijmen deploys to the staging environment and runs verification:
- Deploy — Apply Kubernetes manifests to staging namespace
- Health check — All pods running, readiness probes passing
- Smoke tests — Core functionality works end-to-end
- Integration tests — Full test suite against staging
- Performance baseline — Response times within expected range
- Compare — Staging behavior matches previous staging deployment
Staging rules¶
- Staging mirrors production topology. Same services, same configuration structure (different values), same network policies.
- Staging uses production-equivalent data. Anonymized production data snapshot, not synthetic data.
- Staging runs for at least 15 minutes before production deploy is approved. Some issues only appear under sustained load.
What blocks this step¶
- Pod crash loops
- Readiness probe failures
- Smoke test failures
- Integration test failures
- Performance regression (> 20% slower than baseline)
Rollback¶
Redeploy previous staging manifests. Staging rollback is practice for production rollback.
Step 6: Production Apply¶
Owner: Rutger (Production Operations Engineer)
Input: Staging-verified container image and manifests
Output: Production running the new version
What Rutger does¶
Rutger is the final human gate. Before applying to production:
- Verify image match — Production image tag matches staging
- Check maintenance window — Deploy during low-traffic period for high-risk changes
- Verify rollback procedure — Tested in staging
- Apply — Rolling update to production
- Monitor — Watch health metrics for 15 minutes post-deploy
- Declare success — Or trigger rollback
Rolling update strategy¶
- maxSurge: 1 — At most 1 extra pod during update
- maxUnavailable: 0 — Never fewer pods than desired
- minReadySeconds: 30 — Pod must be healthy for 30s before old pod is terminated
- progressDeadlineSeconds: 300 — Fail if update takes > 5 min
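The strategy above maps directly onto a Kubernetes Deployment spec. A fragment (deployment name illustrative; selector and pod template omitted):

```yaml
# Rolling update settings matching the rules above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  minReadySeconds: 30            # pod must stay healthy 30s before an old pod is removed
  progressDeadlineSeconds: 300   # fail the rollout if it takes longer than 5 minutes
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # at most 1 extra pod during the update
      maxUnavailable: 0          # never fewer pods than desired
```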
What blocks this step¶
- Staging verification not passed
- Active incident in production
- No rollback procedure documented
- Deploy outside maintenance window for high-risk changes
- Otto has not confirmed backup exists
Rollback¶
Rollback to the previous ReplicaSet. The previous container image is still in the registry. Rollback takes < 60 seconds.
Step 7: Network Configuration¶
Owner: Stef (Network + DNS + Certificates Engineer)
Input: Network changes from the deployment plan
Output: DNS records, TLS certificates, routing rules updated
What Stef does¶
When the deployment includes network changes:
- DNS updates — New records, changed records, TTL adjustments
- TLS certificates — New certificates issued, renewals processed
- Ingress rules — Routing updates for new endpoints
- Network policies — Firewall rules for new services
- Load balancer — Backend pool updates
Network rules¶
- DNS through TransIP API — no manual DNS edits
- Certificates via Let's Encrypt — automated issuance and renewal
- All traffic TLS — no plaintext HTTP in production
- Network policies default-deny — explicit allow rules only
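The default-deny posture can be expressed as a single baseline NetworkPolicy per namespace (namespace name illustrative); each service then gets its own explicit allow rules on top.

```yaml
# Baseline: deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}        # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress             # no rules listed under either type, so all traffic is denied
```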
What blocks this step¶
- DNS propagation failure
- Certificate issuance failure
- Network policy blocks required traffic
- Load balancer health check failure
Rollback¶
Revert DNS records (propagation is TTL-dependent), revert ingress rules, and revert network policies. Certificate rollback is not needed; the previous certificate remains valid.
Step 8: CDN and Edge¶
Owner: Karel (Edge Platform Specialist)
Input: Static assets and cache invalidation requirements
Output: Edge caches updated, assets deployed
What Karel does¶
- Asset deployment — Push new static assets to CDN origin
- Cache invalidation — Purge stale cached content
- Edge rules — Update edge-side routing or transformation rules
- Verification — Confirm assets are served from edge with correct headers and content
CDN rules¶
- EU-only routing — bunny.net routing filter ensures traffic stays within EU (see EU Data Sovereignty)
- Cache-busting via content hash — filenames include hash, eliminating stale cache issues
- Immutable assets — Static files are never overwritten, only new versions are deployed
- Origin shield — Reduce origin load by routing through a shield PoP
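Cache-busting via content hash can be sketched as follows: the filename is derived from the file's contents, so any change produces a new name and a stale edge cache can never serve the old bytes under the new name. The file names and hash length here are illustrative conventions.

```python
# Sketch: content-hashed asset names make deployed files effectively immutable.
import hashlib
from pathlib import Path

def hashed_name(path: Path, length: int = 8) -> Path:
    """Return the path renamed to include a content-hash segment, e.g. app.1a2b3c4d.js."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:length]
    return path.with_name(f"{path.stem}.{digest}{path.suffix}")

asset = Path("app.js")
asset.write_text('console.log("hello");\n')  # stand-in for a real build artifact
asset.rename(hashed_name(asset))
```

Because the hashed name is unique to the content, old versions can stay on the origin forever, which is also what makes the CDN rollback in Step 8 trivial.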
What blocks this step¶
- Asset verification failure (wrong content, missing files)
- Cache invalidation incomplete
- Edge rules syntax error
Rollback¶
Deploy previous asset version. Since assets are content-hashed and never overwritten, the previous version is still on the origin.
Step 9: Backup Verification¶
Owner: Otto (Backup Guardian + BCP)
Input: Post-deployment production state
Output: Backup verification report
What Otto does¶
After every production deployment, Otto verifies:
- Pre-deploy backup exists — Taken before Step 6
- Post-deploy backup runs — Captures the new production state
- Backup restore test — Verifies the backup can be restored (monthly full test, per-deploy spot check)
- Retention compliance — Backup retention meets policy
- Cross-zone replication — Backup exists in a different availability zone
Backup rules¶
- Pre-deploy backup is mandatory. Rutger will not apply without Otto's confirmation.
- Database and persistent volumes are backed up
- Backup encryption — All backups encrypted at rest
- Backup location — EU-only storage (see EU Data Sovereignty)
- Retention — 30 daily, 12 monthly, 7 yearly
What blocks this step¶
- Backup creation failure
- Backup verification failure
- Restore test failure (monthly)
Rollback¶
Not applicable to this step — Otto verifies, does not deploy. If backup verification fails, the deployment is flagged for rollback review.
Three-Zone Separation¶
GE operates three zones with strict boundaries:
Development¶
- Local k3s cluster on developer machines
- Full stack runs locally
- No access to staging or production data
- Free to experiment, break things, iterate
Staging¶
- k3s cluster mirroring production topology
- Anonymized production data snapshot
- Same network policies as production
- Performance testing runs here
- No customer-facing traffic
Production¶
- k3s production cluster
- Real customer data, real traffic
- Changes only through deployment pipeline
- No direct access — all operations through tooling
- 24/7 monitoring and alerting
Zone boundaries¶
| What | Dev | Staging | Prod |
|---|---|---|---|
| Data | Synthetic | Anonymized prod | Real |
| Access | Open | Restricted | Pipeline only |
| Changes | Direct | Pipeline | Pipeline |
| Monitoring | Optional | Required | Required + alerting |
| Backups | None | Daily | Pre/post-deploy + daily |
Pipeline Timing¶
| Step | Owner | Typical Duration | Parallel |
|---|---|---|---|
| Orchestration | Leon | 1 min | No |
| DB Migration | Boris/Yoanna | 2-10 min | No |
| Infrastructure | Arjan | 3-15 min | No |
| Container Build | CI | 3-5 min | With Step 2-3 |
| Staging Verify | Thijmen | 15-30 min | No |
| Production Apply | Rutger | 5-10 min | No |
| Network | Stef | 2-5 min | With Step 6 |
| CDN | Karel | 2-5 min | With Step 6 |
| Backup Verify | Otto | 5-10 min | After Step 6 |
Total typical deployment time: 30-60 minutes for standard changes. Critical changes with extended staging verification: up to 2 hours.
Ownership¶
| Role | Agent | Responsibility |
|---|---|---|
| Deployment Coordinator | Leon | Pipeline orchestration, sequencing |
| Production Operations | Rutger | Production apply, monitoring, rollback |
| DBA (Alfa) | Boris | Database migrations |
| DBA (Bravo) | Yoanna | Database migrations |
| Infrastructure | Arjan | Terraform, infrastructure-as-code |
| Kubernetes Operator | Thijmen | Staging deploy and verification |
| Network Engineer | Stef | DNS, TLS, routing, network policies |
| Edge Specialist | Karel | CDN, edge caching, asset deployment |
| Backup Guardian | Otto | Pre/post-deploy backups, restore testing |
| Sysadmin | Gerco | Host-level operations, OS updates |