Traefik Kubernetes Migration Guide
Last Updated: 2026-01-29
Status: Active
Maintained by: GE Infrastructure Team
Overview
This document details the migration of Traefik from a Docker-only deployment to a hybrid Docker+Kubernetes architecture. The migration was necessary to provide native Kubernetes Ingress controller capabilities while keeping the existing Let's Encrypt SSL certificates and production services in place.
Key Outcome: Two separate Traefik instances working in tandem without port conflicts or SSL certificate issues.
Table of Contents
- Architecture Evolution
- Why Two Traefik Instances
- Critical Configuration: ClusterIP Service
- SSL Certificate Conflict Resolution
- Kubernetes Traefik Configuration
- RBAC Configuration
- IngressClass and Ingress Patterns
- Let's Encrypt Management
- Migration Procedure
- Rollback Procedure
- Troubleshooting
Architecture Evolution
Old Architecture (Docker Traefik Only)
```mermaid
flowchart TB
    Users[External Users] --> DockerTraefik[Docker Traefik<br/>80/443<br/>Let's Encrypt]
    DockerTraefik --> AdminUI[Admin UI<br/>office.growing-europe.com]
    DockerTraefik --> DockerServices[Other Docker Services]
    K8sServices[K8s Services<br/>No Ingress] -.No routing.-> DockerTraefik
```
Limitations:
- No native Kubernetes Ingress support
- Manual routing configuration required for K8s services
- No automatic service discovery for K8s workloads
- Difficult to scale K8s-based hosting
New Architecture (Hybrid Docker + K8s)
```mermaid
flowchart TB
    Users[External Users] --> DockerTraefik[Docker Traefik<br/>80/443<br/>Let's Encrypt SSL]
    DockerTraefik --> AdminUI[Admin UI<br/>office.growing-europe.com]
    DockerTraefik --> DockerServices[Other Docker Services]
    DockerTraefik --> K8sTraefik[K8s Traefik<br/>ClusterIP<br/>IngressController]
    K8sTraefik --> ClientWorkloads[Client Workloads<br/>*.hosting.growing-europe.com]
    K8sTraefik --> K8sServices[K8s Services<br/>Auto-discovered]
```
Improvements:
- Native Kubernetes Ingress resources
- Automatic service discovery via IngressController
- Scalable multi-tenant hosting
- Unified SSL certificate management
- No port conflicts
Why Two Traefik Instances
Rationale
The hybrid architecture solves several critical challenges:
- Existing SSL Certificates
  - Docker Traefik holds production Let's Encrypt certificates in acme.json
  - Migrating certificates would cause downtime
  - Multiple domains already configured and renewed automatically
- Port Binding Constraints
  - K3s LoadBalancer services use svclb (ServiceLB) to bind host ports
  - If K8s Traefik uses LoadBalancer, it attempts to bind ports 80/443
  - Docker Traefik already owns these ports on the host
  - Result: Port conflict, service fails to start
- Operational Continuity
  - Docker services (admin-ui, Redis, Vault) need continued access
  - Zero downtime migration required
  - Gradual rollout of K8s hosting capabilities
Division of Responsibilities
| Component | Docker Traefik | K8s Traefik |
|---|---|---|
| Port Binding | 80/443 on host | None (ClusterIP) |
| SSL Certificates | Let's Encrypt via acme.json | None (passes through) |
| Routing | Docker containers, proxy to K8s | Kubernetes Ingress resources |
| Service Discovery | Docker labels | K8s Ingress/Service resources |
| High Availability | Single instance | 2 replicas with anti-affinity |
Critical Configuration: ClusterIP Service
The ClusterIP Requirement
CRITICAL: The K8s Traefik service MUST be ClusterIP, NOT LoadBalancer.
File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: ge-ingress
  annotations:
    # WARNING: Keep as ClusterIP - Docker Traefik owns 80/443 on host
    ge.hosting/note: "ClusterIP only - Docker Traefik handles external ingress"
spec:
  # ClusterIP - NOT LoadBalancer (would conflict with Docker Traefik)
  type: ClusterIP
  selector:
    app: traefik
  ports:
    - name: web
      port: 80
      targetPort: web
      protocol: TCP
    - name: websecure
      port: 443
      targetPort: websecure
      protocol: TCP
```
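Because one changed line in this manifest is enough to cause the outage described below, a pre-apply guard can stop the regression before kubectl ever sees it. The following is a minimal sketch, assuming the Service is the only resource in traefik-service.yaml; assert_clusterip_manifest is a hypothetical helper, not part of ge-bootstrap.

```bash
#!/usr/bin/env bash
# Hypothetical pre-apply guard (not part of ge-bootstrap): succeed only if the
# Traefik Service manifest explicitly declares type ClusterIP.
# Assumes one Service per file; comment lines never match the patterns.
assert_clusterip_manifest() {
  local file="$1"
  if [[ ! -f "$file" ]]; then
    echo "FAIL: ${file} not found" >&2
    return 2
  fi
  # LoadBalancer would make K3s ServiceLB (svclb) try to bind 80/443 on the host
  if grep -Eq '^[[:space:]]*type:[[:space:]]*"?LoadBalancer"?[[:space:]]*$' "$file"; then
    echo "FAIL: ${file} declares LoadBalancer; Docker Traefik owns 80/443" >&2
    return 1
  fi
  if grep -Eq '^[[:space:]]*type:[[:space:]]*"?ClusterIP"?[[:space:]]*$' "$file"; then
    echo "OK: ${file} is ClusterIP"
    return 0
  fi
  # An omitted type also defaults to ClusterIP, but the guide wants it explicit
  echo "FAIL: ${file} has no explicit 'type: ClusterIP'" >&2
  return 1
}
```

Chaining it in front of the apply, as in assert_clusterip_manifest /home/claude/ge-bootstrap/k8s/base/ingress/traefik-service.yaml && kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-service.yaml, turns a LoadBalancer edit into a failed deploy instead of a production outage.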
What Happens If You Change to LoadBalancer
If you change the service type to LoadBalancer:
- K3s ServiceLB (svclb) activates
  - Creates DaemonSet pods on all nodes
  - Attempts to bind ports 80/443 on the host
- Port Conflict
  - Docker Traefik already owns 80/443
  - K3s svclb fails to bind ports
  - Service remains in "Pending" state
- SSL Certificate Conflict
  - External traffic routes to K8s Traefik instead of Docker Traefik
  - K8s Traefik has no Let's Encrypt certificates
  - SSL verification fails
  - office.growing-europe.com returns SSL errors
- Production Outage
  - Admin UI becomes unreachable
  - Existing Docker services lose ingress
  - Certificate renewal fails
Verification
Check that the service is correctly configured:
```bash
# Verify service type
kubectl get svc traefik -n ge-ingress -o jsonpath='{.spec.type}'
# Expected output: ClusterIP

# Verify no external IP
kubectl get svc traefik -n ge-ingress
# Should show: EXTERNAL-IP <none>

# Check for svclb pods (should not exist for ClusterIP)
kubectl get pods -n kube-system | grep svclb-traefik
# Expected: No results
```
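The three checks above can be folded into one pass/fail function for use in scripts such as the startup validation. This is a sketch that assumes kubectl is already pointed at the K3s cluster; verify_k8s_traefik is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: the manual checks above as one function. Returns 0 when the Service
# is ClusterIP, has no load-balancer address, and no svclb-traefik pods exist.
verify_k8s_traefik() {
  local ns="${1:-ge-ingress}" svc_type ext_ip
  svc_type="$(kubectl get svc traefik -n "$ns" -o jsonpath='{.spec.type}')" || return 2
  if [[ "$svc_type" != "ClusterIP" ]]; then
    echo "FAIL: service type is '${svc_type}', expected ClusterIP" >&2
    return 1
  fi
  # A LoadBalancer address is published here; ClusterIP leaves it empty
  ext_ip="$(kubectl get svc traefik -n "$ns" -o jsonpath='{.status.loadBalancer.ingress[*].ip}')" || return 2
  if [[ -n "$ext_ip" ]]; then
    echo "FAIL: external IP assigned (${ext_ip})" >&2
    return 1
  fi
  if kubectl get pods -n kube-system --no-headers | grep -q 'svclb-traefik'; then
    echo "FAIL: svclb-traefik pods present in kube-system" >&2
    return 1
  fi
  echo "OK: traefik/${ns} is ClusterIP with no svclb pods"
}
```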
SSL Certificate Conflict Resolution
The office.growing-europe.com Issue
Problem Encountered: During initial implementation, K8s Traefik was configured as LoadBalancer. This caused:
- K3s svclb intercepted port 443 traffic
- Traffic routed to K8s Traefik (no certificates)
- office.growing-europe.com showed SSL errors
- Let's Encrypt renewal failed
Root Cause: Two ingress controllers competing for the same ports with different SSL configurations.
Solution
Step 1: Changed K8s Traefik service to ClusterIP
Step 2: Added warning comments to prevent regression
Step 3: Added corresponding warning in Docker Compose
```yaml
# In docker-compose.yml traefik service
# IMPORTANT: Docker Traefik is the PRODUCTION ingress for minisforum.
# K8s Traefik (k8s/base/ingress/) is ClusterIP only for internal routing.
# DO NOT enable K8s Traefik LoadBalancer - it conflicts with ports 80/443.
```
Step 4: Documented architecture in both locations
Traffic Flow (Corrected)
```text
External Request (HTTPS)
    ↓  (DNS: *.growing-europe.com → server IP)
Docker Traefik (port 443)
    ↓  (Let's Encrypt SSL termination)
Route Decision:
    ├─ office.growing-europe.com     → Admin UI (Docker)
    ├─ *.hosting.growing-europe.com  → K8s Traefik (ClusterIP)
    └─ Other domains                 → Docker services
          ↓
K8s Traefik IngressController
    ↓  (Reads Ingress resources)
Client Workload Services (ClusterIP)
    ↓
Client Pods
```
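To exercise this flow from the server itself without depending on public DNS, a request can be pinned to the local Docker Traefik with curl's --resolve option. It keeps the SNI name and Host header intact, so Docker Traefik selects the same certificate and router it would for an external user. A minimal sketch, assuming Docker Traefik publishes port 443 on the host loopback; probe_local_route is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: send an HTTPS request for a hostname straight to the local Docker
# Traefik (127.0.0.1:443) and print the HTTP status code ("000" on failure).
# Run on the server. probe_local_route is a hypothetical helper.
probe_local_route() {
  local host="$1"
  # --resolve pins host:443 to loopback while keeping SNI and the Host header,
  # so certificate validation and routing behave as for real clients
  curl -sS -o /dev/null -w '%{http_code}\n' \
    --max-time 10 \
    --resolve "${host}:443:127.0.0.1" \
    "https://${host}/"
}
```

Running probe_local_route office.growing-europe.com tests the Docker-only path, while probe_local_route test.hosting.growing-europe.com exercises the full Docker Traefik to K8s Traefik to client pod chain; a certificate problem shows up as a curl error plus status 000.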
Kubernetes Traefik Configuration
Static Configuration
File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-config.yaml
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-config
  namespace: ge-ingress
data:
  traefik.yaml: |
    # Traefik Static Configuration (v2.11)
    # API and Dashboard
    api:
      dashboard: true
      insecure: false
    # Entry Points
    entryPoints:
      web:
        address: ":80"
        http:
          redirections:
            entryPoint:
              to: websecure
              scheme: https
              permanent: true
      websecure:
        address: ":443"
        http:
          tls:
            certResolver: letsencrypt
      traefik:
        address: ":8080"
    # Certificate Resolvers
    certificatesResolvers:
      letsencrypt:
        acme:
          email: dirkjan@growing-europe.com
          storage: /data/acme.json
          httpChallenge:
            entryPoint: web
    # Providers
    providers:
      kubernetesIngress:
        namespaces:
          - ge-ingress
          - ge-hosting
          - ge-system
        ingressClass: traefik
      kubernetesCRD:
        namespaces:
          - ge-ingress
          - ge-hosting
          - ge-system
    # Logging
    log:
      level: INFO
      format: json
    # Metrics for Prometheus
    metrics:
      prometheus:
        entryPoint: traefik
        addEntryPointsLabels: true
        addServicesLabels: true
    # Ping for health checks
    ping:
      entryPoint: traefik
```
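Key Configuration Points:
- Certificate Resolver: Configured but not actively used (Docker Traefik handles SSL)
- Namespace Watching: Limited to specific namespaces for security; Ingress resources in namespaces not listed here (for example the sh-<client> client namespaces) are not picked up by these providers
- IngressClass: Set to traefik for explicit routing
- Metrics: Prometheus scraping enabled on port 8080 (the traefik entrypoint)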
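The ping and Prometheus metrics endpoints both live on the internal traefik entrypoint (:8080), which the Service does not expose. One way to check them is a short-lived port-forward. The sketch below assumes the Deployment's container listens on 8080 as configured above and uses local port 18080 as an arbitrary choice; check_k8s_traefik_ping is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: probe GET /ping on the K8s Traefik "traefik" entrypoint (:8080)
# through a temporary port-forward. A healthy Traefik answers with "OK".
check_k8s_traefik_ping() {
  local ns="${1:-ge-ingress}" local_port="${2:-18080}" pf_pid body
  kubectl -n "$ns" port-forward deploy/traefik "${local_port}:8080" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # give the port-forward time to start listening
  body="$(curl -s --max-time 5 "http://127.0.0.1:${local_port}/ping")"
  # Tear the port-forward down before reporting
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null
  if [[ "$body" == "OK" ]]; then
    echo "OK: ping via ${ns}/traefik"
    return 0
  fi
  echo "FAIL: ping returned '${body}'" >&2
  return 1
}
```

The same port-forward serves /metrics, so swapping the path is enough to spot-check that the Prometheus labels configured above are being emitted.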
Deployment Configuration
File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-deployment.yaml
High availability deployment with 2 replicas:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: ge-ingress
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  # ... pod template with anti-affinity rules
```
Features:
- 2 Replicas: High availability
- Pod Anti-Affinity: Spreads replicas across nodes
- Rolling Updates: Zero downtime during updates
- Security Context: Non-root user, read-only filesystem
- Resource Limits: CPU 100m-500m, Memory 128Mi-256Mi
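Whether the two replicas actually landed on different nodes can be read from the pod specs. A sketch assuming the pods carry the label app=traefik (the Service selector shown earlier); check_traefik_spread is hypothetical. On a single-node cluster this check cannot pass: a preferred anti-affinity rule lets both replicas share the node, while a required rule leaves the second replica Pending and therefore unscheduled.

```bash
#!/usr/bin/env bash
# Sketch: report how many scheduled Traefik replicas exist and on how many
# distinct nodes. Pods not yet scheduled (empty nodeName) are skipped.
check_traefik_spread() {
  local ns="${1:-ge-ingress}" nodes scheduled distinct
  nodes="$(kubectl get pods -n "$ns" -l app=traefik \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}')" || return 2
  scheduled="$(grep -c . <<<"$nodes")"
  distinct="$(grep . <<<"$nodes" | sort -u | wc -l | tr -d ' ')"
  echo "scheduled=${scheduled} nodes=${distinct}"
  # Spread is complete only when every scheduled replica has its own node
  [[ "$scheduled" -gt 0 && "$scheduled" -eq "$distinct" ]]
}
```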
RBAC Configuration
Required Permissions
File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-rbac.yaml
Traefik requires cluster-wide permissions to watch Ingress resources:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: traefik-ingress-controller
rules:
  # Core API resources
  - apiGroups: [""]
    resources: ["services", "endpoints", "secrets"]
    verbs: ["get", "list", "watch"]
  # Ingress resources
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses", "ingressclasses"]
    verbs: ["get", "list", "watch"]
  # Update ingress status
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses/status"]
    verbs: ["update"]
  # Traefik CRDs (the v2.11 kubernetesCRD provider lists all of these kinds)
  - apiGroups: ["traefik.io", "traefik.containo.us"]
    resources:
      - ingressroutes
      - ingressroutetcps
      - ingressrouteudps
      - middlewares
      - middlewaretcps
      - tlsoptions
      - tlsstores
      - traefikservices
      - serverstransports
    verbs: ["get", "list", "watch"]
```
Security Considerations:
- Read-Only Access: Apart from Ingress status (below), Traefik can only read resources, not modify them
- Status Updates: Can update Ingress status (required for LoadBalancer IP)
- No Secrets Write: Cannot create or modify secrets
- Namespace-Scoped Watching: The ClusterRole grants read access cluster-wide; the provider configuration limits which namespaces Traefik actually watches
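kubectl auth can-i with impersonation is a quick way to confirm the ServiceAccount ends up with the access described above, and no more. The sketch below checks a handful of representative permissions; the list is illustrative rather than exhaustive, check_traefik_rbac is hypothetical, and running it requires permission to impersonate service accounts (cluster-admin has this).

```bash
#!/usr/bin/env bash
# Sketch: compare the traefik ServiceAccount's effective permissions with
# the intent of the ClusterRole above (reads allowed, writes denied).
TRAEFIK_SA="system:serviceaccount:ge-ingress:traefik"

check_traefik_rbac() {
  local failed=0 check verb resource expect answer
  # Each entry: verb resource expected-answer
  for check in \
    "list services yes" \
    "watch endpoints yes" \
    "get secrets yes" \
    "list ingresses.networking.k8s.io yes" \
    "list ingressroutes.traefik.io yes" \
    "create secrets no" \
    "delete ingresses.networking.k8s.io no"; do
    read -r verb resource expect <<<"$check"
    # can-i prints "yes" or "no"; -A asks about all namespaces
    answer="$(kubectl auth can-i "$verb" "$resource" --as="$TRAEFIK_SA" -A 2>/dev/null)"
    if [[ "$answer" != "$expect" ]]; then
      echo "MISMATCH: ${verb} ${resource}: got '${answer}', expected '${expect}'" >&2
      failed=1
    fi
  done
  return "$failed"
}
```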
ServiceAccount
Binding:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik
    namespace: ge-ingress
```
IngressClass and Ingress Patterns
IngressClass Resource
File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-ingressclass.yaml
```yaml
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: traefik
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: traefik.io/ingress-controller
```
Purpose:
- Declares traefik as the default IngressClass
- All Ingress resources without explicit ingressClassName use this controller
- Prevents conflicts if multiple ingress controllers are installed
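Kubernetes refuses to admit new Ingresses without an ingressClassName when more than one IngressClass is marked as default, so it is worth confirming that traefik is the only one. This matters on K3s in particular, where the bundled Traefik add-on (if it was not disabled) may install its own IngressClass. A sketch; count_default_ingressclasses is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count IngressClasses annotated as the cluster default.
# Anything other than 1 means Ingresses without a class may be rejected
# (more than one default) or left unassigned (no default).
count_default_ingressclasses() {
  kubectl get ingressclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.ingressclass\.kubernetes\.io/is-default-class}{"\n"}{end}' \
  | awk -F'\t' '$2 == "true" { n++ } END { print n + 0 }'
}
```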
Standard Ingress Pattern
Client workloads use this pattern:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: sh-acme-corp
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - acme-corp.hosting.growing-europe.com
      secretName: acme-corp-tls
  rules:
    - host: acme-corp.hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```
Annotations Explained:
| Annotation | Purpose |
|---|---|
| router.entrypoints: websecure | Only accept HTTPS traffic |
| router.tls: "true" | Enable TLS on this route |
| router.tls.certresolver: letsencrypt | Use Let's Encrypt for SSL |
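Client Ingresses share this shape and differ only in the client name, so the pattern can be templated. A minimal sketch, assuming the sh-<client> namespace, <client>.hosting.growing-europe.com host and <client>-tls secret naming from the example above hold for every client; render_client_ingress is hypothetical and is not taken from create-client.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the standard client Ingress for one client.
# Naming follows the acme-corp example: namespace sh-<client>, host
# <client>.hosting.growing-europe.com, TLS secret <client>-tls.
render_client_ingress() {
  local client="$1" service="${2:-web}" port="${3:-80}" host
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  # The name ends up in a namespace and a DNS name, so require a DNS label
  if [[ ! "$client" =~ $re ]]; then
    echo "invalid client name: ${client}" >&2
    return 1
  fi
  host="${client}.hosting.growing-europe.com"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${service}
  namespace: sh-${client}
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - ${host}
      secretName: ${client}-tls
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: ${port}
EOF
}
```

Piping the output through kubectl apply --dry-run=server -f - validates it against the API server without creating anything.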
Wildcard Ingress (Hosting Landing Page)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hosting-wildcard
  namespace: ge-hosting
spec:
  ingressClassName: traefik
  rules:
    - host: hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hosting-landing
                port:
                  number: 80
```
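Let's Encrypt Management
Certificate Storage
Location: Docker Traefik only
All Let's Encrypt certificates are stored in /home/claude/ge-bootstrap/traefik/acme.json on the host.
File Permissions: 600 (owner read/write only); Traefik's ACME resolver rejects an acme.json with looser permissions.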
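Location and mode can be checked in one step before touching Docker Traefik. A sketch assuming GNU coreutils stat, as found on most Linux hosts; check_acme_file is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify acme.json exists, is non-empty, and has mode 600.
check_acme_file() {
  local file="${1:-/home/claude/ge-bootstrap/traefik/acme.json}" mode
  if [[ ! -s "$file" ]]; then
    echo "FAIL: ${file} missing or empty" >&2
    return 1
  fi
  mode="$(stat -c '%a' "$file")"   # GNU stat: octal permission bits
  if [[ "$mode" != "600" ]]; then
    echo "FAIL: ${file} has mode ${mode}, expected 600" >&2
    return 1
  fi
  echo "OK: ${file} (mode 600)"
}
```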
Certificate Issuance Flow
```mermaid
sequenceDiagram
    participant Client
    participant DockerTraefik
    participant LetsEncrypt
    participant K8sTraefik
    participant ClientPod
    Client->>DockerTraefik: HTTPS request to acme-corp.hosting.growing-europe.com
    DockerTraefik->>DockerTraefik: Check acme.json for certificate
    alt Certificate exists
        DockerTraefik->>K8sTraefik: Forward with SSL termination
        K8sTraefik->>ClientPod: Route via Ingress
        ClientPod->>K8sTraefik: Response
        K8sTraefik->>DockerTraefik: Response
        DockerTraefik->>Client: HTTPS response
    else Certificate missing
        DockerTraefik->>LetsEncrypt: HTTP-01 challenge request
        LetsEncrypt->>DockerTraefik: Challenge token
        DockerTraefik->>LetsEncrypt: Challenge response
        LetsEncrypt->>DockerTraefik: Certificate issued
        DockerTraefik->>DockerTraefik: Save to acme.json
        DockerTraefik->>K8sTraefik: Forward with SSL termination
    end
```
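Important Notes
- K8s Traefik does NOT issue certificates
  - It receives already-decrypted traffic from Docker Traefik
  - Certificate annotations in Ingress are informational only
  - Docker Traefik handles all Let's Encrypt operations
- Certificate Renewal
  - Automatic renewal 30 days before expiry
  - Handled entirely by Docker Traefik
  - No K8s restarts required
- Backup Strategy
  - Back up traefik/acme.json before any change to Docker Traefik (see the Pre-Migration Checklist)
  - Keep dated copies named acme.json.backup-YYYYMMDD next to the original, the naming the Restore from Backup procedure expects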
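The dated backup naming above can be produced by a small helper run before any change to Docker Traefik. A sketch; backup_acme is hypothetical, and it keeps the copy next to the original, as the restore procedure expects.

```bash
#!/usr/bin/env bash
# Sketch: copy acme.json to acme.json.backup-YYYYMMDD next to the original
# and print the path of the copy. The copy is forced to mode 600.
backup_acme() {
  local src="${1:-/home/claude/ge-bootstrap/traefik/acme.json}" dest
  dest="${src}.backup-$(date +%Y%m%d)"
  if [[ ! -s "$src" ]]; then
    echo "FAIL: ${src} missing or empty" >&2
    return 1
  fi
  cp -p "$src" "$dest" || return 1   # -p keeps mode and timestamps
  chmod 600 "$dest" || return 1
  echo "$dest"
}
```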
Migration Procedure
If migrating from scratch or to a new environment:
Pre-Migration Checklist
- [ ] Backup existing traefik/acme.json
- [ ] Document all existing Traefik routes
- [ ] Verify Docker Traefik is running and healthy
- [ ] Test DNS resolution for all domains
- [ ] Ensure K3s cluster is operational
Migration Steps
Step 1: Create Namespaces
Step 2: Deploy RBAC
Step 3: Create ConfigMap
Step 4: Deploy Traefik (ClusterIP)
```bash
kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-deployment.yaml
kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-service.yaml
```
Step 5: Wait for Readiness
Step 6: Deploy IngressClass
Step 7: Create Test Ingress
```bash
# Deploy a test client to verify routing
/home/claude/ge-bootstrap/tools/create-client.sh \
  --type shared \
  --name test \
  --resources small
```
Step 8: Verify Traffic Flow
```bash
# Check pod status
kubectl get pods -n ge-ingress
# Check service
kubectl get svc traefik -n ge-ingress
# Test routing
curl -I https://test.hosting.growing-europe.com
```
Step 9: Monitor Logs
```bash
# K8s Traefik logs
kubectl logs -n ge-ingress deploy/traefik --tail=50
# Docker Traefik logs
docker logs traefik --tail=50
```
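Steps 1 through 6 can be run as one script. This is a sketch under stated assumptions: the RBAC, ConfigMap, Deployment, Service and IngressClass file names come from this guide, while creating the three namespaces with kubectl create namespace (rather than from a manifest) and the 120-second rollout timeout are assumptions. migrate_k8s_traefik is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of Steps 1-6. Namespace creation and the timeout are assumptions;
# the manifest names are the files described in this guide.
INGRESS_DIR="/home/claude/ge-bootstrap/k8s/base/ingress"

migrate_k8s_traefik() {
  local ns f
  # Step 1: create namespaces idempotently (render locally, then apply)
  for ns in ge-ingress ge-hosting ge-system; do
    kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f - || return 1
  done
  # Steps 2-4: RBAC, ConfigMap, then the Deployment and ClusterIP Service
  for f in traefik-rbac.yaml traefik-config.yaml traefik-deployment.yaml traefik-service.yaml; do
    kubectl apply -f "${INGRESS_DIR}/${f}" || return 1
  done
  # Step 5: block until the rollout reports all replicas ready
  kubectl rollout status deploy/traefik -n ge-ingress --timeout=120s || return 1
  # Step 6: IngressClass
  kubectl apply -f "${INGRESS_DIR}/traefik-ingressclass.yaml"
}
```

Steps 7 to 9 stay manual on purpose, since they depend on a test client and on reading logs.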
Post-Migration Validation
Run the verification checklist:
```bash
# 1. Service type is ClusterIP
kubectl get svc traefik -n ge-ingress -o jsonpath='{.spec.type}'
# 2. No svclb pods
kubectl get pods -n kube-system | grep svclb-traefik
# 3. Docker Traefik healthy
docker ps | grep traefik
docker logs traefik 2>&1 | grep -i error | tail -20
# 4. SSL certificate valid
curl -I https://office.growing-europe.com
# 5. K8s Ingress routing works
kubectl get ingress -A
```
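curl -I confirms that the TLS handshake succeeds but not how close the certificate is to expiry. Since Docker Traefik renews about 30 days before expiry, a served certificate with well under 30 days left suggests renewal is failing. A sketch assuming OpenSSL and GNU date are available on the server; cert_days_left is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: whole days until the certificate served for a hostname expires.
# Requires OpenSSL and GNU date (for date -d).
cert_days_left() {
  local host="$1" end_date end_epoch
  # s_client fetches the served certificate; x509 extracts "notAfter=..."
  end_date="$(openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)"
  if [[ -z "$end_date" ]]; then
    echo "FAIL: no certificate received from ${host}" >&2
    return 1
  fi
  end_epoch="$(date -d "$end_date" +%s)" || return 1
  echo $(( (end_epoch - $(date +%s)) / 86400 ))
}
```

For example, cert_days_left office.growing-europe.com prints a single integer that is easy to compare against 30 in a startup check.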
Rollback Procedure
If K8s Traefik causes issues and needs to be removed:
Quick Rollback (Remove K8s Traefik)
```bash
# 1. Delete all Ingress resources (clients will be unreachable)
kubectl delete ingress --all -A
# 2. Delete K8s Traefik
kubectl delete -k /home/claude/ge-bootstrap/k8s/base/ingress/
# 3. Verify Docker Traefik is handling all traffic
docker logs traefik --tail=50
# 4. Office UI should still work
curl -I https://office.growing-europe.com
```
Full Rollback (Restore Docker-Only Architecture)
```bash
# 1. Remove all client namespaces
kubectl delete namespace -l app.kubernetes.io/component=client-hosting
# 2. Delete ingress namespace
kubectl delete namespace ge-ingress
# 3. Remove ingress from kustomization
#    Edit /home/claude/ge-bootstrap/k8s/base/kustomization.yaml
#    Remove: - ./ingress/
# 4. Verify Docker Traefik
docker ps | grep traefik
docker exec traefik cat /etc/traefik/traefik.toml
```
Restore from Backup
If acme.json is corrupted:
```bash
# Stop Docker Traefik
docker stop traefik
# Restore backup
cp /home/claude/ge-bootstrap/traefik/acme.json.backup-YYYYMMDD \
   /home/claude/ge-bootstrap/traefik/acme.json
# Fix permissions
chmod 600 /home/claude/ge-bootstrap/traefik/acme.json
# Restart Docker Traefik
docker start traefik
```
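When several dated backups exist, the newest can be picked by name, because the YYYYMMDD suffix sorts in date order. A sketch; latest_acme_backup is hypothetical and assumes the backups sit next to acme.json with the naming used in the cp command above.

```bash
#!/usr/bin/env bash
# Sketch: print the newest acme.json.backup-YYYYMMDD in a directory.
# A lexical sort of YYYYMMDD suffixes is also a chronological sort.
latest_acme_backup() {
  local dir="${1:-/home/claude/ge-bootstrap/traefik}" latest
  latest="$(find "$dir" -maxdepth 1 -type f -name 'acme.json.backup-[0-9]*' | sort | tail -n 1)"
  if [[ -z "$latest" ]]; then
    echo "FAIL: no acme.json backups in ${dir}" >&2
    return 1
  fi
  echo "$latest"
}
```

With it, the restore step can use cp "$(latest_acme_backup)" /home/claude/ge-bootstrap/traefik/acme.json instead of a hand-typed date.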
Troubleshooting
K8s Traefik Pods Not Starting
Symptoms:
Diagnosis:
Common Causes:
- RBAC Permissions Missing
- ConfigMap Syntax Error
- Image Pull Error
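The common causes above usually surface as distinct container states, so a first pass can map each pod's waiting reason to a likely cause before reading logs. A sketch assuming the pods carry app=traefik; the reason-to-cause mapping is a heuristic, and triage_traefik_pods is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print each Traefik pod with a likely cause derived from the
# container's waiting reason. Heuristic only; confirm with logs and events.
triage_traefik_pods() {
  local ns="${1:-ge-ingress}" name reason
  kubectl get pods -n "$ns" -l app=traefik \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
  | while IFS=$'\t' read -r name reason; do
      case "$reason" in
        "") echo "${name}: no waiting reason (running, or not yet scheduled)" ;;
        ErrImagePull|ImagePullBackOff) echo "${name}: image pull error" ;;
        ContainerCreating) echo "${name}: still creating; check events for mount errors (e.g. missing ConfigMap)" ;;
        CrashLoopBackOff) echo "${name}: crash loop; check logs for config syntax or RBAC 'forbidden' errors" ;;
        *) echo "${name}: ${reason}" ;;
      esac
    done
}
```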
Ingress Not Routing Traffic
Symptoms:
Diagnosis:
```bash
# 1. Check Ingress resource exists
kubectl get ingress -n sh-client
# 2. Check Ingress has address
kubectl describe ingress web -n sh-client
# 3. Check Traefik logs
kubectl logs -n ge-ingress deploy/traefik | grep client
```
Solutions:
- IngressClass Mismatch
- Service Not Found
- Network Policy Blocking Traffic
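For the IngressClass Mismatch case, listing every Ingress whose class is not traefik narrows things down quickly. A sketch; find_non_traefik_ingresses is hypothetical. Ingresses with no class are shown as (none), since the default class is only filled in by the admission controller at creation time.

```bash
#!/usr/bin/env bash
# Sketch: list Ingresses (namespace/name) whose ingressClassName is not
# "traefik"; an empty class name is printed as "(none)".
find_non_traefik_ingresses() {
  kubectl get ingress -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.ingressClassName}{"\n"}{end}' \
  | awk -F'\t' '$2 != "traefik" { print $1 "\t" ($2 == "" ? "(none)" : $2) }'
}
```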
SSL Certificate Errors
Symptoms:
Diagnosis:
```bash
# 1. Check which Traefik is handling traffic
curl -I https://client.hosting.growing-europe.com 2>&1 | grep -i server
# 2. Check Docker Traefik logs
docker logs traefik 2>&1 | grep client.hosting.growing-europe.com
# 3. Check certificate in acme.json
sudo cat /home/claude/ge-bootstrap/traefik/acme.json | jq '.letsencrypt.Certificates[] | select(.domain.main == "client.hosting.growing-europe.com")'
```
Solutions:
- K8s Traefik is LoadBalancer (WRONG)
- Certificate Not Issued Yet
- DNS Not Resolving
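Another signal for which case applies is the issuer of the certificate actually served. Traefik falls back to a self-signed certificate named TRAEFIK DEFAULT CERT when it holds no matching certificate, which is what K8s Traefik (or a Docker Traefik still waiting on issuance) would present. A sketch assuming OpenSSL is installed; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: classify the issuer of the certificate served for a hostname.
classify_issuer() {
  case "$1" in
    "") echo "none: no certificate received (check DNS and port 443)" ;;
    *"TRAEFIK DEFAULT CERT"*) echo "traefik-default: the Traefik that answered has no matching certificate" ;;
    *"Let's Encrypt"*) echo "letsencrypt: a Let's Encrypt certificate was served" ;;
    *) echo "other: $1" ;;
  esac
}

# Fetch the issuer line, e.g. "issuer=C = US, O = Let's Encrypt, CN = R11"
served_cert_issuer() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer 2>/dev/null
}
```

Usage: classify_issuer "$(served_cert_issuer client.hosting.growing-europe.com)". A traefik-default result together with svclb pods points to the LoadBalancer case; without them, to a certificate not issued yet.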
Port Conflicts
Symptoms:
Diagnosis:
```bash
# Check for port conflicts
sudo netstat -tulpn | grep ':80\|:443'
# Check svclb pods
kubectl get pods -n kube-system | grep svclb
```
Solution:
```bash
# Change service to ClusterIP
kubectl edit svc traefik -n ge-ingress
# Change:
#   type: LoadBalancer
# To:
#   type: ClusterIP
# Delete svclb pods if they exist
kubectl delete pods -n kube-system -l app=svclb-traefik
```
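netstat is not installed by default on many current distributions; ss from iproute2 reports the same listeners. The parser below extracts the listener lines for a single port from ss -ltnp output, which makes it clear whether Docker or something else owns 443. A sketch; port_listeners is hypothetical, and process names only appear when ss runs as root.

```bash
#!/usr/bin/env bash
# Sketch: read `ss -ltnp` output on stdin and print the local address and
# owning process for every listener on the given TCP port.
port_listeners() {
  local port="$1"
  # Column 4 is Local Address:Port, column 6 the users:(...) process field;
  # the suffix match keeps :443 from also matching :8443
  awk -v p=":${port}" 'NR > 1 && substr($4, length($4) - length(p) + 1) == p { print $4, $6 }'
}
```

Usage: sudo ss -ltnp | port_listeners 443, then the same for port 80.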
Related Documentation
- Architecture Overview - Full hosting architecture
- Client Onboarding - Creating client environments
- Zero-Downtime Deployments - Update procedures
- Platform Startup - Platform initialization
Lessons Learned
What Went Wrong Initially
- LoadBalancer Service Created Port Conflict
  - K3s svclb attempted to bind ports 80/443
  - Docker Traefik already owned these ports
  - Result: Service stuck in Pending, SSL broken
- Insufficient Documentation
  - Configuration files lacked warnings
  - Easy to accidentally change service type
  - No clear indication of consequences
- Testing Gaps
  - Initial deployment tested without production SSL
  - office.growing-europe.com not included in test plan
  - SSL conflict discovered in production
Improvements Made
- Explicit ClusterIP Configuration
  - Service type explicitly set with warnings
  - Comments in both YAML and docker-compose.yml
  - Documentation explains why
- Comprehensive Documentation
  - Architecture diagrams showing both Traefik instances
  - Troubleshooting guides for common issues
  - Rollback procedures documented
- Verification Commands
  - Scripts to check service type
  - Automated validation in startup script
  - Health checks for both Traefik instances
This migration guide is maintained by the GE Infrastructure Team. For questions or updates, contact the infrastructure lead or create an issue in the ge-ops repository.