
Traefik Kubernetes Migration Guide

Last Updated: 2026-01-29
Status: Active
Maintained by: GE Infrastructure Team


Overview

This document describes the migration of Traefik from a Docker-only deployment to a hybrid Docker + Kubernetes architecture. The migration was needed to provide a native Kubernetes Ingress controller while keeping the existing Let's Encrypt SSL certificates and production services in place.

Key Outcome: Two separate Traefik instances working in tandem without port conflicts or SSL certificate issues.




Architecture Evolution

Old Architecture (Docker Traefik Only)

flowchart TB
    Users[External Users] --> DockerTraefik[Docker Traefik<br/>80/443<br/>Let's Encrypt]
    DockerTraefik --> AdminUI[Admin UI<br/>office.growing-europe.com]
    DockerTraefik --> DockerServices[Other Docker Services]

    K8sServices[K8s Services<br/>No Ingress] -.No routing.-> DockerTraefik

Limitations:

  • No native Kubernetes Ingress support
  • Manual routing configuration required for K8s services
  • No automatic service discovery for K8s workloads
  • Difficult to scale K8s-based hosting

New Architecture (Hybrid Docker + K8s)

flowchart TB
    Users[External Users] --> DockerTraefik[Docker Traefik<br/>80/443<br/>Let's Encrypt SSL]

    DockerTraefik --> AdminUI[Admin UI<br/>office.growing-europe.com]
    DockerTraefik --> DockerServices[Other Docker Services]
    DockerTraefik --> K8sTraefik[K8s Traefik<br/>ClusterIP<br/>IngressController]

    K8sTraefik --> ClientWorkloads[Client Workloads<br/>*.hosting.growing-europe.com]
    K8sTraefik --> K8sServices[K8s Services<br/>Auto-discovered]

Improvements:

  • Native Kubernetes Ingress resources
  • Automatic service discovery via the Ingress controller
  • Scalable multi-tenant hosting
  • Unified SSL certificate management
  • No port conflicts


Why Two Traefik Instances

Rationale

The hybrid architecture solves several critical challenges:

  1. Existing SSL Certificates
      • Docker Traefik holds the production Let's Encrypt certificates in acme.json
      • Migrating the certificates would cause downtime
      • Multiple domains are already configured and renewed automatically

  2. Port Binding Constraints
      • K3s LoadBalancer services use svclb (ServiceLB) to bind host ports
      • If K8s Traefik used a LoadBalancer service, it would attempt to bind ports 80/443
      • Docker Traefik already owns these ports on the host
      • Result: port conflict, and the service fails to start

  3. Operational Continuity
      • Docker services (admin-ui, Redis, Vault) need continued access
      • The migration had to be zero-downtime
      • K8s hosting capabilities are rolled out gradually

Division of Responsibilities

| Component | Docker Traefik | K8s Traefik |
|---|---|---|
| Port Binding | 80/443 on host | None (ClusterIP) |
| SSL Certificates | Let's Encrypt via acme.json | None (receives traffic already decrypted by Docker Traefik) |
| Routing | Docker containers, proxy to K8s | Kubernetes Ingress resources |
| Service Discovery | Docker labels | K8s Ingress/Service resources |
| High Availability | Single instance | 2 replicas with anti-affinity |

Critical Configuration: ClusterIP Service

The ClusterIP Requirement

CRITICAL: The K8s Traefik service MUST be ClusterIP, NOT LoadBalancer.

File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: ge-ingress
  annotations:
    # WARNING: Keep as ClusterIP - Docker Traefik owns 80/443 on host
    ge.hosting/note: "ClusterIP only - Docker Traefik handles external ingress"
spec:
  # ClusterIP - NOT LoadBalancer (would conflict with Docker Traefik)
  type: ClusterIP
  selector:
    app: traefik
  ports:
    - name: web
      port: 80
      targetPort: web
      protocol: TCP
    - name: websecure
      port: 443
      targetPort: websecure
      protocol: TCP
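A single edited line is enough to reintroduce the outage described below, so a check before `kubectl apply` is cheap insurance. The following is a minimal sketch and not part of the repository. `check_service_manifest` is a hypothetical helper that refuses any manifest whose Service type is not ClusterIP. It reads the YAML with awk rather than a real YAML parser, so it assumes one Service per file, with `type:` written on its own indented line as in the file above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-apply guard; not part of ge-bootstrap.
# Assumes one Service per file with "type:" on its own indented line under spec.

check_service_manifest() {
  local file="$1" svc_type

  if [[ ! -r "$file" ]]; then
    echo "ERROR: cannot read $file" >&2
    return 2
  fi

  # First indented "type:" value, with comments, double quotes and
  # trailing blanks removed.
  svc_type=$(awk '
    /^[ \t]+type:/ {
      sub(/^[ \t]+type:[ \t]*/, "")
      sub(/[ \t]*#.*$/, "")
      sub(/[ \t]+$/, "")
      gsub(/"/, "")
      print
      exit
    }' "$file")

  # Kubernetes defaults spec.type to ClusterIP when the field is omitted.
  svc_type="${svc_type:-ClusterIP}"

  if [[ "$svc_type" != "ClusterIP" ]]; then
    echo "REFUSING: $file has type: $svc_type (Docker Traefik owns 80/443)" >&2
    return 1
  fi
  echo "OK: $file is ClusterIP"
}

# Usage:
#   check_service_manifest k8s/base/ingress/traefik-service.yaml \
#     && kubectl apply -f k8s/base/ingress/traefik-service.yaml
```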

What Happens If You Change to LoadBalancer

If you change the service type to LoadBalancer:

  1. K3s ServiceLB (svclb) activates
      • Creates DaemonSet pods on all nodes
      • Attempts to bind ports 80/443 on the host

  2. Port Conflict
      • Docker Traefik already owns 80/443
      • K3s svclb fails to bind the ports
      • The service remains in "Pending" state

  3. SSL Certificate Conflict
      • External traffic routes to K8s Traefik instead of Docker Traefik
      • K8s Traefik has no Let's Encrypt certificates
      • SSL verification fails
      • office.growing-europe.com returns SSL errors

  4. Production Outage
      • Admin UI becomes unreachable
      • Existing Docker services lose ingress
      • Certificate renewal fails

Verification

Check that the service is correctly configured:

# Verify service type
kubectl get svc traefik -n ge-ingress -o jsonpath='{.spec.type}'
# Expected output: ClusterIP

# Verify no external IP
kubectl get svc traefik -n ge-ingress
# Should show: EXTERNAL-IP <none>

# Check for svclb pods (should not exist for ClusterIP)
kubectl get pods -n kube-system | grep svclb-traefik
# Expected: No results
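The three checks above can be combined into one function that returns non-zero on any deviation, which suits a startup or CI script. This is a minimal sketch. `verify_ingress_service` is a hypothetical name, and the `KUBECTL` variable exists only so the real command can be swapped for a stub in tests. Instead of reading the EXTERNAL-IP column, it reads `.status.loadBalancer.ingress`, which stays empty for a ClusterIP service.

```bash
#!/usr/bin/env bash
# Hypothetical consolidated check for the K8s Traefik Service.
# KUBECTL can point at a stub for testing; it defaults to the real kubectl.
KUBECTL="${KUBECTL:-kubectl}"

verify_ingress_service() {
  local ns="${1:-ge-ingress}" svc="${2:-traefik}"
  local failures=0 svc_type lb_ingress svclb_pods

  # 1. The Service type must be ClusterIP.
  svc_type=$("$KUBECTL" get svc "$svc" -n "$ns" -o jsonpath='{.spec.type}')
  if [[ "$svc_type" != "ClusterIP" ]]; then
    echo "FAIL: service type is '${svc_type:-<missing>}', expected ClusterIP"
    failures=$((failures + 1))
  fi

  # 2. No load-balancer address should have been assigned.
  lb_ingress=$("$KUBECTL" get svc "$svc" -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress}')
  if [[ -n "$lb_ingress" ]]; then
    echo "FAIL: service has a load-balancer address: $lb_ingress"
    failures=$((failures + 1))
  fi

  # 3. No K3s ServiceLB (svclb) pods for this service in kube-system.
  svclb_pods=$("$KUBECTL" get pods -n kube-system --no-headers 2>/dev/null \
    | grep -c "svclb-$svc")
  if [[ "$svclb_pods" -gt 0 ]]; then
    echo "FAIL: found $svclb_pods svclb-$svc pod(s) in kube-system"
    failures=$((failures + 1))
  fi

  [[ "$failures" -eq 0 ]] && echo "OK: $ns/$svc is ClusterIP with no svclb pods"
  return "$failures"
}
```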

SSL Certificate Conflict Resolution

The office.growing-europe.com Issue

Problem Encountered: During initial implementation, K8s Traefik was configured as LoadBalancer. This caused:

  1. K3s svclb intercepted port 443 traffic
  2. Traffic routed to K8s Traefik (no certificates)
  3. office.growing-europe.com showed SSL errors
  4. Let's Encrypt renewal failed

Root Cause: Two ingress controllers competing for the same ports with different SSL configurations.

Solution

Step 1: Changed K8s Traefik service to ClusterIP

kubectl edit svc traefik -n ge-ingress
# Change: type: LoadBalancer
# To:     type: ClusterIP

Step 2: Added warning comments to prevent regression

# In traefik-service.yaml
# WARNING: Keep as ClusterIP - Docker Traefik owns 80/443 on host

Step 3: Added corresponding warning in Docker Compose

# In docker-compose.yml traefik service
# IMPORTANT: Docker Traefik is the PRODUCTION ingress for minisforum.
# K8s Traefik (k8s/base/ingress/) is ClusterIP only for internal routing.
# DO NOT enable K8s Traefik LoadBalancer - it conflicts with ports 80/443.

Step 4: Documented architecture in both locations

Traffic Flow (Corrected)

External Request (HTTPS)
    ↓ (DNS: *.growing-europe.com → server IP)
Docker Traefik (port 443)
    ↓ (Let's Encrypt SSL termination)
Route Decision:
    ├─ office.growing-europe.com → Admin UI (Docker)
    ├─ *.hosting.growing-europe.com → K8s Traefik (ClusterIP)
    └─ Other domains → Docker services
    ↓ (*.hosting.growing-europe.com only)
K8s Traefik IngressController
    ↓ (Reads Ingress resources)
Client Workload Services (ClusterIP)
    ↓
Client Pods
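In effect, the route decision above is a host-matching table that Docker Traefik evaluates. The sketch below models it in shell only to make the precedence explicit. `route_for_host` and the backend labels it prints are illustrative and do not appear in either Traefik configuration. The sketch also assumes that the bare hosting.growing-europe.com domain is forwarded to K8s Traefik. The hosting landing-page Ingress serves that host, and `*.hosting.growing-europe.com` does not match it.

```bash
#!/usr/bin/env bash
# Illustrative model of Docker Traefik's route decision (not real Traefik config).
# Prints which backend a request for the given Host header would reach.

route_for_host() {
  # Normalise: lowercase and drop any :port suffix.
  local host="${1,,}"
  host="${host%%:*}"

  case "$host" in
    office.growing-europe.com)
      echo "admin-ui (docker)" ;;
    # Assumption: the bare landing-page domain is forwarded too. Note that a
    # shell glob also matches multi-label subdomains, unlike a DNS wildcard.
    hosting.growing-europe.com | *.hosting.growing-europe.com)
      echo "k8s-traefik (clusterip)" ;;
    *)
      echo "docker-services" ;;
  esac
}
```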

Kubernetes Traefik Configuration

Static Configuration

File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-config
  namespace: ge-ingress
data:
  traefik.yaml: |
    # Traefik Static Configuration (v2.11)

    # API and Dashboard
    api:
      dashboard: true
      insecure: false

    # Entry Points
    entryPoints:
      web:
        address: ":80"
        http:
          redirections:
            entryPoint:
              to: websecure
              scheme: https
              permanent: true
      websecure:
        address: ":443"
        http:
          tls:
            certResolver: letsencrypt
      traefik:
        address: ":8080"

    # Certificate Resolvers
    certificatesResolvers:
      letsencrypt:
        acme:
          email: dirkjan@growing-europe.com
          storage: /data/acme.json
          httpChallenge:
            entryPoint: web

    # Providers
    providers:
      kubernetesIngress:
        namespaces:
          - ge-ingress
          - ge-hosting
          - ge-system
        ingressClass: traefik
      kubernetesCRD:
        namespaces:
          - ge-ingress
          - ge-hosting
          - ge-system

    # Logging
    log:
      level: INFO
      format: json

    # Metrics for Prometheus
    metrics:
      prometheus:
        entryPoint: traefik
        addEntryPointsLabels: true
        addServicesLabels: true

    # Ping for health checks
    ping:
      entryPoint: traefik

Key Configuration Points:

  1. Certificate Resolver: Configured but not actively used (Docker Traefik handles SSL)
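  2. Namespace Watching: Limited to ge-ingress, ge-hosting and ge-system for security. Traefik ignores Ingress and CRD resources in any other namespace, so client namespaces such as sh-acme-corp must be added to both provider lists (or the lists removed to watch all namespaces) before their routes take effect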
  3. IngressClass: Set to traefik for explicit routing
  4. Metrics: Prometheus scraping enabled on port 8080
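Both providers watch only the namespaces listed, so K8s Traefik ignores an Ingress created in any other namespace. That typically surfaces as a 404. Below is a minimal sketch of a check for this, assuming the namespace list above. `find_unwatched_ingresses` is a hypothetical helper: it reads `namespace name` pairs on stdin and prints any that fall outside the watched set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list Ingresses in namespaces K8s Traefik does not watch.
# stdin: "<namespace> <name>" lines.
# args: watched namespaces (default mirrors the provider lists above).

find_unwatched_ingresses() {
  local -a watched=("$@")
  if [[ ${#watched[@]} -eq 0 ]]; then
    watched=(ge-ingress ge-hosting ge-system)
  fi

  local ns name w found
  while read -r ns name || [[ -n "$ns" ]]; do
    [[ -z "$ns" ]] && continue
    found=0
    for w in "${watched[@]}"; do
      if [[ "$ns" == "$w" ]]; then
        found=1
        break
      fi
    done
    if [[ "$found" -eq 0 ]]; then
      echo "$ns/$name"
    fi
  done
  return 0
}

# Usage:
#   kubectl get ingress -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name \
#     | find_unwatched_ingresses
```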

Deployment Configuration

File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-deployment.yaml

High availability deployment with 2 replicas:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: traefik
  namespace: ge-ingress
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  # ... pod template with anti-affinity rules

Features:

  • 2 Replicas: High availability
  • Pod Anti-Affinity: Spreads replicas across nodes
  • Rolling Updates: Zero downtime during updates
  • Security Context: Non-root user, read-only filesystem
  • Resource Limits: CPU 100m-500m, Memory 128Mi-256Mi
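To confirm that the anti-affinity rules actually spread the replicas, compare the node that each Traefik pod runs on. This is a sketch. `check_replica_spread` is a hypothetical helper that expects `pod node` lines, such as the output of the custom-columns command in its comments. On a single-node K3s host, the check fails by design, because two replicas cannot land on different nodes there.

```bash
#!/usr/bin/env bash
# Hypothetical check that Traefik replicas land on distinct nodes.
# stdin: "<pod> <node>" lines, e.g. from
#   kubectl get pods -n ge-ingress -l app=traefik --no-headers \
#     -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
# Unscheduled pods show "<none>" as the node.

check_replica_spread() {
  local pod node pods=0 unscheduled=0
  local -A nodes=()

  while read -r pod node || [[ -n "$pod" ]]; do
    [[ -z "$pod" ]] && continue
    pods=$((pods + 1))
    if [[ -z "$node" || "$node" == "<none>" ]]; then
      unscheduled=$((unscheduled + 1))
    else
      nodes["$node"]=1
    fi
  done

  echo "pods=$pods nodes=${#nodes[@]} unscheduled=$unscheduled"
  # Succeed only if every pod is scheduled and no two pods share a node.
  [[ "$pods" -gt 0 && "$unscheduled" -eq 0 && "${#nodes[@]}" -eq "$pods" ]]
}
```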


RBAC Configuration

Required Permissions

File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-rbac.yaml

Traefik requires cluster-wide permissions to watch Ingress resources:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: traefik-ingress-controller
rules:
  # Core API resources
  - apiGroups: [""]
    resources: ["services", "endpoints", "secrets"]
    verbs: ["get", "list", "watch"]

  # Ingress resources
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses", "ingressclasses"]
    verbs: ["get", "list", "watch"]

  # Update ingress status
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses/status"]
    verbs: ["update"]

  # Traefik CRDs
  - apiGroups: ["traefik.io", "traefik.containo.us"]
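    resources:  # all kinds watched by the Traefik v2.11 CRD provider
      - ingressroutes
      - ingressroutetcps
      - ingressrouteudps
      - middlewares
      - middlewaretcps
      - serverstransports
      - tlsoptions
      - tlsstores
      - traefikservices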
    verbs: ["get", "list", "watch"]

Security Considerations:

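  • Read-Mostly Access: Apart from Ingress status, Traefik can only read resources, not modify them
  • Status Updates: Can update the status subresource of Ingresses (used when Traefik publishes its address there)
  • No Secrets Write: Cannot create or modify secrets
  • Namespace-Scoped Watching: Limited to specific namespaces by Traefik's provider configuration; the ClusterRole itself still grants cluster-wide read access, including to Secrets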

ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: traefik
  namespace: ge-ingress

Binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik
    namespace: ge-ingress
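To confirm that the ClusterRole and binding took effect, ask the API server on behalf of the ServiceAccount with `kubectl auth can-i --as`. The sketch below (hypothetical `check_traefik_rbac`) checks a sample of the permissions granted above, plus two that should be denied. The person running it needs permission to impersonate service accounts. The `KUBECTL` variable exists only so a stub can be substituted in tests.

```bash
#!/usr/bin/env bash
# Hypothetical RBAC smoke test for the traefik ServiceAccount.
# Requires impersonation rights. KUBECTL can point at a stub for testing.
KUBECTL="${KUBECTL:-kubectl}"
TRAEFIK_SA="system:serviceaccount:ge-ingress:traefik"

# Each entry: "<expected yes|no> <verb> <resource> [extra kubectl flags...]"
RBAC_EXPECTATIONS=(
  "yes list services"
  "yes watch secrets"
  "yes list ingresses.networking.k8s.io"
  "yes list ingressclasses.networking.k8s.io"
  "yes update ingresses.networking.k8s.io --subresource=status"
  "yes list ingressroutes.traefik.io"
  "no create secrets"
  "no delete services"
)

check_traefik_rbac() {
  local entry answer failures=0
  local -a parts
  for entry in "${RBAC_EXPECTATIONS[@]}"; do
    # parts: expected answer, verb, resource, then any extra flags.
    read -r -a parts <<<"$entry"
    answer=$("$KUBECTL" auth can-i "${parts[1]}" "${parts[2]}" "${parts[@]:3}" \
      --all-namespaces --as="$TRAEFIK_SA" 2>/dev/null)
    if [[ "$answer" != "${parts[0]}" ]]; then
      echo "FAIL: can-i ${parts[*]:1}: got '${answer:-<error>}', want '${parts[0]}'"
      failures=$((failures + 1))
    fi
  done
  [[ "$failures" -eq 0 ]] && echo "OK: traefik ServiceAccount permissions as expected"
  return "$failures"
}
```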


IngressClass and Ingress Patterns

IngressClass Resource

File: /home/claude/ge-bootstrap/k8s/base/ingress/traefik-ingressclass.yaml

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: traefik
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: traefik.io/ingress-controller

Purpose:

  • Declares traefik as the default IngressClass
  • Ingress resources without an explicit ingressClassName are handled by this controller
  • Gives Ingress resources an explicit class to reference, avoiding ambiguity if multiple ingress controllers are installed

Standard Ingress Pattern

Client workloads use this pattern:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: sh-acme-corp
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - acme-corp.hosting.growing-europe.com
      secretName: acme-corp-tls
  rules:
    - host: acme-corp.hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80

Annotations Explained:

| Annotation | Purpose |
|---|---|
| router.entrypoints: websecure | Only accept HTTPS traffic |
| router.tls: "true" | Enable TLS on this route |
| router.tls.certresolver: letsencrypt | Use Let's Encrypt for SSL |
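Client Ingresses differ only in name, namespace, host and TLS secret, so they lend themselves to templating. The sketch below emits the standard pattern for a given client slug. `render_client_ingress` is hypothetical. The `sh-<client>` namespace, `<client>-tls` secret and `web` Service names are inferred from the acme-corp example and may not match what create-client.sh actually generates.

```bash
#!/usr/bin/env bash
# Hypothetical generator for the standard client Ingress shown above.
# Assumes namespace "sh-<client>", host "<client>.hosting.growing-europe.com",
# a Service named "web" on port 80 and a TLS secret "<client>-tls".

render_client_ingress() {
  local client="$1"
  local name_re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'

  # Kubernetes names must be lowercase alphanumerics and hyphens.
  if [[ ! "$client" =~ $name_re ]]; then
    echo "invalid client name: '$client'" >&2
    return 1
  fi
  local host="${client}.hosting.growing-europe.com"

  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: sh-${client}
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - ${host}
      secretName: ${client}-tls
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
EOF
}

# Usage:
#   render_client_ingress acme-corp | kubectl apply -f -
```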

Wildcard Ingress (Hosting Landing Page)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hosting-wildcard
  namespace: ge-hosting
spec:
  ingressClassName: traefik
  rules:
    - host: hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hosting-landing
                port:
                  number: 80

Let's Encrypt Management

Certificate Storage

Location: Docker Traefik only

All Let's Encrypt certificates are stored in:

/home/claude/ge-bootstrap/traefik/acme.json

File Permissions:

chmod 600 /home/claude/ge-bootstrap/traefik/acme.json
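Traefik rejects an ACME storage file that grants any permission to group or others, which is why the 600 mode matters. The sketch below (hypothetical `check_acme_file`, GNU `stat` assumed) applies the same rule. It can be run after a restore or copy, before Traefik is restarted.

```bash
#!/usr/bin/env bash
# Hypothetical check mirroring Traefik's rule for ACME storage:
# no permission bits for group or others. Uses GNU stat (Linux).

check_acme_file() {
  local file="$1" mode
  if [[ ! -f "$file" ]]; then
    echo "MISSING: $file" >&2
    return 2
  fi

  mode=$(stat -c '%a' "$file")
  # Reject if any group/other bit (octal 077) is set, e.g. 644 or 660.
  if (( (8#$mode & 8#077) != 0 )); then
    echo "TOO OPEN: $file has mode $mode; run: chmod 600 $file" >&2
    return 1
  fi
  echo "OK: $file mode $mode"
}
```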

Certificate Issuance Flow

sequenceDiagram
    participant Client
    participant DockerTraefik
    participant LetsEncrypt
    participant K8sTraefik
    participant ClientPod

    Client->>DockerTraefik: HTTPS request to acme-corp.hosting.growing-europe.com
    DockerTraefik->>DockerTraefik: Check acme.json for certificate

    alt Certificate exists
        DockerTraefik->>K8sTraefik: Forward with SSL termination
        K8sTraefik->>ClientPod: Route via Ingress
        ClientPod->>K8sTraefik: Response
        K8sTraefik->>DockerTraefik: Response
        DockerTraefik->>Client: HTTPS response
    else Certificate missing
        DockerTraefik->>LetsEncrypt: HTTP-01 challenge request
        LetsEncrypt->>DockerTraefik: Challenge token
        DockerTraefik->>LetsEncrypt: Challenge response
        LetsEncrypt->>DockerTraefik: Certificate issued
        DockerTraefik->>DockerTraefik: Save to acme.json
        DockerTraefik->>K8sTraefik: Forward with SSL termination
    end

Important Notes

  1. K8s Traefik does NOT issue certificates
      • It receives already-decrypted traffic from Docker Traefik
      • Certificate annotations in Ingress resources are informational only
      • Docker Traefik handles all Let's Encrypt operations

  2. Certificate Renewal
      • Automatic renewal 30 days before expiry
      • Handled entirely by Docker Traefik
      • No K8s restarts required

  3. Backup Strategy

    # Backup acme.json before any Traefik changes
    cp /home/claude/ge-bootstrap/traefik/acme.json \
       /home/claude/ge-bootstrap/traefik/acme.json.backup-$(date +%Y%m%d)
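The one-liner above overwrites any backup already taken that day and never prunes old copies. Below is a slightly more defensive sketch. `backup_acme` is hypothetical, and both the second-resolution timestamp and the default of keeping 10 copies are arbitrary choices. It preserves permissions, verifies the copy byte for byte, and removes the oldest backups.

```bash
#!/usr/bin/env bash
# Hypothetical acme.json backup: timestamped copy, byte-for-byte check,
# and retention of the newest N backups.
# Usage: backup_acme [path-to-acme.json] [copies-to-keep]

backup_acme() {
  local src="${1:-/home/claude/ge-bootstrap/traefik/acme.json}"
  local keep="${2:-10}"
  local dest i excess
  local -a backups

  [[ -f "$src" ]] || { echo "no such file: $src" >&2; return 1; }
  dest="${src}.backup-$(date +%Y%m%d-%H%M%S)"

  # -p preserves the 600 mode, so the copy is as protected as the original.
  cp -p "$src" "$dest" || return 1
  if ! cmp -s "$src" "$dest"; then
    echo "verification failed, removing $dest" >&2
    rm -f -- "$dest"
    return 1
  fi
  echo "$dest"

  # Date-stamped names sort chronologically (glob results are sorted);
  # delete everything except the newest $keep.
  backups=( "${src}".backup-* )
  excess=$(( ${#backups[@]} - keep ))
  for (( i = 0; i < excess; i++ )); do
    rm -f -- "${backups[i]}"
  done
}
```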
    


Migration Procedure

If migrating from scratch or to a new environment:

Pre-Migration Checklist

  • [ ] Backup existing traefik/acme.json
  • [ ] Document all existing Traefik routes
  • [ ] Verify Docker Traefik is running and healthy
  • [ ] Test DNS resolution for all domains
  • [ ] Ensure K3s cluster is operational

Migration Steps

Step 1: Create Namespaces

kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/namespace.yaml

Step 2: Deploy RBAC

kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-rbac.yaml

Step 3: Create ConfigMap

kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-config.yaml

Step 4: Deploy Traefik (ClusterIP)

kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-deployment.yaml
kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-service.yaml

Step 5: Wait for Readiness

kubectl wait --for=condition=ready pod -l app=traefik -n ge-ingress --timeout=180s

Step 6: Deploy IngressClass

kubectl apply -f /home/claude/ge-bootstrap/k8s/base/ingress/traefik-ingressclass.yaml

Step 7: Create Test Ingress

# Deploy a test client to verify routing
/home/claude/ge-bootstrap/tools/create-client.sh \
  --type shared \
  --name test \
  --resources small

Step 8: Verify Traffic Flow

# Check pod status
kubectl get pods -n ge-ingress

# Check service
kubectl get svc traefik -n ge-ingress

# Test routing
curl -I https://test.hosting.growing-europe.com

Step 9: Monitor Logs

# K8s Traefik logs
kubectl logs -n ge-ingress deploy/traefik --tail=50

# Docker Traefik logs
docker logs traefik --tail=50
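Steps 1 to 6 are order-sensitive. The namespace must exist before namespaced objects, and the RBAC and ConfigMap must exist before the pods that use them. Scripting the steps avoids skipping or reordering any of them. This is a sketch under the assumption that the manifest file names are exactly those listed above. `deploy_k8s_traefik` is hypothetical, and `KUBECTL` and `INGRESS_DIR` can be overridden. The script stops at the first failing command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for migration steps 1-6; file names as listed above.
# KUBECTL and INGRESS_DIR can be overridden (e.g. with a stub for testing).
KUBECTL="${KUBECTL:-kubectl}"
INGRESS_DIR="${INGRESS_DIR:-/home/claude/ge-bootstrap/k8s/base/ingress}"

deploy_k8s_traefik() {
  local f
  # Steps 1-4 in dependency order: namespace, RBAC, ConfigMap, then the
  # Deployment and the ClusterIP Service.
  local -a ordered=(namespace.yaml traefik-rbac.yaml traefik-config.yaml
                    traefik-deployment.yaml traefik-service.yaml)

  for f in "${ordered[@]}"; do
    "$KUBECTL" apply -f "$INGRESS_DIR/$f" || { echo "apply failed: $f" >&2; return 1; }
  done

  # Step 5: "rollout status" is used instead of "wait" because "kubectl wait"
  # errors out if the label selector matches no pods yet.
  "$KUBECTL" rollout status deployment/traefik -n ge-ingress --timeout=180s \
    || { echo "traefik rollout did not complete" >&2; return 1; }

  # Step 6: publish the IngressClass only once Traefik is running.
  "$KUBECTL" apply -f "$INGRESS_DIR/traefik-ingressclass.yaml"
}
```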

Post-Migration Validation

Run the verification checklist:

# 1. Service type is ClusterIP
kubectl get svc traefik -n ge-ingress -o jsonpath='{.spec.type}'

# 2. No svclb pods
kubectl get pods -n kube-system | grep svclb-traefik

# 3. Docker Traefik healthy
docker ps | grep traefik
docker logs traefik 2>&1 | grep -i error | tail -20

# 4. SSL certificate valid
curl -I https://office.growing-europe.com

# 5. K8s Ingress routing works
kubectl get ingress -A
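`curl -I` only shows that the handshake succeeds today. To see how much headroom the served certificate has, read its end date with openssl. Renewal is due 30 days before expiry, so a value well below 30 suggests renewal is failing. This is a sketch. `days_until` and `cert_days_left` are hypothetical helpers, and the date arithmetic assumes GNU `date`.

```bash
#!/usr/bin/env bash
# Hypothetical certificate-expiry helpers (GNU date assumed).

# days_until "<openssl notAfter date>" [now-epoch]
# Prints whole days from now (or the given epoch) until that date.
days_until() {
  local end now
  end=$(date -u -d "$1" +%s) || return 1
  now="${2:-$(date -u +%s)}"
  echo $(( (end - now) / 86400 ))
}

# cert_days_left <host> [port]
# Fetches the certificate the host serves (with SNI) and prints days left.
cert_days_left() {
  local host="$1" port="${2:-443}" not_after
  not_after=$(openssl s_client -connect "${host}:${port}" -servername "$host" \
      </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate \
    | cut -d= -f2)
  [[ -n "$not_after" ]] || { echo "could not read certificate for $host" >&2; return 1; }
  days_until "$not_after"
}

# Usage:
#   cert_days_left office.growing-europe.com
```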

Rollback Procedure

If K8s Traefik causes issues and needs to be removed:

Quick Rollback (Remove K8s Traefik)

# 1. Delete all Ingress resources (clients will be unreachable)
kubectl delete ingress --all -A

# 2. Delete K8s Traefik
kubectl delete -k /home/claude/ge-bootstrap/k8s/base/ingress/

# 3. Verify Docker Traefik is handling all traffic
docker logs traefik --tail=50

# 4. Office UI should still work
curl -I https://office.growing-europe.com

Full Rollback (Restore Docker-Only Architecture)

# 1. Remove all client namespaces
kubectl delete namespace -l app.kubernetes.io/component=client-hosting

# 2. Delete ingress namespace
kubectl delete namespace ge-ingress

# 3. Remove ingress from kustomization
# Edit /home/claude/ge-bootstrap/k8s/base/kustomization.yaml
# Remove: - ./ingress/

# 4. Verify Docker Traefik
docker ps | grep traefik
docker exec traefik cat /etc/traefik/traefik.toml

Restore from Backup

If acme.json is corrupted:

# Stop Docker Traefik
docker stop traefik

# Restore backup
cp /home/claude/ge-bootstrap/traefik/acme.json.backup-YYYYMMDD \
   /home/claude/ge-bootstrap/traefik/acme.json

# Fix permissions
chmod 600 /home/claude/ge-bootstrap/traefik/acme.json

# Restart Docker Traefik
docker start traefik
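When restoring under pressure, it is easy to type the wrong YYYYMMDD suffix. The sketch below is a helper (hypothetical `latest_acme_backup`) that selects the newest non-empty backup by name, so its output can replace the hand-typed path in the restore commands above. It assumes the `acme.json.backup-<date>` naming used in the Backup Strategy section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest non-empty acme.json backup.
# Relies on date-stamped names (acme.json.backup-YYYYMMDD...) sorting
# chronologically; glob results are sorted, so the last match is newest.

latest_acme_backup() {
  local src="${1:-/home/claude/ge-bootstrap/traefik/acme.json}"
  local candidate latest=""
  for candidate in "${src}".backup-*; do
    # Skips the unexpanded pattern when nothing matches, and empty files.
    [[ -s "$candidate" ]] || continue
    latest="$candidate"
  done
  [[ -n "$latest" ]] || { echo "no usable backup for $src" >&2; return 1; }
  echo "$latest"
}

# Usage (with Docker Traefik stopped):
#   backup=$(latest_acme_backup) \
#     && cp -p "$backup" /home/claude/ge-bootstrap/traefik/acme.json
```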

Troubleshooting

K8s Traefik Pods Not Starting

Symptoms:

kubectl get pods -n ge-ingress
# Shows: CrashLoopBackOff or ImagePullBackOff

Diagnosis:

kubectl describe pod <traefik-pod> -n ge-ingress
kubectl logs <traefik-pod> -n ge-ingress

Common Causes:

  1. RBAC Permissions Missing

    # Check ClusterRole exists
    kubectl get clusterrole traefik-ingress-controller
    
    # Check binding
    kubectl get clusterrolebinding traefik-ingress-controller
    

  2. ConfigMap Syntax Error

    # Validate ConfigMap
    kubectl get cm traefik-config -n ge-ingress -o yaml
    
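    # Check for YAML errors: Traefik logs static-config parse errors at startup,
    # so read the previous (crashed) container's logs; events cover mount issues
    kubectl logs -n ge-ingress deploy/traefik --previous | grep -i error
    kubectl get events -n ge-ingress | grep traefik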
    

  3. Image Pull Error

    # Pull image manually
    docker pull traefik:v2.11
    
    # Import to K3s
    docker save traefik:v2.11 | sudo k3s ctr images import -
    

Ingress Not Routing Traffic

Symptoms:

curl https://client.hosting.growing-europe.com
# Returns: 404 Not Found

Diagnosis:

# 1. Check Ingress resource exists
kubectl get ingress -n sh-client

# 2. Check Ingress has address
kubectl describe ingress web -n sh-client

# 3. Check Traefik logs
kubectl logs -n ge-ingress deploy/traefik | grep client

Solutions:

  1. IngressClass Mismatch

    # Add ingressClassName to Ingress
    kubectl edit ingress web -n sh-client
    # Add: spec.ingressClassName: traefik
    

  2. Service Not Found

    # Verify service exists
    kubectl get svc -n sh-client
    
    # Check endpoints
    kubectl get endpoints -n sh-client
    

  3. Network Policy Blocking Traffic

    # Check policies
    kubectl get networkpolicies -n sh-client
    
    # Temporarily delete for testing, then re-apply the policy once done
    kubectl delete networkpolicy client-isolation -n sh-client
    

SSL Certificate Errors

Symptoms:

curl https://client.hosting.growing-europe.com
# Returns: SSL certificate problem

Diagnosis:

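# 1. Check which certificate is served; a subject of "TRAEFIK DEFAULT CERT"
#    means the responding Traefik has no matching certificate
openssl s_client -connect client.hosting.growing-europe.com:443 -servername client.hosting.growing-europe.com </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer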

# 2. Check Docker Traefik logs
docker logs traefik 2>&1 | grep client.hosting.growing-europe.com

# 3. Check certificate in acme.json
sudo cat /home/claude/ge-bootstrap/traefik/acme.json | jq '.letsencrypt.Certificates[] | select(.domain.main == "client.hosting.growing-europe.com")'

Solutions:

  1. K8s Traefik is LoadBalancer (WRONG)

    # Change to ClusterIP
    kubectl patch svc traefik -n ge-ingress -p '{"spec":{"type":"ClusterIP"}}'
    
    # Restart Docker Traefik
    docker restart traefik
    

  2. Certificate Not Issued Yet

    # Wait 2-5 minutes for Let's Encrypt
    # Check Docker Traefik logs for ACME challenge
    docker logs traefik 2>&1 | grep -i acme | tail -20
    

  3. DNS Not Resolving

    # Check DNS
    dig client.hosting.growing-europe.com
    
    # Should point to server IP
    # If not, wait for DNS propagation
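To make the DNS check explicit, compare the A records against the server's public IP. This is a sketch only. `dns_points_to` is hypothetical, and `DIG` exists so a stub can replace `dig` in tests. This guide does not state the server's IP, so it must be passed in.

```bash
#!/usr/bin/env bash
# Hypothetical DNS check: does <host> have an A record equal to <expected-ip>?
# DIG can point at a stub for testing; it defaults to the real dig.
DIG="${DIG:-dig}"

dns_points_to() {
  local host="$1" expected="$2" addr found=1
  local ipv4_re='^[0-9]+(\.[0-9]+){3}$'

  while read -r addr; do
    # "dig +short" may list CNAME targets before the addresses; skip them.
    [[ "$addr" =~ $ipv4_re ]] || continue
    echo "$host -> $addr"
    [[ "$addr" == "$expected" ]] && found=0
  done < <("$DIG" +short A "$host")

  return "$found"
}

# Usage:
#   dns_points_to client.hosting.growing-europe.com <server-ip>
```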
    

Port Conflicts

Symptoms:

kubectl get svc traefik -n ge-ingress
# Shows: <pending> under EXTERNAL-IP for LoadBalancer

Diagnosis:

# Check for port conflicts (exact port match, so :8080 is not reported)
sudo netstat -tulpn | grep -E ':(80|443)[[:space:]]'

# Check svclb pods
kubectl get pods -n kube-system | grep svclb

Solution:

# Change service to ClusterIP
kubectl edit svc traefik -n ge-ingress

# Change:
#   type: LoadBalancer
# To:
#   type: ClusterIP

# Delete svclb pods if they exist
kubectl delete pods -n kube-system -l app=svclb-traefik



Lessons Learned

What Went Wrong Initially

  1. LoadBalancer Service Created Port Conflict
      • K3s svclb attempted to bind ports 80/443
      • Docker Traefik already owned these ports
      • Result: service stuck in Pending, SSL broken

  2. Insufficient Documentation
      • Configuration files lacked warnings
      • It was easy to change the service type by accident
      • No clear indication of the consequences

  3. Testing Gaps
      • Initial deployment was tested without production SSL
      • office.growing-europe.com was not included in the test plan
      • The SSL conflict was discovered in production

Improvements Made

  1. Explicit ClusterIP Configuration
      • Service type explicitly set, with warnings
      • Comments in both the YAML and docker-compose.yml
      • Documentation explains why

  2. Comprehensive Documentation
      • Architecture diagrams showing both Traefik instances
      • Troubleshooting guides for common issues
      • Rollback procedures documented

  3. Verification Commands
      • Scripts to check the service type
      • Automated validation in the startup script
      • Health checks for both Traefik instances

This migration guide is maintained by the GE Infrastructure Team. For questions or updates, contact the infrastructure lead or create an issue in the ge-ops repository.