GE Unified Hosting Architecture

Last Updated: 2026-01-29
Status: Active
Maintained by: GE Infrastructure Team


Overview

The GE Unified Hosting Architecture provides a scalable, secure, and isolated multi-tenant hosting platform for client workloads on Kubernetes. It combines:

  • Traefik IngressController for unified routing and SSL termination
  • Namespace-based isolation for security and resource management
  • Immutable deployment packages for reliable, auditable deployments
  • Zero-downtime deployment strategies for production reliability
  • Shared and dedicated hosting tiers for flexible resource allocation

Architecture Diagram

flowchart TB
    subgraph Internet
        Users[External Users/Clients]
    end

    subgraph Docker Host
        DockerTraefik[Docker Traefik<br/>Production Ingress<br/>Let's Encrypt SSL]
    end

    subgraph K8s Cluster
        subgraph ge-ingress namespace
            K8sTraefik[K8s Traefik<br/>IngressController<br/>ClusterIP]
        end

        subgraph ge-hosting namespace
            SharedPool[Shared Hosting Pool<br/>Landing Page]
        end

        subgraph sh-client1 namespace
            SharedClient1[Shared Client<br/>Workload]
        end

        subgraph sh-client2 namespace
            SharedClient2[Shared Client<br/>Workload]
        end

        subgraph ded-bigcorp namespace
            DedicatedClient[Dedicated Client<br/>Workload<br/>+ HPA + PDB]
        end
    end

    Users --> DockerTraefik
    DockerTraefik --> K8sTraefik
    K8sTraefik --> SharedPool
    K8sTraefik --> SharedClient1
    K8sTraefik --> SharedClient2
    K8sTraefik --> DedicatedClient

Two-Tier Ingress Architecture

Why Two Traefik Instances?

The architecture uses two separate Traefik instances so that a single entry point on host ports 80/443 can serve both Docker and Kubernetes services without port conflicts:

  1. Docker Traefik (Production Ingress)
       • Runs on the Docker host (outside K8s)
       • Binds to host ports 80/443
       • Manages Let's Encrypt certificates
       • Handles external traffic
       • Routes to both Docker services AND K8s services

  2. K8s Traefik (IngressController)
       • Runs inside K8s as a Deployment
       • Uses ClusterIP service (internal only)
       • Handles K8s Ingress resources
       • Routes traffic within the cluster
       • No port conflict with Docker Traefik

Critical: The K8s Traefik Service MUST remain of type ClusterIP. Changing it to LoadBalancer causes port conflicts with Docker Traefik on ports 80/443 and breaks SSL certificate handling.
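
To make the constraint above concrete, here is a minimal sketch of what the in-cluster Traefik Service might look like. The Service name `traefik`, the selector label, and the named target port `web` are assumptions for illustration; only the namespace and the ClusterIP type come from this document.

```yaml
# Hypothetical Service for the K8s Traefik Deployment (name, selector, and ports are assumed)
apiVersion: v1
kind: Service
metadata:
  name: traefik
  namespace: ge-ingress
spec:
  type: ClusterIP            # internal only; do not change to LoadBalancer
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: web
      port: 80
      targetPort: web        # assumes the Traefik container names its HTTP port "web"
```

Because the Service has no external address, Docker Traefik is the only component that listens on host ports 80/443.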

Traffic Flow

External User
    ↓
Docker Traefik (80/443)
    ↓ (Let's Encrypt SSL termination)
K8s Traefik (ClusterIP)
    ↓ (Routes via Ingress resources)
Client Workload (8080)

Namespace Strategy

Core Infrastructure Namespaces

| Namespace | Purpose | Components |
|-----------|---------|------------|
| ge-ingress | Ingress controller | K8s Traefik, IngressClass |
| ge-hosting | Shared hosting pool | Landing page, shared resources |
| ge-system | Core infrastructure | Redis, Vault, core services |
| ge-agents | Agent platform | Dolly, executors, agents |
| ge-monitoring | Observability | Loki, Grafana |

Client Namespaces

Client namespaces follow a prefix convention:

| Prefix | Type | Resource Profile | Use Case |
|--------|------|------------------|----------|
| sh-* | Shared Hosting | Small-medium resources | Cost-effective multi-tenant hosting |
| ded-* | Dedicated Hosting | Large resources + HPA + PDB | High-traffic, isolated workloads |

Example:

  • sh-acme-corp - Shared hosting for Acme Corp
  • ded-bigcorp - Dedicated hosting for BigCorp


Shared vs Dedicated Hosting

Shared Hosting (sh-*)

Characteristics:

  • Lower resource allocation
  • Cost-effective for multiple clients
  • Suitable for low-medium traffic
  • Faster provisioning

Resources:

| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas |
|------|-------------|-----------|----------------|--------------|----------|
| Small | 10m | 100m | 32Mi | 128Mi | 1 |
| Medium | 50m | 250m | 64Mi | 256Mi | 2 |
| Large | 100m | 500m | 128Mi | 512Mi | 2 |

Components:

  • Deployment with rolling updates
  • Service (ClusterIP)
  • Ingress (TLS via Let's Encrypt)
  • NetworkPolicy (isolation)
  • ConfigMaps
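
As an illustration of how the Medium shared tier could map onto a Deployment with rolling updates, consider the sketch below. The resource values and replica count come from the table above; the workload name `web`, the image reference, and the `maxSurge`/`maxUnavailable` settings are assumptions, not the platform's confirmed configuration.

```yaml
# Hypothetical shared-tier Deployment (Medium); name, image, and strategy values are assumed
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: sh-acme-corp
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start one new pod before removing an old one
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/acme-corp/web:v1.2.3   # placeholder image reference
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 250m
              memory: 256Mi
```

Setting `maxUnavailable: 0` is one common way to keep capacity constant during a rollout; it relies on the readiness probe to decide when a new pod may receive traffic.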

Dedicated Hosting (ded-*)

Characteristics:

  • Higher resource allocation
  • Isolated environment per client
  • Auto-scaling (HPA)
  • High availability (PDB)
  • Suitable for high-traffic workloads

Resources:

| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas (min) |
|------|-------------|-----------|----------------|--------------|----------------|
| Large | 100m | 500m | 128Mi | 512Mi | 2 |
| XLarge | 200m | 1000m | 256Mi | 1Gi | 3 |

Additional Components:

  • HorizontalPodAutoscaler (scales 2-10 replicas)
  • PodDisruptionBudget (maintains availability during voluntary disruptions such as node drains)
  • Resource quotas (namespace-level limits)
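
A HorizontalPodAutoscaler for the 2-10 replica range mentioned above might look like the following sketch. The target Deployment name `web` and the 70% CPU utilization threshold are assumptions chosen for illustration.

```yaml
# Hypothetical HPA for a dedicated client; target name and CPU threshold are assumed
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: ded-bigcorp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percentage of the CPU request, averaged across pods
```

Utilization is measured against the CPU request, so the request values in the tier table directly affect when scaling triggers.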


Network Isolation

NetworkPolicy Rules

All client namespaces are isolated by default with specific allow rules:

Ingress:

  • ✅ Allow traffic from ge-ingress namespace (Traefik)
  • ❌ Deny all other ingress

Egress:

  • ✅ Allow DNS queries to kube-system
  • ❌ Deny all other egress (optional: allow specific external services)

Example NetworkPolicy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: client-isolation
  namespace: sh-acme-corp
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              app.kubernetes.io/component: ingress
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        # DNS uses UDP by default and falls back to TCP for large responses
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
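
The ingress rule in the policy above selects namespaces by the label `app.kubernetes.io/component: ingress`, so it only matches if the ge-ingress namespace actually carries that label. A minimal sketch of such a namespace definition, assuming the label is set declaratively rather than by hand:

```yaml
# Namespace carrying the label that the client-isolation policy selects on
apiVersion: v1
kind: Namespace
metadata:
  name: ge-ingress
  labels:
    app.kubernetes.io/component: ingress
```

The egress rule needs no equivalent step, because Kubernetes sets `kubernetes.io/metadata.name` on every namespace automatically.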


Immutable Package Flow

flowchart LR
    A[Client Overlay<br/>/k8s/clients/acme-corp] --> B[package-client.sh<br/>--client acme-corp<br/>--version v1.2.3]
    B --> C[Immutable Package<br/>acme-corp-v1.2.3/]
    C --> D[verify-package.sh<br/>Checksum validation]
    D --> E{Valid?}
    E -->|Yes| F[deploy-package.sh<br/>Apply to cluster]
    E -->|No| G[Fix Issues<br/>Regenerate]
    F --> H[K8s Cluster<br/>Rolling Update]
    H --> I[Health Checks<br/>Readiness Probes]
    I --> J{Healthy?}
    J -->|Yes| K[Deployment Complete]
    J -->|No| L[Automatic Rollback]

Package Contents

Every immutable package contains:

acme-corp-v1.2.3/
├── MANIFEST.json          # Package metadata
├── manifests/
│   └── all.yaml          # All K8s resources
├── images.txt            # Container image digests
├── secrets.env.enc       # Secrets reference (not actual secrets)
├── deploy.sh            # Deployment script
└── checksum.sha256      # Integrity verification

Example MANIFEST.json:

{
  "client": "acme-corp",
  "version": "v1.2.3",
  "namespace": "sh-acme-corp",
  "created": "2026-01-29T10:00:00Z",
  "created_by": "package-client.sh",
  "source": "/k8s/clients/acme-corp",
  "files": {
    "manifests": "manifests/all.yaml",
    "images": "images.txt",
    "secrets_ref": "secrets.env.enc",
    "deploy_script": "deploy.sh"
  }
}


SSL Certificate Management

Let's Encrypt via Docker Traefik

  • Certificate Resolver: letsencrypt
  • Challenge Type: HTTP-01
  • Storage: /traefik/acme.json (Docker volume)
  • Auto-renewal: Managed by Docker Traefik
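  • Supported Domains:
      • *.hosting.growing-europe.com (client subdomains; with HTTP-01 each hostname receives its own certificate, because Let's Encrypt issues wildcard certificates only via DNS-01)
      • office.growing-europe.com (admin UI)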

Certificate Flow

1. Client Ingress created with TLS annotation
2. Docker Traefik detects new hostname
3. Let's Encrypt HTTP-01 challenge initiated
4. Certificate issued and stored in acme.json
5. Traffic served over HTTPS
6. Auto-renewal 30 days before expiry

Important: K8s Traefik does NOT manage Let's Encrypt certificates. All certificate operations are handled by Docker Traefik.
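
Step 1 of the certificate flow refers to a client Ingress with a TLS annotation. The sketch below shows one plausible shape for it. The hostname follows the client subdomain pattern used in this document; the IngressClass name `traefik`, the Service name `web` on port 80, and the specific annotation used are assumptions rather than the platform's confirmed manifest.

```yaml
# Hypothetical client Ingress; class name, backend, and annotation are assumed
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: sh-acme-corp
  annotations:
    traefik.ingress.kubernetes.io/router.tls: "true"   # assumed form of the "TLS annotation"
spec:
  ingressClassName: traefik
  rules:
    - host: acme-corp.hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```

No `tls.secretName` is set here, since certificates are stored by Docker Traefik in acme.json rather than in Kubernetes Secrets.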


Security Considerations

Container Security

All client workloads run with:

securityContext:
  runAsNonRoot: true
  runAsUser: 101
  runAsGroup: 101
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  readOnlyRootFilesystem: true
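
With `readOnlyRootFilesystem: true`, any path the application writes to (temporary files, caches) has to be backed by a volume. A minimal sketch of a pod template fragment, assuming the workload only needs a writable `/tmp`; the container and volume names are hypothetical:

```yaml
# Fragment of a pod template (spec.template.spec in a Deployment); names are hypothetical
containers:
  - name: web
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: tmp
        mountPath: /tmp      # writable scratch space on an otherwise read-only filesystem
volumes:
  - name: tmp
    emptyDir: {}
```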

Network Policies

  • Default Deny: All namespaces start with default deny ingress/egress
  • Explicit Allow: Only required traffic paths are allowed
  • Namespace Isolation: Clients cannot communicate with each other
  • Ingress-Only Access: Only Traefik can reach client workloads

Secret Management

  • Never in packages: Secrets are referenced, not included
  • Vault Integration: Secrets retrieved from Vault at runtime
  • K8s Secrets: Mounted as volumes, not environment variables
  • Rotation: Secrets can be rotated without package regeneration
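
Mounting a Secret as a volume rather than exposing it through environment variables could look like the fragment below. The Secret name `web-secrets` and the mount path are assumptions for illustration.

```yaml
# Fragment of a pod template (spec.template.spec in a Deployment); names and paths are hypothetical
containers:
  - name: web
    volumeMounts:
      - name: app-secrets
        mountPath: /etc/secrets
        readOnly: true
volumes:
  - name: app-secrets
    secret:
      secretName: web-secrets
```

Files in a Secret volume are refreshed by the kubelet after the Secret changes (unless mounted with `subPath`), which fits the rotation model described above; environment variables would require a pod restart.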

Resource Management

Resource Quotas

Dedicated namespaces can have resource quotas:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: ded-bigcorp
spec:
  hard:
    requests.cpu: "2"
    requests.memory: "4Gi"
    limits.cpu: "4"
    limits.memory: "8Gi"
    persistentvolumeclaims: "10"

Pod Disruption Budgets (Dedicated Only)

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
  namespace: ded-bigcorp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web

Ensures at least 1 replica remains available during:

  • Node maintenance
  • K8s upgrades
  • Cluster scaling operations


Monitoring and Observability

Prometheus Metrics

Traefik exposes metrics on port 8080:

  • Request count by service
  • Response times
  • Error rates
  • Active connections

Loki Log Aggregation

All container logs are collected by:

  1. Node-level: Promtail DaemonSet
  2. Aggregation: Loki in ge-monitoring
  3. Visualization: Grafana dashboards

Health Checks

Every deployment includes:

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 3
  periodSeconds: 5

Troubleshooting

Common Issues

1. Client URL returns 404

Check:

# Verify Ingress created
kubectl get ingress -n sh-acme-corp

# Check Traefik routing
kubectl logs -n ge-ingress deploy/traefik | grep acme-corp

# Verify DNS
dig acme-corp.hosting.growing-europe.com

2. SSL Certificate Not Issued

Check:

# Check Docker Traefik logs
docker logs traefik 2>&1 | grep acme

# Verify acme.json
sudo ls -lh /home/claude/ge-bootstrap/traefik/acme.json

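# List issued certificates via crt.sh to gauge Let's Encrypt rate-limit usage
# (URL quoted for the shell, % encoded as %25, JSON output requested for jq)
curl -s 'https://crt.sh/?q=%25.hosting.growing-europe.com&output=json' | jq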

3. Pod Not Starting

Check:

# Pod status
kubectl get pods -n sh-acme-corp

# Pod events
kubectl describe pod <pod-name> -n sh-acme-corp

# Container logs
kubectl logs <pod-name> -n sh-acme-corp

# Check resource constraints
kubectl top pods -n sh-acme-corp

4. Network Policy Blocking Traffic

Check:

# List policies
kubectl get networkpolicies -n sh-acme-corp

# Describe policy
kubectl describe networkpolicy client-isolation -n sh-acme-corp

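# Test connectivity (with the client-isolation policy shown earlier, this in-namespace
# request is expected to time out; a response suggests the policy is not being enforced)
kubectl run -it --rm debug --image=alpine --restart=Never -n sh-acme-corp -- wget -T 5 -O- http://web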



Maintenance

Daily Tasks

  • Monitor Let's Encrypt certificate renewals
  • Check Traefik pod health
  • Review Loki logs for errors

Weekly Tasks

  • Audit client resource usage
  • Review NetworkPolicy effectiveness
  • Check for K8s version updates

Monthly Tasks

  • Review and optimize resource quotas
  • Audit client namespace labels
  • Test disaster recovery procedures

This documentation is maintained by the GE Infrastructure Team. For updates or corrections, contact the infrastructure lead or create an issue in the ge-ops repository.