GE Unified Hosting Architecture¶
Last Updated: 2026-01-29
Status: Active
Maintained by: GE Infrastructure Team
Overview¶
The GE Unified Hosting Architecture provides a scalable, secure, and isolated multi-tenant hosting platform for client workloads on Kubernetes. It combines:
- Traefik IngressController for unified routing and SSL termination
- Namespace-based isolation for security and resource management
- Immutable deployment packages for reliable, auditable deployments
- Zero-downtime deployment strategies for production reliability
- Shared and dedicated hosting tiers for flexible resource allocation
Architecture Diagram¶
flowchart TB
subgraph Internet
Users[External Users/Clients]
end
subgraph Docker Host
DockerTraefik[Docker Traefik<br/>Production Ingress<br/>Let's Encrypt SSL]
end
subgraph K8s Cluster
subgraph ge-ingress namespace
K8sTraefik[K8s Traefik<br/>IngressController<br/>ClusterIP]
end
subgraph ge-hosting namespace
SharedPool[Shared Hosting Pool<br/>Landing Page]
end
subgraph sh-client1 namespace
SharedClient1[Shared Client<br/>Workload]
end
subgraph sh-client2 namespace
SharedClient2[Shared Client<br/>Workload]
end
subgraph ded-bigcorp namespace
DedicatedClient[Dedicated Client<br/>Workload<br/>+ HPA + PDB]
end
end
Users --> DockerTraefik
DockerTraefik --> K8sTraefik
K8sTraefik --> SharedPool
K8sTraefik --> SharedClient1
K8sTraefik --> SharedClient2
K8sTraefik --> DedicatedClient
Two-Tier Ingress Architecture¶
Why Two Traefik Instances?¶
The architecture uses two separate Traefik instances so that exactly one process binds host ports 80/443 and manages certificates, while routing inside Kubernetes stays internal:
- Docker Traefik (Production Ingress)
  - Runs on the Docker host (outside K8s)
  - Binds to host ports 80/443
  - Manages Let's Encrypt certificates
  - Handles external traffic
  - Routes to both Docker services AND K8s services
- K8s Traefik (IngressController)
  - Runs inside K8s as a Deployment
  - Uses ClusterIP service (internal only)
  - Handles K8s Ingress resources
  - Routes traffic within the cluster
  - No port conflict with Docker Traefik
Critical: K8s Traefik service MUST remain ClusterIP. Changing it to LoadBalancer will cause port conflicts and break SSL certificates.
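As a rough sketch of the constraint above, a Service for the in-cluster Traefik could look like the following. Only the `ge-ingress` namespace and `type: ClusterIP` come from this document; the Service name, selector label, and port numbers are assumptions for illustration and may differ from the real manifests.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik                        # assumed name
  namespace: ge-ingress
spec:
  type: ClusterIP                      # internal only; do not change to LoadBalancer
  selector:
    app.kubernetes.io/name: traefik    # assumed pod label
  ports:
    - name: web
      port: 80                         # assumed service port
      targetPort: 8000                 # assumed container port
```

Because the Service has no external IP or host port, Docker Traefik remains the only component listening on 80/443.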
Traffic Flow¶
External User
↓
Docker Traefik (80/443)
↓ (Let's Encrypt SSL termination)
↓
K8s Traefik (ClusterIP)
↓ (Routes via Ingress resources)
↓
Client Workload (8080)
Namespace Strategy¶
Core Infrastructure Namespaces¶
| Namespace | Purpose | Components |
|---|---|---|
| ge-ingress | Ingress controller | K8s Traefik, IngressClass |
| ge-hosting | Shared hosting pool | Landing page, shared resources |
| ge-system | Core infrastructure | Redis, Vault, core services |
| ge-agents | Agent platform | Dolly, executors, agents |
| ge-monitoring | Observability | Loki, Grafana |
Client Namespaces¶
Client namespaces follow a prefix convention:
| Prefix | Type | Resource Profile | Use Case |
|---|---|---|---|
| sh-* | Shared Hosting | Small-medium resources | Cost-effective multi-tenant hosting |
| ded-* | Dedicated Hosting | Large resources + HPA + PDB | High-traffic, isolated workloads |
Example:
- sh-acme-corp - Shared hosting for Acme Corp
- ded-bigcorp - Dedicated hosting for BigCorp
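A client namespace following this convention might be declared as sketched below. The namespace name comes from the example above; the label keys and values are hypothetical, since the actual label scheme is not specified in this document.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: sh-acme-corp
  labels:
    # Hypothetical labels, shown only to illustrate tagging tier and client.
    hosting.ge/tier: shared
    hosting.ge/client: acme-corp
```

Kubernetes also sets the `kubernetes.io/metadata.name` label on every namespace automatically, which is what the NetworkPolicy example in this document uses to select `kube-system`.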
Shared vs Dedicated Hosting¶
Shared Hosting (sh-*)¶
Characteristics:
- Lower resource allocation
- Cost-effective for multiple clients
- Suitable for low-medium traffic
- Faster provisioning
Resources:
| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas |
|------|-------------|-----------|----------------|--------------|----------|
| Small | 10m | 100m | 32Mi | 128Mi | 1 |
| Medium | 50m | 250m | 64Mi | 256Mi | 2 |
| Large | 100m | 500m | 128Mi | 512Mi | 2 |
Components:
- Deployment with rolling updates
- Service (ClusterIP)
- Ingress (TLS via Let's Encrypt)
- NetworkPolicy (isolation)
- ConfigMaps
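To make the tier table concrete, here is a minimal sketch of a Medium-tier Deployment with a rolling update strategy. The resource values and replica count are taken from the table; the workload name `web`, the image reference, and the `maxUnavailable`/`maxSurge` settings are assumptions rather than the platform's actual configuration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # assumed workload name
  namespace: sh-acme-corp
spec:
  replicas: 2                    # Medium tier
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # assumed: keep all existing pods until new ones are ready
      maxSurge: 1                # assumed: start one extra pod at a time
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/acme-corp/web:v1.2.3   # hypothetical image
          ports:
            - name: http         # named port, so probes can refer to "http"
              containerPort: 8080
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 250m
              memory: 256Mi
```

With `maxUnavailable: 0`, an update only removes an old pod after its replacement passes the readiness probe, which is one common way to approach zero-downtime rollouts.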
Dedicated Hosting (ded-*)¶
Characteristics:
- Higher resource allocation
- Isolated environment per client
- Auto-scaling (HPA)
- High availability (PDB)
- Suitable for high-traffic workloads
Resources:
| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas (min) |
|------|-------------|-----------|----------------|--------------|----------------|
| Large | 100m | 500m | 128Mi | 512Mi | 2 |
| XLarge | 200m | 1000m | 256Mi | 1Gi | 3 |
Additional Components:
- HorizontalPodAutoscaler (scales 2-10 replicas)
- PodDisruptionBudget (maintains availability during updates)
- Resource quotas (namespace-level limits)
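The autoscaling component could be expressed roughly as follows. The 2-10 replica range comes from the list above; the target Deployment name `web` and the 70% CPU utilization threshold are assumptions chosen for illustration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: ded-bigcorp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed threshold, relative to the CPU request
```

Utilization is measured against the container's CPU request, so the request values in the tier table directly influence when scaling starts.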
Network Isolation¶
NetworkPolicy Rules¶
All client namespaces are isolated by default with specific allow rules:
Ingress:
- ✅ Allow traffic from ge-ingress namespace (Traefik)
- ❌ Deny all other ingress
Egress:
- ✅ Allow DNS queries to kube-system
- ❌ Deny all other egress (optional: allow specific external services)
Example NetworkPolicy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: client-isolation
  namespace: sh-acme-corp
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              app.kubernetes.io/component: ingress
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
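For the optional case of allowing a specific external service, one possible approach is a second, additive policy next to `client-isolation`. This is only a sketch: the policy name is hypothetical and the address is a placeholder from the documentation range (203.0.113.0/24), not a real endpoint.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-external-api         # hypothetical name
  namespace: sh-acme-corp
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32  # placeholder address for the external service
      ports:
        - protocol: TCP
          port: 443
```

NetworkPolicies are additive, so this rule widens egress only for the listed address and port while the default deny from `client-isolation` continues to apply to everything else.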
Immutable Package Flow¶
flowchart LR
A[Client Overlay<br/>/k8s/clients/acme-corp] --> B[package-client.sh<br/>--client acme-corp<br/>--version v1.2.3]
B --> C[Immutable Package<br/>acme-corp-v1.2.3/]
C --> D[verify-package.sh<br/>Checksum validation]
D --> E{Valid?}
E -->|Yes| F[deploy-package.sh<br/>Apply to cluster]
E -->|No| G[Fix Issues<br/>Regenerate]
F --> H[K8s Cluster<br/>Rolling Update]
H --> I[Health Checks<br/>Readiness Probes]
I --> J{Healthy?}
J -->|Yes| K[Deployment Complete]
J -->|No| L[Automatic Rollback]
Package Contents¶
Every immutable package contains:
acme-corp-v1.2.3/
├── MANIFEST.json # Package metadata
├── manifests/
│ └── all.yaml # All K8s resources
├── images.txt # Container image digests
├── secrets.env.enc # Secrets reference (not actual secrets)
├── deploy.sh # Deployment script
└── checksum.sha256 # Integrity verification
MANIFEST.json Schema:
{
  "client": "acme-corp",
  "version": "v1.2.3",
  "namespace": "sh-acme-corp",
  "created": "2026-01-29T10:00:00Z",
  "created_by": "package-client.sh",
  "source": "/k8s/clients/acme-corp",
  "files": {
    "manifests": "manifests/all.yaml",
    "images": "images.txt",
    "secrets_ref": "secrets.env.enc",
    "deploy_script": "deploy.sh"
  }
}
SSL Certificate Management¶
Let's Encrypt via Docker Traefik¶
- Certificate Resolver: letsencrypt
- Challenge Type: HTTP-01
- Storage: /traefik/acme.json (Docker volume)
- Auto-renewal: Managed by Docker Traefik
- Supported Domains:
  - *.hosting.growing-europe.com (client subdomains; HTTP-01 cannot issue wildcard certificates, so each client hostname gets its own certificate)
  - office.growing-europe.com (admin UI)
Certificate Flow¶
1. Client Ingress created with TLS annotation
2. Docker Traefik detects new hostname
3. Let's Encrypt HTTP-01 challenge initiated
4. Certificate issued and stored in acme.json
5. Traffic served over HTTPS
6. Auto-renewal 30 days before expiry
Important: K8s Traefik does NOT manage Let's Encrypt certificates. All certificate operations are handled by Docker Traefik.
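Step 1 of the certificate flow refers to a client Ingress with a TLS annotation. The sketch below shows what such an Ingress might look like; it is an illustration under assumptions, not the platform's actual manifest. The IngressClass name `traefik`, the Service name `web`, and service port 80 are assumed, and this document does not state which annotation is actually used; `traefik.ingress.kubernetes.io/router.tls` is shown as one existing Traefik annotation.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  namespace: sh-acme-corp
  annotations:
    traefik.ingress.kubernetes.io/router.tls: "true"   # assumed annotation choice
spec:
  ingressClassName: traefik                            # assumed IngressClass name
  rules:
    - host: acme-corp.hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web                              # assumed Service name
                port:
                  number: 80                           # assumed Service port
```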
Security Considerations¶
Container Security¶
All client workloads run with:
securityContext:
  runAsNonRoot: true
  runAsUser: 101
  runAsGroup: 101
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  readOnlyRootFilesystem: true
Network Policies¶
- Default Deny: All namespaces start with default deny ingress/egress
- Explicit Allow: Only required traffic paths are allowed
- Namespace Isolation: Clients cannot communicate with each other
- Ingress-Only Access: Only Traefik can reach client workloads
Secret Management¶
- Never in packages: Secrets are referenced, not included
- Vault Integration: Secrets retrieved from Vault at runtime
- K8s Secrets: Mounted as volumes, not environment variables
- Rotation: Secrets can be rotated without package regeneration
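Mounting a K8s Secret as a volume rather than exposing it through environment variables could look like the sketch below. The Pod name, image, Secret name, and mount path are all hypothetical; only the volume-mount approach itself comes from this document.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-secret-demo            # hypothetical name
  namespace: sh-acme-corp
spec:
  containers:
    - name: web
      image: registry.example.com/acme-corp/web:v1.2.3   # hypothetical image
      volumeMounts:
        - name: app-secrets
          mountPath: /etc/secrets  # assumed path; each Secret key becomes a file here
          readOnly: true
  volumes:
    - name: app-secrets
      secret:
        secretName: web-secrets    # hypothetical Secret name
```

Files in a Secret volume are refreshed by the kubelet when the Secret changes (unless mounted with `subPath`), which fits the rotation model described above.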
Resource Management¶
Resource Quotas¶
Dedicated namespaces can have resource quotas:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: ded-bigcorp
spec:
  hard:
    requests.cpu: "2"
    requests.memory: "4Gi"
    limits.cpu: "4"
    limits.memory: "8Gi"
    persistentvolumeclaims: "10"
Pod Disruption Budgets (Dedicated Only)¶
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
  namespace: ded-bigcorp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: web
Ensures at least 1 replica remains available during:
- Node maintenance
- K8s upgrades
- Cluster scaling operations
Monitoring and Observability¶
Prometheus Metrics¶
Traefik exposes metrics on port 8080:
- Request count by service
- Response times
- Error rates
- Active connections
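One way Traefik can be configured to serve these metrics is through its static configuration, sketched below. This is an assumption about how the platform might be set up, not a copy of its configuration; the entry point name `traefik` is Traefik's conventional internal entry point and is assumed here.

```yaml
# Traefik static configuration (file form)
entryPoints:
  traefik:
    address: ":8080"        # internal entry point serving metrics
metrics:
  prometheus:
    entryPoint: traefik     # expose Prometheus metrics on the entry point above
    addServicesLabels: true # per-service labels, e.g. request counts by service
```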
Loki Log Aggregation¶
All container logs are collected by:
1. Node-level: Promtail DaemonSet
2. Aggregation: Loki in ge-monitoring
3. Visualization: Grafana dashboards
Health Checks¶
Every deployment includes:
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 3
  periodSeconds: 5
Troubleshooting¶
Common Issues¶
1. Client URL returns 404
Check:
# Verify Ingress created
kubectl get ingress -n sh-acme-corp
# Check Traefik routing
kubectl logs -n ge-ingress deploy/traefik | grep acme-corp
# Verify DNS
dig acme-corp.hosting.growing-europe.com
2. SSL Certificate Not Issued
Check:
# Check Docker Traefik logs
docker logs traefik 2>&1 | grep acme
# Verify acme.json
sudo ls -lh /home/claude/ge-bootstrap/traefik/acme.json
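# List issued certificates to gauge Let's Encrypt rate-limit usage
# (URL quoted for the shell, % encoded as %25, JSON output requested for jq)
curl -s "https://crt.sh/?q=%25.hosting.growing-europe.com&output=json" | jq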
3. Pod Not Starting
Check:
# Pod status
kubectl get pods -n sh-acme-corp
# Pod events
kubectl describe pod <pod-name> -n sh-acme-corp
# Container logs
kubectl logs <pod-name> -n sh-acme-corp
# Check resource constraints
kubectl top pods -n sh-acme-corp
4. Network Policy Blocking Traffic
Check:
# List policies
kubectl get networkpolicies -n sh-acme-corp
# Describe policy
kubectl describe networkpolicy client-isolation -n sh-acme-corp
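# Test connectivity from inside the namespace. With client-isolation applied, this request
# is expected to fail, because only traffic from the ingress namespace is allowed.
kubectl run -it --rm debug --image=alpine --restart=Never -n sh-acme-corp -- wget -O- http://web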
Related Documentation¶
- Client Onboarding Runbook - Step-by-step client creation
- Traefik K8s Migration - Architecture evolution details
- Deployment Packages - Immutable package specification
- Zero-Downtime Deployments - Deployment procedures
- Platform Startup - Startup sequence and troubleshooting
Maintenance¶
Daily Tasks¶
- Monitor Let's Encrypt certificate renewals
- Check Traefik pod health
- Review Loki logs for errors
Weekly Tasks¶
- Audit client resource usage
- Review NetworkPolicy effectiveness
- Check for K8s version updates
Monthly Tasks¶
- Review and optimize resource quotas
- Audit client namespace labels
- Test disaster recovery procedures
This documentation is maintained by the GE Infrastructure Team. For updates or corrections, contact the infrastructure lead or create an issue in the ge-ops repository.