Client Onboarding Runbook¶
Last Updated: 2026-01-29
Maintained by: GE Infrastructure Team
Estimated Time: 15-30 minutes per client
Overview¶
This runbook guides you through onboarding a new client to the GE Unified Hosting Platform. The process creates a dedicated namespace, configures ingress routing, provisions SSL certificates, and deploys the client workload.
Prerequisites¶
Before starting, ensure you have:
- [ ] Access to the Kubernetes cluster (kubectl configured)
- [ ] SSH/console access to the GE infrastructure server
- [ ] Client requirements documented (see below)
- [ ] DNS access (if custom domain required)
- [ ] Vault access (for secrets management)
Required Client Information¶
| Field | Description | Example |
|---|---|---|
| Client Name | Lowercase, alphanumeric, hyphens | acme-corp |
| Hosting Type | shared or dedicated | shared |
| Resource Tier | small, medium, large, xlarge | small |
| Domain | Subdomain under hosting.growing-europe.com | acme-corp.hosting.growing-europe.com |
| Application Image | Container image (optional, default nginx) | nginx:alpine |
Step 1: Validate Client Name¶
Client names must follow Kubernetes naming conventions:
Rules:
- Lowercase only
- Start with a letter
- Contain only a-z, 0-9, and hyphens
- Maximum 50 characters
- Must be unique (no existing namespace with sh-{name} or ded-{name})
Examples:
# Valid
acme-corp
test-client-01
bigcorp
# Invalid
AcmeCorp # Uppercase
acme_corp # Underscore
-acme # Starts with hyphen
acme-corp.com # Contains period
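The naming rules above can be checked mechanically before touching the cluster. The following is a minimal sketch, assuming a hypothetical helper named validate_client_name that is not part of create-client.sh. It also rejects a trailing hyphen, which the rule list does not mention but which Kubernetes namespace names (DNS labels) do not allow.

```shell
# Hypothetical helper: returns 0 when the name satisfies the rules listed above.
validate_client_name() {
  local name="$1"
  # Rule: maximum 50 characters.
  [ "${#name}" -le 50 ] || return 1
  # Rules: starts with a lowercase letter; only a-z, 0-9, and hyphens.
  # Extra assumption: must not end with a hyphen (DNS label requirement).
  # LC_ALL=C keeps the [a-z] range from matching uppercase in some locales.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z]([a-z0-9-]*[a-z0-9])?$'
}

for candidate in acme-corp test-client-01 AcmeCorp acme_corp -acme acme-corp.com; do
  if validate_client_name "$candidate"; then
    echo "valid:   $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

The uniqueness rule still needs a cluster lookup, so it is not covered by this check.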
Verification:
# Check if namespace already exists
CLIENT_NAME="acme-corp"
TYPE="shared" # or "dedicated"
# Determine namespace
if [[ "$TYPE" == "shared" ]]; then
  NAMESPACE="sh-${CLIENT_NAME}"
else
  NAMESPACE="ded-${CLIENT_NAME}"
fi
# Check for conflicts
kubectl get namespace "$NAMESPACE" 2>/dev/null && echo "❌ Namespace exists" || echo "✅ Available"
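The uniqueness rule in Step 1 covers both prefixes, while the check above looks only at the prefix for the chosen hosting type. A possible way to check both at once, assuming a hypothetical helper named find_namespace_conflicts:

```shell
# Hypothetical helper: prints every namespace that already uses this client name.
find_namespace_conflicts() {
  local name="$1" prefix
  for prefix in sh ded; do
    # kubectl exits non-zero when the namespace does not exist.
    if kubectl get namespace "${prefix}-${name}" >/dev/null 2>&1; then
      echo "${prefix}-${name}"
    fi
  done
}

# Usage (requires cluster access):
#   conflicts=$(find_namespace_conflicts "acme-corp")
#   if [ -z "$conflicts" ]; then echo "Available"; else echo "In use: $conflicts"; fi
```

Note that a non-zero exit from kubectl can also mean the cluster is unreachable, so an empty result is only meaningful when kubectl itself is working.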
Step 2: Choose Resource Tier¶
Select the appropriate tier based on client requirements:
Shared Hosting Tiers¶
| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas | Traffic Estimate |
|---|---|---|---|---|---|---|
| Small | 10m | 100m | 32Mi | 128Mi | 1 | <10k requests/day |
| Medium | 50m | 250m | 64Mi | 256Mi | 2 | 10k-100k requests/day |
| Large | 100m | 500m | 128Mi | 512Mi | 2 | 100k-1M requests/day |
Dedicated Hosting Tiers¶
| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Min Replicas | Max Replicas | Use Case |
|---|---|---|---|---|---|---|---|
| Large | 100m | 500m | 128Mi | 512Mi | 2 | 10 | High availability required |
| XLarge | 200m | 1000m | 256Mi | 1Gi | 3 | 10 | High traffic, mission-critical |
Decision Matrix:
- Small: Development, staging, low-traffic sites
- Medium: Production sites, moderate traffic
- Large: High-traffic sites, business-critical
- XLarge: Enterprise clients, guaranteed uptime SLAs
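For scripting around the tier choice, the shared hosting table can be encoded as a simple lookup. This is only an illustration with a hypothetical function name; how create-client.sh maps tiers internally is not shown in this runbook. Values are written as request/limit.

```shell
# Hypothetical lookup of the shared hosting tiers (values copied from the table above).
shared_tier_resources() {
  case "$1" in
    small)  echo "cpu=10m/100m memory=32Mi/128Mi replicas=1" ;;
    medium) echo "cpu=50m/250m memory=64Mi/256Mi replicas=2" ;;
    large)  echo "cpu=100m/500m memory=128Mi/512Mi replicas=2" ;;
    *)      echo "unknown shared tier: $1" >&2; return 1 ;;
  esac
}

shared_tier_resources medium
```

The xlarge tier is rejected here because it exists only in the dedicated table.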
Step 3: Create Client Environment¶
Use the create-client.sh script to generate the client environment:
Dry Run (Recommended First)¶
cd /home/claude/ge-bootstrap/tools
# Shared hosting example
./create-client.sh \
--type shared \
--name acme-corp \
--resources small \
--dry-run
# Dedicated hosting example
./create-client.sh \
--type dedicated \
--name bigcorp \
--resources large \
--dry-run
Review the output:
- Check generated manifests
- Verify namespace, labels, annotations
- Confirm resource limits
- Review ingress configuration
Apply to Cluster¶
Once verified, run the same command without --dry-run:
./create-client.sh \
--type shared \
--name acme-corp \
--resources small
Expected Output:
==========================================
Creating shared client: acme-corp
==========================================
[INFO] Generating overlay for acme-corp...
[OK] Overlay generated at /home/claude/ge-bootstrap/k8s/clients/acme-corp
[INFO] Deploying client acme-corp...
[INFO] Waiting for deployment...
deployment.apps/web condition met
[OK] Client acme-corp deployed successfully
[INFO] Client status:
NAME READY STATUS RESTARTS AGE
pod/web-xxxxx-xxxxx 1/1 Running 0 10s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/web ClusterIP 10.43.xxx.xxx <none> 80/TCP 10s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/web 1/1 1 1 10s
[OK] Client URL: https://acme-corp.hosting.growing-europe.com
==========================================
[OK] Done!
==========================================
Step 4: Verify Deployment¶
Check Pod Status¶
CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"
# Check all resources
kubectl get all -n "$NAMESPACE"
# Check pod logs
kubectl logs -n "$NAMESPACE" -l app=web --tail=50
# Check pod events
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp'
Healthy Output:
NAME READY STATUS RESTARTS AGE
pod/web-xxxxx-xxxxx 1/1 Running 0 2m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/web ClusterIP 10.43.xx.xx <none> 80/TCP 2m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/web 1/1 1 1 2m
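Instead of re-running the status commands by hand, the wait can be scripted. A minimal sketch, assuming a hypothetical helper named wait_for_deployment and the deployment name web used throughout this runbook:

```shell
# Hypothetical helper: blocks until deployment/web is Available, or prints recent events.
wait_for_deployment() {
  local namespace="$1" timeout="${2:-120s}"
  if ! kubectl wait --for=condition=Available deployment/web \
      -n "$namespace" --timeout="$timeout"; then
    echo "deployment/web not Available after $timeout; recent events:" >&2
    kubectl get events -n "$namespace" --sort-by='.lastTimestamp' | tail -n 10 >&2
    return 1
  fi
}

# Usage (requires cluster access):
#   wait_for_deployment "sh-acme-corp" 180s
```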
Check Ingress Routing¶
# Check Ingress resource
kubectl get ingress -n "$NAMESPACE"
# Describe Ingress (check annotations and rules)
kubectl describe ingress web -n "$NAMESPACE"
# Check Traefik logs
kubectl logs -n ge-ingress deploy/traefik | grep "$CLIENT_NAME"
Verify Ingress configuration:
Name: web
Namespace: sh-acme-corp
Address: 10.43.0.1
Ingress Class: traefik
Rules:
Host Path Backends
---- ---- --------
acme-corp.hosting.growing-europe.com
/ web:80 (10.42.x.x:8080)
Annotations:
traefik.ingress.kubernetes.io/router.entrypoints: websecure
traefik.ingress.kubernetes.io/router.tls: true
traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
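The host rule can also be checked without reading the describe output. A sketch, assuming a hypothetical helper named ingress_routes_host and the Ingress name web shown above:

```shell
# Hypothetical helper: returns 0 when the Ingress "web" has a rule for the given host.
ingress_routes_host() {
  local namespace="$1" host="$2" hosts h
  hosts=$(kubectl get ingress web -n "$namespace" \
    -o jsonpath='{.spec.rules[*].host}') || return 1
  # jsonpath prints the hosts separated by spaces; compare each one exactly.
  for h in $hosts; do
    [ "$h" = "$host" ] && return 0
  done
  return 1
}

# Usage (requires cluster access):
#   ingress_routes_host "sh-acme-corp" "acme-corp.hosting.growing-europe.com" \
#     && echo "host rule present" || echo "host rule missing"
```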
Step 5: DNS Configuration¶
Standard Subdomain¶
For standard subdomains under hosting.growing-europe.com, DNS is automatically configured:
DNS Record (handled by infrastructure):
No action required for standard subdomains.
Custom Domain (Optional)¶
If the client requires a custom domain:
1. Create a DNS CNAME record (at the client's DNS provider):
2. Update the Ingress to include the custom domain:
Add the custom domain to spec.rules and spec.tls.hosts:
spec:
  tls:
    - hosts:
        - acme-corp.hosting.growing-europe.com
        - www.client-domain.com # Add custom domain
      secretName: acme-corp-tls
  rules:
    - host: acme-corp.hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
    - host: www.client-domain.com # Add custom domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
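Before expecting a certificate for the custom domain, it may help to confirm that the client's CNAME is in place. The sketch below assumes the CNAME targets the client's platform subdomain; the exact record is not given in this runbook, so treat the target as an assumption. The helper name cname_points_to is hypothetical.

```shell
# Hypothetical helper: returns 0 when the domain's CNAME matches the expected target.
cname_points_to() {
  local domain="$1" expected="$2" target
  target=$(dig +short "$domain" CNAME) || return 1
  # dig prints the target as a fully qualified name with a trailing dot.
  [ "${target%.}" = "$expected" ]
}

# Usage (assumed target; adjust to the record actually agreed with the client):
#   cname_points_to "www.client-domain.com" "acme-corp.hosting.growing-europe.com" \
#     && echo "CNAME in place" || echo "CNAME missing or different"
```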
Step 6: SSL Certificate Provisioning¶
SSL certificates are automatically provisioned via Let's Encrypt (Docker Traefik).
Verification¶
# Wait for certificate issuance (can take 1-5 minutes)
sleep 60
# Test HTTPS access
curl -I https://acme-corp.hosting.growing-europe.com
# Expected output:
# HTTP/2 200
# server: nginx
# ...
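A fixed sleep either waits too long or not long enough. One alternative is to poll until HTTPS answers with 200. This is a sketch with a hypothetical helper named wait_for_https; the default of 10 attempts 30 seconds apart is an assumption chosen to cover the 1-5 minute window mentioned above.

```shell
# Hypothetical helper: polls a URL until it returns HTTP 200 or attempts run out.
wait_for_https() {
  local url="$1" attempts="${2:-10}" delay="${3:-30}" i=1 code
  while [ "$i" -le "$attempts" ]; do
    # curl exits non-zero on TLS errors, so an untrusted certificate counts as "not ready".
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" 2>/dev/null) || code="000"
    if [ "$code" = "200" ]; then
      echo "HTTPS ready after attempt $i"
      return 0
    fi
    # Pause only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "No HTTP 200 from $url after $attempts attempts (last code: $code)" >&2
  return 1
}

# Usage:
#   wait_for_https "https://acme-corp.hosting.growing-europe.com"
```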
Check Certificate Details¶
# Check certificate issuer
echo | openssl s_client -servername acme-corp.hosting.growing-europe.com \
-connect acme-corp.hosting.growing-europe.com:443 2>/dev/null | \
openssl x509 -noout -issuer -dates
# Expected output:
# issuer=C = US, O = Let's Encrypt, CN = R3 (the intermediate name in CN varies by issuance)
# notBefore=Jan 29 09:00:00 2026 GMT
# notAfter=Apr 29 09:00:00 2026 GMT
Troubleshooting Certificate Issues¶
Certificate not issued after 5 minutes:
1. Check Docker Traefik logs
2. Check Let's Encrypt rate limits
3. Verify DNS resolution
4. Check Traefik configuration
Step 7: Configure Secrets (If Required)¶
If the client application requires secrets:
Create Secrets in Vault¶
# Access Vault
kubectl port-forward -n ge-system svc/vault 8200:8200 &
export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN="<root-token>"
# Create client secrets
vault kv put secret/clients/acme-corp \
redis-password="<secure-password>" \
api-key="<client-api-key>" \
custom-secret="<value>"
Reference Secrets in K8s¶
Update the client deployment to reference secrets:
# Create K8s secret from Vault
kubectl create secret generic client-secrets \
-n sh-acme-corp \
--from-literal=redis-password="<value>" \
--from-literal=api-key="<value>"
# Update deployment to mount secrets
kubectl edit deployment web -n sh-acme-corp
Add volume and volumeMount:
spec:
  template:
    spec:
      containers:
        - name: nginx
          volumeMounts:
            - name: secrets
              mountPath: /run/secrets
              readOnly: true
      volumes:
        - name: secrets
          secret:
            secretName: client-secrets
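The kubectl create secret command earlier in this step uses placeholder literals. One way to fill them from the Vault path written above is sketched below, assuming a hypothetical helper named sync_client_secret and only the redis-password and api-key fields. Values still pass through the kubectl command line, so run it only on a trusted host.

```shell
# Hypothetical helper: copies two fields from Vault into the client-secrets Secret.
sync_client_secret() {
  local client="$1" namespace="$2" redis_password api_key
  redis_password=$(vault kv get -field=redis-password "secret/clients/${client}") || return 1
  api_key=$(vault kv get -field=api-key "secret/clients/${client}") || return 1
  # --dry-run=client renders the manifest locally; piping to apply makes re-runs safe.
  kubectl create secret generic client-secrets \
    -n "$namespace" \
    --from-literal=redis-password="$redis_password" \
    --from-literal=api-key="$api_key" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage (requires VAULT_ADDR, VAULT_TOKEN, and cluster access):
#   sync_client_secret "acme-corp" "sh-acme-corp"
```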
Step 8: Deploy Client Application¶
Replace the default nginx container with the client's application:
Update Deployment Image¶
# Edit deployment
kubectl set image deployment/web \
nginx=<client-image>:<version> \
-n sh-acme-corp
# Example:
kubectl set image deployment/web \
nginx=acme-corp/webapp:v1.2.3 \
-n sh-acme-corp
Verify Rolling Update¶
# Watch rollout
kubectl rollout status deployment/web -n sh-acme-corp
# Check pod status
kubectl get pods -n sh-acme-corp -w
Rollback if Needed¶
# Check rollout history
kubectl rollout history deployment/web -n sh-acme-corp
# Rollback to previous version
kubectl rollout undo deployment/web -n sh-acme-corp
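The update, watch, and rollback commands above can be chained so that a failed rollout reverts on its own. A minimal sketch, assuming a hypothetical helper named deploy_image and the container name nginx used in this step:

```shell
# Hypothetical helper: sets the image, waits for the rollout, and undoes it on failure.
deploy_image() {
  local namespace="$1" image="$2" timeout="${3:-120s}"
  kubectl set image deployment/web "nginx=${image}" -n "$namespace" || return 1
  if ! kubectl rollout status deployment/web -n "$namespace" --timeout="$timeout"; then
    echo "Rollout failed; reverting to the previous revision" >&2
    kubectl rollout undo deployment/web -n "$namespace"
    return 1
  fi
}

# Usage (requires cluster access):
#   deploy_image "sh-acme-corp" "acme-corp/webapp:v1.2.3" 180s
```

The timeout value is a judgment call: too short and a slow image pull triggers an unnecessary rollback.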
Step 9: Update Client Registry¶
Add the client to the registry for tracking:
# Client registry location
REGISTRY="/home/claude/ge-bootstrap/config/clients.yaml"
# The create-client.sh script automatically updates this
# Verify entry exists
grep "name: acme-corp" "$REGISTRY"
Example Registry Entry:
clients:
  - name: acme-corp
    type: shared
    namespace: sh-acme-corp
    resources: small
    domain: acme-corp.hosting.growing-europe.com
    created: 2026-01-29T10:00:00Z
    status: active
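The grep above confirms only that the name appears. To read the status of a specific client, a small awk filter may be enough. This sketch assumes the registry keeps the layout of the example entry (one "- name:" line per client followed by its fields); registry_status is a hypothetical helper name.

```shell
# Hypothetical helper: prints the status field of one client from the registry file.
registry_status() {
  local name="$1" registry="$2"
  awk -v target="$name" '
    /^ *- name:/ { in_target = ($3 == target) }   # a new entry starts here
    in_target && /^ *status:/ { print $2; found = 1; exit }
    END { exit !found }                           # non-zero when no entry matched
  ' "$registry"
}

# Usage:
#   registry_status "acme-corp" "/home/claude/ge-bootstrap/config/clients.yaml"
```

For anything beyond a quick check, a real YAML parser would be more robust than line matching.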
Step 10: Final Verification Checklist¶
Use this checklist to confirm successful onboarding:
CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"
DOMAIN="${CLIENT_NAME}.hosting.growing-europe.com"
echo "=== Client Onboarding Verification ==="
echo ""
# 1. Namespace exists
kubectl get namespace "$NAMESPACE" &>/dev/null && echo "✅ Namespace exists" || echo "❌ Namespace missing"
# 2. Deployment is ready
kubectl get deployment web -n "$NAMESPACE" -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' | grep -q "True" && echo "✅ Deployment ready" || echo "❌ Deployment not ready"
# 3. Service exists
kubectl get service web -n "$NAMESPACE" &>/dev/null && echo "✅ Service exists" || echo "❌ Service missing"
# 4. Ingress exists
kubectl get ingress web -n "$NAMESPACE" &>/dev/null && echo "✅ Ingress exists" || echo "❌ Ingress missing"
# 5. NetworkPolicy exists
kubectl get networkpolicy client-isolation -n "$NAMESPACE" &>/dev/null && echo "✅ NetworkPolicy exists" || echo "❌ NetworkPolicy missing"
# 6. DNS resolves
dig +short "$DOMAIN" | grep -q "\." && echo "✅ DNS resolves" || echo "❌ DNS not resolving"
# 7. HTTPS access works
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://$DOMAIN" 2>/dev/null)
[[ "$HTTP_CODE" == "200" ]] && echo "✅ HTTPS access works" || echo "⚠️ HTTPS returned: $HTTP_CODE"
# 8. Certificate is valid
CERT_ISSUER=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN":443 2>/dev/null | openssl x509 -noout -issuer 2>/dev/null | grep -i "Let's Encrypt")
[[ -n "$CERT_ISSUER" ]] && echo "✅ SSL certificate valid" || echo "⚠️ Check SSL certificate"
echo ""
echo "=== Verification Complete ==="
Expected Output:
=== Client Onboarding Verification ===
✅ Namespace exists
✅ Deployment ready
✅ Service exists
✅ Ingress exists
✅ NetworkPolicy exists
✅ DNS resolves
✅ HTTPS access works
✅ SSL certificate valid
=== Verification Complete ===
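The checklist script prints one line per check, but its exit status does not reflect failures, which matters if it is run from automation. One possible pattern is a small wrapper that counts failures; check and failures are hypothetical names, and true/false stand in for the real commands.

```shell
failures=0

# Hypothetical wrapper: runs a command quietly and records whether it succeeded.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "✅ $label"
  else
    echo "❌ $label"
    failures=$((failures + 1))
  fi
}

check "always passes" true
check "always fails" false
echo "failures: $failures"

# In the real checklist, each line would look like:
#   check "Namespace exists" kubectl get namespace "$NAMESPACE"
# and the script would end with:
#   [ "$failures" -eq 0 ] || exit 1
```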
Troubleshooting¶
Issue: Deployment Not Ready¶
Symptoms: Pods not starting, CrashLoopBackOff
Steps:
1. Check pod status (see Step 4, Check Pod Status)
2. Check logs
3. Check events
Common Causes:
- Image pull errors (check image name and registry access)
- Resource limits too restrictive (increase tier)
- Missing secrets (provision secrets in Vault)
- Health check failures (adjust probe timing)
Issue: HTTPS Returns 404¶
Symptoms: DNS resolves, but HTTPS returns 404
Steps:
1. Check Ingress configuration (see Step 4, Check Ingress Routing)
2. Check Traefik routing
3. Verify service endpoints
Common Causes:
- Ingress hostname mismatch
- Service selector not matching pod labels
- No healthy pods (readiness probes failing)
Issue: SSL Certificate Not Issued¶
Symptoms: HTTPS connection fails or shows invalid certificate
Steps:
1. Check Docker Traefik logs
2. Verify Let's Encrypt rate limits
3. Check DNS propagation
Common Causes:
- DNS not propagated yet (wait 5-10 minutes)
- Let's Encrypt rate limit reached (50 certs/week per domain)
- Firewall blocking port 80 (required for HTTP-01 challenge)
Post-Onboarding Tasks¶
Monitor Resource Usage¶
# Check resource consumption
kubectl top pods -n sh-acme-corp
# Check resource requests vs usage
kubectl describe node | grep -A 10 "sh-acme-corp"
Set Up Monitoring¶
- Configure Prometheus scraping (if client exports metrics)
- Add Grafana dashboard for client workload
- Set up alerts for pod failures, high CPU/memory usage
Documentation¶
- Update client registry with contact information
- Document any custom configurations
- Create runbook for client-specific operations
Client Offboarding¶
To remove a client:
CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"
# Delete namespace (removes all resources)
kubectl delete namespace "$NAMESPACE"
# Remove client directory
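# ${CLIENT_NAME:?} aborts if the variable is empty, so the whole clients directory is never removed
rm -rf "/home/claude/ge-bootstrap/k8s/clients/${CLIENT_NAME:?CLIENT_NAME is not set}"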
# Update client registry
# Edit /home/claude/ge-bootstrap/config/clients.yaml
# Change status to: decommissioned
# Certificate cleanup (optional - will auto-expire)
# Traefik will stop renewing the certificate
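The registry edit can also be scripted rather than done by hand. The sketch below assumes the registry follows the layout of the example entry in Step 9 and uses a hypothetical helper named mark_decommissioned. It writes the updated file to standard output so the result can be reviewed before replacing the original.

```shell
# Hypothetical helper: prints the registry with one client's status set to decommissioned.
mark_decommissioned() {
  local name="$1" registry="$2"
  awk -v target="$name" '
    /^ *- name:/ { in_target = ($3 == target) }   # track which entry we are inside
    in_target && /^ *status:/ { sub(/status:.*/, "status: decommissioned") }
    { print }
  ' "$registry"
}

# Usage:
#   REGISTRY="/home/claude/ge-bootstrap/config/clients.yaml"
#   mark_decommissioned "acme-corp" "$REGISTRY" > "${REGISTRY}.new"
#   diff "$REGISTRY" "${REGISTRY}.new"   # review, then: mv "${REGISTRY}.new" "$REGISTRY"
```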
Related Documentation¶
- Hosting Architecture - Architecture overview
- Deployment Packages - Immutable deployment packages
- Zero-Downtime Deployments - Update procedures
- Platform Startup - Platform management
For questions or issues with client onboarding, contact the GE Infrastructure Team or escalate via the incident response process.