Client Onboarding Runbook

Last Updated: 2026-01-29
Maintained by: GE Infrastructure Team
Estimated Time: 15-30 minutes per client

Overview

This runbook guides you through onboarding a new client to the GE Unified Hosting Platform. The process creates a dedicated namespace, configures ingress routing, provisions SSL certificates, and deploys the client workload.

Prerequisites

Before starting, ensure you have:

[ ] Access to the Kubernetes cluster (kubectl configured)
[ ] SSH/console access to the GE infrastructure server
[ ] Client requirements documented (see below)
[ ] DNS access (if custom domain required)
[ ] Vault access (for secrets management)

Required Client Information

Field	Description	Example
Client Name	Lowercase, alphanumeric, hyphens	`acme-corp`
Hosting Type	`shared` or `dedicated`	`shared`
Resource Tier	`small`, `medium`, `large`, `xlarge`	`small`
Domain	Subdomain under hosting.growing-europe.com	`acme-corp.hosting.growing-europe.com`
Application Image	Container image (optional, default nginx)	`nginx:alpine`

Step 1: Validate Client Name

Client names must follow Kubernetes naming conventions:

Rules:

- Lowercase only
- Start with a letter
- Contain only a-z, 0-9, and hyphens
- End with a letter or digit (Kubernetes rejects namespace names with a trailing hyphen)
- Maximum 50 characters
- Must be unique (no existing namespace with sh-{name} or ded-{name})

Examples:

# Valid
acme-corp
test-client-01
bigcorp

# Invalid
AcmeCorp          # Uppercase
acme_corp         # Underscore
-acme             # Starts with hyphen
acme-corp.com     # Contains period
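
The naming rules in this step can be checked mechanically before any cluster call is made. The following is a minimal sketch, not part of create-client.sh: `validate_client_name` is a hypothetical helper, and the trailing-hyphen check is an assumption added here because Kubernetes namespace names must end with an alphanumeric character.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of create-client.sh).
# Returns 0 when the name satisfies the rules listed in this step.
validate_client_name() {
    local name="$1"
    # Force the C locale so [a-z] matches ASCII lowercase letters only.
    local LC_ALL=C
    # One leading letter plus up to 49 of a-z, 0-9 or hyphen (50 characters max).
    [[ "$name" =~ ^[a-z][a-z0-9-]{0,49}$ ]] || return 1
    # Assumption: reject a trailing hyphen, which Kubernetes does not accept.
    [[ "$name" != *- ]]
}

# Example:
# validate_client_name "acme-corp" && echo "name is valid"
```

The uniqueness rule is not covered here because it requires a cluster lookup.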

Verification:

# Check if namespace already exists
CLIENT_NAME="acme-corp"
TYPE="shared"  # or "dedicated"

# Determine namespace
if [[ "$TYPE" == "shared" ]]; then
    NAMESPACE="sh-${CLIENT_NAME}"
else
    NAMESPACE="ded-${CLIENT_NAME}"
fi

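# Check for conflicts
# Note: any kubectl failure (for example an unreachable cluster) also prints "Available",
# so confirm cluster access first.
if kubectl get namespace "$NAMESPACE" &>/dev/null; then
    echo "❌ Namespace exists"
else
    echo "✅ Available"
fi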

Step 2: Choose Resource Tier

Select the appropriate tier based on client requirements:

Shared Hosting Tiers

Tier	CPU Request	CPU Limit	Memory Request	Memory Limit	Replicas	Traffic Estimate
Small	10m	100m	32Mi	128Mi	1	<10k requests/day
Medium	50m	250m	64Mi	256Mi	2	10k-100k requests/day
Large	100m	500m	128Mi	512Mi	2	100k-1M requests/day

Dedicated Hosting Tiers

Tier	CPU Request	CPU Limit	Memory Request	Memory Limit	Min Replicas	Max Replicas	Use Case
Large	100m	500m	128Mi	512Mi	2	10	High availability required
XLarge	200m	1000m	256Mi	1Gi	3	10	High traffic, mission-critical

Decision Matrix:

- Small: Development, staging, low-traffic sites
- Medium: Production sites, moderate traffic
- Large: High-traffic sites, business-critical
- XLarge: Enterprise clients, guaranteed uptime SLAs
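
The two tier tables imply that not every tier is available for every hosting type. A minimal sketch of that constraint follows; `tier_allowed` is a hypothetical helper, and whether create-client.sh enforces the same combinations is not confirmed by this runbook.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only for the type/tier pairs listed in the
# Shared and Dedicated tier tables of this step.
tier_allowed() {
    local type="$1" tier="$2"
    case "${type}:${tier}" in
        shared:small | shared:medium | shared:large) return 0 ;;
        dedicated:large | dedicated:xlarge) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# tier_allowed shared xlarge || echo "xlarge is not offered on shared hosting"
```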

Step 3: Create Client Environment

Use the create-client.sh script to generate the client environment:

Dry Run (Recommended First)

cd /home/claude/ge-bootstrap/tools

# Shared hosting example
./create-client.sh \
  --type shared \
  --name acme-corp \
  --resources small \
  --dry-run

# Dedicated hosting example
./create-client.sh \
  --type dedicated \
  --name bigcorp \
  --resources large \
  --dry-run

Review the output:

- Check generated manifests
- Verify namespace, labels, annotations
- Confirm resource limits
- Review ingress configuration

Apply to Cluster

Once verified, run without --dry-run:

./create-client.sh \
  --type shared \
  --name acme-corp \
  --resources small

Expected Output:

==========================================
Creating shared client: acme-corp
==========================================
[INFO] Generating overlay for acme-corp...
[OK] Overlay generated at /home/claude/ge-bootstrap/k8s/clients/acme-corp
[INFO] Deploying client acme-corp...
[INFO] Waiting for deployment...
deployment.apps/web condition met
[OK] Client acme-corp deployed successfully

[INFO] Client status:
NAME                  READY   STATUS    RESTARTS   AGE
pod/web-xxxxx-xxxxx   1/1     Running   0          10s

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/web   ClusterIP   10.43.xxx.xxx   <none>        80/TCP    10s

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   1/1     1            1           10s

[OK] Client URL: https://acme-corp.hosting.growing-europe.com
==========================================
[OK] Done!
==========================================

Step 4: Verify Deployment

Check Pod Status

CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"

# Check all resources
kubectl get all -n "$NAMESPACE"

# Check pod logs
kubectl logs -n "$NAMESPACE" -l app=web --tail=50

# Check pod events
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp'

Healthy Output:

NAME                      READY   STATUS    RESTARTS   AGE
pod/web-xxxxx-xxxxx       1/1     Running   0          2m

NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/web   ClusterIP   10.43.xx.xx    <none>        80/TCP    2m

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   1/1     1            1           2m

Check Ingress Routing

# Check Ingress resource
kubectl get ingress -n "$NAMESPACE"

# Describe Ingress (check annotations and rules)
kubectl describe ingress web -n "$NAMESPACE"

# Check Traefik logs
kubectl logs -n ge-ingress deploy/traefik | grep "$CLIENT_NAME"

Verify Ingress configuration:

Name:             web
Namespace:        sh-acme-corp
Address:          10.43.0.1
Ingress Class:    traefik
Rules:
  Host                                  Path  Backends
  ----                                  ----  --------
  acme-corp.hosting.growing-europe.com
                                        /     web:80 (10.42.x.x:8080)
Annotations:
  traefik.ingress.kubernetes.io/router.entrypoints:  websecure
  traefik.ingress.kubernetes.io/router.tls:           true
  traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt
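
Reading the describe output by eye can miss a hostname typo. As a sketch, the host list can be pulled with a jsonpath query and compared against the expected domain; `ingress_has_host` is a hypothetical helper, and it assumes the Ingress is named web as shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the Ingress "web" in the given namespace
# has a rule for exactly the expected host.
ingress_has_host() {
    local namespace="$1" expected="$2"
    local hosts host
    # jsonpath prints the rule hosts as a space-separated list.
    hosts="$(kubectl get ingress web -n "$namespace" \
        -o jsonpath='{.spec.rules[*].host}')" || return 1
    for host in $hosts; do
        [[ "$host" == "$expected" ]] && return 0
    done
    return 1
}

# Example:
# ingress_has_host sh-acme-corp acme-corp.hosting.growing-europe.com \
#   && echo "host rule present"
```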

Step 5: DNS Configuration

Standard Subdomain

For standard subdomains under hosting.growing-europe.com, DNS is automatically configured:

DNS Record (handled by infrastructure):

Type: A
Name: *.hosting.growing-europe.com
Value: <server-public-ip>
TTL: 3600

No action required for standard subdomains.

Custom Domain (Optional)

If the client requires a custom domain:

Create DNS CNAME record (client's DNS provider):

Type: CNAME
Name: www.client-domain.com
Value: acme-corp.hosting.growing-europe.com
TTL: 3600

Update Ingress to include custom domain:

kubectl edit ingress web -n sh-acme-corp

Add custom domain to spec.rules and spec.tls.hosts:

spec:
  tls:
    - hosts:
        - acme-corp.hosting.growing-europe.com
        - www.client-domain.com  # Add custom domain
      secretName: acme-corp-tls
  rules:
    - host: acme-corp.hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
    - host: www.client-domain.com  # Add custom domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
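
Where an interactive edit is inconvenient, the same change could be applied non-interactively with a JSON Patch. This is a sketch under assumptions: `custom_domain_patch` is a hypothetical helper, and the patch presumes the Ingress already has a first TLS entry (spec.tls[0]) and routes to service web on port 80, as in the manifest above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON Patch that appends the domain to the first
# TLS entry's host list and adds a matching rule routing "/" to service web:80.
custom_domain_patch() {
    local domain="$1"
    cat <<EOF
[
  {"op": "add", "path": "/spec/tls/0/hosts/-", "value": "${domain}"},
  {"op": "add", "path": "/spec/rules/-", "value": {
    "host": "${domain}",
    "http": {"paths": [{
      "path": "/", "pathType": "Prefix",
      "backend": {"service": {"name": "web", "port": {"number": 80}}}
    }]}
  }}
]
EOF
}

# Usage (requires cluster access):
# kubectl patch ingress web -n sh-acme-corp --type=json \
#   -p "$(custom_domain_patch www.client-domain.com)"
```

Printing the patch first makes it easy to review before it is applied.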

Step 6: SSL Certificate Provisioning

SSL certificates are provisioned automatically from Let's Encrypt by the Traefik instance running in Docker.

Verification

# Wait for certificate issuance (can take 1-5 minutes)
sleep 60

# Test HTTPS access
curl -I https://acme-corp.hosting.growing-europe.com

# Expected output:
# HTTP/2 200
# server: nginx
# ...
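
Because issuance can take anywhere from one to five minutes, a fixed sleep may be too short or needlessly long. A minimal polling sketch follows; `retry_until` is a hypothetical helper, and the 300-second timeout and 10-second interval in the usage line are assumptions chosen to match the stated window.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until it succeeds or the timeout
# (in seconds) has elapsed. Returns 0 on success, 1 on timeout.
retry_until() {
    local timeout="$1" interval="$2"
    shift 2
    # SECONDS is a bash built-in counting seconds since shell start.
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
}

# Usage: poll the client URL for up to 5 minutes, every 10 seconds.
# curl -f makes HTTP error responses count as failures.
# retry_until 300 10 curl -fsS -o /dev/null "https://acme-corp.hosting.growing-europe.com"
```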

Check Certificate Details

# Check certificate issuer
echo | openssl s_client -servername acme-corp.hosting.growing-europe.com \
  -connect acme-corp.hosting.growing-europe.com:443 2>/dev/null | \
  openssl x509 -noout -issuer -dates

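# Expected output (the issuer CN is whichever Let's Encrypt intermediate signed
# the certificate and changes over time; R3 is shown only as an illustration):
# issuer=C = US, O = Let's Encrypt, CN = R3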
# notBefore=Jan 29 09:00:00 2026 GMT
# notAfter=Apr 29 09:00:00 2026 GMT

Troubleshooting Certificate Issues

Certificate not issued after 5 minutes:

Check Docker Traefik logs:

docker logs traefik 2>&1 | grep -i "acme\|letsencrypt\|${CLIENT_NAME}"

Check Let's Encrypt rate limits:

# Check certificate transparency logs
curl -s "https://crt.sh/?q=%.hosting.growing-europe.com&output=json" | jq '.[0:5]'

Verify DNS resolution:

dig acme-corp.hosting.growing-europe.com +short

Check Traefik configuration:

docker exec traefik cat /etc/traefik/traefik.toml | grep -A 10 letsencrypt

Step 7: Configure Secrets (If Required)

If the client application requires secrets:

Create Secrets in Vault

# Access Vault
kubectl port-forward -n ge-system svc/vault 8200:8200 &

export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN="<root-token>"

# Create client secrets
vault kv put secret/clients/acme-corp \
  redis-password="<secure-password>" \
  api-key="<client-api-key>" \
  custom-secret="<value>"

Reference Secrets in K8s

Update the client deployment to reference secrets:

# Create K8s secret from Vault
kubectl create secret generic client-secrets \
  -n sh-acme-corp \
  --from-literal=redis-password="<value>" \
  --from-literal=api-key="<value>"

# Update deployment to mount secrets
kubectl edit deployment web -n sh-acme-corp

Add volume and volumeMount:

spec:
  template:
    spec:
      containers:
        - name: nginx
          volumeMounts:
            - name: secrets
              mountPath: /run/secrets
              readOnly: true
      volumes:
        - name: secrets
          secret:
            secretName: client-secrets
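
The commands above leave the copy from Vault into the Kubernetes secret as a manual step with placeholder values. One way to script that copy is sketched below. `sync_client_secrets` is a hypothetical helper; it assumes VAULT_ADDR and VAULT_TOKEN are already exported and that the two keys exist under secret/clients/<client>, as in the earlier vault kv put example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read two fields from Vault and create the Kubernetes
# secret "client-secrets" in the client namespace.
sync_client_secrets() {
    local client="$1" namespace="$2"
    local redis_password api_key
    # "vault kv get -field=<key>" prints only the value of that key.
    redis_password="$(vault kv get -field=redis-password "secret/clients/${client}")" || return 1
    api_key="$(vault kv get -field=api-key "secret/clients/${client}")" || return 1
    kubectl create secret generic client-secrets \
        -n "$namespace" \
        --from-literal=redis-password="$redis_password" \
        --from-literal=api-key="$api_key"
}

# Usage:
# sync_client_secrets acme-corp sh-acme-corp
```

Note that values passed with --from-literal are visible in the local process list while the command runs.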

Step 8: Deploy Client Application

Replace the default nginx container with the client's application:

Update Deployment Image

# Set the image of the container named nginx
kubectl set image deployment/web \
  nginx=<client-image>:<version> \
  -n sh-acme-corp

# Example:
kubectl set image deployment/web \
  nginx=acme-corp/webapp:v1.2.3 \
  -n sh-acme-corp

Verify Rolling Update

# Watch rollout
kubectl rollout status deployment/web -n sh-acme-corp

# Check pod status
kubectl get pods -n sh-acme-corp -w

Rollback if Needed

# Check rollout history
kubectl rollout history deployment/web -n sh-acme-corp

# Rollback to previous version
kubectl rollout undo deployment/web -n sh-acme-corp
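
The update, watch, and rollback commands in this step can be chained so that a failed rollout is undone automatically. The following is a sketch only: `deploy_with_rollback` is a hypothetical helper, the 120-second timeout is an assumed value, and the container name nginx is taken from the examples above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set a new image on deployment/web and roll back
# if the rollout does not complete within the timeout.
deploy_with_rollback() {
    local namespace="$1" image="$2"
    kubectl set image deployment/web "nginx=${image}" -n "$namespace" || return 1
    # --timeout makes "rollout status" fail instead of waiting indefinitely.
    if ! kubectl rollout status deployment/web -n "$namespace" --timeout=120s; then
        echo "Rollout failed, rolling back" >&2
        kubectl rollout undo deployment/web -n "$namespace"
        return 1
    fi
}

# Usage:
# deploy_with_rollback sh-acme-corp acme-corp/webapp:v1.2.3
```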

Step 9: Update Client Registry

Add the client to the registry for tracking:

# Client registry location
REGISTRY="/home/claude/ge-bootstrap/config/clients.yaml"

# The create-client.sh script automatically updates this
# Verify entry exists
grep "name: acme-corp" "$REGISTRY"

Example Registry Entry:

clients:
  - name: acme-corp
    type: shared
    namespace: sh-acme-corp
    resources: small
    domain: acme-corp.hosting.growing-europe.com
    created: 2026-01-29T10:00:00Z
    status: active
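
A plain substring grep for a client name also matches longer names that begin with it (for example acme-corp-staging). A stricter check is sketched below; `client_registered` is a hypothetical helper that assumes each entry has a "name:" line formatted as in the example entry above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the registry has a "name:" line whose
# value is exactly the given client name.
client_registered() {
    local registry="$1" name="$2"
    # Anchor both ends so "acme-corp" does not match "acme-corp-staging".
    grep -Eq "^[[:space:]]*-?[[:space:]]*name:[[:space:]]*${name}[[:space:]]*$" "$registry"
}

# Usage:
# client_registered /home/claude/ge-bootstrap/config/clients.yaml acme-corp \
#   && echo "registry entry found"
```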

Step 10: Final Verification Checklist

Use this checklist to confirm successful onboarding:

CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"
DOMAIN="${CLIENT_NAME}.hosting.growing-europe.com"

echo "=== Client Onboarding Verification ==="
echo ""

# 1. Namespace exists
kubectl get namespace "$NAMESPACE" &>/dev/null && echo "✅ Namespace exists" || echo "❌ Namespace missing"

# 2. Deployment is ready
kubectl get deployment web -n "$NAMESPACE" -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' | grep -q "True" && echo "✅ Deployment ready" || echo "❌ Deployment not ready"

# 3. Service exists
kubectl get service web -n "$NAMESPACE" &>/dev/null && echo "✅ Service exists" || echo "❌ Service missing"

# 4. Ingress exists
kubectl get ingress web -n "$NAMESPACE" &>/dev/null && echo "✅ Ingress exists" || echo "❌ Ingress missing"

# 5. NetworkPolicy exists
kubectl get networkpolicy client-isolation -n "$NAMESPACE" &>/dev/null && echo "✅ NetworkPolicy exists" || echo "❌ NetworkPolicy missing"

# 6. DNS resolves
dig +short "$DOMAIN" | grep -q "\." && echo "✅ DNS resolves" || echo "❌ DNS not resolving"

# 7. HTTPS access works
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://$DOMAIN" 2>/dev/null)
[[ "$HTTP_CODE" == "200" ]] && echo "✅ HTTPS access works" || echo "⚠️  HTTPS returned: $HTTP_CODE"

# 8. Certificate is valid
CERT_ISSUER=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN":443 2>/dev/null | openssl x509 -noout -issuer 2>/dev/null | grep -i "Let's Encrypt")
[[ -n "$CERT_ISSUER" ]] && echo "✅ SSL certificate valid" || echo "⚠️  Check SSL certificate"

echo ""
echo "=== Verification Complete ==="

Expected Output:

=== Client Onboarding Verification ===

✅ Namespace exists
✅ Deployment ready
✅ Service exists
✅ Ingress exists
✅ NetworkPolicy exists
✅ DNS resolves
✅ HTTPS access works
✅ SSL certificate valid

=== Verification Complete ===
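
The checklist prints a result for each item but gives no overall pass or fail signal, which matters if it is run from automation. A minimal sketch of a wrapper that counts failures follows; `check` and `failures` are hypothetical names, not part of the existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command quietly, print a pass/fail line,
# and count failures in the global variable "failures".
failures=0
check() {
    local description="$1"
    shift
    if "$@" &>/dev/null; then
        echo "✅ ${description}"
    else
        echo "❌ ${description}"
        failures=$(( failures + 1 ))
    fi
}

# Usage (requires cluster access):
# check "Namespace exists" kubectl get namespace "$NAMESPACE"
# check "Service exists"   kubectl get service web -n "$NAMESPACE"
# (( failures == 0 )) || echo "${failures} check(s) failed"
```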

Troubleshooting

Issue: Deployment Not Ready

Symptoms: Pods not starting, CrashLoopBackOff

Steps:

1. Check pod status:

kubectl get pods -n sh-acme-corp
kubectl describe pod <pod-name> -n sh-acme-corp

2. Check logs:

kubectl logs <pod-name> -n sh-acme-corp

3. Check events:

kubectl get events -n sh-acme-corp --sort-by='.lastTimestamp'

Common Causes:

- Image pull errors (check image name and registry access)
- Resource limits too restrictive (increase tier)
- Missing secrets (provision secrets in Vault)
- Health check failures (adjust probe timing)

Issue: HTTPS Returns 404

Symptoms: DNS resolves, but HTTPS returns 404

Steps:

1. Check Ingress configuration:

kubectl describe ingress web -n sh-acme-corp

2. Check Traefik routing:

kubectl logs -n ge-ingress deploy/traefik | grep acme-corp

3. Verify service endpoints:

kubectl get endpoints web -n sh-acme-corp

Common Causes:

- Ingress hostname mismatch
- Service selector not matching pod labels
- No healthy pods (readiness probes failing)

Issue: SSL Certificate Not Issued

Symptoms: HTTPS connection fails or shows invalid certificate

Steps:

1. Check Docker Traefik logs:

docker logs traefik 2>&1 | tail -100 | grep -i acme

2. Verify Let's Encrypt rate limits:

curl -s "https://crt.sh/?q=%.hosting.growing-europe.com&output=json" | jq '. | length'

3. Check DNS propagation:

dig acme-corp.hosting.growing-europe.com +trace

Common Causes:

- DNS not propagated yet (wait 5-10 minutes)
- Let's Encrypt rate limit reached (50 certs/week per domain)
- Firewall blocking port 80 (required for HTTP-01 challenge)

Post-Onboarding Tasks

Monitor Resource Usage

# Check resource consumption
kubectl top pods -n sh-acme-corp

# Check resource requests vs usage
kubectl describe node | grep -A 10 "sh-acme-corp"

Set Up Monitoring

Configure Prometheus scraping (if client exports metrics)
Add Grafana dashboard for client workload
Set up alerts for pod failures, high CPU/memory usage

Documentation

Update client registry with contact information
Document any custom configurations
Create runbook for client-specific operations

Client Offboarding

To remove a client:

CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"  # use "ded-${CLIENT_NAME}" for dedicated clients

# Delete namespace (removes all resources)
kubectl delete namespace "$NAMESPACE"

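# Remove client directory (the :? guard aborts if CLIENT_NAME is empty or unset,
# so the command can never expand to the whole clients/ directory)
rm -rf "/home/claude/ge-bootstrap/k8s/clients/${CLIENT_NAME:?CLIENT_NAME is not set}"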

# Update client registry
# Edit /home/claude/ge-bootstrap/config/clients.yaml
# Change status to: decommissioned

# Certificate cleanup (optional - will auto-expire)
# Traefik will stop renewing the certificate

Hosting Architecture - Architecture overview
Deployment Packages - Immutable deployment packages
Zero-Downtime Deployments - Update procedures
Platform Startup - Platform management

For questions or issues with client onboarding, contact the GE Infrastructure Team or escalate via the incident response process.