Client Onboarding Runbook

Last Updated: 2026-01-29
Maintained by: GE Infrastructure Team
Estimated Time: 15-30 minutes per client


Overview

This runbook guides you through onboarding a new client to the GE Unified Hosting Platform. The process creates a dedicated namespace, configures ingress routing, provisions SSL certificates, and deploys the client workload.


Prerequisites

Before starting, ensure you have:

  • [ ] Access to the Kubernetes cluster (kubectl configured)
  • [ ] SSH/console access to the GE infrastructure server
  • [ ] Client requirements documented (see below)
  • [ ] DNS access (if custom domain required)
  • [ ] Vault access (for secrets management)

Required Client Information

| Field | Description | Example |
|---|---|---|
| Client Name | Lowercase, alphanumeric, hyphens | acme-corp |
| Hosting Type | shared or dedicated | shared |
| Resource Tier | small, medium, large, xlarge | small |
| Domain | Subdomain under hosting.growing-europe.com | acme-corp.hosting.growing-europe.com |
| Application Image | Container image (optional, default nginx) | nginx:alpine |

Step 1: Validate Client Name

Client names must follow Kubernetes naming conventions:

Rules:

  • Lowercase only
  • Start with a letter
  • End with a letter or digit (Kubernetes rejects namespace names with a trailing hyphen)
  • Contain only a-z, 0-9, and hyphens
  • Maximum 50 characters
  • Must be unique (no existing namespace with sh-{name} or ded-{name})

Examples:

# Valid
acme-corp
test-client-01
bigcorp

# Invalid
AcmeCorp          # Uppercase
acme_corp         # Underscore
-acme             # Starts with hyphen
acme-corp.com     # Contains period
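
The naming rules can be checked mechanically before touching the cluster. The following is a minimal sketch, not part of create-client.sh: `validate_client_name` is a hypothetical helper. Besides the rules listed above, it rejects a trailing hyphen, which Kubernetes does not accept in namespace names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the name satisfies the naming rules, 1 otherwise.
validate_client_name() {
    local name="$1"
    # Force the C locale so [a-z] matches ASCII lowercase only.
    local LC_ALL=C
    # Must start with a letter and contain only a-z, 0-9, and hyphens.
    [[ "$name" =~ ^[a-z][a-z0-9-]*$ ]] || return 1
    # Must not end with a hyphen (Kubernetes namespace naming).
    [[ "$name" != *- ]] || return 1
    # Maximum 50 characters.
    (( ${#name} <= 50 )) || return 1
}

# Usage:
# validate_client_name "acme-corp" && echo "✅ Name is valid" || echo "❌ Name is invalid"
```

Uniqueness still has to be checked against the cluster, as shown in the verification step.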

Verification:

# Check if namespace already exists
CLIENT_NAME="acme-corp"
TYPE="shared"  # or "dedicated"

# Determine namespace
if [[ "$TYPE" == "shared" ]]; then
    NAMESPACE="sh-${CLIENT_NAME}"
else
    NAMESPACE="ded-${CLIENT_NAME}"
fi

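# Check for conflicts under both prefixes (the name must be unique across shared and dedicated).
# A failed lookup also reports "Available", so confirm cluster access first.
for ns in "sh-${CLIENT_NAME}" "ded-${CLIENT_NAME}"; do
    if kubectl get namespace "$ns" &>/dev/null; then
        echo "❌ Namespace exists: $ns"
    else
        echo "✅ Available: $ns"
    fi
done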


Step 2: Choose Resource Tier

Select the appropriate tier based on client requirements:

Shared Hosting Tiers

| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Replicas | Traffic Estimate |
|---|---|---|---|---|---|---|
| Small | 10m | 100m | 32Mi | 128Mi | 1 | <10k requests/day |
| Medium | 50m | 250m | 64Mi | 256Mi | 2 | 10k-100k requests/day |
| Large | 100m | 500m | 128Mi | 512Mi | 2 | 100k-1M requests/day |

Dedicated Hosting Tiers

| Tier | CPU Request | CPU Limit | Memory Request | Memory Limit | Min Replicas | Max Replicas | Use Case |
|---|---|---|---|---|---|---|---|
| Large | 100m | 500m | 128Mi | 512Mi | 2 | 10 | High availability required |
| XLarge | 200m | 1000m | 256Mi | 1Gi | 3 | 10 | High traffic, mission-critical |

Decision Matrix:

  • Small: Development, staging, low-traffic sites
  • Medium: Production sites, moderate traffic
  • Large: High-traffic sites, business-critical
  • XLarge: Enterprise clients, guaranteed uptime SLAs
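
To make the tier tables easier to use in scripts, the sketch below maps a hosting type and tier to the four resource values from the tables. `tier_resources` is a hypothetical helper. The tables list no shared XLarge and no dedicated Small or Medium tier, so the sketch treats those combinations as unsupported; create-client.sh may handle them differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<cpu request> <cpu limit> <memory request> <memory limit>"
# for a hosting type and tier, using the values from the tier tables.
tier_resources() {
    local type="$1" tier="$2"
    case "${type}:${tier}" in
        shared:small)                 echo "10m 100m 32Mi 128Mi" ;;
        shared:medium)                echo "50m 250m 64Mi 256Mi" ;;
        shared:large|dedicated:large) echo "100m 500m 128Mi 512Mi" ;;
        dedicated:xlarge)             echo "200m 1000m 256Mi 1Gi" ;;
        *)
            # Combination not listed in the tables (assumed unsupported).
            echo "unsupported combination: ${type}/${tier}" >&2
            return 1
            ;;
    esac
}

# Usage:
# read -r cpu_req cpu_lim mem_req mem_lim < <(tier_resources shared small)
```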


Step 3: Create Client Environment

Use the create-client.sh script to generate the client environment:

cd /home/claude/ge-bootstrap/tools

# Shared hosting example
./create-client.sh \
  --type shared \
  --name acme-corp \
  --resources small \
  --dry-run

# Dedicated hosting example
./create-client.sh \
  --type dedicated \
  --name bigcorp \
  --resources large \
  --dry-run

Review the output:

  • Check generated manifests
  • Verify namespace, labels, annotations
  • Confirm resource limits
  • Review ingress configuration

Apply to Cluster

Once verified, run without --dry-run:

./create-client.sh \
  --type shared \
  --name acme-corp \
  --resources small

Expected Output:

==========================================
Creating shared client: acme-corp
==========================================
[INFO] Generating overlay for acme-corp...
[OK] Overlay generated at /home/claude/ge-bootstrap/k8s/clients/acme-corp
[INFO] Deploying client acme-corp...
[INFO] Waiting for deployment...
deployment.apps/web condition met
[OK] Client acme-corp deployed successfully

[INFO] Client status:
NAME                  READY   STATUS    RESTARTS   AGE
pod/web-xxxxx-xxxxx   1/1     Running   0          10s

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/web   ClusterIP   10.43.xxx.xxx   <none>        80/TCP    10s

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   1/1     1            1           10s

[OK] Client URL: https://acme-corp.hosting.growing-europe.com
==========================================
[OK] Done!
==========================================


Step 4: Verify Deployment

Check Pod Status

CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"

# Check all resources
kubectl get all -n "$NAMESPACE"

# Check pod logs
kubectl logs -n "$NAMESPACE" -l app=web --tail=50

# Check pod events
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp'

Healthy Output:

NAME                      READY   STATUS    RESTARTS   AGE
pod/web-xxxxx-xxxxx       1/1     Running   0          2m

NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/web   ClusterIP   10.43.xx.xx    <none>        80/TCP    2m

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/web   1/1     1            1           2m

Check Ingress Routing

# Check Ingress resource
kubectl get ingress -n "$NAMESPACE"

# Describe Ingress (check annotations and rules)
kubectl describe ingress web -n "$NAMESPACE"

# Check Traefik logs
kubectl logs -n ge-ingress deploy/traefik | grep "$CLIENT_NAME"

Verify Ingress configuration:

Name:             web
Namespace:        sh-acme-corp
Address:          10.43.0.1
Ingress Class:    traefik
Rules:
  Host                                  Path  Backends
  ----                                  ----  --------
  acme-corp.hosting.growing-europe.com
                                        /     web:80 (10.42.x.x:8080)
Annotations:
  traefik.ingress.kubernetes.io/router.entrypoints:  websecure
  traefik.ingress.kubernetes.io/router.tls:           true
  traefik.ingress.kubernetes.io/router.tls.certresolver: letsencrypt


Step 5: DNS Configuration

Standard Subdomain

For standard subdomains under hosting.growing-europe.com, DNS is automatically configured:

DNS Record (handled by infrastructure):

Type: A
Name: *.hosting.growing-europe.com
Value: <server-public-ip>
TTL: 3600

No action required for standard subdomains.

Custom Domain (Optional)

If the client requires a custom domain:

  1. Create DNS CNAME record (client's DNS provider):

    Type: CNAME
    Name: www.client-domain.com
    Value: acme-corp.hosting.growing-europe.com
    TTL: 3600
    

  2. Update Ingress to include custom domain:

    kubectl edit ingress web -n sh-acme-corp
    

Add custom domain to spec.rules and spec.tls.hosts:

spec:
  tls:
    - hosts:
        - acme-corp.hosting.growing-europe.com
        - www.client-domain.com  # Add custom domain
      secretName: acme-corp-tls
  rules:
    - host: acme-corp.hosting.growing-europe.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
    - host: www.client-domain.com  # Add custom domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
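
Before editing the Ingress, it can help to confirm that the client's CNAME record actually points at the platform hostname. This is a minimal sketch: `normalize_fqdn` and `cname_points_to` are hypothetical helpers, and the check assumes `dig` is installed. `dig +short` prints the CNAME target with a trailing dot, so both names are normalized before comparison.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip one trailing dot and lowercase the name.
normalize_fqdn() {
    printf '%s\n' "${1%.}" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: returns 0 if the CNAME of $1 resolves to $2.
cname_points_to() {
    local domain="$1" target="$2" answer
    # Take the first CNAME answer, if any.
    answer="$(dig +short "$domain" CNAME | head -n 1)"
    [[ -n "$answer" ]] || return 1
    [[ "$(normalize_fqdn "$answer")" == "$(normalize_fqdn "$target")" ]]
}

# Usage:
# cname_points_to www.client-domain.com acme-corp.hosting.growing-europe.com \
#   && echo "✅ CNAME is correct" || echo "❌ CNAME missing or wrong"
```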


Step 6: SSL Certificate Provisioning

SSL certificates are provisioned automatically through Let's Encrypt by the Traefik instance running in Docker.

Verification

# Wait for certificate issuance (can take 1-5 minutes)
sleep 60

# Test HTTPS access
curl -I https://acme-corp.hosting.growing-europe.com

# Expected output:
# HTTP/2 200
# server: nginx
# ...
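
Because issuance can take anywhere from one to five minutes, a fixed `sleep 60` may be too short or needlessly long. One alternative is to poll until HTTPS responds. The sketch below assumes two hypothetical helpers, `wait_for` and `https_ok`; the interval and attempt count are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between tries. Returns 0 as soon as the command succeeds.
wait_for() {
    local attempts="$1" delay="$2" i
    shift 2
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        # Do not sleep after the final attempt.
        (( i < attempts )) && sleep "$delay"
    done
    return 1
}

# Hypothetical helper: succeeds only if the TLS handshake works and the
# response is not an HTTP error (-f makes curl fail on status >= 400).
https_ok() {
    curl -fsS -o /dev/null "https://$1"
}

# Usage: poll every 15 seconds for up to 5 minutes.
# wait_for 20 15 https_ok acme-corp.hosting.growing-europe.com \
#   && echo "✅ HTTPS is up" || echo "❌ Still failing after 5 minutes"
```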

Check Certificate Details

# Check certificate issuer
echo | openssl s_client -servername acme-corp.hosting.growing-europe.com \
  -connect acme-corp.hosting.growing-europe.com:443 2>/dev/null | \
  openssl x509 -noout -issuer -dates

# Expected output (the issuer CN names the Let's Encrypt intermediate in use and changes over time):
# issuer=C = US, O = Let's Encrypt, CN = R3
# notBefore=Jan 29 09:00:00 2026 GMT
# notAfter=Apr 29 09:00:00 2026 GMT

Troubleshooting Certificate Issues

Certificate not issued after 5 minutes:

  1. Check Docker Traefik logs:

    docker logs traefik 2>&1 | grep -i "acme\|letsencrypt\|${CLIENT_NAME}"
    

  2. Check Let's Encrypt rate limits:

    # Check certificate transparency logs
    curl -s "https://crt.sh/?q=%25.hosting.growing-europe.com&output=json" | jq '.[0:5]'
    

  3. Verify DNS resolution:

    dig acme-corp.hosting.growing-europe.com +short
    

  4. Check Traefik configuration:

    docker exec traefik cat /etc/traefik/traefik.toml | grep -A 10 letsencrypt
    


Step 7: Configure Secrets (If Required)

If the client application requires secrets:

Create Secrets in Vault

# Access Vault
kubectl port-forward -n ge-system svc/vault 8200:8200 &

export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN="<root-token>"

# Create client secrets
vault kv put secret/clients/acme-corp \
  redis-password="<secure-password>" \
  api-key="<client-api-key>" \
  custom-secret="<value>"

Reference Secrets in K8s

Update the client deployment to reference secrets:

# Create K8s secret from Vault
kubectl create secret generic client-secrets \
  -n sh-acme-corp \
  --from-literal=redis-password="<value>" \
  --from-literal=api-key="<value>"

# Update deployment to mount secrets
kubectl edit deployment web -n sh-acme-corp

Add volume and volumeMount:

spec:
  template:
    spec:
      containers:
        - name: nginx
          volumeMounts:
            - name: secrets
              mountPath: /run/secrets
              readOnly: true
      volumes:
        - name: secrets
          secret:
            secretName: client-secrets
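
Rather than pasting values into `--from-literal` by hand, the Kubernetes secret can be built from the values already stored in Vault. This is a sketch only: `sync_client_secret` is a hypothetical helper that assumes the `secret/clients/<name>` path and the `client-secrets` name used above, and that `VAULT_ADDR` and `VAULT_TOKEN` are already exported. It relies on `vault kv get -field=<key>`, which prints a single field value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the named keys from Vault into a K8s secret.
# Arguments: <client-name> <namespace> <key> [<key> ...]
sync_client_secret() {
    local client="$1" namespace="$2"
    shift 2
    local key value
    local args=()
    for key in "$@"; do
        # Read one field from the client's Vault entry; stop if it is missing.
        value="$(vault kv get -field="$key" "secret/clients/${client}")" || return 1
        args+=("--from-literal=${key}=${value}")
    done
    kubectl create secret generic client-secrets -n "$namespace" "${args[@]}"
}

# Usage:
# sync_client_secret acme-corp sh-acme-corp redis-password api-key
```

Values passed with `--from-literal` are briefly visible in the local process list, so run this only on a trusted host.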


Step 8: Deploy Client Application

Replace the default nginx container with the client's application:

Update Deployment Image

# Update the container image (the container is named nginx)
kubectl set image deployment/web \
  nginx=<client-image>:<version> \
  -n sh-acme-corp

# Example:
kubectl set image deployment/web \
  nginx=acme-corp/webapp:v1.2.3 \
  -n sh-acme-corp

Verify Rolling Update

# Watch rollout
kubectl rollout status deployment/web -n sh-acme-corp

# Check pod status
kubectl get pods -n sh-acme-corp -w

Rollback if Needed

# Check rollout history
kubectl rollout history deployment/web -n sh-acme-corp

# Rollback to previous version
kubectl rollout undo deployment/web -n sh-acme-corp
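
The update, verify, and rollback commands above can be chained so that a failed rollout is undone automatically. This is a minimal sketch: `deploy_image` is a hypothetical helper, and the 120-second timeout is an assumption to adjust per application.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the image, wait for the rollout, undo it on failure.
# Arguments: <namespace> <image>
deploy_image() {
    local namespace="$1" image="$2"
    # The container in the web deployment is named nginx.
    kubectl set image deployment/web "nginx=${image}" -n "$namespace" || return 1
    # rollout status exits non-zero if the rollout fails or the timeout expires.
    if ! kubectl rollout status deployment/web -n "$namespace" --timeout=120s; then
        echo "Rollout failed, rolling back" >&2
        kubectl rollout undo deployment/web -n "$namespace"
        return 1
    fi
}

# Usage:
# deploy_image sh-acme-corp acme-corp/webapp:v1.2.3
```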

Step 9: Update Client Registry

Add the client to the registry for tracking:

# Client registry location
REGISTRY="/home/claude/ge-bootstrap/config/clients.yaml"

# The create-client.sh script automatically updates this
# Verify entry exists
grep "name: acme-corp" "$REGISTRY"

Example Registry Entry:

clients:
  - name: acme-corp
    type: shared
    namespace: sh-acme-corp
    resources: small
    domain: acme-corp.hosting.growing-europe.com
    created: 2026-01-29T10:00:00Z
    status: active
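
A plain `grep "name: acme-corp"` also matches longer names such as acme-corp-eu. For an exact match that also reports the status, something like the following may be useful. `client_status` is a hypothetical helper, and it assumes the registry keeps the flat layout shown in the example entry (one `- name:` line per client, followed by a `status:` line).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the status of exactly one client from the registry.
# Arguments: <registry-file> <client-name>. Returns 1 if the client is not found.
client_status() {
    local registry="$1" name="$2"
    awk -v name="$name" '
        # Remember which client entry we are inside.
        $1 == "-" && $2 == "name:" { current = $3 }
        # Print the status of the requested client and stop.
        $1 == "status:" && current == name { print $2; found = 1; exit }
        END { exit !found }
    ' "$registry"
}

# Usage:
# client_status /home/claude/ge-bootstrap/config/clients.yaml acme-corp
```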


Step 10: Final Verification Checklist

Use this checklist to confirm successful onboarding:

CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"
DOMAIN="${CLIENT_NAME}.hosting.growing-europe.com"

echo "=== Client Onboarding Verification ==="
echo ""

# 1. Namespace exists
kubectl get namespace "$NAMESPACE" &>/dev/null && echo "✅ Namespace exists" || echo "❌ Namespace missing"

# 2. Deployment is ready
kubectl get deployment web -n "$NAMESPACE" -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' | grep -q "True" && echo "✅ Deployment ready" || echo "❌ Deployment not ready"

# 3. Service exists
kubectl get service web -n "$NAMESPACE" &>/dev/null && echo "✅ Service exists" || echo "❌ Service missing"

# 4. Ingress exists
kubectl get ingress web -n "$NAMESPACE" &>/dev/null && echo "✅ Ingress exists" || echo "❌ Ingress missing"

# 5. NetworkPolicy exists
kubectl get networkpolicy client-isolation -n "$NAMESPACE" &>/dev/null && echo "✅ NetworkPolicy exists" || echo "❌ NetworkPolicy missing"

# 6. DNS resolves
dig +short "$DOMAIN" | grep -q "\." && echo "✅ DNS resolves" || echo "❌ DNS not resolving"

# 7. HTTPS access works
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://$DOMAIN" 2>/dev/null)
[[ "$HTTP_CODE" == "200" ]] && echo "✅ HTTPS access works" || echo "⚠️  HTTPS returned: $HTTP_CODE"

# 8. Certificate is valid
CERT_ISSUER=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN":443 2>/dev/null | openssl x509 -noout -issuer 2>/dev/null | grep -i "Let's Encrypt")
[[ -n "$CERT_ISSUER" ]] && echo "✅ SSL certificate valid" || echo "⚠️  Check SSL certificate"

echo ""
echo "=== Verification Complete ==="

Expected Output:

=== Client Onboarding Verification ===

✅ Namespace exists
✅ Deployment ready
✅ Service exists
✅ Ingress exists
✅ NetworkPolicy exists
✅ DNS resolves
✅ HTTPS access works
✅ SSL certificate valid

=== Verification Complete ===


Troubleshooting

Issue: Deployment Not Ready

Symptoms: Pods not starting, CrashLoopBackOff

Steps:

  1. Check pod status:

    kubectl get pods -n sh-acme-corp
    kubectl describe pod <pod-name> -n sh-acme-corp

  2. Check logs:

    kubectl logs <pod-name> -n sh-acme-corp

  3. Check events:

    kubectl get events -n sh-acme-corp --sort-by='.lastTimestamp'

Common Causes:

  • Image pull errors (check image name and registry access)
  • Resource limits too restrictive (increase tier)
  • Missing secrets (provision secrets in Vault)
  • Health check failures (adjust probe timing)

Issue: HTTPS Returns 404

Symptoms: DNS resolves, but HTTPS returns 404

Steps:

  1. Check Ingress configuration:

    kubectl describe ingress web -n sh-acme-corp

  2. Check Traefik routing:

    kubectl logs -n ge-ingress deploy/traefik | grep acme-corp

  3. Verify service endpoints:

    kubectl get endpoints web -n sh-acme-corp

Common Causes:

  • Ingress hostname mismatch
  • Service selector not matching pod labels
  • No healthy pods (readiness probes failing)
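
A hostname mismatch can be confirmed without reading the full `describe` output by comparing the hosts in the Ingress rules with the expected domain. This is a sketch; `ingress_has_host` is a hypothetical helper that assumes the Ingress is named web, as elsewhere in this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the web Ingress has a rule for the given host.
# Arguments: <namespace> <expected-host>
ingress_has_host() {
    local namespace="$1" expected="$2" host hosts
    # jsonpath prints the rule hosts separated by spaces.
    hosts="$(kubectl get ingress web -n "$namespace" -o jsonpath='{.spec.rules[*].host}')" || return 1
    for host in $hosts; do
        [[ "$host" == "$expected" ]] && return 0
    done
    return 1
}

# Usage:
# ingress_has_host sh-acme-corp acme-corp.hosting.growing-europe.com \
#   && echo "✅ Host rule present" || echo "❌ Host rule missing"
```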

Issue: SSL Certificate Not Issued

Symptoms: HTTPS connection fails or shows invalid certificate

Steps:

  1. Check Docker Traefik logs:

    docker logs traefik 2>&1 | tail -100 | grep -i acme

  2. Verify Let's Encrypt rate limits:

    # Counts every certificate logged for the domain, not only those from the last 7 days
    curl -s "https://crt.sh/?q=%25.hosting.growing-europe.com&output=json" | jq '. | length'

  3. Check DNS propagation:

    dig acme-corp.hosting.growing-europe.com +trace

Common Causes:

  • DNS not propagated yet (wait 5-10 minutes)
  • Let's Encrypt rate limit reached (50 certs/week per domain)
  • Firewall blocking port 80 (required for HTTP-01 challenge)


Post-Onboarding Tasks

Monitor Resource Usage

# Check resource consumption
kubectl top pods -n sh-acme-corp

# Check resource requests vs usage
kubectl describe node | grep -A 10 "sh-acme-corp"

Set Up Monitoring

  • Configure Prometheus scraping (if client exports metrics)
  • Add Grafana dashboard for client workload
  • Set up alerts for pod failures, high CPU/memory usage

Documentation

  • Update client registry with contact information
  • Document any custom configurations
  • Create runbook for client-specific operations

Client Offboarding

To remove a client:

CLIENT_NAME="acme-corp"
NAMESPACE="sh-${CLIENT_NAME}"

# Delete namespace (removes all resources)
kubectl delete namespace "$NAMESPACE"

# Remove client directory (:? aborts if CLIENT_NAME is empty, so the whole clients/ tree is never removed)
rm -rf "/home/claude/ge-bootstrap/k8s/clients/${CLIENT_NAME:?}"

# Update client registry
# Edit /home/claude/ge-bootstrap/config/clients.yaml
# Change status to: decommissioned

# Certificate cleanup (optional - will auto-expire)
# Traefik will stop renewing the certificate


For questions or issues with client onboarding, contact the GE Infrastructure Team or escalate via the incident response process.