DOMAIN:NETWORKING:TLS_CERTIFICATES¶
OWNER: stef
UPDATED: 2026-03-24
SCOPE: all TLS certificate operations for GE and client domains
AGENTS: stef (primary), margot (client cert CSR relay), victoria (wildcard approval), piotr (Vault credentials)
CA: Let's Encrypt (primary), client-provided CAs (custom flow)
TOOL: cert-manager (k8s-native certificate automation)
TLS:OVERVIEW¶
PURPOSE: automated TLS certificate issuance, renewal, and management
CA: Let's Encrypt (free, automated, trusted by all major browsers)
TOOL: cert-manager in Kubernetes (ClusterIssuer + Certificate CRDs)
CHALLENGE: DNS-01 via TransIP webhook (supports wildcards, no inbound HTTP needed)
TERMINATION: Traefik IngressController (TLS terminates at ingress, plain HTTP to backends)
MIN_TLS_VERSION: TLS 1.2 (enforced via TLSOption CRD)
RULE: all public-facing endpoints MUST use HTTPS
RULE: HTTP (port 80) redirects to HTTPS and never serves content
RULE: TLS 1.0 and 1.1 are disabled (insecure, deprecated)
RULE: cert-manager handles renewal; never manually renew Let's Encrypt certs
TLS:CERT_MANAGER_SETUP¶
CLUSTERISSUER¶
GE uses two ClusterIssuers: staging (for testing) and production.
STAGING (use first to avoid rate limits):
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: certs@growing-europe.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
      - dns01:
          webhook:
            groupName: acme.transip.nl
            solverName: transip
            config:
              accountName: ${TRANSIP_ACCOUNT}
              privateKeySecretRef:
                name: transip-api-key
                key: key
PRODUCTION:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certs@growing-europe.com
    privateKeySecretRef:
      name: letsencrypt-production-key
    solvers:
      - dns01:
          webhook:
            groupName: acme.transip.nl
            solverName: transip
            config:
              accountName: ${TRANSIP_ACCOUNT}
              privateKeySecretRef:
                name: transip-api-key
                key: key
RULE: always test with staging issuer first, because Let's Encrypt production is rate-limited (50 certificates per registered domain per week)
CERTIFICATE_RESOURCE¶
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {domain}-tls
  namespace: {namespace}
spec:
  secretName: {domain}-tls-secret
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
    - {domain}
    - www.{domain}
  renewBefore: 720h # renew 30 days before expiry (Let's Encrypt certs = 90 days)
INGRESS_ANNOTATION (alternative to Certificate resource)¶
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {app}
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-production"
spec:
  tls:
    - hosts:
        - {domain}
      secretName: {domain}-tls-secret
  rules:
    - host: {domain}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {app}
                port:
                  number: 80
TLS:CHALLENGE_TYPES¶
DNS_01 (GE default)¶
HOW_IT_WORKS:
1. cert-manager requests certificate from Let's Encrypt
2. Let's Encrypt returns challenge token
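3. cert-manager derives a key authorization digest from the token and the ACME account key, then creates TXT record via TransIP webhook:
   _acme-challenge.{domain} TXT "{key_authorization_digest}"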
4. Let's Encrypt verifies TXT record
5. Certificate issued, stored as k8s Secret
6. Challenge TXT record cleaned up
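SKETCH (illustrative, not GE tooling): the TXT value published in step 3 is not the raw token. Per RFC 8555 it is the unpadded base64url SHA-256 digest of the key authorization (token, a dot, then the ACME account key thumbprint). The function names and the sample token/thumbprint are hypothetical; cert-manager performs this computation internally.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME (RFC 8555) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_key_thumbprint: str) -> str:
    """Return the TXT record value for a DNS-01 challenge."""
    # Key authorization = token + "." + thumbprint of the ACME account key.
    key_authorization = f"{token}.{account_key_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


def dns01_record_name(domain: str) -> str:
    """Return the challenge record name; wildcard and apex share one name."""
    # "*.example.com" is validated at "_acme-challenge.example.com".
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(dns01_record_name("*.example.com"))
    print(dns01_txt_value("example-token", "example-thumbprint"))
```

Because a wildcard and its apex share the same record name, a certificate covering both needs two TXT values at that name at the same time.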
ADVANTAGES:
- Works for wildcard certificates (*.example.com)
- Does not require inbound HTTP access (firewall-friendly)
- Works when domain does not yet point to the server
DISADVANTAGES:
- Requires DNS provider API access (TransIP)
- Propagation delay (typically 30-120 seconds)
- More complex setup (webhook deployment)
USE_WHEN: always for GE — DNS-01 is the standard
HTTP_01¶
HOW_IT_WORKS:
1. cert-manager creates temporary HTTP endpoint at /.well-known/acme-challenge/{token}
2. Let's Encrypt makes HTTP request to verify
3. Certificate issued
ADVANTAGES:
- Simpler setup (no DNS provider integration)
- No propagation delay
DISADVANTAGES:
- Cannot issue wildcard certificates
- Requires inbound HTTP (port 80) from internet
- Domain must already resolve to the server
USE_WHEN: fallback only — if DNS-01 is broken and non-wildcard cert needed urgently
TLS:WILDCARD_CERTIFICATES¶
WHEN_TO_USE¶
USE: when client has many subdomains that need TLS (e.g., {tenant}.app.client.com)
METHOD: DNS-01 challenge ONLY (HTTP-01 cannot issue wildcards)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-{domain}-tls
spec:
  secretName: wildcard-{domain}-tls-secret
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
    - "*.{domain}"
    - "{domain}" # wildcard does NOT cover apex domain; include explicitly
RULE: wildcard certs in production require Victoria's explicit approval
REASON: wildcard cert compromise = all subdomains compromised
MITIGATION: rotate wildcard certs more frequently, monitor Certificate Transparency logs
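SKETCH (illustrative): why the apex must be listed next to the wildcard. A wildcard dNSName stands for exactly one leftmost label, so it matches neither the apex nor deeper subdomains. The helper names below are hypothetical, and real TLS clients apply further rules (for example around internationalized names) that this sketch ignores.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Match a certificate dNSName against a hostname (leftmost-label wildcard only)."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if not pattern.startswith("*."):
        return pattern == hostname
    p_labels = pattern.split(".")
    h_labels = hostname.split(".")
    # The wildcard replaces exactly one label, so the label counts must be equal.
    if len(p_labels) != len(h_labels):
        return False
    return bool(h_labels[0]) and p_labels[1:] == h_labels[1:]


def covered(dns_names: list[str], hostname: str) -> bool:
    """Return True if any dNSName in the certificate covers the hostname."""
    return any(wildcard_matches(name, hostname) for name in dns_names)


if __name__ == "__main__":
    names = ["*.example.com", "example.com"]
    for host in ("app.example.com", "example.com", "a.b.example.com"):
        print(host, covered(names, host))
```

With only "*.example.com" in dnsNames, a request for the apex would fail hostname verification, which is why the Certificate lists both entries.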
TLS:CUSTOM_CLIENT_CERTIFICATES¶
CSR_FLOW (when client brings their own CA)¶
CONTEXT: some enterprise clients require certificates from their corporate CA (DigiCert, Sectigo, etc.)
PHASE_1: GENERATE CSR
STEF: generates CSR from k8s
TOOL: openssl
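NOTE: the SAN entry is included because modern TLS clients validate subjectAltName, not CN; -addext needs OpenSSL 1.1.1 or newer
RUN: openssl req -new -newkey rsa:4096 -nodes \
  -keyout /tmp/{domain}.key \
  -out /tmp/{domain}.csr \
  -subj "/CN={domain}/O={client_org}/C=NL" \
  -addext "subjectAltName=DNS:{domain}"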
THEN: store private key in Vault immediately
VAULT_PATH: secret/clients/{client}/certs/{domain}/private-key
PHASE_2: RELAY TO CLIENT
STEF: sends CSR to Margot
MARGOT: relays CSR to client contact
CLIENT: submits CSR to their CA, receives signed certificate
PHASE_3: RECEIVE AND VALIDATE
MARGOT: receives signed cert from client, forwards to Stef
STEF: validates certificate:
TOOL: openssl
RUN: openssl x509 -in {domain}.crt -text -noout
CHECK:
→ subject and subjectAltName match the domain in the CSR
→ certificate chain is complete (leaf + intermediates, chaining to the client CA's root)
→ expiry date is reasonable (>6 months)
→ extended key usage includes TLS Web Server Authentication (serverAuth)
→ signature algorithm is SHA-256 or better (not SHA-1)
PHASE_4: INSTALL
STEF: creates k8s TLS Secret ({domain}.crt must contain the leaf certificate followed by its intermediates)
TOOL: kubectl
RUN: kubectl create secret tls {domain}-tls-secret \
--cert={domain}.crt \
--key={domain}.key \
-n {namespace}
THEN: update Ingress to reference new secret
VERIFY: curl -vI https://{domain} 2>&1 | grep "subject:"
PHASE_5: MONITOR
STEF: adds to custom cert monitoring
ALERT_SCHEDULE:
60 days before expiry: notify Stef
60 days before expiry: Stef notifies Margot
Margot notifies client: "Certificate expires in 60 days — please renew"
30 days before expiry: escalation reminder
14 days before expiry: critical alert
RULE: custom certs are NOT auto-renewed; manual tracking required
RULE: 60-day advance notice is MANDATORY (corporate CA renewal takes weeks)
RULE: private key NEVER leaves GE's Vault (CSR contains only public key)
TLS:CERTIFICATE_ROTATION¶
LET_S_ENCRYPT (automated)¶
LIFECYCLE:
- Certificate issued: valid for 90 days
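- cert-manager schedules renewal from the certificate's expiry and renewBefore (visible as the renewal time in the Certificate status)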
- Renewal trigger: 30 days before expiry (renewBefore: 720h)
- New cert issued: seamless, no downtime
- Old cert secret updated in-place
- Traefik picks up new cert automatically (watches k8s Secrets)
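SKETCH (illustrative): the lifecycle arithmetic above, assuming the renewal time is simply expiry minus renewBefore. The issue date is a made-up example, and the sketch ignores that an issuer may backdate the notBefore timestamp slightly.

```python
from datetime import datetime, timedelta, timezone

LETS_ENCRYPT_VALIDITY = timedelta(days=90)  # certificate lifetime stated above
RENEW_BEFORE = timedelta(hours=720)         # renewBefore: 720h == 30 days


def renewal_time(not_after: datetime, renew_before: timedelta) -> datetime:
    """Return the moment at which renewal should be triggered."""
    return not_after - renew_before


def renewal_due(now: datetime, not_after: datetime, renew_before: timedelta) -> bool:
    """Return True once the renewal window has opened."""
    return now >= renewal_time(not_after, renew_before)


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)  # hypothetical issue date
    expires = issued + LETS_ENCRYPT_VALIDITY
    print("expires:", expires.date())
    print("renews: ", renewal_time(expires, RENEW_BEFORE).date())
```

A certificate that is still inside its renewal window weeks after the computed renewal time is a sign that issuance is failing and the troubleshooting steps apply.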
NO_ACTION_REQUIRED: cert-manager handles everything
VERIFY_RENEWAL_WORKING:
TOOL: kubectl
RUN: kubectl get certificates -A
RUN: kubectl describe certificate {name} -n {namespace}
LOOK_FOR: "Certificate is up to date and has not expired"
LOOK_FOR: "Renewal time" in the future
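SKETCH (illustrative): the same verification done on machine-readable output, assuming the JSON comes from kubectl get certificates -A -o json. The Ready condition and status.notAfter are fields of the cert-manager Certificate status; the function names, the 21-day threshold, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as 2026-04-01T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def unhealthy_certificates(cert_list: dict, now: datetime, warn_within: timedelta) -> list[str]:
    """Return one message per Certificate that is not Ready or expires soon."""
    problems = []
    for item in cert_list.get("items", []):
        meta = item["metadata"]
        ref = f'{meta["namespace"]}/{meta["name"]}'
        status = item.get("status", {})
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if not ready:
            problems.append(f"{ref}: not Ready")
            continue
        not_after = status.get("notAfter")
        if not_after and parse_k8s_time(not_after) - now <= warn_within:
            problems.append(f"{ref}: expires {not_after}")
    return problems


if __name__ == "__main__":
    # Hypothetical sample in the shape of `kubectl get certificates -A -o json`.
    sample = {"items": [{
        "metadata": {"namespace": "web", "name": "example-tls"},
        "status": {"conditions": [{"type": "Ready", "status": "False"}]},
    }]}
    now = datetime(2026, 3, 24, tzinfo=timezone.utc)
    print(unhealthy_certificates(sample, now, timedelta(days=21)))
```

A threshold shorter than renewBefore is a reasonable choice for Let's Encrypt certificates: a healthy certificate should never get that close to expiry, so any hit means renewal is overdue.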
CUSTOM_CERTS (manual)¶
TRACKING: all custom certs tracked in wiki/docs/clients/{client}/certificates.md
FIELDS: domain, issuer, expiry, auto_renew (no), last_notification_sent
SCHEDULE: Stef checks custom cert expiry daily at 6am (recurring task)
ALERT_CHAIN: Stef → Margot → client (60, 30, 14 days before expiry)
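SKETCH (illustrative): the 60/30/14-day alert chain as a lookup. The level names and the function are hypothetical; only the three thresholds come from this page.

```python
from datetime import date
from typing import Optional

# (days before expiry, alert level), checked from most to least urgent.
ALERT_THRESHOLDS = [(14, "critical"), (30, "escalation"), (60, "notify")]


def alert_level(expiry: date, today: date) -> Optional[str]:
    """Return the alert level for a custom cert, or None if no alert is due."""
    days_left = (expiry - today).days
    if days_left < 0:
        return "expired"
    for limit, level in ALERT_THRESHOLDS:
        if days_left <= limit:
            return level
    return None


if __name__ == "__main__":
    expiry = date(2026, 6, 1)  # hypothetical custom cert expiry
    for today in (date(2026, 3, 24), date(2026, 4, 10), date(2026, 5, 10), date(2026, 5, 20)):
        print(today, alert_level(expiry, today))
```

Run daily against the tracked expiry dates, this returns the same level on consecutive days, so the last_notification_sent field is what prevents duplicate notices.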
TLS:TLS_OPTIONS (Traefik)¶
CONFIGURATION¶
apiVersion: traefik.io/v1alpha1
kind: TLSOption
metadata:
  name: default
  namespace: ge-ingress
spec:
  minVersion: VersionTLS12
  cipherSuites:
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
    - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
  curvePreferences:
    - X25519
    - CurveP256
RATIONALE:
- TLS 1.2 minimum (TLS 1.0/1.1 deprecated, known vulnerabilities)
- ECDHE key exchange (forward secrecy)
- AES-GCM and ChaCha20-Poly1305 AEAD ciphers (modern, fast, secure)
- X25519 curve preferred (fast and widely supported), P-256 as fallback
- No CBC ciphers (vulnerable to padding oracle attacks)
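SKETCH (illustrative): a name-based lint that enforces the rationale above on a cipherSuites list, flagging anything that is not ECDHE or that uses CBC, RC4, 3DES, or NULL. It only inspects suite names as written for TLS 1.2; it is a hypothetical review helper and does not replace a real scan of the endpoint.

```python
WEAK_MARKERS = ("_CBC_", "_RC4_", "_3DES_", "_NULL_")


def weak_cipher_suites(suites: list[str]) -> list[str]:
    """Return the suites that violate the ECDHE + AEAD-only policy."""
    flagged = []
    for suite in suites:
        name = suite.upper()
        lacks_forward_secrecy = not name.startswith("TLS_ECDHE_")
        uses_weak_cipher = any(marker in name for marker in WEAK_MARKERS)
        if lacks_forward_secrecy or uses_weak_cipher:
            flagged.append(suite)
    return flagged


if __name__ == "__main__":
    configured = [
        "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
        "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
        "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
    ]
    print(weak_cipher_suites(configured))
```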
TLS:TROUBLESHOOTING¶
CERTIFICATE_NOT_ISSUING¶
STEP_1: Check Certificate status
TOOL: kubectl
RUN: kubectl get certificates -A
LOOK_FOR: "False" in READY column
STEP_2: Check Certificate details
RUN: kubectl describe certificate {name} -n {namespace}
LOOK_FOR: Events section — specific error messages
STEP_3: Check CertificateRequest
RUN: kubectl get certificaterequests -A
RUN: kubectl describe certificaterequest {name} -n {namespace}
STEP_4: Check Challenge (if DNS-01)
RUN: kubectl get challenges -A
RUN: kubectl describe challenge {name} -n {namespace}
COMMON_ERRORS:
"waiting for DNS record" → TransIP API issue or propagation delay
"NXDOMAIN" → wrong domain name or nameserver not configured
"Unauthorized" → TransIP API key expired or invalid
STEP_5: Check cert-manager logs
RUN: kubectl logs -l app=cert-manager -n cert-manager --tail=100
STEP_6: Check TransIP webhook logs
RUN: kubectl logs -l app=transip-webhook -n cert-manager --tail=100
CERTIFICATE_EXPIRED¶
IF: Let's Encrypt cert expired
→ cert-manager should have renewed automatically
→ CHECK: is cert-manager running? (kubectl get pods -n cert-manager)
→ CHECK: is TransIP webhook running?
→ CHECK: is TransIP API key valid?
→ QUICK_FIX: once the cause is fixed, force immediate re-issuance: cmctl renew {name} -n {namespace} if cmctl is available, otherwise delete the Certificate resource and recreate it
IF: custom cert expired
→ ALERT: client was notified at 60/30/14 days (check notification log)
→ IMMEDIATE: request emergency renewal from client
→ TEMPORARY: can switch to Let's Encrypt cert while waiting for client's CA
TLS_HANDSHAKE_FAILURES¶
TOOL: openssl
RUN: openssl s_client -connect {domain}:443 -servername {domain}
CHECK:
→ Certificate chain correct? (server sends leaf → intermediate; the root comes from the client's trust store)
→ TLS version negotiated? (should be TLS 1.2 or 1.3)
→ Cipher suite used? (should match TLSOption config)
→ Certificate not expired?
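SKETCH (illustrative): the same checks scripted with the Python standard library, for cases where openssl s_client output is awkward to parse. The function names are hypothetical. The reported cipher uses OpenSSL naming (for example ECDHE-RSA-AES256-GCM-SHA384), not the IANA names used in the TLSOption.

```python
import socket
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a verifying client context that refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()  # verifies chain and hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to host and report negotiated version, cipher, and cert expiry."""
    ctx = strict_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sets SNI, like -servername in openssl s_client.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cipher_name, _protocol, _bits = tls.cipher()
            cert = tls.getpeercert()
            return {
                "version": tls.version(),
                "cipher": cipher_name,
                "not_after": cert.get("notAfter"),
            }
```

Calling probe_tls("{domain}") raises ssl.SSLCertVerificationError when the chain is incomplete, the hostname does not match, or the certificate has expired, which covers the first and last checks in one step.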
COMMON_ISSUES:
"certificate verify failed" → incomplete chain (missing intermediate)
"no shared cipher" → client too old for modern ciphers (TLS 1.0 only)
"certificate has expired" → renewal failed, investigate cert-manager
TLS:INTERNAL_TLS¶
SERVICES_WITH_INTERNAL_TLS¶
GE uses internal CA certificates for service-to-service encryption:
| Service | Internal TLS | Cert Path |
|---|---|---|
| Grafana | Yes | /ge-ops/system/ssl/certs/grafana/ |
| Loki | Yes | /ge-ops/system/ssl/certs/loki/ |
| Wiki | Yes | /ge-ops/system/ssl/certs/wiki/ |
| Vault | Yes | Vault manages own TLS |
| Redis | No | Internal only, NetworkPolicy protected |
RULE: internal TLS uses self-signed CA (not Let's Encrypt)
REASON: internal services not accessible from internet, Let's Encrypt unnecessary overhead
VERIFY: services must trust internal CA cert (mounted as volume or env var)
TLS:CERTIFICATE_TRANSPARENCY¶
MONITORING¶
PURPOSE: detect unauthorized certificate issuance for GE domains
TOOL: Certificate Transparency (CT) log monitoring
CHECK: crt.sh (public CT log search)
URL: https://crt.sh/?q=%25.growing-europe.com (%25 is the URL-encoded % wildcard)
ALERT: if certificate issued by unexpected CA
ACTION: if unauthorized cert detected → Victoria (security incident)
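SKETCH (illustrative): automating the crt.sh check, assuming the public JSON output (output=json) and its issuer_name / name_value fields. The function names, the expected-issuer list, and the sample entries are hypothetical, and crt.sh can be slow or rate-limited, so treat this as a starting point rather than a monitor.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain: str) -> str:
    """Build the crt.sh query URL for all names under a domain."""
    # urlencode turns the "%" wildcard into "%25".
    query = urllib.parse.urlencode({"q": f"%.{domain}", "output": "json"})
    return f"https://crt.sh/?{query}"


def fetch_ct_entries(domain: str, timeout: float = 30.0) -> list[dict]:
    """Download CT log entries for a domain (performs a network request)."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return json.load(resp)


def unexpected_issuers(entries: list[dict], expected: list[str]) -> list[dict]:
    """Return entries whose issuer contains none of the expected markers."""
    flagged = []
    for entry in entries:
        issuer = entry.get("issuer_name", "")
        if not any(marker in issuer for marker in expected):
            flagged.append(entry)
    return flagged


if __name__ == "__main__":
    # Hypothetical entries in the assumed crt.sh JSON shape.
    sample = [
        {"issuer_name": "C=US, O=Let's Encrypt, CN=R11", "name_value": "app.example.com"},
        {"issuer_name": "C=XX, O=Unknown CA", "name_value": "login.example.com"},
    ]
    print(unexpected_issuers(sample, ["Let's Encrypt"]))
```

For domains with custom client certificates, the expected list would also need the client's CA, otherwise every legitimate custom cert is flagged.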
TLS:AGENT_WORKFLOW¶
FOR_STEF¶
ON_NEW_DOMAIN_TLS:
1. READ this page for certificate standards
2. CREATE Certificate resource (or annotate Ingress)
3. USE staging issuer first and verify challenge works
4. SWITCH to production issuer
5. VERIFY: curl -vI https://{domain}
6. CONFIRM to leon: TLS ready
ON_CUSTOM_CERT_REQUEST:
1. GENERATE CSR
2. STORE private key in Vault immediately
3. SEND CSR to Margot for client relay
4. RECEIVE signed cert, validate chain
5. INSTALL as k8s Secret
6. ADD to custom cert monitoring (60/30/14 day alerts)
ON_DAILY_CERT_CHECK (6am):
1. List all certificates (kubectl get certificates -A)
2. Check custom certs expiry
3. Alert if any cert expires within 60 days
4. Report issues to relevant agent
TLS:ANTI_PATTERNS¶
BEFORE_EVERY_TLS_ACTION:
1. Am I using self-signed certs for public-facing services? (NEVER)
2. Am I allowing TLS 1.0 or 1.1? (NEVER: minimum TLS 1.2)
3. Am I using weak ciphers (RC4, 3DES, CBC)? (NEVER)
4. Am I storing private keys outside Vault? (NEVER)
5. Am I skipping staging issuer test? (NEVER: test first, avoid rate limits)
6. Am I issuing wildcard without Victoria's approval? (NEVER in production)
7. Am I manually renewing Let's Encrypt certs? (NEVER: cert-manager handles it)
8. Am I forgetting the 60-day custom cert notification? (NEVER: client CAs are slow)
TLS:CROSS_REFERENCES¶
DNS_MANAGEMENT: domains/networking/dns-management.md (DNS-01 challenge records)
NETWORK_SECURITY: domains/networking/network-security.md (TLS termination, mTLS)
CDN_EDGE: domains/networking/cdn-edge.md (CDN TLS; Bunny manages edge TLS)
KUBERNETES_OPERATIONS: domains/infrastructure/kubernetes-operations.md (cert-manager in k8s)
SECURITY: domains/security/index.md (TLS security policies)