
DOMAIN:NETWORKING:TLS_CERTIFICATES

OWNER: stef
UPDATED: 2026-03-24
SCOPE: all TLS certificate operations for GE and client domains
AGENTS: stef (primary), margot (client cert CSR relay), victoria (wildcard approval), piotr (Vault credentials)
CA: Let's Encrypt (primary), client-provided CAs (custom flow)
TOOL: cert-manager (k8s-native certificate automation)


TLS:OVERVIEW

PURPOSE: automated TLS certificate issuance, renewal, and management
CA: Let's Encrypt (free, automated, trusted by all major browsers)
TOOL: cert-manager in Kubernetes (ClusterIssuer + Certificate CRDs)
CHALLENGE: DNS-01 via TransIP webhook (supports wildcards, no inbound HTTP needed)
TERMINATION: Traefik IngressController (TLS terminates at ingress, plain HTTP to backends)
MIN_TLS_VERSION: TLS 1.2 (enforced via TLSOption CRD)

RULE: all public-facing endpoints MUST use HTTPS
RULE: HTTP (port 80) redirects to HTTPS and never serves content
RULE: TLS 1.0 and 1.1 are disabled (insecure, deprecated)
RULE: cert-manager handles renewal; never manually renew Let's Encrypt certs


TLS:CERT_MANAGER_SETUP

CLUSTERISSUER

GE uses two ClusterIssuers: staging (for testing) and production.

STAGING (use first to avoid rate limits):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: certs@growing-europe.com
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
    - dns01:
        webhook:
          groupName: acme.transip.nl
          solverName: transip
          config:
            accountName: ${TRANSIP_ACCOUNT}
            privateKeySecretRef:
              name: transip-api-key
              key: key

PRODUCTION:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certs@growing-europe.com
    privateKeySecretRef:
      name: letsencrypt-production-key
    solvers:
    - dns01:
        webhook:
          groupName: acme.transip.nl
          solverName: transip
          config:
            accountName: ${TRANSIP_ACCOUNT}
            privateKeySecretRef:
              name: transip-api-key
              key: key

RULE: always test with the staging issuer first; Let's Encrypt production enforces rate limits (50 certificates per registered domain per week)

CERTIFICATE_RESOURCE

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {domain}-tls
  namespace: {namespace}
spec:
  secretName: {domain}-tls-secret
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
  - {domain}
  - www.{domain}
  renewBefore: 720h    # renew 30 days before expiry (Let's Encrypt certs = 90 days)

INGRESS_ANNOTATION (alternative to Certificate resource)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {app}
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-production"
spec:
  tls:
  - hosts:
    - {domain}
    secretName: {domain}-tls-secret
  rules:
  - host: {domain}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: {app}
            port:
              number: 80

TLS:CHALLENGE_TYPES

DNS_01 (GE default)

HOW_IT_WORKS:
1. cert-manager requests certificate from Let's Encrypt
2. Let's Encrypt returns challenge token
3. cert-manager creates TXT record via TransIP webhook: _acme-challenge.{domain} TXT "{value derived from the token and the ACME account key}"
4. Let's Encrypt verifies TXT record
5. Certificate issued, stored as k8s Secret
6. Challenge TXT record cleaned up
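The TXT record in step 3 does not hold the raw token. Per RFC 8555 (section 8.4), it holds the unpadded base64url SHA-256 digest of the key authorization, which is the token joined to the ACME account key thumbprint. A minimal sketch of that derivation follows; the function names and sample inputs are illustrative, and cert-manager does this internally.

```python
import base64
import hashlib


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """TXT value for an ACME DNS-01 challenge (RFC 8555, section 8.4)."""
    # Key authorization = token + "." + base64url thumbprint of the account key.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without "=" padding, as the RFC requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def challenge_record_name(dns_name: str) -> str:
    """Record name to publish; wildcard names validate at the base domain."""
    base = dns_name[2:] if dns_name.startswith("*.") else dns_name
    return "_acme-challenge." + base


if __name__ == "__main__":
    # Hypothetical token and thumbprint, for illustration only.
    print(challenge_record_name("*.example.com"))
    print(dns01_txt_value("example-token", "example-thumbprint"))
```

When a challenge is stuck on "waiting for DNS record", comparing the published TXT value with `dig TXT _acme-challenge.{domain}` output is usually more informative than checking that a record merely exists.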

ADVANTAGES:
- Works for wildcard certificates (*.example.com)
- Does not require inbound HTTP access (firewall-friendly)
- Works when domain does not yet point to the server

DISADVANTAGES:
- Requires DNS provider API access (TransIP)
- Propagation delay (typically 30-120 seconds)
- More complex setup (webhook deployment)

USE_WHEN: always for GE — DNS-01 is the standard

HTTP_01

HOW_IT_WORKS:
1. cert-manager creates temporary HTTP endpoint at /.well-known/acme-challenge/{token}
2. Let's Encrypt makes HTTP request to verify
3. Certificate issued

ADVANTAGES:
- Simpler setup (no DNS provider integration)
- No propagation delay

DISADVANTAGES:
- Cannot issue wildcard certificates
- Requires inbound HTTP (port 80) from internet
- Domain must already resolve to the server

USE_WHEN: fallback only — if DNS-01 is broken and non-wildcard cert needed urgently


TLS:WILDCARD_CERTIFICATES

WHEN_TO_USE

USE: when client has many subdomains that need TLS (e.g., {tenant}.app.client.com)
METHOD: DNS-01 challenge ONLY (HTTP-01 cannot issue wildcards)

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-{domain}-tls
spec:
  secretName: wildcard-{domain}-tls-secret
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
  - "*.{domain}"
  - "{domain}"        # wildcard does NOT cover apex domain — include explicitly
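The comment on the apex domain reflects how wildcard matching works: `*` stands in for exactly one left-most label, so it matches neither the bare domain nor nested subdomains. A small sketch of that rule, with a hypothetical helper name, can be used to sanity-check a dnsNames list before requesting a certificate.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        # Plain names must match exactly.
        return pattern == hostname
    suffix = pattern[1:]  # e.g. ".example.com"
    if not hostname.endswith(suffix):
        return False
    # The wildcard must account for exactly one non-empty label.
    label = hostname[: -len(suffix)]
    return label != "" and "." not in label


if __name__ == "__main__":
    for host in ("api.example.com", "example.com", "a.b.example.com"):
        print(host, wildcard_covers("*.example.com", host))
```

A tenant scheme such as {tenant}.app.client.com therefore needs the wildcard at `*.app.client.com`, not `*.client.com`.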

RULE: wildcard certs in production require Victoria's explicit approval
REASON: wildcard cert compromise = all subdomains compromised
MITIGATION: rotate wildcard certs more frequently, monitor Certificate Transparency logs


TLS:CUSTOM_CLIENT_CERTIFICATES

CSR_FLOW (when client brings their own CA)

CONTEXT: some enterprise clients require certificates from their corporate CA (DigiCert, Sectigo, etc.)

PHASE_1: GENERATE CSR
  STEF: generates CSR from k8s
  TOOL: openssl
  RUN: openssl req -new -newkey rsa:4096 -nodes \
    -keyout /tmp/{domain}.key \
    -out /tmp/{domain}.csr \
    -subj "/CN={domain}/O={client_org}/C=NL" \
    -addext "subjectAltName=DNS:{domain}"
  THEN: store private key in Vault immediately
  VAULT_PATH: secret/clients/{client}/certs/{domain}/private-key

PHASE_2: RELAY TO CLIENT
  STEF: sends CSR to Margot
  MARGOT: relays CSR to client contact
  CLIENT: submits CSR to their CA, receives signed certificate

PHASE_3: RECEIVE AND VALIDATE
  MARGOT: receives signed cert from client, forwards to Stef
  STEF: validates certificate:
  TOOL: openssl
  RUN: openssl x509 -in {domain}.crt -text -noout
  CHECK:
    → domain name matches CSR
    → certificate chain is complete (leaf + intermediate + root)
    → expiry date is reasonable (>6 months)
    → key usage includes server authentication
    → signature algorithm is SHA-256 or better (not SHA-1)
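The expiry check can be scripted from the output of `openssl x509 -in {domain}.crt -noout -enddate`, which prints a line of the form `notAfter=Jan  1 00:00:00 2027 GMT`. The sketch below parses that line with the standard library. The 180-day threshold is an assumed reading of ">6 months", and the function names are illustrative.

```python
import ssl
from datetime import datetime, timezone

MIN_VALIDITY_DAYS = 180  # assumption: ">6 months" read as more than 180 days


def days_until_expiry(enddate_line: str, now: datetime) -> float:
    """Days between now and the notAfter date of an `openssl x509 -enddate` line."""
    # Input shape: "notAfter=Jan  1 00:00:00 2027 GMT"
    not_after_text = enddate_line.strip().split("=", 1)[1]
    expiry_epoch = ssl.cert_time_to_seconds(not_after_text)
    expiry = datetime.fromtimestamp(expiry_epoch, tz=timezone.utc)
    return (expiry - now).total_seconds() / 86400


def expiry_is_reasonable(enddate_line: str, now: datetime) -> bool:
    """True if the certificate stays valid longer than MIN_VALIDITY_DAYS."""
    return days_until_expiry(enddate_line, now) > MIN_VALIDITY_DAYS


if __name__ == "__main__":
    line = "notAfter=Jan  1 00:00:00 2027 GMT"  # sample value, not a real cert
    print(expiry_is_reasonable(line, datetime.now(timezone.utc)))
```

This covers only the expiry item. Chain completeness, key usage, and signature algorithm still need the `-text` output or `openssl verify`.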

PHASE_4: INSTALL
  STEF: creates k8s TLS Secret
  TOOL: kubectl
  RUN: kubectl create secret tls {domain}-tls-secret \
    --cert={domain}.crt \
    --key={domain}.key \
    -n {namespace}
  THEN: update Ingress to reference new secret
  VERIFY: curl -vI https://{domain} 2>&1 | grep "subject:"

PHASE_5: MONITOR
  STEF: adds to custom cert monitoring
  ALERT_SCHEDULE:
    60 days before expiry: notify Stef
    60 days before expiry: Stef notifies Margot
    Margot notifies client: "Certificate expires in 60 days — please renew"
    30 days before expiry: escalation reminder
    14 days before expiry: critical alert
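The schedule above reduces to a threshold lookup on days remaining. A minimal sketch follows; the level names are illustrative labels, not identifiers from GE's monitoring.

```python
from typing import Optional

# (days before expiry, alert level), most urgent first.
ALERT_THRESHOLDS = [
    (14, "critical"),    # critical alert
    (30, "escalation"),  # escalation reminder
    (60, "notify"),      # Stef -> Margot -> client
]


def alert_level(days_left: int) -> Optional[str]:
    """Return the alert level for a custom cert, or None if no alert is due."""
    for threshold, level in ALERT_THRESHOLDS:
        if days_left <= threshold:
            return level
    return None


if __name__ == "__main__":
    for days in (90, 60, 45, 30, 14, 3):
        print(days, alert_level(days))
```

Checking thresholds from most to least urgent ensures a cert at 10 days reports "critical" rather than the earlier, weaker levels.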

RULE: custom certs are NOT auto-renewed; manual tracking required
RULE: 60-day advance notice is MANDATORY (corporate CA renewal takes weeks)
RULE: private key NEVER leaves GE's Vault (CSR contains only public key)


TLS:CERTIFICATE_ROTATION

LET_S_ENCRYPT (automated)

LIFECYCLE:
- Certificate issued: valid for 90 days
- cert-manager checks: daily
- Renewal trigger: 30 days before expiry (renewBefore: 720h)
- New cert issued: seamless, no downtime
- Old cert secret updated in-place
- Traefik picks up new cert automatically (watches k8s Secrets)
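The renewal trigger is plain date arithmetic: renewal time = notAfter minus renewBefore. With a 90-day certificate and renewBefore: 720h (30 days), renewal starts 60 days after issuance. A short sketch with an illustrative function name and sample issue date:

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_after: datetime, renew_before_hours: int = 720) -> datetime:
    """Moment at which renewal is due: expiry minus the renewBefore window."""
    return not_after - timedelta(hours=renew_before_hours)


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)  # sample issue date
    not_after = issued + timedelta(days=90)             # Let's Encrypt lifetime
    due = renewal_time(not_after)
    print("renewal due:", due.isoformat())
    print("days after issuance:", (due - issued).days)
```

If renewal fails at that point, roughly 30 days remain to fix the problem before the certificate actually expires.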

NO_ACTION_REQUIRED: cert-manager handles everything

VERIFY_RENEWAL_WORKING:

TOOL: kubectl
RUN: kubectl get certificates -A
RUN: kubectl describe certificate {name} -n {namespace}
LOOK_FOR: "Certificate is up to date and has not expired"
LOOK_FOR: "Renewal time" in the future
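For a scripted version of this check, `kubectl get certificates -A -o json` returns the same data in machine-readable form. The sketch below lists Certificates whose Ready condition is not "True". It assumes the standard cert-manager status layout (`status.conditions[].type` and `.status`), and the helper name is illustrative.

```python
import json
from typing import List


def not_ready_certificates(kubectl_json: str) -> List[str]:
    """Return "namespace/name" for every Certificate that is not Ready."""
    items = json.loads(kubectl_json).get("items", [])
    failing = []
    for item in items:
        conditions = item.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            meta = item["metadata"]
            failing.append(f'{meta["namespace"]}/{meta["name"]}')
    return failing


if __name__ == "__main__":
    import subprocess

    # Requires kubectl access to the cluster.
    out = subprocess.run(
        ["kubectl", "get", "certificates", "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in not_ready_certificates(out):
        print("NOT READY:", name)
```

A Certificate with no status at all (just created) is reported as not ready, which is the conservative choice for a daily check.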

CUSTOM_CERTS (manual)

TRACKING: all custom certs tracked in wiki/docs/clients/{client}/certificates.md
FIELDS: domain, issuer, expiry, auto_renew (no), last_notification_sent
SCHEDULE: Stef checks custom cert expiry daily at 6am (recurring task)
ALERT_CHAIN: Stef → Margot → client (60, 30, 14 days before expiry)

TLS:TLS_OPTIONS (Traefik)

CONFIGURATION

apiVersion: traefik.io/v1alpha1
kind: TLSOption
metadata:
  name: default
  namespace: ge-ingress
spec:
  minVersion: VersionTLS12
  cipherSuites:
  - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
  - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
  - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
  - TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
  curvePreferences:
  - X25519
  - CurveP256

RATIONALE:
- TLS 1.2 minimum (TLS 1.0/1.1 deprecated, known vulnerabilities)
- ECDHE key exchange (forward secrecy)
- AES-GCM and ChaCha20 ciphers (modern, fast, secure)
- X25519 curve preferred (fastest, most secure)
- No CBC ciphers (vulnerable to padding oracle attacks)
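The rationale can be turned into a quick lint for any proposed cipherSuites list. The sketch below flags suites without ECDHE key exchange and suites containing CBC, RC4, or 3DES. It relies only on the naming pattern of TLS 1.2 suite identifiers, and the function name is illustrative.

```python
from typing import Dict, Iterable, List

WEAK_MARKERS = ("_CBC_", "_RC4_", "_3DES_")


def audit_cipher_suites(suites: Iterable[str]) -> Dict[str, List[str]]:
    """Map each problematic TLS 1.2 suite name to the reasons it was flagged."""
    findings: Dict[str, List[str]] = {}
    for suite in suites:
        reasons = []
        if not suite.startswith("TLS_ECDHE_"):
            reasons.append("no ECDHE key exchange (no forward secrecy)")
        if any(marker in suite for marker in WEAK_MARKERS):
            reasons.append("weak cipher (CBC, RC4, or 3DES)")
        if reasons:
            findings[suite] = reasons
    return findings


if __name__ == "__main__":
    proposed = [
        "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
        "TLS_RSA_WITH_AES_128_CBC_SHA",
    ]
    for suite, reasons in audit_cipher_suites(proposed).items():
        print(suite, "->", "; ".join(reasons))
```

The check applies to TLS 1.2 suite names only. TLS 1.3 suites use a different naming scheme and would be flagged incorrectly by the ECDHE prefix test.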


TLS:TROUBLESHOOTING

CERTIFICATE_NOT_ISSUING

STEP_1: Check Certificate status
TOOL: kubectl
RUN: kubectl get certificates -A
LOOK_FOR: "False" in READY column

STEP_2: Check Certificate details
RUN: kubectl describe certificate {name} -n {namespace}
LOOK_FOR: Events section — specific error messages

STEP_3: Check CertificateRequest
RUN: kubectl get certificaterequests -A
RUN: kubectl describe certificaterequest {name} -n {namespace}

STEP_4: Check Challenge (if DNS-01)
RUN: kubectl get challenges -A
RUN: kubectl describe challenge {name} -n {namespace}
COMMON_ERRORS:
  "waiting for DNS record" → TransIP API issue or propagation delay
  "NXDOMAIN" → wrong domain name or nameserver not configured
  "Unauthorized" → TransIP API key expired or invalid

STEP_5: Check cert-manager logs
RUN: kubectl logs -l app=cert-manager -n cert-manager --tail=100

STEP_6: Check TransIP webhook logs
RUN: kubectl logs -l app=transip-webhook -n cert-manager --tail=100

CERTIFICATE_EXPIRED

IF: Let's Encrypt cert expired
  → cert-manager should have renewed automatically
  → CHECK: is cert-manager running? (kubectl get pods -n cert-manager)
  → CHECK: is TransIP webhook running?
  → CHECK: is TransIP API key valid?
  → QUICK_FIX: delete the Certificate resource and recreate — forces immediate re-issuance

IF: custom cert expired
  → ALERT: client was notified at 60/30/14 days (check notification log)
  → IMMEDIATE: request emergency renewal from client
  → TEMPORARY: can switch to Let's Encrypt cert while waiting for client's CA

TLS_HANDSHAKE_FAILURES

TOOL: openssl
RUN: openssl s_client -connect {domain}:443 -servername {domain}
CHECK:
  → Certificate chain correct? (leaf → intermediate → root)
  → TLS version negotiated? (should be TLS 1.2 or 1.3)
  → Cipher suite used? (should match TLSOption config)
  → Certificate not expired?

COMMON_ISSUES:
  "certificate verify failed" → incomplete chain (missing intermediate)
  "no shared cipher" → client too old for modern ciphers (TLS 1.0 only)
  "certificate has expired" → renewal failed, investigate cert-manager
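The same checks can be run from Python's standard library when openssl is not at hand. This sketch builds a verifying client context that refuses anything below TLS 1.2, then reports the negotiated version, cipher, and certificate expiry. The function names are illustrative and the probe needs network access to the target.

```python
import socket
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Verifying client context that refuses TLS versions below 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def probe(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Handshake with host and return negotiated TLS parameters."""
    ctx = strict_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sets SNI, like -servername in openssl s_client.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "version": tls.version(),
                "cipher": tls.cipher()[0],
                "not_after": cert["notAfter"],
            }


if __name__ == "__main__":
    import sys

    print(probe(sys.argv[1]))
```

An incomplete chain or an expired certificate surfaces as `ssl.SSLCertVerificationError` during the handshake. Cipher names are reported in OpenSSL style, so they will not match the TLSOption identifiers character for character.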

TLS:INTERNAL_TLS

SERVICES_WITH_INTERNAL_TLS

GE uses internal CA certificates for service-to-service encryption:

| Service | Internal TLS | Cert Path |
|---------|--------------|-----------|
| Grafana | Yes | /ge-ops/system/ssl/certs/grafana/ |
| Loki | Yes | /ge-ops/system/ssl/certs/loki/ |
| Wiki | Yes | /ge-ops/system/ssl/certs/wiki/ |
| Vault | Yes | Vault manages own TLS |
| Redis | No | Internal only, NetworkPolicy protected |

RULE: internal TLS uses self-signed CA (not Let's Encrypt)
REASON: internal services not accessible from internet, Let's Encrypt unnecessary overhead
VERIFY: services must trust internal CA cert (mounted as volume or env var)


TLS:CERTIFICATE_TRANSPARENCY

MONITORING

PURPOSE: detect unauthorized certificate issuance for GE domains
TOOL: Certificate Transparency (CT) log monitoring

CHECK: crt.sh (public CT log search)
URL: https://crt.sh/?q=%.growing-europe.com
ALERT: if certificate issued by unexpected CA
ACTION: if unauthorized cert detected → Victoria (security incident)
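The ALERT rule can be automated against crt.sh's JSON output (the same query with `&output=json` appended). The sketch below filters entries whose issuer is not on an allow-list. The field names `issuer_name` and `name_value` are assumed from crt.sh's current JSON format, and the allow-list is an assumption that would need extending for client-provided CAs.

```python
import json
from typing import List, Sequence

# Assumption: only Let's Encrypt is expected for GE-managed domains.
EXPECTED_ISSUER_MARKERS = ("Let's Encrypt",)


def unexpected_issuances(
    crtsh_json: str,
    expected: Sequence[str] = EXPECTED_ISSUER_MARKERS,
) -> List[dict]:
    """Return CT log entries whose issuer matches none of the expected markers."""
    entries = json.loads(crtsh_json)
    return [
        entry for entry in entries
        if not any(marker in entry.get("issuer_name", "") for marker in expected)
    ]


if __name__ == "__main__":
    import sys

    # Usage: save the crt.sh JSON response to a file and pass its path.
    with open(sys.argv[1], encoding="utf-8") as handle:
        for entry in unexpected_issuances(handle.read()):
            print("UNEXPECTED:", entry.get("issuer_name"), entry.get("name_value"))
```

Entries with a missing issuer field are treated as unexpected, so a format change at crt.sh produces noise rather than silence.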

TLS:AGENT_WORKFLOW

FOR_STEF

ON_NEW_DOMAIN_TLS:
1. READ this page for certificate standards
2. CREATE Certificate resource (or annotate Ingress)
3. USE staging issuer first, verify challenge works
4. SWITCH to production issuer
5. VERIFY: curl -vI https://{domain}
6. CONFIRM to leon: TLS ready
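Steps 2 to 4 amount to applying the same Certificate manifest twice with a different issuerRef. A sketch that builds the manifest from the template on this page, with the issuer selected by a `stage` argument, is shown below. The function name and argument are illustrative, and kubectl accepts the JSON output directly.

```python
import json

# ClusterIssuer names as defined on this page.
ISSUERS = {
    "staging": "letsencrypt-staging",
    "production": "letsencrypt-production",
}


def certificate_manifest(domain: str, namespace: str, stage: str = "staging") -> dict:
    """Build a cert-manager Certificate manifest for domain and www.domain."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": f"{domain}-tls", "namespace": namespace},
        "spec": {
            "secretName": f"{domain}-tls-secret",
            "issuerRef": {"name": ISSUERS[stage], "kind": "ClusterIssuer"},
            "dnsNames": [domain, f"www.{domain}"],
            "renewBefore": "720h",  # renew 30 days before expiry
        },
    }


if __name__ == "__main__":
    # Example: python make_cert.py | kubectl apply -f -
    print(json.dumps(certificate_manifest("example.com", "web"), indent=2))
```

Defaulting `stage` to staging makes the safe path the one taken when the argument is forgotten. An unknown stage raises KeyError instead of silently hitting production.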

ON_CUSTOM_CERT_REQUEST:
1. GENERATE CSR
2. STORE private key in Vault immediately
3. SEND CSR to Margot for client relay
4. RECEIVE signed cert, validate chain
5. INSTALL as k8s Secret
6. ADD to custom cert monitoring (60/30/14 day alerts)

ON_DAILY_CERT_CHECK (6am):
1. List all certificates (kubectl get certificates -A)
2. Check custom certs expiry
3. Alert if any cert expires within 60 days
4. Report issues to relevant agent


TLS:ANTI_PATTERNS

BEFORE_EVERY_TLS_ACTION:
1. Am I using self-signed certs for public-facing services? (NEVER)
2. Am I allowing TLS 1.0 or 1.1? (NEVER: minimum TLS 1.2)
3. Am I using weak ciphers (RC4, 3DES, CBC)? (NEVER)
4. Am I storing private keys outside Vault? (NEVER)
5. Am I skipping staging issuer test? (NEVER: test first, avoid rate limits)
6. Am I issuing wildcard without Victoria's approval? (NEVER in production)
7. Am I manually renewing Let's Encrypt certs? (NEVER: cert-manager handles it)
8. Am I forgetting the 60-day custom cert notification? (NEVER: client CAs are slow)


TLS:CROSS_REFERENCES

DNS_MANAGEMENT: domains/networking/dns-management.md (DNS-01 challenge records)
NETWORK_SECURITY: domains/networking/network-security.md (TLS termination, mTLS)
CDN_EDGE: domains/networking/cdn-edge.md (CDN TLS; Bunny manages edge TLS)
KUBERNETES_OPERATIONS: domains/infrastructure/kubernetes-operations.md (cert-manager in k8s)
SECURITY: domains/security/index.md (TLS security policies)