DOMAIN:SECURITY:VAULT_SECRETS

OWNER: piotr
ALSO_USED_BY: hugo, victoria, tjitte, arjan
UPDATED: 2026-03-24
SCOPE: all secrets management — GE internal and client projects
TOOL: HashiCorp Vault (self-hosted on k3s)


CORE_PRINCIPLE

RULE: secrets MUST live in Vault — never in env vars, config files, or source code
RULE: applications NEVER store secrets at rest — fetch from Vault at runtime
RULE: every secret has an owner, a rotation schedule, and an audit trail
RULE: principle of least privilege — each service gets only the secrets it needs


KV_V2_ENGINE

STANDARD: Vault KV Secrets Engine v2 (versioned)

PATH_CONVENTIONS

GE VAULT PATH STRUCTURE:

secret/
  ge/                           # GE internal secrets
    admin-ui/                   # admin-ui application secrets
      db-credentials
      webauthn-config
      nextauth-secret
    orchestrator/               # ge-orchestrator secrets
      redis-password
      db-credentials
    executor/                   # agent executor secrets
      anthropic-api-key
      openai-api-key
      gemini-api-key
    infrastructure/             # infrastructure secrets
      k3s-token
      bunnycdn-api-key
      docker-registry-creds
  clients/                      # per-client secrets
    {client-slug}/              # e.g., secret/clients/acme-corp/
      keycloak/
        client-secret
        admin-password
      database/
        app-credentials
        admin-credentials
      integrations/
        stripe-api-key
        sendgrid-api-key
        {service}-api-key
      certificates/
        tls-cert
        tls-key

RULE: client secrets ALWAYS under secret/clients/{client-slug}/
RULE: GE internal secrets ALWAYS under secret/ge/
RULE: NEVER store secrets at path root — always categorize
RULE: use descriptive names — db-credentials not creds or pass
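The path conventions above can be enforced in tooling before a secret is written. The following is a minimal sketch, not GE code: the function names and the slug pattern (lowercase letters, digits, single hyphens) are assumptions made for illustration.

```typescript
// Sketch: build and validate Vault KV paths that follow the GE conventions.
// Names and the slug pattern are hypothetical, not part of any GE library.

type SecretScope =
  | { kind: 'ge'; component: string }
  | { kind: 'client'; clientSlug: string; category: string }

// Assumed segment format: lowercase letters, digits, single hyphens.
const SLUG = /^[a-z0-9]+(-[a-z0-9]+)*$/

function buildSecretPath(scope: SecretScope, name: string): string {
  const parts =
    scope.kind === 'ge'
      ? ['secret', 'ge', scope.component, name]
      : ['secret', 'clients', scope.clientSlug, scope.category, name]
  // Every segment after the fixed prefix must be a valid slug.
  for (const part of parts.slice(2)) {
    if (!SLUG.test(part)) throw new Error(`invalid path segment: "${part}"`)
  }
  return parts.join('/')
}

function isConventionalPath(path: string): boolean {
  const parts = path.split('/')
  if (parts[0] !== 'secret') return false
  const rest = parts.slice(2)
  const validSegments = rest.every((p) => SLUG.test(p))
  // secret/ge/{component}/{name} needs at least two segments after "ge".
  if (parts[1] === 'ge') return rest.length >= 2 && validSegments
  // secret/clients/{slug}/{category}/{name} needs at least three.
  if (parts[1] === 'clients') return rest.length >= 3 && validSegments
  return false // anything else sits at the path root or in an unknown tree
}
```

A check like isConventionalPath could run in CI or in a wrapper around vault kv put so that uncategorized paths are rejected early.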

KV_V2_OPERATIONS

# Write a secret (creates new version)
vault kv put secret/clients/acme-corp/database/app-credentials \
  username="app_user" \
  password="$(openssl rand -base64 32)"

# Read current version
vault kv get secret/clients/acme-corp/database/app-credentials

# Read specific version
vault kv get -version=2 secret/clients/acme-corp/database/app-credentials

# List secrets at path
vault kv list secret/clients/acme-corp/

# Delete current version (soft delete — recoverable)
vault kv delete secret/clients/acme-corp/database/app-credentials

# Undelete (recover soft-deleted version)
vault kv undelete -versions=3 secret/clients/acme-corp/database/app-credentials

# Permanently destroy a version
vault kv destroy -versions=1 secret/clients/acme-corp/database/app-credentials

# View metadata (versions, creation times, custom metadata)
vault kv metadata get secret/clients/acme-corp/database/app-credentials

VERSIONING_RULES

RULE: KV v2 keeps version history — use this for audit trail
RULE: set max-versions per path to prevent unbounded growth
RULE: soft-delete by default — permanent destroy only for compromised secrets

# Configure max versions for a path
vault kv metadata put -max-versions=10 secret/clients/acme-corp/database/app-credentials

# Configure delete-version-after (auto-cleanup)
vault kv metadata put -delete-version-after=720h secret/ge/executor/anthropic-api-key

CUSTOM_METADATA

# Tag secrets with metadata for management
vault kv metadata put \
  -custom-metadata=owner="piotr" \
  -custom-metadata=rotation_days="90" \
  -custom-metadata=last_rotated="2026-03-24" \
  -custom-metadata=client="acme-corp" \
  secret/clients/acme-corp/database/app-credentials
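The rotation_days and last_rotated tags make it possible to compute when a secret is due for rotation. A minimal sketch, assuming the two fields are stored exactly as in the example above (both as strings, the date as YYYY-MM-DD); the function name is hypothetical.

```typescript
// Sketch: compute days until rotation from Vault custom metadata.
// Field names match the custom-metadata example; the helper itself is hypothetical.

interface RotationMetadata {
  rotation_days: string // custom metadata values are strings
  last_rotated: string  // YYYY-MM-DD
}

const MS_PER_DAY = 24 * 60 * 60 * 1000

// Returns a negative number when the secret is overdue.
function daysUntilRotation(meta: RotationMetadata, now: Date): number {
  const rotationDays = parseInt(meta.rotation_days, 10)
  const lastRotatedMs = Date.parse(`${meta.last_rotated}T00:00:00Z`)
  if (isNaN(rotationDays) || isNaN(lastRotatedMs)) {
    throw new Error('rotation_days or last_rotated is missing or malformed')
  }
  const dueAtMs = lastRotatedMs + rotationDays * MS_PER_DAY
  return Math.floor((dueAtMs - now.getTime()) / MS_PER_DAY)
}
```

The metadata itself would come from vault kv metadata get; a scheduled job could alert when the result drops below a chosen margin.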

DYNAMIC_DATABASE_CREDENTIALS

STANDARD: Vault Database Secrets Engine
BENEFIT: no static DB passwords — Vault creates temporary credentials on demand

SETUP

# Enable database secrets engine
vault secrets enable database

# Configure PostgreSQL connection
vault write database/config/client-acme-db \
  plugin_name=postgresql-database-plugin \
  allowed_roles="acme-readonly,acme-readwrite" \
  connection_url="postgresql://{{username}}:{{password}}@postgres.ge-system.svc:5432/acme_db?sslmode=require" \
  username="vault_admin" \
  password="$(vault kv get -field=password secret/ge/infrastructure/vault-db-admin)"

# Create read-only role (TTL: 1 hour, max: 24 hours)
vault write database/roles/acme-readonly \
  db_name=client-acme-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="REVOKE ALL ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

# Create read-write role
vault write database/roles/acme-readwrite \
  db_name=client-acme-db \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\"; \
    GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="REVOKE ALL ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; \
    REVOKE USAGE ON ALL SEQUENCES IN SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

USAGE

# Application requests temporary credentials
vault read database/creds/acme-readonly
# Returns: { username: "v-approle-acme-read-xxxxx", password: "yyyyy", lease_id: "...", lease_duration: 3600 }

# Application uses these credentials for PostgreSQL connection
# When the TTL expires, Vault revokes the lease and runs the revocation statements, which drop the database role

# Renew lease before expiry (if more time needed)
vault lease renew database/creds/acme-readonly/lease-id-here

# Revoke immediately (on application shutdown)
vault lease revoke database/creds/acme-readonly/lease-id-here

DYNAMIC_CREDENTIALS_RULES

RULE: use dynamic credentials for ALL client project databases
RULE: application must handle credential expiry gracefully (reconnect with new creds)
RULE: default TTL should be short (1 hour) — renew as needed
RULE: revoke credentials on application shutdown (graceful cleanup)
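Handling expiry gracefully comes down to deciding, for each lease, whether to keep using it, renew it, or request fresh credentials. A minimal sketch of that decision: all names are hypothetical, and renewing after two thirds of the lease is an assumed policy, not a Vault requirement.

```typescript
// Sketch: decide what to do with a dynamic database credential lease.
// All names are hypothetical; the 2/3 renewal point is an assumption.

interface LeaseState {
  firstIssuedAtMs: number  // when Vault first created the credential
  expiresAtMs: number      // current lease expiry
  leaseDurationSec: number // duration granted by the last issue or renew
}

type LeaseAction = 'use' | 'renew' | 'reissue'

function nextLeaseAction(
  lease: LeaseState,
  nowMs: number,
  maxTtlSec: number = 24 * 3600, // max_ttl from the role definition
  renewFraction: number = 2 / 3
): LeaseAction {
  const hardExpiryMs = lease.firstIssuedAtMs + maxTtlSec * 1000
  // Expired or past max_ttl: the database role is gone, request new credentials.
  if (nowMs >= lease.expiresAtMs || nowMs >= hardExpiryMs) return 'reissue'

  const leaseStartMs = lease.expiresAtMs - lease.leaseDurationSec * 1000
  const renewAtMs = leaseStartMs + lease.leaseDurationSec * 1000 * renewFraction
  if (nowMs < renewAtMs) return 'use'

  // A renewal cannot extend a lease beyond max_ttl, so at the limit fetch fresh credentials.
  return lease.expiresAtMs < hardExpiryMs ? 'renew' : 'reissue'
}
```

On 'reissue' the application would read database/creds/{role} again and rebuild its connection pool with the new username and password.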

ANTI_PATTERN: using static database passwords for client projects
FIX: configure dynamic credentials via Vault Database engine

ANTI_PATTERN: setting max_ttl too long (days), which defeats the purpose of dynamic creds
FIX: max_ttl = 24 hours, default_ttl = 1 hour


PKI_ENGINE

STANDARD: Vault PKI Secrets Engine
USE_CASE: TLS certificates for internal services, mTLS between services

SETUP_INTERNAL_CA

# Enable PKI engine for root CA
vault secrets enable -path=pki pki
vault secrets tune -max-lease-ttl=87600h pki  # 10 years

# Generate root CA
vault write pki/root/generate/internal \
  common_name="GE Internal Root CA" \
  ttl=87600h \
  key_bits=4096

# Enable PKI for intermediate CA
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int  # 5 years

# Generate intermediate CA CSR and save it for signing
vault write -field=csr pki_int/intermediate/generate/internal \
  common_name="GE Internal Intermediate CA" \
  key_bits=4096 > intermediate.csr

# Sign intermediate with root and save the signed certificate
vault write -field=certificate pki/root/sign-intermediate \
  csr=@intermediate.csr \
  format=pem_bundle \
  ttl=43800h > signed_intermediate.pem

# Set signed intermediate certificate
vault write pki_int/intermediate/set-signed certificate=@signed_intermediate.pem

# Create role for issuing certificates
vault write pki_int/roles/ge-internal \
  allowed_domains="ge-system.svc.cluster.local,ge-agents.svc.cluster.local" \
  allow_subdomains=true \
  max_ttl=720h \
  key_bits=2048 \
  key_type=rsa \
  require_cn=false \
  allowed_uri_sans="spiffe://ge-cluster/*"

ISSUE_CERTIFICATE

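# Issue certificate for a service
# NOTE: the ge-internal role only permits names under ge-system.svc.cluster.local and
# ge-agents.svc.cluster.local. The short alt_names below are rejected unless the role is
# extended to allow them (for example by adding ge-system.svc to allowed_domains).
vault write pki_int/issue/ge-internal \
  common_name="admin-ui.ge-system.svc.cluster.local" \
  alt_names="admin-ui.ge-system.svc,admin-ui" \
  ttl=720h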
# Returns: certificate, private_key, ca_chain, serial_number

# For mTLS: issue client certificate
vault write pki_int/issue/ge-internal \
  common_name="ge-orchestrator.ge-agents.svc.cluster.local" \
  ttl=720h

PKI_RULES

RULE: root CA key NEVER leaves Vault — sign intermediates only
RULE: intermediate CA issues all service certificates
RULE: certificate TTL max 30 days for internal services (auto-renew)
RULE: certificate TTL max 90 days for external-facing (Let's Encrypt preferred)
RULE: revoke certificates immediately when service is decommissioned
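The TTL limits and the auto-renew requirement can be checked mechanically. A minimal sketch with hypothetical helper names: it only understands TTLs written in hours (such as 720h), and renewing after two thirds of the certificate lifetime is an assumed policy.

```typescript
// Sketch: enforce the certificate TTL rules and decide when to renew.
// Helper names are hypothetical; the 2/3 renewal point is an assumption.

type Exposure = 'internal' | 'external'

// 30 days for internal services, 90 days for external-facing, per the rules above.
const MAX_TTL_HOURS: Record<Exposure, number> = {
  internal: 30 * 24,
  external: 90 * 24,
}

// Only the "<n>h" form used in this document is supported.
function parseTtlHours(ttl: string): number {
  const match = /^(\d+)h$/.exec(ttl)
  if (!match) throw new Error(`unsupported TTL format: ${ttl}`)
  return Number(match[1])
}

function isTtlAllowed(ttl: string, exposure: Exposure): boolean {
  return parseTtlHours(ttl) <= MAX_TTL_HOURS[exposure]
}

// True once the given fraction of the certificate lifetime has elapsed.
function shouldRenewCert(notBefore: Date, notAfter: Date, now: Date, fraction: number = 2 / 3): boolean {
  const lifetimeMs = notAfter.getTime() - notBefore.getTime()
  return now.getTime() >= notBefore.getTime() + lifetimeMs * fraction
}
```

In practice cert-manager or a similar controller usually makes the renewal decision; the sketch only illustrates the rule.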


APPROLE_AUTH

STANDARD: Vault AppRole Auth Method
GE_USAGE: all k8s pods authenticate to Vault via AppRole

SETUP

# Enable AppRole auth
vault auth enable approle

# Create policy for admin-ui
vault policy write admin-ui-policy - <<EOF
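# Read GE internal secrets
path "secret/data/ge/admin-ui/*" {
  capabilities = ["read"]
}
# KV v2: listing is authorized on the metadata path, not the data path
path "secret/metadata/ge/admin-ui/*" {
  capabilities = ["list"]
}

# Read client secrets (admin-ui needs access to all clients)
path "secret/data/clients/*" {
  capabilities = ["read"]
}
path "secret/metadata/clients/*" {
  capabilities = ["list"]
}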

# No write access — admin-ui reads only
# Write access only for piotr's management role
EOF

# Create policy for executor
vault policy write executor-policy - <<EOF
path "secret/data/ge/executor/*" {
  capabilities = ["read"]
}
# Executor gets API keys only — no client data access
EOF

# Create policy for client project app
vault policy write client-acme-policy - <<EOF
path "secret/data/clients/acme-corp/*" {
  capabilities = ["read"]
}
path "database/creds/acme-readonly" {
  capabilities = ["read"]
}
path "database/creds/acme-readwrite" {
  capabilities = ["read"]
}
# Client app sees ONLY its own secrets — not other clients
EOF

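# Create AppRole for admin-ui
# secret_id_ttl=2160h (90 days) enforces the secret-id rotation rule; 0 would mean the secret-id never expires
vault write auth/approle/role/admin-ui \
  token_policies="admin-ui-policy" \
  token_ttl=1h \
  token_max_ttl=4h \
  secret_id_ttl=2160h \
  token_num_uses=0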

# Get role ID (static, can be in config)
vault read auth/approle/role/admin-ui/role-id

# Generate secret ID (sensitive, must be in k8s Secret)
vault write -f auth/approle/role/admin-ui/secret-id

K8S_INTEGRATION

# k8s Secret containing AppRole credentials
apiVersion: v1
kind: Secret
metadata:
  name: vault-approle
  namespace: ge-system
type: Opaque
data:
  role-id: <base64-encoded-role-id>
  secret-id: <base64-encoded-secret-id>
---
# Pod spec mounting Vault credentials
spec:
  containers:
    - name: admin-ui
      env:
        - name: VAULT_ADDR
          value: "http://vault.ge-system.svc.cluster.local:8200"
        - name: VAULT_ROLE_ID
          valueFrom:
            secretKeyRef:
              name: vault-approle
              key: role-id
        - name: VAULT_SECRET_ID
          valueFrom:
            secretKeyRef:
              name: vault-approle
              key: secret-id

APPLICATION_VAULT_CLIENT

// lib/vault-client.ts — Vault client for Node.js applications
import Vault from 'node-vault'

let client: ReturnType<typeof Vault> | null = null
let tokenExpiry = 0

async function getClient(): Promise<ReturnType<typeof Vault>> {
  if (client && Date.now() < tokenExpiry - 60000) return client

  const vault = Vault({
    apiVersion: 'v1',
    endpoint: process.env.VAULT_ADDR!,
  })

  // Authenticate with AppRole
  const { auth } = await vault.approleLogin({
    role_id: process.env.VAULT_ROLE_ID!,
    secret_id: process.env.VAULT_SECRET_ID!,
  })

  vault.token = auth.client_token
  tokenExpiry = Date.now() + (auth.lease_duration * 1000)
  client = vault
  return vault
}

// Read a secret
export async function getSecret(path: string): Promise<Record<string, string>> {
  const vault = await getClient()
  const result = await vault.read(path)
  return result.data.data  // KV v2: data is nested
}

// Usage:
// const dbCreds = await getSecret('secret/data/clients/acme-corp/database/app-credentials')
// const { username, password } = dbCreds
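The usage above passes secret/data/... while the CLI examples use secret/...: with KV v2 the HTTP API inserts data/ (or metadata/) after the mount name, and the CLI does this automatically. A small sketch of that mapping; the helper name is hypothetical and it assumes the mount is called secret as in this document.

```typescript
// Sketch: map a CLI-style KV v2 path to the API path used by getSecret().
// The helper is hypothetical; the mount name "secret" matches this document.

function toKvV2Path(
  logicalPath: string,
  kind: 'data' | 'metadata' = 'data',
  mount: string = 'secret'
): string {
  const prefix = mount + '/'
  const hasPrefix = logicalPath.slice(0, prefix.length) === prefix
  if (!hasPrefix || logicalPath.length === prefix.length) {
    throw new Error(`path must start with "${prefix}" and name a secret: ${logicalPath}`)
  }
  // KV v2 API layout: {mount}/data/{path} for values, {mount}/metadata/{path} for metadata.
  return `${mount}/${kind}/${logicalPath.slice(prefix.length)}`
}
```

For example, toKvV2Path('secret/clients/acme-corp/database/app-credentials') yields the path shown in the usage comment above.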

APPROLE_RULES

RULE: one AppRole per application/service — never share roles
RULE: role-id is semi-public (like username) — secret-id is the credential
RULE: secret-id should be rotated regularly (every 90 days)
RULE: token TTL should match application lifecycle (1-4 hours, auto-renew)
RULE: policies follow principle of least privilege — each role reads ONLY its own paths

ANTI_PATTERN: one AppRole with wildcard access to all secrets
FIX: granular policies per role — client apps see only their client's secrets

ANTI_PATTERN: long-lived Vault tokens (days/weeks)
FIX: short TTL (1 hour) with auto-renewal in application code
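Granular per-client policies are easier to keep consistent when they are generated from a template. A minimal sketch that renders a policy equivalent to client-acme-policy; the function name and the slug validation are assumptions added for illustration.

```typescript
// Sketch: render a least-privilege policy for one client application.
// The generator is hypothetical; the output mirrors client-acme-policy.

const POLICY_NAME_PART = /^[a-z0-9]+(-[a-z0-9]+)*$/

function clientPolicyHcl(clientSlug: string, dbRoles: string[]): string {
  // Reject anything that could widen the policy, such as "*" or "../".
  for (const name of [clientSlug, ...dbRoles]) {
    if (!POLICY_NAME_PART.test(name)) throw new Error(`invalid name in policy: "${name}"`)
  }
  const blocks = [
    `path "secret/data/clients/${clientSlug}/*" {\n  capabilities = ["read"]\n}`,
    ...dbRoles.map((role) => `path "database/creds/${role}" {\n  capabilities = ["read"]\n}`),
  ]
  return blocks.join('\n')
}
```

The rendered text can be piped to vault policy write client-{slug}-policy - in the same way as the heredoc examples in SETUP.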


EXTERNAL_SECRETS_OPERATOR

STANDARD: External Secrets Operator (ESO) for Kubernetes
USE_CASE: sync Vault secrets to k8s Secrets automatically

SETUP

# ClusterSecretStore — connects ESO to Vault
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "http://vault.ge-system.svc.cluster.local:8200"
      path: "secret"
      version: "v2"
      auth:
        appRole:
          path: "approle"
          roleId: "vault-role-id"
          secretRef:
            name: vault-approle-eso
            namespace: ge-system
            key: secret-id
---
# ExternalSecret — syncs specific secrets to k8s Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: admin-ui-secrets
  namespace: ge-system
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend
  target:
    name: admin-ui-secrets     # resulting k8s Secret name
    creationPolicy: Owner
  data:
    - secretKey: NEXTAUTH_SECRET
      remoteRef:
        key: ge/admin-ui/nextauth-secret
        property: value
    - secretKey: DATABASE_URL
      remoteRef:
        key: ge/admin-ui/db-credentials
        property: connection_string

ESO_RULES

RULE: use ExternalSecret for k8s workloads — do not manually create k8s Secrets for app credentials
RULE: refreshInterval: 1h for most secrets, 15m for frequently rotated secrets
RULE: target creationPolicy: Owner — ESO manages lifecycle
RULE: one ExternalSecret per application per namespace

ANTI_PATTERN: manually creating k8s Secrets with kubectl — drift from Vault
FIX: all application secrets via ExternalSecret resources

ANTI_PATTERN: refreshInterval too short (1m) — excessive Vault API calls
FIX: 1h default, 15m minimum for high-rotation secrets
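The 15m floor on refreshInterval can be linted before manifests are applied. A minimal sketch with hypothetical function names; it handles only the s, m and h units of the duration strings used in this document.

```typescript
// Sketch: lint ExternalSecret refreshInterval values against the 15m minimum.
// Function names are hypothetical; only s, m and h units are handled.

const UNIT_SECONDS: Record<string, number> = { s: 1, m: 60, h: 3600 }

function parseIntervalSeconds(interval: string): number {
  // Accept one or more "<number><unit>" groups, e.g. "15m" or "1h30m".
  if (!/^(\d+[smh])+$/.test(interval)) {
    throw new Error(`unsupported interval: ${interval}`)
  }
  const pattern = /(\d+)([smh])/g
  let total = 0
  let match: RegExpExecArray | null
  while ((match = pattern.exec(interval)) !== null) {
    total += Number(match[1]) * UNIT_SECONDS[match[2]]
  }
  return total
}

function isRefreshIntervalAcceptable(interval: string, minSeconds: number = 15 * 60): boolean {
  return parseIntervalSeconds(interval) >= minSeconds
}
```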


ROTATION_POLICIES

ROTATION_SCHEDULE

| Secret type                    | Rotation period | Method                                     | Owner        |
|--------------------------------|-----------------|--------------------------------------------|--------------|
| API keys (third-party)         | 90 days         | manual rotate + update in Vault            | piotr        |
| Database passwords (static)    | 90 days         | Vault rotate-root + app restart            | piotr        |
| Database credentials (dynamic) | 1 hour          | automatic (Vault lease)                    | Vault        |
| TLS certificates (internal)    | 30 days         | automatic (Vault PKI + cert-manager)       | Vault        |
| TLS certificates (external)    | 90 days         | Let's Encrypt auto-renewal                 | cert-manager |
| AppRole secret-id              | 90 days         | manual regenerate + update k8s Secret      | piotr        |
| Keycloak client secrets        | 90 days         | manual rotate in Keycloak + Vault          | piotr + hugo |
| Encryption keys                | 1 year          | Vault transit key rotation (auto)          | piotr        |
| SSH keys                       | 1 year          | regenerate + distribute                    | piotr        |

ROTATION_PROCEDURE

FOR EACH SECRET ROTATION:

1. Generate new secret value
   RUN: openssl rand -base64 32 (for passwords/keys)
   OR: use Vault auto-rotation where available

2. Update secret in Vault
   RUN: vault kv put secret/{path} value="new-secret-value"

3. Verify new version in Vault
   RUN: vault kv get secret/{path} (returns the latest version; check the version number in the metadata)

4. Trigger secret refresh in applications
   IF using ESO: wait for refreshInterval OR trigger manual refresh
   IF using direct Vault: application picks up the new value on its next read from Vault (restart it or clear any in-process cache)
   IF using k8s Secret: kubectl rollout restart deployment/{app}

5. Verify application works with new secret
   CHECK: health endpoint returns 200
   CHECK: functionality test passes

6. Update rotation metadata
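   RUN: vault kv metadata patch -custom-metadata=last_rotated="$(date +%Y-%m-%d)" secret/{path}
   NOTE: patch merges into the existing custom metadata; put would replace owner, rotation_days, and client as well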

7. Audit log entry
   NOTE: Vault audit log captures this automatically — verify it's there
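The procedure above can be scripted so that no step is skipped and a failed check stops the rotation. A minimal sketch: the RotationSteps interface and every method on it are hypothetical placeholders for calls to Vault, ESO or kubectl, and are injected so the ordering can be tested without a live Vault.

```typescript
// Sketch: run the rotation steps in order and stop on the first failed check.
// RotationSteps is a hypothetical interface; real implementations would call Vault.

interface RotationSteps {
  generate(): Promise<string>                                // step 1
  writeToVault(path: string, value: string): Promise<number> // step 2, returns new version
  readCurrentVersion(path: string): Promise<number>          // step 3
  refreshApplication(): Promise<void>                        // step 4
  healthCheck(): Promise<boolean>                            // step 5
  recordRotation(path: string, isoDate: string): Promise<void> // step 6
}

async function rotateSecret(path: string, steps: RotationSteps, today: string): Promise<number> {
  const value = await steps.generate()
  const written = await steps.writeToVault(path, value)

  const current = await steps.readCurrentVersion(path)
  if (current !== written) {
    throw new Error(`expected version ${written} at ${path}, found ${current}`)
  }

  await steps.refreshApplication()
  if (!(await steps.healthCheck())) {
    // The previous version is still in KV v2 history and can be restored.
    throw new Error(`health check failed after rotating ${path}`)
  }

  // Only mark the rotation as done once the application is confirmed healthy.
  await steps.recordRotation(path, today)
  return written
}
```

Step 7 is not scripted here because the Vault audit devices record the writes on their own.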

COMPROMISED_SECRET_PROCEDURE

IF secret is compromised (leaked, stolen, exposed):

1. IMMEDIATELY rotate the compromised secret
   TIME: within 15 minutes of discovery

2. Revoke all active leases/tokens derived from the compromised secret
   RUN: vault lease revoke -prefix database/creds/{role} (dynamic credentials; static KV secrets have no leases)
   RUN: vault token revoke -accessor {accessor} (if token compromised)

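3. Check audit log for unauthorized access
   RUN: vault audit list (confirms which audit devices are enabled)
   RUN: grep 'secret/data/{path}' /vault/audit/vault-audit.log
   CHECK: any unexpected reads of the compromised path?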

4. Assess blast radius
   CHECK: what systems used this secret?
   CHECK: what data could have been accessed?
   CHECK: is there evidence of exploitation?

5. Notify stakeholders
   IF client data potentially accessed → escalate to human (Dirk-Jan)
   IF GDPR breach → 72-hour notification requirement (GDPR Art. 33)

6. Post-incident: determine how secret was compromised
   CHECK: was it committed to git? (run trufflehog)
   CHECK: was it logged? (search application logs)
   CHECK: was it exposed in error message?
   CHECK: was it transmitted insecurely?
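The audit-log check in step 3 can be automated. A minimal sketch that scans a file audit device log (one JSON object per line) for reads of a path from addresses outside an allowlist. The function name and the allowlist are hypothetical; the fields used (type, request.operation, request.path, request.remote_address) are the ones Vault writes to its audit log, but verify them against a real log line before relying on this.

```typescript
// Sketch: find reads of a secret path from unexpected source addresses.
// The function and allowlist are hypothetical; field names follow the Vault audit log format.

interface AuditEntry {
  type?: string
  time?: string
  request?: { operation?: string; path?: string; remote_address?: string }
}

function unexpectedReads(logText: string, pathPrefix: string, expectedSources: string[]): AuditEntry[] {
  const hits: AuditEntry[] = []
  for (const line of logText.split('\n')) {
    if (line.trim() === '') continue
    let entry: AuditEntry | null = null
    try {
      entry = JSON.parse(line) as AuditEntry | null
    } catch (e) {
      entry = null // skip lines that are not valid JSON
    }
    if (entry === null || typeof entry !== 'object') continue

    const req = entry.request
    // Count each request once: ignore the matching "response" entries.
    if (entry.type !== 'request' || !req || req.operation !== 'read') continue
    if (!req.path || req.path.indexOf(pathPrefix) !== 0) continue
    if (expectedSources.indexOf(req.remote_address || '') !== -1) continue
    hits.push(entry)
  }
  return hits
}
```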

EMERGENCY_PROCEDURES

SEAL_VAULT

WHEN TO SEAL:
  - suspected Vault compromise
  - during maintenance windows
  - before infrastructure migration

HOW TO SEAL:
  RUN: vault operator seal
  EFFECT: all secrets become inaccessible; lease renewals and revocations are not processed until Vault is unsealed
  NOTE: applications will fail to authenticate — this is EXPECTED during emergency

WARNING: sealing Vault affects ALL applications using it
DECISION: seal if compromise risk > availability impact

UNSEAL_VAULT

PROCEDURE:
  1. Verify the emergency is resolved
  2. Gather unseal key holders (Shamir's Secret Sharing — need threshold of N)
  3. Each key holder runs: vault operator unseal {their-key-share}
  4. After threshold reached, Vault unseals
  5. Verify: vault status (should show Sealed: false)
  6. Verify applications reconnect successfully

GE SETUP:
  - 5 unseal key shares, threshold of 3
  - Key holders: Dirk-Jan + 2 designated backup humans
  - Keys stored in separate secure locations (NOT in Vault itself)
  - Auto-unseal via cloud KMS is an option for availability

AUTO_UNSEAL

OPTION: Auto-unseal with transit key (another Vault or cloud KMS)

BENEFIT: Vault unseals automatically on restart — no human intervention
RISK: if KMS is compromised, Vault auto-unseals for attacker
RECOMMENDATION: use auto-unseal for availability, manual unseal for highest-security

IF using auto-unseal:
  - protect KMS access with IAM + audit logging
  - monitor unseal events — alert on unexpected unseals
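  - keep the recovery keys in separate secure locations (with auto-unseal they authorize rekey and root token generation but cannot unseal Vault on their own)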

AUDIT_LOGGING

ENABLE_AUDIT

# Enable file audit device
vault audit enable file file_path=/vault/audit/vault-audit.log

# Enable syslog audit device (for centralized logging)
vault audit enable syslog tag="vault" facility="AUTH"

# RULE: at least TWO audit devices enabled
# If every enabled device fails to record a request, Vault blocks ALL operations rather than operate unaudited

AUDIT_LOG_MONITORING

MONITOR FOR:
  - failed authentication attempts (brute force on AppRole)
  - unexpected secret reads (data exfiltration)
  - policy changes (privilege escalation)
  - new auth method enablement (backdoor creation)
  - seal/unseal events (emergency situation)
  - root token generation (should NEVER happen in normal operation)

ALERT ON:
  - >5 failed auth attempts in 1 minute from same source
  - any read of secret/clients/* from unexpected source
  - any policy write operation
  - any auth enable/disable operation
  - any seal event
  - ANY root token creation
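The first alert rule (more than 5 failed auth attempts in 1 minute from the same source) is a sliding-window count. A minimal sketch, assuming failed logins have already been extracted from the audit log into source and timestamp pairs; the type and function names are hypothetical.

```typescript
// Sketch: flag sources with more than `threshold` failed logins inside any window.
// Types and names are hypothetical; input is assumed to be pre-extracted from audit logs.

interface AuthFailure {
  source: string // e.g. remote address
  timeMs: number // epoch milliseconds
}

function sourcesToAlert(
  failures: AuthFailure[],
  windowMs: number = 60 * 1000,
  threshold: number = 5
): string[] {
  // Group failure timestamps by source.
  const bySource: Record<string, number[]> = {}
  for (const failure of failures) {
    if (!bySource[failure.source]) bySource[failure.source] = []
    bySource[failure.source].push(failure.timeMs)
  }

  const flagged: string[] = []
  for (const source of Object.keys(bySource)) {
    const times = bySource[source].slice().sort((a, b) => a - b)
    let start = 0
    for (let end = 0; end < times.length; end++) {
      // Shrink the window until it spans less than windowMs.
      while (times[end] - times[start] >= windowMs) start++
      if (end - start + 1 > threshold) {
        flagged.push(source)
        break
      }
    }
  }
  return flagged
}
```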

AUDIT_RULES

RULE: Vault audit MUST be enabled at all times (once a device is enabled, Vault refuses any request it cannot log to at least one device)
RULE: audit logs shipped to centralized logging — not only on Vault server
RULE: audit log retention: 1 year minimum (compliance requirement)
RULE: audit logs are APPEND-ONLY — no deletion, no modification


COMMON_MISTAKES

ANTI_PATTERN: using Vault root token for application access
FIX: root token used ONLY for initial setup, then revoked. Applications use AppRole.

ANTI_PATTERN: overly broad policies (path "secret/*" capabilities = ["read"])
FIX: granular paths per application — client app reads only its client's secrets

ANTI_PATTERN: not monitoring Vault audit logs
FIX: ship to centralized logging, alert on anomalies

ANTI_PATTERN: storing Vault unseal keys in a shared document
FIX: Shamir shares distributed to separate individuals, stored in separate secure locations

ANTI_PATTERN: no backup plan for Vault unavailability
FIX: test recovery procedure quarterly, maintain encrypted backup of Vault data

ANTI_PATTERN: using Vault for configuration (non-secret values)
FIX: config in ConfigMap/env vars, secrets in Vault — don't mix concerns


SELF_CHECK

BEFORE_DEPLOYING_SECRET_CHANGES:
- [ ] secret stored at correct path following conventions?
- [ ] policy restricts access to only necessary services?
- [ ] rotation schedule documented in metadata?
- [ ] audit logging enabled and shipping to central logging?
- [ ] ESO ExternalSecret configured (if k8s workload)?
- [ ] application handles secret rotation gracefully (reconnect)?
- [ ] no secrets in source code, env files, or config maps?
- [ ] backup/recovery procedure tested?


READ_ALSO: domains/security/authentication-patterns.md, domains/security/security-hardening.md