DOMAIN:SECURITY:VAULT_SECRETS¶
OWNER: piotr
ALSO_USED_BY: hugo, victoria, tjitte, arjan
UPDATED: 2026-03-24
SCOPE: all secrets management — GE internal and client projects
TOOL: HashiCorp Vault (self-hosted on k3s)
CORE_PRINCIPLE¶
RULE: secrets MUST live in Vault — never in env vars, config files, or source code
RULE: applications NEVER store secrets at rest — fetch from Vault at runtime
RULE: every secret has an owner, a rotation schedule, and an audit trail
RULE: principle of least privilege — each service gets only the secrets it needs
KV_V2_ENGINE¶
STANDARD: Vault KV Secrets Engine v2 (versioned)
PATH_CONVENTIONS¶
GE VAULT PATH STRUCTURE:
secret/
  ge/                        # GE internal secrets
    admin-ui/                # admin-ui application secrets
      db-credentials
      webauthn-config
      nextauth-secret
    orchestrator/            # ge-orchestrator secrets
      redis-password
      db-credentials
    executor/                # agent executor secrets
      anthropic-api-key
      openai-api-key
      gemini-api-key
    infrastructure/          # infrastructure secrets
      k3s-token
      bunnycdn-api-key
      docker-registry-creds
  clients/                   # per-client secrets
    {client-slug}/           # e.g., secret/clients/acme-corp/
      keycloak/
        client-secret
        admin-password
      database/
        app-credentials
        admin-credentials
      integrations/
        stripe-api-key
        sendgrid-api-key
        {service}-api-key
      certificates/
        tls-cert
        tls-key
RULE: client secrets ALWAYS under secret/clients/{client-slug}/
RULE: GE internal secrets ALWAYS under secret/ge/
RULE: NEVER store secrets at path root — always categorize
RULE: use descriptive names — db-credentials not creds or pass
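EXAMPLE: the path rules above can be checked in code before anything is written to Vault. This is a minimal sketch, not an existing GE library: `buildSecretPath`, `toKvV2DataPath`, the slug pattern, and the list of vague names are all illustrative assumptions.

```typescript
// Hypothetical helpers: names and validation rules are illustrative assumptions.
const SEGMENT_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/
const VAGUE_NAMES = ['creds', 'pass']

type SecretScope =
  | { kind: 'ge'; component: string }
  | { kind: 'client'; clientSlug: string; category: string }

// Build the logical path used with `vault kv get` and `vault kv put`.
function buildSecretPath(scope: SecretScope, secretName: string): string {
  const parts =
    scope.kind === 'ge'
      ? ['ge', scope.component, secretName]
      : ['clients', scope.clientSlug, scope.category, secretName]
  for (const part of parts) {
    if (!SEGMENT_PATTERN.test(part)) {
      throw new Error(`invalid path segment: "${part}"`)
    }
  }
  if (VAGUE_NAMES.indexOf(secretName) !== -1) {
    throw new Error(`name "${secretName}" is not descriptive enough`)
  }
  return ['secret', ...parts].join('/')
}

// Convert a logical path to the KV v2 API path used in policies and HTTP reads.
function toKvV2DataPath(logicalPath: string): string {
  const [mount, ...rest] = logicalPath.split('/')
  if (rest.length < 2) {
    throw new Error('secrets must be categorized, not stored at the path root')
  }
  return [mount, 'data', ...rest].join('/')
}

// Usage:
// buildSecretPath({ kind: 'client', clientSlug: 'acme-corp', category: 'database' }, 'app-credentials')
//   returns 'secret/clients/acme-corp/database/app-credentials'
```

The split between the logical path and the `data/` API path matters later: CLI commands take the former, while policies and the Node.js client take the latter.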
KV_V2_OPERATIONS¶
# Write a secret (creates new version)
vault kv put secret/clients/acme-corp/database/app-credentials \
username="app_user" \
password="$(openssl rand -base64 32)"
# Read current version
vault kv get secret/clients/acme-corp/database/app-credentials
# Read specific version
vault kv get -version=2 secret/clients/acme-corp/database/app-credentials
# List secrets at path
vault kv list secret/clients/acme-corp/
# Delete current version (soft delete — recoverable)
vault kv delete secret/clients/acme-corp/database/app-credentials
# Undelete (recover soft-deleted version)
vault kv undelete -versions=3 secret/clients/acme-corp/database/app-credentials
# Permanently destroy a version
vault kv destroy -versions=1 secret/clients/acme-corp/database/app-credentials
# View metadata (versions, creation times, custom metadata)
vault kv metadata get secret/clients/acme-corp/database/app-credentials
VERSIONING_RULES¶
RULE: KV v2 keeps version history — use this for audit trail
RULE: set max-versions per path to prevent unbounded growth
RULE: soft-delete by default — permanent destroy only for compromised secrets
# Configure max versions for a path
vault kv metadata put -max-versions=10 secret/clients/acme-corp/database/app-credentials
# Configure delete-version-after (auto-cleanup)
vault kv metadata put -delete-version-after=720h secret/ge/executor/anthropic-api-key
CUSTOM_METADATA¶
# Tag secrets with metadata for management
vault kv metadata put \
-custom-metadata=owner="piotr" \
-custom-metadata=rotation_days="90" \
-custom-metadata=last_rotated="2026-03-24" \
-custom-metadata=client="acme-corp" \
secret/clients/acme-corp/database/app-credentials
DYNAMIC_DATABASE_CREDENTIALS¶
STANDARD: Vault Database Secrets Engine
BENEFIT: no static DB passwords — Vault creates temporary credentials on demand
SETUP¶
# Enable database secrets engine
vault secrets enable database
# Configure PostgreSQL connection
vault write database/config/client-acme-db \
plugin_name=postgresql-database-plugin \
allowed_roles="acme-readonly,acme-readwrite" \
connection_url="postgresql://{{username}}:{{password}}@postgres.ge-system.svc:5432/acme_db?sslmode=require" \
username="vault_admin" \
password="$(vault kv get -field=password secret/ge/infrastructure/vault-db-admin)"
# Create read-only role (TTL: 1 hour, max: 24 hours)
vault write database/roles/acme-readonly \
db_name=client-acme-db \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
revocation_statements="REVOKE ALL ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
default_ttl="1h" \
max_ttl="24h"
# Create read-write role
vault write database/roles/acme-readwrite \
db_name=client-acme-db \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\"; \
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO \"{{name}}\";" \
revocation_statements="REVOKE ALL ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; \
REVOKE USAGE ON ALL SEQUENCES IN SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
default_ttl="1h" \
max_ttl="24h"
USAGE¶
# Application requests temporary credentials
vault read database/creds/acme-readonly
# Returns: { username: "v-approle-acme-read-xxxxx", password: "yyyyy", lease_id: "...", lease_duration: 3600 }
# Application uses these credentials for PostgreSQL connection
# When the TTL expires, Vault automatically revokes the lease and drops the generated database role
# Renew lease before expiry (if more time needed)
vault lease renew database/creds/acme-readonly/lease-id-here
# Revoke immediately (on application shutdown)
vault lease revoke database/creds/acme-readonly/lease-id-here
DYNAMIC_CREDENTIALS_RULES¶
RULE: use dynamic credentials for ALL client project databases
RULE: application must handle credential expiry gracefully (reconnect with new creds)
RULE: default TTL should be short (1 hour) — renew as needed
RULE: revoke credentials on application shutdown (graceful cleanup)
ANTI_PATTERN: using static database passwords for client projects
FIX: configure dynamic credentials via Vault Database engine
ANTI_PATTERN: setting max_ttl too long (days), which defeats the purpose of dynamic creds
FIX: max_ttl = 24 hours, default_ttl = 1 hour
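EXAMPLE: handling expiry gracefully means the application decides, ahead of time, whether to keep, renew, or replace its credentials. The sketch below is one possible decision function; the two-thirds renewal point, the `DbLeaseState` shape, and `planLeaseAction` are assumptions for illustration, not behavior built into Vault.

```typescript
// Sketch only: the two-thirds renewal point and every name here are assumptions.
interface DbLeaseState {
  issuedAtMs: number       // when the credentials were first issued
  renewedAtMs: number      // when the current lease_duration started counting
  leaseDurationSec: number // lease_duration from the latest issue or renew response
}

type LeaseAction = 'keep' | 'renew' | 'reissue'

const ROLE_MAX_TTL_SEC = 24 * 60 * 60 // mirrors max_ttl="24h" on the role

function planLeaseAction(lease: DbLeaseState, nowMs: number): LeaseAction {
  const durationMs = lease.leaseDurationSec * 1000
  const expiresAtMs = lease.renewedAtMs + durationMs
  const renewAtMs = lease.renewedAtMs + (durationMs * 2) / 3
  const hardLimitMs = lease.issuedAtMs + ROLE_MAX_TTL_SEC * 1000

  if (nowMs < renewAtMs) return 'keep'
  // Expired leases cannot be renewed: fetch new credentials and reconnect.
  if (nowMs >= expiresAtMs) return 'reissue'
  // Renewal cannot extend past max_ttl, so switch to fresh credentials near the cap.
  if (hardLimitMs - nowMs <= durationMs / 3) return 'reissue'
  return 'renew'
}
```

On `'reissue'` the application would read `database/creds/{role}` again and rebuild its connection pool; on `'renew'` it would renew the existing lease.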
PKI_ENGINE¶
STANDARD: Vault PKI Secrets Engine
USE_CASE: TLS certificates for internal services, mTLS between services
SETUP_INTERNAL_CA¶
# Enable PKI engine for root CA
vault secrets enable -path=pki pki
vault secrets tune -max-lease-ttl=87600h pki # 10 years
# Generate root CA
vault write pki/root/generate/internal \
common_name="GE Internal Root CA" \
ttl=87600h \
key_bits=4096
# Enable PKI for intermediate CA
vault secrets enable -path=pki_int pki
vault secrets tune -max-lease-ttl=43800h pki_int # 5 years
# Generate intermediate CA CSR and save it for signing
vault write -field=csr pki_int/intermediate/generate/internal \
common_name="GE Internal Intermediate CA" \
key_bits=4096 > intermediate.csr
# Sign intermediate with root and save the signed certificate
vault write -field=certificate pki/root/sign-intermediate \
csr=@intermediate.csr \
format=pem_bundle \
ttl=43800h > signed_intermediate.pem
# Set signed intermediate certificate
vault write pki_int/intermediate/set-signed certificate=@signed_intermediate.pem
# Create role for issuing certificates
vault write pki_int/roles/ge-internal \
allowed_domains="ge-system.svc.cluster.local,ge-agents.svc.cluster.local" \
allow_subdomains=true \
max_ttl=720h \
key_bits=2048 \
key_type=rsa \
require_cn=false \
allowed_uri_sans="spiffe://ge-cluster/*"
ISSUE_CERTIFICATE¶
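# Issue certificate for a service
# NOTE: every requested name must be permitted by the ge-internal role.
# Short names such as admin-ui.ge-system.svc or admin-ui fall outside
# allowed_domains and are rejected unless the role is widened to allow them.
vault write pki_int/issue/ge-internal \
common_name="admin-ui.ge-system.svc.cluster.local" \
ttl=720h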
# Returns: certificate, private_key, ca_chain, serial_number
# For mTLS: issue client certificate
vault write pki_int/issue/ge-internal \
common_name="ge-orchestrator.ge-agents.svc.cluster.local" \
ttl=720h
PKI_RULES¶
RULE: root CA key NEVER leaves Vault — sign intermediates only
RULE: intermediate CA issues all service certificates
RULE: certificate TTL max 30 days for internal services (auto-renew)
RULE: certificate TTL max 90 days for external-facing (Let's Encrypt preferred)
RULE: revoke certificates immediately when service is decommissioned
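EXAMPLE: the 30-day cap and auto-renewal can be expressed as two small checks. This is a sketch under assumptions: renewing once two-thirds of the lifetime has passed is a common convention rather than a rule stated here, and `assertInternalTtl` and `shouldRenewCert` are hypothetical names.

```typescript
// Sketch only: helper names and the two-thirds renewal point are assumptions.
const INTERNAL_MAX_TTL_HOURS = 720 // 30 days, matches max_ttl=720h on the role

// Parse TTLs written as whole hours, e.g. "720h".
function parseTtlHours(ttl: string): number {
  const match = /^(\d+)h$/.exec(ttl)
  if (!match) throw new Error(`unsupported ttl format: ${ttl}`)
  return Number(match[1])
}

// Reject requests that exceed the internal certificate cap.
function assertInternalTtl(ttl: string): void {
  if (parseTtlHours(ttl) > INTERNAL_MAX_TTL_HOURS) {
    throw new Error(`ttl ${ttl} exceeds the 30-day limit for internal services`)
  }
}

// True once the given fraction of the certificate lifetime has elapsed.
function shouldRenewCert(
  notBefore: Date,
  notAfter: Date,
  now: Date,
  fraction: number = 2 / 3
): boolean {
  const lifetimeMs = notAfter.getTime() - notBefore.getTime()
  if (lifetimeMs <= 0) throw new Error('notAfter must be later than notBefore')
  return now.getTime() >= notBefore.getTime() + lifetimeMs * fraction
}
```

A renewal job would call `shouldRenewCert` with the validity dates parsed from the issued certificate and request a new one from `pki_int/issue/ge-internal` when it returns true.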
APPROLE_AUTH¶
STANDARD: Vault AppRole Auth Method
GE_USAGE: all k8s pods authenticate to Vault via AppRole
SETUP¶
# Enable AppRole auth
vault auth enable approle
# Create policy for admin-ui
vault policy write admin-ui-policy - <<EOF
# Read GE internal secrets
path "secret/data/ge/admin-ui/*" {
capabilities = ["read"]
}
# KV v2 lists keys through the metadata path, not the data path
path "secret/metadata/ge/admin-ui/*" {
capabilities = ["list"]
}
# Read client secrets (admin-ui needs access to all clients)
path "secret/data/clients/*" {
capabilities = ["read"]
}
path "secret/metadata/clients/*" {
capabilities = ["list"]
}
# No write access: admin-ui reads only
# Write access only for piotr's management role
EOF
# Create policy for executor
vault policy write executor-policy - <<EOF
path "secret/data/ge/executor/*" {
capabilities = ["read"]
}
# Executor gets API keys only — no client data access
EOF
# Create policy for client project app
vault policy write client-acme-policy - <<EOF
path "secret/data/clients/acme-corp/*" {
capabilities = ["read"]
}
path "database/creds/acme-readonly" {
capabilities = ["read"]
}
path "database/creds/acme-readwrite" {
capabilities = ["read"]
}
# Client app sees ONLY its own secrets — not other clients
EOF
# Create AppRole for admin-ui
vault write auth/approle/role/admin-ui \
token_policies="admin-ui-policy" \
token_ttl=1h \
token_max_ttl=4h \
secret_id_ttl=0 \
token_num_uses=0
# Get role ID (static, can be in config)
vault read auth/approle/role/admin-ui/role-id
# Generate secret ID (sensitive, must be in k8s Secret)
vault write -f auth/approle/role/admin-ui/secret-id
K8S_INTEGRATION¶
# k8s Secret containing AppRole credentials
apiVersion: v1
kind: Secret
metadata:
  name: vault-approle
  namespace: ge-system
type: Opaque
data:
  role-id: <base64-encoded-role-id>
  secret-id: <base64-encoded-secret-id>
---
# Pod spec mounting Vault credentials
spec:
  containers:
    - name: admin-ui
      env:
        - name: VAULT_ADDR
          value: "http://vault.ge-system.svc.cluster.local:8200"
        - name: VAULT_ROLE_ID
          valueFrom:
            secretKeyRef:
              name: vault-approle
              key: role-id
        - name: VAULT_SECRET_ID
          valueFrom:
            secretKeyRef:
              name: vault-approle
              key: secret-id
APPLICATION_VAULT_CLIENT¶
// lib/vault-client.ts: Vault client for Node.js applications
import Vault from 'node-vault'

let client: ReturnType<typeof Vault> | null = null
let tokenExpiry = 0

async function getClient(): Promise<ReturnType<typeof Vault>> {
  // Reuse the cached client until one minute before the token expires
  if (client && Date.now() < tokenExpiry - 60000) return client

  const vault = Vault({
    apiVersion: 'v1',
    endpoint: process.env.VAULT_ADDR!,
  })

  // Authenticate with AppRole
  const { auth } = await vault.approleLogin({
    role_id: process.env.VAULT_ROLE_ID!,
    secret_id: process.env.VAULT_SECRET_ID!,
  })
  vault.token = auth.client_token
  tokenExpiry = Date.now() + (auth.lease_duration * 1000)
  client = vault
  return vault
}

// Read a secret
export async function getSecret(path: string): Promise<Record<string, string>> {
  const vault = await getClient()
  const result = await vault.read(path)
  return result.data.data // KV v2: data is nested
}

// Usage:
// const dbCreds = await getSecret('secret/data/clients/acme-corp/database/app-credentials')
// const { username, password } = dbCreds
APPROLE_RULES¶
RULE: one AppRole per application/service — never share roles
RULE: role-id is semi-public (like username) — secret-id is the credential
RULE: secret-id should be rotated regularly (every 90 days)
RULE: token TTL should match application lifecycle (1-4 hours, auto-renew)
RULE: policies follow principle of least privilege — each role reads ONLY its own paths
ANTI_PATTERN: one AppRole with wildcard access to all secrets
FIX: granular policies per role — client apps see only their client's secrets
ANTI_PATTERN: long-lived Vault tokens (days/weeks)
FIX: short TTL (1 hour) with auto-renewal in application code
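EXAMPLE: a policy review can test whether a path pattern reaches further than intended. The matcher below is a simplified approximation of Vault policy path matching (a trailing `*` as a prefix match, `+` as one whole path segment); it is a sketch for reasoning about policies, not Vault's own implementation, and `policyPathMatches` is a hypothetical name.

```typescript
// Simplified approximation of Vault policy path matching; names are assumptions.
function policyPathToRegExp(pattern: string): RegExp {
  const isPrefix = pattern.charAt(pattern.length - 1) === '*'
  const body = isPrefix ? pattern.slice(0, -1) : pattern
  const source = body
    .split('/')
    .map((segment) =>
      // "+" stands for exactly one path segment; everything else is literal
      segment === '+' ? '[^/]+' : segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    )
    .join('/')
  return new RegExp('^' + source + (isPrefix ? '.*' : '') + '$')
}

function policyPathMatches(pattern: string, requestPath: string): boolean {
  return policyPathToRegExp(pattern).test(requestPath)
}

// Return the policy patterns that would grant access to a path the role should not see.
function findLeakingPatterns(patterns: string[], forbiddenPath: string): string[] {
  return patterns.filter((pattern) => policyPathMatches(pattern, forbiddenPath))
}

// Usage: the acme-corp policy must not reach another client's secrets.
// findLeakingPatterns(
//   ['secret/data/clients/acme-corp/*', 'database/creds/acme-readonly'],
//   'secret/data/clients/other-corp/database/app-credentials'
// ) returns []
```

Running the same check with the pattern `secret/*` returns that pattern, which is the wildcard anti-pattern described above.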
EXTERNAL_SECRETS_OPERATOR¶
STANDARD: External Secrets Operator (ESO) for Kubernetes
USE_CASE: sync Vault secrets to k8s Secrets automatically
SETUP¶
# ClusterSecretStore — connects ESO to Vault
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "http://vault.ge-system.svc.cluster.local:8200"
      path: "secret"
      version: "v2"
      auth:
        appRole:
          path: "approle"
          roleId: "vault-role-id"
          secretRef:
            name: vault-approle-eso
            namespace: ge-system
            key: secret-id
---
# ExternalSecret: syncs specific secrets to k8s Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: admin-ui-secrets
  namespace: ge-system
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend
  target:
    name: admin-ui-secrets # resulting k8s Secret name
    creationPolicy: Owner
  data:
    - secretKey: NEXTAUTH_SECRET
      remoteRef:
        key: ge/admin-ui/nextauth-secret
        property: value
    - secretKey: DATABASE_URL
      remoteRef:
        key: ge/admin-ui/db-credentials
        property: connection_string
ESO_RULES¶
RULE: use ExternalSecret for k8s workloads — do not manually create k8s Secrets for app credentials
RULE: refreshInterval: 1h for most secrets, 15m for frequently rotated secrets
RULE: target creationPolicy: Owner — ESO manages lifecycle
RULE: one ExternalSecret per application per namespace
ANTI_PATTERN: manually creating k8s Secrets with kubectl — drift from Vault
FIX: all application secrets via ExternalSecret resources
ANTI_PATTERN: refreshInterval too short (1m) — excessive Vault API calls
FIX: 1h default, 15m minimum for high-rotation secrets
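EXAMPLE: the 15-minute floor on `refreshInterval` can be enforced in a manifest lint step. This sketch assumes intervals are written with `h`, `m`, and `s` units only (for example `1h`, `15m`, `1h30m`); `durationToSeconds` and `checkRefreshInterval` are hypothetical helper names.

```typescript
// Sketch only: supports h/m/s units, which is an assumption about how intervals are written.
const MIN_REFRESH_SECONDS = 15 * 60

function durationToSeconds(value: string): number {
  const unitPattern = /(\d+)([hms])/g
  let total = 0
  let consumed = 0
  let match: RegExpExecArray | null = unitPattern.exec(value)
  while (match !== null) {
    const unitSeconds = match[2] === 'h' ? 3600 : match[2] === 'm' ? 60 : 1
    total += Number(match[1]) * unitSeconds
    consumed += match[0].length
    match = unitPattern.exec(value)
  }
  // Every character must belong to a number+unit pair, otherwise the value is malformed.
  if (value.length === 0 || consumed !== value.length) {
    throw new Error(`unsupported duration: "${value}"`)
  }
  return total
}

function checkRefreshInterval(value: string): 'ok' | 'too-short' {
  return durationToSeconds(value) >= MIN_REFRESH_SECONDS ? 'ok' : 'too-short'
}

// Usage:
// checkRefreshInterval('1h')  returns 'ok'
// checkRefreshInterval('1m')  returns 'too-short'
```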
ROTATION_POLICIES¶
ROTATION_SCHEDULE¶
| secret type | rotation period | method | owner |
|---|---|---|---|
| API keys (third-party) | 90 days | manual rotate + update in Vault | piotr |
| Database passwords (static) | 90 days | Vault rotate-root + app restart | piotr |
| Database credentials (dynamic) | 1 hour | automatic (Vault lease) | Vault |
| TLS certificates (internal) | 30 days | automatic (Vault PKI + cert-manager) | Vault |
| TLS certificates (external) | 90 days | Let's Encrypt auto-renewal | cert-manager |
| AppRole secret-id | 90 days | manual regenerate + update k8s Secret | piotr |
| Keycloak client secrets | 90 days | manual rotate in Keycloak + Vault | piotr + hugo |
| Encryption keys | 1 year | Vault transit key rotation (auto) | piotr |
| SSH keys | 1 year | regenerate + distribute | piotr |
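EXAMPLE: the schedule can be checked automatically from the custom metadata keys shown earlier (`rotation_days`, `last_rotated`). A minimal sketch, assuming those keys hold a day count and a `YYYY-MM-DD` date; `daysUntilRotation` is a hypothetical helper, and fetching the metadata from Vault is left out.

```typescript
// Sketch only: assumes custom metadata values follow the formats used in this document.
interface RotationMetadata {
  owner: string
  rotation_days: string // custom metadata values are strings, e.g. "90"
  last_rotated: string  // YYYY-MM-DD
}

const MS_PER_DAY = 24 * 60 * 60 * 1000

// Positive: days remaining. Zero: due today. Negative: overdue by that many days.
function daysUntilRotation(meta: RotationMetadata, today: Date): number {
  const lastRotatedMs = Date.parse(meta.last_rotated + 'T00:00:00Z')
  const periodDays = Number(meta.rotation_days)
  if (isNaN(lastRotatedMs) || !(periodDays > 0)) {
    throw new Error('rotation metadata is missing or malformed')
  }
  const dueMs = lastRotatedMs + periodDays * MS_PER_DAY
  return Math.ceil((dueMs - today.getTime()) / MS_PER_DAY)
}

// Usage:
// daysUntilRotation(
//   { owner: 'piotr', rotation_days: '90', last_rotated: '2026-03-24' },
//   new Date('2026-03-24T00:00:00Z')
// ) returns 90
```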
ROTATION_PROCEDURE¶
FOR EACH SECRET ROTATION:
1. Generate new secret value
RUN: openssl rand -base64 32 (for passwords/keys)
OR: use Vault auto-rotation where available
2. Update secret in Vault
RUN: vault kv put secret/{path} value="new-secret-value"
3. Verify new version in Vault
RUN: vault kv get secret/{path} (returns the latest version by default; -version takes a number)
4. Trigger secret refresh in applications
IF using ESO: wait for refreshInterval OR trigger manual refresh
IF using direct Vault: application picks up the new value on its next read of the secret (restart it if the value is cached in memory)
IF using k8s Secret: kubectl rollout restart deployment/{app}
5. Verify application works with new secret
CHECK: health endpoint returns 200
CHECK: functionality test passes
6. Update rotation metadata
RUN: vault kv metadata patch -custom-metadata=last_rotated="$(date +%Y-%m-%d)" secret/{path} (patch keeps the other custom metadata keys; put would replace them)
7. Audit log entry
NOTE: Vault audit log captures this automatically — verify it's there
COMPROMISED_SECRET_PROCEDURE¶
IF secret is compromised (leaked, stolen, exposed):
1. IMMEDIATELY rotate the compromised secret
TIME: within 15 minutes of discovery
2. Revoke all active leases/tokens using the compromised secret
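RUN: vault lease revoke -prefix database/creds/{role} (dynamic credentials only; static KV secrets have no leases, so rotating them in step 1 is the revocation)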
RUN: vault token revoke -accessor {accessor} (if token compromised)
3. Check audit log for unauthorized access
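RUN: vault audit list (confirms which audit devices are enabled), then search the audit log itself for requests to the compromised path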
CHECK: any unexpected reads of the compromised path?
4. Assess blast radius
CHECK: what systems used this secret?
CHECK: what data could have been accessed?
CHECK: is there evidence of exploitation?
5. Notify stakeholders
IF client data potentially accessed → escalate to human (Dirk-Jan)
IF GDPR breach → 72-hour notification requirement (GDPR Art. 33)
6. Post-incident: determine how secret was compromised
CHECK: was it committed to git? (run trufflehog)
CHECK: was it logged? (search application logs)
CHECK: was it exposed in error message?
CHECK: was it transmitted insecurely?
EMERGENCY_PROCEDURES¶
SEAL_VAULT¶
WHEN TO SEAL:
- suspected Vault compromise
- during maintenance windows
- before infrastructure migration
HOW TO SEAL:
RUN: vault operator seal
EFFECT: all secrets become inaccessible, all leases suspended
NOTE: applications will fail to authenticate — this is EXPECTED during emergency
WARNING: sealing Vault affects ALL applications using it
DECISION: seal if compromise risk > availability impact
UNSEAL_VAULT¶
PROCEDURE:
1. Verify the emergency is resolved
2. Gather unseal key holders (Shamir's Secret Sharing — need threshold of N)
3. Each key holder runs: vault operator unseal {their-key-share}
4. After threshold reached, Vault unseals
5. Verify: vault status (should show Sealed: false)
6. Verify applications reconnect successfully
GE SETUP:
- 5 unseal key shares, threshold of 3
- Key holders: Dirk-Jan + 2 designated backup humans
- Keys stored in separate secure locations (NOT in Vault itself)
- Auto-unseal via cloud KMS is an option for availability
AUTO_UNSEAL¶
OPTION: Auto-unseal with transit key (another Vault or cloud KMS)
BENEFIT: Vault unseals automatically on restart — no human intervention
RISK: if KMS is compromised, Vault auto-unseals for attacker
RECOMMENDATION: use auto-unseal for availability, manual unseal for highest-security
IF using auto-unseal:
- protect KMS access with IAM + audit logging
- monitor unseal events — alert on unexpected unseals
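- keep the recovery keys secure: with auto-unseal they authorize privileged operations (such as root token generation) but cannot unseal Vault if the KMS is unavailable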
AUDIT_LOGGING¶
ENABLE_AUDIT¶
# Enable file audit device
vault audit enable file file_path=/vault/audit/vault-audit.log
# Enable syslog audit device (for centralized logging)
vault audit enable syslog tag="vault" facility="AUTH"
# RULE: at least TWO audit devices enabled
# If every enabled device fails, Vault blocks ALL operations rather than operate unaudited; a second device keeps Vault serving when one fails
AUDIT_LOG_MONITORING¶
MONITOR FOR:
- failed authentication attempts (brute force on AppRole)
- unexpected secret reads (data exfiltration)
- policy changes (privilege escalation)
- new auth method enablement (backdoor creation)
- seal/unseal events (emergency situation)
- root token generation (should NEVER happen in normal operation)
ALERT ON:
- >5 failed auth attempts in 1 minute from same source
- any read of secret/clients/* from unexpected source
- any policy write operation
- any auth enable/disable operation
- any seal event
- ANY root token creation
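EXAMPLE: the first alert rule (more than 5 failed auth attempts in 1 minute from the same source) is a sliding-window count. The sketch below works on a simplified, hypothetical event shape; extracting failed logins and their source address from the real audit log is assumed to happen upstream.

```typescript
// Sketch only: FailedAuthEvent is a simplified, hypothetical shape, not the Vault audit format.
interface FailedAuthEvent {
  remoteAddress: string
  timeMs: number
}

// Return every source with more than maxFailures failures inside any window of windowMs.
function findBruteForceSources(
  events: FailedAuthEvent[],
  windowMs: number = 60000,
  maxFailures: number = 5
): string[] {
  const timesBySource: { [address: string]: number[] } = {}
  for (const ev of events) {
    const list = timesBySource[ev.remoteAddress] || []
    list.push(ev.timeMs)
    timesBySource[ev.remoteAddress] = list
  }

  const flagged: string[] = []
  for (const address of Object.keys(timesBySource)) {
    const times = timesBySource[address].slice().sort((a, b) => a - b)
    let start = 0
    for (let end = 0; end < times.length; end++) {
      // Shrink the window until it spans less than windowMs.
      while (times[end] - times[start] >= windowMs) start++
      if (end - start + 1 > maxFailures) {
        flagged.push(address)
        break
      }
    }
  }
  return flagged
}
```

In practice this logic would usually live in the centralized logging or alerting system rather than in application code; the function only makes the threshold precise.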
AUDIT_RULES¶
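RULE: Vault audit MUST be enabled at all times. Vault runs without audit devices until one is configured, so enable them at setup; once enabled, Vault refuses requests it cannot log to any device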
RULE: audit logs shipped to centralized logging — not only on Vault server
RULE: audit log retention: 1 year minimum (compliance requirement)
RULE: audit logs are APPEND-ONLY — no deletion, no modification
COMMON_MISTAKES¶
ANTI_PATTERN: using Vault root token for application access
FIX: root token used ONLY for initial setup, then revoked. Applications use AppRole.
ANTI_PATTERN: overly broad policies (path "secret/*" capabilities = ["read"])
FIX: granular paths per application — client app reads only its client's secrets
ANTI_PATTERN: not monitoring Vault audit logs
FIX: ship to centralized logging, alert on anomalies
ANTI_PATTERN: storing Vault unseal keys in a shared document
FIX: Shamir shares distributed to separate individuals, stored in separate secure locations
ANTI_PATTERN: no backup plan for Vault unavailability
FIX: test recovery procedure quarterly, maintain encrypted backup of Vault data
ANTI_PATTERN: using Vault for configuration (non-secret values)
FIX: config in ConfigMap/env vars, secrets in Vault — don't mix concerns
SELF_CHECK¶
BEFORE_DEPLOYING_SECRET_CHANGES:
- [ ] secret stored at correct path following conventions?
- [ ] policy restricts access to only necessary services?
- [ ] rotation schedule documented in metadata?
- [ ] audit logging enabled and shipping to central logging?
- [ ] ESO ExternalSecret configured (if k8s workload)?
- [ ] application handles secret rotation gracefully (reconnect)?
- [ ] no secrets in source code, env files, or config maps?
- [ ] backup/recovery procedure tested?
READ_ALSO: domains/security/authentication-patterns.md, domains/security/security-hardening.md