DOMAIN:SYSTEM_INTEGRITY — CONFIGURATION_VALIDATION¶
OWNER: ron
ALSO_USED_BY: gerco, thijmen, rutger, annegreet
UPDATED: 2026-03-26
SCOPE: schema validation, immutable config enforcement, config versioning, secret rotation, certificate monitoring
CONFIGURATION_VALIDATION:CORE_PRINCIPLE¶
PURPOSE: ensure all configuration files conform to their schema, are versioned, and have not been tampered with
RULE: every config file MUST have a corresponding schema definition
RULE: schema validation runs before deployment — never deploy unvalidated config
RULE: validation failures are blocking — no exceptions, no overrides
RULE: config changes require git commit — uncommitted config is unauthorized config
CHECK: is the config file format parseable?
IF: YAML syntax error THEN: reject immediately — do not attempt partial parsing
IF: JSON syntax error THEN: reject immediately — malformed config is worse than no config
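A minimal sketch of the reject-on-syntax-error rule, shown for JSON with the stdlib (a YAML equivalent would use a third-party parser such as PyYAML's yaml.safe_load); the function name is illustrative, not the actual validator:

```python
import json

def parse_or_reject(path: str) -> dict:
    """Load a JSON config file, rejecting it on any syntax error.

    No partial parsing is attempted: a malformed file fails outright.
    """
    with open(path) as fh:
        try:
            return json.load(fh)
        except json.JSONDecodeError as exc:
            # reject immediately; malformed config is worse than no config
            raise SystemExit(f"REJECT {path}: {exc}") from exc
```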
CONFIGURATION_VALIDATION:SCHEMA_VALIDATION¶
YAML_SCHEMA_VALIDATION¶
PURPOSE: validate YAML config files against JSON Schema definitions
TOOL: conftest, yq, or custom validator
RUN: conftest test config/ports.yaml --policy policy/config-schemas/
GE_CONFIG_FILES requiring schema validation:
config/ports.yaml — port assignments (redis, admin-ui, wiki, etc.)
config/dolly-routing.yaml — agent routing rules (work_type → agent mapping)
config/agent-execution.yaml — execution limits (timeouts, retries, concurrency)
config/post-completion-hooks.yaml — hook definitions (trigger conditions, targets)
config/constitution.md — agent constitution (structural validation only)
CHECK: ports.yaml has no duplicate port assignments
IF: two services claim same port THEN: CRITICAL — port conflict on startup
CHECK: dolly-routing.yaml references only agents that exist in AGENT-REGISTRY.json
IF: route targets non-existent agent THEN: HIGH — work will be undeliverable
CHECK: agent-execution.yaml timeout values are positive integers
IF: timeout <= 0 THEN: agent will never execute — reject config
ANTI_PATTERN: validating only structure, not semantic correctness
FIX: schema checks structure; custom validators check cross-file references
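The semantic checks above can be sketched as small custom validators over already-parsed config; the function names and dict shapes are assumptions for illustration, not the actual validator API:

```python
from collections import Counter

def duplicate_ports(ports: dict[str, int]) -> list[int]:
    """Ports claimed by more than one service (CRITICAL if non-empty)."""
    counts = Counter(ports.values())
    return sorted(port for port, n in counts.items() if n > 1)

def invalid_timeouts(limits: dict[str, object]) -> list[str]:
    """Timeout keys whose values are not positive integers (reject config)."""
    return sorted(key for key, value in limits.items()
                  if not isinstance(value, int) or isinstance(value, bool) or value <= 0)
```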
JSON_SCHEMA_VALIDATION¶
PURPOSE: validate JSON config files (AGENT-REGISTRY.json) against schema
TOOL: ajv or conftest
RUN: conftest test ge-ops/master/AGENT-REGISTRY.json --policy policy/registry-schema/
REQUIRED_FIELDS per agent entry:
name: string, lowercase, alphanumeric + hyphens
displayName: string, non-empty
role: string, one of known roles
status: enum: "active" | "unavailable" | "maintenance"
provider: enum: "anthropic" | "openai" | "google"
providerModel: string, matches provider's known models
team: enum: "alfa" | "bravo" | "zulu" | "shared" | null
CHECK: all required fields present for every agent
CHECK: status is a valid enum value
CHECK: provider and providerModel are a valid combination
CHECK: no duplicate agent names
IF: agent missing required field THEN: reject registry change
IF: unknown status value THEN: reject — typo could silently disable agent
IF: provider/model mismatch THEN: reject — would cause API errors at execution time
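A sketch of these field checks against parsed registry entries. Field names follow the list above; the helper itself is hypothetical, and the provider/providerModel pairing check is omitted because it needs a known-models table:

```python
import re

REQUIRED_FIELDS = {"name", "displayName", "role", "status",
                   "provider", "providerModel", "team"}
VALID_STATUSES = {"active", "unavailable", "maintenance"}

def registry_errors(agents: list[dict]) -> list[str]:
    """Collect schema violations across all agent entries."""
    errors, seen = [], set()
    for agent in agents:
        name = agent.get("name", "<missing>")
        missing = REQUIRED_FIELDS - agent.keys()
        if missing:
            errors.append(f"{name}: missing fields {sorted(missing)}")
        if agent.get("status") not in VALID_STATUSES:
            # a typo here could silently disable an agent, so reject it
            errors.append(f"{name}: unknown status {agent.get('status')!r}")
        if not re.fullmatch(r"[a-z0-9-]+", name):
            errors.append(f"{name}: name must be lowercase alphanumeric plus hyphens")
        if name in seen:
            errors.append(f"{name}: duplicate agent name")
        seen.add(name)
    return errors
```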
KUBERNETES_MANIFEST_VALIDATION¶
PURPOSE: validate k8s manifests for correctness before apply
TOOL: kubectl, conftest
RUN: kubectl apply --dry-run=server -f k8s/base/agents/
RUN: conftest test k8s/base/ --policy policy/k8s-policies/
POLICIES to enforce:
1. all containers MUST have resource limits (cpu + memory)
2. all containers MUST have liveness and readiness probes
3. no containers running as root (securityContext.runAsNonRoot: true)
4. no hostNetwork: true (causes port conflicts on rolling updates)
5. no hostPath mounts except for kubectl/kubeconfig (admin-ui exception)
6. HPA maxReplicas <= 5 (binding rule from CLAUDE.md)
7. all Deployments have PodDisruptionBudget
8. image pull policy correct (Never for local, Always for registry)
CHECK: dry-run succeeds (API server accepts the manifest)
IF: dry-run fails THEN: manifest has API-level errors — fix before applying
CHECK: conftest passes (policy-level validation)
IF: conftest fails THEN: manifest violates policy — fix before applying
ANTI_PATTERN: only using dry-run, skipping policy checks
FIX: dry-run validates syntax; conftest validates policy — both are needed
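Policies 1, 2, and 4 can be sketched as a post-parse check on a Deployment dict. In practice conftest (Rego) enforces these; this Python form is only illustrative of what the policies assert:

```python
def policy_violations(deployment: dict) -> list[str]:
    """Check resource limits, probes, and hostNetwork on a parsed Deployment."""
    violations = []
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                violations.append(f"{name}: missing {resource} limit")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                violations.append(f"{name}: missing {probe}")
    if pod_spec.get("hostNetwork"):
        violations.append("hostNetwork: true is forbidden")
    return violations
```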
CONFIGURATION_VALIDATION:IMMUTABLE_CONFIG_ENFORCEMENT¶
WHAT_IS_IMMUTABLE¶
PURPOSE: certain config values MUST NOT change without explicit human approval
IMMUTABLE_VALUES:
config/ports.yaml: redis port (6381)
CLAUDE.md: all binding rules (cost limits, MAXLEN, HPA caps)
config/constitution.md: constitution version and principles
ge_agent/execution/cost_gate.py: threshold values ($5/$10/$100)
k8s manifests: HPA maxReplicas (5)
RULE: changes to immutable values require a discussion (admin-ui API) and human approval
RULE: drift detection flags any change to immutable values as CRITICAL
RULE: automated processes MUST NOT modify immutable values
CHECK: has an immutable value changed since last baseline?
IF: yes AND no approved discussion exists THEN: CRITICAL — unauthorized change
IF: yes AND approved discussion exists THEN: update baseline, log the change
ENFORCEMENT_TECHNIQUE¶
TECHNIQUE: checksum-based immutability verification
1. compute sha256 of immutable config sections
2. store checksums in DB as immutable_baselines
3. on every drift detection cycle: recompute and compare
4. any mismatch = unauthorized change alert
TOOL: sha256sum
RUN: sha256sum config/ports.yaml config/constitution.md
TOOL: grep for threshold verification
RUN: grep -n 'SESSION_LIMIT\|AGENT_HOUR_LIMIT\|DAILY_LIMIT\|maxReplicas' ge_agent/execution/cost_gate.py k8s/base/agents/executor.yaml
CHECK: extracted values match CLAUDE.md binding rules
IF: any mismatch THEN: CRITICAL — binding rule violated
ANTI_PATTERN: checking immutability only at deploy time
FIX: runtime changes (kubectl edit, redis-cli SET) bypass deploy-time checks — verify continuously
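Steps 1-4 of the checksum technique can be sketched as follows; the baseline store is shown as a plain dict rather than the immutable_baselines DB table, and the function names are illustrative:

```python
import hashlib
from pathlib import Path

def compute_baseline(paths: list[str]) -> dict[str, str]:
    """Steps 1-2: sha256 each immutable file; the result is stored as the baseline."""
    return {p: hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}

def detect_drift(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Steps 3-4: any checksum mismatch is an unauthorized-change alert."""
    return sorted(p for p in baseline if current.get(p) != baseline[p])
```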
CONFIGURATION_VALIDATION:CONFIG_VERSIONING¶
VERSION_TRACKING¶
PURPOSE: track which version of a config is active and when it changed
RULE: every config change is a git commit — git history IS the version history
RULE: config files include a header comment with last-updated date
RULE: breaking config changes increment a version number in the file
TECHNIQUE: git-based version tracking
1. git log --oneline -- config/ports.yaml (history of changes)
2. git show HEAD:config/ports.yaml (current committed version)
3. git diff HEAD -- config/ports.yaml (uncommitted changes)
4. git blame config/ports.yaml (who changed each line)
CHECK: are there uncommitted config changes?
IF: yes THEN: either commit them or revert — uncommitted config is a drift source
CHECK: does the running system use the committed version?
IF: config loaded at startup THEN: check pod creation time vs last config commit
IF: config loaded dynamically THEN: check config reload mechanism
ROLLBACK_CAPABILITY¶
RULE: every config change must be reversible
RULE: rollback = apply the previous git version of the config
RULE: test rollback procedure regularly — untested rollback is no rollback
TECHNIQUE: git-based rollback
1. identify the last known-good commit: git log --oneline -- config/<file>
2. checkout the previous version: git show <commit>:config/<file> > /tmp/rollback.yaml
3. validate the rollback config: conftest test /tmp/rollback.yaml
4. apply: kubectl apply -f /tmp/rollback.yaml (or copy to config/)
5. verify: drift detection confirms match
ANTI_PATTERN: config changes without testing rollback
FIX: every config PR should include rollback instructions
CONFIGURATION_VALIDATION:SECRET_ROTATION_VERIFICATION¶
ROTATION_POLICY¶
PURPOSE: verify secrets are rotated on schedule and old secrets are invalidated
ROTATION_SCHEDULE:
API keys (LLM providers): every 90 days
Redis password: every 90 days
Database credentials: every 90 days
Internal API tokens: every 90 days
Vault keys: every 90 days
CHECK: when was each secret last rotated?
TOOL: kubectl
RUN: kubectl get secret ge-secrets -n ge-agents -o json | jq '.metadata.annotations["last-rotated"]'
IF: last-rotated annotation missing THEN: rotation tracking not set up — add it
IF: last-rotated > 90 days ago THEN: rotation overdue — schedule rotation
IF: last-rotated within policy THEN: compliant
ROTATION_VERIFICATION¶
TECHNIQUE: post-rotation health check
1. rotate secret in Vault / k8s Secret
2. restart dependent pods (rolling restart)
3. verify pods start successfully with new secret
4. verify old secret is invalidated (cannot authenticate)
5. update last-rotated annotation
6. log rotation event to session_learnings
CHECK: after rotation, do all pods come up healthy?
IF: pod crashloops after rotation THEN: new secret may be wrong — rollback immediately
IF: old secret still works THEN: invalidation failed — old secret is a security risk
ANTI_PATTERN: rotating secrets without verifying the new secret works
FIX: always test new credentials before invalidating old ones
ANTI_PATTERN: rotating secrets during peak hours
FIX: rotate during maintenance window — rotation involves pod restarts
CONFIGURATION_VALIDATION:CERTIFICATE_MONITORING¶
TLS_CERTIFICATE_EXPIRY¶
PURPOSE: detect certificates approaching expiry before they cause outages
TOOL: openssl
RUN: echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -noout -dates
CHECK: certificate notAfter is > 30 days from now
IF: < 30 days THEN: HIGH — schedule renewal
IF: < 7 days THEN: CRITICAL — renew immediately
IF: expired THEN: CRITICAL — service is broken for TLS clients
RULE: certificate checks run daily
RULE: alerts fire at 30-day, 14-day, 7-day, and 1-day thresholds
RULE: log certificate expiry status to session_learnings for audit
K8S_TLS_SECRETS¶
TOOL: kubectl
RUN: kubectl get secrets -A --field-selector type=kubernetes.io/tls -o json | jq '.items[] | {ns: .metadata.namespace, name: .metadata.name, created: .metadata.creationTimestamp}'
CHECK: TLS secrets exist for all ingresses that expect them
CHECK: TLS secret data contains both tls.crt and tls.key
CHECK: certificate in tls.crt is not expired
IF: TLS secret missing THEN: ingress will fail TLS termination
IF: certificate expired in secret THEN: browsers will show a security warning
ANTI_PATTERN: only checking certificate expiry when users report TLS errors
FIX: automated daily checks catch expiry before it impacts users
ANTI_PATTERN: renewing certificates manually without automation
FIX: use cert-manager or similar for automated renewal — human renewal is error-prone
CONFIGURATION_VALIDATION:CROSS_FILE_CONSISTENCY¶
REFERENCE_INTEGRITY¶
PURPOSE: verify that config files referencing each other are consistent
CROSS_REFERENCES to validate:
dolly-routing.yaml agent names → AGENT-REGISTRY.json agent names
post-completion-hooks.yaml agents → AGENT-REGISTRY.json agent names
k8s manifests configMap refs → actual ConfigMap names in cluster
k8s manifests secret refs → actual Secret names in cluster
admin-ui DB agent records → AGENT-REGISTRY.json entries
CHECK: every agent name in routing config exists in registry
IF: routing references non-existent agent THEN: work will be lost — HIGH
CHECK: every ConfigMap/Secret referenced in manifests exists
IF: reference broken THEN: pod will fail to start — CRITICAL
TOOL: bash + jq
RUN: jq -r '.[].name' ge-ops/master/AGENT-REGISTRY.json | sort -u > /tmp/registry-agents.txt
RUN: grep -oP 'agent_id:\s*\K[a-z0-9-]+' config/dolly-routing.yaml | sort -u > /tmp/routing-agents.txt
RUN: comm -23 /tmp/routing-agents.txt /tmp/registry-agents.txt
IF: output is non-empty THEN: routing references agents not in registry — fix routing config
ANTI_PATTERN: validating each config file in isolation
FIX: cross-file reference checks catch broken links between configs
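The comm -23 step above is equivalent to a set difference; a sketch with hypothetical names, usable for any of the cross-references listed:

```python
def broken_refs(referenced: set[str], registry: set[str]) -> list[str]:
    """Names referenced by one config but absent from the registry."""
    return sorted(referenced - registry)
```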