DOMAIN:SYSTEM_INTEGRITY — CONFIGURATION_VALIDATION

OWNER: ron
ALSO_USED_BY: gerco, thijmen, rutger, annegreet
UPDATED: 2026-03-26
SCOPE: schema validation, immutable config enforcement, config versioning, secret rotation, certificate monitoring


CONFIGURATION_VALIDATION:CORE_PRINCIPLE

PURPOSE: ensure all configuration files conform to their schema, are versioned, and have not been tampered with

RULE: every config file MUST have a corresponding schema definition
RULE: schema validation runs before deployment — never deploy unvalidated config
RULE: validation failures are blocking — no exceptions, no overrides
RULE: config changes require git commit — uncommitted config is unauthorized config

CHECK: is the config file format parseable?
IF: YAML syntax error THEN: reject immediately — do not attempt partial parsing
IF: JSON syntax error THEN: reject immediately — malformed config is worse than no config
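SKETCH: the fail-fast parse rule above, illustrated in Python. JSON is shown; the YAML path is analogous with yaml.safe_load and yaml.YAMLError. This is an illustrative sketch, not the shipped validator.

```python
import json


def load_config_or_reject(text: str) -> dict:
    """Parse JSON config strictly; raise ValueError on any syntax error.

    No partial parsing, no recovery: malformed config is worse than no config.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Blocking failure — reject immediately.
        raise ValueError(f"config rejected: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("config rejected: top level must be a mapping")
    return data
```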


CONFIGURATION_VALIDATION:SCHEMA_VALIDATION

YAML_SCHEMA_VALIDATION

PURPOSE: validate YAML config files against JSON Schema definitions

TOOL: conftest, yq, or custom validator
RUN: conftest test config/ports.yaml --policy policy/config-schemas/

GE_CONFIG_FILES requiring schema validation:

config/ports.yaml              — port assignments (redis, admin-ui, wiki, etc.)  
config/dolly-routing.yaml      — agent routing rules (work_type → agent mapping)  
config/agent-execution.yaml    — execution limits (timeouts, retries, concurrency)  
config/post-completion-hooks.yaml — hook definitions (trigger conditions, targets)  
config/constitution.md         — agent constitution (structural validation only)  

CHECK: ports.yaml has no duplicate port assignments
IF: two services claim same port THEN: CRITICAL — port conflict on startup
CHECK: dolly-routing.yaml references only agents that exist in AGENT-REGISTRY.json
IF: route targets non-existent agent THEN: HIGH — work will be undeliverable
CHECK: agent-execution.yaml timeout values are positive integers
IF: timeout <= 0 THEN: agent will never execute — reject config

ANTI_PATTERN: validating only structure, not semantic correctness
FIX: schema checks structure; custom validators check cross-file references
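SKETCH: custom semantic validators for the checks above — duplicate port detection and positive-timeout enforcement. The data shapes (service→port and agent→timeout mappings) are assumptions for illustration, not the real config layout.

```python
def find_duplicate_ports(ports: dict[str, int]) -> dict[int, list[str]]:
    """Map each port claimed by more than one service to its claimants."""
    by_port: dict[int, list[str]] = {}
    for service, port in ports.items():
        by_port.setdefault(port, []).append(service)
    return {p: svcs for p, svcs in by_port.items() if len(svcs) > 1}


def invalid_timeouts(limits: dict[str, int]) -> list[str]:
    """Return agents whose timeout is not a positive integer — reject config."""
    return [agent for agent, timeout in limits.items()
            if not isinstance(timeout, int) or timeout <= 0]
```

A non-empty result from either function is a blocking validation failure.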

JSON_SCHEMA_VALIDATION

PURPOSE: validate JSON config files (AGENT-REGISTRY.json) against schema

TOOL: ajv or conftest
RUN: conftest test ge-ops/master/AGENT-REGISTRY.json --policy policy/registry-schema/

REQUIRED_FIELDS per agent entry:

name:           string, lowercase, alphanumeric + hyphens  
displayName:    string, non-empty  
role:           string, one of known roles  
status:         enum: "active" | "unavailable" | "maintenance"  
provider:       enum: "anthropic" | "openai" | "google"  
providerModel:  string, matches provider's known models  
team:           enum: "alfa" | "bravo" | "zulu" | "shared" | null  

CHECK: all required fields present for every agent
CHECK: status is a valid enum value
CHECK: provider and providerModel are a valid combination
CHECK: no duplicate agent names

IF: agent missing required field THEN: reject registry change
IF: unknown status value THEN: reject — typo could silently disable agent
IF: provider/model mismatch THEN: reject — would cause API errors at execution time
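SKETCH: per-entry registry validation for the required-field and enum checks above. The provider/model pairing check is omitted here; field names follow the REQUIRED_FIELDS table, everything else is illustrative.

```python
REQUIRED_FIELDS = {"name", "displayName", "role", "status",
                   "provider", "providerModel", "team"}
VALID_STATUS = {"active", "unavailable", "maintenance"}
VALID_PROVIDERS = {"anthropic", "openai", "google"}


def validate_agent_entry(entry: dict) -> list[str]:
    """Return a list of violations; an empty list means the entry passes."""
    errors = []
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if entry.get("status") not in VALID_STATUS:
        # A typo here could silently disable an agent — reject.
        errors.append(f"unknown status: {entry.get('status')!r}")
    if entry.get("provider") not in VALID_PROVIDERS:
        errors.append(f"unknown provider: {entry.get('provider')!r}")
    return errors
```

Any non-empty result rejects the registry change as a whole.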

KUBERNETES_MANIFEST_VALIDATION

PURPOSE: validate k8s manifests for correctness before apply

TOOL: kubectl, conftest
RUN: kubectl apply --dry-run=server -f k8s/base/agents/
RUN: conftest test k8s/base/ --policy policy/k8s-policies/

POLICIES to enforce:

1. all containers MUST have resource limits (cpu + memory)  
2. all containers MUST have liveness and readiness probes  
3. no containers running as root (securityContext.runAsNonRoot: true)  
4. no hostNetwork: true (causes port conflicts on rolling updates)  
5. no hostPath mounts except for kubectl/kubeconfig (admin-ui exception)  
6. HPA maxReplicas <= 5 (binding rule from CLAUDE.md)  
7. all Deployments have PodDisruptionBudget  
8. image pull policy correct (Never for local, Always for registry)  

CHECK: dry-run succeeds (API server accepts the manifest)
IF: dry-run fails THEN: manifest has API-level errors — fix before applying
CHECK: conftest passes (policy-level validation)
IF: conftest fails THEN: manifest violates policy — fix before applying

ANTI_PATTERN: only using dry-run, skipping policy checks
FIX: dry-run validates syntax; conftest validates policy — both are needed


CONFIGURATION_VALIDATION:IMMUTABLE_CONFIG_ENFORCEMENT

WHAT_IS_IMMUTABLE

PURPOSE: certain config values MUST NOT change without explicit human approval

IMMUTABLE_VALUES:

config/ports.yaml:            redis port (6381)  
CLAUDE.md:                    all binding rules (cost limits, MAXLEN, HPA caps)  
config/constitution.md:       constitution version and principles  
ge_agent/execution/cost_gate.py: threshold values ($5/$10/$100)  
k8s manifests:                HPA maxReplicas (5)  

RULE: changes to immutable values require a discussion (admin-ui API) and human approval
RULE: drift detection flags any change to immutable values as CRITICAL
RULE: automated processes MUST NOT modify immutable values

CHECK: has an immutable value changed since last baseline?
IF: yes AND no approved discussion exists THEN: CRITICAL — unauthorized change
IF: yes AND approved discussion exists THEN: update baseline, log the change

ENFORCEMENT_TECHNIQUE

TECHNIQUE: checksum-based immutability verification

1. compute sha256 of immutable config sections  
2. store checksums in DB as immutable_baselines  
3. on every drift detection cycle: recompute and compare  
4. any mismatch = unauthorized change alert  

TOOL: sha256sum
RUN: sha256sum config/ports.yaml config/constitution.md

TOOL: grep for threshold verification
RUN: grep -n 'SESSION_LIMIT\|AGENT_HOUR_LIMIT\|DAILY_LIMIT\|maxReplicas' ge_agent/execution/cost_gate.py k8s/base/agents/executor.yaml

CHECK: extracted values match CLAUDE.md binding rules
IF: any mismatch THEN: CRITICAL — binding rule violated

ANTI_PATTERN: checking immutability only at deploy time
FIX: runtime changes (kubectl edit, redis-cli SET) bypass deploy-time checks — verify continuously
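SKETCH: the checksum-based verification technique above. Baseline storage is shown as an in-memory dict; the real system keeps it in the immutable_baselines DB table, and file contents would come from disk on each drift cycle.

```python
import hashlib


def sha256_of(content: bytes) -> str:
    """Checksum of an immutable config section, as stored in the baseline."""
    return hashlib.sha256(content).hexdigest()


def detect_unauthorized_changes(baselines: dict[str, str],
                                current: dict[str, bytes]) -> list[str]:
    """Recompute checksums and return every path that drifted from baseline.

    Any mismatch (or missing file) is an unauthorized-change alert unless an
    approved discussion exists for it.
    """
    drifted = []
    for path, baseline_hash in baselines.items():
        content = current.get(path)
        if content is None or sha256_of(content) != baseline_hash:
            drifted.append(path)
    return drifted
```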


CONFIGURATION_VALIDATION:CONFIG_VERSIONING

VERSION_TRACKING

PURPOSE: track which version of a config is active and when it changed

RULE: every config change is a git commit — git history IS the version history
RULE: config files include a header comment with last-updated date
RULE: breaking config changes increment a version number in the file

TECHNIQUE: git-based version tracking

1. git log --oneline -- config/ports.yaml (history of changes)  
2. git show HEAD:config/ports.yaml (current committed version)  
3. git diff HEAD -- config/ports.yaml (uncommitted changes)  
4. git blame config/ports.yaml (who changed each line)  

CHECK: are there uncommitted config changes?
IF: yes THEN: either commit them or revert — uncommitted config is a drift source

CHECK: does the running system use the committed version?
IF: config loaded at startup THEN: check pod creation time vs last config commit
IF: config loaded dynamically THEN: check config reload mechanism
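SKETCH: the startup-loaded comparison above as a predicate — a pod created before the last config commit is still running the old version. Timestamps would come from kubectl and git log in practice; this only shows the comparison.

```python
from datetime import datetime, timezone


def config_stale(pod_created: datetime, last_config_commit: datetime) -> bool:
    """True if a startup-loaded config pod predates the last config commit."""
    return pod_created < last_config_commit
```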

ROLLBACK_CAPABILITY

RULE: every config change must be reversible
RULE: rollback = apply the previous git version of the config
RULE: test rollback procedure regularly — untested rollback is no rollback

TECHNIQUE: git-based rollback

1. identify the last known-good commit: git log --oneline -- config/<file>  
2. checkout the previous version: git show <commit>:config/<file> > /tmp/rollback.yaml  
3. validate the rollback config: conftest test /tmp/rollback.yaml --policy policy/config-schemas/  
4. apply: kubectl apply -f /tmp/rollback.yaml (or copy to config/)  
5. verify: drift detection confirms match  

ANTI_PATTERN: config changes without testing rollback
FIX: every config PR should include rollback instructions


CONFIGURATION_VALIDATION:SECRET_ROTATION_VERIFICATION

ROTATION_POLICY

PURPOSE: verify secrets are rotated on schedule and old secrets are invalidated

ROTATION_SCHEDULE:

API keys (LLM providers):    every 90 days  
Redis password:               every 90 days  
Database credentials:         every 90 days  
Internal API tokens:          every 90 days  
Vault keys:                   every 90 days  

CHECK: when was each secret last rotated?
TOOL: kubectl
RUN: kubectl get secret ge-secrets -n ge-agents -o json | jq '.metadata.annotations["last-rotated"]'

IF: last-rotated annotation missing THEN: rotation tracking not set up — add it
IF: last-rotated > 90 days ago THEN: rotation overdue — schedule rotation
IF: last-rotated within policy THEN: compliant

ROTATION_VERIFICATION

TECHNIQUE: post-rotation health check

1. rotate secret in Vault / k8s Secret  
2. restart dependent pods (rolling restart)  
3. verify pods start successfully with new secret  
4. verify old secret is invalidated (cannot authenticate)  
5. update last-rotated annotation  
6. log rotation event to session_learnings  

CHECK: after rotation, do all pods come up healthy?
IF: pod crashloops after rotation THEN: new secret may be wrong — rollback immediately
IF: old secret still works THEN: invalidation failed — old secret is a security risk
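SKETCH: the two post-rotation IF branches above as a single verdict function. The two boolean inputs stand in for the real pod-health and old-credential probes, which are kubectl/auth calls in practice.

```python
def post_rotation_verdict(pods_healthy: bool,
                          old_secret_still_valid: bool) -> str:
    """Decide the outcome of a rotation from its two health signals."""
    if not pods_healthy:
        return "ROLLBACK"             # new secret may be wrong
    if old_secret_still_valid:
        return "INVALIDATION_FAILED"  # old secret is a security risk
    return "OK"
```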

ANTI_PATTERN: rotating secrets without verifying the new secret works
FIX: always test new credentials before invalidating old ones

ANTI_PATTERN: rotating secrets during peak hours
FIX: rotate during maintenance window — rotation involves pod restarts


CONFIGURATION_VALIDATION:CERTIFICATE_MONITORING

TLS_CERTIFICATE_EXPIRY

PURPOSE: detect certificates approaching expiry before they cause outages

TOOL: openssl
RUN: echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -noout -dates

CHECK: certificate notAfter is > 30 days from now
IF: < 30 days THEN: HIGH — schedule renewal
IF: < 7 days THEN: CRITICAL — renew immediately
IF: expired THEN: CRITICAL — service is broken for TLS clients

RULE: certificate checks run daily
RULE: alerts fire at 30-day, 14-day, 7-day, and 1-day thresholds
RULE: log certificate expiry status to session_learnings for audit
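SKETCH: mapping a certificate's notAfter date to the severities above. Parsing of the openssl output is omitted; this shows only the threshold logic (the 14-day and 1-day alert thresholds would slot in the same way).

```python
from datetime import datetime, timedelta, timezone


def expiry_severity(not_after: datetime, now: datetime) -> str:
    """Map remaining certificate lifetime to an alert severity."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "EXPIRED"    # CRITICAL — service is broken for TLS clients
    if remaining < timedelta(days=7):
        return "CRITICAL"   # renew immediately
    if remaining < timedelta(days=30):
        return "HIGH"       # schedule renewal
    return "OK"
```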

K8S_TLS_SECRETS

TOOL: kubectl
RUN: kubectl get secrets -A --field-selector type=kubernetes.io/tls -o json | jq '.items[] | {ns: .metadata.namespace, name: .metadata.name, created: .metadata.creationTimestamp}'

CHECK: TLS secrets exist for all ingresses that expect them
CHECK: TLS secret data contains both tls.crt and tls.key
CHECK: certificate in tls.crt is not expired

IF: TLS secret missing THEN: ingress will fail TLS termination
IF: certificate expired in secret THEN: browsers will show security warning

ANTI_PATTERN: only checking certificate expiry when users report TLS errors
FIX: automated daily checks catch expiry before it impacts users

ANTI_PATTERN: renewing certificates manually without automation
FIX: use cert-manager or similar for automated renewal — human renewal is error-prone


CONFIGURATION_VALIDATION:CROSS_FILE_CONSISTENCY

REFERENCE_INTEGRITY

PURPOSE: verify that config files referencing each other are consistent

CROSS_REFERENCES to validate:

dolly-routing.yaml agent names    → AGENT-REGISTRY.json agent names  
post-completion-hooks.yaml agents → AGENT-REGISTRY.json agent names  
k8s manifests configMap refs      → actual ConfigMap names in cluster  
k8s manifests secret refs         → actual Secret names in cluster  
admin-ui DB agent records         → AGENT-REGISTRY.json entries  

CHECK: every agent name in routing config exists in registry
IF: routing references non-existent agent THEN: work will be lost — HIGH
CHECK: every ConfigMap/Secret referenced in manifests exists
IF: reference broken THEN: pod will fail to start — CRITICAL

TOOL: bash + jq
RUN: jq -r '.[].name' ge-ops/master/AGENT-REGISTRY.json | sort > /tmp/registry-agents.txt
RUN: grep -oP 'agent_id:\s*\K\w+' config/dolly-routing.yaml | sort > /tmp/routing-agents.txt
RUN: comm -23 /tmp/routing-agents.txt /tmp/registry-agents.txt

IF: output is non-empty THEN: routing references agents not in registry — fix routing config
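SKETCH: the same reference-integrity check as the comm -23 pipeline above, as a set difference. The agent names in the example are hypothetical.

```python
def orphaned_routing_agents(routing_agents: set[str],
                            registry_agents: set[str]) -> set[str]:
    """Agents referenced in routing config but absent from the registry.

    A non-empty result means undeliverable work — fix the routing config.
    """
    return routing_agents - registry_agents
```

The same pattern applies to the other cross-references: hook targets vs registry, manifest ConfigMap/Secret refs vs cluster objects.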

ANTI_PATTERN: validating each config file in isolation
FIX: cross-file reference checks catch broken links between configs