DOMAIN:SECURITY:SECURE_FAILURE_HANDLING

OWNER: koen, eric
ALSO_USED_BY: urszula, maxim, alex, tjitte, arjan, thijmen
UPDATED: 2026-03-19
SCOPE: all code reviews, all backend/frontend projects
SOURCE: Secure by Design (Johnsson/Deogun/Sawano, 2019), Ch. 9-10

CORE_PRINCIPLE

RULE: failures MUST NOT compromise security
RULE: never leak system internals through error responses
RULE: distinguish business exceptions from technical exceptions — handle separately

SECURITY:EXCEPTION_HANDLING

BUSINESS_VS_TECHNICAL_EXCEPTIONS

RULE: business exceptions (domain rule violated) — handle with domain logic
RULE: technical exceptions (DB down, network timeout) — handle with infrastructure logic
ANTI_PATTERN: mixing business and technical exceptions in same catch block
FIX: separate exception hierarchies — never cross-contaminate

exception type	example	handling
business	insufficient funds, item out of stock	domain-specific response to user
technical	connection timeout, disk full, OOM	log internally, generic error to user
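EXAMPLE: the following is a minimal Python sketch of separate exception hierarchies. The class names (BusinessError, TechnicalError and their subclasses), the handle_request wrapper and the 422/503 status codes are hypothetical choices made for illustration. Each exception family has its own except clause, and no text from a technical exception reaches the user.

```python
import logging

log = logging.getLogger("app")


class BusinessError(Exception):
    """A domain rule was violated; user_message is written for the end user."""

    def __init__(self, user_message: str) -> None:
        super().__init__(user_message)
        self.user_message = user_message


class InsufficientFunds(BusinessError):
    def __init__(self) -> None:
        super().__init__("Insufficient funds for this transfer.")


class TechnicalError(Exception):
    """Infrastructure failed; the message is for internal logs only."""


class DatabaseUnavailable(TechnicalError):
    pass


def handle_request(operation) -> tuple:
    """Run operation() and map each exception family to its own response."""
    try:
        return 200, operation()
    except BusinessError as e:
        # Domain logic: tell the user which business rule applies.
        return 422, e.user_message
    except TechnicalError:
        # Infrastructure logic: details and stack trace to the log, generic text to the user.
        log.exception("technical failure")
        return 503, "Service temporarily unavailable. Please try again later."
```

Exceptions outside both hierarchies (plain bugs) deliberately propagate in this sketch. Catching them is the job of the global handler described under GLOBAL_EXCEPTION_HANDLER.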

EXCEPTION_PAYLOAD

RULE: exception messages for INTERNAL logging only — never expose to end user
RULE: never include stack traces, SQL, file paths, server names in user-facing errors
ANTI_PATTERN: catch(e) { res.json({ error: e.message }) }
FIX: catch(e) { log.error(e); res.status(500).json({ error: "An error occurred" }) }
ANTI_PATTERN: exception message contains user input verbatim → XSS via error page
FIX: sanitize or omit user input from error messages
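EXAMPLE: the following Python sketch implements the FIX and also returns a random incident ID, so that support staff can match a user report to the log entry. The error_response helper and the incident_id field are hypothetical. The incident-ID idea is common practice; the source does not prescribe it. The exception text may contain SQL, file paths or raw user input, so it goes to the log together with the stack trace and never into the response.

```python
import logging
import uuid

log = logging.getLogger("app")


def error_response(exc: Exception) -> dict:
    """Log full details internally; return only a generic message and an ID."""
    incident_id = uuid.uuid4().hex
    # exc_info attaches the exception (and its traceback) to the log record only.
    log.error("incident %s: %r", incident_id, exc, exc_info=exc)
    # Fixed text: no exception message, no user input, nothing to reflect back.
    return {"error": "An error occurred", "incident_id": incident_id}
```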

GLOBAL_EXCEPTION_HANDLER

RULE: install global exception handler as safety net — catches unhandled exceptions
RULE: global handler returns GENERIC error to client, DETAILED error to logs
RULE: global handler must NOT swallow exceptions silently — always log
CHECK: does the application have a global exception handler?
CHECK: does it return generic messages to users and detailed messages to logs?
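EXAMPLE: assuming a Python WSGI backend, the global handler can be a middleware wrapped around the whole application. Frameworks usually offer their own hook for this. GlobalExceptionHandler is a hypothetical name. It logs the full stack trace through logging.exception and sends the client only a generic 500 response.

```python
import logging
import sys

log = logging.getLogger("app")


class GlobalExceptionHandler:
    """WSGI middleware: safety net for exceptions no inner handler caught."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            result = self.app(environ, start_response)
            try:
                # Materialise the body so errors raised while producing it are caught too.
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()
        except Exception:
            # Detailed error with stack trace goes to the logs; never swallowed silently.
            log.exception("unhandled exception on %s", environ.get("PATH_INFO", "?"))
            # Passing exc_info allows replacing a status the app may already have set.
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain; charset=utf-8")],
                sys.exc_info(),
            )
            return [b"An error occurred"]
```

Materialising the body with list() buffers streaming responses. A streaming-aware version would have to wrap the iterator instead. It also could not change the status after the headers had been sent.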

HANDLING_WITHOUT_EXCEPTIONS

PATTERN: use Result/Either types instead of throwing for expected failures
BENEFIT: caller MUST handle both success and failure — can't forget
BENEFIT: no exception overhead, clearer control flow
WHEN: business logic with expected failure paths (validation, authorization)
WHEN: functional programming style
ANTI_PATTERN: using exceptions for flow control (e.g., try login, catch invalid password)
FIX: return Result — caller handles explicitly

SECURITY:BAD_DATA

RULE: NEVER repair bad data — reject it
RULE: NEVER echo user input verbatim in error responses
RULE: treat all external input as potentially malicious

REPAIR_IS_DANGEROUS

ANTI_PATTERN: stripping HTML tags from input before storing
FIX: reject input that doesn't match domain rules — period
ANTI_PATTERN: auto-correcting date format, trimming special chars
FIX: validate against domain primitive — accept or reject, no middle ground
REASON: repair creates implicit assumptions about what "clean" means — attackers exploit these
NOTE: second-order attacks bypass repair: stored payload triggers on later retrieval
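EXAMPLE: the following Python sketch shows a domain primitive that rejects input instead of repairing it. Username, its allowed alphabet and its 3 to 32 character length are hypothetical domain rules. Input such as <script>alert(1)</script> is refused outright; it is not stripped to alert(1) and stored. The error message does not echo the input.

```python
import re

MAX_LEN = 32
_ALLOWED = re.compile(r"[a-z0-9_]{3,32}")


class Username:
    """Domain primitive: a valid username, or no object at all."""

    __slots__ = ("value",)

    def __init__(self, value: str) -> None:
        # Reject, never repair: no trimming, lower-casing or tag stripping.
        if not isinstance(value, str) or len(value) > MAX_LEN:
            raise ValueError("invalid username")
        if _ALLOWED.fullmatch(value) is None:
            # The message deliberately does not echo the rejected input.
            raise ValueError("invalid username")
        self.value = value
```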

XSS_POLYGLOTS

CHECK: input that appears safe to ONE parser may be executable in ANOTHER
EXAMPLE: the polyglot jaVasCript:/*-/*`/*\`/*'/*"/**/(/* */oNcliCk=alert() )//... bypasses many filters
RULE: defense in depth — validate input AND encode output — never rely on one layer

OUTPUT_ENCODING

RULE: always encode output for the context (HTML, JS, URL, CSS)
RULE: even validated domain primitives need output encoding when rendered
NOTE: domain primitives prevent most injection but not ALL — output encoding is the second layer
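EXAMPLE: the following Python sketch encodes output for its context using the standard library's html.escape and urllib.parse.quote. The two render functions are hypothetical. The display name is escaped for the HTML text context. In the link, it is first percent-encoded as a URL path segment and then HTML-escaped for the attribute.

```python
import html
from urllib.parse import quote


def render_greeting_html(display_name: str) -> str:
    # HTML text context: escape &, <, >, " and '.
    return f"<p>Hello, {html.escape(display_name, quote=True)}</p>"


def profile_link(display_name: str) -> str:
    # URL context first: percent-encode as a single path segment (no "/" allowed).
    href = "/users/" + quote(display_name, safe="")
    # Then HTML attribute context.
    return f'<a href="{html.escape(href, quote=True)}">profile</a>'
```

HTML escaping alone does not make a user-supplied URL safe inside href, because a javascript: URL survives HTML escaping. For that reason, the user value in this sketch only ever fills a path segment after a fixed prefix. JavaScript and CSS contexts need their own encoders, which are not shown here.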

SECURITY:AVAILABILITY_DESIGN

RULE: design for failure — assume every dependency will fail
RULE: contain failures — prevent cascade across system

CIRCUIT_BREAKERS

PATTERN: wrap external calls in circuit breaker
STATES: closed (normal) → open (failing, fast-fail) → half-open (testing recovery)

IF dependency fails > threshold
  → circuit OPENS → all calls fail-fast (no timeout wait)
  → after cooldown → circuit HALF-OPEN → allow one test call
  → IF test succeeds → circuit CLOSES (normal)
  → IF test fails → circuit stays OPEN

BENEFIT: prevents cascade failure when dependency is down
BENEFIT: preserves thread pool — no threads waiting on dead service
CHECK: are external service calls wrapped in circuit breakers?
CHECK: is there a fallback response when circuit is open?
ANTI_PATTERN: infinite timeout on external calls → thread starvation → cascade failure
FIX: set explicit timeouts + circuit breaker
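EXAMPLE: the following is a single-threaded Python sketch of the state machine above. CircuitBreaker, CircuitOpenError, the default threshold of 5 consecutive failures and the 30-second cooldown are hypothetical. The clock parameter exists only to make the sketch testable. The wrapped call must still enforce its own explicit timeout.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, threshold: int = 5, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold    # consecutive failures before opening
        self.cooldown = cooldown      # seconds to stay OPEN before a test call
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means CLOSED

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast: no thread waits on a dependency known to be down.
            raise CircuitOpenError("dependency unavailable")
        try:
            result = fn()  # fn must enforce its own explicit timeout
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed test call in HALF-OPEN, or too many failures while CLOSED, (re)opens.
        if self.opened_at is not None or self.failures >= self.threshold:
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        # A successful call (including the HALF-OPEN test call) closes the circuit.
        self.failures = 0
        self.opened_at = None
```

The caller provides the fallback: it catches CircuitOpenError and returns cached or degraded data. A version shared between threads would need a lock so that HALF-OPEN admits exactly one test call.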

BULKHEADS

PATTERN: isolate resources per dependency — failure in one doesn't exhaust all
EXAMPLE: separate thread pool per external service, separate DB connection pool per tenant
BENEFIT: if service A is slow, it exhausts only its own pool — service B unaffected
CHECK: are resource pools (threads, connections) shared across independent concerns?
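EXAMPLE: the following Python sketch models a per-dependency pool as a cap on concurrent calls, using threading.BoundedSemaphore. Bulkhead, BulkheadFull, the dependency names and the pool size of 2 are hypothetical. Once the payments pool is saturated, further payment calls are rejected immediately, while calls to search still get through.

```python
import threading
from contextlib import contextmanager


class BulkheadFull(Exception):
    """Raised when one dependency's own pool is exhausted."""


class Bulkhead:
    """Caps concurrent calls to ONE dependency; each dependency gets its own instance."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Non-blocking: callers of a slow dependency fail fast instead of piling up.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull(self.name)
        try:
            yield
        finally:
            self._slots.release()


# Separate pools: exhausting one cannot starve the other.
payments = Bulkhead("payments", max_concurrent=2)
search = Bulkhead("search", max_concurrent=2)
```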

WORK_QUEUES

PATTERN: decouple request acceptance from processing via queue
BENEFIT: system processes work at its own pace, not the caller's pace
BENEFIT: queue absorbs burst traffic, protects downstream
CHECK: are high-throughput endpoints backed by queues?
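EXAMPLE: the following Python sketch builds a bounded work queue with queue.Queue. accept and worker are hypothetical names, and the 202/503 return codes only mimic HTTP statuses. Accepting a job is cheap and immediate. A worker drains the queue at the pace downstream can sustain. Once the queue is full, new work is refused rather than queued without limit.

```python
import logging
import queue

log = logging.getLogger("app")


def accept(jobs: queue.Queue, job: dict) -> int:
    """Request side: enqueue and return at once, without doing the work."""
    try:
        jobs.put_nowait(job)
    except queue.Full:
        return 503  # bounded queue is full: shed load explicitly
    return 202  # accepted for later processing


def worker(jobs: queue.Queue, process) -> None:
    """Processing side: drains the queue at the pace downstream can sustain."""
    while True:
        job = jobs.get()
        try:
            if job is None:  # shutdown sentinel
                return
            process(job)
        except Exception:
            log.exception("job failed")  # one bad job must not kill the worker
        finally:
            jobs.task_done()
```

Keeping maxsize finite matters. An unbounded queue only moves the overload into memory.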

DOS_TESTING

RULE: test availability as part of CI/CD
CHECK: has headroom been estimated? (max expected load × safety factor)
CHECK: what happens at 2x, 5x, 10x normal load?
CHECK: do domain rules have performance implications? (complex validation = DoS vector)
ANTI_PATTERN: regex on unbounded input → ReDoS
FIX: length limit BEFORE regex (see VALIDATION_ORDER in secure-design-patterns.md)
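EXAMPLE: the following Python sketch shows this validation order for a hypothetical order-ID primitive. The ORD- format and the 20-character limit are made up for illustration. The length check runs first and costs O(1), so the regex only ever sees short input.

```python
import re

MAX_ORDER_ID_LEN = 20  # hypothetical domain limit
_ORDER_ID = re.compile(r"ORD-[0-9]{1,16}")


def is_valid_order_id(raw: str) -> bool:
    # 1. Cheap O(1) length check first: bounds the work every later step can do.
    if len(raw) > MAX_ORDER_ID_LEN:
        return False
    # 2. Only then the regex, on input that is known to be short.
    return _ORDER_ID.fullmatch(raw) is not None
```

The length limit bounds the damage but does not cure a catastrophically backtracking pattern. Patterns with nested quantifiers such as (a+)+ should also be rewritten.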

SECURITY:CLOUD_DESIGN

SOURCE: Secure by Design Ch. 10 — Twelve-Factor App + Three R's

TWELVE_FACTOR_SECURITY_BENEFITS

factor	security benefit
codebase (one repo)	single audit surface
dependencies (explicit)	dependencies visible and auditable, including transitive ones
config (in environment)	no secrets in code
backing services (attached)	swap compromised service without code change
build/release/run (strict)	immutable releases, auditable
processes (stateless)	no session hijacking via server state
port binding (self-contained)	reduced attack surface
concurrency (scale out)	availability via redundancy
disposability (fast start/stop)	rapid replacement of compromised instances
dev/prod parity	security tests run in prod-like env
logs (event streams)	centralized, tamper-evident logging
admin processes (one-off)	auditable admin actions

THREE_RS_OF_ENTERPRISE_SECURITY

STANDARD: Rotate, Repave, Repair

ROTATE_SECRETS

RULE: all secrets have expiry — rotate before expiry
RULE: treat credentials as ephemeral, not permanent
CHECK: when was each secret last rotated?
CHECK: are secrets stored in environment/vault, NEVER in code or config files?
ANTI_PATTERN: hardcoded API keys, database passwords in source
FIX: vault-managed secrets with automatic rotation
ANTI_PATTERN: long-lived API keys (>90 days)
FIX: short-lived tokens, automatic rotation schedule
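EXAMPLE: the following Python sketch is a scheduled rotation check. It assumes the vault exposes each secret's creation time, which is a hypothetical interface. The 90-day limit comes from the ANTI_PATTERN above. The 14-day margin is an illustrative choice that makes rotation happen before expiry rather than at it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=90)        # long-lived key limit from the ANTI_PATTERN above
ROTATE_MARGIN = timedelta(days=14)  # hypothetical: rotate well before the limit


def needs_rotation(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once a secret is within ROTATE_MARGIN of MAX_AGE, or past it."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= MAX_AGE - ROTATE_MARGIN
```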

REPAVE_SERVERS

RULE: regularly destroy and rebuild servers from immutable image
RULE: assume any long-running server is compromised
BENEFIT: APT attackers lose persistence, since any implant on the running server is destroyed on repave
BENEFIT: drift eliminated — known-good state guaranteed
CHECK: how long since last repave? (target: hours/days, not weeks/months)
NOTE: containers make repaving trivial — kubectl rollout restart

REPAIR_VULNERABILITIES

RULE: patch known CVEs immediately — don't wait for maintenance window
RULE: automate vulnerability scanning in CI/CD pipeline
CHECK: is there a process for emergency patching?
CHECK: are CVE notifications monitored and triaged?

CONFIGURATION_SECURITY

RULE: NEVER store config in code — use environment variables or external config service
RULE: NEVER store secrets in resource files (even if encrypted locally)
RULE: encrypt sensitive config values at rest
CHECK: are there secrets in git history? (even if removed from HEAD)
ANTI_PATTERN: config values that change behavior without audit trail
FIX: config changes go through version control or auditable config service
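EXAMPLE: the following Python sketch reads configuration from the environment instead of from code. require_env, MissingConfig and Secret are hypothetical names. Startup fails closed when a required setting is missing. The Secret wrapper keeps the value out of repr() and string formatting, so it does not leak into logs or error messages.

```python
import os


class MissingConfig(Exception):
    """Required configuration was not provided by the environment."""


def require_env(name: str) -> str:
    """Read a setting injected by the environment or vault; fail closed if absent."""
    value = os.environ.get(name)
    if not value:
        # Name the variable, never a value: this message may end up in logs.
        raise MissingConfig(f"required setting {name} is not set")
    return value


class Secret:
    """Wraps a secret so that repr(), str() and f-strings never show it."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # Explicit call at the single point of use, e.g. when opening a DB connection.
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__
```

This sketch does not cover encryption at rest or the audit trail. Those belong to the vault or config service.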

LOGGING_SECURITY

RULE: log to event stream (stdout), not to file on disk
RULE: centralize logs — aggregated, searchable, tamper-evident
ANTI_PATTERN: logging to local file → availability risk (disk full), confidentiality risk (who can read?), integrity risk (can be modified/deleted)
FIX: log as event stream → centralized service → immutable storage

CIA-T concern	file logging risk	stream logging benefit
confidentiality	anyone with server access reads logs	centralized access control
integrity	logs can be modified/deleted	append-only, tamper-evident
availability	disk full = log loss	external storage, not bound to the server's disk
traceability	scattered across servers	unified, searchable
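EXAMPLE: the following Python sketch treats logs as an event stream, writing one JSON object per line to stdout. The platform's log collector is assumed to ship these lines to central, append-only storage. JsonLineFormatter and configure_logging are hypothetical names, and the JSON field names are illustrative.

```python
import json
import logging
import sys


class JsonLineFormatter(logging.Formatter):
    """One JSON object per line: easy for a collector to parse and forward."""

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        if record.exc_info:
            event["exc"] = self.formatException(record.exc_info)
        # json.dumps escapes newlines, so each record stays on a single line.
        return json.dumps(event)


def configure_logging(stream=sys.stdout) -> None:
    handler = logging.StreamHandler(stream)  # event stream, not a FileHandler
    handler.setFormatter(JsonLineFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(logging.INFO)
```

A side effect of json.dumps is that line breaks inside messages are escaped. As a result, injected newlines cannot forge extra log lines.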

WIKI_REF: domains/security/books/secure-by-design.md (full chapter mapping)
READ_ALSO: domains/security/secure-design-patterns.md, domains/security/index.md