DOMAIN:SECURITY:SECURE_FAILURE_HANDLING¶
OWNER: koen, eric
ALSO_USED_BY: urszula, maxim, alex, tjitte, arjan, thijmen
UPDATED: 2026-03-19
SCOPE: all code reviews, all backend/frontend projects
SOURCE: Secure by Design (Johnsson/Deogun/Sawano, 2019), Ch. 9-10
CORE_PRINCIPLE¶
RULE: failures MUST NOT compromise security
RULE: never leak system internals through error responses
RULE: distinguish business exceptions from technical exceptions — handle separately
SECURITY:EXCEPTION_HANDLING¶
BUSINESS_VS_TECHNICAL_EXCEPTIONS¶
RULE: business exceptions (domain rule violated) — handle with domain logic
RULE: technical exceptions (DB down, network timeout) — handle with infrastructure logic
ANTI_PATTERN: mixing business and technical exceptions in same catch block
FIX: separate exception hierarchies — never cross-contaminate
| exception type | example | handling |
|---|---|---|
| business | insufficient funds, item out of stock | domain-specific response to user |
| technical | connection timeout, disk full, OOM | log internally, generic error to user |
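EXAMPLE: a minimal JavaScript sketch of two separate exception hierarchies and one mapping function. The class names, `toClientResponse`, and the 422/500 status codes are illustrative assumptions, not taken from the source.

```javascript
// Hypothetical base classes: names are illustrative, not from the source.
class BusinessError extends Error {
  constructor(userMessage) {
    super(userMessage);
    this.name = "BusinessError";
    this.userMessage = userMessage; // written by us, contains no internals
  }
}

class TechnicalError extends Error {
  constructor(message, options) {
    super(message, options); // options.cause can carry the original error
    this.name = "TechnicalError";
  }
}

class InsufficientFundsError extends BusinessError {
  constructor() {
    super("Insufficient funds for this transfer.");
  }
}

// Maps an error to what the client sees. Technical and unknown errors are
// logged in full and answered with a generic message.
function toClientResponse(err, log = console.error) {
  if (err instanceof BusinessError) {
    return { status: 422, body: { error: err.userMessage } };
  }
  log(err);
  return { status: 500, body: { error: "An error occurred" } };
}
```

NOTE: because the two hierarchies never share a subclass, a single `instanceof` check is enough to pick the handling path.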
EXCEPTION_PAYLOAD¶
RULE: exception messages for INTERNAL logging only — never expose to end user
RULE: never include stack traces, SQL, file paths, server names in user-facing errors
ANTI_PATTERN: catch(e) { res.json({ error: e.message }) }
FIX: catch (e) { log.error(e); res.status(500).json({ error: "An error occurred" }) }
ANTI_PATTERN: exception message contains user input verbatim → XSS via error page
FIX: sanitize or omit user input from error messages
GLOBAL_EXCEPTION_HANDLER¶
RULE: install global exception handler as safety net — catches unhandled exceptions
RULE: global handler returns GENERIC error to client, DETAILED error to logs
RULE: global handler must NOT swallow exceptions silently — always log
CHECK: does the application have a global exception handler?
CHECK: does it return generic messages to users and detailed messages to logs?
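EXAMPLE: a framework-neutral JavaScript sketch of a global handler as a wrapper around request handlers. `withGlobalErrorHandler`, the `{ status, body }` response shape, and the injected `log` function are assumptions for illustration; a real application would use its framework's own error hook.

```javascript
// Wraps a request handler so an unhandled exception never reaches the client.
// `handler` and `log` are hypothetical stand-ins for framework pieces.
function withGlobalErrorHandler(handler, log = console.error) {
  return async function safeHandler(request) {
    try {
      return await handler(request);
    } catch (err) {
      // Detailed error (message, stack) goes to the log only.
      log(err);
      // The client gets a generic message with no internals.
      return { status: 500, body: { error: "An error occurred" } };
    }
  };
}
```

NOTE: the wrapper always logs before responding, so no exception is swallowed silently.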
HANDLING_WITHOUT_EXCEPTIONS¶
PATTERN: use Result/Either types instead of throwing for expected failures
BENEFIT: caller MUST handle both success and failure — can't forget
BENEFIT: no exception overhead, clearer control flow
WHEN: business logic with expected failure paths (validation, authorization)
WHEN: functional programming style
ANTI_PATTERN: using exceptions for flow control (e.g., try login, catch invalid password)
FIX: return a Result value that encodes success or failure, and let the caller branch on it
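EXAMPLE: a minimal JavaScript sketch of a login that returns a Result instead of throwing. The `ok`/`fail` helpers, the `users` map, and the `verify` callback are hypothetical. Plain JavaScript does not force the caller to check `ok`; a typed language or a linter rule would be needed for that guarantee.

```javascript
// Minimal Result helpers (hypothetical names): a plain object tagged with `ok`.
const ok = (value) => ({ ok: true, value });
const fail = (error) => ({ ok: false, error });

// Expected failures are returned, not thrown. `users` is an in-memory
// stand-in for a user store; `verify` stands in for a password hash check.
function login(users, username, password, verify) {
  const user = users.get(username);
  if (!user || !verify(password, user.passwordHash)) {
    // Same error code for unknown user and wrong password.
    return fail("INVALID_CREDENTIALS");
  }
  return ok({ username });
}
```

NOTE: a caller branches on `result.ok`; a genuinely unexpected failure (store unreachable) would still surface as a technical exception.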
SECURITY:BAD_DATA¶
RULE: NEVER repair bad data — reject it
RULE: NEVER echo user input verbatim in error responses
RULE: treat all external input as potentially malicious
REPAIR_IS_DANGEROUS¶
ANTI_PATTERN: stripping HTML tags from input before storing
FIX: reject input that doesn't match domain rules — period
ANTI_PATTERN: auto-correcting date format, trimming special chars
FIX: validate against domain primitive — accept or reject, no middle ground
REASON: repair creates implicit assumptions about what "clean" means — attackers exploit these
NOTE: second-order attacks bypass repair: stored payload triggers on later retrieval
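EXAMPLE: a minimal JavaScript sketch of accept-or-reject validation against a domain primitive. The `Quantity` rule (integer from 1 to 500) and the name `parseQuantity` are invented for illustration.

```javascript
// Hypothetical domain rule: a quantity is an integer from 1 to 500,
// written without sign, whitespace, or leading zeros.
function parseQuantity(raw) {
  const reject = { ok: false, error: "INVALID_QUANTITY" }; // input is never echoed
  // Cheap type and length checks come first.
  if (typeof raw !== "string" || raw.length === 0 || raw.length > 3) return reject;
  // No trimming and no stripping: the input matches the rule or it does not.
  if (!/^[1-9][0-9]{0,2}$/.test(raw)) return reject;
  const value = Number(raw);
  if (value > 500) return reject;
  return { ok: true, value: Object.freeze({ quantity: value }) };
}
```

NOTE: `" 42"` is rejected rather than trimmed, so there is no hidden definition of "clean" for an attacker to probe.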
XSS_POLYGLOTS¶
CHECK: input that appears safe to ONE parser may be executable in ANOTHER
EXAMPLE: `jaVasCript:/*-/*\\/\'//"//(/ /oNcliCk=alert() )//...` (bypasses many filters)
RULE: defense in depth — validate input AND encode output — never rely on one layer
OUTPUT_ENCODING¶
RULE: always encode output for the context (HTML, JS, URL, CSS)
RULE: even validated domain primitives need output encoding when rendered
NOTE: domain primitives prevent most injection but not ALL — output encoding is the second layer
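EXAMPLE: a minimal JavaScript sketch of context-specific output encoding for HTML. `escapeHtml` and `renderGreeting` are hypothetical helpers; the escaping shown covers HTML element content and quoted attribute values only. Other contexts need their own encoder (for a URL component, the built-in `encodeURIComponent`).

```javascript
// Characters that must be escaped in HTML text and quoted attribute values.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Encodes a value for the HTML context. Not valid for JS, URL, or CSS contexts.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical render step: encoding happens at output time, even when the
// value was validated on the way in.
function renderGreeting(name) {
  return `<p>Hello, ${escapeHtml(name)}</p>`;
}
```

NOTE: in practice a templating engine with auto-escaping usually does this; the sketch only shows where the second layer sits.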
SECURITY:AVAILABILITY_DESIGN¶
RULE: design for failure — assume every dependency will fail
RULE: contain failures — prevent cascade across system
CIRCUIT_BREAKERS¶
PATTERN: wrap external calls in circuit breaker
STATES: closed (normal) → open (failing, fast-fail) → half-open (testing recovery)
IF failure count for a dependency exceeds threshold
→ circuit OPENS → all calls fail-fast (no timeout wait)
→ after cooldown → circuit HALF-OPEN → allow one test call
→ IF test succeeds → circuit CLOSES (normal)
→ IF test fails → circuit returns to OPEN and the cooldown restarts
BENEFIT: prevents cascade failure when dependency is down
BENEFIT: preserves thread pool — no threads waiting on dead service
CHECK: are external service calls wrapped in circuit breakers?
CHECK: is there a fallback response when circuit is open?
ANTI_PATTERN: infinite timeout on external calls → thread starvation → cascade failure
FIX: set explicit timeouts + circuit breaker
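EXAMPLE: a minimal JavaScript sketch of the closed/open/half-open cycle. The class, its option names, and the consecutive-failure threshold are assumptions. The sketch does not limit concurrent test calls in half-open and expects `fn` to enforce its own timeout.

```javascript
class CircuitOpenError extends Error {}

class CircuitBreaker {
  // `now` is an injectable clock so the cooldown can be tested without waiting.
  constructor({ failureThreshold = 3, cooldownMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = "closed";
    this.failures = 0; // consecutive failures
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new CircuitOpenError("circuit open"); // fail fast, no remote call
      }
      this.state = "half-open"; // cooldown over: let a test call through
    }
    try {
      const result = await fn();
      this.state = "closed"; // success closes the circuit and resets the count
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open"; // (re)open and restart the cooldown
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

NOTE: callers catch `CircuitOpenError` and serve the fallback response, which keeps the open circuit invisible to end users where possible.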
BULKHEADS¶
PATTERN: isolate resources per dependency — failure in one doesn't exhaust all
EXAMPLE: separate thread pool per external service, separate DB connection pool per tenant
BENEFIT: if service A is slow, it exhausts only its own pool — service B unaffected
CHECK: are resource pools (threads, connections) shared across independent concerns?
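EXAMPLE: a minimal JavaScript sketch of a bulkhead as a per-dependency concurrency limit. Node.js has no per-call thread pools, so the sketch caps in-flight calls instead; the `Bulkhead` class, the limits, and the `payments`/`search` names are hypothetical.

```javascript
class Bulkhead {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
  }

  // Runs fn if a slot is free; rejects immediately when the compartment is full.
  async run(fn) {
    if (this.active >= this.maxConcurrent) {
      throw new Error("bulkhead full");
    }
    this.active += 1;
    try {
      return await fn();
    } finally {
      this.active -= 1; // free the slot on success and on failure
    }
  }
}

// One compartment per dependency: a slow payments service can use up
// only its own slots.
const bulkheads = {
  payments: new Bulkhead(2),
  search: new Bulkhead(2),
};
```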
WORK_QUEUES¶
PATTERN: decouple request acceptance from processing via queue
BENEFIT: system processes work at its own pace, not caller's pace
BENEFIT: queue absorbs burst traffic, protects downstream
CHECK: are high-throughput endpoints backed by queues?
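EXAMPLE: a minimal in-memory JavaScript sketch of a bounded work queue. `BoundedWorkQueue` and its method names are hypothetical; a production system would normally use a durable broker. The bound matters: an unbounded queue only moves the exhaustion risk into memory.

```javascript
class BoundedWorkQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }

  // Acceptance side: returns false when full, so the endpoint can answer
  // "try again later" instead of overloading downstream.
  offer(job) {
    if (this.items.length >= this.capacity) return false;
    this.items.push(job);
    return true;
  }

  // Processing side: a worker pulls the oldest job whenever it is ready.
  // Returns undefined when the queue is empty.
  poll() {
    return this.items.shift();
  }
}
```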
DOS_TESTING¶
RULE: test availability as part of CI/CD
CHECK: has required capacity been estimated? (max expected load × safety factor; headroom = capacity minus expected load)
CHECK: what happens at 2x, 5x, 10x normal load?
CHECK: do domain rules have performance implications? (complex validation = DoS vector)
ANTI_PATTERN: regex on unbounded input → ReDoS
FIX: length limit BEFORE regex (see VALIDATION_ORDER in secure-design-patterns.md)
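EXAMPLE: a minimal JavaScript sketch of a length limit applied before a regex. The 254-character limit, the deliberately loose pattern, and the name `isPlausibleEmail` are assumptions for illustration.

```javascript
const MAX_EMAIL_LENGTH = 254; // assumed limit for this sketch

// Loose shape check with no nested quantifiers.
const EMAIL_SHAPE = /^[^@\s]+@[^@\s]+$/;

function isPlausibleEmail(input) {
  // Cheap checks first: type and length bound the work the regex can do.
  if (typeof input !== "string" || input.length > MAX_EMAIL_LENGTH) {
    return false;
  }
  return EMAIL_SHAPE.test(input);
}
```

NOTE: the length check runs in constant time, so oversized input is rejected before any pattern matching starts.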
SECURITY:CLOUD_DESIGN¶
SOURCE: Secure by Design Ch. 10 — Twelve-Factor App + Three R's
TWELVE_FACTOR_SECURITY_BENEFITS¶
| factor | security benefit |
|---|---|
| codebase (one repo) | single audit surface |
| dependencies (explicit) | every dependency is declared, so it can be audited and scanned |
| config (in environment) | no secrets in code |
| backing services (attached) | swap compromised service without code change |
| build/release/run (strict) | immutable releases, auditable |
| processes (stateless) | no session hijacking via server state |
| port binding (self-contained) | reduced attack surface |
| concurrency (scale out) | availability via redundancy |
| disposability (fast start/stop) | rapidly replace compromised instances |
| dev/prod parity | security tests run in prod-like env |
| logs (event streams) | centralized, tamper-evident logging |
| admin processes (one-off) | auditable admin actions |
THREE_RS_OF_ENTERPRISE_SECURITY¶
STANDARD: Rotate, Repave, Repair
ROTATE_SECRETS¶
RULE: all secrets have expiry — rotate before expiry
RULE: treat credentials as ephemeral, not permanent
CHECK: when was each secret last rotated?
CHECK: are secrets stored in environment/vault, NEVER in code or config files?
ANTI_PATTERN: hardcoded API keys, database passwords in source
FIX: vault-managed secrets with automatic rotation
ANTI_PATTERN: long-lived API keys (>90 days)
FIX: short-lived tokens, automatic rotation schedule
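EXAMPLE: a minimal JavaScript sketch of the "when was each secret last rotated?" check. The inventory shape (`name`, `rotatedAt`) and the function names are hypothetical; real metadata would come from the vault, and only names and dates are handled here, never secret values.

```javascript
const MAX_SECRET_AGE_DAYS = 90; // matches the >90 days anti-pattern above
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days since the secret was last rotated.
function secretAgeDays(rotatedAtIso, now = new Date()) {
  return Math.floor((now.getTime() - new Date(rotatedAtIso).getTime()) / MS_PER_DAY);
}

// Returns the names of secrets that are overdue for rotation.
function findStaleSecrets(inventory, now = new Date()) {
  return inventory
    .filter((secret) => secretAgeDays(secret.rotatedAt, now) > MAX_SECRET_AGE_DAYS)
    .map((secret) => secret.name);
}
```

NOTE: a check like this could run on a schedule and fail a pipeline step when the returned list is not empty.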
REPAVE_SERVERS¶
RULE: regularly destroy and rebuild servers from immutable image
RULE: assume any long-running server is compromised
BENEFIT: APT attacks lose persistence — any implant destroyed on repave
BENEFIT: drift eliminated — known-good state guaranteed
CHECK: how long since last repave? (target: hours/days, not weeks/months)
NOTE: containers make repaving trivial, e.g. `kubectl rollout restart deployment/<name>`
REPAIR_VULNERABILITIES¶
RULE: patch known CVEs immediately — don't wait for maintenance window
RULE: automate vulnerability scanning in CI/CD pipeline
CHECK: is there a process for emergency patching?
CHECK: are CVE notifications monitored and triaged?
CONFIGURATION_SECURITY¶
RULE: NEVER store config in code — use environment variables or external config service
RULE: NEVER store secrets in resource files (even if encrypted locally)
RULE: encrypt sensitive config values at rest
CHECK: are there secrets in git history? (even if removed from HEAD)
ANTI_PATTERN: config values that change behavior without audit trail
FIX: config changes go through version control or auditable config service
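EXAMPLE: a minimal JavaScript sketch of loading configuration from the environment and failing fast at startup. The variable names `DATABASE_URL` and `API_TOKEN` and the function `loadConfig` are hypothetical.

```javascript
// Reads required settings from the environment. `env` is injectable for tests.
function loadConfig(env = process.env) {
  const required = ["DATABASE_URL", "API_TOKEN"]; // hypothetical keys
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    // Report names only; never put configuration values in an error message.
    throw new Error(`Missing required configuration: ${missing.join(", ")}`);
  }
  // Frozen so nothing changes configuration at runtime without a restart.
  return Object.freeze({
    databaseUrl: env.DATABASE_URL,
    apiToken: env.API_TOKEN,
  });
}
```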
LOGGING_SECURITY¶
RULE: log to event stream (stdout), not to file on disk
RULE: centralize logs — aggregated, searchable, tamper-evident
ANTI_PATTERN: logging to local file → availability risk (disk full), confidentiality risk (who can read?), integrity risk (can be modified/deleted)
FIX: log as event stream → centralized service → immutable storage
| CIA-T concern | file logging risk | stream logging benefit |
|---|---|---|
| confidentiality | anyone with server access reads logs | centralized access control |
| integrity | logs can be modified/deleted | append-only, tamper-evident |
| availability | disk full = log loss | external storage that scales independently of the server's disk |
| traceability | scattered across servers | unified, searchable |
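EXAMPLE: a minimal JavaScript sketch of logging as an event stream: one JSON object per line on stdout, with collection and storage left to the platform. The field names (`ts`, `level`, `message`) and both function names are assumptions.

```javascript
// Builds one log event as a single JSON line. Caller fields are spread first
// so they cannot overwrite ts, level, or message.
function formatLogEvent(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({ ...fields, ts: now.toISOString(), level, message });
}

// Writes the event to stdout; the process never opens or rotates a log file.
function logEvent(level, message, fields) {
  process.stdout.write(formatLogEvent(level, message, fields) + "\n");
}
```

NOTE: `JSON.stringify` escapes line breaks inside values, so input containing a newline cannot forge a second log line.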
WIKI_REF: domains/security/books/secure-by-design.md (full chapter mapping)
READ_ALSO: domains/security/secure-design-patterns.md, domains/security/index.md