Testing Standards

TDD Approach

RULE: Write the test before the implementation. The test defines what "done" means.

RATIONALE: This prevents scaffolding that is never integrated, because the test is the first call site for any new code.
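As a minimal sketch of the rule, the test below is written first and acts as the first call site. The `slugify` helper and its behaviour are hypothetical, chosen only to illustrate the order of work.

```python
import re


# Step 1: written first. This test is the first call site and defines "done"
# for a hypothetical slugify() helper (the name is illustrative only).
def test_slugify_lowercases_and_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Many   spaces  ") == "many-spaces"


# Step 2: written only after the test above has been seen to fail.
def slugify(text: str) -> str:
    # Keep runs of letters and digits, drop everything else.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)
```

Running the test before step 2 exists fails with a `NameError`, which is the "red" state the rule asks for.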

Test Levels

Unit Tests (supplementary)

Test individual functions in isolation
Fast, numerous, focused
NOT sufficient for feature verification
NOT proof of life

Integration Tests (required for features)

Test the actual system path end-to-end
Exercise real network calls, real database operations, real Redis streams
This IS proof of life
Every new feature must have at least one integration test

Regression Tests (scheduled)

Run on schedule (post-deployment or daily)
Exercise all existing features through actual system paths
A failing regression test blocks new feature work (Principle 10)

What Tests Must Prove

For a new API endpoint:

- The route is registered and reachable (curl from outside the pod)
- The handler processes a real request
- The response matches the contract schema
- The side effects (DB writes, Redis publishes) actually occurred
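The four endpoint checks can be sketched with the standard library alone. Everything named here is an assumption for illustration: the `/items` route, the `ItemHandler` class, the response fields, and the in-memory SQLite table standing in for the real database. A loopback server is also only a stand-in, since the standard asks for the deployed route to be hit from outside the pod.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Stand-in for the real database. check_same_thread=False lets the server
# thread write while the test thread reads; the lock serialises access.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE items (name TEXT)")
db_lock = threading.Lock()


class ItemHandler(BaseHTTPRequestHandler):
    """Hypothetical handler for POST /items."""

    def do_POST(self):
        if self.path != "/items":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        with db_lock:
            db.execute("INSERT INTO items (name) VALUES (?)", (payload["name"],))
            db.commit()
        body = json.dumps({"name": payload["name"], "status": "created"}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def run_endpoint_check() -> dict:
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ItemHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/items"
        request = urllib.request.Request(
            url,
            data=json.dumps({"name": "widget"}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            status = response.status           # route reachable, request handled
            body = json.loads(response.read())  # response to check against schema
        with db_lock:
            rows = db.execute("SELECT name FROM items").fetchall()  # side effect
        return {"status": status, "body": body, "rows": rows}
    finally:
        server.shutdown()
        server.server_close()
```

The point of the design is that the assertion on `rows` reads the store directly rather than trusting the response, so a handler that replies 201 without writing anything still fails the test.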

For a new agent capability:

- The trigger reaches the executor via Redis Stream
- The executor spawns a real CLI session (Claude/Codex/Gemini)
- The CLI produces real output (verified via PTY capture)
- The completion file is written to ge-ops/system/completions/

NOT ACCEPTABLE: Tests that only verify the function body without verifying it's callable through the real path.

ENFORCEMENT: Marije/Judith run test suites. Koen/Eric verify test coverage in code review.

Testing Tools

webapp-testing Skill (Playwright)

The webapp-testing skill (installed in .claude/skills/, source: anthropics/skills) provides Playwright-based web app testing patterns. Marije and Judith use it to author E2E and integration tests during Phase 8 (Integration).

The skill auto-activates when agents work on test files targeting web applications. It provides structured patterns for:

Page object models
Test fixtures and setup/teardown
Assertion patterns for UI state
Network interception and mocking
Visual regression testing

CI Pipeline Testing Standards

General Rules

All tests must pass in CI — no allow_failure on test stages
Test paths must use dynamic GE_ROOT detection via tests.conftest.GE_ROOT_PATH — never hardcode /home/claude/ge-bootstrap in test files
Test fixtures must be self-contained and clean up after themselves
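One way the path and fixture rules above could look in practice is sketched below. The detection strategy is an assumption: the source only names `tests.conftest.GE_ROOT_PATH`, so the `GE_ROOT` environment override, the `.git` marker, and the helper names `detect_ge_root` and `scratch_workspace` are all hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


def detect_ge_root(start: Path, marker: str = ".git") -> Path:
    """Return the GE root: the GE_ROOT env var if set, otherwise the nearest
    ancestor of `start` that contains `marker` (both are assumptions)."""
    override = os.environ.get("GE_ROOT")
    if override:
        return Path(override).resolve()
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise RuntimeError(f"no {marker!r} found at or above {start}")


@contextmanager
def scratch_workspace():
    """Self-contained fixture: creates a temp dir and removes it on exit."""
    with tempfile.TemporaryDirectory(prefix="ge-test-") as tmp:
        yield Path(tmp)


# In tests/conftest.py this might be wired up as:
#     GE_ROOT_PATH = detect_ge_root(Path(__file__))
```

Test files would then build paths from `GE_ROOT_PATH` instead of a literal checkout location, and anything written inside `scratch_workspace()` disappears when the block exits, even if the test fails.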

Mutation Testing

Mutation score threshold: 80% on new code (enforced by test:mutation CI stage)
Mutation score threshold: 60% on existing code (tracked, not yet blocking)
Tools: Stryker (TypeScript), mutmut (Python)
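The threshold arithmetic can be illustrated with a simplified score: killed mutants divided by killed plus survived. This is a sketch, not how the test:mutation stage computes it; Stryker and mutmut report additional categories (timeouts, uncovered mutants) that are ignored here, and the function names are hypothetical.

```python
def mutation_score(killed: int, survived: int) -> float:
    """Percentage of mutants the test suite detected (simplified: ignores
    timeouts, no-coverage mutants, and other tool-specific categories)."""
    total = killed + survived
    if total == 0:
        return 100.0  # nothing to mutate; treated as passing by assumption
    return 100.0 * killed / total


def passes_gate(killed: int, survived: int, threshold: float) -> bool:
    """True when the score meets or exceeds the threshold percentage."""
    return mutation_score(killed, survived) >= threshold


# New code: 42 of 50 mutants killed is 84%, which clears the 80% gate.
new_code_ok = passes_gate(killed=42, survived=8, threshold=80.0)

# Existing code: 130 of 200 killed is 65%, which clears the 60% tracking level
# but would not clear the 80% gate applied to new code.
existing_ok = passes_gate(killed=130, survived=70, threshold=60.0)
```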

Adversarial Testing

Property-based testing with Hypothesis (Python) and fast-check (TypeScript) in the test:adversarial CI stage
Fuzz testing on condition evaluator and critical path functions
All 7 attack categories (type confusion, boundary, resource exhaustion, injection, concurrency, precision, unicode) must be covered
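To show the shape of such a test without extra dependencies, the sketch below uses a hand-rolled fuzz loop from the standard library as a stand-in for Hypothesis; in the real stage a Hypothesis strategy would replace the fixed value pool. The `evaluate` function is a hypothetical condition evaluator, not the project's, and the pool covers only some of the seven categories (boundary, precision, type confusion, unicode, injection-shaped operators).

```python
import math
import random


def evaluate(op: str, left, right) -> bool:
    """Hypothetical condition evaluator: compares two numbers."""
    for operand in (left, right):
        # bool is a subclass of int, so reject it explicitly (type confusion).
        if isinstance(operand, bool) or not isinstance(operand, (int, float)):
            raise ValueError(f"non-numeric operand: {operand!r}")
        if isinstance(operand, float) and math.isnan(operand):
            raise ValueError("NaN is not comparable")
    if op == "gt":
        return left > right
    if op == "lt":
        return left < right
    if op == "eq":
        return left == right
    raise ValueError(f"unknown operator: {op!r}")


ADVERSARIAL_VALUES = [
    0, -1, 2**63, -(2**63) - 1,                     # boundary
    0.1 + 0.2, 1e308, float("inf"), float("nan"),   # precision
    "1", None, True, [1], {"a": 1},                 # type confusion
    "１２３", "\u202e0", "é",                         # unicode
]
# The last three operators are malformed or injection-shaped.
OPERATORS = ["gt", "lt", "eq", "GT", "", "eq; DROP TABLE"]


def fuzz(iterations: int = 2000, seed: int = 0) -> int:
    """Run the evaluator on random adversarial inputs; return accepted count."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(iterations):
        op = rng.choice(OPERATORS)
        left = rng.choice(ADVERSARIAL_VALUES)
        right = rng.choice(ADVERSARIAL_VALUES)
        try:
            result = evaluate(op, left, right)
        except ValueError:
            continue  # rejecting bad input cleanly is acceptable
        # Any other exception type propagates and fails the fuzz run.
        assert isinstance(result, bool), (op, left, right)
        # Property: gt and lt are mirror images for accepted inputs.
        if op == "gt":
            assert result == evaluate("lt", right, left)
        accepted += 1
    return accepted
```

The properties checked are deliberately weak (clean rejection, boolean result, mirror symmetry); they catch crashes and silent type coercion, which is what the type confusion and boundary categories target.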

CI Job Reference

CI Job	Stage	What it verifies
`tdd:red-gate`	TDD	All TDD tests are red before implementation
`tdd:green-gate`	TDD	All TDD tests turn green after implementation
`tdd:oracle-check`	TDD	Oracle independence — tests don't import implementation
`build:backend`	Build	Implementation compiles and builds
`lint:python`	Quality	Ruff linting (zero errors)
`lint:secrets`	Quality	Gitleaks secret detection
`security:bandit`	Security	Python SAST
`security:semgrep`	Security	Multi-language static analysis
`security:dependency-scan`	Security	Dependency vulnerability audit
`test:unit:backend`	Testing	Backend unit test suite
`test:integration`	Testing	Full integration test suite
`test:reconciliation`	Testing	TDD vs post-impl test suite comparison
`test:adversarial`	Testing	Fuzz and property-based tests
`test:contract`	SSOT	API contract verification + verify_ssot.sh
`test:mutation`	Quality	Mutation testing thresholds
`review:gate`	Merge	Manual merge approval (future: automated scoring)
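The oracle independence rule checked by `tdd:oracle-check` (tests don't import implementation) could be approximated with a static import scan. This is a sketch under assumptions: the source does not describe how the job works, and both the function name `implementation_imports` and the package name `ge_backend` are hypothetical.

```python
import ast


def implementation_imports(test_source: str, forbidden_prefix: str) -> list[str]:
    """Return imported module names in `test_source` that fall under
    `forbidden_prefix` (the implementation package; name is hypothetical)."""
    offenders = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; treat as no match.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name == forbidden_prefix or name.startswith(forbidden_prefix + "."):
                offenders.append(name)
    return offenders


clean_test = "import json\nfrom http import client\n"
dirty_test = "from ge_backend.handlers import create_item\n"

clean_offenders = implementation_imports(clean_test, "ge_backend")
dirty_offenders = implementation_imports(dirty_test, "ge_backend")
```

A static scan like this misses dynamic imports such as `importlib.import_module`, so it is a first filter rather than a guarantee.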