Pitfall: Mutation Testing

Rule

Every code change MUST maintain an 80%+ mutation kill rate. No exceptions.

What Is Mutation Testing

Mutation testing injects small bugs (mutants) into your code, for example by flipping operators, removing statements, or changing constants. If your tests still pass with the injected bug in place, the mutant "survives", which means your tests have a blind spot.

A mutation score of 80% means: 80% of injected bugs are caught by your test suite.
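
To make the definition concrete, here is a minimal hand-rolled sketch of one mutant being killed by one test and surviving another. It does not use Stryker or mutmut; the names `is_adult`, `is_adult_mutant`, `weak_test`, `strong_test`, `is_killed`, and `mutation_score` are all hypothetical and exist only for illustration.

```python
from typing import Callable

Predicate = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """Original implementation (hypothetical example code)."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Hand-written mutant: the '>=' operator flipped to '>'."""
    return age > 18


def weak_test(fn: Predicate) -> bool:
    """Never probes the boundary, so it passes for both versions."""
    return fn(30) is True and fn(5) is False


def strong_test(fn: Predicate) -> bool:
    """Checks the boundary value 18, where original and mutant differ."""
    return fn(18) is True and fn(17) is False and fn(30) is True


def is_killed(test: Callable[[Predicate], bool], original: Predicate, mutant: Predicate) -> bool:
    """A mutant is killed when the test passes on the original but fails on the mutant."""
    return test(original) and not test(mutant)


def mutation_score(killed: int, total: int) -> float:
    """Percentage of mutants killed by the test suite."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * killed / total
```

With these definitions, `is_killed(weak_test, is_adult, is_adult_mutant)` is false (the mutant survives) while `is_killed(strong_test, is_adult, is_adult_mutant)` is true. A suite that kills 8 of 10 mutants sits exactly at the 80% threshold.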

Tools

| Stack | Tool | Config | Threshold |
|---|---|---|---|
| TypeScript (admin-ui) | Stryker | admin-ui/stryker.config.mjs | break: 80 |
| Python (ge-bootstrap) | mutmut 3.x | setup.cfg [mutmut] | TDD_MUTATION_THRESHOLD=80 |

Pre-Push Check

# TypeScript
cd admin-ui && npx stryker run --incremental

# Python
mutmut run --max-children 8

Common Mistakes

Shipping code without tests

New files with 0% coverage drag the overall score down, because every mutant generated in untested code survives. Always ship tests with code. This is the #1 cause of pipeline failures.

mutmut 3.x paths_to_mutate format

mutmut 3.x does NOT support comma-separated paths in setup.cfg. Use a directory:

# WRONG (treats entire string as one path):
paths_to_mutate=ge_orchestrator/a.py,ge_orchestrator/b.py

# RIGHT:
paths_to_mutate=ge_orchestrator
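
One way to catch this mistake before a run is a small pre-flight check on the config text. The sketch below is an assumption-laden helper, not part of mutmut: `find_comma_separated_paths` is a hypothetical function that uses only the standard-library `configparser` to read the `[mutmut]` section and report a comma-separated `paths_to_mutate` value.

```python
import configparser


def find_comma_separated_paths(cfg_text: str) -> list[str]:
    """Return the individual entries if paths_to_mutate is comma-separated, else []."""
    # interpolation=None so '%' characters elsewhere in setup.cfg do not raise
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # fallback covers both a missing [mutmut] section and a missing option
    value = parser.get("mutmut", "paths_to_mutate", fallback="")
    entries = [part.strip() for part in value.split(",") if part.strip()]
    return entries if len(entries) > 1 else []


wrong_cfg = "[mutmut]\npaths_to_mutate=ge_orchestrator/a.py,ge_orchestrator/b.py\n"
right_cfg = "[mutmut]\npaths_to_mutate=ge_orchestrator\n"
```

Calling the helper on `wrong_cfg` returns both file paths, which a pre-push script could treat as an error. On `right_cfg` it returns an empty list.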

mutmut sandbox and imports

mutmut 3.x copies tests to a mutants/ sandbox but NOT the source package. A conftest.py in the test directory must add the project root to sys.path:

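# conftest.py: make the source package importable from the mutmut sandbox
import sys
from pathlib import Path

_cwd = Path.cwd()
# Inside the sandbox the working directory is <project>/mutants,
# so the real project root is one level up.
_project_root = str(_cwd.parent if _cwd.name == "mutants" else _cwd)
if _project_root not in sys.path:
    sys.path.insert(0, _project_root)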

Tests that don't actually test behavior

If tests pass but mutation score is low, your tests are likely:

  • Checking types/shapes instead of values
  • Using mocks that don't verify calls
  • Testing happy path only (no edge cases)
  • Asserting existence instead of correctness
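
The contrast can be sketched with a pair of tests over the same code. Everything here is hypothetical illustration (`apply_discount`, `notify`, and both test functions are invented for this example). The weak test passes but would let most mutants survive, while the strong test pins down values, edge cases, and the mock call.

```python
from unittest.mock import Mock


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)


def notify(sender, user: str, total: float) -> None:
    """Hypothetical collaborator call that a test should verify."""
    sender.send(user, f"Your total is {total:.2f}")


def test_weak() -> None:
    result = apply_discount(200.0, 10)
    assert isinstance(result, float)  # checks type, not value
    assert result is not None         # asserts existence, not correctness
    sender = Mock()
    notify(sender, "ana", result)     # mock is never verified


def test_strong() -> None:
    assert apply_discount(200.0, 10) == 180.0   # exact value
    assert apply_discount(200.0, 0) == 200.0    # edge case: no discount
    assert apply_discount(200.0, 100) == 0.0    # edge case: full discount
    sender = Mock()
    notify(sender, "ana", 180.0)
    # verifies the call and its arguments, so removing or altering it fails the test
    sender.send.assert_called_once_with("ana", "Your total is 180.00")
```

If a mutant changed `1 - percent / 100` to `1 + percent / 100`, `test_weak` would still pass, whereas `test_strong` would fail on its first assertion.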

CI Tier

  • mutation:typescript — STANDARD tier (merge-blocking on every MR)
  • test:mutation (Python) — FULL tier (nightly + manual, due to CPU cost)

Incident History

  • 2026-04-10: Score dropped 50.71% → 49.13% when ETF Phase 2 + agent backfill merged without mutation tests. Blocked the pipeline for the entire CI/CD fix session.