DOMAIN:TESTING — THOUGHT_LEADERS¶
OWNER: marije, judith
ALSO_USED_BY: all testing agents (reference material)
UPDATED: 2026-03-24
SCOPE: testing philosophy, key thinkers, foundational resources
CORE_PHILOSOPHY¶
"Write tests. Not too many. Mostly integration."
— Guillermo Rauch (Vercel CEO), popularized by Kent C. Dodds
This quote captures GE's testing philosophy:
1. WRITE_TESTS: non-negotiable. Every feature has tests.
2. NOT_TOO_MANY: tests have a cost (maintenance, speed, false confidence). More is not always better.
3. MOSTLY_INTEGRATION: integration tests give the best confidence-to-cost ratio.
KENT_C_DODDS¶
WHO¶
Creator of Testing Library (React Testing Library, DOM Testing Library).
Primary advocate for testing USER BEHAVIOR over implementation details.
Author of "Testing JavaScript" course and the Testing Trophy concept.
TESTING_TROPHY (vs Test Pyramid)¶
Test pyramid (Mike Cohn, popularized by Martin Fowler): many unit → fewer integration → minimal E2E
Kent's trophy: fewer unit → MANY integration → some E2E → static analysis at base
TROPHY_LAYERS (bottom to top):
1. STATIC: TypeScript, ESLint — catches typos and type errors at zero runtime cost
2. UNIT: isolated function tests — fast but low confidence about integration
3. INTEGRATION: components/services working together — HIGHEST confidence per cost
4. E2E: full user flows — highest confidence but slowest and most expensive
GE_POSITION: we use a HYBRID — pyramid structure with trophy emphasis.
- Static analysis is mandatory (TypeScript strict, ESLint)
- Unit tests for pure logic (calculations, transformations, validators)
- Integration tests for service interactions (API routes, DB queries, component rendering)
- E2E tests for critical user journeys only (login, payment, core CRUD)
KEY_PRINCIPLES¶
PRINCIPLE_1: "The more your tests resemble the way your software is used, the more confidence they can give you."
MEANING: test through the user-facing interface, not internal APIs.
APPLICATION: use getByRole, getByText — not querySelector('.btn-class').
PRINCIPLE_2: "Write tests that give you confidence, not tests that give you coverage."
MEANING: a test that covers code but doesn't verify behavior is worthless.
APPLICATION: mutation testing (Koen's domain) validates this — coverage without detection = theater.
PRINCIPLE_3: "Avoid testing implementation details."
MEANING: don't test HOW something works, test WHAT it produces.
APPLICATION: don't assert on internal state, cache entries, or specific function calls.
GE_NOTE: both GE phases test WHAT, not HOW. The TDD phase (Antje) tests spec compliance; the post-impl phase tests observable behavior.
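EXAMPLE (hedged sketch): the difference between testing WHAT and testing HOW can be shown with a memoizing class. PriceCalculator and both check functions are hypothetical names invented for illustration; a real GE test would use the project's test runner rather than plain boolean checks.

```typescript
// Hypothetical unit under test: a price calculator that memoizes results.
// The cache is an implementation detail; the returned total is the behavior.
class PriceCalculator {
  private cache = new Map<string, number>();

  total(unitPriceCents: number, quantity: number): number {
    const key = `${unitPriceCents}x${quantity}`;
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;
    const result = unitPriceCents * quantity;
    this.cache.set(key, result);
    return result;
  }
}

// Behavior-focused check: asserts only on WHAT the public method returns.
// It keeps passing if the cache is removed or replaced.
function checkTotalBehavior(calc: PriceCalculator): boolean {
  return calc.total(250, 4) === 1000 && calc.total(250, 4) === 1000;
}

// Implementation-coupled check (avoid): reaches into private state.
// It breaks as soon as the cache key format or storage changes,
// even though callers would see no difference.
function checkCacheInternals(calc: PriceCalculator): boolean {
  calc.total(250, 4);
  const internals = calc as unknown as { cache: Map<string, number> };
  return internals.cache.has("250x4");
}
```

Both checks pass today, but only the first one survives a refactor of the caching strategy.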
PRINCIPLE_4: "AHA — Avoid Hasty Abstractions in tests."
MEANING: test code should be MORE explicit than production code, not less.
APPLICATION: prefer duplication in tests over clever shared helpers that obscure what's being tested.
TESTING_LIBRARY_APPROACH¶
Testing Library enforces good testing by making it HARD to test implementation details:
- No access to component internals (state, props, methods)
- Queries by role, text, label — what the user sees
- Encourages accessible markup (if you can't find by role, your HTML needs fixing)
GE_RULE: use Testing Library queries in component tests
GE_RULE: if a test needs container.querySelector, the test is wrong OR the component needs accessibility fixes
MARTIN_FOWLER¶
WHO¶
Author of "Refactoring", co-author of Agile Manifesto.
Popularized the Test Pyramid (originally Mike Cohn's idea) and documented much of the common testing vocabulary.
Chief Scientist at Thoughtworks.
TEST_PYRAMID¶
      /  E2E   \       Slow, expensive, high confidence
     /  Integr. \      Medium speed, medium confidence
    /    Unit    \     Fast, cheap, lower confidence
PYRAMID_RULES:
- MANY unit tests: fast, cheap, isolate bugs quickly
- FEWER integration tests: verify components work together
- MINIMAL E2E tests: verify critical paths only
WHY_PYRAMID_SHAPE:
- Execution speed: unit > integration > E2E
- Maintenance cost: unit < integration < E2E
- Debugging ease: unit > integration > E2E
- False positive rate: unit < integration < E2E
GE_APPLICATION: GE follows the pyramid with a twist — TDD tests (Antje) are mostly unit-level,
post-impl tests (Marije/Judith) span all levels, and reconciliation (Jasper) covers the gaps.
TEST_DOUBLES (Meszaros taxonomy, as summarized by Fowler)¶
DUMMY: passed but never used (fill a parameter)
STUB: provides canned answers to calls
SPY: records calls for later verification
MOCK: pre-programmed with expectations, verifies interaction
FAKE: working implementation but simplified (in-memory database)
GE_PREFERENCE:
- FAKES for databases in integration tests (GE's "fake" here is a dedicated test DB, not an in-memory substitute)
- STUBS for external APIs (predictable responses)
- SPIES for verifying side effects (email sent, event emitted)
- MOCKS sparingly — they couple tests to implementation
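EXAMPLE (hedged sketch): the five doubles can be hand-rolled in a few lines each. Everything below (placeOrder, the Mailer/RateApi/Logger/OrderStore ports, the fixed rate of 0.5) is a hypothetical illustration, not GE code; the fake here is an in-memory Map purely to keep the sketch self-contained.

```typescript
// Hypothetical ports used by the unit under test.
interface Mailer { send(to: string, subject: string): void; }
interface RateApi { eurPerUsd(): number; }
interface Logger { info(message: string): void; }
interface OrderStore { save(id: string, totalEur: number): void; get(id: string): number | undefined; }
interface Deps { rates: RateApi; store: OrderStore; mailer: Mailer; logger: Logger; }

// Unit under test: converts a USD total, stores it, and notifies the customer.
function placeOrder(id: string, totalUsd: number, email: string, deps: Deps): number {
  const converted = Math.round(totalUsd * deps.rates.eurPerUsd() * 100) / 100;
  deps.store.save(id, converted);
  deps.mailer.send(email, `Order ${id} confirmed`);
  return converted;
}

// DUMMY: fills the parameter; placeOrder never calls it.
const dummyLogger: Logger = { info: () => { throw new Error("dummy should not be used"); } };

// STUB: canned answer instead of a real exchange-rate call.
const stubRates: RateApi = { eurPerUsd: () => 0.5 };

// SPY: records calls so the test can verify the side effect afterwards.
class SpyMailer implements Mailer {
  readonly calls: Array<{ to: string; subject: string }> = [];
  send(to: string, subject: string): void { this.calls.push({ to, subject }); }
}

// FAKE: a working but simplified store.
class FakeOrderStore implements OrderStore {
  private rows = new Map<string, number>();
  save(id: string, totalEur: number): void { this.rows.set(id, totalEur); }
  get(id: string): number | undefined { return this.rows.get(id); }
}

// MOCK: expectations are programmed up front; verify() checks the interaction.
class MockMailer implements Mailer {
  private readonly expected: string[];
  private readonly received: string[] = [];
  constructor(expected: string[]) { this.expected = expected; }
  send(to: string, subject: string): void { this.received.push(`${to}|${subject}`); }
  verify(): boolean {
    return this.received.length === this.expected.length &&
      this.received.every((call, i) => call === this.expected[i]);
  }
}

const spyMailer = new SpyMailer();
const fakeStore = new FakeOrderStore();
const totalEur = placeOrder("A1", 20, "kim@example.com",
  { rates: stubRates, store: fakeStore, mailer: spyMailer, logger: dummyLogger });
```

Note how the mock encodes the exact call it expects, which is why it couples the test to the interaction, while the spy leaves the assertion to the test.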
KEY_ARTICLES¶
"Test Pyramid" — https://martinfowler.com/bliki/TestPyramid.html
"Mocks Aren't Stubs" — https://martinfowler.com/articles/mocksArentStubs.html
"Test Double" — https://martinfowler.com/bliki/TestDouble.html
"Practical Test Pyramid" (Ham Vocke) — https://martinfowler.com/articles/practical-test-pyramid.html
"Eradicating Non-Determinism in Tests" — https://martinfowler.com/articles/nonDeterminism.html
GOOGLE_TESTING_BLOG¶
WHO¶
Google's engineering testing team, publishing since 2007.
Source of many industry-standard testing practices.
Blog: https://testing.googleblog.com/
KEY_CONCEPTS¶
CONCEPT_1: TEST_SIZES (not types)
Google classifies tests by RESOURCE usage, not abstraction level:
- SMALL: single process, no I/O, < 1 min (≈ unit)
- MEDIUM: single machine, limited I/O, < 5 min (≈ integration)
- LARGE: multi-machine, full I/O, < 15 min (≈ E2E/system)
GE_APPLICATION: we use type names (unit/integration/E2E) but enforce SIZE constraints:
- Unit: < 10 seconds total suite
- Integration: < 60 seconds total suite
- E2E: < 5 minutes total suite
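EXAMPLE (hedged sketch): the suite budgets above can be expressed as data and checked mechanically. SUITE_BUDGET_MS and overBudget are hypothetical names; how measured durations reach this function (CI timing output, reporter hooks) is left open.

```typescript
type SuiteKind = "unit" | "integration" | "e2e";

// Budgets taken from the limits above, in milliseconds.
const SUITE_BUDGET_MS: Record<SuiteKind, number> = {
  unit: 10_000,
  integration: 60_000,
  e2e: 5 * 60_000,
};

// Returns the suites whose measured duration is not strictly under budget.
function overBudget(durationsMs: Record<SuiteKind, number>): SuiteKind[] {
  return (Object.keys(SUITE_BUDGET_MS) as SuiteKind[])
    .filter((kind) => durationsMs[kind] >= SUITE_BUDGET_MS[kind]);
}

// Example run: only the integration suite is too slow.
const exampleViolations = overBudget({ unit: 8_000, integration: 75_000, e2e: 120_000 });
```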
CONCEPT_2: TESTING ON THE TOILET
Short, practical testing tips. Key ones for GE:
- "Don't Put Logic in Tests" — tests should be obvious, not clever
- "Test Behavior, Not Implementation" — aligns with Dodds
- "Keep Cause and Effect Clear" — test should make it obvious WHY it fails
- "Prefer Testing Public APIs" — aligns with oracle independence
CONCEPT_3: BEYONCE RULE
"If you liked it, then you shoulda put a test on it."
MEANING: if a behavior matters, it has a test. Period.
No implicit "it's obvious" or "it's just logging."
CONCEPT_4: CHANGE_DETECTOR_TESTS
Tests that break when code changes but behavior doesn't = change detector tests.
These are HARMFUL — they slow down refactoring without catching bugs.
EXAMPLE: snapshot tests of internal data structures, tests that assert specific SQL queries.
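EXAMPLE (hedged sketch): a change detector fails on a refactor that leaves behavior untouched. The two TagIndex classes below are hypothetical and differ only in their private representation.

```typescript
interface TagIndex {
  add(tag: string): void;
  has(tag: string): boolean;
  count(): number;
}

// Original implementation: backed by an array.
class ArrayTagIndex implements TagIndex {
  private tags: string[] = [];
  add(tag: string): void { if (!this.tags.includes(tag)) this.tags.push(tag); }
  has(tag: string): boolean { return this.tags.includes(tag); }
  count(): number { return this.tags.length; }
}

// Refactored implementation: backed by a Set, same observable behavior.
class SetTagIndex implements TagIndex {
  private tags = new Set<string>();
  add(tag: string): void { this.tags.add(tag); }
  has(tag: string): boolean { return this.tags.has(tag); }
  count(): number { return this.tags.size; }
}

// Behavior test: passes before and after the refactor.
function behaviorTest(index: TagIndex): boolean {
  index.add("urgent");
  index.add("urgent");
  index.add("billing");
  return index.has("urgent") && !index.has("spam") && index.count() === 2;
}

// Change detector (avoid): snapshots the private data structure.
// A Set serializes differently from an array, so the refactor "fails"
// this test without any behavior having changed.
function changeDetectorTest(index: TagIndex): boolean {
  index.add("urgent");
  index.add("billing");
  const internals = (index as unknown as { tags: unknown }).tags;
  return JSON.stringify(internals) === '["urgent","billing"]';
}
```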
PLAYWRIGHT_TEAM¶
WHO¶
Microsoft team behind Playwright (originally the Puppeteer team at Google who moved to Microsoft).
Led by Andrey Lushnikov and Dmitry Gozman.
KEY_PRINCIPLES¶
PRINCIPLE_1: AUTO_WAIT
Playwright automatically waits for elements to be actionable before interacting.
No manual waitForSelector or sleep — these are anti-patterns.
GE_RULE: if you add a waitForTimeout, you're doing it wrong.
PRINCIPLE_2: TEST_ISOLATION
Every test gets a fresh browser context. No shared state between tests.
Tests can run in any order, in parallel, and still pass.
GE_RULE: if changing test order breaks tests, tests are broken — not the order.
PRINCIPLE_3: WEB_FIRST_ASSERTIONS
await expect(locator).toBeVisible() waits and retries automatically.
Avoid the one-shot form: const isVisible = await locator.isVisible(); expect(isVisible).toBe(true);
GE_RULE: always use web-first assertions — they handle timing automatically.
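EXAMPLE (hedged sketch): why a retrying assertion handles timing while a one-shot check does not. This is a simplified model written without Playwright; checkOnce, expectEventually, and the simulated buttonVisible flag are hypothetical, and Playwright's real retry logic is more involved.

```typescript
// One-shot check: reads the condition once, so the result depends on timing.
function checkOnce(condition: () => boolean): boolean {
  return condition();
}

// Retrying check: polls until the condition holds or the timeout expires.
function expectEventually(
  condition: () => boolean,
  timeoutMs = 500,
  intervalMs = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise<void>((resolve, reject) => {
    const poll = (): void => {
      if (condition()) {
        resolve();
      } else if (Date.now() >= deadline) {
        reject(new Error(`condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(poll, intervalMs);
      }
    };
    poll();
  });
}

// Simulated UI element that becomes visible shortly after an action.
let buttonVisible = false;
setTimeout(() => { buttonVisible = true; }, 30);

// Checked immediately, before the element has appeared.
const oneShotResult = checkOnce(() => buttonVisible);

// Keeps polling, so it settles once the element appears.
const retriedResult = expectEventually(() => buttonVisible);
```

In a Playwright test the retrying form is the `await expect(locator).toBeVisible()` call quoted above; the sketch only models the retry-until-timeout idea behind it.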
PRINCIPLE_4: TRACING
When a test fails, the trace shows EXACTLY what happened: DOM snapshots, network calls, console logs.
GE_RULE: traces are the FIRST debugging artifact in CI failures.
BEST_PRACTICES_FROM_PLAYWRIGHT_DOCS¶
- Use getByRole as primary selector (accessible + resilient)
- Test user-visible behavior, not DOM structure
- Avoid testing third-party dependencies
- Use test.describe.parallel for independent test groups (newer Playwright versions prefer test.describe.configure({ mode: 'parallel' }))
- Use test.step for readability in complex tests
- Use Page Object Model for reusability
STRYKER_TEAM¶
WHO¶
Open-source team behind Stryker mutation testing framework.
Originally built at Info Support (Netherlands — relevant for GE).
Active community with yearly "Stryker Days" events.
KEY_PRINCIPLES¶
PRINCIPLE_1: MUTATION_SCORE_OVER_COVERAGE
Coverage measures code execution. Mutation score measures code TESTING.
100% coverage with 40% mutation score = your tests are lying to you.
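EXAMPLE (hedged sketch): how a suite can execute every line and still let a mutant survive. The mutants are written by hand here to stand in for what a mutation tool generates; isAdult, the suites, and mutationScore are hypothetical names, and this is not how Stryker computes its score internally.

```typescript
type Impl = (age: number) => boolean;
type Check = (impl: Impl) => boolean; // true = the test passes

// Code under test.
const isAdult: Impl = (age) => age >= 18;

// Hand-written mutants: ">=" changed to ">", and the body replaced by "true".
const boundaryMutant: Impl = (age) => age > 18;
const alwaysTrueMutant: Impl = () => true;
const mutants: Impl[] = [boundaryMutant, alwaysTrueMutant];

// Weak suite: executes the whole function (full coverage) but never probes the boundary.
const weakSuite: Check[] = [
  (impl) => impl(30) === true,
  (impl) => impl(5) === false,
];

// Strong suite: adds the boundary case that distinguishes ">=" from ">".
const strongSuite: Check[] = [...weakSuite, (impl) => impl(18) === true];

// A mutant is killed when at least one test fails against it.
function kills(suite: Check[], mutant: Impl): boolean {
  return suite.some((check) => !check(mutant));
}

// Mutation score = killed mutants / total mutants.
function mutationScore(suite: Check[], candidates: Impl[]): number {
  const killed = candidates.filter((mutant) => kills(suite, mutant)).length;
  return killed / candidates.length;
}
```

Both suites pass against isAdult and cover it fully, yet the weak suite lets boundaryMutant survive; only the added boundary case kills it.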
PRINCIPLE_2: INCREMENTAL_BY_DEFAULT
Full mutation testing is slow. Incremental mode makes it practical for CI.
It reuses results from the previous run and only re-tests mutants affected by changed source or test files.
GE_RULE: PR builds use incremental. Weekly full builds catch drift.
PRINCIPLE_3: TYPESCRIPT_CHECKER
Many mutations create type-invalid code. The TypeScript checker eliminates these
without running tests — significant speedup.
GE_RULE: always enable @stryker-mutator/typescript-checker.
STRYKER_RESOURCES¶
Handbook: https://stryker-mutator.io/docs/
Dashboard (public results): https://dashboard.stryker-mutator.io/
Supported mutators: https://stryker-mutator.io/docs/mutation-testing-elements/supported-mutators/
PROPERTY_BASED_TESTING_COMMUNITY¶
FAST_CHECK (Nicolas Dubien)¶
Library: https://github.com/dubzzz/fast-check
Key idea: instead of testing specific examples, test PROPERTIES that hold for all inputs.
Originally inspired by Haskell's QuickCheck.
KEY_CONCEPTS:
- ARBITRARIES: generators for random test data
- PROPERTIES: invariants that must hold for all generated data
- SHRINKING: when a property fails, fast-check finds the SMALLEST failing input
- REPRODUCIBILITY: seeds make random tests deterministic (same seed = same test run)
GE_APPLICATION: Antje uses fast-check in TDD phase for algorithmic requirements.
Properties come from the spec — "output is always positive", "sort is stable", etc.
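EXAMPLE (hedged sketch): the four concepts above, hand-rolled so the block has no dependencies. This deliberately does not use fast-check; mulberry32 is a well-known small seeded PRNG, and intArray, shrink, checkProperty, and buggyMax are hypothetical names. fast-check's real generators and shrinker are far more capable.

```typescript
// REPRODUCIBILITY: a seeded PRNG (mulberry32); same seed, same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// ARBITRARY: generates arrays of 0..7 integers in [-100, 100].
function intArray(rand: () => number): number[] {
  const length = Math.floor(rand() * 8);
  return Array.from({ length }, () => Math.floor(rand() * 201) - 100);
}

// SHRINKING (naive): keep dropping single elements while the property still fails.
function shrink(input: number[], property: (xs: number[]) => boolean): number[] {
  let current = input;
  let improved = true;
  while (improved) {
    improved = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = [...current.slice(0, i), ...current.slice(i + 1)];
      if (!property(candidate)) {
        current = candidate;
        improved = true;
        break;
      }
    }
  }
  return current;
}

// Runs the property against generated inputs; returns a shrunk counterexample or null.
function checkProperty(
  property: (xs: number[]) => boolean,
  seed: number,
  runs = 100,
): { seed: number; input: number[] } | null {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const input = intArray(rand);
    if (!property(input)) return { seed, input: shrink(input, property) };
  }
  return null;
}

// Deliberately buggy unit under test: seeding the fold with 0 breaks all-negative input.
function buggyMax(xs: number[]): number {
  return xs.reduce((max, x) => Math.max(max, x), 0);
}

// PROPERTY: for non-empty input, the maximum is one of the elements.
const maxIsAnElement = (xs: number[]): boolean =>
  xs.length === 0 || xs.includes(buggyMax(xs));

const counterexample = checkProperty(maxIsAnElement, 42);
```

Under these assumptions the shrunk counterexample is a single negative number, which points directly at the faulty initial value, and rerunning with the same seed reproduces it.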
WHEN_PBT_EXCELS¶
- Mathematical functions (commutativity, associativity, identity elements)
- Serialization/deserialization round-trips
- Invariants ("always sorted", "always non-negative", "always valid JSON")
- Fuzz testing input validators (should reject all invalid inputs)
- Idempotent operations (applying twice = applying once)
WHEN_PBT_IS_NOT_HELPFUL¶
- UI rendering (properties are hard to express)
- Integration flows (too many moving parts)
- Business rules with many discrete cases (example-based is clearer)
- One-off configuration tests
KEY_RESOURCES¶
BOOKS¶
"Unit Testing Principles, Practices, and Patterns" — Vladimir Khorikov
BEST_FOR: understanding what makes a good test vs a bad test
KEY_TAKEAWAY: test behavior through public API, not implementation through private state
"Test Driven Development: By Example" — Kent Beck
BEST_FOR: TDD methodology (Antje's core reference)
KEY_TAKEAWAY: red-green-refactor cycle, triangulation, fake it till you make it
"Growing Object-Oriented Software, Guided by Tests" — Freeman & Pryce
BEST_FOR: TDD in object-oriented systems, mock-based testing
KEY_TAKEAWAY: test external boundaries, not internal structure
"Working Effectively with Legacy Code" — Michael Feathers
BEST_FOR: adding tests to untested code
KEY_TAKEAWAY: characterization tests — capture current behavior before refactoring
ARTICLES¶
Kent C. Dodds — "Write tests. Not too many. Mostly integration."
https://kentcdodds.com/blog/write-tests
Kent C. Dodds — "Testing Implementation Details"
https://kentcdodds.com/blog/testing-implementation-details
Kent C. Dodds — "Avoid the Test User"
https://kentcdodds.com/blog/avoid-the-test-user
Ham Vocke (martinfowler.com) — "Practical Test Pyramid"
https://martinfowler.com/articles/practical-test-pyramid.html
Google — "Software Engineering at Google" (Chapter 11: Testing Overview)
https://abseil.io/resources/swe-book/html/ch11.html
CONFERENCE_TALKS¶
"The Magic Tricks of Testing" — Sandi Metz (RailsConf 2013)
KEY_TAKEAWAY: only test messages crossing the boundary of the object under test
"Integrated Tests are a Scam" — J.B. Rainsberger
KEY_TAKEAWAY: (controversial) integrated tests, meaning tests whose result depends on many real components at once, give a false sense of security; isolated tests backed by contract tests are better
GE_POSITION: we disagree partially — GE uses both, reconciliation catches the gaps
GE_SYNTHESIS¶
GE's testing approach combines insights from all these thought leaders:
FROM_DODDS: test user behavior, not implementation. Use accessible selectors. Integration-heavy.
FROM_FOWLER: pyramid structure. Clear test double taxonomy. Deterministic tests.
FROM_GOOGLE: test sizes as resource constraints. Beyonce rule. No change detectors.
FROM_PLAYWRIGHT: auto-wait, test isolation, tracing for debugging.
FROM_STRYKER: mutation score is the real quality metric.
FROM_BECK: TDD red-green-refactor. Triangulation. Tests before code.
FROM_FAST_CHECK: properties for algorithmic correctness.
UNIQUE_TO_GE: two-phase testing (TDD + post-impl) with independent reconciliation.
This is GE's innovation — no thought leader advocates this exact pattern because it requires
two separate testing agents working independently, which only makes sense in a multi-agent system.
CROSS_REFERENCES¶
INDEX: domains/testing/index.md — domain overview and GE's testing architecture
VITEST: domains/testing/vitest-patterns.md — applying these principles in Vitest
PLAYWRIGHT: domains/testing/playwright-e2e.md — applying Playwright team principles
TDD: domains/testing/tdd-methodology.md — applying Beck/TDD principles
PITFALLS: domains/testing/pitfalls.md — anti-patterns that violate these principles