DOMAIN:TESTING — THOUGHT_LEADERS

OWNER: marije, judith
ALSO_USED_BY: all testing agents (reference material)
UPDATED: 2026-03-24
SCOPE: testing philosophy, key thinkers, foundational resources


CORE_PHILOSOPHY

"Write tests. Not too many. Mostly integration."
— Guillermo Rauch (Vercel CEO), popularized by Kent C. Dodds

This single sentence captures GE's testing philosophy:
1. WRITE_TESTS: non-negotiable. Every feature has tests.
2. NOT_TOO_MANY: tests have a cost (maintenance, speed, false confidence). More is not always better.
3. MOSTLY_INTEGRATION: integration tests give the best confidence-to-cost ratio.


KENT_C_DODDS

WHO

Creator of Testing Library (React Testing Library, DOM Testing Library).
Primary advocate for testing USER BEHAVIOR over implementation details.
Author of "Testing JavaScript" course and the Testing Trophy concept.

TESTING_TROPHY (vs Test Pyramid)

Test pyramid (introduced by Mike Cohn, popularized by Martin Fowler): many unit → fewer integration → minimal E2E
Kent's trophy: fewer unit → MANY integration → some E2E → static analysis at base

TROPHY_LAYERS (bottom to top):
1. STATIC: TypeScript, ESLint — catches typos and type errors at zero runtime cost
2. UNIT: isolated function tests — fast but low confidence about integration
3. INTEGRATION: components/services working together — HIGHEST confidence per cost
4. E2E: full user flows — highest confidence but slowest and most expensive

GE_POSITION: we use a HYBRID — pyramid structure with trophy emphasis.
- Static analysis is mandatory (TypeScript strict, ESLint)
- Unit tests for pure logic (calculations, transformations, validators)
- Integration tests for service interactions (API routes, DB queries, component rendering)
- E2E tests for critical user journeys only (login, payment, core CRUD)

KEY_PRINCIPLES

PRINCIPLE_1: "The more your tests resemble the way your software is used, the more confidence they can give you."
MEANING: test through the user-facing interface, not internal APIs.
APPLICATION: use getByRole, getByText — not querySelector('.btn-class').

PRINCIPLE_2: "Write tests that give you confidence, not tests that give you coverage."
MEANING: a test that covers code but doesn't verify behavior is worthless.
APPLICATION: mutation testing (Koen's domain) validates this — coverage without detection = theater.

PRINCIPLE_3: "Avoid testing implementation details."
MEANING: don't test HOW something works, test WHAT it produces.
APPLICATION: don't assert on internal state, cache entries, or specific function calls.
GE_NOTE: both GE phases test WHAT, not HOW. The TDD phase (Antje) tests WHAT the spec requires; the post-impl phase tests WHAT the code observably does.

PRINCIPLE_4: "AHA — Avoid Hasty Abstractions in tests."
MEANING: test code should be MORE explicit than production code, not less.
APPLICATION: prefer duplication in tests over clever shared helpers that obscure what's being tested.

TESTING_LIBRARY_APPROACH

Testing Library enforces good testing by making it HARD to test implementation details:
- No access to component internals (state, props, methods)
- Queries by role, text, label — what the user sees
- Encourages accessible markup (if you can't find by role, your HTML needs fixing)

GE_RULE: use Testing Library queries in component tests
GE_RULE: if a test needs container.querySelector, the test is wrong OR the component needs accessibility fixes


MARTIN_FOWLER

WHO

Author of "Refactoring", co-author of Agile Manifesto.
Popularized the Test Pyramid (originally Mike Cohn's concept) and documented much of the common testing vocabulary.
Chief Scientist at Thoughtworks.

TEST_PYRAMID

        /  E2E  \          Slow, expensive, high confidence
       / Integr. \         Medium speed, medium confidence
      /   Unit    \        Fast, cheap, lower confidence

PYRAMID_RULES:
- MANY unit tests: fast, cheap, isolate bugs quickly
- FEWER integration tests: verify components work together
- MINIMAL E2E tests: verify critical paths only

WHY_PYRAMID_SHAPE:
- Execution speed: unit > integration > E2E
- Maintenance cost: unit < integration < E2E
- Debugging ease: unit > integration > E2E
- False positive rate: unit < integration < E2E

GE_APPLICATION: GE follows the pyramid with a twist — TDD tests (Antje) are mostly unit-level,
post-impl tests (Marije/Judith) span all levels, and reconciliation (Jasper) covers the gaps.

TEST_DOUBLES (Gerard Meszaros's taxonomy, popularized by Fowler)

DUMMY: passed but never used (fill a parameter)
STUB: provides canned answers to calls
SPY: records calls for later verification
MOCK: pre-programmed with expectations, verifies interaction
FAKE: working implementation but simplified (in-memory database)

GE_PREFERENCE:
- FAKES for databases in integration tests (a real test DB instance, not an in-memory substitute)
- STUBS for external APIs (predictable responses)
- SPIES for verifying side effects (email sent, event emitted)
- MOCKS sparingly — they couple tests to implementation
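
The five kinds of double are easier to tell apart side by side. The following is a minimal hand-rolled sketch, assuming a hypothetical registerUser function and hypothetical User, UserStore, Mailer, Blocklist and Logger types that do not come from any GE codebase. No mocking library is used, and the in-memory store is there only to illustrate the FAKE category.

```typescript
// All names below are hypothetical, invented for this illustration.
interface User { id: number; email: string }
interface UserStore {
  insert(email: string): User;
  findByEmail(email: string): User | undefined;
}
interface Mailer { sendWelcome(to: string): void }
interface Blocklist { isDisposable(email: string): boolean }
interface Logger { warn(message: string): void }
interface Deps { store: UserStore; mailer: Mailer; blocklist: Blocklist; logger: Logger }

// System under test: registers a user once and sends one welcome mail.
function registerUser(deps: Deps, email: string): User {
  if (deps.blocklist.isDisposable(email)) {
    deps.logger.warn(`rejected ${email}`);
    throw new Error("disposable email address");
  }
  const existing = deps.store.findByEmail(email);
  if (existing) return existing;
  const user = deps.store.insert(email);
  deps.mailer.sendWelcome(user.email);
  return user;
}

// DUMMY: fills the logger slot; the happy path never calls it.
const dummyLogger: Logger = {
  warn: () => { throw new Error("dummy logger must not be called"); },
};

// STUB: canned answer standing in for an external API.
const stubBlocklist: Blocklist = { isDisposable: () => false };

// SPY: records calls so the test can inspect the side effect afterwards.
const sentTo: string[] = [];
const spyMailer: Mailer = { sendWelcome: (to: string) => { sentTo.push(to); } };

// FAKE: a working but simplified implementation (in-memory instead of a DB).
function createFakeStore(): UserStore {
  const rows = new Map<string, User>();
  let nextId = 1;
  return {
    insert(email: string): User {
      const user: User = { id: nextId++, email };
      rows.set(email, user);
      return user;
    },
    findByEmail(email: string): User | undefined {
      return rows.get(email);
    },
  };
}

// MOCK: carries its own expectation and verifies the interaction itself.
function createMailerMock(expectedRecipient: string): Mailer & { verify(): void } {
  let calls = 0;
  return {
    sendWelcome(to: string): void {
      if (to !== expectedRecipient) throw new Error(`unexpected recipient: ${to}`);
      calls += 1;
    },
    verify(): void {
      if (calls !== 1) throw new Error(`expected exactly 1 call, got ${calls}`);
    },
  };
}

// Usage: the second registration of the same address must not send another mail.
const store = createFakeStore();
const deps: Deps = { store, mailer: spyMailer, blocklist: stubBlocklist, logger: dummyLogger };
const firstUser = registerUser(deps, "ada@example.com");
const sameUser = registerUser(deps, "ada@example.com");
```

After this runs, sentTo holds a single entry, which the test reads as an outcome. The mock version instead fails inside the double when the interaction differs from what was programmed, which is the coupling to implementation that the last preference warns about.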

KEY_ARTICLES

"Test Pyramid" — https://martinfowler.com/bliki/TestPyramid.html
"Mocks Aren't Stubs" — https://martinfowler.com/articles/mocksArentStubs.html
"Test Double" — https://martinfowler.com/bliki/TestDouble.html
"Practical Test Pyramid" — https://martinfowler.com/articles/practical-test-pyramid.html
"Eradicating Non-Determinism in Tests" — https://martinfowler.com/articles/nonDeterminism.html


GOOGLE_TESTING_BLOG

WHO

Google's engineering testing team, publishing since 2007.
Source of many industry-standard testing practices.
Blog: https://testing.googleblog.com/

KEY_CONCEPTS

CONCEPT_1: TEST_SIZES (not types)
Google classifies tests by RESOURCE usage, not abstraction level:
- SMALL: single process, no I/O, < 1 min (≈ unit)
- MEDIUM: single machine, limited I/O, < 5 min (≈ integration)
- LARGE: multi-machine, full I/O, < 15 min (≈ E2E/system)

GE_APPLICATION: we use type names (unit/integration/E2E) but enforce SIZE constraints:
- Unit: < 10 seconds total suite
- Integration: < 60 seconds total suite
- E2E: < 5 minutes total suite
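
One way such budgets could be enforced is a small check that runs after the test suites in CI. This is a sketch under assumptions: SuiteRun and findBudgetViolations are hypothetical names, and how suite durations are collected is left open. Only the three budget values come from the list above.

```typescript
type SuiteKind = "unit" | "integration" | "e2e";

// Budgets in seconds, taken from the GE size constraints (each is a strict upper bound).
const SUITE_BUDGET_SECONDS: Record<SuiteKind, number> = {
  unit: 10,
  integration: 60,
  e2e: 300,
};

// Hypothetical shape of a measured suite run.
interface SuiteRun {
  kind: SuiteKind;
  durationSeconds: number;
}

// Returns one human-readable message per suite that did not stay under its budget.
function findBudgetViolations(runs: SuiteRun[]): string[] {
  return runs
    .filter((run) => run.durationSeconds >= SUITE_BUDGET_SECONDS[run.kind])
    .map(
      (run) =>
        `${run.kind} suite took ${run.durationSeconds}s (budget: < ${SUITE_BUDGET_SECONDS[run.kind]}s)`,
    );
}

// Example measurements (made up): the unit suite is fine, the other two are over budget.
const violations = findBudgetViolations([
  { kind: "unit", durationSeconds: 8 },
  { kind: "integration", durationSeconds: 75 },
  { kind: "e2e", durationSeconds: 300 },
]);
```

Because the budgets are strict upper bounds, an E2E suite that takes exactly 300 seconds counts as a violation.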

CONCEPT_2: TESTING ON THE TOILET
Short, practical testing tips. Key ones for GE:
- "Don't Put Logic in Tests" — tests should be obvious, not clever
- "Test Behavior, Not Implementation" — aligns with Dodds
- "Keep Cause and Effect Clear" — test should make it obvious WHY it fails
- "Prefer Testing Public APIs" — aligns with oracle independence

CONCEPT_3: BEYONCE RULE
"If you liked it, then you shoulda put a test on it."
MEANING: if a behavior matters, it has a test. Period.
No implicit "it's obvious" or "it's just logging."

CONCEPT_4: CHANGE_DETECTOR_TESTS
Tests that break when code changes but behavior doesn't = change detector tests.
These are HARMFUL — they slow down refactoring without catching bugs.
EXAMPLE: snapshot tests of internal data structures, tests that assert specific SQL queries.
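
The difference shows up once a refactoring changes the internals without changing behavior. A minimal sketch, assuming a hypothetical Cart with two interchangeable implementations. The debugState method exists only so the harmful test has something internal to snapshot.

```typescript
// Hypothetical cart interface, invented for this illustration.
interface Cart {
  add(sku: string, qty: number): void;
  quantityOf(sku: string): number;
  debugState(): unknown; // internal representation; only the harmful test looks at it
}

// Version 1: stores one line per add() call.
function createCartV1(): Cart {
  const lines: Array<{ sku: string; qty: number }> = [];
  return {
    add(sku: string, qty: number): void { lines.push({ sku, qty }); },
    quantityOf(sku: string): number {
      return lines.filter((line) => line.sku === sku).reduce((sum, line) => sum + line.qty, 0);
    },
    debugState(): unknown { return lines; },
  };
}

// Version 2: refactored to a Map keyed by SKU. Observable behavior is unchanged.
function createCartV2(): Cart {
  const totals = new Map<string, number>();
  return {
    add(sku: string, qty: number): void { totals.set(sku, (totals.get(sku) ?? 0) + qty); },
    quantityOf(sku: string): number { return totals.get(sku) ?? 0; },
    debugState(): unknown { return Array.from(totals.entries()); },
  };
}

// Behavior test: asserts only on what callers can observe.
function behaviorTest(makeCart: () => Cart): boolean {
  const cart = makeCart();
  cart.add("apple", 2);
  cart.add("apple", 3);
  return cart.quantityOf("apple") === 5 && cart.quantityOf("pear") === 0;
}

// Change detector: snapshots the internal data structure of version 1.
function changeDetectorTest(makeCart: () => Cart): boolean {
  const cart = makeCart();
  cart.add("apple", 2);
  cart.add("apple", 3);
  const snapshot = '[{"sku":"apple","qty":2},{"sku":"apple","qty":3}]';
  return JSON.stringify(cart.debugState()) === snapshot;
}
```

behaviorTest passes for both createCartV1 and createCartV2. changeDetectorTest passes only for version 1, so it blocks the refactoring to version 2 even though no behavior changed.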


PLAYWRIGHT_TEAM

WHO

Microsoft team behind Playwright (started by engineers who built Puppeteer at Google and later moved to Microsoft).
Led by Andrey Lushnikov and Dmitry Gozman.

KEY_PRINCIPLES

PRINCIPLE_1: AUTO_WAIT
Playwright automatically waits for elements to be actionable before interacting.
No manual waitForSelector or sleep — these are anti-patterns.
GE_RULE: if you add a waitForTimeout, you're doing it wrong.

PRINCIPLE_2: TEST_ISOLATION
Every test gets a fresh browser context. No shared state between tests.
Tests can run in any order, in parallel, and still pass.
GE_RULE: if changing test order breaks tests, tests are broken — not the order.
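
The order rule can be shown without a browser. The sketch below is plain TypeScript, not Playwright code. check, allPass and both test factories are hypothetical helpers, and freshContext stands in for the fresh browser context each Playwright test receives.

```typescript
type TestFn = () => void;

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// Runs tests in the given order; true only if none of them throws.
function allPass(tests: TestFn[]): boolean {
  try {
    for (const test of tests) test();
    return true;
  } catch {
    return false;
  }
}

// Broken design: both tests read and write the same mutable list.
function makeSharedStateTests(): TestFn[] {
  const sharedTodos: string[] = [];
  const startsEmpty: TestFn = () => {
    check(sharedTodos.length === 0, "list should start empty");
  };
  const addsTodo: TestFn = () => {
    sharedTodos.push("buy milk");
    check(sharedTodos.length === 1, "list should contain one todo");
  };
  return [startsEmpty, addsTodo];
}

// Isolated design: every test builds its own context.
function makeIsolatedTests(): TestFn[] {
  const freshContext = (): { todos: string[] } => ({ todos: [] });
  const startsEmpty: TestFn = () => {
    const ctx = freshContext();
    check(ctx.todos.length === 0, "list should start empty");
  };
  const addsTodo: TestFn = () => {
    const ctx = freshContext();
    ctx.todos.push("buy milk");
    check(ctx.todos.length === 1, "list should contain one todo");
  };
  return [startsEmpty, addsTodo];
}

const sharedPassesInOrder = allPass(makeSharedStateTests());
const sharedPassesReversed = allPass(makeSharedStateTests().reverse());
const isolatedPassesInOrder = allPass(makeIsolatedTests());
const isolatedPassesReversed = allPass(makeIsolatedTests().reverse());
```

The shared-state pair passes in its original order and fails when reversed. The isolated pair passes in both orders.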

PRINCIPLE_3: WEB_FIRST_ASSERTIONS
await expect(locator).toBeVisible() waits and retries automatically.
No const isVisible = await locator.isVisible(); expect(isVisible).toBe(true);
GE_RULE: always use web-first assertions — they handle timing automatically.
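
Why the one-shot check is timing-sensitive while the retrying form is not can be sketched in plain TypeScript. This illustrates the retry idea only and is not Playwright's implementation. expectEventually and createSlowBanner are hypothetical helpers, and the timeout and interval values are arbitrary.

```typescript
interface RetryOptions {
  timeoutMs: number;
  intervalMs: number;
}

// Polls the probe until it returns true or the timeout elapses.
async function expectEventually(
  probe: () => boolean | Promise<boolean>,
  options: RetryOptions,
): Promise<void> {
  const deadline = Date.now() + options.timeoutMs;
  while (true) {
    if (await probe()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${options.timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
  }
}

// Simulated UI element that becomes visible only after a delay,
// for example a banner rendered once a network response arrives.
function createSlowBanner(delayMs: number): { isVisible(): boolean } {
  const visibleAt = Date.now() + delayMs;
  return { isVisible: () => Date.now() >= visibleAt };
}

// One-shot style: samples the state once, so the result depends on timing.
function oneShotCheck(banner: { isVisible(): boolean }): boolean {
  return banner.isVisible();
}

// Retrying style: keeps polling, so it passes as soon as the banner appears.
async function retryingCheck(banner: { isVisible(): boolean }): Promise<void> {
  await expectEventually(() => banner.isVisible(), { timeoutMs: 1000, intervalMs: 5 });
}
```

With a banner that appears after 50 ms, oneShotCheck called immediately returns false even though nothing is wrong with the page, while retryingCheck resolves shortly after the banner becomes visible.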

PRINCIPLE_4: TRACING
When a test fails, the trace shows EXACTLY what happened: DOM snapshots, network calls, console logs.
GE_RULE: traces are the FIRST debugging artifact in CI failures.

BEST_PRACTICES_FROM_PLAYWRIGHT_DOCS

- Use getByRole as primary selector (accessible + resilient)
- Test user-visible behavior, not DOM structure
- Avoid testing third-party dependencies
- Use test.describe.configure({ mode: 'parallel' }) for independent test groups (the older test.describe.parallel is discouraged in current Playwright docs)
- Use test.step for readability in complex tests
- Use Page Object Model for reusability

STRYKER_TEAM

WHO

Open-source team behind Stryker mutation testing framework.
Originally built at Info Support (Netherlands — relevant for GE).
Active community with yearly "Stryker Days" events.

KEY_PRINCIPLES

PRINCIPLE_1: MUTATION_SCORE_OVER_COVERAGE
Coverage measures code execution. Mutation score measures code TESTING.
100% coverage with 40% mutation score = your tests are lying to you.
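
The gap between the two metrics can be reproduced by hand. The sketch below does not use Stryker. The mutants are written manually in the style a mutation tool would generate, and isAdult, the three suites and mutationScore are hypothetical names for illustration.

```typescript
type Impl = (age: number) => boolean;

// Code under test.
const isAdult: Impl = (age) => age >= 18;

// Hand-written mutants: small changes a mutation testing tool might make.
const mutants: Array<{ name: string; impl: Impl }> = [
  { name: "age > 18", impl: (age) => age > 18 }, // boundary change
  { name: "age < 18", impl: (age) => age < 18 }, // negated comparison
  { name: "always true", impl: () => true },
  { name: "always false", impl: () => false },
];

// A suite returns true when all of its assertions pass for the given implementation.
type Suite = (impl: Impl) => boolean;

// Executes the code (full line coverage) but asserts nothing.
const coverageOnlySuite: Suite = (impl) => {
  impl(30);
  impl(5);
  return true;
};

// Asserts one happy-path value only.
const weakSuite: Suite = (impl) => impl(30) === true;

// Asserts both sides of the boundary.
const strongSuite: Suite = (impl) =>
  impl(30) === true && impl(5) === false && impl(18) === true && impl(17) === false;

// Mutation score: fraction of mutants for which the suite fails (the mutant is "killed").
function mutationScore(suite: Suite): number {
  const killed = mutants.filter((mutant) => !suite(mutant.impl)).length;
  return killed / mutants.length;
}

const scores = {
  coverageOnly: mutationScore(coverageOnlySuite),
  weak: mutationScore(weakSuite),
  strong: mutationScore(strongSuite),
};
```

All three suites execute every line of isAdult, so line coverage is identical. The mutation scores are 0, 0.5 and 1 respectively. The weak suite lets the boundary mutant and the always-true mutant survive.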

PRINCIPLE_2: INCREMENTAL_BY_DEFAULT
Full mutation testing is slow. Incremental mode makes it practical for CI.
Only re-test mutants affected by code or test changes since the last run; results for untouched mutants are reused.
GE_RULE: PR builds use incremental. Weekly full builds catch drift.

PRINCIPLE_3: TYPESCRIPT_CHECKER
Many mutations create type-invalid code. The TypeScript checker eliminates these
without running tests — significant speedup.
GE_RULE: always enable @stryker-mutator/typescript-checker.

STRYKER_RESOURCES

Handbook: https://stryker-mutator.io/docs/
Dashboard (public results): https://dashboard.stryker-mutator.io/
Supported mutators: https://stryker-mutator.io/docs/mutation-testing-elements/supported-mutators/


PROPERTY_BASED_TESTING_COMMUNITY

FAST_CHECK (Nicolas Dubien)

Library: https://github.com/dubzzz/fast-check
Key idea: instead of testing specific examples, test PROPERTIES that hold for all inputs.
Originally inspired by Haskell's QuickCheck.

KEY_CONCEPTS:
- ARBITRARIES: generators for random test data
- PROPERTIES: invariants that must hold for all generated data
- SHRINKING: when a property fails, fast-check finds the SMALLEST failing input
- REPRODUCIBILITY: seeds make random tests deterministic (same seed = same test run)

GE_APPLICATION: Antje uses fast-check in TDD phase for algorithmic requirements.
Properties come from the spec — "output is always positive", "sort is stable", etc.
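
The four concepts can be seen working together in a small hand-rolled checker. This sketch is not fast-check's API or implementation. mulberry32, arbitraryIntArray, shrinkCandidates and checkProperty are written for this illustration, and the deliberately buggy buggyMax is hypothetical. In fast-check the corresponding pieces would be arbitraries such as fc.array(fc.integer()) combined with fc.property and fc.assert.

```typescript
type Rng = () => number; // uniform in [0, 1)

// REPRODUCIBILITY: a small seeded PRNG (mulberry32 style). Same seed, same sequence.
function mulberry32(seed: number): Rng {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// ARBITRARY: integer arrays with length 0..10 and values -100..100.
function arbitraryIntArray(rng: Rng): number[] {
  const length = Math.floor(rng() * 11);
  return Array.from({ length }, () => Math.floor(rng() * 201) - 100);
}

// SHRINKING: simpler variants of an input (one element dropped, or one value nearer 0).
function shrinkCandidates(xs: number[]): number[][] {
  const candidates: number[][] = [];
  xs.forEach((_, i) => candidates.push([...xs.slice(0, i), ...xs.slice(i + 1)]));
  xs.forEach((x, i) => {
    if (x === 0) return; // already minimal
    for (const smaller of [Math.trunc(x / 2), x - Math.sign(x)]) {
      candidates.push([...xs.slice(0, i), smaller, ...xs.slice(i + 1)]);
    }
  });
  return candidates;
}

interface CheckResult { ok: boolean; seed: number; counterexample?: number[] }

// PROPERTY check: try random inputs; on failure, shrink greedily to a small counterexample.
function checkProperty(property: (xs: number[]) => boolean, seed: number, runs = 200): CheckResult {
  const rng = mulberry32(seed);
  for (let run = 0; run < runs; run++) {
    let failing = arbitraryIntArray(rng);
    if (property(failing)) continue;
    let shrunk = true;
    while (shrunk) {
      shrunk = false;
      for (const candidate of shrinkCandidates(failing)) {
        if (!property(candidate)) { failing = candidate; shrunk = true; break; }
      }
    }
    return { ok: false, seed, counterexample: failing };
  }
  return { ok: true, seed };
}

// Code under test: buggyMax seeds the fold with 0, which breaks all-negative inputs.
const buggyMax = (xs: number[]): number => xs.reduce((m, x) => Math.max(m, x), 0);
const correctMax = (xs: number[]): number => xs.reduce((m, x) => Math.max(m, x), -Infinity);

// Property from a spec: "the maximum of a non-empty list is one of its elements".
const maxIsAnElement = (maxFn: (xs: number[]) => number) => (xs: number[]): boolean =>
  xs.length === 0 || xs.includes(maxFn(xs));

const correctResult = checkProperty(maxIsAnElement(correctMax), 42);
const buggyResult = checkProperty(maxIsAnElement(buggyMax), 42);
```

The property holds for correctMax. For buggyMax, whichever all-negative array the generator finds first shrinks to the counterexample [-1], which is far easier to debug than the original random input.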

WHEN_PBT_EXCELS

- Mathematical functions (commutativity, associativity, identity elements)
- Serialization/deserialization round-trips
- Invariants ("always sorted", "always non-negative", "always valid JSON")
- Fuzz testing input validators (should reject all invalid inputs)
- Idempotent operations (applying twice = applying once)

WHEN_PBT_IS_NOT_HELPFUL

- UI rendering (properties are hard to express)
- Integration flows (too many moving parts)
- Business rules with many discrete cases (example-based is clearer)
- One-off configuration tests

KEY_RESOURCES

BOOKS

"Unit Testing Principles, Practices, and Patterns" — Vladimir Khorikov
BEST_FOR: understanding what makes a good test vs a bad test
KEY_TAKEAWAY: test behavior through public API, not implementation through private state

"Test Driven Development: By Example" — Kent Beck
BEST_FOR: TDD methodology (Antje's core reference)
KEY_TAKEAWAY: red-green-refactor cycle, triangulation, fake it till you make it

"Growing Object-Oriented Software, Guided by Tests" — Freeman & Pryce
BEST_FOR: TDD in object-oriented systems, mock-based testing
KEY_TAKEAWAY: test external boundaries, not internal structure

"Working Effectively with Legacy Code" — Michael Feathers
BEST_FOR: adding tests to untested code
KEY_TAKEAWAY: characterization tests — capture current behavior before refactoring

ARTICLES

Kent C. Dodds — "Write tests. Not too many. Mostly integration."
https://kentcdodds.com/blog/write-tests

Kent C. Dodds — "Testing Implementation Details"
https://kentcdodds.com/blog/testing-implementation-details

Kent C. Dodds — "Avoid the Test User"
https://kentcdodds.com/blog/avoid-the-test-user

Martin Fowler — "Practical Test Pyramid"
https://martinfowler.com/articles/practical-test-pyramid.html

Google — "Software Engineering at Google" (Chapter 11: Testing Overview)
https://abseil.io/resources/swe-book/html/ch11.html

CONFERENCE_TALKS

"The Magic Tricks of Testing" — Sandi Metz (RailsConf 2013)
KEY_TAKEAWAY: only test messages crossing the boundary of the object under test

"Integrated Tests are a Scam" — J.B. Rainsberger
KEY_TAKEAWAY: (controversial) integrated tests give a false sense of security and scale poorly; isolated unit tests backed by contract tests are better
GE_POSITION: we disagree partially — GE uses both, reconciliation catches the gaps


GE_SYNTHESIS

GE's testing approach combines insights from all these thought leaders:

FROM_DODDS: test user behavior, not implementation. Use accessible selectors. Integration-heavy.
FROM_FOWLER: pyramid structure. Clear test double taxonomy. Deterministic tests.
FROM_GOOGLE: test sizes as resource constraints. Beyonce rule. No change detectors.
FROM_PLAYWRIGHT: auto-wait, test isolation, tracing for debugging.
FROM_STRYKER: mutation score is the real quality metric.
FROM_BECK: TDD red-green-refactor. Triangulation. Tests before code.
FROM_FAST_CHECK: properties for algorithmic correctness.

UNIQUE_TO_GE: two-phase testing (TDD + post-impl) with independent reconciliation.
This is GE's innovation — no thought leader advocates this exact pattern because it requires
two separate testing agents working independently, which only makes sense in a multi-agent system.


CROSS_REFERENCES

INDEX: domains/testing/index.md — domain overview and GE's testing architecture
VITEST: domains/testing/vitest-patterns.md — applying these principles in Vitest
PLAYWRIGHT: domains/testing/playwright-e2e.md — applying Playwright team principles
TDD: domains/testing/tdd-methodology.md — applying Beck/TDD principles
PITFALLS: domains/testing/pitfalls.md — anti-patterns that violate these principles