DOMAIN:DEVOPS — RELEASE_READINESS_RUBRIC¶
OWNER: marta
ALSO_USED_BY: iwona (co-evaluator), koen (lint input), nessa (performance input), jasper (reconciliation input)
UPDATED: 2026-03-24
SCOPE: scoring rubric for release readiness assessment — used by Marta and Iwona to produce a numeric score for every PR and release candidate
PURPOSE: make merge/block decisions objective, reproducible, and auditable
SCORING_OVERVIEW¶
Every PR receives a score from 0 to 100.
Threshold: >= 70 to merge. < 70 is blocked.
The score is computed from 7 weighted criteria.
Each criterion is scored 0-10 with Pass/Partial/Fail definitions.
SEE: devops/merge-gate-calibration.md for worked examples at different score levels.
CRITERIA_TABLE¶
| # | Criterion | Weight | Source |
|---|---|---|---|
| 1 | Tests pass | 25% | CI pipeline, Marije |
| 2 | Spec traceability | 20% | Jasper reconciliation report |
| 3 | No test weakening | 15% | Diff analysis (Marta) |
| 4 | Koen clean | 15% | Koen lint report |
| 5 | Performance budget | 10% | Nessa performance report |
| 6 | Code churn | 10% | Git history analysis |
| 7 | PR size | 5% | Diff stats |
FORMULA: Score = SUM((criterion_score / 10) * weight), with weight taken as points out of 100 (e.g. 25% → 25 points), so a perfect PR scores 100.
CRITERION_1: TESTS PASS (25%)¶
Measures: Do all tests in the suite pass on the current build?
| Score | Definition |
|---|---|
| 10 | All tests pass. Zero failures, zero skips. |
| 8 | All tests pass. 1-2 tests skipped with tracked tickets and documented reason. |
| 6 | All tests pass. 3-5 tests skipped. Skipped tests are not in critical paths. |
| 4 | 1-3 test failures. Failures are in non-critical paths. Developer claims "known issue." |
| 2 | 4+ test failures or any failure in a critical path (auth, payment, data integrity). |
| 0 | Test suite does not run, crashes, or has been disabled. |
CRITICAL_PATHS: authentication, authorization, payment, data persistence, API contracts.
A single failure in a critical path caps this criterion at 2.
SKIP_RULES:
- Skipped tests MUST have a tracked ticket number in the skip reason
- Skipped tests MUST NOT be in critical paths
- More than 5 skips in a single suite suggests the suite needs attention, not more skips
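The SKIP_RULES above lend themselves to an automated pre-check. A minimal sketch, assuming a Jira-style ticket ID format (`REL-412`) and that critical-path tests are identifiable by substrings in their file paths — both conventions are assumptions, not requirements of this rubric:

```typescript
// Ticket ID pattern is an assumption — adjust to your tracker's convention.
const TICKET_PATTERN = /\b[A-Z][A-Z0-9]+-\d+\b/;

// Matching CRITICAL_PATHS (Criterion 1) by file-path substring is a heuristic.
const CRITICAL_PATH_HINTS = ["auth", "payment", "persistence", "contract"];

interface SkippedTest {
  file: string;   // path of the test file containing the skip
  reason: string; // the skip reason string
}

/** Returns SKIP_RULES violations for one skipped test (empty = compliant). */
function skipViolations(t: SkippedTest): string[] {
  const violations: string[] = [];
  if (!TICKET_PATTERN.test(t.reason)) {
    violations.push("skip reason has no tracked ticket number");
  }
  if (CRITICAL_PATH_HINTS.some((hint) => t.file.toLowerCase().includes(hint))) {
    violations.push("skipped test is in a critical path");
  }
  return violations;
}
```

A skip annotated `"flaky upload, see REL-412"` in a non-critical file passes; an unannotated skip under `src/auth/` violates both rules.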
CRITERION_2: SPEC TRACEABILITY (20%)¶
Measures: Can every spec requirement be traced to at least one test?
| Score | Definition |
|---|---|
| 10 | 100% of spec requirements have corresponding tests. Mapping is documented. |
| 8 | 90-99% coverage. Missing items are cosmetic (tooltips, placeholder text). |
| 6 | 75-89% coverage. Missing items are non-critical but behavioral. |
| 4 | 50-74% coverage. Significant spec items untested. |
| 2 | 25-49% coverage. Most spec items untested. |
| 0 | No traceability. Tests exist but don't map to spec. |
INPUT: Jasper's reconciliation report provides the coverage matrix.
If Jasper has not run reconciliation, Marta must request it before scoring.
SPEC_ITEM_CLASSIFICATION:
- CRITICAL: auth, payment, data integrity, API contracts → must be tested (mandatory)
- IMPORTANT: core user flows, business logic, error handling → should be tested
- COSMETIC: UI text, tooltips, animation timing → nice to have
Missing CRITICAL items cap this criterion at 3.
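The coverage bands and the CRITICAL cap can be sketched as a lookup. The qualitative conditions in the table (cosmetic vs. behavioral missing items) still need evaluator judgment and are not modeled here:

```typescript
// Criterion 2 sketch: coverage percentage from Jasper's matrix, plus a flag
// for any untested CRITICAL spec item.
function specTraceabilityScore(coveragePct: number, missingCritical: boolean): number {
  let score: number;
  if (coveragePct >= 100) score = 10;
  else if (coveragePct >= 90) score = 8;
  else if (coveragePct >= 75) score = 6;
  else if (coveragePct >= 50) score = 4;
  else if (coveragePct >= 25) score = 2;
  else score = 0;
  // Missing CRITICAL items cap this criterion at 3.
  return missingCritical ? Math.min(score, 3) : score;
}
```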
CRITERION_3: NO TEST WEAKENING (15%)¶
Measures: Did this PR delete, skip, or weaken any previously passing test assertions?
| Score | Definition |
|---|---|
| 10 | No test assertions deleted, skipped, or weakened. Test coverage maintained or improved. |
| 8 | Test assertions changed but functionally equivalent (e.g., renamed, restructured). |
| 6 | 1-2 non-critical assertions removed with documented justification (behavior intentionally changed per spec). |
| 4 | 3+ assertions removed, OR any critical-path assertion weakened, even with justification. |
| 2 | Test file deleted without replacement. |
| 0 | Systematic test weakening: multiple files, pattern of removing assertions to make tests pass. |
HARD_BLOCK_RULE (TW-1): If any previously passing assertion in a critical path is deleted without spec-backed justification, this criterion scores 0 AND triggers a hard block regardless of total score.
WHAT_COUNTS_AS_WEAKENING:
- Deleting an expect() call
- Changing toBe(specificValue) to toBeDefined()
- Loosening toHaveLength(5) to a vaguer check (e.g. asserting only that the array is non-empty)
- Adding .skip to a previously passing test
- Wrapping a test in try/catch that swallows the assertion error
WHAT_DOES_NOT_COUNT:
- Updating an expected value because the spec changed (with Anna's spec change documented)
- Moving assertions to a different test file
- Replacing one assertion with a more specific one
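A crude first-pass for the diff analysis is counting `expect()` calls on removed lines of a unified diff. This is a review heuristic, not a verdict — it cannot distinguish a deleted assertion from one moved to another file (which explicitly does not count as weakening), so flagged hits go to Marta for judgment:

```typescript
// Count expect() calls on removal lines of a unified diff of test files.
function removedExpectCount(unifiedDiff: string): number {
  let count = 0;
  for (const line of unifiedDiff.split("\n")) {
    // "-" marks a removed line; skip the "---" old-file header.
    if (line.startsWith("-") && !line.startsWith("---") && line.includes("expect(")) {
      count++;
    }
  }
  return count;
}
```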
CRITERION_4: KOEN CLEAN (15%)¶
Measures: Does the code pass lint, format, and static analysis checks?
| Score | Definition |
|---|---|
| 10 | Zero errors, zero warnings. Clean pass. |
| 8 | Zero errors, 1-3 warnings. Warnings are cosmetic (unused imports, line length). |
| 6 | Zero errors, 4+ warnings. OR 1 error that is a false positive with a documented override. |
| 4 | 1-2 errors. Errors are style-related, not logic-related. |
| 2 | 3+ errors, OR any use of the `any` type in TypeScript (type safety violation). |
| 0 | Lint does not run or has been disabled for this PR. |
KOEN_REPORTS: Koen provides the lint report as part of the pipeline. If Koen has not run, Marta must request it.
ERROR_CLASSIFICATION:
- TYPE_SAFETY: any type, missing return types, unchecked nulls → always errors
- STYLE: naming conventions, import order, line length → warnings
- UNUSED: dead code, unused variables → warnings (but accumulated unused code is a smell)
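The score bands reduce to simple counts once Koen's report is in. Qualitative calls (false-positive overrides, style-vs-logic errors) are deliberately left out of this sketch and stay with the evaluator:

```typescript
// Criterion 4 sketch from Koen's report, reduced to counts plus an
// `any`-usage flag. `ran` is false when lint did not run or was disabled.
function koenScore(ran: boolean, errors: number, warnings: number, hasAnyType: boolean): number {
  if (!ran) return 0;                   // lint disabled or broken
  if (errors >= 3 || hasAnyType) return 2;
  if (errors >= 1) return 4;            // 1-2 errors
  if (warnings >= 4) return 6;
  if (warnings >= 1) return 8;          // 1-3 cosmetic warnings
  return 10;                            // clean pass
}
```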
CRITERION_5: PERFORMANCE BUDGET (10%)¶
Measures: Does the build stay within performance budgets?
| Score | Definition |
|---|---|
| 10 | All metrics within budget. No regressions detected. |
| 8 | All metrics within budget. Minor regression (< 10%) on non-critical endpoint. |
| 6 | One metric marginally over budget (< 15% over). Non-critical endpoint. |
| 4 | One metric significantly over budget (15-50% over), OR critical endpoint marginally over. |
| 2 | Multiple metrics over budget, OR critical endpoint significantly over. |
| 0 | Performance tests not run, OR any metric > 2x budget. |
IF_NO_PERFORMANCE_DATA:
- New feature with no baseline: score N/A, weight redistributed to other criteria
- Performance tests skipped by developer: score 0 (not N/A — skipping is a choice)
- Performance tests flaky: score 5, flag for infrastructure investigation
SEE: performance/performance-rubric.md for per-metric thresholds.
SEE: performance/calibration-examples.md for pass/fail examples.
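The over-budget bands above can be sketched from Nessa's metrics. Two assumptions: regression handling (the 8 band) and flaky-test handling are left to the evaluator, and the table leaves the 50-100%-over range undefined, which this sketch treats like the 15-50% band:

```typescript
// One entry per budgeted metric; pctOverBudget = 0 means within budget,
// 100 means at 2x budget.
interface PerfMetric { pctOverBudget: number; critical: boolean }

function performanceScore(ran: boolean, metrics: PerfMetric[]): number {
  if (!ran) return 0;                                        // tests not run
  if (metrics.some((m) => m.pctOverBudget > 100)) return 0;  // > 2x budget
  const over = metrics.filter((m) => m.pctOverBudget > 0);
  const criticalOver = over.filter((m) => m.critical);
  if (over.length === 0) return 10;
  if (over.length > 1 || criticalOver.some((m) => m.pctOverBudget >= 15)) return 2;
  if (over.some((m) => m.pctOverBudget >= 15) || criticalOver.length > 0) return 4;
  return 6; // one non-critical metric marginally (< 15%) over
}
```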
CRITERION_6: CODE CHURN (10%)¶
Measures: Is this code area stable, or is it being repeatedly patched?
| Score | Definition |
|---|---|
| 10 | All changed files have 0-1 changes in the past 30 days. Stable code. |
| 8 | Most files stable. 1 file has 2 changes in 14 days (common during active development). |
| 6 | 2-3 files have 2 changes each in 14 days. Active development area. |
| 4 | Any file has 3+ changes in 14 days. Pattern suggests unstable code. |
| 2 | Multiple files with 3+ changes in 14 days. Systematic instability. |
| 0 | Same logic file changed 5+ times in 14 days. Fundamental design problem. |
EXCLUDED_FROM_CHURN:
- Test files (expected to change alongside code)
- Configuration files (expected to change during setup)
- Auto-generated files (migrations, lockfiles)
- Documentation files
CHURN_CONTEXT:
- High churn during initial feature development (first 2 weeks) is normal — don't penalize
- High churn on bug fixes to the same file is a red flag — penalize
- Use git log to distinguish "building" from "patching"
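Once the excluded file types are filtered out of the git history, the bands reduce to per-file change counts. A sketch, with one simplification: the 30-day "stable" condition for a 10 is approximated as "no file changed more than once in the window":

```typescript
// Criterion 6 sketch: changesPerFile holds, for each non-excluded changed
// file, its number of changes in the past 14 days.
function churnScore(changesPerFile: number[]): number {
  const max = Math.max(0, ...changesPerFile);
  const filesWith2 = changesPerFile.filter((n) => n === 2).length;
  const filesWith3Plus = changesPerFile.filter((n) => n >= 3).length;
  if (max >= 5) return 0;        // same file changed 5+ times: design problem
  if (filesWith3Plus >= 2) return 2;
  if (filesWith3Plus === 1) return 4;
  if (filesWith2 >= 2) return 6;
  if (filesWith2 === 1) return 8;
  return 10;                     // all files at 0-1 changes
}
```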
CRITERION_7: PR SIZE (5%)¶
Measures: Is the PR appropriately scoped?
| Score | Definition |
|---|---|
| 10 | 1-50 lines changed. Single concern. |
| 8 | 51-150 lines changed. Clear scope. |
| 6 | 151-300 lines changed. Reasonable for a feature. |
| 4 | 301-500 lines changed. Should be reviewed for splitting opportunity. |
| 2 | 501-1000 lines changed. Likely should be split. |
| 0 | 1000+ lines changed. Almost certainly needs splitting. |
ADJUSTMENTS:
- Auto-generated lines (migrations, lockfiles) are EXCLUDED from line count
- Test lines are counted at 50% weight (more tests is good, not a complexity risk)
- Renamed/moved files: count only the actual content changes, not the full file
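The adjustments and bands combine into one small function. Classifying diff lines as code, test, or generated is assumed to happen upstream (e.g. by path globs), which is an assumption of this sketch rather than part of the rubric:

```typescript
interface DiffStats {
  codeLines: number;      // non-test, non-generated lines changed
  testLines: number;      // counted at 50% weight
  generatedLines: number; // migrations, lockfiles — deliberately ignored
}

// Criterion 7 sketch: apply the ADJUSTMENTS, then the size bands.
function prSizeScore(d: DiffStats): number {
  const adjusted = d.codeLines + d.testLines * 0.5; // generated lines excluded
  if (adjusted <= 50) return 10;
  if (adjusted <= 150) return 8;
  if (adjusted <= 300) return 6;
  if (adjusted <= 500) return 4;
  if (adjusted <= 1000) return 2;
  return 0;
}
```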
SCORE_CALCULATION_EXAMPLE¶
PR: "Add user profile page with avatar upload"
Tests pass: 8/10 * 25% = 20.0
Spec traceability: 7/10 * 20% = 14.0
No test weakening: 10/10 * 15% = 15.0
Koen clean: 9/10 * 15% = 13.5
Performance: 6/10 * 10% = 6.0
Code churn: 8/10 * 10% = 8.0
PR size: 6/10 * 5% = 3.0
-----
TOTAL: 79.5 → PASS (>= 70)
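The calculation above, expressed as code with the default feature weights (rounded to one decimal for audit-log stability):

```typescript
// Default weights in points out of 100 (Criteria 1-7).
const DEFAULT_WEIGHTS = {
  tests: 25, spec: 20, weakening: 15, koen: 15, perf: 10, churn: 10, size: 5,
};

type CriterionScores = Record<keyof typeof DEFAULT_WEIGHTS, number>; // each 0-10

// Score = SUM((criterion_score / 10) * weight_points)
function totalScore(scores: CriterionScores, weights = DEFAULT_WEIGHTS): number {
  const raw = Object.entries(weights).reduce(
    (sum, [key, weight]) => sum + (scores[key as keyof CriterionScores] / 10) * weight,
    0,
  );
  return Math.round(raw * 10) / 10; // avoid float noise in recorded scores
}

// The "Add user profile page" PR from the example above:
const profilePagePR: CriterionScores = {
  tests: 8, spec: 7, weakening: 10, koen: 9, perf: 6, churn: 8, size: 6,
};
// totalScore(profilePagePR) → 79.5 → PASS (>= 70)
```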
HARD_BLOCK_OVERRIDES¶
These conditions trigger a BLOCK regardless of total score:
| Code | Condition | Triggered By |
|---|---|---|
| TW-1 | Critical-path test assertion deleted without spec justification | Criterion 3 = 0 |
| TW-2 | Test file deleted without replacement | Criterion 3 <= 2 |
| SEC-1 | Security-relevant test coverage decreased | Criterion 2 + 3 combined |
| DATA-1 | Migration contains DROP/TRUNCATE without approval | Manual flag |
| FAIL-1 | Any test failure in auth, payment, or data persistence | Criterion 1 <= 2 |
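The override table can be evaluated mechanically once the conditions are established. The boolean flags below are assumed to be set by the evaluators named in the "Triggered By" column; this sketch only collects the codes:

```typescript
interface BlockFlags {
  criticalAssertionDeleted: boolean;       // TW-1: no spec-backed justification
  testFileDeletedNoReplacement: boolean;   // TW-2
  securityCoverageDecreased: boolean;      // SEC-1
  destructiveMigrationUnapproved: boolean; // DATA-1: DROP/TRUNCATE unapproved
  criticalPathTestFailure: boolean;        // FAIL-1: auth/payment/persistence
}

// Any returned code blocks the PR regardless of its total score.
function hardBlockCodes(f: BlockFlags): string[] {
  const codes: string[] = [];
  if (f.criticalAssertionDeleted) codes.push("TW-1");
  if (f.testFileDeletedNoReplacement) codes.push("TW-2");
  if (f.securityCoverageDecreased) codes.push("SEC-1");
  if (f.destructiveMigrationUnapproved) codes.push("DATA-1");
  if (f.criticalPathTestFailure) codes.push("FAIL-1");
  return codes;
}
```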
BREAK_GLASS_OVERRIDE¶
Conditions for passing a PR below threshold (production hotfix only):
| Requirement | Details |
|---|---|
| Active incident | Production users are currently affected |
| Minimal change | <= 20 lines changed |
| Hotfix label | Applied by authorized team member |
| Follow-up tracking | 48-hour deadline for proper fix with full tests |
| Score recorded | The below-threshold score is still recorded for audit |
Break-glass does NOT apply to: deadline pressure, client demos, "we'll fix it later" without tracked follow-up.
WEIGHT_ADJUSTMENTS_BY_PR_TYPE¶
| PR Type | Tests | Spec | Weakening | Koen | Perf | Churn | Size |
|---|---|---|---|---|---|---|---|
| Feature (default) | 25% | 20% | 15% | 15% | 10% | 10% | 5% |
| Test-only | 30% | 30% | 15% | 10% | 0% | 10% | 5% |
| Config-only | 10% | 10% | 10% | 20% | 5% | 25% | 20% |
| Hotfix | 30% | 10% | 20% | 10% | 5% | 20% | 5% |
| Migration-only | 15% | 20% | 15% | 10% | 10% | 15% | 15% |
Marta selects the PR type based on the content of the diff. Mixed PRs use the default weights.
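The weight table as data, with a sanity check that every row sums to 100 so scores stay on the 0-100 scale. Key names are this sketch's own; the percentages are the table's:

```typescript
// [tests, spec, weakening, koen, perf, churn, size] — points out of 100.
const WEIGHTS_BY_PR_TYPE: Record<string, number[]> = {
  feature:       [25, 20, 15, 15, 10, 10,  5], // default; also used for mixed PRs
  testOnly:      [30, 30, 15, 10,  0, 10,  5],
  configOnly:    [10, 10, 10, 20,  5, 25, 20],
  hotfix:        [30, 10, 20, 10,  5, 20,  5],
  migrationOnly: [15, 20, 15, 10, 10, 15, 15],
};

// Guard against editing one weight without rebalancing the row.
function weightsValid(table: Record<string, number[]>): boolean {
  return Object.values(table).every(
    (row) => row.reduce((a, b) => a + b, 0) === 100,
  );
}
```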