DOMAIN:PERFORMANCE — PERFORMANCE_RUBRIC¶
OWNER: nessa
ALSO_USED_BY: marta (merge gate criterion 5), iwona (release readiness input)
UPDATED: 2026-03-24
SCOPE: performance pass/fail rubric with per-metric thresholds and decision trees — JIT injected before every performance evaluation
PURPOSE: standardize performance decisions across all client projects with clear thresholds, escalation paths, and no-baseline protocols
METRIC_THRESHOLDS¶
WEB_VITALS (client-facing pages)¶
| Metric | Good | Needs Improvement | Poor | Unit |
|---|---|---|---|---|
| LCP (Largest Contentful Paint) | < 1.5s | 1.5s - 2.5s | > 2.5s | seconds |
| INP (Interaction to Next Paint) | < 200ms | 200ms - 500ms | > 500ms | milliseconds |
| CLS (Cumulative Layout Shift) | < 0.1 | 0.1 - 0.25 | > 0.25 | unitless |
| FCP (First Contentful Paint) | < 1.0s | 1.0s - 1.8s | > 1.8s | seconds |
| TTFB (Time to First Byte) | < 200ms | 200ms - 500ms | > 500ms | milliseconds |
SOURCE: INP and CLS bands match Google CrUX "good" benchmarks; LCP, FCP, and TTFB are deliberately stricter than CrUX (which allows LCP < 2.5s, FCP < 1.8s, TTFB < 800ms).
OVERRIDE: Client SLOs take precedence. If a client contract specifies LCP < 1.0s, that is the threshold.
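The banding above can be sketched as a small classifier. Threshold values are copied from the table (INP and TTFB converted to seconds for uniformity); the `client_slo` parameter is a hypothetical illustration of the OVERRIDE rule, not an existing tool:

```python
# (good_upper, needs_improvement_upper) per metric; seconds, except CLS (unitless).
WEB_VITALS = {
    "LCP": (1.5, 2.5),
    "INP": (0.200, 0.500),   # table value in ms, expressed here in seconds
    "CLS": (0.1, 0.25),
    "FCP": (1.0, 1.8),
    "TTFB": (0.200, 0.500),  # table value in ms, expressed here in seconds
}

def classify_vital(metric, value, client_slo=None):
    """Return GOOD / NEEDS_IMPROVEMENT / POOR for one metric.

    client_slo (hypothetical parameter) replaces the 'good' bound per the
    OVERRIDE rule; anything over the contracted SLO is treated as POOR.
    """
    if client_slo is not None:
        return "GOOD" if value < client_slo else "POOR"
    good, needs = WEB_VITALS[metric]
    if value < good:
        return "GOOD"
    if value <= needs:
        return "NEEDS_IMPROVEMENT"
    return "POOR"
```

Note how the override collapses the middle band: a contracted SLO is pass/fail, not graded.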
API_ENDPOINTS (backend)¶
| Metric | Good | Needs Improvement | Poor | Unit |
|---|---|---|---|---|
| p50 latency | < 100ms | 100ms - 300ms | > 300ms | milliseconds |
| p95 latency | < 250ms | 250ms - 500ms | > 500ms | milliseconds |
| p99 latency | < 500ms | 500ms - 1000ms | > 1000ms | milliseconds |
| Error rate | < 0.1% | 0.1% - 1% | > 1% | percentage |
| Throughput | > 100 rps | 50-100 rps | < 50 rps | requests/second |
ENDPOINT_TYPE_ADJUSTMENTS:
- Read endpoints (GET): use standard thresholds above
- Write endpoints (POST/PUT/DELETE): multiply latency thresholds by 1.5x (writes are inherently slower)
- Aggregation endpoints (reports, dashboards): multiply latency thresholds by 3x
- File upload endpoints: multiply latency thresholds by 5x, measure separately from other endpoints
- Health check endpoints: p99 must be < 50ms (used by k8s probes, affects pod restarts)
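The endpoint-type adjustments can be sketched as follows; multipliers and base thresholds are taken from the tables above, and the function name is illustrative:

```python
# (good_upper, needs_improvement_upper) in milliseconds, from the API table.
BASE_LATENCY_MS = {"p50": (100, 300), "p95": (250, 500), "p99": (500, 1000)}
MULTIPLIER = {"READ": 1.0, "WRITE": 1.5, "AGGREGATION": 3.0, "UPLOAD": 5.0}

def latency_thresholds(endpoint_type):
    """Return per-percentile latency bounds adjusted for endpoint type.

    HEALTH endpoints bypass the multipliers entirely: p99 must stay
    under 50 ms because k8s probes key off them.
    """
    if endpoint_type == "HEALTH":
        return {"p99": (50, 50)}
    m = MULTIPLIER[endpoint_type]
    return {p: (good * m, poor * m) for p, (good, poor) in BASE_LATENCY_MS.items()}
```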
BUNDLE_SIZE (frontend)¶
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| Initial JS bundle | < 150 KB (gzipped) | 150-300 KB | > 300 KB |
| Total JS (lazy-loaded) | < 500 KB | 500 KB - 1 MB | > 1 MB |
| CSS bundle | < 50 KB (gzipped) | 50-100 KB | > 100 KB |
| Largest single chunk | < 100 KB | 100-200 KB | > 200 KB |
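A sketch of the bundle-size check, assuming sizes are measured post-gzip (default compression level approximates what a CDN serves); limit values come from the table:

```python
import gzip

# (good_upper, poor_lower) in gzipped KB, from the bundle-size table.
BUNDLE_LIMITS_KB = {
    "initial_js": (150, 300),
    "total_js": (500, 1024),
    "css": (50, 100),
    "largest_chunk": (100, 200),
}

def gzipped_kb(raw: bytes) -> float:
    """Size of an asset after gzip compression, in KB."""
    return len(gzip.compress(raw)) / 1024

def classify_bundle(kind, kb):
    """Band a measured bundle size against the rubric limits."""
    good, poor = BUNDLE_LIMITS_KB[kind]
    if kb < good:
        return "GOOD"
    return "NEEDS_IMPROVEMENT" if kb <= poor else "POOR"
```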
DECISION_TREE¶
PRIMARY_DECISION: BLOCK, WARN, OR PASS¶
START: Compare current metrics to baseline (or thresholds if no baseline)
1. Is any metric in the POOR range?
├─ YES → Is it a critical endpoint (auth, payment, checkout)?
│ ├─ YES → BLOCK RELEASE
│ └─ NO → Is the regression > 50% from baseline?
│ ├─ YES → BLOCK RELEASE
│ └─ NO → WARN (pass with mandatory follow-up ticket)
└─ NO → Continue
2. Is any metric in the NEEDS IMPROVEMENT range?
├─ YES → Was it previously in the GOOD range?
│ ├─ YES → Is the regression > 25% from baseline?
│ │ ├─ YES → WARN (pass with follow-up ticket)
│ │ └─ NO → PASS (note the change)
│ └─ NO → PASS (already known, not a new regression)
└─ NO → Continue
3. All metrics in GOOD range
└─ PASS
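The three-step tree above can be sketched as one function. Inputs are assumed precomputed (each metric's current band, its previous band, and its % regression from baseline); the dict shape is illustrative:

```python
def primary_decision(metrics, critical_endpoint=False):
    """Walk the BLOCK/WARN/PASS tree.

    metrics: iterable of dicts with keys band, prev_band, regression_pct
    (band values: GOOD / NEEDS_IMPROVEMENT / POOR).
    """
    warn = False
    for m in metrics:
        if m["band"] == "POOR":
            # Step 1: POOR on a critical endpoint, or a >50% regression, blocks.
            if critical_endpoint or m.get("regression_pct", 0) > 50:
                return "BLOCK"
            warn = True  # pass with mandatory follow-up ticket
        elif (m["band"] == "NEEDS_IMPROVEMENT"
              and m.get("prev_band") == "GOOD"
              and m.get("regression_pct", 0) > 25):
            # Step 2: a fresh >25% slide out of GOOD warrants a follow-up.
            warn = True
    # Step 3: everything GOOD, or only minor/known slippage.
    return "WARN" if warn else "PASS"
```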
CLIENT_SLO_OVERRIDE¶
Does the client have explicit SLOs in their contract?
├─ YES → Use client SLOs instead of standard thresholds
│ └─ Any client SLO violated?
│ ├─ YES → Was the SLO already violated before this PR?
│ │ ├─ YES → Did this PR make it WORSE?
│ │ │ ├─ YES → BLOCK (regression on already-bad metric)
│ │ │ └─ NO → WARN (pre-existing issue, not this PR's fault)
│ │ └─ NO → BLOCK (this PR caused the SLO violation)
│ └─ NO → Use standard decision tree above
└─ NO → Use standard thresholds
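The SLO-override branch reduces to three booleans the reviewer determines; a `None` return means fall through to the standard tree (function name is illustrative):

```python
def slo_override_decision(slo_violated, violated_before_pr, pr_made_it_worse):
    """Return BLOCK / WARN, or None when the standard tree should be used."""
    if not slo_violated:
        return None  # no violation: apply the standard decision tree
    if violated_before_pr:
        # Pre-existing violation: block only if this PR regressed it further.
        return "BLOCK" if pr_made_it_worse else "WARN"
    return "BLOCK"  # this PR caused the SLO violation
```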
NO_BASELINE_PROTOCOL¶
When evaluating a new endpoint or page with no historical data:
STEP_1: CATEGORIZE¶
Determine the endpoint type from the list:
- READ: simple data fetch (single record or small list)
- AGGREGATION: data computation (reports, dashboards, analytics)
- WRITE: data mutation (create, update, delete)
- UPLOAD: file handling
- HEALTH: system status check
STEP_2: APPLY_CATEGORY_THRESHOLDS¶
Use the standard API thresholds with the endpoint-type multiplier. For example, an aggregation endpoint gets 3x the standard latency budget.
STEP_3: COMPARE_TO_PEERS¶
Find similar endpoints in the same project or in other GE client projects:
- Same category, similar data volume → closest peer
- If the closest peer is 2x faster or more, investigate why the new endpoint is slower before recording a baseline
- If the peer's latency is comparable, accept the new endpoint's numbers as reasonable
STEP_4: RECORD_INITIAL_BASELINE¶
BASELINE_RECORD:
endpoint: [path]
date: [date]
category: [type]
p50: [value] p95: [value] p99: [value]
data_volume: [approximate record count]
confidence: initial
next_review: [date + 3 release cycles]
STEP_5: CONDITIONAL PASS¶
New endpoints with no baseline receive a CONDITIONAL PASS:
- Metric must be within category thresholds
- Baseline is recorded for future comparison
- Load test recommended before calling baseline stable
- Re-evaluate after 3 release cycles with production traffic data
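The BASELINE_RECORD template above can be captured as a structure. The two-week release cycle used to compute `next_review` is an assumption for illustration, not part of the rubric; adjust per project:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

RELEASE_CYCLE = timedelta(weeks=2)  # ASSUMPTION: project-specific cadence

@dataclass
class BaselineRecord:
    """One initial baseline for a new endpoint, per STEP_4."""
    endpoint: str
    category: str        # READ / WRITE / AGGREGATION / UPLOAD / HEALTH
    p50_ms: float
    p95_ms: float
    p99_ms: float
    data_volume: int     # approximate record count at measurement time
    recorded: date = field(default_factory=date.today)
    confidence: str = "initial"

    @property
    def next_review(self) -> date:
        # Re-evaluate after 3 release cycles with production traffic data.
        return self.recorded + 3 * RELEASE_CYCLE
```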
REGRESSION_CLASSIFICATION¶
| Regression Size | Classification | Action |
|---|---|---|
| < 5% | Noise | PASS — within measurement variance |
| 5-15% | Minor | PASS with note — monitor next release |
| 15-25% | Moderate | WARN — pass with follow-up investigation ticket |
| 25-50% | Significant | WARN on non-critical, BLOCK on critical endpoints |
| 50-100% | Major | BLOCK — investigate before merge |
| > 100% (2x+) | Critical | BLOCK — likely a functional bug, not a tuning problem |
CRITICAL_ENDPOINTS (always BLOCK at 25%+):
- Authentication and session management
- Payment processing
- Order creation and checkout
- Data export and download
- Health check endpoints (affects k8s pod lifecycle)
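The regression table and the critical-endpoint rule combine into one classifier; band boundaries follow the table, and the function name is illustrative:

```python
def classify_regression(pct, critical=False):
    """Classify a regression (pct vs. baseline) and return (label, action).

    Critical endpoints (auth, payment, checkout, export, health) always
    BLOCK at 25% or more, regardless of the standard action column.
    """
    if critical and pct >= 25:
        label = ("Significant" if pct < 50 else
                 "Major" if pct <= 100 else "Critical")
        return label, "BLOCK"
    if pct < 5:
        return "Noise", "PASS"          # within measurement variance
    if pct < 15:
        return "Minor", "PASS"          # pass with note, monitor next release
    if pct < 25:
        return "Moderate", "WARN"       # follow-up investigation ticket
    if pct < 50:
        return "Significant", "WARN"    # non-critical endpoints only
    if pct <= 100:
        return "Major", "BLOCK"
    return "Critical", "BLOCK"          # 2x+: likely a functional bug
```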
MEASUREMENT_STANDARDS¶
MINIMUM_RUN_COUNT¶
| Decision | Minimum Runs |
|---|---|
| PASS (no concerns) | 5 runs |
| WARN (marginal) | 10 runs |
| BLOCK (confirm regression) | 10 runs with environment verification |
ENVIRONMENT_REQUIREMENTS¶
- Benchmark must run on dedicated environment (not shared dev cluster)
- Database must be seeded with representative data volume
- No other benchmarks running concurrently
- First run discarded (warm-up)
- Same hardware and configuration as previous baseline
VARIANCE_ACCEPTANCE¶
| Coefficient of Variation | Interpretation |
|---|---|
| < 5% | Stable — high confidence in results |
| 5-10% | Acceptable — results are usable |
| 10-20% | Noisy — increase run count, check environment |
| > 20% | Unreliable — fix environment before drawing conclusions |
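The variance check can be sketched directly: discard the warm-up run per ENVIRONMENT_REQUIREMENTS, then compute the coefficient of variation (sample stdev / mean) and band it against the table:

```python
from statistics import mean, stdev

def variance_verdict(runs_ms):
    """Classify benchmark stability from raw run timings (ms), first run
    discarded as warm-up. Returns (cv_percent, verdict)."""
    samples = runs_ms[1:]
    cv = stdev(samples) / mean(samples) * 100
    if cv < 5:
        return cv, "Stable"
    if cv < 10:
        return cv, "Acceptable"
    if cv < 20:
        return cv, "Noisy"
    return cv, "Unreliable"
```

Note the warm-up discard matters: a cold first run can single-handedly push CoV past 20% on an otherwise stable benchmark.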
REPORTING_FORMAT¶
When returning a performance assessment, use this format:
PERFORMANCE_ASSESSMENT
Build: [build ID or PR number]
Decision: [PASS | WARN | BLOCK]
Metrics:
LCP: [value] ([GOOD|NEEDS_IMPROVEMENT|POOR]) [delta from baseline]
INP: [value] ([GOOD|NEEDS_IMPROVEMENT|POOR]) [delta from baseline]
CLS: [value] ([GOOD|NEEDS_IMPROVEMENT|POOR]) [delta from baseline]
p99: [value] ([GOOD|NEEDS_IMPROVEMENT|POOR]) [delta from baseline]
[If WARN: specific metric(s) that need follow-up]
[If BLOCK: specific regression with baseline comparison and recommended investigation]
[If no baseline: initial baseline recorded, conditional pass conditions]
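A minimal renderer for the template above, assuming deltas are already computed as percentages; field names mirror the template:

```python
def render_assessment(build, decision, metrics):
    """Render the PERFORMANCE_ASSESSMENT block.

    metrics: list of (name, value, band, delta_pct) tuples, e.g.
    ("LCP", "1.2s", "GOOD", -3.0).
    """
    lines = [
        "PERFORMANCE_ASSESSMENT",
        f"Build: {build}",
        f"Decision: {decision}",
        "Metrics:",
    ]
    for name, value, band, delta in metrics:
        lines.append(f"  {name}: {value} ({band}) {delta:+.1f}% vs baseline")
    return "\n".join(lines)
```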
ESCALATION_PATH¶
ESCALATE_TO_MARTA when:
- Performance block would delay a client delivery
- Client SLO conflict (SLO is unreasonable given the feature complexity)
- Infrastructure bottleneck suspected (not a code issue)
ESCALATE_TO_DEVELOPER when:
- Specific query or function identified as regression source
- Bundle analysis shows unexpected large dependency
- Missing index or N+1 query pattern detected
ESCALATE_TO_ANNA when:
- Spec requires a feature that is inherently slow (real-time aggregation of large dataset)
- Performance and functionality are in direct conflict
- Client SLO needs renegotiation based on technical constraints