DOMAIN:PERFORMANCE

OWNER: nessa
UPDATED: 2026-03-24
SCOPE: pre-release performance validation, performance budgets, SLO enforcement
AGENTS: nessa (primary), mira (incident patterns), sandro/tobias (fixes)


PERFORMANCE:LOAD_TESTING

TOOL_SELECTION

TOOL: k6 (Grafana)
USE_WHEN: API/backend load testing, scripting in JavaScript, CI/CD integration
STRENGTHS: scriptable in JS, excellent CLI output, Grafana integration, threshold-based pass/fail
LIMITATIONS: no browser rendering by default (use the k6 browser module for frontend tests)
RUN: k6 run --vus 50 --duration 5m load-test.js
RUN: k6 run --out json=results.json load-test.js (for analysis)

TOOL: Artillery
USE_WHEN: quick YAML-based load tests, protocol diversity (HTTP, WebSocket, Socket.io)
STRENGTHS: YAML config (fast to write), good for WebSocket testing, built-in phases
LIMITATIONS: less flexible scripting than k6
RUN: artillery run load-test.yaml
RUN: artillery run --output results.json load-test.yaml

TOOL: Locust
USE_WHEN: Python team, complex user behavior simulation, distributed testing
STRENGTHS: Python scripting, web UI for monitoring, distributed mode
LIMITATIONS: Python dependency, higher resource usage
RUN: locust -f locustfile.py --headless -u 100 -r 10 --run-time 5m

RECOMMENDATION_FOR_GE: k6 as primary (JS ecosystem matches our stack, CI/CD friendly)

LOAD_TEST_TYPES

TYPE: smoke_test
PURPOSE: verify system works under minimal load
VUSERS: 1-5
DURATION: 1-2 minutes
WHEN: every deployment
PASS_CRITERIA: 0 errors, p95 < 500ms

TYPE: load_test
PURPOSE: verify system handles expected production load
VUSERS: expected concurrent users (start with 20-50 for GE)
DURATION: 5-15 minutes
WHEN: before every release
PASS_CRITERIA: error rate < 1%, p95 < target, p99 < 2x target

TYPE: stress_test
PURPOSE: find the breaking point
VUSERS: ramp up beyond expected load (2x, 5x, 10x)
DURATION: 15-30 minutes with ramp-up stages
WHEN: quarterly or after major architecture changes
PASS_CRITERIA: graceful degradation, no data loss, recovery after load drops

TYPE: soak_test
PURPOSE: detect memory leaks, connection leaks, gradual degradation
VUSERS: normal load sustained
DURATION: 2-8 hours
WHEN: before major releases, after memory-related fixes
PASS_CRITERIA: no resource growth trend, stable response times

TYPE: spike_test
PURPOSE: verify behavior under sudden traffic bursts
VUSERS: 0 → peak → 0 in seconds
DURATION: 5-10 minutes with sharp spikes
WHEN: before launches, if expecting marketing traffic
PASS_CRITERIA: recovery within 30s after spike ends
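The staged profiles above can be sketched as k6-style stages arrays. The numbers below are illustrative, and totalDuration is a plain-JS helper (not a k6 API) for sanity-checking a profile before committing to a long run:

```javascript
// Sketch of a spike-test profile in the k6 stages format.
const spikeTest = [
  { duration: '30s', target: 5 },    // baseline
  { duration: '10s', target: 200 },  // sharp spike
  { duration: '1m', target: 200 },   // hold the spike
  { duration: '10s', target: 5 },    // drop back
  { duration: '2m', target: 5 },     // verify recovery
];

// Parse a k6-style duration string ('30s', '5m', '1h') into seconds.
function toSeconds(d) {
  const m = /^(\d+)(s|m|h)$/.exec(d);
  if (!m) throw new Error(`unparseable duration: ${d}`);
  return Number(m[1]) * { s: 1, m: 60, h: 3600 }[m[2]];
}

// Total wall-clock length of a staged profile, in seconds.
function totalDuration(stages) {
  return stages.reduce((sum, st) => sum + toSeconds(st.duration), 0);
}
```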

K6_SCRIPT_TEMPLATE

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 20 },   // ramp up
    { duration: '3m', target: 20 },   // steady state
    { duration: '1m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
    checks: ['rate>0.99'],
  },
};

export default function () {
  const res = http.get('http://target/api/endpoint');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}

RULE: always include thresholds — k6 exits non-zero when thresholds fail (CI/CD gate)
RULE: always include ramp-up — sudden full load is unrealistic and masks real issues
RULE: always include checks — response code alone doesn't verify correctness


PERFORMANCE:BUDGETS

WHAT_TO_BUDGET

METRIC: page_load_time (frontend)
TARGET: < 3s on 4G connection
THRESHOLD_WARN: > 2s
THRESHOLD_FAIL: > 3s

METRIC: time_to_interactive (TTI)
TARGET: < 5s on 4G connection
THRESHOLD_WARN: > 3.5s
THRESHOLD_FAIL: > 5s

METRIC: javascript_bundle_size
TARGET: < 200KB gzipped (initial load)
THRESHOLD_WARN: > 150KB
THRESHOLD_FAIL: > 200KB
NOTE: per-route code splitting — measure initial bundle, not total

METRIC: api_response_time (backend)
TARGET: p95 < 500ms for reads, p95 < 1000ms for writes
THRESHOLD_WARN: p95 > 300ms reads, p95 > 700ms writes
THRESHOLD_FAIL: p95 > 500ms reads, p95 > 1000ms writes

METRIC: database_query_time
TARGET: p95 < 100ms
THRESHOLD_WARN: p95 > 50ms
THRESHOLD_FAIL: p95 > 100ms
EXCEPTION: complex reports/analytics queries — budget separately

METRIC: error_rate
TARGET: < 0.1% of requests
THRESHOLD_WARN: > 0.05%
THRESHOLD_FAIL: > 0.1%

METRIC: image_total_weight
TARGET: < 500KB per page (after optimization)
THRESHOLD_WARN: > 300KB
THRESHOLD_FAIL: > 500KB

ENFORCEMENT

IF: budget exceeded in CI THEN: block merge (red pipeline)
IF: budget exceeded in staging THEN: block release
IF: budget approaches warn threshold THEN: flag in PR review
TOOL: Lighthouse CI for frontend budgets (see section below)
TOOL: k6 thresholds for backend budgets

ANTI_PATTERN: setting budgets and never checking them
FIX: budgets MUST be automated gates in CI/CD — human discipline is unreliable

ANTI_PATTERN: budgets so loose they never fail
FIX: start with current p95 + 20% margin, tighten quarterly


PERFORMANCE:CORE_WEB_VITALS

LCP (Largest Contentful Paint)

TARGET: ≤ 2.5s (good), 2.5-4.0s (needs improvement), > 4.0s (poor)
MEASURES: loading performance — when does the main content become visible?
ELEMENTS_COUNTED: <img>, <image> inside <svg>, <video> (poster image), elements with a url() background-image, block-level elements containing text

COMMON_CAUSES_OF_POOR_LCP:
- Slow server response (high TTFB) → optimize server, use CDN
- Render-blocking resources (CSS, sync JS) → defer non-critical CSS, async JS
- Slow resource load (large image) → optimize images, use WebP/AVIF, preload LCP image
- Client-side rendering delay → SSR/SSG for critical content

TOOL: measure LCP
RUN: lighthouse --only-categories=performance --output=json <url>
TOOL: identify LCP element
RUN: Chrome DevTools → Performance tab → look for "LCP" marker

FIX: preload the LCP image: <link rel="preload" as="image" href="hero.webp">
FIX: inline critical CSS, defer the rest
FIX: use fetchpriority="high" on LCP image

INP (Interaction to Next Paint) — replaced FID March 2024

TARGET: ≤ 200ms (good), 200-500ms (needs improvement), > 500ms (poor)
MEASURES: responsiveness — how quickly does the page respond to ALL user interactions?
DIFFERENCE_FROM_FID: FID measured only first input delay; INP measures worst interaction

COMMON_CAUSES_OF_POOR_INP:
- Long JavaScript tasks blocking main thread (> 50ms = long task)
- Heavy re-renders in React (unnecessary state updates, missing memo)
- Expensive event handlers (complex calculations on click)
- Layout thrashing (read-write-read-write DOM in loop)

FIX: break long tasks with setTimeout(fn, 0) or requestIdleCallback
FIX: use React.memo, useMemo, useCallback to prevent unnecessary re-renders
FIX: move expensive computation to Web Worker
FIX: use CSS content-visibility: auto for off-screen content
FIX: debounce/throttle frequent event handlers (scroll, resize, input)
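A minimal throttle sketch for the last fix above. The injectable `now` parameter is an addition for testability; in a browser you would rely on the default Date.now (a production-grade version, e.g. lodash's, also handles trailing-edge invocation):

```javascript
// Leading-edge throttle: fn runs at most once per intervalMs.
function throttle(fn, intervalMs, now = Date.now) {
  let last = -Infinity;
  return (...args) => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      fn(...args);
    }
  };
}
```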

CLS (Cumulative Layout Shift)

TARGET: ≤ 0.1 (good), 0.1-0.25 (needs improvement), > 0.25 (poor)
MEASURES: visual stability — does content jump around as page loads?

COMMON_CAUSES_OF_POOR_CLS:
- Images without width/height attributes (browser doesn't know size until loaded)
- Ads/embeds without reserved space
- Dynamically injected content above viewport (banners, notifications)
- Web fonts causing FOIT/FOUT (text reflow on font load)

FIX: always set width and height on images (or use aspect-ratio CSS)
FIX: reserve space for dynamic content with min-height
FIX: use font-display: optional or preload fonts
FIX: prefer CSS transforms over properties that trigger layout (top, left, width, height)

MEASUREMENT

TOOL: Lighthouse (lab data)
RUN: npx lighthouse <url> --output=json --output=html --chrome-flags="--headless"

TOOL: Chrome UX Report (field data)
CHECK: PageSpeed Insights for real-user CWV data

TOOL: web-vitals library (in-app measurement)

import { onLCP, onINP, onCLS } from 'web-vitals';
onLCP(metric => sendToAnalytics(metric));
onINP(metric => sendToAnalytics(metric));
onCLS(metric => sendToAnalytics(metric));

RULE: lab data (Lighthouse) identifies issues; field data (CrUX) confirms impact
RULE: optimize for p75 of real users, not best-case lab scores


PERFORMANCE:BACKEND

LATENCY_METRICS

METRIC: p50 (median) — what most users experience
METRIC: p95 — 1 in 20 requests is slower than this (primary target)
METRIC: p99 — tail latency, often reveals systemic issues
METRIC: throughput — requests per second at steady state
METRIC: error_rate — percentage of 5xx responses

RULE: optimize for p95, monitor p99
RULE: p99 > 10x p50 indicates bimodal distribution — investigate the slow path
RULE: throughput drop + latency rise = saturation point reached
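The bimodal-distribution rule above can be expressed directly. This sketch uses the nearest-rank percentile method; real tools (k6, Prometheus) have their own estimators, so treat it as illustrative:

```javascript
// Nearest-rank percentile over raw latency samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Flags the rule above: p99 > 10x p50 suggests a bimodal distribution
// (a fast common path plus a slow path worth investigating).
function looksBimodal(samples) {
  return percentile(samples, 99) > 10 * percentile(samples, 50);
}
```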

BACKEND_OPTIMIZATION_CHECKLIST

CHECK: N+1 queries — use eager loading, batch queries
CHECK: missing database indexes — EXPLAIN ANALYZE on slow queries
CHECK: connection pool sizing — too small = queuing, too large = memory waste
CHECK: unnecessary serialization — JSON.parse/stringify in hot paths
CHECK: synchronous I/O in async context — blocking the event loop
CHECK: missing caching — Redis/in-memory for frequently accessed immutable data
CHECK: payload size — paginate lists, select specific columns, compress responses
CHECK: redundant middleware — each middleware adds latency to every request

TOOL: profile Node.js application
RUN: node --prof app.js
RUN: node --prof-process isolate-*.log (processes the log the profiled run writes)

TOOL: trace slow requests
RUN: add timing middleware that logs requests > 500ms with full context


PERFORMANCE:DATABASE

QUERY_ANALYSIS

TOOL: explain a query
RUN: EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) <query>;

CHECK: sequential scan on large table (> 10K rows) → add index
CHECK: nested loop join with high row count → verify join condition, add index
CHECK: sort operation with high cost → add index matching ORDER BY
CHECK: hash join with large work_mem → increase work_mem or optimize query

INDEX_ANALYSIS

TOOL: find missing indexes
RUN: SELECT relname, seq_scan, idx_scan, seq_scan - idx_scan AS too_many_seq FROM pg_stat_user_tables WHERE seq_scan > idx_scan AND seq_scan > 1000 ORDER BY too_many_seq DESC;

TOOL: find unused indexes (waste of write performance)
RUN: SELECT indexrelid::regclass AS index, relid::regclass AS table, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND indexrelid NOT IN (SELECT conindid FROM pg_constraint) ORDER BY pg_relation_size(indexrelid) DESC;

TOOL: find duplicate indexes
RUN: SELECT a.indexrelid::regclass, b.indexrelid::regclass FROM pg_index a, pg_index b WHERE a.indrelid = b.indrelid AND a.indexrelid > b.indexrelid AND a.indkey::text = b.indkey::text; (the > comparison lists each pair once)

RULE: every WHERE clause, JOIN condition, and ORDER BY used in production should have an index
RULE: composite indexes — put equality conditions first, range conditions last
RULE: don't over-index — each index slows down INSERT/UPDATE/DELETE
RULE: partial indexes for queries with constant WHERE conditions (e.g., WHERE status = 'active')

CONNECTION_POOL_SIZING

FORMULA: connections = (2 * cpu_cores) + effective_spindle_count
NOTE: for SSD, effective_spindle_count ≈ 0, so connections ≈ 2 * cpu_cores
NOTE: for GE (single node, 4 cores): pool per service = ~8-10 connections
NOTE: total across all services must be < max_connections (default 100)

RULE: set pool min=2, max=calculated, idle_timeout=30s, connection_timeout=5s
RULE: monitor pool metrics: total, idle, waiting — alert if waiting > 0 sustained
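The formula and the max_connections rule above, as a sketch (the 100 default matches stock PostgreSQL):

```javascript
// Per-service pool size: (2 * cpu_cores) + effective_spindle_count.
// spindles defaults to 0, the SSD case noted above.
function poolSize(cpuCores, spindles = 0) {
  return 2 * cpuCores + spindles;
}

// The sum of all service pools must stay below the database's max_connections.
function fitsGlobalLimit(poolSizes, maxConnections = 100) {
  return poolSizes.reduce((a, b) => a + b, 0) < maxConnections;
}
```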


PERFORMANCE:REGRESSION_DETECTION

PURPOSE: catch performance regressions before they reach production

COMPARISON_METHOD

APPROACH: compare release N metrics against release N-1 baseline

METRICS_TO_COMPARE:
- p50, p95, p99 latency per endpoint
- throughput per endpoint
- error rate per endpoint
- resource usage (CPU, memory) under equivalent load
- frontend: LCP, INP, CLS, bundle size

DETECTION_RULES:
IF: p95 latency increased > 20% vs baseline THEN: regression — block release
IF: p95 latency increased 10-20% vs baseline THEN: warning — investigate
IF: throughput decreased > 10% at same VU count THEN: regression — investigate
IF: error rate increased > 0.1% THEN: regression — block release
IF: bundle size increased > 10KB THEN: warning — justify or fix
IF: LCP increased > 500ms THEN: regression — block release
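The latency, throughput, and error-rate rules above can be sketched as a comparison function. Metric names and the result shape are illustrative, not a fixed schema; error rates are assumed to be fractions (0.001 = 0.1%):

```javascript
// Compare a candidate release's metrics against the stored baseline
// and return the detection-rule violations.
function detectRegressions(baseline, candidate) {
  const issues = [];

  const p95Delta = (candidate.p95 - baseline.p95) / baseline.p95;
  if (p95Delta > 0.20) issues.push({ metric: 'p95', severity: 'block' });
  else if (p95Delta > 0.10) issues.push({ metric: 'p95', severity: 'warn' });

  const tputDelta = (baseline.throughput - candidate.throughput) / baseline.throughput;
  if (tputDelta > 0.10) issues.push({ metric: 'throughput', severity: 'investigate' });

  if (candidate.errorRate - baseline.errorRate > 0.001) {
    issues.push({ metric: 'error_rate', severity: 'block' });
  }
  return issues;
}
```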

BASELINE_MANAGEMENT

RULE: baseline is the metric set from the current production release
RULE: update baseline only when a release passes all performance gates
RULE: store baselines in version control alongside performance test scripts
FORMAT: JSON file with endpoint → metric → value mapping
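A baseline file in that format might look like this (endpoint names, metric names, and values are illustrative):

```json
{
  "api_users": { "p50": 120, "p95": 420, "p99": 760, "throughput_rps": 85, "error_rate": 0.0004 },
  "api_projects": { "p50": 180, "p95": 640, "p99": 1100, "throughput_rps": 40, "error_rate": 0.0007 }
}
```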

CI_CD_INTEGRATION

PIPELINE_STAGE: performance_gate (runs after integration tests, before deploy)
STEPS:
1. Deploy candidate to staging environment
2. Run load test suite against staging
3. Compare results against stored baseline
4. IF: any threshold exceeded THEN: fail pipeline
5. IF: passed THEN: update baseline for next comparison

TOOL: k6 threshold comparison

export const options = {
  thresholds: {
    'http_req_duration{endpoint:api_users}': ['p(95)<500'],
    'http_req_duration{endpoint:api_projects}': ['p(95)<800'],
  },
};


PERFORMANCE:SLO_DEFINITION

FROM_BUSINESS_TO_SLO

STEP_1: identify user journeys that matter
EXAMPLES: login, load dashboard, create project, deploy, view analytics

STEP_2: define SLI (Service Level Indicator) for each journey
SLI_TYPES:
- Availability: successful requests / total requests
- Latency: proportion of requests faster than threshold
- Quality: proportion of requests returning correct data

STEP_3: set SLO (Service Level Objective) — the target
RULE: SLO < 100% — perfection is not the goal
RULE: typical SLOs: 99.9% availability (43 min downtime/month), 99% latency within target
RULE: SLO should reflect what users actually need, not what engineering wants

STEP_4: define error budget = 1 - SLO
IF: SLO = 99.9% THEN: error_budget = 0.1% = 43 min/month
IF: error budget exhausted THEN: freeze features, focus on reliability
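STEP_4 as arithmetic — a 99.9% SLO over a 30-day window yields 0.1% of 43,200 minutes, i.e. the 43 minutes quoted above:

```javascript
// Error budget in minutes for an SLO over a rolling window.
function errorBudgetMinutes(slo, windowDays = 30) {
  return (1 - slo) * windowDays * 24 * 60;
}

// The feature-freeze trigger: downtime has consumed the whole budget.
function budgetExhausted(downtimeMinutes, slo, windowDays = 30) {
  return downtimeMinutes >= errorBudgetMinutes(slo, windowDays);
}
```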

SLO_EXAMPLES_FOR_GE

SLO: client application availability
SLI: successful HTTP responses (non-5xx) / total HTTP responses
TARGET: 99.9% over 30-day rolling window
ERROR_BUDGET: 43 minutes downtime per month

SLO: client application latency
SLI: proportion of requests completing within 500ms
TARGET: 95% of requests under 500ms over 30-day rolling window

SLO: agent pipeline throughput
SLI: tasks completed / tasks submitted within SLA time
TARGET: 99% of tasks completed within SLA

SLO: deployment success rate
SLI: successful deployments / total deployments
TARGET: 99% over 30-day rolling window


PERFORMANCE:LIGHTHOUSE_CI

PURPOSE: automated frontend performance gating in CI/CD

SETUP

TOOL: install Lighthouse CI
RUN: npm install -D @lhci/cli

TOOL: configure Lighthouse CI
FILE: .lighthouserc.json

{
  "ci": {
    "collect": {
      "url": ["http://localhost:3000/", "http://localhost:3000/dashboard"],
      "numberOfRuns": 3,
      "settings": {
        "chromeFlags": "--no-sandbox --headless"
      }
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", {"minScore": 0.8}],
        "categories:accessibility": ["warn", {"minScore": 0.9}],
        "first-contentful-paint": ["error", {"maxNumericValue": 2000}],
        "largest-contentful-paint": ["error", {"maxNumericValue": 2500}],
        "cumulative-layout-shift": ["error", {"maxNumericValue": 0.1}],
        "total-blocking-time": ["error", {"maxNumericValue": 300}],
        "interactive": ["error", {"maxNumericValue": 5000}]
      }
    },
    "upload": {
      "target": "filesystem",
      "outputDir": "./lhci-results"
    }
  }
}

TOOL: run Lighthouse CI
RUN: npx lhci autorun

CI_PIPELINE_INTEGRATION

RULE: Lighthouse CI runs on every PR that touches frontend code
RULE: performance score < 80 blocks merge
RULE: accessibility score < 90 warns (blocks at < 80)
RULE: run 3x and take median (Lighthouse results vary)
RULE: use consistent hardware/environment (CI runner, not developer laptop)

ANTI_PATTERN: running Lighthouse once and trusting the result
FIX: run 3-5 times, use median — variance of 5-10 points is normal
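The median step is trivial but worth pinning down, since averaging would let one outlier run skew the gate:

```javascript
// Median of an array of Lighthouse run scores.
function medianScore(scores) {
  const s = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}
```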

ANTI_PATTERN: ignoring Lighthouse warnings because "it works fine for me"
FIX: Lighthouse simulates throttled connection — that's what real users experience