DOMAIN:TESTING — ADVERSARIAL_CALIBRATION¶
OWNER: ashley
ALSO_USED_BY: marije (severity cross-reference), marta (merge gate input)
UPDATED: 2026-03-24
SCOPE: calibration examples for adversarial/chaos testing evaluation — JIT injected before every adversarial task
PURPOSE: distinguish real vulnerabilities from noise, calibrate severity ratings, prevent wasted developer time on false positives
HOW_TO_USE_THIS_PAGE¶
Read these examples BEFORE running adversarial tests or evaluating results.
Ashley's job is to BREAK things — but breaking things that don't matter wastes tokens and developer time.
Every finding must answer two questions:
1. SEVERITY: How bad is this if exploited in production?
2. ACTIONABLE: Can a developer fix this in a reasonable time, and should they?
SEVERITY_SCALE:
- CRITICAL: Data loss, unauthorized access, financial impact, system down
- HIGH: Security vulnerability exploitable by motivated attacker, data integrity risk
- MEDIUM: Degraded experience under stress, recoverable failure, defense in depth gap
- LOW: Theoretical risk, requires unlikely conditions, cosmetic impact
- INFO: Interesting observation, no action needed, document for awareness
EXAMPLE_1: REAL VULNERABILITY VIA UNEXPECTED INPUT¶
SEVERITY: HIGH¶
ACTIONABLE: YES¶
SCENARIO¶
Endpoint: POST /api/projects/:id/comments
Expected input: { "body": "string", "parentId": "string | null" }
Ashley's probe:
// Ashley's probe — SQL injection payload in parentId (project id is a placeholder)
await fetch('/api/projects/PROJECT_ID/comments', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    body: 'test',
    parentId: "'; DROP TABLE comments; --",
  }),
});
RESULT¶
The endpoint returned 500 Internal Server Error. The table was NOT dropped (Postgres UUID type validation rejected the payload before execution), but the raw SQL error message was returned in the response body:
{
"error": "DatabaseError",
"message": "invalid input syntax for type uuid: \"'; DROP TABLE comments; --\"",
"stack": "Error: invalid input syntax for type uuid..."
}
WHY_HIGH_SEVERITY¶
VULNERABILITY_FOUND:
- Error response leaks database engine type (PostgreSQL), column type (UUID), and stack trace
- Stack trace reveals internal file paths and function names
- An attacker can use this information to craft more targeted attacks
- SQL injection itself was blocked by type validation, NOT by parameterized queries — if another endpoint uses string IDs, it may be vulnerable
WHAT_MAKES_THIS_ACTIONABLE:
- FIX_1: Never return raw database errors to clients. Return generic 400 with "Invalid comment parent ID"
- FIX_2: Validate parentId as UUID format BEFORE it reaches the database layer
- FIX_3: Audit all endpoints for similar raw error forwarding
- TIME_ESTIMATE: 2 hours for the fix, 4 hours for the audit
- PRIORITY: Must fix before any client deployment
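FIX_1 and FIX_2 together can be sketched as a pre-database guard. This is a minimal sketch, not the project's code; the regex and the return shape are illustrative assumptions.

```javascript
// Sketch of FIX_1 + FIX_2: validate parentId as a UUID before it reaches
// the database layer, and return only a generic message on failure.
// The UUID regex and { ok, status, error } shape are assumptions.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateParentId(parentId) {
  if (parentId === null) return { ok: true };
  if (typeof parentId === 'string' && UUID_RE.test(parentId)) {
    return { ok: true };
  }
  // Generic message only: no engine name, column type, or stack trace.
  return { ok: false, status: 400, error: 'Invalid comment parent ID' };
}
```

The injection payload from the probe fails this check and never reaches Postgres, so there is no raw database error to leak.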
EVALUATOR_ACTION: Flag as HIGH severity, ACTIONABLE: yes, recommend blocking the work item until fixed.
EXAMPLE_2: FALSE POSITIVE THAT WASTES DEVELOPER TIME¶
SEVERITY: LOW¶
ACTIONABLE: NO¶
SCENARIO¶
Ashley tested: "What happens if a user submits a form with 10MB of text in the name field?"
// Ashley's probe
const hugePayload = {
name: 'A'.repeat(10_000_000),
email: 'test@example.com',
};
await fetch('/api/users', {
method: 'POST',
body: JSON.stringify(hugePayload),
});
RESULT¶
The request was rejected with HTTP 413 Payload Too Large by the reverse proxy (nginx/ingress) before reaching the application.
WHY_LOW_SEVERITY_AND_NOT_ACTIONABLE¶
INFRASTRUCTURE_HANDLED:
- The ingress controller has a 1MB body size limit by default
- This is the correct layer to enforce payload limits
- The application never saw the request
- No crash, no memory spike, no degraded service
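For reference, the limit in question corresponds to nginx's client_max_body_size directive, whose default is 1m; requests above it are rejected with 413 at the proxy before reaching the upstream application. The deployed value lives in the ingress configuration, not in application code.

```nginx
# nginx default shown explicitly; larger request bodies get 413 here.
client_max_body_size 1m;
```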
WHY_THIS_IS_A_FALSE_POSITIVE:
- Ashley found that infrastructure works as designed
- Filing a bug here would waste developer time investigating a non-issue
- The developer would correctly respond "this is already handled by the ingress"
- Net result: time wasted, trust in adversarial findings eroded
WHEN_THIS_WOULD_BE_REAL:
- If the ingress limit was missing and the app crashed on 10MB — that's MEDIUM severity
- If the ingress limit was 100MB and the app stored the full 10MB in the DB — that's HIGH severity
- If the ingress accepted it and the app OOM'd — that's CRITICAL
- The fact that none of these happened means the system is working correctly
EVALUATOR_ACTION: Log as INFO for documentation. Do NOT file as a bug. Do NOT flag as blocking. Note: "Infrastructure payload limit confirmed working at ingress layer."
EXAMPLE_3: RACE CONDITION FOUND BY RAPID SEQUENTIAL ACTIONS¶
SEVERITY: CRITICAL¶
ACTIONABLE: YES¶
SCENARIO¶
Ashley tested: "What happens if two users claim the same discount code at the exact same time?"
// Ashley's probe — 10 concurrent claims
const code = 'SUMMER50'; // single-use code, 50% off
const promises = Array.from({ length: 10 }, (_, i) =>
fetch('/api/orders/apply-discount', {
method: 'POST',
headers: {
Authorization: `Bearer ${testUsers[i].token}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ code, orderId: testOrders[i].id }),
})
);
const results = await Promise.all(promises);
RESULT¶
8 out of 10 requests returned 200 OK. The discount code (marked as single-use) was applied to 8 different orders. Expected: exactly 1 success, 9 rejections.
WHY_CRITICAL_SEVERITY¶
FINANCIAL_IMPACT:
- A single-use 50% discount was applied 8 times
- Direct revenue loss: 7 unauthorized discounts
- Scalable attack: any user who discovers this can share a code with friends and race the claims
- No authentication bypass needed — legitimate users with legitimate tokens
ROOT_CAUSE:
- Code check is: SELECT used FROM discount_codes WHERE code = $1
- Code mark is: UPDATE discount_codes SET used = true WHERE code = $1
- No row-level lock between SELECT and UPDATE
- Under concurrent access, overlapping requests all read used = false before any of them writes used = true (in the probe, 8 of the 10 interleaved this way)
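The interleaving above can be reproduced in-process with a minimal sketch. No database is involved; a plain object stands in for the row, and claimAtomic mirrors a conditional UPDATE that checks and marks in one step. All names are illustrative.

```javascript
// Sketch of the check-then-mark race. The async gap between the read
// and the write is where concurrent claimers interleave: every claimer
// sees used === false before any of them writes.
const row = { used: false };

async function claimRacy() {
  const used = row.used; // SELECT used FROM discount_codes ...
  await new Promise((resolve) => setTimeout(resolve, 0)); // simulated query latency
  if (used) return false; // looked unused, so claim it
  row.used = true; // UPDATE discount_codes SET used = true ...
  return true;
}

function claimAtomic(r) {
  // Check and mark in one synchronous step, the in-process analogue of
  // UPDATE discount_codes SET used = true WHERE code = $1 AND used = false
  if (r.used) return false;
  r.used = true;
  return true;
}

async function demo() {
  const racy = await Promise.all(
    Array.from({ length: 10 }, () => claimRacy())
  );
  const freshRow = { used: false };
  const atomic = Array.from({ length: 10 }, () => claimAtomic(freshRow));
  return {
    racySuccesses: racy.filter(Boolean).length, // more than 1: the bug
    atomicSuccesses: atomic.filter(Boolean).length, // exactly 1
  };
}
```

Running demo() shows many racy successes but exactly one atomic success, which is the behavior the SELECT FOR UPDATE fix restores at the database level.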
FIX_REQUIRED:
-- Use SELECT FOR UPDATE to acquire row lock
BEGIN;
SELECT used FROM discount_codes WHERE code = $1 FOR UPDATE;
-- Now only one transaction can proceed
UPDATE discount_codes SET used = true WHERE code = $1;
COMMIT;
ALTERNATIVE: record claims in a discount_usage table with columns (code, order_id) and a unique constraint on code, so the second concurrent insert fails at the database level.
WHAT_MAKES_THIS_CRITICAL:
- Direct financial impact (revenue loss per occurrence)
- Exploitable by non-technical users (just share a link)
- Scales linearly (more concurrent users = more unauthorized discounts)
- Silent failure (no errors logged, looks like normal usage)
- Easy to fix (row-level lock or unique constraint)
EVALUATOR_ACTION: Flag as CRITICAL, ACTIONABLE: yes, BLOCK THE RELEASE. This must be fixed and re-tested before any deployment.
EXAMPLE_4: EDGE CASE — THEORETICALLY POSSIBLE BUT PRACTICALLY IRRELEVANT¶
SEVERITY: INFO¶
ACTIONABLE: NO — BELOW THRESHOLD¶
SCENARIO¶
Ashley tested: "What happens if the server clock drifts by more than 5 minutes during JWT validation?"
// Ashley's probe — mock system clock forward by 6 minutes
vi.useFakeTimers();
vi.setSystemTime(new Date(Date.now() + 6 * 60 * 1000));
const token = generateJwt({ userId: 'u1', exp: Math.floor(Date.now() / 1000) + 300 });
const result = await validateJwt(token);
// Token should be expired (5-min lifetime, clock is 6 min ahead)
RESULT¶
The JWT was accepted as valid. The validation uses Date.now() from the application server, and the token was generated with the same fake clock. If the clock drifted on one server but not another in a multi-node setup, tokens could be accepted after expiry.
WHY_INFO_AND_NOT_ACTIONABLE¶
THEORETICAL_RISK:
- Clock drift > 5 minutes is extremely rare on modern infrastructure
- k3s nodes use chrony/NTP with typical drift < 100ms
- Cloud providers guarantee clock accuracy within seconds
- This would require a total NTP failure on one specific node
PRACTICAL_THRESHOLD_NOT_MET:
- GE runs on a single k3s node — no multi-node clock discrepancy possible
- Even with multiple nodes, NTP drift exceeding 5 minutes triggers infrastructure alerts long before JWT becomes an issue
- The "fix" (adding clock skew tolerance) actually WEAKENS security by extending token validity
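The trade-off in the last bullet is visible in a minimal expiry check. This is a sketch, not a full JWT validator: every second of skew tolerance is added directly to every token's effective lifetime.

```javascript
// Minimal expiry check sketch (not a full JWT validator).
// leewaySeconds is the clock-skew tolerance: a nonzero value means a
// token stays acceptable for that long AFTER its exp claim has passed.
function isExpired(expUnixSeconds, nowUnixSeconds, leewaySeconds = 0) {
  return nowUnixSeconds > expUnixSeconds + leewaySeconds;
}
```

With zero leeway a token one second past exp is rejected; with 60 seconds of leeway the same token is still accepted, which is exactly the weakening described above.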
WHEN_THIS_THRESHOLD_CHANGES:
- If GE moves to multi-region deployment with unreliable time sync — revisit
- If JWT lifetime is reduced to < 60 seconds — clock drift becomes more relevant
- If a client has regulatory requirements around token expiry precision — revisit
HOW_TO_DECIDE_THE_THRESHOLD:
Is the scenario exploitable without physical/infrastructure access? → No → INFO
Does it require conditions that monitoring would catch first? → Yes → INFO
Is the fix free and side-effect-free? → No (weakens security) → INFO
Would a security auditor flag this? → Only as informational → INFO
EVALUATOR_ACTION: Log as INFO. Do not file a bug. Do not block. Note: "Clock drift JWT edge case documented. No action needed for single-node deployment. Revisit if architecture changes to multi-region."
EXAMPLE_5: DENIAL OF SERVICE VIA RECURSIVE API CALL¶
SEVERITY: HIGH¶
ACTIONABLE: YES¶
SCENARIO¶
Ashley tested: "What happens if a comment references itself as its parent?"
// Ashley's probe
const response = await fetch('/api/comments', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
body: 'test',
parentId: null,
}),
});
const comment = await response.json();
// Now update the comment to be its own parent
await fetch(`/api/comments/${comment.id}`, {
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ parentId: comment.id }),
});
// Now request the comment thread
const thread = await fetch(`/api/comments/thread/${comment.id}`);
RESULT¶
The thread endpoint entered an infinite loop. The server process hit 100% CPU and the request timed out after 30 seconds. During those 30 seconds, all other requests to the same process were blocked.
WHY_HIGH_SEVERITY¶
IMPACT:
- Single malicious request can block a server process for 30 seconds
- Repeating this request can effectively deny service to all users
- No authentication privilege escalation needed — any authenticated user can do this
- The recursive query has no depth limit
NOT_CRITICAL_BECAUSE:
- It does not cause data loss or unauthorized access
- The server recovers after the timeout
- Other processes (if multiple workers) continue serving
- Rate limiting at the ingress layer can mitigate repeated attacks
FIX_REQUIRED:
- IMMEDIATE: Add cycle detection to the thread-building query (check if any ID appears twice)
- IMMEDIATE: Add max depth limit to recursive query (e.g., 20 levels)
- PREVENTIVE: Add constraint that prevents a comment from being its own ancestor (validate on write)
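The two IMMEDIATE fixes can be sketched as a guarded ancestor walk. The byId map and the { id, parentId } comment shape are assumptions about the data model, not the project's actual schema.

```javascript
// Sketch of cycle detection + max depth while walking a comment thread.
// byId maps comment id -> { id, parentId } (assumed shape).
function ancestorChain(byId, startId, maxDepth = 20) {
  const chain = [];
  const seen = new Set();
  let current = byId.get(startId);
  while (current && chain.length < maxDepth) {
    if (seen.has(current.id)) {
      // A repeated id means the parent links form a cycle.
      throw new Error(`cycle detected at comment ${current.id}`);
    }
    seen.add(current.id);
    chain.push(current.id);
    current = current.parentId ? byId.get(current.parentId) : null;
  }
  return chain;
}
```

A self-parented comment throws immediately instead of spinning at 100% CPU, and maxDepth bounds the walk even if cycle detection were bypassed.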
EVALUATOR_ACTION: Flag as HIGH, ACTIONABLE: yes. Block the release. Fix cycle detection and max depth before deployment.
SEVERITY_DECISION_TREE¶
START: Can the issue be triggered by an end user?
├─ NO → Who can trigger it?
│   ├─ Requires infrastructure access → INFO (unless data loss possible)
│   └─ Requires admin privileges → LOW (unless privilege escalation)
└─ YES → What is the worst-case impact?
    ├─ Data loss or corruption → CRITICAL
    ├─ Unauthorized access to other users' data → CRITICAL
    ├─ Financial impact (wrong charges, unauthorized discounts) → CRITICAL
    ├─ Service unavailable for all users → HIGH (CRITICAL if > 5 min)
    ├─ Information leakage (stack traces, internal paths) → HIGH
    ├─ Degraded experience for the triggering user only → MEDIUM
    ├─ Cosmetic issue under unusual conditions → LOW
    └─ Documented behavior working as designed → INFO (not a finding)
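The end-user branch of the tree can be encoded as a small helper. This is a sketch under assumptions: the impact labels are illustrative names, not an existing shared module.

```javascript
// The worst-case-impact branch of the decision tree, for findings an
// end user can trigger. Impact labels are illustrative assumptions.
function severityForUserTriggerable(impact, outageMinutes = 0) {
  switch (impact) {
    case 'data-loss':
    case 'unauthorized-access':
    case 'financial':
      return 'CRITICAL';
    case 'service-unavailable':
      // HIGH by default, CRITICAL if the outage exceeds 5 minutes.
      return outageMinutes > 5 ? 'CRITICAL' : 'HIGH';
    case 'info-leak':
      return 'HIGH';
    case 'degraded-self-only':
      return 'MEDIUM';
    case 'cosmetic':
      return 'LOW';
    default:
      // Documented behavior working as designed: not a finding.
      return 'INFO';
  }
}
```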
FALSE_POSITIVE_CHECKLIST¶
Before filing a finding, verify:
- IS_THIS_ALREADY_HANDLED: Does infrastructure, framework, or existing middleware already prevent this?
- IS_THIS_BY_DESIGN: Did the spec explicitly allow this behavior?
- IS_THIS_EXPLOITABLE: Can a real user actually trigger this, or does it require test-only conditions?
- IS_THE_FIX_WORTH_IT: Does the fix cost more (time, complexity, security weakening) than the risk?
- IS_THIS_UNIQUE: Is this a variant of a finding you already reported? Consolidate, don't duplicate.
If the answers are YES, YES, NO, NO, NO respectively, it is a false positive: do not file it. Any one of those answers on its own is reason enough to hold the finding back and re-check it before filing.
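The checklist can be sketched as a predicate. The field names are illustrative assumptions, not an existing finding schema; a finding is worth filing only when every check comes back clean.

```javascript
// Sketch of the false-positive checklist as a single predicate.
// Field names (alreadyHandled, byDesign, ...) are assumed, not real.
function worthFiling(finding) {
  return (
    !finding.alreadyHandled && // IS_THIS_ALREADY_HANDLED must be NO
    !finding.byDesign &&       // IS_THIS_BY_DESIGN must be NO
    finding.exploitable &&     // IS_THIS_EXPLOITABLE must be YES
    finding.fixWorthIt &&      // IS_THE_FIX_WORTH_IT must be YES
    finding.unique             // IS_THIS_UNIQUE must be YES
  );
}
```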