DOMAIN:MONITORING — PATTERN_DETECTION

OWNER: eltjo, annegreet
ALSO_USED_BY: ron, mira, nessa
UPDATED: 2026-03-26
SCOPE: cross-session pattern recognition, error clustering, anomaly detection, trend analysis, learning extraction, confidence scoring


PATTERN_DETECTION:CORE_PRINCIPLE

PURPOSE: automatically identify recurring error patterns, systemic issues, and improvement opportunities from aggregated session data

RULE: pattern detection operates on aggregated data, not individual sessions
RULE: patterns must reach a confidence threshold before being treated as knowledge
RULE: patterns can be validated, stale, or invalidated — never blindly trusted
RULE: human review is required before patterns influence agent behavior (JIT injection)

CHECK: is the pattern detection pipeline producing new patterns?
IF: no patterns in 7 days THEN: pipeline may be broken — check data flow
IF: producing patterns THEN: review the confidence distribution — a flood of low-confidence patterns is noise


PATTERN_DETECTION:CROSS_SESSION_RECOGNITION

FINGERPRINT_BASED_GROUPING

PURPOSE: group related errors across different agent sessions using fingerprints

TECHNIQUE:

1. each session produces structured learnings with error fingerprints  
2. fingerprint = sha256(normalized_error_class + normalized_message_template)  
3. group all learnings by fingerprint  
4. count occurrences, unique agents, time span  
5. groups with 3+ occurrences become candidate patterns  
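Steps 1–2 can be sketched in Python. The specific normalization rules below (hex ids, numbers, file paths) are illustrative assumptions, not the production ruleset:

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Collapse variable parts so the same error yields the same template."""
    text = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", text)  # pointers, hashes
    text = re.sub(r"/[\w./-]+", "<PATH>", text)       # file paths
    text = re.sub(r"\d+", "<NUM>", text)              # counts, ids, line numbers
    return text.strip().lower()

def fingerprint(error_class: str, message: str) -> str:
    """sha256 over the normalized error class + message template."""
    payload = normalize(error_class) + "|" + normalize(message)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Two errors differing only in a request id then collapse to the same fingerprint, which is what makes the SQL grouping below meaningful.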

TOOL: SQL aggregation on session_learnings

SELECT  
    fingerprint,  
    COUNT(*) as occurrences,  
    COUNT(DISTINCT agent_name) as affected_agents,  
    array_agg(DISTINCT agent_name) as agents,  
    MIN(created_at) as first_seen,  
    MAX(created_at) as last_seen,  
    MODE() WITHIN GROUP (ORDER BY error_class) as primary_error_class  
FROM session_learnings  
WHERE created_at > NOW() - INTERVAL '7 days'  
  AND fingerprint IS NOT NULL  
GROUP BY fingerprint  
HAVING COUNT(*) >= 3  
ORDER BY occurrences DESC;  

CHECK: are fingerprints grouping correctly?
IF: same error gets different fingerprints THEN: normalization too narrow — strip more variable parts
IF: different errors get same fingerprint THEN: normalization too broad — keep more specific tokens

RULE: review fingerprint quality monthly — adjust normalization rules as needed

SCOPE_CLASSIFICATION

PURPOSE: determine whether a pattern is system-wide or agent-specific

RULE: 3+ unique agents affected = system-wide pattern (tag scope:system)
RULE: single agent only = agent-specific pattern (tag scope:agent:{name})
RULE: 2 agents = inconclusive — wait for more data before classifying

CHECK: is the pattern scope correctly assigned?
IF: system-wide pattern caused by single root cause THEN: correct
IF: agent-specific patterns share same fingerprint THEN: may be system-wide — check normalization

RULE: system-wide patterns are 3x higher priority than agent-specific patterns
RULE: system-wide patterns are candidates for infrastructure fixes
RULE: agent-specific patterns are candidates for agent configuration changes


PATTERN_DETECTION:ERROR_CLUSTERING

TEMPORAL_CLUSTERING

PURPOSE: detect bursts of errors that indicate an active incident

TECHNIQUE: sliding window count

1. query errors in last 1 hour, grouped by 5-minute buckets  
2. compute mean error rate per bucket  
3. if any bucket exceeds 3x the mean: temporal cluster detected  
4. temporal cluster = likely active incident  

TOOL: SQL windowed aggregation

WITH buckets AS (  
    SELECT  
        DATE_TRUNC('hour', created_at)  
          + FLOOR(EXTRACT(MINUTE FROM created_at) / 5) * INTERVAL '5 minutes' AS bucket,  
        COUNT(*) AS error_count  
    FROM session_learnings  
    WHERE level = 'error'  
      AND created_at > NOW() - INTERVAL '1 hour'  
    GROUP BY bucket  
)  
SELECT bucket, error_count,  
       (SELECT AVG(error_count) FROM buckets) AS mean_rate  
FROM buckets  
WHERE error_count > 3 * (SELECT AVG(error_count) FROM buckets)  
ORDER BY bucket;  

CHECK: is there a temporal cluster right now?
IF: yes THEN: active incident — correlate with recent deployments or config changes
IF: no THEN: normal error rate — continue routine monitoring

AGENT_CLUSTERING

PURPOSE: detect when multiple agents experience the same class of error simultaneously

TECHNIQUE:

1. query errors in last 30 minutes, grouped by error_class  
2. for each error_class: count unique agents affected  
3. if 3+ agents share the same error_class: agent cluster detected  
4. agent cluster = likely infrastructure issue (not agent misconfiguration)  
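No SQL tool is given for this technique; a minimal in-memory sketch follows, assuming error records arrive as (created_at, agent_name, error_class) tuples:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def detect_agent_clusters(errors, window_minutes=30, min_agents=3):
    """Return {error_class: {agents}} for classes hit by min_agents+
    distinct agents within the window (likely infrastructure issues)."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    agents_by_class = defaultdict(set)
    for created_at, agent, error_class in errors:
        if created_at >= cutoff:
            agents_by_class[error_class].add(agent)
    return {cls: agents for cls, agents in agents_by_class.items()
            if len(agents) >= min_agents}
```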

CHECK: are multiple agents failing with the same error?
IF: yes THEN: shared dependency failure — investigate infrastructure (Redis, DB, network)
IF: no THEN: isolated issue — investigate individual agent or task

RULE: agent clustering overrides individual agent alerts — suppress agent alerts, emit cluster alert

CATEGORY_CLUSTERING

PURPOSE: detect when errors from the same category spike across the system

CATEGORIES (from error taxonomy):

infra:*     — infrastructure failures  
runtime:*   — code-level errors  
api:*       — API communication issues  
cost:*      — budget violations  
loop:*      — infinite loop / hook loop  

CHECK: is one error category dominating the last hour?
IF: > 70% of errors are in one category THEN: systematic issue in that domain
IF: errors spread across categories THEN: multiple unrelated issues or noisy period
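The 70% dominance check can be sketched as follows, assuming error tags use the category:subcategory convention from the taxonomy above:

```python
from collections import Counter

def dominant_category(error_tags, threshold=0.7):
    """error_tags: 'category:subcategory' strings from the last hour.
    Returns the top-level category if its share exceeds threshold, else None."""
    if not error_tags:
        return None
    counts = Counter(tag.split(":", 1)[0] for tag in error_tags)
    category, count = counts.most_common(1)[0]
    return category if count / len(error_tags) > threshold else None
```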


PATTERN_DETECTION:ANOMALY_DETECTION

BASELINE_DEVIATION

PURPOSE: detect when current behavior significantly differs from historical baseline

TECHNIQUE: statistical baseline

1. compute 7-day rolling average for key metrics:  
   - errors per hour  
   - session duration  
   - cost per session  
   - task completion rate  
2. compute standard deviation  
3. current value > mean + 2*stddev = anomaly (warning)  
4. current value > mean + 3*stddev = anomaly (critical)  
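The sigma thresholds above map directly to a small scoring function; a sketch using the stdlib `statistics` module:

```python
import statistics

def anomaly_level(history: list[float], current: float) -> str:
    """Compare current value against mean + k*stddev of the baseline window.
    Returns 'critical' (3-sigma), 'warning' (2-sigma), or 'normal'."""
    mean = statistics.mean(history)
    stddev = statistics.stdev(history)  # sample standard deviation
    if current > mean + 3 * stddev:
        return "critical"
    if current > mean + 2 * stddev:
        return "warning"
    return "normal"
```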

METRICS_TO_MONITOR:

error_rate_per_hour:       sudden increase = incident  
avg_session_duration:      sudden increase = performance regression  
avg_cost_per_session:      sudden increase = cost burn  
task_completion_rate:      sudden decrease = systemic failure  
avg_tokens_per_session:    sudden increase = prompt bloat or loop  

CHECK: any metric outside 2-sigma?
IF: yes THEN: anomaly detected — investigate
IF: all within bounds THEN: system operating normally

ANTI_PATTERN: setting static thresholds for anomaly detection
FIX: use statistical baselines — static thresholds do not adapt to system growth

ANTI_PATTERN: anomaly detection on less than 7 days of data
FIX: wait until 7+ days of history exist — insufficient baseline data produces false positives

CHANGE_POINT_DETECTION

PURPOSE: detect when a metric permanently shifts to a new level (not a spike, a regime change)

TECHNIQUE:

1. compare mean of last 24 hours vs previous 7-day mean  
2. if shift > 2*stddev AND sustained for > 6 hours: change point  
3. change point = permanent shift, not transient anomaly  
4. investigate: was there a deployment, config change, or dependency upgrade?  
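Steps 1–2 as a sketch; the caller is assumed to supply hourly values and track how long the shift has been sustained:

```python
import statistics

def is_change_point(baseline_7d: list[float], recent_24h: list[float],
                    sustained_hours: int) -> bool:
    """Change point: last-24h mean shifted > 2*stddev from the 7-day
    baseline, and the shift has been sustained for more than 6 hours."""
    mean_7d = statistics.mean(baseline_7d)
    stddev_7d = statistics.stdev(baseline_7d)
    shift = abs(statistics.mean(recent_24h) - mean_7d)
    return shift > 2 * stddev_7d and sustained_hours > 6
```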

CHECK: has error rate permanently shifted up after last deployment?
IF: yes THEN: deployment introduced a regression — rollback or fix
IF: no (temporary spike that recovered) THEN: transient issue, monitor


PATTERN_DETECTION:TREND_ANALYSIS

SHORT_TERM_TRENDS

PURPOSE: detect deteriorating or improving patterns over hours/days

TECHNIQUE: linear regression on hourly error counts

1. collect hourly error counts for last 48 hours  
2. fit linear regression: error_count = slope * hour + intercept  
3. if slope > 0 (statistically significant): deteriorating trend  
4. if slope < 0 (statistically significant): improving trend  
5. if slope ~ 0: stable  
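A least-squares slope over the hourly counts can be computed with the stdlib; this sketch substitutes a fixed slope threshold for a formal significance test:

```python
import statistics

def error_trend(hourly_counts: list[float], threshold: float = 1.0) -> str:
    """Fit error_count = slope * hour + intercept by least squares.
    Returns 'deteriorating', 'improving', or 'stable'."""
    hours = list(range(len(hourly_counts)))
    mean_x = statistics.mean(hours)
    mean_y = statistics.mean(hourly_counts)
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(hours, hourly_counts))
             / sum((x - mean_x) ** 2 for x in hours))
    if slope > threshold:
        return "deteriorating"
    if slope < -threshold:
        return "improving"
    return "stable"
```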

CHECK: is the error trend deteriorating?
IF: positive slope > 1 error/hour increase THEN: warn — investigate before it becomes critical
IF: stable or improving THEN: no action needed

LONG_TERM_TRENDS

PURPOSE: detect systemic improvements or degradations over weeks/months

METRICS_FOR_LONG_TERM:

weekly_error_count:        decreasing = system maturing  
weekly_mttr:               decreasing = operations improving  
weekly_cost_per_task:      decreasing = efficiency improving  
weekly_learning_reuse:     increasing = knowledge pipeline working  
weekly_false_positive_rate: decreasing = fingerprinting improving  

RULE: report long-term trends in weekly SLA report
RULE: deteriorating long-term trends trigger planning discussions, not immediate alerts


PATTERN_DETECTION:LEARNING_EXTRACTION

FROM_PATTERNS_TO_LEARNINGS

PURPOSE: promote validated patterns into actionable learnings for agent JIT injection

PROMOTION_CRITERIA:

1. confidence >= 0.8 (see confidence scoring below)  
2. resolution_rate >= 0.7 (pattern has known solution that works)  
3. recency: last_seen within 30 days  
4. diversity: affects 3+ agents OR is critical severity  
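The four criteria combine into a single gate; a sketch, assuming `last_seen` is a `date` and severity is a plain string:

```python
from datetime import date, timedelta

def meets_promotion_criteria(confidence: float, resolution_rate: float,
                             last_seen: date, unique_agents: int,
                             severity: str) -> bool:
    """All four promotion criteria must hold to promote a candidate."""
    recent = (date.today() - last_seen) <= timedelta(days=30)
    diverse = unique_agents >= 3 or severity == "critical"
    return confidence >= 0.8 and resolution_rate >= 0.7 and recent and diverse
```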

CHECK: does the pattern meet all promotion criteria?
IF: all met THEN: promote to VALIDATED learning in knowledge_patterns table
IF: not met THEN: keep as candidate — continue collecting data

RULE: promoted learnings are written to wiki at wiki/docs/learnings/patterns/
RULE: promoted learnings are eligible for JIT injection into agent prompts
RULE: max 5 learnings injected per session (500 token budget)

LEARNING_FORMAT

PURPOSE: structured format for learnings that can be injected into agent context

FORMAT:

symptom: "exact error message template (normalized)"  
context: "what the agent was typically doing when this occurs"  
root_cause: "underlying reason for the error"  
solution: "specific steps to resolve"  
prevention: "what to do differently to avoid this"  
confidence: 0.0-1.0  
tags: ["category:subcategory", "scope:system|agent:name"]  
agents_affected: ["boris", "gerco", "thijmen"]  
first_seen: "2026-03-01"  
last_seen: "2026-03-25"  
occurrences: 47  

RULE: one learning per distinct root cause
RULE: solution field must be specific and actionable
RULE: learnings with null solution are still valuable (known issues without fix)


PATTERN_DETECTION:CONFIDENCE_SCORING

CONFIDENCE_FORMULA

PURPOSE: quantify how trustworthy a detected pattern is

FORMULA:

base_confidence = min(1.0, occurrences / 10)  
recency_factor = 1.0 if last_seen < 24h  
                 0.8 if last_seen < 7d  
                 0.5 if last_seen < 30d  
                 0.2 if last_seen > 30d  
diversity_factor = min(1.0, unique_agents / 5)  
resolution_factor = successful_resolutions / total_occurrences  

confidence = (base_confidence * 0.4) +  
             (recency_factor * 0.2) +  
             (diversity_factor * 0.2) +  
             (resolution_factor * 0.2)  
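The formula above translates directly to Python; the input is assumed to arrive as raw counts and an hours-since-last-seen value:

```python
def pattern_confidence(occurrences: int, hours_since_last_seen: float,
                       unique_agents: int, successful_resolutions: int) -> float:
    """Weighted blend of volume, recency, diversity, and resolution success."""
    base = min(1.0, occurrences / 10)
    if hours_since_last_seen < 24:            # < 24h
        recency = 1.0
    elif hours_since_last_seen < 24 * 7:      # < 7d
        recency = 0.8
    elif hours_since_last_seen < 24 * 30:     # < 30d
        recency = 0.5
    else:                                     # > 30d
        recency = 0.2
    diversity = min(1.0, unique_agents / 5)
    resolution = successful_resolutions / occurrences
    return base * 0.4 + recency * 0.2 + diversity * 0.2 + resolution * 0.2
```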

RULE: confidence is recalculated on every new occurrence
RULE: confidence < 0.3 AND last_seen > 30 days = STALE
RULE: confidence >= 0.8 AND resolution >= 0.7 = VALIDATED
RULE: STALE learnings excluded from JIT injection but never deleted

STALENESS_DETECTION

TRIGGERS for staleness review:

1. learning references a file path that no longer exists  
2. learning references a dependency version that has been upgraded  
3. learning has not matched any session error in 60 days  
4. codebase area the learning covers has been refactored (git log check)  

TOOL: staleness sweep query

SELECT id, symptom, solution, confidence, last_matched_at,  
       EXTRACT(DAY FROM NOW() - last_matched_at) as days_since_match  
FROM knowledge_patterns  
WHERE confidence > 0  
  AND (last_matched_at < NOW() - INTERVAL '60 days'  
       OR last_matched_at IS NULL)  
ORDER BY confidence DESC;  

RULE: do NOT delete stale learnings — demote to confidence 0.1
RULE: stale learnings can be revived if a new matching error occurs
RULE: weekly staleness sweep (CronJob, not file watcher — NEVER use file watchers)

ANTI_PATTERN: keeping confidence scores from initial detection forever
FIX: recalculate on every match — confidence should reflect current relevance

ANTI_PATTERN: treating confidence as binary (trusted/not trusted)
FIX: confidence is a spectrum — use thresholds for different decisions