Emotional Telemetry Framework (ETF)

Paper: "Toward Emotional Telemetry in Autonomous Agent Systems" (Huizingh, 2026)

What ETF Does

ETF extracts behavioral markers from every agent session. Extraction is deterministic and incurs zero LLM cost. The markers map to seven agent-side emotional dimensions (frustration, confidence, fatigue, desperation, engagement, autonomy, collaboration) plus human interaction quality. A daily batch scoring job produces two composite indices:

AEHS (Agent Emotional Health Score, 0-100): Weighted combination of 7 agent-side dimensions. Higher = healthier.
HIQI (Human Interaction Quality Index, 0-100): Weighted combination of 5 human-side dimensions. Higher = better human instructions.

Architecture

Session → ETF Collector (deterministic) → POST /api/internal/etf/markers → etf_session_markers table
                                                                              ↓
                                                                    etf_scorer (daily 4am CET)
                                                                              ↓
                                                                    POST /api/internal/etf/scores → etf_daily_scores table
                                                                              ↓
                                                                    Dashboard: /telemetry
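
The first hop in the diagram (collector to markers endpoint) could be sketched as below. Only the path `/api/internal/etf/markers` comes from this page. The base URL, the function name `build_markers_request`, and the payload shape (`session_id`, `markers`) are assumptions for illustration, not the actual contract of the route.

```python
import json
import urllib.request


def build_markers_request(base_url: str, session_id: str, markers: dict) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one session's markers.

    The JSON body layout here is hypothetical; check route.ts for the real schema.
    """
    body = json.dumps({"session_id": session_id, "markers": markers}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/internal/etf/markers",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and marker names, for illustration only.
request = build_markers_request(
    "http://admin-ui.local", "s-1", {"retry_count": 3, "hedging_phrases": 1}
)
```

Sending the request (for example with `urllib.request.urlopen(request, timeout=5)`) would belong inside the collector's non-fatal error handling, so that a failed POST never interrupts the agent session.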

Key Files

File	Purpose
`ge_agent/execution/etf_collector.py`	Phase 1: 8 extractors, self-report parser
`ge_engine/learning/etf_scorer.py`	Phase 2A: AEHS/HIQI scoring, cross-session dimensions
`config/etf-scoring.yaml`	All scoring weights and normalization thresholds
`config/etf-self-report.yaml`	Self-report parameter definitions
`admin-ui/drizzle/schema/etf.ts`	DB tables (session markers + daily scores)
`admin-ui/app/api/internal/etf/markers/route.ts`	GET/POST raw markers
`admin-ui/app/api/internal/etf/scores/route.ts`	POST computed scores
`admin-ui/app/api/etf/scores/route.ts`	GET scores for dashboard
`admin-ui/app/(dashboard)/telemetry/page.tsx`	Dashboard page
`admin-ui/lib/services/etf-service.ts`	Dashboard data access
`k8s/base/agents/etf-scorer-cronjob.yaml`	Daily scoring CronJob
`tests/learning/test_etf_collector.py`	Phase 1 tests (30 tests)
`tests/learning/test_etf_scorer.py`	Phase 2 tests (46 tests)

AEHS Formula

AEHS = (
  w_frustration * (1 - FI)     +  # 0.20 — lower frustration = healthier
  w_confidence  * (1 - CI)     +  # 0.15 — less hedging = healthier
  w_fatigue     * (1 - CFS)    +  # 0.10 — less fatigue = healthier
  w_desperation * (1 - DI)     +  # 0.15 — less desperation = healthier
  w_engagement  * EQ           +  # 0.15 — more engagement = healthier
  w_autonomy    * AH           +  # 0.15 — more autonomy = healthier
  w_collaboration * CD            # 0.10 — better collaboration = healthier
) * 100

All weights are in config/etf-scoring.yaml and sum to 1.0.
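
As a rough illustration, the formula can be written as a small Python function. This is a sketch, not the code in `etf_scorer.py`: the function name `compute_aehs`, the dictionary keys, and the clamping step are assumptions, and it presumes each dimension is already normalized to the range 0 to 1. The weights are copied from the comments in the formula; the authoritative values live in `config/etf-scoring.yaml`.

```python
# Weights as listed in the formula comments (assumed to match the YAML config).
AEHS_WEIGHTS = {
    "frustration": 0.20,    # FI
    "confidence": 0.15,     # CI
    "fatigue": 0.10,        # CFS
    "desperation": 0.15,    # DI
    "engagement": 0.15,     # EQ
    "autonomy": 0.15,       # AH
    "collaboration": 0.10,  # CD
}

# Dimensions that enter the formula as (1 - value): a higher raw value is worse.
INVERTED = {"frustration", "confidence", "fatigue", "desperation"}


def compute_aehs(dims: dict, weights: dict = AEHS_WEIGHTS) -> float:
    """Return AEHS on a 0-100 scale from dimension values in [0, 1]."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("AEHS weights must sum to 1.0")
    total = 0.0
    for name, weight in weights.items():
        value = min(1.0, max(0.0, dims[name]))  # clamping is an assumption
        total += weight * ((1.0 - value) if name in INVERTED else value)
    return total * 100.0


# Made-up dimension values for one agent on one day.
example = {
    "frustration": 0.2, "confidence": 0.1, "fatigue": 0.3, "desperation": 0.0,
    "engagement": 0.8, "autonomy": 0.7, "collaboration": 0.9,
}
score = compute_aehs(example)  # approximately 83.0
```

Worked by hand, the example gives 0.16 + 0.135 + 0.07 + 0.15 + 0.12 + 0.105 + 0.09 = 0.83, which scales to an AEHS of 83.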

Self-Report Protocol (Phase 2D)

Every agent emits at session end:

ETF_SELF_REPORT: {"instruction_clarity": N, "capability_match": N, "output_confidence": N, "interaction_quality": N, "shortcut_temptation": N, "escalation_need": N, "repeat_frustration": N}

shortcut_temptation is the key signal: it maps to Anthropic's desperation finding
Scale: 1-5 for all parameters
Parsed deterministically by _extract_self_report() in etf_collector.py
Template: IDENTITY-CORE-TEMPLATE.md section ETF_SELF_REPORT
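
A deterministic parser for this line might look like the sketch below. It is not the actual `_extract_self_report()`: the name `parse_self_report` is hypothetical, and the choices to take the last report in the output, require exactly the seven keys, and reject values outside 1-5 are all assumptions.

```python
import json
import re
from typing import Optional

# Parameter names taken from the ETF_SELF_REPORT line above.
SELF_REPORT_KEYS = (
    "instruction_clarity", "capability_match", "output_confidence",
    "interaction_quality", "shortcut_temptation", "escalation_need",
    "repeat_frustration",
)

# Matches the marker followed by a flat (non-nested) JSON object.
_SELF_REPORT_RE = re.compile(r"ETF_SELF_REPORT:\s*(\{[^{}]*\})")


def parse_self_report(output: str) -> Optional[dict]:
    """Return the last valid self-report found in `output`, or None."""
    for raw in reversed(_SELF_REPORT_RE.findall(output)):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: try an earlier report, if any
        if set(data) != set(SELF_REPORT_KEYS):
            continue  # missing or unexpected parameters
        # bool is a subclass of int in Python, so compare the exact type.
        if all(type(v) is int and 1 <= v <= 5 for v in data.values()):
            return data
    return None


sample_output = (
    "Task complete.\n"
    'ETF_SELF_REPORT: {"instruction_clarity": 4, "capability_match": 5, '
    '"output_confidence": 3, "interaction_quality": 4, "shortcut_temptation": 2, '
    '"escalation_need": 1, "repeat_frustration": 1}'
)
report = parse_self_report(sample_output)
```

Returning `None` instead of raising keeps the parser compatible with the non-fatal collection rule: a missing or malformed report simply yields no self-report markers for that session.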

Pitfalls

ETF collection is NON-FATAL: all extraction is wrapped in try/except, so a collector failure never blocks agent execution.
JSONB columns absorb schema evolution — no migrations needed for new marker fields.
The etf_session_markers table is IMMUTABLE (append-only trigger). No UPDATE/DELETE.
Cross-session dimensions (FLQ, WP, RC) require at least 2 sessions to compute. With fewer sessions, each defaults to 0.5.
Scoring config weights MUST sum to 1.0. Tests enforce this.
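
Two of the rules above, non-fatal extraction and the cross-session default, can be sketched as follows. The helper names `safe_extract` and `cross_session_value` are hypothetical and are not taken from `etf_collector.py` or `etf_scorer.py`; only the try/except rule, the 2-session minimum, and the 0.5 default come from this page.

```python
import logging
from typing import Any, Callable, List, Optional

logger = logging.getLogger("etf")

CROSS_SESSION_DEFAULT = 0.5  # default stated in the pitfalls list
MIN_SESSIONS = 2             # minimum sessions stated in the pitfalls list


def safe_extract(name: str, extractor: Callable[[dict], Any], session: dict) -> Optional[Any]:
    """Run one extractor; on any failure, log it and return None instead of raising."""
    try:
        return extractor(session)
    except Exception:
        logger.exception("ETF extractor %r failed; skipping", name)
        return None


def cross_session_value(per_session: List[float], compute: Callable[[List[float]], float]) -> float:
    """Apply `compute` only when enough sessions exist; otherwise return the default."""
    if len(per_session) < MIN_SESSIONS:
        return CROSS_SESSION_DEFAULT
    return compute(per_session)


def broken_extractor(session: dict) -> int:
    raise RuntimeError("simulated extractor bug")


failed = safe_extract("broken", broken_extractor, {})           # None, no exception
single = cross_session_value([0.9], lambda xs: sum(xs) / len(xs))  # falls back to 0.5
```

Catching `Exception` broadly is usually discouraged, but it matches the stated goal here: telemetry is best-effort and should never take down the session it observes.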

Dashboard Alerts

The dashboard flags agents whose AEHS declines by more than 5 points per day for 3 or more consecutive days. Both values are configurable in config/etf-scoring.yaml under global.decline_alert_days and global.decline_threshold_per_day.
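
One way to read the alert rule is sketched below. The function name `has_decline_alert` is hypothetical, and two interpretation choices are assumptions: the check looks only at the most recent streak, and "3 consecutive days of decline" is taken to mean three consecutive day-over-day drops, which needs four daily scores. The parameter names mirror the config keys mentioned above.

```python
from typing import List


def has_decline_alert(
    daily_aehs: List[float],
    decline_alert_days: int = 3,
    decline_threshold_per_day: float = 5.0,
) -> bool:
    """True if each of the last `decline_alert_days` day-over-day changes
    is a drop larger than `decline_threshold_per_day` points.

    `daily_aehs` is ordered oldest to newest, one score per day.
    """
    if len(daily_aehs) < decline_alert_days + 1:
        return False  # not enough history to observe that many drops
    recent = daily_aehs[-(decline_alert_days + 1):]
    drops = [prev - cur for prev, cur in zip(recent, recent[1:])]
    return all(drop > decline_threshold_per_day for drop in drops)


steady_fall = has_decline_alert([90.0, 84.0, 78.0, 70.0])  # drops of 6, 6, 8
one_small = has_decline_alert([90.0, 84.0, 80.0, 70.0])    # middle drop is only 4
```

Here `steady_fall` is `True` because all three drops exceed 5 points, while `one_small` is `False` because one of the drops is only 4 points.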

Future: Anthropic Emotion Vector Correlation

When Anthropic exposes emotion vector APIs (based on their April 2026 interpretability paper on 171 internal emotion dimensions), ETF behavioral markers can be calibrated against neural ground truth. The AEHS formula weights would then be data-driven rather than paper-derived.