Emotional Telemetry Framework (ETF)
Paper: "Toward Emotional Telemetry in Autonomous Agent Systems" (Huizingh, 2026)
What ETF Does
ETF extracts behavioral markers from every agent session — deterministically, at zero LLM cost. These markers map to emotional dimensions (frustration, confidence, fatigue, desperation, engagement, autonomy, collaboration) plus human interaction quality. Daily batch scoring produces two composite indices:
- AEHS (Agent Emotional Health Score, 0-100): Weighted combination of 7 agent-side dimensions. Higher = healthier.
- HIQI (Human Interaction Quality Index, 0-100): Weighted combination of 5 human-side dimensions. Higher = better human instructions.
Architecture
Session → ETF Collector (deterministic) → POST /api/internal/etf/markers → etf_session_markers table
↓
etf_scorer (daily 4am CET)
↓
POST /api/internal/etf/scores → etf_daily_scores table
↓
Dashboard: /telemetry
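As an illustration of the first hop in the flow above, here is a minimal sketch of how a collector might build the POST request to the markers endpoint. Only the endpoint path comes from this page; the base URL, the `build_markers_request` helper, and the payload fields (`session_id`, `markers`) are assumptions made for the example, not the actual request schema.

```python
import json
import urllib.request

# Endpoint path taken from the architecture flow above.
MARKERS_PATH = "/api/internal/etf/markers"


def build_markers_request(base_url: str, session_id: str, markers: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one session's markers.

    The payload shape is hypothetical; the real schema is defined by the
    markers route in admin-ui.
    """
    body = json.dumps({"session_id": session_id, "markers": markers}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + MARKERS_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would be a call to `urllib.request.urlopen(request, timeout=...)`, ideally inside the same non-fatal error handling the collector uses for extraction.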
Key Files
| File | Purpose |
|---|---|
| `ge_agent/execution/etf_collector.py` | Phase 1: 8 extractors, self-report parser |
| `ge_engine/learning/etf_scorer.py` | Phase 2A: AEHS/HIQI scoring, cross-session dimensions |
| `config/etf-scoring.yaml` | All scoring weights and normalization thresholds |
| `config/etf-self-report.yaml` | Self-report parameter definitions |
| `admin-ui/drizzle/schema/etf.ts` | DB tables (session markers + daily scores) |
| `admin-ui/app/api/internal/etf/markers/route.ts` | GET/POST raw markers |
| `admin-ui/app/api/internal/etf/scores/route.ts` | POST computed scores |
| `admin-ui/app/api/etf/scores/route.ts` | GET scores for dashboard |
| `admin-ui/app/(dashboard)/telemetry/page.tsx` | Dashboard page |
| `admin-ui/lib/services/etf-service.ts` | Dashboard data access |
| `k8s/base/agents/etf-scorer-cronjob.yaml` | Daily scoring CronJob |
| `tests/learning/test_etf_collector.py` | Phase 1 tests (30 tests) |
| `tests/learning/test_etf_scorer.py` | Phase 2 tests (46 tests) |
AEHS Formula
AEHS = (
    w_frustration   * (1 - FI)  +  # 0.20; lower frustration = healthier
    w_confidence    * (1 - CI)  +  # 0.15; less hedging = healthier
    w_fatigue       * (1 - CFS) +  # 0.10; less fatigue = healthier
    w_desperation   * (1 - DI)  +  # 0.15; less desperation = healthier
    w_engagement    * EQ        +  # 0.15; more engagement = healthier
    w_autonomy      * AH        +  # 0.15; more autonomy = healthier
    w_collaboration * CD           # 0.10; better collaboration = healthier
) * 100
All weights are in config/etf-scoring.yaml and sum to 1.0.
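The formula can be sketched in Python as follows. The weights are the values shown in the formula comments, hard-coded here for illustration; the real values live in config/etf-scoring.yaml and may differ. The `compute_aehs` name, the dictionary keys, and the clamping of each dimension to the range 0 to 1 are assumptions of this sketch, not the behavior of `etf_scorer.py`.

```python
import math

# Weights copied from the formula comments above (assumed to match the config).
WEIGHTS = {
    "frustration": 0.20,
    "confidence": 0.15,
    "fatigue": 0.10,
    "desperation": 0.15,
    "engagement": 0.15,
    "autonomy": 0.15,
    "collaboration": 0.10,
}

# Dimensions that enter the formula as (1 - value): FI, CI, CFS, DI.
INVERTED = {"frustration", "confidence", "fatigue", "desperation"}


def compute_aehs(dims: dict, weights: dict = WEIGHTS) -> float:
    """Return AEHS on a 0-100 scale from dimension scores assumed to lie in [0, 1]."""
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("AEHS weights must sum to 1.0")
    total = 0.0
    for name, weight in weights.items():
        # Clamp defensively; this sketch assumes normalized inputs.
        value = min(1.0, max(0.0, float(dims[name])))
        total += weight * ((1.0 - value) if name in INVERTED else value)
    return total * 100.0
```

With every dimension at 0.5 the score comes out at about 50, and an agent with no frustration, hedging, fatigue, or desperation and full engagement, autonomy, and collaboration scores about 100.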
Self-Report Protocol (Phase 2D)
Every agent emits at session end:
ETF_SELF_REPORT: {"instruction_clarity": N, "capability_match": N, "output_confidence": N, "interaction_quality": N, "shortcut_temptation": N, "escalation_need": N, "repeat_frustration": N}
- shortcut_temptation is the key signal — maps to Anthropic's desperation finding
- Scale: 1-5 for all parameters
- Parsed deterministically by `_extract_self_report()` in `etf_collector.py`
- Template: `IDENTITY-CORE-TEMPLATE.md`, section ETF_SELF_REPORT
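A deterministic parser for this line could look roughly like the sketch below. It is not the actual `_extract_self_report()` implementation: the function name `parse_self_report`, the choice to take the last valid report in the output, and the strict validation (exactly the seven keys, integer values from 1 to 5) are all assumptions made for illustration.

```python
import json
import re
from typing import Optional

# Parameter names taken from the ETF_SELF_REPORT line above.
SELF_REPORT_KEYS = (
    "instruction_clarity", "capability_match", "output_confidence",
    "interaction_quality", "shortcut_temptation", "escalation_need",
    "repeat_frustration",
)

# The report is a flat JSON object, so a non-greedy match up to the first
# closing brace is sufficient.
_REPORT_RE = re.compile(r"ETF_SELF_REPORT:\s*(\{.*?\})")


def parse_self_report(output: str) -> Optional[dict]:
    """Return the last well-formed self-report in the session output, or None."""
    for raw in reversed(_REPORT_RE.findall(output)):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(data, dict) or set(data) != set(SELF_REPORT_KEYS):
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if all(isinstance(v, int) and not isinstance(v, bool) and 1 <= v <= 5
               for v in data.values()):
            return data
    return None
```

Returning `None` rather than raising keeps the parser compatible with non-fatal collection: a missing or malformed report simply produces no self-report markers.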
Pitfalls
- ETF collection is NON-FATAL: all extraction wrapped in try/except. Never blocks execution.
- JSONB columns absorb schema evolution — no migrations needed for new marker fields.
- The `etf_session_markers` table is IMMUTABLE (append-only trigger). No UPDATE/DELETE.
- Cross-session dimensions (FLQ, WP, RC) require minimum 2 sessions to compute. Default = 0.5.
- Scoring config weights MUST sum to 1.0. Tests enforce this.
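Two of the pitfalls above, non-fatal collection and the cross-session default, can be illustrated with a short sketch. The names `collect_markers` and `cross_session_dimension`, the extractor signature, and the logger name are hypothetical; only the try/except behavior, the 2-session minimum, and the 0.5 default come from this page.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("etf")  # hypothetical logger name

CROSS_SESSION_DEFAULT = 0.5  # default stated in the pitfalls above
MIN_SESSIONS = 2             # minimum sessions stated in the pitfalls above


def collect_markers(
    session: Dict[str, Any],
    extractors: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> Dict[str, Any]:
    """Run each extractor; a failing extractor is logged and skipped, never raised."""
    markers: Dict[str, Any] = {}
    for name, extractor in extractors.items():
        try:
            markers[name] = extractor(session)
        except Exception:
            logger.warning("ETF extractor %s failed", name, exc_info=True)
    return markers


def cross_session_dimension(
    per_session_values: List[float],
    compute: Callable[[List[float]], float],
) -> float:
    """Apply `compute` only when enough sessions exist; otherwise use the default."""
    if len(per_session_values) < MIN_SESSIONS:
        return CROSS_SESSION_DEFAULT
    return compute(per_session_values)
```

Passing the scoring function in as `compute` keeps the sketch neutral about how FLQ, WP, and RC are actually calculated.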
Dashboard Alerts
The dashboard flags agents whose AEHS declines by more than 5 points per day for 3 or more consecutive days. Both values are configurable in config/etf-scoring.yaml under global.decline_alert_days and global.decline_threshold_per_day.
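One way to read this rule is as a streak of day-over-day drops, each larger than the threshold. The sketch below implements that reading; the `has_decline_alert` function is hypothetical, its parameter names mirror the config keys, and the defaults are the values quoted above. The dashboard's actual logic may interpret the rule differently.

```python
from typing import Sequence


def has_decline_alert(
    daily_aehs: Sequence[float],
    decline_alert_days: int = 3,
    decline_threshold_per_day: float = 5.0,
) -> bool:
    """Return True if AEHS dropped by more than the threshold on N consecutive days.

    `daily_aehs` is assumed to be ordered oldest to newest, one score per day.
    """
    streak = 0
    for previous, current in zip(daily_aehs, daily_aehs[1:]):
        if previous - current > decline_threshold_per_day:
            streak += 1
            if streak >= decline_alert_days:
                return True
        else:
            # Any flat, rising, or mildly declining day resets the streak.
            streak = 0
    return False
```

Under this interpretation, three qualifying drops need four daily scores, so an agent with fewer than four days of history can never trigger the alert.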
Future: Anthropic Emotion Vector Correlation
When Anthropic exposes emotion vector APIs (based on their April 2026 interpretability paper on 171 internal emotion dimensions), ETF behavioral markers can be calibrated against neural ground truth. The AEHS formula weights would then be data-driven rather than paper-derived.