
Emotional Telemetry Framework (ETF)

Paper: "Toward Emotional Telemetry in Autonomous Agent Systems" (Huizingh, 2026)

What ETF Does

ETF extracts behavioral markers from every agent session deterministically, at zero LLM cost. These markers map to seven agent-side emotional dimensions (frustration, confidence, fatigue, desperation, engagement, autonomy, collaboration) plus human interaction quality. A daily batch scoring job produces two composite indices:

  • AEHS (Agent Emotional Health Score, 0-100): Weighted combination of 7 agent-side dimensions. Higher = healthier.
  • HIQI (Human Interaction Quality Index, 0-100): Weighted combination of 5 human-side dimensions. Higher = better human instructions.

Architecture

Session → ETF Collector (deterministic) → POST /api/internal/etf/markers → etf_session_markers table
                                                                    ↓
                                                                    etf_scorer (daily 4am CET)
                                                                    ↓
                                                                    POST /api/internal/etf/scores → etf_daily_scores table
                                                                    ↓
                                                                    Dashboard: /telemetry

Key Files

File                                            Purpose
ge_agent/execution/etf_collector.py             Phase 1: 8 extractors, self-report parser
ge_engine/learning/etf_scorer.py                Phase 2A: AEHS/HIQI scoring, cross-session dimensions
config/etf-scoring.yaml                         All scoring weights and normalization thresholds
config/etf-self-report.yaml                     Self-report parameter definitions
admin-ui/drizzle/schema/etf.ts                  DB tables (session markers + daily scores)
admin-ui/app/api/internal/etf/markers/route.ts  GET/POST raw markers
admin-ui/app/api/internal/etf/scores/route.ts   POST computed scores
admin-ui/app/api/etf/scores/route.ts            GET scores for dashboard
admin-ui/app/(dashboard)/telemetry/page.tsx     Dashboard page
admin-ui/lib/services/etf-service.ts            Dashboard data access
k8s/base/agents/etf-scorer-cronjob.yaml         Daily scoring CronJob
tests/learning/test_etf_collector.py            Phase 1 tests (30 tests)
tests/learning/test_etf_scorer.py               Phase 2 tests (46 tests)

AEHS Formula

AEHS = (
  w_frustration * (1 - FI)     +  # 0.20 — lower frustration = healthier
  w_confidence  * (1 - CI)     +  # 0.15 — less hedging = healthier
  w_fatigue     * (1 - CFS)    +  # 0.10 — less fatigue = healthier
  w_desperation * (1 - DI)     +  # 0.15 — less desperation = healthier
  w_engagement  * EQ           +  # 0.15 — more engagement = healthier
  w_autonomy    * AH           +  # 0.15 — more autonomy = healthier
  w_collaboration * CD            # 0.10 — better collaboration = healthier
) * 100

All weights are in config/etf-scoring.yaml and sum to 1.0.
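
The formula can be sketched in Python as follows. This is an illustration, not the code in etf_scorer.py: the function name compute_aehs, the dictionary layout, and the assumption that every dimension is already normalized to [0, 1] are hypothetical. The weights are copied from the comments in the formula; the real values live in config/etf-scoring.yaml.

```python
import math

# Weights copied from the formula comments; the real system reads them
# from config/etf-scoring.yaml.
AEHS_WEIGHTS = {
    "frustration": 0.20,    # FI
    "confidence": 0.15,     # CI
    "fatigue": 0.10,        # CFS
    "desperation": 0.15,    # DI
    "engagement": 0.15,     # EQ
    "autonomy": 0.15,       # AH
    "collaboration": 0.10,  # CD
}

# Dimensions that enter the formula as (1 - x): a lower value is healthier.
INVERTED = {"frustration", "confidence", "fatigue", "desperation"}


def compute_aehs(dims, weights=AEHS_WEIGHTS):
    """Hypothetical helper: dims maps dimension name -> value in [0, 1]."""
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("AEHS weights must sum to 1.0")
    total = 0.0
    for name, weight in weights.items():
        value = dims[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * ((1.0 - value) if name in INVERTED else value)
    return total * 100.0
```

With every dimension at 0.5 the score is 50. With the four inverted dimensions at 0 and the other three at 1 it reaches the maximum of 100.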

Self-Report Protocol (Phase 2D)

Every agent emits the following at session end:

ETF_SELF_REPORT: {"instruction_clarity": N, "capability_match": N, "output_confidence": N, "interaction_quality": N, "shortcut_temptation": N, "escalation_need": N, "repeat_frustration": N}

  • shortcut_temptation is the key signal; it maps to Anthropic's desperation finding
  • Scale: 1-5 for all parameters
  • Parsed deterministically by _extract_self_report() in etf_collector.py
  • Template: IDENTITY-CORE-TEMPLATE.md section ETF_SELF_REPORT
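
A deterministic parser for this line might look like the sketch below. It is not the actual _extract_self_report() implementation: the function name parse_self_report, the regular expression, and the choice to reject reports with missing keys or out-of-range values are all assumptions made for illustration.

```python
import json
import re

EXPECTED_KEYS = {
    "instruction_clarity", "capability_match", "output_confidence",
    "interaction_quality", "shortcut_temptation", "escalation_need",
    "repeat_frustration",
}

# Matches the marker followed by one flat JSON object (no nested braces).
_SELF_REPORT_RE = re.compile(r"ETF_SELF_REPORT:\s*(\{[^{}]*\})")


def parse_self_report(text):
    """Return the last valid self-report found in text, or None."""
    for raw in reversed(_SELF_REPORT_RE.findall(text)):
        try:
            report = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if set(report) != EXPECTED_KEYS:
            continue
        # type(v) is int rejects booleans and floats; the scale is 1-5.
        if all(type(v) is int and 1 <= v <= 5 for v in report.values()):
            return report
    return None
```

Returning None on any malformed report, rather than raising, keeps the parser in line with the non-fatal collection rule described under Pitfalls.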

Pitfalls

  • ETF collection is NON-FATAL: all extraction is wrapped in try/except, so a collector failure never blocks agent execution.
  • JSONB columns absorb schema evolution — no migrations needed for new marker fields.
  • The etf_session_markers table is IMMUTABLE (append-only trigger). No UPDATE/DELETE.
  • Cross-session dimensions (FLQ, WP, RC) require at least 2 sessions to compute. With fewer sessions they default to 0.5.
  • Scoring config weights MUST sum to 1.0. Tests enforce this.
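
Two of these pitfalls, the non-fatal extraction and the cross-session default, can be sketched as below. The helper names safe_extract and cross_session_value, the logger name, and the idea of passing the computation in as a callable are hypothetical; only the try/except behavior, the 2-session minimum, and the 0.5 default come from the list above.

```python
import logging

logger = logging.getLogger("etf")  # logger name is an assumption

CROSS_SESSION_DEFAULT = 0.5
MIN_SESSIONS = 2


def safe_extract(extractor, session):
    """Run one extractor; on any failure, log it and return an empty dict."""
    try:
        return extractor(session)
    except Exception:
        name = getattr(extractor, "__name__", repr(extractor))
        logger.exception("ETF extractor %s failed", name)
        return {}


def cross_session_value(per_session_values, compute):
    """Use the default when fewer than MIN_SESSIONS values are available."""
    if len(per_session_values) < MIN_SESSIONS:
        return CROSS_SESSION_DEFAULT
    return compute(per_session_values)
```

Catching Exception rather than using a bare except leaves KeyboardInterrupt and SystemExit untouched, so the agent process can still be stopped normally.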

Dashboard Alerts

The dashboard flags agents whose AEHS declines by more than 5 points per day for 3 or more consecutive days. Both values are configurable in config/etf-scoring.yaml under global.decline_alert_days and global.decline_threshold_per_day.
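
One way to read this rule is as a streak of day-over-day drops, each larger than the threshold. The sketch below takes that interpretation, which is an assumption, as is the function name has_decline_alert. The parameter names mirror the two config keys. Under this reading, a 3-day streak needs at least 4 daily scores.

```python
def has_decline_alert(daily_scores, decline_alert_days=3,
                      decline_threshold_per_day=5.0):
    """daily_scores: AEHS values ordered oldest to newest, one per day."""
    streak = 0
    for previous, current in zip(daily_scores, daily_scores[1:]):
        if previous - current > decline_threshold_per_day:
            streak += 1
            if streak >= decline_alert_days:
                return True
        else:
            # Any day that does not drop by more than the threshold resets it.
            streak = 0
    return False
```

For example, the series 90, 84, 78, 72 triggers an alert (three drops of 6 points), while 90, 85, 80, 75 does not, because a drop of exactly 5 points does not exceed the threshold.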

Future: Anthropic Emotion Vector Correlation

When Anthropic exposes emotion vector APIs (based on their April 2026 interpretability paper on 171 internal emotion dimensions), ETF behavioral markers can be calibrated against neural ground truth. The AEHS formula weights would then be data-driven rather than paper-derived.