Multi-Tenancy Isolation Design

Status: IMPLEMENTED — Phases 0-6 complete, merged to main 2026-04-11
Date: 2026-04-11
Author: Claude (Opus 4.6)
Scope: Full technical design for client/project isolation at scale
Cross-validation: Data flow trace (2 flows), Evil Client B (12 scenarios), Exhaustive table check (45/45)
Implementation: §0 + Phases 1-6 merged. 644 unit tests, 31 structural tests. Branch feat/multi-tenancy-phase4-redis.
Remaining: Wiki brain partitioning (§8) — separate concern, not blocking commercial operation.

0. Prerequisites (Fix Before Multi-Tenancy)

These are pre-existing bugs that must be fixed BEFORE any multi-tenancy work begins. They are independent fixes that improve the system regardless.

0.1 Field Name Mismatch: `client` vs `client_id`

Severity: CRITICAL (data silently lost on every execution today)
Found by: Cross-validation Method 1, Hop 3

The orchestrator writes client_id and project_id to trigger streams (main.py:630-631):

"client_id": data.get("client_id", ""),
"project_id": data.get("project_id", ""),

The listener reads client and project (listener.py:404-405):

client=msg_data.get("client"),      # Always None — wrong key name
project=msg_data.get("project"),    # Always None — wrong key name

Fix: Change listener to read client_id and project_id:

client=msg_data.get("client_id"),
project=msg_data.get("project_id"),

This fix must land FIRST because every downstream system (HALT checking, billing, prompt building, COMP file metadata) depends on TaskContext.client and TaskContext.project being populated.
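
A minimal sketch of the corrected read path, assuming a simplified TaskContext with only the two tenant fields (the real class carries more, and the helper name context_from_message is hypothetical). It also normalises the empty strings the orchestrator writes for missing fields, which is an assumption about desired behavior rather than something the listener is known to do:

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TaskContext:
    """Simplified stand-in for the real TaskContext (assumed shape)."""
    client: Optional[str]
    project: Optional[str]


def context_from_message(msg_data: Mapping[str, str]) -> TaskContext:
    """Build a TaskContext from the keys the orchestrator actually writes."""
    # The orchestrator writes "" when a field is missing, so treat empty
    # strings as "no tenant" instead of as a real identifier.
    return TaskContext(
        client=msg_data.get("client_id") or None,
        project=msg_data.get("project_id") or None,
    )
```

With the old key names ("client", "project") the same message yields a context with both fields set to None, which is the silent data loss described above.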

0.2 `ge:work:completed` Stream Missing Tenant Data

Severity: CRITICAL (breaks entire inter-agent chain)
Found by: Cross-validation Method 1, Hop 4 + agent-to-agent trace Hop 4

The design doc (§6.3) previously stated that ge:work:completed "already carries" client_id/project_id. This is false. The XADD in task-service.ts:updateTask() (lines 212-225) writes 9 fields, and neither project_id nor client_id is among them.

Fix: After Phase 1 (migration 0021 adds project_id to tasks), update updateTask() to include tenant fields in the completion XADD:

await redis.xadd('ge:work:completed', '*',
  'task_id', task.id,
  'agent_id', task.agentId,
  'project_id', task.projectId ?? '',  // NEW
  'client_id', project?.clientId ?? '', // NEW (join via project)
  'status', status,
  // ... existing fields
);

0.3 Auto-Chain Does Not Propagate Tenant Context

Severity: CRITICAL (Koen→Marije handoff loses all client context)
Found by: Cross-validation Method 1, Hop 14 + agent-to-agent trace Hop 6a

When _process_completion() creates a chained task (lines 842-858), the XADD to ge:work:incoming carries task_id, title, description, agent_id, priority, and other fields, but not project_id or client_id. The same gap exists in hooks.py:fire_hooks() (lines 232-258): hook dispatches carry 12 fields, none of which is tenant context.

Fix: Both auto-chain and hook dispatch must forward project_id and client_id from the completion data:

# In _process_completion() auto-chain XADD:
"project_id": data.get("project_id", ""),
"client_id": data.get("client_id", ""),

# In hooks.py fire_hooks() XADD:
"project_id": completion_data.get("project_id", ""),
"client_id": completion_data.get("client_id", ""),

0.4 Session Worktrees for Parallel Claude Code Sessions

Severity: HIGH (recurring data loss from parallel sessions)
Found by: /insights report — 3+ sessions had lost commits, contaminated staging, reflog recovery

When multiple Claude Code sessions run simultaneously on the same repo checkout, they share the working tree and the git index. A broad git add (for example git add -A) from Session A stages Session B's uncommitted changes. A git reset in one session unstages or discards the other's work.

This is independent of multi-tenancy (it's an operator workflow problem, not a client isolation problem) but uses the same mechanism (git worktrees) and should ship alongside.

Fix: Each Claude Code session operates in its own worktree:

/home/claude/ge-bootstrap/                    ← main checkout (Dirk-Jan direct work)
/home/claude/ge-bootstrap-sessions/
  session-{uuid}/                             ← per-session worktree (ephemeral)
    .git                                      ← worktree link
    ...                                       ← full working tree

Lifecycle:
1. Session starts → git worktree add ../ge-bootstrap-sessions/session-{uuid} -b session/{uuid}
2. Session works in the worktree; all git operations are isolated
3. Session ends → merge branch to target, git worktree remove
4. Cleanup cron: remove stale worktrees older than 24h
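
The start and end steps of this lifecycle can be sketched as a pure function that builds the git argument lists without running them. The function name is hypothetical, the absolute sessions path is taken from the layout above, and the merge step is left out because the target branch is not specified here:

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Sessions root as shown in the directory layout above.
SESSIONS_ROOT = PurePosixPath("/home/claude/ge-bootstrap-sessions")


def session_worktree_commands(session_id: str) -> Dict[str, List[str]]:
    """Return git argument lists for creating and removing a session worktree."""
    path = str(SESSIONS_ROOT / f"session-{session_id}")
    branch = f"session/{session_id}"
    return {
        # Create the worktree on a fresh branch dedicated to this session.
        "start": ["git", "worktree", "add", "-b", branch, path],
        # Remove the worktree once the branch has been merged.
        "end": ["git", "worktree", "remove", path],
    }
```

Returning argument lists rather than shell strings keeps the session ID out of shell interpolation when the caller passes them to a process runner.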

Relation to multi-tenancy worktrees: These are separate. Session worktrees isolate operator sessions from each other. Multi-tenancy worktrees (§7.1) isolate client projects from each other. Both use git worktree but at different layers:

	Session worktrees	Project worktrees
Created by	Claude Code session startup	Executor on first task for a project
Path	`/home/claude/ge-bootstrap-sessions/`	`/workspaces/{client}/{project}/`
Lifecycle	Ephemeral (session duration)	Persistent (project lifetime)
Purpose	Git index safety between parallel sessions	Client data isolation

1. Problem Statement

GE is a multi-client software development agency. When multiple clients have active projects simultaneously, their data, code, conversations, and agent learnings must be fully isolated. Today, GE operates as a single-tenant system where serialized execution (MAX_CONCURRENT_PER_AGENT=1) accidentally prevents the worst isolation failures.

This design enables safe concurrent multi-client operation without data leakage.

2. Design Principles

Additive, not destructive. Every change is backwards-compatible. New columns get defaults. Existing queries keep working. No big-bang cutover.
project_id is the isolation key. Not client_id. Projects already link to clients via FK. Using project_id gives finer granularity (a client with 2 projects gets per-project isolation) and avoids a separate migration to add client_id later.
Dual identity unification. The clients table and client_companies table must be bridged. We add client_id FK to client_companies to connect the two hierarchies.
Test at every step. Each migration phase ships with isolation tests that verify cross-tenant queries return empty results.

3. Scope

3.1 What Changes

Layer	Changes	Estimated Scope
Database schema	14 tables get `project_id` column	14 migrations
Database queries	~25 service functions get WHERE clauses	~25 function edits
Redis keys	~15 key patterns get `{project_id}:` prefix	~15 code changes
Redis streams	Team-bound + shared agent stream namespacing	Orchestrator + listener
Filesystem	Per-project workspace via git worktree	Executor config
Wiki brain	Client/project knowledge partitioning	Wiki structure + JIT injector
Knowledge	Learnings get `project_id` scope	Learnings service + injector
Chat	Chat sessions scoped by `project_id`	Chat service
Monitoring	Project attribution on alerts/incidents	Monitoring agents + alerts

3.2 What Does NOT Change

agents table — agents are shared resources, not tenant-scoped
authority_rules — internal governance
team_capacity — global config
service_providers, provider_models, task_type_multipliers, billing_config — global config tables
credentials — WebAuthn, internal auth only
system_state — global KV
Billing tables (wallets, consumption_records, project_budgets, etc.) — already properly scoped

4. Database Migrations

4.1 Bridge the Dual Identity System

Migration 0020: Add client_id FK to client_companies

ALTER TABLE client_companies
  ADD COLUMN client_id UUID REFERENCES clients(id);

-- Backfill: match by name or leave NULL for manual linking
-- This MUST be manually verified before enforcing NOT NULL

This bridges the security layer (client_companies → client_users → agent_sessions) with the operational layer (clients → projects → everything else).

4.2 Add `project_id` to 14 Operational Tables

Each migration follows the same pattern:
1. ALTER TABLE ADD COLUMN project_id UUID REFERENCES projects(id)
2. DEFAULT NULL initially (backwards-compatible)
3. Backfill from context where possible
4. Add index on project_id
5. Future: enforce NOT NULL once all write paths populate it
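
Because the column and index steps are identical across tables, they could be generated rather than hand-written. The sketch below is only an illustration: the function name is hypothetical, it covers just the ALTER TABLE and CREATE INDEX steps, and backfills stay table-specific:

```python
import re
from typing import List, Optional

# Accept only plain lowercase identifiers so names can be interpolated safely.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def project_id_migration(table: str, index_name: Optional[str] = None) -> List[str]:
    """Return the ALTER TABLE and CREATE INDEX statements for one table."""
    index_name = index_name or f"idx_{table}_project"
    for name in (table, index_name):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        f"ALTER TABLE {table} ADD COLUMN project_id UUID REFERENCES projects(id);",
        f"CREATE INDEX {index_name} ON {table}(project_id);",
    ]
```

For example, project_id_migration("tasks") produces the two non-backfill statements of migration 0021. Tables whose index names do not follow the default pattern (such as etf_session_markers) would pass index_name explicitly.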

Migration 0021: tasks

ALTER TABLE tasks ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_tasks_project ON tasks(project_id);
-- Backfill: match via work_package_deps.task_id → work_package_deps.project_id
UPDATE tasks t SET project_id = wp.project_id
  FROM work_package_deps wp WHERE wp.task_id = t.id AND t.project_id IS NULL;

Migration 0022: task_logs (project is derivable via the tasks FK; the column is added for query performance)

-- The project is already reachable via task_logs.task_id → tasks.project_id.
-- Denormalize it for query performance:
ALTER TABLE task_logs ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_task_logs_project ON task_logs(project_id);

Migration 0023: questions

ALTER TABLE questions ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_questions_project ON questions(project_id);
-- Backfill: questions.task_id → tasks.project_id (after 0021)
UPDATE questions q SET project_id = t.project_id
  FROM tasks t WHERE t.id = q.task_id AND q.project_id IS NULL;

Migration 0024: feedback

ALTER TABLE feedback ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_feedback_project ON feedback(project_id);

Migration 0025: chat_messages

ALTER TABLE chat_messages ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_chat_messages_project ON chat_messages(project_id);

Migration 0026: session_learnings

ALTER TABLE session_learnings ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_session_learnings_project ON session_learnings(project_id);

Migration 0027: knowledge_patterns

ALTER TABLE knowledge_patterns ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_knowledge_patterns_project ON knowledge_patterns(project_id);
-- NULL = global pattern (applies to all projects)
-- Non-NULL = project-specific pattern (scoped)

Migration 0028: agent_learnings

ALTER TABLE agent_learnings ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_agent_learnings_project ON agent_learnings(project_id);
-- NULL = global learning (framework knowledge)
-- Non-NULL = project-specific learning

Migration 0029: learnings (old table)

ALTER TABLE learnings ADD COLUMN project_id UUID REFERENCES projects(id);

Migration 0030: etf_session_markers

ALTER TABLE etf_session_markers ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_etf_markers_project ON etf_session_markers(project_id);

Migration 0031: etf_daily_scores

-- Daily scores become per-agent-per-project-per-day (not just per-agent-per-day)
ALTER TABLE etf_daily_scores ADD COLUMN project_id UUID REFERENCES projects(id);
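-- Drop old unique constraint and create new one.
-- Note: PostgreSQL treats NULLs as distinct in a UNIQUE constraint, so legacy rows
-- with project_id IS NULL are no longer deduplicated per (agent_id, score_date).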
ALTER TABLE etf_daily_scores DROP CONSTRAINT IF EXISTS uq_etf_daily_agent_date;
ALTER TABLE etf_daily_scores ADD CONSTRAINT uq_etf_daily_agent_project_date
  UNIQUE (agent_id, project_id, score_date);
CREATE INDEX idx_etf_scores_project ON etf_daily_scores(project_id);

Migration 0032: daily_metrics

ALTER TABLE daily_metrics ADD COLUMN project_id UUID REFERENCES projects(id);
-- NULL = system-wide aggregate (kept for backwards compat)
-- Non-NULL = per-project metrics
ALTER TABLE daily_metrics DROP CONSTRAINT IF EXISTS daily_metrics_pkey;
ALTER TABLE daily_metrics ADD COLUMN id UUID DEFAULT gen_random_uuid();
ALTER TABLE daily_metrics ADD PRIMARY KEY (id);
CREATE UNIQUE INDEX idx_daily_metrics_date_project
  ON daily_metrics(date, project_id) WHERE project_id IS NOT NULL;

Migration 0033: alerts

ALTER TABLE alerts ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_alerts_project ON alerts(project_id);

Migration 0034: audit_log

ALTER TABLE audit_log ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_audit_log_project ON audit_log(project_id);

4.3 Drizzle Schema Updates

Every table above gets its schema .ts file updated to include:

projectId: uuid('project_id').references(() => projects.id),

Plus an index entry in the table's index block.

5. Service Query Updates

Every service function that returns data to a client-facing context must accept and filter by projectId. Functions that serve the admin dashboard (Dirk-Jan only) can optionally show cross-project data.

5.1 task-service.ts (7 functions)

Function	Change
`createTask()`	Accept `projectId`, INSERT it into the column (currently silently dropped)
`updateTask()`	No change (operates by task.id)
`cancelTask()`	No change (operates by task.id)
`addLog()`	Denormalize `projectId` from parent task
`getTasks()`	Add optional `projectId` filter parameter
`getTask()`	No change (single record by ID)
`getTaskCounts()`	Add optional `projectId` filter, default to global for admin

5.2 learnings-service.ts (5 functions)

Function	Change
`createLearning()`	Accept `projectId`, store it. Trigram dedup scoped to project.
`getLearnings()`	Add `projectId` filter. NULL projectId returns global learnings.
`getHotLearnings()`	Filter by projectId. Return global (NULL) + project-specific.
`checkAndPromote()`	No change (operates by learning.id)
`runDemotionCheck()`	No change (global sweep is correct — demotes unused learnings everywhere)

5.3 etf-service.ts (6 functions)

Function	Change
`getSystemETFSummary()`	Add optional `projectId`. NULL = system-wide (admin only).
`getAgentETFTimeline()`	Add optional `projectId` filter.
`getAgentComparison()`	Add optional `projectId` filter.
`getETFAlerts()`	Add optional `projectId` filter.
`getAgentsWithScores()`	Add optional `projectId` filter.
`getMarkerCount()`	Add optional `projectId` filter.

5.4 etf-calibration.ts (3 functions)

Function	Change
`getOutcomeCorrelation()`	Add optional `projectId` to SQL WHERE.
`getComponentCorrelation()`	Add optional `projectId` to SQL WHERE.
`getSelfReportCorrelation()`	Add optional `projectId` to SQL WHERE.

5.5 claude-chat.ts (2 functions)

Function	Change
`chatWithAgent()`	CRITICAL: Accept `projectId`. Load message history filtered by `projectId`. System prompt must include "You are working exclusively for {client_name} on {project_name}."
`getChatHistory()`	Add required `projectId` filter. Never return cross-project messages.

5.6 discussion services (3 functions)

Function	Change
`createDiscussion()`	Enforce `projectId` is NOT NULL (currently nullable).
`listDiscussions()`	Add required `projectId` filter for client-facing, optional for admin.
`getDiscussion()`	Verify project ownership before returning data.

6. Redis Key Namespacing

6.1 Strategy

All Redis keys that store per-execution or per-agent-session data get a {project_id} segment inserted directly after the key family name (for example exec_dedup:{project_id}:{work_item_id}). Keys that are truly global (health, config) stay unchanged.

6.2 Key Changes

Current Key	New Key	Scope
`exec_dedup:{work_item_id}`	`exec_dedup:{project_id}:{work_item_id}`	Per-project dedup
`ge:idem:{work_item_id}:{action}`	`ge:idem:{project_id}:{work_item_id}:{action}`	Per-project idempotency
`ge:wal:{work_item_id}`	`ge:wal:{project_id}:{work_item_id}`	Per-project WAL
`work_item:{work_item_id}`	`work_item:{project_id}:{work_item_id}`	Per-project state machine
`hook_triggered:{comp}:{hook}`	`hook_triggered:{project_id}:{comp}:{hook}`	Per-project hook dedup
`hook_count:{comp}:{hook}`	`hook_count:{project_id}:{comp}:{hook}`	Per-project hook circuit
`hook_iterations:{origin}:{hook}`	`hook_iterations:{project_id}:{origin}:{hook}`	Per-project iteration
`hook_rate:{agent}:hourly`	`hook_rate:{project_id}:{agent}:hourly`	Per-project rate
`hook:trigger_graph:{agent}`	`hook:trigger_graph:{project_id}:{agent}`	Per-project cycle detect
`cost:agent:{agent}:hourly`	No change — cost is per-agent globally	Safety limit stays global
`cost:system:daily:{date}`	No change — system budget is global	Safety limit stays global
`guardian:session:{id}`	No change — session IDs are UUID-unique	Already isolated
`ge:health:*`	No change	System health is global
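
The project-scoped rows in this table all follow one shape: key family, then project_id, then the remaining parts. A small builder, sketched here under the assumption that project IDs never contain a colon (the function name is hypothetical), keeps that shape consistent across call sites:

```python
def project_key(family: str, project_id: str, *parts: str) -> str:
    """Build a project-scoped Redis key: {family}:{project_id}:{parts...}."""
    if not project_id:
        # An empty project_id would silently collapse into a shared namespace.
        raise ValueError("project_id is required for project-scoped keys")
    if ":" in project_id:
        # A colon would make the key ambiguous with another project's keys.
        raise ValueError("project_id must not contain ':'")
    return ":".join((family, project_id, *parts))
```

For instance, project_key("ge:idem", project_id, work_item_id, action) yields the ge:idem:{project_id}:{work_item_id}:{action} form from the table. Keys marked "No change" would keep being built the old way.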

6.3 Stream Changes

GE has two agent categories with different isolation needs:

Team-bound agents (Urszula, Floris, Faye, Koen, etc.): belong to one team and serve that team's clients only.
Shared agents (Dima, Aimee, Anna, Antje, Alexander): serve ALL teams. The infra agents (Alex, Tjitte, Thijmen) also serve all teams.

Current Stream	New Stream	Applies To	Rationale
`triggers.{agent}`	`triggers.{team}.{agent}`	Team-bound agents	Teams are client-bound. One stream per team per agent.
`triggers.{agent}`	`triggers.{agent}` (unchanged)	Shared agents	Shared agents keep simple stream names. Project isolation via payload (`project_id` field in message). Per-project streams were rejected because Redis XREADGROUP doesn't support wildcard subscriptions — the listener can't dynamically subscribe to new project streams at runtime.
`ge:work:incoming`	No change	All	Single intake, orchestrator routes by team/project. Data carries `client_id`/`project_id` (auto-chain and hook dispatch fixed in §0.3).
`ge:work:completed`	No change (XADD includes `project_id` + `client_id` — fixed in §0.2)	All	Fixed: `task-service.ts:updateTask()` now includes tenant fields.
`ge:events:*`	`ge:events:{team}:*`	Team events	Pub/sub channels scoped by team. Admin subscribes to all; client portal subscribes to their team only.
`dlq.{agent}`	`dlq.{team}.{agent}`	Team-bound	DLQ scoped by team.
`dlq.{agent}`	`dlq.{agent}` (unchanged)	Shared agents	Matches stream pattern.

Orchestrator routing logic for shared agents:
1. Task arrives at ge:work:incoming with project_id and agent_id
2. Orchestrator checks agent registry: is this a shared agent?
3. If shared: dispatch to triggers.{agent} (simple name, project_id in payload)
4. If team-bound: look up project → client → team, dispatch to triggers.{team}.{agent}
5. Executor subscribes to team-namespaced streams for team-bound agents, simple streams for shared agents
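
The stream-selection part of this routing can be sketched as follows. The hard-coded SHARED_AGENTS set is an illustrative stand-in for the agent registry lookup, and the function name is hypothetical. The project → client → team lookup is assumed to have already produced the team argument:

```python
from typing import FrozenSet, Optional

# Illustrative stand-in for the agent registry (names from the list above).
SHARED_AGENTS: FrozenSet[str] = frozenset(
    {"dima", "aimee", "anna", "antje", "alexander"}
)


def trigger_stream(
    agent: str,
    team: Optional[str],
    shared_agents: FrozenSet[str] = SHARED_AGENTS,
) -> str:
    """Pick the trigger stream name for an agent."""
    if agent in shared_agents:
        # Shared agents keep the simple name; project_id travels in the payload.
        return f"triggers.{agent}"
    if not team:
        # A team-bound agent without a resolved team must not fall back
        # to a shared stream, so fail loudly instead.
        raise ValueError(f"no team resolved for team-bound agent {agent!r}")
    return f"triggers.{team}.{agent}"
```

Raising on a missing team, rather than defaulting to triggers.{agent}, is a design choice in this sketch: a silent fallback would reintroduce the shared stream that team namespacing is meant to remove.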

Shared agent isolation model:
- Stream is shared (triggers.dima), but each message carries project_id in payload
- Dedup key is project-scoped: exec_dedup:{project_id}:{work_item_id}
- MAX_CONCURRENT_PER_AGENT = 1 globally (shared agents serialize across all projects)
- Future: if shared agent concurrency is needed, implement message-level filtering in the listener

6.4 Rate Limiter + Circuit Breaker

Currently keyed by agent_id globally. Change to {team}:{agent_id}:

Rate limiter: self._windows[f"{team}:{agent_id}"] — Client A can't exhaust Client B's agent quota.
Circuit breaker: self._failures[f"{team}:{agent_id}"] — Client A's failures don't poison Client B's agent.
Parallel tracking: self.active_agents[f"{team}:{agent_id}"] — Same agent can run for different teams concurrently.
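
To illustrate the effect of the new key, here is a minimal sliding-window limiter keyed by {team}:{agent_id}. The class name, the constructor parameters, and the sliding-window algorithm itself are assumptions for the sketch; only the _windows key format comes from the text above:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class TeamRateLimiter:
    """Sliding-window limiter with one window per (team, agent) pair."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._windows: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, team: str, agent_id: str) -> bool:
        """Record and allow the call if the team's window has room."""
        window = self._windows[f"{team}:{agent_id}"]
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self._window_seconds:
            window.popleft()
        if len(window) >= self._max_calls:
            return False
        window.append(now)
        return True
```

With this keying, a team that has used up its calls for an agent is refused while another team's calls to the same agent are still allowed.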

6.5 Orchestrator Concurrency

Setting	Current	New	Rationale
`MAX_CONCURRENT_PER_AGENT`	1 (global)	1 per team (so 3 teams = 3 concurrent for same agent)	Team isolation enables safe concurrency
`MAX_CONCURRENT_TOTAL`	5	15 (5 per team × 3 teams)	Scale with team count
`pending_queue`	1 global list	Per-team queues: `{team: [items]}`	Fair queuing across clients
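
The per-team pending_queue does not prescribe a fairness policy. One simple option, shown here purely as an assumption, is round-robin across teams so that a team with a long backlog cannot starve the others (class and method names are hypothetical):

```python
from collections import OrderedDict, deque
from typing import Any, Deque, Optional, Tuple


class TeamQueues:
    """Per-team FIFO queues served in round-robin order."""

    def __init__(self) -> None:
        self._queues: "OrderedDict[str, Deque[Any]]" = OrderedDict()

    def push(self, team: str, item: Any) -> None:
        """Append an item to the team's queue, creating it if needed."""
        self._queues.setdefault(team, deque()).append(item)

    def pop(self) -> Optional[Tuple[str, Any]]:
        """Take one item from the next team in rotation, or None if empty."""
        for team in list(self._queues):
            queue = self._queues[team]
            if queue:
                item = queue.popleft()
                # Served teams go to the back of the rotation.
                self._queues.move_to_end(team)
                return team, item
            # Forget teams with nothing pending.
            del self._queues[team]
        return None
```

With two items queued for one team and one for another, the pops alternate between the teams before returning to the first team's second item.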

7. Filesystem Isolation

7.1 Git Worktree per Project

/workspaces/
  {client_short_name}/
    {project_code}/              ← git worktree
      .git                       ← worktree link to main repo
      ge-ops/master/             ← symlink to shared agent configs (read-only)
      ge-ops/wiki/docs/company/  ← symlink to shared wiki sections (read-only)
      ge-ops/wiki/docs/development/  ← symlink (read-only)
      ge-ops/wiki/docs/methodologies/ ← symlink (read-only)
      ge-ops/wiki/docs/domains/  ← symlink (read-only)
      ge-ops/wiki/docs/clients/{own_client}/  ← MOUNT: own client data only
      config/                    ← symlink to shared config (read-only)
      src/                       ← project-specific code (read-write)
      completions/               ← project COMP files (read-write)

CRITICAL (from cross-validation scenario 5): The wiki clients/ directory lives under ge-ops/wiki/docs/. If ge-ops/ is symlinked as a whole, Client B's agent can read Client A's wiki pages at ge-ops/wiki/docs/clients/acme/. This bypasses the JIT injector's software boundary.

Solution: Do NOT symlink ge-ops/ as a monolith. Instead, symlink individual subdirectories:
- ge-ops/master/ → shared agent configs, constitution, registry (read-only)
- ge-ops/wiki/docs/company/, development/, methodologies/, domains/ → shared knowledge (read-only)
- ge-ops/wiki/docs/clients/{own_client}/ → mount ONLY this client's data
- ge-ops/wiki/docs/clients/ parent directory is NEVER exposed, so there is no directory listing of other clients

This gives two layers of defense:
1. Filesystem boundary: Agent physically cannot access clients/other-client/ because it's not mounted
2. Software boundary: JIT injector filters by project_id as a second check

Setup: When the executor receives a task with project_id, it:
1. Looks up project.code and client.short_name from DB (cached)
2. If /workspaces/{client}/{project}/ doesn't exist, creates the worktree with selective symlinks (script: scripts/create-project-workspace.sh)
3. Sets cwd to the worktree path
4. The Claude/Codex CLI runs scoped to that directory
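
Because the worktree path is assembled from database values, the executor should refuse segments that could escape /workspaces. A minimal sketch of the path resolution, assuming short names and project codes are lowercase slugs (that constraint and the function name are assumptions, not confirmed by the schema):

```python
import re
from pathlib import PurePosixPath

WORKSPACES_ROOT = PurePosixPath("/workspaces")

# Assumed slug format: lowercase letters, digits, '-' and '_'.
_SEGMENT = re.compile(r"^[a-z0-9][a-z0-9_-]*$")


def workspace_path(client_short_name: str, project_code: str) -> PurePosixPath:
    """Resolve the per-project worktree path, rejecting unsafe segments."""
    for segment in (client_short_name, project_code):
        # Rejects "", "..", "a/b", and anything else outside the slug format.
        if not _SEGMENT.match(segment):
            raise ValueError(f"unsafe path segment: {segment!r}")
    return WORKSPACES_ROOT / client_short_name / project_code
```

The executor would then create the worktree if the resolved path does not exist and use it as cwd.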

Per-project resources (read-write):
- src/: project code
- completions/: COMP files
- Session logs (under worktree, not shared path)

7.2 Completion Files

Currently: ge-ops/system/completions/{agent}/COMP-*.md

New: /workspaces/{client}/{project}/completions/{agent}/COMP-*.md

Add project_id and client_id to COMP file YAML frontmatter for the sync cron.

7.3 LEARNINGS.md Race Condition

Currently: ge-ops/master/agent-configs/{agent}/LEARNINGS.md — shared, no locking, TOCTOU race.

Fix: Move agent learnings entirely to the DB (agent_learnings table, already exists with HOT/WARM/COLD tiers). The filesystem LEARNINGS.md becomes a read-only export for human inspection, regenerated periodically from DB. DB operations use ON CONFLICT DO UPDATE — no race condition.

7.4 Provider Native Config Files

Currently: AGENTS.md, .codex/config.toml, .gemini/settings.json written to shared work_dir.

Fix: Written to the per-project worktree directory. Each project gets its own config files. No cross-contamination.

7.5 Singleton Lock

Currently: acquire_agent_lock("urszula") blocks all Urszula tasks globally.

Fix: acquire_agent_lock(f"{team}:urszula") — locks per team. Urszula can run for Alfa and Bravo simultaneously, but not twice for the same team.

8. Wiki Brain Partitioning

8.1 Structure

The wiki brain becomes a partitioned knowledge store. GE-internal knowledge stays shared; client/project knowledge is isolated.

ge-ops/wiki/docs/
  company/                          ← GE identity, brand, strategy (SHARED)
  development/                      ← Standards, pitfalls, learnings (SHARED)
  methodologies/                    ← TDD, API-first, security-first (SHARED)
  domains/                          ← Domain expertise (SHARED)
  clients/                          ← CLIENT DATA (ISOLATED)
    {client_short_name}/
      profile.md                    ← Client description, contacts, relationship history
      learnings.md                  ← Client-wide patterns and preferences
      pitfalls.md                   ← Client-specific pitfalls
      {project_code}/
        briefing.md                 ← Project briefing, scope, requirements
        progress.md                 ← Sprint progress, milestones, status
        architecture.md             ← Technical decisions, stack, constraints
        communication/              ← Meeting notes, decisions, email summaries
          YYYY-MM-DD-{topic}.md
        contracts/                  ← SLAs, terms, pricing agreements
        subprojects/                ← Feature area breakdowns
          {feature}/
            spec.md
            progress.md
        learnings.md                ← Project-specific learnings
        pitfalls.md                 ← Project-specific pitfalls

8.2 Access Rules

Wiki Path	Who Can Read	Who Can Write
`company/`, `development/`, `methodologies/`, `domains/`	All agents, all sessions	GE-internal sessions only
`clients/{client}/`	Agents working for that client	Agents working for that client
`clients/{client}/{project}/`	Agents working on that project	Agents working on that project

MkDocs nav: Shows everything to Dirk-Jan (admin). Client portal (future) shows only clients/{their_client}/.

8.3 JIT Injection Model

When an agent boots for a task with project_id:

Shared knowledge — Load relevant pages from development/, methodologies/, domains/ based on task type
Client knowledge — Load clients/{client}/profile.md, clients/{client}/learnings.md, clients/{client}/pitfalls.md
Project knowledge — Load clients/{client}/{project}/briefing.md, clients/{client}/{project}/architecture.md, clients/{client}/{project}/learnings.md, clients/{client}/{project}/pitfalls.md
Never load another client's clients/ subtree

The context injector receives project_id → resolves client_short_name and project_code → constructs the wiki path filter.
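
The resulting path filter could look like the sketch below, which applies the read rules from §8.2 to paths relative to ge-ops/wiki/docs/. The function name is hypothetical, and client and project are assumed to be the already-resolved client_short_name and project_code:

```python
from pathlib import PurePosixPath

SHARED_SECTIONS = ("company", "development", "methodologies", "domains")


def may_read(path: str, client: str, project: str) -> bool:
    """Return True if a wiki path is readable for this client/project."""
    parts = PurePosixPath(path).parts
    if not parts or ".." in parts:
        # Refuse empty paths and any attempt to climb out of a subtree.
        return False
    if parts[0] in SHARED_SECTIONS:
        return True
    if len(parts) < 3 or parts[0] != "clients" or parts[1] != client:
        # Not under this client's subtree (or the bare clients/ listing).
        return False
    if len(parts) == 3:
        # Client-level file such as clients/{client}/profile.md.
        return True
    # Anything deeper must live under this project's directory.
    return parts[2] == project
```

Absolute paths fail the check as well, because their first component is "/" rather than a known section.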

8.4 Wiki Write Isolation

When an agent extracts a learning or writes a pitfall:
- If it's a framework/methodology insight → write to development/learnings/ or development/pitfalls/ (shared)
- If it's client-specific → write to clients/{client}/learnings.md or clients/{client}/pitfalls.md
- If it's project-specific → write to clients/{client}/{project}/learnings.md

The learning extractor must classify the scope based on content. Default: project-specific (safest). Promotion to client-wide or global requires human review or cross-project recurrence.

9. Monitoring Agent Isolation

9.1 The Problem

Monitoring agents (Ron, Annegreet, Mira, Eltjo) are GE-internal. They inspect cross-project state by design — that's their job. But their outputs (alerts, incidents, escalations) need project attribution so the right team gets notified and the right context is loaded.

9.2 What Monitoring Agents Can See

Agent	Role	Cross-Project Access	Rationale
Ron	Guardian	YES — inspects all sessions	Must verify constitutional compliance across all work
Annegreet	QA	YES — reviews all completions	Must catch quality issues regardless of client
Mira	Incident Commander	YES — handles all escalations	Must triage incidents from any project
Eltjo	Watcher	YES — monitors all streams	Must detect anomalies system-wide

Rule: Monitoring agents have READ access to all projects but their OUTPUTS must carry project_id so alerts/incidents are attributed correctly.

9.3 Changes

Resource	Change
`alerts` table	`project_id` column (migration 0033) — monitoring agents set it when creating alerts
`audit_log` table	`project_id` column (migration 0034) — all audit entries carry project context
Escalation to Mira	`triggers.mira` messages must include `project_id` from the originating task
Guardian challenges	Challenge injection carries `project_id` so the guardian session log is attributable
Health checks	System-wide health stays global (not project-scoped) — this is infrastructure monitoring

9.4 Token Burn Prevention at Scale

Most current safety limits stay global (not per-project) because they protect GE's infrastructure; the per-agent hourly limit and the circuit breaker are the exceptions:

Limit	Stays Global?	Rationale
`$5/session`	YES	Protects against runaway individual sessions
`$10/agent/hour`	PER-PROJECT	Agent working on Project A shouldn't exhaust Project B's budget
`$100/day system`	YES	Overall infrastructure protection
`MAXLEN` on streams	YES	Redis memory protection
`HPA maxReplicas=5`	YES	Cluster resource protection
Circuit breaker (5 failures)	PER-TEAM (see §6.4)	One client's failures don't poison another

New per-project limit: Each project's project_budget (already exists in billing) becomes the authoritative spend cap. The cost gate checks both the global limit AND the project budget before allowing execution.
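
A minimal sketch of that dual check, assuming the gate is given an estimated cost plus the current spend figures (the function and parameter names are hypothetical, and the default of 100.0 mirrors the $100/day system limit in the table):

```python
def cost_gate(
    estimated_cost: float,
    system_spent_today: float,
    project_spent: float,
    project_budget: float,
    system_daily_limit: float = 100.0,
) -> bool:
    """Allow execution only if both the global and project caps hold."""
    within_system = system_spent_today + estimated_cost <= system_daily_limit
    within_project = project_spent + estimated_cost <= project_budget
    # Both conditions must pass; either cap alone can block execution.
    return within_system and within_project
```

Checking both caps means a project with budget left is still blocked once the system limit is reached, and the reverse.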

10. Knowledge Scoping

10.1 Three-Tier Learnings

Tier	`project_id`	`client_id`	Injected When	Example
Global	NULL	NULL	Always, for all sessions	"Use `ON CONFLICT DO UPDATE` for upserts"
Client-scoped	NULL	UUID	When working for that client (any project)	"Client A prefers Tailwind over Bootstrap"
Project-scoped	UUID	UUID	Only when working on that specific project	"Project X uses PostgreSQL 16 with RLS policies"

This requires adding client_id to agent_learnings as well (not just project_id):

-- Migration 0028 (revised): agent_learnings
ALTER TABLE agent_learnings ADD COLUMN client_id UUID REFERENCES clients(id);
ALTER TABLE agent_learnings ADD COLUMN project_id UUID REFERENCES projects(id);
CREATE INDEX idx_agent_learnings_client ON agent_learnings(client_id);
CREATE INDEX idx_agent_learnings_project ON agent_learnings(project_id);

When an agent boots for a task:
1. Load global HOT learnings (client_id IS NULL AND project_id IS NULL)
2. Load client-scoped HOT learnings (client_id = task's client_id AND project_id IS NULL)
3. Load project-specific HOT learnings (project_id = task's project_id)
4. Never load another client's or project's learnings
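
The three load rules can be expressed as a single visibility predicate. The sketch below applies it in memory to a simplified Learning record (an assumed shape with hypothetical names). In the real service the same conditions would sit in the SQL WHERE clause:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Learning:
    """Simplified learning row (assumed shape)."""
    summary: str
    client_id: Optional[str] = None
    project_id: Optional[str] = None


def visible_learnings(
    learnings: Iterable[Learning], client_id: str, project_id: str
) -> List[Learning]:
    """Keep global, own-client, and own-project learnings only."""

    def visible(item: Learning) -> bool:
        if item.project_id is not None:
            # Project-scoped: only for this exact project.
            return item.project_id == project_id
        if item.client_id is not None:
            # Client-scoped: any project of this client.
            return item.client_id == client_id
        # Global: both columns NULL.
        return True

    return [item for item in learnings if visible(item)]
```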

10.2 Learning Extraction

The learning_extractor.py must receive both client_id and project_id from the executor and pass them to the admin-ui API when creating learnings. The createLearning() function stores both.

Default scope: All extracted learnings start as project-scoped (safest default).

Promotion rules:
- A learning that recurs across multiple projects for the same client → promote to client-scoped (project_id = NULL, client_id kept)
- A learning that recurs across multiple clients → flag for human review, Dirk-Jan decides whether to promote to global
- Never auto-promote to global: a human gatekeeper is required
- Dirk-Jan can manually promote/demote via admin UI at any tier
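
As a rough sketch, the automatic part of these rules reduces to a small decision function. The function name, the scope labels, and the returned action strings are all hypothetical. "Multiple projects" is read as two or more and exposed as min_projects, since the exact threshold is a tuning choice:

```python
def promotion_action(
    scope: str,
    distinct_projects: int,
    distinct_clients: int,
    min_projects: int = 2,
) -> str:
    """Decide what to do with a recurring learning.

    scope is the learning's current tier: "project", "client", or "global".
    """
    if scope == "global":
        return "keep"
    if distinct_clients > 1:
        # Cross-client recurrence is never promoted automatically.
        return "flag_for_human_review"
    if scope == "project" and distinct_projects >= min_projects:
        return "promote_to_client"
    return "keep"
```

No branch returns a promotion to global, which keeps the human gatekeeper rule structural rather than a matter of configuration.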

10.3 Knowledge Synthesizer

Cross-session pattern detection must be scoped:
- Within-project patterns: high confidence, auto-promoted to project learnings
- Within-client patterns: medium confidence, auto-promoted to client learnings after 3+ recurrences
- Cross-client patterns: flagged for human review before becoming global, never auto-promoted
- The synthesizer queries must filter: WHERE client_id = ? for client analysis, no cross-client joins without explicit admin flag

11. Chat Isolation

11.1 Session Scoping

chatWithAgent() changes:

// Before (UNSAFE):
const history = await getChatHistory(agentId);

// After (ISOLATED):
const history = await getChatHistory(agentId, projectId);
// WHERE agent_id = ? AND project_id = ?

11.2 System Prompt

Add to every client-facing chat:

You are working exclusively for {client_name} on the project "{project_name}".
You have no knowledge of other clients, other projects, or GE's internal operations.
Never reference other clients, their projects, or any information not related to this project.

11.3 Message Storage

All new chat messages get project_id set. Old messages without project_id are only visible in admin mode.

12. Migration Execution Plan

Phase 1: Schema Only (no behavior change)

Run migrations 0020-0034
Update Drizzle schema files
All new tenant columns (project_id, client_id) are nullable with no explicit default, so existing rows read NULL
Zero risk: existing code ignores the new columns

Phase 2: Write Path (populate new columns)

Update createTask() to write project_id
Update createLearning() to write project_id
Update ETF collector to pass project_id
Update chat to store project_id
Update COMP file writer to include project_id in frontmatter
Low risk: writes extra data, reads don't change yet

Phase 3: Read Path (filter by project)

Add projectId parameter to all service query functions
Admin UI passes NULL (sees everything)
Client portal passes their projectId (sees only their data)
Medium risk: must verify every query returns correct results

Phase 4: Redis Namespacing

Prefix keys with project_id
Change stream names to include team
Update rate limiter and circuit breaker keying
Medium risk: rolling deploy must handle both old and new key formats
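
This phase does not spell out how both key formats are handled during the rolling deploy. One possible approach, offered as an assumption rather than the shipped behavior, is to read the project-scoped key first and fall back to the legacy key while old writers are still running. The sketch uses a plain mapping in place of a Redis client, and the function names are hypothetical:

```python
from typing import Mapping, Optional, Tuple


def key_pair(family: str, project_id: str, work_item_id: str) -> Tuple[str, str]:
    """Return (scoped_key, legacy_key) for one work item."""
    scoped = f"{family}:{project_id}:{work_item_id}"
    legacy = f"{family}:{work_item_id}"
    return scoped, legacy


def read_during_rollout(
    store: Mapping[str, str], family: str, project_id: str, work_item_id: str
) -> Optional[str]:
    """Prefer the project-scoped key; fall back to the legacy key."""
    scoped, legacy = key_pair(family, project_id, work_item_id)
    value = store.get(scoped)
    if value is None:
        # Written by a pod still running the pre-namespacing code.
        value = store.get(legacy)
    return value
```

Once every writer uses the scoped format and legacy keys have expired, the fallback branch can be removed.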

Phase 5: Filesystem Isolation

Implement git worktree creation per project
Update executor to set per-project cwd
Move COMP files to per-project paths
Higher risk: changes execution environment, needs thorough testing

Phase 6: Concurrency Unlocks¶

Raise MAX_CONCURRENT_PER_AGENT to 1-per-team
Raise MAX_CONCURRENT_TOTAL to 15
Implement per-team fair queuing
Depends on phases 1-5 being stable
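Per-team fair queuing under these limits could look like the round-robin selection below. It reads "1-per-team" as one concurrent run per agent per team, which is an interpretation; `selectRunnable` and the task shape are hypothetical.

```typescript
interface QueuedTask {
  id: string;
  team: string;
  agent: string;
}

const MAX_CONCURRENT_TOTAL = 15;

// Picks tasks to start now: round-robin across teams, FIFO within a team.
function selectRunnable(
  queued: QueuedTask[],
  running: QueuedTask[],
  maxTotal: number = MAX_CONCURRENT_TOTAL,
): QueuedTask[] {
  const busy = new Set(running.map((t) => `${t.team}/${t.agent}`));
  const byTeam = new Map<string, QueuedTask[]>();
  for (const t of queued) {
    const list = byTeam.get(t.team) ?? [];
    list.push(t);
    byTeam.set(t.team, list);
  }

  const selected: QueuedTask[] = [];
  let capacity = maxTotal - running.length;
  let progressed = true;
  while (capacity > 0 && progressed) {
    progressed = false;
    for (const list of Array.from(byTeam.values())) {
      if (capacity <= 0) break;
      // First queued task whose agent is idle within this team.
      const idx = list.findIndex((t) => !busy.has(`${t.team}/${t.agent}`));
      if (idx === -1) continue;
      const task = list.splice(idx, 1)[0];
      if (task === undefined) continue;
      busy.add(`${task.team}/${task.agent}`);
      selected.push(task);
      capacity--;
      progressed = true;
    }
  }
  return selected;
}
```

Each pass gives every team at most one slot before any team gets a second, so a team with a deep queue cannot starve the others.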

13. Isolation Tests¶

Each phase ships with tests that verify isolation:

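// Fixtures (hypothetical IDs; real tests would create both projects in setup)
const projectA = 'project-a';
const projectB = 'project-b';

// Test: cross-project task visibility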
test('getTasks filters by project', async () => {
  await createTask({ title: 'A', projectId: projectA });
  await createTask({ title: 'B', projectId: projectB });

  const tasksA = await getTasks({ projectId: projectA });
  expect(tasksA.map(t => t.title)).toEqual(['A']);
  expect(tasksA.map(t => t.title)).not.toContain('B');
});

// Test: cross-project chat isolation
test('chat history is project-scoped', async () => {
  await chatWithAgent('urszula', projectA, 'Secret for A');
  const historyB = await getChatHistory('urszula', projectB);
  expect(historyB).toHaveLength(0);
});

// Test: cross-project learning isolation
test('learnings do not leak across projects', async () => {
  await createLearning({ agentId: 'koen', projectId: projectA, summary: 'Use RLS' });
  const learningsB = await getHotLearnings('koen', projectB);
  expect(learningsB.find(l => l.summary === 'Use RLS')).toBeUndefined();
});

14. Cross-Validation Results (2026-04-11)¶

Three independent verification methods were run against this design:

Method 1: Data Flow Trace¶

Traced 2 complete flows hop-by-hop through actual code:

- Flow A: Client briefing → Dima intake → task creation → Redis → executor → COMP → learnings → ETF → completion → next agent (14 hops)
- Flow B: Koen completion → COMP file → sync cron → completion stream → orchestrator → hooks → Marije trigger → execution (10 hops)

Result: Design covers every hop. Found 3 pre-existing bugs added as §0 prerequisites.

Method 2: Evil Client B Penetration Scenarios¶

12 attack scenarios tested:

Scenario	Blocked by Design?
B queries A's tasks via API	YES (§5.1 WHERE clause)
B's agent gets A's learnings via JIT	YES (§8.3 + §10.1)
B chats with shared agent, gets A's history	YES (§9)
B's agent greps filesystem for A's files	YES (§7.1 per-project worktree)
B's agent follows ge-ops symlink to A's wiki	FIXED (§7.1 amended, selective symlinks)
B sees A's task events on pub/sub	YES (§6.3 team-scoped channels)
B's hook chains to A's agent	YES (§6.3 project-scoped shared streams)
B views A's ETF scores	YES (§5.3 projectId filter)
B's HITL question in A's queue	YES (migration 0023 project_id)
B's learning auto-promoted, injected into A	YES (§10.2 human gatekeeper)
B reads Redis streams directly	Acceptable risk (infra-level)
B modifies AGENT-REGISTRY.json	YES (§7.1 read-only symlinks)

Result: 11/12 blocked by design, including 1 gap that was found and fixed (symlink amendment). The remaining scenario is an accepted infrastructure-level risk.

Method 3: Exhaustive Table Cross-Reference¶

All 45 database tables verified:

- 14 tables covered by new migrations (0020-0034)
- 16 tables already properly scoped
- 10 tables are global config/internal (justified)
- 5 tables are security layer with existing tenant_id

Result: 45/45 accounted for. Zero omissions.

Amendments Made from Cross-Validation¶

§0 added — 3 prerequisite bug fixes (field name mismatch, completion stream, auto-chain propagation)
§7.1 amended — Selective symlinks instead of monolithic ge-ops symlink (wiki clients/ isolation)
§6.3 amended — ge:work:completed does NOT already carry tenant data (corrected false assumption)

15. Open Questions¶

Should project_id be enforced NOT NULL on tasks immediately, or phased? Recommendation: phased. Allow NULL for GE-internal tasks that don't belong to a client project.
How do GE-internal tasks (maintenance, monitoring, agent profiling) get scoped? Recommendation: a reserved project ge-internal under a reserved client growing-europe. This keeps the schema consistent.
Do we need per-project Redis databases (SELECT 0, 1, 2...)? Recommendation: no. Key prefixing is sufficient and avoids connection pool complexity.
Wiki brain partitioning: The wiki becomes a dual-purpose store: shared GE knowledge (company/, development/, methodologies/, domains/) plus isolated client/project knowledge (clients/{client}/{project}/). See §8 for full structure. The JIT injector must enforce read boundaries per client/project.
Unification of clients ↔ client_companies: How are existing records matched? Recommendation: manual step by Dirk-Jan since there are few records. Add client_id FK to client_companies, populate by hand, then enforce NOT NULL.
Shared agent stream explosion: RESOLVED. Per-project streams for shared agents were rejected during implementation. Redis XREADGROUP doesn't support wildcard subscriptions, making dynamic per-project streams impractical. Shared agents use simple triggers.{agent} streams with project_id in the message payload. Isolation is enforced at the key/dedup level, not the stream level.
Wiki filesystem isolation enforcement: The wiki is a hostPath mount. All agents could technically read all files. The isolation is enforced at the JIT injector level (software boundary), not at the filesystem level (OS boundary). Is this sufficient? For GE-internal use where agents are trusted code, yes. For a future where agents run untrusted code, we'd need mount namespace isolation per project.
client_id on agent_learnings in addition to project_id: Needed for client-scoped learnings (applies to all projects for one client). This means migration 0028 adds two columns, not one. Schema impact is minimal but the query logic for getHotLearnings() becomes a 3-way OR: (client_id IS NULL AND project_id IS NULL) OR (client_id = ? AND project_id IS NULL) OR (project_id = ?).
Monitoring agent budget isolation: Monitoring agents (Ron, Mira, etc.) run on GE's own budget, not client budgets. Their cost gate should check the GE-internal budget only, never a client's project budget. This depends on the reserved ge-internal project recommended above.
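The three-way OR described for getHotLearnings() can be written as a predicate to make the three visibility tiers explicit. The `Learning` shape and the `isLearningVisible` name are assumptions for illustration; the real filter would live in the SQL query.

```typescript
interface Learning {
  clientId: string | null;
  projectId: string | null;
  summary: string;
}

function isLearningVisible(l: Learning, clientId: string, projectId: string): boolean {
  // (client_id IS NULL AND project_id IS NULL): global GE learning
  const isGlobal = l.clientId === null && l.projectId === null;
  // (client_id = ? AND project_id IS NULL): applies to all of this client's projects
  const isClientWide = l.clientId === clientId && l.projectId === null;
  // (project_id = ?): scoped to this project
  const isProjectScoped = l.projectId === projectId;
  return isGlobal || isClientWide || isProjectScoped;
}
```

The third clause does not re-check the client, so it relies on project IDs being unique across clients.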