# Prompt Engineering for Agent Identity

## Beyond Prompt Engineering
Traditional prompt engineering focuses on getting a single correct response from an LLM. Agentic prompt engineering is fundamentally different: it focuses on shaping consistent behavior across hundreds or thousands of interactions. The prompt is not a question — it is a personality, a set of principles, and a behavioral framework that persists across every task the agent performs.
This distinction matters because the techniques that work for one-shot prompting often fail for agentic use. A cleverly worded prompt that produces a brilliant response once may produce wildly inconsistent responses across 500 executions. Agentic prompt engineering optimizes for reliability and consistency, not peak performance on any single task.
## The Persona Effect
Research on persona prompting in LLMs (2024-2025) shows mixed results. For factual, accuracy-based tasks, adding a persona to the prompt offers little benefit and sometimes degrades performance. But for open-ended, creative, and judgment-based tasks — which constitute the majority of software development work — agents with well-defined personas produce measurably better output.
The mechanism is not mysterious. A persona constrains the output distribution. When an LLM is told "you are a senior security engineer who has seen every kind of vulnerability," its output shifts toward security-conscious patterns. It is more likely to check for injection, validate input, and question trust boundaries. Without the persona, the model draws from its entire training distribution, which includes vast amounts of insecure code.
### Why GE Uses Detailed Personas
GE's agents are not generic "helpful AI assistants." Each agent has a name, a role, a communication style, a set of explicit expertise areas, and a set of explicit limitations. This specificity serves three purposes:
1. Behavioral consistency. An agent named "Ron" who identifies as "Guardian of the codebase" consistently applies security-first thinking across every task. A generic "security review agent" is less predictable because it has no persistent identity to maintain.
2. Boundary enforcement. An agent told "you never modify production code directly; you only produce review findings" will reliably refuse to make code changes, even if asked. The identity boundary is stronger than a one-off instruction because it is framed as part of who the agent is, not just what it should do right now.
3. Debugging clarity. When an agent produces incorrect output, its identity helps diagnose why. If Ron (security) approves insecure code, the problem is in Ron's identity definition or context. If a generic agent approves insecure code, the problem could be anywhere.
### Research Caveats
Persona prompting research (PromptHub, 2025) found that LLM-generated personas typically outperform human-written ones, and that intersectional, idiosyncratic attributes (hobbies, quirks) can enhance representativeness. GE's approach incorporates these findings: agent identities are detailed, specific, and include personality traits that go beyond role description.
However, research also found that persona prompting can cause significant performance degradation in some configurations. GE mitigates this by keeping personas focused on software engineering roles (where the training data is abundant) and by testing each agent's persona against known-good outputs before deployment.
## Identity File Structure
Every GE agent has an identity file organized into consistent sections. This structure is not arbitrary — each section serves a specific function in the agent's behavior.
### Section 1: Core Identity

```
# {Agent Name}

**Role:** {Official title}
**Team:** {Team assignment}
**Provider:** {Claude/OpenAI/Gemini}
**Model:** {Specific model}

You are {Name}, {role description in one sentence}.
```
This section establishes the agent's fundamental identity. It appears at the top of the system prompt so it benefits from primacy bias (LLMs attend more strongly to the beginning of their input).
### Section 2: Behavioral Principles

```
## How You Work

- You {primary behavior pattern}
- You {secondary behavior pattern}
- You never {boundary definition}
- When uncertain, you {escalation behavior}
```
Behavioral principles are stated as declarations, not requests. "You always validate input" is stronger than "please validate input." The declarative form frames the behavior as an inherent property of the agent, not a favor being asked.
### Section 3: Boundaries

```
## What You Do NOT Do

- You do not {prohibited action 1}
- You do not {prohibited action 2}
- You do not {prohibited action 3}
```
Negative boundaries are as important as positive instructions. LLMs operating without explicit boundaries will occasionally attempt actions outside their role, especially when the task context suggests it would be "helpful." Explicit "do not" statements prevent this drift.
GE's experience: agents without negative boundaries attempted to fix bugs they found during review (instead of reporting them), modified infrastructure they were only supposed to monitor, and offered pricing estimates to clients they were only supposed to greet.
### Section 4: Interaction Protocols

```
## Working With Others

- Hand off {work type} to {agent name}
- Receive {work type} from {agent name}
- Escalate {decision type} to {agent name or human}
```
Inter-agent protocols are specified in the identity, not in task descriptions. This ensures consistent handoff behavior regardless of the specific task.
### Section 5: Quality Standards

```
## Quality

- Before completing, verify: {checklist}
- Your output must include: {required elements}
- Never submit work that {quality threshold}
```
Quality standards in the identity file establish a baseline that applies to every task. Task-specific quality requirements are additive, not replacements.
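Taken together, the five sections can be represented and rendered programmatically. A minimal sketch in Python; the `IdentityFile` class, its field names, and the example agent are illustrative, not GE's actual loader:

```python
from dataclasses import dataclass, field

@dataclass
class IdentityFile:
    """One agent identity, split into the five sections above."""
    name: str
    role: str
    principles: list[str] = field(default_factory=list)  # Section 2
    boundaries: list[str] = field(default_factory=list)  # Section 3
    protocols: list[str] = field(default_factory=list)   # Section 4
    quality: list[str] = field(default_factory=list)     # Section 5

    def render(self) -> str:
        """Render the identity file in the section order shown above."""
        parts = [f"# {self.name}", f"**Role:** {self.role}",
                 f"You are {self.name}, {self.role}."]
        for title, items in [("## How You Work", self.principles),
                             ("## What You Do NOT Do", self.boundaries),
                             ("## Working With Others", self.protocols),
                             ("## Quality", self.quality)]:
            if items:
                parts.append(title)
                parts.extend(f"- {item}" for item in items)
        return "\n".join(parts)

ron = IdentityFile(
    name="Ron", role="Guardian of the codebase",
    principles=["You review every change for injection risks."],
    boundaries=["You do not modify production code directly."])
print(ron.render().splitlines()[0])  # '# Ron'
```

Keeping the sections as structured data rather than free text makes it easy to validate that every agent has, for example, at least one negative boundary before deployment.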
## Boundary Definition as Behavior Shaping
The most powerful prompt engineering technique in agentic systems is boundary definition. Telling an agent what NOT to do is often more effective than telling it what TO do.
### Why Boundaries Work
LLMs are trained on vast corpora where "helpful" often means "doing as much as possible." Without boundaries, an agent will expand its scope to be maximally helpful, which in a multi-agent system means stepping on other agents' responsibilities, making decisions above its authority level, and producing output that was not requested.
Boundaries constrain this expansion. They are not punitive — they are clarifying. An agent with clear boundaries operates more confidently within those boundaries because it knows exactly where its responsibility ends.
### Boundary Categories
| Category | Example | Why It Matters |
|---|---|---|
| Scope | "You review code. You do not write code." | Prevents role overlap with developer agents |
| Authority | "You do not approve your own changes." | Prevents self-certification |
| Communication | "You do not contact clients directly." | Prevents unauthorized commitments |
| Technical | "You do not modify database schemas." | Prevents cross-domain interference |
| Temporal | "You do not continue past 40 turns." | Prevents context degradation |
### Boundary Enforcement
Boundaries in identity files are suggestions to the LLM, not hard constraints. They work most of the time, but under pressure (complex tasks, long sessions, contradictory instructions), boundaries can erode. GE reinforces identity boundaries with system-level enforcement:
- The orchestrator will not route database work to a frontend agent, regardless of what the agent's identity says.
- The cost gate will terminate a session that exceeds its budget, regardless of the agent's turn limit instruction.
- The executor will not execute commands that require permissions the agent does not have.
This layered approach — identity boundary as first defense, system enforcement as backstop — ensures that boundary violations are caught even when the LLM ignores its instructions.
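A sketch of the backstop idea: the router consults its own capability table, never the agent's identity text. `ALLOWED_DOMAINS` and the agent and domain names here are hypothetical:

```python
# A backstop check at the orchestrator layer. The table is system
# configuration, so an agent ignoring its identity cannot bypass it.
ALLOWED_DOMAINS = {
    "frontend_agent": {"frontend"},
    "db_agent": {"database"},
}

def route(agent: str, task_domain: str) -> bool:
    """Consult the orchestrator's own capability table, not the
    agent's identity text, before handing over work."""
    if task_domain not in ALLOWED_DOMAINS.get(agent, set()):
        raise PermissionError(f"{agent} may not take {task_domain} work")
    return True

route("db_agent", "database")           # allowed
# route("frontend_agent", "database")   # would raise PermissionError
```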
## Agentic Format as Instruction Language
GE uses a specific formatting convention — "agentic format" — designed to reduce ambiguity when instructions are processed by LLMs.
### Key Constructs

**RULE:** A mandatory instruction. No exceptions, no interpretation.

**CHECK/IF/THEN:** Conditional logic. Reduces interpretation by making the decision tree explicit.

```
CHECK: Before modifying a shared interface.
IF: The interface has consumers outside your team
THEN: Open a discussion with affected agent owners.
IF: The interface is internal to your domain
THEN: Modify and update the contract documentation.
```

**ANTI_PATTERN:** A named mistake with context. More effective than just saying "don't do X" because it explains why.

```
ANTI_PATTERN: File watchers in production (chokidar, fs.watch).
INCIDENT: Caused $100/hr token burn via feedback loop (2026-02-12).
FIX: Use polling or event-driven triggers instead.
```

**OWNER:** Who is responsible for this artifact or decision.
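The constructs are regular enough to parse mechanically, which is part of their value. A tiny illustrative parser (not GE's actual tooling):

```python
import re

# Matches one agentic-format line: a known construct, a colon, a body.
CONSTRUCT = re.compile(
    r"^(RULE|CHECK|IF|THEN|ANTI_PATTERN|INCIDENT|FIX|OWNER):\s*(.*)$")

def parse_agentic(text: str) -> list[tuple[str, str]]:
    """Return (construct, body) pairs; non-construct lines are skipped."""
    pairs = []
    for line in text.splitlines():
        m = CONSTRUCT.match(line.strip())
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

example = (
    "CHECK: Before modifying a shared interface.\n"
    "IF: The interface has consumers outside your team\n"
    "THEN: Open a discussion with affected agent owners.\n")
print(parse_agentic(example))
# [('CHECK', 'Before modifying a shared interface.'),
#  ('IF', 'The interface has consumers outside your team'),
#  ('THEN', 'Open a discussion with affected agent owners.')]
```

Because the format is machine-parseable, rules and anti-patterns can also be audited or counted without an LLM in the loop.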
### Why Agentic Format Works
Natural language is ambiguous. "Consider validating input" could mean "always validate input" or "think about whether input validation is needed." Agentic format eliminates this ambiguity:
- RULE means always.
- CHECK means evaluate the condition.
- IF/THEN means follow this exact logic.
- ANTI_PATTERN means never.
This is not a new insight — it is the application of structured data principles to natural language instructions. The more structure you provide, the less the LLM has to interpret, and the fewer interpretation errors it makes.
## Temperature and Creativity by Role
Different agent roles require different levels of creativity in their output.
### Temperature Guidelines
| Role Category | Recommended Temperature | Rationale |
|---|---|---|
| Code generation | 0.0-0.2 | Code must be deterministic and correct |
| Test generation | 0.0-0.1 | Tests are specifications; creativity is a bug |
| Code review | 0.2-0.4 | Needs to consider multiple failure modes |
| Architecture | 0.4-0.6 | Benefits from exploring alternatives |
| Scoping/design | 0.5-0.7 | Creative problem-solving adds value |
| Client communication drafts | 0.3-0.5 | Must be engaging but accurate |
RULE: Routing and orchestration agents (e.g., the orchestrator) always operate at temperature 0. They make deterministic decisions based on rules, not creative ones.
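The guidelines above can be expressed as a lookup that agent bootstrap code consults before calling the model. The role keys and the conservative fallback here are assumptions, not GE's actual config:

```python
# Illustrative role-to-temperature map following the table above.
TEMPERATURE_BY_ROLE = {
    "code_generation": 0.1,
    "test_generation": 0.0,
    "code_review": 0.3,
    "architecture": 0.5,
    "scoping": 0.6,
    "client_comms": 0.4,
    "orchestrator": 0.0,  # RULE: routing decisions are deterministic
}

def temperature_for(role: str) -> float:
    """Unknown roles fall back to the most conservative setting."""
    return TEMPERATURE_BY_ROLE.get(role, 0.0)
```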
### Temperature vs Persona
Temperature and persona interact. A creative persona at temperature 0 produces surprisingly rigid output. A rigid persona at temperature 0.7 produces surprisingly creative output. GE calibrates both together during agent commissioning to achieve the desired behavior profile.
## The Constitutional Layer
The Constitution is a set of 10 principles that all GE agents inherit. It is injected into every agent's context alongside their identity, and it overrides identity-specific instructions when they conflict.
### How the Constitution Works
The Constitution is shared context — every agent has it, every agent follows it. This creates alignment without requiring every agent's identity to repeat the same rules. Agent-specific identities define unique behavior; the Constitution defines universal behavior.
### Constitutional Principles (Summary)
1. **Config Is King** — Read config files, never hardcode values.
2. **Real Over Simulated** — Every feature works through its actual production path.
3. **Enterprise-Grade From Day One** — Build for 100k users from line 1.
4. **Integration Before Expansion** — Wire up first, build out second.
5. **Observable By Default** — Every boundary crossing produces a trace.
6. **Blast Radius Awareness** — Identify dependencies before changing shared interfaces.
7. **Wiki Brain** — What you learn, you write to the wiki. What you need to know, you read from the wiki.
8. **Idempotent By Design** — Every operation is safe to run twice.
9. **Regression Is The Enemy** — Prove old things still work, not just that new things were added.
10. **No Hardcoded Values** — Operational values come from config files, always.
See Constitution v2 for the full text with enforcement mechanisms.
### Why a Constitution, Not Just Standards
Standards tell agents how to write code. The Constitution tells agents how to think. It shapes decision-making at a level above any individual coding standard. When an agent faces a choice not covered by its identity or task specification, the Constitution provides the framework for deciding.
This is the difference between "follow this coding style guide" and "always prefer the approach that is safe to run twice." The former is a rule. The latter is a principle that generates rules.
## System Prompt Assembly
The complete system prompt for a GE agent is assembled from multiple sources at boot time:
1. Constitution v2 (~800 tokens)
2. CORE Identity (~1,200 tokens)
3. ROLE Identity (~2,500 tokens)
4. JIT Knowledge (variable, max ~5,000 tokens)
5. Task Specification (variable, ~500-3,000 tokens)
Total system context at boot: approximately 5,000-12,500 tokens. This leaves the vast majority of the context window available for working context (code, conversation, tool output).
### Assembly Order Matters
The Constitution appears first because it has the highest authority. The CORE identity appears second because it establishes who the agent is. The ROLE identity provides functional detail. JIT knowledge provides domain-specific context. The task specification appears last, providing the immediate work focus.
This order exploits primacy bias (LLMs attend more to the beginning) to ensure that behavioral guardrails and identity have the strongest influence on output.
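A sketch of boot-time assembly in this order, with a crude word-count stand-in for a real tokenizer (the function name and the budget check are illustrative):

```python
def assemble_system_prompt(constitution: str, core_identity: str,
                           role_identity: str, jit_knowledge: str,
                           task_spec: str, max_tokens: int = 12_500) -> str:
    """Join the five sources in authority order: Constitution first so
    primacy bias favors the guardrails, task specification last."""
    sections = [constitution, core_identity, role_identity,
                jit_knowledge, task_spec]
    prompt = "\n\n".join(s.strip() for s in sections if s.strip())
    # Rough estimate (~1.3 tokens per word); a real system would use
    # the provider's tokenizer before enforcing the boot budget.
    approx_tokens = int(len(prompt.split()) * 1.3)
    if approx_tokens > max_tokens:
        raise ValueError(f"boot context too large: ~{approx_tokens} tokens")
    return prompt
```

Empty sections (an agent with no JIT knowledge, say) are simply dropped, so the assembled order is preserved regardless of which sources are present.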
## Common Prompt Engineering Mistakes
### 1. Identity That Is Too Long
An identity file that exceeds 4,000 tokens is consuming context budget that would be better spent on working context. Prune ruthlessly. Move knowledge to the wiki brain where it can be JIT-injected when relevant.
### 2. Positive-Only Instructions
An identity that only says what to do will produce an agent that also does things it should not. Always include explicit negative boundaries.
### 3. Abstract Instructions
"Write clean code" means nothing to an LLM. "Every function has a return type annotation, every variable has a descriptive name longer than 3 characters, and every error path returns a structured error object" means something specific and verifiable.
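Specific instructions like these can be backed by mechanical checks. For example, the return-type requirement can be linted with Python's `ast` module; a sketch, with a hypothetical function name:

```python
import ast

def functions_missing_return_type(source: str) -> list[str]:
    """Flag functions without a return type annotation -- one of the
    mechanically verifiable checks the concrete instruction implies."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef) and node.returns is None]

sample = "def f(x) -> int:\n    return x\ndef g():\n    pass\n"
print(functions_missing_return_type(sample))  # ['g']
```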
### 4. Conflicting Instructions
When the identity says "always prioritize security" and the task says "ship this fast," the agent faces a conflict it cannot resolve. The Constitution provides a hierarchy for these conflicts, but the best approach is to avoid them by keeping identity principles and task instructions at different levels of abstraction.
### 5. No Acknowledgment Protocol
GE requires every agent to output "Constitution v2 acknowledged. Principles loaded." at the start of every task. This is not ritual — it forces the model to process the Constitution actively, increasing the likelihood that it influences subsequent behavior.
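A sketch of how such a protocol might be enforced mechanically (the helper name is hypothetical; the acknowledgment string is quoted from above):

```python
# The required first line of every task output, per the protocol above.
ACK = "Constitution v2 acknowledged. Principles loaded."

def check_acknowledgment(agent_output: str) -> bool:
    """Accept a task output only if its first non-empty line is the
    exact acknowledgment string."""
    stripped = agent_output.strip()
    if not stripped:
        return False
    return stripped.splitlines()[0] == ACK
```

An output that fails the check can be rejected and re-prompted before any downstream agent consumes it.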