
Formal Specification Pitfalls

How to Use This Page

Each pitfall follows the same structure:

  • ANTI_PATTERN: What the mistake looks like
  • WHY IT HAPPENS: Root cause, especially in agentic context
  • DETECTION: How to catch it
  • FIX: How to correct it

RULE: These pitfalls apply to Anna's specification work, but all agents interacting with specs should know them.


Pitfall 1: Over-Specification — Constraining Implementation Unnecessarily

ANTI_PATTERN: The spec dictates HOW to implement, not just WHAT to implement.

# BAD: Specifies implementation details
behavior:
  summary: "Store invoices in a PostgreSQL table called 'invoices' using
    a B-tree index on the amount column, with a trigger that updates
    the modified_at column on every UPDATE"

# GOOD: Specifies behavior only
behavior:
  summary: "Persist invoices with O(log n) lookup by amount.
    Track modification timestamps automatically."

WHY IT HAPPENS: Anna (or any spec-writing agent) has been trained on codebases where implementation details are common in documentation. The line between "what" and "how" is genuinely difficult for LLMs to draw.

DETECTION: CHECK: Does the spec mention any of these?

  • Specific database tables, columns, or indexes
  • Specific data structures (HashMap, Array, LinkedList)
  • Specific algorithms (quicksort, binary search)
  • Specific libraries or frameworks
  • Specific file paths or directory structures

IF: Yes — the spec is over-specified. These are implementation choices.

FIX: Replace implementation details with behavioral requirements:

| Over-Specified | Correctly Specified |
| --- | --- |
| "Use a HashMap for lookup" | "Lookup must complete in O(1) amortized time" |
| "Store in PostgreSQL" | "Data must be persisted durably with ACID guarantees" |
| "Use bcrypt with 12 rounds" | "Password hashing must resist brute force for >= 10 years at current hardware" |
| "Use React Query for caching" | "Data must be cached client-side with stale-while-revalidate semantics" |

Exception: Platform Constraints

IF: The scope document explicitly constrains the implementation (e.g., "must integrate with existing PostgreSQL database")
  THEN: This is a legitimate constraint, not over-specification. Include it as a constraint, not a behavior.

# ACCEPTABLE: External constraint, not implementation choice
constraints:
  - "Must use existing PostgreSQL 15 database (client requirement)"
  - "Must expose REST API (existing mobile app depends on it)"

Pitfall 2: Under-Specification — Leaving Room for LLM Interpretation

ANTI_PATTERN: The spec is vague, uses ambiguous language, or omits behaviors that the LLM will have to invent.

# BAD: Under-specified
behavior:
  summary: "Handle invoice creation appropriately"
  post_conditions:
    - "Invoice is saved"
    - "User is notified"

# GOOD: Fully specified
behavior:
  summary: "Create a new invoice entity in draft status"
  post_conditions:
    - "Invoice persisted to database with status 'draft'"
    - "Invoice ID generated in format INV-{YYYYMMDD}-{seq}"
    - "Email notification sent to invoice.createdBy within 30 seconds"
    - "Audit log entry created with action 'invoice.created'"
    - "Response contains full invoice object with all fields populated"

WHY IT HAPPENS: Under-specification is the default failure mode. Writing precise specs is hard. It is always easier to write "handle appropriately" than to enumerate every behavior. LLMs, having been trained on vague requirements, reproduce this vagueness naturally.

DETECTION: CHECK: Does the spec use any of these words?

  • "appropriately" → WHAT is appropriate? Define it.
  • "reasonable" → WHAT is reasonable? Quantify it.
  • "as needed" → WHEN is it needed? Define the condition.
  • "etc." → List ALL items. No "etc." allowed.
  • "should" → Is this a MUST or a nice-to-have? Decide.
  • "may" → Under what conditions? Always? Never? Define.
  • "handle" (without specifics) → Handle HOW? Error? Retry? Ignore? Log?
  • "properly" → What constitutes "proper" behavior? Define it.

CHECK: For each operation, does the spec define:

  • What happens on success? (post-conditions)
  • What happens on every type of failure? (error conditions)
  • What the system state is during the operation? (invariants)
  • What must be true before the operation? (pre-conditions)

IF: Any of these are missing — the spec is under-specified.
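The word check above lends itself to automation. The following is a minimal sketch, assuming the spec is available as plain text; the function name `find_ambiguous_terms` and the term table are illustrative, not part of any existing tool. The "handle (without specifics)" check is left out because it needs context that a plain word match cannot provide.

```python
import re

# Terms from the DETECTION checklist, each with the question it raises.
# This table is an illustrative assumption, not an existing tool's config.
AMBIGUOUS_TERMS = {
    "appropriately": "What is appropriate? Define it.",
    "reasonable": "What is reasonable? Quantify it.",
    "as needed": "When is it needed? Define the condition.",
    "etc.": "List all items.",
    "should": "MUST or nice-to-have? Decide.",
    "may": "Under what conditions? Define.",
    "properly": "What constitutes proper behavior? Define it.",
}


def find_ambiguous_terms(text):
    """Return (line_number, term, question) for every flagged term in text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for term, question in AMBIGUOUS_TERMS.items():
            # Lookarounds stop "may" from matching inside words like "dismayed".
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, line, flags=re.IGNORECASE):
                findings.append((line_no, term, question))
    return findings


spec_text = (
    'summary: "Handle invoice creation appropriately"\n'
    "post_conditions:\n"
    '  - "User should be notified as needed"'
)
for line_no, term, question in find_ambiguous_terms(spec_text):
    print(f"line {line_no}: '{term}' -> {question}")
```

A match is only a prompt for review: a case-insensitive search will, for example, also flag the month "May".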

FIX: The "5 Whys" for specifications:

  1. "Handle errors" → Which errors? → "Database timeout, auth failure, validation error"
  2. "Database timeout" → What happens? → "Return 503 with Retry-After header"
  3. "Return 503" → What about partial state? → "Transaction rolled back, no side effects"
  4. "No side effects" → Including async effects? → "Email queue entry removed if created"
  5. "Email queue" → Is email part of the transaction? → "No, email is fire-and-forget with at-least-once delivery"

Each "why" eliminates one layer of ambiguity.


Pitfall 3: Spec Drift — Specification and Code Diverge

ANTI_PATTERN: The spec says one thing, the code does another, and nobody notices because the spec is treated as a one-time document.

Day 1: Spec says "maximum 100 line items per invoice"
Day 30: Developer increases limit to 500 for a client request
Day 60: New developer reads spec, implements 100-item limit in new feature
Day 90: Bug report — inconsistent limits across features

WHY IT HAPPENS: Specs are written once and forgotten. Code evolves continuously. Without enforcement, they diverge.

DETECTION: CHECK: Does the spec version match the implementation version?

  • Jasper checks this during reconciliation (Stage 6)
  • Aydan validates specs against codebase
  • If a developer changes behavior, the spec MUST be updated

FIX: Enforce spec-code coupling through process:

RULE: A developer MUST NOT change behavior without a spec change request.

RULE: Behavior changes flow: developer → Antje → Anna → Aimee (if scope change needed) → back down

Developer discovers limit should be 500, not 100:
  → Developer files spec change request to Anna
  → Anna evaluates: does this require scope change?
    IF: Yes → Anna escalates to Aimee → Aimee confirms with client
    IF: No → Anna revises spec with new limit
  → Antje updates tests to reflect new limit
  → Developer implements the change
  → Koen verifies tests pass with new limit

ANTI_PATTERN: Developer changes code first, then "updates" the spec to match.

FIX: The spec leads. Code follows. Never the reverse.

Automated Drift Detection

Jasper can detect drift by comparing:

  • Spec-defined boundaries vs code-defined boundaries (constants, validation rules)
  • Spec-defined state machines vs code-defined state machines
  • Spec-defined error responses vs actual error responses

RULE: Drift detection runs on every deployment. Deployment is blocked if spec-code mismatch is detected.
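How Jasper performs this comparison is not described on this page. As a rough sketch of the boundary comparison only, assume the spec boundaries and the code constants have already been extracted into two dictionaries; the function name `detect_drift` and the key `max_line_items` are hypothetical.

```python
def detect_drift(spec_bounds, code_bounds):
    """Compare spec-defined boundaries with values extracted from the code.

    Returns a list of mismatch descriptions; an empty list means no drift.
    """
    mismatches = []
    for name, spec_value in sorted(spec_bounds.items()):
        if name not in code_bounds:
            mismatches.append(
                f"{name}: defined in spec ({spec_value!r}) but missing in code"
            )
        elif code_bounds[name] != spec_value:
            mismatches.append(
                f"{name}: spec says {spec_value!r}, code says {code_bounds[name]!r}"
            )
    # Boundaries that exist only in code have no spec backing at all.
    for name in sorted(set(code_bounds) - set(spec_bounds)):
        mismatches.append(f"{name}: present in code but not in spec")
    return mismatches


# The Day 1 / Day 30 scenario from this pitfall.
drift = detect_drift({"max_line_items": 100}, {"max_line_items": 500})
print(drift)  # ['max_line_items: spec says 100, code says 500']
```

A deployment gate could then fail whenever the returned list is non-empty.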


Pitfall 4: Spec as Documentation Only — Not Machine-Actionable

ANTI_PATTERN: The spec is written in prose and cannot be automatically consumed by Antje or other tools.

# BAD: Prose that requires human interpretation
behavior:
  summary: "The invoice system should handle various edge cases
    including but not limited to empty line items, very large amounts,
    and international characters in descriptions. The system should
    be resilient to these scenarios and provide meaningful feedback
    to the user."

# GOOD: Structured, enumerable, machine-actionable
edge_cases:
  - case: "empty line items array"
    input: { lineItems: [] }
    expected: "400 with error 'At least one line item required'"
  - case: "amount exceeding maximum"
    input: { lineItems: [{ amount: 1000000 }] }
    expected: "400 with error 'Amount exceeds maximum 999999.99'"
  - case: "unicode in description"
    input: { lineItems: [{ description: "Factuur voor Zoë de Vries (NL)" }] }
    expected: "201 with description preserved exactly"

WHY IT HAPPENS: Spec writers (human or AI) default to natural language because it is easier to produce. Structured YAML requires more thought and discipline.

DETECTION: CHECK: Can Antje programmatically iterate over the spec elements?
  IF: Yes (each invariant, edge case, and condition is a discrete object) → GOOD
  IF: No (spec is prose that requires interpretation) → BAD

FIX: Every spec element must be:

  1. Discrete: One element per block, not bundled in paragraphs
  2. Enumerable: Can be listed, counted, and iterated
  3. Testable: Contains enough information to write a test (input + expected output)
  4. Identifiable: Has a unique ID for traceability (INV-1, EC-3, PC-2)
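The four properties can be checked mechanically once the spec has been parsed into a list of dictionaries. The sketch below assumes that shape and assumes the field names `id`, `input`, and `expected`; the ID pattern is inferred from the examples INV-1, EC-3, PC-2 and may not match the real template.

```python
import re

# Assumed ID shape, inferred from the examples INV-1, EC-3, PC-2.
ID_PATTERN = re.compile(r"^[A-Z]+-\d+$")


def validate_elements(elements):
    """Return a list of problems; an empty list means every element is
    identifiable (unique, well-formed id) and testable (input + expected)."""
    problems = []
    seen_ids = set()
    for index, element in enumerate(elements):
        element_id = element.get("id")
        label = element_id or f"element #{index}"
        if not element_id or not ID_PATTERN.match(element_id):
            problems.append(f"{label}: missing or malformed id")
        elif element_id in seen_ids:
            problems.append(f"{label}: duplicate id")
        else:
            seen_ids.add(element_id)
        for field in ("input", "expected"):
            if field not in element:
                problems.append(f"{label}: missing '{field}'")
    return problems


elements = [
    {"id": "EC-1", "input": {"lineItems": []},
     "expected": "400 with error 'At least one line item required'"},
    {"case": "prose only, no id, no input, no expected"},
]
for problem in validate_elements(elements):
    print(problem)
```

Being able to iterate over `elements` at all is the "discrete" and "enumerable" half of the requirement; the function covers the other half.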


Pitfall 5: Gold Plating the Spec

ANTI_PATTERN: The spec includes features or behaviors that were not requested by the client and are not in the scope.

# BAD: Gold plating — client did not ask for AI-powered suggestions
post_conditions:
  - "Invoice created successfully"
  - "AI-powered description suggestions offered to user"  # NOT IN SCOPE
  - "Predictive amount calculation based on history"       # NOT IN SCOPE

WHY IT HAPPENS: LLMs are trained on feature-rich codebases and naturally suggest enhancements. Anna may add spec elements that seem useful but are not requested.

DETECTION: CHECK: Can every spec element be traced to a scope element?
  IF: No → the element is gold plating. Remove it.

FIX: Anna's lineage tracking enforces this. Every spec element must have a derived_from reference to the scope document. Orphaned elements are flagged.

RULE: Build what the client asked for. Nothing more. Enhancements go through the full scoping process (Aimee → client approval → Anna).
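The traceability check can be sketched as a lookup of each element's derived_from reference against the set of known scope IDs. This is an illustration under assumptions: the dictionary layout and the scope ID format `SCOPE-4` are hypothetical, and Anna's actual lineage tracking may work differently.

```python
def find_orphans(spec_elements, scope_ids):
    """Return the IDs of spec elements whose derived_from reference is
    missing or does not point at a known scope element."""
    orphans = []
    for element in spec_elements:
        if element.get("derived_from") not in scope_ids:
            orphans.append(element["id"])
    return orphans


spec_elements = [
    {"id": "PC-1", "statement": "Invoice created successfully",
     "derived_from": "SCOPE-4"},  # hypothetical scope ID
    {"id": "PC-2",
     "statement": "AI-powered description suggestions offered to user"},
]
print(find_orphans(spec_elements, {"SCOPE-4"}))  # ['PC-2']
```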


Pitfall 6: Circular Definitions

ANTI_PATTERN: Spec elements that define themselves in terms of themselves.

# BAD: Circular
invariants:
  - statement: "The total is correct"
    rationale: "Correctness is required"
pre_conditions:
  - "Data is valid"
post_conditions:
  - "Data remains valid"

WHY IT HAPPENS: LLMs generate plausible-sounding tautologies. "The total is correct" sounds like a specification but specifies nothing.

DETECTION: CHECK: For each spec element, remove the element. Does anything change?
  IF: Removing the element loses no information → it is circular or tautological. Remove it.

FIX: Replace circular definitions with concrete, falsifiable statements:

| Circular | Concrete |
| --- | --- |
| "Total is correct" | "total == SUM(lineItems[i].amount * lineItems[i].quantity) for all i" |
| "Data is valid" | "All required fields are present AND all types match schema AND all values within defined bounds" |
| "System is responsive" | "p95 response time < 200ms for list operations with < 100 results" |

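A concrete statement is one that a test can falsify. As an illustration, the concrete form of "Total is correct" translates directly into a check. The sketch assumes monetary values arrive as decimal strings and quantities as integers; the field names follow the invariant above, and the `total` field on the invoice is an assumption.

```python
from decimal import Decimal


def total_is_consistent(invoice):
    """Check total == SUM(lineItems[i].amount * lineItems[i].quantity).

    Decimal avoids the rounding surprises of binary floats with money.
    """
    expected = sum(
        (Decimal(item["amount"]) * item["quantity"]
         for item in invoice["lineItems"]),
        Decimal("0"),
    )
    return Decimal(invoice["total"]) == expected


invoice = {
    "total": "44.98",
    "lineItems": [
        {"amount": "19.99", "quantity": 2},
        {"amount": "5.00", "quantity": 1},
    ],
}
print(total_is_consistent(invoice))  # True
```

No equivalent function can be written for "The total is correct", which is what makes that statement useless as a spec element.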
Pitfall 7: Missing Concurrency Specification

ANTI_PATTERN: The spec defines behavior for a single user/request but says nothing about concurrent access.

WHY IT HAPPENS: Concurrency is invisible in functional specifications. Scope documents describe single-user workflows. LLMs rarely add concurrency considerations unprompted.

DETECTION: CHECK: For each write operation, does the spec define:

  • What happens when two users modify the same entity simultaneously?
  • What happens when the same user sends the same request twice?
  • What isolation level is required for database operations?

IF: No concurrency specification exists — this is a gap.

FIX: Anna must explicitly address concurrency for every write operation:

concurrency:
  optimistic_locking: true
  conflict_resolution: "Return 409 with current version; client must re-read and retry"
  idempotency:
    mechanism: "Idempotency-Key header"
    window: "24 hours"
    behavior: "Return existing result for duplicate key"
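To make the semantics of this block concrete, here is a minimal in-memory sketch of what an implementation satisfying it might do. Everything in it is an assumption for illustration: the class and method names are hypothetical, and the store is single-threaded, so a real service would need an atomic compare-and-set in its database.

```python
import time


class ConflictError(Exception):
    """Corresponds to the 409 response: the caller's version is stale."""

    def __init__(self, current_version):
        super().__init__(f"version conflict; current version is {current_version}")
        self.current_version = current_version


class InvoiceStore:
    IDEMPOTENCY_WINDOW_SECONDS = 24 * 60 * 60  # window: "24 hours"

    def __init__(self, clock=time.time):
        self._clock = clock
        self._invoices = {}  # invoice_id -> (version, data)
        self._results = {}   # idempotency key -> (stored_at, result)

    def create(self, invoice_id, data):
        self._invoices[invoice_id] = (1, dict(data))
        return 1

    def update(self, invoice_id, expected_version, data, idempotency_key):
        now = self._clock()
        cached = self._results.get(idempotency_key)
        if cached and now - cached[0] < self.IDEMPOTENCY_WINDOW_SECONDS:
            return cached[1]  # duplicate key: return the existing result
        current_version, _ = self._invoices[invoice_id]
        if expected_version != current_version:
            # Optimistic locking: reject writes based on a stale read.
            raise ConflictError(current_version)
        new_version = current_version + 1
        self._invoices[invoice_id] = (new_version, dict(data))
        result = {"id": invoice_id, "version": new_version}
        self._results[idempotency_key] = (now, result)
        return result


store = InvoiceStore()
store.create("inv-1", {"amount": 10})
first = store.update("inv-1", 1, {"amount": 20}, "key-a")
retry = store.update("inv-1", 1, {"amount": 20}, "key-a")
print(first == retry)  # True: the retry did not apply the update twice
```

A second writer that still holds version 1 and uses a new key receives a `ConflictError` carrying the current version, and must re-read before retrying.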

Pitfall 8: Ignoring Non-Functional Requirements

ANTI_PATTERN: The spec covers functional behavior but ignores performance, security, and observability.

WHY IT HAPPENS: Functional behavior is what the client asks for. Non-functional requirements are often assumed. LLMs follow the explicit request and skip the implicit expectations.

DETECTION: CHECK: Does the spec include:

  • Performance bounds (response time, throughput)?
  • Security constraints (authentication, authorization, input sanitization)?
  • Observability requirements (logging, metrics, tracing)?
  • Data retention and privacy requirements?

FIX: Anna's spec template includes a non-functional section. It is not optional:

non_functional:
  performance:
    response_time_p95: "200ms"
    throughput: "100 requests/second"
  security:
    authentication: "required"
    authorization: "invoice.create permission"
    input_sanitization: "All string inputs sanitized against XSS"
  observability:
    logging: "Structured JSON logs for all operations"
    metrics: "Request count, latency histogram, error rate"
  data:
    retention: "7 years (Dutch fiscal requirement)"
    privacy: "PII encrypted at rest"

Pitfall 9: Specifying UI Layout Instead of Behavior

ANTI_PATTERN: The spec describes how the UI looks instead of what it does.

# BAD: Layout specification
behavior:
  summary: "The invoice form has a header with the company logo on the left,
    the invoice number on the right, below that a table with 4 columns..."

# GOOD: Behavioral specification
behavior:
  summary: "The invoice form collects line items and calculates totals"
  user_interactions:
    - action: "Add line item"
      result: "New row appears with description, quantity, amount fields"
    - action: "Submit form"
      pre_condition: "At least one line item with valid amount"
      result: "Invoice created via POST /api/invoices, user redirected to invoice detail"

WHY IT HAPPENS: Scope documents often include wireframes or layout descriptions. Anna may transcribe these into the spec instead of extracting behavior.

DETECTION: CHECK: Does the spec reference visual elements (colors, positions, sizes)?
  IF: Yes → these are design decisions, not behavioral specifications. Remove.

FIX: Specs define what the user CAN DO, not what they SEE. Visual design is the domain of the design team, not the specification.


Pitfall Summary

| # | Pitfall | Severity | Who Catches It |
| --- | --- | --- | --- |
| 1 | Over-specification | HIGH | Aydan (validates against implementation options) |
| 2 | Under-specification | CRITICAL | Antje (cannot generate tests) |
| 3 | Spec drift | HIGH | Jasper (reconciliation), Aydan (validation) |
| 4 | Documentation-only spec | HIGH | Antje (cannot programmatically parse) |
| 5 | Gold plating | MEDIUM | Aimee (scope comparison) |
| 6 | Circular definitions | MEDIUM | Anna (self-review), Antje (untestable) |
| 7 | Missing concurrency | HIGH | Marije (integration testing reveals races) |
| 8 | Missing non-functional | HIGH | Koen (performance gates), Marije (load testing) |
| 9 | UI layout in spec | LOW | Design team, Antje (cannot test layout from spec) |

Decision Tree: Is My Spec Ready?

CHECK: Does every element have a unique ID?
  IF: No → Add IDs

CHECK: Does every element trace to a scope element?
  IF: No → Remove (gold plating) or report gap to Aimee

CHECK: Can every element be tested automatically?
  IF: No → Revise until testable

CHECK: Does the spec avoid implementation details?
  IF: No → Replace with behavioral requirements

CHECK: Are all ambiguous words eliminated?
  IF: No → Replace with specific, measurable terms

CHECK: Is concurrency addressed for all write operations?
  IF: No → Add concurrency section

CHECK: Are non-functional requirements included?
  IF: No → Add performance, security, observability sections

CHECK: Does the spec define error behavior for every operation?
  IF: No → Add error conditions

IF all checks pass → PUBLISH
IF any check fails → REVISE
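The tree reduces to "publish only if every check passes". A small sketch, assuming each check has already been evaluated to a boolean elsewhere; the function name and check keys are hypothetical labels for the questions above.

```python
def spec_readiness(check_results):
    """Return ("PUBLISH", []) if all checks pass, else ("REVISE", failed_checks).

    check_results maps a check name to True (passed) or False (failed).
    """
    failed = [name for name, passed in check_results.items() if not passed]
    verdict = "REVISE" if failed else "PUBLISH"
    return verdict, failed


verdict, failed = spec_readiness({
    "unique_ids": True,
    "traces_to_scope": True,
    "concurrency_addressed": False,
})
print(verdict, failed)  # REVISE ['concurrency_addressed']
```

Returning the failed checks alongside the verdict tells the spec author which branch of the tree to revisit.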