Skip to content

DOMAIN:CREATIVE — CREATIVE_ACCEPTANCE_CRITERIA

OWNER: leah ALSO_USED_BY: anna (spec authoring), antje (TDD from spec), margot (design production), boris (visual production), jasper (test reconciliation), faye (team alfa PM), sytske (team bravo PM) UPDATED: 2026-03-28 SCOPE: defining, measuring, and verifying acceptance criteria for creative deliverables — the "when is it done?" framework


PURPOSE

"Looks good" is not an acceptance criterion. "I like it" is not an acceptance criterion.

Creative acceptance criteria must be: - Specific: references exact values, not vague descriptions - Measurable: can be verified with a binary pass/fail outcome - Traceable: links back to a requirement (see creative-traceability.md) - Testable: another agent (or tool) can independently verify the claim

Leah uses acceptance criteria to determine: is this deliverable DONE? Not "good enough" — DONE.


THE PROBLEM WITH SUBJECTIVE ACCEPTANCE

HOW CREATIVE WORK GETS STUCK

DESIGNER: "Here's the homepage design."
PM: "Hmm, can you make it pop more?"
DESIGNER: [changes something]
PM: "Better, but now it feels too busy."
DESIGNER: [changes something else]
PM: "Almost... maybe try a different blue?"
[repeat 12 times]

This happens because there are no measurable acceptance criteria. Nobody can agree on "done" because "done" was never defined.

HOW MEASURABLE ACCEPTANCE FIXES THIS

ACCEPTANCE_CRITERIA:
  AC-001: Primary CTA uses hex #FF6B2B as background color
  AC-002: CTA text meets WCAG AA contrast (4.5:1 minimum against #FF6B2B)
  AC-003: Font family is SF Pro for all body text
  AC-004: Hero image is minimum 1920x1080px, WebP format
  AC-005: Page loads under 3 seconds on 4G connection
  AC-006: Layout is single-column below 768px breakpoint
  AC-007: All interactive elements have hover, active, focus, and disabled states

DESIGNER: "Here's the homepage design."
LEAH: [runs acceptance criteria]
  AC-001: PASS (#FF6B2B verified)
  AC-002: PASS (contrast ratio 4.7:1)
  AC-003: PASS (SF Pro confirmed)
  AC-004: PASS (2560x1440, WebP)
  AC-005: PASS (2.1s on simulated 4G)
  AC-006: PASS (verified at 768px)
  AC-007: FAIL — disabled state missing on secondary CTA
LEAH: "One criterion not met. AC-007: disabled state missing on secondary CTA."

No opinions. No "make it pop." One specific, actionable item to fix.


WRITING CREATIVE ACCEPTANCE CRITERIA

THE FORMULA

Every acceptance criterion follows this pattern:

[ELEMENT] [VERB] [MEASURABLE_VALUE] [CONTEXT/CONDITION]

GOOD examples: - "Primary heading uses font-size 48px at 1440px viewport width" - "Brand logo has minimum 24px clear space on all sides" - "Color palette contains exactly 5 colors matching hex values in brand-tokens.json" - "Hero section occupies 100vh on initial load at all breakpoints" - "Button border-radius is 8px for all button variants" - "Body text line-height is 1.5 relative to font-size" - "All images include alt text between 5-125 characters"

BAD examples (and why): - "Design looks professional" — not measurable - "Colors feel right" — not measurable - "Layout is clean" — not measurable - "Typography is readable" — measurable only if you specify size, contrast, and line-height - "It should work on mobile" — which breakpoints? what behavior?

ACCEPTANCE CRITERIA BY DELIVERABLE TYPE

WEB DESIGN

Category Example Criteria
Layout "Content area max-width is 1200px, centered horizontally"
Responsive "Navigation collapses to hamburger menu below 768px"
Typography "Body text is 16px/1.5 Inter Regular on all viewports"
Color "All background/text combinations meet WCAG AA (4.5:1)"
Performance "Largest Contentful Paint under 2.5 seconds"
States "All form inputs have empty, filled, focused, error, and disabled states"
Animation "Page transitions use 300ms ease-in-out"

iOS APP DESIGN

Category Example Criteria
Platform "Navigation uses UINavigationController standard back gesture"
Typography "All text uses SF Pro; supports Dynamic Type from xSmall to AX5"
Layout "All layouts use Auto Layout; no fixed dimensions except icons"
Touch "All tap targets are minimum 44x44pt"
Safe areas "Content respects safe area insets on all supported devices"
Dark mode "All screens have dark mode variants using semantic system colors"
Accessibility "VoiceOver can navigate all screens with logical reading order"

BRAND IDENTITY DELIVERABLE

Category Example Criteria
Logo "Logo provided in SVG, PNG (1x, 2x, 3x), and EPS formats"
Color "Palette includes primary, secondary, accent, success, warning, error, neutral scale"
Typography "Type scale defines 7 levels (display, h1-h4, body, caption) with exact sizes"
Guidelines "Usage rules cover minimum size, clear space, forbidden modifications, co-branding"
Completeness "All deliverables listed in scope document are present and match spec"

MARKETING COLLATERAL

Category Example Criteria
Dimensions "Social media assets provided at 1200x630 (OG), 1080x1080 (IG), 1200x675 (Twitter)"
Bleed "Print materials include 3mm bleed on all sides"
Resolution "Print assets are minimum 300 DPI; digital assets are 72 DPI"
Color mode "Print assets use CMYK color mode; digital assets use sRGB"
File format "Print: PDF/X-1a; Digital: PNG or WebP"

DEFINITION OF DESIGN DONE

Acceptance criteria are per-deliverable. Definition of Done (DoD) applies to ALL creative work.

GE CREATIVE DEFINITION OF DONE

A creative deliverable is DONE when ALL of the following are true:

  1. All acceptance criteria pass — zero failures, not "mostly passing"
  2. Brand compliance score >= 95 (see brand-compliance.md scoring model)
  3. Traceability complete — every element traces to a requirement, every requirement traces to an element (see creative-traceability.md)
  4. No contradictions — no conflicts between brief, strategy, spec, and deliverable
  5. Accessibility verified — WCAG AA minimum, AAA target for text
  6. Cross-deliverable consistency verified (if multi-deliverable project)
  7. Assets exported — all formats, sizes, and naming per spec
  8. Handoff documentation complete — design decisions documented with rationale

CREATIVE DOD vs DEVELOPMENT DOD

Aspect Creative DoD Development DoD
Primary gate Brand compliance + brief alignment Tests pass + code review
Verification Visual comparison + spec measurement Automated test suite
Subjectivity Higher — some checks require anchored judgment Lower — most checks are binary
Iteration Expected (creative exploration) Minimized (TDD drives implementation)
Handoff Design → Development (specs, tokens, assets) Development → Deployment (PR, CI/CD)
Regression risk Style drift over iterations Code regression over changes

KEY_DIFFERENCE: Creative acceptance criteria must account for the inherent subjectivity of visual work by anchoring every subjective assessment to a documented objective.


SELF-EVALUATION FRAMEWORK FOR LEAH

Inspired by Anthropic's approach to task evaluation — structured self-assessment to catch bias and blind spots.

PRE-REVIEW CALIBRATION

Before reviewing any deliverable, Leah asks herself:

  1. Do I have the current spec? (Not a remembered version — the actual document)
  2. Do I have the brand guidelines? (Current version, not an older iteration)
  3. Do I have the creative brief? (The client's words, not a summary)
  4. Do I have the acceptance criteria? (Written before the work started)
  5. Am I checking against docs or against my expectations? (Docs only)

If ANY answer is "no" — STOP. Get the missing document before proceeding.

DURING-REVIEW BIAS CHECKS

Bias Detection Question Mitigation
Anchoring "Am I judging this against the last deliverable I reviewed, or against the spec?" Re-read spec before each review
Confirmation "Am I looking for evidence that it passes, or checking if it fails?" Start with most-likely-to-fail criteria first
Recency "Am I weighting recent feedback more than the original brief?" Always start from the brief, not from revision notes
Authority "Am I accepting this because a senior designer made it?" Apply same criteria regardless of who made it
Halo effect "Is the overall impression affecting my assessment of individual criteria?" Score each criterion independently before overall score
Familiarity "Am I passing this because it looks like something that worked before?" Verify against THIS project's spec, not past projects

POST-REVIEW VERIFICATION

After completing a review, Leah checks:

  1. Every criterion has a PASS or FAIL — no "maybe" or "mostly"
  2. Every FAIL has a specific reference — guideline section, hex value, dimension
  3. Every FAIL has an actionable fix — not "make it better" but "change X to Y"
  4. No new requirements introduced — Leah validates, she does not scope
  5. Overall score matches individual scores — no gut-feel override of systematic assessment

ACCEPTANCE CRITERIA LIFECYCLE

1. ANNA writes spec → includes acceptance criteria (AC-xxx)
   |
2. ANTJE validates criteria are testable → flags vague criteria
   |
3. DESIGNER receives spec with criteria → knows exactly what "done" means
   |
4. DESIGNER delivers → self-checks against criteria before handoff
   |
5. LEAH receives deliverable → runs criteria systematically
   |
   5a. ALL_PASS → mark DONE, generate compliance report
   5b. ANY_FAIL → return with specific failures, reference criteria IDs
   |
6. DESIGNER fixes specific failures → resubmits
   |
7. LEAH re-runs ONLY failed criteria (not full re-review)
   |
8. LOOP until ALL_PASS or ESCALATE (if designer disputes a criterion)

CRITERIA DISPUTES

If a designer disagrees with a criterion: - The criterion may be wrong (spec error) — escalate to Anna - The criterion may be infeasible (technical constraint) — escalate to Anna with evidence - The designer may want a waiver (intentional deviation) — escalate to PM for client approval - Leah NEVER waives a criterion herself — she enforces, she does not negotiate


MEASURING QA EFFECTIVENESS

Track these metrics to evaluate whether the acceptance criteria system is working:

Metric Target What It Reveals
First-pass approval rate >80% If low: criteria unclear, or designers not self-checking
Average criteria per deliverable 10-25 If low: criteria too vague or incomplete. If high: over-specified
Disputes per project <2 If high: criteria quality issue or spec-design misalignment
Time from submission to approval <4 hours If high: too many failures, or Leah is a bottleneck
Criteria FAIL rate by category Even distribution If one category dominates: training gap or spec gap
Rework cycles per deliverable <=2 If high: criteria not clear enough, or design fundamentals issue
False positive rate <5% Leah flagging things that are actually correct — re-calibrate
False negative rate 0% target Leah missing actual failures — most dangerous, monitor via client feedback

TEMPLATES

ACCEPTANCE CRITERIA TEMPLATE (For Anna to fill per deliverable)

DELIVERABLE: [name]
PROJECT: [project name]
CLIENT: [client name]
SPEC_VERSION: [version]
DATE: [date]

ACCEPTANCE_CRITERIA:

  ## Visual
  AC-V001: [criterion]
  AC-V002: [criterion]
  ...

  ## Typography
  AC-T001: [criterion]
  AC-T002: [criterion]
  ...

  ## Layout
  AC-L001: [criterion]
  AC-L002: [criterion]
  ...

  ## Responsive
  AC-R001: [criterion]
  AC-R002: [criterion]
  ...

  ## Accessibility
  AC-A001: [criterion]
  AC-A002: [criterion]
  ...

  ## Content
  AC-C001: [criterion]
  AC-C002: [criterion]
  ...

  ## Assets/Export
  AC-E001: [criterion]
  AC-E002: [criterion]
  ...

QA REPORT TEMPLATE (For Leah to fill per review)

QA_REPORT
=========
Deliverable: [name]
Reviewed by: leah
Date: [date]
Spec version: [version]

RESULTS:
  TOTAL_CRITERIA: [N]
  PASSED: [N]
  FAILED: [N]
  BLOCKED: [N] (missing info to evaluate)

  PASS_RATE: [X]%
  BRAND_COMPLIANCE_SCORE: [X]/100

FAILURES:
  [AC-xxx]: FAIL — [what's wrong] — expected: [X], actual: [Y] — ref: [spec section]
  [AC-xxx]: FAIL — [what's wrong] — expected: [X], actual: [Y] — ref: [spec section]

BLOCKED:
  [AC-xxx]: BLOCKED — [what's missing] — need: [info/document]

VERDICT: [APPROVED / RETURN_FOR_FIXES / ESCALATE]

NOTES: [any additional observations, not introducing new requirements]

ANTI-PATTERNS

Anti-Pattern Why It's Wrong What to Do Instead
"Looks good to me" approval Not a criterion, not measurable, not traceable Run every criterion, document results
Accepting "80% done" Partial completion is not completion ALL criteria must pass; no partial credit
Criteria written after design is done Post-hoc criteria are biased toward what was built Write criteria before design starts, from the spec
Vague criteria ("looks professional") Cannot be tested, leads to infinite revision loops Rewrite with measurable values (font, size, color, spacing)
Designer self-certifying "done" Conflict of interest — maker should not be sole validator Independent validation by Leah, always
Changing criteria mid-review Moving goalposts destroy trust and process Criteria changes require spec amendment through Anna
Reviewing without the spec open Leah's memory of the spec is not the spec Always have the actual spec document open during review
Applying personal preference Leah's taste is irrelevant — the spec is the authority If it passes all criteria, it passes. Period.
Scoring overall without scoring parts Halo effect — strong first impression hides specific failures Score each criterion independently, then aggregate
Skipping the self-evaluation Bias accumulates silently — calibration prevents drift Run pre-review calibration and post-review verification every time

REFERENCES