DOMAIN:CREATIVE — CREATIVE_ACCEPTANCE_CRITERIA¶

OWNER: leah ALSO_USED_BY: anna (spec authoring), antje (TDD from spec), margot (design production), boris (visual production), jasper (test reconciliation), faye (team alfa PM), sytske (team bravo PM) UPDATED: 2026-03-28 SCOPE: defining, measuring, and verifying acceptance criteria for creative deliverables — the "when is it done?" framework

PURPOSE¶

"Looks good" is not an acceptance criterion. "I like it" is not an acceptance criterion.

Creative acceptance criteria must be: - Specific: references exact values, not vague descriptions - Measurable: can be verified with a binary pass/fail outcome - Traceable: links back to a requirement (see creative-traceability.md) - Testable: another agent (or tool) can independently verify the claim

Leah uses acceptance criteria to determine: is this deliverable DONE? Not "good enough" — DONE.

THE PROBLEM WITH SUBJECTIVE ACCEPTANCE¶

HOW CREATIVE WORK GETS STUCK¶

DESIGNER: "Here's the homepage design."
PM: "Hmm, can you make it pop more?"
DESIGNER: [changes something]
PM: "Better, but now it feels too busy."
DESIGNER: [changes something else]
PM: "Almost... maybe try a different blue?"
[repeat 12 times]

This happens because there are no measurable acceptance criteria. Nobody can agree on "done" because "done" was never defined.

HOW MEASURABLE ACCEPTANCE FIXES THIS¶

ACCEPTANCE_CRITERIA:
  AC-001: Primary CTA uses hex #FF6B2B as background color
  AC-002: CTA text meets WCAG AA contrast (4.5:1 minimum against #FF6B2B)
  AC-003: Font family is SF Pro for all body text
  AC-004: Hero image is minimum 1920x1080px, WebP format
  AC-005: Page loads under 3 seconds on 4G connection
  AC-006: Layout is single-column below 768px breakpoint
  AC-007: All interactive elements have hover, active, focus, and disabled states

DESIGNER: "Here's the homepage design."
LEAH: [runs acceptance criteria]
  AC-001: PASS (#FF6B2B verified)
  AC-002: PASS (contrast ratio 4.7:1)
  AC-003: PASS (SF Pro confirmed)
  AC-004: PASS (2560x1440, WebP)
  AC-005: PASS (2.1s on simulated 4G)
  AC-006: PASS (verified at 768px)
  AC-007: FAIL — disabled state missing on secondary CTA
LEAH: "One criterion not met. AC-007: disabled state missing on secondary CTA."

No opinions. No "make it pop." One specific, actionable item to fix.

WRITING CREATIVE ACCEPTANCE CRITERIA¶

THE FORMULA¶

Every acceptance criterion follows this pattern:

[ELEMENT] [VERB] [MEASURABLE_VALUE] [CONTEXT/CONDITION]

GOOD examples: - "Primary heading uses font-size 48px at 1440px viewport width" - "Brand logo has minimum 24px clear space on all sides" - "Color palette contains exactly 5 colors matching hex values in brand-tokens.json" - "Hero section occupies 100vh on initial load at all breakpoints" - "Button border-radius is 8px for all button variants" - "Body text line-height is 1.5 relative to font-size" - "All images include alt text between 5-125 characters"

BAD examples (and why): - "Design looks professional" — not measurable - "Colors feel right" — not measurable - "Layout is clean" — not measurable - "Typography is readable" — measurable only if you specify size, contrast, and line-height - "It should work on mobile" — which breakpoints? what behavior?

ACCEPTANCE CRITERIA BY DELIVERABLE TYPE¶

WEB DESIGN¶

Category	Example Criteria
Layout	"Content area max-width is 1200px, centered horizontally"
Responsive	"Navigation collapses to hamburger menu below 768px"
Typography	"Body text is 16px/1.5 Inter Regular on all viewports"
Color	"All background/text combinations meet WCAG AA (4.5:1)"
Performance	"Largest Contentful Paint under 2.5 seconds"
States	"All form inputs have empty, filled, focused, error, and disabled states"
Animation	"Page transitions use 300ms ease-in-out"

iOS APP DESIGN¶

Category	Example Criteria
Platform	"Navigation uses UINavigationController standard back gesture"
Typography	"All text uses SF Pro; supports Dynamic Type from xSmall to AX5"
Layout	"All layouts use Auto Layout; no fixed dimensions except icons"
Touch	"All tap targets are minimum 44x44pt"
Safe areas	"Content respects safe area insets on all supported devices"
Dark mode	"All screens have dark mode variants using semantic system colors"
Accessibility	"VoiceOver can navigate all screens with logical reading order"

BRAND IDENTITY DELIVERABLE¶

Category	Example Criteria
Logo	"Logo provided in SVG, PNG (1x, 2x, 3x), and EPS formats"
Color	"Palette includes primary, secondary, accent, success, warning, error, neutral scale"
Typography	"Type scale defines 7 levels (display, h1-h4, body, caption) with exact sizes"
Guidelines	"Usage rules cover minimum size, clear space, forbidden modifications, co-branding"
Completeness	"All deliverables listed in scope document are present and match spec"

MARKETING COLLATERAL¶

Category	Example Criteria
Dimensions	"Social media assets provided at 1200x630 (OG), 1080x1080 (IG), 1200x675 (Twitter)"
Bleed	"Print materials include 3mm bleed on all sides"
Resolution	"Print assets are minimum 300 DPI; digital assets are 72 DPI"
Color mode	"Print assets use CMYK color mode; digital assets use sRGB"
File format	"Print: PDF/X-1a; Digital: PNG or WebP"

DEFINITION OF DESIGN DONE¶

Acceptance criteria are per-deliverable. Definition of Done (DoD) applies to ALL creative work.

GE CREATIVE DEFINITION OF DONE¶

A creative deliverable is DONE when ALL of the following are true:

All acceptance criteria pass — zero failures, not "mostly passing"
Brand compliance score >= 95 (see brand-compliance.md scoring model)
Traceability complete — every element traces to a requirement, every requirement traces to an element (see creative-traceability.md)
No contradictions — no conflicts between brief, strategy, spec, and deliverable
Accessibility verified — WCAG AA minimum, AAA target for text
Cross-deliverable consistency verified (if multi-deliverable project)
Assets exported — all formats, sizes, and naming per spec
Handoff documentation complete — design decisions documented with rationale

CREATIVE DOD vs DEVELOPMENT DOD¶

Aspect	Creative DoD	Development DoD
Primary gate	Brand compliance + brief alignment	Tests pass + code review
Verification	Visual comparison + spec measurement	Automated test suite
Subjectivity	Higher — some checks require anchored judgment	Lower — most checks are binary
Iteration	Expected (creative exploration)	Minimized (TDD drives implementation)
Handoff	Design → Development (specs, tokens, assets)	Development → Deployment (PR, CI/CD)
Regression risk	Style drift over iterations	Code regression over changes

KEY_DIFFERENCE: Creative acceptance criteria must account for the inherent subjectivity of visual work by anchoring every subjective assessment to a documented objective.

SELF-EVALUATION FRAMEWORK FOR LEAH¶

Inspired by Anthropic's approach to task evaluation — structured self-assessment to catch bias and blind spots.

PRE-REVIEW CALIBRATION¶

Before reviewing any deliverable, Leah asks herself:

Do I have the current spec? (Not a remembered version — the actual document)
Do I have the brand guidelines? (Current version, not an older iteration)
Do I have the creative brief? (The client's words, not a summary)
Do I have the acceptance criteria? (Written before the work started)
Am I checking against docs or against my expectations? (Docs only)

If ANY answer is "no" — STOP. Get the missing document before proceeding.

DURING-REVIEW BIAS CHECKS¶

Bias	Detection Question	Mitigation
Anchoring	"Am I judging this against the last deliverable I reviewed, or against the spec?"	Re-read spec before each review
Confirmation	"Am I looking for evidence that it passes, or checking if it fails?"	Start with most-likely-to-fail criteria first
Recency	"Am I weighting recent feedback more than the original brief?"	Always start from the brief, not from revision notes
Authority	"Am I accepting this because a senior designer made it?"	Apply same criteria regardless of who made it
Halo effect	"Is the overall impression affecting my assessment of individual criteria?"	Score each criterion independently before overall score
Familiarity	"Am I passing this because it looks like something that worked before?"	Verify against THIS project's spec, not past projects

POST-REVIEW VERIFICATION¶

After completing a review, Leah checks:

Every criterion has a PASS or FAIL — no "maybe" or "mostly"
Every FAIL has a specific reference — guideline section, hex value, dimension
Every FAIL has an actionable fix — not "make it better" but "change X to Y"
No new requirements introduced — Leah validates, she does not scope
Overall score matches individual scores — no gut-feel override of systematic assessment

ACCEPTANCE CRITERIA LIFECYCLE¶

1. ANNA writes spec → includes acceptance criteria (AC-xxx)
   |
2. ANTJE validates criteria are testable → flags vague criteria
   |
3. DESIGNER receives spec with criteria → knows exactly what "done" means
   |
4. DESIGNER delivers → self-checks against criteria before handoff
   |
5. LEAH receives deliverable → runs criteria systematically
   |
   5a. ALL_PASS → mark DONE, generate compliance report
   5b. ANY_FAIL → return with specific failures, reference criteria IDs
   |
6. DESIGNER fixes specific failures → resubmits
   |
7. LEAH re-runs ONLY failed criteria (not full re-review)
   |
8. LOOP until ALL_PASS or ESCALATE (if designer disputes a criterion)

CRITERIA DISPUTES¶

If a designer disagrees with a criterion: - The criterion may be wrong (spec error) — escalate to Anna - The criterion may be infeasible (technical constraint) — escalate to Anna with evidence - The designer may want a waiver (intentional deviation) — escalate to PM for client approval - Leah NEVER waives a criterion herself — she enforces, she does not negotiate

MEASURING QA EFFECTIVENESS¶

Track these metrics to evaluate whether the acceptance criteria system is working:

Metric	Target	What It Reveals
First-pass approval rate	>80%	If low: criteria unclear, or designers not self-checking
Average criteria per deliverable	10-25	If low: criteria too vague or incomplete. If high: over-specified
Disputes per project	<2	If high: criteria quality issue or spec-design misalignment
Time from submission to approval	<4 hours	If high: too many failures, or Leah is a bottleneck
Criteria FAIL rate by category	Even distribution	If one category dominates: training gap or spec gap
Rework cycles per deliverable	<=2	If high: criteria not clear enough, or design fundamentals issue
False positive rate	<5%	Leah flagging things that are actually correct — re-calibrate
False negative rate	0% target	Leah missing actual failures — most dangerous, monitor via client feedback

TEMPLATES¶

ACCEPTANCE CRITERIA TEMPLATE (For Anna to fill per deliverable)¶

DELIVERABLE: [name]
PROJECT: [project name]
CLIENT: [client name]
SPEC_VERSION: [version]
DATE: [date]

ACCEPTANCE_CRITERIA:

  ## Visual
  AC-V001: [criterion]
  AC-V002: [criterion]
  ...

  ## Typography
  AC-T001: [criterion]
  AC-T002: [criterion]
  ...

  ## Layout
  AC-L001: [criterion]
  AC-L002: [criterion]
  ...

  ## Responsive
  AC-R001: [criterion]
  AC-R002: [criterion]
  ...

  ## Accessibility
  AC-A001: [criterion]
  AC-A002: [criterion]
  ...

  ## Content
  AC-C001: [criterion]
  AC-C002: [criterion]
  ...

  ## Assets/Export
  AC-E001: [criterion]
  AC-E002: [criterion]
  ...

QA REPORT TEMPLATE (For Leah to fill per review)¶

QA_REPORT
=========
Deliverable: [name]
Reviewed by: leah
Date: [date]
Spec version: [version]

RESULTS:
  TOTAL_CRITERIA: [N]
  PASSED: [N]
  FAILED: [N]
  BLOCKED: [N] (missing info to evaluate)

  PASS_RATE: [X]%
  BRAND_COMPLIANCE_SCORE: [X]/100

FAILURES:
  [AC-xxx]: FAIL — [what's wrong] — expected: [X], actual: [Y] — ref: [spec section]
  [AC-xxx]: FAIL — [what's wrong] — expected: [X], actual: [Y] — ref: [spec section]

BLOCKED:
  [AC-xxx]: BLOCKED — [what's missing] — need: [info/document]

VERDICT: [APPROVED / RETURN_FOR_FIXES / ESCALATE]

NOTES: [any additional observations, not introducing new requirements]

ANTI-PATTERNS¶

Anti-Pattern	Why It's Wrong	What to Do Instead
"Looks good to me" approval	Not a criterion, not measurable, not traceable	Run every criterion, document results
Accepting "80% done"	Partial completion is not completion	ALL criteria must pass; no partial credit
Criteria written after design is done	Post-hoc criteria are biased toward what was built	Write criteria before design starts, from the spec
Vague criteria ("looks professional")	Cannot be tested, leads to infinite revision loops	Rewrite with measurable values (font, size, color, spacing)
Designer self-certifying "done"	Conflict of interest — maker should not be sole validator	Independent validation by Leah, always
Changing criteria mid-review	Moving goalposts destroy trust and process	Criteria changes require spec amendment through Anna
Reviewing without the spec open	Leah's memory of the spec is not the spec	Always have the actual spec document open during review
Applying personal preference	Leah's taste is irrelevant — the spec is the authority	If it passes all criteria, it passes. Period.
Scoring overall without scoring parts	Halo effect — strong first impression hides specific failures	Score each criterion independently, then aggregate
Skipping the self-evaluation	Bias accumulates silently — calibration prevents drift	Run pre-review calibration and post-review verification every time

REFERENCES¶

Atlassian: What is Acceptance Criteria?
AltexSoft: Acceptance Criteria vs Definition of Done
Nulab: Definition of Done vs Acceptance Criteria
Everyday Design: Writing Acceptance Criteria
LeanWisdom: Definition of Done and Acceptance Criteria Guide
Zenhub: Acceptance Criteria Best Practices
See also: domains/creative/design-qa.md (Leah's QA methodology)
See also: domains/creative/creative-traceability.md (requirements traceability)
See also: domains/creative/brand-compliance.md (brand compliance scoring)
See also: domains/testing/test-reconciliation.md (Jasper's parallel for code QA)