DOMAIN:CREATIVE — CREATIVE_ACCEPTANCE_CRITERIA¶
OWNER: leah ALSO_USED_BY: anna (spec authoring), antje (TDD from spec), margot (design production), boris (visual production), jasper (test reconciliation), faye (team alfa PM), sytske (team bravo PM) UPDATED: 2026-03-28 SCOPE: defining, measuring, and verifying acceptance criteria for creative deliverables — the "when is it done?" framework
PURPOSE¶
"Looks good" is not an acceptance criterion. "I like it" is not an acceptance criterion.
Creative acceptance criteria must be: - Specific: references exact values, not vague descriptions - Measurable: can be verified with a binary pass/fail outcome - Traceable: links back to a requirement (see creative-traceability.md) - Testable: another agent (or tool) can independently verify the claim
Leah uses acceptance criteria to determine: is this deliverable DONE? Not "good enough" — DONE.
THE PROBLEM WITH SUBJECTIVE ACCEPTANCE¶
HOW CREATIVE WORK GETS STUCK¶
DESIGNER: "Here's the homepage design."
PM: "Hmm, can you make it pop more?"
DESIGNER: [changes something]
PM: "Better, but now it feels too busy."
DESIGNER: [changes something else]
PM: "Almost... maybe try a different blue?"
[repeat 12 times]
This happens because there are no measurable acceptance criteria. Nobody can agree on "done" because "done" was never defined.
HOW MEASURABLE ACCEPTANCE FIXES THIS¶
ACCEPTANCE_CRITERIA:
AC-001: Primary CTA uses hex #FF6B2B as background color
AC-002: CTA text meets WCAG AA contrast (4.5:1 minimum against #FF6B2B)
AC-003: Font family is SF Pro for all body text
AC-004: Hero image is minimum 1920x1080px, WebP format
AC-005: Page loads under 3 seconds on 4G connection
AC-006: Layout is single-column below 768px breakpoint
AC-007: All interactive elements have hover, active, focus, and disabled states
DESIGNER: "Here's the homepage design."
LEAH: [runs acceptance criteria]
AC-001: PASS (#FF6B2B verified)
AC-002: PASS (contrast ratio 4.7:1)
AC-003: PASS (SF Pro confirmed)
AC-004: PASS (2560x1440, WebP)
AC-005: PASS (2.1s on simulated 4G)
AC-006: PASS (verified at 768px)
AC-007: FAIL — disabled state missing on secondary CTA
LEAH: "One criterion not met. AC-007: disabled state missing on secondary CTA."
No opinions. No "make it pop." One specific, actionable item to fix.
WRITING CREATIVE ACCEPTANCE CRITERIA¶
THE FORMULA¶
Every acceptance criterion follows this pattern:
GOOD examples: - "Primary heading uses font-size 48px at 1440px viewport width" - "Brand logo has minimum 24px clear space on all sides" - "Color palette contains exactly 5 colors matching hex values in brand-tokens.json" - "Hero section occupies 100vh on initial load at all breakpoints" - "Button border-radius is 8px for all button variants" - "Body text line-height is 1.5 relative to font-size" - "All images include alt text between 5-125 characters"
BAD examples (and why): - "Design looks professional" — not measurable - "Colors feel right" — not measurable - "Layout is clean" — not measurable - "Typography is readable" — measurable only if you specify size, contrast, and line-height - "It should work on mobile" — which breakpoints? what behavior?
ACCEPTANCE CRITERIA BY DELIVERABLE TYPE¶
WEB DESIGN¶
| Category | Example Criteria |
|---|---|
| Layout | "Content area max-width is 1200px, centered horizontally" |
| Responsive | "Navigation collapses to hamburger menu below 768px" |
| Typography | "Body text is 16px/1.5 Inter Regular on all viewports" |
| Color | "All background/text combinations meet WCAG AA (4.5:1)" |
| Performance | "Largest Contentful Paint under 2.5 seconds" |
| States | "All form inputs have empty, filled, focused, error, and disabled states" |
| Animation | "Page transitions use 300ms ease-in-out" |
iOS APP DESIGN¶
| Category | Example Criteria |
|---|---|
| Platform | "Navigation uses UINavigationController standard back gesture" |
| Typography | "All text uses SF Pro; supports Dynamic Type from xSmall to AX5" |
| Layout | "All layouts use Auto Layout; no fixed dimensions except icons" |
| Touch | "All tap targets are minimum 44x44pt" |
| Safe areas | "Content respects safe area insets on all supported devices" |
| Dark mode | "All screens have dark mode variants using semantic system colors" |
| Accessibility | "VoiceOver can navigate all screens with logical reading order" |
BRAND IDENTITY DELIVERABLE¶
| Category | Example Criteria |
|---|---|
| Logo | "Logo provided in SVG, PNG (1x, 2x, 3x), and EPS formats" |
| Color | "Palette includes primary, secondary, accent, success, warning, error, neutral scale" |
| Typography | "Type scale defines 7 levels (display, h1-h4, body, caption) with exact sizes" |
| Guidelines | "Usage rules cover minimum size, clear space, forbidden modifications, co-branding" |
| Completeness | "All deliverables listed in scope document are present and match spec" |
MARKETING COLLATERAL¶
| Category | Example Criteria |
|---|---|
| Dimensions | "Social media assets provided at 1200x630 (OG), 1080x1080 (IG), 1200x675 (Twitter)" |
| Bleed | "Print materials include 3mm bleed on all sides" |
| Resolution | "Print assets are minimum 300 DPI; digital assets are 72 DPI" |
| Color mode | "Print assets use CMYK color mode; digital assets use sRGB" |
| File format | "Print: PDF/X-1a; Digital: PNG or WebP" |
DEFINITION OF DESIGN DONE¶
Acceptance criteria are per-deliverable. Definition of Done (DoD) applies to ALL creative work.
GE CREATIVE DEFINITION OF DONE¶
A creative deliverable is DONE when ALL of the following are true:
- All acceptance criteria pass — zero failures, not "mostly passing"
- Brand compliance score >= 95 (see brand-compliance.md scoring model)
- Traceability complete — every element traces to a requirement, every requirement traces to an element (see creative-traceability.md)
- No contradictions — no conflicts between brief, strategy, spec, and deliverable
- Accessibility verified — WCAG AA minimum, AAA target for text
- Cross-deliverable consistency verified (if multi-deliverable project)
- Assets exported — all formats, sizes, and naming per spec
- Handoff documentation complete — design decisions documented with rationale
CREATIVE DOD vs DEVELOPMENT DOD¶
| Aspect | Creative DoD | Development DoD |
|---|---|---|
| Primary gate | Brand compliance + brief alignment | Tests pass + code review |
| Verification | Visual comparison + spec measurement | Automated test suite |
| Subjectivity | Higher — some checks require anchored judgment | Lower — most checks are binary |
| Iteration | Expected (creative exploration) | Minimized (TDD drives implementation) |
| Handoff | Design → Development (specs, tokens, assets) | Development → Deployment (PR, CI/CD) |
| Regression risk | Style drift over iterations | Code regression over changes |
KEY_DIFFERENCE: Creative acceptance criteria must account for the inherent subjectivity of visual work by anchoring every subjective assessment to a documented objective.
SELF-EVALUATION FRAMEWORK FOR LEAH¶
Inspired by Anthropic's approach to task evaluation — structured self-assessment to catch bias and blind spots.
PRE-REVIEW CALIBRATION¶
Before reviewing any deliverable, Leah asks herself:
- Do I have the current spec? (Not a remembered version — the actual document)
- Do I have the brand guidelines? (Current version, not an older iteration)
- Do I have the creative brief? (The client's words, not a summary)
- Do I have the acceptance criteria? (Written before the work started)
- Am I checking against docs or against my expectations? (Docs only)
If ANY answer is "no" — STOP. Get the missing document before proceeding.
DURING-REVIEW BIAS CHECKS¶
| Bias | Detection Question | Mitigation |
|---|---|---|
| Anchoring | "Am I judging this against the last deliverable I reviewed, or against the spec?" | Re-read spec before each review |
| Confirmation | "Am I looking for evidence that it passes, or checking if it fails?" | Start with most-likely-to-fail criteria first |
| Recency | "Am I weighting recent feedback more than the original brief?" | Always start from the brief, not from revision notes |
| Authority | "Am I accepting this because a senior designer made it?" | Apply same criteria regardless of who made it |
| Halo effect | "Is the overall impression affecting my assessment of individual criteria?" | Score each criterion independently before overall score |
| Familiarity | "Am I passing this because it looks like something that worked before?" | Verify against THIS project's spec, not past projects |
POST-REVIEW VERIFICATION¶
After completing a review, Leah checks:
- Every criterion has a PASS or FAIL — no "maybe" or "mostly"
- Every FAIL has a specific reference — guideline section, hex value, dimension
- Every FAIL has an actionable fix — not "make it better" but "change X to Y"
- No new requirements introduced — Leah validates, she does not scope
- Overall score matches individual scores — no gut-feel override of systematic assessment
ACCEPTANCE CRITERIA LIFECYCLE¶
1. ANNA writes spec → includes acceptance criteria (AC-xxx)
|
2. ANTJE validates criteria are testable → flags vague criteria
|
3. DESIGNER receives spec with criteria → knows exactly what "done" means
|
4. DESIGNER delivers → self-checks against criteria before handoff
|
5. LEAH receives deliverable → runs criteria systematically
|
5a. ALL_PASS → mark DONE, generate compliance report
5b. ANY_FAIL → return with specific failures, reference criteria IDs
|
6. DESIGNER fixes specific failures → resubmits
|
7. LEAH re-runs ONLY failed criteria (not full re-review)
|
8. LOOP until ALL_PASS or ESCALATE (if designer disputes a criterion)
CRITERIA DISPUTES¶
If a designer disagrees with a criterion: - The criterion may be wrong (spec error) — escalate to Anna - The criterion may be infeasible (technical constraint) — escalate to Anna with evidence - The designer may want a waiver (intentional deviation) — escalate to PM for client approval - Leah NEVER waives a criterion herself — she enforces, she does not negotiate
MEASURING QA EFFECTIVENESS¶
Track these metrics to evaluate whether the acceptance criteria system is working:
| Metric | Target | What It Reveals |
|---|---|---|
| First-pass approval rate | >80% | If low: criteria unclear, or designers not self-checking |
| Average criteria per deliverable | 10-25 | If low: criteria too vague or incomplete. If high: over-specified |
| Disputes per project | <2 | If high: criteria quality issue or spec-design misalignment |
| Time from submission to approval | <4 hours | If high: too many failures, or Leah is a bottleneck |
| Criteria FAIL rate by category | Even distribution | If one category dominates: training gap or spec gap |
| Rework cycles per deliverable | <=2 | If high: criteria not clear enough, or design fundamentals issue |
| False positive rate | <5% | Leah flagging things that are actually correct — re-calibrate |
| False negative rate | 0% target | Leah missing actual failures — most dangerous, monitor via client feedback |
TEMPLATES¶
ACCEPTANCE CRITERIA TEMPLATE (For Anna to fill per deliverable)¶
DELIVERABLE: [name]
PROJECT: [project name]
CLIENT: [client name]
SPEC_VERSION: [version]
DATE: [date]
ACCEPTANCE_CRITERIA:
## Visual
AC-V001: [criterion]
AC-V002: [criterion]
...
## Typography
AC-T001: [criterion]
AC-T002: [criterion]
...
## Layout
AC-L001: [criterion]
AC-L002: [criterion]
...
## Responsive
AC-R001: [criterion]
AC-R002: [criterion]
...
## Accessibility
AC-A001: [criterion]
AC-A002: [criterion]
...
## Content
AC-C001: [criterion]
AC-C002: [criterion]
...
## Assets/Export
AC-E001: [criterion]
AC-E002: [criterion]
...
QA REPORT TEMPLATE (For Leah to fill per review)¶
QA_REPORT
=========
Deliverable: [name]
Reviewed by: leah
Date: [date]
Spec version: [version]
RESULTS:
TOTAL_CRITERIA: [N]
PASSED: [N]
FAILED: [N]
BLOCKED: [N] (missing info to evaluate)
PASS_RATE: [X]%
BRAND_COMPLIANCE_SCORE: [X]/100
FAILURES:
[AC-xxx]: FAIL — [what's wrong] — expected: [X], actual: [Y] — ref: [spec section]
[AC-xxx]: FAIL — [what's wrong] — expected: [X], actual: [Y] — ref: [spec section]
BLOCKED:
[AC-xxx]: BLOCKED — [what's missing] — need: [info/document]
VERDICT: [APPROVED / RETURN_FOR_FIXES / ESCALATE]
NOTES: [any additional observations, not introducing new requirements]
ANTI-PATTERNS¶
| Anti-Pattern | Why It's Wrong | What to Do Instead |
|---|---|---|
| "Looks good to me" approval | Not a criterion, not measurable, not traceable | Run every criterion, document results |
| Accepting "80% done" | Partial completion is not completion | ALL criteria must pass; no partial credit |
| Criteria written after design is done | Post-hoc criteria are biased toward what was built | Write criteria before design starts, from the spec |
| Vague criteria ("looks professional") | Cannot be tested, leads to infinite revision loops | Rewrite with measurable values (font, size, color, spacing) |
| Designer self-certifying "done" | Conflict of interest — maker should not be sole validator | Independent validation by Leah, always |
| Changing criteria mid-review | Moving goalposts destroy trust and process | Criteria changes require spec amendment through Anna |
| Reviewing without the spec open | Leah's memory of the spec is not the spec | Always have the actual spec document open during review |
| Applying personal preference | Leah's taste is irrelevant — the spec is the authority | If it passes all criteria, it passes. Period. |
| Scoring overall without scoring parts | Halo effect — strong first impression hides specific failures | Score each criterion independently, then aggregate |
| Skipping the self-evaluation | Bias accumulates silently — calibration prevents drift | Run pre-review calibration and post-review verification every time |
REFERENCES¶
- Atlassian: What is Acceptance Criteria?
- AltexSoft: Acceptance Criteria vs Definition of Done
- Nulab: Definition of Done vs Acceptance Criteria
- Everyday Design: Writing Acceptance Criteria
- LeanWisdom: Definition of Done and Acceptance Criteria Guide
- Zenhub: Acceptance Criteria Best Practices
- See also:
domains/creative/design-qa.md(Leah's QA methodology) - See also:
domains/creative/creative-traceability.md(requirements traceability) - See also:
domains/creative/brand-compliance.md(brand compliance scoring) - See also:
domains/testing/test-reconciliation.md(Jasper's parallel for code QA)