DOMAIN:VISUAL_PRODUCTION:IMAGE_GENERATION¶

OWNER: felice
UPDATED: 2026-03-24
SCOPE: AI image generation — DALL-E 3, Midjourney, Flux Pro, quality evaluation, batch patterns
AGENTS: felice (primary), alexander (design direction)
PARENT: Visual Production

IMAGE_GEN:DALL_E_3¶

API_BASICS¶

TOOL: OpenAI DALL-E 3 API
ENDPOINT: POST https://api.openai.com/v1/images/generations

PARAMS:
- model: "dall-e-3"
- prompt: max 4000 chars
- size: "1024x1024" | "1024x1792" (portrait) | "1792x1024" (landscape)
- quality: "standard" (faster, cheaper) | "hd" (more detail; 2x cost at 1024x1024, 1.5x at the larger sizes)
- style: "vivid" (hyper-real, dramatic) | "natural" (less exaggerated)
- n: always 1 (DALL-E 3 generates 1 image per request)
- response_format: "url" (default, expires 1hr) | "b64_json" (base64-encoded)

RESPONSE:
- data[0].url — temporary URL (valid ~60 minutes)
- data[0].revised_prompt — the prompt DALL-E 3 actually used (it rewrites your prompt internally)
- RULE: always save revised_prompt — useful for understanding how the model interpreted your intent
- RULE: always download the image immediately — URL expires within 1 hour
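
SKETCH: one way to assemble and validate the request body from the PARAMS above, and to pull the two fields named under RESPONSE out of a decoded JSON reply. The helper names `build_dalle3_payload` and `parse_generation_response` are illustrative assumptions, not part of the OpenAI SDK; no network call is made here.

```python
ALLOWED_SIZES = {"1024x1024", "1024x1792", "1792x1024"}
ALLOWED_QUALITY = {"standard", "hd"}
ALLOWED_STYLE = {"vivid", "natural"}
MAX_PROMPT_CHARS = 4000


def build_dalle3_payload(prompt, size="1024x1024", quality="standard",
                         style="vivid", response_format="url"):
    """Hypothetical helper: return a JSON-serializable request body.

    Validates against the values listed in PARAMS and raises ValueError
    on anything else.
    """
    if not prompt or len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1-4000 characters")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in ALLOWED_QUALITY:
        raise ValueError(f"unsupported quality: {quality}")
    if style not in ALLOWED_STYLE:
        raise ValueError(f"unsupported style: {style}")
    if response_format not in {"url", "b64_json"}:
        raise ValueError(f"unsupported response_format: {response_format}")
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "size": size,
        "quality": quality,
        "style": style,
        "n": 1,  # DALL-E 3 generates one image per request
        "response_format": response_format,
    }


def parse_generation_response(body):
    """Hypothetical helper: extract url and revised_prompt from a decoded reply."""
    item = body["data"][0]
    return {
        "url": item.get("url"),  # temporary; download right away
        "revised_prompt": item.get("revised_prompt"),  # save alongside the image
    }
```

Validating locally keeps malformed requests from costing a round trip; the payload can be passed to whichever HTTP client the project already uses.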

PROMPT_ENGINEERING¶

DALL-E 3 rewrites prompts internally before generating. This means:
- vague prompts get "enhanced" unpredictably
- specific prompts stay closer to intent
- you cannot fully control the final prompt, but specificity minimizes drift

STRUCTURE for reliable prompts:

[SUBJECT] — what is in the image
[COMPOSITION] — how it is arranged
[STYLE] — visual treatment
[LIGHTING] — illumination
[MOOD] — emotional tone
[CONSTRAINTS] — what to exclude

EXAMPLE (weak):

a website design

EXAMPLE (strong):

A clean SaaS dashboard with a left sidebar navigation showing 6 menu items with icons,
white background, blue (#2563EB) accent color, a data table with 5 rows in the main
content area, a line chart in the top right showing upward trend over 12 months,
flat design style, no text labels, soft drop shadows, modern UI design,
studio lighting, professional product screenshot aesthetic
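
SKETCH: the six-slot STRUCTURE can be kept as data and rendered to a prompt string, which makes it harder to forget a slot. `PromptSpec` is a hypothetical name; the default "no text" constraint follows the rule under NEGATIVE_CONSTRAINTS.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the SUBJECT..CONSTRAINTS slots."""
    subject: str
    composition: str = ""
    style: str = ""
    lighting: str = ""
    mood: str = ""
    constraints: list[str] = field(default_factory=lambda: ["no text"])

    def render(self) -> str:
        # Slots are emitted in the documented order; empty slots are skipped.
        parts = [self.subject, self.composition, self.style,
                 self.lighting, self.mood]
        parts.extend(self.constraints)
        return ", ".join(p.strip() for p in parts if p and p.strip())


spec = PromptSpec(
    subject="a clean SaaS dashboard with a left sidebar navigation",
    composition="data table in the main content area",
    style="flat design style",
    lighting="studio lighting",
    mood="professional product screenshot aesthetic",
)
prompt = spec.render()
```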

STYLE_REFERENCES:
- "flat vector illustration" — clean, modern, minimal shading
- "photorealistic photography" — indistinguishable from a real photo
- "watercolor painting" — soft edges, visible brushstrokes
- "isometric 3D illustration" — geometric, consistent perspective
- "editorial photography" — magazine-quality, composed
- "minimalist line drawing" — simple, elegant, outline only
- "collage style" — layered cutouts, mixed media feel

COMPOSITION_DIRECTIVES:
- "centered subject with negative space" — clean, focused
- "rule of thirds placement" — natural, professional
- "bird's eye view" — top-down, spatial
- "close-up detail shot" — texture, intimacy
- "wide establishing shot" — context, scale
- "symmetrical composition" — formal, balanced

NEGATIVE_CONSTRAINTS:
- DALL-E 3 handles negative instructions better than DALL-E 2
- "no text" — prevents random text artifacts
- "no watermark" — prevents watermark-like artifacts
- "no people" — removes human figures
- "no borders or frames" — clean edge to edge
- RULE: always include "no text" unless you specifically want text in the image

TEXT_IN_IMAGES:
- DALL-E 3 can render short text (3-4 words max reliably)
- spell the text exactly as you want it and quote it in the prompt, e.g. with text reading "HELLO"
- longer text will have misspellings or distortion
- RULE: for text-heavy images, use Flux Pro instead — superior text rendering
- RULE: prefer post-processing text overlay for reliable typography

ASPECT_RATIO_SELECTION:
- 1024x1024 — social media posts, profile images, thumbnails
- 1792x1024 — hero banners, presentation slides, landscape scenes
- 1024x1792 — mobile screens, stories, portrait content, Pinterest
- RULE: choose aspect ratio based on delivery context, not subject matter
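
SKETCH: the mapping above expressed as a lookup keyed by delivery context, so the size is never chosen from the subject matter. The context keys and the function name `size_for_context` are assumptions made for illustration.

```python
# Delivery context -> DALL-E 3 size, taken from ASPECT_RATIO_SELECTION.
SIZE_BY_CONTEXT = {
    "social_post": "1024x1024",
    "profile_image": "1024x1024",
    "thumbnail": "1024x1024",
    "hero_banner": "1792x1024",
    "presentation_slide": "1792x1024",
    "landscape_scene": "1792x1024",
    "mobile_screen": "1024x1792",
    "story": "1024x1792",
    "pinterest": "1024x1792",
}


def size_for_context(context: str) -> str:
    """Return the size string for a delivery context, or raise ValueError."""
    try:
        return SIZE_BY_CONTEXT[context]
    except KeyError:
        known = ", ".join(sorted(SIZE_BY_CONTEXT))
        raise ValueError(f"unknown context {context!r}; known: {known}") from None
```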

PRICING¶

PRICING (as of 2025):
- standard 1024x1024: $0.040/image
- standard 1792x1024 or 1024x1792: $0.080/image
- hd 1024x1024: $0.080/image
- hd 1792x1024 or 1024x1792: $0.120/image

COST_OPTIMIZATION:
- use "standard" quality for concepts, mood boards, initial iterations
- use "hd" only for final deliverables
- batch exploration at 1024x1024 standard ($0.04) before committing to large hd ($0.12)
- example: 3 standard 1024x1024 iterations (3 x $0.04) + 1 large hd final ($0.12) = $0.24, vs 4 large hd = $0.48
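
SKETCH: the comparison above as arithmetic, using the PRICING figures (as of 2025). `batch_cost` is a hypothetical helper; `Decimal` is used so the totals are exact.

```python
from decimal import Decimal

# (quality, size) -> USD per image, copied from PRICING above.
PRICE_USD = {
    ("standard", "1024x1024"): Decimal("0.040"),
    ("standard", "1792x1024"): Decimal("0.080"),
    ("standard", "1024x1792"): Decimal("0.080"),
    ("hd", "1024x1024"): Decimal("0.080"),
    ("hd", "1792x1024"): Decimal("0.120"),
    ("hd", "1024x1792"): Decimal("0.120"),
}


def batch_cost(jobs):
    """Sum the cost of (quality, size, count) tuples."""
    return sum((PRICE_USD[(q, s)] * n for q, s, n in jobs), Decimal("0"))


# Explore cheaply, then render one final image in large hd.
iterate_then_final = batch_cost([
    ("standard", "1024x1024", 3),
    ("hd", "1792x1024", 1),
])
# Render every attempt in large hd.
all_hd = batch_cost([("hd", "1792x1024", 4)])
```

Under these prices the first plan comes to $0.24 and the second to $0.48.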

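LICENSING: outputs owned by the requesting user per OpenAI terms. Commercial use allowed.
No attribution required. No exclusivity: other users may receive similar outputs. API data is not used for training by default under OpenAI's API data policy; verify against current terms.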

IMAGE_GEN:MIDJOURNEY¶

API_BASICS¶

TOOL: Midjourney API (official API launched 2025)
ENDPOINT: check current docs at docs.midjourney.com
NOTE: also accessible via Discord bot for interactive exploration

STRENGTHS:
- best-in-class aesthetic quality and artistic style
- excellent at photorealistic and editorial content
- strong community prompt library for reference
- character and style reference for brand consistency

WEAKNESSES:
- less precise prompt adherence than Flux Pro
- text rendering unreliable
- 4-image grid output (upscale needed for single image)
- slower than DALL-E 3 or Flux Pro

PARAM_SYNTAX¶

All parameters are appended after the descriptive prompt text, each introduced by --:

ASPECT_RATIO:
- --ar 16:9 — landscape video frame
- --ar 9:16 — vertical mobile/story
- --ar 1:1 — square
- --ar 4:5 — Instagram portrait
- --ar 3:2 — classic photography
- --ar 21:9 — ultrawide cinematic
- NOTE: any ratio accepted — not limited to presets

MODEL_VERSION:
- --v 6.1 — latest model (best quality, recommended)
- --v 6 — previous stable
- --niji 6 — anime/illustration specialized model

STYLE_CONTROLS:
- --style raw — less Midjourney beautification, more literal interpretation
- --stylize 50 — 0-1000. Lower = literal, higher = artistic. Default 100.
  - 0-50: very literal, close to prompt
  - 50-200: balanced (recommended range)
  - 200-1000: increasingly artistic, prompt becomes suggestion
- --chaos 20 — 0-100. Variation between the 4 generated images. Higher = more diverse.
  - 0-20: similar results (good for refinement)
  - 20-50: moderate variation (good for exploration)
  - 50-100: wildly different (good for brainstorming)

QUALITY:
- --quality 1 — full quality (default)
- --quality 0.5 — half quality, faster, cheaper (good for iteration)
- --quality 0.25 — quarter quality (quick concepts only)

NEGATIVE_PROMPT:
- --no text, watermark, borders — comma-separated exclusions
- RULE: always include --no text unless text is specifically desired

REFERENCES:
- --sref <url> — style reference image (match the visual style)
- --cref <url> — character reference (maintain character appearance)
- --sref <url> --sw 50 — style weight 0-1000 (how strongly to match style)
- --cref <url> --cw 50 — character weight 0-100 (0 = face only, 100 = full appearance)
- RULE: use --sref for brand consistency across a campaign
- RULE: use --cref for recurring characters/mascots

SPECIAL:
- --tile — seamless tiling pattern
- --seed <number> — reproducibility (same seed + same prompt = similar result)
- --repeat 4 — run the same prompt 4 times (paid plans)

PROMPT_ENGINEERING¶

STRUCTURE:
- Midjourney responds best to comma-separated descriptors, NOT full sentences
- most important subject first
- style/medium keywords at the end
- parameters after all descriptive text

EXAMPLE (weak):

I would like a picture of a modern office that has big windows and natural light

EXAMPLE (strong):

modern open-plan office interior, floor-to-ceiling windows, minimal Scandinavian furniture,
warm wood tones, green plants, natural daylight streaming in, architectural photography,
shot on Hasselblad --ar 16:9 --style raw --stylize 30 --no text, people
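
SKETCH: assembling a prompt in the order given under STRUCTURE (descriptors first, parameters last). `build_midjourney_prompt` is a hypothetical helper that only builds the string; the range checks use the limits listed in PARAM_SYNTAX.

```python
def build_midjourney_prompt(descriptors, ar=None, style=None,
                            stylize=None, chaos=None, exclude=None):
    """Join comma-separated descriptors, then append -- parameters."""
    text = ", ".join(d.strip() for d in descriptors if d.strip())
    if not text:
        raise ValueError("at least one descriptor is required")
    params = []
    if ar:
        params.append(f"--ar {ar}")
    if style:
        params.append(f"--style {style}")
    if stylize is not None:
        if not 0 <= stylize <= 1000:
            raise ValueError("stylize must be 0-1000")
        params.append(f"--stylize {stylize}")
    if chaos is not None:
        if not 0 <= chaos <= 100:
            raise ValueError("chaos must be 0-100")
        params.append(f"--chaos {chaos}")
    if exclude:
        # --no takes a comma-separated list of exclusions.
        params.append("--no " + ", ".join(exclude))
    return " ".join([text] + params)


office = build_midjourney_prompt(
    ["modern open-plan office interior", "floor-to-ceiling windows",
     "architectural photography", "shot on Hasselblad"],
    ar="16:9", style="raw", stylize=30, exclude=["text", "people"],
)
```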

CAMERA/LENS_KEYWORDS that improve photorealistic output:
- "shot on Hasselblad" — medium format, high detail
- "85mm lens" — portrait, shallow depth of field
- "35mm street photography" — documentary feel
- "macro lens" — extreme close-up detail
- "drone aerial photography" — overhead perspective
- "tilt-shift miniature" — selective focus, toy-like

LIGHTING_KEYWORDS:
- "golden hour" — warm, directional, long shadows
- "blue hour" — cool, twilight
- "studio lighting" — controlled, professional
- "Rembrandt lighting" — dramatic portrait
- "backlit silhouette" — high contrast
- "overcast diffused" — soft, even, no harsh shadows

LICENSING: users on paid plans own commercial rights to outputs.
Free tier users: non-commercial only. Attribution not required on paid plans.

IMAGE_GEN:FLUX_PRO¶

API_BASICS¶

TOOL: Flux Pro API
PROVIDERS: BFL direct (api.bfl.ml/v1/flux-pro), Replicate, fal.ai, Together AI

STRENGTHS vs competitors:
- BEST text rendering in images (most reliable as of 2025)
- excellent prompt adherence — minimal "creative reinterpretation"
- good at technical diagrams, UI mockups, infographics
- fast generation (2-5 seconds via optimized providers)
- flexible dimensions (any multiple of 8)

WEAKNESSES:
- less artistic/aesthetic than Midjourney
- smaller community and fewer prompt references
- no built-in style/character reference system

PARAMS (names and ranges vary by provider; confirm against the provider's schema):
- prompt: detailed description
- width, height: flexible (multiples of 8, recommended: 512-2048)
- num_inference_steps: 20-50 (higher = better quality, slower). 28 is good default.
- guidance_scale: 2.0-10.0 (higher = stricter prompt adherence). 7.5 is good default.
- seed: for reproducibility
- safety_tolerance: 0-6 (0 = strictest, 6 = most permissive)
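
SKETCH: snapping requested dimensions to a valid multiple and validating the ranges listed above before a request is sent. The function names, the default `safety_tolerance` of 2, and the use of these exact parameter names are assumptions; individual providers may name or bound them differently.

```python
def snap_dimension(value, multiple=8, lo=512, hi=2048):
    """Clamp to [lo, hi], then round to the nearest multiple (half rounds up).

    Assumes lo and hi are themselves multiples of `multiple`.
    """
    clamped = max(lo, min(hi, value))
    return (clamped + multiple // 2) // multiple * multiple


def build_flux_request(prompt, width, height, num_inference_steps=28,
                       guidance_scale=7.5, seed=None, safety_tolerance=2):
    """Hypothetical helper: build a request dict using the PARAMS names above."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 20 <= num_inference_steps <= 50:
        raise ValueError("num_inference_steps must be 20-50")
    if not 2.0 <= guidance_scale <= 10.0:
        raise ValueError("guidance_scale must be 2.0-10.0")
    if not 0 <= safety_tolerance <= 6:
        raise ValueError("safety_tolerance must be 0-6")
    request = {
        "prompt": prompt,
        "width": snap_dimension(width),
        "height": snap_dimension(height),
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "safety_tolerance": safety_tolerance,
    }
    if seed is not None:
        request["seed"] = seed  # fixed seed for reproducibility
    return request
```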

PROMPT_ENGINEERING¶

Flux Pro takes literal prompts well — say exactly what you want.

TEXT_RENDERING:
- with text reading "Growing Europe" in white sans-serif font centered at the top
- a sign that says "OPEN" in red block letters
- a laptop screen showing the text "Welcome Back" in a blue header bar
- RULE: Flux Pro is the go-to tool when text in image is required
- RULE: still verify text output — not 100% reliable for long strings

TECHNICAL_CONTENT:
- UI mockups: describe layout precisely (sidebar width, button positions, color values)
- Diagrams: specify flow direction, box count, connection style
- Infographics: state data points, chart type, label positions

EXAMPLE:

A modern web application login page with a centered white card on a gradient
blue-to-purple background, the card contains: a "Growing Europe" logo at top
in dark blue, an email input field, a password input field, a large blue
"Sign In" button, and small gray "Forgot password?" text below, clean UI
design, subtle shadow on the card, 1200x800 pixels

LICENSING: outputs owned by user. Check BFL terms for specific commercial use cases.
Provider-specific terms may apply (Replicate, fal.ai each have their own ToS).

IMAGE_GEN:TOOL_DECISION_TREE¶

WHEN_TO_USE_WHICH¶

START: What kind of image do you need?
│
├─ Text in the image is critical?
│  ├─ YES → Flux Pro
│  └─ NO → continue
│
├─ Photorealistic or high-aesthetic quality?
│  ├─ YES → Midjourney (--style raw for brand accuracy)
│  └─ NO → continue
│
├─ Technical diagram, UI mockup, or infographic?
│  ├─ YES → Flux Pro
│  └─ NO → continue
│
├─ Need to match existing brand style precisely?
│  ├─ YES → Midjourney with --sref (style reference)
│  └─ NO → continue
│
├─ Need recurring character consistency?
│  ├─ YES → Midjourney with --cref (character reference)
│  └─ NO → continue
│
├─ Fast iteration on concepts (many variants)?
│  ├─ YES → DALL-E 3 ($0.04 per standard 1024x1024 image, simple API)
│  └─ NO → continue
│
├─ Seamless tiling pattern?
│  ├─ YES → Midjourney with --tile
│  └─ NO → continue
│
└─ General purpose / no special requirement
   └─ DALL-E 3 (simple, predictable API at low cost)
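
SKETCH: the tree above as a function, checking the questions in the same top-to-bottom order so that earlier branches win. `choose_tool` and its keyword names are hypothetical.

```python
def choose_tool(*, text_critical=False, photoreal_or_aesthetic=False,
                technical=False, brand_style_match=False,
                recurring_character=False, fast_iteration=False,
                tiling=False):
    """Return the recommended tool; the first matching question decides."""
    if text_critical:
        return "Flux Pro"
    if photoreal_or_aesthetic:
        return "Midjourney (--style raw)"
    if technical:
        return "Flux Pro"
    if brand_style_match:
        return "Midjourney (--sref)"
    if recurring_character:
        return "Midjourney (--cref)"
    if fast_iteration:
        return "DALL-E 3"
    if tiling:
        return "Midjourney (--tile)"
    return "DALL-E 3"  # general purpose fallback
```

Because order matters, a photorealistic image that must also carry legible text resolves to Flux Pro, matching the first branch of the tree.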

COST_COMPARISON (single image, standard quality):
| Tool | Cost | Speed | Text Quality | Aesthetic Quality | Prompt Adherence |
|------|------|-------|--------------|-------------------|------------------|
| DALL-E 3 | $0.04 | 5-10s | poor | good | good |
| Midjourney | ~$0.02-0.10 | 30-60s | poor | excellent | moderate |
| Flux Pro | $0.02-0.05 | 2-5s | excellent | good | excellent |

MULTI_TOOL_WORKFLOW¶

For client deliverables that need both quality and text:
1. generate hero visual in Midjourney (best aesthetics)
2. generate text-bearing elements in Flux Pro
3. composite in post-processing (Sharp/Pillow overlay)
4. final format optimization (WebP + AVIF)

For brand asset packages:
1. establish style with Midjourney --sref image
2. generate all variations with same --sref
3. text overlays via Flux Pro or post-processing
4. export at required dimensions per delivery-specs.md

IMAGE_GEN:QUALITY_EVALUATION¶

AUTOMATED_SCORING¶

Every generated image MUST pass automated checks before human review:

RESOLUTION_CHECK:
- verify output dimensions match requested dimensions
- flag if dimensions differ by more than 2%
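
SKETCH: the 2% rule as a pure function over (width, height) pairs. `dimensions_ok` is a hypothetical name; reading the actual dimensions from the file is left to whatever image library the pipeline already uses.

```python
def dimensions_ok(requested, actual, tolerance=0.02):
    """True when width and height are each within `tolerance` of the request."""
    req_w, req_h = requested
    act_w, act_h = actual
    width_off = abs(act_w - req_w) / req_w
    height_off = abs(act_h - req_h) / req_h
    return width_off <= tolerance and height_off <= tolerance
```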

ARTIFACT_DETECTION:
- JPEG quality assessment: reject if estimated quality < 60
- compression artifact detection at block boundaries
- banding detection in gradients

AI_ARTIFACT_CHECKLIST:
- [ ] text distortion (misspelled or warped text)
- [ ] finger/hand deformities (extra digits, fused fingers)
- [ ] asymmetric faces (eyes misaligned, uneven features)
- [ ] background object blending (objects merging into each other)
- [ ] inconsistent shadows (light direction changes within image)
- [ ] seamless tiling breaks (if --tile was used)
- [ ] impossible geometry (Escher-like spatial errors)
- [ ] texture repetition (same pattern copy-pasted)

BRAND_COMPLIANCE:
- sample pixels at key positions to verify brand colors
- check dominant color matches specification
- verify logo placement if specified
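
SKETCH: comparing a sampled pixel against a brand color given as hex. Euclidean distance in RGB is used here as a rough proxy, not a perceptual color difference, and the threshold of 30 is an assumed starting point to tune per brand. Both function names are hypothetical.

```python
import math


def hex_to_rgb(value):
    """Convert '#RRGGBB' (or 'RRGGBB') to an (r, g, b) tuple of ints."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_close(sample_rgb, target_hex, max_distance=30.0):
    """True when the sampled pixel is within max_distance of the target."""
    return math.dist(sample_rgb, hex_to_rgb(target_hex)) <= max_distance
```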

FILE_SIZE_VALIDATION:
| Context | Max Size |
|---------|----------|
| web hero | 5 MB |
| web thumbnail | 500 KB |
| social media | 8 MB |
| email | 200 KB |
| mobile asset | 2 MB |
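
SKETCH: the table above as a lookup. This assumes decimal units (1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes); switch the constants if the delivery spec means binary units. `size_ok` and `file_size_ok` are hypothetical names.

```python
import os

KB = 1_000
MB = 1_000_000

# Context -> maximum size in bytes, from FILE_SIZE_VALIDATION.
MAX_BYTES = {
    "web hero": 5 * MB,
    "web thumbnail": 500 * KB,
    "social media": 8 * MB,
    "email": 200 * KB,
    "mobile asset": 2 * MB,
}


def size_ok(context, size_bytes):
    """True when size_bytes does not exceed the limit for the context."""
    if context not in MAX_BYTES:
        raise ValueError(f"unknown context: {context!r}")
    return size_bytes <= MAX_BYTES[context]


def file_size_ok(path, context):
    """Same check, reading the size of a file on disk."""
    return size_ok(context, os.path.getsize(path))
```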

HUMAN_REVIEW_PROTOCOL¶

RULE: always generate 3-4 variations and select best — AI generation is stochastic
RULE: human review required before client delivery — no auto-publish of AI images
RULE: rejected images must be regenerated with refined prompt, not post-processed to fix

REVIEW_CRITERIA:
1. does the image match the brief?
2. is it free of AI artifacts?
3. does it match brand guidelines?
4. is the composition appropriate for the delivery context?
5. would this pass for professional work to an uninformed viewer?

IMAGE_GEN:BATCH_GENERATION¶

PATTERNS¶

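SEQUENTIAL_BATCH (same style, different subjects; requests run concurrently, capped at 5 in flight):

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

MAX_CONCURRENT = 5  # cap on in-flight requests; tune to the account's rate limits


async def generate_batch(prompts: list[str], size="1024x1024", quality="standard"):
    """Generate one image per prompt, at most MAX_CONCURRENT requests at a time.

    Results come back in the same order as prompts. A failed request yields
    a dict with an "error" key instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def generate_one(prompt: str):
        async with semaphore:
            try:
                response = await client.images.generate(
                    model="dall-e-3",
                    prompt=prompt,
                    size=size,
                    quality=quality,
                    n=1,  # DALL-E 3 returns one image per request
                )
            except Exception as exc:  # keep the batch going on per-prompt failures
                return {"prompt": prompt, "error": str(exc)}
            image = response.data[0]
            return {
                "prompt": prompt,
                "url": image.url,  # expires in about an hour; download right away
                "revised_prompt": image.revised_prompt,
            }

    return await asyncio.gather(*(generate_one(p) for p in prompts))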

VARIATION_BATCH (same subject, different styles):
- generate the same subject with different style keywords
- present all variations to reviewer for selection
- RULE: always generate at least 3 variations for any client-facing asset

CAMPAIGN_BATCH (consistent style, different content):
- establish style reference first (Midjourney --sref or DALL-E 3 style prompt prefix)
- reuse the same style prefix/reference across all generations
- verify consistency: sample images from batch and compare side-by-side

COST_TRACKING:
- log every generation: timestamp, tool, prompt, cost, accepted/rejected
- set per-project budget limits
- alert at 80% of budget consumed
- RULE: never exceed project budget without explicit approval
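
SKETCH: a small in-memory tracker covering the four points above (log entry per generation, budget limit, 80% alert, hard stop). `BudgetTracker` is a hypothetical class; a real pipeline would persist the log rather than keep it in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


@dataclass
class BudgetTracker:
    budget_usd: Decimal
    alert_ratio: Decimal = Decimal("0.8")  # alert at 80% consumed
    log: list = field(default_factory=list)

    @property
    def spent(self) -> Decimal:
        return sum((entry["cost"] for entry in self.log), Decimal("0"))

    def record(self, tool: str, prompt: str, cost: Decimal,
               accepted: Optional[bool] = None) -> bool:
        """Log one generation; return True once the alert threshold is reached.

        Raises RuntimeError instead of logging if the cost would exceed
        the budget, so overspend needs an explicit decision by the caller.
        """
        if self.spent + cost > self.budget_usd:
            raise RuntimeError("project budget exceeded; explicit approval required")
        self.log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "prompt": prompt,
            "cost": cost,
            "accepted": accepted,  # None until human review decides
        })
        return self.spent >= self.budget_usd * self.alert_ratio
```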

CROSS_REFERENCES¶

Delivery specifications: delivery-specs.md
Format optimization: asset-optimization.md
Accessibility: accessibility-media.md
Video from images (Runway image-to-video): video-production.md