DOMAIN:VISUAL_PRODUCTION:VIDEO_PRODUCTION¶
OWNER: felice
UPDATED: 2026-03-24
SCOPE: Video production pipeline — generative video, avatar video, voiceover, post-processing
AGENTS: felice (primary), alexander (creative direction)
PARENT: Visual Production
VIDEO:PIPELINE_OVERVIEW¶
STAGES¶
BRIEF → STORYBOARD → ASSET_CREATION → ANIMATION → RENDERING → POST_PROCESSING → OPTIMIZATION → DELIVERY
STAGE: BRIEF
- INPUT: client requirements, brand guidelines, target audience, delivery context
- OUTPUT: structured brief (purpose, duration, style, key messages, CTA, platform)
- RULE: brief must specify delivery context — web embed, social, presentation, app
- RULE: brief must specify maximum duration — shorter is always better
STAGE: STORYBOARD
- INPUT: brief
- OUTPUT: frame-by-frame breakdown
- FIELDS per scene: visual description, text overlay, narration line, duration (seconds), transition type
- RULE: each scene 3-8 seconds — longer scenes lose attention
- RULE: total video duration: 30s (social), 60-90s (explainer), 3-5min (tutorial max)
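The duration rules above can be captured in a small validation helper. A sketch (the `Scene` shape and `validateStoryboard` name are hypothetical; the limits come from this section):

```typescript
type Scene = {
  visual: string;
  textOverlay?: string;
  narration?: string;
  duration: number; // seconds
  transition: string;
};

type Format = 'social' | 'explainer' | 'tutorial';

// Maximum total duration per format, from the rules above (seconds).
const MAX_TOTAL: Record<Format, number> = {
  social: 30,
  explainer: 90,
  tutorial: 300,
};

// Returns a list of rule violations; an empty array means the storyboard passes.
function validateStoryboard(scenes: Scene[], format: Format): string[] {
  const problems: string[] = [];
  scenes.forEach((s, i) => {
    if (s.duration < 3 || s.duration > 8) {
      problems.push(`scene ${i + 1}: ${s.duration}s outside 3-8s range`);
    }
  });
  const total = scenes.reduce((sum, s) => sum + s.duration, 0);
  if (total > MAX_TOTAL[format]) {
    problems.push(`total ${total}s exceeds ${MAX_TOTAL[format]}s for ${format}`);
  }
  return problems;
}
```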
STAGE: ASSET_CREATION
- INPUT: storyboard
- OUTPUT: images, icons, logos, background music, narration audio
- TOOLS: DALL-E 3/Midjourney/Flux Pro (images), ElevenLabs (narration), Epidemic Sound (music)
- RULE: all assets at 2x target resolution (render down, never up)
- RULE: images as PNG (lossless source) — convert to delivery format only at final stage
STAGE: ANIMATION
- INPUT: assets + storyboard
- OUTPUT: Remotion project with all scenes composed
- TOOLS: Remotion (React), Lottie (micro-animations)
- SEE: remotion-patterns.md for implementation details
STAGE: RENDERING
- INPUT: Remotion project
- OUTPUT: master video file
- FORMAT: ProRes (archival) or H.264 CRF 15 (high quality master)
- TOOLS: Remotion CLI or Remotion Lambda
STAGE: POST_PROCESSING
- INPUT: master video
- OUTPUT: captioned, color-corrected, audio-balanced video
- TOOLS: FFmpeg
- TASKS: captions, audio normalization, color correction, trimming
STAGE: OPTIMIZATION
- INPUT: post-processed video
- OUTPUT: delivery-optimized variants per platform
- TOOLS: FFmpeg
- SEE: delivery-specs.md for per-platform specs
STAGE: DELIVERY
- INPUT: optimized variants
- OUTPUT: CDN-hosted with adaptive streaming
- TOOLS: BunnyCDN Stream
- SEE: asset-optimization.md for CDN configuration
VIDEO:RUNWAY_GEN_3¶
API_BASICS¶
TOOL: Runway Gen-3 Alpha API
ENDPOINT: POST https://api.runwayml.com/v1/generate
PURPOSE: generative video — text-to-video and image-to-video
MODES:
- text-to-video: describe scene, receive 4-16 second clip
- image-to-video: provide start frame, model animates it
- image-to-video with end frame: start + end image, model interpolates
PARAMS:
- promptText: scene description (max ~300 chars effective)
- duration: 4 | 10 | 16 seconds
- ratio: "16:9" | "9:16" | "1:1"
- seed: for reproducibility
- watermark: false (paid plans only)
- init_image: URL or base64 (for image-to-video)
- end_image: URL or base64 (for interpolation mode)
RESPONSE:
- async: returns task_id, poll for completion
- generation takes 30-120 seconds depending on duration
- output: MP4 URL (download immediately, temporary)
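Because generation is async, the client needs a poll loop. A minimal sketch of the polling pattern (the status function is injected so the loop is independent of the HTTP client; the `/v1/tasks/{id}` URL and response field names in the usage comment are assumptions, verify against the Runway API reference):

```typescript
type TaskStatus = {
  status: 'pending' | 'running' | 'complete' | 'failed';
  outputUrl?: string;
};

// Poll until the task finishes or the timeout elapses.
async function pollTask(
  getStatus: () => Promise<TaskStatus>,
  { intervalMs = 5000, timeoutMs = 180000 } = {},
): Promise<TaskStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const task = await getStatus();
    if (task.status === 'complete' || task.status === 'failed') return task;
    if (Date.now() >= deadline) throw new Error('generation timed out');
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage sketch (download the output immediately; the URL is temporary):
// const task = await pollTask(() =>
//   fetch(`https://api.runwayml.com/v1/tasks/${taskId}`, {
//     headers: { Authorization: `Bearer ${process.env.RUNWAY_API_KEY}` },
//   }).then((r) => r.json()));
```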
CAMERA_MOTION_CONTROLS¶
Describe camera motion explicitly in the prompt — Runway interprets these reliably:
STATIC:
- "static wide shot" — locked camera, no movement
- "static close-up" — locked camera, tight framing
- RULE: use static for talking heads, product shots, UI demos
PAN:
- "slow pan left to right" — horizontal sweep
- "pan right revealing the scene" — discovery motion
- RULE: pans work best at 10-16s duration for smooth motion
DOLLY:
- "slow dolly in" — move camera toward subject
- "dolly out revealing the full scene" — pull back
- "push in on the subject's face" — dramatic focus
TRACKING:
- "tracking shot following the subject" — lateral movement with subject
- "orbit around the subject" — circular motion
CRANE/VERTICAL:
- "crane up revealing the city skyline" — upward vertical
- "low angle looking up" — power shot
- "bird's eye descending" — drone-like descent
ZOOM:
- "slow zoom in" — digital zoom effect
- "zoom out revealing context" — wider view
- NOTE: zoom is digital (crop), not optical — quality degrades
PROMPT_ENGINEERING¶
STRUCTURE: [camera motion], [subject + action], [environment], [lighting/atmosphere], [style]
EXAMPLE (weak):
a woman walking in an office
EXAMPLE (strong):
slow tracking shot, a woman in a navy blazer walks confidently through
a modern glass office lobby, warm afternoon light streaming through
floor-to-ceiling windows, shallow depth of field, cinematic
RULES:
- describe action in present continuous ("is walking", "are falling")
- describe camera motion first — sets the visual framework
- specify lighting and atmosphere explicitly
- keep prompts concise; the ~300-char effective limit above means every word must count
- avoid multiple subjects performing different actions
- one clear action per generation
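The rules above can be folded into a small builder so camera motion always leads. A sketch (the `buildPrompt` helper and its field names are hypothetical; the ordering mirrors the rules in this section):

```typescript
type PromptParts = {
  camera: string;        // e.g. "slow tracking shot" -- camera motion first
  subjectAction: string; // present continuous, one clear action
  environment?: string;
  lighting?: string;     // specify lighting/atmosphere explicitly
  style?: string;        // e.g. "cinematic"
};

// Joins the parts camera-first and flags prompts over a rough word budget.
function buildPrompt(parts: PromptParts): { prompt: string; withinLimit: boolean } {
  const prompt = [parts.camera, parts.subjectAction, parts.environment, parts.lighting, parts.style]
    .filter(Boolean)
    .join(', ');
  const words = prompt.split(/\s+/).length;
  return { prompt, withinLimit: words <= 200 };
}
```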
QUALITY_EVALUATION¶
ALWAYS generate 3+ variants for any generative video. Evaluate each against:
TEMPORAL_CONSISTENCY:
- [ ] objects maintain shape across frames (no morphing)
- [ ] colors stay consistent (no sudden hue shifts)
- [ ] subject identity maintained (face doesn't change)
MOTION_QUALITY:
- [ ] no flickering or strobing
- [ ] camera motion is smooth (no sudden jumps)
- [ ] physics are plausible (gravity, motion blur direction correct)
- [ ] no speed inconsistencies (sudden slow-down or speed-up)
ARTIFACT_CHECK:
- [ ] no melting or warping at edges
- [ ] no ghosting of previous frames
- [ ] no sudden appearance/disappearance of objects
- [ ] hands and faces remain coherent
RULE: if no variant passes all checks, refine the prompt and regenerate
RULE: never deliver generative video without human review
RULE: generative video is best for b-roll and atmospheric shots — not primary content
PRICING¶
~$0.05-0.50 per generation depending on duration and plan.
4-second clips: cheapest, good for loops and short accents.
16-second clips: most expensive, best for establishing shots.
LICENSING: user owns output on paid plans. Commercial use permitted.
VIDEO:AVATAR_VIDEO¶
SYNTHESIA_API¶
TOOL: Synthesia API
ENDPOINT: POST https://api.synthesia.io/v2/videos
PURPOSE: avatar-based explainer videos from text script
REQUEST:
{
"title": "Product Walkthrough",
"input": [{
"scriptText": "Welcome to Growing Europe. Today I will walk you through our platform features.",
"avatar": "anna_costume1_cameraA",
"background": "off_white",
"avatarSettings": {
"horizontalAlign": "left",
"scale": 0.8,
"style": "rectangular"
}
}],
"aspectRatio": "16:9",
"test": false
}
AVATAR_SELECTION:
- browse available avatars via GET /v2/avatars
- filter by: gender, age range, attire (casual/business/custom)
- custom avatars: requires video submission + consent process
- RULE: use consistent avatar across a client's video series
- RULE: select avatar matching target audience expectations
SCRIPT_FORMATTING:
- plain text only — no SSML, no markup, no HTML
- punctuation controls pacing: periods create pauses, commas create brief pauses
- exclamation marks add emphasis
- question marks adjust intonation
- RULE: one paragraph per scene/slide
- RULE: read the script aloud before submitting — if it sounds unnatural, rewrite
- RULE: max ~10 minutes per video
- RULE: keep sentences under 20 words for natural delivery
MULTI_SCENE:
{
"input": [
{
"scriptText": "Welcome to our platform.",
"avatar": "anna_costume1_cameraA",
"background": "off_white"
},
{
"scriptText": "Let me show you the dashboard.",
"avatar": "anna_costume1_cameraA",
"background": "https://example.com/dashboard-screenshot.png"
},
{
"scriptText": "Thank you for watching.",
"avatar": "anna_costume1_cameraA",
"background": "ge_branded_outro"
}
]
}
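Generating the multi-scene payload from a list of script paragraphs keeps the avatar consistent automatically (per the rule above). A sketch (scene field names match the request shape above; the `off_white` default and the `buildScenes` helper are assumptions):

```typescript
type SynthesiaScene = { scriptText: string; avatar: string; background: string };

// One scene per script paragraph, same avatar throughout the series.
function buildScenes(
  paragraphs: string[],
  avatar: string,
  backgrounds: string[] = [],
): SynthesiaScene[] {
  return paragraphs.map((scriptText, i) => ({
    scriptText,
    avatar,
    background: backgrounds[i] ?? 'off_white',
  }));
}

// Usage sketch:
// const body = {
//   title: 'Product Walkthrough',
//   input: buildScenes(paragraphs, 'anna_costume1_cameraA'),
//   aspectRatio: '16:9',
//   test: false,
// };
```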
MULTI_LANGUAGE:
- Synthesia supports 120+ languages
- specify language per scene: "en-US", "nl-NL", "de-DE", etc.
- same avatar can speak different languages
- RULE: verify pronunciation of brand names in each language
- RULE: script length varies by language — German ~30% longer than English
WORKFLOW:
1. submit video via POST (returns id)
2. poll GET /v2/videos/{id} for status
3. status progression: pending → processing → complete
4. rendering takes 5-15 minutes
5. download from download URL in response
PRICING: ~$0.50-2.00 per minute of output (varies by plan).
HEYGEN_API¶
TOOL: HeyGen API
ENDPOINT: POST https://api.heygen.com/v2/video/generate
PURPOSE: avatar videos — similar to Synthesia
COMPARISON:
| Feature | Synthesia | HeyGen |
|---------|-----------|--------|
| Lip sync quality | excellent | very good |
| Corporate feel | stronger | moderate |
| Rendering speed | 5-15 min | 2-10 min |
| Avatar variety | 200+ | 300+ |
| Custom avatar | yes (expensive) | yes (lower cost) |
| API design | REST, polling | REST, webhooks |
| Cost | higher | lower |
WHEN_TO_USE:
- Synthesia: corporate client presentations, formal explainers, enterprise onboarding
- HeyGen: product demos, social media content, rapid iteration, cost-sensitive projects
LICENSING: both — user owns output video. Avatar likeness rights included in subscription.
VIDEO:VOICEOVER¶
ELEVENLABS_API¶
TOOL: ElevenLabs API
ENDPOINT: POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
PURPOSE: high-quality AI voiceover for narration
REQUEST:
{
"text": "Welcome to Growing Europe. We build enterprise-grade software.",
"model_id": "eleven_multilingual_v2",
"voice_settings": {
"stability": 0.5,
"similarity_boost": 0.75,
"style": 0.3,
"use_speaker_boost": true
}
}
VOICE_SETTINGS:
- stability: 0.0-1.0. Higher = more consistent, lower = more expressive. 0.5 for narration.
- similarity_boost: 0.0-1.0. Higher = closer to original voice. 0.75 recommended.
- style: 0.0-1.0. Higher = more expressive style. 0.0-0.3 for professional narration.
- RULE: test voice settings on a sample paragraph before full script
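The settings above can be wrapped in a defaults-plus-clamping helper so out-of-range values never reach the API. A sketch (the `narrationSettings` helper is hypothetical; the defaults are the narration recommendations from this section):

```typescript
type VoiceSettings = {
  stability: number;
  similarity_boost: number;
  style: number;
  use_speaker_boost: boolean;
};

// All numeric settings are documented as 0.0-1.0, so clamp into that range.
const clamp01 = (v: number) => Math.min(1, Math.max(0, v));

function narrationSettings(overrides: Partial<VoiceSettings> = {}): VoiceSettings {
  return {
    stability: clamp01(overrides.stability ?? 0.5),        // narration default
    similarity_boost: clamp01(overrides.similarity_boost ?? 0.75),
    style: clamp01(overrides.style ?? 0.3),                // professional range
    use_speaker_boost: overrides.use_speaker_boost ?? true,
  };
}
```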
VOICE_SELECTION:
- browse voices via GET /v1/voices
- filter by: gender, age, accent, use case
- clone custom voice: requires consent + 3+ minutes of clean audio
- RULE: use consistent voice across a client's content
- RULE: match voice demographics to target audience
MULTI_LANGUAGE:
- eleven_multilingual_v2 supports 29 languages
- same voice can switch languages
- RULE: verify pronunciation of technical terms and brand names
SCRIPT_PREPARATION:
- write for spoken delivery, not reading
- short sentences (under 20 words)
- use hyphens for compound adjectives read as one: "enterprise-grade"
- spell out acronyms that should be read letter by letter: "S-M-E" not "SME"
- add <break time="500ms"/> for intentional pauses (SSML if supported)
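The acronym rule can be applied as a pre-processing step before submitting the script. A sketch (the `spellOutAcronyms` helper is hypothetical; the acronym list is per-project):

```typescript
// Rewrite listed acronyms as hyphenated letters so TTS reads them letter by letter.
function spellOutAcronyms(script: string, acronyms: string[]): string {
  return acronyms.reduce(
    (text, acronym) =>
      text.replace(new RegExp(`\\b${acronym}\\b`, 'g'), acronym.split('').join('-')),
    script,
  );
}
```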
AUDIO_SPECS:
- output format: MP3 (default) or PCM WAV
- sample rate: 44100 Hz
- RULE: request WAV for Remotion integration (better quality, no decode overhead)
- RULE: normalize audio to -16 LUFS for consistent loudness
PRICING: ~$0.15-0.30 per 1000 characters depending on plan.
VIDEO:SCREEN_RECORDING¶
COMPOSITION_PATTERN¶
Screen recordings are often combined with narration and overlays:
RECORDING:
- capture at 2x target resolution (3840x2160 for 1920x1080 delivery)
- 60fps capture, deliver at 30fps (smoother motion, temporal downsampling)
- record system audio separately from microphone
- TOOL: OBS Studio (local), browser-based recorder (remote)
POST_CAPTURE:
1. trim dead time (loading screens, mistakes)
2. speed up repetitive actions (2x-4x)
3. add cursor highlight/zoom for important clicks
4. overlay narration audio
5. add animated callouts (Remotion or Lottie overlays)
REMOTION_INTEGRATION:
import { AbsoluteFill, Video, Sequence, staticFile } from 'remotion';
import { AnimatedCallout } from './AnimatedCallout';
import { CursorHighlight } from './CursorHighlight';
// Overlay callouts on a screen recording; at 30 fps, frame 90 = 3s in.
export const ScreenRecordingOverlay = () => (
  <AbsoluteFill>
    <Video src={staticFile("screen-recording.mp4")} style={{ objectFit: 'contain' }} />
    {/* highlight the cursor for 2 seconds, starting at 3s */}
    <Sequence from={90} durationInFrames={60}>
      <CursorHighlight x={400} y={300} />
    </Sequence>
    {/* callout appears at 4s and stays for 3 seconds */}
    <Sequence from={120} durationInFrames={90}>
      <AnimatedCallout
        text="Click here to open settings"
        position={{ bottom: 60, left: 400 }}
      />
    </Sequence>
  </AbsoluteFill>
);
VIDEO:FFMPEG_POST_PROCESSING¶
COMMON_OPERATIONS¶
TRANSCODE to web-optimized H.264:
ffmpeg -i input.mov -c:v libx264 -preset slow -crf 20 -c:a aac -b:a 128k -movflags +faststart output.mp4
-movflags +faststart moves metadata to front — REQUIRED for web streaming.
RESIZE to 720p:
ffmpeg -i input.mp4 -vf "scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2" output-720p.mp4
EXTRACT keyframes (for thumbnails/poster):
ffmpeg -i input.mp4 -vf "select='eq(pict_type,I)'" -vsync vfr keyframe-%03d.png
ADD SOFT CAPTIONS (user-selectable):
ffmpeg -i input.mp4 -i captions.srt -c copy -c:s mov_text -metadata:s:s:0 language=eng output.mp4
BURN-IN CAPTIONS (hard-coded):
ffmpeg -i input.mp4 -vf "subtitles=captions.srt:force_style='FontSize=24,PrimaryColour=&HFFFFFF&,Outline=2'" output.mp4
CREATE GIF PREVIEW (optimized):
ffmpeg -i input.mp4 -ss 0 -t 3 -vf "fps=10,scale=480:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" preview.gif
CONCATENATE CLIPS:
echo "file 'intro.mp4'" > list.txt
echo "file 'main.mp4'" >> list.txt
echo "file 'outro.mp4'" >> list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
NORMALIZE AUDIO to -16 LUFS:
ffmpeg -i input.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy -c:a aac -b:a 192k output.mp4
EXTRACT AUDIO (no re-encode):
ffmpeg -i input.mp4 -vn -c:a copy audio.m4a
ADD FADE IN/OUT:
ffmpeg -i input.mp4 -vf "fade=t=in:st=0:d=1,fade=t=out:st=27:d=1" -af "afade=t=in:st=0:d=1,afade=t=out:st=27:d=1" output.mp4
ANTI_PATTERNS¶
ANTI_PATTERN: re-encoding video unnecessarily (quality loss each time)
FIX: use -c copy when no pixel transformation needed (concatenating, adding subs, remuxing)
ANTI_PATTERN: missing -movflags +faststart for web video
FIX: always include it — without it, browser must download entire file before playback
ANTI_PATTERN: not normalizing audio across clips
FIX: always run loudnorm filter — inconsistent volume is jarring
ANTI_PATTERN: encoding at CRF 0-10 for web delivery
FIX: CRF 18-23 for web, CRF 15-18 for archival. Lower CRF = exponentially larger files.
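The fixes above can be baked into an argument builder so faststart is never forgotten and CRF stays in the web range. A sketch (the `webEncodeArgs` helper is hypothetical; flags mirror the transcode command in this section):

```typescript
// Build ffmpeg args for web delivery: H.264, CRF clamped to 18-23, faststart always on.
function webEncodeArgs(input: string, output: string, crf = 20): string[] {
  const clamped = Math.min(23, Math.max(18, crf));
  return [
    '-i', input,
    '-c:v', 'libx264', '-preset', 'slow', '-crf', String(clamped),
    '-c:a', 'aac', '-b:a', '128k',
    '-movflags', '+faststart',
    output,
  ];
}

// Usage sketch: spawn('ffmpeg', webEncodeArgs('input.mov', 'output.mp4'))
```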
VIDEO:QUALITY_CONTROL¶
CHECKLIST¶
BEFORE DELIVERY every video must pass:
VISUAL:
- [ ] resolution matches delivery spec
- [ ] no encoding artifacts (blocking, banding, mosquito noise)
- [ ] color correct (no unintended tint or saturation)
- [ ] brand colors accurate (sample check)
- [ ] text readable at target display size
- [ ] smooth motion (no dropped frames)
AUDIO:
- [ ] narration clear and intelligible
- [ ] background music not competing with narration
- [ ] consistent volume throughout (-16 LUFS target)
- [ ] no clipping or distortion
- [ ] no unintended silence gaps
- [ ] audio sync with visual (lip sync, timing)
TECHNICAL:
- [ ] file size within delivery spec
- [ ] format correct (H.264 MP4 for web)
- [ ] faststart flag present (check with ffprobe)
- [ ] correct FPS (30 for web, 24 for cinematic)
- [ ] poster frame set (first frame or custom)
ACCESSIBILITY:
- [ ] captions present and accurate
- [ ] audio descriptions if visual-only information present
- [ ] see accessibility-media.md
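The technical checks can be partly automated from `ffprobe -print_format json -show_format -show_streams` output. A sketch of the pure checking step (the `DeliverySpec` shape and `checkDelivery` helper are hypothetical; run ffprobe separately and pass in its parsed JSON):

```typescript
type ProbeData = {
  format: { format_name: string; size: string };
  streams: Array<{
    codec_type: string;
    codec_name: string;
    width?: number;
    height?: number;
  }>;
};

type DeliverySpec = { codec: string; width: number; height: number; maxBytes: number };

// Compare the probed video stream and container size against the delivery spec.
function checkDelivery(probe: ProbeData, spec: DeliverySpec): string[] {
  const issues: string[] = [];
  const video = probe.streams.find((s) => s.codec_type === 'video');
  if (!video) return ['no video stream'];
  if (video.codec_name !== spec.codec) {
    issues.push(`codec ${video.codec_name}, expected ${spec.codec}`);
  }
  if (video.width !== spec.width || video.height !== spec.height) {
    issues.push('resolution mismatch');
  }
  if (Number(probe.format.size) > spec.maxBytes) {
    issues.push('file size over spec');
  }
  return issues;
}
```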
CROSS_REFERENCES¶
- Remotion implementation: remotion-patterns.md
- Image generation for video assets: image-generation.md
- Delivery specifications per platform: delivery-specs.md
- Asset optimization and CDN: asset-optimization.md
- Media accessibility requirements: accessibility-media.md