DOMAIN:DATABASE — MIGRATIONS

OWNER: boris, yoanna
UPDATED: 2026-03-24
SCOPE: all database migrations for client projects and GE internal


MIGRATIONS:PRINCIPLES

RULE: every schema change is a migration — no manual DDL in production, ever
RULE: migrations are forward-only in production — rollback is a NEW forward migration
RULE: every migration runs on staging before production — no exceptions
RULE: migrations must be idempotent where possible — safe to run twice
RULE: Boris or Yoanna reviews every migration before production execution
RULE: zero downtime is the default expectation — application serves traffic during migration


MIGRATIONS:DRIZZLE_WORKFLOW

DEVELOPMENT CYCLE

STEP_1: modify schema files in src/db/schema/
STEP_2: generate migration: npx drizzle-kit generate
STEP_3: review generated SQL in drizzle/migrations/ — Drizzle is a generator, not an authority
STEP_4: test migration on local database: npx drizzle-kit migrate
STEP_5: test migration on staging
STEP_6: commit migration file — one migration per logical change
STEP_7: DBA review (Boris/Yoanna)
STEP_8: production deployment via CI/CD pipeline

DRIZZLE CONFIG

// drizzle.config.ts
import { defineConfig } from 'drizzle-kit';

export default defineConfig({
  schema: './src/db/schema/index.ts',
  out: './drizzle/migrations',
  dialect: 'postgresql',
  dbCredentials: {
    url: process.env.DATABASE_URL!,
  },
});

MIGRATION FILE NAMING

FORMAT: NNNN_{description}.sql — sequential numbering, snake_case description
EXAMPLE: 0001_initial_schema.sql, 0002_add_user_roles.sql, 0003_orders_add_status_index.sql
NOTE: Drizzle generates timestamp-based names — rename to sequential for clarity
NOTE: never modify a committed migration file — create a new migration instead
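The naming rule above can be enforced mechanically in CI. A minimal sketch (hypothetical helper, not part of the Drizzle toolchain) that validates filenames in drizzle/migrations/:

```typescript
// Hypothetical CI helper — validates the NNNN_{description}.sql rule:
// exactly four digits, underscore, snake_case description, .sql suffix.
const MIGRATION_NAME = /^\d{4}_[a-z0-9]+(?:_[a-z0-9]+)*\.sql$/;

export function isValidMigrationName(filename: string): boolean {
  return MIGRATION_NAME.test(filename);
}
```

Timestamp-based names that Drizzle generates (e.g. a 14-digit prefix) fail this check, which is the point: the rename-to-sequential step cannot be forgotten.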


MIGRATIONS:ZERO_DOWNTIME_PATTERN

EXPAND-CONTRACT PATTERN

The core strategy for zero-downtime schema changes. Three phases:

PHASE_1_EXPAND: add new structures alongside old ones — both work simultaneously
PHASE_2_MIGRATE: backfill data, switch application to use new structures
PHASE_3_CONTRACT: remove old structures after verification period

EXAMPLE: RENAME A COLUMN

WRONG — causes immediate downtime:

-- DO NOT DO THIS — application queries break instantly
ALTER TABLE users RENAME COLUMN name TO display_name;

RIGHT — expand-contract:

-- Migration 1: EXPAND — add new column
ALTER TABLE users ADD COLUMN display_name varchar(100);

-- Migration 2: BACKFILL — copy data (run in batches for large tables)
UPDATE users SET display_name = name WHERE display_name IS NULL;

-- Application deploy: code reads from display_name, writes to BOTH name AND display_name

-- Migration 3: CONTRACT — after all code uses display_name only
ALTER TABLE users DROP COLUMN name;
-- SET NOT NULL scans the table under ACCESS EXCLUSIVE — for large tables,
-- use the NOT VALID + VALIDATE pattern shown later in this section
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;

TIMELINE: expand and application deploy can happen same day. Contract waits minimum 48 hours.
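The dual-write window in the application deploy step can be sketched in code. Column names follow the rename example above; the helpers are illustrative, not an actual module:

```typescript
// Sketch of the dual-write window for the name → display_name rename.
// The row shape is assumed for illustration.
interface UserWrite { name?: string; display_name?: string }

// During the migration window, every write is mirrored to BOTH columns,
// so the backfill never races with fresh application writes.
export function dualWrite(displayName: string): UserWrite {
  return { name: displayName, display_name: displayName };
}

// Reads prefer the new column and fall back to the old one until the
// backfill has covered every row.
export function readDisplayName(
  row: { name: string | null; display_name: string | null },
): string | null {
  return row.display_name ?? row.name;
}
```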

EXAMPLE: CHANGE A COLUMN TYPE

WRONG — locks table, blocks all queries:

-- DO NOT DO THIS on large tables — full table rewrite under ACCESS EXCLUSIVE lock
ALTER TABLE orders ALTER COLUMN amount TYPE numeric(14,2);

RIGHT — expand-contract:

-- Migration 1: add new column with desired type
ALTER TABLE orders ADD COLUMN amount_new numeric(14,2);

-- Migration 2: backfill (batched, see below)
-- Application: dual-write to both columns

-- Migration 3: swap
ALTER TABLE orders DROP COLUMN amount;
ALTER TABLE orders RENAME COLUMN amount_new TO amount;

EXAMPLE: ADD NOT NULL TO EXISTING COLUMN

WRONG — fails if any NULL values exist, locks table for constraint check:

ALTER TABLE users ALTER COLUMN bio SET NOT NULL;

RIGHT — three-step:

-- Migration 1: set default for new rows
ALTER TABLE users ALTER COLUMN bio SET DEFAULT '';

-- Migration 2: backfill existing NULLs (batched)
UPDATE users SET bio = '' WHERE bio IS NULL;

-- Migration 3: add NOT NULL constraint as NOT VALID first (instant, no lock)
ALTER TABLE users ADD CONSTRAINT chk_users_bio_not_null CHECK (bio IS NOT NULL) NOT VALID;

-- Migration 4: validate the constraint (takes ShareUpdateExclusive lock, not exclusive)
ALTER TABLE users VALIDATE CONSTRAINT chk_users_bio_not_null;

-- Migration 5 (optional): convert to real NOT NULL after validation
ALTER TABLE users ALTER COLUMN bio SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT chk_users_bio_not_null;

NOTE: the NOT VALID + VALIDATE two-step avoids ACCESS EXCLUSIVE lock during check
NOTE: VALIDATE takes ShareUpdateExclusive lock — allows reads AND writes during validation


MIGRATIONS:DANGEROUS_OPERATIONS

LOCK_LEVEL_REFERENCE

ACCESS_EXCLUSIVE (blocks everything): ALTER TABLE ... ADD COLUMN with DEFAULT (PG <11), DROP COLUMN, ALTER TYPE, RENAME, ADD CONSTRAINT (without NOT VALID)
SHARE_UPDATE_EXCLUSIVE (blocks other DDL but allows DML): VALIDATE CONSTRAINT, CREATE INDEX CONCURRENTLY
SHARE_ROW_EXCLUSIVE: CREATE TRIGGER
ACCESS_SHARE (minimal): SELECT

ADDING A COLUMN

SAFE_PG11_PLUS: ALTER TABLE ADD COLUMN ... DEFAULT ... — no table rewrite for constant defaults since PG 11
SAFE: ALTER TABLE ADD COLUMN nullable_column type — instant, no rewrite
DANGEROUS: adding a column with a volatile default (function call) — requires table rewrite
NOTE: even though ADD COLUMN is instant in PG 11+, it still takes ACCESS EXCLUSIVE lock briefly

DROPPING A COLUMN

RISK: ACCESS EXCLUSIVE lock — brief but blocks all queries during acquisition
MITIGATION: drop during low-traffic period, ensure lock acquisition timeout is set

SET lock_timeout = '5s';
ALTER TABLE users DROP COLUMN IF EXISTS old_column;
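The lock_timeout guidance above can be wrapped in a small retry helper. A sketch assuming a generic `query(sql)` driver function (hypothetical abstraction over e.g. pg's client.query) and Postgres error code 55P03 (lock_not_available):

```typescript
// Sketch: run a DDL statement with lock_timeout set, retrying when lock
// acquisition times out (SQLSTATE 55P03). `query` is an assumed driver
// abstraction, not a specific library API.
type Query = (sql: string) => Promise<void>;

export async function runDdlWithLockTimeout(
  query: Query,
  ddl: string,
  { timeout = '5s', retries = 3 }: { timeout?: string; retries?: number } = {},
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await query(`SET lock_timeout = '${timeout}'`);
      await query(ddl);
      return;
    } catch (err: any) {
      // 55P03 = lock_not_available; anything else is a real failure.
      if (err?.code !== '55P03' || attempt >= retries) throw err;
    }
  }
}
```

The effect: the DDL gives up after 5 seconds of waiting instead of queueing behind long transactions and blocking all traffic behind it, then retries a few times.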

CREATING AN INDEX

WRONG — locks table for writes during entire index build:

CREATE INDEX idx_orders_user ON orders (user_id);

RIGHT — concurrent index build, no write lock:

CREATE INDEX CONCURRENTLY idx_orders_user ON orders (user_id);

NOTE: CONCURRENTLY cannot run inside a transaction — Drizzle migrations run in transactions by default
SOLUTION: use a separate migration file with transaction disabled
NOTE: if concurrent index creation fails, it leaves an INVALID index — must DROP and retry
CHECK: after CONCURRENTLY, verify index is valid:

SELECT indexrelid::regclass, indisvalid FROM pg_index WHERE indexrelid = 'idx_orders_user'::regclass;
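The same validity check can run from application code after a CONCURRENTLY build. A sketch assuming a hypothetical `queryOne` helper that returns the first result row:

```typescript
// Sketch: verify an index built with CONCURRENTLY is valid. `queryOne`
// is an assumed driver helper (returns the first row, or undefined).
type QueryOne = (sql: string) => Promise<Record<string, unknown> | undefined>;

export async function indexIsValid(
  queryOne: QueryOne,
  index: string,
): Promise<boolean> {
  const row = await queryOne(
    `SELECT indisvalid FROM pg_index WHERE indexrelid = '${index}'::regclass`,
  );
  return row?.indisvalid === true;
}
```

If this returns false, the failed index must be dropped and the CONCURRENTLY build retried, per the note above.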

DROPPING AN INDEX

SAFE: DROP INDEX CONCURRENTLY idx_name — does not block queries
NOTE: also cannot run inside a transaction

ADDING A FOREIGN KEY

DANGEROUS: full table scan to validate — long lock on large tables
SAFE_PATTERN:

-- Add FK as NOT VALID — instant, no scan
ALTER TABLE orders ADD CONSTRAINT fk_orders_user
  FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Validate separately — ShareUpdateExclusive lock, allows reads+writes
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_user;

ALTERING ENUMS

SAFE: ALTER TYPE status ADD VALUE 'new_value' — cannot run inside a transaction in PG <12; in PG 12+ it can, but the new value is not usable until the transaction commits
DANGEROUS: removing or renaming enum values — requires creating new type and migrating
PATTERN_FOR_REMOVING_ENUM_VALUE:

-- 1. Create new enum type without the removed value
CREATE TYPE status_new AS ENUM ('active', 'archived');
-- 2. Alter column to use text temporarily
ALTER TABLE orders ALTER COLUMN status TYPE text;
-- 3. Update any rows with the removed value
UPDATE orders SET status = 'archived' WHERE status = 'deleted';
-- 4. Alter column to new enum
ALTER TABLE orders ALTER COLUMN status TYPE status_new USING status::status_new;
-- 5. Drop old enum
DROP TYPE status;
-- 6. Rename new enum
ALTER TYPE status_new RENAME TO status;


MIGRATIONS:BATCHED_BACKFILL

For tables with >100k rows, never UPDATE all rows in one transaction.

BATCH UPDATE PATTERN

-- Update in batches of 10,000
DO $$
DECLARE
  batch_size INT := 10000;
  affected INT;
BEGIN
  LOOP
    UPDATE users SET display_name = name
    WHERE id IN (
      SELECT id FROM users
      WHERE display_name IS NULL
      LIMIT batch_size
      FOR UPDATE SKIP LOCKED
    );
    GET DIAGNOSTICS affected = ROW_COUNT;
    EXIT WHEN affected = 0;
    RAISE NOTICE 'Updated % rows', affected;
    PERFORM pg_sleep(0.1); -- brief pause to reduce lock contention
    COMMIT;
  END LOOP;
END $$;

WHY: avoids long-running transaction, reduces lock duration, allows autovacuum to work
NOTE: FOR UPDATE SKIP LOCKED prevents conflicts with concurrent operations — but it can end the loop early if every remaining row is locked; re-run the block and verify zero remaining NULLs
NOTE: COMMIT inside a DO block requires PG 11+ and the block must run outside an explicit transaction — disable the migration runner's transaction wrapper for this file
NOTE: adjust batch_size based on table width and server capacity — monitor during execution
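When the backfill is driven from application code rather than a DO block, the same loop looks like this. `runBatch` is an assumed callback (hypothetical) that executes one batched UPDATE ... LIMIT and returns the affected row count:

```typescript
// Sketch of the batched-backfill loop driven from application code.
// Each runBatch call is expected to be its own short transaction.
type RunBatch = (batchSize: number) => Promise<number>;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

export async function backfillInBatches(
  runBatch: RunBatch,
  batchSize = 10_000,
  pauseMs = 100,
): Promise<number> {
  let total = 0;
  for (;;) {
    const affected = await runBatch(batchSize);
    if (affected === 0) return total; // nothing left to backfill
    total += affected;
    await sleep(pauseMs); // brief pause to reduce lock contention
  }
}
```

Driving the loop from the application also makes progress easy to log and the pause easy to tune live.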


MIGRATIONS:ROLLBACK_STRATEGY

PHILOSOPHY

RULE: there is no rollback in production — there is only a new forward migration
WHY: rollback scripts are rarely tested, often incomplete, can cause data loss
STRATEGY: every migration must be reversible by writing a new forward migration

PRE-MIGRATION CHECKLIST

  • [ ] Migration tested on staging with production-like data volume
  • [ ] Rollback migration written and tested (a new forward migration that undoes the change)
  • [ ] Lock timeouts configured: SET lock_timeout = '5s'
  • [ ] Monitoring dashboard open during execution
  • [ ] Communication to team: migration window, expected duration, rollback plan
  • [ ] Backup verified: point-in-time recovery available

DISASTER RECOVERY

SCENARIO: migration corrupted data
ACTION: stop application traffic, assess damage, restore from backup to point-in-time before migration
TOOL: UpCloud point-in-time recovery (production), pg_dump snapshots (staging)
NOTE: UpCloud managed DB supports PITR — coordinate with Arjan for restore procedure


MIGRATIONS:IDEMPOTENT_PATTERNS

SAFE PATTERNS

-- Create table if not exists
CREATE TABLE IF NOT EXISTS new_table (...);

-- Add column if not exists (PG 9.6+)
ALTER TABLE users ADD COLUMN IF NOT EXISTS bio text;

-- Drop column if exists
ALTER TABLE users DROP COLUMN IF EXISTS old_column;

-- Create index if not exists
CREATE INDEX IF NOT EXISTS idx_name ON table (column);

-- Create extension if not exists
CREATE EXTENSION IF NOT EXISTS pgcrypto;

ENUM IDEMPOTENCY

-- Add enum value only if it doesn't exist (PG 12+)
ALTER TYPE status ADD VALUE IF NOT EXISTS 'new_value';

FUNCTION IDEMPOTENCY

CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;


MIGRATIONS:TESTING

LOCAL TESTING

STEP_1: start from clean database: dropdb test && createdb test
STEP_2: run ALL migrations from scratch: npx drizzle-kit migrate
STEP_3: verify schema matches expectations: compare pg_dump output
STEP_4: run application test suite against migrated database

STAGING TESTING

STEP_1: take staging backup before migration
STEP_2: run migration on staging
STEP_3: verify migration completed without errors
STEP_4: run application smoke tests against staging
STEP_5: verify data integrity — row counts, NULL checks, constraint satisfaction
STEP_6: measure migration duration — estimate production at 2-5x the staging time

LOAD TESTING MIGRATIONS

FOR_LARGE_TABLES: time the migration against staging with representative data volume
ALERT: if migration takes >30 seconds on staging, it needs redesign
MEASURE: lock wait times, active query impact, replication lag
TOOL: pg_stat_activity during migration to monitor blocking


MIGRATIONS:CI_CD_INTEGRATION

PIPELINE STEPS

# In CI/CD pipeline
migrate:
  staging:
    - run: npx drizzle-kit migrate --config drizzle.staging.config.ts
    - verify: pg_isready and smoke tests
    - approve: manual gate (Boris/Yoanna)
  production:
    - pre: verify staging migration succeeded
    - run: npx drizzle-kit migrate --config drizzle.production.config.ts
    - post: smoke tests + monitoring check
    - rollback: manual procedure documented in runbook

MIGRATION LOCK

NOTE: Drizzle Kit uses a __drizzle_migrations table to track applied migrations
NOTE: only one migration process should run at a time — use advisory locks if needed:

SELECT pg_try_advisory_lock(12345); -- acquire lock — returns false if already held; proceed only on true
-- run migrations
SELECT pg_advisory_unlock(12345); -- release lock
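The advisory-lock sequence above, with the boolean result actually checked, can be wrapped like this. `queryOne` is an assumed driver helper (hypothetical) returning the first result row; the key 12345 matches the SQL example:

```typescript
// Sketch: serialize migration runs behind a Postgres advisory lock.
// Returns false (without running) when another process holds the lock.
type QueryOne = (sql: string) => Promise<Record<string, unknown>>;

export async function withMigrationLock(
  queryOne: QueryOne,
  run: () => Promise<void>,
  key = 12345,
): Promise<boolean> {
  const row = await queryOne(`SELECT pg_try_advisory_lock(${key}) AS locked`);
  if (row.locked !== true) return false; // another migration is running
  try {
    await run();
    return true;
  } finally {
    await queryOne(`SELECT pg_advisory_unlock(${key})`);
  }
}
```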


MIGRATIONS:CHECKLIST_FOR_DBA_REVIEW

Boris/Yoanna — check these before approving any migration:

  • [ ] Does this migration acquire ACCESS EXCLUSIVE lock on a large table?
  • [ ] Are indexes created CONCURRENTLY?
  • [ ] Are foreign keys added with NOT VALID + separate VALIDATE?
  • [ ] Is the migration idempotent (IF NOT EXISTS / IF EXISTS)?
  • [ ] For column type changes: is expand-contract used?
  • [ ] For NOT NULL additions: is the NOT VALID pattern used?
  • [ ] Is lock_timeout set for any DDL?
  • [ ] Has this been tested on staging with representative data volume?
  • [ ] Is the estimated execution time acceptable for production?
  • [ ] Is a rollback migration prepared?
  • [ ] Are backfills batched for tables >100k rows?
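Several of the checklist items above can be pre-screened mechanically before human review. A heuristic sketch (hypothetical, regex-based — an aid for the reviewer, not a substitute for DBA review):

```typescript
// Hypothetical pre-review lint: flags common dangerous patterns from
// the checklist in a raw migration SQL string. Regex heuristics only —
// false positives/negatives are possible.
export function lintMigration(sql: string): string[] {
  const upper = sql.toUpperCase();
  const warnings: string[] = [];
  if (/CREATE\s+(UNIQUE\s+)?INDEX(?!\s+CONCURRENTLY)/.test(upper))
    warnings.push("CREATE INDEX without CONCURRENTLY");
  if (/ADD\s+CONSTRAINT[\s\S]*FOREIGN\s+KEY/.test(upper) && !upper.includes("NOT VALID"))
    warnings.push("FOREIGN KEY added without NOT VALID");
  if (/ALTER\s+TABLE/.test(upper) && !/SET\s+LOCK_TIMEOUT/.test(upper))
    warnings.push("ALTER TABLE without a lock_timeout");
  return warnings;
}
```

Running this in CI surfaces the obvious misses early; the human checklist still decides approval.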