DOMAIN:DATABASE — PERFORMANCE¶
OWNER: boris, yoanna
UPDATED: 2026-03-24
SCOPE: PostgreSQL performance tuning for all GE projects
PERFORMANCE:EXPLAIN_ANALYZE¶
HOW TO USE¶
ALWAYS: use ANALYZE (actually runs the query) — plain EXPLAIN only estimates
ALWAYS: include BUFFERS — shows I/O activity
FORMAT: TEXT for human reading, JSON for programmatic analysis
CAUTION: ANALYZE executes the query — for destructive queries (UPDATE, DELETE), wrap in a transaction and ROLLBACK:
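A minimal sketch of the pattern (table name and predicate are illustrative):

```sql
-- EXPLAIN ANALYZE executes the UPDATE, but the transaction is rolled back,
-- so no rows are actually changed
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE orders SET status = 'archived'
WHERE created_at < now() - interval '1 year';
ROLLBACK;
```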
READING THE OUTPUT¶
Seq Scan on orders (cost=0.00..1234.00 rows=50000 width=100) (actual time=0.01..45.23 rows=50000 loops=1)
Filter: (status = 'pending')
Rows Removed by Filter: 450000
Buffers: shared hit=8234 read=1234
KEY_FIELDS:
- cost=startup..total — planner estimate, not actual time. First number is startup cost, second is total.
- rows=N — estimated row count (planner) vs actual rows=N — real row count
- actual time=start..end — real execution time in milliseconds
- loops=N — how many times this node executed (important for nested loops)
- Rows Removed by Filter — rows examined but discarded. HIGH NUMBER = missing index
- Buffers: shared hit=N — pages found in cache. read=N — pages read from disk. High read = cold cache or table too big for memory.
NODE TYPES TO KNOW¶
SEQ_SCAN: full table scan. OK for small tables (<10k rows). BAD for large tables — add an index.
INDEX_SCAN: uses index to find rows, then fetches from table. GOOD for selective queries.
INDEX_ONLY_SCAN: all data from index, no table fetch. BEST — means index covers the query.
BITMAP_INDEX_SCAN + BITMAP_HEAP_SCAN: index scan for moderate selectivity. PostgreSQL combines multiple indexes. OK.
NESTED_LOOP: for each row in outer, scan inner. GOOD for small outer + indexed inner. BAD for large both.
HASH_JOIN: builds hash table from smaller set, probes with larger. GOOD for equi-joins on larger sets.
MERGE_JOIN: both sides sorted, merge. GOOD when both are already sorted.
SORT: explicit sort. Check if an index can eliminate it. Sort Method: external merge = ran out of work_mem.
HASH_AGGREGATE: grouping via hash table. Watch for Batches: N > 1 = exceeded work_mem.
RED FLAGS IN EXPLAIN¶
FLAG: Seq Scan on table with >10k rows and selective WHERE clause → missing index
FLAG: Rows Removed by Filter >> actual rows → index not selective enough or missing
FLAG: Nested Loop with large outer set → consider hash join (check join conditions, indexes)
FLAG: Sort Method: external merge Disk → increase work_mem or add index to avoid sort
FLAG: Buffers read >> shared hit → table/index not cached, may need more shared_buffers
FLAG: actual rows >> estimated rows → stale statistics, run ANALYZE on the table
FLAG: actual rows << estimated rows → stale statistics or correlation issue
FLAG: loops > 1 with high time per loop → repeated expensive operation
PERFORMANCE:INDEX_STRATEGY¶
WHEN TO ADD AN INDEX¶
ALWAYS: foreign key columns (PostgreSQL does NOT auto-create these)
ALWAYS: columns used in WHERE clauses of frequent queries
ALWAYS: columns used in JOIN conditions
ALWAYS: columns used in ORDER BY for queries needing sorted results
CONSIDER: columns used in GROUP BY
NEVER: columns on very small tables (<1000 rows) — seq scan is faster
NEVER: columns with very low cardinality on large tables (boolean, status with 3 values) — use partial index instead
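A sketch of the partial-index alternative for a low-cardinality column (table and column names are illustrative):

```sql
-- Index only the slice of the table that hot queries actually touch;
-- the planner uses it when the WHERE clause implies status = 'pending'
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
-- Served by the index above:
-- SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at LIMIT 20;
```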
INDEX SELECTION GUIDE¶
B-TREE → equality, range, ORDER BY, LIKE 'prefix%' (prefix LIKE needs the C collation or a text_pattern_ops index)
GIN → JSONB operators, array containment, full-text search, trigram similarity
GIST → range types, geometric, spatial (PostGIS), exclusion constraints
BRIN → very large tables with natural ordering (time-series, append-only logs)
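One illustrative DDL statement per index type (table and column names are assumptions):

```sql
CREATE INDEX idx_orders_created ON orders (created_at);            -- B-tree: range filters, ORDER BY
CREATE INDEX idx_events_payload ON events USING gin (payload);     -- GIN: JSONB containment (@>)
CREATE INDEX idx_bookings_period ON bookings USING gist (period);  -- GiST: range overlap, exclusion
CREATE INDEX idx_logs_created ON logs USING brin (created_at);     -- BRIN: huge append-only table
```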
COVERING INDEX (INDEX-ONLY SCAN)¶
-- Query: SELECT email, display_name FROM users WHERE is_active = true ORDER BY email
-- Covering index:
CREATE INDEX idx_users_active_email ON users (email) INCLUDE (display_name) WHERE is_active = true;
RESULT: Index Only Scan — zero table heap access (provided VACUUM keeps the visibility map current), maximum performance
NOTE: INCLUDE columns are stored in the index but not used for search — only for retrieval
NOTE: wider indexes = more storage, slower writes — only cover frequently executed queries
MONITORING INDEX USAGE¶
-- Indexes that are never used (candidates for dropping)
SELECT s.schemaname, s.relname AS table, s.indexrelname AS index,
s.idx_scan AS scans, pg_size_pretty(pg_relation_size(s.indexrelid)) AS size
FROM pg_stat_user_indexes s
JOIN pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0
AND NOT i.indisunique -- keep unique constraint indexes
AND s.schemaname = 'public'
ORDER BY pg_relation_size(s.indexrelid) DESC;
-- Most-scanned indexes (confirm they are working)
SELECT relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC LIMIT 20;
INDEX BLOAT¶
CAUSE: frequent updates/deletes leave dead index entries until REINDEX
DETECT:
-- Rough heuristic: index size relative to table size (use pgstattuple for true bloat numbers)
SELECT nspname, relname AS index_name,
round(100.0 * pg_relation_size(indexrelid) / pg_relation_size(indrelid), 1) AS index_to_table_pct
FROM pg_index
JOIN pg_class ON pg_class.oid = indexrelid
JOIN pg_namespace ON pg_namespace.oid = relnamespace
WHERE pg_relation_size(indrelid) > 0
ORDER BY pg_relation_size(indexrelid) DESC LIMIT 20;
REINDEX INDEX CONCURRENTLY idx_name; — does not block reads/writes (PG 12+)
PERFORMANCE:QUERY_OPTIMIZATION¶
PATTERN: N+1 QUERY¶
SYMPTOM: 1 query to get list + N queries to get related data (one per row)
CAUSE: ORM lazy loading or manual loop queries
FIX: use JOIN or Drizzle's with (relational queries)
// BAD: N+1
const users = await db.select().from(usersTable);
for (const user of users) {
const orders = await db.select().from(ordersTable).where(eq(ordersTable.userId, user.id));
}
// GOOD: single query with join
const usersWithOrders = await db
.select()
.from(usersTable)
.leftJoin(ordersTable, eq(usersTable.id, ordersTable.userId));
PATTERN: MISSING LIMIT¶
SYMPTOM: query returns thousands of rows when UI shows 20
FIX: always LIMIT + OFFSET or cursor-based pagination
PREFER: cursor-based pagination for large datasets (OFFSET becomes slow at high values)
// Cursor-based pagination
const nextPage = await db
.select()
.from(ordersTable)
.where(gt(ordersTable.createdAt, lastSeenCreatedAt))
.orderBy(ordersTable.createdAt)
.limit(20);
PATTERN: SELECT *¶
SYMPTOM: fetching all columns when only 2 are needed
FIX: select specific columns — reduces I/O, enables index-only scans
// BAD
const results = await db.select().from(usersTable);
// GOOD
const results = await db.select({ id: usersTable.id, email: usersTable.email }).from(usersTable);
PATTERN: FUNCTION IN WHERE CLAUSE¶
SYMPTOM: WHERE lower(email) = 'foo@bar.com' cannot use index on email
FIX: functional index: CREATE INDEX idx_users_email_lower ON users (lower(email));
ALTERNATIVE: use citext type for case-insensitive columns
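The planner only uses a functional index when the query repeats the indexed expression exactly; a small sketch (names illustrative):

```sql
CREATE INDEX idx_users_email_lower ON users (lower(email));
-- Uses the index: the expression matches the index definition
SELECT id FROM users WHERE lower(email) = lower('Foo@Bar.com');
-- Does NOT use it: the bare column does not match lower(email)
-- SELECT id FROM users WHERE email = 'foo@bar.com';
```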
PATTERN: OR IN WHERE CLAUSE¶
SYMPTOM: WHERE status = 'a' OR status = 'b' may not use index efficiently
FIX: WHERE status IN ('a', 'b') — optimizer handles this better
FIX_FOR_DIFFERENT_COLUMNS: WHERE a = 1 OR b = 2 → rewrite as a UNION of two indexed queries (UNION rather than UNION ALL, so rows matching both predicates are not duplicated)
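A sketch of the rewrite for the different-columns case (column names are illustrative):

```sql
-- Each branch can use its own index (one on customer_id, one on assignee_id);
-- UNION deduplicates rows that happen to match both predicates
SELECT * FROM orders WHERE customer_id = 42
UNION
SELECT * FROM orders WHERE assignee_id = 42;
```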
PATTERN: CORRELATED SUBQUERY¶
SYMPTOM: subquery re-executed for every row of outer query
FIX: rewrite as JOIN or CTE
-- BAD: correlated subquery
SELECT * FROM orders o WHERE amount > (SELECT AVG(amount) FROM orders WHERE user_id = o.user_id);
-- GOOD: JOIN with aggregate
SELECT o.* FROM orders o
JOIN (SELECT user_id, AVG(amount) as avg_amount FROM orders GROUP BY user_id) a
ON o.user_id = a.user_id AND o.amount > a.avg_amount;
PATTERN: IMPLICIT CASTING¶
SYMPTOM: WHERE uuid_column = '...' — string vs UUID comparison may prevent index use
FIX: ensure parameter types match column types — Drizzle handles this, raw SQL may not
PERFORMANCE:CONNECTION_POOLING¶
FORMULA¶
pool_size = (core_count * 2) + effective_spindle_count
FOR_SSD: effective_spindle_count = 1
EXAMPLE: 4 cores, SSD → pool_size = (4 * 2) + 1 = 9
NOTE: this is for the DATABASE connection limit, not the application pool
APPLICATION_POOL: should be 2x-4x the database pool (PgBouncer multiplexes)
UPCLOUD_MANAGED: max_connections is fixed by plan tier — check UpCloud dashboard
PGBOUNCER SIZING¶
[pgbouncer]
pool_mode = transaction
max_client_conn = 200 # Application connections to PgBouncer
default_pool_size = 20 # PgBouncer connections to PostgreSQL
reserve_pool_size = 5 # Emergency extra connections
reserve_pool_timeout = 3 # Seconds before using reserve pool
server_idle_timeout = 600 # Close idle backend connections after 10min
MONITORING CONNECTIONS¶
-- Current connection count vs maximum
SELECT count(*) AS active, setting AS max
FROM pg_stat_activity, pg_settings
WHERE pg_settings.name = 'max_connections'
GROUP BY setting;
-- Connections by state
SELECT state, count(*) FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;
ALERT: if active connections > 80% of max_connections, investigate immediately
ALERT: idle in transaction connections are holding locks — find and fix the application code
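To track the idle-in-transaction sessions down, something like:

```sql
-- Longest-idle offenders first; 'query' shows the last statement the session ran
SELECT pid, usename, now() - state_change AS idle_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change;
```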
PERFORMANCE:VACUUM¶
WHY VACUUM MATTERS¶
PostgreSQL uses MVCC — UPDATE creates a new row version, old version remains until VACUUM.
Without VACUUM: table bloats, indexes bloat, queries slow down, transaction ID wraparound risk.
AUTOVACUUM CONFIGURATION¶
DEFAULT autovacuum works for most tables. Tune only when needed.
-- Check autovacuum settings per table
SELECT relname, reloptions FROM pg_class WHERE relname = 'your_table';
-- Per-table autovacuum tuning for high-write tables
ALTER TABLE orders SET (
autovacuum_vacuum_threshold = 1000, -- minimum dead tuples before vacuum (default 50)
autovacuum_vacuum_scale_factor = 0.05, -- fraction of table size (default 0.2)
autovacuum_analyze_threshold = 500, -- minimum changed rows before analyze (default 50)
autovacuum_analyze_scale_factor = 0.025 -- fraction of table size (default 0.1)
);
WHEN_TO_TUNE: table has >1M rows AND high update/delete rate
RULE: lower thresholds = more frequent vacuum = less bloat but more CPU usage
MONITORING VACUUM¶
-- When was each table last vacuumed and analyzed?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
n_dead_tup, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Tables with most dead tuples (need vacuum attention)
SELECT relname, n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
ALERT: dead_pct > 20% → autovacuum is falling behind, tune or run manual VACUUM
ALERT: n_dead_tup > n_live_tup → table is more dead than alive, VACUUM urgently needed
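A manual vacuum for a table that autovacuum is not keeping up with (table name illustrative):

```sql
-- VERBOSE reports what was reclaimed; ANALYZE refreshes planner statistics in the same pass
VACUUM (VERBOSE, ANALYZE) orders;
```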
TRANSACTION ID WRAPAROUND¶
CRITICAL: PostgreSQL transaction IDs wrap around at ~2 billion. Near the limit the server stops accepting writes to prevent data loss, and recovery may require single-user mode.
MONITOR:
SELECT datname, age(datfrozenxid) AS xid_age,
current_setting('autovacuum_freeze_max_age')::int AS freeze_max
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
ALERT: if xid_age > 1,000,000,000 → CRITICAL, manual intervention needed immediately
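To see which tables are holding the oldest frozen XID (freeze those first):

```sql
-- Oldest relfrozenxid = biggest contributor to the database's xid_age
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
-- Then: VACUUM (FREEZE, VERBOSE) events;  -- illustrative table name
```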
PERFORMANCE:TABLE_PARTITIONING¶
WHEN TO PARTITION¶
CONSIDER: table exceeds 50M rows AND queries consistently filter by partition key
CONSIDER: table grows continuously (time-series, logs, events) AND old data needs archival
DO_NOT_PARTITION: tables under 10M rows — overhead exceeds benefit
DO_NOT_PARTITION: tables queried without the partition key — every partition gets scanned
PARTITION STRATEGIES¶
RANGE: time-based partitioning (most common) — monthly, weekly, daily
CREATE TABLE events (
id UUID DEFAULT gen_random_uuid(),
created_at TIMESTAMPTZ NOT NULL,
event_type TEXT NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
LIST: category-based partitioning (by region, by type)
HASH: even distribution when no natural partition key (rarely used)
PARTITION MAINTENANCE¶
AUTOMATE: partition creation for future months — cron job or pg_partman extension
ARCHIVE: detach old partitions instead of deleting data:
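A sketch of both maintenance tasks (partition names follow the convention used above):

```sql
-- Create next month's partition ahead of time (cron this, or let pg_partman do it)
CREATE TABLE IF NOT EXISTS events_2026_03 PARTITION OF events
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
-- Detach an old partition: the data survives as a standalone table for archival
ALTER TABLE events DETACH PARTITION events_2026_01 CONCURRENTLY;  -- CONCURRENTLY is PG 14+
```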
PERFORMANCE:MATERIALIZED_VIEWS¶
WHEN TO USE¶
USE_FOR: expensive aggregate queries that do not need real-time data
USE_FOR: dashboard summaries, reporting queries, leaderboards
REFRESH: on a schedule (cron) or after relevant data changes
NOT_FOR: data that must be real-time — use regular views or optimized queries instead
IMPLEMENTATION¶
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT date_trunc('month', created_at) AS month,
tenant_id,
sum(amount) AS total,
count(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY 1, 2;
CREATE UNIQUE INDEX idx_mv_monthly_revenue ON monthly_revenue (month, tenant_id);
-- Refresh without blocking reads
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
NOTE: CONCURRENTLY requires a unique index on the materialized view
NOTE: first refresh cannot be CONCURRENT — must be regular refresh
SCHEDULE: refresh via cron job or trigger after bulk data changes
PERFORMANCE:PG_STAT_STATEMENTS¶
SETUP¶
CONFIG (postgresql.conf):
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = top
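Changing shared_preload_libraries requires a server restart; after that, the view still has to be enabled per database:

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```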
TOP QUERIES BY TIME¶
SELECT query, calls, mean_exec_time::numeric(10,2) AS avg_ms,
total_exec_time::numeric(12,2) AS total_ms,
rows, shared_blks_hit, shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
TOP QUERIES BY CALLS¶
SELECT query, calls, mean_exec_time::numeric(10,2) AS avg_ms
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
RESET AFTER OPTIMIZATION¶
NOTE: reset with SELECT pg_stat_statements_reset(); after deploying optimizations to get clean metrics
NOTE: UpCloud managed DB may have pg_stat_statements pre-configured — check with Arjan
PERFORMANCE:QUICK_WINS_CHECKLIST¶
- [ ] pg_stat_statements enabled and monitored
- [ ] All foreign key columns have indexes
- [ ] Connection pooling (PgBouncer) in front of production DB
- [ ] autovacuum running (check last_autovacuum timestamps)
- [ ] No idle in transaction connections lingering
- [ ] No Seq Scans on tables >10k rows in frequent queries
- [ ] LIMIT on all list queries
- [ ] No SELECT * in production code
- [ ] work_mem set appropriately (4MB-64MB depending on query complexity)
- [ ] shared_buffers = 25% of available RAM
- [ ] effective_cache_size = 75% of available RAM
- [ ] random_page_cost = 1.1 for SSD (default 4.0 is for spinning disk)
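To audit the four tuning parameters above on a running instance:

```sql
-- 'unit' disambiguates values (8kB pages for shared_buffers/effective_cache_size, kB for work_mem)
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('work_mem', 'shared_buffers', 'effective_cache_size', 'random_page_cost');
```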