Query Execution & Optimization

Query Pipeline

Every SQL query passes through these stages:

flowchart LR
    SQL["SELECT ..."] --> Parser
    Parser -->|Parse Tree| Rewriter
    Rewriter -->|Query Tree| Planner
    Planner -->|Plan Tree| Executor
    Executor -->|Tuples| Result

1. Parser: Converts SQL text to an AST (parse tree). Validates syntax, resolves table/column names, checks permissions.

2. Rewriter: Applies semantic transformations — expands views, unfolds nested queries, simplifies expressions, applies rules (PostgreSQL rules system).

3. Planner (Optimizer): The most complex stage. Generates multiple execution plans and picks the cheapest (cost-based optimization).

4. Executor: Executes the chosen plan. Pulls tuples through the plan tree (Volcano-style iterator model).

Scan Methods

Scan Type	Description	When Used
Sequential Scan	Read all rows sequentially	No index, large portion of table needed
Index Scan	Walk B-Tree → fetch heap tuple	Highly selective queries
Index-Only Scan	Read from index alone, no heap fetch	Index covers all needed columns
Bitmap Scan	Multiple index scans → bitmap → heap fetch	Combination of conditions, moderate selectivity
TID Scan	Direct row access by CTID	From subquery or WITH (known row location)

Each scan method has a cost formula that multiplies the number of pages read by a cost factor (sequential vs random I/O). The optimizer picks the cheapest based on table statistics.

Join Algorithms

Nested Loop Join

for each row in outer (smaller) relation:
    for each row in inner (larger) relation:
        if match: emit joined row

Variant	Complexity	When Used
Plain	O(n*m)	Small outer, no index on inner
Indexed	O(n * log m)	Small outer, index on inner join key
Materialized	O(n*m)	Small outer, inner materialized in memory

Best for: Small outer relation (<1000 rows), especially with an index on the inner relation.

Hash Join

1. Build: Scan inner relation, build hash table on join key
2. Probe: Scan outer relation, probe hash table for matches

Phase	Memory	Complexity
Build	O(min(n,m)) hash table size	O(m)
Probe	Ongoing	O(n)
Grace Hash Join (disk)	When exceeds work_mem	O(n+m) writes/reads to disk

Best for: Equi-joins on unsorted data where one side can fit in memory.

Merge Join

1. Sort both relations on join key (or use existing order)
2. Iterate both sorted streams in parallel, emitting matches

Phase	Memory	Complexity
Sort	O(n+m) if not pre-sorted	O(n log n + m log m)
Merge	O(1)	O(n+m)

Best for: Large tables already sorted on join key (e.g., from index order, or when ORDER BY matches join key).

Parallel Query Execution

Modern databases distribute query execution across multiple CPU cores:

graph TD
    Leader[Parallel Leader<br/>gathers results]
    Leader --> W1[Worker 1<br/>Partial scan/join]
    Leader --> W2[Worker 2<br/>Partial scan/join]
    Leader --> W3[Worker 3<br/>Partial scan/join]
    W1 --> PG[Partial Aggregate<br/>count=30]
    W2 --> PG2[Partial Aggregate<br/>count=25]
    W3 --> PG3[Partial Aggregate<br/>count=28]
    PG & PG2 & PG3 --> Final[Final Aggregate<br/>count=83]

Operator	Parallelism	Notes
Seq Scan	Multiple workers scan page ranges	Each worker reads a portion of the table
Index Scan	Multiple workers scan index ranges	Range-based partitioning
Hash Join	Build hash in parallel, probe in parallel	Shared hash or partitioned hash
Aggregate	Partial → gather → final	Two-phase aggregation
Sort	Merge sort: each worker sorts a chunk	GATHER MERGE sorts final output

Limitations:

Only sequential scans, index scans, joins, and aggregates can be parallelized
CTEs (WITH) are optimization fences in some databases
Small tables or queries with LIMIT often don’t use parallel plans (overhead > benefit)

Cost Estimation

Databases use table and column statistics to estimate the cost of each plan. The optimizer computes a cost for each possible plan node based on estimated row counts, I/O costs (sequential vs random), and CPU costs. Each database engine has its own cost model and system catalogs for statistics.

Common Query Optimizations

| Problem | Symptom | Fix | |---|---|---|---| | Missing index | Table scan on large table | Add index on WHERE/JOIN columns | | Wrong join order | Nested loop on large tables | Update statistics, check join column indexes | | Stale statistics | Suboptimal plan choice | Refresh table statistics (ANALYZE-equivalent) | | Subquery repeated | Same subquery executed N times | Rewrite as join or use CTE |

Index Design Heuristics

Prefix equality columns in compound indexes: WHERE a = 1 AND b > 5 → index on (a, b)
Covering indexes: Include all queried columns to enable index-only scans
Partial indexes: CREATE INDEX ... WHERE status = 'active' — smaller and faster
Included columns (SQL Server, PostgreSQL 11+): Non-key columns in index for covering