Query Processing & Execution Internals
Query processing is the complete sequence of operations a DBMS performs to transform a high-level declarative SQL statement into an efficient, executable plan that retrieves or manipulates data. Understanding these internals is essential for writing fast queries, diagnosing performance problems, and designing database systems.
This guide covers the full query pipeline (lexing, parsing, semantic analysis, optimization, execution), all major join algorithms with I/O cost formulas, external sorting, the Volcano iterator model, vectorized execution, annotated SQL walkthroughs, and a 10-question practice quiz.
1Introduction
When you submit a SQL query, the database does not simply "run" it. Instead, a sophisticated pipeline of components parses, validates, rewrites, optimizes, and then executes your query. The difference between a naive execution and an optimized one can be the difference between milliseconds and hours on the same data.
Query processing matters across the full spectrum of database applications: in OLTP systems, efficient single-row lookups with index scans keep transaction latency low; in OLAP systems, vectorized hash joins and aggregate operators enable analytical queries over billions of rows in seconds. For database administrators, engineers, and architects alike, a working knowledge of these internals is indispensable for performance tuning.
Modern analytical databases like Snowflake, Google BigQuery, and DuckDB leverage advanced query processing techniques — vectorized execution, JIT compilation, and cost-based join reordering — to achieve sub-second query latencies over petabytes of data. Understanding these internals lets you write queries that cooperate with the optimizer rather than fight it.
Three Main Phases
Parsing
Lexical analysis, syntax checking, and semantic validation. Produces a logical query plan expressed in relational algebra.
Optimization
Logical rewrites (predicate pushdown, join reordering) and physical operator selection (Hash Join vs. Sort-Merge). Produces a physical query plan.
Execution
The execution engine interprets the physical plan via the Volcano iterator model (or vectorized/JIT variants) and returns results.
2Key Definitions
Precise terminology is essential before diving into the mechanics of query processing.
Query Processor
The core DBMS component that interprets, optimizes, and executes user queries — orchestrating the entire pipeline from SQL to results.
Lexical Analysis (Scanning)
Breaks the SQL string into tokens (keywords like SELECT, identifiers, operators, literals). Performed by a lexer or scanner.
Parse Tree (Syntax Tree)
Hierarchical tree representing the grammatical structure of the query according to SQL grammar rules.
Semantic Analysis
Validates name resolution, type checking, authorization, and view expansion against the database schema.
Logical Query Plan
An operator tree in relational algebra (σ, π, ⋈, ∪) describing what data to retrieve, independent of physical implementation.
Physical Query Plan
A fully executable plan specifying how: concrete operators (HashJoin, IndexScan), access methods, and execution order.
Query Optimizer
Transforms a logical plan into an optimal physical plan by exploring join orders, algorithm choices, and access paths, guided by a cost model.
Cost Model
A function estimating resource consumption (I/O, CPU, network, memory) of a plan. Example: cost = C_CPU × |tuples| + C_IO × |blocks|.
Cardinality Estimation
Predicting the number of rows each operator will output. Uses histograms, distinct value counts, and statistical assumptions.
Iterator Model (Volcano)
Pull-based execution model where each operator implements open(), next(), close(). Parents pull tuples from children on demand.
Pipelining
Intermediate results flow directly between operators without being fully stored. Reduces I/O and latency.
Materialization
An intermediate result is fully computed and stored before the next operator reads it. Required for Sort, Hash Join build phase, etc.
Access Path
Method for reading data from a base table: Table Scan (read all blocks) or Index Scan (traverse index to locate matching rows).
Vectorized Execution
Operators process data in batches (vectors) of thousands of tuples, improving cache efficiency and enabling SIMD instructions.
External Sorting
Multi-pass merge-sort for data that exceeds available memory. Creates sorted runs, then merges them progressively.
Join Ordering
Determining the optimal sequence to join N relations. An NP-hard problem solved approximately with dynamic programming or heuristics.
3The Query Processing Pipeline
A SQL query passes through nine distinct stages before results are returned. Each stage transforms the query representation, progressively moving from high-level intent to low-level, executable operations.
Query Processing Pipeline
From raw SQL to result set in three phases
SQL Query
Declarative statement from user
SELECT name FROM Employees WHERE dept = 10
Lexer
Breaks query into tokens
SELECT | name | FROM | Employees | WHERE | dept | = | 10
Parser
Checks grammar, builds parse tree
SelectStmt → SelectList, FromClause, WhereClause
Semantic Analyzer
Validates names, types, permissions
Employees.dept exists? INT = INT? User has SELECT?
Logical Plan
Relational algebra tree
π_name( σ_dept=10( Employees ) )
Optimizer
Explores plans, estimates costs via statistics
Predicate pushdown, join reordering, index selection
Physical Plan
Concrete algorithms chosen
IndexScan(dept_idx) → Filter → Project
Executor
Volcano iterator: open / next / close
Pull-based, one tuple at a time (or vectorized batches)
Result Set
Rows returned to client application
{ "Alice", "Bob", "Carol" }
3.1 Lexical Analysis & Parsing
The lexer reads the SQL string character by character and groups characters into tokens: keywords (SELECT, FROM, WHERE), identifiers (table and column names), operators (=, >), and literals (numbers, strings). If an unrecognized character sequence is encountered, a lexical error is reported immediately.
The parser takes the token stream and checks it against the formal SQL grammar. If the token sequence is syntactically valid, the parser constructs a parse tree (syntax tree), a hierarchical structure mirroring the grammar rules. For example, a SELECT statement produces child nodes for the select list, from clause, and where clause. A syntax error terminates processing here.
3.2 Semantic Analysis
Even a syntactically valid parse tree can be meaningless. The semantic analyzer performs deeper validation against the database schema:
- Name resolution: All referenced tables, columns, and functions must exist and be in scope.
- Type checking: Operations must be applied to compatible types (e.g., comparing a string to an integer is an error).
- Authorization: The user must hold sufficient privileges (SELECT, UPDATE, etc.) on the referenced objects.
- View expansion: References to views are replaced with their underlying definitions.
3.3 Intermediate Representation (Logical Plan)
After successful semantic analysis, the parse tree is translated into a logical query plan— an operator tree expressed in relational algebra. Relational algebra operators include:
σ (Selection) · π (Projection) · ⋈ (Natural Join) · ⊕ (Union) · ∩ (Intersection) · − (Difference) · G (Group By)
Example: SELECT emp_name FROM Employees WHERE salary > 50000 becomes: π emp_name(σ salary>50000(Employees))
This logical plan captures what to compute without specifying how. It is the input to the query optimizer.
4Query Optimization
Query optimization is the most challenging phase of query processing. The optimizer's goal is to find the physical execution plan with the lowest estimated resource cost among potentially billions of candidates.
4.1 Logical Rewrites
Before selecting physical operators, the optimizer applies algebraic equivalence rules to rewrite the logical plan into an equivalent but cheaper form:
Predicate Pushdown
Move selection operators (σ) as far down the tree as possible. Filtering data early reduces the size of all intermediate results flowing upward. This is the single most impactful logical rewrite.
Projection Pushdown
Move projection operators (π) down to eliminate unnecessary columns early. Narrower tuples use less memory and I/O bandwidth as they flow through the plan.
Join Commutativity & Associativity
R ⋈ S ≡ S ⋈ R (commutative) and (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T) (associative). These rules allow the optimizer to freely reorder joins.
Subquery Unnesting
Correlated subqueries (EXISTS, IN) are rewritten into joins where possible, replacing O(|outer| × |inner|) correlated execution with an efficient join.
4.2 Cost-Based Optimization
Modern optimizers are cost-based: they assign an estimated cost to each candidate physical plan and select the lowest-cost plan. The cost model accounts for:
The System R optimizer (IBM, 1979) pioneered cost-based optimization with dynamic programming over left-deep join trees. It remains the conceptual template for most modern optimizers. The optimizer enumerates sub-expressions bottom-up, storing the cheapest plan for each sub-expression, then combines them.
4.3 Cardinality Estimation & Statistics
Cardinality estimation — predicting how many rows each operator will produce — is critical because operator costs are proportional to input/output sizes. The optimizer uses:
- N_R: number of tuples in relation R
- V(A, R): number of distinct values of attribute A in R
- Histograms: frequency distributions of attribute values
- Selectivity formulas: for σ_A=v(R), estimated rows = N_R / V(A, R); for σ_A>v(R), estimate uses histogram buckets
- Join cardinality: for R ⋈ S on A=B, estimated = N_R × N_S / max(V(A, R), V(B, S))
Inaccurate cardinality estimates are the leading cause of suboptimal query plans. Always keep statistics up-to-date with ANALYZE or UPDATE STATISTICS after large data changes.
4.4 Rule-Based vs. Cost-Based Optimization
| Aspect | Rule-Based | Cost-Based |
|---|---|---|
| Basis | Fixed heuristics (always push selections down) | Statistical cost estimates |
| Data awareness | None — ignores data distribution | Yes — relies on statistics and histograms |
| Optimality | Can be suboptimal for unusual data | Near-optimal given accurate statistics |
| Use in practice | Logical rewrites still use rule-based steps | Dominant for physical plan selection |
5Join Processing Algorithms
Joins are typically the most expensive operations in relational queries. Choosing the right join algorithm for each join in a query is one of the optimizer's most consequential decisions. Let B_R and B_S denote the number of disk blocks in relations R and S, |R| the number of tuples, and M available memory blocks.
Join Algorithms Comparison
How Nested Loop, Sort-Merge, and Hash Join process the same data
| name | dept_id |
|---|---|
| Alice | 10 |
| Bob | 20 |
| Carol | 10 |
| Dan | 30 |
| dept_name | dept_id |
|---|---|
| Engineering | 10 |
| Marketing | 20 |
| Sales | 30 |
5.1 Nested Loop Join (NLJ)
The simplest join: for every tuple in R, scan all of S.
FOR each tuple r IN R DO
FOR each tuple s IN S DO
IF r and s satisfy join condition THEN emit (r, s)
I/O Cost: B_R + |R| × B_S (naive); improved to B_R + ⌈B_R / (M−1)⌉ × B_S with Block NLJ (reads M−1 blocks of R at a time). Use when: outer relation is very small, or no suitable index exists.
5.2 Index Nested Loop Join (INLJ)
A variant of NLJ where an index exists on the inner relation's join attribute. Instead of scanning all of S for each tuple in R, the optimizer uses the index to fetch only matching tuples. I/O Cost: B_R + |R| × C_index, where C_index is the index lookup cost (typically 2–4 I/Os for a non-clustered B-tree). Use when: the outer relation is small and the join is highly selective.
5.3 Sort-Merge Join (SMJ)
Sort both R and S on their join attributes, then merge the two sorted streams in a single pass. I/O Cost: Sort(R) + Sort(S) + B_R + B_S, where Sort(X) ≈ 2·B_X·(1 + ⌈log_(M−1)(B_X/M)⌉). If both relations are already sorted, cost collapses to B_R + B_S. Use when: inputs are pre-sorted, or the query requires sorted output on the join key anyway.
5.4 Hash Join (HJ)
Build an in-memory hash table on the smaller relation (R), then probe it with every tuple from S. I/O Cost: B_R + B_S if R fits in memory. If R does not fit, a partitioning phase writes both relations to disk partitioned by hash key: total cost ≈ 3·(B_R + B_S). Use when: equijoin, large unsorted relations, and the build side fits in memory. Generally the fastest join algorithm in practice.
5.5 Join Algorithm Comparison
| Algorithm | Avg I/O Cost | Requires Sort? | Best For |
|---|---|---|---|
| Simple NLJ | B_R + |R|·B_S | No | Very small outer relation |
| Block NLJ | B_R + ⌈B_R/(M−1)⌉·B_S | No | No index available |
| INLJ | B_R + |R|·C_index | No | Small outer, selective index |
| Sort-Merge | Sort(R) + Sort(S) + B_R + B_S | Yes | Pre-sorted inputs, sorted output needed |
| Hash Join | B_R + B_S | No | Large unsorted equijoins (fastest avg) |
Query Execution Plan Tree
Bottom-up construction of a physical query plan
SELECT E.name, D.dept_name FROM Employees E JOIN Departments D ON E.dept_id = D.dept_id WHERE E.salary > 50000
6Sorting & Aggregation
ORDER BY, GROUP BY, DISTINCT, and set operations (UNION, INTERSECT, EXCEPT) all require sorting or hashing. When data exceeds available memory, the DBMS must use disk-based algorithms.
6.1 External Merge-Sort
When data is too large to sort in memory (B_R > M blocks), external merge-sort proceeds in two phases:
Phase 1: Run Formation
- Read M blocks at a time into memory
- Sort the chunk in-place (quicksort)
- Write the sorted "run" to disk
- Produces ⌈B_R / M⌉ sorted runs
- I/O cost: 2 × B_R (read + write)
Phase 2: Merge Passes
- Use M−1 input buffers to merge M−1 runs at once
- Write merged result to disk
- Repeat until 1 run remains
- Passes needed: ⌈log_(M−1)(B_R/M)⌉
- Total I/O: 2·B_R·(1 + ⌈log_(M−1)(B_R/M)⌉)
External Merge-Sort Animation
12 values · M = 4 memory blocks · 3 initial runs
Complete Pipeline (Final State)
External merge-sort handles data too large for memory. This view shows the complete final sorted result. Step through to see how we got here.
Current Memory Chunk (reading from disk)
Sorted Runs (on disk)
Run 1
Run 2
Run 3
Merge Output (on disk)
Merged Run A
Run 3 (unchanged)
Final Sorted Output
6.2 Sort-Based vs. Hash-Based Aggregation
Sort-Based Aggregation
- Sort on GROUP BY attributes (external sort if needed)
- Scan once: adjacent equal keys form a group
- Compute SUM, COUNT, AVG per group
Good when: data needs sorting anyway (ORDER BY same key)
Hash-Based Aggregation
- Scan relation, build hash table keyed by GROUP BY attributes
- Update aggregate accumulators (SUM += value, COUNT++)
- Spill partitions to disk if hash table overflows
Good when: few distinct groups fit in hash table; generally faster
7Query Execution Models
Once the physical query plan is selected, the execution engine takes over. There are three dominant execution paradigms, each with different performance characteristics.
7.1 Iterator / Volcano Model
The Volcano model (Graefe & McKenna, 1993) is the classical execution paradigm. Every physical operator implements three methods:
open()Initializes the operator and recursively calls open() on children. Allocates buffers, opens files.
next()Returns the next qualifying tuple. Recursively calls next() on children. May loop internally if the child tuple fails a predicate.
close()Releases resources and recursively calls close() on children. Closes files, frees buffers.
Volcano / Iterator Model
Pull-based execution: parent calls child.next() to get one tuple at a time
7.2 Pipelining vs. Materialization
Pipelining
Tuples flow directly from one operator to the next without intermediate storage. The default behavior in the Volcano model.
- Lower I/O, lower latency
- Less temporary storage needed
- All pipelined operators active concurrently in memory
- Cannot work for blocking operators (Sort, Hash Join build)
Materialization
The complete output of an operator is written to a temporary buffer or disk before the next operator reads it.
- Required for blocking operators (Sort, Hash Join build phase)
- Allows reuse of intermediate results
- Higher I/O and latency
- Useful under memory pressure
7.3 Vectorized Execution
Traditional Volcano processes one tuple at a time. Vectorized execution processes data in batches (vectors) of 1,000–10,000 tuples. Each call to next() returns an entire vector instead of a single tuple:
Cache Efficiency
Processing a column of values sequentially improves spatial locality, reducing CPU cache misses significantly.
Reduced Overhead
One next() call per vector instead of per tuple eliminates the amortized cost of function dispatch per row.
SIMD Instructions
Modern CPUs apply a single instruction to 8–16 values simultaneously, enabling massive parallelism for bulk comparisons.
7.4 JIT Compilation
Some advanced systems (Apache Impala with LLVM, HyPer, CockroachDB) use Just-in-Time (JIT) compilation to translate the physical query plan into native machine code at runtime. Unlike interpreted execution, JIT-compiled code contains no virtual dispatch overhead, enables loop unrolling, and generates specialized code for specific data types and predicates. The compilation overhead is amortized over long-running or repeated queries.
8SQL Walkthroughs with EXPLAIN ANALYZE
EXPLAIN ANALYZE is the primary tool for observing how a DBMS actually executes a query. It shows the physical plan, estimated vs. actual row counts, and buffer/I/O statistics.
Example 1: Selection with Index Scan
Retrieve names and salaries of employees earning over $50,000. An index on salary exists.
EXPLAIN ANALYZE SELECT emp_name, salary FROM Employees WHERE salary > 50000.00; -- QUERY PLAN ----------------------------------------------- -- Project (emp_name, salary) -- -> Index Scan using idx_employees_salary on Employees -- Filter: (salary > 50000.00) -- Rows Removed by Filter: 150000 -- Actual Rows: 100000 -- Buffers: shared hit=10000, read=5000
Index Scan: The optimizer chose an index scan because the selectivity is high (40% of rows qualify). The index locates matching row pointers efficiently.
Buffers: read=5000: 5,000 pages had to be fetched from disk (cache miss). Higher read counts indicate more I/O pressure.
I/O Cost: B_index_leaf + K × C_data_fetch ≈ 5,000 + 5,000 = 10,000 block accesses in this example.
Example 2: Hash Join with Predicate Pushdown & Hash Aggregation
Find total salary per department for "New York" departments. Demonstrates predicate pushdown, hash join, and hash-based aggregation.
EXPLAIN ANALYZE SELECT D.dept_name, SUM(E.salary) AS total_salary FROM Employees AS E JOIN Departments AS D ON E.dept_id = D.dept_id WHERE D.location = 'New York' GROUP BY D.dept_name ORDER BY total_salary DESC; -- QUERY PLAN ----------------------------------------------- -- Sort (total_salary DESC) -- -> HashAggregate (SUM(E.salary), Group Key: D.dept_name) -- -> Hash Join (E.dept_id = D.dept_id) -- -> Seq Scan on Employees E [probe side] -- -> Hash (D.dept_id) [build side] -- -> Index Scan on Departments D -- Filter: (location = 'New York') -- Rows Removed: 900 Actual Rows: 100
Predicate Pushdown: The location = 'New York' filter is applied to Departments before the join, reducing the build-side hash table from 1,000 to 100 rows.
Hash Join strategy: Filtered Departments (100 rows) are the build side; the large Employees table is the probe side — exactly the right choice for minimal memory.
HashAggregate: Groups are accumulated into a hash table keyed on dept_name. Final Sort handles ORDER BY.
Example 3: EXISTS Subquery Rewritten to Join
Find departments with at least one employee earning over $70,000. The optimizer rewrites the correlated subquery into an efficient join.
EXPLAIN ANALYZE
SELECT D.dept_name
FROM Departments D
WHERE EXISTS (
SELECT 1 FROM Employees E
WHERE E.dept_id = D.dept_id AND E.salary > 70000.00
);
-- QUERY PLAN -----------------------------------------------
-- HashAggregate (DISTINCT D.dept_name) -- EXISTS → join + distinct
-- -> Hash Join (D.dept_id = E.dept_id)
-- -> Seq Scan on Departments D
-- -> Hash (E.dept_id)
-- -> Index Scan using idx_salary on Employees E
-- Filter: (salary > 70000.00)
-- Rows Removed: 200000 Actual Rows: 50000Subquery unnesting: The optimizer rewrites the correlated EXISTS into a Hash Join + DISTINCT, avoiding O(|D| × |E|) correlated execution.
Index on salary: Applied during the subquery scan, reducing 250,000 employees to 50,000 before the join. Overall complexity drops from O(|D|·|E|) to O(|D| + |E_filtered|).
9Common Mistakes
Confusing logical and physical query plans
A logical plan describes what the query computes in relational algebra (e.g., "a join"). A physical plan specifies how, using concrete algorithms and access paths (e.g., "Hash Join using an Index Scan on Employees.dept_id"). Optimizers generate logical plans first, then select physical operators.
Underestimating the impact of join ordering
The order of joins can change the size of intermediate results by orders of magnitude. Joining two large tables before a small table can turn a fast query into one that runs for hours. Always check that the optimizer is choosing a sensible join order via EXPLAIN.
Using outdated or missing statistics
The optimizer relies on accurate statistics (row counts, histograms, distinct values) to estimate cardinalities. After large bulk loads or deletes, run ANALYZE (PostgreSQL) or UPDATE STATISTICS (SQL Server) to refresh statistics. Stale stats are the #1 cause of suboptimal plans in production.
Assuming pipelining is always better than materialization
Pipelining generally reduces I/O, but some operators (Sort, Hash Join build phase, Aggregation with hash spills) inherently require materialization. The optimizer decides when to pipeline and when to materialize based on operator type and memory availability.
Assuming an index will always be used
An index may exist but the optimizer might not use it — for example, if the table is small enough that a sequential scan is cheaper, or if the predicate selectivity is too low (e.g., salary > 0 matches nearly every row). Always verify with EXPLAIN.
Using SELECT * when only a few columns are needed
SELECT * fetches and propagates all columns through the entire operator tree. This increases intermediate result sizes, consumes more buffer pool memory, can prevent covering index scans, and may force spilling to disk for join and aggregation operators that would otherwise fit in memory.
Frequently Asked Questions
What is the primary goal of the query optimizer, and why is it so complex?+
The primary goal of the query optimizer is to find the most efficient physical query plan for a given logical plan, minimizing resource consumption (I/O, CPU, memory, network). It is complex because the plan space grows exponentially with query complexity, accurate cost estimation depends on dynamic statistics, and heuristics are needed to prune an NP-hard search space within a reasonable time.
When would a Hash Join be preferred over a Sort-Merge Join, and vice versa?+
Hash Join is preferred when one relation is significantly smaller and fits in memory (optimal B_R + B_S I/O cost), relations are unsorted, and the output order does not matter. Sort-Merge Join is preferred when one or both relations are already sorted on the join attribute, the query requires sorted output (e.g., ORDER BY on join key), or memory constraints make hash-table spilling expensive.
Explain the difference between pipelining and materialization, and when materialization is necessary.+
Pipelining passes intermediate results directly between operators without writing to temporary storage, reducing I/O and latency. Materialization fully computes and stores the result of an intermediate operation before the next operator reads it. Materialization is necessary for operators that must see all input before producing output — a full sort must materialize all incoming tuples before it can emit any sorted output; the build phase of a hash join must materialize the entire build relation into an in-memory hash table.
How do database statistics influence query optimization?+
Database statistics (tuple counts, distinct value counts, histograms) are the foundation of cost-based optimization. They enable cardinality estimation — predicting how many rows each operator will produce — which determines join algorithm selection, access path selection (table scan vs. index scan), and join ordering. Stale or missing statistics are a leading cause of suboptimal query plans.
What is vectorized execution, and why is it beneficial?+
Vectorized execution processes data in batches (vectors) of hundreds or thousands of tuples rather than one at a time. Benefits include improved CPU cache utilization due to sequential column-at-a-time access, dramatically reduced function-call overhead (one next() per vector instead of per tuple), and the ability to leverage SIMD CPU instructions for bulk arithmetic and comparison operations. It is the dominant execution model in modern OLAP databases.
What is the Join Ordering Problem and why does it matter in practice?+
The Join Ordering Problem is determining the most efficient sequence in which to join N relations. For N relations there are exponentially many possible join orders, and the optimal solution is NP-hard. In practice, a poor join order can produce enormous intermediate results, consuming memory, CPU, and disk and turning a sub-second query into one that runs for hours. Optimizers use dynamic programming (e.g., the Selinger algorithm) and heuristics to find near-optimal orders within a time budget.
Practice Quiz
Test your understanding of query processing internals -- select the correct answer for each question.
1.Which phase of query processing is responsible for converting a logical query plan into an optimal physical query plan?
2.What is the primary purpose of a Cost Model in a query optimizer?
3.Which join algorithm is generally most efficient for equijoins when one relation is significantly smaller and can fit entirely in memory?
4.An operator that must process all its input before producing any output (e.g., a full external sort) typically requires which execution strategy?
5.What is the main advantage of Predicate Pushdown in query optimization?
6.The Iterator Model (Volcano-style) is characterized by which type of data flow between operators?
7.Which of the following is NOT a typical factor considered by a cost-based optimizer?
8.For external merge-sort of a relation with B_R blocks and M available memory blocks, approximately how many merge passes are required?
9.What is the primary benefit of vectorized execution compared to tuple-at-a-time processing?
10.In the context of join ordering, what does a "left-deep join tree" mean?
Study Tips & Memory Aids
Pipeline mnemonic — "PSLO-E"
- Parsing (lex + syntax)
- Semantic Analysis
- Logical Plan
- Optimization (physical plan)
- Execution
Join algorithm I/O costs (R is smaller)
- NLJ: B_R + |R|·B_S (worst)
- SMJ: Sort(R) + Sort(S) + B_R + B_S
- HJ: B_R + B_S (best, if R fits in mem)
- INLJ: B_R + |R|·C_index
External sort formula
Total I/O = 2·B_R·(1 + ⌈log_(M−1)(B_R/M)⌉)
Volcano model interface
open() ↓ · next() ↑ · close() ↓
open/close go DOWN the tree; tuples flow UP via next()