ResourcesDatabaseStorage, Indexing & File Structures
DatabaseCollege

Storage, Indexing & File Structures

Storage, Indexing, and File Structures form the foundational layer of every database management system. This domain governs how data is physically laid out on disk, how it is organised into logical files, and how auxiliary data structures accelerate retrieval.

This guide covers physical storage and buffer management, file organisations (heap, sequential, hash), B+Tree and hash-based indexes, bitmap indexes, advanced techniques (composite, covering, partial indexes), SQL walkthroughs with EXPLAIN ANALYZE, and a 10-question practice quiz.

1Introduction

In the architecture of a DBMS, Storage, Indexing, and File Structures sit at the physical and logical data organisation levels. This layer bridges the logical schema (relations, attributes) and the physical hardware (HDDs, SSDs). Understanding it is a prerequisite for query processing, optimisation, and scalable system design.

The efficiency of a database is critically dependent on these choices. Poorly chosen file organisations or missing indexes can cause catastrophic performance bottlenecks under heavy load. The central trade-off is always between storage space, access time, and update cost.

In Practice

Modern databases like PostgreSQL, MySQL, and Oracle rely heavily on optimised B+Tree indexes for primary key constraints and frequently queried columns. Choosing the right index type -- B+Tree for range queries, hash for equality -- and verifying its use via EXPLAIN ANALYZE is a daily task for database administrators and performance engineers. A missing index on a large table can turn a 10ms query into a 30-second full table scan.

Storage

Managing physical devices (HDDs, SSDs), allocating disk blocks, and caching pages in main memory via the buffer pool.

Indexing

Auxiliary data structures (B+Trees, hash indexes) that enable rapid record retrieval without scanning the entire data file.

File Structures

How records within a relation are physically organised -- heap, sequential, or hash -- affecting insert, delete, and scan performance.

2Key Definitions

Essential terms for understanding storage, indexing, and file structures at the university level.

Disk Page (Block)

The smallest unit of data transfer between disk and memory. Typically 4 KB, 8 KB, or 16 KB. All I/O is measured in page reads/writes.

Record (Tuple)

A single row in a relation, composed of fields (attributes). Records are stored in pages on disk.

Data File

A collection of records of the same type (same relation) stored on disk. e.g., employees.dat.

Index File

A separate file of (key, pointer) pairs enabling fast lookup into a data file based on a specific attribute.

Heap File

Records stored in no particular order. New records are appended or placed in any free slot. Fast insert, slow search.

Sequential File

Records stored in sorted order by a designated search key. Efficient for range queries, costly to maintain on updates.

Dense Index

An index entry exists for every search key value in the data file. Required for secondary indexes.

Sparse Index

Index entries exist only for some search key values -- typically one per disk page. Only usable with clustered (sorted) files.

Primary Index

An index whose search key specifies the physical sequential order of the file. Usually on the primary key.

Secondary Index

An index on an attribute different from the file's physical ordering key. Must be dense. Requires RID lookups.

Fan-out (f)

The maximum number of child pointers an internal B+Tree node can hold. Higher fan-out = shorter tree = fewer I/Os.

I/O Cost

The number of disk page reads/writes to perform an operation. The dominant performance metric in disk-based databases.

Record ID (RID)

A unique identifier for a record: (PageID, SlotNumber). Secondary indexes store RIDs to locate data records.

Buffer Pool

A designated area of main memory that caches disk pages. Managed by the buffer manager to minimise disk I/O.

3Physical Storage & Disk Organisation

Storage Hierarchy

Faster and smaller at the top; slower and larger at the bottom

LEVELACCESS TIMECAPACITYCPU Registers< 1 nsBytesL1 / L2 Cache1–10 nsKB – MBMain Memory (RAM)50–100 nsGBSolid State Drive (SSD)0.05–0.15 msGB – TBHard Disk Drive (HDD)5–20 msTBTape / OpticalSeconds – MinutesPBSPEED

Database buffer pools sit in RAM, caching pages from SSD/HDD to minimise costly disk I/O

Disk Access Time Components

Traditional Hard Disk Drives (HDDs) are mechanical devices. Their access time has three components:

Seek Time

Time to move the arm to the correct track. Usually the largest cost component: 5--20 ms for HDDs. Near-zero for SSDs.

Rotational Latency

Time waiting for the sector to rotate under the head. Average = half a full rotation (~4 ms at 7200 RPM).

Transfer Time

Time to actually read/write the data bytes once positioned. The fastest component; sequential I/O amortises this.

Total Access Time = Seek Time + Rotational Latency + Transfer Time

Random I/O  = 1 x (SeekTime + RotLatency + TransferTime)  -- expensive
Sequential  = 1 x SeekTime + 1 x RotLatency + N x TransferTime  -- efficient
Key Insight

Random I/O is far more expensive than sequential I/O. This is why minimising random disk accesses -- through good file organisations, indexes, and buffer management -- is the central goal of database storage design. SSDs dramatically reduce seek time but sequential reads are still faster than random reads due to internal parallelism and prefetching.

RAID Levels

RAID 0 (Striping)

Data striped across disks for parallel I/O. Best performance, but zero redundancy -- any disk failure loses all data.

RAID 1 (Mirroring)

Data duplicated across two disks. Excellent redundancy, fast reads. 50% storage overhead.

RAID 5 (Stripe + Parity)

Data striped with distributed parity. Good balance of performance, redundancy, and cost. Tolerates one disk failure.

RAID 10 (Mirror + Stripe)

Combines RAID 1 and RAID 0. Highest performance and redundancy. Most costly -- 50% storage overhead.

Buffer Pool Management

Since disk I/O is expensive, the DBMS maintains a buffer pool -- a designated area of main memory that caches frequently accessed disk pages. The buffer manager controls fetching, pinning, and eviction of pages.

Buffer Pool Management

Buffer frames cache disk pages in RAM. The LRU policy evicts the least-recently-used clean frame first.

Disk (HDD/SSD)
read (miss) →← write dirty
Buffer Pool
page request →← data
Query CPU

Buffer Frames (Main Memory)

Frame 1

P-12

dirty: YESpin: 2

Frame 2

P-07

dirty: nopin: 0

Frame 3

P-33

dirty: YESpin: 1

Frame 4

P-01

dirty: nopin: 0

Frame 5

(free)

Frame 6

P-44

dirty: nopin: 1

LRU Queue

MRU at top

P-12
P-44
P-33
P-07
P-01evict next
dirty (must flush before evict)clean pagepinned (in use)

LRU Policy

Evicts the page that has not been accessed for the longest time. Good general-purpose policy.

CLOCK Policy

Efficient approximation of LRU using a reference bit. Sweeps frames in a circular fashion.

MRU Policy

Evicts the most recently used page. Optimal for nested-loop join patterns where the same pages cycle repeatedly.

4File Organizations

A file organisation dictates how records are physically stored and managed on disk. The choice significantly impacts the efficiency of insert, delete, search, and range scan operations.

Heap File vs Sequential File

Heap: records in arrival order (fast insert, slow search) | Sequential: records sorted by key (efficient range queries)

Heap File

(unordered)

Page 1

ID:42
ID:07
ID:15

Page 2

ID:31
ID:03
(free)

Page 3

ID:22
ID:09
ID:55

Insert: O(1) | Search: O(N pages)

Sequential File

(sorted by key)

Page 1

ID:01
ID:03
ID:07

Page 2

ID:09
ID:15
ID:22

Page 3

ID:31
ID:42
ID:55

Range: O(log N + K) | Insert: O(N)

Heap File

  • Records in arrival order
  • Insert: O(1) amortised
  • Search: O(N pages) full scan
  • Best for: small tables, index-driven access

Sequential File

  • Sorted by ordering key
  • Range query: O(log N + K)
  • Insert: O(N) worst case
  • Best for: static tables with range queries

Hash File

  • Records in hash buckets
  • Equality search: O(1) avg
  • Range queries: not supported
  • Best for: equality-only lookups

Clustered vs Unclustered

PropertyClusteredUnclustered
Physical orderMatches index key orderIndependent of index order
Indexes per tableOnly oneMultiple allowed
Range queriesExcellent (sequential reads)Poor (random I/O per record)
Key updatesExpensive (may move records)Less expensive

5B+Tree Indexes

The B+Tree is the de facto standard for database indexing. It is a self-balancing tree structure where all data pointers reside in leaf nodes (which are linked), and internal nodes serve purely as a navigation directory.

B+Tree Structure

Internal nodes route searches; leaf nodes hold data and are linked for range scans

3060Root1020Internal4050Internal7080Internal159Leaf (data ptr)101520303540505560RootInternalLeaf (linked)

Dashed green arrows show the linked list connecting all leaf nodes for sequential range traversal

Key Properties

All leaves at the same depth

Guarantees O(log_f N) I/O cost for any search. The tree is always perfectly height-balanced.

Linked leaf nodes

Leaves form a doubly-linked list, enabling efficient sequential range traversal once the start is found.

High fan-out

Internal nodes hold only keys (no data), maximising the number of child pointers per page and minimising tree height.

Self-balancing

Splits propagate up on insert; merges/redistributions handle deletes. Height grows only at the root.

Operations & I/O Cost

OperationI/O CostNotes
Exact searchh I/Osh = tree height approx log_f(N). One more for unclustered data fetch.
Range queryh + K I/Osh to find start, K leaf pages to scan the range.
Insert (no split)h + 1 I/OsTraverse to leaf, write modified leaf.
Insert (with split)h + splitsSplits propagate up; root split increases height by 1.
Memory Aid: Fan-out Power

With a page size of 8 KB and a key+pointer of 8 bytes, fan-out f ≈ 1000. A B+Tree with 1 billion records has height h = log₁₀₀₀(1,000,000,000) = 3. That means just 3 disk I/Os to find any record in a billion-row table.

6Hash-Based Indexes

Hash-based indexes provide very fast O(1) average-case equality lookups. A hash function maps a search key to a bucket, which holds the matching records. They are completely unsuitable for range queries.

Static Hashing

Fixed number of buckets. Overflow chains degrade performance as the table grows. Suitable only for static datasets.

Linear Hashing

Buckets grow gradually using a family of hash functions and a split pointer. No directory needed. Avoids global restructuring.

Extendible Hashing

Uses a directory of bucket pointers. Directory doubles when needed. No overflow chains. O(1) average lookup.

Extendible Hashing

Directory doubles when local depth equals global depth. No overflow chains needed.

// Final state (gd=3). Click Play to animate from the beginning.
Global Depth: 3Directory size: 8 entries

Directory

000B0
001B1
010B2
011B1
100B3
101B1
110B2
111B1

Buckets (capacity=2)

B0ld=3
16
B1ld=1
3
B2ld=2
26
B3ld=3
412
Final state with global depth 3. The directory has 8 entries. Multiple entries can point to the same bucket when local depth < global depth.
Final

B+Tree vs Hash Index Comparison

FeatureB+TreeHash Index
Equality searchO(log_f N)O(1) average
Range searchO(log_f N + K)Not supported
OrderingSorted leaf nodesNone
ORDER BY supportYesNo
Primary use caseGeneral purpose, rangesEquality lookups only

7Bitmap Indexes

A bitmap index creates a separate bit vector for each distinct value of a low-cardinality attribute. Each bit corresponds to one record: 1 if the record has that value, 0 otherwise.

-- Example: Gender column with 10 records
-- Bitmap for 'M': 1 0 1 0 0 1 0 1 0 1  (records 1,3,6,8,10 are Male)
-- Bitmap for 'F': 0 1 0 1 1 0 1 0 1 0  (records 2,4,5,7,9 are Female)

-- Query: Gender='M' AND Department='Sales'
-- Result = BITWISE_AND(Bitmap_M, Bitmap_Sales)
-- This is a single CPU instruction per 64 records!

Ideal for OLAP / Data Warehousing

Complex multi-condition analytics on low-cardinality columns (status, region, gender). Bitwise AND/OR/NOT are extremely fast.

Low Cardinality Only

Works well for columns with few distinct values (e.g., 2--1000). For high-cardinality columns (e.g., salary), the index becomes as large as the data file.

Avoid for OLTP

Each row update requires modifying multiple bitmaps simultaneously. High write contention makes bitmap indexes impractical for transactional workloads.

8Advanced Indexing Techniques

Composite Indexes & Column Ordering

A composite index is built on multiple columns. The column order critically determines which queries can use the index.

-- Index on (department_id, employee_id)
WHERE department_id = 10                     -- uses index (leading column)
WHERE department_id = 10 AND employee_id > 5 -- uses index (both columns)
WHERE employee_id = 101                      -- cannot use index (skips leading)

Covering Indexes

A covering index includes all columns referenced by a query (WHERE + SELECT), enabling an index-only scan that never touches the data file.

-- Query requires: department_id (filter), employee_id, salary (output)
CREATE INDEX idx_covering
  ON Employees (department_id, employee_id, salary);

-- Now this query needs zero data-page I/Os:
SELECT employee_id, salary FROM Employees WHERE department_id = 20;

Partial (Filtered) Indexes

A partial index indexes only a subset of rows satisfying a WHERE clause. Smaller size, lower maintenance overhead.

-- Only index active employees -- much smaller than a full index
CREATE INDEX idx_active_emp
  ON Employees (employee_id)
  WHERE status = 'Active';

Clustered vs Non-Clustered Comparison

PropertyClustered IndexNon-Clustered Index
Data orderLeaf level IS the dataSeparate structure; stores RIDs
Count per tableMaximum 1Multiple allowed
Range scansExcellent (sequential)Poor unless covering
Key update costHigh (may relocate data)Lower
Index-only scanAlways (data in leaf)Only if covering index

Multi-Dimensional Indexes (R-Trees, Quad-Trees)

Spatial data requires indexes that handle multi-dimensional queries. R-Trees use minimum bounding rectangles (MBRs) to partition space hierarchically -- used by PostGIS for geospatial queries. Quad-Trees subdivide 2D space into quadrants recursively. Both dramatically outperform B+Trees for "find all points within this region" queries.

9SQL Walkthroughs

Introductory

Index Creation & Equality Query

-- Without index: Sequential Scan (O(N) pages)
EXPLAIN ANALYZE
SELECT * FROM Employees WHERE department_id = 20;

-- Create B+Tree index on department_id
CREATE INDEX idx_emp_dept ON Employees (department_id);

-- With index: Index Scan (O(log_f N + M) I/Os)
EXPLAIN ANALYZE
SELECT * FROM Employees WHERE department_id = 20;

Before index: PostgreSQL performs a Sequential Scan -- reads every page in the table. Cost = N_pages I/Os regardless of selectivity.

After index: Optimizer uses an Index Scan on idx_emp_dept. Traverses the B+Tree (3--4 I/Os), then fetches only the matching data pages. Orders of magnitude faster for large tables.

Intermediate

Range Query with B+Tree Index

-- Create index on hire_date
CREATE INDEX idx_emp_hire ON Employees (hire_date);

-- Range query: B+Tree traverses to start of range,
-- then follows linked leaf nodes sequentially
EXPLAIN ANALYZE
SELECT employee_id, first_name, hire_date
FROM Employees
WHERE hire_date BETWEEN '2020-01-01' AND '2021-12-31';

-- Without salary index: forces a full Sequential Scan
EXPLAIN ANALYZE
SELECT employee_id, salary
FROM Employees
WHERE salary BETWEEN 70000 AND 80000;

hire_date range: B+Tree finds the first matching leaf in O(log_f N) I/Os, then traverses the linked leaf list sequentially -- highly efficient.

salary range (no index): Full table scan required. Every page read to check the condition. Adding a B+Tree on salary would fix this.

Key insight: B+Trees are ideal for ranges precisely because leaf nodes are linked -- the tree gives the start point, then the list gives the rest.

Advanced

Covering Index & Index-Only Scan

-- Covering index: includes all columns needed by the query
CREATE INDEX idx_covering_dept
  ON Employees (department_id, status, salary);

-- Index-Only Scan: zero data-page I/Os
EXPLAIN ANALYZE
SELECT employee_id, department_id, salary
FROM Employees
WHERE department_id = 20 AND status = 'Active';

-- Requires Heap Fetch (first_name not in index)
EXPLAIN ANALYZE
SELECT first_name, department_id, salary
FROM Employees
WHERE department_id = 20 AND status = 'Active';

Index-Only Scan: All required columns (department_id, status, salary, plus employee_id as implicit row identifier) are in the index. PostgreSQL returns results without touching data pages -- minimum possible I/O.

Heap Fetch required: first_name is not in the covering index, so the DBMS must also fetch the actual data page for each matching row -- adding random I/Os.

Design rule: For frequently executed queries, design covering indexes by including all projected columns. The extra index size is often worth the eliminated data I/Os.

10Common Mistakes

Creating Too Many Indexes

Indexing every column to maximise read performance

Every index must be updated on INSERT, UPDATE, and DELETE -- adding write overhead and disk space. Create indexes judiciously on columns frequently in WHERE, JOIN, ORDER BY, or GROUP BY clauses. Use EXPLAIN ANALYZE to confirm an index is actually used.

Confusing Clustered vs Non-Clustered Indexes

Assuming multiple clustered indexes are possible per table

A clustered index physically orders the data on disk -- only one is possible per table. All other indexes are non-clustered and store RID pointers. This distinction fundamentally impacts range query performance and update cost.

Wrong Composite Index Column Order

Putting a low-selectivity column first in a composite index

Index (A, B, C) supports queries on A, (A,B), or (A,B,C) -- but not B alone or (B,C). Place the most selective column (or the one most often in equality conditions) first. Skipping the leading column forces the optimizer to ignore the index entirely.

Using Hash Indexes for Range Queries

Expecting a hash index to speed up BETWEEN or range predicates

Hash indexes have no inherent key ordering. A range query requires scanning all buckets -- equivalent to a full table scan. Always use a B+Tree for range queries, ORDER BY, or any operation requiring sorted access.

Neglecting Buffer Pool Tuning

Leaving the buffer pool at default size for a production workload

An undersized buffer pool causes excessive cache misses (buffer thrashing), resulting in constant disk I/O. PostgreSQL's shared_buffers should typically be 25% of RAM. Monitor buffer hit ratios and tune the replacement policy for your workload type.

File Organization Mismatch with Access Patterns

Using heap files for tables with frequent unindexed range queries

Match the file organisation to the dominant access pattern: Heap for index-driven or insert-heavy workloads; Sequential for mostly-static tables with range queries on the ordering key; Hash for equality-only access. A mismatch causes O(N) full scans where O(log N) is possible.

Frequently Asked Questions

What is the main difference between a B-Tree and a B+Tree, and why is B+Tree preferred in databases?
In a B-Tree, data pointers can exist in both internal and leaf nodes. In a B+Tree, all data pointers reside only in leaf nodes -- internal nodes serve purely as a directory. B+Trees are preferred for three reasons: (1) leaf nodes are linked, enabling efficient range query traversal without returning to the root; (2) internal nodes hold only keys, giving a higher fan-out and shorter tree height; (3) all searches end uniformly at the leaf level, simplifying the implementation.
When should I choose a hash index over a B+Tree index?
Choose a hash index when your primary access pattern involves exact match equality queries (e.g., WHERE employee_id = 123). Hash indexes offer O(1) average-case lookup, which is faster than a B+Tree's O(log_f N). However, hash indexes provide no inherent ordering, making range queries (BETWEEN, >, <) require a full table scan. For any query involving order, ranges, or ORDER BY, a B+Tree is the correct choice.
What is a "covering index" and how does it improve query performance?
A covering index is a non-clustered index that includes all columns required by a query -- both in the WHERE clause and the SELECT list. It improves performance by allowing the database to satisfy the query entirely by reading only the index, without accessing the actual data pages. This eliminates potentially many random I/Os, which is especially significant for unclustered indexes where each data fetch involves a separate disk read.
What are the trade-offs of using a clustered index versus a non-clustered index?
A clustered index physically orders data records on disk in index key order, making range queries and sequential scans extremely fast since related records are co-located. Only one clustered index is allowed per table. Updates to the clustering key are expensive as they may require physically relocating records. A non-clustered index is a separate structure pointing to data via RIDs -- multiple can exist per table, supporting fast lookups on many attributes. However, each record retrieval may involve a separate random I/O to fetch the actual data page, unless the index is a covering index.
How does the buffer pool interact with disk I/O, and what is the role of a replacement policy?
The buffer pool is a cache in main memory holding copies of disk pages. When the DBMS needs a page, it first checks the buffer pool (a buffer hit avoids disk I/O). On a buffer miss, the page is fetched from disk and placed in a free frame. A replacement policy (LRU, CLOCK, MRU) determines which page to evict when the pool is full. The goal is to evict the page least likely to be needed again, maximising the buffer hit rate and minimising slow disk I/Os.
Why is I/O cost the dominant performance metric in disk-based databases?
Disk I/O is orders of magnitude slower than CPU operations or RAM access. An HDD seek takes 5-20ms while a CPU instruction takes nanoseconds -- a difference of over a million times. Even SSDs are 10-100x slower than RAM for random access. Because a single query can require thousands of page reads, minimising the number of disk I/O operations (especially random ones) is far more impactful than any CPU optimisation. This is why file organisations, indexes, and buffer management all focus on reducing I/O count.

Practice Quiz

Test your understanding of storage, indexing, and file structures -- select the correct answer for each question.

1.Which of the following disk characteristics is generally the most significant contributor to disk access time in traditional HDDs?

2.What is the primary advantage of a B+Tree over a B-Tree in a database context?

3.Which file organization is best suited for scenarios where records are primarily accessed via exact match equality queries and range queries are rare?

4.A query `SELECT name, email FROM Users WHERE status = 'Active';` could benefit most from which type of index if `status` is a low-cardinality column?

5.What is the purpose of the "dirty bit" in buffer pool management?

6.Which RAID level provides striping for performance but no data redundancy?

7.In the context of a B+Tree, what does "fan-out" (branching factor) refer to?

8.Which of the following statements about a clustered index is TRUE?

9.An index on (department_id, employee_id) would be most efficiently used by which query?

10.Which dynamic hashing technique uses a directory that doubles in size when a bucket split occurs at the global depth?

Study Tips

  • Draw the B+Tree manually: Practice inserting keys 1 through 20 into a B+Tree with order d=2. Trace each split and observe how the tree height grows. This builds intuition that no textbook explanation can replace.
  • Run EXPLAIN ANALYZE on your own queries: Set up a local PostgreSQL instance, create a table with 100,000 rows, and toggle indexes on/off. Observe the shift from "Seq Scan" to "Index Scan" and "Index Only Scan" in the output.
  • Memorise the I/O cost table: Heap scan = N pages. B+Tree search = h I/Os. Hash lookup = 1 I/O average. Range via B+Tree = h + K. These formulas appear in exams and interviews regularly.
  • Trace the extendible hashing animation step-by-step: Work through a bucket split on paper -- redraw the directory before and after. The directory doubling is the key insight that separates extendible hashing from linear hashing.

Related Topics