DatabaseCollege

SQL Basics

Structured Query Language (SQL) is the standard declarative language for managing data in Relational Database Management Systems (RDBMS). It allows you to define database structures, manipulate data, and control access — all by specifying what you want rather than how to get it.

This guide covers DDL (CREATE, ALTER, DROP), DML (INSERT, UPDATE, DELETE), DQL (SELECT with JOINs, aggregation, subqueries), constraints, SQL walkthroughs, and a 10-question practice quiz.

SQL Query Execution Flow

SELECT e.name, d.dept_name, AVG(e.salary)
FROM Employees e JOIN Departments d ON e.dept_id = d.dept_id
WHERE e.salary > 50000
GROUP BY d.dept_name
HAVING COUNT(e.id) > 2
ORDER BY AVG(e.salary) DESC
LIMIT 5
Click Play or Step Forward to trace query execution

Result Table

Step through to see intermediate results

Step through to see how SQL processes each clause in logical order.
Step 0 / 7
SQL processes clauses in logical order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. This differs from the written order of a query.

1Introduction

SQL is a domain-specific, declarative language designed for managing data held in a relational database. It adheres to the principles of the relational model, introduced by E.F. Codd in 1970, which organizes data into relations (tables) with attributes (columns) and tuples (rows). SQL has been standardized by ANSI and ISO since 1986, ensuring broad interoperability across database systems like PostgreSQL, MySQL, Oracle, and SQL Server.

In a university database curriculum, SQL bridges the gap between theoretical data organization (relational algebra, ER modeling) and practical data interaction. Its declarative nature means the query optimizer determines the most efficient execution plan — you focus on the logic, not the mechanics. Mastery of SQL is indispensable for data science, software engineering, database administration, and business intelligence.

In Practice

A major e-commerce platform relies on SQL databases to manage customer orders, product catalogs, and inventory. When a customer places an order, SQL INSERT statements add new records, UPDATE statements adjust inventory levels, and complex SELECT queries retrieve order details for real-time processing and analytical reporting.

Anatomy of a SQL statement showing SELECT, FROM, WHERE, and ORDER BY clauses

2Key Definitions

Essential terms for understanding SQL and relational databases at the university level.

DDL (Data Definition Language)

Commands to define/modify database structure: CREATE, ALTER, DROP

DML (Data Manipulation Language)

Commands to manipulate data: SELECT, INSERT, UPDATE, DELETE

SELECT

Retrieves data from one or more tables; the most-used SQL command

FROM

Specifies the table(s) to query; supports JOINs for combining tables

WHERE

Filters individual rows before grouping using boolean conditions

JOIN

Combines rows from two+ tables based on related columns (INNER, LEFT, RIGHT, FULL)

GROUP BY

Groups rows with same values so aggregate functions operate per group

HAVING

Filters groups after GROUP BY using aggregate conditions

ORDER BY

Sorts the result set by one or more columns (ASC or DESC)

Primary Key

Column(s) that uniquely identify each row; NOT NULL + UNIQUE

Foreign Key

Column(s) referencing another table's primary key; enforces referential integrity

NULL

Special marker for unknown/missing values; uses three-valued logic (TRUE, FALSE, UNKNOWN)

Index

Data structure (B+Tree, hash) that speeds up data retrieval at cost of storage

View

Virtual table based on a SELECT query; does not store data itself

3DDL: Data Definition Language

DDL commands define, modify, and drop the structure of database objects. These operations typically involve an implicit COMMIT, meaning changes are permanent and cannot be rolled back.

DDL: Building a Table Column by Column

CREATE TABLE defines the structure of a new table, specifying columns, data types, and constraints one at a time.

Click Play or Step Forward to begin the DDL walkthrough

SQL Statement

CREATE TABLE Employees (    employee_id   SERIAL PRIMARY KEY,    first_name    VARCHAR(50) NOT NULL,    last_name     VARCHAR(50) NOT NULL,    email         VARCHAR(100) UNIQUE,    hire_date     DATE DEFAULT CURRENT_DATE,    salary        DECIMAL(10,2) CHECK (salary >= 0),    department_id INT,    FOREIGN KEY (department_id)        REFERENCES Departments(department_id)        ON DELETE SET NULL        ON UPDATE CASCADE);

Table Structure

Employees
Press Play to start
Step through to see how CREATE TABLE builds a table structure column by column.
Step 0 / 10

ALTER TABLE: Modifying Table Structure

ALTER TABLE modifies existing table structure. ADD/DROP columns, change types, and manage constraints — all without recreating the table.

Click Play or Step Forward to begin

ALTER TABLE Commands

-- 1. Add a new columnALTER TABLE EmployeesADD COLUMN phone_number VARCHAR(20);-- 2. Add a self-referencing FK columnALTER TABLE EmployeesADD COLUMN manager_id INT    REFERENCES Employees(employee_id);-- 3. Change a column's data typeALTER TABLE EmployeesALTER COLUMN phone_number TYPE VARCHAR(25);-- 4. Drop a columnALTER TABLE EmployeesDROP COLUMN phone_number;-- 5. Add a constraintALTER TABLE EmployeesADD CONSTRAINT chk_salary_min    CHECK (salary >= 15000);

Table Structure

Employees6 columns
employee_idSERIALPK
first_nameVARCHAR(50)NOT NULL
last_nameVARCHAR(50)NOT NULL
emailVARCHAR(100)UNIQUE
salaryDECIMAL(10,2)
department_idINTFK
Step through to see how ALTER TABLE modifies an existing table structure.
Step 0 / 7

DROP TABLE vs TRUNCATE TABLE

DROP TABLE

  • Removes table structure AND data
  • Cannot be rolled back (DDL)
  • Very fast (removes metadata)
  • Fails if foreign keys reference it

TRUNCATE TABLE

  • Removes all rows, keeps structure
  • Cannot be rolled back (DDL)
  • Faster than DELETE for full table
  • Resets auto-increment counters

4DML: Data Manipulation Language

DML commands query and modify data within the database. These operations are transactional and can be rolled back before a COMMIT.

DML: Modifying Data Step by Step

DML commands (INSERT, UPDATE, DELETE) modify data within tables. These operations are transactional and can be rolled back.

Click Play or Step Forward to begin

DML Commands

-- 1. Single row insertINSERT INTO Employees (name, salary, dept)VALUES ('Alice', 75000, 'Engineering');-- 2. Multi-row insertINSERT INTO Employees (name, salary, dept) VALUES    ('Bob', 65000, 'Marketing'),    ('Carol', 80000, 'Engineering');-- 3. Insert another rowINSERT INTO Employees (name, salary, dept)VALUES ('Dave', 55000, 'Marketing');-- 4. Update specific rowUPDATE EmployeesSET salary = 72000WHERE name = 'Bob';-- 5. Update with different columnUPDATE EmployeesSET dept = 'Engineering'WHERE name = 'Dave';-- 6. Delete specific rowDELETE FROM EmployeesWHERE name = 'Carol';-- 7. Update multiple rows (10% raise)UPDATE Employees SET salary = salary * 1.10WHERE dept = 'Engineering';

Employees Table0 rows

IDNameSalaryDept
Press Play to start
Step through to see how INSERT, UPDATE, and DELETE modify table data.
Step 0 / 10

5DQL: SELECT Anatomy

The SELECT statement is the most frequently used SQL command. Understanding both its written syntax and logical execution order is essential.

SELECT Execution: Logical Pipeline

Click Play or Step Forward to begin the SELECT pipeline walkthrough
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_sal
FROM Employees
WHERE hire_year >= 2020
GROUP BY department
HAVING COUNT(*) >= 2
ORDER BY avg_sal DESC
LIMIT 2;
IDNameSalaryDepartmenthire_year
1Alice75,000Engineering2021
2Bob65,000Marketing2019
3Carol80,000Engineering2022
4Dave55,000Sales2020
5Eve90,000Engineering2023
6Frank60,000Marketing2021
7Grace70,000Sales2018
Step through to see how a SELECT query processes data through each clause in logical execution order.
Step 0 / 9
SQL executes in logical order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. Understanding this order is key to writing correct queries.

Written Order vs Logical Execution Order

Written Order (Syntax)

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY
  7. LIMIT

Logical Execution Order

  1. FROM (identify tables, joins)
  2. WHERE (filter rows)
  3. GROUP BY (group rows)
  4. HAVING (filter groups)
  5. SELECT (project columns)
  6. ORDER BY (sort results)
  7. LIMIT (restrict output)

JOIN Operations: Step by Step

JOINs combine rows from two or more tables based on a related column.

Click Play or Step Forward to begin

SQL Queries

-- 1. INNER JOIN: only matching rowsSELECT e.Name, d.dept_nameFROM Employees eINNER JOIN Departments d ON e.dept_id = d.dept_id;-- 2. LEFT JOIN: all left rows + matchesSELECT e.Name, d.dept_nameFROM Employees eLEFT JOIN Departments d ON e.dept_id = d.dept_id;-- 3. FULL OUTER JOIN: all rows from bothSELECT e.Name, d.dept_nameFROM Employees eFULL OUTER JOIN Departments d ON e.dept_id = d.dept_id;

Source Tables

Employees
IDNamedept_id
1Alice10
2Bob20
3Carol30
4DaveNULL
Departments
dept_iddept_name
10Engineering
20Marketing
40Sales

Result0 rows

Namedept_name
Press Play to start
Step through to see how INNER JOIN, LEFT JOIN, and FULL OUTER JOIN combine rows from two tables.
Step 0 / 12

Aggregate Functions & GROUP BY

Aggregate functions perform calculations on groups of rows. Combined with GROUP BY and HAVING, they enable powerful analytical queries.

Click Play or Step Forward to begin

SQL Query

SELECT    department,    COUNT(*) AS emp_count,    AVG(salary) AS avg_salary,    MAX(salary) AS max_salaryFROM EmployeesGROUP BY departmentHAVING COUNT(*) >= 2ORDER BY avg_salary DESC;

Employees Table6 rows

IDNameSalaryDept
1Alice75,000Engineering
2Bob65,000Marketing
3Carol80,000Engineering
4Dave55,000Sales
5Eve90,000Engineering
6Frank60,000Marketing
Step through to see how GROUP BY, aggregate functions, and HAVING work together.
Step 0 / 10

Subqueries: Scalar vs Correlated

A subquery is a query nested inside another. Correlated subqueries reference the outer query and re-execute for each row.

Click Play or Step Forward to begin

SQL Queries

-- 1. Scalar subquery: above overall averageSELECT name, salaryFROM EmployeesWHERE salary > (SELECT AVG(salary) FROM Employees);-- 2. Correlated subquery: above dept averageSELECT e.name, e.salary, e.deptFROM Employees eWHERE e.salary > (    SELECT AVG(e2.salary) FROM Employees e2    WHERE e2.dept = e.dept);

Employees Table5 rows

IDNameSalaryDept
1Alice75,000Eng
2Bob55,000Mkt
3Carol80,000Eng
4Dave60,000Mkt
5Eve90,000Eng
Step through to see how scalar and correlated subqueries execute.
Step 0 / 12

6Constraints

Constraints are rules enforced on table columns to ensure data integrity and consistency. They prevent invalid data from entering the database.

Constraints: Validating INSERT Operations

Constraints enforce data integrity rules at the database level. Invalid operations are rejected before they can corrupt the data.

Click Play or Step Forward to begin

INSERT Statements

-- 1. Valid rowINSERT INTO Employees VALUES (1, 'Alice', '[email protected]', 75000, 1);-- 2. Valid rowINSERT INTO Employees VALUES (2, 'Bob', '[email protected]', 65000, 2);-- 3. NOT NULL violationINSERT INTO Employees VALUES (3, NULL, '[email protected]', 80000, 1);-- 4. UNIQUE violationINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', 80000, 1);-- 5. CHECK violationINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', -5000, 1);-- 6. FK violationINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', 80000, 999);-- 7. Valid rowINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', 80000, 1);-- 8. NULL FK (allowed)INSERT INTO Employees VALUES (4, 'Dave', '[email protected]', 60000, NULL);

Employees Table0 rows

PKNOT NULLUNIQUECHECK(>=0)FK
idnameemailsalarydept_id
Press Play to start
Step through to see how SQL constraints validate each INSERT operation.
Step 0 / 10

NOT NULL

Ensures a column cannot store NULL values. Every row must have a value for this column.

UNIQUE

Ensures all values in a column are distinct. Allows NULL (typically once).

CHECK

Ensures values satisfy a boolean condition. Example: CHECK (salary >= 0)

DEFAULT

Provides a default value when no value is specified during INSERT.

PRIMARY KEY

NOT NULL + UNIQUE. Uniquely identifies each row. One per table.

FOREIGN KEY

References another table's primary key. Enforces referential integrity with ON DELETE/ON UPDATE actions.

Foreign Key Actions

FOREIGN KEY (department_id)
    REFERENCES Departments(department_id)
    ON DELETE CASCADE      -- Delete child rows when parent deleted
    ON UPDATE CASCADE;     -- Update child FKs when parent PK changes

-- Other options:
-- ON DELETE SET NULL      -- Set FK to NULL when parent deleted
-- ON DELETE RESTRICT      -- Prevent parent deletion if children exist
-- ON DELETE SET DEFAULT   -- Set FK to default value

7SQL Walkthroughs

Walkthrough 1: Basic Selection & Filtering

Retrieve employees earning over $60,000 hired after 2022, ordered by salary descending.

Step 1/8:Source Table

SQL Query

SELECT    first_name,    last_name,    salaryFROM    EmployeesWHERE    salary > 60000    AND hire_date > '2022-01-01'ORDER BY    salary DESC;

Employees Table6 of 6 rows

first_namelast_namesalaryhire_date
AliceJohnson75,0002023-03-15
BobSmith55,0002021-06-01
CarolWilliams80,0002022-08-20
DaveBrown45,0002023-01-10
EveDavis90,0002022-11-05
FrankMiller62,0002020-04-22
The Employees table contains 6 rows. We will walk through each SQL clause to see how the result set is built.
Step 1 / 8

Walkthrough 2: JOIN + Aggregation

Calculate average salary per department with more than 3 employees.

Click Play or Step Forward to begin

SQL Query

SELECT    d.department_name,    AVG(e.salary) AS average_salaryFROM    Employees eINNER JOIN    Departments d ON e.department_id = d.department_idGROUP BY    d.department_nameHAVING    COUNT(e.employee_id) > 3ORDER BY    average_salary DESC;

Employees

idnamesalarydept_id
1Alice75,0001
2Bob65,0002
3Carol80,0001
4Dave55,0002
5Eve90,0001
6Frank60,0001
7Grace70,0003

Departments

dept_iddepartment_name
1Engineering
2Marketing
3Sales
Step through to see how JOIN, GROUP BY, HAVING, and ORDER BY process data.
Step 0 / 9

Walkthrough 3: Correlated Subquery

Find employees whose salary exceeds their department's average.

Click Play or Step Forward to begin

SQL Query

SELECT    e.first_name,    e.last_name,    e.salary,    e.department_idFROM    Employees eWHERE    e.salary > (        SELECT AVG(e2.salary)        FROM Employees e2        WHERE e2.department_id = e.department_id    );

Employees TableSource - 5 rows

IDFirstLastSalaryDept
1AliceJohnson75,0001
2BobSmith65,0002
3CarolWilliams80,0001
4DaveBrown55,0002
5EveDavis90,0001
Step through to see how a correlated subquery evaluates each row against its department average.
Step 0 / 10

Walkthrough 4: Window Function (DENSE_RANK)

Find the top 2 highest-paid employees in each department using a CTE.

Click Play or Step Forward to begin

SQL Query

WITH RankedEmployees AS (    SELECT        e.first_name,        e.salary,        d.department_name,        DENSE_RANK() OVER (            PARTITION BY e.department_id            ORDER BY e.salary DESC        ) AS rank    FROM Employees e    INNER JOIN Departments d        ON e.department_id = d.department_id)SELECT first_name, salary,       department_name, rankFROM RankedEmployeesWHERE rank <= 2ORDER BY department_name, rank;

Data6 rows

NameSalaryDept
Alice75,000Engineering
Carol80,000Engineering
Eve90,000Engineering
Frank60,000Engineering
Bob65,000Marketing
Dave55,000Marketing
Step through to see how a CTE with DENSE_RANK finds the top-N per group.
Step 0 / 10

8Memory Aids

SQL Clause Execution Order

“FROM WHERE GROUP HAVING SELECT ORDER LIMIT” — “Friendly Waiters Graciously Help Serve Ordered Lunches.”

WHERE vs HAVING

“WHERE filters rows (before grouping). HAVING filters groups (after grouping). WHERE = individual bouncer. HAVING = group bouncer.”

JOIN Types

“INNER = intersection (only matches). LEFT = all left + matches right. FULL = everything with NULLs for gaps.”

NULL Comparisons

“Never use = NULL. Always use IS NULL. Think of NULL as 'I don't know' — you can't compare unknowns.”

PRIMARY KEY vs FOREIGN KEY

“Primary Key = your SSN (uniquely identifies you). Foreign Key = your address pointing to a house in the Houses table.”

DDL vs DML

“DDL = building the house (CREATE, ALTER, DROP). DML = arranging the furniture inside (INSERT, UPDATE, DELETE, SELECT).”

9Common Mistakes

Using = NULL instead of IS NULL

Wrong: WHERE manager_id = NULL

Correct: WHERE manager_id IS NULL. Comparisons with NULL using = always evaluate to UNKNOWN, never TRUE. This is the most common SQL pitfall.

Using HAVING instead of WHERE

Wrong: HAVING salary > 50000 (non-aggregate filter)

Correct: Use WHERE for row-level filters (before grouping) and HAVING only for aggregate conditions. Using HAVING for row filtering may work but is inefficient — it forces grouping before filtering.

SELECT columns not in GROUP BY

Wrong: SELECT name, department_id, AVG(salary) GROUP BY department_id

Correct: Every column in SELECT must either appear in GROUP BY or be inside an aggregate function. The database cannot determine which “name” to show for a group of multiple employees.

Forgetting ON clause in JOINs

Wrong: FROM Employees e INNER JOIN Departments d (no ON)

Correct: Always specify the join condition with ON. Without it, you get a CROSS JOIN (Cartesian product), which produces N x M rows and almost certainly incorrect results.

UPDATE or DELETE without WHERE

Dangerous: DELETE FROM Employees; (deletes ALL rows)

Correct: Always include a WHERE clause with UPDATE and DELETE unless you intentionally want to affect every row. Run a SELECT with the same WHERE first to verify which rows will be affected.

Confusing COUNT(*) with COUNT(column)

Wrong assumption: COUNT(manager_id) counts all employees

Correct: COUNT(column) ignores NULL values. If 3 out of 10 employees have NULL manager_id, COUNT(manager_id) returns 7, not 10. Use COUNT(*) to count all rows.

Using DISTINCT to hide a bad JOIN

Wrong pattern: Adding SELECT DISTINCT to fix duplicate rows from a query

Correct: If your JOIN produces unexpected duplicates, your join condition is likely wrong or you are missing a condition. DISTINCT masks the real problem — fix the JOIN instead.

Using aliases in WHERE before SELECT executes

Wrong: WHERE average_salary > 50000 (alias from SELECT)

Correct: WHERE is processed before SELECT in execution order, so column aliases defined in SELECT are not yet available. Repeat the expression or use a subquery/CTE.

Frequently Asked Questions

What is the difference between DDL and DML in SQL?
DDL (Data Definition Language) defines and modifies database structure — commands like CREATE TABLE, ALTER TABLE, and DROP TABLE. DML (Data Manipulation Language) works with the data itself — SELECT to query, INSERT to add, UPDATE to modify, and DELETE to remove rows. DDL changes are typically auto-committed and cannot be rolled back, while DML operations participate in transactions.
When should I use GROUP BY vs HAVING?
GROUP BY groups rows that share common values so aggregate functions (COUNT, AVG, SUM, etc.) can operate on each group. HAVING filters those groups after aggregation. Use WHERE to filter individual rows before grouping, and HAVING to filter groups after. For example: "departments with more than 5 employees" requires GROUP BY department_id HAVING COUNT(*) > 5.
How does SQL handle NULL values?
NULL represents an unknown or missing value. SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Any comparison with NULL (even NULL = NULL) returns UNKNOWN. Use IS NULL or IS NOT NULL to test for NULLs. Aggregate functions like COUNT(column) ignore NULLs, while COUNT(*) counts all rows. The COALESCE function returns the first non-NULL value from a list of expressions.
What is the difference between INNER JOIN and LEFT JOIN?
INNER JOIN returns only rows that have matching values in both tables — unmatched rows are excluded. LEFT JOIN (LEFT OUTER JOIN) returns all rows from the left table and matching rows from the right table. If no match exists on the right, NULL values fill in. Use LEFT JOIN when you need all records from one table regardless of matches (e.g., all customers, even those with no orders).
What is the difference between DELETE, TRUNCATE, and DROP?
DELETE removes specific rows (uses WHERE) and can be rolled back within a transaction. TRUNCATE removes all rows but keeps the table structure — it is faster because it deallocates data pages rather than logging individual row deletions, and it resets auto-increment counters. DROP removes the entire table including its structure, data, indexes, and constraints. TRUNCATE and DROP are DDL commands and typically cannot be rolled back.
What are window functions and when should I use them?
Window functions perform calculations across a set of rows related to the current row without collapsing them into groups (unlike GROUP BY). They use the OVER clause with optional PARTITION BY and ORDER BY. Common uses include ranking (ROW_NUMBER, RANK, DENSE_RANK), running totals (SUM OVER), and moving averages. Use them for "top-N per group" queries, cumulative calculations, or when you need both detail and aggregate data in the same result set.

Practice Quiz

Test your understanding of SQL basics — select the correct answer for each question.

1.Which SQL sublanguage is used to define, modify, or drop database objects like tables and indexes?

2.What is the primary difference between WHERE and HAVING clauses?

3.What does an INNER JOIN return?

4.What is the correct logical order of SQL clause execution?

5.What happens when you compare a value to NULL using the = operator?

6.What is the difference between DROP TABLE and TRUNCATE TABLE?

7.Which aggregate function counts all rows including those with NULL values?

8.What does the ON DELETE CASCADE option on a foreign key constraint do?

9.What is a correlated subquery?

10.What does the DENSE_RANK() window function do differently from RANK()?

Study Tips

  • Practice on real data: Set up PostgreSQL or SQLite locally and write queries against sample datasets like the classic Employees/Departments schema.
  • Trace execution order: For every SELECT query, mentally walk through FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY to understand what happens at each step.
  • Draw Venn diagrams for JOINs: Visualize INNER, LEFT, RIGHT, and FULL JOINs as overlapping circles to build intuition for which rows are included.
  • Test edge cases with NULLs: Insert NULL values and observe how they behave with comparisons, aggregates, and JOINs.

Related Topics