SQL Basics
Structured Query Language (SQL) is the standard declarative language for managing data in Relational Database Management Systems (RDBMS). It allows you to define database structures, manipulate data, and control access — all by specifying what you want rather than how to get it.
This guide covers DDL (CREATE, ALTER, DROP), DML (INSERT, UPDATE, DELETE), DQL (SELECT with JOINs, aggregation, subqueries), constraints, SQL walkthroughs, and a 10-question practice quiz.
SQL Query Execution Flow
Result Table
Step through to see intermediate results
1Introduction
SQL is a domain-specific, declarative language designed for managing data held in a relational database. It adheres to the principles of the relational model, introduced by E.F. Codd in 1970, which organizes data into relations (tables) with attributes (columns) and tuples (rows). SQL has been standardized by ANSI and ISO since 1986, ensuring broad interoperability across database systems like PostgreSQL, MySQL, Oracle, and SQL Server.
In a university database curriculum, SQL bridges the gap between theoretical data organization (relational algebra, ER modeling) and practical data interaction. Its declarative nature means the query optimizer determines the most efficient execution plan — you focus on the logic, not the mechanics. Mastery of SQL is indispensable for data science, software engineering, database administration, and business intelligence.
A major e-commerce platform relies on SQL databases to manage customer orders, product catalogs, and inventory. When a customer places an order, SQL INSERT statements add new records, UPDATE statements adjust inventory levels, and complex SELECT queries retrieve order details for real-time processing and analytical reporting.

2Key Definitions
Essential terms for understanding SQL and relational databases at the university level.
DDL (Data Definition Language)
Commands to define/modify database structure: CREATE, ALTER, DROP
DML (Data Manipulation Language)
Commands to manipulate data: SELECT, INSERT, UPDATE, DELETE
SELECT
Retrieves data from one or more tables; the most-used SQL command
FROM
Specifies the table(s) to query; supports JOINs for combining tables
WHERE
Filters individual rows before grouping using boolean conditions
JOIN
Combines rows from two+ tables based on related columns (INNER, LEFT, RIGHT, FULL)
GROUP BY
Groups rows with same values so aggregate functions operate per group
HAVING
Filters groups after GROUP BY using aggregate conditions
ORDER BY
Sorts the result set by one or more columns (ASC or DESC)
Primary Key
Column(s) that uniquely identify each row; NOT NULL + UNIQUE
Foreign Key
Column(s) referencing another table's primary key; enforces referential integrity
NULL
Special marker for unknown/missing values; uses three-valued logic (TRUE, FALSE, UNKNOWN)
Index
Data structure (B+Tree, hash) that speeds up data retrieval at cost of storage
View
Virtual table based on a SELECT query; does not store data itself
3DDL: Data Definition Language
DDL commands define, modify, and drop the structure of database objects. These operations typically involve an implicit COMMIT, meaning changes are permanent and cannot be rolled back.
DDL: Building a Table Column by Column
CREATE TABLE defines the structure of a new table, specifying columns, data types, and constraints one at a time.
SQL Statement
CREATE TABLE Employees ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(100) UNIQUE, hire_date DATE DEFAULT CURRENT_DATE, salary DECIMAL(10,2) CHECK (salary >= 0), department_id INT, FOREIGN KEY (department_id) REFERENCES Departments(department_id) ON DELETE SET NULL ON UPDATE CASCADE);
Table Structure
ALTER TABLE: Modifying Table Structure
ALTER TABLE modifies existing table structure. ADD/DROP columns, change types, and manage constraints — all without recreating the table.
ALTER TABLE Commands
-- 1. Add a new columnALTER TABLE EmployeesADD COLUMN phone_number VARCHAR(20);-- 2. Add a self-referencing FK columnALTER TABLE EmployeesADD COLUMN manager_id INT REFERENCES Employees(employee_id);-- 3. Change a column's data typeALTER TABLE EmployeesALTER COLUMN phone_number TYPE VARCHAR(25);-- 4. Drop a columnALTER TABLE EmployeesDROP COLUMN phone_number;-- 5. Add a constraintALTER TABLE EmployeesADD CONSTRAINT chk_salary_min CHECK (salary >= 15000);
Table Structure
DROP TABLE vs TRUNCATE TABLE
DROP TABLE
- Removes table structure AND data
- Cannot be rolled back (DDL)
- Very fast (removes metadata)
- Fails if foreign keys reference it
TRUNCATE TABLE
- Removes all rows, keeps structure
- Cannot be rolled back (DDL)
- Faster than DELETE for full table
- Resets auto-increment counters
4DML: Data Manipulation Language
DML commands query and modify data within the database. These operations are transactional and can be rolled back before a COMMIT.
DML: Modifying Data Step by Step
DML commands (INSERT, UPDATE, DELETE) modify data within tables. These operations are transactional and can be rolled back.
DML Commands
-- 1. Single row insertINSERT INTO Employees (name, salary, dept)VALUES ('Alice', 75000, 'Engineering');-- 2. Multi-row insertINSERT INTO Employees (name, salary, dept) VALUES ('Bob', 65000, 'Marketing'), ('Carol', 80000, 'Engineering');-- 3. Insert another rowINSERT INTO Employees (name, salary, dept)VALUES ('Dave', 55000, 'Marketing');-- 4. Update specific rowUPDATE EmployeesSET salary = 72000WHERE name = 'Bob';-- 5. Update with different columnUPDATE EmployeesSET dept = 'Engineering'WHERE name = 'Dave';-- 6. Delete specific rowDELETE FROM EmployeesWHERE name = 'Carol';-- 7. Update multiple rows (10% raise)UPDATE Employees SET salary = salary * 1.10WHERE dept = 'Engineering';
Employees Table0 rows
5DQL: SELECT Anatomy
The SELECT statement is the most frequently used SQL command. Understanding both its written syntax and logical execution order is essential.
SELECT Execution: Logical Pipeline
| ID | Name | Salary | Department | hire_year |
|---|---|---|---|---|
| 1 | Alice | 75,000 | Engineering | 2021 |
| 2 | Bob | 65,000 | Marketing | 2019 |
| 3 | Carol | 80,000 | Engineering | 2022 |
| 4 | Dave | 55,000 | Sales | 2020 |
| 5 | Eve | 90,000 | Engineering | 2023 |
| 6 | Frank | 60,000 | Marketing | 2021 |
| 7 | Grace | 70,000 | Sales | 2018 |
Written Order vs Logical Execution Order
Written Order (Syntax)
- SELECT
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY
- LIMIT
Logical Execution Order
- FROM (identify tables, joins)
- WHERE (filter rows)
- GROUP BY (group rows)
- HAVING (filter groups)
- SELECT (project columns)
- ORDER BY (sort results)
- LIMIT (restrict output)
JOIN Operations: Step by Step
JOINs combine rows from two or more tables based on a related column.
SQL Queries
-- 1. INNER JOIN: only matching rowsSELECT e.Name, d.dept_nameFROM Employees eINNER JOIN Departments d ON e.dept_id = d.dept_id;-- 2. LEFT JOIN: all left rows + matchesSELECT e.Name, d.dept_nameFROM Employees eLEFT JOIN Departments d ON e.dept_id = d.dept_id;-- 3. FULL OUTER JOIN: all rows from bothSELECT e.Name, d.dept_nameFROM Employees eFULL OUTER JOIN Departments d ON e.dept_id = d.dept_id;
Source Tables
Result0 rows
Aggregate Functions & GROUP BY
Aggregate functions perform calculations on groups of rows. Combined with GROUP BY and HAVING, they enable powerful analytical queries.
SQL Query
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary, MAX(salary) AS max_salaryFROM EmployeesGROUP BY departmentHAVING COUNT(*) >= 2ORDER BY avg_salary DESC;
Employees Table6 rows
Subqueries: Scalar vs Correlated
A subquery is a query nested inside another. Correlated subqueries reference the outer query and re-execute for each row.
SQL Queries
-- 1. Scalar subquery: above overall averageSELECT name, salaryFROM EmployeesWHERE salary > (SELECT AVG(salary) FROM Employees);-- 2. Correlated subquery: above dept averageSELECT e.name, e.salary, e.deptFROM Employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM Employees e2 WHERE e2.dept = e.dept);
Employees Table5 rows
6Constraints
Constraints are rules enforced on table columns to ensure data integrity and consistency. They prevent invalid data from entering the database.
Constraints: Validating INSERT Operations
Constraints enforce data integrity rules at the database level. Invalid operations are rejected before they can corrupt the data.
INSERT Statements
-- 1. Valid rowINSERT INTO Employees VALUES (1, 'Alice', '[email protected]', 75000, 1);-- 2. Valid rowINSERT INTO Employees VALUES (2, 'Bob', '[email protected]', 65000, 2);-- 3. NOT NULL violationINSERT INTO Employees VALUES (3, NULL, '[email protected]', 80000, 1);-- 4. UNIQUE violationINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', 80000, 1);-- 5. CHECK violationINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', -5000, 1);-- 6. FK violationINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', 80000, 999);-- 7. Valid rowINSERT INTO Employees VALUES (3, 'Carol', '[email protected]', 80000, 1);-- 8. NULL FK (allowed)INSERT INTO Employees VALUES (4, 'Dave', '[email protected]', 60000, NULL);
Employees Table0 rows
NOT NULL
Ensures a column cannot store NULL values. Every row must have a value for this column.
UNIQUE
Ensures all values in a column are distinct. Allows NULL (typically once).
CHECK
Ensures values satisfy a boolean condition. Example: CHECK (salary >= 0)
DEFAULT
Provides a default value when no value is specified during INSERT.
PRIMARY KEY
NOT NULL + UNIQUE. Uniquely identifies each row. One per table.
FOREIGN KEY
References another table's primary key. Enforces referential integrity with ON DELETE/ON UPDATE actions.
Foreign Key Actions
FOREIGN KEY (department_id)
REFERENCES Departments(department_id)
ON DELETE CASCADE -- Delete child rows when parent deleted
ON UPDATE CASCADE; -- Update child FKs when parent PK changes
-- Other options:
-- ON DELETE SET NULL -- Set FK to NULL when parent deleted
-- ON DELETE RESTRICT -- Prevent parent deletion if children exist
-- ON DELETE SET DEFAULT -- Set FK to default value7SQL Walkthroughs
Walkthrough 1: Basic Selection & Filtering
Retrieve employees earning over $60,000 hired after 2022, ordered by salary descending.
SQL Query
SELECT first_name, last_name, salaryFROM EmployeesWHERE salary > 60000 AND hire_date > '2022-01-01'ORDER BY salary DESC;
Employees Table6 of 6 rows
Walkthrough 2: JOIN + Aggregation
Calculate average salary per department with more than 3 employees.
SQL Query
SELECT d.department_name, AVG(e.salary) AS average_salaryFROM Employees eINNER JOIN Departments d ON e.department_id = d.department_idGROUP BY d.department_nameHAVING COUNT(e.employee_id) > 3ORDER BY average_salary DESC;
Employees
Departments
Walkthrough 3: Correlated Subquery
Find employees whose salary exceeds their department's average.
SQL Query
SELECT e.first_name, e.last_name, e.salary, e.department_idFROM Employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM Employees e2 WHERE e2.department_id = e.department_id );
Employees TableSource - 5 rows
Walkthrough 4: Window Function (DENSE_RANK)
Find the top 2 highest-paid employees in each department using a CTE.
SQL Query
WITH RankedEmployees AS ( SELECT e.first_name, e.salary, d.department_name, DENSE_RANK() OVER ( PARTITION BY e.department_id ORDER BY e.salary DESC ) AS rank FROM Employees e INNER JOIN Departments d ON e.department_id = d.department_id)SELECT first_name, salary, department_name, rankFROM RankedEmployeesWHERE rank <= 2ORDER BY department_name, rank;
Data6 rows
8Memory Aids
“FROM WHERE GROUP HAVING SELECT ORDER LIMIT” — “Friendly Waiters Graciously Help Serve Ordered Lunches.”
“WHERE filters rows (before grouping). HAVING filters groups (after grouping). WHERE = individual bouncer. HAVING = group bouncer.”
“INNER = intersection (only matches). LEFT = all left + matches right. FULL = everything with NULLs for gaps.”
“Never use = NULL. Always use IS NULL. Think of NULL as 'I don't know' — you can't compare unknowns.”
“Primary Key = your SSN (uniquely identifies you). Foreign Key = your address pointing to a house in the Houses table.”
“DDL = building the house (CREATE, ALTER, DROP). DML = arranging the furniture inside (INSERT, UPDATE, DELETE, SELECT).”
9Common Mistakes
Wrong: WHERE manager_id = NULL
Correct: WHERE manager_id IS NULL. Comparisons with NULL using = always evaluate to UNKNOWN, never TRUE. This is the most common SQL pitfall.
Wrong: HAVING salary > 50000 (non-aggregate filter)
Correct: Use WHERE for row-level filters (before grouping) and HAVING only for aggregate conditions. Using HAVING for row filtering may work but is inefficient — it forces grouping before filtering.
Wrong: SELECT name, department_id, AVG(salary) GROUP BY department_id
Correct: Every column in SELECT must either appear in GROUP BY or be inside an aggregate function. The database cannot determine which “name” to show for a group of multiple employees.
Wrong: FROM Employees e INNER JOIN Departments d (no ON)
Correct: Always specify the join condition with ON. Without it, you get a CROSS JOIN (Cartesian product), which produces N x M rows and almost certainly incorrect results.
Dangerous: DELETE FROM Employees; (deletes ALL rows)
Correct: Always include a WHERE clause with UPDATE and DELETE unless you intentionally want to affect every row. Run a SELECT with the same WHERE first to verify which rows will be affected.
Wrong assumption: COUNT(manager_id) counts all employees
Correct: COUNT(column) ignores NULL values. If 3 out of 10 employees have NULL manager_id, COUNT(manager_id) returns 7, not 10. Use COUNT(*) to count all rows.
Wrong pattern: Adding SELECT DISTINCT to fix duplicate rows from a query
Correct: If your JOIN produces unexpected duplicates, your join condition is likely wrong or you are missing a condition. DISTINCT masks the real problem — fix the JOIN instead.
Wrong: WHERE average_salary > 50000 (alias from SELECT)
Correct: WHERE is processed before SELECT in execution order, so column aliases defined in SELECT are not yet available. Repeat the expression or use a subquery/CTE.
Frequently Asked Questions
- What is the difference between DDL and DML in SQL?
- DDL (Data Definition Language) defines and modifies database structure — commands like CREATE TABLE, ALTER TABLE, and DROP TABLE. DML (Data Manipulation Language) works with the data itself — SELECT to query, INSERT to add, UPDATE to modify, and DELETE to remove rows. DDL changes are typically auto-committed and cannot be rolled back, while DML operations participate in transactions.
- When should I use GROUP BY vs HAVING?
- GROUP BY groups rows that share common values so aggregate functions (COUNT, AVG, SUM, etc.) can operate on each group. HAVING filters those groups after aggregation. Use WHERE to filter individual rows before grouping, and HAVING to filter groups after. For example: "departments with more than 5 employees" requires GROUP BY department_id HAVING COUNT(*) > 5.
- How does SQL handle NULL values?
- NULL represents an unknown or missing value. SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Any comparison with NULL (even NULL = NULL) returns UNKNOWN. Use IS NULL or IS NOT NULL to test for NULLs. Aggregate functions like COUNT(column) ignore NULLs, while COUNT(*) counts all rows. The COALESCE function returns the first non-NULL value from a list of expressions.
- What is the difference between INNER JOIN and LEFT JOIN?
- INNER JOIN returns only rows that have matching values in both tables — unmatched rows are excluded. LEFT JOIN (LEFT OUTER JOIN) returns all rows from the left table and matching rows from the right table. If no match exists on the right, NULL values fill in. Use LEFT JOIN when you need all records from one table regardless of matches (e.g., all customers, even those with no orders).
- What is the difference between DELETE, TRUNCATE, and DROP?
- DELETE removes specific rows (uses WHERE) and can be rolled back within a transaction. TRUNCATE removes all rows but keeps the table structure — it is faster because it deallocates data pages rather than logging individual row deletions, and it resets auto-increment counters. DROP removes the entire table including its structure, data, indexes, and constraints. TRUNCATE and DROP are DDL commands and typically cannot be rolled back.
- What are window functions and when should I use them?
- Window functions perform calculations across a set of rows related to the current row without collapsing them into groups (unlike GROUP BY). They use the OVER clause with optional PARTITION BY and ORDER BY. Common uses include ranking (ROW_NUMBER, RANK, DENSE_RANK), running totals (SUM OVER), and moving averages. Use them for "top-N per group" queries, cumulative calculations, or when you need both detail and aggregate data in the same result set.
Practice Quiz
Test your understanding of SQL basics — select the correct answer for each question.
1.Which SQL sublanguage is used to define, modify, or drop database objects like tables and indexes?
2.What is the primary difference between WHERE and HAVING clauses?
3.What does an INNER JOIN return?
4.What is the correct logical order of SQL clause execution?
5.What happens when you compare a value to NULL using the = operator?
6.What is the difference between DROP TABLE and TRUNCATE TABLE?
7.Which aggregate function counts all rows including those with NULL values?
8.What does the ON DELETE CASCADE option on a foreign key constraint do?
9.What is a correlated subquery?
10.What does the DENSE_RANK() window function do differently from RANK()?
Study Tips
- Practice on real data: Set up PostgreSQL or SQLite locally and write queries against sample datasets like the classic Employees/Departments schema.
- Trace execution order: For every SELECT query, mentally walk through FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY to understand what happens at each step.
- Draw Venn diagrams for JOINs: Visualize INNER, LEFT, RIGHT, and FULL JOINs as overlapping circles to build intuition for which rows are included.
- Test edge cases with NULLs: Insert NULL values and observe how they behave with comparisons, aggregates, and JOINs.