Normal Distribution
The Normal Distribution, often called the Gaussian distribution, is one of the most important and widely used continuous probability distributions in statistics. Its symmetric, bell-shaped curve is a cornerstone of inferential statistics.
This guide covers key definitions, the standard normal distribution, Z-scores, the Empirical Rule (68-95-99.7), probability calculations, worked examples, memory aids, and a practice quiz.
1. Introduction
The Normal Distribution matters because many natural phenomena and statistical processes follow it, at least approximately. Understanding it allows us to model, analyze, and make predictions about data.
It is defined by two parameters: its mean (μ) and its standard deviation (σ). The mean determines the center of the bell curve, while the standard deviation controls how spread out the data is around that center.
Imagine measuring the heights of every student in your school. Most students cluster around the average height, with fewer and fewer students being very short or very tall. If you plotted all these heights on a histogram, the shape would form a smooth, symmetric bell curve -- a normal distribution.
Why It Matters
Natural Phenomena
Many natural measurements (heights, blood pressure, measurement errors, test scores) are approximately normally distributed.
Statistical Inference
It is the basis for many inference methods including confidence intervals and hypothesis testing.
Central Limit Theorem
The sampling distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the population's shape, provided the population has a finite variance.
Probability Calculations
Allows us to calculate the probability of observing values within any given range using Z-scores and standard tables.
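The Central Limit Theorem point above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the sample size of 50, the 5000 repetitions, and the helper names `sample_means` and `draw_exponential` are all illustrative choices, not part of the guide. It also relies on the standard result that the standard deviation of the sample mean is σ/√n.

```python
import random
import statistics

def sample_means(draw, sample_size, num_samples, rng):
    """Return num_samples sample means, each computed from sample_size draws."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

def draw_exponential(rng):
    # Assumed population: exponential with rate 1 (right-skewed, mean 1, std dev 1).
    return rng.expovariate(1.0)

rng = random.Random(42)  # fixed seed so the run is repeatable
means = sample_means(draw_exponential, sample_size=50, num_samples=5000, rng=rng)

center = statistics.fmean(means)   # should land near the population mean, 1
spread = statistics.stdev(means)   # should land near 1 / sqrt(50), about 0.14
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means:    {center:.3f}")
print(f"std dev of sample means: {spread:.3f}")
print(f"share within 1 std dev:  {within_one_sd:.3f}")
```

Even though the population is strongly skewed, the share of sample means within one standard deviation of their center should come out close to the 68% expected for a bell curve.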
2. Key Definitions
Normal Distribution
A continuous probability distribution that is symmetric about its mean, forming a bell-shaped curve. Defined by two parameters: mean (μ) and standard deviation (σ).
Bell Curve
The graphical representation of the normal distribution. It is symmetric, with the highest point at the mean, and tails that extend indefinitely but never touch the x-axis.
Mean (μ)
Pronounced "mu." The central tendency of the distribution. It marks the location of the peak of the bell curve and, in a normal distribution, also equals the median and the mode.
Standard Deviation (σ)
Pronounced "sigma." A measure of the spread or variability of data around the mean. Larger σ means wider, flatter curve; smaller σ means narrower, taller curve.
Z-score (Standard Score)
A measure of how many standard deviations a data point is from the mean. Formula: Z = (X - μ) / σ. It allows comparison of values from different normal distributions.
Empirical Rule (68-95-99.7)
Approximately 68% of data falls within 1σ of the mean, 95% within 2σ, and 99.7% within 3σ.
3. Standard Normal Distribution
The Standard Normal Distribution is a special case of the normal distribution where the mean (μ) is 0 and the standard deviation (σ) is 1.
Definition
The Standard Normal Distribution has μ = 0 and σ = 1.
Any normal distribution can be transformed into a standard normal distribution using the Z-score formula.
This transformation is incredibly useful because:
- It allows us to use a single Z-table or calculator function to find probabilities for any normal distribution.
- Probabilities associated with Z-scores represent the area under the standard normal curve.
Z = (X - μ) / σ
Convert any value X from a normal distribution to its standardized Z-score.
Think of the Z-score as a universal translator. No matter what units or scale your original data uses, the Z-score converts it into "number of standard deviations from the mean," letting you compare apples to oranges.
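As a minimal sketch of the "universal translator" idea, the formula Z = (X - μ) / σ can be written as a small function. The function name `z_score` and the two exam scales below are hypothetical values chosen for illustration only.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical scores on two exams that use different scales.
z_exam_a = z_score(82, mu=70, sigma=8)       # (82 - 70) / 8
z_exam_b = z_score(640, mu=500, sigma=100)   # (640 - 500) / 100

print(f"Exam A: Z = {z_exam_a:.2f}")
print(f"Exam B: Z = {z_exam_b:.2f}")
```

Here the raw scores 82 and 640 cannot be compared directly, but their Z-scores (1.5 and 1.4) show that the Exam A result sits slightly further above its own mean.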
4. Finding Probabilities with Z-scores
To find the probability of a value falling within a certain range in a normal distribution, follow these steps:
Step 1: Identify parameters
Identify μ and σ for the given distribution.
Step 2: Identify the X value(s)
Determine the X value(s) for which you want to find the probability.
Step 3: Calculate the Z-score(s)
Use the formula: Z = (X - μ) / σ
Step 4: Look up the Z-score(s)
Use a Z-table or calculator. A Z-table typically gives the cumulative probability P(Z < z) -- the area to the left of the Z-score.
Interpreting Probabilities
P(X < x)
Look up z in the table directly. The table gives the area to the left.
P(X > x)
Calculate 1 - P(Z < z). The total area under the curve is 1.
P(x₁ < X < x₂)
Calculate P(Z < z₂) - P(Z < z₁). Subtract the left area from the right area.
The Z-table gives "area to the left." To find area to the right, subtract from 1. To find area between two values, subtract the smaller left-area from the larger left-area. These three cases cover the probability questions in this guide.
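The three cases above can be sketched in code, with a function standing in for the Z-table. This sketch assumes the standard identity Φ(z) = ½(1 + erf(z/√2)) for the area to the left of z; the function names and the example distribution (μ = 100, σ = 15) are illustrative assumptions, not taken from the guide.

```python
import math

def phi(z):
    """Area to the left of z under the standard normal curve, P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_below(x, mu, sigma):
    """P(X < x): convert to Z, then take the left area."""
    return phi((x - mu) / sigma)

def prob_above(x, mu, sigma):
    """P(X > x): one minus the left area."""
    return 1.0 - phi((x - mu) / sigma)

def prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2): left area at x2 minus left area at x1."""
    return phi((x2 - mu) / sigma) - phi((x1 - mu) / sigma)

# Hypothetical distribution: mu = 100, sigma = 15.
print(round(prob_below(115, 100, 15), 4))        # Z = 1
print(round(prob_above(115, 100, 15), 4))        # Z = 1
print(round(prob_between(85, 115, 100, 15), 4))  # Z from -1 to 1
```

A calculator or statistics library typically does the same job; the point of the sketch is only that all three question types reduce to left areas.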
5. The Empirical Rule (68-95-99.7)
The Empirical Rule provides a quick estimate of probabilities for values close to the mean without needing a Z-table. It describes the percentage of data that falls within a certain number of standard deviations from the mean.
68%
Within 1σ of μ
34% on each side of the mean
95%
Within 2σ of μ
47.5% on each side of the mean
99.7%
Within 3σ of μ
49.85% on each side of the mean
Because about 95% of the data lies within μ ± 2σ, about 5% falls outside that range, and since the curve is symmetric, about 2.5% lies in each tail. This is why 2 standard deviations is often used as a threshold for "unusual" values in statistics.
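To see how close the rounded 68-95-99.7 figures are to the exact values, the short sketch below computes the area within k standard deviations directly. It assumes the standard identity P(|Z| < k) = erf(k/√2); the function name `share_within` is an illustrative choice.

```python
import math

def share_within(k):
    """Exact P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    inside = share_within(k)
    each_tail = (1.0 - inside) / 2.0  # symmetry splits the remainder evenly
    print(f"k = {k}: within = {inside:.4f}, each tail = {each_tail:.4f}")
```

The exact shares are roughly 68.27%, 95.45%, and 99.73%, so the rule's 2.5% per tail beyond 2σ is a slight overestimate of the exact figure of about 2.3%.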
6. Worked Examples
Example 1: Using the Empirical Rule
The scores on a standardized test are normally distributed with μ = 500 and σ = 100.
Question a: What percentage of students scored between 400 and 600?
400 = 500 - 100 = μ - 1σ
600 = 500 + 100 = μ + 1σ
By the Empirical Rule: approximately 68%
Question b: What percentage of students scored above 700?
700 = 500 + 2 × 100 = μ + 2σ
95% scored between 300 and 700 (μ ± 2σ)
100% - 95% = 5% scored outside this range
By symmetry, half (2.5%) scored above 700
Approximately 2.5% scored above 700
Example 2: Finding P(X < x)
The height of adult males in a city is normally distributed with μ = 175 cm and σ = 7 cm. What is the probability that a randomly selected male is shorter than 168 cm?
Identify: μ = 175 cm, σ = 7 cm, X = 168 cm
Z = (168 - 175) / 7 = -7 / 7 = -1.00
P(Z < -1.00) = 0.1587
P(height < 168 cm) = 0.1587 (15.87%)
Example 3: Finding P(X > x)
Using the same height distribution (μ = 175 cm, σ = 7 cm), what is the probability that a randomly selected male is taller than 182 cm?
Z = (182 - 175) / 7 = 7 / 7 = 1.00
P(Z < 1.00) = 0.8413
P(Z > 1.00) = 1 - 0.8413 = 0.1587
P(height > 182 cm) = 0.1587 (15.87%)
Example 4: Finding P(x₁ < X < x₂)
Using the same distribution (μ = 175 cm, σ = 7 cm), what is the probability that a randomly selected male is between 170 cm and 180 cm tall?
For X₁ = 170: Z₁ = (170 - 175) / 7 = -5/7 ≈ -0.71
For X₂ = 180: Z₂ = (180 - 175) / 7 = 5/7 ≈ 0.71
P(Z < -0.71) ≈ 0.2389
P(Z < 0.71) ≈ 0.7611
P(-0.71 < Z < 0.71) = 0.7611 - 0.2389 = 0.5222
P(170 < height < 180) = 0.5222 (52.22%)
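Examples 2 to 4 can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the variable names are illustrative. It also computes Example 4 both with the unrounded Z-scores and with Z rounded to ±0.71, as a table lookup requires.

```python
from statistics import NormalDist

heights = NormalDist(mu=175, sigma=7)   # distribution from Examples 2 to 4
standard = NormalDist()                 # standard normal: mu = 0, sigma = 1

p_shorter_168 = heights.cdf(168)                         # Example 2
p_taller_182 = 1 - heights.cdf(182)                      # Example 3
p_between = heights.cdf(180) - heights.cdf(170)          # Example 4, unrounded Z
p_between_table = standard.cdf(0.71) - standard.cdf(-0.71)  # Example 4, Z = ±0.71

print(f"P(height < 168)       = {p_shorter_168:.4f}")
print(f"P(height > 182)       = {p_taller_182:.4f}")
print(f"P(170 < height < 180) = {p_between:.4f} (unrounded Z)")
print(f"P(-0.71 < Z < 0.71)   = {p_between_table:.4f} (Z rounded for the table)")
```

The first two results agree with the table values. For Example 4 the unrounded calculation gives roughly 0.525 rather than 0.522; the small gap comes from rounding 5/7 to 0.71 before the lookup, and is not an error in the method.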
7. Key Formulas
Z-Score Formula
Z = (X - μ) / σ
Within 1σ
P(μ - σ < X < μ + σ) ≈ 0.68
Within 2σ
P(μ - 2σ < X < μ + 2σ) ≈ 0.95
Within 3σ
P(μ - 3σ < X < μ + 3σ) ≈ 0.997
8. Memory Aids
"How many Zero-Mean, One-Standard-Deviation steps away?"
The Z-score tells you how many standard deviations your X value is from the mean. As a memory aid, read the Z as "Zero-mean": standardizing re-centers every normal distribution at zero with a standard deviation of one.
"68-95-99.7: A Statistical Postcode"
Remember these three numbers like a postcode for normal data. 68% at 1 standard deviation, 95% at 2, and 99.7% at 3. It's the address of nearly all your data.
"Bell-shaped, symmetric, tails never touch the ground"
The normal curve always looks like a bell, is perfectly symmetric around the mean, and the tails extend infinitely in both directions without ever reaching the x-axis.
"Mean is the Bullseye, Standard Deviation is the Scatter"
Think of a target: the mean is the bullseye (center), and the standard deviation tells you how scattered the arrows (data points) are around it.
9. Common Mistakes
Forgetting to subtract from 1
When finding P(X > x) or P(Z > z), students often forget that the Z-table gives P(Z < z), so you need to calculate 1 - P(Z < z) to get the area to the right.
Mixing up X and Z
Remember, X is a value from the original distribution and Z is its standardized score. They are not interchangeable. Always convert X to Z before using the Z-table.
Incorrectly using the Empirical Rule
The Empirical Rule is an approximation and only applies to integer multiples of standard deviations (1, 2, or 3). For other values, you must use Z-scores and a Z-table or calculator.
Rounding Z-scores too early
Rounding Z-scores to too few decimal places can significantly impact the accuracy of the probability. Typically, round to two decimal places for Z-table use.
Misinterpreting the Z-table
Always check what your specific Z-table provides (area to the left, area to the right, or area between mean and Z). Most common tables give area to the left.
Assuming all data is normal
Not all data follows a normal distribution. Always check for normality (e.g., with a histogram, Q-Q plot, or normality tests) if it is not explicitly stated.
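One crude way to screen data before assuming normality is to compare how much of it falls within 1, 2, and 3 standard deviations against the 68-95-99.7 pattern. The sketch below is an illustrative heuristic only, not a substitute for a histogram, Q-Q plot, or formal normality test; the function name `coverage_by_sd` and both simulated datasets are assumptions made for the example.

```python
import random
import statistics

def coverage_by_sd(data):
    """Share of observations within 1, 2, and 3 sample std devs of the sample mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return {
        k: sum(abs(x - mean) <= k * sd for x in data) / len(data)
        for k in (1, 2, 3)
    }

rng = random.Random(0)  # fixed seed so the run is repeatable
bell_shaped = [rng.gauss(50, 10) for _ in range(10_000)]          # simulated normal data
right_skewed = [rng.expovariate(1 / 50) for _ in range(10_000)]   # simulated skewed data

normal_cov = coverage_by_sd(bell_shaped)
skewed_cov = coverage_by_sd(right_skewed)

for label, cov in (("bell-shaped", normal_cov), ("right-skewed", skewed_cov)):
    summary = ", ".join(f"{k}σ: {share:.3f}" for k, share in cov.items())
    print(f"{label:>12}: {summary}")
```

For the simulated normal data the shares should sit close to 0.68, 0.95, and 0.997. The skewed data places far more than 68% within one standard deviation, although its two-standard-deviation share happens to be close to 95%, which is why a single matching figure should never be taken as proof of normality.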
Quick Revision Summary
- ✓ The Normal Distribution is a continuous probability distribution defined by its mean (μ) and standard deviation (σ).
- ✓ It forms a bell-shaped, symmetric curve with the mean at the center.
- ✓ The Standard Normal Distribution has μ = 0 and σ = 1.
- ✓ The Z-score formula Z = (X - μ) / σ standardizes any normal distribution for probability calculations.
- ✓ The Empirical Rule: 68% within 1σ, 95% within 2σ, 99.7% within 3σ of the mean.
- ✓ For P(X > x), use 1 - P(Z < z). For P(x₁ < X < x₂), subtract: P(Z < z₂) - P(Z < z₁).
- ✓ The mean, median, and mode are all equal in a normal distribution.
- ✓ The total area under the normal curve is always 1 (representing 100% probability).
- ✓ Always verify normality before applying normal distribution methods to real data.
Frequently Asked Questions
- Can a normal distribution have any mean or standard deviation?
- Yes, the mean (μ) can be any real number, and the standard deviation (σ) can be any positive real number. The mean shifts the center of the bell curve left or right, while the standard deviation controls the width and height of the curve.
- Why do we convert to Z-scores?
- To standardize values from different normal distributions. This allows us to use a single Z-table or statistical function to find probabilities, regardless of the original mean and standard deviation. Without Z-scores, we would need a separate probability table for every possible combination of mean and standard deviation.
- Is all data normally distributed?
- No. Many datasets are approximately normal, but many are skewed, uniform, or follow other distributions. It is important not to assume normality without checking. You can verify normality using histograms, Q-Q plots, or formal normality tests.
- When should I use the Empirical Rule versus Z-scores?
- Use the Empirical Rule for quick approximations when values are exactly 1, 2, or 3 standard deviations from the mean. For any other values, or for precise probabilities, use Z-scores and a Z-table or calculator.
- What does the area under the normal curve represent?
- The area under the curve represents probability. The total area under the entire curve is always equal to 1 (or 100%). The area between any two values on the x-axis gives the probability that a randomly selected data point falls within that range.
Practice Quiz
Test your knowledge with the questions below.
1. Which two parameters define a normal distribution?
2. What is the shape of a normal distribution called?
3. According to the Empirical Rule, approximately what percentage of data falls within one standard deviation of the mean?
4. A Z-score tells us:
5. What are the mean and standard deviation of the Standard Normal Distribution?
6. If a Z-score is -2.00, it means the data point is:
7. A normal distribution has a mean of 70 and a standard deviation of 5. What is the Z-score for a data point of 75?
8. Using the Empirical Rule, if a normal distribution has μ = 100 and σ = 10, approximately what percentage of data falls between 80 and 120?
9. If P(Z < 1.25) = 0.8944, what is P(Z > 1.25)?
10. A Z-table typically provides the area under the standard normal curve:
Final Study Advice
- 1. Always draw a sketch of the normal curve and shade the area you are looking for before doing any calculations.
- 2. Memorize the Empirical Rule (68-95-99.7) for quick estimates and sanity checks on your answers.
- 3. Practice converting between X values and Z-scores until it becomes automatic -- most errors come from this step.
- 4. Remember that the Z-table gives P(Z < z). For "greater than" questions, subtract from 1.
- 5. Check your answer against the Empirical Rule for reasonableness. If your Z-score is about 1 and your probability is far from 0.16 or 0.84, double-check your work.