
Confidence Intervals

A confidence interval (CI) is a range of values, derived from a sample, that is likely to contain the true value of an unknown population parameter. It provides a more informative estimate than a single point estimate by also conveying the precision of our estimate.

This guide covers key definitions, constructing CIs for means and proportions, proper interpretation, factors affecting width, sample size determination, worked examples, memory aids, and a practice quiz.

1. Introduction

In statistics, we often want to know about a characteristic of a large group (a population), but it is usually impossible to measure every single member. Instead, we take a sample and use its data to make educated guesses about the population.

A point estimate like the sample mean is almost certainly not exactly equal to the population mean. Confidence intervals address this by giving us a range of plausible values along with a measure of how confident we are in that range.

Picture This

Imagine trying to guess the average height of all students at your school. You measure 50 students and find a mean of 170 cm. Rather than claiming the true average is exactly 170 cm, you say you are 95% confident the true average is between 167 cm and 173 cm. That range is your confidence interval.

Why Use Confidence Intervals?

Quantify Uncertainty

A point estimate is a single guess. CIs show the range of plausible values, revealing how much uncertainty exists.

Communicate Precision

A narrow CI indicates a precise estimate, while a wide CI signals more uncertainty in the data.

Decision Making

CIs help make informed decisions by showing the range of potential outcomes based on the data.

2. Key Definitions

Confidence Interval (CI)

A range of values within which the true population parameter is estimated to lie, with a certain level of confidence.

Margin of Error (MOE)

The "plus or minus" amount that creates the interval. CI = Point Estimate +/- Margin of Error.

Confidence Level (CL)

The probability that the method used to construct the interval will produce an interval containing the true parameter. Common levels: 90%, 95%, 99%.

Significance Level (alpha)

alpha = 1 - Confidence Level. Represents the long-run proportion of intervals built by this method that would fail to contain the true parameter.

Point Estimate

A single value from the sample used to estimate a population parameter: sample mean for population mean, sample proportion for population proportion.

Standard Error (SE)

Measures the variability of the sample statistic. For the mean: SE = s / sqrt(n). For proportions: SE = sqrt(p-hat(1 - p-hat) / n).

Critical Value

The Z* or t* value, taken from the standard normal distribution or the t-distribution, that corresponds to the chosen confidence level. For a 95% Z-interval: Z* = 1.96.

Degrees of Freedom (df)

For a t-interval: df = n - 1. Determines the shape of the t-distribution used to find the critical value.

CI = Point Estimate +/- (Critical Value x Standard Error)

The general formula for constructing any confidence interval.
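The general formula can be sketched as a small Python helper. This is only an illustration: the function name `confidence_interval` and the numbers in the usage line are made up for the example and do not come from any data set in this guide.

```python
def confidence_interval(point_estimate, critical_value, standard_error):
    """Return (lower, upper) for point estimate +/- critical value x standard error."""
    margin_of_error = critical_value * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Hypothetical inputs: point estimate 50, critical value 2, standard error 1.5
lower, upper = confidence_interval(50, 2, 1.5)
print(lower, upper)
```

Every interval in this guide follows the same pattern; only the choice of critical value and the standard error formula change.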

3. Confidence Interval for the Mean

Case 1: Population Standard Deviation Known (Z-interval)

When the population standard deviation (sigma) is known, we use the Z-distribution to construct the interval.

Assumptions

  • Random sample from the population
  • Population is normally distributed OR sample size is large (n >= 30)
  • Population standard deviation (sigma) is known

x-bar +/- Z* (sigma / sqrt(n))

Z-interval formula. Z* = 1.645 (90%), 1.96 (95%), or 2.576 (99%).

Case 2: Population Standard Deviation Unknown (t-interval)

When sigma is unknown (the most common scenario), we use the sample standard deviation (s) and the t-distribution.

Assumptions

  • Random sample from the population
  • Population is normally distributed OR sample size is large (n >= 30)
  • Population standard deviation (sigma) is unknown

x-bar +/- t* (s / sqrt(n))

t-interval formula. t* depends on confidence level and degrees of freedom (df = n - 1).

Z for Known, t for Unknown

Use the Z-distribution if the population standard deviation (sigma) is known. Use the t-distribution if only the sample standard deviation (s) is available. As the sample size increases, the t-distribution approaches the Z-distribution.

4. Confidence Interval for Proportions

When estimating the proportion of a population with a certain characteristic (e.g., the fraction of voters supporting a candidate), we use the Z-distribution with the sample proportion.

Assumptions

  • Random sample from the population
  • Sample size large enough: n * p-hat >= 10 and n * (1 - p-hat) >= 10

p-hat +/- Z* sqrt(p-hat(1 - p-hat) / n)

Proportion CI formula. p-hat = number of successes / total sample size.

Common Critical Z-values

Confidence Level    alpha    Z*
90%                 0.10     1.645
95%                 0.05     1.960
99%                 0.01     2.576
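The critical values above can be reproduced rather than memorized. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `z_critical` is a choice made for this example. For a two-sided interval, alpha is split evenly between the two tails, so Z* is the point with area 1 - alpha/2 to its left.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical value Z* for a confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # alpha/2 sits in each tail, so look up the 1 - alpha/2 quantile
    return NormalDist().inv_cdf(1 - alpha / 2)


for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%}: Z* = {z_critical(cl):.3f}")
```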

5. Interpreting Confidence Intervals

The interpretation of a confidence interval is crucial and often misunderstood. Getting it right is one of the most important skills in introductory statistics.

Correct Interpretation (95% CI)

"We are 95% confident that the true population mean lies within this interval." Or equivalently: "If we repeated this sampling process many times, 95% of the confidence intervals constructed would contain the true population mean."

Wrong: "95% probability the true mean is in this interval"

Once the interval is calculated, the true mean is either in it or it is not. There is no probability associated with this specific instance.

Wrong: "95% of data points fall within this interval"

This describes data spread, not the location of the population parameter. A CI is about the parameter, not individual observations.

Wrong: "We are 95% confident the sample mean is in this interval"

The sample mean is always the center of the interval, so we are 100% confident it is in there. The CI estimates the population mean, not the sample mean.
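The "repeated sampling" interpretation can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not a proof: it invents a normal population (mean 170, standard deviation 10), treats sigma as known so that a Z-interval applies, and counts how often the interval captures the true mean. All parameter values and the function name `coverage_rate` are hypothetical.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(true_mean=170.0, sigma=10.0, n=50, trials=2000,
                  confidence_level=0.95, seed=0):
    """Fraction of simulated Z-intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials


print(coverage_rate())
```

The printed fraction should land close to 0.95. Each individual interval either contains 170 or does not; the 95% describes how often the method succeeds across many samples.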

6. Factors Affecting CI Width

The width of a confidence interval indicates the precision of our estimate. A narrower interval is generally preferred as it implies greater precision. Three main factors control width.

Confidence Level

Higher confidence level leads to a wider interval. To be more confident, you need a wider net. A 99% CI is wider than a 90% CI for the same data.

Sample Size (n)

Larger sample size leads to a narrower interval. More data reduces uncertainty because the standard error decreases as n increases.

Variability (sigma or s)

Greater variability leads to a wider interval. If the data is more spread out, there is more inherent uncertainty in the estimate.

Remember: C-W, N-N, V-W

Higher Confidence level = Wider. Larger sample size (N) = Narrower. Greater Variability = Wider.
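The three effects can be seen numerically by changing one factor at a time. The sketch below uses a Z-interval, whose full width is 2 x Z* x sigma / sqrt(n); the baseline values (sigma = 15, n = 36, 95%) and the function name `z_interval_width` are chosen only for illustration.

```python
from statistics import NormalDist


def z_interval_width(sigma, n, confidence_level):
    """Full width (upper minus lower) of a Z-interval for the mean."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z_star * sigma / n ** 0.5


baseline = z_interval_width(sigma=15, n=36, confidence_level=0.95)
higher_cl = z_interval_width(sigma=15, n=36, confidence_level=0.99)
larger_n = z_interval_width(sigma=15, n=144, confidence_level=0.95)
more_spread = z_interval_width(sigma=30, n=36, confidence_level=0.95)

print(f"baseline:          {baseline:.2f}")
print(f"99% instead of 95%: {higher_cl:.2f}")
print(f"n = 144 (4x):      {larger_n:.2f}")
print(f"sigma = 30 (2x):   {more_spread:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width, while doubling sigma doubles it.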

7. Sample Size Determination

Before collecting data, you can determine the sample size needed to achieve a desired margin of error at a specific confidence level.

For Estimating a Population Mean

n = (Z* x sigma / MOE)²

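Z* is the critical value for the desired CL. sigma is the estimated population standard deviation. MOE is the desired margin of error. Always round the result up to the next whole number, since rounding down would leave the margin of error slightly larger than desired.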
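As a sketch of the sample size formula for a mean, the function below computes n and rounds up with `math.ceil` so that the achieved margin of error is no larger than requested. The inputs (sigma = 15, MOE = 3, 95% confidence) are hypothetical planning values, not taken from this guide's examples.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence_level=0.95):
    """Smallest n with Z* x sigma / sqrt(n) <= margin_of_error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil((z_star * sigma / margin_of_error) ** 2)


# Hypothetical planning values: sigma guessed as 15, target MOE of 3
print(sample_size_for_mean(sigma=15, margin_of_error=3))  # prints 97
```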

For Estimating a Population Proportion

n = p-hat(1 - p-hat)(Z* / MOE)²

If no prior estimate for p-hat is available, use p-hat = 0.5 for the most conservative (largest) sample size. Round the result up to the next whole number.

Why p-hat = 0.5?

Using p-hat = 0.5 maximizes the product p-hat(1 - p-hat) = 0.25, which gives the largest possible required sample size. This ensures your sample is large enough regardless of the true proportion.
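The effect of the p-hat = 0.5 choice can be illustrated with a short sketch. The target margin of error (3 percentage points) and the alternative prior estimate (0.2) are hypothetical values chosen for the example, as is the function name.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence_level=0.95, p_hat=0.5):
    """Required n for a proportion; p_hat defaults to the conservative 0.5."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil(p_hat * (1 - p_hat) * (z_star / margin_of_error) ** 2)


conservative = sample_size_for_proportion(0.03)              # no prior estimate
with_prior = sample_size_for_proportion(0.03, p_hat=0.2)     # hypothetical prior
print(conservative, with_prior)  # prints 1068 683
```

Any prior estimate other than 0.5 gives a smaller n, which is why 0.5 is the safe default when nothing is known about the proportion.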

8. Worked Examples

Example 1: CI for the Mean (Unknown sigma)

A researcher wants to estimate the average height of adult males in a city. A random sample of 50 adult males yields a sample mean of 175 cm and a sample standard deviation of 8 cm. Construct a 95% confidence interval.

Given: n = 50, x-bar = 175 cm, s = 8 cm, CL = 95%

Step 1: df = n - 1 = 49

Step 2: t* for 95% CI with df = 49 is approximately 2.0096

Step 3: SE = s / sqrt(n) = 8 / sqrt(50) = 8 / 7.0711 = 1.1314 cm

Step 4: MOE = t* x SE = 2.0096 x 1.1314 = 2.274 cm

Step 5: CI = 175 +/- 2.274 = (172.73, 177.27)

Conclusion: We are 95% confident that the true average height of adult males in the city is between 172.73 cm and 177.27 cm.
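The arithmetic in Example 1 can be checked with a few lines of Python. The t* value is taken from the example itself (a table lookup), because the Python standard library does not provide t-distribution quantiles. The code carries unrounded intermediate values, so a third decimal place may differ slightly from a hand calculation that rounds at each step.

```python
import math

n, x_bar, s = 50, 175.0, 8.0
t_star = 2.0096  # from the example: 95% confidence, df = n - 1 = 49

standard_error = s / math.sqrt(n)
margin_of_error = t_star * standard_error
lower, upper = x_bar - margin_of_error, x_bar + margin_of_error

print(f"95% CI = ({lower:.2f}, {upper:.2f}) cm")  # prints 95% CI = (172.73, 177.27) cm
```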

Example 2: CI for a Proportion

A poll of 800 likely voters found that 432 plan to vote for Candidate A. Construct a 90% confidence interval for the true proportion of voters supporting Candidate A.

Given: n = 800, x = 432, CL = 90%

Step 1: p-hat = 432 / 800 = 0.54

Step 2: Check: n*p-hat = 432 >= 10, n*(1 - p-hat) = 368 >= 10 (OK)

Step 3: Z* for 90% CI = 1.645

Step 4: SE = sqrt(0.54 x 0.46 / 800) = sqrt(0.0003105) = 0.0176

Step 5: MOE = 1.645 x 0.0176 = 0.0290

Step 6: CI = 0.54 +/- 0.0290 = (0.5110, 0.5690)

Conclusion: We are 90% confident that the true proportion of likely voters supporting Candidate A is between 51.10% and 56.90%.
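Example 2 can be reproduced in the same way, including the sample size check. This is a sketch using only the standard library; Z* is computed with `statistics.NormalDist` rather than read from the table, and intermediate values are not rounded, so the last digit may differ slightly from a step-by-step hand calculation.

```python
import math
from statistics import NormalDist

n, successes, confidence_level = 800, 432, 0.90

p_hat = successes / n
# Large-sample condition: at least 10 expected successes and 10 failures
if n * p_hat < 10 or n * (1 - p_hat) < 10:
    raise ValueError("sample too small for the normal approximation")

z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error
lower, upper = p_hat - margin_of_error, p_hat + margin_of_error

print(f"90% CI = ({lower:.4f}, {upper:.4f})")  # prints 90% CI = (0.5110, 0.5690)
```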

Example 3: CI for the Mean (Known sigma)

Construct a 95% CI for the population mean with x-bar = 100, sigma = 15, n = 36.

Given: x-bar = 100, sigma = 15, n = 36, CL = 95%

Step 1: Z* = 1.96 for 95%

Step 2: SE = 15 / sqrt(36) = 15 / 6 = 2.5

Step 3: MOE = 1.96 x 2.5 = 4.9

Step 4: CI = 100 +/- 4.9 = (95.1, 104.9)

Conclusion: We are 95% confident that the true population mean is between 95.1 and 104.9.
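Example 3 can be wrapped in a reusable function for the known-sigma case. The function name `z_interval` is an assumption made for this sketch; the inputs are the ones given in the example.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence_level=0.95):
    """Z-interval for a mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin_of_error = z_star * sigma / math.sqrt(n)
    return x_bar - margin_of_error, x_bar + margin_of_error


lower, upper = z_interval(x_bar=100, sigma=15, n=36)
print(f"95% CI = ({lower:.1f}, {upper:.1f})")  # prints 95% CI = (95.1, 104.9)
```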

9. Memory Aids

CI is like a "Net"

A wider net (higher CL) is more likely to catch the fish (true parameter). More data (larger n) makes the net tighter around the fish (more precise).

P +/- M

Point Estimate +/- Margin of Error. The simplest way to remember the CI formula.

MOE = CV x SE

Margin of Error = Critical Value x Standard Error. The margin of error is always the critical value times the standard error.

Z for Known, t for Unknown

Use Z-distribution if population sigma is known. Use t-distribution if only sample s is known.

10. Common Mistakes

Misinterpreting the confidence level

A 95% CI does not mean there is a 95% chance the true parameter is in this specific interval. It means that 95% of intervals generated by repeated sampling would contain the true parameter.

Confusing CI for individual data points

A CI estimates a population parameter (mean or proportion), not the range where individual data points are expected to fall.

Using Z instead of t (or vice versa)

Use the Z-distribution for means only if sigma is known (rare) or for proportions. Use the t-distribution for means when sigma is unknown (common).

Ignoring assumptions

Failing to check for random sampling, normality (or large sample size), and independence can invalidate the confidence interval.

Rounding errors

Rounding intermediate calculations too early can lead to inaccuracies in the final interval. Keep more decimal places during calculations.

Not stating units

Always include the units of measurement for the CI (e.g., cm, kg, %). A confidence interval without units is incomplete.

Assuming normality for small samples

If n is less than 30 and the population distribution is heavily skewed or has outliers, a t-interval might not be appropriate. Non-parametric methods may be needed.

Using wrong p-hat for sample size

When calculating required sample size for proportions without a prior estimate, you must use p-hat = 0.5 for the most conservative estimate.

Quick Revision Summary

  • A confidence interval provides a range of plausible values for a population parameter, along with a measure of confidence.
  • The general formula is Point Estimate +/- (Critical Value x Standard Error).
  • Use a Z-interval when sigma is known; use a t-interval when sigma is unknown.
  • For proportions, use the Z-distribution and check that np-hat and n(1-p-hat) are both at least 10.
  • Higher confidence level = wider interval. Larger sample size = narrower interval. Greater variability = wider interval.
  • The confidence level describes the reliability of the method, not the probability that the true parameter is in a specific interval.
  • Always check assumptions: random sample, normality or large n, and independence.
  • For sample size determination, use p-hat = 0.5 if no prior proportion estimate exists.

Frequently Asked Questions

When should I use a Z-interval versus a t-interval for the mean?
Use a Z-interval if the population standard deviation is known. Use a t-interval if the population standard deviation is unknown (which is the vast majority of real-world scenarios) and you are using the sample standard deviation as an estimate.

What if my data is not normally distributed?
If your sample size is large (n >= 30), the Central Limit Theorem states that the sampling distribution of the mean will be approximately normal, so Z or t-intervals are generally robust. For small samples from non-normal populations, you might need to use non-parametric methods or bootstrap techniques.

Can I construct a 100% confidence interval?
Technically yes, but it would have to cover every possible value of the parameter: from negative infinity to positive infinity for a mean, or the full range from 0 to 1 for a proportion. That makes it useless. A practical confidence interval always has some uncertainty.

How do I choose the right confidence level?
The choice depends on the context and the consequences of being wrong. 95% is a common standard. For high-stakes decisions (e.g., medical research), 99% might be preferred. For exploratory studies, 90% might suffice.

What if my sample is not random?
If the sample is not random and representative of the population, any confidence interval calculated from it will be unreliable and potentially biased. Random sampling is a fundamental assumption for valid confidence intervals.

Practice Quiz

Test your knowledge with the questions below.

1. A confidence interval provides:

2. The margin of error for a confidence interval for the mean is calculated as:

3. What happens to the width of a confidence interval if the confidence level is increased (e.g., from 90% to 99%)?

4. Which of the following would lead to a narrower confidence interval?

5. A 95% confidence interval for the average weight of a certain species of fish is (1.8 kg, 2.2 kg). Which is the correct interpretation?

6. When constructing a confidence interval for a population mean, if the population standard deviation is unknown, which distribution should be used?

7. To estimate the sample size needed for a proportion with no prior estimate for p-hat, which value should be used for p-hat?

8. What is the critical Z-value for a 99% confidence interval?

9. If a confidence interval for the mean is (25, 35), what is the point estimate?

10. A 95% CI means that if we repeated the sampling process many times:

Final Study Advice

  1. Always check your assumptions (random sample, normality or large n, known vs. unknown sigma) before choosing which formula to use.
  2. Practice writing correct interpretations of confidence intervals: this is a common exam question and is frequently answered incorrectly.
  3. Memorize the common critical Z-values: 1.645 (90%), 1.96 (95%), 2.576 (99%).
  4. Remember that the margin of error = critical value x standard error, and practice computing it step by step.
  5. Always include units and state your conclusions in context of the original problem.
