Mathematics (Statistics AP) | High School

Hypothesis Testing

Hypothesis testing is a statistical method used to make decisions about a population parameter based on sample data. It is a formal procedure for investigating ideas about the world using statistical evidence.

This guide covers key definitions, the steps of hypothesis testing, types of errors, z-tests and t-tests, two-sample tests, worked examples, memory aids, and a practice quiz.

1. Introduction

Hypothesis testing is one of the most widely used tools in statistics. It allows researchers to determine whether observed differences or effects are statistically significant or simply due to random chance.

The core idea is straightforward: you start with a default assumption (the null hypothesis), collect data, and determine whether the evidence is strong enough to reject that assumption in favor of an alternative explanation.

Picture This

Imagine a company claims their new battery lasts an average of 200 hours. You suspect it is less. You buy 36 batteries, test them, and find an average life of 195 hours. Is this enough evidence to say the company is wrong, or could the difference be due to random variation? Hypothesis testing gives you a systematic framework to answer this question.

Why Use Hypothesis Testing?

Validate Claims

Test whether claims or theories about a population hold up when confronted with real data.

Compare Groups

Determine whether two treatments, methods, or populations differ in a meaningful way.

Distinguish Signal from Noise

Determine if observed differences are statistically significant or just due to random chance.

Inform Decisions

Make evidence-based decisions in research, business, medicine, and many other fields.

2. Key Definitions

Null Hypothesis (H0)

The "status quo" or default assumption. States there is no effect, no difference, or no relationship. Always includes an equality sign (e.g., mu = mu_0).

Alternative Hypothesis (Ha)

The claim we suspect is true and are seeking evidence for. States there is an effect, difference, or relationship. Never includes an equality sign.

p-value

The probability of observing a test statistic as extreme as (or more extreme than) the one calculated, assuming H0 is true. A small p-value provides evidence against H0.

Test Statistic

A standardized value that measures how far the sample result deviates from what is expected under H0. Its formula depends on the type of test (z, t, F, chi-square).

Significance Level (alpha)

The pre-determined threshold for rejecting H0. It is the maximum acceptable probability of a Type I error. Commonly set at 0.05, 0.01, or 0.10.

Rejection Region

The range of test statistic values that leads to rejecting H0. Defined by the critical value(s) corresponding to alpha and the type of test.

Type I Error (alpha)

Rejecting H0 when it is actually true. A "false positive." Analogy: convicting an innocent person.

Type II Error (beta)

Failing to reject H0 when it is actually false. A "false negative." Analogy: letting a guilty person go free.

3. Steps in Hypothesis Testing

Hypothesis testing follows a systematic process. Here are the seven steps you should follow every time.

Step 1: State the Hypotheses

Clearly define H0 and Ha in terms of population parameters. H0 always contains the equality; Ha specifies the direction of the test.

Step 2: Choose the Significance Level (alpha)

Decide on the maximum acceptable probability of a Type I error. Common choices: 0.05, 0.01, or 0.10.

Step 3: Select the Appropriate Test

Choose which test to use (z-test, t-test, chi-square, etc.) based on data type, sample size, and known population parameters.

Step 4: Calculate the Test Statistic

Compute the value of the test statistic using the sample data and the appropriate formula.

Step 5: Determine the p-value or Critical Value(s)

Find the probability associated with the test statistic (p-value approach) or find the critical value(s) from the distribution table (critical value approach).

Step 6: Make a Decision

p-value approach: If p-value <= alpha, reject H0. If p-value > alpha, fail to reject H0.

Critical value approach: If the test statistic falls in the rejection region, reject H0. Otherwise, fail to reject H0.

Step 7: State the Conclusion in Context

Translate the statistical decision into plain language related to the original research question. Never say "accept H0."
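As a rough illustration, the seven steps can be sketched in Python for the battery scenario from the introduction. That scenario does not state a population standard deviation or a significance level, so sigma = 15 hours and alpha = 0.05 are assumed values chosen only to make the sketch runnable. Only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses. H0: mu = 200 hours, Ha: mu < 200 hours (left-tailed).
mu_0 = 200.0

# Step 2: significance level (ASSUMPTION for this sketch).
alpha = 0.05

# Step 3: one-sample z-test, which requires a known sigma.
sigma = 15.0   # ASSUMPTION: not stated in the battery scenario
n = 36
x_bar = 195.0

# Step 4: test statistic.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 5: left-tailed p-value, P(Z <= z) assuming H0 is true.
p_value = NormalDist().cdf(z)

# Step 6: decision.
reject_h0 = p_value <= alpha

# Step 7: conclusion in context.
if reject_h0:
    conclusion = "There is sufficient evidence that mean battery life is below 200 hours."
else:
    conclusion = "There is not enough evidence that mean battery life is below 200 hours."

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(conclusion)
```

With a different assumed sigma the decision could change, which is why Step 3 (checking what is actually known about the population) comes before any arithmetic.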

4. Types of Tests

The direction of the alternative hypothesis determines whether the test is one-tailed or two-tailed.

Two-Tailed Test

Ha: mu is not equal to mu_0

Tests for any difference from the hypothesized value. Rejection regions in both tails.

Left-Tailed Test

Ha: mu < mu_0

Tests whether the parameter is less than the hypothesized value. Rejection region in the left tail only.

Right-Tailed Test

Ha: mu > mu_0

Tests whether the parameter is greater than the hypothesized value. Rejection region in the right tail only.

Choosing One-Tailed vs. Two-Tailed

Use a two-tailed test when you want to detect any difference (greater or less). Use a one-tailed test when you have a specific directional claim to test (e.g., "the new drug performs better" or "the average is below the standard").
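To see how the choice of tail changes the result, here is a minimal sketch that converts one z statistic into a p-value under each of the three alternatives. The function name and the tail labels "two", "left", and "right" are made up for this illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """Return the p-value for a z statistic.

    tail is "two", "left", or "right" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)                      # area below z
    if tail == "right":
        return 1.0 - std_normal.cdf(z)                # area above z
    if tail == "two":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))   # both tails beyond |z|
    raise ValueError("tail must be 'two', 'left', or 'right'")


z = -2.0
for tail in ("two", "left", "right"):
    print(tail, round(p_value_from_z(z, tail), 4))
```

The same statistic is strong evidence for a left-tailed alternative, weaker evidence for a two-tailed one, and no evidence at all for a right-tailed one, so the direction of Ha must be fixed before looking at the data.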

5. Errors & Power

There are two types of errors that can occur in hypothesis testing, and understanding them is essential.

Decision Outcomes Table

Decision            | H0 is True            | H0 is False
Reject H0           | Type I Error (alpha)  | Correct Decision (Power = 1 - beta)
Fail to Reject H0   | Correct Decision      | Type II Error (beta)

Type I Error (False Positive)

Rejecting H0 when it is actually true. Probability = alpha. Like convicting an innocent person.

Type II Error (False Negative)

Failing to reject H0 when it is actually false. Probability = beta. Like letting a guilty person go free.

Statistical Power = 1 - beta

Power is the probability of correctly rejecting a false null hypothesis. Higher power means the test is better at detecting real effects. Power increases with: larger sample size, larger effect size, or larger alpha.
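The three factors can be checked numerically. The sketch below computes power for a right-tailed one-sample z-test. The function name and all input values (mu_0 = 50, sigma = 10, and the "true" means 52 and 55) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(mu_0, mu_true, sigma, n, alpha):
    """Power of a right-tailed one-sample z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)       # rejection cutoff under H0
    shift = (mu_true - mu_0) / (sigma / sqrt(n))   # true effect in standard-error units
    # When the true mean is mu_true, Z is normal with mean `shift` and SD 1,
    # so power is the area of that distribution above the cutoff.
    return 1.0 - std_normal.cdf(z_crit - shift)


base = power_right_tailed_z(50, 52, 10, 36, 0.05)
bigger_n = power_right_tailed_z(50, 52, 10, 100, 0.05)
bigger_effect = power_right_tailed_z(50, 55, 10, 36, 0.05)
bigger_alpha = power_right_tailed_z(50, 52, 10, 36, 0.10)

print(f"baseline      : {base:.3f}")
print(f"larger n      : {bigger_n:.3f}")
print(f"larger effect : {bigger_effect:.3f}")
print(f"larger alpha  : {bigger_alpha:.3f}")
```

Each change raises power above the baseline, but raising alpha does so by accepting more Type I errors, so in practice sample size is usually the lever to pull.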

6. One-Sample z-test for Means

The one-sample z-test is used to test a hypothesis about a single population mean when the population standard deviation (sigma) is known.

When to Use

  • Testing a hypothesis about a single population mean
  • Population standard deviation (sigma) is known
  • Sample size is large (n >= 30) OR population is normally distributed

Z = (x-bar - mu_0) / (sigma / sqrt(n))

x-bar = sample mean, mu_0 = hypothesized mean, sigma = population standard deviation, n = sample size.

One-Sample t-test (when sigma is unknown)

When sigma is unknown, use the sample standard deviation (s) and the t-distribution with df = n - 1.

t = (x-bar - mu_0) / (s / sqrt(n))

Same structure as the z-test, but uses s instead of sigma and the t-distribution with df = n - 1.
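A minimal sketch of the t statistic computed from raw data, using only the standard library. The sample values and mu_0 = 4 are hypothetical. Because the standard library has no t-distribution, the critical value 2.365 (two-tailed, alpha = 0.05, df = 7) is read from a t-table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu_0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu_0."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # sample SD (divides by n - 1)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1


sample = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical measurements
t, df = one_sample_t(sample, mu_0=4)

# Two-tailed critical value for alpha = 0.05 and df = 7, from a t-table.
t_crit = 2.365
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```

The t critical value (2.365) is larger than the corresponding z critical value (1.96), reflecting the extra uncertainty from estimating sigma with s.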

Decision Rules

p-value Approach

If p-value <= alpha: Reject H0

If p-value > alpha: Fail to reject H0

Critical Value Approach

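Two-tailed test: If |test statistic| > critical value: Reject H0

Left-tailed test: If test statistic < -(critical value): Reject H0

Right-tailed test: If test statistic > critical value: Reject H0

(Here the critical value is taken as a positive number. In a one-tailed test, a large statistic in the opposite tail does not lead to rejection.)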

Otherwise: Fail to reject H0
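The critical value approach can be sketched for a z-test as follows. The helper names and the tail labels are assumptions made for this illustration. The critical value is returned as a positive number and the direction of Ha decides which tail counts.

```python
from statistics import NormalDist

TAILS = ("two", "left", "right")


def z_critical(alpha, tail):
    """Positive critical value for a z-test."""
    if tail not in TAILS:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    tail_area = alpha / 2 if tail == "two" else alpha   # two-tailed splits alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


def rejects_h0(z, alpha, tail):
    """Apply the critical value rule in the direction set by Ha."""
    z_crit = z_critical(alpha, tail)
    if tail == "two":
        return abs(z) > z_crit
    if tail == "left":
        return z < -z_crit
    return z > z_crit                                   # right-tailed


print(round(z_critical(0.05, "two"), 3))     # two-tailed cutoff
print(round(z_critical(0.05, "right"), 3))   # one-tailed cutoff
print(rejects_h0(-3.0, 0.05, "right"))       # extreme, but in the wrong tail
```

The last call shows why comparing absolute values is only safe for two-tailed tests: z = -3.0 is far from zero, yet it gives no support to a right-tailed alternative.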

7. Two-Sample Tests

Two-sample tests are used to compare parameters of two different populations or groups.

Two-Sample z-test for Means

Compares two population means when both population standard deviations are known and samples are independent.

Two-Sample t-test for Means

Compares two population means when population standard deviations are unknown and samples are independent. Requires assumptions about population variances (equal or unequal).

Paired t-test

Compares means from dependent or paired samples (e.g., before-and-after measurements on the same subjects). Focuses on the mean of the differences between pairs.
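A paired t-test reduces to a one-sample t-test on the differences. The sketch below shows this with hypothetical before-and-after scores; the function name is made up for this illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for H0: mean difference = 0, using after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))   # one-sample t on the differences
    return t, n - 1


before = [70, 65, 80, 75, 60]   # hypothetical scores
after = [74, 68, 83, 80, 65]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Note that df is based on the number of pairs (here 5), not on the total number of measurements.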

Two-Sample z-test for Proportions

Compares two population proportions. Used for categorical data when comparing success rates between two groups.
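A minimal sketch of the two-sample z-test for proportions, assuming the usual pooled standard error under H0: p1 = p2 and a two-tailed alternative. The function name and the success counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x_1, n_1, x_2, n_2):
    """Return (z, two-tailed p-value) for H0: p1 = p2."""
    p_1 = x_1 / n_1
    p_2 = x_2 / n_2
    # Under H0 both groups share one proportion, estimated by pooling.
    p_pooled = (x_1 + x_2) / (n_1 + n_2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2.
z, p_value = two_proportion_z(60, 100, 45, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The normal approximation behind this test is generally considered reasonable only when each group has enough successes and failures (a common rule of thumb is at least 10 of each).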

8. Worked Examples

Example 1: One-Sample z-test

A company claims their "Jumbo" apples weigh an average of 200 grams. A quality control manager suspects the average weight is different. They test 36 apples and find a mean of 195 grams. The population standard deviation is known to be 15 grams. Test at the 5% significance level.

H0: mu = 200 grams

Ha: mu is not equal to 200 grams (two-tailed)

alpha: 0.05

Test: One-sample z-test (sigma = 15 is known, n = 36)

Z = (195 - 200) / (15 / sqrt(36)) = -5 / 2.5 = -2.00

p-value: 2 x P(Z < -2.00) = 2 x 0.0228 = 0.0456

Decision: p-value (0.0456) <= alpha (0.05), so reject H0

Conclusion: At a 5% significance level, there is sufficient evidence to conclude that the average weight of the "Jumbo" apples is significantly different from 200 grams.

Example 2: One-Sample z-test (Right-Tailed)

Test H0: mu = 50 vs Ha: mu > 50 with x-bar = 52, sigma = 10, n = 36, alpha = 0.05.

Z = (52 - 50) / (10 / sqrt(36)) = 2 / 1.667 = 1.20

Critical value: For alpha = 0.05 (one-tailed), Z* = 1.645

Decision: 1.20 < 1.645, so fail to reject H0

Conclusion: At the 5% significance level, there is not enough evidence to conclude that the population mean is greater than 50.
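The arithmetic in Examples 1 and 2 can be checked with a short script. This is only a verification sketch using the numbers given above; the helper name z_statistic is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def z_statistic(x_bar, mu_0, sigma, n):
    """One-sample z statistic."""
    return (x_bar - mu_0) / (sigma / sqrt(n))


# Example 1: two-tailed test, H0: mu = 200.
z1 = z_statistic(195, 200, 15, 36)
p1 = 2 * std_normal.cdf(-abs(z1))

# Example 2: right-tailed test, H0: mu = 50.
z2 = z_statistic(52, 50, 10, 36)
z_crit = std_normal.inv_cdf(0.95)
p2 = 1 - std_normal.cdf(z2)

print(f"Example 1: z = {z1:.2f}, p-value = {p1:.4f}")
print(f"Example 2: z = {z2:.2f}, critical value = {z_crit:.3f}, p-value = {p2:.4f}")
```

The computed p-value for Example 1 comes out near 0.0455 rather than 0.0456 because the table value 0.0228 is already rounded; the decision is the same either way. For Example 2, the p-value approach (about 0.115 > 0.05) agrees with the critical value approach.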

Example 3: Two-Sample t-test (Conceptual)

A researcher compares two teaching methods on student test scores. 25 students use Method A, 28 use Method B. Population standard deviations are unknown.

H0: mu_A = mu_B (no difference)

Ha: mu_A is not equal to mu_B (two-tailed)

alpha: 0.01

Test: Two-sample t-test for independent means

Process: Calculate t-statistic using sample means, sample SDs, and sample sizes. Compare to t-distribution with appropriate df.

If p-value <= 0.01: There is sufficient evidence of a significant difference between the two teaching methods.
If p-value > 0.01: There is not enough evidence to conclude a significant difference exists.
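For the "Process" step, one common choice when equal variances are not assumed is Welch's t-test. The sketch below computes its t statistic and approximate degrees of freedom from summary statistics. The sample sizes 25 and 28 come from the example, but the means and standard deviations are hypothetical, since the example does not provide them.

```python
from math import sqrt


def welch_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (t, df) for Welch's two-sample t-test (unequal variances)."""
    v_1 = sd_1 ** 2 / n_1                 # squared standard error, group 1
    v_2 = sd_2 ** 2 / n_2                 # squared standard error, group 2
    t = (mean_1 - mean_2) / sqrt(v_1 + v_2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v_1 + v_2) ** 2 / (v_1 ** 2 / (n_1 - 1) + v_2 ** 2 / (n_2 - 1))
    return t, df


# HYPOTHETICAL summary statistics for Method A (n = 25) and Method B (n = 28).
t, df = welch_t(78.0, 10.0, 25, 72.0, 12.0, 28)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The resulting df is usually not a whole number. When reading a t-table by hand, rounding df down is the conservative choice.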

9. Memory Aids

"p-value Low, Null Must Go!"

If p-value is low (less than alpha), reject the null hypothesis.

"p-value High, Null Will Fly!"

If p-value is high (greater than alpha), fail to reject the null hypothesis.

Type I = False Positive

Rejecting a true H0. Think of it as a "false alarm" -- concluding there is an effect when there is not.

Type II = False Negative

Failing to reject a false H0. Think of it as "missing the signal" -- failing to detect a real effect.

"Alpha is the Error You Tolerate"

Alpha is the maximum probability of a Type I error you are willing to accept.

p-value = Probability of Data This Extreme (if H0 is True)

The p-value is the probability of seeing data this extreme if H0 is true. It is NOT the probability that H0 is true.

10. Common Mistakes

Confusing p-value with P(H0 is true)

The p-value is the probability of obtaining data at least as extreme as what was observed, given that H0 is true. It is not the probability that H0 itself is true.

Saying "accept H0"

We never "accept" H0. We either reject it or fail to reject it. Failing to find evidence against H0 is not the same as proving it.

Not stating conclusions in context

Always relate your statistical decision back to the original real-world problem. A naked "reject H0" without context is incomplete.

Choosing the wrong test

Incorrectly applying a z-test instead of a t-test, or using a one-sample test for a two-sample problem. Check your conditions carefully.

Ignoring assumptions

Each test has underlying assumptions (normality, independence, equal variances). Violating these can invalidate the results.

Confusing Type I and Type II errors

Type I = false positive (rejecting true H0). Type II = false negative (failing to reject false H0). Know the difference and their consequences.

Fishing for significance

Running multiple tests until one yields a "significant" p-value without proper correction inflates the Type I error rate.
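The inflation can be quantified. Assuming the tests are independent and every null hypothesis is true, the chance of at least one false positive among k tests is 1 - (1 - alpha)^k. The sketch below computes this, along with the Bonferroni-adjusted threshold alpha / k, which is one simple correction among several. The function names are made up for this illustration.

```python
def familywise_error_rate(alpha, k):
    """P(at least one Type I error) across k independent tests, all H0 true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / k


for k in (1, 5, 20):
    fwer = familywise_error_rate(0.05, k)
    adjusted = bonferroni_alpha(0.05, k)
    print(f"k = {k:2d}: family-wise error rate = {fwer:.3f}, Bonferroni alpha = {adjusted:.4f}")
```

With 20 tests at alpha = 0.05, the chance of at least one "significant" result by luck alone is roughly 64%, which is why an uncorrected search for significance is unreliable.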

Focusing solely on the p-value

The p-value should be considered alongside effect size, confidence intervals, and practical significance. A statistically significant result may not be practically important.

Quick Revision Summary

  • Hypothesis testing is a formal procedure to decide whether sample data provides enough evidence to reject a claim about a population parameter.
  • H0 (null) represents the status quo and always includes an equality. Ha (alternative) is what we want to prove.
  • The p-value is the probability of seeing data as extreme as ours if H0 is true. If p-value <= alpha, reject H0.
  • Type I error = false positive (rejecting true H0). Type II error = false negative (failing to reject false H0).
  • Use a z-test when sigma is known; use a t-test when sigma is unknown.
  • Statistical power (1 - beta) is the probability of correctly rejecting a false H0.
  • Never say "accept H0." Instead say "fail to reject H0" -- absence of evidence is not evidence of absence.
  • Always state conclusions in context of the original research question.

Frequently Asked Questions

What is the difference between a z-test and a t-test?
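A z-test for a mean is used when the population standard deviation (sigma) is known. A t-test is used when sigma is unknown and must be estimated by the sample standard deviation (s), which matters most for smaller sample sizes. For large samples (roughly n >= 30), the t-distribution is close to the standard normal distribution, so some textbooks use a z-test with s in place of sigma. The two approaches then give nearly identical results, but the t-test remains the appropriate choice whenever sigma is unknown.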
Can I "accept" the null hypothesis?
No. In hypothesis testing, we either "reject the null hypothesis" or "fail to reject the null hypothesis." We never "accept" it because we are working with sample data, which can never definitively prove the absence of an effect in the entire population. Failing to find evidence against H0 does not mean H0 is true.
How do I choose the significance level (alpha)?
The choice depends on the context and consequences of making a Type I error. Use 0.05 (5%) as a common standard. Use 0.01 (1%) when a Type I error is very costly (e.g., medical trials). Use 0.10 (10%) when a Type II error is more costly or for exploratory research.
What is the relationship between alpha and beta?
Alpha (Type I error) and beta (Type II error) have an inverse relationship. Decreasing alpha (making it harder to reject H0) generally increases beta (making it harder to detect a true effect), assuming sample size and effect size remain constant.
What is statistical power?
Statistical power is the probability of correctly rejecting a false null hypothesis, calculated as 1 - beta. A higher power means the test is more likely to detect a real effect if one exists. Power can be increased by increasing sample size, increasing effect size, or increasing alpha.

Practice Quiz

Test your knowledge with the following questions.

1. Which of the following is true about the null hypothesis (H0)?

2. What is the p-value?

3. If a p-value is 0.03 and the significance level (alpha) is 0.05, what is the correct decision?

4. A Type I error occurs when:

5. The significance level (alpha) is the probability of committing which type of error?

6. When would you typically use a one-sample t-test instead of a one-sample z-test for means?

7. Which of the following is an example of an alternative hypothesis for a two-tailed test concerning a population mean mu?

8. What does "failing to reject the null hypothesis" mean?

9. Which factor is directly related to the power of a statistical test?

10. In the context of hypothesis testing, what is the rejection region?

Final Study Advice

  1. Always state both hypotheses (H0 and Ha) before doing any calculations. The alternative hypothesis determines whether you use a one-tailed or two-tailed test.
  2. Memorize the decision rule: p-value <= alpha means reject H0. Practice until it becomes second nature.
  3. Practice identifying whether a problem calls for a z-test or t-test, and whether it is one-sample or two-sample.
  4. Always write your conclusion in the context of the original problem, not just "reject H0" or "fail to reject H0."
  5. Understand the relationship between errors, power, and sample size -- this is frequently tested on exams.
