Hypothesis Testing
Hypothesis testing is a statistical method used to make decisions about a population parameter based on sample data. It is a formal procedure for investigating ideas about the world using statistical evidence.
This guide covers key definitions, the steps of hypothesis testing, types of errors, z-tests and t-tests, two-sample tests, worked examples, memory aids, and a practice quiz.
1. Introduction
Hypothesis testing is one of the most widely used tools in statistics. It allows researchers to determine whether observed differences or effects are statistically significant or simply due to random chance.
The core idea is straightforward: you start with a default assumption (the null hypothesis), collect data, and determine whether the evidence is strong enough to reject that assumption in favor of an alternative explanation.
Imagine a company claims their new battery lasts an average of 200 hours. You suspect it is less. You buy 36 batteries, test them, and find an average life of 195 hours. Is this enough evidence to say the company is wrong, or could the difference be due to random variation? Hypothesis testing gives you a systematic framework to answer this question.
Why Use Hypothesis Testing?
Validate Claims
Test whether claims or theories about a population hold up when confronted with real data.
Compare Groups
Determine whether two treatments, methods, or populations differ in a meaningful way.
Distinguish Signal from Noise
Determine if observed differences are statistically significant or just due to random chance.
Inform Decisions
Make evidence-based decisions in research, business, medicine, and many other fields.
2. Key Definitions
Null Hypothesis (H0)
The "status quo" or default assumption. States there is no effect, no difference, or no relationship. Always includes an equality sign (e.g., mu = mu_0).
Alternative Hypothesis (Ha)
The claim we suspect is true and are seeking evidence for. States there is an effect, difference, or relationship. Never includes an equality sign (it uses "not equal to", <, or >).
p-value
The probability of observing a test statistic as extreme as (or more extreme than) the one calculated, assuming H0 is true. A small p-value provides evidence against H0.
Test Statistic
A standardized value that measures how far the sample result deviates from what is expected under H0. Its formula depends on the type of test (z, t, F, chi-square).
Significance Level (alpha)
The pre-determined threshold for rejecting H0. It is the maximum acceptable probability of a Type I error. Commonly set at 0.05, 0.01, or 0.10.
Rejection Region
The range of test statistic values that leads to rejecting H0. Defined by the critical value(s) corresponding to alpha and the type of test.
Type I Error (alpha)
Rejecting H0 when it is actually true. A "false positive." Analogy: convicting an innocent person.
Type II Error (beta)
Failing to reject H0 when it is actually false. A "false negative." Analogy: letting a guilty person go free.
3. Steps in Hypothesis Testing
Hypothesis testing follows a systematic process. Here are the seven steps you should follow every time.
Step 1: State the Hypotheses
Clearly define H0 and Ha in terms of population parameters. H0 always contains the equality; Ha specifies the direction of the test.
Step 2: Choose the Significance Level (alpha)
Decide on the maximum acceptable probability of a Type I error. Common choices: 0.05, 0.01, or 0.10.
Step 3: Select the Appropriate Test
Choose which test to use (z-test, t-test, chi-square, etc.) based on data type, sample size, and known population parameters.
Step 4: Calculate the Test Statistic
Compute the value of the test statistic using the sample data and the appropriate formula.
Step 5: Determine the p-value or Critical Value(s)
Find the probability associated with the test statistic (p-value approach) or find the critical value(s) from the distribution table (critical value approach).
Step 6: Make a Decision
p-value approach: If p-value <= alpha, reject H0. If p-value > alpha, fail to reject H0. Critical value approach: If the test statistic falls in the rejection region, reject H0.
Step 7: State the Conclusion in Context
Translate the statistical decision into plain language related to the original research question. Never say "accept H0."
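Steps 4 through 6 can be sketched in code. The following is a minimal illustration for a one-sample z-test using only the Python standard library; the function name `one_sample_z_test` and its `tail` parameter are assumptions made for this sketch, not part of any library. The inputs are taken from the apple example later in this guide.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu_0, sigma, n, alpha=0.05, tail="two"):
    """Steps 4-6 for a one-sample z-test; tail is 'two', 'left', or 'right'."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

    # Step 4: calculate the test statistic
    z = (x_bar - mu_0) / (sigma / sqrt(n))

    # Step 5: p-value for the direction stated in Ha
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")

    # Step 6: decision rule from the p-value approach
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision


# Apple example: x-bar = 195, mu_0 = 200, sigma = 15, n = 36, two-tailed
z, p_value, decision = one_sample_z_test(195, 200, 15, 36, alpha=0.05, tail="two")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

The exact normal p-value here is about 0.0455; a hand calculation that doubles the rounded table value 0.0228 gives 0.0456. Steps 1 to 3 and Step 7 remain judgment calls that code cannot make for you.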
4. Types of Tests
The direction of the alternative hypothesis determines whether the test is one-tailed or two-tailed.
Two-Tailed Test
Ha: mu is not equal to mu_0
Tests for any difference from the hypothesized value. Rejection regions in both tails.
Left-Tailed Test
Ha: mu < mu_0
Tests whether the parameter is less than the hypothesized value. Rejection region in the left tail only.
Right-Tailed Test
Ha: mu > mu_0
Tests whether the parameter is greater than the hypothesized value. Rejection region in the right tail only.
Choosing One-Tailed vs. Two-Tailed
Use a two-tailed test when you want to detect any difference (greater or less). Use a one-tailed test when you have a specific directional claim to test (e.g., "the new drug performs better" or "the average is below the standard").
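To make the three rejection regions concrete, here is a small sketch that computes the z critical values for each type of test. It assumes a z-test (standard normal reference distribution); the helper name `rejection_region` is hypothetical.

```python
from statistics import NormalDist


def rejection_region(alpha, tail):
    """Describe the z rejection region for a 'two', 'left', or 'right' tailed test."""
    std_normal = NormalDist()
    if tail == "two":
        # alpha is split evenly between the two tails
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        return f"z < {-z_crit:.3f} or z > {z_crit:.3f}"
    if tail == "left":
        # all of alpha sits in the left tail
        z_crit = std_normal.inv_cdf(alpha)
        return f"z < {z_crit:.3f}"
    if tail == "right":
        # all of alpha sits in the right tail
        z_crit = std_normal.inv_cdf(1 - alpha)
        return f"z > {z_crit:.3f}"
    raise ValueError("tail must be 'two', 'left', or 'right'")


for tail in ("two", "left", "right"):
    print(tail, "->", rejection_region(0.05, tail))
```

Because a two-tailed test splits alpha across both tails, its critical value is farther from zero than the one-tailed value at the same alpha.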
5. Errors & Power
There are two types of errors that can occur in hypothesis testing, and understanding them is essential.
Decision Outcomes Table
| | H0 is True | H0 is False |
|---|---|---|
| Reject H0 | Type I Error (alpha) | Correct Decision (Power = 1 - beta) |
| Fail to Reject H0 | Correct Decision | Type II Error (beta) |
Type I Error (False Positive)
Rejecting H0 when it is actually true. Probability = alpha. Like convicting an innocent person.
Type II Error (False Negative)
Failing to reject H0 when it is actually false. Probability = beta. Like letting a guilty person go free.
Statistical Power = 1 - beta
Power is the probability of correctly rejecting a false null hypothesis. Higher power means the test is better at detecting real effects. Power increases with: larger sample size, larger effect size, or larger alpha.
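The effect of sample size, effect size, and alpha on power can be checked numerically. The sketch below assumes a right-tailed one-sample z-test with known sigma, where the true mean exceeds mu_0 by `delta`; under that assumption, power = 1 - Phi(z_crit - delta * sqrt(n) / sigma). The function name and all input numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(delta, sigma, n, alpha):
    """Power of a right-tailed one-sample z-test when the true mean is mu_0 + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # boundary of the rejection region
    shift = delta * sqrt(n) / sigma          # mean of Z when the true mean is mu_0 + delta
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical scenarios: change one factor at a time relative to the baseline
baseline = power_right_tailed_z(delta=2, sigma=10, n=36, alpha=0.05)
more_data = power_right_tailed_z(delta=2, sigma=10, n=100, alpha=0.05)
bigger_effect = power_right_tailed_z(delta=4, sigma=10, n=36, alpha=0.05)
looser_alpha = power_right_tailed_z(delta=2, sigma=10, n=36, alpha=0.10)

for label, value in [("baseline", baseline), ("n = 100", more_data),
                     ("delta = 4", bigger_effect), ("alpha = 0.10", looser_alpha)]:
    print(f"{label:>12}: power = {value:.3f}")
```

Each of the three changes should raise power above the baseline. When `delta` is 0, the function returns alpha itself, which is the Type I error rate.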
6. One-Sample z-test for Means
The one-sample z-test is used to test a hypothesis about a single population mean when the population standard deviation (sigma) is known.
When to Use
- Testing a hypothesis about a single population mean
- Population standard deviation (sigma) is known
- Sample size is large (n >= 30) OR population is normally distributed
Z = (x-bar - mu_0) / (sigma / sqrt(n))
x-bar = sample mean, mu_0 = hypothesized mean, sigma = population standard deviation, n = sample size.
One-Sample t-test (when sigma is unknown)
When sigma is unknown, use the sample standard deviation (s) and the t-distribution with df = n - 1.
t = (x-bar - mu_0) / (s / sqrt(n))
The structure matches the z-test, with s in place of sigma. Because s is itself an estimate, the t-distribution has heavier tails than the standard normal, especially for small n.
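As an illustration of the t formula, the sketch below computes the t statistic and degrees of freedom from raw data. The sample values and the function name `one_sample_t_statistic` are hypothetical. The Python standard library has no t-distribution, so the p-value or critical value would still come from a t-table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu_0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu_0."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                      # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical measurements, made up for this sketch
sample = [48, 52, 55, 47, 53, 51, 49, 54, 50]
t_stat, df = one_sample_t_statistic(sample, mu_0=50)
print(f"t = {t_stat:.4f}, df = {df}")
```

For these made-up values, the statistic would be compared against the t-distribution with df = 8; the tabled two-tailed critical value at alpha = 0.05 is 2.306.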
Decision Rules
p-value Approach
If p-value <= alpha: Reject H0
If p-value > alpha: Fail to reject H0
Critical Value Approach
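If the test is two-tailed and |test statistic| > critical value: Reject H0
If the test is left-tailed and test statistic < -critical value: Reject H0
If the test is right-tailed and test statistic > critical value: Reject H0
(Here the critical value is taken as a positive number.)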
Otherwise: Fail to reject H0
7. Two-Sample Tests
Two-sample tests are used to compare parameters of two different populations or groups.
Two-Sample z-test for Means
Compares two population means when both population standard deviations are known and samples are independent.
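A minimal sketch of this test, assuming H0: mu_1 = mu_2 and a two-tailed alternative. The statistic is the difference in sample means divided by sqrt(sigma_1^2 / n_1 + sigma_2^2 / n_2). The function name and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x_bar_1, x_bar_2, sigma_1, sigma_2, n_1, n_2):
    """Return (z, two-tailed p-value) for H0: mu_1 = mu_2 with known sigmas."""
    standard_error = sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)
    z = (x_bar_1 - x_bar_2) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical summary statistics, made up for this sketch
z2, p2 = two_sample_z(x_bar_1=82, x_bar_2=78, sigma_1=8, sigma_2=6, n_1=64, n_2=36)
print(f"z = {z2:.4f}, p-value = {p2:.4f}")
```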
Two-Sample t-test for Means
Compares two population means when population standard deviations are unknown and samples are independent. Requires assumptions about population variances (equal or unequal).
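The equal-variance and unequal-variance choices lead to two different statistics. The sketch below shows both under their usual names: the pooled t-test (equal variances assumed) and Welch's t-test (unequal variances, with the Welch-Satterthwaite approximation for df). The function names and input numbers are hypothetical, and the p-value would still come from a t-table or a statistics package.

```python
from math import sqrt


def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    t = (mean_1 - mean_2) / sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    return t, df


def welch_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    v_1 = sd_1 ** 2 / n_1
    v_2 = sd_2 ** 2 / n_2
    t = (mean_1 - mean_2) / sqrt(v_1 + v_2)
    df = (v_1 + v_2) ** 2 / (v_1 ** 2 / (n_1 - 1) + v_2 ** 2 / (n_2 - 1))
    return t, df


# Hypothetical summary statistics, made up for this sketch
print("pooled:", pooled_t(78, 10, 30, 72, 12, 40))
print("welch: ", welch_t(78, 10, 30, 72, 12, 40))
```

When the two sample sizes are equal, both functions return the same t value; they differ only in the degrees of freedom unless the sample SDs also match.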
Paired t-test
Compares means from dependent or paired samples (e.g., before-and-after measurements on the same subjects). Focuses on the mean of the differences between pairs.
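Because the paired test works on the differences, it reduces to a one-sample t-test of H0: mean difference = 0. A minimal sketch, with hypothetical before-and-after scores and a hypothetical function name:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, df) for H0: mean of (after - before) differences is 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Hypothetical scores for the same five subjects, made up for this sketch
before = [72, 65, 80, 70, 68]
after = [75, 70, 82, 74, 69]
t_paired, df_paired = paired_t_statistic(before, after)
print(f"t = {t_paired:.4f}, df = {df_paired}")
```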
Two-Sample z-test for Proportions
Compares two population proportions. Used for categorical data when comparing success rates between two groups.
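A sketch of the two-proportion z-test, assuming H0: p_1 = p_2 and the common textbook choice of a pooled proportion in the standard error. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Return (z, two-tailed p-value) for H0: p_1 = p_2 using a pooled proportion."""
    p_hat_1 = successes_1 / n_1
    p_hat_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)   # estimate of the common p under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_hat_1 - p_hat_2) / standard_error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical counts: 60 of 100 successes in group 1, 45 of 100 in group 2
z_prop, p_prop = two_proportion_z(60, 100, 45, 100)
print(f"z = {z_prop:.4f}, p-value = {p_prop:.4f}")
```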
8. Worked Examples
Example 1: One-Sample z-test
A company claims their "Jumbo" apples weigh an average of 200 grams. A quality control manager suspects the average weight is different. They test 36 apples and find a mean of 195 grams. The population standard deviation is known to be 15 grams. Test at the 5% significance level.
H0: mu = 200 grams
Ha: mu is not equal to 200 grams (two-tailed)
alpha: 0.05
Test: One-sample z-test (sigma = 15 is known, n = 36)
Z = (195 - 200) / (15 / sqrt(36)) = -5 / 2.5 = -2.00
p-value: 2 x P(Z < -2.00) = 2 x 0.0228 = 0.0456
Decision: p-value (0.0456) <= alpha (0.05), so reject H0
Conclusion: At the 5% significance level, there is sufficient evidence to conclude that the average weight of the "Jumbo" apples differs from 200 grams.
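As a cross-check, the same decision can be reached with the critical value approach. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 1
x_bar, mu_0, sigma, n, alpha = 195, 200, 15, 36, 0.05

z_apples = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-tailed test: alpha is split across both tails
z_crit_apples = NormalDist().inv_cdf(1 - alpha / 2)
reject_apples = abs(z_apples) > z_crit_apples

print(f"z = {z_apples:.2f}, critical value = {z_crit_apples:.3f}, reject H0: {reject_apples}")
```

Since |z| = 2.00 exceeds the critical value of about 1.960, both approaches agree on rejecting H0.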
Example 2: One-Sample z-test (Right-Tailed)
Test H0: mu = 50 vs Ha: mu > 50 with x-bar = 52, sigma = 10, n = 36, alpha = 0.05.
Z = (52 - 50) / (10 / sqrt(36)) = 2 / 1.667 = 1.20
Critical value: For alpha = 0.05 (one-tailed), Z* = 1.645
Decision: 1.20 < 1.645, so fail to reject H0
Conclusion: At the 5% significance level, there is not enough evidence to conclude that the population mean is greater than 50.
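The p-value approach gives the same decision for this example. A minimal check using only the Python standard library, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 2
x_bar, mu_0, sigma, n, alpha = 52, 50, 10, 36, 0.05

z_right = (x_bar - mu_0) / (sigma / sqrt(n))

# Right-tailed test: the p-value is the area to the right of z
p_right = 1 - NormalDist().cdf(z_right)
decision_right = "reject H0" if p_right <= alpha else "fail to reject H0"

print(f"z = {z_right:.2f}, p-value = {p_right:.4f}, decision: {decision_right}")
```

The p-value is about 0.115, well above alpha = 0.05, which matches the critical value decision.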
Example 3: Two-Sample t-test (Conceptual)
A researcher compares two teaching methods on student test scores. 25 students use Method A, 28 use Method B. Population standard deviations are unknown.
H0: mu_A = mu_B (no difference)
Ha: mu_A is not equal to mu_B (two-tailed)
alpha: 0.01
Test: Two-sample t-test for independent means
Process: Calculate t-statistic using sample means, sample SDs, and sample sizes. Compare to t-distribution with appropriate df.
If p-value <= 0.01: There is sufficient evidence of a significant difference between the two teaching methods.
If p-value > 0.01: There is not enough evidence to conclude a significant difference exists.
9. Memory Aids
"p-value Low, Null Must Go!"
If the p-value is low (less than or equal to alpha), reject the null hypothesis.
"p-value High, Null Will Fly!"
If p-value is high (greater than alpha), fail to reject the null hypothesis.
Type I = False Positive
Rejecting a true H0. Think of it as a "false alarm" -- concluding there is an effect when there is not.
Type II = False Negative
Failing to reject a false H0. Think of it as "missing the signal" -- failing to detect a real effect.
"Alpha is the Error You Tolerate"
Alpha is the maximum probability of a Type I error you are willing to accept.
p-value = Probability of Data This Extreme Under H0
The p-value is the probability of seeing data this extreme if H0 is true. It is NOT the probability that H0 is true.
10. Common Mistakes
Confusing p-value with P(H0 is true)
The p-value is the probability of observing data at least as extreme as the sample, given that H0 is true. It is not the probability that H0 itself is true.
Saying "accept H0"
We never "accept" H0. We either reject it or fail to reject it. Failing to find evidence against H0 is not the same as proving it.
Not stating conclusions in context
Always relate your statistical decision back to the original real-world problem. A naked "reject H0" without context is incomplete.
Choosing the wrong test
Incorrectly applying a z-test instead of a t-test, or using a one-sample test for a two-sample problem. Check your conditions carefully.
Ignoring assumptions
Each test has underlying assumptions (normality, independence, equal variances). Violating these can invalidate the results.
Confusing Type I and Type II errors
Type I = false positive (rejecting true H0). Type II = false negative (failing to reject false H0). Know the difference and their consequences.
Fishing for significance
Running multiple tests until one yields a "significant" p-value without proper correction inflates the Type I error rate.
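The size of this inflation is easy to compute. The sketch below assumes the tests are independent and every null hypothesis is true, in which case the chance of at least one false positive across m tests is 1 - (1 - alpha)^m. It also shows the Bonferroni adjustment (alpha / m), one common correction that this guide does not otherwise cover. The function names are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests, all H0 true."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


for m in (1, 5, 20):
    print(f"m = {m:>2}: family-wise error rate = {familywise_error_rate(0.05, m):.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, m):.4f}")
```

With 20 independent tests at alpha = 0.05, the chance of at least one false positive is roughly 64%.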
Focusing solely on the p-value
The p-value should be considered alongside effect size, confidence intervals, and practical significance. A statistically significant result may not be practically important.
Quick Revision Summary
- Hypothesis testing is a formal procedure to decide whether sample data provides enough evidence to reject a claim about a population parameter.
- H0 (null) represents the status quo and always includes an equality. Ha (alternative) is what we want to prove.
- The p-value is the probability of seeing data as extreme as ours if H0 is true. If p-value <= alpha, reject H0.
- Type I error = false positive (rejecting true H0). Type II error = false negative (failing to reject false H0).
- Use a z-test when sigma is known; use a t-test when sigma is unknown.
- Statistical power (1 - beta) is the probability of correctly rejecting a false H0.
- Never say "accept H0." Instead say "fail to reject H0" -- absence of evidence is not evidence of absence.
- Always state conclusions in context of the original research question.
Frequently Asked Questions
- What is the difference between a z-test and a t-test?
- A z-test is used when the population standard deviation (sigma) is known. A t-test is used when sigma is unknown and must be estimated by the sample standard deviation (s), which matters most for small samples. For large samples (roughly n >= 30), the t-distribution is close to the standard normal, so some textbooks use a z-test with s in place of sigma as an approximation.
- Can I "accept" the null hypothesis?
- No. In hypothesis testing, we either "reject the null hypothesis" or "fail to reject the null hypothesis." We never "accept" it because we are working with sample data, which can never definitively prove the absence of an effect in the entire population. Failing to find evidence against H0 does not mean H0 is true.
- How do I choose the significance level (alpha)?
- The choice depends on the context and consequences of making a Type I error. Use 0.05 (5%) as a common standard. Use 0.01 (1%) when a Type I error is very costly (e.g., medical trials). Use 0.10 (10%) when a Type II error is more costly or for exploratory research.
- What is the relationship between alpha and beta?
- Alpha (the Type I error rate) and beta (the Type II error rate) trade off against each other. With sample size and effect size held constant, decreasing alpha (making it harder to reject H0) increases beta (making it harder to detect a true effect).
- What is statistical power?
- Statistical power is the probability of correctly rejecting a false null hypothesis, calculated as 1 - beta. A higher power means the test is more likely to detect a real effect if one exists. Power can be increased by increasing sample size, increasing effect size, or increasing alpha.
Practice Quiz
Test your knowledge by answering each question.
1. Which of the following is true about the null hypothesis (H0)?
2. What is the p-value?
3. If a p-value is 0.03 and the significance level (alpha) is 0.05, what is the correct decision?
4. A Type I error occurs when:
5. The significance level (alpha) is the probability of committing which type of error?
6. When would you typically use a one-sample t-test instead of a one-sample z-test for means?
7. Which of the following is an example of an alternative hypothesis for a two-tailed test concerning a population mean mu?
8. What does "failing to reject the null hypothesis" mean?
9. Which factor is directly related to the power of a statistical test?
10. In the context of hypothesis testing, what is the rejection region?
Final Study Advice
1. Always state both hypotheses (H0 and Ha) before doing any calculations. The alternative hypothesis determines whether you use a one-tailed or two-tailed test.
2. Memorize the decision rule: p-value <= alpha means reject H0. Practice until it becomes second nature.
3. Practice identifying whether a problem calls for a z-test or t-test, and whether it is one-sample or two-sample.
4. Always write your conclusion in the context of the original problem, not just "reject H0" or "fail to reject H0."
5. Understand the relationship between errors, power, and sample size -- this is frequently tested on exams.