Chi-Square Tests
Chi-Square tests are a family of non-parametric statistical tests used to analyze categorical data. They help determine if there is a significant difference between observed frequencies and expected frequencies, telling us whether a pattern in data is likely due to chance or a real relationship.
This guide covers the Goodness-of-Fit test, Test of Independence, and Test of Homogeneity with step-by-step calculations, worked examples, key formulas, and a 10-question practice quiz.
1. Introduction
Chi-square tests are among the most widely used statistical tools for analyzing categorical data. Unlike tests that compare means (like t-tests), chi-square tests compare observed counts against expected counts to determine whether there is a statistically significant pattern or relationship.
Whether you are testing if a die is fair, checking if customer preferences match market share predictions, or investigating whether two categorical variables are related, chi-square tests provide a rigorous framework for making these decisions.
Chi-square tests appear in nearly every field: marketing surveys, clinical trials, genetics, quality control, and social science research. They let you determine whether observed patterns in categorical data reflect real differences or are simply due to random chance.
Goodness-of-Fit
Does a sample distribution match a hypothesized population distribution?
Independence
Are two categorical variables related within a single population?
Homogeneity
Is the distribution of a variable the same across different populations?
2. Key Definitions
Chi-Square Statistic
A single numerical value that quantifies the difference between observed and expected frequencies. Larger values indicate greater discrepancy.
Observed Frequencies (O)
The actual counts or frequencies recorded in each category from your sample data.
Expected Frequencies (E)
The counts expected in each category if the null hypothesis were true (no relationship, no difference, or data fit the expected distribution).
Degrees of Freedom (df)
A parameter that defines the shape of the chi-square distribution. It equals the number of counts that are free to vary once the constraints on the data (such as fixed totals) are taken into account.
Contingency Table
A table displaying the frequency distribution of two categorical variables, used in tests of independence and homogeneity.
Null Hypothesis (H0)
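The assumption that there is no significant difference or relationship (for Goodness-of-Fit, that the data follow the hypothesized distribution). A chi-square test either rejects H0 or fails to reject it; it never proves H0 true.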
P-value
The probability of obtaining a test statistic at least as extreme as the one observed, assuming H0 is true. It is compared with the significance level (alpha).
3. Chi-Square Distribution
The chi-square distribution is a family of continuous probability distributions that are positively skewed (tail to the right). Its shape depends on one parameter: the degrees of freedom (df).
The distribution is always non-negative, starting at 0 and extending infinitely to the right. As degrees of freedom increase, the distribution becomes more symmetrical and approaches a normal distribution.
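The shape described above can be checked by simulation. The sketch below relies on a standard fact that this guide does not state: a chi-square variable with df degrees of freedom is distributed as the sum of df squared, independent standard normal values. The function names, sample size, and seed are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_chi_square(df, n_samples=20000, seed=0):
    """Draw n_samples values, each the sum of df squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_samples)]


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


for df in (1, 5, 30):
    draws = simulate_chi_square(df)
    # Expect: minimum >= 0, mean near df, variance near 2 x df, skewness shrinking as df grows.
    print(f"df={df:2d}  min={min(draws):.3f}  mean={statistics.fmean(draws):6.2f}  "
          f"variance={statistics.pvariance(draws):6.2f}  skewness={skewness(draws):.2f}")
```

The printed skewness falls steadily from df = 1 to df = 30, which is the "more symmetrical as df increases" behavior in numbers.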
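Key Properties
- Non-negative: values range from 0 to infinity.
- Right-skewed, with the skew shrinking as df increases.
- Mean equal to df; variance equal to 2 x df.
- A separate curve for each value of df.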
How to Use the Distribution
Compare your calculated chi-square test statistic to the critical value from a chi-square table (using your df and significance level). If the test statistic exceeds the critical value, reject H0. Alternatively, compare the p-value to your alpha level.
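Printed tables only cover a few alpha levels and df values. As a minimal pure-Python sketch (no third-party packages), the right-tail probability can be computed from the power series of the regularized incomplete gamma function, and the critical value recovered by bisection. The helper names `chi2_sf`, `chi2_critical`, and `decide` are hypothetical; in practice `scipy.stats.chi2.sf` and `scipy.stats.chi2.isf` return the same quantities.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Computes the regularized lower incomplete gamma function P(s, z), with
    s = df / 2 and z = x / 2, from its power series and returns 1 - P(s, z).
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s            # n = 0 term of sum_n z^n / (s (s+1) ... (s+n))
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (s + n)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_critical(alpha, df):
    """Right-tail critical value: the x where chi2_sf(x, df) equals alpha."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:     # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):               # chi2_sf decreases in x, so bisection applies
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def decide(statistic, df, alpha=0.05):
    """Return (p-value, critical value, reject H0?) for a right-tailed chi-square test."""
    p_value = chi2_sf(statistic, df)
    critical = chi2_critical(alpha, df)
    return p_value, critical, statistic > critical


# Website-traffic example from the Worked Examples section: statistic 5.0, df = 3.
p, crit, reject = decide(5.0, df=3)
print(f"p-value = {p:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
# p-value = 0.172, critical value = 7.815, reject H0: False
```

Both decision rules agree by construction: the statistic exceeds the critical value exactly when the p-value falls below alpha.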
4. Calculating the Test Statistic
The general formula for the chi-square test statistic is the same across all three types of chi-square tests. What differs is how you calculate the expected frequencies.
Chi-Square Test Statistic Formula
chi-square = Sum of (O - E)² / E
Where O = observed frequency, E = expected frequency for each category or cell
Step-by-Step Process
1. State hypotheses: Define H0 (null) and H1 (alternative).
2. Determine observed frequencies (O): These come directly from your data.
3. Calculate expected frequencies (E): This step depends on the specific test type.
4. Compute (O - E)² / E for each category or cell.
5. Sum the results to get the chi-square test statistic.
6. Determine df, find the critical value or p-value, and make your decision.
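The steps from recording the observed counts through the final decision translate almost line for line into code. A minimal sketch, assuming the expected counts have already been computed (that step depends on the test type) and the critical value has been looked up in a chi-square table; the function names are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Compute (O - E)^2 / E for each category or cell and sum the results."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one count per category")
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), contributions


def decision(statistic, critical_value):
    """Reject H0 only when the statistic exceeds the tabled critical value."""
    return "reject H0" if statistic > critical_value else "fail to reject H0"


# Website-traffic counts from Worked Example 1 (E = 50 per section, critical value 7.815).
stat, parts = chi_square_statistic([60, 55, 45, 40], [50, 50, 50, 50])
print(parts)                        # [2.0, 0.5, 0.5, 2.0]
print(stat, decision(stat, 7.815))  # 5.0 fail to reject H0
```

Returning the per-cell contributions alongside the total shows which categories drive the result.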
Key Insight: (O - E)² / E
Squaring the difference makes every term non-negative and penalizes larger deviations more heavily. Dividing by E scales each contribution relative to its expected count, so a difference of 5 from an expected count of 10 counts more than a difference of 5 from an expected count of 1000.
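A quick arithmetic check of that claim, written as a small Python snippet (the helper name is illustrative): the same raw difference of 5 contributes one hundred times more when E is 10 than when E is 1000.

```python
def contribution(observed, expected):
    """One cell's contribution (O - E)^2 / E to the chi-square statistic."""
    return (observed - expected) ** 2 / expected


small_e = contribution(15, 10)      # difference of 5 from E = 10
large_e = contribution(1005, 1000)  # difference of 5 from E = 1000
print(small_e, large_e)             # 2.5 0.025
```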
5. Goodness-of-Fit Test
The Goodness-of-Fit test determines whether a single categorical variable follows a hypothesized distribution. For example, testing whether a die is fair or whether customer preferences match predicted market shares.
Expected Frequency Formula
E = n x p (where n = total observations, p = hypothesized proportion for each category)
Degrees of Freedom
df = k - 1 (where k = number of categories)
Example: Testing a Fair Die
A die is rolled 120 times. Observed results: 18, 22, 20, 21, 19, 20. If the die is fair, we expect 20 for each face.
| Face | O | E | (O-E)²/E |
|---|---|---|---|
| 1 | 18 | 20 | 0.20 |
| 2 | 22 | 20 | 0.20 |
| 3 | 20 | 20 | 0.00 |
| 4 | 21 | 20 | 0.05 |
| 5 | 19 | 20 | 0.05 |
| 6 | 20 | 20 | 0.00 |
| Total | 120 | 120 | 0.50 |
With df = 5 and a critical value of 11.07 at alpha = 0.05, our chi-square of 0.50 is far below the critical value. We fail to reject H0 -- the data are consistent with a fair die.
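The die example can be reproduced by computing E = n x p from the hypothesized proportions and df = k - 1. A minimal sketch; the function name and the use of `fractions.Fraction` to keep the proportion 1/6 exact are illustrative choices, not part of the method itself.

```python
from fractions import Fraction


def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, expected counts, df) for a goodness-of-fit test."""
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]                 # E = n x p
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                  # df = k - 1
    return float(statistic), [float(e) for e in expected], df


stat, expected, df = goodness_of_fit([18, 22, 20, 21, 19, 20], [Fraction(1, 6)] * 6)
print(stat, expected, df)   # 0.5 [20.0, 20.0, 20.0, 20.0, 20.0, 20.0] 5
```

Plain floats such as `[0.25] * 4` also work for the proportions; the tolerance in the sum check allows for floating-point rounding.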
6. Test of Independence
The Test of Independence determines whether there is a statistically significant association between two categorical variables within a single population. For example: Is there a relationship between gender and preferred type of social media?
Expected Frequency Formula
E = (Row Total x Column Total) / Grand Total
Degrees of Freedom
df = (r - 1)(c - 1) (r = number of rows, c = number of columns)
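The expected-frequency formula is applied cell by cell across the contingency table. A minimal sketch (function name hypothetical), using the observed age-group by news-source counts from Worked Example 2 to check the value 26.67 quoted there:

```python
def expected_table(observed):
    """E[i][j] = (row total i x column total j) / grand total, plus df = (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, df


# Rows: Under 30, 30-60, Over 60; columns: TV, Online, Print.
observed = [[10, 40, 10],
            [30, 20, 10],
            [40, 10, 10]]
expected, df = expected_table(observed)
print([[round(e, 2) for e in row] for row in expected], df)
# [[26.67, 23.33, 10.0], [26.67, 23.33, 10.0], [26.67, 23.33, 10.0]] 4
```

The expected table keeps the same row and column totals as the observed table, which is a handy check on hand calculations.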
Hypotheses
H0 (Null)
The two variables are independent (no association).
H1 (Alternative)
The two variables are dependent (there is an association).
Association, Not Causation
A significant chi-square test only shows that two variables are associated. It does NOT prove that one variable causes the other. There may be confounding variables or the relationship may be coincidental.
7. Test of Homogeneity
The Test of Homogeneity determines whether the distribution of a single categorical variable is the same across two or more different populations. For example: Are the proportions of people who prefer coffee, tea, or soda the same in City A and City B?
The calculation is identical to the Test of Independence -- the same formula, the same degrees of freedom. The difference lies in the sampling method and research question.
Independence vs. Homogeneity
Same Math, Different Questions
The formulas and calculations are identical for both tests. The key difference is how you collected your data and what question you are answering. On exams, pay close attention to the scenario description to determine which test is being used.
8. Worked Examples
Example 1: Goodness-of-Fit Test
A company claims its website traffic is equally distributed across four sections: Home, Products, About Us, and Contact. A sample of 200 users showed: Home (60), Products (55), About Us (45), Contact (40). Test at alpha = 0.05.
H0: Traffic is equally distributed (25% each).
H1: Traffic is NOT equally distributed.
Expected: 200 x 0.25 = 50 for each section.
| Section | O | E | (O-E) | (O-E)²/E |
|---|---|---|---|---|
| Home | 60 | 50 | 10 | 2.0 |
| Products | 55 | 50 | 5 | 0.5 |
| About Us | 45 | 50 | -5 | 0.5 |
| Contact | 40 | 50 | -10 | 2.0 |
| Total | 200 | 200 | 0 | 5.0 |
df = 4 - 1 = 3. Critical value at alpha = 0.05 is 7.815.
Decision: chi-square = 5.0 < 7.815, so we fail to reject H0. The p-value is approximately 0.172.
There is not enough evidence to conclude that the website traffic is unequally distributed.
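The quoted p-value of about 0.172 can be checked without a table. For df = 3 the chi-square upper tail has the closed form P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) x exp(-x/2), a standard result added here rather than taken from this guide; the sketch below evaluates it with Python's `math` module (function name hypothetical).

```python
import math


def chi2_sf_df3(x):
    """Right-tail probability of a chi-square distribution with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(5.0)
print(round(p_value, 3))   # 0.172
```

Evaluating the same function at the critical value 7.815 gives a tail area of about 0.05, which ties the table value back to alpha.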
Example 2: Test of Independence
A survey of 180 people asked about preferred news source (TV, Online, Print) and age group (Under 30, 30-60, Over 60). Is there a relationship between age group and preferred news source at alpha = 0.01?
H0: Age group and news source are independent.
H1: Age group and news source are dependent.
| Age Group | TV | Online | Print | Row Total |
|---|---|---|---|---|
| Under 30 | 10 | 40 | 10 | 60 |
| 30-60 | 30 | 20 | 10 | 60 |
| Over 60 | 40 | 10 | 10 | 60 |
| Col Total | 80 | 70 | 30 | 180 |
Expected frequencies are computed as (Row Total x Col Total) / 180. For example, Under 30 / TV: (60 x 80) / 180 = 26.67.
chi-square = 37.50 with df = (3-1)(3-1) = 4.
Critical value at alpha = 0.01 with df = 4 is 13.277.
Decision: 37.50 > 13.277, so we reject H0. The p-value is less than 0.0001.
There is strong evidence that age group and preferred news source are related.
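Working with exact fractions avoids rounding the expected counts 80/3 and 70/3, and shows that the statistic for this table is exactly 37.5. For df = 4 the upper tail has the closed form P(X >= x) = exp(-x/2) x (1 + x/2), a standard result added here, which confirms the p-value is far below 0.0001. A minimal sketch using only the standard library; the function name is hypothetical.

```python
import math
from fractions import Fraction


def independence_statistic(observed):
    """Exact chi-square statistic and df for a contingency table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    statistic = Fraction(0)
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = Fraction(row_totals[i] * col_totals[j], grand)   # exact expected count
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


observed = [[10, 40, 10],   # Under 30: TV, Online, Print
            [30, 20, 10],   # 30-60
            [40, 10, 10]]   # Over 60
stat, df = independence_statistic(observed)
x = float(stat)
p_value = math.exp(-x / 2) * (1 + x / 2)   # closed-form right tail, valid only for df = 4
print(stat, df, p_value < 0.0001)          # 75/2 4 True
```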
9. Key Formulas
Chi-Square Test Statistic
chi-square = Sum of (O - E)² / E
Degrees of Freedom (Goodness-of-Fit)
df = k - 1 (k = number of categories)
Degrees of Freedom (Independence / Homogeneity)
df = (r - 1)(c - 1) (r = rows, c = columns)
Expected Frequency (Goodness-of-Fit)
E = n x p (n = total, p = hypothesized proportion)
Expected Frequency (Independence / Homogeneity)
E = (Row Total x Column Total) / Grand Total
10. Memory Aids
"Chi-Square is for CATS"
Used for CATegorical data. If your data involves categories and counts, think chi-square.
"OEE" -- Observed, Expected, Errors
Remember the three key values: Observed counts, Expected counts, and Errors (differences between O and E).
"DF for GOF: Go F(-1)"
Goodness-of-Fit df = k - 1. Just subtract one from the number of categories.
"Small E is BAD"
Expected frequencies should be at least 5 for valid results. With small expected counts, the chi-square distribution is a poor approximation to the sampling distribution of the test statistic.
11. Common Mistakes
Using Raw Data Instead of Frequencies
Chi-square tests operate on counts/frequencies, not raw scores or percentages. Always convert to counts first if your data is in percentage form.
Violating the Expected Frequency Assumption
All expected frequencies must be at least 5 for valid results. If not, combine categories or use Fisher's Exact Test for small tables.
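Checking the expected-count rule is easy to automate before running the test. A minimal sketch; the threshold of 5 follows the rule stated in this guide, the function name is hypothetical, and the second table of expected counts is a made-up illustration.

```python
def small_expected_cells(expected, minimum=5):
    """Return (row, column, value) for every expected count below `minimum`."""
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


# Expected counts for Worked Example 2 (each row: TV, Online, Print).
example_2 = [[80 / 3, 70 / 3, 10.0] for _ in range(3)]
print(small_expected_cells(example_2))   # []

# Hypothetical expected counts that break the rule in two cells.
print(small_expected_cells([[12.0, 3.5], [8.0, 4.5]]))   # [(0, 1, 3.5), (1, 1, 4.5)]
```

An empty result means the test can proceed; otherwise the flagged cells point to the categories worth combining.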
Incorrectly Calculating Degrees of Freedom
Using the wrong df formula leads to incorrect critical values and p-values. Remember: Goodness-of-Fit uses k-1; Independence/Homogeneity uses (r-1)(c-1).
Claiming Causation from Association
A significant chi-square test only indicates an association or relationship, not causation. Confounding variables may be at play.
Confusing the Three Test Types
Goodness-of-Fit uses one variable and a hypothesized distribution. Independence and Homogeneity both use contingency tables but differ in sampling design and research question.
Quick Revision
- ✓ Chi-square tests analyze categorical data by comparing observed vs. expected frequencies.
- ✓ Formula: chi-square = Sum of (O - E)² / E for all categories or cells.
- ✓ Goodness-of-Fit: One variable, compares to a hypothesized distribution. df = k - 1.
- ✓ Independence: Two variables in one population. Are they associated? df = (r-1)(c-1).
- ✓ Homogeneity: One variable across multiple populations. Same distribution? df = (r-1)(c-1).
- ✓ Expected frequencies must all be at least 5 for valid results.
- ✓ The distribution is always right-skewed and non-negative; becomes more normal as df increases.
- ✓ Decision rule: Reject H0 if chi-square > critical value (or if p-value < alpha).
- ✓ Association only: A significant result shows association, never causation.
Frequently Asked Questions
- What is the main difference between a Chi-Square Test of Independence and a Test of Homogeneity?
- Both use the same formula and calculation steps. The difference lies in sampling: Independence uses one sample measuring two categorical variables to see if they are associated. Homogeneity uses two or more independent samples to see if the distribution of one categorical variable is the same across different populations.
- What if my expected frequencies are less than 5?
- This violates the assumption of the chi-square test, making the results unreliable. You can combine categories, use Fisher's Exact Test for 2x2 tables, or apply Yates's Correction for Continuity (though this is debated).
- Can Chi-Square tests be used for continuous data?
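- Not directly. Chi-square tests work on counts in categories (nominal or ordinal data). Continuous values can only be used after grouping them into bins, which discards information. To compare continuous outcomes, use parametric tests like t-tests or ANOVA, or non-parametric alternatives like Mann-Whitney U or Kruskal-Wallis.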
- Is a larger Chi-Square value always better?
- Not in itself. A larger chi-square value indicates a greater deviation from what was expected under the null hypothesis. If this value exceeds the critical value, it provides evidence to reject the null hypothesis, meaning stronger evidence of a relationship or difference. It does not measure how strong the relationship is, because for the same proportions the statistic grows with sample size.
- What does the p-value mean in a Chi-Square test?
- The p-value is the probability of observing a chi-square statistic as extreme as (or more extreme than) the one calculated, assuming the null hypothesis is true. A small p-value (typically less than alpha) suggests the deviation from expected is unlikely due to chance.
Practice Quiz
Test your knowledge with the following questions.
1. Which of the following is the primary type of data analyzed by Chi-Square tests?
2. The Chi-Square Goodness-of-Fit test is used to:
3. What is the formula for calculating expected frequencies in a Chi-Square Test of Independence?
4. If a Chi-Square Test of Independence has 3 rows and 4 columns, what are the degrees of freedom?
5. A researcher obtains a p-value of 0.03 with a significance level of 0.05. What is the correct decision?
6. Which assumption is critical for the validity of a Chi-Square test?
7. A significant Chi-Square test of independence means:
8. What happens to the shape of the Chi-Square distribution as degrees of freedom increase?
9. Which scenario would typically use a Chi-Square Test of Homogeneity?
10. If your calculated chi-square test statistic is very small (close to 0), what does this imply?
Final Study Advice
1. Always check that all expected frequencies are at least 5 before proceeding with the test.
2. Practice computing expected frequencies by hand for both Goodness-of-Fit and contingency table scenarios.
3. Create a summary table of the three test types, noting the differences in sampling and hypotheses.
4. When reading a problem, identify whether you have one variable or two, and how many populations are involved.
5. Remember: chi-square tests are always right-tailed -- a large test statistic means more evidence against H0.