ANOVA | Tianqi’s Blog

Resources:

Sample_and_Data.xlsx (18.5 KB)

How it works in Excel:

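On the Data tab, click Data Analysis (this requires the Analysis ToolPak add-in to be enabled).

Choose the ANOVA tool that matches your design, for example Anova: Single Factor for a one-way ANOVA.

Select the input range, press OK, and read the result table.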

Terminology:

Hypothesis Tests

Null Hypothesis (H0)
Alternative Hypothesis (H1)
Significance Level α (often 0.05)

Sum of Squares (SS): The sum of squares of the differences between data points and their mean.

Treatment Sum of Squares (SST): The sum of squares of the differences between group means and the overall mean.

Error Sum of Squares (SSE): The sum of squares of the differences between data points within a group and their group mean.

Mean Square (MS): The sum of squares divided by its degrees of freedom.

F-statistic: The ratio of the treatment mean square to the error mean square.

Degrees of Freedom (df): The number of independent values that are free to vary, calculated as the number of data points minus the number of parameters estimated from them.

F-distribution table

One-Way ANOVA

Two-Way ANOVA without Interaction

Two-Way ANOVA with Interaction

Multifactor Orthogonal Design and Analysis of Variance

How to use ANOVA:

Formulate Hypotheses:

Null Hypothesis (H0) → For instance: The population means of all groups are equal.

Alternative Hypothesis (H1) → For instance: The population means of at least two groups are not equal.

Select Significance Level α (usually α = 0.05)

Calculate F-statistic: F = Treatment Mean Square / Error Mean Square.

Decision: Compare with the critical value from the F-distribution. If the calculated F-value is greater than the critical value, reject the null hypothesis.
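Written compactly, the test statistic and the rejection rule look like this. The symbols k (number of groups) and N (total number of observations) are notation assumed here for illustration:

```latex
F = \frac{\mathrm{MS}_{\text{treatment}}}{\mathrm{MS}_{\text{error}}},
\qquad
\text{reject } H_0 \text{ if } F > F_{\alpha}(k-1,\; N-k)
```

Here F_α(k−1, N−k) is the critical value of the F-distribution with k−1 and N−k degrees of freedom at significance level α.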

In simple terms:

Current Situation: While studying the results of a project, you suspect that one factor has a significant effect on the outcomes.

Objective: To test whether the effect of this factor is statistically significant.

One-Way ANOVA

One-way ANOVA considers the effect of a single factor (Factor A) on the experimental result of interest.

Example:

Assume you want to study the effect of three different teaching methods (Factor A) on students' exam scores (experimental results). You randomly select 30 students and evenly divide them into three groups of 10. Each group of students uses a different teaching method.

Group 1: Uses teaching method A.

Group 2: Uses teaching method B.

Group 3: Uses teaching method C.

After a period of study, all students take the same exam, and the exam scores are as follows (simplified data):

Group 1 (Method A): 90, 85, 88, 84, 82, 91, 85, 88, 90, 86

Group 2 (Method B): 78, 80, 82, 79, 77, 81, 80, 79, 78, 80

Group 3 (Method C): 70, 72, 68, 74, 71, 69, 70, 68, 73, 72

You now want to know if there is a significant difference between these three teaching methods.

In this example, each group of students is a “group”. You are comparing the mean exam scores of these three “groups”.

ANOVA Analysis:

Hypothesis:

H0 (Null Hypothesis): The mean exam scores of the three groups are the same.

H1 (Alternative Hypothesis): The mean exam scores of at least two groups are different.

Calculation:

Calculate the mean of each group and the overall mean.

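Compute the Treatment Sum of Squares (between-group variation) from the differences between the group means and the overall mean, and the Error Sum of Squares (within-group variation) from the differences between the data points and their group mean.

Divide each sum of squares by its degrees of freedom to get the mean squares, then calculate the F-statistic as the treatment mean square divided by the error mean square.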

Conclusion:

If the calculated F-value is greater than the critical value at a significance level of α = 0.05, we reject the null hypothesis, which indicates a significant difference in mean exam scores among the three teaching methods.

If the null hypothesis is rejected, you might proceed with multiple comparisons (for example, Tukey's HSD test) to identify which pairs of methods differ significantly.

Calculation Process:

Data:

Group 1 (Method A): 90, 85, 88, 84, 82, 91, 85, 88, 90, 86

Group 2 (Method B): 78, 80, 82, 79, 77, 81, 80, 79, 78, 80

Group 3 (Method C): 70, 72, 68, 74, 71, 69, 70, 68, 73, 72

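Step 1: Calculate the mean of each group and the overall mean

$$\bar{x}_A = \frac{869}{10} = 86.9, \quad \bar{x}_B = \frac{794}{10} = 79.4, \quad \bar{x}_C = \frac{707}{10} = 70.7, \quad \bar{x} = \frac{2370}{30} = 79.0$$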
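The means in this step can be checked with a few lines of standard-library Python. This is only a sketch; the names `scores`, `group_means`, and `grand_mean` are chosen here for illustration.

```python
from statistics import mean

# Exam scores from the example, keyed by teaching method.
scores = {
    "A": [90, 85, 88, 84, 82, 91, 85, 88, 90, 86],
    "B": [78, 80, 82, 79, 77, 81, 80, 79, 78, 80],
    "C": [70, 72, 68, 74, 71, 69, 70, 68, 73, 72],
}

# Mean of each group.
group_means = {method: mean(values) for method, values in scores.items()}

# Overall (grand) mean across all 30 scores.
all_scores = [x for values in scores.values() for x in values]
grand_mean = mean(all_scores)

print(group_means)
print(grand_mean)
```

For this data the group means come out as 86.9, 79.4, and 70.7, and the overall mean as 79.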

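Step 2: Calculate the Treatment Sum of Squares (between-group variation)

$$\mathrm{SST} = n \sum_{i=1}^{k} (\bar{x}_i - \bar{x})^2$$

where n is the sample size of each group (here n=10), the sum runs over the k=3 groups, \bar{x}_i is the mean of group i, and \bar{x} is the overall mean. With group means 86.9, 79.4, and 70.7 and an overall mean of 79.0, SST = 10 × (7.9² + 0.4² + 8.3²) = 1314.6.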
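A minimal sketch of the between-group sum of squares, again using only the standard library. It weights each squared deviation by the group's own size, so it would also work if the groups were not all the same size (an extension beyond the equal-size case in this example). The variable names are illustrative.

```python
from statistics import mean

scores = {
    "A": [90, 85, 88, 84, 82, 91, 85, 88, 90, 86],
    "B": [78, 80, 82, 79, 77, 81, 80, 79, 78, 80],
    "C": [70, 72, 68, 74, 71, 69, 70, 68, 73, 72],
}

all_scores = [x for values in scores.values() for x in values]
grand_mean = mean(all_scores)

# Treatment sum of squares: group size times the squared distance
# between the group mean and the overall mean, summed over groups.
sst = sum(len(values) * (mean(values) - grand_mean) ** 2
          for values in scores.values())

print(round(sst, 2))
```

For the example data this gives a treatment sum of squares of 1314.6.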

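Step 3: Calculate the Error Sum of Squares (within-group variation)

For each group, calculate the sum of the squared differences between each data point and its group mean, then add the group totals to obtain the error sum of squares:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 = 78.9 + 20.4 + 38.1 = 137.4$$

where x_{ij} is the j-th score in group i.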
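The within-group sum can be sketched the same way: one partial sum per group, then the total. The names `sse_by_group` and `sse` are illustrative.

```python
from statistics import mean

scores = {
    "A": [90, 85, 88, 84, 82, 91, 85, 88, 90, 86],
    "B": [78, 80, 82, 79, 77, 81, 80, 79, 78, 80],
    "C": [70, 72, 68, 74, 71, 69, 70, 68, 73, 72],
}

# Squared deviations of each score from its own group mean, per group.
sse_by_group = {}
for method, values in scores.items():
    group_mean = mean(values)
    sse_by_group[method] = sum((x - group_mean) ** 2 for x in values)

# Error sum of squares: the per-group sums added together.
sse = sum(sse_by_group.values())

print({m: round(v, 2) for m, v in sse_by_group.items()})
print(round(sse, 2))
```

For the example data the per-group sums are 78.9, 20.4, and 38.1, for a total of 137.4.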

Step 4: Calculate Mean Squares

$$\mathrm{MST} = \frac{\mathrm{SST}}{k - 1}$$

where k is the number of groups, here k=3.

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{N - k}$$

where N is the total number of data points, here N=30.

After calculation, with SST = 1314.6 and SSE = 137.4 for the data above, MST = 1314.6 / 2 = 657.3 and MSE = 137.4 / 27 ≈ 5.089.

Step 5: Calculate F-statistic

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}} = \frac{657.3}{5.089} \approx 129.16$$

Step 6: Look up the critical value from the F-distribution table

With a significance level of α=0.05, degrees of freedom for the numerator df1=(k-1)=2, and degrees of freedom for the denominator df2=(N-k)=27, the F-distribution table gives a critical value of F=3.35.

Step 7: Decision

Since F≈129.16 > F_critical=3.35, we reject the null hypothesis H0. This means there is a statistically significant difference in mean exam scores among the three teaching methods.
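Steps 4 to 7 can be tied together in one short sketch. The helper names `one_way_anova` and `f_critical_two_numerator_df` are hypothetical. The critical-value helper relies on a closed form that holds only when the numerator degrees of freedom equal 2, as in this example; for other designs, use a table or a statistics library.

```python
from statistics import mean

groups = [
    [90, 85, 88, 84, 82, 91, 85, 88, 90, 86],  # Method A
    [78, 80, 82, 79, 77, 81, 80, 79, 78, 80],  # Method B
    [70, 72, 68, 74, 71, 69, 70, 68, 73, 72],  # Method C
]

def one_way_anova(samples):
    """Return (mst, mse, f_stat, df1, df2) for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = mean([x for s in samples for x in s])
    sst = 0.0  # between-group sum of squares
    sse = 0.0  # within-group sum of squares
    for s in samples:
        group_mean = mean(s)
        sst += len(s) * (group_mean - grand_mean) ** 2
        sse += sum((x - group_mean) ** 2 for x in s)
    df1, df2 = k - 1, n_total - k
    mst, mse = sst / df1, sse / df2
    return mst, mse, mst / mse, df1, df2

def f_critical_two_numerator_df(alpha, df2):
    """Upper-alpha critical value of F(2, df2).

    Valid ONLY for numerator df = 2, where
    P(F > f) = (1 + 2*f/df2) ** (-df2/2) can be inverted directly.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

mst, mse, f_stat, df1, df2 = one_way_anova(groups)
f_crit = f_critical_two_numerator_df(0.05, df2)
reject_h0 = f_stat > f_crit

print(f"MST={mst:.2f}, MSE={mse:.2f}, F={f_stat:.2f}, "
      f"F_crit={f_crit:.2f}, reject H0: {reject_h0}")
```

The closed-form critical value agrees with the tabulated 3.35 to two decimal places, so the sketch reaches the same decision as the table lookup.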

Note: For more complex analysis or a large dataset, it's recommended to use statistical software like R, Python (with libraries like statsmodels or SciPy), or commercial packages like SPSS or SAS.
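As one possible illustration of the library route, SciPy's `scipy.stats.f_oneway` runs a one-way ANOVA directly and returns the F-statistic together with a p-value. The wrapper `scipy_one_way_anova` below is a hypothetical helper that returns `None` when SciPy is not installed, so the sketch still runs without it.

```python
import importlib.util

method_a = [90, 85, 88, 84, 82, 91, 85, 88, 90, 86]
method_b = [78, 80, 82, 79, 77, 81, 80, 79, 78, 80]
method_c = [70, 72, 68, 74, 71, 69, 70, 68, 73, 72]

def scipy_one_way_anova(*samples):
    """Return (F, p_value) from scipy.stats.f_oneway, or None without SciPy."""
    if importlib.util.find_spec("scipy") is None:
        return None
    from scipy.stats import f_oneway
    result = f_oneway(*samples)
    return float(result.statistic), float(result.pvalue)

result = scipy_one_way_anova(method_a, method_b, method_c)
if result is None:
    print("SciPy is not installed; skipping the library check.")
else:
    f_stat, p_value = result
    # Rejecting H0 when p_value < alpha is equivalent to F > F_critical.
    print(f"F={f_stat:.2f}, p={p_value:.3g}, reject H0: {p_value < 0.05}")
```

Comparing the p-value with α leads to the same decision as comparing F with the critical value, without needing an F-distribution table.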