
Resources:

How it works in Excel:

  1. On the Data tab, click Data Analysis.
  2. Choose ANOVA.
  3. Press OK to get the result.

Terminology:

  1. Hypothesis Tests
    1. Null Hypothesis (H0)
    2. Alternative Hypothesis (H1)
    3. Significance Level α (often 0.05)
  2. Sum of Squares (SS): The sum of the squared differences between data points and their mean.
  3. Treatment Sum of Squares (SST): The sum of the squared differences between each group mean and the overall mean, weighted by group size.
  4. Error Sum of Squares (SSE): The sum of the squared differences between data points within a group and their group mean, added up over all groups.
  5. Mean Square (MS): A sum of squares divided by its degrees of freedom.
  6. F-statistic: The ratio of the treatment mean square to the error mean square.
  7. Degrees of Freedom (df): The number of values that are free to vary, i.e. the number of data points minus the number of parameters estimated from them.
  8. F-distribution table
  9. One-Way ANOVA
  10. Two-way ANOVA without Interaction
  11. Two-way ANOVA with Interaction
  12. Multifactor Orthogonal Design and Analysis of Variance

How to use ANOVA:

  1. Formulate Hypotheses:
      • Null Hypothesis (H0) → For instance: The population means of all groups are equal.
      • Alternative Hypothesis (H1) → For instance: The population means of at least two groups are not equal.
  2. Select Significance Level α (usually α = 0.05)
  3. Calculate F-statistic: F = Treatment Mean Square / Error Mean Square.
  4. Decision: Compare the calculated F-value with the critical value from the F-distribution at the chosen α. If the calculated F-value is greater than the critical value, reject the null hypothesis; otherwise, do not reject it.
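
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `one_way_f` and the toy data are hypothetical, and the critical value is typed in by hand (as if read from an F-distribution table) instead of being computed.

```python
from statistics import fmean


def one_way_f(groups):
    """Return (F, df_treatment, df_error) for a one-way layout.

    `groups` is a list of lists of numbers; group sizes may differ.
    """
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of data points
    means = [fmean(g) for g in groups]
    overall_mean = fmean(x for g in groups for x in g)

    # Treatment sum of squares: group means vs. the overall mean.
    ss_treatment = sum(len(g) * (m - overall_mean) ** 2
                       for g, m in zip(groups, means))
    # Error sum of squares: data points vs. their own group mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_treatment, df_error = k - 1, n_total - k
    f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)
    return f_stat, df_treatment, df_error


# Hypothetical toy data, chosen so the arithmetic is easy to follow by hand.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_f(toy_groups)

# Critical value for alpha = 0.05 with (2, 6) degrees of freedom,
# read from a standard F-distribution table.
f_critical = 5.14
reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.2f}, df = ({df1}, {df2}), reject H0: {reject_h0}")
```

For the toy data the treatment mean square is 13 and the error mean square is 1, so F is 13 and the null hypothesis is rejected at α = 0.05.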

In simple terms:

Current Situation: While analyzing the results of a project, you suspect that one factor has a significant effect on the outcomes.
Objective: To test whether the effect of this factor is statistically significant.

One-Way ANOVA

One-way ANOVA considers the effect of a single factor (Factor A) on the experimental results of interest.

Example:

Assume you want to study the effect of three different teaching methods (Factor A) on students' exam scores (experimental results). You randomly select 30 students and evenly divide them into three groups of 10. Each group of students uses a different teaching method.
  • Group 1: Uses teaching method A.
  • Group 2: Uses teaching method B.
  • Group 3: Uses teaching method C.
After a period of study, all students take the same exam, and the exam scores are as follows (simplified data):
  • Group 1 (Method A): 90, 85, 88, 84, 82, 91, 85, 88, 90, 86
  • Group 2 (Method B): 78, 80, 82, 79, 77, 81, 80, 79, 78, 80
  • Group 3 (Method C): 70, 72, 68, 74, 71, 69, 70, 68, 73, 72
You now want to know if there is a significant difference between these three teaching methods.
In this example, each set of 10 students taught with the same method is one “group”. You are comparing the mean exam scores of these three groups.
ANOVA Analysis:
  1. Hypothesis:
      • H0 (Null Hypothesis): The mean exam scores of the three groups are the same.
      • H1 (Alternative Hypothesis): The mean exam scores of at least two groups are different.
  2. Calculation:
      • Calculate the mean of each group and the overall mean.
      • Compute the Treatment Sum of Squares (between-group variation) from the differences between the group means and the overall mean, and the Error Sum of Squares (within-group variation) from the differences between data points and their group mean.
      • Divide each sum of squares by its degrees of freedom to get the mean squares, then divide the treatment mean square by the error mean square to get the F-statistic.
  3. Conclusion:
      • If the calculated F-value is greater than the critical value at a significance level of α = 0.05, we reject the null hypothesis, indicating a significant difference in mean exam scores among the three teaching methods.
      • If the null hypothesis is rejected, you might proceed with multiple comparisons to identify which pairs of methods have significant differences.

Calculation Process:

Data:
Group 1 (Method A): 90, 85, 88, 84, 82, 91, 85, 88, 90, 86
Group 2 (Method B): 78, 80, 82, 79, 77, 81, 80, 79, 78, 80
Group 3 (Method C): 70, 72, 68, 74, 71, 69, 70, 68, 73, 72
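Step 1: Calculate the mean of each group and the overall mean
$$\bar{x}_1 = \frac{869}{10} = 86.9, \quad \bar{x}_2 = \frac{794}{10} = 79.4, \quad \bar{x}_3 = \frac{707}{10} = 70.7$$
$$\bar{x} = \frac{869 + 794 + 707}{30} = \frac{2370}{30} = 79.0$$
Step 2: Calculate the Treatment Sum of Squares (between-group variation)
$$SST = n \sum_{i=1}^{3} (\bar{x}_i - \bar{x})^2 = 10 \left[ (86.9 - 79.0)^2 + (79.4 - 79.0)^2 + (70.7 - 79.0)^2 \right] = 1314.6$$
where n is the sample size of each group, here n=10, \bar{x}_i is the mean of group i, and \bar{x} is the overall mean.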
Step 3: Calculate the Error Sum of Squares (within-group variation)
For each group, calculate the sum of the squared differences between each data point and its group mean. Sum up all these values to obtain the error sum of squares.
$$SSE = \sum_{i=1}^{3} \sum_{j=1}^{10} (x_{ij} - \bar{x}_i)^2 = 78.9 + 20.4 + 38.1 = 137.4$$
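
Steps 1 to 3 can be reproduced with the Python standard library. The sketch below is only a check on the hand arithmetic; the variable names are illustrative, and it also confirms that the treatment and error sums of squares add up to the total sum of squares.

```python
from statistics import fmean

scores = {
    "A": [90, 85, 88, 84, 82, 91, 85, 88, 90, 86],
    "B": [78, 80, 82, 79, 77, 81, 80, 79, 78, 80],
    "C": [70, 72, 68, 74, 71, 69, 70, 68, 73, 72],
}

# Step 1: group means and overall mean.
group_means = {name: fmean(xs) for name, xs in scores.items()}
all_scores = [x for xs in scores.values() for x in xs]
overall_mean = fmean(all_scores)

# Step 2: treatment (between-group) sum of squares.
sst = sum(len(xs) * (group_means[name] - overall_mean) ** 2
          for name, xs in scores.items())

# Step 3: error (within-group) sum of squares.
sse = sum((x - group_means[name]) ** 2
          for name, xs in scores.items() for x in xs)

# Sanity check: the two parts should add up to the total sum of squares.
ss_total = sum((x - overall_mean) ** 2 for x in all_scores)

print("group means:", group_means, "overall mean:", overall_mean)
print("SST:", round(sst, 1), "SSE:", round(sse, 1), "total:", round(ss_total, 1))
```

The group means come out as 86.9, 79.4 and 70.7 with an overall mean of 79.0, and the sums of squares as 1314.6 (treatment), 137.4 (error) and 1452.0 (total).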
Step 4: Calculate Mean Squares
$$MST = \frac{SST}{k - 1}$$
where k is the number of groups, here k=3.
$$MSE = \frac{SSE}{N - k}$$
where N is the total number of data points, here N=30.
After calculation, with SST = 1314.6 and SSE = 137.4,
$$MST = \frac{1314.6}{2} = 657.3, \quad MSE = \frac{137.4}{27} \approx 5.089$$
Step 5: Calculate F-statistic
$$F = \frac{MST}{MSE} = \frac{657.3}{5.089} \approx 129.16$$
Step 6: Look up the critical value from the F-distribution table
With a significance level of α=0.05, degrees of freedom for the numerator df1=(k-1)=2, and degrees of freedom for the denominator df2=(N-k)=27, you find the critical value F_critical=3.35 in the F-distribution table.
Step 7: Decision
Since F≈129.16 > F_critical=3.35, we reject the null hypothesis H0. This means there is a statistically significant difference in mean exam scores among the three teaching methods.
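
The table lookup and the decision can also be checked in code, recomputing F from the data listed above. The sketch relies on a special case: when the numerator degrees of freedom equal 2, the upper-tail probability of the F-distribution has the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), so the critical value can be solved for directly. This shortcut does not apply to other values of df1, where a statistics library would be the usual route. The helper names `f_upper_tail_df1_2` and `f_critical_df1_2` are hypothetical.

```python
from statistics import fmean

groups = [
    [90, 85, 88, 84, 82, 91, 85, 88, 90, 86],  # Method A
    [78, 80, 82, 79, 77, 81, 80, 79, 78, 80],  # Method B
    [70, 72, 68, 74, 71, 69, 70, 68, 73, 72],  # Method C
]
k = len(groups)
n_total = sum(len(g) for g in groups)
means = [fmean(g) for g in groups]
overall_mean = fmean(x for g in groups for x in g)

# Mean squares and the F-statistic (Steps 4 and 5).
mst = sum(len(g) * (m - overall_mean) ** 2
          for g, m in zip(groups, means)) / (k - 1)
mse = sum((x - m) ** 2
          for g, m in zip(groups, means) for x in g) / (n_total - k)
f_stat = mst / mse


def f_upper_tail_df1_2(x, df2):
    """P(F > x) for an F-distribution with (2, df2) degrees of freedom."""
    return (1 + 2 * x / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Value x with P(F > x) = alpha, valid for df1 = 2 only."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


# Steps 6 and 7: critical value, p-value and decision.
alpha = 0.05
df2 = n_total - k
f_critical = f_critical_df1_2(alpha, df2)
p_value = f_upper_tail_df1_2(f_stat, df2)
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.2f}, critical value = {f_critical:.2f}, reject H0: {reject_h0}")
```

This prints F = 129.16 and a critical value of 3.35, in line with the table value for (2, 27) degrees of freedom, and the p-value is far below 0.05.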
Note: For more complex analysis or a large dataset, it's recommended to use statistical software like R, Python (with libraries like statsmodels or SciPy), or commercial packages like SPSS or SAS.
Tianqi
I'm currently working in a lab focused on computer vision projects powered by machine learning.