The analytic technique for multiple-means hypothesis testing is based on variances. In this video, learn the basics of this technique. It's called the Analysis of Variance, or ANOVA.

- [Instructor] Let's get into the analytic technique for testing hypotheses about more than two means, the analysis of variance. The hypotheses and alpha we deal with look like this in the simplest case. The null hypothesis is that three population means are equal. The alternative hypothesis is Not H0, and alpha is 0.05.

These data illustrate an example. Fifteen people are randomly assigned to solve paper-and-pencil mazes in either red, green, or blue light. The dependent variable is the time in seconds to solve the maze. The null hypothesis is that the three parameters, mu red, mu green, and mu blue, are equal. The alternative hypothesis is Not H0, with alpha at 0.05. Because running pairwise t-tests on every pair of means inflates the overall chance of a Type I error beyond alpha, we have to think in terms of variances instead of means.

But first, let's go back to a basic. This is sample variance, an estimate of population variance. A variance estimate like this one is also called a Mean Square. The numerator on the right is called the Sum of Squares, and the denominator of the variance estimate is called degrees of freedom. In general, we can say that Mean Square is equal to Sum of Squares divided by degrees of freedom, abbreviated as MS equals SS over DF.

So if we think in terms of variance, three kinds of variance reside within this dataset. There's variance among the means: just like any set of numbers, these three have variance. This variance is called Mean Square Between Groups, or MSB. We have variance within the groups: the scores within each group vary around their group mean. The group means are different from one another, but the variances in each group are about the same. We can pool these variances to create the Mean Square Within Groups, or MSW. And we also have total variance: all 15 scores vary from the mean of all scores, the grand mean. This is called Mean Square Total, or MST. Now for the reasoning process.
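As a brief aside before the reasoning, the three variances just described can be computed directly. What follows is a minimal sketch in Python using only the standard library. The scores are hypothetical values made up for illustration (the transcript does not list the actual data), and the shortcuts used for MSB and MSW assume equal group sizes.

```python
from statistics import mean, variance

# Hypothetical maze-solving times in seconds; NOT the data shown in the video.
groups = {
    "red": [20, 22, 24, 26, 28],
    "green": [30, 32, 34, 36, 38],
    "blue": [25, 27, 29, 31, 33],
}

def mean_square(scores):
    """Variance estimate written out as SS / DF."""
    center = mean(scores)
    sum_of_squares = sum((x - center) ** 2 for x in scores)
    degrees_of_freedom = len(scores) - 1
    return sum_of_squares / degrees_of_freedom

# MS = SS / DF agrees with the library's sample variance.
assert abs(mean_square(groups["red"]) - variance(groups["red"])) < 1e-9

n_per_group = 5  # assumption: every group has the same size
group_means = [mean(scores) for scores in groups.values()]
all_scores = [x for scores in groups.values() for x in scores]

# Between groups: variance among the three means, scaled by group size.
ms_between = n_per_group * mean_square(group_means)
# Within groups: pooled group variances (a plain average when sizes are equal).
ms_within = mean(mean_square(scores) for scores in groups.values())
# Total: variance of all 15 scores around the grand mean.
ms_total = mean_square(all_scores)

print(ms_between, ms_within, round(ms_total, 2))  # prints: 125.0 10.0 26.43
```

In this made-up dataset the group means differ a lot while the scores within each group vary little, so MSB comes out much larger than MSW.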
If the null hypothesis is true, the three levels of the independent variable, light color, have no effect on the dependent variable. This does not mean that X-bar1, X-bar2, and X-bar3 should necessarily be equal. They don't all have to be equal for the null hypothesis to be true. Now, and this is important, what this does mean is that X-bar1, X-bar2, and X-bar3 vary as much or as little as the means of any three samples randomly drawn from the population.

Let's look at the reasoning process in terms of Mean Squares. To decide whether or not to reject the null hypothesis, we have to compare the variance among the group means, MSB, to an estimate of the population variance. As for the decision, if MSB, the Mean Square Between, is greater than the estimated population variance, then reject the null hypothesis. If not, then we don't reject the null hypothesis.

So it comes down to estimating the population variance, and the Mean Square Within Groups, MSW, is our best estimate of the population variance. Group members were randomly selected from the population and randomly assigned to each group. We would expect that group variances are about the same, even though the means are different. So to decide whether or not to reject the null hypothesis, compare the Mean Square Between to the Mean Square Within. And as for the decision, well, if the Mean Square Between is significantly greater than the Mean Square Within, reject the null hypothesis. If not, then don't reject the null hypothesis.

How do we compare two variances? With an F ratio. The F ratio here is the Mean Square Between divided by the Mean Square Within. If the null hypothesis is true, that F ratio should be less than or equal to one. If the null hypothesis is not true, then the F ratio should be greater than one. And this is the analysis of variance. We can frame the hypotheses in terms of variances rather than means.
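The comparison just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the instructor's worked example: the scores are hypothetical, the group sizes are equal, and the critical value 3.89 is the tabled F value for alpha = 0.05 with 2 and 12 degrees of freedom, which are the degrees of freedom for three groups and 15 people.

```python
from statistics import mean, variance

# Hypothetical maze-solving times in seconds; NOT the data shown in the video.
groups = [
    [20, 22, 24, 26, 28],  # red light
    [30, 32, 34, 36, 38],  # green light
    [25, 27, 29, 31, 33],  # blue light
]

def f_ratio(groups):
    """Return MSB / MSW for equally sized groups."""
    n_per_group = len(groups[0])
    if any(len(g) != n_per_group for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    # Variance among the group means, scaled by group size.
    ms_between = n_per_group * variance([mean(g) for g in groups])
    # Pooled within-group variance (a plain average when sizes are equal).
    ms_within = mean(variance(g) for g in groups)
    return ms_between / ms_within

# Tabled critical value of F for alpha = 0.05 with 2 and 12 degrees of freedom.
F_CRITICAL = 3.89

f = f_ratio(groups)
decision = "reject H0" if f > F_CRITICAL else "do not reject H0"
print(f, decision)  # prints: 12.5 reject H0
```

Comparing F to a tabled critical value is what "significantly greater" means in practice: a ratio a little above one is not enough to reject the null hypothesis.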
In terms of means, the null hypothesis is that mu1 equals mu2 equals mu3, and the alternative hypothesis is Not H0. But in terms of variances, the null hypothesis is that Sigma squared B divided by Sigma squared W is less than or equal to one. The alternative hypothesis is that that ratio is greater than one. And this is more in keeping with how we handle the statistics.

Let's examine the Mean Square Between. This is a variance estimate based on the number of groups, so the degrees of freedom is the number of groups minus one, which is the denominator of this variance estimate. Now let's look at the Mean Square Within. This is a variance estimate that's based on pooling the group variances, sort of like averaging them together. So the denominator of this variance estimate is the degrees of freedom within, which is the number of people in the study minus the number of groups. And the Mean Square Total: this variance estimate is based on all the people in this study, and its denominator is the number of people in the study minus one, and that's the degrees of freedom total. That X double bar stands for the grand mean.

For this example, the degrees of freedom between is the number of groups minus one, which is two. The degrees of freedom within is the number of people minus the number of groups, which is 15 minus three, or 12. And the degrees of freedom total is the number of people minus one, 15 minus one, or 14.

Some notable relationships. With respect to the degrees of freedom, the degrees of freedom total equals the sum of the other two degrees of freedom. And with respect to the Sum of Squares, the Sum of Squares total is equal to the sum of the other two Sums of Squares.

So wrapping up, when testing more than two means, we have to think in terms of variances, Mean Squares, rather than means. We test the Mean Square Between versus the Mean Square Within. If the Mean Square Between is significantly greater than the Mean Square Within, we reject the null hypothesis.
And this is called the analysis of variance.
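
The formulas the instructor points to on screen are not captured in the transcript. A sketch of the standard one-way ANOVA definitions, consistent with the description above, might look like the following. The symbols k (number of groups), N (number of people), n_j (size of group j), X-bar_j (mean of group j), and X_ij (score i in group j) are notation assumed for this sketch, not taken from the video.

```latex
\begin{align*}
MS_B &= \frac{SS_B}{df_B}
      = \frac{\sum_{j=1}^{k} n_j \left(\bar{X}_j - \bar{\bar{X}}\right)^2}{k - 1} \\
MS_W &= \frac{SS_W}{df_W}
      = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_j\right)^2}{N - k} \\
MS_T &= \frac{SS_T}{df_T}
      = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{\bar{X}}\right)^2}{N - 1} \\
F &= \frac{MS_B}{MS_W} \\
df_T &= df_B + df_W, \qquad SS_T = SS_B + SS_W
\end{align*}
```

With k = 3 and N = 15, as in the maze example, this gives 2, 12, and 14 degrees of freedom, and 14 = 2 + 12.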

###### Released

6/11/2019

- Explain how to calculate simple probability.
- Review the Excel statistical formulas for finding mean, median, and mode.
- Differentiate statistical nomenclature when calculating variance.
- Identify components when graphing frequency polygons.
- Explain how t-distributions operate.
- Describe the process of determining a chi-square.

Video: Introducing ANOVA