4.3 Analysis of Variance

One-way analysis of variance can be viewed as a special case of bivariate analysis in which one variable (X) is categorical (nominal) and the other variable (Y) is interval-scaled. Suppose X is classified into g categories. The interrelationship between X and Y then involves a comparison of the distribution of Y across the g categories of X, for example a comparison of distribution parameters such as means or variances. The statistical procedure that compares the means of different groups is called Analysis of Variance (ANOVA). It may seem odd that a procedure that compares means is called analysis of variance, but the name derives from the fact that, in order to test for statistical significance between means, variances are in fact compared (i.e., analyzed). The categorical variable is usually referred to as a factor and its categories are called levels. One-way analysis of variance is also called single-factor analysis.

The data can be laid out as follows, with S_yj denoting the sum of the observations in group j:

Group 1        Group 2        ...    Group j        ...    Group g
y_11           y_12                  y_1j                  y_1g
y_21           y_22                  y_2j                  y_2g
...            ...                   ...                   ...
y_{n_1,1}      y_{n_2,2}             y_{n_j,j}             y_{n_g,g}
-----          -----                 -----                 -----
S_y1           S_y2                  S_yj                  S_yg

Overall or grand mean = (S_y1 + S_y2 + ... + S_yg)/n

Essentially, the method depends upon partitioning both the degrees of freedom and the sum of squared deviations into two components, one called 'error' and the other called 'effect'. The sum of squared deviations for the effect is also influenced by the error, an all-pervading uncertainty or noise, distributed in such a way that under the null hypothesis of no difference between the means of the categories (i.e., absence of an effect), the expected values of the two sums of squared deviations are proportional to their respective degrees of freedom. Hence the mean squared deviations (i.e., the sums of squared deviations divided by their degrees of freedom) have the same expectation. If, however, the effect does exist, it inflates its own mean squared deviation but not that of the error. If the effect is large enough, this leads to significance shown by the F-test. In this context, F equals the ratio of the mean squared deviation for the effect to that for the error.

The behavior of the Y observations can be modeled as follows:

y_ij = μ_j + ε_ij,        i = 1, 2, ..., n_j;   j = 1, 2, ..., g

where ε_ij represents a random variable with mean 0 and variance σ_j², and the ε_ij are mutually independent. Under the assumption of homogeneity of variance (σ_j² = σ² for all j) the model can be written as:

y_ij = μ_j + ε_ij,   ε_ij ~ (0, σ²),        i = 1, 2, ..., n_j;   j = 1, 2, ..., g

where the group mean ȳ.j is an estimate of μ_j and (y_ij − ȳ.j) is an estimate of ε_ij. The predicted values for y_ij from this model are ŷ_ij = ȳ.j.
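As a minimal sketch of this model, the following simulation draws observations as a group mean plus independent noise with a common variance; the group means, standard deviation, and group size below are illustrative assumptions, not values from the text. The residual y_ij − ȳ.j, which estimates ε_ij, sums to zero within each group.

```python
import random

# Hypothetical illustration of the one-way model y_ij = mu_j + e_ij:
# g = 3 groups with assumed means mu_j and a common standard deviation
# sigma (homogeneity of variance). All numbers here are illustrative.
random.seed(0)
mu = [10.0, 12.0, 15.0]   # assumed group means
sigma = 2.0               # common standard deviation
n_j = 5                   # observations per group

groups = [[m + random.gauss(0.0, sigma) for _ in range(n_j)] for m in mu]

# ybar_j estimates mu_j; the residual y - ybar_j estimates e_ij and
# sums to zero within each group by construction of the mean.
for j, ys in enumerate(groups, start=1):
    ybar_j = sum(ys) / len(ys)
    residuals = [y - ybar_j for y in ys]
    print(f"group {j}: mean = {ybar_j:.2f}, residual sum = {sum(residuals):.1e}")
```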

Total sum of squares

The total variation in Y over the sample, or the total sum of squares (TSS), is determined by

TSS = Σ_{j=1..g} Σ_{i=1..n_j} (y_ij − ȳ..)²

where ȳ.. is the overall or grand mean given by

ȳ.. = (1/n) Σ_{j=1..g} Σ_{i=1..n_j} y_ij

where n = n_1 + n_2 + ... + n_g.

This sum of squares measures the variation of the values of Y around the grand mean (i.e., the sum of squared deviations).
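A minimal sketch of this computation, using hypothetical data for g = 3 groups of unequal size (all values below are illustrative):

```python
# Total sum of squares on illustrative data: g = 3 groups, n = 10.
groups = [[6.0, 8.0, 7.0], [9.0, 11.0, 10.0, 10.0], [14.0, 12.0, 13.0]]

n = sum(len(grp) for grp in groups)                  # n = n_1 + n_2 + ... + n_g
grand_mean = sum(sum(grp) for grp in groups) / n     # ybar..
tss = sum((y - grand_mean) ** 2 for grp in groups for y in grp)
print(n, grand_mean, tss)                            # -> 10 10.0 60.0
```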

Sum of squares between groups

The sum of squares between groups (BSS) measures the variation between the group means and is computed as follows:

BSS = Σ_{j=1..g} n_j (ȳ.j − ȳ..)²

Mean square between groups = MSB = BSS/(g−1)

Sum of squares within groups

The sum of squares within groups (WSS) is given by

WSS = Σ_{j=1..g} Σ_{i=1..n_j} (y_ij − ȳ.j)²

Mean square within groups = MSW = WSS/(n−g).

It can easily be seen that TSS = BSS + WSS. Thus, the total variation is partitioned into two components: the between-groups variation and the within-groups variation.
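The partition can be checked numerically. The following sketch uses hypothetical data (all values are illustrative) and verifies that the between-groups and within-groups sums of squares add up to the total:

```python
# Verifying the partition TSS = BSS + WSS on illustrative data.
groups = [[6.0, 8.0, 7.0], [9.0, 11.0, 10.0, 10.0], [14.0, 12.0, 13.0]]

n = sum(len(grp) for grp in groups)
grand = sum(sum(grp) for grp in groups) / n          # ybar..
means = [sum(grp) / len(grp) for grp in groups]      # ybar.j

tss = sum((y - grand) ** 2 for grp in groups for y in grp)
bss = sum(len(grp) * (m - grand) ** 2 for grp, m in zip(groups, means))
wss = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
print(tss, bss, wss)                                 # -> 60.0 54.0 6.0
```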

Under the normality assumption, the ratio MSB/MSW has an F distribution with (g−1) and (n−g) degrees of freedom. This test statistic can be used to test the null hypothesis of no difference between the group means. The computation of the F statistic is laid out in the following table, called the ANOVA table.

Source of variance    Degrees of freedom    Sum of squares    Mean square          F ratio
Between groups        g − 1                 BSS               MSB = BSS/(g−1)      MSB/MSW
Within groups         n − g                 WSS               MSW = WSS/(n−g)
Total                 n − 1                 TSS
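Putting the pieces together, the following sketch assembles the ANOVA-table quantities and the F ratio for hypothetical data (the values are illustrative, not from the text); the resulting F would be compared with the critical value of the F(g−1, n−g) distribution:

```python
# ANOVA-table quantities and the F ratio on illustrative data.
groups = [[6.0, 8.0, 7.0], [9.0, 11.0, 10.0, 10.0], [14.0, 12.0, 13.0]]

g = len(groups)
n = sum(len(grp) for grp in groups)
grand = sum(sum(grp) for grp in groups) / n
means = [sum(grp) / len(grp) for grp in groups]

bss = sum(len(grp) * (m - grand) ** 2 for grp, m in zip(groups, means))
wss = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)

msb = bss / (g - 1)          # mean square between groups
msw = wss / (n - g)          # mean square within groups
f_ratio = msb / msw          # compare with the F(g-1, n-g) critical value
print(f"F({g - 1}, {n - g}) = {f_ratio:.2f}")        # -> F(2, 7) = 31.50
```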