5.3 Multiple Classification Analysis

Multiple Classification Analysis (MCA) is a technique for examining the interrelationships between several predictor variables and one dependent variable in the context of an additive model. Unlike simpler forms of other multivariate methods, MCA can handle predictors measured on no better than a nominal scale, and interrelationships of any form among the predictor variables or between a predictor and the dependent variable. It is essential, however, that the dependent variable be either an interval-scale variable without extreme skewness or a dichotomous variable whose two frequencies are not extremely unequal.

In statistical terms, the MCA model specifies that a coefficient be assigned to each category of each predictor, and that each individual’s score on the dependent variable be treated as the sum of the coefficients assigned to categories characterizing that individual, plus the average for all cases, plus an error term.

Yij...n = Ȳ + ai + bj + . . . + eij...n


Yij...n = The score on the dependent variable of individual n, who falls in category i of predictor A, category j of predictor B, etc.

Ȳ = Grand mean of the dependent variable.

ai = The effect of membership in the i-th category of predictor A.

bj = The effect of membership in the j-th category of predictor B.

eij...n = Error term for this individual.
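To make the additive model concrete, here is a minimal sketch (not part of the original program); the grand mean, the two predictors, and all coefficient values are invented for illustration.

```python
# Hypothetical illustration of the additive MCA model: an individual's
# predicted score is the grand mean plus the coefficient ("adjusted
# deviation") of each category the individual falls in.

grand_mean = 50.0  # the grand mean, invented for the example

# Invented coefficients for two predictors, A (residence) and B (education)
a = {"urban": 3.0, "rural": -2.0}
b = {"primary": -4.0, "secondary": 1.0, "higher": 5.0}

def predicted_score(cat_a, cat_b):
    """Additive prediction: grand mean + a_i + b_j (error term omitted)."""
    return grand_mean + a[cat_a] + b[cat_b]

print(predicted_score("urban", "higher"))  # 50 + 3 + 5 = 58.0
```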

The coefficients are estimated in such a way that they provide the best possible fit to the observed data, i.e., they minimize the sum of squared errors. The coefficients can be estimated by solving a set of equations, known as normal equations (or least squares equations). The normal equations used by the MCA program (shown here for three predictors) are as follows:

ai = Ai − Ȳ − (1/Wi) Σj Wij bj − (1/Wi) Σk Wik ck

bj = Bj − Ȳ − (1/Wj) Σi Wij ai − (1/Wj) Σk Wjk ck

ck = Ck − Ȳ − (1/Wk) Σi Wik ai − (1/Wk) Σj Wjk bj

where

Ai = Mean value of Y for cases falling in the i-th category of predictor A.

Bj = Mean value of Y for cases falling in the j-th category of predictor B.

Ck = Mean value of Y for cases falling in the k-th category of predictor C.

Wi, Wj, Wk = Sums of the weights of the cases falling in the i-th category of A, the j-th category of B, and the k-th category of C, respectively.

Wij = Sum of the weights of the cases falling in both the i-th category of A and the j-th category of B (Wik and Wjk are defined analogously).

The MCA program uses an iterative procedure to solve the normal equations.

An important feature of the MCA program is its ability to determine the ‘coefficients’ or ‘adjusted deviations’ associated with the categories of each predictor. The adjusted deviations represent the program’s fit of the additive model obtained by solving this set of linear equations. The program arrives at the coefficients by a series of successive approximations, altering one coefficient at a time on the basis of the latest estimates of the other coefficients.
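The successive-approximation scheme can be sketched as follows. This is a simplified reimplementation for two predictors with unit weights, not the original program's code, and the data are invented: each pass re-estimates one predictor's coefficients as category means of the residuals left by the latest estimates of the other's.

```python
# Toy data: (category of predictor A, category of predictor B, Y)
data = [("a1", "b1", 10.0), ("a1", "b2", 14.0),
        ("a2", "b1", 6.0), ("a2", "b2", 12.0), ("a2", "b2", 13.0)]

ybar = sum(y for _, _, y in data) / len(data)   # grand mean
a_cats = sorted({r[0] for r in data})
b_cats = sorted({r[1] for r in data})
a = {c: 0.0 for c in a_cats}  # coefficients for predictor A, start at 0
b = {c: 0.0 for c in b_cats}  # coefficients for predictor B, start at 0

for _ in range(100):  # successive approximations, one predictor at a time
    for c in a_cats:
        rows = [r for r in data if r[0] == c]
        # a_i = mean over the category of (Y - ybar - current b_j)
        a[c] = sum(y - ybar - b[cb] for _, cb, y in rows) / len(rows)
    for c in b_cats:
        rows = [r for r in data if r[1] == c]
        # b_j = mean over the category of (Y - ybar - current a_i)
        b[c] = sum(y - ybar - a[ca] for ca, _, y in rows) / len(rows)

print({k: round(v, 4) for k, v in a.items()})
print({k: round(v, 4) for k, v in b.items()})
```

At convergence the residuals average to zero within every category of every predictor, which is exactly what the normal equations require of a least-squares fit.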

Formulae for statistics printed by the program

Yk = Individual k’s score on the dependent variable

wk = Individual k’s weight

N = Number of individuals

C = Total number of categories across all predictors

ci = Total number of categories in predictor i

P = Number of predictors

aij = Adjusted deviation of j th category of predictor i on final iteration


Sum of Y: Σk wk Yk

Sum of Y²: Σk wk Yk²

Grand mean of Y: Ȳ = (Σk wk Yk) / (Σk wk)

Sum of Y for category j of predictor i: Σ wk Yk, summed over the cases falling in that category

Sum of Y² for category j of predictor i: Σ wk Yk², summed over the cases falling in that category

Standard deviation of Y: s = √{ [Σk wk Yk² − (Σk wk Yk)² / Σk wk] / (N − 1) }

Mean of Y for category j of predictor i: Ȳij = (sum of Y for the category) / Wij, where Wij is the sum of the weights of the cases in the category

Sum of squares based on unadjusted deviations for predictor i: Ui = Σj Wij (Ȳij − Ȳ)²

Sum of squares based on adjusted deviations for predictor i: Di = Σj Wij aij²

Explained sum of squares: E = Σk wk (Ŷk − Ȳ)², where Ŷk is the score predicted for individual k by the additive model

Total sum of squares: T = Σk wk (Yk − Ȳ)²

Residual sum of squares: T − E

Eta for predictor i: ηi = √(Ui / T)

Beta for predictor i: βi = √(Di / T)

Multiple correlation coefficient (squared): R² = E / T

Adjustment for degrees of freedom: the predictors together use C − P degrees of freedom, since predictor i uses ci − 1

Multiple correlation coefficient (squared and adjusted for degrees of freedom): R²adj = 1 − (1 − R²)(N − 1) / (N − C + P − 1)

Eta (squared and adjusted for degrees of freedom): η²adj = 1 − (1 − ηi²)(N − 1) / (N − ci)
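As a small worked check of the eta statistic, the following sketch computes the total and between-category sums of squares for one predictor, using invented unit-weight data:

```python
import math

y = [10.0, 14.0, 6.0, 12.0, 13.0]    # dependent variable, invented
cat = ["a1", "a1", "a2", "a2", "a2"]  # category of one predictor per case

ybar = sum(y) / len(y)
total_ss = sum((v - ybar) ** 2 for v in y)  # total sum of squares T

# Sum of squares based on unadjusted deviations: sum over categories of
# (category weight) * (category mean - grand mean)^2
unadj_ss = 0.0
for c in set(cat):
    vals = [v for v, g in zip(y, cat) if g == c]
    unadj_ss += len(vals) * (sum(vals) / len(vals) - ybar) ** 2

eta = math.sqrt(unadj_ss / total_ss)
print(round(eta, 4))  # sqrt((10/3)/40) ≈ 0.2887
```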

A variety of F tests can be computed from the statistics printed by the program. The first test answers the question: ‘Do all predictors together explain a significant proportion of the variance of the dependent variable?’ It is computed as

F = [Explained SS / (C − P)] / [Residual SS / (N − C + P − 1)]

with C − P and N − C + P − 1 degrees of freedom.

The second test answers the question: ‘Does this particular predictor, all by itself, explain a significant proportion of the variance of the dependent variable?’ This is the classical question answered by one-way analysis of variance. The F test for predictor i is computed as

F = [Ui / (ci − 1)] / [(T − Ui) / (N − ci)]

where Ui is the sum of squares based on unadjusted deviations for predictor i and T is the total sum of squares, with ci − 1 and N − ci degrees of freedom.
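The per-predictor test can be sketched as follows; the sums of squares are invented numbers (carried over from nothing in particular), and under the null hypothesis the ratio is F-distributed with c_i − 1 and N − c_i degrees of freedom:

```python
U_i = 10.0 / 3.0  # between-category (unadjusted-deviation) SS for predictor i
T = 40.0          # total sum of squares
N = 5             # number of cases
c_i = 2           # number of categories in predictor i

# F = (explained mean square) / (residual mean square), as in one-way ANOVA
F = (U_i / (c_i - 1)) / ((T - U_i) / (N - c_i))
print(round(F, 4))  # 3/11 ≈ 0.2727
```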

Measures of importance of predictors

The following criteria can be used for assessing the importance of a predictor, i.e., the degree of relationship between a predictor and the dependent variable, or its predictive power.

Eta statistic

This statistic can be used for assessing the bivariate relationship between a predictor and the dependent variable. Eta squared (also called the correlation ratio) can be interpreted as the proportion of variance explained by the predictor.

Beta statistic

This statistic is an approximate measure of the relationship between a predictor and the dependent variable while holding constant all other predictors, i.e., assuming that in each category of a given predictor all other predictors are distributed as they are in the population at large. The rank order of the betas indicates the relative importance of the various predictors in explaining the variance in the dependent variable, if all other predictors were held constant.

To assess the marginal or unique explanatory power a predictor has over and above what can be explained by other predictors, the following procedures are suggested.

  1. One can remove the effects of the other predictors from the predictor in question, and then correlate the residuals of that predictor (actual minus predicted values) with the dependent variable. This part correlation asks whether there is any variability in X, not predictable from the other predictors, that helps to explain Y. In other words, one assesses the importance of a predictor in terms of the variance in the dependent variable marginally explainable by the predictor, relative to the total variance in the dependent variable. The squared part correlation can be obtained by carrying out two MCA analyses, with and without the predictor in question, since the squared part correlation is equal to the increase in multiple R squared.

Squared part correlation = (R²adj with everything in) − (R²adj omitting one set)

  2. One can remove the effects of the other predictors from both the dependent variable and the predictor in question, and correlate the two sets of residuals. This is the partial correlation coefficient. The squared partial correlation can be estimated from two multiple R-squares:

Squared partial correlation = [(R² with everything in) − (R² omitting one set)] / [1 − (R² omitting one set)]

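Both suggested measures reduce to arithmetic on two R-squared values, one from an MCA run with all predictors and one from a run omitting the predictor in question; the values below are invented for illustration:

```python
r2_full = 0.46     # R-squared with every predictor included (invented)
r2_reduced = 0.40  # R-squared omitting the predictor in question (invented)

squared_part = r2_full - r2_reduced                  # increase in R-squared
squared_partial = (r2_full - r2_reduced) / (1.0 - r2_reduced)

print(round(squared_part, 4), round(squared_partial, 4))  # 0.06 0.1
```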
Advantages of Multiple Classification Analysis

MCA can overcome some of the problems of Analysis of Variance, Multiple Regression, or Discriminant Analysis. In Analysis of Variance, the problem of correlated predictors must be dealt with, whereas in Multiple Regression or Discriminant Analysis one is faced with predictors that are not interval-scale variables but categories, often with scales as weak as the nominal level.

An important feature of MCA is its ability to show the effect of each predictor on the dependent variable, both before and after taking into account the effects of all other predictors. Multiple Regression and Discriminant Analysis can also do this, but only under certain restrictive conditions: they usually require that all predictor variables be measured on interval scales and that the relationships be linear or linearized. MCA is not constrained by these conditions. The predictors are always treated as sets of classes or categories; hence it does not matter whether a particular set represents a nominal scale (categories), an ordinal scale (rankings), or an interval scale (classes of a numerical variable).

Another important feature is the format in which the results are presented. All coefficients are expressed as deviations from the overall mean, not from the unknown mean of an excluded class in each set. The constant term in the predicting equation is the overall mean, not some composite sum of the means of excluded subclasses. Moreover, adjusted and unadjusted subgroup means are available in the same table, which can be used to gauge the degree of intercorrelation among the predictors.