6.5.2 Reduction of Dimensionality

Another way of looking at correspondence analysis is as a method for decomposing the overall inertia by identifying a small number of dimensions in which the deviations from the expected values can be represented. This is similar to the goal of factor analysis, where the total variance is decomposed so as to arrive at a lower-dimensional representation of the variables that allows one to reconstruct most of their variance/covariance matrix.

Criterion for Dimensionality Reduction

In correspondence analysis, we are essentially looking for a low-dimensional subspace that is as close as possible to the set of profile points in the high-dimensional true space. Let S denote any candidate subspace. For the i-th profile point, we can compute the Chi-square distance between the profile point and S, denoted by $d_i(S)$. The weighted measure of the distance between the profile point and the subspace, with the mass $r_i$ of the profile as the weight, is given by:

$$r_i \, [d_i(S)]^2$$

The overall distance of all the profiles to the subspace S is the sum of these weighted squared distances:

$$\sum_i r_i \, [d_i(S)]^2$$

The objective of correspondence analysis is to discover which subspace S minimizes the above criterion.
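The criterion can be evaluated numerically for any candidate subspace. The following is a minimal sketch, not the source's own procedure: the contingency table is made up, the helper name `criterion` is hypothetical, and the optimal line is obtained from a singular value decomposition, which is one common way of solving this minimization. It compares the optimal one-dimensional subspace with randomly oriented lines through the centroid.

```python
import numpy as np

# Hypothetical 5 x 4 contingency table (made-up counts, for illustration only).
N = np.array([[40., 10.,  6.,  4.],
              [12., 35.,  9.,  5.],
              [ 7., 11., 30.,  8.],
              [ 4.,  6., 10., 28.],
              [10., 11.,  9., 12.]])

P = N / N.sum()          # correspondence matrix
r = P.sum(axis=1)        # row masses r_i
c = P.sum(axis=0)        # column masses = centroid of the row profiles

# Centred row profiles, rescaled so that ordinary Euclidean distance
# between rows of Y equals the chi-square distance between profiles.
Y = (P / r[:, None] - c) / np.sqrt(c)

def criterion(B):
    """Sum_i r_i * d_i(S)^2 for the subspace S through the centroid
    spanned by the orthonormal columns of B."""
    inside = ((Y @ B) ** 2).sum(axis=1)       # squared length of each projection
    d2 = (Y ** 2).sum(axis=1) - inside        # squared distance to S (Pythagoras)
    return float(r @ d2)

# Optimal line: first right singular vector of the mass-weighted profiles.
_, sigma, Vt = np.linalg.svd(np.sqrt(r)[:, None] * Y, full_matrices=False)
B_opt = Vt[:1].T

# Randomly oriented competitor lines through the centroid.
rng = np.random.default_rng(0)
random_values = []
for _ in range(200):
    v = rng.normal(size=(Y.shape[1], 1))
    random_values.append(criterion(v / np.linalg.norm(v)))

print("criterion, optimal line:    ", criterion(B_opt))
print("criterion, best random line:", min(random_values))
```

No random line should achieve a smaller value than the line found by the decomposition.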

Minimizing this criterion is equivalent to maximizing the inertia of the cloud within the subspace, although that inertia can never exceed the inertia in the true space. What is lost in this process is the knowledge of how far and in which direction the profiles lie off this subspace. What is gained is a view of the profiles that would otherwise not be possible. The ratio of the inertia inside the subspace to the total inertia gives a measure of the accuracy of representation of the cloud in the subspace.
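The trade-off described here can be written out explicitly. The sketch below assumes that the candidate subspace S passes through the centroid of the cloud; the symbols $d_i$ (the Chi-square distance from profile i to the centroid) and $p_i(S)$ (the distance from the centroid to the projection of profile i on S) are introduced only for this illustration. By Pythagoras, $d_i^2 = p_i(S)^2 + [d_i(S)]^2$, so that weighting by $r_i$ and summing gives:

```latex
\underbrace{\sum_i r_i \, d_i^2}_{\text{total inertia}}
  \;=\;
\underbrace{\sum_i r_i \, p_i(S)^2}_{\text{inertia inside } S}
  \;+\;
\underbrace{\sum_i r_i \, [d_i(S)]^2}_{\text{inertia lost}},
\qquad
\text{accuracy}(S) \;=\;
\frac{\sum_i r_i \, p_i(S)^2}{\sum_i r_i \, d_i^2}.
```

Because the total inertia is fixed, the subspace that minimizes the lost inertia is the same one that maximizes the inertia inside S, and the accuracy ratio lies between 0 and 1.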

Correspondence analysis determines the principal axes of inertia and, for each axis, the corresponding eigenvalue, which is the same as the inertia of the cloud in the direction of the axis. The first factorial axis is the line in the direction of which the inertia of the cloud is a maximum. The second factorial axis is, among all the lines that are perpendicular to the first factorial axis, the one in whose direction the inertia of the cloud is a maximum. The third factorial axis is, among all the lines that are perpendicular to both the first and second factorial axes, the line in whose direction the inertia of the cloud is a maximum, and so on. The optimal subspace of a given dimension k is the subspace spanned by the first k principal axes. The inertia of the cloud along a principal axis is called the principal inertia.
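As an illustration, the principal axes and their eigenvalues can be obtained from a singular value decomposition of the standardized deviations from the expected values. This is a sketch under assumptions: the table is made up, and the SVD route is a common computational choice rather than something the text prescribes.

```python
import numpy as np

# Hypothetical 5 x 4 contingency table (made-up counts, for illustration only).
N = np.array([[40., 10.,  6.,  4.],
              [12., 35.,  9.,  5.],
              [ 7., 11., 30.,  8.],
              [ 4.,  6., 10., 28.],
              [10., 11.,  9., 12.]])

n = N.sum()
P = N / n                      # correspondence matrix
r = P.sum(axis=1)              # row masses
c = P.sum(axis=0)              # column masses

# Standardized deviations from the expected values r_i * c_j.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Rows of Vt are the principal axes of the row cloud (in the rescaled space);
# squared singular values are the principal inertias (eigenvalues).
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
principal_inertias = sigma ** 2
total_inertia = (S ** 2).sum()

# Row principal coordinates, and the inertia of the cloud along each axis.
F = (U * sigma) / np.sqrt(r)[:, None]
inertia_along_axes = r @ (F ** 2)

print("principal inertias:", principal_inertias)
print("share of total:    ", principal_inertias / total_inertia)
```

The singular values are returned in decreasing order, so the first axis carries the largest inertia, the second the largest among directions perpendicular to the first, and so on. The total inertia equals the Chi-square statistic of the table divided by the grand total `n`.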

Geometrically, the principal inertia is the weighted average of the squared Chi-square distances from the centroid to the projections of the row profiles on the respective principal axis. It is an absolute measure of the dispersion of the row profiles in the direction of that axis. Each principal inertia can be decomposed into components due to each row profile (or column profile). Rows that contribute highly to a principal axis largely determine the orientation and the identity of that axis.

The cosines of the angles between the row profiles’ deviation vectors from the centroid and a principal axis describe how closely each profile vector lies along, or correlates with, that axis. Thus, they measure how well the display approximates the profile’s true position.
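A short sketch of how such cosines might be computed is given below. It assumes a made-up table and the usual convention of reporting squared cosines, so that the values for one profile add up to 1 over all axes; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical 5 x 4 contingency table (made-up counts, for illustration only).
N = np.array([[40., 10.,  6.,  4.],
              [12., 35.,  9.,  5.],
              [ 7., 11., 30.,  8.],
              [ 4.,  6., 10., 28.],
              [10., 11.,  9., 12.]])

P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

# Row principal coordinates: position of each row profile on each axis.
F = (U * sigma) / np.sqrt(r)[:, None]

# Squared chi-square distance of each row profile from the centroid.
d2 = (F ** 2).sum(axis=1)

# Squared cosine between each profile's deviation vector and each axis.
cos2 = F ** 2 / d2[:, None]

# Quality of each row in a two-dimensional display (first two axes).
quality_2d = cos2[:, :2].sum(axis=1)
print(np.round(quality_2d, 3))
```

A quality close to 1 indicates that the profile lies almost entirely in the displayed plane; a low value warns that its displayed position is a poor approximation of its true position.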

The eigenvalues ($\lambda_i$) corresponding to the sequence of principal axes are in decreasing order of magnitude:

$$\lambda_1 > \lambda_2 > \lambda_3 > \dots > \lambda_k$$

Row and Column Analyses

The row analysis of a matrix consists in situating the row profiles in a multidimensional space and finding the low-dimensional subspace that comes closest to the profile points. The row profiles are projected onto such a subspace for interpretation of the inter-profile positions. Similarly, the analysis of column profiles involves situating the column profiles in a multidimensional space and finding the low-dimensional subspace that comes closest to the profile points.

The row and column analyses are intimately connected. If a row analysis is performed, the column analysis is also ipso facto performed, and vice versa. The two analyses are equivalent in the sense that each has the same total inertia, the same dimensionality and the same decomposition of inertia into principal inertias along principal axes.
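This equivalence can be checked numerically by treating the column analysis as a row analysis of the transposed table. The sketch below uses a made-up table, and the helper `row_analysis` is a hypothetical name introduced for this illustration.

```python
import numpy as np

def row_analysis(table):
    """Return the masses and principal coordinates of the rows of `table`."""
    P = table / table.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * sigma) / np.sqrt(r)[:, None]
    return r, F

# Hypothetical 5 x 4 contingency table (made-up counts, for illustration only).
N = np.array([[40., 10.,  6.,  4.],
              [12., 35.,  9.,  5.],
              [ 7., 11., 30.,  8.],
              [ 4.,  6., 10., 28.],
              [10., 11.,  9., 12.]])

r, F = row_analysis(N)       # analysis of the row profiles
c, G = row_analysis(N.T)     # analysis of the column profiles

inertia_rows = r @ (F ** 2)  # principal inertias of the row cloud
inertia_cols = c @ (G ** 2)  # principal inertias of the column cloud

print("rows:   ", np.round(inertia_rows, 5))
print("columns:", np.round(inertia_cols, 5))
```

Both clouds yield the same principal inertias, and hence the same total inertia, even though one lives in a space indexed by columns and the other in a space indexed by rows.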

Row and Column Contributions to Inertia

These contributions can be expressed in relative terms:
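One common way of expressing contributions in relative terms, assumed here rather than taken from the text, is to divide the part of a principal inertia due to a row (its mass times its squared coordinate on the axis) by that principal inertia, and likewise for columns. A minimal sketch on a made-up table:

```python
import numpy as np

# Hypothetical 5 x 4 contingency table (made-up counts, for illustration only).
N = np.array([[40., 10.,  6.,  4.],
              [12., 35.,  9.,  5.],
              [ 7., 11., 30.,  8.],
              [ 4.,  6., 10., 28.],
              [10., 11.,  9., 12.]])

P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

# Keep only the axes with non-zero inertia.
K = min(N.shape) - 1
lam = sigma[:K] ** 2                                   # principal inertias

F = (U[:, :K] * sigma[:K]) / np.sqrt(r)[:, None]       # row coordinates
G = (Vt[:K].T * sigma[:K]) / np.sqrt(c)[:, None]       # column coordinates

# Relative contribution of each row / column to each principal inertia.
row_contrib = r[:, None] * F ** 2 / lam
col_contrib = c[:, None] * G ** 2 / lam

print(np.round(row_contrib, 3))   # each column of this array sums to 1
print(np.round(col_contrib, 3))
```

Rows or columns with a contribution well above the average (1 divided by the number of rows or columns) are the ones that mainly determine the orientation of that axis.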

Maximum Number of Dimensions

Since the sums of the frequencies across the columns must equal the row totals, and the sums across the rows must equal the column totals, there are in a sense only (J - 1) independent entries in each row and (I - 1) independent entries in each column of the contingency table, where J is the number of columns and I is the number of rows. Thus, the maximum number of eigenvalues that can be extracted from a two-way table is the minimum of (J - 1) and (I - 1). If we choose to extract (i.e., interpret) this maximum number of dimensions, then we can reproduce exactly all the information contained in the table.
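The exact reproduction of the table can be verified on a small example. The sketch below assumes a made-up 5 x 4 table, for which the maximum number of dimensions is min(5 - 1, 4 - 1) = 3; the helper `reconstruct` is a hypothetical name, and the rebuilding formula used is the standard reconstitution from the singular value decomposition.

```python
import numpy as np

# Hypothetical 5 x 4 contingency table (made-up counts, for illustration only).
N = np.array([[40., 10.,  6.,  4.],
              [12., 35.,  9.,  5.],
              [ 7., 11., 30.,  8.],
              [ 4.,  6., 10., 28.],
              [10., 11.,  9., 12.]])

n = N.sum()
P = N / n
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

I, J = N.shape
K_max = min(I - 1, J - 1)      # maximum number of dimensions

def reconstruct(k):
    """Rebuild the table of counts from the first k dimensions."""
    core = (U[:, :k] * sigma[:k]) @ Vt[:k]            # rank-k part of S
    P_hat = np.outer(r, c) + np.sqrt(np.outer(r, c)) * core
    return n * P_hat

print("singular values:", np.round(sigma, 6))         # those beyond K_max vanish
print("max error, 1 dimension: ", np.abs(reconstruct(1) - N).max())
print("max error, K_max dims:  ", np.abs(reconstruct(K_max) - N).max())
```

With fewer than `K_max` dimensions the rebuilt counts only approximate the table; with all `K_max` dimensions the error drops to rounding level.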