Principal Components and





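The main objectives of the factor analytic techniques described in this chapter are (1) to reduce the number of variables and (2) to detect structure in the relationships between variables. Therefore, factor analysis is applied as a data reduction or structure detection method.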

Consider a data set in the form of an $n \times p$ matrix

$$X = [x_{ij}],$$

in which the columns ($j = 1, \dots, p$) represent the variables and the rows ($i = 1, \dots, n$) represent measurements of specific objects or individuals on those variables. Such a data set, particularly when $p$ is large, is unwieldy and difficult to comprehend. It is usually desirable to obtain a smaller set of variables that can be used to approximate the original data matrix X. The new variables, called principal components or factors, are designed to carry most of the information in the columns of X. The greater the correlation between the columns of X, the fewer new variables are required.
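The claim that stronger correlation among the columns of X means fewer new variables can be checked numerically. The following is a minimal NumPy sketch, assuming the components are extracted from the sample correlation matrix (one common choice; the covariance matrix is another). The function name `principal_components` and the simulated data sets are hypothetical illustrations, not part of the original text.

```python
import numpy as np

def principal_components(X):
    """PCA of an n x p data matrix X via the eigen-decomposition of its correlation matrix."""
    X = np.asarray(X, dtype=float)
    # Standardize each column (variable) to mean 0 and unit sample standard deviation
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # p x p sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric; eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1]         # reorder so the largest component comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                      # n x p matrix of component scores
    return eigvals, eigvecs, scores

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
# Hypothetical data: four strongly correlated variables (noisy copies of one latent variable)
X_corr = np.column_stack([latent + 0.3 * rng.normal(size=n) for _ in range(4)])
# Hypothetical data: four mutually independent variables, for comparison
X_indep = rng.normal(size=(n, 4))

eigvals, eigvecs, scores = principal_components(X_corr)
explained = eigvals / eigvals.sum()           # proportion of total variance per component
explained_indep = principal_components(X_indep)[0] / 4.0   # trace of a 4 x 4 correlation matrix is 4

print("correlated:  ", np.round(explained, 3))
print("independent: ", np.round(explained_indep, 3))
```

For the correlated data the first component should account for the large majority of the total variance, so a single new variable approximates X well; for the independent data the variance is spread roughly evenly and no small set of components suffices.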

Principal components analysis and its cousin, factor analysis, operate by replacing the original data matrix X with an estimate formed as the product of two matrices. The left matrix in the product contains a small number of columns, one for each factor or component, whereas the right matrix provides the information that relates the components to the original variables. A scatter plot based on the left matrix is useful for displaying the n objects of X with respect to the new factors. A plot based on the rows of the right matrix can be used to relate the components to the original variables. The decomposition of X into a product of two matrices is a special case of a matrix approximation procedure called singular value decomposition (SVD). A two-dimensional plot based on this approximation is called a biplot.
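The two-matrix approximation can be sketched with a truncated SVD. Writing the column-centred data as $X_c = U \Sigma V^{\top}$ and keeping the first $k$ singular values gives $X_c \approx (U_k \Sigma_k)\,V_k^{\top}$, where $U_k \Sigma_k$ is the $n \times k$ left matrix and $V_k^{\top}$ is the $k \times p$ right matrix. The sketch below assumes centred columns and absorbs the singular values into the left factor, which is one of several scaling conventions used for biplots; the function name `svd_factors` and the random data are hypothetical.

```python
import numpy as np

def svd_factors(X, k):
    """Approximate column-centred X by the product of an n x k and a k x p matrix.

    Uses the truncated SVD Xc ~ (U_k S_k) V_k^T; the singular values are
    absorbed into the left factor (one of several possible scalings).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each variable, as in PCA
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    left = U[:, :k] * s[:k]                       # n x k: positions of the n objects
    right = Vt[:k, :]                             # k x p: links components to the p variables
    return left, right, s, Xc

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6)) @ rng.normal(size=(6, 6))   # hypothetical correlated data

left, right, s, Xc = svd_factors(X, k=2)
approx = left @ right                             # rank-2 approximation of Xc

# For a biplot with k = 2: plot each row of `left` as a point (an object)
# and each row of `right.T` as an arrow from the origin (a variable).
object_points = left        # shape (50, 2)
variable_arrows = right.T   # shape (6, 2)

# Squared approximation error equals the sum of the discarded squared singular values
err = np.linalg.norm(Xc - approx, "fro") ** 2
print(err, np.sum(s[2:] ** 2))
```

Putting the singular values in the right factor instead (left $= U_k$, right $= \Sigma_k V_k^{\top}$) is the other common choice; it favours the display of relationships among the variables rather than among the objects.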

Singular value decomposition can also be applied to the rectangular $r \times c$ matrix formed by an $r \times c$ contingency table. This application of singular value decomposition is called correspondence analysis.
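A minimal sketch of simple correspondence analysis follows. It assumes the common formulation found in standard treatments (not spelled out in this chapter): with $P = N/n$ the table of relative frequencies, $r$ and $c$ the row and column masses, and $D_r$, $D_c$ the diagonal matrices of those masses, the SVD is applied to the standardized residuals $S = D_r^{-1/2}(P - r c^{\top}) D_c^{-1/2}$. The squared singular values (principal inertias) then sum to the chi-square statistic divided by $n$. The function name `correspondence_analysis` and the counts in `table` are hypothetical.

```python
import numpy as np

def correspondence_analysis(N):
    """Simple correspondence analysis of an r x c contingency table N via SVD."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                          # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: rows F = D_r^{-1/2} U Sigma, columns G = D_c^{-1/2} V Sigma
    F = (U * s) / np.sqrt(r)[:, None]
    G = (Vt.T * s) / np.sqrt(c)[:, None]
    return s, F, G

table = [[20, 10, 5], [10, 25, 15], [5, 10, 30], [15, 5, 10]]  # hypothetical 4 x 3 counts
s, F, G = correspondence_analysis(table)
inertia = s ** 2
print(np.round(inertia / inertia.sum(), 3))   # share of total inertia per dimension
```

Because the residual matrix has rank at most $\min(r, c) - 1$, the last singular value is zero up to rounding, and the first two columns of `F` and `G` give the usual two-dimensional map of rows and columns.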