Principal components analysis can be defined as follows.
Consider a data matrix:
X = [x_{ij}]
in which the columns represent the p variables and the rows represent measurements of n objects or individuals on those variables. The data can be represented by a cloud of n points in a p-dimensional space, each axis corresponding to a measured variable. We can then look for a line OY_{1} in this space such that the dispersion of the n points, when projected onto this line, is a maximum. This operation defines a derived variable of the form
Y_{1} = a_{11}x_{1} + a_{12}x_{2} + … + a_{1p}x_{p}
with coefficients satisfying the condition
a_{11}^{2} + a_{12}^{2} + … + a_{1p}^{2} = 1
After obtaining OY_{1}, consider the (p-1)-dimensional subspace orthogonal to OY_{1} and look for the line OY_{2} in this subspace such that the dispersion of the points when projected onto it is a maximum. This is equivalent to seeking a line OY_{2} perpendicular to OY_{1} such that the dispersion of the points when projected onto it is a maximum. Having obtained OY_{2}, consider a line in the (p-2)-dimensional subspace orthogonal to both OY_{1} and OY_{2} such that the dispersion of the points when projected onto it is as large as possible. The process can be continued until p mutually orthogonal lines are determined. Each of these lines defines a derived variable:
Y_{i} = a_{i1}x_{1} + a_{i2}x_{2} + … + a_{ip}x_{p}
where the constants are determined by the requirement that the variance of Y_{i} is a maximum, subject to the constraint of orthogonality as well as
a_{i1}^{2} + a_{i2}^{2} + … + a_{ip}^{2} = 1
for each i.
The Y_{i} thus obtained are called Principal Components of the system and the process of obtaining them is called Principal Components Analysis.
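As a sketch of this first step, the direction of maximum dispersion OY_{1} can be found as the leading eigenvector of the sample covariance matrix. The data below are synthetic (the mixing matrix is an arbitrary illustrative choice), and NumPy is assumed:

```python
import numpy as np

# Hypothetical data: n = 200 observations on p = 3 correlated variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                              [0.0, 1.0, 0.3],
                                              [0.0, 0.0, 0.5]])
Xc = X - X.mean(axis=0)          # centre the cloud of points at the origin O
S = np.cov(Xc, rowvar=False)     # p x p covariance matrix

# The direction OY_1 of maximum dispersion is the leading eigenvector of S.
eigvals, eigvecs = np.linalg.eigh(S)
a1 = eigvecs[:, -1]              # eigh returns eigenvalues in ascending order

# Projecting the points onto OY_1 gives the first derived variable Y_1.
Y1 = Xc @ a1

# Its variance is at least the variance along any other unit-length direction.
other = rng.standard_normal(3)
other /= np.linalg.norm(other)
assert Y1.var(ddof=1) >= (Xc @ other).var(ddof=1)
```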
The p-dimensional geometric model defined above can be considered as the true picture of the data. If we wish to obtain the best q-dimensional representation of the p-dimensional true picture, then we simply project the points onto the q-dimensional subspace defined by the first q principal components Y_{1}, Y_{2}, …, Y_{q}.
The variance of a linear composite:
Y = a_{1}x_{1} + a_{2}x_{2} + … + a_{p}x_{p}
is given by
Var(Y) = ∑_{i} ∑_{j} a_{i}a_{j}s_{ij}
where s_{ij} is the covariance between variables i and j. The variance of a linear composite can also be expressed in the notation of matrix algebra as:
a^{T} S a
where a is the vector of the variable weights and S is the covariance matrix. a^{T} is the transpose of a.
Principal components analysis finds the weight vector a that maximizes
a^{T}S a
subject to the constraint that
a^{T}a = 1
It is essential to constrain the size of a; otherwise the variance of the linear composite could be made arbitrarily large by selecting large weights.
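The identity between the double-sum expression and the quadratic form a^{T}S a can be checked numerically; the data and the weight vector below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
S = np.cov(X, rowvar=False)           # covariance matrix S with entries s_ij

a = np.array([0.5, -0.3, 0.8, 0.1])   # arbitrary (unnormalized) weight vector

# Variance of the linear composite Y = a_1 x_1 + ... + a_p x_p, two ways:
composite_var = (X @ a).var(ddof=1)   # computed directly from the scores
quadratic_form = a @ S @ a            # computed as a^T S a
```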
It is important to note that principal components decomposition is not scale invariant. We would get different decompositions depending upon whether the principal components are calculated from the unscaled sums-of-squares-and-cross-products (SSCP) matrix, the covariance matrix, or the correlation matrix. The magnitudes of the diagonal elements of a cross-products matrix or a covariance matrix influence the nature of the principal components. Hence standardized variables are commonly used. The X^{T}X matrix based on standardized variables is proportional to a correlation matrix. The covariance matrix can be viewed as a partial step between the SSCP matrix and the correlation matrix: since it is based on the deviations of the variables from their respective means, it corrects the elements of the SSCP matrix for overall level, but it does not correct for differences in the variances among the variables.
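The scale dependence can be illustrated with synthetic data in which one variable has a much larger variance than the others; the leading eigenvectors of the covariance and correlation matrices then point in very different directions. A sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal((300, 3))
x1 = 100.0 * z[:, 0]                          # huge variance, uncorrelated
x2 = z[:, 1]
x3 = 0.9 * z[:, 1] + np.sqrt(0.19) * z[:, 2]  # strongly correlated with x2
X = np.column_stack([x1, x2, x3])

S = np.cov(X, rowvar=False)         # covariance matrix
R = np.corrcoef(X, rowvar=False)    # correlation matrix (standardized scale)

a_cov = np.linalg.eigh(S)[1][:, -1]  # leading eigenvector of S
a_cor = np.linalg.eigh(R)[1][:, -1]  # leading eigenvector of R

# Covariance-based PC1 is dominated by the high-variance variable x1;
# correlation-based PC1 instead picks up the correlated pair (x2, x3).
print(a_cov, a_cor)
```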
If we have a set of n observations (objects/cases) on p variables, then we can find the largest principal component (of a cross-products matrix, covariance matrix or correlation matrix) as the weight vector
a_{1} = (a_{11}, a_{12}, …, a_{1p})^{T}
which maximizes the variance of
Y_{1} = Xa_{1}
subject to the constraint
a_{1}^{T}a_{1} = 1
We can then define the second largest principal component as the weight vector
a_{2} = (a_{21}, a_{22}, …, a_{2p})^{T}
which maximizes the variance of
Y_{2} = Xa_{2}
subject to the constraints:
· a_{2}^{T}a_{2} = 1
· a_{2}^{T}a_{1} = 0
The second constraint makes principal component 2 linearly independent of principal component 1.
We can define the third largest principal component as the weight vector
a_{3} = (a_{31}, a_{32}, …, a_{3p})^{T}
which maximizes the variance of
Y_{3} = Xa_{3}
subject to the constraints:
· a_{3}^{T}a_{3} = 1
· a_{3}^{T}a_{1} = 0
· a_{3}^{T}a_{2} = 0
The last two constraints are the orthogonality conditions that make the third principal component orthogonal to the first two principal components.
This process can be continued until the last (i.e., the p^{th}) principal component is derived.
The sum of the variances of the principal components is equal to the sum of the variances of the original variables:
l_{1} + l_{2} + … + l_{p} = s_{11} + s_{22} + … + s_{pp}
where l_{i} is the variance of the i^{th} principal component. If the variables are standardized, then
l_{1} + l_{2} + … + l_{p} = p
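Both trace identities can be verified numerically on synthetic data (a sketch, assuming NumPy; the 5-variable mixing is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((400, 5)) @ rng.standard_normal((5, 5))

S = np.cov(X, rowvar=False)
eigvals_S = np.linalg.eigvalsh(S)
# Sum of the component variances equals the sum of the original variances.
assert np.isclose(eigvals_S.sum(), np.trace(S))

R = np.corrcoef(X, rowvar=False)
eigvals_R = np.linalg.eigvalsh(R)
# For standardized variables each variance is 1, so the eigenvalues sum to p.
assert np.isclose(eigvals_R.sum(), 5.0)
```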
In the matrix notation, the above definition of principal components leads to the following equation
R A = A L
where A is the matrix of eigenvectors as column vectors and L is a diagonal matrix of the corresponding latent roots (or eigenvalues) of the correlation matrix R, rank-ordered from the largest to the smallest. The elements of L have to be in the same order as their associated latent vectors (or eigenvectors). The largest latent root (l_{1}) of R is the variance of the first or largest principal component of R, and its associated vector a_{1} is the set of weights for the first principal component, which maximizes the variance of Y_{1} = Xa_{1}. Similarly for the second principal component, and so on.
The last latent root (l_{p}) is the variance of the last or smallest principal component. The i^{th} latent root and its associated weight vector satisfy the matrix equation:
Ra_{i} = l_{i}a_{i}
Pre-multiplying the above equation by a_{i}^{T} leads to
a_{i}^{T}Ra_{i} = a_{i}^{T}l_{i}a_{i} = l_{i}a_{i}^{T}a_{i} = l_{i}
since
a_{i}^{T}a_{i} = 1
The variance of the first principal component is therefore l_{1}, the variance of the second is l_{2}, and so on down to l_{p} for the smallest. Thus:
Ra_{1} = l_{1}a_{1}
Ra_{2} = l_{2}a_{2}
…
Ra_{p} = l_{p}a_{p}
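These p eigenvalue equations can be checked numerically on a correlation matrix built from synthetic data (a sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((250, 4)) @ rng.standard_normal((4, 4))
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
# Order the latent roots (and their vectors) from largest to smallest.
order = np.argsort(eigvals)[::-1]
L = eigvals[order]
A = eigvecs[:, order]

# Each latent root / latent vector pair satisfies R a_i = l_i a_i.
for i in range(4):
    assert np.allclose(R @ A[:, i], L[i] * A[:, i])
```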
In matrix notation,
RA = AΛ
where A is the matrix of eigenvectors, as column vectors, and Λ is the diagonal matrix of the corresponding latent roots, ordered from the largest to the smallest.
Since RA = AΛ, pre-multiplying by A^{T} leads to
A^{T}RA = A^{T}AΛ = Λ
because A^{T}A = I.
This means that we can decompose R into a product of three matrices, R = AΛA^{T}, involving the eigenvectors and eigenroots. In other words, the variation in R is expressed in terms of the weighting vectors (eigenvectors) of the principal components and the variances (eigenvalues) of the principal components. Because R is symmetric and positive semi-definite, this eigendecomposition coincides with the singular value decomposition of the correlation matrix R. This is the key concept underlying Principal Components Analysis.
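The diagonalization and the three-matrix reconstruction of R can be verified numerically (synthetic data, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))
R = np.corrcoef(X, rowvar=False)

eigvals, A = np.linalg.eigh(R)
Lam = np.diag(eigvals)

# A has orthonormal columns, so pre-multiplying RA = A Lam by A^T
# diagonalizes R ...
assert np.allclose(A.T @ A, np.eye(4))
assert np.allclose(A.T @ R @ A, Lam)

# ... and R is recovered as the three-matrix product A Lam A^T.
assert np.allclose(A @ Lam @ A.T, R)
```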
Interpretation of principal components
It becomes easier to interpret the principal components when the elements of the latent vectors are transformed into correlations of the variables with the particular principal components. This can be done by multiplying each element of a particular latent vector a_{i} by the square root of the associated latent root, √l_{i}. The correlations of variables with principal components are called loadings.
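A sketch of computing loadings and checking that they equal the correlations between the variables and the component scores (synthetic standardized data, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
R = np.corrcoef(Z, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                  # largest root first
L, A = eigvals[order], eigvecs[:, order]

# Loadings: each latent vector scaled by the square root of its latent root.
loadings = A * np.sqrt(L)

# They equal the correlations of the variables with the component scores.
scores = Z @ A
for i in range(3):
    for j in range(3):
        r = np.corrcoef(Z[:, i], scores[:, j])[0, 1]
        assert np.isclose(abs(r), abs(loadings[i, j]), atol=1e-6)
```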
The purpose of principal components analysis is to reduce the complexity of multivariate data by moving to the principal components space and then choosing the first q principal components (q < p) that explain most of the variation in the original variables. Several criteria for selecting the number of components are suggested in the literature, including retaining enough components to account for a fixed proportion of the total variance (commonly 80%). Liberal criteria tend to include too many components, whereas strict criteria include too few; the 80% criterion can be a good compromise.
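The 80% cumulative-variance rule can be sketched in code (synthetic 6-variable data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((400, 6)) @ rng.standard_normal((6, 6))
R = np.corrcoef(X, rowvar=False)

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # largest to smallest
explained = eigvals / eigvals.sum()              # proportion of total variance
cumulative = np.cumsum(explained)

# Smallest q whose first q components explain at least 80% of the variance.
q = int(np.searchsorted(cumulative, 0.80) + 1)
print(q, cumulative[q - 1])
```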