Principal components analysis can be defined as follows.
Consider a data matrix
$$X = [x_{ij}], \quad i = 1, \dots, n, \; j = 1, \dots, p,$$
in which the columns represent the p variables and rows represent measurements of n objects or individuals on those variables. The data can be represented by a cloud of n points in a p-dimensional space, each axis corresponding to a measured variable. We can then look for a line OY1 in this space such that the dispersion of n points when projected onto this line is a maximum. This operation defines a derived variable of the form
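$$Y_1 = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1p} x_p,$$
where $x_1, \dots, x_p$ are the measured variables, with coefficients $a_{1j}$ satisfying the condition
$$\sum_{j=1}^{p} a_{1j}^2 = 1.$$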
After obtaining OY1, consider the (p-1)-dimensional subspace orthogonal to OY1 and look for the line OY2 in this subspace such that the dispersion of the points when projected onto this line is a maximum. Equivalently, OY2 is the line perpendicular to OY1 along which the projected points have maximum dispersion. Having obtained OY2, look for a line in the (p-2)-dimensional subspace orthogonal to both OY1 and OY2 such that the dispersion of the points when projected onto this line is as large as possible. The process can be continued until p mutually orthogonal lines are determined. Each of these lines defines a derived variable:
$$Y_i = a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{ip} x_p,$$
where the constants $a_{ij}$ are determined by the requirement that the variance of $Y_i$ is a maximum, subject to the constraint of orthogonality as well as
$$\sum_{j=1}^{p} a_{ij}^2 = 1$$
for each i.
The Yi thus obtained are called Principal Components of the system and the process of obtaining them is called Principal Components Analysis.
The p-dimensional geometric model defined above can be considered as the true picture of the data. If we wish to obtain the best q-dimensional representation of the p-dimensional true picture, then we simply have to project the points onto the q-dimensional subspace defined by the first q principal components $Y_1, Y_2, \dots, Y_q$.
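The projection onto the first q principal components can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `project_onto_first_q`, the simulated data and the choice of the covariance matrix (rather than the cross-products or correlation matrix) are all assumptions made for the example.

```python
import numpy as np

def project_onto_first_q(X, q):
    """Project the rows of X onto the first q principal components.

    X is an (n, p) data matrix. The covariance matrix is used here,
    which is an assumption made for this sketch.
    """
    Xc = X - X.mean(axis=0)                # centre each variable
    S = np.cov(Xc, rowvar=False)           # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # re-order: largest variance first
    A_q = eigvecs[:, order[:q]]            # p x q matrix of weight vectors
    return Xc @ A_q                        # n x q matrix of component scores

# Simulated data (hypothetical): 200 objects measured on 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])

scores = project_onto_first_q(X, q=2)
print(scores.shape)
print(np.var(scores, axis=0, ddof=1))      # variances of Y1 and Y2
```

The columns of `scores` are the coordinates of the n points in the q-dimensional subspace; their variances appear in decreasing order and the columns are uncorrelated.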
The variance of a linear composite
$$Y = \sum_{i=1}^{p} a_i x_i$$
is given by
$$\operatorname{Var}(Y) = \sum_{i=1}^{p} \sum_{j=1}^{p} a_i a_j s_{ij},$$
where $s_{ij}$ is the covariance between variables i and j. The variance of a linear composite can also be expressed in the notation of matrix algebra as
$$a^T S a,$$
where a is the vector of the variable weights, S is the covariance matrix, and $a^T$ is the transpose of a.
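As a quick numerical check, the double-sum form, the matrix form and the directly computed sample variance of the composite should all agree. The data and the weights below are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))            # illustrative data: n = 50, p = 3
a = np.array([0.5, -1.0, 2.0])          # arbitrary illustrative weights

S = np.cov(X, rowvar=False)             # covariance matrix with entries s_ij
p = len(a)

# Double-sum form: sum over i and j of a_i * a_j * s_ij
var_double_sum = sum(a[i] * a[j] * S[i, j] for i in range(p) for j in range(p))

# Matrix form: a^T S a
var_matrix = a @ S @ a

# Direct sample variance of the composite Y = X a
var_direct = np.var(X @ a, ddof=1)

print(var_double_sum, var_matrix, var_direct)   # three equal values
```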
Principal components analysis finds the weight vector a that maximizes
$$a^T S a$$
subject to the constraint that
$$a^T a = \sum_{i=1}^{p} a_i^2 = 1.$$
It is essential to constrain the size of a; otherwise the variance of the linear composite can become arbitrarily large by selecting large weights.
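The need for the constraint can be seen numerically: multiplying the weights by a constant c multiplies the variance by c squared, so without the unit-length restriction there is no maximum. The small covariance matrix below is a made-up example.

```python
import numpy as np

S = np.array([[4.0, 1.2],
              [1.2, 1.0]])              # illustrative covariance matrix

a = np.array([0.6, 0.8])                # unit-length weights: 0.36 + 0.64 = 1
base = a @ S @ a                        # variance of the composite

for c in (1, 10, 100):
    scaled = (c * a) @ S @ (c * a)
    print(c, scaled, scaled / base)     # the ratio equals c**2

# Among unit-length vectors, the leading eigenvector gives the largest variance.
eigvals, eigvecs = np.linalg.eigh(S)    # eigh sorts eigenvalues in ascending order
a_best = eigvecs[:, -1]
var_best = a_best @ S @ a_best
print(var_best, eigvals[-1])            # the maximum equals the largest eigenvalue
```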
It is important to note that the principal components decomposition is not scale invariant. We would get different decompositions depending on whether the principal components are calculated from the unscaled sums-of-squares-and-cross-products (SSCP) matrix or from the covariance matrix. The magnitudes of the diagonal elements of a cross-products matrix or a covariance matrix influence the nature of the principal components. Hence standardized variables are commonly used. The $X^T X$ matrix based on standardized variables is proportional to the correlation matrix. The covariance matrix can be viewed as a partial step between the SSCP matrix and the correlation matrix. Since the covariance matrix is based on the deviations of the variables from their respective means, it corrects the elements of the SSCP matrix for differences in overall level, but it does not correct for differences in the variances among the variables.
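The scale dependence is easy to demonstrate with two correlated variables measured on very different scales. The sketch below uses simulated data and a hypothetical helper `first_component`. It compares the leading weight vector obtained from the covariance matrix with the one obtained from the correlation matrix, and also checks that the cross-products matrix of standardized variables is proportional to the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two correlated illustrative variables on very different scales.
z = rng.normal(size=(300, 2))
X = np.column_stack([1000.0 * z[:, 0], z[:, 0] + z[:, 1]])

def first_component(M):
    """Leading unit eigenvector of a symmetric matrix M."""
    eigvals, eigvecs = np.linalg.eigh(M)
    v = eigvecs[:, -1]                                  # largest eigenvalue is last
    return v if v[np.argmax(np.abs(v))] > 0 else -v     # fix the arbitrary sign

S = np.cov(X, rowvar=False)        # covariance matrix
R = np.corrcoef(X, rowvar=False)   # correlation matrix

a_cov = first_component(S)
a_cor = first_component(R)
print(a_cov)   # dominated by the large-variance variable
print(a_cor)   # both variables receive weights of equal magnitude

# Standardized variables: Z^T Z is (n - 1) times the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(np.allclose(Z.T @ Z / (len(X) - 1), R))
```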
If we have a set of n observations (objects/cases) on p variables, then we can find the largest principal component (of a cross-products matrix, covariance matrix or correlation matrix) as the weight vector
$$a_1 = (a_{11}, a_{12}, \dots, a_{1p})^T$$
which maximizes the variance of
$$Y_1 = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1p} x_p$$
subject to the constraint
$$a_1^T a_1 = 1.$$
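We can then define the second largest principal component as the weight vector
$$a_2 = (a_{21}, a_{22}, \dots, a_{2p})^T$$
which maximizes the variance of
$$Y_2 = a_{21} x_1 + a_{22} x_2 + \cdots + a_{2p} x_p$$
subject to the constraints:
· $a_2^T a_2 = 1$
· Principal component 2 is orthogonal to (and hence linearly independent of) principal component 1, i.e. $a_1^T a_2 = 0$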
We can define the third largest principal component as the weight vector
$$a_3 = (a_{31}, a_{32}, \dots, a_{3p})^T$$
which maximizes the variance of
$$Y_3 = a_{31} x_1 + a_{32} x_2 + \cdots + a_{3p} x_p$$
subject to the constraints:
· $a_3^T a_3 = 1$
· The third principal component is orthogonal to the first two principal components. These two orthogonality conditions are $a_1^T a_3 = 0$ and $a_2^T a_3 = 0$.
This process can be continued until the last (i.e., the pth) principal component is derived.
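The one-at-a-time construction can be mimicked numerically. The sketch below uses power iteration with deflation, a technique chosen here purely for illustration and not described in the text: each pass finds the unit vector of maximum variance, then removes that direction from the matrix before searching for the next. The function name `sequential_components` and the example matrix are assumptions, and the approach presumes a positive definite matrix with distinct eigenvalues.

```python
import numpy as np

def sequential_components(S, n_iter=1000, seed=3):
    """Extract weight vectors one at a time by power iteration with deflation.

    Illustrative only: it assumes S is symmetric positive definite with
    distinct eigenvalues, and it is not how standard eigen-solvers work.
    """
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    M = S.astype(float).copy()
    weights, variances = [], []
    for _ in range(p):
        a = rng.normal(size=p)               # arbitrary starting vector
        a /= np.linalg.norm(a)
        for _ in range(n_iter):
            a = M @ a
            a /= np.linalg.norm(a)           # enforce a^T a = 1
        lam = a @ M @ a                      # variance of this component
        weights.append(a)
        variances.append(lam)
        M = M - lam * np.outer(a, a)         # remove the direction just found
    return np.array(variances), np.column_stack(weights)

S = np.array([[4.0, 2.0, 0.6],
              [2.0, 3.0, 0.4],
              [0.6, 0.4, 1.0]])              # illustrative covariance matrix

variances, A = sequential_components(S)
print(variances)                             # decreasing variances
print(np.round(A.T @ A, 6))                  # close to the identity matrix
```

In practice a symmetric eigen-solver such as `numpy.linalg.eigh` returns all p components at once; the loop above only serves to mirror the sequential definition.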
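The sum of the variances of the principal components is equal to the sum of the variances of the original variables:
$$\sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} s_{ii},$$
where $\lambda_i$ is the variance of the ith principal component and $s_{ii}$ is the variance of the ith original variable. If the variables are standardized then
$$\sum_{i=1}^{p} \lambda_i = p.$$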
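This conservation of total variance can be checked numerically: the component variances (the eigenvalues) sum to the trace of the matrix being decomposed, which is p for a correlation matrix. The simulated data below are an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
# Illustrative data: 100 objects, 5 variables with different spreads.
X = rng.normal(size=(100, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])

S = np.cov(X, rowvar=False)          # covariance matrix
R = np.corrcoef(X, rowvar=False)     # correlation matrix (standardized variables)

var_pc_cov = np.linalg.eigvalsh(S)   # variances of the components of S
var_pc_cor = np.linalg.eigvalsh(R)   # variances of the components of R

print(var_pc_cov.sum(), np.trace(S)) # equal: total variance is preserved
print(var_pc_cor.sum())              # equals p = 5 up to rounding
```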
In matrix notation, the above definition of principal components leads to the following equation:
$$R A = A \Lambda$$
where A is a matrix of eigenvectors as column vectors and $\Lambda$ is a diagonal matrix of the corresponding latent roots (or eigenvalues) of the correlation matrix R, rank-ordered from the largest to the smallest. The elements of $\Lambda$ have to be in the same order as their associated latent vectors (or eigenvectors). The largest latent root ($\lambda_1$) of R is the variance of the first or largest principal component of R, and its associated vector
$$a_1 = (a_{11}, a_{12}, \dots, a_{1p})^T$$
is the set of weights for the first principal component, which maximizes the variance of
$$Y_1 = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1p} x_p.$$
Similarly for the second principal component, and so on.
The last latent root ($\lambda_p$) is the variance of the last or the smallest principal component.
The ith latent root and its associated weight vector satisfy the matrix equation
$$R a_i = \lambda_i a_i.$$
Pre-multiplying the above equation by $a_i^T$ leads to
$$a_i^T R a_i = a_i^T \lambda_i a_i = \lambda_i a_i^T a_i = \lambda_i,$$
since
$$a_i^T a_i = 1.$$
The variance of the first principal component is therefore $\lambda_1$, that of the second is $\lambda_2$, and so on down to the last latent root $\lambda_p$, which is the variance of the last or smallest principal component. Thus:
$$R a_1 = \lambda_1 a_1$$
$$R a_2 = \lambda_2 a_2$$
$$\vdots$$
$$R a_p = \lambda_p a_p$$
In matrix notation,
$$R A = A \Lambda,$$
where A is the matrix of eigenvectors, as column vectors, and $\Lambda$ is a diagonal matrix of the corresponding latent roots ordered from the largest to the smallest. Pre-multiplying by $A^T$ leads to
$$A^T R A = A^T A \Lambda = \Lambda,$$
because $A^T A = I$.
This means that we can decompose R into a product of three matrices involving eigenvectors and eigenvalues, $R = A \Lambda A^T$. In other words, the variation in R is expressed in terms of the weighting vectors (eigenvectors) of the principal components and the variances (eigenvalues) of the principal components. This is called the spectral (eigenvalue) decomposition of the correlation matrix R; because R is symmetric and positive semi-definite, it coincides with the singular value decomposition of R. This is the key concept underlying Principal Components Analysis.
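The matrix identities above can be verified on a small example. The 3 x 3 correlation matrix below is made up for illustration, and the variable names (`lam`, `Lam`, `A`) are simply labels chosen for this sketch.

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])           # illustrative correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # rank-order largest to smallest
lam = eigvals[order]                      # latent roots
A = eigvecs[:, order]                     # eigenvectors as columns, same order
Lam = np.diag(lam)                        # diagonal matrix of latent roots

print(np.allclose(R @ A, A @ Lam))        # R A = A Lambda
print(np.allclose(A.T @ A, np.eye(3)))    # A^T A = I
print(np.allclose(A.T @ R @ A, Lam))      # A^T R A = Lambda
print(np.allclose(A @ Lam @ A.T, R))      # R = A Lambda A^T

a1 = A[:, 0]
print(np.isclose(a1 @ R @ a1, lam[0]))    # variance of the first component
```

Note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so both the roots and the columns of the eigenvector matrix are re-ordered together to keep each root aligned with its vector.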
Interpretation of principal components
It becomes easier to interpret the principal components when the elements of the latent vectors are transformed to correlations of the variables with the particular principal components. This can be done by multiplying each element of a particular latent vector, $a_i$, by the square root of the associated latent root, $\sqrt{\lambda_i}$. The correlations of variables with principal components are called loadings.
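A short sketch of the loading calculation, assuming standardized variables and a decomposition of the correlation matrix. The simulated data are hypothetical. The code scales each latent vector by the square root of its latent root and checks the result against correlations computed directly between the variables and the component scores.

```python
import numpy as np

rng = np.random.default_rng(5)
# Illustrative data: three variables sharing a common underlying factor.
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.5 * rng.normal(size=(500, 1)) for _ in range(3)])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
R = np.corrcoef(Z, rowvar=False)                   # correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                  # largest latent root first
lam, A = eigvals[order], eigvecs[:, order]

# Column i of `loadings` is the latent vector a_i times sqrt(lambda_i).
loadings = A * np.sqrt(lam)

# Direct check: correlation of variable j with the scores of component i.
scores = Z @ A
direct = np.array([[np.corrcoef(Z[:, j], scores[:, i])[0, 1]
                    for i in range(3)] for j in range(3)])

print(np.round(loadings, 3))
print(np.allclose(loadings, direct))
```

Each row of `loadings` corresponds to a variable and each column to a component; for standardized variables the squared loadings in a row sum to 1.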
The purpose of principal components analysis is to reduce the complexity of multivariate data by transforming it into the principal components space and then choosing the first q principal components (q < p) that explain most of the variation in the original variables. The following criteria for selecting the number of principal components are suggested in the literature:
Usually, the first approach includes too many components, whereas the second approach includes too few components. The 80% criterion can be a good compromise.
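The 80% criterion can be sketched as follows: retain the smallest number of components whose cumulative share of the total variance reaches 80%. The function name `components_for_share` and the latent roots used below are hypothetical values chosen for illustration.

```python
import numpy as np

def components_for_share(eigvals, share=0.80):
    """Smallest q whose first q components explain at least `share` of the variance."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest root first
    cumulative = np.cumsum(lam) / lam.sum()                 # cumulative proportion
    q = int(np.searchsorted(cumulative, share)) + 1         # first index reaching share
    return min(q, len(lam)), cumulative

lam = [2.5, 1.2, 0.8, 0.3, 0.2]      # illustrative latent roots, p = 5
q, cumulative = components_for_share(lam)
print(cumulative)                    # cumulative proportions 0.5, 0.74, 0.9, 0.96, 1.0
print(q)                             # 3 components reach the 80% threshold
```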