### 6.4 Descriptive Principal Components Analysis

The objective of descriptive principal components analysis is to represent geometrically the information in a data matrix in a low-dimensional subspace. More specifically, it aims to highlight the relations among the elements of the data matrix, represented as points, as well as the features displayed by the factorial maps. Each row of the $(n \times p)$ matrix $X = [x_{ij}]$ can be considered as a vector or a point in $\mathbb{R}^p$. The entire matrix constitutes a cloud of $n$ points in $\mathbb{R}^p$, denoted $N(I)$. Similarly, each column can be considered as a vector or point in $\mathbb{R}^n$; the cloud of these $p$ points is denoted $N(J)$.

In the space $\mathbb{R}^p$, if two points (individuals) are close to each other, they have similar values on the $p$ variables. In the space $\mathbb{R}^n$, if two variables are situated close together, they take similar values over the sample of individuals.

Principal components analysis enables us to represent the points graphically after a series of iterations in which we attempt to find a reduced number of dimensions (or subspaces) that provide a good fit for the observations and the variables such that the distances between the points in the subspace provide an accurate representation of the distances in the true space (i.e., the original data matrix).

Fitting the data points in $\mathbb{R}^p$

The entire matrix can be considered as a cloud of $n$ points in $\mathbb{R}^p$. The objective of principal components analysis is to find a $q$-dimensional subspace of $\mathbb{R}^p$, with $q \ll p$, such that the configuration of the $n$ points in this subspace closely approximates that of the $n$ points in the $p$-dimensional space.

Let us first find a one-dimensional vector subspace, i.e., the straight line passing through the origin $O$ that gives the best possible fit of the data. Let $u$ be a unit vector and $u^T$ its transpose, so that $u^T u = 1$. Consider the vector $OV_i$ joining the origin to the $i$th point $V_i$, and let $H_i$ be the orthogonal projection of $V_i$ on the line defined by $u$. The length $OH_i$ is the scalar product of $OV_i$ and $u$. The points are fitted to the subspace using the least squares principle, i.e., by minimizing the sum of squares of the distances:

$$\sum_{i=1}^{n} V_iH_i^2$$

but, by the Pythagorean theorem,

$$\sum_{i} V_iH_i^2 = \sum_{i} OV_i^2 - \sum_{i} OH_i^2 \tag{1}$$

Since $\sum_i OV_i^2$ is fixed, the minimization of $\sum_i V_iH_i^2$ is equivalent to the maximization of $\sum_i OH_i^2$. This quantity can be expressed as a function of $X$ and $u$:

$$\sum_{i} OH_i^2 = (Xu)^T Xu = u^T X^T X u \tag{2}$$
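
As a small numerical check of equations (1) and (2), the sketch below projects a few points onto a line through the origin and confirms that the squared distances to the line and the squared projections add up to the fixed total. Here $H_i$ stands for the orthogonal projection of the point $V_i$ on the line. The data values, the direction `u` and the helper `dot` are illustrative assumptions, not taken from the text.

```python
import math

# Hypothetical data: n = 4 points (rows) in p = 2 dimensions.
X = [[2.0, 1.0], [-1.0, 3.0], [0.5, -2.0], [-3.0, -1.5]]

# An arbitrary unit vector u defining a line through the origin.
theta = 0.7
u = [math.cos(theta), math.sin(theta)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

total = 0.0      # sum of OV_i^2, fixed by the data
projected = 0.0  # sum of OH_i^2, i.e. u^T X^T X u
residual = 0.0   # sum of V_i H_i^2, squared distances to the line
for row in X:
    h = dot(row, u)                # abscissa of the projection H_i
    foot = [h * c for c in u]      # coordinates of H_i
    total += dot(row, row)
    projected += h * h
    residual += sum((a - b) ** 2 for a, b in zip(row, foot))

# Equation (1): residual = total - projected, up to rounding error.
print(abs(residual - (total - projected)) < 1e-9)
```

Because `total` does not depend on `u`, any direction that increases `projected` decreases `residual` by the same amount.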

The vector $u$ is determined by maximizing the quadratic form $u^T X^T X u$ subject to the constraint $u^T u = 1$. Let

$$u_1 = \arg\max_{u^T u = 1} \; u^T X^T X u \tag{3}$$

i.e., $u_1$ is the vector for which this maximum is attained.

The two-dimensional subspace that best fits the data contains the subspace defined by $u_1$. The second vector $u_2$ of this subspace is orthogonal to $u_1$ and maximizes the quadratic form $u^T X^T X u$ under that constraint.

The $q$-dimensional subspace ($q \le p$) that gives the best fit in the least squares sense is found in a similar way. As a result, we obtain the orthogonal eigenvectors $u_1, u_2, \ldots, u_q$ of the matrix $X^T X$ corresponding to the $q$ largest eigenvalues, ranked in descending order:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_q$$
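
The maximization property can be checked numerically in the plane. The sketch below assumes a small made-up data matrix with $p = 2$ and uses the closed-form eigenvalues of a symmetric $2 \times 2$ matrix instead of a general eigensolver. It sweeps unit vectors over a half circle and compares the largest value of the quadratic form with the leading eigenvalue. All names (`quad`, `lam1`, `u1`, and so on) are illustrative.

```python
import math

# Hypothetical data matrix: n = 5 individuals, p = 2 variables.
X = [[2.0, 1.0], [1.0, 3.0], [-1.0, -0.5], [-2.0, -2.5], [0.0, -1.0]]

# S = X^T X, a symmetric 2 x 2 matrix [[a, b], [b, c]].
a = sum(r[0] * r[0] for r in X)
b = sum(r[0] * r[1] for r in X)
c = sum(r[1] * r[1] for r in X)

def quad(u):
    """Quadratic form u^T X^T X u for a 2-vector u."""
    return a * u[0] ** 2 + 2 * b * u[0] * u[1] + c * u[1] ** 2

# Closed-form eigenvalues of a symmetric 2 x 2 matrix.
mid = (a + c) / 2
rad = math.hypot((a - c) / 2, b)
lam1, lam2 = mid + rad, mid - rad

# Eigenvector for lam1 (valid because b != 0 for this data), normalized.
u1 = [b, lam1 - a]
length = math.hypot(u1[0], u1[1])
u1 = [x / length for x in u1]
# In two dimensions, u2 is simply u1 rotated by 90 degrees.
u2 = [-u1[1], u1[0]]

# Sweep unit vectors (cos t, sin t) in 1-degree steps and keep the best value.
best = max(quad([math.cos(math.radians(d)), math.sin(math.radians(d))])
           for d in range(0, 180))

print(lam1 >= lam2, best <= lam1 + 1e-9, abs(quad(u1) - lam1) < 1e-9)
```

No direction in the sweep exceeds `lam1`, and the value `lam1` is reached exactly at `u1`; the orthogonal direction `u2` yields `lam2`.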

Fitting the data points in $\mathbb{R}^n$

Now let us consider the space $\mathbb{R}^n$, where the matrix $X$ can be represented by $p$ variable points whose $n$ coordinates are the columns of $X$. The problem of finding a unit vector $v$ (and, subsequently, a $q$-dimensional subspace) that best fits the points in $\mathbb{R}^n$ is equivalent to maximizing the quadratic form:

$$(X^T v)^T X^T v = v^T X X^T v \tag{4}$$

subject to the constraint $v^T v = 1$.

We retain the $q$ eigenvectors of $XX^T$ corresponding to the $q$ largest eigenvalues.

It can be proved that the largest eigenvalue $\lambda_1$ of $X^T X$ is equal to the largest eigenvalue $\mu_1$ of $XX^T$, i.e., $\lambda_1 = \mu_1$. More generally, the nonzero eigenvalues of the two matrices $X^T X$ and $XX^T$ are identical. It is therefore unnecessary to repeat the diagonalization for the matrix $XX^T$: a simple linear transformation associated with the original matrix $X$ gives the vector $Xu_\alpha$ in $\mathbb{R}^n$, and $XX^T (Xu_\alpha) = X (X^T X u_\alpha) = \lambda_\alpha Xu_\alpha$. The squared norm of the vector $Xu_\alpha$ is $\lambda_\alpha$, since $u_\alpha^T X^T X u_\alpha = \lambda_\alpha$.

Therefore the unit vector $v_\alpha$ corresponding to the same eigenvalue $\lambda_\alpha$ ($\lambda_\alpha \ne 0$) is given by the relationship

$$v_\alpha = \frac{1}{\sqrt{\lambda_\alpha}} X u_\alpha \tag{5}$$

Symmetrically, for every $\alpha$ such that $\lambda_\alpha \ne 0$,

$$u_\alpha = \frac{1}{\sqrt{\lambda_\alpha}} X^T v_\alpha \tag{6}$$

$u_\alpha$ is the $\alpha$th principal axis in $\mathbb{R}^p$; $v_\alpha$ is the $\alpha$th principal axis in $\mathbb{R}^n$.
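
The duality between the two spaces can be illustrated as follows. This is a minimal sketch under stated assumptions: the data matrix is made up, and power iteration stands in for a full diagonalization, so only the first axis is computed. It finds the leading eigenpair of $X^T X$ and of $XX^T$ separately, then checks that the second can be obtained from the first by rescaling $Xu_1$, and conversely. The helpers `matvec`, `norm` and `top_axis` are hypothetical names.

```python
import math

# Hypothetical data matrix X: n = 4 rows (individuals), p = 3 columns (variables).
X = [[2.0, 0.0, 1.0],
     [1.0, 3.0, 0.0],
     [0.0, 1.0, 4.0],
     [1.0, 1.0, 1.0]]
Xt = [list(col) for col in zip(*X)]  # transpose of X

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def top_axis(apply_matrix, dim, steps=500):
    """Power iteration: dominant unit eigenvector and its eigenvalue."""
    v = [1.0] * dim
    for _ in range(steps):
        w = apply_matrix(v)
        size = norm(w)
        v = [x / size for x in w]
    lam = sum(a * b for a, b in zip(v, apply_matrix(v)))  # Rayleigh quotient
    return v, lam

# Leading eigenpair of X^T X (p x p) and of X X^T (n x n), computed separately.
u1, lam1 = top_axis(lambda v: matvec(Xt, matvec(X, v)), dim=3)
v1, mu1 = top_axis(lambda v: matvec(X, matvec(Xt, v)), dim=4)

# v1 obtained from u1 without diagonalizing X X^T.
v_from_u = [x / math.sqrt(lam1) for x in matvec(X, u1)]
# u1 recovered from that vector by the symmetric relation.
u_from_v = [x / math.sqrt(lam1) for x in matvec(Xt, v_from_u)]

print(abs(lam1 - mu1) < 1e-8)
print(all(abs(a - b) < 1e-6 for a, b in zip(v_from_u, v1)))
```

When $n$ is much larger than $p$, working with the $p \times p$ matrix and rescaling $Xu_\alpha$ avoids forming the $n \times n$ matrix at all.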

Analysis in $\mathbb{R}^p$

The problem of describing a large data set is different from that of summarizing it. The objective is no longer to maximize the sum of squares of the distances of the projected points from the origin, but to maximize the sum of squares of the distances between all pairs of individuals. This implies that the best fitting line $H_1$ is no longer constrained to pass through the origin.

Let $h_i$ and $h_j$ represent the values of the projections of two individual points $i$ and $j$ on $H_1$. After some algebraic manipulation, the quantity to be maximized reduces to:

$$\sum_{i,j} (h_i - h_j)^2 = 2n \sum_{i} (h_i - \bar{h})^2$$

where $\sum_{i,j}$ is a summation over all $i$ and $j \le n$, and $\bar{h}$ is the mean of the projections of the $n$ individuals and therefore the projection of the centroid $G$ of the set of points onto $H_1$.

The point with abscissa $\bar{h}$ on $H_1$ is the projection of the point $G$ whose $j$th coordinate is:

$$\bar{r}_j = \frac{1}{n} \sum_{i=1}^{n} r_{ij}$$

where $r_{ij}$ denotes the raw value of variable $j$ for individual $i$. Thus if the origin is placed at $G$, the quantity to be maximized becomes again the sum of squares of distances from the origin.

The required subspace is obtained by first transforming the raw data matrix into a matrix $X$ whose general term is $x_{ij} = r_{ij} - \bar{r}_j$, and then performing the analysis on $X$.
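
The reduction from pairwise distances to deviations about the mean can be verified on a handful of numbers. The sketch below assumes five made-up abscissae and sums over all ordered pairs $(i, j)$, in which case the two quantities differ by the factor $2n$.

```python
# Hypothetical abscissae h_i of n = 5 projected individuals on the line H1.
h = [1.5, -0.5, 2.0, 4.0, 0.5]
n = len(h)
h_bar = sum(h) / n  # mean projection = abscissa of the projected centroid G

# Sum of squared distances over all ordered pairs (i, j).
pairwise = sum((hi - hj) ** 2 for hi in h for hj in h)

# Sum of squared deviations from the mean, scaled by 2n.
about_mean = 2 * n * sum((hi - h_bar) ** 2 for hi in h)

print(pairwise, about_mean)  # prints: 115.0 115.0
```

Maximizing the spread between pairs is therefore the same as maximizing the spread about the centroid, which is why the data are centered before the analysis.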

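The distance between two individuals $k$ and $k'$ in $\mathbb{R}^p$ is given by

$$d^2(k, k') = \sum_{j=1}^{p} (r_{kj} - r_{k'j})^2 \tag{7}$$

where $r_{kj}$ is the raw value of variable $j$ for individual $k$.

The variables may be of different orders of magnitude. In such cases it is necessary to give the same weight to each variable in defining the distances among individuals, and normalized principal axes are computed. The measurement scales are standardized by using the following distance measure:

$$d^2(k, k') = \sum_{j=1}^{p} \left( \frac{r_{kj} - r_{k'j}}{s_j} \right)^2 \tag{8}$$

where $s_j$ is the standard deviation of variable $j$ (computed with divisor $n$).

The normalized analysis in $\mathbb{R}^p$ of the raw data matrix is also the general analysis of the matrix $X$ whose general term is

$$x_{ij} = \frac{r_{ij} - \bar{r}_j}{s_j \sqrt{n}} \tag{9}$$

with $\bar{r}_j$ the mean of variable $j$.

In this space, the matrix to be diagonalized is $C = X^T X$, whose general term is

$$c_{jj'} = \sum_{i=1}^{n} x_{ij}\, x_{ij'},$$

that is,

$$c_{jj'} = \frac{1}{n} \sum_{i=1}^{n} \frac{(r_{ij} - \bar{r}_j)(r_{ij'} - \bar{r}_{j'})}{s_j\, s_{j'}} \tag{10}$$

which is the correlation coefficient between variables $j$ and $j'$.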
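
To make the normalization concrete, the sketch below builds the normalized matrix from a small raw table and checks that $X^T X$ reproduces the correlation coefficients. It assumes the usual normalization in which each value is centered on its variable mean and divided by the standard deviation (divisor $n$) times $\sqrt{n}$. The raw values and the helper `pearson` are made up for illustration.

```python
import math

# Hypothetical raw data R: n = 5 individuals, p = 3 variables on different scales.
R = [[170.0, 65.0, 3.1],
     [160.0, 58.0, 2.4],
     [180.0, 80.0, 3.9],
     [175.0, 72.0, 2.9],
     [165.0, 60.0, 3.5]]
n, p = len(R), len(R[0])

cols = list(zip(*R))
means = [sum(c) / n for c in cols]
# Population standard deviation (divisor n), matching the 1/sqrt(n) scaling.
sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]

# Normalized matrix X with general term (r_ij - mean_j) / (s_j * sqrt(n)).
X = [[(R[i][j] - means[j]) / (sds[j] * math.sqrt(n)) for j in range(p)]
     for i in range(n)]

# C = X^T X; its entries should be the correlation coefficients.
C = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
     for j in range(p)]

def pearson(a, b):
    """Correlation coefficient computed directly from two raw columns."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

for row in C:
    print([round(c, 3) for c in row])
```

The diagonal of `C` is 1 by construction, so every variable contributes equally to the analysis regardless of its original unit.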

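The coordinates of the $n$ individual points on the principal axis $u_\alpha$ (the $\alpha$th eigenvector of matrix $C$) are the $n$ components of the vector $Xu_\alpha$.

The abscissa $F_\alpha(i)$ of the individual point $i$ on this axis can be written as

$$F_\alpha(i) = \sum_{j=1}^{p} x_{ij}\, u_{\alpha j} \tag{11}$$

where $u_{\alpha j}$ is the $j$th component of $u_\alpha$.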

Analysis in $\mathbb{R}^n$

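The Euclidean distance between two variables $j$ and $j'$ is given by

$$d^2(j, j') = \sum_{i=1}^{n} (x_{ij} - x_{ij'})^2 = \sum_{i} x_{ij}^2 + \sum_{i} x_{ij'}^2 - 2 \sum_{i} x_{ij}\, x_{ij'} \tag{12}$$

But, for the normalized data,

$$\sum_{i} x_{ij}^2 = \sum_{i} x_{ij'}^2 = 1$$

All the variables are located on a sphere of radius 1, whose center is at the origin of the axes.

Moreover

$$d^2(j, j') = 2\,(1 - c_{jj'}) \tag{13}$$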

Therefore the proximity between two variables is given by their correlation coefficient:

$c_{jj'} = 1 \Leftrightarrow$ the points $j$ and $j'$ coincide ($d(j, j') = 0$)

$c_{jj'} = -1 \Leftrightarrow$ the points $j$ and $j'$ are diametrically opposite ($d(j, j') = 2$)
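
The link between distance and correlation on the unit sphere can be checked directly. In this sketch the three raw variables are made up, and each is centered and scaled to unit norm before distances are computed; the helper names are illustrative.

```python
import math

def normalize(column):
    """Center a raw column and scale it to unit Euclidean norm,
    i.e. (r_i - mean) / (s * sqrt(n)) with s the population std."""
    n = len(column)
    mean = sum(column) / n
    centered = [v - mean for v in column]
    length = math.sqrt(sum(v * v for v in centered))  # equals s * sqrt(n)
    return [v / length for v in centered]

# Hypothetical raw variables observed on n = 5 individuals.
a = normalize([1.0, 2.0, 4.0, 5.0, 8.0])
b = normalize([2.0, 1.0, 5.0, 4.0, 9.0])        # moves with a
c = normalize([-1.0, -2.0, -4.0, -5.0, -8.0])   # exact opposite of a

def sq_dist(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def corr(x, y):
    # For normalized columns the scalar product is the correlation.
    return sum(xi * yi for xi, yi in zip(x, y))

for x, y in [(a, b), (a, c)]:
    print(round(sq_dist(x, y), 6), round(2 * (1 - corr(x, y)), 6))
```

The two printed numbers agree on each line. For the pair `(a, c)` the correlation is $-1$ and the squared distance reaches its maximum of 4, the squared diameter of the unit sphere.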

Note that the analysis in $\mathbb{R}^n$ is not performed relative to the centroid of the variable points.

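It is not necessary to diagonalize the $(n \times n)$ matrix $XX^T$, since the eigenvalues $\lambda_\alpha$ and the eigenvectors $u_\alpha$ of the matrix $C = X^T X$ are known. This is due to the fact that the vector

$$v_\alpha = \frac{1}{\sqrt{\lambda_\alpha}} X u_\alpha \tag{14}$$

is the unit eigenvector of $XX^T$ relative to the eigenvalue $\lambda_\alpha$.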

Coordinates of points of $N(I)$

$F_\alpha(i)$ = coordinate of point $i$ of cloud $N(I)$ on the $\alpha$-axis. The coordinates of the points $i$ of $I$ on the $\alpha$-axis are the components of the vector $Xu_\alpha$.

In the case of normalized data, $F_\alpha(i)$ = the correlation coefficient between the $\alpha$-axis and the point $i$.

Coordinates of points of $N(J)$

$G_\alpha(j)$ = coordinate of point $j$ of cloud $N(J)$ on the $\alpha$-axis. The coordinates of the points $j$ of $J$ on the $\alpha$-axis are the components of the vector $X^T v_\alpha$.

In the case of normalized data, $G_\alpha(j)$ = the correlation coefficient between the $\alpha$-axis and the variable $j$.

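Relation between $N(I)$ and $N(J)$

$$F_\alpha(i) = \frac{1}{\sqrt{\lambda_\alpha}} \sum_{j=1}^{p} x_{ij}\, G_\alpha(j) \tag{15}$$

$$G_\alpha(j) = \frac{1}{\sqrt{\lambda_\alpha}} \sum_{i=1}^{n} x_{ij}\, F_\alpha(i) \tag{16}$$

Total variance of $I$ = $\sum_\alpha \lambda_\alpha$ = Total variance of $J$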
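
The link between the two sets of coordinates can be sketched for two normalized variables, a case in which the eigenvectors of the correlation matrix are known in closed form. The relations checked here are one standard form of the transition between the clouds, namely that each set of coordinates is obtained from the other through $X$ or $X^T$ and a division by $\sqrt{\lambda_\alpha}$; treating them as the relations intended at this point of the text is an assumption. The raw data and all names are made up.

```python
import math

# Hypothetical raw data: n = 5 individuals, p = 2 variables.
R = [[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0], [8.0, 9.0]]
n = len(R)

def normalize(column):
    """(r_i - mean) / (s * sqrt(n)): centered column of unit norm."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    length = math.sqrt(sum(v * v for v in centered))
    return [v / length for v in centered]

x1, x2 = (normalize(col) for col in zip(*R))
X = [list(row) for row in zip(x1, x2)]     # normalized n x 2 matrix
c = sum(a * b for a, b in zip(x1, x2))     # correlation of the two variables

# For two normalized variables, C = [[1, c], [c, 1]] has the eigenpairs
# (1 + c, (1, 1)/sqrt(2)) and (1 - c, (1, -1)/sqrt(2)).
s = 1 / math.sqrt(2)
axes = [(1 + c, [s, s]), (1 - c, [s, -s])]

results = []
total = 0.0
for lam, u in axes:
    F = [row[0] * u[0] + row[1] * u[1] for row in X]   # individuals on the axis
    G = [math.sqrt(lam) * uj for uj in u]              # variables on the axis
    # Each set of coordinates recomputed from the other one.
    F_from_G = [(row[0] * G[0] + row[1] * G[1]) / math.sqrt(lam) for row in X]
    G_from_F = [sum(X[i][j] * F[i] for i in range(n)) / math.sqrt(lam)
                for j in range(2)]
    total += sum(f * f for f in F)                     # variance along this axis
    results.append((lam, F, G, F_from_G, G_from_F))
    print(round(lam, 4), [round(g, 4) for g in G], [round(g, 4) for g in G_from_F])

print(round(total, 6))  # total variance; equals p = 2 for normalized data
```

Since the columns of `X` and the vector `F` are centered, and the columns have unit norm, `G_from_F[j]` is also the correlation coefficient between variable `j` and the coordinates `F`.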

It is important to note that the interpretation of the centering step of the transformation in equation (9) differs greatly in $\mathbb{R}^p$ and $\mathbb{R}^n$:

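• In $\mathbb{R}^p$, the transformation is equivalent to translating the origin of the axes to the centroid of the points.
• In $\mathbb{R}^n$, the transformation $r_{ij} \mapsto r_{ij} - \bar{r}_j$, which subtracts from each raw value $r_{ij}$ the mean $\bar{r}_j$ of its variable, is a projection parallel to the first bisector of the axes. The general term of the $(n \times n)$ matrix $P$ associated with this transformation is:

$$p_{ii'} = \delta_{ii'} - \frac{1}{n} \tag{17}$$

where

$\delta_{ii'} = 1$ if $i = i'$

$\delta_{ii'} = 0$ if $i \ne i'$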
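
The projection can be written out explicitly for a small case. The sketch below assumes that the matrix $P$ has general term $\delta_{ii'} - 1/n$ (1 minus $1/n$ on the diagonal, minus $1/n$ elsewhere) and applies it to one made-up raw variable, viewed as a point of $\mathbb{R}^n$ with $n = 4$. The helper `matvec` is a hypothetical name.

```python
# Hypothetical raw column: one variable observed on n = 4 individuals.
r = [3.0, 5.0, 10.0, 6.0]
n = len(r)

# General term of P: delta(i, i') - 1/n.
P = [[(1.0 if i == k else 0.0) - 1.0 / n for k in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

mean = sum(r) / n
centered = matvec(P, r)      # same result as subtracting the mean from r
ones = [1.0] * n             # direction of the first bisector of the axes

print(centered)              # deviations of r from its mean
print(matvec(P, ones))       # the first bisector is sent to the zero vector
print(all(abs(a - b) < 1e-12
          for a, b in zip(matvec(P, centered), centered)))  # P applied twice
```

Applying `P` a second time changes nothing and the first bisector is mapped to zero, which is what is expected of a projection parallel to that direction.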