The objective of descriptive principal components analysis is to represent
geometrically the information in a data matrix in a low-dimensional subspace. The
primary objective is to highlight relations among elements of the data matrix
represented by points, as well as the features displayed by the factorial maps.
Each row of the matrix **X** can be considered as a vector, or a
point, in ℝ^{p}. The entire matrix constitutes a cloud of *n* points in ℝ^{p},
denoted as *N*(*I*). Similarly, each column can be considered as a
vector, or point, in ℝ^{n}, denoted as *N*(*J*).

In the space ℝ^{p}, if two points (individuals) are close to each
other, they have similar values on the *p* variables.
In the space ℝ^{n}, if two variables are situated close together, they have similar values
over the sample of individuals.

Principal components analysis enables us to represent the points graphically
after a series of iterations in which we attempt to find a reduced number of
dimensions (or subspaces) that provide a good fit for the observations and the
variables such that the distances between the points in the subspace provide an
accurate representation of the distances in the true space (*i.e*., the
original data matrix).

**Fitting the data points in ℝ^{p}**

The entire matrix can be considered as a cloud of *n* points in ℝ^{p}.
The objective of principal components analysis is to find a *q*-dimensional
subspace of ℝ^{p}, such that *q* << *p*, and the configuration of the *n* points in this
subspace closely approximates that of the *n* points in the *p*-dimensional
space.

Let us first find a one-dimensional vector subspace, *i.e*., the straight
line passing through the origin, which represents the best possible fit of the
data. Let **u** be a unit vector and **u**^{T} be its transpose; obviously **u**^{T}**u** = 1.
Consider the vector *OM*_{i} joining the origin to the *i*^{th} individual point, and let
*H*_{i} be the projection of *M*_{i} on the one-dimensional subspace defined by **u**.
The fit of the line is measured by the sum of squared residuals Σ_{i} ‖*M*_{i}*H*_{i}‖²,
but, by Pythagoras' theorem,

‖*M*_{i}*H*_{i}‖² = ‖*OM*_{i}‖² − ‖*OH*_{i}‖² (1)

Since Σ_{i} ‖*OM*_{i}‖² is fixed, the minimization of Σ_{i} ‖*M*_{i}*H*_{i}‖² is equivalent to
the maximization of Σ_{i} ‖*OH*_{i}‖². This quantity can be expressed as
a function of **X** and **u**:

Σ_{i} ‖*OH*_{i}‖² = (**Xu**)^{T}(**Xu**) = **u**^{T}**X**^{T}**Xu** (2)
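Equation (2) can be checked numerically by comparing the sum of squared projection lengths with the quadratic form directly (a minimal sketch assuming NumPy; the data matrix here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))        # n = 10 individuals, p = 4 variables

u = rng.normal(size=4)
u /= np.linalg.norm(u)              # unit vector, so u^T u = 1

# The projection length of row x_i on the line defined by u is u^T x_i = (Xu)_i.
proj_lengths = X @ u
sum_sq_proj = np.sum(proj_lengths ** 2)

# Quadratic form u^T X^T X u from equation (2).
quad_form = u @ X.T @ X @ u

assert np.isclose(sum_sq_proj, quad_form)
```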

**u** can be determined by maximizing the quadratic form **u**^{T}**X**^{T}**Xu**, subject to the constraint **u**^{T}**u** = 1:

Max [**u**^{T}**X**^{T}**Xu**] subject to **u**^{T}**u** = 1 (3)

*i.e*., **u**_{1} is the vector for which this maximum is
attained: the eigenvector of **X**^{T}**X** associated with its largest eigenvalue λ_{1}.

Obviously, the two-dimensional subspace that best fits the data contains the
one-dimensional subspace defined by **u**_{1}. We can find the second vector **u**_{2}
in this subspace, which is orthogonal to **u**_{1} and maximizes the same quadratic form.

The *q*-dimensional subspace (*q* ≤ *p*) that best fits the data in the least-squares sense is found in
a similar way. As a result, we get orthogonal vectors **u**_{1}, **u**_{2}, ….., **u**_{q}, the eigenvectors of **X**^{T}**X** associated with its *q* largest eigenvalues

λ_{1} > λ_{2} > ……. > λ_{q}
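Assuming NumPy, the axes **u**_{1}, …, **u**_{q} can be obtained by diagonalizing **X**^{T}**X** and sorting the eigenvalues in decreasing order (a sketch on arbitrary data, not a full implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))          # n = 20 points in R^p, p = 5

# Diagonalize the symmetric matrix X^T X.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)

# eigh returns ascending eigenvalues; reorder so that lambda_1 > lambda_2 > ...
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]
U = eigvecs[:, order]                 # columns are u_1, ..., u_p

q = 2                                 # keep a q-dimensional subspace
Uq = U[:, :q]

# The retained axes are orthonormal ...
assert np.allclose(Uq.T @ Uq, np.eye(q))

# ... and no unit vector beats u_1 on the quadratic form u^T X^T X u.
for _ in range(100):
    w = rng.normal(size=5)
    w /= np.linalg.norm(w)
    assert w @ X.T @ X @ w <= lam[0] + 1e-9
```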

**Fitting the data points in ℝ^{n}**

Now let us consider the space ℝ^{n}, where the matrix **X** can be represented by *p*
variable points, whose *n* coordinates are the columns of **X**. The
problem of finding a unit vector **v** that best fits these points leads, as before, to the maximization of

(**X**^{T}**v**)^{T}(**X**^{T}**v**) = **v**^{T}**XX**^{T}**v** (4)

subject to the constraint **v**^{T}**v** = 1.

We retain the *q* eigenvectors of **XX**^{T} corresponding to
the *q* largest eigenvalues.

It can be proved that the largest eigenvalue λ_{1} of **X**^{T}**X** is also the largest
eigenvalue of **XX**^{T}, and that it corresponds to the eigenvector **Xu**_{1}.

In other words, the nonzero eigenvalues of **X**^{T}**X** and **XX**^{T} are identical.
Therefore, it is unnecessary to repeat the computations for the diagonalization of
the matrix **XX**^{T}, since by a simple linear transformation
associated with the original matrix **X**, we obtain the vector **Xu**_{α} in
ℝ^{n}. The norm of the vector **Xu**_{α} is √λ_{α}, because ‖**Xu**_{α}‖² = **u**_{α}^{T}**X**^{T}**Xu**_{α} = λ_{α}.

Therefore the unit vector **v**_{α} corresponding to the same eigenvalue λ_{α} is

**v**_{α} = **Xu**_{α} / √λ_{α} (5)

We find, in a symmetric manner, for every α such that λ_{α} ≠ 0,

**u**_{α} = **X**^{T}**v**_{α} / √λ_{α} (6)

*i.e*., **u**_{α} is the α^{th} eigenvector of **X**^{T}**X**.
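This duality can be checked numerically: the nonzero eigenvalues of **X**^{T}**X** and **XX**^{T} coincide, and **Xu**_{α}/√λ_{α} is a unit eigenvector of **XX**^{T} (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))           # n = 8, p = 3

lam_p, U = np.linalg.eigh(X.T @ X)    # p x p eigenproblem
lam_n, V = np.linalg.eigh(X @ X.T)    # n x n eigenproblem

# The nonzero eigenvalues of the two matrices are identical
# (XX^T also has n - p eigenvalues that are numerically zero).
assert np.allclose(np.sort(lam_p)[::-1], np.sort(lam_n)[::-1][:3])

# v_1 = X u_1 / sqrt(lambda_1) is a unit eigenvector of XX^T (eq. 5).
u1 = U[:, np.argmax(lam_p)]
lam1 = lam_p.max()
v1 = X @ u1 / np.sqrt(lam1)
assert np.isclose(np.linalg.norm(v1), 1.0)
assert np.allclose(X @ X.T @ v1, lam1 * v1)

# Symmetrically, u_1 = X^T v_1 / sqrt(lambda_1) (eq. 6).
assert np.allclose(X.T @ v1 / np.sqrt(lam1), u1)
```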

*Analysis in ℝ^{p}*

The problem of description of a large data set is different from that of its
summarization. The problem is no longer that of maximizing the sums of squares
of the distances of the projections of the points from the origin. Now the
objective is to maximize the sums of squares of the distances between all the
pairs of individuals. This implies that the best-fitting line H_{1} is
no longer required to pass through the origin.

Let *h*_{i} and *h*_{j} be the abscissae of the projections of individuals *i* and *j* on H_{1}. The quantity to be maximized is

Σ_{i,j} (*h*_{i} − *h*_{j})² = 2*n* Σ_{i} (*h*_{i} − h̄)²

where Σ_{i,j} is a summation for all *i*'s
and *j*'s ≤ *n*, and h̄ = (1/*n*) Σ_{i} *h*_{i} is the mean of the projections of the *n* individuals and
therefore the projection of the centroid G of the set of points on to H_{1}.

The point with abscissa h̄ on H_{1} is the projection of the point G whose *j*^{th}
coordinate is:

x̄_{j} = (1/*n*) Σ_{i} *x*_{ij}

Thus, if the origin is placed at G, the quantity to be maximized becomes again the sum of squares of distances from the origin.
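The step from pairwise distances to distances from the centroid rests on the identity Σ_{i,j}(*h*_{i} − *h*_{j})² = 2*n* Σ_{i}(*h*_{i} − h̄)², which a short check confirms (assuming NumPy; the abscissae here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
h = rng.normal(size=12)               # abscissae of n = 12 projections on H_1
n = h.size

# Sum of squared differences over all ordered pairs (i, j).
pairwise = np.sum((h[:, None] - h[None, :]) ** 2)

# 2n times the sum of squared deviations from the mean projection h-bar.
centred = 2 * n * np.sum((h - h.mean()) ** 2)

assert np.isclose(pairwise, centred)
```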

The required subspace is obtained by first transforming the data matrix into
a centered matrix **X**, whose general term is *x*_{ij} − x̄_{j}, and then
performing the analysis on this centered matrix.

The distance between two individuals *k* and *k*′ in ℝ^{p} is given by

d²(*k*, *k*′) = Σ_{j} (*x*_{kj} − *x*_{k′j})² (7)

It is possible that for some values of *j* the corresponding variables
are of different orders of magnitude. In such cases it would be necessary to
give the same weight to each variable in defining the distances among
individuals. Normalized principal axes are computed in such a situation. The
measurement scales are standardized by using the following distance measure:

d²(*k*, *k*′) = Σ_{j} [(*x*_{kj} − *x*_{k′j}) / *s*_{j}]² (8)

where *s*_{j} is the standard deviation of the *j*^{th} variable.

The normalized analysis in ℝ^{p} of the data matrix is also the general analysis of the matrix **X**
whose general term is

(*x*_{ij} − x̄_{j}) / (*s*_{j} √*n*) (9)

In this space, the matrix **C** = **X**^{T}**X** is diagonalized, whose general term is

*c*_{jj′} = Σ_{i} (*x*_{ij} − x̄_{j})(*x*_{ij′} − x̄_{j′}) / (*n* *s*_{j} *s*_{j′}) (10)

that is, *c*_{jj′} is the correlation coefficient between the variables *j* and *j*′.
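With the general term of equation (9), the product **X**^{T}**X** is precisely the correlation matrix, which can be verified against NumPy's `corrcoef` (a sketch; the raw data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
raw = rng.normal(size=(30, 4))        # n = 30 individuals, p = 4 variables
n = raw.shape[0]

# Normalized general term (x_ij - xbar_j) / (s_j * sqrt(n)) of eq. (9),
# with s_j the population standard deviation of variable j.
Xn = (raw - raw.mean(axis=0)) / (raw.std(axis=0) * np.sqrt(n))

C = Xn.T @ Xn                         # general term c_jj' of eq. (10)

# C coincides with the correlation matrix of the raw variables.
assert np.allclose(C, np.corrcoef(raw, rowvar=False))

# Its diagonal entries are 1: each variable point has unit norm.
assert np.allclose(np.diag(C), 1.0)
```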

The coordinates of the *n* individual points on the principal axis **u**_{α} (the α^{th} eigenvector of the matrix **C**) are the *n* components
of the vector **Xu**_{α}.

The abscissa of the individual point *i* on this axis can be written as

F_{α}(*i*) = Σ_{j} *u*_{jα} (*x*_{ij} − x̄_{j}) / (*s*_{j} √*n*) (11)

*Analysis in ℝ^{n}*

The Euclidean distance between two variables *j* and *j*′ is given by

d²(*j*, *j*′) = Σ_{i} [(*x*_{ij} − x̄_{j}) / (*s*_{j} √*n*) − (*x*_{ij′} − x̄_{j′}) / (*s*_{j′} √*n*)]² (12)

But Σ_{i} [(*x*_{ij} − x̄_{j}) / (*s*_{j} √*n*)]² = 1, so all the variables are located on a sphere of radius 1, whose center is at the origin of the axes.

Moreover, expanding (12),

d²(*j*, *j*′) = 2(1 − *c*_{jj′}) (13)

Therefore the proximity between two variables is given by the correlation coefficient.
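Equation (13) can be confirmed numerically: for normalized variable points, the squared Euclidean distance equals 2(1 − *c*_{jj′}) (a quick check assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(5)
raw = rng.normal(size=(25, 3))        # n = 25 individuals, p = 3 variables
n = raw.shape[0]

# Normalized variable points: columns of (x_ij - xbar_j) / (s_j * sqrt(n)).
Xn = (raw - raw.mean(axis=0)) / (raw.std(axis=0) * np.sqrt(n))

# Each variable point lies on the unit sphere centred at the origin.
assert np.allclose(np.linalg.norm(Xn, axis=0), 1.0)

# Squared distance between variables j = 0 and j' = 1 equals 2(1 - c_jj').
d2 = np.sum((Xn[:, 0] - Xn[:, 1]) ** 2)
c = np.corrcoef(raw[:, 0], raw[:, 1])[0, 1]
assert np.isclose(d2, 2 * (1 - c))
```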

*c*_{jj′} = 1 ⇔ the points *j* and *j*′ are close

*c*_{jj′} = −1 ⇔ the points *j* and *j*′ are opposite

*Note that the analysis in ℝ^{n} is not performed relative to the centroid of the variable points.*

It is not necessary to diagonalize the (*n* × *n*)
matrix **XX**^{T}, since the eigenvalues λ_{α} are those of **X**^{T}**X** and the eigenvector

**v**_{α} = **Xu**_{α} / √λ_{α} (14)

is the unit eigenvector of **XX**^{T} relative
to the eigenvalue λ_{α}.

*Coordinates of the points of N(I)*

F_{α}(*i*) = coordinate of the individual point *i* of
cloud *N*(*I*) on the α-axis. The α coordinates of the points *i* of *I* are the components of the vector **Xu**_{α}.

In the case of normalized data, F_{α}(*i*) = the correlation coefficient between the α-axis and the point *i*.

*Coordinates of the points of N(J)*

G_{α}(*j*) = coordinate of the variable point *j* of cloud *N*(*J*) on the α-axis. The α coordinates of the points *j* of *J*
are the components of the vector **X**^{T}**v**_{α}.

In the case of normalized data, G_{α}(*j*) = the correlation coefficient between
the α-axis and the variable *j*.

*Relation between N(I) and N(J)*

F_{α}(*i*) = (1/√λ_{α}) Σ_{j} *x*_{ij} G_{α}(*j*) (15)

G_{α}(*j*) = (1/√λ_{α}) Σ_{i} *x*_{ij} F_{α}(*i*) (16)

Total variance of *I* = Σ_{α} λ_{α} = Total variance of *J*
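The transition formulas relating the two clouds, and the equality of the total variances, can be verified numerically on a normalized matrix (a sketch assuming NumPy; F, G, and V are built exactly as defined above):

```python
import numpy as np

rng = np.random.default_rng(6)
raw = rng.normal(size=(15, 4))
n, p = raw.shape

# Normalized data matrix with general term (x_ij - xbar_j) / (s_j * sqrt(n)).
X = (raw - raw.mean(axis=0)) / (raw.std(axis=0) * np.sqrt(n))

lam, U = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

F = X @ U                      # F_alpha(i): coordinates of individuals
V = X @ U / np.sqrt(lam)       # v_alpha = X u_alpha / sqrt(lambda_alpha)
G = X.T @ V                    # G_alpha(j): coordinates of variables

# Transition formulas between N(I) and N(J).
assert np.allclose(F, X @ G / np.sqrt(lam))
assert np.allclose(G, X.T @ F / np.sqrt(lam))

# Total variance of I = sum of eigenvalues = total variance of J
# (for normalized data this equals the number of variables p).
assert np.isclose(lam.sum(), p)
assert np.isclose(np.sum(F ** 2), lam.sum())
```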

It is important to note that the interpretation of the centering transformation
*x*_{ij} → *x*_{ij} − x̄_{j} differs greatly in ℝ^{p} and ℝ^{n}:

- In ℝ^{p}, the transformation is equivalent to translating the origin of the axes to the centroid of the points.
- In ℝ^{n}, the transformation is a projection parallel to the first bisector of the axes.

The general term of the (*n* × *n*) matrix **P** that
is associated with this transformation is:

*p*_{ii′} = δ_{ii′} − 1/*n* (17)

where

δ_{ii′} = 1 if *i* = *i*′

δ_{ii′} = 0 if *i* ≠ *i*′
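The matrix **P** of equation (17) is the centering operator; a short check shows it is idempotent (a projection) and that **PX** has zero column means (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
X = rng.normal(size=(n, 3))

# General term p_ii' = delta_ii' - 1/n, eq. (17).
P = np.eye(n) - np.ones((n, n)) / n

# P is a projection: applying it twice changes nothing.
assert np.allclose(P @ P, P)

# PX translates the origin to the centroid: every column mean becomes 0.
assert np.allclose((P @ X).mean(axis=0), 0.0)
assert np.allclose(P @ X, X - X.mean(axis=0))
```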