6.5.1 Basic Concepts and Definitions

Correspondence analysis rests on a few fundamental concepts, which are described below.

Primitive matrix

The original data matrix N(I, J), or contingency table, is called the primitive matrix or primitive table. The elements of this matrix are the cell frequencies $n_{ij}$.

Profiles

While interpreting a cross-tabulation, it makes little sense to compare the actual frequencies in each cell. Each row and each column has a different number of respondents, called the base of respondents. For comparison it is essential to reduce either the rows or columns to the same base.

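Consider a contingency table N(I, J) with I rows (i = 1, 2, …, I) and J columns (j = 1, 2, …, J) having frequencies $n_{ij}$. Marginal frequencies are denoted by $n_{i+}$ (row totals) and $n_{+j}$ (column totals):

$$n_{i+} = \sum_{j=1}^{J} n_{ij}, \qquad n_{+j} = \sum_{i=1}^{I} n_{ij}$$

Total frequency (also written $n_{++}$) is given by

$$n = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}$$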

Row profiles

The profile of each row i is a vector of conditional densities:

$$r_{ij} = \frac{n_{ij}}{n_{i+}}, \qquad j = 1, 2, \ldots, J$$

The complete set of row profiles may be denoted by the I × J matrix R.

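Matrix of Row Profiles

| Rows | Column 1 | Column 2 | … | Column J | Total |
|---|---|---|---|---|---|
| 1 | $n_{11}/n_{1+}$ | $n_{12}/n_{1+}$ | … | $n_{1J}/n_{1+}$ | 1 |
| 2 | $n_{21}/n_{2+}$ | $n_{22}/n_{2+}$ | … | $n_{2J}/n_{2+}$ | 1 |
| 3 | $n_{31}/n_{3+}$ | $n_{32}/n_{3+}$ | … | $n_{3J}/n_{3+}$ | 1 |
| … | … | … | … | … | … |
| I | $n_{I1}/n_{I+}$ | $n_{I2}/n_{I+}$ | … | $n_{IJ}/n_{I+}$ | 1 |
| Column mass | $n_{+1}/n$ | $n_{+2}/n$ | … | $n_{+J}/n$ | 1 |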

Column profiles

The profile of each column j is a vector of conditional densities:

$$c_{ij} = \frac{n_{ij}}{n_{+j}}, \qquad i = 1, 2, \ldots, I$$

The complete set of column profiles may be denoted by the I × J matrix C.

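Matrix of Column Profiles

| Rows | Column 1 | Column 2 | … | Column J | Row mass |
|---|---|---|---|---|---|
| 1 | $n_{11}/n_{+1}$ | $n_{12}/n_{+2}$ | … | $n_{1J}/n_{+J}$ | $n_{1+}/n$ |
| 2 | $n_{21}/n_{+1}$ | $n_{22}/n_{+2}$ | … | $n_{2J}/n_{+J}$ | $n_{2+}/n$ |
| 3 | $n_{31}/n_{+1}$ | $n_{32}/n_{+2}$ | … | $n_{3J}/n_{+J}$ | $n_{3+}/n$ |
| … | … | … | … | … | … |
| I | $n_{I1}/n_{+1}$ | $n_{I2}/n_{+2}$ | … | $n_{IJ}/n_{+J}$ | $n_{I+}/n$ |
| Total | 1 | 1 | … | 1 | 1 |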

Average row profile = $n_{+j}/n$ (j = 1, 2, …, J)

Average column profile = $n_{i+}/n$ (i = 1, 2, …, I)
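The definitions above can be illustrated with a minimal Python sketch. The 2 × 3 table `N` is a hypothetical example invented for illustration (it does not come from the text), and the variable names are assumptions chosen to mirror the notation $n_{i+}$, $n_{+j}$, R and C.

```python
# Hypothetical 2 x 3 contingency table (illustrative counts, not from the text).
N = [
    [10, 20, 10],
    [30, 20, 10],
]

I, J = len(N), len(N[0])
row_totals = [sum(row) for row in N]                             # n_{i+}
col_totals = [sum(N[i][j] for i in range(I)) for j in range(J)]  # n_{+j}
n = sum(row_totals)                                              # grand total

# Row profiles: each row is divided by its own total, so every row of R sums to 1.
R = [[N[i][j] / row_totals[i] for j in range(J)] for i in range(I)]

# Column profiles: each column is divided by its own total,
# so every column of C sums to 1.
C = [[N[i][j] / col_totals[j] for j in range(J)] for i in range(I)]

# Average profiles are the relative marginal frequencies.
avg_row_profile = [col_totals[j] / n for j in range(J)]  # n_{+j} / n
avg_col_profile = [row_totals[i] / n for i in range(I)]  # n_{i+} / n

print("Row profiles:", R)
print("Column profiles:", C)
print("Average row profile:", avg_row_profile)
print("Average column profile:", avg_col_profile)
```

In this sketch the rows have different bases (40 and 60 respondents), which is why the profiles, rather than the raw counts, are compared.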

Masses

Another fundamental concept in correspondence analysis is the concept of mass. The mass of the ith row = marginal frequency of the ith row / grand total

= $n_{i+}/n$

Similarly, the mass of the jth column = marginal frequency of the jth column / grand total

= $n_{+j}/n$

Correspondence matrix

The correspondence matrix P is defined as the original table N divided by the grand total n, P = (1/n) N. Thus, each cell of the correspondence matrix is given by the cell frequency divided by the grand total.

The correspondence matrix shows how one unit of mass is distributed across the cells. The row and column totals of the correspondence matrix are the row mass and column mass, respectively.
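As a small check of these statements, the following Python sketch builds P from a hypothetical table (the counts are invented for illustration) and confirms that its margins are the row and column masses. The variable names are assumptions, not notation from the text.

```python
# Hypothetical 2 x 3 contingency table (illustrative counts, not from the text).
N = [
    [10, 20, 10],
    [30, 20, 10],
]

n = sum(sum(row) for row in N)  # grand total

# Correspondence matrix: every cell frequency divided by the grand total.
P = [[cell / n for cell in row] for row in N]

# Row and column totals of P are the row and column masses.
row_mass = [sum(row) for row in P]
col_mass = [sum(P[i][j] for i in range(len(P))) for j in range(len(P[0]))]

# All cells of P together carry exactly one unit of mass (up to rounding).
total_mass = sum(row_mass)

print("P =", P)
print("Row masses:", row_mass)
print("Column masses:", col_mass)
print("Total mass:", total_mass)
```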

Clouds of Points N(I) and N(J)

The cloud of points N(I) is the set of points i ∈ I whose coordinates are the components of the row profile $n_{ij}/n_{i+}$ (j = 1, 2, …, J) and whose mass is $n_{i+}/n_{++}$, where $n_{++}$ denotes the grand total n.

The cloud of points N(J) is the set of points j ∈ J whose coordinates are the components of the column profile $n_{ij}/n_{+j}$ (i = 1, 2, …, I) and whose mass is $n_{+j}/n_{++}$.

Distances

A variant of Euclidean distance, called the weighted Euclidean distance, is used to measure and thereby depict the distances between profile points. Here, the weighting refers to differential weighting of the dimensions of the space and not to the weighting of the profiles.

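The squared distance between two rows i and i′ is given by

$$d^2(i, i') = \sum_{j=1}^{J} \frac{1}{n_{+j}/n} \left( \frac{n_{ij}}{n_{i+}} - \frac{n_{i'j}}{n_{i'+}} \right)^2$$

In a symmetric fashion, the squared distance between two columns j and j′ is given by

$$d^2(j, j') = \sum_{i=1}^{I} \frac{1}{n_{i+}/n} \left( \frac{n_{ij}}{n_{+j}} - \frac{n_{ij'}}{n_{+j'}} \right)^2$$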

The distance thus obtained is called the Chi-square distance. The Chi-square distance differs from the usual Euclidean distance in that each squared difference is weighted by the inverse of the relative marginal frequency (the corresponding element of the average profile) for that term.

The division of each squared term by the expected frequency is "variance-standardizing" and compensates for the larger variance in high frequencies and the smaller variance in low frequencies. If no such standardization were performed, the differences between larger proportions would tend to be large and thus dominate the distance calculation, while the differences between the smaller proportions would tend to be swamped. The weighting factors are used to equalize these differences.
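To make the weighting concrete, here is a minimal Python sketch of the Chi-square distance between two row profiles, compared with the unweighted Euclidean distance. The table and the function names are hypothetical, introduced only for illustration, and the sketch assumes that no row or column total is zero.

```python
import math

# Hypothetical 2 x 3 contingency table (illustrative counts, not from the text).
N = [
    [10, 20, 10],
    [30, 20, 10],
]


def row_profile(table, i):
    """Row i divided by its own total (assumes a non-zero row total)."""
    total = sum(table[i])
    return [cell / total for cell in table[i]]


def column_masses(table):
    """Relative column totals n_{+j} / n, i.e. the average row profile."""
    n = sum(sum(row) for row in table)
    return [sum(row[j] for row in table) / n for j in range(len(table[0]))]


def chi_square_row_distance(table, i, k):
    """Chi-square distance between the profiles of rows i and k.

    Each squared difference is divided by the column mass, so columns with
    small average proportions are not swamped by columns with large ones.
    """
    ri, rk = row_profile(table, i), row_profile(table, k)
    masses = column_masses(table)
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(ri, rk, masses)))


def euclidean_row_distance(table, i, k):
    """Ordinary (unweighted) Euclidean distance, shown for comparison only."""
    ri, rk = row_profile(table, i), row_profile(table, k)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ri, rk)))


print("Chi-square distance:", chi_square_row_distance(N, 0, 1))
print("Euclidean distance: ", euclidean_row_distance(N, 0, 1))
```

The distance between columns follows the same pattern with the roles of rows and columns exchanged.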

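Essentially, the reason for choosing the Chi-square distance is that it satisfies the principle of distributional equivalence, expressed as follows: if two rows with identical profiles are merged into a single row, the distances between the columns remain unchanged; likewise, if two columns with identical profiles are merged, the distances between the rows remain unchanged.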
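The principle of distributional equivalence can be checked numerically in its usual form: merging two rows that have identical profiles should leave the Chi-square distances between columns unchanged. The Python sketch below does this for a hypothetical table in which the first two rows are proportional; the tables and the function name are assumptions made for illustration.

```python
import math


def chi_square_column_distance(table, j, k):
    """Chi-square distance between the profiles of columns j and k.

    Each squared difference is weighted by the inverse of the row mass.
    Assumes no row or column total is zero.
    """
    n = sum(sum(row) for row in table)
    total_j = sum(row[j] for row in table)
    total_k = sum(row[k] for row in table)
    return math.sqrt(sum(
        (row[j] / total_j - row[k] / total_k) ** 2 / (sum(row) / n)
        for row in table
    ))


# Hypothetical table: rows 0 and 1 are proportional, so their profiles coincide.
original = [
    [10, 20, 10],
    [20, 40, 20],
    [30, 20, 50],
]

# The same table after adding rows 0 and 1 together.
merged = [
    [30, 60, 30],
    [30, 20, 50],
]

pairs = [(0, 1), (0, 2), (1, 2)]
before = [chi_square_column_distance(original, j, k) for j, k in pairs]
after = [chi_square_column_distance(merged, j, k) for j, k in pairs]

for (j, k), b, a in zip(pairs, before, after):
    print(f"columns {j} and {k}: before merge {b:.6f}, after merge {a:.6f}")
```

The two columns of printed distances agree up to floating-point rounding, which is what the principle predicts for this example.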

Inertia

Inertia is a term borrowed from the "moment of inertia" in mechanics. A physical object has a center of gravity (or centroid). Every particle of the object has a certain mass m and a certain distance d from the centroid. The moment of inertia of the object is the quantity $md^2$ summed over all the particles that constitute the object.

$$\text{Moment of inertia} = \sum m d^2$$

This concept has an analogy in correspondence analysis. There is a cloud of profile points with masses adding up to 1. These points have a centroid (i.e., the average profile) and a distance (the Chi-square distance) from each profile point to that centroid. Each profile point contributes to the inertia of the whole cloud. The inertia of a profile point can be computed by the following formula.

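For the ith row profile,

$$\text{Inertia}_i = \frac{n_{i+}}{n} \sum_{j=1}^{J} \frac{(r_{ij} - \bar{r}_j)^2}{\bar{r}_j}$$

where $r_{ij}$ is the ratio $n_{ij}/n_{i+}$ and $\bar{r}_j$ is $n_{+j}/n$.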

The inertia of the jth column profile is computed similarly.

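The total inertia of the contingency table is given by:

$$\text{Total inertia} = \sum_{i=1}^{I} \frac{n_{i+}}{n} \sum_{j=1}^{J} \frac{\left( n_{ij}/n_{i+} - n_{+j}/n \right)^2}{n_{+j}/n}$$

which is the Chi-square statistic divided by n.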
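The relationship between total inertia and the Chi-square statistic can be verified with a short Python sketch. It assumes that the inertia of a row is its mass times the squared Chi-square distance from its profile to the average row profile, and it uses a hypothetical table invented for illustration; the variable names are likewise assumptions.

```python
# Hypothetical 2 x 3 contingency table (illustrative counts, not from the text).
N = [
    [10, 20, 10],
    [30, 20, 10],
]

I, J = len(N), len(N[0])
row_totals = [sum(row) for row in N]                             # n_{i+}
col_totals = [sum(N[i][j] for i in range(I)) for j in range(J)]  # n_{+j}
n = sum(row_totals)                                              # grand total

# Inertia of each row: mass times squared chi-square distance to the centroid.
row_inertia = []
for i in range(I):
    mass = row_totals[i] / n
    squared_distance = sum(
        (N[i][j] / row_totals[i] - col_totals[j] / n) ** 2 / (col_totals[j] / n)
        for j in range(J)
    )
    row_inertia.append(mass * squared_distance)

total_inertia = sum(row_inertia)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected,
# with expected frequency n_{i+} * n_{+j} / n under independence.
chi_square = sum(
    (N[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(I)
    for j in range(J)
)

print("Row inertias:  ", row_inertia)
print("Total inertia: ", total_inertia)   # approximately 0.0625 for this table
print("Chi-square / n:", chi_square / n)  # agrees with the total inertia
```

For this table the Chi-square statistic is 6.25 and n is 100, so both quantities come to about 0.0625.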