Correlation is a measure of the strength of relationship between random variables.
The (population) correlation between two variables *X* and *Y* is
defined as:

$$\rho(X, Y) = \frac{\mathrm{Covariance}(X, Y)}{\sqrt{\mathrm{Variance}(X)\,\mathrm{Variance}(Y)}}$$

where

$$\mathrm{Covariance}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$$

and μ_{X} and μ_{Y} are the expected values of *X* and *Y* respectively.

ρ is called the *Product Moment Correlation Coefficient* or simply the *Correlation Coefficient*. If *X* and *Y* tend to increase together, ρ is positive. If, on the other hand, one tends to increase as the other tends to decrease, ρ is negative. The value of the correlation coefficient lies between -1 and +1, inclusive.
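As a small illustration of the population definition, the sketch below computes the correlation for a discrete joint distribution by taking expectations directly. The function name `population_correlation` and the joint probabilities are hypothetical, chosen only for illustration.

```python
import math

def population_correlation(pmf):
    """Correlation of (X, Y) for a discrete joint distribution.

    pmf maps (x, y) pairs to probabilities that sum to 1.
    """
    # Expected values of X and Y.
    mu_x = sum(p * x for (x, y), p in pmf.items())
    mu_y = sum(p * y for (x, y), p in pmf.items())
    # Covariance and variances as probability-weighted averages.
    cov = sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in pmf.items())
    var_x = sum(p * (x - mu_x) ** 2 for (x, y), p in pmf.items())
    var_y = sum(p * (y - mu_y) ** 2 for (x, y), p in pmf.items())
    return cov / math.sqrt(var_x * var_y)

# Hypothetical joint distribution: X and Y agree with probability 0.8.
joint_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
rho = population_correlation(joint_pmf)
print(round(rho, 6))  # 0.6
```

Here the covariance is 0.15 and both variances are 0.25, so the correlation is 0.15 / 0.25 = 0.6.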

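The sample correlation of a set of *N* bivariate observations (*X*_{1}, *Y*_{1}), (*X*_{2}, *Y*_{2}), . . ., (*X*_{N}, *Y*_{N}) is defined as:

$$r = \frac{1}{N-1} \sum_{i=1}^{N} \frac{(X_i - \bar{X})(Y_i - \bar{Y})}{S_X S_Y}$$

where

*X̄* is the mean value of *X*,

*Ȳ* is the mean value of *Y*,

*S*_{X} is the standard deviation of *X*, and

*S*_{Y} is the standard deviation of *Y*.

Here the standard deviations are computed with the divisor *N* − 1; if the divisor *N* is used instead, the factor 1/(*N* − 1) becomes 1/*N* and the value of *r* is unchanged.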
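The sample correlation can be computed directly from the means and standard deviations. The following is a minimal sketch using only the Python standard library; the function name `sample_correlation` and the data are hypothetical, and the standard deviations are assumed to use the divisor *N* − 1 (which is what `statistics.stdev` computes).

```python
import statistics

def sample_correlation(xs, ys):
    """Sample correlation r from paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # divisor N - 1
    s_y = statistics.stdev(ys)  # divisor N - 1
    # Sum of cross-products of deviations from the means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = sample_correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

For these data the sum of cross-products is 6 and the sums of squared deviations are 10 and 6, so *r* = 6 / √60 ≈ 0.7746.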

The coefficient *r* satisfies the inequality -1 ≤ *r* ≤ +1. Equality is achieved only if all the points in the scatter plot of *X* and *Y* lie exactly on a straight line. Because *r* measures only the strength of linear association, it should be used as a summary only if the relationship between *X* and *Y* is linear.

*r* ≅ ±1 ⇒ Strong correlation between *X* and *Y*.

*r* ≅ 0 ⇒ It must not be concluded that there is no relationship between *X* and *Y*. The scatter plot should be examined. If the scatter plot is a parabolic curve, *r* would be approximately equal to zero.
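The parabolic case is easy to check numerically. The sketch below (the helper `pearson_r` and the data are hypothetical) compares a perfectly linear relationship with a perfectly parabolic one over values of *X* that are symmetric about zero.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation from sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # exact straight line
parabola = [x * x for x in xs]     # exact functional relationship, but not linear

r_linear = pearson_r(xs, linear)
r_parabola = pearson_r(xs, parabola)
print(r_linear, r_parabola)
```

In the parabolic case *Y* is completely determined by *X*, yet *r* is zero, because the positive and negative cross-products cancel.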

Numerically, *r* can be interpreted as the average product of the *X* and *Y* coordinates of the scatter plot of the standardized data. If points whose *X* and *Y* coordinates have the same sign (both positive or both negative) predominate in the scatter plot, *r* is positive; if points whose *X* and *Y* coordinates have opposite signs predominate, *r* is negative.
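The "average product" interpretation can be sketched as follows. The data and the helper `standardize` are hypothetical; the standardization is assumed to use the standard deviation with divisor *N* − 1, so the "average" also divides by *N* − 1.

```python
import statistics

def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # divisor N - 1
    return [(v - mean) / sd for v in values]

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

zx = standardize(xs)
zy = standardize(ys)
products = [a * b for a, b in zip(zx, zy)]

# r as the average product of standardized coordinates.
r = sum(products) / (len(xs) - 1)

# Points whose standardized coordinates share a sign push r upward;
# points with opposite signs pull it downward.
same_sign = sum(1 for p in products if p > 0)
opposite_sign = sum(1 for p in products if p < 0)
print(round(r, 4), same_sign, opposite_sign)  # 0.8286 4 2
```

Four of the six points have coordinates of the same sign, and their products outweigh those of the two opposite-sign points, so *r* is positive.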

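If it is assumed that (*X*_{1}, *Y*_{1}), (*X*_{2}, *Y*_{2}), . . ., (*X*_{N}, *Y*_{N}) is a random sample from a bivariate population, the sample correlation *r* serves as an estimate of the population correlation ρ.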

To make inferences about ρ using *r*, we require the sampling distribution of *r*, which is quite complex. When ρ = 0 and (*X*, *Y*) is bivariate normal, the statistic:

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}}$$

has a *t*-distribution with *N* − 2 degrees of freedom, and can be used to test the null hypothesis [ρ = 0].
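A minimal sketch of this test is given below, assuming the usual form of the statistic, *t* = *r*√(*N* − 2) / √(1 − *r*²). The function name, the values of *r* and *N*, and the 5% significance level are hypothetical; the critical value 2.060 is the two-sided 5% point of the *t*-distribution with 25 degrees of freedom as found in standard tables.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing zero population correlation."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("statistic is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical sample: r = 0.5 from N = 27 observations (25 degrees of freedom).
t = correlation_t_statistic(0.5, 27)

# Two-sided 5% critical value for 25 degrees of freedom, from a t table.
critical_value = 2.060
reject_null = abs(t) > critical_value
print(round(t, 4), reject_null)  # 2.8868 True
```

With these illustrative numbers the statistic exceeds the critical value, so the hypothesis of zero correlation would be rejected at the 5% level.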

If *X* and *Y* are not bivariate normal and ρ = 0, the statistic has approximately a standard normal distribution in large samples. Thus, the statistic can be used to test the null hypothesis [ρ = 0], even if the joint distribution is not bivariate normal.
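The large-sample version of the test can be sketched with the standard normal distribution from the Python standard library. This assumes the statistic is *t* = *r*√(*N* − 2) / √(1 − *r*²); the function name and the values of *r* and *N* are hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_test(r, n):
    """Return the statistic and an approximate two-sided p-value.

    The p-value refers the statistic to the standard normal
    distribution, which is only appropriate for large n.
    """
    stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value

# Hypothetical large sample: r = 0.2 from N = 102 observations.
stat, p_value = large_sample_test(0.2, 102)
print(round(stat, 3), round(p_value, 3))
```

Even a modest correlation such as 0.2 can give a small p-value when the sample is large, so statistical significance here should not be read as a strong relationship.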

If the assumption of bivariate normality is not satisfied by the data, it may be possible to make a preliminary transformation of the data to bivariate normality. However, it would be difficult to assess the effect of the transformation on subsequent procedures involving the correlation coefficient. An alternative procedure would be to compute a non-parametric correlation coefficient.
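The text does not name a particular non-parametric coefficient. One common choice is Spearman's rank correlation, which is the product moment correlation computed on the ranks of the data. The sketch below is an illustration under that assumption; all function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Product moment correlation of two equally long sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: a monotone but clearly non-linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
r_pearson = pearson_r(xs, ys)
r_spearman = spearman_rho(xs, ys)
print(round(r_pearson, 4), r_spearman)  # 0.9431 1.0
```

Because the rank-based coefficient depends only on the ordering of the observations, it is unchanged by any increasing transformation of *X* or *Y*.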