1.2 Characteristics of the Data

(i.)   Measurement Level: Whether the variables are measured on a metric or non-metric scale
(ii.)  Number of Variables


Data is a set of organized information: a quantification or measurement of the real world by means of a set of variables. Statistical data is a set of observations for which the values of the variables are recorded. Variables are the things that one measures, controls or manipulates in a research problem. They differ in their role in the research (independent or dependent variables) and in how well they can be measured. The amount of information a variable provides depends on the type of measurement scale used to measure it.

There are two types of data sets:

  1. Individuals - variables data sets
  2. Proximities data sets

An individuals - variables data set is an (n × p) matrix in which the rows represent the n individuals and the columns represent the p variables. The cell (i, j), at the intersection of row i and column j, holds the value of variable j for individual i.
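
The layout described above can be sketched in a few lines of Python. This is only an illustration: the individuals, the variable names (height_cm, weight_kg) and the values are hypothetical, and the helper cell() is introduced here purely for the example. Note that Python indices start at 0, whereas the text counts rows and columns from 1.

```python
# Hypothetical individuals - variables data set:
# 3 individuals (rows) measured on 2 variables (columns).
individuals = ["A", "B", "C"]
variables = ["height_cm", "weight_kg"]

X = [
    [170.0, 65.0],  # individual A
    [182.0, 80.0],  # individual B
    [158.0, 52.0],  # individual C
]

n = len(X)     # number of individuals (rows)
p = len(X[0])  # number of variables (columns)


def cell(i, j):
    """Return the value of variable j for individual i (0-based indices)."""
    return X[i][j]


print((n, p))      # shape of the matrix
print(cell(1, 0))  # height_cm of individual B
```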

A proximities data set is a (p × p) or (n × n) matrix in which the cell values (directly measured or calculated) correspond to the distances between pairs of variables or pairs of individuals, respectively. Thus, there are two types of proximities data sets: Variable - Variable Proximities, of size (p × p), and Individual - Individual Proximities, of size (n × n).
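
As a rough sketch of the "calculated" case, both kinds of proximities matrix can be derived from a small individuals - variables matrix. The choice of Euclidean distance is an assumption made for this example (the text does not prescribe a particular measure), and the data and the function names euclidean() and proximity_matrix() are hypothetical.

```python
import math

# Hypothetical individuals - variables matrix: n = 3 individuals, p = 2 variables.
X = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 2.0],
]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def proximity_matrix(vectors):
    """Square matrix of pairwise distances between the given vectors."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Individual - Individual proximities: compare rows, giving an (n x n) matrix.
D_individuals = proximity_matrix(X)

# Variable - Variable proximities: compare columns, giving a (p x p) matrix.
columns = [list(col) for col in zip(*X)]
D_variables = proximity_matrix(columns)

print(len(D_individuals), len(D_individuals[0]))  # n, n
print(len(D_variables), len(D_variables[0]))      # p, p
```

Both matrices are symmetric with zeros on the diagonal, since each object is at distance zero from itself.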