1.2 Characteristics of the Data

(i.)   Measurement Level: Whether the variables are measured on a metric or non-metric scale
(ii.)  Number of Variables

Data

Data is a set of organized information; it is a quantification or measurement of the real world by a set of variables. Statistical data is a set of observations for which the values of the variables are recorded. Variables are the things that one measures, controls or manipulates in a research problem. They differ in their role in the research (independent or dependent) and in how well they can be measured. The amount of information a variable provides depends on the type of measurement scale used to measure it.

There are two types of data sets:

  1. Individuals - variables data sets
  2. Proximities data sets

An individuals - variables data set is an (n × p) matrix in which the rows represent the n individuals and the columns represent the p variables. The cell (i, j), at the intersection of row i and column j, holds the value of variable j for individual i.
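As a minimal sketch of this layout, the matrix can be held as a list of rows, one row per individual. The variable names and values below are made up for illustration and do not come from any real data set.

```python
# Hypothetical data: n = 4 individuals measured on p = 3 variables.
variables = ["height_cm", "weight_kg", "age_years"]  # illustrative names only
X = [
    [170.0, 65.0, 30.0],  # individual 1
    [182.0, 80.0, 45.0],  # individual 2
    [158.0, 52.0, 23.0],  # individual 3
    [175.0, 72.0, 38.0],  # individual 4
]

n = len(X)      # number of individuals (rows)
p = len(X[0])   # number of variables (columns)


def cell(i, j):
    """Return the value of variable j for individual i (1-based, as in the text)."""
    return X[i - 1][j - 1]


print(n, p)        # 4 3
print(cell(2, 3))  # 45.0, i.e. age_years for individual 2
```

The helper cell uses 1-based indices only to match the (i, j) notation of the text; Python lists themselves are 0-based.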

A proximities data set is a (p × p) or (n × n) matrix in which each cell value, either directly measured or calculated, is the distance between a pair of variables or a pair of individuals. Thus, there are two types of proximities data sets: Variable - Variable Proximities (p × p) and Individual - Individual Proximities (n × n).
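The two kinds of proximities data set can be sketched by calculating distances from a small (n × p) matrix. The text does not fix a distance measure, so Euclidean distance is assumed here purely for illustration; the data values and the helper names euclidean and proximity_matrix are likewise hypothetical.

```python
import math

# Hypothetical (n x p) data: n = 3 individuals, p = 2 variables.
X = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 2.0],
]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def proximity_matrix(vectors):
    """Square matrix of pairwise distances between the given vectors."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Individual - individual proximities: compare rows, giving an (n x n) matrix.
D_individuals = proximity_matrix(X)

# Variable - variable proximities: compare columns, giving a (p x p) matrix.
columns = [list(col) for col in zip(*X)]
D_variables = proximity_matrix(columns)

print(len(D_individuals), len(D_individuals[0]))  # 3 3
print(len(D_variables), len(D_variables[0]))      # 2 2
print(D_individuals[0][1])                        # 5.0
```

Both matrices are symmetric with zeros on the diagonal, since the distance from an item to itself is zero. In practice the variables would often be standardized before distances are computed, which this sketch omits.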