We can distinguish between two types of variables according to the level of measurement:
A quantitative variable is one in which the variates differ in magnitude, e.g. income, age or GNP. A qualitative variable is one in which the variates differ in kind rather than in magnitude, e.g. marital status, gender or nationality.
Continuous variables can be classified into three categories:
Interval scale data has order and equal intervals. Interval scale variables are measured on a linear scale, and can take on positive or negative values. It is assumed that the intervals keep the same importance throughout the scale. They allow us not only to rank order the items that are measured but also to quantify and compare the magnitudes of differences between them. For example, a temperature of 40°C is higher than a temperature of 30°C, and the increase from 20°C to 40°C is twice the increase from 30°C to 40°C. Counts, such as counts of publications or citations or years of education, are interval scale measurements.
Continuous ordinal variables occur when the measurements are continuous, but one is not certain whether they are on a linear scale, the only trustworthy information being the rank order of the observations. For example, if a scale is transformed by an exponential, logarithmic or any other nonlinear monotonic transformation, it loses its interval-scale property. Here, it would be expedient to replace the observations by their ranks.
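The suggestion to replace observations by their ranks can be illustrated with a minimal Python sketch. The data values and the helper name `ranks` are hypothetical, and averaging the ranks of tied values is one common convention that the text does not prescribe. The point of the sketch is that a nonlinear monotonic transformation (here a logarithm) leaves the ranks unchanged.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

observations = [2.0, 8.5, 3.1, 8.5, 0.4]            # hypothetical measurements
transformed = [math.log(x) for x in observations]   # nonlinear but monotonic

print(ranks(observations))   # [2.0, 4.5, 3.0, 4.5, 1.0]
print(ranks(transformed))    # same ranks: order survives the transformation
```

Any analysis based only on these ranks gives the same result whether it starts from the original or the transformed scale.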
Ratio-scale variables are continuous positive measurements on a nonlinear scale. A typical example is the growth of a bacterial population (say, with a growth function Ae^{Bt}). In this model, equal time intervals multiply the population by the same ratio (hence the name ratio scale).
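The property that equal time intervals multiply the population by the same ratio can be checked numerically. In the sketch below the constants `A` and `B`, the step length and the time points are arbitrary illustrative values, not taken from the text.

```python
import math

A, B = 100.0, 0.3   # hypothetical initial population and growth rate

def population(t):
    """Growth function A * e^(B * t)."""
    return A * math.exp(B * t)

step = 2.0
starts = (0.0, 2.0, 4.0)

# Ratio of the population after one step to the population before it.
ratios = [population(t + step) / population(t) for t in starts]
# Absolute increase over the same steps.
differences = [population(t + step) - population(t) for t in starts]

print(ratios)        # every ratio equals e^(B * step), about 1.82
print(differences)   # the absolute increases keep growing
```

The ratios are constant while the differences are not, which is why such data do not behave like measurements on a linear scale.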
Ratio data are also interval data, but they are not measured on a linear scale. With interval data, one can perform logical operations, add, and subtract, but one cannot multiply or divide. For instance, if a liquid is at 40 degrees and we add 10 degrees, it will be 50 degrees. However, a liquid at 40 degrees does not have twice the temperature of a liquid at 20 degrees, because 0 degrees does not represent "no temperature". To multiply or divide in this way we would have to use the Kelvin temperature scale, which has a true zero point (0 kelvin = -273.15 degrees Celsius). In the social sciences, the issue of a "true zero" rarely arises, but one should be aware of the statistical issues involved.
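The temperature example can be worked through in a few lines of Python. This is only an illustrative sketch, assuming the degrees in the example are degrees Celsius; the function name `to_kelvin` is introduced here for the illustration.

```python
def to_kelvin(celsius):
    """Convert degrees Celsius to kelvin, a scale with a true zero."""
    return celsius + 273.15

# Differences are meaningful on the interval scale: they are the same
# (up to floating-point rounding) whether computed in Celsius or kelvin.
celsius_difference = 50.0 - 40.0
kelvin_difference = to_kelvin(50.0) - to_kelvin(40.0)

# Ratios are not: the naive Celsius ratio suggests "twice the temperature",
# while the ratio on the Kelvin scale is only about 1.07.
naive_ratio = 40.0 / 20.0
true_ratio = to_kelvin(40.0) / to_kelvin(20.0)

print(celsius_difference, round(kelvin_difference, 6))
print(naive_ratio, round(true_ratio, 3))
```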
There are three different ways to handle the ratio-scaled variables.
Discrete variables are also called categorical variables. A discrete variable, X, can take on a finite number of numerical values, categories or codes. Discrete variables can be classified into the following categories:
Nominal variables allow for only qualitative classification. That is, they can be measured only in terms of whether the individual items belong to certain distinct categories, but we cannot quantify or even rank order the categories. Nominal data has no order, and the assignment of numbers to categories is purely arbitrary. Because of the lack of order and of equal intervals, one cannot perform arithmetic (+, -, /, *) or order comparisons (>, <) on nominal data; one can only test whether two items belong to the same category. Typical examples of such variables are:
Gender:
1. Male
Marital Status:
1. Unmarried
A discrete ordinal variable is a nominal variable whose different states are ordered in a meaningful sequence. Ordinal data has order, but the intervals between scale points may be uneven. Because of the lack of equal distances, arithmetic operations are impossible, but logical operations can be performed on ordinal data. A typical example of an ordinal variable is the socio-economic status of families. We know 'upper middle' is higher than 'middle' but we cannot say 'how much higher'. Ordinal variables are quite useful for subjective assessment of quality, importance or relevance. Ordinal scale data are very frequently used in social and behavioral research. Almost all opinion surveys today request answers on three-, five-, or seven-point scales. Such data are not appropriate for analysis by classical techniques, because the numbers are comparable only in terms of relative magnitude, not actual magnitude.
Consider for example a questionnaire item on the time involvement of scientists in the 'perception and identification of research problems'. The respondents were asked to indicate their involvement by selecting one of the following codes:
1 = Very low or nil
2 = Low
3 = Medium
4 = Great
5 = Very great
Here, the variable 'Time Involvement' is an ordinal variable with 5 states.
Ordinal variables often cause confusion in data analysis. Some statisticians treat them as nominal variables. Others treat them as interval scale variables, assuming that the underlying scale is continuous but could not be measured on an interval scale for lack of a sufficiently sophisticated instrument.
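The competing treatments can be compared on the 'Time Involvement' item. The sketch below uses the five codes from the questionnaire, but the list of responses is invented for illustration. It computes a frequency table (nominal treatment), a median that relies only on order (ordinal treatment) and a mean (interval treatment, which assumes equal distances between the codes).

```python
from collections import Counter
from statistics import mean, median_low

LABELS = {1: "Very low or nil", 2: "Low", 3: "Medium", 4: "Great", 5: "Very great"}

responses = [3, 4, 2, 5, 3, 3, 1, 4, 4, 3]   # hypothetical answers

# Nominal treatment: only category membership is used.
frequencies = Counter(responses)

# Ordinal treatment: median_low returns an observed code, so no
# arithmetic on the codes is needed.
middle = median_low(responses)

# Interval treatment: the mean is meaningful only if the distances
# between adjacent codes are assumed to be equal.
average = mean(responses)

for code in sorted(LABELS):
    print(code, LABELS[code], frequencies[code])
print("median code:", middle, "=", LABELS[middle])
print("mean code:", average)
```

The frequency table and the median stay valid under any order-preserving recoding of the five states, whereas the mean changes with the numbers chosen as codes.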
3. Dummy Variables from Quantitative Variables
A quantitative variable can be transformed into a categorical variable, called a dummy variable, by recoding its values. Consider the following example: the quantitative variable Age can be classified into five intervals. The values of the associated categorical (dummy) variable are 1, 2, 3, 4 and 5:
Age interval | Code
[Up to 25] | 1
[25, 40] | 2
[40, 50] | 3
[50, 60] | 4
[Above 60] | 5
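The recoding of Age can be sketched in Python as follows. The intervals as printed share their end points (25, 40, 50 and 60 each appear in two classes), so the sketch assumes that each boundary value belongs to the lower class; that choice, the function name `age_class` and the sample ages are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper bounds of classes 1 to 4; anything above the last bound is class 5.
# Assumption: a boundary age (25, 40, 50, 60) falls in the lower class.
UPPER_BOUNDS = [25, 40, 50, 60]

def age_class(age):
    """Return the class code (1 to 5) for a given age."""
    return bisect_left(UPPER_BOUNDS, age) + 1

ages = [19, 25, 26, 40, 47, 60, 61, 73]   # hypothetical respondents
codes = [age_class(a) for a in ages]

print(codes)   # [1, 1, 2, 2, 3, 4, 5, 5]
```

Using `bisect_right` instead would assign each boundary age to the upper class; whichever rule is chosen should be applied consistently and documented with the data.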
Preference variables are specific discrete variables, whose values are either in a decreasing or increasing order. For example, in a survey, a respondent may be asked to indicate the importance of the following nine sources of information in his research and development work, by using the code [1] for the most important source and [9] for the least important source:
Note that preference data are also ordinal. The interval distance from the first preference to the second preference is not necessarily the same as, for example, the distance from the sixth to the seventh preference.
Multiple response variables are those that can assume more than one value. A typical example is a survey questionnaire about the use of computers in research. The respondents were asked to indicate the purpose(s) for which they use computers in their research work, and could select more than one category.
Since IDAMS does not handle multiple response variables, dummy variables have to be created for each category prior to analysis.
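Creating one dummy variable per category can be sketched in plain Python, independently of any particular package. The category names and the respondents' answers below are hypothetical; each multiple response is turned into a row of 0/1 indicators, one per category.

```python
# Hypothetical purposes for which computers are used in research.
PURPOSES = ["data analysis", "word processing", "literature search"]

# Each respondent may select any number of purposes, including none.
respondents = [
    {"data analysis", "word processing"},
    {"literature search"},
    set(),
    {"data analysis", "word processing", "literature search"},
]

def to_dummies(selected, categories):
    """Return one 0/1 indicator per category, in the order of `categories`."""
    return [1 if category in selected else 0 for category in categories]

dummy_rows = [to_dummies(selected, PURPOSES) for selected in respondents]

# Column totals give the number of respondents mentioning each purpose.
totals = [sum(column) for column in zip(*dummy_rows)]

for row in dummy_rows:
    print(row)
print(totals)   # [2, 2, 2]
```

Each resulting column is an ordinary single-valued variable and can be analysed like any other dichotomous variable.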