*Cluster Analysis*

A procedure for partitioning a set of objects into groups or clusters in
such a way that profiles of objects in the same cluster are very similar, whereas
the profiles of objects in different clusters are quite distinct. The number
and characteristics of clusters are not known *a priori* and are derived
from the data.

(*Module*: CLUSFIND)
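
To make the idea concrete, here is a minimal k-means sketch in Python. K-means is one common partitioning algorithm, used purely for illustration; it is not necessarily the method implemented by CLUSFIND, and the function and data names are invented for the example.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Crude k-means for 2-D points: assign each point to the nearest
    center, then move each center to the mean of its group."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            groups[nearest].append(p)
        # recompute each center; keep the old one if its group went empty
        centers = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                   if g else centers[j]
                   for j, g in enumerate(groups)]
    return groups

# two well-separated clouds of three points each
data = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters = kmeans(data, 2)
```

On such well-separated data the procedure recovers the two clouds regardless of the random initialization.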

*Correlation Analysis*

Correlation is a measure of the relationship between two or more variables.
The most commonly used type of correlation coefficient is *Pearson's* *r*,
also called linear or product moment correlation. It is essential that the
variables are measured on at least interval scales.

(*Module*: PEARSON)
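
Pearson's *r* can be computed directly from its definition, as the covariance of the two variables divided by the product of their standard deviations; a minimal illustrative sketch in Python (not the PEARSON module itself):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product moment correlation of two interval-scaled series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfect linear relation: r ≈ 1.0
```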

*Discriminant Analysis*

A technique for classifying objects into one of two or more alternative
groups (or populations) on the basis of a set of measurements (*i.e.* variables).
The populations are known to be distinct, and an object can belong to only one
of them. The technique can also be used to identify which variables contribute
to making the classification. Thus, it can be used for description as well as
for prediction.

(*Module*: DISCRAN)

*Principal Components Analysis*

*Principal components analysis* (*PCA*) is performed to simplify
the description of a set of interrelated variables in a data matrix. *PCA*
transforms the original variables into new uncorrelated variables, called
principal components. Each principal component is a linear combination of the
original variables. The amount of information conveyed by a principal component
is its variance. The principal components are derived in decreasing order of
variance. Thus, the most informative principal component is the first, and the
least informative is the last.
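
For the two-variable case the component variances can be illustrated directly: they are the eigenvalues of the 2x2 covariance matrix, and their sum equals the total variance of the original variables. A minimal sketch in Python (illustrative only, not the FACTOR module's algorithm):

```python
from math import sqrt

def pc_variances(xs, ys):
    """Variances of the two principal components of a pair of variables:
    the eigenvalues of the 2x2 covariance matrix, largest first."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    tr = sxx + syy                 # total variance, preserved by PCA
    det = sxx * syy - sxy * sxy
    root = sqrt(tr * tr / 4 - det)
    return tr / 2 + root, tr / 2 - root

# perfectly correlated variables: all variance lands on the first component
l1, l2 = pc_variances([1, 2, 3, 4], [2, 4, 6, 8])
```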

*Factor analysis* is similar to principal components analysis in that
it is a technique for examining the interrelationships among a set of
variables, but its objective is somewhat different. Classical factor analysis
is viewed as a technique for identifying the underlying dimensions or factors
that explain the pattern of correlations among a much larger set of observed
variables. The technique is applied to (i) reduce the number of variables, and
(ii) detect structure in the relationships among variables.

Modern factor analysis, implemented in IDAMS, aims to represent the information in a data matrix geometrically in a low-dimensional Euclidean space and to provide related statistics. The fundamental goal is to highlight relations among elements (variables/individuals), which are represented by points in graphical displays (called factorial maps), and to reveal the structural features of the data matrix. Both variables and individuals can be displayed in these maps. Since the number of individuals is often very large, they are represented by the centers of gravity of their categories.

*Correspondence analysis* (*CA*) is a multivariate
technique for exploring cross-tabular data by converting them into graphical
displays, called factorial maps, and related numerical statistics. *CA* is
primarily intended to reveal features in the data rather than to test
hypotheses about the underlying processes that generate the data.
Correspondence analysis and principal components analysis are, however, used
under different circumstances. *PCA* uses covariances or correlations
(Euclidean metrics) for data reduction and is therefore applicable to
continuous measurements. *CA*, on the other hand, uses chi-square metrics and
is therefore applicable to contingency tables (cross-tabulations). By
extension, correspondence analysis can also be applied to tables with binary
coding.

The module can handle active as well as passive variables. Active variables are those that participate in the determination of the factorial axes; passive variables do not participate in this determination, but they are projected onto the factorial axes.

(*Module*: FACTOR)

*Multidimensional Scaling* (*MDS*)

Multidimensional scaling is an exploratory data analysis technique that
transforms the proximities (or distances) between each pair of objects (or
variables) in a given data set into comparable Euclidean distances. *MDS*
produces a spatial representation of the objects (usually a two-dimensional map)
that maximizes the fit between the proximity of each pair of objects and the
Euclidean distance between them in the representation.
The greater the proximity between the objects, the closer they are situated in
the map. Like factor analysis, the main concern of *MDS* is to reveal the
structure of relationships among the objects.

(*Module*: MDSCAL)

*Multiple Classification Analysis* (*MCA*)

*MCA* is a technique for examining the inter-relationships between
several predictor variables and a dependent variable. The technique can handle predictors
with no better than nominal measurements and interrelationships of any kind
among predictors or between a predictor and the dependent variable. The
dependent variable may be interval-scaled or dichotomous.

(*Module*: MCA)

*Analysis of Variance*

This statistical technique assesses the effect of an independent or 'control' categorical variable (factor) upon a continuous dependent variable.

(*Module*: ONEWAY)
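
The one-way F statistic underlying this assessment compares the variation between group means with the variation within groups; a minimal illustrative sketch in Python (not the ONEWAY module itself, and the function name is invented for the example):

```python
def one_way_f(groups):
    """One-way ANOVA F statistic: between-group mean square
    divided by within-group mean square."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f = one_way_f([[1, 2, 3], [7, 8, 9]])  # strongly separated groups -> large F
```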

*POSCOR (Ranking program based on partially
ordered sets)*

*POSCOR* is a procedure for ranking objects when more than one
variable is considered simultaneously in the rank-ordering. The procedure
gives each object belonging to a given set its relative position, in
probabilistic terms, vis-à-vis the other objects in the same set. The position
of each object is measured by a score, called the *POSCOR* score.

(*Module*: POSCOR)

*Rank*

The procedure allows the aggregation of individual opinions, expressing the choice of priorities, ranking of alternatives or selection of preferences. It determines a reasonable rank order of alternatives, using preference data as input and three different ranking procedures – two based on fuzzy logic and one based on classical logic.

(*Module*: RANK)

*Regression Analysis*

A technique for exploring the relationship between a dependent variable and one or more independent variables. Linear regression explores the relationship that can be described by straight lines or their generalization to many dimensions.

(*Module*: REGRESSN)
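
For a single independent variable, the least-squares line has a closed-form solution; a minimal illustrative sketch in Python (not the REGRESSN module itself):

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of the line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope: covariance of x and y over variance of x
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

a, b = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])  # recovers y = 1 + 2x
```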

*Search*

*Search* is a binary segmentation procedure for developing a predictive
model for dependent variable(s). It divides the sample, through a series of
binary splits, into mutually exclusive subgroups, choosing at each split the
pair of subgroups that reduces the predictive error more than any other pair
of subgroups would.

(*Module*: SEARCH)

*Typology*

A clustering procedure for large data sets that can handle nominal, ordinal and interval-scaled variables simultaneously. The procedure distinguishes active and passive variables: active variables take part in the construction of the typology, whereas passive variables do not, although their average statistics are computed for each typology group.

(*Module*: TYPOL)

*Non-parametric Statistics*

Non-parametric statistics allow the testing of hypotheses even when certain classical assumptions, such as interval-scale measurement or normal distribution, are not met. In research practice, these classical assumptions are often strained. Basically, there is at least one non-parametric equivalent for each general type of parametric test. Non-parametric tests generally fall into the following groups:

- Tests of differences between groups
- Tests of differences between variables
- Tests of relationships between variables

*Tests of Differences between Groups*

*Mann-Whitney U-Test*: a non-parametric equivalent of the *t*-test.
It tests whether two independent samples are drawn from the same population
and requires an ordinal level of measurement. *U* is the number of times a
value in the first group precedes a value in the second group when the values
are arranged in ascending order.
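
The statistic can be computed directly from this definition; a minimal illustrative sketch in Python:

```python
def mann_whitney_u(first, second):
    """U: number of (x, y) pairs with x from the first group preceding
    y from the second; a tie counts as one half."""
    u = 0.0
    for x in first:
        for y in second:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

u = mann_whitney_u([3, 5, 6], [7, 8, 9])  # every pair precedes: U = 9
```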

*Tests of Relationships between Variables*

Non-parametric equivalents of the correlation coefficient are Spearman's
correlation coefficient *Rho*, Kendall's *Tau* and *Gamma*.

*Spearman's Correlation Coefficient* (*Rho*) is a commonly used
non-parametric measure of correlation between two ordinal variables. It is
computed from the ranks of the observations and can be interpreted, like the
regular product moment correlation coefficient, in terms of the proportion of
variability accounted for.
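
A minimal illustrative sketch in Python: replace each observation by its rank (ties receiving their average rank) and apply the product moment formula to the ranks:

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                    # extend over a run of tied values
        avg = (i + j) / 2 + 1         # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's Rho: the product moment correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

rho = spearman_rho([1, 5, 9], [2, 20, 200])  # monotone increase: rho = 1
```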

*Kendall's Tau* is a non-parametric measure of
association for ordinal or ranked variables. It is equivalent to Spearman's *Rho*
with regard to the underlying assumptions. However, Spearman's *Rho* and
Kendall's *Tau* are not identical in magnitude, since their underlying
logic and computational formulae are quite different. Two variants of
*Tau* are computed: *Tau b* and *Tau c*. These measures differ
only in how tied ranks are handled. In most cases the two values are very
similar, and when discrepancies occur, it is probably safer to interpret the
lower value.
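
The *Tau b* variant can be sketched by counting concordant, discordant and tied pairs; the Python below is illustrative only, not the TABLES module's implementation:

```python
def kendall_tau_b(x, y):
    """Kendall's Tau b: (concordant - discordant) pairs, with a
    denominator that corrects for ties in either variable."""
    n = len(x)
    conc = disc = x_tied = y_tied = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])   # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])
            if dx == 0 and dy == 0:
                continue                         # tied on both variables
            if dx == 0:
                x_tied += 1
            elif dy == 0:
                y_tied += 1
            elif dx == dy:
                conc += 1
            else:
                disc += 1
    denom = ((conc + disc + x_tied) * (conc + disc + y_tied)) ** 0.5
    return (conc - disc) / denom

tau = kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4])  # (5 - 1) / 6
```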

Another non-parametric measure of correlation is *Gamma*. In terms
of the underlying assumptions, *Gamma* is equivalent to Spearman's *Rho*
or Kendall's *Tau*; in terms of interpretation and computation, it is more
similar to Kendall's *Tau* than to Spearman's *Rho*. The *Gamma*
statistic is, however, preferable to Spearman's *Rho* and Kendall's *Tau*
when the data contain many tied observations.

*Chi-square test*: This goodness-of-fit test compares the observed
and expected frequencies in each category to test whether all the categories
contain the same proportion of values.

(*Module*: TABLES)
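
The statistic itself is simple to compute; a minimal illustrative sketch in Python (not the TABLES module itself):

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 100 observations over 4 categories, equal proportions expected
stat = chi_square_gof([30, 20, 25, 25], [25, 25, 25, 25])  # 2.0
```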