IDAMS is a software package for the validation, manipulation and statistical analysis of data. It is organized as a collection of data management and analysis facilities accessible through a user interface and a common control language. Examples of the types of data that can be processed with IDAMS are: the answers given by respondents to questions in a survey, information about books in a library, the personal characteristics and performance of students at a college, measurements from a scientific experiment. The common feature of such data is that they consist of values of variables for each of a collection of objects/cases (e.g. in a sample survey, the questions correspond to the variables and the respondents to the cases).
Many different packages and programs exist to aid in the statistical analysis of such data. One special feature of IDAMS is that it also provides facilities for extensive data validation (e.g. code checking and consistency checking) before embarking on analysis. As far as analysis is concerned, IDAMS performs classical techniques such as table building, regression analysis, one-way analysis of variance, discriminant and cluster analysis, and also some more advanced techniques such as principal components factor analysis and analysis of correspondences, partial order scoring, rank ordering of alternatives, segmentation and iterative typology. In addition, WinIDAMS provides for interactive construction of multidimensional tables, interactive graphical exploration of data and interactive time series analysis.
The WinIDAMS User Interface is a multiple document interface (MDI) which allows simultaneous work with different types of documents in separate windows. The Interface provides the following:
Aggregating data (AGGREG). Groups the records of a number of cases into one record and outputs a new dataset with one record for each group; for example, records representing members of a household are grouped into a single record representing the household. The variables in the new records are summary statistics of specified variables from the individual records, e.g. the sum, mean, minimum/maximum value.
Building an IDAMS dataset (BUILD). A raw data file (which may contain multiple records per case) is input along with a dictionary describing the variables to be selected. BUILD checks for non-numeric values in numeric fields; blank fields can be recoded to user-specified numeric values, and other non-numeric values are reported and replaced by 9's. The output is an IDAMS dataset comprising a Data file with a single record per case and a dictionary which describes each field in the data records.
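The BUILD check on numeric fields can be pictured with a short Python sketch. This is not IDAMS code: the function name `clean_numeric_field`, the integer-only pattern and the one-field-at-a-time interface are assumptions made for illustration. Only the behaviour (blank fields recoded to a user-specified value, other non-numeric content flagged and replaced by 9's) is taken from the description above.

```python
import re

# Assumption: a numeric field holds an optionally signed integer.
_NUMERIC = re.compile(r"[+-]?[0-9]+")


def clean_numeric_field(raw: str, blank_code: int) -> tuple[int, bool]:
    """Return (value, was_bad) for one fixed-width numeric field.

    Blank fields are recoded to `blank_code`; any other non-numeric
    content is flagged as bad and replaced by a run of 9's as wide
    as the field.
    """
    text = raw.strip()
    if text == "":
        return blank_code, False
    if _NUMERIC.fullmatch(text):
        return int(text), False
    return int("9" * len(raw)), True


if __name__ == "__main__":
    for field in ["  ", " 42", "4x"]:
        print(repr(field), clean_numeric_field(field, blank_code=0))
```

A real run would also need to report which case and variable held the bad value; the boolean flag stands in for that report here.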
Checking of codes (CHECK). Reports cases which have invalid variable values. Valid codes for each variable are specified by the user and/or taken from the dictionary.
Checking of consistency (CONCHECK). Reports cases with inconsistencies between two or more variables. IDAMS Recode statements are used to specify the logical relationships to be checked.
Checking the merging of records (MERCHECK). Checks that the correct records are present for each case in a file with multiple records per case. It outputs a file containing equal numbers of records per case. Invalid or duplicate records can be deleted and missing records can be inserted with missing values specified by the user.
Correcting data (CORRECT). Updates a Data file by applying corrections to individual variable values for specified cases. The Results file contains a written trace of corrections allowing them to be archived.
Importing/exporting data (IMPEX). Import is aimed at building IDAMS datasets or matrices from files coming from other software. The aim of export is to make possible the use of Data and Matrix files, stored in or created by IDAMS, in other packages. Free and DIF format text files can be imported/exported.
Listing datasets (LIST). Values for selected variables (original or recoded) and/or selected cases can be listed in column format.
Merging datasets (MERGE). Two datasets can be merged by matching cases according to a common set of variables called match variables. There are 4 options for selecting cases for the output dataset: (1) only cases present in both files (intersection); (2) cases present in either file (union); (3) each case in the first file; (4) each case in the second file. The user specifies which variables from each of the two input files are to be output. An option exists for matching a case from one file with more than one case from the second file, e.g. for adding household data from one file to each individual's record in a second file.
Sorting and merging files (SORMER). This is a general purpose utility for sorting data into ascending or descending order on up to 12 fields. Up to 16 files may be merged.
Subsetting datasets (SUBSET). Outputs a new dataset (Data and Dictionary files) containing selected cases and/or variables from the input dataset. There is an option to check for duplicate cases.
Transforming data (TRANS). Allows variables created with the IDAMS Recode facility to be saved in a permanent dataset.
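The four case-selection options described for MERGE correspond to familiar set operations on the match-variable keys. The following Python sketch illustrates them under stated assumptions: each input file is modelled as a dictionary from a tuple of match-variable values to a dictionary of variables, and the function name `merge_cases` and the option labels are hypothetical. Variables belonging to a file in which the case is absent are simply omitted here, whereas a real program would have to supply missing data values.

```python
def merge_cases(first: dict, second: dict, option: str) -> dict:
    """Merge two keyed collections of cases.

    `first` and `second` map a match-key tuple to a dict of variables.
    `option` is one of "intersection", "union", "first", "second".
    """
    if option == "intersection":      # (1) cases present in both files
        keys = [k for k in first if k in second]
    elif option == "union":           # (2) cases present in either file
        keys = sorted(set(first) | set(second))
    elif option == "first":           # (3) each case in the first file
        keys = list(first)
    elif option == "second":          # (4) each case in the second file
        keys = list(second)
    else:
        raise ValueError(f"unknown option: {option!r}")

    merged = {}
    for key in keys:
        record = {}
        record.update(first.get(key, {}))
        record.update(second.get(key, {}))
        merged[key] = record
    return merged


if __name__ == "__main__":
    file_a = {(1,): {"V10": 3}, (2,): {"V10": 5}}
    file_b = {(2,): {"V20": 40}, (3,): {"V20": 25}}
    for opt in ("intersection", "union", "first", "second"):
        print(opt, merge_cases(file_a, file_b, opt))
```

The sketch handles only one-to-one matching; the option of matching one case with several cases in the second file is not covered.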
Cluster analysis (CLUSFIND). Performs cluster analysis by partitioning a set of objects (cases or variables) into a set of clusters as determined by one of 6 algorithms: 2 based on partitioning around medoids, 1 based on fuzzy clustering and 3 based on hierarchical clustering.
Configuration analysis (CONFIG). Performs analysis on a single input configuration, created for example by the MDSCAL program. It can center, norm, rotate and translate dimensions, and compute inter-point distances and scalar products. The configuration can be plotted after each transformation.
Discriminant analysis (DISCRAN). Looks for the best linear discriminant function(s) of a set of variables which reproduces, as far as possible, an a priori grouping of the cases. It uses a stepwise procedure, i.e. in each step the most powerful variable is entered. Three samples of cases can be distinguished: a basic sample on which the main discriminant analysis steps are performed, a test sample on which the power of the discriminant function is checked, and an anonymous sample which is used only for classifying the cases. Case assignment and values of the first two discriminant factors (if there are more than 2 groups) can be saved in a dataset.
Distribution and Lorenz functions (QUANTILE). Distribution functions with 2 to 100 subintervals, Lorenz functions, Lorenz curve and Gini coefficients, and the Kolmogorov-Smirnov test.
Factor analysis (FACTOR). Covers a set of principal component factor analyses (scalar products, covariances, correlations) and factor analysis of correspondences. For each analysis, it constructs a matrix representing the relations between variables and computes its eigenvalues and eigenvectors. Then it calculates the case and/or variable factors, giving for each case and/or variable its ordinate, its quality of representation and its contributions to the factors. Factors can be saved in a dataset and a graphic representation of cases and/or variables in the factor space can be obtained. Active and passive variables and cases can be distinguished.
Linear regression (REGRESSN). Multiple linear regression analysis: standard and stepwise. Either a dataset or a correlation matrix may be used as input. Residuals can be printed with the Durbin-Watson statistic for their first-order autocorrelation, and they can also be output for further analyses.
Multidimensional scaling (MDSCAL). This is a non-metric multidimensional scaling procedure for the analysis of similarities. It operates on a matrix of similarity or dissimilarity measures and looks for the best geometric representation of the data in n-dimensional space. The user controls the dimensionality of the configuration obtained, the distance metric used and the way the ties (equal values) in the input data should be handled.
Multiple classification analysis (MCA). Examines the relationships between several predictors and a single dependent variable, and determines the effect of each predictor before and after adjustment for its inter-correlations with other predictors. Provides information about bivariate and multivariate relationships between predictors and the dependent variable. Residuals can be printed and/or saved in a dataset.
Multivariate analysis of variance (MANOVA). Performs univariate and multivariate analysis of variance and of covariance, using a general linear model. Up to eight factors (independent variables) can be used. If more than one dependent variable is specified, both univariate and multivariate analyses are performed. The program performs an exact solution with either equal or unequal numbers of cases in the cells.
One-way analysis of variance (ONEWAY). Descriptive statistics of the dependent variable within categories of the control variable and one-way analysis statistics such as: total sum of squares, between means sum of squares, within groups sum of squares, eta and eta squared (unadjusted and adjusted) and the F-test value.
Partial order scoring (POSCOR). Calculates ordinal scale scores from interval or ordinal scale variables. Scores are calculated for each case involved in the analysis and they measure the relative position of the case within the set of cases. The scores, optionally with other user-specified variables, are output in the form of an IDAMS dataset.
Pearsonian correlation (PEARSON). Calculates Pearson's r correlation coefficients, covariances, and regression coefficients. Pairwise or casewise deletion of missing data can be requested. Output correlation and covariance matrices can be saved in a file.
Rank-ordering of alternatives (RANK). Determines a reasonable rank-order of alternatives using preference data and three different ranking procedures, one based on classical logic and two others based on fuzzy logic. Preference data can represent either a selection or ranking of alternatives. Two types of individual preference relations can be specified: weak and strict. With fuzzy ranking, the data completely determine the results obtained, whereas with classical ranking the user has the possibility of controlling the calculations.
Scatter diagrams (SCAT). Scatter diagrams, univariate statistics (mean, standard deviation and N) and bivariate statistics (Pearson's r and regression statistics: coefficient B and constant A).
Searching for structure (SEARCH). A binary segmentation procedure to develop predictive models. The question "what dichotomous split on which predictor variable will give the maximum improvement in the ability to predict values of the dependent variable", embedded in an iterative scheme, is the basis of the algorithm used.
Univariate and bivariate tables (TABLES). Options include: (1) univariate simple and cumulative frequency and percentage distributions; (2) univariate statistics: mean, median, mode, variance, standard deviation, skewness, kurtosis, minimum, maximum; (3) bivariate frequency tables with row, column and total percentages; (4) tables of mean values of an additional variable; (5) bivariate statistics: t-test of means between pairs of rows, Chi-square, contingency coefficient, Cramer's V, Kendall's Taus, Gamma, Lambdas, Spearman rho, a number of statistics for Evidence Based Medicine, and 3 non-parametric tests: Wilcoxon, Mann-Whitney and Fisher.
Typology and ascending classification (TYPOL). Creates a typology variable as a summary of a large number of variables, both quantitative and qualitative. The user chooses the initial and final number of groups, the type of distance used, and the way the initial typology is started. The groups of the initial typology are stabilized using an iterative procedure. The number of groups can be reduced using an algorithm of hierarchical ascending classification. A distinction can be made between active variables, which participate in the construction of the typology, and passive variables, for which main statistics are calculated within the groups of the typology.
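To make the one-way analysis statistics listed for ONEWAY concrete, the following Python sketch computes the total, between-means and within-groups sums of squares, unadjusted eta squared and the F value using the standard textbook definitions. The function name `oneway_statistics` and its return layout are assumptions for illustration, not the program's own interface, and the adjusted eta statistics are not covered.

```python
from collections import defaultdict


def oneway_statistics(values, groups):
    """One-way ANOVA quantities for `values` classified by `groups`.

    Assumes at least two groups and some variation within groups.
    """
    n = len(values)
    grand_mean = sum(values) / n

    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    total_ss = sum((v - grand_mean) ** 2 for v in values)
    between_ss = 0.0
    within_ss = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        between_ss += len(members) * (group_mean - grand_mean) ** 2
        within_ss += sum((v - group_mean) ** 2 for v in members)

    k = len(by_group)
    return {
        "total_ss": total_ss,
        "between_ss": between_ss,
        "within_ss": within_ss,
        "eta_squared": between_ss / total_ss,  # unadjusted
        "F": (between_ss / (k - 1)) / (within_ss / (n - k)),
    }


if __name__ == "__main__":
    stats = oneway_statistics([1, 2, 3, 4, 5, 6], list("aaabbb"))
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

The identity total = between + within provides a quick check on any such computation.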
Interactive multidimensional tables. This component allows multidimensional tables to be visualized and customized, with frequencies, row, column and total percentages, summary statistics (sum, count, mean, maximum, minimum, variance, standard deviation) of additional variables, and bivariate statistics. Up to seven variables can be nested in rows or in columns. Construction of a table can be repeated for each value of up to three "page" variables. The tables can also be printed, or exported in free format (comma or tabulation character delimited) or in HTML format.
Interactive graphical exploration of data. A separate component, GraphID, is available for exploring data through graphic displays. The basic display is in the form of multiple scatterplots for different pairs of variables. Additional information such as histograms and regression lines may be displayed on each plot. The plots may be manipulated in various ways. For example, selected cases can be marked in one plot and then highlighted in all the other plots. Parts of the display may be enlarged ("zoomed"). IDAMS matrices are displayed as three dimensional plots, with rows and columns being represented by two of the axes and the third dimension being used to show the size of the statistic for each cell.
Interactive time series analysis. Another separate component, TimeSID, provides for interactive analysis of time series. It covers analysis of trends, auto-correlations and cross-correlations, statistical and graphical analysis of time series values, tests of randomness and trends, forecasting for short terms, periodograms and estimation of spectral densities. Series can be transformed by calculating averages, arithmetic compositions, sequential differences and rates of change; they can also be smoothed by moving averages and decomposed using frequency filters.
IDAMS dataset - the Data file. The data file input to IDAMS may be any character (ASCII) fixed format file, i.e. the values for a given variable occupy the same position (field) in the record for every case. Characteristics of this file are:
IDAMS dataset - the Dictionary file. The dictionary is used to describe the data:
The pair of files consisting of a Dictionary file and the Data file it describes is known as an IDAMS dataset.
IDAMS matrices. Some analysis programs use a square or rectangular matrix as input rather than the raw data. The square matrix is used for symmetric arrays of bivariate statistics with a constant on the diagonal. Only the upper right-hand corner of the matrix is stored, without the diagonal. The rectangular matrix is for non-symmetric arrays of values. The meaning of the rows and columns varies according to the IDAMS program.
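The storage rule for square matrices (upper right-hand corner only, diagonal omitted) can be illustrated in Python. The sketch assumes the stored values run row by row, which the text above does not state, and the function names are hypothetical; the actual IDAMS matrix file layout is not reproduced here.

```python
def pack_upper_triangle(matrix):
    """Keep only the elements above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def unpack_upper_triangle(packed, n, diagonal):
    """Rebuild the full symmetric n x n matrix.

    `diagonal` is the constant that sits on the diagonal,
    e.g. 1.0 for a correlation matrix.
    """
    full = [[diagonal] * n for _ in range(n)]
    values = iter(packed)
    for i in range(n):
        for j in range(i + 1, n):
            full[i][j] = full[j][i] = next(values)
    return full


if __name__ == "__main__":
    corr = [
        [1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0],
    ]
    packed = pack_upper_triangle(corr)
    print(packed)
    print(unpack_upper_triangle(packed, 3, 1.0) == corr)
```

For n variables this stores n(n-1)/2 values instead of n squared, which is why a constant diagonal and symmetry are required of a square matrix.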
With the exception of WinIDAMS interactive components, execution of an IDAMS program is launched by a setup. The setup contains information such as file specifications, program control statements, variable recoding instructions, etc., separated by IDAMS commands (starting with a $ character) which identify the kind of information being specified. The first IDAMS command in the Setup file always identifies the first program to be executed, e.g. $RUN TABLES.
Case selection. By default all cases from a Data file will be processed in a program execution. To select a subset, a filter statement is included in the setup, e.g. INCLUDE V3=1 (include only those cases where variable 3 is equal to 1).
Variable selection. Variables are referenced by their numbers assigned in the dictionary. A set of variables is specified in a variable list following keywords such as VARS, CONVARS, OUTVARS. Such variable lists may also include R-variables constructed by the IDAMS Recode facility (see below), e.g. VARS=(V3-V6,V129,R100,R101).
Transforming/recoding data. A powerful Recode facility permits the recoding of variables and the construction of new variables. Recoding instructions are prepared by the user in the IDAMS Recode language. This includes the possibility of arithmetic computation as well as the use of several special functions for operations such as the grouping of values, the creation of "dummy" variables, etc. Conditional statements are also allowed. Examples of Recode statements for constructing 3 new variables R100, R101 and R102 are listed under "1.6 Standard IDAMS Features".
Weighting data. When complex sampling procedures are used during data collection, it may be necessary to use different weights for cases during analysis. Such weights are usually stored as a variable in the Data file. The WEIGHT parameter is then used in the program control statements to invoke weighting, e.g. WEIGHT=V5.
Treatment of missing data and "bad" data. Special values for each numeric variable can be identified as missing data codes and stored in the dictionary. During data processing, missing data is handled through two parameters:
IDAMS does not use a special internal file format for storing data. Any character file in fixed format can be described by an IDAMS dictionary and then input to IDAMS. In addition, free format data with Tab, comma or semicolon used as separator can be imported through the WinIDAMS User Interface. Moreover, the IMPEX program allows a fixed format IDAMS file to be created from any text file in free or DIF format.
Data files created by IDAMS are always character files in fixed format. Such files can be used directly by other software along with the appropriate data descriptive information for that software. Free format files with Tab, comma or semicolon used as separator can be obtained through the WinIDAMS User Interface. Moreover, the IMPEX program allows a fixed format IDAMS file to be exported as a text file in free or DIF format.
IDAMS matrices are stored in a format specific to IDAMS (described in the "Data in IDAMS" chapter). The IMPEX program can be used to import/export free format matrices.
1.1  WinIDAMS User Interface
1.2  Data Management Facilities
1.3  Data Analysis Facilities
1.4  Data in IDAMS
1.5  IDAMS Commands and the "Setup" File
$RUN TABLES
$FILES
DICTIN = name of Dictionary file
DATAIN = name of Data file
$SETUP
control statements for TABLES program
$RECODE
variable recoding statements
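The way IDAMS commands divide a setup into sections can be sketched in Python. This is only an illustration of the "$ starts a new section" rule described for the Setup file; the function name `split_setup` is hypothetical, and the real IDAMS parser may well treat blank lines, comments or continuation lines differently.

```python
SETUP_EXAMPLE = """\
$RUN TABLES
$FILES
DICTIN = name of Dictionary file
DATAIN = name of Data file
$SETUP
control statements for TABLES program
$RECODE
variable recoding statements
"""


def split_setup(text: str) -> list[tuple[str, list[str]]]:
    """Split a setup into (command, lines) sections.

    A line starting with '$' opens a new section; every following
    line belongs to that section until the next '$' command.
    """
    sections: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines carry no meaning
        if stripped.startswith("$"):
            sections.append((stripped, []))
        elif sections:
            sections[-1][1].append(stripped)
        else:
            raise ValueError("a setup must begin with an IDAMS command")
    return sections


if __name__ == "__main__":
    for command, body in split_setup(SETUP_EXAMPLE):
        print(command, body)
```

The first section returned is the $RUN command, matching the rule that the first command identifies the program to be executed.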
1.6  Standard IDAMS Features
R100=V4+V5
R101=BRAC(V10,0-15=1,16-60=2,61-98=3,99=9)
IF (MDATA(V3,V4) OR V4 EQ 0) THEN R102=99 ELSE R102=V3*100/V4
The R-variables thus constructed for each case can be
used temporarily in the program being executed or can be saved in
a dataset using the TRANS program.
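For readers unfamiliar with the Recode language, the three statements above can be paraphrased in Python. The paraphrase rests on assumptions that should be checked against the Recode documentation: BRAC is taken to map a value to the code of the range containing it, MDATA is taken to be true when any listed variable holds one of its missing data codes, unmatched BRAC values yield None, and R102 is assigned in both branches of the conditional. The names `brac`, `recode_case` and `missing_codes` are hypothetical.

```python
def brac(value, brackets):
    """Return the code of the first (low, high, code) range holding value."""
    for low, high, code in brackets:
        if low <= value <= high:
            return code
    return None  # assumption: no default code for unmatched values


def recode_case(case, missing_codes):
    """Compute R100, R101 and R102 for one case (dict of V-variables)."""

    def is_missing(name):
        return case[name] in missing_codes.get(name, set())

    r100 = case["V4"] + case["V5"]
    r101 = brac(case["V10"], [(0, 15, 1), (16, 60, 2), (61, 98, 3), (99, 99, 9)])
    if is_missing("V3") or is_missing("V4") or case["V4"] == 0:
        r102 = 99
    else:
        r102 = case["V3"] * 100 / case["V4"]
    return {"R100": r100, "R101": r101, "R102": r102}


if __name__ == "__main__":
    codes = {"V3": {99}, "V4": {99}}
    print(recode_case({"V3": 20, "V4": 40, "V5": 2, "V10": 37}, codes))
    print(recode_case({"V3": 20, "V4": 0, "V5": 2, "V10": 99}, codes))
```

The sketch ignores how missing data in V4 or V5 would affect R100; it only mirrors the explicit test written for R102.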
Normally it is assumed that data have been cleaned prior to analysis.
If this is not the case, then the BADDATA parameter is available for
skipping cases with non-numeric values (including blank fields) in
numeric fields, or for treating such values as missing data.
1.7  Import and Export of Data