IDAMS is a software package
for the validation, manipulation and statistical analysis of data.
It is organized as a collection of data management and analysis facilities
accessible through a user interface and a common control language.
Examples of the types of data that can be processed with IDAMS are:
the answers to questions by respondents in a survey, information about
books in a library, the personal characteristics and performance
of students at a college, measurements from a scientific experiment.
The common features of such data are that they consist of values of
variables for each of a collection of objects/cases (e.g. in a sample
survey, the questions are the variables and the respondents are the
cases). Many different packages and programs exist to aid in the
statistical analysis of such data. One special feature of IDAMS is
that it also provides facilities for extensive data validation (e.g.
code checking and consistency checking) before embarking on analysis.
As far as analysis is concerned, IDAMS performs classical techniques
such as table building, regression analysis, one-way analysis of variance,
discriminant and cluster analysis and also some more advanced techniques
such as factorial analysis of correspondences, partial order scoring,
rank ordering of alternatives and iterative typology. In addition,
WinIDAMS provides for interactive graphical exploration of data, interactive
time series analysis and interactive construction of multidimensional
tables.
The WinIDAMS User
Interface is a multiple document interface. It can display different
types of documents in separate windows and lets the user work with
them simultaneously. The Interface provides the following:
Aggregating
data (AGGREG). Groups records (e.g. individuals into households)
and outputs a new dataset with one record for each group. The variables
in the new records are summary statistics of specified variables from
the individual records, e.g. the sum, mean, minimum/maximum value.
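
The grouping step described for AGGREG can be pictured with a small Python sketch. This is only an illustration of the idea, not IDAMS code: the function name `aggregate`, the field names `household` and `income`, and the choice of summary statistics are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, group_key, value_key):
    """Group records by group_key and summarise value_key for each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    # One output record per group, holding the summary statistics.
    return [
        {
            group_key: key,
            "n": len(vals),
            "sum": sum(vals),
            "mean": mean(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for key, vals in sorted(groups.items())
    ]


# Hypothetical individual-level records, grouped into households.
individuals = [
    {"household": 1, "income": 100},
    {"household": 1, "income": 300},
    {"household": 2, "income": 50},
]
households = aggregate(individuals, "household", "income")
```

Here `households` holds one record per household, each carrying the summaries of its members' values.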
Building an IDAMS dataset (BUILD). A raw data file (which
may contain multiple records per case) is input along with a dictionary
describing the variables to be selected. BUILD checks for non-numeric
values in numeric fields; blank fields can be recoded to user-specified
numeric values and other non-numerics are reported and replaced by
9s. The output is an IDAMS dataset comprising a data file with a single
record per case and an associated dictionary which describes each
field in the data records.
Checking of codes (CHECK). Reports cases in a data file which have
invalid variable values. Valid codes for each variable are specified
by the user on program control statements and/or taken from code label
records in the dictionary.
Checking of consistency (CONCHECK). Reports cases with inconsistencies
between two or more variables. IDAMS Recode statements are used to
specify the logical relationships to be checked.
Checking of data merge (MERCHECK). Checks that the correct records are
present for each case in a multiple record per case data file. It outputs
a file containing equal numbers of records per case. Invalid or duplicate
records can be deleted and missing records can be inserted with
user-specified missing values.
Correcting data (CORRECT). Updates a data file by applying corrections
to individual variable values for specified cases.
Importing/exporting data (IMPEX). Data import is aimed at building IDAMS
data objects (datasets or matrices) from data files coming from other
software. The aim of data export is to make possible the use of data and
matrix files, stored in or created by IDAMS, in other packages. Free and
DIF format text files can be imported/exported.
Listing and copying files (COPY). Facility for producing a copy and/or
a listing of a file.
Listing datasets (LIST). Values for selected variables (original or
recoded) and/or selected cases can be listed in a variety of formats.
Merging datasets (MERGE). Two datasets can be merged by matching cases
according to a common set of variables called match variables. There
are 4 options for selecting cases for the output dataset: (1) only
cases present in both files (intersection); (2) each case present in
either file (union); (3) each case in the first file; (4) each case in the
second file. The user specifies which variables from each of the two
input files are to be output. An option exists for matching a case
from one file with more than one case from the second file, e.g. for
adding household data from one file to each individual's record in
a second file.
Sorting and merging files (SORMER). This is a general purpose sort/merge
utility for sorting data into ascending or descending order on up to
12 sort fields. Up to 16 files may be merged.
Subsetting datasets (SUBSET). Outputs a new dataset
(data and dictionary) containing selected cases and/or variables from
the input dataset. There is an option to check for duplicate cases.
Transforming data (TRANS). Saves variables created
with the IDAMS recoding facility in a permanent dataset.
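
The four case-selection options listed for MERGE correspond to ordinary set operations on the match keys. The following Python sketch illustrates that correspondence under simplifying assumptions: each file is represented as a dictionary from match key to variables, the option names are invented for the example, and variables that are absent for a case are simply left out (how IDAMS fills them is not modelled here).

```python
def merge_cases(first, second, option):
    """Merge two files, each given as {match_key: {variable: value}}.

    option is one of "intersection", "union", "first", "second"
    (hypothetical names for the four choices described in the text).
    """
    if option == "intersection":
        keys = first.keys() & second.keys()
    elif option == "union":
        keys = first.keys() | second.keys()
    elif option == "first":
        keys = first.keys()
    elif option == "second":
        keys = second.keys()
    else:
        raise ValueError(f"unknown option: {option}")

    merged = {}
    for key in sorted(keys):
        row = {}
        # Assumption: a case missing from one file contributes no variables.
        row.update(first.get(key, {}))
        row.update(second.get(key, {}))
        merged[key] = row
    return merged


first = {1: {"age": 30}, 2: {"age": 41}}
second = {2: {"region": "N"}, 3: {"region": "S"}}
both = merge_cases(first, second, "intersection")
either = merge_cases(first, second, "union")
```

With these inputs, `both` contains only case 2 and `either` contains cases 1, 2 and 3.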
Cluster analysis (CLUSFIND). Performs cluster analysis by partitioning
a set of objects (cases or variables) into a set of clusters as
determined by one of 6 algorithms: 2 based on partitioning around
medoids, one based on fuzzy clustering and 3 based on hierarchical
clustering.
Configuration analysis (CONFIG). Performs analysis on a
single spatial configuration input, created for example by the MDSCAL
program. It can center, norm, rotate and translate dimensions, and
compute inter-point distances and scalar products. The
configuration can be plotted after each transformation.
Discriminant analysis (DISCRAN). Looks for the best linear discriminant
function(s) of a set of variables which reproduces, as far as possible,
an a priori grouping of the cases. It uses a stepwise procedure, i.e. in
each step the most powerful variable is entered. Three samples of cases
can be distinguished: a basic sample on which the main discriminant
analysis steps are performed, a test sample on which the power of the
discriminant function is checked, and an anonymous sample which is used
only for classifying the cases.
Distribution and Lorenz functions (QUANTILE). Distribution functions
with 2 to 100 subintervals, Lorenz functions, Lorenz curve and Gini
coefficients, and the Kolmogorov-Smirnov test.
Factor analysis (FACTOR). Covers a set of principal
component factor analyses (scalar products, covariances, correlations)
and factor analysis of correspondences. For each analysis it constructs
a matrix representing the relations between variables and computes
its eigenvalues and eigenvectors. Then it calculates the case and/or
variable factors giving for each case and/or variable its ordinate,
its quality of representation and its contributions to the factors.
A graphic representation of cases and/or variables in the factor space
can be obtained. Active and passive variables and cases can be distinguished.
Linear regression (REGRESSN). Provides a general multiple
regression capability for standard and stepwise linear regression
analysis. Either a dataset or a correlation matrix may be used as
input. Residuals can be printed with the Durbin-Watson statistic for
their first-order autocorrelation, and they can also be output, e.g.
for 2nd stage analysis.
Multidimensional scaling (MDSCAL). This is a non-metric multidimensional
scaling procedure for the analysis of similarities. It operates on a
matrix of similarity or dissimilarity measures and is designed to find
the best geometric representation of the data in the space. The user
controls the dimensionality of the configuration obtained, the type of
distance metric used and the way the ties (equal values) in the input
data should be handled.
Multiple classification analysis (MCA). Examines the relationships between
several predictor (control) variables and a single dependent variable,
and determines the effect of each predictor before and after adjustment
for its inter-correlations with other predictors. Provides information
about bivariate and multivariate relationships between predictors
and the dependent variable.
One-way analysis of variance (ONEWAY). Descriptive statistics within
categories of the control variable and one-way analysis statistics such
as: total sum of squares, between means sum of squares, within groups
sum of squares, eta and eta squared (unadjusted and adjusted) and the
F-test value.
Partial order scoring (POSCOR). Calculates ordinal scale scores from
interval or ordinal scale variables. Scores are calculated for each case
involved in analysis and they measure the relative position of the case
within the set of cases. The scores, optionally with other user-specified
variables, are output in the form of an IDAMS dataset.
Pearsonian correlation (PEARSON). Calculates Pearson's r correlation
coefficients, covariances, and regression coefficients for raw scores.
Pairwise or casewise deletion of missing data can be requested. Output
correlation matrices can be saved in a file.
Rank-ordering of alternatives (RANK). Determines a reasonable rank-order
of alternatives using preference data and three different ranking
procedures, one based on classical logic and two others based on fuzzy
logic. Preference data can represent either a selection or ranking of
alternatives. Two types of individual preference relations can be
specified: weak and strict. With fuzzy ranking, the data completely
determine the results obtained whereas with classical ranking the user
has the possibility of controlling the calculations.
Scatter diagrams (SCAT). Scatter diagrams, univariate statistics (mean,
standard deviation and N) and bivariate statistics (Pearson's r and
regression statistics: coefficient B and constant A).
Searching for structure (SEARCH). A binary segmentation procedure to
develop predictive models. The question "what dichotomous split on which
predictor variable will give the maximum improvement in the ability to
predict values of the dependent variable", embedded in an iterative
scheme, is the basis of the algorithm used.
Univariate and bivariate tables (TABLES). Options include (1) univariate
simple and cumulative frequency and percentage distributions;
(2) univariate statistics: mean, median, mode, variance, standard
deviation, skewness, kurtosis; (3) 2-way and 3-way frequency tables with
row, column and total percentages; (4) tables of mean values of a
dependent variable; (5) bivariate statistics: t-test of means between
pairs of rows, Chi-square, contingency coefficient, Cramer's V,
Kendall's Taus, Gamma, Lambdas, Spearman rho, and 3 non-parametric
tests: Wilcoxon, Mann-Whitney and Fisher.
Typology and ascending classification (TYPOL). Creates a typology
variable as a summary of a large number of variables both quantitative
and qualitative. The user chooses the initial and final number of groups,
the type of distance used, and the way the initial typology is started.
The groups of the initial typology are stabilized using an iterative
procedure. The number of groups can be reduced using an algorithm of
hierarchical ascending classification. A distinction can be made between
active variables, which participate in the construction of the typology,
and passive variables, for which main statistics are calculated within
the groups of the typology.
Interactive graphical exploration of data (GraphID). A separate component,
GraphID, is available for exploring data through graphic displays. The
basic display is in the form of multiple scatterplots for different pairs
of variables. Additional information such as histograms and regression
lines may be displayed on each plot. The plots may be manipulated in
various ways. For example, selected cases can be marked in one plot and
then highlighted in all the other plots. Parts of the display may be
enlarged ("zoomed"). IDAMS matrices are displayed as three dimensional
plots with rows and columns being represented by two of the axes and the
third dimension being used to show the size of the statistic for each cell.
Interactive time series analysis (TimeSID). Another separate component,
TimeSID, provides for interactive analysis of time series. It
contains analysis of trends, auto-correlations and cross-correlations,
statistical and graphical analysis of time series values, tests of
randomness and trends, and forecasting for short terms. Series can
be transformed by calculating averages, arithmetic compositions,
sequential differences and rates of change, and smoothed by moving
averages.
Interactive multidimensional tables. The interactive "Multidimensional
Tables" component allows you to visualize and customize multidimensional
tables with frequencies, row, column and total percentages, summary
statistics (sum, count, mean, maximum, minimum, variance, standard
deviation) of additional variables, and bivariate statistics. Up to seven
variables can be nested in rows or in columns. Construction of tables
with specified row, column and cell variables can be repeated for each
value of up to three "page" variables. The tables can also be printed, or
exported in free format (comma or tabulation character delimited) or in
HTML format.
IDAMS dataset - the data
file. The data file input to IDAMS may be any character (ASCII)
fixed format file, i.e. the values for a given variable (case) occupy
the same position in the record for every case. Characteristics
of the data file are:
IDAMS dataset - the dictionary. The dictionary is used
to describe the data:
The combination of a dictionary and the data it describes is known
as an IDAMS dataset.
IDAMS matrices. Some analysis programs use a square or rectangular
matrix as input rather than a raw data file. The square matrix is used
for symmetric arrays of bivariate statistics with a constant on the
diagonal. Only the upper right-hand corner of the matrix is stored,
without the diagonal. The rectangular matrix is for non-symmetric arrays
of values. The meaning of the rows and columns varies according to
the IDAMS program.
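
To make the storage rule for square matrices concrete, the Python sketch below packs the upper right-hand corner of a symmetric matrix (without the diagonal) into a flat list and rebuilds the full matrix from it. It is an illustration only: the row-by-row ordering and the function names `pack_upper` and `unpack_upper` are assumptions, and the actual IDAMS file layout is not reproduced here.

```python
def pack_upper(matrix):
    """Return the entries above the diagonal, row by row (assumed order)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def unpack_upper(values, n, diagonal):
    """Rebuild the full symmetric n x n matrix from the packed values."""
    full = [[diagonal] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetry: the same statistic fills both (i, j) and (j, i).
            full[i][j] = full[j][i] = next(it)
    return full


# A hypothetical 3 x 3 correlation matrix: constant 1.0 on the diagonal.
corr = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
packed = pack_upper(corr)
restored = unpack_upper(packed, 3, 1.0)
```

For an n x n matrix only n(n-1)/2 values need to be kept, since the diagonal is constant and the lower triangle mirrors the upper one.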
Data management and analysis facilities (described in 1.3 and 1.4,
except WinIDAMS interactive components) are available in batch components
called programs. Execution of an IDAMS program is launched by a "setup"
file. This contains specifications such as file definitions, program
control statements, variable recoding instructions, etc., separated
by IDAMS commands (starting with a $ character) which identify the
kind of information being specified. The first IDAMS command in the
setup file always identifies the first program to be executed, e.g.
$RUN TABLES.
Case selection. By default all cases in a file will be processed in a
program execution. To select a subset, a filter statement is included in
the setup file, e.g. INCLUDE V3=1 (include only those cases where
variable 3 is equal to 1).
Variable selection. Variables are referenced by their
variable numbers assigned in the dictionary. A set of variables is
specified in a variable list following keywords such as VARS, CONVARS,
OUTVARS. Such variable lists may also include R-variables constructed
by the IDAMS Recode facility (see below), e.g. VARS=(V3-V6,V129,R100,R101).
Transforming/Recoding data. A powerful Recode facility permits
the recoding of variables and the construction of new variables. Recoding
instructions are prepared by the user in the IDAMS Recode language.
This includes the possibility of arithmetic computation as well as
the use of several special functions for operations such as the grouping
of values, the creation of "dummy" variables, etc. Conditional statements
are also allowed. Examples of Recode statements for constructing 3
new variables R100, R101 and R102 are:
Weighting data. When complex sampling procedures are used during data
collection, it may be necessary to use different weights for cases during
analysis. Such weights are usually stored as a variable in the data file.
The WEIGHT parameter is then used in the program control statements to
invoke weighting, e.g. WEIGHT=V5.
Treatment of missing data and "bad" data. Special values for each numeric
variable can be identified as missing data codes and stored in the
dictionary. During data processing, missing data is handled through two
parameters:
IDAMS does not
use a special internal file format for storing data. Any fixed format
character data file can be described by an IDAMS dictionary and then
input to IDAMS. Thus if data from other software can be output into
a fixed format text file, then there is no need for a special import
function. Tab delimited and comma separated free format data files
can be imported directly through the WinIDAMS User Interface. Moreover,
the IMPEX program allows a fixed format IDAMS data file to be created
from a text file in free or DIF format. Data files created by IDAMS
are always fixed format character files. Such files can be input directly
to other software along with the appropriate data descriptive information
for that software. Tab delimited and comma separated free format data
files can be exported directly through the WinIDAMS User Interface.
Moreover, the IMPEX program allows a fixed format IDAMS data file
to be exported as a text file in free or DIF format. IDAMS matrices
are stored in a format specific to IDAMS (described in the "Data in
IDAMS" chapter). The IMPEX program can be used to import/export free
format matrices.
1.1  What is IDAMS
1.2  WinIDAMS User Interface
1.3  Data Management Facilities
1.4  Data Analysis Facilities
1.5  Data in IDAMS
1.6  IDAMS Commands and the "Setup" File
$RUN TABLES
$FILES
DICTIN = name of dictionary file
DATAIN = name of data file
$SETUP
program control statements for TABLES program
$RECODE
variable transformation statements
1.7  Standard IDAMS Features
R100=V4+V5
R101=BRAC(V10,0-15=1,16-60=2,61-98=3,99=9)
IF (MDATA(V3,V4) OR V4 EQ 0) THEN R102=99 ELSE R102=V3*100/V4
The R-variables thus constructed for each case can be
used temporarily in the program being executed or can be saved in
a dataset using the TRANS program.
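
For readers less familiar with the Recode language, the three statements can be paraphrased in Python. This is a rough sketch of their apparent meaning, not a description of how IDAMS evaluates them. It assumes that BRAC maps a value to the code of the range containing it, that MDATA is true when any listed variable holds one of its missing data codes, and that 99 is assigned to R102 when V3 or V4 is missing or V4 is zero. The helper names `brac` and `recode`, the handling of values outside every range, and the sample missing data codes are all hypothetical.

```python
def brac(value, brackets):
    """Return the code of the first (low, high, code) range containing value."""
    for low, high, code in brackets:
        if low <= value <= high:
            return code
    return None  # assumption: values outside every range stay undefined


# Ranges taken from R101=BRAC(V10,0-15=1,16-60=2,61-98=3,99=9).
V10_BRACKETS = [(0, 15, 1), (16, 60, 2), (61, 98, 3), (99, 99, 9)]


def recode(case, missing):
    """Build R100, R101 and R102 for one case.

    case: {variable name: value}
    missing: {variable name: set of missing data codes} (hypothetical layout)
    """
    def mdata(*names):
        # Assumed meaning of MDATA: any listed variable is missing.
        return any(case[name] in missing.get(name, set()) for name in names)

    result = {}
    result["R100"] = case["V4"] + case["V5"]
    result["R101"] = brac(case["V10"], V10_BRACKETS)
    if mdata("V3", "V4") or case["V4"] == 0:
        result["R102"] = 99
    else:
        result["R102"] = case["V3"] * 100 / case["V4"]
    return result


missing_codes = {"V3": {999}, "V4": {999}}
ok = recode({"V3": 50, "V4": 200, "V5": 10, "V10": 42}, missing_codes)
zero = recode({"V3": 50, "V4": 0, "V5": 10, "V10": 99}, missing_codes)
```

The guard in the third statement avoids a division by zero and keeps missing values from producing a misleading percentage.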
Normally it is assumed that data have been cleaned prior to analysis.
If this is not the case, then the BADDATA parameter is available for
skipping cases with non-numeric values or blanks in numeric fields,
or for treating such values as missing data.
1.8  Import and Export of Data