
1    Introduction

IDAMS is a software package for the validation, manipulation and statistical analysis of data. It is organized as a collection of data management and analysis facilities accessible through a user interface and a common control language. Examples of the types of data that can be processed with IDAMS are: the answers to questions by respondents in a survey, information about books in a library, the personal characteristics and performance of students at a college, measurements from a scientific experiment. The common features of such data are that they consist of values of variables for each of a collection of objects/cases (e.g. in a sample survey, the questions correspond to the variables and the respondents to the cases).

Many packages and programs exist to aid in the statistical analysis of such data. One special feature of IDAMS is that it also provides facilities for extensive data validation (e.g. code checking and consistency checking) before embarking on analysis. As far as analysis is concerned, IDAMS performs classical techniques such as table building, regression analysis, one-way analysis of variance, discriminant and cluster analysis, and also some more advanced techniques such as principal components factor analysis and analysis of correspondences, partial order scoring, rank ordering of alternatives, segmentation and iterative typology. In addition, WinIDAMS provides for interactive construction of multidimensional tables, interactive graphical exploration of data and interactive time series analysis.


1.1  WinIDAMS User Interface

The WinIDAMS User Interface is a multiple document interface (MDI) which allows the user to work simultaneously with different types of documents in separate windows.

The Interface provides the following:


1.2  Data Management Facilities

Aggregating data (AGGREG). Groups the records from a number of cases into one record and outputs a new dataset with one record for each group; for example, records representing members of a household are grouped into a single record representing the household. The variables in the new records are summary statistics of specified variables from the individual records, e.g. the sum, mean, minimum/maximum value.
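
The grouping idea can be illustrated with a minimal Python sketch. This is not AGGREG itself: the function name aggregate and the field names household and income are assumptions made only for this example.

```python
from collections import defaultdict


def aggregate(records, group_key, value_key):
    """Group records by group_key and summarise value_key for each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    # One output record per group, holding summary statistics.
    return [
        {
            group_key: key,
            "n": len(values),
            "sum": sum(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in sorted(groups.items())
    ]


# Hypothetical individual records: one per household member.
members = [
    {"household": 1, "income": 100},
    {"household": 1, "income": 300},
    {"household": 2, "income": 250},
]
households = aggregate(members, "household", "income")
```

Here the three member records collapse into two household records, each carrying the count, sum, mean, minimum and maximum of the chosen variable.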

Building an IDAMS dataset (BUILD). A raw data file (which may contain multiple records per case) is input along with a dictionary describing the variables to be selected. BUILD checks for non-numeric values in numeric fields; blank fields can be recoded to user-specified numeric values and other non-numerics are reported and replaced by 9's. The output is an IDAMS dataset comprising a Data file with a single record per case and a dictionary which describes each field in the data records.
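
The treatment of blank and non-numeric fields described above might be sketched in Python as follows. The function clean_field is hypothetical, and the rule for what counts as numeric (an optionally signed integer) is an assumption of this sketch rather than the exact rule applied by BUILD.

```python
import re

# Assumption: a numeric field is an optionally signed run of ASCII digits,
# possibly padded with blanks.
_NUMERIC = re.compile(r"\s*[+-]?[0-9]+\s*")


def clean_field(raw, blank_code):
    """Return (cleaned field, reported) for one fixed-width numeric field."""
    width = len(raw)
    if raw.strip() == "":
        # Blank field: recode to the user-specified value, zero-padded.
        return str(blank_code).rjust(width, "0"), False
    if _NUMERIC.fullmatch(raw):
        # Valid numeric field: keep as is.
        return raw, False
    # Any other non-numeric content: report it and fill the field with 9's.
    return "9" * width, True
```

For instance, clean_field("1A3", 0) gives ("999", True), while a blank three-character field with blank_code 0 gives ("000", False).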

Checking of codes (CHECK). Reports cases which have invalid variable values. Valid codes for each variable are specified by the user and/or taken from the dictionary.

Checking of consistency (CONCHECK). Reports cases with inconsistencies between two or more variables. IDAMS Recode statements are used to specify the logical relationships to be checked.

Checking the merging of records (MERCHECK). Checks that the correct records are present for each case in a file with multiple records per case. It outputs a file containing equal numbers of records per case. Invalid or duplicate records can be deleted and missing records can be inserted with missing values specified by the user.

Correcting data (CORRECT). Updates a Data file by applying corrections to individual variable values for specified cases. The Results file contains a written trace of corrections allowing them to be archived.

Importing/exporting data (IMPEX). Import is aimed at building IDAMS datasets or matrices from files coming from other software. The aim of export is to make possible the use of Data and Matrix files, stored in or created by IDAMS, in other packages. Free and DIF format text files can be imported/exported.

Listing datasets (LIST). Values for selected variables (original or recoded) and/or selected cases can be listed in column format.

Merging datasets (MERGE). Two datasets can be merged by matching cases according to a common set of variables called match variables. There are 4 options for selecting cases for the output dataset: (1) only cases present in both files (intersection); (2) cases present in either file (union); (3) each case in the first file; (4) each case in the second file. The user specifies which variables from each of the two input files are to be output. An option exists for matching a case from one file with more than one case from the second file, e.g. for adding household data from one file to each individual's record in a second file.
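
The four case-selection options can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: each file is represented as a dictionary keyed by the match-variable value, the function merge and the option names are hypothetical, and the way a real merge fills variables missing on one side is not modelled.

```python
def merge(first, second, option):
    """Merge two files held as {match value: {variable: value}} dictionaries."""
    if option == "intersection":      # (1) cases present in both files
        keys = first.keys() & second.keys()
    elif option == "union":           # (2) cases present in either file
        keys = first.keys() | second.keys()
    elif option == "first":           # (3) each case in the first file
        keys = set(first)
    elif option == "second":          # (4) each case in the second file
        keys = set(second)
    else:
        raise ValueError("unknown option: " + option)
    merged = {}
    for key in sorted(keys):
        # Variables absent on one side are simply left out in this sketch.
        merged[key] = {**first.get(key, {}), **second.get(key, {})}
    return merged


file_a = {1: {"V1": 10}, 2: {"V1": 20}}
file_b = {2: {"V2": 5}, 3: {"V2": 7}}
both = merge(file_a, file_b, "intersection")
either = merge(file_a, file_b, "union")
```

With these two hypothetical files, the intersection keeps only case 2 and the union keeps cases 1, 2 and 3.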

Sorting and merging files (SORMER). This is a general purpose utility for sorting data into ascending or descending order on up to 12 fields. Up to 16 files may be merged.

Subsetting datasets (SUBSET). Outputs a new dataset (Data and Dictionary files) containing selected cases and/or variables from the input dataset. There is an option to check for duplicate cases.

Transforming data (TRANS). Allows variables created with the IDAMS Recode facility to be saved in a permanent dataset.


1.3  Data Analysis Facilities

Cluster analysis (CLUSFIND). Performs cluster analysis by partitioning a set of objects (cases or variables) into a set of clusters as determined by one of 6 algorithms, 2 based on partitioning around medoids, one based on fuzzy clustering and the other 3 based on hierarchical clustering.

Configuration analysis (CONFIG). Performs analysis on a single input configuration, created for example by the MDSCAL program. It has the capability of centering, norming, rotating, translating dimensions, computing inter-point distances and scalar products. The configuration can be plotted after each transformation.

Discriminant analysis (DISCRAN). Looks for the best linear discriminant function(s) of a set of variables that reproduce, as far as possible, an a priori grouping of the cases. It uses a stepwise procedure, i.e. in each step the most powerful variable is entered. Three samples of cases can be distinguished: a basic sample on which the main discriminant analysis steps are performed, a test sample on which the power of the discriminant function is checked, and an anonymous sample which is used only for classifying the cases. Case assignment and values of the first two discriminant factors (if there are more than 2 groups) can be saved in a dataset.

Distribution and Lorenz functions (QUANTILE). Distribution functions with 2 to 100 subintervals, Lorenz functions, Lorenz curve and Gini coefficients, and the Kolmogorov-Smirnov test.
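
As an illustration of two of these quantities, the following Python sketch computes the points of a Lorenz curve and derives a Gini coefficient from the area under it by the trapezoid rule. It assumes non-negative values with a positive total; the function names are hypothetical, and the exact formulas used by QUANTILE may differ.

```python
def lorenz_points(values):
    """Return (cumulative share of cases, cumulative share of total) pairs."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini coefficient as 1 minus twice the area under the Lorenz curve."""
    pts = lorenz_points(values)
    twice_area = 0.0
    for (p0, l0), (p1, l1) in zip(pts, pts[1:]):
        # Trapezoid between consecutive Lorenz points, counted twice.
        twice_area += (p1 - p0) * (l0 + l1)
    return 1.0 - twice_area
```

A perfectly equal distribution yields a coefficient of 0, and the coefficient grows as the total becomes concentrated in fewer cases.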

Factor analysis (FACTOR). Covers a set of principal component factor analyses (scalar products, covariances, correlations) and factor analysis of correspondences. For each analysis, it constructs a matrix representing the relations between variables and computes its eigenvalues and eigenvectors. Then it calculates the case and/or variable factors giving for each case and/or variable its ordinate, its quality of representation and its contributions to the factors. Factors can be saved in a dataset and a graphic representation of cases and/or variables in the factor space can be obtained. Active and passive variables and cases can be distinguished.

Linear regression (REGRESSN). Multiple linear regression analysis: standard and stepwise. Either a dataset or a correlation matrix may be used as input. Residuals can be printed with the Durbin-Watson statistic for their first-order autocorrelation, and they can also be output for further analyses.

Multidimensional scaling (MDSCAL). This is a non-metric multidimensional scaling procedure for the analysis of similarities. Operates on a matrix of similarity or dissimilarity measures and looks for the best geometric representation of the data in n-dimensional space. The user controls the dimensionality of the configuration obtained, the distance metric used and the way the ties (equal values) in the input data should be handled.

Multiple classification analysis (MCA). Examines the relationships between several predictors and a single dependent variable, and determines the effect of each predictor before and after adjustment for its inter-correlations with other predictors. Provides information about bivariate and multivariate relationships between predictors and the dependent variable. Residuals can be printed and/or saved in a dataset.

Multivariate analysis of variance (MANOVA). Performs univariate and multivariate analysis of variance and of covariance, using a general linear model. Up to eight factors (independent variables) can be used. If more than one dependent variable is specified, both univariate and multivariate analyses are performed. The program performs an exact solution with either equal or unequal numbers of cases in the cells.

One-way analysis of variance (ONEWAY). Descriptive statistics of the dependent variable within categories of the control variable and one-way analysis statistics such as: total sum of squares, between means sum of squares, within groups sum of squares, eta and eta squared (unadjusted and adjusted) and the F-test value.
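
The main one-way statistics named above can be sketched in Python as follows. The sketch uses the textbook definitions and a hypothetical function oneway; it computes only the unadjusted eta squared and assumes at least two categories and a non-zero within-groups sum of squares.

```python
def oneway(groups):
    """One-way analysis statistics for {category: [dependent values]}."""
    all_values = [v for g in groups.values() for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    within_ss = total_ss - between_ss
    return {
        "total_ss": total_ss,
        "between_ss": between_ss,
        "within_ss": within_ss,
        "eta_squared": between_ss / total_ss,
        # F = between-groups mean square / within-groups mean square.
        "F": (between_ss / (k - 1)) / (within_ss / (n - k)),
    }


stats = oneway({1: [1, 2, 3], 2: [4, 5, 6]})
```

For the two small categories in the example, the total sum of squares is 17.5, of which 13.5 lies between the category means.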

Partial order scoring (POSCOR). Calculates ordinal scale scores from interval or ordinal scale variables. Scores are calculated for each case involved in analysis and they measure the relative position of the case within the set of cases. The scores, optionally with other user-specified variables, are output in the form of an IDAMS dataset.

Pearsonian correlation (PEARSON). Calculates Pearson's r correlation coefficients, covariances, and regression coefficients. Pairwise or casewise deletion of missing data can be requested. Output correlation and covariance matrices can be saved in a file.
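
The difference between pairwise and casewise deletion can be illustrated with a short Python sketch. Representing missing data as None and the function names used here are assumptions of the example; IDAMS itself identifies missing data through codes stored in the dictionary.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r for two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation(cases, a, b, all_variables, deletion):
    """Correlate a and b after pairwise or casewise deletion."""
    # Casewise: drop a case if ANY analysis variable is missing.
    # Pairwise: drop a case only if a or b is missing.
    needed = all_variables if deletion == "casewise" else (a, b)
    used = [c for c in cases if all(c[v] is not None for v in needed)]
    return pearson_r([c[a] for c in used], [c[b] for c in used])


cases = [
    {"V1": 1, "V2": 2, "V3": None},
    {"V1": 2, "V2": 4, "V3": 1},
    {"V1": 3, "V2": 6, "V3": 2},
    {"V1": 4, "V2": 7, "V3": 3},
]
r_pairwise = correlation(cases, "V1", "V2", ("V1", "V2", "V3"), "pairwise")
r_casewise = correlation(cases, "V1", "V2", ("V1", "V2", "V3"), "casewise")
```

In this example the first case is missing only V3, so it contributes to the V1-V2 correlation under pairwise deletion but is excluded under casewise deletion, and the two coefficients differ slightly.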

Rank-ordering of alternatives (RANK). Determines a reasonable rank-order of alternatives using preference data and three different ranking procedures, one based on classical logic and two others based on fuzzy logic. Preference data can represent either a selection or ranking of alternatives. Two types of individual preference relations can be specified: weak and strict. With fuzzy ranking, the data completely determine the results obtained whereas with classical ranking the user has the possibility of controlling the calculations.

Scatter diagrams (SCAT). Scatter diagrams, univariate statistics (mean, standard deviation and N) and bivariate statistics (Pearson's r and regression statistics: coefficient B and constant A).

Searching for structure (SEARCH). A binary segmentation procedure to develop predictive models. The question "what dichotomous split on which predictor variable will give the maximum improvement in the ability to predict values of the dependent variable", embedded in an iterative scheme, is the basis of the algorithm used.
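
A single step of such a segmentation can be sketched in Python. The sketch assumes ordered predictors and measures improvement as the reduction in the sum of squares of the dependent variable around the group means; the function best_split and this criterion are assumptions of the example, not a description of the SEARCH implementation.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(cases, predictors, dependent):
    """Return (predictor, threshold, gain) for the best dichotomous split."""
    y_all = [c[dependent] for c in cases]
    best = None
    for p in predictors:
        # Try a split after every distinct predictor value except the largest.
        for threshold in sorted({c[p] for c in cases})[:-1]:
            low = [c[dependent] for c in cases if c[p] <= threshold]
            high = [c[dependent] for c in cases if c[p] > threshold]
            gain = sum_of_squares(y_all) - sum_of_squares(low) - sum_of_squares(high)
            if best is None or gain > best[2]:
                best = (p, threshold, gain)
    return best


sample = [
    {"V1": 1, "V2": 1, "V9": 10},
    {"V1": 1, "V2": 2, "V9": 12},
    {"V1": 2, "V2": 1, "V9": 20},
    {"V1": 2, "V2": 2, "V9": 22},
]
split = best_split(sample, ["V1", "V2"], "V9")
```

An iterative scheme would apply the same step again to each of the two resulting groups until some stopping rule is met.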

Univariate and bivariate tables (TABLES). Options include: (1) univariate simple and cumulative frequency and percentage distributions; (2) univariate statistics: mean, median, mode, variance, standard deviation, skewness, kurtosis, minimum, maximum; (3) bivariate frequency tables with row, column and total percentages; (4) tables of mean values of an additional variable; (5) bivariate statistics: t-test of means between pairs of rows, Chi-square, contingency coefficient, Cramer's V, Kendall's Taus, Gamma, Lambdas, Spearman rho, a number of statistics for Evidence Based Medicine, and 3 non-parametric tests: Wilcoxon, Mann-Whitney and Fisher.

Typology and ascending classification (TYPOL). Creates a typology variable as a summary of a large number of variables both quantitative and qualitative. The user chooses the initial and final number of groups, the type of distance used, and the way the initial typology is started. The groups of initial typology are stabilized using an iterative procedure. The number of groups can be reduced using an algorithm of hierarchical ascending classification. A distinction can be made between active variables which participate in the construction of typology, and passive variables, for which main statistics are calculated within the groups of the typology.

Interactive multidimensional tables. This component allows the user to visualize and customize multidimensional tables with frequencies, row, column and total percentages, summary statistics (sum, count, mean, maximum, minimum, variance, standard deviation) of additional variables, and bivariate statistics. Up to seven variables can be nested in rows or in columns. Construction of a table can be repeated for each value of up to three "page" variables. The tables can also be printed, or exported in free format (comma or tabulation character delimited) or in HTML format.

Interactive graphical exploration of data. A separate component, GraphID, is available for exploring data through graphic displays. The basic display is in the form of multiple scatterplots for different pairs of variables. Additional information such as histograms and regression lines may be displayed on each plot. The plots may be manipulated in various ways. For example, selected cases can be marked in one plot and then highlighted in all the other plots. Parts of the display may be enlarged ("zoomed"). IDAMS matrices are displayed as three dimensional plots with rows and columns being represented by two of the axes and the third dimension being used to show the size of the statistic for each cell.

Interactive time series analysis. Another separate component, TimeSID, supports interactive analysis of time series. It covers analysis of trends, auto-correlations and cross-correlations, statistical and graphical analysis of time series values, tests of randomness and trends, short-term forecasting, periodograms and estimation of spectral densities. Series can be transformed by calculating averages, arithmetic compositions, sequential differences and rates of change, smoothed by moving averages, and decomposed using frequency filters.


1.4  Data in IDAMS

IDAMS dataset - the Data file. The data file input to IDAMS may be any character (ASCII) fixed format file, i.e. the values for a given variable occupy the same position (field) in the record for every case. Characteristics of this file are:

IDAMS dataset - the Dictionary file. The dictionary is used to describe the data:

The pair of files consisting of a Dictionary file and the Data file it describes is known as an IDAMS dataset.

IDAMS matrices. Some analysis programs use a square or rectangular matrix as input rather than the raw data.

The square matrix is used for symmetric arrays of bivariate statistics with a constant on the diagonal. Only the upper right-hand corner of the matrix is stored, without the diagonal.
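
The storage idea can be illustrated with a short Python sketch that keeps only the off-diagonal upper triangle of a symmetric matrix and rebuilds the full matrix from it. The row-by-row ordering and the function names are assumptions of the sketch; the actual file layout is the one described in the "Data in IDAMS" chapter.

```python
def pack_upper(matrix):
    """Keep only the elements above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def unpack_upper(packed, n, diagonal):
    """Rebuild the full symmetric matrix from the packed upper triangle."""
    full = [[diagonal] * n for _ in range(n)]
    values = iter(packed)
    for i in range(n):
        for j in range(i + 1, n):
            # Mirror each stored value across the diagonal.
            full[i][j] = full[j][i] = next(values)
    return full


correlations = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
packed = pack_upper(correlations)
restored = unpack_upper(packed, 3, 1.0)
```

For an n by n matrix only n(n-1)/2 values need to be stored, since the diagonal is a known constant and the lower triangle mirrors the upper one.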

The rectangular matrix is for non-symmetric arrays of values. The meaning of the rows and columns varies according to the IDAMS program.


1.5  IDAMS Commands and the "Setup" File

With the exception of WinIDAMS interactive components, execution of an IDAMS program is launched by a setup. The setup contains information such as file specifications, program control statements, variable recoding instructions, etc., separated by IDAMS commands (starting with a $ character) which identify the kind of information being specified. The first IDAMS command in the Setup file always identifies the first program to be executed, e.g.


     $RUN TABLES
     $FILES
     DICTIN = name of Dictionary file
     DATAIN = name of Data file
     $SETUP
         control statements for TABLES program
     $RECODE
         variable recoding statements

1.6  Standard IDAMS Features

Case selection. By default all cases from a Data file will be processed in a program execution. To select a subset, a filter statement is included in the setup, e.g. INCLUDE V3=1 (include only those cases where variable 3 is equal to 1).

Variable selection. Variables are referenced by their numbers assigned in the dictionary. A set of variables is specified in a variable list following keywords such as VARS, CONVARS, OUTVARS. Such variable lists may also include R-variables constructed by the IDAMS Recode facility (see below), e.g. VARS=(V3-V6,V129,R100,R101).

Transforming/recoding data. A powerful Recode facility permits the recoding of variables and the construction of new variables. Recoding instructions are prepared by the user in the IDAMS Recode language. This includes the possibility of arithmetic computation as well as the use of several special functions for operations such as the grouping of values, the creation of "dummy" variables, etc. Conditional statements are also allowed. Examples of Recode statements for constructing 3 new variables R100, R101 and R102 are:


     R100=V4+V5
     R101=BRAC(V10,0-15=1,16-60=2,61-98=3,99=9)
     IF (MDATA(V3,V4) OR V4 EQ 0) THEN R102=99 ELSE R102=V3*100/V4

The R-variables thus constructed for each case can be used temporarily in the program being executed or can be saved in a dataset using the TRANS program.
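
For readers more familiar with general-purpose languages, the logic of these three statements can be paraphrased in Python. This is only an interpretation: it reads the conditional as assigning R102 in both branches, represents missing data codes as a hypothetical dictionary of sets, returns None from the helper brac for values outside every range, and does not model how Recode treats missing data in arithmetic.

```python
def brac(value, rules):
    """Bracket a value: return the code of the first range containing it."""
    for low, high, code in rules:
        if low <= value <= high:
            return code
    return None  # assumption: unmatched values are not modelled here


def recode(case, missing):
    """Build R100, R101 and R102 for one case."""
    r100 = case["V4"] + case["V5"]
    r101 = brac(case["V10"], [(0, 15, 1), (16, 60, 2), (61, 98, 3), (99, 99, 9)])
    v3, v4 = case["V3"], case["V4"]
    # MDATA(V3,V4): true if either variable holds a missing data code.
    if v3 in missing["V3"] or v4 in missing["V4"] or v4 == 0:
        r102 = 99
    else:
        r102 = v3 * 100 / v4
    return {"R100": r100, "R101": r101, "R102": r102}


missing_codes = {"V3": {999}, "V4": {999}}
result = recode({"V3": 50, "V4": 200, "V5": 3, "V10": 42}, missing_codes)
```

The check on V4 guards the division: a case with a missing or zero V4 receives the code 99 instead of a percentage.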

Weighting data. When complex sampling procedures are used during data collection, it may be necessary to use different weights for cases during analysis. Such weights are usually stored as a variable in the Data file. The WEIGHT parameter is then used in the program control statements to invoke weighting, e.g. WEIGHT=V5.
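
The effect of a weight variable on a simple statistic can be illustrated in Python. The function weighted_mean and the sample cases are hypothetical, and each IDAMS program applies weights according to its own formulas.

```python
def weighted_mean(cases, variable, weight):
    """Mean of a variable with each case counted according to its weight."""
    total_weight = sum(c[weight] for c in cases)
    return sum(c[variable] * c[weight] for c in cases) / total_weight


# Hypothetical cases: V5 holds the weight, as in WEIGHT=V5.
sample_cases = [
    {"V1": 10, "V5": 1},
    {"V1": 20, "V5": 3},
]
mean_v1 = weighted_mean(sample_cases, "V1", "V5")
```

The second case counts three times as much as the first, so the weighted mean is 17.5 rather than the unweighted 15.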

Treatment of missing data and "bad" data. Special values for each numeric variable can be identified as missing data codes and stored in the dictionary. During data processing missing data is handled through two parameters:

Normally it is assumed that data have been cleaned prior to analysis. If this is not the case, then the BADDATA parameter is available for skipping cases with non-numeric values (including blank fields) in numeric fields, or for treating such values as missing data.


1.7  Import and Export of Data

IDAMS does not use a special internal file format for storing data. Any character file in fixed format can be described by an IDAMS dictionary and then input to IDAMS. In addition, free format data with Tab, comma or semicolon used as separator can be imported through the WinIDAMS User Interface. Moreover, the IMPEX program allows a fixed format IDAMS file to be created from any text file in free or DIF format.

Data files created by IDAMS are always character files in fixed format. Such files can be used directly by other software along with the appropriate data descriptive information for that software. Free format files with Tab, comma or semicolon used as separator can be obtained through the WinIDAMS User Interface. Moreover, the IMPEX program allows a fixed format IDAMS file to be exported as a text file in free or DIF format.

IDAMS matrices are stored in a format specific to IDAMS (described in the "Data in IDAMS" chapter). The IMPEX program can be used to import/export free format matrices.


1.8  Exchange of Data Between CDS/ISIS and IDAMS

There is a separate program, WinIDIS, which prepares data descriptions and performs data transfer between IDAMS and CDS/ISIS (the UNESCO software for database management and information retrieval). Such transfer is controlled by IDAMS and ISIS data description files (the IDAMS dictionary and the CDS/ISIS Field Definition Table). When going from ISIS to IDAMS, new IDAMS Dictionary and Data files are always constructed, and they can be merged with other data using IDAMS data management facilities. When going from IDAMS to ISIS, there are three possibilities: (1) a completely new data base can be constructed, (2) transferred records can be added to an existing data base as new data base records, (3) records of an existing data base can be updated with the transferred data.


1.9  Structure of this Manual

All the general features of IDAMS, including the Recode facility, are described in Part 1 of this Manual.

Part 2 includes installation instructions, a description of the files and folders used in WinIDAMS, a section entitled "Getting Started" which takes the user through the steps required to perform a simple task, and a description of the WinIDAMS User Interface.

In-depth descriptions of each IDAMS program are given in Parts 3 and 4. These write-ups contain the following sections:

    General Description. A statement of the primary purpose of the program.

    Standard IDAMS Features. Statements about the case and variable selection possibilities, data transformation, weighting capabilities, and missing data handling.

    Results. Details of results destined to be printed (or reviewed on the screen).

    Description of output and input files. One section for each IDAMS dataset, each matrix and each other input or output file, giving a description of their contents.

    Setup Structure. A designation of the file specifications, IDAMS commands, and program control statements needed to execute the program.

    Program Control Statements. The parameters and/or formats of each of the program control statements with an example of each type.

    Restrictions. A summary of the program limitations.

    Examples. Examples of complete sets of control statements for executing the program.

Part 5 describes the WinIDAMS interactive components for construction of multidimensional tables, for graphical exploration of data and for time series analysis.

Part 6 provides details of statistical techniques, formulas and bibliographical references for all analysis programs.

Finally, errors issued by IDAMS programs are summarized in the Appendix.