1    Introduction


1.1  What is IDAMS

IDAMS is a software package for the validation, manipulation and statistical analysis of data. It is organized as a collection of data management and analysis facilities accessible through a user interface and a common control language. Examples of the types of data that can be processed with IDAMS are: the answers to questions by respondents in a survey, information about books in a library, the personal characteristics and performance of students at a college, measurements from a scientific experiment. The common features of such data are that they consist of values of variables for each of a collection of objects/cases (e.g. in a sample survey, the questions are the variables and the respondents are the cases).

Many different packages and programs exist to aid the statistical analysis of such data. One special feature of IDAMS is that it also provides facilities for extensive data validation (e.g. code checking and consistency checking) before embarking on analysis. As far as analysis is concerned, IDAMS performs classical techniques such as table building, regression analysis, one-way analysis of variance, discriminant and cluster analysis and also some more advanced techniques such as factorial analysis of correspondences, partial order scoring, rank ordering of alternatives and iterative typology. In addition, WinIDAMS provides for interactive graphical exploration of data, interactive time series analysis and interactive construction of multidimensional tables.


1.2  WinIDAMS User Interface

The WinIDAMS User Interface is a multiple document interface. It can display different types of documents in separate windows and lets the user work with them simultaneously. The Interface provides the following:


1.3  Data Management Facilities

Aggregating data (AGGREG). Groups records (e.g. individuals into households) and outputs a new dataset with one record for each group. The variables in the new records are summary statistics of specified variables from the individual records, e.g. the sum, mean, minimum/maximum value.
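
The grouping idea can be illustrated outside IDAMS with a short Python sketch. This is not AGGREG syntax: the function name `aggregate` and the field names `household` and `income` are invented for the illustration, and only a few of the possible summary statistics are shown.

```python
from collections import defaultdict


def aggregate(records, group_key, value_key):
    """Return one summary record per group (illustrative, not AGGREG itself)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    summary = []
    for key in sorted(groups):
        values = groups[key]
        summary.append({
            group_key: key,
            "n": len(values),
            "sum": sum(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        })
    return summary


# Hypothetical individual-level records, grouped into households.
individuals = [
    {"household": 1, "income": 100},
    {"household": 1, "income": 300},
    {"household": 2, "income": 250},
]
households = aggregate(individuals, "household", "income")
```

Here `households` holds two records, one per household, each carrying the count, sum, mean, minimum and maximum of `income`.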

Building an IDAMS dataset (BUILD). A raw data file (which may contain multiple records per case) is input along with a dictionary describing the variables to be selected. BUILD checks for non-numeric values in numeric fields; blank fields can be recoded to user-specified numeric values and other non-numerics are reported and replaced by 9s. The output is an IDAMS dataset comprising a data file with a single record per case and an associated dictionary which describes each field in the data records.
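
The treatment of numeric fields described above might be sketched in Python as follows. This is a minimal illustration, not BUILD's implementation: the function name `clean_numeric_field` is hypothetical, and filling the whole field width with 9s is an assumption about how "replaced by 9s" applies.

```python
import re

# A numeric field: optional blanks, optional minus sign, ASCII digits.
NUMERIC = re.compile(r"\s*-?[0-9]+\s*")


def clean_numeric_field(text, blank_value):
    """Return (value, problem) for one fixed-width numeric field.

    Blank fields are recoded to blank_value. Any other non-numeric
    content is reported and replaced by 9s (assumed: one 9 per column).
    """
    if text.strip() == "":
        return blank_value, None
    if NUMERIC.fullmatch(text):
        return int(text), None
    return int("9" * len(text)), "non-numeric value %r" % text
```

For example, `clean_numeric_field("1A", 0)` returns the value 99 together with a message that can be listed in a report, while a blank field yields the user-specified value and no message.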

Checking of codes (CHECK). Reports cases in a data file which have invalid variable values. Valid codes for each variable are specified by the user on program control statements and/or taken from code label records in the dictionary.
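
As a rough illustration of code checking, the following Python sketch reports every value that falls outside the set of valid codes for its variable. The function name `find_invalid_codes`, the case identifiers and the code sets are all made up for the example; CHECK itself is driven by program control statements and dictionary code labels, not by this code.

```python
def find_invalid_codes(cases, valid_codes):
    """Return a list of (case_id, variable, value) for each invalid value."""
    problems = []
    for case_id, values in cases:
        for variable, allowed in valid_codes.items():
            if values[variable] not in allowed:
                problems.append((case_id, variable, values[variable]))
    return problems


# Hypothetical valid codes and cases.
valid_codes = {"V1": {1, 2}, "V2": {1, 2, 3, 9}}
cases = [
    (101, {"V1": 1, "V2": 3}),
    (102, {"V1": 5, "V2": 9}),
    (103, {"V1": 2, "V2": 4}),
]
problems = find_invalid_codes(cases, valid_codes)
```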

Checking of consistency (CONCHECK). Reports cases with inconsistencies between two or more variables. IDAMS Recode statements are used to specify the logical relationships to be checked.

Checking of data merge (MERCHECK). Checks that the correct records are present for each case in a multiple record per case data file. It outputs a file containing equal numbers of records per case. Invalid or duplicate records can be deleted and missing records can be inserted with user specified missing values.

Correcting data (CORRECT). Updates a data file by applying corrections to individual variable values for specified cases.

Importing/exporting data (IMPEX). Data import is aimed at building IDAMS data objects (datasets or matrices) from data files coming from other software. The aim of data export is to make possible the use of data and matrix files, stored in or created by IDAMS, in other packages. Free and DIF format text files can be imported/exported.

Listing and copying files (COPY). Facility for producing a copy and/or a listing of a file.

Listing datasets (LIST). Values for selected variables (original or recoded) and/or selected cases can be listed in a variety of formats.

Merging datasets (MERGE). Two datasets can be merged by matching cases according to a common set of variables called match variables. There are 4 options for selecting cases for the output dataset: (1) only cases present in both files (intersection); (2) each case in both files (union); (3) each case in the first file; (4) each case in the second file. The user specifies which variables from each of the two input files are to be output. An option exists for matching a case from one file with more than one case from the second file, e.g. for adding household data from one file to each individual's record in a second file.
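
The four case-selection options correspond to familiar set operations on the match-variable values. The Python sketch below illustrates this under simplifying assumptions: each file is a dictionary keyed by a single match value, the option names are hypothetical labels, and variables absent from one file are simply left out (how MERGE fills them is not covered here).

```python
def merge_cases(first, second, option):
    """Match cases on a shared key and select them by one of four options.

    first, second: dicts mapping match value -> dict of variables.
    option: "intersection", "union", "first" or "second" (hypothetical names).
    """
    if option == "intersection":
        keys = first.keys() & second.keys()
    elif option == "union":
        keys = first.keys() | second.keys()
    elif option == "first":
        keys = first.keys()
    elif option == "second":
        keys = second.keys()
    else:
        raise ValueError("unknown option: %s" % option)
    merged = {}
    for key in sorted(keys):
        row = {}
        row.update(first.get(key, {}))
        row.update(second.get(key, {}))
        merged[key] = row
    return merged


# Hypothetical input files keyed by a match variable.
first_file = {1: {"V1": 10}, 2: {"V1": 20}}
second_file = {2: {"V2": 7}, 3: {"V2": 8}}
both = merge_cases(first_file, second_file, "intersection")
```

With these inputs, the intersection keeps only case 2, the union keeps cases 1, 2 and 3, and the remaining two options keep the cases of the first or the second file respectively.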

Sorting and merging files (SORMER). This is a general purpose sort/merge utility for sorting data into ascending or descending order on up to 12 sort fields. Up to 16 files may be merged.

Subsetting datasets (SUBSET). Outputs a new dataset (data and dictionary) containing selected cases and/or variables from the input dataset. There is an option to check for duplicate cases.

Transforming data (TRANS). Saves variables created with the IDAMS recoding facility in a permanent dataset.


1.4  Data Analysis Facilities

Cluster analysis (CLUSFIND). Performs cluster analysis by partitioning a set of objects (cases or variables) into a set of clusters as determined by one of 6 algorithms, 2 based on partitioning around medoids, one based on fuzzy clustering and the other 3 based on hierarchical clustering.

Configuration analysis (CONFIG). Performs analysis on a single spatial configuration input, created for example by the MDSCAL program. It has the capability of centering, norming, rotating, translating dimensions, computing inter-point distances and scalar products. The configuration can be plotted after each transformation.

Discriminant analysis (DISCRAN). Looks for the best linear discriminant function(s) of a set of variables which reproduces, as far as possible, an a priori grouping of the cases. It uses a stepwise procedure, i.e. in each step the most powerful variable is entered. Three samples of cases can be distinguished: a basic sample on which the main discriminant analysis steps are performed, a test sample on which the power of the discriminant function is checked, and an anonymous sample which is used only for classifying the cases.

Distribution and Lorenz functions (QUANTILE). Distribution functions with 2 to 100 subintervals, Lorenz functions, Lorenz curve and Gini coefficients, and the Kolmogorov-Smirnov test.
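
For readers unfamiliar with the Lorenz curve and the Gini coefficient, the following Python sketch computes both for a small set of values. It uses the textbook definition (one minus twice the area under the Lorenz curve, with the area obtained by the trapezoid rule); the exact formula used by QUANTILE may differ, and the function names are illustrative.

```python
def lorenz_points(values):
    """Cumulative share of the total held by the lowest k cases, k = 0..n."""
    ordered = sorted(values)
    total = sum(ordered)
    points = [0.0]
    running = 0.0
    for v in ordered:
        running += v
        points.append(running / total)
    return points


def gini(values):
    """Gini coefficient as 1 minus twice the area under the Lorenz curve."""
    points = lorenz_points(values)
    n = len(values)
    # Trapezoid rule over n equal-width steps of the cumulative case share.
    area = sum((points[i - 1] + points[i]) / (2 * n) for i in range(1, n + 1))
    return 1 - 2 * area
```

Equal values give a coefficient of 0, while four cases of which one holds the entire total give 0.75.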

Factor analysis (FACTOR). Covers a set of principal component factor analyses (scalar products, covariances, correlations) and factor analysis of correspondences. For each analysis it constructs a matrix representing the relations between variables and computes its eigenvalues and eigenvectors. Then it calculates the case and/or variable factors giving for each case and/or variable its ordinate, its quality of representation and its contributions to the factors. A graphic representation of cases and/or variables in the factor space can be obtained. Active and passive variables and cases can be distinguished.

Linear regression (REGRESSN). Provides a general multiple regression capability for standard and stepwise linear regression analysis. Either a dataset or a correlation matrix may be used as input. Residuals can be printed with the Durbin-Watson statistic for their first-order autocorrelation, and they can also be output, e.g. for 2nd stage analysis.

Multidimensional scaling (MDSCAL). This is a non-metric multidimensional scaling procedure for the analysis of similarities. Operates on a matrix of similarity or dissimilarity measures and is designed to find the best geometric representation of the data in the space. The user controls the dimensionality of the configuration obtained, the type of distance metric used and the way the ties (equal values) in the input data should be handled.

Multiple classification analysis (MCA). Examines the relationships between several predictor (control) variables and a single dependent variable, and determines the effect of each predictor before and after adjustment for its inter-correlations with other predictors. Provides information about bivariate and multivariate relationships between predictors and the dependent variable.

One-way analysis of variance (ONEWAY). Descriptive statistics within categories of the control variable and one-way analysis statistics such as: total sum of squares, between means sum of squares, within groups sum of squares, eta and eta squared (unadjusted and adjusted) and the F-test value.
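
The main one-way statistics listed above follow from a few sums of squares. The Python sketch below computes them for a small example using the standard textbook formulas; it is not the ONEWAY program, it omits the adjusted eta statistics, and the function and key names are illustrative.

```python
def oneway(groups):
    """groups: dict mapping category -> list of dependent-variable values."""
    all_values = [v for values in groups.values() for v in values]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    between_ss = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    within_ss = total_ss - between_ss
    eta_squared = between_ss / total_ss
    # F compares the between-groups and within-groups mean squares.
    f_value = (between_ss / (k - 1)) / (within_ss / (n - k))
    return {
        "total_ss": total_ss,
        "between_ss": between_ss,
        "within_ss": within_ss,
        "eta_squared": eta_squared,
        "eta": eta_squared ** 0.5,
        "f": f_value,
    }


# Two categories of a hypothetical control variable.
result = oneway({1: [1, 2, 3], 2: [4, 5, 6]})
```

For this example the total sum of squares is 17.5, of which 13.5 lies between the group means and 4.0 within the groups, giving F = 13.5 on 1 and 4 degrees of freedom.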

Partial order scoring (POSCOR). Calculates ordinal scale scores from interval or ordinal scale variables. Scores are calculated for each case involved in analysis and they measure the relative position of the case within the set of cases. The scores, optionally with other user-specified variables, are output in the form of an IDAMS dataset.

Pearsonian correlation (PEARSON). Calculates Pearson's r correlation coefficients, covariances, and regression coefficients for raw scores. Pairwise or casewise deletion of missing data can be requested. Output correlation matrices can be saved in a file.

Rank-ordering of alternatives (RANK). Determines a reasonable rank-order of alternatives using preference data and three different ranking procedures, one based on classical logic and two others based on fuzzy logic. Preference data can represent either a selection or ranking of alternatives. Two types of individual preference relations can be specified: weak and strict. With fuzzy ranking, the data completely determine the results obtained whereas with classical ranking the user has the possibility of controlling the calculations.

Scatter diagrams (SCAT). Scatter diagrams, univariate statistics (mean, standard deviation and N) and bivariate statistics (Pearson's r and regression statistics: coefficient B and constant A).

Searching for structure (SEARCH). A binary segmentation procedure to develop predictive models. The question "what dichotomous split on which predictor variable will give the maximum improvement in the ability to predict values of the dependent variable", embedded in an iterative scheme, is the basis of the algorithm used.

Univariate and bivariate tables (TABLES). Options include: (1) univariate simple and cumulative frequency and percentage distributions; (2) univariate statistics: mean, median, mode, variance, standard deviation, skewness, kurtosis; (3) 2-way and 3-way frequency tables with row, column and total percentages; (4) tables of mean values of a dependent variable; (5) bivariate statistics: t-test of means between pairs of rows, Chi-square, contingency coefficient, Cramer's V, Kendall's Taus, Gamma, Lambdas, Spearman's rho, and 3 non-parametric tests: Wilcoxon, Mann-Whitney and Fisher.

Typology and ascending classification (TYPOL). Creates a typology variable as a summary of a large number of variables both quantitative and qualitative. The user chooses the initial and final number of groups, the type of distance used, and the way the initial typology is started. The groups of initial typology are stabilized using an iterative procedure. The number of groups can be reduced using an algorithm of hierarchical ascending classification. A distinction can be made between active variables which participate in the construction of typology, and passive variables, for which main statistics are calculated within the groups of the typology.

Interactive graphical exploration of data (GraphID). A separate component, GraphID, is available for exploring data through graphic displays. The basic display is in the form of multiple scatterplots for different pairs of variables. Additional information such as histograms and regression lines may be displayed on each plot. The plots may be manipulated in various ways. For example, selected cases can be marked in one plot and then highlighted in all the other plots. Parts of the display may be enlarged ("zoomed"). IDAMS matrices are displayed as three dimensional plots with rows and columns being represented by two of the axes and the third dimension being used to show the size of the statistic for each cell.

Interactive time series analysis (TimeSID). Another separate component, TimeSID, provides a possibility for interactive analysis of time series. It contains analysis of trends, auto-correlations and cross-correlations, statistical and graphical analysis of time series values, tests of randomness and trends, and forecasting for short terms. Series can be transformed by calculating averages, arithmetic compositions, sequential differences, rates of change and smoothed by moving averages.
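
Three of the series transformations mentioned above can be written in a few lines of Python. This is a sketch using common definitions (trailing moving average, first differences, relative change between consecutive values); TimeSID's exact definitions, for instance how the averaging window is positioned, are not specified here and may differ.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def sequential_differences(series):
    """Difference between each value and the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


def rates_of_change(series):
    """Relative change between consecutive values (previous value non-zero)."""
    return [(b - a) / a for a, b in zip(series, series[1:])]
```

For the series 1, 2, 3, 4, 5, a moving average with a window of 3 gives 2, 3, 4.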

Interactive multidimensional tables. The interactive "Multidimensional Tables" component allows you to visualize and customize multidimensional tables with frequencies, row, column and total percentages, summary statistics (sum, count, mean, maximum, minimum, variance, standard deviation) of additional variables, and bivariate statistics. Up to seven variables can be nested in rows or in columns. Construction of tables with specified row, column and cell variables can be repeated for each value of up to three "page" variables. The tables can also be printed, or exported in free format (comma or tabulation character delimited) or in HTML format.


1.5  Data in IDAMS

IDAMS dataset - the data file. The data file input to IDAMS may be any character (ASCII) fixed format file, i.e. the values for a given variable (case) occupy the same position in the record for every case.
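
To make the notion of a fixed format file concrete, the Python sketch below cuts each record into variables according to column positions. The `layout` mapping is a hypothetical stand-in for the information held in a dictionary; it does not reproduce the actual IDAMS dictionary format.

```python
def read_fixed_format(lines, layout):
    """Split fixed-format records into named fields.

    layout: dict mapping variable name -> (start column, width), 1-based.
    """
    cases = []
    for line in lines:
        case = {}
        for name, (start, width) in layout.items():
            case[name] = line[start - 1:start - 1 + width]
        cases.append(case)
    return cases


# Hypothetical layout: V1 in columns 1-3, V2 in column 4, V3 in columns 5-6.
layout = {"V1": (1, 3), "V2": (4, 1), "V3": (5, 2)}
lines = ["001125", "002230"]
cases = read_fixed_format(lines, layout)
```

Because every variable sits in the same columns of every record, no delimiters are needed to separate the values.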

Characteristics of the data file are:

IDAMS dataset - the dictionary. The dictionary is used to describe the data:

The combination of a dictionary and the data file it describes is known as an IDAMS dataset.

IDAMS matrices. Some analysis programs use a square or rectangular matrix as input rather than a raw data file.

The square matrix is used for symmetric arrays of bivariate statistics with a constant on the diagonal. Only the upper right-hand corner of the matrix is stored, without the diagonal.
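
The storage scheme for square matrices can be illustrated as follows. The Python sketch keeps only the values above the diagonal and rebuilds the full symmetric matrix from them and the diagonal constant. Storing the values row by row is an assumption made for the illustration; the actual record layout of an IDAMS matrix file is not reproduced here.

```python
def pack_upper(matrix):
    """Keep only the values above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def unpack_upper(packed, n, diagonal):
    """Rebuild the full symmetric matrix from the packed upper triangle."""
    full = [[diagonal] * n for _ in range(n)]
    values = iter(packed)
    for i in range(n):
        for j in range(i + 1, n):
            full[i][j] = full[j][i] = next(values)
    return full


# A small correlation matrix: symmetric, with 1.0 on the diagonal.
corr = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
packed = pack_upper(corr)
```

A matrix of n variables therefore needs only n(n-1)/2 stored values, here 3 instead of 9.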

The rectangular matrix is for non-symmetric arrays of values. The meaning of the rows and columns varies according to the IDAMS program.


1.6  IDAMS Commands and the "Setup" File

Data management and analysis facilities (described in 1.3 and 1.4, except WinIDAMS interactive components) are available in batch components called programs. Execution of an IDAMS program is launched by a "setup" file. This contains specifications such as file definitions, program control statements, variable recoding instructions, etc., separated by IDAMS commands (starting with a $ character) which identify the kind of information being specified. The first IDAMS command in the setup file always identifies the first program to be executed, e.g.


     $RUN TABLES
     $FILES
     DICTIN = name of dictionary file
     DATAIN = name of data file
     $SETUP
         program control statements for TABLES program
     $RECODE
         variable transformation statements

1.7  Standard IDAMS Features

Case selection. By default all cases in a file will be processed in a program execution. To select a subset, a filter statement is included in the setup file, e.g. INCLUDE V3=1 (include only those cases where variable 3 is equal to 1).

Variable selection. Variables are referenced by their variable numbers assigned in the dictionary. A set of variables is specified in a variable list following keywords such as VARS, CONVARS, OUTVARS. Such variable lists may also include R-variables constructed by the IDAMS Recode facility (see below), e.g. VARS=(V3-V6,V129,R100,R101).

Transforming/Recoding data. A powerful Recode facility permits the recoding of variables and the construction of new variables. Recoding instructions are prepared by the user in the IDAMS Recode language. This includes the possibility of arithmetic computation as well as the use of several special functions for operations such as the grouping of values, the creation of "dummy" variables, etc. Conditional statements are also allowed. Examples of Recode statements for constructing 3 new variables R100, R101 and R102 are:


     R100=V4+V5
     R101=BRAC(V10,0-15=1,16-60=2,61-98=3,99=9)
     IF (MDATA(V3,V4) OR V4 EQ 0) THEN R102=99 ELSE R102=V3*100/V4

The R-variables thus constructed for each case can be used temporarily in the program being executed or can be saved in a dataset using the TRANS program.
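
For readers more familiar with general-purpose languages, the logic of the three Recode statements might be expressed in Python roughly as follows. Several points are assumptions made for the sketch: both branches of the conditional are taken to assign R102 (as the sentence introducing the statements implies), MDATA is taken to be true when any listed variable holds a missing data code, the missing data codes themselves are invented, and ordinary floating-point division is used.

```python
# Hypothetical missing-data codes; in IDAMS these come from the dictionary.
MISSING_CODES = {"V3": {99}, "V4": {99}}


def is_missing(case, *names):
    """Assumed reading of MDATA: true if any listed variable is missing."""
    return any(case[name] in MISSING_CODES.get(name, set()) for name in names)


def brac_v10(value):
    """Group V10 into brackets: 0-15 -> 1, 16-60 -> 2, 61-98 -> 3, 99 -> 9."""
    if 0 <= value <= 15:
        return 1
    if 16 <= value <= 60:
        return 2
    if 61 <= value <= 98:
        return 3
    if value == 99:
        return 9
    return None  # value outside all brackets; Recode's behaviour is not shown


def recode(case):
    """Construct R100, R101 and R102 for one case."""
    r100 = case["V4"] + case["V5"]
    r101 = brac_v10(case["V10"])
    if is_missing(case, "V3", "V4") or case["V4"] == 0:
        r102 = 99
    else:
        r102 = case["V3"] * 100 / case["V4"]
    return {"R100": r100, "R101": r101, "R102": r102}
```

A case with V3=20, V4=40, V5=2 and V10=30 would thus receive R100=42, R101=2 and R102=50.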

Weighting data. When complex sampling procedures are used during data collection, it may be necessary to use different weights for cases during analysis. Such weights are usually stored as a variable in the data file. The WEIGHT parameter is then used in the program control statements to invoke weighting, e.g. WEIGHT=V5.
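
The effect of a weight variable on a simple statistic can be seen in the following Python sketch of a weighted mean. It is an illustration only: the function name and the variable values are invented, and IDAMS programs apply weights internally when the WEIGHT parameter is given.

```python
def weighted_mean(cases, variable, weight):
    """Mean of `variable` with each case counted `weight` times."""
    total_weight = sum(case[weight] for case in cases)
    weighted_sum = sum(case[variable] * case[weight] for case in cases)
    return weighted_sum / total_weight


# Hypothetical cases: V2 is the analysis variable, V5 the weight.
cases = [
    {"V2": 10, "V5": 1.0},
    {"V2": 20, "V5": 3.0},
]
mean_v2 = weighted_mean(cases, "V2", "V5")
```

The unweighted mean of V2 is 15, whereas the weighted mean is 17.5 because the second case counts three times as much as the first.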

Treatment of missing data and "bad" data. Special values for each numeric variable can be identified as missing data codes and stored in the dictionary. During data processing missing data is handled through two parameters:

Normally it is assumed that data have been cleaned prior to analysis. If this is not the case, then the BADDATA parameter is available for skipping cases with non-numeric values or blanks in numeric fields, or for treating such values as missing data.


1.8  Import and Export of Data

IDAMS does not use a special internal file format for storing data. Any fixed format character data file can be described by an IDAMS dictionary and then input to IDAMS. Thus if data from other software can be output into a fixed format text file, then there is no need for a special import function. Tab delimited and comma separated free format data files can be imported directly through the WinIDAMS User Interface. Moreover, the IMPEX program allows a fixed format IDAMS data file to be created from a text file in free or DIF format.

Data files created by IDAMS are always fixed format character files. Such files can be input directly to other software along with the appropriate data descriptive information for that software. Tab delimited and comma separated free format data files can be exported directly through the WinIDAMS User Interface. Moreover, the IMPEX program allows a fixed format IDAMS data file to be exported as a text file in free or DIF format.

IDAMS matrices are stored in a format specific to IDAMS (described in the "Data in IDAMS" chapter). The IMPEX program can be used to import/export free format matrices.


1.9  Exchange of Data Between CDS/ISIS and IDAMS

There is a separate program, WinIDIS, which prepares data description and performs data transfer between IDAMS and CDS/ISIS (the UNESCO software for database management and information retrieval). Such transfer is controlled by IDAMS and ISIS data description files (the IDAMS dictionary and the CDS/ISIS Field Definition Table). When going from ISIS to IDAMS, a new IDAMS dictionary and data files are always constructed and they can be match-merged with other data using IDAMS data management facilities. When going from IDAMS to ISIS, there are three possibilities: (1) a completely new data base can be constructed, (2) transferred records can be added to an existing data base as new data base records, (3) records of an existing data base can be updated with the transferred data.


1.10  Structure of this Manual

All the general features of IDAMS, including the Recode facility, are described in Part 1 of this Manual.

Part 2 includes installation instructions, a description of the files and folders used in WinIDAMS, a section entitled "Getting Started" which takes a user through the steps required to perform a simple task, and a description of the WinIDAMS User Interface.

In-depth descriptions of each IDAMS program are given in Parts 3 and 4. These write-ups contain the following sections:

    General Description. A statement of the primary purpose of the program.

    Standard IDAMS Features. Statements about the case and variable selection possibilities, data transformation, weighting capabilities, and missing data handling.

    Description of "printed" output. Details of results destined to be printed (or reviewed on the screen).

    Description of output and input files. One section for each IDAMS dataset, each matrix and each other distinct input or output data file, giving a description of data expected or generated by the program.

    Setup Structure. A designation of the file definitions, IDAMS commands, and program control statements needed to execute the program.

    Program Control Statements. The parameters and/or formats of each of the program control statements with an example of each type.

    Restrictions. A summary of the program limitations.

    Examples. Examples of complete sets of control statements for executing the program.

Part 5 provides a description of the WinIDAMS interactive components for graphical exploration of data, for time series analysis and for construction of multidimensional tables.

Details of statistical techniques, formulas and references for analysis programs can be found in Part 6.

Finally, errors issued by IDAMS programs are summarized in the Appendix.