Before starting analysis of
data with whatever software, data normally need to be validated. Such
validation typically comprises three stages:
Checking for non-numeric
values in numeric variables and the optional conversion of blank fields
to user specified numeric values is performed by the BUILD program.
Checking for other invalid codes is performed by the program CHECK
where what are valid codes are defined on special control statements
or taken from C-records in the dictionary describing the data. If
data are entered using the WinIDAMS User Interface, non-numeric characters
(except empty fields) in numeric fields are not allowed. Moreover,
there is a possibility of code checking during data entry and of an
overall check for invalid codes in the whole data file. C-records
in the dictionary are used for this purpose. Consistency checks
can be expressed in the IDAMS Recoding language and used with the
CONCHECK program to list cases with inconsistencies. Errors found
in any of these steps can be corrected directly through the User Interface
or by using the IDAMS program CORRECT. A typical sequence of steps
for data error detection and correction with IDAMS is described in
more detail below.
Steps 2, 3 and 4 are necessary only when cases are composed of
more than one record.
Perform steps 7 and 8 until no errors are reported.
Assign a "result" number to each consistency check and translate
the logic into Recode statements where the result is set to 1 for
an inconsistency, e.g.
IDAMS contains
an extensive set of facilities for generating indices, derived measures,
aggregations, and other transformations of the data, including alphabetic
recoding. The most frequently used capabilities are provided by the
Recode facility, which can perform temporary operations in all analysis
programs that input an IDAMS dataset. Results of recoding can be saved
as permanent variables using the TRANS program. These facilities operate
on variables within one case and permit recoding of the values of
one or more variables, generation of variables by combinations of
variables, control of the sequence of these operations through tests
of logical expressions, and a number of specialized statements and
functions. The necessary new dictionary information to describe the
results of the operations performed is automatically produced. For
aggregation across cases, the AGGREG program is available. AGGREG
provides arithmetic sums and related measures, ranges, and counts
of valid data values within groups of cases. Typical use of AGGREG
involves the prior use of the SORMER program to order the Data file
into the desired groups. There are a number of circumstances in
which it is necessary to combine the records from two different files,
for example, data collected at different points in time. As values
for variables for each new wave are received, the objective is to
add them to the record containing all the previous data for the same
respondent or case. The MERGE program will accomplish this, including
appropriate padding with missing data where respondents are not found
in the new wave. Similar examples occur when residuals or some form
of scale scores are generated for each case by an analysis program
and need to be included with the original data. A somewhat different
combination process occurs when data from different levels of analysis
are to be combined. One illustration of this is the addition of household
data to individual respondent's records. When a dataset is ordered
such that all respondents in the same household are together, MERGE
will provide the necessary duplicate record merge. A similar situation
occurs when group summaries from AGGREG are to be added to the records
for each case in each respective group. Another dataset combination
process, often also termed a merge, occurs when additional cases are
to be added to a dataset. The new records must be described by the
same dictionary as the original data. This type of merge may be achieved
with the SORMER program. Sub-setting functions are available as
temporary operations in most IDAMS programs (by using a "filter")
to select particular cases for processing. Permanent files containing
subsets of IDAMS datasets (a subset of variables or a subset of cases,
or both) may also be created. The SUBSET and TRANS programs are most
likely to be used for such tasks, although several other programs
that output datasets, such as MERGE, may also be used. Selection of
cases may be done on the basis that only certain cases are logically
of interest (such as only the female respondents), or it may be done
on a random basis using the Recode function RAND with the TRANS program.
A display of the actual values stored in an IDAMS dataset is often
of substantial help for checking the results from data modification
steps and indeed at any other stages. The LIST program is available
for this purpose, and allows complete listings of a selection of specific
cases and variables. The selection or filtering of cases for display
may be done using combinations of several variables in logical expressions;
an example would be a selection of only records for unmarried women
between 21 and 25 years of age. Numeric and alphabetic variables from
a dataset as well as variables constructed with Recode statements
can be listed. The User Interface also has an option to print the
data in a table format.
The paramount consideration
for the user in selecting analysis programs is whether the appropriate
statistical functions are provided. Guidance on such matters is well
beyond the scope of this manual. A summary of the functions of each
IDAMS analysis program can be found in the Introduction. More details
are given in the individual program write-ups. The formulas used for
computing the statistics in each program, and references are given
in relevant chapters of the part "Statistical Formulas and Bibliographic
References".
Suppose that an IDAMS dataset contains responses to a survey questionnaire
and includes the following variables: V11 gives the sex of the
respondent according to the following code:
V13 through V16 are attitudinal measures on different issues. The
variables are each coded to reflect the feelings of the respondent
as follows:
Cross-tabulations are desired between the recoded version
of the income variable, V12, and each of the attitudinal variables,
V13 to V16. Only the female respondents are to be selected for this
analysis. An IDAMS "setup" containing the necessary control statements
to perform this work is shown below. The numbers in parentheses on
the left identify each control statement and link it to the subsequent
explanation.
5.1  Data Validation with IDAMS
5.1.1  Overview
Like much other statistical software, IDAMS requires that there must
be the same amount of data for each case. If the data for one case
spans several records, then each case must comprise exactly the same
set of records. If certain variables are not applicable to some cases,
then "missing" values must none-the-less be assigned. Record merge
checking capabilities in IDAMS allow for checking that each case of
data has the correct set of records. This is performed by the program
MERCHECK which produces a "rectangular" output file where extra/duplicate
records have been deleted and cases with missing records have either
been dropped or else padded with dummy records. 5.1.2  Checking Data Completeness
V2 AREA
01 02 03 04 05
01 3 6 2
V1 PSU 02 10 4 2 8 5
03
.
.
This table could be compared with the interviewers' log-book
to check whether the data for all interviews taken exist in the file.
Whichever method is selected, MERCHECK should be re-executed on the
corrected file to make sure all errors have been dealt with. 5.1.3  Checking for Non-numeric and Invalid Variable Values
Note that corrections could also be done with the WinIDAMS
User Interface if the number of cases is not too large. However using
CORRECT is a less error prone method. 5.1.4  Consistency Checking
Perform steps 9 and 10 until no errors are reported. The data output
from the final execution of CORRECT will be ready for analysis.
PREGNANT (V32) = inapplicable if and only if SEX (V6) =
Male.
IF V6 EQ 1 AND V32 NE 9 THEN R1001=1
IF V6 NE 1 AND V32 EQ 9 THEN R1001=1 ELSE R1001=0
Use the set of Recode statements with CONCHECK to print
cases with errors. 5.2  Data Management/Transformation
5.3  Data Analysis
5.4  Example of a Small Task to be Performed with IDAMS
1. Male 2. Female 9. Not ascertained
V12 is the respondent's income in dollars (99999 = not ascertained).
1. Very positive 2. Positive 3. Neutral 4. Negative
5. Very negative 8. Don't know
Suppose that only a grouping or recoding of income levels is needed
of the following kind:
9. Not ascertained 0. The
question is irrelevant for this respondent
New code
Meaning
1
Income in the range $0 to $9999
2
Income in the range $10,000 to $29,999
3
Income $30,000 and over
9
Refused, Not ascertained, Don't know
(1) $RUN TABLES
(2) $FILES
(3) DICTIN = ECON.DIC
(4) DATAIN = ECON.DAT
(5) $RECODE
(6) R101=BRAC(V12,0-9999=1,10000-29999=2,30000-99998=3, -
(7) ELSE=9)
(8) NAME R101 'GROUPED INCOME'
(9) $SETUP
(10) INCLUDE V11=2
(11) EXAMPLE OF TABLES USING ECONOMIC DATA
(12) *
(13) TABLES
(14) ROWVARS=(R101,V13-V16)
(15) ROWVAR=R101 COLVARS=(V13-V16) CELLS=(FREQS,ROWPCT) STATS=CHI
Briefly, this is what each statement does: