11    Building an IDAMS Dataset (BUILD)


11.1  General Description

BUILD takes a raw data file, which may contain several records per case, together with a dictionary describing the required variables, and creates a new Data file with a single record per case containing values only for the specified variables. At the same time, it outputs an IDAMS dictionary describing the newly formatted Data file. In other words, BUILD creates an IDAMS dataset.

In addition to restructuring the data, BUILD also checks for non-numeric values in numeric variables.

Why use BUILD? Any IDAMS program can be used without first running BUILD, provided that an IDAMS dictionary is prepared separately. However, BUILD is recommended as a preliminary step since it:

- provides checks on the correct preparation of the dictionary,
- ensures that there is an exact match between the dictionary and the data,
- ensures that there are no unexpected non-numeric characters in the data,
- reduces the data to a compact form with a single record per case,
- recodes all blank fields to user-specified values.

Numeric variable processing. When BUILD processes a field as containing a numeric variable, it checks that the field either contains a recognizable number or is blank. If any other value occurs (e.g. '3J', '3-', '**2'), BUILD prints the sequential position of the case, the variable number associated with the field, and the input case, and uses a string of nines as the output value.

Processing rules are as follows:


                  Table showing examples of editing performed by BUILD
          and the contents of the output field for a 3-digit input numeric field
          ======================================================================
        Input  No.   MD1   Recoding   Output  Output    Error message
        value  dec.        specified   value   field
                                               width
        =====  ====  ===   =========  ======  ======    ===============
          032   0   9999      -         0032     4      -
           32   0             -          032     3      -
          3 2   0             -          999     3      embedded blanks in var ...
          32    0             -          999     3      embedded blanks in var ...
          -03   0             -          -03     3      -
           -3   0             -          -03     3      -
          - 3   0             -          -03     3      -
          3.2   0             -          003     3      -
           32   1             -          032     3      -
          .32   1             -          003     3      -
          3.2   1             -          032     3      -
          .32   2             -          032     3      -
          .35   1             -          004     3      -
          -.3   0             -          -00     3      -
          -.3   1             -          -03     3      -
          -03   1             -          -03     3      -
                -   8888      1         8888     4      (only if PRINT=RECODES)
                -             0          000     3      (only if PRINT=RECODES)
                -           None                 3      blanks in var ...
          A32   -             -          999     3      bad characters in var ...
          3-2   -             -          999     3      bad characters in var ...
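
The editing rules in the table can be approximated in a few lines of Python. This is a minimal sketch inferred from the table rows only, not BUILD's actual implementation: the function name edit_numeric, its arguments, the order in which the checks are applied, and the round-half-up rule are all assumptions. Blank-field recoding is left out, so an all-blank field is simply reported. Values too large for the output width are not handled.

```python
import re
from decimal import Decimal, ROUND_HALF_UP


def edit_numeric(field, ndec=0, width=None):
    """Return (output_value, message) for one numeric input field.

    Hypothetical helper: `ndec` is the number of decimals given in the
    dictionary, `width` the output field width (defaults to the input width).
    """
    width = len(field) if width is None else width
    nines = "9" * width
    text = field.lstrip(" ")
    if text == "":
        return " " * width, "blanks"
    # Only digits, a point, blanks and one leading minus sign are accepted.
    if re.search(r"[^0-9. -]", text) or "-" in text[1:] or text.count(".") > 1:
        return nines, "bad characters"
    negative = text.startswith("-")
    # Blanks between the sign and the first digit are tolerated ('- 3').
    digits = text[1:].lstrip(" ") if negative else text
    if " " in digits:
        return nines, "embedded blanks"
    if digits in ("", "."):
        return nines, "bad characters"
    if "." in digits:
        # An explicit point overrides the implied decimals (assumed: half up).
        scaled = Decimal(digits) * (10 ** ndec)
        magnitude = str(int(scaled.quantize(Decimal(1), rounding=ROUND_HALF_UP)))
    else:
        # No explicit point: the digits are kept as they are.
        magnitude = str(int(digits))
    if negative:
        return "-" + magnitude.zfill(width - 1), None
    return magnitude.zfill(width), None
```

For instance, edit_numeric(".35", ndec=1) gives ("004", None) and edit_numeric("3 2") gives ("999", "embedded blanks"), matching the corresponding table rows.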

11.2  Standard IDAMS Features

Case and variable selection. This program has no provision for selecting cases from the input data file. The standard filter is not available. By way of the variable descriptions, any subset of the fields within a case may be selected for the output data.

Transforming data. Recode statements may not be used.

Treatment of missing data. BUILD makes no distinction between substantive data and missing data values. However, blank fields may be replaced by missing data codes, zeros or nines.

11.3  Results

Input dictionary. (Optional: see the parameter PRINT). The "Brule" column of the dictionary listing contains the recoding rules for blank fields, as specified in column 64 of the input dictionary. Note that error messages for the dictionary are interspersed with the dictionary listing and do not contain a variable number. If the input dictionary is not printed, the errors may be difficult to identify.

Output dictionary. (Optional: see the parameter PRINT). Variable description records (T-records) are printed, with or without C-records (if any).

Output data file characteristic. The record length of the output data file is printed.

Data editing messages. For each case containing errors, the input case (up to 100 characters per line) and a report of errors in variable number order are printed.

Blank field recoding messages. (Optional: see the parameter PRINT). For each case containing blank fields that were recoded, a message to this effect is printed along with the input data case. If errors also occur in the case, these messages are integrated with the data editing messages.

11.4  Output Dataset

BUILD creates a Data file and a corresponding IDAMS dictionary, i.e. an IDAMS dataset. Note that the T-records always define the locations of variables in terms of starting position and field width.

The data file contains one record for each case. The record length is the sum of the field widths of all variables output and is determined by the BUILD program.

Numeric variable values. Numeric variable values are edited to a standard form as described in the "Numeric variable processing" paragraph above.

Alphabetic variable values. The data values for alphabetic variables are not edited and are the same on input and output.

Variable width. Normally BUILD assigns the width of a variable to be the same as the number of characters the variable occupies in the input data. However, if a missing data code has one more significant digit than the input field width, the output field width will be increased by one.

Variable location. BUILD assigns the output fields in variable number order. Thus, if the first two variables have output widths of 5 and 3, locations 1-5 are assigned to the first variable and 6-8 are assigned to the second, etc.
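
The width and location rules can be illustrated with a short Python sketch. The function name output_layout and the (variable number, input width, missing data codes) tuple layout are assumptions made for illustration, as is counting the digits of the absolute value of a missing data code.

```python
def output_layout(variables):
    """Return (varnum, start, end, width) tuples in variable number order.

    `variables` is a list of (varnum, input_width, md_codes) tuples; this
    layout is hypothetical and chosen only for the sketch.
    """
    layout = []
    start = 1
    for varnum, in_width, md_codes in sorted(variables, key=lambda v: v[0]):
        width = in_width
        # Widen the field by one if a missing data code needs one more digit.
        if any(len(str(abs(code))) == in_width + 1 for code in md_codes):
            width += 1
        layout.append((varnum, start, start + width - 1, width))
        start += width
    return layout
```

With output widths of 5 and 3 for the first two variables, output_layout([(1, 5, []), (2, 3, [])]) returns [(1, 1, 5, 5), (2, 6, 8, 3)], i.e. locations 1-5 and 6-8.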

Reference number and study ID. The reference number, if it is not blank, and study ID are the same as their input values. If the reference number field of an input T-record or C-record is blank, it is filled with the variable number.

11.5  Input Dictionary

This describes those variables that are to be selected for output. The format is as described in the "Data in IDAMS" chapter with column 64 of T-records being used to specify a recoding rule for blanks in a variable as follows:

blank - no recoding of blank fields,
0 - recode blank fields to zeros,
1 - recode blank fields to 1st missing data code for variable,
2 - recode blank fields to 2nd missing data code for variable,
9 - recode blank fields to 9's.
Note: The Dictionary window of the User Interface does not provide access to column 64. To fill in this column, use the WinIDAMS General Editor (File/Open/File Using General Editor) or any other text editor.
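
As an illustration of the column 64 rules, the following Python sketch maps a rule code to the value written for an all-blank field. The function name recode_blank and its arguments are hypothetical. Raising an error when the requested missing data code is not defined is an assumption, since the behaviour in that situation is not described here.

```python
def recode_blank(rule, width, md1=None, md2=None):
    """Return (output_value, is_error) for an all-blank numeric field.

    `rule` is the character found in column 64 of the T-record.
    """
    if rule == "0":
        return "0" * width, False
    if rule == "9":
        return "9" * width, False
    if rule in ("1", "2"):
        code = md1 if rule == "1" else md2
        if code is None:
            # Assumption: an undefined missing data code is a setup error.
            raise ValueError("missing data code %s is not defined" % rule)
        return str(code).zfill(width), False
    # Blank rule: the field is left blank and counted as an error.
    return " " * width, True
```

For example, recode_blank("1", 4, md1=8888) returns ("8888", False), while recode_blank(" ", 3) leaves the field blank and flags it as an error.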

11.6  Input Data

The data can be any fixed-length record file with one or more records per case, provided that each case has exactly the same number of records. The file should be sorted by record type within case ID. The values for any variable must be located in the same columns in the same record for every case.

If the input data has more than one record per case, MERCHECK should always be used prior to BUILD to ensure that the data do have the same set of records for each case.
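
The requirement can be pictured with a small Python sketch that counts records per case ID. It is not a substitute for MERCHECK: it only counts records and does not check record types or sort order. The function names and the ID column positions passed to them are hypothetical.

```python
from collections import Counter


def records_per_case(lines, id_start, id_end):
    """Count records per case ID; columns are 1-based and inclusive."""
    return Counter(line[id_start - 1:id_end] for line in lines)


def cases_with_wrong_count(lines, id_start, id_end, expected):
    """Return the sorted IDs of cases that do not have `expected` records."""
    counts = records_per_case(lines, id_start, id_end)
    return sorted(case for case, n in counts.items() if n != expected)
```

An empty list from cases_with_wrong_count means every case has the expected number of records.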

Note that BUILD does not accept data in exponential notation.


11.7  Setup Structure




     $RUN BUILD
   
     $FILES
          File specifications
 
     $SETUP
          1. Label
          2. Parameters
 
     $DICT (conditional)
          Dictionary
 
     $DATA (conditional)
          Data
 
 
     Files:
     DICTxxxx   input dictionary (omit if $DICT used)
     DATAxxxx   input data (omit if $DATA used)
     DICTyyyy   output dictionary
     DATAyyyy   output data
     PRINT      results (default IDAMS.LST)


11.8  Program Control Statements

Refer to "The IDAMS Setup File" chapter for further descriptions of the program control statements, items 1-2 below.

  1. Label (mandatory). One line containing up to 80 characters to label the results.
    
         Example:  FILE BUILDING STUDY A35
    
  2. Parameters (mandatory). For selecting program options.
    
         Example:  MAXERR=50
    
    INFILE=IN /xxxx

    A 1-4 character ddname suffix for the input Dictionary and Data files.
    Default ddnames: DICTIN, DATAIN.

    LRECL=80 /n

    The length of each input data record.
    (Used to check if variable starting locations on T-records are valid).

    MAXCASES=n

    The maximum number of cases to be used from the input file.
    Default: All cases will be used.

    VNUM=CONTIGUOUS /NONCONTIGUOUS

    CONT 
    Check that variables are numbered in ascending order and consecutively in the input dictionary.
    NONC 
    Check only that variables are numbered in ascending order.

    MAXERR=10 /n

    The maximum number of cases with errors (unrecoded blanks and non-numeric values for numeric variables) before BUILD terminates execution.

    OUTFILE=OUT /yyyy

    A 1-4 character ddname suffix for the output Dictionary and Data files.
    Default ddnames: DICTOUT, DATAOUT.

    PRINT=(RECODES, CDICT/DICT, OUTDICT /OUTCDICT/NOOUTDICT)

    RECO 
    Print input cases that contain one or more blank fields which have been recoded.
    CDIC 
    Print the input dictionary for all variables with C-records if any.
    DICT 
    Print the input dictionary without C-records.
    OUTD 
    Print the output dictionary without C-records.
    OUTC 
    Print the output dictionary with C-records if any.
    NOOU 
    Do not print the output dictionary.

11.9  Restrictions

  1. The maximum input case length is 4000 characters (record length * number of records per case).
  2. The maximum output data record length is 3600.
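
Both limits can be checked before a run with simple arithmetic, as in the Python sketch below. The function name check_restrictions and its arguments are hypothetical; the two constants are the limits stated above.

```python
MAX_INPUT_CASE_LENGTH = 4000     # record length * number of records per case
MAX_OUTPUT_RECORD_LENGTH = 3600  # sum of the output field widths


def check_restrictions(lrecl, records_per_case, output_widths):
    """Return a list of messages for any restriction that is violated."""
    problems = []
    case_length = lrecl * records_per_case
    if case_length > MAX_INPUT_CASE_LENGTH:
        problems.append("input case length %d exceeds %d"
                        % (case_length, MAX_INPUT_CASE_LENGTH))
    output_length = sum(output_widths)
    if output_length > MAX_OUTPUT_RECORD_LENGTH:
        problems.append("output record length %d exceeds %d"
                        % (output_length, MAX_OUTPUT_RECORD_LENGTH))
    return problems
```

With 80-character records and 3 records per case, the input case length is 240, well within the limit, so check_restrictions(80, 3, [1, 6, 4, 2]) returns an empty list.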

11.10  Examples

Example 1. Build an IDAMS dataset (dictionary and data file); input data records have a record length of 80 with 3 records per case; variables are numbered non-contiguously in the input dictionary; variable V2 is the complete ID (columns 5-10) while variables V3 and V4 contain the two parts of the ID (columns 5-8, 9-10 respectively); blank fields should be replaced by the first missing data code for variables V101, V122, V168, and by zeros for variable V169; blanks for V123 (age) should be treated as errors.


     $RUN BUILD
     $FILES
     DATAIN  = ABCDATA RECL=80               input Data file
     DICTOUT = ABC.DIC                       output Dictionary file
     DATAOUT = ABC.DAT                       output Data file
     $SETUP
     BUILDING AN IDAMS DATASET
     VNUM=NONC  MAXERR=200
     $DICT
        3   1 169   3
     T   1 TOWN CODE                 1 1 1 3                                 ID
     T   2 RESPONDENT ID               5  10                                 ID
     T   3 HOUSEHOLD NUMBER            5   8                                 ID
     T   4 RESPONDENT NUMBER           9  10                                 ID
     T 101 RESP POSITION IN FAMILY    13               0      9      1       QS1
     T 122 SEX                       225               9             1       QS2
     T 123 AGE                        48  49                                 QS2
     T 168 OCCUPATION                358  59          99     98      1       QS3
     T 169 INCOME                     61  65              99998      0       QS3

Example 2. Check for non-numeric characters in 4 numeric fields; the input data file has one record per case; records are identified by an alphabetic field; the 5 variables are not numbered contiguously; the output files normally produced by BUILD are not required and are defined as temporary files (extension TMP), which are automatically deleted by IDAMS at the end of execution.


     $RUN BUILD
     $FILES
     DATAIN  = A:NEWDATA RECL=256            input Data file
     DICTOUT = DIC.TMP                       temporary output Dictionary file
     DATAOUT = DAT.TMP                       temporary output Data file
     $SETUP
     CHECKING FOR AND REPORTING NON-NUMERIC CHARACTERS AND BLANKS
     VNUM=NONC  LRECL=256  PRINT=NOOU  MAXERR=200
     $DICT
        3   1  35   1   1
     T   1 RESPONDENT NAME             1  20 1
     T  21 AGE                        21   2
     T  22 INCOME                     29   6
     T  25 NO. WORK PLACES           129   1
     T  35 SCI. TITLE                201   1