BUILD takes a raw data file, which may contain several records per
case, along with a dictionary describing the required variables, and
creates a new Data file with a single record per case containing
values only for the specified variables. At the same time, it outputs
an IDAMS dictionary describing the newly formatted Data file; in other
words, an IDAMS dataset is created. In addition to restructuring the
data, BUILD also checks for non-numeric values in numeric variables.

Why use BUILD? Any IDAMS program can be used without first running
BUILD, provided that an IDAMS dictionary is prepared separately.
However, BUILD is recommended as a preliminary step since it:
Numeric variable processing. When BUILD processes a field
as containing a numeric variable, it checks that the field either
contains a recognizable number or is blank. If any other value
occurs, e.g. '3J', '3-', '**2', the sequential position of the case,
the variable number associated with the field, and the input case
are printed, and a string of nines is used as the output value.
Processing rules are as follows:
Case and variable selection. This program has no provision for
selecting cases from the input data file. The standard filter is not
available. By way of the variable descriptions, any subset of the
fields within a case may be selected for the output data.

Transforming data. Recode statements may not be used.

Treatment of missing data. BUILD makes no distinction between
substantive data and missing data values. However, blank fields may
be replaced by missing data codes, zeros or nines.
Input dictionary. (Optional: see the parameter PRINT). The "Brule"
column on the dictionary listing contains the recoding rules for
blank fields, as specified in col. 64 of the input dictionary. Note
that error messages for the dictionary are interspersed with the
dictionary listing and do not contain a variable number. If the input
dictionary is not printed, the errors may be difficult to identify.

Output dictionary. (Optional: see the parameter PRINT). Variable
description records (T-records) are printed with or without
C-records, if any.

Output data file characteristic. Record length of the output data
file.

Data editing messages. For each case containing errors, the input
case (up to 100 characters per line) and a report of errors in
variable number order are printed.

Blank field recoding messages. (Optional: see the parameter PRINT).
For each case containing blank fields that were recoded, a message to
that effect is printed along with the input data case. These messages
are integrated with the data editing messages if errors also occur in
the case.
BUILD creates a Data file and a corresponding IDAMS dictionary, i.e.
an IDAMS dataset. Note that the T-records always define the locations
of variables in terms of starting position and field width. The data
file contains one record for each case. The record length is the sum
of the field widths of all variables output and is determined by the
BUILD program.

Numeric variable values. Numeric variable values are edited to a
standard form as described in the "Numeric variable processing"
paragraph above.

Alphabetic variable values. The data values for alphabetic variables
are not edited and are the same on input and output.

Variable width. Normally BUILD assigns the width of a variable to be
the same as the number of characters the variable occupies in the
input data. However, if a missing data code has one more significant
digit than the input field width, the output field width will be
increased by one.

Variable location. BUILD assigns the output fields in variable number
order. Thus, if the first two variables have output widths of 5 and
3, locations 1-5 are assigned to the first variable and 6-8 are
assigned to the second, etc.

Reference number and study ID. The reference number, if it is not
blank, and study ID are the same as their input values. If the
reference number field of an input T-record or C-record is blank, it
is filled with the variable number.
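
The "Variable location" rule above amounts to a running sum over the output field widths. The following Python sketch illustrates only that arithmetic; the function name `assign_locations` is hypothetical and is not part of IDAMS.

```python
def assign_locations(widths):
    """Assign consecutive output locations to fields given in variable number order.

    `widths` is a list of output field widths. Returns a list of
    (start, end) positions (1-based, inclusive) and the record length.
    """
    locations = []
    start = 1
    for width in widths:
        locations.append((start, start + width - 1))
        start += width
    # The record length is the sum of all output field widths.
    return locations, start - 1


# The example from the text: widths 5 and 3 give locations 1-5 and 6-8.
locations, record_length = assign_locations([5, 3])
print(locations, record_length)
```

The returned record length corresponds to the "Output data file characteristic" reported in the results.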
The input dictionary describes those variables that are to be
selected for output. The format is as described in the "Data in
IDAMS" chapter, with column 64 of T-records being used to specify a
recoding rule for blanks in a variable as follows:

The input data can be any fixed-length record file with one or more
records per case, provided that there are exactly the same number of
records for each case. The file should be sorted by record type
within case ID. The values for any variable must be located in the
same columns in the same record for every case. If the input data has
more than one record per case, MERCHECK should always be used prior
to BUILD to ensure that the data do have the same set of records for
each case. Note that exponential notation is not accepted by BUILD.

11.1  General Description
Table showing examples of editing performed by BUILD
and the contents of the output field for a 3-digit input numeric field

=======  ====  ====  =========  ======  ======  ==========================
Input    No.   MD1   Recoding   Output  Output  Error message
value    dec.        specified  value   field
                                        width
=======  ====  ====  =========  ======  ======  ==========================
032      0     9999  -          0032    4       -
32       0           -          032     3       -
3 2      0           -          999     3       embedded blanks in var ...
32       0           -          999     3       embedded blanks in var ...
-03      0           -          -03     3       -
-3       0           -          -03     3       -
- 3      0           -          -03     3       -
3.2      0           -          003     3       -
32       1           -          032     3       -
.32      1           -          003     3       -
3.2      1           -          032     3       -
.32      2           -          032     3       -
.35      1           -          004     3       -
-.3      0           -          -00     3       -
-.3      1           -          -03     3       -
-03      1           -          -03     3       -
(blank)  -     8888  1          8888    4       (only if PRINT=RECODES)
(blank)  -           0          000     3       (only if PRINT=RECODES)
(blank)  -           None               3       blanks in var ...
A32      -           -          999     3       bad characters in var ...
3-2      -           -          999     3       bad characters in var ...
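
The error classes shown in the table can be approximated in a few lines of Python. This is a rough sketch and not BUILD's actual implementation: the function name `check_numeric_field` is hypothetical, and the handling of blanks (leading blanks and blanks after a minus sign are accepted, any other blank counts as embedded) is an assumption inferred from the table. Decimal scaling and zero padding of valid values are not reproduced.

```python
import re

# Unsigned number: digits with an optional decimal point, or a leading point.
# Exponential notation is deliberately not matched.
_UNSIGNED = re.compile(r"\d+\.?\d*|\.\d+")


def check_numeric_field(field):
    """Return (status, replacement) for one raw fixed-width field.

    status is 'ok', 'blank', 'embedded blanks' or 'bad characters';
    replacement is a string of nines for the two error statuses, else None.
    """
    nines = "9" * len(field)
    body = field.lstrip(" ")          # assumption: leading blanks are padding
    if body == "":
        return "blank", None
    if body[0] == "-":                # assumption: blanks may follow the sign
        body = body[1:].lstrip(" ")
    if " " in body:
        return "embedded blanks", nines
    if _UNSIGNED.fullmatch(body):
        return "ok", None
    return "bad characters", nines


for raw in ["032", "3 2", "- 3", ".35", "A32", "3-2", "   "]:
    print(repr(raw), check_numeric_field(raw))
```

A blank field is reported separately from the two error classes because whether it is an error depends on the recoding rule given in the dictionary.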
11.2  Standard IDAMS Features
11.3  Results
11.4  Output Dataset
11.5  Input Dictionary
blank - no recoding of blank fields,
0     - recode blank fields to zeros,
1     - recode blank fields to 1st missing data code for variable,
2     - recode blank fields to 2nd missing data code for variable,
9     - recode blank fields to 9's.
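
The column 64 codes listed above can be read as a small lookup from rule to output value. The Python sketch below illustrates that mapping only; `recode_blank` and its parameters are hypothetical names, and zero-padding a missing data code to the field width is an assumption. The widening by one digit follows the "Variable width" rule.

```python
def recode_blank(rule, width, md1="", md2=""):
    """Return the output value for a blank field, or None if it is not recoded.

    `rule` is the character from column 64 of the T-record, `width` the
    input field width, `md1`/`md2` the missing data codes as digit strings.
    """
    if rule == "0":
        return "0" * width
    if rule == "9":
        return "9" * width
    if rule in ("1", "2"):
        code = md1 if rule == "1" else md2
        if not code:
            raise ValueError("rule %s needs a missing data code" % rule)
        # Assumption: the code is zero-padded; a longer code widens the field.
        return code.zfill(max(width, len(code)))
    if rule in ("", " "):
        return None  # no recoding; the blank field is left for error reporting
    raise ValueError("unknown recoding rule: %r" % rule)


print(recode_blank("1", 3, md1="8888"))  # MD1 wider than the 3-digit field
print(recode_blank("0", 3))
```

The first call mirrors the table row in which a blank 3-digit field with MD1 of 8888 produces a 4-digit output field.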
11.6  Input Data
$RUN BUILD
$FILES
File specifications
$SETUP
1. Label
2. Parameters
$DICT (conditional)
Dictionary
$DATA (conditional)
Data
Files:
DICTxxxx input dictionary (omit if $DICT used)
DATAxxxx input data (omit if $DATA used)
DICTyyyy output dictionary
DATAyyyy output data
PRINT results (default IDAMS.LST)
Refer to "The IDAMS Setup File" chapter for further descriptions of the program control statements, items 1-2 below.
Example: FILE BUILDING STUDY A35
Example: MAXERROR=50
INFILE=IN /xxxx
LRECL=80 /n
MAXCASES=n
VNUM=CONTIGUOUS /NONCONTIGUOUS
MAXERR=10 /n
OUTFILE=OUT /yyyy
PRINT=(RECODES, CDICT/DICT, OUTDICT /OUTCDICT/NOOUTDICT)
Example 1. Build an IDAMS dataset (dictionary and data file); input data records have a record length of 80 with 3 records per case; variables are numbered non-contiguously in the input dictionary; variable V2 is the complete ID (columns 5-10) while variables V3 and V4 contain the two parts of the ID (columns 5-8, 9-10 respectively); blank fields should be replaced by the first missing data code for variables V101, V122, V168, and by zeros for variable V169; blanks for V123 (age) should be treated as errors.
$RUN BUILD
$FILES
DATAIN = ABCDATA RECL=80 input Data file
DICTOUT = ABC.DIC output Dictionary file
DATAOUT = ABC.DAT output Data file
$SETUP
BUILDING AN IDAMS DATASET
VNUM=NONC MAXERR=200
$DICT
3 1 169 3
T 1 TOWN CODE 1 1 1 3 ID
T 2 RESPONDENT ID 5 10 ID
T 3 HOUSEHOLD NUMBER 5 8 ID
T 4 RESPONDENT NUMBER 9 10 ID
T 101 RESP POSITION IN FAMILY 13 0 9 1 QS1
T 122 SEX 225 9 1 QS2
T 123 AGE 48 49 QS2
T 168 OCCUPATION 358 59 99 98 1 QS3
T 169 INCOME 61 65 99998 0 QS3
Example 2. Check for non-numeric characters in 4 numeric fields; the input data file has one record per case; records are identified by an alphabetic field; the 5 variables are not numbered contiguously; the output files normally produced by BUILD are not required and are defined as temporary files (extension TMP), which are automatically deleted by IDAMS at the end of execution.
$RUN BUILD
$FILES
DATAIN = A:NEWDATA RECL=256 input Data file
DICTOUT = DIC.TMP temporary output Dictionary file
DATAOUT = DAT.TMP temporary output Data file
$SETUP
CHECKING FOR AND REPORTING NON-NUMERIC CHARACTERS AND BLANKS
VNUM=NONC LRECL=256 PRINT=NOOU MAXERR=200
$DICT
3 1 35 1 1
T 1 RESPONDENT NAME 1 20 1
T 21 AGE 21 2
T 22 INCOME 29 6
T 25 NO. WORK PLACES 129 1
T 35 SCI. TITLE 201 1