lab statement
where lab is an optional 1-4 character label starting in position
1 of the line and followed by at least one blank. Unlabelled statements
must start in position 2 or beyond.
To give some idea of how the elements of the Recode language fit together,
a sample set of Recode statements is listed under 4.2 Sample Set of Recode Statements.
Except in the special functions MAX, MEAN, MIN, STD, SUM and VAR, Recode does not
automatically check the values of variables for missing data. The user must therefore
control specifically for missing data before doing calculations with
variables. The MDATA function is available for this purpose, and
missing data codes can be assigned to R- or V-variables
with the MDCODES definition statement; examples of both are listed under
4.3 Missing Data Handling.
Sometimes a set of Recode statements does not assign a
value to an R-variable for a particular data record. The R-variable
will then keep the default MD1 value of 1.5 * 10^9 to which
it is initialized. To change this to a more acceptable missing data
value, test whether the value is large and, if so, assign an appropriate
missing data value.
Syntax checking and interpretation. Recode statements are read and analyzed for
errors prior to interpretation of other IDAMS program control statements
and prior to program execution. If errors are found, diagnostic messages
are printed and execution of the program is terminated.
Results. Recode prints out the Recode statements input by the user along with
any syntax errors detected. This occurs before the program is executed,
i.e. before the interpretation of the program control statements is
printed.
Initialization before starting to process the Data file. If there are no
syntax errors, tables, missing data codes,
names, etc. are initialized (according to the initialization/definition
statements supplied by the user) before starting to read the data.
R-variables in CARRY statements are initialized to zero.
Initialization before processing each data case. At the start of processing of
each case and before execution of the Recode statements for that case,
all R-variables, except those listed in CARRY statements, are initialized
to the IDAMS internal default missing data value (1.5 * 10^9).
Execution of Recode statements. The actual recoding takes
place after the data for a case is read and after the main filter
has been applied. Cases not passing the filter are not passed to the
recoding routines. Recode variables cannot therefore be used in main
filters. The use of the Recode statements is sequential (i.e. the
first statement is used first, then the second, third, etc.) except
as modified by GO TO, BRANCH, RETURN, REJECT, ENDFILE, ERROR statements
(the control statements). When all statements have been used, the
case is passed to the IDAMS program being executed. When the IDAMS
program has finished using the case, the next case passing the main
filter is processed, the R-variables (except the CARRY variables)
being reinitialized to missing data and the Recode statements executed
for that case and so on until the end of the data file is reached.
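The processing cycle described above can be sketched in Python. This is only an illustration of the order of events, not IDAMS code: the names run_recode, main_filter and demo_recode, and the use of a dictionary for R-variables, are assumptions made for the sketch.

```python
from collections import defaultdict

MD_DEFAULT = 1.5e9  # IDAMS internal default missing data value (1.5 * 10^9)

def run_recode(cases, main_filter, recode, carry=()):
    """Yield (case, r_vars) in the order cases would reach the IDAMS program."""
    carried = {name: 0 for name in carry}          # CARRY variables start at zero, once
    for case in cases:
        if not main_filter(case):                  # filtered-out cases never reach Recode
            continue
        r_vars = defaultdict(lambda: MD_DEFAULT)   # all other R-variables reset per case
        r_vars.update(carried)
        recode(case, r_vars)                       # statements executed in sequence
        carried = {name: r_vars[name] for name in carry}
        yield case, r_vars

def demo_recode(case, r):
    # Hypothetical statements: R1 counts cases, R2 is assigned only sometimes.
    r["R1"] += 1
    if case["V1"] > 10:
        r["R2"] = 1

cases = [{"V1": 5}, {"V1": 9}, {"V1": 12}]
results = [(r["R1"], r["R2"])
           for _, r in run_recode(cases, lambda c: c["V1"] >= 8, demo_recode, carry=("R1",))]
```

In this sketch the first case fails the main filter, the second leaves R2 at the default missing data value, and R1 keeps counting across cases because it is carried.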
Testing Recode statements. Errors in logic can be made which
are not detectable by the Recode facility. To check the intended results
against those generated by Recode, the Recode statements should be
tested on a few records using the LIST program with the parameter
MAXCASES set, say, to 10. The data values for the input variables
and the corresponding result variables can then be inspected.
Files used by Recode. When a $RECODE command is encountered in the Setup
file, subsequent lines are copied into a work file on unit FT46. The
RECODE program reads Recode statements from this file and analyzes
them for errors prior to interpretation of other IDAMS program control
statements and prior to program execution. If errors are found, diagnostic
messages are printed and execution of the entire IDAMS step is terminated.
Interpreted statements are written in the form of tables to a work
file on unit FT49 from where they are read by the IDAMS program being
executed. Messages about Recode statements are written to unit
FT06 along with results from the IDAMS program being executed.
Variables. Variables
in Recode refer either to input variables (V-variables) or result
variables (R-variables). They are defined as follows:
Result variables (Rn). "R" followed by a number (1 to 9999). These are variables that
are created by the user. R-variables (except for those listed in CARRY
statements - see below) are initialized to the default missing value
of 1.5 * 10^9 before processing of each case. To use
an R-variable in a program, specify an R (instead of V) on the variable
list attached to a keyword parameter (e.g. WEIGHT=R50 or VARS=(R10-R20)).
When printed out by programs, a result variable number is sometimes
identified by a negative sign. Thus, variable "10" is V10 and variable
"-10" is R10. It is less confusing to use numbers for the result variables
which are distinct from input variable numbers. R-variables are always
numeric.
Character constants. Character constants are enclosed in single primes (e.g. 'ABCXYZ',
'M'). A prime within a character constant must be represented by two
adjacent primes (e.g. DON'T would be written: 'DON''T'). Character
constants are used in the NAME statement to assign names to new variables.
They can also be used in logical expressions to test values of alphabetic
variables (e.g. IF V10 EQ 'M'); only the first 4 characters are used
in such comparisons, and constant or variable values shorter than
4 characters are padded on the right with blanks. Character constants cannot
be used in arithmetic functions (except BRAC).
Arithmetic operators.
Arithmetic operators are used between arithmetic operands. Available
operators, in precedence order, are:
Relational operators. Relational operators are used
to determine whether or not two arithmetic values have a particular
relationship to one another. The relational operators are:
An expression is a representation
of a value. A single constant, variable, or function reference is
an expression. Combinations of constants, variables, functions and
other expressions with operators are also expressions. Recode can
evaluate arithmetic and logical expressions. Note that brackets can
be used anywhere in an expression to clarify the order in which it
is to be evaluated.
Arithmetic expressions. Arithmetic expressions
are created using arithmetic operators and variables, constants and
arithmetic functions. They yield a numeric value. Examples are listed
under 4.7 Expressions.
Arithmetic functions all return a single numeric value. The argument list for a function
can be a simple list enclosed in parentheses or a highly structured list
involving both keyword elements and elements in specific positions
in the list. The available functions are listed under 4.8 Arithmetic Functions.
4.1  Rules for Coding
4.2  Sample Set of Recode Statements
$RECODE
IF V5 LT 8 THEN REJECT (exclude cases where V5 < 8)
IF NOT MDATA(V6) THEN R51=TRUNC(V6/4) -
ELSE R51=0
R52=BRAC(V10,0-24=1,25-49=2,50-74=3, - (group values of V10)
74-99=4,TAB=1)
R53=BRAC(V11,TAB=1) (group V11 the same way as V10)
IF V26 INLIST(1-10) THEN R54=1 AND -
R55=1 ELSE R54=2
IF R54 EQ 1 THEN GO TO L1
R55=99
R56=V15 + V35
GO TO L2
L1 R56=99
L2 R57=COUNT(1,V20-V27,V29) (count how many of the listed
variables have the value 1)
NAME R52 'GROUPED AGE', -
R53 'GROUPED AGE AT MARRIAGE'
MDCODES R55(99),R56(99)
4.3  Missing Data Handling
IF MDATA (V5,V6) THEN R1=999 ELSE R1=V5+V6
There are two additional functions, MD1 and MD2, which
return the 1st or 2nd missing data code value for a variable; e.g.
R2=MD1(V6)
assigns R2 the value of the 1st missing data code of
V6.
MDCODES R3(8,9)
assigns 8 and 9 as the 1st and 2nd missing data codes
for R3.
IF R100 GT 1000000 THEN R100=99
MDCODES R100(99)
4.4  How Recode Functions
4.5  Basic Operands
Input variables (Vn). "V" followed by a number. These are variables
as defined by the input dictionary. Their values may be changed by
Recode (e.g. V10=V10+V11). Variables should normally be numeric but
alphabetic variables of not more than 4 characters can also be used,
in particular, they can be recoded to numeric values.
Numeric constants. Constants may be integer or decimal, positive
or negative, e.g. 3, 5.5, -50, -0.5.
4.6  Basic Operators
Logical operators. Logical operators are used between
logical operands. Logical operands take only the values "true" or
"false". These are:
4.7  Expressions
V732 (the value of V732)
44 (the constant 44)
R67/V807 + 25 (25 plus the value of R67 divided by the value of V807)
LOG(R10) (the log of the value of R10)
Logical expressions. Logical expressions are
evaluated to a "true" or "false" value. Logical variables do not exist
in the Recode language, so that the result of logical expressions
cannot be assigned to a variable. Logical expressions can only be
used in IF statements. Examples are:
R5 EQ V333
True if the value of R5 is equal to the value of V333,
and false otherwise.
(V62 GT 10) OR (R5 EQ V333)
True if either of the logical expressions results in
a true value, and false if both result in a false value.
MDATA(V10,R20) AND V9 GT 2
True if the value of V10 or the value of R20 is a missing
data code and the value of V9 is larger than 2, false otherwise.
4.8  Arithmetic Functions
| Function | Example | Purpose |
| ABS | ABS(R3) | Absolute value |
| BRAC | BRAC(V5,TAB=1,ELSE=9,1-10=1,11-20=2) | Univariate grouping |
|  | BRAC(V10,'F'=1,'M'=2) | Alphabetic recoding |
| COMBINE | COMBINE V1(2), V42(3) | Combination of 2 variables |
| COUNT | COUNT(1,V20-V25) | Counting occurrences of a value across a set of variables |
| LOG | LOG(V2) | Logarithm to the base 10 |
| MAX | MAX(V10-V20) | Maximum value |
| MD1,MD2 | MD1(V3) | Value of missing data code |
| MEAN | MEAN(V5-V8,MIN=2) | Mean value |
| MIN | MIN(V10-V20) | Minimum value |
| NMISS | NMISS(V3-V6) | Number of missing data values |
| NVALID | NVALID(V3-V6) | Number of non-missing values |
| RAND | RAND(0) | Random number |
| RECODE | RECODE V7,V8,(1/1)(1/2)=1,(2-3/3)=2, ELSE=0 | Multivariate recoding |
| SELECT | SELECT (BY=V10,FROM=R1-R5,9) | Selecting the value of one of a set of variables according to an index variable |
| SQRT | SQRT(V2) | Square root |
| STD | STD(V20-V25,MIN=4) | Standard deviation |
| SUM | SUM(V6,V8,V9-V12,MIN=3) | Sum of values |
| TABLE | TABLE(V5,V3,TAB=2,ELSE=9) | Bivariate recoding |
| TRUNC | TRUNC(V26/3) | Integer part of the argument's value |
| VAR | VAR(V6,R5-R10,MIN=7) | Variance |
The exact syntax for each function is given below.
ABS. The ABS function returns a value which is the absolute value of the argument passed to the function.
Prototype: ABS(arg)
Where arg is any arithmetic expression for which the absolute value is to be taken.
Example:
R5=ABS(V5-V6)
BRAC. The BRAC function returns a value which
is derived from performing specified operations (rules) upon a single
variable.
Prototype: BRAC(var [,TAB=i] [,ELSE=value] [,rule1,...,rule n] )
Where:
Examples:
R1=BRAC(V10,TAB=1,ELSE=9,1-10=1,11-20=2,<0=0)
The value of R1 will be 1 if V10 is in the range
1 to 10, 2 if V10 is in the range 11 to 20, and 0 if V10 is less than
0. If V10 has any other value, e.g. 10.5, 25 or 0, then the ELSE
clause would be applied, and R1 would be 9. These bracketing rules
are labelled table 1 so they can be re-used, e.g.
R2=V1 + BRAC(V2, TAB=1) * 3
In this example V2 would be bracketed by the same rules
as for V10 in the previous example. R2 would be set to V1 + (the result
of bracketing multiplied by 3).
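The bracketing in the two examples above can be imitated with a short Python sketch. It assumes that ranges are inclusive at both ends and that the first matching rule wins; brac, in_range and TABLE_1 are names invented for the illustration and are not part of Recode. Under this reading a negative value such as -3 is caught by the <0 rule and gives 0.

```python
def in_range(low, high):
    """Build a test for an inclusive range such as 1-10 (assumed inclusive)."""
    return lambda v: low <= v <= high

def brac(value, rules, else_value=None):
    """Return the code of the first rule whose test matches, else else_value."""
    for test, code in rules:
        if test(value):
            return code
    return else_value

# Rules of the first example: 1-10=1, 11-20=2, <0=0 (kept for re-use, like TAB=1)
TABLE_1 = [
    (in_range(1, 10), 1),
    (in_range(11, 20), 2),
    (lambda v: v < 0, 0),
]

# R1 for several hypothetical values of V10
r1_values = [brac(v, TABLE_1, else_value=9) for v in (5, 15, -3, 10.5, 25, 0)]

# R2 = V1 + BRAC(V2,TAB=1) * 3, with hypothetical values V1=4 and V2=12
r2 = 4 + brac(12, TABLE_1, else_value=9) * 3
```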
R100=BRAC(V10,'F'=1,'M'=2,ELSE=9)
This is an example of recoding an alphabetic variable,
which has values 'F' or 'M', to numeric values of 1 and 2.
COMBINE. The COMBINE function returns a unique value for each combination of values of the variables that are used as arguments. This function is normally used with categorical variables.
Prototype: COMBINE var1(n1), var2(n2),...,varm(nm)
Where:
V1 + (n1 * V2) + (n1 * n2 * V3) + (n1 * n2 * n3 * V4) etc.
The user, however, would normally determine the result of the function by listing the combinations of values in a table as in the first example below.
R1=COMBINE V6(2), R330(3)
Assume that V6 has two codes (0,1) representing men and
women respectively and R330 has three codes (0,1,2) representing young,
middle aged and old respondents. The statement will then combine the codes
of V6 and R330 to give a single variable R1 as follows:
V6 R330 R1
0 0 0 Young men
1 0 1 Young women
0 1 2 Middle aged men
1 1 3 Middle aged women
0 2 4 Old men
1 2 5 Old women
Since V6 has two codes, and R330 has three, R1 will have
six. In the above example, if V6 had codes 1 and 2 instead of 0 and
1, the maximum value should be stated as "3". This would allow for
the values of 0,1, and 2, although code 0 would never appear. To avoid
these "extra" codes, the user should first recode such variables to
give a contiguous set of codes starting from 0, e.g. BRAC(V6,1=0,2=1).
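The formula V1 + (n1 * V2) + (n1 * n2 * V3) + ... can be checked against the table above with a few lines of Python. The function name combine and the tuple arguments are assumptions made for this sketch only.

```python
def combine(values, sizes):
    """Apply V1 + n1*V2 + n1*n2*V3 + ... for codes that start at 0."""
    total, weight = 0, 1
    for value, size in zip(values, sizes):
        total += weight * value
        weight *= size          # weight becomes n1, then n1*n2, and so on
    return total

# R1=COMBINE V6(2), R330(3): every combination of sex (0,1) and age group (0,1,2)
r1_by_codes = {(v6, r330): combine((v6, r330), (2, 3))
               for r330 in range(3) for v6 in range(2)}
```

Each of the six combinations receives a distinct code from 0 to 5, matching the listing for young men through old women.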
Restrictions:
COUNT. The COUNT function returns a value which is equal to the number of times the value of a variable or constant occurs as the value of one of the variables in the list "varlist".
Prototype: COUNT(val,varlist)
Where:
Examples:
R3=COUNT(1,V20-V25)
R3 will be assigned a value equal to the number of times
the value 1 occurs in the 6 variables V20-V25. This might be used
for example to count the number of "YES" responses by a respondent
to a set of questions.
R5=COUNT(V1,V8-V10)
R5 will be assigned a value equal to the number of times
that the value of V1 occurs also as the value of variables V8-V10.
LOG. The LOG function returns a floating-point value which is the logarithm to the base 10 of the argument passed to the function.
Prototype: LOG(arg)
Where arg is any arithmetic expression for which the log to the base 10 is to be taken.
Examples:
R10=LOG(V30)
Note: The logarithm of any number X to any other base
B can readily be found by the following simple transformation:
R1=LOG(X)/LOG(B)
For the natural logarithm (base e), this becomes simply:
R1=2.302585 * LOG(X). Thus R1=2.302585 * LOG(V30) will assign to R1 the natural logarithm of variable 30.
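The two transformations can be verified numerically. In the Python sketch below, recode_log stands in for the Recode LOG function (base 10); the other function names are illustrative only.

```python
import math

def recode_log(x):
    """Stand-in for the Recode LOG function: logarithm to the base 10."""
    return math.log10(x)

def log_base(x, b):
    """Logarithm of x to base b, as in R1=LOG(X)/LOG(B)."""
    return recode_log(x) / recode_log(b)

def natural_log(x):
    """Natural logarithm, as in R1=2.302585 * LOG(X)."""
    return 2.302585 * recode_log(x)

log2_of_8 = log_base(8, 2)                             # close to 3
ln_error = abs(natural_log(30) - math.log(30))         # small, limited by the 7-digit constant
```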
MAX. The MAX function returns the maximum value in a set of variables. Missing data values are excluded. The MIN argument can be used to specify the minimum number of valid values for a maximum to be calculated; if fewer valid values are present, the default missing data value 1.5 * 10^9 is returned.
Prototype: MAX(varlist [,MIN=n] )
Where:
Example:
R12=MAX(V20-V25)
MD1, MD2. The MD1 (or MD2) function returns a
value which is the first (or second) missing data code of the variable
given as the argument.
Prototype: MD1(var) or MD2(var)
Where var is any input variable (V-variable) or previously defined result variable (R-variable).
Example:
R12=MD2(V20)
For each case processed, R12 will be assigned the second
missing data code for input variable V20.
MEAN. The MEAN function returns the mean value of a set of variables. Missing data values are excluded. The MIN argument can be used to specify the minimum number of valid values for a mean to be calculated; if fewer valid values are present, the default missing value 1.5 * 10^9 is returned.
Prototype: MEAN(varlist [,MIN=n] )
Where:
Example:
R15=MEAN(R2-R4,V22,V5,MIN=2)
The result will be the mean of the specified variables,
if at least two of the variables have non-missing values. Otherwise,
the result will be 1.5 * 10^9.
MIN. The MIN function returns the minimum value in a set of variables. Missing data values are excluded. The MIN argument can be used to specify the minimum number of valid values for a minimum to be calculated; if fewer valid values are present, the default missing value 1.5 * 10^9 is returned.
Prototype: MIN(varlist [,MIN=n] )
Where:
Example:
R10=MIN(V5,V7,V9,R2)
NMISS. The NMISS function returns the number
of missing values in a set of variables.
Prototype: NMISS(varlist)
Where varlist is a list of V- and R-type variables.
Example:
R22=NMISS(R6-R10)
The returned value depends on how many of the variables
R6 - R10 have missing values. The maximum value is 5 for a case in
which all 5 variables have missing data.
NVALID. The NVALID function returns the number of valid values (non-missing values) in a set of variables.
Prototype: NVALID(varlist)
Where varlist is a list of V- and R-type variables.
Example:
R2=NVALID(V20,V22,V24)
The returned value depends on how many of the variables
have valid values. The maximum value of 3 will be obtained if all
3 variables have valid values. 0 will be returned if all 3 are missing.
RAND. The RAND function returns a value which is a uniformly distributed random number based upon the arguments "starter" and "limit" as described below.
Prototype: RAND(starter [,limit] )
Where:
Examples:
R1=RAND(0)
IF RAND(0) NE 1 THEN REJECT
For each case processed, R1 will be set equal to a random
number, uniformly distributed from 1 to 10. The sequence is initialized
to the clock time the first time RAND is executed. Note that RAND
can be used with the REJECT statement to select a random sample of
cases. The second example will include a random 1/10 sample
of cases.
RECODE. The RECODE function returns one value based upon the concurrent values of m variables.
Prototype: RECODE var1,var2,...,varm [,TAB=i] [,ELSE=value] [,rule1,rule2,...,rule n]
Where:
The prototype for a rule is:
(a1/a2/.../am)(b1/b2/.../bm)...(x1/x2/.../xm)=c
Each code list contains a list and/or a range of values for every variable, e.g. with two variables, (3/2)(6-9/4)(0/1,3,5)=1.
(a1/a2/a3)=c (the function will return c if var1=a1 and var2=a2 and var3=a3)
(a1|a2|a3)=c (the function will return c if var1=a1 or var2=a2 or var3=a3)
Examples:
R7=RECODE V1,V2,(3/5)(7/8)=1,(6-9/1-6)=2
R7 will be assigned a value based on the values of V1
and V2. In this example, R7 will be set to 1 if V1=3 and V2=5, or
if V1=7 and V2=8. R7 will be set to 2 if V1=6-9 and V2=1-6. In all
other instances, R7 will be unchanged (see above).
R7=RECODE V1,V2,TAB=1,ELSE=MD1(R7),(3/5)(7/8)=1,(6-9/1-6)=2
R7 will be assigned the same value as in the preceding
example, except that R7 will be set equal to its MD1 value when the
rules are not met. The TAB=1 will allow these rules to be used in
another RECODE function call.
Restriction: When the RECODE function is used, it must be the only operand on the right-hand side of the equals sign.
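The way the rules of the RECODE examples are matched can be sketched in Python. The sketch covers only the "/" (and) form of code lists, treats ranges as inclusive, and returns the result of the first rule that matches; these points, and the names matches, recode and RULES, are assumptions of the illustration.

```python
def matches(value, spec):
    """spec lists single values and/or (low, high) ranges for one variable."""
    for item in spec:
        if isinstance(item, tuple):
            if item[0] <= value <= item[1]:
                return True
        elif value == item:
            return True
    return False

def recode(values, rules, else_value):
    """Return the result of the first rule with a code list matching all values."""
    for code_lists, result in rules:
        for code_list in code_lists:
            if all(matches(v, spec) for v, spec in zip(values, code_list)):
                return result
    return else_value

RULES = [
    ([([3], [5]), ([7], [8])], 1),     # (3/5)(7/8)=1
    ([([(6, 9)], [(1, 6)])], 2),       # (6-9/1-6)=2
]

R7_MD1 = 99                            # hypothetical MD1 code for R7, as in ELSE=MD1(R7)
r7_values = [recode(pair, RULES, R7_MD1) for pair in ((3, 5), (7, 8), (8, 4), (3, 8))]
```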
SELECT. The SELECT function returns the value of the variable or constant in the FROM list holding the same position as the value of the BY variable. (Warning: If the value of the BY variable is less than 1 or greater than the number of variables in the FROM list, a fatal error results). There may be up to 50 items in the FROM list. The maximum value of the BY variable is therefore 50. A SELECT function may be combined with other functions, operations, and variables to form a complex expression. Note: The SELECT function selects the value of one of a set of variables; the SELECT statement selects the variable to be used for the result. (See section "Special Assignment Statements" for description of SELECT statement).
Prototype: SELECT (FROM=list of variables and/or constants, BY=variable)
Example:
R10=SELECT (FROM=R1-R3,9,BY=V2)
R10 will take the value of R1, R2, R3 or 9 for values
of 1, 2, 3 or 4 respectively of V2.
SQRT. The SQRT function returns a value which is the square root of the argument passed to the function.
Prototype: SQRT(arg)
Where arg is any arithmetic expression.
Example:
R5=SQRT(V5)
STD. The STD function returns the standard deviation
of the values of a set of variables. Missing data values are excluded.
The MIN argument can be used to specify the minimum number of valid
values for a standard deviation to be calculated; if fewer valid values
are present, the default missing value 1.5 * 10^9 is returned.
Prototype: STD(varlist [,MIN=n] )
Where:
Example:
R5=STD(V20-V24,R56-R58,MIN=3)
SUM. The SUM function returns the sum of the
values of a set of variables. Missing values are excluded. The MIN
argument can be used to specify the minimum number of valid values
for a sum to be calculated; if fewer valid values are present, the
default missing value 1.5 * 10^9 is returned.
Prototype: SUM(varlist [,MIN=n] )
Where:
Example:
R8=SUM(V20,V22,V24,V26,MIN=3)
If three or more of the variables have valid values,
the sum of these is returned. Otherwise the value 1.5 * 10^9
is returned.
TABLE. The TABLE function returns a value based on the concurrent values of two variables.
Prototype: TABLE (r, c, [TAB=i,] [ELSE=value,] [PAD=value,] COLS c1,c2,...,cm,
ROWS r1(row r1 values),r2(row r2 values),...,rn(row rn values))
Where:
Examples: Assume the following table:
        Col:  1  2  3  4  5  6
Row:  2       1  1  2  2  3  4
      3       1  2  2  2  3  4
      5       1  2  2  2  3  4
      6       3  3  3  3  3  4
      8       9  9  9  9  9  9
R1=TABLE (V6, V4, TAB=1, ELSE=0, PAD=9, COLS 1-6, ROWS 2(1,1,2,2,3,4), -
3(1,2,2,2,3,4),5(1,2,2,2,3,4),6(3,3,3,3,3,4),8(9))
If V6 equals 5 and V4 equals 3, then R1 will be assigned
the value 2 (intersect of row 5 and column 3).
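The lookup in this example can be mimicked in Python. The sketch assumes that PAD fills out rows given with fewer values than there are columns (as row 8 suggests) and that ELSE is returned when the row or column value is not in the table; table_lookup, COLS and ROWS are names made up for the illustration.

```python
def table_lookup(row, col, cols, rows, pad, else_value):
    """Return the cell at (row, col), padding short rows; else_value if absent."""
    if row not in rows or col not in cols:
        return else_value
    values = rows[row] + [pad] * (len(cols) - len(rows[row]))
    return values[cols.index(col)]

COLS = [1, 2, 3, 4, 5, 6]                      # COLS 1-6
ROWS = {                                       # ROWS 2(...),3(...),5(...),6(...),8(9)
    2: [1, 1, 2, 2, 3, 4],
    3: [1, 2, 2, 2, 3, 4],
    5: [1, 2, 2, 2, 3, 4],
    6: [3, 3, 3, 3, 3, 4],
    8: [9],
}

r1 = table_lookup(5, 3, COLS, ROWS, pad=9, else_value=0)         # V6=5, V4=3
r1_padded = table_lookup(8, 4, COLS, ROWS, pad=9, else_value=0)  # row 8 filled by PAD
r1_else = table_lookup(4, 1, COLS, ROWS, pad=9, else_value=0)    # row 4 not in table
```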
R5=TABLE (3, V8, TAB=7, ELSE=TABLE(V1,V8,TAB=1) )
This will use the table named "7" with 3 as the row index
and the value of V8 as the column index. If a value of V8 is not in
table 7 then the table "1" will be used with row index V1 and column
index V8.
TRUNC. The TRUNC function returns the integer value of an argument.
Prototype: TRUNC(arg)
Where arg is any arithmetic expression for which the integer value is to be taken.
Example:
R5=TRUNC(V5)
R5 will be assigned the value of the input variable V5
truncated to an integer.
VAR. The VAR function returns the variance of the values of a set of variables, excluding missing data. The MIN argument can be used to specify the minimum number of valid values for the variance to be calculated; if fewer valid values are present, the default missing value 1.5 * 10^9 is returned.
Prototype: VAR(varlist [,MIN=n] )
Where:
Example:
R9=VAR(V5-V10)
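The missing data handling shared by MAX, MEAN, MIN, STD, SUM and VAR can be illustrated with MEAN and SUM. In this Python sketch missing values are represented by None, and a MIN of 1 is assumed when the argument is omitted; both choices, and the function names, are assumptions of the sketch rather than documented behaviour.

```python
MD_DEFAULT = 1.5e9  # default missing data value, 1.5 * 10^9

def valid_only(values):
    """Drop missing values (represented here by None)."""
    return [v for v in values if v is not None]

def recode_mean(values, min_valid=1):
    """Mean of the valid values, or MD_DEFAULT if fewer than min_valid exist."""
    valid = valid_only(values)
    if len(valid) < max(min_valid, 1):
        return MD_DEFAULT
    return sum(valid) / len(valid)

def recode_sum(values, min_valid=1):
    """Sum of the valid values, or MD_DEFAULT if fewer than min_valid exist."""
    valid = valid_only(values)
    if len(valid) < max(min_valid, 1):
        return MD_DEFAULT
    return sum(valid)

mean_ok = recode_mean([2, None, 4], min_valid=2)          # two valid values
mean_md = recode_mean([2, None, None], min_valid=2)       # only one valid value
sum_ok = recode_sum([1, 2, None, 4], min_valid=3)         # three valid values
```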
Logical functions return a value of "true" or "false" when evaluated. They cannot be used as arithmetic operands. Logical functions are used in logical expressions and logical expressions comprise the test portion of conditional "IF test THEN..." statements. The available functions are:
| Function | Example | Purpose |
| EOF | IF EOF THEN GO TO NEXT | Checks for the end of the data file |
| INLIST | IF V5 INLIST(2,4,6) THEN R100=1 ELSE R100=0 | Searches a list of values |
| MDATA | IF MDATA(V5,V6) THEN R101=99 | Checks for missing data |
EOF. The EOF function is used for aggregation of values across cases. See example 10 in section "Examples of Use of Recode Statements". The presence of the EOF function causes the Recode statements to be executed once more after the end-of-file has been encountered. The value of the EOF function is true during this after-end-file pass of the Recode statements and is false at all other times.
For the final pass through the Recode statements, V-variables will have the value they had after the last case was fully processed. R-variables (except those listed in CARRY statements) will be reinitialized to 1.5 * 10^9. CARRY R-variables will be left untouched. The user must be careful to set up a correct path to be followed through the Recode statements when end-of-file is reached.
Prototype: EOF
Example:
IF R1 NE V1 OR EOF THEN GO TO L1
INLIST. The INLIST function (abbreviated IN)
returns a value of "true" if the result of an arithmetic expression
is one of a specified set of values. If the expression equals a value
outside the set of values, the function returns a value of "false".
Prototype: expr INLIST(values) or expr IN(values)
Where:
Examples:
IF R12 INLIST(1-5,9,10) THEN V5=0
If R12 has a value of 1,2,3,4,5,9 or 10, the INLIST function
returns a value of "true", and input variable V5 is set to 0. Otherwise,
INLIST returns a value of "false" and input variable V5 retains its
original value.
IF (V3 + V7) IN(2,4,5,6) THEN R1=1 ELSE R1=9
If the sum of input variables V3 and V7 results in the
value 2,4,5, or 6, then INLIST returns a value of "true" and result
variable R1 will contain the value 1. Otherwise, INLIST returns a
value of "false" and R1 will be set to 9.
MDATA. The MDATA function returns a value of "true" if any of the variables passed to the function have missing data values; otherwise, the function returns a value of "false". This function is used quite often, since missing data is not automatically checked in the evaluation of expressions except in the MAX, MEAN, MIN, STD, SUM and VAR functions.
Prototype: MDATA(varlist)
Where varlist is a list of V- and R-variables. There can be a maximum of 50 variables in this list.
Example:
IF MDATA(V1,V5-V6) THEN R1=MD1(R1) ELSE R1=V1+V5+V6
If any variable in the list V1, V5, V6 has a value equal
to its MD1 code or in the range specified by its MD2 code, the MDATA
function will return a value of "true", and result variable R1 will
be set to its first missing data code. Otherwise, the MDATA function
will return a value of "false" and R1 is set to the sum of V1, V5,
V6.
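The test performed by MDATA in this example can be sketched in Python. The text above only says that a value is missing if it equals MD1 or lies "in the range specified by" MD2; the sketch assumes that a non-negative MD2 flags values greater than or equal to it and a negative MD2 flags values less than or equal to it. That range rule, the sample codes in CODES and all function names are assumptions, not statements from this section.

```python
def is_missing(value, md1=None, md2=None):
    """True if value equals MD1 or falls in the range implied by MD2."""
    if md1 is not None and value == md1:
        return True
    if md2 is not None:
        # ASSUMPTION: MD2 >= 0 flags values >= MD2; MD2 < 0 flags values <= MD2.
        return value >= md2 if md2 >= 0 else value <= md2
    return False

def mdata(case, codes, names):
    """True if any listed variable holds a missing data value."""
    return any(is_missing(case[n], *codes.get(n, (None, None))) for n in names)

def recode_r1(case, codes, r1_md1):
    """IF MDATA(V1,V5-V6) THEN R1=MD1(R1) ELSE R1=V1+V5+V6"""
    if mdata(case, codes, ("V1", "V5", "V6")):
        return r1_md1
    return case["V1"] + case["V5"] + case["V6"]

# Hypothetical (MD1, MD2) codes for the three input variables
CODES = {"V1": (9, None), "V5": (98, 99), "V6": (None, -1)}

r1_valid = recode_r1({"V1": 1, "V5": 2, "V6": 3}, CODES, r1_md1=999)
r1_missing = recode_r1({"V1": 1, "V5": 99, "V6": 3}, CODES, r1_md1=999)
```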
These are the main structural units of the Recode language. They are used to assign a value to a result. Any number between 1 and 9999 may be used for an R-variable but it avoids confusion if the R-numbers are distinct from V-numbers of variables in the input dictionary, e.g. if there are 22 variables in the dictionary then start numbering R-variables from R30. Assignment statements can also be used to assign a new value to an input variable. In this case the original value of the input variable is lost for the duration of the particular IDAMS program execution.
Prototype: variable=expression
Where:
Examples:
R10=5
R10 is assigned the constant 5 as its value.
R5=2*V10 + (V11 + V12)/2
Any arithmetic expression may be used and parentheses
are used to change normal precedence of the arithmetic operators.
V20=SQRT(V20)
The value in V20 is replaced by its square root using
the SQRT function.
R20=BRAC(V6,0-15=1,16-25=2,26-35=3,36-90=4,ELSE=9)
R20 is assigned the value 1, 2, 3, 4 or 9 according to
the group into which the value of V6 falls.
R10=MD1(V10)
R10 is assigned a value equal to V10's first missing
data code.
DUMMY. The DUMMY statement produces a series of "dummy variables", coded 0 or 1, from a single variable.
Prototype: DUMMY var1,...,varn USING var(val1)(val2)...(valn)[ELSE expression]
Where:
Example:
DUMMY R1-R3 USING V8(1-4)(5,7,9)(0,8) ELSE 99
The following chart shows the values of R1, R2 and R3
based on different V8 values:
V8: 1 2 3 4 5 7 8 9 0 OTHER
R1: 1 1 1 1 0 0 0 0 0 99
R2: 0 0 0 0 1 1 0 1 0 99
R3: 0 0 0 0 0 0 1 0 1 99
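The chart can be reproduced with a small Python sketch. It assumes integer codes, so that the range 1-4 is written out as the set {1,2,3,4}; the function name dummy and the GROUPS list are illustrative only.

```python
def dummy(value, groups, else_value):
    """Return one 0/1 flag per group, or else_value for all if no group matches."""
    flags = [1 if value in group else 0 for group in groups]
    if any(flags):
        return flags
    return [else_value] * len(groups)

# DUMMY R1-R3 USING V8(1-4)(5,7,9)(0,8) ELSE 99
GROUPS = [{1, 2, 3, 4}, {5, 7, 9}, {0, 8}]

chart = {v8: dummy(v8, GROUPS, 99) for v8 in (1, 2, 3, 4, 5, 7, 8, 9, 0, 6)}
```

Here the value 6 stands for the OTHER column of the chart.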
SELECT. The SELECT statement causes the variable
in the FROM list holding the same position as the value of the BY
variable to be set equal to the value of the expression to the right
of the equals sign, i.e. it selects which variable is to be
assigned a value. If the value of the BY variable is less than 1 or
greater than the number of variables in the FROM list, a fatal error
results. The maximum number of items in the FROM list is 50. Therefore
the maximum value of the BY variable is 50.
Prototype: SELECT (FROM=variable list, BY=variable)=expression
Examples:
SELECT (FROM=R1,V3-V10, BY=R99)=1
SELECT (BY=V1, FROM=V8,R2,R5)=R7*5
In the first example, R1 will be set to 1 if R99 equals
1; V3 will be set to 1 if R99 equals 2; ... ; and V10 will be set
to 1 if R99 equals 9. If R99 is greater than 9 or less than 1, a fatal
error will result. The values of the eight variables not selected
will not be altered. SELECT may be used to form a loop as follows:
R99=1
L1 SELECT (BY=R99, FROM=R1,V3-V10)=0
IF R99 LT 9 THEN R99=R99+1 AND GO TO L1
The nine variables R1, V3-V10 will be set to zero, one
after another, as R99 is incremented from 1 to 9. The loop is completed
when R99 equals 9 and all variables have been initialized.
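The loop above can be traced with a Python sketch in which a dictionary stands in for the nine variables. The function select_assign, the starting value 7 and the use of ValueError for the fatal error are assumptions made for the illustration.

```python
def select_assign(variables, from_list, by, value):
    """Set the variable in position `by` (1-based) of from_list to value."""
    if not 1 <= by <= len(from_list):
        raise ValueError("BY value outside the FROM list: fatal error")
    variables[from_list[by - 1]] = value

FROM_LIST = ["R1"] + [f"V{i}" for i in range(3, 11)]   # R1,V3-V10: nine variables
variables = {name: 7 for name in FROM_LIST}            # arbitrary starting values

r99 = 1                                                # R99=1
while True:
    select_assign(variables, FROM_LIST, r99, 0)        # L1 SELECT (BY=R99,...)=0
    if r99 < 9:                                        # IF R99 LT 9 THEN R99=R99+1
        r99 += 1                                       #    AND GO TO L1
    else:
        break
```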
Recode statements are normally executed on each data case in order from first to last. The order can be changed with one of the control statements:
| Statement | Example | Purpose |
| BRANCH | BRANCH (V16,L1,L2) | Branch depending on the value of a variable |
| CONTINUE | CONTINUE | Continue with next statement |
| ENDFILE | ENDFILE | Do not process any more data cases after this one |
| ERROR | ERROR | Terminate execution completely |
| GO TO | GO TO TOWN | Branch unconditionally |
| REJECT | REJECT | Reject the current data case |
| RELEASE | RELEASE | Release the current data case to the program for processing and then execute recode statements again without reading another case |
| RETURN | RETURN | Use the current case for analysis with no further recoding |
BRANCH. The BRANCH statement changes the sequence in which statements are executed, depending on the value of a variable.
Prototype: BRANCH(var,labels)
Where:
BRANCH(R99,LAB1,LAB2,LAB3)
Transfer is made to LAB1, LAB2, or LAB3, depending on
whether R99 has a value of 1, 2, or 3.
CONTINUE. CONTINUE is a simple statement which performs no operation. It is used as a convenient transfer point.
Prototype: CONTINUE
Example:
IF V17 EQ 10 THEN GO TO AT
R10=V11
GO TO THAT
AT R20=V11*100
THAT CONTINUE
ENDFILE. The ENDFILE statement causes the Recode
facility to close the input dataset exactly as if an end-of-file had
been reached. If the EOF function has been specified, the EOF function
will be given a true value for a final pass through the Recode statements
from the beginning, after ENDFILE has been executed.
Prototype: ENDFILE
Example:
IF V1 EQ 100 THEN ENDFILE
This statement can be used to test a set of Recode statements
or an IDAMS setup on the first n cases of a dataset.
ERROR. The ERROR statement directs the Recode facility to terminate execution with an error message that indicates the number of the case and the number of the Recode statement at which the error occurred.
Prototype: ERROR
Example:
IF R6 EQ 2 THEN GO TO B
ERROR
B CONTINUE
GO TO. The GO TO statement is used to change
the sequence in which the statements are executed. In the absence
of a GO TO or a BRANCH statement, each statement is executed sequentially.
Prototype: GO TO label
Where label is a 1-4 character statement label. The statement identified by the label may be physically before or after the GO TO statement. (Warning: Be careful of referencing a statement before the GO TO, as an endless loop can be formed).
Example:
GO TO TOWN
.
.
R10=R5
GO TO 1
TOWN R10=R5+V11
1 R11=...
REJECT. The REJECT statement directs the Recode
facility to reject the present case and obtain another case. The new
case is then processed from the beginning of the Recode statements.
Thus, REJECT can be used as a filter with R-variables.
Prototype: REJECT
Example:
IF MDATA (V8,V12-V13) THEN REJECT
RELEASE. The RELEASE statement directs the Recode
facility to release the present case to the program for processing
and to regain control after the processing without reading another
case. After regaining control, Recode resumes with the first Recode
statement. RELEASE can be used to break up a single record into several
cases for analysis. Note: When using the RELEASE statement, care should
be taken that processing will not continue indefinitely.
Prototype: RELEASE
Example:
CARRY (R1)
R1=R1+1
IF R1 LT V1 THEN RELEASE ELSE R1=0
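The effect of this example can be traced with a Python generator. Each yield marks a point at which the case is handed to the program, either through RELEASE or because the end of the Recode statements was reached; the function name expand and the record layout are assumptions of the sketch.

```python
def expand(records):
    """Yield (record, R1) each time a case would be passed to the program."""
    r1 = 0                              # CARRY (R1): set to zero once, before the data
    for record in records:              # a new record is read only after a normal pass
        while True:
            r1 += 1                     # R1=R1+1
            if r1 < record["V1"]:       # IF R1 LT V1 THEN RELEASE
                yield record, r1        # case released; statements run again
                continue
            r1 = 0                      # ELSE R1=0
            yield record, r1            # end of statements: case passed as usual
            break

passes = [r1 for _, r1 in expand([{"V1": 3}, {"V1": 1}])]
```

With V1 equal to 3 the record is passed to the program three times, with R1 taking the values 1, 2 and 0; a record with V1 equal to 1 is passed once.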
RETURN. The RETURN statement directs the Recode
facility to return control to the IDAMS program. No other Recode statements
are executed for the current case.
Prototype: RETURN
Example:
IF V8 LT 12 THEN GO TO A
RETURN
A R10=V8
The IF statement allows conditional assignment and/or conditional control. It is a compound statement with several simple statements connected by the keywords THEN, AND and ELSE.
Prototype:
Examples:
IF V5 EQ V6 THEN R1=1 ELSE R1=2
Set R1 to 1 if the value of V5 equals the value of V6;
otherwise set R1 to 2.
IF MDATA(V7,V10-V12) THEN R6=MD1(V7) AND R10=99 -
ELSE R6=V7+V10+V11 AND R10=V12*V7
Set R6 to V7's first missing data value and R10 to 99
if any of the variables V7, V10, V11, V12 are equal to their missing
data codes. Otherwise set R6 equal to the sum of V7, V10 and V11,
and also set R10 equal to the product of V12 and V7.
IF (V5 NE 7 AND R8 EQ 9) THEN V3=1 ELSE V3=0
Set V3 to 1 if both V5 is not equal to 7 and R8 is equal
to 9. (Note: The parentheses are not required).
IF MDATA(V6) OR V10 LT 0 THEN GO TO X
If the value of V6 is missing or V10 is less than 0,
branch to the statement labelled X; otherwise continue with the next
statement.
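As a rough illustration of how a compound IF with AND and ELSE branches behaves, the second example above (the one using MDATA and MD1) can be mirrored in Python. The function name `recode_r6_r10`, the dictionary inputs and the sample missing data codes are hypothetical; missing data is tested here by exact match against the listed codes, which is an assumption of the sketch.

```python
def recode_r6_r10(values, md_codes):
    """Mirror: IF MDATA(V7,V10-V12) THEN R6=MD1(V7) AND R10=99
               ELSE R6=V7+V10+V11 AND R10=V12*V7"""
    checked = ("V7", "V10", "V11", "V12")
    any_missing = any(values[name] in md_codes.get(name, ()) for name in checked)
    if any_missing:
        r6 = md_codes["V7"][0]  # MD1(V7): first missing data code of V7
        r10 = 99
    else:
        r6 = values["V7"] + values["V10"] + values["V11"]
        r10 = values["V12"] * values["V7"]
    return r6, r10


# Hypothetical missing data codes for the illustration.
md_codes = {"V7": (9,), "V10": (99,), "V11": (99,), "V12": (99,)}
complete = recode_r6_r10({"V7": 2, "V10": 3, "V11": 5, "V12": 6}, md_codes)
incomplete = recode_r6_r10({"V7": 2, "V10": 99, "V11": 5, "V12": 6}, md_codes)
```

Both assignments in a branch are carried out together, which is what the AND keyword expresses in the Recode statement.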
Initialization/definition statements are executed once, before processing of the data starts, to initialize values to be used during the execution of Recode statements. They cannot be used in expressions and they cannot have labels.
CARRY. The CARRY statement causes the values of the variables listed to be carried over from case to case. CARRY variables are initialized to zero only once, before the data are read. They can be used as counters or as accumulators for aggregation.
Prototype: CARRY(varlist)
Where varlist is a list of R-variables.
Example:
CARRY(R1,R5-R10,R12)
MDCODES. The MDCODES statement changes dictionary
missing data codes for input variables or assigns missing data codes
for result variables. Defaults used by Recode for R- and V-variables
with no dictionary missing data specification and no MDCODES specification
are MD1=1.5 * 10^9 and MD2=1.6 * 10^9.
Prototype: MDCODES (varlist1)(md1,md2),(varlist2)(md1,md2), ..., (varlistn)(md1,md2)
Where:
Examples:
MDCODES V5(8,9)
The first missing data code for V5 will be 8; the second
missing data code will be 9.
MDCODES (R9-R11)(,99), V7(8,9), V6(9)
For R9, R10 and R11, the first missing data code will
be 1.5 * 10^9 and the second missing data code will be 99.
NAME. The NAME statement assigns names to R-variables or renames V-variables.
Prototype: NAME var1 'name1', var2 'name2', ..., varn 'namen'
Where:
Example:
NAME R1 'V5 + V6', V1 'PERSON''S STATUS'
Suppose a data file exists with the following variables:
| V1 | Village ID | |
| V2 | Sex | 1=male, 2=female |
| V4 | Age | 21-98, 99=not stated |
| V5 | Education level | 1=primary, 2=secondary, 3=university, 9=not stated |
| V8 | Income from 1st job | |
| V9 | Income from 2nd job | |
| V10 | Partner's income | |
| V21 | Weight in kg (one decimal) | |
| V22 | Height in meters (2 decimals) | |
| V31 | Owns car? | 1=yes, 2=no, 9=NS |
| V32 | Owns TV? | |
| V33 | Owns stereo? | |
| V34 | Owns freezer? | |
| V35 | Owns microcomputer? | |
| V41 | Number of children | |
| V42 | Age of 1st child | |
| V43 | Age of 2nd child | |
| V44 | Age of 3rd child | |
| V45 | Age of 4th child | |
IF NVALID(V8,V9) EQ 0 THEN R101=-1 AND GO TO END
IF NVALID(V8,V9) EQ 2 THEN R101=V8+V9 AND GO TO END
IF MDATA(V8) THEN R101=V9 ELSE R101=V8
END CONTINUE
MDCODES R101(-1)
or
R101=SUM(V8,V9,MIN=1)
IF R101 EQ 1.5 * 10 EXP 9 THEN R101=-1
MDCODES R101(-1)
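Both variants above aim at the same result: add up whichever of the two incomes are valid, and fall back to -1 when neither is. A minimal Python sketch of that logic follows. The function name `total_income` is hypothetical, and missing values are represented by `None` rather than by missing data codes, which is a simplification for the illustration.

```python
def total_income(v8, v9):
    """Sum the valid incomes; return -1 (the missing data code) if none is valid."""
    valid = [value for value in (v8, v9) if value is not None]
    if not valid:  # corresponds to NVALID(V8,V9) EQ 0
        return -1
    return sum(valid)  # one or two valid values, as with SUM(...,MIN=1)


both = total_income(100, 200)
only_second = total_income(None, 200)
neither = total_income(None, None)
```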
IF MDATA(R101) OR R101 EQ 0 THEN REJECT
IF MDATA(V10) THEN V10=0
IF MDATA(R101) THEN R102=MD1(R102) -
ELSE R102=R101 * .75 + V10 * .25
NAME R102'Composite income'
MDCODES R102(99999)
R103=BRAC(V21,30-50=1,50-70=2,70-200=3,ELSE=9)
Note that V21 is recorded with a decimal place. To make
sure that values such as 50.2 get assigned to a category, ranges in
the BRAC statement should overlap. Recode works from left to right
and assigns the code for the first range into which the case falls.
Thus a value of 50.0 will fall in category 1 but a value of 50.1 will
fall into category 2. To put values of 50 in the 2nd category, use
R103=BRAC(V21, <50=1, <70=2, <200=3, ELSE=9)
A value of 49 would fit in all 3 ranges, but Recode will
use the first valid range it finds (code 1). A value of 50 will not
satisfy the first range and will be assigned code 2.
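The left-to-right, first-match rule described above can be checked with a small Python sketch. It only imitates the two BRAC calls shown here; the function names `brac_ranges` and `brac_below` are hypothetical, and treating both ends of a range such as 30-50 as inclusive is an assumption inferred from the text (50.0 falls in category 1).

```python
def brac_ranges(value, ranges, else_code):
    """First-match recode over (low, high, code) ranges, both ends inclusive."""
    for low, high, code in ranges:
        if low <= value <= high:
            return code
    return else_code


def brac_below(value, limits, else_code):
    """First-match recode over (limit, code) pairs meaning 'value < limit'."""
    for limit, code in limits:
        if value < limit:
            return code
    return else_code


RANGES = [(30, 50, 1), (50, 70, 2), (70, 200, 3)]  # 30-50=1,50-70=2,70-200=3
LIMITS = [(50, 1), (70, 2), (200, 3)]              # <50=1,<70=2,<200=3

codes_ranges = [brac_ranges(v, RANGES, 9) for v in (50.0, 50.1, 250)]
codes_below = [brac_below(v, LIMITS, 9) for v in (49, 50, 250)]
```

One difference worth keeping in mind: in this sketch a weight under 30 receives the ELSE code with the range form but code 1 with the `<` form, since `<50` has no lower bound.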
R104=COUNT(1,V31-V35)
If all items are coded 1 (yes), the index, R104, will
take the value 5. If all are coded 2 (no) or are missing, then the
index will be zero.
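The index built by COUNT is simply the number of listed variables equal to the given value. A Python equivalent of this one call, with a hypothetical function name and a plain list standing in for V31-V35, might look like this:

```python
def count_value(target, items):
    """Count how many of the items equal the target value."""
    return sum(1 for item in items if item == target)


all_yes = count_value(1, [1, 1, 1, 1, 1])
none_yes = count_value(1, [2, 2, 9, 2, 2])   # "no" answers and a missing code
some_yes = count_value(1, [1, 2, 1, 9, 2])
```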
DUMMY R105-R107 USING V5(1)(2)(3)
The 3 result variables will take values as follows:
| V5=1 | R105=1, R106=0, R107=0 |
| V5=2 | R105=0, R106=1, R107=0 |
| V5=3 | R105=0, R106=0, R107=1 |
| V5 not 1,2 or 3 | R105=0, R106=0, R107=0 (default if no ELSE value given) |
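The table above amounts to one indicator per listed code. As an illustration only (the function name `dummy_codes` is hypothetical and the sketch covers just the single-value form used in this example), the same mapping in Python:

```python
def dummy_codes(value, codes=(1, 2, 3)):
    """Return one 0/1 indicator per code: 1 where the value matches, else 0."""
    return tuple(1 if value == code else 0 for code in codes)


r105, r106, r107 = dummy_codes(2)
unlisted = dummy_codes(9)  # a value that matches none of the codes
```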
IF V41 GT 4 THEN V41=4
IF V41 EQ 0 OR MDATA(V41) THEN R109=99 ELSE -
R109=SELECT (FROM=V42-V45, BY=V41)
NAME R109'Last child''s age'
MDCODES R109(99)
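The statements above can be read as "cap the number of children at 4, then pick the age of the child in that position". The Python sketch below follows that reading. It assumes that SELECT with BY=V41 returns the V41-th variable of the FROM list, which the example implies but does not state; the function name `last_child_age` and the use of `None` for a missing V41 are also assumptions.

```python
def last_child_age(n_children, ages):
    """ages holds V42..V45 in order; n_children stands for V41 (None if missing)."""
    if n_children is None or n_children == 0:
        return 99  # missing data code assigned to R109
    position = min(n_children, 4)  # IF V41 GT 4 THEN V41=4
    return ages[position - 1]      # assumed meaning of SELECT(FROM=..., BY=...)


two_children = last_child_age(2, [10, 7, 99, 99])
six_children = last_child_age(6, [15, 12, 9, 4])
no_children = last_child_age(0, [99, 99, 99, 99])
```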
IF MDATA (V21,V22) OR V22 EQ 0 THEN R111=99 AND R112=99 -
ELSE R111=V21/V22 AND R112=TRUNC ((V21/V22) + .5)
NAME R111'Weight/Height ratio dec', R112 'W/H rounded'
MDCODES (R111,R112)(99)
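Adding .5 before truncating is the usual way to round a positive number to the nearest integer, and the check on V22 guards against division by zero. A Python sketch of the same calculation follows; the function name `weight_height` and the use of `None` for missing values are assumptions made for the illustration.

```python
import math


def weight_height(weight, height):
    """Return (ratio, rounded ratio), or (99, 99) when the ratio cannot be computed."""
    if weight is None or height is None or height == 0:
        return 99, 99  # missing data code for both R111 and R112
    ratio = weight / height
    return ratio, math.trunc(ratio + 0.5)  # TRUNC((V21/V22) + .5)


ratio, rounded = weight_height(70.5, 1.75)
unusable = weight_height(70.5, 0)
```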
Method a. First reduce the codes for sex and education into contiguous codes starting from 0, storing the results temporarily in variables R901, R902.
R901=BRAC (V5,1=0,2=1,ELSE=9)
R902=BRAC (V6,1=0,2=1,3=1,ELSE=9)
Then use the COMBINE function, making sure first that
cases with spurious codes are put in a missing data category.
IF R901 GT 1 OR R902 GT 1 THEN R110=9 ELSE -
R110=COMBINE R901(2),R902(2)
Method b. Use IFs, setting a default value of
9 at the start.
R110=9
IF V5 EQ 1 AND V6 EQ 1 THEN R110=1
IF V5 EQ 1 AND V6 INLIST (2,3) THEN R110=2
IF V5 EQ 2 AND V6 EQ 1 THEN R110=3
IF V5 EQ 2 AND V6 INLIST (2,3) THEN R110=4
Method c. Use the RECODE function.
R110=RECODE V5,V6(1/1)=1,(1/2-3)=2,(2/1)=4,(2/2-3)=5,ELSE=9
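For comparison, the default-then-override logic of Method b can be written as a lookup table in Python. This sketch reproduces only the codes assigned by the IF statements of Method b; the names `METHOD_B_CODES` and `combine_codes` are hypothetical.

```python
# Pairs (V5, V6) and the code Method b assigns to them.
METHOD_B_CODES = {
    (1, 1): 1,
    (1, 2): 2, (1, 3): 2,
    (2, 1): 3,
    (2, 2): 4, (2, 3): 4,
}


def combine_codes(v5, v6):
    """Return the Method b code, or the default 9 for any other combination."""
    return METHOD_B_CODES.get((v5, v6), 9)


codes = [combine_codes(1, 1), combine_codes(1, 3), combine_codes(2, 1), combine_codes(2, 2)]
default = combine_codes(3, 1)
```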
1 CARRY (R901,R902,R903,R904)
2 IF (R901 EQ 0) THEN R901=V1
3 IF (R901 NE V1) THEN GO TO VIL
4 IF EOF THEN GO TO VIL
5 R902=R902+1
6 R903=R903+V8+V9
7 IF (V31 EQ 1) THEN R904=R904+1
8 REJECT
9 VIL R101=(R904*100)/R902
10 R101=BRAC(R101,<25=1,<50=2,<75=3,<101=4)
11 R102=R903/R902
12 R102=BRAC(R102,<1000=1,<2000=2,<5000=3,ELSE=4)
13 R901=V1
14 R902=1
15 R903=V8+V9
16 IF (V31 EQ 1) THEN R904=1 ELSE R904=0
17 NAME R102'average income', R101'% owning car'
R901 is a work variable used to hold the current village
ID; when the first case is read (R901=0), R901 is assigned the value
of the village ID (V1); R902 to R904 are work variables for, respectively,
the number of people in the village, the total income of the people
in the village and the number of people owning cars in the village.
While the village ID stays the same, data is accumulated in variables R902 to R904 (whose values are "carried" as new cases are read). The case is then rejected (not passed to the analysis) and the next case read.
When a change in village ID is encountered, the instructions at label VIL are executed: the current contents of R902, R903 and R904 are used to compute the required variables (grouped mean income and grouped % of car owners) and these variables are then passed to the analysis after first resetting the work variables to the values for the last case read (the first case for the next village).
When the end of file is reached, we need to make sure that the data from the last village is used. Statement 4 achieves this.
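The accumulate-until-the-key-changes pattern used here can be sketched in Python as follows. This is a rough analogue rather than a translation: it assumes the cases arrive sorted by village ID, it labels each summary with the village it describes (in the Recode version the summary travels with the first case of the next village), and the names `bracket_below`, `summarize` and `aggregate_villages` are hypothetical.

```python
def bracket_below(value, limits, else_code=None):
    """Return the code of the first (limit, code) pair with value < limit."""
    for limit, code in limits:
        if value < limit:
            return code
    return else_code


def summarize(village, people, income, car_owners):
    """Compute the grouped % of car owners (R101) and grouped mean income (R102)."""
    pct_cars = car_owners * 100 / people
    mean_income = income / people
    return {
        "village": village,
        "R101": bracket_below(pct_cars, [(25, 1), (50, 2), (75, 3), (101, 4)]),
        "R102": bracket_below(mean_income, [(1000, 1), (2000, 2), (5000, 3)], else_code=4),
    }


def aggregate_villages(cases):
    """Yield one summary per village from cases sorted by village ID (V1)."""
    current = None                       # plays the role of R901
    people = income = car_owners = 0     # R902, R903, R904
    for case in cases:
        if current is not None and case["V1"] != current:
            yield summarize(current, people, income, car_owners)
            people = income = car_owners = 0
        current = case["V1"]
        people += 1
        income += case["V8"] + case["V9"]
        if case["V31"] == 1:
            car_owners += 1
    if current is not None:              # end of file: flush the last village
        yield summarize(current, people, income, car_owners)


# Hypothetical cases for the illustration.
cases = [
    {"V1": 1, "V8": 500, "V9": 0, "V31": 1},
    {"V1": 1, "V8": 1500, "V9": 500, "V31": 2},
    {"V1": 2, "V8": 6000, "V9": 0, "V31": 2},
]
summaries = list(aggregate_villages(cases))
```

The final `if` after the loop does the job of statement 4: without it the totals for the last village would never be output.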