Recode Facility

4    Recode Facility


4.1  Rules for Coding


4.2  Sample Set of Recode Statements

To give some idea of how the elements of the Recode language fit together, a sample set of Recode statements is given below.


     $RECODE
        IF V5 LT 8 THEN REJECT                    (exclude cases where V5 < 8)
        IF NOT MDATA(V6) THEN R51=TRUNC(V6/4) -
           ELSE R51=0
        R52=BRAC(V10,0-24=1,25-49=2,50-74=3, -    (group values of V10)
           75-99=4,TAB=1)
        R53=BRAC(V11,TAB=1)                       (group V11 the same way as V10)
        IF V26 INLIST(1-10) THEN R54=1 AND -
           R55=1 ELSE R54=2
        IF R54 EQ 1 THEN GO TO L1
        R55=99
        R56=V15 + V35
        GO TO L2
     L1 R56=99
     L2 R57=COUNT(1,V20-V27,V29)                  (count how many of the listed
                                                   variables have the value 1)
        NAME R52 'GROUPED AGE',  -
             R53 'GROUPED AGE AT MARRIAGE'
        MDCODES R55(99),R56(99)

4.3  Missing Data Handling

Except in the special functions MAX, MEAN, MIN, STD, SUM and VAR, Recode does not automatically check the values of variables for missing data. The user must therefore test explicitly for missing data before doing calculations with variables. The MDATA function is available for this purpose; e.g.


     IF MDATA (V5,V6) THEN R1=999 ELSE R1=V5+V6
There are two additional functions, MD1 and MD2, which return the 1st or 2nd missing data code value for a variable; e.g.

     R2=MD1(V6)
assigns R2 the value of the 1st missing data code of V6.

Finally, missing data codes can be assigned to R or V variables with the MDCODES definition statement; e.g.


     MDCODES R3(8,9)
assigns 8 and 9 as the 1st and 2nd missing data codes for R3.

Sometimes a set of Recode statements does not assign a value to an R-variable for a particular data record. The R-variable then keeps the default MD1 value of 1.5 * 10^9 to which it was initialized. To change this to a more acceptable missing data value, test whether the value is large and, if so, assign an appropriate missing data value, e.g.


     IF R100 GT 1000000 THEN R100=99
     MDCODES R100(99)

4.4  How Recode Functions

Syntax checking and interpretation. Recode statements are read and analyzed for errors prior to interpretation of other IDAMS program control statements and prior to program execution. If errors are found, diagnostic messages are printed and execution of the program is terminated.

Results. Recode prints out the Recode statements input by the user along with syntax errors detected if any. This occurs before the program is executed, i.e. before the interpretation of the program control statements is printed.

Initialization before starting to process the Data file. If there are no syntax errors, tables, missing data codes, names, etc. are initialized (according to the initialization/definition statements supplied by the user) before starting to read the data. R-variables in CARRY statements are initialized to zero.

Initialization before processing each data case. At the start of processing of each case and before execution of the Recode statements for that case, all R-variables, except those listed in CARRY statements, are initialized to the IDAMS internal default missing data value (1.5 * 10^9).

Execution of Recode statements. The actual recoding takes place after the data for a case is read and after the main filter has been applied. Cases not passing the filter are not passed to the recoding routines. Recode variables cannot therefore be used in main filters.

The use of the Recode statements is sequential (i.e. the first statement is used first, then the second, third, etc.) except as modified by GO TO, BRANCH, RETURN, REJECT, ENDFILE, ERROR statements (the control statements). When all statements have been used, the case is passed to the IDAMS program being executed.

When the IDAMS program has finished using the case, the next case passing the main filter is processed, the R-variables (except the CARRY variables) being reinitialized to missing data and the Recode statements executed for that case and so on until the end of the data file is reached.
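The per-case cycle described above can be modelled outside IDAMS. The following is a minimal Python sketch, not part of IDAMS: the names `run_recode`, `statements` and `DEFAULT_MD` are hypothetical, and the Recode statements are represented by an ordinary Python function. It shows only the initialization rules: CARRY variables are zeroed once, all other R-variables are reset to the default missing data value before every case.

```python
DEFAULT_MD = 1.5e9  # IDAMS internal default missing data value (1.5 * 10^9)

def run_recode(cases, recode, r_vars, carry=()):
    """Yield (case, R-values) pairs as they would reach the analysis program.

    cases  : iterable of dicts of V-variable values (assumed already filtered)
    recode : function(case, r) standing in for the Recode statements
    r_vars : names of all R-variables used
    carry  : names of the R-variables listed in a CARRY statement
    """
    r = {name: 0 for name in carry}          # CARRY variables: zero, once only
    for case in cases:
        for name in r_vars:
            if name not in carry:
                r[name] = DEFAULT_MD         # reset before every case
        recode(case, r)
        yield case, dict(r)

def statements(case, r):
    r["R1"] += 1                             # running counter (CARRY variable)
    if case["V5"] >= 8:
        r["R2"] = case["V5"] * 2             # assigned for some cases only

results = list(run_recode([{"V5": 3}, {"V5": 9}], statements,
                          ["R1", "R2"], carry=["R1"]))
```

In this sketch the first case leaves R2 at the default missing data value because no statement assigned it, while R1 keeps counting across cases.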

Testing Recode statements. Errors in logic can be made which are not detectable by the Recode facility. To check the intended results against those generated by Recode, the Recode statements should be tested on a few records using the LIST program with the parameter MAXCASES set, say, to 10. The data values for the variables input and the corresponding result variables can then be inspected.

Files used by Recode. When a $RECODE command is encountered in the Setup file, subsequent lines are copied into a work file on unit FT46. The RECODE program reads Recode statements from this file and analyzes them for errors prior to interpretation of other IDAMS program control statements and prior to program execution. If errors are found, diagnostic messages are printed and execution of the entire IDAMS step is terminated.

Interpreted statements are written in the form of tables to a work file on unit FT49 from where they are read by the IDAMS program being executed.

Messages about Recode statements are written to unit FT06 along with results from the IDAMS program being executed.

4.5  Basic Operands

Variables. Variables in Recode refer either to input variables (V-variables) or result variables (R-variables). They are defined as follows:

    Input variables (Vn). "V" followed by a number. These are variables as defined by the input dictionary. Their values may be changed by Recode (e.g. V10=V10+V11). Variables should normally be numeric, but alphabetic variables of not more than 4 characters can also be used; in particular, they can be recoded to numeric values.

    Result variables (Rn). "R" followed by a number (1 to 9999). These are variables that are created by the user. R-variables (except for those listed in CARRY statements - see below) are initialized to the default missing value of 1.5 * 10^9 before processing of each case.

    To use an R-variable in a program, specify an R (instead of V) on the variable list attached to a keyword parameter (e.g. WEIGHT=R50 or VARS=(R10-R20)). When printed out by programs, a result variable number is sometimes identified by a negative sign. Thus, variable "10" is V10 and variable "-10" is R10. It is less confusing to use numbers for the result variables which are distinct from input variable numbers. R-variables are always numeric.

Numeric constants. Constants may be integer or decimal, positive or negative, e.g. (3, 5.5, -50, -0.5).

Character constants. Character constants are enclosed in single primes (e.g. 'ABCXYZ', 'M'). A prime within a character constant must be represented by two adjacent primes (e.g. DON'T would be written: 'DON''T'). Character constants are used in the NAME statement to assign names to new variables. They can also be used in logical expressions to test values of alphabetic variables (e.g. IF V10 EQ 'M'); only the first 4 characters are used in such comparisons, and constants or variable values shorter than 4 characters are padded on the right with blanks. Character constants cannot be used in arithmetic functions (except BRAC).

4.6  Basic Operators

Arithmetic operators. Arithmetic operators are used between arithmetic operands. Available operators, in precedence order, are:

     -        (negation)
     EXP x    (exponentiation to the power x, where -181 < x < 175)
     *        (multiplication)
     /        (division)
     +        (addition)
     -        (subtraction)

Relational operators. Relational operators are used to determine whether or not two arithmetic values have a particular relationship to one another. The relational operators are:

     LT       (less than)
     LE       (less than or equal)
     GT       (greater than)
     GE       (greater than or equal)
     EQ       (equal)
     NE       (not equal)


Logical operators. Logical operators are used between logical operands. Logical operands take only the values "true" or "false". These are:

     NOT      (logical negation)
     AND      (both)
     OR       (either)

4.7  Expressions

An expression is a representation of a value. A single constant, variable, or function reference is an expression. Combinations of constants, variables, functions and other expressions with operators are also expressions. Recode can evaluate arithmetic and logical expressions. Note that brackets can be used anywhere in an expression to clarify the order in which it is to be evaluated.

Arithmetic expressions. Arithmetic expressions are created using arithmetic operators and variables, constants and arithmetic functions. They yield a numeric value. Examples are:


     V732              (the value of V732)
     44                (the constant 44)
     R67/V807 + 25     (25 plus the value of R67 divided by the value of V807)
     LOG(R10)          (the log of the value of R10)
Logical expressions. Logical expressions are evaluated to a "true" or "false" value. Logical variables do not exist in the Recode language, so that the result of logical expressions cannot be assigned to a variable. Logical expressions can only be used in IF statements. Examples are:

     R5 EQ V333
True if the value of R5 is equal to the value of V333, and false otherwise.

     (V62 GT 10) OR (R5 EQ V333)
True if either of the logical expressions results in a true value, and false if both result in a false value.

     MDATA(V10,R20) AND V9 GT 2
True if the value of V10 or the value of R20 is a missing data code and the value of V9 is larger than 2, false otherwise.

4.8  Arithmetic Functions

Arithmetic functions all return a single numeric value. The argument list for functions can be simple lists enclosed in parentheses or highly structured lists involving both keyword elements and elements in specific positions in the list. The available functions are:

Function  Example                           Purpose
ABS       ABS(R3)                           Absolute value
BRAC      BRAC(V5,TAB=1,ELSE=9, -           Univariate grouping
             1-10=1,11-20=2)
          BRAC(V10,'F'=1,'M'=2)             Alphabetic recoding
COMBINE   COMBINE V1(2), V42(3)             Combination of 2 variables
COUNT     COUNT(1,V20-V25)                  Counting occurrences of a value
                                            across a set of variables
LOG       LOG(V2)                           Logarithm to the base 10
MAX       MAX(V10-V20)                      Maximum value
MD1,MD2   MD1(V3)                           Value of missing data code
MEAN      MEAN(V5-V8,MIN=2)                 Mean value
MIN       MIN(V10-V20)                      Minimum value
NMISS     NMISS(V3-V6)                      Number of missing data values
NVALID    NVALID(V3-V6)                     Number of non-missing values
RAND      RAND(0)                           Random number
RECODE    RECODE V7,V8,(1/1)(1/2)=1, -      Multivariate recoding
             (2-3/3)=2, ELSE=0
SELECT    SELECT (BY=V10,FROM=R1-R5,9)      Selecting the value of one of a set of
                                            variables according to an index variable
SQRT      SQRT(V2)                          Square root
STD       STD(V20-V25,MIN=4)                Standard deviation
SUM       SUM(V6,V8,V9-V12,MIN=3)           Sum of values
TABLE     TABLE(V5,V3,TAB=2,ELSE=9)         Bivariate recoding
TRUNC     TRUNC(V26/3)                      Integer part of the argument's value
VAR       VAR(V6,R5-R10,MIN=7)              Variance

The exact syntax for each function is given below.

ABS. The ABS function returns a value which is the absolute value of the argument passed to the function.

Prototype: ABS(arg)

Where arg is any arithmetic expression for which the absolute value is to be taken.

Example:


     R5=ABS(V5-V6)
BRAC. The BRAC function returns a value which is derived from performing specified operations (rules) upon a single variable.

Prototype: BRAC(var [,TAB=i] [,ELSE=value] [,rule1,...,rule n] )

Where:

Examples:


     R1=BRAC(V10,TAB=1,ELSE=9,1-10=1,11-20=2,<0=0)
The value of R1 will be 1 if V10 is in the range 1 to 10, 2 if V10 is in the range 11 to 20, and 0 if V10 is less than 0. If V10 has any other value, e.g. 10.5, 25 or 0, the ELSE clause is applied and R1 is set to 9. These bracketing rules are labelled table 1 so that they can be re-used, e.g.

     R2=V1 + BRAC(V2, TAB=1) * 3
In this example V2 would be bracketed by the same rules as for V10 in the previous example. R2 would be set to V1 + (the result of bracketing multiplied by 3).

     R100=BRAC(V10,'F'=1,'M'=2,ELSE=9)
This is an example of recoding an alphabetic variable, which has values 'F' or 'M', to numeric values of 1 and 2.
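The bracketing logic can be illustrated with a short Python model. This is a sketch only, not IDAMS code: the function `brac` and the rule representation are hypothetical, ranges are assumed to be inclusive and continuous, rules are assumed to be tried in the order given, and returning `None` when nothing matches and no ELSE value is supplied is a choice of this sketch.

```python
def brac(value, rules, else_value=None):
    """Return the result of the first matching rule, or else_value.

    rules is a list of (test, result) pairs, where test is one of:
      - a (low, high) tuple: inclusive range
      - a callable: arbitrary condition such as "<0"
      - anything else: a single value compared for equality
    """
    for test, result in rules:
        if callable(test):
            if test(value):
                return result
        elif isinstance(test, tuple):
            low, high = test
            if low <= value <= high:
                return result
        elif value == test:
            return result
    return else_value

# Rules corresponding to 1-10=1, 11-20=2, <0=0 (kept under one name, like TAB=1)
table1 = [((1, 10), 1), ((11, 20), 2), (lambda v: v < 0, 0)]

# Rules corresponding to 'F'=1, 'M'=2
sex_rules = [("F", 1), ("M", 2)]

r1 = brac(15, table1, else_value=9)      # falls in 11-20
r100 = brac("M", sex_rules, else_value=9)
```

Keeping the rule list in a named variable mirrors the way a TAB number lets the same rules be applied to a second variable.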

COMBINE. The COMBINE function returns a unique value for each combination of values of the variables that are used as arguments. This function is normally used with categorical variables.

Prototype: COMBINE var1(n1), var2(n2),...,varm(nm)

Where:

Examples:

     R1=COMBINE V6(2), R330(3)
Assume that V6 has two codes (0,1) representing men and women respectively and R330 has three codes (0,1,2) representing young, middle aged and old respondents. The statement will then combine the codes of V6 and R330 to give a single variable R1 as follows:

      V6     R330     R1

       0      0        0      Young men
       1      0        1      Young women
       0      1        2      Middle aged men
       1      1        3      Middle aged women
       0      2        4      Old men
       1      2        5      Old women
Since V6 has two codes, and R330 has three, R1 will have six. In the above example, if V6 had codes 1 and 2 instead of 0 and 1, the maximum value should be stated as "3". This would allow for the values of 0,1, and 2, although code 0 would never appear. To avoid these "extra" codes, the user should first recode such variables to give a contiguous set of codes starting from 0, e.g. BRAC(V6,1=0,2=1).
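The values in the table above are consistent with a mixed-radix encoding in which the first variable varies fastest. The Python sketch below reproduces the table under that assumption; the function name `combine` is hypothetical, and raising an error for an out-of-range code is a choice of this sketch, not documented Recode behaviour.

```python
def combine(*pairs):
    """pairs: (value, n_codes) for each variable, first variable varying fastest."""
    result, weight = 0, 1
    for value, n_codes in pairs:
        if not 0 <= value < n_codes:
            # Sketch-only behaviour: codes are expected to lie in 0..n_codes-1
            raise ValueError(f"code {value} outside 0..{n_codes - 1}")
        result += value * weight
        weight *= n_codes
    return result

# R1=COMBINE V6(2), R330(3) for every combination of codes
combined = {(v6, r330): combine((v6, 2), (r330, 3))
            for r330 in range(3) for v6 in range(2)}
```

For two variables this reduces to R1 = V6 + 2 * R330, which is why declaring a maximum of 3 for a variable coded 1 and 2 leaves unused result codes.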

Restrictions:

COUNT. The COUNT function returns a value which is equal to the number of times the value of a variable or constant occurs as the value of one of the variables in the list "varlist".

Prototype: COUNT(val,varlist)

Where:

Examples:


     R3=COUNT(1,V20-V25)
R3 will be assigned a value equal to the number of times the value 1 occurs in the 6 variables V20-V25. This might be used for example to count the number of "YES" responses by a respondent to a set of questions.

     R5=COUNT(V1,V8-V10)
R5 will be assigned a value equal to the number of times that the value of V1 occurs also as the value of variables V8-V10.

LOG. The LOG function returns a floating-point value which is the logarithm to the base 10 of the argument passed to the function.

Prototype: LOG(arg)

Where arg is any arithmetic expression for which the log to the base 10 is to be taken.

Examples:


     R10=LOG(V30)
Note: The logarithm of any number X to any other base B can readily be found by the following simple transformation:

     R1=LOG(X)/LOG(B)
For the natural logarithm (base e), this becomes simply: R1=2.302585 * LOG(X).

Thus R1=2.302585 * LOG(V30) will assign to R1 the natural logarithm of variable 30.
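The change-of-base rule can be checked numerically. The Python sketch below is an illustration only; `math.log10` plays the role of Recode's LOG, and the helper name `log_base` is hypothetical.

```python
import math

def log_base(x, b):
    """Logarithm of x to base b, computed from base-10 logarithms only."""
    return math.log10(x) / math.log10(b)

# Natural logarithm of 30 using the constant 2.302585 (approximately 1/log10(e))
ln_via_constant = 2.302585 * math.log10(30.0)
ln_direct = math.log(30.0)
```

The two values agree to about six decimal places; the small difference comes from rounding the constant.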

MAX. The MAX function returns the maximum value in a set of variables. Missing data values are excluded. The MIN argument can be used to specify the minimum number of valid values for a maximum to be calculated. If fewer valid values are available, the default missing data value 1.5 * 10^9 is returned.

Prototype: MAX(varlist [,MIN=n] )

Where:

Example:


     R12=MAX(V20-V25)
MD1, MD2. The MD1 (or MD2) function returns a value which is the first (or second) missing data code of the variable given as the argument.

Prototype: MD1(var) or MD2(var)

Where var is any input variable (V-variable) or previously defined result variable (R-variable).

Example:


     R12=MD2(V20)
For each case processed, R12 will be assigned the second missing data code for input variable V20.

MEAN. The MEAN function returns the mean value of a set of variables. Missing data values are excluded. The MIN argument can be used to specify the minimum number of valid values for a mean to be calculated. If fewer valid values are available, the default missing value 1.5 * 10^9 is returned.

Prototype: MEAN(varlist [,MIN=n] )

Where:

Example:


     R15=MEAN(R2-R4,V22,V5,MIN=2)
The result will be the mean of the specified variables, if at least two of the variables have non-missing values. Otherwise, the result will be 1.5 * 10^9.
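The effect of the MIN argument can be sketched in Python as follows. This is a model, not IDAMS code: missing values are represented by `None`, the name `mean_valid` is hypothetical, and a default of one required valid value when MIN is omitted is an assumption of this sketch.

```python
DEFAULT_MD = 1.5e9  # default missing data value (1.5 * 10^9)

def mean_valid(values, min_valid=1):
    """Mean of the non-missing values, or DEFAULT_MD if too few are valid."""
    valid = [v for v in values if v is not None]
    if len(valid) < max(min_valid, 1):
        return DEFAULT_MD
    return sum(valid) / len(valid)

# Five variables, as in MEAN(R2-R4,V22,V5,MIN=2)
enough = mean_valid([4, None, 8, None, None], min_valid=2)
too_few = mean_valid([4, None, None, None, None], min_valid=2)
```

The same pattern applies to MAX, MIN, STD, SUM and VAR, with the mean replaced by the corresponding statistic.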

MIN. The MIN function returns the minimum value in a set of variables. Missing data values are excluded. The MIN argument can be used to specify the minimum number of valid values for a minimum to be calculated. If fewer valid values are available, the default missing value 1.5 * 10^9 is returned.

Prototype: MIN(varlist [,MIN=n] )

Where:

Example:


     R10=MIN(V5,V7,V9,R2)
NMISS. The NMISS function returns the number of missing values in a set of variables.

Prototype: NMISS(varlist)

Where varlist is a list of V- and R-type variables.

Example:


     R22=NMISS(R6-R10)
The returned value depends on how many of the variables R6 - R10 have missing values. The maximum value is 5 for a case in which all 5 variables have missing data.

NVALID. The NVALID function returns the number of valid values (non-missing values) in a set of variables.

Prototype: NVALID(varlist)

Where varlist is a list of V- and R-type variables.

Example:


     R2=NVALID(V20,V22,V24)
The returned value depends on how many of the variables have valid values. The maximum value of 3 will be obtained if all 3 variables have valid values. 0 will be returned if all 3 are missing.

RAND. The RAND function returns a value which is a uniformly distributed random number based upon the arguments "starter" and "limit" as described below.

Prototype: RAND(starter [,limit] )

Where:

Examples:


     R1=RAND(0)
     IF RAND(0) NE 1 THEN REJECT
For each case processed, R1 will be set equal to a random number, uniformly distributed from 1 to 10. The sequence is initialized to the clock time the first time RAND is executed. Note that RAND can be used with the REJECT statement to select a random sample of cases. The 2nd example will result in including a random 1/10 sample of cases.

RECODE. The RECODE function is used to return one value based upon the concurrent values of m variables.

Prototype: RECODE var1,var2,...,varm [,TAB=i] [,ELSE=value] [,rule1,rule2,...,rule n]

Where:

Examples:


     R7=RECODE V1,V2,(3/5)(7/8)=1,(6-9/1-6)=2
R7 will be assigned a value based on the values of V1 and V2. In this example, R7 will be set to 1 if V1=3 and V2=5, or if V1=7 and V2=8. R7 will be set to 2 if V1=6-9 and V2=1-6. In all other instances, R7 will be unchanged (see above).

     R7=RECODE V1,V2,TAB=1,ELSE=MD1(R7),(3/5)(7/8)=1,(6-9/1-6)=2
R7 will be assigned a value the same as in the preceding example, except that R7 will be set equal to its MD1 value when the rules are not met. The TAB=1 will allow these rules to be used in another RECODE function call.

Restriction: When the RECODE function is used, it must be the only operand on the right-hand side of the equals sign.

SELECT. The SELECT function returns the value of the variable or constant in the FROM list holding the same position as the value of the BY variable. (Warning: If the value of the BY variable is less than 1 or greater than the number of variables in the FROM list, a fatal error results). There may be up to 50 items in the FROM list. The maximum value of the BY variable is therefore 50. A SELECT function may be combined with other functions, operations, and variables to form a complex expression. Note: The SELECT function selects the value of one of a set of variables; the SELECT statement selects the variable to be used for the result. (See section "Special Assignment Statements" for description of SELECT statement).

Prototype: SELECT (FROM=list of variables and/or constants, BY=variable)

Example:


     R10=SELECT (FROM=R1-R3,9,BY=V2)
R10 will take the value of R1, R2, R3 or 9 for values of 1, 2, 3 or 4 respectively of V2.
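Positional selection can be modelled in a few lines of Python. The sketch is illustrative only: `select` is a hypothetical name, the FROM list is passed as already-evaluated values, and a Python exception stands in for the fatal error that Recode reports.

```python
def select(from_values, by):
    """Return the item of from_values at 1-based position `by`."""
    if not 1 <= by <= len(from_values):
        # Recode terminates with a fatal error in this situation
        raise IndexError("BY value is outside the FROM list")
    return from_values[by - 1]

# R10=SELECT (FROM=R1-R3,9,BY=V2) with R1=10, R2=20, R3=30
from_values = [10, 20, 30, 9]
picked = [select(from_values, v2) for v2 in (1, 2, 3, 4)]
```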

SQRT. The SQRT function returns a value which is the square root of the argument passed to the function.

Prototype: SQRT(arg)

Where arg is any arithmetic expression.

Example:


     R5=SQRT(V5)
STD. The STD function returns the standard deviation of the values of a set of variables. Missing data values are excluded. The MIN argument can be used to specify the minimum number of valid values for a standard deviation to be calculated. If fewer valid values are available, the default missing value 1.5 * 10^9 is returned.

Prototype: STD(varlist [,MIN=n] )

Where:

Example:


     R5=STD(V20-V24,R56-R58,MIN=3)
SUM. The SUM function returns the sum of the values of a set of variables. Missing values are excluded. The MIN argument can be used to specify the minimum number of valid values for a sum to be calculated. If fewer valid values are available, the default missing value 1.5 * 10^9 is returned.

Prototype: SUM(varlist [,MIN=n] )

Where:

Example:


     R8=SUM(V20,V22,V24,V26,MIN=3)
If three or more of the variables have valid values, the sum of these is returned. Otherwise the value 1.5 * 10^9 is returned.

TABLE. The TABLE function returns a value based on the concurrent values of two variables.

Prototype: TABLE (r, c, [TAB=i,] [ELSE=value,] [PAD=value,] COLS c1,c2,...,cm, ROWS r1(row r1 values),r2(row r2 values),...,rn(row rn values))

Where:

Examples: Assume the following table:


             Col:     1   2   3   4   5   6

      Row:    2       1   1   2   2   3   4
              3       1   2   2   2   3   4
              5       1   2   2   2   3   4
              6       3   3   3   3   3   4
              8       9   9   9   9   9   9


     R1=TABLE (V6, V4, TAB=1, ELSE=0, PAD=9, COLS 1-6, ROWS 2(1,1,2,2,3,4), -
        3(1,2,2,2,3,4),5(1,2,2,2,3,4),6(3,3,3,3,3,4),8(9))
If V6 equals 5 and V4 equals 3, then R1 will be assigned the value 2 (intersect of row 5 and column 3).
If V6 equals 2 and V4 equals 6, then R1 will be assigned the value 4 (intersect of row 2 and column 6).
If V6 equals 4 and V4 equals 2, then R1 will be assigned the value 0 (row 4 is not defined; the ELSE value is used).

     R5=TABLE (3, V8, TAB=7, ELSE=TABLE(V1,V8,TAB=1) )
This will use the table named "7" with 3 as the row index and the value of V8 as the column index. If a value of V8 is not in table 7 then the table "1" will be used with row index V1 and column index V8.
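The lookup performed in the first TABLE example can be reproduced with a Python model. This is a sketch under stated assumptions: `make_table` and `table_lookup` are hypothetical names, PAD is assumed to fill the missing trailing cells of a short row, and cells that remain undefined are assumed to yield the ELSE value.

```python
def make_table(cols, rows, pad=None):
    """Build {row_code: {col_code: value}}, padding short rows with `pad`."""
    cols = list(cols)
    table = {}
    for row_code, values in rows.items():
        padded = list(values) + [pad] * (len(cols) - len(values))
        table[row_code] = dict(zip(cols, padded))
    return table

def table_lookup(table, r, c, else_value=None):
    """Value at row r, column c; else_value when the cell is not defined."""
    value = table.get(r, {}).get(c)
    return else_value if value is None else value

# COLS 1-6, ROWS 2(...),3(...),5(...),6(...),8(9) with PAD=9
table1 = make_table(
    range(1, 7),
    {2: [1, 1, 2, 2, 3, 4],
     3: [1, 2, 2, 2, 3, 4],
     5: [1, 2, 2, 2, 3, 4],
     6: [3, 3, 3, 3, 3, 4],
     8: [9]},
    pad=9,
)

r1_a = table_lookup(table1, 5, 3, else_value=0)   # row 5, column 3
r1_b = table_lookup(table1, 2, 6, else_value=0)   # row 2, column 6
r1_c = table_lookup(table1, 4, 2, else_value=0)   # row 4 is not defined
```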

TRUNC. The TRUNC function returns the integer value of an argument.

Prototype: TRUNC(arg)

Where arg is any arithmetic expression for which the integer value is to be taken.

Example:


     R5=TRUNC(V5)
R5 will be assigned the value of the input variable V5 truncated to an integer.

VAR. The VAR function returns the variance of the values of a set of variables, excluding missing data. The MIN argument can be used to specify the minimum number of valid values for the variance to be calculated. If fewer valid values are available, the default missing value 1.5 * 10^9 is returned.

Prototype: VAR(varlist [,MIN=n] )

Where:

Example:


     R9=VAR(V5-V10)

4.9  Logical Functions

Logical functions return a value of "true" or "false" when evaluated. They cannot be used as arithmetic operands. Logical functions are used in logical expressions and logical expressions comprise the test portion of conditional "IF test THEN..." statements. The available functions are:

Function  Example                           Purpose
EOF       IF EOF THEN GO TO NEXT            Checks for the end of the data file
INLIST    IF V5 INLIST(2,4,6) THEN -        Searches a list of values
             R100=1 ELSE R100=0
MDATA     IF MDATA(V5,V6) THEN R101=99      Checks for missing data

EOF. The EOF function is used for aggregation of values across cases. See example 10 in section "Examples of Use of Recode Statements". The presence of the EOF function causes the Recode statements to be executed once more after the end-of-file has been encountered. The value of the EOF function is true during this after-end-file pass of the Recode statements and is false at all other times.

For the final pass through the Recode statements, V-variables will have the value they had after the last case was fully processed. R-variables (except those listed in CARRY statements) will be reinitialized to 1.5 * 10^9. CARRY R-variables will be left untouched. The user must be careful to set up a correct path to be followed through the Recode statements when end-of-file is reached.

Prototype: EOF

Example:


     IF R1 NE V1 OR EOF THEN GO TO L1
INLIST. The INLIST function (abbreviated IN) returns a value of "true" if the result of an arithmetic expression is one of a specified set of values. If the expression equals a value outside the set of values, the function returns a value of "false".

Prototype: expr INLIST(values) or expr IN(values)

Where:

Examples:


     IF R12 INLIST(1-5,9,10) THEN V5=0
If R12 has a value of 1,2,3,4,5,9 or 10, the INLIST function returns a value of "true", and input variable V5 is set to 0. Otherwise, INLIST returns a value of "false" and input variable V5 retains its original value.

     IF (V3 + V7) IN(2,4,5,6) THEN R1=1 ELSE R1=9
If the sum of input variables V3 and V7 results in the value 2,4,5, or 6, then INLIST returns a value of "true" and result variable R1 will contain the value 1. Otherwise, INLIST returns a value of "false" and R1 will be set to 9.

MDATA. The MDATA function returns a value of "true" if any of the variables passed to the function have missing data values; otherwise, the function returns a value of "false". This function is used quite often, since missing data is not automatically checked in the evaluation of expressions except in the MAX, MEAN, MIN, STD, SUM and VAR functions.

Prototype: MDATA(varlist)

Where varlist is a list of V- and R-variables. There can be a maximum of 50 variables in this list.

Example:


     IF MDATA(V1,V5-V6) THEN R1=MD1(R1) ELSE R1=V1+V5+V6
If any variable in the list V1, V5, V6 has a value equal to its MD1 code or in the range specified by its MD2 code, the MDATA function will return a value of "true", and result variable R1 will be set to its first missing data code. Otherwise, the MDATA function will return a value of "false" and R1 is set to the sum of V1, V5, V6.

4.10  Assignment Statements

These are the main structural units of the Recode language. They are used to assign a value to a result. Any number between 1 and 9999 may be used for an R-variable but it avoids confusion if the R-numbers are distinct from V-numbers of variables in the input dictionary, e.g. if there are 22 variables in the dictionary then start numbering R-variables from R30. Assignment statements can also be used to assign a new value to an input variable. In this case the original value of the input variable is lost for the duration of the particular IDAMS program execution.

Prototype: variable=expression

Where:

Examples:


     R10=5
R10 is assigned the constant 5 as its value.

     R5=2*V10 + (V11 + V12)/2
Any arithmetic expression may be used and parentheses are used to change normal precedence of the arithmetic operators.

     V20=SQRT(V20)
The value in V20 is replaced by its square root using the SQRT function.

     R20=BRAC(V6,0-15=1,16-25=2,26-35=3,36-90=4,ELSE=9)
R20 is assigned the value 1, 2, 3, 4 or 9 according to the group into which the value of V6 falls.

     R10=MD1(V10)
R10 is assigned a value equal to V10's first missing data code.

4.11  Special Assignment Statements

DUMMY. The DUMMY statement produces a series of "dummy variables", coded 0 or 1, from a single variable.

Prototype: DUMMY var1,...,varn USING var(val1)(val2)...(valn)[ELSE expression]

Where:

Example:


     DUMMY R1-R3 USING V8(1-4)(5,7,9)(0,8) ELSE 99
The following chart shows the values of R1, R2 and R3 based on different V8 values:

             V8:   1   2   3   4   5   7   8   9   0   OTHER
             R1:   1   1   1   1   0   0   0   0   0   99
             R2:   0   0   0   0   1   1   0   1   0   99
             R3:   0   0   0   0   0   0   1   0   1   99
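
The chart can be reproduced with a small Python model of the DUMMY statement. This is an illustration only: the name `dummy` and the use of sets for the value groups are hypothetical, and returning all zeros when no group matches and no ELSE value is given is a choice of this sketch.

```python
def dummy(value, groups, else_value=None):
    """Return one 0/1 flag per group; else_value for all if nothing matches."""
    flags = [1 if value in group else 0 for group in groups]
    if not any(flags) and else_value is not None:
        return [else_value] * len(groups)
    return flags

# DUMMY R1-R3 USING V8(1-4)(5,7,9)(0,8) ELSE 99
groups = [{1, 2, 3, 4}, {5, 7, 9}, {0, 8}]
chart = {v8: dummy(v8, groups, else_value=99) for v8 in (1, 5, 8, 0, 6)}
```

Each key of `chart` is a V8 value and each list holds the resulting R1, R2 and R3; the value 6 stands for the OTHER column.
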
SELECT. The SELECT statement causes the variable in the FROM list holding the same position as the value of the BY variable to be set equal to the value of the expression to the right of the equals sign i.e. it selects which variable is to be assigned a value. If the value of the BY variable is less than 1 or greater than the number of variables in the FROM list, a fatal error results. The maximum number of items in the FROM list is 50. Therefore the maximum value of the BY variable is 50.

Prototype: SELECT (FROM=variable list, BY=variable)=expression

Examples:


     SELECT (FROM=R1,V3-V10, BY=R99)=1
     SELECT (BY=V1, FROM=V8,R2,R5)=R7*5
In the first example, R1 will be set to 1 if R99 equals 1; V3 will be set to 1 if R99 equals 2; ... ; and V10 will be set to 1 if R99 equals 9. If R99 is greater than 9 or less than 1, a fatal error will result. The values of the eight variables not selected will not be altered.

SELECT may be used to form a loop as follows:


         R99=1
     L1  SELECT (BY=R99, FROM=R1,V3-V10)=0
         IF R99 LT 9 THEN R99=R99+1 AND GO TO L1
The nine variables R1, V3-V10 will be set to zero, one after another, as R99 is incremented from 1 to 9. The loop is completed when R99 equals 9 and all variables have been initialized.

4.12  Control Statements

Recode statements are normally executed on each data case in order from first to last. The order can be changed with one of the control statements:

Statement  Example               Purpose
BRANCH     BRANCH (V16,L1,L2)    Branch depending on the value of a variable
CONTINUE   CONTINUE              Continue with next statement
ENDFILE    ENDFILE               Do not process any more data cases after this one
ERROR      ERROR                 Terminate execution completely
GO TO      GO TO TOWN            Branch unconditionally
REJECT     REJECT                Reject the current data case
RELEASE    RELEASE               Release the current data case to the program
                                 for processing and then execute recode
                                 statements again without reading another case
RETURN     RETURN                Use the current case for analysis
                                 with no further recoding

BRANCH. The BRANCH statement changes the sequence in which statements are executed, depending on the value of a variable.

Prototype: BRANCH(var,labels)

Where:

Example:

     BRANCH(R99,LAB1,LAB2,LAB3)
Transfer is made to LAB1, LAB2, or LAB3, depending on whether R99 has a value of 1,2, or 3.

CONTINUE. CONTINUE is a simple statement which performs no operation. It is used as a convenient transfer point.

Prototype: CONTINUE

Example:


            IF V17 EQ 10 THEN GO TO AT
            R10=V11
            GO TO THAT
     AT     R20=V11*100
     THAT   CONTINUE
ENDFILE. The ENDFILE statement causes the Recode facility to close the input dataset exactly as if an end-of-file had been reached. If the EOF function has been specified, the EOF function will be given a true value for a final pass through the Recode statements from the beginning, after ENDFILE has been executed.

Prototype: ENDFILE

Example:


     IF V1 EQ 100 THEN ENDFILE
This statement can be used to test a set of Recode statements or an IDAMS setup on the first n cases of a dataset.

ERROR. The ERROR statement directs the Recode facility to terminate execution with an error message that indicates the number of the case and the number of the Recode statement at which the error occurred.

Prototype: ERROR

Example:


         IF R6 EQ 2 THEN GO TO B
         ERROR
     B   CONTINUE
GO TO. The GO TO statement is used to change the sequence in which the statements are executed. In the absence of a GO TO or a BRANCH statement, each statement is executed sequentially.

Prototype: GO TO label

Where label is a 1-4 character statement label. The statement identified by the label may be physically before or after the GO TO statement. (Warning: Be careful of referencing a statement before the GO TO, as an endless loop can be formed).

Example:


           GO TO TOWN
           .
           .
           R10=R5
           GO TO 1
     TOWN  R10=R5+V11
     1     R11=...
REJECT. The REJECT statement directs the Recode facility to reject the present case and obtain another case. The new case is then processed from the beginning of the Recode statements. Thus, REJECT can be used as a filter with R-variables.

Prototype: REJECT

Example:


     IF MDATA (V8,V12-V13) THEN REJECT
RELEASE. The RELEASE statement directs the Recode facility to release the present case to the program for processing and to regain control after the processing without reading another case. After regaining control, Recode resumes with the first Recode statement. RELEASE can be used to break up a single record into several cases for analysis. Note: When using the RELEASE statement, care should be taken that processing will not continue indefinitely.

Prototype: RELEASE

Example:


     CARRY (R1)
     R1=R1+1
     IF R1 LT V1 THEN RELEASE ELSE R1=0
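
The control flow of this example can be traced with a small Python sketch. It is an illustrative model, not IDAMS code: the function name `expand_records` and the tuple output are hypothetical, and the sketch assumes that a case which reaches the end of the Recode statements is passed to the program in the usual way.

```python
def expand_records(v1_values):
    """Model CARRY(R1) / R1=R1+1 / IF R1 LT V1 THEN RELEASE ELSE R1=0.

    v1_values holds the value of V1 for each input record.
    Returns the (V1, R1) pairs seen by the analysis program.
    """
    r1 = 0          # CARRY variable: set to zero once, kept across cases
    passed = []
    for v1 in v1_values:
        while True:
            r1 = r1 + 1
            if r1 < v1:
                # RELEASE: hand the case to the program, then restart
                # from the first statement without reading a new record
                passed.append((v1, r1))
                continue
            r1 = 0
            # End of statements: case is passed, next record is read
            passed.append((v1, r1))
            break
    return passed
```

Under these assumptions a record with V1=3 reaches the program three times (with R1 equal to 1, 2 and 0), while a record with V1 of 1 or less reaches it once. Because R1 is reset to 0 as soon as it reaches V1, processing of each record always terminates.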
RETURN. The RETURN statement directs the Recode facility to return control to the IDAMS program. No other Recode statements are executed for the current case.

Prototype: RETURN

Example:


         IF V8 LT 12 THEN GO TO A
         RETURN
     A   R10=V8

4.13  Conditional Statements

The IF statement allows conditional assignment and/or conditional control. It is a compound statement with several simple statements connected by the keywords THEN, AND and ELSE.

Prototype:

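IF test THEN stmt1 [AND stmt2 AND ... stmt n] [ELSE estmt1 [AND estmt2 AND ... estmt n]]
Where:

     test is a logical expression.
     stmt1, ..., stmt n are the assignment or control statements executed when test is true.
     estmt1, ..., estmt n are the assignment or control statements executed when test is false.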

Examples:


     IF V5 EQ V6 THEN R1=1 ELSE R1=2
Set R1 to 1 if the value of V5 equals the value of V6; otherwise set R1 to 2.


     IF MDATA(V7,V10-V12) THEN R6=MD1(V7) AND R10=99 -
        ELSE R6=V7+V10+V11 AND R10=V12*V7
Set R6 to V7's first missing data value and R10 to 99 if any of the variables V7, V10, V11 or V12 has a missing data value. Otherwise set R6 to the sum of V7, V10 and V11, and set R10 to the product of V12 and V7.


     IF (V5 NE 7 AND R8 EQ 9) THEN V3=1 ELSE V3=0
Set V3 to 1 if both V5 is not equal to 7 and R8 is equal to 9. (Note: The parentheses are not required).


     IF MDATA(V6) OR V10 LT 0 THEN GO TO X
If the value of V6 is missing or V10 is less than 0, branch to the statement labelled X; otherwise continue with the next statement.

4.14  Initialization/Definition Statements

These statements are executed once, before processing of the data starts, to initialize values to be used during the execution of Recode statements. They cannot be used in expressions and they cannot have labels.

CARRY. The CARRY statement causes the values of the variables listed to be carried over from case to case. CARRY variables are initialized only once (before starting to read the data) to zero. The CARRY variables can be used as counters or as accumulators for aggregation.

Prototype: CARRY(varlist)

Where varlist is a list of R-variables.

Example:


     CARRY(R1,R5-R10,R12)
MDCODES. The MDCODES statement changes dictionary missing data codes for input variables or assigns missing data codes for result variables. Defaults used by Recode for R- and V-variables with no dictionary missing data specification and no MDCODES specification are MD1=1.5 * 10^9 and MD2=1.6 * 10^9.

Prototype: MDCODES (varlist1)(md1,md2),(varlist2)(md1,md2), ..., (varlistn)(md1,md2)

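Where:

     varlist is a single variable or a list of V- and/or R-variables; the parentheses may be omitted for a single variable.
     md1 and md2 are the first and second missing data codes. Either may be omitted, in which case the default value is used.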

Examples:


     MDCODES V5(8,9)
The first missing data code for V5 will be 8; the second missing data code will be 9.

     MDCODES (R9-R11)(,99), V7(8,9), V6(9)
For R9, R10 and R11, the first missing data code will be 1.5 * 10^9 and the second missing data code will be 99.
For V7, the first missing data code will be 8 and the second missing data code will be 9.
For V6, the first missing data code will be 9 and the second missing data code will be 1.6 * 10^9.
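
The way omitted codes fall back to the defaults can be modelled in a few lines of Python. This is only a sketch of the rule illustrated by the examples above, for variables with no dictionary missing data specification; the names `resolve_mdcodes`, `DEFAULT_MD1` and `DEFAULT_MD2` are hypothetical.

```python
DEFAULT_MD1 = 1.5e9  # default first missing data code given in the text
DEFAULT_MD2 = 1.6e9  # default second missing data code given in the text


def resolve_mdcodes(md1=None, md2=None):
    """Return the effective (MD1, MD2) pair; None means 'not specified'."""
    first = DEFAULT_MD1 if md1 is None else md1
    second = DEFAULT_MD2 if md2 is None else md2
    return first, second
```

Here `resolve_mdcodes(md2=99)` corresponds to `(R9-R11)(,99)` and `resolve_mdcodes(9)` corresponds to `V6(9)`.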

NAME. The NAME statement assigns names to R-variables or renames V-variables.

Prototype: NAME var1 'name1' ,var2 'name2', ..., varn 'name n'

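Where:

     var is a V- or R-variable.
     name is the name to be assigned, enclosed in apostrophes. An apostrophe within the name is written as two apostrophes.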

Example:


     NAME R1 'V5 + V6', V1 'PERSON''S STATUS'

4.15  Examples of Use of Recode Statements

Suppose a data file exists with the following variables:

     V1   Village ID
     V2   Sex                    1=male, 2=female
     V4   Age                    21-98, 99=not stated
     V5   Education level        1=primary, 2=secondary, 3=university, 9=not stated
     V8   Income from 1st job
     V9   Income from 2nd job
     V10  Partner's income
     V21  Weight in kg (one decimal)
     V22  Height in meters (2 decimals)
     V31  Owns car?              1=yes, 2=no, 9=NS
     V32  Owns TV?
     V33  Owns stereo?
     V34  Owns freezer?
     V35  Owns microcomputer?
     V41  Number of children
     V42  Age of 1st child
     V43  Age of 2nd child
     V44  Age of 3rd child
     V45  Age of 4th child

Ways to construct some possible analysis variables from this data are outlined below.

  1. Total Income. If the incomes from the 1st and 2nd jobs are both missing, then the total income will be missing. If only one is missing, then use the other as the total.
    
                       IF NVALID(V8,V9) EQ 0 THEN R101=-1 AND GO TO END
                       IF NVALID(V8,V9) EQ 2 THEN R101=V8+V9 AND GO TO END
                       IF MDATA(V8) THEN R101=V9 ELSE R101=V8
                  END  CONTINUE
                       MDCODES R101(-1)
    
         or       R101=SUM(V8,V9,MIN=1)
                  IF R101 EQ 1.5 * 10 EXP 9 THEN R101=-1
                  MDCODES R101(-1)
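
The rule behind both variants can be stated compactly in Python. This is a sketch of the logic only, not of how NVALID or SUM are implemented: `None` stands in for a missing value, and `total_income` and `MISSING_TOTAL` are hypothetical names.

```python
MISSING_TOTAL = -1  # the code declared by MDCODES R101(-1)


def total_income(job1, job2):
    """Sum the valid incomes; None represents a missing value."""
    valid = [v for v in (job1, job2) if v is not None]
    if not valid:
        # Both incomes missing: the total is missing as well
        return MISSING_TOTAL
    # One or two valid incomes: add up whatever is available
    return sum(valid)
```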
    
  2. Do not use the case if total income is zero or missing.
    
                  IF MDATA(R101) OR R101 EQ 0 THEN REJECT
    
  3. Composite income taking 3/4 of own income plus 1/4 of partner's income. If partner's income is missing, assume zero.
    
                  IF MDATA(V10) THEN V10=0
                  IF MDATA(R101) THEN R102=MD1(R102)  -
                     ELSE R102=R101 * .75 + V10 * .25
                  NAME R102'Composite income'
                  MDCODES R102(99999)
    
  4. Weight of respondent grouped into light (30-50), medium (51-70) and heavy (71+).
    
                  R103=BRAC(V21,30-50=1,50-70=2,70-200=3,ELSE=9)
    
    Note that V21 is recorded with a decimal place. To make sure that values such as 50.2 get assigned to a category, ranges in the BRAC function should overlap. Recode works from left to right and assigns the code for the first range into which the case falls. Thus a value of 50.0 will fall in category 1 but a value of 50.1 will fall into category 2. To put values of 50 in the 2nd category, use
    
                  R103=BRAC(V21, <50=1, <70=2, <200=3, ELSE=9)
    
    A value of 49 would fit in all 3 ranges, but Recode will use the first valid range it finds (code 1). A value of 50 will not satisfy the first range and will be assigned code 2.
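
The first-match rule described here can be checked with a short Python model. It is a sketch under stated assumptions, not the BRAC implementation: closed ranges are taken to include both end points (as implied by 50.0 falling in category 1), and the function and constant names are hypothetical.

```python
def brac_closed(value, ranges, else_code):
    """ranges: list of (low, high, code), bounds inclusive; first match wins."""
    for low, high, code in ranges:
        if low <= value <= high:
            return code
    return else_code


def brac_below(value, limits, else_code):
    """limits: list of (upper, code); a rule matches when value < upper."""
    for upper, code in limits:
        if value < upper:
            return code
    return else_code


# 30-50=1, 50-70=2, 70-200=3
OVERLAPPING = [(30, 50, 1), (50, 70, 2), (70, 200, 3)]
# <50=1, <70=2, <200=3
OPEN_ENDED = [(50, 1), (70, 2), (200, 3)]
```

With these definitions, 50.0 is coded 1 and 50.1 is coded 2 under the overlapping ranges, while under the "less than" form 49 is coded 1 and 50 is coded 2.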

  5. Affluence index with values 0-5 according to the number of possessions owned.
    
                  R104=COUNT(1,V31-V35)
    
    If all items are coded 1 (yes), the index, R104, will take the value 5. If all are coded 2 (no) or are missing, then the index will be zero.

  6. Create 3 dummy variables (coded 0/1) from the education variable.
    
                  DUMMY R105-R107 USING V5(1)(2)(3)
    
    The 3 result variables will take values as follows:

    V5=1               R105=1, R106=0, R107=0
    V5=2               R105=0, R106=1, R107=0
    V5=3               R105=0, R106=0, R107=1
    V5 not 1, 2 or 3   R105=0, R106=0, R107=0   (default if no ELSE value given)
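
The same mapping, written as a Python sketch for comparison. It models only the default behaviour shown above (no ELSE value); the function name `dummy` is hypothetical.

```python
def dummy(value, categories):
    """Return one 0/1 indicator per category, in the order given."""
    return [1 if value == category else 0 for category in categories]
```

For example, `dummy(2, [1, 2, 3])` gives the values of R105, R106 and R107 for a case with V5=2.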

  7. Age of youngest child. Ages of the last 4 children are stored in variables 42 to 45, the oldest child being in V42. If someone has 3 children, then the value of V44 gives the age of the youngest child; if someone has 4 or more children then we want V45. In this case, V41 (number of children) can be used as an index to select the correct variable using the SELECT function.
    
                  IF V41 GT 4 THEN V41=4
                  IF V41 EQ 0 OR MDATA(V41) THEN  R109=99 ELSE  -
                     R109=SELECT (FROM=V42-V45, BY=V41)
                  NAME R109'Last child''s age'
                  MDCODES R109(99)
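
The indexing idea can be sketched in Python as follows. This is an illustration of the logic rather than of the SELECT function itself: `None` stands in for a missing number of children, and the names `youngest_child_age` and `MISSING_AGE` are hypothetical.

```python
MISSING_AGE = 99  # the code declared by MDCODES R109(99)


def youngest_child_age(n_children, ages):
    """ages holds the values of V42-V45, oldest child first."""
    if n_children is None or n_children == 0:
        return MISSING_AGE
    # More than 4 children: use the 4th recorded age
    index = min(n_children, 4)
    # The number of children is a 1-based index into the age variables
    return ages[index - 1]
```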
    
  8. Weight/Height ratio as a decimal number and rounded to the nearest integer.
    
                  IF MDATA (V21,V22) OR V22 EQ 0 THEN R111=99 AND R112=99 -
                     ELSE R111=V21/V22 AND R112=TRUNC ((V21/V22) + .5)
                  NAME R111'Weight/Height ratio dec', R112 'W/H rounded'
                  MDCODES (R111,R112)(99)
    
  9. Create a single variable combining sex and educational level into 4 groups as follows:
    Females, primary education only
    Females, secondary+ education
    Males, primary education only
    Males, secondary+ education

    Method a. First reduce the codes for sex and education into contiguous codes starting from 0, storing the results temporarily in variables R901, R902.

    
                  R901=BRAC (V2,1=0,2=1,ELSE=9)
                  R902=BRAC (V5,1=0,2=1,3=1,ELSE=9)
    
    Then use the COMBINE function, making sure first that cases with spurious codes are put in a missing data category.
    
                  IF R901 GT 1 OR R902 GT 1 THEN R110=9 ELSE -
                     R110=COMBINE R901(2),R902(2)
    
    Method b. Use IFs, setting a default value of 9 at the start.
    
                  R110=9
                  IF V2 EQ 1 AND V5 EQ 1 THEN R110=1
                  IF V2 EQ 1 AND V5 INLIST (2,3) THEN R110=2
                  IF V2 EQ 2 AND V5 EQ 1 THEN R110=3
                  IF V2 EQ 2 AND V5 INLIST (2,3) THEN R110=4
    
    Method c. Use the RECODE function.
    
                  R110=RECODE V2,V5(1/1)=1,(1/2-3)=2,(2/1)=4,(2/2-3)=5,ELSE=9
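
For comparison, the coding produced by Method b can be written as a Python sketch. It mirrors the IF statements of Method b only (codes 1 to 4, with 9 for any other combination); the function name and argument names are hypothetical.

```python
def sex_education_group(sex, education):
    """sex: 1 or 2; education: 1=primary, 2=secondary, 3=university."""
    if sex not in (1, 2) or education not in (1, 2, 3):
        return 9  # default value for spurious or missing codes
    # Collapse education to 0 (primary only) or 1 (secondary or higher)
    higher = 0 if education == 1 else 1
    # Two education groups within each sex, numbered from 1
    return 1 + 2 * (sex - 1) + higher
```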
    
  10. Aggregating cases with Recode. Suppose we want to analyze the data (consisting of individual level records) at the village level, for example to produce a table showing the distribution of villages by income (V8,V9) and % of people owning a car (V31) in the village. We could do this by using AGGREG to aggregate the data to the village level and then executing TABLES. Alternatively, we may use the CARRY, EOF and REJECT statements of the Recode language and use TABLES directly.
    
          1         CARRY (R901,R902,R903,R904)
          2         IF (R901 EQ 0) THEN R901=V1
          3         IF (R901 NE V1) THEN GO TO VIL
          4         IF EOF THEN GO TO VIL
          5         R902=R902+1
          6         R903=R903+V8+V9
          7         IF (V31 EQ 1) THEN R904=R904+1
          8         REJECT
          9   VIL   R101=(R904*100)/R902
         10         R101=BRAC(R101,<25=1,<50=2,<75=3,<101=4)
         11         R102=R903/R902
         12         R102=BRAC(R102,<1000=1,<2000=2,<5000=3,ELSE=4)
         13         R901=V1
         14         R902=1
         15         R903=V8+V9
         16         IF (V31 EQ 1) THEN R904=1 ELSE R904=0
         17         NAME R102'average income', R101'% owning car'
    
    R901 is a work variable used to hold the current village ID; when the first case is read (R901=0), R901 is assigned the value of the village ID (V1); R902 to R904 are work variables for, respectively, the number of people in the village, the total income of the people in the village and the number of people owning cars in the village.

    While the village ID stays the same, data is accumulated in variables R902 to R904 (whose values are "carried" as new cases are read). The case is then rejected (not passed to the analysis) and the next case read. When a change in village ID is encountered, the instructions at label VIL are executed: the current contents of R902, R903 and R904 are used to compute the required variables (grouped mean income and grouped % of car owners) and these variables are then passed to the analysis after first resetting the work variables to the values for the last case read (the first case for the next village). When the end of file is reached, we need to make sure that the data from the last village is used. Statement 4 achieves this.
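
The same control-break technique is sketched below in Python. It is a model of the logic, not a translation of the Recode statements: it assumes, as the Recode version does, that cases are sorted by village ID; it takes income as a single number (V8+V9); and it returns the ungrouped mean and percentage rather than the bracketed codes. The function name `aggregate_by_village` is hypothetical.

```python
def aggregate_by_village(cases):
    """cases: iterable of (village_id, income, owns_car), sorted by village_id.

    Returns one (village_id, mean_income, pct_car_owners) tuple per village.
    """
    results = []
    current = None               # plays the role of R901
    people = total = cars = 0    # play the roles of R902, R903, R904
    for village, income, owns_car in cases:
        if current is not None and village != current:
            # Village changed: output the finished village, then reset
            results.append((current, total / people, 100 * cars / people))
            people = total = cars = 0
        current = village
        people += 1
        total += income
        if owns_car == 1:
            cars += 1
    if current is not None:
        # End of file: the last village still has to be output
        results.append((current, total / people, 100 * cars / people))
    return results
```

The final `if` corresponds to statement 4 (the EOF test): without it, the data accumulated for the last village would never be passed on.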


4.16  Restrictions

  1. Maximum number of R-variables is 200.
  2. Maximum number of numbered tables (BRAC, RECODE, TABLE) is 20.
  3. Maximum number of characters in a Recode statement excluding continuation -'s is 1024.
  4. Maximum number of statement labels is approximately 60.
  5. Maximum number of constants, including those in all tables, is approximately 1500.
  6. Maximum number of names that may be defined in NAME statements is 70.
  7. Maximum number of missing data values that may be defined in MDCODES statements is 100 and only 2 decimal places are retained for R-variables.
  8. Maximum number of parenthetical nestings within a statement (i.e. parentheses within parentheses) is 20.
  9. Maximum number of arithmetic operators is approximately 400.
  10. Maximum number of variables with SELECT statement is 50.
  11. Maximum number of IF statements is approximately 100.
  12. Maximum number of function nestings (i.e. function references as function arguments) is 25.
  13. Maximum number of statements is approximately 200.
  14. Maximum number of labels in a BRANCH statement is 20.
  15. Maximum number of CARRY variables is 100.
  16. The "maximum number of variables" given in the "Restrictions" section of each analysis program write-up includes R- and V-variables used in the analysis and V-variables used in Recode but not used in the analysis. Thus, if a program has a 40-variable maximum and 40 input variables are used in the analysis, one cannot use any other input variables than those 40 in the Recode statements. R-variables defined in Recode statements but not used in the analysis need not be counted toward the "maximum number of variables".
  17. Filtering takes place prior to recoding so that result variables may not be referenced in main filters.


4.17  Note

Univariate/bivariate recoding can be achieved using the TABLE, IF or RECODE method. Below is a brief comparison of these methods with respect to two aspects of execution.

Completeness

Size of table