Univariate and Bivariate Tables (TABLES)

# 37    Univariate and Bivariate Tables (TABLES)

## 37.1  General Description

The main use of TABLES is to obtain printed univariate or bivariate frequency tables with optional row, column and corner percentages and optional univariate and bivariate statistics. Tables of mean values of a variable can also be printed.

Both univariate/bivariate tables and bivariate statistics can be output to a file so that can be used with a report generating program for being printed in a user designed format, or can be input to GraphID or other packages such as EXCEL for graphical display.

Univariate tables. Both univariate frequencies and cumulative univariate frequencies may be generated for any number of input variables and may also be expressed as percentages of the weighted or unweighted total frequency. In addition, the mean of a cell variable can be obtained.

Bivariate tables. Any number of bivariate tables may be generated. In addition to the weighted and/or unweighted frequencies, a table may contain frequencies expressed as percentages based on the row marginals, column marginals or table total, and the mean of a cell variable. These various items may be printed in a single table with a possible six items per cell, or each may be printed as a distinct table.

Univariate statistics. For univariate analyses, the following statistics are available: mean, mode, median, variance (unbiased), standard deviation, skewness and kurtosis. A quantile option (NTILE) is also available. Division into as few as three parts or as many as ten parts may be requested.

Bivariate statistics. For bivariate analyses, the following statistics can be requested:

 - t-tests of means (assumes independent populations) between pairs of rows, - chi-square, contingency coefficient and Cramers V, - Kendalls Taus, Gamma, Lambdas, - S (numerator of the tau statistics and of gamma), its standard and normal deviations, and its variance, - Spearman rho, - non-parametric tests: Wilcoxon, Mann-Whitney and Fisher.

Matrices of statistics. Matrices of any of the above bivariate statistics except tests or statistics of S can be printed or written to a file. Corresponding matrices of weighted and/or unweighted ns can be produced.

3- and 4-way tables. These can be printed by making use of the repetition and subsetting features. The repetition variable can be thought of as a control or panel variable. The subsetting feature can be used to further select cases for a particular group of tables.

Tables of sums. Tables in which the cells contain the sum of a dependent variable can be produced by specifying the dependent variable as the weight. E.g. specify WEIGHT=V208, where V208 represents a respondents income, in order to get the total income of all respondents falling into a cell.

Note. The following options are available to control the appearance of the printout:

A title may be specified for each set of tables.

Percentages and mean values, if requested, may be printed in separate tables.

The grid can be suppressed.

Rows which have no entries in a particular section of a large frequency table can be printed; tables with more than ten columns are printed in sections and the use of this "zero rows" option ensures that the various sections have the same number of rows (which is important if they are to be cut and pasted together).

## 37.2  Standard IDAMS Features

Case and variable selection. The standard filter is available to select a subset of cases from the input data. In addition, local filters and repetition factors (called subset specifications) may be used to select a subset of cases for a particular table. For tables which are individually specified, the variable(s) to be used for the table are selected with the table specification parameters R and C. For sets of tables, variables are selected with the table specification parameters ROWVARS and COLVARS.

Transforming data. IDAMS Recoding may be used. Note that only integer part of recoded variables is used by the program, i.e. recoded variables are rounded to the nearest integer.

Weighting data. A weight variable may optionally be specified for each set of tables. Note that if the weight is a V-variable with decimals its values are treated as if the decimal point were absent, e.g. the number 20.1 is treated as 201, the scale factor used (10 in the example) is however printed. When the value of the weight variable for a case is zero, negative, missing or non-numeric, then the case is always skipped; the number of cases so treated is printed.

Treatment of missing data.

1. The MDVALUES parameter is available to indicate which missing data values, if any, are to be used to check for missing data.
2. Univariate and bivariate frequencies are always printed for all codes in the data whether or not they represent missing data. To remove missing data from tables completely, a filter or a table subset can be specified. Alternatively appropriate minimum and/or maximum values of row and column variable can be defined.
3. Cases with missing data may optionally be included in the computation of percentages and bivariate statistics. This can be done using the MDHANDLING table parameter.
4. Cases with missing data on a cell variable are always excluded from univariate and bivariate tables.
5. Cases with missing data are always excluded from the computation of univariate statistics.

## 37.3  Printed Output

Input dictionary. (Optional: see the parameter PRINT). Variable descriptor records, and C-records if any, only for variables used in the execution.

A table of contents for the printed output. The contents shows each table produced and gives the page number where it is located. The following information is provided:

- row and column variable numbers (0 if none)
- variable number for the mean value - cell variable (0 if none)
- weight variable number (0 if none)
- row minimum and maximum values (0 if none)
- column minimum and maximum values (0 if none)
- filter name and repetition factor name
- percentages: row, column and total (T=requested, F=not requested)
- RMD: row-variable missing data (T=delete, F=do not delete)
- CMD: column-variable missing data (T=delete, F=do not delete)
- CHI: chi-square (T=requested, F=not requested)
- TAU: tau a, b or c (T=requested, F=not requested)
- GAM: gamma (T=requested, F=not requested)
- TEE: t-tests (T=requested, F=not requested)
- EXA: Fisher non-parametric test (T=requested, F=not requested)
- WIL: Wilcoxon non-parametric test (T=requested, F=not requested)
- MW: Mann-Whitney non-parametric test (T=requested, F=not requested)
- SPM: Spearman rho (T=requested, F=not requested).
Tables which were requested using the PRINT=MATRIX or WRITE=MATRIX table parameters are not listed in the contents and are always printed first with negative page and table numbers.

Other tables are printed in the order of the table specifications except for tables for which only univariate statistics are requested; these are always grouped together and printed last.

Bivariate tables. Each bivariate table starts on a new page; a large table may take more than one page. Tables are printed with up to 10 columns and up to 16 rows per page depending on the number of items in each cell. Columns and rows are printed only for codes which actually appear in the data. Row and column totals, and cumulative marginal frequencies and percentages if requested, are printed around the edges of the table.

A large table is printed in vertical strips. For example, a table with 40 row codes and 40 column codes would normally be printed on 12 pages as indicated in the following diagram, where the numbers in the cells show the order in which the pages are printed:

```
1st     2nd     3rd     4th
10      10      10      10      codes

1st 16 codes    1       4       7      10

2nd 16 codes    2       5       8      11

last 8 codes    3       6       9      12
```
Bivariate statistics. (Optional: see the table parameter STATS).

t-tests. (Optional: see the table parameter STATS). If t-tests were requested, they and the means and standard deviations of the column variable for each row are printed on a separate page.

Matrices of bivariate statistics. (Optional: see the table parameter PRINT). The lower-left corner of the matrix is printed. Eight columns and 25 rows are printed per page.

Matrix of Ns. (Optional: see the table parameter PRINT). This is printed in the same format as the corresponding statistical matrix.

Univariate tables. (Optional: see the table parameter CELLS). Normally each univariate table is printed beginning on a new page. Frequencies, percents and mean values of a variable, if requested, for ten codes are printed across the page.

Univariate statistics. (Optional: see the table parameter USTATS).

Quantiles. (Optional: see the table parameter NTILE). N-1 points are printed; e.g. if quartiles are requested, the parameter NTILE is set to 4 and 3 breakpoints will be printed.

Page numbers. These are of the form: ttt.rr.ppp where

 ttt = table number rr = repetition number (00 if no repetition used) ppp = page number within the table.

## 37.4  Output Univariate/Bivariate Tables

Univariate and/or bivariate tables with statistics requested in the table parameter CELLS may be output to a file by specifying WRITE=TABLES. The tables are in the format of IDAMS rectangular matrix (see "Data in IDAMS" chapter). One matrix is output for each statistic requested. If a repetition factor is used, one matrix is output for each repetition.

Columns 21-80 on the matrix-descriptor record contain additional description of the matrix as follows:

21-40
Row variable name (for bivariate tables).
41-60
Column variable name.
61-80
Description of the values in the matrix.

Variable identification records (#R and #C) contain code values and code labels for the row and the column variable respectively.

The statistics are written as 80 character records according to a 7F10.2 Fortran format. Columns 73-80 contain an ID as follows:

73-76
Identification of the statistic: FREQ, UNFR, ROWP, COLP, TOTP or MEAN (see the table parameter CELLS).
77-80
Table number.

Note that the missing data codes are not included in the matrix.

## 37.5  Output Bivariate Statistics Matrices

Selected statistics may be output to a file. If, for example, gammas and tau bs were selected, a matrix of gammas and a separate matrix of tau bs would be generated. Output matrices of bivariate statistics are requested by specifying WRITE=MATRIX and either ROWVARS or ROWVARS and COLVARS table parameters. If a repetition factor is used, one matrix is output for each repetition. The matrices are in the format of IDAMS square or rectangular matrices (see "Data in IDAMS" chapter). The values in the matrix are written with Fortran format 6F11.5. Columns 73-80 contain an ID as follows:

73-76
Identification of the statistic: TAUA, TAUB, TAUC, GAMM, LSYM, LRD, LCD, CHI, CRMV or RHO (see the table parameter STATS).
77-80
Table number.

Note. If only ROWVARS is provided, dummy means and standard deviations records are written, 2 records per 60 variables. The second format (#F) record in the dictionary specifies a format of 60I1 for these dummy records. This is so that the matrix conforms to the format of an IDAMS square matrix.

## 37.6  Input Dataset

The input is a data file described by an IDAMS dictionary. All variables referenced must be numeric.

In distributions and weights, V-variables with decimal places are treated as if the decimal point were absent, e.g. the number 20.1 is treated as if it were 201, the scale factor used (10 in the example) is however printed. Fields containing non-numeric characters (including fields of blanks) can be tabulated by setting the parameter BADDATA to MD1 or MD2. See "The IDAMS Setup File" chapter.

## 37.7  Setup Structure

 ``` \$RUN TABLES \$FILES File definitions \$RECODE (optional) Recode statements \$SETUP 1. Filter (optional) 2. Label 3. Parameters 4. Subset specifications (optional) 5. TABLES 6. Table specifications (repeated as required) \$DICT (conditional) Dictionary \$DATA (conditional) Data Files: FT02 output tables/matrices DICTxxxx input dictionary (omit if \$DICT used) DATAxxxx input data (omit if \$DATA used) PRINT printed output (default IDAMS.LST) ```

## 37.8  Program Control Statements

Refer to "The IDAMS Setup File" chapter for further descriptions of the program control statements, items 1-3 and 6 below.

1. Filter (optional). Selects a subset of cases to be used in the execution.
```
Example:  INCLUDE V3=6
```
2. Label (mandatory). One line containing up to 80 characters to label the printed output.
```
Example:  FREQUENCY TABLES
```
3. Parameters (mandatory). For selecting program options.
```
```
INFILE=IN /xxxx
A 1-4 character ddname suffix for the input dictionary and data files.
Default ddnames: DICTIN, DATAIN.

Treatment of non-numeric data values. See "The IDAMS Setup File" chapter.

MAXCASES=n

The maximum number of cases (after filtering) to be used from the input file.
Default: All cases will be used.

MDVALUES=BOTH /MD1/MD2/NONE

Which missing data values are to be used for the variables accessed in this execution. See "The IDAMS Setup File" chapter.

PRINT=(CDICT/DICT, TIME)

CDIC
Print the input dictionary for the variables accessed with C-records if any.
DICT
Print the input dictionary without C-records.
TIME
Print the time after each table.

4. Subset specifications (optional). These statements permit selection of a subset of cases for a table or set of tables.
```
Example:  CLASS    INCLUDE V8=1,2,3,-7,9
```
There are two types of subset specifications: local filters and repetition factors. Each has a different function, but their formats are very similar. One specification may be used as a local filter for one or more tables and as a repetition factor for other tables. The format of these specifications is a standard IDAMS filter statement preceded by a subset name beginning with an alphabetic character in columns 1-8. This name must match exactly the name to be used on the table specifications; leading or embedded blanks are not ignored. It is always changed to upper case.

It is recommended that all names be left-justified. The filter statement must start in column 9 or beyond.

For repetition factors, only one variable may be specified in the expression.

The way local filters and repetition factors work is described below.

Local filters. A subset specification is identified as a local filter for a table or set of tables by specifying the subset name with the FILTER parameter. The local filter operates in the same manner as the standard filter except that it applies only to the table specification(s) in which it is referenced.

```
Example:  EDUCATN         INCLUDE V4=0-4,9 AND V5=1
(subset name)   (expression)
```
In the example above, if EDUCATN is designated as a local filter on the table specification, the table would be produced including only cases coded 0, 1, 2, 3, 4 or 9 for V4 and 1 for V5.

Repetition factors. A subset specification is identified as a repetition factor for a table or set of tables by specifying the subset name with the REPE parameter. Only one variable may be given on a subset specification to be used as a repetition factor. Repetition factors permit the generation of 3-way tables where the variable used in the repetition factor can be considered as the control or panel variable. Using a repetition factor and a filter, 4-way tables may be produced.

INCLUDE expressions cause tables to be produced including cases for each value or range of values of the control variable used in the expression. Commas separate the values or ranges. Thus if there are n commas in the expression, n+1 tables will be produced.

```
Example:  EDUCATN         INCLUDE V4=0-4,9
(subset name)   (expression)
```
In the above example, if EDUCATN is designated as a repetition factor, two tables will result: one including cases coded 0-4 for variable 4, and another including cases coded 9 for variable 4.

EXCLUDE may be used to produce tables with all values except those specified.

```
Example:  EDUCATN         EXCLUDE V1=1,4
(subset name)   (expression)
```
In the above example, if EDUCATN is designated as a repetition factor, two tables will result: one including all values except 1 and another including all values except 4.

5. TABLES. The word TABLES on this line signals that table specifications follow. It must be included (in order to separate subset specifications from table specifications) and must appear only once.

6. Table specifications. Table specifications are used to describe the characteristics of the tables to be produced. The coding rules are the same as for parameters. Each set of table specifications must start on a new line.
```
Examples:

R=(V6,1,8) CELLS=FREQS                (One univariate table).
R=(V6,1,8)  C=(V9,0,4) -              (One bivariate table with repetition
REPE=SEX  CELLS=(ROWP,FREQS)           factor, i.e. 3-way table).
ROWV=(V5-V9) CELLS=FREQS USTA=MEAN    (Set of univariate tables).
ROWV=(V3,V5) COLV=(V21-V31)  -        (Set of bivariate tables).
R=(0,1,8) C=(0,1,99)
```
ROWVARS=(variable list)
List of variables for which univariate tables are required or to be used as the rows in bivariate tables.

COLVARS=(variable list)

List of variables to be used as columns for bivariate tables.

R=(var, rmin, rmax)

var
Row or univariate variable number for a single table. To supply minimum and maximum values for a set of tables, set the variable number to zero, e.g. R=(0,1,5); in this case the minimum and maximum codes apply to all variables in the ROWVARS parameter.
rmin
Minimum code of the row variable(s) for statistical and percent calculations.
rmax
Maximum code of the row variable(s) for statistical and percent calculations.

If either rmin or rmax is specified, both must be specified. If only the variable number is specified, minimum and maximum values are not applied.

C=(var, cmin, cmax)

var
Column variable number for a single bivariate table. To supply minimum and maximum values for a set of tables, set the variable number to zero, e.g. C=(0,2,5); in this case, the minimum and maximum codes apply to all variables in the COLVARS parameter.
cmin
Minimum code of the column variable(s) for statistical and percent calculations.
cmax
Maximum code of the column variable(s) for statistical and percent calculations.

If either cmin or cmax is specified, both must be specified. If only the variable number is specified, minimum or maximum values are not applied.

TITLE=table title

Title to be printed at the top of each table in this set.
Default: No table title.

CELLS=(ROWPCT, COLPCT, TOTPCT, FREQS /NOFREQS, UNWFREQS, MEAN)

Contents of cells for tables when PRINT=TABLES or WRITE=TABLES specified.
ROWP
Percentages for univariate tables or percentages based on row totals for bivariate tables.
COLP
Percentages based on column totals in bivariate tables.
TOTP
Percentages based on grand total in bivariate tables.
FREQ
Weighted frequency counts (same as unweighted if WEIGHT not specified).
UNWF
Unweighted frequency counts.
MEAN
Mean of variable specified by VARCELL.

VARCELL=variable number

Variable number of the variable for which mean value is to be computed for each cell in the table.

MDHANDLING=ALL /R/C/NONE

ALL
Delete all missing data cases from statistics and percent calculations.
For bivariate tables, delete row-variable missing data from bivariate statistics and percent calculations.
Delete column-variable missing data from bivariate statistics and percent calculations.
NONE
Do not delete missing data. Note: missing data cases are always excluded from univariate statistics.

WEIGHT=variable number

The weight variable number if the data are to be weighted.

FILTER=xxxxxxxx

The 1-8 character name of the subset specification to be used as a local filter. Enclose the name in primes if it contains any non-alphanumeric characters. If the name does not match with any subset specification, the table will be skipped. Upper case letters should be used in order to match the name on the subset specification which is automatically converted to upper case.

REPE=xxxxxxxx

The 1-8 character name of the subset specification to be used as a repetition factor. Enclose the name in primes if it contains any non-alphanumeric characters. If the name does not match with any subset specification, the table will be skipped. Tables will be repeated for each group of cases specified. Upper case letters should be used in order to match the name on the subset specification which is automatically converted to upper case.

USTATS=(MEANSD, MEDMOD)

(Univariate tables only).
MEAN
Print mean, minimum, maximum, variance (unbiased), standard deviation, skewness, kurtosis, weighted and unweighted total number of cases.
MEDM
Print median and mode (if there are ties, numerically smallest value is selected).

NTILE=n

(Univariate tables only).
The n is the number of quantiles to be calculated; it must be in the range 3-10.

STATS=(CHI, CV, CC, LRD, LCD, LSYM, SPMR, GAMMA, TAUA, TAUB, TAUC, WILC, MW, FISHER, T)

If any bivariate statistics are to be printed or output supply the STAT parameter with each of the statistics desired.
Bivariate tables and matrix output
CHI
Chi-square. (If MATRIX is not requested, the selection of CHI, CV or CC will cause all three to be computed).
CV
Cramers V.
CC
Contingency coefficient.
LRD
Lambda, row variable is the dependent variable. (If MATRIX is not requested, the selection of any of the lambdas will cause all three to be computed).
LCD
Lambda, column variable is the dependent variable.
LSYM
Lambda, symmetric.
SPMR
Spearman rho statistic.
GAMM
Gamma statistic.
TAUA
Tau a statistic. (If MATRIX is not requested, the selection of any of the three taus will cause all three to be computed).
TAUB
Tau b statistic.
TAUC
Tau c statistic.

Bivariate tables only
WILC
Wilcoxon signed ranks test.
MW
Mann-Whitney test.
FISH
Fisher exact test.
t-tests between all combinations of rows, up to a limit of 50 rows.

DECPCT=2 /n

Number of decimals, maximum 4, printed for percentages.

DECSTATS=2 /n

Number of decimals printed for mean, median, taus, gamma, lambdas, and chi-square statistics. All other statistics will be printed with 2+n decimals (i.e. default of 4).

WRITE=MATRIX/TABLES

If an output file is to be generated, supply the WRITE parameter and the type of output.
MATR
Output the matrices of selected statistics.
If the ROWVARS parameter is specified produce a square matrix for each statistic requested by the STATS parameter using all pairings of the variables appearing in the list.
If the ROWVARS and COLVARS parameters are specified produce a rectangular matrix for each statistic requested by the STATS parameter using each variable appearing in the ROWVARS list paired with each variable appearing in the COLVARS list.
TABL
Output the tables of statistics requested with the CELLS parameter.

PRINT=(TABLES/NOTABLES, SEPARATE, ZEROS, CUM, GRIDNOGRID, N, WTDN, MATRIX)

Options relevant to univariate/bivariate tables only.
TABL
Print tables with items specified by CELLS.
SEPA
Print each item specified in CELLS as a separate table.
ZERO
Keep rows with zero marginals in printout. (Applicable only if table has more than 10 columns and hence must be printed in strips).
CUM
Print cumulative row and column marginal frequencies and percentages. If data are weighted, figures are computed on weighted frequencies only.
GRID
Print grid around cells of bivariate tables.
NOGR
Suppress grid around cells of bivariate tables.

Options relevant with WRITE=MATRIX only.
Print matrix of ns for matrices of statistics requested.
WTDN
Print matrix of weighted ns for matrices of statistics requested.
MATR
Print matrices of statistics specified under STATS.

## 37.9  Restrictions

1. The maximum number of variables for univariate frequencies is 400.
2. The combination of variables and subset specifications is subject to the restriction:
5NV + 107NF < 8499
where NF is the number of subset specifications and NV is the number of variables.
3. Code values for univariate tables must be in the range -2,147,483,648 to 2,147,483,647.
4. Code values for bivariate tables must be in the range -32,768 to 32,767. Any code values outside this range are automatically recoded to the end points of the range, e.g. -40,000 will become -32,768 and 40,000 will become 32,767. Thus, on the bivariate table specification, 32,767 is the maximum "maximum value". (Note that a 5-digit variable with a missing data code of 99999 will have the missing data row labeled 32,767 on the printout).
5. The maximum cumulative weighted or unweighted frequency for a table (and for any cell, row or column) is 2,147,483,647.
6. Table dimension maximums.
Bivariate: 500 row codes, 500 column codes, 3000 cells with non-zero entities.
Univariate: 3000 categories if frequencies, median/mode requested; otherwise, unlimited.
Note: For a variable such as income, if there are more than 3000 unique income values, one cannot get a median or mode without first bracketing the variable.
7. Non-integer V-variable values in distributions and in weights are treated as if the decimal point were absent; a scale factor is printed for each variable.
8. t-tests of means between rows are performed only on the first 50 rows of a table.
9. For bivariate statistical matrix output, the maximum number of variables that may be requested for a row or column is 95.
10. If output files for tables and matrices are both requested, these are output to the same physical file.
11. There is no way of labelling rows and columns of tables when recoded variables are used.

## 37.10  Example

In the example below, the following tables are requested:

1. Frequency counts for variables V201-V220.
2. Univariate statistics with no frequency tables for variables V54-V62 and V64. Means will have 1 decimal and other statistics 3 decimals.
3. Weighted and unweighted frequency counts and percentages with cumulative frequencies and percentages for variables V25-V30 and a grouped version of variable V7. Missing data cases are not to be excluded from the percentages or statistics. Median and mode statistics requested.
4. For the categories of the single variable V201, frequency counts and the mean of variable V54.
5. 8 bivariate tables (with row variables V25-V28 and column variables V29, V30) repeated by values 1 and 2 of variable V10 (sex), i.e. with sex as a panel (control) variable. Counts, row, column and total percentages will be in each cell. Chi-square and Taus statistics requested.
6. 3-way tables, using region (V3) grouped into 3 categories as the panel variable. Tables are restricted to male cases only (V10=1). Frequency counts and mean of variable V54 will appear in each cell.
7. A single weighted frequency count table, excluding cases where either the row variable and/or the column variable take the value 9.
8. Matrices of Tau A and Gamma statistics to be printed and written to a file for all pairs of variables V54-V62. A matrix of counts of valid cases for each pair of variables will also be printed.
```

\$RUN TABLES
\$FILES
PRINT  = TABLES.LST
FT02   = TREE.MAT                  matrices of statistics
DICTIN = TREE.DIC                  input dictionary file
DATAIN = TREE.DAT                  input data file
\$RECODE
R7=BRAC(V7,0-15=1,16-25=2,26-35=3,36-45=4,46-98=5,99=9)
NAME R7'GROUPED V7'
\$SETUP
TABLE EXAMPLES
MALE     INCLUDE V10=1
SEX      INCLUDE V10=1,2
REGION   INCLUDE V3=1-2,3-4,5
MD       EXCLUDE V19=9 OR V52=9
TABLES
1.  ROWV=(V201-V220)  TITLE='Frequency counts'
2.  ROWV=(V54-V62,V64)  USTATS=MEANSD  PRINT=NOTABLES  DECSTAT=1
3.  ROXV=(V25-V30,R7)   USTATS=MEDMOD  CELLS=(FREQS,UNWFREQS,ROWP)  -
WEIGHT=V9  PRINT=CUM  MDHAND=NONE
4.  R=(V201,1,3)  CELLS=(FREQS,MEAN)  VARCELL=V54
5.  ROWV=(V25-V28)  COLV=(V29-V30)  -
CELLS=(FREQS,ROWP,COLP,TOTP)  STATS=(CHI,TAUA)  REPE=SEX
6.  ROWV=(V201-V203)  COLV=V206  -
CELLS=(FREQS,MEAN)  VARCELL=V54  REPE=REGION  FILT=MALE
7.  R=V19  C=V52  WEIGHT=V9  FILT=MD
8.  ROWV=(V54-V62)  STATS=(TAUA,GAMMA)  PRINT=(MATRIX,N)  WRITE=MATRIX
```