### 4(1) Example of Pearson Correlation

Research Question: What is the pattern of relationships among eleven scientific fields in India's cooperation links with foreign countries?
Methodology: Pearson correlation
Dataset: COOP.DAT
##### SYNTAX
```
$RUN PEARSON
$FILES
PRINT  = PEARSON.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
PROTOTYPE FOR PEARSON PROGRAM
MDHANDLING=CASE -
ROWVARS=(V1-V11) -
PRINT=(DICT,COVA,PAIR,XPRODUCTS)
WRITE=(CORR)
```
##### Extract from Computer Output

After filtering 105 cases read from the input data file

1 cases contained illegal characters and were treated according to BADDATA specification

Number of processed cases: 104

```
Variable  Adjusted    Mean     S.D.     Mean      S.D.     T-test   Correlation coeff.
  pair    Wt. sum      X        X        Y         Y         T           R(i,j)
 1 - 2     104.       2.365   10.650   32.769   102.245    17.469        .8657
 1 - 3     104.       2.365   10.650    8.317    32.037    20.092        .8935
```

Unpaired means and standard deviations ***

```
Name  No.   N    Wt. sum   Sum X           Sum X2           Mean X    S.D. X
v1     1   104     104     2.4600000E+02   1.2264000E+04     2.365    10.650
v2     2   104     104     3.4080000E+03   1.1884340E+06    32.769   102.245
```

Correlation matrix ***

```
 VAR      1      2      3      4      5      6      7      8      9     10
 v2    .8657
 v3    .8935  .9560
 v4    .8613  .9239  .9596
 v5    .8978  .9533  .9773  .9673
 v6    .7940  .8523  .8718  .9149  .8862
 v7    .8513  .9252  .9430  .9552  .9536  .9034
 v8    .9084  .9465  .9865  .9585  .9767  .8836  .9600
 v9    .9565  .9361  .9637  .9402  .9596  .8738  .9436  .9794
 v10   .9594  .8534  .9163  .8551  .9033  .7828  .8574  .9360  .9520
 v11   .8910  .9501  .9649  .9410  .9658  .8727  .9608  .9728  .9543  .9157
```
Cross Products Matrix ***

(matrix omitted from this extract; each element equals the corresponding covariance multiplied by n - 1 = 103)

Covariance Matrix (with diagonal) ***

```
 VAR       1          2          3         4         5         6          7          8         9        10       11
 v1    112.328
 v2    933.613  10353.430
 v3    301.913   3101.362   1016.505
 v4    134.171   1381.845    449.703   216.055
 v5    143.408   1461.800    469.609   214.278   227.124
 v6     63.132    650.611    208.540   100.898   100.205    56.288
 v7    321.283   3352.261   1070.613   499.984   511.739   241.349   1268.047
 v8    322.184   3222.930   1052.538   471.488   492.570   221.830   1144.002   1119.842
 v9    203.893   1915.839    617.986   277.958   290.881   131.858    675.861    659.229   404.559
 v10    50.534    431.523    145.185    62.463    67.656    29.186    151.738    155.664    95.163   24.697
 v11    82.038    839.904    267.279   120.173   126.452    56.885    297.240    282.837   166.760   39.538   75.481
```
##### INTERPRETATION
IDAMS reports that 105 cases were read; one case containing illegal characters was treated according to the BADDATA specification, leaving 104 processed cases. The output then gives descriptive statistics for each pair of variables (with a pairwise comparison of means by t-test), descriptive statistics of the single variables, the correlation matrix, the matrix of cross products and the covariance matrix.

The elements of the cross products matrix are computed as

Cross product(X,Y) = Σ (X - mean(X)) (Y - mean(Y))

and the elements of the covariance matrix as

Covariance(X,Y) = Σ (X - mean(X)) (Y - mean(Y)) / (n - 1)

The Pearson, covariance and cross products measures are related: if each entry of the cross products matrix is divided by n - 1, the result is the covariance matrix; if each entry of the covariance matrix is divided by the product of the standard deviations of the two variables, the result is the correlation matrix.
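These relationships can be checked with a short sketch (the vectors below are made-up illustrations, not variables from COOP.DAT):

```python
# How the three PEARSON matrices relate (illustrative data, not COOP.DAT):
#   cross product(X,Y) = sum((x - mean_x) * (y - mean_y))
#   covariance(X,Y)    = cross product / (n - 1)
#   correlation(X,Y)   = covariance / (sd_x * sd_y)
from math import sqrt

def cross_product(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def covariance(x, y):
    return cross_product(x, y) / (len(x) - 1)

def correlation(x, y):
    sd = lambda v: sqrt(covariance(v, v))
    return covariance(x, y) / (sd(x) * sd(y))

x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]
print(cross_product(x, y))   # 11.0
print(covariance(x, y))      # 11/3
print(correlation(x, y))
```

Dividing the cross product by n - 1, and then by the two standard deviations, reproduces the chain described above.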

### 4(2) Example of Chi-square

Research Question: Are there any differences in the distribution of academics of different ranks across different types of institutions? In other words, is there any association between the rank of an academic and the type of institution?
Methodology: Chi-square
Dataset: ANJU.DAT
##### SYNTAX
```
$RUN TABLES
$FILES
PRINT  = MYTAB.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
EXAMPLES OF TABLES
PRINT=DICT
TABLE
SR=V14 C=V9 CELL=(FREQ,ROWP,COLP,TOTP)-
STAT=(CHI,CV) MDHANDL=ALL
```
##### EXTRACT FROM COMPUTER OUTPUT
The data matrix is 2 variables and 1073 cases
Row variable number: 14 (sv: inst type)
Column variable number: 9 (v204: rank)

```
                |   1    |   2    |   3    |   9    |
                |  prof  | reader |lecturer|        |   Total   Revised
________________|________|________|________|________|
 1  type1       |     97 |    126 |    151 |      3 |    377      374
          Row % |  25.94 |  33.69 |  40.37 |    .00 |  100.00
          Col % |  26.43 |  33.51 |  48.40 |    .00 |   35.45
          Tot % |   9.19 |  11.94 |  14.31 |    .00 |   35.45
________________|________|________|________|________|
 2  type2       |    147 |     96 |     42 |     12 |    297      285
          Row % |  51.58 |  33.68 |  14.74 |    .00 |  100.00
          Col % |  40.05 |  25.53 |  13.46 |    .00 |   27.01
          Tot % |  13.93 |   9.10 |   3.98 |    .00 |   27.01
________________|________|________|________|________|
 3  type3       |     41 |     17 |      2 |      2 |     62       60
          Row % |  68.33 |  28.33 |   3.33 |    .00 |  100.00
          Col % |  11.17 |   4.52 |    .64 |    .00 |    5.69
          Tot % |   3.89 |   1.61 |    .19 |    .00 |    5.69
________________|________|________|________|________|
 4  type4       |     82 |    137 |    117 |      1 |    337      336
          Row % |  24.40 |  40.77 |  34.82 |    .00 |  100.00
          Col % |  22.34 |  36.44 |  37.50 |    .00 |   31.85
          Tot % |   7.77 |  12.99 |  11.09 |    .00 |   31.85
________________|________|________|________|________|
 Totals              367      376      312       18      1073
 Col %            100.00   100.00   100.00      .00
 Tot %             34.79    35.64    29.57      .00    100.00
 Revised             367      376      312        0      1055
```

Column 9 is missing data and was deleted

```
Chi square                 118.50
Cramer's V                    .24
Contingency coefficient       .32
Degrees of freedom              6
Adjusted n                   1055
```
##### INTERPRETATION
IDAMS reports that there are two variables and 1073 cases. The row variable is type of institution and the column variable is rank. The output gives the cross-tabulation of ranks of academics by types of institutions. The value of Chi-square (118.50 with 6 degrees of freedom) is statistically highly significant (p < .001), which means that the association between rank and type of institution is not random.
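As a cross-check, the reported statistics can be re-derived from the frequency table after the missing-data column is removed (adjusted n = 1055). Note that the type1 lecturer count is 151, the value implied by the printed percentages (40.37 row %, 48.40 col %) and the revised totals. A pure-Python sketch, with no external library assumed:

```python
# Re-deriving the TABLES statistics from the printed cross-tabulation
# (missing-data column removed; adjusted n = 1055).
from math import sqrt

observed = [  # rows: type1..type4; columns: prof, reader, lecturer
    [97, 126, 151],
    [147, 96, 42],
    [41, 17, 2],
    [82, 137, 117],
]
n = sum(map(sum, observed))                       # 1055
row_tot = [sum(r) for r in observed]
col_tot = [sum(c) for c in zip(*observed)]

# chi-square = sum over cells of (observed - expected)^2 / expected,
# with expected = row total * column total / n
chi2 = sum((o - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
           for i, row in enumerate(observed) for j, o in enumerate(row))
df = (len(observed) - 1) * (len(observed[0]) - 1)          # (4-1)*(3-1) = 6
cramers_v = sqrt(chi2 / (n * (min(len(observed), len(observed[0])) - 1)))
contingency_c = sqrt(chi2 / (chi2 + n))
print(round(chi2, 1), df, round(cramers_v, 2), round(contingency_c, 2))
# prints: 118.5 6 0.24 0.32
```

With df = 6, a chi-square of 118.5 lies far beyond the .001 critical value (22.46), which is what the interpretation's significance claim rests on.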

### 4(3) Example of Oneway Analysis of Variance

Research Question: How does the (time) involvement of an academic scientist in teaching vary with his rank?
Methodology: Oneway Analysis of Variance
Dataset: ANJU.DAT
##### SYNTAX
```
$RUN ONEWAY
$FILES
PRINT  = ONE_WAY.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
EFFECT OF RANK ON INVOLVEMENT IN TEACHING
PRINT=CDICT
DEPVARS=(V2) CONVARS=(V9)
```
##### Extract from Computer Output

After filtering 1073 cases read from the input data file

3 cases contained illegal characters and were treated according to the BADDATA specification

Control variable = var 9 v204:rank

Depend. variable = var 2 v262:teaching

```
Code  Label      N    Weight-sum     %     Mean    S.D.(estim.)   Sum of X        %     Sum of X-square
 1    prof      363      363       35.0   34.824     16.076      .1264100E+05   29.0     .5337650E+06
 2    reader    366      366       35.3   42.440     16.957      .1553300E+05   35.7     .7641770E+06
 3    lecturer  309      309       29.8   49.693     18.412      .1535500E+05   35.3     .8674430E+06
Total          1038     1038      100.0   41.935     18.107      .4352900E+05  100.0     .2165385E+07
```

```
Total sum of squares         = 339977
For 3 groups, Eta            = 0.3301
For 3 groups, Etasq          = 0.108966
For 3 groups, Eta(adj)       = 0.327482
For 3 groups, Etasq(adj)     = 0.107245
Between means sum of squares =  37046
Within groups sum of squares = 302931
F(2,1035) = 63.286
```
##### INTERPRETATION
IDAMS reports that 1073 cases were read, out of which 1038 were used in the analysis (3 cases with illegal characters and 32 cases with missing data were excluded).

Specification: dependent variable = time spent on teaching; control variable = rank (3 categories: PROF(ESSOR), READER, LECTURER). The output gives descriptive statistics for each category.

Eta indicates the strength of the relationship between the dependent variable and the control variable (Eta = 1 signifies a perfect relationship and Eta = 0 signifies no relationship). Eta(adj) is Eta adjusted for degrees of freedom.

The F ratio is statistically highly significant (F(2,1035) = 63.286, p < .001), so we can conclude that the teaching involvement of an academic scientist varies with his rank.
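Eta, adjusted Eta-squared and the F ratio all follow from the printed sums of squares; a short sketch with the values copied from the output above (last-digit differences are rounding in the printout):

```python
# Checking eta and F from the ONEWAY sums of squares
# (between = 37046, within = 302931, total = 339977; k = 3 groups, N = 1038).
from math import sqrt

ss_between, ss_within, ss_total = 37046.0, 302931.0, 339977.0
k, n = 3, 1038

eta_sq = ss_between / ss_total                       # proportion explained
eta = sqrt(eta_sq)
eta_sq_adj = 1 - (1 - eta_sq) * (n - 1) / (n - k)    # adjusted for df
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(eta, 4), round(eta_sq, 6), round(f_ratio, 3))
# prints: 0.3301 0.108966 63.286
```

F(2,1035) = 63.286 far exceeds the .001 critical value of F (about 6.9 for these degrees of freedom), consistent with the interpretation.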

### 4(4) Example of Simple Linear Regression

Research Question: How does the involvement of an academic scientist in teaching affect his involvement in research?
Methodology: Simple linear regression
Dataset: ANJU.DAT
##### SYNTAX
```
$RUN REGRESSN
$FILES
PRINT  = ANJU.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
MDHANDLING=50 -
PRINT=(DICT,MATRIX)
DEPVAR=V3 -
VARS=(V2)
```
##### EXTRACT FROM COMPUTER OUTPUT

After filtering 1073 cases read from the input data file

3 cases contained illegal characters and were treated according to BADDATA specification

Number of variables = 2
Number of cases = 1055

General statistics

```
Variable                              Standard       Range
 number      Sum        Average      deviation     Max      Min    Variable name
    2     44103.00000   41.80379     18.08305    90.0000   .0000   v262:teaching
    3     23662.00000   22.42844     12.45037   100.0000   .0000   v263:research
```

Total correlation matrix, R(i,j)

```
Variable       2          3
    2       1.00000
    3      -0.34391    1.00000
```

Dependent variable is V 3 v263:research

```
Standard error of estimate                 11.7
F ratio for the regression                141.253
Multiple correlation coefficient          0.34391   adjusted
Fraction of explained variance (RSQD)     0.11828   adjusted  .11744
Determinant of the correlation matrix     1
Residual degrees of freedom (N-K-1)       1053
Constant term                             32.327
```

```
Var.      B      Sigma(B)   Beta   Sigma(Beta)  Partial   Marg    T-ratio   Cov.    Variable name
 no.                                              RSQD    RSQD              ratio
  2     .2368    .0199     .3439     .0289       .1183    .1183   11.8850   .0000   v262:teaching
```
##### INTERPRETATION
IDAMS reports that 1073 cases were read, out of which 1055 were used in the analysis; 3 cases with illegal characters and 15 cases with missing data were excluded.

Specification: number of variables = 2; dependent variable = time on research; independent variable = time on teaching. The output gives descriptive statistics of both the dependent and the independent variable. The correlation matrix shows that the two variables are negatively correlated.

Standard error of estimate is a measure of the reliability of the estimating equation, indicating the variability of the observed points around the regression line, that is, the extent to which observed values differ from their predicted values on the regression line.

F ratio in the analysis of variance table is used to test the hypothesis that the slope (b) of the regression line is 0. The F ratio is large when the independent variable explains the variation in the dependent variable. There is a significant negative linear relationship between time spent on research and time spent on teaching (F ratio = 141.253; degrees of freedom = 1, 1053; p < .001).

Multiple correlation coefficient (Multiple R) is the correlation between the dependent variable (time spent on research) and the predicted value. The greater the value of Multiple R, the greater the agreement between the predicted and observed values.

Fraction of explained variance (RSQD) can be interpreted as the proportion of the variation in the dependent variable explained by the regression line. It is also called the coefficient of determination. Both Multiple R and the coefficient of determination are indicators of the overall effectiveness of the linear regression: if R2 = 1, the regression line is a perfect estimator; if R2 = 0, there is no linear relationship between X and Y.

Determinant of the correlation matrix is the determinant of the correlation matrix of the predictors. It represents, as a single number, the generalized variance in a set of variables and varies from 0 to 1. It has no meaning in the case of simple linear regression with a single predictor (here it is 1).

Residual degrees of freedom: if the constant is not constrained to be zero, df = N - p - 1, where N is the total number of observations and p is the number of predictors.

Constant term is the constant (intercept) in the regression equation.

B is the regression coefficient, i.e. the slope of the regression line. Sigma(B) is the standard error of the regression coefficient, a measure of the dispersion of the sample regression coefficient around the population regression coefficient. It is an indicator of the reliability of the coefficient; smaller values indicate greater reliability.

Beta is the standardized regression coefficient, which is independent of the scale of measurement. In simple regression, the absolute value of Beta equals Multiple R. Sigma(Beta) is the standard error of Beta.

RSQD is the fraction of explained variance. Marginal RSQD: since there is only one predictor, Marginal RSQD (.1183) is equal to RSQD (.1183).

T-ratio is used to test the hypothesis that B = 0: T-ratio = B / Sigma(B). Its significance can be tested from the table of t with N - p - 1 degrees of freedom. Here t = 11.885 with df = 1053, which is highly significant (p < .001).

Covariance ratio of a variable is the square of its multiple correlation coefficient with the other independent variables in the regression equation; it has no meaning in the case of simple linear regression.
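In a simple regression, every quantity in the REGRESSN output follows from r, n and the means and standard deviations of the two variables. A sketch re-deriving them (note the output prints B without its sign; the negative correlation implies a slope of about -0.2368; small last-digit differences come from rounding in the printout):

```python
# Re-deriving the REGRESSN output from the printed summary statistics
# (r = -0.34391, n = 1055; teaching: mean 41.80379, s.d. 18.08305;
# research: mean 22.42844, s.d. 12.45037).
from math import sqrt

r, n = -0.34391, 1055
mean_x, sd_x = 41.80379, 18.08305   # v262:teaching (independent)
mean_y, sd_y = 22.42844, 12.45037   # v263:research (dependent)

rsqd = r * r                                   # fraction of explained variance
rsqd_adj = 1 - (1 - rsqd) * (n - 1) / (n - 2)  # adjusted for df, ~.11744
f_ratio = rsqd / (1 - rsqd) * (n - 2)          # F(1, 1053), ~141.25
t_ratio = sqrt(f_ratio)                        # |t| = sqrt(F), ~11.885
b = r * sd_y / sd_x                            # slope, ~-0.2368
constant = mean_y - b * mean_x                 # intercept, ~32.327
```

The identity |t| = sqrt(F) holds only for a single predictor, which is why the printed T-ratio (11.8850) squares to the printed F ratio.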

### 4(5) Example of Simple Linear Regression with Dummy Variables

Research Question: What is the effect of the status of a scientist on the time devoted to administration?
Methodology: Simple linear regression with dummy variables
Dataset: ICSOPRU (R2CM.DAT)
##### SYNTAX
```
$RUN REGRESSN
$FILES
PRINT  = DUM.LST
DICTIN = R2R3CM.DIC
DATAIN = R2CM.DAT
$SETUP
INCLUDE V1=360
DUMMY REGRESSION ICSOPRU DATA
MDHANDLING=50 -
CATE -
PRINT=(DICT,MATRIX)
V201(1,2)
DEPVAR=V222 -
VARS=(V201)
```
##### EXTRACT FROM COMPUTER OUTPUT

After filtering 1151 cases read from the input data file

Number of variables = 3

Number of cases = 1149

General statistics

```
Variable                          Standard      Range
 number     Sum       Average    deviation    Max      Min    Variable name
 201-1      239.      .20801      .40606     1.0000   .0000   Rank_1
 201-2      605.      .52654      .49951     1.0000   .0000   Rank_2
 222       7837.     6.82071     9.27874    75.0000   .0000   % Administrative work
```

Total correlation matrix, R(i,j)

```
Variable     201-1      201-2       222
 201-1     1.00000
 201-2    -0.54045    1.00000
 222       0.52016    -.23127    1.00000
```

Standard regression; dependent variable is V222 (J1C: % ADMINISTRATIVE WK)

```
Standard error of estimate                  7.913
F ratio for the regression                216.336
Multiple correlation coefficient           .52352   adjusted  .52231
Fraction of explained variance (RSQD)      .27407   adjusted  .27281
Determinant of the correlation matrix      .70791
Residual degrees of freedom (N-K-1)        1146
Constant term                              3.4787
```

```
Var. no.      B      Sigma(B)   Beta   Sigma(Beta)  Partial   Marg    T-ratio   Cov.    Variable name
                                                      RSQD    RSQD              ratio
 201-1     12.7556    .6835    .5582     .0299       .2331    .2206   18.6611   .2921   CM POSITION IN UNIT
 201-2      1.3081    .5557    .0704     .0299       .0048    .0035    2.3541   .2921   CM POSITION IN UNIT
```
##### INTERPRETATION
IDAMS reports that 1151 cases were taken after filtering, out of which 1149 were used for the regression analysis; two cases with missing data were excluded.

The independent variable Rank is categorized into two dummy variables (Rank_1 = Head, Rank_2 = Scientist). Thus the total number of variables is 3 (one dependent and two independent dummy variables). The dependent variable is the percentage of work time spent on administrative work. The output gives descriptive statistics of both the dependent and the independent variables.

Total correlation matrix: the elements of this matrix are computed directly from the matrix of residual sums of squares and cross products.

Standard error of estimate is the standard deviation of the residuals.

F ratio in the analysis of variance table is used to test the hypothesis b1 = b2 = 0. The F ratio is large when the independent variables explain the variation in the dependent variable. There is a significant linear relationship between the rank of a scientist and the time devoted to administrative work (F ratio = 216.336; degrees of freedom = 2, 1146; p < .001). This implies that rank does affect the time a scientist devotes to administrative work.

Multiple correlation coefficient (Multiple R) is the correlation between the dependent variable (time spent on administrative work) and the predicted value. The greater the value of Multiple R, the greater the agreement between the predicted and observed values. Here the value of Multiple R is fairly large.

Fraction of explained variance (RSQD) can be interpreted as the proportion of the variation in the dependent variable explained by the predictor variables. It is also called the coefficient of determination and is equal to the square of Multiple R. Adjusted RSQD (the fraction of explained variance adjusted for degrees of freedom) = R2 - p(1 - R2)/(n - p - 1), where p is the number of predictors. Both Multiple R and the coefficient of determination are indicators of the overall effectiveness of the linear regression.

Determinant of the correlation matrix is the determinant of the correlation matrix of the predictors. It represents, as a single number, the generalized variance in a set of variables and varies from 0 to 1. Determinants near zero indicate that some or all of the predictors are highly correlated. Here the determinant (.70791) is quite large, which indicates that the predictor variables (the categories of Rank) are not highly correlated. Note that high correlation among the predictors can threaten computational accuracy, since it inflates the standard errors of the regression coefficients, which in turn attenuates the associated F statistics.

Residual degrees of freedom: if the constant is not constrained to be zero, df = N - p - 1, where N is the total number of observations and p is the number of predictors.

The regression coefficient for Rank_1 is statistically highly significant (t = 18.66, df = 1146, p < .001). Partial RSQD is the squared partial correlation between the predictor (Rank_1) and the dependent variable, with the influence of the other variable (Rank_2) eliminated; it measures the share of the variance in the dependent variable, not explained by the other predictors, that this predictor accounts for. Here Rank_1 explains 23.31 % of that variance.

The regression coefficient for the dummy variable Rank_2 is also statistically significant (t = 2.35, df = 1146, p < .05). Its partial correlation squared (.0048) indicates that Rank_2 explains only 0.48 % of the variance.
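A few of these quantities can be re-derived directly from the printed coefficients. The sketch below also shows how the dummy coding translates into predicted group means: the constant (3.4787) is the predicted % administrative work for the omitted Rank category, and each B is the increment for its dummy group (values copied from the output above; last-digit differences are printout rounding):

```python
# Checking the dummy-variable regression: t-ratios from B / Sigma(B),
# adjusted RSQD, and the group means implied by the dummy coding.
b1, se1 = 12.7556, 0.6835    # Rank_1 (Head)
b2, se2 = 1.3081, 0.5557     # Rank_2 (Scientist)
constant = 3.4787            # baseline: the omitted Rank category
rsqd, n, p = 0.27407, 1149, 2

t1 = b1 / se1                                        # ~18.66
t2 = b2 / se2                                        # ~2.35
rsqd_adj = 1 - (1 - rsqd) * (n - 1) / (n - p - 1)    # ~.27281
pct_admin_head = constant + b1                       # ~16.23 % for heads
pct_admin_scientist = constant + b2                  # ~4.79 % for scientists
print(round(t1, 2), round(t2, 2), round(rsqd_adj, 4))
# prints: 18.66 2.35 0.2728
```

Reading the coefficients as group-mean shifts is what makes the dummy regression equivalent to comparing the average administrative load of heads and scientists against the baseline category.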