Research Question: What is the pattern of relationships among eleven scientific fields in India's cooperation links with foreign countries?
Methodology: Pearson correlation
Dataset: COOP.DAT
$RUN PEARSON
$FILES
PRINT  = PEARSON.LST
DICTIN = COOP.DIC
DATAIN = COOP.DAT
$SETUP
PROTOTYPE FOR PEARSON PROGRAM
BADDATA=MD1  MDHANDLING=CASE  ROWVARS=(V1-V11)  PRINT=(DICT,COVA,PAIR,XPRODUCTS)  WRITE=(CORR)
After filtering, 105 cases were read from the input data file; 1 case contained illegal characters and was treated according to the BADDATA specification. Number of processed cases: 104.


Variable   Adjusted    Mean     S.D.     Mean     S.D.      T-test    Correlation
pair       Wt. sum     X        X        Y        Y         T         coeff. R(i,j)
1  2       104.        2.365    10.650   32.769   102.245   17.469    .8657
1  3       104.        2.365    10.650   8.317    32.037    20.092    .8935

*** Unpaired means and standard deviations ***
Variable   Variable              Adjusted                                  Mean     S.D.
Name       No.    N              Wt. sum   Sum X           Sum X2          X        X
v1         1      104            104       2.4600000E+02   1.2264000E+04   2.365    10.650
v2         2      104            104       3.4080000E+03   1.1884340E+06   32.769   102.245

Correlation matrix ***


Cross Products Matrix ***


Covariance Matrix (with diagonal) ***

IDAMS reports that 105 cases were read; one case, which contained illegal characters, was treated as bad data. 


Descriptive statistics are given for all pairs of variables, with a pairwise comparison of means by t-test, followed by descriptive statistics for the single variables. 

Correlation matrix  
Matrix of cross products: The elements of the matrix are computed by the following formula:

Cross product (X,Y) = Σ (Xi − X̄)(Yi − Ȳ)


Covariance Matrix: The elements of the matrix are computed by the following formula:

Covariance (X,Y) = Σ (Xi − X̄)(Yi − Ȳ) / (n − 1)

The Pearson, covariance and cross-products measures are related. If each entry of the cross-products matrix is divided by n − 1, the result is the covariance matrix. If each entry of the covariance matrix is divided by the product of the standard deviations of the two variables, the result is the correlation matrix. 
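The relationship between the three matrices can be verified in a few lines. The following is a minimal sketch (not IDAMS itself) using NumPy on synthetic data; the shape 104 × 3 merely echoes the number of processed cases in the example.

```python
# Sketch: cross-products / (n-1) gives covariances; covariances divided by
# the product of the two standard deviations give Pearson correlations.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(104, 3))        # 104 cases, 3 variables (illustrative)
n = data.shape[0]

centered = data - data.mean(axis=0)     # deviations from the column means
cross = centered.T @ centered           # cross-products matrix
cov = cross / (n - 1)                   # covariance matrix
sd = np.sqrt(np.diag(cov))              # standard deviations
corr = cov / np.outer(sd, sd)           # Pearson correlation matrix
```

The last matrix agrees with NumPy's own `np.corrcoef`, which is a convenient check that the chain of divisions is right.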
Research Question: Are there any differences in the distribution of academics of different ranks across different types of institutions? In other words, is there any association between the rank of an academic and the type of institution?
Methodology: Chi-square test
Dataset: ANJU.DAT
$RUN TABLES
$FILES
PRINT  = MYTAB.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
EXAMPLES OF TABLES
PRINT=DICT
TABLES R=V14 C=V9 CELL=(FREQ,ROWP,COLP,TOTP) STAT=(CHI,CV) MDHANDL=ALL
The data matrix has 2 variables and 1073 cases.
Row variable number: 14 (sv:inst type)
Column variable number: 9 (v204:rank)


                   1         2         3         9
                   prof      reader    lecturer            Total     Revised
------------------------------------------------------------------------------
1 type1            97        126       151       3         377       374
  Row %            25.94     33.69     40.37     .00       100.00
  Col %            26.43     33.51     48.40     .00       35.45
  Tot %            9.19      11.94     14.31     .00       35.45
------------------------------------------------------------------------------
2 type2            147       96        42        12        297       285
  Row %            51.58     33.68     14.74     .00       100.00
  Col %            40.05     25.53     13.46     .00       27.01
  Tot %            13.93     9.10      3.98      .00       27.01
------------------------------------------------------------------------------
3 type3            41        17        2         2         62        60
  Row %            68.33     28.33     3.33      .00       100.00
  Col %            11.17     4.52      .64       .00       5.69
  Tot %            3.89      1.61      .19       .00       5.69
------------------------------------------------------------------------------
4 type4            82        137       117       1         337       336
  Row %            24.40     40.77     34.82     .00       100.00
  Col %            22.34     36.44     37.50     .00       31.85
  Tot %            7.77      12.99     11.09     .00       31.85
------------------------------------------------------------------------------
Totals             367       376       312       18        1073
  Col %            100.00    100.00    100.00    .00
  Tot %            34.79     35.64     29.57     .00       100.00
Revised            367       376       312       0         1055

Column 9 is missing data and was deleted.

Chi-square                118.50
Cramer's V                .24
Contingency coefficient   .32
Degrees of freedom        6
Adjusted n                1055
IDAMS reports that there are two variables and 1073 cases. The row variable is type of institution and the column variable is rank. 


Cross-tabulation of ranks of academics and types of institutions.  
The value of chi-square is statistically highly significant (p < .001), which means that the association between categories of rank and type of institution is not random. 
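The reported statistics can be recomputed from the revised cell frequencies. The following is a rough check (not IDAMS) with SciPy; the missing-data column (code 9) is dropped, as in the IDAMS output.

```python
# Recompute chi-square and Cramer's V from the revised crosstab frequencies.
import numpy as np
from scipy.stats import chi2_contingency

#                 prof  reader  lecturer
table = np.array([[ 97,   126,    151],   # type1
                  [147,    96,     42],   # type2
                  [ 41,    17,      2],   # type3
                  [ 82,   137,    117]])  # type4

chi2, p, dof, expected = chi2_contingency(table)
# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v = np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))
```

This reproduces chi-square ≈ 118.50 with 6 degrees of freedom and Cramer's V ≈ .24, matching the IDAMS output.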
Research Question: How does the (time) involvement of an academic scientist in teaching vary with his rank?
Methodology: One-way Analysis of Variance
Dataset: ANJU.DAT
$RUN ONEWAY
$FILES
PRINT  = ONE_WAY.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
EFFECT OF RANK ON INVOLVEMENT IN ADMINISTRATIVE WORK
BADDATA=MD1  PRINT=CDICT
DEPVARS=(V2) CONVARS=(V9)
After filtering, 1073 cases were read from the input data file; 3 cases contained illegal characters and were treated according to the BADDATA specification. 


Control variable   = var 9 (v204:rank)
Dependent variable = var 2 (v262:teaching)




IDAMS reports that 1073 cases were read, out of which 1038 cases were used in the analysis (3 cases with illegal characters and 32 cases with missing data were treated as bad data).  

Specification: Dependent variable = Time spent on teaching; Control variable = Rank (3 categories: Professor, Reader, Lecturer)  
Descriptive statistics  
Eta indicates the strength of the relationship between the dependent variable and the control variable (Eta = 1 signifies a perfect relationship and Eta = 0 signifies no relationship). Eta adjusted is Eta adjusted for degrees of freedom. The F ratio is statistically highly significant (p < .005), so we can conclude that the teaching involvement of an academic scientist varies with his rank. 
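The same kind of analysis can be sketched outside IDAMS. Below is a minimal one-way ANOVA with Eta on synthetic data; the group labels follow the example, but the numbers are invented for illustration.

```python
# One-way ANOVA of teaching time across rank groups, plus Eta.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
groups = {                                 # % time on teaching (illustrative)
    "professor": rng.normal(30, 10, 300),
    "reader":    rng.normal(35, 10, 350),
    "lecturer":  rng.normal(45, 10, 380),
}

f_ratio, p_value = f_oneway(*groups.values())

# Eta = sqrt(between-group SS / total SS): 0 = no relationship, 1 = perfect.
allvals = np.concatenate(list(groups.values()))
grand_mean = allvals.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_total = ((allvals - grand_mean) ** 2).sum()
eta = np.sqrt(ss_between / ss_total)
```

Note that the F ratio returned by `f_oneway` is exactly the ratio of the between-group and within-group mean squares built from these sums of squares.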
Research Question: How does the involvement of an academic scientist in teaching affect his involvement in research?
Methodology: Simple linear regression
Dataset: ANJU.DAT
$RUN REGRESSN
$FILES
PRINT  = ANJU.LST
DICTIN = ANJU.DIC
DATAIN = ANJU.DAT
$SETUP
REGRESSN OF ACADEMIC INVOLVEMENT
BADDATA=MD1  MDHANDLING=50  PRINT=(DICT,MATRIX)
DEPVAR=V3  VARS=(V2)
After filtering, 1073 cases were read from the input data file. 


Specification: Number of variables = 2; Number of cases = 1055 

General statistics


Total correlation matrix, R(i,j)


Dependent variable is V3 (v263:research)



IDAMS reports that 1073 cases were read, out of which 1055 cases were used in the analysis; 3 cases with illegal characters and 15 cases with missing data were excluded.  

Specification: Number of variables = 2; Dependent variable = Time on research; Independent variable = Time on teaching 

Descriptive statistics of both dependent and independent variables.  
Correlation matrix shows that the two variables are correlated negatively.  
Standard error of the estimate is a measure of the reliability of the estimating equation, indicating the variability of the observed points around the regression line, i.e. the extent to which the observed values differ from their predicted values on the regression line.

F ratio in the analysis of variance table is used to test the hypothesis that the slope (B) of the regression line is 0. The F ratio is large when the independent variable explains the variation in the dependent variable. There is a significant negative linear relationship between time spent on research and time spent on teaching (F ratio = 142.153; degrees of freedom = 1, 1053; p < .001).

Multiple correlation coefficient (Multiple R) is the correlation between the dependent variable (time spent on research) and the predicted value. The greater the value of Multiple R, the greater the agreement between the predicted and observed values.

Fraction of explained variance (RSQD) can be interpreted as the proportion of the variation in the dependent variable explained by the regression line. It is also called the coefficient of determination. Both Multiple R and the coefficient of determination are indicators of the overall effectiveness of the linear regression. If R² = 1, the regression line is a perfect estimator; if R² = 0, there is no linear relationship between X and Y.

Determinant of the correlation matrix is the determinant of the correlation matrix of the predictors. It represents, as a single number, the generalized variance in a set of variables, and varies from 0 to 1. However, it has no meaning in the case of simple linear regression.

Residual degrees of freedom: if the constant is not constrained to be zero, df = N − p − 1, where N is the total number of observations and p is the number of predictors.

Constant term: this is the constant in the regression equation. 

B is the regression coefficient, i.e. the slope of the regression line. Sigma B is the standard error of the regression coefficient, a measure of the variability of the sample regression coefficient around the population regression coefficient; it is an indicator of the reliability of the coefficient, with smaller values indicating greater reliability.

Beta is the standardized regression coefficient, which is independent of the scale of measurement. In the case of simple regression, Beta is equal to Multiple R. Sigma Beta is the standard error of Beta.

RSQD is the fraction of the explained variance. Marginal RSQD: since there is only one predictor, Marginal RSQD (.1183) is equal to RSQD (.1183).

T ratio is used to test the hypothesis that B = 0: T ratio = B / Sigma B. Its significance can be tested from the table of t with n − p − 1 degrees of freedom. Here, t = 11.005, df = 1053, which is highly significant (p < .0001).

Covariance ratio of a variable is equal to the square of its multiple correlation coefficient with the other independent variables in the regression equation. It has no meaning in the case of simple linear regression. 
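The statistics above (B, Sigma B, Beta, T ratio, RSQD) can be computed directly from their definitions. This is a minimal sketch (not IDAMS) on synthetic data with a built-in negative teaching/research relationship; none of the numbers are from the ANJU dataset.

```python
# Simple linear regression of research time on teaching time.
import numpy as np

rng = np.random.default_rng(2)
teaching = rng.uniform(0, 60, 200)                        # % time (illustrative)
research = 50 - 0.4 * teaching + rng.normal(0, 10, 200)   # negative relationship

n = len(teaching)
x, y = teaching, research
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)        # B: slope
a = y.mean() - b * x.mean()                               # constant term
resid = y - (a + b * x)
se_est = np.sqrt((resid ** 2).sum() / (n - 2))            # std. error of estimate
sigma_b = se_est / np.sqrt(((x - x.mean()) ** 2).sum())   # Sigma B
t_ratio = b / sigma_b                                     # tests H0: B = 0
beta = b * x.std(ddof=1) / y.std(ddof=1)                  # standardized coefficient
r = np.corrcoef(x, y)[0, 1]
rsqd = r ** 2                                             # coefficient of determination
```

In simple regression Beta equals the correlation coefficient r, and the square of the T ratio equals the F ratio of the analysis of variance table, which ties this sketch back to the IDAMS output described above.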
Research Question: What is the effect of the status of a scientist on the time devoted to administration?
Methodology: Linear regression with dummy variables
Dataset: ICSOPRU (R2CM.DAT)
$RUN REGRESSN
$FILES
PRINT  = DUM.LST
DICTIN = R2R3CM.DIC
DATAIN = R2CM.DAT
$SETUP
INCLUDE V1=360
DUMMY REGRESSION ICSOPRU DATA
BADDATA=MD1  MDHANDLING=50  CATE  PRINT=(DICT,MATRIX)
V201(1,2)
DEPVAR=V222  VARS=(V201)
After filtering, 1151 cases were read from the input data file. Number of variables = 3; Number of cases = 1149 


General statistics


Total correlation matrix, R(i,j)





IDAMS reports that 1151 cases were read after filtering, out of which 1149 were used for the regression analysis; two cases with missing data were excluded. The independent variable Rank is categorized into two dummy variables (Rank_1 = Head, Rank_2 = Scientist). Thus the total number of variables = 3 (one dependent and two independent dummy variables). The dependent variable is the percentage of work time spent on administrative work. 


Descriptive statistics of both dependent and independent variables.  
Total Correlation Matrix: The elements of this matrix are computed directly from the matrix of residual sums of squares and cross products.  
Standard error of estimate is the standard deviation of the residuals.

F ratio in the analysis of variance table is used to test the hypothesis that B1 = B2 = 0. The F ratio is large when the independent variables explain the variation in the dependent variable. There is a significant linear relationship between the rank of a scientist and the time devoted to administrative work (F ratio = 216.336; degrees of freedom = 2, 1146; p < .001). This implies that rank does affect the time a scientist devotes to administrative work.

Multiple correlation coefficient (Multiple R) is the correlation between the dependent variable (time spent on administrative work) and the predicted value. The greater the value of Multiple R, the greater the agreement between the predicted and observed values. Here the value of Multiple R is sufficiently large.

Fraction of explained variance (RSQD) can be interpreted as the proportion of the variation in the dependent variable explained by the predictor variables. It is also called the coefficient of determination and is equal to the square of Multiple R. Adjusted squared Multiple R (adjusted fraction of the variance explained) = R² − (p − 1)(1 − R²)/(n − p), where p is the number of predictors. Both Multiple R and the coefficient of determination are indicators of the overall effectiveness of the linear regression.

Determinant of the correlation matrix is the determinant of the correlation matrix of the predictors. It represents, as a single number, the generalized variance in a set of variables, and varies from 0 to 1. Determinants near zero indicate that some or all of the predictors are highly correlated. Here the determinant of the correlation matrix (.70791) is quite large, which indicates that the predictor variables (i.e. the categories of Rank) are not highly correlated. 
Note that high correlation among the predictors can threaten computational accuracy, since it inflates the standard errors of the regression coefficients, which in turn attenuates the associated F statistics. Residual degrees of freedom: if the constant is not constrained to be zero, df = N − p − 1, where N is the total number of observations and p is the number of predictors. 

The regression coefficient for Rank_1 is statistically highly significant (t = 18.66, df = 1146, p < .001). Partial R squared (RSQD) is the squared partial correlation between the predictor (Rank_1) and the dependent variable, with the influence of the other variable (Rank_2) eliminated; it measures the part of the variance in the dependent variable that is not explained by the other predictors. Here, 23.31% of the variance in the dependent variable is explained by the dummy variable Rank_1. The regression coefficient for the dummy variable Rank_2 is also statistically significant (t = 18.66, df = 1146, p < .001). The value of the squared partial correlation indicates that the dummy variable Rank_2 explains only 4.48% of the variance. 
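Dummy-variable regression itself is easy to sketch outside IDAMS. Below, a three-category rank is recoded into two 0/1 dummies (the omitted category acts as the reference group) and fitted by ordinary least squares; all data are synthetic and the effect sizes are invented for illustration.

```python
# Regression of administrative time on dummy-coded rank.
import numpy as np

rng = np.random.default_rng(3)
n = 500
rank = rng.integers(0, 3, n)       # 0 = head, 1 = scientist, 2 = reference group
admin = 10 + 25 * (rank == 0) + 8 * (rank == 1) + rng.normal(0, 8, n)

X = np.column_stack([
    np.ones(n),                    # constant term
    (rank == 0).astype(float),     # Rank_1 dummy
    (rank == 1).astype(float),     # Rank_2 dummy
])
coef, *_ = np.linalg.lstsq(X, admin, rcond=None)
pred = X @ coef
rsqd = 1 - ((admin - pred) ** 2).sum() / ((admin - admin.mean()) ** 2).sum()
```

With this coding, the constant estimates the mean of the reference group and each dummy coefficient estimates the difference between its category's mean and that reference, which is why a "large" dummy coefficient directly answers the research question.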