5.1 Key concepts and definitions

Multiple Regression

         SSE = error sum of squares = Σ(Yi − Est Yi)², where Yi is the actual value of Y for the ith case and Est Yi is the regression prediction for the ith case.

         SST = total sum of squares = Σ(Yi − Mean Y)², where Mean Y is the mean of the dependent variable.
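
         These two quantities determine R² = 1 − SSE/SST, the proportion of the variance of Y explained by the regression. The following is a minimal Python sketch of the computation; the observed values and predictions are made-up data for illustration:

              import numpy as np

              y = np.array([4.0, 7.0, 5.0, 9.0, 6.0])      # observed values Yi (hypothetical)
              y_hat = np.array([4.5, 6.5, 5.5, 8.5, 6.0])  # regression predictions Est Yi

              sse = np.sum((y - y_hat) ** 2)       # error sum of squares
              sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
              r_square = 1 - sse / sst             # proportion of variance explained

              print(sse, sst, r_square)            # 1.0  14.8  0.932...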

         Adjusted R-Square: When there is a large number of independent variables, R² may become artificially large, simply because some independent variables' chance variations "explain" small parts of the variance of the dependent variable. It is therefore essential to adjust the value of R² downward as the number of independent variables increases. With a few independent variables, R² and adjusted R² will be close; with a large number of independent variables, adjusted R² may be noticeably lower.
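
         The standard adjustment is adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of cases and k is the number of independent variables. A minimal Python sketch (the sample values are hypothetical):

              def adjusted_r_square(r_square: float, n: int, k: int) -> float:
                  """Penalize R-square for the number of independent variables."""
                  return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

              # With few predictors the penalty is small ...
              print(adjusted_r_square(0.80, n=100, k=3))   # ~0.794
              # ... with many predictors it is substantial.
              print(adjusted_r_square(0.80, n=100, k=40))  # ~0.664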

         Multicollinearity is the intercorrelation of the independent variables. Values of r² near 1 violate the assumption of no perfect collinearity, and high r² values inflate the standard errors of the regression coefficients, making assessment of the unique role of each independent variable difficult or impossible. While simple pairwise correlations tell something about multicollinearity, the preferred method of assessing it is to compute the determinant of the correlation matrix. A determinant near zero indicates that some or all of the independent variables are highly correlated.
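
         A minimal Python sketch of the determinant check, using made-up data in which one predictor is nearly a copy of another:

              import numpy as np

              rng = np.random.default_rng(0)
              x1 = rng.normal(size=200)
              x2 = rng.normal(size=200)
              x3 = x1 + 0.05 * rng.normal(size=200)   # x3 is nearly a copy of x1

              # Correlation matrix of the three independent variables
              R = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)

              # A determinant near 1 means nearly uncorrelated predictors;
              # a determinant near 0 flags strong multicollinearity.
              print(np.linalg.det(R))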

         Partial correlation is the correlation of two variables while controlling for one or more other variables. For example, r12.34 is the correlation of variables 1 and 2, controlling for variables 3 and 4. Two benchmark cases guide interpretation:

              If partial correlation r12.34 equals the uncontrolled correlation r12, the control variables have no effect.
              If partial correlation r12.34 is near 0, the original correlation is spurious.
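
         One common way to compute r12.34 is to regress variables 1 and 2 each on the control variables and then correlate the residuals. A minimal Python sketch with made-up data, in which variables 1 and 2 are related only through control variable 3:

              import numpy as np

              def residuals(y, controls):
                  """Residuals of y after least-squares regression on the controls."""
                  Z = np.column_stack([np.ones(len(y))] + list(controls))
                  beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
                  return y - Z @ beta

              rng = np.random.default_rng(1)
              x3 = rng.normal(size=300)
              x4 = rng.normal(size=300)
              x1 = x3 + rng.normal(size=300)   # variables 1 and 2 are related
              x2 = x3 + rng.normal(size=300)   # only through control variable 3

              r12 = np.corrcoef(x1, x2)[0, 1]
              r12_34 = np.corrcoef(residuals(x1, [x3, x4]),
                                   residuals(x2, [x3, x4]))[0, 1]
              print(r12, r12_34)   # r12.34 near 0: the original r12 is spurious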

Multiple Classification Analysis