WHAT KIND OF CONTRASTS ARE THESE?

David P. Nichols
Senior Support Statistician
SPSS, Inc.
From SPSS Keywords, Number 63, 1997

Interpretation of parameter estimates is an essential part of the predictive modeling process. Estimates of interest often represent contrasts among the levels of a categorical predictor variable. A contrast is defined by a set of coefficients that sum to 0 over the levels of the categorical variable of interest.

In SPSS, issues of interpretation of contrast results arise in several procedures, including LOGISTIC REGRESSION and COX REGRESSION. Both procedures have facilities for automatically treating predictors (or covariates) as categorical variables. When a covariate with K levels is declared to be categorical in either of these procedures, a set of K-1 variables is produced internally, and these variables are used as a set in the analysis. The values of the K-1 variables are determined by the choice of contrasts made by the user. The default contrasts in the current 7.5 release of SPSS for Windows have been changed in both procedures to INDICATOR, with the last category as the reference group. These contrasts produce estimates comparing each of the other groups to the reference group.

A point of considerable confusion among SPSS users is the relationship between the values of the internally created variables and the interpretation of the resulting parameter estimates. The output for the LOGISTIC REGRESSION and COX REGRESSION procedures provides the values of the internal variables used to estimate the desired contrasts. For example, suppose we have a three level categorical covariate. The new default INDICATOR contrasts would produce a set of "parameter codings" like those in Figure 1.

Figure 1: Parameter codings for INDICATOR contrasts
-------------------------------------------------------------------------------
                            Parameter
           Value     Freq   Coding
                              (1)      (2)
GROUP
               1      106    1.000     .000
               2      116     .000    1.000
               3      107     .000     .000
-------------------------------------------------------------------------------
End Figure 1

The predictor here is called simply GROUP. It takes on the values 1-3, with frequencies listed in the "Freq" column. The columns on the right (the "parameter codings") give the values of the internal variables created to represent the original categorical covariate. In this case two internal variables are created. For the first variable, cases with a value of 1 for GROUP get a 1, while all other cases get a 0. For the second, cases with a 2 for GROUP get a 1, with all other cases getting a 0.

The question that this output often elicits from SPSS users is: how does this coding produce the contrasts claimed in our documentation? The answer is that one must distinguish between the values of the contrast coefficients defining the contrasts of interest and the values of the variables in the data that will produce such a set of contrasts. The columns in the data that produce certain contrasts will resemble the contrast coefficients only when the matrix of contrast coefficients is orthogonal (the inner product of any two row vectors in the contrast matrix is 0). INDICATOR contrasts are not orthogonal, nor are the other most commonly used types in logistic or Cox regression models.

Thus it is important to understand the following relationship between the columns of the data and the contrast results. If we append a constant (unit) column of 1s to the beginning of the two columns given above, we get what we call a basis or design matrix for generating the desired contrasts.
If we call this matrix X, then for any model that uses a linear combination of the predictors in generating its prediction function, we can compute C, the matrix of contrast coefficients, as:

    C = (X'X)^{-1} X'

For the example given here, the basis matrix for INDICATOR contrasts produces the contrast matrix shown alongside it in Figure 2.

Figure 2: Basis and contrast matrices for INDICATOR contrasts
-------------------------------------------------------------------------------
Basis:      1     1     0        Contrast:     0     0     1
            1     0     1                      1     0    -1
            1     0     0                      0     1    -1
-------------------------------------------------------------------------------

The first row of the contrast matrix gives the coefficients for the constant or intercept term, which with INDICATOR contrasts estimates the predicted value for the reference group (here, the last one). The other two rows give the contrasts estimated by the GROUP(1) and GROUP(2) parameter estimates, which are, respectively, the first group minus the last and the second minus the last.

Earlier releases of SPSS used DEVIATION as the default contrast type, with the last category as the reference or omitted category. DEVIATION contrasts compare each group other than the omitted group to the unweighted average of all groups. The value for the omitted group is then by definition the negative of the sum of the given parameter estimates.

Considerable confusion has resulted from the fact that the basis or design matrix for DEVIATION contrasts resembles the contrast matrix for SIMPLE contrasts, which compare each group to a reference category (like INDICATOR contrasts). It turns out that DEVIATION and SIMPLE contrasts are in a sense mirror images of one another, in that the variable codings required to produce one type of contrast look like the transpose of the contrast matrix for the other type. These relationships are illustrated for the three level case in Figures 3 and 4 (using fractions for precision; SPSS output shows decimal values).
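
Returning to the formula C = (X'X)^{-1} X' and the INDICATOR example of Figure 2, the computation can be sketched outside SPSS. The following is a minimal illustration in Python using only the standard library, with exact fractions rather than decimals. The function names (transpose, matmul, inverse, contrast_matrix) are illustrative assumptions and are not part of SPSS. Because the basis matrix here is square and nonsingular, the formula gives the same result as the ordinary inverse of X.

```python
from fractions import Fraction


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(m):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(m)
    # Augment m with the identity matrix, converting entries to Fractions.
    aug = [[Fraction(v) for v in row] +
           [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Pick a row at or below the diagonal with a nonzero pivot.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def contrast_matrix(basis):
    """Return C = (X'X)^-1 X' for a basis matrix X of full column rank."""
    xt = transpose(basis)
    return matmul(inverse(matmul(xt, basis)), xt)


# Basis matrix from Figure 2: a unit column followed by the two
# INDICATOR parameter codings from Figure 1.
indicator_basis = [[1, 1, 0],
                   [1, 0, 1],
                   [1, 0, 0]]

indicator_contrasts = contrast_matrix(indicator_basis)
for row in indicator_contrasts:
    print(" ".join(f"{str(v):>4}" for v in row))
```

The printed rows can be compared with the contrast matrix in Figure 2. The same function can be applied to any other basis matrix of full column rank, such as those for DEVIATION or SIMPLE contrasts.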
Note that the contrasts estimated for GROUP(1) and GROUP(2) are the same for SIMPLE contrasts as for INDICATOR, but that the intercept is now an unweighted average of all levels rather than the value for the last (or more generally, the reference) group.

Figure 3: Basis and contrast matrices for DEVIATION contrasts
-------------------------------------------------------------------------------
Basis:      1     1     0        Contrast:   1/3   1/3   1/3
            1     0     1                    2/3  -1/3  -1/3
            1    -1    -1                   -1/3   2/3  -1/3
-------------------------------------------------------------------------------

Figure 4: Basis and contrast matrices for SIMPLE contrasts
-------------------------------------------------------------------------------
Basis:      1   2/3  -1/3        Contrast:   1/3   1/3   1/3
            1  -1/3   2/3                      1     0    -1
            1  -1/3  -1/3                      0     1    -1
-------------------------------------------------------------------------------
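
The "mirror image" relationship between DEVIATION and SIMPLE contrasts can be checked directly from the entries of Figures 3 and 4. The sketch below, in Python with exact fractions, is only an illustration: the variable and function names are assumptions made for this example and do not correspond to anything in SPSS. It verifies two things: that each contrast matrix multiplied by its own basis matrix gives the identity matrix, and that the coding columns of one type (the basis matrix without its unit column) equal the transpose of the non-intercept contrast rows of the other type.

```python
from fractions import Fraction


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


third = Fraction(1, 3)

# Entries copied from Figure 3 (DEVIATION).
deviation_basis = [[1, 1, 0],
                   [1, 0, 1],
                   [1, -1, -1]]
deviation_contrasts = [[third, third, third],
                       [2 * third, -third, -third],
                       [-third, 2 * third, -third]]

# Entries copied from Figure 4 (SIMPLE).
simple_basis = [[1, 2 * third, -third],
                [1, -third, 2 * third],
                [1, -third, -third]]
simple_contrasts = [[third, third, third],
                    [1, 0, -1],
                    [0, 1, -1]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Each contrast matrix times its own basis matrix should be the identity.
deviation_ok = matmul(deviation_contrasts, deviation_basis) == identity
simple_ok = matmul(simple_contrasts, simple_basis) == identity


def coding_columns(basis):
    """Drop the leading unit column, leaving the parameter codings."""
    return [row[1:] for row in basis]


def contrast_rows(contrasts):
    """Drop the intercept row, leaving the rows for GROUP(1) and GROUP(2)."""
    return contrasts[1:]


# Codings for one type versus transposed contrast rows of the other type.
mirror_dev_simple = (coding_columns(deviation_basis)
                     == transpose(contrast_rows(simple_contrasts)))
mirror_simple_dev = (coding_columns(simple_basis)
                     == transpose(contrast_rows(deviation_contrasts)))

print(deviation_ok, simple_ok, mirror_dev_simple, mirror_simple_dev)
```

Working in fractions rather than the decimal values shown in SPSS output keeps these equality checks exact.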