Correlation | SPSS Annotated Output

This page shows an example correlation with footnotes explaining the output. These data were collected on 200 high school students and are scores on various tests, including science, math, reading and social studies (socst). The variable female is a dichotomous variable coded 1 if the student was female and 0 if male.

In the syntax below, the get file command is used to load the hsb2 data into SPSS. In quotes, you need to specify where the data file is located on your computer. Remember that you need to use the .sav extension and that you need to end the command with a period. By default, SPSS does a pairwise deletion of missing values. This means that as long as both variables in the correlation have valid values for a case, that case is included in the correlation. The /print = nosig subcommand is used to have the statistically significant correlations marked with asterisks; despite its name, the nosig keyword turns this flagging on.

get file "c:\data\hsb2.sav".

correlations
 /variables = read write math science female
 /print = nosig.

Image spss_corr_1a

a. Pearson Correlation – These numbers measure the strength and direction of the linear relationship between the two variables. The correlation coefficient can range from -1 to +1, with -1 indicating a perfect negative correlation, +1 indicating a perfect positive correlation, and 0 indicating no linear relationship at all. (By definition, a variable correlated with itself always has a correlation coefficient of 1.) You can think of the correlation coefficient as telling you the extent to which you can guess the value of one variable given a value of the other variable. From the scatterplot of the variables read and write below, we can see that the points tend to fall along a line going from the bottom left to the upper right, which is the same as saying that the correlation is positive. The .597 is the numerical description of how tightly the points cluster around that imaginary line. If the correlation were higher, the points would tend to be closer to the line; if it were lower, they would tend to be further away from the line.

b. Sig. (2-tailed) – This is the p-value associated with the correlation. The footnote under the correlation table explains what the single and double asterisks signify.

c. N – This is the number of cases used in the correlation. Because we have no missing data in this data set, all correlations were based on all 200 cases in the data set. However, if some variables had missing values, the Ns would differ from one correlation to the next.
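To make footnotes a and c concrete outside of SPSS, here is a minimal Python sketch of a Pearson correlation computed with pairwise deletion. The scores are hypothetical (they are not taken from hsb2), None stands in for a missing value, and the function names pearson_r and pairwise_corr are made up for this illustration. It is meant only to show why N can differ from one pair of variables to the next, not to reproduce SPSS's own routine.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences with no missing values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pairwise_corr(xs, ys):
    """Return (r, N) using only the cases where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    kept_x = [x for x, _ in pairs]
    kept_y = [y for _, y in pairs]
    return pearson_r(kept_x, kept_y), len(pairs)

# Hypothetical scores for six students; None marks a missing value.
read = [57, 68, 44, 63, 47, None]
write = [52, 59, 33, 44, 52, 60]
math_scores = [41, 53, None, 47, None, 51]

r_rw, n_rw = pairwise_corr(read, write)
r_wm, n_wm = pairwise_corr(write, math_scores)
r_ww, n_ww = pairwise_corr(write, write)

print(f"read-write:  r = {r_rw:.3f}, N = {n_rw}")
print(f"write-math:  r = {r_wm:.3f}, N = {n_wm}")
print(f"write-write: r = {r_ww:.3f}, N = {n_ww}")
```

In this made-up data, read with write uses five cases and write with math uses four, because each pair drops only the cases that are missing on one of its own two variables.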

Scatterplot

graph
 /scatterplot(bivar) = write with read.

Image spss_corr_2

Correlation using listwise deletion of missing data

The correlations in the table below are interpreted in the same way as those above. The only difference is the way the missing values are handled. When you do a listwise deletion, as we do with the /missing = listwise subcommand, if a case has a missing value for any of the variables listed on the /variables subcommand, that case is eliminated from all correlations, even if there are valid values for the two variables in the current correlation. For example, if a case had a missing value for the variable read, that case would be excluded from the calculation of the correlation between write and math, even though read is not one of the two variables being correlated.

There are no firm rules defining when you should use pairwise or listwise deletion. It depends on your purpose and whether it is important for exactly the same cases to be used in all of the correlations. If you have lots of missing data and use pairwise deletion, some correlations could be based on many cases that are not included in other correlations. On the other hand, if you use listwise deletion, you may not have many cases left to be used in the calculation.
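The trade-off described above can be sketched in a few lines of Python. Everything here is hypothetical: the seven rows of scores are invented, None stands in for a missing value, and the helper names corr_pairwise and corr_listwise are ours rather than anything SPSS provides. The sixth row is missing only read, so it mirrors the example above: pairwise deletion keeps it for write with math, while listwise deletion drops it.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences with no missing values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: one row per student as (read, write, math); None = missing.
cases = [
    (57, 52, 41),
    (68, 59, 53),
    (44, 33, None),
    (63, 44, 47),
    (47, 52, None),
    (None, 60, 51),
    (50, 46, 45),
]
NAMES = ("read", "write", "math")

def corr_pairwise(rows, i, j):
    """Keep a case if it is valid on the two variables being correlated."""
    kept = [row for row in rows if row[i] is not None and row[j] is not None]
    return pearson_r([row[i] for row in kept], [row[j] for row in kept]), len(kept)

def corr_listwise(rows, i, j):
    """Keep a case only if it is valid on every variable in the list."""
    kept = [row for row in rows if all(value is not None for value in row)]
    return pearson_r([row[i] for row in kept], [row[j] for row in kept]), len(kept)

results = {}
for i, j in [(0, 1), (0, 2), (1, 2)]:
    r_pair, n_pair = corr_pairwise(cases, i, j)
    r_list, n_list = corr_listwise(cases, i, j)
    results[(NAMES[i], NAMES[j])] = (n_pair, n_list)
    print(f"{NAMES[i]}-{NAMES[j]}: pairwise r = {r_pair:.3f} (N = {n_pair}), "
          f"listwise r = {r_list:.3f} (N = {n_list})")
```

With pairwise deletion the three correlations here rest on six, four, and five cases respectively; with listwise deletion all three rest on the same four complete cases.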

Please note that SPSS sometimes includes footnotes as part of the output. We have left those intact and have started ours with the next letter of the alphabet.

correlations
 /variables = read write math science female
 /print = nosig
 /missing = listwise.

Image spss_corr_3a

b. Pearson Correlation – This is the correlation between the two variables (one listed in the row, the other in the column). It is interpreted just like the correlations in the previous example.

c. Sig. (2-tailed) – This is the p-value associated with the correlation. The footnote under the correlation table explains what the single and double asterisks signify.
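For readers who want to see where a two-tailed p-value for a correlation comes from, the usual textbook test of the null hypothesis that the population correlation is zero converts r to a t statistic with N - 2 degrees of freedom. We state it here as the standard approach, on the assumption that this is the test behind the Sig. (2-tailed) row; the output itself does not say which test is used.

```latex
\[
  t = r \sqrt{\frac{N - 2}{1 - r^{2}}},
  \qquad
  p = 2 \, P\!\left( T_{N-2} \ge |t| \right)
\]
```

Here r is the sample correlation, N is the number of cases used for that correlation, and T_{N-2} is a t-distributed variable with N - 2 degrees of freedom. Plugging in the read and write correlation from the first example (r = .597, N = 200) gives a t of roughly 10.5.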