This page shows an example of a canonical correlation analysis with footnotes explaining the output in SPSS. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.

We have a data file,
https://stats.idre.ucla.edu/wp-content/uploads/2016/02/mmr.sav, with 600 observations on eight
variables. The psychological variables are **locus of control**, **self-concept** and **motivation**. The academic variables are standardized
test scores in **reading**, **writing**, **math** and **science**. Additionally, the variable **female** is a zero-one indicator variable, with
one indicating a female student. The researcher is interested in the
relationship between the psychological variables and the academic variables,
with gender considered as well. Canonical correlation analysis aims to
find pairs of linear combinations, one from each group of variables, that are as highly
correlated as possible. These linear combinations are called canonical variates. Each
canonical variate is uncorrelated with all of the other canonical variates except the
one with which its correlation has been maximized. The number of such
pairs is limited to the number of variables in the smaller group. In our
example, there are three psychological variables and five variables in the other
group (the four test scores plus **female**). Thus, a canonical correlation analysis on these sets of variables
will generate three pairs of canonical variates.
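To make the idea of pairs of maximally correlated linear combinations concrete, here is a minimal sketch in Python with NumPy. It uses simulated data (not mmr.sav) and computes the canonical correlations by whitening each set of variables and taking the singular value decomposition of the whitened cross-covariance matrix. This is one standard way to compute canonical correlations; we are not claiming it is how SPSS's **manova** computes them internally, and the function name `cca` and the simulated variables are purely illustrative.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlations and raw coefficients for column sets X (n x p) and Y (n x q).

    Sketch: whiten each set, then take the SVD of the whitened cross-covariance.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive-definite matrix
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, r, Vt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    A = Kx @ U      # raw coefficients for X, one column per canonical variate
    B = Ky @ Vt.T   # raw coefficients for Y, one column per canonical variate
    return r, A, B


# Simulated data (NOT mmr.sav): 3 "psychological" and 5 "academic" columns
rng = np.random.default_rng(0)
n = 600
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(size=n) for _ in range(3)])
Y = np.column_stack([0.5 * z + rng.normal(size=n) for _ in range(5)])

r, A, B = cca(X, Y)
U_scores = (X - X.mean(axis=0)) @ A   # canonical variates for the first set
V_scores = (Y - Y.mean(axis=0)) @ B   # canonical variates for the second set

# 6 x 6 correlation matrix of all canonical variates
C = np.corrcoef(U_scores, V_scores, rowvar=False)
print(np.round(r, 3))
```

Because the smaller set has three columns, `cca` returns three canonical correlations in decreasing order. The correlation matrix `C` shows the structure described above: the variates within each set are uncorrelated, and each variate is correlated only with its partner in the other set.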

To begin, let’s read in and summarize the dataset.

```
get file='d:\data\mmr.sav'.
descriptives variables=locus_of_control self_concept motivation read write math science female
  /statistics=mean stddev min max.
```

These descriptives indicate that there are no missing values in the data and show that the variables are measured on very different scales. We can therefore proceed with the canonical correlation analysis without worrying about missing data, keeping in mind that our variables differ widely in scale.

SPSS performs canonical correlation analysis using the **manova** command with the **discrim**
subcommand. The
**manova** command is one of the SPSS commands that can only be accessed via
syntax; no sequence of pull-down menus or point-and-click steps
leads to this analysis.

Because the output is long, we omit the parts that are extraneous to our canonical correlation analysis and add comments in several places along the way.

In the **manova** command, we first list the variables in our
psychological group (**locus_of_control**, **self_concept** and **motivation**). Then, after the SPSS keyword **with**, we list the variables in our academic group
(**read**, **write**, **math**, **science** and **female**). Following **manova**
convention, SPSS refers to the first group of variables as the “dependent variables” and to the
second group as the “covariates”.

```
* ALPHA(1) displays every canonical function regardless of its significance.
manova locus_of_control self_concept motivation with read write math science female
  /discrim all alpha(1)
  /print=sig(eigen dim).
```

...[additional output omitted]...

Analysis of Variance -- design 1

EFFECT .. WITHIN CELLS Regression

Multivariate Tests of Significance (S = 3, M = 1/2, N = 295)

| Test Name | Value | Approx. F | Hypoth. DF | Error DF | Sig. of F |
|---|---|---|---|---|---|
| Pillais | .25425 | 11.00057 | 15.00 | 1782.00 | .000 |
| Hotellings | .31430 | 12.37633 | 15.00 | 1772.00 | .000 |
| Wilks | .75436 | 11.71573 | 15.00 | 1634.65 | .000 |
| Roys | .21538 | | | | |

Eigenvalues and Canonical Correlations

| Root No. | Eigenvalue | Pct. | Cum. Pct. | Canon Cor. | Sq. Cor |
|---|---|---|---|---|---|
| 1 | .274 | 87.336 | 87.336 | .464 | .215 |
| 2 | .029 | 9.185 | 96.522 | .168 | .028 |
| 3 | .011 | 3.478 | 100.000 | .104 | .011 |

Dimension Reduction Analysis

| Roots | Wilks L. | F | Hypoth. DF | Error DF | Sig. of F |
|---|---|---|---|---|---|
| 1 TO 3 | .75436 | 11.71573 | 15.00 | 1634.65 | .000 |
| 2 TO 3 | .96143 | 2.94446 | 8.00 | 1186.00 | .003 |
| 3 TO 3 | .98919 | 2.16461 | 3.00 | 594.00 | .091 |

...[additional output omitted]...

Analysis of Variance -- design 1

Raw canonical coefficients for DEPENDENT variables (columns: Function No.)

| Variable | 1 | 2 | 3 |
|---|---|---|---|
| locus_of | 1.254 | -.621 | .662 |
| self_con | -.351 | -1.188 | -.827 |
| motivati | 1.262 | 2.027 | -2.000 |

Standardized canonical coefficients for DEPENDENT variables (columns: Function No.)

| Variable | 1 | 2 | 3 |
|---|---|---|---|
| locus_of | .840 | -.417 | .444 |
| self_con | -.248 | -.838 | -.583 |
| motivati | .433 | .695 | -.686 |

Correlations between DEPENDENT and canonical variables (columns: Function No.)

| Variable | 1 | 2 | 3 |
|---|---|---|---|
| locus_of | .904 | -.390 | .176 |
| self_con | .021 | -.709 | -.705 |
| motivati | .567 | .351 | -.745 |

Variance in dependent variables explained by canonical variables

| CAN. VAR. | Pct Var DE | Cum Pct DE | Pct Var CO | Cum Pct CO |
|---|---|---|---|---|
| 1 | 37.980 | 37.980 | 8.180 | 8.180 |
| 2 | 25.910 | 63.889 | .727 | 8.907 |
| 3 | 36.111 | 100.000 | .391 | 9.297 |

Raw canonical coefficients for COVARIATES (columns: Function No.)

| COVARIATE | 1 | 2 | 3 |
|---|---|---|---|
| read | .045 | -.005 | -.021 |
| write | .036 | .042 | -.091 |
| math | .023 | .004 | -.009 |
| science | .005 | -.085 | .110 |
| female | .632 | 1.085 | 1.795 |

Standardized canonical coefficients for COVARIATES (columns: CAN. VAR.)

| COVARIATE | 1 | 2 | 3 |
|---|---|---|---|
| read | .451 | -.050 | -.216 |
| write | .349 | .409 | -.888 |
| math | .220 | .040 | -.088 |
| science | .049 | -.827 | 1.066 |
| female | .315 | .541 | .894 |

Correlations between COVARIATES and canonical variables (columns: CAN. VAR.)

| Covariate | 1 | 2 | 3 |
|---|---|---|---|
| read | .840 | -.359 | -.135 |
| write | .877 | .065 | -.255 |
| math | .764 | -.298 | -.148 |
| science | .658 | -.677 | .230 |
| female | .364 | .755 | .543 |

Variance in covariates explained by canonical variables

| CAN. VAR. | Pct Var DE | Cum Pct DE | Pct Var CO | Cum Pct CO |
|---|---|---|---|---|
| 1 | 11.305 | 11.305 | 52.488 | 52.488 |
| 2 | .701 | 12.006 | 24.994 | 77.482 |
| 3 | .098 | 12.104 | 9.066 | 86.548 |

...[additional output omitted]...

## Data Summary, Eigenvalues and Hypothesis Tests

Analysis of Variance -- design 1

EFFECT .. WITHIN CELLS Regression

Multivariate Tests of Significance (S = 3, M = 1/2, N = 295)

| Test Name | Value^{e} | Approx. F^{f} | Hypoth. DF^{g} | Error DF^{g} | Sig. of F^{h} |
|---|---|---|---|---|---|
| Pillais^{a} | .25425 | 11.00057 | 15.00 | 1782.00 | .000 |
| Hotellings^{b} | .31430 | 12.37633 | 15.00 | 1772.00 | .000 |
| Wilks^{c} | .75436 | 11.71573 | 15.00 | 1634.65 | .000 |
| Roys^{d} | .21538 | | | | |

Eigenvalues and Canonical Correlations

| Root No.^{i} | Eigenvalue^{j} | Pct.^{k} | Cum. Pct.^{l} | Canon Cor.^{m} | Sq. Cor^{n} |
|---|---|---|---|---|---|
| 1 | .274 | 87.336 | 87.336 | .464 | .215 |
| 2 | .029 | 9.185 | 96.522 | .168 | .028 |
| 3 | .011 | 3.478 | 100.000 | .104 | .011 |

Dimension Reduction Analysis

| Roots^{o} | Wilks L.^{p} | F^{f} | Hypoth. DF^{g} | Error DF^{g} | Sig. of F^{h} |
|---|---|---|---|---|---|
| 1 TO 3 | .75436 | 11.71573 | 15.00 | 1634.65 | .000 |
| 2 TO 3 | .96143 | 2.94446 | 8.00 | 1186.00 | .003 |
| 3 TO 3 | .98919 | 2.16461 | 3.00 | 594.00 | .091 |

a. **Pillais** – This is Pillai’s trace, one of the four multivariate
statistics calculated by SPSS to test the null hypothesis that all of the canonical
correlations are zero (which, in turn, means that there is no linear
relationship between the two specified groups of variables). Pillai’s trace is the sum of the squared canonical
correlations (see superscript **n**). Using the canonical correlations to four decimals,
0.4641^{2} + 0.1675^{2} + 0.1040^{2} ≈
0.25425.
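The arithmetic in this footnote is easy to check. The short Python sketch below uses the four-decimal canonical correlations quoted in footnote c (0.4641, 0.1675 and 0.1040) and, for comparison, the three-decimal values printed in the **Canon Cor.** column; the small gap between the two sums is only rounding.

```python
# Canonical correlations to four decimals (as quoted in footnote c)
canon_corr = [0.4641, 0.1675, 0.1040]

# Pillai's trace: sum of the squared canonical correlations
pillai = sum(r ** 2 for r in canon_corr)

# Same sum using the three-decimal values printed in the Canon Cor. column
pillai_3dp = sum(r ** 2 for r in [0.464, 0.168, 0.104])

print(f"Pillai's trace: {pillai:.5f} (4 dp inputs), {pillai_3dp:.5f} (3 dp inputs)")
```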

b. **Hotellings** – This is the Hotelling-Lawley trace. It is similar in form
to Pillai’s trace and can be calculated as the sum
of the values of (canonical correlation^{2}/(1-canonical correlation^{2})). We can calculate
0.4641^{2}/(1-0.4641^{2}) + 0.1675^{2}/(1-0.1675^{2}) + 0.1040^{2}/(1-0.1040^{2}) ≈ 0.31430.
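A quick check of this sum in Python, again using the four-decimal canonical correlations from footnote c. Each term r^{2}/(1-r^{2}) is also one of the eigenvalues listed in footnote k (0.2745, 0.0289 and 0.0109), so the Hotelling-Lawley trace is simply the sum of the eigenvalues.

```python
canon_corr = [0.4641, 0.1675, 0.1040]

# One term per root: r^2 / (1 - r^2); these match the printed eigenvalues
terms = [r ** 2 / (1 - r ** 2) for r in canon_corr]

# Hotelling-Lawley trace: the sum of those terms
hotelling = sum(terms)
print([round(t, 4) for t in terms], round(hotelling, 5))
```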

c. **Wilks** – This is Wilks’ lambda, another multivariate
statistic calculated by SPSS. It is the product of the values of
(1-canonical correlation^{2}). In this example, our canonical
correlations are 0.4641, 0.1675 and 0.1040, so Wilks’ lambda is (1-0.4641^{2})*(1-0.1675^{2})*(1-0.1040^{2})
≈ 0.75436.
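The product can be verified in a couple of lines of Python (3.8 or later, for `math.prod`). The sketch also shows the equivalent form in terms of the eigenvalues: since each eigenvalue is r^{2}/(1-r^{2}), the factor 1/(1 + eigenvalue) equals 1-r^{2}.

```python
import math

canon_corr = [0.4641, 0.1675, 0.1040]

# Wilks' lambda: product of (1 - r^2) over all roots
wilks = math.prod(1 - r ** 2 for r in canon_corr)

# Equivalent form using the eigenvalues lambda = r^2 / (1 - r^2),
# because 1 / (1 + lambda) = 1 - r^2
eigenvalues = [r ** 2 / (1 - r ** 2) for r in canon_corr]
wilks_from_eigen = math.prod(1 / (1 + ev) for ev in eigenvalues)

print(round(wilks, 5), round(wilks_from_eigen, 5))
```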

d. **Roys** – This is Roy’s greatest root. It can be calculated from
the largest eigenvalue: largest eigenvalue/(1 + largest eigenvalue), here 0.2745/(1 + 0.2745) ≈ 0.21538, which is also the largest squared canonical correlation. Because it is
based on a maximum, it can behave differently from the other three test
statistics. In instances where the other three are not statistically significant and Roy’s is
statistically significant, the effect should be considered not statistically significant.
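A small Python check of Roy's value. Because an eigenvalue is r^{2}/(1-r^{2}), the ratio eigenvalue/(1 + eigenvalue) simplifies back to r^{2}, so the result should match the largest squared canonical correlation up to rounding of the inputs.

```python
largest_eigenvalue = 0.2745   # first entry of the Eigenvalue column (see footnote k)

# Roy's largest root in the form shown in this output: lambda / (1 + lambda)
roy = largest_eigenvalue / (1 + largest_eigenvalue)

# The same quantity is the largest squared canonical correlation
largest_sq_corr = 0.4641 ** 2

print(round(roy, 5), round(largest_sq_corr, 5))
```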

e. **Value** – This is the value of the multivariate test
listed in the prior column.

f. **(Approx.) F** – These are the F values associated with the various tests that are included in
SPSS’s output. For the multivariate tests, the F values are approximate.

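g. **Hypoth. DF, Error DF** – These are the degrees of freedom used in
determining the F values. The hypothesis degrees of freedom equal the number of
dependent variables times the number of covariates (3 × 5 = 15). The error degrees of
freedom depend on the F approximation used for each test statistic, which is why they differ
across the tests. They can also be non-integer: the F approximation for Wilks’ lambda
(Rao’s approximation) generally yields non-integer error degrees of freedom, such as 1634.65 here.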
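The S = 3, M = 1/2 and N = 295 in the output header appear to be the parameters of the usual textbook F approximations for Pillai's and Hotelling-Lawley's traces. The Python sketch below assumes those textbook definitions (S = min(p, q), M = (|p - q| - 1)/2, N = (error df - p - 1)/2, with p = 3 dependent variables, q = 5 covariates and error df = 600 - 1 - 5 = 594). Under these assumptions it reproduces the printed F values and degrees of freedom, but we have not confirmed from SPSS documentation that these are exactly the formulas **manova** uses.

```python
# Sketch of the S, M, N-based F approximations (assumed textbook formulas)
p = 3                    # dependent (psychological) variables
q = 5                    # covariates (read, write, math, science, female)
df_error = 600 - 1 - q   # residual df of the within-cells regression: 594

s = min(p, q)                  # S = 3
m = (abs(p - q) - 1) / 2       # M = 1/2
n = (df_error - p - 1) / 2     # N = 295


def pillai_f(v):
    """Approximate F, hypothesis df and error df for Pillai's trace v."""
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    f = (2 * n + s + 1) / (2 * m + s + 1) * v / (s - v)
    return f, df1, df2


def hotelling_f(t):
    """Approximate F, hypothesis df and error df for the Hotelling-Lawley trace t."""
    df1 = s * (2 * m + s + 1)
    df2 = 2 * (s * n + 1)
    f = df2 * t / (s ** 2 * (2 * m + s + 1))
    return f, df1, df2


print(pillai_f(0.25425))     # SPSS printed: 11.00057, 15, 1782
print(hotelling_f(0.31430))  # SPSS printed: 12.37633, 15, 1772
```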

h. **Sig. of F** – This is the p-value associated with the F value of a
given test statistic. The null hypothesis that our two sets of variables are not
linearly related is evaluated with regard to this p-value. For a given alpha
level, such as 0.05, if the p-value is less than alpha, the null hypothesis is rejected; if
not, we fail to reject the null hypothesis. A printed value of .000 means that the
p-value is less than 0.0005, not that it is exactly zero.

i. **Root No.** – This is the rank of the given eigenvalue (largest to
smallest). There are as many roots as there are variables in the smaller
of the two variable sets. In this example, our set of psychological
variables contains three variables and our set of academic variables contains
five variables (the four test scores and **female**). Thus the smaller variable set contains three variables, and the
analysis generates three roots.

j. **Eigenvalue** – These are the eigenvalues of the product of the model (hypothesis) matrix and the inverse of
the error matrix. These eigenvalues can also be calculated from the squared
canonical correlations: each eigenvalue equals squared
correlation/(1 - squared correlation). For the first root, 0.215/(1-0.215) ≈
0.274, and the same calculation gives the eigenvalue for each of the other roots.
Because this quantity increases with the squared canonical correlation, the relative sizes of the
eigenvalues reflect the relative strengths of the corresponding canonical correlations. Thus, the eigenvalue corresponding to
the first correlation is greatest, and each subsequent eigenvalue is smaller.
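The two descriptions of the eigenvalues in this footnote can be checked against each other numerically. The sketch below, in Python with NumPy on simulated data rather than mmr.sav, builds the hypothesis (model) and error SSCP matrices from a multivariate regression of a three-column "dependent" set on a five-column "covariate" set and compares their eigenvalues with r^{2}/(1-r^{2}), where the canonical correlations r are computed separately from QR decompositions. It uses `np.linalg.solve(E, H)`, i.e. the inverse of the error matrix times the model matrix, which has the same eigenvalues as the product in the other order.

```python
import numpy as np

# Simulated data (NOT mmr.sav): 3 "dependent" and 5 "covariate" columns
rng = np.random.default_rng(1)
n = 600
z = rng.normal(size=n)
Y = np.column_stack([z + rng.normal(size=n) for _ in range(3)])
X = np.column_stack([0.5 * z + rng.normal(size=n) for _ in range(5)])
Yc = Y - Y.mean(axis=0)
Xc = X - X.mean(axis=0)

# Hypothesis (model) and error SSCP matrices from regressing Y on X
coef = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
fitted = Xc @ coef
H = fitted.T @ fitted
E = (Yc - fitted).T @ (Yc - fitted)

# Eigenvalues of E^{-1} H (same as those of H E^{-1}), largest first
eigenvalues = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]

# Canonical correlations computed independently: singular values of Qx' Qy
Qx, _ = np.linalg.qr(Xc)
Qy, _ = np.linalg.qr(Yc)
r = np.linalg.svd(Qx.T @ Qy, compute_uv=False)

print(np.round(eigenvalues, 4))
print(np.round(r ** 2 / (1 - r ** 2), 4))
```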

k. **Pct.** – This is the percent of the sum of the eigenvalues represented by a given
eigenvalue. The sum of the three eigenvalues is (0.2745+0.0289+0.0109) =
0.3143, which is also the value of the Hotelling-Lawley trace (see superscript **b**). The proportions are then 0.2745/0.3143 ≈ 0.873,
0.0289/0.3143 ≈ 0.092 and 0.0109/0.3143 ≈ 0.035; the small differences from the printed percents come from rounding the eigenvalues. Each proportion shows how much of the total (the sum of the eigenvalues) is attributable to
a given root.

l. **Cum. Pct.** – This is the cumulative sum of the percents.
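Both percent columns follow directly from the eigenvalues. Here is a minimal Python sketch using the rounded eigenvalues quoted in footnote k, so its percents differ from the printed ones only in the second decimal.

```python
from itertools import accumulate

eigenvalues = [0.2745, 0.0289, 0.0109]   # rounded values quoted in footnote k

total = sum(eigenvalues)                         # 0.3143, the Hotelling-Lawley trace
pct = [100 * ev / total for ev in eigenvalues]   # Pct. column
cum_pct = list(accumulate(pct))                  # Cum. Pct. column

for root, (pc, cp) in enumerate(zip(pct, cum_pct), start=1):
    print(f"root {root}: {pc:6.2f} {cp:7.2f}")
```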

m. **Canon Cor.** – These are the Pearson correlations of the pairs of
canonical variates. The first
pair of variates, a linear combination of the psychological measurements and
a linear combination of the academic measurements, has a correlation
coefficient of 0.464. The second pair has a correlation coefficient of
0.168, and the third pair 0.104. Each subsequent pair of canonical variates is
less correlated. These can be interpreted like any other Pearson
correlations. That is, the square of the correlation represents the
proportion of the variance in one group’s variate explained by the other group’s
variate.

n. **Sq. Cor** – These are the squares of the canonical correlations.
For example, 0.464*0.464 ≈ 0.215.

o. **Roots** – This is the set of roots included in the null hypothesis
being tested. The null hypothesis is that all of the correlations
associated with the roots in the given set are equal to zero in the population. By testing these different sets of roots, we are determining how many dimensions
are required to describe the relationship between the two groups of variables. Because each root is less informative than the one before it, unnecessary
dimensions will be associated with the smallest eigenvalues. Thus, we
start our test with the full set of roots and then test subsets generated by
omitting the greatest root in the previous set. Here, we first tested all three
roots, then roots two and three, and then root three alone. At the 0.05 level, the first two
tests are statistically significant (.000 and .003) and the test of root three alone is not (.091),
which suggests that two dimensions are needed to describe the relationship between the two groups of variables.

p. **Wilks L.** – Here, the Wilks lambda test statistic is used for
testing the null hypothesis that the given canonical correlation and all smaller
ones are equal to zero in the population. Each value can be calculated as the product of the values of
(1-canonical correlation^{2}) for the set of canonical correlations
being tested. In this example, our canonical
correlations are 0.4641, 0.1675 and 0.1040 (to four decimals), so the value for testing
that all three of the correlations are zero is (1-0.4641^{2})*(1-0.1675^{2})*(1-0.1040^{2})
≈ 0.75436. To test that the two smaller canonical correlations, 0.1675
and 0.1040, are zero in the population, the value is (1-0.1675^{2})*(1-0.1040^{2})
≈ 0.96143. The value for testing that the smallest canonical correlation is zero is (1-0.1040^{2}) ≈ 0.98919.
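The F tests in the dimension reduction table can be reproduced with Rao's F approximation for Wilks' lambda. The Python sketch below assumes that the test of roots k through 3 treats the two sets as having p - k + 1 and q - k + 1 variables (p = 3, q = 5), with residual degrees of freedom 594 (600 - 1 - 5). Under that assumption it reproduces the printed Wilks values, F values and degrees of freedom, including the non-integer 1634.65, but we have not confirmed this against SPSS's documentation. The helper name `rao_f` is our own.

```python
import math


def rao_f(wilks, p, q, df_error):
    """Rao's F approximation for Wilks' lambda with p and q variables in the two sets."""
    df1 = p * q
    denom = p ** 2 + q ** 2 - 5
    t = math.sqrt((p ** 2 * q ** 2 - 4) / denom) if denom > 0 else 1.0
    w = df_error - (p - q + 1) / 2
    df2 = w * t - df1 / 2 + 1
    lam_t = wilks ** (1 / t)
    f = (1 - lam_t) / lam_t * df2 / df1
    return f, df1, df2


canon_corr = [0.4641, 0.1675, 0.1040]
p, q, df_error = 3, 5, 594
printed_wilks = [0.75436, 0.96143, 0.98919]   # Wilks L. column

results = []
for k in range(len(canon_corr)):
    # Wilks' lambda for roots k+1 through 3, from the canonical correlations
    wilks = math.prod(1 - r ** 2 for r in canon_corr[k:])
    # F and df, using the printed Wilks value and the reduced set sizes
    f, df1, df2 = rao_f(printed_wilks[k], p - k, q - k, df_error)
    results.append((wilks, f, df1, df2))
    print(f"roots {k + 1} TO 3: Wilks {wilks:.5f}, F {f:.4f}, df {df1}, {df2:.2f}")
```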

## Canonical Coefficients, Correlations, and Variance Explained

Analysis of Variance -- design 1

Raw canonical coefficients for DEPENDENT variables^{q} (columns: Function No.)

| Variable | 1 | 2 | 3 |
|---|---|---|---|
| locus_of | 1.254 | -.621 | .662 |
| self_con | -.351 | -1.188 | -.827 |
| motivati | 1.262 | 2.027 | -2.000 |

Standardized canonical coefficients for DEPENDENT variables^{r} (columns: Function No.)

| Variable | 1 | 2 | 3 |
|---|---|---|---|
| locus_of | .840 | -.417 | .444 |
| self_con | -.248 | -.838 | -.583 |
| motivati | .433 | .695 | -.686 |

Correlations between DEPENDENT and canonical variables^{s} (columns: Function No.)

| Variable | 1 | 2 | 3 |
|---|---|---|---|
| locus_of | .904 | -.390 | .176 |
| self_con | .021 | -.709 | -.705 |
| motivati | .567 | .351 | -.745 |

Variance in dependent variables explained by canonical variables^{t}

| CAN. VAR. | Pct Var DE | Cum Pct DE | Pct Var CO | Cum Pct CO |
|---|---|---|---|---|
| 1 | 37.980 | 37.980 | 8.180 | 8.180 |
| 2 | 25.910 | 63.889 | .727 | 8.907 |
| 3 | 36.111 | 100.000 | .391 | 9.297 |

Raw canonical coefficients for COVARIATES^{q} (columns: Function No.)

| COVARIATE | 1 | 2 | 3 |
|---|---|---|---|
| read | .045 | -.005 | -.021 |
| write | .036 | .042 | -.091 |
| math | .023 | .004 | -.009 |
| science | .005 | -.085 | .110 |
| female | .632 | 1.085 | 1.795 |

Standardized canonical coefficients for COVARIATES^{r} (columns: CAN. VAR.)

| COVARIATE | 1 | 2 | 3 |
|---|---|---|---|
| read | .451 | -.050 | -.216 |
| write | .349 | .409 | -.888 |
| math | .220 | .040 | -.088 |
| science | .049 | -.827 | 1.066 |
| female | .315 | .541 | .894 |

Correlations between COVARIATES and canonical variables^{s} (columns: CAN. VAR.)

| Covariate | 1 | 2 | 3 |
|---|---|---|---|
| read | .840 | -.359 | -.135 |
| write | .877 | .065 | -.255 |
| math | .764 | -.298 | -.148 |
| science | .658 | -.677 | .230 |
| female | .364 | .755 | .543 |

Variance in covariates explained by canonical variables^{u}

| CAN. VAR. | Pct Var DE | Cum Pct DE | Pct Var CO | Cum Pct CO |
|---|---|---|---|---|
| 1 | 11.305 | 11.305 | 52.488 | 52.488 |
| 2 | .701 | 12.006 | 24.994 | 77.482 |
| 3 | .098 | 12.104 | 9.066 | 86.548 |

q. **Raw canonical coefficients for DEPENDENT/COVARIATE variables** –
These are the raw canonical coefficients. They define the linear relationship
between the variables in a given group and the canonical variates. They can be interpreted in the same
manner as regression coefficients,
treating the canonical variate as the outcome variable and holding the other variables in the group constant. For example, a one
unit increase in **locus_of_control** leads to a 1.254 unit increase in
the first variate of the psychological measurements, and a one unit
increase in **read**
score leads to a 0.045 unit increase in the first variate of the academic
measurements. Recall that our variables vary widely in scale; this is reflected in
the varied scale of these raw coefficients.
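Because the raw coefficients define each canonical variate as a linear combination, the change in a variate for any change in its variables (holding the others fixed) is the coefficient-weighted sum of those changes. The Python sketch below applies this to the first psychological variate using the raw coefficients printed above; the helper `variate_change` and the example changes are hypothetical, chosen only for illustration.

```python
# Raw coefficients for the first psychological (DEPENDENT) variate, Function 1
raw_psych_1 = {"locus_of_control": 1.254, "self_concept": -0.351, "motivation": 1.262}


def variate_change(coefs, deltas):
    """Change in a canonical variate for the given changes in its variables.

    Variables not listed in `deltas` are held fixed (change of zero).
    """
    return sum(coefs[name] * delta for name, delta in deltas.items())


# One-unit increase in locus_of_control alone: equals its raw coefficient
print(variate_change(raw_psych_1, {"locus_of_control": 1.0}))

# Hypothetical combined change: +1 in locus_of_control and -0.5 in motivation
print(variate_change(raw_psych_1, {"locus_of_control": 1.0, "motivation": -0.5}))
```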

r. **Standardized canonical coefficients for DEPENDENT/COVARIATE variables**
– These are the standardized canonical coefficients. This means that, if all of
the variables in the analysis are rescaled to have a mean of zero and a standard
deviation of 1, the coefficients generating the canonical variates would
indicate how a one standard deviation increase in the variable would change the
variate, holding the other variables in the group constant. For example, an increase of one standard deviation in **locus_of_control**
would lead to a 0.840 standard deviation increase in the first variate of the psychological
measurements, and an increase of one standard deviation in **read**
would lead to a 0.451 standard deviation increase in the first variate of the academic
measurements.
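Since a standardized coefficient is the raw coefficient multiplied by the variable's standard deviation, dividing the standardized coefficient by the raw one should give the same value (that variable's standard deviation) in all three functions, up to rounding in the printed output. The Python check below does this for the three psychological variables and **female**, using only numbers printed above. For **female**, the implied value is close to 0.5, the largest standard deviation a zero-one variable can have. We leave out the four test scores because some of their raw coefficients are very small and printed to only three decimals, so their ratios are dominated by rounding.

```python
# (raw, standardized) coefficients for Functions 1-3, copied from the output above
coefs = {
    "locus_of": ((1.254, -0.621, 0.662), (0.840, -0.417, 0.444)),
    "self_con": ((-0.351, -1.188, -0.827), (-0.248, -0.838, -0.583)),
    "motivati": ((1.262, 2.027, -2.000), (0.433, 0.695, -0.686)),
    "female": ((0.632, 1.085, 1.795), (0.315, 0.541, 0.894)),
}

# standardized = raw * SD, so standardized / raw estimates each variable's SD
implied_sd = {
    name: [s / r for r, s in zip(raw, std)]
    for name, (raw, std) in coefs.items()
}

for name, sds in implied_sd.items():
    print(name, [round(sd, 3) for sd in sds])
```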

s. **Correlations between DEPENDENT/COVARIATE variables and canonical
variables** – These are the correlations between each variable in a group and the group’s
canonical variates. For example, we can see in the “dependent” variables that **locus_of_control**
has a Pearson correlation of 0.904 with
the first psychological variate, -0.390 with the second psychological variate,
and 0.176 with the third psychological variate. In the “covariates” section, we
can see that **read**
has a Pearson correlation of 0.840 with the first academic variate, -0.359 with
the second academic variate, and -0.135 with the third academic variate.
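Squaring these correlations gives the share of a variable's variance that it has in common with a variate. Because the three psychological variates are uncorrelated linear combinations of just three variables, together they capture all of each psychological variable's variance, so each psychological variable's squared correlations should sum to about 1. The five covariates have only three variates, so their sums can fall short of 1. The Python check below uses only the correlations printed above; the helper name `shared_variance` is our own.

```python
# Correlations between variables and their own set's canonical variates (Functions 1-3)
dep_loadings = {
    "locus_of": (0.904, -0.390, 0.176),
    "self_con": (0.021, -0.709, -0.705),
    "motivati": (0.567, 0.351, -0.745),
}
cov_loadings = {
    "read": (0.840, -0.359, -0.135),
    "write": (0.877, 0.065, -0.255),
    "math": (0.764, -0.298, -0.148),
    "science": (0.658, -0.677, 0.230),
    "female": (0.364, 0.755, 0.543),
}


def shared_variance(loadings):
    """Sum of squared loadings per variable across all canonical variates."""
    return {name: sum(x * x for x in row) for name, row in loadings.items()}


print(shared_variance(dep_loadings))   # each close to 1
print(shared_variance(cov_loadings))   # each below 1
```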

t. **Variance in dependent variables explained by canonical variables** –
This is the degree to which the canonical variates of both the dependent
variables (**DE**)
and covariates (**CO**) can explain the
standardized variability in the dependent variables. For both sets of
canonical variates, the percent and cumulative percent of variability explained
by each variate is displayed. Because the three psychological variates capture all of the
variability in the three psychological variables, the cumulative **DE** percent reaches 100.

u. **Variance in covariates explained by canonical variables** –
This is the degree to which the canonical variates of both the dependent
variables (**DE**)
and covariates (**CO**) can explain the
standardized variability in the covariates. For both sets of canonical
variates, the percent and cumulative percent of variability explained by each
variate is displayed. Because there are only three canonical variates for the five covariates,
the covariates’ own variates (**CO**) explain 86.548% of their standardized variability rather than 100%.
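Both variance-explained tables can be reproduced from the correlations between the variables and the canonical variates. The Python sketch below assumes two formulas: the percent of a set's standardized variance explained by its own variate is the average squared correlation between the set's variables and that variate, and the percent explained by the partner variate in the other set (often called a redundancy index) is that value multiplied by the squared canonical correlation. With the printed correlations it matches the output to within rounding; the function name `variance_explained` is our own.

```python
# Canonical correlations to four decimals (footnote c) and loadings from the output
canon_corr = (0.4641, 0.1675, 0.1040)
dep = [(0.904, -0.390, 0.176), (0.021, -0.709, -0.705), (0.567, 0.351, -0.745)]
cov = [(0.840, -0.359, -0.135), (0.877, 0.065, -0.255), (0.764, -0.298, -0.148),
       (0.658, -0.677, 0.230), (0.364, 0.755, 0.543)]


def variance_explained(loadings, canon_corr):
    """Percent of a set's standardized variance explained by its own variates
    and by the partner variates in the other set (redundancy)."""
    n_vars = len(loadings)
    own = [100 * sum(row[j] ** 2 for row in loadings) / n_vars
           for j in range(len(canon_corr))]
    other = [pct * r ** 2 for pct, r in zip(own, canon_corr)]
    return own, other


dep_own, dep_other = variance_explained(dep, canon_corr)  # Pct Var DE, Pct Var CO (dependent table)
cov_own, cov_other = variance_explained(cov, canon_corr)  # Pct Var CO, Pct Var DE (covariate table)

print([round(x, 3) for x in dep_own], [round(x, 3) for x in dep_other])
print([round(x, 3) for x in cov_own], [round(x, 3) for x in cov_other])
print(round(sum(cov_own), 3))  # below 100: three variates for five covariates
```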

For further information on canonical correlation analysis in SPSS, see the corresponding Data Analysis Example page.