This page shows an example of a principal components analysis with footnotes explaining the output. The data used in this example were collected by Professor James Sidanius, who has generously shared them with us. You can download the data set here.
Overview: The "what" and "why" of principal components analysis
Principal components analysis is a method of data reduction. Suppose that you have a dozen variables that are correlated. You might use principal components analysis to reduce your 12 measures to a few principal components. In this example, you may be most interested in obtaining the component scores (which are variables that are added to your data set) and/or in looking at the dimensionality of the data. For example, if two components are extracted and those two components account for 68% of the total variance, then we would say that two dimensions in the component space account for 68% of the variance. Unlike factor analysis, principal components analysis is not usually used to identify underlying latent variables. Hence, the loadings onto the components are not interpreted the way factors in a factor analysis would be.
Principal components analysis, like factor analysis, can be performed on raw data, as shown in this example, or on a correlation or a covariance matrix. If raw data are used, the procedure will first compute the correlation matrix or covariance matrix, as specified by the user. If the correlation matrix is used, the variables are standardized and the total variance will equal the number of variables used in the analysis (because each standardized variable has a variance equal to 1). If the covariance matrix is used, the variables will remain in their original metric. However, one must take care to use variables whose variances and scales are similar. Unlike factor analysis, which analyzes only the common variance, principal components analysis analyzes the total variance. Also, principal components analysis assumes that each original measure is collected without measurement error.
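The claim that the total variance equals the number of variables when the correlation matrix is analyzed can be checked by hand. The following is a minimal sketch in Python (not part of the SAS run); the three variables x1, x2 and x3 and their values are hypothetical and were made up only to show variables on very different scales.

```python
import statistics

# Hypothetical raw scores for three variables on very different scales.
# These numbers are invented for illustration; they are not from the example data set.
raw = {
    "x1": [150.0, 160.0, 170.0, 180.0, 190.0],
    "x2": [52.0, 61.0, 70.0, 85.0, 77.0],
    "x3": [2.0, 3.0, 3.0, 5.0, 4.0],
}


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Sample variance of each variable before and after standardizing.
raw_variances = {name: statistics.variance(v) for name, v in raw.items()}
z_variances = {name: statistics.variance(standardize(v)) for name, v in raw.items()}

total_raw = sum(raw_variances.values())
total_standardized = sum(z_variances.values())

print("raw variances:", raw_variances)
print("total variance after standardizing:", total_standardized)
```

In the raw metric the variable with the largest scale dominates the total variance, which is why the covariance matrix should only be used when the scales are comparable. After standardizing, each variable contributes a variance of 1, so the total equals the number of variables.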
In this example we have included many options, including the original correlation matrix and the scree plot. While you may not wish to use all of these options, we have included them here to aid in the explanation of the analysis. We have also created a page of annotated output for a factor analysis that parallels this analysis. For general information regarding the similarities and differences between principal components analysis and factor analysis, please see our FAQ entitled What are some of the similarities and differences between principal components analysis and factor analysis?.
proc factor data = "d:m255_sas" corr scree ev method = principal;
  var item13 item14 item15 item16 item17 item18
      item19 item20 item21 item22 item23 item24;
run;
Correlations

                                                              ITEM13   ITEM14   ITEM15
ITEM13  INSTRUC WELL PREPARED                                1.00000  0.66146  0.59999
ITEM14  INSTRUC SCHOLARLY GRASP                              0.66146  1.00000  0.63460
ITEM15  INSTRUCTOR CONFIDENCE                                0.59999  0.63460  1.00000
ITEM16  INSTRUCTOR FOCUS LECTURES                            0.56626  0.50003  0.50535
ITEM17  INSTRUCTOR USES CLEAR RELEVANT EXAMPLES              0.57687  0.55150  0.58664
ITEM18  INSTRUCTOR SENSITIVE TO STUDENTS                     0.40898  0.43311  0.45707
ITEM19  INSTRUCTOR ALLOWS ME TO ASK QUESTIONS                0.28632  0.32041  0.35869
ITEM20  INSTRUCTOR IS ACCESSIBLE TO STUDENTS OUTSIDE CLASS  0.30418  0.31481  0.35568
ITEM21  INSTRUCTOR AWARE OF STUDENTS UNDERSTANDING           0.47553  0.44896  0.50904
ITEM22  I AM SATISFIED WITH STUDENT PERFORMANCE EVALUATION  0.33255  0.33313  0.36884
ITEM23  COMPARED TO OTHER INSTRUCTORS, THIS INSTRUCTOR IS   0.56399  0.56461  0.58233
ITEM24  COMPARED TO OTHER COURSES THIS COURSE WAS            0.45360  0.44281  0.43481

Correlations

                                                              ITEM16   ITEM17   ITEM18
ITEM13  INSTRUC WELL PREPARED                                0.56626  0.57687  0.40898
ITEM14  INSTRUC SCHOLARLY GRASP                              0.50003  0.55150  0.43311
ITEM15  INSTRUCTOR CONFIDENCE                                0.50535  0.58664  0.45707
ITEM16  INSTRUCTOR FOCUS LECTURES                            1.00000  0.58649  0.40479
ITEM17  INSTRUCTOR USES CLEAR RELEVANT EXAMPLES              0.58649  1.00000  0.55474
ITEM18  INSTRUCTOR SENSITIVE TO STUDENTS                     0.40479  0.55474  1.00000
ITEM19  INSTRUCTOR ALLOWS ME TO ASK QUESTIONS                0.33540  0.44930  0.62660
ITEM20  INSTRUCTOR IS ACCESSIBLE TO STUDENTS OUTSIDE CLASS  0.31676  0.41682  0.52055
ITEM21  INSTRUCTOR AWARE OF STUDENTS UNDERSTANDING           0.45245  0.59526  0.55417
ITEM22  I AM SATISFIED WITH STUDENT PERFORMANCE EVALUATION  0.36255  0.44976  0.53609
ITEM23  COMPARED TO OTHER INSTRUCTORS, THIS INSTRUCTOR IS   0.45880  0.61302  0.56950
ITEM24  COMPARED TO OTHER COURSES THIS COURSE WAS            0.42967  0.52058  0.47382

Correlations

                                                              ITEM19   ITEM20   ITEM21
ITEM13  INSTRUC WELL PREPARED                                0.28632  0.30418  0.47553
ITEM14  INSTRUC SCHOLARLY GRASP                              0.32041  0.31481  0.44896
ITEM15  INSTRUCTOR CONFIDENCE                                0.35869  0.35568  0.50904
ITEM16  INSTRUCTOR FOCUS LECTURES                            0.33540  0.31676  0.45245
ITEM17  INSTRUCTOR USES CLEAR RELEVANT EXAMPLES              0.44930  0.41682  0.59526
ITEM18  INSTRUCTOR SENSITIVE TO STUDENTS                     0.62660  0.52055  0.55417
ITEM19  INSTRUCTOR ALLOWS ME TO ASK QUESTIONS                1.00000  0.44647  0.49921
ITEM20  INSTRUCTOR IS ACCESSIBLE TO STUDENTS OUTSIDE CLASS  0.44647  1.00000  0.42479
ITEM21  INSTRUCTOR AWARE OF STUDENTS UNDERSTANDING           0.49921  0.42479  1.00000
ITEM22  I AM SATISFIED WITH STUDENT PERFORMANCE EVALUATION  0.48404  0.38297  0.50651
ITEM23  COMPARED TO OTHER INSTRUCTORS, THIS INSTRUCTOR IS   0.44401  0.40962  0.59751
ITEM24  COMPARED TO OTHER COURSES THIS COURSE WAS            0.37383  0.35722  0.49977

Correlations

                                                              ITEM22   ITEM23   ITEM24
ITEM13  INSTRUC WELL PREPARED                                0.33255  0.56399  0.45360
ITEM14  INSTRUC SCHOLARLY GRASP                              0.33313  0.56461  0.44281
ITEM15  INSTRUCTOR CONFIDENCE                                0.36884  0.58233  0.43481
ITEM16  INSTRUCTOR FOCUS LECTURES                            0.36255  0.45880  0.42967
ITEM17  INSTRUCTOR USES CLEAR RELEVANT EXAMPLES              0.44976  0.61302  0.52058
ITEM18  INSTRUCTOR SENSITIVE TO STUDENTS                     0.53609  0.56950  0.47382
ITEM19  INSTRUCTOR ALLOWS ME TO ASK QUESTIONS                0.48404  0.44401  0.37383
ITEM20  INSTRUCTOR IS ACCESSIBLE TO STUDENTS OUTSIDE CLASS  0.38297  0.40962  0.35722
ITEM21  INSTRUCTOR AWARE OF STUDENTS UNDERSTANDING           0.50651  0.59751  0.49977
ITEM22  I AM SATISFIED WITH STUDENT PERFORMANCE EVALUATION  1.00000  0.49317  0.44440
ITEM23  COMPARED TO OTHER INSTRUCTORS, THIS INSTRUCTOR IS   0.49317  1.00000  0.70464
ITEM24  COMPARED TO OTHER COURSES THIS COURSE WAS            0.44440  0.70464  1.00000
The table above was included in the output because we included the keyword corr on the proc factor statement. This table gives the correlations between the original variables (which are specified on the var statement). Before conducting a principal components analysis, you want to check the correlations between the variables. If any of the correlations are too high (say above .9), you may need to remove one of the variables from the analysis, as the two variables seem to be measuring the same thing. Another alternative would be to combine the variables in some way (perhaps by taking the average). If the correlations are too low, say below .1, then one or more of the variables might load only onto one principal component (in other words, make its own principal component). This is not helpful, as the whole point of the analysis is to reduce the number of items (variables).
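The screening rule described above can be sketched as a small check. This is an illustrative Python snippet, not something SAS produces. It uses only the ITEM13 to ITEM15 block of the correlation table to stay short, the cutoffs of .9 and .1 are the rough guides mentioned in the text, and the helper name flag_pairs is made up for this example.

```python
# Correlations among ITEM13, ITEM14 and ITEM15, copied from the output above.
names = ["ITEM13", "ITEM14", "ITEM15"]
corr = [
    [1.00000, 0.66146, 0.59999],
    [0.66146, 1.00000, 0.63460],
    [0.59999, 0.63460, 1.00000],
]


def flag_pairs(names, corr, high=0.9, low=0.1):
    """Return (too_high, too_low): lists of (name_a, name_b, r) tuples.

    Only the upper triangle is scanned, so each pair is reported once
    and the diagonal of 1s is ignored.
    """
    too_high, too_low = [], []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = corr[i][j]
            if abs(r) > high:
                too_high.append((names[i], names[j], r))
            elif abs(r) < low:
                too_low.append((names[i], names[j], r))
    return too_high, too_low


too_high, too_low = flag_pairs(names, corr)
print("pairs above the high cutoff:", too_high)
print("pairs below the low cutoff:", too_low)
```

For these three items neither list contains any pairs, so nothing in this block of the table suggests dropping or combining variables.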
Initial Factor Method: Principal Components

Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 12  Average = 1

      Eigenvalue(a)   Difference(b)   Proportion(c)   Cumulative(d)
 1       6.24914661      5.01966832          0.5208          0.5208
 2       1.22947829      0.51048923          0.1025          0.6232
 3       0.71898906      0.10585957          0.0599          0.6831
 4       0.61312949      0.05196458          0.0511          0.7342
 5       0.56116491      0.05817383          0.0468          0.7810
 6       0.50299107      0.03172750          0.0419          0.8229
 7       0.47126357      0.08244834          0.0393          0.8622
 8       0.38881523      0.02091149          0.0324          0.8946
 9       0.36790373      0.03970330          0.0307          0.9252
10       0.32820043      0.01082277          0.0274          0.9526
11       0.31737767      0.06583773          0.0264          0.9790
12       0.25153994                          0.0210          1.0000

2 factors will be retained by the MINEIGEN criterion.
a. Eigenvalue – This column contains the eigenvalues. The first component will always account for the most variance (and hence have the highest eigenvalue), and the next component will account for as much of the left over variance as it can, and so on. Hence, each successive component will account for less and less variance.
b. Difference – This column gives the differences between the current and the next eigenvalue. For example, 6.24 – 1.22 = 5.02. This gives you a sense of how much change there is in the eigenvalues from one component to the next.
c. Proportion – This column gives the proportion of variance accounted for by each component. In this example, the first component accounts for just over half of the variance (approximately 52%).
d. Cumulative – This column is the running sum of the Proportion column, so that you can see how much variance is accounted for by, say, the first five components (.7810).
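The Difference, Proportion and Cumulative columns can all be rebuilt from the Eigenvalue column. The sketch below does this in Python as an independent check on the SAS output; the eigenvalues are copied from the table above and the variable names are chosen only for this example.

```python
# Eigenvalues of the correlation matrix, copied from the SAS output above.
eigenvalues = [
    6.24914661, 1.22947829, 0.71898906, 0.61312949,
    0.56116491, 0.50299107, 0.47126357, 0.38881523,
    0.36790373, 0.32820043, 0.31737767, 0.25153994,
]

# Total variance: with a correlation matrix this is the number of variables.
total = sum(eigenvalues)

# Difference: each eigenvalue minus the next one (none for the last row).
differences = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]

# Proportion: each eigenvalue as a share of the total variance.
proportions = [ev / total for ev in eigenvalues]

# Cumulative: running sum of the proportions.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

print(f"total = {total:.4f}")
for k, ev in enumerate(eigenvalues):
    diff = f"{differences[k]:.8f}" if k < len(differences) else ""
    print(f"{k + 1:2d}  {ev:.8f}  {diff:>10}  {proportions[k]:.4f}  {cumulative[k]:.4f}")
```

The last row has no Difference entry because there is no thirteenth eigenvalue to subtract.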
[Scree Plot of Eigenvalues: eigenvalues (vertical axis, 0 to 7) plotted against component number (horizontal axis, 0 to 12). The first point lies just above 6, the second just above 1, and the remaining ten points lie below 1.]
The scree plot graphs the eigenvalue against the component number. You can see these values in the first two columns of the table immediately above. From the third component on, you can see that the line is almost flat, meaning that each successive component accounts for smaller and smaller amounts of the total variance. In general, we are interested in keeping only those principal components whose eigenvalues are greater than 1. Components with an eigenvalue of less than 1 account for less variance than a single original variable did (each standardized variable has a variance of 1), and so are of little use. Hence, you can see that the point of principal components analysis is to redistribute the variance in the correlation matrix (using the method of eigenvalue decomposition) to the first components extracted.
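The eigenvalue-greater-than-1 rule is easy to apply by hand. The following Python sketch reproduces the "2 factors will be retained" message from the eigenvalues reported earlier. It assumes the cutoff is exactly 1, which is what the text describes for a correlation matrix, and it is an illustration rather than a description of how SAS implements the MINEIGEN criterion internally.

```python
# Eigenvalues of the correlation matrix, copied from the SAS output.
eigenvalues = [
    6.24914661, 1.22947829, 0.71898906, 0.61312949,
    0.56116491, 0.50299107, 0.47126357, 0.38881523,
    0.36790373, 0.32820043, 0.31737767, 0.25153994,
]

# Assumed cutoff: the variance of one standardized variable.
cutoff = 1.0

# Keep the components that explain more variance than a single variable.
retained = [ev for ev in eigenvalues if ev > cutoff]

# Share of the total variance carried by the retained components.
retained_share = sum(retained) / sum(eigenvalues)

print("components retained:", len(retained))
print(f"variance accounted for by retained components: {retained_share:.4f}")
```

The retained share matches the Cumulative entry for the second component in the eigenvalue table.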
Eigenvectors

                                                                 1(e)      2(e)
ITEM13  INSTRUC WELL PREPARED                                 0.29093  -0.40510
ITEM14  INSTRUC SCHOLARLY GRASP                               0.28953  -0.36765
ITEM15  INSTRUCTOR CONFIDENCE                                 0.29851  -0.27789
ITEM16  INSTRUCTOR FOCUS LECTURES                             0.27406  -0.25376
ITEM17  INSTRUCTOR USES CLEAR RELEVANT EXAMPLES               0.32261  -0.09492
ITEM18  INSTRUCTOR SENSITIVE TO STUDENTS                      0.30207   0.33002
ITEM19  INSTRUCTOR ALLOWS ME TO ASK QUESTIONS                 0.25641   0.44823
ITEM20  INSTRUCTOR IS ACCESSIBLE TO STUDENTS OUTSIDE CLASS   0.23709   0.34083
ITEM21  INSTRUCTOR AWARE OF STUDENTS UNDERSTANDING            0.30536   0.12133
ITEM22  I AM SATISFIED WITH STUDENT PERFORMANCE EVALUATION   0.26057   0.32871
ITEM23  COMPARED TO OTHER INSTRUCTORS, THIS INSTRUCTOR IS    0.32768  -0.03634
ITEM24  COMPARED TO OTHER COURSES THIS COURSE WAS             0.28550   0.00421
e. Eigenvectors – These columns give the eigenvectors for each variable in the principal components analysis. Each eigenvector holds the weights that define one component as a linear combination of the original variables. The two components that have been extracted are orthogonal to one another. The weights are multiplied by each value in the original variable, and those products are then summed up to yield the component. The eigenvectors tell you about the strength of the relationship between the variables and the components.
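Two properties of the eigenvectors can be verified directly from the printed values: each has unit length (its squared weights sum to 1), and the two are orthogonal (their dot product is 0). The Python sketch below is an independent check, not SAS output. Because the printed weights are rounded to five decimals, the results are only approximately 1 and 0.

```python
# Eigenvector weights for ITEM13 through ITEM24, copied from the SAS output above.
vector1 = [0.29093, 0.28953, 0.29851, 0.27406, 0.32261, 0.30207,
           0.25641, 0.23709, 0.30536, 0.26057, 0.32768, 0.28550]
vector2 = [-0.40510, -0.36765, -0.27789, -0.25376, -0.09492, 0.33002,
           0.44823, 0.34083, 0.12133, 0.32871, -0.03634, 0.00421]


def dot(u, v):
    """Sum of the element-wise products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


# Squared length of each eigenvector: expected to be close to 1.
length_sq_1 = dot(vector1, vector1)
length_sq_2 = dot(vector2, vector2)

# Dot product of the two eigenvectors: expected to be close to 0.
cross = dot(vector1, vector2)

print(f"squared length of eigenvector 1: {length_sq_1:.5f}")
print(f"squared length of eigenvector 2: {length_sq_2:.5f}")
print(f"dot product of the two eigenvectors: {cross:.5f}")
```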
Factor Pattern

                                                              Factor1   Factor2
ITEM13  INSTRUC WELL PREPARED                                 0.72729  -0.44919
ITEM14  INSTRUC SCHOLARLY GRASP                               0.72378  -0.40766
ITEM15  INSTRUCTOR CONFIDENCE                                 0.74622  -0.30813
ITEM16  INSTRUCTOR FOCUS LECTURES                             0.68511  -0.28137
ITEM17  INSTRUCTOR USES CLEAR RELEVANT EXAMPLES               0.80647  -0.10525
ITEM18  INSTRUCTOR SENSITIVE TO STUDENTS                      0.75512   0.36593
ITEM19  INSTRUCTOR ALLOWS ME TO ASK QUESTIONS                 0.64098   0.49700
ITEM20  INSTRUCTOR IS ACCESSIBLE TO STUDENTS OUTSIDE CLASS   0.59269   0.37792
ITEM21  INSTRUCTOR AWARE OF STUDENTS UNDERSTANDING            0.76335   0.13454
ITEM22  I AM SATISFIED WITH STUDENT PERFORMANCE EVALUATION   0.65138   0.36448
ITEM23  COMPARED TO OTHER INSTRUCTORS, THIS INSTRUCTOR IS    0.81914  -0.04029
ITEM24  COMPARED TO OTHER COURSES THIS COURSE WAS             0.71371   0.00467
f. Factor1 and Factor2 – This is the component matrix. This table contains component loadings, which are the correlations between the variable and the component. Because these are correlations, possible values range from -1 to +1. The columns under these headings are the principal components that have been extracted. As you can see, two components were extracted (the two components that had an eigenvalue greater than 1). You usually do not try to interpret the components the way that you would factors that have been extracted from a factor analysis. Rather, most people are interested in the component scores, which are used for data reduction (as opposed to factor analysis where you are looking for underlying latent continua).
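The component loadings are related to the eigenvectors reported earlier: in a principal components solution each loading equals the eigenvector weight multiplied by the square root of that component's eigenvalue. The Python sketch below checks this for three of the items using values copied from the output. It is an independent check rather than SAS code, and small differences in the fifth decimal come from the rounding of the printed eigenvectors.

```python
import math

# Eigenvalues of the two retained components, from the SAS output.
eigenvalue1 = 6.24914661
eigenvalue2 = 1.22947829

# Eigenvector weights (component 1, component 2) for three of the items.
eigenvectors = {
    "ITEM13": (0.29093, -0.40510),
    "ITEM19": (0.25641, 0.44823),
    "ITEM24": (0.28550, 0.00421),
}

# Loading = eigenvector weight * sqrt(eigenvalue of that component).
loadings = {
    item: (w1 * math.sqrt(eigenvalue1), w2 * math.sqrt(eigenvalue2))
    for item, (w1, w2) in eigenvectors.items()
}

for item, (l1, l2) in loadings.items():
    print(f"{item}  {l1:8.5f}  {l2:8.5f}")
```

The printed values can be compared with the Factor1 and Factor2 columns for the same items.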
Variance Explained by Each Factor

  Factor1      Factor2
6.2491466    1.2294783

Final Communality Estimates: Total = 7.478625

    ITEM13      ITEM14      ITEM15      ITEM16      ITEM17      ITEM18
0.73071411  0.69004215  0.65179276  0.54854615  0.66147090  0.70412023

    ITEM19      ITEM20      ITEM21      ITEM22      ITEM23      ITEM24
0.65786784  0.49410612  0.60081090  0.55713785  0.67261205  0.50940384
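The final communality of an item is the sum of its squared loadings on the retained components, and the communalities sum to the variance explained by those components. The Python sketch below checks both relationships against the printed output, using the loadings of two items as examples. It is an independent check, not SAS code, and agreement is limited to about four decimals because the printed loadings are rounded.

```python
# Loadings (Factor1, Factor2) for two items, copied from the Factor Pattern table.
loadings = {
    "ITEM13": (0.72729, -0.44919),
    "ITEM20": (0.59269, 0.37792),
}

# Communality = sum of squared loadings across the retained components.
communalities = {item: sum(l ** 2 for l in row) for item, row in loadings.items()}

# All twelve final communality estimates, copied from the SAS output.
reported = [
    0.73071411, 0.69004215, 0.65179276, 0.54854615, 0.66147090, 0.70412023,
    0.65786784, 0.49410612, 0.60081090, 0.55713785, 0.67261205, 0.50940384,
]

# Variance explained by each retained component (its eigenvalue).
variance_explained = [6.2491466, 1.2294783]

total_communality = sum(reported)
total_explained = sum(variance_explained)

for item, h2 in communalities.items():
    print(f"{item} communality from loadings: {h2:.5f}")
print(f"sum of communalities:       {total_communality:.6f}")
print(f"sum of retained eigenvalues: {total_explained:.6f}")
```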