Mastering SAS for Data Analytics - Factor Analysis Essentials


Factor analysis is a dimension reduction technique used to identify latent variables from observed data. Exploratory factor analysis involves steps like computing correlations, extracting factors, rotating factors for interpretation, and computing factor scores. SAS PROC FACTOR is commonly used for this analysis with various options available for customization.


Uploaded on Sep 15, 2024 | 0 Views



Presentation Transcript


  1. Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward 1 SAS ESSENTIALS -- Elliott & Woodward

  2. Chapter 17: FACTOR ANALYSIS

  3. LEARNING OBJECTIVES
     - To be able to perform an exploratory factor analysis using PROC FACTOR
     - To be able to use PROC FACTOR to identify underlying factors or latent variables in a data set
     - To be able to use PROC FACTOR to rotate factors for improved interpretation
     - To be able to use PROC FACTOR to compute factor scores

  4. Factor Analysis: Factor analysis is a dimension reduction technique designed to express the actual observed variables using a smaller number of underlying latent variables. Exploratory factor analysis involves identifying factors, determining which factors are needed to satisfactorily describe the original data, interpreting the meaning of these factors, and so on. Confirmatory factor analysis involves techniques for testing hypotheses to confirm theories, and so on.

  5. 17.1 FACTOR ANALYSIS BASICS
     The typical steps in performing an exploratory factor analysis are the following:
     (a) Compute a correlation (or covariance) matrix for the observed variables.
     (b) Extract the factors (this involves deciding how many factors to extract, the method to use, and the values to use for the prior communality estimates).
     (c) Rotate the factors to improve interpretation.
     (d) Compute factor scores (if needed).
     Factor analysis can be quite subjective, and its solutions are not unique. Consequently, there is a certain amount of "art" involved in any factor analysis solution.

  6. Using PROC FACTOR: The SAS procedure used to perform exploratory factor analysis is PROC FACTOR. A simplified syntax for this procedure is as follows:
          PROC FACTOR <Options>;
            VAR variables;
            PRIORS communalities;
          RUN;

  7. Table 17.1 Common Options for PROC FACTOR
     Option            Explanation
     DATA=dataname     Specifies which data set to use.
     METHOD=option     Specifies the estimation method. Options include ML and PRINCIPAL.
     MINEIGEN=n        Specifies the smallest eigenvalue for retaining a factor.
     NFACTORS=n        Specifies the maximum number of factors to retain.
     NOPRINT           Suppresses output.
     PRIORS=option     Specifies the method for obtaining prior communalities.
     ROTATE=name       Specifies the rotation method. The default is ROTATE=NONE. Common rotation methods are VARIMAX, QUARTIMAX, EQUAMAX, and PROMAX. All of the above are orthogonal rotations except PROMAX.
     SCREE             Displays a scree plot of the eigenvalues.
     SIMPLE            Displays means, standard deviations, and number of observations.
     CORR              Displays the correlation matrix.

  8. Common Statements for PROC FACTOR (Table 17.1 Continued)
     Statement                   Explanation
     VAR variable list;          Specifies the numeric variables to be analyzed. The default is to use all numeric variables.
     BY, FORMAT, LABEL, WHERE    These statements are common to most procedures and may be used here.
     NOTE: If the METHOD=PRINCIPAL option is used, then principal component analysis is performed when the PRIORS= option is not used or is set to ONE (the default). If you specify a PRIORS= value other than PRIORS=ONE, then a principal factor method analysis is performed. A common usage is PRIORS=SMC, in which case the prior communality for each variable is its squared multiple correlation with all other variables. After extracting the factors, the communalities represent the proportion of the variance in each of the original variables retained after extracting the factors.

  9. Do Hands On Exercise p 379 (AFACTOR1.SAS)
     Two of the types of intelligence are Logical-Mathematical Intelligence and Linguistic Intelligence. In this example, we examine a hypothetical data set that contains six variables, each measured on a 0-10 scale as follows:
     - COMPUTATION: Test on mathematical computations
     - VOCABULARY: A vocabulary test
     - INFERENCE: A test of the use of inductive and deductive inference
     - REASONING: A test of sequential reasoning
     - WRITING: A score on a writing sample
     - GRAMMAR: A test measuring proper grammar usage

  10. Using PROC FACTOR
          PROC FACTOR DATA=MYSASLIB.INTEL
               SIMPLE CORR          /* displays common statistics */
               SCORE
               METHOD=PRINCIPAL     /* specifies the estimation method */
               ROTATE=VARIMAX       /* specifies the rotation method */
               OUT=FS               /* output data set for factor scores; OUT= also needs NFACTORS= (see slide 20) */
               PRIORS=SMC           /* specifies the method for obtaining prior communalities */
               PLOTS=SCREE;         /* requests a scree plot */
          RUN;

  11. Observe Output From PROC FACTOR: Simple Statistics

  12. Correlation Matrix for Six Variables: The high pairwise correlations among COMPUTATION, INFERENCE, and (to a lesser extent) REASONING suggest that these variables tend to measure Math Intelligence. The variables VOCABULARY, WRITING, and GRAMMAR, which seem to measure Linguistic Intelligence, are also positively pairwise correlated.

  13. Prior Communality Estimates: Because we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factors method, where the prior communality estimate for each variable is its squared multiple correlation with all other variables. These prior communality estimates are given in this table.
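The role of PRIORS= can be sketched with two calls that differ only in that option. This is an illustrative sketch rather than code from the book: the data set MYSASLIB.INTEL is the one used in the hands-on exercise, and the variable names are assumed to match the labels listed on slide 9.

```sas
/* Principal component analysis: prior communalities of 1 (the default). */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=ONE;
  /* Variable names assumed from the slide 9 descriptions. */
  VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;

/* Principal factor analysis: prior communalities are squared multiple correlations. */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC;
  VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR;
RUN;
```

Comparing the "Prior Communality Estimates" tables from the two runs is one way to see what the option changes.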

  14. Scree Plot: The Scree Plot gives a visual illustration of the sizes of the eigenvalues. It is clear that there are two dominant eigenvalues.

  15. Eigenvalues: This table displays eigenvalues associated with the factors based on the reduced correlation matrix. It is clear from the table that there are two dominant eigenvalues (2.319 and 1.725). Based on any reasonable criterion, it is clear that a two-factor solution should be used.
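If you prefer to fix the number of factors yourself rather than rely on the procedure's default retention rule, the NFACTORS= and MINEIGEN= options from Table 17.1 can be used. The sketch below is an assumption-laden illustration: the variable names are taken from the slide 9 descriptions, and the MINEIGEN=1 cutoff is an arbitrary example value, so whether it retains exactly two factors depends on the eigenvalues not quoted here.

```sas
/* Cap the solution at two factors, matching the two dominant eigenvalues. */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC NFACTORS=2;
  VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR; /* assumed names */
RUN;

/* Alternative: retain only factors whose eigenvalue is at least 1 (example cutoff). */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC MINEIGEN=1;
  VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR; /* assumed names */
RUN;
```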

  16. Communality Estimates: The communalities in this table are the proportion of the variance in each of the original variables retained after extracting the factors. It seems that all six variables are sufficiently well represented by the two factors, with variable REASONING having the smallest communality, 0.335.

  17. Factor Pattern Matrix: In this table, it can be seen that for Factor 1, each variable has a positive coefficient ranging from 0.41 for REASONING to 0.77 for WRITING. A reasonable interpretation of this factor is that it is an overall measure of intelligence. The second factor (Factor 2) has negative loadings on the variables measuring Linguistic Intelligence and positive coefficients on the others.

  18. Interpreting the Factor Analysis Results: Based on the less than ideal interpretability of these factors, we use a rotation in the hope of producing more interpretable results. (Recall that by construction, there should be two factors: Math Intelligence and Linguistic Intelligence.) Using the option ROTATE=VARIMAX, we have instructed SAS to perform a Varimax rotation. SAS provides several rotation options, and Varimax is a popular "orthogonal rotation," which produces two orthogonal factors that are potentially easier to interpret.
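For comparison, Table 17.1 also lists PROMAX, the one non-orthogonal rotation among the common choices. The following is only a sketch of how the rotation could be swapped, not part of the book's exercise; the variable names are assumed from slide 9 and NFACTORS=2 is carried over from the two-factor conclusion above.

```sas
/* Orthogonal rotation, as used in the exercise. */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC
            NFACTORS=2 ROTATE=VARIMAX;
  VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR; /* assumed names */
RUN;

/* Oblique alternative: PROMAX allows the rotated factors to be correlated. */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC
            NFACTORS=2 ROTATE=PROMAX;
  VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR; /* assumed names */
RUN;
```

An oblique rotation may be worth trying when there is reason to think the underlying abilities are themselves correlated.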

  19. Interpreting the Rotated Factor Pattern Matrix: In this table the coefficients for COMPUTATION are the correlations of the variable COMPUTATION with each of the two factors. There is a large positive correlation between COMPUTATION and Factor 2 and a very small correlation between COMPUTATION and Factor 1. Similar interpretations show that Factor 1 is highly correlated with the three variables measuring Linguistic Intelligence and Factor 2 tends to correspond to Math Intelligence.

  20. Storing Factor Scores: Suppose you want to calculate factor scores and save them in a temporary working data set named FSCORE. To accomplish this, add the following PROC FACTOR options before PLOTS=SCREE;
          SCORE NFACTORS=2 OUT=FSCORE
     (OUT=FSCORE outputs a SAS data set named FSCORE.) Then, after the RUN; statement, add the code
          PROC PRINT DATA=FSCORE;
            VAR FACTOR1 FACTOR2;
          RUN;

  21. Results of OUT=FSCORE: The two factor scores are given the default names FACTOR1 and FACTOR2 (the prefix "FACTOR" can be changed using the PREFIX= option). Recalling that Factor 1 is a measure of Linguistic Intelligence and Factor 2 measures Math Intelligence, from the factor scores it can be seen that Subject 1 has a higher Linguistic Intelligence score, Subject 2 seems to have high Math Intelligence, and Subject 3 unfortunately doesn't seem to have strength in either dimension.
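A short sketch of the PREFIX= option mentioned above: the score variables take the prefix followed by the factor number. The prefix INTEL, the output data set name FSCORE2, and the variable names in the VAR statement are illustrative assumptions, not taken from the book.

```sas
/* Save two factor scores under the names INTEL1 and INTEL2 instead of FACTOR1 and FACTOR2. */
PROC FACTOR DATA=MYSASLIB.INTEL METHOD=PRINCIPAL PRIORS=SMC ROTATE=VARIMAX
            SCORE NFACTORS=2 OUT=FSCORE2 PREFIX=INTEL;
  VAR COMPUTATION VOCABULARY INFERENCE REASONING WRITING GRAMMAR; /* assumed names */
RUN;

/* List the scores for the first three subjects only. */
PROC PRINT DATA=FSCORE2(OBS=3);
  VAR INTEL1 INTEL2;
RUN;
```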

  22. Do Hands On Example p 386 (AFACTOR2.SAS), Olympic Data: This data set contains scores of 193 athletes who completed all 10 decathlon events in the 1988 through 2012 Olympic Games. The 10 events in the decathlon are 100-m run, long jump, shot put, high jump, 400-m run, 110-m hurdles, discus, pole vault, javelin, and 1500-m run. These events measure a wide variety of athletic ability, and in this example we use this decathlon data set to explore whether there are some underlying dimensions of athletic ability. It should be noted that the "times" in the running events are given negative signs so that "larger" values are better than "smaller" values, as is the case in the distance measurements.

  23. Factor Analysis Code for Olympic Data
          PROC FACTOR SIMPLE CORR DATA=MYSASLIB.OLYMPIC
               METHOD=PRINCIPAL MSA PRIORS=SMC ROTATE=VARIMAX
               OUTSTAT=FACT ALL PLOTS=SCREE;
            VAR RUN100 LONGJUMP SHOTPUT HIGHJUMP RUN400 HURDLES
                DISCUS POLEVAULT JAVELIN RUN1500S;
          RUN;

  24. Simple Statistics for Olympic Data: As mentioned earlier, times in the running events are given negative signs so that "larger" values are better than "smaller" values, as is the case in the distance measurements. Moreover, the 1500-m results are given in (negative) seconds rather than the usual reporting of minutes and seconds.
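The sign and unit conventions described above could be produced with a DATA step along the following lines. This is a hypothetical sketch: the input data set OLYMPIC_RAW and the variables TIME100, MIN1500, and SEC1500 are invented for illustration, and the supplied MYSASLIB.OLYMPIC data set already has these transformations applied.

```sas
DATA OLYMPIC_SIGNED;                      /* hypothetical output data set */
  SET OLYMPIC_RAW;                        /* hypothetical input with raw times */
  /* Negate the sprint time so that larger values mean better performance. */
  RUN100 = -TIME100;
  /* Convert minutes and seconds to total seconds, then negate. */
  RUN1500S = -(60*MIN1500 + SEC1500);
RUN;
```

With this convention, every variable in the analysis points in the same direction (higher is better), which keeps the signs of the correlations and loadings easier to read.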

  25. Correlations for Olympic Data: There are positive correlations between speed events such as the 100-m run and 110-m hurdles (0.692) and between strength events SHOTPUT and DISCUS (0.748). The 1500-m run is not highly correlated with any of the other events; its highest correlation is with the 400-m run (0.368).

  26. Communality Estimates, Olympic Data: Since we specified METHOD=PRINCIPAL and PRIORS=SMC, SAS uses the principal factors method, where the prior communality estimate for each variable is its squared multiple correlation with all other variables. This table shows the prior communality estimates (slightly rearranged from the original output).

  27. Eigenvalues for Olympic Data (see next slide)

  28. Eigenvalues for Olympic Data: The eigenvalues table shows factors based on the reduced correlation matrix. PROC FACTOR selected three factors. It is clear from the previous table and the Scree plot that there are three dominant eigenvalues.

  29. The communalities in this table (rearranged slightly from the output) are the proportion of the variance in each of the original variables retained after extracting the factors. It seems that all 10 events are fairly well represented by the three factors, with all communalities above 0.33. However, HIGHJUMP, POLEVAULT, JAVELIN, and RUN1500S all have communalities below 0.4.

  30. Factor Patterns: As was the case for the unrotated solution for the Intelligence Data, it can be seen that Factor 1 has positive coefficients on every variable, all of which are above 0.4 except for RUN1500S, which has a coefficient of 0.17. A reasonable interpretation is that Factor 1 measures overall athletic ability, primarily related to the first nine events. Factors 2 and 3 are more difficult to interpret.

  31. Use ROTATE=VARIMAX: Based on the confusing interpretations associated with the three-factor solution given in the previous table, we again use a rotation to produce more interpretable results. Using the option ROTATE=VARIMAX results in the Rotated Factor Pattern Matrix given in the following slide.

  32. Rotated Factor Patterns: The first rotated factor seems to focus on the events that involve speed and spring: the 100-m run, long jump, 400-m run, and 110-m hurdles. Factor 2 seems to be primarily an arm strength factor, with high coefficients for shot put and discus and lesser ones for javelin, pole vault, and high jump. The only event with a large coefficient in Factor 3 is the 1500-m run. This is consistent with the correlation matrix, which suggested the 1500-m run was "different" from the other events.

  33. 17.2 SUMMARY: In this chapter, we have discussed methods for using PROC FACTOR to perform exploratory factor analysis. In the Hands-on Examples, we have illustrated the use of rotation to obtain more understandable results. Continue to Chapter 18: CREATING CUSTOM GRAPHS

  34. These slides are based on the book: Introduction to SAS Essentials: Mastering SAS for Data Analytics, 2nd Edition, by Alan C. Elliott and Wayne A. Woodward. Paperback: 512 pages. Publisher: Wiley; 2nd edition (August 3, 2015). Language: English. ISBN-10: 111904216X. ISBN-13: 978-1119042167. These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to acelliott@smu.edu. Thanks.
