Factor Analysis
Factor analysis is a statistical method used to identify underlying factors that explain correlations among variables. It helps reduce large datasets by replacing many correlated variables with a smaller set of uncorrelated ones. There are two types of factor analysis: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). Applications of factor analysis include machine learning, data mining, psychology studies, education, and marketing. By modeling observed variables as linear combinations of latent factors, factor analysis aids dimensionality reduction and reveals hidden relationships among variables.
Presentation Transcript
Factor Analysis By Mayamin Hamid Raha (05/05/2023)
Contents
- Background
- Introduction
- Application fields
- Examples
- Statistical terms used
- Factor solution rule
- FA Model
- Factor Extraction Methods
- Assumptions
- Case Study
- Limitations
- References
Background
Large datasets consisting of several variables can be reduced by observing groups of latent variables (factors). FA is therefore used to:
- identify the underlying factors that explain the correlations among variables
- identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables
[Figure: four hidden factors (blue) drive the measurable values in the yellow indicator tags. Source: https://statisticsbyjim.com/basics/factor-analysis/]
What is Factor Analysis?
A method of modeling observed variables and their covariance structures.
The main idea behind FA: observed variables are modeled as linear combinations of latent factors.
FA uses the correlation structure amongst observed variables to model a smaller number of unobserved, latent variables known as factors [1].
Dimensionality reduction: measurable and observable variables are reduced to latent variables that share common variance.
There are two types of FA:
- Exploratory Factor Analysis (EFA): finds complex patterns by exploring the dataset.
- Confirmatory Factor Analysis (CFA): used for confirming hypotheses using path analysis diagrams.
Application fields
- Machine learning (unsupervised)
- Data mining
- Psychology studies
- Education
- Marketing
Example 1
Find the socioeconomic status (SES) of a person.
- SES is a factor that we cannot measure directly.
- What we can access are observable variables: Occupation, Income, and Education Level.
- People with a particular socioeconomic status tend to have similar values for these variables.
- If the factor (SES) has a strong relationship with these variables, then it accounts for a large portion of the variance in the variables.
Example 2
Output of a simple factor analysis looking at indicators of wealth, with a small set of variables and two resulting factors (using SPSS software).
- The variable Income has a correlation of 0.65 with Factor 1.
- Factor 1 -> Individual socioeconomic status.
- Factor 2 -> Neighborhood socioeconomic status.
- Factor rotation is done by orthogonal rotations (factors are not correlated) or oblique rotations (factors are correlated).
Source: https://www.theanalysisfactor.com/rotations-factor-analysis/
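To make the orthogonal case concrete, here is a minimal sketch of varimax, one common orthogonal rotation. The slide does not say which rotation SPSS applied, so varimax is an assumption, and the loading values below are hypothetical rather than the SPSS output.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix with the SVD-based varimax iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        col_ss = np.diag(np.sum(rotated**2, axis=0))
        gradient = loadings.T @ (rotated**3 - rotated @ col_ss / p)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt            # nearest orthogonal matrix to the gradient
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break                    # no meaningful improvement, stop
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: 4 variables, 2 factors (illustrative only).
unrotated = np.array([
    [0.7,  0.5],
    [0.7,  0.4],
    [0.6, -0.5],
    [0.6, -0.6],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the rotated factors stay uncorrelated and each variable's communality (row sum of squared loadings) is unchanged; only the pattern of loadings becomes easier to read.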
Statistical terms used in FA
- Communality: the amount of variance a variable shares with all the other variables.
- Factor loadings: correlations between the variables and the factors.
- Eigenvalue: represents the total variance explained by each factor (a value >= 1 is preferred). A factor with an eigenvalue of 1 accounts for as much variance as a single variable, and the logic is that only factors that explain at least as much variance as a single variable are worth keeping.
- Scree plot: a plot of the eigenvalues against the number of factors, in order of extraction.
Sources: https://en.wikipedia.org/wiki/Scree_plot and https://studylib.net/doc/15380201/factor-analysis-%C2%A9-2007-prentice-hall
Factor Solution Rule
When to drop a factor solution?
- When a factor does not have high factor loadings from more than two variables, that is, when at most two variables load highly on it.
FA Model
Each variable is written as a linear combination of common factors and a unique factor:
$$X_i = A_{i1}F_1 + A_{i2}F_2 + A_{i3}F_3 + \dots + A_{im}F_m + V_i U_i$$
where
- $X_i$ = the i-th standardized variable
- $A_{ij}$ = standardized multiple regression coefficient of variable i on common factor j
- $F_j$ = common factor j
- $V_i$ = standardized regression coefficient of variable i on unique factor i
- $U_i$ = the unique factor for variable i
- $m$ = number of common factors
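The model above can be sketched numerically. The snippet below simulates data from the equation and checks that the sample correlation matrix approaches the matrix the model implies. The loading matrix `A`, the sample size, and the assumption that common and unique factors are independent standard normal variables are all illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings A_ij: 4 variables, m = 2 common factors.
A = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])
# Unique-factor coefficients V_i, chosen so that each X_i has unit variance.
V = np.sqrt(1.0 - np.sum(A**2, axis=1))

n = 100_000
F = rng.standard_normal((n, A.shape[1]))   # common factors F_j
U = rng.standard_normal((n, A.shape[0]))   # unique factors U_i
X = F @ A.T + U * V                        # X_i = sum_j A_ij F_j + V_i U_i

# Under these assumptions the model implies corr(X) = A A^T + diag(V^2).
implied_corr = A @ A.T + np.diag(V**2)
sample_corr = np.corrcoef(X, rowvar=False)
max_gap = np.max(np.abs(sample_corr - implied_corr))
print(max_gap)
```

The off-diagonal entries of `A @ A.T` show how shared factors, and nothing else, produce the correlations among the observed variables.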
Factor Extraction Methods
- Maximum Likelihood (ML): estimates the factor loadings
- Principal Axis Factor method (PAF)
- Principal Component Analysis (PCA)
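As an illustration of the last method in the list, the following sketch extracts loadings by the principal component method: each loading column is an eigenvector of the correlation matrix scaled by the square root of its eigenvalue. The correlation matrix `R` and the function name are hypothetical; ML and PAF use iterative procedures that are not shown here.

```python
import numpy as np

def principal_component_loadings(R, n_factors):
    """Loadings from the eigendecomposition of a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return loadings, eigvals

# Hypothetical correlation matrix with two groups of related variables.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])
loadings, eigvals = principal_component_loadings(R, n_factors=2)

# Communality of each variable: variance explained by the retained factors.
communalities = np.sum(loadings**2, axis=1)
print(np.round(loadings, 2))
print(np.round(communalities, 2))
```

The column sums of squared loadings equal the retained eigenvalues, which ties the loadings back to the variance each factor explains.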
Case Study
A recruiter wants to hire employees for a business firm. The interviews are over, and every interviewee has been assigned a score out of 10 on each of 32 personality traits, such as distant, careless, and talkative. The recruiter now wants to apply factor analysis to this data to find correlations among the traits. Similar or correlated features can be grouped and represented as factors.
Source: https://www.analyticsvidhya.com/blog/2020/10/dimensionality-reduction-using-factor-analysis-in-python/
1) Bartlett's test of sphericity
- Null hypothesis: no correlation is present among the variables.
- The idea is to try to reject it.
- It returns a chi-squared value and a p-value.
- If the p-value is less than 0.05, the null hypothesis is rejected at the 5% significance level, so correlation is present among the variables.
- Result for this data: p-value of 0.0.
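A minimal sketch of how the Bartlett statistic can be computed, using the standard formula chi2 = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The recruiter's dataset is not available here, so the data below is synthetic and the function name is hypothetical.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return the chi-squared statistic and degrees of freedom."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)           # ln|R|, numerically stable
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

rng = np.random.default_rng(1)
n, p = 200, 5

# Synthetic data with no correlation among the variables.
independent = rng.standard_normal((n, p))
# Synthetic data where one shared factor drives all variables.
shared = rng.standard_normal((n, 1))
correlated = shared + rng.standard_normal((n, p))

stat_ind, dof = bartlett_sphericity(independent)
stat_cor, _ = bartlett_sphericity(correlated)
print(stat_ind, stat_cor, dof)
# The p-value is the upper tail of a chi-squared distribution with `dof`
# degrees of freedom, e.g. scipy.stats.chi2.sf(statistic, dof) if SciPy
# is installed.
```

A large statistic relative to the chi-squared distribution gives a small p-value, which is the situation reported in the case study.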
KMO test
2) Can FA be applied to this data? Kaiser-Meyer-Olkin (KMO) test
- Measures the proportion of variance that might be common variance among the variables.
- The statistic takes a value between 0 and 1:
$$\mathrm{KMO} = \frac{\sum_{j \neq k} r_{jk}^2}{\sum_{j \neq k} r_{jk}^2 + \sum_{j \neq k} p_{jk}^2}$$
where $r_{jk}$ is the correlation between variables j and k and $p_{jk}$ is their partial correlation.
- Result for this data: a KMO statistic of 0.84. This indicates that the variables share substantial common variance, so FA can be applied.
Source: https://en.wikipedia.org/wiki/Kaiser%E2%80%93Meyer%E2%80%93Olkin_test
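The overall KMO statistic can be sketched as the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations, taken over all pairs of distinct variables. The sketch below obtains the partial correlations from the inverse of the correlation matrix; the matrix `R` is a hypothetical example, not the case-study data.

```python
import numpy as np

def kmo_overall(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    # Partial correlation of j and k given all remaining variables.
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)

# Hypothetical matrix: 5 variables, every pair correlated at 0.5.
p = 5
R = np.full((p, p), 0.5)
np.fill_diagonal(R, 1.0)

kmo = kmo_overall(R)
print(round(kmo, 3))
```

When partial correlations are small relative to the raw correlations, most of the correlation is shared across variables and the statistic moves toward 1.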
Case Study
3) Determine the number of factors
- The number of factors can be decided on the basis of the amount of common variance the factors explain.
- Eigenvalues represent the variance each factor explains.
- Select the factors whose eigenvalues are greater than 1.
- In this data the eigenvalues drop below 1 from the 7th factor onward, so the optimal number of factors is 6.
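The eigenvalue-greater-than-1 rule can be sketched in a few lines. The 32-trait case-study data is not available here, so the correlation matrix below is a small hypothetical example and the function name is an assumption.

```python
import numpy as np

def n_factors_kaiser(R):
    """Count eigenvalues of the correlation matrix R that exceed 1."""
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order
    return int(np.sum(eigvals > 1.0)), eigvals

# Hypothetical correlation matrix with two groups of related variables.
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])
n_factors, eigvals = n_factors_kaiser(R)
print(n_factors, np.round(eigvals, 3))
```

Plotting `eigvals` against their rank gives the scree plot, which is often read alongside this rule.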
Case Study
4) Interpret the factors
- Factor loadings range from -1 to 1.
- Values close to -1 or 1 mean the factor has a strong influence on the variable.
- Values close to 0 indicate a weaker influence.
- For example, in Factor 0 the features distant, shy, and talkative have higher loadings than the other variables. From this we can see that Factor 0 explains the common variance in people who are reserved, i.e. the variance among the people who are distant and shy.
Limitations
- Naming factors
- Split loadings (a variable may be difficult to interpret because it loads onto more than one factor)
References
1. Yong, An Gie, and Sean Pearce. "A beginner's guide to factor analysis: Focusing on exploratory factor analysis." Tutorials in Quantitative Methods for Psychology 9.2 (2013): 79-94.
2. https://en.wikipedia.org/wiki/Factor_analysis
3. https://www.theanalysisfactor.com/rotations-factor-analysis/?fbclid=IwAR3T4uLoEEUcmNU_i217ZXGN5Ti7wpyDcqDcOdlxW6SGtz084xnv6SFFVew
4. https://www.youtube.com/watch?v=eAl0nXkzt7w
5. https://stats.oarc.ucla.edu/spss/seminars/introduction-to-factor-analysis/a-practical-introduction-to-factor-analysis/
6. https://online.stat.psu.edu/stat505/lesson/12
7. https://statisticsbyjim.com/basics/factor-analysis/
8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7883798/
End of Slide. Thank you!