Factor Analysis

 
 
By Mayamin Hamid Raha  (05/05/2023)
 
Contents
Background
Introduction
Application fields
Examples
Statistical terms used
Factor solution rule
FA Model
Assumptions
Factor Extraction Methods
Case Study
Limitations
References
 
 
Background

Large datasets consisting of many variables can be reduced by identifying groups of variables that are driven by the same latent variables (factors). FA is therefore used to:
- identify underlying factors that explain the correlations among variables
- identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables.

[Figure: four hidden factors (blue) drive the measurable values of the indicator variables (yellow). Source: https://statisticsbyjim.com/basics/factor-analysis/]
 
What is Factor Analysis?

The main idea behind FA
[Diagram: measurable and observable variables that share common variance are reduced to latent variables (dimensionality reduction).]

FA uses the correlation structure among observed variables to model a smaller number of unobserved, latent variables known as factors [1].

A method of modeling observed variables and their covariance structures.

Observed variables are modeled as linear combinations of potential factors.

There are two types of FA:
- Exploratory Factor Analysis (EFA)
- Confirmatory Factor Analysis (CFA)

CFA is used to confirm hypotheses about how variables relate to factors, typically specified as a path analysis diagram.

EFA finds complex patterns by exploring the dataset, without a prespecified factor structure.
 
Application fields

- Machine learning (unsupervised)
- Data mining
- Psychology studies
- Education
- Marketing
 
Example 1

Goal: find the socioeconomic status (SES) of a person.
- SES is a factor that we cannot measure directly.

Observable variables we can access:
- Occupation
- Income
- Education level

People with a particular socioeconomic status tend to have similar values on these variables.

If the factor (SES) has a strong relationship with these variables, then it accounts for a large portion of the variance in them.
 
Example 2

[Figure: output of a simple factor analysis of indicators of wealth, with two resulting factors (SPSS). Source: https://www.theanalysisfactor.com/rotations-factor-analysis/]

- The variable income has a correlation (loading) of 0.65 with Factor 1.
- Factor 1 -> “Individual socioeconomic status.”
- Factor 2 -> “Neighborhood socioeconomic status.”
- The factors are rotated to aid interpretation, using either an orthogonal rotation (factors remain uncorrelated) or an oblique rotation (factors are allowed to correlate).
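The slide does not say which rotation produced the SPSS output. As one concrete example of an orthogonal rotation, here is a minimal NumPy sketch of varimax using the widely used SVD-based update. Because the rotation matrix is orthogonal, each variable's communality (sum of squared loadings) is unchanged; only the distribution of variance across factors changes. The loadings are hypothetical: a clean "simple structure" deliberately rotated by 30 degrees.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax (orthogonal) rotation of a variables-by-factors loading matrix.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion_old = 0.0
    for _ in range(max_iter):
        Lr = L @ rotation
        # Target matrix for the varimax criterion; its SVD gives the next rotation.
        target = Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt  # orthogonal factor from the SVD
        criterion = s.sum()
        if criterion_old != 0 and criterion / criterion_old < 1 + tol:
            break
        criterion_old = criterion
    return L @ rotation, rotation


# Hypothetical simple-structure loadings, deliberately rotated by 30 degrees.
theta = np.deg2rad(30)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.75, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.75]])
unrotated = simple @ T

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

An oblique rotation (for example promax) would instead allow the rotated factors to correlate.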
 
Statistical terms used in FA

Communality - the amount of a variable's variance that it shares with the other variables through the common factors.
Factor loadings - correlations between the variables and the factors.
Eigenvalue - the total variance explained by each factor (a value ≥ 1 is preferred).

A factor with an eigenvalue of 1 accounts for as much variance as a single standardized variable. The logic is that only factors that explain at least as much variance as a single variable are worth keeping.

Scree plot - a plot of the eigenvalues against the factor number, in order of extraction.

Sources: https://en.wikipedia.org/wiki/Scree_plot, https://studylib.net/doc/15380201/factor-analysis-%C2%A9-2007-prentice-hall
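The communality and per-factor variance can be computed directly from a loading matrix. A minimal NumPy sketch, assuming orthogonal (uncorrelated) factors and standardized variables; the loading values are hypothetical. Under those assumptions a variable's communality is the sum of its squared loadings (its row), and the variance a factor explains is the sum of squared loadings in its column. For PCA extraction, that column sum equals the corresponding eigenvalue of the correlation matrix.

```python
import numpy as np

# Hypothetical loading matrix: 4 variables (rows) x 2 orthogonal factors (columns).
loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.10, 0.90],
    [0.20, 0.60],
])

# Communality: share of each variable's unit variance explained by the factors.
communalities = (loadings ** 2).sum(axis=1)

# Variance explained by each factor; 1.0 equals the variance of one variable.
ss_loadings = (loadings ** 2).sum(axis=0)

for i, h2 in enumerate(communalities):
    print(f"variable {i}: communality = {h2:.2f}")
for j, ss in enumerate(ss_loadings):
    print(f"factor {j}: variance explained = {ss:.2f}")
```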
 
 
 
Factor Solution Rule

When should a factor be dropped from the solution?
- When it does not have high loadings from more than two variables, i.e. fewer than three variables load highly on it.
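The factor solution rule can be applied mechanically once a loading matrix is available. This sketch assumes a cutoff of |loading| ≥ 0.4 for a "high" loading; that threshold is an illustrative assumption, not something the slide specifies, and the loadings are hypothetical.

```python
import numpy as np


def factors_to_drop(loadings, threshold=0.4, min_vars=3):
    """Indices of factors with fewer than `min_vars` high loadings.

    A loading is "high" when its absolute value is at least `threshold`
    (0.4 here is an illustrative choice, not a fixed standard).
    """
    high = np.abs(np.asarray(loadings, dtype=float)) >= threshold
    counts = high.sum(axis=0)  # number of high loadings per factor
    return [j for j, c in enumerate(counts) if c < min_vars]


loadings = np.array([
    [0.75, 0.05, 0.10],
    [0.68, 0.12, 0.45],
    [0.62, 0.08, 0.02],
    [0.10, 0.71, 0.05],
    [0.05, 0.66, 0.09],
    [0.12, 0.59, 0.51],
])
print(factors_to_drop(loadings))  # [2]
```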
 
FA Model

The factor model is:

$$X_i = A_{i1}F_1 + A_{i2}F_2 + A_{i3}F_3 + \cdots + A_{im}F_m + V_iU_i$$

Each variable is written as a linear combination of the common factors and a unique factor, where:
- $X_i$ = the $i$th standardized variable
- $A_{ij}$ = standardized multiple regression coefficient of variable $i$ on common factor $j$
- $F_j$ = common factor $j$
- $V_i$ = standardized regression coefficient of variable $i$ on unique factor $i$
- $U_i$ = the unique factor for variable $i$
- $m$ = number of common factors
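To make the model concrete, here is a small NumPy simulation, assuming normally distributed, mutually uncorrelated factors with unit variance and made-up loadings `A`. Choosing $V_i = \sqrt{1 - \sum_j A_{ij}^2}$ keeps each $X_i$ standardized, and the model then implies that the correlation matrix equals $AA^T$ plus a diagonal of the unique variances $V_i^2$. The sample correlations of the simulated data should match that closely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings A: p = 4 variables, m = 2 common factors.
A = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.0, 0.9],
    [0.2, 0.6],
])
p, m = A.shape
n = 200_000  # number of simulated observations

# V_i chosen so that each X_i has unit variance: V_i^2 = 1 - sum_j A_ij^2.
V = np.sqrt(1.0 - (A ** 2).sum(axis=1))

F = rng.standard_normal((n, m))  # common factors F_j
U = rng.standard_normal((n, p))  # unique factors U_i, independent of F

# X_i = A_i1 F_1 + ... + A_im F_m + V_i U_i, for all observations at once.
X = F @ A.T + U * V

# Correlation matrix implied by the model vs. the sample correlation matrix.
implied = A @ A.T + np.diag(V ** 2)
sample = np.corrcoef(X, rowvar=False)
print(np.round(implied, 2))
print(np.round(sample, 2))
```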
 
Factor Extraction Methods

- Maximum Likelihood (ML): estimates the factor loadings that maximize the likelihood of the observed correlations, assuming multivariate normal data
- Principal Axis Factoring (PAF)
- Principal Component Analysis (PCA)
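Of the methods above, principal axis factoring is the easiest to write out by hand. The sketch below is a minimal iterated PAF on a correlation matrix, assuming squared multiple correlations as starting communalities (a common choice). It omits safeguards that library implementations include, such as handling communalities that exceed 1 (Heywood cases). The test matrix is built from hypothetical one-factor loadings, so PAF should recover them up to sign.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, max_iter=1000, tol=1e-8):
    """Iterated principal axis factoring of a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Starting communalities: squared multiple correlations, 1 - 1/diag(R^-1).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # communalities replace the 1s on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)  # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


# Correlation matrix implied by hypothetical one-factor loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

loadings, h2 = principal_axis_factoring(R, n_factors=1)
print(np.round(np.abs(loadings[:, 0]), 3))  # the sign of a factor is arbitrary
```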
 
Case Study

A recruiter wants to hire employees for a business firm. The interviews are over, and every interviewee has been assigned a score out of 10 on each of 32 different personality traits, such as distant, careless, and talkative. The recruiter now wants to apply factor analysis to this data to find correlations: similar or correlated features can be grouped and represented as factors.

Source: https://www.analyticsvidhya.com/blog/2020/10/dimensionality-reduction-using-factor-analysis-in-python/

1) BARTLETT’S TEST OF SPHERICITY
Checks the null hypothesis that no correlation is present among the variables (i.e. that the correlation matrix is an identity matrix).
- The idea is to try to reject this hypothesis.
- The test returns a chi-squared statistic and a p-value.
- If the p-value is less than 0.05, the null hypothesis is rejected at the 5% significance level, and we conclude that correlation is present among the variables.
- Here, the p-value is 0.0, so correlation is present.
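Bartlett's test statistic is $\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom, where $n$ is the number of observations, $p$ the number of variables, and $R$ the sample correlation matrix. A NumPy sketch follows. To avoid a SciPy dependency, the hypothetical helper `chi2_sf` computes the chi-squared upper-tail probability with the closed-form series that exists for integer degrees of freedom. The data are simulated, not the case-study data.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """P(chi-squared with integer df > x), via the closed-form series.

    Each term is evaluated in log space so large x does not overflow.
    """
    if x <= 0:
        return 1.0
    if math.isinf(x):
        return 0.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-h) * sum_{i<k} h^i / i!
        terms = [math.exp(-h + i * math.log(h) - math.lgamma(i + 1))
                 for i in range(df // 2)]
        return min(1.0, math.fsum(terms))
    # Odd df = 2k + 1: erfc(sqrt(h)) + exp(-h) * sum_{j=1..k} h^(j-1/2) / Gamma(j+1/2)
    terms = [math.exp(-h + (j - 0.5) * math.log(h) - math.lgamma(j + 0.5))
             for j in range(1, (df - 1) // 2 + 1)]
    return min(1.0, math.erfc(math.sqrt(h)) + math.fsum(terms))


def bartlett_sphericity(data):
    """Bartlett's test that the population correlation matrix is the identity."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # ln|R|, computed stably
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, chi2_sf(chi2, df)


rng = np.random.default_rng(1)
factor = rng.standard_normal((300, 1))
correlated = factor + 0.5 * rng.standard_normal((300, 5))  # share one factor
independent = rng.standard_normal((300, 5))                # no common factor

print(bartlett_sphericity(correlated))  # large chi-squared, p-value near 0
print(bartlett_sphericity(independent))
```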
 
2) Can FA be applied to this data?

KAISER-MEYER-OLKIN (KMO) TEST
Measures the proportion of variance among the variables that might be common variance. The statistic lies between 0 and 1:

$$\mathrm{KMO} = \frac{\sum_{j \neq k} r_{jk}^2}{\sum_{j \neq k} r_{jk}^2 + \sum_{j \neq k} p_{jk}^2}$$

where $r_{jk}$ is the correlation between variables $j$ and $k$, and $p_{jk}$ is their partial correlation.

Here the KMO statistic is 0.84, which indicates that the variables share enough common variance for FA to be applied.

Source: https://en.wikipedia.org/wiki/Kaiser%E2%80%93Meyer%E2%80%93Olkin_test
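The overall KMO statistic is the sum of squared correlations over all pairs of distinct variables, divided by that same sum plus the sum of squared partial correlations. A minimal NumPy sketch, assuming the partial correlations are taken from the inverse correlation matrix $S = R^{-1}$ as $p_{jk} = -S_{jk}/\sqrt{S_{jj}S_{kk}}$, and using simulated one-factor data rather than the case-study data.

```python
import numpy as np


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (0 to 1)."""
    R = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    S = np.linalg.inv(R)
    # Partial correlation of j and k, controlling for all other variables.
    d = np.sqrt(np.diag(S))
    P = -S / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)  # select only the pairs j != k
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(P[off] ** 2)
    return r2 / (r2 + p2)


rng = np.random.default_rng(2)
factor = rng.standard_normal((500, 1))
# Six variables, each 0.8 * factor + noise (unit variance overall).
X = 0.8 * factor + 0.6 * rng.standard_normal((500, 6))
print(round(kmo(X), 2))
```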
 
3) Determine the number of factors:

The number of factors can be decided on the basis of the amount of common variance the factors explain.

Eigenvalues represent the variance each factor explains. Keep the factors whose eigenvalues are greater than 1 (the Kaiser criterion).

The eigenvalues drop below 1 from the 7th factor onward, so 6 factors are retained.
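The eigenvalue-greater-than-1 rule used in step 3 can be sketched with NumPy, assuming the eigenvalues are those of the sample correlation matrix. The data are simulated from two hypothetical factors with three variables each, so the rule should keep 2 factors. The returned eigenvalues, in descending order, are also what a scree plot would show.

```python
import numpy as np


def kaiser_criterion(data):
    """Correlation-matrix eigenvalues (descending) and how many exceed 1."""
    R = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    return eigvals, int(np.sum(eigvals > 1.0))


rng = np.random.default_rng(3)
F = rng.standard_normal((2000, 2))
# Hypothetical loadings: variables 0-2 load 0.8 on factor 0, variables 3-5 on factor 1.
A = np.kron(np.eye(2), np.full((3, 1), 0.8))
X = F @ A.T + 0.6 * rng.standard_normal((2000, 6))

eigvals, n_factors = kaiser_criterion(X)
print(np.round(eigvals, 2))  # plot against factor numbers 1..6 for a scree plot
print(n_factors)
```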
 
4) Interpret factors
Factor loadings range from -1 to 1.
- Values close to -1 or 1 indicate that the factor strongly influences the variable.
- Values close to 0 indicate weak influence.

For example, the features ‘distant’ and ‘shy’ have higher loadings on Factor 0 than the other variables. This suggests that Factor 0 explains the common variance among people who are reserved, i.e. people who are distant and shy.
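Reading off which variables dominate each factor, as in the Factor 0 example, can be automated. A small sketch, assuming an illustrative cutoff of |loading| ≥ 0.5; the loadings below are made-up numbers, not the case study's actual output, and the negative value shows that a loading near -1 also signals strong influence (in the opposite direction).

```python
import numpy as np


def describe_factors(loadings, names, threshold=0.5):
    """For each factor, list (variable, loading) pairs with |loading| >= threshold."""
    L = np.asarray(loadings, dtype=float)
    summary = {}
    for j in range(L.shape[1]):
        order = np.argsort(-np.abs(L[:, j]))  # strongest loadings first
        summary[j] = [(names[i], round(float(L[i, j]), 2))
                      for i in order if abs(L[i, j]) >= threshold]
    return summary


# Illustrative loadings only; not the case study's actual output.
names = ["distant", "shy", "talkative", "careless"]
loadings = np.array([
    [0.72, 0.05],
    [0.68, 0.10],
    [-0.55, 0.12],
    [0.08, 0.61],
])
print(describe_factors(loadings, names))
```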
 
Limitations

- Naming factors (deciding what each factor represents can be subjective)
- Split loadings (a variable may be difficult to interpret because it loads onto more than one factor)
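Split loadings can at least be detected automatically. A small sketch, assuming an illustrative |loading| ≥ 0.4 cutoff for a "high" loading; the variable names echo the SES example and the loadings are hypothetical.

```python
import numpy as np


def cross_loading_variables(loadings, names, threshold=0.4):
    """Names of variables whose |loading| >= threshold on more than one factor."""
    high = np.abs(np.asarray(loadings, dtype=float)) >= threshold
    return [names[i] for i, row in enumerate(high) if row.sum() > 1]


names = ["occupation", "income", "education level"]
loadings = np.array([
    [0.70, 0.10],
    [0.50, 0.45],  # high on both factors: a split loading
    [0.15, 0.80],
])
print(cross_loading_variables(loadings, names))  # ['income']
```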
 
References

1. Yong, An Gie, and Sean Pearce. "A beginner’s guide to factor analysis: Focusing on exploratory factor analysis." Tutorials in Quantitative Methods for Psychology 9.2 (2013): 79-94.
2. https://en.wikipedia.org/wiki/Factor_analysis
3. https://www.theanalysisfactor.com/rotations-factor-analysis/
4. https://www.youtube.com/watch?v=eAl0nXkzt7w
5. https://stats.oarc.ucla.edu/spss/seminars/introduction-to-factor-analysis/a-practical-introduction-to-factor-analysis/
6. https://online.stat.psu.edu/stat505/lesson/12
7. https://statisticsbyjim.com/basics/factor-analysis/
8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7883798/
 
End of Slide
 
Thank you!
 