Factor Analysis

By Mayamin Hamid Raha  (05/05/2023)

Contents

❖ Background
❖ Introduction
❖ Application fields
❖ Examples
❖ Statistical terms used
❖ Factor solution rule
❖ FA Model
❖ Factor Extraction Methods
❖ Assumptions
❖ Case Study
❖ Limitations
❖ References

Background

Large datasets with many variables can be reduced by identifying groups of latent variables (factors). FA is used to:

- identify the underlying factors that explain the correlation among variables
- identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables

Figure: four hidden factors (blue) drive the measurable values in the yellow indicator tags. Source: https://statisticsbyjim.com/basics/factor-analysis/

What is Factor Analysis?

The main idea behind FA: measurable and observable variables are reduced to latent variables that share common variance (dimensionality reduction).

FA uses the correlation structure amongst observed variables to model a smaller number of unobserved, latent variables known as factors [1].

- A method of modeling observed variables and their covariance structures
- Observed variables are modeled as linear combinations of potential factors

There are two types of FA:

- Exploratory Factor Analysis (EFA): finds complex patterns by exploring the dataset.
- Confirmatory Factor Analysis (CFA): used for confirming hypotheses, using a path analysis diagram.

Application fields

- Machine learning (unsupervised)
- Data mining
- Psychology studies
- Education
- Marketing

Example 1

Example: find the socioeconomic status (SES) of a person.

- SES is a factor that cannot be measured directly.
- We can measure related observable variables:
  - Occupation
  - Income
  - Education level
- People with a particular socioeconomic status tend to have similar values on these variables.
- If the factor (SES) has a strong relationship with these variables, then it accounts for a large portion of the variance in the variables.

Example 2

Output of a simple factor analysis looking at indicators of wealth, with just a few variables and two resulting factors (using SPSS software).

- The variable income has a correlation (loading) of 0.65 with Factor 1.
- Factor 1 -> "Individual socioeconomic status."
- Factor 2 -> "Neighborhood socioeconomic status."
- The solution is rotated using an orthogonal rotation (factors are not correlated) or an oblique rotation (factors are correlated).

Source: https://www.theanalysisfactor.com/rotations-factor-analysis/
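The orthogonal rotation mentioned in this example can be illustrated with varimax, one common orthogonal rotation. The slides do not say which rotation SPSS applied, so this is only a sketch: the function name `varimax`, its parameters, and the unrotated loading matrix are all assumptions made for illustration.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x factors) loading matrix with an orthogonal varimax rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        grad = loadings.T @ (rotated ** 3 - rotated @ column_ss / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_criterion = s.sum()
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break                              # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: 6 variables on 2 factors
unrotated = np.array([
    [0.7, 0.4], [0.8, 0.3], [0.6, 0.5],
    [0.5, -0.6], [0.6, -0.5], [0.4, -0.7],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, the factors stay uncorrelated and each variable's communality is unchanged; only the way variance is distributed across factors changes, which is what makes the rotated loadings easier to read.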

Statistical terms used in FA

- Communality: the amount of variance a variable shares with all the other variables.
- Factor loadings: correlations between the variables and the factors.
- Eigenvalue: represents the total variance explained by each factor (a value >= 1 is preferred). A factor with an eigenvalue of 1 accounts for as much variance as a single variable, and the logic is that only factors that explain at least as much variance as a single variable are worth keeping.
- Scree plot: a plot of the eigenvalues against the number of factors, in order of extraction.

Sources:
https://en.wikipedia.org/wiki/Scree_plot
https://studylib.net/doc/15380201/factor-analysis-%C2%A9-2007-prentice-hall
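To make these terms concrete, the sketch below computes communalities and the variance attributed to each factor from a loading matrix. It assumes standardized variables and uncorrelated (orthogonal) factors; the loading values are hypothetical and not taken from any slide.

```python
import numpy as np

# Hypothetical loadings: rows = variables, columns = factors
loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.10, 0.90],
    [0.20, 0.60],
])

# Communality: share of each variable's variance explained by the common factors
communalities = (loadings ** 2).sum(axis=1)

# Uniqueness: what is left over for each standardized variable
uniqueness = 1.0 - communalities

# Variance attributed to each factor (sum of squared loadings per column)
factor_variance = (loadings ** 2).sum(axis=0)

# Proportion of total variance explained by each factor
proportion = factor_variance / loadings.shape[0]

print("communalities:", np.round(communalities, 2))
print("factor variance:", np.round(factor_variance, 2))
print("proportion of variance:", np.round(proportion, 3))
```

For an unrotated principal-component solution, the per-factor sum of squared loadings equals that factor's eigenvalue, which is why an eigenvalue of 1 corresponds to the variance of a single standardized variable.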

Factor Solution Rule

When to drop a factor solution?

When a factor does not have high factor loadings from more than two variables, i.e. fewer than three variables load strongly on it.

FA Model

The factor model is:

$$X_i = A_{i1}F_1 + A_{i2}F_2 + A_{i3}F_3 + \dots + A_{im}F_m + V_iU_i$$

Each variable is written as a linear combination of common factors and a unique factor, where

- $X_i$ = the $i$th standardized variable
- $A_{ij}$ = standardized multiple regression coefficient of variable $i$ on common factor $j$
- $F_j$ = common factor $j$
- $V_i$ = standardized regression coefficient of variable $i$ on unique factor $i$
- $U_i$ = the unique factor for variable $i$
- $m$ = number of common factors
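One way to see what the model implies is to simulate from it. The sketch below assumes the common factors and unique factors are independent standard normal variables (an assumption the slides do not state), uses a hypothetical loading matrix `A`, and checks that the sample correlation matrix is close to the model-implied one, `A A^T + diag(V^2)`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20000, 2                      # observations, number of common factors

# Hypothetical loadings A[i, j] of variable i on common factor j
A = np.array([
    [0.8, 0.0], [0.7, 0.1], [0.6, 0.3],
    [0.1, 0.8], [0.0, 0.7], [0.2, 0.6],
])
# Choose V so that every X_i has unit variance (standardized variables)
V = np.sqrt(1.0 - (A ** 2).sum(axis=1))

F = rng.standard_normal((n, m))            # common factors F_j
U = rng.standard_normal((n, A.shape[0]))   # unique factors U_i

# X_i = A_i1 F_1 + ... + A_im F_m + V_i U_i, for every observation
X = F @ A.T + U * V

implied = A @ A.T + np.diag(V ** 2)        # correlation matrix implied by the model
sample = np.corrcoef(X, rowvar=False)      # correlation matrix observed in the data
max_gap = np.abs(sample - implied).max()
print("largest gap between sample and implied correlations:", round(max_gap, 3))
```

The off-diagonal entries of `A @ A.T` are what the common factors contribute to the correlations among variables; the unique factors only affect the diagonal.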

Factor Extraction Methods

- Maximum Likelihood (ML): estimates the factor loadings
- Principal Axis Factor method (PAF)
- Principal Component Analysis (PCA)
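Of the three methods listed, principal component extraction is the simplest to sketch by hand: loadings are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues. The function name `pc_loadings` and the 3 x 3 correlation matrix below are hypothetical; ML and PAF are iterative and are not shown.

```python
import numpy as np

def pc_loadings(R, n_factors):
    """Principal-component loadings from a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)          # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return loadings, eigvals

# Hypothetical correlation matrix for three variables
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

L, eigvals = pc_loadings(R, n_factors=3)
print("eigenvalues:", np.round(eigvals, 3))
print("loadings on the first component:", np.round(L[:, 0], 3))
```

With all components retained, `L @ L.T` reproduces `R` exactly; keeping fewer columns gives the lower-dimensional approximation that factor extraction is after. The sign of each column is arbitrary.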

Case Study

A recruiter wants to hire employees for a business firm. The interviews are over, and every interviewee has been assigned a score out of 10 on each of 32 personality traits, such as distant, careless, and talkative. The recruiters now want to apply factor analysis to this data to find correlations among the traits. Similar or correlated features can be grouped and represented as factors.

Source: https://www.analyticsvidhya.com/blog/2020/10/dimensionality-reduction-using-factor-analysis-in-python/

1) Bartlett's test of sphericity

- Tests the null hypothesis that no correlation is present among the variables.
- The idea is to try to reject it.
- It returns a chi-squared statistic and a p-value.
- If the p-value is less than 0.05, the null hypothesis is rejected at the 5% significance level, i.e. correlation is present among the variables.
- In this case study the reported p-value is 0.0, so correlation is present.
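Bartlett's test of sphericity from this case study can be sketched directly from its usual formula, chi2 = -(n - 1 - (2p + 5) / 6) * ln|R| with p(p - 1) / 2 degrees of freedom, where R is the sample correlation matrix. The function name and the simulated scores below are assumptions for illustration; they are not the recruiter's data.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return the chi-squared statistic and degrees of freedom of Bartlett's test."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)            # ln|R|, computed stably
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi2, dof

# Hypothetical data: 200 people, 4 traits that share one common factor
rng = np.random.default_rng(0)
common = rng.standard_normal((200, 1))
scores = 0.8 * common + 0.6 * rng.standard_normal((200, 4))

chi2, dof = bartlett_sphericity(scores)
print(f"chi-squared = {chi2:.1f} with {dof} degrees of freedom")
```

The p-value is the upper tail of a chi-squared distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(chi2, dof)` gives it. For 6 degrees of freedom the 5% critical value is about 12.59, so a statistic far above that rejects the hypothesis of no correlation.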

KMO test

2) Can FA be applied to this data? Kaiser-Meyer-Olkin (KMO) test

- Measures the proportion of variance among the variables that might be common variance. It takes a value between 0 and 1.
- The statistic is

$$\mathrm{KMO} = \frac{\sum_{j \neq k} r_{jk}^2}{\sum_{j \neq k} r_{jk}^2 + \sum_{j \neq k} p_{jk}^2}$$

  where $r_{jk}$ is the correlation between variables $j$ and $k$, and $p_{jk}$ is their partial correlation.
- In this case study the KMO statistic is 0.84, which indicates that the variables share substantial common variance, so FA can be applied.

Source: https://en.wikipedia.org/wiki/Kaiser%E2%80%93Meyer%E2%80%93Olkin_test
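The overall KMO statistic can be computed from a correlation matrix alone, because the partial correlations come from its inverse. The sketch below is a minimal version under that approach; the function name `kmo` and the equicorrelated example matrix are hypothetical, and the matrix is assumed to be invertible.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)     # partial correlations p_jk
    off_diag = ~np.eye(R.shape[0], dtype=bool)  # mask selecting j != k
    r2 = (R[off_diag] ** 2).sum()
    p2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + p2)

# Hypothetical example: four variables, every pair correlated at 0.6
R = np.full((4, 4), 0.6)
np.fill_diagonal(R, 1.0)

value = kmo(R)
print(f"KMO = {value:.3f}")
```

When the partial correlations are small relative to the raw correlations, the variables share variance through common factors and the statistic moves toward 1.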

Case Study

3) Determine the number of factors

- The number of factors can be decided on the basis of the amount of common variance the factors explain.
- Eigenvalues represent the variance each factor explains. Select the factors whose eigenvalues are greater than 1.
- In this case study the eigenvalues drop below 1 from the 7th factor onward, so the optimal number of factors is 6.
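The eigenvalue-greater-than-1 rule used in this step can be sketched as follows. The correlation matrix is hypothetical (two groups of three traits), not the 32-trait data from the case study, and the text bars stand in for a scree plot.

```python
import numpy as np

# Hypothetical correlation matrix: two groups of three traits,
# correlated 0.6 within a group and 0.1 across groups
p = 6
R = np.full((p, p), 0.1)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

# Eigenvalues in order of extraction (largest first)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Keep the factors whose eigenvalue exceeds 1
n_factors = int((eigenvalues > 1.0).sum())

# Text version of a scree plot
for k, value in enumerate(eigenvalues, start=1):
    decision = "keep" if value > 1.0 else "drop"
    print(f"factor {k}: {value:.2f} {'#' * int(round(value * 10))} ({decision})")
print("number of factors retained:", n_factors)
```

Here the first two eigenvalues are 2.5 and 1.9 and the remaining four are 0.4, so two factors are retained, matching the two groups built into the matrix.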

Case Study

4) Interpret factors

- Factor loadings range from -1 to 1.
- Values close to -1 or 1 mean the factor has a strong influence on the variable.
- Values close to 0 indicate a weaker influence.

For example, in Factor 0 the features 'distant' and 'shy' have higher loadings than the other variables. From this we can see that Factor 0 explains the common variance in people who are reserved, i.e. the variance among the people who are distant and shy.
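Reading a loading table in this way amounts to picking, for each factor, the variables whose absolute loading is large. The sketch below does that with a cutoff of 0.5; the cutoff, the function name `salient_loadings`, and every loading value are hypothetical and are not the case study's results.

```python
# Hypothetical loadings: trait -> [loading on Factor 0, loading on Factor 1]
loadings = {
    "distant":   [0.65, 0.05],
    "shy":       [0.70, -0.10],
    "talkative": [-0.20, 0.15],
    "careless":  [0.52, 0.58],
}

def salient_loadings(loadings, threshold=0.5):
    """Group variables by the factors they load on with |loading| >= threshold."""
    n_factors = len(next(iter(loadings.values())))
    groups = {j: [] for j in range(n_factors)}
    for name, row in loadings.items():
        for j, value in enumerate(row):
            if abs(value) >= threshold:
                groups[j].append(name)
    return groups

groups = salient_loadings(loadings)

# Variables that load strongly on more than one factor (split loadings)
split = [name for name in loadings
         if sum(name in members for members in groups.values()) > 1]

for j, members in groups.items():
    print(f"Factor {j}: {members}")
print("split loadings:", split)
```

A variable that shows up under more than one factor, like the hypothetical 'careless' here, is a split loading and is harder to assign to a single factor.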

Limitations

- Naming factors
- Split loadings (a variable may be difficult to interpret because it loads onto more than one factor)

References

1. Yong, An Gie, and Sean Pearce. "A beginner's guide to factor analysis: Focusing on exploratory factor analysis." Tutorials in Quantitative Methods for Psychology 9.2 (2013): 79-94.
2. https://en.wikipedia.org/wiki/Factor_analysis
3. https://www.theanalysisfactor.com/rotations-factor-analysis/?fbclid=IwAR3T4uLoEEUcmNU_i217ZXGN5Ti7wpyDcqDcOdlxW6SGtz084xnv6SFFVew
4. https://www.youtube.com/watch?v=eAl0nXkzt7w
5. https://stats.oarc.ucla.edu/spss/seminars/introduction-to-factor-analysis/a-practical-introduction-to-factor-analysis/
6. https://online.stat.psu.edu/stat505/lesson/12
7. https://statisticsbyjim.com/basics/factor-analysis/
8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7883798/

End of Slide

Thank you!

