Factor and Principal Component Analysis
 
Variable reduction
 
In many modern analyses, we have few patients or observations but many variables. Examples include genomics, metabolomics, and other "biomarker" generating methods. Older examples include psychometric tests that ask many questions, where each question is a variable. Analyses would be easier if we could reduce the number of variables without losing (too much) information.
 
Combine correlated variables
 
If X and Y are strongly correlated, we do not need both; they are redundant.
[Scatterplot of X vs. Y, r = 0.94]
 
Combine correlated variables
 
If X and Y are strongly correlated, should we use X or Y? There is no reason X is better or worse than Y, so we make a new variable "F":

     F = w1 X + w2 Y + error

Most of the variation (information) in X and Y is now contained in F: one variable instead of two. The stronger the correlation, the more completely F contains all the information in X and Y.
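As a small illustration of this idea, here is a minimal sketch (the simulated x and y, and the use of the leading eigenvector of their correlation matrix as weights, are assumptions for illustration, not part of the example above): the composite F is taken as the first principal component of the two standardized variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two strongly correlated variables.
x = rng.normal(size=200)
y = 0.9 * x + 0.4 * rng.normal(size=200)

# Standardize both (z-scores).
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

# Weights w1, w2 from the leading eigenvector of the 2 x 2 correlation matrix.
R = np.corrcoef(x, y)
eigvals, eigvecs = np.linalg.eigh(R)
w = eigvecs[:, np.argmax(eigvals)]

F = w[0] * x + w[1] * y   # one combined variable instead of two

# Share of the total variance (2 for two standardized variables) captured by F.
print(eigvals.max() / 2)
```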
 
Standardized data
 
Since we are only interested in the relative differences and correlations among the data, it is easier to work with standardized data.
If "Xorig" is the original variable, we compute

     X = (Xorig - mean of Xorig) / SD(Xorig)

X then has overall mean 0 and SD = 1.
For standardized variables, the correlations and covariances are the same.
If each X has variance = SD^2 = 1, and there are K variables, their total variance is K.
(The standardized X is sometimes called a "Z" score.)
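A brief sketch of this point (the simulated data and the use of NumPy are assumptions): after z-scoring each column, the covariance and correlation matrices coincide, and the total variance equals K.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 9))                 # 100 observations, K = 9 variables

# Standardize each column: subtract its mean, divide by its sample SD.
X = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

cov = np.cov(X, rowvar=False)                    # covariance of the standardized data
cor = np.corrcoef(X, rowvar=False)               # correlation matrix

print(np.allclose(cov, cor))                     # True: they coincide after standardizing
print(np.trace(cov))                             # approximately 9: total variance equals K
```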
 
Underlying “factor(s)”
 
If X and Y are highly correlated, perhaps they are both manifestations of a "latent" (not directly observed) variable called "F":

     X = a11 F + noise
     Y = a12 F + noise

This would be one explanation for why X and Y are correlated: both are slightly different functions of F.
 
Consolidating many correlated variables
 
 
Correlation matrix, K=9 variables
 
       A       G       C       B       I       D       E       F       H
A   1.000  -0.215   0.912   0.936  -0.189   0.936   0.920  -0.199  -0.199
G  -0.215   1.000  -0.250  -0.177   0.884  -0.239  -0.242   0.856   0.855
C   0.912  -0.250   1.000   0.908  -0.192   0.920   0.923  -0.212  -0.227
B   0.936  -0.177   0.908   1.000  -0.129   0.911   0.912  -0.147  -0.148
I  -0.189   0.884  -0.192  -0.129   1.000  -0.202  -0.184   0.870   0.843
D   0.936  -0.239   0.920   0.911  -0.202   1.000   0.903  -0.235  -0.234
E   0.920  -0.242   0.923   0.912  -0.184   0.903   1.000  -0.206  -0.224
F  -0.199   0.856  -0.212  -0.147   0.870  -0.235  -0.206   1.000   0.846
H  -0.199   0.855  -0.227  -0.148   0.843  -0.234  -0.224   0.846   1.000
 
Sorted Correlation matrix
 
       A       B       C       D       E       F       G       H       I
A   1.000   0.936   0.912   0.936   0.920  -0.199  -0.215  -0.199  -0.189
B   0.936   1.000   0.908   0.911   0.912  -0.147  -0.177  -0.148  -0.129
C   0.912   0.908   1.000   0.920   0.923  -0.212  -0.250  -0.227  -0.192
D   0.936   0.911   0.920   1.000   0.903  -0.235  -0.239  -0.234  -0.202
E   0.920   0.912   0.923   0.903   1.000  -0.206  -0.242  -0.224  -0.184
F  -0.199  -0.147  -0.212  -0.235  -0.206   1.000   0.856   0.846   0.870
G  -0.215  -0.177  -0.250  -0.239  -0.242   0.856   1.000   0.855   0.884
H  -0.199  -0.148  -0.227  -0.234  -0.224   0.846   0.855   1.000   0.843
I  -0.189  -0.129  -0.192  -0.202  -0.184   0.870   0.884   0.843   1.000
 
Heat map
 
 
Make K factors – keep most important
 
Initially, if we have K variables, we make K factors, where each "factor" is uncorrelated (orthogonal) with the others. The factor with the largest variance (also called the "eigenvalue") is denoted "Factor 1" and has the most "information". The factor with the next largest variance is Factor 2, and so on. Keep the m out of K factors whose variance is larger than 1.0, and/or examine the scree plot.
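A minimal sketch of this retention rule (the function name and the standardized input X are assumptions carried over from the earlier snippet): eigendecompose the correlation matrix, sort the eigenvalues (factor variances) from largest to smallest, and keep those above 1.0. The sorted eigenvalues are also what a scree plot displays.

```python
import numpy as np

def keep_factors(X):
    """Return sorted eigenvalues of corr(X) and the eigenvectors whose eigenvalue > 1."""
    R = np.corrcoef(X, rowvar=False)          # K x K correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # largest variance first: Factor 1, 2, ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    m = int((eigvals > 1.0).sum())            # "eigenvalue > 1" rule
    return eigvals, eigvecs[:, :m]

# Example: keep_factors(X) on the standardized 9-variable data from the earlier sketch.
```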
 
Make K factors, K=9
 
Factor 1 = w11 X1 + w12 X2 + w13 X3 + ... + w19 X9
   ...
Factor 9 = w91 X1 + w92 X2 + w93 X3 + ... + w99 X9

The wij values (weights) are chosen so that the K factors are mutually orthogonal. We can compute the variance (and SD) of each factor. Means (and intercepts) are zero by definition. Note that this strongly assumes linearity!
The factor with the largest variance is called Factor 1, the next largest Factor 2, and so on.
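A short, self-contained sketch of these properties (the simulated data are an assumption): the weight vectors are the eigenvectors of the correlation matrix, the resulting factors are mutually uncorrelated, and each factor's variance equals its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 9))                             # n = 200, K = 9
X = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)    # standardized variables

R = np.corrcoef(X, rowvar=False)
eigvals, W = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]                     # columns of W = weights w_ij

factors = X @ W                                              # Factor j = sum_i w_ji * X_i

# Factors are mutually orthogonal, and each factor's variance is its eigenvalue.
print(np.allclose(np.corrcoef(factors, rowvar=False), np.eye(9)))
print(np.allclose(np.var(factors, axis=0, ddof=1), eigvals))
```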
 
Scree plot: eigenvalues = factor variances

[Scree plot: eigenvalue vs. number of factors; cut off where the eigenvalue drops below 1 to determine the number of "true" factors]
 
Eigenvalues (variance accounted for) and scree plot

   factor    variance    percent    cum percent
      1        5.185       57.61        57.61
      2        3.071       34.12        91.73
      3        0.168        1.87        93.60
      4        0.152        1.69        95.29
      5        0.119        1.32        96.61
      6        0.094        1.04        97.65
      7        0.090        1.00        98.65
      8        0.070        0.78        99.43
      9        0.051        0.57       100.00
   total       9.000      100.00
 
Factor loadings (on variables)
 
    
Total variance accounted for by the factors:

   factor    variance    pct      cum pct
      1        4.577     50.9%     50.9%
      2        3.458     38.4%     89.3%

   A =  0.964 Factor 1 - 0.107 Factor 2
   E =  0.945 Factor 1 - 0.123 Factor 2
   F = -0.104 Factor 1 + 0.917 Factor 2
   I = -0.081 Factor 1 + 0.936 Factor 2

Loading coefficients are ≈ the correlation of the variable with the factor.

   Factor 1 = w11 A + w12 B + w13 C + w14 D + w15 E + error
   Factor 2 = w21 F + w22 G + w23 H + w24 I + error

If Xi = ai F1 for i = 1 ... C, then wi = [1/ai]/C; the weights are derived from the loadings.
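To illustrate the "loadings ≈ correlation of variable with factor" statement, here is a sketch under assumed conditions (simulated two-factor data, unrotated principal-component style factors): the loading, computed as eigenvector entry times the square root of the eigenvalue, matches the correlation between the variable and the factor scores.

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulate 9 variables driven by 2 latent factors (assumed structure for illustration).
F = rng.normal(size=(500, 2))
A = np.vstack([np.repeat([[0.9, 0.0]], 5, axis=0),    # variables 1-5 load on factor 1
               np.repeat([[0.0, 0.9]], 4, axis=0)])   # variables 6-9 load on factor 2
data = F @ A.T + 0.4 * rng.normal(size=(500, 9))
X = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

R = np.corrcoef(X, rowvar=False)
eigvals, W = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

loadings = W * np.sqrt(eigvals)          # loading of variable i on factor j
scores = X @ W                           # factor scores

# Correlation of variable 1 with Factor 1 equals its loading (up to rounding).
r = np.corrcoef(X[:, 0], scores[:, 0])[0, 1]
print(round(r, 3), round(loadings[0, 0], 3))
```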
 
Example: Two factors - Rotated factor loadings

   Variable (X)   Factor 1   Factor 2
       A            0.964     -0.107
       B            0.958     -0.053
       C            0.944     -0.130
       D            0.949     -0.137
       E            0.945     -0.123
       F           -0.104      0.917
       G           -0.128      0.929
       H           -0.112      0.902
       I           -0.081      0.936
 
Factors are uncorrelated (orthogonal) with each other. They represent non-redundant information.
 
Communalities
 
     Xi = ai1 F1 + ai2 F2 + ... + aiK FK + error

The a's are factor loadings, most (hopefully) near zero.
Communality for variable Xi:

     hi^2 = ai1^2 + ai2^2 + ai3^2 + ... + aiK^2

If the original K variables are made out of m ≤ K factors (m = 2 in this example), the communalities for all K of the X variables should be high even if some of the aik are set to zero. This implies the observed variables are (mostly) made out of the m < K factors.
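A small worked check of this formula, using the rotated loadings from the two-factor example above (the NumPy code itself is just an illustrative sketch):

```python
import numpy as np

# Rotated loadings from the two-factor example (variables A-I, Factors 1 and 2).
loadings = np.array([
    [ 0.964, -0.107], [ 0.958, -0.053], [ 0.944, -0.130],
    [ 0.949, -0.137], [ 0.945, -0.123], [-0.104,  0.917],
    [-0.128,  0.929], [-0.112,  0.902], [-0.081,  0.936],
])

# Communality of each variable = sum of its squared loadings on the m = 2 factors.
communality = (loadings ** 2).sum(axis=1)
print(np.round(communality, 3))   # e.g. A: 0.964^2 + (-0.107)^2 ≈ 0.94
```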
 
Communalities
 
How much of the variation in each variable is accounted for by the factor(s); similar to R².

   variable     A      B      C      D      E      F      G      H      I
   value      0.940  0.921  0.908  0.919  0.907  0.852  0.878  0.826  0.883
 
EXAMPLE- MMSE
 
 
 
WGCNA: Weighted gene co-expression network analysis (Horvath, UCLA)
 
Factors can have factors
 
 
Power adjacency function results in a weighted gene network

     a_ij = |cor(x_i, x_j)|^beta

Often choosing beta = 6 works well, but in general we use the "scale-free topology criterion" described in Zhang and Horvath (2005).
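A minimal sketch of this adjacency computation (the expression matrix, function name, and use of NumPy are assumptions; this is not the WGCNA package's own code):

```python
import numpy as np

def power_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Weighted gene network: a_ij = |cor(x_i, x_j)| ** beta (soft thresholding)."""
    cor = np.corrcoef(expr, rowvar=False)   # gene-by-gene correlation matrix
    return np.abs(cor) ** beta              # raise to the power beta

# Example with simulated expression data: 50 samples, 200 genes.
expr = np.random.default_rng(4).normal(size=(50, 200))
adjacency = power_adjacency(expr, beta=6.0)
```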


