Factor and Principal Component Analysis
Variable reduction methods like Factor and Principal Component Analysis combine correlated variables to reduce dimensionality while retaining as much information as possible. Standardizing the data and identifying underlying factors are key steps in consolidating many variables and understanding the correlation structure of complex datasets.
Variable reduction
In many modern analyses we have few patients or observations but many variables. Examples include genomics, metabolomics, and other biomarker-generating methods. An older example is a psychometric test with many questions, where each question is a variable. Analyses would be easier if we could reduce the number of variables without losing (too much) information.
Combine correlated variables
If X and Y are strongly correlated (e.g., r = 0.94), we don't need both; one is redundant.
Combine correlated variables
If X and Y are strongly correlated, should we use X or Y? There is no reason X is better or worse than Y, so we make a new variable F:
F = w1 X + w2 Y + error
Most of the variation (information) in X and Y is now contained in F: one variable instead of two. The stronger the correlation, the more completely F contains the information in X and Y.
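As a sketch of this idea, the weights w1 and w2 can be taken from the first principal component of the standardized pair; the simulated data below is hypothetical:

```python
# Sketch: combine two correlated variables into one factor F via the
# first principal component. The data here is simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.9 * x + 0.4 * rng.normal(size=500)   # X and Y strongly correlated

data = np.column_stack([x, y])
data = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize

# Eigendecomposition of the 2x2 correlation matrix
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
w = eigvecs[:, np.argmax(eigvals)]          # weights (w1, w2) for F

F = data @ w                                # F = w1*X + w2*Y
print(eigvals.max() / eigvals.sum())        # share of total variance captured by F
```

The stronger the X-Y correlation, the closer that printed share gets to 1, i.e., the less information is lost by keeping F alone.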
Standardized data
Since we are only interested in the relative differences and correlations among the data, it is easier to work with standardized data. If Xorig is the original variable, we compute
X = (Xorig - mean(Xorig)) / SD(Xorig)
X has overall mean 0 and SD = 1. For standardized X, the correlations and covariances are the same. If each X has variance = SD² = 1, and there are K variables, their total variance is K. (The standardized X is sometimes called a z score.)
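A minimal sketch of the standardization step, assuming the original values sit in a NumPy array X_orig:

```python
# Sketch: z-score standardization of one variable.
import numpy as np

X_orig = np.array([12.0, 15.0, 9.0, 14.0, 10.0])
X = (X_orig - X_orig.mean()) / X_orig.std(ddof=1)  # mean 0, SD 1
print(round(float(X.mean()), 10), X.std(ddof=1))   # -> 0.0 1.0
```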
Underlying factor(s)
If X and Y are highly correlated, perhaps they are both manifestations of a latent (not directly observed) variable called F:
X = a11 F + noise
Y = a12 F + noise
This would be one explanation for why X and Y are correlated: they are both slightly different functions of F.
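A small simulation sketch (the loadings a11 = 0.9 and a12 = 0.8 are made up) showing that a shared latent factor by itself induces the correlation:

```python
# Sketch: two observed variables built from one latent factor F
# become correlated even though their noise terms are independent.
import numpy as np

rng = np.random.default_rng(1)
F = rng.normal(size=2000)                  # latent factor (not observed)
X = 0.9 * F + 0.3 * rng.normal(size=2000)  # X = a11*F + noise
Y = 0.8 * F + 0.3 * rng.normal(size=2000)  # Y = a12*F + noise
print(np.corrcoef(X, Y)[0, 1])             # high correlation induced by F alone
```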
Consolidating many correlated variables
[Diagram: seven numbered observed variables consolidated into underlying factors]
Correlation matrix, K = 9 variables

     A       G       C       B       I       D       E       F       H
A    1      -0.215   0.912   0.936  -0.189   0.936   0.920  -0.199  -0.199
G   -0.215   1      -0.250  -0.177   0.884  -0.239  -0.242   0.856   0.855
C    0.912  -0.250   1       0.908  -0.192   0.920   0.923  -0.212  -0.227
B    0.936  -0.177   0.908   1      -0.129   0.911   0.912  -0.147  -0.148
I   -0.189   0.884  -0.192  -0.129   1      -0.202  -0.184   0.870   0.843
D    0.936  -0.239   0.920   0.911  -0.202   1       0.903  -0.235  -0.234
E    0.920  -0.242   0.923   0.912  -0.184   0.903   1      -0.206  -0.224
F   -0.199   0.856  -0.212  -0.147   0.870  -0.235  -0.206   1       0.846
H   -0.199   0.855  -0.227  -0.148   0.843  -0.234  -0.224   0.846   1
Sorted correlation matrix

     A       B       C       D       E       F       G       H       I
A    1       0.936   0.912   0.936   0.920  -0.199  -0.215  -0.199  -0.189
B    0.936   1       0.908   0.911   0.912  -0.147  -0.177  -0.148  -0.129
C    0.912   0.908   1       0.920   0.923  -0.212  -0.250  -0.227  -0.192
D    0.936   0.911   0.920   1       0.903  -0.235  -0.239  -0.234  -0.202
E    0.920   0.912   0.923   0.903   1      -0.206  -0.242  -0.224  -0.184
F   -0.199  -0.147  -0.212  -0.235  -0.206   1       0.856   0.846   0.870
G   -0.215  -0.177  -0.250  -0.239  -0.242   0.856   1       0.855   0.884
H   -0.199  -0.148  -0.227  -0.234  -0.224   0.846   0.855   1       0.843
I   -0.189  -0.129  -0.192  -0.202  -0.184   0.870   0.884   0.843   1

Sorting the rows and columns reveals two blocks of mutually correlated variables: A-E and F-I.
Heat map
[Figure: heat map of the sorted correlation matrix, showing the two blocks of correlated variables]
Make K factors, keep the most important
Initially, if we have K variables, we make K factors, where each factor is uncorrelated (orthogonal) with the others. The factor with the largest variance (also called the eigenvalue) is denoted Factor 1 and carries the most information. The factor with the next largest variance is Factor 2, and so on. Keep the m out of K factors whose variance is larger than 1.0, and/or examine the scree plot.
Make K factors, K = 9
Factor 1 = w11 X1 + w12 X2 + w13 X3 + ... + w19 X9
...
Factor 9 = w91 X1 + w92 X2 + w93 X3 + ... + w99 X9
The wij values (weights) are chosen so that the K factors are mutually orthogonal. We can compute the variance (and SD) of each factor; means (and intercepts) are zero by definition. Note that this strongly assumes linearity! The factor with the largest variance is called Factor 1, the next largest Factor 2, and so on.
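As a sketch of this construction, the K orthogonal factors and their variances (eigenvalues) can be obtained from the eigendecomposition of the correlation matrix; the function name and simulated data below are hypothetical:

```python
# Sketch: derive K orthogonal factors from K standardized variables via
# eigendecomposition of the correlation matrix (the PCA route).
import numpy as np

def make_factors(data):
    Z = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize
    R = np.corrcoef(Z, rowvar=False)                   # K x K correlation matrix
    eigvals, W = np.linalg.eigh(R)                     # columns of W = weights wij
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, W = eigvals[order], W[:, order]
    factors = Z @ W                                    # mutually orthogonal factors
    return factors, eigvals                            # eigvals = factor variances

rng = np.random.default_rng(2)
factors, variances = make_factors(rng.normal(size=(100, 9)))
print(variances.sum())   # total variance = K = 9 for standardized data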
Scree plot
[Plot: eigenvalues (factor variances) vs. number of factors, 1-9; cut off where the eigenvalue drops below 1 to determine the number of true factors]
Eigenvalues (variance accounted for)

factor   variance   percent   cum percent
1        5.185       57.61     57.61
2        3.071       34.12     91.73
3        0.168        1.87     93.60
4        0.152        1.69     95.29
5        0.119        1.32     96.61
6        0.094        1.04     97.65
7        0.090        1.00     98.65
8        0.070        0.78     99.43
9        0.051        0.57    100.00
total    9.000      100.00    --

[Plot: percent of total variance accounted for vs. number of factors]
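As a sketch, the scree plot can be redrawn from this table with matplotlib; the dashed line marks the conventional eigenvalue = 1 cutoff:

```python
# Sketch: scree plot of the factor variances from the table above.
import numpy as np
import matplotlib.pyplot as plt

variances = np.array([5.185, 3.071, 0.168, 0.152, 0.119,
                      0.094, 0.090, 0.070, 0.051])
plt.plot(range(1, 10), variances, marker="o")
plt.axhline(1.0, linestyle="--")   # keep factors whose eigenvalue is above this line
plt.xlabel("number of factors")
plt.ylabel("eigenvalue (factor variance)")
plt.show()
```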
Factor loadings (on variables)

Total variance accounted for by the factors:

factor   variance   pct     cum pct
1        4.577      50.9%   50.9%
2        3.458      38.4%   89.3%

A = 0.964 Factor 1 - 0.107 Factor 2
E = 0.945 Factor 1 - 0.123 Factor 2
F = -0.104 Factor 1 + 0.917 Factor 2
I = -0.081 Factor 1 + 0.936 Factor 2

The loading coefficients are the correlations of each variable with the factor.

Factor 1 = w11 A + w12 B + w13 C + w14 D + w15 E + error
Factor 2 = w21 F + w22 G + w23 H + w24 I + error

If Xi = ai F1 for i = 1, ..., C, then wi = (1/ai)/C; that is, the weights are derived from the loadings.
Example: two factors. Rotated factor loadings

Variable (X)   Factor 1   Factor 2
A               0.964     -0.107
B               0.958     -0.053
C               0.944     -0.130
D               0.949     -0.137
E               0.945     -0.123
F              -0.104      0.917
G              -0.128      0.929
H              -0.112      0.902
I              -0.081      0.936
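A hedged sketch of producing rotated loadings in Python: scikit-learn's FactorAnalysis accepts a varimax rotation option (in versions >= 0.24). The data here is simulated placeholder data, so the printed loadings will not reproduce the table above:

```python
# Sketch: two-factor model with varimax rotation via scikit-learn.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
data = rng.normal(size=(200, 9))          # placeholder for columns A..I

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit((data - data.mean(axis=0)) / data.std(axis=0))
loadings = fa.components_.T               # rows = variables, columns = factors
print(loadings.round(3))
```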
Factors are uncorrelated (orthogonal) with each other; they represent non-redundant information.
Communalities
Xi = ai1 F1 + ai2 F2 + ... + aiK FK + error
The a's are factor loadings; most are (hopefully) near zero. The communality for variable Xi is
hi² = ai1² + ai2² + ai3² + ... + aiK²
If the original K variables are made out of m < K factors (m = 2 in this example), the communalities for all K of the X variables should be high even if some of the aiK are set to zero. This implies the observed variables are (mostly) made out of the m < K factors.
Communalities
How much of the variation in each variable is accounted for by the factor(s)? Similar to R².

variable   A      B      C      D      E      F      G      H      I
value      0.940  0.921  0.908  0.919  0.907  0.852  0.878  0.826  0.883
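Since a communality is just the row sum of squared loadings, it can be checked directly against the rotated loadings table; for example, for variable A:

```python
# Sketch: communality of variable A from its two rotated loadings.
loadings_A = [0.964, -0.107]
h2_A = sum(a ** 2 for a in loadings_A)
print(round(h2_A, 3))   # 0.941, matching the reported 0.940 up to rounding
```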
WGCNA: Weighted gene co-expression network analysis (Horvath, UCLA)
Factors can have factors
[Diagram: a hierarchy in which observed variables load on first-level factors, which in turn load on higher-level factors]
Power adjacency function results in a weighted gene network:
aij = |cor(xi, xj)|^β
Often choosing β = 6 works well, but in general we use the scale-free topology criterion described in Zhang and Horvath (2005).
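A minimal sketch of this adjacency computation, with a hypothetical expression matrix (samples x genes):

```python
# Sketch of the WGCNA power adjacency: a_ij = |cor(x_i, x_j)| ** beta.
import numpy as np

def power_adjacency(expr, beta=6):
    # Soft-thresholded, weighted network: entries stay in [0, 1].
    corr = np.corrcoef(expr, rowvar=False)
    return np.abs(corr) ** beta

rng = np.random.default_rng(4)
A = power_adjacency(rng.normal(size=(50, 10)), beta=6)
print(A.shape, A.min() >= 0, A.max() <= 1)
```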