Factor and Principal Component Analysis
Variable reduction methods like Factor and Principal Component Analysis combine correlated variables to reduce dimensionality while retaining as much information as possible. Standardizing the data and identifying underlying factors are key steps in consolidating many variables and understanding the correlation structure of complex datasets.
Variable reduction
In many modern analyses we have few patients or observations but many variables. Examples include genomics, metabolomics, and other biomarker-generating methods. An older example is a psychometric test with many questions, where each question is a variable. Analyses would be easier if we could reduce the number of variables without losing (too much) information.
Combine correlated variables
If X and Y are strongly correlated (e.g., r = 0.94), we don't need both; one is redundant.
Combine correlated variables
If X and Y are strongly correlated, should we use X or Y? There is no reason X is better or worse than Y, so we make a new variable F:
F = w1 X + w2 Y + error
Most of the variation (information) in X and Y is now contained in F: one variable instead of two. The stronger the correlation, the more completely F contains the information in X and Y.
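As a sketch of this idea, the weights w1 and w2 can be taken from the first principal component of the standardized pair; the simulated data below is hypothetical:

```python
# Sketch: combine two correlated variables into one factor F via the
# first principal component. The data here is simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.9 * x + 0.4 * rng.normal(size=500)   # X and Y strongly correlated

data = np.column_stack([x, y])
data = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize

# Eigendecomposition of the 2x2 correlation matrix
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
w = eigvecs[:, np.argmax(eigvals)]          # weights (w1, w2) for F

F = data @ w                                # F = w1*X + w2*Y
print(eigvals.max() / eigvals.sum())        # share of total variance captured by F
```

The stronger the X-Y correlation, the closer that printed share gets to 1, i.e., the less information is lost by keeping F alone.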
Standardized data
Since we are only interested in the relative differences and correlations among the data, it is easier to work with standardized data. If Xorig is the original variable, we compute
X = (Xorig - mean(Xorig)) / SD(Xorig)
X has overall mean 0 and SD = 1. For standardized X, the correlations and covariances are the same. If each X has variance = SD² = 1, and there are K variables, their total variance is K. (The standardized X is sometimes called a z score.)
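A minimal sketch of the standardization step, assuming the original values sit in a NumPy array X_orig:

```python
# Sketch: z-score standardization of one variable.
import numpy as np

X_orig = np.array([12.0, 15.0, 9.0, 14.0, 10.0])
X = (X_orig - X_orig.mean()) / X_orig.std(ddof=1)  # mean 0, SD 1
print(round(float(X.mean()), 10), X.std(ddof=1))   # -> 0.0 1.0
```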
Underlying factor(s)
If X and Y are highly correlated, perhaps they are both manifestations of a latent (not directly observed) variable called F:
X = a11 F + noise
Y = a12 F + noise
This would be one explanation for why X and Y are correlated: they are both slightly different functions of F.
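A small simulation sketch (the loadings a11 = 0.9 and a12 = 0.8 are made up) showing that a shared latent factor by itself induces the correlation:

```python
# Sketch: two observed variables built from one latent factor F
# become correlated even though their noise terms are independent.
import numpy as np

rng = np.random.default_rng(1)
F = rng.normal(size=2000)                  # latent factor (not observed)
X = 0.9 * F + 0.3 * rng.normal(size=2000)  # X = a11*F + noise
Y = 0.8 * F + 0.3 * rng.normal(size=2000)  # Y = a12*F + noise
print(np.corrcoef(X, Y)[0, 1])             # high correlation induced by F alone
```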
Consolidating many correlated variables
[Diagram: seven numbered observed variables consolidated into underlying factors]
Correlation matrix, K = 9 variables

     A       G       C       B       I       D       E       F       H
A    1      -0.215   0.912   0.936  -0.189   0.936   0.920  -0.199  -0.199
G   -0.215   1      -0.250  -0.177   0.884  -0.239  -0.242   0.856   0.855
C    0.912  -0.250   1       0.908  -0.192   0.920   0.923  -0.212  -0.227
B    0.936  -0.177   0.908   1      -0.129   0.911   0.912  -0.147  -0.148
I   -0.189   0.884  -0.192  -0.129   1      -0.202  -0.184   0.870   0.843
D    0.936  -0.239   0.920   0.911  -0.202   1       0.903  -0.235  -0.234
E    0.920  -0.242   0.923   0.912  -0.184   0.903   1      -0.206  -0.224
F   -0.199   0.856  -0.212  -0.147   0.870  -0.235  -0.206   1       0.846
H   -0.199   0.855  -0.227  -0.148   0.843  -0.234  -0.224   0.846   1
Sorted correlation matrix

     A       B       C       D       E       F       G       H       I
A    1       0.936   0.912   0.936   0.920  -0.199  -0.215  -0.199  -0.189
B    0.936   1       0.908   0.911   0.912  -0.147  -0.177  -0.148  -0.129
C    0.912   0.908   1       0.920   0.923  -0.212  -0.250  -0.227  -0.192
D    0.936   0.911   0.920   1       0.903  -0.235  -0.239  -0.234  -0.202
E    0.920   0.912   0.923   0.903   1      -0.206  -0.242  -0.224  -0.184
F   -0.199  -0.147  -0.212  -0.235  -0.206   1       0.856   0.846   0.870
G   -0.215  -0.177  -0.250  -0.239  -0.242   0.856   1       0.855   0.884
H   -0.199  -0.148  -0.227  -0.234  -0.224   0.846   0.855   1       0.843
I   -0.189  -0.129  -0.192  -0.202  -0.184   0.870   0.884   0.843   1

Sorting the rows and columns reveals two blocks of mutually correlated variables: A-E and F-I.
Heat map
[Figure: heat map of the sorted correlation matrix, showing the two blocks of correlated variables]
Make K factors, keep the most important
Initially, if we have K variables, we make K factors, where each factor is uncorrelated (orthogonal) with the others. The factor with the largest variance (also called the eigenvalue) is denoted Factor 1 and carries the most information. The factor with the next largest variance is Factor 2, and so on. Keep the m out of K factors whose variance is larger than 1.0, and/or examine the scree plot.
Make K factors, K = 9
Factor 1 = w11 X1 + w12 X2 + w13 X3 + ... + w19 X9
...
Factor 9 = w91 X1 + w92 X2 + w93 X3 + ... + w99 X9
The wij values (weights) are chosen so that the K factors are mutually orthogonal. We can compute the variance (and SD) of each factor; means (and intercepts) are zero by definition. Note that this strongly assumes linearity! The factor with the largest variance is called Factor 1, the next largest Factor 2, and so on.
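As a sketch of this construction, the K orthogonal factors and their variances (eigenvalues) can be obtained from the eigendecomposition of the correlation matrix; the function name and simulated data below are hypothetical:

```python
# Sketch: derive K orthogonal factors from K standardized variables via
# eigendecomposition of the correlation matrix (the PCA route).
import numpy as np

def make_factors(data):
    Z = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize
    R = np.corrcoef(Z, rowvar=False)                   # K x K correlation matrix
    eigvals, W = np.linalg.eigh(R)                     # columns of W = weights wij
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, W = eigvals[order], W[:, order]
    factors = Z @ W                                    # mutually orthogonal factors
    return factors, eigvals                            # eigvals = factor variances

rng = np.random.default_rng(2)
factors, variances = make_factors(rng.normal(size=(100, 9)))
print(variances.sum())   # total variance = K = 9 for standardized data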
Scree plot
[Plot: eigenvalues (factor variances) vs. number of factors, 1-9; cut off where the eigenvalue drops below 1 to determine the number of true factors]
Eigenvalues (variance accounted for)

factor   variance   percent   cum percent
1        5.185       57.61     57.61
2        3.071       34.12     91.73
3        0.168        1.87     93.60
4        0.152        1.69     95.29
5        0.119        1.32     96.61
6        0.094        1.04     97.65
7        0.090        1.00     98.65
8        0.070        0.78     99.43
9        0.051        0.57    100.00
total    9.000      100.00    --

[Plot: percent of total variance accounted for vs. number of factors]
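As a sketch, the scree plot can be redrawn from this table with matplotlib; the dashed line marks the conventional eigenvalue = 1 cutoff:

```python
# Sketch: scree plot of the factor variances from the table above.
import numpy as np
import matplotlib.pyplot as plt

variances = np.array([5.185, 3.071, 0.168, 0.152, 0.119,
                      0.094, 0.090, 0.070, 0.051])
plt.plot(range(1, 10), variances, marker="o")
plt.axhline(1.0, linestyle="--")   # keep factors whose eigenvalue is above this line
plt.xlabel("number of factors")
plt.ylabel("eigenvalue (factor variance)")
plt.show()
```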
Factor loadings (on variables)

Total variance accounted for by the factors:

factor   variance   pct     cum pct
1        4.577      50.9%   50.9%
2        3.458      38.4%   89.3%

A = 0.964 Factor 1 - 0.107 Factor 2
E = 0.945 Factor 1 - 0.123 Factor 2
F = -0.104 Factor 1 + 0.917 Factor 2
I = -0.081 Factor 1 + 0.936 Factor 2

The loading coefficients are the correlations of each variable with the factor.

Factor 1 = w11 A + w12 B + w13 C + w14 D + w15 E + error
Factor 2 = w21 F + w22 G + w23 H + w24 I + error

If Xi = ai F1 for i = 1, ..., C, then wi = (1/ai)/C; that is, the weights are derived from the loadings.
Example: two factors. Rotated factor loadings

Variable (X)   Factor 1   Factor 2
A               0.964     -0.107
B               0.958     -0.053
C               0.944     -0.130
D               0.949     -0.137
E               0.945     -0.123
F              -0.104      0.917
G              -0.128      0.929
H              -0.112      0.902
I              -0.081      0.936
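A hedged sketch of producing rotated loadings in Python: scikit-learn's FactorAnalysis accepts a varimax rotation option (in versions >= 0.24). The data here is simulated placeholder data, so the printed loadings will not reproduce the table above:

```python
# Sketch: two-factor model with varimax rotation via scikit-learn.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
data = rng.normal(size=(200, 9))          # placeholder for columns A..I

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit((data - data.mean(axis=0)) / data.std(axis=0))
loadings = fa.components_.T               # rows = variables, columns = factors
print(loadings.round(3))
```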
Factors are uncorrelated (orthogonal) with each other; they represent non-redundant information.
Communalities
Xi = ai1 F1 + ai2 F2 + ... + aiK FK + error
The a's are factor loadings; most are (hopefully) near zero. The communality for variable Xi is
hi² = ai1² + ai2² + ai3² + ... + aiK²
If the original K variables are made out of m < K factors (m = 2 in this example), the communalities for all K of the X variables should be high even if some of the aiK are set to zero. This implies the observed variables are (mostly) made out of the m < K factors.
Communalities
How much of the variation in each variable is accounted for by the factor(s)? Similar to R².

variable   A      B      C      D      E      F      G      H      I
value      0.940  0.921  0.908  0.919  0.907  0.852  0.878  0.826  0.883
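Since a communality is just the row sum of squared loadings, it can be checked directly against the rotated loadings table; for example, for variable A:

```python
# Sketch: communality of variable A from its two rotated loadings.
loadings_A = [0.964, -0.107]
h2_A = sum(a ** 2 for a in loadings_A)
print(round(h2_A, 3))   # 0.941, matching the reported 0.940 up to rounding
```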
WGCNA: Weighted gene co-expression network analysis (Horvath, UCLA)
Factors can have factors
[Diagram: a hierarchy in which observed variables load on first-level factors, which in turn load on higher-level factors]
Power adjacency function results in a weighted gene network:
aij = |cor(xi, xj)|^β
Often choosing β = 6 works well, but in general we use the scale-free topology criterion described in Zhang and Horvath (2005).
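A minimal sketch of this adjacency computation, with a hypothetical expression matrix (samples x genes):

```python
# Sketch of the WGCNA power adjacency: a_ij = |cor(x_i, x_j)| ** beta.
import numpy as np

def power_adjacency(expr, beta=6):
    # Soft-thresholded, weighted network: entries stay in [0, 1].
    corr = np.corrcoef(expr, rowvar=False)
    return np.abs(corr) ** beta

rng = np.random.default_rng(4)
A = power_adjacency(rng.normal(size=(50, 10)), beta=6)
print(A.shape, A.min() >= 0, A.max() <= 1)
```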