Understand Canonical Correlation Analysis


Canonical Correlation Analysis is a statistical method used to measure the association between two sets of variables by constructing linear combinations and assessing their correlation. This analysis involves creating canonical variables and examining correlations among pairs of linear combinations. It aims to summarize data structure through linear combinations and select optimal combinations for maximizing correlation. Explore canonical variables and obtain insights into data relationships.

  • Canonical Correlation
  • Analysis
  • Statistics
  • Variables
  • Linear Combinations


Presentation Transcript


  1. Canonical Correlation Analysis

  2. Description

     • Two sets of variables are observed on the same units.
     • Set 1 has $p$ variables and Set 2 has $q$, with $p \le q$.
     • Goal: measure the association between the two sets of variables.
     • The method constructs linear combinations from the two sets and measures the correlation between the two linear combinations.
     • Subsequent linear combinations are uncorrelated with previous ones.
     • Pairs of linear combinations $\equiv$ canonical variables.
     • Correlations among pairs of linear combinations $\equiv$ canonical correlations.

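  3. Model Description - I

     Random vectors (observed on common experimental/sampling units):

     $$\underset{p\times 1}{\mathbf{X}^{(1)}} = \begin{bmatrix} X^{(1)}_1 \\ \vdots \\ X^{(1)}_p \end{bmatrix}, \qquad \underset{q\times 1}{\mathbf{X}^{(2)}} = \begin{bmatrix} X^{(2)}_1 \\ \vdots \\ X^{(2)}_q \end{bmatrix}, \qquad p \le q \text{ for notational convenience}$$

     $$E\{\mathbf{X}^{(1)}\} = \boldsymbol{\mu}^{(1)} = \begin{bmatrix} \mu^{(1)}_1 \\ \vdots \\ \mu^{(1)}_p \end{bmatrix}, \qquad V\{\mathbf{X}^{(1)}\} = \boldsymbol{\Sigma}_{11} = \begin{bmatrix} \sigma^{(1)}_{11} & \cdots & \sigma^{(1)}_{1p} \\ \vdots & \ddots & \vdots \\ \sigma^{(1)}_{1p} & \cdots & \sigma^{(1)}_{pp} \end{bmatrix}$$

     $$E\{\mathbf{X}^{(2)}\} = \boldsymbol{\mu}^{(2)} = \begin{bmatrix} \mu^{(2)}_1 \\ \vdots \\ \mu^{(2)}_q \end{bmatrix}, \qquad V\{\mathbf{X}^{(2)}\} = \boldsymbol{\Sigma}_{22} = \begin{bmatrix} \sigma^{(2)}_{11} & \cdots & \sigma^{(2)}_{1q} \\ \vdots & \ddots & \vdots \\ \sigma^{(2)}_{1q} & \cdots & \sigma^{(2)}_{qq} \end{bmatrix}$$

     $$COV\{\mathbf{X}^{(1)}, \mathbf{X}^{(2)}\} = \underset{p\times q}{\boldsymbol{\Sigma}_{12}} = \begin{bmatrix} \sigma^{(1,2)}_{11} & \cdots & \sigma^{(1,2)}_{1q} \\ \vdots & \ddots & \vdots \\ \sigma^{(1,2)}_{p1} & \cdots & \sigma^{(1,2)}_{pq} \end{bmatrix}$$

     Stacking the two vectors:

     $$\underset{(p+q)\times 1}{\mathbf{X}} = \begin{bmatrix} \mathbf{X}^{(1)} \\ \mathbf{X}^{(2)} \end{bmatrix}, \qquad E\{\mathbf{X}\} = \boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}^{(1)} \\ \boldsymbol{\mu}^{(2)} \end{bmatrix}, \qquad V\{\mathbf{X}\} = \boldsymbol{\Sigma} = E\{(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})'\} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{12}' & \boldsymbol{\Sigma}_{22} \end{bmatrix}$$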

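  4. Model Description - II

     Goal: summarize the structure of $\boldsymbol{\Sigma}_{12}$ through a few linear combinations of $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$.

     $$U = \mathbf{a}'\mathbf{X}^{(1)}, \quad V = \mathbf{b}'\mathbf{X}^{(2)}, \qquad E\{U\} = \mathbf{a}'\boldsymbol{\mu}^{(1)}, \quad E\{V\} = \mathbf{b}'\boldsymbol{\mu}^{(2)}$$

     $$V\{U\} = V\{\mathbf{a}'\mathbf{X}^{(1)}\} = \mathbf{a}'\boldsymbol{\Sigma}_{11}\mathbf{a} = \sum_{i=1}^{p} a_i^2 \sigma^{(1)}_{ii} + 2\sum_{i=1}^{p-1}\sum_{k=i+1}^{p} a_i a_k \sigma^{(1)}_{ik}$$

     $$V\{V\} = V\{\mathbf{b}'\mathbf{X}^{(2)}\} = \mathbf{b}'\boldsymbol{\Sigma}_{22}\mathbf{b} = \sum_{i=1}^{q} b_i^2 \sigma^{(2)}_{ii} + 2\sum_{i=1}^{q-1}\sum_{k=i+1}^{q} b_i b_k \sigma^{(2)}_{ik}$$

     $$COV\{U, V\} = COV\{\mathbf{a}'\mathbf{X}^{(1)}, \mathbf{b}'\mathbf{X}^{(2)}\} = \mathbf{a}'\boldsymbol{\Sigma}_{12}\mathbf{b} = \sum_{i=1}^{p}\sum_{k=1}^{q} a_i b_k \sigma^{(1,2)}_{ik}$$

     $$CORR\{U, V\} = \frac{\mathbf{a}'\boldsymbol{\Sigma}_{12}\mathbf{b}}{\sqrt{\mathbf{a}'\boldsymbol{\Sigma}_{11}\mathbf{a}}\,\sqrt{\mathbf{b}'\boldsymbol{\Sigma}_{22}\mathbf{b}}}$$

     Goal: choose $\mathbf{a}, \mathbf{b}$ that maximize $CORR\{U, V\}$ subject to $\mathbf{a}'\boldsymbol{\Sigma}_{11}\mathbf{a} = \mathbf{b}'\boldsymbol{\Sigma}_{22}\mathbf{b} = 1$.

     At stage $k$, obtain $U_k = \mathbf{a}_k'\mathbf{X}^{(1)}$ and $V_k = \mathbf{b}_k'\mathbf{X}^{(2)}$ that:

     1) Maximize $CORR\{U_k, V_k\}$ with unit variances.
     2) Have $CORR\{U_i, U_k\} = CORR\{V_i, V_k\} = 0$ for $i = 1, \ldots, k-1$. Equivalently: $\mathbf{a}_i'\boldsymbol{\Sigma}_{11}\mathbf{a}_k = \mathbf{b}_i'\boldsymbol{\Sigma}_{22}\mathbf{b}_k = 0$.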
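The correlation of a pair of linear combinations depends only on the coefficient vectors and the covariance blocks, and it does not change when either vector is rescaled by a positive constant, which is why unit variances can be imposed without loss. A minimal sketch of that computation follows; the function name `lincomb_corr` and the numeric blocks are illustrative assumptions, not values from the slides.

```python
import numpy as np

def lincomb_corr(a, b, s11, s12, s22):
    """CORR{U, V} for U = a'X1 and V = b'X2, given the covariance blocks."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov_uv = a @ s12 @ b                      # a' Sigma12 b
    var_u = a @ s11 @ a                       # a' Sigma11 a
    var_v = b @ s22 @ b                       # b' Sigma22 b
    return cov_uv / np.sqrt(var_u * var_v)

# Hypothetical covariance blocks with p = q = 2 (illustrative values only).
sigma11 = np.array([[1.0, 0.4], [0.4, 1.0]])
sigma22 = np.array([[1.0, 0.2], [0.2, 1.0]])
sigma12 = np.array([[0.5, 0.6], [0.3, 0.4]])

a = np.array([1.0, 0.0])                      # U is the first variable of set 1
b = np.array([0.0, 1.0])                      # V is the second variable of set 2
rho = lincomb_corr(a, b, sigma11, sigma12, sigma22)

# Rescaling a and b by positive constants leaves the correlation unchanged.
rho_scaled = lincomb_corr(3.0 * a, 0.5 * b, sigma11, sigma12, sigma22)
```

With these particular vectors the result reduces to the single entry `sigma12[0, 1]`, because both selected variables have unit variance.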

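  5. Obtaining Canonical Variables

     $$V\{\mathbf{X}^{(1)}\} = \boldsymbol{\Sigma}_{11}, \qquad V\{\mathbf{X}^{(2)}\} = \boldsymbol{\Sigma}_{22}, \qquad COV\{\mathbf{X}^{(1)}, \mathbf{X}^{(2)}\} = \boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21}'$$

     $$V\left\{\begin{bmatrix} \mathbf{X}^{(1)} \\ \mathbf{X}^{(2)} \end{bmatrix}\right\} = \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix} \quad \text{(full rank)}$$

     For fixed vectors $\mathbf{a}$ ($p\times 1$) and $\mathbf{b}$ ($q\times 1$), forming $U = \mathbf{a}'\mathbf{X}^{(1)}$ and $V = \mathbf{b}'\mathbf{X}^{(2)}$ (with $p \le q$):

     $$\max_{\mathbf{a},\mathbf{b}} CORR\{U, V\} = \rho_1^* \quad \text{obtained with} \quad U_1 = \mathbf{a}_1'\mathbf{X}^{(1)} = \mathbf{e}_1'\boldsymbol{\Sigma}_{11}^{-1/2}\mathbf{X}^{(1)}, \qquad V_1 = \mathbf{b}_1'\mathbf{X}^{(2)} = \mathbf{f}_1'\boldsymbol{\Sigma}_{22}^{-1/2}\mathbf{X}^{(2)}$$

     where:

     • $\rho_1^{*2} \equiv$ largest eigenvalue of $\boldsymbol{\Sigma}_{11}^{-1/2}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1/2}$
     • $\mathbf{e}_1 \equiv$ eigenvector corresponding to the largest eigenvalue of $\boldsymbol{\Sigma}_{11}^{-1/2}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1/2}$
     • $\mathbf{f}_1 \equiv$ eigenvector corresponding to the largest eigenvalue of $\boldsymbol{\Sigma}_{22}^{-1/2}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1/2}$

     For the $k$-th pair, $k = 2, \ldots, p$:

     $$U_k = \mathbf{e}_k'\boldsymbol{\Sigma}_{11}^{-1/2}\mathbf{X}^{(1)}, \qquad V_k = \mathbf{f}_k'\boldsymbol{\Sigma}_{22}^{-1/2}\mathbf{X}^{(2)}$$

     where:

     • $\rho_k^{*2} \equiv$ $k$-th largest eigenvalue of $\boldsymbol{\Sigma}_{11}^{-1/2}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1/2}$
     • $\mathbf{e}_k \equiv$ eigenvector corresponding to the $k$-th largest eigenvalue of $\boldsymbol{\Sigma}_{11}^{-1/2}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1/2}$
     • $\mathbf{f}_k \equiv$ eigenvector corresponding to the $k$-th largest eigenvalue of $\boldsymbol{\Sigma}_{22}^{-1/2}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1/2}$
     • $\mathbf{f}_{p+1}, \ldots, \mathbf{f}_q \equiv$ (orthogonal) eigenvectors corresponding to the $q-p$ zero eigenvalues of $\boldsymbol{\Sigma}_{22}^{-1/2}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1/2}$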
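The eigenvalue recipe for canonical variables can be sketched numerically. The helper names `inv_sqrt` and `canonical_pairs` and the covariance blocks are assumptions made for illustration. One shortcut is also an assumption of this sketch rather than the slide's procedure: instead of a second eigen-decomposition for the $\mathbf{f}_k$, it uses the equivalent relation $\mathbf{b}_k = \boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\mathbf{a}_k / \rho_k^*$, which requires every canonical correlation to be nonzero and returns only the first $p$ pairs.

```python
import numpy as np

def inv_sqrt(m):
    """Symmetric inverse square root of a positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(w ** -0.5) @ v.T

def canonical_pairs(s11, s12, s22):
    """Return (rho, A, B) for the first p canonical pairs.

    Rows of A are a_k' = e_k' S11^{-1/2}; rows of B are b_k'.
    Assumes p <= q and that every canonical correlation is nonzero.
    """
    r11 = inv_sqrt(s11)
    s22_inv_s21 = np.linalg.solve(s22, s12.T)        # S22^{-1} S21
    m = r11 @ s12 @ s22_inv_s21 @ r11
    evals, e = np.linalg.eigh((m + m.T) / 2)         # eigh returns ascending order
    evals, e = evals[::-1], e[:, ::-1]               # largest eigenvalue first
    rho = np.sqrt(np.clip(evals, 0.0, None))         # canonical correlations
    a = e.T @ r11
    b = (s22_inv_s21 @ a.T / rho).T                  # b_k = S22^{-1} S21 a_k / rho_k
    return rho, a, b

# Hypothetical covariance blocks with p = q = 2 (illustrative values only).
sigma11 = np.array([[1.0, 0.4], [0.4, 1.0]])
sigma22 = np.array([[1.0, 0.2], [0.2, 1.0]])
sigma12 = np.array([[0.5, 0.6], [0.3, 0.4]])

rho, a_mat, b_mat = canonical_pairs(sigma11, sigma12, sigma22)
cov_uv = a_mat @ sigma12 @ b_mat.T                   # diagonal, entries rho_k
```

Choosing $\mathbf{b}_k$ through this relation also fixes the sign of each pair so that the covariance between $U_k$ and $V_k$ is positive, which a separate eigen-decomposition would not guarantee.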

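  6. Standardized Variables

     $$\mathbf{Z}^{(1)} = \begin{bmatrix} Z^{(1)}_1 \\ \vdots \\ Z^{(1)}_p \end{bmatrix}, \qquad \mathbf{Z}^{(2)} = \begin{bmatrix} Z^{(2)}_1 \\ \vdots \\ Z^{(2)}_q \end{bmatrix}$$

     $$V\{\mathbf{Z}^{(1)}\} = \boldsymbol{\rho}_{11}, \qquad V\{\mathbf{Z}^{(2)}\} = \boldsymbol{\rho}_{22}, \qquad COV\{\mathbf{Z}^{(1)}, \mathbf{Z}^{(2)}\} = \boldsymbol{\rho}_{12} = \boldsymbol{\rho}_{21}'$$

     $$V\left\{\begin{bmatrix} \mathbf{Z}^{(1)} \\ \mathbf{Z}^{(2)} \end{bmatrix}\right\} = \boldsymbol{\rho} = \begin{bmatrix} \boldsymbol{\rho}_{11} & \boldsymbol{\rho}_{12} \\ \boldsymbol{\rho}_{21} & \boldsymbol{\rho}_{22} \end{bmatrix} \quad \text{(full rank)}$$

     For the $k$-th pair, $k = 1, \ldots, p$ (with $p \le q$):

     $$U_{Zk} = \mathbf{a}_{Zk}'\mathbf{Z}^{(1)} = \mathbf{e}_{Zk}'\boldsymbol{\rho}_{11}^{-1/2}\mathbf{Z}^{(1)}, \qquad V_{Zk} = \mathbf{b}_{Zk}'\mathbf{Z}^{(2)} = \mathbf{f}_{Zk}'\boldsymbol{\rho}_{22}^{-1/2}\mathbf{Z}^{(2)}$$

     where:

     • $\rho_{Zk}^{*2} \equiv$ $k$-th largest eigenvalue of $\boldsymbol{\rho}_{11}^{-1/2}\boldsymbol{\rho}_{12}\boldsymbol{\rho}_{22}^{-1}\boldsymbol{\rho}_{21}\boldsymbol{\rho}_{11}^{-1/2}$
     • $\mathbf{e}_{Zk} \equiv$ eigenvector corresponding to the $k$-th largest eigenvalue of $\boldsymbol{\rho}_{11}^{-1/2}\boldsymbol{\rho}_{12}\boldsymbol{\rho}_{22}^{-1}\boldsymbol{\rho}_{21}\boldsymbol{\rho}_{11}^{-1/2}$
     • $\mathbf{f}_{Zk} \equiv$ eigenvector corresponding to the $k$-th largest eigenvalue of $\boldsymbol{\rho}_{22}^{-1/2}\boldsymbol{\rho}_{21}\boldsymbol{\rho}_{11}^{-1}\boldsymbol{\rho}_{12}\boldsymbol{\rho}_{22}^{-1/2}$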

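  7. Interpretation of Population Canonical Variables - I

     $$\underset{p\times p}{\mathbf{A}} = \begin{bmatrix} \mathbf{a}_1' \\ \vdots \\ \mathbf{a}_p' \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1'\boldsymbol{\Sigma}_{11}^{-1/2} \\ \vdots \\ \mathbf{e}_p'\boldsymbol{\Sigma}_{11}^{-1/2} \end{bmatrix} = \mathbf{E}'\boldsymbol{\Sigma}_{11}^{-1/2}, \quad \mathbf{E} = [\mathbf{e}_1 \cdots \mathbf{e}_p]; \qquad \underset{q\times q}{\mathbf{B}} = \begin{bmatrix} \mathbf{b}_1' \\ \vdots \\ \mathbf{b}_q' \end{bmatrix} = \mathbf{F}'\boldsymbol{\Sigma}_{22}^{-1/2}, \quad \mathbf{F} = [\mathbf{f}_1 \cdots \mathbf{f}_q]$$

     $$\mathbf{U} = \begin{bmatrix} U_1 \\ \vdots \\ U_p \end{bmatrix} = \mathbf{A}\mathbf{X}^{(1)}, \qquad V\{\mathbf{U}\} = \mathbf{A}\boldsymbol{\Sigma}_{11}\mathbf{A}' = \mathbf{E}'\boldsymbol{\Sigma}_{11}^{-1/2}\boldsymbol{\Sigma}_{11}\boldsymbol{\Sigma}_{11}^{-1/2}\mathbf{E} = \mathbf{E}'\mathbf{E} = \mathbf{I}$$

     $$\mathbf{V} = \begin{bmatrix} V_1 \\ \vdots \\ V_q \end{bmatrix} = \mathbf{B}\mathbf{X}^{(2)}, \qquad V\{\mathbf{V}\} = \mathbf{B}\boldsymbol{\Sigma}_{22}\mathbf{B}' = \mathbf{F}'\boldsymbol{\Sigma}_{22}^{-1/2}\boldsymbol{\Sigma}_{22}\boldsymbol{\Sigma}_{22}^{-1/2}\mathbf{F} = \mathbf{F}'\mathbf{F} = \mathbf{I}$$

     Because $V\{U_i\} = 1$:

     $$CORR\{U_i, X^{(1)}_k\} = \frac{COV\{U_i, X^{(1)}_k\}}{\sqrt{V\{U_i\}}\sqrt{\sigma^{(1)}_{kk}}} = \frac{COV\{U_i, X^{(1)}_k\}}{\sqrt{\sigma^{(1)}_{kk}}} = COV\left\{U_i, \left(\sigma^{(1)}_{kk}\right)^{-1/2} X^{(1)}_k\right\}$$

     $$COV\{\mathbf{U}, \mathbf{X}^{(1)}\} = COV\{\mathbf{A}\mathbf{X}^{(1)}, \mathbf{X}^{(1)}\} = \mathbf{A}\boldsymbol{\Sigma}_{11}, \qquad COV\{\mathbf{V}, \mathbf{X}^{(2)}\} = COV\{\mathbf{B}\mathbf{X}^{(2)}, \mathbf{X}^{(2)}\} = \mathbf{B}\boldsymbol{\Sigma}_{22}$$

     For standardized variables:

     $$\mathbf{A}_Z = \mathbf{E}_Z'\boldsymbol{\rho}_{11}^{-1/2}, \quad \mathbf{E}_Z = [\mathbf{e}_{Z1} \cdots \mathbf{e}_{Zp}]; \qquad \mathbf{B}_Z = \mathbf{F}_Z'\boldsymbol{\rho}_{22}^{-1/2}, \quad \mathbf{F}_Z = [\mathbf{f}_{Z1} \cdots \mathbf{f}_{Zq}]$$

     $$COV\{\mathbf{U}_Z, \mathbf{Z}^{(1)}\} = \mathbf{A}_Z\boldsymbol{\rho}_{11}, \qquad COV\{\mathbf{V}_Z, \mathbf{Z}^{(2)}\} = \mathbf{B}_Z\boldsymbol{\rho}_{22}$$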

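  8. Interpretation of Population Canonical Variables - II

     Defining $\mathbf{V}_{11}^{-1/2} = \operatorname{diag}\left(1/\sqrt{\sigma^{(1)}_{ii}}\right)$ and $\mathbf{V}_{22}^{-1/2} = \operatorname{diag}\left(1/\sqrt{\sigma^{(2)}_{ii}}\right)$:

     $$\boldsymbol{\rho}_{\mathbf{U},\mathbf{X}^{(1)}} = CORR\{\mathbf{U}, \mathbf{X}^{(1)}\} = COV\{\mathbf{U}, \mathbf{V}_{11}^{-1/2}\mathbf{X}^{(1)}\} = \mathbf{A}\boldsymbol{\Sigma}_{11}\mathbf{V}_{11}^{-1/2}$$

     $$\boldsymbol{\rho}_{\mathbf{V},\mathbf{X}^{(2)}} = CORR\{\mathbf{V}, \mathbf{X}^{(2)}\} = COV\{\mathbf{V}, \mathbf{V}_{22}^{-1/2}\mathbf{X}^{(2)}\} = \mathbf{B}\boldsymbol{\Sigma}_{22}\mathbf{V}_{22}^{-1/2}$$

     $$\boldsymbol{\rho}_{\mathbf{U},\mathbf{X}^{(2)}} = CORR\{\mathbf{U}, \mathbf{X}^{(2)}\} = COV\{\mathbf{U}, \mathbf{V}_{22}^{-1/2}\mathbf{X}^{(2)}\} = \mathbf{A}\boldsymbol{\Sigma}_{12}\mathbf{V}_{22}^{-1/2}$$

     $$\boldsymbol{\rho}_{\mathbf{V},\mathbf{X}^{(1)}} = CORR\{\mathbf{V}, \mathbf{X}^{(1)}\} = COV\{\mathbf{V}, \mathbf{V}_{11}^{-1/2}\mathbf{X}^{(1)}\} = \mathbf{B}\boldsymbol{\Sigma}_{21}\mathbf{V}_{11}^{-1/2}$$

     For standardized variables:

     $$\boldsymbol{\rho}_{\mathbf{U}_Z,\mathbf{Z}^{(1)}} = \mathbf{A}_Z\boldsymbol{\rho}_{11}, \qquad \boldsymbol{\rho}_{\mathbf{V}_Z,\mathbf{Z}^{(2)}} = \mathbf{B}_Z\boldsymbol{\rho}_{22}, \qquad \boldsymbol{\rho}_{\mathbf{U}_Z,\mathbf{Z}^{(2)}} = \mathbf{A}_Z\boldsymbol{\rho}_{12}, \qquad \boldsymbol{\rho}_{\mathbf{V}_Z,\mathbf{Z}^{(1)}} = \mathbf{B}_Z\boldsymbol{\rho}_{21}$$

     These sets are the same numerically: $\boldsymbol{\rho}_{\mathbf{U},\mathbf{X}^{(1)}} = \boldsymbol{\rho}_{\mathbf{U}_Z,\mathbf{Z}^{(1)}}$, and so on.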
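The claim that correlations between canonical variables and the original variables are numerically the same whether one works with $\mathbf{X}$ or with the standardized $\mathbf{Z}$ can be checked on a small example. The sketch below is built on assumptions: the helper `coef_a`, the correlation blocks, and the standard deviations `sd1` and `sd2` are all hypothetical. Eigenvectors are only determined up to sign, so the comparison is made on absolute values.

```python
import numpy as np

def inv_sqrt(m):
    """Symmetric inverse square root of a positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(w ** -0.5) @ v.T

def coef_a(s11, s12, s22):
    """Return (canonical correlations, A) with rows a_k' = e_k' S11^{-1/2}."""
    r11 = inv_sqrt(s11)
    m = r11 @ s12 @ np.linalg.solve(s22, s12.T) @ r11
    evals, e = np.linalg.eigh((m + m.T) / 2)      # ascending order
    evals, e = evals[::-1], e[:, ::-1]            # largest first
    return np.sqrt(np.clip(evals, 0.0, None)), e.T @ r11

# Hypothetical correlation blocks and standard deviations (illustrative only).
rho11 = np.array([[1.0, 0.4], [0.4, 1.0]])
rho22 = np.array([[1.0, 0.2], [0.2, 1.0]])
rho12 = np.array([[0.5, 0.6], [0.3, 0.4]])
sd1 = np.array([2.0, 5.0])
sd2 = np.array([0.5, 10.0])

# Covariance blocks implied by the correlations and standard deviations.
sigma11 = rho11 * np.outer(sd1, sd1)
sigma22 = rho22 * np.outer(sd2, sd2)
sigma12 = rho12 * np.outer(sd1, sd2)

can_x, a_x = coef_a(sigma11, sigma12, sigma22)    # raw-scale analysis
can_z, a_z = coef_a(rho11, rho12, rho22)          # standardized analysis

load_x = a_x @ sigma11 @ np.diag(1.0 / sd1)       # A Sigma11 V11^{-1/2}
load_z = a_z @ rho11                              # A_Z rho11
```

The coefficient matrices `a_x` and `a_z` themselves differ, since coefficients depend on the measurement scale; only the canonical correlations and the variable-to-variate correlations are scale free.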

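  9. Sample Canonical Correlation Analysis - I

     Random sample of $n$ observations of $\mathbf{X}^{(1)}, \mathbf{X}^{(2)}$:

     $$\mathbf{x}_j = \begin{bmatrix} \mathbf{x}^{(1)}_j \\ \mathbf{x}^{(2)}_j \end{bmatrix}, \qquad j = 1, \ldots, n$$

     $$\underset{n\times(p+q)}{\mathbf{X}} = \begin{bmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix} = \begin{bmatrix} x^{(1)}_{11} & \cdots & x^{(1)}_{1p} & x^{(2)}_{11} & \cdots & x^{(2)}_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ x^{(1)}_{n1} & \cdots & x^{(1)}_{np} & x^{(2)}_{n1} & \cdots & x^{(2)}_{nq} \end{bmatrix}, \qquad \underset{(p+q)\times 1}{\bar{\mathbf{x}}} = \frac{1}{n}\mathbf{X}'\mathbf{1} = \begin{bmatrix} \bar{\mathbf{x}}^{(1)} \\ \bar{\mathbf{x}}^{(2)} \end{bmatrix}$$

     $$\underset{(p+q)\times(p+q)}{\mathbf{S}} = \frac{1}{n-1}\begin{bmatrix} \sum_{j=1}^{n}(\mathbf{x}^{(1)}_j - \bar{\mathbf{x}}^{(1)})(\mathbf{x}^{(1)}_j - \bar{\mathbf{x}}^{(1)})' & \sum_{j=1}^{n}(\mathbf{x}^{(1)}_j - \bar{\mathbf{x}}^{(1)})(\mathbf{x}^{(2)}_j - \bar{\mathbf{x}}^{(2)})' \\ \sum_{j=1}^{n}(\mathbf{x}^{(2)}_j - \bar{\mathbf{x}}^{(2)})(\mathbf{x}^{(1)}_j - \bar{\mathbf{x}}^{(1)})' & \sum_{j=1}^{n}(\mathbf{x}^{(2)}_j - \bar{\mathbf{x}}^{(2)})(\mathbf{x}^{(2)}_j - \bar{\mathbf{x}}^{(2)})' \end{bmatrix} = \begin{bmatrix} \underset{p\times p}{\mathbf{S}_{11}} & \underset{p\times q}{\mathbf{S}_{12}} \\ \underset{q\times p}{\mathbf{S}_{21}} & \underset{q\times q}{\mathbf{S}_{22}} \end{bmatrix}$$

     $$\mathbf{S} = \frac{1}{n-1}\mathbf{X}'\left(\mathbf{I} - \frac{1}{n}\mathbf{J}\right)\mathbf{X}, \qquad \mathbf{J} = \mathbf{1}\mathbf{1}'$$

     Linear combinations of the form $\hat{U} = \mathbf{a}'\mathbf{x}^{(1)}$ and $\hat{V} = \mathbf{b}'\mathbf{x}^{(2)}$ have estimated variances, covariance, and correlation:

     $$\hat{V}\{\hat{U}\} = \mathbf{a}'\mathbf{S}_{11}\mathbf{a}, \qquad \hat{V}\{\hat{V}\} = \mathbf{b}'\mathbf{S}_{22}\mathbf{b}, \qquad \widehat{COV}\{\hat{U}, \hat{V}\} = \mathbf{a}'\mathbf{S}_{12}\mathbf{b}, \qquad r_{\hat{U},\hat{V}} = \frac{\mathbf{a}'\mathbf{S}_{12}\mathbf{b}}{\sqrt{\mathbf{a}'\mathbf{S}_{11}\mathbf{a}}\,\sqrt{\mathbf{b}'\mathbf{S}_{22}\mathbf{b}}}$$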

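  10. Sample Canonical Correlation Analysis - II

     Similar to the population case, construct the canonical variables from $\mathbf{S}_{11}^{-1/2}\mathbf{S}_{12}\mathbf{S}_{22}^{-1}\mathbf{S}_{21}\mathbf{S}_{11}^{-1/2}$:

     • Eigenvalues: $\hat{\rho}_1^{*2} \ge \cdots \ge \hat{\rho}_p^{*2}$, with corresponding eigenvectors $\hat{\mathbf{e}}_1, \ldots, \hat{\mathbf{e}}_p$
     • $\hat{\mathbf{f}}_1, \ldots, \hat{\mathbf{f}}_p \equiv$ eigenvectors corresponding to the ordered nonzero eigenvalues of $\mathbf{S}_{22}^{-1/2}\mathbf{S}_{21}\mathbf{S}_{11}^{-1}\mathbf{S}_{12}\mathbf{S}_{22}^{-1/2}$
     • $\hat{\mathbf{f}}_{p+1}, \ldots, \hat{\mathbf{f}}_q \equiv$ eigenvectors corresponding to the zero eigenvalues of $\mathbf{S}_{22}^{-1/2}\mathbf{S}_{21}\mathbf{S}_{11}^{-1}\mathbf{S}_{12}\mathbf{S}_{22}^{-1/2}$

     $$\hat{U}_k = \hat{\mathbf{a}}_k'\mathbf{x}^{(1)} = \hat{\mathbf{e}}_k'\mathbf{S}_{11}^{-1/2}\mathbf{x}^{(1)}, \quad k = 1, \ldots, p; \qquad \hat{V}_k = \hat{\mathbf{b}}_k'\mathbf{x}^{(2)} = \hat{\mathbf{f}}_k'\mathbf{S}_{22}^{-1/2}\mathbf{x}^{(2)}, \quad k = 1, \ldots, q$$

     $$\underset{p\times p}{\hat{\mathbf{A}}} = \begin{bmatrix} \hat{\mathbf{a}}_1' \\ \vdots \\ \hat{\mathbf{a}}_p' \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{e}}_1'\mathbf{S}_{11}^{-1/2} \\ \vdots \\ \hat{\mathbf{e}}_p'\mathbf{S}_{11}^{-1/2} \end{bmatrix}, \qquad \underset{q\times q}{\hat{\mathbf{B}}} = \begin{bmatrix} \hat{\mathbf{b}}_1' \\ \vdots \\ \hat{\mathbf{b}}_q' \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{f}}_1'\mathbf{S}_{22}^{-1/2} \\ \vdots \\ \hat{\mathbf{f}}_q'\mathbf{S}_{22}^{-1/2} \end{bmatrix}, \qquad \hat{\mathbf{U}} = \hat{\mathbf{A}}\mathbf{x}^{(1)}, \quad \hat{\mathbf{V}} = \hat{\mathbf{B}}\mathbf{x}^{(2)}$$

     Defining $\mathbf{D}_{11}^{-1/2} = \operatorname{diag}\left(1/\sqrt{s^{(1)}_{ii}}\right)$ and $\mathbf{D}_{22}^{-1/2} = \operatorname{diag}\left(1/\sqrt{s^{(2)}_{ii}}\right)$:

     $$\mathbf{R}_{\hat{\mathbf{U}},\mathbf{x}^{(1)}} = \hat{\mathbf{A}}\mathbf{S}_{11}\mathbf{D}_{11}^{-1/2}, \qquad \mathbf{R}_{\hat{\mathbf{V}},\mathbf{x}^{(2)}} = \hat{\mathbf{B}}\mathbf{S}_{22}\mathbf{D}_{22}^{-1/2}, \qquad \mathbf{R}_{\hat{\mathbf{U}},\mathbf{x}^{(2)}} = \hat{\mathbf{A}}\mathbf{S}_{12}\mathbf{D}_{22}^{-1/2}, \qquad \mathbf{R}_{\hat{\mathbf{V}},\mathbf{x}^{(1)}} = \hat{\mathbf{B}}\mathbf{S}_{21}\mathbf{D}_{11}^{-1/2}$$

     Similar results hold for standardized observations, with $\mathbf{z}$ replacing $\mathbf{x}$ and $\mathbf{R}$ replacing $\mathbf{S}$.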
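The sample analysis follows the same steps with $\mathbf{S}$ in place of $\boldsymbol{\Sigma}$. The sketch below runs it on simulated data; the data-generating design, the seed, and the helper names are assumptions for illustration. As a shortcut that is not the slide's procedure, it obtains $\hat{\mathbf{b}}_k$ from $\hat{\mathbf{b}}_k = \mathbf{S}_{22}^{-1}\mathbf{S}_{21}\hat{\mathbf{a}}_k / \hat{\rho}_k^*$, so only the first $p$ pairs are returned and all sample canonical correlations are assumed nonzero.

```python
import numpy as np

def inv_sqrt(m):
    """Symmetric inverse square root of a positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(w ** -0.5) @ v.T

def canonical_pairs(s11, s12, s22):
    """Return (rho, A, B) for the first p canonical pairs (rows are a_k', b_k')."""
    r11 = inv_sqrt(s11)
    s22_inv_s21 = np.linalg.solve(s22, s12.T)
    m = r11 @ s12 @ s22_inv_s21 @ r11
    evals, e = np.linalg.eigh((m + m.T) / 2)
    evals, e = evals[::-1], e[:, ::-1]               # largest eigenvalue first
    rho = np.sqrt(np.clip(evals, 0.0, None))
    a = e.T @ r11
    b = (s22_inv_s21 @ a.T / rho).T
    return rho, a, b

# Simulated data (hypothetical design): p = 2 variables in set 1, q = 3 in set 2.
rng = np.random.default_rng(0)
n, p = 200, 2
x1 = rng.normal(size=(n, 2))
x2 = np.column_stack([x1[:, 0], x1[:, 1], np.zeros(n)]) + rng.normal(size=(n, 3))

# Sample covariance matrix with divisor n - 1, partitioned into blocks.
s = np.cov(np.hstack([x1, x2]), rowvar=False)
s11, s12, s22 = s[:p, :p], s[:p, p:], s[p:, p:]

rho_hat, a_hat, b_hat = canonical_pairs(s11, s12, s22)

# Canonical variate scores for every observation (centered data).
u = (x1 - x1.mean(axis=0)) @ a_hat.T
v = (x2 - x2.mean(axis=0)) @ b_hat.T
r_uv = np.corrcoef(u[:, 0], v[:, 0])[0, 1]           # matches rho_hat[0]
```

Computing the ordinary sample correlation of the first pair of score columns is a convenient sanity check on any implementation, because it must reproduce the first sample canonical correlation.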

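  11. Sample Descriptive Measures - I

     $$\underset{p\times p}{\hat{\mathbf{A}}} = \begin{bmatrix} \hat{\mathbf{a}}_1' \\ \vdots \\ \hat{\mathbf{a}}_p' \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{e}}_1'\mathbf{S}_{11}^{-1/2} \\ \vdots \\ \hat{\mathbf{e}}_p'\mathbf{S}_{11}^{-1/2} \end{bmatrix}, \qquad \underset{q\times q}{\hat{\mathbf{B}}} = \begin{bmatrix} \hat{\mathbf{b}}_1' \\ \vdots \\ \hat{\mathbf{b}}_q' \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{f}}_1'\mathbf{S}_{22}^{-1/2} \\ \vdots \\ \hat{\mathbf{f}}_q'\mathbf{S}_{22}^{-1/2} \end{bmatrix}, \qquad \hat{\mathbf{U}} = \hat{\mathbf{A}}\mathbf{x}^{(1)}, \quad \hat{\mathbf{V}} = \hat{\mathbf{B}}\mathbf{x}^{(2)}$$

     Write the columns of the inverse matrices as

     $$\hat{\mathbf{A}}^{-1} = \left[\hat{\mathbf{a}}^{(1)} \cdots \hat{\mathbf{a}}^{(p)}\right], \qquad \hat{\mathbf{B}}^{-1} = \left[\hat{\mathbf{b}}^{(1)} \cdots \hat{\mathbf{b}}^{(q)}\right], \qquad \hat{\mathbf{A}}\hat{\mathbf{A}}^{-1} = \mathbf{I}$$

     so that

     $$\mathbf{x}^{(1)} = \hat{\mathbf{A}}^{-1}\hat{\mathbf{U}}, \qquad \mathbf{x}^{(2)} = \hat{\mathbf{B}}^{-1}\hat{\mathbf{V}}$$

     By definition, the canonical variates have identity sample covariance matrices:

     $$\hat{V}\{\hat{\mathbf{U}}\} = \hat{\mathbf{A}}\mathbf{S}_{11}\hat{\mathbf{A}}' = \mathbf{I}, \qquad \hat{V}\{\hat{\mathbf{V}}\} = \hat{\mathbf{B}}\mathbf{S}_{22}\hat{\mathbf{B}}' = \mathbf{I}$$

     Hence:

     $$\mathbf{S}_{11} = \hat{\mathbf{A}}^{-1}\left(\hat{\mathbf{A}}\mathbf{S}_{11}\hat{\mathbf{A}}'\right)\left(\hat{\mathbf{A}}^{-1}\right)' = \hat{\mathbf{A}}^{-1}\left(\hat{\mathbf{A}}^{-1}\right)' = \hat{\mathbf{a}}^{(1)}\hat{\mathbf{a}}^{(1)\prime} + \cdots + \hat{\mathbf{a}}^{(p)}\hat{\mathbf{a}}^{(p)\prime}$$

     $$\mathbf{S}_{22} = \hat{\mathbf{B}}^{-1}\left(\hat{\mathbf{B}}\mathbf{S}_{22}\hat{\mathbf{B}}'\right)\left(\hat{\mathbf{B}}^{-1}\right)' = \hat{\mathbf{B}}^{-1}\left(\hat{\mathbf{B}}^{-1}\right)' = \hat{\mathbf{b}}^{(1)}\hat{\mathbf{b}}^{(1)\prime} + \cdots + \hat{\mathbf{b}}^{(q)}\hat{\mathbf{b}}^{(q)\prime}$$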

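  12. Sample Descriptive Measures - II

     $$\widehat{COV}\{\hat{\mathbf{U}}, \hat{\mathbf{V}}\} = \hat{\mathbf{A}}\mathbf{S}_{12}\hat{\mathbf{B}}' = \left[\operatorname{diag}\left(\hat{\rho}_1^*, \ldots, \hat{\rho}_p^*\right) \;\middle|\; \underset{p\times(q-p)}{\mathbf{0}}\right]$$

     $$\mathbf{S}_{12} = \hat{\mathbf{A}}^{-1}\left[\operatorname{diag}\left(\hat{\rho}_1^*, \ldots, \hat{\rho}_p^*\right) \;\middle|\; \mathbf{0}\right]\left(\hat{\mathbf{B}}^{-1}\right)' = \hat{\rho}_1^*\hat{\mathbf{a}}^{(1)}\hat{\mathbf{b}}^{(1)\prime} + \cdots + \hat{\rho}_p^*\hat{\mathbf{a}}^{(p)}\hat{\mathbf{b}}^{(p)\prime}$$

     $$\widehat{COV}\{\mathbf{x}^{(1)}, \hat{\mathbf{U}}\} = \widehat{COV}\{\hat{\mathbf{A}}^{-1}\hat{\mathbf{U}}, \hat{\mathbf{U}}\} = \hat{\mathbf{A}}^{-1}\mathbf{I} = \hat{\mathbf{A}}^{-1}$$

     The first $r$ columns of $\hat{\mathbf{A}}^{-1}$ are the sample covariances of $\mathbf{x}^{(1)}$ with $\hat{U}_1, \ldots, \hat{U}_r$. The same holds for $\hat{\mathbf{B}}^{-1}$, $\mathbf{x}^{(2)}$, and $\hat{V}_1, \ldots, \hat{V}_r$.

     Define:

     $$\tilde{\mathbf{x}}^{(1)} = \left[\hat{\mathbf{a}}^{(1)} \cdots \hat{\mathbf{a}}^{(r)}\right]\begin{bmatrix} \hat{U}_1 \\ \vdots \\ \hat{U}_r \end{bmatrix} \qquad \text{and} \qquad \tilde{\mathbf{x}}^{(2)} = \left[\hat{\mathbf{b}}^{(1)} \cdots \hat{\mathbf{b}}^{(r)}\right]\begin{bmatrix} \hat{V}_1 \\ \vdots \\ \hat{V}_r \end{bmatrix}$$

     The sample covariance $\widehat{COV}\{\tilde{\mathbf{x}}^{(1)}, \tilde{\mathbf{x}}^{(2)}\}$ then serves as an approximation to $\mathbf{S}_{12}$ based on the first $r$ canonical pairs.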

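  13. Sample Descriptive Measures - III

     Matrix errors of approximation:

     $$\mathbf{S}_{11} - \left(\hat{\mathbf{a}}^{(1)}\hat{\mathbf{a}}^{(1)\prime} + \cdots + \hat{\mathbf{a}}^{(r)}\hat{\mathbf{a}}^{(r)\prime}\right) = \hat{\mathbf{a}}^{(r+1)}\hat{\mathbf{a}}^{(r+1)\prime} + \cdots + \hat{\mathbf{a}}^{(p)}\hat{\mathbf{a}}^{(p)\prime}$$

     $$\mathbf{S}_{22} - \left(\hat{\mathbf{b}}^{(1)}\hat{\mathbf{b}}^{(1)\prime} + \cdots + \hat{\mathbf{b}}^{(r)}\hat{\mathbf{b}}^{(r)\prime}\right) = \hat{\mathbf{b}}^{(r+1)}\hat{\mathbf{b}}^{(r+1)\prime} + \cdots + \hat{\mathbf{b}}^{(q)}\hat{\mathbf{b}}^{(q)\prime}$$

     $$\mathbf{S}_{12} - \left(\hat{\rho}_1^*\hat{\mathbf{a}}^{(1)}\hat{\mathbf{b}}^{(1)\prime} + \cdots + \hat{\rho}_r^*\hat{\mathbf{a}}^{(r)}\hat{\mathbf{b}}^{(r)\prime}\right) = \hat{\rho}_{r+1}^*\hat{\mathbf{a}}^{(r+1)}\hat{\mathbf{b}}^{(r+1)\prime} + \cdots + \hat{\rho}_p^*\hat{\mathbf{a}}^{(p)}\hat{\mathbf{b}}^{(p)\prime}$$

     The last terms of the $\mathbf{S}_{12}$ error tend to be small in practice, because the canonical correlations are decreasing.

     For standardized observations:

     $$\mathbf{R}_{11} - \left(\hat{\mathbf{a}}_z^{(1)}\hat{\mathbf{a}}_z^{(1)\prime} + \cdots + \hat{\mathbf{a}}_z^{(r)}\hat{\mathbf{a}}_z^{(r)\prime}\right) = \hat{\mathbf{a}}_z^{(r+1)}\hat{\mathbf{a}}_z^{(r+1)\prime} + \cdots + \hat{\mathbf{a}}_z^{(p)}\hat{\mathbf{a}}_z^{(p)\prime}$$

     $$\mathbf{R}_{22} - \left(\hat{\mathbf{b}}_z^{(1)}\hat{\mathbf{b}}_z^{(1)\prime} + \cdots + \hat{\mathbf{b}}_z^{(r)}\hat{\mathbf{b}}_z^{(r)\prime}\right) = \hat{\mathbf{b}}_z^{(r+1)}\hat{\mathbf{b}}_z^{(r+1)\prime} + \cdots + \hat{\mathbf{b}}_z^{(q)}\hat{\mathbf{b}}_z^{(q)\prime}$$

     $$\mathbf{R}_{12} - \left(\hat{\rho}_{Z1}^*\hat{\mathbf{a}}_z^{(1)}\hat{\mathbf{b}}_z^{(1)\prime} + \cdots + \hat{\rho}_{Zr}^*\hat{\mathbf{a}}_z^{(r)}\hat{\mathbf{b}}_z^{(r)\prime}\right) = \hat{\rho}_{Z,r+1}^*\hat{\mathbf{a}}_z^{(r+1)}\hat{\mathbf{b}}_z^{(r+1)\prime} + \cdots + \hat{\rho}_{Zp}^*\hat{\mathbf{a}}_z^{(p)}\hat{\mathbf{b}}_z^{(p)\prime}$$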
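The matrix decompositions behind these approximation errors can be verified numerically. In the sketch below, the blocks `s11`, `s12`, `s22` are hypothetical matrices treated as sample covariance blocks, with $p = q = 2$ chosen so that $\hat{\mathbf{B}}$ is square without needing the extra eigenvectors for zero eigenvalues. The helper names are assumptions as well.

```python
import numpy as np

def inv_sqrt(m):
    """Symmetric inverse square root of a positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(w ** -0.5) @ v.T

def canonical_pairs(s11, s12, s22):
    """Return (rho, A, B) with rows a_k', b_k'; assumes nonzero correlations."""
    r11 = inv_sqrt(s11)
    s22_inv_s21 = np.linalg.solve(s22, s12.T)
    m = r11 @ s12 @ s22_inv_s21 @ r11
    evals, e = np.linalg.eigh((m + m.T) / 2)
    evals, e = evals[::-1], e[:, ::-1]               # largest eigenvalue first
    rho = np.sqrt(np.clip(evals, 0.0, None))
    a = e.T @ r11
    b = (s22_inv_s21 @ a.T / rho).T
    return rho, a, b

# Hypothetical blocks treated as sample matrices (illustrative values only).
s11 = np.array([[1.0, 0.4], [0.4, 1.0]])
s22 = np.array([[1.0, 0.2], [0.2, 1.0]])
s12 = np.array([[0.5, 0.6], [0.3, 0.4]])

rho, a_mat, b_mat = canonical_pairs(s11, s12, s22)
a_inv = np.linalg.inv(a_mat)                         # columns a^(1), ..., a^(p)
b_inv = np.linalg.inv(b_mat)                         # columns b^(1), ..., b^(q)

def s12_terms(r):
    """Sum of the first r terms rho_i * a^(i) b^(i)'."""
    return sum(rho[i] * np.outer(a_inv[:, i], b_inv[:, i]) for i in range(r))

s11_rebuilt = a_inv @ a_inv.T                        # all p terms reproduce S11
s12_rebuilt = s12_terms(2)                           # all p terms reproduce S12
error_r1 = s12 - s12_terms(1)                        # error keeping r = 1 pair
```

In this example the second canonical correlation is close to zero, so a single canonical pair already reproduces `s12` almost exactly.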

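  14. Proportions of Sample Variance Explained

     $$\widehat{COV}\{\mathbf{z}^{(1)}, \hat{\mathbf{U}}_Z\} = \widehat{COV}\{\hat{\mathbf{A}}_Z^{-1}\hat{\mathbf{U}}_Z, \hat{\mathbf{U}}_Z\} = \hat{\mathbf{A}}_Z^{-1}, \qquad \widehat{COV}\{\mathbf{z}^{(2)}, \hat{\mathbf{V}}_Z\} = \widehat{COV}\{\hat{\mathbf{B}}_Z^{-1}\hat{\mathbf{V}}_Z, \hat{\mathbf{V}}_Z\} = \hat{\mathbf{B}}_Z^{-1}$$

     Because the standardized observations and the canonical variates all have unit sample variance, these covariances are sample correlations:

     $$\hat{\mathbf{A}}_Z^{-1} = \begin{bmatrix} r_{\hat{U}_{Z1}, z^{(1)}_1} & \cdots & r_{\hat{U}_{Zp}, z^{(1)}_1} \\ \vdots & \ddots & \vdots \\ r_{\hat{U}_{Z1}, z^{(1)}_p} & \cdots & r_{\hat{U}_{Zp}, z^{(1)}_p} \end{bmatrix}, \qquad \hat{\mathbf{B}}_Z^{-1} = \begin{bmatrix} r_{\hat{V}_{Z1}, z^{(2)}_1} & \cdots & r_{\hat{V}_{Zq}, z^{(2)}_1} \\ \vdots & \ddots & \vdots \\ r_{\hat{V}_{Z1}, z^{(2)}_q} & \cdots & r_{\hat{V}_{Zq}, z^{(2)}_q} \end{bmatrix}$$

     $$\operatorname{trace}(\mathbf{R}_{11}) = \operatorname{trace}\left(\hat{\mathbf{A}}_Z^{-1}\left(\hat{\mathbf{A}}_Z^{-1}\right)'\right) = \operatorname{trace}\left(\hat{\mathbf{a}}_z^{(1)}\hat{\mathbf{a}}_z^{(1)\prime} + \cdots + \hat{\mathbf{a}}_z^{(p)}\hat{\mathbf{a}}_z^{(p)\prime}\right) = p$$

     $$\operatorname{trace}(\mathbf{R}_{22}) = \operatorname{trace}\left(\hat{\mathbf{B}}_Z^{-1}\left(\hat{\mathbf{B}}_Z^{-1}\right)'\right) = \operatorname{trace}\left(\hat{\mathbf{b}}_z^{(1)}\hat{\mathbf{b}}_z^{(1)\prime} + \cdots + \hat{\mathbf{b}}_z^{(q)}\hat{\mathbf{b}}_z^{(q)\prime}\right) = q$$

     Variance contributions of the first $r$ canonical variates:

     $$\operatorname{trace}\left(\hat{\mathbf{a}}_z^{(1)}\hat{\mathbf{a}}_z^{(1)\prime} + \cdots + \hat{\mathbf{a}}_z^{(r)}\hat{\mathbf{a}}_z^{(r)\prime}\right) = \sum_{i=1}^{r}\sum_{k=1}^{p} r^2_{\hat{U}_{Zi}, z^{(1)}_k}, \qquad \operatorname{trace}\left(\hat{\mathbf{b}}_z^{(1)}\hat{\mathbf{b}}_z^{(1)\prime} + \cdots + \hat{\mathbf{b}}_z^{(r)}\hat{\mathbf{b}}_z^{(r)\prime}\right) = \sum_{i=1}^{r}\sum_{k=1}^{q} r^2_{\hat{V}_{Zi}, z^{(2)}_k}$$

     Proportion of sample variance explained:

     $$R^2_{\mathbf{z}^{(1)} \mid \hat{U}_{Z1}, \ldots, \hat{U}_{Zr}} = \frac{\operatorname{trace}\left(\hat{\mathbf{a}}_z^{(1)}\hat{\mathbf{a}}_z^{(1)\prime} + \cdots + \hat{\mathbf{a}}_z^{(r)}\hat{\mathbf{a}}_z^{(r)\prime}\right)}{\operatorname{trace}(\mathbf{R}_{11})} = \frac{1}{p}\sum_{i=1}^{r}\sum_{k=1}^{p} r^2_{\hat{U}_{Zi}, z^{(1)}_k}$$

     $$R^2_{\mathbf{z}^{(2)} \mid \hat{V}_{Z1}, \ldots, \hat{V}_{Zr}} = \frac{\operatorname{trace}\left(\hat{\mathbf{b}}_z^{(1)}\hat{\mathbf{b}}_z^{(1)\prime} + \cdots + \hat{\mathbf{b}}_z^{(r)}\hat{\mathbf{b}}_z^{(r)\prime}\right)}{\operatorname{trace}(\mathbf{R}_{22})} = \frac{1}{q}\sum_{i=1}^{r}\sum_{k=1}^{q} r^2_{\hat{V}_{Zi}, z^{(2)}_k}$$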
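The proportion of standardized sample variance explained by the first $r$ canonical variates is the sum of squared entries in the first $r$ columns of the inverse coefficient matrix, divided by the number of variables in the set. The following sketch computes it for set 1. The correlation blocks are hypothetical, and the function name `proportion_explained` is an assumption for illustration.

```python
import numpy as np

def inv_sqrt(m):
    """Symmetric inverse square root of a positive definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(w ** -0.5) @ v.T

def coef_a(r11, r12, r22):
    """Rows are a_k' = e_k' R11^{-1/2}, ordered by decreasing eigenvalue."""
    h = inv_sqrt(r11)
    m = h @ r12 @ np.linalg.solve(r22, r12.T) @ h
    evals, e = np.linalg.eigh((m + m.T) / 2)
    return e[:, ::-1].T @ h

def proportion_explained(coef, r):
    """Share of total standardized variance explained by the first r variates.

    coef is a square coefficient matrix for standardized variables; entry
    (k, i) of its inverse is the correlation between variable k and variate i.
    """
    loadings = np.linalg.inv(coef)
    return (loadings[:, :r] ** 2).sum() / coef.shape[0]

# Hypothetical sample correlation blocks with p = q = 2 (illustrative only).
r11 = np.array([[1.0, 0.4], [0.4, 1.0]])
r22 = np.array([[1.0, 0.2], [0.2, 1.0]])
r12 = np.array([[0.5, 0.6], [0.3, 0.4]])

a_z = coef_a(r11, r12, r22)
prop_1 = proportion_explained(a_z, 1)     # first canonical variate only
prop_all = proportion_explained(a_z, 2)   # all p variates explain everything
```

A large canonical correlation does not imply a large proportion of explained variance; the two measures answer different questions, so it is worth reporting both.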

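  15. Large-Sample Tests

     Random sample from a multivariate normal distribution:

     $$\mathbf{X}_j = \begin{bmatrix} \mathbf{X}^{(1)}_j \\ \mathbf{X}^{(2)}_j \end{bmatrix} \sim N_{p+q}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad j = 1, \ldots, n, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix}, \quad \boldsymbol{\Sigma}_{21} = \boldsymbol{\Sigma}_{12}'$$

     $$\mathbf{S} = \begin{bmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{bmatrix} \quad \text{is an unbiased estimator for } \boldsymbol{\Sigma}$$

     $$H_0: \boldsymbol{\Sigma}_{12} = \mathbf{0} \qquad H_A: \boldsymbol{\Sigma}_{12} \ne \mathbf{0} \qquad \text{Note: } \boldsymbol{\Sigma}_{12} = \mathbf{0} \Rightarrow \rho_1^{*2} = \cdots = \rho_p^{*2} = 0$$

     $$-2\ln\Lambda = n\ln\left(\frac{|\mathbf{S}_{11}|\,|\mathbf{S}_{22}|}{|\mathbf{S}|}\right) = -n\ln\prod_{i=1}^{p}\left(1 - \hat{\rho}_i^{*2}\right)$$

     Bartlett correction: reject $H_0$ if

     $$-\left(n - 1 - \frac{p+q+1}{2}\right)\ln\prod_{i=1}^{p}\left(1 - \hat{\rho}_i^{*2}\right) > \chi^2_{pq}(\alpha)$$

     If $H_0$ is rejected, test a sequence of hypotheses:

     $$H_0^{k}: \rho_1^{*2} \ne 0, \ldots, \rho_k^{*2} \ne 0, \; \rho_{k+1}^{*2} = \cdots = \rho_p^{*2} = 0 \qquad H_A^{k}: \text{Not all } \rho_{k+1}^{*2} = \cdots = \rho_p^{*2} = 0$$

     Bartlett correction: reject $H_0^{k}$ if

     $$-\left(n - 1 - \frac{p+q+1}{2}\right)\ln\prod_{i=k+1}^{p}\left(1 - \hat{\rho}_i^{*2}\right) > \chi^2_{(p-k)(q-k)}(\alpha)$$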
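The Bartlett-corrected statistic and its degrees of freedom are simple to compute once the sample canonical correlations are available. The sketch below is a minimal version under stated assumptions: the function name `bartlett_statistic`, the sample size, and the correlation values are hypothetical, and the critical values 9.488 and 3.841 are the standard upper 5% points of the chi-square distribution with 4 and 1 degrees of freedom.

```python
import numpy as np

def bartlett_statistic(rho_hat, n, p, q, k=0):
    """Bartlett-corrected statistic for H0: rho*_{k+1} = ... = rho*_p = 0.

    Returns (statistic, degrees of freedom). With k = 0 this is the overall
    test of Sigma12 = 0.
    """
    rho_hat = np.asarray(rho_hat, dtype=float)
    tail = rho_hat[k:]                                   # correlations under test
    log_prod = np.sum(np.log1p(-tail ** 2))              # ln prod(1 - rho_i^2)
    stat = -(n - 1 - 0.5 * (p + q + 1)) * log_prod
    df = (p - k) * (q - k)
    return stat, df

# Hypothetical sample canonical correlations with n = 100 and p = q = 2.
rho_hat = [0.74, 0.03]
stat0, df0 = bartlett_statistic(rho_hat, n=100, p=2, q=2, k=0)
stat1, df1 = bartlett_statistic(rho_hat, n=100, p=2, q=2, k=1)

reject_overall = stat0 > 9.488    # upper 5% point of chi-square with 4 df
reject_second = stat1 > 3.841     # upper 5% point of chi-square with 1 df
```

In this illustration the overall hypothesis is rejected but the second canonical correlation is not significant, so only the first canonical pair would be retained.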
