Principal Component Analysis (PCA) and Neural Networks Overview
Explore the concepts of Hebbian learning, hierarchical PCA neural networks, second-order methods, principal components, and projection error minimization in the context of machine learning. Understand the principles behind PCA, its application in data visualization, and its significance in signal processing, statistics, and neural computing.
Presentation Transcript
Machine Learning (Part II) Test, Angelo Ciaramella
Question 4: Hebbian learning
The hierarchical PCA NN is:
- a Deep NN
- a Single-Layer NN
- a Multi-Layer NN
Hebbian learning and NNs
NNs based on Hebb's rule:
- Oja's rule (computer scientist Erkki Oja): unsupervised learning, symmetric Oja space
- Sanger's rule (scientist Terence D. Sanger): unsupervised learning, selective principal components
These rules generate algorithms for Principal Component Analysis (PCA), non-linear PCA, and Independent Component Analysis (ICA).
Principal Component Analysis
Principal Component Analysis (PCA) is a statistical technique used for:
- Dimensionality reduction
- Lossy data compression
- Feature extraction
- Data visualization
It is also known as the Karhunen-Loève transform. PCA can be defined as the projection of the data onto a lower-dimensional linear space (the principal subspace) such that the variance of the projected data is maximized.
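As a concrete illustration of this maximum-variance view (not part of the original slides), the NumPy sketch below centres a synthetic data set, eigendecomposes its covariance matrix, and checks that the leading eigenvector is the direction of maximum projected variance. The data and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.3])   # synthetic, anisotropic data

Xc = X - X.mean(axis=0)                 # subtract the mean
S = np.cov(Xc, rowvar=False)            # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
u1 = eigvecs[:, -1]                     # first principal direction (largest eigenvalue)

# Variance of the data projected onto u1 equals the largest eigenvalue
proj_var = np.var(Xc @ u1, ddof=1)
print(proj_var, eigvals[-1])            # the two values should match closely
```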
Second-Order Methods
Second-order methods are the most popular methods for finding a linear transformation. These methods find the representation using only the information contained in the covariance matrix of the data vector x. PCA is widely used in signal processing, statistics, and neural computing.
Principal Components
In a linear projection down to one dimension, the optimum choice of projection, in the sense of minimizing the sum-of-squares error, is obtained by first subtracting off the mean of the data set and then projecting onto the first eigenvector $\mathbf{u}_1$ of the covariance matrix.
Projection Error Minimization
We introduce a complete orthonormal set of D-dimensional basis vectors $\{\mathbf{u}_i\}$, $i = 1, \dots, D$, satisfying
$$\mathbf{u}_i^T \mathbf{u}_j = \delta_{ij}.$$
Because this basis is complete, each data point can be represented by a linear combination of the basis vectors:
$$\mathbf{x}_n = \sum_{i=1}^{D} \alpha_{ni}\, \mathbf{u}_i.$$
Projection Error Minimization
Using the orthonormality property, we can also write $\alpha_{ni} = \mathbf{x}_n^T \mathbf{u}_i$, so that
$$\mathbf{x}_n = \sum_{i=1}^{D} \big(\mathbf{x}_n^T \mathbf{u}_i\big)\, \mathbf{u}_i.$$
Our goal is to approximate this data point using a representation involving a restricted number M < D of variables, corresponding to a projection onto a lower-dimensional subspace:
$$\tilde{\mathbf{x}}_n = \sum_{i=1}^{M} z_{ni}\, \mathbf{u}_i + \sum_{i=M+1}^{D} b_i\, \mathbf{u}_i.$$
Projection Error Minimization
As our distortion measure we use the squared distance between the original point and its approximation, averaged over the data set, so that our goal is to minimize
$$J = \frac{1}{N} \sum_{n=1}^{N} \big\| \mathbf{x}_n - \tilde{\mathbf{x}}_n \big\|^2.$$
The general solution is obtained by choosing the basis vectors to be eigenvectors of the covariance matrix:
$$\mathbf{S}\,\mathbf{u}_i = \lambda_i\, \mathbf{u}_i.$$
Projection Error Minimization
The corresponding value of the distortion measure is then given by
$$J = \sum_{i=M+1}^{D} \lambda_i.$$
We minimize this error by choosing as the eigenvectors defining the principal subspace those corresponding to the M largest eigenvalues.
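The identity above can be checked numerically; the short sketch below (not from the slides, synthetic data) uses a covariance matrix normalized by 1/N, under which the average squared distortion after projecting onto the first M eigenvectors equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # synthetic data
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False, bias=True)                   # covariance with 1/N normalization

eigvals, U = np.linalg.eigh(S)
eigvals, U = eigvals[::-1], U[:, ::-1]                    # descending order

M = 2
X_tilde = Xc @ U[:, :M] @ U[:, :M].T                      # projection onto the principal subspace
J = np.mean(np.sum((Xc - X_tilde) ** 2, axis=1))          # average squared distortion
print(J, eigvals[M:].sum())                               # the two numbers should agree
```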
Complex Distributions
A linear dimensionality-reduction technique such as PCA is unable to detect the lower intrinsic dimensionality of such data: in this case PCA gives two eigenvectors with equal eigenvalues, even though the data can be described by a single parameter. The addition of a small level of noise to data having an intrinsic dimensionality of 1 can increase its intrinsic dimensionality to 2; nevertheless, the data can still be represented to a good approximation by a single variable and can be regarded as having an intrinsic dimensionality of 1.
Unsupervised Neural Networks
Typically, Hebbian-type learning rules are used. There are two types of NN able to extract the principal components:
- Symmetric (Oja, 1989)
- Hierarchical (Sanger, 1989)
Information and Hebbian Learning
Information extraction: a single linear neuron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$ and output
$$y = \mathbf{w}^T \mathbf{x}.$$
Hebbian learning (self-amplification):
$$\Delta w_i = \eta\, y\, x_i,$$
so the net learns to respond to the patterns that present the most frequent samples.
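A minimal sketch of this plain Hebbian update (illustrative, not from the slides): applied to synthetic data it shows the self-amplification effect, with the weight norm growing without bound, which motivates the normalization discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) * np.array([3.0, 1.0, 0.5])  # anisotropic synthetic data
w = rng.normal(size=3) * 0.01
eta = 0.01

for x in X:
    y = w @ x              # neuron output y = w^T x
    w += eta * y * x       # plain Hebbian update: self-amplifying, no normalization

print("||w|| after training:", np.linalg.norm(w))   # keeps growing with more data
```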
Principal Component
Weights can grow to infinity. Solution: normalization (non-local),
$$\mathbf{w} \leftarrow \frac{\mathbf{w}}{\|\mathbf{w}\|},$$
which acts as a competition mechanism leading to a stable solution, with the weights pointing in the direction of maximum variance of the distribution. Maximization of the variance of the output,
$$E\big[y^2\big] = \mathbf{w}^T \mathbf{C}\, \mathbf{w},$$
drives the weights in the direction of the eigenvector corresponding to the maximum eigenvalue of the correlation matrix C.
Oja's Rule: Idea
A single linear neuron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$ and output y, with information feedback from the output to the weight update.
Oja's Rule
Explicit normalization is not local. Oja's rule builds the normalization into the update through a forgetting factor:
$$\Delta w_i = \eta\, y\,(x_i - y\, w_i).$$
With more outputs (symmetric case, L output neurons, $w_{ij}$ the weight from input j to output i):
$$\Delta w_{ij} = \eta\, y_i \Big( x_j - \sum_{k=1}^{L} y_k\, w_{kj} \Big).$$
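A minimal sketch of the single-neuron form of Oja's rule (synthetic data, illustrative constants): under the usual conditions the weight vector tends to align with the leading eigenvector of the correlation matrix while keeping unit norm.

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.diag([4.0, 2.0, 1.0])                       # correlation matrix with known eigenvectors
X = rng.multivariate_normal(np.zeros(3), C, size=5000)

w = rng.normal(size=3)
w /= np.linalg.norm(w)
eta = 0.005

for x in X:
    y = w @ x
    w += eta * y * (x - y * w)                     # Oja's rule: Hebbian term + forgetting term

print("learned w:", np.round(w, 3))                # should align with [1, 0, 0] up to sign
```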
Symmetric NN
Single-layer neural network (Symmetric PCA NN). Objective function:
$$E\big[y^2\big] = E\big[(\mathbf{w}^T \mathbf{x})^2\big].$$
Sanger's Rule
Sanger's learning rule (hierarchical case, the sum running only up to the current neuron i):
$$\Delta w_{ij} = \eta\, y_i \Big( x_j - \sum_{k=1}^{i} y_k\, w_{kj} \Big).$$
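A sketch of Sanger's rule (generalized Hebbian algorithm) on synthetic data; weight matrix layout (one row per output neuron), step size and sample count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.diag([5.0, 3.0, 1.0, 0.5])                  # known eigenstructure
X = rng.multivariate_normal(np.zeros(4), C, size=20000)

n_out, n_in = 2, 4                                 # extract the first two components
W = rng.normal(size=(n_out, n_in)) * 0.1
eta = 0.002

for x in X:
    y = W @ x
    for i in range(n_out):
        # each neuron learns on the residual left by itself and the previous neurons
        residual = x - W[:i + 1].T @ y[:i + 1]
        W[i] += eta * y[i] * residual

print(np.round(W, 2))   # rows should approach the first two eigenvectors, in order
```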
Hierarchical NN
Single-layer neural network (Hierarchical PCA NN).
Oja's Rule vs. Sanger's Rule
- Oja's rule: symmetric space; principal components without a specific order.
- Sanger's rule: hierarchical space; principal components in a specific order: the weights of the first output neuron correspond to the first component, the weights of the second neuron to the second residual component, and so on.
Mixing Matrix (figure)
Non-linear Objective Function
Maximize with respect to w (the weights)
$$J(\mathbf{w}) = E\big[f(\mathbf{w}^T \mathbf{x})\big],$$
where x is an L-dimensional vector, E is the expectation with respect to the (unknown) density of x, and f(.) is a continuous function, e.g. ln cosh(.).
Taylor series:
$$\ln\cosh(y) = \frac{1}{2}y^2 - \frac{1}{12}y^4 + \frac{1}{45}y^6 + O(y^8),$$
so that
$$E\big[\ln\cosh(\mathbf{w}^T\mathbf{x})\big] = \frac{1}{2}E\big[(\mathbf{w}^T\mathbf{x})^2\big] - \frac{1}{12}E\big[(\mathbf{w}^T\mathbf{x})^4\big] + \frac{1}{45}E\big[(\mathbf{w}^T\mathbf{x})^6\big] + O\big(E[(\mathbf{w}^T\mathbf{x})^8]\big).$$
If C = I and $\|\mathbf{w}\| = 1$, then $\frac{1}{2}E[(\mathbf{w}^T\mathbf{x})^2] = \frac{1}{2}$ is constant, so the term $\frac{1}{12}E[(\mathbf{w}^T\mathbf{x})^4]$ is the dominating one, and the kurtosis is optimized.
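A quick numeric check of the Taylor expansion used above (a sketch, not from the slides): for small |y| the sixth-order polynomial tracks ln cosh(y) closely, consistent with the O(y^8) remainder.

```python
import numpy as np

y = np.linspace(-0.5, 0.5, 101)
exact = np.log(np.cosh(y))
approx = 0.5 * y**2 - y**4 / 12 + y**6 / 45
print(np.max(np.abs(exact - approx)))   # very small for |y| < 1
```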
Robust and Non-linear PCA
Standard PCA maximizes
$$J(\mathbf{w}_i) = E\big[(\mathbf{w}_i^T \mathbf{x})^2\big].$$
Robust PCA replaces the quadratic criterion with
$$J(\mathbf{w}_i) = E\big[f(\mathbf{w}_i^T \mathbf{x})\big],$$
while non-linear PCA minimizes the reconstruction error through a non-linearity g(.):
$$J(\mathbf{w}_i) = E\Big[\big\|\mathbf{x} - \sum_{j=1}^{I} g(y_j)\,\mathbf{w}_j\big\|^2\Big].$$
Descendent (stochastic) gradient algorithm:
$$\mathbf{w}_i(k+1) = \mathbf{w}_i(k) + \mu(k)\, g\big(y_i(k)\big)\, \mathbf{e}(k), \qquad
\mathbf{e}(k) = \mathbf{x}(k) - \sum_{j=1}^{I} y_j(k)\,\mathbf{w}_j(k).$$
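The sketch below is one possible reading of the update above, written as a stochastic-gradient loop; the symmetric error vector fed through the nonlinearity, the tanh choice for g(.), and all names and constants are assumptions for illustration, not the slides' exact algorithm.

```python
import numpy as np

def nonlinear_pca(X, n_out, eta=0.002, g=np.tanh, seed=0):
    """Stochastic-gradient sketch of a symmetric nonlinear PCA rule.

    Assumed update: w_i(k+1) = w_i(k) + eta * g(y_i(k)) * e(k),
    with e(k) = x(k) - sum_j g(y_j(k)) * w_j(k).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_out, X.shape[1])) * 0.1
    for x in X:
        y = W @ x                      # outputs y_i = w_i^T x
        e = x - W.T @ g(y)             # reconstruction error through the nonlinearity
        W += eta * np.outer(g(y), e)   # the same error vector is fed back to every neuron
    return W

# Illustrative usage on synthetic super-Gaussian data
X = np.random.default_rng(1).laplace(size=(5000, 4))
W = nonlinear_pca(X, n_out=2)
print(W.shape)
```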
Cocktail Party (figure)
Sources s are mixed by a mixing matrix A to give the observed mixtures x; a demixing matrix W is then applied to obtain the estimated sources y.
Source Estimation
s1(t), s2(t), s3(t) are the source signals; x1(t), x2(t), x3(t) are the observed (mixed) signals; y1(t), y2(t), y3(t) are the separated signals.
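As an illustration of this setup (not from the slides), the sketch below mixes two synthetic sources with a known matrix A and uses scikit-learn's FastICA, one standard ICA method, to estimate the sources from the mixtures; the signals and mixing matrix are invented for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic sources s1(t), s2(t): a sinusoid and a square wave
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])            # mixing matrix
X = S @ A.T                           # observed mixtures x = A s

ica = FastICA(n_components=2, random_state=0)
Y = ica.fit_transform(X)              # estimated sources, up to permutation and scaling
print(Y.shape)
```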
References
Material: slides, video lessons, books.
- D. Floreano, Manuale sulle Reti Neurali, Il Mulino, 1996
- C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006