Understanding Matrix Factorization for Latent Factor Recovery
Explore matrix factorization for recovering latent factors in a matrix, using user ratings of movies as the running example. The technique decomposes a ratings matrix into the product of low-rank factor matrices that expose hidden patterns and relationships, and the factors can be learned with stochastic gradient descent, including a distributed variant that scales to large matrices; the same low-rank machinery also appears in tasks such as image denoising.
Presentation Transcript
Matrix Factorization
Recovering latent factors in a matrix

V is an n × m matrix (n users, m movies) with entries v11 … vnm, where V[i,j] = user i's rating of movie j.
Recovering latent factors in a matrix

V (n users × m movies) is approximated by a rank-2 product: an n × 2 matrix with a row (xi, yi) per user, times a 2 × m matrix with a column (aj, bj) per movie, so V[i,j] ≈ xi aj + yi bj, where V[i,j] = user i's rating of movie j.
KDD 2011 talk pilfered from ..
Recovering latent factors in a matrix

More generally, V (n users × m movies) ≈ W H for a chosen rank r, where W is n × r (a row of r factors per user) and H is r × m (a column of r factors per movie); V[i,j] = user i's rating of movie j.
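Restating the slide's picture in notation:

```latex
V \approx W H, \qquad W \in \mathbb{R}^{n \times r}, \quad H \in \mathbb{R}^{r \times m},
\qquad V[i,j] \approx \sum_{k=1}^{r} W[i,k]\, H[k,j]
```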
Matrix factorization as SGD

[slide shows the SGD update rule, with the step size highlighted; a reconstruction follows]
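The update rule itself is only an image in the source; a standard reconstruction for the squared loss on a single observed entry, with the constant 2 folded into the step size ε, would be:

```latex
e_{ij} = V[i,j] - w_i^{\top} h_j, \qquad
w_i \leftarrow w_i + \varepsilon\, e_{ij}\, h_j, \qquad
h_j \leftarrow h_j + \varepsilon\, e_{ij}\, w_i
```

where w_i is row i of W and h_j is column j of H.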
Matrix factorization as SGD: why does this work?

[same update rule; step size highlighted]
Matrix factorization as SGD: why does this work? Here's the key claim:

[the claim itself is shown on the slide]
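The claim is not in the transcript; one plausible reconstruction, since the loss is a sum over the set Z of observed entries, is that the gradient at a uniformly sampled entry is an unbiased estimate of the full gradient:

```latex
L(W,H) = \sum_{(i,j) \in Z} l\big(V[i,j],\, w_i^{\top} h_j\big), \qquad
\mathbb{E}_{(i,j) \sim \mathrm{Unif}(Z)} \big[ \nabla l_{ij}(W,H) \big]
  = \tfrac{1}{|Z|}\, \nabla L(W,H)
```

so the per-entry updates follow the full gradient in expectation, which is exactly the setting in which SGD converges.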
Checking the claim

Think of SGD for logistic regression: the LR loss compares y and ŷ = dot(w, x). This is similar, but now we update both w (the user weights) and x (the movie weights); a sketch follows.
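A minimal runnable sketch of this analogy (function and parameter names are mine, not the talk's):

```python
import numpy as np

def sgd_mf(ratings, n, m, rank=10, step=0.01, epochs=20, seed=0):
    """Sequential SGD for V ~ W @ H on a list of (i, j, v) triples.

    A sketch of the update discussed above; defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n, rank))   # row i = user i's weights w_i
    H = rng.normal(scale=0.1, size=(rank, m))   # column j = movie j's weights h_j
    for _ in range(epochs):
        rng.shuffle(ratings)                    # visit entries in random order
        for i, j, v in ratings:
            err = v - W[i] @ H[:, j]            # compare v and dot(w_i, h_j), as in LR
            w_i = W[i].copy()
            W[i] += step * err * H[:, j]        # update the user weights ...
            H[:, j] += step * err * w_i         # ... and, unlike LR, the movie weights too
    return W, H
```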
What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF factors for the users/movies; also the generalized KL-divergence.
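The formulas appear only as slide images; hedged reconstructions of the two variants named, writing n1_i, n2_j for the diagonal entries of N1, N2 (my notation):

```latex
% Weighted squared loss with per-user / per-movie (IDF-like) weights:
L_{w}(W,H) = \sum_{(i,j) \in Z} n^{(1)}_{i}\, n^{(2)}_{j}\,
             \big(V[i,j] - [WH]_{ij}\big)^{2}

% Generalized KL-divergence, as used for nonnegative factorizations:
L_{\mathrm{GKL}}(W,H) = \sum_{(i,j) \in Z}
    \Big( V[i,j] \log \frac{V[i,j]}{[WH]_{ij}} - V[i,j] + [WH]_{ij} \Big)
```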
Similar to McDonnell et al., with perceptron learning.
More detail…

- Randomly permute the rows/columns of the matrix.
- Chop V, W, H into blocks of size d × d: n/d blocks in W, m/d blocks in H.
- Group the data: pick a set of blocks with no overlapping rows or columns (a stratum); repeat until all blocks in V are covered.
- Train the SGD (see the sketch after this list): process strata in series; process blocks within a stratum in parallel.
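A minimal sketch of this stratified scheme, under simplifying assumptions (ratings held as a dict {(i, j): v}, d dividing both n and m); block-level parallelism is only simulated with a loop here:

```python
import numpy as np

def dsgd_epoch(ratings, W, H, d, step):
    """One stratified epoch: d sub-epochs, each a stratum of d disjoint blocks.

    A sketch, not the paper's implementation: W is n x r, H is r x m.
    """
    n, m = W.shape[0], H.shape[1]
    bn, bm = n // d, m // d                        # block height and width
    shift = np.random.randint(d)                   # vary the strata sequence per epoch
    for k in range(d):                             # one sub-epoch per stratum
        for b in range(d):                         # blocks in a stratum share no rows
            c = (b + k + shift) % d                #   or columns, so a real system
            for i in range(b * bn, (b + 1) * bn):  #   trains these d blocks in parallel
                for j in range(c * bm, (c + 1) * bm):
                    if (i, j) in ratings:          # scan cells for clarity; real code
                        err = ratings[i, j] - W[i] @ H[:, j]   # pre-groups entries by block
                        w_i = W[i].copy()
                        W[i] += step * err * H[:, j]
                        H[:, j] += step * err * w_i
```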
More detail… (notation change: Z here is what the earlier slides called V)
More detail…

- Initialize W, H randomly, not at zero.
- Choose a random ordering (random sort) of the points in a stratum in each sub-epoch.
- Pick the strata sequence by permuting the rows and columns of M, and using M[k,i] as the column index of row i in sub-epoch k.
- Use the bold driver heuristic to set the step size (see the sketch after this list): increase the step size when the loss decreases (in an epoch); decrease it when the loss increases.
- Implemented in Hadoop and R/Snowfall.

[slide shows the permutation matrix M]
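A minimal sketch of the bold driver rule; the growth and decay factors below are illustrative, the talk does not state them:

```python
def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
    """Grow the step size after an epoch where the loss fell; cut it sharply otherwise."""
    return step * grow if loss < prev_loss else step * shrink
```

Called once per epoch, e.g. step = bold_driver(step, prev_loss, loss).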
Wall Clock Time

[plot: 8 nodes, 64 cores, R/snow]

Number of Epochs

[plot]

Varying rank

[plot: 100 epochs for all]
Hadoop scalability

[plot] Hadoop process setup time starts to dominate.

Hadoop scalability

[plot]