Understanding Matrix Factorization for Latent Factor Recovery
Explore matrix factorization for recovering latent factors in a matrix, using user ratings of movies as the running example. The technique decomposes a ratings matrix into the product of low-rank factor matrices that expose hidden patterns and relationships, and the factors can be learned with stochastic gradient descent, including a distributed variant that scales to large matrices; the same low-rank machinery also appears in tasks such as image denoising.
Presentation Transcript
Matrix Factorization
Recovering latent factors in a matrix

V is an n × m matrix (n users, m movies) with entries v11 … vnm, where V[i,j] = user i's rating of movie j.
Recovering latent factors in a matrix

V (n users × m movies) is approximated by a rank-2 product: an n × 2 matrix with a row (xi, yi) per user, times a 2 × m matrix with a column (aj, bj) per movie, so V[i,j] ≈ xi aj + yi bj, where V[i,j] = user i's rating of movie j.
KDD 2011 talk pilfered from ..
Recovering latent factors in a matrix

More generally, V (n users × m movies) ≈ W H for a chosen rank r, where W is n × r (a row of r factors per user) and H is r × m (a column of r factors per movie); V[i,j] = user i's rating of movie j.
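Restating the slide's picture in notation:

```latex
V \approx W H, \qquad W \in \mathbb{R}^{n \times r}, \quad H \in \mathbb{R}^{r \times m},
\qquad V[i,j] \approx \sum_{k=1}^{r} W[i,k]\, H[k,j]
```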
Matrix factorization as SGD

[slide shows the SGD update rule, with the step size highlighted; a reconstruction follows]
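The update rule itself is only an image in the source; a standard reconstruction for the squared loss on a single observed entry, with the constant 2 folded into the step size ε, would be:

```latex
e_{ij} = V[i,j] - w_i^{\top} h_j, \qquad
w_i \leftarrow w_i + \varepsilon\, e_{ij}\, h_j, \qquad
h_j \leftarrow h_j + \varepsilon\, e_{ij}\, w_i
```

where w_i is row i of W and h_j is column j of H.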
Matrix factorization as SGD: why does this work?

[same update rule; step size highlighted]
Matrix factorization as SGD: why does this work? Here's the key claim:

[the claim itself is shown on the slide]
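The claim is not in the transcript; one plausible reconstruction, since the loss is a sum over the set Z of observed entries, is that the gradient at a uniformly sampled entry is an unbiased estimate of the full gradient:

```latex
L(W,H) = \sum_{(i,j) \in Z} l\big(V[i,j],\, w_i^{\top} h_j\big), \qquad
\mathbb{E}_{(i,j) \sim \mathrm{Unif}(Z)} \big[ \nabla l_{ij}(W,H) \big]
  = \tfrac{1}{|Z|}\, \nabla L(W,H)
```

so the per-entry updates follow the full gradient in expectation, which is exactly the setting in which SGD converges.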
Checking the claim

Think of SGD for logistic regression: the LR loss compares y and ŷ = dot(w, x). This is similar, but now we update both w (the user weights) and x (the movie weights); a sketch follows.
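A minimal runnable sketch of this analogy (function and parameter names are mine, not the talk's):

```python
import numpy as np

def sgd_mf(ratings, n, m, rank=10, step=0.01, epochs=20, seed=0):
    """Sequential SGD for V ~ W @ H on a list of (i, j, v) triples.

    A sketch of the update discussed above; defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n, rank))   # row i = user i's weights w_i
    H = rng.normal(scale=0.1, size=(rank, m))   # column j = movie j's weights h_j
    for _ in range(epochs):
        rng.shuffle(ratings)                    # visit entries in random order
        for i, j, v in ratings:
            err = v - W[i] @ H[:, j]            # compare v and dot(w_i, h_j), as in LR
            w_i = W[i].copy()
            W[i] += step * err * H[:, j]        # update the user weights ...
            H[:, j] += step * err * w_i         # ... and, unlike LR, the movie weights too
    return W, H
```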
What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF factors for the users/movies; also the generalized KL-divergence.
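The formulas appear only as slide images; hedged reconstructions of the two variants named, writing n1_i, n2_j for the diagonal entries of N1, N2 (my notation):

```latex
% Weighted squared loss with per-user / per-movie (IDF-like) weights:
L_{w}(W,H) = \sum_{(i,j) \in Z} n^{(1)}_{i}\, n^{(2)}_{j}\,
             \big(V[i,j] - [WH]_{ij}\big)^{2}

% Generalized KL-divergence, as used for nonnegative factorizations:
L_{\mathrm{GKL}}(W,H) = \sum_{(i,j) \in Z}
    \Big( V[i,j] \log \frac{V[i,j]}{[WH]_{ij}} - V[i,j] + [WH]_{ij} \Big)
```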
Similar to McDonnell et al., with perceptron learning.
More detail…

- Randomly permute the rows/columns of the matrix.
- Chop V, W, H into blocks of size d × d: n/d blocks in W, m/d blocks in H.
- Group the data: pick a set of blocks with no overlapping rows or columns (a stratum); repeat until all blocks in V are covered.
- Train the SGD (see the sketch after this list): process strata in series; process blocks within a stratum in parallel.
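A minimal sketch of this stratified scheme, under simplifying assumptions (ratings held as a dict {(i, j): v}, d dividing both n and m); block-level parallelism is only simulated with a loop here:

```python
import numpy as np

def dsgd_epoch(ratings, W, H, d, step):
    """One stratified epoch: d sub-epochs, each a stratum of d disjoint blocks.

    A sketch, not the paper's implementation: W is n x r, H is r x m.
    """
    n, m = W.shape[0], H.shape[1]
    bn, bm = n // d, m // d                        # block height and width
    shift = np.random.randint(d)                   # vary the strata sequence per epoch
    for k in range(d):                             # one sub-epoch per stratum
        for b in range(d):                         # blocks in a stratum share no rows
            c = (b + k + shift) % d                #   or columns, so a real system
            for i in range(b * bn, (b + 1) * bn):  #   trains these d blocks in parallel
                for j in range(c * bm, (c + 1) * bm):
                    if (i, j) in ratings:          # scan cells for clarity; real code
                        err = ratings[i, j] - W[i] @ H[:, j]   # pre-groups entries by block
                        w_i = W[i].copy()
                        W[i] += step * err * H[:, j]
                        H[:, j] += step * err * w_i
```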
More detail… (notation change: Z here is what the earlier slides called V)
More detail…

- Initialize W, H randomly, not at zero.
- Choose a random ordering (random sort) of the points in a stratum in each sub-epoch.
- Pick the strata sequence by permuting the rows and columns of M, and using M[k,i] as the column index of row i in sub-epoch k.
- Use the bold driver heuristic to set the step size (see the sketch after this list): increase the step size when the loss decreases (in an epoch); decrease it when the loss increases.
- Implemented in Hadoop and R/Snowfall.

[slide shows the permutation matrix M]
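A minimal sketch of the bold driver rule; the growth and decay factors below are illustrative, the talk does not state them:

```python
def bold_driver(step, prev_loss, loss, grow=1.05, shrink=0.5):
    """Grow the step size after an epoch where the loss fell; cut it sharply otherwise."""
    return step * grow if loss < prev_loss else step * shrink
```

Called once per epoch, e.g. step = bold_driver(step, prev_loss, loss).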
Wall Clock Time

[plot: 8 nodes, 64 cores, R/snow]

Number of Epochs

[plot]

Varying rank

[plot: 100 epochs for all]
Hadoop scalability

[plot] Hadoop process setup time starts to dominate.

Hadoop scalability

[plot]