Understanding Matrix Factorization for Latent Factor Recovery

Explore matrix factorization for recovering latent factors in a matrix, focusing on user ratings of movies. The technique decomposes a matrix into the product of lower-rank factor matrices to expose hidden patterns and relationships; the factors can be fit with stochastic gradient descent, and the same machinery applies to tasks such as image denoising. The slides below walk through how the approach works, how the SGD version can be parallelized, and how it scales.



Presentation Transcript


  1. Matrix Factorization

  2. Recovering latent factors in a matrix. V is an n x m ratings matrix (n users, m movies) with entries v11 ... vnm; V[i,j] = user i's rating of movie j.

  3. Recovering latent factors in a matrix. Approximate the n x m ratings matrix by a rank-2 product: each user i gets a latent vector (xi, yi) and each movie j gets a latent vector (aj, bj), so V[i,j] = user i's rating of movie j is approximated by xi*aj + yi*bj.

  4. KDD 2011 talk pilfered from ..

  5. Recovering latent factors in a matrix. In general V ~= W H with rank r: W is the n x r matrix of user factors (one row per user) and H is the r x m matrix of movie factors (one column per movie). V[i,j] = user i's rating of movie j.
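
  To make the model concrete, here is a minimal numpy sketch of the rank-r factorization V ~= W H (variable names and sizes are illustrative, not from the talk): W holds one r-dimensional latent vector per user, H holds one per movie, and their product gives the predicted ratings.

      import numpy as np

      n_users, n_movies, rank = 4, 5, 2        # illustrative sizes
      rng = np.random.default_rng(0)

      W = rng.normal(size=(n_users, rank))     # user latent factors (n x r)
      H = rng.normal(size=(rank, n_movies))    # movie latent factors (r x m)

      V_hat = W @ H                            # predicted ratings matrix (n x m)
      print(V_hat[1, 3])                       # predicted rating of user 1 for movie 3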

  6. Matrix factorization for image denoising.

  7. Matrix factorization as SGD: the update rule, with a step size (learning rate) parameter.

  8. Matrix factorization as SGD - why does this work?

  9. Matrix factorization as SGD - why does this work? Here's the key claim:

  10. Checking the claim. Think of SGD for logistic regression: the LR loss compares y with the prediction dot(w, x). Matrix factorization is similar, but now we update both w (the user weights) and x (the movie weights).
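
  A sketch of the per-rating SGD update for the squared-error loss with L2 regularization, assuming the observed ratings are given as (user, movie, rating) triples; the step size and regularization constant are illustrative assumptions, not values from the talk.

      import numpy as np

      def sgd_epoch(ratings, W, H, step_size=0.01, reg=0.05):
          """One SGD pass over the observed ratings (i, j, v).
          Each entry contributes a gradient step on (v - W[i] . H[:, j])^2,
          updating the user row W[i] and the movie column H[:, j] together
          (the analogue of updating both w and x in the LR comparison)."""
          for i, j, v in ratings:
              err = v - W[i] @ H[:, j]          # prediction error on this entry
              w_i = W[i].copy()                 # snapshot before updating W[i]
              W[i] += step_size * (err * H[:, j] - reg * W[i])
              H[:, j] += step_size * (err * w_i - reg * H[:, j])
          return W, H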

  11. What loss functions are possible? N1, N2 are diagonal matrices, acting roughly like IDF weighting factors for the users/movies; another option is generalized KL-divergence.
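
  Written out as a hedged sketch (the exact weighting used in the talk may differ): a Frobenius loss weighted by the diagonal matrices N1 (per-user) and N2 (per-movie), and the generalized KL-divergence familiar from nonnegative matrix factorization.

      L_w(W,H) = \| N_1 (V - W H) N_2 \|_F^2, \quad N_1 \in \mathbb{R}^{n \times n}, \; N_2 \in \mathbb{R}^{m \times m} \text{ diagonal}

      D_{KL}(V \,\|\, W H) = \sum_{i,j} \Big( V_{ij} \log \tfrac{V_{ij}}{(W H)_{ij}} - V_{ij} + (W H)_{ij} \Big)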

  12. What loss functions are possible?

  13. What loss functions are possible?

  14. ALS = alternating least squares
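
  For contrast with SGD, here is a minimal sketch of one ALS round on a dense, fully observed V (illustrative only; with sparse ratings each user and each movie would solve its own least-squares problem over just its observed entries).

      import numpy as np

      def als_round(V, W, H, reg=0.1):
          """One round of alternating least squares: fix H and solve a
          ridge-regression problem for W, then fix W and solve for H."""
          r = W.shape[1]
          # W-step: every user row solves (H H^T + reg I) w_i = H v_i
          W[:] = np.linalg.solve(H @ H.T + reg * np.eye(r), H @ V.T).T
          # H-step: every movie column solves (W^T W + reg I) h_j = W^T v_j
          H[:] = np.linalg.solve(W.T @ W + reg * np.eye(r), W.T @ V)
          return W, H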

  15. KDD 2011 talk pilfered from ..

  16. Similar to McDonnell et al. with perceptron learning

  17. Slow convergence..

  18. More detail:
      - Randomly permute the rows/cols of the matrix.
      - Chop V, W, H into blocks of size d x d (giving n/d row blocks of W and m/d column blocks of H).
      - Group the data: pick a set of blocks with no overlapping rows or columns (a stratum); repeat until all blocks of V are covered.
      - Train with SGD: process strata in series, and process the blocks within a stratum in parallel (see the stratum sketch below).
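
  One simple way to form the strata is a cyclic shift over the block indices (the talk instead derives the strata sequence from a permutation matrix M, as slide 20 describes); num_blocks below is the number of row/column blocks and is an illustrative name.

      def strata(num_blocks):
          """Yield strata for a num_blocks x num_blocks blocking of V.
          Within a stratum no two blocks share a row block or a column block,
          so they touch disjoint pieces of W and H and can be trained in
          parallel; the strata themselves are processed in series."""
          for k in range(num_blocks):
              yield [(i, (i + k) % num_blocks) for i in range(num_blocks)]

      for stratum in strata(3):   # 3 x 3 blocking: three strata of three blocks each
          print(stratum)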

  19. More detail. (Z in this slide's figure is the matrix called V earlier.)

  20. More detail:
      - Initialize W, H randomly, not at zero.
      - Choose a random ordering (random sort) of the points in a stratum in each sub-epoch.
      - Pick the strata sequence by permuting the rows and columns of M, and using M[k,i] as the column index of row i in sub-epoch k.
      - Use the bold driver heuristic to set the step size: increase the step size when the loss decreases (in an epoch), decrease it when the loss increases (a sketch follows below).
      - Implemented in Hadoop and R/Snowfall.
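
  A sketch of the bold driver step-size rule; the grow/shrink factors are illustrative assumptions, not the values used in the experiments.

      def bold_driver(step_size, prev_loss, curr_loss, grow=1.05, shrink=0.5):
          """After each epoch: if the training loss decreased, grow the step
          size slightly; if it increased, cut the step size sharply."""
          return step_size * (grow if curr_loss < prev_loss else shrink)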

  21. Wall clock time (8 nodes, 64 cores, R/snow)

  22. Number of epochs

  23. Varying rank (100 epochs for all)

  24. Hadoop scalability: Hadoop process setup time starts to dominate

  25. Hadoop scalability
