Matrix Factorization for Latent Factor Recovery

 

These slides cover matrix factorization for recovering latent factors in a matrix, using user ratings of movies as the running example. The technique approximates a ratings matrix by the product of two low-rank factor matrices, exposing hidden patterns and relationships, and it also applies to tasks such as image denoising. The factors can be learned with stochastic gradient descent, including a blocked, stratified variant that parallelizes well, or with alternating least squares.

  • Matrix Factorization
  • Latent Factors
  • Stochastic Gradient Descent
  • Movie Ratings
  • Image Denoising


Presentation Transcript


  1. Matrix Factorization

  2. Recovering latent factors in a matrix: V is an n x m matrix, with n users as rows and m movies as columns, and V[i,j] = user i's rating of movie j. [The slide shows the matrix with entries v11 … vij … vnm.]
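
Since each user typically rates only a small fraction of the movies, V is usually stored sparsely, as just the set of observed entries. A minimal sketch (the variable names are illustrative, not from the slides):

    # Sparse representation of the ratings matrix V:
    # keep only the observed (user i, movie j, rating) triples.
    ratings = [
        (0, 3, 5.0),   # user 0 gave movie 3 five stars
        (0, 7, 3.0),
        (2, 3, 4.0),
    ]
    V = {(i, j): r for (i, j, r) in ratings}   # V[i, j] = user i's rating of movie j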

  3. Recovering latent factors in a matrix: approximate V by a product of two narrow factor matrices, where each user i gets a factor vector (x_i, y_i, …) and each movie j gets a factor vector (a_j, b_j, …), so that V[i,j] = user i's rating of movie j is approximated by the dot product of the two. [The slide shows V ~ user factors times movie factors.]

  4. Talk pilfered from … (KDD 2011).

  5. Recovering latent factors in a matrix: V ~ W H, where W is the n x r matrix of user factors, H is the r x m matrix of movie factors, and r is the number of latent factors; as before, V[i,j] = user i's rating of movie j.
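
A minimal NumPy sketch of the shapes involved: W holds one r-dimensional factor vector per user, H one per movie, and a predicted rating is their dot product (the sizes below are illustrative):

    import numpy as np

    n, m, r = 1000, 500, 10          # users, movies, latent factors (illustrative sizes)
    W = 0.1 * np.random.randn(n, r)  # user factors: one row per user
    H = 0.1 * np.random.randn(r, m)  # movie factors: one column per movie

    def predict(i, j):
        """Approximate V[i, j] by the dot product of user i's and movie j's factors."""
        return W[i] @ H[:, j]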

  6. Matrix factorization for image denoising (example application).

  7. Matrix factorization as SGD. [The slide shows the SGD update equations, with the step size labeled.]
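
The update equations themselves are an image on the slide and are not captured in this transcript; for plain squared error they are commonly written as below, with step size $\eta$ (a standard reconstruction, not copied from the slide):

    \epsilon_{ij} = V_{ij} - \mathbf{w}_i^\top \mathbf{h}_j, \qquad
    \mathbf{w}_i \leftarrow \mathbf{w}_i + \eta\,\epsilon_{ij}\,\mathbf{h}_j, \qquad
    \mathbf{h}_j \leftarrow \mathbf{h}_j + \eta\,\epsilon_{ij}\,\mathbf{w}_i

Here $\mathbf{w}_i$ is row $i$ of $W$ and $\mathbf{h}_j$ is column $j$ of $H$; each observed rating touches only one row of W and one column of H, which is what makes the blocked parallel scheme later in the deck possible.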

  8. Matrix factorization as SGD - why does this work? [The slide shows the same update, with the step size labeled.]

  9. Matrix factorization as SGD - why does this work? Here's the key claim: [stated as an equation on the slide, not captured in this transcript].

  10. Checking the claim: think of SGD for logistic regression, where the LR loss compares y and ŷ = dot(w,x). Matrix factorization is similar, except that now we update both w (the user weights) and x (the movie weights).
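
A short code sketch of that analogy: the same one-example-at-a-time update as SGD for logistic regression, except that both the user factors and the movie factors get adjusted. (A sketch under squared-error loss; the names and the learning rate are illustrative.)

    import numpy as np

    def sgd_epoch(ratings, W, H, lr=0.01):
        """One SGD pass over the observed ratings: for each (i, j, v),
        update both the user factors W[i] and the movie factors H[:, j]."""
        for i, j, v in ratings:
            err = v - W[i] @ H[:, j]       # compare y and y_hat = dot(w, x)
            w_old = W[i].copy()            # snapshot so both updates use the old values
            W[i]    += lr * err * H[:, j]  # "user weight" update
            H[:, j] += lr * err * w_old    # "movie weight" update
        return W, H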

  11. What loss functions are possible? N1, N2 are diagonal matrices, sort of like IDF factors for the users/movies; another option is the 'generalized' KL-divergence.
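
For reference, the 'generalized' KL-divergence between V and its reconstruction WH is usually written as (standard form, not transcribed from the slide):

    D(V \,\|\, WH) = \sum_{i,j} \Big( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Big)

It reduces to the ordinary KL-divergence when both V and WH sum to one.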

  12. What loss functions are possible? (continued; the loss formulas are shown on the slide)

  13. What loss functions are possible? (continued)

  14. ALS = alternating least squares
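
A minimal dense sketch of the idea, for contrast with SGD: fix H and solve a regularized least-squares problem for W, then fix W and solve for H, and repeat. (An illustrative fully-observed version with a small ridge term; the variant in the slides may differ, e.g. in how it handles missing entries.)

    import numpy as np

    def als(V, r, iters=20, reg=0.1):
        """Alternating least squares for V ~ W @ H (dense V for simplicity)."""
        n, m = V.shape
        W = 0.1 * np.random.randn(n, r)
        H = 0.1 * np.random.randn(r, m)
        I = reg * np.eye(r)
        for _ in range(iters):
            # Fix H, solve for W: each row W[i] minimizes ||V[i] - W[i] @ H||^2 + reg*||W[i]||^2
            W = np.linalg.solve(H @ H.T + I, H @ V.T).T
            # Fix W, solve for H: each column H[:, j] minimizes ||V[:, j] - W @ H[:, j]||^2 + reg*||H[:, j]||^2
            H = np.linalg.solve(W.T @ W + I, W.T @ V)
        return W, H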

  15. Talk pilfered from … (KDD 2011).

  16. Similar to McDonnell et al., with perceptron learning.

  17. Slow convergence…

  18. More detail: randomly permute the rows/cols of the matrix; chop V, W, H into blocks of size d x d (n/d row-blocks in W, m/d column-blocks in H); group the data by picking a set of blocks with no overlapping rows or columns (a stratum), and repeat until all blocks in V are covered; then train the SGD, processing strata in series and the blocks within a stratum in parallel.
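
A sketch of the stratum idea in code, using a simple diagonal-shift schedule over the grid of blocks and assuming the same number of row-blocks and column-blocks (the schedule the slides actually use is driven by a permutation matrix M, described below):

    def strata(num_blocks):
        """Yield strata: each stratum is a list of (row_block, col_block) pairs
        that share no rows and no columns, so its blocks can be trained in
        parallel; successive strata together cover every block of V."""
        for shift in range(num_blocks):
            yield [(b, (b + shift) % num_blocks) for b in range(num_blocks)]

    # Example with a 4 x 4 grid of blocks: 4 strata of 4 disjoint blocks each.
    for stratum in strata(4):
        print(stratum)   # process these blocks in parallel; strata run in series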

  19. More detail: Z was V.

  20. More detail: initialize W, H randomly, not at zero; choose a random ordering (random sort) of the points within a stratum in each 'sub-epoch'; pick the strata sequence by permuting the rows and columns of M, and using M'[k,i] as the column index of row i in sub-epoch k; use 'bold driver' to set the step size, increasing it when the loss decreases over an epoch and decreasing it when the loss increases. Implemented in Hadoop and R/Snowfall. [The slide shows the permutation matrix M.]
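
A sketch of the 'bold driver' heuristic described here; the growth and shrink factors are illustrative, since the slides do not give specific values:

    def bold_driver(step, prev_loss, curr_loss, grow=1.05, shrink=0.5):
        """Adapt the SGD step size once per epoch: grow it slightly when the
        loss decreased, cut it sharply when the loss increased."""
        return step * grow if curr_loss < prev_loss else step * shrink

    # e.g. at the end of each epoch:  step = bold_driver(step, prev_loss, curr_loss)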

  21. Wall Clock Time: 8 nodes, 64 cores, R/snow. [Plot on slide.]

  22. Number of Epochs. [Plot on slide.]

  23. Varying rank (100 epochs for all). [Plot on slide.]

  24. Hadoop scalability: Hadoop process setup time starts to dominate.

  25. Hadoop scalability.
