Collaborative Bayesian Filtering in Online Recommendation Systems

Slide Note

COBAFI: COLLABORATIVE BAYESIAN FILTERING is a model developed by Alex Beutel and collaborators to predict user preferences in online recommendation systems. The model aims to fit user ratings data, understand user behavior, and detect spam. It utilizes Bayesian probabilistic matrix factorization and explores different inference methods to achieve accurate predictions.

khoss Follow

Uploaded on Sep 21, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

COBAFI: COLLABORATIVE BAYESIAN FILTERING Alex Beutel Joint work with Kenton Murray, Christos Faloutsos, Alex Smola April 9, 2014 Seoul, South Korea

2 Online Recommendation Movies 5 5 2 Users 5 3 5

3 Online Rating Models

4 Online Rating Models Reality Normal Collaborative Filtering Fit a Gaussian - Minimize the error Minimizing error isn t good enough - Understanding the shape matters!

5 Online Rating Models Normal Collaborative Filtering Fit a Gaussian - Minimize the error Our Model

6 Our Goals and Challenges Given: A matrix of user ratings Find: A model that best fits and predicts user preferences Goals: G1. Fit the recommender distribution G2. Understand users who rate few items G3. Detect abnormal spam behavior

7 OUTLINE 1. Background 2. Model Formulation 3. Inference 4. Catching Spam 5. Experiments

8 Collaborative Filtering [Background] Movies V Users X U X UVT XObama,SexAndTheCity uObama vSexAndTheCity Genres 5 = 5 = 2.236 2.236 2.23 1.2 0.2 1.5 0.73 0 0 0 6

10 Bayesian Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2008) Bayesian Probabilistic M atrix Factorization using M CM C 0n n , W0 , W0 0 L LU V m m mU m0 V Ui j 0 V U Rij i=1,...,N j=1,...,M ~ a Figure 1. The left panel shows the graphical model for Probabilistic Matrix Factorization (PMF). The right panel shows the graphical model for Bayesian PMF. tors are assumed to be Gaussian: over model parameters and hyperparameters: N p(R p(R ij|R, 0) = ij|Ui,Vj)p(U,V|R, U, V) N (Ui| U, 1 p(U| U, U) = U), (5) i= 1 M p( U, V| 0)d{U,V}d{ U, V}. (9) N (Vi| V, 1 p(V| V, V) = V). (6) Since exact evaluation of this predictive distribution isanalytically intractabledueto thecomplexity of the posterior we need to resort to approximate inference. i= 1 We further place Gaussian-Wishart priorson the user and movie hyperparameters U V = { V, V}: [Background] = { U, U} and One choicewould be to use variational methods (Hin- ton & van Camp, 1993; Jordan et al., 1999) that pro- vide deterministic approximation schemes for posteri- ors. In particular, wecould approximatethetruepos- terior p(U,V, U, V|R) by adistribution that factors, with each factor havingaspecificparametricform such as a Gaussian distribution. This approximate poste- rior would allow us to approximate the integrals in Eq. 9. Variational methods have become the method- ology of choice, since they typically scale well to large applications. However, they can produce inaccurate results because they tend to involve overly simple ap- proximations to the posterior. p( U| 0) = p( U| U)p( U) = N ( U| 0,( 0 U) 1)W( U|W0, 0), (7) p( V| 0) = p( V| V)p( V) = N ( V| 0,( 0 V) 1)W( V|W0, 0). (8) HereW is the Wishart distribution with 0degreesof freedom and a D D scale matrix W0: 1 C| |( 0 D 1)/ 2exp( 1 2Tr(W 1 W( |W0, 0) = 0 )), MCMC-based methods (Neal, 1993), on the other hand, use the Monte Carlo approximation to the pre- dictive distribution of Eq. 9 given by: where C is the normalizing constant. For convenience we also define 0= { 0, 0,W0}. In our experiments we also set 0 = D and W0 to the identity matrix for both user and movie hyperparameters and choose 0= 0 by symmetry. K 1 K ij|U(k) ,V(k) j p(R ij|R, 0) p(R ). (10) i k= 1 The samples {U(k) a Markov chain whose stationary distribution is the posterior distribution over the model parameters and hyperparameters {U,V, U, V}. ,V(k) j } are generated by running 3.2. Predictions i The predictive distribution of the rating value R user i and query movie j is obtained by marginalizing ijfor The advantage of

11 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

12 Our Model Cluster users (& items) Share preferences within clusters Use user preferences to predict ratings

13 The Recommender Distribution First introduced by Tan et al, 2013 Linear Normalization Quadratic Normalization 1 = 0 Vary 2 2 = 0.4 2 = -1.0

14 The Recommender Distribution ui 0.3 0.4 0.3 0.2 -0.7 0.4 0.3 0.8 0.4 Genre Preferences General Leaning How Polarized Goal 1: Fit the recommender distribution

15 Understanding varying preferences 5 2 5 3 1 1 5

16 Resulting Co-clustering V U

17 Finding User Preferences U U Goal 2: Understand users who rate few items

18 Chinese Restaurant Process 1 3 2 p(a7=2)=2 p(a7=1)=4 0.1 6.1 6.1 6.1

19 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

20 Gibbs Sampling - Clusters [Details] Probability of picking a cluster = Probability of a cluster based on size (CRP) x Probability uiwould come from the cluster

21 Sampling user parameters [Details] Probability of user preferences ui = Probability of preferences ui given cluster parameters x Probability of predicting ratings ri,j using new preferences Recommender distribution is non-conjugate Can t sample directly!

22 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

23 Review Spam and Fraud 1 5 5 5 1 1 1 1 1 1 1 1 5 5 5 5 Image from http://sinovera.deviantart.com/art/Cute-Devil-117932337

24 Clustering Fraudsters 1 3 2 New Spam Cluster Previous Real Cluster

25 Clustering Fraudsters 1 3 2 Too much spam get separated into fraud cluster Trying to hide just means (a) very little spam or (b) camouflage reinforcing realistic reviews.

26 Clustering Fraudsters 1 2 3 4 5 Na ve Spammers Spam + Noise Hijacked Accounts Goal 3: Detect abnormal spam behavior

27 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments

28 Does it work? Better Fit

29 Catching Na ve Spammers BPMF CoBaFi Fit (Predictive Probability) 2 1.5 1 0.5 0 Before After Injection 83% are clustered together

30 Clustered Hijacked Accounts Clustered hijacked accounts Clustered attacked movies 1.6 BPMF CoBaFi Fit (Predictive Probability) 1.4 1.2 1 0.8 0.6 0.4 0.2 0 Before After Injection

31 Real world clusters

32 Shape of real world data

33 Shape of Netflix reviews Most Gaussian The Rookie The Fan Cadet Kelly Money Train Alice Doesn t Live Here Aqua Teen Hunger Force: Vol. 2 Sea of Love Gilmore Girls: Season 3 Boiling Point Felicity: Season 4 True Believer The O.C. Season 1 Stakeout The Shield Season 3 The Package Queer as Folk Season 4 Most skewed The O.C. Season 2 Samurai X: Trust and Betrayal Aqua Teen Hunger Force: Vol. 2 Sealab 2001: Season 1 More Skewed More Gaussian

34 Shape of Amazon Clothing reviews Amazon Clothing Most Skewed Reviews Bra Disc Nipple Covers Vanity Fair Women s String Bikini Panty Lee Men s Relaxed Fit Tapered Jean Carhartt Men s Dungaree Jean Wrangler Men s Cowboy Cut Slim Fit Jean Nearly all are heavily polarized!

35 Shape of Amazon Electronics reviews Amazon Electronics Most Skewed Reviews Sony CD-R 50 Pack Spindle Olympus Stylus Epic Zoom Camera Sony AC Adapter Laptop Charger Apricorn Hard Drive Upgrade Kit Corsair 1GB Desktop Memory Nearly all are heavily polarized!

36 Shape of BeerAdvocate reviews BeerAdvocate Most Gaussian Reviews Weizenbock (Sierra Nevada) Ovila Abbey Saison (Sierra Nevada) Stoudt s Abbey Double Ale Stoudt s Fat Dog Stout Juniper Black Ale Nearly all are Gaussian!

37 Hypotheses on shape of data Hard to evaluate beyond binary vs. Selection bias Only committed viewers watch Season 4 of a TV series Hard to compare value across very different items. Lots of beers and movies to compare Fewer TV shows Even fewer jeans or hard drives

38 Key Points Modeling: Fit real data with flexible recommender distribution Prediction: Predict user preferences Anomaly Detection: When does a user not match the normal model?

39 Questions? Alex Beutel abeutel@cs.cmu.edu http://alexbeutel.com

40 Sampling Cluster Parameters Hyperparameters , , W , a Priors on , , W u5 u6

41 Gibbs Sampling - Clusters [Details] Probability uiwould be sampled from cluster a Probability of a cluster (CRP)

42 Sampling user parameters [Details] Probability of ui given cluster parameters Probability of predicting ratings ri,j Recommender distribution is non-conjugate Can t sample directly! Use a Laplace approximation and perform Metropolis-Hastings Sampling

43 Sampling user parameters [Details] Use candidate normal distribution Mode of p(ui) Variance of p(ui) Metropolis-Hastings Sampling: Sample Keep new with probability

44 Sampling Cluster Parameters [Details] Users/Items in the cluster Priors

45 Inferring Hyperparameters [Details] Solved directly no sampling needed! Prior hidden as additional cluster

46 Does Metropolis Hasting work? Have to use non-standard sampling procedure: 99.12% acceptance rate for Amazon Electronics 77.77% acceptance rate for Netflix 24k

47 Does it work? Uniform 1.6904 BPMF 1.2525 CoBaFi (us) 1.1827 Netflix (24k users) BeerAdvocate 2.1972 1.9855 1.6741 Compare on Predictive Probability (PP) to see how well our model fits the data

48 Handling Spammers PP Before 1.7047 1.0549 PP After 1.8146 1.7042 BPMF CoBaFi Random na ve spammers in Amazon Electronics dataset PP Before 1.2375 0.9670 PP After 1.3057 1.2935 BPMF CoBaFi Random hijacked accounts in Netflix 24k dataset