Collaborative Bayesian Filtering in Online Recommendation Systems
COBAFI: COLLABORATIVE BAYESIAN FILTERING is a model developed by Alex Beutel and collaborators to predict user preferences in online recommendation systems. The model aims to fit user ratings data, understand user behavior, and detect spam. It utilizes Bayesian probabilistic matrix factorization and explores different inference methods to achieve accurate predictions.
Presentation Transcript
COBAFI: COLLABORATIVE BAYESIAN FILTERING Alex Beutel Joint work with Kenton Murray, Christos Faloutsos, Alex Smola April 9, 2014 Seoul, South Korea
2 Online Recommendation [Figure: a sparse Users × Movies rating matrix with a few observed ratings, e.g. 5, 5, 2, 5, 3, 5]
3 Online Rating Models
4 Online Rating Models Reality vs. Normal Collaborative Filtering (fit a Gaussian, minimize the error). Minimizing error isn't good enough - understanding the shape of the distribution matters!
5 Online Rating Models Normal Collaborative Filtering (fit a Gaussian, minimize the error) vs. Our Model
6 Our Goals and Challenges Given: A matrix of user ratings Find: A model that best fits and predicts user preferences Goals: G1. Fit the recommender distribution G2. Understand users who rate few items G3. Detect abnormal spam behavior
7 OUTLINE 1. Background 2. Model Formulation 3. Inference 4. Catching Spam 5. Experiments
8 Collaborative Filtering [Background] Approximate the ratings matrix X (Users × Movies) as X ≈ UVᵀ, where each user and each movie gets a low-dimensional vector of genre weights. A single rating is an inner product, e.g. X_Obama,SexAndTheCity = u_Obama · v_SexAndTheCity.
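The factorization on this slide can be sketched in a few lines of NumPy; the dimensions and random factors below are illustrative stand-ins, not values from the talk.

```python
import numpy as np

# Sketch of collaborative filtering as X ≈ U V^T.  The factor values are
# random placeholders; in practice U and V are learned from observed ratings.
rng = np.random.default_rng(0)
n_users, n_movies, k = 4, 3, 2

U = rng.normal(size=(n_users, k))   # one row of genre preferences per user
V = rng.normal(size=(n_movies, k))  # one row of genre loadings per movie

# The predicted rating for user i and movie j is the inner product u_i . v_j.
X_hat = U @ V.T
print(X_hat.shape)      # (4, 3): one predicted score per (user, movie) pair
print(U[0] @ V[1])      # identical to X_hat[0, 1]
```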
10 Bayesian Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2008) [Background] User and movie factor vectors are given Gaussian priors,
p(U | μ_U, Λ_U) = ∏_{i=1}^{N} N(U_i | μ_U, Λ_U⁻¹),  p(V | μ_V, Λ_V) = ∏_{j=1}^{M} N(V_j | μ_V, Λ_V⁻¹),
with Gaussian-Wishart priors on the hyperparameters Θ_U = {μ_U, Λ_U} and Θ_V = {μ_V, Λ_V}:
p(μ_U, Λ_U) = N(μ_U | μ₀, (β₀ Λ_U)⁻¹) · W(Λ_U | W₀, ν₀)  (and likewise for Θ_V),
where W is the Wishart distribution with ν₀ degrees of freedom and a D×D scale matrix W₀. In the original experiments ν₀ = D and W₀ = I for both user and movie hyperparameters, and μ₀ = 0 by symmetry. The predictive distribution of a rating R*_ij marginalizes over both parameters and hyperparameters,
p(R*_ij | R) = ∫ p(R*_ij | U_i, V_j) p(U, V, Θ_U, Θ_V | R) d{U, V} d{Θ_U, Θ_V},
which is analytically intractable, so approximate inference is needed. Variational methods scale well to large applications but tend to use overly simple approximations to the posterior. MCMC methods instead use the Monte Carlo approximation
p(R*_ij | R) ≈ (1/K) Σ_{k=1}^{K} p(R*_ij | U_i⁽ᵏ⁾, V_j⁽ᵏ⁾),
where the samples {U⁽ᵏ⁾, V⁽ᵏ⁾} are generated by a Markov chain whose stationary distribution is the posterior over {U, V, Θ_U, Θ_V}.
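BPMF's Monte Carlo prediction step (averaging the rating likelihood over posterior samples) can be sketched as follows; the "samples" here are synthetic stand-ins just to show the averaging, and the noise scale sigma is an assumed value.

```python
import numpy as np

# Monte Carlo approximation of the BPMF predictive distribution for one
# (user, movie) pair: average the Gaussian likelihood over K posterior
# samples {U_i^(k), V_j^(k)}.  These samples are synthetic placeholders,
# not output of an actual Gibbs sampler.
rng = np.random.default_rng(1)
K, d = 50, 2
U_samples = rng.normal(size=(K, d))   # stand-in draws of U_i
V_samples = rng.normal(size=(K, d))   # stand-in draws of V_j
r_ij, sigma = 3.0, 1.0                # observed rating, assumed noise scale

def gaussian_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# p(R_ij | R) ≈ (1/K) * sum_k N(R_ij | U_i^(k) . V_j^(k), sigma^2)
means = np.einsum("kd,kd->k", U_samples, V_samples)
pred = gaussian_pdf(r_ij, means, sigma).mean()
print(pred)
```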
11 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments
12 Our Model Cluster users (& items) Share preferences within clusters Use user preferences to predict ratings
13 The Recommender Distribution First introduced by Tan et al., 2013. Linear vs. quadratic normalization; fixing the linear parameter β₁ = 0 and varying the quadratic parameter β₂ changes the shape (e.g. β₂ = 0.4 vs. β₂ = -1.0).
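The shape flexibility on this slide can be illustrated numerically. This sketch assumes the distribution has the exponential-family form p(r) ∝ exp(b1·r + b2·r²) over the discrete rating scale {1, …, 5}; the exact parameterization in Tan et al. may differ, and the b1/b2 values are made up to show the two regimes.

```python
import numpy as np

# Assumed form of the recommender distribution over discrete ratings:
# p(r) ∝ exp(b1*r + b2*r^2), normalized over r in {1, ..., 5}.
ratings = np.arange(1, 6)

def recommender_dist(b1, b2):
    logits = b1 * ratings + b2 * ratings ** 2
    p = np.exp(logits - logits.max())   # subtract max for numerical stability
    return p / p.sum()

# A negative quadratic term gives a unimodal, Gaussian-like shape ...
print(np.round(recommender_dist(b1=6.0, b2=-1.0), 3))
# ... while a positive one pushes mass to the extremes (polarized ratings).
print(np.round(recommender_dist(b1=-2.4, b2=0.4), 3))
```

This is why fitting the shape, not just the error, matters: the same two-parameter family covers both bell-shaped and J/U-shaped rating histograms.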
14 The Recommender Distribution [Figure: the user parameter vector u_i decomposed into genre preferences, a general leaning term, and a "how polarized" term] Goal 1: Fit the recommender distribution
15 Understanding varying preferences [Figure: users with very different rating patterns across the same items]
16 Resulting Co-clustering V U
17 Finding User Preferences Goal 2: Understand users who rate few items
18 Chinese Restaurant Process With six customers already seated (4 at table 1, 2 at table 2) and concentration parameter α = 0.1, customer 7 sits at table 1 with probability p(a₇=1) = 4/6.1, at table 2 with probability p(a₇=2) = 2/6.1, and at a new table with probability 0.1/6.1.
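The seating rule above can be sketched directly; the function name and table layout are illustrative, but the probabilities are exactly the 4/6.1, 2/6.1, 0.1/6.1 from the slide.

```python
import random

# Minimal sketch of a Chinese Restaurant Process assignment: with n_k
# customers at table k and concentration alpha, the next customer joins
# table k with probability n_k / (n + alpha) and opens a new table with
# probability alpha / (n + alpha).
def crp_assign(table_sizes, alpha, rng=random):
    """Return the chosen table index; len(table_sizes) means 'new table'."""
    r = rng.uniform(0, sum(table_sizes) + alpha)
    for k, n_k in enumerate(table_sizes):
        if r < n_k:
            return k
        r -= n_k
    return len(table_sizes)  # leftover mass -> open a new table

random.seed(7)
counts = [0, 0, 0]  # table 1, table 2, new table
for _ in range(10_000):
    counts[crp_assign([4, 2], alpha=0.1)] += 1
print(counts)  # roughly in the ratio 4 : 2 : 0.1
```

The rich-get-richer behavior is what lets large real clusters absorb new users while still leaving room for a genuinely new (e.g. spammy) cluster to form.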
19 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments
20 Gibbs Sampling - Clusters [Details] Probability of picking a cluster = Probability of the cluster based on its size (CRP) x Probability that u_i would come from the cluster
21 Sampling user parameters [Details] Probability of user preferences u_i = Probability of preferences u_i given cluster parameters x Probability of predicting ratings r_i,j using the new preferences. The recommender distribution is non-conjugate - we can't sample directly!
22 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments
23 Review Spam and Fraud [Figure: blocks of fake 1-star and 5-star ratings hiding among genuine ones] Image from http://sinovera.deviantart.com/art/Cute-Devil-117932337
24 Clustering Fraudsters [Figure: spammers split off from a previous real cluster into a new spam cluster]
25 Clustering Fraudsters Too much spam gets separated into a fraud cluster. Trying to hide just means (a) very little spam or (b) camouflage reinforcing realistic reviews.
26 Clustering Fraudsters Spammer types: Naïve Spammers, Spam + Noise, Hijacked Accounts. Goal 3: Detect abnormal spam behavior
27 OUTLINE 1. Background 2. Our Model 3. Inference 4. Catching Spam 5. Experiments
28 Does it work? Better Fit
29 Catching Naïve Spammers [Figure: fit (predictive probability) for BPMF vs. CoBaFi, before and after spam injection] 83% are clustered together
30 Clustered Hijacked Accounts [Figure: fit (predictive probability) for BPMF vs. CoBaFi, before and after injection; both the hijacked accounts and the attacked movies are clustered]
31 Real world clusters
32 Shape of real world data
33 Shape of Netflix reviews Most Gaussian: The Rookie, The Fan, Cadet Kelly, Money Train, Alice Doesn't Live Here, Aqua Teen Hunger Force: Vol. 2, Sea of Love, Gilmore Girls: Season 3, Boiling Point, Felicity: Season 4, True Believer, The O.C. Season 1, Stakeout, The Shield Season 3, The Package, Queer as Folk Season 4. Most skewed: The O.C. Season 2, Samurai X: Trust and Betrayal, Aqua Teen Hunger Force: Vol. 2, Sealab 2001: Season 1.
34 Shape of Amazon Clothing reviews Amazon Clothing Most Skewed Reviews: Bra Disc Nipple Covers, Vanity Fair Women's String Bikini Panty, Lee Men's Relaxed Fit Tapered Jean, Carhartt Men's Dungaree Jean, Wrangler Men's Cowboy Cut Slim Fit Jean. Nearly all are heavily polarized!
35 Shape of Amazon Electronics reviews Amazon Electronics Most Skewed Reviews Sony CD-R 50 Pack Spindle Olympus Stylus Epic Zoom Camera Sony AC Adapter Laptop Charger Apricorn Hard Drive Upgrade Kit Corsair 1GB Desktop Memory Nearly all are heavily polarized!
36 Shape of BeerAdvocate reviews BeerAdvocate Most Gaussian Reviews: Weizenbock (Sierra Nevada), Ovila Abbey Saison (Sierra Nevada), Stoudt's Abbey Double Ale, Stoudt's Fat Dog Stout, Juniper Black Ale. Nearly all are Gaussian!
37 Hypotheses on shape of data Hard to evaluate beyond a binary like vs. dislike. Selection bias: only committed viewers watch Season 4 of a TV series. Hard to compare value across very different items: there are lots of beers and movies to compare, fewer TV shows, and even fewer jeans or hard drives.
38 Key Points Modeling: Fit real data with flexible recommender distribution Prediction: Predict user preferences Anomaly Detection: When does a user not match the normal model?
39 Questions? Alex Beutel abeutel@cs.cmu.edu http://alexbeutel.com
40 Sampling Cluster Parameters [Figure: cluster hyperparameters (mean, precision, Wishart scale W, degrees of freedom) with Normal-Wishart priors, shared by the users in the cluster, e.g. u₅, u₆]
41 Gibbs Sampling - Clusters [Details] Probability u_i would be sampled from cluster a x Probability of the cluster (CRP)
42 Sampling user parameters [Details] Probability of u_i given cluster parameters x Probability of predicting ratings r_i,j. The recommender distribution is non-conjugate - we can't sample directly! Use a Laplace approximation and perform Metropolis-Hastings sampling.
43 Sampling user parameters [Details] Use a candidate normal distribution centered at the mode of p(u_i) with variance from the Laplace approximation of p(u_i). Metropolis-Hastings sampling: draw a candidate, keep the new value with the acceptance probability.
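The step described on this slide can be sketched in one dimension. The `log_target` below is a toy stand-in for the real non-conjugate posterior over a user parameter, and the grid-based mode search is an illustrative shortcut; only the Laplace-candidate-plus-MH structure mirrors the slide.

```python
import numpy as np

# Laplace-approximate a (non-conjugate) 1-D posterior with a normal
# candidate at its mode, then correct with Metropolis-Hastings.
rng = np.random.default_rng(3)

def log_target(u):
    # Toy unnormalized log posterior: peaked and symmetric around u = 2.
    t = u - 2.0
    return -0.5 * t**2 - 0.1 * t**4

# Laplace approximation: locate the mode on a grid, then estimate the
# variance as the negative inverse second derivative at the mode.
grid = np.linspace(-5.0, 10.0, 20001)
mode = grid[np.argmax(log_target(grid))]
h = 1e-3
curv = (log_target(mode + h) - 2 * log_target(mode) + log_target(mode - h)) / h**2
var = -1.0 / curv

def log_q(u):  # log density (up to a constant) of the normal candidate
    return -0.5 * (u - mode) ** 2 / var

samples, u = [], mode
for _ in range(5000):
    cand = rng.normal(mode, np.sqrt(var))  # independence proposal
    # MH ratio: target(cand)/target(u) * q(u)/q(cand)
    log_a = (log_target(cand) - log_target(u)) + (log_q(u) - log_q(cand))
    if np.log(rng.uniform()) < log_a:
        u = cand  # accept the candidate
    samples.append(u)
print(np.mean(samples))  # close to the mode at 2.0
```

Because the candidate is built from the target's own mode and curvature, the acceptance rate stays high, consistent with the 99%/78% acceptance rates reported later in the deck.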
44 Sampling Cluster Parameters [Details] Users/Items in the cluster Priors
45 Inferring Hyperparameters [Details] Solved directly: no sampling needed! The prior is hidden as an additional cluster.
46 Does Metropolis-Hastings work? We have to use a non-standard sampling procedure: 99.12% acceptance rate for Amazon Electronics, 77.77% acceptance rate for Netflix 24k.
47 Does it work? Compare on Predictive Probability (PP) to see how well our model fits the data:
              Netflix (24k users)   BeerAdvocate
Uniform       1.6904                2.1972
BPMF          1.2525                1.9855
CoBaFi (us)   1.1827                1.6741
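The slide does not define PP precisely, but the uniform baseline of 2.1972 ≈ ln 9 (BeerAdvocate has nine half-star rating levels) suggests it is the average negative log-likelihood of held-out ratings, with lower being better. Under that assumption, the metric is a one-liner; the function name is my own.

```python
import numpy as np

# Assumed metric: average negative log-likelihood of held-out ratings
# ("predictive probability" in the slides, lower is better).
def neg_log_pp(probs):
    """probs[i] = model probability assigned to the i-th held-out rating."""
    return float(-np.mean(np.log(probs)))

# A uniform model over 9 rating levels assigns 1/9 to every observation,
# reproducing the 2.1972 baseline from the table above.
print(round(neg_log_pp(np.full(100, 1 / 9)), 4))  # 2.1972
```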
48 Handling Spammers
Random naïve spammers in the Amazon Electronics dataset:
          PP Before   PP After
BPMF      1.7047      1.8146
CoBaFi    1.0549      1.7042
Random hijacked accounts in the Netflix 24k dataset:
          PP Before   PP After
BPMF      1.2375      1.3057
CoBaFi    0.9670      1.2935
49 Clustered Naïve Spammers 83% are clustered together
50 Clustered Hijacked Accounts Clustered hijacked accounts Clustered attacked movies