Non-Negative Tensor Factorization with RESCAL
This article discusses non-negative tensor factorization with RESCAL, covering topics such as Non-Negative Matrix Factorization, Multiplicative Updates, RESCAL for Relational Learning, and Non-Negative Constraint for RESCAL. It explores how factorizing matrices/tensors into non-negative factors can aid in interpretation of latent factors and clustering, and how RESCAL has provided good results in relational learning tasks. The discussion also includes integrating entity attribute information into the factorization model.
Presentation Transcript
Non-Negative Tensor Factorization with RESCAL
Denis Krompaß (1), Maximilian Nickel (1), Xueyan Jiang (1) and Volker Tresp (1,2)
(1) Department of Computer Science, Ludwig Maximilian University, Oettingenstraße 67, 80538 Munich, Germany
(2) Corporate Technology, Siemens AG, Otto-Hahn-Ring 6, 81739 Munich, Germany
Overview
1. Non-Negative Matrix Factorization
2. Multiplicative Updates
3. RESCAL
4. Non-Negativity for RESCAL
5. Experiments
6. Benefits and Drawbacks
Non-Negative Matrix Factorization
- Factorize a matrix/tensor into non-negative factors, e.g. $X = AV$
- Allows interpretation of the latent factors
- Can be used directly for clustering (sketched in code below)
- Enforces sparse factors
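Since the slide only states that non-negative factors can be used directly for clustering, here is a minimal sketch of that idea using scikit-learn's NMF. The toy matrix, rank, and hyperparameters are illustrative assumptions, not taken from the presentation.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy non-negative data: 100 samples, 20 features (illustrative only).
X = np.abs(np.random.RandomState(0).randn(100, 20))

model = NMF(n_components=5, init="nndsvd", max_iter=500, random_state=0)
A = model.fit_transform(X)   # non-negative factors A (100 x 5)
V = model.components_        # non-negative factors V (5 x 20)

# Direct clustering: assign each row to its dominant latent factor.
clusters = A.argmax(axis=1)
print(np.linalg.norm(X - A @ V), clusters[:10])
```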
Multiplicative Updates
- Introduced by Lee & Seung in 2000
- Used by Mørup and Hansen to infer non-negative Tucker decompositions
- Define a cost function $C(\theta)$
- Derive the partial derivative with respect to each parameter $\theta_i$
- Identify the negative and positive parts of the derivative and construct the update

  $\theta_i \leftarrow \theta_i \cdot \dfrac{[\nabla_{\theta_i} C(\theta)]^-}{[\nabla_{\theta_i} C(\theta)]^+}$

  where $[\cdot]^-$ and $[\cdot]^+$ denote the negative and positive parts of the derivative
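The update pattern above is easiest to see in code. Below is a sketch of the classic Lee & Seung multiplicative updates for the Frobenius-norm NMF cost $\|X - AV\|_F^2$; the small epsilon in the denominators is a common numerical-stability convention, not something specified on the slide.

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=200, eps=1e-9):
    """Lee & Seung multiplicative updates for X ~ A V (Frobenius cost)."""
    rng = np.random.RandomState(0)
    n, m = X.shape
    A = rng.rand(n, r)
    V = rng.rand(r, m)
    for _ in range(n_iter):
        # Each factor is multiplied by (negative part)/(positive part)
        # of the gradient, which keeps all entries non-negative.
        A *= (X @ V.T) / (A @ V @ V.T + eps)
        V *= (A.T @ X) / (A.T @ A @ V + eps)
    return A, V

X = np.abs(np.random.RandomState(1).randn(50, 30))
A, V = nmf_multiplicative(X, r=5)
print(np.linalg.norm(X - A @ V))
```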
RESCAL
- Tensor factorization model for relational learning
- Three-way tensor factorization: $X_k \approx A R_k A^T$ for $k = 1, \dots, m$
- Regularized least-squares cost function:

  $C_{RESCAL}(X, A, R) = \sum_k \|X_k - A R_k A^T\|_F^2 + \lambda_A \|A\|_F^2 + \lambda_R \sum_k \|R_k\|_F^2$

- Showed very good results in various relational learning tasks [5,8]
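As a concrete reading of the factorization and cost function above, the following sketch evaluates the regularized RESCAL least-squares cost for a set of frontal slices $X_k$; the variable names and the list-of-slices representation are illustrative assumptions.

```python
import numpy as np

def rescal_cost(X, A, R, lam_A=0.0, lam_R=0.0):
    """Regularized RESCAL least-squares cost:
    sum_k ||X_k - A R_k A^T||_F^2 + lam_A ||A||_F^2 + lam_R sum_k ||R_k||_F^2.
    X and R are lists of frontal slices X_k (n x n) and core matrices R_k (r x r)."""
    fit = sum(np.linalg.norm(Xk - A @ Rk @ A.T) ** 2 for Xk, Rk in zip(X, R))
    reg = lam_A * np.linalg.norm(A) ** 2
    reg += lam_R * sum(np.linalg.norm(Rk) ** 2 for Rk in R)
    return fit + reg
```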
Non-Negative Constraint for RESCAL
- Regularized least-squares cost function:

  $C_{LS}(X, A, R) = f_{LS}(X, A, R) + f_{L2}(A, R)$

  with $f_{LS}(X, A, R) = \sum_k \|X_k - A R_k A^T\|_F^2$ and $f_{L2}(A, R) = \lambda_A \|A\|_F^2 + \lambda_R \sum_k \|R_k\|_F^2$

- Regularized Kullback-Leibler divergence cost function:

  $C_{KL}(X, A, R) = f_{KL}(X, A, R) + f_{L1}(A, R)$

  with $f_{KL}(X, A, R) = \sum_{ijk} \left[ X_{ijk} \log \dfrac{X_{ijk}}{(A R_k A^T)_{ij}} + (A R_k A^T)_{ij} \right]$ and $f_{L1}(A, R) = \lambda_A \|A\|_1 + \lambda_R \sum_k \|R_k\|_1$
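Combining the multiplicative-update recipe with the regularized least-squares cost gives non-negative updates for $A$ and $R_k$ of roughly the following form. This is a sketch derived from the generic positive/negative gradient split, not necessarily the exact update rules published in the paper.

```python
import numpy as np

def nn_rescal_ls(X, r, n_iter=100, lam_A=0.0, lam_R=0.0, eps=1e-9):
    """Multiplicative updates for non-negative RESCAL under the regularized
    least-squares cost (sketch: generic positive/negative gradient split)."""
    rng = np.random.RandomState(0)
    n = X[0].shape[0]
    A = rng.rand(n, r)
    R = [rng.rand(r, r) for _ in X]
    for _ in range(n_iter):
        # Negative part of grad_A: sum_k X_k A R_k^T + X_k^T A R_k
        num = sum(Xk @ A @ Rk.T + Xk.T @ A @ Rk for Xk, Rk in zip(X, R))
        # Positive part: sum_k A R_k A^T A R_k^T + A R_k^T A^T A R_k + lam_A A
        den = sum(A @ Rk @ (A.T @ A) @ Rk.T + A @ Rk.T @ (A.T @ A) @ Rk
                  for Rk in R) + lam_A * A
        A *= num / (den + eps)
        AtA = A.T @ A
        R = [Rk * (A.T @ Xk @ A) / (AtA @ Rk @ AtA + lam_R * Rk + eps)
             for Xk, Rk in zip(X, R)]
    return A, R
```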
Normalization and Integrating Entity Attribute Information
- Normalization of the factor matrix A [13]:

  $C^*_{LS}(X, A, R) = f_{LS}(X, \tilde{A}, R) + f_{L1}(R)$ with $\tilde{A}_{ir} = \dfrac{A_{ir}}{\|A_{:r}\|_F}$

- Add attribute information to the model [8]:

  $C_{LS\text{-}attr}(X, D, A, R, V) = C_{LS}(X, A, R) + C_{attr}(D, A, V)$

  where $C_{attr}(D, A, V) = f_{LS\text{-}attr}(D, A, V) + f_{L2}(V)$ with $f_{LS\text{-}attr}(D, A, V) = \|D - AV\|_F^2$ and $f_{L2}(V) = \lambda_V \|V\|_F^2$

- Analogous formulations hold for the KL-divergence cost function
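A brief sketch of the two ingredients above: column-wise normalization of $A$ as used inside the reconstruction term, and a multiplicative update for the attribute factor $V$ obtained from the same positive/negative split. Both are illustrative, with function names chosen here rather than taken from the paper.

```python
import numpy as np

def normalize_columns(A, eps=1e-9):
    """A~ with A~_ir = A_ir / ||A_:r||, so the sparsity penalty cannot
    be evaded by simply rescaling the columns of A (cf. [13])."""
    return A / (np.linalg.norm(A, axis=0, keepdims=True) + eps)

def update_V(D, A, V, lam_V=0.0, eps=1e-9):
    """One multiplicative update for the attribute term
    ||D - A V||_F^2 + lam_V ||V||_F^2 (same positive/negative split)."""
    return V * (A.T @ D) / (A.T @ A @ V + lam_V * V + eps)
```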
Experiments
- Nations: 14 x 14 x 56 multi-relational data consisting of relations between nations (treaties, immigration, etc.); the dataset additionally contains attribute information for each entity
- Kinship: 104 x 104 x 26 multi-relational data consisting of several kinship relations within the Alyawarra tribe
- UMLS: 135 x 135 x 49 multi-relational data consisting of biomedical relationships between categorized concepts of the Unified Medical Language System (UMLS)
- 10-fold cross-validation
- The non-negative factor matrices were initialized with the Non-negative Double Singular Value Decomposition method (NNDSVD) [9]
- Entries were counted as nonzero if their absolute value exceeded 1.0e-9 (see the sketch below)
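To make the nonzero criterion concrete, here is a small helper that measures factor sparsity under that threshold; the function name and interface are illustrative.

```python
import numpy as np

def sparsity(M, tol=1.0e-9):
    """Fraction of entries treated as zero under the experiments' threshold:
    an entry counts as nonzero only if its absolute value exceeds tol."""
    return float(np.mean(np.abs(M) <= tol))
```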
Results
- On Kinships and UMLS, predictive performance is similar to the original RESCAL
- Worse performance on the Nations dataset
- The latent factor representations are significantly sparser for UMLS and Kinships
- Minimizing the KL-divergence cost function leads to sparser factors
(Figures: comparison of the LS and KL-divergence cost functions)
Conclusion
- Extended non-negative matrix factorization to relational learning tasks with the RESCAL model, employing multiplicative update rules
- Derived update rules for least-squares and KL-divergence based cost functions, including regularization, normalization, and attribute information
(+) Benefits:
+ Updates also exploit data sparsity
+ Little loss in predictive performance
+ Significant gain in sparsity of the latent factor representation
(-) Drawbacks:
- Slower convergence, even when using the non-random initialization of the factor matrices proposed by [9]
References
1. Harshman, RA: Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16, 1-84 (1970)
2. Carroll, JD, Chang, JJ: Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 35, 283-319 (1970)
3. Tucker, LR: Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279-311 (1966)
4. Lee, DD, Seung, HS: Algorithms for Non-negative Matrix Factorization. In NIPS, 556-562 (2000)
5. Nickel, M, Tresp, V, Kriegel, HP: A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning (2011)
6. Wang, F, Li, P, König, AC: Efficient Document Clustering via Online Nonnegative Matrix Factorizations. In Proceedings of SDM'11, 908-919 (2011)
7. Koren, Y: The BellKor Solution to the Netflix Grand Prize (2009)
8. Nickel, M, Tresp, V, Kriegel, HP: Factorizing YAGO: Scalable Machine Learning for Linked Data. In Proceedings of the 21st International World Wide Web Conference (WWW2012) (2012)
9. Boutsidis, C, Gallopoulos, E: SVD-based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 41(4), 1350-1362 (2008)
10. Langville, AN, Meyer, CD, Albright, R: Initializations for the Nonnegative Matrix Factorization. KDD (2006)
11. Lee, DD, Seung, HS: Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788-791 (1999)
12. Mørup, M, Hansen, LK: Algorithms for Sparse Non-negative Tucker decomposition. Neural Computation, 20(8), 2112-2131 (2008)
13. Eggert, J, Körner, E: Sparse coding and NMF. In Neural Networks, volume 4, pages 2529-2533 (2004)