P-Rank: A Comprehensive Structural Similarity Measure over Information Networks

Slide Note

Analyzing the concept of structural similarity within Information Networks (INs), the study introduces P-Rank as a more advanced alternative to SimRank. By addressing the limitations of SimRank and offering a more efficient computational approach, P-Rank aims to provide a comprehensive measure of similarity among entities in diverse networks like webpages, research collaboration networks, and more. The presentation highlights the importance of considering both textual and structural proximity for accurate similarity assessments in massive information networks.

exes507 Follow

Uploaded on Sep 12, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

P-Rank: A Comprehensive Structural Similarity Measure over Information Networks Peixiang Zhao, Jiawei Han, Yizhou Sun University of Illinois at Urbana-Champaign Presented by Prof. Hong Cheng, CUHK CIKM 09 November 3 CIKM 09 November 3rd rd, 2009, Hong Kong , 2009, Hong Kong

Outline Introduction & Motivation P-Rank Formula Derivatives Computation Experimental Studies Future direction & Conclusion Nov. 3rd2009 CIKM 09 Hong Kong 1 of 15

Introduction Information Networks (INs) Physical, conceptual, and human/societal entities Interconnected relationships among different entities INs are ubiquitous and form a critical component of modern information infrastructure The Web highway or urban transportation networks research collaboration and publication networks Biological networks social networks Nov. 3rd2009 CIKM 09 Hong Kong 2 of 15

Problem Similarity computation on entities of INs How similar is webpage A with webpage B in the Web ? How similar is researcher A with researcher B in DBLP co- authorship network ? First of all, how to define similarity within a massive IN? Textual proximity of entity labels/contents Structural proximity conveyed through links! A good structural similarity measure in INs: SimRank (KDD 02) Nov. 3rd2009 CIKM 09 Hong Kong 3 of 15

Why SimRank is not Enough? Philosophy two entities are similar if they are referenced by similar entities Potential problems Semantic incomplete Only partial structural information from in-link direction is considered during similarity computation Biased similarity results May fail in different IN settings ! Inefficient in computation Worst-case O(n4), can be improved to O(n3), where n is the number of vertices in the information network Nov. 3rd2009 CIKM 09 Hong Kong 4 of 15

Why SimRank is not Enough? (a) A Heterogeneous IN and Structural Similarity Scores (b) A Homogeneous IN and Structural Similarity Scores Nov. 3rd2009 CIKM 09 Hong Kong 5 of 15

P(enetrating)-Rank Philosophy: Two entities are similar, if 1. they are referenced by similar entities 2. they reference similar entities Advantages Semantic complete Structural information from both in-link and out-link directions are considered during similarity computation Robust in different IN settings A unified structural similarity framework SimRank is just a special case Nov. 3rd2009 CIKM 09 Hong Kong 6 of 15

P-Rank Formula The structural similarity between vertex a and vertex b (a b), s(a, b): Recursive form In-link similarity Out-link similarity Approximate iterative form Nov. 3rd2009 CIKM 09 Hong Kong 7 of 15

P-Rank Property The iterative P-Rank has the following properties: Symmetry: sk(a, b) = sk(b, a) Monotonicity: 0 sk(a, b) sk+1(a, b) 1 Existence: The solution to the iterative P-Rank formula always exists and converges to a fixed point, s( , ), which is the theoretical solution to the recursive P-Rank formula Uniqueness: the solution to the iterative P-Rank formula is unique when C 1 The theoretical solution to P-Rank can be reached by a repetitive computation via the iterative form Nov. 3rd2009 CIKM 09 Hong Kong 8 of 15

P-Rank Derivatives P-Rank proposes a unified structural similarity framework, upon which many structural similarity measures are just its special cases Nov. 3rd2009 CIKM 09 Hong Kong 9 of 15

P-Rank Computation An iterative algorithm is executed until it reaches the fixed point Space complexity: O(n2) Time complexity: O(n4), can be improved to O(n3) by amortization Approximation algorithms on different IN scenarios Homogeneous IN Radius based pruning: vertex-pairs beyond a radius of r are no longer considered in similarity computation Heterogeneous IN Category based pruning: vertex-pairs in different categories are no longer considered in similarity computation Nov. 3rd2009 CIKM 09 Hong Kong 10 of 15

Experimental Studies Data sets: Heterogeneous IN: DBLP (paper, author, conference, year) Homogeneous IN: DBLP (paper with citation), Synthetic data R-MAT Methods P-Rank SimRank Metrics Compactness of clusters Algorithmic nature Ground truth Nov. 3rd2009 CIKM 09 Hong Kong 11 of 15

Compactness of Clusters P-Rank and SimRank are used as underlying similarity measures, respectively, and K-Medoids are used to cluster different vertices Compactness: intra-cluster distance/inter-cluster distance Homogeneous IN Heterogeneous IN Nov. 3rd2009 CIKM 09 Hong Kong 12 of 15

Algorithmic Nature Iterative P-Rank converges fast to the fixed point P-Rank v.s. the damping factor C P-Rank v.s. lambda Nov. 3rd2009 CIKM 09 Hong Kong 13 of 15

Ground Truth Ranking Result Top-10 ranking results for author vertices in DBLP by P-Rank Nov. 3rd2009 CIKM 09 Hong Kong 14 of 15

Conclusion The proliferation of information networks calls for effective structural similarity measures in Ranking Clustering Top-k Query Processing Compared with SimRank, P-Rank is witnessed to be a more effective structural similarity measure in large information networks Semantic complete, general, robust, and flexible enough to be employed in different IN settings Nov. 3rd2009 CIKM 09 Hong Kong 15 of 15