Semantically Similar Relation Clustering with Tripartite Graph


This presentation describes a constrained information-theoretic tripartite graph clustering approach for identifying semantically similar relations. Using must-link and cannot-link constraints, the model clusters relations in a knowledge graph setting, with applications in knowledge base completion, information extraction, and knowledge inference.


Uploaded on Sep 25, 2024


Presentation Transcript

1 Constrained Information-Theoretic Tripartite Graph Clustering to Identify Semantically Similar Relations
IJCAI'15, Buenos Aires, Argentina
Chenguang Wang (Peking Univ.), Yangqiu Song (UIUC), Dan Roth (UIUC), Chi Wang (MSR), Jiawei Han (UIUC), Heng Ji (RPI), and Ming Zhang (Peking Univ.)

2 Outline
  Problem: Relation Clustering
  Approach: Constrained Tripartite Graph Clustering Model
  Experiments

3 Open Information Extraction Relations
Relations are not canonical: similar relations are expressed in different natural-language ways.
Open information extraction (IE) relations, extracted from unstructured data with ReVerb:
  "Larry Page (born March 26, 1973) is an American computer scientist who cofounded Google Inc. with Sergey Brin."
    (Larry Page, cofounded, Google)
  "Google was founded by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University."
    (Google, was founded by, Larry Page)

4 Knowledge Base Relations
Relations are not canonical: a multi-hop relation and a one-hop relation can have the same meaning.
Knowledge base relations, with multi-hop relations generated from knowledge bases:
  One-hop: (J.K. Rowling, written work, Harry Potter Series)
  Multi-hop: (J.K. Rowling, is author of, Philosopher's Stone) and (Philosopher's Stone, part of, Harry Potter Series)
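
The multi-hop relation generation on this slide can be illustrated with a minimal sketch. It assumes triples are stored as (subject, relation, object) tuples and that a two-hop relation is formed by joining two triples on a shared middle entity; the function name `two_hop_relations` and the "r1 -> r2" naming of the composed relation are hypothetical, not taken from the presentation.

```python
from collections import defaultdict


def two_hop_relations(triples):
    """Compose pairs of triples that share a middle entity.

    (x, r1, z) and (z, r2, y) yield the two-hop triple (x, "r1 -> r2", y).
    The naming scheme for the composed relation is an illustrative assumption.
    """
    # Index outgoing edges by subject so the join on z is a dictionary lookup.
    by_subject = defaultdict(list)
    for subj, rel, obj in triples:
        by_subject[subj].append((rel, obj))

    composed = []
    for x, r1, z in triples:
        for r2, y in by_subject.get(z, []):
            if y != x:  # skip paths that loop back to the start entity
                composed.append((x, f"{r1} -> {r2}", y))
    return composed


# Toy knowledge base built from the slide's example entities.
kb = [
    ("J.K. Rowling", "is author of", "Philosopher's Stone"),
    ("Philosopher's Stone", "part of", "Harry Potter Series"),
    ("J.K. Rowling", "written work", "Harry Potter Series"),
]
multi_hop = two_hop_relations(kb)
```

In this toy example the composed relation links the same entity pair as the one-hop "written work" triple, which is the kind of pair a relation clustering model would be expected to group together.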

5 Solution: Clustering Relations
Examples
  (X, wrote, Y) and (X, 's written work, Y)
  (X, is founder of, Y) and (X, is CEO of, Y)
  (X, written by, Y) and (X, part of, Z) ∧ (Y, wrote, Z)
Applications
  Knowledge base completion [Socher et al., 2013; West et al., 2014]
  Information extraction [Chan and Roth, 2010; 2011; Li and Ji, 2014]
  Knowledge inference [Richardson and Domingos, 2006]

6 Constrained Tripartite Graph Clustering Relation Clustering

7 Problem Formulation: Constrained Tripartite Graph Clustering
Node sets of the tripartite graph:
  Left entity set
  Relation set
  Right entity set
Latent label sets:
  Left entity latent label set (e.g., Person)
  Relation latent label set (e.g., Leadership of)
  Right entity latent label set (e.g., Organization)
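
As a rough illustration of the three node sets, the sketch below builds a tripartite graph from relation triples. The representation is an assumption made for this example: each triple is a (left entity, relation, right entity) tuple, and edges are stored as relation/entity co-occurrence counts. The function name `build_tripartite_graph` and the toy triples are hypothetical.

```python
from collections import Counter


def build_tripartite_graph(triples):
    """Return the three node sets and the relation/entity co-occurrence counts."""
    left_entities = sorted({e1 for e1, _, _ in triples})
    relations = sorted({r for _, r, _ in triples})
    right_entities = sorted({e2 for _, _, e2 in triples})
    # Edges between the relation layer and each entity layer, weighted by
    # how often the relation and the entity appear in the same triple.
    rel_left = Counter((r, e1) for e1, r, _ in triples)
    rel_right = Counter((r, e2) for _, r, e2 in triples)
    return left_entities, relations, right_entities, rel_left, rel_right


# Toy triples for illustration only.
triples = [
    ("Larry Page", "cofounded", "Google"),
    ("Sergey Brin", "cofounded", "Google"),
    ("Larry Page", "is CEO of", "Google"),
]
left, rels, right, rel_left, rel_right = build_tripartite_graph(triples)
```

Here the relation layer sits between the two entity layers, so clustering the relations can draw on evidence from both sides at once.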

8 Must-Link and Cannot-Link Constraints
Must-link (e.g., Person)
Cannot-link (e.g., Leadership of)
Note: we impose soft constraints on the above relations and entities, since in practice some constraints could be violated.
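
One way to picture a soft constraint is as a penalty that is added when a constraint is violated, instead of a rule that forbids the assignment. The sketch below assumes a fixed penalty per violated constraint; this is a simplification chosen for illustration, and the penalty function actually used in the presentation may differ. The name `constraint_penalty` and the `weight` parameter are hypothetical.

```python
def constraint_penalty(labels, must_link, cannot_link, weight=1.0):
    """Sum a fixed penalty over violated must-link and cannot-link pairs.

    labels maps each item (relation or entity) to its cluster index.
    A fixed per-violation weight is an assumption made for this sketch.
    """
    penalty = 0.0
    for a, b in must_link:
        if labels[a] != labels[b]:  # must-link pair split across clusters
            penalty += weight
    for a, b in cannot_link:
        if labels[a] == labels[b]:  # cannot-link pair placed together
            penalty += weight
    return penalty


must = [("founded by", "led by")]
cannot = [("founded by", "act in")]

consistent = {"founded by": 0, "led by": 0, "act in": 1}
merged = {"founded by": 0, "led by": 0, "act in": 0}

penalty_consistent = constraint_penalty(consistent, must, cannot)
penalty_merged = constraint_penalty(merged, must, cannot)
```

Because the penalty is finite, a clustering that violates a noisy constraint can still be selected when the data strongly supports it.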

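9 Model Description
Intuition
Relation triplet joint probability decomposition:
  $p(e^1_i, r_m, e^2_j) \propto p(r_m, e^1_i)\, p(r_m, e^2_j)$
  where $p(r_m, e^1_i)$ and $p(r_m, e^2_j)$ are calculated based on the co-occurrence counts of $r_m$ and $e^1_i$, and of $r_m$ and $e^2_j$.
Motivated by Information-Theoretic Co-Clustering (ITCC) [I. S. Dhillon KDD 03], each joint distribution is approximated by
  $q(r_m, e^1_i) = p(\hat{r}_{k_r}, \hat{e}^1_{k_{e^1}})\, p(r_m \mid \hat{r}_{k_r})\, p(e^1_i \mid \hat{e}^1_{k_{e^1}})$
  where $\hat{r}_{k_r}$ and $\hat{e}^1_{k_{e^1}}$ are cluster indicators and $k_r$ and $k_{e^1}$ are cluster indices; $q(r_m, e^2_j)$ is defined in the same way.
Objective Function
  $\arg\min \; D_{KL}(p(R, E^1) \,\|\, q(R, E^1)) + D_{KL}(p(R, E^2) \,\|\, q(R, E^2))$
  $\quad + \sum_{r_{m_1}} \sum_{r_{m_2} \in M_{r_{m_1}}} V(r_{m_1}, r_{m_2} \in M_{r_{m_1}}) + \sum_{r_{m_1}} \sum_{r_{m_2} \in C_{r_{m_1}}} V(r_{m_1}, r_{m_2} \in C_{r_{m_1}})$
  $\quad + \sum_{e^1_{i_1}} \sum_{e^1_{i_2} \in M_{e^1_{i_1}}} V(e^1_{i_1}, e^1_{i_2} \in M_{e^1_{i_1}}) + \sum_{e^1_{i_1}} \sum_{e^1_{i_2} \in C_{e^1_{i_1}}} V(e^1_{i_1}, e^1_{i_2} \in C_{e^1_{i_1}})$
  $\quad + \sum_{e^2_{j_1}} \sum_{e^2_{j_2} \in M_{e^2_{j_1}}} V(e^2_{j_1}, e^2_{j_2} \in M_{e^2_{j_1}}) + \sum_{e^2_{j_1}} \sum_{e^2_{j_2} \in C_{e^2_{j_1}}} V(e^2_{j_1}, e^2_{j_2} \in C_{e^2_{j_1}})$
  where $p(R, E^1)$ and $p(R, E^2)$ are the multinomial distributions composed by $p(r_m, e^1_i)$ and $p(r_m, e^2_j)$, $q(R, E^1)$ and $q(R, E^2)$ are the multinomial distributions composed by $q(r_m, e^1_i)$ and $q(r_m, e^2_j)$, $M_x$ is the must-link set of item $x$, and $C_x$ is the cannot-link set of item $x$.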
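
To make the ITCC-style approximation concrete, the following sketch computes one of the two KL terms for a toy relation-by-entity count matrix under a given cluster assignment. It assumes the standard ITCC form, in which the approximation multiplies the joint probability of the two clusters by the within-cluster conditional probabilities of the relation and the entity. It also assumes every relation and every entity occurs at least once. The function names and toy counts are hypothetical; the constraint terms and the alternating optimization are left out.

```python
import math


def normalize(counts):
    """Turn a co-occurrence count matrix into a joint distribution p(r, e)."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def itcc_approximation(p, row_cluster, col_cluster):
    """q(r, e) = p(r_hat, e_hat) * p(r | r_hat) * p(e | e_hat)."""
    n_rows, n_cols = len(p), len(p[0])
    p_row = [sum(p[m]) for m in range(n_rows)]
    p_col = [sum(p[m][i] for m in range(n_rows)) for i in range(n_cols)]

    joint, row_mass, col_mass = {}, {}, {}
    for m in range(n_rows):
        k = row_cluster[m]
        row_mass[k] = row_mass.get(k, 0.0) + p_row[m]
        for i in range(n_cols):
            key = (k, col_cluster[i])
            joint[key] = joint.get(key, 0.0) + p[m][i]
    for i in range(n_cols):
        l = col_cluster[i]
        col_mass[l] = col_mass.get(l, 0.0) + p_col[i]

    return [
        [
            joint[(row_cluster[m], col_cluster[i])]
            * (p_row[m] / row_mass[row_cluster[m]])
            * (p_col[i] / col_mass[col_cluster[i]])
            for i in range(n_cols)
        ]
        for m in range(n_rows)
    ]


def kl_divergence(p, q):
    """KL(p || q) over all cells where p is positive."""
    return sum(
        pv * math.log(pv / qv)
        for p_row, q_row in zip(p, q)
        for pv, qv in zip(p_row, q_row)
        if pv > 0
    )


# Toy counts: rows are three relations, columns are three left entities.
p = normalize([[2, 2, 0], [1, 1, 0], [0, 0, 4]])
entity_clusters = [0, 0, 1]

kl_good = kl_divergence(p, itcc_approximation(p, [0, 0, 1], entity_clusters))
kl_bad = kl_divergence(p, itcc_approximation(p, [0, 1, 1], entity_clusters))
```

Grouping the first two relations, which have proportional co-occurrence rows, gives a KL term of essentially zero, while the mismatched grouping gives a strictly larger value. The tripartite objective adds the analogous term for the right entities and the constraint penalties.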

10 Experiments
Datasets
  Name: Description
  Rel-KB: KB relations from Freebase, which particularly include multi-hop relations
  Rel-OIE: Open IE relations extracted from Wikipedia using ReVerb
Relation constraints for the Rel-KB dataset (entity constraints are similarly defined)
  Constraint type: Description
  Must-link: If two relations are generated from the same relation category, we add a must-link
  Cannot-link: Otherwise
Relation constraints for the Rel-OIE dataset (entity constraints are similarly defined)
  Constraint type: Description
  Must-link: If the similarity between two relation phrases exceeds a predefined threshold (experimentally, 0.5), we add a must-link between these relations
  Cannot-link: Otherwise
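
The threshold rule for the Rel-OIE constraints can be sketched as follows. The slide does not say which similarity measure is used, so this example substitutes Jaccard similarity over lowercase tokens purely as a stand-in; only the 0.5 threshold comes from the slide. The function names and the sample phrases are hypothetical, and phrases are assumed to be non-empty.

```python
from itertools import combinations


def jaccard(phrase_a, phrase_b):
    """Token-level Jaccard similarity (a stand-in for the unspecified measure)."""
    tokens_a = set(phrase_a.lower().split())
    tokens_b = set(phrase_b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def relation_constraints(phrases, threshold=0.5):
    """Must-link pairs whose similarity exceeds the threshold; cannot-link otherwise."""
    must_link, cannot_link = [], []
    for a, b in combinations(phrases, 2):
        if jaccard(a, b) > threshold:
            must_link.append((a, b))
        else:
            cannot_link.append((a, b))
    return must_link, cannot_link


phrases = ["was founded by", "founded by", "act in"]
must_link, cannot_link = relation_constraints(phrases)
```

With these toy phrases, "was founded by" and "founded by" share two of three distinct tokens, so they receive a must-link, while both pairs involving "act in" receive a cannot-link.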

11 Compared Methods
  Method: Description
  Kmeans: One-dimensional clustering algorithm
  CKmeans: Constrained Kmeans [S. Basu KDD 04]
  ITCC: Information-theoretic co-clustering [I. S. Dhillon KDD 03]
  CITCC: Constrained information-theoretic co-clustering [Y. Song TKDE 13]
  TFBC: Tensor factorization based clustering [I. Sutskever NIPS 09]
  TGC: Our method without constraints
  CTGC: Our method

12 Analysis of Clustering Results
Finding #1: Relation constraints are very effective. CTGC and TGC perform better than the compared methods, and the more relation constraints CTGC uses, the more significant the improvement.
Finding #2: Entity constraints are effective, but not as effective as relation constraints. CTGC with 3K entity constraints performs almost the same as CTGC with 6K entity constraints.

13 Case Study of Clustering Results
Examples generated by CTGC
  Category: Examples
  Organization-Founder: (X, founded by, Y); (X, led by, Y); (Y, is the owner of, X); (X, sold by, Y)
  Actor-Film: (X, act in, Y); (X, appears in, Y); (X, won best actor for, Y)
Examples generated by TGC
  Category: Examples
  Organization-Founder: (X, founded by, Y); (X, led by, Y); (Y, is the owner of, X); (X, sold by, Y)
  Actor-Film: (X, who played, Y); (X, starred in, Y); (X, 's capital in, Y)
Finding #1: Both CTGC and TGC generate reasonable results. The tripartite graph structure enhances the clustering by using entities and relations together.
Finding #2: CTGC is better than TGC. The must-link and cannot-link constraints help filter out illegitimate relations.

14 Recall
  Problem: Relation clustering
  CTGC: Constrained information-theoretic tripartite graph clustering model
  Results: CTGC is effective on both knowledge base relations and open information extraction relations
Thank You! If you have any questions, please contact wangchenguang@pku.edu.cn
