Graph Summarization on Hierarchical DAGs

T
o
p
-
k
 
G
r
a
p
h
 
S
u
m
m
a
r
i
z
a
t
i
o
n
 
o
n
H
i
e
r
a
r
c
h
i
c
a
l
 
D
A
G
s
Xuliang Zhu
, Xin Huang, Byron Choi, Jianliang Xu
Hong Kong Baptist University
 Hong Kong, China
Outlines
Motivations
Related Work
KDAG-Problem
Algorithms
Experiments
Conclusions
Motivations
Hierarchical DAGs 
are everywhere
(e.g. Disease Ontology, ImageNet,
Wikipedia Categories)
Node Weights
Disease Ontology
The occurrences of diseases
ImageNet
The number of pictures
Wikipedia Categories
The views under the categories
 
Disease Ontology
 
Wikipedia Categories
 
ImageNet
Motivations
 
Top-k summarization
Massive terminologies and
complex structures
Difficult to understand or
visualize
An example (k=4)
disease: 
general concept
COVID-19: 
the most
important disease
cancer & flu: 
two
categories of multiple
diseases with large weights
 
 
Disease Ontology
Top-4 Summarization
Related Work
Aggregate method
[X Jing et al. 2014]
Select top-k vertices
with maximum
aggregate value.
Lack diversity
GVDO Model
Tree
Disease Ontology
Our method
Aggregate method
Related Work
Graph summarization
Graph stream summarization
 [X Gou et al.  ICDE’19]
RDF graph summarization 
[S Cebiric et al. PVLDB’15]
Summarization for keyword search 
[G Fakas et al. SIGMOD’15]
Top-k diversification
Top-k maximum clique 
[L Yuan et al. VLDBJ’ 16]
Top-k diversified subgraph 
[Z Yang et al. SIGMOD’16]
Outlines
Motivations
Related Work
KDAG-Problem
Algorithms
Experiments
Conclusions
kDAG-Problem
Summary Score
Here, S is the selected set, V is
the set of all vertices and feq(u)
is the weight of vertex u.
The score is determined by the
weight and distance from the
selected vertices.
KDAG-problem
Find the set
kDAG-Problem Analysis
Diversity
The selected vertices are not similar.
Small Scale
The size of selected vertices is small.
Large Coverage
The selected vertices can reach many
important vertices.
High Correlation
The representative correlation of selected
vertices to an important vertex is high.
 
Our kDAG-Problem is NP-hard
Reduction from 3-SAT problem
Outlines
Motivations
Related Work
KDAG-Problem
Algorithms
Experiments
Conclusions
Greedy+
Marginal Gain
L
Add maximum
marginal gain into S
greedily.
1
2
3
4
5
6
7
1
0
24
42
18
6
6
1
2
3
4
5
6
7
37
46
39
42
18
6
6
Running Example
Marginal Gain
Greedy+
Marginal Gain
L
Add maximum
marginal gain into S
greedily.
1
2
3
4
5
6
7
1
0
24
42
18
6
6
1
2
3
4
5
6
7
37
46
39
42
18
6
6
Running Example
Marginal Gain
Greedy+
Marginal Gain
L
Add maximum
marginal gain into S
greedily.
1
2
3
4
5
6
7
1
0
24
42
18
6
6
1
2
3
4
5
6
7
37
39
42
18
6
6
Running Example
Marginal Gain
Greedy+
Marginal Gain
L
Add maximum
marginal gain into S
greedily.
1
2
3
4
5
6
7
1
0
24
42
18
6
6
1
2
3
4
5
6
7
37
39
21
18
6
6
Running Example
Marginal Gain
Greedy+
Marginal Gain
L
Add maximum
marginal gain into S
greedily.
1
2
3
4
5
6
7
1
0
24
42
18
6
6
1
2
3
4
5
6
7
37
14
21
18
6
6
Running Example
Marginal Gain
Greedy+
Marginal Gain
L
Add maximum
marginal gain into S
greedily.
1
2
3
4
5
6
7
1
0
24
42
18
6
6
1
2
3
4
5
6
7
1
14
21
18
6
6
Running Example
Marginal Gain
Greedy+
Monotone
Submodular
(1 – 1/e)-approximation guarantee
EXT-Greedy
Extract subtree based on
greedy+ answer.
Apply optimal method[1] on
the subtree.
[1] Ontology-based Graph Visualization for Summarized View.  X Zhu et al.
2020. arXiv:arXiv:2008.03053
1
2
3
4
5
6
7
1
0
24
42
18
6
6
Running Example
EXT-Greedy
Extract subtree based on
greedy+ answer.
Apply optimal method[1] on
the subtree.
[1] Ontology-based Graph Visualization for Summarized View.  X Zhu et al.
2020. arXiv:arXiv:2008.03053
1
2
3
4
5
6
7
1
0
24
42
18
6
6
Greedy+ answer
EXT-Greedy
Extract subtree based on
greedy+ answer.
Apply optimal method[1] on
the subtree.
[1] Ontology-based Graph Visualization for Summarized View.  X Zhu et al.
2020. arXiv:arXiv:2008.03053
1
2
3
4
5
6
7
1
0
24
42
18
6
6
Subtree extraction
EXT-Greedy
Extract subtree based on
greedy+ answer.
Apply optimal method[1] on
the subtree.
[1] Ontology-based Graph Visualization for Summarized View.  X Zhu et al.
2020. arXiv:arXiv:2008.03053
1
2
3
4
5
6
7
1
0
24
42
18
6
6
Optimal solution
in subtree
EXT-Greedy
Extract subtree based on
greedy+ answer.
Apply optimal method[1] on
the subtree.
[1] Ontology-based Graph Visualization for Summarized View.  X Zhu et al.
2020. arXiv:arXiv:2008.03053
1
2
3
4
5
6
7
1
0
24
42
18
6
6
Optimal solution
in the original DAG
K-PCGS
Compress DAG
Compress the DAG into a
new graph by removing
useless vertices.
Pruning Candidates
Prune the vertices from
candidates
 
whose upper
bound is smaller than
the top-k lower bound.
1
2
3
4
5
6
7
22
0
24
60
0
6
6
Compressed DAG
Pruned Candidates (Black)
Candidates (Blue)
1
2
3
4
5
6
7
22
 22
0
 0
12
 24
0
 0
0
 0
3
 6
3
 6
Lower Bound (Green)
Upper Bound (Red)
8
8
0
30
 60
K-PCGS
Compress DAG
Compress the DAG into a
new graph by removing
useless vertices.
Pruning Candidates
Prune the vertices from
candidates
 
whose upper
bound is smaller than
the top-k lower bound.
1
2
3
4
6
7
22
0
24
60
6
6
1
2
3
4
6
7
22
 22
0
 0
12
 24
0
 0
3
 6
3
 6
8
8
0
30
 60
Compressed DAG
Pruned Candidates (Black)
Candidates (Blue)
Lower Bound (Green)
Upper Bound (Red)
K-PCGS
Compress DAG
Compress the DAG into a
new graph by removing
useless vertices.
Pruning Candidates
Prune the vertices from
candidates
 
whose upper
bound is smaller than
the top-k lower bound.
1
2
3
4
22
0
(24,
 
12)
60
1
2
3
4
22
 22
0
 0
14
 30
0
 0
8
8
0
30
 60
Compressed DAG
Pruned Candidates (Black)
Candidates (Blue)
Lower Bound (Green)
Upper Bound (Red)
K-PCGS
Compress DAG
Compress the DAG into a
new graph by removing
useless vertices.
Pruning Candidates
Prune the vertices from
candidates
 
whose upper
bound is smaller than
the top-k lower bound.
1
2
3
4
22
0
(24,
 
12)
60
1
2
3
4
22
 22
0
 20
14
 30
0
 0
8
8
0
30
 60
Compressed DAG
Pruned Candidates (Black)
Candidates (Blue)
Lower Bound (Green)
Upper Bound (Red)
K-PCGS
Compress DAG
Compress the DAG into a
new graph by removing
useless vertices.
Pruning Candidates
Prune the vertices from
candidates
 
whose upper
bound is smaller than
the top-k lower bound.
1
2
3
4
22
0
(24,
 
12)
60
1
2
3
4
22
 53
0
 20
14
 30
0
 0
8
8
0
30
 60
Compressed DAG
Pruned Candidates (Black)
Candidates (Blue)
Lower Bound (Green)
Upper Bound (Red)
Complexity Comparison
Here, n is the size of vertices, m is the size of edges, 
h is the height of DAG (length of longest shortest path).
|C| is smaller than n and m’ is smaller than m. 
Outlines
Motivations
Related Work
KDAG-Problem
Algorithms
Experiments
Conclusions
Datasets
Evaluation metrics
Summary Score
Average Distance
1-hop Weighted Coverage
Methods compared
FEQ  
[X Jing et al. 2014]
Select k nodes with the highest weight.
CAGG  
[X Jing et al. 2014]
Aggregate method with a contribution ratio.
LASP 
[G Fakas et al. SIGMOD’15]
Add the largest averaged score path greedily.
Effectiveness Evaluations
 
Average Distance of different models.
Our kDAG model outperforms state-of-the-art
models with the smallest average distance.
Efficiency Evaluations
 
Efficiency evaluation of Greedy+, EXT-Greedy, k-PCGS
and CAGG methods on ImageNet.
The k-PCGS runs fastest among all methods
and EXT-Greedy is the slowest.
Efficiency Evaluations
 
The size of candidates
The k-PCGS
 
reduces at least 90% candidates in
the worst case and more than 99% in most cases.
Outlines
Motivations
Related Work
KDAG-Problem
Algorithms
Experiments
Conclusions
Conclusions
We study the top-k graph summarization problem on
large hierarchical DAGs.
We propose a novel model of kDAG-problem and
proof its NP-hardness.
We develop three algorithms to tackle kDAG-problem:
Greedy+:  
A baseline greedy method.
EXT-Greedy:  
A more effective method.
k-PCGS:  
A more efficient method.
 
Thank you for watching!
Contact us: csxlzhu@comp.hkbu.edu.hk
Slide Note
Embed
Share

Explore top-k graph summarization techniques on Hierarchical Directed Acyclic Graphs (DAGs) like Disease Ontology, ImageNet, and Wikipedia Categories. Understand motivations for summarization, related works, and the kDAG-Problem. Discover algorithms, experiments, and conclusions for efficient graph summarization in complex structures.

  • Graph Summarization
  • Top-k Summarization
  • Hierarchical DAGs
  • Disease Ontology
  • Algorithm

Uploaded on Oct 09, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Top Top- -k Graph Summarization on k Graph Summarization on Hierarchical DAGs Hierarchical DAGs Xuliang Zhu, Xin Huang, Byron Choi, Jianliang Xu Hong Kong Baptist University Hong Kong, China

  2. Outlines Motivations Related Work KDAG-Problem Algorithms Experiments Conclusions

  3. Motivations Hierarchical DAGs are everywhere (e.g. Disease Ontology, ImageNet, Wikipedia Categories) Node Weights Disease Ontology The occurrences of diseases ImageNet The number of pictures Wikipedia Categories The views under the categories Disease Ontology ImageNet Wikipedia Categories

  4. Motivations Top-k summarization Massive terminologies and complex structures Difficult to understand or visualize An example (k=4) disease: general concept COVID-19: the most important disease cancer & flu: two categories of multiple diseases with large weights Disease Ontology Top-4 Summarization

  5. Related Work Aggregate method [X Jing et al. 2014] Select top-k vertices with maximum aggregate value. Lack diversity Disease Ontology GVDO Model Tree Aggregate method Our method

  6. Related Work Graph summarization Graph stream summarization[X Gou et al. ICDE 19] RDF graph summarization [S Cebiric et al. PVLDB 15] Summarization for keyword search [G Fakas et al. SIGMOD 15] Top-k diversification Top-k maximum clique [L Yuan et al. VLDBJ 16] Top-k diversified subgraph [Z Yang et al. SIGMOD 16]

  7. Outlines Motivations Related Work KDAG-Problem Algorithms Experiments Conclusions

  8. kDAG-Problem Summary Score Here, S is the selected set, V is the set of all vertices and feq(u) is the weight of vertex u. The score is determined by the weight and distance from the selected vertices. KDAG-problem Find the set

  9. kDAG-Problem Analysis Diversity The selected vertices are not similar. Small Scale The size of selected vertices is small. Large Coverage The selected vertices can reach many important vertices. High Correlation The representative correlation of selected vertices to an important vertex is high. Our kDAG-Problem is NP-hard Reduction from 3-SAT problem

  10. Outlines Motivations Related Work KDAG-Problem Algorithms Experiments Conclusions

  11. Greedy+ 37 1 1 1 Marginal Gain 39 46 24 0 2 3 2 3 L Add maximum marginal gain into S greedily. 4 5 6 6 7 6 4 5 6 6 7 6 42 18 42 18 Marginal Gain Running Example

  12. Greedy+ 37 1 1 1 Marginal Gain 39 46 24 0 2 3 2 3 L Add maximum marginal gain into S greedily. 4 5 6 6 7 6 4 5 6 6 7 6 42 18 42 18 Marginal Gain Running Example

  13. Greedy+ 37 1 1 1 Marginal Gain 39 24 0 2 3 2 3 L Add maximum marginal gain into S greedily. 4 5 6 6 7 6 4 5 6 6 7 6 42 18 42 18 Marginal Gain Running Example

  14. Greedy+ 37 1 1 1 Marginal Gain 39 24 0 2 3 2 3 L Add maximum marginal gain into S greedily. 4 5 6 6 7 6 4 5 6 6 7 6 21 18 42 18 Marginal Gain Running Example

  15. Greedy+ 37 1 1 1 Marginal Gain 14 24 0 2 3 2 3 L Add maximum marginal gain into S greedily. 4 5 6 6 7 6 4 5 6 6 7 6 21 18 42 18 Marginal Gain Running Example

  16. Greedy+ 1 1 1 1 Marginal Gain 14 24 0 2 3 2 3 L Add maximum marginal gain into S greedily. 4 5 6 6 7 6 4 5 6 6 7 6 21 18 42 18 Marginal Gain Running Example

  17. Greedy+ Monotone Submodular (1 1/e)-approximation guarantee

  18. EXT-Greedy 1 1 Extract subtree based on greedy+ answer. 24 0 2 3 Apply optimal method[1] on the subtree. 4 5 6 6 7 6 42 18 Running Example [1] Ontology-based Graph Visualization for Summarized View. X Zhu et al. 2020. arXiv:arXiv:2008.03053

  19. EXT-Greedy 1 1 Extract subtree based on greedy+ answer. 24 0 2 3 Apply optimal method[1] on the subtree. 4 5 6 6 7 6 42 18 Greedy+ answer [1] Ontology-based Graph Visualization for Summarized View. X Zhu et al. 2020. arXiv:arXiv:2008.03053

  20. EXT-Greedy 1 1 Extract subtree based on greedy+ answer. 24 0 2 3 Apply optimal method[1] on the subtree. 4 5 6 6 7 6 42 18 Subtree extraction [1] Ontology-based Graph Visualization for Summarized View. X Zhu et al. 2020. arXiv:arXiv:2008.03053

  21. EXT-Greedy 1 1 Extract subtree based on greedy+ answer. 24 0 2 3 Apply optimal method[1] on the subtree. 4 5 6 6 7 6 42 18 Optimal solution in subtree [1] Ontology-based Graph Visualization for Summarized View. X Zhu et al. 2020. arXiv:arXiv:2008.03053

  22. EXT-Greedy 1 1 Extract subtree based on greedy+ answer. 24 0 2 3 Apply optimal method[1] on the subtree. 4 5 6 6 7 6 42 18 Optimal solution in the original DAG [1] Ontology-based Graph Visualization for Summarized View. X Zhu et al. 2020. arXiv:arXiv:2008.03053

  23. K-PCGS 22 22 22 1 1 Compress DAG Compress the DAG into a new graph by removing useless vertices. 12 24 0 0 24 0 2 3 2 3 4 5 6 7 4 5 0 6 6 7 6 Pruning Candidates Prune the vertices from candidateswhose upper bound is smaller than the top-k lower bound. 3 6 3 6 0 0 0 0 0 8 30 60 8 60 Compressed DAG Pruned Candidates (Black) Candidates (Blue) Lower Bound (Green) Upper Bound (Red)

  24. K-PCGS 22 22 22 1 1 Compress DAG Compress the DAG into a new graph by removing useless vertices. 12 24 0 0 24 0 2 3 2 3 4 6 7 4 6 6 7 6 Pruning Candidates Prune the vertices from candidateswhose upper bound is smaller than the top-k lower bound. 3 6 3 6 0 0 0 8 30 60 8 60 Compressed DAG Pruned Candidates (Black) Candidates (Blue) Lower Bound (Green) Upper Bound (Red)

  25. K-PCGS 22 22 22 1 1 Compress DAG Compress the DAG into a new graph by removing useless vertices. 14 30 0 0 (24, 12) 0 2 3 2 3 4 4 Pruning Candidates Prune the vertices from candidateswhose upper bound is smaller than the top-k lower bound. 0 0 0 8 30 60 8 60 Compressed DAG Pruned Candidates (Black) Candidates (Blue) Lower Bound (Green) Upper Bound (Red)

  26. K-PCGS 22 22 22 1 1 Compress DAG Compress the DAG into a new graph by removing useless vertices. 14 30 0 20 (24, 12) 0 2 3 2 3 4 4 Pruning Candidates Prune the vertices from candidateswhose upper bound is smaller than the top-k lower bound. 0 0 0 8 30 60 8 60 Compressed DAG Pruned Candidates (Black) Candidates (Blue) Lower Bound (Green) Upper Bound (Red)

  27. K-PCGS 22 53 22 1 1 Compress DAG Compress the DAG into a new graph by removing useless vertices. 14 30 0 20 (24, 12) 0 2 3 2 3 4 4 Pruning Candidates Prune the vertices from candidateswhose upper bound is smaller than the top-k lower bound. 0 0 0 8 30 60 8 60 Compressed DAG Pruned Candidates (Black) Candidates (Blue) Lower Bound (Green) Upper Bound (Red)

  28. Complexity Comparison Method Greedy+ EXT-Greedy k-PCGS Time Complexity O(nkm) O(nkm+nk3h) O(mh + nlogn + k|C|m ) Space Complexity O(m) O(nk2h) O(m) Here, n is the size of vertices, m is the size of edges, h is the height of DAG (length of longest shortest path). |C| is smaller than n and m is smaller than m.

  29. Outlines Motivations Related Work KDAG-Problem Algorithms Experiments Conclusions

  30. Datasets Dataset Hierarchical DAG Weight n m LATT Medical Entity Dictionary Number of queries 4,226 8,190 from patients LNUR Medical Entity Dictionary Number of queries 4,226 8,190 from nurses ANIM Anime" catalog in Wikipedia Number of views 15,135 24,498 IMAGE YAGO3 ImageNet (WordNet) Wikipedia + WordNet Number of pictures Synthetic 73,298 5,699,254 74,732 17,512,754 UCI Social Network Only for efficiency test Synthetic 58,790,783 92,208,196

  31. Evaluation metrics Summary Score Average Distance 1-hop Weighted Coverage

  32. Methods compared FEQ [X Jing et al. 2014] Select k nodes with the highest weight. CAGG [X Jing et al. 2014] Aggregate method with a contribution ratio. LASP [G Fakas et al. SIGMOD 15] Add the largest averaged score path greedily.

  33. Effectiveness Evaluations Average Distance of different models. Our kDAG model outperforms state-of-the-art models with the smallest average distance.

  34. Efficiency Evaluations Efficiency evaluation of Greedy+, EXT-Greedy, k-PCGS and CAGG methods on ImageNet. The k-PCGS runs fastest among all methods and EXT-Greedy is the slowest.

  35. Efficiency Evaluations The size of candidates The k-PCGSreduces at least 90% candidates in the worst case and more than 99% in most cases.

  36. Outlines Motivations Related Work KDAG-Problem Algorithms Experiments Conclusions

  37. Conclusions We study the top-k graph summarization problem on large hierarchical DAGs. We propose a novel model of kDAG-problem and proof its NP-hardness. We develop three algorithms to tackle kDAG-problem: Greedy+: A baseline greedy method. EXT-Greedy: A more effective method. k-PCGS: A more efficient method.

  38. Thank you for watching! Contact us: csxlzhu@comp.hkbu.edu.hk

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#