Query-Centric Framework for Big Graph Querying

 
Da Yan (UAB)
, James Cheng (CUHK), M. Tamer Özsu (UWaterloo),
Fan Yang (CUHK), Yi Lu (MIT), John C.S. Lui (CUHK),
Qizhen Zhang (UPenn), Wilfred Ng (HKUST)
A General-Purpose Query-Centric
Framework for Querying Big Graphs
Outline
2
Pregel Review: Think Like a Vertex
Motivations of Developing Quegel
System Design
How to Use Quegel
Outline
3
Pregel Review: Think Like a Vertex
Motivations of Developing Quegel
System Design
How to Use Quegel
Google’s Pregel
 
Large-Scale Graph Processing
»
Think like a vertex
»
Message passing
»
Iterative
Each iteration is called a 
superstep
4
Google’s Pregel
Vertex Partitioning
5
0
1
2
3
4
5
6
7
8
 
M
0
 
M
1
 
M
2
Google’s Pregel
 
Programming Interfaces
»
 u
.
compute
(
msgs
)
»
 u
.
send_msg
(
v, msg
)
»
 
get_superstep_number
()
»
 u
.
vote_to_halt
()
6
 
Called inside 
u
.
compute
(
msgs
)
Google’s Pregel
 
Vertex state
»
Active / inactive
»
Reactivated by messages
Stop condition
»
All vertices are halted, and
»
No pending messages for the next superstep
7
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
1
8
s
 
0
 
 
 
 
 
 
 
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
1
9
0
s
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
1
10
0
halt
s
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
2
11
s
0
1
1
1
1
= 
∞:
 
set value 1
 
bcast msgs
 
halt
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
2
12
s
0
1
1
1
1
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
2
13
s
0
1
1
1
1
halt
halt
halt
halt
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
3
14
s
0
1
1
1
1
 
∞, halt
Google’s Pregel
Example: Breadth-First Search (BFS)
»
Superstep 
3
15
s
0
1
1
1
1
= 
∞:
 
set value 2
 
bcast msgs
 
halt
BigGraph@CUHK
Improving Pregel
»
Block-centric model (PVLDB’14)
»
Message reduction (WWW’15)
»
Querying workload (PVLDB’16)
»
Cost model (PVLDB’14)
»
Performance study (PVLDB’14)
»
Tutorial (SIGMOD’16)
»
Fault-tolerance, out-of-core support …
16
Outline
17
Pregel Review: Think Like a Vertex
Motivations of Developing Quegel
System Design
How to Use Quegel
A graph query usually has a light workload
»
Only a portion of the whole graph gets accessed
18
Quegel Motivations
P
o
i
n
t
-
t
o
-
P
o
i
n
t
 
S
h
o
r
t
e
s
t
 
P
a
t
h
 
BFS (terminate earlier)
Bi-directional BFS
Hub
2
 indexing
A graph query usually has a light workload
»
Only a portion of the whole graph gets accessed
19
Quegel Motivations
G
r
a
p
h
 
K
e
y
w
o
r
d
 
S
e
a
r
c
h
keywords: k
1
, k
2
k
1
k
2
k
1
k
2
 
Existing solutions are unsatisfactory
»
Option 
1
: to process queries 
one job after another
Network bandwidth under-utilization
Too many synchronization barriers
High startup overhead (e.g., graph loading)
20
Quegel Motivations
 
Existing solutions are unsatisfactory
»
Option 
2
: to hard code a job to process 
multiple
queries
Hard code the iterating through a batch of queries
Take care of stop conditions manually
Straggler problem
21
Quegel Motivations
 
Quegel: A Query-Centric Framework
»
Orders of magnitude
 performance improvement
e.g., point-to-point shortest path query on Twitter
Giraph: > 
100
 seconds 
per query
Quegel: 
3
 queries 
per second
22
Quegel Motivations
Outline
23
Pregel Review: Think Like a Vertex
Motivations of Developing Quegel
System Design
How to Use Quegel
 
Execution model: 
superstep-sharing
»
Each iteration is called a 
super-round
»
In a super-round, every query proceeds by one
superstep
24
 
S
u
p
e
r
R
o
u
n
d
 
#
 
1
 
q
1
 
2
 
3
 
4
1
2
3
4
 
q
3
 
q
2
 
q
4
 
T
i
m
e
 
Q
u
e
r
i
e
s
 
5
 
6
 
q
1
 
q
2
 
q
3
 
q
4
 
7
1
2
3
4
1
2
3
4
1
2
3
4
System Design
 
Benefits of Superstep-Sharing
»
Messages of multiple queries transmitted in one
batch
»
One synchronization barrier for each super-round
»
Better load balancing
25
 
Worker
 1
 
Worker
 2
 
time
 
sync
 
sync
 
sync
 
Individual Synchronization
 
Superstep-Sharing
System Design
 
Programming Interface
»
API is similar to Pregel
get_query
(): get content of current query
force_terminate
(): terminate the whole query immediately
»
The system differentiates three types of data
Q-data: superstep number, control information, …
V-data: adjacency list, vertex/edge labels
VQ-data: vertex state in the evaluation of each query
26
System Design
 
Performance-Critical Designs
»
Only create a VQ-data of a vertex for a query, if the
query touches the vertex
»
Garbage collection of Q-data and VQ-data
»
Distributed indexing (e.g., inverted index on local
machine)
»
27
System Design
Outline
28
Pregel Review: Think Like a Vertex
Motivations of Developing Quegel
System Design
How to Use Quegel
Get Started
Website: 
cse.cuhk.edu.hk/quegel
»
Open-source system code
»
Manual for 
deployment
»
Manual for 
running in a distributed cluster
»
Programming Interface
»
Example applications:
Point-to-point shortest path
Point-to-point reachability
Keyword search on XML/RDF data
Distance query on terrain data
29
30
Email: 
yanda@uab.edu
Website: 
http://yanda.cis.uab.edu
YAN, Da
Q & A
I’m looking for Ph.D. students
Slide Note
Embed
Share

A comprehensive exploration of Google's Pregel system, outlining its design, programming interfaces, vertex partitioning, vertex states, and practical examples like Breadth-First Search. The framework provides insights into large-scale graph processing by thinking like a vertex and leveraging message passing in an iterative superstep fashion.

  • Graph Processing
  • Pregel System
  • Big Data
  • Query Framework
  • Google Pregel

Uploaded on Sep 21, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. A General-Purpose Query-Centric Framework for Querying Big Graphs Da Yan (UAB), James Cheng (CUHK), M. Tamer zsu (UWaterloo), Fan Yang (CUHK), Yi Lu (MIT), John C.S. Lui (CUHK), Qizhen Zhang (UPenn), Wilfred Ng (HKUST)

  2. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 2

  3. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 3

  4. Googles Pregel Large-Scale Graph Processing Think like a vertex Message passing Iterative Each iteration is called a superstep 4

  5. Googles Pregel Vertex Partitioning 2 4 5 6 1 0 3 7 8 1 3 0 1 2 7 0 2 3 2 5 7 1 3 4 7 4 6 0 3 1 4 2 5 5 8 2 3 4 8 6 7 6 7 8 M0 M1 M2 5

  6. Googles Pregel Programming Interfaces u.compute(msgs) u.send_msg(v, msg) get_superstep_number() u.vote_to_halt() Called inside u.compute(msgs) 6

  7. Googles Pregel Vertex state Active / inactive Reactivated by messages Stop condition All vertices are halted, and No pending messages for the next superstep 7

  8. Googles Pregel Example: Breadth-First Search (BFS) Superstep 1 0 s 8

  9. Googles Pregel Example: Breadth-First Search (BFS) Superstep 1 0 s 9

  10. Googles Pregel Example: Breadth-First Search (BFS) Superstep 1 0 s halt 10

  11. Googles Pregel Example: Breadth-First Search (BFS) Superstep 2 1 1 0 s 1 = : set value 1 bcast msgs halt 1 11

  12. Googles Pregel Example: Breadth-First Search (BFS) Superstep 2 1 1 0 s 1 1 12

  13. Googles Pregel Example: Breadth-First Search (BFS) Superstep 2 1 halt 1 0 halt s 1 halt 1 halt 13

  14. Googles Pregel Example: Breadth-First Search (BFS) Superstep 3 1 1 0 s 1 , halt 1 14

  15. Googles Pregel Example: Breadth-First Search (BFS) Superstep 3 1 1 0 s 1 = : set value 2 bcast msgs halt 1 15

  16. BigGraph@CUHK Improving Pregel Block-centric model (PVLDB 14) Message reduction (WWW 15) Querying workload (PVLDB 16) Cost model (PVLDB 14) Performance study (PVLDB 14) Tutorial (SIGMOD 16) Fault-tolerance, out-of-core support 16

  17. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 17

  18. Quegel Motivations A graph query usually has a light workload Only a portion of the whole graph gets accessed s t BFS (terminate earlier) Bi-directional BFS Hub2 indexing Point-to-Point Shortest Path 18

  19. Quegel Motivations A graph query usually has a light workload Only a portion of the whole graph gets accessed k2 k1 k2 k1 keywords: k1, k2 Graph Keyword Search 19

  20. Quegel Motivations Existing solutions are unsatisfactory Option 1: to process queries one job after another Network bandwidth under-utilization Too many synchronization barriers High startup overhead (e.g., graph loading) 20

  21. Quegel Motivations Existing solutions are unsatisfactory Option 2: to hard code a job to process multiple queries Hard code the iterating through a batch of queries Take care of stop conditions manually Straggler problem 21

  22. Quegel Motivations Quegel: A Query-Centric Framework Orders of magnitude performance improvement e.g., point-to-point shortest path query on Twitter Giraph: > 100 seconds per query Quegel: 3 queries per second 22

  23. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 23

  24. System Design Execution model: superstep-sharing Each iteration is called a super-round In a super-round, every query proceeds by one superstep 1 1 2 2 3 3 4 4 5 6 7 Super Round # q1 q2 1 2 3 4 q3 1 2 3 4 q4 1 2 3 4 Time q1 q2 q3 q4 Queries 24

  25. System Design Benefits of Superstep-Sharing Messages of multiple queries transmitted in one batch One synchronization barrier for each super-round Better load balancing sync sync sync time Worker 1 Worker 2 Superstep-Sharing Individual Synchronization 25

  26. System Design Programming Interface API is similar to Pregel get_query(): get content of current query force_terminate(): terminate the whole query immediately The system differentiates three types of data Q-data: superstep number, control information, V-data: adjacency list, vertex/edge labels VQ-data: vertex state in the evaluation of each query 26

  27. System Design Performance-Critical Designs Only create a VQ-data of a vertex for a query, if the query touches the vertex Garbage collection of Q-data and VQ-data Distributed indexing (e.g., inverted index on local machine) s t 27

  28. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 28

  29. Get Started Website: cse.cuhk.edu.hk/quegel Open-source system code Manual for deployment Manual for running in a distributed cluster Programming Interface Example applications: Point-to-point shortest path Point-to-point reachability Keyword search on XML/RDF data Distance query on terrain data 29

  30. Q & A YAN, Da I m looking for Ph.D. students Email: yanda@uab.edu Website: http://yanda.cis.uab.edu 30

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#