Query-Centric Framework for Big Graph Querying

Slide Note
Embed
Share

A comprehensive exploration of Google's Pregel system, outlining its design, programming interfaces, vertex partitioning, vertex states, and practical examples like Breadth-First Search. The framework provides insights into large-scale graph processing by thinking like a vertex and leveraging message passing in an iterative superstep fashion.


Uploaded on Sep 21, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. A General-Purpose Query-Centric Framework for Querying Big Graphs Da Yan (UAB), James Cheng (CUHK), M. Tamer zsu (UWaterloo), Fan Yang (CUHK), Yi Lu (MIT), John C.S. Lui (CUHK), Qizhen Zhang (UPenn), Wilfred Ng (HKUST)

  2. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 2

  3. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 3

  4. Googles Pregel Large-Scale Graph Processing Think like a vertex Message passing Iterative Each iteration is called a superstep 4

  5. Googles Pregel Vertex Partitioning 2 4 5 6 1 0 3 7 8 1 3 0 1 2 7 0 2 3 2 5 7 1 3 4 7 4 6 0 3 1 4 2 5 5 8 2 3 4 8 6 7 6 7 8 M0 M1 M2 5

  6. Googles Pregel Programming Interfaces u.compute(msgs) u.send_msg(v, msg) get_superstep_number() u.vote_to_halt() Called inside u.compute(msgs) 6

  7. Googles Pregel Vertex state Active / inactive Reactivated by messages Stop condition All vertices are halted, and No pending messages for the next superstep 7

  8. Googles Pregel Example: Breadth-First Search (BFS) Superstep 1 0 s 8

  9. Googles Pregel Example: Breadth-First Search (BFS) Superstep 1 0 s 9

  10. Googles Pregel Example: Breadth-First Search (BFS) Superstep 1 0 s halt 10

  11. Googles Pregel Example: Breadth-First Search (BFS) Superstep 2 1 1 0 s 1 = : set value 1 bcast msgs halt 1 11

  12. Googles Pregel Example: Breadth-First Search (BFS) Superstep 2 1 1 0 s 1 1 12

  13. Googles Pregel Example: Breadth-First Search (BFS) Superstep 2 1 halt 1 0 halt s 1 halt 1 halt 13

  14. Googles Pregel Example: Breadth-First Search (BFS) Superstep 3 1 1 0 s 1 , halt 1 14

  15. Googles Pregel Example: Breadth-First Search (BFS) Superstep 3 1 1 0 s 1 = : set value 2 bcast msgs halt 1 15

  16. BigGraph@CUHK Improving Pregel Block-centric model (PVLDB 14) Message reduction (WWW 15) Querying workload (PVLDB 16) Cost model (PVLDB 14) Performance study (PVLDB 14) Tutorial (SIGMOD 16) Fault-tolerance, out-of-core support 16

  17. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 17

  18. Quegel Motivations A graph query usually has a light workload Only a portion of the whole graph gets accessed s t BFS (terminate earlier) Bi-directional BFS Hub2 indexing Point-to-Point Shortest Path 18

  19. Quegel Motivations A graph query usually has a light workload Only a portion of the whole graph gets accessed k2 k1 k2 k1 keywords: k1, k2 Graph Keyword Search 19

  20. Quegel Motivations Existing solutions are unsatisfactory Option 1: to process queries one job after another Network bandwidth under-utilization Too many synchronization barriers High startup overhead (e.g., graph loading) 20

  21. Quegel Motivations Existing solutions are unsatisfactory Option 2: to hard code a job to process multiple queries Hard code the iterating through a batch of queries Take care of stop conditions manually Straggler problem 21

  22. Quegel Motivations Quegel: A Query-Centric Framework Orders of magnitude performance improvement e.g., point-to-point shortest path query on Twitter Giraph: > 100 seconds per query Quegel: 3 queries per second 22

  23. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 23

  24. System Design Execution model: superstep-sharing Each iteration is called a super-round In a super-round, every query proceeds by one superstep 1 1 2 2 3 3 4 4 5 6 7 Super Round # q1 q2 1 2 3 4 q3 1 2 3 4 q4 1 2 3 4 Time q1 q2 q3 q4 Queries 24

  25. System Design Benefits of Superstep-Sharing Messages of multiple queries transmitted in one batch One synchronization barrier for each super-round Better load balancing sync sync sync time Worker 1 Worker 2 Superstep-Sharing Individual Synchronization 25

  26. System Design Programming Interface API is similar to Pregel get_query(): get content of current query force_terminate(): terminate the whole query immediately The system differentiates three types of data Q-data: superstep number, control information, V-data: adjacency list, vertex/edge labels VQ-data: vertex state in the evaluation of each query 26

  27. System Design Performance-Critical Designs Only create a VQ-data of a vertex for a query, if the query touches the vertex Garbage collection of Q-data and VQ-data Distributed indexing (e.g., inverted index on local machine) s t 27

  28. Outline Pregel Review: Think Like a Vertex Motivations of Developing Quegel System Design How to Use Quegel 28

  29. Get Started Website: cse.cuhk.edu.hk/quegel Open-source system code Manual for deployment Manual for running in a distributed cluster Programming Interface Example applications: Point-to-point shortest path Point-to-point reachability Keyword search on XML/RDF data Distance query on terrain data 29

  30. Q & A YAN, Da I m looking for Ph.D. students Email: yanda@uab.edu Website: http://yanda.cis.uab.edu 30

Related