Efficient Billion-Scale Label-Constrained Reachability Queries

 
 
Answering Billion-Scale Label-Constrained Reachability
Answering Billion-Scale Label-Constrained Reachability
Queries within Microsecond
Queries within Microsecond
 
You Peng,
 
Ying Zhang,
 
Xuemin
 
Lin,
 
Lu Qin, Wenjie Zhang
The University of New South Wales
The University of Technology Sydney
 
Outline of This Paper
Introduction
 
Big 
graph
 data sets are ubiquitous
Social networks(e.g., Facebook, Twitter)
Knowledge graphs (e.g., DBPedia, MS Academic Graph)
Biological networks
 
 
 
 
 
 
 
 
 
 
We Focus on “
things
”(nodes or vertices) and their 
relationships
 (
labeled
 directed edges.)
 
 
Problem Definition
 
Label-Constrained Reachability(LCR) Queries
:
Given a graph G, a query pair(s, t) and label set, an LCR query is to ask 
whether a vertex s can reach
another vertex t through only edges with the labels in set L.
 
 
 
Example:
A metabolic network, Label Constraints = {L1, L2, L3, L4}.
     IS
i
 indicates the i
th
 intermediate substrate.
     L
i
 indicates the i
th
 enzyme.
 
Query(v1,v6,{L1,L4}) = True
 
Query(V1,V5,{L3}) = False
 
LCR Queries
Natural generalization of reachability queries.
An important fragment of the language of 
regular path queries.
Implemented in W3C’s SPARQL 1.1, Neo4j’s Cypher, and Oracle’s PGQL
 
 
 
 
 
Existing Works
 
Two approaches: 
exhaustive search
 using state-of-the-art methods such as direction-optimizing BFS(DBFS)
Beamer et al. Scientific Programming 21, 2013
 
Or graph indexing for accelerated search:
Jin et al. SIGMOD 2010,
Bonchi et al. EDBT, 2014
Zou et al. Information Systems 40, 2014
Valstar et a., SIGMOD 2017. (
LI+
: the state-of-the-art)
 
 
 
Landmark Index(LI+):
Key idea: construct an index for 
partial landmarks 
by BFS from landmark.
Insert a new index entry: compare to the existing index entries and only keep the 
minimal
 one.
Answer an LCR-query:  use 
pruned BFS 
which uses the landmarks index.
Extended index techniques: e.g. index non-landmarks, pruning for accelerating false-queries.
Our contributions
 
 
New indexing methods based 
2-hop labeling 
framework to the well-studied LCR problem.
New 
Vertices orders 
and 
BFS search Orders 
are proposed to speed up the performance.
Scales to 
billion-scale
 graphs, where all the existing works could not address.
Up to orders of magnitude 
faster
 query evaluation, indexing time, and 
smaller
 indexing size than the
state-of-the-art LI+ algorithm.
Proposed Method
 
2-hop cover framework
In labels(L
in
) and out labels(L
out
) for each vertex
 
 
 
 
 
Ideal Properties for 2-hop index
Soundness
Completeness
Minimal
Proposed Method
P2H: pruned landmark
 
(1,r)
 
(1,g)
 
(1,b)
 
(1,rb)
 
(1,rg),1(gb),
 
(1,rg)
 
(1,r)
 
(1,r)
 
(2,r)
 
(2,g)
 
,(2,g)
 
,(2,rg)
 
,(3,b)
 
(4,g)
 
(5,r)
Proposed Method
 
P2H+: pruned landmark with minimal property
 
Minimal property: BFS Search order
 
Original BFS Order in distance order.
L
in
[5] = {(1,r), (1,rg), (1,rb), (1,rw)}
L
in
[6],
 
L
in
[7], L
in
[8]…
 
 
 
 BFS Order from in label size order.
L
in
[5] = {(1,r)}
L
in
[6] = {(1,rg)}
L
in
[7] = {(1,rb)}
L
in
[8] = {(1,r)}
Minimal Property
 
 
 
 
 
Proposed Method
 
Vertex Ordering Strategies:
Degree
Significant path
Dealing with large labels:
Primary Label Index
Secondary Label Index
 
 
 
 
 
 
 
 
 
 
Experiment
 
Algorithms:
Billion-Scale Edges
Graphs with large labels
Performance
 
Algorithms:
LI+: The state-of-the-art landmark
index-based
NP2H: Naïve One-label 2-hop index
P2H: proposed LC 2-hop index
P2H+: LC 2-hop index algorithm with
advanced BFS Search Order and Vertex
Order Strategy.
IT
: Index Time(S)
IS
: Index Size(MB)
 
Performance
 
Query Time(speed-ups on LI+)
 
Conclusion
 
and
 
future
 
work
Slide Note

Good morning, everyone.

My name is YouPeng. I am a postdoc researcher from the university of new south wales.

The title of my paper is “Answering Billion-Scale Label-Constrained Reachbility Queries within Microsecond”.

All the authors are from the university of new South Wales, and the university of Technology Sydney.

Embed
Share

Graph data sets are prevalent in various domains like social networks and biological networks. Label-Constrained Reachability (LCR) queries aim to determine if a vertex can reach another vertex through specific labeled edges. Existing works utilize exhaustive search or graph indexing techniques, but new indexing methods based on a 2-hop labeling framework provide significant improvements in query evaluation speed and indexing size, scaling to billion-scale graphs.

  • Graph Data Sets
  • Label-Constrained Reachability
  • Query Evaluation
  • Indexing Techniques
  • 2-hop Labeling

Uploaded on Sep 10, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Answering Billion-Scale Label-Constrained Reachability Queries within Microsecond You Peng, Ying Zhang, Xuemin Lin, Lu Qin, Wenjie Zhang The University of New South Wales The University of Technology Sydney 1

  2. Introduction Big graph data sets are ubiquitous Social networks(e.g., Facebook, Twitter) Knowledge graphs (e.g., DBPedia, MS Academic Graph) Biological networks We Focus on things (nodes or vertices) and their relationships (labeled directed edges.) 3

  3. Problem Definition Label-Constrained Reachability(LCR) Queries: Given a graph G, a query pair(s, t) and label set, an LCR query is to ask whether a vertex s can reach another vertex t through only edges with the labels in set L. Example: ISiindicates the ithintermediate substrate. Liindicates the ithenzyme. A metabolic network, Label Constraints = {L1, L2, L3, L4}. Query(v1,v6,{L1,L4}) = True Query(V1,V5,{L3}) = False LCR Queries Natural generalization of reachability queries. An important fragment of the language of regular path queries. Implemented in W3C s SPARQL 1.1, Neo4j s Cypher, and Oracle s PGQL 4

  4. Existing Works Two approaches: exhaustive search using state-of-the-art methods such as direction-optimizing BFS(DBFS) Beamer et al. Scientific Programming 21, 2013 Or graph indexing for accelerated search: Jin et al. SIGMOD 2010, Bonchi et al. EDBT, 2014 Zou et al. Information Systems 40, 2014 Valstar et a., SIGMOD 2017. (LI+: the state-of-the-art) Landmark Index(LI+): Key idea: construct an index for partial landmarks by BFS from landmark. Insert a new index entry: compare to the existing index entries and only keep the minimal one. Answer an LCR-query: use pruned BFS which uses the landmarks index. Extended index techniques: e.g. index non-landmarks, pruning for accelerating false-queries. 5

  5. Our contributions New indexing methods based 2-hop labeling framework to the well-studied LCR problem. New Vertices orders and BFS search Orders are proposed to speed up the performance. Scales to billion-scale graphs, where all the existing works could not address. Up to orders of magnitude faster query evaluation, indexing time, and smaller indexing size than the state-of-the-art LI+ algorithm. 6

  6. Proposed Method 2-hop cover framework In labels(Lin) and out labels(Lout) for each vertex Ideal Properties for 2-hop index Soundness Completeness Minimal 7

  7. Proposed Method P2H: pruned landmark id Lin 1 - 2 3 4 5 6 7 (1,rg),1(gb), (2,g) ,(3,b) (1,rg) ,(2,g) ,(2,rg) (1,r) (1,r) (5,r) (1,g) (1,r) Lout - - - - (4,g) (1,b) (1,rb) (2,r) 8

  8. Proposed Method P2H+: pruned landmark with minimal property Minimal property: BFS Search order Original BFS Order in distance order. Lin[5] = {(1,r), (1,rg), (1,rb), (1,rw)} Lin[6], Lin[7], Lin[8] BFS Order from in label size order. Lin[5] = {(1,r)} Lin[6] = {(1,rg)} Lin[7] = {(1,rb)} Lin[8] = {(1,r)} Minimal Property 9

  9. Proposed Method Vertex Ordering Strategies: Degree Significant path Dealing with large labels: Primary Label Index Secondary Label Index 10

  10. Experiment Algorithms: Billion-Scale Edges Graphs with large labels 11

  11. Performance Algorithms: LI+: The state-of-the-art landmark index-based NP2H: Na ve One-label 2-hop index P2H: proposed LC 2-hop index P2H+: LC 2-hop index algorithm with advanced BFS Search Order and Vertex Order Strategy. IT: Index Time(S) IS: Index Size(MB) 12

  12. Performance Query Time(speed-ups on LI+) 13

  13. Conclusion and future work Conclusion Future work We study the problem of LCR. We propose 2-hop based algorithms We develop advanced BFS search order and Vertex Order, which is the first work dealing with billion-scale LCR. We have conducted an extensive experimental evaluation Investigate the LCR problem on dynamic graphs. 14

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#