Efficient Billion-Scale Label-Constrained Reachability Queries

Answering Billion-Scale Label-Constrained Reachability

Answering Billion-Scale Label-Constrained Reachability

Queries within Microsecond

Queries within Microsecond

You Peng,

Ying Zhang,

Xuemin

Lin,

Lu Qin, Wenjie Zhang

The University of New South Wales

The University of Technology Sydney

Outline of This Paper

Introduction

•

Big

graph

 data sets are ubiquitous

•

Social networks(e.g., Facebook, Twitter)

•

Knowledge graphs (e.g., DBPedia, MS Academic Graph)

•

Biological networks

•

We Focus on “

things

”(nodes or vertices) and their

relationships

labeled

 directed edges.)

Problem Definition

•

Label-Constrained Reachability(LCR) Queries

Given a graph G, a query pair(s, t) and label set, an LCR query is to ask

whether a vertex s can reach

another vertex t through only edges with the labels in set L.

•

Example:

•

A metabolic network, Label Constraints = {L1, L2, L3, L4}.

IS

 indicates the i

th

 intermediate substrate.

 indicates the i

th

 enzyme.

•

Query(v1,v6,{L1,L4}) = True

•

Query(V1,V5,{L3}) = False

•

LCR Queries

•

Natural generalization of reachability queries.

•

An important fragment of the language of

regular path queries.

•

Implemented in W3C’s SPARQL 1.1, Neo4j’s Cypher, and Oracle’s PGQL

Existing Works

•

Two approaches:

exhaustive search

 using state-of-the-art methods such as direction-optimizing BFS(DBFS)

•

Beamer et al. Scientific Programming 21, 2013

•

Or graph indexing for accelerated search:

•

Jin et al. SIGMOD 2010,

•

Bonchi et al. EDBT, 2014

•

Zou et al. Information Systems 40, 2014

•

Valstar et a., SIGMOD 2017. (

LI+

: the state-of-the-art)

•

Landmark Index(LI+):

•

Key idea: construct an index for

partial landmarks

by BFS from landmark.

•

Insert a new index entry: compare to the existing index entries and only keep the

minimal

 one.

•

Answer an LCR-query:  use

pruned BFS

which uses the landmarks index.

•

Extended index techniques: e.g. index non-landmarks, pruning for accelerating false-queries.

Our contributions

•

New indexing methods based

2-hop labeling

framework to the well-studied LCR problem.

•

New

Vertices orders

and

BFS search Orders

are proposed to speed up the performance.

•

Scales to

billion-scale

 graphs, where all the existing works could not address.

•

Up to orders of magnitude

faster

 query evaluation, indexing time, and

smaller

 indexing size than the

state-of-the-art LI+ algorithm.

Proposed Method

•

2-hop cover framework

•

In labels(L

in

) and out labels(L

out

) for each vertex

•

Ideal Properties for 2-hop index

•

Soundness

•

Completeness

•

Minimal

Proposed Method

P2H: pruned landmark

(1,r)

(1,g)

(1,b)

(1,rb)

(1,rg),1(gb),

(1,rg)

(1,r)

(1,r)

(2,r)

(2,g)

,(2,g)

,(2,rg)

,(3,b)

(4,g)

(5,r)

Proposed Method

P2H+: pruned landmark with minimal property

Minimal property: BFS Search order

•

Original BFS Order in distance order.

•

in

[5] = {(1,r), (1,rg), (1,rb), (1,rw)}

•

in

[6],

in

[7], L

in

[8]…

•

 BFS Order from in label size order.

•

in

[5] = {(1,r)}

•

in

[6] = {(1,rg)}

•

in

[7] = {(1,rb)}

•

in

[8] = {(1,r)}

•

Minimal Property

Proposed Method

Vertex Ordering Strategies:

•

Degree

•

Significant path

•

Dealing with large labels:

•

Primary Label Index

•

Secondary Label Index

Experiment

•

Algorithms:

•

Billion-Scale Edges

•

Graphs with large labels

Performance

•

Algorithms:

•

LI+: The state-of-the-art landmark

index-based

•

NP2H: Naïve One-label 2-hop index

•

P2H: proposed LC 2-hop index

•

P2H+: LC 2-hop index algorithm with

advanced BFS Search Order and Vertex

Order Strategy.

•

IT

: Index Time(S)

•

IS

: Index Size(MB)

Performance

•

Query Time(speed-ups on LI+)

Conclusion

and

future

work

Slide Note

Good morning, everyone.

My name is YouPeng. I am a postdoc researcher from the university of new south wales.

The title of my paper is “Answering Billion-Scale Label-Constrained Reachbility Queries within Microsecond”.

All the authors are from the university of new South Wales, and the university of Technology Sydney.

Embed Share

Download

Graph data sets are prevalent in various domains like social networks and biological networks. Label-Constrained Reachability (LCR) queries aim to determine if a vertex can reach another vertex through specific labeled edges. Existing works utilize exhaustive search or graph indexing techniques, but new indexing methods based on a 2-hop labeling framework provide significant improvements in query evaluation speed and indexing size, scaling to billion-scale graphs.

prashant Follow

Uploaded on Sep 10, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

Answering Billion-Scale Label-Constrained Reachability Queries within Microsecond You Peng, Ying Zhang, Xuemin Lin, Lu Qin, Wenjie Zhang The University of New South Wales The University of Technology Sydney 1

Introduction Big graph data sets are ubiquitous Social networks(e.g., Facebook, Twitter) Knowledge graphs (e.g., DBPedia, MS Academic Graph) Biological networks We Focus on things (nodes or vertices) and their relationships (labeled directed edges.) 3

Problem Definition Label-Constrained Reachability(LCR) Queries: Given a graph G, a query pair(s, t) and label set, an LCR query is to ask whether a vertex s can reach another vertex t through only edges with the labels in set L. Example: ISiindicates the ithintermediate substrate. Liindicates the ithenzyme. A metabolic network, Label Constraints = {L1, L2, L3, L4}. Query(v1,v6,{L1,L4}) = True Query(V1,V5,{L3}) = False LCR Queries Natural generalization of reachability queries. An important fragment of the language of regular path queries. Implemented in W3C s SPARQL 1.1, Neo4j s Cypher, and Oracle s PGQL 4

Existing Works Two approaches: exhaustive search using state-of-the-art methods such as direction-optimizing BFS(DBFS) Beamer et al. Scientific Programming 21, 2013 Or graph indexing for accelerated search: Jin et al. SIGMOD 2010, Bonchi et al. EDBT, 2014 Zou et al. Information Systems 40, 2014 Valstar et a., SIGMOD 2017. (LI+: the state-of-the-art) Landmark Index(LI+): Key idea: construct an index for partial landmarks by BFS from landmark. Insert a new index entry: compare to the existing index entries and only keep the minimal one. Answer an LCR-query: use pruned BFS which uses the landmarks index. Extended index techniques: e.g. index non-landmarks, pruning for accelerating false-queries. 5

Our contributions New indexing methods based 2-hop labeling framework to the well-studied LCR problem. New Vertices orders and BFS search Orders are proposed to speed up the performance. Scales to billion-scale graphs, where all the existing works could not address. Up to orders of magnitude faster query evaluation, indexing time, and smaller indexing size than the state-of-the-art LI+ algorithm. 6

Proposed Method 2-hop cover framework In labels(Lin) and out labels(Lout) for each vertex Ideal Properties for 2-hop index Soundness Completeness Minimal 7

Proposed Method P2H: pruned landmark id Lin 1 - 2 3 4 5 6 7 (1,rg),1(gb), (2,g) ,(3,b) (1,rg) ,(2,g) ,(2,rg) (1,r) (1,r) (5,r) (1,g) (1,r) Lout - - - - (4,g) (1,b) (1,rb) (2,r) 8

Proposed Method P2H+: pruned landmark with minimal property Minimal property: BFS Search order Original BFS Order in distance order. Lin[5] = {(1,r), (1,rg), (1,rb), (1,rw)} Lin[6], Lin[7], Lin[8] BFS Order from in label size order. Lin[5] = {(1,r)} Lin[6] = {(1,rg)} Lin[7] = {(1,rb)} Lin[8] = {(1,r)} Minimal Property 9

Proposed Method Vertex Ordering Strategies: Degree Significant path Dealing with large labels: Primary Label Index Secondary Label Index 10

Experiment Algorithms: Billion-Scale Edges Graphs with large labels 11

Performance Algorithms: LI+: The state-of-the-art landmark index-based NP2H: Na ve One-label 2-hop index P2H: proposed LC 2-hop index P2H+: LC 2-hop index algorithm with advanced BFS Search Order and Vertex Order Strategy. IT: Index Time(S) IS: Index Size(MB) 12

Performance Query Time(speed-ups on LI+) 13

Conclusion and future work Conclusion Future work We study the problem of LCR. We propose 2-hop based algorithms We develop advanced BFS search order and Vertex Order, which is the first work dealing with billion-scale LCR. We have conducted an extensive experimental evaluation Investigate the LCR problem on dynamic graphs. 14

Efficient Billion-Scale Label-Constrained Reachability Queries

Download Presentation

Presentation Transcript

Related

More Related Content