Temporal Graph Analysis for Real-world Graph Evolution
Chronos: A Graph Engine for Temporal Graph Analysis in a social graph environment from years 2012 to 2014. Uncover insights through temporal graph properties and user ranking variations across different years. Discover the evolution of real-world graphs and the computation of properties on graph snapshots. Delve into existing graph engines targeting static graph analysis and performance issues. Revisit static graph analysis through propagation-based graph computation models.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Chronos: A Graph Engine for Chronos: A Graph Engine for Temporal Graph Analysis Temporal Graph Analysis Wentao Han1,3, Youshan Miao2,3, Kaiwei Li1,3, Ming Wu3, Fan Yang3, Lidong Zhou3, Vijayan Prabhakaran3, Wenguang Chen1, Enhong Chen2 Tsinghua University1 University of Science and Technology of China2 Microsoft Research3 1
Temporal Temporal Graphs Graphs Real-world graphs evolve temporal graphs A Social Graph 2012 2014 2013 YEAR Temporal graph properties bring more insights User Ranking 2012 2013 YEAR 2014 2
Temporal Temporal Graphs Graphs Real-world graphs evolve temporal graphs A Social Graph 2012 2014 2013 YEAR Temporal graph properties bring more insights User Ranking Temporal ranks can tell their differences 2012 2013 YEAR 2014 3
Temporal Graph Temporal Graph Analysis Analysis Computing properties on a series of graph snapshots t0 t1 t2 Graph snapshot 2012 2014 2013 YEAR Static Graph Analysis User Ranking Graph Properties 2012 2013 YEAR 2014 4
Temporal Graph Analysis Temporal Graph Analysis Existing graph engines: targeting static graph analysis A possible solution: computing snapshot by snapshot 2012 2014 2013 YEAR Task 1 Task 2 Task 3 User Ranking 2012 2013 YEAR 2014 5
Performance Issues Performance Issues ?????????= ???????? ??????? 6
Revisit: Static Graph Revisit: Static Graph Analysis Analysis Propagation based graph computation model Vertex Data Array Data Propagation v v2 2 ... ... ... ... v2 v1 v3 v v1 1 Local computation Edge Array v3 v5 scan v1 v2 v1 v3 v3 v5 ... ... ... ... 7
Revisit: Static Graph Revisit: Static Graph Analysis Analysis Propagation based graph computation model Vertex Data Array Cache Miss Data Propagation v v2 2 ... ... ... ... v2 v1 v3 v v1 1 Local computation Edge Array v3 v5 scan v1 v2 v1 v3 v3 v5 ... ... ... ... 8
Revisit: Static Graph Analysis Revisit: Static Graph Analysis In parallel: Partition graph & computations among CPU cores Vertex Data Array Inter-core Communication Cross-partition edge Core 1 Core 0 v v2 2 ... ... ... ... v2 v1 v3 v v1 1 Core 0 Core 0 Core Core 1 1 Edge Array v3 v5 Core 1 Core 0 scan v1 v2 v1 v3 v3 v5 ... ... ... ... 9
Temporal Graph Analysis: Snapshot by Snapshot Temporal Graph Analysis: Snapshot by Snapshot Computation on multiple graph snapshot multiple cost Vertex Data Arrays Snapshot 1 ... ... ... ... v2 v1 v3 N snapshots N cache misses N inter-core comm. Snapshot 2 v2' ... v3' ... ... v1' ... Snapshot 3 v2 ... v3 ... ... v1 ... 10
Observations Observations Real-world graph often evolve gradually (Similar snapshots) " " ' v v1 1 v v2 2 v v1 1 v v2 2 v v1 1 v v2 2 ' ' " v4 v4 v4 ' ' v3 v5 v3 v5 v3 v5 " " Snapshot 1 Snapshot 2 Snapshot 3 11
Observations Observations Similar propagations across snapshots " " ' v v1 1 ' v v2 2 v v1 1 v v2 2 v v1 1 v v2 2 " v4 v4 v4 ' ' ' " " v3 v5 v3 v5 v3 v5 Snapshot 1 Snapshot 2 Snapshot 3 12
Idea Idea Group propagations by source & target, not by snapshot " " ' v v1 1 ' v v2 2 v v1 1 v v2 2 v v1 1 v v2 2 " v4 v4 v4 ' ' ' " " v3 v5 v3 v5 v3 v5 Snapshot 1 Snapshot 2 Snapshot 3 Step 2 Step 1 Step 3 Step 1 Step 2 Step 4 Step 3 Propagations: 1 3 1 4 1 5 1 2 13
Chronos: Data Layout Chronos: Data Layout Vertex Data Arrays (snapshot-by-snapshot) Snapshot 1 ... ... ... ... v2 v3 v1 Place together data for the same vertex across multiple snapshots Snapshot 2 v2' ... v3' ... ... ... v1' Snapshot 3 v2 ... v3 ... ... v1 ... Vertex Data Array (Chronos) (with time-locality) Snapshot 1, 2, 3 v2 v2' v2 ... ... ... ... v1 v1' v1 v3 v3' v3 ... fit in a cache line 14
Chronos: Propagation Scheduling Chronos: Propagation Scheduling Locality Aware Batch Scheduling (LABS): Batching propagating across snapshots vertex 1 -> vertex 3 across snapshots vertex 1 -> vertex 2 across snapshots Edge Array v1 v2 v1' v2' v1 v2 v1 v3 v1' v3' v1 v3 ... ... scan Vertex Data Array v2 v2' v2 ... ... ... ... v1 v1' v1 v3 v3' v3 ... fit in a cache line 15
Chronos: Propagation Scheduling Chronos: Propagation Scheduling Locality Aware Batch Scheduling (LABS): Batching propagating across snapshots N propagations 1 cache misses Edge Array v1 v2 v1 v2 v1 v2 v1' v2' v1 v2 v1' v2' v1 v2 v1' v2' v1 v2 v1 v3 v1 v3 v1 v3 v1' v3' v1 v3 ... v1' v3' v1 v3 ... v1' v3' v1 v3 ... ... ... ... scan Vertex Data Array v2 v2' v2 ... ... ... ... v1 v1' v1 v3 v3' v3 ... Cache Hit fit in a cache line 16
Chronos: Propagation Scheduling Chronos: Propagation Scheduling Locality Aware Batch Scheduling (LABS): Batching propagating across snapshots N propagations 1 inter-core comm. Edge Array v1 v2 v1 v2 v1 v2 v1 v2 v1' v2' v1 v2 v1' v2' v1 v2 v1' v2' v1 v2 v1' v2' v1 v2 v1 v3 v1 v3 v1 v3 v1 v3 v1' v3' v1 v3 ... v1' v3' v1 v3 ... v1' v3' v1 v3 ... v1' v3' v1 v3 ... ... ... ... ... scan Vertex Data Array Core 1 Core 0 v2 v2' v2 ... ... ... ... v1 v1' v1 v3 v3' v3 ... Inter-core Communication access in a batch 17
LABS: The Key of Chronos LABS: The Key of Chronos A graph layout Place together vertex/edge data across snapshots A scheduling mechanism Batch propagations across snapshots Efficient Reduced cache miss / inter-core comm. 18
Experimental Evaluation Experimental Evaluation Large temporal graphs Graph # of Vertices # of Edge Events Time Span Source Wiki 1.9 M 40.0 M 6 years Wikipedia graph from KONECT Twitter 7.5 M 61.6 M 3 months Provided by Twitter Weibo 27.7 M 4.9 B 3 years Crawled from Sina Weibo Web 133.6 M 7.2 B 12 months Web graph from DELIS Various graph algorithms PageRank Weakly-connected components (WCC) Single-source shortest path (SSSP) Maximal independent set (MIS) Sparse matrix-vector multiplication (SpMV) Settings CPU 2.4GHz 16-Core RAM 128GB DISK 1TB SSD 19
Chronos: Single Chronos: Single- -Thread Thread Effectiveness Effectiveness 5~9x speedup Temporal Graph Analysis on Wiki 10 WCC 9 Pagerank 8 SSSP 7 Speedup 6 5 4 3 2 1 0 8 16 24 32 1 BatchSize Baseline: Snapshot by snapshot 20
Chronos: Chronos: Single Single- -Thread Effectiveness Thread Effectiveness Cache Miss Reduction 10,000 8,759 Cache miss # (in millions) 8,000 92% 6,000 3,865 3,462 4,000 70% 95% 160 2,000 1,107 1,003 687 649 584 287 265 196 0 L1d Cache Miss LLC Cache Miss dTLB Miss BatchSize=1 BatchSize=4 BatchSize=16 BatchSize=32 Reduced cache misses 21
Chronos: Chronos: Multi Multi- -Core Performance Core Performance PageRank on Wiki 90 80 Snapshot-by-snapshot 70 Chronos 60 Speedup 10x 50 40 30 20 10 0 0 4 8 # of Cores 12 16 1 More than to 10x faster 22
Chronos: Multi Chronos: Multi- -Core Performance Core Performance Reduced Inter-Core Communications 10000 4244.2 Communication Num. 2471.6 977.64 1000 (in Millions) 98% 98% No LABS 98% 105.2 58.56 100 LABS 23.08 10 1 2 4 8 Number of Cores Reduced inter-core comm. 23
More More in in Paper: Paper: Graph computation modes v1 v3 v2 v1 v6 v2 v3 v6 v1 v2 v1 v5 v4 v7 v8 v2 v5 v8 Push Mode Pull Mode Stream Mode All benefit from LABS 24
More More in in Paper: Paper: Incremental graph computation Leveraging the previous snapshot s result Computing only the changed part Can be enhanced with LABS 25
Conclusion Conclusion Temporal graph analysis an emerging class of applications Chronos supports analysis of temporal graphs efficiently Joint design of data layout and scheduling Leveraging the temporal similarity of graphs Exploit data locality esp. in time dimension 26
Thank You! Thank You! Questions? Questions? Tsinghua University University of Science and Technology of China Microsoft Research 27
BACKUP BACKUP Experiment Environment Details Real Graphs Similarities over Time Batch Size Discussion LABS Locking LABS with Incremental Computation LABS on Cluster Related Work 28
Experiment Setup Experiment Setup CPU 2.4GHz Intel Xeon E5-2665 16-core RAM 128GB 1TB SSD (RAID 0 with 372GB1 *3) DISK Network InfiniBand (DDR, 40Gb/s) ClusterSize 4 1. SSD model: TOSHIBA MK4001GRZB 29
Temporal Distributions of Graphs Temporal Distributions of Graphs Edges increase gradually wiki weibo 100% 100% 90% 90% 80% 80% Number of Edges 70% Number of Edges 70% 60% 60% 50% 50% 40% 40% 30% 30% 20% 20% 10% 10% 0% 0% 13% 19% 25% 31% 38% 44% 50% 56% 63% 69% 75% 81% 88% 94% 100% 6% Ratio of time range Ratio of time range 30
On On- -disk disk Temporal Graph Temporal Graph Snapshot Groups ... ... Snapshot Group 0 Snapshot Group 1 Time index A Snapshot Group ... a0,1 ... ... ... C0 a0,t C1 a1,1 a1,t Vertex index Edge activities of v0 Edge activities of v1 Edge data for v0 Edge data for v1 Ci: checkpoint of vi: Edges without time information aij: j-th activity of vi: Edge changes, e.g., <addE, (v0, v3, w), t2 > 31
LABS: In LABS: In- -memory Design memory Design Vertex Data Array ... v2 v2' v2 ... v1 v1' v1 Vertex index Data of v1 Data of v2 Logically Equals to: v1 v2 v1' v2' v1 v2 indicate which snapshots the edge exists in Edge Array (v1) v2 110 (v1) v3 ... ... 111 ... ... Vertex index Temporal Edge Edges of v1 32
Temporal Graph Re Temporal Graph Re- -construction construction User input time points: 0, 10, 20 Scan the graph activity log [Type, Endpoints, Time]: addE, v0->v1, 0 addE, v0->v2, 15 addE, v0->v3, 6 delE, v0->v3, 8 Temporal edges [Endpoints, BitSet]: v0->v1, 111 v0->v2, 001 33
Chronos System Overview Chronos System Overview On-Disk Temporal Graph Contains all the graph evolving activities User input multiple time points Scan activities(log) Reconstruct graph snapshots In-Memory Temporal Graph Contains ... v2 v2' v2 v1 v1' v1 ... only snapshots of interest ... ... (v1) v2 111 (v1) v3 111 ... ... Temporal Properties 34
Greater Batch Size of LABS Greater Batch Size of LABS Pros Possible to further reduce cache miss / inter-core comm. Cons Bit wide limit of the instruction: _BitScanForward64 Less snapshot similarity within a batch No more cache miss / inter-core comm. to reduce False sharing with locking 35
Compute Snapshot by Snapshot Compute Snapshot by Snapshot (another way) (another way) Snapshot-Parallelism Vertex Data Array Snapshot 1 Core 1 Core 0 Core 0 ... ... ... ... v2 v1 v3 Snapshot 2 Core 1 v2' ... v3' ... ... v1' ... 3 cache misses 3 inter-core comm. Snapshot 3 Core 2 v2 ... v3 ... ... v1 ... Cache Miss Inter-core communication 36
Parallelization Parallelization -- -- Summary Summary Good partitioning: Num. of intra-partition edge > Num. of inter-partition edge Partition- Parallelism Snapshot- Parallelism LABS- Parallelism More More Less Cache Miss ? Inter-core Communications More No Less Snapshot by snapshot LABS Partition-Parallelism: Computing partitions of the same snapshot in parallel Snapshot-Parallelism: Computing snapshots in parallel LABS-Parallel: Computing LABS-batched partition in parallel 37
LABS Performance on Multi LABS Performance on Multi- -Core Core PageRank on Wiki Partition-Parallelism 90 80 LABS-Parallelsm 70 Snapshot-Parallelism 60 Speedup 50 40 30 20 10 0 0 4 8 # of Cores 12 16 1 Baseline: Single Core LABS-Parallelism out-performs 38
LABS Performance on Cluster LABS Performance on Cluster A small cluster with 4 machines Up to 10x speed up 7318 6405 10000 2002 1250 518 1000 Time (s) 48 100 10 1 PageRank WCC SSSP Baseline LABS Benefit less than in single machine test The benefit of LABS hided by the high overhead of network 39
Reduced Reduced Lock Contentions Lock Contentions LABS amortizes the lock cost across snapshots PageRank on the Wiki graph 120 96.73 No LABS LABS 100 Lock time (second) 80 60 47.54 96% 34.25 40 28.85 96% 20 96% 95% 4.02 1.85 1.34 1.32 0 2 4 8 16 Number of Cores Reduced the time of locking by more than 95% 40
LABS with Incremental Computation LABS with Incremental Computation Traditional incremental computing Incremental Computing Snapshot 1 Snapshot 0 Snapshot 2 Snapshot 3 Incremental computing with LABS Apply LABS (BatchSize = 3) Snapshot 1 Snapshot 0 Snapshot 2 Snapshot 3 41
Gain of Incremental LABS Gain of Incremental LABS 70% WCC 60% SSSP 50% Improvement (%) 40% 30% 20% 10% 0% 1 2 4 8 16 32 Batch size Baseline: Traditional Incremental 42
Related work Related work Existing Graph Engines static graph engines Pregel (SIGMOD 10) Powergraph (OSDI 12) GraphLab (VLDB 12) Grace (ATC 12) X-stream (SOSP 13) Active studies on changes and new concepts in evolving graph Densification law, Shrinking diameters diameter (KDD 05) PageRank (CIKM 07), Facebook user activities (EuroSys 09), centrality in evolving graph (MLG 10), retweet after N friends retweets (WWW 11), Rumors detection (SOMA 10) 43