Data Processing and Analysis for Graph-Based Algorithms
This presentation covers the preprocessing, compute, and post-processing stages of a graph analytics pipeline that starts from raw XML data. Topics include ETL, PageRank computation, and identifying top users and top Wikipedia pages, with a runtime comparison of naïve Spark, Giraph + Spark, GraphLab + Spark, and GraphX. It also covers the property graph model, data-parallel versus graph-parallel processing (tables versus Pregel), edge cuts and vertex cuts, community detection, and topic modeling with LDA.
Presentation Transcript
Graph analytics pipeline (stages: Preprocessing, Compute, Post-processing):
Raw Data (XML) -> ETL -> Initial Graph -> Slice -> Subgraph -> Compute -> PageRank -> Analyze -> Top Users (repeat)
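The compute and analyze steps of this pipeline (PageRank, then top users) can be illustrated with a minimal sketch in plain Python. It is not taken from the presentation: the function names `pagerank` and `top_k`, the damping factor of 0.85, the iteration count, and the toy edge list are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) edges."""
    nodes = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_degree[v] == 0)
        incoming = {v: 0.0 for v in nodes}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        rank = {
            v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
            for v in nodes
        }
    return rank


def top_k(rank, k):
    """Return the k highest-ranked (vertex, score) pairs."""
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical toy graph standing in for the extracted subgraph.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
top = top_k(ranks, 2)
```

In a real pipeline the edge list would come from the ETL and slice steps, and the loop would run on a distributed engine rather than in a single process.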
End-to-end Wikipedia pipeline: Raw Wikipedia (XML, on HDFS) -> Hyperlinks -> PageRank -> Top 20 Pages, with Spark preprocessing, a compute stage, and Spark post-processing.

Total runtime (in seconds):
Naïve Spark         1492
Giraph + Spark       605
GraphX               342
GraphLab + Spark     375
Property Graph

Vertex Table
Id    Property (V)
3     (rxin, student)
7     (jgonzal, postdoc)
5     (franklin, professor)
2     (istoica, professor)

Edge Table
SrcId    DstId    Property (E)
3        7        Collaborator
5        3        Advisor
2        5        Colleague
5        7        PI
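As a rough illustration of how the two tables fit together, the sketch below stores the vertex and edge data from the slide in plain Python structures and joins them into (source property, edge property, destination property) triplets. The helper name `triplets` is hypothetical, and this is not the GraphX API.

```python
# Vertex table: id -> (name, role), values as listed on the slide.
vertices = {
    3: ("rxin", "student"),
    7: ("jgonzal", "postdoc"),
    5: ("franklin", "professor"),
    2: ("istoica", "professor"),
}

# Edge table: (src_id, dst_id, relationship), values as listed on the slide.
edges = [
    (3, 7, "Collaborator"),
    (5, 3, "Advisor"),
    (2, 5, "Colleague"),
    (5, 7, "PI"),
]


def triplets(vertices, edges):
    """Join each edge with the properties of its two endpoint vertices."""
    return [(vertices[src], prop, vertices[dst]) for src, dst, prop in edges]


for src_prop, relation, dst_prop in triplets(vertices, edges):
    print(f"{src_prop[0]} -[{relation}]-> {dst_prop[0]}")
```

Keeping vertex and edge properties in separate tables is what lets the same data be queried either as a graph or as ordinary rows.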
Data-Parallel: Table (Row, Row, Row, Row) -> Result
Graph-Parallel: Property Graph, Pregel
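To make the graph-parallel side more concrete, here is a minimal sketch of a Pregel-style computation in plain Python: in each superstep, vertices send their current label along edges, and each receiver keeps the smallest label it has seen. The function name `pregel_min_label`, the superstep limit, and the toy graph are assumptions for illustration, and the sketch does not reproduce the API of Pregel, Giraph, or GraphX.

```python
def pregel_min_label(vertices, edges, max_supersteps=10):
    """Label each vertex with the smallest vertex id in its connected component."""
    label = {v: v for v in vertices}
    for _ in range(max_supersteps):
        inbox = {}
        # Message phase: each edge carries labels in both directions.
        for src, dst in edges:
            for sender, receiver in ((src, dst), (dst, src)):
                msg = label[sender]
                if msg < label[receiver]:
                    # Combine messages to the same receiver by taking the minimum.
                    inbox[receiver] = min(msg, inbox.get(receiver, msg))
        if not inbox:
            break  # No messages were sent, so the computation has converged.
        # Vertex phase: every receiver applies its combined message.
        for v, msg in inbox.items():
            label[v] = min(label[v], msg)
    return label


# Hypothetical toy graph with two connected components.
labels = pregel_min_label([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

A data-parallel system would express the same message phase as a join between the edge table and the vertex table followed by a group-by on the receiver.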
Wikipedia analysis pipeline:
Raw Wikipedia (XML) -> Text Table (Title, Body)
Hyperlinks -> PageRank -> Top 20 Pages (Title, PR)
Term-Doc Graph -> Topic Model (LDA) -> Word Topics (Word, Topic)
Discussion Table (User, Disc.) -> Editor Graph -> Community Detection -> User Community (User, Com.) -> Community Topic (Topic, Com.)
Distributed graph representation: a Property Graph with vertices A to F is stored as a Vertex Table (RDD), a Routing Table (RDD), and an Edge Table (RDD), split across Part. 1 and Part. 2 using a 2D Vertex Cut Heuristic.
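One way to picture a 2D vertex cut is to arrange the partitions in a square grid and place each edge in the cell chosen by its source (row) and destination (column). The sketch below does this with a plain modulo on integer vertex ids, which is a simplifying assumption; production systems typically hash or mix the ids first, and this is not presented as GraphX's implementation. The names `partition_2d` and `replica_sets` are hypothetical.

```python
import math


def partition_2d(src, dst, num_parts):
    """Assign an edge to a cell of a side x side grid of partitions."""
    side = math.isqrt(num_parts)
    if side * side != num_parts:
        raise ValueError("this sketch assumes a perfect-square partition count")
    return (src % side) * side + (dst % side)


def replica_sets(edges, num_parts):
    """Map each vertex to the set of partitions that hold one of its edges."""
    parts_of = {}
    for src, dst in edges:
        p = partition_2d(src, dst, num_parts)
        parts_of.setdefault(src, set()).add(p)
        parts_of.setdefault(dst, set()).add(p)
    return parts_of


# Hypothetical hub vertex 0 with edges to and from 20 neighbours.
edges = [(0, i) for i in range(1, 21)] + [(i, 0) for i in range(1, 21)]
parts_of = replica_sets(edges, num_parts=16)
```

With a 4 x 4 grid, the hub's edges can only land in one row and one column, so it is replicated to at most 2 * 4 - 1 = 7 of the 16 partitions, however many neighbours it has.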
Edge Cut Vertex Cut
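The difference between the two cuts can be sketched by counting what each one splits. An edge cut assigns vertices to partitions and pays for every edge whose endpoints land in different partitions; a vertex cut assigns edges to partitions and pays for every extra copy of a vertex whose edges span several partitions. The function names, the star graph, and the two-way assignments below are hypothetical choices made for illustration.

```python
def edge_cut_cost(edges, vertex_part):
    """Count edges whose endpoints are assigned to different partitions."""
    return sum(1 for src, dst in edges if vertex_part[src] != vertex_part[dst])


def vertex_cut_cost(edges, edge_part):
    """Count extra vertex replicas beyond one copy per vertex."""
    spans = {}
    for (src, dst), part in zip(edges, edge_part):
        spans.setdefault(src, set()).add(part)
        spans.setdefault(dst, set()).add(part)
    return sum(len(parts) - 1 for parts in spans.values())


# Star graph: hub 0 connected to six leaves, split over two partitions.
edges = [(0, i) for i in range(1, 7)]
vertex_part = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}  # edge-cut assignment
edge_part = [0, 0, 0, 1, 1, 1]                            # vertex-cut assignment

cut_edges = edge_cut_cost(edges, vertex_part)
extra_replicas = vertex_cut_cost(edges, edge_part)
```

On this star graph the edge cut severs three edges, while the vertex cut only needs one extra copy of the hub, which hints at why vertex cuts tend to suit graphs with a few very high-degree vertices.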