FlashGraph: Processing Billion-Node Graphs on Commodity SSDs
FlashGraph proposes a system that combines SSDs and RAM for efficient graph processing, storing vertex state in memory and edge lists on SSDs. The system can handle large graphs without using excessive memory and reports performance comparable to in-memory graph processing engines. While SSDs perform well on sequential I/O, concerns remain about their wear-out and about the system's performance on dense graphs. The discussion raises questions about whether SSDs are really a commodity and whether they are suitable, combined with RAM, for write-intensive workloads.
Presentation Transcript
FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
Scribed by Vinh Ha
Motivation
- Graph analysis involves many random reads and writes.
- Traditionally, analyzing a large graph requires a big cluster, so that the aggregate memory exceeds the graph size.
- Graph processing has novel applications everywhere.
Solution
- Build a system that combines SSDs and RAM.
- Implement a graph processing engine on top of an SSD file system designed for high parallelism and high IOPS.
- Store vertex state in memory and edge lists on the SSDs.
- Provide a general, flexible, vertex-centric programming interface.
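To make the vertex/edge split concrete, here is a minimal sketch of a semi-external-memory layout. It assumes edge lists are packed back to back in one binary file of 4-byte vertex IDs, with a small in-memory `(offset, degree)` index per vertex. The file format and the names `write_edge_file`, `read_edges`, and `bfs_levels` are hypothetical illustrations, not FlashGraph's on-SSD format or API. A breadth-first search keeps each vertex's level in RAM and reads a vertex's edge list from storage only when that vertex is visited.

```python
import os
import struct
import tempfile
from collections import deque

# Hypothetical on-disk format (not FlashGraph's): every edge list is stored
# back to back in one binary file as little-endian 4-byte vertex IDs.


def write_edge_file(adjacency, path):
    """Write the adjacency lists to `path` and return the in-memory index,
    a list of (byte offset, degree) pairs, one per vertex."""
    index = []
    with open(path, "wb") as f:
        for neighbors in adjacency:
            index.append((f.tell(), len(neighbors)))
            f.write(struct.pack(f"<{len(neighbors)}I", *neighbors))
    return index


def read_edges(f, index, v):
    """Read vertex v's edge list from storage on demand."""
    offset, degree = index[v]
    f.seek(offset)
    return struct.unpack(f"<{degree}I", f.read(4 * degree))


def bfs_levels(path, index, source):
    """Breadth-first search where per-vertex state (the BFS level) lives in
    memory and edge lists are fetched from the file only when needed."""
    level = [-1] * len(index)  # in-memory vertex state; -1 means unvisited
    level[source] = 0
    frontier = deque([source])
    with open(path, "rb") as f:
        while frontier:
            v = frontier.popleft()
            for u in read_edges(f, index, v):
                if level[u] == -1:
                    level[u] = level[v] + 1
                    frontier.append(u)
    return level


if __name__ == "__main__":
    adjacency = [[1, 2], [3], [3], []]  # tiny example graph
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "edges.bin")
        index = write_edge_file(adjacency, path)
        print(bfs_levels(path, index, 0))  # [0, 1, 1, 2]
```

Memory here grows with the number of vertices, while the edge data stays in the file. Each visit costs one seek and one small read; issuing many such reads in parallel and at high IOPS is the job of the SSD file system layer, which this sketch does not model.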
Pros
- Storing vertex state and edge lists in different places is a good idea, at least for applications that don't write to the edge lists.
- The system can store billions of vertices and process them without using too much memory.
- Its performance is comparable to most in-memory graph processing systems, for example Galois, a prevalent graph processing engine.
- It conservatively merges I/O requests to increase I/O throughput. Sequential I/O matters because SSDs perform much better on sequential I/O than on random I/O.
- FlashGraph's API seems easy to use and understand.
- It minimizes the amount of data that must be kept in memory.
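The request merging mentioned in the pros can be illustrated with a small sketch. It assumes 4 KiB pages and byte-level `(offset, length)` read requests with `length >= 1`. `PAGE_SIZE`, `to_pages`, and `merge_requests` are hypothetical names, and reading "conservative" as "merge only requests whose pages overlap or are adjacent" is an assumption about what the slide means, not a description of FlashGraph's exact policy.

```python
PAGE_SIZE = 4096  # assumed I/O page size in bytes


def to_pages(offset, length, page_size=PAGE_SIZE):
    """Return the inclusive (first, last) page numbers touched by a read of
    `length` bytes (length >= 1) starting at byte `offset`."""
    return offset // page_size, (offset + length - 1) // page_size


def merge_requests(requests, page_size=PAGE_SIZE):
    """Merge (offset, length) read requests into page-aligned (offset, length)
    reads. Requests are merged only when their page ranges overlap or are
    adjacent, so no page is read unless some request touches it."""
    runs = []  # list of [first_page, last_page] runs
    for first, last in sorted(to_pages(o, n, page_size) for o, n in requests):
        if runs and first <= runs[-1][1] + 1:
            runs[-1][1] = max(runs[-1][1], last)  # extend the current run
        else:
            runs.append([first, last])  # start a new run
    return [(first * page_size, (last - first + 1) * page_size)
            for first, last in runs]


if __name__ == "__main__":
    reqs = [(0, 100), (5000, 200), (20480, 10), (100, 50)]
    print(merge_requests(reqs))  # [(0, 8192), (20480, 4096)]
```

Under this rule, merging never reads a page that no request asked for, so it gives up some merge opportunities in exchange for zero wasted reads, while still turning clusters of small random reads into fewer, larger sequential ones.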
Cons
- The paper assumes that SSDs have become a commodity, but it doesn't present supporting data on SSD prices compared with other storage such as RAM or regular hard disks.
- It also assumes that graph processing applications don't modify edges.
- SSD wear-out remains a problem.
- FlashGraph's vertex scheduler is constrained by the I/O ordering.
- What if the graph is very dense, i.e., has many edges?
Discussion Questions
- How does the system perform on dense graphs, where many edges are stored on the SSDs?
- The paper says that SSDs have become a commodity. Have they really, or will they only become one in the future?
- If we need high write I/O, should we stick to typical disk storage combined with RAM?
- Since SSDs perform better on sequential I/O, are they the best choice for graph processing, given that the paper states that graph processing involves many random I/Os?