Optimizing Read-Only Transactions for Performance

Slide Note
Embed
Share

Explore the nuances of performance-optimal read-only transactions in distributed storage systems. Focus on achieving high throughput and low latency while considering algorithmic properties and engineering factors. Learn how coordination overhead affects performance and strategies to design efficient read-only transactions close to simple reads.


Uploaded on Sep 22, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Performance-Optimal Read-only Transactions Haonan Lu Siddhartha Sen , Wyatt Lloyd Princeton University, Microsoft Research 1

  2. Distributed Storage Systems Enable Today s Web Services Web Load Page Read Mia s Page Jack s Page Jack Storage Friend Jack Write Friend Lists Mia 2

  3. Distributed Storage Systems Reads Dominate Workloads Web Load Page Mia s Page Jack s Page Reads Jack Storage Friend Jack Writes Friend Lists Mia 3

  4. Distributed Storage Systems Simple Reads Are Insufficient Web New page Unfriend Mia Mia s Page Read Jack s Page New Page Jack Storage New Page Load Page Read Friends New Page Friend Lists Unfriended Mia 4

  5. Read-Only Transactions A group of simple reads sent in parallel Do not write data Writes are allowed in the system Coordinate a consistent view across shards Coordination overhead causes higher latency and lower throughput 5

  6. Goal: Read-only transaction performance as close as possible to simple reads 6

  7. Goal: Read-only transaction performance as close as possible to simple reads We answer: What does optimal performance mean for read-only transactions? When is optimal performance achievable? How can we design performance-optimal read-only transactions? 7

  8. Performance Factors Engineering vs. Algorithmic Focus on the algorithmic properties due to coordination Algorithmic Properties Coordination Batching Engineering Factors Equally impact simple reads and read-only transactions Networking Hardware Abstract engineering factors by comparing to simple reads 8

  9. Performance Factors Algorithmic Properties Algorithmic Properties Blocking Page R R Simple Read Simple Read Friends 9

  10. Performance Factors Algorithmic Properties Algorithmic Properties Blocking Messages Page R R Simple Read Simple Read Friends 10

  11. Performance Factors Algorithmic Properties Algorithmic Properties Timestamp Blocking Messages Page R R Metadata Simple Read Timestamp Simple Read Friends 11

  12. Performance Factors Coordination Is Algorithmic Algorithmic Properties More Messages Coordination Overhead Blocking Metadata Simple Read Simple Read 12

  13. Read-Only Transactions Optimal Performance Algorithmic Properties Performance-optimal Read-only Transactions (N,O,C) Blocking N Messages O Metadata C Simple Read Simple Read 13

  14. Non-Blocking Reads Do not wait on external events Distributed locks, timeouts, messages, etc. Lower latency Avoid any time spent blocking Higher throughput Avoid CPU cost of context switches 14

  15. One-Round Communication One-round on-path reads Succeed in one round, i.e., no retries No off-path messages Required by reads but off the critical path Lower latency Avoids time for extra on-path messages Higher throughput Avoids CPU cost of processing extra messages 15

  16. Constant Metadata Metadata Information used to find a consistent view Timestamps, transaction IDs, etc. Size of metadata remains constant regardless of contention Higher throughput Avoids CPU cost of processing extra data 16

  17. Performance-optimal read-only transactions are NOC: Non-blocking messages that complete in One-round with Constant metadata 17

  18. Strict Serializability The strongest consistency model Writing applications made easy Requires a total order + real-time order New Page Done New Page Add Mia Done Read Page New Jack Read Friends Friends Mia Mia 18

  19. The NOCS Theorem: Impossible for read-only transaction algorithms to achieve performance-optimality [N,O,C] and strict serializability [S] 19

  20. Proof Intuition of NOCS unstable stable Svr-1 Unfinalized Write Finalized Write Svr-2 Coordination Coordination Svr-3 Required Free Svr-4 now 20

  21. Proof Intuition of NOCS unstable stable Svr-1 Svr-2 ? Must give up either N, O, or C Svr-3 ? Svr-4 now ROTXN 21

  22. NOC Designs Strict By the NOCS Theorem Serializability Process-order Serializability Our new design: PORT Causal MySQL Cluster Read Committed Weak Consistency 22

  23. Design Insight Capturing the Stable Frontier unstable stable Svr-1 Svr-2 Svr-3 Svr-4 Stable Frontier (SF) now 23

  24. Version Clock A type of logical clock Specialized for distributed storage systems Treat reads and writes differently Enable optimizations for reads and writes Capture the stable frontier 24

  25. PORT Overview Jack Web Client Storage Server 25

  26. PORT Overview Jack Key A [AX]0 [AY]1 [AZ]2 Version Clock 26

  27. PORT Overview Jack Version Stamp (VS) 1 Key A [AX]0 [AY]1 [AZ]2 Version Clock VS 27

  28. Write in PORT Jack A := AY VS = 2 Write Done 1 2 Key A [AX]0[AY]2 Version clocks tick on writes 28

  29. Read in Port Jack A = ? VS = 2 Read A = AY 1 2 Key A [AX]0 [AY]2 [AZ]5 No tick on reads 29

  30. Read Promotion Ensures a Total Order Jack A = ? VS = 2 Read 1 2 Key A [AX]0[ ? ]2 30

  31. Read Promotion Ensures a Total Order Jack A = ? VS = 2 Read A = AX 1 2 Key A [AX]1 [AX]2 Immutable [AX]0 31

  32. Read Promotion Ensures a Total Order Mia A := AY Write VS = 2 Done 1 2 Key A [AX]0 2 [AY]3 32

  33. Track Stable Frontier SF Map SF = ? SF = 3 Mia SFA = 3 SFB = 3 SFC = 5 Read/Write SFA = 3 1 2 3 Key A [AX]0 2 [AY]3 Advance to stable frontier 33

  34. Read-Only Transaction Logic SF Map SF = 3 Jack SFA = 3 SFB = 3 SFC = 5 Key A [AX]0 [AY]3 [AZ]7 1 3 Key B [BX]0 [BY]1 [BZ]3 34

  35. Read-Only Transaction Logic SF Map SF = 3 Jack SFA = 3 SFB = 3 SFC = 5 Key A [AX]0 [AY]3 [AZ]7 1 3 Key B [BX]0 [BY]1 [BZ]3 35

  36. Read-Only Transaction Logic SF Map SF = 3 Jack SFA = 3 SFA = 7 Key A SFB = 3 SFC = 5 [AX]0 [AY]3 [AZ]7 1 3 Key A [BX]0 [BY]1 [BZ]3 36

  37. PORT Is NOC Reading at the stable frontier ensures reads are non-blocking (N) Client pre-determined snapshot with VS ensures one-round communication (O) One VS per read request ensure constant metadata (C) 37

  38. PORT Systems Scylla-PORT Base system: ScyllaDB (non-transactional) Highly optimized sensitive to overhead NOC + Process-ordered serializability Supports simple writes (not write transactions) Eiger-PORT Base system: Eiger (N, O, C) Existing read-only and write transactions NOC + Causal consistency Supports write transactions 38

  39. Evaluation of Scylla-PORT To understand Overhead in latency and throughput compared to simple reads Performance advantages compared to other protocols, e.g., OCC. Experiment configuration YCSB benchmark with customized parameters for skew and read-to-write ratios Evaluated latency, throughput, scalability, freshness 39

  40. Latency-Throughput Uniform, 5% Writes Scylla-OCC Scylla-PORT ScyllaDB Higher Throughput Lower Latency 40

  41. Latency-Throughput Zipf = 0.99, 5% Writes Scylla-OCC Scylla-PORT ScyllaDB 8% 41

  42. Conclusion Performance-optimal read-only transactions: NOC The NOCS Theorem for read-only transactions Impossible to have all of the NOCS properties The design of PORT NOC with the strongest consistency to date Scylla-PORT Minimum performance overhead compared to simple reads Significantly outperforms the standard OCC 42

Related