Optimizing Read-Only Transactions for Performance
Explore the nuances of performance-optimal read-only transactions in distributed storage systems. Focus on achieving high throughput and low latency while considering algorithmic properties and engineering factors. Learn how coordination overhead affects performance and strategies to design efficient read-only transactions close to simple reads.
- Performance Optimization
- Read-Only Transactions
- Distributed Storage Systems
- Algorithmic Properties
- Engineering Factors
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Performance-Optimal Read-only Transactions Haonan Lu Siddhartha Sen , Wyatt Lloyd Princeton University, Microsoft Research 1
Distributed Storage Systems Enable Today s Web Services Web Load Page Read Mia s Page Jack s Page Jack Storage Friend Jack Write Friend Lists Mia 2
Distributed Storage Systems Reads Dominate Workloads Web Load Page Mia s Page Jack s Page Reads Jack Storage Friend Jack Writes Friend Lists Mia 3
Distributed Storage Systems Simple Reads Are Insufficient Web New page Unfriend Mia Mia s Page Read Jack s Page New Page Jack Storage New Page Load Page Read Friends New Page Friend Lists Unfriended Mia 4
Read-Only Transactions A group of simple reads sent in parallel Do not write data Writes are allowed in the system Coordinate a consistent view across shards Coordination overhead causes higher latency and lower throughput 5
Goal: Read-only transaction performance as close as possible to simple reads 6
Goal: Read-only transaction performance as close as possible to simple reads We answer: What does optimal performance mean for read-only transactions? When is optimal performance achievable? How can we design performance-optimal read-only transactions? 7
Performance Factors Engineering vs. Algorithmic Focus on the algorithmic properties due to coordination Algorithmic Properties Coordination Batching Engineering Factors Equally impact simple reads and read-only transactions Networking Hardware Abstract engineering factors by comparing to simple reads 8
Performance Factors Algorithmic Properties Algorithmic Properties Blocking Page R R Simple Read Simple Read Friends 9
Performance Factors Algorithmic Properties Algorithmic Properties Blocking Messages Page R R Simple Read Simple Read Friends 10
Performance Factors Algorithmic Properties Algorithmic Properties Timestamp Blocking Messages Page R R Metadata Simple Read Timestamp Simple Read Friends 11
Performance Factors Coordination Is Algorithmic Algorithmic Properties More Messages Coordination Overhead Blocking Metadata Simple Read Simple Read 12
Read-Only Transactions Optimal Performance Algorithmic Properties Performance-optimal Read-only Transactions (N,O,C) Blocking N Messages O Metadata C Simple Read Simple Read 13
Non-Blocking Reads Do not wait on external events Distributed locks, timeouts, messages, etc. Lower latency Avoid any time spent blocking Higher throughput Avoid CPU cost of context switches 14
One-Round Communication One-round on-path reads Succeed in one round, i.e., no retries No off-path messages Required by reads but off the critical path Lower latency Avoids time for extra on-path messages Higher throughput Avoids CPU cost of processing extra messages 15
Constant Metadata Metadata Information used to find a consistent view Timestamps, transaction IDs, etc. Size of metadata remains constant regardless of contention Higher throughput Avoids CPU cost of processing extra data 16
Performance-optimal read-only transactions are NOC: Non-blocking messages that complete in One-round with Constant metadata 17
Strict Serializability The strongest consistency model Writing applications made easy Requires a total order + real-time order New Page Done New Page Add Mia Done Read Page New Jack Read Friends Friends Mia Mia 18
The NOCS Theorem: Impossible for read-only transaction algorithms to achieve performance-optimality [N,O,C] and strict serializability [S] 19
Proof Intuition of NOCS unstable stable Svr-1 Unfinalized Write Finalized Write Svr-2 Coordination Coordination Svr-3 Required Free Svr-4 now 20
Proof Intuition of NOCS unstable stable Svr-1 Svr-2 ? Must give up either N, O, or C Svr-3 ? Svr-4 now ROTXN 21
NOC Designs Strict By the NOCS Theorem Serializability Process-order Serializability Our new design: PORT Causal MySQL Cluster Read Committed Weak Consistency 22
Design Insight Capturing the Stable Frontier unstable stable Svr-1 Svr-2 Svr-3 Svr-4 Stable Frontier (SF) now 23
Version Clock A type of logical clock Specialized for distributed storage systems Treat reads and writes differently Enable optimizations for reads and writes Capture the stable frontier 24
PORT Overview Jack Web Client Storage Server 25
PORT Overview Jack Key A [AX]0 [AY]1 [AZ]2 Version Clock 26
PORT Overview Jack Version Stamp (VS) 1 Key A [AX]0 [AY]1 [AZ]2 Version Clock VS 27
Write in PORT Jack A := AY VS = 2 Write Done 1 2 Key A [AX]0[AY]2 Version clocks tick on writes 28
Read in Port Jack A = ? VS = 2 Read A = AY 1 2 Key A [AX]0 [AY]2 [AZ]5 No tick on reads 29
Read Promotion Ensures a Total Order Jack A = ? VS = 2 Read 1 2 Key A [AX]0[ ? ]2 30
Read Promotion Ensures a Total Order Jack A = ? VS = 2 Read A = AX 1 2 Key A [AX]1 [AX]2 Immutable [AX]0 31
Read Promotion Ensures a Total Order Mia A := AY Write VS = 2 Done 1 2 Key A [AX]0 2 [AY]3 32
Track Stable Frontier SF Map SF = ? SF = 3 Mia SFA = 3 SFB = 3 SFC = 5 Read/Write SFA = 3 1 2 3 Key A [AX]0 2 [AY]3 Advance to stable frontier 33
Read-Only Transaction Logic SF Map SF = 3 Jack SFA = 3 SFB = 3 SFC = 5 Key A [AX]0 [AY]3 [AZ]7 1 3 Key B [BX]0 [BY]1 [BZ]3 34
Read-Only Transaction Logic SF Map SF = 3 Jack SFA = 3 SFB = 3 SFC = 5 Key A [AX]0 [AY]3 [AZ]7 1 3 Key B [BX]0 [BY]1 [BZ]3 35
Read-Only Transaction Logic SF Map SF = 3 Jack SFA = 3 SFA = 7 Key A SFB = 3 SFC = 5 [AX]0 [AY]3 [AZ]7 1 3 Key A [BX]0 [BY]1 [BZ]3 36
PORT Is NOC Reading at the stable frontier ensures reads are non-blocking (N) Client pre-determined snapshot with VS ensures one-round communication (O) One VS per read request ensure constant metadata (C) 37
PORT Systems Scylla-PORT Base system: ScyllaDB (non-transactional) Highly optimized sensitive to overhead NOC + Process-ordered serializability Supports simple writes (not write transactions) Eiger-PORT Base system: Eiger (N, O, C) Existing read-only and write transactions NOC + Causal consistency Supports write transactions 38
Evaluation of Scylla-PORT To understand Overhead in latency and throughput compared to simple reads Performance advantages compared to other protocols, e.g., OCC. Experiment configuration YCSB benchmark with customized parameters for skew and read-to-write ratios Evaluated latency, throughput, scalability, freshness 39
Latency-Throughput Uniform, 5% Writes Scylla-OCC Scylla-PORT ScyllaDB Higher Throughput Lower Latency 40
Latency-Throughput Zipf = 0.99, 5% Writes Scylla-OCC Scylla-PORT ScyllaDB 8% 41
Conclusion Performance-optimal read-only transactions: NOC The NOCS Theorem for read-only transactions Impossible to have all of the NOCS properties The design of PORT NOC with the strongest consistency to date Scylla-PORT Minimum performance overhead compared to simple reads Significantly outperforms the standard OCC 42