Cost/Performance in Modern Data Stores: How Data Caching Systems Succeed
Traditional database systems face challenges from main memory row stores, column stores, and streaming systems, and data caching systems are widely declared dead. Yet the elephants still dance. This talk explains why: economics. Hardware and software infrastructure costs shape the market, and traditional caching data stores succeed by delivering good cost/performance, not peak performance, prompting a suggested shift in focus for the field.
Cost/Performance in Modern Data Stores: How Data Caching Systems Succeed
David Lomet, Microsoft Research, Redmond, WA 98052
SoCal 2019
Traditional Databases Are Challenged
Traditional database systems are besieged by main memory row stores, column stores, and streaming systems, and the threat is perceived to be growing.
Traditional Data Caching Systems Are Dead
The supposedly dying elephants: Microsoft SQL Server, IBM DB2, Oracle Database, Amazon Aurora.
The Elephants Still Dance. WHY?
Here's the story: the challengers have largely been defused. Main memory row stores serve a niche market, while column store and streaming capabilities have been absorbed into, or partnered with, traditional database systems.
Economic Factors
- Hardware infrastructure needs to be cheap => commodity hardware
- Software infrastructure needs to be efficient => high performance on this hardware
- Elastic provisioning: free up expensive resources when otherwise under-utilized; give resources to those with higher value (willing to pay more); under-provision for SLAs and pocket the difference
Money Talks! $
Traditional Database Systems Are Caching Data Stores
Traditional DBMSs Have Good Cost/Performance
We need to move away from being fixated only on performance: costs vary enormously between systems. We describe why traditional systems have lower performance than recent research systems, yet succeed by having better cost/performance, and we suggest a change in focus for our field.
A Caching Data Store Uses a Cache
- Moves data from secondary storage into main memory when it is to be operated on
- Removes data from main memory when it is no longer being operated on: writes it back to secondary storage when it has been changed; drops it from main memory when unchanged
- Data lives permanently on secondary storage
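The behavior above can be sketched in a few lines. This is a minimal illustrative model, not code from Deuteronomy or any real system: a bounded LRU cache over a dictionary standing in for secondary storage, with dirty-bit tracking so that evicted pages are written back only when changed.

```python
from collections import OrderedDict

class CachingDataStore:
    """Toy sketch of a caching data store (illustrative assumption, not a
    real engine). Data lives permanently in `storage`; a bounded LRU
    `cache` holds data being operated on."""

    def __init__(self, capacity):
        self.storage = {}            # secondary storage: key -> value
        self.cache = OrderedDict()   # main-memory cache: key -> (value, dirty)
        self.capacity = capacity

    def _evict_if_full(self):
        while len(self.cache) > self.capacity:
            key, (value, dirty) = self.cache.popitem(last=False)  # LRU victim
            if dirty:
                self.storage[key] = value  # changed: write back
            # unchanged: simply dropped from main memory

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)    # mark as recently used
            return self.cache[key][0]
        value = self.storage[key]          # I/O: bring data into memory
        self.cache[key] = (value, False)
        self._evict_if_full()
        return value

    def write(self, key, value):
        self.cache[key] = (value, True)    # dirty: must be written back
        self.cache.move_to_end(key)
        self._evict_if_full()
```

Note the asymmetry the slide emphasizes: a cache hit is a pure main-memory operation, while a miss pays for the memory operation plus the I/O.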
Result: Two Forms of Operation
- MM (main memory): VERY fast, N*10**6 ops/sec
- SS (secondary storage): about 5.8X the execution time of MM operations (our system); must do an MM operation + I/O to bring data into main memory
Caching data stores look like a BAD idea.
Mixed Operation
Performance is the weighted average of MM and SS operations, with R = SS/MM ~ 5.8 +- 30%.
[Chart: relative performance vs. percent of SS operations (0-60%), showing expected performance (max and min) and actual performance on 1-core and 4-core; relative performance falls steeply from 1.0 as the SS fraction grows.]
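One plausible reading of that weighted average, under the assumption that an SS operation simply takes R times as long as an MM operation, is throughput relative to an all-in-memory workload:

```python
def relative_performance(f_ss, r=5.8):
    """Throughput relative to an all-MM workload, assuming a fraction
    f_ss of operations go to secondary storage and each SS op takes
    r times as long as an MM op (r ~ 5.8 from the talk)."""
    return 1.0 / ((1.0 - f_ss) + f_ss * r)

for pct in (0, 10, 20, 50):
    print(f"{pct:3d}% SS ops -> {relative_performance(pct / 100):.2f}x relative performance")
```

Even 10% SS operations drop relative performance to roughly 0.68, and 50% to under 0.3, matching the steep decline in the chart.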
A Performance Disaster?
Each added SS operation pushes performance lower: toward SLOW SS performance, away from FAST MM performance. Why bother??? Because costs matter in addition to performance.
So let's look at Cost/Performance
Two Costs for Operating on Data
- Storage cost: always paid; most of the cost for cold data, or when data becomes cold
- Execution cost: paid only when data is operated on; most of the cost for hot data, or when data becomes hot
It's All About Relative Costs
[Charts: relative storage cost per byte (main memory vs. flash storage) and relative execution cost (main memory op execution vs. flash I/O op and flash I/O path); approximate Bw-tree relative numbers. New SSDs support n*100K IOPS.]
Cost Analysis
We compute cost/sec and plot it against operations/sec executed. Cost is a linear function of storage and execution costs:
Cost/sec = (storage rental/sec) + (cost/operation) * (operations/sec), i.e. C = A + B*X, where we plot C vs X.
- A: storage cost = (cost/byte) * (size of data), constant for an operation on a piece of data; at ops/sec = 0 all cost is storage cost, so storage cost is the y-intercept. DRAM cost/byte ~ 10X flash cost/byte.
- B: cost/operation, which determines the slope of the cost curve; an SS op costs ~ 12X an MM op.
Some Details (C = A + B*X)
$SS = cost of secondary-storage ops/sec = {$SSstor + $MMindex} + {$MMexec + $SSD-IO + $Proc-IO} * ops/sec
$MM = cost of main-memory ops/sec = {$MMstor + $SSstor + $MMindex} + {$MMexec} * ops/sec
We set $SS = $MM and solve for (ops/sec)**(-1) = time between ops; $MMindex, $SSstor, and $MMexec all cancel out:
time between ops = ($SSD-IO + $Proc-IO) / (($MM/byte) * page size)
Gray's Five Minute Rule: now ~ one minute. $SSD-IO has dropped dramatically, so $Proc-IO is now important for cost. Our full analysis also tells you about costs away from break-even.
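Plugging numbers into the break-even formula shows how the rule's interval is derived. All dollar figures below are made-up illustrative assumptions (not prices from the talk), amortized over three years:

```python
SECONDS_3Y = 3 * 365 * 24 * 3600  # amortization period

# Hypothetical prices -- assumptions for illustration only.
dram_per_byte = 5.0 / 2**30                        # assume $5 per GB of DRAM
mem_rent_per_byte_sec = dram_per_byte / SECONDS_3Y
ssd_io_per_op = (200.0 / SECONDS_3Y) / 100_000     # assume $200 SSD at 100K IOPS
proc_io_per_op = 5e-12                             # assumed processor cost per I/O

def breakeven_interval_seconds(io_cost, proc_cost, mem_rent_per_byte, page_bytes=4096):
    """time between ops = ($SSD-IO + $Proc-IO) / (($MM/byte) * page size)"""
    return (io_cost + proc_cost) / (mem_rent_per_byte * page_bytes)

t = breakeven_interval_seconds(ssd_io_per_op, proc_io_per_op, mem_rent_per_byte_sec)
print(f"break-even re-access interval for a 4K page: {t:.0f} seconds")
```

With these assumed prices the interval comes out on the order of a minute or two for a 4K page, consistent with the slide's "now ~ one minute"; a page re-accessed more often than this is cheaper to keep in memory.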
Cost/Performance
[Chart: relative cost vs. performance (ops per unit time) for the data caching op, MM op, and flash op; each line's y-intercept is its storage cost. Gray's five-minute rule is now ~1 minute for a 4K page. The caching system uses the lowest-cost op, either SS or MM, so its curve tracks the cheaper of the two.]
How About Other Kinds of Systems?
Main Memory Systems Perform Better
Indeed, they perform better than data caching systems even when the latter are fully cached, but with worse cost/performance.
- MassTree: high performance main memory data store
- Deuteronomy: high performance caching data store
MassTree has higher performance than Deuteronomy's Bw-tree. Relative costs:
- Execution cost: Deuteronomy is 2.6X MassTree, because MassTree is 2.6X faster
- Storage cost: MassTree is 2.3X Deuteronomy, because MassTree uses 2.3X the storage of Deuteronomy
Cost/Performance: Data Caching Frequently Better [chart not to scale]
[Chart: Deuteronomy vs. MassTree, relative cost vs. performance (ops/unit time). MassTree: 3*10**6 ops/sec for 40GB. It is all or nothing: no switching between systems. The MassTree vs. Bw-tree break-even is about 3 seconds per 4K page, vs. about 1 minute per 4K page for SS and MM ops. Below break-even, Deuteronomy has better cost/performance; MassTree wins only at the highest operation rates.]
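The crossover between the two systems' linear cost curves follows directly from the slide's relative numbers. A sketch under the stated ratios (2.3X storage for the main-memory system, 2.6X execution cost per op for the caching system), with the caching system's costs normalized to A and B:

```python
def crossover_ops_per_sec(storage_cost, exec_cost_per_op,
                          storage_ratio=2.3, exec_ratio=2.6):
    """Operation rate at which a MassTree-like main-memory system and a
    Deuteronomy-like caching system cost the same. Cost curves (C = A + B*X):
      caching system:     storage_cost + exec_cost_per_op * x
      main-memory system: storage_ratio * storage_cost
                          + (exec_cost_per_op / exec_ratio) * x
    Below the returned rate the caching system is cheaper."""
    a, b = storage_cost, exec_cost_per_op
    return (storage_ratio - 1.0) * a / (b * (1.0 - 1.0 / exec_ratio))
```

With normalized costs (A = B = 1) the crossover sits at about 2.1 ops per unit time: below it the caching system's lower storage cost wins, above it the main-memory system's cheaper operations win.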
Lowering Storage Cost
Facebook problem: huge data volume, almost all cold, but requiring high performance and decent latency; buying processors just to be able to attach more SSDs to accommodate the volume of data.
Solution: data caching system + data compression
- Use SSD storage for decent latency
- Scale out processors for good performance
- Data compression to lower storage cost (recall that storage cost is (cost/byte)*(size of data)), at the expense of increased execution cost
Cost/Performance with Compressed Data (hypothetical) [chart not to scale]
[Chart: relative cost vs. performance. The caching system can use the lowest-cost operation from a choice of three: MM op, SS op, and compressed secondary storage (CSS) op. CSS is used for cold data, which is ~80% of the data and sits in the low-operation-rate range; hot data is ~20%. The SS/MM break-even remains ~1 minute per 4K page.]
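Compression just adds a third line to the same C = A + B*X picture: lowest storage cost, highest cost per operation. The relative numbers below are invented for illustration, not taken from the talk:

```python
def cost_per_sec(storage_cost, exec_cost_per_op, ops_per_sec):
    """Linear cost model from the talk: C = A + B*X."""
    return storage_cost + exec_cost_per_op * ops_per_sec

def cheapest_op(ops_per_sec):
    """Pick the lowest-cost operation at a given rate. All numbers are
    made-up assumptions: MM has high storage cost but cheap ops; SS the
    reverse; compressed SS (CSS) has the lowest storage cost but the
    highest per-op cost, since each access must decompress."""
    options = {
        "MM":  cost_per_sec(10.0, 0.1, ops_per_sec),
        "SS":  cost_per_sec(1.0, 1.2, ops_per_sec),
        "CSS": cost_per_sec(0.4, 2.0, ops_per_sec),
    }
    return min(options, key=options.get)
```

With these assumed numbers, cold data (low rates) lands on CSS, warm data on SS, and hot data on MM, which is exactly the three-regime shape of the chart.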
NVM Cost/Performance
[Chart: the same relative cost vs. performance plot, with NVM storage shown two ways: used as an SSD and used as a memory extension, giving a deeper memory hierarchy. The caching system still uses the lowest-cost op, either SS or MM; Gray's five-minute rule is now ~1 minute for a 4K page.]
Making High Performance and Low Cost Data Caching Systems
How Do We Provide Good Cost/Performance?
A data caching system is part of the answer, but how do we optimize cost/performance within a data caching system?
- Increase in-memory multi-core performance: increased performance reduces cost but must be relevant to data caching; improve single-core performance and multi-core scale-out
- Reduce the number of I/Os: blind writes, log structuring, record cache
- Reduce data movement cost (SSD to/from main memory): reduce the execution cost of I/O (e.g., user-level I/O); reduce SSD I/O cost (e.g., SSDs with more IOPS); modify the SSD (summer project with Jae Young Do and Ivan Picoli)
- Reduce the cost of secondary storage: compress data for fewer bytes; use flash, not NVRAM (NVRAM ~ 3X the cost of flash); new NVM technology?
CAVEATS: Two Important Ones
Our Numbers Are APPROXIMATE
Performance is based on a set of point experiments, executed on our machines, using our Deuteronomy system. Costs come from web prices, which are variable at any one time and change over time. The general thrust of the results is solid, but your mileage may vary!
Best Cost/Performance Is NOT Always Best
Low costs are a plus, but not the only thing: we want max[value - cost]. If value is high enough, low cost may be secondary. Value might depend on latency (or peak performance): some of the time, lowest latency wins the game and captures all the value (e.g., some stock trading applications). But this is not the common case: most of the time, interactive system latency is adequate, and anything less than a millisecond per op should be fine (e.g., online shopping).
Recap
Two costs:
- Storage: always present
- Execution: only when operations execute
When data is cold, storage cost dominates; when hot, execution cost dominates. Data caching systems win on cost/performance because they move hot data to the (high cost) main memory cache for low execution cost, and move cold data to (low cost) secondary storage for low storage cost. We should be focusing on data caching improvements!
Talk Brought to You by: Bw-Tree in Production
- SQL Server Hekaton: key-sequential index; lock-free for high concurrency, consistent with Hekaton's non-blocking main memory architecture; in-memory Bw-tree
- Azure DocumentDB: indexing engine; rich query processing over a schema-free JSON model, with automatic indexing; sustained document ingestion at high rates; Bw-tree + LLAMA
- Bing ObjectStore: sorted key-value store; supports range queries; optimized for flash SSDs; Bw-tree + LLAMA
Some References
- David B. Lomet: Cost/Performance in Modern Data Stores: How Data Caching Systems Succeed. DaMoN 2018: 9:1-9:10. https://dl.acm.org/citation.cfm?doid=3211922.3211927
- Justin J. Levandoski, David B. Lomet, Sudipta Sengupta: The Bw-Tree: A B-tree for New Hardware Platforms. ICDE 2013: 302-313
- Justin J. Levandoski, David B. Lomet, Sudipta Sengupta: LLAMA: A Cache/Storage Subsystem for Modern Hardware. PVLDB 6(10): 877-888 (2013). http://www.vldb.org/pvldb/vol6/p877-levandoski.pdf
- Justin J. Levandoski, David B. Lomet, Sudipta Sengupta, Ryan Stutsman, Rui Wang: High Performance Transactions in Deuteronomy. CIDR 2015. http://cidrdb.org/cidr2015/Papers/CIDR15_Paper15.pdf
QUESTIONS?