Highly Available Relational Database System - Key Components and Design Choices

A comprehensive overview of a highly available relational database system, focusing on scalability, concurrency control options, data replication, availability strategies, failure handling, and full transactional support. The content discusses essential elements such as optimistic concurrency control, two-phase locking, multiversion concurrency control, synchronous data replication, API-driven service discovery, instantaneous schema changes, and more.



Presentation Transcript


  1. Comdb2: Bloomberg's Highly Available Relational Database System. Alex Scotti, Mark Hannum, Michael Ponomarenko, Dorin Hogea, Akshat Sikarwar, Mohit Khullar, Adi Zaimi, James Leddy, Rivers Zhang, Fabio Angius, Lingzhi Deng. Presenter: Tianyuan Zhang

  2. Motivation. Scenario (production environment & demands): a trader from NASDAQ is browsing stock prices on BLP's terminal during a high-traffic hour. He sees a profitable stock and wants to buy in immediately. All other traders should be able to see the same price at that moment and act accordingly. The service should be reachable at all times, any two applications should display the same information, and response times should be reasonably fast. With a huge amount of data, scaling up is costly and impractical. Requirements: scalability, high availability, full transactional support.

  3. Scalability. Replication over multiple machines. Trade-off between latency and consistency: eventual consistency vs. strict consistency. Design choices: OCC vs. MVCC vs. 2PL. (Diagram: a single master replicating to several replicants.)

  4. Design choice: concurrency control. OCC (optimistic concurrency control): assumes low data contention; transactions use data resources without acquiring locks and check for conflicts before committing; the fastest option while still allowing concurrency. 2PL (two-phase locking): works under any conditions and guarantees serializability, but disallows concurrency and can seriously slow down performance. MVCC (multiversion concurrency control): allows concurrency with clean semantics, but is less efficient with memory and disk space and adds code complexity.
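
To make the OCC idea above concrete, here is a minimal sketch in Python (purely illustrative, not Comdb2 code): transactions read without taking locks, remember the versions they saw, and validate those versions at commit time.

    # Illustrative optimistic concurrency control: read without locks,
    # then validate the versions of everything read before applying writes.

    class ConflictError(Exception):
        pass

    class Store:
        def __init__(self):
            self.data = {}       # key -> value
            self.versions = {}   # key -> version counter

        def read(self, key, txn):
            txn.read_set[key] = self.versions.get(key, 0)
            return self.data.get(key)

        def commit(self, txn):
            # Validation phase: abort if anything we read changed meanwhile.
            for key, seen in txn.read_set.items():
                if self.versions.get(key, 0) != seen:
                    raise ConflictError("conflict on %r, retry" % key)
            # Write phase: apply buffered writes and bump versions.
            for key, value in txn.write_set.items():
                self.data[key] = value
                self.versions[key] = self.versions.get(key, 0) + 1

    class Txn:
        def __init__(self):
            self.read_set = {}
            self.write_set = {}

On conflict the application simply retries, which stays cheap as long as contention is low, exactly the assumption stated above.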

  5. Availability. Synchronously replicate across data centers. Clients use an API to discover services and reconnect when a server fails. Tolerant to any type of outage or maintenance. Elastic deployment: the cluster structure can be changed freely. Instantaneous schema changes: update the schema without a rebuild.
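
The API-driven discovery and reconnect behaviour can be pictured with a small hedged sketch; discover_nodes and the connect callable below are hypothetical placeholders, not the real client library.

    import random
    import time

    def discover_nodes(database):
        # Hypothetical stand-in for API-driven service discovery; a real
        # client would ask a registry for the current cluster membership.
        return ["node1", "node2", "node3"]

    def run_with_failover(connect, database, statement, retries=3):
        # 'connect' is any callable returning an object with execute();
        # on a connection error we simply move on to the next node.
        last_error = None
        for attempt in range(retries):
            nodes = discover_nodes(database)
            for node in random.sample(nodes, k=len(nodes)):
                try:
                    return connect(database, node).execute(statement)
                except ConnectionError as err:
                    last_error = err
            time.sleep(0.1 * (attempt + 1))   # back off, then rediscover
        raise last_error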

  6. HASQL failure handling. (Diagram: a client begins a transaction against replicant S1 at LSN 100 and the API records point-in-time token 100; when S1 fails mid-transaction, the client reconnects to replicant S2, which is at LSN 101, and continues against snapshot 100.)
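
The diagram's flow can be sketched as follows (illustrative only, with hypothetical node methods current_lsn() and run_at_snapshot()): the client keeps the point-in-time token handed out at BEGIN plus the statements issued so far, and on failure resumes on any replicant that has replicated at least that far.

    # Illustrative HASQL-style failover, not Comdb2 internals.

    class HASQLSession:
        def __init__(self, nodes):
            self.nodes = nodes        # candidate replicants
            self.token = None         # point-in-time token (snapshot LSN)
            self.statements = []      # statements issued in this transaction

        def begin(self, node):
            self.token = node.current_lsn()    # e.g. 100 in the slide
            self.statements = []

        def execute(self, node, stmt):
            self.statements.append(stmt)
            return node.run_at_snapshot(self.token, self.statements)

        def failover(self):
            # Resume on any replicant that has reached our snapshot LSN.
            for node in self.nodes:
                if node.current_lsn() >= self.token:
                    return node.run_at_snapshot(self.token, self.statements)
            raise RuntimeError("no replicant caught up to %s" % self.token)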

  7. Full transactional support. Atomicity: provided by the write-ahead log (WAL) protocol. Consistency: synchronous replication of data, OCC. Isolation: Block, Read Committed, Snapshot Isolation, Serializable. Durability: WAL, network commit.
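
As a reminder of why a WAL gives atomicity and durability, here is a minimal generic sketch (not Comdb2's log format): every change is appended and flushed to the log before it is applied, so recovery can replay committed work after a crash.

    import json, os

    class WAL:
        # Generic write-ahead log: the record reaches disk before the data does.
        def __init__(self, path):
            self.path = path

        def append(self, record):
            with open(self.path, "a") as f:
                f.write(json.dumps(record) + "\n")
                f.flush()
                os.fsync(f.fileno())   # force the log record to stable storage

        def replay(self):
            if not os.path.exists(self.path):
                return []
            with open(self.path) as f:
                return [json.loads(line) for line in f if line.strip()]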

  8. Database isolation levels. (Diagram: a spectrum from no isolation to serializable — Read Uncommitted, Block Isolation [default], Read Committed, Repeatable Read, Snapshot Isolation, Serializable — annotated with the anomalies weaker levels permit: dirty reads, nonrepeatable reads, phantom reads, write skew.)

  9. Life cycle. (Diagram: the transaction life cycle, with the OCC phase shown in green and the 2PL phase in red.)

  10. Implementation: Storage Layer, Replication, Cdb2 Layer.

  11. Storage Layer. Uses B-trees to store every type of data; multiple B-trees form a table. Improvements on BerkeleyDB: 1. Row locks: finer granularity compared to page locks. 2. Prefaulting (readahead): B-tree readahead, local prefaulting, remote prefaulting.
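
The prefaulting idea can be illustrated with a toy sketch: while the main thread works on a request, worker threads warm the cache with the pages the request is about to touch; read_page below is a placeholder for a real B-tree page read.

    from concurrent.futures import ThreadPoolExecutor

    def read_page(page_id):
        return "contents of page %d" % page_id   # placeholder for disk I/O

    def prefault(cache, page_ids, pool):
        # Issue asynchronous reads so pages are resident before they are needed.
        for page_id in page_ids:
            if page_id not in cache:
                pool.submit(lambda p=page_id: cache.setdefault(p, read_page(p)))

    if __name__ == "__main__":
        cache = {}
        with ThreadPoolExecutor(max_workers=4) as pool:
            prefault(cache, range(10), pool)     # warm pages 0-9 in the background
        print(len(cache), "pages prefetched")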

  12. Storage Layer (cont.) 3. Root caching: allows concurrent reads. 4. Compression: trades CPU cycles for less disk I/O. 5. Concurrent I/O: multi-threaded page flushing.

  13. Replication logic. Use of LSNs (Log Sequence Numbers). Performance concerns of a synchronous system. Durability in Comdb2: network commit, early ack. Preserving concurrency when sending the log to replicants. Coherency model: short-term leases keep replicants up to date.
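
One way to read the network-commit plus coherency-model combination is sketched below; this is a simplification, not the actual protocol: a commit LSN counts as durable once every coherent replicant has acknowledged the log up to that point, while nodes that have lost their lease (incoherent) do not hold up the commit.

    def is_durable(commit_lsn, coherent_acks):
        # coherent_acks: highest LSN acknowledged by each coherent replicant.
        # Durable once every coherent replicant has the log up to commit_lsn;
        # incoherent replicants are excluded and do not block the commit.
        return all(lsn >= commit_lsn for lsn in coherent_acks)

    print(is_durable(100, [101, 100, 100]))   # True
    print(is_durable(100, [101, 99, 100]))    # False: one replicant is behind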

  14. Eliminating dirty reads. (Diagram: a master replicating to nodes S0-S4, with clients C1 and C2 connected to replicants.)

  15. Eliminating dirty reads. (Diagram: after failover, S2 becomes the new master among S0-S4, and client C2 continues against the cluster.)

  16. Cdb2 Layer. Data organization. Genid: counter (48 bits), update-id (12 bits), stripe-id (4 bits). Row header: update-id (12 bits), schema version (8 bits), flags (8 bits), length (28 bits). Shadow trees.
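
The 64-bit genid layout above (48-bit counter, 12-bit update-id, 4-bit stripe-id) can be made concrete with some bit arithmetic; the field order below is an illustrative assumption, not necessarily the exact on-disk layout.

    def pack_genid(counter, update_id, stripe_id):
        # counter(48) | update_id(12) | stripe_id(4) = 64 bits (assumed order).
        assert counter < (1 << 48) and update_id < (1 << 12) and stripe_id < (1 << 4)
        return (counter << 16) | (update_id << 4) | stripe_id

    def unpack_genid(genid):
        return {"counter": genid >> 16,
                "update_id": (genid >> 4) & 0xFFF,
                "stripe_id": genid & 0xF}

    g = pack_genid(counter=123456, update_id=7, stripe_id=3)
    print(hex(g), unpack_genid(g))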

  17. Cdb2 Layer (cont.) BPLog: translation of SQL into low-level execution; the master executes the bplog using 2PL and the WAL, making it resilient to master re-election. Schema changes: declarative schema change (phoenix transaction). Compatible change: lazy substitution. Incompatible change: rebuild a hidden version in the background; reads occur against the original table, while writes are performed on both the regular and the hidden version of the table.
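
The incompatible-change flow can be sketched as follows (a toy model, not the real phoenix transaction): a hidden copy of the table is rebuilt in the background while reads stay on the original and writes are applied to both, so the hidden copy can be swapped in once it has caught up.

    class SchemaChange:
        def __init__(self, original_rows, convert):
            self.original = dict(original_rows)   # live table, old schema
            self.hidden = {}                      # hidden copy, new schema
            self.convert = convert                # converts a row to the new schema

        def background_rebuild(self):
            for key, row in list(self.original.items()):
                self.hidden.setdefault(key, self.convert(row))

        def read(self, key):
            return self.original.get(key)         # reads hit the original table

        def write(self, key, row):
            self.original[key] = row              # writes go to both versions
            self.hidden[key] = self.convert(row)

        def swap(self):
            # Cut over to the rebuilt copy once it has caught up.
            self.original, self.hidden = self.hidden, {}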

  18. Tuning: trading off performance against consistency; asynchronous logging; tuning incoherence, by size and by type.

  19. Evaluation: a 6-node cluster spread across 2 datacenters.

  20. Limitaion & future work Can not write to tables that exist in remote databases. User 2PC across database replication groups. Write operation can not linearly scaled with more machine. Saturate on a single machine s ability to process the low level bplogs. Solution: multi-master systems or partitioned data sets. 20

  21. Questions and discussion. What does the master do if it receives no acknowledgement from the replicants, and how long does it take to make progress? OCC has its limitations when the write-to-read ratio is high. Do replicants just work as stand-by nodes for the failure case? How does it compare to NoSQL systems? How does it perform in geo-distributed databases?
