Google Spanner: A Distributed Multiversion Database Overview

Slide Note
Embed
Share

Represented at OSDI 2012 by Wilson Hsieh, Google Spanner is a globally distributed database system that offers general-purpose transactions and SQL query support. It features lock-free distributed read transactions, ensuring external consistency of distributed transactions. Spanner enables property support for true-time and interval-based global time to ensure correctness and performance. The system integrates concurrency control, replication, and 2PC, marking it as the first of its kind at a global scale. Learn more about its architecture, benefits, and real-world applications through various examples presented at the event.


Uploaded on Sep 21, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Spanner: Googles Globally-Distributed Database Wilson Hsieh representing a host of authors OSDI 2012

  2. What is Spanner? Distributed multiversion database General-purpose transactions (ACID) SQL query language Schematized tables Semi-relational data model Running in production Storage for Google s ad data Replaced a sharded MySQL database OSDI 2012 2

  3. Example: Social Network Sao Paulo Santiago Buenos Aires x1000 San Francisco Seattle Arizona Brazil x1000 User posts User posts User posts User posts User posts Friend lists Friend lists Friend lists Friend lists Friend lists Moscow Berlin Krakow London Paris Berlin Madrid Lisbon US x1000 x1000 Russia Spain OSDI 2012 3

  4. Overview Feature: Lock-free distributed read transactions Property: External consistency of distributed transactions First system at global scale Implementation: Integration of concurrency control, replication, and 2PC Correctness and performance Enabling technology: TrueTime Interval-based global time OSDI 2012 4

  5. Read Transactions Generate a page of friends recent posts Consistent view of friend list and their posts Why consistency matters 1. Remove untrustworthy person X as friend 2. Post P: My government is repressive OSDI 2012 5

  6. Single Machine Block writes Generate my page Friend1 post Friend2 post User posts Friend lists Friend lists User posts Friend999 post Friend1000 post OSDI 2012 6

  7. Multiple Machines Block writes User posts Friend lists Friend lists User posts Friend1 post Friend2 post Generate my page Friend999 post User posts Friend lists Friend lists User posts Friend1000 post OSDI 2012 7

  8. Multiple Datacenters User posts Friend lists Friend1 post US x1000 User posts Friend lists Friend2 post Spain x1000 Generate my page User posts Friend lists Friend999 post x1000 Brazil User posts Friend lists Friend1000 post Russia x1000 OSDI 2012 8

  9. Version Management Transactions that write use strict 2PL Each transaction T is assigned a timestamp s Data written by T is timestamped with s Time <8 8 15 [X] [] My friends [P] My posts X s friends [me] [] OSDI 2012 9

  10. Synchronizing Snapshots Global wall-clock time == External Consistency: Commit order respects global wall-time order == Timestamp order respects global wall-time order given timestamp order == commit order OSDI 2012 10

  11. Timestamps, Global Clock Strict two-phase locking for write transactions Assign timestamp while locks are held Acquired locks Release locks T Pick s = now() OSDI 2012 11

  12. Timestamp Invariants Timestamp order == commit order T1 T2 Timestamp order respects global wall-time order T3 T4 OSDI 2012 12

  13. TrueTime Global wall-clock time with bounded uncertainty TT.now() time earliest latest 2* OSDI 2012 13

  14. Timestamps and TrueTime Acquired locks Release locks T Pick s = TT.now().latest s Wait until TT.now().earliest > s Commit wait average average OSDI 2012 14

  15. Commit Wait and Replication Start consensus Achieve consensus Notify slaves Acquired locks Release locks T Pick s Commit wait done OSDI 2012 15

  16. Commit Wait and 2-Phase Commit Start logging Done logging Acquired locks Release locks TC Committed Notify participants of s Acquired locks Release locks TP1 Release locks Acquired locks TP2 Prepared Send s Compute s for each Commit wait done Compute overall s OSDI 2012 16

  17. Example Remove X from my friend list Risky post P TC T2 sC=6 s=8 s=15 Remove myself from X s friend list TP sP=8 s=8 Time <8 8 15 [X] [] My friends [P] My posts X s friends [me] [] OSDI 2012 17

  18. What Have We Covered? Lock-free read transactions across datacenters External consistency Timestamp assignment TrueTime Uncertainty in time can be waited out OSDI 2012 18

  19. What Havent We Covered? How to read at the present time Atomic schema changes Mostly non-blocking Commit in the future Non-blocking reads in the past At any sufficiently up-to-date replica OSDI 2012 19

  20. TrueTime Architecture GPS GPS GPS timemaster timemaster timemaster GPS Atomic-clock timemaster GPS timemaster timemaster Client Datacenter 1 Datacenter 2 Datacenter n Compute reference [earliest, latest] = now OSDI 2012 20

  21. TrueTime implementation now = reference now + local-clock offset = reference + worst-case local-clock drift +6ms 200 s/sec reference uncertainty time 0sec 30sec 60sec 90sec OSDI 2012 21

  22. What If a Clock Goes Rogue? Timestamp assignment would violate external consistency Empirically unlikely based on 1 year of data Bad CPUs 6 times more likely than bad clocks OSDI 2012 22

  23. Network-Induced Uncertainty 10 6 99.9 99 90 5 8 4 Epsilon (ms) 6 3 4 2 2 1 Mar 29 Mar 30 Mar 31 Apr 1 6AM 8AM Date (April 13) 10AM 12PM Date OSDI 2012 23

  24. Whats in the Literature External consistency/linearizability Distributed databases Concurrency control Replication Time (NTP, Marzullo) OSDI 2012 24

  25. Future Work Improving TrueTime Lower < 1 ms Building out database features Finish implementing basic features Efficiently support rich query patterns OSDI 2012 25

  26. Conclusions Reify clock uncertainty in time APIs Known unknowns are better than unknown unknowns Rethink algorithms to make use of uncertainty Stronger semantics are achievable Greater scale != weaker semantics OSDI 2012 26

  27. Thanks To the Spanner team and customers To our shepherd and reviewers To lots of Googlers for feedback To you for listening! Questions? OSDI 2012 27

Related