Spanner Database Overview

Spanner
Lixin Shi
6.033 Quiz2 Review
(Some slides from Spanner’s OSDI presentation)
Key Points to Remember
Major Claims
Globally distributed data with configurable control
Consistent commit timestamps
Consistency Model: external consistency
TrueTime API
Set of supported read/write operations
Paxos write
Relaxed read with or without a timestamp
Globally Distributed Data
 
 
Cross-datacenter distribution
User configurable
Transaction Model
Two-Phase locking with start/commit
Transactional write and lock-free read
Globally sortable timestamp with each commit
Bounded error between timestamp and wall-clock time
[Figure: timeline t showing transaction T1 from start to commit, labeled with its timestamp s1.]
External Consistency
If a transaction T1 commits before another transaction T2 starts, then T1's commit timestamp is smaller than T2's.
A real-world approximation of global wall-clock time consistency
[Figure: two timelines t, each showing T1 (timestamp s1) and T2 (timestamp s2) with start and commit marks and the label s1 < s2; one of the two cases is marked with an x.]
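The definition above can be checked mechanically over a recorded history. The following is a minimal sketch, not Spanner code: `Txn`, its fields, and `is_externally_consistent` are hypothetical names introduced here, and the start and commit values stand for absolute times that a real system could not observe directly.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Txn:
    name: str
    start: float      # absolute start time (hypothetical field)
    commit: float     # absolute commit time (hypothetical field)
    timestamp: float  # commit timestamp s assigned to the transaction


def is_externally_consistent(history):
    """True if every ordered pair (T1, T2) with T1.commit < T2.start
    also has T1.timestamp < T2.timestamp."""
    for t1, t2 in permutations(history, 2):
        if t1.commit < t2.start and not t1.timestamp < t2.timestamp:
            return False
    return True


# T1 commits at 2.0, T2 starts at 3.0, so s1 must be smaller than s2.
good = [Txn("T1", 0.0, 2.0, 1.5), Txn("T2", 3.0, 5.0, 4.0)]
# Same schedule, but T1 was given a timestamp larger than T2's.
bad = [Txn("T1", 0.0, 2.0, 4.5), Txn("T2", 3.0, 5.0, 4.0)]
```

Transactions that overlap in time are unconstrained by this check, which matches the definition: only "commits before the other starts" pairs must have ordered timestamps.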
TrueTime API
“Global wall-clock time” with bounded uncertainty
[Figure: on the time axis, TT.now() returns an interval from earliest to latest, of width 2*ε.]
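As an illustration of the interval idea, here is a toy stand-in for TT.now(), not the real TrueTime implementation. The class names, the injectable `clock` parameter, and the assumption that the local clock is within `epsilon` of absolute time are all introduced here for the sketch.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy model: now() returns an interval instead of a single point in time."""

    def __init__(self, epsilon, clock=time.time):
        self.epsilon = epsilon  # assumed uncertainty bound, in seconds
        self.clock = clock      # local clock; injectable so tests are deterministic

    def now(self):
        t = self.clock()
        # Assumption: absolute time lies within epsilon of the local reading.
        return TTInterval(earliest=t - self.epsilon, latest=t + self.epsilon)


tt = TrueTime(epsilon=0.25, clock=lambda: 100.0)
interval = tt.now()
width = interval.latest - interval.earliest  # 2 * epsilon
```

With a fixed clock reading of 100.0 and epsilon 0.25, the interval is [99.75, 100.25], so its width is 2*ε as in the figure.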
Commit Wait
 
[Figure: timeline for a transaction T: acquired locks, pick s = TT.now().latest, wait until TT.now().earliest > s (commit wait), release locks. Two intervals, one on each side of s, are labeled "average ε".]
What this gives you:
absolute start time < s < absolute commit time
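The commit-wait rule can be sketched with a manually advanced clock so that nothing actually sleeps. This is a simplified model under stated assumptions: `FakeClock`, `TrueTime`, and `commit_with_wait` are hypothetical names, the clock has no real error (so the wait is exactly 2*ε plus one step), and lock handling is only indicated in comments.

```python
class FakeClock:
    """Manually advanced clock so the example runs instantly."""

    def __init__(self, t=0.0):
        self.t = t

    def advance(self, dt):
        self.t += dt


class TrueTime:
    def __init__(self, clock, epsilon):
        self.clock = clock
        self.epsilon = epsilon

    def now(self):
        # Returns (earliest, latest).
        return (self.clock.t - self.epsilon, self.clock.t + self.epsilon)


def commit_with_wait(tt, clock, step=1.0):
    """Pick s = TT.now().latest, then wait until TT.now().earliest > s.

    Returns (s, time at which the locks may be released)."""
    # Locks are assumed to be held already at this point.
    _, s = tt.now()
    while not tt.now()[0] > s:
        clock.advance(step)  # stands in for sleeping during commit wait
    # Locks would be released here.
    return s, clock.t


clock = FakeClock(t=100.0)
tt = TrueTime(clock, epsilon=4.0)
start = clock.t
s, commit_time = commit_with_wait(tt, clock)
```

In this run s is 104.0 and the wait ends at 109.0, so start < s < commit_time holds, which is the property stated above.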
From TrueTime to Consistency
 
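Recall: what TrueTime gives you:
[Figure: timeline t for T1 from start to commit; s1 needs to be in this range.]
If there are two transactions:
[Figure: timeline t with T1 followed by T2; s1 lies in T1's start-to-commit range (s1 range), s2 lies in T2's (s2 range), and s1 < s2.]
Since s1 is before T1's absolute commit time and s2 is after T2's absolute start time, T1 committing before T2 starts implies s1 < s2.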
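A small worked example can make the two-transaction argument concrete. The sketch below is an assumption-laden model, not Spanner's implementation: each server's local clock is off by a fixed `offset` with |offset| <= ε, and `tt_now` and `run_txn` are hypothetical helpers. T1 runs on a server with a fast clock and T2 on one with a slow clock, which is the unfavorable case.

```python
def tt_now(abs_time, offset, epsilon):
    """(earliest, latest) seen by a server whose local clock is off by `offset`.

    Assumes |offset| <= epsilon, so the interval contains abs_time."""
    local = abs_time + offset
    return local - epsilon, local + epsilon


def run_txn(abs_start, offset, epsilon, commit_wait=True, step=1.0):
    """Return (s, absolute commit time) for one transaction."""
    _, s = tt_now(abs_start, offset, epsilon)
    t = abs_start
    if commit_wait:
        # Wait until TT.now().earliest > s on this server.
        while not tt_now(t, offset, epsilon)[0] > s:
            t += step
    return s, t


EPS = 4.0

# With commit wait: T2 starts one time unit after T1 commits.
s1, commit1 = run_txn(100.0, offset=+EPS, epsilon=EPS)
s2, _ = run_txn(commit1 + 1.0, offset=-EPS, epsilon=EPS)

# Same schedule without commit wait: T1 commits immediately at time 100.
s1_nowait, commit1_nowait = run_txn(100.0, offset=+EPS, epsilon=EPS, commit_wait=False)
s2_nowait, _ = run_txn(commit1_nowait + 1.0, offset=-EPS, epsilon=EPS)
```

With commit wait the run gives s1 = 108 and s2 = 110, so s1 < s2. Without it, s1 = 108 but s2 = 101, and the ordering is violated even though T1 committed before T2 started.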
Supported Read/Write Operations
Read-Write transaction
Paxos write
Commit wait
Read transaction with timestamp s
Lock-free
Every replica tracks t_safe
Read from any replica with t_safe >= s
Other variants of read
Standalone read: read with TT.now().latest
Read with bounded timestamp
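The rule "read from any replica with t_safe >= s" can be sketched as follows. This is a hypothetical model: the `Replica` class, its multi-version `versions` map, and `pick_replica` are assumptions made for illustration and are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    t_safe: float  # the replica is up to date through this timestamp
    # key -> list of (commit timestamp, value), in ascending timestamp order
    versions: dict = field(default_factory=dict)

    def read_at(self, key, s):
        """Lock-free read: latest value of `key` with commit timestamp <= s."""
        if self.t_safe < s:
            raise ValueError(f"{self.name}: t_safe {self.t_safe} < s {s}")
        result = None
        for ts, value in self.versions.get(key, []):
            if ts <= s:
                result = value
        return result


def pick_replica(replicas, s):
    """Any replica with t_safe >= s may serve the read; take the first one."""
    for r in replicas:
        if r.t_safe >= s:
            return r
    return None


replicas = [
    Replica("a", 5.0, {"x": [(1.0, "v1"), (4.0, "v2")]}),
    Replica("b", 10.0, {"x": [(1.0, "v1"), (4.0, "v2"), (8.0, "v3")]}),
]
chosen = pick_replica(replicas, 7.0)
value = chosen.read_at("x", 7.0)
```

Here replica "a" is skipped because its t_safe of 5.0 is below s = 7.0, and replica "b" returns "v2", the newest version committed at or before 7.0.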
GFS
Lixin Shi
6.033 Quiz2 Review
(Some slides from GFS’s SOSP presentation)
Design Assumption/Facts
Hardware fails very often.
Large files (>=100MB) are typical.
For read operations: large streaming reads + small random reads.
For write operations: record appends are the prevalent form of writing.
Need to handle concurrency.
Design choice: high bandwidth >> low latency
Architecture
 
Read Algorithm
 
Read Algorithm (cont.)
 
Write Algorithm
 
Write Algorithm (cont.)
 
Write Algorithm (cont.)
 
Write Algorithm (cont.)
 
Atomic Record Appends
Atomic!
A few changes from the write algorithm:

Spanner is a globally distributed database that offers configurable control over data distribution, consistent commit timestamps, external consistency, and the TrueTime API. Its transaction model combines two-phase locking for transactional writes with lock-free reads, and every commit carries a globally sortable timestamp. External consistency means that if one transaction commits before another starts, its commit timestamp is smaller, which is a real-world approximation of global wall-clock time consistency. Supported operations include read-write transactions (Paxos write plus commit wait) and several read variants: reads at a given timestamp, standalone reads, and reads with a bounded timestamp.

  • Spanner Database
  • Globally Distributed Data
  • Consistency Model
  • TrueTime API
  • Distributed Transactions

Uploaded on Dec 07, 2024



