
The Google File System
Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung
Google
Presented by Jiamin Huang
EECS 582 – W16
Problem
Component failures are the norm
Files are huge
Appends are common; random writes are rare
Co-designing the applications and the file system API increases flexibility
Architecture
Master
Single master
Metadata
File and chunk namespaces
Mapping from files to chunks
Locations of each chunk’s replica
Metadata changes replicated via an operation log
Read-only shadow masters
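
Below is a minimal sketch (Python, with hypothetical names) of the state the master keeps: the file and chunk namespaces and the file-to-chunk mapping are mutated only after appending to the operation log, while chunk replica locations live only in memory and are rebuilt from chunkserver reports.

# Hypothetical sketch of GFS master metadata; not the real implementation.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChunkInfo:
    version: int = 0
    replicas: List[str] = field(default_factory=list)   # chunkserver addresses, kept in memory only

@dataclass
class MasterState:
    # File namespace and file -> chunk-handle mapping, persisted via the operation log.
    files: Dict[str, List[int]] = field(default_factory=dict)
    chunks: Dict[int, ChunkInfo] = field(default_factory=dict)
    op_log: List[str] = field(default_factory=list)      # stands in for the replicated operation log

    def create_file(self, path: str) -> None:
        # Log the namespace mutation before applying it, so shadow masters can replay it.
        self.op_log.append(f"CREATE {path}")
        self.files[path] = []
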
Master operations
Namespace management using locks
Place chunk replicas across racks
Replicate chunks for higher availability
Move chunks around to balance disk space and load
Garbage collection
Deleted files
Stale replicas
Lazy reclaim
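
A toy sketch of the lazy-reclaim idea, reusing the MasterState fields from the sketch above; the names and grace-period handling are illustrative, not GFS's actual code. Deleting a file only renames it to a hidden name in the log; a periodic namespace scan later reclaims old hidden files and drops their chunks from the metadata.

import time

GRACE_PERIOD = 3 * 24 * 3600   # three days per the paper (configurable)

def delete_file(master, path):
    # Deletion is just a logged rename to a hidden name; space is reclaimed later.
    hidden = f"/.deleted/{path}#{int(time.time())}"
    master.op_log.append(f"RENAME {path} {hidden}")
    master.files[hidden] = master.files.pop(path)

def namespace_scan(master, now=None):
    # Background scan: drop hidden files older than the grace period and their chunks;
    # chunkservers erase the actual replicas during later heartbeat exchanges.
    now = now or time.time()
    for hidden in [p for p in master.files if p.startswith("/.deleted/")]:
        if now - int(hidden.rsplit("#", 1)[1]) > GRACE_PERIOD:
            for handle in master.files.pop(hidden):
                master.chunks.pop(handle, None)
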
Chunkserver
Multiple chunkservers, free to join and leave
Store the actual data as chunks
Report chunk locations to the master
Checksum the data for integrity
Chunks replicated as directed by the master
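
A minimal sketch of per-block checksumming on a chunkserver. The 64 KB block size follows the GFS paper; the function names are hypothetical.

import zlib

BLOCK_SIZE = 64 * 1024   # checksum granularity within a chunk, per the paper

def block_checksums(chunk_data: bytes) -> list[int]:
    # One CRC per 64 KB block of the chunk.
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify_read(chunk_data: bytes, checksums: list[int]) -> bool:
    # Verify the blocks before returning data, so corruption is never
    # propagated to clients or to other replicas.
    return block_checksums(chunk_data) == checksums
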
Interface
Normal operations: create, delete, open, close, read, write
Additional operations
Snapshot
Creates a copy of a file or directory tree
Implemented with copy-on-write
Record append
Appends data atomically, at least once
Returns the chosen offset to the client
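
GFS's client library is not public, so the following is only an illustrative sketch of what the interface above might look like; every signature here is an assumption.

from typing import Protocol

class FileHandle:
    ...

class GFSClient(Protocol):
    def create(self, path: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def open(self, path: str) -> FileHandle: ...
    def close(self, handle: FileHandle) -> None: ...
    def read(self, handle: FileHandle, offset: int, length: int) -> bytes: ...
    def write(self, handle: FileHandle, offset: int, data: bytes) -> None: ...
    def snapshot(self, src: str, dst: str) -> None: ...                    # copy-on-write copy of a file or tree
    def record_append(self, handle: FileHandle, data: bytes) -> int: ...   # returns the offset GFS chose
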
System Interaction - Read
1. Client sends the file name and chunk index to the master (a single request can cover multiple chunks)
2. Master returns the replica locations (and may also return locations for the next chunks)
3. Client sends a read request to one of those chunkservers
4. The chunkserver returns the data
5. Further reads of the same chunks require no client-master interaction
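
A small sketch of the read path above (Python; master.find_chunk and read_chunk are hypothetical RPC stubs). The client converts the byte offset into a chunk index, asks the master once, caches the reply, and then reads directly from a chunkserver; the 64 MB chunk size is from the paper.

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, per the paper

def read(master, chunkservers, location_cache, path, offset, length):
    chunk_index = offset // CHUNK_SIZE                        # step 1: byte offset -> chunk index
    if (path, chunk_index) not in location_cache:
        # Step 2: one master round trip; the reply is cached so later reads skip the master (step 5).
        handle, replicas = master.find_chunk(path, chunk_index)
        location_cache[(path, chunk_index)] = (handle, replicas)
    handle, replicas = location_cache[(path, chunk_index)]
    # Steps 3-4: read directly from a replica (here simply the first one listed).
    return chunkservers[replicas[0]].read_chunk(handle, offset % CHUNK_SIZE, length)
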
System Interaction - Write
1. Master selects a primary chunkserver for the chunk and grants it a lease
2. Client asks the master for the locations of the primary and the secondary replicas
3. Client pushes the data to all replicas
4. All replicas acknowledge receipt to the client
5. Client sends the write request to the primary
6. Primary executes the request and forwards it to the secondaries
7. Secondaries apply all mutations in the serial order chosen by the primary
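
A simplified sketch of the write flow above, with hypothetical names (master.get_lease_holders and the Replica/Primary classes are stand-ins). It highlights that data flow (step 3) is decoupled from control flow (steps 5-7): data is pushed to every replica first, then the primary assigns a serial order and forwards the request.

class Replica:
    def __init__(self):
        self.staged = None
        self.mutations = []                    # (serial, offset, data), applied in serial order

    def push_data(self, data):                 # step 3: data is staged, not yet applied
        self.staged = data

    def apply(self, serial, offset, data):
        self.mutations.append((serial, offset, data))

class Primary(Replica):
    def __init__(self):
        super().__init__()
        self.serial = 0

    def write(self, offset, data, secondaries):
        self.serial += 1                       # step 6: pick one serial order for all replicas
        self.apply(self.serial, offset, data)
        for secondary in secondaries:          # step 7: secondaries replay in that order
            secondary.apply(self.serial, offset, data)
        return "success"

def client_write(master, chunkservers, path, chunk_index, offset, data):
    primary_name, secondary_names = master.get_lease_holders(path, chunk_index)   # steps 1-2
    replicas = [chunkservers[name] for name in [primary_name] + secondary_names]
    for replica in replicas:                   # step 3: push data to all replicas, in any order
        replica.push_data(data)
    # Step 4 (acks) is implicit here; step 5: send the write request to the primary.
    return replicas[0].write(offset, data, replicas[1:])
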
System Interaction - Append
Same flow as a write, except the primary chooses the append offset
Primary pads the chunk if there is not enough space left, and the client retries on the next chunk
Each append is at most ¼ of the chunk size
Larger appends are broken into multiple operations
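
A minimal sketch of record append on the primary, assuming the flow above; names are illustrative, and the replica chunks are modeled as plain byte buffers. If the record does not fit, the primary pads the rest of the chunk on every replica and the client retries on the next chunk.

CHUNK_SIZE = 64 * 1024 * 1024              # 64 MB chunks, per the paper
MAX_APPEND = CHUNK_SIZE // 4               # each record append is at most 1/4 of the chunk size

def record_append(primary_chunk: bytearray, replica_chunks: list, data: bytes):
    assert len(data) <= MAX_APPEND
    if len(primary_chunk) + len(data) > CHUNK_SIZE:
        # Not enough room: pad this chunk on every replica; the client retries on the next chunk.
        padding = b"\0" * (CHUNK_SIZE - len(primary_chunk))
        for chunk in [primary_chunk] + replica_chunks:
            chunk.extend(padding)
        return None
    offset = len(primary_chunk)            # the primary, not the client, chooses the offset
    for chunk in [primary_chunk] + replica_chunks:
        chunk.extend(data)                 # same offset and contents on every replica
    return offset                          # returned to the client
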
Consistency Model
Possible states of a file region after a mutation
Defined: consistent, and clients see exactly what the mutation wrote
Consistent: all clients see the same data, even if it mingles fragments of several mutations
Inconsistent: different clients may see different data
Implications for applications
Rely on appends rather than overwrites
Checkpoint progress
Use self-validating, self-identifying records
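
One way (illustrative only) to build the self-validating, self-identifying records suggested above: each record carries a checksum and a unique id, so a reader can skip padding or corrupt regions and drop duplicates left by retried appends.

import struct, zlib

def encode_record(record_id: int, payload: bytes) -> bytes:
    header = struct.pack(">QI", record_id, len(payload))      # id + payload length
    checksum = zlib.crc32(header + payload)                   # self-validating
    return struct.pack(">I", checksum) + header + payload

def decode_records(data: bytes):
    seen, pos = set(), 0
    while pos + 16 <= len(data):
        checksum, record_id, length = struct.unpack(">IQI", data[pos:pos + 16])
        payload = data[pos + 16:pos + 16 + length]
        if len(payload) == length and zlib.crc32(data[pos + 4:pos + 16] + payload) == checksum:
            if record_id not in seen:        # duplicates can appear after retried appends
                seen.add(record_id)
                yield record_id, payload
            pos += 16 + length
        else:
            pos += 1                         # skip padding or a corrupt byte and resync
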
Evaluation - Micro-benchmarks
Evaluation - Real world clusters
Cluster A
Research and Development
A few MBs to a few TBs of data
Tasks run for up to a few hours
Cluster B
Production use
Continuously generate and process multi-TB data
Long running tasks
Storage and Metadata
Read/Write Rate
Recovery Time
Kill one chunkserver
15,000 chunks containing 600 GB of data
All chunks restored within 23.2 minutes (an effective replication rate of roughly 440 MB/s)
Kill two chunkservers
Each with about 16,000 chunks and 660 GB of data
Leaves 266 chunks with only a single replica
These chunks are restored to at least 2x replication within 2 minutes
Conclusion
Some traditional file-system assumptions no longer hold
Failures are normal
Optimize for large files
Optimize for appends
Fault tolerance
Constant monitoring
Replication
Fast recovery
Flat Datacenter Storage (FDS)
Full bisection-bandwidth network
Flat storage model
Non-blocking API
Single master, multiple tractservers
Deterministic data placement
Dynamic work allocation with small work units
Parallel writes to all replicas
Parallel replication
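
A rough sketch of FDS-style deterministic placement: a blob GUID and tract index hash into a tract locator table (TLT) that lists the tractservers holding each tract, so reads and writes need no per-access metadata lookup. The exact formula and names here are approximations of the FDS paper, not its code.

import hashlib

def tract_locator(blob_guid: bytes, tract_index: int, tlt_length: int) -> int:
    # Deterministic: any client computes the same locator without asking the master.
    digest = int.from_bytes(hashlib.sha1(blob_guid).digest(), "big")
    return (digest + tract_index) % tlt_length

def tractservers_for(tlt, blob_guid: bytes, tract_index: int):
    # tlt is a list of replica groups, e.g. [["ts1", "ts7"], ["ts2", "ts5"], ...]
    return tlt[tract_locator(blob_guid, tract_index, len(tlt))]
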
Tachyon
Pushes lineage into the storage layer
Lineage information is persisted before the actual data
Asynchronous, selective checkpointing
Leaf files of the lineage graph and hot files first
Resource allocation based on job priority
Uses client side caching to increase replication factor
Discussion
How should the design be changed to handle small files?
How can multiple masters be used to avoid a single point of failure?
How can consistency be improved?