Core Concepts of Distributed Systems

 
Distributed Systems
 
Lecture 1 – Introduction
 
Cheng Li
 
This class will teach you …
 
Core concepts of distributed systems
- Abstractions, algorithms, implementation techniques
 
Popular distributed systems and tools used by big companies today
- E.g.: Google's protobuf/Bigtable/Spanner/MapReduce, Ceph, Hadoop, Amazon's Dynamo, MXNet, etc.
 
2/25/2025
 
USTC-ADSL-Dist-Sys-Lecture-Note
 
References
 
MIT’s 6.824 (Robert Morris and Frans Kaashoek)
- http://nil.csail.mit.edu/6.824/2018/schedule.html
NYU’s G22.3033 (Jinyang Li)
- http://www.news.cs.nyu.edu/~jinyang/fa16-ds/
UW’s CSE452 (Tom Anderson)
- https://courses.cs.washington.edu/courses/cse452/18sp/
Umich’s 491 (Harsha V. Madhyastha)
- https://lamport.eecs.umich.edu/#schedule
Cornell’s 5414 (Lorenzo Alvisi)
- http://www.cs.cornell.edu/courses/cs5414/2019fa/
Columbia’s 4113 (Roxana Geambasu)
- https://columbia.github.io/ds1-class/
Stanford’s 244b (David Mazières)
- http://www.scs.stanford.edu/17au-cs244b/
 
Acknowledgements: Lecture notes build on these courses!
 
What is a Distributed System?
 
A distributed system is a collection of independent computers that
- communicate via network
- cooperate to provide some service
- appear to the users of the system as a single system.
 
Distributed systems vs. networks
 
Distributed systems raise the level of abstraction.
They hide many complexities and make it easier to build applications.
 
Why Distributed Systems?
 
For location transparency
Examples:
- Your browser doesn’t need to know which Google servers are serving Gmail right now
- Your Amazon EC2-based mobile app doesn’t need to know which servers in S3 are storing its data
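 
One way to picture location transparency is a client that addresses a service by name and lets a lookup layer pick the machine. The sketch below is only an illustration: `ServiceDirectory`, `send_request`, the service name "mail", and the addresses are hypothetical and do not describe how Gmail or S3 actually route requests.

```python
import random


class ServiceDirectory:
    """Hypothetical name service: maps a logical service name to its servers."""

    def __init__(self):
        self._backends = {}

    def register(self, service, address):
        # Record one more server that can answer requests for `service`.
        self._backends.setdefault(service, []).append(address)

    def resolve(self, service):
        # Any replica will do; the caller never sees how the choice is made.
        return random.choice(self._backends[service])


def send_request(directory, service, payload):
    """Client-side call: addressed to a service name, not to a machine."""
    address = directory.resolve(service)
    # A real client would open a connection to `address` here; we only report it.
    return {"service": service, "served_by": address, "payload": payload}


HOSTS = ("10.0.0.1:443", "10.0.0.2:443", "10.0.0.3:443")  # made-up addresses
directory = ServiceDirectory()
for host in HOSTS:
    directory.register("mail", host)

reply = send_request(directory, "mail", "list inbox")
print(reply["served_by"])  # one of the three registered addresses
```

The calling code only ever mentions "mail"; servers can be added or removed in the directory without changing the client.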
 
Why Distributed Systems?
 
For scalable capacity
Aggregate resources of many computers
- CPU: MapReduce, Dryad, Hadoop
- Disk: NFS, the Google file system, Hadoop HDFS
- Memory: memcached, dist-cache
- Bandwidth: Akamai CDN
What scales are we talking about?
- Typical datacenters have 100-200K machines!
- Each service runs on more like 20K machines, though
 
Why Distributed Systems?
 
For availability
Build a reliable system out of unreliable parts
- Hardware can fail: power outages, disk failures, memory corruption, network switch failures, …
- Software can fail: bugs, misconfiguration, upgrades, …
- To achieve 0.9999 availability, replicate data/computation on many hosts with automatic failover
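 
A rough way to see why replication helps: if each host is available with probability a, and we assume (optimistically) that hosts fail independently and failover is instantaneous, then the service is down only when all n replicas are down, so the combined availability is 1 - (1 - a)^n. The sketch below uses that simplified model; the 0.95 per-host figure is an assumed number for illustration, not one from the lecture.

```python
def combined_availability(host_availability, replicas):
    """Availability of `replicas` hosts, assuming independent failures."""
    return 1.0 - (1.0 - host_availability) ** replicas


def replicas_needed(host_availability, target):
    """Smallest replica count whose combined availability reaches `target`."""
    if not 0.0 < host_availability < 1.0:
        raise ValueError("host_availability must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    replicas = 1
    while combined_availability(host_availability, replicas) < target:
        replicas += 1
    return replicas


# Assumed example: hosts that are each up 95% of the time, target 0.9999.
needed = replicas_needed(0.95, 0.9999)
print(needed, combined_availability(0.95, needed))
```

Under these assumptions three replicas fall just short of 0.9999 and four exceed it. Real failures are often correlated (shared power, shared switch, the same software bug), so the model gives an upper bound rather than a guarantee.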
 
Availability
 
Simply, each request eventually receives a response.
 
Measured as uptime/(uptime + downtime)
- Google Spanner achieves 99.999%
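 
The ratio above translates directly into a downtime budget. The following sketch computes availability from measured uptime and downtime, and converts an availability level into minutes of allowed downtime per year; the function names and the 365-day year are choices made for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def availability(uptime, downtime):
    """uptime / (uptime + downtime); both arguments in the same time unit."""
    return uptime / (uptime + downtime)


def downtime_minutes_per_year(avail):
    """Minutes per year a service may be down at availability `avail`."""
    return (1.0 - avail) * MINUTES_PER_YEAR


# 99 hours up and 1 hour down gives 99% availability.
print(availability(99, 1))

# Five nines leaves roughly five and a quarter minutes of downtime per year.
five_nines_budget = downtime_minutes_per_year(0.99999)
print(round(five_nines_budget, 2))
```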
 
Why Distributed Systems?
 
For modular functionality
Your application is split into many simpler parts, which may already exist or are easier to implement
- Authentication service
- Indexing service
- Locking service
This is called the service-oriented architecture (SOA) and much of the Web is built this way
- E.g.: one request on Amazon’s website touches tens of services, each with thousands of machines (e.g., pricing service, product rating service, inventory service, shopping cart service, user preferences service, etc.)
 
Challenges
 
Achieving location transparency, scalability, availability, and modularity in distributed systems is really hard!
System design challenges
- What is the right interface or abstraction?
Achieving scalability is challenging
- How to partition functions for scalability?
Consistency challenges
- How do machines coordinate to achieve the task?
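 
As one possible answer to the partitioning question, data can be spread across machines by hashing each key. The sketch below shows plain hash-modulo partitioning; it is an illustrative choice, not a scheme prescribed by the lecture, and `partition_for`, `assign`, and the sample keys are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a string key to a partition index in [0, num_partitions)."""
    # Use a stable hash: Python's built-in hash() of strings varies per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def assign(keys, num_partitions):
    """Group keys by the partition that is responsible for them."""
    placement = {p: [] for p in range(num_partitions)}
    for key in keys:
        placement[partition_for(key, num_partitions)].append(key)
    return placement


keys = ["user:%d" % i for i in range(1000)]
placement = assign(keys, 4)
for partition, members in placement.items():
    print(partition, len(members))
```

Every machine can compute the owner of a key locally, with no coordination. The drawback is that changing the number of partitions moves most keys, which is one reason systems such as Dynamo use consistent hashing instead.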
 
Challenges (Continued)
 
Security challenges
- How to authenticate clients or servers?
- How to defend against misbehaving servers?
Fault tolerance challenges
- How to keep system available despite machine or network failures?
Implementation challenges
- How to maximize concurrency?
- What’s the bottleneck?
- How to reduce load on the bottleneck resource?
 
A word of warning
 
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
 
--Leslie Lamport
 
Distributed Systems
 
Lecture 1 – Introduction
 
Q&A!

Availability levels and the downtime they allow

Availability              Downtime per year   Downtime per month   Downtime per day
90% ("one nine")          36.5 days           72 hours             2.4 hours
99% ("two nines")         3.65 days           7.20 hours           14.4 minutes
99.9% ("three nines")     8.76 hours          43.8 minutes         1.44 minutes
99.99% ("four nines")     52.56 minutes       4.38 minutes         8.64 seconds
99.999% ("five nines")    5.26 minutes        25.9 seconds         864 milliseconds
