Web Caching, Proxies, and CDNs in Web Architecture

 
Web Caching, Proxies and CDNs
 
COMP3227 Web Architecture & Hypertext Technologies
 
Dr Heather Packer – hp3@ecs.soton.ac.uk
 
Caching
 
Caching stores the result of an operation so that future operations return faster
Caching is worthwhile when:
Computation is slow
Computation will run multiple times
The output is the same for a particular input
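The three conditions above can be illustrated with a small memoisation sketch. It is only an illustration: `slow_square` and `call_count` are hypothetical names, and the standard library's `functools.lru_cache` stands in for whatever cache a real system would use.

```python
from functools import lru_cache

call_count = 0  # counts how often the "expensive" computation really runs


@lru_cache(maxsize=128)
def slow_square(n: int) -> int:
    """Stand-in for a slow computation whose output depends only on its input."""
    global call_count
    call_count += 1
    return n * n


first = slow_square(12)   # computed, then stored in the cache
second = slow_square(12)  # same input, so the stored result is returned
```

After both calls the function body has run only once; the second call is answered from the cache.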
 
 
 
 
Web Caching
 
The temporary storage (caching) of frequently accessed data for rapid access
Typically caches store “static assets”
HTML pages
Images
Stylesheets, JavaScript
Caches can be located at various points in a network
Reduces access time/latency for clients
Reduces bandwidth usage across slower links
Reduces load on a server
 
Browser Cache
 
Browsers maintain a small cache
Stored locally
Cache for a single user or application
Browser sets a caching policy, deciding what data to cache
User-specific content
Expensive content
 
[Diagram: the local browser, with its browser cache, requests a resource from the origin server]
 
What can be Cached?
 
Cache friendly
Logos and brand images
Style sheets
JavaScript files (site and library)
Fonts
Downloadable content
Media files
 
Be careful caching:
Data
HTML pages
Frequently modified JavaScript and CSS
Content requested with authentication cookies
 
Never cache
Sensitive data
User-specific data that frequently changes
 
Caching using Conditional requests
 
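Last-Modified: Tue, 17 Nov 2020 08:00:20 GMT
The server sends the time the resource last changed
GET request: If-Modified-Since: (the Last-Modified value)
HTTP status code: 304 Not Modified if the resource has not changed since then
ETag: "0123456789ABCDEF…"
The server sends an identifier for this version of the resource
GET request: If-None-Match: (the ETag value)
HTTP status code: 304 Not Modified if the ETag still matches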
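A minimal sketch of how a server might decide between 200 and 304 for a conditional GET. The function name `conditional_status` and the header dictionary are assumptions made for illustration. The ETag comparison is simplified: it uses exact string matching and ignores weak validators. Dates are assumed to be in the standard HTTP format with a comma after the weekday.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it takes precedence over the date check.
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in client_tags or "*" in client_tags else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        resource_time = parsedate_to_datetime(last_modified)
        cached_time = parsedate_to_datetime(if_modified_since)
        return 304 if resource_time <= cached_time else 200

    return 200  # unconditional request: send the full response
```

A 304 response carries no body, so the client reuses its stored copy and only the headers cross the network.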
 
Controlling Caches with HTTP: Last-Modified Header
 
GET

GET / HTTP/1.1
Host: comp3220.ecs.soton.ac.uk
Accept: */*

HTTP/1.1 200 OK
Date: Sat, 18 Nov 2023 17:43:20 GMT
Connection: keep-alive
Content-Type: text/html; charset=UTF-8
Content-Length: 4003
Last-Modified: Fri, 17 Nov 2023 08:00:20 GMT

Conditional GET

GET / HTTP/1.1
Host: comp3220.ecs.soton.ac.uk
Accept: */*
If-Modified-Since: Fri, 17 Nov 2023 08:00:20 GMT

HTTP/1.1 304 Not Modified
Date: Sun, 19 Nov 2023 07:55:10 GMT
Connection: keep-alive
Last-Modified: Fri, 17 Nov 2023 08:00:20 GMT
 
Caching Headers
 
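HTTP response header Cache-Control:
Flags (directives)
no-store: the response must not be stored by any cache
no-cache: the response may be stored, but must be revalidated with the origin server before each reuse
max-age=(seconds): the response is fresh for this many seconds
must-revalidate: once stale, the response must not be reused without successful revalidation
public: the response may be stored by shared caches
private: the response may only be stored by a single user's cache (e.g. the browser)
Use of Cache-Control headers is determined by the website architect/designer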
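As a rough sketch, a cache could split a Cache-Control value into its directives and then decide whether it is allowed to keep a copy. The helper names `parse_cache_control` and `may_store` are hypothetical, and only the no-store and private rules are modelled here.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name] = arg.strip('"') if arg else None
    return directives


def may_store(directives: dict, shared_cache: bool) -> bool:
    """Decide whether a cache is allowed to keep a copy of the response."""
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    return True
```

For example, a proxy cache (a shared cache) would refuse to store a response marked `private`, while a browser cache could keep it.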
 
HTTP with Cache-Control Header
 
First GET

GET / HTTP/1.1
Host: comp3220.ecs.soton.ac.uk
Accept: */*

HTTP/1.1 200 OK
Date: Sat, 18 Nov 2023 17:43:20 GMT
Connection: keep-alive
Content-Type: text/html; charset=UTF-8
Content-Length: 4003
Last-Modified: Fri, 17 Nov 2023 08:00:20 GMT
Cache-Control: max-age=86400

Second GET (within 86400 seconds, i.e. 24 hours)

* No request sent *
The stored response is still fresh, so it is served from the cache
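The "no request sent" behaviour comes from a freshness check against max-age. A minimal sketch follows, assuming a hypothetical `CachedResponse` record and times given in seconds. Real caches also account for time spent in transit and for other headers, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    body: bytes
    stored_at: float  # time the response was stored, in seconds
    max_age: int      # value of Cache-Control: max-age, in seconds


def is_fresh(entry: CachedResponse, now: float) -> bool:
    """A stored response is fresh while its age is below max-age."""
    return (now - entry.stored_at) < entry.max_age


# Response stored at time 0 with Cache-Control: max-age=86400
entry = CachedResponse(body=b"<html>...</html>", stored_at=0.0, max_age=86400)
```

While `is_fresh` returns true the browser can answer from its cache; once it returns false the browser would send a request, typically a conditional GET.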
 
Different Web Caching Solutions
 
Caches can be located at various points in a network
Browser Cache
Embedded in the browser
Proxy Cache
Reverse Proxy Cache
 
Proxy Cache
 
Cache located close to the clients (hosted by University or Internet Service Provider)
Decreases bandwidth usage
Decreases network latency
Scale provides the main advantage: many users within the ISP may all be asking for the same web pages
ISPs use this approach to decrease bandwidth usage across their networks
 
Reverse Proxy Cache
 
 
Cache proxy located closer to the origin web server
Usually deployed by a Web host
Decreases load on the Web service (e.g. database)
Several reverse proxy caches implemented together can form a Content Delivery Network
 
Web Proxies
 
Web Proxy Architecture
 
A web proxy is a network service
It receives web requests from clients and makes requests on their behalf to web servers
A web proxy's behaviour differs depending on its function
 
No Proxy
 
Proxy Types
 
Forward Proxy
 
Open Proxy
 
Reverse Proxy
 
Forward Proxy – Content Filtering
 
Restricts requests
Blocks blacklisted URLs
Response can be scanned
Unwanted content
Malware
 
Forward Proxy – Content Translation
 
Transform responses
Reduce bandwidth usage by client
Commonly used by mobile phone networks
Recompress images at lower resolution
Minifying HTML, CSS and JavaScript
Inject content into web pages e.g. adverts
 
Open Forward Proxy – Access Services Anonymously
 
Client accessing website via proxy:
Masks their IP address
Modifies their location (via GeoIP)
Improve anonymity
Defeat geo-blocking
No filtering, encryption, or checks over content
 
Reverse Proxy – Load-balancing
 
Distributes incoming web requests to a pool of web servers (targets)
Performance and availability
Many strategies including: round robin, weighted round robin
Health checks ensure resilience
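The two strategies named above can be sketched in a few lines. This is an illustrative sketch only: the server names are made up, weights are assumed to be small positive integers, and health checks are left out.

```python
from itertools import cycle


def round_robin(targets: list):
    """Yield targets in a fixed repeating order."""
    return cycle(targets)


def weighted_round_robin(weights: dict):
    """Yield each target in proportion to its integer weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


rr = round_robin(["web1", "web2", "web3"])
first_four = [next(rr) for _ in range(4)]

wrr = weighted_round_robin({"big": 2, "small": 1})
first_six = [next(wrr) for _ in range(6)]
```

Here `first_four` wraps back to `web1` after visiting all three servers, and in `first_six` the `big` server receives twice as many requests as `small`.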
 
Reverse Proxy – Content Switching
[Diagram: proxy P directs requests to the static, html and api web servers]
 
Examine incoming request and direct traffic to specific web servers
Can use:
Host or path based e.g. https://example.com/static -> static webserver
IP address, cookie or user-agent
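Path-based switching can be sketched as a prefix lookup. The routing table and backend names below are hypothetical, chosen to mirror the static / html / api split on this slide. Host, cookie or user-agent based rules would follow the same pattern with a different key.

```python
from urllib.parse import urlsplit

# Hypothetical routing table: path prefix -> backend pool name.
ROUTES = [
    ("/static", "static-servers"),
    ("/api", "api-servers"),
]
DEFAULT_BACKEND = "html-servers"


def choose_backend(url: str) -> str:
    """Pick a backend by matching the request path against known prefixes."""
    path = urlsplit(url).path
    for prefix, backend in ROUTES:
        # Match the prefix itself or anything below it, but not "/statically".
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return DEFAULT_BACKEND
```

With this table, a request for https://example.com/static/logo.png would go to the static pool and https://example.com/index.html to the default html pool.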
 
Reverse Proxy – Protocol Translation
 
HTTPS: SSL/TLS off-loading (OSI Layer 6)
Incoming web requests over HTTPS, internal communications via HTTP
HTTP/2 to HTTP/1.x (OSI Layer 7)
IPv6 <-> IPv4 (OSI Layer 3)
 
Reverse Proxy – Monitoring and filtering
 
Monitor incoming requests
Access logs / statistics
Filter incoming requests
Check security credentials (e.g. valid API tokens)
Rate limit requests
Intrusion detection and handling DDoS attacks
Applying security policies (WAF)
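As an illustration of the "rate limit requests" bullet, here is a sketch of a token bucket, one common rate-limiting technique. The slides do not name a specific algorithm, so this choice is an assumption. The class name and parameters are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=2, rate=1.0)
# Three requests arrive at t=0, one more at t=1 second.
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)]
```

The third request at t=0 is rejected because the burst allowance is used up; by t=1 one token has been refilled, so the fourth request passes.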
 
Content Delivery Networks
 
Motivation Scenario
 
Stream video content to 100,000+ simultaneous users
You could use a single large “mega-server”
Single point of failure
Point of network congestion
Long path to distant clients
Multiple copies of video sent over outgoing link
This solution doesn’t work in practice
 
Content Delivery Network
 
[Diagram: users connect to edge servers, which connect to the origin server]
 
A CDN is a geographically distributed network of proxy servers (edge nodes)
Hosts static content (such as images, videos, CSS and JS)
Data travels to the user via the shortest path (reduced latency)
 
[Figure: A map of global fiber backbone networks and Internet exchange points]
 
CDN
 
Uses of CDNs
Video streaming
Software downloads
Web and mobile content acceleration
Payment services
E-commerce
News
 
Services that use CDNs
Netflix
Amazon
Reddit
Twitch
gov.uk
PayPal
Shopify
BBC.com
Vimeo
YouTube
Hulu
Wikipedia
CNN
New York Times
The Guardian
Stack Overflow
GitHub
Stripe
Quora
 
Commercial CDNs
Limelight Networks
Level 3 Communications
Akamai Technologies
Amazon CloudFront
Cloudflare
 
Motivational Scenario
 
Streaming video to 100,000+ simultaneous users
Working Web solution: store/serve many copies of video at multiple geographically distributed sites (CDN)
Two strategies:
1. Push CDN servers deep into many access networks
Close to users
Placed near ISP
Used by Akamai, 1700 locations
Better latency and better network performance
Harder to maintain because there are many more servers in the CDN
2. Place larger clusters at key points in the network near internet exchanges
Internet exchanges are where network providers connect their networks to each other
Dedicated high speed private networks are used to connect the clusters together
Used by Limelight
Easier to manage but with higher latency and lower performance for the end user
 
CDN: Simple content access scenario
 
A CDN has to be able to tell clients where to find resources
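A client requests a file using one URL but retrieves it from another
Requested: http://video.netcinema.com/6Y7B23V
Retrieved from: http://KingCDN.com/NetC6Y7B23V
1. Contacts www.netcinema.com and receives a link to a video: http://video.netcinema.com/6Y7B23V
2. Resolves the domain name video.netcinema.com via local DNS
3. NetCinema's DNS returns cdn1.KingCDN.com
4. Resolves the domain name cdn1.KingCDN.com
5. Returns the IP of cdn1.KingCDN.com
6. Requests the URL http://cdn1.KingCDN.com/NetC6Y7B23V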
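The redirection in this scenario can be sketched as a lookup that follows a name-to-name redirect before reaching an address. Everything below is illustrative. It assumes the redirect is expressed as a CNAME-style record, the record tables are hypothetical, and the IP address is taken from a range reserved for documentation.

```python
# Hypothetical DNS data: CNAME records redirect a name, A records give an IP.
CNAME_RECORDS = {"video.netcinema.com": "cdn1.kingcdn.com"}
A_RECORDS = {"cdn1.kingcdn.com": "203.0.113.10"}  # documentation-range address


def resolve(hostname: str, max_redirects: int = 5) -> tuple:
    """Follow CNAME records until an A record is found; return (name, ip)."""
    name = hostname.lower()
    for _ in range(max_redirects):
        if name in A_RECORDS:
            return name, A_RECORDS[name]
        if name not in CNAME_RECORDS:
            raise LookupError(f"no record for {name}")
        name = CNAME_RECORDS[name]
    raise LookupError("too many redirects")
```

Resolving video.netcinema.com in this sketch ends at the CDN's host name and address, which is why the client fetches the video from KingCDN even though the link pointed at NetCinema.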
 
CDN Cluster Selection Strategy
 
The CDN’s DNS decides which edge server to use
Pick CDN node geographically closest to client
Pick CDN node with shortest delay (min hops) to client (CDN nodes periodically ping access ISPs, report results to CDN DNS)
Or let the client decide: give the client a list of several CDN servers
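The first two strategies can be sketched as simple minimisation problems. The node names, coordinates and round-trip times below are invented for illustration, and great-circle distance (the haversine formula) is used as a stand-in for "geographically closest".

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def pick_closest(client: tuple, nodes: dict) -> str:
    """Strategy 1: choose the node geographically closest to the client."""
    return min(nodes, key=lambda name: haversine_km(client, nodes[name]))


def pick_by_delay(measured_rtt_ms: dict) -> str:
    """Strategy 2: choose the node with the lowest measured delay."""
    return min(measured_rtt_ms, key=measured_rtt_ms.get)


# Hypothetical edge nodes and measurements for a client near Southampton.
nodes = {"london": (51.5, -0.1), "frankfurt": (50.1, 8.7), "new-york": (40.7, -74.0)}
rtt_ms = {"london": 18.0, "frankfurt": 12.5, "new-york": 80.2}

closest = pick_closest((50.9, -1.4), nodes)
fastest = pick_by_delay(rtt_ms)
```

The two strategies can disagree: in this made-up data the nearest node is London, but Frankfurt has the lower measured delay.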
 
Case Study: Netflix’s First Approach
 
Owned very little infrastructure, used 3rd-party services
Own registration, payment servers
Amazon (3rd-party) cloud services
Netflix uploads studio master to Amazon cloud
Creates multiple versions of movie (different encodings) in cloud
Uploads versions from cloud to CDNs
Three 3rd-party CDNs host/stream Netflix content: Akamai, Limelight, Level-3
 
Case Study: Netflix
 
DASH - Dynamic Adaptive Streaming over HTTP
 
Server
Divides video files into multiple chunks
Each chunk is stored encoded at different bit rates
Manifest file: provides URLs for different chunks
Client
Periodically measures server-to-client bandwidth
Consulting manifest, requests one chunk at a time
Chooses maximum coding rate sustainable given current bandwidth
Can choose different coding rates at different points in time (depending on available bandwidth at the time)
The intelligence sits at the client level, to make sure that there is no buffer starvation or overflow
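The client-side choice described above can be sketched as picking the highest encoding rate that fits the measured bandwidth. The manifest layout, URLs and bit rates here are hypothetical. Real DASH manifests are XML documents, and real players also take buffer occupancy into account.

```python
def choose_bitrate(available_kbps: list, measured_kbps: float) -> int:
    """Pick the highest encoding rate that fits within the measured bandwidth."""
    sustainable = [rate for rate in available_kbps if rate <= measured_kbps]
    # Fall back to the lowest rate if even that exceeds the measurement.
    return max(sustainable) if sustainable else min(available_kbps)


# Hypothetical manifest: encoding rate (kbit/s) -> URL template for chunks.
manifest = {
    500: "video/500k/chunk{n}.mp4",
    1500: "video/1500k/chunk{n}.mp4",
    4000: "video/4000k/chunk{n}.mp4",
}


def next_chunk_url(chunk_index: int, measured_kbps: float) -> str:
    """Build the URL of the next chunk at the rate the client can sustain."""
    rate = choose_bitrate(list(manifest), measured_kbps)
    return manifest[rate].format(n=chunk_index)
```

Because the choice is repeated for every chunk, the quality can step up or down during playback as the measured bandwidth changes.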
 
 
 
MPEG-DASH Adoption
 
MPEG-DASH is an independent, open and international standard with broad industry support
HTML5 Media Source Extensions and HbbTV are MPEG-DASH enabled
Heavy plugins like Silverlight and Flash perform poorly and cause security issues
 
Chrome dropped Silverlight support in 2015
Firefox dropped Silverlight support in 2017
This was a problem for the majority of premium video providers
Video providers delivered their streams via Smooth Streaming and PlayReady DRM, which required Silverlight
These providers switched to using HTML5 with MPEG-DASH and MPEG-CENC based DRM
Less than 0.02% of sites used Silverlight in 2023
 
 
Netflix OpenConnect
 
Netflix wanted the absolute best streaming they could get, while lowering cost
Highly optimised for delivering large files; still uses Akamai for small assets
Data centers around the world
There may be a data center with a couple of racks that contain the entire Netflix library
Others might only have 80% of the most popular content
Unpopular material will have to travel further
Client Intelligence
Calculates best edge server to use (based on bit rate and closeness)
Selects an edge server based on the required bit rate and latency
Continually probes for the best way of receiving content
 
Next: Web Browsers
Uploaded on Oct 05, 2024

