HA Importance for PayPal: Ensuring Cloud Resilience

Explore why high availability (HA) is crucial for PayPal's payment solutions: cloud users must see no perceived downtime, and the cloud must scale seamlessly across data centers and infrastructure racks. Learn about the key requirements and strategies PayPal implemented to keep its OpenStack cloud reliable and performant.

Presentation Transcript


  1. OPENSTACK HA @PAYPAL: OpenStack Summit Hong Kong, 2013

  2. ABOUT PAYPAL
     PayPal offers flexible and innovative payment solutions for consumers and merchants of all sizes.
     • 137,000,000 users
     • $300,000 in payments processed each minute
     • 193 markets / 26 currencies
     • The world's most widely used digital wallet

  3. AGENDA
     • Why is HA important for PayPal?
     • Our learnings
     • Our solution
     • What is not solved?
     • Q&A

  4. WHY IS HA IMPORTANT?
     • No perceived downtime for cloud users
     • Enterprise class
     • Auto scaling and flex up/down can never break
     • API integrations always succeed
     • Everyone is expected to use the cloud

  5. AVAILABILITY REQUIREMENTS
     • No single point of failure (SPOF) under the cloud
     • Scale across the data center(s)
     • Scale across racks and containers
     • Respect natural availability zones within the data centers
     • No cloud can impact any other cloud

  6. INFRASTRUCTURE RACK
     • Layer 2 versus Layer 3
     • Cattle & Puppies
     [Figure: infrastructure/controller racks and compute racks behind an active/passive access load-balancer pair, with 1g management links and active/passive 10g links.]

  7. INFRASTRUCTURE RACK
     • OpenStack services all run as VMs on KVM
     • Every infrastructure component resides on 2+ nodes
     • Redundant physical racks
     • Redundant power and switches in each rack
     • Layer-3 connectivity between racks (no Layer 2)
     • Enterprise-grade physical LB (floating VIP)
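Slide 7's rules (every infrastructure component on 2+ nodes, redundant physical racks) amount to rack anti-affinity: no two replicas of a service share a rack, so losing any one rack never takes a service to zero. The Python below is a minimal sketch of that idea, not PayPal's tooling; the rack and host names, the `place_replicas` and `single_rack_failures` helpers, and the two-replica default are all hypothetical.

```python
from itertools import cycle

# Hypothetical inventory: rack name -> infrastructure hosts in that rack.
RACKS = {
    "infra-rack-a": ["infra-a1", "infra-a2"],
    "infra-rack-b": ["infra-b1", "infra-b2"],
}


def place_replicas(services, racks, replicas=2):
    """Give each service `replicas` hosts, each in a different rack (rack anti-affinity)."""
    if replicas > len(racks):
        raise ValueError("need at least as many racks as replicas")
    rack_names = sorted(racks)
    next_host = {r: cycle(racks[r]) for r in rack_names}  # round-robin hosts within a rack
    placement = {}
    for i, service in enumerate(services):
        # Rotate the starting rack per service so load spreads evenly across racks.
        chosen = [rack_names[(i + k) % len(rack_names)] for k in range(replicas)]
        placement[service] = [(rack, next(next_host[rack])) for rack in chosen]
    return placement


def single_rack_failures(placement):
    """Return racks whose loss would leave at least one service with no replicas."""
    racks_per_service = [{rack for rack, _ in hosts} for hosts in placement.values()]
    return sorted({next(iter(rs)) for rs in racks_per_service if len(rs) == 1})


plan = place_replicas(["keystone", "nova-api", "glance", "mysql"], RACKS)
print(plan["keystone"])            # [('infra-rack-a', 'infra-a1'), ('infra-rack-b', 'infra-b1')]
print(single_rack_failures(plan))  # []
```

A service whose replicas all land in one rack is exactly the SPOF that slide 5 rules out; `single_rack_failures` flags that rack.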

  8. COMPUTE
     [Figure, panels 1 to 3: compute nodes behind active/passive access load-balancer pairs, each with 1g management links and active/passive 10g links. Compute node spec: 96 Hyperscale, 16 cores, 256 GB RAM, 1.1 TB disk.]

  9. COMPUTE
     [Figure: compute nodes with 10g links to active and passive top-of-rack switches, bond0 interfaces, Hyperscale RAID-10 storage, and 1g management links.]

  10. OPENSTACK SERVICES
      [Diagram: service and port layout. Recoverable components and TCP ports: three OpenStack controllers behind F5 load balancers running keystone (5000, 35357), glance (9292, 9191), nova API services (8773, 8774, 8776), quantum-api (9696) and the httpd dashboard (80, 443); swift-proxy (8080) with swift storage nodes (object 6000, container 6001, account 6002); three Nicira NVP controllers (OpenFlow 6633, management 6632) with NVP service nodes and gateways; F5 management (22, 80, 443, 161/TCP; 161/UDP); UDNS DNSaaS (53, 10053); LBaaS (80); Puppet (8140) and PuppetDB; Remedy API; MySQL, MongoDB and the message queue; compute nodes running Open vSwitch (ovs-vswitchd, ovsdb-server).]

  11. OPENSTACK CONSIDERATIONS
      • An LB VIP for every service (unless the service can't work behind one)
      • Connect to the LB VIP, not to individual nodes
      • Script to close server connections
      • Pacemaker only works inside a single Layer-2 domain (which does not fit a large enterprise network)
      • Auto-restart using Monit
      • MySQL
      • Swift cluster
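Slide 11's rule to connect to the LB VIP rather than individual nodes means clients hold one stable address while the load balancer decides which healthy node serves each new connection. The sketch below models that in Python; the `VirtualIP` class, the node names and the round-robin policy are assumptions for illustration (PayPal used enterprise F5 hardware, whose behavior this does not reproduce). It only routes new connections and does not model the slide's script for closing existing server connections.

```python
from itertools import cycle


class VirtualIP:
    """Toy model of a load-balancer VIP in front of a pool of service nodes.

    Clients only know the VIP address; each new connection is forwarded to the
    next healthy node in round-robin order.
    """

    def __init__(self, address, nodes):
        self.address = address
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._order = cycle(self.nodes)

    def mark_down(self, node):
        # A real LB would do this automatically from a periodic health monitor.
        self.healthy.discard(node)

    def mark_up(self, node):
        self.healthy.add(node)

    def route(self):
        """Pick the backend for one new connection, skipping unhealthy nodes."""
        for _ in range(len(self.nodes)):
            node = next(self._order)
            if node in self.healthy:
                return node
        raise ConnectionError(f"no healthy backend behind {self.address}")


vip = VirtualIP("keystone-vip:5000", ["ctrl-1", "ctrl-2", "ctrl-3"])
print([vip.route() for _ in range(3)])  # ['ctrl-1', 'ctrl-2', 'ctrl-3']
vip.mark_down("ctrl-2")
print([vip.route() for _ in range(4)])  # ['ctrl-1', 'ctrl-3', 'ctrl-1', 'ctrl-3']
```

Taking `ctrl-2` out of service changes nothing on the client side: the client still dials the same VIP.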

  12. OPENSTACK CONSIDERATIONS (CONTINUED)
      • Heat with Corosync/Pacemaker/keepalived (for now)
      • Keystone / Nova / Glance / Swift proxy
      • RabbitMQ cluster
      • Cinder volume service
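Slide 12 lists keepalived alongside Corosync/Pacemaker for Heat. keepalived implements VRRP, in which the live peer with the highest priority holds the floating VIP. The Python below sketches only that ownership decision; `Peer`, `elect_vip_owner`, the host names and the priorities are hypothetical, and real VRRP adds timed advertisements and breaks priority ties on the higher primary IP address rather than on name.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    priority: int       # VRRP-style priority: higher wins
    alive: bool = True  # False once the peer stops sending advertisements


def elect_vip_owner(peers):
    """Return the name of the live peer that should hold the floating VIP, or None."""
    live = [p for p in peers if p.alive]
    if not live:
        return None
    # Ties are broken by name here; real VRRP compares primary IP addresses.
    return max(live, key=lambda p: (p.priority, p.name)).name


peers = [Peer("heat-1", 150), Peer("heat-2", 100)]
print(elect_vip_owner(peers))  # heat-1
peers[0].alive = False
print(elect_vip_owner(peers))  # heat-2 takes over the VIP
```

Because the owner is recomputed from the current state, a recovered higher-priority peer takes the VIP back, which matches VRRP's default preemption behavior.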

  13. CINDER SERVICES WORKFLOW
      The figure shows a typical interaction between Cinder components to serve an end-user request (creating a new volume in this example): user request (create volume) → Cinder API → AMQP → Cinder Scheduler → AMQP → Cinder Volume → storage back-end 1 or 2.
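The create-volume path on slide 13 is the standard Cinder flow: the API casts the request over AMQP to the scheduler, the scheduler chooses a back-end and casts to the volume service, and the volume service drives that storage back-end. The single-process Python sketch below walks one request along that path, assuming in-memory deques in place of AMQP and a plain "most free capacity" rule in place of Cinder's filter scheduler; the back-end names, sizes and the `create_volume` helper are hypothetical.

```python
from collections import deque

# Hypothetical storage back-ends: name -> free capacity in GB.
BACKENDS = {"backend1": 500, "backend2": 800}


def create_volume(size_gb, backends):
    """Walk one request through API -> queue -> scheduler -> queue -> volume service."""
    scheduler_queue, volume_queue = deque(), deque()  # stand-ins for AMQP queues
    trace = []

    # Cinder API: accept the request and cast it to the scheduler.
    trace.append("api")
    scheduler_queue.append({"size": size_gb})

    # Cinder Scheduler: keep back-ends with room, then pick the one with the most free space.
    msg = scheduler_queue.popleft()
    trace.append("scheduler")
    fits = {name: free for name, free in backends.items() if free >= msg["size"]}
    if not fits:
        raise RuntimeError("no back-end has enough free capacity")
    msg["backend"] = max(fits, key=fits.get)
    volume_queue.append(msg)

    # Cinder Volume: ask the chosen back-end to allocate the volume.
    msg = volume_queue.popleft()
    trace.append("volume")
    backends[msg["backend"]] -= msg["size"]
    return msg["backend"], trace


pool = dict(BACKENDS)
print(create_volume(700, pool))  # ('backend2', ['api', 'scheduler', 'volume'])
print(create_volume(200, pool))  # ('backend1', ['api', 'scheduler', 'volume'])
```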

  14. CINDER SERVICES WITH HA
      How HA is implemented for the Cinder components (A/A = active/active, A/P = active/passive):
      • API (stateless): load balancer (A/A or A/P)
      • Scheduler (stateless): Pacemaker, queue itself (A/A or A/P)
      • Volume: Pacemaker, queue itself (A/A or A/P)
      [Figure: a load balancer in front of Cinder API A and B; an AMQP cluster connecting the APIs to Cinder Scheduler A and B and Cinder Volume A and B; storage back-ends 1 and 2.]
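Slide 14 relies on "the queue itself" for the scheduler and volume services: several instances consume from the same AMQP queue, so losing one leaves the others draining it. The deterministic Python sketch below hands messages round-robin to the live consumers of one shared queue, similar in spirit to RabbitMQ's default dispatch among a queue's consumers; the `dispatch` helper, the consumer names and the `fail_after` crash simulation are hypothetical, and acknowledgements are ignored (a real broker requeues unacknowledged messages from a consumer that dies).

```python
from collections import deque


def dispatch(messages, consumers, fail_after=None):
    """Deliver messages from one shared queue to live consumers in round-robin order.

    `fail_after` maps a consumer name to how many messages it handles before it
    crashes, to simulate losing one instance (e.g. Cinder Scheduler A).
    """
    queue = deque(messages)
    alive = list(consumers)
    handled = {c: [] for c in consumers}
    fail_after = fail_after or {}
    idx = 0
    while queue:
        if not alive:
            raise RuntimeError("no consumers left; remaining messages stay queued")
        idx %= len(alive)
        consumer = alive[idx]
        handled[consumer].append(queue.popleft())
        if len(handled[consumer]) == fail_after.get(consumer):
            alive.pop(idx)  # crashed: the next consumer slides into this slot
        else:
            idx += 1
    return handled


jobs = ["vol-1", "vol-2", "vol-3", "vol-4", "vol-5"]
print(dispatch(jobs, ["sched-A", "sched-B"]))
# {'sched-A': ['vol-1', 'vol-3', 'vol-5'], 'sched-B': ['vol-2', 'vol-4']}
print(dispatch(jobs, ["sched-A", "sched-B"], fail_after={"sched-A": 1}))
# {'sched-A': ['vol-1'], 'sched-B': ['vol-2', 'vol-3', 'vol-4', 'vol-5']}
```

The stateless API tier gets the same effect from the load balancer instead of the queue.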

  15. UNRESOLVED
      • VIP-friendly Cinder volume service
      • Seamless upgrade
      • Flip failed DB transactions
      • Reconciliation
      • Consistent API response time

  16. cloud@paypal.com
      Confidential and Proprietary

  17. THANK YOU
      • HTTP://GITHUB.COM/PAYPAL/AURORA
      • Scott Carlson (@relaxed137)
      • Raj Geda
      • Zhiteng Huang (IRC: winston-d)