Understanding OpenStack Availability Zones for Optimal Workload Availability

Delve into the concept, design, and implementation of Availability Zones (AZs) in OpenStack through a detailed discussion by Craig Anderson, Principal Cloud Architect at AT&T. Explore the benefits of AZs, strategies for organizing them, and the value proposition they offer for achieving high workload availability within a region. Gain insights into common failure modes, single points of failure, and considerations for maximizing the effectiveness of AZs in your cloud infrastructure.



Presentation Transcript


  1. Curse your bones, Availability Zones!

  2. Introduction/Background
     Craig Anderson, Principal Cloud Architect, AT&T
     The target audience is OpenStack providers that want to understand Availability Zones, how to get the most out of them, and the associated challenges.
     The first half of the talk covers the high-level Availability Zone concept and design.
     The second half covers the low-level details of how Availability Zones are implemented in three OpenStack projects.
     AZ(s) will be used as shorthand for Availability Zone(s).

  3. What is an AZ, and what benefit does it provide?
     It is a failure domain within a Region for tenant workloads.
     It provides the potential for tenants to achieve higher workload availability within a given Region.
     The OpenStack implementation of AZs is flexible - there are other ways in which they can be used.

  4. How do I organize my AZs?
     AZs are commonly organized around common modes of failure and single points of failure.
     Example - defining AZs by power source (diagram).

  5. How do I organize my AZs?
     Another example - racks with only one top-of-rack switch (a single point of failure). Make each rack its own AZ.
     However, tenants must specify AZs explicitly.
     And tenants can't interrogate free AZ capacity (because this is cloud).
     Too many AZs means capacity management headaches, and AZ guessing games for tenants.

  6. How do I organize my AZs?
     A second look at AZs by rack (diagram comparing the two layouts).

  7. What is the real value proposition for my AZs?
     But it's hard to pin down the value provided by your AZs for these kinds of infrequent failure scenarios:
     Data on failure modes is often sparse or unavailable.
     Also, you may not have clear single points of failure to define AZs against.
     It's not uncommon for data centers to be built with power diversity (A/B side) to the rack, redundant server PSUs, primary & backup TORs in each rack, etc.

  8. A better value proposition: Planned Maintenance
     Planned maintenance activities often account for more downtime than random hardware failures. Examples:
     Data center maintenance - moving & upgrading equipment, recabling, rebuilding a rack, HVAC & electrical work, etc.
     Disruptive software updates - for the kernel, QEMU, certain security patches, OpenStack and operating system upgrades, etc.
     The catch is that these maintenance processes need to be aware of your AZs, and need to be adapted to take advantage of them (see the sketch below).
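     For example, an AZ-aware drain of a compute host before rack work might look roughly like this (a minimal sketch - the host and server names are placeholders, and the live-migration flags vary by client version):

         # Take the AZ1 host out of scheduling before maintenance
         $ openstack compute service set --disable \
             --disable-reason "AZ1 rack maintenance" compute1 nova-compute

         # Move running VMs off the host; workloads pinned to other AZs are untouched
         $ openstack server migrate --live-migration <server-uuid>

         # Re-enable the host once maintenance is complete
         $ openstack compute service set --enable compute1 nova-compute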

  9. One more way to organize your AZs - By Physical Domain
     Defining AZs this time by physical boundaries (diagram).
     There is no AZ tiering, so you have to pick one AZ definition.
     Switching AZ models won't be fun (workload migration).

  10. Other uses for AZs
     Using AZs to distinguish cloud offerings - e.g., VMware AZ1 and KVM AZ1.
     Instead you can use hypervisor_type image metadata, plus private flavors & volume types.
     Special tenant(s) with their own private AZ(s) - a security concern.
     Better to use the multi-tenancy isolation filter and private flavors. AZs don't provide security; they are usable and visible to everyone.
     One AZ per compute host (targeted workload stack per node).
     Use affinity / anti-affinity instead: sameHost / differentHost filters (see the sketch below).
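     A rough sketch of the alternatives mentioned above (the image, group, and server names are hypothetical):

         # Steer instances to a hypervisor type with image metadata instead of an AZ
         $ openstack image set --property hypervisor_type=qemu ubuntu-20.04

         # Keep instances apart without one-AZ-per-host: server group anti-affinity
         $ openstack server group create --policy anti-affinity db-group
         $ openstack server create --hint group=<server-group-uuid> ... db-01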

  11. AZ Implementations in OpenStack

  12. The implementation of AZs varies between OpenStack projects
     Most OpenStack projects have some concept of AZs.
     However, the implementation of AZs varies from project to project.
     Therefore, planning your AZs requires a case-by-case look at the OpenStack projects you will be using.
     We will look at the AZ implementation in detail for just three projects - Nova, Cinder, and Neutron.

  13. Nova AZs - Basics
     nova-compute agents are (indirectly) mapped to AZs.
     Ex - VM-1 is scheduled to AZ2:

         $ openstack server create ... --availability-zone AZ2 VM-1
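     To sanity-check the result, you can list the compute AZs that tenants see and confirm where the VM landed (a quick sketch using the unified client; output omitted):

         $ openstack availability zone list --compute
         $ openstack server show VM-1 -c OS-EXT-AZ:availability_zone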

  14. Nova AZs are created as host aggregates
     Nova AZs are just host aggregates with availability_zone metadata set:

         $ nova aggregate-create HA1
         +----+------+-------------------+-------+----------+
         | Id | Name | Availability Zone | Hosts | Metadata |
         +----+------+-------------------+-------+----------+
         | 1  | HA1  | -                 |       |          |
         +----+------+-------------------+-------+----------+

         $ nova aggregate-set-metadata HA1 availability_zone=AZ1
         +----+------+-------------------+-------+-------------------------+
         | Id | Name | Availability Zone | Hosts | Metadata                |
         +----+------+-------------------+-------+-------------------------+
         | 1  | HA1  | AZ1               |       | 'availability_zone=AZ1' |
         +----+------+-------------------+-------+-------------------------+
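     The same flow with the unified openstack client, including adding hosts to the aggregate (a sketch - the host names are placeholders):

         $ openstack aggregate create --zone AZ1 HA1
         $ openstack aggregate add host HA1 compute1
         $ openstack aggregate add host HA1 compute2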

  15. Exceptions: Non-aggregate Nova AZs
     The AZ that computes live in when not in a host-aggregate AZ. Set to any value not matching your other AZs. (Note that this is a global setting used by nova-api, not set per-compute.)

         # nova.conf used by nova-api
         [DEFAULT]
         default_availability_zone = awaiting_AZ_assignment

     The AZ where all other Nova services live. It has no practical function, but should be set to a unique AZ.

         # nova.conf used by nova-api
         [DEFAULT]
         internal_service_availability_zone = internal

         +----+------------------+---------------+------------------------+
         | Id | Binary           | Host          | Zone                   |
         +----+------------------+---------------+------------------------+
         | 1  | nova-compute     | compute5      | awaiting_AZ_assignment |
         | 2  | nova-compute     | compute6      | awaiting_AZ_assignment |
         | 3  | nova-conductor   | control-host1 | internal               |
         | 4  | nova-consoleauth | control-host1 | internal               |
         | 5  | nova-scheduler   | control-host1 | internal               |
         +----+------------------+---------------+------------------------+

  16. Nova AZs - Other info
     Nova will not allow the same compute host to belong to more than one host aggregate with conflicting availability_zone metadata. In other words, no overlapping AZs.
     A default AZ can be configured (used to schedule VMs if the caller does not specify any AZ):

         # nova.conf used by nova-api
         [DEFAULT]
         # Ex - default scheduling to AZ1 (not advised)
         default_schedule_zone = AZ1

     You usually don't need or want to set this parameter - if the caller omits the AZ, let Nova schedule to any node with capacity.
     If a user requests an invalid availability zone, the API call will fail. There is no availability zone fallback option.

  17. Nova cross_az_attach
     cross_az_attach is an option that instructs the Nova API to prohibit volume attachments to VMs with non-matching AZs:

         # nova.conf used by nova-api
         [cinder]
         cross_az_attach = False

     Useful when you want more deterministic behavior with respect to performance and/or availability by enforcing per-AZ storage backends.
     Keep in mind that a setting of False:
     Should only be used when Cinder AZs match Nova AZs
     Can break some Boot from Volume API flows in Nova: https://bugs.launchpad.net/nova/+bug/1497253
     Is not currently supported when using Nova Cells v2: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls
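     With cross_az_attach = False, a tenant has to keep volume and VM AZs aligned, roughly like this (a sketch - the names and size are placeholders):

         # VM-1 lives in AZ2, so its data volume must also be requested in AZ2
         $ openstack volume create --size 100 --availability-zone AZ2 Vol-1
         $ openstack server add volume VM-1 Vol-1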

  18. Cinder AZs - Basics
     cinder-volume agents are mapped to AZs.
     Ex - Vol-1 is scheduled to the SSD backend in AZ2:

         $ openstack volume create ... --type SSD-LVM --availability-zone AZ2 Vol-1

  19. Cinder AZs are created via service configs
     Define Cinder AZs in the cinder.conf for each cinder-volume service:

         # cinder.conf used by cinder-volume
         [DEFAULT]
         # Set the AZ that this cinder-volume service should be registered with
         storage_availability_zone = AZ1

     Only one AZ can be specified (no overlapping AZs).
     Cinder service list for SSD backends:

         +---------------+------------------+------+---------+-------+
         | Binary        | Host             | Zone | Status  | State |
         +---------------+------------------+------+---------+-------+
         | cinder-volume | lvmhost1@SSD-LVM | AZ1  | enabled | up    |
         | cinder-volume | lvmhost2@SSD-LVM | AZ1  | enabled | up    |
         | cinder-volume | lvmhost3@SSD-LVM | AZ2  | enabled | up    |
         | cinder-volume | lvmhost4@SSD-LVM | AZ2  | enabled | up    |
         +---------------+------------------+------+---------+-------+

  20. Cinder - Multi-backend
     Starting in the Pike release, we are able to define AZs on a per-backend basis by setting backend_availability_zone for each backend. Continuing with the last LVM example:

         [SSD-LVM]
         backend_availability_zone = AZ1
         ...
         [HDD-LVM]
         backend_availability_zone = AZ1
         ...

     This can be useful in cases where the same cinder-volume service must manage remote backends in different AZs.
     However, it's better if you are able to deploy cinder-volume into each AZ, to keep cinder-volume services in the same failure domain as the storage backends they manage (see the fuller sketch below).
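     A fuller cinder.conf sketch of per-backend AZs (the volume groups and the AZ assignments here are illustrative assumptions, not part of the original example):

         # cinder.conf used by a single cinder-volume service
         [DEFAULT]
         enabled_backends = SSD-LVM,HDD-LVM

         [SSD-LVM]
         volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
         volume_backend_name = SSD-LVM
         volume_group = cinder-ssd
         backend_availability_zone = AZ1

         [HDD-LVM]
         volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
         volume_backend_name = HDD-LVM
         volume_group = cinder-hdd
         backend_availability_zone = AZ2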

  21. Cinder - Default AZ
     A default AZ for volume scheduling can be set in cinder-api (used if the caller doesn't specify an AZ). Use this when you have only one Cinder AZ (e.g., a shared Ceph cluster):

         # cinder.conf used by cinder-api
         [DEFAULT]
         # Fallback to storage_availability_zone if default is not defined
         default_availability_zone = Ceph     # This is a scheduling param, despite the name
         # Fallback to the hard-coded default nova AZ if neither is defined
         storage_availability_zone = Ceph     # Different function for cinder-api (scheduling)

     Unlike Nova, Cinder will always try to schedule to a statically defined AZ. It will not auto-select between AZs like Nova.
     Therefore, if you have more than one Cinder AZ, consider configuring an invalid AZ (so the API call fails instead of promoting lopsided volume allocations) - e.g.:

         default_availability_zone = AZ_MISSING_FROM_API_CALL

  22. Cinder - Distributed storage backends
     Cinder AZs will be driven by the storage backend(s) you use.
     Third-party storage appliances and distributed storage backends (like Ceph) have their own redundancy that exists apart from OpenStack AZs.
     Use the following option when you have only one Cinder AZ - e.g., Ceph:

         # cinder.conf used by cinder-api
         [DEFAULT]
         allow_availability_zone_fallback = True

     If the AZ requested is not defined in Cinder, the volume will be scheduled to the default_availability_zone instead.
     Useful when Nova, Heat, or other third-party client libraries attempt to create volumes in the same AZ as VMs - AZs that don't exist in Cinder (i.e., they assume consistent AZ naming between Nova and Cinder).
     (Diagram: VMs in Nova AZ1 and Nova AZ2 each request a volume in "their" AZ; with fallback enabled, both requests are satisfied from the single Ceph AZ in Cinder.)
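     Illustrative behavior with fallback enabled (a sketch - the volume name and the abbreviated output are assumptions):

         # Tenant asks for an AZ that exists in Nova but not in Cinder
         $ openstack volume create --size 50 --availability-zone AZ1 app-vol

         # The request succeeds and the volume lands in the default (Ceph) AZ
         $ openstack volume show app-vol -c availability_zone
         +-------------------+-------+
         | Field             | Value |
         +-------------------+-------+
         | availability_zone | Ceph  |
         +-------------------+-------+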

  23. Neutron - DHCP Agent AZs
     Neutron AZ support was added in Mitaka for the DHCP and L3 agents. HA was available before that, but it was not AZ-aware.
     dhcp-agent example:

         dhcp_agents_per_network = 2

     Ex - Net-1 is scheduled to both AZs:

         $ openstack network create --availability-zone-hint AZ1 \
             --availability-zone-hint AZ2 Net-1

     (Ensure that AZAwareWeightScheduler is used in neutron.conf)

  24. Neutron - L3 Agent AZs
     Same story for the L3 agent (deployed in HA mode) as for DHCP:

         min_l3_agents_per_router = 2
         l3_ha = True

     Ex - Router-1 is scheduled to both AZs:

         $ openstack router create --availability-zone-hint AZ1 \
             --ha --availability-zone-hint AZ2 Router-1

     (Ensure that AZLeastRoutersScheduler is used in neutron.conf)
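     The AZ-aware schedulers referenced on the last two slides are configured on the neutron-server side, roughly like this (a sketch of the relevant neutron.conf options):

         # neutron.conf used by neutron-server
         [DEFAULT]
         network_scheduler_driver = neutron.scheduler.dhcp_agent_scheduler.AZAwareWeightScheduler
         router_scheduler_driver = neutron.scheduler.l3_agent_scheduler.AZLeastRoutersScheduler
         dhcp_agents_per_network = 2
         l3_ha = True
         min_l3_agents_per_router = 2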

  25. Neutron - AZ assignments for agents
     Set up AZs for the Neutron DHCP and L3 agents to match Nova AZs, ex:

         # dhcp_agent.ini and l3_agent.ini
         [AGENT]
         # Set the AZ that this neutron agent should be registered with
         availability_zone = AZ1

     A single agent can't belong to more than one AZ (no overlapping AZs).
     A default AZ can optionally be set in the neutron-server config:

         # neutron-server neutron.conf
         [DEFAULT]
         # Only useful for reducing the number of AZ candidates. Ex, exclude AZ2:
         default_availability_zones = AZ1,AZ3
         # (Optimal to leave undefined)

     This permits multiple AZs, but you usually don't want to set this parameter (the AZ scheduler drivers consider AZ weighting regardless).
     Tenants don't need to supply AZs in API calls either, as long as dhcp_agents_per_network and min_l3_agents_per_router >= the number of AZs you have (this is recommended). You can verify how the agents registered as shown below.
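     A quick way to check which AZ each agent registered with (a sketch; output omitted):

         $ openstack availability zone list --network
         $ openstack network agent list -c "Agent Type" -c Host -c "Availability Zone"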

  26. Neutron - Simplifying UX
     Simplifying UX with the right configs:

         dhcp_agents_per_network = 2        # >= number of AZs with a dhcp-agent
         #default_availability_zones =      # Leave blank to include all AZs in weighting

     Ex - AZ hints are no longer needed in user API calls; this:

         $ openstack network create --availability-zone-hint AZ1 \
             --availability-zone-hint AZ2 Net-1

     becomes simply:

         $ openstack network create Net-1

  27. Neutron - Best Effort Scheduling
     AZ requests are fulfilled on a best-effort basis, so you are not guaranteed to get all the AZs you ask for. Ex, with 2 DHCP agents per network:

         $ openstack network create --availability-zone-hint AZ1 \
             --availability-zone-hint AZ2 Net-1
         Created a new network:
         ...
         | admin_state_up          | True |
         | availability_zone_hints | AZ1  |
         |                         | AZ2  |

         $ openstack network show Net-1
         ...
         | availability_zone_hints | AZ1  |
         |                         | AZ2  |
         | availability_zones      | AZ1  |
         |                         | AZ1  |

     Note - It is possible for the AZs actually hosting the resource not to match the AZ hints (e.g., AZ2 agent failure, admin state down, or agent capacity exceeded).
     The resource gains the availability_zones attribute only after it is scheduled (subnet creation, router interface attach or gateway set, etc).

  28. Neutron - AZ Corner cases & limitations
     Neutron AZs do not apply to L3 for certain scenarios, such as:
     VMs using provider networks (without Neutron routers)
     VMs with floating IPs when Neutron DVR is configured
     No AZ support for the LBaaS, FWaaS, and VPNaaS agents (but planned for the future).
     To use Neutron AZs, your Neutron plugins must support the relevant availability_zone extensions. Currently this is only the case for the ML2 and L3 router plugins.
     If a user requests one or more invalid availability zones, the API call will fail. There is no availability zone fallback option.

  29. Project comparison summary

     Default AZ scheduling:
       Nova    - Can set to one AZ or none. Each VM lives in one AZ.
       Cinder  - Can set one AZ; cannot set none. Each volume lives in one AZ.
       Neutron - Can set to any list of AZs or none. The same resource gets scheduled across multiple AZs.

     AZ definition:
       Nova    - No more than 1 AZ per nova-compute; set via host aggregates
       Cinder  - No more than 1 AZ per cinder backend; set via agent config
       Neutron - No more than 1 AZ per neutron agent; set via agent config

     Client API AZ restrictions:
       Nova    - Can specify one AZ or none
       Cinder  - Can specify one AZ or none
       Neutron - Can specify any number of AZs, but not all are guaranteed

     AZ fallback:
       Nova    - Not supported
       Cinder  - Supported via config
       Neutron - Not supported

     OpenStack AZs more typically used when you have...
       Nova    - Commodity HW for computes; libvirt driver
       Cinder  - Commodity HW for storage; LVM iSCSI driver
       Neutron - Commodity HW for neutron agents; ML2 plugin w/ DHCP & L3

     OpenStack AZs not typically used when you have...
       Nova    - Third-party hypervisor drivers that manage their own HA for VMs (DRS for vCenter)
       Cinder  - Third-party drivers, backends, etc. that manage their own HA
       Neutron - Third-party plugins, backends, etc. that manage their own HA

  30. Summary of the AZ Curse (challenges)
     Good AZ design requires careful planning and coordination at all layers of the solution stack:
     End-user / tenant applications must be built for HA and be AZ-capable, and the AZ design should be informed by their availability and application requirements.
     AZs should be analyzed on a case-by-case basis for each OpenStack project in the scope of deployment, with respect to current limitations, implementation differences, and general UX.
     AZ-aware software update/upgrade processes are needed to get the most out of AZs (to the extent that such updates are the leading cause of service interruptions).
     Storage and network architecture must be designed with AZs in mind.
     AZ-aware planned data center maintenance activities (e.g., evacuations for node servicing, rewiring or relocating physical equipment, etc).
     Informed AZ planning based on an understanding of likely data center modes of failure (ideally backed up with supporting data).

  31. Don't forget the cost-benefit analysis
     Defining AZs is one thing. Actually achieving higher application availability is another.
     Don't use AZs for the sake of using AZs. You need to be able to show some tangible value in having them, or a plan to get there (e.g., developing AZ-aware update/upgrade processes, maintenance procedures, etc).

  32. Thank You Questions? Craig Anderson craig.anderson@att.com
