OpenStack Upgrades: A Comprehensive Overview

OpenStack Upgrade:
A journey from
Liberty to Ocata
Ajay Kalambur, Technical Leader
Shail Bhargava, Technical Leader
Rich Winters, Senior Software Engineer
November 8, 2017
Agenda
OpenStack Upgrades: Why Should One Care?
What Does An Upgrade Entail?
Environment Overview
Key Areas
OS: RHEL Upgrade major releases
Storage: CEPH Upgrade from Hammer to Jewel
OpenStack Ver: Upgrade from Liberty to Ocata
Infra Services: Upgrade of rabbitmq, galera, haproxy
Rollback Of An Upgrade (Newton-Ocata)
Verification/Testing Of An Upgrade Process
2
OpenStack Upgrade: Why Should One Care?
Minimize Any Production Cloud Downtime
Minimize data plane and control plane disruption
No need to recreate workloads on cloud
Having a secondary cloud to migrate workloads is not viable economically
Move At The Speed Of Open Source
Bring in new OpenStack features
Improved stability with each release (e.g. Bug fixes)
Reduce unnecessary support (e.g. EOL code)
3
OpenStack Upgrade: What does it entail?
General
Package Upgrade
Upgrade packages to bring in new software.
Update configuration files
Update configuration files with the latest parameters
Configuration Translations across releases
Sync databases
Run a database sync to update schemas to the new structure
Deployment Specific
Containers vs Host vs VM
Operating system
CEPH
Infra services (rabbitmq, galera, haproxy, etc.)
OpenStack Services
4
Our Environment
 
5
Containerized OpenStack Services
Ansible playbooks used to deploy OpenStack services within Docker containers
Docker containers started via systemd
HAProxy used for load balancing
[Diagram: Management Node, 3 Control Nodes, n Compute Nodes, and Storage Nodes (min of 3), each running its services (Keystone, Horizon, Nova, Neutron, Glance, Cinder, ceilometer, CEPH, rabbitmq, Galera, haproxy, memcached, Fluentd, Libvirt, Elasticsearch, Kibana, VMTP, container registry, repo mirror) as Docker containers]
Control Plane – High Availability
OpenStack APIs load balanced via HAProxy and Keepalived [Active/Standby]
OpenStack services / message queue / database high availability implemented using three OpenStack control nodes
[Diagram: API calls #1 and #2 arrive on the API network through the active HAProxy; each of the three control nodes runs HAProxy (one active, two standby), Galera (one active, two standby), Keystone, Horizon, Nova, Neutron Server, and others, interconnected over the management network]
Upgrade High Level View
8
Key Upgrade Events
Timeline: Liberty -> [sanity] -> Mitaka -> Newton -> [sanity] -> Ocata
Liberty -> Mitaka
Host Packages and Kernel Upgrade (7.2->7.4)
CEPH Upgrade: Hammer to Jewel
Infra Service Upgrade
OpenStack Services
Mitaka -> Newton
Infra Services
OpenStack Services
Newton -> Ocata
Infra Services
OpenStack Services
Key Service Events (During an Upgrade)
Cleanup old service
Stop old service
Remove old container
Remove old image for container
Bootstrap container
DB sync for service
Create any new services/users for keystone
Bring up Service
Bring up new container with new configuration
Check service health
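A minimal shell sketch of these per-service events, assuming hypothetical unit, image, and registry names (the db-sync command shown is Neutron's):
```bash
#!/bin/bash
# Hypothetical sketch of the per-service swap; unit, image, and
# registry names are illustrative, not our exact tooling.
SERVICE=neutron_server
OLD_IMAGE=registry.local/neutron-server:mitaka
NEW_IMAGE=registry.local/neutron-server:newton

systemctl stop ${SERVICE}            # stop old service (containers are systemd-managed)
docker rm -f ${SERVICE} || true      # remove old container
docker rmi ${OLD_IMAGE} || true      # remove old image for container

# Bootstrap: run the DB sync for the service before the new code serves traffic
docker run --rm ${NEW_IMAGE} neutron-db-manage upgrade heads

systemctl start ${SERVICE}           # bring up new container with new configuration

# Check service health before moving on to the next service
curl -sf http://localhost:9696/ >/dev/null && echo "${SERVICE} healthy"
```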
Upgrade Specifics
 
10
Operating System and Host Package Details
Operating system: RHEL 7.2 -> 7.4
Liberty -> Mitaka includes the RHEL 7.2 -> 7.4 upgrade
Same kernel Mitaka -> Ocata: kexec upgrade
Optional delayed reboot of compute nodes
SELinux relabel
Mitaka -> Ocata: minimal control and data plane disruption
Host packages: Docker 1.8.2 -> 1.10
Galera auto recovery after Docker upgrade
Ansible and python-docker-py upgrade
11
Ceph Upgrade Details (Hammer -> Jewel)
CEPH Monitors: change permission of /var/lib/ceph, /var/log/ceph, /etc/ceph to ceph:ceph; upgrade to newer version
OSD Nodes: change permission of /var/lib/ceph/osd, /var/log/ceph to ceph:ceph; enable ceph-osd.target and ceph-osd@<osdid> services; upgrade to newer version
Post Upgrade tasks: ceph osd set require_jewel_osds; ceph osd crush tunables firefly
12
Infrastructure Upgrade: Rabbit & Galera
Rabbitmq upgrade: stop rabbitmq on all 3 nodes (major or minor version change); remove mnesia file; bring up new containers; set HA policy for mirrored queues
Galera upgrade: disable galera backend in haproxy; shut down services in the right order; remove grastate.dat; bootstrap new cluster and bring up primary node; bring up other 2 members; enable galera backend in haproxy
13
OpenStack Services Upgrade
General flow: Liberty -> Mitaka -> Newton -> Ocata; rolling upgrade of each service; delete old container, bootstrap service (db sync), install new containers
Bring in new services (example: placement) as part of upgrade
Snapshot database using mysqldump to recover from an upgrade failure
Rollback support to recover from upgrade failure
Nova Placement handling (Newton -> Ocata): create nova cells database; keystone changes for placement; cell setup
14
Upgrade and Rollback User flow
Pre-Upgrade Validations -> Galera backup -> Upgrade Services -> Upgrade success?
Yes: Commit
No (Rollback): Shutdown all OpenStack services -> Restore Galera database -> Rollback OpenStack Services -> Post Rollback Validations
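As a sketch, the decision point in this flow reduces to a small wrapper; every helper script below is a hypothetical placeholder for our actual tooling:
```bash
#!/bin/bash
# Hedged sketch of the upgrade/rollback user flow above.
set -e
./pre_upgrade_validations.sh                       # Pre-Upgrade Validations
mysqldump --all-databases > /backup/galera.sql     # Galera backup

if ./upgrade_services.sh; then                     # Upgrade Services
    ./commit_upgrade.sh                            # success: Commit
else                                               # failure: Rollback
    ./shutdown_all_openstack_services.sh           # Shutdown all OpenStack services
    mysql < /backup/galera.sql                     # Restore Galera database
    ./rollback_openstack_services.sh               # Rollback OpenStack Services
    ./post_rollback_validations.sh                 # Post Rollback Validations
fi
```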
Challenges faced and addressed
Galera cluster in a bad shape after a host package upgrade
Run automated galera recovery
Rabbitmq reconnections sometimes not working as expected
Restart rabbitmq servers post Upgrade
Handle soft deleted records on upgrade
https://review.openstack.org/#/c/435620/
Handling a different network design for CEPH nodes between Liberty -> Ocata
Keep backward compatibility
VMs are not reachable over floating IP post upgrade
Move the network namespace to host from container
16
Upgrade Verification
Pre Upgrade Tests
Check Health of OpenStack services
Check CEPH health
Check health of Infra services (rabbitmq, galera, haproxy)
Post Upgrade Tests
Functional tests: Rally and tempest
Validate CEPH cluster functionality
Verify existing resources created before upgrade
Database schema comparison between upgraded setup and fresh deployment
Data plane and control plane downtime tracked through multi release upgrade
Validate health of infra services (rabbitmq, galera, haproxy)
17
Post Upgrade Verification
Automated testing
Performs end-to-end installation of the Liberty release followed by upgrade to Mitaka/Newton/Ocata
End to end wrapper script which eases intermediate Mitaka & Newton upgrades
Helps uncover timing issues (rabbitmq cluster does not respond intermittently)
Runs nightly to catch any regression
18
Demos
19
Demos
Neutron Upgrade (Mitaka-Newton)
Sample flow for Upgrade of 1 service
Rolling upgrade of each component
Bring down old service
Bootstrap new service
Bring up new service
Rollback (Ocata -> Newton)
Preview of Ocata
Stop Ocata services
Restore Newton mysql DB
Rollback of OpenStack services
Post rollback sanity of cloud
No control plane operations post upgrade
20
Summary
An OpenStack multi release upgrade can be performed by internally triggering a step by step in-sequence upgrade (every release)
An upgrade between releases normally also involves additional components, e.g. operating system + CEPH + Infra Services
Rollback support works
Control plane will be down during rollback window
Containerized OpenStack deployments need to handle Docker upgrade
Repeatable automation to run upgrades every night is critical to flush out any hidden timing bugs
21
Questions?
Ajay Kalambur akalambu@cisco.com
Shail Bhargava shabharg@cisco.com
Richard Winters riwinter@cisco.com
22
 
Backup/Details
23
 
RHEL version Upgrade
Ansible upgrade on management node is simple
Ansible: 1.9.4 -> 2.2.1
python-docker-py: 1.4.0 -> 1.9.0
RHEL version Upgrade from 7.2 to 7.4
Docker Upgrade from 1.8.2 -> 1.10
Option of delayed reboot of host operating system on compute nodes
All operating system and package updates performed Liberty -> Mitaka
Auto execute galera cluster recovery (see later)
Removed the oci-register-machine hook before docker upgrade:
rm -rf /usr/libexec/oci/hooks.d/oci-register-machine
 
24
RHEL Version Upgrade
SELinux relabeling done as part of OS Upgrade
Mitaka -> Ocata use the same kernel, so no downtime of any nodes as it's a hitless upgrade
Handle kexec changes for new kernel
Install kexec loader based on new kernel
Install modified kexec unit file
Setup kexec kernel load for restart
Default to kexec restart
Patch libvirt systemd file to add a dependency on machine.slice
Needed for VM to automatically startup after system reboot
Automatically install any new packages added as part of newer release
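A minimal sketch of the kexec load-and-restart path described above; kernel-version detection is simplified, and the real flow is wrapped in a modified systemd unit file:
```bash
#!/bin/bash
# Stage the newly installed kernel with kexec and restart into it,
# skipping the lengthy hardware POST of a full reboot.
KVER=$(rpm -q --last kernel | head -1 | sed 's/^kernel-//;s/ .*//')
kexec -l "/boot/vmlinuz-${KVER}" \
      --initrd="/boot/initramfs-${KVER}.img" \
      --reuse-cmdline            # load new kernel, keep current cmdline
systemctl kexec                  # restart via kexec instead of firmware
```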
 
25
CEPH upgrade Hammer to Jewel
Upgrade CEPH mon nodes first
Change permission of /var/lib/ceph, /var/log/ceph, /etc/ceph to ceph:ceph
We run ceph-mon in a Docker container, so we replace it with the new container
Track ceph-mon docker container through systemd
CEPH OSD node upgrade
One node upgraded at a time
Stop all OSD services
Change permission of /var/lib/ceph/osd, /var/log/ceph to ceph:ceph
Yum update all CEPH packages
Make sure to create mount entries for all ceph osd drives in /etc/fstab
Systemctl enable ceph-osd.target, ceph-osd@<osdid> for all OSD
touch /.autorelabel and reboot
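Condensed into shell form, the per-OSD-node flow looks roughly like this (a sketch; run one node at a time):
```bash
#!/bin/bash
# Sketch of the OSD node upgrade; OSD ids are derived from the data dirs.
service ceph stop osd                                # stop all OSD services (Hammer init script)
chown -R ceph:ceph /var/lib/ceph/osd /var/log/ceph   # Jewel daemons run as user 'ceph'
yum -y update "ceph*"                                # update all CEPH packages

# Jewel starts OSDs via systemd, so OSD drive mounts must be in /etc/fstab
systemctl enable ceph-osd.target
for id in $(ls /var/lib/ceph/osd | sed 's/^ceph-//'); do
    systemctl enable "ceph-osd@${id}"
done
touch /.autorelabel && reboot                        # full SELinux relabel on boot
```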
 
26
CEPH Upgrade Post Upgrade tasks
On Mon node: ceph osd set require_jewel_osds
On Mon node: ceph osd crush tunables firefly
Check status of the ceph cluster (ceph -s) to make sure health is OK
 
27
Rabbitmq Cluster upgrade procedure
Involves a change to major or minor version
Stop all rabbitmq servers on all 3 controllers
Remove old containers and images
Remove the mnesia file: /var/lib/docker/volumes/rabbitmq/_data/mnesia
Bring up the new rabbitmq containers with new configs
Enable ha policy explicitly for mirrored queues: rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
Validate cluster state to make sure things came up fine: rabbitmqctl cluster_status
Cluster running with required number of members
No partitions
[{nodes,[{disc,['rabbit@control-server-1','rabbit@control-server-2','rabbit@control-server-3']}]},
 {running_nodes,['rabbit@control-server-2','rabbit@control-server-1','rabbit@control-server-3']},
 {cluster_name,<<"rabbit@control-server-3">>},
 {partitions,[]}]
...done.
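Collapsed into commands, the procedure looks roughly like this; image names are placeholders, and in our setup rabbitmqctl runs inside the broker container:
```bash
# On all 3 controllers: stop and clean up the old broker
docker stop rabbitmq && docker rm rabbitmq
docker rmi <old-rabbitmq-image>
rm -rf /var/lib/docker/volumes/rabbitmq/_data/mnesia   # drop old mnesia state

# Bring up the new containers with the new configs, then re-apply HA policy
docker run -d --name rabbitmq <new-rabbitmq-image>
rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
rabbitmqctl cluster_status   # expect all 3 members in running_nodes, partitions []
```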
 
28
Galera Cluster Upgrade procedure
Disable galera backend in haproxy
touch /var/tmp/clustercheck.disabled
Wait a few seconds for pending transactions to sync
Shut down galera services on all 3 nodes in proper order
Remove grastate.dat
Avoid getting into higher transaction id issues
Moved to new cluster approach vs a graceful highest transaction id shutdown (requests in transit did not matter to us)
Stop and remove old container and docker images
Start new container with new configs
Bootstrap one node as primary and startup
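A hedged sketch of this new-cluster approach; container and image names are illustrative:
```bash
touch /var/tmp/clustercheck.disabled      # take galera out of haproxy rotation
sleep 10                                  # let pending transactions sync

# On each of the 3 nodes, in the proper order:
docker stop mariadb && docker rm mariadb  # stop and remove old container
rm -f /var/lib/mysql/grastate.dat         # avoid stale transaction-id state
docker rmi <old-mariadb-image>            # remove old docker image

# Start the first node with the new image and configs, passing
# --wsrep-new-cluster to mysqld so it bootstraps as primary; then start
# the remaining two nodes normally so they join and sync.
```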
 
29
Galera cluster post upgrade procedure
Restore galera cluster behind haproxy: remove /var/tmp/clustercheck.disabled
Perform xinetd checks and cluster status check
Validate cluster status output
mysql -u root -p <password> -e "SHOW STATUS LIKE 'wsrep%'"
wsrep_local_state_comment: Synced, wsrep_cluster_size: <# controllers>
Perform health check of <vip>:3306 to make sure things work end to end
Ability to trigger galera automated recovery if previous steps fail
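As a sketch, the status and end-to-end checks reduce to the following (VIP and password are placeholders):
```bash
# On a node: confirm the cluster is synced and at full size
mysql -u root -p"${PW}" -e "SHOW STATUS LIKE 'wsrep%'" | \
    grep -E 'wsrep_local_state_comment|wsrep_cluster_size'

# End to end: exercise the haproxy VIP on 3306
mysqladmin -h <vip> -P 3306 -u root -p"${PW}" ping
```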
 
30
Galera cluster automated recovery
Handle the case of heuristic rollback where galera was shut down in the middle of a transaction
If the cluster still cannot be recovered, perform complete failure recovery
Stop all existing galera services
Bootstrap all mariadb containers with the --wsrep-recover option
Check node with highest transaction number
Force this node as primary and bootstrap mariadb with the --wsrep-new-cluster option
Wait for node to come online and respond to SQL query
Start remaining nodes to join cluster
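In outline, the complete-failure recovery looks like this sketch (container wrapping elided; the mysqld options are the standard Galera ones):
```bash
# 1. On every node, recover the last committed position
mysqld --wsrep-recover          # logs "Recovered position: <uuid>:<seqno>"

# 2. On the node with the highest recovered seqno only, bootstrap
#    it as the new primary component
mysqld --wsrep-new-cluster

# 3. Wait until it answers SQL, then start mysqld normally on the
#    remaining nodes so they rejoin the cluster
mysql -u root -p"${PW}" -e "SELECT 1"
```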
 
31
ELK-EFK Components Upgrade
Liberty setup: ELK components at 1.5 version
No requirement to persist logs and database between Liberty-Ocata
Moved to EFK 5.x with a rip and replace
Remove old container and images.
Install new EFK components
Old logs were lost but EFK was setup properly going forward with Ocata
 
32
OpenStack release Upgrade
Customer initiates Liberty -> Ocata upgrade; no extra nodes available
We internally perform a rolling upgrade, 1 release at a time
Liberty -> Mitaka -> Newton -> Ocata
Test all resources before and after upgrade
Bring in any new services as part of the upgrade (example: placement in Ocata)
Snapshot database using mysqldump to recover from an upgrade failure
POC rollback implementation Newton -> Ocata
Upgrades of the rabbitmq and galera clusters have a different flow
 
33
Placement handling Newton-Ocata
Create nova cells database (mandatory in Ocata)
Create the placement user in keystone
Create the placement service in keystone
Create the placement endpoints in keystone
Register the cell0 database: nova-manage cell_v2 map_cell0
Create the cell1 cell: nova-manage cell_v2 create_cell --name=cell1
nova-manage cell_v2 simple_cell_setup
Perform nova api and nova db sync
Post nova upgrade, trigger a manual discover_hosts: nova-manage cell_v2 discover_hosts (later on: config option discover_hosts_in_cells_interval = 300)
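In CLI terms, the keystone and cells steps look roughly like this (region, URLs, and passwords are placeholders; nova-manage cell_v2 simple_cell_setup can combine the cell steps):
```bash
# Keystone: placement user, service, and endpoints
openstack user create --domain default --password <pw> placement
openstack role add --project service --user placement admin
openstack service create --name placement --description "Placement API" placement
openstack endpoint create --region RegionOne placement public   http://<vip>:8778
openstack endpoint create --region RegionOne placement internal http://<vip>:8778
openstack endpoint create --region RegionOne placement admin    http://<vip>:8778

# Nova cells v2: register cell0, create cell1, sync schemas
nova-manage api_db sync
nova-manage cell_v2 map_cell0
nova-manage cell_v2 create_cell --name=cell1
nova-manage db sync
nova-manage cell_v2 discover_hosts   # manual once; later via discover_hosts_in_cells_interval
```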
 
34
Verification
Functionality testing
Comparison of upgraded & fresh installed setup
Database schema migration using mysqldump
Manual verification by comparing the two dumps using "diff"
Exploring automated tools for database schema comparison to put in CI/CD
Host RPM packages including kernel, ansible, docker, OpenStack clients, etc.
CEPH cluster functionality
Host configuration changes e.g. reserving a TCP port for newly introduced service(s)
Host system services ordering and dependencies
Functionality of kexec reboot (to minimize downtime and avoid doing lengthy hardware POST)
Migration of custom configuration
TCP/UDP port scan to identify changes in Open/Closed Ports
Existing OpenStack resources can be accessed, modified and deleted post upgrade
New OpenStack resources can be created post upgrade
OpenStack Services – running Rally test for various components on upgraded setup
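A minimal version of the schema comparison (hosts and credentials are placeholders):
```bash
# Dump schema only (no rows) from the upgraded setup and a fresh deployment;
# --skip-dump-date keeps the two dumps byte-comparable.
mysqldump --no-data --skip-dump-date -h <upgraded-vip> -u root -p"${PW}" nova > upgraded.sql
mysqldump --no-data --skip-dump-date -h <fresh-vip>    -u root -p"${PW}" nova > fresh.sql
diff upgraded.sql fresh.sql   # AUTO_INCREMENT counters may still differ
```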
 
35
Verification
Control plane downtime
Host RPM package update results in increased downtime (e.g. docker RPM)
Liberty to Mitaka has more downtime when compared to the Mitaka to Newton or Newton to Ocata upgrade
Data plane downtime
Host RPM package update increases the downtime (e.g. kernel, iptables)
Increased downtime with external (floating) network when compared to that of the provider network
 
36