Non-disruptive upgrade to
SUSE® OpenStack Cloud 7
Nanuk Krinner, SUSE Cloud Engineer
Rick Salevsky, SUSE Cloud Engineer
Introduction
Speakers
● Nanuk Krinner
○ Cloud Developer at SUSE
○ Systems Management Engineer
● Rick Salevsky
○ Cloud Engineer at SUSE
○ Release Coordinator
● SUSE OpenStack Cloud
Agenda
● Upgrading
● Why and What
● Non-disruptive Upgrade
○ Process
○ Requirements
○ Upgrading the Administration Node
○ Preparing the Client Nodes
○ Upgrading the Controller Nodes
○ Upgrading the Compute Nodes
○ Finalizing Upgrade
● Goals
Upgrading?
Day 1: Deploy
Day 2: Operate
Day 3: Upgrade
Why and What
Why upgrade?
● Security fixes
● Stability improvements
● Performance improvements
● Closely follow upstream development
● New features
● Stay on a supported release
Problems while upgrading?
● Downtime
● Preparation
● Testing
● Adapting workflows
● Bugs
● Data loss
OpenStack User Survey April 2016
Customer demands
● Rolling upgrade
● Non-disruptive upgrade
● Easy way to cancel
● Clear documentation of what is happening
● Upgrade while skipping one or more releases
Upgrade marathon
[Figure: stages of the upgrade cycle, in the order extracted from the slide: Release, Evaluating the release, Planning the upgrade, Testing the upgrade, Fine tuning, Integrating new features, Upgrade. A second build of the slide adds the note "Maybe not this time?"]
Non-disruptive Upgrade
Process
● Non-disruptive for workloads
○ HA as requirement
○ network downtime without HA
● Tools
○ WebUI
○ command line tool (crowbarctl)
○ new REST-API (for manual upgrade)
Requirements
● Maintenance Updates installed
● Cloud Network Services are healthy
● Pacemaker is available and healthy
● Ceph is healthy
● Compute Resources are available
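The relationship between these requirements and the two upgrade modes can be sketched in a few lines of Python. This is only an illustration: the `Precheck` class, the `choose_upgrade_mode` function, the check names, and the split between checks that block the upgrade entirely and checks that only rule out the non-disruptive mode are all assumptions, not the actual Crowbar logic.

```python
from dataclasses import dataclass


@dataclass
class Precheck:
    name: str
    passed: bool
    # Assumption: a failed "ha_only" check does not block the upgrade,
    # it only rules out the non-disruptive mode.
    ha_only: bool = False


def choose_upgrade_mode(checks):
    """Return (mode, failed_check_names) for a list of Precheck results."""
    blocking = [c.name for c in checks if not c.passed and not c.ha_only]
    if blocking:
        return "blocked", blocking
    ha_failures = [c.name for c in checks if not c.passed and c.ha_only]
    if ha_failures:
        return "normal", ha_failures
    return "non-disruptive", []


# Hypothetical results mirroring the requirements listed on the slide.
example_checks = [
    Precheck("maintenance_updates_installed", True),
    Precheck("network_services_healthy", True),
    Precheck("pacemaker_healthy", False, ha_only=True),
    Precheck("ceph_healthy", True),
    Precheck("compute_resources_available", True, ha_only=True),
]
mode, failed = choose_upgrade_mode(example_checks)
```

With Pacemaker unhealthy, this sketch falls back to the normal mode, matching the slide's statement that HA is a requirement for the non-disruptive path.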
Upgrading the Administration Node
● Preliminary Checks
● Non-disruptive Mode or Normal Mode
● Begin Upgrade
○ Decouple node from crowbar
○ Disabling chef on nodes
○ Freezing the Cloud in the current state
● Backup of the admin node
○ Optional but recommended
○ In case the admin node upgrade fails
● Update Repositories
○ Manual by the Administrator
○ SLES 12 SP2
○ SUSE OpenStack Cloud 7
○ Updates Repositories
● Upgrade Administration Node OS
○ Background Script
○ Stops chef-client service
○ Dumps current database
○ Executes zypper dist-upgrade
● Rebooting
● Creating PostgreSQL database
○ Create new local database
○ Connect to existing database
● Migrating old data to new database
Preparing the Client Nodes
Before Upgrading
● Backup all important OpenStack data
● Create snapshots from important instances
● Last chance to update OpenStack resources
● Prepare client node repositories
○ SLES 12 SP2
○ SUSE OpenStack Cloud 7
○ SLES 12 SP2 High Availability Extension
○ Updates Repositories
● Automatic disabling of old repositories
○ SUSE OpenStack Cloud 6
○ SLES 12 SP1
● Stopping Services
○ Irrelevant OpenStack services → network
○ Related OpenStack services
● Creating backup of the OpenStack database
○ Backup is stored on the Administration Node
● OpenStack API is mostly unavailable
Upgrading the Controller Nodes
[Diagram labels: SLES 12 SP1, SLES 12 SP2, Admin Network, Pacemaker; services: DHCP, Neutron, OVS, L3, RabbitMQ, Keystone, DB]
● The first node to upgrade will be a non-master node
● Migrating neutron l3-agent
● Shutdown Pacemaker services
● Upgrade Controller Node OS
● Reboot, but do not start any services
● Prevent Pacemaker from running services on non-upgraded nodes
● Core API downtime starts now → all services
● Start Pacemaker Services
● Update all configurations via crowbar
● Stop synchronizing HA resources
● Promote upgraded node to master
● Start all services on the upgraded node
● Core API downtime ends here
For all other controller nodes, one at a time:
● All services are already stopped
● Moving network traffic to upgraded node
● Sync HA slave with master
● Stopping cluster stack
● Upgrade Controller Node OS
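The ordering rule for controller nodes (a non-master node first, then the remaining nodes one at a time) can be sketched as a small Python function. The function name `controller_upgrade_order` and the node names are hypothetical, and placing the original master last is an assumption made for illustration; the slides only state that the first node is a non-master node.

```python
def controller_upgrade_order(nodes, master):
    """Return the controller nodes in a possible upgrade order.

    A non-master node comes first, as stated on the slide. Putting the
    original master last is an assumption of this sketch.
    """
    if master not in nodes:
        raise ValueError("master must be one of the controller nodes")
    non_masters = [n for n in nodes if n != master]
    if not non_masters:
        raise ValueError("an HA cluster needs at least two controller nodes")
    return non_masters + [master]


# Hypothetical three-node controller cluster where "ctrl1" is the master.
order = controller_upgrade_order(["ctrl1", "ctrl2", "ctrl3"], master="ctrl1")
```

Here `ctrl2` would be upgraded and promoted first, followed by `ctrl3` and finally the former master `ctrl1`.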
Normal Mode (non High Availability case)
● Several network outages will happen
● Especially during network migrations
Upgrading the Compute Nodes
For all compute nodes, one at a time:
● Disabling nova hypervisor
● Live migrate instances to another compute node
● Stop Pacemaker remote
● Upgrading node OS
● Rebooting node
● Update all configurations via crowbar
● Adding node to pacemaker cluster
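The rolling procedure above only works if the remaining compute nodes can absorb the instances of the node being upgraded, which is why "Compute Resources are available" is listed as a requirement. A minimal simulation, assuming a simplified model in which every node has the same number of instance slots, might look like this. The function `plan_compute_upgrade`, the slot-based capacity model, and the "least loaded node" target choice are all illustrative assumptions, not how nova schedules live migrations.

```python
def plan_compute_upgrade(instances_by_node, capacity):
    """Simulate upgrading compute nodes one at a time.

    Returns a list of (node, [(instance, target_node), ...]) steps.
    Raises RuntimeError if an instance cannot be moved anywhere.
    """
    placement = {node: list(insts) for node, insts in instances_by_node.items()}
    plan = []
    for node in list(placement):
        moves = []
        # Iterate over a copy because instances are removed while draining.
        for inst in placement[node][:]:
            targets = [
                n for n in placement
                if n != node and len(placement[n]) < capacity
            ]
            if not targets:
                raise RuntimeError(f"no free capacity to migrate {inst} off {node}")
            # Assumption: pick the least loaded node as migration target.
            target = min(targets, key=lambda n: len(placement[n]))
            placement[node].remove(inst)
            placement[target].append(inst)
            moves.append((inst, target))
        # At this point the node is empty and could be upgraded and rebooted.
        plan.append((node, moves))
    return plan


plan = plan_compute_upgrade({"n1": ["a", "b"], "n2": ["c"], "n3": []}, capacity=2)
```

If every other node is already full, the sketch raises an error instead of producing a plan, which corresponds to a failed capacity precheck.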
Finalizing Upgrade
● Reapplying all barclamps
● Showing the barclamps page
Issues
● Configuration file migration
● Migrations
● All or nothing
● Predefined upgrades
● Create backups!
Goals
● Finish non-disruptive upgrade for other services
● No downtime of important services
● Migrating existing data from every point
● Cancel upgrade in every step
● Rollback upgrade
Questions?
Thank you!