When Disaster Strikes the Cloud: Who, What, When, Where and How to Recover
Accelerating Enterprise OpenStack
When Disaster Strikes the Cloud: Who, What, When, Where and How to Recover
Michael Factor, IBM Research - Haifa
Ronen Kat, IBM Research - Haifa
Sean Cohen, Red Hat
Talk Outline
- What is disaster recovery?
- Concepts and basics
- Protecting data and applications from disasters
- OpenStack Cinder toolbox for disaster recovery
- Applications are more than just data
- The road ahead: Kilo and beyond
What is Disaster Recovery?
According to Wikipedia, disaster recovery (DR) is “the process, policies and procedures ... for recovery ... of technology infrastructure ... after a natural or human-induced disaster.”
(Figure: the infrastructure to recover spans servers, storage, network, software and configuration.)
Surviving a disaster requires geographic dispersion.
Recovery Point Objective and Recovery Time Objective
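- Recovery Point Objective (RPO): how far back in time a disaster takes one
- Recovery Time Objective (RTO): how long until operational after a disaster
(Figure: a timeline centered on the disaster at time 0, with RPO extending back from seconds to weeks and RTO extending forward from seconds to weeks; backup/restore, replication, hot site and active site are placed along the two scales.)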
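To make the two objectives concrete, here is a back-of-the-envelope sketch of how the protection technique bounds them. The scheduling model (a backup starting every `interval_h` hours and becoming usable `duration_h` hours later, replication trailing by a fixed lag, a restore costing data size over throughput plus a fixed overhead) is an assumption for illustration, not something taken from the talk.

```python
"""Back-of-the-envelope RPO/RTO estimates (illustrative model only)."""


def worst_case_rpo_backup(interval_h: float, duration_h: float) -> float:
    """Worst-case data loss, in hours, for periodic backups.

    Assumes a backup starts every `interval_h` hours, captures the data as of
    its start time, and becomes usable off-site `duration_h` hours later.
    """
    if not 0 <= duration_h < interval_h:
        raise ValueError("model assumes each backup finishes before the next starts")
    # Disaster hits just before backup k completes: the newest usable copy is
    # backup k-1, whose data is interval_h + duration_h hours old.
    return interval_h + duration_h


def worst_case_rpo_replication(lag_s: float) -> float:
    """Asynchronous replication loses at most its lag (0 s if synchronous), in hours."""
    return lag_s / 3600.0


def estimated_rto_restore(size_gb: float, restore_gb_per_h: float, fixed_h: float) -> float:
    """Hours until operational when recovering by restoring a full backup.

    `fixed_h` stands for everything besides copying data back
    (provisioning, reconfiguration, application start-up).
    """
    return fixed_h + size_gb / restore_gb_per_h


if __name__ == "__main__":
    print(worst_case_rpo_backup(24, 2))          # nightly backup taking 2 h
    print(worst_case_rpo_replication(30))        # replication lagging 30 s
    print(estimated_rto_restore(500, 250, 1))    # 500 GB at 250 GB/h, plus 1 h
```

Under this model, nightly backups that take two hours give a worst-case RPO of 26 hours, while replication lagging 30 seconds loses well under a minute of data; shrinking RTO instead calls for a hot or active site rather than a faster restore.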
Data and Metadata Consistency
- Data consistency: if a modified datum is available, all data it depends upon is also available
- Metadata consistency: configuration updates are seen in the same order relative to one another and to data updates
(Figure: an application VM writes to DB and LOG volumes, which are mirrored to DB and LOG volumes at a remote site.)
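The data consistency definition can be checked mechanically. The sketch below models the DB/LOG picture with a handful of writes and their dependencies (the write names and the dependency map are hypothetical, loosely following write-ahead logging) and checks whether what reached the remote site is dependency-closed.

```python
from typing import Dict, List, Set

# Hypothetical writes issued by the application VM, in issue order.
WRITES: List[str] = ["log1", "db1", "log2", "db2"]

# Each write -> the earlier writes it depends on (a DB update needs its LOG
# record; log records are sequential). Illustrative only.
DEPENDS_ON: Dict[str, Set[str]] = {
    "log1": set(),
    "db1": {"log1"},
    "log2": {"log1"},
    "db2": {"log2", "db1"},
}


def is_consistent(remote: Set[str]) -> bool:
    """Data consistency: every write present at the remote site has all of
    the writes it depends on present as well."""
    return all(DEPENDS_ON[w] <= remote for w in remote)


def ordered_prefix(n: int) -> Set[str]:
    """Remote contents if writes are shipped in issue order and the link
    fails after the first n of them."""
    return set(WRITES[:n])
```

Shipping writes in issue order leaves every possible cut consistent, whereas replicating the DB and LOG volumes independently can leave the remote site with `db2` but not `log2`, which is exactly the inconsistent state the definition rules out.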
OpenStack Cloud Metadata
- Virtual networks between the cloud VMs
- External network access
- Attached volumes
- Volume types
- Virtual machine flavors
- SSH keys for VM access
- Virtual machine images
- Identities of users
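Metadata consistency also implies an order when rebuilding this metadata at a recovery site: nothing should be recreated before the objects it refers to. The sketch below derives such an order with a topological sort; the dependency map is an assumption about how these items typically relate, not an OpenStack-defined graph.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: kind of cloud metadata -> kinds that must
# already exist on the recovery cloud before it can be recreated.
METADATA_DEPENDENCIES = {
    "user_identities": set(),
    "vm_images": {"user_identities"},
    "vm_flavors": set(),
    "ssh_keys": {"user_identities"},
    "volume_types": set(),
    "virtual_networks": {"user_identities"},
    "external_network_access": {"virtual_networks"},
    "volumes": {"volume_types", "user_identities"},
    "virtual_machines": {"vm_images", "vm_flavors", "ssh_keys", "virtual_networks"},
    "volume_attachments": {"virtual_machines", "volumes"},
}


def restore_order(deps=METADATA_DEPENDENCIES):
    """Return one valid recreation order (predecessors first).

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for kind in restore_order():
        print(kind)
```

Several orders are usually valid; any of them works as long as each item's dependencies come first, which is what a recovery runbook needs to preserve.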
Protecting Data and Applications from Disasters
Data Protection: Cinder Backup and Restore
- Cinder backup: back up a volume to backup storage
(Figure: on the primary cloud, `backup-create` copies a volume to Swift.)
Data Protection: Cinder Backup and Restore
- Can Cinder restore the backup on the secondary cloud?
- Problem: Cinder on the secondary cloud is not aware of the backup
(Figure: `backup-restore` on the secondary cloud cannot find the backup that the primary cloud stored in Swift.)
Data Protection: Cinder Backup and Restore
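- Solution: “electronic tape shipping”
  - backup-export on the primary cloud
  - backup-import on the secondary cloud
- Supported by Cinder since Icehouse
(Figure: `backup-export` on the primary cloud produces a backup reference, which `backup-import` loads into Cinder on the secondary cloud; the backup itself is in Swift.)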
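The export/import step can be scripted. The sketch below builds argument lists for the `cinder` CLI commands named on these slides and "ships" the export records between sites as a JSON manifest. The manifest format and the `BackupRecord` class are hypothetical; `backup_service` and `backup_url` are the two values `backup-export` reports and are treated here as opaque strings. Note that `backup-import` registers the backup under a new ID on the secondary cloud, and that is the ID to restore from.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BackupRecord:
    """Values reported by `cinder backup-export` (treated as opaque)."""
    backup_service: str
    backup_url: str


def export_cmd(backup_id: str) -> List[str]:
    # Run on the primary cloud.
    return ["cinder", "backup-export", backup_id]


def import_cmd(record: BackupRecord) -> List[str]:
    # Run on the secondary cloud; prints the newly created backup's ID.
    return ["cinder", "backup-import", record.backup_service, record.backup_url]


def restore_cmd(backup_id: str) -> List[str]:
    # Without a target-volume option, Cinder restores into a new volume.
    return ["cinder", "backup-restore", backup_id]


def ship(records: List[BackupRecord]) -> str:
    """Serialize export records into a hypothetical manifest for the secondary site."""
    return json.dumps([asdict(r) for r in records])


def receive(manifest: str) -> List[BackupRecord]:
    return [BackupRecord(**d) for d in json.loads(manifest)]
```

The argument lists can be passed to `subprocess.run` unchanged; keeping them as lists avoids shell quoting of the opaque `backup_url`.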
Data Protection: Cinder Backup and Restore
- After backup-import, Cinder can restore on the secondary cloud with backup-restore
(Figure: `backup-restore` on the secondary cloud reads the backup from Swift.)
Data Protection: Cinder Volume replication
- Cinder has initial support for volume replication in the Juno release
- Cinder back ends can “advertise” support for replication
- A volume created with the replication extra spec is allocated on a back end that supports replication, and is replicated
- Supporting back ends: IBM Storwize; more expected in Kilo
(Figure: two Cinder back ends; the volume type carries the extra spec “capabilities:replication <is> True”.)
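As a sketch of how an operator might set this up, the snippet below builds `cinder` CLI argument lists for a volume type carrying the replication extra spec and for a volume of that type. The type name and size are hypothetical, and option names may differ between python-cinderclient versions, so treat it as an outline rather than a verified recipe.

```python
from typing import List

# Extra spec from the slide; passed as a single argv element, so no shell
# quoting of "<is> True" is needed.
REPLICATION_SPEC = "capabilities:replication=<is> True"


def replicated_type_cmds(type_name: str) -> List[List[str]]:
    """Create a volume type whose volumes land on replication-capable back ends."""
    return [
        ["cinder", "type-create", type_name],
        ["cinder", "type-key", type_name, "set", REPLICATION_SPEC],
    ]


def create_replicated_volume_cmd(type_name: str, size_gb: int) -> List[str]:
    """Create a volume of that type; the scheduler places it on a back end
    that advertises replication support."""
    return ["cinder", "create", "--volume-type", type_name, str(size_gb)]


if __name__ == "__main__":
    for cmd in replicated_type_cmds("replicated-gold"):   # hypothetical type name
        print(cmd)
    print(create_replicated_volume_cmd("replicated-gold", 10))
```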
Data Protection: Cinder Volume replication
- A secondary volume can become the primary when promoted: replication-promote
- Replication can be reversed following a replication-promote: replication-reenable
(Figure: replication between two Cinder back ends.)
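The promote/reenable sequence behaves like a small state machine. The toy model below is only meant to show how the roles swap and how replication direction reverses; it is not Cinder's implementation, and the site names are hypothetical.

```python
from typing import Tuple


class ReplicatedPair:
    """Toy model of a volume replicated between two back ends."""

    def __init__(self, primary: str, secondary: str) -> None:
        self.primary = primary
        self.secondary = secondary
        self.replicating = True

    def direction(self) -> Tuple[str, str]:
        """(source, target) of replication."""
        return self.primary, self.secondary

    def promote(self) -> None:
        """replication-promote: the secondary becomes the primary and
        replication stops (the old primary may be unreachable)."""
        self.primary, self.secondary = self.secondary, self.primary
        self.replicating = False

    def reenable(self) -> None:
        """replication-reenable: resume replication, now from the promoted
        volume back to the former primary."""
        if self.replicating:
            raise RuntimeError("replication is already active")
        self.replicating = True
```

For example, a pair created as `ReplicatedPair("site-a", "site-b")` ends up replicating from `site-b` to `site-a` after `promote()` followed by `reenable()`.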
Consistency Groups
- New in Juno
- Support for grouping volumes for consistency
- Grouping of volumes is based on the volume type
- Supports
  - Consistency group snapshots
- Needs to be extended to support
  - Cinder backup
  - Cinder volume replication
(Figure: DB and LOG volumes in one consistency group.)
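For the DB/LOG example, a Juno-era workflow might look like the plan below: create a group for a volume type, create both volumes in it, then snapshot the group as a unit. The commands are from the Juno-era python-cinderclient v2 shell as we recall them and may differ or be unavailable in later releases; the group, type and volume names are hypothetical, and `cg_id` is whatever ID `consisgroup-create` returns.

```python
from typing import List, Tuple


def consistency_group_plan(cg_name: str, volume_type: str, cg_id: str,
                           volumes: List[Tuple[str, int]],
                           snapshot_name: str) -> List[List[str]]:
    """Ordered cinder CLI calls: group, member volumes, group snapshot."""
    cmds = [["cinder", "consisgroup-create", "--name", cg_name, volume_type]]
    for name, size_gb in volumes:
        # Every member must use a volume type the group was created with.
        cmds.append(["cinder", "create", "--name", name,
                     "--volume-type", volume_type,
                     "--consisgroup-id", cg_id, str(size_gb)])
    # One snapshot covering all member volumes at the same point in time.
    cmds.append(["cinder", "cgsnapshot-create", "--name", snapshot_name, cg_id])
    return cmds


if __name__ == "__main__":
    for cmd in consistency_group_plan("app-cg", "dr-type", "CG_ID",
                                      [("db", 10), ("log", 2)], "app-snap"):
        print(cmd)
```

Snapshotting the group rather than each volume separately is what keeps the DB and LOG copies mutually consistent.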
Protecting Applications from Disasters
(Figure: disaster recovery orchestration spans servers, storage, network, software and configuration.)
OpenStack Tools
- Applications are defined in OpenStack by Heat Orchestration Templates
- However:
  - Not all applications are template based
  - Deployments (including configuration) change over time
  - Some definitions are cloud specific, e.g., networks, types
  - Heat templates and stacks don't stay consistent
- Tools such as Flame and ReHeat can create a template from a deployment
- But the template will only fit the current cloud
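One way to address the last point, assuming the cloud-specific values are known for both sites, is to rewrite a generated template before deploying it at the recovery site. The sketch below works on a template already parsed into Python dicts (e.g. with a YAML loader); the resource names, IDs and the mapping table are hypothetical, and only exact string matches in values are replaced.

```python
from typing import Any, Dict


def remap(node: Any, mapping: Dict[str, str]) -> Any:
    """Return a copy of a template fragment with cloud-specific values
    (network IDs, volume types, flavors, ...) swapped for recovery-site ones.
    Keys are left untouched; the input is not modified."""
    if isinstance(node, dict):
        return {k: remap(v, mapping) for k, v in node.items()}
    if isinstance(node, list):
        return [remap(v, mapping) for v in node]
    if isinstance(node, str):
        return mapping.get(node, node)
    return node


TEMPLATE = {
    "heat_template_version": "2013-05-23",
    "resources": {
        "db_volume": {
            "type": "OS::Cinder::Volume",
            "properties": {"size": 10, "volume_type": "gold-primary"},
        },
        "db_server": {
            "type": "OS::Nova::Server",
            "properties": {
                "flavor": "m1.large",
                "image": "db-image",
                "networks": [{"network": "net-primary-uuid"}],
            },
        },
    },
}

# Hypothetical primary -> recovery-site identifiers.
SITE_MAPPING = {"gold-primary": "gold-dr", "net-primary-uuid": "net-dr-uuid"}
```

A plain value substitution is deliberately simple; references such as `get_resource` stay intact because only leaf strings listed in the mapping change.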
OpenStack Tools and Beyond
- Demo: a technology preview of disaster recovery with IBM Cloud Manager
THE ROAD AHEAD
Ceph Multi-Site & Disaster Recovery (Block) example
- Export snapshots to geographically dispersed data centers
  - Provides disaster recovery
- Export incremental snapshots
  - Minimizes network bandwidth by sending only the changes
- Kilo cycle focus: extend the multi-site and disaster recovery options
  - RBD mirroring
  - Cinder volume replication
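With Ceph this is typically done with `rbd export-diff --from-snap` on the source and `rbd import-diff` on the remote cluster. The simulation below only illustrates the idea behind incremental export: compare two snapshots block by block, ship the changed blocks, and apply them to the remote copy of the older snapshot. The block size and image contents are hypothetical, and real RBD diffs track changed extents rather than comparing data.

```python
from typing import Dict, List

Block = bytes


def export_diff(from_snap: List[Block], to_snap: List[Block]) -> Dict[int, Block]:
    """Blocks that changed between two snapshots of the same image."""
    if len(from_snap) != len(to_snap):
        raise ValueError("this sketch assumes the image was not resized")
    return {i: new for i, (old, new) in enumerate(zip(from_snap, to_snap)) if old != new}


def import_diff(remote: List[Block], diff: Dict[int, Block]) -> List[Block]:
    """Apply a diff to the remote copy of the older snapshot."""
    updated = list(remote)
    for i, block in diff.items():
        updated[i] = block
    return updated


def bytes_shipped(diff: Dict[int, Block]) -> int:
    return sum(len(b) for b in diff.values())
```

With eight 4 KiB blocks of which one changed, the diff carries 4 KiB instead of the full 32 KiB image, which is the bandwidth saving the slide refers to.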
Ceph Multi-Site & Disaster Recovery (Object) example
- Zones and region support
  - Deploy topologies similar to S3 and others, with a global namespace
- Data center synchronization
  - Back up full or partial sets of data between regions
- Read affinity
  - Serve local copies of data to local users
Disaster Recovery as a Service Catalog
- Pluggable disaster recovery policies
- Replication targets can specify different RPO/RTO levels, offered according to the capabilities the back end supports
- Disaster recovery policies
  - Active - cold standby
  - Active - hot standby
  - Active - active (requires application awareness and transaction integrity)
  - Backup to the cloud / from the cloud
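A service catalog of this kind ultimately has to match a tenant's RPO/RTO requirement to an offered policy. The sketch below picks the cheapest catalog entry that satisfies both objectives; the catalog entries, their RPO/RTO figures and relative costs are made-up numbers for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DRPolicy:
    name: str
    rpo_s: float        # worst-case data loss the back end can offer, seconds
    rto_s: float        # worst-case time to be operational again, seconds
    relative_cost: int  # hypothetical price tier


# Illustrative catalog; real values depend on back-end capabilities.
CATALOG: List[DRPolicy] = [
    DRPolicy("backup-to-cloud", rpo_s=24 * 3600, rto_s=8 * 3600, relative_cost=1),
    DRPolicy("active-cold-standby", rpo_s=15 * 60, rto_s=4 * 3600, relative_cost=2),
    DRPolicy("active-hot-standby", rpo_s=60, rto_s=15 * 60, relative_cost=4),
    DRPolicy("active-active", rpo_s=0, rto_s=60, relative_cost=8),
]


def choose_policy(max_rpo_s: float, max_rto_s: float,
                  catalog: List[DRPolicy] = CATALOG) -> Optional[DRPolicy]:
    """Cheapest policy meeting both objectives, or None if none qualifies."""
    eligible = [p for p in catalog if p.rpo_s <= max_rpo_s and p.rto_s <= max_rto_s]
    return min(eligible, key=lambda p: p.relative_cost, default=None)
```

For instance, requiring at most one hour of data loss and one hour of downtime rules out backup and cold standby and selects the hot standby entry, since active-active meets the target but costs more.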
Extending Heat Orchestration for Disaster Recovery
- Heat can be used to automate disaster recovery
  - Add support for Cinder replication
- Need consistency groups across OpenStack projects: Nova, Cinder, Trove, …
- Stack snapshot backup / rollback
- Enable customization of workload components at the recovery site: networks, VM configuration changes, guest agent, etc.
The Road Toward Application Consistency
First phase: File system consistency
- Integrate into OpenStack to allow consistent snapshots and backups
  - Nova needs to request the QEMU Guest Agent to freeze the file systems (and applications, if fsfreeze-hook is installed) during the snapshot
- Patches have been proposed for Nova and Cinder, targeting the Kilo release
Source: Hitachi
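The freeze/snapshot/thaw sequence is easy to get wrong if the thaw is skipped on failure, leaving the guest's file systems blocked. The sketch below wraps it in a context manager that always thaws. It sends the QEMU guest agent's `guest-fsfreeze-freeze` and `guest-fsfreeze-thaw` commands through an injected `send` function; in practice that function might wrap `virsh qemu-agent-command <domain> '<json>'`, which is an assumption about the deployment, not how Nova's proposed patches work.

```python
import json
from contextlib import contextmanager
from typing import Callable, Iterator

# Sends one guest-agent JSON command and returns the raw reply.
Send = Callable[[str], str]


def qga(command: str) -> str:
    """Encode a guest-agent command without arguments."""
    return json.dumps({"execute": command})


@contextmanager
def frozen_filesystems(send: Send) -> Iterator[None]:
    """Freeze guest file systems for the duration of the block.

    The thaw is issued even if the snapshot inside the block raises,
    so the guest is never left with frozen file systems.
    """
    send(qga("guest-fsfreeze-freeze"))
    try:
        yield
    finally:
        send(qga("guest-fsfreeze-thaw"))
```

Usage: `with frozen_filesystems(send): take_snapshot()`, where `take_snapshot` stands for whatever volume or instance snapshot call the caller uses; keeping the frozen window that short matters because guest writes block while frozen.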
The Road Toward Application Consistency
Next phase: Consistency at the application level
- Application-aware on Windows with VSS support in qemu-ga
  - Applications are notified via Microsoft Volume Shadow Copy Service (VSS)
- Application-aware on Linux using qemu-ga hooks
  - Application-consistent snapshots can be created with scripts interacting with the QEMU guest agent
  - The scripts can notify applications to flush their data
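On Linux, qemu-ga runs its fsfreeze hook with the argument `freeze` before freezing and `thaw` after thawing, and the sample hook shipped with QEMU runs scripts from a `fsfreeze-hook.d` directory with the same argument. The sketch below is a minimal hook script in that shape; `flush_application` and `resume_application` are hypothetical placeholders for application-specific commands (for a database, its own flush-and-hold-writes mechanism).

```python
import sys
from typing import Callable, Dict, List

CALLS: List[str] = []  # records what the hook did (useful for testing)


def flush_application() -> None:
    # Hypothetical: ask the application to flush buffers and hold writes.
    CALLS.append("flush")


def resume_application() -> None:
    # Hypothetical: let the application resume writing.
    CALLS.append("resume")


ACTIONS: Dict[str, Callable[[], None]] = {
    "freeze": flush_application,
    "thaw": resume_application,
}


def main(argv: List[str]) -> int:
    """Dispatch on the single argument passed by qemu-ga: freeze or thaw."""
    if len(argv) != 2 or argv[1] not in ACTIONS:
        prog = argv[0] if argv else "fsfreeze-hook"
        print(f"usage: {prog} freeze|thaw", file=sys.stderr)
        return 1
    ACTIONS[argv[1]]()
    return 0

# When installed as an executable hook script, finish with:
#     sys.exit(main(sys.argv))
```

A non-zero exit from a hook is treated as a failure, so the script should only return non-zero when the application genuinely could not be quiesced.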
Disaster Recovery at Scale
- The holy grail of site evacuation is an automatic, planned migration of the workloads and data from one cloud-scale data center to another.
- New OpenStack HA approaches help recovery from infrastructure failures:
  - Leveraging Pacemaker to provide automated detection of a failed hypervisor and recovery of the VMs that were running on it
  - Evacuating an instance to a host chosen by the scheduler was added in Juno
  - A simple tagging API for instances in Nova was accepted for the Kilo release
    - It can support a new automatic-recovery tag
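Combining the last two points, a recovery agent could evacuate only the instances on a failed host that carry the automatic-recovery tag, leaving placement to the scheduler. The sketch below computes that plan as `nova evacuate <server>` argument lists with no target host; the `Instance` record and host names are hypothetical, and how the failure is detected (e.g. by Pacemaker) is out of scope here.

```python
from dataclasses import dataclass, field
from typing import List, Set

RECOVERY_TAG = "automatic-recovery"  # tag name suggested on the slide


@dataclass
class Instance:
    server_id: str
    host: str
    tags: Set[str] = field(default_factory=set)


def evacuation_plan(instances: List[Instance], failed_host: str) -> List[List[str]]:
    """Evacuate commands for opted-in instances on the failed host.

    No target host is passed, so the scheduler chooses where each
    instance is rebuilt.
    """
    return [
        ["nova", "evacuate", inst.server_id]
        for inst in instances
        if inst.host == failed_host and RECOVERY_TAG in inst.tags
    ]
```

Making recovery opt-in through a tag keeps instances that should not be restarted automatically (for example, ones managed by their own HA layer) out of the plan.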
OpenStack Documentation needs to catch up…
- Join the OpenStack Disaster Recovery Guide
- We have a basic OpenStack High Availability Guide
  - http://docs.openstack.org/high-availability-guide/content/
- A very outdated “Recover cloud after disaster” section in the Admin Guide
  - http://docs.openstack.org/admin-guide-cloud/content/section_nova-disaster-recovery-process.html
Q&A
THANK YOU