Data Protection, Recovery, and HA for Private Cloud Deployments Lawrence To Sr. Director, MAA Development Oracle High Availability Systems

Joseph Meeks Sr. Director, Product Management Oracle High Availability Systems

Seungtaek Lee Principal Engineer CI-TEC Samsung

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |


Program Agenda

1. Data Protection and Oracle MAA
2. Bronze and Silver Data Protection
3. Gold and Platinum Data Protection
4. Data Protection as a Service
5. Samsung MAA Architecture for Private Cloud


A Sky-is-Blue Statement


High Availability

Data Protection


Inadequate Data Protection = Downtime

U.S. State Government – SAN memory failure, problem mirrored to standby SAN.

European Cloud Infrastructure Provider – Storage array failed, unable to read tape backups used for DR

5-day outage

Global Specialty Retailer – Disk failure, followed by mirrored disk failure. Restore from local backup failed. Restore using copy at DR site also failed.

8-day outage

5-day outage


Oracle Data Protection and Availability Design Principles

Data Protection at Every Level

Strong Fault Isolation: Real-Time Validation

Real-time HA/DR: All Components Active


Oracle Maximum Availability Architecture (MAA)

Production Site
• RAC – Scalability, Server HA
• ASM – Local storage protection
• Flashback – Human error correction
• RMAN, Oracle Secure Backup, Recovery Appliance – Backup to disk, tape or cloud

Active Replica
• Active Data Guard – Data Protection, DR, Query Offload
• GoldenGate – Active-active replication, Heterogeneous

Across Sites
• Edition-based Redefinition, Online Redefinition, Data Guard, GoldenGate – Minimal downtime maintenance, upgrades, migrations
• Enterprise Manager Cloud Control – Site Guard, Coordinated Site Failover
• Application Continuity – Application HA
• Global Data Services – Service Failover / Load Balancing


Oracle MAA Availability Tiers
Availability Service Levels for Unplanned and Planned Outages

BRONZE
• Basic Service Restart
• Backups plus redo for Oracle data protection

SILVER
• High Availability (HA) for Recoverable Local Outages
• Backups plus redo for Oracle data protection

GOLD
• Comprehensive HA and Disaster Protection
• Recovery in seconds with zero or near-zero data loss

PLATINUM
• Zero Outage for Platinum Ready Applications
• Zero data loss


Reference Architectures Oracle MAA Availability Tiers

BRONZE

SILVER

GOLD

PLATINUM

Single Instance

Replication

Backups

Platinum-Ready Apps

Clusters

Backups

Clusters

Clusters and Replication


Program Agenda

1. Data Protection and Oracle MAA
2. Bronze and Silver Data Protection
3. Gold and Platinum Data Protection
4. Data Protection as a Service
5. Samsung MAA Architecture for Private Cloud


Problem Statement: Lack of Intelligent Data Validation

• Data can be corrupted anywhere and at any time, and the corruption can go undetected until the data is touched
  – A checksum alone is not sufficient
• Backups and DR without validation are an enormous risk
  – They do not guarantee that recovery works or meets recovery SLAs
• Validation is helpful everywhere: I/O, memory, storage, Oracle data block, inter-block, database and application

How do we know restore and recovery will succeed? Is my mirrored copy corrupt too? Can I achieve recovery SLAs?


Third-Party Backups Lack Data Protection and Validation

• A backup is meaningless if it does not result in successful recovery
• A recovery operation can fail if:
  – The backup script is incorrect or incomplete (e.g. missing data files, archives, control files)
  – The backup operation is incorrect (e.g. online backups without the database in backup mode)
  – The backups are corrupted (from source, from storage or media)
• Most backup appliances do not have ongoing checks and validations
• Reality: The inability to recover successfully results in extended downtime, lost revenue, damaged reputations… and career changes.
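An incomplete backup script is the kind of failure that can be caught long before a restore is attempted. The following is a minimal sketch in Python, not an Oracle or RMAN interface: the function name `missing_from_backup`, the inventory layout, and every file name are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def missing_from_backup(required: Dict[str, Set[str]],
                        backed_up: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per category, the required items that the backup does not contain."""
    gaps: Dict[str, List[str]] = {}
    for category, items in required.items():
        absent = sorted(items - backed_up.get(category, set()))
        if absent:
            gaps[category] = absent
    return gaps


# Hypothetical inventory of what a complete backup must contain.
required = {
    "datafiles": {"system01.dbf", "sysaux01.dbf", "users01.dbf"},
    "control_files": {"control01.ctl"},
    "archived_logs": {"arch_101.arc", "arch_102.arc"},
}

# Hypothetical listing of what the backup job actually captured.
backed_up = {
    "datafiles": {"system01.dbf", "sysaux01.dbf"},
    "control_files": {"control01.ctl"},
    "archived_logs": {"arch_101.arc"},
}

gaps = missing_from_backup(required, backed_up)
for category, names in gaps.items():
    print(f"{category}: missing {', '.join(names)}")
```

A check like this only proves that the expected pieces exist; it says nothing about whether their contents are valid, which is why content validation is still needed.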


A Story about Backup and Recovery...

• The story starts out great.
• Backups completed successfully with no errors.
• Everything appears fine.

RMAN-03091: Finished backup at 11-SEP-14
RMAN>
Recovery Manager complete


Then a Failure Occurs...

I hope this works....


Restore and Recovery is Required


Missing Data File or Archive Gap
Recovery Gone Wrong - Example 1

RMAN-03002: failure of restore command at 09/11/2014 07:48:13
RMAN-06026: some targets not found – aborting restore
RMAN-06023: no backup or copy of datafile 4 found to restore
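An error stack like the one above is easier to alert on once the message codes and the affected datafile number are extracted. The sketch below is an illustration only: the regular expressions are assumptions based on the three lines shown here, not a complete grammar for RMAN output, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches a message code such as RMAN-06023 or ORA-19612 at the start of a line.
CODE_RE = re.compile(r"^(RMAN|ORA)-(\d{5}):\s*(.*)$")
# Matches the wording of the RMAN-06023 line shown in the example.
DATAFILE_RE = re.compile(r"no backup or copy of datafile (\d+) found")


def parse_error_stack(text: str) -> List[Tuple[str, str]]:
    """Return (code, message) pairs for every coded line in the text."""
    entries = []
    for line in text.splitlines():
        match = CODE_RE.match(line.strip())
        if match:
            code = f"{match.group(1)}-{match.group(2)}"
            entries.append((code, match.group(3)))
    return entries


def missing_datafiles(text: str) -> List[int]:
    """Return the datafile numbers reported as having no backup or copy."""
    return [int(number) for number in DATAFILE_RE.findall(text)]


sample = """\
RMAN-03002: failure of restore command at 09/11/2014 07:48:13
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 4 found to restore
"""

codes = [code for code, _ in parse_error_stack(sample)]
print(codes, missing_datafiles(sample))
```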


Backups are corrupt, likely due to corrupt backup media
Recovery Gone Wrong - Example 2

channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /SHARED1/ORADATA/DBF/orcloow1/system01.dbf
channel ORA_DISK_1: reading from backup piece /SHARED1/ORADATA/FRA/Orpiafbb_1_1ORCL00W1
channel ORA_DISK_1: ORA-19870: error while restoring backup piece /SHARED1/ORADATA/FRA/Orpiafbb_1_1O...
ORA-19612: datafile 1 not restored due to missing or corrupt data


Bronze and Silver Data Protection

We can do a much better job preventing and repairing corruptions in real time.

I’d like to know that my backups are validated when they are created, and on a regular basis to make sure they are good. I want to be alerted whenever a database can NOT meet my recovery SLAs.



Bronze - Single Instance Oracle Database (MOS 1302539.1)
Oracle Data Protection

Manual
• Dbverify, Analyze
  – Physical block corruption: physical block checks
  – Logical block corruption: logical checks for intra-block and inter-object consistency
• RMAN, ASM
  – Physical block corruption: physical block checks
  – Logical block corruption: intra-block logical checks

Runtime
• Database
  – Physical block corruption: in-memory block and redo checksum
  – Logical block corruption: in-memory intra-block checks
• ASM
  – Physical block corruption: automatic corruption detection and repair using extent pairs
• Exadata
  – Physical block corruption: HARD checks on write, automatic disk scrubbing and repair
  – Logical block corruption: HARD checks on write


MAA: Data Protection for All Databases

Validation, Detection and Repair in Memory, during I/O and on Disk

DB_BLOCK_CHECKSUM=FULL (Optional DB_BLOCK_CHECKING)
• Computes a checksum on change and catches corruptions in memory
• Validates the checksum on read and update (DETECTION)
• Prevents a corrupted block from being written to disk (PREVENTION)
• Recovers using a good data block and redo (REPAIR)

Automatic Storage Management
• Data corruption or I/O error triggers repair (DETECTION/REPAIR)
• Oracle semantics aware
• Reads extent copies for a good copy (PREVENTION of ERROR)
• Good writes can correct existing corruptions (REPAIR)

Exadata HARD and Automatic Disk Scrub and Repair
• Prevents physical corruption during writes (OS to storage) (PREVENTION)
• Inspects and repairs hard disk corruption that resides on storage (DETECTION)
• Calls ASM to repair using a good extent copy (REPAIR)
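The checksum-on-change and validate-before-write idea can be illustrated with a toy model. This is a sketch under stated assumptions, not Oracle's block format or checksum algorithm: CRC32 from Python's `zlib` stands in for the real checksum, a dictionary stands in for disk, and the names `seal`, `is_valid`, `write_block`, and `CorruptBlockError` are hypothetical.

```python
import zlib


class CorruptBlockError(Exception):
    """Raised when a block fails checksum validation on the write path."""


def seal(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32, computed when the block changes."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def is_valid(block: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored value."""
    if len(block) < 4:
        return False
    return int.from_bytes(block[:4], "big") == zlib.crc32(block[4:])


def write_block(storage: dict, block_no: int, block: bytes) -> None:
    """Refuse to persist a block whose checksum does not match (prevention)."""
    if not is_valid(block):
        raise CorruptBlockError(f"block {block_no} failed checksum validation")
    storage[block_no] = block


storage = {}
good = seal(b"row data")
write_block(storage, 1, good)

# Simulate an in-memory corruption: flip the bits of the last payload byte.
damaged = bytearray(good)
damaged[-1] ^= 0xFF

try:
    write_block(storage, 2, bytes(damaged))
    rejected = False
except CorruptBlockError:
    rejected = True
```

The damaged block never reaches the simulated disk, so a later read cannot return it; repairing the in-memory copy from a good block and redo is outside the scope of this sketch.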

Bad SCN

Good SCN


With ASM Redundancy
Corruption Detection, Mirror Read, and Automatic Repair

Database update encounters corruption
Database reads ASM mirror copy and repairs corruption

Oracle logs the following for the administrator: Corrupt block relative dba: 0x16400087 (file 89, block 135)

Bad check value found during multiblock buffer read

Data in bad block:

type: 6 format: 2 rdba: 0x16400087

last change scn: 0x0000.b6702b33 seq: 0x1 flg: 0x04

spare1: 0x0 spare2: 0x0 spare3: 0x0

consistency value in tail: 0x2b330601

check value in block header: 0xa07a

computed block checksum: 0x3

Reading datafile '+DATA/qs/datafile/c.257.825768683' for corruption at rdba: 0x16400087 (file 89, block 135)

Read datafile mirror ‘DATA_CD_08_CELL13' (file 89, block 135) found same corrupt data (no logical check)

Read datafile mirror ‘DATA_CD_07_CELL14' (file 89, block 135) found valid data

Hex dump of (file 89, block 135) in trace file /u01/app/oracle/diag/… /qs1_ora_60475.trc

Repaired corruption at (file 89, block 135)

Application continues to run without ever noticing the failure
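The sequence in the log above (corrupt copy, a mirror with the same corruption, a second mirror with valid data, then repair) can be sketched in a few lines. This is a simplified model, not ASM's implementation: CRC32 stands in for the block checksum, in-memory byte arrays stand in for extent copies, and `read_with_repair` is a hypothetical name.

```python
import zlib
from typing import List, Optional


def seal(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 (stand-in for a block checksum)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def checksum_ok(block: bytes) -> bool:
    """Return True when the stored CRC32 matches the payload."""
    return len(block) >= 4 and int.from_bytes(block[:4], "big") == zlib.crc32(block[4:])


def read_with_repair(copies: List[bytearray]) -> Optional[bytes]:
    """Return the first valid copy and overwrite every invalid copy with it.

    Returns None when no copy passes validation, in which case nothing is changed.
    """
    good = next((bytes(c) for c in copies if checksum_ok(bytes(c))), None)
    if good is None:
        return None
    for copy in copies:
        if not checksum_ok(bytes(copy)):
            copy[:] = good  # repair in place from the good copy
    return good


valid = seal(b"file 89, block 135")
corrupt = bytearray(valid)
corrupt[5] ^= 0x01  # flip one bit in the payload

# Primary copy and first mirror are corrupt; second mirror is valid.
copies = [bytearray(corrupt), bytearray(corrupt), bytearray(valid)]
result = read_with_repair(copies)
```

The caller receives valid data and all three copies end up identical, which mirrors why the application in the slide never sees an error.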


Exadata Disk Scrubbing Combined with ASM Auto Repair
I/O Error Prevention

Disk sector goes bad
Cell disk scrub finds bad sector and ASM repairs it
Application never encounters the I/O error

Oracle logs the following for the administrator:

Wed Jul 16 17:00:06 2014

Begin scrubbing CellDisk:CD_06_cell06.

Begin scrubbing CellDisk:CD_07_cell06.

..

Wed Jul 16 18:33:05 2014

Read Error on Cell Disk CD_06_cell06 (/dev/sdg) at device offset 2794140467200 bytes with size 1048576 bytes (errno: Input/output error [5])

Read Error on Grid Disk RECOC1_CD_06_cell06 at grid disk offset 423268188160 bytes with size 1048576 bytes from disk scrub

Wed Jul 16 18:33:12 2014

Broadcast: 1 events ASM REPAIR diskgroup of opcode 10 for diskgroup RECOC1 to:

...

Finished scrubbing CellDisk:CD_06_cell06, scrubbed blocks (1MB):2860960, found bad blocks:2

Finished scrubbing CellDisk:CD_07_cell06, scrubbed blocks (1MB):2860960, found bad blocks:0

..
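Scrub results like the ones above are easy to summarize per cell disk. The sketch below is illustrative: the regular expression is derived only from the two "Finished scrubbing" lines shown here and may not cover other log variants, and `scrub_summary` is a hypothetical helper name.

```python
import re
from typing import Dict, Tuple

# Pattern derived from the "Finished scrubbing" lines in the log excerpt.
FINISHED_RE = re.compile(
    r"Finished scrubbing CellDisk:(?P<disk>[^,\s]+), "
    r"scrubbed blocks \(1MB\):(?P<scrubbed>\d+), "
    r"found bad blocks:(?P<bad>\d+)"
)


def scrub_summary(log_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each cell disk to (scrubbed 1MB blocks, bad blocks found)."""
    return {
        m.group("disk"): (int(m.group("scrubbed")), int(m.group("bad")))
        for m in FINISHED_RE.finditer(log_text)
    }


log_text = """\
Finished scrubbing CellDisk:CD_06_cell06, scrubbed blocks (1MB):2860960, found bad blocks:2
Finished scrubbing CellDisk:CD_07_cell06, scrubbed blocks (1MB):2860960, found bad blocks:0
"""

summary = scrub_summary(log_text)
disks_with_bad_blocks = [disk for disk, (_, bad) in summary.items() if bad > 0]
print(disks_with_bad_blocks)
```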


Exadata Hardware Assisted Resilient Data (HARD)
Corruption Prevention with Automatic Retry

Network packet containing database write is corrupted
Cell prevents write of corrupt block and ASM retries write
Application never encounters a corruption

Oracle logs the following for the Administrator:

Cell side:

Thu Sep 11 08:42:33 2014

HARD CHECK FAILED for ftyp=0 blksiz=512 blkno=0checks=1 startblk=33182326784 nblks=16

Database side:

Errors in file /u01/app/oracle/diag/rdbms/qs/qs1/trace/qs1_dbwf_41262.trc:

ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.10.29;192.168.10.30/DATAC1_CD_02_CELL7 at offset 151396352 for data length 8192

ORA-27626: Exadata error: 205 (HARD check failed)

WARNING: Write Failed, will retry. group:1 disk:74 AU:36 offset:401408 size:8192


Bronze and Silver Data Protection

We can do a much better job preventing and repairing corruptions in real time.

I’d like to know that my backups are validated when they are created, and on a regular basis to make sure they are good. I want to be alerted whenever a database can NOT meet my recovery SLAs.



Zero Data Loss Recovery Appliance Overview
Data Protection for Your Backups, Recovery for Your Business

Protected Databases
• Protects all Oracle Databases
• Petabytes of data, any release
• No expensive backup agents

Delta Push
• Access and send only changes
• Minimal impact on production
• Data Guard-like real-time redo ship instantly protects new transactions

ZDLRA: Delta Store – virtual full backups
• Stores validated, compressed changes on disk
• Fast restores to any point-in-time using deltas and redo
• Built on Exadata scaling and resilience
• Enterprise Manager end-to-end control

Offloads Tape Backup

Replicates to Remote ZDLRA for disaster recovery


ZDLRA Understands and Validates Database Formats
End-to-end Data Loss Protection from Corruptions

Recovery Appliance
• Data validated on receive
• Data periodically revalidated
• Data validated on restore
• Built using MAA practices
• ASM auto repair
• Exadata HARD checks and automatic disk scrub/repair

Remote Replica
• Data validated on receive, restore, and periodically
• ASM and Exadata checks and repair

Tape Archive
• Data validated when copied to and restored from tape


Policy-Based Database Protection as a Service

Protection Policies
• Easy-to-deploy
• Standardized
• Alerts when not meeting Recovery SLAs

Platinum and Gold Policy, Mission Critical
• Disk: 90 days
• Tape: 2 years
• RPO: 5 secs

Silver Policy, Business Critical
• Disk: 30 days
• Tape: 45 days
• RPO: 15 mins

Bronze Policy, Test/Dev
• Disk: 5 days
• Tape: 30 days
• RPO: 1 hour

Replica ZDLRA also policy based
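The three policies above map naturally onto a small data structure with an RPO check. This is a sketch for illustration, not the Recovery Appliance's policy interface: the class `ProtectionPolicy`, the dictionary keys, and the function `violates_rpo` are hypothetical, and "2 years" is approximated as 730 days.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionPolicy:
    name: str
    disk_retention: timedelta
    tape_retention: timedelta
    rpo: timedelta  # maximum tolerated window of unprotected data


# Values taken from the slide; 2 years is approximated as 730 days.
POLICIES = {
    "platinum_gold": ProtectionPolicy(
        "Platinum and Gold, Mission Critical",
        timedelta(days=90), timedelta(days=730), timedelta(seconds=5)),
    "silver": ProtectionPolicy(
        "Silver, Business Critical",
        timedelta(days=30), timedelta(days=45), timedelta(minutes=15)),
    "bronze": ProtectionPolicy(
        "Bronze, Test/Dev",
        timedelta(days=5), timedelta(days=30), timedelta(hours=1)),
}


def violates_rpo(policy: ProtectionPolicy, unprotected_window: timedelta) -> bool:
    """Return True when the current unprotected window exceeds the policy's RPO."""
    return unprotected_window > policy.rpo


# A database that has gone 20 minutes without shipping changes.
window = timedelta(minutes=20)
alerts = [key for key, policy in POLICIES.items() if violates_rpo(policy, window)]
print(alerts)
```

With a 20-minute window, the mission-critical and business-critical policies would raise an alert while the test/dev policy would not.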


Single Console for End-to-End Visibility and Control


Immediate Alerts when Recovery Windows are at Risk


Let’s Summarize BRONZE and SILVER

MAA parameters

ASM redundancy

RMAN Backups

ZDLRA so we can count on successful restore when required

Exadata-unique capabilities for the best database protection and availability


Program Agenda

1. Data Protection and Oracle MAA
2. Bronze and Silver Data Protection
3. Gold and Platinum Data Protection
4. Data Protection as a Service
5. Samsung MAA Architecture for Private Cloud


Storage Remote Mirroring Architecture
Problem: No real-time validation. Corruption and other problems are mirrored

Oracle Instance (in memory)
Primary Database: Database Files, Recovery Files
SYNC or ASYNC block replication
Remote Volumes

Data corruptions are replicated
• Zero Oracle validation
• No Oracle block checks
• No database recovery checks
• No application validation

ORA-01578: ORACLE data block corrupted (file # 27, block # 331214)


Gold and Platinum – Comprehensive Data Protection
Oracle Data Protection

Manual
• Dbverify, Analyze
  – Physical block corruption: physical block checks
  – Logical block corruption: logical checks for intra-block and inter-object consistency
• RMAN, ASM
  – Physical block corruption: physical block checks
  – Logical block corruption: intra-block logical checks

Runtime
• Active Data Guard
  – Physical block corruption: continuous physical block checking at standby; strong isolation to prevent single point of failure; automatic repair of physical corruptions; automatic database failover
  – Logical block corruption: detect lost write corruption, auto shutdown and failover; intra-block logical checks at standby
• Database
  – Physical block corruption: in-memory block and redo checksum
  – Logical block corruption: in-memory intra-block checks
• ASM
  – Physical block corruption: automatic corruption detection and repair using extent pairs
• Exadata
  – Physical block corruption: HARD checks on write, automatic disk scrub and repair
  – Logical block corruption: HARD checks on write


Active Data Guard Architecture
Oracle Aware Process Maintains an Exact Physical Copy of Production

Primary Database: Oracle Instance (in memory), Database Files, Recovery Files
SYNC or ASYNC database redo
Active Standby Database open read-only: Oracle Instance (in memory), Redo Apply

Data corruption is isolated to primary
• Comprehensive run-time validation
  – By Data Guard apply
  – By read-only application workload
• Automatic repair of primary using good copy from standby

Automatic block media recovery requested for (file#6, block #8738)
Automatic block media recovery successful for (file#6, block #8738)


Active Data Guard Demonstrations
Environment: Primary, RAC Far Sync, Active Data Guard Standby

Primary Database
• Production instance

Far Sync Instance
• Oracle control file and log files
• No database files, no media recovery
• Offload transport compression
• Supports up to thirty remote destinations

Active Standby Database
• DR and reporting instance
• Open read-only
• Continuous Oracle validation
• Zero data loss failover target
• Manual or automatic failover

Redo Transport
• SYNC – limited distance
• ASYNC – any distance, transport compression over WAN


Demonstration: Automatic Repair of Primary Data Corruption



Active Data Guard Configuration
• Data Guard Broker
• Primary Database
• Far Sync Instance
  – Zero data loss over WAN
• Active standby
  – Workload on both primary and standby

Configuration – DGconfig
  Protection Mode: MaxAvailability
  Members:
  primary – Primary Database
  farsync – Far sync instance
  standby – Physical standby database

Primary Workload
13:46:54 175 8
13:46:55 173 8
...

Standby Workload
13:46:54 443 2
13:46:55 446 2
...


Corruption Detected at the Primary Database

Fri Sep 12 13:47:50 2014
Corrupt block relative dba: 0x00002222 (file 6, block 8738)
Completely zero block found during multiblock buffer read


Automatic Repair, No Application Error, Zero Downtime

Primary Workload
13:47:49 180 6
13:47:50 185 6
13:47:51 174 6
...

Standby Workload
13:47:49 446 1
13:47:50 450 1
13:47:51 455 1
...

Fri Sep 12 13:47:50 2014
Automatic block media recovery is requested for (file# 6, block 8738)
Fri Sep 12 13:47:51 2014
Automatic block media recovery successful for (file# 6, block 8738)


Demonstration: Automatic Repair of Standby Data Corruption



Corruption Detection and Auto-Repair at the Standby

Fri Sep 12 13:54:33 2014
Corrupt block relative dba: 0x00002222 (file 6, block 8738)
Completely zero block found during multiblock buffer read

Fri Sep 12 13:54:33 2014
Automatic block media recovery is requested for (file# 6, block 8738)
Fri Sep 12 13:54:34 2014
Automatic block media recovery successful for (file# 6, block 8738)

Primary Workload
13:54:33 181 5
13:54:34 180 5
...

Standby Workload
13:54:33 450 3
13:54:34 442 3
...


Demonstration: HA for Far Sync Instance



Primary disconnects and reconnects to second RAC Far Sync instance

Fri Sep 12 13:57:04 2014
LGWR: Error 1041 disconnecting from dest LOG_ARCHIVE_DEST_2 standby host ‘farsyncp’
LGWR: RFS network connection re-established at host ‘farsyncp’
LGWR: RFS destination opened for reconnect at host ‘farsyncp’

Shutdown abort of the first RAC Far Sync Instance

Fri Sep 12 13:57:06 2014
Instance shutdown complete


Primary has brief brownout while network connection transitions from one node to the next
Data Protection Mode remains at Maximum Availability and RPO=0 is maintained

Primary Workload        Standby Workload
13:57:03 179 5          13:57:03 441 1
13:57:04 188 5          13:57:04 450 1
13:57:05  27 5          13:57:05 449 1
13:57:06 148 5          13:57:06 438 1
13:57:07 177 5          13:57:07 442 1
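The brownout in the primary workload samples can be picked out programmatically. The sketch below rests on two assumptions that the slide does not state: that the second column is a throughput figure, and that "brownout" can be defined as a sample below half of the median. The function name `brownout_samples` and the 0.5 threshold are hypothetical.

```python
from statistics import median
from typing import List, Tuple

# (timestamp, second column) pairs from the primary workload samples.
# Assumption: the second column is a throughput measure.
primary_samples: List[Tuple[str, int]] = [
    ("13:57:03", 179),
    ("13:57:04", 188),
    ("13:57:05", 27),
    ("13:57:06", 148),
    ("13:57:07", 177),
]


def brownout_samples(samples: List[Tuple[str, int]],
                     threshold: float = 0.5) -> List[str]:
    """Return timestamps whose value falls below threshold * median of all values."""
    baseline = median(value for _, value in samples)
    return [stamp for stamp, value in samples if value < threshold * baseline]


dips = brownout_samples(primary_samples)
print(dips)
```

Under these assumptions only the 13:57:05 sample qualifies, which matches the one-second dip visible in the table while the standby column stays steady.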


Demonstration: Automatic Failover with WAN Zero Data Loss



With Fast-Start Failover enabled – kill LGWR process on primary to induce failover

Oracle-> kill -9 3808


Data Guard Observer initiates and completes automatic failover

14:31:03.02 Initiating Fast-Start Failover to database “standby”...
Performing failover NOW, please wait...
Failover succeeded, new primary is “standby”
14:31:43.29

Total outage lasts 56 seconds, including detection, failover, and reconnect time

Primary Workload        Standby Workload
14:30:52 181 3          14:30:52 444 1
14:30:54   0 3          14:30:54 450 1
14:31:06   0 3          14:31:06   0 1
14:31:48   0 3          14:31:49  74 1
14:31:50 122 4          14:31:51 182 1
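The timings on this slide can be checked with simple arithmetic. The sketch below assumes that the 56-second figure is measured from the first zero sample in the primary workload column (14:30:54) to the first non-zero sample afterwards (14:31:50); the slide does not state this explicitly. The helper name `seconds_between` is hypothetical.

```python
from datetime import datetime


def seconds_between(start: str, end: str, fmt: str) -> float:
    """Return the elapsed seconds between two same-day timestamps."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Observer timestamps around the failover itself.
failover_seconds = seconds_between("14:31:03.02", "14:31:43.29", "%H:%M:%S.%f")

# Assumed outage window: first zero sample to first non-zero sample.
outage_seconds = seconds_between("14:30:54", "14:31:50", "%H:%M:%S")

print(f"failover: {failover_seconds:.2f}s, outage: {outage_seconds:.0f}s")
```

Under that assumption the failover itself takes about 40 seconds, and the remaining 16 seconds or so of the 56-second outage are detection before the failover starts and reconnection after it completes.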


Let’s Summarize GOLD and PLATINUM

MAA parameters + lost_write + db_block_checking (standby only)

Active Data Guard provides continuous validation

Auto data block repair for primary and standby

Full utilization of standby for queries and reports

Fast database and application failover in seconds

Zero data loss with SYNC (LAN/MAN) or FAR SYNC (WAN)


Program Agenda

1. Data Protection and Oracle MAA
2. Bronze and Silver Data Protection
3. Gold and Platinum Data Protection
4. Data Protection as a Service
5. Samsung MAA Architecture for Private Cloud


Enterprise Manager Cloud Control 12c

• Comprehensive support for all methods of consolidation

• Automated, intelligent placement

• Complete self-service catalog

• Flexible cloning architecture

• Integrated database lifecycle management

• API-driven (RESTful and command line)

Self-Service Provisioning


Comprehensive Database Service Catalog
Enterprise Manager Cloud Control 12c

Primary Standbys EM12c R4

SI - SI SI

RAC - RAC SI RAC RAC RON - RON RON

SI – Single Instance RAC – Real Application Clusters RON – RAC One Node


• High Availability offerings with Active Data Guard

• Supports Single Instance, RAC One Node, and RAC standby; Multiple standby environments allowed

• Support for Oracle Database 10.2.0.5, 11.1.0.7, 11.2+, 12.1+

• Define your own custom or MAA Service Levels / Metals, and also allow users to upgrade or downgrade across these levels

• Define different Database sizes based on CPU, Memory, Storage, IOPS, etc

BRONZE

SILVER

GOLD


Program Agenda

Data Protection and Oracle MAA

Bronze and Silver Data Protection

Gold and Platinum Data Protection

Data Protection as a Service

Samsung MAA Architecture for Private Cloud


□ Cloud Architecture before Restart and FSFO

Restart and FSFO in Cloud

Weak points

- No support for OS-level HA solutions

- Only hypervisor HA is applied

. Does not detect DB crashes

. After a DB crash, an OS reboot or hypervisor failover is needed; the DBA must start the DB manually

- No DR solution is applied in case of:

. Hypervisor pool down

. Storage or database file failure

[Diagram: a DB running on a hypervisor pool of three hypervisors, protected by hypervisor HA]


□ Architectural Improvement using Oracle Restart and Data Guard Fast Start Failover (FSFO)

Restart and FSFO in Cloud

[Diagram: two groups of three hypervisors, each running a DB under Oracle Restart. The first group is labeled Hypervisor HA; FSFO, labeled SYNC, connects the two DBs.]

Hypervisor HA: Hypervisor Down

Oracle Restart: DB Crash (in 30 sec)

Oracle FSFO: Hypervisor Down, Hypervisor Pool Down, DB File Corruption, DB Crash (over 30 sec), OS Reboot
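The division of work between the three layers can be read as a lookup from failure type to the layers that cover it. The following is a minimal Python sketch of that reading; the failure labels are illustrative strings chosen here and are not identifiers from any Oracle tool.

```python
# Which HA layer lists which failure, following the legend on the slide.
# Keys and failure labels are illustrative, not names used by Oracle software.
COVERAGE = {
    "Hypervisor HA": {"hypervisor down"},
    "Oracle Restart": {"db crash within 30 sec"},
    "Oracle FSFO": {
        "hypervisor down",
        "hypervisor pool down",
        "db file corruption",
        "db crash over 30 sec",
        "os reboot",
    },
}


def layers_for(failure: str) -> list[str]:
    """Return the HA layers that list the given failure, in slide order."""
    return [layer for layer, failures in COVERAGE.items() if failure in failures]


for failure in ("hypervisor down", "hypervisor pool down", "db crash within 30 sec"):
    print(failure, "->", layers_for(failure))
```

Under this reading, a hypervisor failure is covered by two layers, while a hypervisor pool failure is covered only by FSFO.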


Restart and FSFO in Cloud

[Chart: Failover Duration Time (min), axis from 0 to 35. Categories: Server Down, OS Reboot, DB Down. Series: Server Cloud (without Restart/FSFO), Restart, Restart/FSFO. Data labels in extraction order: 1, 1, 0.5, 10, 5, 0.5, 30, 25, 20.]

Item                    Server Cloud (As-Was)         Only Restart                  Restart+FSFO
Block Corruption        Manual Recovery (over 3 hr)   Manual Recovery (over 3 hr)   Auto Repair
DB File Corruption      Manual Recovery (over 3 hr)   Manual Recovery (over 3 hr)   Failover to Standby (in 1 min)
Storage Down            Manual Recovery (over 3 hr)   Manual Recovery (over 3 hr)   Failover to Standby (in 1 min)
Hypervisor Pool Down    Manual Recovery (over 3 hr)   Manual Recovery (over 3 hr)   Failover to Standby (in 1 min)

FSFO Reduces Recovery Time Significantly !!

Restart and FSFO Reduce Failover Time !!

Restart can reduce the time needed to detect a failure and start the DB (our assumption: 20 min)


□ The Result of Availability Test

Restart and FSFO in Cloud

Category     Item                     Recovery Time   HA Solution      Description
OS           DB Server Reboot         46 sec          FSFO             Failover to standby and standby reinstatement executed automatically
OS           Observer Server Reboot   0 sec           MonObserver.sh   Observer restarted automatically after reboot
OS           DB LAN Card Fail         44 sec          FSFO             Failover to standby and standby reinstatement executed automatically
DB           DB Instance Crash        26 sec          Restart          DB instance was restarted automatically
DB           DB Listener Crash        0 sec           Restart          Listener was restarted automatically
DB           GI Stop                  39 sec          FSFO             Failover to standby executed; standby must be reinstated manually
DB           Datafile Write Fail      32 sec          FSFO             Failover to standby executed; standby must be reinstated manually
Observer     Observer Fail            0 sec           MonObserver.sh   Observer restarted automatically
DG Broker    Manual Switch Over       15 sec          DG Broker        Switchover executed by DG Broker
DG Broker    Manual Fail Over         15 sec          DG Broker        Failover executed with automatic standby reinstatement
Hypervisor   Live Migration           0 sec           Hypervisor       Migrated online to another hypervisor
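The test results are easier to compare when grouped by HA solution. The following is a minimal Python sketch that reports the longest recovery time per solution; the item names and times are copied from the results above, while `RESULTS` and `worst_case_by_solution` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# (item, recovery time in seconds, HA solution), copied from the test results.
RESULTS = [
    ("DB Server Reboot", 46, "FSFO"),
    ("Observer Server Reboot", 0, "MonObserver.sh"),
    ("DB LAN Card Fail", 44, "FSFO"),
    ("DB Instance Crash", 26, "Restart"),
    ("DB Listener Crash", 0, "Restart"),
    ("GI Stop", 39, "FSFO"),
    ("Datafile Write Fail", 32, "FSFO"),
    ("Observer Fail", 0, "MonObserver.sh"),
    ("Manual Switch Over", 15, "DG Broker"),
    ("Manual Fail Over", 15, "DG Broker"),
    ("Live Migration", 0, "Hypervisor"),
]


def worst_case_by_solution(results):
    """Return the longest recovery time (seconds) seen for each HA solution."""
    worst = defaultdict(int)
    for _item, seconds, solution in results:
        worst[solution] = max(worst[solution], seconds)
    return dict(worst)


for solution, seconds in worst_case_by_solution(RESULTS).items():
    print(f"{solution}: up to {seconds} sec")
```

Every tested scenario recovered in under one minute, which matches the "Failover to Standby (in 1 min)" figure quoted for Restart+FSFO.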

Maximize Availability using Restart and FSFO

MonObserver.sh : Observer Restart Script, registered as a cron job
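The presentation does not show the contents of MonObserver.sh. As an illustration of the idea only, the following Python sketch checks whether an observer process is running and starts one if it is not. The process pattern, log file path, and connect string are placeholders assumed here, and the function names are hypothetical; only `dgmgrl` and its `start observer` command come from Data Guard Broker itself.

```python
import subprocess
from typing import Callable

# Hypothetical values: the pattern, log path, and connect string below are
# placeholders for illustration and are not taken from the presentation.
OBSERVER_PATTERN = "dgmgrl.*start observer"
START_COMMAND = ["dgmgrl", "-silent", "-logfile", "/tmp/observer.log",
                 "/@observer_alias", "start observer"]


def observer_is_running(pattern: str = OBSERVER_PATTERN) -> bool:
    """Return True if pgrep finds a process whose command line matches."""
    result = subprocess.run(["pgrep", "-f", pattern], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def start_observer(command: list = START_COMMAND) -> None:
    """Launch the observer as a detached background process."""
    subprocess.Popen(command, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, start_new_session=True)


def ensure_observer(is_running: Callable[[], bool] = observer_is_running,
                    start: Callable[[], None] = start_observer) -> bool:
    """Start the observer if it is not running. Return True if a start was issued."""
    if is_running():
        return False
    start()
    return True


# A script run from cron would end with a call such as:
# ensure_observer()
```

The check and the start action are passed in as callables so the restart logic can be exercised without a real observer; a cron schedule of once per minute would be one plausible choice, but the source does not state the interval.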


Non Stop Cloud Active Data Center

[Diagram: Zones A, B, and C, each made up of hypervisor pools of three hypervisors. Restart pods show DBs under Oracle Restart; RAC pods show DB1, DB2, and DB3 under a RAC label. Some pools are labeled Finance Pool. Replication links are labeled FSFO SYNC and SYNC/ASYNC ADG (Manual).]


We can build Oracle in the Cloud


Key Takeaways

• Generic solutions for backup and replication put data at risk and make recovery uncertain

• Continuous validation using knowledge of Oracle block and redo structures is required

– Physical and logical validation

– Regularly scheduled background checks

– Run-time database checks

– Auto repair, transparent to the user when possible

– Automatic recovery and failover when required

• Cloud-enabled to deliver Data Protection as a Service


References

• Oracle Maximum Availability Architecture – www.oracle.com/goto/maa

• Oracle Enterprise Manager 12c Cloud Management – http://www.oracle.com/technetwork/oem/cloud-mgmt/em-dbaas-2104694.html

• The complete Samsung presentation – www.oracle.com/technetwork/database/availability/restart-fsf-in-cloud-oow-2301693.pdf


Key HA Sessions and Demos by Oracle Development

Monday, 29 September, Moscone South
11:45 MAA with Oracle Multitenant – Seeing is Believing, 104
1:30 Oracle Database 12c HA for Consolidation and Cloud, 306
2:45 Zero Data Loss Recovery Appliance, New Era in Data Protection, 307
4:00 Oracle GoldenGate 12c for Oracle Database 12c, 305
5:15 Maximizing Oracle RAC Uptime, 103

Tuesday, 30 September, Moscone South
10:45 Active Data Guard and GoldenGate HA Best Practices, 308
12:00 Zero-Downtime Mantra for Applications with Oracle RAC, 309
3:45 Zero Data Loss Recovery Appliance Best Practices, 305
5:00 Oracle WebLogic Server 12c: Oracle Database Integration, 304
5:00 Geodistributed Oracle GoldenGate and Active Data Guard: Global Data Services, 307

Wednesday, 1 October, Moscone South
10:15 Resource Manager Best Practices
11:30 RMAN Best Practices in Oracle Database 12c, 104
12:45 Active Data Guard: Best Practices and Deep Dive, 104
2:00 Expert High-Availability Best Practices for Oracle Exadata, 102
4:45 GoldenGate Performance and Tuning for Oracle, NORTH 130

Thursday, 2 October, Moscone South
9:30 Best Practices for Zero Downtime, 103
12:00 Data Protection, Recovery and HA for Private Cloud, 103

Demos – Moscone South
Oracle Maximum Availability Architecture, SLD-140
Oracle Active Data Guard, SLD-145
Global Data Services, SLD-144
Continuous Availability, SLD-125
RMAN, Database Backup Cloud Service, Flashback, SLD-141
Oracle Secure Backup, SLD-142
Oracle Real Application Clusters, SLD-128

oracle.com/goto/availability · https://blogs.oracle.com/MAA · @OracleMAA
