database high availability db2 9 dba certification...

22
1 DB2 9 DBA certification exam 731 Database High Availability Solutions 2010 P.O. Box 200, 5520 AE Eersel, The Netherlands Tel.:(+31) 497-530190, Fax: (+31) 497-530191 E-mail: [email protected] Copyright © KBCE b.v., 2010 All Intellectual Rights Reserved 2 Disclaimer The information contained in this presentation is based on techniques, algorithms, and documentation published by the several authors and companies, and in addition is the result of research. It is therefore subject to change at any time without notice or warning. The information contained in this presentation has not been submitted to any formal tests or review and is distributed on an “As is” basis without any warranty, either expressed or implied. The use of this information or the implementation of any of these techniques is a client responsibility and depends on the client’s ability to evaluate and integrate them into the client’s operational environment. While each item may have been reviewed for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Clients attempting to adapt these techniques to their own environments do so at their own risks. Foils, handouts, and additional materials distributed as part of this presentation or seminar should be reviewed in their entirety. Note: This presentation gives you an overview of techniques used by database vendors. It can not be used for making company decisions regarding high availability without further studies.

Upload: others

Post on 18-Apr-2020

27 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

1

DB2 9 DBA certification exam 731

Database High AvailabilitySolutions 2010

P.O. Box 200, 5520 AE Eersel, The NetherlandsTel.:(+31) 497-530190, Fax: (+31) 497-530191

E-mail: [email protected]

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 2

Disclaimer

The information contained in this presentation is based on techniques, algorithms, and documentation published by the several authors and companies, and in addition is the result of research. It is therefore subject to change at any time without notice or warning.

The information contained in this presentation has not been submitted to any formal tests or review and is distributed on an “As is” basis without any warranty, either expressed or implied.

The use of this information or the implementation of any of these techniques is a client responsibility and depends on the client’s ability to evaluate and integrate them into the client’s operational environment.

While each item may have been reviewed for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere.

Clients attempting to adapt these techniques to their own environments do so at their own risks.

Foils, handouts, and additional materials distributed as part of this presentation or seminar should be reviewed in their entirety.

Note: This presentation gives you an overview of techniques used

by database vendors. It can not be used for making company decisions regarding high availability without further studies.

Page 2: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

2

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 3

Trademarks

This presentation contains many trademarks in use by database vendorsif we are aware of a trademark we put it in capitals.

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 4

Agenda

What is downtime

Techniques in use

SQL Server Cluster

DB2 Data Sharing / PureScale vs Oracle RAC

Wise Words

Page 3: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

3

What is downtime?

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 6

Terminology in use

Term Business Risk Solution

Data RecoveryDowntime and

Data lossRedundant data

High Availability

DowntimeRedundant

systemcomponents

Disaster Recovery

Permanent Data loss and

"unable to continue"

Redundant systems

and facilities

Not investing in hardware, software and knowledge means potential high risk for downtime and data loss

Page 4: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

4

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 7

About Downtime and Data loss

Permitted downtime?

Acceptable data/transaction loss (if any)?

Mean time to recovery? • Difference for "normal down" and disaster?

How much damage is done after how much time? $$?

Who is first in case of disaster?Note:

Database uptime ≠ application availability• Application failures• Hardware Outages (Power, Network, etc.)

Uptime SLADowntimeper Year

Downtimeper Month

99.9% 8.76 hours 43.8 minutes

99.99% 52.6 minutes 4.38 minutes

99.999% 5.26 minutes 0.438 minutes

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 8

What is causing downtime?

Downtime

Unplanneddown

HardwareFailure

DataCorruption

Planneddown

Data / ApplChanges

System Upgrades

Page 5: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

5

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 9

Unplanned: Hardware Failure

Storage subsystem

• Disk or Controller…

• Firmware or driver problem

Network

• Often causing a partial down (difficult to measure)

• Often rely on third party (SLA!)

Server

• Cluster Support: Oracle RAC / Microsoft MSCS / IBM Sysplex and PureScale

• Where is the backup / cluster server?

• Can virtual server be a solution?

Power Outage

• Environment change, too many requests (unstable grid) etc.

• Third party, difficult to SLA

Unplanneddown

HardwareFailure

DataCorruption

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 10

Unplanned: Site failure

Site Failure

• Complete Server room down (e.g. fire)

• Can always happen because you dependon external party (e.g. power)

More than just a database problem

• All data and hardware is involved

Can you handle?

• Network changes

• Workload (different config)

• Fail-back situation

Isn't there a hidden Single Point of Failure

• E.g. glassfiber back-up in the same bundle

Unplanneddown

HardwareFailure

DataCorruption

Page 6: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

6

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 11

Unplanned: Data Corruption

Human error

• Biggest problem difficult because manyscenarios which need different solutions

Logical Corruption

• Difficult to detect (sometimes after years strange data emerges)

• Can you detect the cause (e.g. program) and how much data is affected?

– Special techniques to go back in time and select data again

– Can you re-process the data / transactions?

Unplanneddown

HardwareFailure

DataCorruption

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 12

Planned: Data / Application changes

Application upgrades

• Schema changes still difficult

– Still a market for vendor tools

– DB2 versioning has performance impact

• Running systems need to shutdown in order to reload

– E.g. middleware transaction refresh

• How are we testing / accepting this

– If test fails, how do we undo the change?

Data maintenance

• Offline REORG (sometimes needed)

• Roll-in / Roll-out data

Planneddown

Data/ApplChanges

System Upgrades

Page 7: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

7

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 13

Planned: System Upgrades

Hardware upgrades

• Growth

• End of life cycle / vendor support

• Redundant is not always hot-swap

Software upgrades

• Operating system, middleware, DBMS etc.

– Wise to combine upgrades?

– What if the new combination is not stable

How to respond to vendor patches

• Needed? What if we don't

• Policy? One size fits all?

Planneddown

Data/ApplChanges

System Upgrades

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 14

Down vs Solution

Page 8: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

8

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 15

Partial down

Most common down

• System does work but certain function are unavailable

– Examples:

Transactions with certain input abort

Certain location cannot connect

Single database corrupt

End-user to database goes through many layers

• How to report?

Layered approach can buy to downtime

• Let user work with front-end as if it is real-time

• Bring down backend for maintenance and queue requests

– E.g. online banking without full history or real-time transactions

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 16

How long does it take to fix it…

0

2

4

6

8

10

Tim

e N

eed

ed

Recovery TimeBreakdown of recovery

Many problems the analysis takes a (very) long time

• Human errors

• Corruptions

Many companies suffer a knowledge problem

How to fix it

• Creating the scenario

• Testing the scenario?

Speed of recovery itself

• Best parameters

• Tools

Page 9: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

9

Techniques in useto minimize downtime

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 18

Database Backups

Traditional backup types

• Full backup – Online / Offline (z/OS Sharelevel)

• Incremental & Differential backup

• Include log backup

• Any other than Full backup is substitute for log!

Disk is better than tape

• First backup to disk (separate physical disk volume)

• Detect exceptions encountered during backup

• Verify backup files

• Copy backup files to tape, remote disk or storage manager (TSM)

Data retention policy for backup files

• What are you going to do with these backups?

Page 10: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

10

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 19

Backup Retention Policy

Location of backup files

Duration of retention

Protection of sensitive data!

• Sarbanes/Oxley (SOX)

• HIPAA

• Internal policies for data management and protection

Access to backups from offsite data storage

• Often the weakest link in security scenario's

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 20

Snapshot technology

Two techniques

• Real disk mirroring

• Share disk after snap until an update is done

How useful is the snapped disk

• Was DBMS aware of the snap? Was it up?

• Did the DBMS participate in the snap?

Snap can be extremely useful

• As backup

• As a fallback (e.g. after failed upgrade)

Be careful

• Not all scenarios work on the snapped copy

Page 11: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

11

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 21

Rebuild database (restore & rollforward)

DB2 LUW more flexible than z/OS

• Include log with backup

• Differential and delta backups

Used for:

• Redirected restore (problem investigation)

• Simple scenario's which allow for down-time

• As a safety net when all else fails

Less safe than you might imagine

• Backup just reads files (does not analyze them)

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 22

Replication

Create offline solution

• Based on master / slave

• Can be a "High Availability" solution

All database support this (sometimes very advanced)

• Often horizontal / vertical segmented

Mostly row based but sometimes optimized

• E.g. MySQL DRBD is like RAID1 over network

Replication does not take care of:

• IP takeover

• Heartbeat & automate takeover

• Slave becoming master

• Fail back and resync

• Conflict resolution (slave is read-only)

• With further automation can be base for High Availability (MySQL Flipper)

Page 12: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

12

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 23

Replication using Log Shipping

DB2 for LUW HADR

• Many limitation compared to DataGuard

Xkoto Gridscale for DB2

• More options than HADR

Data Guard

• Suports both log shipping and SQL shipping

• Very mature / flexible product

DB2 for z/OS Trackersite

• Very basic (really in use?)

Microsoft SQL Server mirror

• Witness server automates the takeover

• Xkoto has also gridscale for SQL Server

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 24

Clustering solution

Withdraw a "node" from the solution

• Network problem (IP needs re-routed)

• Failover might not be active

• Can it be automated?

– Heartbeat or Timeout

– Client re-route

– Split Brain problem

What happens to running Units Of Work?

• Locked data or other node backout?

• Take-over of transaction / restart of transaction / fail transaction

Controlled "failure" for planned down (e.g. upgrade)

Fail back / Insert node into the cluster

Page 13: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

13

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 25

Shared nothing vs shared everything

Shared Everything

• DB2 Data Sharing & PureScale (HA & Perf)

• Oracle RAC (HA & Perf)

– Does not share memory, only disk

• Microsoft SQL Server Cluster (HA)

– Based on hardware / operating system solution

• Sybase IQ ??

Shared Nothing

• Microsoft SQL Server mirror (HA)

• Oracle Data Guard (HA)

• DB2 HADR (HA)

• MySQL Cluster (HA) MySQL Replication (Perf)

– Both have many limitations, still large system use them

• Perf: Terradata, Postgres (Greenplum), Netezza , DB2 LUW DPF

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 26

High Availabilty Replication v.s. Cluster

Both can have Single Point of Failure (SPF)

Wrong configuration can destroy data

SAN/NAS I/O overhead when shared storage

• With RAID SAN is no longer SPF

• Make sure network to the SAN is not SPF

Replication is easy to break

• Inconsistent data (e.g. middle transaction)

• Painful start-up / restart

Page 14: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

14

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 27

Split Brain condition

Due to communications failures nodes are separated

• Missing heartbeat is really down or communication failure

If multiple nodes control of the cluster, then it's called a split-brain condition

• If this happens, then bad things will happen

Special software solutions are needed to 100% secure a down of the other node(s)

• This software can become Single Point of Failure

SQL Server Cluster

Page 15: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

15

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 29

Client PCs

Node A Node B

Disk cabinet A

Disk cabinet B

HeartbeatSQL

Passive Node

Failure Occurs!

SCSIReserveBroken

SQL fails overand is available

to clients

SQL

New Reservation Established

SQL Server cluster: Failover clustering

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 30

MirrorPrincipal

Witness

Log

Application

SQL Server SQL Server

2

2

4

51

Data DataLog

3>2 >3

Commit

SQL Server Cluster: Data Mirroring

Page 16: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

16

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 31

SQL Server Comparison

Database Mirroring

Scope: user Database

Standard hardware

Very fast failover (seconds)

OS flexible (e.g. 32/64)

Independent storage

Reporting on mirror (Read-Only)

Geographic separation OK

Failover Clustering

Scope: Full instance

Certified hardware

Automatic failover (minutes)

Enterprise OS

Shared storage

Standby not available

Servers co-located (site failure!)

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 32

Compare DB2 to Oracle

DB2 Data Sharing / PureScale

Single system image

Dynamic workload management

Software / Hardware Solution

No Single Point of Failure

ORACLE RAC

High Speed inter-system links

Lots of communications

No Global cache

• Cache Fusion / Interconnect

• Passes data around a lot

Extra communication overhead

Scalability?

Data

Log Log Log

DB2 DB2 DB2

CF CF

Page 17: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

17

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 33

Components in a RAC Cluster

Global Cache Services (GCS)

• Manage Data Page Synchronization

• Sends DATA to other nodes

Global Enqueue Service (GES)

• Manages Global Locks for non-data pages

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 34

Oracle RAC in action – cache fusion

Example how Oracle RAC moves data around

Instance A Instance B Instance C

Instance DMaster Node

8741

1

2 3Read

into buffer cache

8741

Want to readdata block 8741

Do READdata block 8741

Instance B becomes "owner" of the block

Page 18: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

18

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 35

Oracle RAC in action – cache fusion

Example how Oracle RAC moves data around

Instance A Instance B Instance C

Instance DMaster Node

8741

8741

Want to readdata block 8741

12

87413

Please send8741 to node C

Send 8741

4

Receiveddata block 8741

from Node B

Data no longer comes from disk, owners forward them. Multiple copies around

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 36

Oracle RAC in action – cache fusion

Example how Oracle RAC moves data around

Instance A

Instance B

Instance C

Instance DMaster Node

8741

8741

1

3

8741

6

Flush PIdata block 8741

4Write

data block 8741

5

8741

Flush PIdata block 8741

6

2

Forward8741 to node B

PI = Previous Image

Owners have to write, after write caches are flushedNew read requests moves the data again around

Page 19: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

19

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 37

What happens if RAC node fails

Crash recovery by other node

• Freeze GCS, not allowing updates to database anymore

• Data block remaster and recover the pages using redo/undo

• Invalidate blocks recovered

False node failure detection

• Can have split brain problem

• Oracle Custer software

– Heartbeat based, Single Point of Failure (Yes, says IBM)

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 38

Unavailable time

Page 20: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

20

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 39

RAC vs Purescale Scalability

IBM PureScale benchmark:

• 95% scalability 32 members

• 81% scalability 112 members

Oracle RAC??

• No figures…

• License forbids publication of measurements

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 40

Even Larry admits…

eWEEK (www.eweek.com) 31-Oct-2003:

I make fun of a lot of other databases – all other databases, in fact, except the mainframe version of DB2. It's a first-rate piece of technology.

Larry Ellison, Oracle's Founder and CEO

I guess we have to add DB2 LUW PureScale to it… ;-)

Page 21: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

21

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 41

Wise words…

Prepare for failure… DISASTER WILL HAPPEN

Ensure that no important data is lost

• Think of the different types of unavailability (there is no golden bullet)

Keep It Simple, Stupid (KISS)

Complexity is the enemy of reliability

• Saving on education is like stopping a watch to save time…

Automate as much as possible

• Careful you still understand it, so document it (incl. what if scenario)

Test it! Frequently!! Use good scenarios!

Audit it

• You need a devile's advocate to find the holes…

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 42

New technology

Source: channelinsider.com

If you can see it an touch it then it is: physicalIf you cannot see it but you can touch it then it is: transparentIf you can see it but not touch it then it is: virtualIf you cannot see it nor touch it then… It's gone!

Be careful with new technology ;-)

Page 22: Database High Availability DB2 9 DBA certification ...dugi.molaro.be/wp-content/uploads/2010/06/June2010... · 1 DB2 9 DBA certification exam 731 Database High Availability Solutions

22

Copyright © KBCE b.v., 2010 – All Intellectual Rights Reserved 43

QUESTIONS ?