Ashish Prabhu
Douglas Utzig
High Availability Systems Group
Server Technologies
Oracle Corporation
Maximum Availability Architecture
Oracle's Recipe For Building An Unbreakable System
Agenda
Achieving High Availability
Maximum Availability Architecture (MAA)
– Overview
– MAA Components
– Performance Considerations
– MAA Test Lab
Q & A
High Availability is …
Causes of Downtime
Maintenance & Continuous Operations
Scheduled Outages
Inadequate System Design, Testing & Process
Unscheduled Outages
Data Center Disasters
Human Error
System Faults and Crashes
Data and Media Failures
High Availability Goal
Design and validate the best, integrated High Availability solution
– Unbreakable Architecture
  Handle all outages at all tiers
– Best Practices
  Cookbook for prevention, avoidance, mitigation, and recovery
  Configuration, operational, outage solutions, restore fault tolerance
– Complete out-of-the-box high availability
  Tested and validated solution
Unbreakable Architecture + Best Practices = Maximum Availability
Maximum Availability Architecture
Best Oracle High Availability Architecture
– Blueprint for Database and Oracle9iAS
– Guidelines for hardware and non-Oracle software, but platform, OS, storage, network, … independent
– Evolves with new Oracle versions and features
Best Practices
– Configuration and operational
– Outages and detailed solutions
– Restoring fault tolerance after an outage
Maximum Availability Architecture
[Diagram labels: WAN Traffic Manager; Primary Site (Oracle9iAS, RAC); Secondary Site (Oracle9iAS, RAC); Data Guard; Dedicated Network]
Secondary Site
Secondary Site is a Mirror of the Primary Site
– Resolve unscheduled outages quickly and easily
– Allow site-wide scheduled outages
Same Service Levels
– Predictable performance and response time
– Site transparency
Consistent Procedures and Processes
– Reduces administrative complexity
Highly Available Database
Real Application Clusters
Fast Failover
– Protection from local site system failures
– Faster than cold cluster failover solution
– Fast-start fault recovery (instance failure MTTR)
Availability and Accessibility
– Allows for scheduled outages
  Add and remove nodes transparently
– Transparent Application Failover (TAF) provides uninterrupted service
Highly Available Database
Real Application Clusters
Higher Scalability
– All system resources from all nodes are leveraged
– Cache fusion eliminates need to partition data or modify the application – fully application transparent
– Connection load balancing distributes connection requests from application tier
Manageability
– Provides a single image of the database to manage
Highly Available Database
Oracle Data Guard
Data Protection
– Protection from site failures, data failures, human errors, and corruptions
  Protection modes balance availability with performance
  Apply delay prevents user error propagation
– Greater protection, performance, and manageability compared to remote mirroring solution
– Offload processing from primary database system
Role Management
– Switchover operation for scheduled outages
– Failover operation for unscheduled outages
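The two role transitions can be pictured as a small state model. This is an illustrative Python sketch only, not Data Guard's interface: the `Site` class, the role constants, and both functions are hypothetical names. It assumes that a switchover swaps roles while both databases stay in the configuration, and that a failover promotes the standby and leaves the failed primary out of service until it is rebuilt.

```python
from dataclasses import dataclass

# Hypothetical role labels for this sketch.
PRIMARY, STANDBY, UNAVAILABLE = "primary", "standby", "unavailable"


@dataclass
class Site:
    name: str
    role: str


def switchover(primary: Site, standby: Site) -> None:
    """Planned role swap for a scheduled outage; both sites stay in service."""
    if primary.role != PRIMARY or standby.role != STANDBY:
        raise ValueError("switchover needs a healthy primary and a standby")
    primary.role, standby.role = STANDBY, PRIMARY


def failover(failed_primary: Site, standby: Site) -> None:
    """Unplanned promotion of the standby after the primary is lost."""
    if standby.role != STANDBY:
        raise ValueError("failover needs a standby to promote")
    failed_primary.role = UNAVAILABLE  # must be rebuilt before it can rejoin
    standby.role = PRIMARY
```

After a switchover the configuration is still fault tolerant; after a failover it is not until a new standby is established, which is why restoring fault tolerance is treated as a separate step.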
Highly Available Application
Oracle9iAS
Availability
– Oracle9iAS J2EE (OC4J) and Web Cache clustering for protection against system outages
– Automatic monitor and restart of failed processes
– Application state preserved through failures
– Add and remove nodes transparently
Scalability
– Hardware network load balancer distributes client requests to Web Cache
– Web Cache clustering for distributed caching and load balancing across multiple OC4J instances
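How a load balancer spreads requests while skipping failed nodes can be sketched as follows. This is a generic round-robin illustration in Python, not the algorithm of any particular hardware load balancer or of Web Cache; the class name and the node names in the usage note are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out nodes in turn, skipping any that are marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._next = 0  # index of the next candidate node

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def pick(self):
        # Try each node at most once, starting from the current position.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy nodes available")
```

For example, with nodes `wc1`, `wc2`, `wc3`, marking `wc2` down makes later calls to `pick()` alternate between the remaining two, and `mark_up("wc2")` returns it to the rotation without restarting anything.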
Highly Available Application
Oracle9iAS
[Diagram labels: Clients; Load Balancer; Application Server Tier (Web Cache, OC4J Clusters); Database Tier]
Network Infrastructure
Wide Area Traffic Manager to direct client traffic to proper site
Network load balancer to distribute incoming requests
Dedicated, fast link between sites
– Influences production database performance
Redundant components and paths
– Network paths to the site and within the site
Best Practices
Configuration
– Detailed recommendations for Oracle software
  Features to use, parameters to set
– Guidelines for hardware and other software
Operational
– Technical – e.g. Switchover and failover procedures
– Logistical – e.g. Change management considerations
– Emphasis on outages
  Outages to monitor
  Detailed steps to resolve outages
  How to restore fault tolerance
Best Practices
[Diagram labels: Configuration; Operational; Monitor for Outage; Detect Outage; Resolve Outage; Restore Fault Tolerance; components: Database, Oracle9iAS, OS, Storage, Network]
HA and Performance
Combining high availability and performance
– Secondary site with identical configuration as primary site
– Network bandwidth and latency between sites
– Data Guard protection mode
– Instance recovery time
Network Bandwidth / Latency
Network bandwidth and latency between sites influence commit response time
Longer network latency will increase response time
– Remote write = network round trip time + local write I/O time at secondary site
Network bandwidth should be greater than maximum redo generation rate
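The two sizing rules above reduce to simple arithmetic. The Python sketch below is illustrative: the function names are hypothetical, and it ignores protocol overhead and queuing, so treat the results as rough lower bounds rather than measurements.

```python
def remote_write_ms(network_rtt_ms: float, standby_write_io_ms: float) -> float:
    """Remote write = network round trip time + local write I/O time at the secondary site."""
    return network_rtt_ms + standby_write_io_ms


def bandwidth_is_sufficient(link_mbit_per_s: float, peak_redo_mbyte_per_s: float) -> bool:
    """True if the link can carry the maximum redo generation rate."""
    # Convert the redo rate from megabytes/s to megabits/s before comparing.
    return link_mbit_per_s > peak_redo_mbyte_per_s * 8
```

With assumed figures of a 10 ms round trip and a 4 ms write at the secondary site, each remote write costs about 14 ms. A 100 Mbit/s link covers a peak redo rate of 5 MB/s (40 Mbit/s) but not 15 MB/s (120 Mbit/s).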
Database Protection Modes
Balance performance with level of protection from human error, data failures, and disasters
Maximum Protection and Maximum Availability modes
– No-data-loss protection, but can have a performance impact on production service levels
Maximum Performance mode
– Data loss possible, but less impact on production service levels
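The trade-off between the modes shows most clearly when the standby becomes unreachable. The Python sketch below is a simplified model, not Data Guard code: the `Mode` and `Outcome` names are hypothetical. It encodes the commonly documented behaviour that Maximum Protection stops the primary rather than risk data loss, that Maximum Availability keeps the primary running and temporarily behaves like Maximum Performance, and that Maximum Performance never waits for the standby.

```python
from enum import Enum
from typing import NamedTuple


class Mode(Enum):
    MAX_PROTECTION = "maximum protection"
    MAX_AVAILABILITY = "maximum availability"
    MAX_PERFORMANCE = "maximum performance"


class Outcome(NamedTuple):
    primary_available: bool
    commit_waits_for_standby: bool
    data_loss_possible: bool


def outcome(mode: Mode, standby_reachable: bool) -> Outcome:
    """Simplified effect of each protection mode on the primary."""
    if mode is Mode.MAX_PERFORMANCE:
        # Redo is shipped without holding up commits.
        return Outcome(True, False, True)
    if standby_reachable:
        # Both no-data-loss modes wait for the standby to receive the redo.
        return Outcome(True, True, False)
    if mode is Mode.MAX_PROTECTION:
        # Primary stops rather than commit unprotected changes.
        return Outcome(False, True, False)
    # Maximum Availability: keep running unprotected until the standby returns.
    return Outcome(True, False, True)
```

The commit wait in the two no-data-loss modes is where the performance impact on production service levels comes from.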
Instance Recovery Time
Balance performance with level of protection from system faults and crashes
Short instance recovery times can be achieved with negligible impact on performance
– Provided sufficient I/O capacity exists to handle additional data block writes generated
Fast-start checkpointing makes instance recovery time-bounded and predictable
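Why a recovery-time target generates extra data block writes can be shown with a deliberately simple model. This Python sketch is not Oracle's checkpointing algorithm: the function names and the assumption of a constant recovery rate in blocks per second are hypothetical. In Oracle9i the target itself is set with the `FAST_START_MTTR_TARGET` initialization parameter.

```python
def max_dirty_blocks(target_seconds: int, recovery_blocks_per_s: int) -> int:
    """Most dirty blocks that could still be recovered within the target."""
    return target_seconds * recovery_blocks_per_s


def blocks_to_flush(dirty_blocks: int, target_seconds: int,
                    recovery_blocks_per_s: int) -> int:
    """Blocks that must be written to disk now to stay within the target."""
    limit = max_dirty_blocks(target_seconds, recovery_blocks_per_s)
    return max(0, dirty_blocks - limit)
```

Under this model a shorter target lowers the limit on dirty blocks, so more blocks are written during normal operation. That is the additional I/O the slide says the system must have capacity for.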
Instance Recovery Time
[Chart: writes/sec and tps for the settings disabled, 300, 180, and 90; vertical axis from 0 to 900]
MAA Test Lab
Oracle, Sun, HP, EMC, F5
[Diagram labels: WAN Traffic Manager; Primary Site (Oracle9iAS, RAC); Secondary Site (Oracle9iAS, RAC); Data Guard; Dedicated Network; vendors: F5 Networks, EMC, Hewlett-Packard, Sun Microsystems]
Maximum Availability Architecture
Best Oracle High Availability Architecture
– What to use
Best Practices
– How to build it
– How to manage it
– How to fix it
MAA Information Sources
Oracle Technology Network
– High Availability Collateral section
  Maximum Availability Architecture – Overview
  Maximum Availability Architecture – The Details
http://otn.oracle.com/deploy/availability/techlisting.html
Oracle Consulting – Advanced Technologies Solutions (ATS) Group
http://otn.oracle.com/consulting/9iServices/content.html
Next Steps
Sessions by Oracle Database Development
Monday
RAC: The Present, The Future, but not Science Fiction
Mon, 1pm -- Moscone Room 103
Running Your Applications on Oracle Real Application Clusters
Mon, 11am -- Moscone Room 134
Real Customers, Real Application Clusters, Real Results
Mon, 4pm -- Moscone Room 134
Deploying A Highly Manageable Oracle Real Application Clusters Database
Mon, 5:30pm -- Moscone Room 134
Tuesday
Breaking All the Rules with The Unbreakable Database
Tue, 11am -- Moscone Room 103
Oracle’s Recipe For Building An Unbreakable System
Tue, 1pm -- Moscone Room 134
Bullet-Proof Data Protection with Oracle Data Guard
Tue, 4pm -- Moscone Room 134
For More Info On Oracle HA Go To http://otn.oracle.com/deploy/availability/
Next Steps
Sessions by Oracle Database Development
Wednesday
Getting Under The Hood With Data Guard SQL Apply
Wed, 8:30am -- Moscone Room 134
LogMiner, Flashback Query and Online Redefinition: Power Tools For DBAs
Wed, 11am -- Moscone Room 134
Are You Using The Best To Protect Your Enterprise Data?
Wed, 4pm -- Moscone Room 252
Oracle LogMiner - Not Just An Error Recovery Tool
Wed, 5:30pm -- Moscone Room 102
Database HA Demos All Four Days In The Oracle Demo Campground
– Real Application Clusters
– Data Guard
– Backup & Recovery with Recovery Manager
– LogMiner, Flashback Query and Online Redefinition
Next Steps
Sessions by Oracle Database Development
Showcase Presentation/Demo
11:00 AM -- Database High Availability: Data Guard
11:30 AM -- Database High Availability: Backup & Recovery and Recovery Manager
12:00 PM -- Database High Availability: Online Reorg, Flashback Query and LogMiner
11:00 AM -- Real Application Clusters: Scalability
11:30 AM -- Real Application Clusters: High Availability
12:00 PM -- Real Application Clusters: CFS on Linux
11:00 AM -- Real Application Clusters: Scalability
11:30 AM -- Real Application Clusters: High Availability
12:30 PM -- Database High Availability: Data Guard
Monday
Tuesday
Wednesday
Q & A
Questions & Answers