The Self-Healing Database: Strategies and Directions
Richard Sarwal, Vice President, Server Technologies
Oracle Corporation
Business Challenges
Oracle Database is the default choice for running mission-critical applications
– Web storefronts, banking backbones, stock exchanges, credit card sales, airline reservations, etc.
The cost of failure is unacceptably high
– Up to $6+ million per hour*
Product stability is, therefore, of paramount importance
– Fewer software defects out of the box
– Faster bug diagnosis and resolution
* Source: Gartner Group & Contingency Planning Research, Inc.
Technical Challenges
“There are no more easy bugs”
Increase in application complexity
Increase in system complexity
Increase in database size
Increase in workload
Oracle Responds to the Challenges
[Figure: relative cost of fixing a bug, by the phase in which it is found: Design (cost = x), Code/Unit Test (10x), Integration Testing (100x), Stress Testing, In Production / Customers in the Field (1,000x). Oracle's three responses: Code Renovation (introduce fewer bugs), Test Renovation (catch bugs before production), Self-Healing Database (diagnose and resolve bugs quickly).]
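The multipliers in the figure (x, 10x, 100x, 1,000x) can be turned into a back-of-the-envelope calculation. The sketch below uses hypothetical counts of bugs caught in each phase (not data from the talk) and leaves out Stress Testing, for which the slide gives no multiplier.

```python
# Relative cost of fixing one bug, keyed by the phase in which it is caught.
# Multipliers come from the slide (Design = x); Stress Testing is omitted
# because the slide gives it no multiplier.
COST_MULTIPLIER = {
    "design": 1,
    "code_unit_test": 10,
    "integration_test": 100,
    "production": 1_000,
}


def total_fix_cost(bugs_by_phase: dict[str, int], unit_cost: float = 1.0) -> float:
    """Total cost, in multiples of `unit_cost` (x), of fixing the given bugs."""
    return sum(COST_MULTIPLIER[phase] * count * unit_cost
               for phase, count in bugs_by_phase.items())


# Hypothetical scenario: the same 100 bugs, caught later vs. earlier.
late = {"design": 10, "code_unit_test": 40, "integration_test": 30, "production": 20}
early = {"design": 20, "code_unit_test": 50, "integration_test": 25, "production": 5}

print(total_fix_cost(late))   # 23410.0
print(total_fix_cost(early))  # 8020.0
```

In this made-up example, moving 15 bugs out of production cuts the total cost by roughly two thirds, which is the argument for catching bugs before production.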
Goal #1
Highest Quality
Continuous Code Renovation
– Refresh and update code base
– Focus renovation on more problematic areas
– Ease further code development and maintenance
– Ease problem diagnosis
Even More Comprehensive QA
More stringent release exit criteria
Significantly enhanced testing
– Oracle 10g was subjected to approx. 145,000 tests (3x compared to 9i, 30x compared to 7.3)
– Development Grid
– More automated tests to allow more frequent and rigorous testing
– Dynamic tests to simulate unpredictable customer environments
– Test suites based on real customer workloads
– All tests are being made RAC-enabled
Widespread internal deployments during alpha and beta test phases
Goal #2
Self-Healing Database
Self-Healing Database
Aim to reduce business interruptions due to bugs
A key development focus area for the next database release
– Close interaction with Support
– Extensive customer and partner feedback
Widespread effort across the entire Oracle technology stack
Self-Healing Database
Goals
– Detect problems proactively
– Limit damage and interruptions
– Reduce problem diagnosis time
– Simplify problem resolution and repair
– Improve solution delivery
Problem Resolution Lifecycle
[Diagram labels: Time to Resolution, Refinement]
Prevention
• Early Change Impact Analysis
• Early Detection
• Limit Damage
Diagnosis
• Data Collection
• Knowledge Search
• Diagnostic Execution
• Analysis
• Problem Description
Solution
• Identification of Work Around
• Identification of Repair Method
• Identification of Code Fix
Delivery
• Creation/Delivery of Acceptable Solution
• Customer Acceptance
Problem Prevention
Early Change Impact Analysis
– Identify, analyze, and correct the impact of changes, e.g. a database upgrade
– SQL Workload Comparison
  • On the production system, before the upgrade: automatically capture SQL statements, SQL plans, and execution statistics
  • On a test system, after the upgrade: re-issue the SQL statements and automatically identify SQL plan changes and performance regressions; for regressed SQL, use the SQL Tuning Advisor to analyze and improve the plans
– Correct the impact before upgrading the production system
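The comparison step can be sketched as follows, assuming each captured statement is summarized by a plan hash and an elapsed time for the pre- and post-upgrade runs. The `SqlStats` record, the `compare_workloads` function, and the 20% regression threshold are hypothetical illustrations, not the interface of Oracle's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlStats:
    """Hypothetical per-statement summary captured from one workload run."""
    plan_hash: int      # identifies the execution plan that was used
    elapsed_ms: float   # elapsed time per execution


def compare_workloads(before: dict[str, SqlStats],
                      after: dict[str, SqlStats],
                      threshold: float = 1.2) -> dict[str, list[str]]:
    """Flag plan changes and regressions between two captured workloads.

    A statement counts as regressed when its elapsed time grows by more than
    `threshold` (1.2 = 20% slower), an assumed cut-off.
    """
    report: dict[str, list[str]] = {"plan_changed": [], "regressed": []}
    for sql_id, old in before.items():
        new = after.get(sql_id)
        if new is None:
            continue  # statement was not re-issued on the test system
        if new.plan_hash != old.plan_hash:
            report["plan_changed"].append(sql_id)
        if new.elapsed_ms > old.elapsed_ms * threshold:
            report["regressed"].append(sql_id)
    return report


before = {"q1": SqlStats(111, 5.0), "q2": SqlStats(222, 8.0), "q3": SqlStats(333, 2.0)}
after = {"q1": SqlStats(111, 5.1), "q2": SqlStats(999, 20.0), "q3": SqlStats(444, 1.5)}
print(compare_workloads(before, after))
# {'plan_changed': ['q2', 'q3'], 'regressed': ['q2']}
```

Only `q2` would be handed to the tuning step: `q3` changed plans but got faster, so a plan change alone is not treated as a regression.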
Problem Prevention
Early Detection
– Automatically and periodically check database health
– When potential problems are detected:
  • Notify the administrator
  • Recommend corrective actions
  • Perform more thorough checks
  • Limit damage by quarantine
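One pass of such a health check might look like the sketch below: a cheap check runs first, a failure triggers a notification with a recommendation, and a more thorough check decides whether to quarantine the object. `HealthCheck`, `run_health_checks`, and the check names are hypothetical; a scheduler calling the function periodically is assumed and not shown.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HealthCheck:
    """A hypothetical health check on one database object."""
    name: str
    target: str                   # e.g. a datafile or segment name
    quick: Callable[[], bool]     # cheap periodic check; True means healthy
    thorough: Callable[[], bool]  # more expensive follow-up check
    recommendation: str           # corrective action suggested to the DBA


@dataclass
class CheckRunResult:
    notifications: list[str] = field(default_factory=list)
    quarantined: set[str] = field(default_factory=set)


def run_health_checks(checks: list[HealthCheck]) -> CheckRunResult:
    """One periodic pass: notify, recommend, re-check, then quarantine."""
    result = CheckRunResult()
    for check in checks:
        if check.quick():
            continue  # healthy, nothing to report
        # Potential problem: notify the administrator with a recommended action.
        result.notifications.append(
            f"{check.name} failed on {check.target}: {check.recommendation}")
        # Confirm with a more thorough check before limiting damage.
        if not check.thorough():
            result.quarantined.add(check.target)
    return result


checks = [
    HealthCheck("block_checksum", "users01.dbf",
                quick=lambda: False, thorough=lambda: False,
                recommendation="recover the damaged blocks"),
    HealthCheck("dictionary_integrity", "SYSTEM",
                quick=lambda: True, thorough=lambda: True,
                recommendation="none"),
]
report = run_health_checks(checks)
print(report.notifications)
print(report.quarantined)  # {'users01.dbf'}
```

Running the thorough check only after the cheap one fails keeps the periodic cost low while still confirming a problem before anything is quarantined.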
Problem Diagnosis
Unified Diagnostic Information Storage
– Single location for all database diagnostic information, e.g. alert log, dumps, traces, DDLs
  • DBAs won’t have to collect information from different locations and compile it manually
– Common format to allow correlation across multiple tiers, products, instances, and processes
  • RAC instances, client-server, CRS, ASM, DB, Application Server, etc.
  • Enables easy diagnosis of problems spanning multiple components/products
– Available even when the instance/database is down
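A common record format is what makes cross-component correlation cheap. The sketch below assumes a hypothetical `DiagRecord` layout (timestamp, component, message), not the repository's actual file format, and uses `heapq.merge` to interleave already time-ordered logs from several components into one timeline around an event.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class DiagRecord(NamedTuple):
    """Hypothetical common-format record shared by every component."""
    ts: datetime
    component: str   # e.g. "RAC inst1", "ASM", "App Server"
    message: str


def correlate(sources: Iterable[list[DiagRecord]],
              around: datetime, window: timedelta) -> list[DiagRecord]:
    """Merge per-component logs (each already time-ordered) into one
    timeline and keep the records within `window` of the event."""
    merged = heapq.merge(*sources, key=lambda r: r.ts)
    return [r for r in merged if abs(r.ts - around) <= window]


t0 = datetime(2005, 1, 1, 12, 0, 0)
db = [DiagRecord(t0, "RAC inst1", "internal error raised"),
      DiagRecord(t0 + timedelta(minutes=10), "RAC inst1", "checkpoint completed")]
asm = [DiagRecord(t0 - timedelta(seconds=5), "ASM", "disk I/O error")]
app = [DiagRecord(t0 + timedelta(seconds=2), "App Server", "connection reset")]

for rec in correlate([db, asm, app], around=t0, window=timedelta(seconds=30)):
    print(rec.ts.time(), rec.component, rec.message)
```

Here the ASM disk error five seconds earlier shows up directly above the database error, which is the kind of cross-product link that is hard to spot when each log lives in a different place and format.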
Problem Diagnosis
Automatic Information Capture
– Goal is to collect enough information to achieve first-failure diagnosis
  • Traces, dumps, OS information, patch information, DB configuration, DDL, etc.
– Information capture is automatically triggered by errors
– Built-in intelligence to:
  • Capture targeted, relevant information to reduce volume
  • Detect repeated occurrences of the same problem and avoid redundant data collection
– Captured information may also be packaged automatically to create reproducible test cases
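The two pieces of built-in intelligence (targeted capture and duplicate suppression) can be sketched with a problem key and a counter. The key scheme (error code plus the top three stack frames), the error classes, and the `TARGETS` mapping are all assumptions for illustration; the talk does not describe how Oracle implements this.

```python
from collections import Counter


class IncidentCapture:
    """Hypothetical error-triggered capture with duplicate suppression."""

    # Assumed mapping from error class to the diagnostic data worth collecting,
    # so only targeted, relevant information is captured.
    TARGETS = {
        "memory": ["trace", "heap_dump"],
        "corruption": ["trace", "block_dump", "ddl"],
    }
    DEFAULT_TARGETS = ["trace"]

    def __init__(self) -> None:
        self.occurrences: Counter = Counter()  # problem key -> times seen
        self.captured: list[dict] = []         # one entry per distinct problem

    def on_error(self, error_code: str, error_class: str,
                 call_stack: list[str]) -> bool:
        """Handle one error; return True if diagnostics were collected."""
        # Problem key: error code plus the top three call-stack frames.
        key = (error_code, tuple(call_stack[:3]))
        self.occurrences[key] += 1
        if self.occurrences[key] > 1:
            return False  # repeated occurrence: count it, skip redundant capture
        self.captured.append({
            "key": key,
            "collect": self.TARGETS.get(error_class, self.DEFAULT_TARGETS),
        })
        return True


cap = IncidentCapture()
stack = ["read_block", "fetch_row", "exec_query", "main"]
print(cap.on_error("ERR-1", "corruption", stack))          # True
print(cap.on_error("ERR-1", "corruption", stack))          # False
print(cap.on_error("ERR-2", "memory", ["alloc", "main"]))  # True
print(len(cap.captured))                                   # 2
```

Repeats are still counted, so the frequency of a problem remains visible even though its dumps are collected only once.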
Problem Diagnosis
Automatic Analysis
– Problems are automatically analyzed and characterized into a fully identifiable “Problem Identity”
– Oracle may automatically “phone home” to match problems with already known issues/bugs
  • 49% of bugs are duplicates or not-a-bug
– Possible patches or workarounds may be automatically identified
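As an illustration of a “Problem Identity”, the sketch below hashes the error code and the top call-stack frames into a short, stable signature and looks it up in a table of known issues. The signature scheme, the `KNOWN_ISSUES` table, and the bug and patch IDs are hypothetical; a local dictionary stands in for the remote “phone home” lookup.

```python
import hashlib
from typing import Optional


def problem_identity(error_code: str, call_stack: list[str], depth: int = 3) -> str:
    """Stable signature built from the error code and the top `depth` frames.

    Assumed scheme for illustration; the talk does not say how identities are formed.
    """
    text = error_code + "|" + "|".join(call_stack[:depth])
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def match_known_issue(error_code: str, call_stack: list[str],
                      known_issues: dict[str, dict]) -> Optional[dict]:
    """Return the known bug record for this problem, or None if it is new."""
    return known_issues.get(problem_identity(error_code, call_stack))


# Hypothetical known-issue table, standing in for the remote "phone home" lookup.
KNOWN_ISSUES = {
    problem_identity("ERR-1", ["read_block", "fetch_row", "exec_query"]): {
        "bug": "BUG-0001", "patch": "PATCH-0001", "workaround": "rebuild the index",
    },
}

hit = match_known_issue("ERR-1", ["read_block", "fetch_row", "exec_query", "main"],
                        KNOWN_ISSUES)
print(hit["bug"] if hit else "new problem")  # BUG-0001
```

Because only the top frames feed the signature, the same defect reached from different callers maps to one identity, which is what lets duplicates be recognized instead of filed as new bugs.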
Problem Resolution
Faster resolution enabled by:
– Comprehensive diagnostic data automatically captured by the database
– Flexible, dynamic infrastructure to collect additional diagnostic information, if needed
  • No need to apply diagnostic patches
– Automatic test case generation
– Diagnosability being made a key consideration for all future development work
Problem Resolution
Automatic, Intelligent Data Repair
– Automatic damage assessment
  • What? How extensive? How important?
– Recommendation of repair options with downtime vs. data-loss trade-offs
– In-depth feasibility check of the customer-chosen option
– Automatic implementation of the repair operation, if required
  • RMAN recovery operations, patch data block, recreate index, etc.
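The downtime vs. data-loss trade-off can be made concrete with a single weighted score. In the sketch below, the `RepairOption` fields, the estimates, and the `minutes_per_lost_row` weighting are hypothetical; the feasibility check is reduced to a precomputed flag that gates the customer's choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairOption:
    """A hypothetical repair option produced by damage assessment."""
    name: str
    downtime_min: float   # estimated downtime, in minutes
    data_loss_rows: int   # estimated rows that would be lost
    feasible: bool        # outcome of the in-depth feasibility check


def recommend(options: list[RepairOption],
              minutes_per_lost_row: float) -> list[RepairOption]:
    """Rank options by one score that trades downtime against data loss.

    `minutes_per_lost_row` is an assumed weighting: how many minutes of
    downtime the customer would accept to avoid losing one row.
    """
    return sorted(options,
                  key=lambda o: o.downtime_min + o.data_loss_rows * minutes_per_lost_row)


def validate_choice(choice: RepairOption) -> RepairOption:
    """Feasibility gate before the repair is carried out automatically."""
    if not choice.feasible:
        raise ValueError(f"repair option not feasible: {choice.name}")
    return choice


options = [
    RepairOption("RMAN block media recovery", 20, 0, True),
    RepairOption("Restore and recover tablespace", 90, 0, True),
    RepairOption("Skip corrupt blocks", 1, 300, True),
]
# A site that mostly cares about uptime vs. one that cannot afford to lose data.
print([o.name for o in recommend(options, 0.01)])
print([o.name for o in recommend(options, 1.0)])
```

The same three options rank differently depending on the weighting, which is why the slide presents repair as a recommendation with trade-offs rather than a single fixed action.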
Solution Delivery
Proactive Critical Bug Alerts
– Enterprise Manager Grid Control automatically notifies customers of known bugs and corresponding patch fixes
– Patch information is downloaded automatically from MetaLink
– Patch application can also be automated, if desired
– Allows customers to fix the problem before encountering it
– Only sites running the affected release and platform are notified
– This functionality is available today
  • Works with all supported database versions
[Screenshots: Critical Patch Advisory; Proactive Patch Notification]
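The targeting rule (notify only sites on an affected release and platform) amounts to a simple filter. In the sketch below, `Site`, `BugAlert`, the version and platform strings, and the patch IDs are hypothetical. Skipping sites that already have the fixing patch is an extra assumption, not something the slide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A hypothetical customer site as seen by the alerting service."""
    name: str
    release: str
    platform: str
    applied_patches: frozenset[str] = frozenset()


@dataclass(frozen=True)
class BugAlert:
    """A hypothetical critical-bug alert with its fixing patch."""
    bug_id: str
    patch_id: str
    affected_releases: frozenset[str]
    affected_platforms: frozenset[str]


def sites_to_notify(alert: BugAlert, sites: list[Site]) -> list[str]:
    """Names of sites on an affected release *and* platform that do not
    already have the fixing patch (the last condition is an assumption)."""
    return [s.name for s in sites
            if s.release in alert.affected_releases
            and s.platform in alert.affected_platforms
            and alert.patch_id not in s.applied_patches]


alert = BugAlert("BUG-1234", "PATCH-5678",
                 frozenset({"10.1.0.4", "10.2.0.1"}), frozenset({"Linux x86"}))
sites = [
    Site("hr", "10.1.0.4", "Linux x86"),
    Site("erp", "9.2.0.8", "Linux x86"),                              # release not affected
    Site("dw", "10.2.0.1", "Solaris SPARC"),                          # platform not affected
    Site("crm", "10.2.0.1", "Linux x86", frozenset({"PATCH-5678"})),  # already patched
]
print(sites_to_notify(alert, sites))  # ['hr']
```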
The New, Automatic Diagnostic Workflow
1. Problem encountered
2. Database automatically captures diagnostic data into the Unified Diagnostic Repository and sends out an alert
3. DBA follows the recommended steps, including looking up the support knowledge base
4. Known issue?
5. Yes: apply patch
6. No: new issue reported; diagnostic data automatically packaged and uploaded
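The branching in the workflow can be written as one function that takes each step as a callback. The callback names, the stub lambdas, and the `SR-0001` reference are hypothetical stand-ins for the database, the support knowledge base, and Oracle Support.

```python
from typing import Callable, Optional


def diagnostic_workflow(problem: str,
                        capture: Callable[[str], dict],
                        find_known_patch: Callable[[dict], Optional[str]],
                        apply_patch: Callable[[str], None],
                        package_and_upload: Callable[[dict], str]) -> str:
    """Run steps 2-6 of the workflow for one encountered problem (step 1)."""
    data = capture(problem)            # step 2: capture data (and alert the DBA)
    patch = find_known_patch(data)     # steps 3-4: knowledge-base lookup
    if patch is not None:
        apply_patch(patch)             # step 5: known issue, apply the patch
        return f"patched with {patch}"
    ref = package_and_upload(data)     # step 6: new issue, package and upload
    return f"new issue reported as {ref}"


# Hypothetical stubs standing in for the database, knowledge base, and Support.
known_patches = {"ERR-1": "PATCH-0001"}
applied: list[str] = []


def run(problem: str) -> str:
    return diagnostic_workflow(
        problem,
        capture=lambda p: {"error": p},
        find_known_patch=lambda d: known_patches.get(d["error"]),
        apply_patch=applied.append,
        package_and_upload=lambda d: "SR-0001",
    )


print(run("ERR-1"))  # patched with PATCH-0001
print(run("ERR-9"))  # new issue reported as SR-0001
```

Passing the steps in as callbacks keeps the decision logic (known issue vs. new issue) separate from how capture, lookup, and upload are actually performed.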
What does it mean to you?
Fewer “surprises”
– By anticipating problems and fixing them in advance
Faster problem resolution
Reduced business interruption, higher availability
Enhanced administrator productivity
– DBAs no longer need to collect diagnostic information manually
Better quality of service for end users
Conclusion
Oracle fully understands customer needs and challenges
– Enhanced manageability
– Simplified diagnosability
Responding to these challenges is the top development priority
– Oracle Database 10g took giant strides in manageability; future releases will build on it
– Automatic, simplified diagnosability is one of the key development focus areas for the future
Oracle remains committed to providing customers with the most sophisticated, practical, and relevant solutions
Questions & Answers