ugf9795 das-ioug oow rac reconfiguration ugf9795 final
TRANSCRIPT
-
8/9/2019 UGF9795 Das-IOUG OOW RAC Reconfiguration UGF9795 Final
How to Minimize Brownouts in Oracle RAC Reconfiguration
Session ID# UGF9795
Amit K Das
PayPal
-
Introduction: about our team
Sehmuz Bayhan: Our visionary director. Executed great changes at lightning speed.
Saibabu Devabhaktuni: Our fearless leader at PayPal for at least 10 years. http://sai-oracle.blogspot.com/
Kyle Towle: Our fearless database architect at PayPal for at least 8 years.
Dong Wang: GoldenGate expert, speaker at multiple conferences, PayPal DBA for going on 8 years.
John Kanagaraj: Author, Oracle ACE, frequent speaker at Oracle conferences.
Sarah Brydon: One of the very few Oracle Certified Masters.
Samrat Roy: One of our expert DBAs.
-
Who Am I?
11 years on the Oracle RAC development team.
Technical lead for the world's first Exadata production go-live (Apple), while at Oracle.
Currently engineering lead/architect for the world's largest Exadata OLTP system (PayPal).
Frequent presenter at Oracle OpenWorld, IOUG, and NoCOUG.
Love fishing.
-
PayPal's Amazing Growth and Requirements
Amazing growth: exponential growth in PayPal's business year over year
Business is growing rapidly: new users, features, transactions
New channels: POS, mobile, etc.
Massive growth in database demand every year: it is not uncommon to see database workloads grow 50-100% every year
-
One of the Largest OLTP Databases on Oracle
Measured by executions x processes (concurrency)
Fast-paced VLDB OLTP environment on Oracle
o 500+ database instances
o OLTP databases commonly 10-130 TB
o 5-14K concurrent processes
o Executions > 80K/sec, > 10 GB redo/minute
Continuously growing
o High growth of PayPal's business per year: up to 2x workload increase
o Tier-one databases built to support 300K+ execs/sec
-
Agenda
1. Understand Oracle RAC and its advantages
2. Understand Oracle RAC reconfiguration
3. Measure the reconfiguration time
4. Oracle RAC reconfiguration vs. business SLA
5. Understand the workload and behavior
6. Minimizing the impact of Oracle RAC reconfiguration
-
Understand Oracle RAC and its Advantages
Why Oracle RAC?
1. Without it, we cannot scale our fast growing business.
2. Without it, we will not have high availability.
3. The largest single machine cannot be a replacement for Oracle RAC.
4. Without it, we will not be able to do rolling patching.
5. Without it, we cannot build a real cloud.
6. And More
-
Understand RAC Reconfiguration
Every Oracle block in RAC is protected by a lock element when it is accessed by an instance.
The total number of master locks is distributed equally among all instances.
An instance that does not own the master lock for a block, but has an interest in that block, holds a copy (shadow) lock.
The open block count can be measured from v$bh.
The open lock count can be measured from an X$ view.
-
Understand RAC Reconfiguration
Why do we need Reconfiguration?
To protect cache consistency when a node joins or leaves the cluster.
When can it be triggered?
When an instance joins or leaves the cluster
When ADG recovery is started or stopped
Partial reconfiguration triggers:
o In certain situations when DRM is ON
o Flushing of Buffer Cache on any node in the cluster.
o When redirecting locks
-
Understand RAC Reconfiguration
Note: We have a huge OLTP load and a very large buffer cache; what we observe depends on our workload and business needs.
-
Measure the reconfiguration time
When a node leaves or joins, dump the ASH data to an offline table:
create table ash_092213 as select * from v$active_session_history;
Generate a report from ash_092213 with SQL like the below:
col MAXTIME format a15
col SAMPLE_TIME format a26
col BLOCKING_SESSION format 99999999
col event format a25
col AVGTIME format a15
-- Self-join: a = waiting sessions, b = the session blocking them in the same sample.
-- TIME_WAITED is in microseconds, so dividing by 1000 reports milliseconds.
select a.sample_time, a.BLOCKING_SESSION, b.event, a.BLOCKING_SESSION_STATUS, a.BLOCKING_HANGCHAIN_INFO, a.event,
       count(*),
       round(max(a.TIME_WAITED)/1000,2)||' ms' maxtime,
       round(avg(a.TIME_WAITED)/1000,2)||' ms' avgtime
from ash_092213 a, ash_092213 b
where a.sample_time = b.sample_time and b.SESSION_ID = a.BLOCKING_SESSION
group by a.sample_time, a.BLOCKING_SESSION, a.BLOCKING_SESSION_STATUS, b.event,
         a.BLOCKING_HANGCHAIN_INFO, a.event
having count(*) > 5
order by a.sample_time;
-
Measure the reconfiguration time
Analyze the report from the previous SQL:
SAMPLE_TIME                 BLK_SESSION  Blocker_EVENT                          BLK_Stat  Waiter_EVENT            COUNT(*)  MAXTIME      AVGTIME
--------------------------  -----------  -------------------------------------  --------  ----------------------  --------  -----------  ----------
09-JUL-13 09.38.17.416 PM   3521         ges enter server mode                  VALID     log file sync           1043      6605.26 ms   2889.65 ms
09-JUL-13 09.38.17.416 PM   7419         latch: enqueue hash chains             VALID     latch free              8         600.26 ms    341.55 ms
09-JUL-13 09.38.17.416 PM   9162         gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  12        0.00 ms      0 ms
09-JUL-13 09.38.17.416 PM   4442         gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  41        0.00 ms      0 ms
09-JUL-13 09.38.17.416 PM   18612        gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  12        0.00 ms      0 ms
.............
.............
09-JUL-13 09.38.31.448 PM   22026        gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  26        11117.21 ms  7216.45 ms
09-JUL-13 09.38.32.882 PM   3433         latch: gcs resource hash               VALID     latch: gc element       8         59.23 ms     34.64 ms
09-JUL-13 09.38.32.882 PM   3521                                                VALID     log file sync           528       51.22 ms     19.71 ms
09-JUL-13 09.38.32.882 PM   3697                                                VALID     gc domain validation    14        1070.05 ms   806.82 ms
09-JUL-13 09.38.32.882 PM   14559        db file sequential read                VALID     read by other session   36        186.36 ms    185.71 ms
09-JUL-13 09.38.48.997 PM   3521         log file parallel write                VALID     log file sync           29        14.51 ms     11.87 ms
09-JUL-13 09.38.50.015 PM   3521         log file parallel write                VALID     log file sync           6         .57 ms       .45 ms
Total RAC reconfiguration time: (09-JUL-13 09.38.32.882 PM - 09-JUL-13 09.38.17.416 PM) = 15+ sec
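The subtraction in the total above can be scripted when many reports need to be compared. The following is a minimal Python sketch, not part of the original slides: the function name reconfiguration_seconds and the format constant are illustrative assumptions, and it assumes SAMPLE_TIME strings printed exactly as in the report, parsed under an English locale.

```python
from datetime import datetime

# Layout of SAMPLE_TIME as printed in the report, e.g. "09-JUL-13 09.38.17.416 PM".
# Assumption: English month abbreviations (the default C/English locale).
ASH_TIME_FORMAT = "%d-%b-%y %I.%M.%S.%f %p"


def reconfiguration_seconds(first_sample: str, last_sample: str) -> float:
    """Return the seconds elapsed between two ASH sample timestamps."""
    start = datetime.strptime(first_sample, ASH_TIME_FORMAT)
    end = datetime.strptime(last_sample, ASH_TIME_FORMAT)
    return (end - start).total_seconds()


# First sample showing reconfiguration waits, and the last sample before
# waits return to normal, both taken from the report above.
elapsed = reconfiguration_seconds(
    "09-JUL-13 09.38.17.416 PM",
    "09-JUL-13 09.38.32.882 PM",
)
print(f"Reconfiguration window: {elapsed:.3f} s")
```

Because ASH samples roughly once per second, the result is a lower bound on the brownout, which is why the slide reports it as 15+ sec.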
-
Oracle RAC reconfiguration vs. business SLA
If your application is a DSS: most of the time, reconfiguration will be invisible to the business.
If your application is a large OLTP system: define your SLA and check whether reconfiguration can meet your expectations.
-
Understand the workload and behavior
Discover the data access pattern:
o Is the application partitioned by node?
o Is the workload random read/write, accessed from all nodes?
o Is the workload read-mostly from all nodes, with writes from only one node?
o Is the application configured as active/passive?
-
Understand the workload and behavior
Measure the object distribution across all nodes.
Maintain high uptime: avoid stopping or starting any instance frequently (i.e. avoid reconfiguration)
-
How to minimize the impact of reconfiguration in Oracle RAC
1. Size the buffer cache properly.
2. Use an optimal number of LMS processes.
3. Move application traffic away, and wait as long as possible for the buffer cache to cool down on the node that is to be brought down.
4. Increase the bandwidth and lower the latency of the private interconnect.
-
Proper sizing of Buffer Cache
Larger buffer cache => more objects => more time required for RAC reconfiguration
If your application meets its I/O SLA with a smaller buffer cache, keep it as small as possible
Prior to increasing the buffer cache size, consider the effect on RAC reconfiguration
Note: RAC will require a larger buffer cache than a single node to serve the same amount of traffic in a failover situation
-
Increasing LMS processes optimally
Do not simply increase the number of LMS processes:
o Too many LMS processes cause synchronization overhead
o Limit the number of LMS processes when you have multiple DBs on the same cluster
Check:
o The usage of the interconnect
o CPU usage
Increase the number of LMS processes only if you have enough bandwidth on the interconnect and idle CPU cycles
Improve LMS performance further by applying Fix 13843646: LONG LATCH WAIT ON GCS RESOURCE VALIDATE LIST ON LARGE NUMBER OF LMSES
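The check described above (interconnect headroom and idle CPU, both required) can be expressed as a small guard. This is a hedged Python sketch only: the function name and both threshold defaults are hypothetical placeholders, not values from the presentation or from Oracle, and should be replaced with limits measured for your own cluster.

```python
def can_add_lms_process(interconnect_utilization: float,
                        cpu_idle_fraction: float,
                        max_interconnect_utilization: float = 0.5,
                        min_cpu_idle_fraction: float = 0.3) -> bool:
    """Return True only when both the interconnect and the CPUs have headroom.

    Both inputs are fractions between 0.0 and 1.0. The default thresholds
    are illustrative assumptions, not recommendations.
    """
    has_bandwidth = interconnect_utilization < max_interconnect_utilization
    has_idle_cpu = cpu_idle_fraction > min_cpu_idle_fraction
    return has_bandwidth and has_idle_cpu


# Example: 20% interconnect usage with 60% idle CPU leaves room to grow;
# a saturated interconnect or busy CPUs does not.
print(can_add_lms_process(0.2, 0.6))
print(can_add_lms_process(0.8, 0.6))
print(can_add_lms_process(0.2, 0.1))
```

Requiring both conditions reflects the slide's point that adding LMS processes without spare bandwidth and CPU only adds synchronization overhead.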
-
Planning for reconfiguration
Move your traffic to other nodes for planned shutdown
Allow more time for auto remastering
The fix for bug 10415371 makes reconfiguration faster (40-50% improvement); it needs the latest JAN PSU with 11.2.0.3
Avoid unwanted double reconfiguration with fix 13812526
SQL> alter system set db_cache_size=256M scope=memory sid=;
-
From our experience
You need a good amount of patience to execute the special tool perfectly.
Make sure one node can handle the entire traffic.
Transfer all connections to one node.
Aggressive lock conversion can cause more session spikes and make things worse.
Monitor the lock conversion.
-
Sharing the result from passive node
Passive Node                DLM Locks
Before redirecting locks    12,340,238
3 hours after               2,001,760
5 hours after               435,628
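To make the drain rate in the table above easier to read, the counts can be expressed as a percentage of the starting lock count. This is a minimal Python sketch using only the three figures from the table; the helper name remaining_percent is an illustrative assumption.

```python
# DLM lock counts on the passive node, copied from the table above.
observations = [
    ("before redirecting locks", 12_340_238),
    ("3 hours after", 2_001_760),
    ("5 hours after", 435_628),
]


def remaining_percent(baseline: int, current: int) -> float:
    """Return the share of the baseline lock count still open, in percent."""
    return round(100.0 * current / baseline, 1)


baseline = observations[0][1]
remaining = {
    label: remaining_percent(baseline, count)
    for label, count in observations
}
for label, pct in remaining.items():
    print(f"{label}: {pct}% of the original DLM locks still open")
```

In these figures, roughly 16% of the locks remain after three hours and under 4% after five, which is the kind of trend to watch before shutting the passive instance down.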
-
Determining the best time to shut down the passive instance
Shut down the passive instance when you see a small number of DLM locks on the passive node
Shrink the buffer_cache of the passive node
Shutting down the passive instance with the immediate option will then cause close to ZERO impact.
-
Lessons learned
Working with Oracle on further improvement (WIP)
Do not flush the buffer cache on the passive node to close the open DLM locks; shrink the buffer cache before shutdown instead
Optimizing the LMS process count will improve reconfiguration.
The fix for bug 10415371 will help with faster reconfiguration (40-50% improvement).
Avoid unwanted double reconfiguration with fix 13812526
Instance startup from mount to open will be impacted (2-10 sec) by CF enqueue contention; we received the fix: Bug 17237521: DELAY IN ARC0 STARTING UP
Disabled DRM using the parameter _gc_policy_time=0
-
My Next Presentation
Session ID: CON2468
Session Title: Tech System Refresh: Zero Downtime with Oracle RAC/Oracle Automatic Storage Management
Venue / Room: Moscone South - 300
Date and Time: 9/26/13, 15:30 - 16:30
-
THANK YOU
Amit K Das ([email protected])