ugf9795 das-ioug oow rac reconfiguration ugf9795 final

Upload: mobin-nasim

Post on 01-Jun-2018

  • 8/9/2019 UGF9795 Das-IOUG OOW RAC Reconfiguration UGF9795 Final

    How to Minimize Brownouts in Oracle RAC Reconfiguration

    Session ID# UGF9795

    Amit K Das

    PayPal

    Introduction: about our team

    Sehmuz Bayhan: Our visionary director. Executed great changes at lightning speed.

    Saibabu Devabhaktuni: Our fearless leader at PayPal for at least 10 years. http://sai-oracle.blogspot.com/

    Kyle Towle: Our fearless database architect at PayPal for at least 8 years.

    Dong Wang: GoldenGate expert, speaker at multiple conferences, PayPal DBA for going on 8 years.

    John Kanagaraj: Author, Oracle ACE, frequent speaker at Oracle conferences.

    Sarah Brydon: One of the very few Oracle Certified Masters.

    Samrat Roy: One of our expert DBAs.

    Who Am I?

    11 years on the Oracle RAC development team.

    Technical lead for the world's first Exadata production go-live (Apple), while at Oracle.

    Currently engineering lead/architect for the world's largest Exadata OLTP system (PayPal).

    Frequent presenter at Oracle OpenWorld, IOUG, and NoCOUG.

    Love fishing.

    PayPal's Amazing Growth and Requirements

    Amazing growth
    o Exponential growth in PayPal business year over year

    Business is growing rapidly
    o New users, features, transactions
    o New channels: POS, mobile, etc.

    Massive growth in database demand every year
    o Not uncommon to see database workloads grow 50-100% every year

    One of the Largest OLTP Databases on Oracle

    Measured by executions x processes (concurrency)

    Fast-paced VLDB OLTP environment on Oracle
    o 500+ database instances
    o OLTP databases commonly 10-130 TB
    o 5-14K concurrent processes
    o Executions > 80K/sec, > 10 GB redo/minute

    Continuously growing
    o High growth of PayPal's business per year: up to 2X workload increase
    o Tier-one databases built to support 300K+ execs/sec

    Agenda

    1. Understand Oracle RAC and its advantages

    2. Understand Oracle RAC reconfiguration

    3. Measure the reconfiguration time

    4. Oracle RAC reconfiguration vs. business SLA

    5. Understand the workload and behavior

    6. Minimizing the impact of Oracle RAC reconfiguration

    Understand Oracle RAC and its Advantages

    Why Oracle RAC?

    1. Without it, we cannot scale our fast-growing business.

    2. Without it, we will not have high availability.

    3. The largest single machine cannot be a replacement for Oracle RAC.

    4. Without it, we will not be able to do rolling patching.

    5. Without it, we cannot build a real cloud.

    6. And more

    Understand RAC Reconfiguration

    Every Oracle block in RAC is protected by a lock element when it is accessed by an instance.

    The master locks are distributed equally across all instances.

    An instance that does not own the master lock for a block, but has an interest in that block, owns a copy (shadow) lock.

    The open block count can be measured from v$bh.

    The open lock count can be measured from an X$ view.

    Understand RAC Reconfiguration

    Why do we need reconfiguration?

    To protect cache consistency when a node joins or leaves the cluster.

    When can it be triggered?

    If an instance joins or leaves the cluster

    When recovery on ADG is started or stopped

    Partial reconfiguration triggers:

    o In certain situations when DRM is ON

    o Flushing the buffer cache on any node in the cluster

    o When redirecting locks

    Understand RAC Reconfiguration

    Note: We have a huge OLTP load and a very large buffer cache; what we observe depends on our workload and business needs.

    Measure the reconfiguration time

    When a node leaves or joins, dump the ASH data to an offline table:

        create table ash_092213 as select * from v$active_session_history;

    Generate a report from ash_092213 with SQL like the following:

        col MAXTIME format a15
        col SAMPLE_TIME format a26
        col BLOCKING_SESSION format 99999999
        col event format a25
        col AVGTIME format a15

        -- TIME_WAITED is in microseconds; dividing by 1000 reports milliseconds.
        select a.sample_time, a.BLOCKING_SESSION, b.event, a.BLOCKING_SESSION_STATUS,
               a.BLOCKING_HANGCHAIN_INFO, a.event, count(*),
               round(max(a.TIME_WAITED)/1000,2)||' ms' maxtime,
               round(avg(a.TIME_WAITED)/1000,2)||' ms' avgtime
        from ash_092213 a, ash_092213 b
        where a.sample_time = b.sample_time
          and b.SESSION_ID = a.BLOCKING_SESSION
        group by a.sample_time, a.BLOCKING_SESSION, a.BLOCKING_SESSION_STATUS, b.event,
                 a.BLOCKING_HANGCHAIN_INFO, a.event
        having count(*) > 5
        order by a.sample_time;

    Measure the reconfiguration time

    Analyze the report from the previous SQL:

    SAMPLE_TIME                BLK_SESSION  Blocker_EVENT                          BLK_Stat  Waiter_EVENT            COUNT(*)  MAXTIME      AVGTIME
    -------------------------  -----------  -------------------------------------  --------  ----------------------  --------  -----------  ----------
    09-JUL-13 09.38.17.416 PM  3521         ges enter server mode                  VALID     log file sync           1043      6605.26 ms   2889.65 ms
    09-JUL-13 09.38.17.416 PM  7419         latch: enqueue hash chains             VALID     latch free              8         600.26 ms    341.55 ms
    09-JUL-13 09.38.17.416 PM  9162         gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  12        0.00 ms      0 ms
    09-JUL-13 09.38.17.416 PM  4442         gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  41        0.00 ms      0 ms
    09-JUL-13 09.38.17.416 PM  18612        gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  12        0.00 ms      0 ms
    .............
    .............
    09-JUL-13 09.38.31.448 PM  22026        gcs resource directory to be unfrozen  VALID     gc buffer busy acquire  26        11117.21 ms  7216.45 ms
    09-JUL-13 09.38.32.882 PM  3433         latch: gcs resource hash               VALID     latch: gc element       8         59.23 ms     34.64 ms
    09-JUL-13 09.38.32.882 PM  3521                                                VALID     log file sync           528       51.22 ms     19.71 ms
    09-JUL-13 09.38.32.882 PM  3697                                                VALID     gc domain validation    14        1070.05 ms   806.82 ms
    09-JUL-13 09.38.32.882 PM  14559        db file sequential read                VALID     read by other session   36        186.36 ms    185.71 ms
    09-JUL-13 09.38.48.997 PM  3521         log file parallel write                VALID     log file sync           29        14.51 ms     11.87 ms
    09-JUL-13 09.38.50.015 PM  3521         log file parallel write                VALID     log file sync           6         .57 ms       .45 ms

    Total RAC reconfiguration time: (09-JUL-13 09.38.32.882 PM) - (09-JUL-13 09.38.17.416 PM) = 15+ sec
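
    The elapsed time above can be checked with simple timestamp arithmetic. The following is a minimal Python sketch, not part of the original talk: the helper name parse_ash_time is hypothetical, the format string assumes an English locale, and the two timestamps are copied from the first and last report rows that still show reconfiguration-related waits.

```python
from datetime import datetime

# SAMPLE_TIME as printed in the report, e.g. "09-JUL-13 09.38.17.416 PM".
# Assumes an English locale for the month abbreviation and AM/PM marker.
ASH_FORMAT = "%d-%b-%y %I.%M.%S.%f %p"


def parse_ash_time(text):
    """Parse a SAMPLE_TIME string from the report (hypothetical helper)."""
    return datetime.strptime(text, ASH_FORMAT)


# First and last samples showing reconfiguration-related waits (from the report).
first_sample = parse_ash_time("09-JUL-13 09.38.17.416 PM")
last_sample = parse_ash_time("09-JUL-13 09.38.32.882 PM")

elapsed_seconds = (last_sample - first_sample).total_seconds()
print(f"Reconfiguration window: {elapsed_seconds:.3f} s")
```

    Because the figure is derived from periodic ASH samples rather than from the exact start and end of the event, it is best read as a lower bound, which is consistent with the "15+ sec" on the slide.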

    Oracle RAC reconfiguration vs. business SLA

    If your application is a DSS: most of the time, reconfiguration will be invisible to the business.

    If your application is a large OLTP system: define your SLA and check whether reconfiguration can meet your expectations.

    Understand the workload and behavior

    Discover the data access pattern:

    o Is the application partitioned by node?

    o Is the workload random read/write, accessing from all nodes?

    o Is the workload read-mostly, accessing from all nodes and writing only from one node?

    o Is the application configured as active/passive?

    Understand the workload and behavior

    Measure the object distribution across all nodes.

    Maintain high uptime: avoid stopping or starting any instance frequently (i.e., avoid reconfiguration)

    How to minimize the impact of reconfiguration in Oracle RAC

    1. Size the buffer_cache properly.

    2. Use an optimal number of LMS processes.

    3. Move application traffic away from the node that is to be brought down, and wait as long as possible for its buffer cache to cool down.

    4. Increase the bandwidth and lower the latency of the private interconnect.

    Proper sizing of Buffer Cache

    Larger buffer cache => more objects => more time required for RAC reconfiguration

    If your application meets its I/O SLA with a smaller buffer cache, keep it as small as possible

    Prior to increasing the buffer cache size, consider the effect on RAC reconfiguration

    Note: RAC will require a larger buffer cache than a single node to serve the same amount of traffic in a failover situation

    Increasing LMS processes optimally

    Do not simply increase the number of LMS processes
    o Too many LMS processes cause synchronization overhead
    o Limit the number of LMS processes when you have multiple DBs on the same cluster

    Check:
    o The usage of the interconnect
    o CPU usage

    Increase the number of LMS processes only if you have enough bandwidth on the interconnect and idle CPU cycles

    Improve LMS performance further by applying fix 13843646: LONG LATCH WAIT ON GCS RESOURCE VALIDATE LIST ON LARGE NUMBER OF LMSES

    Planning for reconfiguration

    Move your traffic to other nodes before a planned shutdown

    Allow more time for auto remastering

    The fix for bug 10415371 helps reconfiguration complete faster (40-50% improvement); it needs the latest JAN PSU on 11.2.0.3

    Avoid unwanted double reconfiguration with fix 13812526

    SQL> alter system set db_cache_size=256M scope=memory sid=;

    From our experience

    It takes a good amount of patience to execute the special tool perfectly.

    Make sure one node can handle the entire traffic.

    Transfer all connections to one node.

    Aggressive lock conversion can cause more session spikes and make things worse.

    Monitor the lock conversion.

    Sharing the result from the passive node

    Time                        Passive node DLM locks
    --------------------------  ----------------------
    Before redirecting locks    12,340,238
    3 hours after               2,001,760
    5 hours after               435,628
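
    To make the drain rate in the table concrete, the sketch below computes what share of the original DLM locks was still open at each measurement. It is illustrative Python only and not part of the original talk; the variable names are assumptions, and the counts are the three values reported above.

```python
# DLM lock counts on the passive node, copied from the table above.
lock_counts = {
    "before redirecting locks": 12_340_238,
    "3 hours after": 2_001_760,
    "5 hours after": 435_628,
}

baseline = lock_counts["before redirecting locks"]

# Share of the original locks still open at each measurement point.
remaining_pct = {
    label: 100.0 * count / baseline for label, count in lock_counts.items()
}

for label, pct in remaining_pct.items():
    print(f"{label:26s} {pct:6.2f}% of locks remaining")
```

    On these numbers, roughly 16% of the locks remain after three hours and under 4% after five, which is why waiting longer before shutting down the passive instance pays off.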

    Determining the best time to shut down the passive instance

    Shut down the passive instance when you see:
    o A small number of DLM locks on the passive node

    Shrink the buffer_cache of the passive node

    Shutting down the passive instance with the immediate option will cause close to ZERO impact.

    Lessons learned

    Working with Oracle on further improvement (WIP)

    Do not flush the buffer cache on the passive node to close the open DLM locks
    o But shrink the buffer cache before shutdown

    Optimizing the LMS process count will improve reconfiguration.

    The fix for bug 10415371 helps reconfiguration complete faster (40-50% improvement).

    Avoid unwanted double reconfiguration with fix 13812526

    Instance startup from mount to open is impacted (2-10 sec) by CF enqueue contention; we received the fix (Bug 17237521: DELAY IN ARC0 STARTING UP)

    Disabled DRM using the parameter _gc_policy_time=0

    My Next Presentation

    Session ID: CON2468

    Session Title: Tech System Refresh: Zero Downtime with Oracle RAC/Oracle Automatic Storage Management

    Venue / Room: Moscone South - 300

    Date and Time: 9/26/13, 15:30 - 16:30


    THANK YOU

    Amit K Das ([email protected])
