seouc - how to solve the wrong problem

Upload: romeo-vasileniuc

Post on 02-Apr-2018


TRANSCRIPT

  • 7/27/2019 SEOUC - How to Solve the Wrong Problem


    SEOUC 2011 November 4th, 2011

    How to Solve the Wrong Problem (DFS Lock Handle)

    Romeo Vasileniuc

    BB&T Specialized Lending


    About Romeo..

    I have worked with Oracle databases since 1994

    I have been involved in nearly all aspects of Oracle database technologies, including RAC, ASM, Data Guard and Streams

    Designed and implemented many varieties of highly available database environments using RAC on ASM, OCFS2 and Tru64 CFS

    I enjoy performance tuning

    Proficient Perl developer

    In my current role as data warehouse architect with BB&T, I have architected and implemented many business-driven solutions using Oracle and other vendor products to meet critical business needs, especially in the warehousing area

    Oracle Certified Master, OCP, OCE


    Agenda

    DWH Story

    DFS Lock Handle

    Oracle DB Storage Overview

    Direct I/O

    Sync/Async Mode

    Supported Platforms

    System Monitoring

    Oracle testing tools: Orion

    Q&A


    DWH Story

    [Diagram: TDWH Service (nodes TDWH01) and DWH Service (nodes DWH01). Labels: Tru64 9.2.0.3; 9.2.0.3 >> 10.2.0.4; Tru64 10.2.0.4; TTS; RHEL5 10.2.0.4; 10.2.0.4 >> 10.2.0.5; RHEL5 10.2.0.5; Storage Vendor A; Storage Vendor B.]


    DFS Lock Handle

    Parameter   Description

    name        The name or "type" of the enqueue or global lock can be determined by looking at the two high-order bytes of P1 or P1RAW. The name is always two characters. Use the following SQL statement to retrieve the lock name.

                select chr(bitand(p1,-16777216)/16777215)||chr(bitand(p1,16711680)/65535) "Lock"
                from v$session_wait
                where event = 'DFS enqueue lock acquisition';

    mode        The mode is usually stored in the low-order bytes of P1 or P1RAW and indicates the mode of the enqueue or global lock request.

                select chr(bitand(p1,-16777216)/16777215)||chr(bitand(p1,16711680)/65535) "Lock",
                       bitand(p1, 65535) "Mode"
                from v$session_wait
                where event = 'DFS enqueue lock acquisition';

    id1         The first identifier (id1) of the enqueue or global lock takes its value from P2 or P2RAW. The meaning of the identifier depends on the name (P1).

    id2         The second identifier (id2) of the enqueue or global lock takes its value from P3 or P3RAW. The meaning of the identifier depends on the name (P1).
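
    The byte layout described above can also be decoded outside the database. The following is a minimal sketch, assuming P1 is available as a plain integer; the function name decode_lock_p1 and the sample value 0x49560005 are hypothetical and were constructed for illustration, not taken from a real trace.

```python
def decode_lock_p1(p1):
    """Split a wait-event P1 value into (lock name, mode).

    Assumes the layout described on the slide: the two high-order
    bytes hold the two-character lock name, the two low-order bytes
    hold the requested mode.
    """
    p1 &= 0xFFFFFFFF                      # keep the low 32 bits only
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode


# Hypothetical value: 'I' (0x49), 'V' (0x56), mode 5
print(decode_lock_p1(0x49560005))
```

    A name of "IV" would correspond to the invalidation lock mentioned in the bug description on the next slide.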


    DFS Lock Handle 10.2.0.4


    Bug 8215444

    Sessions may hang in a RAC shared server environment while waiting for invalidation locks to be acquired. This is most likely to occur when using Shared Servers with RAC - it should be very rare with dedicated server connections.

    Rediscovery Notes: In a RAC environment, sessions wait for "DFS lock handle" with the lock being waited on being an invalidation lock (type "IV")

    Fixed in:

    10.2.0.5

    11.2.0.2

    12.1 (future release)


    File System Characteristics

    Performance is not the most important point

    Oracle does not support files on file systems that do not have a write-through-cache capability

    The file system must acknowledge write operations (Standard NFS UDP / Network Appliance modified NFS)

    Security requirements: data files should be accessible only to the database owner

    Journaling file systems: changes are recorded in a journal file (fsck runs more quickly than on non-journaled file systems)


    Recommended File Systems

    Single node: any file system supported by the Linux vendor

    Multi-node (RAC):

      RAW

      OCFS/OCFS2

      Red Hat Global File System (GFS), see Document 329530.1

      NFS-based storage systems (e.g. NetApp, EMC)

      ASM


    Oracle Memory/Disk Workflow

    [Diagram: User Process, Redo Log Buffer, Buffer Cache (labelled CBC), Redo Log Writer, DB Writer, Storage]


    Linux Write Operations Ext3

    [Diagram: an Oracle process in user space calls into kernel space (two kernel switches). Boxes: Buffer Cache, Page Cache, Kernel, I/O, Disk.]

    Oracle kernel: fd1 = open("/system01.dbf", O_SYNC|..)

    Oracle process: ssize_t write(fd1, const void *buf, size_t count)


    Linux Read Operations Ext3

    [Diagram: an Oracle process in user space calls into kernel space (two kernel switches). Boxes: Buffer Cache, Page Cache, Kernel, I/O, Disk.]

    Oracle kernel: fd1 = open("/system01.dbf", O_SYNC|..);

    Oracle process: ssize_t read(fd1, void *buf, size_t count)


    Linux Write Operations RAW, OCFS

    [Diagram: an Oracle process in user space calls into kernel space (two kernel switches). Boxes: Buffer Cache, Kernel, I/O, Disk.]

    Oracle kernel: fd1 = open("/system01.dbf", O_SYNC|O_DIRECT|..);

    Oracle process: ssize_t write(fd1, const void *buf, size_t count)


    Linux Read Operations RAW, OCFS

    [Diagram: an Oracle process in user space calls into kernel space (two kernel switches). Boxes: Buffer Cache, Kernel, I/O, Disk.]

    Oracle kernel: fd1 = open("/system01.dbf", O_SYNC|O_DIRECT|..);

    Oracle process: ssize_t read(fd1, void *buf, size_t count)


    Linux System Call : open(..)

    int open(const char *pathname, int flags);

    O_APPEND
      The file is opened in append mode.

    O_ASYNC
      Enable signal-driven I/O.

    O_CLOEXEC (since Linux 2.6.23)
      Enable the close-on-exec flag for the new file descriptor.

    O_CREAT
      If the file does not exist it will be created.

    O_DIRECT (since Linux 2.4.10)
      Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.

    O_SYNC
      The file is opened for synchronous I/O. Any writes on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware.
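
    To make the flag combination concrete, here is a minimal sketch of a single-block write opened with O_SYNC plus O_DIRECT. The helper name write_block and the 4096-byte block size are assumptions made for illustration. O_DIRECT generally requires an aligned buffer, so the sketch uses an anonymous mmap (page-aligned), and it falls back to O_SYNC alone when the file system rejects O_DIRECT with EINVAL.

```python
import errno
import mmap
import os

BLOCK_SIZE = 4096  # assumed alignment; real devices may require 512 or 4096


def write_block(path, payload):
    """Write one zero-padded block; return True if O_DIRECT was used."""
    buf = mmap.mmap(-1, BLOCK_SIZE)       # anonymous mapping is page-aligned
    buf.write(payload[:BLOCK_SIZE])       # rest of the block stays zero
    base_flags = os.O_WRONLY | os.O_CREAT | os.O_SYNC
    direct_flag = getattr(os, "O_DIRECT", 0)   # 0 where the platform lacks it
    try:
        for flags in (base_flags | direct_flag, base_flags):
            last_try = flags == base_flags
            try:
                fd = os.open(path, flags, 0o600)
            except OSError as exc:
                if exc.errno == errno.EINVAL and not last_try:
                    continue              # file system refuses O_DIRECT
                raise
            try:
                os.pwrite(fd, buf, 0)     # aligned buffer, length and offset
                return bool(flags & direct_flag)
            except OSError as exc:
                if exc.errno != errno.EINVAL or last_try:
                    raise                 # otherwise retry without O_DIRECT
            finally:
                os.close(fd)
    finally:
        buf.close()
```

    The EINVAL fallback mirrors the tmpfs limitation mentioned later under Implementation Tips: not every file system accepts O_DIRECT.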


    AIO Overview

    Standard feature of 2.6 kernels (patches for 2.4)

    Initiate a number of I/O operations without having to block or wait for any to complete

    Later, or after being notified of I/O completion, the process can retrieve the results of the I/O


    I/O models

                    Blocking                         Non-blocking

    Synchronous     Read/Write                       Read/Write (O_NONBLOCK)

    Asynchronous    I/O multiplexing (select/poll)   AIO
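
    Two cells of this table are easy to reproduce with a pipe. The sketch below is illustrative only; the function name nonblocking_then_select is hypothetical. It first issues a synchronous non-blocking read (which fails immediately because no data is ready), then uses select() multiplexing to wait for readiness before reading.

```python
import os
import select


def nonblocking_then_select():
    """Return (result of the early read, data read after select)."""
    read_fd, write_fd = os.pipe()
    try:
        os.set_blocking(read_fd, False)   # same effect as O_NONBLOCK
        try:
            os.read(read_fd, 16)
            first = "data"
        except BlockingIOError:           # EAGAIN: nothing to read yet
            first = "EAGAIN"

        os.write(write_fd, b"block 1")
        # I/O multiplexing: block in select() until the descriptor is ready
        ready, _, _ = select.select([read_fd], [], [], 1.0)
        data = os.read(read_fd, 16) if ready else b""
        return first, data
    finally:
        os.close(read_fd)
        os.close(write_fd)


print(nonblocking_then_select())
```

    True AIO (the bottom-right cell) differs from both: the request is queued and the process is notified, or polls, for completion later.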


    Synchronous blocking I/O model


    Synchronous non-blocking I/O model


    Asynchronous blocking I/O model


    Asynchronous non-blocking I/O model (AIO)


    AIO interface APIs

    API function   Description

    aio_read       Request an asynchronous read operation
    aio_error      Check the status of an asynchronous request
    aio_return     Get the return status of a completed asynchronous request
    aio_write      Request an asynchronous write operation
    aio_suspend    Suspend the calling process until one or more asynchronous requests have completed (or failed)
    aio_cancel     Cancel an asynchronous I/O request
    lio_listio     Initiate a list of I/O operations


    Example AIO Usage : AIOCB Structure

    #include <aio.h>

    int aio_read(struct aiocb *aiocbp);

    struct aiocb {
        /* The order of these fields is implementation-dependent */

        int             aio_fildes;     /* File descriptor */
        off_t           aio_offset;     /* File offset */
        volatile void  *aio_buf;        /* Location of buffer */
        size_t          aio_nbytes;     /* Length of transfer */
        int             aio_reqprio;    /* Request priority */
        struct sigevent aio_sigevent;   /* Notification method */
        int             aio_lio_opcode; /* Operation to be performed;
                                           lio_listio() only */

        /* Various implementation-internal fields not shown */
    };


    Example AIO Usage

    ...
    #include <aio.h>
    ...

    int main()
    {
        // open the file
        // (O_DIRECT generally requires an aligned buffer, length and offset)
        int file = open("/tmp/file.txt", O_RDONLY|O_DIRECT|O_SYNC, 0);

        // create the buffer
        char* buffer = new char[SIZE_TO_READ];

        // create the control block structure
        aiocb cb;
        memset(&cb, 0, sizeof(aiocb));
        cb.aio_nbytes = SIZE_TO_READ;
        cb.aio_fildes = file;
        cb.aio_offset = 0;
        cb.aio_buf = buffer;

        // read!
        if (aio_read(&cb) == -1)
        {

    cout


    FILESYSTEMIO_OPTIONS

    Value      AIO   DIO   Description

    asynch     Yes   No    This allows asynchronous IO to be used where supported by the OS.

    directIO   No    Yes   This allows directIO to be used where supported by the OS. Direct IO bypasses any Unix buffer cache. As of 10.2 most platforms will try to use "directio" option for NFS mounted disks (and will also check NFS attributes are sensible).

    setall     Yes   Yes   Enables both ASYNC and DIRECT IO.

    none       No    No    This disables ASYNC IO and DIRECT IO so that Oracle uses normal synchronous writes, without any direct io options.


    FILESYSTEMIO_OPTIONS - ASM

    Value      AIO / DIO                     Description

    asynch     DISK_ASYNCH_IO {TRUE|FALSE}   This allows asynchronous IO to be used where supported by the OS.

    directIO   DISK_ASYNCH_IO {TRUE|FALSE}   This allows directIO to be used where supported by the OS. Direct IO bypasses any Unix buffer cache. As of 10.2 most platforms will try to use "directio" option for NFS mounted disks (and will also check NFS attributes are sensible).

    setall     DISK_ASYNCH_IO {TRUE|FALSE}   Enables both ASYNC and DIRECT IO.

    none       DISK_ASYNCH_IO {TRUE|FALSE}   This disables ASYNC IO and DIRECT IO so that Oracle uses normal synchronous writes, without any direct io options.


    DISK_ASYNCH_IO

    Controls whether I/O to datafiles, control files and logfiles is asynchronous (that is, whether parallel server processes can overlap I/O requests with CPU processing during table scans).

    Default value: TRUE

    Set to FALSE to disable asynchronous I/O

    If you set DISK_ASYNCH_IO to FALSE, you should set DBWR_IO_SLAVES to a value other than its default (0) in order to simulate asynchronous I/O

    If DBWR_IO_SLAVES > 0, the number of I/O server processes used by ARCH and LGWR is set to 4. The number of I/O server processes used by RMAN is set to 4 only if asynchronous I/O is disabled
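
    The interaction between the two parameters can be summarised as a small lookup. This is a sketch under stated assumptions, not Oracle's actual decision logic: the function name requested_io_modes is hypothetical, it assumes that on a file system asynchronous I/O needs both FILESYSTEMIO_OPTIONS (asynch or setall) and DISK_ASYNCH_IO=TRUE, and for ASM it follows the slide above, where only DISK_ASYNCH_IO matters and direct I/O is reported as None (not governed by FILESYSTEMIO_OPTIONS).

```python
# (asynchronous I/O, direct I/O) requested by each FILESYSTEMIO_OPTIONS value
FILESYSTEMIO_OPTIONS = {
    "asynch": (True, False),
    "directio": (False, True),
    "setall": (True, True),
    "none": (False, False),
}


def requested_io_modes(filesystemio_options="none", disk_asynch_io=True,
                       on_asm=False):
    """Return the I/O modes the two parameters ask for (assumed rules)."""
    value = filesystemio_options.strip().lower()
    if value not in FILESYSTEMIO_OPTIONS:
        raise ValueError(f"unknown FILESYSTEMIO_OPTIONS value: {value!r}")
    want_async, want_direct = FILESYSTEMIO_OPTIONS[value]
    if on_asm:
        # Assumption from the ASM slide: only DISK_ASYNCH_IO is consulted
        return {"async": disk_asynch_io, "direct": None}
    return {"async": want_async and disk_asynch_io, "direct": want_direct}


print(requested_io_modes("setall"))
print(requested_io_modes("none", on_asm=True))
```

    Whether the OS and file system actually honour the request still has to be verified at the OS level.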


    How to check the usage at OS Level

    If I/O async is enabled:

    [oracle@dwh01 ~]$ sudo cat /proc/slabinfo | grep kio
    kioctx 89 144 320 12 1 : tunables 54 27 8 : slabdata 12 12 0
    kiocb 47 240 256 15 1 : tunables 120 60 8 : slabdata 13 16 0

    If I/O async is disabled:

    [oracle@lnx6 ~]$ sudo cat /proc/slabinfo | grep kio
    kioctx 0 0 320 12 1 : tunables 54 27 8 : slabdata 0 0 0
    kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
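
    The same check can be scripted. The sketch below is a hedged illustration: the function names are hypothetical, and it assumes the slabinfo layout shown above, where the second field is the number of active objects. It treats non-zero kioctx or kiocb counts as a sign that kernel AIO is in use.

```python
ENABLED_SAMPLE = """\
kioctx 89 144 320 12 1 : tunables 54 27 8 : slabdata 12 12 0
kiocb 47 240 256 15 1 : tunables 120 60 8 : slabdata 13 16 0
"""

DISABLED_SAMPLE = """\
kioctx 0 0 320 12 1 : tunables 54 27 8 : slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
"""


def aio_slab_counts(slabinfo_text):
    """Return active object counts for the kioctx and kiocb slab caches."""
    counts = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("kioctx", "kiocb"):
            counts[fields[0]] = int(fields[1])   # second field: active objects
    return counts


def aio_in_use(slabinfo_text):
    """True when either AIO slab cache has active objects."""
    return any(count > 0 for count in aio_slab_counts(slabinfo_text).values())


print(aio_in_use(ENABLED_SAMPLE), aio_in_use(DISABLED_SAMPLE))
```

    On a live system the input would be the contents of /proc/slabinfo, which usually requires root to read.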


    How to check the usage at DB Level

    If I/O async is enabled:

    [oracle@a01 ~]$ ps -eaf | grep lgw
    oracle 13924 1 0 Oct12 ? 00:19:18 ora_lgwr_DWH1

    [oracle@a01 ~]$ strace -p 13924
    io_submit(47566907363328, 2, {{0x2b4309936b70, 0, 1, 0, 41}, {0x2b4309936d38, 0, 1, 0, 42}}) = 2
    io_getevents(47566907363328, 1, 1024, {{0x2b4309936b70, 0x2b4309936b70, 512, 0}, {0x2b4309936d38, 0x2b4309936d38, 512, 0}}, {600, 0}) = 2

    If I/O async is disabled:

    [oracle@b01 ~]$ ps -eaf | grep lgwr
    oracle 28400 1 0 Feb22 ? 05:41:21 ora_lgwr_PMT1

    [oracle@b01 ~]$ strace -p 28400
    ..
    pwrite64(20, "\1\"\0\0\4)\0\0p\317\10\0\20\200..\213k\r\1\0\0042"..., 512, 5376000) = 512
    pwrite64(21, "\1\"\0\0\4)\0\0p\317\10\0\20\200..\213k\r\1\0\0042"..., 512, 5376000) = 512
    ..
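
    When a trace is long, the distinction between the two cases can be made mechanically. The following sketch is an assumption-laden illustration: the function names are hypothetical, the sample lines are abbreviated from the slide, and the rule is simply that io_submit indicates the asynchronous path while pwrite/pwrite64 alone indicates synchronous writes.

```python
import re
from collections import Counter

SYSCALL_NAME = re.compile(r"^\s*(\w+)\(")   # name before the opening parenthesis


def count_syscalls(strace_lines):
    """Count how often each system call appears in strace -p output."""
    names = Counter()
    for line in strace_lines:
        match = SYSCALL_NAME.match(line)
        if match:
            names[match.group(1)] += 1
    return names


def lgwr_write_path(strace_lines):
    """Classify a log writer trace as 'async', 'sync' or 'unknown'."""
    names = count_syscalls(strace_lines)
    if names["io_submit"]:
        return "async"
    if names["pwrite64"] or names["pwrite"]:
        return "sync"
    return "unknown"


# Abbreviated sample lines (arguments shortened for readability)
async_trace = ["io_submit(47566907363328, 2, {...}) = 2",
               "io_getevents(47566907363328, 1, 1024, {...}, {600, 0}) = 2"]
sync_trace = ['pwrite64(20, "..."..., 512, 5376000) = 512',
              'pwrite64(21, "..."..., 512, 5376000) = 512']
print(lgwr_write_path(async_trace), lgwr_write_path(sync_trace))
```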


    How to check the usage at DB Level - ASM

    Async enabled by default on >= 10g and ASM

    You can disable it using DISK_ASYNCH_IO=false

    [oracle@c01 ~]$ strace -p 13260
    ...
    open(0xfe2c85f0, O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0660) = 8
    writev(8, [?] 0xffffb608, 2) = 80
    ...
    read(16, "MSA\0\2\0\10\0P\0\0\0\222\377\377\377@\313\373\5\0\0\0"..., 80) = 80
    ...


    Oracle Linux, Filesystem & I/O Type Supportability (Doc ID 279069.1)

                 10g Async I/O   10g Direct I/O   11g Async I/O   11g Direct I/O

    ext3/ext4    Yes (8)         Yes (8)          Yes (8)         Yes (8)
    raw          Depr. (2)       Depr. (2)        Depr. (2)       Depr. (2)
    block        Yes (3)         Yes (3)          Yes (3)         Yes (3)
    ASM (4)      Yes             Yes              Yes             Yes
    OCFS2        Yes             Yes              Yes             Yes
    NFS          Yes             Yes (7)          Yes (7)         Yes

    GFS: Yes (5), Yes (5)

    GFS2: Yes (6), Yes (6)


    System Monitoring - LGWR

    [oracle@dwh1 ~]$ ps -eaf | grep lgwr
    oracle 28878 1 0 11:47 ? 00:00:02 ora_lgwr_test1
    [oracle@pd3 ~]$

    [oracle@dwh1 ~]$ strace -cp 28878
    Process 28878 attached - interrupt to quit
    Process 28878 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     91.68    0.076143          43      1791           pwrite
      5.67    0.004709          16       290           pread
      2.59    0.002152          10       212         7 semtimedop
      0.06    0.000053           0      5945           times
      0.00    0.000000           0         1           read
      0.00    0.000000           0        36           open
      0.00    0.000000           0        36           close
      0.00    0.000000           0        21           stat
      0.00    0.000000           0        35           writev
      0.00    0.000000           0        10           sendto
      0.00    0.000000           0       123           kill
      0.00    0.000000           0        67           semctl
      0.00    0.000000           0        10           getrusage
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.083057                  8577         7 total


    System Monitoring - DBWR

    [oracle@dwh1 ~]$ ps -eaf | grep dbw
    oracle 28876 1 0 11:47 ? 00:00:03 ora_dbw0_test1

    [oracle@dwh1 ~]$ strace -cp 28876
    Process 28876 attached - interrupt to quit
    Process 28876 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     89.77    0.019979          54       370           pwrite
      5.67    0.001262           0     70462           getrusage
      4.49    0.000999          42        24         9 semtimedop
      0.07    0.000015           0       817           times
      0.00    0.000000           0         2           mmap
      0.00    0.000000           0        21           semctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.022255                 71696         9 total


    System Monitoring LGWR - ASM

    [oracle@pd3 ~]$ ps -eaf | grep lgwr
    oracle 30809 1 0 13:37 ? 00:00:00 ora_lgwr_test1

    [oracle@pd3 ~]$ strace -cp 30809
    Process 30809 attached - interrupt to quit
    Process 30809 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     65.97    0.034288          91       377           io_submit
     13.77    0.007155           4      1839           io_getevents
     11.15    0.005794          20       295           pread
      8.04    0.004178          13       314         7 semtimedop
      0.82    0.000428          31        14           pwrite
      0.23    0.000118           0      5475           times
      0.02    0.000011           0       248           kill
      0.00    0.000000           0        32           open
      0.00    0.000000           0        32           close
      0.00    0.000000           0        21           stat
      0.00    0.000000           0        32           writev
      0.00    0.000000           0         6           sendto
      0.00    0.000000           0        42           semctl
      0.00    0.000000           0         8           getrusage
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.051972                  8735         7 total


    System Monitoring DBWR - ASM

    [oracle@pd3 ~]$ ps -eaf | grep dbw
    oracle 30807 1 0 13:37 ? 00:00:00 ora_dbw0_test1

    [oracle@pd3 ~]$ strace -cp 30807
    Process 30807 attached - interrupt to quit
    Process 30807 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     88.57    0.036811         449        82           io_submit
      5.32    0.002210           0     96686           getrusage
      4.36    0.001814           4       451           io_getevents
      1.11    0.000463           1       416           mmap
      0.57    0.000238           0      6008           kill
      0.06    0.000024           0      3488           times
      0.00    0.000000           0         2           semop
      0.00    0.000000           0        96           semctl
      0.00    0.000000           0        22         8 semtimedop
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.041560                107251         8 total
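
    The strace -c summaries on the last four slides can be compared programmatically. Below is a minimal sketch, assuming the column layout shown above; the function names and the "async write share" metric are hypothetical conveniences, and the sample is an excerpt of the LGWR on ASM summary. Note that one io_submit call can carry several I/O requests, so the share is only a rough indicator.

```python
import re

# % time, seconds, usecs/call, calls, optional errors, syscall name
ROW = re.compile(
    r"(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s+(\d+)\s+(?:(\d+)\s+)?([A-Za-z_]\w*)"
)

ASM_LGWR_SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.97    0.034288          91       377           io_submit
 13.77    0.007155           4      1839           io_getevents
  8.04    0.004178          13       314         7 semtimedop
  0.82    0.000428          31        14           pwrite
------ ----------- ----------- --------- --------- ----------------
100.00    0.051972                  8735         7 total
"""


def parse_strace_summary(text):
    """Map each syscall name to its call count, skipping the total row."""
    calls = {}
    for line in text.splitlines():
        match = ROW.fullmatch(line.strip())
        if match and match.group(6) != "total":
            calls[match.group(6)] = int(match.group(4))
    return calls


def async_write_share(calls):
    """Fraction of write submissions that went through io_submit."""
    submitted = calls.get("io_submit", 0)
    synchronous = calls.get("pwrite", 0) + calls.get("pwrite64", 0)
    total = submitted + synchronous
    return submitted / total if total else None


summary = parse_strace_summary(ASM_LGWR_SAMPLE)
print(summary, async_write_share(summary))
```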


    ORION (Oracle I/O Calibration Tool)

    Standalone tool for calibrating the I/O performance of storage systems that are intended to be used for Oracle databases

    No need to create and run an Oracle DB

    May be configured to simulate OLTP or DWH environments


    Orion Configuration

    [oracle@pd3 ~]$ cat dwh_dg1.lun
    /dev/mapper/mpath7p1
    /dev/mapper/mpath8p1
    /dev/mapper/mpath9p1
    /dev/mapper/mpath14p1


    Orion Configuration

    [oracle@dwh01 orion]$ cat orion-test1.sh
    #!/bin/bash
    HOST=`hostname -s`
    DG=$1
    TEST="${HOST}-${DG}"

    cd ${ROOT}

    echo "Hostname: ${HOST}"
    echo " dg: ${DG}"
    echo " test: ${TEST}"

    sudo ./orion_linux_x86-64 -run advanced \
      -testname ${TEST} \
      -matrix point \
      -num_small 0 \
      -num_large 4 \
      -size_large 1024 \
      -num_disks 4 \
      -type seq \
      -num_streamIO 4 \
      -simulate raid0 \
      -cache_size 0 \
      -stripe 1024 \
      -write 50 \
      -verbose


    Orion Run : summary.txt

    ORION VERSION 11.1.0.7.0

    Commandline: -run simple -testname dg1 -num_disks 4

    This maps to this test:
    Test: dg1
    Small IO size: 8 KB
    Large IO size: 1024 KB
    IO Types: Small Random IOs, Large Random IOs
    Simulated Array Type: CONCAT
    Write: 0%
    Cache Size: Not Entered
    Duration for each Data Point: 60 seconds
    Small Columns:, 0
    Large Columns:, 0, 1, 2, 3, 4, 5, 6, 7, 8
    Total Data Points: 29

    Name: /dev/mapper/mpath7p1 Size: 1099522496512
    Name: /dev/mapper/mpath8p1 Size: 1099522496512
    Name: /dev/mapper/mpath9p1 Size: 1099522496512
    Name: /dev/mapper/mpath14p1 Size: 1099522496512
    4 FILEs found.

    Maximum Large MBPS=330.91 @ Small=0 and Large=8
    Maximum Small IOPS=8856 @ Small=20 and Large=0
    Minimum Small Latency=0.85 @ Small=1 and Large=0
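
    When several Orion runs need to be compared (for example across storage vendors), the three result lines can be pulled out of summary.txt with a short script. This is a sketch that assumes the exact line format shown above; the function name parse_orion_summary is hypothetical.

```python
import re

RESULT_LINE = re.compile(
    r"(Maximum|Minimum) (.+?)=([\d.]+) @ Small=(\d+) and Large=(\d+)"
)

SUMMARY_SAMPLE = """\
Maximum Large MBPS=330.91 @ Small=0 and Large=8
Maximum Small IOPS=8856 @ Small=20 and Large=0
Minimum Small Latency=0.85 @ Small=1 and Large=0
"""


def parse_orion_summary(text):
    """Return each headline metric with its value and load point."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        kind, metric, value, small, large = match.groups()
        results[f"{kind} {metric}"] = {
            "value": float(value),
            "small": int(small),   # outstanding small I/Os at this point
            "large": int(large),   # outstanding large I/Os at this point
        }
    return results


for name, point in parse_orion_summary(SUMMARY_SAMPLE).items():
    print(name, point)
```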


    Implementation Tips..

    Test, Test, Test

    Make friends in the Storage/System Administration Groups

    Be aware of any existing bugs/limitations: Data Pump Export (EXPDP) Received Error ORA-31641 (Doc ID 1330406.1). This occurs when filesystemio_options is set to DIRECTIO or SETALL and expdp exports tables to a file system (tmpfs) that does not support O_DIRECT.

    alter system set filesystemio_options=none scope=spfile;


    References

    Metalink

    Oracle Linux, Filesystem & I/O Type Supportability (Doc ID 279069.1)

    ASM Inherently Performs Asynchronous I/O Regardless of filesystemio_options Parameter (Doc ID 751463.1)

    How To Check if Asynchronous I/O is Working On Linux (Doc ID 237299.1)

    Supported and Recommended File Systems on Linux (Doc ID 236826.1)

    Comparing Performance Between RAW IO vs OCFS vs EXT 2/3 (Doc ID 236679.1)

    Using Redhat Global File System (GFS) as shared storage for RAC (Doc ID 329530.1)

    Asynchronous I/O (aio) on RedHat Advanced Server 2.1 and RedHat Enterprise Linux 3 (Doc ID 225751.1)

    Asynchronous I/O Support on OCFS/OCFS2 and Related Settings: filesystemio_options, disk_asynch_io (Doc ID 432854.1)

    Other Sites

    M. Tim Jones, Boost application performance using asynchronous I/O (http://www.ibm.com/developerworks/linux/library/l-async/index.html)


    Questions & Contact Info

    http://blog.romeosoft.com/

    [email protected]
