Making the most of Solid State Disk in Oracle 11g

Guy Harrison
Director, R&D Melbourne

Email: guy.harrison@quest.com
Twitter: @guyharrison
Web: http://www.guyharrison.net

©2011 Quest Software, Inc. All rights reserved.

Introductions

Agenda

• Brief History of Magnetic Disk

• Solid State Disk (SSD) technologies

• SSD internals

• Oracle DB flash cache architecture

• Performance comparisons

• Recommendations and Suggestions

A brief history of disk

5MB HDD circa 1956

28MB HDD - 1961

1800 RPM

The more that things change....

Moore’s law

• Transistor density doubles every 18 months
• Exponential growth is observed in most electronic components:
  • CPU clock speeds
  • RAM
  • Hard Disk Drive storage density
• But not in mechanical components:
  • Service time (seek latency): limited by actuator arm speed and disk circumference
  • Throughput (rotational latency): limited by speed of rotation, circumference and data density
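
To make the doubling claim concrete, the sketch below computes the growth factor implied by an 18-month doubling period. The function name and parameters are illustrative, and it assumes idealised exponential growth rather than any measured data.

```python
def growth_factor(years, doubling_months=18.0):
    """Growth multiple after `years`, assuming one doubling every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    # An eight-year span, such as the 2001-2009 period in the disk trends slide.
    print(f"8 years at an 18-month doubling: {growth_factor(8):.1f}x")
```

Nothing mechanical compounds this way, which is why seek and rotational latency fall behind components that track transistor density.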

Disk trends 2001-2009

Percentage change:
• IO Rate: 260
• Disk Capacity: 1,635
• IO/Capacity: -630
• CPU: 1,013
• IO/CPU: -390

Solid State Disk

SSD to the rescue?

Seek time (us):
• Magnetic Disk: 4,000
• SSD SATA Flash: 80
• SSD PCI flash: 25
• SSD DDR-RAM: 15

Power consumption

[Chart: power draw in watts (logarithmic scale, 1 to 100) for Flash SSD and SATA HDD when idle, seeking and starting up. Data labels: 8, 10, 20.]

Economics of SSD

Device                      $/GB     $/IOP
Seagate SATA HDD            0.09     2.38
Seagate SAS HDD             1.00     1.53
Intel MLC SATA SSD          6.88     0.05
Intel SLC SATA SSD          21.88    0.05
FusionIO PCI MLC Duo SSD    24.92    0.06
FusionIO PCI SLC SSD        53.44    0.06
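
The trade-off between cost per GB and cost per IO can be sketched with a small calculation. The figures are the chart labels from this slide, and pairing them with devices by label order is an assumption. The cost model (pay for whichever requirement is more expensive to meet) and the function names are hypothetical simplifications, not a sizing method from the presentation.

```python
# ($/GB, $/IOP) per device, as read from the slide's chart labels.
# Assumption: values pair with devices in the order they are listed.
DEVICES = {
    "Seagate SATA HDD": (0.09, 2.38),
    "Seagate SAS HDD": (1.00, 1.53),
    "Intel MLC SATA SSD": (6.88, 0.05),
    "Intel SLC SATA SSD": (21.88, 0.05),
    "FusionIO PCI MLC Duo SSD": (24.92, 0.06),
    "FusionIO PCI SLC SSD": (53.44, 0.06),
}


def storage_cost(device, capacity_gb, iops):
    """Hypothetical model: buy enough of one device to meet BOTH targets."""
    dollars_per_gb, dollars_per_iop = DEVICES[device]
    return max(capacity_gb * dollars_per_gb, iops * dollars_per_iop)


def cheapest(capacity_gb, iops):
    """Return (device, cost) for the cheapest way to meet both targets."""
    return min(
        ((name, storage_cost(name, capacity_gb, iops)) for name in DEVICES),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    print(cheapest(capacity_gb=500, iops=20_000))   # small but IO-hungry
    print(cheapest(capacity_gb=10_000, iops=100))   # large but rarely read
```

Under these assumptions the IO-hungry workload is cheapest on a flash device, while the capacity-heavy one is cheapest on the SATA HDD.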

Tiered storage management

[Pyramid diagram with $/IOP and $/GB axes; tiers from top to bottom:]
• Main Memory
• DDR SSD
• Flash SSD
• Fast Disk (SAS, RAID 0+1)
• Slow Disk (SATA, RAID 5)
• Tape, Flat Files, Hadoop

Storage Tiering

Storage Tiering For Dummies,® Oracle Special Edition, Wiley 2011

SSD technology and internals

Flavours of Flash SSD

DDR RAM Drive

SATA flash drive

PCI flash drive

SSD storage Server

PCI SSD vs SATA SSD

• SATA was designed for traditional disk drives with high latencies
• PCI is designed for high speed devices
• PCI SSD latency is roughly one third that of SATA SSD

Flash SSD Technology

Storage hierarchy:
• Cell: one (SLC) or two (MLC) bits
• Page: typically 4K
• Block: typically 128-512K

Writes:
• Read and first write require a single page IO
• Overwriting a page requires an erase and overwrite of the block

Write endurance:
• 100,000 erase cycles for SLC before failure
• 5,000 – 10,000 erase cycles for MLC

Flash SSD performance

Operation time (microseconds):
• Read (4k page seek): 25
• First insert (4k page write): 250
• Update (256K block erase): 2,000

Flash Disk write degradation

• All blocks empty: write time = 250 us
• 25% of blocks partially full: write time = (¾ × 250 us + ¼ × 2000 us) = 687.5 us
• 75% of blocks partially full: write time = (¼ × 250 us + ¾ × 2000 us) = 1,562.5 us

[Diagram legend: Empty, Partially Full]
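
The weighted averages on this slide can be reproduced with a short sketch. The 250 us page write and 2000 us block erase are the slide's figures. The function name and the idea of passing the fraction of writes that hit partially full blocks are illustrative assumptions.

```python
def expected_write_us(fraction_needing_erase, page_write_us=250.0, block_erase_us=2000.0):
    """Average write latency when some writes must erase and rewrite a block.

    fraction_needing_erase: share of writes landing on partially full blocks.
    """
    if not 0.0 <= fraction_needing_erase <= 1.0:
        raise ValueError("fraction_needing_erase must be between 0 and 1")
    cheap_share = 1.0 - fraction_needing_erase
    return cheap_share * page_write_us + fraction_needing_erase * block_erase_us


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.75):
        print(f"{fraction:.0%} partially full: {expected_write_us(fraction)} us")
```

The model is linear, so every extra percentage point of partially full blocks adds the same 17.5 us to the average write.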

Data Insert

[Diagram: the SSD controller performing an insert, showing the free block pool and the used block pool, with pages marked as valid, empty or invalid]

Data Update

[Diagram: the SSD controller performing an update, showing the free block pool and the used block pool, with pages marked as valid, empty or invalid]

Garbage Collection

[Diagram: the SSD controller performing garbage collection, showing the free block pool and the used block pool, with pages marked as valid, empty or invalid]

11g DB flash Cache

Oracle DB flash cache

• Introduced in 11gR2 for OEL and Solaris only
• Secondary cache maintained by the DBWR, but only when idle cycles permit
• Architecture is tolerant of poor flash write performance

Buffer cache and Free buffer waits

[Diagram: an Oracle process reads from disk, and reads from and writes to the buffer cache; DBWR writes dirty blocks from the buffer cache to the database files; the Oracle process experiences free buffer waits]

Free buffer waits often occur when reads are much faster than writes.

Flash Cache

[Diagram: an Oracle process reads from disk, reads from and writes to the buffer cache, and reads from the flash cache; DBWR writes dirty blocks to the database files and writes clean blocks to the flash cache (time permitting)]

DB flash cache architecture is designed to accelerate buffered reads.
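
The read path in this diagram can be illustrated with a toy two-tier cache. This is a sketch under simplifying assumptions, not Oracle's implementation: both tiers use plain LRU, every clean block aged out of the buffer cache is always copied to flash (the real DBWR does so only when time permits), writes and dirty blocks are not modelled, and a block promoted back to the buffer cache is dropped from flash. The class and method names are hypothetical.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy model: an LRU buffer cache backed by an LRU flash cache."""

    def __init__(self, buffer_blocks, flash_blocks):
        self.buffer_blocks = buffer_blocks
        self.flash_blocks = flash_blocks
        self.buffer = OrderedDict()  # least recently used block first
        self.flash = OrderedDict()
        self.stats = {"buffer": 0, "flash": 0, "disk": 0}

    def read(self, block_id):
        """Read a block and report which tier satisfied the request."""
        if block_id in self.buffer:
            self.buffer.move_to_end(block_id)  # mark as most recently used
            self.stats["buffer"] += 1
            return "buffer"
        if block_id in self.flash:
            del self.flash[block_id]  # promoted back into the buffer cache
            source = "flash"
        else:
            source = "disk"
        self.stats[source] += 1
        self._load(block_id)
        return source

    def _load(self, block_id):
        """Place a block in the buffer cache, ageing out the LRU block if full."""
        if len(self.buffer) >= self.buffer_blocks:
            victim, _ = self.buffer.popitem(last=False)
            self._age_out(victim)
        self.buffer[block_id] = True

    def _age_out(self, block_id):
        """Copy a clean block leaving the buffer cache into the flash cache."""
        if self.flash_blocks == 0:
            return
        if len(self.flash) >= self.flash_blocks:
            self.flash.popitem(last=False)
        self.flash[block_id] = True


if __name__ == "__main__":
    cache = TwoTierCache(buffer_blocks=2, flash_blocks=4)
    print([cache.read(block) for block in (1, 2, 3, 1)])
    print(cache.stats)
```

In this run the fourth read finds block 1 in flash rather than on disk, because it was aged out of the two-block buffer cache when block 3 arrived.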

Configuration

• Create a filesystem on the flash device
• Set DB_FLASH_CACHE_FILE and DB_FLASH_CACHE_SIZE
• Consider FILESYSTEMIO_OPTIONS=SETALL
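
As a rough illustration of the parameter settings, the helper below renders the corresponding ALTER SYSTEM statements. The helper itself, the file path and the 64G size are hypothetical placeholders. SCOPE=SPFILE is used so that the settings are recorded for the next instance restart; check the parameter documentation for your release before applying anything like this.

```python
def flash_cache_statements(cache_file, cache_size, set_io_options=True):
    """Render ALTER SYSTEM statements for the DB flash cache parameters.

    cache_file: path on the flash filesystem (hypothetical in this example).
    cache_size: size string accepted by Oracle, such as "64G".
    """
    statements = [
        f"ALTER SYSTEM SET db_flash_cache_file = '{cache_file}' SCOPE=SPFILE;",
        f"ALTER SYSTEM SET db_flash_cache_size = {cache_size} SCOPE=SPFILE;",
    ]
    if set_io_options:
        statements.append(
            "ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE=SPFILE;"
        )
    return statements


if __name__ == "__main__":
    # Placeholder path and size; substitute values for your own flash device.
    for statement in flash_cache_statements("/flash/oradata/flash_cache.dbf", "64G"):
        print(statement)
```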

Flash KEEP pool

• You can prioritise blocks for important objects using the FLASH_CACHE storage clause, for example STORAGE (FLASH_CACHE KEEP) on a table or index

Oracle DB flash cache statistics

http://guyharrison.squarespace.com/storage/flash_insert_stats.sql

Flash Cache Efficiency

http://guyharrison.squarespace.com/storage/flash_time_savings.sql

Flash cache Contents

http://guyharrison.squarespace.com/storage/flashContents.sql

Performance tests

Test systems

• Low end system:
  • Dell Optiplex, dual-core, 4GB RAM
  • 2 x Seagate 7200 RPM Barracuda SATA HDD
  • Intel X25-E SLC SATA SSD
• Higher end system:
  • Dell R510, 2 x quad core, 32 GB RAM
  • 4 x 300GB 15K RPM, 6Gbps Dell SAS HDD
  • 1 x FusionIO ioDrive SLC PCI SSD

Performance: indexed reads (X25-E)

Total elapsed (s); the chart breaks each bar into db file IO, flash cache IO and Other:
• No Flash: 529.7
• Flash cache: 143.27
• Flash tablespace: 48.17

Performance: Read/Write (X25-E)

Total elapsed time (s); the chart breaks each bar into db file IO, write complete, free buffer, flash cache IO and Other:
• No Flash: 3,289
• Flash Cache: 1,693
• Flash tablespace: 200

Random reads – FusionIO

Total elapsed time (s); the chart breaks each bar into DB File IO, Flash cache IO and Other:
• SAS disk, no flash cache: 2,211
• SAS disk, flash cache: 583
• Table on SSD: 121

Updates – FusionIO

Total elapsed time (s); the chart breaks each bar into db file IO, log file IO, flash cache, free buffer waits and Other:
• SAS disk, no flash cache: 6,219
• SAS disk, flash cache: 1,934
• Table on SSD: 529

Full table scan – FusionIO

Total elapsed time (s); the chart breaks each bar into DB File IO, Flash Cache IO and Other:
• SAS disk, no flash cache: 418
• SAS disk, flash cache: 398
• Table on SSD: 72
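
To compare the three FusionIO tests on one scale, the sketch below converts the total elapsed times from the random read, update and full table scan charts into speedups over the SAS-only baseline. The numbers are the chart data labels, paired with configurations in label order (an assumption); the function and dictionary names are illustrative.

```python
# Total elapsed seconds from the FusionIO charts (values paired by label order).
ELAPSED = {
    "Random reads": {
        "SAS disk, no flash cache": 2211,
        "SAS disk, flash cache": 583,
        "Table on SSD": 121,
    },
    "Updates": {
        "SAS disk, no flash cache": 6219,
        "SAS disk, flash cache": 1934,
        "Table on SSD": 529,
    },
    "Full table scan": {
        "SAS disk, no flash cache": 418,
        "SAS disk, flash cache": 398,
        "Table on SSD": 72,
    },
}
BASELINE = "SAS disk, no flash cache"


def speedups(results, baseline=BASELINE):
    """Return baseline_time / config_time for every non-baseline configuration."""
    return {
        test: {
            config: runs[baseline] / seconds
            for config, seconds in runs.items()
            if config != baseline
        }
        for test, runs in results.items()
    }


if __name__ == "__main__":
    for test, ratios in speedups(ELAPSED).items():
        for config, ratio in ratios.items():
            print(f"{test:16s} {config:22s} {ratio:5.2f}x")
```

On these figures the flash cache gives a clear gain for random reads and updates but almost none for the full table scan, whereas placing the table on SSD helps in all three tests.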

Sorting – what we expect

[Chart: time against PGA memory available (MB), split into Table/Index IO, CPU Time and Temp Segment IO, across three regions: Memory Sort, Single Pass Disk Sort and Multi-pass Disk Sort]

Disk Sorts – temporary tablespace

[Chart: elapsed time (s, 0 to 4,000) against sort area size (0 to 300) for a SAS based TTS and an SSD based TTS, across the Single Pass Disk Sort and Multi-pass Disk Sort regions]

Redo performance – FusionIO

Total elapsed time (s); the chart shows Log IO as a component of each bar:
• SAS based redo log: 292.39
• Flash based redo log: 291.93

Concurrent redo workload (x10)

[Chart: elapsed time (s, 0 to 4,500) for SAS based and flash based redo logs, stacked by CPU, Other and Log File IO. Data labels: 1,605, 1,637, 397, 331, 1,944, 1,681.]

Buffer Cache bottlenecks

• Flash cache architecture avoids ‘free buffer waits’ due to flash IO, but write complete waits can still occur on hot blocks
• Free buffer waits are still likely against the database files, due to high physical read rates created by the flash cache

Write degradation

• In theory, high sustained write IO can lead to SSD degradation when garbage collection (GC) fails to cope with the block erase/update cycle
• In practice, this is rarely noticeable from Oracle:
  • Oracle write IO is largely asynchronous (DBWR)
  • Almost all write activity has at least an equal amount of read activity
  • Garbage collection and wear levelling algorithms are sophisticated in decent SSD drives

FusionIO direct cache

Regular Block Device (limited to the size of the SSD)
• Components: ioMemory VSL; File System / Raw Devices / ASM
• Used for: Temp Tablespace, Hot Segments, Hot Partitions, DB Flash Cache

Caching Block Device
• Components: ioMemory VSL; LUN; directCache; File System / Raw Devices / ASM
• Used for: read-intensive, potentially massive tablespaces

FusionIO direct cache – Table scans

Total elapsed time (s); the chart breaks each bar into IO and Other:
• No cache, 1st scan: 147
• No cache, 2nd scan: 147
• direct cache on, 1st scan: 147
• direct cache on, 2nd scan: 36

Exadata

Exadata flash storage

• 4 x 96GB PCI Flash drives on each storage server
• Flash can be configured as:
  • Exadata Smart Flash Cache (ESFC)
  • Solid State Disk available to ASM disk groups
• ESFC is not the same as the DB flash cache:
  • Maintained by cellsrv, not DBWR
  • DOES support full table scans
  • DOES NOT support smart scans unless CELL_FLASH_CACHE=KEEP
  • Statistics accessed via the cellcli program
• Considerations for cache vs. SSD are similar

Exadata: Flash grid disk vs ESFC

100M row table, 200,000 random PK lookups, 1M possible keys. Elapsed seconds (chart series: CPU and Total):
• SAS disk, no flash cache: 1,240
• SAS disk with flash cache: 429
• SSD disks (no flash cache): 119

Summary

Recommendations

• Don’t wait for SSD to become as cheap as HDD
  • Magnetic HDD will always be cheaper per GB, SSD cheaper per IO
• Consider a mixed or tiered storage strategy
  • Using DB flash cache, selective SSD tablespaces or partitions
  • Use SSD where your IO bottleneck is greatest and SSD advantage is significant
• DB flash cache offers an easy way to leverage SSD for OLTP workloads, but has few advantages for OLAP or Data Warehouse

How to use SSD

• Database flash cache
  • If your bottleneck is single block (indexed) reads and you are on OEL or Solaris with 11gR2
• Flash tablespace
  • Optimize reads/writes against “hot” segments or partitions
• Flash temp tablespace
  • If multi-pass disk sorts or hash joins are your bottleneck
• FusionIO direct cache
  • If you want to optimize both scans and index reads OR you are not on OEL/Solaris 11gR2
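
The guidance on this slide can be read as a simple decision rule. The sketch below encodes it under stated assumptions: the bottleneck labels and the function name are hypothetical, and real deployments would weigh more factors than a single dominant bottleneck.

```python
def suggest_ssd_use(bottleneck, db_flash_cache_available):
    """Map a dominant IO bottleneck to the option suggested on the slide.

    bottleneck: one of "indexed_reads", "hot_segments", "disk_sorts"
        (multi-pass sorts or hash joins) or "scans_and_indexed_reads".
    db_flash_cache_available: True on OEL or Solaris with 11gR2.
    """
    if bottleneck == "indexed_reads":
        if db_flash_cache_available:
            return "Database flash cache"
        return "FusionIO direct cache"  # the slide's fallback on other platforms
    if bottleneck == "hot_segments":
        return "Flash tablespace"
    if bottleneck == "disk_sorts":
        return "Flash temp tablespace"
    if bottleneck == "scans_and_indexed_reads":
        return "FusionIO direct cache"
    raise ValueError(f"unknown bottleneck: {bottleneck!r}")


if __name__ == "__main__":
    print(suggest_ssd_use("indexed_reads", db_flash_cache_available=True))
    print(suggest_ssd_use("indexed_reads", db_flash_cache_available=False))
    print(suggest_ssd_use("disk_sorts", db_flash_cache_available=True))
```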

References

• Latest version of this presentation: http://www.slideshare.net/gharriso/ssd-and-the-db-flash-cache
• Quest whitepaper: http://www.quest.com/documents/landing.aspx?id=15423
• Guy’s SSD guide: http://guyharrison.squarespace.com/ssdguide/