©2011 Quest Software, Inc. All rights reserved.
Guy Harrison, Director, R&D, Melbourne
Twitter: @guyharrison
Web: http://www.guyharrison.net
Making the most of Solid State Disk in Oracle 11g
Introductions
Star Trek shirt fatality analysis
[Chart: fatality percentage (Pct) by shirt colour: Blue, Yellow, Red]
Agenda
• Brief History of Magnetic Disk
• Solid State Disk (SSD) technologies
• SSD internals
• Oracle DB flash cache architecture
• Performance comparisons
• Recommendations and Suggestions
A brief history of disk
5MB HDD circa 1956
28MB HDD - 1961 1800 RPM
The more that things change....
Moore’s law
• Transistor density doubles every 18 months
• Exponential growth is observed in most electronic components:
  • CPU clock speeds
  • RAM
  • Hard Disk Drive storage density
• But not in mechanical components:
  • Service time (seek latency): limited by actuator arm speed and disk circumference
  • Throughput (rotational latency): limited by speed of rotation, circumference and data density
Disk trends 2001-2009
[Chart: %age change]
• IO Rate: 260
• Disk Capacity: 1,635
• IO/Capacity: -630
• CPU: 1,013
• IO/CPU: -390
Solid State Disk
SSD to the rescue?
[Chart: seek time (us)]
• Magnetic Disk: 4,000
• SSD SATA Flash: 80
• SSD PCI flash: 25
• SSD DDR-RAM: 15
Power consumption
[Chart: Watts (logarithmic scale) for Flash SSD and SATA HDD in the Idle, Seek and Start up states. Values shown: 8, 10, 20, 0.08, 0.15]
Economics of SSD
[Chart: $/GB and $/IOP by device]
Devices: Seagate SATA HDD, Seagate SAS HDD, Intel MLC SATA SSD, Intel SLC SATA SSD, FusionIO PCI MLC Duo SSD, FusionIO PCI SLC SSD
$/IOP (axis 0.00 to 2.50): 2.38, 1.53, 0.05, 0.05, 0.06, 0.06
$/GB (axis 0.00 to 60.00): 0.09, 1.00, 6.88, 21.88, 24.92, 53.44
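The two metrics on this slide are simple ratios of purchase price to capacity and to IO rate. A minimal sketch, assuming a hypothetical `Device` record; the prices, capacities and IOPS figures below are made up for illustration and are not the figures behind the chart:

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device record; all field values used below are illustrative."""
    name: str
    price_usd: float
    capacity_gb: float
    iops: float

    def dollars_per_gb(self) -> float:
        # Cost of storing data: favours magnetic disk.
        return self.price_usd / self.capacity_gb

    def dollars_per_iop(self) -> float:
        # Cost of servicing IO: favours SSD.
        return self.price_usd / self.iops


# Made-up example devices, not the devices measured in the presentation.
hdd = Device("example HDD", price_usd=100.0, capacity_gb=1000.0, iops=100.0)
ssd = Device("example SSD", price_usd=400.0, capacity_gb=64.0, iops=8000.0)

for d in (hdd, ssd):
    print(f"{d.name}: {d.dollars_per_gb():.2f} $/GB, {d.dollars_per_iop():.2f} $/IOP")
```

With inputs like these the HDD wins on $/GB and the SSD wins on $/IOP, which is the trade-off the chart illustrates.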
Tiered storage management
Main Memory
DDR SSD
Flash SSD
Fast Disk (SAS, RAID 0+1)
Slow Disk (SATA, RAID 5)
Tape, Flat Files, Hadoop
[Diagram axes: $/IOP and $/GB]
SSD technology and internals
Flavours of Flash SSD
• DDR RAM drive
• SATA flash drive
• PCI flash drive
• SSD storage server
PCI SSD vs SATA SSD
• SATA was designed for traditional disk drives with high latencies
• PCI is designed for high speed devices
• PCI SSD latency is roughly one third that of SATA SSD
Flash SSD Technology
Storage hierarchy:
• Cell: one (SLC) or two (MLC) bits
• Page: typically 4K
• Block: typically 128-512K
Writes:
• Read and first write require single page IO
• Overwriting a page requires an erase and overwrite of the block
Write endurance:
• 100,000 erase cycles for SLC before failure
• 5,000 to 10,000 erase cycles for MLC
Flash SSD performance
[Chart: operation time in microseconds]
• Read (4k page seek): 25
• First insert (4k page write): 250
• Update (256K block erase): 2000
Flash Disk write degradation
• All blocks empty: write time = 250 us
• 25% of blocks partially full: write time = (3/4 * 250 us + 1/4 * 2000 us) = 687.5 us
• 75% of blocks partially full: write time = (1/4 * 250 us + 3/4 * 2000 us) = 1562.5 us
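The figures on this slide are a weighted average of the first-insert time and the block-erase time. A minimal sketch of that arithmetic, assuming a write lands on a partially full block with probability equal to the fraction of such blocks; the function name and defaults are illustrative, with 250 us and 2000 us taken from the slide:

```python
def expected_write_us(fraction_partly_full: float,
                      insert_us: float = 250.0,
                      erase_us: float = 2000.0) -> float:
    """Average write time when some fraction of blocks need an erase first.

    Writes to empty blocks cost insert_us; writes to partially full blocks
    cost erase_us. This simple model ignores garbage collection.
    """
    if not 0.0 <= fraction_partly_full <= 1.0:
        raise ValueError("fraction_partly_full must be between 0 and 1")
    return (1.0 - fraction_partly_full) * insert_us + fraction_partly_full * erase_us


for fraction in (0.0, 0.25, 0.75):
    print(f"{fraction:.0%} partially full: {expected_write_us(fraction):.1f} us")
```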
[Diagram: blocks shown as Empty or Partially Full]
[Diagrams: the SSD controller handling a data insert, a data update, and garbage collection. Each shows valid, empty and invalid data pages in the free block pool and the used block pool.]
11g DB flash Cache
Oracle DB flash cache
• Introduced in 11gR2 for OEL and Solaris only
• Secondary cache maintained by the DBWR, but only when idle cycles permit
• Architecture is tolerant of poor flash write performance
Buffer cache and Free buffer waits
[Diagram: the Oracle process reads from disk and from the buffer cache and writes to the buffer cache; DBWR writes dirty blocks from the buffer cache to the database files; the Oracle process experiences free buffer waits.]
Free buffer waits often occur when reads are much faster than writes.
Flash Cache
[Diagram: database files, buffer cache, DBWR, Oracle process and flash cache. In addition to the usual buffer cache reads and writes, DBWR writes clean blocks to the flash cache (time permitting) and the Oracle process reads from the flash cache.]
DB flash cache architecture is designed to accelerate buffered reads.
Configuration
• Create filesystem from flash device
• Set DB_FLASH_CACHE_FILE and DB_FLASH_CACHE_SIZE.
• Consider Filesystemio_options=setall
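The parameter settings above could be scripted along these lines. A minimal sketch that only builds the `ALTER SYSTEM` statements as text; the helper name, the file path and the size are hypothetical, and `SCOPE=SPFILE` is an assumption chosen so that the settings take effect at the next instance restart:

```python
from typing import List


def flash_cache_statements(cache_file: str, cache_size: str) -> List[str]:
    """Build ALTER SYSTEM statements for the three parameters named on the slide.

    cache_file: a file on the filesystem created from the flash device.
    cache_size: an Oracle size literal such as '64G' (illustrative).
    """
    return [
        f"ALTER SYSTEM SET db_flash_cache_file = '{cache_file}' SCOPE=SPFILE;",
        f"ALTER SYSTEM SET db_flash_cache_size = {cache_size} SCOPE=SPFILE;",
        "ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE=SPFILE;",
    ]


# Hypothetical path and size, for illustration only.
for statement in flash_cache_statements("/mnt/flash/flash_cache.dat", "64G"):
    print(statement)
```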
Flash KEEP pool
• You can prioritise blocks for important objects using the FLASH_CACHE clause:
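The slide's own example is not reproduced in this transcript. As a stand-in, here is a minimal sketch that builds a statement using the `FLASH_CACHE` storage clause; the helper name and the object name are hypothetical:

```python
def flash_cache_ddl(object_type: str, object_name: str, mode: str = "KEEP") -> str:
    """Build an ALTER statement that sets the FLASH_CACHE storage attribute.

    mode is one of KEEP, NONE or DEFAULT; KEEP gives the object's blocks
    priority in the flash cache.
    """
    object_type = object_type.upper()
    mode = mode.upper()
    if object_type not in ("TABLE", "INDEX"):
        raise ValueError("object_type must be TABLE or INDEX")
    if mode not in ("KEEP", "NONE", "DEFAULT"):
        raise ValueError("mode must be KEEP, NONE or DEFAULT")
    return f"ALTER {object_type} {object_name} STORAGE (FLASH_CACHE {mode});"


# 'sales_hot' is a made-up table name.
print(flash_cache_ddl("table", "sales_hot"))
```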
Oracle Db flash cache statistics
http://guyharrison.squarespace.com/storage/flash_insert_stats.sql
Flash Cache Efficiency
http://guyharrison.squarespace.com/storage/flash_time_savings.sql
Flash cache Contents
http://guyharrison.squarespace.com/storage/flashContents.sql
Performance tests
Test systems
• Low end system:
  • Dell Optiplex, dual-core, 4GB RAM
  • 2 x Seagate 7200RPM Barracuda SATA HDD
  • Intel X25-E SLC SATA SSD
• Higher end system:
  • Dell R510, 2 x quad core, 32GB RAM
  • 4 x 300GB 15K RPM, 6Gbps Dell SAS HDD
  • 1 x FusionIO ioDrive SLC PCI SSD
Performance: indexed reads (X-25)
[Chart: elapsed time (s), broken down into CPU, db file IO, flash cache IO and Other]
• No Flash: 529.7
• Flash cache: 143.27
• Flash tablespace: 48.17
Performance: Read/Write (X-25)
[Chart: elapsed time (s), broken down into CPU, db file IO, write complete, free buffer, flash cache IO and Other]
• No Flash: 3,289
• Flash Cache: 1,693
• Flash tablespace: 200
Random reads – FusionIO
[Chart: elapsed time (s), broken down into CPU, DB File IO, Flash cache IO and Other]
• SAS disk, no flash cache: 2,211
• SAS disk, flash cache: 583
• Table on SSD: 121
Updates – FusionIO
[Chart: elapsed time (s), broken down into DB CPU, db file IO, log file IO, flash cache, free buffer waits and Other]
• SAS disk, no flash cache: 6,219
• SAS disk, flash cache: 1,934
• Table on SSD: 529
Full table scan – FusionIO
[Chart: elapsed time (s), broken down into CPU, DB File IO, Flash Cache IO and Other]
• SAS disk, no flash cache: 418
• SAS disk, flash cache: 398
• Table on SSD: 72
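The elapsed times in the five charts of this section can be turned into speedup factors relative to the no-flash baseline. A minimal sketch, assuming the three values on each chart map to the three labels in the order listed (no flash, flash cache, table or tablespace on SSD); the dictionary keys are shorthand names, not the slide titles:

```python
# Elapsed times (s) as read from the charts: (no flash, flash cache, on SSD).
ELAPSED_S = {
    "Indexed reads (X-25)": (529.7, 143.27, 48.17),
    "Read/Write (X-25)": (3289, 1693, 200),
    "Random reads (FusionIO)": (2211, 583, 121),
    "Updates (FusionIO)": (6219, 1934, 529),
    "Full table scan (FusionIO)": (418, 398, 72),
}


def speedups(no_flash: float, flash_cache: float, on_ssd: float):
    """Return (flash cache speedup, on-SSD speedup) relative to no flash."""
    return no_flash / flash_cache, no_flash / on_ssd


for test, times in ELAPSED_S.items():
    cache_x, ssd_x = speedups(*times)
    print(f"{test}: flash cache {cache_x:.1f}x, on SSD {ssd_x:.1f}x")
```

On these figures the flash cache improves the full table scan by only about 5%, while placing the table directly on SSD helps every workload.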
Sorting – what we expect
[Chart: time against PGA memory available (MB), broken down into Table/Index IO, CPU Time and Temp Segment IO, across three regions: Memory Sort, Single Pass Disk Sort and Multi-pass Disk Sort]
Disk Sorts – temporary tablespace
[Chart: elapsed time (s) against sort area size for a SAS based and an SSD based temporary tablespace (TTS), covering the Single Pass Disk Sort and Multi-pass Disk Sort regions]
Redo performance – FusionIO
[Chart: elapsed time (s), broken down into CPU and Log IO]
• Flash based redo log: 292.39
• SAS based redo log: 291.93
Concurrent redo workload (x10)
[Chart: elapsed time (s), stacked by CPU, Other and Log File IO, for a SAS based and a Flash based redo log. Values shown: 1,605; 1,637; 397; 331; 1,944; 1,681]
Buffer Cache bottlenecks
• Flash cache architecture avoids ‘free buffer waits’ due to flash IO, but write complete waits can still occur on hot blocks.
• Free buffer waits are still likely against the database files, due to high physical read rates created by the flash cache
Write degradation
• In theory, high sustained write IO can lead to SSD degradation when GC fails to cope with the block erase/update cycle
• In practice, this is rarely noticeable from Oracle:
  • Oracle write IO is largely asynchronous (DBWR)
  • Almost all write activity has at least an equal amount of read activity
  • Garbage collection and wear levelling algorithms are sophisticated in decent SSD drives
Fusion IO direct cache
[Diagram: two ways to present the FusionIO device]
• Regular Block Device (limited to the size of the SSD): Temp Tablespace, Hot Segments, Hot Partitions, DB Flash Cache
  • Layers: File System/Raw Devices/ASM, ioMemory VSL
• Caching Block Device: read-intensive, potentially massive tablespaces
  • Layers: File System/Raw Devices/ASM, directCache, ioMemory VSL, LUN
Fusion IO direct cache – Table scans
[Chart: elapsed time (s), broken down into CPU, IO and Other]
• No cache, 1st scan: 147
• No cache, 2nd scan: 147
• direct cache on, 1st scan: 147
• direct cache on, 2nd scan: 36
Exadata
Exadata flash storage
• 4x96GB PCI Flash drives on each storage server
• Flash can be configured as:
  • Exadata Smart Flash Cache (ESFC)
  • Solid State Disk available to ASM disk groups
• ESFC is not the same as the DB flash cache:
  • Maintained by cellsrv, not DBWR
  • DOES support full table scans
  • DOES NOT support smart scans, unless CELL_FLASH_CACHE=KEEP
  • Statistics accessed via the cellcli program
• Considerations for cache vs SSD may be similar
Summary
Recommendations
• Don’t wait for SSD to become as cheap as HDD
  • Magnetic HDD will always be cheaper per GB, SSD cheaper per IO
• Consider a mixed or tiered storage strategy
  • Using DB flash cache, selective SSD tablespaces or partitions
• Use SSD where your IO bottleneck is greatest and SSD advantage is significant
• DB flash cache offers an easy way to leverage SSD for OLTP workloads, but has few advantages for OLAP or Data Warehouse
How to use SSD
• Database flash cache
  • If your bottleneck is single block (indexed) reads and you are on OEL or Solaris 11gR2
• Flash tablespace
  • Optimize reads and writes against “hot” segments or partitions
• Flash temp tablespace
  • If multi-pass disk sorts or hash joins are your bottleneck
• FusionIO direct cache
  • If you want to optimize both scans and index reads OR you are not on OEL/Solaris 11gR2
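The guidance on this slide can be read as a small decision table. A minimal sketch of one interpretation; the function name and the bottleneck labels are hypothetical, and treating directCache as the fallback for single block reads when the DB flash cache is unavailable is an assumption about how the last bullet should be read:

```python
from typing import List

BOTTLENECKS = {
    "single_block_reads",     # indexed reads
    "hot_segments",           # reads/writes against hot segments or partitions
    "disk_sorts",             # multi-pass disk sorts or hash joins
    "scans_and_index_reads",  # both scans and index reads
}


def suggest_ssd_use(bottleneck: str, db_flash_cache_available: bool) -> List[str]:
    """Map a bottleneck to the options on the slide.

    db_flash_cache_available: True when running 11gR2 on OEL or Solaris.
    """
    if bottleneck not in BOTTLENECKS:
        raise ValueError(f"unknown bottleneck: {bottleneck}")
    suggestions = []
    if bottleneck == "single_block_reads" and db_flash_cache_available:
        suggestions.append("Database flash cache")
    if bottleneck == "hot_segments":
        suggestions.append("Flash tablespace")
    if bottleneck == "disk_sorts":
        suggestions.append("Flash temp tablespace")
    if bottleneck == "scans_and_index_reads" or (
        bottleneck == "single_block_reads" and not db_flash_cache_available
    ):
        suggestions.append("FusionIO directCache")
    return suggestions


print(suggest_ssd_use("single_block_reads", db_flash_cache_available=False))
```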
References
• Latest version of this presentation: http://www.slideshare.net/gharriso/ssd-and-the-db-flash-cache
• Guy Harrison blog (guyharrison.net) postings:
  • All blog posts: http://guyharrison.squarespace.com/blog/tag/ssd
  • SSD guide (work in progress): http://guyharrison.squarespace.com/ssdguide/
• Kevin Closson: http://kevinclosson.wordpress.com/2009/12/15/pardon-me-where-is-that-flash-cache-part-ii/
• General articles on SSD:
  • http://www.anandtech.com/storage/showdoc.aspx?i=3631
  • http://en.wikipedia.org/wiki/Flash_memory
  • http://www.virident.com/downloads/Virident_Sustained_Performance_Whitepaper.pdf