TRANSCRIPT
Agenda
• Fast Track I/O fundamentals
• Putting the data on disk
• Getting the data off disk
• Actual system results
Takeaways
• Understand the interaction between SQL and the storage layer of the database stack for scan-based disk I/O optimization
• Articulate the core Fast Track disk I/O balancing approach for Data Warehousing
Some SQL Data Warehouses Today
• Big SAN
• Big 64-core server
• Connected together!
What's wrong with this picture?
System out of balance
• This server can consume 16 GB/sec of I/O, but the SAN can only deliver 2 GB/sec, even when the SAN is dedicated to the SQL Data Warehouse (which it often isn't)
• Lots of disks for random IOPS, but limited controllers mean limited I/O bandwidth
• The system is typically I/O bound and queries are slow, despite significant investment in both server and storage
The Alternative: A Balanced System
• Design a server + storage configuration that can deliver all the I/O bandwidth the CPUs can consume when executing a SQL relational DW workload
• Avoid sharing storage devices among servers
• Avoid overinvesting in disk drives: focus on scan performance, not IOPS
• Lay out and manage data to maximize range-scan performance and minimize fragmentation
Key Customer Challenges
Data explosion, increased complexity, quality & compliance:
1. Rapidly growing data
2. Increasing demand for real-time data
3. Support data from multiple sources
4. Increased complexity and inflexibility of EDWs
5. Ensure quality & integrity
6. Higher customer expectations
Goals: consolidate & reduce cost; increase usage & trust.
[Diagram: a scalable data warehouse & data marts, loaded by SQL Server Integration Services from custom OLTP sources, with the SQL Server DBMS serving reports and analysis]
What is SQL Server Fast Track Data Warehouse?
A new offering to help customers and partners accelerate their Data Warehouse deployments. It includes (1) seven new SMP DW Reference Architectures and (2) SI-provided solution templates for DW solutions.
Fast Track Data Warehouse provides:
• A method for designing a cost-effective, balanced system for Data Warehouse workloads
• Reference hardware configurations, developed in conjunction with hardware partners using this method
• Best practices for data layout, loading, and management
Fast Track Data Warehouse Components
Software:
• SQL Server 2008 Enterprise
• Windows Server 2008
Hardware:
• Tight specifications for servers, storage, and networking
• 'Per core' building block
Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading
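Compression is one of those configuration levers. As a minimal illustration (the table dbo.FactSales is hypothetical, not part of the reference architecture), page compression in SQL Server 2008 Enterprise trades spare CPU for reduced scan I/O:

-- Rebuild a fact table with page compression; fewer bytes come off disk
-- per scan, at the cost of CPU to decompress (dbo.FactSales is illustrative).
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = PAGE);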
Published Reference Architectures: Balanced System Examples (HP / Dell / IBM, 8 to 48 cores)

Server | CPU | Cores | SAN | Data Drive Count | Initial Capacity* | Max Capacity**
HP ProLiant DL385 G6 | (2) AMD Opteron Istanbul six-core 2.6 GHz | 12 | (3) HP MSA2312 | (24) 300GB 15k SAS | 6TB | 12TB
HP ProLiant DL385 G6 | (2) AMD Opteron Istanbul six-core 2.6 GHz | 12 | (3) EMC AX4 | (24) 300GB 15k FC | 6TB | 12TB
HP ProLiant DL585 G6 | (4) AMD Opteron Istanbul six-core 2.6 GHz | 24 | (6) HP MSA2312 | (48) 300GB 15k SAS | 12TB | 24TB
HP ProLiant DL585 G6 | (4) AMD Opteron Istanbul six-core 2.6 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB
HP ProLiant DL785 G6 | (8) AMD Opteron Istanbul six-core 2.8 GHz | 48 | (12) HP MSA2312 | (96) 300GB 15k SAS | 24TB | 48TB
HP ProLiant DL785 G6 | (8) AMD Opteron Istanbul six-core 2.8 GHz | 48 | (12) EMC AX4 | (96) 300GB 15k FC | 24TB | 48TB
Dell PowerEdge R710 | (2) Intel Xeon Nehalem quad-core 2.66 GHz | 8 | (2) EMC AX4 | (16) 300GB 15k FC | 4TB | 8TB
Dell PowerEdge R900 | (4) Intel Xeon Dunnington six-core 2.67 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB
IBM x3650 M2 | (2) Intel Xeon Nehalem quad-core 2.67 GHz | 8 | (2) IBM DS3400 | (16) 300GB 15k FC | 4TB | 8TB
IBM x3850 M2 | (4) Intel Xeon Dunnington six-core 2.67 GHz | 24 | (6) IBM DS3400 | (48) 300GB 15k FC | 12TB | 24TB
IBM x3950 M2 | (8) Intel Xeon Dunnington four-core 2.13 GHz | 32 | (8) IBM DS3400 | (32) 300GB 15k SAS | 16TB | 32TB
Benefits of Fast Track Data Warehouse
• Appliance-like time to value: reduces DBA effort; fewer indexes; much higher level of sequential I/O
• Choice of HW platforms: HP, Dell, EMC, IBM, Bull, with more in the future
• Low TCO: commodity hardware and value pricing; lower storage costs
• High scale: new reference architectures scale up to 48 TB (assuming 2.5x compression)
• Reduced risk: tested by Microsoft; better choice of hardware; application of best practices
Scope of Fast Track Data Warehouse in a BI Solution
[Diagram: data path through a BI solution. Supporting systems (Integration Services ETL; data staging and bulk loading; SAN storage array) feed the BI data storage systems (data warehouse; subject area data marts; Analysis Services cubes), which feed the presentation layer systems (Reporting Services, PerformancePoint, Excel Services, SharePoint Services, Microsoft Office SharePoint, web analytic tools). The Reference Architecture scope (dashed) covers the data warehouse and its storage.]
Designing High Performance I/O: Fundamentals of Disk Drives and Caches
The Full Stack
[Diagram: the full stack, from SQL Server and Windows on the server through CPU, PCI bus, I/O controller / HBA, cabling, and array cache down to the spindles holding the DB]
Component | Balance for...
CPU | Maximum consumption rate of cached data for the targeted query mix
Controller (service processor) | Bandwidth needed to feed the CPU cores (based on the targeted query mix)
HBA | Aggregate bandwidth the array controllers will deliver
Switch | Aligned with HBA bandwidth and optimized for sequential I/O
Disks | Aggregate bandwidth of the array controllers / database capacity
Balanced Architecture Components
A hardware system in which every resource can be driven to its maximum without any other component in the system bottlenecking it.
[Diagram: numbered balance points spanning the network, NIC, server memory, HBA, SQL file layout, and disk subsystem]
Numbers to Remember: Spindles
Traditional spindle throughput (random 8K I/O):
• 10K RPM: 100-130 IOPS at 'full stroke'
• 15K RPM: 150-180 IOPS at 'full stroke'
• 2x or more is achievable when 'short stroking' the disks (using less than 20% of the physical spindle's capacity)
Aggregate throughput with sequential access:
• Between 90 MB/sec and 125 MB/sec for a single drive
• If truly sequential, any block size over 8K will deliver these numbers
• Depends on drive form factor; 3.5" drives are slightly faster than 2.5"
Approximate latency: 3-5 ms
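These per-spindle figures drive the balanced-system arithmetic. A minimal sketch, assuming 48 spindles and the conservative ~100 MB/sec sequential rate (both values are illustrative, not a published configuration):

-- Estimated aggregate sequential bandwidth for a drive set.
DECLARE @spindles int = 48;          -- illustrative drive count
DECLARE @mb_per_spindle int = 100;   -- conservative end of the 90-125 MB/sec range
SELECT @spindles * @mb_per_spindle AS est_sequential_mb_per_sec;  -- 4800 MB/sec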
Scaling of Spindle Count: Short vs. Full Stroke
• Each 8 disks expose a single 900GB LUN
• The test data set is fixed at 800GB
• A single 800GB file on a single LUN (8 disks), two 400GB test files across two LUNs, and so on
• IOPS per physical disk drop as more of each disk's capacity is used (longer seek times)
[Chart: short vs. full stroke impact, random 8K reads. Reads per disk rise with spindle count as each disk's share of the fixed 800GB data set shrinks: roughly 233, 270, 311, 327, and 336 I/Os per second per disk at 8, 16, 32, 48, and 64 disks respectively.]
I/O Weaving
Test setup:
• Two partitions on the same LUN
• Hit both partitions with a sequential 64K read pattern
• 7200rpm SATA drive
• Compare with random 64K
Stripe and Block Sizes
RAID10 stripe on 24 spindles:

Stripe Size | Transfer Size | Random | Sequential
128KB | 64KB | 1.2GB/sec | 2.1GB/sec
128KB | 512KB | 1.5GB/sec | 2.1GB/sec
256KB | 64KB | 1.0GB/sec | 2.1GB/sec
256KB | 512KB | 2.0GB/sec | 2.1GB/sec

Sequential throughput is gated by the controller at 2.1GB/sec, which works out to about 87MB/sec/spindle (2.1GB/sec / 24 spindles) against a theoretical 125MB/sec/spindle. Note the effect of stripe size and increased transfer size on random throughput: the best random case, 2.0GB/sec, is about 83MB/sec/spindle.
Path to the drives - SAN
[Diagram: the path to the drives. Server PCI buses → HBAs → switches → fabric → storage array (fibre channel ports, controllers/processors, cache).]
Best practice: make sure you have the tools to monitor the entire path to the drives, and understand the utilization of the individual components.
Data Warehouse Workload Characteristics
Scan Intensive
Hash Joins
Aggregations
SELECT L_RETURNFLAG, L_LINESTATUS,
       SUM(L_QUANTITY) AS SUM_QTY,
       SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS SUM_DISC_PRICE,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT) * (1 + L_TAX)) AS SUM_CHARGE,
       AVG(L_QUANTITY) AS AVG_QTY,
       AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
       AVG(L_DISCOUNT) AS AVG_DISC,
       COUNT(*) AS COUNT_ORDER
FROM LINEITEM
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS
SQL File System
SQL Server databases are composed of one or more files, made up of blocks that in turn hold logical database pages, rows, and tables.
Hierarchy, physical to logical:
• NTFS: block (512 bytes)
• NTFS: allocation unit (64K)
• NTFS: file
• Database: row (variable)
• Database: page (8K)
• Database: extent (8 pages, 64K)
• Database: table
Database table/row ordering does not imply ordering of physical blocks on disk.
Data Fragmentation
Note: we also use the term "fragmentation" generically to describe poor sequential data layout.
SQL Server fragmentation types:
• File fragmentation (physical)
• Extent fragmentation (logical)
• Page fragmentation (logical)
• Non-clustered index fragmentation (logical)
[Diagram: fragmentation at each level. File fragmentation: files 1 and 2 interleaved on disk. Extent fragmentation: extents for tables 1 and 2 interleaved within file 1. Page fragmentation: pages 1:31 through 1:40 out of key order within their extents.]
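One way to measure the logical fragmentation types above is SQL Server's index physical-stats DMV. A minimal sketch (the database and table names are hypothetical):

-- Reports logical fragmentation and page density for each index level.
-- SAMPLED mode keeps the check affordable on DW-sized tables.
SELECT index_id, index_level,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
       DB_ID('DB1'), OBJECT_ID('dbo.FactSales'), NULL, NULL, 'SAMPLED');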
Example: Page fragmentation
Bulk inserts into a clustered index with a moderate 'BATCHSIZE' parameter: each 'batch' is sorted independently, so overlapping batches lead to page splits.
Note: even without page splits, a clustered index implies no disk-level ordering of data.
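A hedged sketch of the load pattern being described (the table name, file path, and batch size are illustrative):

-- Each 100,000-row batch is sorted and committed independently; key ranges
-- that overlap between batches force page splits in the clustered index.
BULK INSERT dbo.FactSales
FROM 'C:\load\factsales.dat'
WITH (BATCHSIZE = 100000, TABLOCK);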
[Diagram: pages 1:31 through 1:40 laid out on disk out of the key order of the index after overlapping batches.]
Writing Sequential Data
Goals:
• Minimize fragmentation
• For ordered tables, match logical order to physical location as closely as we can
• Use filegroup locations to stripe data across multiple LUNs: the Fast Track data stripe
• Maximize extent allocation to increase the effective request size: the -E startup option is used, which allocates 64 extents (4MB) at a time
Fast Track Data Striping
Fast Track evenly spreads SQL data files across physical RAID-1 disk arrays.
[Diagram: a Fast Track storage enclosure. Four RAID-1 arrays each expose two data LUNs (ARY01D1v01 through ARY04D2v08) holding data files DB1-1.ndf through DB1-8.ndf; a fifth array (ARY05v09) holds the log, DB1.ldf. Each RAID-1 pair is disks 1 & 2 of its array.]
SQL Extent Allocation
• With -E, SQL Server allocates 64 extents at a time as files grow, so the "extent allocation" becomes 4MB
• Data is written round-robin across all file locations in the filegroup
• Each file receives data in 4MB increments
• Each RAID-1 group has two locations (LUNs), resulting in an 8MB stripe of data across all arrays in the filegroup
[Diagram: 4MB allocations landing round-robin on each of the eight data LUNs (ARY01D1v01 through ARY04D2v08) holding DB1-1.ndf through DB1-8.ndf; the log LUN (ARY05v09, DB1.ldf) is unaffected.]
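A minimal pre-allocation sketch of this layout, assuming illustrative drive letters, file names, and sizes (a real Fast Track configuration uses one equally sized file per data LUN, eight in this enclosure's case):

-- Fixed-size, equally sized files let -E's 4MB allocations round-robin
-- evenly across every LUN in the filegroup; space comes from
-- pre-allocation rather than Autogrow.
CREATE DATABASE DB1
ON PRIMARY
    ( NAME = DB1_prim, FILENAME = 'C:\DB1\DB1.mdf',   SIZE = 100MB ),
FILEGROUP FG1
    ( NAME = DB1_1,    FILENAME = 'D:\v01\DB1-1.ndf', SIZE = 250GB, FILEGROWTH = 0 ),
    ( NAME = DB1_2,    FILENAME = 'E:\v02\DB1-2.ndf', SIZE = 250GB, FILEGROWTH = 0 )
    -- ...continue with one file per remaining LUN (v03 through v08)...
LOG ON
    ( NAME = DB1_log,  FILENAME = 'L:\v09\DB1.ldf',   SIZE = 50GB );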
Autogrow and the Fast Track RA
Best practice: pre-allocate all databases and do not use Autogrow.
This is not always practical. When Autogrow is used, trace flag 1117 (TF1117) should be enabled; it enforces even file growth and maintains the 4MB extent allocation (stripe width).
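For example, enabled globally (it can also be added as a -T1117 startup parameter):

-- TF1117 grows all files in a filegroup together when Autogrow fires,
-- preserving the even stripe and 4MB extent allocation described above.
DBCC TRACEON (1117, -1);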
Writing Sequential Data
Fast Track 'sequential' writes:
• Load processes are managed to minimize page and extent fragmentation
• Bulk-load parallelism is managed
• Volatile and non-volatile data are segregated if necessary
• Primary user databases are pre-allocated
Achieving Sequential Scan
Fast Track goals:
• Maintain a sequential data layout
• Create an affinity between LUNs and physical CPU cores; use SQL filegroups to stripe data across these LUNs
• Increase the effective request size beyond the 512k limit by leveraging SQL and the storage hardware: SQL read-ahead, SAN hardware pre-fetch, and RAID-1 mirroring
Storage Pre-Fetch
[Diagram: a disk scan issuing a 512k read over extents 1..16 of table 'X'; an approximately 2MB hardware pre-fetch reaches ahead into extents 17..64 and beyond (65..128, 129...). Pages for table 'Y' lie elsewhere on the disk.]
Pre-fetch extends individual SQL read requests significantly (~2MB, hardware dependent) by reading additional, adjacent data each time the disk head touches the platter. Assuming contiguous extents, pre-fetch puts data in cache that will subsequently be requested. This reduces disk head movement and maximizes SQL read-ahead.
Storage Pre-Fetch & Fragmentation
Example: extent fragmentation. The effective read size is now restricted to the number of contiguous extents available within the pre-fetch range. This significantly reduces the value of pre-fetch to sequential I/O by increasing disk head movement.
[Diagram: the same 512k read and ~2MB pre-fetch, but the extents of table 'X' (17..64, 65..70, 70..89, 90..99) are interleaved on disk with pages for table 'Y'.]
SQL Server Read-Ahead
SQL R-A (read-ahead) goals for Fast Track:
I. Queue pending read requests at the storage enclosure level.
II. Maintain a queue depth that keeps the storage system busy.
• R-A plus storage pre-fetch maintains read efficiency for large scans and under concurrency (coalescing reads)
• R-A and Fast Track data striping deliver multiple drives' bandwidth to a single read thread
• A single read thread leveraging R-A may read up to 40MB; since the FT data stripe is 8MB per RAID-1 array, one thread can touch up to 5 arrays, or 10 LUNs
SQL Read-Ahead (per SQL thread)
[Diagram: per-thread read-ahead. A 512k read into extents 1..64 of table 'X', an approximately 2MB hardware pre-fetch, and an 8MB read-ahead window reaching across extents 65..129.]
• 8MB read-ahead per thread: R-A can request up to 40MB per thread and queues read requests to the storage hardware
• Better chance of coalescing contiguous reads; each cache hit minimizes disk head movement and increases the average scan rate
• Keep the stack busy: maintain enough read queue depth to ensure reads can proceed while SQL Server is processing data
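A quick way to watch read-ahead at work on a scan of your own (dbo.FactSales is a hypothetical table):

-- SET STATISTICS IO reports 'read-ahead reads' next to logical and physical
-- reads; on a healthy sequential layout, most pages of a large scan should
-- arrive via read-ahead.
SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.FactSales;
SET STATISTICS IO OFF;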
Sequential Scan Components
• RAID-1 + data striping + pre-fetch + SQL read-ahead work together to create efficient 'sequential' I/O
• The data stripe width is balanced against the read-ahead "depth"
• This combination can provide up to an ~2MB 'effective' read size despite SQL Server's native 512k limit
• Each element is necessary, or efficiency falls off
[Diagram: 512k reads, ~2MB pre-fetch, and 8MB per-thread read-ahead laid over the Fast Track data stripe: 4MB allocations round-robin across data files DB1-1.ndf through DB1-8.ndf on eight LUNs over four RAID-1 arrays.]
POC Hardware
• HP DL785: 8 sockets, 48 cores
• 12 HP MSA2312 storage arrays
• Maximum throughput: 8GB/s of user data
• Two primary databases, each with a 24-LUN filegroup
The next example was executed on only 'half' of this Fast Track configured system: each database can sustain this workload benchmark at ~3.5GB/s when run concurrently, for a net of ~8GB/s of disk I/O throughput.
Fast Track I/O Performance: Recent POC
• Billion-row heap, no indexes, no partitioning
• 168GB of user data
• Filegroup: 24 LUNs, 24 files
Query:
SELECT DATEPART(mm, update_dt),
       SUM(POS_CURR_REQ_AMT),
       COUNT(*),
       AVG(BASE_CURR_REQ_AMT)
FROM tf_1
WHERE time_dim_ky BETWEEN 234 AND 149999965
GROUP BY DATEPART(mm, update_dt)
Benchmark Results
[Chart: disk throughput of 2000 MB/s at 15% CPU utilization (DOP 6) versus 3800 MB/s at 25% CPU utilization (DOP 16), against the customer's 100-second SLA.]
Summary:
• 1G rows (168GB of user data) scanned in under 1 minute at DOP 16
• Maximum disk I/O throughput at DOP 16: 3.8GB/s
• System CPU utilization at DOP 16: 25%
Customer workload SLA: < 100 seconds execution time
• SLA achieved at DOP 6
• Disk I/O throughput at DOP 6: 2GB/s
• CPU utilization at DOP 6: 15%
Takeaway: Managing Workloads
DOP settings for resource management (see the sketch after this list):
• For I/O-bound queries, a DOP somewhat below the total LUN count will achieve maximum I/O bandwidth
• For CPU-bound queries, a DOP at or above the total LUN count will achieve maximum I/O bandwidth
• The principle is holistic and can be applied based on the nature of the overall workload (CPU-bound or I/O-bound)
Relative predictability for an SMP database:
• Fast Track CPU to I/O balancing
• Data striping across balanced LUNs
• Scan-based workload
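A hedged sketch of that guidance, using the earlier POC query shape (the MAXDOP value is illustrative for an I/O-bound query on a 24-LUN system):

-- Capping DOP below the LUN count held the POC's I/O-bound scan at the
-- 2GB/s needed for the SLA while using only 15% CPU.
SELECT DATEPART(mm, update_dt), COUNT(*)
FROM tf_1
GROUP BY DATEPART(mm, update_dt)
OPTION (MAXDOP 6);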