Implementing a Microsoft SQL Server Data Warehouse Fast Track
Brian Knight
Founder, Pragmatic Works
SESSION CODE: BIE402
About the Ugly Guy Speaking
SQL Server MVP
Founder of Pragmatic Works
Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com
Written more than a dozen books on SQL Server
Today’s Problems with Integration
Integration today
  Increasing data volumes
  Increasingly diverse sources
Requirements reached the Tipping Point
  Low-impact source extraction
  Efficient transformation
  Bulk loading techniques
Agenda
SQL Instance-level Data load tuning
Fast Track maintenance
SQL Server Fast Track Data Warehouse
A method for designing a cost-effective, balanced system for Data Warehouse workloads
Reference hardware configurations developed in conjunction with hardware partners using this method
Best practices for data layout, loading and management
Solution to help customers and partners accelerate their data
Fast Track Data Warehouse Components
Software:
• SQL Server 2008 Enterprise
• Windows Server 2008
Hardware:
• Tight specifications for servers, storage & networking
• ‘Per core’ building block
Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading
Fast Track Performance
HP ProLiant DL785 G6
(8) AMD Opteron CPUs, 6 core, 2.6 GHz
48 total CPU cores
24 TB optimized storage (48 TB max)
9600 MB/s throughput
[Diagram: rack layout showing an HP ProLiant DL785 server, eight HP StorageWorks MSA2000 (Model 2212fc) arrays, two HP StorageWorks 8/24 SAN Switches, and a ProCurve Switch 2510G-24 (J9279A)]
Fast Track Data Warehouse Reference Configurations
Server | CPU | CPU Cores | SAN | Data Drive Count | Initial Capacity* | Max Capacity**
--- | --- | --- | --- | --- | --- | ---
HP ProLiant DL 385 G6 | (2) AMD Opteron Istanbul six core 2.6 GHz | 12 | (3) HP MSA2312fc | (24) 300GB 15k SAS | 6TB | 12TB
HP ProLiant DL 380 G6 | (2) Intel Xeon® 5500 Series quad core | 8 | (2) HP MSA2312 | (16) 300GB 15k SAS | 4TB | 8TB
HP ProLiant DL 585 G6 | (4) AMD Opteron Istanbul six core 2.6 GHz | 24 | (6) HP MSA2312fc | (48) 300GB 15k SAS | 12TB | 24TB
HP ProLiant DL 580 G5 | (4) Intel Xeon® 7400 Series six core | 24 | (6) HP MSA2312 | (48) 300GB 15k SAS | 12TB | 24TB
HP ProLiant DL 785 G6 | (8) AMD Opteron Istanbul six core 2.8 GHz | 48 | (12) HP MSA2312 | (96) 300GB 15k SAS | 24TB | 48TB
Dell PowerEdge R710 | (2) Intel Xeon Nehalem quad core 2.66 GHz | 8 | (2) EMC AX4 | (16) 300GB 15k FC | 4TB | 8TB
Dell PowerEdge R900 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB
IBM X3650 M2 | (2) Intel Xeon Nehalem quad core 2.67 GHz | 8 | (2) IBM DS3400 | (16) 200GB 15k FC | 4TB | 8TB
IBM X3850 M2 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24 | (6) IBM DS3400 | (24) 300GB 15k FC | 12TB | 24TB
IBM X3950 M2 | (8) Intel Xeon Nehalem four core 2.13 GHz | 32 | (8) IBM DS3400 | (32) 300GB 15k SAS | 16TB | 32TB
Bull Novascale R460 E2 | (2) Intel Xeon Nehalem quad core 2.66 GHz | 8 | (2) EMC AX4 | (16) 300GB 15k FC | 4TB | 8TB
Bull Novascale R480 E1 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB

* Core-balanced compressed capacity based on 300GB 15k SAS, not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for TempDB.
** Represents storage array fully populated with 300GB 15k SAS and use of 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure. 30% of this storage should be reserved for DBA operations.
Potential Performance Bottlenecks
[Diagram: the I/O path between disks and CPU cores. Component labels: disks, LUNs, storage controller (ports A, B), cache, FC switch, FC HBAs (ports A, B), server, Windows, SQL Server, CPU cores. Rates marked along the path: Disk Feed Rate, LUN Read Rate, SP Port Rate, Switch Port Rate, HBA Port Rate, SQL Server Read Ahead Rate, CPU Feed Rate.]
Fast Track SQL DW Architecture vs. Traditional DW
Traditional SQL DW Architecture
  Shared Infrastructure
  Enterprise Shared SAN Storage
  Shared Network Bandwidth
  OLTP Applications
Fast Track SQL DW Architecture
  Dedicated DW Infrastructure
  Architecture modeled after DW Appliances
  1TB – 48TB Pre-Tested
  SQL 2008 Data Warehouse: 4 Processor, 16 Core Server
  Dedicated Network Bandwidth
  Dedicated Low Cost SAN Arrays: 1 for every 4 CPU Cores (EMC AX4 – HP MSA2312)
Benefits:
  More system predictability, and thus a more predictable user experience
  Pretested configurations lower TCO
  Balanced CPU to I/O channel, optimized for DW
  Modular building block approach
  Scale out or up within the limits of server and SAN
Case: Insurance Claims – High-volume loads in a short load window
Example: Load and enrich 50 GB of incremental data in less than 1 hour
Only possible with a highly parallel load design
Use partitioned destination table
  # partitions = # cores
  Parallel loading to staging table first
  Separate filegroups per-partition prevents interleaving during load
Results
Comparison | Existing Appliance | SQL Server Fast Track DW | Result
--- | --- | --- | ---
Loading – Subject Area 1 | 5:10:21 total time | 51:31 total time | 6x faster
Loading – Subject Area 2 | 4:36:08 total time | 1:50:01 total time | 2.5x faster
Query times – Subject Area 1 | 3:03 avg query time (using 9 benchmark queries) | 0:15 avg query time (using 9 benchmark queries) | 12x faster
Query times – Subject Area 2 | 56:44 avg query time (using 4 benchmark queries) | 8:09 avg query time (using 4 benchmark queries) | 7x faster

Price per TB (8TB) – Cal: $22K / TB
Price per TB (16TB) – Cal: $13K / TB
Case Study
Replaced AS/400 DB2 with SQL Server
Replaced CICS with SSIS
Saved ~$50,000 a month
Took 12 hour process down to 50 minutes
DW Products Positioning
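[Diagram: DW products positioned along Scale and Complexity axes in four numbered steps, beginning at "Start here". Products shown: SQL Server 2008; SQL Server 2008 with Fast Track Reference Architecture; PDW; PDW with Hub-and-spoke. Callouts: SW-HW integration; HA by default; Incremental HW Expansion, Fast parallel loading by default, HA by default.]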
Fast Track Data Striping
Fast Track evenly spreads SQL data files across physical RAID-1 disk arrays
[Diagram: one FT storage enclosure made up of RAID-1 arrays (disk 1 & 2 in each). Primary Data volumes: ARY01D1v01, ARY01D2v02, ARY02D1v03, ARY02D2v04, ARY03D1v05, ARY03D2v06, ARY04D1v07, ARY04D2v08, holding data files DB1-1.ndf through DB1-8.ndf. Log volume: ARY05v09, holding DB1.ldf.]
Fast Track File Layout
SQL Server File System
Three layers of storage configuration
  SAN file system
    Logical storage allocation
      Primary Data (user databases): (4) 2 disk RAID-1 arrays per enclosure
      Log: (1) 2 disk RAID-1 array per enclosure
  Database file creation
    User databases
    Tempdb
    Transaction logs
Writing Sequential Data
Sequential scan performance starts with database creation and extent allocation
Recall that the -E startup option is used
  Allocate 64 extents at a time (4MB)
Pre-allocation of user databases is recommended
Autogrow should be avoided if possible
  If used, always use 4MB increments
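The 4MB figure on this slide follows from SQL Server's 8 KB pages and 8-page (64 KB) extents. The short Python sketch below works through that arithmetic and rounds a file size up to a whole number of 4MB units. The function names and the 10,001 MB example size are illustrative only and do not come from the deck.

```python
PAGE_KB = 8                    # SQL Server data page size
PAGES_PER_EXTENT = 8           # one extent = 8 pages = 64 KB
EXTENTS_PER_ALLOCATION = 64    # allocation size quoted on the slide for -E


def allocation_unit_mb() -> float:
    """Size in MB of one 64-extent allocation."""
    return PAGE_KB * PAGES_PER_EXTENT * EXTENTS_PER_ALLOCATION / 1024


def round_up_to_allocation_unit(size_mb: int) -> int:
    """Round a file size (MB) up to a whole multiple of the allocation unit."""
    unit = int(allocation_unit_mb())
    return -(-size_mb // unit) * unit   # ceiling division, then scale back up


print(allocation_unit_mb())                 # MB per allocation
print(round_up_to_allocation_unit(10_001))  # hypothetical file size in MB
```

Sizing files and growth increments as multiples of this unit keeps each allocation aligned with what the slide describes.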
Mounting the SAN File System
Creating LUNs
Mount points can be used to map LUNs to the Windows Server OS
Fast Track RA recommends using a naming scheme to identify the LUN to physical disk relationship
  LUN, RAID, and physical disk number are used as components of the Windows volume name
  Naming scheme enables targeted IO validation of disk (LUN), array, and storage processor using a tool such as SQLIO
Primary Data arrays: 2 LUNs per array
Log array: 1 LUN
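As a rough illustration of such a naming scheme, the Python sketch below generates volume names in the pattern shown on the data striping slide (ARY01D1v01 through ARY04D2v08, plus ARY05v09 for the log). The reading of each field (array number, LUN within the array, running volume number) is an assumption inferred from those labels, and the function name is hypothetical.

```python
def volume_names(data_arrays: int = 4, luns_per_array: int = 2) -> list[str]:
    """Build volume names for one enclosure: data LUNs first, then the log LUN."""
    names = []
    volume = 1
    for array in range(1, data_arrays + 1):
        for lun in range(1, luns_per_array + 1):
            # Assumed field meaning: ARY<array> D<LUN within array> v<volume number>
            names.append(f"ARY{array:02d}D{lun}v{volume:02d}")
            volume += 1
    # The log array carries a single LUN, so the label has no D component
    names.append(f"ARY{data_arrays + 1:02d}v{volume:02d}")
    return names


for name in volume_names():
    print(name)
```

Because each name encodes the array and LUN, a slow volume found during IO validation can be traced straight back to its physical array.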
SQL Server Configuration
SQL Server Startup
-E: Allocate 64 extents at a time (4MB)
  This is not a guarantee of a logically contiguous extent allocation
-T1117: Autogrow in even increments
-T610: Minimal logging during data loads
All databases should be sized to meet expected growth for next 12-18 months
Autogrow for ALL Databases should be set to 4 MB
SQL Server Files
Transaction Log
Create a single transaction log file per database and place on a dedicated Log LUN
Enable auto-grow for log files
The transaction log size for each database should be at least twice the size of the largest DML operation
SQL Server Files
User Databases
Create at least one Filegroup containing one data file per LUN
  FT targets 1:1 LUN to CPU core affinity
  Make all files the same size
  Effectively stripes database files across data LUNs
Multiple file groups may be advantageous
Disable Auto-Grow for the database
Transaction Log is allocated to a Log LUN
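One way to picture the "one data file per LUN, all the same size" guidance is the small Python planner below. It is only a sketch: the file naming (DB1-1.ndf, DB1-2.ndf, ...) follows the data striping diagram, while the function name, the 100,000 MB total, and the choice to round each file up to a 4MB multiple are assumptions made for illustration.

```python
def plan_data_files(db_name: str, data_volumes: list[str],
                    total_size_mb: int, unit_mb: int = 4) -> list[tuple[str, str, int]]:
    """Return (file name, volume, size in MB) with one equal-sized file per data volume."""
    per_file = -(-total_size_mb // len(data_volumes))   # ceiling division across volumes
    per_file = -(-per_file // unit_mb) * unit_mb         # round up to a whole 4MB unit
    return [
        (f"{db_name}-{i}.ndf", volume, per_file)
        for i, volume in enumerate(data_volumes, start=1)
    ]


# Hypothetical example: eight data volumes, 100,000 MB of pre-allocated space
volumes = [f"ARY{a:02d}D{d}v{(a - 1) * 2 + d:02d}" for a in range(1, 5) for d in (1, 2)]
for file_name, volume, size_mb in plan_data_files("DB1", volumes, 100_000):
    print(file_name, volume, size_mb)
```

Equal file sizes matter because SQL Server fills the files in a filegroup in proportion to their free space, so same-sized files receive an even share of the data.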
Data Load in a Fast Track
Conventional data loads lead to fragmentation
  Bulk Inserts into Clustered Index using a moderate ‘batchsize’ parameter
    Each ‘batch’ is sorted independently
    Overlapping batches lead to page splits
[Figure: page IDs 1:31 through 1:40 shown against the key order of the index]
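The effect of independently sorted, overlapping batches can be seen in a toy model. The Python sketch below is not a model of the SQL Server storage engine: it simply sorts each batch on its own, lays the batches down one after another, and counts the places where the key order goes backwards. The key values, batch size, and function names are all made up for illustration.

```python
def load_in_batches(arrival: list[int], batch_size: int) -> list[int]:
    """Sort each batch independently and append it to the 'physical' layout."""
    layout = []
    for start in range(0, len(arrival), batch_size):
        layout.extend(sorted(arrival[start:start + batch_size]))
    return layout


def order_breaks(keys: list[int]) -> int:
    """Count positions where the next key is lower than the one before it."""
    return sum(1 for a, b in zip(keys, keys[1:]) if b < a)


# Keys arrive interleaved, so every batch spans the whole key range
interleaved = [k for start in range(3) for k in range(start, 12, 3)]
in_key_order = sorted(interleaved)

print(order_breaks(load_in_batches(interleaved, 4)))   # overlapping batches
print(order_breaks(load_in_batches(in_key_order, 4)))  # load in clustered key order
```

In a real clustered index, each of those backward steps stands in for rows that must be inserted into the middle of already-written pages, which is where the page splits come from.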
Techniques to Maximize Scan Throughput
Minimize use of NonClustered indexes on Fact Tables
Load techniques to avoid fragmentation
  Load in Clustered Index order (e.g. date) when possible
Index Creation always MAXDOP 1, SORT_IN_TEMPDB
Isolate volatile tables in separate filegroup
Isolate staging tables in separate filegroup or DB
Periodic maintenance
Minimizing Extent Fragmentation
Extent fragmentation can be minimized through use of filegroups
  Separate filegroups for volatile data
  Separate filegroups for staging tables
  Partition key tables across multiple filegroups
    Useful if data volatility varies across partition ranges
Isolate data operations that generate significant fragmentation to dedicated filegroups or databases
Loading Data
Primary method used to create sequential data layout
Goals
  Maximize sequential data layout
  Minimize fragmentation
Key considerations
  Concurrent load operations to the same file will induce fragmentation
  DML change operations (Update/Delete) may induce fragmentation
Loading Data
Load recommendations for Fast Track are broken into two general scenarios
  Migration: Very large one-time, or infrequent loads
    Typically target empty tables
    Less time sensitive relative to SLAs
  Incremental: Routine operational loads
    Typically target populated tables
    Time sensitive
Migration Loads – Heap
Minimal Logging is recommended
  Tablock and/or TF610 may be required
Partitioned/Non Partitioned
  Load directly into target table
  Set BATCHSIZE appropriately
  Parallelize Bulk Inserts if necessary
Migration Loads – CI
Minimal Logging is recommended
  Tablock and/or TF610 may be required
Bulk Insert to a Staging table
  Stage option 1: Heap with matching partition
  Stage option 2: CI with matching partition
Concurrent Bulk Inserts
  Stage option 1: Yes
  Stage option 2: Limited by TempDB sort
Insert-Select from Stage
  Maxdop 1, single query
  Concurrent Inserts if:
    Partition range restricted or
    Multiple Filegroups targeted
Partition Switching can also be used
Data Loading – Recommendations for Incremental Loads
Clustered Index Table Loads
  Option 1 – Direct load into table
    Sorts and commit size must fit into memory
  Option 2 – Empty table
    Load into empty clustered index table
    Serial or parallelized
    Non-parallelized INSERT SELECT statement to move to final table
Incremental Loads – CI
Minimal Logging is recommended
  Tablock and/or TF610 may be required
Bulk-Insert to identical CI staging table
  Insert-Select with MAXDOP 1
Direct Bulk-Insert to Target table
  Ensure sorts fit in memory
Higher performance options
  Partition Target table across multiple Filegroups
    Concurrent Bulk Inserts across Filegroups
  Concurrent Bulk Insert to partitioned Heap Stage
    Concurrent range-restricted Insert-Select
Fast Track Load
Parallel Loads
Demo
Maintenance considerations
Use ALTER INDEX … REBUILD … … WITH (MAXDOP = 1, SORT_IN_TEMPDB = ON)
  Single threaded -- avoids creating new extent fragmentation
  Can rebuild just the “current” partition
Avoid ALTER INDEX … REORGANIZE
  Pages will become physically ordered, but significant extent fragmentation may occur
Column Statistics
AUTO CREATE and AUTO UPDATE are recommended
Ideally statistics are gathered on all columns of a table
  At minimum, on columns used in WHERE or HAVING clauses
  Performance considerations will determine a balance between ideal & minimum cases
FULLSCAN recommended for Dimensions
Composite statistics for joins that utilize composite keys
  Composite statistics cannot be auto created
Tenets of Parallel Design
Partition the problem
  Preferably into equal-sized pieces
Eliminate conflicts
  Stateless design
  Reduce the need for common resources
Schedule efficiently
  Optimize the Gantt chart
Schedule Efficiently
Create a priority queue of work
Start multiple copies of the package
Packages process work in a loop
[Diagram: a Task Queue holding partitions P1, P2, P3, P4, P5 … Pn feeds two package instances, DTEXEC (1) and DTEXEC (2), each running a Get Task, Do Work loop]
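The queue pattern on this slide can be sketched in a few lines of Python. This is an illustration of the idea only, not the demo's implementation: threads stand in for the separate DTEXEC processes, an in-memory `queue.PriorityQueue` stands in for whatever queue the packages actually read from, and `run_workers` and `do_work` are hypothetical names.

```python
import queue
import threading


def run_workers(tasks, worker_count, do_work):
    """Drain a priority queue with several workers; return (worker id, task) pairs."""
    work = queue.PriorityQueue()
    for priority, name in tasks:
        work.put((priority, name))      # lowest priority number is served first

    done = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                _, name = work.get_nowait()   # Get Task
            except queue.Empty:
                return                        # queue drained: this worker stops
            do_work(name)                     # Do Work
            with lock:
                done.append((worker_id, name))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(1, worker_count + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


# Eight partitions, two workers; do_work is a no-op placeholder here
tasks = [(i, f"P{i}") for i in range(1, 9)]
results = run_workers(tasks, worker_count=2, do_work=lambda name: None)
print(sorted(name for _, name in results))
```

Because every worker pulls its next task only when it is free, uneven task durations balance themselves out, and changing the degree of parallelism is just a matter of starting more or fewer workers.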
Demo Results
[Chart: start and duration of tasks 1 through 8 plotted against Elapsed Time]
1 Process finishes in 64 seconds
Demo Results
[Chart: start and duration of tasks 1 through 8 plotted against Elapsed Time]
2 Processes finish in 36 seconds
Demo Results
[Chart: start and duration of tasks 1 through 8 plotted against Elapsed Time]
4 Processes finish in 28 seconds
Demo Results
[Chart: start and duration of tasks 1 through 8 plotted against Elapsed Time]
8 Processes finish in 27 seconds
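The four demo runs can be compared by working out speedup and parallel efficiency from the elapsed times quoted on the slides (64, 36, 28 and 27 seconds). The Python sketch below does that arithmetic; the function name and the definition of efficiency as speedup divided by process count are conventions chosen here, not something stated in the deck.

```python
# Elapsed seconds per process count, as reported on the Demo Results slides
elapsed_seconds = {1: 64, 2: 36, 4: 28, 8: 27}


def speedup_table(elapsed: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map process count to (speedup vs. 1 process, efficiency per process)."""
    baseline = elapsed[1]
    return {
        n: (baseline / seconds, baseline / seconds / n)
        for n, seconds in elapsed.items()
    }


for n, (speedup, efficiency) in speedup_table(elapsed_seconds).items():
    print(f"{n} processes: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On these numbers, going from 4 to 8 processes saves only one second, which suggests the demo workload hits a shared bottleneck well before eight parallel processes.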
Using A Queue To Control Parallelism
Demo
Fast Track Data Warehouse Pricing
Conclusion
Tenets of parallel design
  Partition the problem
  Eliminate conflicts
  Schedule efficiently
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.