Fast Track DW 3.0 – Data Loading Best Practices
Name: Eric Kraemer
Title: Senior Program Manager
Company: Microsoft
DEMO
Loading Data
Minimizing logical fragmentation is key to Fast Track scan performance
  Minimizes disk head movement
  Maintains high average request size (think ~400 KB, not 8 KB)
  Requires fewer I/O operations (IOPs) to achieve high scan rates than traditionally seen with SQL Server
  Sustains high average scan rates (up to 240 MB/s per RAID1 LUN)
Key considerations for a Fast Track data load
  Data Architecture: Destination table, partitioning, and filegroup
  Source Data: Format & size
  System Resources: CPU & memory
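The request-size point above can be checked with simple arithmetic. The sketch below assumes 1 MB = 1024 KB and uses the 240 MB/s, 400 KB, and 8 KB figures from the slide; the column aliases are illustrative only.

```sql
-- I/O requests per second needed to sustain 240 MB/s on one LUN.
-- Assumption: 1 MB = 1024 KB. Request sizes are the ones quoted on the slide.
SELECT
    240 * 1024 / 400.0 AS requests_per_sec_at_400KB,  -- about 614
    240 * 1024 / 8.0   AS requests_per_sec_at_8KB;    -- 30720
```

Under these assumptions, the same scan rate needs roughly 50 times fewer requests at a 400 KB average request size than at 8 KB.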
Data Architecture
Starting point for building a Fast Track load method
Identify target table type
  Structure (Heap, Clustered Index)
  Define data volatility for Clustered Index targets
Choose table architecture based on data volatility
  Partition geometry
  Filegroup geometry
SQL Filegroups enable concurrent Load/DML operations with minimal logical fragmentation. Partitioning allows a single table to live in multiple Filegroups.
Data and Resources
Consider source data architecture
  Type: File or stream
  Transaction: Bulk or row
  Format: Ordered, unordered, multi-file, single-file
  Flexibility: Split to multiple files, key order
System resources
  CPU cores & memory
Building a Fast Track Bulk Load Process
Scenario: Single table migration (320 GB)
  Destination Table: Page Compressed, Partitioned Clustered Index
Source data considerations
  Location: Legacy DB
  Frequency: One-time extract
  Format: 8 flat files, unordered data
System Information: 8 CPU cores, 192 GB memory
SQL file structure
  8 years of data, 8 total partitions: 40 GB per partition
  4 filegroups, 2 partitions per filegroup
Filegroup “Stage A”
Filegroup “A”: Partitions 1, 2
Filegroup “B”: Partitions 3, 4
Filegroup “C”: Partitions 5, 6
Filegroup “D”: Partitions 7, 8
Filegroup “Stage B”
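One way to express the layout above (8 partitions, 2 per filegroup) is a partition function plus a partition scheme. This is a minimal sketch, not the deck's own script: the names pfYearly and psYearly and the boundary years are hypothetical, and filegroups A through D are assumed to exist already in the target database.

```sql
-- Assumption: filegroups [A]..[D] already exist in the target database.
-- Seven boundary values yield eight partitions; the years are hypothetical.
CREATE PARTITION FUNCTION pfYearly (date)
AS RANGE RIGHT FOR VALUES
    ('20040101', '20050101', '20060101', '20070101',
     '20080101', '20090101', '20100101');

-- Map two consecutive partitions to each filegroup, as in the layout above.
CREATE PARTITION SCHEME psYearly
AS PARTITION pfYearly
TO ([A], [A], [B], [B], [C], [C], [D], [D]);
```

A table created ON psYearly(partition_column) then spreads its partitions across the four filegroups in pairs.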
Example: Fast Track Migration Load to Partitioned CI
[Diagram: 8 source data files are loaded by an 8-core server into the target database in four steps.]
Step 1 “Base Load”: 8 concurrent Bulk Inserts into the base heap stage table
Step 2 “Stage Insert”: 8 concurrent Inserts into 8 heap stage tables, each with a constraint on the CI partition key
Step 3 “Transform”: Create Clustered Index with compression INTO “Final Destination”, in 2 sets of 4 concurrent operations (2 CI stage tables per filegroup)
Step 4 “Final Append”: 8 concurrent partition switches into Partitions 1 through 8 of the destination partitioned CI table
Fast Track Partition CI Load (Migration): Principles
The following determine filegroup architecture
  Volatility
  Partition size
  Available memory
  Total physical CPU cores
Maximize efficiency of the initial Bulk Load
  Avoid sorts
  Avoid page lock contention
  Avoid compression prior to creation of index
Maximize the Create Index operation
  Stage by partition to keep sorts in memory
  Parallelize across filegroups to minimize fragmentation
  Compress with the Create Index
  Partition switch into destination
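The principles above can be sketched in T-SQL for a single partition. Everything named here is hypothetical (dbo.Stage_Base, dbo.Stage_P1, dbo.FactSales, the columns, the file path, the delimiters); dbo.Stage_Base is assumed to be an existing unindexed heap, and dbo.FactSales an existing page-compressed partitioned clustered index on OrderDate whose partition 1 holds OrderDate < '20040101' on filegroup A. In the scenario, each step would run concurrently for several files or partitions.

```sql
-- Step 1 "Base Load": bulk insert into an unindexed, uncompressed heap.
-- TABLOCK takes a bulk update lock, so concurrent loads into the heap
-- do not contend on page locks; no sort is needed for a heap target.
BULK INSERT dbo.Stage_Base
FROM 'D:\load\source_1.dat'            -- hypothetical path
WITH (TABLOCK, FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');

-- Step 2 "Stage Insert": one heap per partition, created on the destination
-- filegroup, with a constraint that matches the partition boundary.
CREATE TABLE dbo.Stage_P1 (
    OrderDate date   NOT NULL,
    OrderKey  bigint NOT NULL,
    Amount    money  NOT NULL,
    CONSTRAINT CK_Stage_P1 CHECK (OrderDate < '20040101')
) ON [A];

INSERT INTO dbo.Stage_P1 WITH (TABLOCK) (OrderDate, OrderKey, Amount)
SELECT OrderDate, OrderKey, Amount
FROM dbo.Stage_Base
WHERE OrderDate < '20040101';

-- Step 3 "Transform": build the clustered index and compress in one operation.
CREATE CLUSTERED INDEX CI_Stage_P1
ON dbo.Stage_P1 (OrderDate)
WITH (DATA_COMPRESSION = PAGE)
ON [A];

-- Step 4 "Final Append": metadata-only switch into the destination partition.
ALTER TABLE dbo.Stage_P1 SWITCH TO dbo.FactSales PARTITION 1;
```

The switch only succeeds if the stage table matches the destination in columns, clustered index, compression setting, and filegroup, which is why the stage table is built on filegroup A with the same index key.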
Fast Track Incremental Load – Scenario A “Non-Volatile”
Scenario: 1 GB daily incremental load to Partitioned CI
Source Data
  Single text file, unordered
  Current data rarely or never crosses partition boundaries
Source Table: 48 monthly partitions
System: 8 cores, 192 GB RAM
SQL file structure
  Filegroup “Historical”: Partitions 1..46
  Filegroup “Current”: Partitions 47, 48
Example: Non-Volatile Incremental Load
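[Diagram: 1 source data file; target database with the primary partitioned CI table and its current partition.]
Step 1 “Incremental Load”: Bulk Insert from the source data file into the current partition of the primary partitioned CI table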
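A minimal sketch of the non-volatile load, followed by a check that the new rows stayed within the current partitions. The names dbo.FactSales, pfMonthly, and OrderDate, the file path, and the delimiters are all hypothetical.

```sql
-- Bulk insert the daily file straight into the partitioned clustered index.
BULK INSERT dbo.FactSales
FROM 'D:\load\daily.dat'               -- hypothetical path
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');

-- Row counts for the "Current" partitions (47 and 48 in this scenario).
-- $PARTITION.<function>(value) returns the partition number for a value.
SELECT $PARTITION.pfMonthly(OrderDate) AS partition_number,
       COUNT(*)                        AS row_count
FROM dbo.FactSales
WHERE $PARTITION.pfMonthly(OrderDate) >= 47
GROUP BY $PARTITION.pfMonthly(OrderDate)
ORDER BY partition_number;
```

Comparing these counts before and after a load is one simple way to confirm that the daily data did not cross into the historical partitions.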
Fast Track Incremental Load – Scenario B “Volatile”
Scenario: 1 GB daily incremental load to Partitioned CI
Source Data
  Single text file, unordered
  Current data may touch two or more partitions
Source Table: 48 monthly partitions
System: 8 cores, 192 GB RAM
SQL file structure
  Filegroup “Historical”: Partitions 1..46
  Filegroup “Current”: Partitions 47, 48
Example: Volatile Incremental Load
[Diagram: 1 source data file; target database with Filegroup “Current” (Partitions 47, 48) and Filegroup “Historical” (Partitions 1-46) of the primary partitioned CI table.]
Step 1 “DDL”: Volatile holding tables
Step 2 “Primary View”: Unions the holding and primary tables
Step 3 “Incremental Load”: Bulk Insert from the source data file into the volatile holding tables
Step 4 “Manage Partitions”: Partitions 47, 48
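The first three steps of the volatile pattern might look like the sketch below. All object names, columns, the file path, and the delimiters are hypothetical; dbo.FactSales stands in for the primary partitioned CI table, and filegroup placement of the holding table is left out because the slides do not state it. GO is the sqlcmd/SSMS batch separator, needed because CREATE VIEW must start its own batch.

```sql
-- Step 1 "DDL": a holding table for recent, volatile rows.
CREATE TABLE dbo.FactSales_Holding (
    OrderDate date   NOT NULL,
    OrderKey  bigint NOT NULL,
    Amount    money  NOT NULL
);
GO

-- Step 2 "Primary View": queries read both tables through one view.
CREATE VIEW dbo.vFactSales
AS
SELECT OrderDate, OrderKey, Amount FROM dbo.FactSales
UNION ALL
SELECT OrderDate, OrderKey, Amount FROM dbo.FactSales_Holding;
GO

-- Step 3 "Incremental Load": the daily file lands in the holding table,
-- leaving the primary table's partitions untouched.
BULK INSERT dbo.FactSales_Holding
FROM 'D:\load\daily.dat'               -- hypothetical path
WITH (TABLOCK, FIELDTERMINATOR = '|', ROWTERMINATOR = '\n');
```

Step 4 (“Manage Partitions”) is not sketched here, since the slides give no detail on how the holding data is moved into partitions 47 and 48.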
Evaluating Page Fragmentation
Average Fragment Size in Pages – This metric is a reasonable measure of contiguous page allocations for a table.
Value should be >= 400 for optimal performance

select db_name(ps.database_id) as database_name
      ,object_name(ps.object_id) as table_name
      ,ps.index_id
      ,i.name
      ,cast (ps.avg_fragmentation_in_percent as int) as [Logical Fragmentation]
      ,cast (ps.avg_page_space_used_in_percent as int) as [Avg Page Space Used]
      ,cast (ps.avg_fragment_size_in_pages as int) as [Avg Fragment Size In Pages]
      ,ps.fragment_count as [Fragment Count]
      ,ps.page_count
      ,(ps.page_count * 8)/1024/1024 as [Size in GB]   --integer division: whole GB only
from sys.dm_db_index_physical_stats (DB_ID()   --NULL = All Databases
      , OBJECT_ID('$(TABLENAME)')   --sqlcmd scripting variable
      , 1   --index_id 1 = clustered index
      , NULL
      , 'SAMPLED') AS ps   --DETAILED, SAMPLED, NULL = LIMITED
inner join sys.indexes AS i
      on (ps.object_id = i.object_id AND ps.index_id = i.index_id)
where ps.database_id = db_id()
  and ps.index_level = 0;