parallel execution plans joe chang jchang6@yahoo.com
Post on 04-Jan-2016
Parallel Execution Plans
Joe Chang
jchang6@yahoo.com
www.sql-server-performance.com/joe_chang.asp
Parallel Execution Plans
Allows single query to use multiple processors
Query should run faster but may consume more resources
Example
1 thread: 10 sec run time, 10 CPU-sec
2 threads: 6 sec run time, 12 CPU-sec
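The trade-off in the example above can be put into numbers. The following is a minimal sketch; the function name `parallel_tradeoff` is made up for illustration and is not part of the presentation.

```python
def parallel_tradeoff(serial_sec, serial_cpu_sec, parallel_sec, parallel_cpu_sec):
    """Return (speedup, extra CPU fraction) of a parallel run versus a serial run."""
    speedup = serial_sec / parallel_sec                      # elapsed-time gain
    cpu_overhead = parallel_cpu_sec / serial_cpu_sec - 1.0   # extra CPU consumed
    return speedup, cpu_overhead


# Figures from the slide: 1 thread = 10 sec / 10 CPU-sec, 2 threads = 6 sec / 12 CPU-sec
speedup, overhead = parallel_tradeoff(10, 10, 6, 12)
print(f"speedup {speedup:.2f}x, extra CPU {overhead:.0%}")
```

With the slide's figures the query finishes about 1.67 times faster while consuming about 20% more CPU.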
Parallel Execution Configuration
Cost Threshold For Parallelism
Minimum query plan cost threshold for considering queries for parallel execution
Default 5: consider increasing to 20-50 for new systems
Max Degree of Parallelism
Default 0: can use all available processors
SQL Server determines level based on available memory and recent CPU usage
Parallel Plan Operators
The Distribute Streams operator consumes a single input stream of records and produces multiple output streams. The record contents and format are not changed. Each record from the input stream appears in one of the output streams. This operator automatically preserves the relative order of the input records in the output streams. Usually, hashing is used to decide to which output stream a particular input record belongs.
The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. Each record from an input stream is placed into one output stream. If this operator is order-preserving, then all input streams must be ordered and merged into several ordered output streams.
The Gather Streams operator consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order-preserving, then all input streams must be ordered.
Execution Plan Cost Formulas
Table Scan or Index Scan
I/O: 0.0375785 + 0.0007407 per page
CPU: 0.0000785 + 0.0000011 per row
Index Seek – Plan Formula
I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)
= 0.003203425 + 0.000740741 per additional page (>1GB)
CPU Cost = 0.000079600 + 0.000001100 per additional row
Bookmark Lookup – May have changed ?
I/O Cost = multiple of 0.006250000 (≤1GB)
= multiple of 0.003124925 (>1GB)
CPU Cost = 0.0000011 per row
Table Scan or Index Scan
IUD I/O Cost ~ 0.01002 – 0.01010 (>100 rows)
IUD CPU Cost = 0.000001 per row
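The scan and seek formulas above can be written as small functions. This is a sketch under two assumptions that the slide does not spell out: "per additional page/row" is read as (count - 1), and the `over_1gb` flag simply selects between the two listed I/O base values. The function and constant names are illustrative only.

```python
# Constants copied from the slide
SCAN_IO_BASE, SCAN_IO_PER_PAGE = 0.0375785, 0.0007407
SCAN_CPU_BASE, SCAN_CPU_PER_ROW = 0.0000785, 0.0000011
SEEK_IO_BASE_LE_1GB, SEEK_IO_BASE_GT_1GB = 0.006328500, 0.003203425
SEEK_IO_PER_PAGE = 0.000740741
SEEK_CPU_BASE, SEEK_CPU_PER_ROW = 0.000079600, 0.000001100


def scan_cost(pages, rows):
    """Table or index scan: base plus a per-page and a per-row charge."""
    io = SCAN_IO_BASE + SCAN_IO_PER_PAGE * pages
    cpu = SCAN_CPU_BASE + SCAN_CPU_PER_ROW * rows
    return io, cpu


def seek_cost(pages, rows, over_1gb=False):
    """Index seek: base covers the first page/row (assumption), rest charged per unit."""
    io_base = SEEK_IO_BASE_GT_1GB if over_1gb else SEEK_IO_BASE_LE_1GB
    io = io_base + SEEK_IO_PER_PAGE * (pages - 1)
    cpu = SEEK_CPU_BASE + SEEK_CPU_PER_ROW * (rows - 1)
    return io, cpu


seek_io, seek_cpu = seek_cost(pages=11_115, rows=1_000_000)
print(f"seek I/O {seek_io:.4f}, seek CPU {seek_cpu:.7f}")
```

For 1M rows the CPU term works out to 1.1000785, the CPU cost quoted later for the non-parallel index seek.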
Cost Interpretation
Time in seconds? CPU time?
0.0062500 sec -> 160/sec
0.000740741 -> 1350/sec (8KB) -> 169/sec (64K) -> 10.8MB/sec
S2K BOL: Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Option
Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration.
Too fast for 7200RPM disk random I/Os.
About right for 1997 sequential disk transfer rate?
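The rates quoted above follow from treating each cost constant as a time in seconds. A short sketch of that arithmetic, assuming 8KB pages, 64KB extents of 8 pages, and 1MB = 1000KB (variable names are illustrative):

```python
RANDOM_IO_COST = 0.00625        # cost of one random I/O, read as seconds
SEQ_PAGE_COST = 0.000740741     # cost of one additional sequential page, read as seconds
PAGE_KB = 8                     # page size in KB
PAGES_PER_64K = 8               # 64KB / 8KB

random_ios_per_sec = 1 / RANDOM_IO_COST               # random I/Os per second
pages_per_sec = 1 / SEQ_PAGE_COST                     # sequential 8KB pages per second
extents_per_sec = pages_per_sec / PAGES_PER_64K       # 64KB transfers per second
mb_per_sec = pages_per_sec * PAGE_KB / 1000           # sequential transfer rate

print(round(random_ios_per_sec), round(pages_per_sec),
      round(extents_per_sec), round(mb_per_sec, 1))
```

The result is roughly 160 random I/Os per second and about 10.8MB/sec sequential, which is what the two remarks above compare against disk hardware.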
Test Table
CREATE TABLE M3A_20 (
  GroupID int NOT NULL,
  ID int NOT NULL,
  ID2 int NOT NULL,
  ID3 int NOT NULL,
  ID4 int NOT NULL,
  sID smallint NOT NULL,
  bID1 bigint NOT NULL,
  bID2 bigint NOT NULL,
  bID3 bigint NOT NULL,
  rMoney money NOT NULL,
  rDate datetime NOT NULL,
  rReal real NOT NULL,
  rDecimal decimal (9,4) NOT NULL,
  CONSTRAINT [PK_M3A_20] PRIMARY KEY CLUSTERED ( [GroupID], [ID] ) WITH FILLFACTOR = 100 )
GO
Data Population Script 1
SET NOCOUNT ON
DECLARE @BatchTotal int, @BatchSize int, @TotalRows int, @BatchStart int, @BatchEnd int,
  @BatchRow int, @I int, @RowsPerPage bigint, @Card int, @DistinctValues int
SELECT @BatchStart=1, @BatchEnd=1000, @BatchTotal=1000, @BatchSize=100000, @RowsPerPage=100, @Card=100000
SELECT @TotalRows=@BatchTotal*@BatchSize
SELECT @I=(@BatchStart-1)*@BatchSize+1, @DistinctValues=@TotalRows/@Card
-- Outer loop: one explicit transaction per batch of @BatchSize rows
WHILE @BatchStart <= @BatchEnd
BEGIN
  BEGIN TRANSACTION
  SELECT @BatchRow = @BatchStart*@BatchSize
  -- Inner loop: single-row inserts up to the last row of the current batch
  WHILE @I <= @BatchRow
  BEGIN
    INSERT M3A_20 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal)
    VALUES ( 1, @I, @TotalRows-@I+1, (@I-1)/@Card+1, (@TotalRows-@I)%@Card+1, @I%32768,
      @I, (@I-1)%@Card+1,
      1+(@I-1)*@RowsPerPage/@TotalRows+((@I-1)*@RowsPerPage)%@TotalRows,
      10000*rand(), DATEADD(hour,@I%3000000,'1900-01-01'), 10000*rand(), 10000*rand() )
    IF @@ERROR > 0 BEGIN GOTO B END
    SET @I = @I+1
  END
  COMMIT TRANSACTION
  CHECKPOINT
  PRINT CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow)
  SET @BatchStart = @BatchStart+1
END
-- Error exit: close any transaction left open by the GOTO
B: IF @@TRANCOUNT > 0 COMMIT TRANSACTION
PRINT '01 Complete ' + CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow)
  + ', Trancount ' + CONVERT(varchar(10),@@TRANCOUNT)
Data Population Script 1 Notes
Double While Loop
Each Insert/Update/Delete statement is an implicit transaction
Gets separate transaction log entry
Explicit transaction – generates a single transaction log write (max 64KB per IO)
Single TRAN for entire loop requires excessively large log file
Inserts are grouped into intermediate size batches
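The batching pattern in these notes can be tried without a SQL Server instance. The sketch below swaps in Python's built-in `sqlite3` module purely as a stand-in, so it shows the control flow (one explicit transaction per batch of single-row inserts) and says nothing about SQL Server log behavior. The table `t` and the function `load_in_batches` are hypothetical.

```python
import sqlite3


def load_in_batches(conn, total_rows, batch_size):
    """Insert rows one at a time, committing once per batch; return the commit count."""
    cur = conn.cursor()
    commits = 0
    for start in range(1, total_rows + 1, batch_size):
        end = min(start + batch_size - 1, total_rows)
        cur.execute("BEGIN")                       # explicit transaction per batch
        for i in range(start, end + 1):            # single-row inserts, as in the script
            cur.execute("INSERT INTO t (id, id2) VALUES (?, ?)",
                        (i, total_rows - i + 1))
        cur.execute("COMMIT")
        commits += 1
    return commits


# isolation_level=None puts sqlite3 in autocommit mode so BEGIN/COMMIT are ours to issue
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, id2 INTEGER NOT NULL)")
commits = load_in_batches(conn, total_rows=1000, batch_size=100)
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(commits, row_count)
```

The batch size is the tuning knob: one row per transaction means one commit per row, while a single transaction for everything needs a very large log, as noted above.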
Data Population Scripts 2
DECLARE @L int
SELECT @L = 1
WHILE @L <= 3
BEGIN
  INSERT M3A_11 (GroupID,ID,ID2,ID3,ID4,sID,bID1,bID2,bID3,rMoney,rDate,rReal,rDecimal)
  SELECT TOP 500000 GroupID, ID, 1500001-ID, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal
  FROM M3A_20
  WHERE GroupID = 1 AND ID BETWEEN (@L-1)*500000+1 AND @L*500000
  SELECT @L = @L + 1
  CHECKPOINT
  PRINT '11 Step ' + CONVERT(varchar,@L) + ', ' + CONVERT(varchar,GETDATE(),121)
END
UPDATE STATISTICS M3A_01 (PK_M3A_01) WITH FULLSCAN
CREATE STATISTICS ST_01 ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE
Primary table populated using single row inserts in a WHILE loop
Additional tables populated with INSERT / SELECT statement
Single row inserts: ~20-30K rows/sec
INSERT / SELECT statement: ~100K+ rows/sec
Index Seek Plans
Many rows returned, Non-parallel plan
Parallel Execution disabled
Cost: 9.34
Cost: 9.82
Cost: 4.94
Parallel Plan
Index Seek Details
Non-parallel plan
Parallel plan
Index Seek – Non-parallel
Cost assigned to SELECT
Index Seek, 1M rows in 11,115 pages (81 bytes/row, 90% Fill)
I/O cost is 8.2365
CPU cost is 1.1000785
Cost & sub-tree cost is correct, I/O & CPU is ½ of correct value
Index Seek – Parallel Plan
No cost assigned to SELECT
Index Seek I/O and CPU cost ½ of non-parallel plan
Index Seek with Aggregate
[Figure: execution plan with callouts 1-4]
Index Seek Aggregate Parallel Plan Details
[Figure: operator details for callouts 1-4]
Table Scan
Cost: 9.01
Cost: 8.26
Table Scan Details
Non-parallel plan
Parallel plan
No cost on Select
I/O cost same
CPU cost ½ of non-parallel plan
Parallel Plan Cost Formula Patterns
CPU costs are ½ of non-parallel plan
Index Seek I/O costs are also ½
Scan I/O cost is same as non-parallel plan
Parallel plan costs are based on 2 processors
Actual number of processors determined at runtime
Overhead operations: Distribute, Repartition & Gather Streams
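The halving patterns listed above can be applied to a non-parallel operator cost as follows. This is only a sketch of the stated rules: the function name is illustrative, and the overhead of the stream operators is deliberately left out.

```python
def parallel_operator_cost(operator, io_cost, cpu_cost):
    """Apply the 2-processor costing patterns to a non-parallel (io, cpu) pair."""
    if operator == "seek":
        parallel_io = io_cost / 2      # index seek I/O is halved
    elif operator == "scan":
        parallel_io = io_cost          # scan I/O is unchanged
    else:
        raise ValueError(f"unknown operator: {operator}")
    return parallel_io, cpu_cost / 2   # CPU is halved in both cases


# Non-parallel index seek figures quoted earlier: I/O 8.2365, CPU 1.1000785
seek_io, seek_cpu = parallel_operator_cost("seek", 8.2365, 1.1000785)
print(f"parallel seek: I/O {seek_io:.5f}, CPU {seek_cpu:.8f}")
```

Any Distribute, Repartition, or Gather Streams operators in the plan add their own cost on top of these figures.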
Hash Join
Cost: 6.50
Cost: 4.79
200,000 rows
15 byte outer source (OS) row size
Hash Join Details
Non-parallel plan
Parallel plan
Hash Join – Non-parallel plan
Hash Join – Parallel Plan
[Figure: parallel plan with callouts 1-4]
Hash Join with I/O Cost
900,000 rows
MAXDOP 1
Cost 74.1
Cost 85.1
Hash Join – Join I/O Cost
730,000 rows
740,000 rows
Hash Join - Bitmap
Hash Join Cost Formula
Index Seek – Plan Formula
I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)
= 0.003203425 + 0.000740741 per additional page (>1GB)
CPU Cost = 0.000079600 + 0.000001100 per additional row
Hash Join
CPU Cost = 0.017750000 base + 0.0000001749 (2-30 rows)
+ 0.0000000720 (100 rows)
0.000015091 per row
0.000015857 (parallel)
+ 0.000001880 per row per 4 bytes in OS
+ 0.000005320 per additional row in IS
I/O Cost = 0.000042100 per row over 64MB (Row Size+8)
0.0000036609 per 4 byte over 15B
Parallel Cost Formula
Base Cost 0.028500
Repartition Stream
Cost per row = 0.0000024705 Base (15 Bytes) + 0.000000759 per additional 4 Bytes
Gather Stream
Cost per row = 0.0000018735 Base (15 Bytes) + 0.000000759 per additional 4 Bytes
Dispatch
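One way to read the stream-operator formula above is as a fixed base cost plus a per-row charge that grows with row width. The sketch below assumes exactly that combination, and also assumes a partial 4-byte step counts as a full step; neither assumption is confirmed by the slide. The name `stream_cost` is illustrative.

```python
import math

BASE_COST = 0.028500
PER_ROW_BASE = {"repartition": 0.0000024705, "gather": 0.0000018735}
PER_EXTRA_4_BYTES = 0.000000759
BASE_ROW_BYTES = 15


def stream_cost(kind, rows, row_bytes):
    """Estimated cost of a Repartition or Gather Streams operator (assumed form)."""
    # Number of 4-byte steps beyond the 15-byte base row, rounded up (assumption)
    extra_steps = max(0, math.ceil((row_bytes - BASE_ROW_BYTES) / 4))
    per_row = PER_ROW_BASE[kind] + PER_EXTRA_4_BYTES * extra_steps
    return BASE_COST + rows * per_row


gather_cost = stream_cost("gather", rows=200_000, row_bytes=15)
repartition_cost = stream_cost("repartition", rows=200_000, row_bytes=23)
print(f"gather {gather_cost:.4f}, repartition {repartition_cost:.4f}")
```

Because the per-row term dominates for large inputs, the overhead is small when a stream carries few rows (for example after an aggregate) and significant when it carries every row.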
Loop Join
Loop Join Details
Non-parallel plan, Outer Source
Parallel plan, Outer Source
Inner Source cost identical for both non-parallel and parallel plans
Merge Join
Merge Join Details
Non-parallel plan
Parallel plan
Index Seek + Aggregate Test
[Figure: two bar charts of duration per 1K rows (ms), series 1P and 2P. One chart covers 1 Sum, 1 NULL, 2 Sum, 2 NULL, 3 Sum, 3 NULL; the other covers 1 Sum, 2 Sum, 3 Sum. Systems: Opteron 2.2GHz 1M, Xeon 2.4GHz/512K]
Index Seek + Aggregate Test, Itanium 2
[Figure: bar chart of duration in ms per 1K rows for 1 of 10M, Count, Sum, Convert, Max, Money, Decimal; series 1P, 2P, 4P, 8P; Itanium 2 1.5GHz/6M]
Index Seek + Aggregate Test, SUM(INT)
[Figure: bar chart for Count, 1 Sum, 2 Sum, 3 Sum; series 1P, 2P, 4P, 8P; Itanium 2 1.5GHz/6M]
Index Seek + Aggregate Test, NULL
[Figure: bar chart for 1 Sum, 1 NULL, 2 Sum, 2 NULL, 3 Sum; series 1P, 2P, 4P, 8P; Itanium 2 1.5GHz/6M]
Loop Join, COUNT(*)
[Figure: duration per 1K rows (ms) versus rows (000's) at 100, 1,000, 10,000; series 1P, 2P, 4P, 8P; Itanium 2 1.5GHz/6M]
Hash Join, COUNT(*)
[Figure: duration per 1K rows (ms) versus rows (000's) at 100, 1,000, 10,000; series 1P, 2P, 4P, 8P; Itanium 2 1.5GHz/6M]
Merge Join, COUNT(*)
[Figure: duration per 1K rows (ms) versus rows (000's) at 100, 1,000, 10,000; series 1P, 2P, 4P; Itanium 2 1.5GHz/6M]
General Recommendations
Useful in DW, ETL, and maintenance activities
Use judgment on transaction processing
Is throughput more important, or faster expensive queries?
Increase Cost Threshold from 5 to 20-50
Limit MAXDOP to 4
Verify or limit parallelism on Xeon systems with Hyper-Threading enabled
Additional Information
www.sql-server-performance.com/joe_chang.asp
SQL Server Quantitative Performance Analysis
Server System Architecture
Processor Performance
Direct Connect Gigabit Networking
Parallel Execution Plans
Large Data Operations
Transferring Statistics
SQL Server Backup Performance with Imceda LiteSpeed
jchang6@yahoo.com