Table Partitioning for Maintenance and Performance
Jarmo Nieminen, System Engineer, Principal
Gus Bjorklund, Wizard
© 2015 Progress Software Corporation. All rights reserved.
What language are we going to speak today?
“Saari, saari, heinäsaari; heinäsaaren neito.”
“Island, island, grassy island; grassy island's maiden.”
“Ö, ö, hö ö; hö ö mö”
Agenda
Partition design considerations
• Partition definition setup, not physical layout
Some internals of Table Partitioning in OpenEdge
Performance Impact (maintenance and runtime)
• Your configuration matters
• Yes, we have numbers
Is it Right for You?
Do you require 24/7 uptime for your large business critical OpenEdge application?
Does maintaining your home-grown archiving system take up too much of your time?
Do you enjoy maintaining your OpenEdge database during weekends and holidays?
How important are performance and response time SLAs for you?
Data Access: Partitioning types
List Partitioning
• Order table split by region: Northern Region, Western Region, Southern Region
OR
Range Partitioning
• Order table split by date: 12/31/2011, 12/31/2013, 12/31/2015
OR
Sub-partitioning (up to 15 levels!)
• Order table split by region, then by date: each of the three regions has 12/31/2011, 12/31/2013 and 12/31/2015 partitions
What to Partition: Data Organization
Look for a “well known” grouping by “static data value”
• Known at creation time, changes infrequently
List Partitioning
Data organized geographically or grouped by specific entities
• Exact match
• Country, region, company, division
• Why or why not Sales-Rep?
Consider number of unique data values
• 32,765 max defined partitions per table
For best performance: Spread the data out
[Diagram: Order table list-partitioned into Northern, Western and Southern regions]
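The exact-match rule for list partitioning can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenEdge code: the partition names and the `list_partition` helper are hypothetical, and only the region names and the 32,765 limit come from the slide.

```python
# Limit quoted on the slide: maximum defined partitions per table.
MAX_PARTITIONS_PER_TABLE = 32765

# Hypothetical list-partition policy: one exact region value per partition.
LIST_POLICY = {
    "Northern": "order_north",
    "Western": "order_west",
    "Southern": "order_south",
}


def list_partition(region, policy=LIST_POLICY):
    """Return the partition for an exact region match, or raise if none is defined."""
    if len(policy) > MAX_PARTITIONS_PER_TABLE:
        raise ValueError("too many partitions defined for one table")
    try:
        return policy[region]
    except KeyError:
        raise LookupError(f"no partition defined for region {region!r}") from None
```

A column with very many distinct values would need one entry per value in such a policy, which is why the number of unique data values matters when choosing the partition column.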
What to Partition: Data Organization
Range Partitioning
Data organized by ranges of values
• Range rather than single value to identify a group of data
• Date (by year is most typical)
– Usage: Calendar year, fiscal year, quarter?
– Order-date vs ship-date?
– Consider effect on index choice
• Alphabetic or numeric range
– Product code
– Usage vs Balance: Group related products, balance A-Z spread
For best performance: Spread the data out
[Diagram: Order table range-partitioned by date: 12/31/2011, 12/31/2013, 12/31/2015]
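Range partitioning replaces the exact match with a search over ordered boundaries. The Python sketch below uses the three dates from the slide as upper bounds. Treating each bound as inclusive is an assumption made for the illustration, and `range_partition` is a hypothetical helper, not a product API.

```python
from bisect import bisect_left
from datetime import date

# Upper bounds taken from the slide; inclusive bounds are an assumption.
UPPER_BOUNDS = [date(2011, 12, 31), date(2013, 12, 31), date(2015, 12, 31)]


def range_partition(order_date, bounds=UPPER_BOUNDS):
    """Return the 0-based index of the first partition whose bound is >= order_date."""
    idx = bisect_left(bounds, order_date)
    if idx == len(bounds):
        raise LookupError(f"no partition covers {order_date}")
    return idx
```

With these bounds, an order dated 12/31/2011 lands in the first partition and one dated 01/01/2012 in the second. A date after the last bound has no partition in this sketch.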
What to Partition: Data Organization
Sub-partitioning
Sub-partitioning candidate?
• Can you include another column (or add one)?
• By region by order-date
For best performance:
• Sub-partition AND spread the data out
[Diagram: Order table sub-partitioned by region (Western, Southern, Northern), each region split by date: 12/31/2011, 12/31/2013, 12/31/2015]
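Sub-partitioning by region and then order-date combines an exact match with a range search. A minimal Python sketch, assuming inclusive upper bounds and using a hypothetical `sub_partition` helper, shows how three regions and three date ranges yield nine partitions:

```python
from bisect import bisect_left
from datetime import date

# Values from the slide; the code structure itself is illustrative only.
REGIONS = ["Northern", "Western", "Southern"]
DATE_BOUNDS = [date(2011, 12, 31), date(2013, 12, 31), date(2015, 12, 31)]

# Every (region, upper bound) pair is one sub-partition: 3 x 3 = 9.
ALL_PARTITIONS = [(region, bound) for region in REGIONS for bound in DATE_BOUNDS]


def sub_partition(region, order_date):
    """Return the (region, upper bound) pair identifying the row's sub-partition."""
    if region not in REGIONS:
        raise LookupError(f"no partition defined for region {region!r}")
    idx = bisect_left(DATE_BOUNDS, order_date)
    if idx == len(DATE_BOUNDS):
        raise LookupError(f"no partition covers {order_date}")
    return (region, DATE_BOUNDS[idx])
```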
Increasing Concurrency With Table Partitioning
No partitioning: Order table data in 1 storage area
[Diagram: Users #1 to #4 and storage areas A7, A8, A9; table data all in one physical storage area, A7]
Create order. Assign Order-date = TODAY region = "NorthEast".
Create order. Assign Order-date = TODAY region = "SouthEast".
Increasing Concurrency With Table Partitioning
Range partitioning by Order-Date
[Diagram: Users #1 to #4; Partitions 1 to 3 in storage areas A7, A8, A9; "current" data in one physical storage area, A7]
Create order. Assign Order-date = TODAY.
Create order. Assign Order-date = TODAY.
Increasing Concurrency With Table Partitioning
Range partitioning by Order-Date
[Diagram: Users #1 to #4; Partitions 1 to 3 in storage areas A7, A8, A9; "current" data in one physical storage area, A7]
Range partitioning by Product-Code
[Diagram: Users #1 to #4; Partitions 1 to 3 in storage areas A7, A8, A9; table data across physical storage areas]
Create order. Assign Order-date = TODAY Product-Code = "D100".
Create order. Assign Order-date = TODAY Product-Code = "A50".
Increasing Concurrency With Table Partitioning
List partitioning by Region
[Diagram: Users #1 to #4; Partitions 1 to 3 in storage areas A7, A8, A9; table data across physical storage areas]
Create order. Assign Order-date = TODAY region = "NorthEast".
Create order. Assign Order-date = TODAY region = "SouthEast".
Increasing Concurrency With Table Partitioning
List partitioning by Region
[Diagram: Users #1 to #4; Partitions 1 to 3 in storage areas A7, A8, A9; table data across physical storage areas]
Create order. Assign Order-date = TODAY region = "NorthEast".
Create order. Assign Order-date = TODAY region = "SouthEast".
Sub-partitioning by Region & Order-Date
[Diagram: Users #1 to #4; Partitions 1 to 3 in storage areas A7, A8, A9; table data across physical storage areas]
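The concurrency argument in these slides comes down to how many distinct partitions the concurrent creates touch. The Python sketch below counts that for a made-up workload of four users. The "NorthEast" and "SouthEast" regions appear on the slides; the other two regions, the date and the `spread` helper are assumptions for illustration.

```python
from collections import Counter

# Hypothetical workload: four users each create an order "today" in their own region.
TODAY = "2015-06-01"
INSERTS = [
    {"user": 1, "order_date": TODAY, "region": "NorthEast"},
    {"user": 2, "order_date": TODAY, "region": "SouthEast"},
    {"user": 3, "order_date": TODAY, "region": "NorthWest"},
    {"user": 4, "order_date": TODAY, "region": "SouthWest"},
]


def spread(rows, key_fields):
    """Count inserts per partition when the partition key is the given fields."""
    return Counter(tuple(row[field] for field in key_fields) for row in rows)


by_date = spread(INSERTS, ["order_date"])            # all users hit one partition
by_region = spread(INSERTS, ["region"])              # one partition per user
by_both = spread(INSERTS, ["region", "order_date"])  # still one partition per user
```

Partitioning on order-date alone sends every create for today to the same "current" partition, so the users still compete for the same storage. Partitioning on region, or on region and order-date, gives each user a separate target.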
DOES IT REALLY MATTER?
LET’S LOOK AT SOME NUMBERS…
Physical Characteristics
Type II Areas
• Data and index separated
– 8 KB block size with cluster sizes of 512 (data) and 64 (index)
• All partitions in separate areas
• Areas of proportional fixed sizes with matching db extends
Data
• Average record size 257, all same RPB (32)
– Might be interesting to show per partition RPB tuning
• 50,000 to 10,000,000 records per run (based on # users)
• 3 global indexes and 2 local indexes
Recovery
• 8 KB block with 128 MB cluster size
Testing performed
Scale users
• 1, 2, 5, 10, 25, 50, 100, 200
• Avoid application side conflicts
• Monitor internal resource conflicts
Operations executed
• Basic Create, Read, Delete
Vary transaction scope
• 10, 100, 500 records per transaction
Vary partitioning scheme
• No partitioning
• Range partitioning on {order-date}
• Sub-partitioning on {region(9), order-date}
Modified Server Parameters
Buffer pool
• -B 50000 -lruskips 250
Lock Table
• -L 100000 -lkwtmo 3600
Transaction
• -TXERetryLimit 1000
BI
• -bibufs 4000 -bwdelay 20
Latching
• -spin 50000 -napmax 10
Page writers: 1 BIW, 3 APWs
Other Test Information
Machine Stats
• 16 sparcv9 processors operating at 3600 MHz
• Memory size: 32768 Megabytes
Dbanalys performed before and after each activity
Database recreated with same .st file for each run
Variation across runs: ±1%
These tests were run with the best intentions
• There are some additional areas Progress needs to investigate
• As always, YMMV
Sub-partitioning on region AND order-date
Txn size of 10
[Chart: # users vs % difference; negative values indicate a loss for table partitioning (TP)]
Writes and deletes perform significantly better with this sub-partitioning scheme
Big jump starts at 25 and 50 users
No improvement for “isolated” read activity
Sub-partitioning on region AND order-date
Deletes fall off with increased txn size
Big jump starts at 25
Reads remain flat
Sub-partitioning on region AND order-date
Deletes fall off with increased txn size
Writes improve with increased txn size
Reads remain flat
Data Location Mapping Overhead
Table data now across physical storage areas
[Diagram: Partitions 1 to 4 in storage areas A7, A8, A9, A10. Object mapping: table # + column value → partition mapping → area # and record data]
Partition mapping via “special” _partition-policy-detail (ppd) index
• One additional index lookup per record created
Create order. Assign region = "NorthEast" order-date = TODAY.
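The per-record cost described on this slide can be pictured as one extra dictionary lookup before the record is stored. The Python sketch below is a toy model only: `PartitionMap`, the table number and the region-to-area assignments are hypothetical and do not reflect the real on-disk ppd index.

```python
class PartitionMap:
    """Toy stand-in for the partition mapping lookup; not the real ppd index."""

    def __init__(self, mapping):
        self._mapping = mapping  # (table #, column value) -> area #
        self.lookups = 0

    def area_for(self, table_no, value):
        """Resolve the storage area for a row, counting each lookup."""
        self.lookups += 1
        return self._mapping[(table_no, value)]


ORDER_TABLE = 1  # hypothetical table number
ppd = PartitionMap({(ORDER_TABLE, "NorthEast"): 7, (ORDER_TABLE, "SouthEast"): 8})


def create_order(region, areas):
    """Store a new order in the area chosen by one extra mapping lookup."""
    area = ppd.area_for(ORDER_TABLE, region)
    areas.setdefault(area, []).append({"region": region})
    return area


areas = {}
for region in ["NorthEast", "SouthEast", "NorthEast"]:
    create_order(region, areas)
```

Three creates cost three mapping lookups in this model, which mirrors the "one additional index lookup per record created" statement.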
Why the big difference?
For the 100 user “write” example (117% runtime performance improvement):
Note: 3-level index for non-partitioned and global indexes, only 2 levels for partitioned indexes
Stat Base Partition Delta Waits Delta
DB Buf I Lock 24,965,752 24,219,272 746,480 48,485,873
DB Buf S Lock 49,845,176 43,137,912 6,707,264 3,669,065
Find index entry 100 5,000,100 -5,000,000 0
BUF Latch 325,197,441 230,610,801 94,586,640 2,488,178
MTX Latch 31,877,745 31,280,314 597,431 -443,931
TXQ 60,339,744 60,930,902 -591,158 -71,178
Latch timeouts 3,205,140 1,088,120 2,117,020
Resource waits 64,643,103 6,940,906 57,702,197
Extends 44,224 48,224
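When reading the statistics table, the Delta column appears to be Base minus Partition, so a positive delta means the partitioned run did less of that activity. That convention is inferred from the numbers rather than stated on the slide. A short Python check on two rows copied from the table:

```python
def parse(number_text):
    """Convert a figure such as '24,965,752' to an int."""
    return int(number_text.replace(",", ""))


# (Base, Partition) pairs copied from the 100 user "write" table.
ROWS = {
    "DB Buf I Lock": ("24,965,752", "24,219,272"),
    "TXQ": ("60,339,744", "60,930,902"),
}

# Assumed convention: Delta = Base - Partition.
deltas = {name: parse(base) - parse(part) for name, (base, part) in ROWS.items()}
```

Both results match the Delta column in the table (746,480 and -591,158).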
Data Location Mapping Overhead
Table data now across physical storage areas
[Diagram: Partitions 1 to 4 in storage areas A7, A8, A9, A10. Object mapping: table # + column value → partition mapping → area # and record data]
Partition mapping via “special” ppd index
• One additional index lookup per partition traversed
• Query spanning 3 partitions requires only 3 additional partition index lookups
For each order where region = "NorthEast" and order-date > 01/01/2013: end.
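For reads, the extra cost scales with the number of partitions a query touches, not with the number of rows. The Python sketch below assumes four hypothetical yearly partitions with made-up row counts, since the slide does not give the actual bounds. `scan` is an illustrative helper.

```python
from datetime import date

# Hypothetical yearly partitions (upper bound -> rows stored); not from the slides.
ROWS_PER_PARTITION = {
    date(2012, 12, 31): 1_000_000,
    date(2013, 12, 31): 1_000_000,
    date(2014, 12, 31): 1_000_000,
    date(2015, 12, 31): 1_000_000,
}


def scan(rows_per_partition, lower_exclusive):
    """Return (mapping lookups, candidate rows) for a query on order-date > lower_exclusive."""
    lookups = 0
    candidate_rows = 0
    for upper_bound, row_count in rows_per_partition.items():
        if upper_bound > lower_exclusive:  # partition may hold matching rows
            lookups += 1                   # one mapping lookup per partition traversed
            candidate_rows += row_count
    return lookups, candidate_rows


lookups, candidate_rows = scan(ROWS_PER_PARTITION, date(2013, 1, 1))
```

In this model the query spans three partitions, so it pays three extra lookups while reading up to three million rows.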
Why so little difference?
For the 100 user “read” example (1.8% runtime performance improvement):
This is an isolated test case
• Very little conflict for read activity
Real world scenarios will have mixed activity introducing concurrency issues
• May be resolved with partitioning
Stat Base Partition Delta Waits Delta
Index operations 25,000,100 25,000,200 -100 0
DB Buf S Lock 26,059,284 26,044,498 14,786 6,816
BHT Latch 25,872,927 29,698,636 -3,825,709 29,532
BUF Latch 81,084,554 81,295,318 -210,764 -3,519
Latch timeouts 57,193 31,327 25,866
Resource waits 7,172 356 6,816
Data Location Mapping Overhead
Table data now across physical storage areas
[Diagram: Partitions 1 to 4 in storage areas A7, A8, A9, A10. Object mapping: table # + column value → partition mapping → area # and record data]
Partition mapping via “special” ppd index
• One additional index lookup per partition traversed
• Query spanning 3 partitions requires only 3 additional partition index lookups
For each order where region = "NorthEast" and order-date > 01/01/2013: delete order. end.
Why the big difference?
For the 100 user “delete” example (126% runtime performance improvement):
Partitioning has more shared buffer activity with fewer waits.
Latch timeout and resource wait differences are significant
Stat Base Partition Delta Waits Delta
DB Buf I Lock 49,764,712 39,622,400 10,142,312 28,235,912
DB Buf S Lock 135,589,472 172,297,568 -36,708,096 45,324,198
BHT Latch 308,628,367 261,774,931 46,853,436 471,875
BUF Latch 527,202,766 516,930,247 10,272,519 2,955,964
Latch timeouts 5,200,069 1,849,659 3,350,410
Resource waits 120,187,904 46,963,709 73,224,195
Numbers from a “bad” configuration
Range partitioning by Order-Date
[Diagram: Users #1 to #4; Partitions 1 to 3 in storage areas A7, A8, A9; "current" data in one physical storage area, A7]
Create order. Assign Order-date = TODAY.
Create order. Assign Order-date = TODAY.
Poorly designed partitioning scheme (order-date only)
Write performance is pretty bad for 25-100 users
Expect MUCH flatter write performance
Improves with 200 users
Reads remain flat
Same for other txn sizes
All the overhead without improved concurrency
For the 100 user “write” example (66% runtime performance loss):
Waits are not significantly different, but extends, index activity and buffer activity are MUCH higher.
• Test case was broken – data in wrong area
Stat Base Partition Delta Waits Delta
DB Buf I Lock 24,965,752 24,987,122 -21,370 1,135,820
DB Buf S Lock 49,845,176 54,496,236 -4,651,060 17,760
Find index entry 100 5,000,100 -5,000,000 0
BUF Latch 325,197,441 332,561,860 -7,364,419 268,565
MTX Latch 31,877,745 31,931,589 -53,844 9,455
TXQ 60,339,744 60,384,584 -44,840 -346
Latch timeouts 3,205,140 2,894,719 310,421
Resource waits 64,643,103 63,340,388 1,302,715
Extends 44,224 173,760
Poorly designed partitioning scheme (order-date only)
Much flatter response
No 200 user results
Further investigation:
• There is variation in reads and writes from previous.
• Stats show same as well
From small gain down to ~7% loss depending on the operation & # users
All the overhead without improved concurrency
For the 100 user “write” example (0.38% runtime performance loss):
Now have similar stats
• Index activity and X Buffer locks is the only real standout
Stat Base Partition Delta Waits Delta
DB Buf I Lock 24,942,392 24,953,548 -11,156 347,384
DB Buf S Lock 49,868,240 55,152,964 -5,284,724 -16,267
DB Buf X Lock 41,941,720 42,181,992 -240,272 -324,844
Find index entry 100 5,000,100 -5,000,000 0
BUF Latch 321,989,710 333,313,379 -11,323,669 48,054
MTX Latch 31,931,319 31,944,574 -13,255 -4,037
Latch timeouts 3,166,505 3,132,653 33,852
Resource waits 61,308,496 6,865
Extends 3,392 10,880 -7,488
All the overhead without improved concurrency
For the 100 user “read” example (0.31% runtime performance loss):
Very little difference in activity
Index operations indicate use of “Global Index” vs local.
• Probably should rerun with different query to force comparative local index lookup
Stat Base Partition Delta Waits Delta
Index operations 25,000,100 25,000,100 0 0
DB Buf S Lock 26,782,308 26,503,194 279,114 372
BHT Latch 25,863,634 26,933,298 -1,069,664 1,428
BUF Latch 82,393,823 82,089,754 304,069 -790
Latch timeouts 59,747 58,588 1,159
Resource waits 3,893 3,521 372
All the overhead without improved concurrency
For the 100 user “delete” example (4.56% runtime performance loss):
TP experiences more activity
And now more waiting - NOTE: Buffer intent and share locks
Indexes all identical levels
Stat Base Partition Delta Waits Delta
DB Buf I Lock 49,708,524 49,747,972 -39,448 -2,270,296
DB Buf S Lock 133,305,600 133,141,072 164,528 -2,075,716
BHT Latch 292,113,659 296,430,870 -4,317,211 -57,465
BUF Latch 508,107,788 511,003,173 -2,895,385 -132,237
Latch timeouts 4,670,032 4,857,119 -187,087
Resource waits 106,292,224 110,639,104 -4,346,880
Maintenance Operations
10 GB Table (90 million records)
Binary Dump / load
Index rebuild
• Local
• Global
Binary Load Performance
Important to note:
Concurrent load for TP does NOT foul loading in dump order
• Multiple users insert on different allocation chains – one per partition
– Regardless of whether the partitions are in the same area or not!
Concurrent load for non-TP fouls loading in dump order
• Multiple users inserting on SAME allocation chains
• “Logical” scatter of data is re-introduced during load
Binary Load
Single user slower than expected
During TP load noticed high DBSI contention
• Due to global index support.
Operation                          Non-TP Table   TP Entire Table   % Difference
Binary Load -1                     24m35.698s     30m0.474s         -22.90
Binary Load -1 -i                  24m36.540s     30m14.441s        -21.95
Binary Load -n 9 **                24m59.249s     14m10.843s        76.15
Binary Load -n 9 (w/apw,biw)       17m3.913s      11m45.879s        45.04
Binary Load -n 9 -i **             17m30.530s     6m41.232s         162.09
Binary Load -n 9 -i (w/apw,biw)    16m53.992s     6m34.930s         150.74
Index Rebuild
Modest performance loss running off line
• All TP indexes built – only need to build local indexes after binary load
• No real increased concurrency
Running online in parallel for all local indexes is a huge win
• Very small amount of BI activity when run online
Operation                        Non-TP Table   TP Entire Table   % Difference
Idxbuild off-line                16m37s         18m26s            -10.93
Idxbuild on-line (2 local)       (9m15.5)       4m49s             92.04
Idxbuild on-line -i (2 local)    (9m15.5)       4m53s             89.42
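The slides do not say how the % Difference column is calculated. One convention that reproduces the Idxbuild rows, when 9m15.5 is taken as 9m15, is a ratio of elapsed times with the sign showing whether the partitioned run was faster. The Python sketch below encodes that inferred convention. `pct_difference` and `seconds` are hypothetical helpers, and other tables in the deck may have been rounded differently.

```python
def seconds(minutes, secs):
    """Convert a minutes/seconds reading to seconds."""
    return minutes * 60 + secs


def pct_difference(base_s, tp_s):
    """Positive when the partitioned (TP) run is faster, negative when it is slower."""
    if tp_s <= base_s:
        return round((base_s / tp_s - 1) * 100, 2)
    return round(-(tp_s / base_s - 1) * 100, 2)


offline = pct_difference(seconds(16, 37), seconds(18, 26))  # Idxbuild off-line
online = pct_difference(seconds(9, 15), seconds(4, 49))     # Idxbuild on-line (2 local)
online_i = pct_difference(seconds(9, 15), seconds(4, 53))   # Idxbuild on-line -i (2 local)
```

Under this reading, 100% means the partitioned run took half the time of the non-partitioned run.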
Binary Dump
Operation                                  Non-TP Table   TP Entire Table   % Difference
Binary Dump -1                             5m13.499s      3m21.848s         54.95
Binary Dump -n w/1 exe                     6m10.337s      3m43.359s         65.92
Binary Dump -n w/9 exes                    2m30.658s      1m11.738s         111.27
Binary Dump threaded w/1 exe               2m37.818s      3m40.989s         -125.51
Binary Dump threaded w/9 exes              2m28.074s      1m10.527s         111.43
Binary Dump specified threaded w/9 exes    2m27.048s      1m10.050s         110.00
Summary
Partition based on data grouping
Spread data across partitions
Create and delete performance improvements
• Significant for well designed partition schemes
Isolated read performance mostly unaffected by partitioning
Maintenance performance improvements also significant