1, 2, 3… scatter! getting your humpty-dumpty database in order. tom bascom, white star software...

1, 2, 3…Scatter!

Getting Your Humpty-Dumpty Database in Order.

Tom Bascom, White Star [email protected]

2

A Few Words about the Speaker

• Tom Bascom; Progress user & roaming DBA since 1987

• VP, White Star Software, LLC– Expert consulting services related to all aspects of Progress and

OpenEdge.– [email protected]

• President, DBAppraise, LLC– Remote database management service for OpenEdge.– Simplifying the job of managing and monitoring the world’s

best business applications.– [email protected]

3

“Fragmentation”vs

“Scatter”

4

$ proutil dbname –C dbanalys > dbname.dba

…RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95------------------------------------------------------- Record Size (B) -Fragments- ScatterTable Records Size Min Max Mean Count Factor FactorPUB.APP_FLAGS 1676180 47.9M 28 58 29 1676190 1.0 1.9…

Fragmentation

• “Fragmentation” is splitting records into multiple pieces.

5



Fragmentation


10 additional fragments Beware!

6

Fragmentation Occurs When

• Record data is too big for the block: i.e. 16k of data going into a 4k block.

• Updated data needs more room to expand than is available.

• The “create limit” and the “toss limit” can be used to reserve more free space in blocks and control fragmentation.

• Progress will automatically “de-frag” when possible (10.1+).

7

Scatter

• “Scatter” is a measure of the “sequentialness” of records.



8

What is “Scatter Factor”?

• The “factor” does not care about any ordering – not even primary key.

• For type 1 areas it is a measure of how well the data fits into the minimum # of blocks that would be required to hold it with “distance” between blocks taken into account.

• For type 2 areas there is no distance penalty – but free space in a cluster can increase scatter.

• “Logically adjacent” isn’t really reported by dbanalys.

9

Fragmentation and Scatter


• “Scatter” is a measure of the “sequentialness” of records.

• The “scatter factor” that proutil reports might be better described as “density”.

• If you have two or more indexes at least one of them is probably scattered.

10

Why is ThisImportant?

11

Locality of Reference

• When data is referenced there is a high probability that it will be referenced again soon.

• If data is referenced there is a high probability that “nearby” data will be referenced soon.

• Locality of reference is why caching exists at all levels of computing.

12

How Does Progress Help?

• Temporal Locality (Will be reused “soon”)• -B (LRU Chain)• -mmax• -Bt

• Spatial Locality (“nearby” data will be accessed)

• Type 2 storage areas• -B2 (no LRU)• Memory mapped prolib

13

Cache Effectiveness

Layer Time # of Recs

# of Ops Cost per Op

Relative

Progress 4GL to –B 0.96 100,000 203,473 0.000005 1-B to FS Cache 10.24 100,000 26,711 0.000383 75

FS Cache to SAN 5.93 100,000 26,711 0.000222 45-B to SAN Cache 11.17 100,000 26,711 0.000605 120

SAN Cache to Disk 200.35 100,000 26,711 0.007500 1500-B to Disk 211.52 100,000 26,711 0.007919 1585

14

LogicalScatter

15

Definition

• Logical Scatter is the probability that records in a given logical order are also in “physical” order (in the same block).

• Each index has its own ordering and thus its own logical scatter.

• It is very unlikely that more than one index will be well ordered.

• It is quite possible that all indexes might be scattered.

Type 1 Storage Area

16

Block 1

1 Lift Tours Burlington

3 66 9/23 9/28 Standard Mail

1 1 54 4.86 Shipped

1 2 55 23.85 Shipped

Block 2

1 3 53 8.77 Shipped

2 1 19 2.75 Shipped

2 2 49 6.78 Shipped

2 3 13 10.99 Shipped

Block 3

14 Cologne Germany Germany

2 Upton Frisbee Oslo

1 Koberlein Kelly

1 53 1/26 1/31 FlyByNight

Block 4

BBB Brawn, Bubba B. 1,600

DKP Pitt, Dirk K. 1,800

4 Go Fishing Ltd Harrow

16 Thundering Surf Inc. Coffee City

Type 2 Storage Area

17

Block 1



3 Hoops Atlanta


Block 2

5 Match Point Tennis Boston

6 Fanatical Athletes Montgomery

7 Aerobics Tikkurila

8 Game Set Match Deatsville

Block 3

9 Pihtiputaan Pyora Pihtipudas

10 Just Joggers Limited Ramsbottom

11 Keilailu ja Biljardi Helsinki

12 Surf Lautaveikkoset Salo

Block 4

13 Biljardi ja tennis Mantsala

14 Paris St Germain Paris

15 Hoopla Basketball Egg Harbor


18

Tangent…

• The preceding slides should be all you need to see in order to be convinced that type 1 areas are a bad place to be putting data.

• The schema area is always a type 1 area. Should it have data in it?

19

How to Determine “Logical Scatter”?

• You could read the whole database…• Multiple times…• (Because every index must be considered)

-or-• For each table randomly choose a record.• For each index of that table find the NEXT record.• Is it in the same block?• Lather, Rinse and Repeat.

Type 2 Storage Area

20

Block 1



3 Hoops Atlanta


Block 2

5 Match Point Tennis Boston

6 Fanatical Athletes Montgomery

7 Aerobics Tikkurila

8 Game Set Match Deatsville

Block 3

9 Pihtiputaan Pyora Pihtipudas

10 Just Joggers Limited Ramsbottom

11 Keilailu ja Biljardi Helsinki

12 Surf Lautaveikkoset Salo

Block 4

13 Biljardi ja tennis Mantsala

14 Paris St Germain Paris

15 Hoopla Basketball Egg Harbor


Id = 100% Name = 25% City 19%

21

WhichIndex?

22

4GL Index Selectioncompile cust.p xref “cust.xrf”

/* cust.p */for each customer no-lock: display custNum.end.

cust.p 1 COMPILE cust.pcust.p 1 CPINTERNAL ISO8859-1cust.p 1 CPSTREAM ISO8859-1cust.p 1 STRING "Customer" 8 NONE UNTRANSLATABLEcust.p 1 SEARCH sports2000.Customer CustNum WHOLE-INDEXcust.p 2 ACCESS sports2000.Customer CustNumcust.p 2 STRING ">>>>9" 5 NONE TRANSLATABLE FORMATcust.p 3 STRING "Cust Num" 8 LEFT TRANSLATABLEcust.p 3 STRING "CustNum" 7 NONE UNTRANSLATABLEcust.p 3 STRING "--------" 8 NONE UNTRANSLATABLE

23

Amdahl’s Law

The performance enhancement possible with a given improvement is limited by the fraction of the execution time that the improved feature is used.

24

Compile Time is Not Enough

• Dynamic Queries• SQL-92 Cost Based Optimizer• Unreached Code• Rarely Run Code• Widespread, but Low Impact Code

25

Execution Time Index Usagefor each _indexStat no-lock: find _index no-lock where _index._idx-num = _indexStat._indexStat-id no-error. if available( _index ) then do: find _file no-lock where recid( _file ) = _index._file-recid no-error. display _indexStat._indexStat-id "*" when ( _file._prime-index = recid( _index )) "u" when ( _index._unique = true ) _file._file-name when available _file _index._index-name when available _index _indexStat._indexStat-read . end.end.

26

Execution Time Index UsageId File-Name Index-Name Read

1 * _File _File-Name 39,550,882

2 * U _Field _File/Field 29

3 U _Field _Field-Name 2,457,451

4 U _Field _Field-Positi on 1,999,506

5 * U _Index _File/Index 4,791,744

6 * U _Index-Field _Index/Number 7,344,550

7 _Index-Field _Field 0

8 * U table1 t1_idx1 6,668,593

9 U table1 t1_idx2 224,913

10 * U table2 t2_idx1 42,078,065

11 table2 t2_idx2 19,772,967,351

12 table2 t2_idx3 0

27

VST Note

• The default db settings only collect statistics for the first 50 tables and indexes.

• To fix this:

-tablerangesize 1000-indexrangesize 3000

define variable t as integer no-undo label “Tables”.define variable i as integer no-undo label “Indexes”.for each _file no-lock where _hidden = no: t = t + 1. end.for each _index no-lock: i = i + 1. end.display t i.

28

CaseStudy

29

Logical Scatter Case Study

• A process reading approximately 1,000,000 records.

• An initial run time of 2 hours.– 139 records/sec.

• Un-optimized database.

30

BaselineTable Index %Sequential %Idx Used DensityTable1 t1_idx1 0% 100% 0.09

t1_idx2 0% 0% 0.09Table2 t2_idx1 69% 99% 0.51

t2_idx2 98% 1% 0.51t2_idx3 74% 0% 0.51

4k DB BlockType 1 Area

• -B 25,000• Hit Ratio 95%• 19,208 IO ops• Run time 2 hours

31

Round 1 – Increase Big BTable Index %Sequential %Idx Used DensityTable1 t1_idx1 0% 100% 0.09

t1_idx2 0% 0% 0.09Table2 t2_idx1 69% 99% 0.51

t2_idx2 98% 1% 0.51t2_idx3 74% 0% 0.51


• -B 100,000• Hit Ratio 98%• 9,816 IO ops• Run time 60 minutes

32

Round 2 – Increase Some More

Table Index %Sequential %Idx Used DensityTable1 t1_idx1 0% 100% 0.09

t1_idx2 0% 0% 0.09Table2 t2_idx1 69% 99% 0.51

t2_idx2 98% 1% 0.51t2_idx3 74% 0% 0.51



33

Restructure DB

• Dump & Load• Convert to 8KB DB Blocks• Convert to Type 2 Storage Areas

34

Round 3 – Back to Baseline –B

8k DB BlockType 2 Areas


Table Index %Sequential %Idx Used Density

Table1 t1_idx1 71% (0) 100% 0.10

t1_idx2 63% (0) 0% 0.10

Table2 t2_idx1 85% (69) 99% 1.00

t2_idx2 100% (98) 1% 1.00

t2_idx3 83% (74) 0% 0.99

35

Round 4 – Bump Big B




Table1 t1_idx1 71% 100% 0.10

t1_idx2 63% 0% 0.10

Table2 t2_idx1 85% 99% 1.00

t2_idx2 100% 1% 1.00

t2_idx3 83% 0% 0.99

36

Round 5 – Big B … Again




Table1 t1_idx1 71% 100% 0.10

t1_idx2 63% 0% 0.10

Table2 t2_idx1 85% 99% 1.00

t2_idx2 100% 1% 1.00

t2_idx3 83% 0% 0.99

37

Are We Done?


• The most used index is not the most sequential index!


Table1 t1_idx1 71% 100% 0.10

t1_idx2 63% 0% 0.10

Table2 t2_idx1 85% 99% 1.00

t2_idx2 100% 1% 1.00

t2_idx3 83% 0% 0.99

38

Restructure DB

• Dump & Load• Dump Table 2 using the most used index:

• t2_idx1

• Load Normally

39

Why Not?

• Return on Investment:– Pickups from improving %SEQ are less than those

from improving Hit Ratio.– That last 15% is a drop in the bucket compared to

the 6x improvement already gained.– Expected improvement would be about 2% -- of

20 minutes. Or around 24 seconds.

40

However…

• At low buffer hit ratios (95% or lower):– Restructuring to favor the most used index results in

a 60% improvement in time.– And the hit ratio improves to 99.75%.– By eliminating 95% of the disk IO ops (112,247 ->

5196).

• On the other hand… the system in question has grown again and it may now be worth revisiting.

41

Conclusion

• Type 2 Storage Areas improve “logical scatter”.• Addressing “logical scatter” can be a powerful

performance improvement technique.• Addressing “logical scatter” can be an

alternative to increasing –B in environments where shared memory is constrained.

42

Questions?

43

Questions?

• Should I use USE-INDEX to force a “well ordered” index?

• Why might scatter grow over time?

• I have two (or more) conflicting, but very important, needs. What can I do?

44

Thank You!

1, 2, 3… scatter! getting your humpty-dumpty database in order. tom bascom, white star software...

Documents