db-20: highly parallel dump & load tom bascom vp technology white star software [email protected]

40
DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software [email protected]

Upload: efrain-macklin

Post on 01-Apr-2015

222 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump & Load

Tom BascomVP TechnologyWhite Star [email protected]

Page 2: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 2

Agenda…

Why Dump and Load? What Methods are Available? Using a Highly Parallel Approach Demo!

Page 3: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 3

Why Dump and Load?

Because it is X months since the last time. Because the vendor said so. Because we must in order to upgrade

Progress. Because it will improve performance.

Page 4: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 4

Why Dump and Load?

Because it is X months since the last time. Because the vendor said so. Because we must in order to upgrade

Progress. Because it will improve performance.X

Page 5: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 5

Good Reasons to D&L

Improve sequential data access (reports). Consolidate fragmented records. To recover corrupted data. In order to move to a new platform. In order to reconfigure a storage management

decision. In order to take advantage of hot new db

features!

Page 6: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 6

Hot Features That Need D&L

Variable Block Sizes Storage Areas Variable Rows Per Block Data Clusters (Type II Areas)

Page 7: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 7

Some Dump & Load Choices

“Classic” Dictionary Dump & Load Bulk Loader Binary Dump & Load Vendor Supplied Utilities

Parallelization– Many Benefits

– Very difficult to effectively balance manually. Automating the process

Page 8: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 8

Classic Dictionary D&L

Order-lineCust order hist

cust Order-line order

1st the dump…

… and then the load.

Page 9: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 9

Classic Dictionary D&L

This can take DAYS (or even WEEKS!)

Page 10: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 10

Multi-threaded D&L

Order-line

cust

order hist

cust

Order-line order

1st the dump…

… and then the load.

Page 11: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 11

Multi-threaded D&L

Much more reasonable than “classic.” Generally can be done in a weekend.

But still too long for many situations:– Threads aren’t easily balanced.

– Longest thread determines total time.

– Timing varies – it is hard to predict.

Page 12: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 12

Overlapping, Parallel, Balanced D&L

Order-line

cust

order hist

cust

Order-line

order

Both the dump…

… and the load – in parallel, multi-threaded and load-balanced.

Page 13: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 13

Parallel Balanced D&L

Usually a big improvement. Often can be done in hours.

But difficult to configure and manage:– You need an intelligent and dynamic control

program.

Page 14: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 14

Highly Parallel D&L

SRC DST

Control

dlctl.p dlmon.p

dlx.p

dlx.p

dlx.p

dlx.p

status

config

initialize

stat

us

dump load

Spawn

Page 15: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 15

Demo !!!

Page 16: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 16

Resources

SUSE Linux 9 Server– 2 x 2ghz (Xeon w/HT)– 512 MB RAM– 3 internal disks

Various External Disks

5GB Demo Database– 85 Tables– 250 Indexes– Various table and record sizes

Page 17: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 17

The Storage Area Design

Separate Tables from Indexes. Use a single Type 1 Area for tiny

“miscellaneous” tables that have low activity. Isolate Small, but very active, tables in Type 2

Areas (or standalone areas if v9). Organize Large Tables by RPB in Type 2 Areas

– Any table over 1 million rows

– Any table over 1GB Isolate very large indexes

Page 18: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 18

The Storage Area Designb exch06.b1 f 384000#d "Schema Area":6,64 .#d "Misc":10,256 exch06_10.d1 f 768#d "SmallTables":12,128;8 exch06_12.d1 f 9120#d "RPB256":14,256;512 exch06_14.d1 f 221280#d "RPB128":16,128;512 exch06_16.d1 f 768000#d "RPB64":18,64;512 exch06_18.d1 f 1536000d "RPB64":18,64;512 exch06_18.d2 f 1536000#d "RPB32":20,32;512 exch06_20.d1 f 2048000d "RPB32":20,32;512 exch06_20.d2 f 2048000#d "RPB16":22,16;512 exch06_22.d1 f 768000#d "Indexes":99,8 exch06_99.d1 f 1024000

Page 19: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 19

Highly Parallel Dump & Load

90 threads, 16 concurrent. Dump threads are the same as load threads. Easy monitoring and control. No latency – the load finishes when the dump

finishes. No intermediate disk IO:

– More efficient usage of disk resources.

– Higher overall throughput. No index rebuild. No need for table analysis.

Page 20: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 20

The 80/20 Rule

98% of the tables are done very quickly 2% of the tables take most of the time… … and nearly all of the space

These are the tables that we target with multiple threads!

Page 21: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 21

Multiple Threads

Table t Criteria──────── ─ ──────────────────────────────────────────────────────────────────

document 1 where src.document.app# < 1100000document 2 where src.document.app# >= 1100000 and src.document.app# < 1200000document 3 where src.document.app# >= 1200000 and src.document.app# < 1300000document 4 where src.document.app# >= 1300000 and src.document.app# < 1400000document 5 where src.document.app# >= 1400000 and src.document.app# < 1500000document 6 where src.document.app# >= 1500000 and src.document.app# < 1600000document 7 where src.document.app# >= 1600000

LOGS 1 where type = "P" and ( logdate <= 12/31/2003 )LOGS 6 where type = "P" and ( logdate >= 01/01/2004 and logdate <= 06/30/2004 )LOGS 7 where type = "P" and ( logdate >= 07/01/2004 and logdate <= 12/31/2004 )LOGS 8 where type = "P" and ( logdate >= 01/01/2005 and logdate <= 06/30/2005 )LOGS 9 where type = "P" and ( logdate >= 07/01/2005 and logdate <= 12/31/2005 )LOGS 10 where type = "P" and ( logdate >= 01/01/2006 )LOGS 14 where type = "L" and ( logdate <= 08/01/2002 )LOGS 15 where type = "L" and ( logdate > 08/01/2002 )LOGS 16 where type = "M"LOGS 17 where type = "U"

Page 22: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 22

Dump Factors

Major factors that contribute to dump time:– Data distribution (scatter factor)

– Index Selection

– -B, -Bp, -RO

– Storage Area Configuration (dedicated areas and type 2 areas can be very, very fast)

– Disk Configuration and Throughput

proutil dbname –C tabanalys (when doing a binary d&l)

Page 23: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 23

Concurrency -- Dumping

Dump Time

0

300

600

900

1,200

1,500

1,800

5

10

15

20

30

40

50

90

Threads

Tim

e (

se

co

nd

s)

Page 24: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 24

Take a Load Off

Factors that contribute to load time:– Progress version

– Build indexes

– -i, -bibufs, -bigrow, -B, -TB, -TM, -SG

– Storage Area Configuration

– Disk Configuration and Throughput

– -T files

– -noautoreslist (aka “forward-only”)

The importance of testing

Page 25: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 25

Concurrency -- Loading

Dump & Load Time

0300600900

1,2001,5001,8002,1002,4002,7003,0003,3003,600

5

10

15

20

30

40

50

90

Threads

Tim

e (

se

co

nd

s)

Page 26: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 26

Buffer-Copy & Raw-Transfer

Very Fast Eliminates The Middle Man (temp file IO

operations) Provides fine-grained, ABL level of control

– Allows on the fly data manipulation– Useful when merging databases

Can use with remote connections to bridge version numbers.

In OE10.1 performance is essentially equal Cannot use RAW-TRANSFER with -RO

Page 27: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 27

The Code

start.p dlctl.p dlclear.p dlwrap.p dlx.p

Page 28: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 28

start.p

/* start.p

*

*/

run ./dotp/dlclear.p.

pause 1 no-message.

run ./dotp/dlctl.p.

pause 1 no-message.

run ./dotp/dlmon.p.

return.

Page 29: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 29

dlclear.p/* dlclear.p * */

define new global shared variable dlstart as character no-undo format "x(20)".

dlstart = "".

for each dlctl exclusive-lock: assign dlctl.dumped = 0 dlctl.loaded = 0 dlctl.err_count = 0 dlctl.note = "" dlctl.xstatus = "" .end.

return.

Page 30: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 30

dlctl.p/* dlctl.p * */

for each dlctl no-lock where dlctl.active = yes by dlctl.expected descending:

os-command silent value( "mbpro -pf dl.pf -p ./dotp/dlwrap.p -param " + '"' + string( dlctl.thread ) + "|" + dlctl.tblname + "|" + dlctl.criteria +"|" + dlctl.pname + '"' + " >> /tmp/dl/error.log 2>&1 ~&" ).

end.

return.

Page 31: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 31

dlwrap.p/* dlwrap.p * */

define variable thr as character no-undo.define variable tbl as character no-undo.define variable cri as character no-undo.define variable pnm as character no-undo.

thr = entry( 1, session:parameter, "|" ).tbl = entry( 2, session:parameter, "|" ).cri = entry( 3, session:parameter, "|" ).pnm = entry( 4, session:parameter, "|" ).

if pnm = "" then pnm = "dlx".

pnm = "./dotp/" + pnm + ".p".

run value( pnm ) value( thr ) value( tbl ) value( cri ).

return.

Page 32: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 32

dlx.p - part 1/* {1} = thread number * {2} = table name * {3} = WHERE clause */

define variable flgname as char no-undo initial /tmp/dl/{2}.{1}.flg".define variable logname as char no-undo initial "/tmp/dl/{2}.{1}.log".

define stream unloaded.output stream unloaded to value( "/tmp/dl/{2}.{1}.d" ).

output to value( logname ) unbuffered.put today " " string( time, "hh:mm:ss" ) " {2} {1}" skip.

disable triggers for dump of src.{2}.disable triggers for load of dst.{2}.

define query q for src.{2}.open query q for each src.{2} no-lock {3}.

Page 33: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 33

dlx.p - part 2outer_loop: do for dlctl, dst.{2} while true transaction: inner_loop: do while true:

get next q no-lock. if not available src.{2} then leave outer_loop. /* end of table */ create dst.{2}. buffer-copy src.{2} to dst.{2} no-error. d = d + 1. if error-status:num-messages = 0 then l = l + 1. else do: delete dst.{2}. e = e + 1. export stream unloaded src.{2}. find dl.dlctl exclusive-lock where dlctl.tblname = "{2}" and dlctl.thread = {1}. dlctl.err_count = e. next inner_loop. end.

Page 34: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 34

dlx.p - part 3 if d modulo 100 = 0 then do: find dl.dlctl exclusive-lock where dlctl.tblname = "{2}" and dlctl.thread = {1}. assign dlctl.dumped = d dlctl.loaded = l . file-info:file-name = flgname. if file-info:full-pathname = ? then do: dlctl.note = "stopped". leave inner_loop. end. leave inner_loop. end. end. /* inner_loop */end. /* outer_loop */

Page 35: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 35

dlx.p - part 4do for dl.dlctl transaction:

find dl.dlctl exclusive-lock where dlctl.tblname = "{2}" and dlctl.thread = {1}. assign dlctl.dumped = d dlctl.loaded = l dlctl.err_count = e dlctl.xstatus = "done" dlctl.note = ( if dlctl.note <> "stopped" then "complete" else "stopped" ) . if dlctl.note <> "stopped" then do: if d < dlctl.expected then dlctl.note = "short". if d > dlctl.expected then dlctl.note = "long". end.

if e > 0 then dlctl.note = dlctl.note + "+errors".

end.

Page 36: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 36

Validation

Belt & Braces!!!– Compare Record Counts

dlcompare.p (for binary d&l)

– Check Logs for Known & Unknown Errors grep –i –f error.list fail, error, (1124)

– Check for Success grep “ 0 errors” /tmp/dl/*.log | wc –l grep “ dumped” /tmp/dl/*.log grep “ loaded” /tmp/dl/*.log ls –l /tmp/dl/*.d

Page 37: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 37

Post Load Monitoring

Watch for very rapid growth:– May be a rows-per-block assignment error (i.e.

data areas with RPB = 1).

– May be a cluster-size error (i.e. index areas with 512 blocks per cluster).

– May be an unannounced application change.

It is not unusual for there to be somewhat elevated IO for a short period of time (perhaps a week or two).

Page 38: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 38

Tip Sheet!

Don’t rely on announcements to keep users out of your system!

Shut down unnecessary services.– Don’t forget cron jobs!

Disable remote access. Rename the database, the directory, the

server… Take the server off the network. Always perform NON-DESTRUCTIVE Dump

& Load!

Page 39: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 39

Recap

Excellent Reasons to Dump & Load:– Move to a new platform!

– Reconfigure a storage configuration!!

– Take advantage of a hot new db features!!!

– Especially Type 2 Areas!!!!

How To Get There Safely… … and Quickly!

Page 40: DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software tom@wss.com

DB-20: Highly Parallel Dump and Load - 40

Questions

http://www.greenfieldtech.com/downloads.shtml