
Database I/O Mechanisms

Performance and persistence

Richard Banville
Fellow, OpenEdge Development
Progress Software


© 2013 Progress Software Corporation. All rights reserved.

Agenda

1 Database I/O Types
2 User Data I/O
3 Recovery Data I/O
4 Other I/O
5 Summary


File Write I/O for File Types

Logical vs Physical

• Database request vs OS I/O

• Database I/O vs O/S I/O

Physical I/O always uses file system cache (no raw I/O)

Buffered vs unbuffered I/O

• Unbuffered I/O considered durable after write system call

– Recovery data with integrity

– User data with -directio

• Buffered I/O requires a file system sync for durability

– Recovery data with no integrity

– User data
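The buffered/unbuffered distinction above can be illustrated at the operating-system level. The following is a minimal sketch and not OpenEdge's implementation: it assumes a POSIX-like system, uses `O_DSYNC` (falling back to `O_SYNC`) as one way to make a write durable by the time the write call returns, and uses an explicit `os.fsync()` for the buffered case. The function names and file names are hypothetical.

```python
import os
import tempfile


def write_buffered_then_sync(path: str, data: bytes) -> None:
    """Buffered write: the data sits in the file system cache until an
    explicit sync call makes it durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)   # returns once the data is in the F/S cache
        os.fsync(fd)         # durability point (fdatasync() is the data-only POSIX variant)
    finally:
        os.close(fd)


def write_synchronous(path: str, data: bytes) -> None:
    """Synchronous write: the write call itself is the durability point."""
    # Assumption: O_DSYNC or O_SYNC exists on this platform. Where neither
    # does, the flag is 0 and this degrades to a plain buffered write.
    sync_flag = getattr(os, "O_DSYNC", getattr(os, "O_SYNC", 0))
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag, 0o600)
    try:
        os.write(fd, data)   # does not return until the data reaches the device
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_buffered_then_sync(os.path.join(tmp, "buffered.bin"), b"x" * 8192)
        write_synchronous(os.path.join(tmp, "synced.bin"), b"x" * 8192)
```

In both cases the write still passes through the file system cache; the difference is only when the caller may treat the data as durable.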


OpenEdge I/O & The File System

[Diagram: database buffer pool, BI buffers, and AI buffers (process shared memory) → file system cache (system memory) → database files on physical disk devices; multi-level caches; all I/O goes via the F/S cache]


OpenEdge Data I/O & The File System

[Diagram: database buffer pool (process shared memory) → file system cache (system memory) → .d files on disk devices; multi-level caches]

Buffered I/O to F/S cache

• F/S decides when to write to disk device
• Disk device decides when to write to physical disk
• At checkpoint, made durable via fdatasync() / FlushFileBuffers()
– Required for crash recovery and BI space reuse to work properly

Promon Checkpoints:

Flushes  Duration  Sync Time
0        0.20      0.02
4        0.20      0.04
4        0.17      0.02
2        0.22      0.03


OpenEdge -directio I/O & The File System

[Diagram: database buffer pool (process shared memory) → file system cache (system memory) → .d files on disk devices; multi-level caches]

-directio

• Unbuffered I/O through F/S cache
– Not raw I/O to disk device
• Each I/O synced to disk device
• Operational effects
– No need to sync at checkpoint
– Write I/O more expensive
– Additional cost to page writers

Promon Checkpoints:

Flushes  Duration  Sync Time
0        0.16      0.00
2        0.18      0.00
11       0.16      0.00
0        0.18      0.00


OpenEdge -directio I/O Performance

[Diagram: database buffer pool (process shared memory) → file system cache (system memory) → .d files on disk devices; multi-level caches]

How could the more expensive writes of -directio improve performance?

• APWs absorb the additional cost
• If they do all the writing without adding OLTP contention
• Lower checkpoint costs
– Each I/O synced to disk device
– No sync needed during checkpoint
– Higher throughput due to less pause
• May help on inadequate file system
• Less useful for
– Well tuned deployments
– Properly sized systems
– When buffers flushed at checkpoint


OpenEdge Recovery I/O & The File System

[Diagram: BI buffers (process shared memory) → file system cache (system memory) → files on disk devices (labels: .d.b); multi-level caches]

Unbuffered I/O to F/S cache

• Each I/O synced to disk device
• For .bi, called “reliable I/O”

BI blocks written when:

• BIW notices full block in output buffer
• APW writes data block with BI dependency
• Broker notices aged commit (-Mf)
• User can't find empty BI block to store update notes
• User must perform checkpoint


OpenEdge Recovery: Making it unreliable

Never in production

Specific maintenance only

-r: BI writes are buffered (unreliable) to F/S

All change notes recorded

• Rollback will work
• Crash recovery likely to work
• Recovery from OS crash will most likely fail
• idxbuild some index, !“some; !”

[Diagram: BI buffers (process shared memory) → file system cache (system memory) → files on disk devices (labels: .d.b); multi-level caches]

*** An earlier -r session crashed, the database may be damaged. (514)


OpenEdge Recovery: Making it more unreliable

Never ever in production

Specific maintenance only

-i: no-integrity

• BI writes are buffered
• No data dependency check (!WAL)
• No F/S sync at checkpoint
• No record of purely physical notes
• Rollback might work
• OS, DB crash, abnormal termination
– Must restore from backup

[Diagram: BI buffers (process shared memory) → file system cache (system memory) → files on disk devices (labels: .d.b); multi-level caches]

** Your database cannot be repaired. You must restore a backup copy. (510)


Buffer Pool I/O

[Diagram: database buffer lookup in the buffer pool (-B, -B2): 1 buffer pool cache, 1 buffer pool hash table, and multiple LRU replacement chains (blocks 4 160 32 128 64 … and 2 144 192 112 80 …) under the LRU and LRU2 buffer eviction policies]

If not found via hash table lookup

• Incur O/S read I/O – “page-in”
• But where do you read into?


Buffer Pool I/O

[Diagram: database buffer lookup in the buffer pool (-B, -B2); replacement chains of buffers labeled C and D under the LRU and LRU2 buffer eviction policies]

Start at LRU end of buffer replacement chain

• Look for first “non-dirty” buffer (to avoid write)

• Can’t find one after 10 tries?

– “Page-out” least recently used buffer (O/S write I/O) “LRU writes”

– May force (multiple) BI/AI writes, usually partial writes!

– “Page-in” your block to available buffer (O/S read I/O)
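The replacement search described above can be sketched as follows. This is an illustration only, not the actual OpenEdge code: the replacement chain is modeled as an ordered mapping from block number to a dirty flag (LRU end first), and the names `choose_victim` and `MAX_TRIES` are hypothetical.

```python
from collections import OrderedDict
from typing import Tuple

MAX_TRIES = 10  # the "10 tries" mentioned on the slide


def choose_victim(chain: "OrderedDict[int, bool]",
                  max_tries: int = MAX_TRIES) -> Tuple[int, bool]:
    """Pick the buffer to reuse for a page-in.

    chain maps block number -> dirty flag, ordered from the LRU end to
    the MRU end. Returns (block_number, needs_write).
    """
    if not chain:
        raise ValueError("replacement chain is empty")
    for tries, (block, dirty) in enumerate(chain.items()):
        if tries >= max_tries:
            break                    # gave up looking for a clean buffer
        if not dirty:
            return block, False      # clean buffer: reuse without a write
    lru_block = next(iter(chain))
    return lru_block, True           # "LRU write": page out the LRU buffer first


if __name__ == "__main__":
    chain = OrderedDict([(4, True), (160, True), (32, False), (128, True)])
    print(choose_victim(chain))      # (32, False)
```

When `needs_write` is true, the caller pays an O/S write (and possibly BI/AI writes) before the O/S read, which is why the slide treats LRU writes as something to avoid.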

[Diagram labels: 1 buffer pool cache; 1 buffer pool hash table; multiple LRU replacement chains]


Data Read I/O Tuning

Avoiding read I/O

• Large buffer pool (-B)

• Utilize alternate buffer pool (-B2)

• Improve queries; Avoid table scans; Cache data locally

• Private “read-only” buffers (-Bp), for utilities too!

Increase pool when read I/O unacceptable for properly tuned application

Too many buffers may cause O/S paging

• Decrease file system cache

• Avoid non-essential activities on production server

• Consider buying more memory

[Diagram: database buffer pool (-B & -B2 buffers) with I/O to the DB]

Increase performance by decreasing I/O


Data I/O Performance Monitoring - Promon

Promon R&D => Performance indicators

Promon R&D => Buffer cache

• O/S reads and O/S writes
• Flushed at checkpoint
• LRU Writes
• APW enqueues*

What about buffer pool hit ratio % (BHR)?

• Too easily skewed by bad queries

• Not a fine enough metric (hits / requests)

– 270,000 database read requests / second

– Buffer hit ratio % of 98

– Still means 5,400 O/S Read I/Os per second!

– Fast F/S access still 75x slower than -B
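The arithmetic behind the 5,400 figure is worth making explicit: a 98% hit ratio leaves 2% of requests as misses, and each miss is an O/S read. The small sketch below reproduces it; the function name is hypothetical, and integer arithmetic is used to avoid floating-point rounding.

```python
def os_reads_per_second(requests_per_second: int, hit_ratio_percent: int) -> int:
    """O/S reads implied by a buffer hit ratio, given in whole percent."""
    miss_percent = 100 - hit_ratio_percent
    return requests_per_second * miss_percent // 100


if __name__ == "__main__":
    print(os_reads_per_second(270_000, 98))  # 5400
    print(os_reads_per_second(270_000, 99))  # 2700
```

Moving from 98% to 99% looks like a one-point change in BHR but halves the physical reads, which is why the slide calls the ratio too coarse a metric.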

[Diagram: database buffer pool (-B & -B2 buffers) with I/O to the DB]

A low BHR indicates a poorly tuned system

A high BHR does not denote a well tuned system


Data Write I/O Tuning

Avoiding write I/O

• Large buffer pool lessens forced “page-outs”

• Improve queries in the application

• Reduce checkpoint frequency (see next section)

• Run with APWs (Have someone else do it!)

– Avoids user and server writes

– Decreases LRU writes (forced “page-outs”)

– Reduces checkpoint time

– Performs DB buffer pool I/O

– May flush AI and BI data

[Diagram: database buffer pool (-B & -B2 buffers) with I/O to the DB]

Increase performance by decreasing I/O


Asynchronous Page Writer Activities

[Diagram: primary -B buffer pool and alternate -B2 buffer pool with LRU chains (buffers labeled C and D), the checkpoint queue, and the APW queue (buffers labeled D); the APW writes buffers to the DB, with the BI under WAL; steps numbered #1 to #3]

• Forced BI write only if cluster > 95% full
• New adaptive mechanism for checkpoint processing
– Avoids buffers flushed
– 10.2B FCS


Asynchronous Page Writer Performance

[Diagram: primary -B buffer pool and alternate -B2 buffer pool with LRU chains (buffers labeled R and U), the checkpoint queue, and the APW queue (buffers labeled U); the APW writes buffers to the DB, with the BI under WAL]

Promon R&D => Page Writers

• APW queue writes

• Checkpoint queue writes

• Buffers scanned

• Scan writes

Tuning

• Increase until 0 blocks flushed at checkpoint

• Decrease if partial BI writes increase

• Increasing BI cluster size can avoid:

– partial BI writes

– forcing BI writes (95% full less of the time)

• Typically need more if running with Direct I/O


BI Buffer Pool

[Diagram: BI buffer pool with -bibufs 10: a free list (Free(a) to Free(e)), a modified queue (32, 31, 30, 29), and the current output buffer, which receives new notes (actions) during forward processing and is written to the BI; rollback processing uses the current input buffer (15) and backout buffers (9, 12)]


BI Buffer Pool – Recording a change

[Diagram: BI buffer pool (-bibufs 10) with free list, modified queue, and current output buffer; users add new notes (actions) during forward processing and the BIW writes to the BI]

Empty buffer waits

Busy buffer waits

BIB latch contention

• -bwdelay in ms (30ms)
• Nap time when nothing dirty
• Not much positive tuning effect


BI Buffer Pool – Forced Write I/O

[Diagram: BI buffer pool (-bibufs 10) with free list, modified queue, and current output buffer receiving new notes (actions) from a user during forward processing; data blocks in the buffer pool carry an associated BI note dependency counter (based on fill %); the APW writes data blocks from the checkpoint queue to the DB under WAL]


BI Buffer Pool – Write I/O

[Diagram: BI buffer pool (-bibufs 10) with free list, modified queue, and current output buffer receiving new notes (actions) during forward processing; broker and user write to the BI]

Is it OK to buffer modified BI blocks?

YES

Is it OK to buffer committed BI data?

Delayed commit (-Mf) is up to you!

Delayed commit (Durability)

Based on -Mf value, Broker may flush BI buffers to disk for aged txn ends

-Mf default 3

Increasing -Mf Pros/Cons:


BI Buffer Pool – Change rollback

[Diagram: rollback processing in the BI buffer pool (-bibufs 10): current input buffer (15) and backout buffers (9, 12) over the BI, alongside the free list, modified queue (32, 31, 30, 29), and current output buffer used for forward processing]

1 shared input buffer

Multiple private backout buffers


BI Buffer Pool – Change rollback

[Diagram: modified queue (32, 31, 30, 29), current input buffer (15), backout buffers (9, 12), and current output buffer over the BI]

– Read I/O to find notes

– Write I/O when undoing

Promon:

• BI Reads

• Input buffer hits

• Output buffer hits

• Mod buffer hits

• BO buffer hits


Tuning the BI Buffer Pool

[Diagram: BI buffer pool (-bibufs 10) with free list, modified queue, and current output buffer; user and BIW]

Run BIW

Promon: 5. BI Log Activity

Empty buffer waits – all full

• Increase -bibufs (online)
• -aibufs >= -bibufs
• Start with -bibufs 150

Partial (forced) writes

• -Mf expired
– Increase if not risk averse
• Too many APWs
• Tune checkpoint processing

Busy buffer waits – busy - OK

Log force waits/write – 2PC commit


Monitoring BI Activity & Performance Summary

Activity

Forward Activity
• Total BI writes
• Records (notes) written
• Clusters closed

Undo
• Total BI reads
• Notes read
• Input buffer hits
• Output buffer hits
• Mod buffer hits
• BO buffer hits

Performance

OK Waits & Writes
• Busy buffer waits
• BIW writes

Bad Waits & Writes
• Empty buffer waits
• Partial writes
• Forced writes (2PC)
• Flushed at checkpoint
• Checkpoint duration (wait)


Checkpoint Processing

• Quiet DB
– Database changes halted
– Page writers continue
• Flush bibufs
– Output, Mod buffers
– May cause 1 partial write
• Scan buffer pool
– Write bufs on chkpt queue
– Dirty bufs added to chkpt queue
– “Fuzzy” checkpoint
– Hopefully flushed prior to next chkpt
• Flush aibufs
– Output, Mod buffers
– May cause 1 partial write
• Sync File System
– F/S Sync system call
– No more sync delay
• Resume database activity


Promon Checkpoint Data

                  --------- Database Writes ---------
No.  Time      …  CPT Q   Scan   APW Q   Flushes   (Cont.)
27   10:23:12  …      0    384      52         0   …
26   10:22:46  …      0    381     381         3   …
25   10:22:18  …      0    380     380         2   …
24   10:21:50  …    201    158     158         0   …

APW Specific Activity…

CPT Q: # data buffers APW wrote from checkpoint queue (from prev chkpt)

Scan: # data buffers APW wrote while scanning -B

APW Q: # data buffers APW wrote from APW Q

Dirty buffers added to APWQ from -B LRU eviction


Flushes:

• Number of database blocks written during checkpoint

– Very costly operation (db updates paused)

– Should add ai/bi flushes

• Marked from previous checkpoint

• Avoid with APWs and larger cluster sizes
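To make the Flushes column concrete, the sketch below scans the promon sample rows from the slides and reports the checkpoints that had to flush buffers. The row values are copied from the sample; the elided columns are left out, and the `Checkpoint` type and function name are hypothetical helpers, not part of promon.

```python
from typing import List, NamedTuple


class Checkpoint(NamedTuple):
    number: int
    time: str
    cpt_q: int     # buffers the APW wrote from the checkpoint queue
    scan: int      # buffers the APW wrote while scanning -B
    apw_q: int     # buffers the APW wrote from the APW queue
    flushes: int   # buffers written during the checkpoint itself


# Rows copied from the promon sample on the slides.
SAMPLE = [
    Checkpoint(27, "10:23:12", 0, 384, 52, 0),
    Checkpoint(26, "10:22:46", 0, 381, 381, 3),
    Checkpoint(25, "10:22:18", 0, 380, 380, 2),
    Checkpoint(24, "10:21:50", 201, 158, 158, 0),
]


def checkpoints_with_flushes(rows: List[Checkpoint]) -> List[int]:
    """Return the checkpoint numbers where buffers were flushed."""
    return [cp.number for cp in rows if cp.flushes > 0]


if __name__ == "__main__":
    print(checkpoints_with_flushes(SAMPLE))  # [26, 25]
```

Any non-empty result is a candidate for the tuning advice above: more APW capacity or a larger BI cluster size.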


Promon Checkpoint Data

                  ----- New Columns -----
No.  Time      …  Duration   Sync Time
27   10:23:12  …  0.12       0.04
26   10:22:46  …  0.11       0.03
25   10:22:18  …  0.11       0.04
24   10:21:50  …  0.13       0.04

Duration:

• Time to process checkpoint including:

– Write chkpt queue, buffer pool scan, bi/ai flush, F/S Sync

Sync Time: Amount of time in seconds it took for fdatasync() or FlushFileBuffers()

• Limit file system cache size and flush frequency

• Faster disks for data files

• Avoid with -directio (but increases all write I/Os)

[Diagram: file system cache between the database and the DB files on disk]
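Since Duration includes the F/S sync, one quick way to read the two columns together is to compute the share of each checkpoint spent in the sync call. The sketch below does this for the sample rows on the slide; the function name is hypothetical, and the interpretation (sync share as a tuning signal) is a suggestion rather than something promon reports.

```python
from typing import List, Tuple


def sync_share(duration_s: float, sync_s: float) -> float:
    """Fraction of the checkpoint duration spent in the F/S sync call."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return sync_s / duration_s


# (checkpoint number, Duration, Sync Time) copied from the slide sample.
SYNC_SAMPLE: List[Tuple[int, float, float]] = [
    (27, 0.12, 0.04),
    (26, 0.11, 0.03),
    (25, 0.11, 0.04),
    (24, 0.13, 0.04),
]

if __name__ == "__main__":
    for number, duration, sync in SYNC_SAMPLE:
        print(f"checkpoint {number}: {sync_share(duration, sync):.0%} of duration in sync")
```

A consistently high share points at the file system cache and the data-file disks; with -directio the Sync Time column drops to zero, at the price of more expensive individual writes.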


Tuning Checkpoint Processing

Physical

BI truncate
• Values in KB
• -bi (cluster size in KB)
• -biblocksize (size in KB)

Before-image block size set to 8 or 16 KB
• Followed by sync command

Runtime
• BI bufs
• BIW

proutil <db> -C truncate bi -biblocksize 8 -bi 8192

proutil <db> -C bigrow 8 -r
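As a quick check on the values in the truncate command above, both -bi and -biblocksize are given in KB, so the number of BI blocks per cluster is a simple division. The helper below is a hypothetical illustration of that arithmetic only; it does not call proutil.

```python
def bi_blocks_per_cluster(cluster_kb: int, block_kb: int) -> int:
    """Number of whole BI blocks that fit in one BI cluster."""
    if cluster_kb <= 0 or block_kb <= 0:
        raise ValueError("sizes must be positive")
    return cluster_kb // block_kb


if __name__ == "__main__":
    # -bi 8192 with -biblocksize 8
    print(bi_blocks_per_cluster(8192, 8))   # 1024
    # -bi 8192 with -biblocksize 16
    print(bi_blocks_per_cluster(8192, 16))  # 512
```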


Summary: Recovery Subsystem

AI/BI buffers

• No LRU replacement mechanism

• Database changes recorded in order

• Forward processing causes BI write I/O

• Rollback may cause read I/O

– Backout Buffers (BOB) help rollback contention

Checkpoints

• Buffers flushed during checkpoint

Page writers

• BIW/AIW processing

• APW processing


Database Extend

Maintenance cost

Performance

Concurrency

Frequency

Database extend

• Storage area locked - no other extends

• Writes performed 16K at a time

• Extend by 64 blocks or cluster size
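The cost of one extend can be estimated from the two figures above. The sketch below covers only the 64-block case and assumes an 8 KB database block size, which the slide does not state; the function name is hypothetical.

```python
def writes_per_extend(blocks: int, block_size_kb: int, write_size_kb: int = 16) -> int:
    """Number of 16K writes needed to extend by `blocks` database blocks."""
    if blocks <= 0 or block_size_kb <= 0 or write_size_kb <= 0:
        raise ValueError("arguments must be positive")
    total_kb = blocks * block_size_kb
    return -(-total_kb // write_size_kb)  # ceiling division


if __name__ == "__main__":
    # Assumption: 8 KB database blocks, 64-block extend.
    print(writes_per_extend(64, 8))  # 32
    # Same extend with 4 KB blocks.
    print(writes_per_extend(64, 4))  # 16
```

Because the storage area is locked for the whole extend, those writes are serialized, which is why extend frequency matters for concurrency.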

Recovery extend (AI/BI)
• Acquire space from F/S
• Unbuffered write
• BI grow after truncate

Performance improvements
• F/S interaction for extent create in 11.3
• BI extend, format & grow in 11.3


Monitoring I/O With Promon R&D

2. Activity Displays ...
   1. Summary
   3. Buffer Cache
   4. Page Writers
   5. BI Log / 6. AI Log
   8. I/O Operations by Type
   9. I/O Operations by File

Database Accesses vs File I/O
• Database writes
• O/S Writes

3. Other Displays…
   1. Performance Indicators
   2. I/O Operations by Process
   4. Checkpoints
   5. I/O Operations by User by Table
   6. I/O Operations by User by Index


Summary

I/O Types
• Always uses file system cache (no raw I/O)
• Buffered vs unbuffered I/O
• User data files: .d’s and recovery files: .ai, .bi, .tl

Data and recovery I/O
• Checkpoint process
• Page writers (APW, BIW, AIW)

Performance
• Monitor via promon, VSTs and OS tools
• Tuning tips


Questions?


October 6–9, 2013 • Boston #PRGS13

www.progress.com/exchange-pug

Special low rate of $495 for PUG Challenge attendees with the code PUGAM

And visit the Progress booth to learn more about the Progress App Dev Challenge!
