heaven a hierarchical storage and archive environment for multidimensional array-dbms bernd reiner...

11
HEAVEN A Hierarchical Storage and Archive Environment for Multidimensional Array-DBMS Bernd Reiner [email protected] muenchen.de

Upload: eustacia-baker

Post on 03-Jan-2016

223 views

Category:

Documents


0 download

TRANSCRIPT

HEAVENA Hierarchical Storage and Archive Environment for

Multidimensional Array-DBMS

Bernd Reiner

[email protected]

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 2

Array DBMS

Multidimensional object (MDD)

set of multidimensional tiles

tile = subarray

Access to subsets of MDDsAccess to subsets of MDDs

Multidimensional query language RasQLMultidimensional query language RasQL

Indextiles stored in relational DBMS BLOBS

multidimensional index (R+ tree)

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 3

Motivation

Increasing amount of data (up to Petabyte)

Hard disks too small/expensive to hold hundreds of Terabytes

Typically data stored as files on Hierarchical Storage Management Systems (HSM-System, e.g. Tapes)

DBMS only used for Metadata

With the multidimensional array DBMS RasDaMan only subsets must be transferred instead of whole MDDs

Include archived data in DBMS data accessInclude archived data in DBMS data access

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 4

System Architecture

ClientClient

RasQLRasQL

DBMSOracle / DB2

DBMSOracle / DB2

SQLSQL

DBMS(on HDD)

DBMS(on HDD)

Tertiary Storage ManagerTertiary Storage Manager

RasDaMan ServerRasDaMan Server

File Storage ManagerFile Storage Manager

MultidimensionalArray DBMS

Online Nearline Offline

Offline Storage

HSM

import

exportmigrate

stageCache

HSM

import

exportmigrate

stageCache

Hierarchical Storage

Management System

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 5

Optimization

Minimization of tape access operationsTiling, Object-Framing, Caching

Minimization of media exchange operationsClustering, ordered Query-Queue, “lazy eject”

Minimization of positioning timeClustering, ordered Query-Queue

ParallelizationInter, intra object parallelization

Publications: VLDB 2002, DEXA 2002, DEXA 2003

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 6

Super-TileSuper-Tile

algorithmalgorithm

Export to Tertiary Media

Tile 1Tile 1 Tile 2Tile 2 Tile 3Tile 3 Tile 4Tile 4

ST-1 ST-2 ST-3 ST-4

exportexport

Magnetic TapeMagnetic Tape

One TileOne Tile

Preserves multidim. clusteringon Tape

Preserves multidim. clusteringon Tape

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 7

Import from Tertiary Media

Super-Tiles

Super-Tiles

computecompute

ImportSuper-Tiles

ImportSuper-Tiles

RasDaManviewer

RasDaManviewer

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 8

Object-Framing

Reducing tape access

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 9

Datenimport TCT (Objekt: mpim4d mit 1,35 GByte)

0

20

40

60

80

100

120

140

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

Super-Tile-Nr. (48 MByte)

Zeit

[Sek

unde

n] Positionierung

Lesen der Super-Tilesvon DLT-Magnetband

DLT4000average accesstime 68s

Partitioning of data random

Data Retrieval from HSM

Super-Tile-No. (48 MByte)

Tim

e (s

ec.)

Object: mpim4d (1,35 GByte)

Positioning

Read data

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 10

Datenimport TCT (Objekt: mpim4d mit 1,35 GByte)

0

10

20

30

40

50

60

70

80

90

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

Super-Tile-Nr. (48 MByte)

Zei

t [S

eku

nd

en]

Positionierung

Lesen der Super-Tilesvon DLT-Magnetband

Data Retrieval from HSM

Partitioning of data Super-Tile clustering

Super-Tile-No. (48 MByte)

Tim

e (s

ec.)

Object: mpim4d (1,35 GByte)

Positioning

Read data

DLT4000average accesstime 68s

© 2004 FORWISS / [email protected]

EDBT, March 2004, Slide 11

Clustering vs. Random order

Datenimport (Objekt: mpim4d mit 1,35 GByte)

19,66

43,78

0

5

10

15

20

25

30

35

40

45

50

Zei

t [M

inu

ten

]

Super-Tile clustering

random order

Tim

e (s

ec.)