1 Tough Choices Materialize nothing. Compute every cell on demand. Worst query response time. No space requirements. Materialize part of the data cube. Many cells are computable from other cells. But which cells to materialize? More cells = better query performance. Materialize the entire data cube. Best query response time. Excessive space requirements.

Upload: taliyah-worstell

Post on 16-Dec-2015


Page 1

Tough Choices

Materialize nothing.
Compute every cell on demand.
Worst query response time.
No space requirements.

Materialize part of the data cube.
Many cells are computable from other cells.
But which cells to materialize?
More cells = better query performance.

Materialize the entire data cube.
Best query response time.
Excessive space requirements.

Page 2

Data Value Hypercube versus ordinary data cubes

DATA VALUE HYPERCUBES store data-record indices, whereas existing data cubes can only store data aggregates.

DATA VALUE HYPERCUBES are generated as quickly as existing data cubes.

Page 3

Remember this?

[Figure: diagram contrasting STRUCTURED DATA and UNSTRUCTURED DATA, marked +80% and -80%. Labels: OLTP, OLAP, Email, Multi-Dimensional Databases, XML, EDI, Spreadsheets, Web Pages, RSS, Web Log, Voice recognition, Instant Messaging, Wikis, Content Management, Document Management, Taxonomies, Ontologies, Multimedia, Legacy Databases, Relational Databases, Main Frame Databases.]

Now it doesn’t matter.

Page 4

Hypercubes are constructed so that each cell corresponds to a unique combination of database attribute values.

3 attributes require at least 2^3 = 8 cells: one for each subset of the attributes, including the empty subset.
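One way to read the count of 8 is as the number of subsets of three attributes. The sketch below enumerates them for the Customer, Part, and Supplier attributes used later in the deck. The function name `all_views` is an assumption made for illustration, not something the slides define.

```python
from itertools import combinations

ATTRIBUTES = ("Customer", "Part", "Supplier")


def all_views(attributes):
    """Return every subset of the attributes, from the empty one to the full one."""
    return [
        subset
        for size in range(len(attributes) + 1)
        for subset in combinations(attributes, size)
    ]


views = all_views(ATTRIBUTES)
for view in views:
    # The empty subset is the view the slides label "None".
    print("".join(view) or "None")
print(len(views))
```

With d attributes the same enumeration yields 2^d subsets, which is why the count grows quickly as dimensions are added.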

Hypercube

Page 5

Page 6

[Figure: lattice of the eight views: CustomerPartSupplier, CustomerPart, CustomerSupplier, PartSupplier, Customer, Part, Supplier, None.]

Page 7

[Figure: the eight views (CustomerPartSupplier, CustomerPart, CustomerSupplier, PartSupplier, Customer, Part, Supplier, None) populated with attribute values. Suppliers: Boeing, Lockheed. Parts: Cockpit, Jet Engine, Wing. Customers: Delta, FedEx.]

Page 8

[Figure: the eight populated views from page 7, numbered 1 to 8.]

3 attributes require at least 8 cells.

Page 9

CustomerPartSupplier

Supplier   Part        Customer  Sales
Boeing     Cockpit     Delta     $10
Boeing     Cockpit     FedEx     $20
Lockheed   Cockpit     Delta     $30
Lockheed   Cockpit     FedEx     $40
Boeing     Jet Engine  Delta     $50
Boeing     Jet Engine  FedEx     $60
Lockheed   Jet Engine  Delta     $70
Lockheed   Jet Engine  FedEx     $80
Boeing     Wing        Delta     $90
Boeing     Wing        FedEx     $100
Lockheed   Wing        Delta     $110
Lockheed   Wing        FedEx     $120

PartSupplier

Supplier   Part        Sales
Boeing     Cockpit     $30
Boeing     Jet Engine  $110
Boeing     Wing        $190
Lockheed   Cockpit     $70
Lockheed   Jet Engine  $150
Lockheed   Wing        $230

CustomerPart

Part        Customer  Sales
Cockpit     Delta     $40
Cockpit     FedEx     $60
Jet Engine  Delta     $120
Jet Engine  FedEx     $140
Wing        Delta     $200
Wing        FedEx     $220

CustomerSupplier

Supplier   Customer  Sales
Boeing     Delta     $150
Boeing     FedEx     $180
Lockheed   Delta     $210
Lockheed   FedEx     $240

Part

Part        Sales
Cockpit     $100
Jet Engine  $260
Wing        $420

Supplier

Supplier   Sales
Boeing     $330
Lockheed   $450

Customer

Customer  Sales
Delta     $360
FedEx     $420

All

Sales
$780

This is entirely fictional data.

Page 10

Lattice Notation

A lattice is denoted as (L, <=).
L = the set of elements (queries).
<= is the dependence relation.

ancestor(a) = {b | a <= b}.
descendant(a) = {b | b <= a}.
Every element is its own descendant and ancestor.
next(a) = the immediate proper ancestors of a.
next(a) = {b | a < b, and there exists no c such that a < c and c < b}.
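The lattice definitions can be tried out on the three-dimension example. This is a minimal sketch that assumes each query is identified by the set of dimensions it groups by, and that the dependence relation is modeled as a subset test. The names `leq` and `next_` are chosen here for illustration.

```python
from itertools import combinations

DIMS = ("customer", "part", "supplier")

# L: every query, identified by the set of dimensions it groups by.
L = [frozenset(c) for n in range(len(DIMS) + 1) for c in combinations(DIMS, n)]


def leq(a, b):
    """a <= b: assumed to hold when a's dimensions are a subset of b's."""
    return a <= b


def ancestor(a):
    return {b for b in L if leq(a, b)}


def descendant(a):
    return {b for b in L if leq(b, a)}


def next_(a):
    """Immediate proper ancestors: b with a < b and no c strictly between them."""
    proper = {b for b in L if a < b}
    return {b for b in proper if not any(a < c < b for c in L)}


part = frozenset({"part"})
print(sorted(sorted(q) for q in ancestor(part)))
print(sorted(sorted(q) for q in descendant(part)))
print(sorted(sorted(q) for q in next_(part)))
```

Under this model, next((part)) contains (part, customer) and (part, supplier) but not (part, supplier, customer), because the two-dimension queries lie strictly between them.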

Page 11

Lattice Diagrams

Lattice diagrams are graphs.

Elements are nodes.

There is an edge from a to b iff b is in next(a).

There is a path downward from y to x iff x <= y.

Page 12

Hypercube Algebra

Simple data warehouse example.
Parts are purchased from suppliers and then sold to customers.
Three dimensions: Part, Supplier, and Customer.
The measure of interest is total sales.
For each cell (p, s, c), store the total sales of part p that was bought from supplier s and sold to customer c.
Users are interested in consolidated sales.
Example: what is the total sales of a given part p to a given customer c?
This query is answered by looking up the value in cube cell (p, ALL, c).

CustomerPart

Part        Customer  Sales
Cockpit     Delta     $40
Cockpit     FedEx     $60
Jet Engine  Delta     $120
Jet Engine  FedEx     $140
Wing        Delta     $200
Wing        FedEx     $220

Many cells are computable from other cells; these are dependent cells.
Example: cell (p, ALL, c) is the sum of cells (p, s1, c), …, (p, sn, c).
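The lookup of cell (p, ALL, c) can be sketched as a sum over base cells. The example below rebuilds the fictional sales rows from page 9 and computes a cell on demand. The `cell` helper and the `ALL` marker string are assumptions made for this illustration.

```python
from itertools import product

PARTS = ("Cockpit", "Jet Engine", "Wing")
SUPPLIERS = ("Boeing", "Lockheed")
CUSTOMERS = ("Delta", "FedEx")

# Base cells (part, supplier, customer, sales): $10, $20, ..., $120 as on page 9.
FACTS = [
    (p, s, c, 10 * (i + 1))
    for i, (p, s, c) in enumerate(product(PARTS, SUPPLIERS, CUSTOMERS))
]

ALL = "ALL"


def cell(part, supplier, customer):
    """Sum the base cells matching the coordinates; ALL matches any value."""
    return sum(
        sales
        for p, s, c, sales in FACTS
        if part in (ALL, p) and supplier in (ALL, s) and customer in (ALL, c)
    )


print(cell("Cockpit", ALL, "Delta"))  # Boeing $10 + Lockheed $30
print(cell(ALL, ALL, ALL))
```

This is the "materialize nothing" end of the trade-off: every answer is recomputed from the base cells, so no extra space is needed but each query scans the facts.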

Page 13

The Dependence Relation on Queries

Consider two queries Q1 and Q2.

Q1 <= Q2 iff Q1 can be answered using only the results of Q2.

We then say that Q1 is dependent on Q2.

For example, the query (part), can be answered using only the query (part, customer).

(part) <= (part, customer).

Some queries are not comparable with each other using the <= operator.

For example, (part) !<= (customer) and (customer) !<= (part).
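As a worked instance of (part) <= (part, customer), the sketch below answers the (part) query using only the CustomerPart view from the slides. It assumes dependence can be tested as a subset check on dimension names. `depends_on` and `roll_up` are hypothetical helper names.

```python
from collections import defaultdict

# The CustomerPart view, keyed by (part, customer).
PART_CUSTOMER = {
    ("Cockpit", "Delta"): 40,
    ("Cockpit", "FedEx"): 60,
    ("Jet Engine", "Delta"): 120,
    ("Jet Engine", "FedEx"): 140,
    ("Wing", "Delta"): 200,
    ("Wing", "FedEx"): 220,
}


def depends_on(q1, q2):
    """q1 <= q2, modeled as: every dimension of q1 also appears in q2."""
    return set(q1) <= set(q2)


def roll_up(view, view_dims, target_dims):
    """Answer the target query using only the given view."""
    if not depends_on(target_dims, view_dims):
        raise ValueError(f"{target_dims} is not answerable from {view_dims}")
    keep = [view_dims.index(d) for d in target_dims]
    totals = defaultdict(int)
    for key, sales in view.items():
        totals[tuple(key[i] for i in keep)] += sales
    return dict(totals)


print(roll_up(PART_CUSTOMER, ("part", "customer"), ("part",)))
print(depends_on(("part",), ("customer",)))
```

The (part) totals come out as $100, $260, and $420, matching the Part view on page 9, while (part) and (customer) fail the check in both directions.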

CustomerPart

Part        Customer  Sales
Cockpit     Delta     $40
Cockpit     FedEx     $60
Jet Engine  Delta     $120
Jet Engine  FedEx     $140
Wing        Delta     $200
Wing        FedEx     $220

Page 14

B-TREE LOGIC: EASIER THAN IT LOOKS

A C E G I K M O Q S U W Y Z

B F J N R V X

D L T

H P

1 3 5 7 9 11 13 15 17 19 21 23 25 26

2 6 10 14 18 22 24

4 12 20

8 16

Page 15

B-TREE LOGIC: B IS FOR BALANCED

GIVEN 3RD ORDER B TREE WITH THE NUMBERS: 100 20 50 80 99

[Figure: three rebalanced trees, one per insertion. INSERT 9 gives root 20, INSERT 49 gives root 49, INSERT 51 gives root 50. The node layout below each root did not survive extraction.]

Insert any number < 20 and 20 becomes the root.

Insert any number > 50 and 50 becomes the root.

Insert any number > 20 and < 50 and it becomes the root.

Page 16

B-Tree Forest

Construction time for the tree forest is O( 1 ≤ i ≤ d (log ni)), where d is the number of query dimensions and ni is the number of attributes in the database at level d.

Page 17

B-Tree Forest

A Balanced B-Tree Forest is the data structure that is used to represent a Hypercube.

Each dimension in the Hypercube is represented by a separate B-Tree.

B-Trees are great for storing sparse data and have fast insertion and search characteristics: O(log n) per operation, so O(n log n) to insert n keys.

Page 18

B-Tree Forest

A binary tree forest consists of multiple levels of binary trees.

Each level represents a cube dimension.

A binary tree consists of nodes: stems or leaves.

Stem nodes point to left and right binary trees.

Leaf nodes point to a linked list of fact table IDs.

A linked list of fact table IDs points to fact table entries with identical attribute values.

A depth first search on a binary tree forest results in a GROUP BY clause.
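The forest description above can be sketched with a small nested binary search tree. This is an illustration under stated assumptions, not the author's implementation: the trees are not rebalanced, a Python list stands in for the linked list of fact table IDs, and the names `Node`, `insert`, and `group_by` are invented here.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None    # smaller keys at the same level
    right: Optional["Node"] = None   # larger keys at the same level
    child: Optional["Node"] = None   # root of the next level's tree
    fact_ids: list = field(default_factory=list)  # filled at the last level only


def insert(root, keys, fact_id):
    """Insert one fact row; keys holds one value per forest level."""
    key, rest = keys[0], keys[1:]
    if root is None:
        root = Node(key)
    if key < root.key:
        root.left = insert(root.left, keys, fact_id)
    elif key > root.key:
        root.right = insert(root.right, keys, fact_id)
    elif rest:
        root.child = insert(root.child, rest, fact_id)
    else:
        root.fact_ids.append(fact_id)
    return root


def group_by(root, prefix=()):
    """Depth-first, in-order walk that yields (group key, fact IDs) pairs."""
    if root is None:
        return
    yield from group_by(root.left, prefix)
    here = prefix + (root.key,)
    if root.child is not None:
        yield from group_by(root.child, here)
    else:
        yield here, root.fact_ids
    yield from group_by(root.right, prefix)


# Fictional facts from the slides: (part, supplier, customer, sales).
facts = [
    (p, s, c, 10 * (i + 1))
    for i, (p, s, c) in enumerate(
        product(("Cockpit", "Jet Engine", "Wing"), ("Boeing", "Lockheed"), ("Delta", "FedEx"))
    )
]

forest = None
for fact_id, (part, supplier, customer, sales) in enumerate(facts):
    forest = insert(forest, (part, customer), fact_id)  # levels: Part, then Customer

groups = list(group_by(forest))
totals = {key: sum(facts[i][3] for i in ids) for key, ids in groups}
print(totals)
```

The walk plays the role of GROUP BY Part, Customer: each group arrives in sorted key order together with the fact table IDs behind it, so the aggregate and the underlying records are both available.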

Page 19

B-Tree Forest in Reverse: A primer

[Figure: the sales views from page 9, shown with one tree per dimension.]

Supplier Tree: Boeing, Lockheed
Parts Tree: Cockpit, Jet Engine, Wing
Customer Tree: Delta, FedEx

Page 20

Extensive B-Trees Are Common

[Figure: three larger trees, one per dimension.]

Suppliers: BOEING, GENERAL DYNAMICS, LOCKHEED MARTIN, HONEYWELL INT’L, NORTHROP GRUMMAN, UNITED TECHNOLOGIES
Parts: AVIONICS, ELEVATOR, JET ENGINE, AILERON, FLIGHT CONTROLS, STABILIZER, COCKPIT, FIN, FUSELAGE, RUDDER, WING, LANDING GEAR
Customers: SOUTHWEST, DHL, DELTA, VIRGIN, FED EX

But let’s keep it simple for now.

Page 21

Incoming Data Stream

[Figure: DATA FLOW. The CustomerPartSupplier Sales rows from page 9 arrive in 2 intervals of data flow: Chunk 1 carries the rows with sales $10 to $60 and Chunk 2 carries the rows with sales $70 to $120. The other sales views from page 9 are shown alongside.]

Page 22

Setting up Fact & Dimension Tables

[Figure: Chunk 1 and Chunk 2 of the CustomerPartSupplier Sales data flow feed the tables below.]

Global String Table (UNSORTED)

String      ID
Boeing      0
Lockheed    1
Cockpit     2
Jet Engine  3
Wing        4
Delta       5
FedEx       6

Supplier Dimension Table (SORTED)

String      String ID  ID
Boeing      0          0
Lockheed    1          1

Part Dimension Table (SORTED)

String      String ID  ID
Cockpit     2          0
Jet Engine  3          1
Wing        4          2

Customer Dimension Table (SORTED)

String      String ID  ID
Delta       5          0
FedEx       6          1

Fact Table

ID  Supplier  Part  Customer  Sales
0   0         0     0         $10
1   0         0     1         $20
2   1         0     0         $30
3   1         0     1         $40
4   0         1     0         $50
5   0         1     1         $60
6   1         1     0         $70
7   1         1     1         $80
8   0         2     0         $90
9   0         2     1         $100
10  1         2     0         $110
11  1         2     1         $120
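The table setup on this page amounts to dictionary encoding. The sketch below rebuilds the three kinds of table from the raw rows. It assumes the global string table assigns IDs in first-seen order, one column at a time, which reproduces the IDs on the slide; the deck does not state how the IDs were actually assigned.

```python
from itertools import product

DIMENSIONS = ("Supplier", "Part", "Customer")

# Raw rows (supplier, part, customer, sales) in arrival order, sales $10..$120.
ROWS = [
    (s, p, c, 10 * (i + 1))
    for i, (p, s, c) in enumerate(
        product(("Cockpit", "Jet Engine", "Wing"), ("Boeing", "Lockheed"), ("Delta", "FedEx"))
    )
]

# Global string table (unsorted): string -> ID, in first-seen order per column.
string_table = {}
for col in range(len(DIMENSIONS)):
    for row in ROWS:
        string_table.setdefault(row[col], len(string_table))

# Dimension tables (sorted): string -> (string table ID, dimension ID).
dimension_tables = {}
for col, name in enumerate(DIMENSIONS):
    values = sorted({row[col] for row in ROWS})
    dimension_tables[name] = {v: (string_table[v], i) for i, v in enumerate(values)}

# Fact table: dimension IDs replace the strings; the list index is the fact ID.
fact_table = [
    tuple(dimension_tables[name][row[col]][1] for col, name in enumerate(DIMENSIONS))
    + (row[3],)
    for row in ROWS
]

print(string_table)
print(dimension_tables["Part"])
print(fact_table[3])
```

After encoding, each fact row holds only small integers plus the measure, and every string is stored once.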

Page 23

Let’s just say ‘Parts’ is the most significant data of interest.

Fact Table

ID  Sales  Customer  Supplier  Part
0   $10    0         0         0
1   $20    1         0         0
2   $30    0         1         0
3   $40    1         1         0
4   $50    0         0         1
5   $60    1         0         1
6   $70    0         1         1
7   $80    1         1         1
8   $90    0         0         2
9   $100   1         0         2
10  $110   0         1         2
11  $120   1         1         2

Page 24

Understanding Nested B-Trees

Fact Table

ID  Sales  Supplier  Part  Customer
0   $10    0         0     0
1   $20    0         0     1
2   $30    1         0     0
3   $40    1         0     1
4   $50    0         1     0
5   $60    0         1     1
6   $70    1         1     0
7   $80    1         1     1
8   $90    0         2     0
9   $100   0         2     1
10  $110   1         2     0
11  $120   1         2     1

[Figure: animation frames that rotate this fact table a quarter turn, so that each column becomes a row. The table contents do not change.]

Page 25

Understanding Nested B-Trees

Fact Table (rotated)

Sales     $10  $20  $30  $40  $50  $60  $70  $80  $90  $100  $110  $120
Supplier  0    0    1    1    0    0    1    1    0    0     1     1
Part      0    0    0    0    1    1    1    1    2    2     2     2
Customer  0    1    0    1    0    1    0    1    0    1     0     1
ID        0    1    2    3    4    5    6    7    8    9     10    11

Supplier Dimension Table

String      String ID  ID
Boeing      0          0
Lockheed    1          1

Part Dimension Table

String      String ID  ID
Cockpit     2          0
Jet Engine  3          1
Wing        4          2

Customer Dimension Table

String      String ID  ID
Delta       5          0
FedEx       6          1

[Figure: nested trees drawn over the rotated fact table. Part nodes: Cockpit, Jet Engine, Wing. Supplier nodes: B, L. Customer nodes: D, F.]

Page 26

Making a B-Tree Forest

[Figure: the rotated fact table from page 25 beside the nested trees built from it. Part nodes: Cockpit, Jet Engine, Wing. Supplier nodes: Boeing, Lockheed. Customer nodes: Delta, FedEx.]

Drilling down the Hypercube to a Single Data Value
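Drilling down can be sketched as one lookup per level until a list of fact table IDs is reached. In the example below, nested Python dictionaries stand in for the nested trees. This is an assumption made for brevity, and `drill_down` is a hypothetical name.

```python
from itertools import product

PARTS = ("Cockpit", "Jet Engine", "Wing")
SUPPLIERS = ("Boeing", "Lockheed")
CUSTOMERS = ("Delta", "FedEx")

sales_by_id = {}  # fact ID -> sales
forest = {}       # part -> supplier -> customer -> list of fact IDs

for fact_id, (part, supplier, customer) in enumerate(product(PARTS, SUPPLIERS, CUSTOMERS)):
    sales_by_id[fact_id] = 10 * (fact_id + 1)
    by_supplier = forest.setdefault(part, {})
    by_customer = by_supplier.setdefault(supplier, {})
    by_customer.setdefault(customer, []).append(fact_id)


def drill_down(part, supplier, customer):
    """Follow one branch per level, then return the matching (fact ID, sales) pairs."""
    ids = forest[part][supplier][customer]
    return [(i, sales_by_id[i]) for i in ids]


print(drill_down("Wing", "Lockheed", "FedEx"))
```

With the fictional data every full combination holds exactly one record, so the drill-down ends at a single data value.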

Page 27

Data Structure & Concept Side by Side

Do you see the Data Value Hypercube to the left?

[Figure: on the left, the nested trees from page 26 (Part: Cockpit, Jet Engine, Wing; Supplier: Boeing, Lockheed; Customer: Delta, FedEx). On the right, the eight populated views from page 7.]

Page 28

Network Data Stream

Protocol  Content  ID  Destination IP  Source IP   Time Stamp
0         0        0   2439172852      1228564286  1166832000
0         0        1   1736146695      1761448515  1166832001
0         0        2   4865142551      1764428246  1166832002
0         1        3   1972456574      1892454458  1166832005
0         2        4   4522617356      1654823542  1166832006
1         3        5   2856459876      1246789437  1166832007
2         4        6   2439852452      1468531753  1166832008
3         5        7   1536985767      1435943248  1166832010
3         5        8   1314525286      1245897561  1166832011
4         6        9   1354457862      1648754745  1166832012
4         7        10  1371566218      1347512985  1166832013
5         8        11  4655814344      1825475558  1166832014
5         8        12  2564258624      1342872184  1166832015
6         9        13  1245238218      1347164817  1166832020
7         10       14  1365457754      13448/4687  1166832021
8         11       15  1854257569      1747548528  1166832022

Protocol  Content  ID  Destination IP  Source IP   Time Stamp
8         11       0   4825212523      1214587528  1166832030
8         11       1   1492464555      1245798566  1166832031
8         0        2   1398761247      1436187561  1166832032
8         1        3   1752924582      1762148568  1166832033
8         1        4   2586245884      1674565723  1166832040
8         1        5   4396215589      1436585479  1166832041
8         1        6   1792558657      1798546822  1166832042
8         2        7   1342823155      1587566312  1166832043
8         2        8   1912746386      1345679658  1166832044
8         2        9   4831253674      1486144679  1166832045
8         10       10  1348236487      1736569518  1166832046
8         10       11  1467588487      1344188545  1166832047
8         10       12  1135416853      1455875267  1166832048
9         9        13  4231144559      1558796467  1166832049
9         9        14  1423552577      1752621443  1166832050

GLOBAL String Table

String        ID
SMB           0
LDAP          1
SSH           2
AOL           3
JPEG          4
ENGLISH       5
ZIP           6
COMPRESS      7
GIFF          8
POP           9
SMTP          10
IMAP          11
FTP           12
TELNET        13
SKYPE         14
CMS           15
FRENCH        16
RUSSIAN       17
BMP           18
BASIC SOURCE  19
C SOURCE      20
DISCOVER      21

CONTENT Dimension Table

String        String Table ID  ID
BASIC SOURCE  19               0
BMP           18               1
C SOURCE      20               2
CMS           15               3
COMPRESS      7                4
DISCOVER      21               5
ENGLISH       5                6
FRENCH        16               7
GIFF          8                8
JPEG          4                9
RUSSIAN       17               10
ZIP           6                11

PROTOCOL Dimension Table

String        String Table ID  ID
AOL           3                0
FTP           12               1
IMAP          11               2
LDAP          1                3
POP           9                4
SKYPE         14               5
SMB           0                6
SMTP          10               7
SSH           2                8
TELNET        13               9

Only showing 2 out of 16 NETWORK DATA STREAM Dimensions

Page 29

B-TREE Notation

[Figure: an example node labeled FTP, B (1,3), with callouts: Attribute Name, Node, Level, Record Number.]

Page 30

NETWORK DATA STREAM

“Protocols” B-TREE

[Figure: tree of protocol nodes, listed here in extraction order.]

POP     B (1,9)
AOL     B (1,7)
IMAP    B (1,8)
SKYPE   B (1,4)
FTP     B (1,3)
LDAP    B (1,1)
TELNET  B (1,6)
SMTP    B (1,5)
SSH     B (1,2)
SMB     B (1,0)

Page 31

Notation

[Figure: an example node labeled BMP 4, B (7,9)(7,9)(7,9)(7,9), with callouts: Attribute Name, Record Count, Chunk, Record Number.]

Tree nodes contain not only data aggregates but also a linked list of data record indices.

Page 32

“Content” B-Trees

Each line gives the attribute name, the record count, and the (chunk, record number) list.

B (1,0) AOL
  BASIC SOURCE  3  (1,0) (1,1) (1,2)
  BMP           1  (1,3)
  C SOURCE      1  (1,4)

B (1,1) FTP
  CMS           1  (1,5)

B (1,2) IMAP
  COMPRESS      1  (1,6)

B (1,3) LDAP
  DISCOVER      2  (1,7) (1,8)

B (1,4) POP
  FRENCH        1  (1,9)

B (1,5) SKYPE
  GIFF          1  (1,10)

B (1,6) SMB
  JPEG          2  (1,11) (1,12)

B (1,7) AOL
  RUSSIAN       1  (1,14)

B (1,8) SSH
  BASIC SOURCE  3  (1,15) (2,0) (2,1)
  BMP           1  (2,2)
  C SOURCE      4  (2,3) (2,4) (2,5) (2,6)
  RUSSIAN       3  (2,7) (2,8) (2,9)
  ZIP           3  (2,10) (2,11) (2,12)

Page 33

B-Tree Forest

[Figure: the “Protocols” B-tree from page 30 with a pointer to the “Content” B-tree B (1,0) AOL, which holds BASIC SOURCE 3: (1,0) (1,1) (1,2); BMP 1: (1,3); C SOURCE 1: (1,4). Callouts: Pointer, Level, Index of Tree at the same level.]

Page 34

[Figure: the “Content” B-trees from page 32 shown together with the “Protocols” B-tree from page 30.]

Page 35

Conclusion

B-tree forests are limited to data aggregates. Data aggregates only identify the existence of a dimensional combination. They do not provide access to complete data records.

With current OLAP implementations, examining data records requires issuing additional database queries, which is inefficient.

We solve this problem by extending a balanced B-tree forest to include references to data records. We call this new type of hypercube the data value cube. Thus, for our data cube, tree nodes contain not only data aggregates but also a linked list of data record indices.
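A node of the kind described in the conclusion can be sketched as an aggregate plus a linked list of record indices. This is a minimal illustration, assuming the aggregate is a record count and each index is a (chunk, record number) pair as in the notation slides. The class names `RecordRef` and `CubeNode` are invented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordRef:
    """One entry in the linked list of data record indices."""
    chunk: int
    record: int
    next: Optional["RecordRef"] = None


@dataclass
class CubeNode:
    """Tree node holding a data aggregate and references to the records behind it."""
    attribute: str
    count: int = 0                    # the aggregate (a record count in this sketch)
    head: Optional[RecordRef] = None
    tail: Optional[RecordRef] = None

    def add(self, chunk, record):
        ref = RecordRef(chunk, record)
        if self.tail is None:
            self.head = ref
        else:
            self.tail.next = ref
        self.tail = ref
        self.count += 1

    def records(self):
        ref = self.head
        while ref is not None:
            yield (ref.chunk, ref.record)
            ref = ref.next


node = CubeNode("BASIC SOURCE")
for chunk, record in [(1, 0), (1, 1), (1, 2)]:
    node.add(chunk, record)

print(node.attribute, node.count, list(node.records()))
```

Because the references travel with the aggregate, the records behind a cell can be reached from the node itself rather than through an additional database query.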

Page 36

THE Q&A

Stephen A. Broeker