PrefixCube: Prefix-sharing Condensed Data Cube

Jianlin Feng, Qiong Fang, Hulin Ding
Huazhong Univ. of Sci. & Tech.

fengjl@mail.hust.edu.cn

Nov 12, 2004

DOLAP 2004

Outline
– Introduction
– Related Work
– ODM: Ordered Datacube Model
– BST-Condensed Cube
– Prefix-sharing Condensed Cube
– Comparisons
– Conclusions

Introduction
Data Cube (ICDE’96)
– An N-dimensional cube(A1, A2, …, AN) has 2^N cuboids, i.e. GROUP-BYs.
The Huge Size Problem
– When the source relation R is sparse, the size of a cuboid is possibly close to the size of R.
– The I/O cost even for storing the cube result tuples becomes dominant.

Related Work
– Condensed Cube (ICDE’02)
– Dwarf (SIGMOD’02)
– Quotient Cube (VLDB’02)
– QC-Tree (SIGMOD’03)
Basic idea: remove redundancies existing among cube tuples.
– prefix redundancy
– suffix redundancy

Prefix Redundancy
Given an example cube(A, B, C):
– Each value of dimension A occurs in 4 cuboids: cuboid(A), (AB), (AC) and (ABC).
– It possibly occurs many times in each cuboid except cuboid(A).
This gives both inter-cuboid and intra-cuboid prefix redundancy.
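The count of four cuboids per value of A can be checked by enumerating the GROUP-BYs of cube(A, B, C). The following is a minimal Python sketch; the helper name `cuboids` is illustrative and does not come from the slides.

```python
from itertools import combinations

def cuboids(dims):
    """Return every subset of dims, i.e. every GROUP-BY of the cube."""
    return [combo
            for size in range(len(dims) + 1)
            for combo in combinations(dims, size)]

all_cuboids = cuboids(("A", "B", "C"))
with_a = [c for c in all_cuboids if "A" in c]

print(len(all_cuboids))  # 8, i.e. 2**3
print(with_a)            # [('A',), ('A', 'B'), ('A', 'C'), ('A', 'B', 'C')]
```

The empty tuple `()` in `all_cuboids` plays the role of the grand-total cuboid.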

Suffix Redundancy
Occurs when cube tuples belonging to different cuboids are actually aggregated from the same group of base relation tuples.
An extreme case:
– Let the source relation R have only one single tuple r(a1, a2, …, an, m).
– The 2^n cube tuples can be condensed into one physical tuple (a1, a2, …, an, V), where V = aggr(r),
– together with some information indicating that it is a representative tuple.
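The extreme case can be made concrete with a short Python sketch. It assumes SUM as the aggregate function and uses `"*"` for ALL; the function name and the sample values are hypothetical.

```python
from itertools import product

def cube_of_single_tuple(dims, measure):
    """Enumerate all cube tuples derived from one base tuple; '*' stands for ALL."""
    tuples = []
    for mask in product((False, True), repeat=len(dims)):
        key = tuple(v if keep else "*" for v, keep in zip(dims, mask))
        # Aggregating a single tuple with SUM yields its own measure.
        tuples.append((key, measure))
    return tuples

cube = cube_of_single_tuple(("a1", "a2", "a3"), 7)
print(len(cube))             # 8, i.e. 2**3
print({m for _, m in cube})  # {7}
```

All eight cube tuples carry the same aggregate, which is why one physical tuple plus a marker is enough to represent them.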

Thinking…
Condensed cube
– It condenses those cube tuples that are aggregated from one single base tuple into one physical tuple in order to reduce the cube’s size.
Dwarf
– Besides suffix coalescing, i.e. multi-base-tuple condensing, it also realizes full prefix-sharing so as to reduce the cube’s size more effectively.

Motivation
HOW can we further reduce the condensed cube’s size while taking into account the characteristics of the queries we intend to answer, i.e. range queries?
Augment BST-condensing with the removal of intra-cuboid prefix redundancy!

Ordered Datacube Model
– The value ALL (or *) is encoded as 0.
– For a dimension D with cardinality C, each dimension value is mapped one-to-one to an integer between 1 and C inclusive.
– N dimensions form an N-dimensional space.
– The origin O(0, 0, …, 0) represents the grand total.

Ordered Datacube Model

Under ODM, a range query against a data cube can actually be reduced to a sub-query against only one particular cuboid in the cube or a union of such sub-queries.
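One possible reading of this reduction is sketched below in Python. The slides do not give the procedure, so the splitting rule is an assumption: a range that starts at 0 (ALL) and extends past it is split into the point [0, 0] and the interval [1, hi]. The function names and the sample city values are hypothetical, and assigning codes in sorted order is also an assumption (the model only requires a one-to-one mapping).

```python
from itertools import product

def encode(values):
    """Map the distinct values of one dimension to 1..C; 0 is reserved for ALL."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}

def split_range_query(ranges):
    """Split per-dimension [lo, hi] ranges into sub-queries that each hit one cuboid."""
    options = []
    for lo, hi in ranges:
        if lo == 0 and hi > 0:
            # Assumed rule: separate the ALL point from the real values.
            options.append([(0, 0), (1, hi)])
        else:
            options.append([(lo, hi)])
    subqueries = []
    for combo in product(*options):
        # A cuboid is identified by the positions whose range is not ALL.
        cuboid = tuple(i for i, r in enumerate(combo) if r != (0, 0))
        subqueries.append((cuboid, combo))
    return subqueries

codes = encode(["Wuhan", "Beijing", "Shanghai"])
print(codes)  # {'Beijing': 1, 'Shanghai': 2, 'Wuhan': 3}

for cuboid, sub in split_range_query([(1, 2), (0, 0), (0, 3)]):
    print(cuboid, sub)
# (0,) ((1, 2), (0, 0), (0, 0))
# (0, 2) ((1, 2), (0, 0), (1, 3))
```

In this example the query touches two cuboids: the one grouped on the first dimension only, and the one grouped on the first and third dimensions.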

BST-Condensed Cube
Base Single Tuple (BST)
– t1 is a BST on SD {A} and {B}
– t2 is a BST on SD {B}
A unique minimal BST-condensed cube, MinCube, is obtained when every BST is fully exploited with all of its SDs.

     A  B  C  M
t1   8  1  1  100
t2   1  8  1  50
t3   1  2  3  60
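The BST claims for this relation can be checked mechanically. The sketch below assumes the usual definition from the Condensed Cube work, which the slide does not spell out: a tuple is a BST on a set of dimensions SD if it is the only base tuple carrying its combination of values on SD. Only single-dimension SDs are tested here, and the function name is illustrative.

```python
from collections import Counter

DIMS = ("A", "B", "C")
R = {"t1": (8, 1, 1, 100), "t2": (1, 8, 1, 50), "t3": (1, 2, 3, 60)}

def single_dimension_bsts(relation, dims):
    """For each tuple, list the single dimensions on which its value is unique."""
    result = {name: [] for name in relation}
    for i, dim in enumerate(dims):
        counts = Counter(row[i] for row in relation.values())
        for name, row in relation.items():
            if counts[row[i]] == 1:
                result[name].append(dim)
    return result

print(single_dimension_bsts(R, DIMS))
# {'t1': ['A', 'B'], 't2': ['B'], 't3': ['B', 'C']}
```

Under this definition t3 also qualifies on {B} and {C}, although the slide lists only t1 and t2.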

BU-BST Condensed Cube
– BottomUpBST algorithm (ICDE’02)
– Each BST corresponds to only one SD.
– Compared with MinCube, it is easier to compute and easier to restore normal cube tuples from.
Note: BST condensing is a special kind of prefix-sharing!

     A  B  C  M
     8  *  *  10
     8  1  *  10
     8  *  1  10
     8  1  1  10

     A  B  C  M   SD
ct7  8  1  1  10  {A}

A group of cube tuples sharing a prefix is represented by a single condensed tuple.

A BU-BST Condensed Cube Example

     A  B  C  M
t1   8  1  1  100
t2   1  8  1  50
t3   1  2  3  60

      A  B  C  M    SID/CID
ct1   *  *  *  210  ALL
ct2   1  *  *  110  A
ct3   1  2  3  60   AB
ct4   1  8  1  50   AB
ct5   1  *  1  50   AC
ct6   1  *  3  60   AC
ct7   8  1  1  100  A
ct8   *  1  1  100  B
ct9   *  2  3  60   B
ct10  *  8  1  50   B
ct11  *  *  1  150  C
ct12  *  *  3  60   C

Note:
– Intra-cuboid prefix redundancy: ct3 and ct4
– Inter-cuboid prefix redundancy: ct2, ct3 and ct5
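For comparison with the 12 condensed tuples listed above, the full (uncondensed) cube of the same three-tuple relation can be computed directly. This is a brute-force Python sketch, not the BottomUpBST algorithm; it assumes SUM as the aggregate, which is consistent with the grand total 210 = 100 + 50 + 60.

```python
from collections import defaultdict
from itertools import product

R = [(8, 1, 1, 100), (1, 8, 1, 50), (1, 2, 3, 60)]  # (A, B, C, M)

def full_cube(rows, n_dims):
    """Compute SUM(M) for every GROUP-BY; '*' stands for ALL."""
    cube = defaultdict(int)
    for row in rows:
        *dims, m = row
        for mask in product((False, True), repeat=n_dims):
            key = tuple(v if keep else "*" for v, keep in zip(dims, mask))
            cube[key] += m
    return dict(cube)

cube = full_cube(R, 3)
print(len(cube))              # 20
print(cube[("*", "*", "*")])  # 210
print(cube[(1, "*", "*")])    # 110
print(cube[("*", "*", 1)])    # 150
```

The full cube holds 20 tuples, so BU-BST condensing already saves 8 of them on this tiny example; the three printed aggregates match ct1, ct2 and ct11.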

Prefix-sharing Condensed Cube - PrefixCube
[Diagram: BST Condensing + Intra-cuboid prefix-sharing → Prefix-sharing → PrefixCube]
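The effect of intra-cuboid prefix-sharing can be illustrated with a nested-dictionary tree in Python. This is only a sketch of the general idea, not the physical layout used by PrefixCube; the function names are hypothetical, and the two tuples are ct3 and ct4 from the BU-BST example, which share the prefix A = 1.

```python
def build_prefix_tree(tuples):
    """Store tuples as nested dicts so that a shared leading value is kept once."""
    root = {}
    for *dims, measure in tuples:
        node = root
        for value in dims[:-1]:
            node = node.setdefault(value, {})
        node[dims[-1]] = measure
    return root

def count_cells(node):
    """Count the dimension values stored in the tree (measures are not counted)."""
    if not isinstance(node, dict):
        return 0
    return sum(1 + count_cells(child) for child in node.values())

cuboid_ab = [(1, 2, 3, 60), (1, 8, 1, 50)]  # ct3 and ct4
tree = build_prefix_tree(cuboid_ab)
print(tree)               # {1: {2: {3: 60}, 8: {1: 50}}}
print(count_cells(tree))  # 5 dimension values instead of 6
```

The saving grows with the number of tuples in a cuboid that share the same leading values.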

A PrefixCube Example
[Figure: the PrefixCube for the example relation, showing V-Roots and N-Roots; nodes are labelled with SID (A, AB, B) and CID (ALL, A, AC).]

Corresponding Dwarf
[Figure: the corresponding Dwarf, with levels for the A, B and C dimensions and nodes node1 to node4.]

PrefixCube vs. Dwarf

                              PrefixCube       Dwarf
Prefix-sharing                Intra-cuboid     Inter- and intra-cuboid
Suffix coalescing             BST condensing   Multi-tuple condensing
Compression ratio             Lower            Higher
Saving extra value ALL?       No               Yes
Tuples clustered by cuboid?   Yes              No

PrefixCube does not blindly pursue the highest compression ratio; it is intended to strike a good compromise among cube size reduction ratio, restoring and updating costs, and query characteristics!

Effectiveness of Size Reduction
Datasets
– synthetic datasets with uniform distribution
– # of tuples: 1,000,000
[Figure: BU-BST vs. PrefixCube against the number of dimensions (2 to 9); panel (a) Cardinality = 100, panel (b) Cardinality = 1000.]

Effectiveness of Size Reduction
PrefixBUC
– Full Cube (computed by BUC)
– Prefix-sharing
[Figure: series C=100 and C=1000; x-axis from 2 to 9.]

Impact of Data Density
Datasets
– Uniform distribution
– # of dimensions: 6
– Cardinality of dimensions: 100
– # of tuples: range from 1,000 to 1,000,000
[Figure: BU-BST, PrefixCube and PrefixBUC against the number of tuples (1.E+03 to 1.E+06).]

Impact of Data Skewness
Datasets
– Zipf distribution
– # of tuples: 1,000,000
– Cardinality of dimensions: range from 1,000 to 500 in steps of 100
– Zipf factor: range from 0 to 0.8 in steps of 0.2
[Figure: x-axis is the Zipf factor (0, 0.2, 0.4, 0.6, 0.8).]
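The slides do not say how the skewed datasets were generated. A minimal Python sketch of one common approach is given below, under these assumptions: the k-th value of a dimension is weighted 1 / k^z, dimensions are drawn independently, and measures are omitted. The function names and the small tuple count are illustrative only.

```python
import random

def zipf_weights(cardinality, z):
    """Weight of the k-th value is 1 / k**z; z = 0 gives a uniform distribution."""
    return [1.0 / (k ** z) for k in range(1, cardinality + 1)]

def generate(n_tuples, cardinalities, z, seed=0):
    """Draw each dimension independently from a Zipf-like distribution over 1..C."""
    rng = random.Random(seed)
    columns = [rng.choices(range(1, c + 1), weights=zipf_weights(c, z), k=n_tuples)
               for c in cardinalities]
    return list(zip(*columns))

# Six dimensions with cardinalities 1,000 down to 500, as on the slide.
rows = generate(1000, [1000, 900, 800, 700, 600, 500], z=0.8)
print(rows[0])
```

With z = 0 every value is equally likely; larger z concentrates tuples on the low-numbered values, which changes how many base tuples end up alone in their groups.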

Real-world Dataset
Datasets
– Weather Datasets
– # of tuples: 1,015,367
[Figure: two panels with x-axis from 2 to 9; series BUC, BU-BST and PrefixCube.]

Conclusion
A new cube structure, PrefixCube, was proposed by augmenting BU-BST condensing with intra-cuboid prefix-sharing.
– It can greatly reduce the data cube’s size compared with the BU-BST condensed cube.
– It can also reduce the impact of data skew on BU-BST condensing.
– It achieves a fairly stable size reduction on both dense and sparse datasets.

The End

Thank you!

Any questions?
