1
Computing the cube
Abhinandan Das CS 632
Mar 8 2001
2
On the Computation of Multidimensional Aggregates
Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey Naughton, Raghu Ramakrishnan & Sunita Sarawagi
-- VLDB 1996
3
Motivation
OLAP / multidimensional data analysis
Eg: Transactions(Prod, Date, StoreId, Cust, Sales)
Sum of sales by: (P,SId); (P); (P,D,SId)
Computing multidimensional aggregates is a performance bottleneck
Goal: efficient computation of several related group-bys
4
What is a CUBE?
n-dimensional generalization of the group-by operator
Group-bys corresponding to all possible subsets of a given set of attributes
Eg: SELECT P, D, C, Sum(Sales) FROM Transactions CUBE-BY P, D, C
computes the group-bys: ALL, P, D, C, PD, PC, DC, PDC
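The eight group-bys above are just the subsets of {P, D, C}. A minimal sketch of enumerating and computing all of them with sum as the aggregate (the Transactions rows below are made-up illustrations, not data from the paper):

```python
from itertools import combinations

def cube(rows, attrs, measure):
    """Compute the sum of `measure` for every subset of `attrs` (2^n group-bys)."""
    result = {}
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            groups = {}
            for row in rows:
                key = tuple(row[a] for a in subset)
                groups[key] = groups.get(key, 0) + row[measure]
            result[subset] = groups
    return result

rows = [
    {"P": "jacket", "D": "jan", "C": "kmart", "Sales": 40},
    {"P": "jacket", "D": "feb", "C": "kmart", "Sales": 10},
    {"P": "shoes",  "D": "jan", "C": "sears", "Sales": 25},
]
gb = cube(rows, ("P", "D", "C"), "Sales")
# gb[()] is the ALL group-by; gb[("P",)] groups by product alone
```

The naive approach computes each of these 2^n group-bys independently; the rest of the talk is about sharing work between them.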
5
Approaches
Basic group-by methods:
Sort based
Hash based
Naïve approach
6
Possible optimizations
1. Smallest parent
2. Cache results
3. Amortize scans
4. Share sorts
5. Share partitions
Often contradictory
Assumption: Distributive aggregate function (sum, count, min, max; also average)
Non-distributive: median
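A quick check of why distributivity matters: for sum/min/max, aggregating per-partition aggregates gives the same answer as aggregating the whole input, which is what lets a group-by be computed from any parent group-by. Median has no such property. A sketch (data values are arbitrary):

```python
import statistics

def distributes(agg, data, split):
    """True if agg(per-partition aggregates) == agg(whole input)."""
    parts = [data[:split], data[split:]]
    return agg([agg(p) for p in parts]) == agg(data)

data = [3, 1, 4, 1, 5, 9, 2, 6]
ok = [distributes(f, data, 3) for f in (sum, min, max)]   # all True
bad = distributes(statistics.median, data, 3)             # False: median is not distributive
```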
7
Sort based methods
Algorithm PipeSort
Share-sorts vs smallest parent: optimize to get minimum total cost
Cache-results & amortize-scans via pipelining: ABCD -> ABC -> AB -> A
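A sketch of what the pipeline buys: once the input is sorted on (A,B), the group-bys AB, A and ALL can all be accumulated in a single scan. For brevity this sketch collects results in dicts; the real pipeline emits each group as soon as its key changes, so it needs only one accumulator per stage (example data is made up):

```python
def pipelined_groupbys(sorted_rows, prefix_lens):
    """One scan over rows sorted on all grouping attrs; sum the measure per prefix length."""
    out = {p: {} for p in prefix_lens}
    for *attrs, measure in sorted_rows:
        for p in prefix_lens:
            key = tuple(attrs[:p])
            out[p][key] = out[p].get(key, 0) + measure
    return out

rows = sorted([("a", "x", 10), ("a", "x", 5), ("a", "y", 7), ("b", "x", 3)])
res = pipelined_groupbys(rows, (2, 1, 0))   # AB -> A -> ALL from one sorted scan
```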
8
PipeSort
Assumption: Have an estimate of the number of distinct values for each group-by
Input: Search lattice
Graph where each vertex represents a group-by of the cube
Edge i -> j if |i| = |j|+1 and j's attributes are a subset of i's
9
Search lattice: CUBE-BY ABCD
10
Search lattice (contd...)
Each edge eij is associated with two costs:
S(eij): Cost of computing j from i when i is pre-sorted
U(eij): Cost of computing j from i when i is not already sorted
Idea: If the attribute order of group-by j is a prefix of parent i's order, compute j without sorting (cost S(eij)); else sort first (cost U(eij))
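The prefix test that decides between S(eij) and U(eij) is a one-liner. A hedged sketch (the S and U arguments here are stand-in cost values, not the paper's cost model):

```python
def edge_cost(parent_order, child_order, S, U):
    """S if the child's attribute order is a prefix of the parent's (no sort needed), else U."""
    return S if tuple(child_order) == tuple(parent_order)[:len(child_order)] else U

ab_cost = edge_cost(("A", "B", "C"), ("A", "B"), 1, 10)   # AB from sorted ABC: no re-sort
ac_cost = edge_cost(("A", "B", "C"), ("A", "C"), 1, 10)   # AC from ABC: must sort first
```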
11
PipeSort (contd...)
Proceed level by level, k = 0 to N-1
Find the best way of computing level k from level k+1
Weighted bipartite matching:
Make k additional replicas of each node at level k+1
Cost S(eij) on the original node, U(eij) on the replicas
Find a minimum cost maximal matching
12
Min cost matching
13
Algorithm PipeSort
For level k = 0 to N-1:
  Generate_plan(k+1 -> k)
  Fix the sort order of the level k+1 nodes
Generate_plan(k+1 -> k):
  Make k additional replicas of the level k+1 nodes, assign appropriate edge costs
  Find a min-cost matching
14
Example plan
15
Tweaks
Aggregate and remove duplicates whilst sorting
Use a partial sorting order to reduce sort costs
Eg: ABC -> AC
16
Hash based methods
Algorithm PipeHash
Can include all stated optimizations (if memory is available):
For k = N downto 0:
  For each group-by g at level k+1:
    Compute in 1 scan of g all children for which g is the smallest parent
    Save g and deallocate the hash table of g
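A sketch of the one-scan hash step: each child for which g is the smallest parent gets its own hash table, and all of them are filled during a single pass over g (attribute names and data are illustrative):

```python
def compute_children(parent_rows, child_attr_sets):
    """One scan of group-by g; hash-aggregate every child computed from g."""
    tables = {c: {} for c in child_attr_sets}
    for vals, agg in parent_rows:     # vals: dict attr -> value, agg: partial aggregate
        for c in child_attr_sets:
            key = tuple(vals[a] for a in c)
            tables[c][key] = tables[c].get(key, 0) + agg
    return tables

g = [({"A": 1, "B": 1}, 5), ({"A": 1, "B": 2}, 3), ({"A": 2, "B": 1}, 4)]
t = compute_children(g, [("A",), ("B",)])
```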
17
PipeHash
Limited memory -> use partitioning
Optimization share-partitions:
When data is partitioned on attribute A, all group-bys containing A can be computed independently on each partition
No recombination required
18
PipeHash: Overview
First choose the smallest parent for each group-by (gives an MST)
Optimize for memory constraints:
Decide which group-bys to compute together
Choice of attribute for data partitioning
Minimizing overall disk scan cost: NP-hard!
19
PipeHash
20
Heuristics
Optimizations cache-results and amortize-scans favoured by choosing large subtrees of the MST: can compute multiple group-bys together
However, the partitioning attribute limits the subtree
Hence choose the partitioning attribute that allows the choice of the largest subtree
21
Algorithm:
Worklist w = MST
While w not empty:
  Choose any tree T from w
  T’ = select_subtree(T)
  Compute_subtree(T’)
22
Select_subtree(T):
If mem reqd by T < M, return T
Else, for each attribute a, consider the subtree Ta of T computable after partitioning on a:
  Let Pa = max # of partitions of root(T) possible if a is used for partitioning
  Choose a s.t. (mem reqd by Ta)/Pa < M and Ta is the largest such subtree over all a
  Add the forest T - Ta to w, return Ta
23
Compute_subtree(T’)
numParts = (mem reqd by T’) * fudge_factor / M
Partition root(T’) into numParts partitions
For each partition of root(T’):
  For each node n in T’ (breadth first):
    Compute all children of n in 1 scan
    Release the memory occupied by the hash table of n
24
Notes on PipeHash
PipeHash is biased towards the smallest-parent optimization
Eg: Compute BC from BCD (fig)
In practice, saving on sequential disk scans is less important than reducing the CPU cost of aggregation by choosing the smallest parent!
25
Overlap method
Sort based
Minimize disk accesses by overlapping the computation of “cuboids”
Focus: Exploit partially matching sort orders to reduce sorting costs
Uses the smallest parent optimization
26
Sorted runs
B = (A1,A2,...,Aj); S = (A1,...,Al-1,Al+1,...,Aj)
A sorted run of S in B is a maximal run of tuples in B whose ordering is consistent with the sort order in S
Eg: B = [(a,1,2),(a,1,3),(a,2,2),(b,1,3),(b,3,2),(c,3,1)]
S = [(a,2),(a,3),(b,3),(b,2),(c,1)] (S keeps the 1st & 3rd attributes)
Sorted runs for S: [(a,2),(a,3)], [(a,2)], [(b,3)], [(b,2)] and [(c,1)]
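A sketch that reproduces the runs in the example. Following the slide's example, a run is broken both when the S-order decreases and when the common-prefix value changes (the latter keeps partitions expressible as unions of runs; this break rule is an assumption read off the example):

```python
def sorted_runs(B, proj, prefix_len):
    """Split B, projected onto the attrs in `proj`, into maximal runs consistent with S's order."""
    runs, cur = [], []
    for t in B:
        s = tuple(t[i] for i in proj)
        if cur and (s < cur[-1] or s[:prefix_len] != cur[-1][:prefix_len]):
            runs.append(cur)          # order broke or prefix changed: close the run
            cur = []
        cur.append(s)
    if cur:
        runs.append(cur)
    return runs

B = [("a",1,2), ("a",1,3), ("a",2,2), ("b",1,3), ("b",3,2), ("c",3,1)]
runs = sorted_runs(B, (0, 2), 1)   # S keeps the 1st and 3rd attributes; common prefix is A1
```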
27
Partitions
B, S have the common prefix A1,...,Al-1
A partition of a cuboid S in B is the union of sorted runs s.t. the first (l-1) columns (ie the common prefix) have the same value
Previous eg: Partitions for S in B are: [(a,2),(a,3)], [(b,3),(b,2)] & [(c,1)]
Tuples from different partitions need not be merged for aggregation
A partition is an independent unit of computation
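Partitions can then be sketched as the union of runs sharing a prefix value; duplicates merge during aggregation, which this sketch approximates with a set union (the runs below are the ones from the sorted-runs example):

```python
def partitions(runs, prefix_len):
    """Union sorted runs whose first (l-1) columns agree; each union is one partition."""
    parts = {}
    for run in runs:
        parts.setdefault(run[0][:prefix_len], []).extend(run)
    return [sorted(set(p)) for p in parts.values()]   # duplicates collapse on aggregation

runs = [[("a",2),("a",3)], [("a",2)], [("b",3)], [("b",2)], [("c",1)]]
parts = partitions(runs, 1)
```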
28
Overview
Begin by sorting the base cuboid
All other cuboids are computed without re-sorting
The sort order of the base cuboid determines the sort order of all other cuboids
To maximize overlap across cuboid computations, reduce the memory requirements of individual cuboids
Since the partition is the unit of computation, while computing one sorted cuboid from another we just need memory sufficient to hold a partition
29
Overview (contd...)
When a partition is computed, its tuples can be pipelined to descendants; the same memory is used by the next partition
Significant saving: PartSize << CuboidSize
Eg: Computing ABC and ABD from ABCD: PartSize(ABC) = 1, PartSize(ABD) = |D|
30
Choosing parent cuboids
Goal: Choose the tree that minimizes the size of partitions
Eg: Better to compute AC from ACD than from ABC
Heuristic: Maximize the size of the common prefix
31
Example cuboid tree
32
Choosing overlapping cuboids
To compute a cuboid in memory, need memory = PartSize
If the required memory is available, the cuboid is in Partition state
Else allocate 1 memory page for the cuboid, and mark it as SortRun state
Only tuples of a Partition cuboid can be pipelined to descendants
33
Heuristics
Which cuboids to compute and in what state: optimal allocation is NP-hard!
Heuristic: Traverse the tree in BFS order
Intuition: Cuboids to the left have smaller partition sizes, so they require less memory
34
Cuboid computation
For each tuple t of B (the parent):
  If (state == Partition) process_partition(t)
  Else process_sorted_run(t)
Process_partition(t): 3 cases:
  Tuple starts a new partition
  Tuple matches an existing tuple in the partition
  New tuple: insert at the appropriate place
35
Cuboid computation (contd...)
Process_sorted_run(t): 3 cases:
  Tuple starts a new sorted run
  Tuple matches the last tuple in the current run
  New tuple: append to the end of the current run
A cuboid in Partition state is fully computed in 1 pass
A cuboid in SortRun state: combine the merge step with the computation of descendant cuboids
36
Example CUBE computation
(M=25 ; 9 sorted runs of BCD, CD to merge)
37
An array based algorithm for simultaneous multidimensional aggregates
Yihong Zhao, Prasad Deshpande, Jeffrey Naughton
-- SIGMOD ‘97
38
ROLAP vs MOLAP
CUBE is central to OLAP operations
ROLAP: Relational OLAP systems
  PipeSort, PipeHash, Overlap
MOLAP: Multidimensional OLAP systems
  Store data in sparse arrays instead of relational tables
39
MOLAP systems
Relational tuple: (jacket, K-mart, 1996, $40)
MOLAP representation:
  Stores only ‘$40’ in a sparse array
  Position in the array encodes (jacket, K-mart, 1996)
Arrays are “chunked” and compressed for efficient storage
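The "position encodes the dimension values" idea is plain row-major linearization; a sketch (the index assignments for jacket/K-mart/1996 and the dimension sizes are made up):

```python
def cell_offset(coords, dim_sizes):
    """Row-major position of a cell: the offset alone identifies the dimension values."""
    pos = 0
    for c, size in zip(coords, dim_sizes):
        pos = pos * size + c
    return pos

# Suppose (jacket, K-mart, 1996) maps to indices (2, 0, 6) in dims of sizes (10, 50, 10):
pos = cell_offset((2, 0, 6), (10, 50, 10))   # only '$40' is stored, at this position
```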
40
Problem
No concept of “reordering” to bring together related tuples
Order the cell visits so as to simultaneously compute several aggregates whilst minimizing memory requirements and # of cell visits
41
Efficient array storage: Issues
Array too large to fit in memory: split into “chunks” that fit in memory
Even with chunking, the array may be sparse: compression needed
The standard PL technique of storing arrays in row/column major form is inefficient: it creates asymmetry amongst the dimensions, favoring one over the other
42
Chunking
Divide the n-D array into small n-D chunks and store each chunk as an independent object on disk
Keep chunk size = disk block size
We shall use chunks having the same size along each dimension
43
Compressing sparse arrays
Store “dense” chunks as is (>40% occupancy)
  Already a significant compression over a relational table
Sparse chunks: use chunk-offset compression – (offsetInChunk, data)
Better than LZW etc. because:
  Uses domain knowledge
  LZW data requires decompression before use
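Chunk-offset compression in miniature: only non-empty cells survive, each as an (offsetInChunk, data) pair, and the pairs are directly usable without a decompression pass. A sketch, with `None` marking an empty cell:

```python
def compress_chunk(chunk):
    """Keep (offsetInChunk, data) pairs for the non-empty cells only."""
    return [(i, v) for i, v in enumerate(chunk) if v is not None]

def decompress_chunk(pairs, size):
    """Rebuild the dense chunk from its (offset, data) pairs."""
    chunk = [None] * size
    for i, v in pairs:
        chunk[i] = v
    return chunk

chunk = [None, None, 40, None, None, None, 25, None]   # a sparse 8-cell chunk
pairs = compress_chunk(chunk)
```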
44
Loading arrays from tables
Input: table, dimension sizes, chunk size
If M < array size, partition the sets of chunks into groups which fit in memory
  Eg: With 16 chunks and 2 partitions, group chunks 0-7 & 8-15
Scan the table. For each tuple, calculate & store (chunk#, offset, data) into the buffer page for the corresponding partition
2nd pass: For each partition, read the tuples and assign them to chunks in memory. Compress.
45
Basic algorithm (no overlapping)
Eg: 3-D array ABC; to compute AB:
If the array fits in memory, sweep a plane of size |A|*|B| along dimension C, aggregating as you go
If the array is chunked: sweep a plane of size |Ac|*|Bc| through the upper left chunks; store the result, move to the chunks on the right
Each chunk is read in only once
Memory: 1 chunk + the |Ac|*|Bc| plane
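The in-memory case can be sketched directly: an |A|x|B| plane swept along C, accumulating into each (a, b) cell (a tiny 2x2x2 array for illustration):

```python
def sweep_out_c(arr):
    """Aggregate AB from ABC by sweeping an |A|x|B| plane along dimension C."""
    nA, nB, nC = len(arr), len(arr[0]), len(arr[0][0])
    plane = [[0] * nB for _ in range(nA)]
    for c in range(nC):              # one sweep step per C slice
        for a in range(nA):
            for b in range(nB):
                plane[a][b] += arr[a][b][c]
    return plane

abc = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
ab = sweep_out_c(abc)
```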
46
Generalization
Sweep (k-1)-dimensional subarrays through a k-dimensional array
Multiple group-bys: use the smallest parent optimization in the cube lattice
Advantage over ROLAP: since dimension & chunk sizes are known, exact node sizes can be computed
Min Size Spanning Tree (MSST): the parent of node n is the parent node n’ of minimum size in the lattice
47
Basic array cubing algorithm:
First construct the MSST of the group-bys
Compute a group-by Di1Di2...Dik from its minimum-size parent Di1...Dik+1:
Read in each chunk of Di1...Dik+1 along dimension Dik+1 and aggregate it into a chunk of Di1...Dik
Once a chunk of Di1...Dik is complete, flush it and reuse the memory
48
Example
ABC: 16x16x16 array; chunk size: 4x4x4; dimension order: ABC
Eg: Computing BC: read the chunks in order 1..64
After every 4 chunks, flush the completed BC chunk to disk and reuse the memory
49
3-D array (Dim order ABC)
50
Multi-Way algorithm
Single pass version: assume enough memory to compute all group-bys in 1 scan
Reduce memory requirements by using a special order to scan the input array, called the dimension order
A dimension order of the array chunks is a row major order of the chunks, with the n dimensions D1,...,Dn taken in some order O = (Di1,Di2,...,Din)
51
Memory requirements
For a given dimension order, we can determine which chunks of each group-by must stay in memory to avoid rescanning the input array
Eg: Suppose the D.O. is ABC, ie chunks 1..64:
  BC: Sweep 1 chunk, deallocate & reuse
  AC: Sweep 4 chunks for the entire AC
  AB: Sweep 16 chunks for the entire AB
Note: Each BC chunk is generated in the D.O. Before writing a BC chunk to disk, use it to compute B and C chunks as if it were read in the D.O.
52
Memory requirements (contd...)
Generalizing (Xc = chunk size, Xd = dimension size):
Computing BC requires |Bc|*|Cc| memory
Computing AC and AB require |Ad|*|Cc| and |Ad|*|Bd| memory
RULE1: For a group-by (Dj1,...,Djn-1) of array (D1,...,Dn) with D.O. (D1,...,Dn), if (Dj1,...,Djn-1) contains a prefix of (D1,...,Dn) of length p, then the memory required to compute (Dj1,...,Djn-1) is
  ∏_{i=1}^{p} |Di| * ∏_{i=p+1}^{n-1} |Cji|
ie full dimension sizes for the shared prefix, chunk sizes for the remaining dimensions
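RULE1 can be sketched as a small function: full dimension size for each attribute in the shared prefix, chunk size for the rest. The 16x16x16 array with 4x4x4 chunks from the earlier example checks out (BC, AC and AB need 16, 64 and 256 cells respectively):

```python
def rule1_memory(groupby, order, dim_size, chunk_size):
    """Memory to compute `groupby` under dimension order `order` (RULE1 sketch)."""
    p = 0                                     # length of the prefix shared with the order
    while p < len(groupby) and groupby[p] == order[p]:
        p += 1
    mem = 1
    for d in groupby[:p]:
        mem *= dim_size[d]                    # prefix dims: full dimension size
    for d in groupby[p:]:
        mem *= chunk_size[d]                  # remaining dims: chunk size
    return mem

dim_size = {"A": 16, "B": 16, "C": 16}
chunk_size = {"A": 4, "B": 4, "C": 4}
order = ("A", "B", "C")
mems = [rule1_memory(g, order, dim_size, chunk_size)
        for g in (("B", "C"), ("A", "C"), ("A", "B"))]
```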
53
Minimum Memory Spanning Tree
MMST: obtained from the lattice by choosing a parent for each node N as per RULE1
Choose the parent that minimizes memory requirements: the prefix of the parent contained in node N must have minimum length
Break ties by choosing the node of minimum size as parent
54
MMST
55
Optimal dimension order
Different dimension orders may generate different MMSTs, which can have vastly different memory requirements
Eg: 4-D array ABCD; dimension sizes 10, 100, 1000, 10000 resp.; chunk size = 10x10x10x10
Figure shows the MMSTs for dimension orders ABCD and DBCA
The optimal dimension order is (D1,...,Dn) where |D1| <= |D2| <= ... <= |Dn|
56
[Figure: MMSTs for dimension orders ABCD and DBCA]
57
Multi-pass Multi-way Array Algorithm
The single pass algorithm assumed we had the memory MT required by the MMST T of the optimal dimension order
If M < MT, we cannot allocate memory for some of the subtrees of the MMST (incomplete subtrees)
Optimal memory allocation to the different subtrees is likely to be NP-hard
58
Heuristic
Allocate memory to the subtrees of the root in right-to-left order
Intuition: The rightmost node will be the largest array, and we want to avoid computing it in multiple passes
59
Algorithm
Create the MMST T for the optimal D.O. O
Add T to the ToBeComputed list
For each tree T’ in the ToBeComputed list:
  Create the working subtree W and incomplete subtrees I (allocating mem = 1 chunk per incomplete node)
  Scan the chunks of root(T’) in order O
  Compute the group-bys in W and write them to disk
  For each incomplete group-by Dj1...Djn-1, write an intermediate aggregation result per chunk of D1,...,Dn
For each R = root(I):
  Generate the chunks of R from its partitions & write to disk (merging several intermediate results into 1 chunk)
  Add I to ToBeComputed
60
ROLAP vs MOLAP
The proposed algorithm is more efficient than existing ROLAP techniques, even when used indirectly:
  Scan the relational table and load it into an array
  Compute the cube on the resulting array
  Dump the resulting CUBEd array into tables
Why better?
  ROLAP table sizes are much bigger than the compressed array sizes
  Multiple sorts are reqd. in ROLAP; MOLAP does not require sorts, since the multidimensional array captures the relationships amongst all the different dimensions