cache conscious indexing for decision-support in main memory pradip dhara
Post on 20-Dec-2015
Cache Conscious Indexing for Decision-Support in Main Memory
Pradip Dhara
Why In-memory databases
• Telecommunications
• CAD tools
• Moore’s law will allow us to store relations in memory
Redesigning DBMS’s
• Optimize memory-cpu performance vs disk-memory performance
• Re-evaluate space/time tradeoff – space isn’t cheap
• Given a certain space requirement, we need to optimize response time for lookups
Indices in In-Memory DBMS’s
• Little extra space vs. Increased performance
• Index design takes on new dimensions when looking at in-memory databases
• Space overhead cannot be ignored – hash tables are unacceptable
Hardware solutions
• Caches
• Growing disparity between CPU performance and memory performance.
• Cache misses can’t be overlapped
Solution
• CSS-tree indices exploit cache behavior to improve performance
Direct Mapped Cache
Fully Associative Cache
2-Way Set Associative Cache
Binary Search on Sorted Array
Store the relation in sorted order on a key
Cache performance dependent upon tuple size
[Figure: sorted array holding the keys 1 through 14]
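As a rough illustration of why cache performance depends on tuple size, the sketch below runs a binary search over a sorted list and records which cache lines each probe would touch. The function name `binary_search_lines` and the parameters `entry_bytes` and `line_bytes` are illustrative assumptions, not taken from the slides; the model simply maps array index i to cache line (i * entry_bytes) // line_bytes.

```python
def binary_search_lines(keys, target, entry_bytes=8, line_bytes=32):
    """Binary search over a sorted list.

    Returns (index or None, distinct cache-line numbers in probe order).
    The cache model is a simplifying assumption: entry i lives in
    line (i * entry_bytes) // line_bytes.
    """
    lo, hi = 0, len(keys) - 1
    lines = []
    found = None
    while lo <= hi:
        mid = (lo + hi) // 2
        line = (mid * entry_bytes) // line_bytes
        if line not in lines:
            lines.append(line)
        if keys[mid] == target:
            found = mid
            break
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return found, lines


keys = list(range(1, 15))  # the 14 keys from the slide
index, touched = binary_search_lines(keys, 11)
```

With larger entries, fewer of them share a line, so successive probes are less likely to land on a line that is already cached.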
T-trees
[Figure: T-tree nodes; each entry is a key with a pointer to its record. Node contents: 4, * 8, * …; 0, * 3, * …; 10, * 16, * …]
Enhanced B+ trees
[Figure: B+ tree with leaf entries 1, * through 20, * in groups of four, under a node holding the keys 5, 9, 13, 17]
Hash Indices
[Figure: hash directory with entries 000 through 111; a bucket holds the entries 0, * 8, * 80, * …]
Put however many <key, rid> pairs fit into a cache line
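A minimal sketch of that bucket layout, under stated assumptions: the hash function `key % num_buckets`, the function names, and the default sizes (32-byte line, 4-byte key, 4-byte rid) are illustrative choices, and the space a real bucket would need for an overflow pointer is ignored.

```python
def pairs_per_line(line_bytes=32, key_bytes=4, rid_bytes=4):
    """How many <key, rid> pairs fit into one cache line."""
    return line_bytes // (key_bytes + rid_bytes)


def build_hash_index(pairs, num_buckets=8, line_bytes=32, key_bytes=4, rid_bytes=4):
    """Chained hash index; each chain is a list of cache-line-sized blocks."""
    cap = pairs_per_line(line_bytes, key_bytes, rid_bytes)
    buckets = [[] for _ in range(num_buckets)]
    for key, rid in pairs:
        chain = buckets[key % num_buckets]  # assumed hash function
        if not chain or len(chain[-1]) == cap:
            chain.append([])  # start a new cache-line-sized block
        chain[-1].append((key, rid))
    return buckets


def lookup(buckets, key):
    """Scan the chain block by block; return the rid or None."""
    for block in buckets[key % len(buckets)]:
        for k, rid in block:
            if k == key:
                return rid
    return None


index = build_hash_index([(k, k * 10) for k in (0, 8, 80, 16, 24, 3)])
```

Each block read during a lookup then costs roughly one cache line, which is the point of sizing buckets this way.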
Idea Behind CSS-trees
• Save space by not storing pointers
• Use an array as a tree
• Implicitly store pointers as offsets into the array
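The idea of implicit pointers can be sketched as plain index arithmetic. This assumes nodes are numbered from 0 in level order, each node holds m keys, and node b has children b(m+1)+1 through b(m+1)+(m+1), which agrees with the node numbering in the example figure (node 0 has children 1 to 3, node 1 has children 4 to 6). The helper names are hypothetical.

```python
def child_node(b, i, m):
    """Number of the i-th child (0 <= i <= m) of node b."""
    return b * (m + 1) + 1 + i


def key_offset(b, m):
    """Offset of node b's first key inside the directory array."""
    return b * m


m = 2
children_of_root = [child_node(0, i, m) for i in range(m + 1)]
children_of_node1 = [child_node(1, i, m) for i in range(m + 1)]
```

No child pointer is stored anywhere: the position of a child follows from the parent's number and the branch taken.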
Useful Formulas for CSS-trees
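n = # of elements in the sorted array
m = # of elements (keys) per node
N = # of leaf nodes
k = $\lceil \log_{m+1} N \rceil$
Children of a node b are nodes $b(m+1)+1$ to $b(m+1)+(m+1)$ (EQ 1)
$n = N \cdot m$, i.e. $N = n/m$ (EQ 2)
# of Internal Nodes = $\frac{(m+1)^k - 1}{m} - \left\lfloor \frac{(m+1)^k - N}{m} \right\rfloor$ (EQ 3)
First leaf node in bottom level = $\frac{(m+1)^k - 1}{m}$ (EQ 4)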
How it works
[Figure: three panels, "Sorted array", "CSS-tree array (Directory)" and "Full CSS-tree". The sorted array holds 10 keys; the tree nodes are labelled node 0 through node 6 and marked as internal nodes or leaf nodes.]
Values (Lemma 4.1)
m (# keys per node) = 2
n (# keys) = 10
k ($\lceil \log_{m+1} N \rceil$) = 2
N (# of leaf nodes) = 5
Internal nodes = 2
First leaf node in bottom level = 4
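These values can be checked with a short calculation. The closed forms used here are an assumption: they are reconstructed to agree with the numbers listed on the slide (internal nodes = ((m+1)^k - 1)/m - floor(((m+1)^k - N)/m), first bottom-level leaf = ((m+1)^k - 1)/m), and the function name `css_shape` is hypothetical.

```python
def css_shape(n, m):
    """Return (N, k, internal_nodes, first_bottom_leaf) for n keys, m keys per node."""
    N = -(-n // m)  # ceil(n / m): number of leaf nodes
    k = 0
    power = 1  # (m + 1) ** k, kept as an integer to avoid float logs
    while power < N:
        power *= m + 1
        k += 1
    first_bottom_leaf = (power - 1) // m  # ((m+1)^k - 1) / m
    internal = first_bottom_leaf - (power - N) // m
    return N, k, internal, first_bottom_leaf


shape = css_shape(n=10, m=2)
```

For n = 10 and m = 2 this yields 5 leaf nodes, k = 2, 2 internal nodes, and node 4 as the first leaf in the bottom level.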
Building a full CSS-tree
Searching Within a Node
[Figure: a node holding the keys 1 through 8]
Level CSS-trees
[Figure: a node holding the keys 1 through 7, followed by a slot labelled "Value of largest key in subtree"]
m = $2^t$
Entries per node = m - 1
Level vs. Full CSS-trees
• Level CSS-trees will be deeper due to the difference in branching factor
• Level CSS-trees have fewer comparisons per node
• Level CSS-trees have more cache accesses and node traversals
Comparisons (level vs. full): $\log_2 N$ vs. $\log_2 N \cdot \log_{m+1} m \cdot (1 + 2/(m+1))$
Node traversals (level vs. full): $\log_m N$ vs. $\log_{m+1} N$
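To make the comparison concrete, the sketch below evaluates the expressions for a given N and m. Which expression belongs to which variant is inferred from the slide title order (level first, full second) and the bullets; the function name and dictionary keys are illustrative.

```python
import math


def level_vs_full(N, m):
    """Evaluate the comparison and traversal counts for both variants."""
    log2_n = math.log2(N)
    return {
        "level_comparisons": log2_n,
        "full_comparisons": log2_n * math.log(m, m + 1) * (1 + 2 / (m + 1)),
        "level_traversals": math.log(N, m),
        "full_traversals": math.log(N, m + 1),
    }


counts = level_vs_full(N=10**6, m=16)
```

For m = 16 the level variant needs somewhat fewer comparisons in total, while the full variant visits slightly fewer nodes because its branching factor is m + 1 rather than m.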
Time Analysis
R (size of rid) = 4 bytes
K (size of key) = 4 bytes
P (size of pointer) = 4 bytes
h = 1.2
n (# records) = $10^7$
c (cache line) = 32 bytes
s (node size/c) = 1
D = time to dereference a pointer
A_b = time to compute child address for binary search
A_fcss = time to compute child address for full CSS
A_lcss = time to compute child address for level CSS
s = mK/c
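As a small worked example, the relation s = mK/c can be solved for m with the parameter values listed above. The helper name is hypothetical, and n is taken to be 10^7 records, which is an assumed reading of the listed value.

```python
def keys_per_node(s, c, K):
    """Solve s = m * K / c for m (keys per node)."""
    return (s * c) // K


m = keys_per_node(s=1, c=32, K=4)  # one 32-byte cache line of 4-byte keys
n = 10**7                          # assumed number of records
leaf_nodes = -(-n // m)            # ceil(n / m)
```

With these values a node holds 8 keys, so the sorted array splits into 1,250,000 leaf nodes.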
Space Analysis
R (size of rid) = 4 bytes
K (size of key) = 4 bytes
P (size of pointer) = 4 bytes
h = 1.2
n (# records) = $10^7$
c (cache line) = 32 bytes
s (node size/c) = 1
D = time to dereference a pointer
A_b = time to compute child address for binary search
A_fcss = time to compute child address for full CSS
A_lcss = time to compute child address for level CSS
s = mK/c
Experiment
• Results are for Ultra Sparc II, with caches given as <cache size, line size, associativity>
  – <16K, 32B, 1>
  – <1M, 64B, 1>
• Keys are randomly generated integers between 0 and 1 million
• Performed 5 tests of 100,000 searches for random keys
Figure 5a: Array Size vs. Time
Figure 5b: Array Size vs. Time
Figure 6a: Array Size vs. 2nd cache accesses
Figure 6b: Array Size vs. 2nd cache misses
Figure 7: Node Size vs. Time
CSS Performance on Other Queries
• CSS is very good for individual selection queries
• CSS will probably perform the best in range queries
• Index nested loops join vs. Sort merge join
Doubts About CSS
• Flexibility of CSS-trees across different cache designs
• Any applicability to variable sized records
• Multiple CSS-tree indices on different keys
Conclusion
• CSS-trees improve search performance by exploiting cache behavior.
One Last Thought
• Cache designs
• Should we redesign them to let programmers have control?