class 18 hashing -...

Post on 06-Mar-2018

hashing

prof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 18

CS165, Fall 2015, Stratos Idreos

database

professors(id, name, …)

courses(id, name, profId, …)

students(id, name, …)

enrolled(studentId, courseId, …) foreign key

give me all students enrolled in cs165

select students.name from students, enrolled, courses where courses.name='cs165' and enrolled.courseId=courses.id and students.id=enrolled.studentId

join

nested loops

L R

new resL[]; new resR[]; k=0
for (i=0; i<L.size; i++)
    for (j=0; j<R.size; j++)
        if L[i]==R[j]
            resL[k]=i
            resR[k++]=j
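
The nested loops pseudocode can be sketched in Python as follows. This is a minimal illustration, assuming both inputs are plain in-memory lists; the function name `nested_loops_join` is made up for this example.

```python
def nested_loops_join(left, right):
    """Return (res_left, res_right): positions i, j with left[i] == right[j]."""
    res_left, res_right = [], []
    for i, lval in enumerate(left):        # outer loop over L
        for j, rval in enumerate(right):   # inner loop over R
            if lval == rval:               # join condition
                res_left.append(i)
                res_right.append(j)
    return res_left, res_right


res_l, res_r = nested_loops_join([5, 7, 5, 9], [7, 5, 1])
```

Every value of L is compared with every value of R, so the cost grows with L.size * R.size.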

hash join

join input 1 -> hash -> hash table

join input 2 -> hash -> probe the hash table
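
A small sketch of the two steps, assuming both inputs fit in memory and using Python's built-in dictionary as the hash table. The names `hash_join`, `build` and `probe` are illustrative and not from the slides.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Hash `build` into a table, then probe it with every value of `probe`.

    Returns (build_pos, probe_pos) pairs for all matching values.
    """
    table = defaultdict(list)
    for pos, val in enumerate(build):      # build phase: join input 1
        table[val].append(pos)
    result = []
    for j, val in enumerate(probe):        # probe phase: join input 2
        for i in table.get(val, ()):       # .get avoids creating empty entries
            result.append((i, j))
    return result


pairs = hash_join([5, 7, 5, 9], [7, 5, 1])
```

Each input is read once, in contrast to the quadratic number of comparisons in nested loops.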

hash table

N keys, k buckets

entry: val (key), pos

hash: h=f(val), bucket=h mod k

bucket: [(val1,pos1), (val7,pos7), …]
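
As an illustration, a bucketed hash table along these lines might look as follows. This is only a sketch: Python's built-in `hash` stands in for f(val), and the class and method names are assumptions made for this example.

```python
class HashTable:
    """k buckets, each a list of (val, pos) entries."""

    def __init__(self, k):
        self.k = k
        self.buckets = [[] for _ in range(k)]

    def _bucket(self, val):
        h = hash(val)          # stand-in for h=f(val)
        return h % self.k      # bucket=h mod k

    def insert(self, val, pos):
        self.buckets[self._bucket(val)].append((val, pos))

    def lookup(self, val):
        # scan only the one bucket that val hashes to
        return [p for v, p in self.buckets[self._bucket(val)] if v == val]


ht = HashTable(4)
for pos, val in enumerate([1, 5, 2]):
    ht.insert(val, pos)
```

With k=4, the values 1 and 5 land in the same bucket, so that bucket holds a short list of entries.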

partition

Level 1: stream input buffer, partition buffers p1 p2 p3 p4

Level 2: partitions p1 p2 p3 p4

1. read input into stream buffer, hash and write to respective partition buffer
2. when input buffer is consumed, bring the next one
3. when a partition buffer is full, write to Level 2

we can partition into L1-1 pieces in one pass (L1 buffers fit in Level 1; one of them is the stream input buffer)
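
The three steps above can be sketched as follows. This is a simplified model, assuming Level 2 is just another Python list and a buffer holds `buffer_size` values; the names `partition`, `buffers` and `level2` are hypothetical.

```python
def partition(stream, num_partitions, buffer_size):
    """Hash-partition `stream`, staging values in small partition buffers."""
    buffers = [[] for _ in range(num_partitions)]   # Level 1 partition buffers
    level2 = [[] for _ in range(num_partitions)]    # Level 2 partitions
    for val in stream:                              # consume the stream input
        p = hash(val) % num_partitions
        buffers[p].append(val)
        if len(buffers[p]) == buffer_size:          # buffer full: write to Level 2
            level2[p].extend(buffers[p])
            buffers[p].clear()
    for p, buf in enumerate(buffers):               # flush what is left at the end
        level2[p].extend(buf)
    return level2


parts = partition(range(10), num_partitions=4, buffer_size=2)
```

Writes to Level 2 happen one full buffer at a time, which is the point of staging values in Level 1 first.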

grace hash join

hash partitioning: partition join input 1 and join input 2 with the same hash function

then join each pair of partitions independently in memory

joining one pair of partitions (p1 left, p1 right)

Level 2: p1 left, p1 right, result

Level 1: stream input (left), hash table on p1 right (L1-2), result

probe the hash table with the streamed p1 left and write out the result

as long as at least one of the pieces <=L1-2

grace hash join

hash partitioning

apply recursively if a partition does not fit in memory (L1-2)
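
A rough sketch of the recursive scheme, under several assumptions that are not in the slides: inputs are lists of (key, pos) pairs, `memory_limit` plays the role of L1-2, each level uses a different hash by mixing the level into the hashed value, and a depth cap guards against keys that are all identical.

```python
from collections import defaultdict


def in_memory_join(left, right):
    """Hash the right side, probe with the left; returns (left_pos, right_pos)."""
    table = defaultdict(list)
    for key, pos in right:
        table[key].append(pos)
    return [(lpos, rpos) for key, lpos in left for rpos in table.get(key, ())]


def grace_hash_join(left, right, memory_limit, fanout=4, level=0):
    # Join directly once at least one side fits in memory. The depth cap is an
    # added safeguard: heavily duplicated keys cannot be separated by hashing.
    if min(len(left), len(right)) <= memory_limit or level >= 8:
        return in_memory_join(left, right)
    lparts = [[] for _ in range(fanout)]
    rparts = [[] for _ in range(fanout)]
    for key, pos in left:                  # same hash function for both inputs
        lparts[hash((level, key)) % fanout].append((key, pos))
    for key, pos in right:
        rparts[hash((level, key)) % fanout].append((key, pos))
    result = []
    for lp, rp in zip(lparts, rparts):     # join each pair of partitions
        result.extend(grace_hash_join(lp, rp, memory_limit, fanout, level + 1))
    return result


left = [(k, k) for k in range(20)]
right = [(k, k - 10) for k in range(10, 30)]
matches = grace_hash_join(left, right, memory_limit=3)
```

For brevity the in-memory step always hashes the right side; building on the smaller of the two pieces would match the "at least one of the pieces" condition more closely.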

today: how to create the hash table

query plan: select R.C, select S.F, fetch R.A, fetch S.A, join, max R.D, min S.G

static hashing

hash table

bucket

updates may create long lists

can this happen anyway, i.e., even with no updates?

simple solution would be to double the hash table and rehash everything…

but…

dynamic hashing

extendible hashing - linear hashing

extendible hashing

hash table = directory + buckets

when there is no more space, split individual buckets and, if needed, double the directory

with directory in memory we need 1 I/O to fetch the bucket

not true from memory to caches in general (the directory lookup can itself be a cache miss)

directory maps hash values to buckets

h=hash(key)

use binary form of h

use X last bits to map 2^X buckets

for X=2: 00 01 10 11

for X=3: 000 001 010 011 100 101 110 111

extending the directory …

directory (X=2): 00 01 10 11 -> buckets, one of them full

directory (X=3): 000 001 010 011 100 101 110 111 -> buckets

split: the full bucket is split and its entries are shared with a new bucket

can split: more than one directory entry points to the full bucket

cannot split: only one directory entry points to the full bucket, so the directory is doubled first

extend directory only when no split is possible
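
A compact sketch of extendible hashing as described above. Several details are assumptions for the example: each bucket remembers how many hash bits it covers (`depth`), `capacity` is the number of keys per bucket, Python's `hash` is the hash function, and keys are assumed distinct (many equal keys would keep splitting forever in this simplified version).

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth      # number of last hash bits this bucket covers
        self.items = []


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.x = 1                              # directory uses the X last bits
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        return hash(key) & ((1 << self.x) - 1)

    def lookup(self, key):
        return key in self.directory[self._slot(key)].items

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.items) < self.capacity:
            bucket.items.append(key)
            return
        if bucket.depth == self.x:              # cannot split: double the directory
            self.directory = self.directory + self.directory
            self.x += 1
        bit = 1 << bucket.depth                 # the extra bit that separates the halves
        bucket.depth += 1
        new = Bucket(bucket.depth)
        old_items, bucket.items = bucket.items, []
        for k in old_items:                     # redistribute on the extra bit
            (new if hash(k) & bit else bucket).items.append(k)
        for slot in range(len(self.directory)):
            if self.directory[slot] is bucket and slot & bit:
                self.directory[slot] = new      # repoint half of the entries
        self.insert(key)                        # retry; may trigger another split


eh = ExtendibleHash(capacity=2)
for k in range(16):
    eh.insert(k)
```

Doubling copies only directory entries; bucket contents move only for the one bucket that is split.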

linear hashing

do not use a directory (indirection cost)

instead incrementally extend the hash table one bucket at a time

e.g., use 2 bits of hash value (h0)

00 01 10 11

next bucket to split: starts at 00 and moves to the following bucket after each split

directory is implicit

when insert causes overflow somewhere, split next bucket (not necessarily the bucket that is full)

use +1 bit of hash value for the split bucket

for X=2: 00 01 10 11 (with the next bucket to split)

for X=3: 000 001 010 011 100 101 110 111 (new buckets)

directory is implicit

search: use current hash function (bits); if the bucket is at or after the next bucket to split, ok; else use next hash function (bits+1)
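
The search rule can be written as a small function. This is a sketch: the parameter names `bits` and `next_to_split` are chosen for this example, and h is taken to be a non-negative integer hash value.

```python
def linear_hash_bucket(h, bits, next_to_split):
    """Bucket number for hash value h in a linear hash table."""
    b = h & ((1 << bits) - 1)              # current hash function: last `bits` bits
    if b < next_to_split:                  # bucket was already split in this round
        b = h & ((1 << (bits + 1)) - 1)    # next hash function: one more bit
    return b


# with 2 bits and bucket 00 already split, hash value 100 (binary) moves to bucket 100
bucket = linear_hash_bucket(0b100, bits=2, next_to_split=1)
```

No directory is consulted: the bucket number follows from the hash value, the current number of bits and the split pointer.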

for X=2: 00 01 10 11 (with the next bucket to split)

for X=3: 000 001 010 011 100 101 110 111 (C: a bucket marked in the figure)

keep splitting next bucket with each overflow until all buckets are split (for X=2), then restart for X=3

any problems? what would happen when we split bucket C?
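
To make the round-based splitting concrete, here is a minimal sketch of a linear hash table. The details are assumptions for illustration: a bucket is a Python list that may grow past `capacity` (standing in for an overflow chain), and every overflow triggers exactly one split of the next bucket.

```python
class LinearHash:
    def __init__(self, bits=2, capacity=2):
        self.bits = bits                  # bits used by the current hash function
        self.next = 0                     # next bucket to split
        self.capacity = capacity          # keys per bucket before it overflows
        self.buckets = [[] for _ in range(1 << bits)]

    def _bucket(self, key):
        b = hash(key) & ((1 << self.bits) - 1)
        if b < self.next:                 # already split: use one more bit
            b = hash(key) & ((1 << (self.bits + 1)) - 1)
        return b

    def lookup(self, key):
        return key in self.buckets[self._bucket(key)]

    def insert(self, key):
        target = self.buckets[self._bucket(key)]
        target.append(key)                # may exceed capacity (overflow)
        if len(target) > self.capacity:
            self._split_next()            # split the next bucket, not `target`

    def _split_next(self):
        old, self.buckets[self.next] = self.buckets[self.next], []
        self.buckets.append([])           # the table grows by exactly one bucket
        self.next += 1
        if self.next == 1 << self.bits:   # all buckets of this round are split
            self.bits += 1
            self.next = 0
        for key in old:                   # redistribute with the updated addressing
            self.buckets[self._bucket(key)].append(key)


lh = LinearHash(bits=2, capacity=2)
for k in range(20):
    lh.insert(k)
```

In this sketch a bucket that overflows stays over capacity until the split pointer reaches it, which is one possible reading of the question about bucket C.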

query plan: select R.C, select S.F, fetch R.A, fetch S.A, join, max R.D, min S.G

What can we do to start working immediately? (hint: vectorized processing)

What can we do if we wait for all data to arrive? (hint: bulk processing)

symmetric hash join

L R (one hash table, HT, per input)

while there exist buffered values from L: 1) hash, 2) probe HT of R, 3) output

when buffer is empty, switch!

while there exist buffered values from R: 1) hash, 2) probe HT of L, 3) output

when buffer is empty, switch!
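
A sketch of the alternation, assuming each input arrives as a list of batches (the buffers) and that step 1) inserts the value into the hash table of its own side. The function and variable names are illustrative.

```python
from collections import defaultdict


def symmetric_hash_join(left_batches, right_batches):
    """Join two inputs that arrive batch by batch; returns (left_pos, right_pos)."""
    ht_left, ht_right = defaultdict(list), defaultdict(list)
    out = []
    lpos = rpos = 0
    for step in range(max(len(left_batches), len(right_batches))):
        if step < len(left_batches):
            for val in left_batches[step]:             # buffered values from L
                ht_left[val].append(lpos)              # 1) hash into HT of L
                for r in ht_right.get(val, ()):        # 2) probe HT of R
                    out.append((lpos, r))              # 3) output
                lpos += 1
        if step < len(right_batches):                  # buffer empty: switch
            for val in right_batches[step]:            # buffered values from R
                ht_right[val].append(rpos)             # 1) hash into HT of R
                for l in ht_left.get(val, ()):         # 2) probe HT of L
                    out.append((l, rpos))              # 3) output
                rpos += 1
    return out


matches = symmetric_hash_join([[1, 2], [3]], [[2, 3], [1, 2]])
```

Each matching pair is produced exactly once, at the moment the later of its two values arrives, so results start flowing before either input is complete.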

static hashing with 2 passes

phase 1: keys -> hash -> count size of each bucket

phase 2: keys -> hash -> hash table

we now know exactly what hash table to build

phase 1: keys -> hash -> count size of each bucket

D: size1 size2 size3 … sizeK

D is an array of k slots, where k is the number of buckets we want to have

for each bucket we know its size

phase 2: keys -> hash -> hash table

we now know exactly what hash table to build

D (counts): size1 size2 size3 … sizeK

D (offsets): 0 D[0]+s1 D[1]+s2 D[2]+s3 …

pass D and sum all counts to get offsets in a sequentially stored hash table
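
The two phases can be sketched as follows, assuming the keys are in an in-memory list and the hash table is one flat array holding (key, pos) entries grouped by bucket. The names `build_two_pass`, `counts` and `offsets` are made up for this example; `counts` and `offsets` correspond to the two states of D.

```python
def build_two_pass(keys, k):
    """Build a sequentially stored hash table with k buckets in two passes."""
    counts = [0] * k
    for key in keys:                        # phase 1: count size of each bucket
        counts[hash(key) % k] += 1
    offsets = [0] * k                       # running sum: where each bucket starts
    for b in range(1, k):
        offsets[b] = offsets[b - 1] + counts[b - 1]
    table = [None] * len(keys)
    cursor = offsets.copy()                 # next free slot per bucket
    for pos, key in enumerate(keys):        # phase 2: place every key
        b = hash(key) % k
        table[cursor[b]] = (key, pos)
        cursor[b] += 1
    return table, offsets, counts


def lookup(table, offsets, counts, key):
    b = hash(key) % len(offsets)
    start = offsets[b]
    return [pos for k_, pos in table[start:start + counts[b]] if k_ == key]


table, offsets, counts = build_two_pass([4, 1, 8, 5, 2], k=4)
```

Because every bucket size is known before anything is written, the table needs no linked lists and no resizing.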

which hash function should we use?

h=a*key+b

bucket=h mod K

use X LSBs: with K=2^X, h mod K is the X least significant bits of h

can be done with binary ops only
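
A quick check that taking the X least significant bits agrees with the modulo when K=2^X. The constants a=31 and b=7 are arbitrary values picked for this illustration, not recommendations from the slides.

```python
def bucket_mod(key, a, b, k):
    h = a * key + b
    return h % k                     # bucket=h mod K


def bucket_lsb(key, a, b, x):
    h = a * key + b
    return h & ((1 << x) - 1)        # keep the X least significant bits


# K=8 buckets, i.e. X=3 bits
same = all(bucket_mod(key, 31, 7, 8) == bucket_lsb(key, 31, 7, 3)
           for key in range(1000))
```

The mask version uses a shift, a subtraction and a bitwise AND, which avoids the division hidden in the modulo.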

what happens after the join?

query plan: select R.C, select S.F, fetch R.A, fetch S.A, join, max R.D, min S.G

select max(R.D),min(S.G) from R,S where R.A=S.A and R.C<10 and S.F>30

block operator

access patterns

select R.A, R.B, R.C, S.A, S.B, S.C from R, S where R.J=S.J and …

preparing the R join input

R.* -> scan -> pos (ordered, sparse) -> fetch R.J -> pos + R.J (ordered, sparse)

same for the S join input

we need the original positions so we can fetch other R columns after the join

join input R.J: R.J + posR (ordered, sparse)

join input S.J: S.J + posS (ordered, sparse)

partition (=reorder) both join inputs to join

join result: posR + posS

both sides are unordered

radix declustering

select R.A, R.B, R.C, S.A, S.B, S.C from R, S where R.J=S.J and …

join result: posR + posS, both sides are unordered

cluster on R -> join result clustered on R (posR + posS)

fetch R payload with sequential pattern -> IDs 1 2 3 4: ID + posS

cluster on S -> IDs 4 2 3 1: ID + posS (clustered)

fetch S payload with sequential pattern -> IDs 4 2 3 1: e.g., ID + S.A

sort on ID (decluster)

all result columns aligned

Cache-Conscious Radix Decluster Projections. S. Manegold, P. Boncz, N. Nes, and M. Kersten. Very Large Databases Conference, 2004
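
The cluster, fetch and decluster sequence can be sketched in a few lines. Note the substitution: this example uses Python's general-purpose sort where the cited paper uses radix clustering and radix declustering, so it only illustrates the data flow. The function and argument names are hypothetical.

```python
def project_after_join(pos_r, pos_s, r_col, s_col):
    """Fetch one R column and one S column for an unordered join result."""
    # cluster on R: order the join result by posR so R is read sequentially
    pairs = sorted(zip(pos_r, pos_s))
    r_out = [r_col[pr] for pr, _ in pairs]                    # fetch R payload
    id_pos_s = [(i, ps) for i, (_, ps) in enumerate(pairs)]   # ID + posS
    # cluster on S: reorder by posS so S is read sequentially
    id_pos_s.sort(key=lambda t: t[1])
    id_s_val = [(i, s_col[ps]) for i, ps in id_pos_s]         # ID + S.A
    # decluster: sort on ID so the S values line up with the R values again
    id_s_val.sort(key=lambda t: t[0])
    s_out = [v for _, v in id_s_val]
    return r_out, s_out


r_out, s_out = project_after_join(
    pos_r=[2, 0, 3, 1], pos_s=[1, 3, 0, 2],
    r_col=["r0", "r1", "r2", "r3"], s_col=["s0", "s1", "s2", "s3"])
```

After the final sort, row i of `r_out` and row i of `s_out` belong to the same join match.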

Textbook Chapter 11

Cache-Conscious Radix Decluster Projections. S. Manegold, P. Boncz, N. Nes, and M. Kersten. Very Large Databases Conference, 2004

DATA SYSTEMS

prof. Stratos Idreos

class 18

hashing
