class 8 indexing & sorting - harvard...
indexing & sorting
prof. Stratos Idreos
HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/
class 8
CS165, Fall 2017 Stratos Idreos /332
first part done: basic concepts in modern systems
coming up: indexing and fast scans
registers: my head, ~0
on chip cache (2x): this room, 1 min
on board cache (10x): this building, 10 min
memory (100x): New York, 1.5 hours
disk (100Kx): Pluto, 2 years
Jim Gray (IBM, Tandem, DEC, Microsoft): ACM Turing award, ACM SIGMOD Edgar F. Codd Innovations award
random access & page-based access
need to only read x… but have to read all of page 1
[figure: data value x sits in page1 of page1, page2, page3; data moves from disk through memory, on board cache, on chip cache and registers to the CPU]
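A minimal sketch of page-based access, assuming a hypothetical page size of 4 values (`PAGE_SIZE`, `read_value` and the sample data are illustrative names, not part of the course code). It shows that fetching a single value x still moves the whole page that contains it.

```python
PAGE_SIZE = 4  # values per page (assumed for illustration)

def read_value(pages, position):
    """Return the value at `position` and the number of values moved to get it."""
    page_id = position // PAGE_SIZE   # which page holds the value
    offset = position % PAGE_SIZE     # where the value sits inside that page
    page = pages[page_id]             # the whole page moves up the hierarchy
    return page[offset], len(page)

data = list(range(100, 112))          # 12 values -> 3 pages
pages = [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)]
x, moved = read_value(pages, 2)       # x lives in the first page
print(x, moved)
```

We asked for one value but moved a full page, which is why the layout of values inside pages matters so much in the rest of the class.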
column-storage
A B C D
it all starts with how we lay out the data (bits)
row-store and column-store are just two extremes in the design space
row-storage
A B C D
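A small sketch of the two extremes, assuming a toy table with attributes A, B, C, D and made-up values. The same tuples are flattened once row by row and once column by column; the helper names are illustrative only.

```python
rows = [
    (1, 10, 100, 1000),   # one tuple: attributes A, B, C, D
    (2, 20, 200, 2000),
    (3, 30, 300, 3000),
]

# row-storage: all attributes of a tuple are contiguous
row_store = [value for row in rows for value in row]

# column-storage: all values of an attribute are contiguous
column_store = [value for column in zip(*rows) for value in column]

def scan_b_row_store(flat, n_attrs=4):
    # reading B strides over A, C and D as well
    return flat[1::n_attrs]

def scan_b_column_store(flat, n_rows=3):
    # B is one dense slice
    return flat[n_rows:2 * n_rows]

print(scan_b_row_store(row_store), scan_b_column_store(column_store))
```

Both scans return the same B values; what differs is how many unrelated bytes sit between them in memory.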
as you are starting your projects, remember: come to office hours/lab - read/post in Piazza
the distributed API, DSL, and code are supposed to help you start fast; diverging is perfectly OK
functionality goal: select max(R.a), min(S.a) from R, S where R.j=S.j and R.b<20 and S.c>10 and S.d<50
+updates and persistency
performance goal: scalability (cores/queries)
cache conscious
vectorized processing: how to
select max(A) from R where B<20
p=select(B,null,20)
a=fetch(A,p)
res=max(a)
rewrite to
j=0;
for(i=0; i<B.size; i+=vector.size){
  p=select(B,i,vector.size,null,20)
  a=fetch(A,p)
  rv[j++]=max(a)
}
res=max(rv)
Extra: Enhanced stream processing in a DBMS kernel. Erietta Liarou, Stratos Idreos, Stefan Manegold, Martin Kersten. In Proc. of the International Conf. on Extending Database Technology, 2013
optimizer: assume the optimizer does the rewriting and focus on the analysis of property X - vectorized vs column-at-a-time
take plans from here:
edge cases not included :)
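The two plans for `select max(A) from R where B<20` can be sketched in runnable form as follows. This is an illustration under assumptions, not the project API: `select_lt`, `fetch` and the sample columns are hypothetical, and the sketch handles the one edge case of a vector with no qualifying values.

```python
def select_lt(column, start, length, high):
    """Positions in [start, start+length) whose value is < high."""
    end = min(start + length, len(column))
    return [i for i in range(start, end) if column[i] < high]

def fetch(column, positions):
    return [column[p] for p in positions]

def max_column_at_a_time(A, B, high):
    p = select_lt(B, 0, len(B), high)     # one full intermediate position list
    a = fetch(A, p)
    return max(a) if a else None

def max_vectorized(A, B, high, vector_size):
    partial = []                           # rv in the plan above
    for i in range(0, len(B), vector_size):
        p = select_lt(B, i, vector_size, high)
        a = fetch(A, p)
        if a:                              # a vector may have no qualifying values
            partial.append(max(a))
    return max(partial) if partial else None

A = [7, 3, 9, 1, 8, 2, 6]
B = [25, 10, 30, 5, 19, 40, 15]
print(max_column_at_a_time(A, B, 20), max_vectorized(A, B, 20, 3))
```

Both plans return the same result; the vectorized one only ever materializes intermediates of at most `vector_size` positions.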
midterms
how to prepare
open book, notes, no laptop/discussion
material from lectures, “browse/read” readings
check all quizzes and questions
quiz-like questions - no exact answer
expectations: describe the design space - choose what you think is the best approach (>1 if we ask for it) and then analyze in detail all requests - if you made the wrong choice in the beginning it is OK - but say so if you find out in the end and explain as much as possible
explain all steps and tradeoffs
Saturday & Sunday before midterm: Office hours 10am-3/5pm with each one of the five TFs and Stratos (noon-1pm)
10/11
today+3
data access made better
[figure: query plan in which select operators read the data and feed joins and an aggregation (aggr)]
it all starts with the select operator
it touches all the data
filtering data: point/range queries
index
data
index knows structure of the data
an alternative data representation (data structure) of all or part of the data
why bother with creating/maintaining another data structure?
but wait, why not just sort the data (array) + binary search?
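A minimal sketch of the "sort + binary search" alternative, using Python's standard `bisect` module. The function name and the sample values are illustrative assumptions.

```python
from bisect import bisect_left

def range_positions(sorted_values, low, high):
    """Half-open slice [start, stop) of values v with low <= v < high."""
    start = bisect_left(sorted_values, low)
    stop = bisect_left(sorted_values, high)
    return start, stop

values = sorted([42, 7, 19, 3, 25, 11, 30])
start, stop = range_positions(values, 10, 30)
print(values[start:stop])
```

Two binary searches replace a full scan, and the qualifying values come back as one contiguous slice.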
ok let’s go with sorting for a while
A B C: initial state, columns in insertion order
A (sorted) B C
select B+C from R where A<10
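base columns, in insertion order:
A: a1 a2 a3 a4 a5
B: b1 b2 b3 b4 b5
C: c1 c2 c3 c4 c5
after sorting A only:
A: a5 a3 a2 a1 a4 (original positions: 5 3 2 1 4)
B: b1 b2 b3 b4 b5
C: c1 c2 c3 c4 c5
B, C values are out of order with respect to A
select on sorted A returns positions 2 1 4
fetching from B (b1 b2 b3 b4 b5) with these positions: intermediate out of order
sort or even better cluster at page boundaries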
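The out-of-order intermediate can be reproduced with a small sketch. The values of A and B and the predicate `A > 15` are made up so that sorted A comes out as a5 a3 a2 a1 a4; positions are 0-based here, while the slide counts from 1.

```python
from bisect import bisect_right

A = [30, 20, 10, 40, 5]               # a1..a5 in insertion order
B = [100, 200, 300, 400, 500]         # b1..b5 in insertion order

# sort A only, remembering the original position of every value
order = sorted(range(len(A)), key=lambda i: A[i])
sorted_A = [A[i] for i in order]

# select A > 15 on the sorted copy: qualifying values form a suffix
start = bisect_right(sorted_A, 15)
positions = order[start:]             # out of order with respect to B

# sorting the positions turns the fetch on B into a forward-only pass
fetched = [B[p] for p in sorted(positions)]
print(positions, fetched)
```

Sorting the position list is the cheap fix; clustering positions at page boundaries would go one step further and only guarantee that each page of B is visited once.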
[figure: applications send sql to the database kernel (algorithms/operators over data), which runs on cpu, memory and disk]
A B C: initial state, columns in insertion order
A (sorted) B C
A (sorted) B C, after we propagate the order of A to B and C
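Propagating the order of A can be sketched as applying the sort permutation of A to every other column. The function `propagate_order` and the sample columns are illustrative assumptions.

```python
def propagate_order(key_column, *other_columns):
    """Sort key_column and apply the same permutation to the other columns."""
    order = sorted(range(len(key_column)), key=lambda i: key_column[i])

    def reorder(column):
        return [column[i] for i in order]

    return [reorder(key_column)] + [reorder(c) for c in other_columns]

A = [30, 20, 10, 40, 5]
B = [100, 200, 300, 400, 500]
C = ["c1", "c2", "c3", "c4", "c5"]
sorted_A, aligned_B, aligned_C = propagate_order(A, B, C)
print(sorted_A, aligned_B, aligned_C)
```

After this step, position i refers to the same tuple in all three columns again, so a range found on A can be used directly on B and C.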
select max(D),min(E) from R where (A>10 and A<40) and (B>20 and B<60)
A (sorted): binary search for 10 & 40, giving pos1 & pos2
B: for all B values between pos1 & pos2: if B>20 and B<60 mark bit vector at pos i (e.g., 1 0 1 0)
D: for each marked position max(D)
…
avoid scan of A
avoid tuple reconstruction (TR) on B
work on a restricted area across all columns
good for memory hierarchy
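A sketch of the conjunctive plan, assuming A is sorted and B and D already follow the order of A. Only `max(D)` is shown; `min(E)` would reuse the same bit vector. The function name and the sample columns are hypothetical.

```python
from bisect import bisect_left, bisect_right

def max_d_conjunctive(A, B, D, a_low, a_high, b_low, b_high):
    """max(D) where a_low < A < a_high and b_low < B < b_high."""
    pos1 = bisect_right(A, a_low)     # first position with A > a_low
    pos2 = bisect_left(A, a_high)     # first position with A >= a_high
    # bit vector over the restricted area [pos1, pos2) only
    bits = [b_low < B[i] < b_high for i in range(pos1, pos2)]
    marked = [D[pos1 + k] for k, bit in enumerate(bits) if bit]
    return bits, (max(marked) if marked else None)

A = [5, 12, 18, 25, 33, 41, 50]       # sorted
B = [30, 15, 45, 70, 22, 55, 25]      # aligned with A
D = [9, 4, 7, 8, 3, 6, 1]             # aligned with A
bits, result = max_d_conjunctive(A, B, D, 10, 40, 20, 60)
print(bits, result)
```

Everything outside `[pos1, pos2)` is never touched in any column, which is where the benefit for the memory hierarchy comes from.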
select max(D),min(E) from R where (A>10 and A<40) or (B>20 and B<60)
A (sorted): binary search for 10 & 40, giving pos1 & pos2; all positions in between qualify
0 0 0 1 1 1 1 1 0 0 0 0
B: for all B values outside pos1 & pos2: if B>20 and B<60 mark bit vector at pos i
0 1 0 1 1 1 1 1 0 1 1 0
D: for each marked position max(D)
…
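The disjunctive variant can be sketched the same way, again assuming A is sorted and B and D follow its order. Positions inside the A range qualify without looking at B; B is checked only outside that range. Names and data are illustrative.

```python
from bisect import bisect_left, bisect_right

def max_d_disjunctive(A, B, D, a_low, a_high, b_low, b_high):
    """max(D) where (a_low < A < a_high) or (b_low < B < b_high)."""
    n = len(A)
    pos1 = bisect_right(A, a_low)
    pos2 = bisect_left(A, a_high)
    bits = [pos1 <= i < pos2 for i in range(n)]   # the A range qualifies as is
    for i in list(range(0, pos1)) + list(range(pos2, n)):
        if b_low < B[i] < b_high:                  # B is checked only outside it
            bits[i] = True
    marked = [D[i] for i in range(n) if bits[i]]
    return bits, (max(marked) if marked else None)

A = [5, 12, 18, 25, 33, 41, 50]       # sorted
B = [30, 15, 45, 70, 22, 65, 25]      # aligned with A
D = [2, 4, 7, 8, 3, 60, 1]            # aligned with A
bits, result = max_d_disjunctive(A, B, D, 10, 40, 20, 60)
print(bits, result)
```

Unlike the conjunctive case, the bit vector has to span the whole column, so the sorted order of A saves work on B but not on D.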
base data: A B C
projection sorted on A: A B C - queries that filter on A benefit
projection sorted on B: B A C - queries that filter on B benefit
…
C-Store: A Column-oriented DBMS. Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, Stanley B. Zdonik. In Proc. of the Very Large Databases Conference (VLDB), 2005
[figure: base data A B C with columns in insertion order, plus copies sorted on A (A B C), sorted on B (B A C), …]
space overhead - update overhead - which ones to build?
declarative interface: ask what you want
db system
DBA: indexes/views/tuning knobs
online
A B C: base data, initial state, columns in insertion order
storage budget much smaller than (<<) the possible set of projections
Browse: Self-organizing tuple reconstruction in column-stores. Stratos Idreos, Martin Kersten, Stefan Manegold. In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009
L1 memory
L2 memory
CPU
(assume simplified memory hierarchy)
cost to sort array Cs?
cost to find a value once sorted Ca?
optimized algorithm to minimize Cs & Ca
data does not fit in L1 memory; it fits in L2
CPU can read/write directly from/to L1 only
memory level L (size=3 pages)
memory level L+1
initial state: 8 unordered pages
quicksort in place
each page is now sorted; we read and wrote every page once
data movement cost is 2N pages
memory level L (size=3 pages)
memory level L+1
initial state: 8 sorted pages
merge to new page
each pair of pages is now sorted; we read and wrote every page once
data movement cost is 2N pages (total 2N+2N)
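The first two passes can be sketched as follows, assuming 8 pages of 2 values each and counting one page read plus one page write per page and pass. `PAGE_SIZE`, `sort_pages` and `merge_pairs` are illustrative names, and `sorted` stands in for the in-place quicksort.

```python
from heapq import merge

PAGE_SIZE = 2  # values per page (assumed for illustration)

def sort_pages(pages):
    """Pass 0: read each page, sort it, write it back: 2 moves per page."""
    return [sorted(page) for page in pages], 2 * len(pages)

def merge_pairs(runs):
    """One merge pass: merge adjacent pairs of sorted runs into longer runs."""
    merged_runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
    n_pages = sum(len(run) for run in merged_runs) // PAGE_SIZE
    return merged_runs, 2 * n_pages   # every page is read once and written once

pages = [[9, 4], [7, 1], [8, 3], [6, 2], [15, 10], [12, 16], [11, 14], [13, 5]]
runs, cost_sort = sort_pages(pages)
runs, cost_merge = merge_pairs(runs)
print(runs, cost_sort + cost_merge)
```

With N=8 pages each pass moves 16 pages, so the running total after two passes is 2N+2N=32.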
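1 pass to sort each page (2N pages)
1 pass to merge into sorted runs of 2 pages (2N pages)
1 pass to merge into sorted runs of 4 pages (2N pages)
1 pass to merge into sorted runs of 8 pages (2N pages)
merge passes: $\log_2(N)$; total passes: $\log_2(N)+1$
data movement: $2N(\log_2(N)+1)$ pages, i.e. $2N(\log_2(N)+1) \times \text{bytesPerPage}$ bytes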
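As a worked instance for the running example with N=8 pages, assuming a hypothetical page size of 4096 bytes (the page size is not given in the slides):

$$
\begin{aligned}
\text{passes} &= \log_2(N) + 1 = \log_2(8) + 1 = 4 \\
\text{pages moved} &= 2N(\log_2(N)+1) = 2 \cdot 8 \cdot 4 = 64 \\
\text{bytes moved} &= 64 \times \text{bytesPerPage} = 64 \times 4096 = 262144
\end{aligned}
$$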
in general, we have M pages in memory (not just 3), so we can merge M-1 sorted runs at a time
$2N(\log_2(N)+1) \rightarrow 2N(\log_{M-1}(N)+1)$
in our first pass we can immediately sort groups of M pages
$2N(\log_{M-1}(N)+1) \rightarrow 2N(\log_{M-1}(N/M)+1)$
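A small cost model for the general case. It counts passes with integer arithmetic instead of evaluating the logarithm, which amounts to rounding the formula up when N/M is not a power of M-1; that rounding is an assumption of this sketch, and `external_sort_cost` is an illustrative name.

```python
from math import ceil

def external_sort_cost(n_pages, m_pages):
    """Return (passes, pages moved) for N data pages and M memory pages."""
    if m_pages < 3:
        raise ValueError("need at least 3 pages: 2 inputs and 1 output")
    runs = ceil(n_pages / m_pages)         # first pass sorts groups of M pages
    passes = 1
    while runs > 1:                        # each merge pass combines M-1 runs
        runs = ceil(runs / (m_pages - 1))
        passes += 1
    return passes, 2 * n_pages * passes

print(external_sort_cost(8, 3))
print(external_sort_cost(1000, 11))
```

For N=8 and M=3 this gives 3 passes and 48 page moves, compared with 4 passes and 64 page moves when the first pass sorts single pages only.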
data size: N pages
memory size: M pages
how much memory M do we need to sort N data in p passes only?
or
how much data can we sort in p passes if we have M memory?
$\log_{M-1}(N/M)+1 \le p$
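Rearranging the inequality gives N <= M(M-1)^(p-1), which answers both questions. A sketch with hypothetical function names; the search for M starts at 3 because a merge needs at least two input pages and one output page.

```python
def max_pages_sortable(m_pages, passes):
    """Largest N with log_{M-1}(N/M) + 1 <= p, i.e. N <= M * (M-1)**(p-1)."""
    return m_pages * (m_pages - 1) ** (passes - 1)

def min_memory_needed(n_pages, passes):
    """Smallest M (at least 3) that sorts n_pages in the given number of passes."""
    m = 3
    while max_pages_sortable(m, passes) < n_pages:
        m += 1
    return m

print(max_pages_sortable(11, 3))
print(min_memory_needed(1000, 3), min_memory_needed(1000, 2))
```

The amount of data sortable in p passes grows roughly as M to the power p, so a modest increase in memory removes a whole pass over the data.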
previous discussion holds for all levels of memory hierarchy
other usage of sorting, e.g.:
order by
group by
sort-merge join
remove duplicates
sort/cluster ids/positions to avoid random access
Notes to remember
Indexing helps navigate data faster than a scan
Indexing is (sometimes) just another way to organize data
We need to consider all levels of the memory hierarchy when we design our algorithms, and to optimally use all available bytes
Read textbook: Chapter 13
Browse: Self-organizing tuple reconstruction in column-stores. Stratos Idreos, Martin Kersten, Stefan Manegold. In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009
DATA SYSTEMS
prof. Stratos Idreos
class 8
indexing & sorting