lecture 17 joins - github pages · 2017-12-13 · how do i choose the right graduate program?...
TRANSCRIPT
Lecture 17: Joins

Graduate School Information Panel
Thursday, Nov 9 @ 3:00 PM, 1240 CS
• Should I attend graduate school in CS?
• How do I prepare a competitive application?
• How do I choose the right graduate program?
Join us for a live Q&A with CS faculty, graduate students, and a graduate school admissions coordinator!

Today’s Lecture
1. Recap: Select, Project
2. Joins
3. Joins and Buffer Management

1. Recap

Logical Plan = How

    SELECT P.buyer
    FROM Purchase P, Person Q
    WHERE P.buyer = Q.name
      AND Q.city = 'Madison'

Plan tree (root first):

    PROJECT on buyer
      JOIN buyer = name
        Purchase
        SELECT city = 'Madison'
          Person

Physical Plan = What

    SELECT P.buyer
    FROM Purchase P, Person Q
    WHERE P.buyer = Q.name
      AND Q.city = 'Madison'

Plan tree (root first):

    Hash-based Project
      Nested Loop Join
        Table Scan (Purchase)
        Index Scan (Person)

Select Operator
access path = way to retrieve tuples from a table
• File Scan
  • scan the entire file
  • I/O cost: O(N), where N = #pages
• Index Scan
  • use an index available on some predicate
  • I/O cost: it varies depending on the index

Index Matching
• We say that an index matches a selection predicate if the index can be used to evaluate it
• Consider a conjunction-only selection. An index matches (part of) a predicate if
  • Hash: only equality operation & the predicate includes all index attributes
  • B+ Tree: the attributes are a prefix of the search key (any ops are possible)

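The two matching rules can be sketched as small predicates. This is only an illustration: the function names and the representation of a conjunctive predicate as a dict from attribute to operator are assumptions, not part of the lecture.

```python
def hash_index_matches(index_attrs, predicate):
    """Hash index rule: every index attribute appears with an equality.

    `predicate` is a hypothetical encoding of a conjunction,
    e.g. {"a": "=", "b": ">"} for "a = ... AND b > ...".
    """
    return all(predicate.get(attr) == "=" for attr in index_attrs)


def btree_matched_prefix(search_key, predicate):
    """B+ tree rule: return the longest prefix of the search key whose
    attributes all occur in the predicate (any operator is allowed).
    An empty list means the index does not match."""
    prefix = []
    for attr in search_key:
        if attr not in predicate:
            break
        prefix.append(attr)
    return prefix
```

For instance, with a B+ tree on (a, b, c), the predicate `{"a": "=", "b": ">"}` matches on the prefix (a, b), while `{"b": "="}` does not match at all because a is missing.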
Choosing the Right Index
• Selectivity of an access path = fraction of data pages that need to be retrieved
• We want to choose the most selective path!
• Estimating the selectivity of an access path is a hard problem

Projection
Simple case: SELECT R.a, R.d
• scan the file and for each tuple output R.a, R.d
Hard case: SELECT DISTINCT R.a, R.d
• project out the attributes
• eliminate duplicate tuples (this is the difficult part!)

Projection: Sort-based
We can improve upon the naïve algorithm by modifying the sorting algorithm:
1. In Pass 0 of sorting, project out the attributes
2. In subsequent passes, eliminate the duplicates while merging the runs

Projection: Hash-based
2-phase algorithm:
• partitioning
  • project out attributes and split the input into B-1 partitions using a hash function h
• duplicate elimination
  • read each partition into memory and use an in-memory hash table (with a different hash function) to remove duplicates

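A minimal in-memory sketch of the two phases, assuming tuples are Python dicts and using plain lists in place of on-disk partitions. The name `hash_project_distinct` and the parameter `num_partitions` (playing the role of B-1) are illustrative only.

```python
def hash_project_distinct(tuples, attrs, num_partitions):
    """Hash-based SELECT DISTINCT over `attrs` (illustrative sketch)."""
    # Phase 1 (partitioning): project out the attributes and route each
    # projected tuple to a partition chosen by a hash function h.
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        projected = tuple(t[a] for a in attrs)
        partitions[hash(projected) % num_partitions].append(projected)

    # Phase 2 (duplicate elimination): handle one partition at a time.
    # A Python set stands in for the in-memory hash table; a real system
    # would use a second, independent hash function here.
    result = []
    for part in partitions:
        seen = set()
        for p in part:
            if p not in seen:
                seen.add(p)
                result.append(p)
    return result
```

Duplicates always hash to the same partition, which is why each partition can be de-duplicated independently of the others.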
2. Joins

1. Nested Loop Joins

What you will learn about in this section
1. RECAP: Joins
2. Nested Loop Join (NLJ)
3. Block Nested Loop Join (BNLJ)
4. Index Nested Loop Join (INLJ)

RECAP: Joins

Joins: Example
Example: Returns all pairs of tuples r ∈ R, s ∈ S such that r.A = s.A

    SELECT R.A, B, C, D
    FROM R, S
    WHERE R.A = S.A

R:

| A | B | C |
|---|---|---|
| 1 | 0 | 1 |
| 2 | 3 | 4 |
| 2 | 5 | 2 |
| 3 | 1 | 1 |

S:

| A | D |
|---|---|
| 3 | 7 |
| 2 | 2 |
| 2 | 3 |

R ⋈ S (built up one output tuple at a time in the slides):

| A | B | C | D |
|---|---|---|---|
| 2 | 3 | 4 | 2 |
| 2 | 3 | 4 | 3 |
| 2 | 5 | 2 | 2 |
| 2 | 5 | 2 | 3 |
| 3 | 1 | 1 | 7 |

Semantically: A Subset of the Cross Product

    SELECT R.A, B, C, D
    FROM R, S
    WHERE R.A = S.A

Example: Returns all pairs of tuples r ∈ R, s ∈ S such that r.A = s.A
1. Cross product: R × S
2. Filter by conditions (r.A = s.A)
The result is R ⋈ S: (2,3,4,2), (2,3,4,3), (2,5,2,2), (2,5,2,3), (3,1,1,7)
… Can we actually implement a join in this way?

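The "cross product, then filter" semantics can be written directly for the example relations. This is a sketch of the definition rather than a recommended implementation; the function name is illustrative.

```python
from itertools import product

R = [(1, 0, 1), (2, 3, 4), (2, 5, 2), (3, 1, 1)]  # columns A, B, C
S = [(3, 7), (2, 2), (2, 3)]                      # columns A, D


def join_via_cross_product(R, S):
    # Form every (r, s) pair, then keep only the pairs with r.A == s.A.
    return [(r[0], r[1], r[2], s[1])
            for r, s in product(R, S)
            if r[0] == s[0]]


result = join_via_cross_product(R, S)
```

Here the cross product has 4 × 3 = 12 pairs, of which 5 survive the filter. The wasted work on the other 7 pairs is what the algorithms in the rest of the lecture try to reduce.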
Notes
• We write R ⋈ S to mean join R and S by returning all tuple pairs where all shared attributes are equal
• We write R ⋈ S on A to mean join R and S by returning all tuple pairs where attribute(s) A are equal
• For simplicity, we’ll consider joins on two tables and with equality constraints (“equijoins”)
However, joins can merge more than 2 tables, and some algorithms do support non-equality constraints!

Nested Loop Joins

Notes
• We are again considering “IO aware” algorithms: we care about disk IO
• Given a relation R, let:
  • T(R) = # of tuples in R
  • P(R) = # of pages in R
• Note also that we omit ceilings in calculations… good exercise to put back in!
Recall that we read/write entire pages with disk IO

Nested Loop Join (NLJ)

Compute R ⋈ S on A:

    for r in R:
        for s in S:
            if r[A] == s[A]:
                yield (r, s)

1. Loop over the tuples in R. Cost: P(R)
   Note that our IO cost is based on the number of pages loaded, not the number of tuples!
2. For every tuple in R, loop over all the tuples in S. Cost: T(R)*P(S)
   Have to read all of S from disk for every tuple in R!
3. Check against join conditions
   Note that NLJ can handle things other than equality constraints… just check in the if statement!
4. Write out (to page, then when page full, to disk). Cost: OUT

Total cost: P(R) + T(R)*P(S) + OUT

What would OUT be if our join condition is trivial (if TRUE)?
OUT could be bigger than P(R)*P(S)… but usually not that bad

What if R (“outer”) and S (“inner”) switched?
Cost: P(S) + T(S)*P(R) + OUT
Outer vs. inner selection makes a huge difference: the DBMS needs to know which relation is smaller!

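The cost formula can be checked with a small simulation. The sketch below assumes each relation is a list of pages and each page is a list of tuples; the function name and the read counter are illustrative, and OUT is not counted.

```python
def nested_loop_join(outer_pages, inner_pages, key_outer=0, key_inner=0):
    """Tuple-at-a-time NLJ. Returns (matching pairs, number of page reads)."""
    reads = 0
    out = []
    for outer_page in outer_pages:          # each outer page is read once: P(outer)
        reads += 1
        for r in outer_page:
            for inner_page in inner_pages:  # full inner scan per outer tuple
                reads += 1
                for s in inner_page:
                    if r[key_outer] == s[key_inner]:
                        out.append((r, s))
    return out, reads


# The example relations, packed two tuples per page (an arbitrary choice).
R_pages = [[(1, 0, 1), (2, 3, 4)], [(2, 5, 2), (3, 1, 1)]]  # P(R)=2, T(R)=4
S_pages = [[(3, 7), (2, 2)], [(2, 3)]]                      # P(S)=2, T(S)=3

pairs_rs, reads_rs = nested_loop_join(R_pages, S_pages)  # R as outer
pairs_sr, reads_sr = nested_loop_join(S_pages, R_pages)  # S as outer
```

With R as the outer relation the count is P(R) + T(R)*P(S) = 2 + 4*2 = 10 reads; with S as the outer it is P(S) + T(S)*P(R) = 2 + 3*2 = 8, which illustrates why the choice of outer relation matters.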
IO-Aware Approach

Block Nested Loop Join (BNLJ)
Given B+1 pages of memory

Compute R ⋈ S on A:

    for each B-1 pages pr of R:
        for page ps of S:
            for each tuple r in pr:
                for each tuple s in ps:
                    if r[A] == s[A]:
                        yield (r, s)

1. Load in B-1 pages of R at a time (leaving 1 page each free for S & output). Cost: P(R)
   Note: There could be some speedup here due to the fact that we’re reading in multiple pages sequentially; however, we’ll ignore this here!
2. For each (B-1)-page segment of R, load each page of S. Cost: $\frac{P(R)}{B-1} P(S)$
   Note: Faster to iterate over the smaller relation first!
3. Check against the join conditions
   BNLJ can also handle non-equality constraints
4. Write out. Cost: OUT

Total cost: $P(R) + \frac{P(R)}{B-1} P(S) + \mathrm{OUT}$
Again, OUT could be bigger than P(R)*P(S)… but usually not that bad

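A page-level sketch of BNLJ with a read counter, under the same assumptions as the pseudocode: relations are lists of pages, B-1 pages of R are held at once, and OUT is ignored. The function name and data layout are illustrative.

```python
def block_nested_loop_join(R_pages, S_pages, B, key_r=0, key_s=0):
    """BNLJ with B+1 buffer pages. Returns (matching pairs, page reads)."""
    reads = 0
    out = []
    chunk = B - 1                            # pages of R kept in memory at once
    for i in range(0, len(R_pages), chunk):
        block = R_pages[i:i + chunk]
        reads += len(block)                  # each page of R is read once: P(R)
        for s_page in S_pages:               # one scan of S per block of R
            reads += 1
            for r_page in block:
                for r in r_page:
                    for s in s_page:
                        if r[key_r] == s[key_s]:
                            out.append((r, s))
    return out, reads


# One tuple per page, to keep the page arithmetic easy to follow.
R_pages = [[(1, 0, 1)], [(2, 3, 4)], [(2, 5, 2)], [(3, 1, 1)]]  # P(R) = 4
S_pages = [[(3, 7)], [(2, 2)], [(2, 3)]]                        # P(S) = 3

pairs, reads = block_nested_loop_join(R_pages, S_pages, B=3)
```

With B = 3 the outer relation is processed in 4 / (B-1) = 2 blocks, so the count is P(R) + 2 * P(S) = 4 + 6 = 10 reads, compared with 4 + 4*3 = 16 for tuple-at-a-time NLJ on the same layout.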
BNLJ vs. NLJ: Benefits of IO Aware
• In BNLJ, by loading larger chunks of R, we minimize the number of full disk reads of S
  • We only read all of S from disk for every (B-1)-page segment of R!
  • Still the full cross-product, but more done only in memory
NLJ: P(R) + T(R)*P(S) + OUT
BNLJ: $P(R) + \frac{P(R)}{B-1} P(S) + \mathrm{OUT}$
BNLJ is faster by roughly $\frac{(B-1)\,T(R)}{P(R)}$!

BNLJ vs. NLJ: Benefits of IO Aware
• Example:
  • R: 500 pages
  • S: 1000 pages
  • 100 tuples / page
  • We have 12 pages of memory (B = 11)
• NLJ: Cost = 500 + 50,000*1000 = 50 Million IOs ~= 140 hours
• BNLJ: Cost = $500 + \frac{500 \cdot 1000}{10}$ = 50 Thousand IOs ~= 0.14 hours
Ignoring OUT here…
A very real difference from a small change in the algorithm!

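The arithmetic in this example can be reproduced in a few lines. The 10 ms per IO used to convert IO counts into hours is an assumption chosen because it matches the quoted hour figures; the slides do not state it.

```python
import math


def nlj_cost(P_R, T_R, P_S):
    # P(R) + T(R) * P(S), ignoring OUT
    return P_R + T_R * P_S


def bnlj_cost(P_R, P_S, B):
    # P(R) + ceil(P(R) / (B-1)) * P(S), ignoring OUT
    return P_R + math.ceil(P_R / (B - 1)) * P_S


P_R, P_S, TUPLES_PER_PAGE, B = 500, 1000, 100, 11
T_R = P_R * TUPLES_PER_PAGE      # 50,000 tuples in R

nlj = nlj_cost(P_R, T_R, P_S)    # 50,000,500 IOs
bnlj = bnlj_cost(P_R, P_S, B)    # 50,500 IOs

SECONDS_PER_IO = 0.01            # assumption: 10 ms per IO
nlj_hours = nlj * SECONDS_PER_IO / 3600
bnlj_hours = bnlj * SECONDS_PER_IO / 3600
```

Under that assumption NLJ takes roughly 139 hours and BNLJ roughly 0.14 hours, a factor of about (B-1) * T(R) / P(R) = 10 * 100 = 1000.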
Smarter than Cross-Products

Smarter than Cross-Products: From Quadratic to Nearly Linear
• All joins that compute the full cross-product have some quadratic term
  • For example we saw:
    NLJ: P(R) + T(R)P(S) + OUT
    BNLJ: $P(R) + \frac{P(R)}{B-1} P(S) + \mathrm{OUT}$
• Now we’ll see some (nearly) linear joins:
  • ~O(P(R) + P(S) + OUT), where again OUT could be quadratic but is usually better
We get this gain by taking advantage of structure: moving to equality constraints (“equijoin”) only!

Index Nested Loop Join (INLJ)

Compute R ⋈ S on A:

    Given index idx on S.A:
    for r in R:
        for s in idx(r[A]):
            yield (r, s)

Cost: P(R) + T(R)*L + OUT
where L is the IO cost to access all the distinct values in the index; assuming these fit on one page, L ~ 3 is a good estimate.
→ We can use an index (e.g. B+ Tree) to avoid doing the full cross-product!

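A small in-memory sketch of INLJ. A Python dict stands in for the index on S.A, which is an assumption made for illustration: a real system would use a B+ tree or hash index on disk, and the lookup cost L is not modelled here.

```python
from collections import defaultdict


def build_index(S, key_s=0):
    """Map each join-key value to the list of S tuples carrying it."""
    idx = defaultdict(list)
    for s in S:
        idx[s[key_s]].append(s)
    return idx


def index_nested_loop_join(R, idx, key_r=0):
    """For each r, probe the index instead of scanning all of S."""
    for r in R:
        for s in idx.get(r[key_r], []):   # only the matching S tuples
            yield r, s


R = [(1, 0, 1), (2, 3, 4), (2, 5, 2), (3, 1, 1)]  # columns A, B, C
S = [(3, 7), (2, 2), (2, 3)]                      # columns A, D

pairs = list(index_nested_loop_join(R, build_index(S)))
```

Each tuple of R now triggers one index probe rather than a full pass over S, which is where the T(R)*L term in the cost comes from.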
3. Joins and Memory

What you will learn about in this section
1. Sort-Merge Join (SMJ)
2. Hash Join (HJ)
3. SMJ vs. HJ

Sort-Merge Join (SMJ)

What you will learn about in this section
1. Sort-Merge Join
2. “Backup” & Total Cost
3. Optimizations

Sort Merge Join (SMJ): Basic Procedure
To compute R ⋈ S on A:
1. Sort R, S on A using external merge sort
2. Scan sorted files and “merge”
3. [May need to “backup” - see next subsection]
Note that if R, S are already sorted on A, SMJ will be awesome!
Note that we are only considering equality join conditions here

SMJ Example: R ⋈ S on A with 3 page buffer
• For simplicity: Let each page be one tuple, and let the first value be A
• We show the file HEAD, which is the next value to be read!

Unsorted relations on disk:

    R: (5,b) (3,j) (0,a)
    S: (7,f) (0,j) (3,g)

1. Sort the relations R, S on the join key (first value)

    R: (0,a) (3,j) (5,b)
    S: (0,j) (3,g) (7,f)

2. Scan and “merge” on join key!
  • Heads (0,a) and (0,j) match: output (0,a,j)
  • Heads (3,j) and (3,g) match: output (3,j,g)
  • (5,b) and (7,f) find no match

Done! Output: (0,a,j), (3,j,g)

What happens with duplicate join keys?

Multiple tuples with Same Join Key: “Backup”
1. Start with sorted relations, and begin scan/merge…
[Animation: disk and main-memory buffer diagram. Both relations contain several tuples with join key 0. The merge first outputs (0,a,j) and (0,a,g); to join the next R tuple, (0,b), the scan of S has to return to (0,j).]
Have to “backup” in the scan of S and read a tuple we’ve already read!

Backup
• At best, no backup → scan takes P(R) + P(S) reads
  • For ex: if no duplicate values in join attribute
• At worst (e.g. full backup each time), scan could take P(R) * P(S) reads!
  • For ex: if all duplicate values in join attribute, i.e. all tuples in R and S have the same value for the join attribute
  • Roughly: For each page of R, we’ll have to back up and read each page of S…
• Often not that bad however, plus we can:
  • Leave more data in buffer (for larger buffers)
  • Can “zig-zag” (see animation)

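The scan/merge step, including the backup on duplicate join keys, can be sketched in memory as follows. Python's `sorted` stands in for external merge sort, lists stand in for files, and the function name is illustrative. The second test relation with duplicate key 0 is a made-up input for demonstration, not a transcription of the slide animation.

```python
def sort_merge_join(R, S, key_r=0, key_s=0):
    """Equijoin by sorting both inputs and merging, with backup on duplicates."""
    R = sorted(R, key=lambda t: t[key_r])
    S = sorted(S, key=lambda t: t[key_s])
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][key_r] < S[j][key_s]:
            i += 1                       # advance the side with the smaller key
        elif R[i][key_r] > S[j][key_s]:
            j += 1
        else:
            key = R[i][key_r]
            start = j                    # where this key's run begins in S
            while i < len(R) and R[i][key_r] == key:
                j = start                # "back up" to the start of the run
                while j < len(S) and S[j][key_s] == key:
                    out.append((R[i], S[j]))
                    j += 1
                i += 1
    return out


# The relations from the SMJ example (no duplicate keys, so no backup).
no_dups = sort_merge_join([(5, "b"), (3, "j"), (0, "a")],
                          [(7, "f"), (0, "j"), (3, "g")])

# Hypothetical input with a duplicated key: the S run for key 0 is re-read.
with_dups = sort_merge_join([(0, "a"), (0, "b")],
                            [(0, "j"), (0, "g"), (7, "f")])
```

When every key is unique, `start` is never revisited and the merge is a single forward pass over both inputs; when all tuples share one key, the inner run is rescanned for every outer tuple, which is the worst case described in the bullets.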
SMJ: Total cost
• Cost of SMJ is cost of sorting R and S…
• Plus the cost of scanning: ~P(R) + P(S)
  • Because of backup: in worst case P(R)*P(S); but this would be very unlikely
• Plus the cost of writing out: ~P(R) + P(S) but in worst case T(R)*T(S)
Total: ~Sort(P(R)) + Sort(P(S)) + P(R) + P(S) + OUT
Recall: $\mathrm{Sort}(N) \approx 2N\left(\log_B \frac{N}{2(B+1)} + 1\right)$
Note: this is using repacking, where we estimate that we can create initial runs of length ~2(B+1)

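The cost estimate can be turned into a small calculator. This sketch puts the ceilings back in and counts merge passes with integer arithmetic instead of a floating-point logarithm; the function names are illustrative, B is assumed to be at least 2, and OUT and any backup cost are excluded.

```python
def sort_cost(N, B):
    """Approximate IOs to externally sort N pages with B+1 buffer pages,
    assuming repacking yields initial runs of about 2(B+1) pages."""
    runs = -(-N // (2 * (B + 1)))     # ceil(N / (2(B+1))) initial runs
    merge_passes = 0
    while runs > 1:                   # each B-way merge pass divides the run count by B
        runs = -(-runs // B)
        merge_passes += 1
    # every pass (run creation + merges) reads and writes each page once
    return 2 * N * (merge_passes + 1)


def smj_cost(P_R, P_S, B):
    """Sort both relations, then one scan of each to merge."""
    return sort_cost(P_R, B) + sort_cost(P_S, B) + P_R + P_S
```

For example, with 100 buffer pages (B = 99), `smj_cost(1000, 500, 99)` evaluates to 4000 + 2000 + 1500 = 7500 IOs: each relation is sorted in two passes and then scanned once.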
SMJ vs. BNLJ: Steel Cage Match
• If we have 100 buffer pages, P(R) = 1000 pages and P(S) = 500 pages:
  • Sort both in two passes: 2*2*1000 + 2*2*500 = 6,000 IOs
  • Merge phase: 1000 + 500 = 1,500 IOs
  • = 7,500 IOs + OUT
• What is BNLJ?
  • $500 + 1000 \cdot \lceil 500/98 \rceil$ = 6,500 IOs + OUT
• But, if we have 35 buffer pages?
  • Sort Merge has same behavior (still 2 passes)
  • BNLJ? $500 + 1000 \cdot \lceil 500/33 \rceil$ = 16,500 IOs + OUT!
SMJ is ~linear vs. BNLJ is quadratic… But it’s all about the memory.

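The comparison can be recomputed as below. The helper names are illustrative; the sketch takes the smaller relation S as the BNLJ outer, treats the stated buffer size as B+1 (so B-1 pages hold the outer block), and keeps the ceiling on the number of outer blocks. With that ceiling the 35-page case evaluates to 16,500 IOs.

```python
import math


def bnlj_cost(P_outer, P_inner, buffer_pages):
    # buffer_pages = B+1, so buffer_pages - 2 = B-1 pages hold the outer block
    blocks = math.ceil(P_outer / (buffer_pages - 2))
    return P_outer + blocks * P_inner


def smj_two_pass_cost(P_R, P_S):
    # two sort passes, each reading and writing every page, plus one merge scan
    return 2 * 2 * P_R + 2 * 2 * P_S + (P_R + P_S)


P_R, P_S = 1000, 500
smj = smj_two_pass_cost(P_R, P_S)       # 7,500 IOs, for either buffer size
bnlj_100 = bnlj_cost(P_S, P_R, 100)     # 500 + 6 * 1000
bnlj_35 = bnlj_cost(P_S, P_R, 35)       # 500 + 16 * 1000
```

SMJ stays at 7,500 IOs in both settings, while BNLJ goes from cheaper than SMJ to more than twice as expensive once the buffer shrinks.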
![Page 63: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/63.jpg)
A Simple Optimization: Merges Merged!
Given B+1 buffer pages
• SMJ is composed of a sort phase and a merge phase
• During the sort phase, run passes of external merge sort on R and S
  • Suppose at some point, R and S have <= B (sorted) runs in total
  • We could do two merges (one for each of R & S) at this point, complete the sort phase, and start the merge phase…
  • OR, we could combine them: do one B-way merge and complete the join!
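The combined B-way merge/join described above can be sketched in memory as follows. This is an illustration only, not the lecture's implementation: runs are plain Python lists of tuples already sorted on the join key, the function name `merge_join_runs` is hypothetical, and the page-level buffer management is left out.

```python
import heapq
from itertools import groupby


def merge_join_runs(r_runs, s_runs, key=lambda t: t[0]):
    """Merge all sorted runs of R and S in one pass and emit joined pairs.

    Each run must already be sorted on the join key.
    """
    def tagged(run, tag):
        # Attach the join key and the source relation (0 = R, 1 = S) to each tuple.
        return ((key(t), tag, t) for t in run)

    streams = [tagged(run, 0) for run in r_runs] + [tagged(run, 1) for run in s_runs]
    merged = heapq.merge(*streams, key=lambda item: item[0])  # one multi-way merge

    output = []
    for _, group in groupby(merged, key=lambda item: item[0]):
        r_side, s_side = [], []
        for _, tag, t in group:
            (r_side if tag == 0 else s_side).append(t)
        # All tuples in this group share the join key, so pair them up.
        output.extend((r, s) for r in r_side for s in s_side)
    return output


r_runs = [[(0, "a"), (3, "a")], [(0, "j"), (5, "b")]]
s_runs = [[(0, "x"), (5, "y")]]
print(merge_join_runs(r_runs, s_runs))
```

Because the runs of both relations feed a single merge, the sorted relations are never written back to disk before joining, which is where the saved pass comes from.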
![Page 64: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/64.jpg)
Un-Optimized SMJ
Given B+1 buffer pages
[Figure: unsorted input relations R and S. Sort phase (ext. merge sort): split & sort each relation, followed by merge passes on each. Merge/join phase: joined output file created!]
![Page 65: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/65.jpg)
Simple SMJ Optimization
Given B+1 buffer pages
[Figure: unsorted input relations R and S. Sort phase (ext. merge sort): split & sort each relation, then merge until <= B total runs remain. Merge/join phase: one B-way merge/join; joined output file created!]
![Page 66: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/66.jpg)
Simple SMJ Optimization
Given B+1 buffer pages
• Now, on this last pass, we only do P(R) + P(S) IOs to complete the join!
• If we can initially split R and S into B total runs, each of length approx. <= 2(B+1) (assuming repacking lets us create initial runs of ~2(B+1)), then we only need 3(P(R) + P(S)) + OUT for SMJ!
  • 2 R/W per page to sort runs in memory, 1 R per page to B-way merge/join!
• How much memory for this to happen?
  • $\frac{P(R) + P(S)}{B} \leq 2(B+1) \Rightarrow \sim P(R) + P(S) \leq 2B^2$
  • Thus, $\max\{P(R), P(S)\} \leq B^2$ is an approximate sufficient condition
If the larger of R, S has <= B^2 pages, then SMJ costs 3(P(R) + P(S)) + OUT!
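The sufficient condition above is easy to encode. A minimal sketch, assuming a hypothetical helper name and returning `None` when the condition fails (the slides do not give a cost for that case):

```python
def smj_optimized_cost(p_r: int, p_s: int, b: int, out: int = 0):
    """Return 3(P(R) + P(S)) + OUT if max{P(R), P(S)} <= B^2, else None."""
    if max(p_r, p_s) > b * b:
        return None  # approximate sufficient condition not met
    return 3 * (p_r + p_s) + out


print(smj_optimized_cost(1000, 500, 99))   # 4500
print(smj_optimized_cost(20000, 500, 99))  # None, since 20000 > 99^2 = 9801
```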
![Page 67: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/67.jpg)
Takeaway points from SMJ
If input already sorted on join key, skip the sorts.
  • SMJ is basically linear.
  • Nasty but unlikely case: Many duplicate join keys.
SMJ needs to sort both relations
  • If max{P(R), P(S)} < B^2 then cost is 3(P(R) + P(S)) + OUT
Lecture 17 > Summary
![Page 68: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/68.jpg)
Hash Join (HJ)
![Page 69: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/69.jpg)
What you will learn about in this section
1. Hash Join
2. Memory requirements
![Page 70: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/70.jpg)
Recall: Hashing
• Magic of hashing:
  • A hash function h_B maps into [0, B-1]
  • And maps nearly uniformly
• A hash collision is when x != y but h_B(x) = h_B(y)
  • Note however that it will never occur that x = y but h_B(x) != h_B(y)
• We hash on an attribute A, so our hash function h_B(t) has the form h_B(t.A).
  • Collisions may be more frequent.
Lecture 17 > HJ
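These properties can be seen with a toy hash function. This is a sketch only: `h` is a hypothetical stand-in that takes the key modulo B, chosen because it is easy to follow by hand, and tuples are assumed to store the attribute A in position 0.

```python
def h(key: int, b: int) -> int:
    """Toy h_B: maps an integer key into [0, B-1]."""
    return key % b


def h_tuple(t, b: int, attr: int = 0) -> int:
    """h_B(t) = h_B(t.A), with A assumed to be stored at index `attr`."""
    return h(t[attr], b)


print(h(3, 2), h(5, 2))                      # 1 1  -> collision: 3 != 5 but same bucket
print(h_tuple((0, "a"), 2), h_tuple((0, "j"), 2))  # 0 0  -> equal keys always agree
```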
![Page 71: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/71.jpg)
Hash Join: High-level procedure
To compute R ⋈ S on A:
1. Partition Phase: Using one (shared) hash function h_B, partition R and S into B buckets
2. Matching Phase: Take pairs of buckets whose tuples have the same values for h_B, and join these
   • Use BNLJ here; or hash again → either way, operating on small partitions so fast!
Note again that we are only considering equality constraints here
We decompose the problem using h_B, then complete the join
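The two phases can be sketched in memory as follows. This is an illustration under assumptions, not the lecture's implementation: relations are lists of tuples with the join attribute at index 0, Python's built-in `hash` modulo B stands in for h_B, and a nested loop stands in for BNLJ inside each bucket pair.

```python
from collections import defaultdict


def hash_join(r, s, b, key=lambda t: t[0]):
    """Equijoin of r and s on `key` using B buckets."""
    # Partition phase: the same hash function is applied to both relations.
    r_buckets, s_buckets = defaultdict(list), defaultdict(list)
    for t in r:
        r_buckets[hash(key(t)) % b].append(t)
    for t in s:
        s_buckets[hash(key(t)) % b].append(t)

    # Matching phase: only buckets with the same hash value can contain matches.
    output = []
    for i in range(b):
        for tr in r_buckets[i]:
            for ts in s_buckets[i]:
                if key(tr) == key(ts):  # collisions share a bucket but do not join
                    output.append((tr, ts))
    return output


r = [(0, "a"), (3, "j"), (5, "b")]
s = [(0, "j"), (3, "b"), (7, "z")]
print(hash_join(r, s, b=2))
```

The equality check inside the bucket pair is still needed, since a bucket holds every key that hashes to it, not just one key.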
![Page 72: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/72.jpg)
Hash Join: High-level procedure
1. Partition Phase: Using one (shared) hash function h_B, partition R and S into B buckets
[Figure: relations R and S on disk are partitioned by h_B into buckets R1, R2 and S1, S2 on disk. More detail in a second…]
Note our new convention: pages each have two tuples (one per row)
![Page 73: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/73.jpg)
Hash Join: High-level procedure
2. Matching Phase: Take pairs of buckets whose tuples have the same values for h_B, and join these
[Figure: R and S on disk are partitioned by h_B into buckets R1, R2 and S1, S2. Join matching buckets.]
![Page 74: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/74.jpg)
Hash Join: High-level procedure
2. Matching Phase: Take pairs of buckets whose tuples have the same values for h_B, and join these
[Figure: R and S on disk are partitioned by h_B into buckets R1, R2 and S1, S2. Don't have to join the others! E.g. (S1 and R2)!]
![Page 75: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/75.jpg)
Hash Join Phase 1: Partitioning
Goal: For each relation, partition relation into buckets such that if h_B(t.A) = h_B(t'.A) they are in the same bucket
Given B+1 buffer pages, we partition into B buckets:
  • We use B buffer pages for output (one for each bucket), and 1 for input
  • The "dual" of sorting.
  • For each tuple t in input, copy to buffer page for h_B(t.A)
  • When page fills up, flush to disk.
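The partitioning pass above can be simulated with lists standing in for pages. This is a sketch under assumptions: the name `partition_pass` is hypothetical, pages hold two tuples each (the convention used in the slides' diagrams), the hash function is the key modulo the number of buckets, and "disk" is just a list of flushed pages per bucket.

```python
def partition_pass(pages, num_buckets, page_capacity=2,
                   h=lambda key, b: key % b):
    """One partitioning pass: 1 input page, one output buffer page per bucket."""
    buffers = [[] for _ in range(num_buckets)]  # in-memory output page per bucket
    disk = [[] for _ in range(num_buckets)]     # flushed pages per bucket
    for page in pages:                          # read one input page at a time
        for t in page:
            bucket = h(t[0], num_buckets)
            buffers[bucket].append(t)
            if len(buffers[bucket]) == page_capacity:  # page full: flush to disk
                disk[bucket].append(buffers[bucket])
                buffers[bucket] = []
    for bucket, buf in enumerate(buffers):      # flush partially filled pages
        if buf:
            disk[bucket].append(buf)
    return disk


pages = [[(0, "a"), (3, "a")], [(3, "j"), (0, "j")], [(5, "a"), (0, "j")], [(5, "b")]]
for i, bucket in enumerate(partition_pass(pages, num_buckets=2)):
    print(f"B{i}:", bucket)
```

With these assumed inputs, both buckets end up with two pages, which is more than the one page per bucket that B = 2 would ideally allow.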
![Page 76: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/76.jpg)
How big are the resulting buckets?
Given B+1 buffer pages
• Given N input pages, we partition into B buckets:
  • → Ideally our buckets are each of size ~N/B pages
• What happens if there are hash collisions?
  • Buckets could be > N/B
  • We'll do several passes…
• What happens if there are duplicate join keys?
  • Nothing we can do here… could have some skew in size of the buckets
![Page 77: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/77.jpg)
How big do we want the resulting buckets?
Given B+1 buffer pages
• Ideally, our buckets would be of size ≤ B−1 pages
  • 1 for input page, 1 for output page, B−1 for each bucket
• Recall: If we want to join a bucket from R and one from S, we can do BNLJ in linear time if for one of them (wlog say R), P(R) ≤ B−1!
  • And more generally, being able to fit bucket in memory is advantageous
• We can keep partitioning buckets that are > B−1 pages, until they are ≤ B−1 pages
  • Using a new hash key which will split them… We'll call each of these a "pass" again…
Recall for BNLJ: $P(R) + \frac{P(R)\,P(S)}{B-1}$
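Repeated partitioning with a fresh hash function per pass might look like the sketch below. Everything here is an assumption for illustration: the name `repartition`, the two-tuples-per-page convention, the use of `hash((depth, key))` to get a different hash function on each pass, and the `max_depth` cut-off, which stops the recursion when duplicate join keys make a bucket impossible to split.

```python
def repartition(tuples, b, page_capacity=2, depth=0, max_depth=4):
    """Split `tuples` until each bucket fits in B-1 pages (or max_depth is hit)."""
    pages = -(-len(tuples) // page_capacity)  # ceiling division
    if pages <= b - 1 or depth == max_depth:
        return [tuples]
    buckets = [[] for _ in range(b)]
    for t in tuples:
        # Salting with the depth gives a different hash function on every pass.
        buckets[hash((depth, t[0])) % b].append(t)
    result = []
    for bucket in buckets:
        if bucket:
            result.extend(repartition(bucket, b, page_capacity, depth + 1, max_depth))
    return result


keys = [0, 3, 3, 0, 5, 0, 5, 1, 2, 4, 6, 7]
tuples = [(k, i) for i, k in enumerate(keys)]
for bucket in repartition(tuples, b=3):
    print(bucket)
```

Tuples with the same join key always travel together, so a key that appears many times can still leave an oversized bucket no matter how many passes are made.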
![Page 78: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/78.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
We partition into B = 2 buckets using hash function h2 so that we can have one buffer page for each partition (and one for input)
[Figure: relation R on disk, holding the tuples (3,j), (0,j), (5,b), (5,a), (0,j), (0,a), (3,a).]
For simplicity, we'll look at partitioning one of the two relations - we just do the same for the other relation!
Recall: our goal will be to get B = 2 buckets of size <= B−1 → 1 page each
![Page 79: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/79.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
1. We read pages from R into the "input" page of the buffer…
[Figure: main memory buffer with one input page and two output (bucket) pages, 0 and 1; relation R on disk.]
![Page 80: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/80.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
2. Then we use hash function h2 to sort into the buckets, which each have one page in the buffer
[Figure: the input page holds (0,a), (3,a); h2(0) = 0, so (0,a) goes to output (bucket) page 0.]
![Page 81: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/81.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
2. Then we use hash function h2 to sort into the buckets, which each have one page in the buffer
[Figure: h2(3) = 1, so (3,a) goes to output (bucket) page 1.]
![Page 82: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/82.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
3. We repeat until the buffer bucket pages are full…
[Figure: bucket page 0 holds (0,a) and bucket page 1 holds (3,a); the remaining pages of R are still on disk.]
![Page 83: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/83.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
3. We repeat until the buffer bucket pages are full…
[Figure: the input page holds (3,j), (0,j); h2(3) = 1, so (3,j) goes to bucket page 1, which now holds (3,a), (3,j).]
![Page 84: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/84.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
3. We repeat until the buffer bucket pages are full…
[Figure: h2(0) = 0, so (0,j) goes to bucket page 0, which now holds (0,a), (0,j); bucket page 1 holds (3,a), (3,j).]
![Page 85: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/85.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
3. We repeat until the buffer bucket pages are full… then flush to disk
[Figure: the full bucket pages (0,a), (0,j) and (3,a), (3,j) are flushed to buckets B0 and B1 on disk.]
![Page 86: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/86.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
3. We repeat until the buffer bucket pages are full… then flush to disk
[Figure: on disk, B0 holds (0,a), (0,j) and B1 holds (3,a), (3,j); the next page of R, (5,a), (0,j), is read into the input page, with (5,b) still on disk.]
![Page 87: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/87.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
Note that collisions can occur!
[Figure: h2(5) = 1, so (5,a) goes to bucket page 1. Collision!!! h2(5) = h2(3) = 1, so keys 3 and 5 share bucket B1.]
![Page 88: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/88.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
Finish this pass…
[Figure: h2(0) = 0, so (0,j) goes to bucket page 0; (5,b) is still on disk.]
![Page 89: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/89.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
Finish this pass…
[Figure: the last page of R, (5,b), is read into the input page; the bucket pages hold (0,j) and (5,a).]
![Page 90: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/90.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
Finish this pass…
[Figure: h2(5) = 1, so (5,b) goes to bucket page 1, which now holds (5,a), (5,b). Collision!!! h2(5) = h2(3) = 1.]
![Page 91: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/91.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
Finish this pass…
[Figure: bucket page 0 holds (0,j) and bucket page 1 holds (5,a), (5,b); on disk, B0 holds (0,a), (0,j) and B1 holds (3,a), (3,j).]
![Page 92: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/92.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
[Figure: on disk, B0 holds the pages (0,a), (0,j) and (0,j); B1 holds the pages (3,a), (3,j) and (5,a), (5,b).]
We wanted buckets of size B−1 = 1… however we got larger ones due to:
(1) Duplicate join keys
(2) Hash collisions
![Page 93: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/93.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
[Figure: on disk, B0 holds the pages (0,a), (0,j) and (0,j); B1 holds the pages (3,a), (3,j) and (5,a), (5,b).]
To take care of larger buckets caused by (2) hash collisions, we can just do another pass!
What hash function should we use?
Do another pass with a different hash function, h'2, ideally such that:
h'2(3) != h'2(5)
![Page 94: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/94.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
[Figure: after the extra pass, B0 holds the pages (0,a), (0,j) and (0,j); B1 holds (3,a), (3,j); a new bucket B2 holds (5,a), (5,b).]
To take care of larger buckets caused by (2) hash collisions, we can just do another pass!
What hash function should we use?
Do another pass with a different hash function, h'2, ideally such that:
h'2(3) != h'2(5)
![Page 95: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/95.jpg)
Hash Join Phase 1: Partitioning
Given B+1 = 3 buffer pages
[Figure: B0 holds the pages (0,a), (0,j) and (0,j); B1 holds (3,a), (3,j); B2 holds (5,a), (5,b).]
What about duplicate join keys? Unfortunately this is a problem… but usually not a huge one.
We call this unevenness in the bucket size skew
![Page 96: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/96.jpg)
Now that we have partitioned R and S…
![Page 97: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/97.jpg)
Hash Join Phase 2: Matching
• Now, we just join pairs of buckets from R and S that have the same hash value to complete the join!
[Figure: R and S on disk are partitioned by h_B into buckets R1, R2 and S1, S2. Join matching buckets.]
![Page 98: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/98.jpg)
Hash Join Phase 2: Matching
• Note that since x = y → h(x) = h(y), we only need to consider pairs of buckets (one from R, one from S) that have the same hash function value
• If our buckets are ~B−1 pages, we can join each such pair using BNLJ in linear time; recall (with P(R) = B−1):
$$\text{BNLJ Cost: } P(R) + \frac{P(R)\,P(S)}{B-1} = P(R) + \frac{(B-1)\,P(S)}{B-1} = P(R) + P(S)$$
Joining the pairs of buckets is linear! (As long as the smaller bucket is <= B−1 pages)
![Page 99: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/99.jpg)
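Hash Join Phase 2: Matching
[Figure: grid for R ⋈ S on A, with R.A hashed values on one axis and S.A hashed values on the other. Hash values shown: h(1) = 0, h(2) = 0, h(3) = 1, h(4) = 1, h(5) = 2, h(6) = 2. One axis carries the keys 1, 1, 2, 3, 3, 4, 5, 6, 6 and the other the keys 1, 1, 2, 2, 3, 4, 5, 6, 6.]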
![Page 100: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/100.jpg)
Hash Join Phase 2: Matching
[Figure: the same grid of R.A hashed values against S.A hashed values for R ⋈ S on A, with dark blue regions marked for A = 1, A = 2, A = 3, A = 4, A = 5, A = 6.]
To perform the join, we ideally just need to explore the dark blue regions = the tuples with same values of the join key A
![Page 101: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/101.jpg)
Hash Join Phase 2: Matching
[Figure: the same grid of R.A hashed values against S.A hashed values for R ⋈ S on A, with the whole grid highlighted.]
With a join algorithm like BNLJ that doesn't take advantage of equijoin structure, we'd have to explore this whole grid!
![Page 102: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/102.jpg)
Hash Join Phase 2: Matching
[Figure: the same grid of R.A hashed values against S.A hashed values for R ⋈ S on A, with blue regions marked for h(A) = 0, h(A) = 1, h(A) = 2.]
With HJ, we only explore the blue regions = the tuples with same values of h(A)!
We can apply BNLJ to each of these regions
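The three regions of the grid can be counted directly. This is a sketch under assumptions: the two key lists are read off the diagram's axis labels, which list goes with R and which with S is a guess, and `h` is a hypothetical function chosen to reproduce the hash values shown (1, 2 → 0; 3, 4 → 1; 5, 6 → 2).

```python
r_keys = [1, 1, 2, 3, 3, 4, 5, 6, 6]  # assumed R.A values
s_keys = [1, 1, 2, 2, 3, 4, 5, 6, 6]  # assumed S.A values


def h(a: int) -> int:
    """Toy hash matching the diagram: keys 1,2 -> 0; 3,4 -> 1; 5,6 -> 2."""
    return (a - 1) // 2


# Whole grid: every (r, s) pair, as a join that ignores equijoin structure would visit.
full_grid = len(r_keys) * len(s_keys)
# Blue regions: pairs that land in the same hash bucket.
same_bucket = sum(1 for r in r_keys for s in s_keys if h(r) == h(s))
# Dark blue regions: pairs that actually share the join key.
same_key = sum(1 for r in r_keys for s in s_keys if r == s)

print(full_grid, same_bucket, same_key)  # 81 27 14
```

Hashing cuts the explored pairs from 81 to 27 here; the remaining gap down to 14 is the work spent on collisions within each bucket.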
![Page 103: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/103.jpg)
Hash Join Phase 2: Matching
[Figure: grid of R.A hashed values against S.A hashed values for R ⋈ S on A, with smaller regions marked for h'(A) = 0, h'(A) = 1, h'(A) = 2, h'(A) = 3, h'(A) = 4, h'(A) = 5.]
An alternative to applying BNLJ:
We could also hash again, and keep doing passes in memory to reduce further!
![Page 104: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/104.jpg)
How much memory do we need for HJ?
• Given B+1 buffer pages
+ WLOG: Assume P(R) <= P(S)
• Suppose (reasonably) that we can partition into B buckets in 2 passes:
  • For R, we get B buckets of size ~P(R)/B
  • To join these buckets in linear time, we need these buckets to fit in B−1 pages, so we have:
$$B - 1 \geq \frac{P(R)}{B} \Rightarrow \sim B^2 \geq P(R)$$
Quadratic relationship between smaller relation's size & memory!
Lecture 17 > Memory requirements
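The inequality above can be solved numerically for the smallest B. A minimal sketch, assuming a hypothetical helper name and perfectly even buckets of P(R)/B pages (no skew):

```python
def min_b_for_hash_join(p_smaller: int) -> int:
    """Smallest B with B - 1 >= P(R) / B, i.e. B(B - 1) >= P(R)."""
    b = 2  # need at least B - 1 = 1 page to hold a bucket
    while b * (b - 1) < p_smaller:
        b += 1
    return b


b = min_b_for_hash_join(500)
print(b, b + 1)  # 23 24 -> B = 23, so 24 buffer pages in total
```

The result grows with the square root of the smaller relation's size, which is the quadratic relationship stated above read in the other direction.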
![Page 105: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/105.jpg)
Hash Join Summary
• Given enough buffer pages as on previous slide…
• Partitioning requires reading + writing each page of R, S
  • → 2(P(R) + P(S)) IOs
• Matching (with BNLJ) requires reading each page of R, S
  • → P(R) + P(S) IOs
• Writing out results could be as bad as P(R)*P(S)… but probably closer to P(R) + P(S)
HJ takes ~3(P(R) + P(S)) + OUT IOs!
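The cost breakdown above adds up as follows. A minimal sketch with a hypothetical function name; it assumes one partitioning pass per relation and buckets that fit in memory for the matching phase.

```python
def hash_join_cost(p_r: int, p_s: int, out: int = 0) -> int:
    """IO cost of hash join: partition (read + write) plus matching (read) plus OUT."""
    partition_ios = 2 * (p_r + p_s)  # read and write every page of R and S once
    matching_ios = p_r + p_s         # read every partitioned page once
    return partition_ios + matching_ios + out


print(hash_join_cost(1000, 500))  # 4500, excluding OUT
```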
![Page 106: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/106.jpg)
SMJ vs. HJ
![Page 107: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/107.jpg)
Sort-Merge v. Hash Join
• Given enough memory, both SMJ and HJ have performance: ~3(P(R) + P(S)) + OUT
• "Enough" memory =
  • SMJ: B^2 > max{P(R), P(S)}
  • HJ: B^2 > min{P(R), P(S)}
Hash Join superior if relation sizes differ greatly. Why?
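The two "enough memory" conditions can be compared on a lopsided pair of relations. This is a sketch with hypothetical function names and made-up sizes, chosen only to show the two conditions diverging.

```python
def smj_has_enough_memory(b: int, p_r: int, p_s: int) -> bool:
    """SMJ condition: B^2 > max{P(R), P(S)}."""
    return b * b > max(p_r, p_s)


def hj_has_enough_memory(b: int, p_r: int, p_s: int) -> bool:
    """HJ condition: B^2 > min{P(R), P(S)}."""
    return b * b > min(p_r, p_s)


b, p_r, p_s = 100, 500, 1_000_000  # hypothetical: one small, one very large relation
print(smj_has_enough_memory(b, p_r, p_s))  # False: 10,000 is not > 1,000,000
print(hj_has_enough_memory(b, p_r, p_s))   # True: 10,000 > 500
```

Only the smaller relation's buckets have to fit in memory for hash join, so the large relation's size does not enter its condition.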
![Page 108: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/108.jpg)
Further Comparisons of Hash and Sort Joins
• Hash Joins are highly parallelizable.
• Sort-Merge is less sensitive to data skew, and the result is sorted
![Page 109: Lecture 17 Joins - GitHub Pages · 2017-12-13 · How do I choose the right graduate program? Lecture 17: Joins Lecture 17. Today’s Lecture 1. Recap: Select, Project 2. Joins 3](https://reader033.vdocuments.net/reader033/viewer/2022050510/5f9b217442834004ee16db7a/html5/thumbnails/109.jpg)
Summary
• Saw IO-aware join algorithms
  • Massive difference
• Memory size is key in hash versus sort join
  • Hash Join = Little dog (depends on smaller relation)
• Skew is also a major factor