![Page 1: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/1.jpg)
Lecture 11: External Sorting
![Page 2: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/2.jpg)
Announcements
1. Midterm Review: This Friday!
2. Project Part #2 is out. Implement CLOCK!
3. Midterm Material: Everything up to buffer management.
   1. Today's lecture is not fair game.
   2. Do not forget to go over the activities as well!
4. Our usual Wednesday update…
![Page 3: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/3.jpg)
![Page 4: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/4.jpg)
What you will learn about in this section
1. External Merge (of sorted files)
2. External Merge-Sort
![Page 5: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/5.jpg)
1. External Merge
![Page 6: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/6.jpg)
Challenge: Merging Big Files with Small Memory
How do we efficiently merge two sorted files when both are much larger than our main memory buffer?
![Page 7: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/7.jpg)
External Merge Algorithm
• Input: 2 sorted lists of length M and N
• Output: 1 sorted list of length M+N
• Required: At least 3 buffer pages
• IOs: 2(M+N)
![Page 8: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/8.jpg)
Key (Simple) Idea
To find an element that is no larger than all elements in two lists, one only needs to compare the minimum elements from each list.
If: $A_1 \le A_2 \le \dots \le A_N$ and $B_1 \le B_2 \le \dots \le B_M$
Then: $\min(A_1, B_1) \le A_i$ and $\min(A_1, B_1) \le B_j$
for $i = 1, \dots, N$ and $j = 1, \dots, M$
![Page 9: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/9.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: Disk holds F1 = (1,5) (7,11) (20,31) and F2 = (2,22) (23,24) (25,30); the main-memory buffer is empty.]
![Page 10: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/10.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: The first page of each file, (1,5) and (2,22), is loaded into the buffer. On disk, F1 still holds (7,11) (20,31) and F2 holds (23,24) (25,30).]
![Page 11: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/11.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: The two smallest values move to the output buffer page, which now holds (1,2). The input pages hold 5 (from F1) and 22 (from F2). Disk is unchanged.]
![Page 12: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/12.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: The full output page (1,2) is written to the output file. The input pages still hold 5 and 22.]
![Page 13: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/13.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: Output so far: (1,2). The input pages hold 5 and 22. The next page of F1 is (7,11), followed by (20,31); F2 holds (23,24) (25,30).]
This is all the algorithm "sees"… Which file to load a page from next?
![Page 14: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/14.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: Same state as the previous slide: output (1,2); input pages hold 5 and 22; F1's next page is (7,11).]
We know that F2 only contains values ≥ 22… so we should load from F1!
![Page 15: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/15.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: Page (7,11) is loaded from F1 into the buffer alongside 5 and 22. Output so far: (1,2). On disk, F1 holds (20,31) and F2 holds (23,24) (25,30).]
![Page 16: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/16.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: The output buffer page now holds (5,7). The input pages hold 11 (from F1) and 22 (from F2). Output so far: (1,2).]
![Page 17: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/17.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: The output page (5,7) is written out after (1,2). The input pages hold 11 and 22.]
![Page 18: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/18.jpg)
External Merge Algorithm
Input: Two sorted files
Output: One merged sorted file
[Figure: Output so far: (1,2) (5,7). The input pages hold 11 and 22, and F1's last page (20,31) is loaded next. F2 still holds (23,24) (25,30) on disk.]
And so on…
![Page 19: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/19.jpg)
We can merge 2 lists of arbitrary length with only 3 buffer pages.
If the lists have sizes M and N, then the cost is 2(M+N) IOs
Each page is read once, written once
With B+1 buffer pages, we can merge B lists. How?
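The 3-page external merge can be sketched as a small simulation. This is a minimal illustration, not the course's reference code: files are modeled as Python lists of pages, and the function name `external_merge` and the `page_size` parameter are made up for this sketch. It counts one IO per page read or written.

```python
def external_merge(f1, f2, page_size=2):
    """Merge two sorted paged files with 2 input pages + 1 output page.

    Files are lists of pages; each page is a sorted list of values.
    Returns (merged_pages, io_count), counting one IO per page read or written.
    """
    files = (f1, f2)
    next_page = [0, 0]      # index of the next unread page of each file
    inputs = [[], []]       # the two input buffer pages
    output = []             # the single output buffer page
    merged, ios = [], 0

    while True:
        # Refill an input page only when it is empty and its file has pages left.
        for k in (0, 1):
            if not inputs[k] and next_page[k] < len(files[k]):
                inputs[k] = list(files[k][next_page[k]])
                next_page[k] += 1
                ios += 1
        live = [k for k in (0, 1) if inputs[k]]
        if not live:
            break
        # The overall minimum is the smaller of the two buffered heads.
        k = min(live, key=lambda j: inputs[j][0])
        output.append(inputs[k].pop(0))
        if len(output) == page_size:     # output page is full: write it out
            merged.append(output)
            output = []
            ios += 1
    if output:                           # flush a partially filled last page
        merged.append(output)
        ios += 1
    return merged, ios


# The two files from the slides' walk-through.
f1 = [[1, 5], [7, 11], [20, 31]]
f2 = [[2, 22], [23, 24], [25, 30]]
merged, ios = external_merge(f1, f2)
print(merged, ios)
```

On the example files this performs 12 IOs, matching 2(M+N) with M = N = 3. The same loop generalizes to B input pages plus one output page, which is one way to answer the "How?" above.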
![Page 20: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/20.jpg)
2. External Merge Sort
![Page 21: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/21.jpg)
What you will learn about in this section
1. External merge sort (2-way sort)
2. External merge sort on larger files
3. Optimizations for sorting
![Page 22: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/22.jpg)
External Merge Algorithm
• Suppose we want to merge two sorted files, both much larger than main memory (i.e. the buffer)
• We can use the external merge algorithm to merge files of arbitrary length in 2*(N+M) IO operations with only 3 buffer pages!
Our first example of an "IO aware" algorithm / cost model
![Page 23: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/23.jpg)
Why are Sort Algorithms Important?
• Data requested from a DB in sorted order is extremely common
  • e.g., find students in increasing GPA order
• Why not just use quicksort in main memory?
  • What if we need to sort 1TB of data with 1GB of RAM…
A classic problem in computer science!
![Page 24: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/24.jpg)
More reasons to sort…
• Sorting is useful for eliminating duplicate copies in a collection of records (Why?)
• Sorting is the first step in bulk loading a B+ tree index.
• The sort-merge join algorithm involves sorting
Coming up…
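One way to answer the "(Why?)" on duplicate elimination: once records are sorted, equal records sit next to each other, so a single sequential pass that compares each record with the previous one is enough. The helper name `dedup_sorted` below is hypothetical and only illustrates the idea on an in-memory list.

```python
def dedup_sorted(records):
    """Remove duplicates from an already sorted sequence in one pass."""
    unique = []
    for record in records:
        # Duplicates are adjacent after sorting, so only the last kept
        # record needs to be compared against.
        if not unique or unique[-1] != record:
            unique.append(record)
    return unique


gpas = sorted([3.1, 2.7, 3.1, 4.0, 2.7])
print(dedup_sorted(gpas))
```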
![Page 25: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/25.jpg)
Do people care?
Sort benchmark bears his name
http://sortbenchmark.org
![Page 26: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/26.jpg)
External Merge Sort
![Page 27: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/27.jpg)
So how do we sort big files?
1. Split into chunks small enough to sort in memory ("runs")
2. Merge pairs (or groups) of runs using the external merge algorithm
3. Keep merging the resulting runs (each time = a "pass") until left with one sorted file!
![Page 28: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/28.jpg)
External Merge Sort Algorithm (2-way sort)
Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
[Figure: Disk holds the unsorted 6-page file F = (44,10) (33,12) (55,31) (18,22) (27,24) (3,1); the main-memory buffer is empty. Orange file = unsorted.]
![Page 29: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/29.jpg)
External Merge Sort Algorithm (2-way sort)
Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
[Figure: The file is split into F1 = (44,10) (33,12) (55,31) and F2 = (18,22) (27,24) (3,1), both on disk. Orange file = unsorted.]
![Page 30: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/30.jpg)
External Merge Sort Algorithm (2-way sort)
Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
[Figure: F1's three pages (44,10) (33,12) (55,31) are loaded into the buffer; F2 = (18,22) (27,24) (3,1) stays on disk. Orange file = unsorted.]
![Page 31: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/31.jpg)
External Merge Sort Algorithm (2-way sort)
Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
[Figure: F1 is sorted in the buffer, giving (10,12) (31,33) (44,55); F2 = (18,22) (27,24) (3,1) is still unsorted on disk. Orange file = unsorted.]
![Page 32: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/32.jpg)
External Merge Sort Algorithm (2-way sort)
Example:
• 3 buffer pages
• 6-page file
1. Split into chunks small enough to sort in memory
[Figure: Sorted F1 = (10,12) (31,33) (44,55) is back on disk. And similarly for F2: (18,22) (27,24) (3,1) becomes (1,3) (18,22) (24,27).]
Each sorted file is called a run
![Page 33: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/33.jpg)
External Merge Sort Algorithm (2-way sort)
Example:
• 3 buffer pages
• 6-page file
2. Now just run the external merge algorithm & we're done!
[Figure: Disk holds the two sorted runs F1 = (10,12) (31,33) (44,55) and F2 = (1,3) (18,22) (24,27).]
![Page 34: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/34.jpg)
Calculating IO Cost
For 3 buffer pages, 6-page file:
1. Split into two 3-page files and sort in memory
   1. = 1 read + 1 write for each page of each file = 2*(3+3) = 12 IO operations
2. Merge each pair of sorted chunks using the external merge algorithm
   1. = 2*(3+3) = 12 IO operations
3. Total cost = 24 IO
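The 24-IO figure can be checked with a small simulation of the 2-way sort. This is a sketch under stated assumptions: pages are in-memory lists, the function name `external_merge_sort` is invented for illustration, and Python's `heapq.merge` stands in for the page-by-page external merge while IOs are charged according to the cost model (one read and one write per page per pass).

```python
import heapq


def external_merge_sort(pages, buffer_pages=3):
    """Simulate a 2-way external merge sort; return (sorted_pages, io_count)."""
    page_size = len(pages[0])   # assumes a non-empty file of equal-size pages
    ios = 0

    def paginate(values):
        return [values[i:i + page_size] for i in range(0, len(values), page_size)]

    # Pass 0: read buffer_pages pages at a time, sort in memory, write a run.
    runs = []
    for i in range(0, len(pages), buffer_pages):
        chunk = pages[i:i + buffer_pages]
        runs.append(paginate(sorted(v for page in chunk for v in page)))
        ios += 2 * len(chunk)   # each page is read once and written once

    # Merge passes: merge runs pairwise until a single run remains.
    while len(runs) > 1:
        next_runs = []
        for i in range(0, len(runs) - 1, 2):
            a, b = runs[i], runs[i + 1]
            values = list(heapq.merge((v for p in a for v in p),
                                      (v for p in b for v in p)))
            next_runs.append(paginate(values))
            ios += 2 * (len(a) + len(b))
        if len(runs) % 2:
            # Unpaired run is carried to the next pass; this sketch charges
            # no IO for it (a simplifying assumption).
            next_runs.append(runs[-1])
        runs = next_runs
    return runs[0], ios


# The 6-page file from the slides' example.
unsorted_file = [[44, 10], [33, 12], [55, 31], [18, 22], [27, 24], [3, 1]]
sorted_pages, total_ios = external_merge_sort(unsorted_file)
print(sorted_pages, total_ios)
```

With 3 buffer pages the split-and-sort pass costs 12 IOs and the single merge pass costs another 12, for 24 in total.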
![Page 35: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/35.jpg)
Running External Merge Sort on Larger Files
Assume we still only have 3 buffer pages (buffer not pictured)
[Figure: Disk holds an unsorted 24-page file, shown as 8 rows of 3 pages:]
(10,12) (31,33) (44,55)
(45,38) (18,43) (24,27)
(10,12) (31,33) (47,55)
(41,3) (18,22) (23,20)
(42,46) (31,33) (39,55)
(1,3) (18,23) (24,27)
(10,12) (48,33) (44,40)
(16,31) (18,22) (24,27)
![Page 36: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/36.jpg)
Running External Merge Sort on Larger Files
1. Split into files small enough to sort in buffer…
Assume we still only have 3 buffer pages (buffer not pictured)
[Figure: The 24-page file is split into 8 files of 3 pages each:]
(10,12) (31,33) (44,55)
(45,38) (18,43) (24,27)
(10,12) (31,33) (47,55)
(41,3) (18,22) (23,20)
(42,46) (31,33) (39,55)
(1,3) (18,23) (24,27)
(10,12) (48,33) (44,40)
(16,31) (18,22) (24,27)
![Page 37: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/37.jpg)
Running External Merge Sort on Larger Files
1. Split into files small enough to sort in buffer… and sort
Assume we still only have 3 buffer pages (buffer not pictured)
[Figure: Each 3-page file is now sorted:]
(10,12) (31,33) (44,55)
(18,24) (27,38) (43,45)
(10,12) (31,33) (47,55)
(3,18) (20,22) (23,41)
(31,33) (39,42) (46,55)
(1,3) (18,23) (24,27)
(10,12) (33,40) (44,48)
(16,18) (22,24) (27,31)
Call each of these sorted files a run
![Page 38: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/38.jpg)
Running External Merge Sort on Larger Files
2. Now merge pairs of (sorted) files… the resulting files will be sorted!
Assume we still only have 3 buffer pages (buffer not pictured)
[Figure: The eight sorted 3-page runs from the previous slide are merged pairwise into four sorted 6-page runs:]
(10,12) (18,24) (27,31) (33,38) (43,44) (45,55)
(3,10) (12,18) (20,22) (23,31) (33,41) (47,55)
(1,3) (18,23) (24,27) (31,33) (39,42) (46,55)
(10,12) (16,18) (22,24) (27,31) (33,40) (44,48)
![Page 39: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/39.jpg)
Running External Merge Sort on Larger Files
3. And repeat…
Assume we still only have 3 buffer pages (buffer not pictured)
[Figure: The disk contents after each earlier step are shown again, unchanged. The four 6-page runs are merged pairwise into two sorted 12-page runs:]
(3,10) (10,12) (12,18) (18,20) (22,23) (24,27) (31,31) (33,33) (38,41) (43,44) (45,47) (55,55)
(1,3) (10,12) (16,18) (18,22) (23,24) (24,27) (27,31) (31,33) (33,39) (40,42) (44,46) (48,55)
Call each of these steps a pass
![Page 40: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/40.jpg)
Running External Merge Sort on Larger Files
4. And repeat!
[Figure: The disk contents after each earlier pass are shown again, unchanged. The two 12-page runs are merged into a single sorted 24-page file:]
(1,3) (3,10) (10,10)
(12,12) (12,16) (18,18)
(18,18) (20,22) (22,23)
(23,24) (24,24) (27,27)
(27,31) (31,31) (31,33)
(33,33) (33,38) (39,40)
(41,42) (43,44) (44,45)
(46,47) (48,55) (55,55)
![Page 41: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/41.jpg)
Simplified 3-page Buffer Version
Assume for simplicity that we split an N-page file into N single-page runs and sort these; then:
• First pass: Merge N/2 pairs of runs each of length 1 page
• Second pass: Merge N/4 pairs of runs each of length 2 pages
• In general, for N pages, we do $\lceil \log_2 N \rceil$ passes
  • +1 for the initial split & sort
• Each pass involves reading in & writing out all the pages = 2N IO
[Figure: Unsorted input file → Split & sort → Merge → Merge → Sorted!]
→ $2N(\lceil \log_2 N \rceil + 1)$ total IO cost!
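As a quick sanity check of the formula, here is a worked instance for a hypothetical file of N = 8 pages (the value 8 is chosen only for illustration), starting from single-page runs:

```latex
\begin{align*}
\text{Split \& sort:}\quad & 8 \text{ runs of 1 page} \\
\text{Merge pass 1:}\quad & 4 \text{ runs of 2 pages} \\
\text{Merge pass 2:}\quad & 2 \text{ runs of 4 pages} \\
\text{Merge pass 3:}\quad & 1 \text{ run of 8 pages} \\[4pt]
\text{Passes} &= \lceil \log_2 8 \rceil + 1 = 3 + 1 = 4 \\
\text{Total IO} &= 2N(\lceil \log_2 N \rceil + 1) = 2 \cdot 8 \cdot 4 = 64
\end{align*}
```

Each of the 4 passes reads and writes all 8 pages, so the cost is 16 IOs per pass.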
![Page 42: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/42.jpg)
Using B+1 buffer pages to reduce # of passes
Suppose we have B+1 buffer pages now; we can:
1. Increase length of initial runs. Sort B+1 at a time! At the beginning, we can split the N pages into runs of length B+1 and sort these in memory
IO Cost:
Starting with runs of length 1: $2N(\lceil \log_2 N \rceil + 1)$
Starting with runs of length B+1: $2N(\lceil \log_2 \frac{N}{B+1} \rceil + 1)$
![Page 43: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/43.jpg)
Using B+1 buffer pages to reduce # of passes
Suppose we have B+1 buffer pages now; we can:
2. Perform a B-way merge. On each pass, we can merge groups of B runs at a time (vs. merging pairs of runs)!
IO Cost:
Starting with runs of length 1: $2N(\lceil \log_2 N \rceil + 1)$
Starting with runs of length B+1: $2N(\lceil \log_2 \frac{N}{B+1} \rceil + 1)$
Performing B-way merges: $2N(\lceil \log_B \frac{N}{B+1} \rceil + 1)$
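To get a feel for how much the two optimizations help, the three cost expressions can be evaluated by counting passes directly. The function name `sort_cost` and the example sizes (N = 1000 pages, B + 1 = 11 buffer pages) are assumptions made for this sketch; integer arithmetic is used so no floating-point logarithms are needed.

```python
def sort_cost(n_pages, initial_run_len=1, fan_in=2):
    """IO cost of external merge sort: 2N * (number of passes).

    initial_run_len: pages per initial sorted run (1, or B+1 with a bigger buffer)
    fan_in: number of runs merged at a time (2, or B for a B-way merge)
    """
    runs = -(-n_pages // initial_run_len)   # ceiling division
    passes = 1                              # the initial split & sort pass
    while runs > 1:
        runs = -(-runs // fan_in)           # each merge pass shrinks the run count
        passes += 1
    return 2 * n_pages * passes


N, B = 1000, 10
print(sort_cost(N))                                     # runs of length 1, 2-way merge
print(sort_cost(N, initial_run_len=B + 1))              # runs of length B+1, 2-way merge
print(sort_cost(N, initial_run_len=B + 1, fan_in=B))    # runs of length B+1, B-way merge
```

For these example sizes the pass count drops from 11 to 8 to 3. The same function reproduces the 24-page walk-through with 3 buffer pages: `sort_cost(24, 3, 2)` counts 4 passes.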
![Page 44: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/44.jpg)
Repacking
![Page 45: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/45.jpg)
Repacking for even longer initial runs
• With B+1 buffer pages, we can now start with B+1-length initial runs (and use B-way merges) to get $2N(\lceil \log_B \frac{N}{B+1} \rceil + 1)$ IO cost…
• Can we reduce this cost more by getting even longer initial runs?
• Use repacking: produce longer initial runs by "merging" in buffer as we sort at the initial stage
![Page 46: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/46.jpg)
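Repacking Example: 3 page buffer
• Start with unsorted single input file, and load 2 pages
[Figure: Disk holds the unsorted input file F1 = (31,12) (10,33) (44,55) (18,22) (57,24) (3,98) and an empty output file F2. The first two pages, (31,12) and (10,33), are loaded into the buffer.]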
![Page 47: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/47.jpg)
Repacking Example: 3 page buffer
• Take the minimum two values, and put in output page
[Figure: The output buffer page holds (10,12); the input pages hold the leftover values 31 and 33. F1 on disk still holds (44,55) (18,22) (57,24) (3,98).]
m=12
Also keep track of max (last) value in current run…
![Page 48: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/48.jpg)
Repacking Example: 3 page buffer
• Next, repack
[Figure: (10,12) is written to F2, m=12. The leftover values 31 and 33 are repacked into a single buffer page (31,33). F1 on disk holds (44,55) (18,22) (57,24) (3,98).]
![Page 49: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/49.jpg)
Repacking Example: 3 page buffer
• Next, repack, then load another page and continue!
[Figure: F2 = (10,12) (31,33), and m goes from 12 to 33. Page (44,55) is loaded into the buffer; (18,22) is next. F1 on disk still holds (57,24) (3,98).]
![Page 50: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/50.jpg)
Repacking Example: 3 page buffer
• Now, however, the smallest values are less than the largest (last) in the sorted run…
[Figure: F2 = (10,12) (31,33), m=33. Pages (44,55), (57,24) and (18,22) are shown in the buffer; (3,98) remains in F1 on disk.]
We call these values frozen because we can't add them to this run…
![Page 51: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/51.jpg)
Repacking Example: 3 page buffer
• Now, however, the smallest values are less than the largest (last) in the sorted run…
[Figure: F2 = (10,12) (31,33), m=55. Buffer pages shown: (44,55), (57,24), (18,22). Page (3,98) remains in F1 on disk.]
![Page 52: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/52.jpg)
Repacking Example: 3 page buffer
• Now, however, the smallest values are less than the largest (last) in the sorted run…
[Figure: F2 = (10,12) (31,33), m=55. The last input page (3,98) is loaded; pages shown: (44,55), (57,24), (18,22), (3,98).]
![Page 53: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/53.jpg)
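Repacking Example: 3 page buffer
• Now, however, the smallest values are less than the largest (last) in the sorted run…
[Figure: F2 = (10,12) (31,33), m=55. After repacking, the pages shown are (44,55), (3,24), (18,22) and (57,98): the frozen values 3, 24, 18, 22 are grouped together, and 57, 98 can still join the current run.]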
![Page 54: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/54.jpg)
Repacking Example: 3 page buffer
• Once all buffer pages have a frozen value, or input file is empty, start new run with the frozen values
[Figure: The current run F2 = (10,12) (31,33) (44,55) (57,98) is complete. The buffer holds the frozen pages (3,24) and (18,22); a new run F3 is started.]
m=0
![Page 55: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/55.jpg)
Repacking Example: 3 page buffer
• Once all buffer pages have a frozen value, or input file is empty, start new run with the frozen values
[Figure: F2 = (10,12) (31,33) (44,55) (57,98). The frozen values are sorted into the new run F3 = (3,18) (22,24).]
m=0
![Page 56: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/56.jpg)
Repacking
• Note that, for a buffer with B+1 pages:
  • If input file is sorted → nothing is frozen → we get a single run!
  • If input file is reverse sorted (worst case) → everything is frozen → we get runs of length B+1
• In general, with repacking we do no worse than without it!
• What if the file is already sorted?
• Engineer's approximation: runs will have ~2(B+1) length
$\sim 2N(\lceil \log_B \frac{N}{2(B+1)} \rceil + 1)$
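The frozen-value idea can be sketched with the classic value-at-a-time variant of this technique, usually called replacement selection, using a min-heap. This is not the slides' page-at-a-time procedure, so run boundaries can differ slightly from the walk-through above. The function name `replacement_selection` and the `capacity` parameter (how many values fit in memory) are assumptions made for illustration.

```python
import heapq

_END = object()   # sentinel marking the end of the input


def replacement_selection(values, capacity):
    """Split `values` into sorted runs, holding at most `capacity` values in memory."""
    it = iter(values)
    heap = []
    for v in it:                      # fill memory with the first `capacity` values
        heap.append(v)
        if len(heap) == capacity:
            break
    heapq.heapify(heap)

    runs, current, frozen = [], [], []
    while heap:
        smallest = heapq.heappop(heap)
        current.append(smallest)      # smallest is now the last (max) value of the run
        nxt = next(it, _END)
        if nxt is not _END:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)   # can still join the current run
            else:
                frozen.append(nxt)          # too small: frozen until the next run
        if not heap:                  # only frozen values remain: close the run
            runs.append(current)
            current = []
            heap, frozen = frozen, []
            heapq.heapify(heap)
    return runs


# Values of the slides' input file in page order, with room for 4 values in memory.
data = [31, 12, 10, 33, 44, 55, 18, 22, 57, 24, 3, 98]
runs = replacement_selection(data, capacity=4)
print(runs)
```

Sorting 4 values at a time without repacking would give three runs of length 4 here, while this sketch produces two longer runs. On sorted input nothing is ever frozen and a single run comes out; on reverse-sorted input every new value is frozen and the runs have length `capacity`, matching the two extreme cases listed above.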
![Page 57: Lecture 11: External Sorting - GitHub Pages · External Merge Sort Algorithm (2-way sort) Disk Main Memory Buffer F 1 F 2 10,12 31,33 44,55 And similarly for F 2 18,22 27,24 3,1 1,3](https://reader036.vdocuments.net/reader036/viewer/2022062602/5ebc8c9ca972fc4d043deaa7/html5/thumbnails/57.jpg)
Summary
• Basics of IO and buffer management.
• We introduced the IO cost model using sorting.
  • Saw how to do merges with few IOs,
  • Works better than main-memory sort algorithms.
• Described a few optimizations for sorting