Chapter 9: Sorting (1)
Outline
• Introduction
• Sorting networks
• Bubble sort and its variants
Introduction
• Sorting is one of the most common operations performed by a computer
• Internal or external
• Comparison-based, Θ(n log n), or non-comparison-based, Θ(n)
Background
• Where are the input and output sequences stored?
  ○ Stored on one process
  ○ Distributed among the processes (useful when sorting is an intermediate step)
• What is the order of the output sequence among the processes?
  ○ Global enumeration of the processes
How comparisons are performed
• Compare-exchange is not easy in parallel sorting algorithms
• One element per process
  ○ Each compare-exchange costs ts + tw; since ts >> tw, performance is poor
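How comparisons are performed (cont'd)
• More than one element per process
  ○ Each process holds a block of n/p elements; Ai <= Aj means every element of block Ai is no larger than every element of block Aj
  ○ Compare-split: communication costs ts + tw*n/p, so one step takes Θ(n/p) time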
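A compare-split step can be sketched as follows. This is a minimal illustration, assuming both processes already hold sorted blocks of equal size; the function name `compare_split` is hypothetical, and the message exchange between the two processes is simulated by passing both blocks to one function.

```python
import heapq


def compare_split(block_i, block_j):
    """Simulate one compare-split between two processes.

    Both blocks are assumed to be sorted and of equal size. Process i
    keeps the smaller half of the merged data and process j the larger half.
    """
    k = len(block_i)
    merged = list(heapq.merge(block_i, block_j))  # linear-time merge
    return merged[:k], merged[k:]


low, high = compare_split([1, 6, 8, 11], [2, 7, 9, 13])
print(low, high)
```

After the step, every element kept by process i is no larger than any element kept by process j, which is the block ordering the slide describes.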
Outline
• Introduction
• Sorting networks
  ○ Bitonic sort
  ○ Mapping bitonic sort to hypercube and mesh
• Bubble sort and its variants
Sorting networks
• Can sort n elements in Θ(log² n) time
• Key component: the comparator
  ○ Increasing comparator
  ○ Decreasing comparator
A typical sorting network
• Depth: the number of columns it contains
• The time to sort is proportional to the depth
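Bitonic sort: Θ(log² n)
• Bitonic sequence <a_0, a_1, …, a_{n-1}>
  ○ Monotonically increasing, then monotonically decreasing, or
  ○ There exists a cyclic shift of indices so that the above is satisfied
  ○ E.g.: 8 9 2 1 0 4 5 7
• How to rearrange a bitonic sequence to obtain a monotonic sequence?
  ○ Let s = <a_0, a_1, …, a_{n-1}> be a bitonic sequence
  ○ s1 = <min(a_0, a_{n/2}), …, min(a_{n/2-1}, a_{n-1})>, s2 = <max(a_0, a_{n/2}), …, max(a_{n/2-1}, a_{n-1})>
  ○ s1 and s2 are bitonic, and every element of s1 is smaller than or equal to every element of s2
• Bitonic split, applied recursively, gives a bitonic merge, realized as a bitonic-merging network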
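The split-and-merge idea can be sketched in a few lines. This is an illustrative sketch, assuming the input is a bitonic sequence whose length is a power of two; the function names are hypothetical.

```python
def bitonic_split(s):
    """Split a bitonic sequence into two bitonic halves s1 and s2,
    where every element of s1 is <= every element of s2."""
    half = len(s) // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]
    s2 = [max(s[i], s[i + half]) for i in range(half)]
    return s1, s2


def bitonic_merge(s):
    """Sort a bitonic sequence (length a power of two) by recursive splitting."""
    if len(s) <= 1:
        return list(s)
    s1, s2 = bitonic_split(s)
    return bitonic_merge(s1) + bitonic_merge(s2)


print(bitonic_split([8, 9, 2, 1, 0, 4, 5, 7]))
print(bitonic_merge([8, 9, 2, 1, 0, 4, 5, 7]))
```

Each level of recursion corresponds to one column of comparators, so a sequence of length n passes through log n columns.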
Example of bitonic merging
Bitonic merging network
• log n columns
Sorting n unordered elements
• Bitonic sort, bitonic-sorting network
• d(n) = d(n/2) + log n => d(n) = Θ(log² n)
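Unrolling the depth recurrence makes the bound explicit. The derivation below assumes that n is a power of two, that logarithms are base 2, and that d(1) = 0 (a single wire needs no comparators); these assumptions are added here for the sketch.

```latex
\begin{align*}
d(n) &= d(n/2) + \log n, \qquad d(1) = 0 \\
     &= \log n + \log\frac{n}{2} + \log\frac{n}{4} + \dots + \log 2 \\
     &= \sum_{i=1}^{\log n} i
      = \frac{\log n\,(\log n + 1)}{2}
      = \Theta(\log^2 n)
\end{align*}
```

For example, n = 16 gives a depth of 4 · 5 / 2 = 10 columns.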
The first three stages
How to map bitonic sort to a hypercube?
• One element per process
• How to map the bitonic sort algorithm onto a general-purpose parallel computer?
  ○ Each process plays the role of a wire
  ○ Each compare-exchange is performed by a pair of processes
• Bitonic sort is communication intensive, so the topology of the interconnection network matters
  ○ A poor mapping makes elements travel a long distance before each comparison, degrading performance
• Observation: compare-exchange happens only between pairs of wires whose labels differ in exactly one bit
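The one-bit observation can be made concrete with a sequential simulation. This is a sketch under stated assumptions: n = 2^d processes, one element each, with `a[label]` standing for the element held by the process with that label; the function name is hypothetical, and the loop over labels stands in for compare-exchanges that would run concurrently.

```python
def hypercube_bitonic_sort(values):
    """Simulate bitonic sort with one element per hypercube process."""
    n = len(values)
    d = n.bit_length() - 1
    assert n == 1 << d, "number of elements must be a power of two"
    a = list(values)
    for i in range(d):                  # stage i merges sequences of length 2^(i+1)
        for j in range(i, -1, -1):      # step j: partners differ in bit j only
            for label in range(n):
                partner = label ^ (1 << j)   # hypercube neighbour along dimension j
                if label < partner:
                    # Bit (i+1) of the label selects the sorting direction.
                    ascending = ((label >> (i + 1)) & 1) == 0
                    lo = min(a[label], a[partner])
                    hi = max(a[label], a[partner])
                    if ascending:
                        a[label], a[partner] = lo, hi
                    else:
                        a[label], a[partner] = hi, lo
    return a


print(hypercube_bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Because `partner` is always `label` with a single bit flipped, every exchange is between directly connected hypercube neighbours.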
The last stage of bitonic sort
Communication characteristics
Bitonic sort algorithm on 2^d processors
• TP = Θ(log² n); cost-optimal with respect to sequential bitonic sort
Mapping Bitonic sort to a mesh
The last stage of the bitonic sort
A block of elements per process
• Each process has n/p elements
• S1: think of each process as consisting of n/p smaller processes
  ○ Poor parallel implementation
• S2: replace compare-exchange with compare-split: Θ(n/p) communication + Θ(n/p) computation
  ○ The difference: in S2 each block is first sorted locally
• Hypercube
• Mesh
Performance on different architectures
• Neither very efficient nor very scalable, since the underlying sequential algorithm is suboptimal
Outline
• Introduction
• Sorting networks
• Bubble sort and its variants
Bubble sort
• O(n²)
• Inherently sequential
Odd-even transposition
• n phases, each with Θ(n) comparisons
Odd-even transposition
Parallel formulation
• O(n) time with n processes, one element each: n phases of Θ(1) time
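Odd-even transposition sort can be sketched sequentially as below. The function name is hypothetical and indices are 0-based, which is a choice made for this sketch. The compare-exchanges inside one phase touch disjoint pairs, so with one element per process they could all run in the same parallel step.

```python
def odd_even_transposition_sort(values):
    """Sort by n alternating phases of neighbour compare-exchanges."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = phase % 2   # alternate between pairs (0,1),(2,3).. and (1,2),(3,4)..
        for i in range(start, n - 1, 2):
            # Pairs within a phase are disjoint, hence independent of each other.
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([3, 2, 3, 8, 5, 6, 4, 1]))
```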
Shellsort
• Drawback of odd-even sort
  ○ A sequence with only a few elements out of order may still need Θ(n²) time to sort
• Idea
  ○ Add a preprocessing phase that moves elements across long distances
  ○ This reduces the number of odd and even phases needed
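The two-phase idea can be sketched as follows. This is only an illustration under assumptions not stated on the slide: one element per process, a power-of-two element count, and a first phase that pairs position k of a group with its mirror position and then halves the group size. The function name is hypothetical.

```python
def shellsort_sketch(values):
    """Phase 1: long-distance compare-exchanges; phase 2: odd-even phases
    repeated until a full odd+even pair makes no swap."""
    a = list(values)
    n = len(a)
    assert n & (n - 1) == 0, "sketch assumes a power-of-two length"

    # Phase 1: within each group, compare element k with its mirror image.
    size = n
    while size > 1:
        for start in range(0, n, size):
            for k in range(size // 2):
                i, j = start + k, start + size - 1 - k
                if a[i] > a[j]:
                    a[i], a[j] = a[j], a[i]
        size //= 2

    # Phase 2: odd-even transposition, stopping as soon as nothing moves.
    phases = 0
    while True:
        swapped = False
        for start in (0, 1):
            for i in range(start, n - 1, 2):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
            phases += 1
        if not swapped:
            break
    return a, phases


print(shellsort_sketch([8, 7, 6, 5, 4, 3, 2, 1]))
```

The returned phase count shows how much odd-even work remained after the long-distance moves.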
Shellsort
Conclusion
• Sorting networks
  ○ Bitonic network
  ○ Mapping to hypercube and mesh
• Bubble sort and its variants
  ○ Odd-even sort
  ○ Shellsort
Chapter 9: Sorting (2)
Outline
• Issues in sorting
• Sorting networks
• Bubble sort and its variants
• Quicksort
• Bucket and sample sort
• Other sorting algorithms
Quicksort
• Features
  ○ Simple, low overhead
  ○ Θ(n log n) on average, Θ(n²) in the worst case
• Idea
  ○ Choose a pivot (how?)
  ○ Partition into two parts, Θ(n)
  ○ Recursively solve the two subproblems
• Complexity
  ○ Worst case: T(n) = T(n-1) + Θ(n) => Θ(n²)
  ○ Balanced case: T(n) = 2T(n/2) + Θ(n) => Θ(n log n)
The sequential algorithm
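The slide's listing did not survive extraction, so the following is a minimal sketch of sequential quicksort rather than the original code. It assumes the first element of each subarray is used as the pivot, which is a choice made here for illustration.

```python
def quicksort(a, lo=0, hi=None):
    """In-place quicksort of a[lo..hi] using the first element as pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[lo]
        s = lo                       # a[lo+1..s] holds elements <= pivot
        for i in range(lo + 1, hi + 1):
            if a[i] <= pivot:
                s += 1
                a[s], a[i] = a[i], a[s]
        a[lo], a[s] = a[s], a[lo]    # move the pivot to its final position
        quicksort(a, lo, s - 1)
        quicksort(a, s + 1, hi)
    return a


print(quicksort([3, 2, 1, 5, 8, 4, 3, 7]))
```

With this pivot rule an already sorted input triggers the Θ(n²) worst case, since every partition is maximally unbalanced.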
Parallelizing quicksort
• Solution 1
  ○ Recursive decomposition
  ○ Drawback: the initial partition is handled by a single process, so the run time is Ω(n) and, with n processes, the process-time product is Ω(n²)
• Solution 2
  ○ Idea: perform the partitioning in parallel
  ○ An array of size n could be partitioned into two smaller arrays in Θ(1) time by using Θ(n) processes
  ○ How? It depends on the model: CRCW PRAM, shared-address-space, message-passing
Parallel formulation for CRCW PRAM (cost-optimal)
• Assumptions
  ○ n elements, n processes
  ○ Write conflicts are resolved arbitrarily
• Executing quicksort can be visualized as constructing a binary tree
Example
Algorithm
    procedure BUILD_TREE(A[1...n])
    begin
       for each process i do
       begin
          root := i;                  (concurrent write: one arbitrary process wins)
          parent_i := root;
          leftchild[i] := rightchild[i] := n + 1;
       end for
       repeat for each process i ≠ root do
       begin
          if (A[i] < A[parent_i]) or (A[i] = A[parent_i] and i < parent_i) then
          begin
             leftchild[parent_i] := i;
             if i = leftchild[parent_i] then exit
             else parent_i := leftchild[parent_i];
          end if
          else
          begin
             rightchild[parent_i] := i;
             if i = rightchild[parent_i] then exit
             else parent_i := rightchild[parent_i];
          end else
       end repeat
    end BUILD_TREE
Assuming a balanced tree:
• Each level partitions the elements among all processes in O(1) time
• Θ(log n) levels × Θ(1) per level = Θ(log n)
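The tree construction can be simulated sequentially to see how arbitrary write resolution still yields a valid search tree. This is a sketch, not the PRAM algorithm itself: the arbitrary winner of each concurrent write is modelled with a seeded random choice, indices are 0-based, `len(A)` plays the role of the n + 1 sentinel, and the names `build_tree` and `inorder` are hypothetical. It assumes at least one element.

```python
import random


def build_tree(A, seed=0):
    """Simulate BUILD_TREE round by round; returns (root, left, right)."""
    rng = random.Random(seed)
    n = len(A)
    nil = n                              # sentinel meaning "no child"
    root = rng.randrange(n)              # arbitrary winner of the write to root
    parent = [root] * n
    left, right = [nil] * n, [nil] * n
    active = set(range(n)) - {root}
    while active:
        # Every active process tries to write itself as a child of its parent.
        proposals = {}
        for i in active:
            p = parent[i]
            goes_left = A[i] < A[p] or (A[i] == A[p] and i < p)
            proposals.setdefault((p, goes_left), []).append(i)
        # One arbitrary writer wins each slot; the losers descend to the winner.
        for (p, goes_left), candidates in proposals.items():
            winner = rng.choice(candidates)
            (left if goes_left else right)[p] = winner
            for i in candidates:
                if i == winner:
                    active.discard(i)
                else:
                    parent[i] = winner
    return root, left, right


def inorder(A, root, left, right):
    """In-order traversal of the tree, which lists A in sorted order."""
    nil, out, stack, node = len(A), [], [], root
    while stack or node != nil:
        while node != nil:
            stack.append(node)
            node = left[node]
        node = stack.pop()
        out.append(A[node])
        node = right[node]
    return out


A = [33, 21, 13, 54, 82, 33, 40, 72]
print(inorder(A, *build_tree(A, seed=1)))
```

Each iteration of the `while` loop corresponds to one level of the tree, so the number of iterations equals the tree depth.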
Parallel formulation for shared-address-space architecture
• Assumptions
  ○ n elements, p processes
  ○ Shared memory
• How to parallelize? Idea of the algorithm:
  ○ Each process is assigned a block
  ○ Select a pivot element and broadcast it
  ○ Local rearrangement
  ○ Global rearrangement => smaller block S, larger block L
  ○ Redistribute the blocks to the processes (how many processes to each?)
  ○ Repeat until the array is broken into p parts
Example
How to compute the location?
Example (cont'd)
How to do global rearrangement?
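One way to compute the destination of each element is with prefix sums over the per-process counts. The sketch below is illustrative: `blocks[i]` stands for the block held by process i, the function name is hypothetical, and the prefix sums are computed sequentially with `itertools.accumulate` rather than by a parallel scan.

```python
from itertools import accumulate


def global_rearrange(blocks, pivot):
    """Return the rearranged array and the size of the smaller part S."""
    # Local rearrangement: each process splits its block around the pivot.
    smaller = [[x for x in b if x <= pivot] for b in blocks]
    larger = [[x for x in b if x > pivot] for b in blocks]
    s_counts = [len(s) for s in smaller]
    l_counts = [len(l) for l in larger]

    # Exclusive prefix sums tell each process where its pieces start.
    s_offsets = [0] + list(accumulate(s_counts))[:-1]
    l_offsets = [0] + list(accumulate(l_counts))[:-1]
    total_s = sum(s_counts)

    out = [None] * sum(len(b) for b in blocks)
    for i in range(len(blocks)):
        out[s_offsets[i]:s_offsets[i] + s_counts[i]] = smaller[i]
        start = total_s + l_offsets[i]
        out[start:start + l_counts[i]] = larger[i]
    return out, total_s


blocks = [[7, 13, 18, 2], [17, 1, 14, 20], [6, 10, 15, 9],
          [3, 16, 19, 4], [11, 12, 5, 8]]
print(global_rearrange(blocks, pivot=7))
```

Since every process writes to a disjoint range of the output, the copies in the final loop could all proceed at the same time.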
Analysis
• Assumption: pivot selection results in balanced partitions
• log p steps, each consisting of
  ○ Broadcasting the pivot: Θ(log p)
  ○ Local rearrangement: Θ(n/p)
  ○ Prefix sum: Θ(log p)
  ○ Global rearrangement: Θ(n/p)
Parallel formulation for message-passing architecture
• Similar to the shared-address-space formulation
• Difference: the array is distributed among the p processes
Pivot selection
• Random selection
  ○ Drawback: a bad pivot leads to significant performance degradation
• Median selection
  ○ Assumption: the initial distribution of elements in each process is uniform
Outline
• Issues in sorting
• Sorting networks
• Bubble sort and its variants
• Quicksort
• Bucket and sample sort
• Other sorting algorithms
Bucket sort
• Assumption
  ○ n elements distributed uniformly over [a, b]
• Idea
  ○ Divide [a, b] into m equal-sized subintervals (buckets)
  ○ Place each element in its bucket
  ○ Sort each bucket
• Θ(n log(n/m)) => Θ(n) when m = Θ(n)
• Compare with quicksort
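The three steps of bucket sort can be sketched as below. This is a minimal illustration, assuming numeric keys in the closed interval [a, b] with a < b; the function and parameter names are hypothetical, and Python's built-in `sorted` stands in for whatever sort is used inside each bucket.

```python
def bucket_sort(values, a, b, m):
    """Sort values from [a, b] using m equal-width buckets."""
    buckets = [[] for _ in range(m)]
    width = (b - a) / m
    for x in values:
        # Clamp so that x == b lands in the last bucket.
        idx = min(int((x - a) / width), m - 1)
        buckets[idx].append(x)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))   # each bucket holds about n/m elements
    return result


print(bucket_sort([0.42, 0.07, 0.99, 0.5, 0.13, 1.0, 0.0], 0.0, 1.0, 4))
```

If the keys are not uniformly distributed, some buckets grow much larger than n/m and the running time degrades toward that of the inner sort on the whole input.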
Parallelization on message-passing architecture
• n elements, p processes => p buckets
• Preliminary idea
  ○ Distribute the elements, n/p per process
  ○ Assign a subinterval to each process and redistribute the elements
  ○ Sort locally
  ○ Drawback: the uniformity assumption is not realistic => performance degradation
• Solution
  ○ Sample sort => splitters
  ○ Guarantees fewer than 2n/m elements per bucket
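Splitter selection can be sketched with a sequential simulation. The details here are assumptions made for illustration: `blocks[i]` is the data on process i, each block holds at least p elements, each process contributes p - 1 evenly spaced samples, and p - 1 evenly spaced splitters are taken from the sorted samples. The exact sample positions are a choice of this sketch and the function name is hypothetical.

```python
import bisect


def sample_sort(blocks):
    """Return p sorted buckets whose concatenation is the sorted input."""
    p = len(blocks)
    local = [sorted(b) for b in blocks]            # local sort on each process

    # Each process picks p - 1 evenly spaced samples from its sorted block.
    samples = []
    for b in local:
        step = len(b) / p
        samples.extend(b[int(k * step)] for k in range(1, p))
    samples.sort()

    # p - 1 global splitters, evenly spaced in the combined sample.
    splitters = [samples[k * (p - 1)] for k in range(1, p)]

    # Each process cuts its block at the splitters by binary search
    # and sends piece k to bucket (process) k.
    buckets = [[] for _ in range(p)]
    for b in local:
        cuts = [0] + [bisect.bisect_right(b, s) for s in splitters] + [len(b)]
        for k in range(p):
            buckets[k].extend(b[cuts[k]:cuts[k + 1]])
    return [sorted(bucket) for bucket in buckets]


blocks = [[22, 7, 13, 18, 2, 17, 1, 14],
          [20, 6, 10, 24, 15, 9, 21, 3],
          [16, 19, 23, 4, 11, 12, 5, 8]]
for bucket in sample_sort(blocks):
    print(bucket)
```

Because the splitters come from the data itself, the bucket sizes stay balanced even when the keys are not uniformly distributed.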
Example
Analysis
• Distributing elements: n/p per process
• Local sort; sample selection Θ(p)
• Sample combining Θ(p²), sorting Θ(p² log p), global splitter selection Θ(p)
• Element partitioning Θ(p log(n/p)), redistribution O(n) + O(p log p)
• Local sorting
Outline
• Issues in sorting
• Sorting networks
• Bubble sort and its variants
• Quicksort
• Bucket and sample sort
• Other sorting algorithms
Enumeration sort
• Assumption
  ○ O(n²) processes, n elements, CRCW PRAM
• Feature
  ○ Based on the rank of each element
  ○ Θ(1) time
Algorithm
    procedure ENUM_SORT(n)
    begin
       for each process P1,j do
          C[j] := 0;
       for each process Pi,j do
          if (A[i] < A[j]) or (A[i] = A[j] and i < j) then
             C[j] := 1;        (concurrent writes to C[j] are summed)
          else
             C[j] := 0;
       for each process P1,j do
          A[C[j]] := A[j];
    end ENUM_SORT
Common structures: A[n], C[n]
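A sequential rendering of enumeration sort makes the role of the rank array explicit. In this sketch the double loop stands in for the n² processes, and `C[j] += 1` stands in for concurrent writes whose values are summed; the function name is hypothetical and the result is written to a separate list instead of overwriting A.

```python
def enumeration_sort(A):
    """Place each element at the position given by its rank."""
    n = len(A)
    C = [0] * n
    for i in range(n):
        for j in range(n):
            # Process (i, j) contributes 1 to C[j] if A[i] precedes A[j];
            # ties are broken by index so that ranks are unique.
            if A[i] < A[j] or (A[i] == A[j] and i < j):
                C[j] += 1
    out = [None] * n
    for j in range(n):
        out[C[j]] = A[j]
    return out


print(enumeration_sort([5, 2, 9, 2, 7]))
```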
Radix sort
• Assumption
  ○ n elements, n processes
• Features
  ○ Based on the binary representation of the elements
  ○ Leverages enumeration sort
Algorithm
    procedure RADIX_SORT(A, r)
    begin
       for i := 0 to b/r - 1 do
       begin
          offset := 0;
          for j := 0 to 2^r - 1 do
          begin
             flag := 0;
             if the ith least significant r-bit block of A[Pk] = j then
                flag := 1;
             index := prefix_sum(flag);                 // Θ(log n)
             if flag = 1 then
                rank := offset + index;
             offset := offset + parallel_sum(flag);     // Θ(log n)
          endfor
          each process Pk sends its element A[Pk] to process Prank;   // Θ(n)
       endfor
    end RADIX_SORT
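The radix sort procedure can be simulated sequentially as follows. This is a sketch under assumptions: keys are non-negative integers of at most b bits, position k of the list plays the role of process Pk, and the parallel prefix sum and parallel sum are replaced by `itertools.accumulate` and `sum`. Ranks are 0-based here, and the offset accumulates across the values of j.

```python
from itertools import accumulate


def radix_sort(A, b, r):
    """Sort non-negative b-bit integers, r bits per pass."""
    a = list(A)
    n = len(a)
    mask = (1 << r) - 1
    for i in range((b + r - 1) // r):            # one pass per r-bit block
        rank = [0] * n
        offset = 0
        for j in range(1 << r):
            # flag[k] = 1 if the i-th r-bit block of a[k] equals j.
            flag = [1 if ((x >> (i * r)) & mask) == j else 0 for x in a]
            index = list(accumulate(flag))       # inclusive prefix sum
            for k in range(n):
                if flag[k]:
                    rank[k] = offset + index[k] - 1
            offset += sum(flag)                  # elements placed so far
        out = [None] * n
        for k in range(n):
            out[rank[k]] = a[k]                  # "send" a[k] to process rank[k]
        a = out
    return a


print(radix_sort([170, 45, 75, 90, 2, 802, 24, 66], b=10, r=2))
```

Each pass is stable, because elements with equal digits keep their relative order through the prefix sum, which is what lets later passes build on earlier ones.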
Conclusion
• Sorting networks
  ○ Bitonic network, mapping to hypercube and mesh
• Bubble sort and its variants
  ○ Odd-even sort, shellsort
• Quicksort
  ○ Parallel formulations on CRCW PRAM, shared-address-space and message-passing architectures
• Bucket and sample sort
• Enumeration and radix sort