Sorting Algorithms CS 524 – High-Performance Computing

Post on 20-Dec-2015


Sorting Algorithms

CS 524 – High-Performance Computing

CS 524 (Au 2004/05)- Asim Karim @ LUMS 2

Sorting

Sorting is the task of arranging an unordered collection (sequence) of elements into monotonically increasing (or decreasing) order

Sorting transforms an unordered sequence of elements S = ⟨a1, a2, a3, …, an⟩ into the sequence S’ = ⟨a’1, a’2, a’3, …, a’n⟩, where a’i ≤ a’j for 1 ≤ i ≤ j ≤ n and S’ is a permutation of S

Sorting algorithms can be categorized into internal (S can fit into main memory) and external (S cannot fit in main memory). We study internal algorithms only.

Sorting algorithms can also be categorized as comparison-based or noncomparison-based


Data Storage on Parallel Computers

Storage of input and output sequences
  Where? On one processor or distributed among the processors?
  How? What is the order of data distribution with respect to the order of the processors?


Compare-Exchange on Parallel Computers

One element per processor: ai on Pi and aj on Pj

Compare-exchange between two processors Pi and Pj requires a communication and a comparison operation

A parallel system with as many processors as number of elements would deliver poor performance. Why?
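A minimal sketch of the compare-exchange step, simulated in one Python process. The function name `compare_exchange` is hypothetical, and the exchange of values that would be a message between Pi and Pj is modeled here as a plain function call.

```python
def compare_exchange(a_i, a_j):
    """Simulate compare-exchange between P_i and P_j (with i < j).

    On a real machine each processor sends its element to the other
    (one communication), then P_i keeps the minimum and P_j keeps the
    maximum (one comparison). Here both steps collapse into one call.
    """
    if a_i <= a_j:
        return a_i, a_j
    return a_j, a_i


# P_i holds 7 and P_j holds 3 before the step.
new_i, new_j = compare_exchange(7, 3)
```

Each call does a single comparison but needs a full exchange of elements, which gives a sense of how communication can dominate when every processor holds only one element.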


Compare-Split on Parallel Computers (1)


Compare-Split on Parallel Computers (2)

Each processor has n/p elements of the sequence. Initially, processor Pi has block Ai

After sorting, the blocks of elements are ordered such that A’i ≤ A’j for i ≤ j and union of Ai = union of A’i

Compare-split
  Each processor sends its block to the other (each block is sorted locally)
  Each processor merges the two blocks of elements
  Each processor splits the merged elements and retains the appropriate half
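The compare-split steps above can be sketched as follows. This is a single-process simulation under stated assumptions: the name `compare_split` is hypothetical, both blocks are already sorted, and the block exchange is modeled as passing both lists to one function.

```python
from heapq import merge


def compare_split(lower_block, upper_block):
    """Simulate compare-split between P_i (lower) and P_j (upper), i < j.

    Both blocks must already be sorted. After the step, P_i keeps the
    smallest len(lower_block) elements and P_j keeps the rest.
    """
    merged = list(merge(lower_block, upper_block))  # linear-time merge
    k = len(lower_block)
    return merged[:k], merged[k:]


kept_by_i, kept_by_j = compare_split([1, 6, 8, 11, 13], [2, 7, 9, 10, 12])
```

In a real message-passing program each processor would perform the merge itself and keep only its own half. The simulation returns both halves at once for brevity.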


Sorting Network (1)

Sorting network is a specialized interconnection network that can perform many comparisons simultaneously thus improving sorting performance significantly

Key component of the sorting network: the comparator
  Increasing comparator
  Decreasing comparator


Sorting Network (2)


Bubble Sort

Complexity: O(n^2)

Bubble sort is difficult to parallelize. Why?
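For reference, a minimal sequential bubble sort in Python (the function name is an illustrative choice, not taken from the slides). The inner loop makes the dependence visible: each compare-exchange on positions (i, i+1) uses the result of the one on (i-1, i), so the comparisons of a pass cannot simply be run at the same time.

```python
def bubble_sort(seq):
    """Return a sorted copy of seq using O(n^2) compare-exchanges."""
    a = list(seq)
    n = len(a)
    # After each pass, the largest remaining element has reached
    # position `end`, so the next pass can stop one position earlier.
    for end in range(n - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


result = bubble_sort([3, 2, 3, 8, 5, 6, 4, 1])
```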


Odd-Even Transposition Sort (1)


Odd-Even Transposition Sort (2)


Parallel Implementation: p = n

Data partitioning: Each processor Pi has one element ai

Computation and communication: During each phase, the odd- or even-numbered processors perform a compare-exchange with their right neighbors

Performance
  On a linear array
  On a crossbar
  On a bus

Not cost optimal
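A sequential simulation of the p = n formulation, as a sketch only: list position i stands in for processor Pi, and the inner loop visits the pairs that would compare-exchange concurrently within one phase. The function name and the 0-based indexing are assumptions of this example.

```python
def odd_even_transposition_sort(seq):
    """Return a sorted copy of seq using n alternating phases."""
    a = list(seq)
    n = len(a)
    for phase in range(n):
        # Alternate between pairs (0,1), (2,3), ... and (1,2), (3,4), ...
        start = phase % 2
        # The pairs in one phase are disjoint, so on a parallel
        # machine all of them could be processed at the same time.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


result = odd_even_transposition_sort([3, 2, 3, 8, 5, 6, 4, 1])
```

If each phase is taken to cost constant time on n processors, the n phases give a processor-time product on the order of n^2, which exceeds the O(n log n) of a good sequential sort and is consistent with the formulation not being cost optimal.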


Parallel Implementation: p < n

Data partitioning: Each processor Pi has n/p elements in the block Ai

Computation and communication: Sort Ai locally (using merge sort or quicksort). Then, execute p phases (p/2 odd and p/2 even), performing compare-split operations with the right neighboring processor.

Performance
  On a linear array
  On a crossbar
  On a bus

Cost optimal on linear array and crossbar when p = O(log n). Not cost optimal on bus
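The p < n formulation can be sketched the same way, with each list entry standing in for one processor's block. This is a single-process simulation that assumes equal-sized blocks; the function name is hypothetical, and Python's built-in `sorted` replaces the local merge sort or quicksort.

```python
from heapq import merge


def odd_even_block_sort(blocks):
    """blocks[i] is the block A_i held by simulated processor P_i."""
    blocks = [sorted(b) for b in blocks]  # local sort on every processor
    p = len(blocks)
    for phase in range(p):
        # Disjoint neighbor pairs; these would run concurrently.
        for i in range(phase % 2, p - 1, 2):
            merged = list(merge(blocks[i], blocks[i + 1]))
            k = len(blocks[i])
            # Compare-split: left neighbor keeps the smaller half.
            blocks[i], blocks[i + 1] = merged[:k], merged[k:]
    return blocks


result = odd_even_block_sort([[9, 4], [7, 1], [8, 2], [6, 3]])
```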


Shellsort (1)

Odd-even transposition sort moves elements one position at a time
  If a sequence has only a few unordered elements and they are far from their correct positions, then OE sort will take a long time to sort the sequence

Shellsort can move elements longer distances. It has two phases:
  In the first phase, blocks that are far apart are compare-split
  In the second phase, an odd-even transposition sort is conducted. This continues as long as blocks are changing positions


Shellsort (2)


Shellsort (3)

Initially, each processor sorts its block of elements locally

First phase
1. Compare-split Pi (i < p/2) with Pp-i-1 (reverse-order compare-split)
2. The processors are partitioned into two groups; one group has the first p/2 processors and the other the next p/2 processors. Compare-split (in reverse order) within each group.
3. Go to 1. Repeat log p times.

Second phase
  Perform OE sort until no changes occur
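The two phases can be sketched as a single-process simulation. Assumptions of this example: p is a power of two, blocks are of equal size, the names `compare_split` and `parallel_shellsort` are hypothetical, and the second phase stops once one even and one odd phase in a row change nothing.

```python
from heapq import merge


def compare_split(lower_block, upper_block):
    """Merge two sorted blocks; return (smaller half, larger half)."""
    merged = list(merge(lower_block, upper_block))
    k = len(lower_block)
    return merged[:k], merged[k:]


def parallel_shellsort(blocks):
    """blocks[i] is the block held by simulated processor P_i."""
    blocks = [sorted(b) for b in blocks]  # initial local sort
    p = len(blocks)

    # First phase: reverse-order compare-split inside groups whose
    # size halves each round (p, p/2, ..., 2), i.e. log p rounds.
    size = p
    while size > 1:
        for start in range(0, p, size):
            for i in range(size // 2):
                lo, hi = start + i, start + size - 1 - i
                blocks[lo], blocks[hi] = compare_split(blocks[lo], blocks[hi])
        size //= 2

    # Second phase: odd-even transposition on blocks until two
    # consecutive phases (one even, one odd) make no change.
    phase, quiet = 0, 0
    while quiet < 2:
        changed = False
        for i in range(phase % 2, p - 1, 2):
            new_lo, new_hi = compare_split(blocks[i], blocks[i + 1])
            if new_lo != blocks[i]:
                changed = True
            blocks[i], blocks[i + 1] = new_lo, new_hi
        quiet = 0 if changed else quiet + 1
        phase += 1
    return blocks


data = [[15, 2], [9, 12], [1, 14], [8, 3], [11, 6], [4, 13], [16, 5], [7, 10]]
result = parallel_shellsort(data)
```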


Shellsort (4)

Performance
  On a linear array
  On a crossbar
  On a bus


Quicksort (1)


Quicksort (2)

Recursive divide-and-conquer algorithm that has an average complexity of O(n log n)


Quicksort (3)

The partitioning of a sequence of length n has a complexity of O(n)

The selection of the pivot significantly affects the overall complexity of quicksort
  In the worst case, where an n-length sequence is partitioned into subsequences of length 1 and n-1, the overall complexity becomes O(n^2)
  On average, the complexity is O(n log n)
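A short sequential quicksort sketch illustrating the points above. It is not in place, and it picks the first element as the pivot, which is an assumption of this example rather than a choice stated in the slides. With that pivot rule, an already sorted input produces the most unbalanced split at every level.

```python
def quicksort(seq):
    """Return a sorted copy of seq (first element used as pivot)."""
    if len(seq) <= 1:
        return list(seq)
    pivot = seq[0]
    # Partitioning scans the remaining elements once: O(n) work.
    smaller = [x for x in seq[1:] if x <= pivot]
    larger = [x for x in seq[1:] if x > pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


result = quicksort([3, 2, 1, 5, 8, 4, 3, 7])
```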


Parallelizing Quicksort

A naïve formulation
  Start off with one process, which does the initial partitioning. Then, assign one of the subproblems (the recursion) to another process. Repeat for each subsequence until no further partitioning is possible.

Not cost-optimal (Why?)

Analysis
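One way to sketch the analysis, assuming balanced splits at every level and p = n processes (both are assumptions of this sketch, not figures from the slides): the first partitioning is done by a single process in Θ(n) time, the next level takes Θ(n/2), and so on.

```latex
T_P = \Theta(n) + \Theta\!\left(\tfrac{n}{2}\right) + \Theta\!\left(\tfrac{n}{4}\right) + \dots = \Theta(n),
\qquad
p \, T_P = n \cdot \Theta(n) = \Theta(n^2) \;>\; \Theta(n \log n)
```

Under these assumptions the processor-time product grows faster than the sequential O(n log n), because each partitioning step is itself performed serially by one process.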


Message-Passing Parallel Formulation

Data partitioning: Each processor Pi has a block Ai of n/p elements

Computation and communication
  Select a pivot
  Broadcast the pivot to all processors
  Locally rearrange the block Ai into sub-blocks Si and Li
  Combine Si and Li from all processors as S and L
  Partition S to one group of processors and L to the other
  Recursively perform these operations until a sub-block is assigned to one processor only. Then, the processors sort the set locally
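The steps above can be sketched as a single-process simulation. Several details are assumptions of this example, since the slides leave them open: Si holds the elements no larger than the pivot and Li the larger ones, the pivot is the first element of the first non-empty block, processors are divided between S and L in proportion to their sizes, and the helper names are hypothetical.

```python
def split_evenly(items, parts):
    """Cut items into `parts` contiguous blocks of near-equal size."""
    q, r = divmod(len(items), parts)
    out, pos = [], 0
    for i in range(parts):
        size = q + (1 if i < r else 0)
        out.append(items[pos:pos + size])
        pos += size
    return out


def parallel_quicksort(blocks):
    """blocks[i] is the block A_i held by simulated processor P_i."""
    p = len(blocks)
    if p == 1:
        return [sorted(blocks[0])]      # one processor left: sort locally
    n = sum(len(b) for b in blocks)
    if n == 0:
        return [[] for _ in range(p)]
    # Select a pivot and "broadcast" it (every block sees the same value).
    pivot = next(b[0] for b in blocks if b)
    # Each processor rearranges its block into S_i and L_i.
    s_parts = [[x for x in b if x <= pivot] for b in blocks]
    l_parts = [[x for x in b if x > pivot] for b in blocks]
    # Combine the local sub-blocks into S and L.
    small = [x for part in s_parts for x in part]
    large = [x for part in l_parts for x in part]
    # Give each side at least one processor, roughly by size.
    p_small = min(p - 1, max(1, round(p * len(small) / n)))
    return (parallel_quicksort(split_evenly(small, p_small))
            + parallel_quicksort(split_evenly(large, p - p_small)))


data = [[7, 13, 18, 2], [17, 1, 14, 20], [6, 10, 15, 9], [3, 16, 19, 4]]
result = parallel_quicksort(data)
```

The result keeps one block per simulated processor, and concatenating the blocks in processor order yields the sorted sequence. Block sizes may end up unequal, which reflects how pivot quality affects load balance.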
