Chapter 6 Parallel Sorting Algorithms
• Sorting
• Parallel Sorting
• Bubble Sort
• Odd-Even (Transposition) Sort
• Parallel Odd-Even Transposition Sort
• Related Functions
Sorting
• Arrange elements of a list into a certain order
• Makes data easier to access
• Speeds up other operations such as searching
• Many sorting algorithms exist, with different time and space complexities
Parallel Sorting
Design methodology
• Based on an existing sequential sort algorithm
–Try to utilize all resources available
–Possible to turn a poor sequential algorithm into a reasonable parallel algorithm (e.g., from O(n²) to O(n))
• Completely new approach
–New algorithm from scratch
–Harder to develop
–Sometimes yields a better solution
Potential speedup
• O(n log n) is optimal for any sequential sorting algorithm that does not use special properties of the numbers
• Optimal parallel time complexity with n processors: O(n log n / n) = O(log n)
Bubble Sort
• One of the most straightforward sorting methods
–Cycles through the list
–Compares consecutive elements and swaps them if necessary
–Stops when no more out-of-order pairs remain
• Slow and inefficient
• Average performance is O(n²)
Example: 6 5 3 1 8 7 2 4
Bubble Sort
for (int i = 0; i < n - 1; i++) { // after pass i, the largest i+1 values are in place
    for (int j = 0; j < n - 1 - i; j++) {
        if (array[j] > array[j+1]) { // out of order: swap
            int temp = array[j+1];
            array[j+1] = array[j];
            array[j] = temp;
        }
    }
}
Example: 6 5 3 1 8 7 2 4
Odd-Even (Transposition) Sort
• Variation of bubble sort
• Operates in two alternating phases: an even phase and an odd phase
Even phase
Even-indexed items compare and exchange with their right neighbor.
Odd phase
Odd-indexed items compare and exchange with their right neighbor.
Odd-Even (Transposition) Sort
for (int i = 0; i < n; i++) {
    if (i % 2 == 1) { // odd phase: compare pairs (1,2), (3,4), ...
        for (int j = 2; j < n; j += 2) {
            if (a[j] < a[j-1])
                swap(a[j-1], a[j]);
        }
    }
    else { // even phase: compare pairs (0,1), (2,3), ...
        for (int j = 1; j < n; j += 2) {
            if (a[j] < a[j-1])
                swap(a[j-1], a[j]);
        }
    }
}
Odd-Even (Transposition) Sort
Sorting n = 8 elements, using the odd-even transposition sort algorithm.
6 5 3 1 8 7 2 4
Parallel Odd-Even Transposition Sort
• Operates in two alternating phases: an even phase and an odd phase
• Even phase
Even-numbered processes exchange numbers with their right neighbor.
• Odd phase
Odd-numbered processes exchange numbers with their right neighbor.
Parallel Odd-Even Transposition
Parallel Odd-Even Transposition Sort
MPI_Comm_rank(MPI_COMM_WORLD, &mypid);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
for (int i = 0; i < nprocs; i++) {
    if (i % 2 == 1) { // odd phase: pairs (1,2), (3,4), ...
        if (mypid % 2 == 1 && mypid + 1 < nprocs)
            compare_and_exchange_min(mypid+1); // keep the smaller number
        else if (mypid % 2 == 0 && mypid > 0)
            compare_and_exchange_max(mypid-1); // keep the larger number
    }
    else { // even phase: pairs (0,1), (2,3), ...
        if (mypid % 2 == 0 && mypid + 1 < nprocs)
            compare_and_exchange_min(mypid+1);
        else if (mypid % 2 == 1)
            compare_and_exchange_max(mypid-1);
    }
}
Parallel Odd-Even Transposition (n>>p)
MPI_Scatter
• MPI_Scatter is a collective routine that is very similar to MPI_Bcast
• The root processor sends data to all processors in a communicator
• MPI_Bcast sends the same piece of data to all processes
• MPI_Scatter sends chunks of an array to different processors
MPI_Scatter
• MPI_Bcast takes a single element at the root processor and copies it to all other processors
• MPI_Scatter takes an array of elements and distributes the elements in the order of the processor rank
MPI_Scatter
• Its prototype:
MPI_Scatter(void* send_data, int send_count, MPI_Datatype send_datatype, void* recv_data, int recv_count, MPI_Datatype recv_datatype, int root, MPI_Comm communicator)
• send_data: an array of data on the root processor
• send_count and send_datatype: how many elements of an MPI datatype will be sent to each processor
• recv_data: a buffer that can hold recv_count elements
• root: the root processor
• communicator: the communicator containing the processors
MPI_Gather
• The inverse of MPI_Scatter
• Takes elements from many processors and gathers them to one single processor
• The elements are ordered by the rank of the processors from which they were received
• Used in parallel sorting and searching
MPI_Gather
• Its prototype:
MPI_Gather(void* send_data, int send_count, MPI_Datatype send_datatype, void* recv_data, int recv_count, MPI_Datatype recv_datatype, int root, MPI_Comm communicator)
• Only the root processor needs to have a valid receive buffer
• All other calling processors can pass NULL for recv_data
• recv_count is the count of elements received per processor, not the total sum of counts from all processors
Example 1
#include "mpi.h"
#include <stdio.h>
// run with exactly 4 processes
int main (int argc, char **argv)
{
int size, rank;
int recvbuf[4];
int sendbuf[16]={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Scatter(sendbuf,4,MPI_INT,recvbuf,4,MPI_INT,0,MPI_COMM_WORLD);
printf("Processor %d gets elements: %d %d %d %d\n",rank,recvbuf[0],
recvbuf[1],recvbuf[2],recvbuf[3]);
MPI_Finalize();
return 0;
}
Example 1
Processor 0 gets elements: 1 2 3 4
Processor 1 gets elements: 5 6 7 8
Processor 3 gets elements: 13 14 15 16
Processor 2 gets elements: 9 10 11 12
Example 2
#include "mpi.h"
#include <stdio.h>
// run with exactly 4 processes
int main (int argc, char **argv)
{
int size, rank;
int sendbuf[4];
int recvbuf[16];
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
int i;
for (i =0; i < 4; i++){
sendbuf[i]= 4*rank + i+1;
}
MPI_Gather(sendbuf,4,MPI_INT,recvbuf,4,MPI_INT,0,MPI_COMM_WORLD);
if (rank == 0){
int j;
for(j = 0; j < 16; j++){
printf("The %d th element is %d\n", j, recvbuf[j]);
}
}
MPI_Finalize();
return 0;
}
Example 2
The 0 th element is 1
The 1 th element is 2
The 2 th element is 3
The 3 th element is 4
The 4 th element is 5
The 5 th element is 6
The 6 th element is 7
The 7 th element is 8
The 8 th element is 9
The 9 th element is 10
The 10 th element is 11
The 11 th element is 12
The 12 th element is 13
The 13 th element is 14
The 14 th element is 15
The 15 th element is 16