Stupid Columnsort Tricks
Geeta Chaudhry and Tom Cormen
Dartmouth College, Department of Computer Science
What Do We Know About Columnsort?
• Sorts $N$ values on an $r \times s$ mesh
• Uses 8 steps
  – Each step either sorts each column or performs a fixed permutation
• Divisibility restriction: $s$ divides $r$
• Height restriction: $r \ge 2s^2$
• Relaxing the height restriction to $r \ge 4s^{3/2}$:
  – Exponent of $s$ goes from 2 to 3/2
  – Mesh need not be quite so tall and skinny
  – Cost: 2 additional steps
  – Can simultaneously remove the divisibility restriction and relax the height restriction to $r \ge 6s^{3/2}$
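A quick derivation (ours, not from the slides) of when the relaxed bounds actually permit shorter columns than $r \ge 2s^2$, obtained by dividing both sides by $2s^{3/2} > 0$:

```latex
\begin{align*}
4s^{3/2} \le 2s^2 &\iff 2 \le s^{1/2} \iff s \ge 4,\\
6s^{3/2} \le 2s^2 &\iff 3 \le s^{1/2} \iff s \ge 9.
\end{align*}
```

For example, with $s = 64$ the three bounds give $2s^2 = 8192$, $4s^{3/2} = 2048$, and $6s^{3/2} = 3072$.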
Why Relax the Conditions?
• Columnsort applies in more circumstances
• Our motivation: out-of-core sorting
• Column height $r$ is limited by the amount of memory
  – Either per processor or in the entire system
  – $N = rs,\ r \ge 2s^2 \implies N \le r^{3/2}/2^{1/2}$
  – $N = rs,\ r \ge 4s^{3/2} \implies N \le r^{5/3}/4^{2/3}$
– Reducing the exponent of s in the bound for r allows us to sort more values with a given amount of memory
• A similar technique works for applying columnsort to in-core sorting
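To make the memory bounds concrete, here is a small calculator for the largest number of columns $s$ (and hence $N = rs$) that a given column height $r$ allows under each height restriction. It is a sketch under stated assumptions: it ignores the divisibility restriction, and the helper names (`max_s_basic`, `max_s_relaxed`, `icbrt`) are hypothetical. It uses exact integer tests ($r \ge c\,s^{3/2} \iff r^2 \ge c^2 s^3$) to avoid floating-point rounding at the boundary.

```python
import math


def icbrt(n: int) -> int:
    """Integer cube root: largest x with x**3 <= n (float guess, then exact fix-up)."""
    x = round(n ** (1.0 / 3.0))
    while x ** 3 > n:
        x -= 1
    while (x + 1) ** 3 <= n:
        x += 1
    return x


def max_s_basic(r: int) -> int:
    """Largest s with r >= 2*s**2; since s*s is an integer, s*s <= r/2 iff s*s <= r // 2."""
    return math.isqrt(r // 2)


def max_s_relaxed(r: int, c: int) -> int:
    """Largest s with r >= c * s**(3/2), tested exactly as s**3 <= r**2 // c**2."""
    return icbrt((r * r) // (c * c))


if __name__ == "__main__":
    r = 2 ** 20  # hypothetical column height, in elements
    for label, s in [
        ("r >= 2 s^2", max_s_basic(r)),
        ("r >= 4 s^(3/2)", max_s_relaxed(r, 4)),
        ("r >= 6 s^(3/2)", max_s_relaxed(r, 6)),
    ]:
        print(f"{label}: s <= {s}, N = r*s <= {r * s}")
```

With $r = 2^{20}$, the relaxed bound admits $s = 4096$ columns versus 724 under $r \ge 2s^2$.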
This Talk
• Slabpose columnsort
  – $r \ge 4s^{3/2}$
  – Requires the divisibility restriction
• Also in the paper
  – Subblock columnsort
    • $r \ge 4s^{3/2}$ with the divisibility restriction
    • $r \ge 6s^{3/2}$ without the divisibility restriction
  – Proof that the divisibility restriction is unnecessary in the basic columnsort algorithm
Columnsort Steps
1. Sort each column
2. Transpose entire mesh
3. Sort each column
4. Untranspose entire mesh
5. Sort each column
6. Shift down by half a column
7. Sort each column
8. Shift up by half a column
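The eight steps can be sketched in Python as follows. Assumptions, all ours: the mesh is a flat list in column-major order (column $j$ is `a[j*r:(j+1)*r]`), "transpose" means picking items up in column-major order and setting them down in row-major order, $r$ is even so half a column is `r // 2`, and the shift pads with $-\infty$/$+\infty$ to form $s+1$ columns. The restrictions from the first slide are asserted as stated; function names are hypothetical.

```python
def sort_columns(a, r):
    """'Sort each column': sort every length-r slice of a column-major list."""
    out = []
    for start in range(0, len(a), r):
        out.extend(sorted(a[start:start + r]))
    return out


def transpose(a, r, s):
    """Pick up items in column-major order, set them down in row-major order."""
    out = [None] * (r * s)
    for k, x in enumerate(a):
        row, col = divmod(k, s)   # k-th item lands at (row, col) in row-major order
        out[col * r + row] = x    # store it back in column-major layout
    return out


def untranspose(a, r, s):
    """Inverse of transpose."""
    out = [None] * (r * s)
    for k in range(r * s):
        row, col = divmod(k, s)
        out[k] = a[col * r + row]
    return out


def columnsort(values, r, s):
    """Sort r*s values on an r x s mesh stored column-major (hypothetical helper)."""
    assert len(values) == r * s
    assert r % s == 0 and r >= 2 * s * s and r % 2 == 0
    half = r // 2
    a = sort_columns(list(values), r)   # 1. sort each column
    a = transpose(a, r, s)              # 2. transpose entire mesh
    a = sort_columns(a, r)              # 3. sort each column
    a = untranspose(a, r, s)            # 4. untranspose entire mesh
    a = sort_columns(a, r)              # 5. sort each column
    # 6. shift down by half a column: s + 1 columns, padded with -inf / +inf
    a = [float("-inf")] * half + a + [float("inf")] * half
    a = sort_columns(a, r)              # 7. sort each column
    return a[half:half + r * s]         # 8. shift up by half a column, drop padding
```

Every step is either a column sort or a fixed, data-independent permutation, which is what makes the 0-1 principle applicable.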
Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns
5. Slabpose
6. Sort each column
7. Untranspose entire mesh
8. Sort each column
9. Shift down by half a column
10. Sort each column
11. Shift up by half a column
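Oblivious! Every step either sorts each column or performs a fixed permutation.
Slabpose Columnsort Steps, with the shuffle and the second slabpose combined into one fixed permutation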
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns + slabpose
5. Sort each column
6. Untranspose entire mesh
7. Sort each column
8. Shift down by half a column
9. Sort each column
10. Shift up by half a column
Why Work With Vertical Slabs?
• In regular columnsort, the matrix needs to be tall and skinny
• Working with vertical slabs allows us to change the aspect ratio to use tall and skinny slabs
• We’ll use slabs that are $\sqrt{s}$ columns wide
• The mesh will have $\sqrt{s}$ slabs
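The slabpose step can be sketched as columnsort's transpose applied independently to each vertical slab. This assumes a flat column-major list, so each slab of `w` adjacent columns is a contiguous run of `r * w` items, and a slab width `w` that divides $s$ (the slides use $w = \sqrt{s}$). The names `transpose` and `slabpose` are hypothetical, and the "shuffle columns" step is deliberately not modeled, since its exact permutation is not spelled out here.

```python
def transpose(a, r, s):
    """Pick up items in column-major order, set them down in row-major order."""
    out = [None] * (r * s)
    for k, x in enumerate(a):
        row, col = divmod(k, s)
        out[col * r + row] = x
    return out


def slabpose(a, r, s, w):
    """Transpose within each vertical slab of w columns of an r x s column-major mesh."""
    assert s % w == 0 and len(a) == r * s
    out = []
    for start in range(0, r * s, r * w):
        # Each slab is an r x w submesh occupying r*w consecutive positions.
        out.extend(transpose(a[start:start + r * w], r, w))
    return out
```

With `w == s` this is the full-mesh transpose, and with `w == 1` it is the identity, so slabpose interpolates between the two.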
0-1 Principle
• If an oblivious algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values
• Use the 0-1 Principle by looking at portions of the $r \times s$ mesh
• Clean: all 0s or all 1s
• Dirty: may be mixed 0s and 1s
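A minimal sketch of the clean/dirty vocabulary on a 0-1 mesh, assuming a flat column-major list and using columns as the "portions" being classified. Measuring the dirty area as the stretch from the first 1 to the last 0 in column-major order is our assumption for illustration; the function names are hypothetical.

```python
def classify_columns(a, r):
    """Label each column of a column-major 0-1 mesh as 'clean' or 'dirty'."""
    labels = []
    for start in range(0, len(a), r):
        column = a[start:start + r]
        labels.append("clean" if len(set(column)) == 1 else "dirty")
    return labels


def dirty_span(a):
    """Length of the stretch from the first 1 to the last 0 (0 if already sorted)."""
    ones = [i for i, x in enumerate(a) if x == 1]
    zeros = [i for i, x in enumerate(a) if x == 0]
    if not ones or not zeros or ones[0] > zeros[-1]:
        return 0
    return zeros[-1] - ones[0] + 1


if __name__ == "__main__":
    mesh = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]  # r = 4, s = 3, column-major
    print(classify_columns(mesh, 4), dirty_span(mesh))
```

For the example mesh, only the middle column is dirty, and the dirty stretch covers two positions.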
Step 7: Untranspose Entire Mesh
• After this step, the dirty area has at most $2s^{3/2}$ elements
• $r \ge 4s^{3/2} \implies 2s^{3/2} \le r/2 \implies$ dirty area $\le$ half a column
Once the dirty area is at most half a column, the last four steps finish the sort.
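The half-column claim can be sanity-checked exhaustively on small meshes. This sketch models the dirty area as one contiguous stretch of column-major positions of length at most $r/2$, with all 0s before it and all 1s after it (our modeling assumption). It assumes $r$ is even and pads the shift with 0s and 1s, which act as $-\infty$ and $+\infty$ on 0-1 inputs. It is a check on small cases, not a proof, and the function names are hypothetical.

```python
from itertools import product


def sort_columns(a, r):
    """Sort every length-r column of a column-major list."""
    out = []
    for start in range(0, len(a), r):
        out.extend(sorted(a[start:start + r]))
    return out


def last_four_steps(a, r):
    """Sort columns, shift down half a column, sort columns, shift back up."""
    half = r // 2
    a = sort_columns(a, r)                  # sort each column
    a = [0] * half + a + [1] * half         # shift down: s + 1 columns, 0/1 padding
    a = sort_columns(a, r)                  # sort each column
    return a[half:len(a) - half]            # shift up: drop the padding


def check_half_column_claim(r, s):
    """True if every input with one dirty window of length <= r/2 ends up sorted."""
    n = r * s
    for length in range(r // 2 + 1):
        for start in range(n - length + 1):
            for window in product((0, 1), repeat=length):
                a = [0] * start + list(window) + [1] * (n - start - length)
                if last_four_steps(a, r) != sorted(a):
                    return False
    return True
```

A window of at most $r/2$ positions spans at most two adjacent columns. After the first column sort, its remaining 1s and 0s straddle a column boundary within a stretch of at most $r/2$ positions, so the half-column shift moves them into a single column.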
Subblock Columnsort
• Adds two steps to columnsort
  – Sort each column
  – A fixed permutation
• The permutation can be any one that distributes all elements of each $\sqrt{s} \times \sqrt{s}$ subblock to all $s$ columns
• As in slabpose columnsort, the dirty area entering the last four steps has at most $2s^{3/2}$ elements
• As long as $2s^{3/2} \le r/2$ (half a column), the last four steps complete the sort
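Below is one permutation with the stated property, plus a checker for that property. The construction is our own example, not necessarily the one used in the paper. It assumes $s$ is a perfect square, $q = \sqrt{s}$, and $q$ divides $r$. It writes position $(iq + a,\ jq + b)$, which is entry $(a, b)$ of subblock $(i, j)$, to $(iq + j,\ aq + b)$. The $s$ entries of a subblock therefore land in $s$ distinct columns. The names are hypothetical.

```python
import math


def subblock_perm(r, s):
    """Map (row, col) -> (row', col') sending each q x q subblock to all s columns."""
    q = math.isqrt(s)
    assert q * q == s and r % q == 0
    perm = {}
    for row in range(r):
        i, a = divmod(row, q)          # subblock row index, offset within subblock
        for col in range(s):
            j, b = divmod(col, q)      # subblock column index, offset within subblock
            perm[(row, col)] = (i * q + j, a * q + b)
    return perm


def distributes_subblocks(perm, r, s):
    """Check that perm is a bijection and that every subblock hits all s columns."""
    q = math.isqrt(s)
    if sorted(perm.values()) != sorted(perm.keys()):
        return False
    for bi in range(r // q):
        for bj in range(q):
            cols = {perm[(bi * q + a, bj * q + b)][1] for a in range(q) for b in range(q)}
            if len(cols) != s:
                return False
    return True
```

The identity permutation fails the check, because it keeps each subblock inside its own $\sqrt{s}$ columns.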
Removing the Divisibility Restriction from Columnsort
• With the divisibility restriction, the dirty rows after the transpose step have only 0→1 transitions
• Without the divisibility restriction, there may also be 1→0 transitions
• The proof shows that even with the 1→0 transitions, the size of the dirty area entering the last four steps does not increase
• Thus $r \ge 2s^2$ suffices, even without the divisibility restriction
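The source of the extra transitions can be seen directly. After the transpose, row $i$ of the mesh holds items $is, \dots, is + s - 1$ in column-major order. When $s$ divides $r$, no row straddles two columns, so a column-sorted 0-1 input yields rows with only 0→1 transitions. Otherwise, a row can end one column (1s) and begin the next (0s). This sketch assumes a flat column-major list; the helper names are hypothetical. The tiny $r = 3$, $s = 2$ example ignores the height restriction and serves only to exhibit a 1→0 transition.

```python
def rows_after_transpose(a, r, s):
    """Rows after transpose: items picked up column-major, laid down row-major."""
    return [a[i * s:(i + 1) * s] for i in range(r)]


def count_transitions(rows):
    """Return (number of 0->1 transitions, number of 1->0 transitions) within rows."""
    up = down = 0
    for row in rows:
        for x, y in zip(row, row[1:]):
            if (x, y) == (0, 1):
                up += 1
            elif (x, y) == (1, 0):
                down += 1
    return up, down


if __name__ == "__main__":
    # r = 3, s = 2: s does not divide r; both columns are sorted.
    mesh = [0, 0, 1, 0, 1, 1]
    print(rows_after_transpose(mesh, 3, 2), count_transitions(rows_after_transpose(mesh, 3, 2)))
```

Here the middle row is `[1, 0]`: the last item of column 0 followed by the first item of column 1.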
Conclusion
• We can get around the restrictions of columnsort
• Reduce the exponent in the height restriction from 2 to 3/2
  – The mesh need not be quite so tall and skinny
  – Cost: two extra steps
  – In an out-of-core implementation, slabpose columnsort requires no additional I/O
• The divisibility restriction is unnecessary
• Open question: can we reduce the exponent further within the columnsort framework?