binar sort a generalized sorting algorithm

8/6/2019 Binar Sort a Generalized Sorting Algorithm

1/32

Binar Sort: A Linear Generalized Sorting Algorithm

by

William F. Gilreath

([email protected])

May 2011


2/32


-2-

Abstract

Sorting is a common and ubiquitous activity for computers. It is not surprising that there exists a

plethora of sorting algorithms. For all the sorting algorithms, it is an accepted performance limit

that sorting algorithms are linearithmic O(N lg N). The linearithmic lower bound in performance

stems from the fact that the sorting algorithms use the ordering property of the data. The sortingalgorithm uses comparison by the ordering property to arrange the data elements from an initial

permutation into a sorted permutation.

Linear O(N) sorting algorithms exist, but use a priori knowledge of the data to use a specific

property, and thus has greater performance. In contrast, the linearithmic sorting algorithms aregeneralized by using a universal property of data, relative ordering by comparison, but have a

linearithmic performance lower bound. The tradeoff in sorting algorithms is generality for per-

formance by the chosen property used to sort the data elements.

A general-purpose, linear sorting algorithm in the context of the tradeoff of performance for ge-

nerality seems implausible. There is an implicit assumption that only the property of ordering isuniversal. But, ordering is not the only universal property for data elements. The binar sort is a

general-purpose sorting algorithm that utilizes this other universal property to sort with linearperformance.

Keywords: comparison, comparison sorting, encoding, ordering, linear, linearithmic, sort, sort-ing, universal property.


3/32


-3-

1. IntroductionA. Sorting

Sorting is a frequent, ubiquitous computational activity on computers, often in tandem with other

algorithms such as search. The interest and ubiquity of sorting has led to the development of adiverse number of sorting algorithms. But it begs the question of What exactly is sorting?

Sorting Formalism

A sorting formalism independent of any specific sorting algorithm defines sorting without a spe-cific method to sort a collection of data elements. Knuth defines sorting [Knuth 1998] as:

The records:

R1, R2, , RN-1, RN

are supposed to be sorted into non-decreasing order of their keys K1, K2,,KN, essentially by

discovering a permutation p(1) p(2)p(N) such that:

K p(1) Kp(2) Kp(N-1) Kp(N)

Knuths definition of sorting defines sorting in terms of the end result, a specific permutation outof the n! permutations possible for a collection of records. The definition is more mathematical

than algorithmic. The result is described mathematically in organization among the dataelements, but not the mechanism for the result. The actual means is the specific sorting

algorithm.

Linearithmic Sorting

Most sorting algorithms are comparison based sorting algorithms. The comparison sorting

algorithms include such algorithms as the merge sort, quick sort, and heap sort. These sortingalgorithms use comparison to arrange the elements in the sorted permutation, and are general-

purpose in nature.

The comparison sorting algorithms have a well-known theoretical [Johnsonbaugh and Schaefer2004] performance limit that is the least upper bound for sorting that is linearithmic or O(N lg N)

in complexity. This theoretical lower bound is from the basis of the comparison sorting

algorithm-using sorting to arrange the data elements. A decision tree forNelements is oflogarithmic height lg N. Thus, the time complexity involves the cost that is the cost of using adecision tree to compare elements lg Nand the number of elementsN. Hence the theoretical least

upper bound is the product of the two costs involved, O(N lg N), which islinearithmic

complexity.


4/32


-4-

Linear Sorting

Most, but not all sorting algorithms are linearithmic and use comparison as the operation to

organize the data elements. But, there are linear sorting algorithms, or O(N) complexity, but are

not utilizing comparison. Such linear algorithms include the radix sort, hash sort, and counting

sort.

Linear sorting algorithms use a priori knowledge of the data elements, a specific property, that is

required to sort linearly. The linear sorting algorithms map with a function, the data elementusing the specific property.

The linear sorting algorithms are not general-purpose as they are tied to a specific property.Compared to (no pun intended) linearithmic sorting algorithms, the linear are special-purpose. In

sorting algorithms, there is the classic engineering tradeoff of generality with performance for

linear with linearithmic, respectively.

Summary of Linear with Linearithmic

Sorting is algorithmic, but the algorithm has a key operation based upon a property of the dataelements. Linearithmic sorting algorithms use comparison, or a comparison operation, but that

operation is based upon the universal property of ordering. Linear sorting algorithms use a

mapping operation based upon a specific property of the data elements.

The tradeoff in generality to performance for sorting algorithms is an absolute. But there is an

implicit presumption that there is no other universal property of the data elements. Such a

presumption is that ordering is the only universal property of data elements. However, orderingis not the only universal property of data elements.

B. Data

Long before computers were invented and constructed, data or lots of datum existed. Files, cards,

and records were kept as the industrial revolution lead to a revolution in data for production fig-

ures, records on employees, and the government kept records for taxes, census for population,and so forth. Hence the modern civilization is characterized by the growth of information in the

form of data in a variety of forms for different and varied uses. Now there is a vast dizzying

ocean of data, with more information accumulated, analyzed, mined, and processed.

Data on a Computer

In order to use data on a computer, the data must be represented in a form the computer can work

with natively. Roth explains, ...computers require processing of data which contains numbers,

letters, and other symbols such as punctuation marks. In order to transmit such alphanumeric da-

ta to or from a computer, or store it internally, in a computer, each symbol must be represented

by a binary code. [Roth 1992, pp. 13-14]


5/32


-5-

Data comes in the form of integers, ordinals, characters, strings, reals, and Boolean values. Com-

puters are binary, and only work on binary numbers, the binary digits or bits of a binary number.Hence an integer must be represented in binary, a Boolean value as binary, and all others as a

binary number.

Encoding Data

The representing of data on a computer as a binary number, or more generally in binary form, the

data must be encoded into binary form. The kinds of data, integers, ordinals, reals, characters,strings, and Booleans must be encoded.

Floyd explains, In order to communicate, we need not only numbers, but also letters and othersymbols. In the strictest sense, alphanumeric codes are codes that represent numbers and alpha-

betic characters (letters). Most such codes however, also represent symbols and various instruc-

tions necessary for conveying information. [Floyd 1990 p. 81].

Encoding is the system of representation of binary form of each data type in a consistent method.Many varieties of encoding exist, some well before computers were invented such as Morse

code, or the Baudot code. Other encodings are for communication efficiently such as Gray code,Excess-3, and encoding for error detection such as Hamming code. [Floyd 1990 p. 80] Null and

Lobur summarize encoding as ...how these internal values can be converted to a form that is

meaningful to humans. The manner in which this is done depends on both the coding system

used by computer and how the values are stored and retrieved. [Null and Lobur 2003 p. 68]

There are different encoding standards used, such as ASCII code [ANSI 1986], UTF-8 [Yergeau

2003], and Unicode [Unicode 2006] for characters, IEEE-754-1985 [IEEE 1985] for floating

point numbers, and twos complement for integers. Encoding is universal because computersrequire a ubiquitous form of representation to interchange data. But, the encoding is perceived as

a hardware feature, which is the reason encoding, is not readily apparent as a universal property.

Data (from the software perspective) automatically exists in the computer, and the details of thedata are a hardware, not software focus. An integer or Boolean exists in the programming

language, and thus in the computer, but the fact is is specifically encoded into a binary number is

abstracted away.

A summary of encoding for data types in the various programming languages:

1. integer - 2-complement encoding2. real (as float or double) - IEEE-754-19853. Boolean - binary word (each bit is a Boolean value of true or false)4. character - ASCII, EBCDIC, Unicode, UTF-8 (string is an array or linear structure of

characters)

Other Universal Property

The ordering property, in the form of a comparison function or operator, is not the only universal

property in data. Frequently in object-oriented software development, a class must define a


6/32


-6-

comparison method to determine the ordering of instances of the class as objects. From a

mathematical perspective, numbers such as real numbers and integer numbers have an ordering,but from an algorithmic perspective the ordering is not always given. Thus ordering is not the

only universal property, and thus the tradeoff in sorting algorithms of performance to generality

is not immutable.

The question is then one of: What is the alternate universal property of data? The answer isseemingly obvious, but is not. The answer is the encoding, that is how a data element is

represented, in the computer.

The basic data elements of integers, reals, and characters are all encoded in a binary form on a

computer. Computers are binary in operation and the representation or encoding is in bits--binarydigits. But for computers to exchange and share data the representation must be agreed upon--the

encoding, particularly for characters and strings of characters.

Given that the encoding is universal in a binary form, the germane question is: How does this

relate to a sorting algorithm? Simply put, finding another universal property allows forgenerality, and encoding is not ordering. Hence the algorithm using encoding is not comparison

sorting, and the linearithmic O(N lg N) lower bound is inapplicable.

Utilizing the encoding, a sorting algorithm can map the data elements instead of comparing a

data element to another, which is the basis for a general-purpose linear sorting algorithm. A

function or operator is needed to use the encoding to arrange the data elements from an initialrandom permutation into a sorted permutation. The sorting algorithm is the binar sort algorithm.

Ubiquity of Encoding

An obvious realization is that while Unicode is widely used, there is no one, single, universal

encoding of data, particulary for character data. It would seem that using the encoding as a

universal property to sort upon is contradictory as there is no universal encoding. No universalencoding implies that there is no universal property, hence that sorting is no general-purpose.

While there is no one universal encoding for data, all the various different encodings can betranslated or converted to one another. Even a proprietary encoding can be translated, out of the

need to work with the bulk of existing software, unless the originator of the proprietary encoding

intends to implement all the various forms of software to work with the encoding. The translationand conversion is a necessity for use of existing software. No matter how elegant and efficient an

encoding, the bulk of software exists using standardized encodings of data.

The universality is that all encodings are translatable to one another, so encoding is universal inthat each encoding is possible to use for a different platform or system.

More formally, for given encoding domain Ex there exists a translation function Fxy that

translates an encoding domain Ex into the encoding range Ey. A character or glyph cEx inencoding Ex is mapped by Fxy into the encoding Ey for a character cEy.


7/32


-7-

When cEx Ex cEy Ey then for Fxy, it follows Fxy : ExEythereforeFxy(cEx) cEy.

An encoding this is a subset of another larger encoding, the translation for the subset (a partialmapping function for Ex mapped by Fxy to Ey) is possible with all other characters outside the

domain encoding mapped to a default character in the range encoding.


8/32


-8-

2. Algorithm SynopsisThe binar sort partitions or divides data elements from the initial array into the lower and upper

sub-arrays. The algorithm extracts the nth

bit from a given data element, and then uses that bit to

map the element in the lower or upper sub-array.

Diagram 1 of Two Sub-Arrays at Extreme Ends of the Original Array

Two sub-arrays are created within the original array starting at the extreme endpoints and pro-

gressing towards each other at the middle of the initial array.

Diagram 2 of Two Sub-Arrays Approaching One Another

The process continues on each created sub-array using the next bit position in the data elements.

Diagram 3 of Two Sub-Arrays Encompassing Original Array Approaching One Another

This process continues until either the last bit in the data element is reached, or the size of a sub-

array is one element.

Diagram 4 Illustrating Two Sub-Arrays at Intersection or Crossover

The resulting array at the base case for termination cannot partition into more sub-arrays, or there

are no more bits for extraction. The original array is now in a sorted permutation.


9/32


-9-

3. Operation of AlgorithmThe binar sort algorithm operates both iteratively and recursively in place on the original array of

data elements.

The algorithm operates with four discrete steps that are:

1. Evaluate for recursive base case.2. Initialize starting array bounds.3. Partition the array into sub-arrays.4. Determine recursive call on sub-arrays.

Evaluate for Recursive Base Case

The first step is to evaluate the passed parameters for termination of the algorithm, to determineif a recursive base case has been reached. When one of the two possible base cases of the binar

sort is reached, the recursion terminates, and the call returns without any further operation on thepassed parameters.

The two criteria for the base case of the recursion are:

1. Reach the end of the bits in an element.2. The size of a sub-array is one element.

The first case is to reach the end of the number of bits for a data element. In effect, there are nomore bits to extract to use for partitioning. The second case is the check of the bounds of the ar-

ray parameter for any elements to partition into a sub-array. For an array of one element, there is

no point partitioning, as the element is in its final position. For either case, the operation of thealgorithm terminates.

Initialize Starting Array Bounds

When the binar sort algorithm does not terminate, then the operation proceeds to initialization.

The passed array bounds are used to initialize the bounds for the original array. The passed array

bounds are retained for use later in the operation of the algorithm. The bounds of the original ar-

ray are used by variables to track the changing boundaries of the original array during partition-ing. After initializing the original array boundaries, the operation then proceeds to partition.

Partition Array into Sub-arrays

The heart of the binar sort algorithm is the continual partitioning of the original array. The parti-


10/32


-10-

tion step of the algorithm divides the elements of the original array into lower and upper sub-

arrays.

The partition operation has three steps:

1.

Extract the n

th

bit from the data element in selected position.2. Map the data element in the correct sub-array.3. Adjust sub-array boundary to encompass element.

The initial start position used in the operation is the lower index position of the array. The start-

ing position acts as the locus of partitioning the array. The data element at the locus position is

the working element mapped into a sub-array.

Bit Extraction

Before partitioning the working data element, the nth bit is extracted to determine the sub-array

to map the element. Bit extraction uses a shift operation to the left by N-bits, and a bit mask. Thebit mask is a literal value that depends on the data element word size in bits. The result is all the

bits are zero or the value of the literal bit mask.

For a data element of a nybble or 4-bits, the bit mask is hexadecimal 0x8 or a binary number of

10002 for an integer value of 8.

For a data element using the bit mask with a logical and bitwise operation is:

XXXX2 && 10002 X0002

The resulting value for a nybble is either hexadecimal 0x0 or a binary number 00002 the integer0, a hexadecimal 0x8, or a binary number of 10002 for the integer value of 8, which is a non-

zero. Thus for a bit extraction, the result of the extracted bit for a data element is essentially a

zero or non-zero (the literal bit mask) value.

Placement of Element

The placement of the data element is dependent upon the bit value from the bit extraction. De-

pending on the extracted bit value, the data element is placed by one of two possibilities. The

data element selected is in the lower sub-array.

The two possibilities are:

1. Data element is in the correct position in the lower sub-array, extend sub-array boundary.2. Data element is in the wrong position, exchange with the element in the upper sub-array.

With an extracted bit value of zero, the data element is in the correct position in the lower sub-

array, and nothing is exchanged. For an extracted bit value of non-zero, the data element is ex-

changed or swapped with the data element in the upper sub-array. In both cases, it then followsto adjust the array bounds after the data element is placed.


11/32


-11-

Adjust Array Bounds

Once the data element is placed in the correct sub-array, the sub-array boundaries are adjusted to

encompass the element. For the lower sub-array, the boundary index is incremented, and for the

upper sub-array, the boundary index is decremented. The lower and upper array boundaries ap-proach each other as each data element is partitioned.

Partition Repetition

The partitioning process continues iteratively for each data element in the array. The iterative

process continues until the array bounds cross or intersect, when all the elements are partitionedinto the sub-arrays for a bit position. The remaining step is to determine the recursive continua-

tion of the operation of the algorithm.

Determine Recursive Continuation

The last step of the binar sort is recursive continuation of the algorithm on the result of partition-

ing. The partitioning creates one or two sub-arrays. For one sub-array, no partitioning occurred,which is pass through. A result of two sub-arrays, the partitioning process successfully divided

the data elements of the original array.

The continuation of the algorithm must determine the results of partitioning. The original arraybounds are evaluated using the index of sub-array bounds after partitioning, and partitioning con-

tinues recursively.

4. Illustration of the AlgorithmThe binar sort algorithm works on different data types (such as character, integer, ordinal, real,

and string) of different data bit sizes. The word size of the data element is a constant c of the al-


12/32


-12-

gorithm. In illustration of the algorithm, a nybble, or a 4-bit word size (a single hexadecimal

number) is used.

Consider an initial array of nybble values which are eight nybbles to sort into an ordered permu-

tation. The bit mask for the bit extraction is hexadecimal 0x1000, or integer value of 8.

First Pass at Partitioning

The first pass at partitioning divides the original array (in the center) into a lower sub-array (onthe left) and an upper sub-array (on the right). Initially before any partitioning the configuration

of the array, lower sub-array, and upper sub-array is:

Figure 1 Initial Array of Elements before Partitioning

The selected element is 'B', and the bit extracted is from the most significant position, which is a

non-zero. The element is not in the correct position, so is swapped with the element at the boun-

dary of the upper sub-array, the 'E' element. After the exchange, and array boundary adjustment

the configuration is:

Figure 2 Array of Elements after Partitioning First Element

The selected element is 'E', and the bit extracted is non-zero. Again, the element is not in the cor-

rect position, so is swapped with the 'C' element. The configuration is:

Figure 3 Array of Elements after Partitioning Second Element

The selected element is 'C', and the bit extracted is non-zero. The element is not in the correct

position, and so is swapped the 'A' element. The configuration is:


13/32


-13-

Figure 4 Array of Elements after Partitioning Third Element

The selected element is 'A', and the bit extracted is non-zero. The element is not in the correct

position, and so is swapped the '7' element. The configuration is:

Figure 5 Array of Elements after Partitioning Four Element

The selected element is '7', and the bit extracted is zero. The element is in the correct position,

and so only the array bounds are adjusted. The configuration is:

Figure 6 Array of Elements after Partitioning Fifth Element

The selected element is '4', and the bit extracted is zero. Again, the element is in the correct posi-

tion, and so only the array bounds are adjusted. The configuration is:

Figure 7 Array of Elements after Partitioning Sixth Element

The selected element is '0', and the bit extracted is zero. The element is in the correct position,and so only the array bounds are adjusted. The configuration is:


14/32


-14-

Figure 8 Array of Elements after Partitioning Seventh Element

Again, as the selected element is '0', and by the same process, the final configuration of:

Figure 9 Array of Elements after Partitioning Eighth Element

The algorithm continues on each sub-array as an array, further partitioning them on the next bit

position.

Partition from Initial Lower Sub-Array

The partitioning of the lower sub-array is on the second bit position. As before the initial confi-

guration is:

Figure 10 Lower Sub-array of Elements before Partitioning

The selected element is '7', and the extracted bit is non-zero. The element is placed with an ex-

change, and the configuration is:

Figure 11 Lower Sub-array of Elements after Partitioning First Element

The selected element is '0', and the extracted bit is zero. The element is in the correct position,

and the configuration is:


15/32


-15-

Figure 12 Lower Sub-array of Elements after Partitioning Second Element

The selected element is '4', and the extracted bit is non-zero. The element is placed with an ex-

change, and the configuration is:

Figure 13 Lower Sub-array of Elements after Partitioning Third Element

The selected element is '0', and the extracted bit is zero. The element is in the swapped into posi-

tion (with itself as there is only one element), and the configuration is:

Figure 14 Lower Sub-array of Elements after Partitioning Fourth Element

The resulting lower and upper sub-array elements are in the correct positions. However, parti-

tioning would continue, but in the case of pass through resulting in only one sub-array. The re-

maining third and fourth bit positions are evaluated, and when the last bit position is reached.

Partition from Initial Upper Sub-Array

The partitioning of the lower sub-array is on the second bit position. As before the initial confi-

guration is:

Figure 15 Upper Sub-array of Elements before Partitioning

The selected element is 'A', and the extracted bit is zero. The element is in the correct position,



16/32


-16-

Figure 16 Upper Sub-array of Elements after Partitioning First Element

The selected element is 'C', and the extracted bit is non-zero. The element is swapped into the

correct position, and the configuration is:

Figure 17 Upper Sub-array of Elements after Partitioning Second Element

The selected element is 'B', and the extracted bit is zero. The element is in the correct position,


Figure 18 Upper Sub-array of Elements after Partitioning Third Element

Lastly, the selected element is 'E', and the extracted bit is non-zero. The element is swapped

(with itself as there is only one element) and the array bounds adjusted. The final configuration

is:

Figure 19 Upper Sub-array of Elements after Partitioning Fourth Element

The lower sub-array for the third and fourth bits the elements are in the correct position, the caseof pass through. For the upper sub-array, the elements are positioned on the third bit, but then the

recursion terminates, as the sub-array size is one element.

Summary of Illustration of the Algorithm


17/32


-17-

The partitioning process starts from an initial array of nibbles or hexadecimal numbers, and con-

tinues until the end of the last bit (pass through), or when the sub-array is only of one element insize. Consider the array configuration into sub-arrays for each bit position, starting from an ini-

tial array. The configuration for each pass is:

Figure 20 Partitioning Hierarchy from Initial Array into Sub-arrays

On the partitioning with the second and third bit, the sub-array with [E C] is partitioned into

two sub-arrays of a single element. Thus, for the remaining bit positions, no partitioning occurs

as the recursion terminates.

However, the other sub-arrays are the case of pass through, no data elements are partitioned. This

continues until the last bit position is reached, and the recursion terminates. Both recursive cases

are used in the overall partitioning process.

After the last bit position is evaluated for the three sub-arrays that are partitioned but result inpass through, the overall array is in the final configuration of:


18/32


-18-

Figure 21 Resulting Array after Partitioning into Sorted Permutation

The configuration is, by Knuth's formal definition, a sorted permutation of the original array of

elements. The partitioning on the sub-arrays operates in place, within the index boundaries for

sub-arrays within the overall array.

5. Analysis of AlgorithmThe binar sort is linear in both space and time complexity. The algorithm requires no extra sto-

rage space or memory, and performance time is proportional to the number of data elements.


19/32


-19-

Space Analysis

The binar sort is an in-place sorting algorithm; hence the same initial array is used for each re-

cursive invocation of the algorithm for partitioning. The sub-arrays are determined by the passed

parameters as index boundaries within the array. The only change is of the variables for each re-

cursive call of the binar sort.

The array ofNelements and a constant number of variables c for reach recursive invocation is

the space complexity S:

O(S) = N + c

As there are multiple invocations through recursion, the constant is the sum of each recursive

call, or for i total calls recursively:

O(S) = N + (c0 + ... + ci-1)

The expression can be simplified using summation notation to the expression:

O(S) = N +

jcj0

i1

But the summation of a constant is the product of the constant:

jcj 0

i1

= icj = c

Effectively the sum of the constants c for i recursive calls is simply another constant c, or inBig-Oh notation a constant c, so the space complexity for the binar sort is:

O(S) = N + c = O(N)

The space complexity of the binar sort is linear or O(N), proportional to the number of data ele-ments.

Time Analysis

The analysis of the performance time of sorting algorithms typically considers different cases,from an optimal to a worst case. Each case is simply a different permutation for the particular

sorting algorithm is better or worse performance. Frequently, the permutations of the data for the


20/32


-20-

different cases are for an ascending sorted permutation, a descending sorted permutation, and a

random permutation.

This approach to sorting algorithm analysis is used for the comparison-based sorting algorithms.

An unasked question is "Why use various permutations for performance time analysis?" The an-

swer is that comparison-based sorting algorithms do a relative comparison for positioning a dataelement. Each comparison results in a relative positioning of the data element in the overall array

of data elements. Thus, the initial permutation of the data elements can impact sorting perfor-

mance.

Why consider this particular point for time analysis of a sorting algorithm? Simply, the binar sort

does not use comparison to place an element in position, but the extracted bit values. For theconstant c total bits or word size of a data element, each bit constitutes 1/c of the process in the

correct positioning a data element within the array. Each positioning is absolute within the over-

all array, not relative to the other data elements. Effectively, the initial permutation of the data

elements is superfluous to the performance time of the binar sort. Comparison is not used to rela-

tively place the element; each bit is used to put an element in an absolute position.

In the analysis of the performance time, considering worse and optimal cases is irrelevant as thebinar sort is not comparison-based, so is independent of any particular permutation of the data

elements.

Analysis of the Performance

Analysis of the binar sort algorithm performance time does not use specific cases. The analysis

approach is to consider each discrete step, and consider the overall total performance time O(T)as the sum of each step. Thus the initial analysis of the performance time of the binar sort is:

O(T) = O(T0) + O(T1) + O(T2) + O(T3)

Each discrete step has its own performance time, and for each discrete step the performance time

is:

1. O(T0) - Evaluate for recursive base case.2. O(T1) - Initialize starting array bounds.3. O(T2) - Partition array into sub-arrays.4. O(T3) - Determine recursive call on sub-arrays.

Except for the partition step of the algorithm, all the other steps are constant in the performance

time. For the last step involving a logical decision of the recursive call, the complexity of theevaluation is constant for the decision, and the greater performance time for two recursive calls is

used as the constant. Substituting a constant for each step, the performance time of the binar sort

becomes:

O(T) = O(c0) + O(c1) + O(T2) + O(c3)


21/32


-21-

The expression simplifies the constants to a single constant c, and then is:

O(T) = O(T2) + c

Partitioning Analysis

The analysis of partitioning is not constant, as the partitioning step in the algorithm is iterative.

However, the iterative step processes through all elements in the array, and for the recursive con-

tinuation the sub-arrays, which are divisions of the overall array. Thus partitioning is a constantperformance time c for the bit extraction and determination of which sub-array to place a data

element. But this process is repeated n times, or once for each data element. This means that the

performance time for the partitioning step is:

O(T2) = cN

Substituting this into the original performance time expression for the algorithm is:

O(T) = cN + c

But for the Big-Oh time complexity, this expression simplifies to:

O(T) = N

Recursion Analysis

One important consideration in the performance time analysis, is that for the expression as the

sum of the performance time for each discrete step, is that it is for one operation of the algorithm

on a single bit position. This means the performance time expression is:

O(T) = 1N

For each recursive pass, each bit of each data element is accessed once, and there are a constant

number c of bits (and for other data types with variants, such as a string of characters, there is an

overall average length for a data element that is independent of the number of data elements).This means that the recursion will occur for each bit position, so that the performance time is:

O(T) = cN

Again, the performance time is O(N) linear.

Summary of Performance Analysis

Regardless of the permutation of the data elements, the binar sort in the worst case will access a

data element once for each bit. Hence forNdata elements with a constant number of bits c, theperformance time is O(N). The binar sort algorithm operates independently of the permutation of


22/32


-22-

the data elements, so avoids a worst or optimal case. Thus, the binar sort algorithm remains con-

sistently linear in performance time.

6. Applied PerformanceThe binar sort is implemented in both managed code (in Java and C#) and native code (using C)

to evaluate the applied performance of the algorithm. The code was executed on different plat-

forms, from an AMD 64-bit PC with Windows 7, an PowerPC eMac running Mac OS X, an Intel


23/32


-23-

Core Duo MacBook, and Linux Pentium 4 PC. The managed code was executed on the Common

Language Runtime (CLR) of .NET, and the Java Virtual Machine (JVM).

The same data sets are used for all the implementations of the binar sort. The test set examined

three cases:

1. Unsorted integers

2. Sorted integers

3. Partially sorted integers

In each case, the test was executed, and the performance time and size of the data written to a

data log file. The test sizes are varied, but as increments of blocks from a size of 216

= 65536 in-tegers, and blocks the size of 220 = 1,048,576 integers.

The performance is linear, with a statistical covariance ranging from 0.9994 to 0.9999 in the

graph of the performance. The applied implementation of the binar sort was consistent with the

theoretical expectation, linear O(N) performance.

Chart of Test for Binar Sort Performance at 0.9994 Covariance


24/32


-24-




25/32


-25-


Chart of Test for Binar Sort Performance at 0.9999 Covariance on Megabyte Block

Summary of Charts


26/32


-26-

The charts of the performance of data size N to time T illustrate a linear, straight line. The size ofdata increases linearly, and then the time to sort the data increases linearly in proportion to a con-

stant. The covariance varies from 0.9994 to 0.9999, but is 0.999 for each equation. Hence the

equations are statistically accurate for the binar sort performance.

Chart of Binar Sort Performance Equations High Scalar Range Plot

When the equations of data size to sort time are plotted, the lines are congruent, and some paral-

lel to each other.

Chart of Binar Sort Performance Equations Lower Scale Range Plot

The equations change along the position along the axes, but the plot of the lines are consistent.This indicates how the binar sort performance is consistently linear, with a change in the equa-

tion constants from the data sizes and the runtime platform.

7. Future Work


27/32


-27-

There are two major potential aspects for further work with the binar sort that are:

1. Variations2. Optimization

Algorithm Variants

The binar sort is recursive and executes on a serial processor. This leads to the opportunity for:

1. Purely iterative implementation.2. Parallel version of the algorithm.

The algorithm variant of purely iterative is to replace recursion with iteration. The implementa-

tion is more an investigation into the potential possibility, so an open research question. An itera-

tive version would remove the overhead in time and space of activation records on the runtime

stack of the environment. A comparison of the performance of a recursive to iterative implemen-tation of the binar sort would show possible performance improvement.

The binar sort executes on a serial processor, and thus it has potential parallelization. A parallel

version of the binar sort would let each processor focus on the partition for the processor identi-

ty. The interesting feature is that from the initial array of data elements, each parallel operation is

completely independent of the other. Theoretically, the linear algorithm ofO(N) increases in per-formance by O(N/P) where there are a total ofP-processors.

Optimize the Algorithm

The binar sort is efficient as a linear O(N) algorithm, but optimizations are possible in the opera-

tion of the algorithm. Two such possible optimizations are:

1. Improve for the partitioning case of pass through.2. Optimize for determining if elements in sorted order.

A potential optimization is for pass through when partitioning, essentially no data elements are

partitioned into sub-arrays. The cost of a recursive call to continue partitioning in the same

bounds for the next bit position is optimized by iteration. Instead of a recursive call, the partition-ing would proceed but for the next bit position. This also simplifies the continuation recursive

step, as the recursion would always be for two sub-arrays, not just a single one.

The illustration of the operation of the binar sort demonstrated that the algorithm continues therecursive invocation of the algorithm when the entire array is in a sorted permutation. This is

wasteful of process time, and increases the overhead of additional recursion without any further

need. A viable optimization is to check within the original array bounds to verify if the data ele-

ments are in sorted order, and indicate a sorted permutation exists.


28/32


-28-

The difficulty with the optimization is to avoid unnecessary checks when the array is not poten-

tially sorted, but to determine when the array is sorted to avoid unnecessary recursive partition-ing. For example, after several occurrences of pass through while partitioning a sorted order

check is performed. If the array is in a sorted ordered arrangement, the recursive calls in other

sub-arrays would terminate.


29/32


-29-

8. ConclusionThe binar sort is a linear, unstable, and recursive sorting algorithm. The binar sort algorithm is

independent of the initial data permutation. By using the bits of the encoded data elements, the

binar sort arranges the data elements into a sorted permutation. Utilizing the encoding makes the

binar sort universal, but also a linear algorithm independent of any initial permutation for ex-treme performance cases.

Beyond the features of the binar sort algorithm, are some intriguing realizations. One is that ge-

nerality and performance are not a hard, fast, tradeoff that is frequently presumed. The binar sort,

by its nature illustrates that such a tradeoff is not immutable. The binar sort algorithm illustratesthat the linearithmic, comparison based sorting algorithms have an implicit presumption that or-

dering is the only universal property feasible for generalized sorting. This implicit presumption is

flawed, and the other universal property is (of which all computers at the time of this writing are)

the universal property of the encoding of data.

In the analysis of the binar sort algorithm, the permutation of the data is superfluous and irrele-vant to the performance of the binar sort. This illustrates that comparison based sorting algo-

rithms are sensitive to the permutation of the data set to sort. By using encoding instead of order-ing, the binar sort avoids the sensitivity flaw to the permutation of the data. This flaw in sensitiv-

ity to the permutation of the data is an accepted defect in existing comparison based sorting algo-

rithms, but again it is an implicit presumption that this is necessary.

The binar sort algorithm breaks new ground, but any further development is far from complete or

a finished. Further work is necessary for improvement and optimization of the binar sort. A pure-ly iterative variant, and a parallel version of the algorithm are just two possibilities of future

work. Future developments are anticipated with possibly other algorithms inspired by the binar

sort.


30/32


-30-

Appendix A. Binar Sort Java Source Code for Integer

voidBinarSort(int lower, int upper, int pos, int[] array)

{

if(pos == 33 || upper < lower + 1) return;

int lo = lower; //array lower bound positionint hi = upper; //array upper bound position

while (lo < hi + 1){

int bit = (array[lo]


31/32


-31-

Appendix B. Binar Sort C Source Code for Integer

voidbsort(constint low, constint high, constint pos, int list[])

{

int lo = low;int hi = high;

if( pos == 32 || high < low + 1 ) return;

while (lo < hi + 1)

{

if( (list[lo]


32/32


References

[ANSI 1986] American National Standards Institute, "Coded Character Set - 7-bit American

Standard Code for Information Interchange", ANSI X3.4, 1986.

[Floyd 1990] Floyd, Thomas L. Digital Fundamentals, 4th edition. Merril Publishing Company,Columbus, OH. 1990.

[IEEE 1985] IEEE Computer Society. IEEE Standard for Binary Floating-Point Arithmetic,IEEE Std 754-1985, 1985.

[Johnsonbaugh and Schaefer 2004] Johnsonbaugh, Richard and Schaefer, Marcus. Algorithms.

Prentice-Hall, Upper Saddle River, New Jersey, 2004.

[Knuth 1998] Knuth, Donald E. The Art of Computer Programming, 2nd

edition. Addison-Wesley, Reading, Massachusetts, 1998.

[Null and Lobur 2003] Null, Linda and Lobur, Julia. The Essentials of Computer Organization

and Architecture, Jones and Bartlett Publishers, Sudbury MA. 2003

[Roth 1992] Roth, Jr., Charles A. Fundamentals of Logic Design, 4th edition. West Publishing

Company, New York, NY. 1992.

[Unicode 2006] The Unicode Standard, Version 5.0, Fifth Edition, The Unicode Consortium,

Addison-Wesley Professional, 2006.

[Yergeau 2003] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC3629, November 2003.

binar sort a generalized sorting algorithm

Documents