binar sort a generalized sorting algorithm

Upload: william-gilreath

Post on 08-Apr-2018

232 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    1/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    by

    William F. Gilreath

    ([email protected])

    May 2011

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    2/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -2-

    Abstract

    Sorting is a common and ubiquitous activity for computers. It is not surprising that there exists a

    plethora of sorting algorithms. For all the sorting algorithms, it is an accepted performance limit

    that sorting algorithms are linearithmic O(N lg N). The linearithmic lower bound in performance

    stems from the fact that the sorting algorithms use the ordering property of the data. The sortingalgorithm uses comparison by the ordering property to arrange the data elements from an initial

    permutation into a sorted permutation.

    Linear O(N) sorting algorithms exist, but use a priori knowledge of the data to use a specific

    property, and thus has greater performance. In contrast, the linearithmic sorting algorithms aregeneralized by using a universal property of data, relative ordering by comparison, but have a

    linearithmic performance lower bound. The tradeoff in sorting algorithms is generality for per-

    formance by the chosen property used to sort the data elements.

    A general-purpose, linear sorting algorithm in the context of the tradeoff of performance for ge-

    nerality seems implausible. There is an implicit assumption that only the property of ordering isuniversal. But, ordering is not the only universal property for data elements. The binar sort is a

    general-purpose sorting algorithm that utilizes this other universal property to sort with linearperformance.

    Keywords: comparison, comparison sorting, encoding, ordering, linear, linearithmic, sort, sort-ing, universal property.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    3/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -3-

    1. IntroductionA. Sorting

    Sorting is a frequent, ubiquitous computational activity on computers, often in tandem with other

    algorithms such as search. The interest and ubiquity of sorting has led to the development of adiverse number of sorting algorithms. But it begs the question of What exactly is sorting?

    Sorting Formalism

    A sorting formalism independent of any specific sorting algorithm defines sorting without a spe-cific method to sort a collection of data elements. Knuth defines sorting [Knuth 1998] as:

    The records:

    R1, R2, , RN-1, RN

    are supposed to be sorted into non-decreasing order of their keys K1, K2,,KN, essentially by

    discovering a permutation p(1) p(2)p(N) such that:

    K p(1) Kp(2) Kp(N-1) Kp(N)

    Knuths definition of sorting defines sorting in terms of the end result, a specific permutation outof the n! permutations possible for a collection of records. The definition is more mathematical

    than algorithmic. The result is described mathematically in organization among the dataelements, but not the mechanism for the result. The actual means is the specific sorting

    algorithm.

    Linearithmic Sorting

    Most sorting algorithms are comparison based sorting algorithms. The comparison sorting

    algorithms include such algorithms as the merge sort, quick sort, and heap sort. These sortingalgorithms use comparison to arrange the elements in the sorted permutation, and are general-

    purpose in nature.

    The comparison sorting algorithms have a well-known theoretical [Johnsonbaugh and Schaefer2004] performance limit that is the least upper bound for sorting that is linearithmic or O(N lg N)

    in complexity. This theoretical lower bound is from the basis of the comparison sorting

    algorithm-using sorting to arrange the data elements. A decision tree forNelements is oflogarithmic height lg N. Thus, the time complexity involves the cost that is the cost of using adecision tree to compare elements lg Nand the number of elementsN. Hence the theoretical least

    upper bound is the product of the two costs involved, O(N lg N), which islinearithmic

    complexity.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    4/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -4-

    Linear Sorting

    Most, but not all sorting algorithms are linearithmic and use comparison as the operation to

    organize the data elements. But, there are linear sorting algorithms, or O(N) complexity, but are

    not utilizing comparison. Such linear algorithms include the radix sort, hash sort, and counting

    sort.

    Linear sorting algorithms use a priori knowledge of the data elements, a specific property, that is

    required to sort linearly. The linear sorting algorithms map with a function, the data elementusing the specific property.

    The linear sorting algorithms are not general-purpose as they are tied to a specific property.Compared to (no pun intended) linearithmic sorting algorithms, the linear are special-purpose. In

    sorting algorithms, there is the classic engineering tradeoff of generality with performance for

    linear with linearithmic, respectively.

    Summary of Linear with Linearithmic

    Sorting is algorithmic, but the algorithm has a key operation based upon a property of the dataelements. Linearithmic sorting algorithms use comparison, or a comparison operation, but that

    operation is based upon the universal property of ordering. Linear sorting algorithms use a

    mapping operation based upon a specific property of the data elements.

    The tradeoff in generality to performance for sorting algorithms is an absolute. But there is an

    implicit presumption that there is no other universal property of the data elements. Such a

    presumption is that ordering is the only universal property of data elements. However, orderingis not the only universal property of data elements.

    B. Data

    Long before computers were invented and constructed, data or lots of datum existed. Files, cards,

    and records were kept as the industrial revolution lead to a revolution in data for production fig-

    ures, records on employees, and the government kept records for taxes, census for population,and so forth. Hence the modern civilization is characterized by the growth of information in the

    form of data in a variety of forms for different and varied uses. Now there is a vast dizzying

    ocean of data, with more information accumulated, analyzed, mined, and processed.

    Data on a Computer

    In order to use data on a computer, the data must be represented in a form the computer can work

    with natively. Roth explains, ...computers require processing of data which contains numbers,

    letters, and other symbols such as punctuation marks. In order to transmit such alphanumeric da-

    ta to or from a computer, or store it internally, in a computer, each symbol must be represented

    by a binary code. [Roth 1992, pp. 13-14]

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    5/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -5-

    Data comes in the form of integers, ordinals, characters, strings, reals, and Boolean values. Com-

    puters are binary, and only work on binary numbers, the binary digits or bits of a binary number.Hence an integer must be represented in binary, a Boolean value as binary, and all others as a

    binary number.

    Encoding Data

    The representing of data on a computer as a binary number, or more generally in binary form, the

    data must be encoded into binary form. The kinds of data, integers, ordinals, reals, characters,strings, and Booleans must be encoded.

    Floyd explains, In order to communicate, we need not only numbers, but also letters and othersymbols. In the strictest sense, alphanumeric codes are codes that represent numbers and alpha-

    betic characters (letters). Most such codes however, also represent symbols and various instruc-

    tions necessary for conveying information. [Floyd 1990 p. 81].

    Encoding is the system of representation of binary form of each data type in a consistent method.Many varieties of encoding exist, some well before computers were invented such as Morse

    code, or the Baudot code. Other encodings are for communication efficiently such as Gray code,Excess-3, and encoding for error detection such as Hamming code. [Floyd 1990 p. 80] Null and

    Lobur summarize encoding as ...how these internal values can be converted to a form that is

    meaningful to humans. The manner in which this is done depends on both the coding system

    used by computer and how the values are stored and retrieved. [Null and Lobur 2003 p. 68]

    There are different encoding standards used, such as ASCII code [ANSI 1986], UTF-8 [Yergeau

    2003], and Unicode [Unicode 2006] for characters, IEEE-754-1985 [IEEE 1985] for floating

    point numbers, and twos complement for integers. Encoding is universal because computersrequire a ubiquitous form of representation to interchange data. But, the encoding is perceived as

    a hardware feature, which is the reason encoding, is not readily apparent as a universal property.

    Data (from the software perspective) automatically exists in the computer, and the details of thedata are a hardware, not software focus. An integer or Boolean exists in the programming

    language, and thus in the computer, but the fact is is specifically encoded into a binary number is

    abstracted away.

    A summary of encoding for data types in the various programming languages:

    1. integer - 2-complement encoding2. real (as float or double) - IEEE-754-19853. Boolean - binary word (each bit is a Boolean value of true or false)4. character - ASCII, EBCDIC, Unicode, UTF-8 (string is an array or linear structure of

    characters)

    Other Universal Property

    The ordering property, in the form of a comparison function or operator, is not the only universal

    property in data. Frequently in object-oriented software development, a class must define a

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    6/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -6-

    comparison method to determine the ordering of instances of the class as objects. From a

    mathematical perspective, numbers such as real numbers and integer numbers have an ordering,but from an algorithmic perspective the ordering is not always given. Thus ordering is not the

    only universal property, and thus the tradeoff in sorting algorithms of performance to generality

    is not immutable.

    The question is then one of: What is the alternate universal property of data? The answer isseemingly obvious, but is not. The answer is the encoding, that is how a data element is

    represented, in the computer.

    The basic data elements of integers, reals, and characters are all encoded in a binary form on a

    computer. Computers are binary in operation and the representation or encoding is in bits--binarydigits. But for computers to exchange and share data the representation must be agreed upon--the

    encoding, particularly for characters and strings of characters.

    Given that the encoding is universal in a binary form, the germane question is: How does this

    relate to a sorting algorithm? Simply put, finding another universal property allows forgenerality, and encoding is not ordering. Hence the algorithm using encoding is not comparison

    sorting, and the linearithmic O(N lg N) lower bound is inapplicable.

    Utilizing the encoding, a sorting algorithm can map the data elements instead of comparing a

    data element to another, which is the basis for a general-purpose linear sorting algorithm. A

    function or operator is needed to use the encoding to arrange the data elements from an initialrandom permutation into a sorted permutation. The sorting algorithm is the binar sort algorithm.

    Ubiquity of Encoding

    An obvious realization is that while Unicode is widely used, there is no one, single, universal

    encoding of data, particulary for character data. It would seem that using the encoding as a

    universal property to sort upon is contradictory as there is no universal encoding. No universalencoding implies that there is no universal property, hence that sorting is no general-purpose.

    While there is no one universal encoding for data, all the various different encodings can betranslated or converted to one another. Even a proprietary encoding can be translated, out of the

    need to work with the bulk of existing software, unless the originator of the proprietary encoding

    intends to implement all the various forms of software to work with the encoding. The translationand conversion is a necessity for use of existing software. No matter how elegant and efficient an

    encoding, the bulk of software exists using standardized encodings of data.

    The universality is that all encodings are translatable to one another, so encoding is universal inthat each encoding is possible to use for a different platform or system.

    More formally, for given encoding domain Ex there exists a translation function Fxy that

    translates an encoding domain Ex into the encoding range Ey. A character or glyph cEx inencoding Ex is mapped by Fxy into the encoding Ey for a character cEy.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    7/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -7-

    When cEx Ex cEy Ey then for Fxy, it follows Fxy : ExEythereforeFxy(cEx) cEy.

    An encoding this is a subset of another larger encoding, the translation for the subset (a partialmapping function for Ex mapped by Fxy to Ey) is possible with all other characters outside the

    domain encoding mapped to a default character in the range encoding.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    8/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -8-

    2. Algorithm SynopsisThe binar sort partitions or divides data elements from the initial array into the lower and upper

    sub-arrays. The algorithm extracts the nth

    bit from a given data element, and then uses that bit to

    map the element in the lower or upper sub-array.

    Diagram 1 of Two Sub-Arrays at Extreme Ends of the Original Array

    Two sub-arrays are created within the original array starting at the extreme endpoints and pro-

    gressing towards each other at the middle of the initial array.

    Diagram 2 of Two Sub-Arrays Approaching One Another

    The process continues on each created sub-array using the next bit position in the data elements.

    Diagram 3 of Two Sub-Arrays Encompassing Original Array Approaching One Another

    This process continues until either the last bit in the data element is reached, or the size of a sub-

    array is one element.

    Diagram 4 Illustrating Two Sub-Arrays at Intersection or Crossover

    The resulting array at the base case for termination cannot partition into more sub-arrays, or there

    are no more bits for extraction. The original array is now in a sorted permutation.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    9/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -9-

    3. Operation of AlgorithmThe binar sort algorithm operates both iteratively and recursively in place on the original array of

    data elements.

    The algorithm operates with four discrete steps that are:

    1. Evaluate for recursive base case.2. Initialize starting array bounds.3. Partition the array into sub-arrays.4. Determine recursive call on sub-arrays.

    Evaluate for Recursive Base Case

    The first step is to evaluate the passed parameters for termination of the algorithm, to determineif a recursive base case has been reached. When one of the two possible base cases of the binar

    sort is reached, the recursion terminates, and the call returns without any further operation on thepassed parameters.

    The two criteria for the base case of the recursion are:

    1. Reach the end of the bits in an element.2. The size of a sub-array is one element.

    The first case is to reach the end of the number of bits for a data element. In effect, there are nomore bits to extract to use for partitioning. The second case is the check of the bounds of the ar-

    ray parameter for any elements to partition into a sub-array. For an array of one element, there is

    no point partitioning, as the element is in its final position. For either case, the operation of thealgorithm terminates.

    Initialize Starting Array Bounds

    When the binar sort algorithm does not terminate, then the operation proceeds to initialization.

    The passed array bounds are used to initialize the bounds for the original array. The passed array

    bounds are retained for use later in the operation of the algorithm. The bounds of the original ar-

    ray are used by variables to track the changing boundaries of the original array during partition-ing. After initializing the original array boundaries, the operation then proceeds to partition.

    Partition Array into Sub-arrays

    The heart of the binar sort algorithm is the continual partitioning of the original array. The parti-

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    10/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -10-

    tion step of the algorithm divides the elements of the original array into lower and upper sub-

    arrays.

    The partition operation has three steps:

    1.

    Extract the n

    th

    bit from the data element in selected position.2. Map the data element in the correct sub-array.3. Adjust sub-array boundary to encompass element.

    The initial start position used in the operation is the lower index position of the array. The start-

    ing position acts as the locus of partitioning the array. The data element at the locus position is

    the working element mapped into a sub-array.

    Bit Extraction

    Before partitioning the working data element, the nth bit is extracted to determine the sub-array

    to map the element. Bit extraction uses a shift operation to the left by N-bits, and a bit mask. Thebit mask is a literal value that depends on the data element word size in bits. The result is all the

    bits are zero or the value of the literal bit mask.

    For a data element of a nybble or 4-bits, the bit mask is hexadecimal 0x8 or a binary number of

    10002 for an integer value of 8.

    For a data element using the bit mask with a logical and bitwise operation is:

    XXXX2 && 10002 X0002

    The resulting value for a nybble is either hexadecimal 0x0 or a binary number 00002 the integer0, a hexadecimal 0x8, or a binary number of 10002 for the integer value of 8, which is a non-

    zero. Thus for a bit extraction, the result of the extracted bit for a data element is essentially a

    zero or non-zero (the literal bit mask) value.

    Placement of Element

    The placement of the data element is dependent upon the bit value from the bit extraction. De-

    pending on the extracted bit value, the data element is placed by one of two possibilities. The

    data element selected is in the lower sub-array.

    The two possibilities are:

    1. Data element is in the correct position in the lower sub-array, extend sub-array boundary.2. Data element is in the wrong position, exchange with the element in the upper sub-array.

    With an extracted bit value of zero, the data element is in the correct position in the lower sub-

    array, and nothing is exchanged. For an extracted bit value of non-zero, the data element is ex-

    changed or swapped with the data element in the upper sub-array. In both cases, it then followsto adjust the array bounds after the data element is placed.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    11/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -11-

    Adjust Array Bounds

    Once the data element is placed in the correct sub-array, the sub-array boundaries are adjusted to

    encompass the element. For the lower sub-array, the boundary index is incremented, and for the

    upper sub-array, the boundary index is decremented. The lower and upper array boundaries ap-proach each other as each data element is partitioned.

    Partition Repetition

    The partitioning process continues iteratively for each data element in the array. The iterative

    process continues until the array bounds cross or intersect, when all the elements are partitionedinto the sub-arrays for a bit position. The remaining step is to determine the recursive continua-

    tion of the operation of the algorithm.

    Determine Recursive Continuation

    The last step of the binar sort is recursive continuation of the algorithm on the result of partition-

    ing. The partitioning creates one or two sub-arrays. For one sub-array, no partitioning occurred,which is pass through. A result of two sub-arrays, the partitioning process successfully divided

    the data elements of the original array.

    The continuation of the algorithm must determine the results of partitioning. The original arraybounds are evaluated using the index of sub-array bounds after partitioning, and partitioning con-

    tinues recursively.

    4. Illustration of the AlgorithmThe binar sort algorithm works on different data types (such as character, integer, ordinal, real,

    and string) of different data bit sizes. The word size of the data element is a constant c of the al-

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    12/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -12-

    gorithm. In illustration of the algorithm, a nybble, or a 4-bit word size (a single hexadecimal

    number) is used.

    Consider an initial array of nybble values which are eight nybbles to sort into an ordered permu-

    tation. The bit mask for the bit extraction is hexadecimal 0x1000, or integer value of 8.

    First Pass at Partitioning

    The first pass at partitioning divides the original array (in the center) into a lower sub-array (onthe left) and an upper sub-array (on the right). Initially before any partitioning the configuration

    of the array, lower sub-array, and upper sub-array is:

    Figure 1 Initial Array of Elements before Partitioning

    The selected element is 'B', and the bit extracted is from the most significant position, which is a

    non-zero. The element is not in the correct position, so is swapped with the element at the boun-

    dary of the upper sub-array, the 'E' element. After the exchange, and array boundary adjustment

    the configuration is:

    Figure 2 Array of Elements after Partitioning First Element

    The selected element is 'E', and the bit extracted is non-zero. Again, the element is not in the cor-

    rect position, so is swapped with the 'C' element. The configuration is:

    Figure 3 Array of Elements after Partitioning Second Element

    The selected element is 'C', and the bit extracted is non-zero. The element is not in the correct

    position, and so is swapped the 'A' element. The configuration is:

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    13/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -13-

    Figure 4 Array of Elements after Partitioning Third Element

    The selected element is 'A', and the bit extracted is non-zero. The element is not in the correct

    position, and so is swapped the '7' element. The configuration is:

    Figure 5 Array of Elements after Partitioning Four Element

    The selected element is '7', and the bit extracted is zero. The element is in the correct position,

    and so only the array bounds are adjusted. The configuration is:

    Figure 6 Array of Elements after Partitioning Fifth Element

    The selected element is '4', and the bit extracted is zero. Again, the element is in the correct posi-

    tion, and so only the array bounds are adjusted. The configuration is:

    Figure 7 Array of Elements after Partitioning Sixth Element

    The selected element is '0', and the bit extracted is zero. The element is in the correct position,and so only the array bounds are adjusted. The configuration is:

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    14/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -14-

    Figure 8 Array of Elements after Partitioning Seventh Element

    Again, as the selected element is '0', and by the same process, the final configuration of:

    Figure 9 Array of Elements after Partitioning Eighth Element

    The algorithm continues on each sub-array as an array, further partitioning them on the next bit

    position.

    Partition from Initial Lower Sub-Array

    The partitioning of the lower sub-array is on the second bit position. As before the initial confi-

    guration is:

    Figure 10 Lower Sub-array of Elements before Partitioning

    The selected element is '7', and the extracted bit is non-zero. The element is placed with an ex-

    change, and the configuration is:

    Figure 11 Lower Sub-array of Elements after Partitioning First Element

    The selected element is '0', and the extracted bit is zero. The element is in the correct position,

    and the configuration is:

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    15/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -15-

    Figure 12 Lower Sub-array of Elements after Partitioning Second Element

    The selected element is '4', and the extracted bit is non-zero. The element is placed with an ex-

    change, and the configuration is:

    Figure 13 Lower Sub-array of Elements after Partitioning Third Element

    The selected element is '0', and the extracted bit is zero. The element is in the swapped into posi-

    tion (with itself as there is only one element), and the configuration is:

    Figure 14 Lower Sub-array of Elements after Partitioning Fourth Element

    The resulting lower and upper sub-array elements are in the correct positions. However, parti-

    tioning would continue, but in the case of pass through resulting in only one sub-array. The re-

    maining third and fourth bit positions are evaluated, and when the last bit position is reached.

    Partition from Initial Upper Sub-Array

    The partitioning of the lower sub-array is on the second bit position. As before the initial confi-

    guration is:

    Figure 15 Upper Sub-array of Elements before Partitioning

    The selected element is 'A', and the extracted bit is zero. The element is in the correct position,

    and the configuration is:

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    16/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -16-

    Figure 16 Upper Sub-array of Elements after Partitioning First Element

    The selected element is 'C', and the extracted bit is non-zero. The element is swapped into the

    correct position, and the configuration is:

    Figure 17 Upper Sub-array of Elements after Partitioning Second Element

    The selected element is 'B', and the extracted bit is zero. The element is in the correct position,

    and the configuration is:

    Figure 18 Upper Sub-array of Elements after Partitioning Third Element

    Lastly, the selected element is 'E', and the extracted bit is non-zero. The element is swapped

    (with itself as there is only one element) and the array bounds adjusted. The final configuration

    is:

    Figure 19 Upper Sub-array of Elements after Partitioning Fourth Element

    The lower sub-array for the third and fourth bits the elements are in the correct position, the caseof pass through. For the upper sub-array, the elements are positioned on the third bit, but then the

    recursion terminates, as the sub-array size is one element.

    Summary of Illustration of the Algorithm

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    17/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -17-

    The partitioning process starts from an initial array of nibbles or hexadecimal numbers, and con-

    tinues until the end of the last bit (pass through), or when the sub-array is only of one element insize. Consider the array configuration into sub-arrays for each bit position, starting from an ini-

    tial array. The configuration for each pass is:

    Figure 20 Partitioning Hierarchy from Initial Array into Sub-arrays

    On the partitioning with the second and third bit, the sub-array with [E C] is partitioned into

    two sub-arrays of a single element. Thus, for the remaining bit positions, no partitioning occurs

    as the recursion terminates.

    However, the other sub-arrays are the case of pass through, no data elements are partitioned. This

    continues until the last bit position is reached, and the recursion terminates. Both recursive cases

    are used in the overall partitioning process.

    After the last bit position is evaluated for the three sub-arrays that are partitioned but result inpass through, the overall array is in the final configuration of:

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    18/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -18-

    Figure 21 Resulting Array after Partitioning into Sorted Permutation

    The configuration is, by Knuth's formal definition, a sorted permutation of the original array of

    elements. The partitioning on the sub-arrays operates in place, within the index boundaries for

    sub-arrays within the overall array.

    5. Analysis of AlgorithmThe binar sort is linear in both space and time complexity. The algorithm requires no extra sto-

    rage space or memory, and performance time is proportional to the number of data elements.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    19/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -19-

    Space Analysis

    The binar sort is an in-place sorting algorithm; hence the same initial array is used for each re-

    cursive invocation of the algorithm for partitioning. The sub-arrays are determined by the passed

    parameters as index boundaries within the array. The only change is of the variables for each re-

    cursive call of the binar sort.

    The array ofNelements and a constant number of variables c for reach recursive invocation is

    the space complexity S:

    O(S) = N + c

    As there are multiple invocations through recursion, the constant is the sum of each recursive

    call, or for i total calls recursively:

    O(S) = N + (c0 + ... + ci-1)

    The expression can be simplified using summation notation to the expression:

    O(S) = N +

    jcj0

    i1

    But the summation of a constant is the product of the constant:

    jcj 0

    i1

    = icj = c

    Effectively the sum of the constants c for i recursive calls is simply another constant c, or inBig-Oh notation a constant c, so the space complexity for the binar sort is:

    O(S) = N + c = O(N)

    The space complexity of the binar sort is linear or O(N), proportional to the number of data ele-ments.

    Time Analysis

    The analysis of the performance time of sorting algorithms typically considers different cases,from an optimal to a worst case. Each case is simply a different permutation for the particular

    sorting algorithm is better or worse performance. Frequently, the permutations of the data for the

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    20/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -20-

    different cases are for an ascending sorted permutation, a descending sorted permutation, and a

    random permutation.

    This approach to sorting algorithm analysis is used for the comparison-based sorting algorithms.

    An unasked question is "Why use various permutations for performance time analysis?" The an-

    swer is that comparison-based sorting algorithms do a relative comparison for positioning a dataelement. Each comparison results in a relative positioning of the data element in the overall array

    of data elements. Thus, the initial permutation of the data elements can impact sorting perfor-

    mance.

    Why consider this particular point for time analysis of a sorting algorithm? Simply, the binar sort

    does not use comparison to place an element in position, but the extracted bit values. For theconstant c total bits or word size of a data element, each bit constitutes 1/c of the process in the

    correct positioning a data element within the array. Each positioning is absolute within the over-

    all array, not relative to the other data elements. Effectively, the initial permutation of the data

    elements is superfluous to the performance time of the binar sort. Comparison is not used to rela-

    tively place the element; each bit is used to put an element in an absolute position.

    In the analysis of the performance time, considering worse and optimal cases is irrelevant as thebinar sort is not comparison-based, so is independent of any particular permutation of the data

    elements.

    Analysis of the Performance

    Analysis of the binar sort algorithm performance time does not use specific cases. The analysis

    approach is to consider each discrete step, and consider the overall total performance time O(T)as the sum of each step. Thus the initial analysis of the performance time of the binar sort is:

    O(T) = O(T0) + O(T1) + O(T2) + O(T3)

    Each discrete step has its own performance time, and for each discrete step the performance time

    is:

    1. O(T0) - Evaluate for recursive base case.2. O(T1) - Initialize starting array bounds.3. O(T2) - Partition array into sub-arrays.4. O(T3) - Determine recursive call on sub-arrays.

    Except for the partition step of the algorithm, all the other steps are constant in the performance

    time. For the last step involving a logical decision of the recursive call, the complexity of theevaluation is constant for the decision, and the greater performance time for two recursive calls is

    used as the constant. Substituting a constant for each step, the performance time of the binar sort

    becomes:

    O(T) = O(c0) + O(c1) + O(T2) + O(c3)

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    21/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -21-

    The expression simplifies the constants to a single constant c, and then is:

    O(T) = O(T2) + c

    Partitioning Analysis

    The analysis of partitioning is not constant, as the partitioning step in the algorithm is iterative.

    However, the iterative step processes through all elements in the array, and for the recursive con-

    tinuation the sub-arrays, which are divisions of the overall array. Thus partitioning is a constantperformance time c for the bit extraction and determination of which sub-array to place a data

    element. But this process is repeated n times, or once for each data element. This means that the

    performance time for the partitioning step is:

    O(T2) = cN

    Substituting this into the original performance time expression for the algorithm is:

    O(T) = cN + c

    But for the Big-Oh time complexity, this expression simplifies to:

    O(T) = N

    Recursion Analysis

    One important consideration in the performance time analysis, is that for the expression as the

    sum of the performance time for each discrete step, is that it is for one operation of the algorithm

    on a single bit position. This means the performance time expression is:

    O(T) = 1N

    For each recursive pass, each bit of each data element is accessed once, and there are a constant

    number c of bits (and for other data types with variants, such as a string of characters, there is an

    overall average length for a data element that is independent of the number of data elements).This means that the recursion will occur for each bit position, so that the performance time is:

    O(T) = cN

    Again, the performance time is O(N) linear.

    Summary of Performance Analysis

    Regardless of the permutation of the data elements, the binar sort in the worst case will access a

    data element once for each bit. Hence forNdata elements with a constant number of bits c, theperformance time is O(N). The binar sort algorithm operates independently of the permutation of

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    22/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -22-

    the data elements, so avoids a worst or optimal case. Thus, the binar sort algorithm remains con-

    sistently linear in performance time.

    6. Applied PerformanceThe binar sort is implemented in both managed code (in Java and C#) and native code (using C)

    to evaluate the applied performance of the algorithm. The code was executed on different plat-

    forms, from an AMD 64-bit PC with Windows 7, an PowerPC eMac running Mac OS X, an Intel

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    23/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -23-

    Core Duo MacBook, and Linux Pentium 4 PC. The managed code was executed on the Common

    Language Runtime (CLR) of .NET, and the Java Virtual Machine (JVM).

    The same data sets are used for all the implementations of the binar sort. The test set examined

    three cases:

    1. Unsorted integers

    2. Sorted integers

    3. Partially sorted integers

    In each case, the test was executed, and the performance time and size of the data written to a

    data log file. The test sizes are varied, but as increments of blocks from a size of 216

    = 65536 in-tegers, and blocks the size of 220 = 1,048,576 integers.

    The performance is linear, with a statistical covariance ranging from 0.9994 to 0.9999 in the

    graph of the performance. The applied implementation of the binar sort was consistent with the

    theoretical expectation, linear O(N) performance.

    Chart of Test for Binar Sort Performance at 0.9994 Covariance

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    24/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -24-

    Chart of Test for Binar Sort Performance at 0.9996 Covariance

    Chart of Test for Binar Sort Performance at 0.9997 Covariance

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    25/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -25-

    Chart of Test for Binar Sort Performance at 0.9999 Covariance

    Chart of Test for Binar Sort Performance at 0.9999 Covariance on Megabyte Block

    Summary of Charts

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    26/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -26-

    The charts of the performance of data size N to time T illustrate a linear, straight line. The size ofdata increases linearly, and then the time to sort the data increases linearly in proportion to a con-

    stant. The covariance varies from 0.9994 to 0.9999, but is 0.999 for each equation. Hence the

    equations are statistically accurate for the binar sort performance.

    Chart of Binar Sort Performance Equations High Scalar Range Plot

    When the equations of data size to sort time are plotted, the lines are congruent, and some paral-

    lel to each other.

    Chart of Binar Sort Performance Equations Lower Scale Range Plot

    The equations change along the position along the axes, but the plot of the lines are consistent.This indicates how the binar sort performance is consistently linear, with a change in the equa-

    tion constants from the data sizes and the runtime platform.

    7. Future Work

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    27/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -27-

    There are two major potential aspects for further work with the binar sort that are:

    1. Variations2. Optimization

    Algorithm Variants

    The binar sort is recursive and executes on a serial processor. This leads to the opportunity for:

    1. Purely iterative implementation.2. Parallel version of the algorithm.

    The algorithm variant of purely iterative is to replace recursion with iteration. The implementa-

    tion is more an investigation into the potential possibility, so an open research question. An itera-

    tive version would remove the overhead in time and space of activation records on the runtime

    stack of the environment. A comparison of the performance of a recursive to iterative implemen-tation of the binar sort would show possible performance improvement.

    The binar sort executes on a serial processor, and thus it has potential parallelization. A parallel

    version of the binar sort would let each processor focus on the partition for the processor identi-

    ty. The interesting feature is that from the initial array of data elements, each parallel operation is

    completely independent of the other. Theoretically, the linear algorithm ofO(N) increases in per-formance by O(N/P) where there are a total ofP-processors.

    Optimize the Algorithm

    The binar sort is efficient as a linear O(N) algorithm, but optimizations are possible in the opera-

    tion of the algorithm. Two such possible optimizations are:

    1. Improve for the partitioning case of pass through.2. Optimize for determining if elements in sorted order.

    A potential optimization is for pass through when partitioning, essentially no data elements are

    partitioned into sub-arrays. The cost of a recursive call to continue partitioning in the same

    bounds for the next bit position is optimized by iteration. Instead of a recursive call, the partition-ing would proceed but for the next bit position. This also simplifies the continuation recursive

    step, as the recursion would always be for two sub-arrays, not just a single one.

    The illustration of the operation of the binar sort demonstrated that the algorithm continues therecursive invocation of the algorithm when the entire array is in a sorted permutation. This is

    wasteful of process time, and increases the overhead of additional recursion without any further

    need. A viable optimization is to check within the original array bounds to verify if the data ele-

    ments are in sorted order, and indicate a sorted permutation exists.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    28/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -28-

    The difficulty with the optimization is to avoid unnecessary checks when the array is not poten-

    tially sorted, but to determine when the array is sorted to avoid unnecessary recursive partition-ing. For example, after several occurrences of pass through while partitioning a sorted order

    check is performed. If the array is in a sorted ordered arrangement, the recursive calls in other

    sub-arrays would terminate.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    29/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -29-

    8. ConclusionThe binar sort is a linear, unstable, and recursive sorting algorithm. The binar sort algorithm is

    independent of the initial data permutation. By using the bits of the encoded data elements, the

    binar sort arranges the data elements into a sorted permutation. Utilizing the encoding makes the

    binar sort universal, but also a linear algorithm independent of any initial permutation for ex-treme performance cases.

    Beyond the features of the binar sort algorithm, are some intriguing realizations. One is that ge-

    nerality and performance are not a hard, fast, tradeoff that is frequently presumed. The binar sort,

    by its nature illustrates that such a tradeoff is not immutable. The binar sort algorithm illustratesthat the linearithmic, comparison based sorting algorithms have an implicit presumption that or-

    dering is the only universal property feasible for generalized sorting. This implicit presumption is

    flawed, and the other universal property is (of which all computers at the time of this writing are)

    the universal property of the encoding of data.

    In the analysis of the binar sort algorithm, the permutation of the data is superfluous and irrele-vant to the performance of the binar sort. This illustrates that comparison based sorting algo-

    rithms are sensitive to the permutation of the data set to sort. By using encoding instead of order-ing, the binar sort avoids the sensitivity flaw to the permutation of the data. This flaw in sensitiv-

    ity to the permutation of the data is an accepted defect in existing comparison based sorting algo-

    rithms, but again it is an implicit presumption that this is necessary.

    The binar sort algorithm breaks new ground, but any further development is far from complete or

    a finished. Further work is necessary for improvement and optimization of the binar sort. A pure-ly iterative variant, and a parallel version of the algorithm are just two possibilities of future

    work. Future developments are anticipated with possibly other algorithms inspired by the binar

    sort.

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    30/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -30-

    Appendix A. Binar Sort Java Source Code for Integer

    voidBinarSort(int lower, int upper, int pos, int[] array)

    {

    if(pos == 33 || upper < lower + 1) return;

    int lo = lower; //array lower bound positionint hi = upper; //array upper bound position

    while (lo < hi + 1){

    int bit = (array[lo]

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    31/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    -31-

    Appendix B. Binar Sort C Source Code for Integer

    voidbsort(constint low, constint high, constint pos, int list[])

    {

    int lo = low;int hi = high;

    if( pos == 32 || high < low + 1 ) return;

    while (lo < hi + 1)

    {

    if( (list[lo]

  • 8/6/2019 Binar Sort a Generalized Sorting Algorithm

    32/32

    Binar Sort: A Linear Generalized Sorting Algorithm

    References

    [ANSI 1986] American National Standards Institute, "Coded Character Set - 7-bit American

    Standard Code for Information Interchange", ANSI X3.4, 1986.

    [Floyd 1990] Floyd, Thomas L. Digital Fundamentals, 4th edition. Merril Publishing Company,Columbus, OH. 1990.

    [IEEE 1985] IEEE Computer Society. IEEE Standard for Binary Floating-Point Arithmetic,IEEE Std 754-1985, 1985.

    [Johnsonbaugh and Schaefer 2004] Johnsonbaugh, Richard and Schaefer, Marcus. Algorithms.

    Prentice-Hall, Upper Saddle River, New Jersey, 2004.

    [Knuth 1998] Knuth, Donald E. The Art of Computer Programming, 2nd

    edition. Addison-Wesley, Reading, Massachusetts, 1998.

    [Null and Lobur 2003] Null, Linda and Lobur, Julia. The Essentials of Computer Organization

    and Architecture, Jones and Bartlett Publishers, Sudbury MA. 2003

    [Roth 1992] Roth, Jr., Charles A. Fundamentals of Logic Design, 4th edition. West Publishing

    Company, New York, NY. 1992.

    [Unicode 2006] The Unicode Standard, Version 5.0, Fifth Edition, The Unicode Consortium,

    Addison-Wesley Professional, 2006.

    [Yergeau 2003] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC3629, November 2003.