data structures arrays phil tayco slide version 1.0 feb 02, 2015

Data Structures: Arrays

Phil Tayco

Slide version 1.0

Feb 02, 2015

Arrays

Our first traditional data structure

• Arrays in modern programming languages have different forms (ArrayList, dynamic memory allocated arrays, dictionaries, etc.)

• Depending on the language, many of the constraints we discuss may appear to be addressed

• We look at a more traditional view of an array and its design intentions rather than a specific programming language's implementation

Arrays

Definition

• This structure begins with reserving a specified amount of space for n number of elements

• Each element is the same data type
• Direct random access of any element is possible
• Element location is referred to as an “index”, with the first index starting at a value of 0

Arrays

Considerations

• A specified amount of space for n number of elements must be reserved. This means you must consider maximum capacity at creation and implies some space may not be used in the program

• Direct random access of any element is possible suggesting fast performance getting to any record

Arrays

Functional usage: Insert

• Adding an element into an array requires 2 essential steps
– Ensure there is enough space in the array to add the new record
– Add the record, preserving the intended order of the array if necessary

• Many times, an additional variable representing current size is maintained along with the array (e.g. the capacity of an array may be set to 100 but the current number of existing elements may be 5)
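The distinction between capacity and current size can be sketched as follows. This is a minimal illustration, assuming int elements; the class name SizedArrayDemo and the stored values are made up for the example.

```java
// Minimal sketch: an array with a fixed capacity and a separately tracked size.
// The class name, field names, and values are illustrative assumptions.
class SizedArrayDemo {
    static final int CAPACITY = 100;          // space reserved at creation
    static int[] array = new int[CAPACITY];   // all 100 slots exist immediately
    static int currentSize = 0;               // number of slots actually in use

    public static void main(String[] args) {
        // Store 5 values; the remaining 95 slots stay reserved but unused.
        for (int value = 10; value <= 50; value += 10) {
            array[currentSize++] = value;
        }
        System.out.println("capacity     = " + array.length);
        System.out.println("currentSize  = " + currentSize);
        System.out.println("unused slots = " + (array.length - currentSize));
    }
}
```

Note that array.length always reports the capacity, never the number of records in use, which is why the separate variable is needed.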

Arrays

Insert unordered

• If the order is insignificant, the key step after ensuring there is enough space is to find the next available space and add the new record there

• An effective way to maintain this is to use the current size variable as the index for the next available space to add a record

• Before insert, if the current size is already equal to the array capacity, the insert cannot be performed

• If insert is performed, the current size must be incremented

Arrays

Sample code for insert unordered:

// Given Type[] array = new Type[CAPACITY];
// Given int currentSize = 0;

boolean insert(Type element)
{
    if (currentSize == array.length)
        return false; // Array is at maximum capacity

    array[currentSize++] = element;
    return true;
}

Arrays

Insert unordered analysis

• Using the comparison operation type and a worst case scenario of all elements filled, the Big O for this algorithm is O(1): There will always be one comparison performed

• The performance is also the same in all other cases (if the list is empty or partially full)

• If the list is unordered, this algorithm is most effective given that its performance is constant

• Question: would it even be possible to eliminate the need to perform a comparison?

Arrays

Insert unordered algorithm 2

• Given this ideal performance, it may seem unnecessary to explore other algorithms

• As analysts on the never ending quest for something better, we should take a look at other ideas

“Always keep an open mind and a compassionate heart” – Phil Jackson

Arrays

Insert unordered algorithm 2

• In the previous algorithm, the new element gets added to the end of the array

• What if we added it to the beginning of the array?
– All elements “in front” of the new record must be shifted over to the next spot
– The need to check for capacity must still be performed

Arrays

Sample code for insert unordered 2:


boolean insert(Type element)
{
    if (currentSize == array.length)
        return false; // Array is at maximum capacity

    // Shift every existing element one spot to the right
    for (int n = currentSize; n > 0; n--)
        array[n] = array[n-1];

    array[0] = element;
    currentSize++;
    return true;
}

Arrays

Insert unordered 2 analysis

• The code is the same as algorithm 1 with the addition of a loop which means there will at least be one comparison

• Can the loop add to the performance time? Definitely in the worst case!
– Worst case is when only one space is available
– The loop must perform (n-1) comparisons to shift everything before adding the new record

• In the worst case, performance is O(n) while best case is O(1) – can also be viewed as the more the array grows in used spaces, the more the performance goes toward O(n)
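To see the cost grow as the array fills, here is a small sketch that counts element moves per insert at the front. It assumes int elements; the class name FrontInsertDemo and the shifts counter are additions for illustration only.

```java
// Sketch: insert at the front of an unordered array, counting element moves.
// FrontInsertDemo and the "shifts" counter are hypothetical names for this example.
class FrontInsertDemo {
    private final int[] array;
    private int currentSize = 0;
    long shifts = 0;   // total element moves performed so far

    FrontInsertDemo(int capacity) {
        array = new int[capacity];
    }

    boolean insertFront(int element) {
        if (currentSize == array.length)
            return false;               // array is at maximum capacity
        for (int n = currentSize; n > 0; n--) {
            array[n] = array[n - 1];    // move each element one spot right
            shifts++;
        }
        array[0] = element;
        currentSize++;
        return true;
    }

    int get(int index) {
        return array[index];
    }

    public static void main(String[] args) {
        FrontInsertDemo demo = new FrontInsertDemo(5);
        for (int value = 1; value <= 5; value++) {
            long before = demo.shifts;
            demo.insertFront(value);
            System.out.println("insert #" + value + " moved "
                    + (demo.shifts - before) + " element(s)");
        }
    }
}
```

Each insert moves as many elements as are already stored, so the work per insert grows linearly with the number of used spaces.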

Arrays

Insert unordered algorithms comparison

• Clearly, the first algorithm is preferred. Does that mean the second algorithm has no application?

• Perhaps there are situations where the order of the contents of the array doesn’t matter, but it is particularly important to know which element was added to the list last

• Remember the saying at the beginning: “It depends” – in this case, how a record is deleted could be a factor

• Before examining the delete algorithms, we should look at the Search and Update functions

Arrays

Search

• As discussed in the introduction, the search in an unordered list is an O(n) operation in the worst case using comparisons as the operation type

• Worst case situation is the record is found in the last location or does not exist in a full array

• Best case is the record is found on the first try, but how much can that be viewed as a reliable way to measure algorithm effectiveness in this case?

• Average case is going to be somewhere in between best and worst, which at this point is “good to know”

Arrays

Linear Search

• Since the algorithm performs at O(n), an unordered array search is often referred to as a “Linear search”

• Sample code that follows returns the index location of the record if found or -1 representing not found

Arrays

Sample code for linear search:


int search(Type element)
{
    for (int n = 0; n < currentSize; n++)
        if (array[n] == element)
            return n;

    return -1;
}

Arrays

Linear search analysis

• Clearly, this is O(n) performance. However, notice the actual number of comparisons
– 1 comparison to control the loop
– 1 comparison to check between element and array index
• The actual number of comparisons in the worst case is 2 * n. Why isn’t this referred to as O(2n)?
– Our initial analysis is to identify the algorithm category
– If we want to compare between two O(n) algorithms, the 2n becomes more significant
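The two kinds of comparison can be counted directly. The sketch below is an instrumented version of the linear search, assuming int elements; the class name LinearSearchCount and both counters are additions for illustration. Strictly, the loop test runs one extra time when it finally fails, so a not-found search over n elements shows n + 1 loop comparisons and n element comparisons, which is roughly 2n.

```java
// Sketch: linear search instrumented to count both kinds of comparison.
// LinearSearchCount and the two counters are hypothetical names for this example.
class LinearSearchCount {
    static int loopComparisons;     // times "n < currentSize" was evaluated
    static int elementComparisons;  // times "array[n] == element" was evaluated

    static int search(int[] array, int currentSize, int element) {
        loopComparisons = 0;
        elementComparisons = 0;
        int n = 0;
        while (true) {
            loopComparisons++;
            if (!(n < currentSize))
                break;              // loop control comparison failed
            elementComparisons++;
            if (array[n] == element)
                return n;           // found
            n++;
        }
        return -1;                  // not found
    }

    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23};
        int result = search(data, data.length, 42);   // worst case: not found
        System.out.println("result = " + result
                + ", loop comparisons = " + loopComparisons
                + ", element comparisons = " + elementComparisons);
    }
}
```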

Arrays

Can we do better?

• The inclination is to see if we can develop an algorithm that performs better than this. The challenges are:
– 2 comparisons are always needed to control going through the array and checking if we found the element
– Worst case requires going through every single element in the array
– Arbitrary start points (end of array or middle of array) and random hopping around the array do not improve the performance in the worst case and actually make things more complicated

• Think of a word search puzzle. If your algorithm is to look for the first letter of your search word in the puzzle, no matter how you jump around, the worst case is still ending up checking every letter

Arrays

Update

• Now that we’ve looked at the search algorithm, update (and delete) can take advantage of this

• Update is a search for an element and if found, modifying it while maintaining the design intent of the structure

• In this case, the design intent is an unordered list making a modification of a record simple (there’s no need to check if the array order needs to be maintained because there is no order)

• We can use the search algorithm in our update

Arrays

Sample code for update:


boolean update(Type oldElement, Type newElement)
{
    int searchIndex = search(oldElement);

    if (searchIndex == -1)
        return false; // Element to update is not found

    array[searchIndex] = newElement;
    return true;
}

Arrays

Update analysis

• Because the algorithm uses the search algorithm, the performance initially depends on the Big O of the search

• Since the search is linear in the worst case, update performance will at least be O(n) as well

• Are there additional comparisons to consider?
– Notice after search is performed, there is one more comparison done to check if the record was found
– Technically, this algorithm is O(2n + 1), but as far as a category is concerned, this is simply O(n)

• Thus, the update performance is also linear in the worst case

Arrays

Delete

• Now the delete. In most data structures, the delete function is usually found to be the most complex and is often treated last among the 4 functions

• The algorithm of this function also tends to affect the performance of the other functions, so choosing between delete algorithms means considering the impact on insert, search and update

• The design intent of this structure is still unordered, so maintaining order after removing an element is not a big deal

Arrays

Delete

• However, another design intent to question is whether or not to have “holes” in the array

• In this case, we don’t want holes because that impacts the use of the “currentSize” variable as the location of the next available element

• To account for this, the algorithm requires the following steps:
– Use the search algorithm to find the element to remove
– Shift over “to the left” any records after the record
– Reduce the current size by 1

Arrays

Sample code for delete:

boolean delete(Type element)
{
    int searchIndex = search(element);

    if (searchIndex == -1)
        return false; // Element to delete is not found

    // “Remove” element at array[searchIndex] by shifting every
    // later element one spot to the left over it
    for (int n = searchIndex; n < currentSize - 1; n++)
        array[n] = array[n+1];

    currentSize--;
    return true;
}

Arrays

Delete analysis

• Because the algorithm uses the search algorithm, the performance initially depends on the Big O of the search

• Question: What is the performance of the shift that occurs after the search is performed?

• Answer: In the case of all elements requiring a shift, it will be O(n)

• Question 2: What does this make the performance of delete?

Arrays

Big O of delete in the worst case

• The worst case of the shift is O(n) implying that the overall order is technically O(2n) + O(n) or O(3n)

• If this is true, the algorithm should still be considered as being in the linear category of O(n), but it would be good to note that it still has a higher cost than search and update
• However, it is not really O(3n)
– In the worst case of the shift, the search will perform in the best case (because the element will be found on the first try!)
– In the worst case of the search (the element is last or not found), the shift of the entire array will not need to be performed
– The effective order is still O(n), but it is a lot closer to O(2n) than O(3n) as the code implies
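This trade-off between the search and the shift can be checked with a small count. The sketch below assumes int elements and counts only element comparisons during the search and element moves during the shift (not loop control comparisons); the class name DeleteCostDemo and the method deleteCost are hypothetical.

```java
// Sketch: count search comparisons and shift moves for one unordered delete.
// DeleteCostDemo and deleteCost are hypothetical names for this example.
class DeleteCostDemo {
    // Returns {searchComparisons, shifts} for deleting "target" from a full array.
    static int[] deleteCost(int[] source, int target) {
        int[] array = source.clone();      // leave the caller's array untouched
        int currentSize = array.length;

        int searchComparisons = 0;
        int searchIndex = -1;
        for (int n = 0; n < currentSize; n++) {
            searchComparisons++;
            if (array[n] == target) {
                searchIndex = n;
                break;
            }
        }

        int shifts = 0;
        if (searchIndex != -1) {
            for (int n = searchIndex; n < currentSize - 1; n++) {
                array[n] = array[n + 1];   // close the hole
                shifts++;
            }
        }
        return new int[] {searchComparisons, shifts};
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 50};
        for (int target : new int[] {10, 30, 50}) {
            int[] cost = deleteCost(data, target);
            System.out.println("delete " + target + ": search = " + cost[0]
                    + ", shift = " + cost[1] + ", total = " + (cost[0] + cost[1]));
        }
    }
}
```

Under these counting assumptions, the search and shift costs always add up to n for an element that is found: the further the search has to go, the less there is left to shift.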

Arrays

Unordered Array Summary

• Worst case scenarios show the performances as:
– Insert: O(1)
– Update, Search and Delete: O(n)

• Holes are not intended in this data structure to ensure the currentSize variable is utilized properly

• Question: how are these performances impacted (if at all) if duplicate record values are allowed?

• Update and Delete are dependent on Search – if Search were somehow improved, Update and Delete could be positively impacted

• Search is improved dramatically if a sense of order is maintained. This is our next topic: Sorted arrays

Sorted Arrays

Definition

• As the name implies, a sorted array is an array where the elements maintain some sense of order

• This order can be anything that makes sense within the context of the data elements used:
– Numerical sequential order of a key value like social security numbers
– Reverse alphabetical order of strings like student last names
– Function call order managing code execution
– Numeric and mathematical symbol order for calculating equations
– Character sequences representing compressed or encoded text
– More and more…

• The main goal of maintaining any order is that its use is designed to improve the efficiency of any, if not all, supporting functions such as search

Sorted Arrays

Binary Search

• A linear search in a non sorted array performs at O(n) in the worst case

• By establishing an order, the binary search algorithm can be applied and significantly improves performance in the worst case

• The algorithm:
– In an array of n elements, go to index [n/2]
– If the record there is the one you want, you are done
– If the record value there is smaller than your search value, all records up to and including the current record can be ignored: set your range of elements to [n/2+1…n-1] and return to step 1
– Otherwise, set your range of elements to [0…(n/2)-1] and return to step 1
– Repeat this loop until you have 0 elements (record is not found) or record is found

Sorted Arrays

Code for binary search:

// Function returns -1 if element is not found or index of found element
int binarySearch(Type searchValue)
{
    int lowIndex = 0;
    int highIndex = currentSize - 1;
    int currentIndex;

    while (highIndex >= lowIndex)
    {
        currentIndex = (lowIndex + highIndex) / 2;
        if (array[currentIndex] == searchValue)
            return currentIndex;
        else if (array[currentIndex] > searchValue)
            highIndex = currentIndex - 1;
        else
            lowIndex = currentIndex + 1;
    }

    return -1;
}

Sorted Arrays

Binary search analysis

• Using the comparison operation as a unit of measure, each iteration can be seen at worst as 3 comparisons performed

• Worst case scenario is an element not found:
– 10 elements: 3 * (4 iterations) + 1
– 100 elements: 3 * (7 iterations) + 1
– 1000 elements: 3 * (10 iterations) + 1

• What is the formula that captures the relationship between the size of the list and the number of iterations? (Take the “3 *” and the “+ 1” out of the equation)


Sorted Arrays

Exponential growth

• The formula is based on the fact that the algorithm cuts the range of the search size in half with each iteration

• On the growing side of things mathematically, numbers that double in size are exponentially growing by a factor of 2:
– 8 = 2^3
– 16 = 2^4
– 32 = 2^5

Sorted Arrays

Captain’s log

• The inverse of an exponent is a logarithm, which can be similarly applied to these numbers:
– log2 8 = 3
– log2 16 = 4
– log2 32 = 5
• Since the binary search algorithm cuts the array size in half with each iteration, the number of iterations can be related to the size of the list in a similar way:
– log2 n = number of iterations

Sorted Arrays

To be precise…

• The size of a list is often not a power of 2
• Truncate the result of log2 n and add 1:
– number of iterations = floor(log2 n) + 1

• For example, with 100 elements, log2 100 equals about 6.64. Truncate it to 6 and add 1 and you get 7 iterations

• This makes for a fun game with kids. Think of a number between 1 and 1000 and you can find it in 10 tries!
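The iteration counts quoted for 10, 100 and 1000 elements can be checked against the formula with a short sketch. It simulates a binary search for a value larger than every element (one not-found worst case) and compares the loop count with floor(log2 n) + 1. The class and method names are hypothetical, and the formula is computed with integer bit operations rather than floating-point logarithms.

```java
// Sketch: count binary search iterations for a not-found value and compare
// with floor(log2 n) + 1. Class and method names are hypothetical.
class BinarySearchIterations {
    // Simulates searching for a value larger than every element,
    // so the search always continues into the upper half.
    static int countIterations(int size) {
        int lowIndex = 0;
        int highIndex = size - 1;
        int iterations = 0;
        while (highIndex >= lowIndex) {
            int currentIndex = (lowIndex + highIndex) / 2;
            iterations++;
            lowIndex = currentIndex + 1;   // search value is larger: go right
        }
        return iterations;
    }

    // floor(log2 n) + 1 for n >= 1: the number of bits needed to write n in binary.
    static int formula(int size) {
        return 32 - Integer.numberOfLeadingZeros(size);
    }

    public static void main(String[] args) {
        for (int size : new int[] {10, 100, 1000}) {
            System.out.println(size + " elements: counted = " + countIterations(size)
                    + ", formula = " + formula(size));
        }
    }
}
```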

Sorted Arrays

Back to Big O

• Now we have a way to categorize the performance of binary search

• As the list grows in size, the performance grows logarithmically. We can be precise if we need to, but as a category, this is O(log n)

• O(log n) is much better than O(n)
• O(log n) is usually found with algorithms that cut search ranges in half (known as the “divide and conquer” method which we will see more of later)

• You could say with smaller n values, O(n) is better than O(log n), but remember that the purpose of the categories is to characterize the performance as the list grows in size

Sorted Arrays

So sorted arrays are better, right?

• Recall worst case scenarios for unsorted arrays:
– Insert: O(1)
– Update, Search and Delete: O(n)

• For sorted arrays, the Search improves to O(log n), but what about Insert, Update and Delete?

• All functions must now consider keeping the order of the array intact:
– Insert needs to find the right location to add the new element, making the performance degrade to O(log n) plus whatever work is necessary to shift elements
– Update can improve to O(log n), but the new key value may require moving the element to a new location and shifting other elements
– Delete can also improve to O(log n), but the elements must also be shifted to keep holes from forming

Sorted Arrays

Okay, well how do we stand so far:

Sorted Arrays

What do we do with the other 3 functions?

• Search is dramatically improved, which makes storing read-only data in a sorted state very valuable

• If we have to update, insert and delete, then there are 2 schools of thought:

– Maintain the order as you perform these functions
– Do not maintain order as you perform these functions and only perform a sort when you need to (such as before a search takes place)

• Let’s next take a look at maintaining the order as the functions are performed
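For contrast, the second school of thought (sort only when needed) might look like the sketch below. It is one possible arrangement, not something prescribed by these slides: it assumes int elements, uses the standard java.util.Arrays sort and binarySearch methods on the used portion of the array, and the class name LazySortedArray and the sorted flag are hypothetical.

```java
import java.util.Arrays;

// Sketch: unordered O(1) inserts, with a sort performed only before a search.
// LazySortedArray and the "sorted" flag are hypothetical names for this example.
class LazySortedArray {
    private final int[] array;
    private int currentSize = 0;
    private boolean sorted = true;   // an empty array is trivially sorted

    LazySortedArray(int capacity) {
        array = new int[capacity];
    }

    // Unordered insert: add at the end and remember the order may be broken.
    boolean insert(int element) {
        if (currentSize == array.length)
            return false;
        array[currentSize++] = element;
        sorted = false;
        return true;
    }

    // Sort the used portion only if something changed, then binary search it.
    // Returns the index of the element in the sorted array, or -1 if not found.
    int search(int element) {
        if (!sorted) {
            Arrays.sort(array, 0, currentSize);
            sorted = true;
        }
        int index = Arrays.binarySearch(array, 0, currentSize, element);
        return index >= 0 ? index : -1;
    }

    public static void main(String[] args) {
        LazySortedArray list = new LazySortedArray(10);
        list.insert(30);
        list.insert(10);
        list.insert(20);
        System.out.println("index of 20 = " + list.search(20));
        System.out.println("index of 99 = " + list.search(99));
    }
}
```

The design choice here is that inserts stay cheap, while the first search after a batch of inserts pays for the sort; whether that pays off depends on how often searches are interleaved with inserts.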

Sorted Arrays

Insert

• The algorithm requires searching for the correct location in the array for where the new element needs to be placed

• Once found, the elements to the right are shifted to make room for the new record

• Searching for the correct spot can use a similar approach as binary search for good performance

• However, the shift requires looking at elements linearly which results in an O(n) performance in the worst case

• Because of this, one approach is to perform a linear search for the insert spot first and then complete the O(n) with the shift

Sorted Arrays

Code for insert:

boolean insert(Type newElement)
{
    if (currentSize == array.length)
        return false;

    // Linear search for the first element that is not smaller than newElement
    int insertIndex = 0;
    while (insertIndex < currentSize && array[insertIndex] < newElement)
        insertIndex++;

    // Shift everything from insertIndex onward one spot to the right
    for (int n = currentSize; n > insertIndex; n--)
        array[n] = array[n-1];

    array[insertIndex] = newElement;
    currentSize++;
    return true;
}

Sorted Arrays

Insert analysis

• The two loops together end up using comparisons that go through the entire list linearly no matter where the new element goes

• This is good for worst case scenario but also means at least an O(n) performance every time

• For very large size lists, O(n) may not be good and if worst case scenario is not expected, a binary search for the insert location first may be a better choice

Sorted Arrays

Code for insert using binary search:

boolean binaryInsert(Type newElement)
{
    if (currentSize == array.length)
        return false;

    int lowIndex = 0;
    int highIndex = currentSize - 1;
    int insertIndex = 0; // Stays 0 if the array is empty

    while (highIndex >= lowIndex)
    {
        int currentIndex = (lowIndex + highIndex) / 2;
        if (array[currentIndex] > newElement)
        {
            // New element belongs at or before currentIndex
            insertIndex = currentIndex;
            if (currentIndex == 0 || array[currentIndex - 1] <= newElement)
                break; // Exact insert spot found
            highIndex = currentIndex - 1;
        }
        else
        {
            // New element belongs somewhere after currentIndex
            insertIndex = currentIndex + 1;
            lowIndex = currentIndex + 1;
        }
    }

    // Shift everything from insertIndex onward one spot to the right
    for (int n = currentSize; n > insertIndex; n--)
        array[n] = array[n-1];

    array[insertIndex] = newElement;
    currentSize++;
    return true;
}

Sorted Arrays

Insert binary analysis

• More comparisons and functions are needed to handle finding correct insert index using binary search. In the long run, this is still O(log n)

• In the worst case, the algorithm will take longer than the first insert algorithm

• However, in average cases, the O(log n) search plus a partial shift will take less than a full pass through all n elements

• Best case is finding the insert index on the first iteration (the middle of the array) making the shift an n/2 performance

Sorted Arrays

Algorithm analysis

• Algorithm 1 is always O(n) while algorithm 2 ranges from O(n/2) to O(log n) + O(n)

• Which to use? Remember “it depends”?
– Insertion of random values suggests algorithm 2
– Smaller array sizes suggest algorithm 1
– Frequency of expected insertions is also a factor

Sorted Arrays

Update

• This algorithm uses the search to find the value to change
• Once the value is changed, the new value needs to be moved to the correct spot
• Once the correct spot is found, elements must be shifted to make room for the updated value
• Because of the shift, we can take advantage of this by combining the search with the shift
– If the updated value is to the left, linearly move in that direction shifting elements at the same time until you hit the right spot
– If the updated value is to the right, shift in that direction in a similar way
• This leads to a binary search followed by a linear shift

Sorted Arrays

Code for update (uses binary search function):

boolean update(Type oldValue, Type newValue)
{
    int recordIndex = binarySearch(oldValue);
    if (recordIndex == -1)
        return false;

    int nextIndex;
    if (newValue > oldValue)
    {
        // Shift smaller neighbors left until the new spot is reached
        nextIndex = recordIndex + 1;
        while (nextIndex < currentSize && newValue > array[nextIndex])
        {
            array[nextIndex-1] = array[nextIndex];
            nextIndex++;
        }
        nextIndex--;
    }
    else
    {
        // Shift larger neighbors right until the new spot is reached
        nextIndex = recordIndex - 1;
        while (nextIndex >= 0 && newValue < array[nextIndex])
        {
            array[nextIndex+1] = array[nextIndex];
            nextIndex--;
        }
        nextIndex++;
    }

    array[nextIndex] = newValue;
    return true;
}

Sorted Arrays

Update analysis

• The use of the binary search makes the algorithm perform at least at O(log n)

• The worst case scenario here is the update of a value from one end of the array to the other

• The shift is unavoidably performed in a linear way adding a worst case O(n)

• Since the O(log n) binary search is an “addition to” (not a multiplication with…) the O(n) shift, the O(n) performance dominates the analysis, making this algorithm O(n)

• Like the insert, the O(n) represents worst case, but a similar range of O(n/2) to O(n) + O(log n) applies for average and best case scenarios

• Also like the insert, the challenge is the linear shift. Perhaps there is a better data structure to make that better…

• Meanwhile, what about delete?

Sorted Arrays

Delete

• This algorithm also uses the search to find the value to remove

• Once the value is found, elements from the right must be shifted to ensure there are no holes

• This leads to another situation similar to insert, with two options:
– Stick to a consistent O(n) and do a linear search followed by a shift
– Go with the range of O(n/2) to O(log n) + O(n) and do a binary search followed by a linear shift

Sorted Arrays

Code for O(n) delete:

boolean delete(Type targetValue)
{
    int targetIndex = 0;

    while (targetIndex < currentSize)
    {
        if (array[targetIndex] != targetValue)
            targetIndex++;
        else
            break;
    }

    if (targetIndex == currentSize)
        return false;

    for (int n = targetIndex; n < currentSize - 1; n++)
        array[n] = array[n+1];

    array[--currentSize] = -1; // -1 value representing a blank value
    return true;
}

Sorted Arrays

Delete O(n) analysis

• The O(n) delete is a guaranteed O(n) solution. No matter the scenario (best, average or worst), the performance for comparisons is O(n)

• This can be useful in specific smaller range situations
• As the range gets larger, the search part of the algorithm can be improved using a binary search

Sorted Arrays

Code for delete using binary search:

boolean deleteBinary(Type targetValue)
{
    int targetIndex = binarySearch(targetValue);

    if (targetIndex == -1)
        return false;

    for (int n = targetIndex; n < currentSize - 1; n++)
        array[n] = array[n+1];

    array[--currentSize] = -1; // -1 value representing a blank value
    return true;
}

Sorted Arrays

Delete using Binary Search analysis

• In larger data range situations, the search improvement is very helpful

• The worst case situation is worse than the consistent O(n) solution. If the target value is the first element in the array, the search is a complete O(log n) followed by a complete O(n) shift

• However, the best case is O(n/2) in which the target value is in the middle of the array followed by a shift of half the elements

• Also note, if the element does not exist, this algorithm performs at O(log n) while the previous delete algorithm is still O(n)

Sorted Arrays

Summary (Unsorted and Sorted, Linear):

Worst Case    Unsorted Arrays            Sorted Arrays (Linear Search)
Search        O(n)                       O(log n) - no need to do linear
Insert        O(1)                       O(n)
Update        O(n)                       (not discussed, but O(2n) in worst case)
Delete        O(n)                       O(n)
Note          O(n) here is worst case    O(n) here is guaranteed

Sorted Arrays

Summary (Sorted using Binary Search):

Sorted Arrays (Binary Search)    Worst Case           Best Case
Search                           O(log n)             (not discussed, but O(1))
Insert                           O(log n) + O(n)      O(n/2)
Update                           O(log n) + O(n)      (not discussed, but O(1))
Delete                           O(log n) + O(n)      O(n/2)

Sorted Arrays

Summary

• For smaller sized lists, the linear search based maintenance algorithms of insert, update and delete can take advantage of a guaranteed O(n) performance

• The worst case for using a binary search based maintenance algorithm exceeds the guaranteed linear search based ones

• The best case, though, is as low as O(n/2), which is significantly better than the guaranteed O(n)
– Note that we can do this comparison between O(n) and O(n/2) because both are in the linear category
• This makes the average performance sometimes better than the guaranteed O(n)
• The big deal with the maintenance algorithms is the shift. Let’s now look at a data structure that addresses this along with the memory allocations necessary with arrays
