CSE 5350 - Fall 2007 Slide 1
Data Structures
Specification and Implementation
Textbook readings: Cormen, Part III, Chapters 10-14
Mihaela Iridon, Ph.D.
CSE 5350/7350 Introduction to Algorithms

Objectives
• Understand what dynamic sets are
• Learn basic techniques for
  a) Representing and
  b) Manipulating finite dynamic sets
• Elementary Data Structures
  – Stacks, queues, heaps, linked lists
• More Complex Data Structures
  – Hash tables, binary search trees
• Data Structures in C#.NET 2.0

High-Level Structure (1)
• Arrays
  – System.Collections.ArrayList
  – System.Collections.Generic.List
• Queue
  – System.Collections.Generic.Queue
• Stack
  – System.Collections.Generic.Stack

High-Level Structure (2)
• Hashtable
  – System.Collections.Hashtable
  – System.Collections.Generic.Dictionary
• Trees
  – Binary Trees, BST, Self-Balancing BST
• Linked Lists
  – System.Collections.Generic.LinkedList
• Graphs

Dynamic Data Sets
• Definition
• Why dynamic
• General examples
• Data structures and the .NET framework
• “An Extensive Examination of Data Structures Using C# 2.0” – Scott Mitchell
  – http://msdn2.microsoft.com/en-us/library/ms364091(VS.80).aspx

Data Structure Design
• Impact on efficiency/running time
• The data structure used by an algorithm can greatly affect the algorithm's performance
• It is important to have a rigorous method by which to compare the efficiency of various data structures

Example: file extension search
• Search is O(n)

// Requires: using System; using System.IO;
public bool DoesExtensionExist(string[] fileNames, string extension)
{
    // Linear scan; the 'true' argument makes the comparison case-insensitive
    for (int i = 0; i < fileNames.Length; i++)
        if (String.Compare(Path.GetExtension(fileNames[i]), extension, true) == 0)
            return true;

    return false; // If we reach here, we didn't find the extension
}

The Array
• Linear
• Simple
• Direct Access
• Homogeneous
• Most widely used

The Array (2)
• The contents of an array are stored in contiguous memory.
• All of the elements of an array must be of the same type or of a derived type; hence arrays are referred to as homogeneous data structures.
• Array elements can be directly accessed. If you want the i-th element of an array, one line of code is enough: arrayName[i].

Array Operations
• Allocation
• Accessing
  – Declaring an array in C#: string[] myArray;
    (initially the myArray reference is null)
  – Creating an array in C#: myArray = new string[5];

Array Allocation
• string[] myArray = new string[someIntegerSize];
  – This allocates a contiguous block of memory on the heap (CLR-managed)

Array Accessing
• Accessing an element at index i: O(1)
• Searching through an array
  – Unsorted: O(n)
  – Sorted: O(log n)
• Array class: static method:
  – Array.BinarySearch(Array input, object val)

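The O(log n) search on a sorted array can be sketched with the static Array.BinarySearch method named above. This is a minimal illustration, not code from the lecture; the class and method names (SortedSearchDemo, ContainsSorted) are hypothetical, and the choice of an ordinal string comparer is an assumption.

```csharp
using System;

// Hypothetical helper illustrating binary search on a sorted array.
public static class SortedSearchDemo
{
    // Returns true if 'value' occurs in 'sortedNames'.
    // Precondition: the array was sorted with the same ordinal comparer.
    public static bool ContainsSorted(string[] sortedNames, string value)
    {
        // Array.BinarySearch returns the index when found,
        // and a negative number when the value is absent.
        int index = Array.BinarySearch(sortedNames, value, StringComparer.Ordinal);
        return index >= 0;
    }
}
```

A caller would sort once with Array.Sort(names, StringComparer.Ordinal), which costs O(n log n), and then each ContainsSorted call is O(log n). If the array is not sorted, the result of the search is not meaningful.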
Array Resizing
• When the size needs to change:
  – Must create a new array instance
  – Copy old array into new array: Array1.CopyTo(Array2, 0)
• Time consuming
• Also, inserting into an array is problematic

Multi-Dimensional Arrays
• Rectangular
  – n x n
  – n x n x n x …
  – Accessing: O(1)
  – Searching: O(n^k)
• Jagged/Ragged
  – n1 x n2 x n3 x …

Goals
• Type-safe
• Performant
• Reusable
• Example: payroll application

System.Collections.ArrayList
• Can hold any data type (hybrid)
• Internally: array object
• Automatic resizing
• Not type safe: casting errors detected only at runtime
• Boxing/unboxing: extra level of indirection affects performance
• Loose homogeneity

Generics
• Remedy for Typing and Performance
• Type-safe collections
• Reusability
• Example:

public class MyTypeSafeList<T>
{
    T[] innerArray = new T[0];
}

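To make the generics example more concrete, here is one way the MyTypeSafeList<T> idea could be extended into a self-resizing list. It is only a sketch under stated assumptions: the initial capacity of 4, the doubling policy, and the Add/Count/indexer members are illustrative choices, not the actual List<T> source.

```csharp
using System;

// Hypothetical type-safe, self-resizing list; not the real List<T> implementation.
public class MyTypeSafeList<T>
{
    private T[] innerArray = new T[4];   // initial capacity of 4 is an arbitrary assumption
    private int count = 0;

    public int Count { get { return count; } }

    public void Add(T item)
    {
        if (count == innerArray.Length)
        {
            // Array is full: allocate one twice as large and copy the old contents.
            T[] bigger = new T[innerArray.Length * 2];
            innerArray.CopyTo(bigger, 0);
            innerArray = bigger;
        }
        innerArray[count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException("index");
            return innerArray[index];
        }
        set
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException("index");
            innerArray[index] = value;
        }
    }
}
```

Because T is fixed when the list is declared, a call such as list.Add(42) on a MyTypeSafeList<string> is rejected at compile time, and value types are stored without boxing.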
List
• Homogeneous
• Self-re-dimensioning array
• System.Collections.Generic.List

List<string> studentNames = new List<string>();
studentNames.Add("John");
…
string name = studentNames[3];
studentNames[2] = "Mike";

List Methods
• Contains()
• IndexOf()
• BinarySearch()
• Find()
• FindAll()
• Sort()
• Asymptotic running time: same as array, but with extra overhead

Ordered Requests Processing
• First-come, first-served (FIFO)
• Priority-based processing
• Inefficient to use List<T>
• List will continue to grow (internally, the size is doubled every time)
• Solution: circular list/array
• Problem: initial size??

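The circular array idea can be sketched as follows. This is a hypothetical fixed-capacity queue written for illustration (the name CircularQueue<T> and the decision to throw when full are assumptions); it shows how slots freed by dequeuing are reused instead of letting the backing store grow.

```csharp
using System;

// Hypothetical fixed-capacity FIFO queue backed by a circular array.
public class CircularQueue<T>
{
    private readonly T[] items;
    private int head = 0;    // index of the oldest element
    private int count = 0;   // number of stored elements

    public CircularQueue(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        items = new T[capacity];
    }

    public int Count { get { return count; } }

    public void Enqueue(T item)
    {
        if (count == items.Length) throw new InvalidOperationException("Queue is full.");
        int tail = (head + count) % items.Length;   // wraps around to reuse freed slots
        items[tail] = item;
        count++;
    }

    public T Dequeue()
    {
        if (count == 0) throw new InvalidOperationException("Queue is empty.");
        T item = items[head];
        items[head] = default(T);                   // release the reference
        head = (head + 1) % items.Length;
        count--;
        return item;
    }
}
```

The open question from the slide remains: the capacity has to be chosen up front, and a sketch like this one would need a resize step to cope with more pending requests than expected.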
Queue
• System.Collections.Generic.Queue
• Operations:
  – Enqueue()
  – Dequeue()
  – Contains()
  – ToArray()
  – Peek()
• Does not allow random access
• Type-safe; maximizes space utilization

Queue (continued)
• Applications:
  – Web servers
  – Print queues
• Rate of growth:
  – Specified in the constructor
  – Default: double initial size

Stack
• LIFO
• System.Collections.Generic.Stack
• Operations:
  – Push()
  – Pop()
• Doubles in size when more space is needed
• Applications:
  – CLR call stack (function invocation)

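The difference between FIFO and LIFO processing can be seen by pushing the same requests through Queue<T> and Stack<T>. The helper class and method names below (QueueStackDemo, ServeFifo, ServeLifo) are hypothetical; only Enqueue, Dequeue, Push, Pop and Count come from the framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers contrasting Queue<T> (FIFO) with Stack<T> (LIFO).
public static class QueueStackDemo
{
    // Returns the requests in first-in, first-out order.
    public static List<string> ServeFifo(IEnumerable<string> requests)
    {
        Queue<string> queue = new Queue<string>();
        foreach (string r in requests) queue.Enqueue(r);

        List<string> served = new List<string>();
        while (queue.Count > 0) served.Add(queue.Dequeue());
        return served;
    }

    // Returns the requests in last-in, first-out order.
    public static List<string> ServeLifo(IEnumerable<string> requests)
    {
        Stack<string> stack = new Stack<string>();
        foreach (string r in requests) stack.Push(r);

        List<string> served = new List<string>();
        while (stack.Count > 0) served.Add(stack.Pop());
        return served;
    }
}
```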
Limitations of Ordinal Indexing
• Ideal access time: O(1)
• If index is unknown
  – O(n) if not sorted
  – O(log n) if sorted
• Example: SSN: 10^9 possible combinations
• Solution: compress the ordinal indexing domain with a hash function; e.g. use only 4 digits

Hash Table
• Hashing:
  – Mathematical transformation of one representation into another representation
• Hash table:
  – An array that uses hashing to compress the indexer space
• Cryptography (information security)
• Hash function:
  – Non-injective (not a one-to-one function)
  – “Fingerprint” of initial data

Goals
• Fast access to items in large amounts of data
• As few collisions as possible
  – collision avoidance
• Avalanche effect:
  – Minor changes to input → major changes to output

Collision Resolution (1)
• Probability to map to a given location: 1/k (k = size = number of slots)
• (1) Linear Probing: is H[i] empty?
  – YES: place item at location i
  – NO: i = i + 1; repeat
• Deficiency: clustering
• Access and Insertion: no longer O(1)

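The linear probing steps above can be sketched as a small open-addressing table. Everything here is an illustrative assumption: the class name LinearProbingSet, the restriction to non-negative int keys, and the simple modulo hash.

```csharp
using System;

// Hypothetical open-addressing table of non-negative int keys using linear probing.
public class LinearProbingSet
{
    private readonly int?[] slots;   // null marks an empty slot

    public LinearProbingSet(int size) { slots = new int?[size]; }

    // Returns the slot index where the key was placed (or already sat),
    // or -1 if every slot is occupied by another key.
    public int Add(int key)
    {
        int start = key % slots.Length;               // modulo hash, assumed for illustration
        for (int step = 0; step < slots.Length; step++)
        {
            int i = (start + step) % slots.Length;    // i = i + 1, wrapping at the end
            if (slots[i] == null || slots[i] == key)
            {
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    public bool Contains(int key)
    {
        int start = key % slots.Length;
        for (int step = 0; step < slots.Length; step++)
        {
            int i = (start + step) % slots.Length;
            if (slots[i] == null) return false;       // an empty slot ends the probe run
            if (slots[i] == key) return true;
        }
        return false;
    }
}
```

With 7 slots, the keys 3, 10 and 17 all hash to slot 3 and end up in slots 3, 4 and 5. That run of adjacent occupied slots is the clustering the slide refers to, and it is why access and insertion are no longer guaranteed O(1).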
Collision Resolution (2)
• (2) Quadratic Probing
  – Check s + 1^2
  – Check s - 1^2
  – Check s + 2^2
  – Check s - 2^2
  – …
  – Check s +/- i^2
  – Clustering is a problem as well

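The order in which quadratic probing visits slots can be generated with a few lines of code. The names below (QuadraticProbing, ProbeSequence, maxI) are hypothetical, and wrapping the offsets modulo the table size is an assumption about how out-of-range positions are handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that lists the slots examined by quadratic probing.
public static class QuadraticProbing
{
    // Returns s, s + 1^2, s - 1^2, s + 2^2, s - 2^2, ... up to +/- maxI^2,
    // each reduced modulo tableSize.
    public static List<int> ProbeSequence(int s, int tableSize, int maxI)
    {
        List<int> probes = new List<int>();
        probes.Add(Mod(s, tableSize));
        for (int i = 1; i <= maxI; i++)
        {
            probes.Add(Mod(s + i * i, tableSize));
            probes.Add(Mod(s - i * i, tableSize));
        }
        return probes;
    }

    // The % operator can return a negative result in C#; normalise into 0..m-1.
    private static int Mod(int x, int m) { return ((x % m) + m) % m; }
}
```

For example, starting at s = 5 in a table of 11 slots with maxI = 2, the sequence is 5, 6, 4, 9, 1.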
Collision Resolution (3)
• (3) Rehashing – used by Hashtable (C#)
• System.Collections.Hashtable
• Operations:
  – Add(key, item)
  – ContainsKey()
  – Keys (property)
  – ContainsValue()
  – Values (property)
• Key, Value: any type → not type safe

Hashtable Data Type – Example

using System;
using System.Collections;

public class HashtableDemo
{
    private static Hashtable employees = new Hashtable();

    public static void Main()
    {
        // Add some values to the Hashtable, indexed by a string key
        employees.Add("111-22-3333", "Scott");
        employees.Add("222-33-4444", "Sam");
        employees.Add("333-44-5555", "Jisun");

        // Access a particular key
        if (employees.ContainsKey("111-22-3333"))
        {
            string empName = (string) employees["111-22-3333"];
            Console.WriteLine("Employee 111-22-3333's name is: " + empName);
        }
        else
            Console.WriteLine("Employee 111-22-3333 is not in the hash table...");
    }
}

Hashtable
• Key = any type
• Key is transformed into an index via the GetHashCode() function
• Object class defines GetHashCode()
• H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize
• Values = 0 .. hashsize-1

Collision Resolution (3 – cont’d)
• Rehashing = double hashing
• Set of hash functions: H1, H2, …, Hn
• Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
• Hashsize must be PRIME

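The Hk formula can be transcribed almost directly into code. This is a sketch under assumptions: the names DoubleHashing and Probe are hypothetical, the sign bit of the hash code is masked off to keep the result non-negative, and long arithmetic is used to avoid overflow. It is not the Hashtable source.

```csharp
using System;

// Hypothetical transcription of the Hk(key) formula for double hashing.
public static class DoubleHashing
{
    // Returns the k-th probe location for a key whose raw hash code is 'hash'.
    public static int Probe(int hash, int k, int hashsize)
    {
        long h = hash & 0x7FFFFFFF;                               // drop the sign bit (assumption)
        long increment = 1 + (((h >> 5) + 1) % (hashsize - 1));   // always in 1 .. hashsize-1
        return (int)((h + k * increment) % hashsize);             // always in 0 .. hashsize-1
    }
}
```

For hash = 100 and hashsize = 11 the increment is 5, so the probes for k = 0, 1, 2 are 1, 6 and 0. Because the increment lies between 1 and hashsize-1, a prime hashsize guarantees that the increment and the table size share no common factor, so the probe sequence can reach every slot.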
Hashtable
• Load Factor = MAX (# items / # slots)
• Optimal: 0.72
• Expanding the hashtable: 2 steps (costly)
  – Roughly double the # of slots (current prime → next prime that is about twice as large)
  – Rehash
• High LoadFactor → dense Hashtable
  – Less space
  – More probes on collision: expected # probes = 1/(1 - LF)
  – If LF = 0.72 → expected # probes = 1/0.28 ≈ 3.57 → O(1)

Hashtable
• Costly to expand
• Set the size in the constructor if the size is known
• Asymptotic running times:
  – Access: O(1)
  – Add, Remove: O(1)
  – Search: O(1)

System.Collections.Generic.Dictionary
• Type-safe
• Strongly typed KEYS + VALUES
• Operations:
  – Add(key, value)
  – ContainsKey(key)
• Collision Resolution: CHAINING
  – Uses linked lists from an entry where a collision occurs

Chaining in Dictionary Data Type
[Figure: chaining in the Dictionary data type]

Dictionary Example

Dictionary<keyType, valueType> variableName = new Dictionary<keyType, valueType>();

Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();

// Add some employees (Add takes the key and the value as two arguments)
employeeData.Add(455110189, new Employee("Scott Mitchell"));
employeeData.Add(455110191, new Employee("Jisun Lee"));
...
// See if employee with SSN 123-45-6789 works here
if (employeeData.ContainsKey(123456789)) ...

Chaining in the Dictionary type
• Efficiency:
  – Add: O(1)
  – Remove: O(n/m)
  – Search: O(n/m)
  where:
    n = number of elements stored in the hash table
    m = number of buckets/slots
• Implemented s.t. n ≤ m at ALL times
  – The total # of chained elements can never exceed the number of buckets

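Chaining can be sketched with an array of linked lists, one list per bucket. This is a hypothetical illustration (the ChainedTable name, the int-to-string mapping, and the modulo bucket function are assumptions) and is not how the framework's Dictionary is actually laid out internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical chained hash table mapping int keys to string values.
public class ChainedTable
{
    private readonly LinkedList<KeyValuePair<int, string>>[] buckets;

    public ChainedTable(int bucketCount)
    {
        buckets = new LinkedList<KeyValuePair<int, string>>[bucketCount];
    }

    private int BucketIndex(int key)
    {
        return (key & 0x7FFFFFFF) % buckets.Length;   // non-negative bucket number
    }

    // O(1): prepend to the bucket's chain. This sketch does not check for duplicate keys.
    public void Add(int key, string value)
    {
        int b = BucketIndex(key);
        if (buckets[b] == null) buckets[b] = new LinkedList<KeyValuePair<int, string>>();
        buckets[b].AddFirst(new KeyValuePair<int, string>(key, value));
    }

    // O(n/m) on average: walks only the chain of the key's bucket.
    public bool TryGetValue(int key, out string value)
    {
        LinkedList<KeyValuePair<int, string>> chain = buckets[BucketIndex(key)];
        if (chain != null)
        {
            foreach (KeyValuePair<int, string> pair in chain)
            {
                if (pair.Key == key) { value = pair.Value; return true; }
            }
        }
        value = null;
        return false;
    }
}
```

With 4 buckets, keys 1 and 5 collide in bucket 1 and simply share a chain; a lookup for either key scans that one chain and ignores the other buckets.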
Trees
• = set of linked nodes where no cycle exists
• (Graph theory) a connected acyclic graph
• Nodes:
  – Root
  – Leaf
  – Internal
• |E| = ?
• Forest = { trees }

Popular Tree-Type Data Structures
• BST: Binary Search Tree
• Heap
• Self-balancing binary search trees
  – AVL
  – Red-black
• Radix tree
• …

Binary Trees
• Code example for defining a tree data object
• Tree Traversal
  – In-order: L Ro R
  – Pre-order: Ro L R
  – Post-order: L R Ro
  – Θ(n)

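One possible way to define a tree data object and the three traversals is sketched below. The type names (BinaryTreeNode<T>, Traversal) and the choice to collect visited values in a List<T> are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical binary tree node: a value plus left and right children.
public class BinaryTreeNode<T>
{
    public T Value;
    public BinaryTreeNode<T> Left;
    public BinaryTreeNode<T> Right;
    public BinaryTreeNode(T value) { Value = value; }
}

public static class Traversal
{
    // In-order: Left, Root, Right
    public static void InOrder<T>(BinaryTreeNode<T> node, List<T> output)
    {
        if (node == null) return;
        InOrder(node.Left, output);
        output.Add(node.Value);
        InOrder(node.Right, output);
    }

    // Pre-order: Root, Left, Right
    public static void PreOrder<T>(BinaryTreeNode<T> node, List<T> output)
    {
        if (node == null) return;
        output.Add(node.Value);
        PreOrder(node.Left, output);
        PreOrder(node.Right, output);
    }

    // Post-order: Left, Right, Root
    public static void PostOrder<T>(BinaryTreeNode<T> node, List<T> output)
    {
        if (node == null) return;
        PostOrder(node.Left, output);
        PostOrder(node.Right, output);
        output.Add(node.Value);
    }
}
```

Each traversal visits every node exactly once, which is where the Θ(n) bound comes from. For a root 2 with left child 1 and right child 3, the three orders are 1, 2, 3 (in-order), 2, 1, 3 (pre-order) and 1, 3, 2 (post-order).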
Tree Operations
• Search: recursive, O(h)
  – h = height of the tree
• Max & Min Search: search right/left
• Successor & Predecessor Search
• Insertion (easy: always add a new leaf) & Deletion (more complicated, as it may cause the tree structure to change)
• Running time:
  – function of the tree topology

Binary Search Tree
• Improves the search time (and lookup time) over the binary tree in general
• BST property:
  – for any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n

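The BST property translates into short insert and search routines. The sketch below uses hypothetical names (BstNode, Bst) and int keys, ignores duplicate values, and includes a Height helper only to show how insertion order shapes the tree.

```csharp
using System;

// Hypothetical BST node holding an int key.
public class BstNode
{
    public int Value;
    public BstNode Left, Right;
    public BstNode(int value) { Value = value; }
}

public static class Bst
{
    // Inserts 'value' as a new leaf and returns the (possibly new) root.
    // Duplicates are ignored in this sketch.
    public static BstNode Insert(BstNode root, int value)
    {
        if (root == null) return new BstNode(value);
        if (value < root.Value) root.Left = Insert(root.Left, value);
        else if (value > root.Value) root.Right = Insert(root.Right, value);
        return root;
    }

    // Follows one root-to-leaf path, so the cost is O(h).
    public static bool Contains(BstNode root, int value)
    {
        BstNode current = root;
        while (current != null)
        {
            if (value == current.Value) return true;
            current = value < current.Value ? current.Left : current.Right;
        }
        return false;
    }

    // Number of nodes on the longest root-to-leaf path.
    public static int Height(BstNode root)
    {
        return root == null ? 0 : 1 + Math.Max(Height(root.Left), Height(root.Right));
    }
}
```

Inserting 1, 2, 3, 4, 5 in that order yields a chain of height 5, where search degrades to a linear scan, while inserting 3, 1, 4, 2, 5 yields a tree of height 3.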
Linear Search Time in BST
• The search time for a BST depends upon its topology.
[Figure: a BST whose topology leads to linear search time]

BST continued
• Perfectly balanced BST:
  – Search: O(log n) [height = log n]
• Sub-linear search running time
• Balanced Binary Tree:
  – Exhibits a good ratio: breadth/depth
• Self-balancing trees

The Heap
• Specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). [max-heap]
• Operations:
  – delete-max or delete-min: removing the root node of a max- or min-heap, respectively
  – increase-key or decrease-key: updating a key within a max- or min-heap, respectively
  – insert: adding a new key to the heap
  – merge: joining two heaps to form a valid new heap containing all the elements of both

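The insert and delete-max operations can be sketched on an array-backed binary max-heap. Storing the tree in a List<int>, with the children of index i at 2i+1 and 2i+2, is a common layout but an assumption here, as is the class name MaxHeap.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical array-backed binary max-heap of int keys.
public class MaxHeap
{
    private readonly List<int> keys = new List<int>();

    public int Count { get { return keys.Count; } }

    // insert: append at the end, then sift up while the parent is smaller.
    public void Insert(int key)
    {
        keys.Add(key);
        int i = keys.Count - 1;
        while (i > 0)
        {
            int parent = (i - 1) / 2;
            if (keys[parent] >= keys[i]) break;   // heap property holds
            Swap(i, parent);
            i = parent;
        }
    }

    // delete-max: remove the root, move the last key to the root, then sift down.
    public int DeleteMax()
    {
        if (keys.Count == 0) throw new InvalidOperationException("Heap is empty.");
        int max = keys[0];
        int last = keys.Count - 1;
        keys[0] = keys[last];
        keys.RemoveAt(last);

        int i = 0;
        while (true)
        {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < keys.Count && keys[left] > keys[largest]) largest = left;
            if (right < keys.Count && keys[right] > keys[largest]) largest = right;
            if (largest == i) break;
            Swap(i, largest);
            i = largest;
        }
        return max;
    }

    private void Swap(int a, int b) { int t = keys[a]; keys[a] = keys[b]; keys[b] = t; }
}
```

Both operations move a key along a single root-to-leaf path, so each takes O(log n) time. Inserting 3, 9, 1, 7 and calling DeleteMax four times returns 9, 7, 3, 1.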
Max Heap Example
• Example of max-heap:
[Figure: max-heap example]

Linked Lists
• No resizing necessary
• Search: O(n)
• Insertion
  – O(1) if unsorted
  – O(n) if sorted
• Access: O(n)
• System.Collections.Generic.LinkedList
  – Doubly-linked; type safe (via Generics)
  – Element: LinkedListNode

Skip List
• Linked list with a self-balancing BST-like property
• The elements are sorted
• Height = log n
• Problems with insert & delete
• Solution: randomized distribution
• Overall: O(log n)
• Worst case: O(n), but there is only a very, very slim chance of reaching the worst case

Graphs
• A collection of interconnected nodes
• A graph or undirected graph G is an ordered pair G := (V, E) that is subject to the following conditions:
  – V is a set, whose elements are called vertices or nodes,
  – E is a set of pairs (unordered) of distinct vertices, called edges or lines.
• Edges (1):
  – Directed / Undirected
  – Weighted / Unweighted

Graph (cont’d)
• Sparse: |E| << |Emax|, i.e. |E| << n^2
• Representation:
  – Adjacency List
  – Adjacency Matrix
  – (Packed Edge List)
• Problems applicable to graphs:
  – Minimum spanning tree (Kruskal, Prim)
  – Shortest Path (Dijkstra)

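An adjacency list representation can be sketched with a dictionary from each vertex to the list of its neighbours. The Graph class below is hypothetical: string vertex labels, undirected unweighted edges, and the lazy creation of vertices on first use are all assumptions made for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical undirected, unweighted graph stored as adjacency lists.
public class Graph
{
    private readonly Dictionary<string, List<string>> adjacency =
        new Dictionary<string, List<string>>();

    // Adds an undirected edge by recording each endpoint in the other's list.
    public void AddEdge(string u, string v)
    {
        Neighbors(u).Add(v);
        Neighbors(v).Add(u);
    }

    // Returns the neighbour list of 'vertex', creating an empty one on first use.
    public List<string> Neighbors(string vertex)
    {
        List<string> list;
        if (!adjacency.TryGetValue(vertex, out list))
        {
            list = new List<string>();
            adjacency[vertex] = list;
        }
        return list;
    }
}
```

This layout uses space proportional to |V| + |E|, which suits sparse graphs; an adjacency matrix needs n^2 entries regardless of how many edges exist but answers "is there an edge between u and v" in O(1).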
Website Navigation as a Graph
[Figure: website navigation as a graph]

Minimum Spanning Tree
• Spanning Tree of a connected, undirected graph = some subset of the edges that connects all the nodes and does not introduce a cycle