data structures

Data Structures Placement Lectures 2012 Pranav Gupta

Upload: pranav-gupta

Post on 23-Dec-2014


Category:

Technology



DESCRIPTION

These slides cover some of the important data structures, such as graphs, tries, suffix trees and hash tables.

TRANSCRIPT

Page 1: Data structures

Data Structures
Placement Lectures 2012

Pranav Gupta

Page 2: Data structures

Why do we need Data Structures?

• Efficient and intuitive representation of data
  • Tree using arrays vs tree using pointers
• To solve real-life problems efficiently
  • Insertion
  • Deletion
  • Search
  • Sort
• Applications
  • Social networks
  • Employee hierarchy
  • Recommended items

Page 3: Data structures

Basic Operations

1. traverse
2. insert
3. delete
4. find

Page 4: Data structures

Data Structures (Basic)

• Arrays
• Linked Lists
• Stacks
• Queues
• Recursion
• Trees – Basic
• Practice Problems

Page 5: Data structures

Arrays

Page 6: Data structures

• Contiguous and fixed memory allocation (independent of language)

• Random access and modification

• List of (index, value); index is non-negative integer; all values in a given array are of the same data type

• To hold various types of values or have non-numerical indices, use associative arrays/dictionaries – The Dictionary Problem?

Page 7: Data structures

• Arrays may also be:
  • 2-D : array of 1-D arrays (a 1-D array is a data type in itself)
  • 3-D : array of 2-D arrays (a 2-D array is a data type in itself)
• Memory placement of multi-dimensional arrays
  1. row-major
  2. column-major
• Useful Operations
  a. Modify
  b. Access
  c. Swap
  d. In-place reverse
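The two memory placements and the in-place reverse can be sketched as below. This is only an illustration: the function names `row_major_offset`, `col_major_offset` and `reverse_in_place` are made up for this example and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Offset of element (i, j) when a rows x cols matrix is stored row by row.
std::size_t row_major_offset(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t cols) {
    assert(i < rows && j < cols);
    return i * cols + j;
}

// Offset of the same element when the matrix is stored column by column.
std::size_t col_major_offset(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t cols) {
    assert(i < rows && j < cols);
    return j * rows + i;
}

// Reverse an array using swaps only, with no second buffer.
void reverse_in_place(int *arr, std::size_t n) {
    if (n < 2) return;
    for (std::size_t lo = 0, hi = n - 1; lo < hi; ++lo, --hi)
        std::swap(arr[lo], arr[hi]);
}
```

In a 3 x 4 matrix, element (1, 2) sits at offset 6 in row-major order and at offset 7 in column-major order.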

Page 8: Data structures

Structure of an Array

template<class T> class Array {
    int size;
    T *arr;
    void put();
    void get();
    …….
};

Useful Libraries
#include <vector>

Page 9: Data structures

Irregular Arrays
• Languages known to students at IITG
  1. 2-D Array
  2. Irregular Array

[Figure: the same data drawn as a 2-D array and as an irregular array; axes labelled Student and Languages]

Page 10: Data structures

Special (Arrays ??)

• Diagonal matrix, upper/lower triangular matrix, tridiagonal matrix, symmetric/asymmetric matrices

• We usually deal with 2-D square matrices, but 3-D or higher-dimensional and rectangular (non-square) cases are also possible

• More like functions: an element can be computed from its indices instead of being stored

Page 11: Data structures

Special (Arrays ??)

int spec_matrix(int i, int j) {
    return no_cols*i + j + 1;
}

• Performance ??
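One way to look at the performance question is to store only the part of a special matrix that can be non-zero. The sketch below does this for a lower triangular matrix; the class name `LowerTriangular` and the row-by-row packing are assumptions chosen for illustration, not something the slides prescribe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lower triangular n x n matrix: only entries with j <= i are stored,
// packed row by row into n*(n+1)/2 slots instead of n*n.
class LowerTriangular {
public:
    explicit LowerTriangular(std::size_t n)
        : n_(n), data_(n * (n + 1) / 2, 0) {}

    void set(std::size_t i, std::size_t j, int value) {
        assert(i < n_ && j <= i);  // only the lower triangle is writable
        data_[i * (i + 1) / 2 + j] = value;
    }

    int get(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        // Entries above the diagonal are implicitly zero.
        return j <= i ? data_[i * (i + 1) / 2 + j] : 0;
    }

    std::size_t stored() const { return data_.size(); }

private:
    std::size_t n_;
    std::vector<int> data_;
};
```

Access stays O(1), at the cost of a little index arithmetic per lookup, while a 4 x 4 matrix needs 10 slots instead of 16.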

Page 12: Data structures

One Dimensional Sparse Array

Dense array ary (indices 0 to 11): 0 0 0 0 17 0 0 23 14 0 0 0

Sparse representation of ary as (index, value) pairs: (4, 17), (7, 23), (8, 14)
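A minimal sketch of the (index, value) idea, assuming the pairs are kept sorted by index so that lookups can use binary search. The class name `SparseArray` and its methods are hypothetical; the slide only shows the data layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sparse 1-D array: only non-default entries are stored, sorted by index.
class SparseArray {
public:
    explicit SparseArray(std::size_t length) : length_(length) {}

    void set(std::size_t index, int value) {
        assert(index < length_);
        auto it = std::lower_bound(entries_.begin(), entries_.end(), index, before);
        if (it != entries_.end() && it->first == index)
            it->second = value;                                  // overwrite
        else
            entries_.insert(it, std::make_pair(index, value));   // keep sorted
    }

    int get(std::size_t index) const {
        assert(index < length_);
        auto it = std::lower_bound(entries_.begin(), entries_.end(), index, before);
        return (it != entries_.end() && it->first == index) ? it->second : 0;
    }

    std::size_t stored() const { return entries_.size(); }

private:
    // Comparator for lower_bound: does this entry come before the index?
    static bool before(const std::pair<std::size_t, int>& entry, std::size_t index) {
        return entry.first < index;
    }

    std::size_t length_;
    std::vector<std::pair<std::size_t, int> > entries_;
};
```

With the slide's data, a 12-element array holding 17, 23 and 14 needs three stored pairs. Lookup is O(log k) for k stored entries, compared with O(1) for the dense array.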

Page 13: Data structures

Two Dimensional Sparse Array

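[Figure: a 6 × 6 sparse matrix (rows and columns 0 to 5) holding the values 8, 12, 33 and 17, stored as an array of per-row lists of (column, value) pairs: (5, 12); (1, 8), (5, 33); (3, 17)]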

Row elements can be accessed efficiently

Page 14: Data structures

Two Dimensional Sparse Array

[Figure: the same 6 × 6 sparse matrix, with each stored element linked both into a row list (reached from the rows array) and into a column list (reached from the cols array)]

Efficient row and column elements access

Page 15: Data structures

Efficient Representation

[Figure: a more compact rows/cols representation of the same sparse matrix; row entries (5, 12); (1, 8), (5, 33); (3, 17)]

Page 16: Data structures

Linked Lists

Page 17: Data structures

Why?

• To store heterogeneous data
• To store sparse data
• Flexibility of increase/decrease in size; easy insertion and deletion of elements
• Useful Operations
  • insertion
  • deletion

Page 18: Data structures

Logical Arrangement

[Figure: head node (first element) → second element → third element → final (tail) node → Null; each node stores the address of the next node]

Page 19: Data structures

The Structure

template <class T> class node {
    T data;
    node<T> *next; // Extra (4?) bytes; size of a pointer
};

template <class T> class LinkedList {
    node<T> *head;
    int size; // …..etc etc etc
};

Useful Libraries
#include <list>

Page 20: Data structures

View of the Memory

*Struct is stored in contiguous memory

Page 21: Data structures

Insertion/Deletion

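Time Complexity:
Insertion : O(1) / O(n)
Deletion : O(1) / O(n)
(O(1) when the position is already known, e.g. at the head; O(n) when the list must be walked to find it)

Space Complexity:
Insertion : O(1)
Deletion : O(1)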
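The O(1) and O(n) cases can be seen in a small singly linked list. This is a sketch only; the names `Node`, `LinkedList`, `push_front`, `pop_front`, `remove` and `contains` are chosen for this example and are not the slides' interface.

```cpp
#include <cassert>
#include <cstddef>

template <class T>
struct Node {
    T data;
    Node<T> *next;
};

template <class T>
class LinkedList {
public:
    LinkedList() : head_(nullptr), size_(0) {}
    ~LinkedList() { while (head_) pop_front(); }
    LinkedList(const LinkedList&) = delete;
    LinkedList& operator=(const LinkedList&) = delete;

    // O(1): the new node becomes the head.
    void push_front(const T& value) {
        head_ = new Node<T>{value, head_};
        ++size_;
    }

    // O(1): unlink and free the head node.
    void pop_front() {
        assert(head_ != nullptr);
        Node<T> *old = head_;
        head_ = head_->next;
        delete old;
        --size_;
    }

    // O(n): walk the list to the first node holding value, then unlink it.
    bool remove(const T& value) {
        Node<T> **link = &head_;
        while (*link && !((*link)->data == value)) link = &(*link)->next;
        if (*link == nullptr) return false;
        Node<T> *victim = *link;
        *link = victim->next;
        delete victim;
        --size_;
        return true;
    }

    // O(n): linear search.
    bool contains(const T& value) const {
        for (const Node<T> *p = head_; p; p = p->next)
            if (p->data == value) return true;
        return false;
    }

    std::size_t size() const { return size_; }

private:
    Node<T> *head_;
    std::size_t size_;
};
```

Each operation allocates or frees at most one node, which matches the O(1) space cost listed above.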

Page 22: Data structures

Tweak some more !

• Doubly Linked Lists
  • Extra (4?) bytes space vs better accessibility
  • Insertion/deletion ?
• Circular Linked Lists
  • How to find the end?
    • Tail pointer
    • Null ‘next’ pointer from last node
    • Last node points to first (circular)

Page 23: Data structures

Practice (Linked List)
• Linked List 1)
• Linked List 2)

Page 24: Data structures

Recursion

Page 25: Data structures

• To solve a task using that task itself
  • the task should have a recursive nature
  • a task can generally be made recursive by tweaking some parts of it
• Example: task of piling up n coins vs picking up a suitcase.
• Let the task be a C function. What are the parts of the task:
  1. Input it takes
  2. What it does
  3. Output it gives

Page 26: Data structures

• A task is generally performed recursively when a large input can’t be handled directly.

• So, recursion is all about simplifying the input at every step till it becomes trivial (base case)
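A small sketch of "simplifying the input until it becomes trivial", using the power(x, n) function that the later stack trace is based on. The exact signature is an assumption; only the calls power(5.6, 2), power(5.6, 1) and power(5.6, 0) appear in the slides.

```cpp
#include <cassert>

// x raised to a non-negative integer power n.
// Each call shrinks the input (n - 1) until the base case n == 0 is reached.
double power(double x, int n) {
    assert(n >= 0);
    if (n == 0) return 1.0;        // base case: trivial input
    return x * power(x, n - 1);    // same task on a simpler input
}
```

Calling power(5.6, 2) leads to power(5.6, 1) and then power(5.6, 0), which returns 1.0 and lets the pending multiplications finish.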

Page 27: Data structures

Implementation – run time stack

• Activation Records (AR)
• Store the state of a method
  1. input parameters
  2. return values
  3. local variables
  4. return addresses

Page 28: Data structures

[Figure: the run-time stack while evaluating power(5.6, 2). Activation records for power(5.6, 2), power(5.6, 1) and power(5.6, 0) are pushed in turn, each holding the parameters (5.6 and the exponent), a return address ((136) or (105)) and a return-value slot that is initially unknown (?). The calls then return 1.0, 5.6 and 31.36 as the records are popped in reverse order.]

Page 29: Data structures

• An AR is created on the run-time stack and is private to a method.
• There is only one run-time stack.

[Figure: the run-time stack at four successive stages, with the stack pointer marked in each]

Page 30: Data structures

Advantages/Disadvantages

1. more readable/understandable/consistent with the definition
2. memory requirements increase due to the run-time stack
3. difficult to open and debug

Page 31: Data structures

Types of Recursion

• Tail (vs loop?)
  int factn = 1;
  while (n > 0) factn *= n--;
• Indirect
  • A() -> B() -> C() -> A()
• Nested:
  • h(n) = h(2 + h(n-1))
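To compare tail recursion with the loop from the "Tail" bullet, here is a hedged sketch of both forms of factorial. The names `fact_tail` and `fact_loop` and the accumulator parameter `acc` are assumptions made for this example.

```cpp
#include <cassert>

// Tail-recursive factorial: the recursive call is the last action,
// so nothing is left to do in the caller after it returns.
unsigned long long fact_tail(unsigned n, unsigned long long acc = 1) {
    assert(n <= 20);               // 21! no longer fits in 64 bits
    if (n == 0) return acc;
    return fact_tail(n - 1, acc * n);
}

// The equivalent loop: the accumulator becomes a local variable.
unsigned long long fact_loop(unsigned n) {
    assert(n <= 20);
    unsigned long long factn = 1;
    while (n > 0) factn *= n--;
    return factn;
}
```

Because a tail call leaves no pending work, it can be rewritten mechanically as a loop, which avoids growing the run-time stack.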

Page 32: Data structures

Types of Recursion

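• Excessive: exponential time complexity!

\[
\mathrm{Fib}(n) =
\begin{cases}
0 & \text{if } n = 0 \\
1 & \text{if } n = 1 \\
\mathrm{Fib}(n-1) + \mathrm{Fib}(n-2) & \text{if } n \ge 2
\end{cases}
\]

• Questionable: will it terminate??

\[
f(n) =
\begin{cases}
1 & \text{if } n = 1 \\
f(n/2) & \text{if } n \text{ is even} \\
f(3n + 1) & \text{otherwise}
\end{cases}
\]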
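The "excessive" case can be made concrete with the Fibonacci recurrence. The sketch below contrasts the direct recursive definition with a linear-time loop; the function names `fib_naive` and `fib_iter` are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Direct translation of the recurrence: the same subproblems are recomputed
// again and again, so the number of calls grows exponentially with n.
std::uint64_t fib_naive(unsigned n) {
    if (n < 2) return n;
    return fib_naive(n - 1) + fib_naive(n - 2);
}

// Linear-time alternative: keep only the last two values.
std::uint64_t fib_iter(unsigned n) {
    assert(n <= 93);               // Fib(94) overflows 64 bits
    std::uint64_t a = 0, b = 1;    // a = Fib(i), b = Fib(i + 1)
    for (unsigned i = 0; i < n; ++i) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Both functions return the same values; only the amount of repeated work differs.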

Page 33: Data structures

Hashes

Page 34: Data structures

Why?

• Want to store dictionaries?, associative arrays?
  • arrays with non-numerical indices
• String operations made easy
  • Ex: Finding anagrams
  • Ex: Counting frequency of words in a string

Page 35: Data structures

Associative Arrays
• (key, value) pairs where key is not necessarily a non-negative integer; can be string etc.
• Ex: no. of students in each department
  • “cse” => 68
  • “eee” => 120
  • “mech” => 70
  • “biotech” => 30
• Do not allow duplicate keys
  • Dict (“cse”) = “data structures”
  • Dict (“cse”) = “algorithms”
• Instead, store all the values under one key:
  Dict(“cse”) = {“data structures”, “algorithms”}
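The department example and the "several values under one key" idea can be sketched with the standard `std::map` and `std::set` containers. The helper names `department_sizes`, `students_in` and `add`, and the alias `Dict`, are assumptions for this illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// (key, value) pairs with string keys: number of students per department.
std::map<std::string, int> department_sizes() {
    return {{"cse", 68}, {"eee", 120}, {"mech", 70}, {"biotech", 30}};
}

int students_in(const std::map<std::string, int>& sizes, const std::string& dept) {
    auto it = sizes.find(dept);
    assert(it != sizes.end());     // the department must be present
    return it->second;
}

// Keys stay unique; each key maps to a set of values instead of a single one.
using Dict = std::map<std::string, std::set<std::string> >;

void add(Dict& dict, const std::string& key, const std::string& value) {
    dict[key].insert(value);
}
```

Adding "data structures" and then "algorithms" under "cse" leaves a single key whose value set has two members. `std::map` is tree-based; `std::unordered_map` offers the same interface on top of a hash table.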

Page 36: Data structures

Hash Functions
1. HashTable : an array of fixed size
   • TableSize - preferably prime and large
2. Hash function (map to an index of the HashTable)
   Techniques
   • use all characters
   • use aggregate properties - length, frequencies of characters
   • first 3 characters, odd characters
   Evaluation
   • Uniform distribution; load factor λ?
   • Utilize table space
   • Quickly computable
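As one possible instance of the "use all characters" technique, the sketch below folds every character into a polynomial hash and reduces it modulo the table size. The multiplier 31 and the names `hash_string` and `load_factor` are assumptions for this example, not choices made in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Polynomial hash over all characters, reduced modulo the table size.
std::size_t hash_string(const std::string& key, std::size_t table_size) {
    assert(table_size > 0);
    std::size_t h = 0;
    for (unsigned char c : key)
        h = (h * 31 + c) % table_size;
    return h;
}

// Load factor: number of stored keys divided by the table size.
double load_factor(std::size_t stored_keys, std::size_t table_size) {
    assert(table_size > 0);
    return static_cast<double>(stored_keys) / static_cast<double>(table_size);
}
```

The function is quick to compute and always returns a valid index. How uniformly it spreads keys depends on the data and on the table size, which is one reason a prime TableSize is preferred.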

Page 37: Data structures

3. Collision resolution
   1. separate chaining
      • Linked list at each index
      • Insertion (at head?)
      • Desired length of a chain : close to λ
      • Avg. time for Successful search = 1 + 1 + λ/2
      • Disadvantages
        • slow?
        • different data structures - array/linked lists?
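Separate chaining can be sketched as a vector of linked lists, with new keys inserted at the head of their chain. The class name `ChainedHashTable`, its methods and the built-in polynomial hash are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t table_size)
        : buckets_(table_size), count_(0) {
        assert(table_size > 0);
    }

    // Update the value if the key exists, otherwise insert at the chain's head.
    void put(const std::string& key, int value) {
        auto& chain = buckets_[index_of(key)];
        for (auto& entry : chain)
            if (entry.first == key) { entry.second = value; return; }
        chain.emplace_front(key, value);
        ++count_;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        const auto& chain = buckets_[index_of(key)];
        for (const auto& entry : chain)
            if (entry.first == key) return &entry.second;
        return nullptr;
    }

    // Average chain length, i.e. the load factor.
    double load_factor() const {
        return static_cast<double>(count_) / static_cast<double>(buckets_.size());
    }

private:
    std::size_t index_of(const std::string& key) const {
        std::size_t h = 0;
        for (unsigned char c : key) h = (h * 31 + c) % buckets_.size();
        return h;
    }

    std::vector<std::list<std::pair<std::string, int> > > buckets_;
    std::size_t count_;
};
```

A search only walks the one chain selected by the hash, so its cost tracks the chain length rather than the total number of keys.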

Page 38: Data structures

   2. open addressing
      • Single table
      • Desired λ ~ 0.5
      • Apply h0(x), h1(x), h2(x) …
      • hi(x) = (h(x) + f(i)) mod TableSize; f(0) = 0
      3 ways to do it
      1. linear probing : f(i) is linear in i
         • f(i) = i (quickly computable vs primary clustering?)
      2. quadratic probing : f(i) is quadratic in i
      3. double hashing
         • f(i) = i.h2(x), i.e. hi(x) = (h(x) + i.h2(x)) mod TableSize

Rehashing
• What if the table gets full (70%, …. , 100%)
• Create a new HashTable double? the size
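Linear probing and rehashing can be sketched together for a set of non-negative integers. Everything here is an assumption chosen for brevity: the class name `ProbingSet`, the sentinel `EMPTY`, the identity hash h(x) = x, the λ ≤ 0.5 threshold and the growth to 2 × size + 1. Deletion is left out because open addressing needs extra care for it (for example tombstones).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // marks an unused slot; keys must be non-negative

class ProbingSet {
public:
    explicit ProbingSet(std::size_t table_size = 11)
        : slots_(table_size, EMPTY), count_(0) {
        assert(table_size > 0);
    }

    void insert(int x) {
        assert(x >= 0);
        if (contains(x)) return;
        if (2 * (count_ + 1) > slots_.size()) rehash();  // keep load factor <= 0.5
        place(slots_, x);
        ++count_;
    }

    bool contains(int x) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            int s = slots_[probe(x, i, slots_.size())];
            if (s == EMPTY) return false;  // an empty slot ends the probe sequence
            if (s == x) return true;
        }
        return false;
    }

    std::size_t size() const { return count_; }
    std::size_t table_size() const { return slots_.size(); }

private:
    // h_i(x) = (h(x) + f(i)) mod size, with h(x) = x and f(i) = i.
    static std::size_t probe(int x, std::size_t i, std::size_t size) {
        return (static_cast<std::size_t>(x) + i) % size;
    }

    // Put x into the first free slot of its probe sequence.
    static void place(std::vector<int>& slots, int x) {
        for (std::size_t i = 0; ; ++i) {
            std::size_t pos = probe(x, i, slots.size());
            if (slots[pos] == EMPTY) { slots[pos] = x; return; }
        }
    }

    // Move every key into a table of roughly double the size.
    void rehash() {
        std::vector<int> bigger(2 * slots_.size() + 1, EMPTY);
        for (int s : slots_)
            if (s != EMPTY) place(bigger, s);
        slots_.swap(bigger);
    }

    std::vector<int> slots_;
    std::size_t count_;
};
```

Keeping the table at most half full guarantees that `place` always finds a free slot. Keys such as 1 and 6 in a table of size 5 collide and end up in adjacent slots, which is the start of primary clustering.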

Page 39: Data structures

Structure

template<class T> class Hash {
    int TableSize;
    T *arr;
};

Useful Libraries
#include <unordered_map>  // C++11; <hash> is not a standard header

Page 40: Data structures

Practice (Hashes)
• Trie 7)

Page 41: Data structures

Graphs

Page 42: Data structures

What is it?

In simple words, G = (V, E)
V = {v0, v1, v2, v3, .. vn} is the set of nodes
E = {e0, e1, e2, e3 .. em} is the set of edges

* Any tree is also a graph T = (V, E), so most techniques in graph algorithms apply to trees as well.

[Figure: example graph on the nodes v0, v1, v2, v3]

Page 43: Data structures

Representation
1. Adjacency Matrix (|V| * |V|)
2. Adjacency List
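Both representations can be built from the same edge list. The sketch below assumes an undirected graph with nodes numbered 0 to n - 1; the alias `Edge` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;

// |V| x |V| matrix: m[u][v] is true when an edge joins u and v.
std::vector<std::vector<bool> > adjacency_matrix(std::size_t n,
                                                 const std::vector<Edge>& edges) {
    std::vector<std::vector<bool> > m(n, std::vector<bool>(n, false));
    for (const Edge& e : edges) {
        assert(e.first < n && e.second < n);
        m[e.first][e.second] = true;
        m[e.second][e.first] = true;   // undirected: mirror the entry
    }
    return m;
}

// One neighbour list per node: O(|V| + |E|) space.
std::vector<std::vector<std::size_t> > adjacency_list(std::size_t n,
                                                      const std::vector<Edge>& edges) {
    std::vector<std::vector<std::size_t> > adj(n);
    for (const Edge& e : edges) {
        assert(e.first < n && e.second < n);
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    return adj;
}
```

The matrix answers "is there an edge between u and v?" in O(1) but always uses |V|² entries; the list is usually the better fit for sparse graphs.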

Page 44: Data structures

Breadth First Traversal (BFT)

• Traverse the nodes level by level; nodes at depth 0 before nodes at depth 1 before nodes at depth 2 ....
• Done using a queue
• Ex: 1,2,3,4,5,7,8,6
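A minimal queue-based sketch of breadth first traversal, assuming the graph is given as an adjacency list with nodes numbered from 0. The function name `bfs_order` is made up for this example, and the example graph in the note is not the one from the slide's figure.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Returns the nodes reachable from start in breadth-first order.
std::vector<std::size_t> bfs_order(const std::vector<std::vector<std::size_t> >& adj,
                                   std::size_t start) {
    assert(start < adj.size());
    std::vector<bool> visited(adj.size(), false);
    std::vector<std::size_t> order;
    std::queue<std::size_t> q;

    visited[start] = true;
    q.push(start);
    while (!q.empty()) {
        std::size_t u = q.front();
        q.pop();
        order.push_back(u);
        for (std::size_t v : adj[u]) {
            if (!visited[v]) {      // mark on enqueue so no node is queued twice
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return order;
}
```

For the adjacency list {{1,2},{0,3},{0,3},{1,2,4},{3}} and start node 0, the order is 0, 1, 2, 3, 4: depth 0, then depth 1, then depth 2, then depth 3.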

Page 45: Data structures

Depth First Traversal (DFT)

• Move to the next child only after all nodes reachable through the current child are marked (visited)
• Done using a stack
• Ex: a, b, c, d, e, h, f, g
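The stack-based version can be sketched the same way as the queue-based traversal, again assuming an adjacency list with nodes numbered from 0. The name `dfs_order` is hypothetical, and the example graph in the note is not the one from the slide's figure.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Returns the nodes reachable from start in depth-first (preorder) order.
std::vector<std::size_t> dfs_order(const std::vector<std::vector<std::size_t> >& adj,
                                   std::size_t start) {
    assert(start < adj.size());
    std::vector<bool> visited(adj.size(), false);
    std::vector<std::size_t> order;
    std::stack<std::size_t> st;

    st.push(start);
    while (!st.empty()) {
        std::size_t u = st.top();
        st.pop();
        if (visited[u]) continue;   // a node may be pushed more than once
        visited[u] = true;
        order.push_back(u);
        // Push neighbours in reverse so the first neighbour is explored first.
        for (auto it = adj[u].rbegin(); it != adj[u].rend(); ++it)
            if (!visited[*it]) st.push(*it);
    }
    return order;
}
```

For the adjacency list {{1,2},{0,3},{0,3},{1,2,4},{3}} and start node 0, the order is 0, 1, 3, 2, 4: the traversal goes as deep as possible through node 1 before coming back.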

Page 46: Data structures

Trees (Advanced)

Page 47: Data structures

Trie (from reTRIEval)

• Stores the prefixes of a set of strings in an efficient manner
• Used to store associative arrays/dictionaries

Page 48: Data structures

How to create a Trie

• Ex: tin, ten, ted, tea, to, i, in, inn
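A hedged sketch of how such a trie could be built and queried. The class name `Trie`, the `std::map`-based child table and the methods `insert`, `contains` and `has_prefix` are assumptions for this example; the slides only give the word list.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class Trie {
public:
    // Walk down from the root, creating missing nodes; mark the last one as a word.
    void insert(const std::string& word) {
        Node *node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->is_word = true;
    }

    bool contains(const std::string& word) const {
        const Node *node = walk(word);
        return node != nullptr && node->is_word;
    }

    bool has_prefix(const std::string& prefix) const {
        return walk(prefix) != nullptr;
    }

private:
    struct Node {
        Node() : is_word(false) {}
        bool is_word;                                    // a stored word ends here
        std::map<char, std::unique_ptr<Node> > children; // one edge per character
    };

    // Follow s from the root; nullptr if the path does not exist.
    const Node* walk(const std::string& s) const {
        const Node *node = &root_;
        for (char c : s) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    Node root_;
};
```

After inserting tin, ten, ted, tea, to, i, in and inn, the words ten, ted and tea share the path t → e, and "te" is a prefix in the trie without being a stored word.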

Page 49: Data structures

Pairs of anagrams

• Sort the characters of each string
  • acute -> acetu
  • obtuse -> beostu … etc
• Insert the sorted strings into the trie as keys
  • Keep storing collisions i.e. multiple values for each key
  • Each set of values gives groups of anagrams
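The grouping step can be sketched as follows. Note the swap: this example uses `std::map` keyed by the sorted string in place of the trie, which keeps the code short while preserving the idea of several values per key. The names `sorted_key`, `group_anagrams` and `check_slide_examples` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Key shared by all anagrams of a word: its characters in sorted order.
std::string sorted_key(std::string word) {
    std::sort(word.begin(), word.end());
    return word;
}

// Maps each sorted key to every input word that produces it.
std::map<std::string, std::vector<std::string> >
group_anagrams(const std::vector<std::string>& words) {
    std::map<std::string, std::vector<std::string> > groups;
    for (const std::string& w : words)
        groups[sorted_key(w)].push_back(w);
    return groups;
}

// The two examples from the slide.
void check_slide_examples() {
    assert(sorted_key("acute") == "acetu");
    assert(sorted_key("obtuse") == "beostu");
}
```

Any key whose group holds two or more words identifies a set of anagrams, for example listen, silent and enlist under the key "eilnst".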

Page 50: Data structures

Suffix Tree/Patricia/Radix Tree

• Stores the suffixes of a string
• O(n) space and time to build
• Does not exist for all strings (a suffix that is a prefix of another suffix would not end at a leaf); add special symbol $ at the end

Page 51: Data structures

Advantages of Suffix Trees

• Store n suffixes in O(n) space.
• Improved string operations. E.g. substring lookup, longest common substring operation (generalized suffix trees?)

Generalized Suffix Trees
• Each string terminated by a different special symbol
• More space efficient
• Have different set of algorithms

Page 52: Data structures

Longest Common Substring

1. Make a “generalized suffix tree” for the (2?) strings
2. Traverse the tree to mark all internal nodes as 1, 2 or (1,2), depending on whether the leaves below the node terminate with the special symbol of string 1, of string 2, or of both
3. Find the deepest internal node (by string depth) marked (1,2)

Pattern Matching ?