Programming Interest Group
http://www.comp.hkbu.edu.hk/~chxw/pig/index.htm
Tutorial Two
Data Structures
Data Structures
  Basic data types:
    Integral: integer, character, boolean
    Floating-point types: float, double, long double
  Data structures are methods of organizing large amounts of data:
    Array
    List, Stack, Queue, Deque
    Trees: binary tree, binary search tree, AVL tree
    Priority Queues
    Hash table
    Set
    Graph
COMP1200: Data Structures and Algorithms
Elementary Data Structures
  A data type is a set of values and a collection of operations on those values.
  Basic data types in C and C++:
    Integers (ints): short int, int, long int
    Floating-point numbers (floats): float, double
    Characters (chars): char
  Structure in C and C++
Example 1: Basic Data Types
#include <iostream>
#include <stdlib.h>
#include <math.h>
using namespace std;

typedef int Number;

Number randNum()
{
    return rand();
}

/* usage: ./a.out N */
int main(int argc, char *argv[])
{
    int N = atoi(argv[1]);
    float m1 = 0.0, m2 = 0.0;
    for (int i = 0; i < N; i++) {
        Number x = randNum();
        m1 += ((float)x) / N;       /* running mean */
        m2 += ((float)x * x) / N;   /* running mean of squares */
    }
    cout << "RAND_MAX.: " << RAND_MAX << endl;
    cout << "Avg.: " << m1 << endl;
    cout << "Std. dev.: " << sqrt(m2 - m1 * m1) << endl;
}
This program computes the average and standard deviation of a sequence of integers generated by the library function rand( ).
Question: how can you modify the program to handle a sequence of random floating-point numbers in the range [0, 1]?
Example 2: Structure
#include <iostream>
#include <stdlib.h>
#include <math.h>
using namespace std;

struct mypoint { float x; float y; };

float mydistance(mypoint, mypoint);
void mypolar(mypoint, float *r, float *theta);

int main(int argc, char *argv[])
{
    struct mypoint a, b;
    a.x = 1.0; a.y = 1.0;
    b.x = 4.0; b.y = 5.0;
    cout << "Distance is " << mydistance(a, b) << endl;
    float r, theta;
    mypolar(a, &r, &theta);
    cout << "r : " << r << endl;
    cout << "theta: " << theta << endl;
}

/* return the distance between two points */
float mydistance(mypoint a, mypoint b)
{
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    return sqrt(dx*dx + dy*dy);
}

/* convert from Cartesian to polar coordinates */
void mypolar(mypoint p, float *r, float *theta)
{
    *r = sqrt(p.x*p.x + p.y*p.y);
    *theta = atan2(p.y, p.x);
}

Result:
[chxw@csr40 cplus]$ ./a.out
Distance is 5
r : 1.41421
theta: 0.785398
Arrays
  The array is the most fundamental data structure.
  An array is a fixed collection of same-type data that are stored contiguously and are accessible by an index.
  It is the responsibility of the programmer to use indices that are nonnegative and smaller than the array size.
  Two ways to create an array:
    Static allocation: size known to and set by the programmer
    Dynamic allocation: size unknown to the programmer and set by the user at execution time
Example: Sieve of Eratosthenes
#include <iostream>
using namespace std;

static const int N = 1000;

int main()
{
    int i, a[N];
    /* initialization */
    for (i = 2; i < N; i++) a[i] = 1;
    for (i = 2; i < N; i++)
        if (a[i])
            /* sieve i's multiples up to N-1 */
            for (int j = i; j*i < N; j++) a[i*j] = 0;
    for (i = 2; i < N; i++)
        if (a[i]) cout << " " << i;
    cout << endl;
}
The Sieve of Eratosthenes is a classical method for computing a table of prime numbers.
Basic idea: Set a[i] to 1 if i is prime, and 0 if i is not a prime.
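Dynamic Memory Allocation
  C language: malloc( ) and free( )
  C++ language: use operator new and operator delete

#include <new>   /* needed for nothrow */

int main(int argc, char *argv[])
{
    int N = atoi(argv[1]);
    /* plain new throws std::bad_alloc on failure; the nothrow form returns 0 instead */
    int *a = new (nothrow) int[N];
    if (a == 0) {
        cout << "out of memory" << endl;
        return 0;
    }
    …
    delete [] a;
}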
Array of Structures
#include <iostream>
#include <stdlib.h>
#include <math.h>
using namespace std;

struct mypoint { float x; float y; };

float mydistance(mypoint, mypoint);
float randfloat();

/* usage: ./a.out N d */
int main(int argc, char *argv[])
{
    float d = atof(argv[2]);
    int i, cnt = 0, N = atoi(argv[1]);
    mypoint *a = new mypoint[N];
    for (i = 0; i < N; i++) {
        a[i].x = randfloat();
        a[i].y = randfloat();
    }
    /* check every pair (i, j) with i < j exactly once */
    for (i = 0; i < N; i++)
        for (int j = i+1; j < N; j++)
            if (mydistance(a[i], a[j]) < d) cnt++;
    cout << cnt << " pairs within " << d << endl;
    delete [] a;
}

/* return the distance between two points */
float mydistance(mypoint a, mypoint b)
{
    float dx = a.x - b.x;
    float dy = a.y - b.y;
    return sqrt(dx*dx + dy*dy);
}

/* return a random number between 0 and 1 */
float randfloat()
{
    return 1.0 * rand() / RAND_MAX;
}
This program counts the pairs of points whose distance is shorter than a threshold.
List
  A general list of elements: A1, A2, …, AN, associated with a set of operations:
    Insert: add an element
    Delete: remove an element
    Find: find the position of an element (search)
    FindKth: find the kth element
  Each element has a fixed position.
  Two different implementations:
    Array-based list
    Linked list
List
  [Figure: three diagrams of the list A1, A2, A3: a linked list, a linked list with a header, and a doubly linked list]
Sample C Implementation of Linked List with a Header
  Header files:
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/fatal.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/list.h
  Source file:
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/list.h
Circular List Example
  Josephus problem: N people decide to elect a leader as follows:
    Arrange themselves in a circle
    Eliminate every Mth person around the circle
    The last remaining person will be the leader
Simulation of Josephus problem
#include <iostream>
#include <stdlib.h>
using namespace std;

struct mynode {
    int item;
    mynode* next;
    /* constructor */
    mynode(int x, mynode* t) { item = x; next = t; }
};

typedef mynode *mylink;

/* usage: ./a.out N M */
int main(int argc, char *argv[])
{
    int i, N = atoi(argv[1]), M = atoi(argv[2]);

    /* create the first node */
    mylink t = new mynode(1, 0);
    t->next = t;
    mylink x = t;

    /* insert the next N-1 nodes */
    for (i = 2; i <= N; i++)
        x = (x->next = new mynode(i, t));

    /* simulate the election process */
    while (x != x->next) {
        for (i = 1; i < M; i++) x = x->next;
        /* delete the next node */
        t = x->next;
        x->next = t->next;
        delete t;
    }
    cout << x->item << endl;
}
Stacks
  A stack is a list with the restriction that insertions and deletions can be performed only at one end of the list, called the top.
  LIFO: last in, first out
  Operations:
    Push(x, s)
    Pop(s)
    MakeEmpty(s)
    IsEmpty(s)
    Top(s)
Stack Implementations
  Using a linked list
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackli.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackli.c
  Using an array
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackar.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/stackar.c
    Remark: you need to define the maximum stack size when creating the stack
Queues
  A queue is a list with the restriction that insertion is done at one end, whereas deletion is done at the other end.
  FIFO: first in, first out
  Operations:
    CreateQueue(x): create a queue with maximum size of x
    Enqueue(x, q): insert an element x at the end of the list
    Dequeue(q): return and remove the element at the start of the list
    IsEmpty(q) and IsFull(q)
Queue Implementation
  Implemented by a circular array
    Need to specify the maximum size of the queue when creating the queue
    One variable for the front of the queue, another one for the rear of the queue
  Sample code
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/queue.h
    http://www.comp.hkbu.edu.hk/~chxw/pig/code/queue.c
Priority Queues
  A priority queue is a data structure that allows the following operations:
    Insert(x, p): insert item x into priority queue p
    Maximum(p): return the item with the highest priority in priority queue p
    ExtractMax(p): return and remove the item with the highest priority in p
  Note: each element contains a key which represents its priority.
Sets
  A set is a collection of unordered elements drawn from a given universal set U.
  Operations:
    Member(x, S): is an item x an element of set S?
    Union(A, B)
    Intersection(A, B)
    Insert(x, S)
    Delete(x, S)
Dictionaries
  Dictionaries permit content-based retrieval.
  Operations:
    Insert(x, d)
    Delete(x, d)
    Search(k, d): return an item with key k
  Note: dictionaries can be implemented with many techniques, such as a linked list, array, tree, or hashing.
C++ Standard Template Library
  The C++ STL provides implementations of many data structures.
  Reference:
    http://www.sgi.com/tech/stl/
    http://www.cppreference.com/
  Data structures (containers in C++):
    Sequential containers (see Workshop 7): Vectors, Lists, Double-ended Queues
    Associative containers (see Workshop 7): Sets, Multisets, Maps, Multimaps
    Container adaptors: Stacks, Queues, Priority Queues
List in C++
  A list is implemented as a doubly linked list of elements.
  Each element in a list has its own segment of memory and refers to its predecessor and its successor.
  Disadvantage: lists do not provide random access. General access to an arbitrary element takes linear time, hence lists don't support the [ ] operator.
  Advantage: insertion or removal of an element is fast at any position.
  http://www.cplusplus.com/reference/stl/list/
List Example 1
// list1.cpp
#include <iostream>
#include <list>
using namespace std;

int main()
{
    list<char> coll;

    for (char c = 'a'; c <= 'z'; ++c)
        coll.push_back(c);

    /* print and remove the front element until the list is empty */
    while (!coll.empty()) {
        cout << coll.front() << ' ';
        coll.pop_front();
    }
    cout << endl;

    return 0;
}

$ g++ list1.cpp
$ ./a.out
a b c d e f g h i j k l m n o p q r s t u v w x y z
$
List Example 2
// list2.cpp
#include <iostream>
#include <list>
using namespace std;

int main()
{
    list<char> coll;

    for (char c = 'a'; c <= 'z'; ++c)
        coll.push_back(c);

    /* read-only traversal from begin() up to, but not including, end() */
    list<char>::const_iterator pos;
    for (pos = coll.begin(); pos != coll.end(); ++pos)
        cout << *pos << ' ';
    cout << endl;
}

$ g++ list2.cpp
$ ./a.out
a b c d e f g h i j k l m n o p q r s t u v w x y z
$

[Figure: iterator pos advancing with ++ from begin() to end()]
List Example 3
// list3.cpp
#include <iostream>
#include <list>
#include <cctype>   /* toupper */
using namespace std;

int main()
{
    list<char> coll;

    for (char c = 'a'; c <= 'z'; ++c)
        coll.push_back(c);

    /* a non-const iterator allows the elements to be modified in place */
    list<char>::iterator pos;
    for (pos = coll.begin(); pos != coll.end(); ++pos) {
        *pos = toupper(*pos);
        cout << *pos << ' ';
    }
    cout << endl;
}
Stack in C++
// stack.cpp
#include <iostream>
#include <stack>
using namespace std;

int main()
{
    stack<int> s;

    for (int i = 1; i <= 10; ++i)
        s.push(i);

    while (!s.empty()) {
        cout << s.top() << endl;
        s.pop();
    }

    return 0;
}

push(): insert an element
pop(): remove the top element
top(): access the top element
size(): return the number of elements
empty(): check whether the container is empty

Remark: pop() removes the top element and returns nothing. So we usually call top() to get the top element, then call pop() to remove it.
Queue in C++
// queue.cpp
#include <iostream>
#include <queue>
using namespace std;

int main()
{
    queue<int> s;

    for (int i = 1; i <= 10; ++i)
        s.push(i);

    while (!s.empty()) {
        cout << s.front() << endl;
        s.pop();
    }

    return 0;
}

push(): insert an element
pop(): remove the first element
front(): access the first element
back(): access the last element
size(): return the number of elements
empty(): check whether the container is empty
Queue Example II
// queue2.cpp
#include <iostream>
#include <queue>
#include <string>
using namespace std;

int main()
{
    queue<string> q;

    q.push("These ");
    q.push("are ");
    q.push("more than ");

    cout << q.front();
    q.pop();
    cout << q.front();
    q.pop();

    q.push("four ");
    q.push("words!");

    // skip one element
    q.pop();
    cout << q.front();
    q.pop();
    cout << q.front();
    q.pop();

    cout << "number of elements in the queue: " << q.size() << endl;

    return 0;
}
Priority Queue in C++
// pqueue.cpp
#include <iostream>
#include <queue>
using namespace std;

int main()
{
    priority_queue<int> s;

    s.push(5); s.push(4); s.push(8);
    s.push(9); s.push(2); s.push(7);
    s.push(6); s.push(3); s.push(10);

    while (!s.empty()) {
        cout << s.top() << endl;
        s.pop();
    }

    return 0;
}

push(): insert an element
pop(): remove the element with the highest priority
top(): access the element with the highest priority
size(): return the number of elements
empty(): check whether the container is empty

By default, elements are sorted by operator < in descending order, i.e., the largest element has the highest priority.
Different Sorting Criterion
// pqueue.cpp
#include <iostream>
#include <queue>
#include <vector>
#include <functional>   /* greater<> */
using namespace std;

int main()
{
    priority_queue<int, vector<int>, greater<int> > s;

    s.push(5); s.push(4); s.push(8);
    s.push(9); s.push(2); s.push(7);
    s.push(6); s.push(3); s.push(10);

    /* with greater<int>, the smallest element comes out first */
    while (!s.empty()) {
        cout << s.top() << endl;
        s.pop();
    }

    return 0;
}

Three parameters when defining a priority queue:
  int: type of element
  vector<int>: the container that is used internally
  greater<int>: the sorting criterion (by default, it is less<>)
Java
  java.util package
    http://java.sun.com/products/jdk
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/package-summary.html
  Data structure: java.util classes
    Stack: Stack
    Queue: ArrayList, LinkedList
    Dictionaries: HashMap, Hashtable
    Priority Queue: TreeMap
    Sets: HashSet
What to do now?
  Choose your own weapon:
    C: write a set of data structures
    C++: learn the STL
    Java: learn the java.util package
  Try to solve at least one exercise.
  If you still have time, solve more exercises.