1 assignment 2: (due at 10:30 a.m on friday of week 10) question 1 (given in tutorial 5) question 2...
Post on 20-Dec-2015
1
Assignment 2: (Due at 10:30 a.m. on Friday of Week 10)
Question 1 (Given in Tutorial 5)
Question 2 (Given in Tutorial 7)
•If you do Question 1 only, you get 60 points.
•If you do Question 2 only, you get 90 points.
•If you correctly do both Question 1 and Question 2, you get 100 points.
•Bonus: 5 Points will be given to those who write a Java program for the Huffman code algorithm.
2
Review of Lecture 1 to Lecture 6
Lecture 1: Some concepts: pseudocode, abstract data type (ADT). (Page 60 of the textbook.)
Stack: give the ADT of a stack (slide 11 of Lecture 1).
The interface is on slide 19. (Q: Is the interface equivalent to the ADT? Not really. The ADT also needs the rule for insertion and deletion, i.e., first in, last out.)
Applications: parentheses matching
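A minimal sketch of the parentheses-matching application, assuming java.util.ArrayDeque stands in for the lecture's stack (the class name ParenMatcher and the method isBalanced are hypothetical, not taken from the slides). Each opening bracket is pushed; each closing bracket must match the bracket popped from the top.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ParenMatcher {
    // Returns true if every bracket in s is closed by a bracket of the
    // same type, in last-opened, first-closed order.
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char ch : s.toCharArray()) {
            if (ch == '(' || ch == '[' || ch == '{') {
                stack.push(ch);                    // remember the opener
            } else if (ch == ')' || ch == ']' || ch == '}') {
                if (stack.isEmpty()) return false; // closer with no opener
                char open = stack.pop();
                if ((ch == ')' && open != '(')
                        || (ch == ']' && open != '[')
                        || (ch == '}' && open != '{')) {
                    return false;                  // wrong bracket type
                }
            }
            // any other character is ignored
        }
        return stack.isEmpty();                    // no unclosed openers left
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a[b]{c})")); // true
        System.out.println(isBalanced("(]"));        // false
        System.out.println(isBalanced("(()"));       // false
    }
}
```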
3
Lecture 2: Linked list
Singly linked list
Doubly linked list
Just know how to set up a list. (Assignment 1)
Lecture 3: Analysis of Algorithms (important)
Primitive operations
Count number of primitive operations for an algorithm
Big-O notation: 2n is O(n); 5n^2 + 10n + 11 is O(n^2).
4
Lecture 4: Tree
Definition of tree (slide 7)
Tree terminology: root, internal node, external node (leaf), depth of a node, height of a node, height of a tree.
Inorder traversal of a binary tree
Tree ADT, slide 11, Binary tree ADT, slide 17
In terms of programming, understand TreeInExample1.java. (If tested in the exam, the Java code will be given. I do not want to give long code.)
5
Lecture 5: More on Trees
Linked Structure for Binary Tree.
Just understand the node:
Preorder traversal for any tree
Postorder traversal for any tree
Array-Based representation of binary tree (slide 9)
Algorithms for Depth() and Height(): slides 12-15.
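A minimal sketch of the three traversals on a linked binary tree. The Node class here is a hypothetical stand-in (an element plus left and right links), not the node class from the lecture code, and the sample tree is made up for illustration. The only difference between the three methods is where the node itself is visited relative to its subtrees.

```java
class TraversalSketch {
    // Hypothetical node: an element plus links to the two children.
    static class Node {
        final char element;
        final Node left;
        final Node right;

        Node(char element, Node left, Node right) {
            this.element = element;
            this.left = left;
            this.right = right;
        }
    }

    // Visit the node first, then its left and right subtrees.
    static String preorder(Node v) {
        if (v == null) return "";
        return v.element + preorder(v.left) + preorder(v.right);
    }

    // Visit the left subtree, then the node, then the right subtree.
    static String inorder(Node v) {
        if (v == null) return "";
        return inorder(v.left) + v.element + inorder(v.right);
    }

    // Visit both subtrees first, then the node.
    static String postorder(Node v) {
        if (v == null) return "";
        return postorder(v.left) + postorder(v.right) + v.element;
    }

    // Sample tree:      A
    //                  / \
    //                 B   C
    //                / \
    //               D   E
    static Node sample() {
        Node d = new Node('D', null, null);
        Node e = new Node('E', null, null);
        Node b = new Node('B', d, e);
        Node c = new Node('C', null, null);
        return new Node('A', b, c);
    }

    public static void main(String[] args) {
        Node root = sample();
        System.out.println(preorder(root));  // ABDEC
        System.out.println(inorder(root));   // DBEAC
        System.out.println(postorder(root)); // DEBCA
    }
}
```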
6
Lecture 6: Priority Queue (Heaps)
Priority Queue ADT (slide 2)
Heap:
1. definition of heap
2. What does “heap-order” mean?
3. Complete binary tree (what is a complete binary tree?)
4. Height of a complete binary tree with n nodes is O(log n).
5. Insert a node into a heap: running time O(log n).
6. removeMin: remove a node with minimum key. Running time O(log n)
Array-based complete binary tree representation.
Show a sample exam paper.
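A minimal sketch of an array-based min-heap with insert and removeMin. It assumes 0-based indexing, so the children of index i sit at 2i+1 and 2i+2 (the slides may number from 1 instead); the class name ArrayMinHeap is hypothetical. Both operations move a key along one root-to-leaf path, which is why each takes O(log n) time.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayMinHeap {
    // Keys of the complete binary tree, stored level by level.
    private final List<Integer> a = new ArrayList<>();

    int size() {
        return a.size();
    }

    // Add the key at the last position, then bubble it up while it is
    // smaller than its parent.
    void insert(int key) {
        a.add(key);
        int i = a.size() - 1;
        while (i > 0 && a.get((i - 1) / 2) > a.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Remove the root, move the last key to the root, then sink it down
    // by swapping with the smaller child until heap-order holds again.
    int removeMin() {
        if (a.isEmpty()) throw new IllegalStateException("heap is empty");
        int min = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
                if (left < a.size() && a.get(left) < a.get(smallest)) smallest = left;
                if (right < a.size() && a.get(right) < a.get(smallest)) smallest = right;
                if (smallest == i) break;
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        int t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }

    public static void main(String[] args) {
        ArrayMinHeap h = new ArrayMinHeap();
        for (int k : new int[] {5, 3, 8, 1, 9, 2}) h.insert(k);
        while (h.size() > 0) System.out.print(h.removeMin() + " "); // 1 2 3 5 8 9
        System.out.println();
    }
}
```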
8
Exercise:
Give some trees and ask students to give InOrder, PostOrder and PreOrder.
Tutorial 6 of Question 2: Using PreOrder.
Given a complete binary tree, write the array representation.
Given an array, draw the complete binary tree.
Given a heap, show the steps of removeMin.
Given a heap, show the steps to insert a node with key 3. (Do it for the tree version, do it for an array version.)
Linear time construction of a heap.
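A minimal sketch of linear-time heap construction, assuming the usual bottom-up approach on an array with 0-based indexing (the class and method names are hypothetical). Starting from the last internal node and moving back to the root, each node is sunk down into place; the leaves need no work.

```java
import java.util.Arrays;

class BottomUpHeap {
    // Turn an arbitrary array into a min-heap in O(n) total time.
    static void buildHeap(int[] a) {
        // a.length / 2 - 1 is the last node that has at least one child.
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            downHeap(a, i);
        }
    }

    // Sink a[i] by repeatedly swapping it with its smaller child.
    static void downHeap(int[] a, int i) {
        int n = a.length;
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < n && a[left] < a[smallest]) smallest = left;
            if (right < n && a[right] < a[smallest]) smallest = right;
            if (smallest == i) return;
            int t = a[i];
            a[i] = a[smallest];
            a[smallest] = t;
            i = smallest;
        }
    }

    // Check heap-order: no key is smaller than its parent's key.
    static boolean isHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[(i - 1) / 2] > a[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6, 3, 5};
        buildHeap(a);
        System.out.println(Arrays.toString(a));
        System.out.println(isHeap(a)); // true
    }
}
```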
Hash Tables 9
Huffman codes (Page 565 Chapter 12.4)
Binary character code: each character is represented by a unique binary string.
A data file can be coded in two ways:

                       a    b    c    d    e     f
frequency (%)          45   13   12   16   9     5
fixed-length code      000  001  010  011  100   101
variable-length code   0    101  100  111  1101  1100
For a file of 100 characters, the first way needs 100 × 3 = 300 bits. The second way needs
45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits.
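A small sketch that recomputes both totals from the table, assuming a file of 100 characters so that the percentages can be used directly as counts (the class name CodeCost and method totalBits are hypothetical). The total is the sum, over all characters, of frequency times codeword length.

```java
class CodeCost {
    // Sum of frequency * codeword length over all characters.
    static int totalBits(int[] freq, int[] codeLength) {
        int bits = 0;
        for (int i = 0; i < freq.length; i++) {
            bits += freq[i] * codeLength[i];
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] freq = {45, 13, 12, 16, 9, 5};    // a, b, c, d, e, f
        int[] fixedLen = {3, 3, 3, 3, 3, 3};    // 000, 001, ...
        int[] variableLen = {1, 3, 3, 3, 4, 4}; // 0, 101, 100, 111, 1101, 1100
        System.out.println(totalBits(freq, fixedLen));    // 300
        System.out.println(totalBits(freq, variableLen)); // 224
    }
}
```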
10
Variable-length code
Need some care to read the code. Example: 001011101 with codewords a=0, b=00, c=01, d=11. Where to cut? 00 can be read as either aa or b.
Prefixes of 0011: 0, 00, 001, and 0011.
Prefix codes: no codeword is a prefix of any other codeword (prefix-free).
Prefix codes are simple to encode and decode.
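A minimal sketch of a prefix-free check, comparing every pair of codewords with String.startsWith (the class name PrefixCheck is hypothetical). The ambiguous code a=0, b=00, c=01, d=11 fails the check because 0 is a prefix of 00 and 01; the variable-length code with a=0, b=101, c=100, d=111, e=1101, f=1100 passes.

```java
import java.util.Arrays;
import java.util.List;

class PrefixCheck {
    // Returns true if no codeword is a prefix of a different codeword.
    static boolean isPrefixFree(List<String> codewords) {
        for (int i = 0; i < codewords.size(); i++) {
            for (int j = 0; j < codewords.size(); j++) {
                if (i != j && codewords.get(j).startsWith(codewords.get(i))) {
                    return false; // codeword i is a prefix of codeword j
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrefixFree(Arrays.asList("0", "00", "01", "11"))); // false
        System.out.println(isPrefixFree(
                Arrays.asList("0", "101", "100", "111", "1101", "1100")));      // true
    }
}
```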
11
Using the codewords in the table to encode and decode
Encode: abc = 0.101.100 = 0101100 (just concatenate the codewords).
Decode: 001011101 = 0.0.101.1101 = aabe
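A minimal sketch of encoding and decoding with the variable-length codewords a=0, b=101, c=100, d=111, e=1101, f=1100 (the class name PrefixCodec is hypothetical). Decoding here uses a lookup map instead of walking a tree, which is a swapped-in technique: because the code is prefix-free, the first codeword that matches the bits read so far is always the right one.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixCodec {
    static final Map<Character, String> CODE = new HashMap<>();
    static final Map<String, Character> REVERSE = new HashMap<>();

    static {
        String[] words = {"0", "101", "100", "111", "1101", "1100"};
        for (int i = 0; i < words.length; i++) {
            char ch = (char) ('a' + i); // a, b, c, d, e, f
            CODE.put(ch, words[i]);
            REVERSE.put(words[i], ch);
        }
    }

    // Concatenate the codeword of each character.
    static String encode(String text) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String word = CODE.get(ch);
            if (word == null) throw new IllegalArgumentException("no codeword for " + ch);
            bits.append(word);
        }
        return bits.toString();
    }

    // Read bits left to right and cut as soon as they form a codeword.
    static String decode(String bits) {
        StringBuilder text = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character ch = REVERSE.get(current.toString());
            if (ch != null) {
                text.append(ch);
                current.setLength(0); // start the next codeword
            }
        }
        if (current.length() != 0) {
            throw new IllegalArgumentException("bits do not end on a codeword boundary");
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("abc"));       // 0101100
        System.out.println(decode("001011101")); // aabe
    }
}
```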
12
(Decoding uses the right-hand binary tree below.)
[Figure: two binary trees over the leaves a:45, b:13, c:12, d:16, e:9, f:5, with each edge labeled 0 or 1. Left: tree for the fixed-length codewords (internal nodes 100, 86, 14, 58, 28, 14). Right: tree for the variable-length codewords (internal nodes 100, 55, 25, 30, 14).]
13
Binary tree
In the tree of an optimal code, every nonleaf node has two children (a full binary tree).
The fixed-length code in our example is not optimal (its tree contains a node with only one child).
The total number of bits required to encode a file is
B(T) = \sum_{c \in C} f(c) \, d_T(c)
where f(c) is the frequency (number of occurrences) of c in the file, and d_T(c) is the depth of c's leaf in the tree T.
14
Constructing an optimal code
Formal definition of the problem:
Input: a set of characters C = {c1, c2, …, cn}; each c ∈ C has frequency f[c].
Output: a binary tree representing codewords so that the total number of bits required for the file is minimized.
Huffman proposed a greedy algorithm to solve the problem.
16
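[Figure: steps (c) and (d) of Huffman's algorithm on the example. (c) f:5 and e:9 have been merged into a node of frequency 14, and c:12 and b:13 into a node of frequency 25; d:16 and a:45 are still single leaves. (d) The node 14 and d:16 have been merged into a node of frequency 30; the remaining roots are 25, 30 and a:45. Edges are labeled 0 and 1.]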
17
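[Figure: steps (e) and (f) of Huffman's algorithm on the example. (e) The nodes 25 and 30 have been merged into a node of frequency 55; the remaining roots are a:45 and 55. (f) a:45 and 55 have been merged into the root of frequency 100, which gives the final tree. Edges are labeled 0 and 1.]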
18
HUFFMAN(C)
1  n := |C|
2  Q := C
3  for i := 1 to n-1 do
4      z := ALLOCATE_NODE()
5      x := left[z] := EXTRACT_MIN(Q)
6      y := right[z] := EXTRACT_MIN(Q)
7      f[z] := f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT_MIN(Q)
19
The Huffman Algorithm
This algorithm builds the tree T corresponding to the optimal code in a bottom-up manner.
C is a set of n characters, and each character c in C has a defined frequency f[c].
Q is a priority queue, keyed on f, used to identify the two least-frequent objects to merge together.
The result of the merger is a new object (internal node) whose frequency is the sum of the frequencies of the two merged objects.
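The merging loop can be sketched in Java as follows. This is a sketch under stated assumptions, not the course's reference solution: java.util.PriorityQueue stands in for the priority queue Q, and the names HuffmanSketch, Node, build and codes are hypothetical. The first node extracted becomes the left child (edge 0) and the second the right child (edge 1).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    // Hypothetical tree node: a leaf holds a symbol, an internal node holds two children.
    static class Node {
        final char symbol;
        final int freq;
        final Node left, right;

        Node(char symbol, int freq) {  // leaf
            this.symbol = symbol; this.freq = freq; this.left = null; this.right = null;
        }

        Node(Node left, Node right) {  // internal node: f[z] = f[x] + f[y]
            this.symbol = '\0'; this.freq = left.freq + right.freq;
            this.left = left; this.right = right;
        }
    }

    // Builds the code tree; returns null for an empty frequency table.
    static Node build(Map<Character, Integer> freq) {
        PriorityQueue<Node> q = new PriorityQueue<>((x, y) -> Integer.compare(x.freq, y.freq));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            q.add(new Node(e.getKey(), e.getValue()));
        }
        int n = q.size();
        for (int i = 1; i <= n - 1; i++) {
            Node x = q.poll();     // least frequent
            Node y = q.poll();     // second least frequent
            q.add(new Node(x, y)); // merged node goes back into the queue
        }
        return q.poll();
    }

    // Reads the codewords off the tree: left edge = 0, right edge = 1.
    static Map<Character, String> codes(Map<Character, Integer> freq) {
        Map<Character, String> out = new HashMap<>();
        Node root = build(freq);
        if (root != null) collect(root, "", out);
        return out;
    }

    private static void collect(Node v, String prefix, Map<Character, String> out) {
        if (v.left == null) {      // leaf
            out.put(v.symbol, prefix.isEmpty() ? "0" : prefix); // single-symbol case
            return;
        }
        collect(v.left, prefix + "0", out);
        collect(v.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new HashMap<>();
        freq.put('a', 45); freq.put('b', 13); freq.put('c', 12);
        freq.put('d', 16); freq.put('e', 9);  freq.put('f', 5);
        Map<Character, String> code = codes(freq);
        for (char ch = 'a'; ch <= 'f'; ch++) {
            System.out.println(ch + " = " + code.get(ch));
        }
    }
}
```

On the example frequencies no two queue entries ever tie, so the result is deterministic and should reproduce the variable-length codewords a=0, b=101, c=100, d=111, e=1101, f=1100. With tied frequencies a different but equally short code may come out.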
20
Time complexity
Lines 4-8 are executed n-1 times. Each heap operation in Lines 4-8 takes O(lg n) time. Total time required is O(n lg n).
Note: The details of heap operation will not be tested. Time complexity O(n lg n) should be remembered.