
Introduction to Algorithm

• Instructor: Dr. Bin Fu

• Office: ENGR 3. 280 (Third Floor)

• Email: bfu@utpa.edu

• Textbook: Introduction to Algorithms (Second Edition) by Cormen, Leiserson, Rivest and Stein

Why study algorithms and performance?

• Performance often draws the line between what is feasible and what is impossible.

• Algorithmic mathematics provides a language for talking about program behavior (e.g., big-O notation).

• In real life, many algorithms, though different from each other, fall into one of several paradigms (discussed shortly). These paradigms can be studied.

Why these particular algorithms?

In this course, we will discuss problems, and algorithms for solving these problems.

Why these algorithms (cont.)

1. Main paradigms:

a) Greedy algorithms

b) Divide-and-conquer

c) Dynamic programming

d) Branch-and-bound (mostly in AI)

e) Etc.

2. Other reasons:

a) Relevance to many areas:

• E.g., networking, internet, search engines…

Topics

• Recursive Equations

• Divide and Conquer Method

• Dynamic Programming Method

• Basic and Advanced Data Structures

• Graph Algorithms

• Approximation Algorithms

• NP-Complete Theory

• Randomized Algorithms

Grading

• 4-5 assignments (35%)

• Midterm (20%)

• Final (25%)

• Exercises and class attendance (20%)

Advantages of being a good algorithm designer

• It helps you develop efficient software.

• It makes it easy to switch from one area of computer science to another.

2.2 Analyzing Algorithms

RAM: Random-access machine.

It takes one step to access memory once.

The instructions are executed one by one, sequentially.

Running time: the total number of steps, expressed as a function of the input size.

The problem of sorting

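• Input: sequence $\langle a_1, a_2, \ldots, a_n \rangle$ of numbers.

• Output: permutation $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input numbers such that $a'_1 \le a'_2 \le \cdots \le a'_n$.

Example:

Input: 8 2 4 9 3 6

Output: 2 3 4 6 8 9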

Running time

The running time depends on the input:

• Parameterize the running time by the size of the input n.

• Seek upper bounds on the running time T(n) for input size n, because everybody likes a guarantee.

Kinds of analyses

Worst-case: (usually)

• T(n) = maximum time of the algorithm on any input of size n.

Average-case: (sometimes)

• T(n) = expected time of the algorithm over all inputs of size n.

• Needs an assumption about the statistical distribution of inputs.

Best-case: (bogus)

• Cheat with a slow algorithm that works fast on some input.

Machine-independent time

What is insertion sort’s worst-case time?

• It depends on the speed of our computer:

– relative speed (on the same machine),

– absolute speed (on different machines).

BIG IDEA:

• Ignore machine-dependent constants.

• Look at the growth of T(n) as n → ∞.

• “Asymptotic Analysis”

Bubble Sort Algorithm

1. Compare the first two elements and exchange them if they are out of order.

2. Move down one element and compare the 2nd and 3rd elements. Exchange if necessary. Continue until the end of the array.

3. Pass through the array again, repeating the process and exchanging as necessary.

4. Repeat until a pass is made with no exchanges.
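A minimal C sketch of steps 1-4, assuming the data is an `int` array; the function name `bubble_sort` is our own choice, not from the slides. The `swapped` flag implements the stopping rule of step 4.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bubble sort following steps 1-4: sweep the array comparing neighbours,
   exchange any out-of-order pair, and stop after a pass with no exchanges. */
void bubble_sort(int a[], size_t n)
{
    bool swapped = true;
    while (swapped) {                        /* step 4: repeat until a clean pass */
        swapped = false;
        for (size_t i = 0; i + 1 < n; i++) { /* steps 1-3: one pass over the array */
            if (a[i] > a[i + 1]) {           /* out of order: exchange them */
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
                swapped = true;
            }
        }
    }
}
```

Called on the array numlist3 = {17, 23, 5, 11} from the example that follows, it makes two passes with exchanges and one clean pass, leaving 5 11 17 23.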

Bubble Sort Example

Array numlist3 contains 17 23 5 11.

• Compare values 17 and 23. In correct order, so no exchange.

• Compare values 23 and 5. Not in correct order, so exchange them: 17 5 23 11.

• Compare values 23 and 11. Not in correct order, so exchange them: 17 5 11 23.

Bubble Sort Example (continued)

After the first pass, array numlist3 contains 17 5 11 23.

• Compare values 17 and 5. Not in correct order, so exchange them: 5 17 11 23.

• Compare values 17 and 11. Not in correct order, so exchange them: 5 11 17 23.

• Compare values 17 and 23. In correct order, so no exchange (23 is in order from the previous pass).

Bubble Sort Example (continued)

After the second pass, array numlist3 contains 5 11 17 23.

• Compare values 5 and 11. In correct order, so no exchange; the remaining elements are in order from previous passes.

• No exchanges, so the array is in order.

Sorting Problem

• Given a series of integers

7, 2, 5, 3, 6, 9, 8, 1

• Arrange them in increasing order:

1< 2< 3 < 5 < 6 < 7< 8 < 9

Merge for Sorting

• Convert the sorting of 7, 2, 5, 3, 6, 9, 8, 1 into the sorting of 7, 2, 5, 3 and of 6, 9, 8, 1

• Sort 7, 2, 5, 3 into 2 < 3 < 5 < 7

• Sort 6, 9, 8, 1 into 1 < 6 < 8 < 9

• Merge 2 < 3 < 5 < 7 and 1 < 6 < 8 < 9

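Merge for Sorting

Merge 2 < 3 < 5 < 7 and 1 < 6 < 8 < 9 by repeatedly moving the smaller of the two front elements to the output:

• 2 < 3 < 5 < 7 | 1 < 6 < 8 < 9: output 1

• 2 < 3 < 5 < 7 | 6 < 8 < 9: output 1 < 2

• 3 < 5 < 7 | 6 < 8 < 9: output 1 < 2 < 3

• 5 < 7 | 6 < 8 < 9: output 1 < 2 < 3 < 5

• 7 | 6 < 8 < 9: output 1 < 2 < 3 < 5 < 6

• 7 | 8 < 9: output 1 < 2 < 3 < 5 < 6 < 7

• (empty) | 8 < 9: output 1 < 2 < 3 < 5 < 6 < 7 < 8 < 9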
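The split, sort, and merge steps can be sketched in C as below. This is one common top-down formulation, assuming an `int` array and a scratch buffer allocated once; the names `merge`, `merge_sort_rec` and `merge_sort` are hypothetical. Each call to `merge` does work proportional to the number of elements it merges, which is the `+ n` term in the recurrence for T(n).

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi) through tmp by
   repeatedly moving the smaller front element to the output. */
static void merge(int a[], int tmp[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];       /* leftovers of the left run */
    while (j < hi)  tmp[k++] = a[j++];       /* leftovers of the right run */
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof a[0]);
}

/* Sort a[lo..hi): split in half, sort each half, then merge. */
static void merge_sort_rec(int a[], int tmp[], size_t lo, size_t hi)
{
    if (hi - lo < 2)                         /* zero or one element: sorted */
        return;
    size_t mid = lo + (hi - lo) / 2;
    merge_sort_rec(a, tmp, lo, mid);
    merge_sort_rec(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort(int a[], size_t n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_rec(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

On 7, 2, 5, 3, 6, 9, 8, 1 the final call to `merge` performs exactly the merge traced above.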

Time analysis

T(n): the number of steps to sort n elements

T(1)=1

T(n)=2T(n/2)+n for n>1


Time analysis

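$T(n) = 2T(n/2) + n$

$= 2(2T(n/4) + n/2) + n = 4T(n/4) + n + n$

$= 4(2T(n/8) + n/4) + n + n = 8T(n/8) + n + n + n$

$= \cdots = 2^k T(n/2^k) + kn$

$= O(n \log n)$ (taking $k = \log n$, so that $2^k T(n/2^k) = nT(1) = n$)

[Figure: recursion tree with the merge time and the number of nodes for each layer]

• Every layer costs n steps.

• The total number of layers is log n, so the total time is n log n.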

Exponentiation Problem

• Compute

• Compute $a^n$

Polynomial Problem

• Compute

• Compute a general polynomial:

158273 234567 xxxxxxx

$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$
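The slides leave the method open. One common approach, sketched here in C with the hypothetical names `power` and `horner`, is repeated squaring for $a^n$ (O(log n) multiplications) and Horner's rule for the general polynomial (n multiplications and n additions). Both assume the results fit in `long long`; this is not necessarily the method intended in the lecture.

```c
/* Divide and conquer for a^n: a^n = (a^(n/2))^2 when n is even,
   and a * a^(n-1) when n is odd. Uses O(log n) multiplications. */
long long power(long long a, unsigned n)
{
    if (n == 0)
        return 1;
    if (n % 2 == 1)
        return a * power(a, n - 1);
    long long half = power(a, n / 2);
    return half * half;
}

/* Horner's rule: a_n x^n + ... + a_1 x + a_0
   = (...((a_n x + a_{n-1}) x + a_{n-2}) ...) x + a_0.
   coef[i] holds a_i for i = 0..n. */
long long horner(const long long coef[], unsigned n, long long x)
{
    long long result = coef[n];
    for (unsigned i = n; i-- > 0; )          /* i = n-1, ..., 0 */
        result = result * x + coef[i];
    return result;
}
```

For example, `power(3, 13)` is 1594323, and with `coef = {1, 2, 3}` (the polynomial $3x^2 + 2x + 1$) `horner(coef, 2, 2)` is 17.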

Exercise

Draw the tree for merge sort on

15, 11, 4, 22, 31, 55, 71, 12, 7, 2, 5, 3, 6, 9, 8, 1

Point out the number of comparisons that you use.

Chapter 3-4

Growth of Functions and Recursion Equations

O-notation: f(n) = O(g(n)): g(n) is an asymptotic upper bound for f(n).

$O(g(n)) = \{f(n) \mid \text{there are positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0\}$

Example: $3n^2 - 6n = O(n^2)$.

O-notation

• Drop low-order terms; ignore leading constants.

• Example: $3n^3 + 90n^2 - 5n + 6046 = O(n^3)$

• We say that T(n) = O(g(n)) iff there exist positive constants c and $n_0$ such that $0 \le T(n) \le c\,g(n)$ for all $n \ge n_0$.

• Usually T(n) is the running time, and n is the size of the input.

Simplified Master Theorem

Let $T(n) = aT(n/b) + cn^r$ be a recursive equation on the nonnegative integers, where a > 0, b > 1, c > 0, and r ≥ 0 are constants. Then,

• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$.

• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$.

• 3. If $r > \log_b a$, then $T(n) = O(n^r)$.

• Sum: let $S_k = 1 + d + d^2 + \cdots + d^k$.

• Then $d S_k = d + d^2 + d^3 + \cdots + d^{k+1}$.

• Subtracting, $d S_k - S_k = d^{k+1} - 1$. Thus, for $d \ne 1$, $S_k = \frac{d^{k+1} - 1}{d - 1}$.

• Assume that d is a constant.

• Case 1: d > 1 (the main term is at the right end): $S_k = O(d^k)$.

• Case 2: d = 1 (every term is the same): $S_k = k + 1 = O(k)$.

• Case 3: d < 1 (the main term is at the left end): $S_k = O(1)$.

• Assume that d and r are constants, and $T(n) = n^r (1 + d + d^2 + \cdots + d^{h(n)})$.

• Case 1: d > 1: $T(n) = O(n^r d^{h(n)})$.

• Case 2: d = 1: $T(n) = O(n^r h(n))$.

• Case 3: d < 1: $T(n) = O(n^r)$.

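Let $T(n) = aT(n/b) + cn^r$, where a > 0, b > 1, c > 0, and r ≥ 0 are constants.

[Figure: recursion tree with layers; the top layer costs $cn^r$]

Proof.

• The number of nodes in the j-th layer is $a^j$.

• The size of a node in the j-th layer is $n/b^j$.

• The cost of each node in the j-th layer is $c\,(n/b^j)^r$.

• The total cost at the j-th layer is $a^j \cdot c\,(n/b^j)^r = c\,n^r \left(\frac{a}{b^r}\right)^j$.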

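Proof.

• The number of layers is $\log_b n + 1$ (layers $j = 0, 1, \ldots, \log_b n$).

• The total cost at all layers is $\sum_{j=0}^{\log_b n} c\,n^r \left(\frac{a}{b^r}\right)^j = c\,n^r \left(1 + d + d^2 + \cdots + d^{\log_b n}\right)$, where $d = a/b^r$ is a constant. This is the sum above with $h(n) = \log_b n$.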

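Proof. Case 1.

$r < \log_b a$, so $b^r < a$ and $d > 1$. Therefore, the total cost is

$O(c\,n^r d^{\log_b n}) = O\!\left(c\,n^r \frac{a^{\log_b n}}{b^{r \log_b n}}\right) = O\!\left(c\,n^r \frac{n^{\log_b a}}{n^r}\right) = O(n^{\log_b a})$.

Proof. Case 2.

$r = \log_b a$, so $b^r = a$ and $d = 1$. The total cost is $c\,n^r (\log_b n + 1) = O(n^{\log_b a} \log n)$.

Proof. Case 3.

$r > \log_b a$, so $b^r > a$ and $d < 1$. The total cost is $c\,n^r \cdot O(1) = O(n^r)$.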

Simplified Master Theorem

Let $T(n) = aT(n/b) + cn^r$. Then,

• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$.

(The main cost is in the bottom layers.)

• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$.

(Every layer has roughly the same cost.)

• 3. If $r > \log_b a$, then $T(n) = O(n^r)$.

(The main cost is in the top layers.)
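As a quick check of the three cases, here is the theorem applied to two recurrences from these notes (merge sort and Karatsuba's algorithm) plus one made-up recurrence for case 3; the third line is purely illustrative.

```latex
\begin{aligned}
&T(n) = 2T(n/2) + n:   && a = 2,\ b = 2,\ r = 1,\ \log_2 2 = 1 = r
  && \Rightarrow\ \text{case 2: } T(n) = O(n \log n) \\
&T(n) = 3T(n/2) + cn:  && a = 3,\ b = 2,\ r = 1,\ \log_2 3 \approx 1.58 > r
  && \Rightarrow\ \text{case 1: } T(n) = O(n^{\log_2 3}) \\
&T(n) = 2T(n/2) + n^2: && a = 2,\ b = 2,\ r = 2,\ \log_2 2 = 1 < r
  && \Rightarrow\ \text{case 3: } T(n) = O(n^2)
\end{aligned}
```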

3.1 Asymptotic notation

Θ-notation: f(n) = Θ(g(n)): g(n) is an asymptotically tight bound for f(n).

$\Theta(g(n)) = \{f(n) \mid \text{there exist positive constants } c_1, c_2, n_0 \text{ such that } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \text{ for all } n \ge n_0\}$

Example: Prove $3n^2 - 6n = \Theta(n^2)$.

Proof:

We need to find constants $c_1$, $c_2$ and $n_0$ such that $c_1 n^2 \le 3n^2 - 6n \le c_2 n^2$ for all $n \ge n_0$.

Dividing by $n^2$: $c_1 \le 3 - 6/n \le c_2$.

Select $c_1 = 2$, $c_2 = 3$ and $n_0 = 6$: for $n \ge 6$ we have $6/n \le 1$, so $2 \le 3 - 6/n \le 3$.

Note: f(n) = Θ(g(n)) iff g(n) = Θ(f(n)). For example: $n^2 = \Theta(3n^2 - 6n)$.

O-notation: f(n) = O(g(n)): g(n) is an asymptotic upper bound for f(n).

$O(g(n)) = \{f(n) \mid \text{there are positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0\}$

• $\Theta(g(n)) \subseteq O(g(n))$

• f(n) = Θ(g(n)) implies f(n) = O(g(n))

• $6n = O(n)$, $6n = O(n^2)$

• Computational time $O(n^2)$ means the time in the worst case is $O(n^2)$

Ω-notation: f(n) = Ω(g(n)): g(n) is an asymptotic lower bound for f(n).

$\Omega(g(n)) = \{f(n) \mid \text{there are positive constants } c \text{ and } n_0 \text{ such that } 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0\}$

Note: f(n) = Θ(g(n)) if and only if

(f(n) = O(g(n))) and (f(n) = Ω(g(n)))

[Figure: tight bound (Θ), upper bound (O), lower bound (Ω)]

o-notation: f(n) = o(g(n)) (little-oh of g of n)

$o(g(n)) = \{f(n) \mid \text{for every positive constant } c \text{ there exists a constant } n_0 > 0 \text{ such that } 0 \le f(n) < c\,g(n) \text{ for all } n \ge n_0\}$

• $2n = o(n^2)$, but $2n^2 \ne o(n^2)$

• f(n) = o(g(n)) can also be written as $\lim_{n \to \infty} f(n)/g(n) = 0$

Comparison of functions

• The relations between functions are analogous to those between numbers: Ω, Θ, O, o correspond to ≥, =, ≤, <.

• Transitivity, reflexivity, symmetry

• Two real numbers are always comparable, but two functions may not be comparable.

– Example: f(n) = n and $g(n) = n^{1 + \sin n}$

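Appendix A: Summation formulas

• $\sum_{k=1}^{n} (c\,a_k + b_k) = c \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k$

• $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$

• $\sum_{k=0}^{n} x^k = \frac{x^{n+1} - 1}{x - 1}$ for $x \ne 1$

• $\sum_{k=1}^{n} \frac{1}{k} = \ln n + O(1)$ (harmonic series)

• $\sum_{k=1}^{n-1} \frac{1}{k(k+1)} = \sum_{k=1}^{n-1} \left(\frac{1}{k} - \frac{1}{k+1}\right) = 1 - \frac{1}{n}$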

For $T(n) = aT(n/b) + cn^r$:

• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$.

• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$.

• 3. If $r > \log_b a$, then $T(n) = O(n^r)$.

Exercise

• For the following two equations, identify the main cost region and the solution with the simplified master theorem:

1. T(n) = 8T(n/2) + …

2. T(n) = 8T(n/2) + …

Divide and Conquer Method

The most well-known algorithm design strategy:

1. Divide instance of problem into two or more smaller instances

2. Solve smaller instances recursively

3. Obtain solution to original (larger) instance by combining these solutions

$T(n) = aT(n/b) + cn^r$

[Figure: recursion tree with layers; the top layer costs $cn^r$]

Integer Multiplication

• 12 × 47 = 564 by the elementary school algorithm: 12 × 7 = 84, 12 × 4 = 48 shifted one place left (480), and 84 + 480 = 564.

Integer Multiplication

A = 1234567890135798, B = 87654321284820912

The elementary school algorithm:

  a1 a2 … an

× b1 b2 … bn

(d10) d11 d12 … d1n

(d20) d21 d22 … d2n

… … … … … … …

(dn0) dn1 dn2 … dnn

Efficiency: $n^2$ one-digit multiplications

Karatsuba’s Algorithm

• Using the classical pen and paper algorithm, two n-digit integers can be multiplied in $O(n^2)$ operations. Karatsuba came up with a faster algorithm.

• Let A and B be two integers with

– $A = A_1 10^k + A_0$, $A_0 < 10^k$

– $B = B_1 10^k + B_0$, $B_0 < 10^k$

– $C = A \cdot B = (A_1 10^k + A_0)(B_1 10^k + B_0) = A_1 B_1 10^{2k} + (A_1 B_0 + A_0 B_1) 10^k + A_0 B_0$

A trivial analysis

• $T(n) = 4T(n/2) + O(n)$

• $T(n) = O(n^2)$

For $T(n) = aT(n/b) + cn^r$:

• 1. If $r < \log_b a$, then $T(n) = O(n^{\log_b a})$.

• 2. If $r = \log_b a$, then $T(n) = O(n^{\log_b a} \log n)$.

• 3. If $r > \log_b a$, then $T(n) = O(n^r)$.

3 Multiplications

Instead, this can be computed with 3 multiplications:

• $T_0 = A_0 B_0$

• $T_1 = (A_1 + A_0)(B_1 + B_0)$

• $T_2 = A_1 B_1$

• $C = T_2 10^{2k} + (T_1 - T_0 - T_2) 10^k + T_0$

Complexity of Algorithm

• Let T(n) be the time to compute the product of two n-digit numbers using Karatsuba’s algorithm. Assume $n = 2^k$.

• $T(n) \le 3T(n/2) + cn$

• Hence $T(n) = \Theta(n^{\lg 3})$, where $\lg 3 \approx 1.58$.
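A minimal C sketch of the three-multiplication scheme on machine integers, assuming non-negative operands whose product fits in `long long` (real implementations work on arrays of digits). It splits at $k = \lfloor n/2 \rfloor$ decimal digits, where n is the digit count of the larger operand; the helper names are hypothetical.

```c
/* Number of decimal digits of x >= 0 (0 counts as one digit). */
static int num_digits(long long x)
{
    int d = 1;
    while (x >= 10) { x /= 10; d++; }
    return d;
}

/* 10^k as a long long (hypothetical helper). */
static long long pow10_ll(int k)
{
    long long p = 1;
    while (k-- > 0) p *= 10;
    return p;
}

/* Karatsuba multiplication of non-negative a and b.
   Assumes every intermediate value fits in long long. */
long long karatsuba(long long a, long long b)
{
    if (a < 10 || b < 10)                    /* base case: a one-digit factor */
        return a * b;
    int n = num_digits(a) > num_digits(b) ? num_digits(a) : num_digits(b);
    int k = n / 2;
    long long base = pow10_ll(k);            /* 10^k */
    long long a1 = a / base, a0 = a % base;  /* A = A1*10^k + A0, A0 < 10^k */
    long long b1 = b / base, b0 = b % base;  /* B = B1*10^k + B0, B0 < 10^k */
    long long t0 = karatsuba(a0, b0);             /* T0 = A0*B0 */
    long long t2 = karatsuba(a1, b1);             /* T2 = A1*B1 */
    long long t1 = karatsuba(a1 + a0, b1 + b0);   /* T1 = (A1+A0)(B1+B0) */
    /* C = T2*10^(2k) + (T1 - T0 - T2)*10^k + T0 */
    return t2 * base * base + (t1 - t0 - t2) * base + t0;
}
```

For example, `karatsuba(12, 47)` returns 564. Only three recursive calls are made per level, which is where the $3T(n/2)$ term comes from.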

Matrix Multiplication

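• The regular method takes $O(n^3)$ time.

• IDEA: split each matrix into 2×2 blocks:

$\begin{pmatrix} r & s \\ t & u \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e & f \\ g & h \end{pmatrix}$

• r = ae + bg

• s = af + bh

• t = ce + dg

• u = cf + dh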

Strassen’s algorithm

• Multiply 2×2 matrices with only 7 recursive multiplications:

• P1 = a ⋅ (f – h)

• P2 = (a + b) ⋅ h

• P3 = (c + d) ⋅ e

• P4 = d ⋅ (g – e)

• P5 = (a + d) ⋅ (e + h)

• P6 = (b – d) ⋅ (g + h)

• P7 = (a – c) ⋅ (e + f)

Strassen’s Algorithm

• r = P5 + P4 – P2 + P6

• s = P1 + P2

• t = P3 + P4

• u = P5 + P1 – P3 – P7
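To check the seven products numerically, here is a C sketch that applies them to a single 2×2 multiplication with scalar entries; the type `Mat2` and the function name are hypothetical. In the recursive algorithm each letter stands for an n/2 × n/2 block and each product is a recursive call; that recursion is omitted here.

```c
/* [a b; c d] * [e f; g h] = [r s; t u] using Strassen's seven products. */
typedef struct { long long m[2][2]; } Mat2;

Mat2 strassen2x2(Mat2 A, Mat2 B)
{
    long long a = A.m[0][0], b = A.m[0][1], c = A.m[1][0], d = A.m[1][1];
    long long e = B.m[0][0], f = B.m[0][1], g = B.m[1][0], h = B.m[1][1];
    long long p1 = a * (f - h);
    long long p2 = (a + b) * h;
    long long p3 = (c + d) * e;
    long long p4 = d * (g - e);
    long long p5 = (a + d) * (e + h);
    long long p6 = (b - d) * (g + h);
    long long p7 = (a - c) * (e + f);
    Mat2 C;
    C.m[0][0] = p5 + p4 - p2 + p6;   /* r = ae + bg */
    C.m[0][1] = p1 + p2;             /* s = af + bh */
    C.m[1][0] = p3 + p4;             /* t = ce + dg */
    C.m[1][1] = p5 + p1 - p3 - p7;   /* u = cf + dh */
    return C;
}
```

Multiplying [1 2; 3 4] by [5 6; 7 8] this way gives [19 22; 43 50], the same as the regular method.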

Problem

Verify one of the four equations in Strassen’s algorithm:

u = P5 + P1 – P3 – P7

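Strassen observed [1969] that the product of two matrices can be computed as follows:

$\begin{pmatrix} C_{00} & C_{01} \\ C_{10} & C_{11} \end{pmatrix} = \begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix} \begin{pmatrix} B_{00} & B_{01} \\ B_{10} & B_{11} \end{pmatrix} = \begin{pmatrix} M_1 + M_4 - M_5 + M_7 & M_3 + M_5 \\ M_2 + M_4 & M_1 + M_3 - M_2 + M_6 \end{pmatrix}$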

M1 = (A00 + A11) (B00 + B11)

M2 = (A10 + A11) B00

M3 = A00 (B01 - B11)

M4 = A11 (B10 - B00)

M5 = (A00 + A01) B11

M6 = (A10 - A00) (B00 + B01)

M7 = (A01 - A11) (B10 + B11)

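Analysis of Strassen’s Algorithm

If n is not a power of 2, matrices can be padded with zeros.

Running time (7 recursive multiplications of n/2 × n/2 matrices plus $O(n^2)$ additions):

$T(n) = 7T(n/2) + O(n^2)$, $T(1) = 1$

Solution: $T(n) = O(n^{\log_2 7}) \approx O(n^{2.807})$

vs. $n^3$ of the brute-force algorithm.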

Problem

Verify two of the four equations in Strassen’s algorithm:

t = P3 + P4

u = P5 + P1 – P3 – P7

Heap

[Figure: an example heap with root 3 and nodes 10, 11, 6]

• Father <= children

• The root is the smallest

• Complete binary tree (every layer except the bottom one is filled up, and the bottom layer is filled from the left)

Heap Operations

• Insertion

• Deletion: Remove root (take the least)

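Heap Insertion

[Figure: the new element 1 is added as the last leaf of the heap with root 3]

Insertion Adjustment

• Adjust on the path from the new leaf to the root: while the new element is smaller than its father, swap them.

[Figure: after the adjustment, 1 is at the root]

• Except for the new leaf, all adjusted nodes get smaller values, so insertion does not damage the heap.

Deletion (Remove Root)

• Take the last leaf to the root.

• Adjust on the path from the root to a leaf: while the node is larger than its smaller child, swap them.

[Figure: after the root is removed and the last leaf 5 is moved up and adjusted, 3 is at the root]

• Deletion does not damage the heap.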

Heap Representation

• A heap with no more than n elements uses an array h of size n (indices 1..n).

• The children of h[i] are h[2i] and h[2i+1].

• h[⌊i/2⌋] is the father of h[i].

• Prove by induction:

– left(i) = 2i

– right(i) = 2i + 1

– father(i) = ⌊i/2⌋

Adjustment for Insertion

Bottom-Up-Adjust(i){
    if (i > 1 and h[i] < h[father(i)]){
        swap h[i] and h[father(i)];
        Bottom-Up-Adjust(father(i));
    }
}

Insert

Insert(data, heapsize){
    heapsize = heapsize + 1;
    put data at h[heapsize];
    Bottom-Up-Adjust(heapsize);
}

Adjustment for deletion

Top-Down-Adjust(i){
    if (left(i) > heapsize) return;      // h[i] is a leaf
    let child be the index of the smaller of h[left(i)] and h[right(i)]
        (use left(i) alone if right(i) > heapsize);
    if (h[i] > h[child]){
        swap h[i] and h[child];
        Top-Down-Adjust(child);
    }
}

Delete

Delete(heapsize){
    least = h[1];
    move h[heapsize] to h[1];
    heapsize = heapsize - 1;
    Top-Down-Adjust(1);
    return least;
}

Complexity of Heap Operations

• A heap has n elements.

• The depth of heap is O(log n)

• Insertion takes O(log n) steps

• Deletion takes O(log n) steps

Heap Sorting

• Input: a1, a2, …, an

• Build heap: n insertions

Cost: O(n log n)

• Remove from heap: n deletions

Cost: O(n log n)

Total cost: O(n log n)
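A C sketch of the array-based min-heap from the preceding slides (1-indexed, father of h[i] at h[i/2]) and of heap sort by n insertions followed by n deletions. The struct `MinHeap` and the function names are assumptions for illustration; the caller provides storage for n+1 ints because h[0] is unused.

```c
#include <stddef.h>

/* Min-heap in h[1..size]; children of h[i] are h[2i] and h[2i+1]. */
typedef struct {
    int *h;           /* storage with room for capacity + 1 ints */
    size_t size;
    size_t capacity;
} MinHeap;

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Bottom-Up-Adjust: move h[i] up while it is smaller than its father. */
static void bottom_up_adjust(MinHeap *H, size_t i)
{
    while (i > 1 && H->h[i] < H->h[i / 2]) {
        swap_int(&H->h[i], &H->h[i / 2]);
        i /= 2;
    }
}

/* Top-Down-Adjust: move h[i] down while it is larger than its smaller child. */
static void top_down_adjust(MinHeap *H, size_t i)
{
    for (;;) {
        size_t child = 2 * i;
        if (child > H->size)
            break;                                   /* h[i] is a leaf */
        if (child + 1 <= H->size && H->h[child + 1] < H->h[child])
            child++;                                 /* pick the smaller child */
        if (H->h[i] <= H->h[child])
            break;
        swap_int(&H->h[i], &H->h[child]);
        i = child;
    }
}

/* Returns 0 on success, -1 if the heap is full. */
int heap_insert(MinHeap *H, int data)
{
    if (H->size == H->capacity)
        return -1;
    H->h[++H->size] = data;                          /* new leaf at the end */
    bottom_up_adjust(H, H->size);
    return 0;
}

/* Removes the root (the least element) into *out; returns -1 if empty. */
int heap_delete_min(MinHeap *H, int *out)
{
    if (H->size == 0)
        return -1;
    *out = H->h[1];
    H->h[1] = H->h[H->size--];                       /* last leaf to the root */
    top_down_adjust(H, 1);
    return 0;
}

/* Heap sort: n insertions, then n deletions. scratch holds n+1 ints. */
void heap_sort(int a[], size_t n, int scratch[])
{
    MinHeap H = { scratch, 0, n };
    for (size_t i = 0; i < n; i++)
        heap_insert(&H, a[i]);
    for (size_t i = 0; i < n; i++)
        heap_delete_min(&H, &a[i]);
}
```

Each insertion and deletion touches one root-to-leaf path of length O(log n), which gives the O(n log n) total above.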

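ATM Traffic Shaping

[Figure: incoming traffic is placed in per-VC queues (vc1 queue, vc2 queue, vc3 queue, …, vcn queue); a scheduler selects packets for the outgoing link]

Telecommunication

[Figure: phones connected through a gateway switch]

Each vc is considered as one phone connection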

Traffic in one virtual circuit

[Figure: incoming packets p1 to p7, and the outgoing packets p1 to p7 after shaping]

Interval Packet Delay

• The time gap between consecutive packets must be big enough.

• Every virtual circuit i has a minimal inter-packet delay constant inter_delay_i: for example, packet 3 and packet 4 must be sent more than inter_delay_i apart.

A Trivial Algorithm

• After a packet is sent from a vc, set the ready time for sending its next packet:

ready_time = current_time + inter_delay

• Periodically check all queues, and send packets without violating the inter delay.

Drawback of the Algorithm

• Much time is wasted checking vcs that have no packet ready.

Another Algorithm

• Check whether there is at least one queue ready to send (current_time > ready_time).

• Use a heap to select the queue with the least ready time.

Heap for Selecting Queue

• Each vc has <ready_time, vc_i> to enter heap

• Heap is built based on the order of ready_time

Apply Heap to Traffic Control

[Figure: a heap of <ready_time, vc_i> entries, one per virtual circuit (<ready_time, vc_1>, <ready_time, vc_2>, …, <ready_time, vc_1000>), each linked to its queue (vc1 queue, vc2 queue, …, vc1000 queue)]

When to insert into the heap?

• When a queue has just received one new packet, or

• When a queue has just sent out one packet and still has packets waiting.

When to delete from the heap?

• Outgoing bandwidth is available, and

• The least ready_time in the heap has expired.

Drawback of One Heap Solution

• It cannot prevent a greedy VC

• It may ignore some VCs

• Traffic control is not predictable

Two Heaps Design

[Figure: a timing heap and a fairness heap, both over the queues vc1 … vc1000]

Two Heaps Functions

• Time Heap: Control the inter packet delay

<time_ready, vc_i>

• Fairness Heap: Balance the service among all VCs

<service_got, vc_i>

Adjust service_got for fairness

• Each vc has a weight w_i > 1.

• When a packet is sent, its service_got = service_got + w_i.

• When a queue just has one packet, service_got = max(service_got, time_stamp).

• The service is inversely proportional to the weight w_i.

Problem 1

10 11 6 5

a) Draw the steps to insert element 2.

b) After 2 is inserted, draw the steps to remove the …

Dynamic Programming

Recursion: like divide-and-conquer.

Overlap in subproblems: not like divide-and-conquer.

[Figure: subproblems P(m1), P(m2), …, P(mk) with solutions S1, S2, …, Sk]

Matrix Multiplication (definition)

Given a series of matrices A1, A2, …, An, where matrix Ai has size $p_{i-1} \times p_i$, find a way to compute A1A2…An so that it uses the least number of scalar multiplications.

Example: A1 A2 A3 A4 with dimensions $p_i$: 13 5 89 3 34

5 ways to compute the product of 4 matrices:

(A1(A2(A3A4))), (A1((A2A3)A4)), ((A1A2)(A3A4)), ((A1(A2A3))A4), (((A1A2)A3)A4).

Matrix Multiplication(Example)

(A1(A2(A3A4))), costs = 26418

(A1((A2A3)A4)), costs = 4055

((A1A2 )( A3A4)), costs = 54201

((A1(A2 A3))A4), costs = 2856

((( A1A2)A3)A4), costs = 10582

For (A1(A2(A3A4))) with dimensions 13 5 89 3 34: A3A4 costs 89·3·34, A2(A3A4) costs 5·89·34, and A1(A2A3A4) costs 13·5·34:

cost = 13*5*34 + 5*89*34 + 89*3*34 = 2210 + 15130 + 9078 = 26418

Catalan Number

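For any n, the number of ways to fully parenthesize the product of a chain of n+1 matrices

= the number of binary trees with n nodes

= the number of ways to form n pairs of fully matched parentheses

= the n-th Catalan number $C_n = \binom{2n}{n} / (n+1) = \Theta(4^n / n^{3/2})$.

For 4 matrices, n = 3 and $C_3 = 5$, matching the 5 ways listed above.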

Multiplication Tree

(A1(A2(A3A4)))

(A1((A2A3)A4))

((A1A2 )( A3A4))

((A1(A2 A3))A4)

((( A1A2)A3)A4)

[Figure: the five multiplication trees, each with leaves A1 A2 A3 A4]

Multiplication Design (1)

If T is an optimal solution for A1, A2, …, An, and T splits the product into a subtree T1 over A1, …, Ak and a subtree T2 over Ak+1, …, An,

then T1 (resp. T2) is an optimal solution for A1, A2, …, Ak (resp. Ak+1, Ak+2, …, An).

Multiplication Design (2)

Let m[i, j] be the minimum number of scalar multiplications needed to compute the product Ai…Aj, for 1 ≤ i ≤ j ≤ n.

If the optimal solution splits the product Ai…Aj = (Ai…Ak)(Ak+1…Aj) for some k, i ≤ k < j, then

$m[i,j] = m[i,k] + m[k+1,j] + p_{i-1} p_k p_j$. Hence, we have:

$m[i,j] = \min_{i \le k < j} \{ m[i,k] + m[k+1,j] + p_{i-1} p_k p_j \}$ if i < j

m[i, j] = 0 if i = j

Matrix Multiplication (Example)

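Consider an example with the sequence of dimensions <5, 2, 3, 4, 6, 7, 8>, using $m[i,j] = \min_{i \le k < j} \{ m[i,k] + m[k+1,j] + p_{i-1} p_k p_j \}$:

i\j | 1 | 2 | 3 | 4 | 5 | 6

1 | 0 | 30 | 64 | 132 | 226 | 348

2 | - | 0 | 24 | 72 | 156 | 268

3 | - | - | 0 | 72 | 198 | 366

4 | - | - | - | 0 | 168 | 392

5 | - | - | - | - | 0 | 336

6 | - | - | - | - | - | 0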

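Matrix Multiplication (Find Solution)

$m[i,j] = \min_{i \le k < j} \{ m[i,k] + m[k+1,j] + p_{i-1} p_k p_j \}$; s[i, j] = a value of k that gives the minimum.

i\j | 2 | 3 | 4 | 5 | 6

1 | 1 | 1 | 1 | 1 | 1

2 | - | 2 | 3 | 4 | 5

3 | - | - | 3 | 4 | 5

4 | - | - | - | 4 | 5

5 | - | - | - | - | 5

Splits: [1, 6] = A1 · [2, 6]; [2, 6] = [2, 5] · A6; [2, 5] = [2, 4] · A5; [2, 4] = [2, 3] · A4; [2, 3] = A2 · A3.

Optimal order: A1((((A2A3)A4)A5)A6)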

Analysis

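To fill the entry m[i, j], it needs Θ(j − i) operations (one for each split point k) in

$m[i,j] = \min_{i \le k < j} \{ m[i,k] + m[k+1,j] + p_{i-1} p_k p_j \}$.

Hence the execution time of the algorithm is

$\sum_{i=1}^{n} \sum_{j=i}^{n} \Theta(j - i) = \Theta(n^3)$.

Time: $\Theta(n^3)$. Space: $\Theta(n^2)$.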
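A bottom-up C sketch of the m[i, j] / s[i, j] recurrence, assuming at most `MAXN` matrices (a hypothetical compile-time bound) and 1-based indices to match the slides. `print_parens` (also a hypothetical name) rebuilds the parenthesization from s in the same way as the "Find Solution" slide.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define MAXN 16   /* hypothetical upper bound on the number of matrices */

/* A_i has size p[i-1] x p[i] for i = 1..n (1 <= n <= MAXN).
   m[i][j] = minimum number of scalar multiplications for A_i...A_j,
   s[i][j] = a split point k that attains it. Returns m[1][n]. */
long long matrix_chain_order(const int p[], int n,
                             long long m[MAXN + 1][MAXN + 1],
                             int s[MAXN + 1][MAXN + 1])
{
    for (int i = 1; i <= n; i++)
        m[i][i] = 0;                                   /* m[i, i] = 0 */
    for (int len = 2; len <= n; len++) {               /* chain length j - i + 1 */
        for (int i = 1; i + len - 1 <= n; i++) {
            int j = i + len - 1;
            m[i][j] = LLONG_MAX;
            for (int k = i; k < j; k++) {              /* try every split i <= k < j */
                long long cost = m[i][k] + m[k + 1][j]
                               + (long long)p[i - 1] * p[k] * p[j];
                if (cost < m[i][j]) {
                    m[i][j] = cost;
                    s[i][j] = k;
                }
            }
        }
    }
    return m[1][n];
}

/* Append the optimal parenthesization of A_i...A_j to buf using s. */
void print_parens(int s[MAXN + 1][MAXN + 1], int i, int j, char *buf)
{
    if (i == j) {
        sprintf(buf + strlen(buf), "A%d", i);
        return;
    }
    strcat(buf, "(");
    print_parens(s, i, s[i][j], buf);
    print_parens(s, s[i][j] + 1, j, buf);
    strcat(buf, ")");
}
```

With p = {5, 2, 3, 4, 6, 7, 8} and n = 6 it returns 348 and `print_parens` produces (A1((((A2A3)A4)A5)A6)), matching the tables above; with the earlier dimensions 13 5 89 3 34 it returns 2856.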

Steps for Developing DP algorithm

1. Characterize the structure of an optimal solution.

2. Derive a recursive formula for computing the values of optimal solutions.

3. Compute the value of an optimal solution in a bottom-up fashion (top-down is also applicable).

4. Construct an optimal solution in a top-down fashion.

Elements of Dynamic Programming

• Optimal substructure (a problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems)

• Overlapping subproblems

• Memoization

Longest Common Subsequence (Def.)

Given two sequences X = <x1, x2, …, xm> and Y = <y1, y2, …, yn>, find a maximum-length common subsequence of X and Y.

Example 1: Input: ABCBDAB, BDCABA

Common subsequences: AB, ABA, BCB, BCAB, BCBA, …

Longest: BCAB, BCBA, … Length = 4

Example 2: vintner, writers

Longest Common Subsequence (Design 1)

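Let Z = <z1, z2, …, zk> be an LCS of X = <x1, x2, …, xm> and Y = <y1, y2, …, yn>.

If $z_k \ne x_m$, then Z is an LCS of $\langle x_1, x_2, \ldots, x_{m-1} \rangle$ and Y.

If $z_k \ne y_n$, then Z is an LCS of X and $\langle y_1, y_2, \ldots, y_{n-1} \rangle$.

If $z_k = x_m = y_n$, then $\langle z_1, z_2, \ldots, z_{k-1} \rangle$ is an LCS of $\langle x_1, x_2, \ldots, x_{m-1} \rangle$ and $\langle y_1, y_2, \ldots, y_{n-1} \rangle$.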

Longest Common Subsequence (Design 2)

Let L[i, j] be the length of an LCS of the prefixes $X_i = \langle x_1, x_2, \ldots, x_i \rangle$ and $Y_j = \langle y_1, y_2, \ldots, y_j \rangle$, for 0 ≤ i ≤ m and 0 ≤ j ≤ n. We have:

L[i, j] = 0 if i = 0 or j = 0

= L[i−1, j−1] + 1 if i, j > 0 and $x_i = y_j$

= max(L[i, j−1], L[i−1, j]) if i, j > 0 and $x_i \ne y_j$

Longest Common Subsequence (Example)

Table of L[i, j] computed with the recurrence above. Time: Θ(mn). Space: Θ(mn).

  | A | B | C | B | D | A | B

B | 0 | 1 | 1 | 1 | 1 | 1 | 1

D | 0 | 1 | 1 | 1 | 2 | 2 | 2

C | 0 | 1 | 2 | 2 | 2 | 2 | 2

A | 1 | 1 | 2 | 2 | 2 | 3 | 3

B | 1 | 2 | 2 | 3 | 3 | 3 | 4

A | 1 | 2 | 2 | 3 | 3 | 4 | 4

LCS: BCBA

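A C sketch of the L[i, j] table and of recovering one LCS by walking back from L[m][n]. The bound `MAXLEN` and the function name `lcs` are assumptions. On ties the walk moves up (decreasing i), which for ABCBDAB and BDCABA yields BCBA; other tie-breaking rules can yield a different LCS of the same length.

```c
#include <string.h>

#define MAXLEN 64   /* hypothetical bound on the lengths of x and y */

/* Fills L[i][j] = length of an LCS of x[0..i-1] and y[0..j-1], then walks
   back from L[m][n] to write one LCS into out (room for MAXLEN + 1 chars).
   Returns the LCS length, or -1 if an input is longer than MAXLEN.
   The static table makes this sketch non-reentrant. */
int lcs(const char *x, const char *y, char *out)
{
    static int L[MAXLEN + 1][MAXLEN + 1];
    int m = (int)strlen(x), n = (int)strlen(y);
    if (m > MAXLEN || n > MAXLEN)
        return -1;
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 || j == 0)
                L[i][j] = 0;
            else if (x[i - 1] == y[j - 1])
                L[i][j] = L[i - 1][j - 1] + 1;
            else
                L[i][j] = L[i - 1][j] >= L[i][j - 1] ? L[i - 1][j] : L[i][j - 1];
        }
    }
    int len = L[m][n], i = m, j = n;
    out[len] = '\0';
    while (i > 0 && j > 0) {               /* recover the characters backwards */
        if (x[i - 1] == y[j - 1]) {
            out[--len] = x[i - 1];
            i--;
            j--;
        } else if (L[i - 1][j] >= L[i][j - 1]) {
            i--;
        } else {
            j--;
        }
    }
    return L[m][n];
}
```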

Optimal Polygon Triangulation

Find a triangulation such that the sum of the weights of the triangles in the triangulation is minimized.

Optimal Polygon Triangulation (Design 1)

If T is an optimal solution for v0, v1, …, vn, and $T = T_1 \cup T_2 \cup \{v_0 v_k, v_k v_n\}$,

then T1 (resp. T2) is an optimal solution for v0, v1, …, vk (resp. vk, vk+1, …, vn), 1 ≤ k < n.

Optimal Polygon Triangulation (Design 2)

Let t[i, j] be the weight of an optimal triangulation of the polygon $v_{i-1}, v_i, \ldots, v_j$, for 1 ≤ i ≤ j ≤ n (t[i, i] = 0 for the degenerate polygon $v_{i-1}, v_i$).

If the triangulation splits the polygon into $v_{i-1}, v_i, \ldots, v_k$ and $v_k, v_{k+1}, \ldots, v_j$ for some k, then

$t[i,j] = t[i,k] + t[k+1,j] + w(\triangle v_{i-1} v_k v_j)$. Hence,

we have:

$t[i,j] = \min_{i \le k < j} \{ t[i,k] + t[k+1,j] + w(\triangle v_{i-1} v_k v_j) \}$ if i < j

= 0 if i = j

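Catalan Number

• Segner's recurrence formula gives:

$E_n = E_2 E_{n-1} + E_3 E_{n-2} + \cdots + E_{n-1} E_2$, with $E_2 = 1$,

where $E_n$ is the number of triangulations of a convex polygon with n vertices, and $E_n = C_{n-2}$.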

Problem

Draw the dynamic programming table to find the Longest Common Subsequence between BACAC and CABC.

Data Structures

• Linked List

• Heap

• Application of Heap in an Industry Product

Program=Data Structure + Algorithm

Linked List

[Figure: a linked list 8 → 10 → 15]

Node Structure

• struct listnode {
      type data;                  /* type: the element type */
      struct listnode *nextPtr;   /* pointer to the next node */
  };

Dynamic Memory Allocation:

• Apply for the memory when it is needed

• Release memory when it is not needed

Linked List Operations

• Insertion:

add a new node to the linked list

• Deletion:

remove a node from the current linked list

Linked List to Implement

The linked list is kept in increasing order of characters

[Figure: the list, starting at startPtr]

Insertion

• Create a new node

• Find a place to insert

• Apply for a new piece of memory

• Adjust the nearby pointers

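After g is inserted

[Figure: the list after g is inserted, starting at startPtr]

Find Location to Insert

[Figure: previousPtr and currentPtr move along the list from startPtr to find the insertion point]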

Deletion

• Find the node to delete

• Adjust the pointers nearby the deleted node

• Release the memory for the deleted node

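[Figure: the node to delete, located by walking from startPtr]

Remove the node and release memory

[Figure: the pointers around the removed node are adjusted]

After e is deleted

[Figure: the list after e is deleted, starting at startPtr]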
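The insertion and deletion steps above can be sketched in C as follows, assuming `char` data (as in the example with g and e) and the pointer names from the slides; the function names `insert` and `delete_node` and the integer return codes are our own.

```c
#include <stdlib.h>

/* Node of a singly linked list kept in increasing order of characters. */
typedef struct listnode {
    char data;
    struct listnode *nextPtr;
} ListNode;

/* Insert value in order. Returns 0 on success, -1 if malloc fails. */
int insert(ListNode **startPtr, char value)
{
    ListNode *newPtr = malloc(sizeof *newPtr);     /* apply for memory */
    if (newPtr == NULL)
        return -1;
    newPtr->data = value;
    newPtr->nextPtr = NULL;

    /* find the place: stop at the first node not smaller than value */
    ListNode *previousPtr = NULL, *currentPtr = *startPtr;
    while (currentPtr != NULL && value > currentPtr->data) {
        previousPtr = currentPtr;
        currentPtr = currentPtr->nextPtr;
    }
    newPtr->nextPtr = currentPtr;                  /* adjust nearby pointers */
    if (previousPtr == NULL)
        *startPtr = newPtr;                        /* new first node */
    else
        previousPtr->nextPtr = newPtr;
    return 0;
}

/* Delete the first node holding value and release its memory.
   Returns 0 if a node was removed, -1 if value is not in the list. */
int delete_node(ListNode **startPtr, char value)
{
    ListNode *previousPtr = NULL, *currentPtr = *startPtr;
    while (currentPtr != NULL && currentPtr->data != value) {
        previousPtr = currentPtr;
        currentPtr = currentPtr->nextPtr;
    }
    if (currentPtr == NULL)
        return -1;
    if (previousPtr == NULL)
        *startPtr = currentPtr->nextPtr;
    else
        previousPtr->nextPtr = currentPtr->nextPtr;
    free(currentPtr);                              /* release memory */
    return 0;
}
```

Inserting e, a, g, c into an empty list (startPtr = NULL) yields a → c → e → g, and deleting e then leaves a → c → g.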

Queue

• First In, First Out (FIFO)

Queue Linked List

[Figure: a queue stored as a linked list, with headPtr at the front and tailPtr at the rear]

Three Important Operations

Supporting these 3 operations is the foundation of modern databases:

• Search

• Insertion

• Deletion

• A tree is a 2-dimensional structure

[Figure: an example binary search tree containing 11, 43, 93]

• Binary tree: root, left_child, right_child

• Binary search tree: numbers in the left subtree <= numbers in the right subtree.

Data structure for one node

struct treenode {
    struct treenode *leftPtr;
    int data;
    struct treenode *rightPtr;
};

typedef struct treenode TreeNode;

typedef TreeNode *TreeNodePtr;

Operations

• Insertion

• Traverse:

inorder

preorder

postorder

Insertion

void insertNode(TreeNodePtr *treePtr, int value){
    if (*treePtr is empty) {
        allocate memory and put value here
    }
    else if (value < (*treePtr)->data)
        insert into the left subtree
    else
        insert into the right subtree
}

#include <stdio.h>
#include <stdlib.h>

void insertNode( TreeNodePtr *treePtr, int value )
{
    if ( *treePtr == NULL ) {                  /* empty subtree: new node goes here */
        *treePtr = malloc( sizeof( TreeNode ) );
        if ( *treePtr != NULL ) {
            ( *treePtr )->data = value;
            ( *treePtr )->leftPtr = NULL;
            ( *treePtr )->rightPtr = NULL;
        }
        else
            printf( "No memory available.\n" );
    }
    else if ( value < ( *treePtr )->data )     /* smaller values go left */
        insertNode( &( ( *treePtr )->leftPtr ), value );
    else if ( value > ( *treePtr )->data )     /* larger values go right */
        insertNode( &( ( *treePtr )->rightPtr ), value );
    else
        printf( "dup\n" );                     /* duplicates are ignored */
}

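void inOrder( TreeNodePtr treePtr )
{
    if ( treePtr != NULL ) {
        inOrder( treePtr->leftPtr );
        printf( "%3d", treePtr->data );        /* visit the root in the middle */
        inOrder( treePtr->rightPtr );
    }
}

void preOrder( TreeNodePtr treePtr )
{
    if ( treePtr != NULL ) {
        printf( "%3d", treePtr->data );        /* visit the root first */
        preOrder( treePtr->leftPtr );
        preOrder( treePtr->rightPtr );
    }
}

void postOrder( TreeNodePtr treePtr )
{
    if ( treePtr != NULL ) {
        postOrder( treePtr->leftPtr );
        postOrder( treePtr->rightPtr );
        printf( "%3d", treePtr->data );        /* visit the root last */
    }
}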

Problem

Implement the function to find the largest and least elements in the binary tree:

void max(TreeNodePtr *treePtr, int *largest, int *least)

Binary tree

When the binary search tree is not balanced, search, insertion, or deletion can take O(n) steps.

• Search: O( n) steps in the worst case

• Insert: O(n) steps in the worst case

• Delete: O(n) steps in the worst case
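To see the O(n) worst case concretely, here is a C sketch with hypothetical function names: an iterative insertion that ignores duplicates, a search, and a height function. Inserting keys in increasing order produces a tree that is just a path, so its height equals the number of keys.

```c
#include <stdlib.h>

typedef struct treenode {
    struct treenode *leftPtr;
    int data;
    struct treenode *rightPtr;
} TreeNode;

/* Iterative insertion; duplicates are ignored.
   Returns 0 on success, -1 if malloc fails. */
int bst_insert(TreeNode **treePtr, int value)
{
    while (*treePtr != NULL) {
        if (value < (*treePtr)->data)
            treePtr = &(*treePtr)->leftPtr;
        else if (value > (*treePtr)->data)
            treePtr = &(*treePtr)->rightPtr;
        else
            return 0;                        /* duplicate: nothing to do */
    }
    *treePtr = malloc(sizeof **treePtr);
    if (*treePtr == NULL)
        return -1;
    (*treePtr)->data = value;
    (*treePtr)->leftPtr = (*treePtr)->rightPtr = NULL;
    return 0;
}

/* Search follows one root-to-leaf path, so it costs O(height). */
TreeNode *bst_search(TreeNode *treePtr, int value)
{
    while (treePtr != NULL && treePtr->data != value)
        treePtr = value < treePtr->data ? treePtr->leftPtr : treePtr->rightPtr;
    return treePtr;
}

/* Height counted in nodes; the empty tree has height 0. */
int bst_height(const TreeNode *treePtr)
{
    if (treePtr == NULL)
        return 0;
    int l = bst_height(treePtr->leftPtr);
    int r = bst_height(treePtr->rightPtr);
    return 1 + (l > r ? l : r);
}
```

Inserting 1, 2, …, 100 in that order gives height 100, so searching for 100 visits every node; inserting 4, 2, 6, 1, 3, 5, 7 instead gives a balanced tree of height 3.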

Maintaining Balance

• Binary search tree: height governed by

– Initial order

– Sequence of insertions/deletions

• Need a structure that tends to maintain balance. How?

– Grow in ‘width’ first, then height

– Accommodate horizontal growth

– More data at each level

– Nodes of two forms: one data member and two children (“two node”); two data members and three children (“three node”)

2-3 Tree

• Each node that is not a leaf has either 2 or 3 sons.

• Every path from the root to a leaf has the same length.

Depth and Size for 2-3 tree

• Let d be the depth of a 2-3 tree.

• The k-th level has between $2^k$ and $3^k$ nodes.

• Hence a 2-3 tree of depth d = Θ(log n) can hold n nodes at the leaf level.

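2-3 Tree Nodes

• A node with two sons is labeled S:L, and a node with three sons is labeled S:L:M, where S, L, M are the largest keys in its first, second, and third subtrees.

• The first subtree holds keys ≤ S, the second holds keys > S and ≤ L, and the third holds keys > L and ≤ M.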

Search in 2-3 Tree

Search(a, r){
    if (r only has leaf children) return r
    else {
        // S and L are the labels of r
        if (a <= S) then return Search(a, left_child)
        else if (a <= L) then return Search(a, mid_child)
        else return Search(a, right_child)
    }
}

Insertion(36)

[Figure: 2-3 tree with root 40:100, internal nodes 20:40 and 60:80:100, and leaves 20 40 60 80 100; inserting 36 turns the node 20:40 into 20:36:40]

Insertion(50)

[Figure: 50 becomes a fourth son of the node 60:80:100, which is then split into two nodes]

Insertion

insertion(a){
    use Search(a, root) to find the node r
    make a a son of r
    if (r has four sons)
        adjust the tree from r up to the root by Addson(r)
}

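Insertion and Splits on the Path to the Root

[Figure: a split starts at the node that received the new leaf and moves up the path toward the root, stopping at the first node that does not have four sons]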

Addson(v)

Addson(v){
    create a new node v'
    make the two rightmost sons of v sons of v'
    if (v has no father) {
        create a new root r
        make v and v' the left and right sons of r
    }
    else {
        make v' a son of father(v), immediately to the right of v
        if (father(v) has four sons) then Addson(father(v))
    }
}

Computational steps for insertion

• Assume the tree has n nodes on the leaf level.

• Insertion operates on the nodes along the path from the root to a leaf.

• The path from the root to a leaf has Θ(log n) nodes.

• The number of steps for insertion is O(log n).

Deletion(80)

[Figure: leaf 80 is removed from the node 60:80:100, which becomes 60:100; the node still has two sons, so no further adjustment is needed]

Deletion(4)

[Figure: 2-3 tree with root 5:9, internal nodes 3:5 and 7:9, bottom-level nodes 1:3, 4:5, 6:7, 8:9, and leaves 1 3 4 5 6 7 8 9. After leaf 4 is removed, node 4:5 has a single son 5; its brother 1:3 has only two sons, so 5 is given to it (making 1:3:5), the empty node is removed, and the same adjustment continues one level up]

Deletion

[Figure: merging starts at the level of the removed leaf's father and may continue up toward the root; successive panels mark the level where it stops]

Delete(r, a){
    remove the son of r with value a
    call RemoveSon(r) to recursively adjust the tree
    (roughly along the path from r to the root)
}

RemoveSon(r)

RemoveSon(r){
    if (r has one child) {
        if (r is the root) { make the child of r the new root; remove r; return }
        let r' be a brother of r
        if (r' has 3 sons) let r get a son from r'
        else {
            make the son of r a son of r'
            let f be the father of r
            remove r
            RemoveSon(f)
        }
    }
}


Problem: show how to delete 7 from the 2-3 tree below.

[Figure: 2-3 tree with root 5:9, internal nodes 3:5 and 7:9, bottom-level nodes 1:3, 4:5, 6:7, 8:9, and leaves 1 3 4 5 6 7 8 9]

Binomial Heaps

Operations

• Insert(H,x)

• Minimum(H)

• Extract-Min(H)

• Union(H1, H2)

• Decrease-Key(H,x, k)

• Delete(H,x)

• Binomial trees: $B_0$ has a single node; for k ≥ 1, $B_k$ consists of two $B_{k-1}$ trees linked so that the root of one is the leftmost child of the root of the other.

[Figure: binomial trees B1, B2, B3]

– Lemma 1. For the binomial tree $B_k$,

1. there are $2^k$ nodes,

2. the height of the tree is k,

3. there are exactly $\binom{k}{i}$ nodes at depth i for i = 0, 1, …, k,

4. the root has degree k, which is greater than that of any other node; moreover, if the children of the root are numbered from left to right by k−1, k−2, …, 0, child i is the root of a subtree $B_i$.

Proof:

1) By induction, $2^{k-1} + 2^{k-1} = 2^k$.

2) By induction, 1 + (k−1) = k.

3) Let D(k, i) be the number of nodes at depth i of $B_k$. By induction, $D(k,i) = D(k-1,i) + D(k-1,i-1) = \binom{k-1}{i} + \binom{k-1}{i-1} = \binom{k}{i}$.

Corollary: The maximum degree of any node in an n-node binomial tree is lg n.

• Binomial heaps: H is a set of binomial trees satisfying the following:

1. Each binomial tree in H is heap-ordered: the key of a node is greater than or equal to the key of its parent.

2. There is at most one binomial tree in H whose root has a given degree.

By 2, an n-node binomial heap H consists of at most ⌊lg n⌋ + 1 binomial trees.

Representation of binomial heaps

[Figure: (a) a binomial heap H, accessed through head[H]; (b) its representation, where each node stores fields such as degree and sibling]

– Operations on binomial heaps

• Creating a new binomial heap

• Finding the minimum key

• Uniting 2 binomial heaps

Make-Binomial-Heap() allocates an object H with head[H] = NIL. Time: Θ(1).

Binomial-Heap-Minimum(H) {
    y = NIL
    x = head[H]
    min = ∞
    while (x != NIL) {
        if (key[x] < min) {
            min = key[x]
            y = x
        }
        x = sibling[x]
    }
    return y
}

Time: O(lg n)

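• Binomial-Heap-Merge: merges the root lists of H1 and H2 into a single linked list sorted by increasing degree.

Binomial-Link(y, z) {      // y, z: roots of B_{k-1} trees; z becomes the root of a B_k tree
    p[y] = z
    sibling[y] = child[z]
    child[z] = y
    degree[z] = degree[z] + 1
}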

[Figure: execution of Binomial-Heap-Union on two heaps. (a) head[H1] and head[H2]; (b) the output of Binomial-Heap-Merge, a root list sorted by degree; later panels show x and next-x moving along the root list as Cases 3, 2, 4, 3 and 1 are applied in turn]

Binomial-Heap-Union(H1, H2) {
    H = Make-Binomial-Heap()
    head[H] = Binomial-Heap-Merge(H1, H2)
    free the objects H1 and H2 but not the lists they point to
    if (head[H] == NIL) return H
    prev-x = NIL
    x = head[H]
    next-x = sibling[x]
    while (next-x != NIL) {
        if ((degree[x] != degree[next-x]) or
            (sibling[next-x] != NIL and degree[sibling[next-x]] == degree[x])) {
            prev-x = x                            // Cases 1 and 2
            x = next-x
        }
        else if (key[x] <= key[next-x]) {
            sibling[x] = sibling[next-x]          // Case 3
            Binomial-Link(next-x, x)
        }
        else {
            if (prev-x == NIL) head[H] = next-x   // Case 4
            else sibling[prev-x] = next-x
            Binomial-Link(x, next-x)
            x = next-x
        }
        next-x = sibling[x]
    }
    return H
}

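[Figure: the four cases of Binomial-Heap-Union on roots a, b, c, d with pointers prev-x, x, next-x and sibling[next-x]. (a) Case 1: degree[x] ≠ degree[next-x]; the pointers move one position right. (b) Case 2: degree[x] = degree[next-x] = degree[sibling[next-x]]; the pointers move one position right. (c) Case 3: degree[x] = degree[next-x] ≠ degree[sibling[next-x]] and key[x] ≤ key[next-x]; next-x is linked under x. (d) Case 4: as in Case 3 but key[x] > key[next-x]; x is linked under next-x]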

• Insert a node

Binomial-Heap-Insert(H, x) {
    H' = Make-Binomial-Heap()
    p[x] = NIL
    child[x] = NIL
    sibling[x] = NIL
    degree[x] = 0
    head[H'] = x
    H = Binomial-Heap-Union(H, H')
}

• Extracting the node with minimum key

Binomial-Heap-Extract-Min(H) {
    find the root x with the minimum key in the root list of H, and remove x from the root list of H
    H' = Make-Binomial-Heap()
    reverse the order of the linked list of x's children, and set head[H'] to point to the head of the resulting list
    H = Binomial-Heap-Union(H, H')
    return x
}

[Figure: Binomial-Heap-Extract-Min. (a) heap H; (b) the root x with the minimum key is removed from the root list; (c) the reversed list of x's children forms heap H'; (d) the heap obtained by uniting H and H']

Decreasing a key

Binomial-Heap-Decrease-Key(H, x, k) {
    if (k > key[x]) error "new key is greater than current key"
    key[x] = k
    y = x
    z = p[y]
    while (z != NIL and key[y] < key[z]) {
        exchange key[y] and key[z] (and their other fields)
        y = z
        z = p[y]
    }
}

• Deleting a key

Binomial-Heap-Delete(H, x) {
    Binomial-Heap-Decrease-Key(H, x, −∞)
    Binomial-Heap-Extract-Min(H)
}

[Figure: Binomial-Heap-Decrease-Key. (a)–(c) the decreased key moves up toward the root, exchanging with its parent's key at each step]

Minimum Spanning Trees

(Greedy Algorithms)

• Graph: a set of nodes V and a set of edges E ⊆ V × V

Example: V = {v1, v2, v3, v4}, E = {(v1, v2), (v2, v3), (v3, v4), (v2, v4)}

Path

• Graph G = (V, E)

• A path is a series of edges linked one by one.

• Loop: a path that starts and ends at the same node.

Tree

• A graph is connected if every two nodes are connected by a path.

• A tree is a connected graph without loops.

Connected Graph Tree

• Every connected graph can be converted into tree by removing some edges

Removing one edge on a loop does not damage the connectivity.

A tree is a minimal connected graph

• Removing any edge on a tree damages the connectivity

Proof. Let T=(V,E) be a tree, and let T’=(V, E-{(v1,v2)}) be obtained by removing the edge (v1,v2) from T.

If T’ is still connected, then T’ contains a path from v1 to v2. Together with the edge (v1,v2), this path forms a loop in T. Contradiction!

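Number of edges in a tree

• Each tree with at least two nodes has a node with only one edge.

Proof. Start from any node and build a path, never revisiting a node. Because the tree is finite, the path must stop at a node with only one edge. Otherwise, the path could return to a node already on it, which would form a loop.

• Each tree of n nodes has n-1 edges.

Proof. By induction. It is true for n=1,2.

Assume it is true for n nodes.

For n+1 nodes, find a node with only one edge, and remove it together with that edge. The remaining graph is a tree with n nodes, so by the inductive assumption it has n-1 edges. Hence the original tree has n edges.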

Unique path on tree

• Every two nodes in a tree are joined by a unique path.

Proof. If there were two different paths between the same two nodes, they would form a loop.

Weighted Graph

Many graph problems have weighted edges.

All weights are positive here.

G=(V,E)

Minimum Spanning Trees (MST)

Find the lowest-cost way to connect all of the points (the cost is the sum of the weights of the selected edges).

The solution must be a tree. (Why?)

A spanning tree is a subgraph that is a tree and connects all of the points.

A graph may contain an exponential number of spanning trees (e.g., a complete graph on n nodes has $n^{n-2}$ spanning trees).

A High-Level Greedy Algorithm for MST

The algorithm grows an MST one edge at a time and maintains the invariant that A is always a subset of some MST.

An edge is safe if it can be added to A without destroying the invariant.

How do we check that T is a spanning tree of G? How do we select a safe edge?

A = ∅;
while (T=(V,A) is not a spanning tree of G) {
    select a safe edge e for A;
    A = A ∪ {e};
}

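MST Basic Lemma

Let $V = V_1 \cup V_2$, where $V_1$ and $V_2$ have no intersection, and let

$(V_1,V_2) = \{uv \mid u \in V_1,\ v \in V_2\}$.

If $xy \in (V_1,V_2)$ and $w(xy) = \min\{w(uv) \mid uv \in (V_1,V_2)\}$, then xy is contained in some MST.

• So the edge xy with the minimum weight connecting V1 and V2 is a safe extension toward an MST.

• Proof idea: take an MST T. If T does not contain xy, add xy to T. This creates a loop, and the loop must contain another edge uv connecting V1 and V2. Replacing uv by xy gives a spanning tree whose weight is at most that of T, since w(xy) ≤ w(uv). That tree is therefore also an MST, and it contains xy.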

Kruskal’s Algorithm (pseudo code 1)

A = ∅;
for (each edge in order by nondecreasing weight)
    if (adding the edge to A doesn't create a cycle) {
        add it to A;
        if (|A| == n-1) break;
    }

How do we check that adding an edge does not create a cycle?

Kruskal’s Algorithm (Example 1/3)

MST cost = 17

Kruskal’s Algorithm (pseudo code 2)

A = ∅;
initial(n);   // for each node x construct a set {x}
for (each edge xy in order by nondecreasing weight)
    if (!find(x, y)) { union(x, y); add xy to A; if (|A| == n-1) break; }

find(x, y) = true iff x and y are in the same set.
union(x, y): unite the two sets that contain x and y, respectively.
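Pseudo code 2 can be sketched in C++ as follows. The slides do not fix how `initial`, `find` and `union` are implemented. This sketch assumes a disjoint-set forest with path compression and union by size. The names `Edge`, `DisjointSets` and `kruskal` are hypothetical, and nodes are numbered 0..n-1.

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

struct Edge { int u, v, w; };

// Disjoint-set forest (path compression + union by size).
struct DisjointSets {
    std::vector<int> parent, size;
    explicit DisjointSets(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);   // initial(n): each node x forms {x}
    }
    int root(int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }
    bool find(int x, int y) { return root(x) == root(y); }   // same set?
    void unite(int x, int y) {                                // union(x, y)
        int a = root(x), b = root(y);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
    }
};

// Returns the edges of A; fewer than n-1 edges means the graph is not connected.
std::vector<Edge> kruskal(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    DisjointSets ds(n);
    std::vector<Edge> A;
    for (const Edge& e : edges) {
        if (!ds.find(e.u, e.v)) {
            ds.unite(e.u, e.v);
            A.push_back(e);
            if (static_cast<int>(A.size()) == n - 1) break;
        }
    }
    return A;
}
```

Sorting costs O(m log m). The union-find operations are nearly linear, so the sort dominates.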

Prim’s Algorithm (pseudo code 1)

ALGORITHM Prim(G)
// Input: a weighted connected graph G=(V,E)
// Output: an MST T=(V, A)
V_T ← {v0}        // any vertex will do
A ← ∅
for i ← 1 to |V|-1 do
    find an edge xy ∈ (V_T, V-V_T) whose weight is minimum among all edges in (V_T, V-V_T)
    V_T ← V_T ∪ {y}
    A ← A ∪ {xy}

Prim’s Algorithm (Example 1/8)

Build a priority queue Q for V with key[u] = ∞ for every u ∈ V;
key[v0] = 0; π[v0] = NIL;   // any vertex will do
while (Q ≠ ∅) {
    u = Extract-Min(Q);
    for (each v ∈ Adj(u))
        if (v ∈ Q && w(u, v) < key[v]) {
            π[v] = u;
            key[v] = w(u, v);
            Change-Priority(Q, v, key[v]);
        }
}
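The priority-queue version can be sketched in C++. One substitution is needed: `std::priority_queue` has no Change-Priority operation. This sketch therefore uses the common "lazy" variant, which pushes a new (key, vertex) entry whenever key[v] drops and skips entries for vertices already in the tree. The graph is assumed connected and undirected. The adjacency-list format and the name `prim` are hypothetical.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (v, w(u,v)) pairs. Returns π (parent[]) of an MST rooted at vertex 0
// and stores the MST weight in *total.
std::vector<int> prim(const std::vector<std::vector<std::pair<int, int>>>& adj,
                      long long* total) {
    const int n = static_cast<int>(adj.size());
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> key(n, INF), parent(n, -1);
    std::vector<bool> inTree(n, false);
    using Item = std::pair<int, int>;                    // (key, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;
    key[0] = 0;
    q.push({0, 0});
    *total = 0;
    while (!q.empty()) {
        auto [k, u] = q.top();                           // Extract-Min
        q.pop();
        if (inTree[u]) continue;                         // outdated entry
        inTree[u] = true;
        *total += k;
        for (auto [v, w] : adj[u])
            if (!inTree[v] && w < key[v]) {
                key[v] = w;
                parent[v] = u;
                q.push({w, v});                          // stands in for Change-Priority
            }
    }
    return parent;
}
```

Each edge adds at most one entry to the queue, so the lazy variant still runs in O(m log n).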

Minimum Spanning Tree (analysis)

Let n = |V(G)|, m = |E(G)|. Execution time of Kruskal’s algorithm (sort the edges, then use union-find operations):

O(m log m) = O(m log n), since m ≤ n^2 implies log m ≤ 2 log n.

Running time of Prim’s algorithm with adjacency lists and a (binary or ordinary) heap:

O((m+n) log n) = O(m log n), since a connected graph has m ≥ n-1.

Find the minimum spanning tree with Prim’s algorithm. Start from the node a. Show the steps.

Problem 1

1. Give asymptotic upper and lower bounds for T(n) in each of the following recurrences. Assume that T(n) is constant for n ≤ 2. Make your bounds as tight as possible, and justify your answers.

• a) T(n)=8T(n/2)+

• b) T(n)=2T(n/4)+

• c) T(n)=T(n-1)+

• d) T(n)=T( )+1

Problem 1. a)

Upper bound

(by Simplified Master Theorem )

Lower Bound

(by the recursion)

$O(n^5)$

Problem 1. b)

Upper bound

(by Simplified Master Theorem Case 2)

Lower Bound

(by the recursion tree analysis)

$O(\sqrt{n}\log n)$

$\Omega(\sqrt{n}\log n)$

Problem 1. c)

Upper bound

Lower Bound

$O(n^4)$

Problem 1. c)

We have

)1(...21

)1()2()3(

)1()2(

Problem 1.d)

• Upper bound

• Lower Bound

$O(\log\log n)$

$\Omega(\log\log n)$

Problem 1. d)

Upper bound

(by Simplified Master Theorem Case 3)

Lower Bound

(by the recursion tree)

$O(n^2)$

Problem 2

2. Let A[0...n-1] be an array of n distinct integers. A pair (A[i], A[j]) is said to be an inversion if these numbers are out of order, i.e., i<j but A[i]>A[j]. Design an O(n log n) time algorithm for counting the number of inversions.

Solution

• Revise merge sort.

• When merging two sorted sub-arrays, compare the two front elements:
  a) if the front left element is less than or equal to the front right element, remove the front left element;
  b) otherwise (the front left element is larger), increase the counter by the number of elements remaining in the left half, and remove the front right element.
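The revised merge sort can be sketched in C++. The key observation is in case b): when the front right element is smaller than the front left element, it is also smaller than every element still waiting in the left half. Each of those elements forms an inversion with it. The function names `sortCount` and `countInversions` are hypothetical.

```cpp
#include <vector>

// Sorts a[lo..hi) and returns the number of inversions inside it; tmp has a.size() slots.
long long sortCount(std::vector<int>& a, std::vector<int>& tmp, int lo, int hi) {
    if (hi - lo < 2) return 0;
    int mid = lo + (hi - lo) / 2;
    long long count = sortCount(a, tmp, lo, mid) + sortCount(a, tmp, mid, hi);
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (a[i] <= a[j]) {
            tmp[k++] = a[i++];            // a) no inversion involving a[i] here
        } else {
            count += mid - i;             // b) a[j] is smaller than all mid-i remaining left elements
            tmp[k++] = a[j++];
        }
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = tmp[k];
    return count;
}

long long countInversions(std::vector<int> a) {
    std::vector<int> tmp(a.size());
    return sortCount(a, tmp, 0, static_cast<int>(a.size()));
}
```

The extra work per merge step is O(1), so the running time stays O(n log n).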

Problem 3: Bubble, Merge, and Heap Sorting

a) int bubblesort(int *a, int size).

b) int mergesort(int *a, int size)

c) int generate(int *a, int size)

d) Test both sorting functions with 10, 100, 1000, 10000, 100000, 1,000,000, and 4,000,000 integers.

Merge

void merge(long int *a, long int lo, long int m, long int hi)
{
    long int i, j, k;
    // copy the first half a[lo..m] to the global auxiliary array b
    i = 0; j = lo;
    while (j <= m)
        b[i++] = a[j++];
    // copy back the next-smallest element each time
    i = 0; k = lo;
    while (k < j && j <= hi)
        if (b[i] <= a[j])
            a[k++] = b[i++];
        else
            a[k++] = a[j++];
    // copy back the remaining elements of the first half (if any)
    while (k < j)
        a[k++] = b[i++];
}

Mergesort

void mergesort(long int *a, long int lo, long int hi)
{
    if (lo < hi) {
        long int m = (lo + hi) / 2;
        mergesort(&a[0], lo, m);
        mergesort(&a[0], m + 1, hi);
        merge(&a[0], lo, m, hi);
    }
}

The beginning of the program

#include <stdio.h>    // printf, scanf
#include <stdlib.h>   // rand
#include <time.h>     // time, difftime
#define ARRAYSIZE 100000

long int array[ARRAYSIZE];
long int b[ARRAYSIZE];   // auxiliary array used by merge()

void merge(long int a[], long int lo, long int m, long int hi);
void mergesort(long int a[], long int lo, long int hi);
void swap(long int *element1Ptr, long int *element2Ptr);
void bubbleSort(long int *array, const long int size);

Main

int main(void)
{
    time_t t1, t2;
    long int option, i;
    printf("Enter 1 for merge sort or 2 for bubble sort\n");
    scanf("%ld", &option);                 /* %ld matches long int */
    for (i = 0; i < ARRAYSIZE; i++)
        array[i] = rand();                 /* load random values */
    if (option == 1) {
        t1 = time(NULL);
        mergesort(&array[0], 0, ARRAYSIZE - 1);
        t2 = time(NULL);
    } else {
        t1 = time(NULL);
        bubbleSort(&array[0], ARRAYSIZE);
        t2 = time(NULL);
    }
    /* time() has one-second resolution */
    printf("Elapsed time: %.0f seconds\n", difftime(t2, t1));
    return 0;
}

Homework 2

• The knapsack problem is that given a set of positive integers {a1,…, an}, and a knapsack of size s, find a subset A of {a1,…, an} such that the sum of elements in A is the largest, but at most s.

• Part 1. Use the dynamic programming method to design the algorithm for the knapsack problem. Prove the correctness of your algorithm. Show the computational time of your algorithm carefully.

Homework 2

Part 2. Use C++ to implement the function below:

int knapsack(int *a,              // the input integers
             int n,               // the number of input integers
             int s,               // knapsack size
             int *subset,         // subset elements
             int &size_of_subset  // the number of items in the subset
            )

Test your program on the following knapsack problem. Input list: 5, 23, 27, 37, 48, 51, 63, 67, 71, 75, 79, 83, 89, 91, 101, 112, 121, 132, 137, 141, 143, 147, 153, 159, 171, 181, 190, 191 with knapsack size 595. Print out the subset and the sum of its elements.

Also print out your source code.

Single-Source Shortest Paths

Shortest-paths with Source s (Example)

[Figure: original graph G and its shortest paths from source s]

Shortest-path problem

• Find the shortest path in a graph.

• G=(V,E) is a weighted directed graph.

• The weight function w: E → R assigns a weight to each edge.

• p=(v0,v1,…,vk) is a path from v0 to vk.

Shortest-path problem

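• Define the weight of a path p=(v0,v1,…,vk):

$$w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)$$

• Define the shortest distance from node u to node v:

$$\delta(u,v) = \begin{cases} \min\{w(p) : p \text{ is a path from } u \text{ to } v\} & \text{if a path from } u \text{ to } v \text{ exists},\\ \infty & \text{otherwise.} \end{cases}$$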

Shortest-path tree rooted at s

• For a graph G=(V,E), its shortest-path tree rooted at s is G’=(V’,E’), which satisfies:
– V’ is the set of all nodes reachable from s.
– G’ is a tree with s as its root.
– In G’, the path from s to v is a shortest path from s to v in G.

Shortest-path tree rooted at s (Example)

[Figure: original graph G and its shortest-path tree rooted at s]

Predecessor graph

• For a graph G=(V,E), follow the table π to build Gπ=(Vπ,Eπ), which satisfies:
– π[s]=NIL, and s ∈ Vπ.
– If π[v]≠NIL, then (π[v],v) ∈ Eπ and v ∈ Vπ.

• The shortest-path tree rooted at s is an example of a predecessor graph.

Predecessor graph Example

[Figure: original graph G and its shortest-path tree rooted at s]

v:     s    t  u    v  w  x  y  z
π[v]:  NIL  s  NIL  t  s  t  x  v

Initialize-Single-Source Algorithm

• Define d[v] to be the shortest distance from s to v found so far (an estimate that the algorithms improve step by step).

• Let π[v] be the node before v on the shortest path found so far from s to v.

• Initially, d[v]=∞ and π[v]=NIL for every v, and d[s]=0. Except for the shortest path from s to s, everything else is unknown.

Initialize-Single-Source(G,s)
{ for each vertex v ∈ V[G]
     do d[v] ← ∞
        π[v] ← NIL
  d[s] ← 0
}

Relaxation Algorithm

• Use the edge (u,v) to improve the currently known shortest path to v.

Relax(u,v,w)
{ if d[v] > d[u]+w(u,v)
     then d[v] ← d[u]+w(u,v)
          π[v] ← u
}

Relaxation Example

[Figure: an edge (u,v) before and after Relax(u,v,w)]

• If w(u,v)=2 (<3): renew the s ⇝ v shortest-path estimate d[v] and set π[v] ← u.

• If w(u,v)=4 (>3): do not update the s ⇝ v shortest distance.

Shortest Path and Relaxation

• Triangle inequality: for every edge (u,v), δ(s,v) ≤ δ(s,u)+w(u,v).

• Upper-bound property: δ(s,v) ≤ d[v] always holds, i.e., d[v] is always an upper bound on the shortest distance from s to v. Once d[v]=δ(s,v), relaxation never changes d[v] again.

Shortest Path and Relaxation

• No-path property: if there is no path from s to v, then d[v]=δ(s,v)=∞ always.

• Convergence property: if a shortest path from s to v ends with the edge (u,v) and d[u]=δ(s,u) before Relax(u,v,w) is called, then afterward d[v]=δ(s,v).

Shortest Path and Relaxation

• Path-relaxation property: if p=(v0,v1,…,vk) is a shortest path from s=v0 to vk, then executing Relax(v0,v1,w), Relax(v1,v2,w), …, Relax(vk-1,vk,w) in this order (other relaxations may be interleaved) achieves d[vk]=δ(s,vk).

• Predecessor-graph property: once a series of relaxations has made d[v]=δ(s,v) for every node v, the predecessor graph Gπ is a shortest-path tree rooted at s.

Bellman-Ford Algorithm

• It computes shortest paths in a graph without a negative loop (a cycle of negative total weight) reachable from s, and it returns false if such a loop exists.

Bellman-Ford(G,w,s)
{ Initialize-Single-Source(G,s)
  for i ← 1 to |V|-1
     do for each edge (u,v) ∈ E
           do Relax(u,v,w)
  for each edge (u,v) ∈ E
     do if d[v] > d[u]+w(u,v)
           then return false     // negative loop
  return true                    // success
}

Bellman-Ford Algorithm Example

[Figure: Bellman-Ford example]

Bellman-Ford Algorithm Analysis

• Correctness: each pass relaxes every edge. By the path-relaxation property, after pass i every vertex whose shortest path uses at most i edges has d[v]=δ(s,v). A shortest simple path has at most |V|-1 edges, so after |V|-1 passes, d[v]=δ(s,v) for every destination v.

• Time complexity:
– Initialize-Single-Source takes O(|V|) steps.
– Each of the |V|-1 passes relaxes all |E| edges, for O(|E||V|) steps in total.
– Finally, checking for a negative loop takes O(|E|) steps.

• Total time: O(|V||E|).
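Bellman-Ford, together with Initialize-Single-Source and Relax, can be sketched in C++ as follows. This is a minimal sketch. Vertices are numbered 0..n-1, the graph is given as an edge list, and -1 plays the role of NIL. The names `WEdge` and `bellmanFord` are hypothetical. The extra `d[e.u] != INF` test keeps ∞ + w from overflowing, which plain pseudocode does not have to worry about.

```cpp
#include <limits>
#include <vector>

struct WEdge { int u, v; long long w; };

// Returns false if a negative loop is reachable from s; otherwise d[] and pi[] hold
// shortest distances and predecessors (INF and -1 for unreachable vertices).
bool bellmanFord(int n, const std::vector<WEdge>& edges, int s,
                 std::vector<long long>& d, std::vector<int>& pi) {
    const long long INF = std::numeric_limits<long long>::max();
    d.assign(n, INF);                      // Initialize-Single-Source
    pi.assign(n, -1);
    d[s] = 0;
    for (int i = 1; i <= n - 1; ++i)       // |V|-1 passes
        for (const WEdge& e : edges)       // Relax(u, v, w)
            if (d[e.u] != INF && d[e.v] > d[e.u] + e.w) {
                d[e.v] = d[e.u] + e.w;
                pi[e.v] = e.u;
            }
    for (const WEdge& e : edges)           // negative-loop check
        if (d[e.u] != INF && d[e.v] > d[e.u] + e.w) return false;
    return true;
}
```

The two nested loops make the O(|V||E|) bound visible directly.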

Dijkstra Algorithm

• It handles only graphs without negative-weight edges.

• It is faster than the Bellman-Ford algorithm because it chooses the order in which to do relaxation.

• It uses a priority queue.

• Main idea: use the convergence property.

Dijkstra Algorithm

Q: priority queue with d as the key

Dijkstra(G,w,s)
{ Initialize-Single-Source(G,s)
  Q ← V[G]
  while Q is not empty
     do u ← Extract-Min(Q)
        for each v ∈ adj[u]
           do Relax(u,v,w)     // a decrease of d[v] also decreases v's key in Q
}

Dijkstra Algorithm Example

[Figure: Dijkstra example]

Dijkstra Algorithm Analysis

• Different priority queues give different costs:

• Linear array: O(|V|^2) steps.

• Binary heap: O(|E| log |V|) steps.

• Fibonacci heap: O(|E| + |V| log |V|) steps.
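A binary-heap version of Dijkstra can be sketched with `std::priority_queue`. That container has no Decrease-Key operation, so the sketch swaps in lazy deletion: every successful Relax pushes a new entry, and outdated entries are skipped when popped. This keeps the binary-heap bound, O((|V|+|E|) log |V|). The adjacency-list format and the name `dijkstra` are assumptions, and all weights must be nonnegative.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (v, w(u,v)) pairs with w >= 0. Returns d[]; unreachable vertices keep INF.
std::vector<long long> dijkstra(
        const std::vector<std::vector<std::pair<int, long long>>>& adj, int s) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> d(adj.size(), INF);        // Initialize-Single-Source
    using Item = std::pair<long long, int>;           // (d[v], v)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;
    d[s] = 0;
    q.push({0, s});
    while (!q.empty()) {
        auto [du, u] = q.top();                       // Extract-Min
        q.pop();
        if (du != d[u]) continue;                     // outdated entry
        for (auto [v, w] : adj[u])
            if (d[v] > du + w) {                      // Relax(u, v, w)
                d[v] = du + w;
                q.push({d[v], v});                    // stands in for Decrease-Key
            }
    }
    return d;
}
```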

Single-source shortest paths in DAGs

• Unlike Bellman-Ford, it does the relaxations in a fixed order (topological order), so it finds the shortest paths in less time.

DAG-Shortest-Path(G,w,s)
{ topologically sort V[G]
  Initialize-Single-Source(G,s)
  for each u taken in topological order
     do for each v ∈ adj[u]
           do Relax(u,v,w)
}

• Costs O(|V|+|E|) steps.
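DAG-Shortest-Path can be sketched in C++ as follows. The slides do not say how the topological sort is done. This sketch assumes Kahn's algorithm (repeatedly taking vertices of in-degree 0); a DFS-based sort would work equally well. Negative weights are fine as long as the graph is acyclic. The name `dagShortestPaths` and the adjacency-list format are hypothetical.

```cpp
#include <limits>
#include <utility>
#include <vector>

// adj[u] holds (v, w(u,v)) pairs; the graph must be a DAG. Returns d[] (INF if unreachable).
std::vector<long long> dagShortestPaths(
        const std::vector<std::vector<std::pair<int, long long>>>& adj, int s) {
    const int n = static_cast<int>(adj.size());
    // Topologically sort V[G] with Kahn's algorithm.
    std::vector<int> indeg(n, 0), order;
    for (int u = 0; u < n; ++u)
        for (const auto& edge : adj[u]) ++indeg[edge.first];
    for (int u = 0; u < n; ++u)
        if (indeg[u] == 0) order.push_back(u);
    for (std::size_t i = 0; i < order.size(); ++i)
        for (const auto& edge : adj[order[i]])
            if (--indeg[edge.first] == 0) order.push_back(edge.first);
    // Initialize-Single-Source, then relax edges in topological order.
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> d(n, INF);
    d[s] = 0;
    for (int u : order)
        if (d[u] != INF)
            for (auto [v, w] : adj[u])
                if (d[v] > d[u] + w) d[v] = d[u] + w;   // Relax(u, v, w)
    return d;
}
```

Every vertex and edge is handled a constant number of times, which matches the O(|V|+|E|) bound.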

DAG-Shortest-Path Example

[Figure: DAG-Shortest-Path example, panels (a) to (g), showing the d values after each vertex is processed in topological order; initially d[s]=0 and all other d values are ∞]

Problem: apply Dijkstra’s algorithm to find the shortest paths from s to all nodes. Show how d[v] changes at every v.

Bipartite Matching

Lecture 3: Jan 17

Bipartite Matching

A graph is bipartite if its vertex set can be partitioned

into two subsets A and B so that each edge has one

endpoint in A and the other endpoint in B.

A matching M is a subset of edges such that every vertex has degree at most one in M.

The bipartite matching problem:

Find a matching with the maximum number of edges.

Maximum Matching

A perfect matching is a matching in which every vertex is matched.

The perfect matching problem: Is there a perfect matching?

• Greedy method?

(add an edge with both endpoints unmatched)

First Try

Key Questions

• How to tell if a graph does not have a (perfect) matching?

• How to determine the size of a maximum matching?

• How to find a maximum matching efficiently?

Hall’s Theorem [1935]:

A bipartite graph G=(A,B;E) has a matching that “saturates” A

if and only if |N(S)| >= |S| for every subset S of A.

Existence of Perfect Matching

König [1931]:

In a bipartite graph, the size of a maximum matching

is equal to the size of a minimum vertex cover.

What is a good upper bound on the size of a maximum matching?

Min-max theorem NP and co-NP

Implies Hall’s theorem.

Bound for Maximum Matching

König [1931]:

Any idea to find a larger matching?

Algorithmic Idea?

Given a matching M, an M-alternating path is a path that alternates

between edges in M and edges not in M. An M-alternating path

whose endpoints are unmatched by M is an M-augmenting path.

Augmenting Path

What if there is no more M-augmenting path?

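Prove the contrapositive:

If a bigger matching M’ exists, then an M-augmenting path exists.

1. Consider the symmetric difference $M \triangle M'$ (the edges in exactly one of M and M’).

2. Every vertex in $M \triangle M'$ has degree at most 2.

3. Each component of $M \triangle M'$ is an even cycle or a path whose edges alternate between M and M’.

4. Since $|M'| > |M|$, some component has more edges of M’ than of M. That component must be a path that starts and ends with edges of M’, which is an M-augmenting path!

If there is no M-augmenting path, then M is maximum!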

Optimality Condition

Algorithm

Key: M is maximum ⇔ there is no M-augmenting path

How do we find an M-augmenting path efficiently?

Finding M-augmenting paths

• Orient the edges (edges in M go up, the others go down).

• An M-augmenting path ⇔ a directed path between two unmatched vertices.

Complexity

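• At most n iterations (each augmentation increases |M| by one), where n is the number of vertices and m is the number of edges.

• An augmenting path can be found in O(m) time by a DFS or a BFS.

• Total running time: O(nm).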
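The augmenting-path algorithm can be sketched in C++. This version, often called Kuhn's algorithm, searches with DFS from one unmatched vertex of A at a time. It follows a non-matching edge from A to B and then, if that vertex of B is matched, the matching edge back to A. When the search reaches an unmatched vertex of B, the edges along the path are flipped. Trying each vertex of A once is enough; this is a known property of this scheme. The representation (left vertices 0..|A|-1, right vertices 0..|B|-1) and the names are hypothetical.

```cpp
#include <vector>

// adj[a] lists the right-side neighbours of left vertex a.
struct BipartiteMatcher {
    const std::vector<std::vector<int>>& adj;
    std::vector<int> matchA, matchB;   // partner of each vertex, or -1 if unmatched
    std::vector<bool> seen;            // right vertices visited in the current search

    BipartiteMatcher(const std::vector<std::vector<int>>& g, int nB)
        : adj(g), matchA(g.size(), -1), matchB(nB, -1) {}

    // DFS for an M-augmenting path that starts at left vertex a.
    bool augment(int a) {
        for (int b : adj[a]) {
            if (seen[b]) continue;
            seen[b] = true;
            if (matchB[b] == -1 || augment(matchB[b])) {
                matchA[a] = b;           // flip the edges along the path
                matchB[b] = a;
                return true;
            }
        }
        return false;
    }

    int maximumMatching() {
        int size = 0;
        for (int a = 0; a < static_cast<int>(adj.size()); ++a) {
            seen.assign(matchB.size(), false);
            if (augment(a)) ++size;      // each success grows |M| by one
        }
        return size;
    }
};
```

Each search is one DFS, O(m), and there are at most n searches, which gives the O(nm) total above.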


König [1931]:

Idea: consider why the algorithm got stuck…

Minimum Vertex Cover

Observation: Many short and disjoint augmenting paths.

Idea: Find augmenting paths simultaneously in one search.

Faster Algorithms

• Matching

• Determinants

• Randomized algorithms

Bonus problem 1 (50%):

Given a bipartite graph with red and blue edges,

find a deterministic polynomial time algorithm to determine

if there is a perfect matching with exactly k red edges.

Randomized Algorithm

Application of Bipartite Matching

[Figure: assignment example with the people Darek, Tom, Isaac and the jobs Marking, Tutorials, Solutions, Newsgroup]

Job Assignment Problem:

Each person is willing to do a subset of jobs.

Can you find an assignment so that all jobs are taken care of?

With Hall’s theorem, now you can determine exactly

when a partial chessboard can be filled with dominos.

Latin Square: an n×n square; the goal is to fill the square

with numbers from 1 to n so that:

• Each row contains every number from 1 to n.

• Each column contains every number from 1 to n.

Now suppose you are given a partial Latin Square.

Can you always extend it to a Latin Square?

With Hall’s theorem, you can prove that the answer is yes.

Homework 2

• Problem 1. Bitonic Euclidean Traveling Salesman problem.

Problem 1

• Define C(i, j): the minimal cost of a tour going from i to 1 (to the leftmost point) and from 1 to j (to the rightmost point).

• Identify the recursion for C(i,j).

• Sort the points by x-coordinate as 1,…,n.

Recursion

• Case i > j+1: $C(i,j) = dist(i, i-1) + C(i-1, j)$

• Case j > i+1: $C(i,j) = dist(j-1, j) + C(i, j-1)$

• Case j = i+1: $C(i,j) = \min\{dist(j, i-1) + C(i, i-1),\ dist(i, i-1) + C(i-1, j)\}$

• Case j = i: $C(i,i) = \min\{dist(i, i-1) + C(i-1, i),\ dist(i, i-1) + C(i, i-1)\}$

• Case i = j+1: $C(i, i-1) = \min\{dist(i, i-2) + C(i-2, i-1),\ dist(i-1, i-2) + C(i, i-2)\}$

• Each C(i,j) needs to deal with O(1) cases.

• Output C(n,n).

• Total time is $O(n^2)$.
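A runnable O(n^2) sketch of the bitonic tour follows. It does not transcribe the slide's C(i,j) cases. Instead it swaps in a standard textbook formulation: b[i][j] with i < j is the shortest bitonic path that starts at point i, goes left to point 0, and then goes right to point j, visiting every point 0..j exactly once. Points are assumed to have distinct x-coordinates, and n ≥ 2. The name `bitonicTour` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Length of the shortest closed bitonic tour through all points (n >= 2, distinct x).
double bitonicTour(std::vector<std::pair<double, double>> p) {
    std::sort(p.begin(), p.end());                   // sort by x-coordinate
    const int n = static_cast<int>(p.size());
    auto dist = [&](int i, int j) {
        return std::hypot(p[i].first - p[j].first, p[i].second - p[j].second);
    };
    std::vector<std::vector<double>> b(n, std::vector<double>(n, 0.0));
    b[0][1] = dist(0, 1);
    for (int j = 2; j < n; ++j) {
        for (int i = 0; i < j - 1; ++i)              // j's neighbour must be j-1
            b[i][j] = b[i][j - 1] + dist(j - 1, j);
        b[j - 1][j] = std::numeric_limits<double>::infinity();
        for (int k = 0; k < j - 1; ++k)              // j's neighbour is some k < j-1
            b[j - 1][j] = std::min(b[j - 1][j], b[k][j - 1] + dist(k, j));
    }
    return b[n - 2][n - 1] + dist(n - 2, n - 1);     // close the tour
}
```

Only the entries b[j-1][j] need an O(n) minimum, so the table is filled in O(n^2) time overall.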

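Problem 2

• Printing Neatly problem.

• The extra space at the end of a line that contains word i, word i+1, …, word j is $M - j + i - \sum_{k=i}^{j} l_k$, where M is the line length and $l_k$ is the length of word k.

• Minimize the sum of the cubes of the extra spaces over all lines except the last.

Problem 2

• Define the cost of a line printing word i, word i+1, …, word j as the cube of its extra space:

$$line(i,j) = \left(M - j + i - \sum_{k=i}^{j} l_k\right)^3$$

• Define C(k) to be the cost of printing word k, word k+1, …, word n.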

Recursion

• If word k, word k+1, …, word n fit into one line, then C(k)=0 (it is the last line).

• Otherwise, let h be the largest index such that word k, …, word h fit into one line:

$$C(k) = \min_{k \le g \le h} \{ line(k,g) + C(g+1) \}$$

• Each C(k) takes O(n) time.

• Total time is $O(n^2)$.
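The C(k) recursion can be sketched in C++ by filling C from the last word backward. The sketch makes a few assumptions. Indices are 0-based, words on a line are separated by single spaces, and every single word fits on a line (l_k ≤ M). The name `printNeatly` is hypothetical. The running total `used` equals the number of characters on the line, so `M - used` is exactly the extra space M - g + k - Σ l.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// len[k] is the length of word k (each assumed <= M). Returns C(0): the minimum sum of
// cubed extra spaces over all lines except the last.
long long printNeatly(const std::vector<int>& len, int M) {
    const int n = static_cast<int>(len.size());
    std::vector<long long> C(n + 1, 0);                // C[n] = 0: no words left
    for (int k = n - 1; k >= 0; --k) {
        long long best = std::numeric_limits<long long>::max();
        int used = -1;                                 // words plus single spaces on the line
        for (int g = k; g < n; ++g) {                  // put words k..g on one line
            used += len[g] + 1;
            if (used > M) break;                       // g went past h
            if (g == n - 1) { best = 0; break; }       // the rest fits: it is the last line
            long long extra = M - used;                // M - g + k - sum of len[k..g]
            best = std::min(best, extra * extra * extra + C[g + 1]);
        }
        C[k] = best;
    }
    return C[0];
}
```

For example, with M = 5 and three words of length 3, each word needs its own line. The first two lines each leave 2 spaces, so the cost is 2^3 + 2^3 = 16.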

Problem: Find an augmenting path to improve the red matching

Midterm

• >=90: 2

• 80-89: 3

• 70-79: 4

• 60-69: 5

• <60: 2

Problem 1

Solve the following recursive equations with big-O notation:

T(n)=T(n-2)+n^3, with T(1)=1.

T(n)=16T(n/2)+n^2, with T(1)=1.

$T(n) = aT(n/b) + cn^r$

Problem 1

a) T(n)=T(n-1)+n^3, with T(1)=1.

Solution: T(n)=O(n^4)

b) T(n)=16T(n/2)+n^2, with T(1)=1.

Solution: T(n)=O(n^4)

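Problem 2

• Delete 7 from the tree below.

[Figure: deleting 7 from a tree whose root is 5:9, with internal nodes 3:5 and 7:9, then 1:3, 4:5, 6:7, 8:9, over the leaves 1 3 4 5 6 7 8 9; after the deletion the leaves are 1 3 4 5 6 8 9 and the bottom internal level becomes 1:3, 4:5, 6:8:9]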

Problem 3 The following is a heap. a) Show the steps to insert a new element 1. b) Show the steps to remove the root after 1 is inserted.

[Figure: the heap (keys 2, 4, 6, 7, 8, 11), followed by step-by-step diagrams for Heap Insertion of 1 and Heap Deletion of the root]
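The two operations in Problem 3 can be sketched in C++ with the usual array layout of a binary heap. This is a sketch under the assumption that the heap is a min-heap: inserting 1 and then removing the root suggests the smallest key sits at the top. The struct name `MinHeap` is hypothetical. Insertion appends at the end and sifts up; deletion moves the last element to the root and sifts down.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Array-based binary min-heap: the parent of i is (i-1)/2, its children are 2i+1 and 2i+2.
struct MinHeap {
    std::vector<int> a;

    void insert(int x) {                          // add at the end, then sift up
        a.push_back(x);
        std::size_t i = a.size() - 1;
        while (i > 0 && a[(i - 1) / 2] > a[i]) {
            std::swap(a[(i - 1) / 2], a[i]);
            i = (i - 1) / 2;
        }
    }

    int removeRoot() {                            // assumes the heap is not empty
        int root = a[0];
        a[0] = a.back();                          // move the last element to the root
        a.pop_back();
        std::size_t i = 0;
        while (true) {                            // sift down
            std::size_t l = 2 * i + 1, r = l + 1, m = i;
            if (l < a.size() && a[l] < a[m]) m = l;
            if (r < a.size() && a[r] < a[m]) m = r;
            if (m == i) break;
            std::swap(a[i], a[m]);
            i = m;
        }
        return root;
    }
};
```

Both operations walk one root-to-leaf path, so each takes O(log n) time.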

Problem 5

Apply Prim’s algorithm to find the minimum spanning tree. Show each of your steps.

[Figure: weighted graph with edge weights 7, 10, 2, 5, 5, 9, 7, 1, 3, 5, 2, 4]

Problem 5

5. (20%) Find an O(n log n) time algorithm that, given two sets of integers A and B, determines whether B is a subset of A, where n=max(|A|,|B|) is the larger of the sizes of A and B.

For example, if A={3, 7, 5} and B={3, 5}, then the algorithm returns “yes”; and if A={3, 7, 5} and B={2, 5}, then the algorithm returns “no”.
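One possible answer to Problem 5, sketched in C++: sort A once in O(n log n), then binary-search every element of B in O(log n) each. The function name `isSubset` is hypothetical, and plain `std::vector<int>` is assumed as the set representation.

```cpp
#include <algorithm>
#include <vector>

// Returns true iff every element of B occurs in A.
bool isSubset(std::vector<int> A, const std::vector<int>& B) {
    std::sort(A.begin(), A.end());                           // O(n log n)
    for (int b : B)
        if (!std::binary_search(A.begin(), A.end(), b))      // O(log n) each
            return false;
    return true;
}
```

The total time is O(n log n) + O(n log n) = O(n log n).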

Problem 6

6. (20%) This is a job scheduling problem with one machine. Each job has a specific time interval in which it must be executed by the machine. All the jobs assigned to the machine must have disjoint time intervals. For example, the list of input jobs has time intervals [1, 3], [2, 6], [5, 9], [7, 13], [11, 15]. There is an overlap between [1, 3] and [2, 6]; therefore, [1, 3] and [2, 6] cannot both be assigned to the machine. Three jobs can be assigned to the machine without overlap: [1, 3], [5, 9], and [11, 15] (all intervals are disjoint).

Develop an algorithm for this scheduling problem that assigns the maximum number of jobs to the machine. Show the time complexity of your algorithm. Hint: you may use a greedy method to solve this problem.
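One greedy method that fits the hint is sketched below in C++. Choosing earliest finishing time as the greedy rule is an assumption, since the problem does not name one. The sketch sorts the jobs by end time and keeps every job that starts after the last chosen job ends. Intervals are treated as closed, so [a, b] and [c, d] overlap when c ≤ b. The name `scheduleJobs` is hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Each job is (start, end). Returns a maximum set of pairwise disjoint jobs.
std::vector<std::pair<int, int>> scheduleJobs(std::vector<std::pair<int, int>> jobs) {
    std::sort(jobs.begin(), jobs.end(),
              [](const auto& x, const auto& y) { return x.second < y.second; });
    std::vector<std::pair<int, int>> chosen;
    for (const auto& job : jobs)
        if (chosen.empty() || job.first > chosen.back().second)
            chosen.push_back(job);     // disjoint from every job chosen so far
    return chosen;
}
```

Sorting dominates, so the running time is O(n log n). On the example input it picks [1, 3], [5, 9], and [11, 15].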

Improve Midterm by 20 points

Rewrite the solution for problem 6, and implement the algorithm with C++.

Submit your solution with test results.


Problem 6

Design an algorithm to test if an undirected graph is connected. A graph is connected if there exists a path between every two vertices. For example, the left graph is connected, but the right graph is not.

Example

• Vertices a,b,c,d are reachable, but e is not.

Problem 6 Solution

• Assign weight one to each edge.

• Apply a minimum spanning tree algorithm such as Kruskal’s; on a disconnected graph it returns a spanning forest.

• The graph is connected iff the spanning tree found has n-1 edges, where n is the number of nodes.
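The solution can be sketched in C++. When every edge has weight one, the edge order does not matter, so Kruskal’s algorithm reduces to adding each edge that joins two different components. The sketch counts those edges with a small union-find structure (an implementation assumption) and compares the count with n-1. It assumes n ≥ 1. The name `isConnected` is hypothetical. A plain DFS or BFS from one vertex would answer the same question.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Nodes are 0..n-1 (n >= 1); edges are undirected pairs.
bool isConnected(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);       // each node is its own component
    auto root = [&](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];            // path halving
            x = parent[x];
        }
        return x;
    };
    int treeEdges = 0;
    for (auto [u, v] : edges) {
        int ru = root(u), rv = root(v);
        if (ru != rv) { parent[ru] = rv; ++treeEdges; }   // edge joins two components
    }
    return treeEdges == n - 1;
}
```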

Problem 7

a) Design an O(n log n) time algorithm that, given an array of n integers, finds two elements a and b with |a-b|<5.

b) Improve the algorithm to O(n) time if the n integers in the input are in the range from 1 to 7n.

Problem 7 Examples

[Figure: a connected graph and an unconnected graph]

Problem 7 Solution a)

• Apply merge sort: O(n log n) time.

• Check whether two neighbors in the sorted order differ by less than 5: O(n) time.

• Total time is O(n log n)+O(n)=O(n log n).

Problem 7 Solution b)

• Define an array int a[7n+1] with all entries 0 (indices 1 to 7n are used).

• Let a[k]=1 if k is in the list: O(n) time. If some k appears twice, the answer is already yes.

• Check whether there exist two 1s at distance less than 5 in the array a[ ]: O(n) time.
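Both parts of the Problem 7 solution can be sketched in C++. Part a) sorts and compares neighbors, because the closest pair of values is always adjacent after sorting. Part b) marks the values in an array indexed 1..7n and scans it once. Part b) adds one detail the slide leaves implicit: a repeated value is reported right away, since |a-b| = 0 < 5. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// a) O(n log n): sort, then compare neighbours.
bool closePairSort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] - v[i - 1] < 5) return true;
    return false;
}

// b) O(n): every value is assumed to lie in 1..7n.
bool closePairRange(const std::vector<int>& v) {
    const int n = static_cast<int>(v.size());
    std::vector<int> a(7 * n + 1, 0);
    for (int x : v) {
        if (a[x]) return true;                 // duplicate value: difference 0
        a[x] = 1;
    }
    int last = -5;                             // position of the previous 1 (none yet)
    for (int k = 1; k <= 7 * n; ++k)
        if (a[k]) {
            if (k - last < 5) return true;
            last = k;
        }
    return false;
}
```

The scan in b) touches 7n cells, which is still O(n).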

Problem 8

Suppose you have one machine and a set of n jobs a1, a2, …, an to process on that machine. Each job aj has a processing time tj, a profit pj, and a deadline dj. The machine can process only one job at a time, and job aj must run uninterruptedly for tj consecutive time units. If job aj is completed by its deadline dj, you receive a profit pj, but if it is completed after its deadline, you receive a profit 0. Give an algorithm to find the schedule that obtains the maximum amount of profit, assuming that all processing times are integers between 1 and n. What is the running time of your algorithm?

Problem 8 Solution

• Try the dynamic programming method.

• Improve your midterm by working on it again.

• Due March 31 (Tuesday)

NP-completeness

NP Problems

blind monkey

Hamiltonian Path Problem

• Given n cities

• Does there exist a path through each city exactly once?

[Figure: map of the cities ORD, PVD, MIA, DFW]

Hamiltonian Path

Hamiltonian path goes through each node exactly once

HAMPATH={G| G is a directed graph with a Hamiltonian path}

Polynomial

n: input size

$n^c$ is a polynomial of n, where c does not depend on n.

Examples: $n, n^2, n^3, \ldots, n^{100}, \ldots$

Class P

P is the complexity class consisting of all decision problems that have polynomial-time algorithms

Polynomial-Time Decision Problems

• Decision problems: the output is 1 or 0 (“yes” or “no”).

• Examples:

• Is a given circuit satisfiable?

• Does a text T contain a pattern P?

• Does an instance of 0/1 Knapsack have a solution with benefit at least K?

• Does a graph G have an MST with weight at most K?

The Complexity Class P

• A complexity class is a collection of languages

• P is the complexity class consisting of all decision problems that have polynomial-time algorithms

• For each problem L in P, there is a polynomial-time decision algorithm A for L:
– If n=|x|, then for x in L (the answer is “yes”), A runs in p(n) time on input x.
– The function p(n) is some polynomial.

Verifier

A verifier for a language L is an algorithm V such that

L={w | V accepts <w,c> for some string c}.

For the verifier V for L, c is a certificate of w if V accepts <w,c>

If the verifier V for the language L runs in polynomial time, V is the polynomial time verifier for L.

Verifier for Hamiltonian Path

For <G,s,t>, a certificate is a list of nodes of G: v1, v2, …, vm.

Verifier:

• Check if m is the number of nodes of G.

• Check if the nodes v1, …, vm are all different.

• Check if each (vi, vi+1) is a directed edge of G for i=1,…,m-1.

• If all checks pass, accept. Otherwise, reject.
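The verifier can be sketched in C++ to show that each check runs in polynomial time. The encoding is hypothetical: nodes are 0..n-1, edges are (from, to) pairs, and the certificate is a vector of nodes. The sketch also checks that v1 = s and vm = t. This is an assumption based on the usual definition of HAMPATH for <G,s,t>; the slide’s list does not state it.

```cpp
#include <set>
#include <utility>
#include <vector>

// Accepts iff cert is a Hamiltonian path of the directed graph from s to t.
bool verifyHamPath(int n, const std::vector<std::pair<int, int>>& edges,
                   int s, int t, const std::vector<int>& cert) {
    const int m = static_cast<int>(cert.size());
    if (m != n || m == 0) return false;                // m must be the number of nodes
    if (cert.front() != s || cert.back() != t) return false;
    std::vector<bool> used(n, false);
    for (int v : cert) {                               // all nodes are different
        if (v < 0 || v >= n || used[v]) return false;
        used[v] = true;
    }
    std::set<std::pair<int, int>> E(edges.begin(), edges.end());
    for (int i = 0; i + 1 < m; ++i)                    // each (v_i, v_{i+1}) is an edge
        if (!E.count({cert[i], cert[i + 1]})) return false;
    return true;
}
```

With a balanced-tree set the edge checks cost O((m + |E|) log |E|), which is polynomial in the input size.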

NP example (2)

• Problem: decide if a graph has a Hamiltonian tour with weight at most K.

• Verification algorithm: 1. Test that the tour contains all nodes.

2. Test that the tour has weight at most K.

• Analysis: verification takes O(n) time, so a non-deterministic algorithm can solve the problem in polynomial time.

• Think about it this way: if we are given such a tour, we can verify it.

Class NP

NP is the class of languages that have polynomial time verifiers.

Examples:

• HAMPATH is in NP

Clique Problem

Given undirected graph G, a clique is a set of nodes of G such that every two nodes are connected by an edge.

A k-clique is a clique with k nodes

[Figure: a 5-clique]

Clique Problem

CLIQUE={<G,k> | G is an undirected graph with a k-clique}

CLIQUE is in NP.

Subset Sum Problem

$$\text{SUBSET-SUM} = \{\langle S,t\rangle \mid S = \{x_1,\ldots,x_k\} \text{ and for some } \{y_1,\ldots,y_m\} \subseteq \{x_1,\ldots,x_k\}, \text{ we have } y_1 + y_2 + \cdots + y_m = t\}$$

Polynomial Time Computable

A function $f: \Sigma^* \to \Sigma^*$ is a polynomial time computable function if some polynomial time algorithm A exists that outputs f(w) for input w.

Polynomial Time Reduction

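Assume that A and B are two languages.

A is polynomial time mapping reducible to B, written $A \le_P B$, if a polynomial time computable function f exists such that

$$w \in A \iff f(w) \in B \quad \text{for every } w.$$

Transitivity

• If $A \le_P B$ and $B \le_P C$, then $A \le_P C$.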

Boolean Formula

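A literal is either a boolean variable or its negation: $x$ or $\neg x$.

A clause is the disjunction of several literals, e.g., $x_1 \lor x_2 \lor x_3 \lor x_4$.

Conjunctive normal form is the conjunction of several clauses, e.g., $(x_1 \lor x_2 \lor x_3 \lor x_4) \land (x_3 \lor x_5 \lor x_6) \land (x_3 \lor x_6)$.

A 3cnf-formula is a conjunctive normal form with at most 3 literals in each clause.

$3SAT = \{\langle\phi\rangle \mid \phi \text{ is a satisfiable 3cnf-formula}\}$

Example: $(x_1 \lor x_2 \lor x_3) \land (x_3 \lor x_5 \lor x_6) \land (x_3 \lor x_6)$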

3SAT to CLIQUE

Example:

)()()( 321321321 xxxxxxxxx

Outline

• P and NP
– Definition of P
– Definition of NP
– Alternate definition of NP

• NP-completeness
– Definition of NP-complete and NP-hard
– The Cook-Levin Theorem

More Outline

• Some NP-complete problems
– Problem reduction
– SAT (and CNF-SAT and 3SAT)
– Vertex Cover
– Clique
– Hamiltonian Cycle

What is a problem

• A language is a set of strings.

• A problem is a collection of instances.

• An instance can be coded into a string.

• A language = a problem.

• The size of the problem refers to the length of the string.

• An algorithm that solves a problem ⇔ a Turing machine that accepts a language.

Traveling Salesman Problem

• Given n cities

• Find a shortest path through each city exactly once.

[Figure: the cities ORD, PVD, MIA, DFW with the distances between them]

Running Time Revisited

• Input size, n

• All the polynomial-time algorithms studied so far in this course run in polynomial time using this definition of input size.


Problem

• Given the formula f=

• construct a graph G such that f is satisfiable iff G has a clique of size 3.

)()()( 432121 xxxxxx

An Interesting Problem

[Figure: a Boolean circuit built from logic gates, with its inputs and its output]

A Boolean circuit is a circuit of AND, OR, and NOT gates; the CIRCUIT-SAT problem is to determine if there is an assignment of 0’s and 1’s to a circuit’s inputs so that the circuit outputs 1.

CIRCUIT-SAT is in NP

[Figure: a Boolean circuit built from logic gates, with its inputs and its output]

Non-deterministically choose a set of inputs and the outcome of every gate, then test each gate’s I/O.

If there is an input assignment, we can verify that in polynomial time.

NP-Completeness

• Reduction: transfer a language to a subset of another language. P-reduction means that each string can be transferred in polynomial time.

• L is NP-complete if L is in NP and, for each language M in NP, we can take an input x for M and transform it in polynomial time to an input x’ for L such that x is in M if and only if x’ is in L.

• L is NP-hard if every language in NP can be reduced to L in this way. L itself need not be in NP, so an NP-hard problem is at least as hard as every NP-complete problem.

[Figure: every language in NP reduces to L in polynomial time]

Cook-Levin Theorem

• Cook’s Theorem: CIRCUIT-SAT is NP-complete.

– Proof: we already showed it is in NP.

– To prove it is NP-complete, we have to show that every language in NP can be reduced to it.

– Let M be in NP, and let x be an input for M.

– Let y be a certificate that allows us to verify membership in M in polynomial time, p(n), by some algorithm D.

– Let S be a circuit of size at most O(p(n)^2) that simulates a computer (details omitted…).

[Figure: M in NP reduces to CIRCUIT-SAT in polynomial time]

Cook-Levin Proof

[Figure: p(n) copies of the circuit S, one per step of D, each acting on fewer than p(n) cells of working storage; the last copy produces the 0/1 output from D]

We can build a circuit that simulates the verification of x’s membership in M using y.

Let W be the working storage for D (including registers, such as the program counter); let D be given in RAM “machine code.”

Simulate p(n) steps of D by replicating circuit S for each step of D. Only input: y.

The circuit is satisfiable if and only if x is accepted by D with some certificate y.

Total size is still polynomial: O(p(n)^3).

Some Thoughts about P and NP

• Belief: P is a proper subset of NP.

• Implication: the NP-complete problems are the hardest in NP.

• Why: Because if we could solve an NP-complete problem in polynomial time, we could solve every problem in NP in polynomial time.

• That is, if an NP-complete problem is solvable in polynomial time, then P=NP.

• Since so many people have attempted without success to find polynomial-time solutions to NP-complete problems, showing your problem is NP-complete is equivalent to showing that a lot of smart people have worked on your problem and found no polynomial-time algorithm.

[Figure: the region of NP where CIRCUIT-SAT and the other NP-complete problems live]

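Circuit Formula

[Figure: circuit]

$y_1 \land (y_1 \leftrightarrow (y_2 \land \neg x_2))$

$(y_2 \leftrightarrow (y_3 \lor y_4))$

$(y_3 \leftrightarrow (x_1 \rightarrow x_2))$

$(y_5 \leftrightarrow (y_6 \lor x_4))$

$(y_6 \leftrightarrow (\neg x_1 \leftrightarrow x_3))$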

• De Morgan’s law: $\neg(x \land y \land z) = \neg x \lor \neg y \lor \neg z$

Truth table for $f = y_1 \leftrightarrow (y_2 \land \neg x_2)$

y1  y2  x2 | f  ¬f
1   1   1  | 0  1
1   1   0  | 1  0
1   0   1  | 0  1
1   0   0  | 0  1
0   1   1  | 1  0
0   1   0  | 0  1
0   0   1  | 1  0
0   0   0  | 1  0

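Convert to CNF

Conversion: the rows with f = 0 give a DNF formula for ¬f:

$\neg f = (y_1 \land y_2 \land x_2) \lor (y_1 \land \neg y_2 \land x_2) \lor (y_1 \land \neg y_2 \land \neg x_2) \lor (\neg y_1 \land y_2 \land \neg x_2)$

so $f = \neg\big((y_1 \land y_2 \land x_2) \lor (y_1 \land \neg y_2 \land x_2) \lor (y_1 \land \neg y_2 \land \neg x_2) \lor (\neg y_1 \land y_2 \land \neg x_2)\big)$.

Applying De Morgan’s law gives the CNF:

$f = (\neg y_1 \lor \neg y_2 \lor \neg x_2) \land (\neg y_1 \lor y_2 \lor \neg x_2) \land (\neg y_1 \lor y_2 \lor x_2) \land (y_1 \lor \neg y_2 \lor x_2)$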
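The truth-table method above (one clause for each row where f is 0) can be sketched in Python. This is a minimal illustration: the function names and the (variable, positive) literal encoding are assumptions made for the example.

```python
from itertools import product

def truth_table_to_cnf(f, names):
    """Return CNF clauses equivalent to the Boolean function f.

    A literal is a pair (name, positive). Each row where f is 0 yields the
    clause that is false exactly on that row (De Morgan applied to the
    negated DNF term for that row)."""
    clauses = []
    for row in product([1, 0], repeat=len(names)):
        if not f(*row):
            # bit 1 -> negated literal, bit 0 -> positive literal
            clauses.append([(name, bit == 0) for name, bit in zip(names, row)])
    return clauses

def eval_cnf(clauses, assignment):
    """assignment maps each variable name to 0 or 1."""
    return all(any(bool(assignment[name]) == positive for name, positive in clause)
               for clause in clauses)

# the gate f = y1 <-> (y2 AND NOT x2)
f = lambda y1, y2, x2: bool(y1) == (bool(y2) and not x2)
cnf = truth_table_to_cnf(f, ["y1", "y2", "x2"])
```

The four falsifying rows (1,1,1), (1,0,1), (1,0,0), (0,1,0) give the four clauses of the CNF, in that order.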

• The SAT problem is still NP-complete even if the formula is a conjunction of disjuncts, that is, it is in conjunctive normal form (CNF).

• The SAT problem is still NP-complete even if it is in CNF and every clause has just 3 literals (a variable or its negation):
– (a+b+¬d)(¬a+¬c+e)(¬b+d+e)(a+¬c+¬e)

• This restricted problem (3SAT) is shown NP-complete by reduction from SAT.

Problem

• Given the formula f below, construct a graph G such that f is satisfiable iff G has a clique of size 3.

)()()( 432121 xxxxxx

Showing NP-Completeness

[Figure: graph G whose vertices are labeled with the literal occurrences of f]
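A sketch of the standard formula-to-graph construction for this exercise: one vertex per literal occurrence, and an edge between two occurrences in different clauses unless they are complementary. The formula is then satisfiable iff the graph has a clique whose size equals the number of clauses. The encoding (3 means x3, -3 means ¬x3) and all function names are assumptions. The brute-force checkers are only for tiny examples.

```python
from itertools import combinations, product

def cnf_to_clique_graph(clauses):
    """One vertex per literal occurrence: (clause index, position, literal)."""
    vertices = [(i, p, lit) for i, clause in enumerate(clauses)
                for p, lit in enumerate(clause)]
    edges = set()
    for u, v in combinations(vertices, 2):
        # join literals from different clauses unless they are complementary
        if u[0] != v[0] and u[2] != -v[2]:
            edges.add(frozenset((u, v)))
    return vertices, edges

def has_clique(vertices, edges, k):
    """Brute force over all k-subsets of vertices."""
    return any(all(frozenset((u, v)) in edges for u, v in combinations(group, 2))
               for group in combinations(vertices, k))

def satisfiable(clauses):
    """Brute force over all assignments."""
    n = max(abs(lit) for clause in clauses for lit in clause)
    return any(all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
                   for clause in clauses)
               for bits in product([False, True], repeat=n))
```

Picking one true literal per clause gives pairwise-adjacent vertices. Conversely, a clique cannot contain both x and ¬x, so its literals can all be set to 1 together.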

Problem Reduction

• A language M is polynomial-time reducible to a language L if an instance x for M can be transformed in polynomial time to an instance x’ for L such that x is in M if and only if x’ is in L.
– Denote this by $M \le_P L$.

• A problem (language) L is NP-hard if every problem in NP is polynomial-time reducible to L (another way to define NP-hard).

• A problem (language) is NP-complete if it is in NP and it is NP-hard.

• CIRCUIT-SAT is NP-complete:
– CIRCUIT-SAT is in NP.
– For every M in NP, $M \le_P$ CIRCUIT-SAT.

Problem Reduction

• A general problem M is polynomial-time reducible to a general problem L if an instance x of problem M can be transformed in polynomial time to an instance x’ of problem L such that the solution to x is yes if and only if the solution to x’ is yes.
– Denote this by $M \le_P L$.

• A problem (language) L is NP-hard if every problem in NP is polynomial-time reducible to L.

• A problem (language) is NP-complete if it is in NP and it is NP-hard.

• CIRCUIT-SAT is NP-complete:
– CIRCUIT-SAT is in NP.
– For every M in NP, $M \le_P$ CIRCUIT-SAT.

Transitivity of Reducibility

• If $A \le_P B$ and $B \le_P C$, then $A \le_P C$.

– An input x for A can be converted to x’ for B, such that x is in A if and only if x’ is in B. Likewise, for B to C.

– Convert x’ into x’’ for C such that x’ is in B iff x’’ is in C.

– Hence, if x is in A, x’ is in B, and x’’ is in C.

– Likewise, if x’’ is in C, x’ is in B, and x is in A.

– Thus, $A \le_P C$, since polynomials are closed under composition: x’ has length polynomial in |x|, so the second conversion also runs in time polynomial in |x|.

• Types of reductions:
– Local replacement: Show $A \le_P B$ by dividing an input to A into components and showing how each component can be converted to a component for B.
– Component design: Show $A \le_P B$ by building special components for an input of B that enforce properties needed for A, such as “choice” or “evaluate.”

CNF-SAT

• A Boolean formula is a formula where the variables and operations are Boolean (0/1):
– (a+b+¬d+e)(¬a+¬c)(¬b+c+d+e)(a+¬c+¬e)
– OR: +, AND: (times), NOT: ¬

• SAT: Given a Boolean formula S, is S satisfiable, that is, can we assign 0’s and 1’s to the variables so that S is 1 (“true”)?
– It is easy to see that CNF-SAT is in NP:

• Non-deterministically choose an assignment of 0’s and 1’s to the variables and then evaluate each clause. If they are all 1 (“true”), then the formula is satisfiable.
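The deterministic half of this argument is a certificate check. Given the guessed assignment, we evaluate every clause in time linear in the formula size. Below is a minimal sketch. The integer literal encoding and the function name are assumptions.

```python
def verify_cnf(clauses, assignment):
    """Polynomial-time verifier for CNF-SAT.

    clauses: list of clauses, each a list of nonzero ints (3 means x3,
    -3 means NOT x3); assignment: dict variable -> 0 or 1 (the certificate).
    Runs in time linear in the total number of literals."""
    for clause in clauses:
        if not any(assignment[abs(lit)] == (1 if lit > 0 else 0) for lit in clause):
            return False  # this clause evaluates to 0
    return True

# (a+b+¬d+e)(¬a+¬c)(¬b+c+d+e)(a+¬c+¬e) with a, b, c, d, e numbered 1..5
formula = [[1, 2, -4, 5], [-1, -3], [-2, 3, 4, 5], [1, -3, -5]]
```

For example, a = 1 and b = c = d = e = 0 satisfies every clause of the slide’s formula.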

CNF-SAT is NP-complete

• Reduce CIRCUIT-SAT to CNF-SAT.

– Given a Boolean circuit, make a variable for every input and gate.

– Create a sub-formula for each gate, characterizing its effect. Form the formula as the output variable AND-ed with all these sub-formulas:

• Example: m((a+b)↔e)(c↔¬f)(d↔¬g)(e↔¬h)(ef↔i)(m ↔kn)…Inputs:


The formula is satisfiable if and only if the Boolean circuit is satisfiable.


Vertex Cover

• A vertex cover of graph G=(V,E) is a subset W of V, such that, for every edge (a,b) in E, a is in W or b is in W.

• VERTEX-COVER: Given a graph G and an integer K, does G have a vertex cover of size at most K?

• VERTEX-COVER is in NP: Non-deterministically choose a subset W of size K and check that every edge is covered by W.

Vertex-Cover is NP-complete

Reduce 3SAT to VERTEX-COVER.

Let S be a Boolean formula in CNF with each clause having 3 literals. For each variable x, create a node for x and a node for ¬x, and connect these two (truth-setting component).

For each clause Ci = (a+b+c), create a triangle and connect the three nodes (clause-satisfying component).

[Figure: a truth-setting component and a clause-satisfying component]

Vertex-Cover is NP-complete

Completing the construction

Connect each literal in a clause triangle to its copy in a variable pair. E.g., for a clause Ci = (¬x+y+z), connect the three triangle nodes to ¬x, y, and z.

Let n = # of variables, let m = # of clauses, and set K = n + 2m. G has 3m + 2n vertices.

[Figure: the triangle for Ci joined to the variable pairs x, ¬x and z, ¬z]

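Vertex-Cover is NP-complete

Example: (a+b+c)(¬a+b+¬c)(¬b+¬c+¬d). Here n = 4 and m = 3, so the graph has a vertex cover of size K = 4 + 2·3 = 10 iff the formula is satisfiable.

[Figure: the graph for this example, with variable pairs a, ¬a; b, ¬b; c, ¬c; d, ¬d]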
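The construction can be built and checked mechanically on the example above. This sketch assumes integer literals (a, b, c, d numbered 1 to 4, negative for ¬). The function names are hypothetical. The cover checker is brute force and only suitable for a graph this small.

```python
from itertools import combinations

def sat3_to_vertex_cover(clauses, n_vars):
    """Return (vertices, edges, K) for a 3CNF formula, with K = n + 2m."""
    vertices, edges = [], []
    # truth-setting component: nodes x and NOT x joined by an edge
    for v in range(1, n_vars + 1):
        vertices += [("lit", v), ("lit", -v)]
        edges.append((("lit", v), ("lit", -v)))
    # clause-satisfying component: one triangle per clause
    for i, clause in enumerate(clauses):
        tri = [("clause", i, p) for p in range(3)]
        vertices += tri
        edges += [(tri[0], tri[1]), (tri[1], tri[2]), (tri[0], tri[2])]
        # connect each triangle node to its literal in a variable pair
        for p, lit in enumerate(clause):
            edges.append((tri[p], ("lit", lit)))
    return vertices, edges, n_vars + 2 * len(clauses)

def has_cover(vertices, edges, k):
    """Brute force: is there a vertex cover with exactly k vertices?"""
    for cand in combinations(vertices, k):
        chosen = set(cand)
        if all(u in chosen or v in chosen for u, v in edges):
            return True
    return False

example = [[1, 2, 3], [-1, 2, -3], [-2, -3, -4]]  # (a+b+c)(¬a+b+¬c)(¬b+¬c+¬d)
V, E, K = sat3_to_vertex_cover(example, 4)
```

The graph has 2·4 + 3·3 = 17 vertices. It has a cover of size K = 10, because the formula is satisfiable (a = b = 1, c = d = 0). No smaller cover exists, as the proof below shows.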

Proof: Vertex-Cover is NP-complete

• We need to prove the following two statements:
– Suppose there is an assignment of Boolean values that satisfies S; then we need to prove that G has a vertex cover of size K = n + 2m.
– Suppose the special graph has a cover of size at most K = n + 2m; then we need to prove that the Boolean expression is satisfiable.

Why? (satisfiable ⇒ cover)

• Suppose there is an assignment of Boolean values that satisfies S.
– Build a subset of vertices that contains each literal that is assigned 1 by the satisfying assignment (one vertex per truth-setting component, n in total).
– For each clause, the satisfying assignment must assign 1 to at least one of its literals (that literal may also occur in other clauses). Pick one such literal and include the other two vertices of the clause triangle in the vertex cover (two per clause, not shared with other clauses).
– The cover has size n + 2m (as required).

Is What We Described a Cover?

• Each edge in a truth-setting component (x, ¬x) is covered, since one of its endpoints is chosen.
• Each edge in a clause-satisfying component is covered:
– Two of the three triangle vertices are chosen, so all three triangle edges are covered, and so are two of the three edges joining the triangle to the variable pairs.
– The remaining edge (the one not covered by a vertex in the triangle) must be covered by a node in the cover C labeled with a literal, since the corresponding literal is 1 (by how we chose the vertices in the clause-satisfying components).
– (Choose two vertices from each clause triangle, and choose the vertex with value 1 in each truth-setting component.)

Why? (cover ⇒ satisfiable)

• Suppose there is a cover C with size at most n + 2m.
• For this special graph, any cover must contain at least one vertex from each truth-setting component and at least two from each clause-satisfying component, so its size is at least n + 2m. Hence C has exactly one vertex in each truth-setting component and exactly two in each triangle.
• So, for each clause-satisfying component, one of the edges joining it to a variable pair is not covered by a vertex in the triangle. This edge must be covered by its other endpoint, which is labeled with a literal.
• Assign 1 to each literal whose vertex is in C. The assignment is consistent, because C contains exactly one of x and ¬x for every variable x. By the previous step, every clause in S contains a literal set to 1; hence S is satisfied.

This completes the proof.
• Bottom line: S is satisfiable iff G has a vertex cover of size at most n + 2m.
• Bottom line 2: VERTEX-COVER is NP-complete.

Clique

• A clique of a graph G=(V,E) is a subgraph C that is fully-connected (every pair in C has an edge).

• CLIQUE: Given a graph G and an integer K, is there a clique in G of size at least K?

• CLIQUE is in NP: non-deterministically choose a subset C of size K and check that every pair in C has an edge in G.

[Figure: this graph has a clique of size 5]

CLIQUE is NP-Complete

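Reduction from VERTEX-COVER. A graph G with n vertices has a vertex cover of size K if and only if its complement has a clique of size n − K. This holds because W is a vertex cover of G exactly when no edge of G joins two vertices of V − W, that is, when V − W is a clique in the complement of G.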
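The equivalence behind this reduction can be checked exhaustively on a small graph. The sketch below represents edges as frozensets, and the 5-vertex graph is an arbitrary example (both are assumptions). It confirms that W is a vertex cover of G exactly when V − W is a clique in the complement.

```python
from itertools import combinations

def complement(vertices, edges):
    """Edge set of the complement graph (edges are frozensets {u, v})."""
    return {frozenset(p) for p in combinations(vertices, 2)} - edges

def is_vertex_cover(edges, W):
    return all(e & W for e in edges)  # every edge has an endpoint in W

def is_clique(edges, C):
    return all(frozenset(p) in edges for p in combinations(C, 2))

V = [1, 2, 3, 4, 5]
E = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)]}
E_bar = complement(V, E)
```

The reduction itself just outputs the complement graph with target size n − K, which takes O(n²) time.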

Some Other NP-Complete Problems

• SET-COVER: Given a collection of m sets, are there K of these sets whose union is the same as the union of the whole collection of m sets?
– NP-complete by reduction from VERTEX-COVER

• SUBSET-SUM: Given a set of integers and a distinguished integer K, is there a subset of the integers that sums to K?
– NP-complete by reduction from VERTEX-COVER

Some Other NP-Complete Problems

• 0/1 Knapsack: Given a collection of items with weights and benefits, is there a subset of weight at most W and benefit at least K?
– NP-complete by reduction from SUBSET-SUM

• Hamiltonian-Cycle: Given a graph G, is there a cycle in G that visits each vertex exactly once?
– NP-complete by reduction from VERTEX-COVER

• Traveling Salesperson Tour: Given a complete weighted graph G, is there a cycle that visits each vertex exactly once and has total cost at most K?
– NP-complete by reduction from Hamiltonian-Cycle.

Beyond NP

Outline and Reading

• Co-NP
– A language L is in Co-NP iff its complement $\overline{L}$ is in NP.
– Example: non-satisfiability, the language of all Boolean expressions that are not satisfiable.

• PSPACE
– A language is in PSPACE if there is a TM that accepts it using only polynomial space (in an offline machine).

Some facts

• Co-NP =? NP, P =? PSPACE
– We do not know.
• PSPACE = NPSPACE.
• P is a subset of Co-NP, since P = Co-P and P ⊆ NP.
• Other facts
– Co-NP ⊆ PSPACE ⊆ EXPTIME.
– The validity problem for propositional logic is Co-NP-complete.
– Determining whether a position in a generalized checkers game is a winning position for one of the players is PSPACE-complete.
– ML type checking is EXPTIME-complete.

Turing Machine

• Write on the tape and read from it
• Head can move left and right
• Tape is infinite
• Rejecting and accepting states

[Figure: a finite control with a read/write head on the tape a b a b …]

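Deterministic Turing Machine

A deterministic Turing machine is a 7-tuple $(Q, \Sigma, \Gamma, \delta, q_0, q_{accept}, q_{reject})$, where

1. Q is the finite set of states,

2. $\Sigma$ is the input alphabet not containing the special blank symbol $\sqcup$,

3. $\Gamma$ is the tape alphabet, where $\sqcup \in \Gamma$ and $\Sigma \subseteq \Gamma$,

4. $\delta: Q \times \Gamma \rightarrow Q \times \Gamma \times \{L, R\}$ is the transition function,

5. $q_0 \in Q$ is the start state,

6. $q_{accept} \in Q$ is the accept state,

7. $q_{reject} \in Q$ is the reject state, where $q_{reject} \neq q_{accept}$.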
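To make the definition concrete, here is a small simulator for a deterministic TM, with δ stored as a dictionary. Several details are assumptions and not part of the definition above: "_" stands for the blank, a missing transition is treated as rejecting, a left move on the first cell leaves the head in place, and the example machine (which accepts strings of 0s of even length) is hypothetical.

```python
def run_tm(delta, q0, q_accept, q_reject, w, blank="_", max_steps=10_000):
    """Simulate a deterministic TM. delta maps (state, symbol) to
    (new_state, written_symbol, 'L' or 'R')."""
    tape = dict(enumerate(w))      # sparse tape: cell index -> symbol
    state, head = q0, 0
    for _ in range(max_steps):
        if state == q_accept:
            return True
        if state == q_reject:
            return False
        symbol = tape.get(head, blank)
        if (state, symbol) not in delta:
            return False           # assumption: a missing transition rejects
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        # moving left from the first cell leaves the head where it is
        head = head + 1 if move == "R" else max(head - 1, 0)
    raise RuntimeError("step limit reached; M may run forever on this input")

# hypothetical machine: accept strings of 0s whose length is even
EVEN_ZEROS = {
    ("q0", "0"): ("q1", "0", "R"),
    ("q1", "0"): ("q0", "0", "R"),
    ("q0", "_"): ("qa", "_", "R"),
    ("q1", "_"): ("qr", "_", "R"),
}
```

For example, `run_tm(EVEN_ZEROS, "q0", "qa", "qr", "0000")` returns True and the input "000" is rejected. The step limit is there only because a real TM may run forever.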

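Nondeterministic Turing Machine

A nondeterministic Turing machine is a 7-tuple $(Q, \Sigma, \Gamma, \delta, q_0, q_{accept}, q_{reject})$ as above, except that the transition function is

$\delta: Q \times \Gamma \rightarrow \mathcal{P}(Q \times \Gamma \times \{L, R\})$,

where $\mathcal{P}$ denotes the power set. As before, $q_{accept} \in Q$ is the accept state.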

Configuration

• Current state: q7
• Current head position on the tape: 4th cell
• Current tape content: abab

[Figure: the tape a b a b … with the head on the 4th cell]

Configuration

A configuration is represented by $w_{left}\, q\, a\, w_{right}$, where
• $w_{left}$ is the left part of the tape content,
• $w_{right}$ is the right part of the tape content,
• a is the symbol at the head position,
• q is the current state.

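Configuration Transition

• If $\delta(q_i, b) = (q_j, c, L)$, the tape u a b v becomes u a c v, and
$u\,a\,q_i\,b\,v$ yields $u\,q_j\,a\,c\,v$.

• If $\delta(q_i, b) = (q_j, c, R)$, the tape u a b v becomes u a c v, and
$u\,a\,q_i\,b\,v$ yields $u\,a\,c\,q_j\,v$.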
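The two yield rules can be written as a function on configurations. In this sketch a configuration is a triple (u, q, v): u is the tape left of the head, q is the state, and v is the tape from the head rightward, so v[0] is the symbol being read. Symbols are one-character strings and "_" is the blank. These encodings and the left-end behavior are assumptions.

```python
def step(config, delta, blank="_"):
    """Return the configuration that (u, q, v) yields in one move.
    delta maps (q, b) -> (q_next, c, 'L' or 'R')."""
    u, q, v = config
    b, rest = (v[0], v[1:]) if v else (blank, "")
    q_next, c, move = delta[(q, b)]
    if move == "R":
        return (u + c, q_next, rest)               # u a q_i b v  yields  u a c q_j v
    if u:
        return (u[:-1], q_next, u[-1] + c + rest)  # u a q_i b v  yields  u q_j a c v
    return ("", q_next, c + rest)                  # at the left end the head stays put
```

With u = "u", a = "a", b = "b", v = "v": a left move turns ("ua", "qi", "bv") into ("u", "qj", "acv"), and a right move gives ("uac", "qj", "v").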

Configuration

• Start configuration: $q_0 w$, where w is the input.
• Accepting configuration: a configuration with state $q_{accept}$.
• Rejecting configuration: a configuration with state $q_{reject}$.

Accept Computation

A Turing machine M accepts input w if a sequence of configurations $C_1, C_2, \ldots, C_k$ exists where
1. $C_1$ is the start configuration of M on input w,
2. each $C_i$ yields $C_{i+1}$, and
3. $C_k$ is an accepting configuration.

Language recognized by TM

For a Turing machine M, L(M) denotes the set of all strings accepted by M.

A language is Turing recognizable if some Turing machine recognizes it.

Turing Recognizable

• Turing machine M recognizes language L:
– if $x \in L$, then M(x) accepts;
– if $x \notin L$, then M(x) rejects or runs forever.

Decidability

A language L is Turing decidable if there is a deterministic Turing machine M such that

• If x is in L, then M accepts x in a finite number of steps.

• If x is not in L, then M rejects x in a finite number of steps.

Example: {w#w| w is in {0,1}*} is Turing decidable

Turing Decidable

• Turing machine M decides language L:
– if $x \in L$, then M(x) accepts;
– if $x \notin L$, then M(x) rejects;
– M always stops.

Observation

If L is Turing decidable, then L is Turing recognizable

NP-completeness

A language B is NP-complete if

1. B is in NP, and

2. Every A in NP is polynomial time reducible to B

Theorem. If B is NP-complete and B is in P, then P=NP.

A Boolean formula is satisfiable if there exists an assignment to its variables that makes the formula true.

$\mathrm{SAT} = \{\langle\phi\rangle \mid \phi \text{ is a satisfiable Boolean formula}\}$

$(x_1 \lor \neg x_2 \lor \neg x_3) \land (x_3 \lor \neg x_5 \lor x_6) \land (x_3 \lor \neg x_6)$

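Cook-Levin Theorem

Theorem: SAT is NP-complete.

Proof.

1. SAT is in NP.

2. For every problem A in NP, $A \le_P \mathrm{SAT}$.

Given an NTM M that accepts A, build the formula $\phi = \phi_{cell} \land \phi_{start} \land \phi_{move} \land \phi_{accept}$, which expresses:

1. The start configuration is legal ($\phi_{start}$).

2. The final state is accept ($\phi_{accept}$).

3. The movement is legal ($\phi_{move}$).

4. Each cell takes one legal symbol ($\phi_{cell}$).

Notation:

1. $x_{i,j,s}$ is 1 if cell[i,j] holds symbol s, and 0 otherwise.

2. $n^k$ is the time bound for the NTM M, with constant k.

3. $C = Q \cup \Gamma \cup \{\#\}$ is the set of symbols a cell may hold.

4. $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{accept}, q_{reject})$ is the NTM accepting A.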


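Each cell has only one symbol

1. The symbol is selected from C: $\bigvee_{s \in C} x_{i,j,s}$

2. Only one symbol is selected: $\bigwedge_{s,t \in C,\ s \neq t} (\neg x_{i,j,s} \lor \neg x_{i,j,t})$

3. It is true for all cells in all configurations:

$\phi_{cell} = \bigwedge_{1 \le i,j \le n^k} \Big[ \Big(\bigvee_{s \in C} x_{i,j,s}\Big) \land \Big(\bigwedge_{s,t \in C,\ s \neq t} (\neg x_{i,j,s} \lor \neg x_{i,j,t})\Big) \Big]$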

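The start configuration is

$\#\ q_0\ w_1\ w_2 \ldots w_n\ \sqcup \ldots \sqcup\ \#$

$\phi_{start} = x_{1,1,\#} \land x_{1,2,q_0} \land x_{1,3,w_1} \land x_{1,4,w_2} \land \ldots \land x_{1,n+2,w_n} \land x_{1,n+3,\sqcup} \land \ldots \land x_{1,n^k-1,\sqcup} \land x_{1,n^k,\#}$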

An accepting configuration is reached.

$\phi_{accept} = \bigvee_{1 \le i,j \le n^k} x_{i,j,q_{accept}}$

It makes sure the accept state appears among the configurations of the computation.

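Characterize the legal move

The whole move is legal if all windows are legal:

$\phi_{move} = \bigwedge_{1 \le i < n^k,\ 1 < j < n^k} (\text{the } (i,j) \text{ window is legal})$

Characterize that one window is legal:

$\bigvee_{a_1, \ldots, a_6 \text{ is a legal window}} (x_{i,j-1,a_1} \land x_{i,j,a_2} \land x_{i,j+1,a_3} \land x_{i+1,j-1,a_4} \land x_{i+1,j,a_5} \land x_{i+1,j+1,a_6})$

The state transition

$\delta(q_1, b) = \{(q_2, c, L), (q_2, a, R)\}$

[Figure: examples of legal windows and illegal windows for this transition]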

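Boolean Formula

A literal is either a Boolean variable or its negation: $x$ or $\neg x$.

A clause is the disjunction of several literals: $x_1 \lor \neg x_2 \lor \neg x_3 \lor x_4$

Conjunctive normal form is the conjunction of several clauses: $(x_1 \lor \neg x_2 \lor \neg x_3 \lor x_4) \land (x_3 \lor \neg x_5 \lor x_6) \land (x_3 \lor \neg x_6)$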

Prepare for the Final

• Regular language and automata

• Context free language

• Decidability

• Undecidability

• Complexity theory

Regular Language

Concepts: Automata, regular expression

Skills: Design automata to accept a regular language

Prove that a language is not regular

$\{0^i 1^i \mid i \ge 0\}$

}0|1{ 23 ii

Context-free Language

Concepts: Context-free grammar, parsing tree

Skills: Design pushdown automata to accept a context-free language

Prove that a language is not context-free

$\{0^i 1^i 2^i \mid i \ge 0\}$

}0|10{ 2332 iii

Decidability

Concepts: Turing machine, algorithm, Church-Turing Thesis, Turing recognizable, Turing Decidable

Skills: Prove a language is decidable (design algorithm)

Prove a language is Turing recognizable

$\{0^i 1^i 2^i \mid i \ge 0\}$

$L = \{p(x_1, \ldots, x_n) \mid p \text{ is a polynomial that has an integer solution}\}$

$A_{TM} = \{\langle M, w\rangle \mid M \text{ accepts } w\}$

Undecidability

Concepts: Countable, Turing undecidable, reduction

Skills: Use the diagonal method to prove that $A_{TM}$ is undecidable

Use reduction to prove a language is undecidable, e.g., $A_{TM} \le_m \mathrm{HALT}_{TM}$

Complexity

Concepts: Time on Turing machine

PTIME(t(n))

NP-completeness

Polynomial time reduction

Polynomial time verifier

$P = \bigcup_k \mathrm{TIME}(n^k)$

$NP = \bigcup_k \mathrm{NTIME}(n^k)$

Complexity

Skill: Prove a problem is in P

Prove a problem is in NP

Use reduction to prove a problem is NP-complete.

Examples: SAT, 3SAT, CLIQUE, COMPOSITE

$\mathrm{SAT} \le_P \mathrm{3SAT}$

$\mathrm{3SAT} \le_P \mathrm{CLIQUE}$

• A:…

• B:…

• C: Miss exam or homework

SAT’

A conjunctive normal form is a conjunction of some clauses

$\mathrm{SAT'} = \{\langle\phi\rangle \mid \phi \text{ is a satisfiable conjunctive normal form}\}$

)()()(

6365315321

xxxxxxxxxx

Cook-Levin Theorem’

Theorem: SAT’ is NP-complete

Proof. Same as that for SAT is NP-complete

A 3-conjunctive normal form formula (3cnf-formula) is a CNF formula with at most 3 literals in each clause.

$\mathrm{3SAT} = \{\langle\phi\rangle \mid \phi \text{ is a satisfiable 3cnf-formula}\}$

$(x_1 \lor \neg x_2 \lor \neg x_3) \land (x_3 \lor \neg x_5 \lor x_6) \land (x_3 \lor \neg x_6)$

3SAT is NP-complete

Theorem: There is polynomial time reduction from SAT’ to 3SAT.

$(x_1 \lor \neg x_2 \lor \neg x_3) \land (x_3 \lor \neg x_5 \lor x_6) \land (x_3 \lor \neg x_6)$

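3SAT is NP-complete

$(a_1 \lor a_2 \lor a_3 \lor a_4)$ is satisfiable if and only if the following is satisfiable:

$(a_1 \lor a_2 \lor z) \land (\neg z \lor a_3 \lor a_4)$

3SAT is NP-complete

In general, $(a_1 \lor a_2 \lor \ldots \lor a_l)$ is replaced by

$(a_1 \lor a_2 \lor z_1) \land (\neg z_1 \lor a_3 \lor z_2) \land (\neg z_2 \lor a_4 \lor z_3) \land \ldots \land (\neg z_{l-3} \lor a_{l-1} \lor a_l)$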

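3SAT is NP-complete

Convert every clause $(a_1 \lor a_2 \lor \ldots \lor a_l)$ with $l > 3$ into the 3cnf $(a_1 \lor a_2 \lor z_1) \land (\neg z_1 \lor a_3 \lor z_2) \land \ldots \land (\neg z_{l-3} \lor a_{l-1} \lor a_l)$, using fresh variables $z_1, \ldots, z_{l-3}$ for each clause.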
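The clause-splitting step can be sketched directly. Each long clause becomes a chain of 3-literal clauses linked by fresh variables, and brute force confirms on small inputs that satisfiability is preserved. Literals are nonzero integers (negative means negated), and the helper names are assumptions.

```python
from itertools import product

def split_clause(clause, next_var):
    """Split one clause into 3-literal clauses using fresh variables
    next_var, next_var + 1, ...  Returns (clauses, next unused variable)."""
    if len(clause) <= 3:
        return [list(clause)], next_var
    z = next_var
    out = [[clause[0], clause[1], z]]           # (a1 + a2 + z1)
    for a in clause[2:-2]:
        out.append([-z, a, z + 1])              # (¬z_i + a_{i+2} + z_{i+1})
        z += 1
    out.append([-z, clause[-2], clause[-1]])    # (¬z_{l-3} + a_{l-1} + a_l)
    return out, z + 1

def to_3cnf(clauses, n_vars):
    """Return the converted formula and its total number of variables."""
    result, nxt = [], n_vars + 1
    for c in clauses:
        parts, nxt = split_clause(c, nxt)
        result += parts
    return result, nxt - 1

def satisfiable(clauses, n_vars):
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
               for bits in product([False, True], repeat=n_vars))
```

For example, `to_3cnf([[1, -2, -3, 4], [3, -5, 6], [3, -6]], 6)` turns the first clause into (x1 + ¬x2 + x7)(¬x7 + ¬x3 + x4) and leaves the short clauses alone. The output has size linear in the input, so the reduction runs in polynomial time.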

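3SAT is NP-complete

Given a formula in conjunctive normal form $f_1 \land f_2 \land \ldots \land f_k$, each clause $f_i$ is converted into a 3cnf-formula $g_i$ ($i = 1, 2, \ldots, k$) as above. Then $f_1 \land f_2 \land \ldots \land f_k$ is satisfiable if and only if $g_1 \land g_2 \land \ldots \land g_k$ is satisfiable.

Problem: Convert Circuit C to Formula f such that C is satisfiable iff f is satisfiable.

[Figure: circuit C]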

Approximation Algorithms

Outline and Reading

• Approximation Algorithms for NP-Complete Problems
– Approximation ratios
– Polynomial-Time Approximation Schemes
– 2-Approximation for Vertex Cover
– Approximate Scheme for Subset Sum
– 2-Approximation for TSP special case
– Log n-Approximation for Set Cover

Approximation Ratios

• Optimization Problems

– We have some problem instance x that has many feasible “solutions”.

– We are trying to minimize (or maximize) some cost function c(S) for a “solution” S to x. For example,

• Finding a minimum spanning tree of a graph

• Finding a smallest vertex cover of a graph

• Finding a smallest traveling salesperson tour in a graph

Approximation Ratios

• An approximation produces a solution T.
– T is a k-approximation to the optimal solution OPT if c(T)/c(OPT) ≤ k (assuming a minimization problem; for a maximization problem the ratio is reversed: c(OPT)/c(T) ≤ k).

Polynomial-Time Approximation Schemes

• A problem L has a polynomial-time approximation scheme (PTAS) if it has a polynomial-time (1+ε)-approximation algorithm, for any fixed ε > 0 (this value can appear in the running time).

• Subset Sum has a PTAS.

Vertex Cover

• A vertex cover of graph G=(V,E) is a subset W of V, such that, for every (a,b) in E, a is in W or b is in W.

• OPT-VERTEX-COVER: Given a graph G, find a vertex cover of G with smallest size.

• OPT-VERTEX-COVER is NP-hard.

A 2-Approximation for Vertex Cover

• Every chosen edge e has both ends in C.
• But e must be covered by an optimal cover; hence, one end of e must be in OPT.
• No two chosen edges share an endpoint (all edges incident to the endpoints of a chosen edge are removed from H), so OPT contains a different vertex for each chosen edge.
• Thus, there are at most twice as many vertices in C as in OPT.
• That is, C is a 2-approx. of OPT.
• Running time: O(m)

Algorithm VertexCoverApprox(G)
  Input graph G
  Output a vertex cover C for G
  C ← empty set
  H ← G
  while H has edges
    e ← H.removeEdge(H.anEdge())
    v ← H.origin(e)
    w ← H.destination(e)
    C.add(v)
    C.add(w)
    for each f incident to v or w
      H.removeEdge(f)
  return C
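A compact Python version of VertexCoverApprox, given as a sketch under one assumption: the graph is a plain list of edges. An edge is still in H exactly when neither endpoint has been added to C, so one pass over the edge list that takes both endpoints of every still-uncovered edge does the same work in O(m) time. The chosen edges form a maximal matching.

```python
def vertex_cover_approx(edges):
    """2-approximate vertex cover of a graph given as a list of (u, v) edges."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            # (u, v) is still in H: take both endpoints, which implicitly
            # removes every edge incident to u or v
            cover.add(u)
            cover.add(v)
    return cover
```

On the path 1-2-3-4 this returns {1, 2, 3, 4} while the optimum {2, 3} has size 2, so the factor of 2 can actually occur.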

Subset Sum

Given a set {x1,x2,…,xn} of integers and an integer t, find {y1,y2,…,yk}, a subset of {x1,x2,…,xn}, such that y1+y2+…+yk = t.

Approximate Solution for Subset Sum

• Find a subset {y1,y2,…,yk} of {x1,x2,…,xn} such that
• y1+y2+…+yk ≤ t
• Maximize (y1+y2+…+yk)/(z1+z2+…+zm), i.e., make this ratio as close to 1 as possible,
• where z1+z2+…+zm is the optimal solution: z1+z2+…+zm ≤ t and t−(z1+z2+…+zm) is minimal.

Subset Sum

To prove NP-complete:

1. Prove it is in NP
• Verifiable in polynomial time
• Give a nondeterministic algorithm

2. Reduction from a known NP-complete problem to subset sum

• Reduction from 3SAT to subset sum

Subset Sum is in NP

sum ← 0
A ← {x1,x2,…,xn}
for each x in A
  y ← choice(A)
  sum ← sum + y
  if (sum = t) then success
  A ← A − {y}
done
fail

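Inequality

• Standard formula: $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots$

• Assume that $|x| \le 1$; then $1 + x \le e^x \le 1 + x + x^2$.

Scaling factor

• Select $\delta = \epsilon/(2n)$.

• Each time, the difference is scaled by a factor of at most $1 + \epsilon/(2n)$.

• After n times, $\left(1 + \frac{\epsilon}{2n}\right)^n \le e^{\epsilon/2} \le 1 + \frac{\epsilon}{2} + \left(\frac{\epsilon}{2}\right)^2 \le 1 + \epsilon$ (for $0 < \epsilon < 1$).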

Trimming

Example (δ = 0.1):

L = ⟨10, 11, 12, 15, 20, 21, 22, 23, 24, 29⟩

It is trimmed to L’ = ⟨10, 12, 15, 20, 23, 29⟩
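The trimming rule and the approximation scheme it supports can be sketched as follows. This follows the CLRS APPROX-SUBSET-SUM procedure, which matches the trimming example above. The function names are assumptions. An element is kept only if it exceeds the last kept element by more than a factor of 1 + δ, and each round trims with δ = ε/(2n).

```python
def trim(values, delta):
    """values: sorted list of sums. Keep an element only if it exceeds the
    last kept element by more than a factor of (1 + delta)."""
    kept = [values[0]]
    last = values[0]
    for y in values[1:]:
        if y > last * (1 + delta):
            kept.append(y)
            last = y
    return kept

def approx_subset_sum(xs, t, eps):
    """Largest reachable sum <= t after trimming; it is at least
    (optimal sum) / (1 + eps)."""
    n = len(xs)
    sums = [0]
    for x in xs:
        merged = sorted(set(sums + [s + x for s in sums]))  # merge, drop duplicates
        merged = trim(merged, eps / (2 * n))               # delta = eps / 2n
        sums = [s for s in merged if s <= t]               # discard sums above t
    return max(sums)
```

With δ = 0.1, the list above trims to ⟨10, 12, 15, 20, 23, 29⟩. For S = ⟨104, 102, 201, 101⟩, t = 308 and ε = 0.40, the sketch returns 302, while the optimum is 104 + 102 + 101 = 307.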

Reduction

Goal: Reduce 3SAT to SUBSET-SUM.

How: Let Ф be a 3 conjunctive normal form formula. Build an instance (S, t) of the SUBSET-SUM problem such that Ф is satisfiable if and only if there is a subset T of S whose elements sum to t.

Prove the reduction is polynomial.

1. Algorithm

Input: Ф - 3 conjunctive normal form formula

Variables: x1, x2, …, xl

Clauses: c1,c2,…,ck.

Output: S, t such that Ф is satisfiable iff there is a subset T of S which sums to t.

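1. Algorithm (cont.)

      x1  x2  …  xl | c1  c2  …  ck
y1    1   0   …  0  | 1   0   …  0
z1    1   0   …  0  | 0   1   …  0
y2    0   1   …  0  | 0   0   …  1
z2    0   1   …  0  | 0   0   …  0
…
yl    0   0   …  1  | 0   0   …  0
zl    0   0   …  1  | 0   0   …  0
g1    0   0   …  0  | 1   0   …  0
h1    0   0   …  0  | 2   0   …  0
g2    0   0   …  0  | 0   1   …  0
h2    0   0   …  0  | 0   2   …  0
…
t     1   1   …  1  | 4   4   …  4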

1. Algorithm (cont.)

(yi,xj), (zi,xj) – 1 if i=j, 0 otherwise

(yi,cj) – 1 if cj contains the literal xi, 0 otherwise

(zi,cj) – 1 if cj contains the literal ¬xi, 0 otherwise

(gi,xj), (hi,xj) – 0

(gi,cj) – 1 if i=j, 0 otherwise; (hi,cj) – 2 if i=j, 0 otherwise

Each row represents a decimal number.

S={y1,z1,..,yl,zl,g1,h1,…,gk,hk}

t is the last row in the table.

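2. Reduction ‘⇒’

Given a variable assignment which satisfies Ф, find T.

1. If xi is true then yi is in T, else zi is in T.

2. Each clause has between 1 and 3 true literals, so each of the last k digits of the sum so far is between 1 and 3. Add gi and/or hi (worth 1 and 2 in digit ci) to T so that all of the last k digits of the sum become 4.

3. Reduction ‘⇐’

Given T, a subset of S which sums to t, find a variable assignment which satisfies Ф.

1. If yi is in T then xi is true.

2. If zi is in T then xi is false.

No column adds up to more than 6, so there are no carries between digits. Digit xi of t is 1, so exactly one of yi, zi is in T. Digit cj of t is 4, while gj and hj contribute at most 3, so at least one literal of cj is true; hence Ф is satisfied.

4. Polynomial

The table has 2l + 2k + 1 rows and l + k columns, so its size is $O((k+l)^2)$.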

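Example

$C_1 = (x_1 \lor \neg x_2 \lor \neg x_3)$, $C_2 = (\neg x_1 \lor \neg x_2 \lor \neg x_3)$, $C_3 = (\neg x_1 \lor \neg x_2 \lor x_3)$, $C_4 = (x_1 \lor x_2 \lor x_3)$

      x1  x2  x3 | c1  c2  c3  c4
y1    1   0   0  | 1   0   0   1
z1    1   0   0  | 0   1   1   0
y2    0   1   0  | 0   0   0   1
z2    0   1   0  | 1   1   1   0
y3    0   0   1  | 0   0   1   1
z3    0   0   1  | 1   1   0   0
g1    0   0   0  | 1   0   0   0
h1    0   0   0  | 2   0   0   0
g2    0   0   0  | 0   1   0   0
h2    0   0   0  | 0   2   0   0
…
t     1   1   1  | 4   4   4   4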

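The table can be generated mechanically, reading each row as a decimal number. The sketch below uses integer literals (negative for ¬), and its function names are assumptions. It reproduces the rows of the example with clauses C1 = (x1 ∨ ¬x2 ∨ ¬x3), C2 = (¬x1 ∨ ¬x2 ∨ ¬x3), C3 = (¬x1 ∨ ¬x2 ∨ x3), C4 = (x1 ∨ x2 ∨ x3). The brute-force subset check is only for tiny instances. A literal repeated inside one clause contributes only once.

```python
from itertools import combinations

def sat3_to_subset_sum(clauses, n_vars):
    """Return (S, t): rows y1, z1, ..., yl, zl, then g1, h1, ..., gk, hk."""
    k = len(clauses)
    width = n_vars + k                      # digits: x1..xl, then c1..ck

    def number(digits):                     # digits[0] is the most significant
        return int("".join(map(str, digits)))

    S = []
    for i in range(1, n_vars + 1):
        for sign in (1, -1):                # y_i (literal x_i), then z_i (NOT x_i)
            d = [0] * width
            d[i - 1] = 1
            for j, clause in enumerate(clauses):
                if sign * i in clause:
                    d[n_vars + j] = 1
            S.append(number(d))
    for j in range(k):
        for slack in (1, 2):                # g_j, then h_j
            d = [0] * width
            d[n_vars + j] = slack
            S.append(number(d))
    return S, number([1] * n_vars + [4] * k)

def has_subset_sum(S, t):
    return any(sum(c) == t for r in range(len(S) + 1) for c in combinations(S, r))

example = [[1, -2, -3], [-1, -2, -3], [-1, -2, 3], [1, 2, 3]]
S, t = sat3_to_subset_sum(example, 3)
```

Here t = 1114444 and the row for y1 is 1001001. The assignment x1 = x2 = 0, x3 = 1 corresponds to the subset {z1, z2, y3} plus suitable slack rows.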

Special Case of the Traveling Salesperson Problem

• OPT-TSP: Given a complete, weighted graph, find a cycle of minimum cost that visits each vertex.
– OPT-TSP is NP-hard
– Special case: edge weights satisfy the triangle inequality (which is common in many applications):
• w(a,b) + w(b,c) ≥ w(a,c)

A 2-Approximation for TSP Special Case

[Figure: an Euler tour P of an MST M, and the output tour T]

Algorithm TSPApprox(G)
  Input weighted complete graph G, satisfying the triangle inequality
  Output a TSP tour T for G
  M ← a minimum spanning tree for G
  P ← an Euler tour traversal of M, starting at some vertex s
  T ← empty list
  for each vertex v in P (in traversal order)
    if this is v’s first appearance in P then
      T.insertLast(v)
  T.insertLast(s)
  return T

A 2-Approximation for TSP Special Case - Proof

[Figure: Euler tour P of MST M (twice the cost of M); output tour T (at most the cost of P); optimal tour OPT (at least the cost of MST M)]

Deleting one edge from the optimal tour leaves a spanning tree (a path); hence |M| ≤ |OPT|.
The Euler tour P visits each edge of M twice; hence |P| = 2|M|.
Each time we shortcut a vertex in the Euler tour, we do not increase the total length, by the triangle inequality (w(a,b) + w(b,c) ≥ w(a,c)); hence, |T| ≤ |P|.
Therefore, |T| ≤ |P| = 2|M| ≤ 2|OPT|.
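A runnable sketch of TSPApprox for points in the plane. Euclidean distances satisfy the triangle inequality, which is the assumption that makes the bound hold. The MST is built with Prim’s algorithm. A preorder walk of the tree visits vertices in the same order as the Euler tour with repeated vertices skipped. All names are illustrative.

```python
import math

def tsp_approx(points):
    """Return a tour (list of point indices) starting and ending at vertex 0."""
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    children = {v: [] for v in range(n)}
    # Prim's algorithm: best[v] = (distance to the tree, closest tree vertex)
    best = {v: (dist(0, v), 0) for v in range(1, n)}
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for u in best:
            if dist(v, u) < best[u][0]:
                best[u] = (dist(v, u), v)
    # preorder walk of the MST = Euler tour with repeated vertices shortcut
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour + [0]

def tour_cost(points, tour):
    return sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))
```

On a handful of points the result can be compared with a brute-force optimum. By the proof above it is never more than twice as long.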

Problem

• Convert the following spanning tree into a tour so that it provides a 2-approximation for the traveling salesperson problem. Point out the edges not in the tree.

Set Cover

• OPT-SET-COVER: Given a collection of m sets, find the smallest number of them whose union is the same as the union of the whole collection of m sets.

– OPT-SET-COVER is NP-hard

• Greedy approach produces an O(log n)-approximation algorithm. See §13.4.4 for details.

Algorithm SetCoverApprox(G)
  Input a collection of sets S1…Sm
  Output a subcollection C with same union
  F ← {S1,S2,…,Sm}
  C ← empty set
  U ← union of S1…Sm
  while U is not empty
    Si ← set in F with most elements in U
    F.remove(Si)
    C.add(Si)
    Remove all elements in Si from U
  return C
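The greedy rule translates almost line for line into Python. In this sketch the sets are given as a list, the chosen indices are returned, and ties go to the lowest index (an assumption; the pseudocode does not specify tie-breaking).

```python
def set_cover_approx(sets):
    """Greedy set cover: repeatedly take the set covering the most
    still-uncovered elements. Returns the indices of the chosen sets."""
    uncovered = set().union(*sets)          # U
    remaining = list(range(len(sets)))      # F
    chosen = []                             # C
    while uncovered:
        i = max(remaining, key=lambda j: len(sets[j] & uncovered))
        remaining.remove(i)
        chosen.append(i)
        uncovered -= sets[i]
    return chosen
```

For {1,2,3}, {2,4}, {3,4}, {4,5}, the greedy rule first takes {1,2,3} (three new elements) and then {4,5}.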

Final Exam

• May 11 (Tuesday)

• 5:45-8:25pm

Randomized Algorithm

[Figure: a blind monkey must choose among paths to get the apple]

Randomized algorithm

[Figure: the blind monkey picks paths at random]

Randomized algorithm

• Randomly select 4 independent paths.
• Each path has a 1/4 chance to get the apple.
• Each path has a 1 − 1/4 = 3/4 chance to get nothing.
• It has a $(3/4)^4 = 81/256 \approx 0.316 < 1/3$ chance to fail at all 4 paths.
• It has at least a 1 − 1/3 = 2/3 chance to get an apple from trying the 4 paths.
• The worst case is that the monkey gets an apple only after trying 13 paths.

Try 6 Paths

• It has probability $(3/4)^6 \approx 0.178$ to fail at all 6 paths.
• It has at least 1 − 0.178 = 0.822 probability to get an apple from trying the 6 paths.
• The worst case is that the monkey gets an apple only after trying 13 paths.
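The numbers above follow from multiplying independent failure probabilities. This sketch recomputes them with exact fractions and adds a small helper (hypothetical, not from the slides) for the number of random tries needed to push the failure chance below a target.

```python
from fractions import Fraction

def failure_probability(p_success, trials):
    """Probability that `trials` independent attempts all fail."""
    return (1 - p_success) ** trials

def trials_needed(p_success, max_failure):
    """Smallest number of independent attempts whose failure
    probability is at most max_failure."""
    k = 0
    while failure_probability(p_success, k) > max_failure:
        k += 1
    return k

p = Fraction(1, 4)
```

With p = 1/4, four tries fail with probability 81/256 and six tries with about 0.178. Pushing the failure chance down to 1/100 takes 17 tries, because (3/4)^16 is still about 0.01002.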

Polynomial Identity

• Check if a polynomial is identically equal to zero:

$(x+1)^2 - (x^2 + 2x + 1) \equiv 0$

0)2()1( 22 yxyyxyx

Degree of polynomial

• The highest exponent among all monomial terms.

• A single-variable polynomial converted into the format below has degree n:

$a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$, with $a_n \neq 0$

For example, $(x+1)^2 = x^2 + 2x + 1$ has degree 2.

Degree of Multiple Variable Polynomial

• The polynomial $P(x_1, x_2, \ldots, x_k)$ has multi-degree $(d_1, d_2, \ldots, d_k)$ if the highest degree (exponent) of $x_i$ is $d_i$.

• The degree of a variable in a multiple variable polynomial is its highest exponent.

• For example: the following polynomial has multi-degree (30, 100):

$x^2 y^{100} + x^{30} y + xy$

• Each nonzero single-variable polynomial of degree n has at most n different real roots.

Randomized algorithm: Polynomial Identity

• Assume the polynomial P(x) has degree n.
• Randomly select n+1 different real numbers $x_1, x_2, \ldots, x_{n+1}$.
• P(x) is identically zero iff all of $P(x_1), P(x_2), \ldots, P(x_{n+1})$ are zero.

Checking the identity of two lists

• Given two lists of integers, check if they will be the same after sorting.

• 5,1, 9,1,4 and 1, 4, 1,9,5

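Two algorithms

• Check after they are sorted.
Time: O(n log n)

• Convert the lists $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ into two polynomials $P(x) = (x - a_1)\cdots(x - a_n)$ and $Q(x) = (x - b_1)\cdots(x - b_n)$. These are identical iff the lists are the same after sorting; compare them at a random point.
Time: O(n)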
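A sketch of the polynomial method for comparing two lists. It evaluates P(r) = ∏(r − a_i) and Q(r) = ∏(r − b_i) at random points. One assumption goes beyond the slides: arithmetic is done modulo the prime 2^61 − 1 to keep numbers small, so the list values are assumed to lie in [0, 2^61 − 1) and stay distinct modulo that prime. A "False" answer is always correct. If the lists differ, P − Q is a nonzero polynomial of degree below n, so one trial wrongly agrees with probability at most n/(2^61 − 1).

```python
import random

PRIME = (1 << 61) - 1  # a Mersenne prime; all products are taken modulo PRIME

def same_multiset(a, b, trials=5, rng=None):
    """Randomized O(n)-per-trial test of whether sorting a and b gives the
    same list: compare prod(r - a_i) and prod(r - b_i) at random points r."""
    if len(a) != len(b):
        return False
    rng = rng or random.Random()
    for _ in range(trials):
        r = rng.randrange(PRIME)
        p = q = 1
        for x, y in zip(a, b):
            p = p * (r - x) % PRIME
            q = q * (r - y) % PRIME
        if p != q:
            return False  # a witness: the polynomials differ, so the lists differ
    return True           # equal at every sampled point: same with high probability
```

For the slide’s lists 5, 1, 9, 1, 4 and 1, 4, 1, 9, 5, the test answers True. Changing one entry makes it answer False, except with negligible probability.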

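Example

• For the polynomial $P(x) = (x+1)^2 - (x^2 + 2x + 1)$, letting x = 1, 2, 3 gives P(1) = P(2) = P(3) = 0. Since P has degree at most 2, three zeros show that P is identically zero.

Example

• For the polynomial $P(x) = (x-1)(x-2)$, letting x = 1, 2, 3 gives P(1) = P(2) = 0, but P(3) = 2.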

Two variables polynomial has infinite roots

• The polynomial equation $x^2 + y^2 - 1 = 0$ has infinitely many roots. It represents the circle of radius one centered at the origin.

Randomized Algorithm

• Randomly select a point (x,y) on the plane. If the point is not on the circle, then $x^2 + y^2 - 1 \neq 0$.

Convert the multiple variable polynomial

• The polynomial $x^2 y^{100} + x^{30} y + 5y$ can be converted into the format:

$y^{100} x^2 + y (x^{30} + 5)$

Convert two variables polynomial

• The polynomial P(x,y) of multi-degree $(d_1, d_2)$ can be written as

$P(x,y) = y^{d_2} P_{d_2}(x) + \ldots + y P_1(x) + P_0(x)$,

where each $P_i(x)$ has degree at most $d_1$.

Convert the multiple variable polynomial

• For the polynomial $x^2 y^{100} + y (x^{30} + 5)$, replace y by $x^{31}$:

$x^2 (x^{31})^{100} + x^{31} (x^{30} + 5) = x^{3102} + x^{61} + 5x^{31}$

Convert three variables polynomial

• The polynomial $P(x_1, x_2, x_3)$ of multi-degree $(d_1, d_2, d_3)$ can be written as

$P(x_1, x_2, x_3) = x_3^{d_3} P_{d_3}(x_1, x_2) + \ldots + x_3 P_1(x_1, x_2) + P_0(x_1, x_2)$,

where each $P_i(x_1, x_2)$ has multi-degree at most $(d_1, d_2)$.

Convert two variables polynomial

• For the polynomial P(x,y) of multi-degree $(d_1, d_2)$, replace y by $x^{d_1+1}$:

$P(x, x^{d_1+1}) = x^{(d_1+1)d_2} P_{d_2}(x) + \ldots + x^{d_1+1} P_1(x) + P_0(x)$,

which has degree at most $(d_1+1)(d_2+1)$.

Convert multiple variables polynomial into single variable polynomial

• The polynomial $P(x_1, x_2, \ldots, x_k)$ of multi-degree $(d_1, d_2, \ldots, d_k)$ can be converted into a single variable polynomial $Q(x_1)$ of degree at most $(d_1+1)(d_2+1)\cdots(d_k+1)$.

• Furthermore, $P(x_1, x_2, \ldots, x_k)$ is not zero iff $Q(x_1)$ is not zero.
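The substitution $x_i \mapsto x_1^{w_i}$ with $w_1 = 1$ and $w_i = (d_1+1)\cdots(d_{i-1}+1)$ reads each exponent tuple as a mixed-radix number. Distinct monomials therefore map to distinct powers and nothing cancels. This is why P and Q are zero together. A sketch on polynomials stored as dictionaries from exponent tuples to coefficients (an assumed representation):

```python
def kronecker(poly, degrees):
    """poly: dict mapping exponent tuples (e1, ..., ek) to nonzero coefficients.
    degrees: the multi-degree (d1, ..., dk).
    Returns the single-variable polynomial as a dict: exponent -> coefficient."""
    weights, w = [], 1
    for d in degrees:
        weights.append(w)          # w_i = (d_1 + 1) ... (d_{i-1} + 1)
        w *= d + 1
    q = {}
    for exps, coeff in poly.items():
        e = sum(ei * wi for ei, wi in zip(exps, weights))
        q[e] = q.get(e, 0) + coeff  # distinct exponent tuples never collide
    return {e: c for e, c in q.items() if c != 0}
```

For x²y¹⁰⁰ + x³⁰y + 5y with multi-degree (30, 100), the weights are (1, 31). This reproduces x³¹⁰² + x⁶¹ + 5x³¹.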

Randomized algorithm for multiple variables polynomial

• Input: the polynomial $P(x_1, x_2, \ldots, x_k)$ of multi-degree $(d_1, d_2, \ldots, d_k)$.
Convert it into a single variable polynomial $Q(x_1)$ of degree at most $(d_1+1)(d_2+1)\cdots(d_k+1)$.

• Randomly select an integer z in $[1, 1000(d_1+1)(d_2+1)\cdots(d_k+1)]$.
Evaluate Q(z). If Q(z) ≠ 0, then P is not zero. If Q(z) = 0, report that P is zero. This answer can be wrong only if Q is not identically zero; such a Q has at most deg Q roots in this range, so Q(z) = 0 happens with chance < 1/1000.
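The randomized test works even when P is only available as a formula we can evaluate (a "black box"), which is the interesting case. This is a sketch under stated assumptions: the polynomial is a Python function of k integer arguments, a bound on its multi-degree is known, and Q(z) = P(z^{w_1}, …, z^{w_k}) is evaluated exactly with Python’s big integers.

```python
import random

def probably_zero(p, degrees, rng=None):
    """p: black-box polynomial in k variables, multi-degree at most `degrees`.
    Evaluates Q(z) = p(z**w1, ..., z**wk) at a random integer z in
    [1, 1000 * D], D = (d1 + 1) ... (dk + 1). A False answer is always correct;
    a True answer is wrong with probability below 1/1000."""
    rng = rng or random.Random()
    weights, D = [], 1
    for d in degrees:
        weights.append(D)
        D *= d + 1
    z = rng.randint(1, 1000 * D)
    return p(*(z ** w for w in weights)) == 0
```

For example, (x + y)² − (x² + 2xy + y²) is reported as zero. The polynomial (x + y)² − (x² + y²) = 2xy is not, since 2z⁴ ≠ 0 for every z ≥ 1.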

Big Open Problem

• Is there any deterministic algorithm such that, given a polynomial $P(x_1, x_2, \ldots, x_k)$, the algorithm decides if it is identical to zero in $n^c$ steps, where c is a constant and n is the length of the input polynomial?

Degree

• The degree of a monomial $x_1^{d_1} x_2^{d_2} \cdots x_k^{d_k}$ is $d_1 + d_2 + \ldots + d_k$.

• For example, $x_1^{3} x_2^{21} x_3^{7}$ has degree 3+21+7=31.

• The degree of a multi-variable polynomial is the largest degree of its monomials after sum of product expansion.

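Schwartz-Zippel Theorem

Let $Q(x_1, \ldots, x_n)$ be a multivariate polynomial of degree d. Fix a set of integers S, and let $r_1, \ldots, r_n$ be chosen independently and uniformly at random from S. If $Q(x_1, \ldots, x_n) \not\equiv 0$, then with probability at most $d/|S|$, $Q(r_1, \ldots, r_n) = 0$.

• Basis: The number of variables is one.
The polynomial has at most d different roots.
So, with probability at most $d/|S|$, $Q(r) = 0$.

• Hypothesis: if the number of variables is n, then $Q(r_1, \ldots, r_n) = 0$ with probability at most $d/|S|$.

• Induction: The number of variables is n+1. Write
$Q(x_1, \ldots, x_n, x_{n+1}) = \sum_{i=0}^{k} x_{n+1}^{i}\, Q_i(x_1, \ldots, x_n)$,
where k is the largest i with $Q_i \not\equiv 0$, so $Q_k$ has degree at most d − k.
By the hypothesis, with probability at most $(d-k)/|S|$, $Q_k(r_1, \ldots, r_n) = 0$.
If $Q_k(r_1, \ldots, r_n) \neq 0$, then $\sum_{i=0}^{k} x_{n+1}^{i}\, Q_i(r_1, \ldots, r_n)$ is a nonzero polynomial of degree k in $x_{n+1}$, so with probability at most $k/|S|$ it is 0 at $x_{n+1} = r_{n+1}$.
Therefore, with probability at most $(d-k)/|S| + k/|S| = d/|S|$, $Q(r_1, \ldots, r_n, r_{n+1}) = 0$.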

Application

• Decide whether a bipartite graph has a perfect matching.

• Convert it into a determinant (of a matrix with a variable for each edge).

• Check if the determinant is identically zero.
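A sketch of this application, known as the Edmonds-matrix test. Entry (u, v) of an n × n matrix is a variable when left vertex u is joined to right vertex v, and 0 otherwise. The determinant is a nonzero polynomial iff a perfect matching exists. Instead of expanding it, we substitute random values and compute the determinant modulo a prime (an assumption made to keep numbers small) by Gaussian elimination. By Schwartz-Zippel, a matching is missed with probability at most n/(P − 1). A "no matching" graph is never misreported.

```python
import random

P = (1 << 61) - 1  # prime modulus; all arithmetic is in the field Z_P

def det_mod(m, p=P):
    """Determinant of a square matrix modulo the prime p (Gaussian elimination)."""
    m = [row[:] for row in m]
    n = len(m)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] % p), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                      # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)    # inverse via Fermat's little theorem
        for r in range(col + 1, n):
            f = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - f * m[col][c]) % p
    return det % p

def has_perfect_matching(n, edges, rng=None):
    """Bipartite graph with left and right vertices 0..n-1 and edges (u, v)."""
    rng = rng or random.Random()
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = rng.randrange(1, P)       # random value for the edge variable
    return det_mod(m) != 0
```

If both left vertices can only use right vertex 0, column 1 is all zeros, so the determinant is identically zero and the answer is always False.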

Problem: Convert the multiple variable polynomial

• For polynomial P(x,y)=

Use the previous method to convert it into a one-variable polynomial Q(x) so that P(x,y) is identical to zero iff Q(x) is identical to zero.

11010102 yxxyxy

Expectation

• Let f(x) be a real-valued function. Then the expectation of f(X) is given by

$E[f(X)] = \sum_x f(x) \cdot \Pr[X = x]$.

Independence

Two random variables X and Y are independent if, for all a and b,

$\Pr[X = a \text{ and } Y = b] = \Pr[X = a] \cdot \Pr[Y = b]$

Independence

If two random variables X and Y are independent, then

$E[XY] = E[X]\,E[Y]$ and $E[f(X)\,g(Y)] = E[f(X)]\,E[g(Y)]$

Markov Inequality

Let Y be a random variable assuming only non-negative values. Then for all t > 0,

$\Pr[Y \ge t] \le \frac{E[Y]}{t}$

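Chernoff Bound

Theorem: Let $X_1, X_2, \ldots, X_n$ be independent 0-1 random variables such that $\Pr[X_i = 1] = p$. Let $X = X_1 + X_2 + \ldots + X_n$. Then for $0 < \delta < 1$,

$\Pr[X \ge (1+\delta)pn] \le \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{pn}$

Proof. For any real number t > 0,
$\Pr[X \ge (1+\delta)pn] = \Pr[e^{tX} \ge e^{t(1+\delta)pn}]$.

Apply the Markov inequality and independence:
$\Pr[X \ge (1+\delta)pn] \le \frac{E[e^{tX}]}{e^{t(1+\delta)pn}} = \frac{E[e^{tX_1}]\,E[e^{tX_2}] \cdots E[e^{tX_n}]}{e^{t(1+\delta)pn}}$.

By the definition of expectation, and using $1 + x \le e^x$,
$E[e^{tX_i}] = pe^t + (1-p) = 1 + p(e^t - 1) \le e^{p(e^t - 1)}$.

Hence
$\Pr[X \ge (1+\delta)pn] \le \frac{e^{p(e^t-1)n}}{e^{t(1+\delta)pn}}$.

Find a t to make it minimal. Let $t = \ln(1+\delta)$:
$\Pr[X \ge (1+\delta)pn] \le \frac{e^{\delta pn}}{(1+\delta)^{(1+\delta)pn}} = \left[\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right]^{pn}$.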

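Chernoff Bound

Theorem: Let $X_1, X_2, \ldots, X_n$ be independent 0-1 random variables such that $\Pr[X_i = 1] = p$. Let $X = X_1 + X_2 + \ldots + X_n$. Then for $0 < \delta < 1$,

$\Pr[X \le (1-\delta)pn] \le e^{-\delta^2 pn/2}$

Proof. For any real number t > 0,
$\Pr[X \le (1-\delta)pn] = \Pr[e^{-tX} \ge e^{-t(1-\delta)pn}]$.

Apply the Markov inequality and independence:
$\Pr[X \le (1-\delta)pn] \le \frac{E[e^{-tX_1}]\,E[e^{-tX_2}] \cdots E[e^{-tX_n}]}{e^{-t(1-\delta)pn}}$.

By the definition of expectation,
$E[e^{-tX_i}] = pe^{-t} + (1-p) = 1 + p(e^{-t} - 1) \le e^{p(e^{-t} - 1)}$, so
$\Pr[X \le (1-\delta)pn] \le \frac{e^{p(e^{-t}-1)n}}{e^{-t(1-\delta)pn}}$.

Let $t = \ln(1/(1-\delta))$, so that $e^{-t} = 1 - \delta$:
$\Pr[X \le (1-\delta)pn] \le \frac{e^{-\delta pn}}{(1-\delta)^{(1-\delta)pn}} = \left[\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right]^{pn}$.

Use the following inequality: for each $\delta \in (0, 1)$,
$(1-\delta)^{(1-\delta)} > e^{-\delta + \delta^2/2}$.

Then $\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}} < e^{-\delta^2/2}$, so $\Pr[X \le (1-\delta)pn] \le e^{-\delta^2 pn/2}$.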
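A quick numerical sanity check of both bounds. It computes the exact binomial tail probabilities and compares them with the two Chernoff expressions derived above. The parameter choices are arbitrary examples, and the function names are assumptions.

```python
import math

def upper_tail(n, p, a):
    """Exact Pr[X >= a] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(math.ceil(a), n + 1))

def lower_tail(n, p, a):
    """Exact Pr[X <= a] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(0, math.floor(a) + 1))

def chernoff_upper(n, p, delta):
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** (p * n)

def chernoff_lower(n, p, delta):
    return math.exp(-delta**2 * p * n / 2)
```

For n = 100, p = 0.5, δ = 0.2, the exact probability Pr[X ≥ 60] is about 0.03, while the bound is about 0.39. The bound is loose for small n but decreases exponentially in pn.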

Homework 4

• Problem 1. An independent set of a graph G=(V,E) is a subset V’ of V of vertices such that each edge in E is incident on at most one vertex in V’. The independent set problem is to find the maximum-size independent set in G. Formulate the independent set problem as a decision problem and prove that it is NP-complete.

Solution of Problem 1

• We reduce the Clique problem to Independent set problem.

• Let G=(V,E) be a graph.

• Construct a graph G’=(V,E’) such that (u,v) is in E’ if and only if (u,v) is not in E.

• G has a clique of size k if and only if G’ has an independent set of size k.

Problem 2

Longest path problem: given a graph G and an integer g, find in G a simple path of length g. Prove that the longest path problem is NP-complete.

• Reduce the Hamiltonian path problem to the longest path problem.

• Let G=(V,E) be an input for the Hamiltonian path problem. Assume n=|V| (number of vertices).

• Let <G,n> be an instance for the longest path problem.

• G has a Hamiltonian path if and only if G has a simple path with n vertices.

Problem 3

In the hitting set problem, we are given a family of sets {S1, S2, …, Sn} and a budget b, and we wish to find a set H of size at most b which intersects every Si, if such an H exists. In other words, we want $H \cap S_i \neq \emptyset$ for all i.

Show that hitting set is NP-complete.

• Reduce the vertex cover problem to the hitting set problem.

• Let G=(V,E) be an input of vertex cover problem.

• Construct the hitting set problem with S1, S2,…, Sm such that each Si={u,v} for an edge (u,v) in E.

• The hitting set problem has solution of size b if and only if the graph can be covered by b vertices.
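A minimal sketch of the construction, assuming the graph is given as an edge list: each edge becomes a two-element set, and a candidate set H is then a hitting set exactly when it is a vertex cover of the same size. The function names are hypothetical.

```python
def vertex_cover_to_hitting_set(edges, b):
    """Map a vertex cover instance (G, b) to a hitting set instance: one set {u, v} per edge."""
    return [frozenset((u, v)) for u, v in edges], b

def is_hitting_set(H, sets, b):
    """H has at most b elements and intersects every set S_i."""
    return len(H) <= b and all(H & S for S in sets)

def is_vertex_cover(C, edges, b):
    """C has at most b vertices and touches every edge."""
    return len(C) <= b and all(u in C or v in C for u, v in edges)

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
sets, b = vertex_cover_to_hitting_set(edges, 2)
for C in [{1, 3}, {2, 4}, {3}, {1, 2, 3}]:
    print(sorted(C), is_hitting_set(C, sets, b), is_vertex_cover(C, edges, b))
```

The construction produces one set per edge, so it runs in O(|E|) time.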

Problem 4

Show that for every problem A in NP, there is an algorithm which solves A in time $O(2^{p(n)})$, where n is the size of the input instance and p(n) is a polynomial (which may depend on A).

Polynomial

n: input size

$n^c$ is a polynomial of n, where c is a constant that does not depend on n.

Examples:

$n, n^2, n^3, \ldots, n^{100}, \ldots$

Class P

P is the complexity class consisting of all decision problems that have polynomial-time algorithms

Polynomial-Time Decision Problems

• Decision problems: output is 1 or 0 (“yes” or “no”)

• Examples:

• Is a given circuit satisfiable?

• Does a text T contain a pattern P?

• Does an instance of 0/1 Knapsack have a solution with benefit at least K?

• Does a graph G have an MST with weight at most K?
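As one concrete instance of a polynomial-time decision algorithm, the MST question can be answered by computing an MST with Kruskal's algorithm and comparing its weight with K. This sketch assumes nodes 0..n-1, edges given as (weight, u, v) tuples, and a connected graph (otherwise it measures a minimum spanning forest); the function name is hypothetical.

```python
def mst_weight_at_most(n, edges, K):
    """Decision version of MST via Kruskal with union-find: is the MST weight <= K?"""
    parent = list(range(n))

    def find(x):
        # find the root of x's component, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):       # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # edge joins two components: keep it
            parent[ru] = rv
            total += w
    return total <= K                   # output is "yes" (True) or "no" (False)
```

Sorting dominates, so the whole decision takes O(|E| log |E|) time, well within polynomial time.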

The Complexity Class P

• A complexity class is a collection of languages

• P is the complexity class consisting of all decision problems that have polynomial-time algorithms

• For each problem L in P, there is a polynomial-time decision algorithm A for L.

– If n=|x|, then A runs in p(n) time on input x and outputs “yes” if and only if x is in L.

– The function p(n) is some polynomial

Verifier

A verifier for a language L is an algorithm V such that

L={w| V accepts <w,c> for some string c}.

For the verifier V for L, c is a certificate of w if V accepts <w,c>.

If the verifier V for the language L runs in time polynomial in the length of w, then V is a polynomial-time verifier for L.

Verifier for Hamiltonian Path

For <G,s,t>, a certificate is a list of nodes of G: $v_1, v_2, \ldots, v_m$.

Verifier:

Check that m is the number of nodes of G.

Check that the nodes $v_1, \ldots, v_m$ are all different.

Check that $v_1 = s$ and $v_m = t$.

Check that $(v_i, v_{i+1})$ is a directed edge of G for $i = 1, \ldots, m-1$.

If all checks pass, accept. Otherwise, reject.
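The HAMPATH verifier can be sketched in a few lines, assuming the directed graph is given as a node list and a list of (u, v) edge pairs. Because the input is <G,s,t>, the sketch also checks that the list starts at s and ends at t. `verify_hampath` is a hypothetical name.

```python
def verify_hampath(nodes, edges, s, t, cert):
    """Polynomial-time verifier for HAMPATH; cert is the proposed list v_1, ..., v_m."""
    edge_set = set(edges)                          # directed edges (u, v)
    if len(cert) != len(nodes):                    # m must equal the number of nodes of G
        return False
    if set(cert) != set(nodes):                    # with the length check: all different, all in G
        return False
    if not cert or cert[0] != s or cert[-1] != t:  # the path must go from s to t
        return False
    # every consecutive pair (v_i, v_{i+1}) must be a directed edge of G
    return all((cert[i], cert[i + 1]) in edge_set for i in range(len(cert) - 1))
```

Each check is a single pass over the certificate, so the verifier runs in time polynomial (here roughly linear) in the size of <G,s,t>.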

NP example (2)

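• Problem: Decide whether a weighted graph has a Hamiltonian tour with weight at most K.

• Verification algorithm: 1. Test that the tour contains every node exactly once and that consecutive nodes (including the last and the first) are joined by edges.

2. Test that the tour has weight at most K.

• Analysis: Verification takes O(n) time (with edge weights looked up in an adjacency matrix), so a nondeterministic algorithm that guesses a tour and then verifies it decides the problem in polynomial time.

• Think about it this way: if someone gives us such a tour, we can verify it quickly.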
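A sketch of this verification, assuming nodes 0..n-1 and a hypothetical adjacency matrix `weight` where `weight[u][v]` is the edge weight or `None` if there is no edge. Each step is one pass over the tour, matching the O(n) analysis.

```python
def verify_tour(n, weight, K, tour):
    """Accept iff `tour` visits each of the n nodes once and has total weight <= K."""
    if len(tour) != n or set(tour) != set(range(n)):   # every node exactly once
        return False
    total = 0
    for i in range(n):
        u, v = tour[i], tour[(i + 1) % n]              # wrap around to close the tour
        if weight[u][v] is None:                       # consecutive nodes must be adjacent
            return False
        total += weight[u][v]
    return total <= K
```

For example, in a triangle with weights 1, 2 and 3, the tour [0, 1, 2] has weight 6, so it is accepted for K = 6 and rejected for K = 5.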

Class NP

NP is the class of languages that have polynomial time verifiers.

Examples:

• HAMPATH is in NP

Clique Problem

Given an undirected graph G, a clique is a set of nodes of G such that every two nodes are connected by an edge.

A k-clique is a clique with k nodes.

[Figure: a 5-clique]
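Deciding whether G has a k-clique is in NP because a proposed set of k nodes is easy to check. A minimal verifier sketch, assuming an undirected graph given as vertex and edge lists; `verify_k_clique` is a hypothetical name.

```python
from itertools import combinations

def verify_k_clique(vertices, edges, k, cert):
    """Accept iff `cert` lists k distinct vertices of G that are pairwise adjacent."""
    edge_set = {frozenset(e) for e in edges}
    nodes = set(cert)
    if len(cert) != k or len(nodes) != k or not nodes <= set(vertices):
        return False
    # k(k-1)/2 pair checks, polynomial in the input size
    return all(frozenset(pair) in edge_set for pair in combinations(nodes, 2))
```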

Solution of problem 4

• Assume that A is a problem in NP.

• A has a polynomial-time verifier V(.). Let V(.) run in time h(n) = $n^d$, where d is a constant.

• For each input x of length n, x is in A if and only if there is a certificate c such that V(x,c) accepts.

• Since V runs in polynomial time, it can read at most h(n) symbols of c, so the length of c is bounded by a polynomial q(n).

• The alphabet has a finite number k of symbols.

• There are at most $2k^{q(n)}$ certificates of length at most q(n) (assuming k ≥ 2).

• For each certificate c, check if V(x,c) accepts.

• If V(x,c) accepts for one of those certificates, then x is in A; otherwise x is not in A.

• Total time is

$$O\big(h(n)\,k^{q(n)}\big) = O\big(n^d k^{q(n)}\big) = O\big(2^{d\log n + q(n)\log k}\big),$$

where k and d are constants.

• Let $p(n) = dn + q(n)\log k$, which is a polynomial. Since $\log n \le n$, the total time is $O(2^{p(n)})$.
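The exhaustive search in this solution can be sketched generically: given a verifier, an alphabet, and a length bound q, try every certificate of length at most q(|x|). To have something runnable, the sketch pairs it with a hypothetical NP language (lists of integers that have a nonempty subset summing to 0) and a matching verifier; both are illustrative assumptions, not part of the homework.

```python
from itertools import product

def decide_by_enumeration(x, verifier, alphabet, q):
    """Decide whether x is in A by running the verifier on every certificate of length <= q(|x|)."""
    n = len(x)
    for length in range(q(n) + 1):
        for c in product(alphabet, repeat=length):
            if verifier(x, c):
                return True   # some certificate makes V accept, so x is in A
    return False              # no certificate works, so x is not in A

# Hypothetical NP language: lists of integers with a nonempty subset summing to 0.
def zero_subset_verifier(x, c):
    """Certificate c is a 0/1 string of length |x| selecting the subset."""
    if len(c) != len(x) or '1' not in c:
        return False
    return sum(a for a, bit in zip(x, c) if bit == '1') == 0

print(decide_by_enumeration([3, -1, -2, 5], zero_subset_verifier, '01', lambda n: n))
```

With a binary alphabet and q(n) = n, the loop makes about $2^{n+1}$ verifier calls, each polynomial in n, which matches the $O(2^{p(n)})$ bound.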

Homework 5

Problem 1

Problem 1. Bin Packing Problem (35-1 from the textbook)

• Suppose that we are given a set of n objects, where the size $s_i$ of the i-th object satisfies $0 < s_i < 1$. We wish to pack all the objects into the minimum number of unit-size bins. Each bin can hold any subset of the objects whose total size does not exceed 1.

• Prove that the problem of determining the minimum number of bins required is NP-hard. (Hint: Reduce from the subset-sum problem.)

• The first-fit heuristic takes each object in turn and places it into the first bin that can accommodate it. Let $S = \sum_{i=1}^{n} s_i$.

• Argue that the optimal number of bins required is at least $\lceil S \rceil$.

• Argue that the first-fit heuristic leaves at most one bin less than half full.

• Prove that the number of bins used by the first-fit heuristic is never more than $\lceil 2S \rceil$.

• Prove an approximation ratio of 2 for the first-fit heuristic.

• Analyze its running time.
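A direct sketch of the first-fit heuristic as described above. It uses `fractions.Fraction` sizes so that the "fits in the bin" comparisons are exact; with floats, rounding could misplace objects that fill a bin exactly. The naive scan over open bins makes this version O(n²) in the worst case. `first_fit` is a hypothetical name.

```python
from fractions import Fraction
import math

def first_fit(sizes):
    """First-fit bin packing. Returns (number of bins, bin index of each object)."""
    free = []          # free[j] = remaining capacity of bin j
    assignment = []
    for s in sizes:
        for j, cap in enumerate(free):
            if s <= cap:           # first bin that can accommodate the object
                free[j] = cap - s
                assignment.append(j)
                break
        else:                      # no open bin fits: open a new unit-size bin
            free.append(1 - s)
            assignment.append(len(free) - 1)
    return len(free), assignment

sizes = [Fraction(x, 10) for x in (5, 7, 5, 2, 4, 2, 5, 1, 6)]
S = sum(sizes)
bins, _ = first_fit(sizes)
print(bins, math.ceil(S), math.ceil(2 * S))   # bins lies between ceil(S) and ceil(2S)
```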

Problem 2

Problem 2. A box contains n balls. Each ball is either red or white. Let $\epsilon$ be an arbitrary constant in (0,1).

• a) Assume that the number of red balls is at least n/10. Develop a constant-time algorithm that gives a $(1-\epsilon)$-approximation for the number of red balls in the box.

• b) If the number of red balls is at least n/m, what is the time complexity of your algorithm to give a $(1-\epsilon)$-approximation for the number of red balls?

• You can assume that the input is in an array char b[n], where each b[i] is either ‘r’ (red) or ‘w’ (white). Hint: Apply the Chernoff bound.
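One possible sampling approach, sketched under stated assumptions: draw k positions of b uniformly with replacement, count the reds, and scale by n/k. The sample size k uses the standard two-sided multiplicative Chernoff bound $\Pr(|Y-\mu| \ge \epsilon\mu) \le 2e^{-\epsilon^2\mu/3}$ (a different form from the lower-tail bound proved in these slides), with a hypothetical failure probability `delta` and the assumed red fraction lower bound `p_min`. Since k does not depend on n, the running time is constant for constant $\epsilon$, `p_min`, and `delta`.

```python
import math
import random

def estimate_red(b, eps, p_min, delta=0.01, rng=random):
    """Estimate the number of 'r' entries in b by uniform sampling with replacement.

    Assumes at least a p_min fraction of the balls are red. With
    k = ceil(3 ln(2/delta) / (eps^2 p_min)) samples, the two-sided Chernoff
    bound gives an estimate within a (1 +/- eps) factor w.p. >= 1 - delta.
    """
    n = len(b)
    k = math.ceil(3 * math.log(2 / delta) / (eps**2 * p_min))   # independent of n
    reds = sum(1 for _ in range(k) if b[rng.randrange(n)] == 'r')
    return n * reds / k
```

For part (b), substituting p_min = 1/m into the same formula gives $O(m \log(1/\delta)/\epsilon^2)$ samples.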

Problem 3

a) Develop an O(n)-time algorithm that, given two lists of integers, decides whether the second list is a permutation of the first. For example, 9, 2, 13, 97 is a permutation of 2, 9, 13, 97. Do not use a sorting algorithm that takes O(n log n) time.

b) Your algorithm may do a large number of multiplications that will generate very large numbers and slow down the computation. Propose some strategies to avoid large number multiplications in your algorithm.
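One approach consistent with the hint about multiplications is polynomial fingerprinting: the lists are permutations of each other exactly when $\prod_i (x - a_i) = \prod_i (x - b_i)$ as polynomials, which can be tested by evaluating both sides at a random point. To avoid huge numbers (part b), the sketch reduces modulo the prime $2^{61}-1$ after every multiplication. It assumes every value has absolute value below $2^{60}$ so that distinct values stay distinct mod P, and the function names are hypothetical. A hash-table count of occurrences is another O(n) expected-time option.

```python
import random

P = (1 << 61) - 1  # the Mersenne prime 2^61 - 1

def fingerprint(values, r):
    """Evaluate prod (r - a_i) mod P, reducing after every multiplication."""
    acc = 1
    for a in values:
        acc = acc * ((r - a) % P) % P
    return acc

def is_permutation(a, b, trials=3, rng=random):
    """Randomized O(n) test: always True for a permutation; False w.h.p. otherwise.

    Assumes |value| < 2^60 for every element of a and b.
    """
    if len(a) != len(b):
        return False
    for _ in range(trials):
        r = rng.randrange(P)
        if fingerprint(a, r) != fingerprint(b, r):
            return False          # the multisets certainly differ
    return True                   # equal multisets with high probability
```

If the lists are not permutations, the difference of the two products is a nonzero polynomial of degree at most n, so each trial wrongly agrees with probability at most n/P.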

Example

it outputs -

1x 2x 3x 4x

04321 xxxx

Example

it outputs -

02121 xxxx
