
Discrete Mathematics

University of Kentucky CS 275, Spring 2007

Professor Craig C. Douglas

http://www.mgnet.org/~douglas/Classes/discrete-math/notes/2007s.pdf


Material Covered (Spring 2007)

Tuesday   Pages      Thursday   Pages
                     1/11       1-9
1/16      9-24       1/18       24-33
1/23      34-45      1/25       46-52
1/30      53-65      2/1        Exam 1
2/6       66-73      2/8        74-83
2/13      84-92      2/15       92-94
2/20      95-106     2/22       106-115
2/27      116-124    3/1        Exam 2
3/6       125-132    3/8        No class
3/13      Spring     3/15       Break
3/20      132-142    3/22       No class
3/26      142-156    3/28       Exam 3
4/3       157-169    4/5        170-177
4/10      178-185    4/12       186-197
4/17      198-210    4/19       Exam 4
4/24      211-217    4/26       Rama: review
5/1       No class   5/3        Final: 8-10 AM

The final exam will cover Chapters 1-10.

2

Course Outline

1. Logic Principles
2. Sets, Functions, Sequences, and Sums
3. Algorithms, Integers, and Matrices
4. Induction and Recursion
5. Simple Counting Principles
6. Discrete Probability
7. Advanced Counting Principles
8. Relations
9. Graphs
10. Trees
11. Boolean Algebra
12. Modeling Computation

3

Logic Principles

Basic values: T or F representing true or false, respectively. In a computer T and F may be represented by 1 or 0 bits.

Basic items:

Propositions
  o Logic and Equivalences
Truth tables
Predicates
Quantifiers
Rules of Inference
Proofs
  o Concrete, outlines, hand waving, and false

4

Definition: A proposition is a statement of a true or false fact (but not both).

Examples:

2+2 = 4 is a proposition because this is a fact. x+1 = 2 is not a proposition unless a specific value of x is stated.

Definition: The negation of a proposition p, denoted by ¬p and pronounced not p, means that, “it is not the case that p.” The truth values for ¬p are the opposite for p.

Examples:

p: Today is Thursday. ¬p: Today is not Thursday.
p: At least a foot of snow falls in Boulder on Fridays. ¬p: Less than a foot of snow falls in Boulder on Fridays.

5

Definition: The conjunction of propositions p and q, denoted p∧q, is true if both p and q are true, otherwise false.

Definition: The disjunction of propositions p and q, denoted p∨q, is true if either p or q (or both) is true, otherwise false.

Definition: The exclusive or of propositions p and q, denoted p⊕q, is true if exactly one of p and q is true, otherwise false.

Truth tables:

p    ¬p   q    p∧q  p∨q  p⊕q
T    F    T    T    T    F
T*   F*   F    F    T    T
F*   T*   T    F    T    T
F    T    F    F    F    F

* The truth table for p and ¬p is really a 2×2 table.

6

Concepts so far can be extended to Boolean variables and Bit strings.

Definition: A bit is a binary digit. Hence, it has two possible values: 0 and 1.

Definition: A bit string is a sequence of zero or more bits. The length of a bit string is the number of bits.

Definition: The bitwise operators OR, AND, and XOR are defined based on ∨, ∧, and ⊕, bit by bit in a bit string.

Examples:

010111 is a bit string of length 6
010111 OR 110000 = 110111
010111 AND 110000 = 010000
010111 XOR 110000 = 100111
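The bitwise examples can be reproduced with a minimal Python sketch. The helper name bitwise and its string-based interface are assumptions made for illustration; they are not part of the notes.

```python
def bitwise(op, s, t):
    """Apply OR, AND, or XOR bit by bit to two equal-length, nonempty bit strings."""
    if len(s) != len(t):
        raise ValueError("bit strings must have the same length")
    x, y = int(s, 2), int(t, 2)          # interpret the strings as base 2 integers
    result = {"OR": x | y, "AND": x & y, "XOR": x ^ y}[op]
    return format(result, "0{}b".format(len(s)))  # pad back to the original length


print(bitwise("OR", "010111", "110000"))
print(bitwise("AND", "010111", "110000"))
print(bitwise("XOR", "010111", "110000"))
```

Padding the result to the input length keeps leading zeros, which matter for bit strings but not for integers.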

7

Definition: The conditional statement is an implication, denoted p→q, and is false when p is true and q is false, otherwise it is true. In this case p is known as a hypothesis (or antecedent or premise) and q is known as the conclusion (or consequence).

Definition: The biconditional statement is a bi-implication, denoted p↔q, and is true if and only if p and q have the same truth table values.

Truth tables:

p    q    p→q  p↔q
T    T    T    T
T    F    F    F
F    T    T    F
F    F    T    T

8

We can compound logical operators to make complicated propositions. In general, using parentheses makes the expressions clearer, even though more symbols are used. However, there is a well defined operator precedence accepted in the field. Lower numbered operators take precedence over higher numbered operators.

Operator   Precedence
¬          1
∧          2
∨          3
→          4
↔          5

Examples:

¬p∧q = (¬p)∧q
p∧q∨r = (p∧q)∨r

9

Definition: A compound proposition that is always true is a tautology. One that is always false is a contradiction. One that is neither is a contingency.

Example:

p    ¬p   p∧¬p   p∨¬p
T    F    F      T
F    T    F      T

(p and ¬p are contingencies, p∧¬p is a contradiction, and p∨¬p is a tautology.)

Definition: Compound propositions p and q are logically equivalent if p↔q is a tautology and is denoted p≡q (sometimes written as p⇔q instead).

10

Theorem: ¬(p∨q) ≡ ¬p∧¬q.
Proof: Construct a truth table.

p    q    ¬(p∨q)   ¬p   ¬q   ¬p∧¬q
T    T    F        F    F    F
T    F    F        F    T    F
F    T    F        T    F    F
F    F    T        T    T    T

qed

Theorem: ¬(p∧q) ≡ ¬p∨¬q.
Proof: Construct a truth table similar to the previous theorem.

These two theorems are known as DeMorgan’s laws and can be extended to any number of propositions:

¬(p1∨p2∨…∨pk) ≡ ¬p1∧¬p2∧…∧¬pk

¬(p1∧p2∧…∧pk) ≡ ¬p1∨¬p2∨…∨¬pk

Theorem: p→q ≡ ¬p∨q.

11

Proof: Construct a truth table.

p    q    p→q   ¬p   ¬p∨q
T    T    T     F    T
T    F    F     F    F
F    T    T     T    T
F    F    T     T    T

qed

These proofs are concrete: each is proven using an exhaustive search of all possibilities. As the number of propositions grows, the number of possibilities grows like 2^k for k propositions.

The distributive laws are an example when k=3.
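An exhaustive truth-table proof is easy to mechanize. The following Python sketch enumerates all 2^k assignments and compares two compound propositions; the helper names implies and equivalent are assumptions introduced for this illustration.

```python
from itertools import product


def implies(p, q):
    """Truth value of the conditional: false only when p is true and q is false."""
    return (not p) or q


def equivalent(f, g, k):
    """Return True if f and g agree on all 2**k truth assignments of k variables."""
    return all(f(*row) == g(*row) for row in product([True, False], repeat=k))


# DeMorgan's law with k = 2 and a distributive law with k = 3.
print(equivalent(lambda p, q: not (p or q), lambda p, q: (not p) and (not q), 2))
print(equivalent(lambda p, q, r: p or (q and r),
                 lambda p, q, r: (p or q) and (p or r), 3))
```

The cost is 2^k evaluations, which is why this style of proof only works for a small number of propositions.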

12

Theorem: p∨(q∧r) ≡ (p∨q)∧(p∨r).
Proof: Construct a truth table.

p    q    r    p∨(q∧r)   p∨q   p∨r   (p∨q)∧(p∨r)
T    T    T    T         T     T     T
T    T    F    T         T     T     T
T    F    T    T         T     T     T
T    F    F    T         T     T     T
F    T    T    T         T     T     T
F    T    F    F         T     F     F
F    F    T    F         F     T     F
F    F    F    F         F     F     F

qed

Theorem: p∧(q∨r) ≡ (p∧q)∨(p∧r).
Proof: Construct a truth table similar to the previous theorem.

13

Some well known logical equivalences include the following laws:

Equivalence                                      Law
p∧T ≡ p,  p∨F ≡ p                                Identity
p∨T ≡ T,  p∧F ≡ F                                Domination
p∨p ≡ p,  p∧p ≡ p                                Idempotent
¬(¬p) ≡ p                                        Double negation
p∨¬p ≡ T,  p∧¬p ≡ F                              Negation
p∨q ≡ q∨p,  p∧q ≡ q∧p                            Commutative
(p∨q)∨r ≡ p∨(q∨r),  (p∧q)∧r ≡ p∧(q∧r)            Associative
p∨(q∧r) ≡ (p∨q)∧(p∨r),  p∧(q∨r) ≡ (p∧q)∨(p∧r)    Distributive

14

Equivalence                                      Law
¬(p∧q) ≡ ¬p∨¬q,  ¬(p∨q) ≡ ¬p∧¬q                  DeMorgan
p∨(p∧q) ≡ p,  p∧(p∨q) ≡ p                        Absorption

All of these laws can be proven concretely using truth tables. It is a good exercise to see if you can prove some.

15

Well known logical equivalences involving conditional statements:

p→q ≡ ¬p∨q
p→q ≡ ¬q→¬p
p∨q ≡ ¬p→q
p∧q ≡ ¬(p→¬q)
¬(p→q) ≡ p∧¬q
(p→q)∧(p→r) ≡ p→(q∧r)
(p→r)∧(q→r) ≡ (p∨q)→r
(p→q)∨(p→r) ≡ p→(q∨r)
(p→r)∨(q→r) ≡ (p∧q)→r

Well known logical equivalences involving biconditional statements:

p↔q ≡ (p→q)∧(q→p)
p↔q ≡ ¬p↔¬q
p↔q ≡ (p∧q)∨(¬p∧¬q)
¬(p↔q) ≡ p↔¬q

16

Propositional logic is pretty limited. Almost anything you really are interested in requires a more sophisticated form of logic: predicate logic with quantifiers (or predicate calculus).

Definition: P(x) is a propositional function if substituting a specific value for x in P(x) gives us a proposition. The part of the expression referring to x is known as the predicate.

Examples:

P(x): x > 24. P(2) = F, P(102) = T.
P(x): x = y + 1. P(x) = T for one value only (y is an unbound variable).
P(x,y): x = y + 1. P(2,1) = T, P(102,-14) = F.

Definition: A statement of the form P(x1,x2,…,xn) is the value of the propositional function P at the n-tuple (x1,x2,…,xn). P is also known as an n-place (or n-ary) predicate.

17

Definition: The universal quantification of P(x) is the statement P(x) is true for all values of x in some domain, denoted by ∀x P(x).

Definition: The existential quantification of P(x) is the statement P(x) is true for at least one value of x in some domain, denoted by ∃x P(x).

Definition: The uniqueness quantification of P(x) is the statement P(x) is true for exactly one value of x in some domain, denoted by ∃!x P(x).

There is an infinite number of quantifiers that can be constructed, but the three above are among the most important and common.

Examples: Assume x belongs to the real numbers.

∀x<0 (x2 > 0). The negative real numbers form the domain.
∃!x (x1223 = 0).

∀ and ∃ have higher precedence than the logical operators.

18

Example: ∀x P(x)∨Q(x) means (∀x P(x))∨Q(x).

Definition: When a variable is used in a quantification, it is said to be bound. Otherwise the variable is free.

Example: x (x = y + 1).

Definition: Statements involving predicates and quantifiers are logically equivalent if and only if they have the same truth value independent of which predicates are substituted and which domains are used. Notation: S ≡ T.

DeMorgan’s Laws for Negation:

¬∀x P(x) ≡ ∃x ¬P(x).
¬∃x P(x) ≡ ∀x ¬P(x).

Nested quantifiers simply means that more than one quantifier appears in a statement. The order of quantifiers is important.

19

Examples: Assume x and y belong to the real numbers.

∀x∃y (x + y = 0).
∀x∀y ((x < 0)∧(y > 0) → xy < 0).

Quantification of two variables:

Statement       When True?                                          When False?
∀x∀y P(x,y)     For all x and y, P(x,y)=T.                          There is a pair of x and y such that P(x,y)=F.
∀x∃y P(x,y)     For all x there is a y such that P(x,y)=T.          There is an x such that for all y, P(x,y)=F.
∃x∀y P(x,y)     There is an x such that for all y, P(x,y)=T.        For all x there is a y such that P(x,y)=F.
∃x∃y P(x,y)     There is a pair x and y such that P(x,y)=T.         For all x and y, P(x,y)=F.
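Over a finite domain, nested quantifiers map directly onto nested all() and any() calls in Python. The sketch below uses a small integer domain as an assumption (the examples above use the reals, which cannot be enumerated) and shows that swapping the order of the quantifiers changes the truth value.

```python
# Hypothetical finite domain standing in for the real numbers.
domain = range(-3, 4)

# "For all x there is a y such that x + y = 0": choose y = -x for each x.
forall_exists = all(any(x + y == 0 for y in domain) for x in domain)

# "There is an x such that for all y, x + y = 0": no single x works for every y.
exists_forall = any(all(x + y == 0 for y in domain) for x in domain)

print(forall_exists, exists_forall)
```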

Rules of Inference are used instead of truth tables in many instances. For n variables, there are 2^n rows in a truth table, which gets out of hand quickly.

20

Definition: A propositional logic argument is a sequence of propositions. The last proposition is the conclusion. The earlier ones are the premises. An argument is valid if the truth of the premises implies the truth of the conclusion.

Definition: A propositional logic argument form is a sequence of compound propositions involving propositional variables. An argument form is valid if no matter what particular propositions are substituted for the proposition variables in its premises, the conclusion remains true if the premises are all true.

Translation: An argument form with premises p1, p2, …, pn and conclusion q is valid when (p1∧p2∧…∧pn) → q is a tautology.

21

There are eight basic rules of inference.

Rule (premises ∴ conclusion)    Tautology                      Name
p, p→q ∴ q                      [p∧(p→q)] → q                  Modus ponens
¬q, p→q ∴ ¬p                    [¬q∧(p→q)] → ¬p                Modus tollens
p→q, q→r ∴ p→r                  [(p→q)∧(q→r)] → (p→r)          Hypothetical syllogism
p∨q, ¬p ∴ q                     [(p∨q)∧¬p] → q                 Disjunctive syllogism
p ∴ p∨q                         p → (p∨q)                      Addition

22

Rule (premises ∴ conclusion)    Tautology                      Name
p∧q ∴ p                         (p∧q) → p                      Simplification
p, q ∴ p∧q                      [(p)∧(q)] → (p∧q)              Conjunction
p∨q, ¬p∨r ∴ q∨r                 [(p∨q)∧(¬p∨r)] → (q∨r)         Resolution

23

Rules of Inference for Quantified Statements:

Rule of Inference (premises ∴ conclusion)                                       Name
∀x P(x) ∴ P(c)                                                                  Universal instantiation
P(c) for an arbitrary c ∴ ∀x P(x)                                               Universal generalization
∀x (P(x)→Q(x)), P(a), where a is a particular element in the domain ∴ Q(a)      Universal modus ponens
∀x (P(x)→Q(x)), ¬Q(a), where a is a particular element in the domain ∴ ¬P(a)    Universal modus tollens
∃x P(x) ∴ P(c) for some c                                                       Existential instantiation
P(c) for some c ∴ ∃x P(x)                                                       Existential generalization

Sets, Functions, Sequences, and Sums

24

Definition: A set is a collection of unordered elements.

Examples:

Z = {…, -3, -2, -1, 0, 1, 2, 3, …}
N = {1, 2, 3, …} and N0 = {0, 1, 2, 3, …} (slightly different than the text)
Q = {p/q | p,q∈Z, q≠0}
R = {reals}

Definition: The cardinality of a set S is denoted |S|. If |S| = n, where n∈Z, then the set S is a finite set. Otherwise it is an infinite set (|S| = ∞).

Example: The cardinality of Z, N, N0, Q, and R is infinite.

Definition: If S is finite or |S| = |N|, then S is a countable set. Otherwise it is an uncountable set.

25

Examples:

Q is countable. R is uncountable.

Definition: Two sets S and T are equal, denoted S = T, if and only if ∀x (x∈S ↔ x∈T).

Examples:

Let S = {0, 1, 2} and T = {2, 0, 1}. Then S = T. Order does not count.
Let S = {0, 1, 2} and T = {0, 1, 3}. Then S ≠ T. Only the elements count.

Definition: The empty set is denoted by ∅. Note that ∀S (∅⊆S).

26

Definition: A set S is a subset of a set T if ∀x∈S (x∈T) and is denoted S⊆T. S is a proper subset of T if S⊆T, but S≠T and is denoted S⊂T.

Example: S = {1, 0} and T = {0, 1, 2}. Then S⊂T.

Theorem: ∀S (S⊆S).
Proof: By definition, ∀x∈S (x∈S).

27

Definition: The Power Set of a set S, denoted P(S), is the set of all possible subsets of S.

Theorem: If |S| = n, then |P(S)| = 2^n.

Example: S = {0, 1}. Then P(S) = {∅, {0}, {1}, {0,1}}.

Definition: The Cartesian product of n sets Ai is defined by ordered elements from the Ai and is denoted A1×A2×…×An = {(a1,a2,…,an) | ai∈Ai}.

Example: Let S = {0, 1} and T = {a, b}. Then S×T = {(0,a), (0,b), (1,a), (1,b)}.

Definition: The union of n sets Ai is defined by $\bigcup_{i=1}^{n} A_i$ = A1∪A2∪…∪An = {x | ∃i x∈Ai}.

Definition: The intersection of n sets Ai is defined by $\bigcap_{i=1}^{n} A_i$ = A1∩A2∩…∩An = {x | ∀i x∈Ai}.

28

Definition: n sets Ai are disjoint if A1∩A2∩…∩An = ∅.

Definition: The complement of set S with respect to T, denoted T−S, is defined by T−S = {x∈T | x∉S}. T−S is also called the difference of T and S.

Definitions: The universal set is denoted U. The universal complement of S is $\overline{S}$ = U−S.

29

Examples:

Let S = {1, 0} and T = {0, 1, 2}. Then
  o S⊂T.
  o S∩T = S.
  o S∪T = T.
  o T−S = {2}.
  o Let U = N0. Then $\overline{S}$ = {2, 3, …}.

Let S = {0, 1} and T = {2, 3}. Then
  o S⊄T.
  o S∩T = ∅.
  o S∪T = {0, 1, 2, 3}.
  o T−S = {2, 3}.
  o Let U = R. Then $\overline{S}$ is the set of all reals except the integers 0 and 1, i.e., $\overline{S}$ = {x∈R | x≠0 ∧ x≠1}.
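Python's built-in set type mirrors most of these definitions, so the finite examples can be checked directly. This is a sketch only; the power_set helper is an assumption written for this illustration and is not a standard library function.

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of s, each represented as a frozenset."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


S = {1, 0}
T = {0, 1, 2}

print(S < T)        # proper subset test
print(S & T == S)   # intersection
print(S | T == T)   # union
print(T - S)        # difference of T and S
print(len(power_set({0, 1})))  # 2**n subsets for a set with n elements
```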

30

The textbook has a large number of set identities in a table.

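Identity                                                                        Law(s)
A∪∅ = A,  A∩U = A                                                               Identity
A∪U = U,  A∩∅ = ∅                                                               Domination
A∪A = A,  A∩A = A                                                               Idempotent
$\overline{\overline{A}} = A$                                                   Complementation
A∪B = B∪A,  A∩B = B∩A                                                           Commutative
A∪(B∪C) = (A∪B)∪C,  A∩(B∩C) = (A∩B)∩C                                           Associative
A∩(B∪C) = (A∩B)∪(A∩C),  A∪(B∩C) = (A∪B)∩(A∪C)                                   Distributive
$\overline{A \cup B} = \overline{A} \cap \overline{B}$,  $\overline{A \cap B} = \overline{A} \cup \overline{B}$    DeMorgan
A∪(A∩B) = A,  A∩(A∪B) = A                                                       Absorption
$A \cup \overline{A} = U$,  $A \cap \overline{A} = \emptyset$                   Complement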

Many of these are simple to prove from very basic laws.

31

Definition: A function f: A→B maps a set A to a set B, denoted f(a) = b for a∈A and b∈B, where the mapping (or transformation) is unique.

Definition: If f: A→B, then

If ∀b∈B ∃a∈A (f(a) = b), then f is a surjective function or onto.
If f(a) = f(b) implies a = b for all a,b∈A, then f is one-to-one (1-1) or injective.
A function f is a bijection or a one-to-one correspondence if it is 1-1 and onto.

Definition: Let f: A→B. A is the domain of f. The minimal set B such that f: A→B is onto is the image of f.

Definitions: Some compound functions include

$\left(\sum_{i=1}^{n} f_i\right)(a) = \sum_{i=1}^{n} f_i(a)$. We can substitute + if we expand the summation.
$\left(\prod_{i=1}^{n} f_i\right)(a) = \prod_{i=1}^{n} f_i(a)$. We can substitute * if we expand the product.

32

Definition: The composition of n functions fi: Ai→Ai+1 is defined by (fn∘…∘f2∘f1)(a) = fn(…(f2(f1(a)))…), where a∈A1.

Definition: If f: A→B, then the inverse of f, denoted f^-1: B→A, exists if and only if ∀b∈B ∃a∈A (f(a) = b ∧ f^-1(b) = a).

Examples:

Let A = [0,1] ⊂ R, B = [0,2] ⊂ R.
  o f(a) = a^2 and g(a) = a+1. Then f+g: A→B and f*g: A→B.
  o f(a) = 2*a and g(a) = a-1. Then neither f+g: A→B nor f*g: A→B.

Let B = A = [0,1] ⊂ R.
  o f(a) = a^2 and g(a) = 1-a. Then f+g: A→A and f*g: A→A. Both compound functions are bijections.
  o f(a) = a^3 and g(a) = a^(1/3). Then g∘f(a): A→A is a bijection.

Let A = [-1, 1] and B = [0, 1]. Then
  o f(a) = a^3 and g(a) = {x>0 | x = a^(1/3)}. Then g∘f(a): A→B is onto.

Definition: The graph of a function f is {(a,f(a)) | a∈A}.

33

Example: A = {0, 1, 2, 3, 4, 5} and f(a) = a^2. Then

[Figure: (a) graph(f,A); (b) an approximation to graph(f,[0,5])]

34

Definitions: The floor and ceiling functions are defined by

⌊x⌋ = largest integer smaller or equal to x.
⌈x⌉ = smallest integer larger or equal to x.

Examples:

⌊2.99⌋ = 2, ⌈2.99⌉ = 3
⌊-2.99⌋ = -3, ⌈-2.99⌉ = -2

Definition: A sequence is a function from either N or a subset of N to a set A whose elements ai are the terms of the sequence.

Definitions: A geometric progression is a sequence of the form {ar^i, i=0, 1, …}. An arithmetic progression is a sequence of the form {a+id, i=0, 1, …}.

Translation: f(a,r,i) = ar^i and f(a,d,i) = a + id are the corresponding functions.

35

There are a number of interesting summations that have closed form solutions.

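Theorem: If a,r∈R, then

$$\sum_{i=0}^{n} a r^i = \begin{cases} (n+1)a, & \text{if } r = 1, \\ \dfrac{a r^{n+1} - a}{r - 1}, & \text{otherwise.} \end{cases}$$

Proof: If r = 1, then we are left summing a n+1 times. Hence, the r = 1 case is trivial. Suppose r ≠ 1. Let $S = \sum_{i=0}^{n} a r^i$. Then

$$\begin{aligned}
rS &= r \sum_{i=0}^{n} a r^i && \text{Substituting the formula for } S. \\
   &= \sum_{i=1}^{n+1} a r^i && \text{Simplifying.} \\
   &= \left(\sum_{i=0}^{n} a r^i\right) + \left(a r^{n+1} - a\right) && \text{Removing the } n+1 \text{ term and adding the } 0 \text{ term.} \\
   &= S + \left(a r^{n+1} - a\right) && \text{Substituting } S \text{ for the sum.}
\end{aligned}$$

Solve for S in rS = S + (ar^{n+1} − a) to get the desired formula. qed

Some other common summations with closed form solutions are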

36

Sum                                        Closed Form Solution
$\sum_{i=1}^{n} i$                         $n(n+1)/2$
$\sum_{i=1}^{n} i^2$                       $n(n+1)(2n+1)/6$
$\sum_{i=1}^{n} i^3$                       $n^2(n+1)^2/4$
$\sum_{i=0}^{\infty} x^i$, |x|<1           $(1-x)^{-1}$
$\sum_{i=1}^{\infty} i x^{i-1}$, |x|<1     $(1-x)^{-2}$

Proving some of these requires knowledge about limits. There are close ties to integral and differential calculus, which is no surprise since integration is summation taken to a limit.

Example: $\lim_{i\to\infty} x^i = 0$ when |x|<1. Using the Theorem on the previous page, we get the result for $\sum_{i=0}^{\infty} x^i$, |x|<1.
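The finite closed forms can be sanity checked numerically by comparing them against direct summation. The Python sketch below does this; the function names geometric_sum and check_power_sums are assumptions for illustration, and a numerical check is of course not a proof.

```python
def geometric_sum(a, r, n):
    """Closed form for the sum of a*r**i for i = 0..n."""
    if r == 1:
        return (n + 1) * a
    return (a * r ** (n + 1) - a) / (r - 1)


def check_power_sums(n):
    """Compare the closed forms for sums of i, i**2, and i**3 with direct sums."""
    idx = range(1, n + 1)
    return (sum(idx) == n * (n + 1) // 2
            and sum(i ** 2 for i in idx) == n * (n + 1) * (2 * n + 1) // 6
            and sum(i ** 3 for i in idx) == n ** 2 * (n + 1) ** 2 // 4)


print(geometric_sum(3, 2, 10) == sum(3 * 2 ** i for i in range(11)))
print(all(check_power_sums(n) for n in range(1, 50)))
```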

37

Definition: Let f and g be functions from either Z or R to R. Then f(x) is O(g(x)) if there are constants C and k such that |f(x)| ≤ C|g(x)| whenever x>k.

Pronunciation: f(x) is Big Oh of g(x).

Examples:

f(x) = x^2+2x is O(x^2).
  o When 0≤x≤1, x^2≤x, so 0 ≤ x^2+2x ≤ x+2x = 3x.
  o When x≥1, x≤x^2, so 0 ≤ x^2+2x ≤ x^2+2x^2 = 3x^2.

In general, $f(x) = \sum_{i=0}^{n} a_i x^i$ with a_n≠0 is O(x^n) when x≥1.

n! is O(n^n) when n≥1.
  o n! = 1·2·…·n ≤ n·n·…·n = n^n.

log(n!) is O(n log n) when n≥1.
  o log(n!) ≤ log(n^n) = n log(n).

log(n) is O(n) when n≥1.

Theorem: If f_i(x) is O(g_i(x)), for 1≤i≤n, then $\sum_{i=1}^{n} f_i(x)$ is O(max{|g_1(x)|, |g_2(x)|, …, |g_n(x)|}).

38

Proof: Let g(x) = max{|g_1(x)|, |g_2(x)|, …, |g_n(x)|} and C_i the constants associated with O(g_i(x)). Then

$$\left|\sum_{i=1}^{n} f_i(x)\right| \le \sum_{i=1}^{n} C_i |g_i(x)| \le \sum_{i=1}^{n} C_i |g(x)| = |g(x)| \sum_{i=1}^{n} C_i = C|g(x)|.$$

Theorem: If f_i(x) is O(g_i(x)), for 1≤i≤n, then $\prod_{i=1}^{n} f_i(x)$ is O($\prod_{i=1}^{n} g_i(x)$).

Proof: Let g(x) = |g_1(x)||g_2(x)|…|g_n(x)| and C_i the constants associated with O(g_i(x)). Then

$$\left|\prod_{i=1}^{n} f_i(x)\right| \le \prod_{i=1}^{n} C_i |g_i(x)| = C \prod_{i=1}^{n} |g_i(x)|, \quad \text{where } C = \prod_{i=1}^{n} C_i.$$

39

Definition: Let f and g be functions from either Z or R to R. Then f(x) is Ω(g(x)) if there are constants C and k such that |f(x)| ≥ C|g(x)| whenever x>k.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is Θ(g(x)) if f(x) = O(g(x)) and f(x) = Ω(g(x)). In this case, we say that f(x) is of order g(x).

Comment: f(x) = O(g(x)) notation is great in the limit, but does not always provide the right bounds for all values of x. Ω, denoted Big Omega, is used to provide lower bounds. Θ, denoted Big Theta, is used to provide both lower and upper bounds.

Example: $f(x) = \sum_{i=0}^{n} a_i x^i$ with a_n≠0 is of order x^n.

40

Notation: Timing, as a function of the number of elements falls into the field of Complexity.

Complexity            Terminology
Θ(1)                  Constant
Θ(log(n))             Logarithmic
Θ(n)                  Linear
Θ(n log(n))           n log(n)
Θ(n^k)                Polynomial
Θ(n^k log(n))         Polylog
Θ(k^n), where k>1     Exponential
Θ(n!)                 Factorial

Notation: Problems are tractable if they can be solved in polynomial time and are intractable otherwise.

41

Algorithms, Integers, and Matrices

Definition: An algorithm is a finite set of precise instructions for solving a problem.

Computational algorithms should have these properties:

Input: Values from a specified set.
Output: Results using the input from a specified set.
Definiteness: The steps in the algorithm are precise.
Correctness: The output produced from the input is the right solution.
Finiteness: The results are produced using a finite number of steps.
Effectiveness: Each step must be performable and in a finite amount of time.
Generality: The procedure should accept all input from the input set, not just special cases.

42

Algorithm: Find the maximum value of $\{a_i\}_{i=1}^{n}$, where n is finite.

procedure max($\{a_i\}_{i=1}^{n}$: integers)
  max := a_1
  for i := 2 to n
    if max < a_i then max := a_i
{max is the largest element}

Proof of correctness: We use induction.
1. Suppose n = 1, then max := a_1, which is the correct result.
2. Suppose the result is true for k = 1, 2, …, i-1. Then at step i, we know that max is the largest element in a_1, a_2, …, a_{i-1}. In the if statement, either max is already at least as large as a_i or it is set to a_i. Hence, max is the largest element in a_1, a_2, …, a_i. Since i was arbitrary, we are done. qed

This algorithm’s input and output are well defined and the overall algorithm can be performed in O(n) time since n is finite. There are no restrictions on the input set other than the elements are integers.
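The max procedure translates almost line for line into Python. This is a sketch under the assumption that the input is a nonempty list; the name find_max is chosen here to avoid shadowing Python's built-in max.

```python
def find_max(a):
    """Return the largest element of a nonempty list using one pass, O(n) time."""
    if not a:
        raise ValueError("the sequence must contain at least one element")
    largest = a[0]                # max := a_1
    for x in a[1:]:               # for i := 2 to n
        if largest < x:
            largest = x
    return largest


print(find_max([3, -1, 7, 7, 2]))
```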

43

Algorithm: Find a value in a sorted, distinct valued $\{a_i\}_{i=1}^{n}$, where n is finite.

There are many, many search algorithms.

procedure linear_search(x, $\{a_i\}_{i=1}^{n}$: integers)
  i := 1
  while (i≤n and x≠a_i)
    i := i + 1
  if i≤n then location := i else location := 0
{location is the subscript of $\{a_i\}_{i=1}^{n}$ equal to x or 0 if x is not in $\{a_i\}_{i=1}^{n}$}

We can prove that this algorithm is correct using an induction argument. This algorithm relies on neither distinctness nor sorted elements.

Linear search works, but it is very slow in comparison to many other searching algorithms. It takes 2n+2 comparisons in the worst case, i.e., O(n) time.
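A direct Python rendering of linear_search is sketched below. It keeps the 1-based location convention of the pseudocode (0 means not found), which is an unusual choice in Python and is kept here only to match the notes.

```python
def linear_search(x, a):
    """Return the 1-based position of x in list a, or 0 if x is not present."""
    i = 1
    while i <= len(a) and x != a[i - 1]:   # a[i - 1] is a_i in the notes
        i += 1
    return i if i <= len(a) else 0


print(linear_search(7, [4, 7, 1, 9]))
print(linear_search(5, [4, 7, 1, 9]))
```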

44

procedure binary_search(x, $\{a_i\}_{i=1}^{n}$: integers)
  i := 1
  j := n
  while (i < j)
    m := ⌊(i+j)/2⌋
    if x > a_m then i := m+1 else j := m
  if x = a_i then location := i else location := 0
{location is the subscript of $\{a_i\}_{i=1}^{n}$ equal to x or 0 if x is not in $\{a_i\}_{i=1}^{n}$}

We can prove that this algorithm is correct using an induction argument.

This algorithm is much, much faster than linear_search on average. It is O(log n) in time. The average time to find a member of $\{a_i\}_{i=1}^{n}$ can be proven to be of order log n.
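The binary_search procedure can be sketched in Python as follows. It assumes the list is sorted in increasing order and, like the pseudocode, returns a 1-based location with 0 meaning not found; the empty-list guard is an addition made here.

```python
def binary_search(x, a):
    """Return the 1-based position of x in sorted list a, or 0 if x is absent."""
    if not a:
        return 0
    i, j = 1, len(a)
    while i < j:
        m = (i + j) // 2          # floor of the midpoint
        if x > a[m - 1]:          # a[m - 1] is a_m in the notes
            i = m + 1             # x can only be in the upper half
        else:
            j = m                 # x, if present, is at position m or below
    return i if a[i - 1] == x else 0


data = [1, 3, 5, 7, 9, 11]
print(binary_search(7, data), binary_search(4, data))
```

Each pass halves the interval [i, j], so the loop runs about log2(n) times.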

45

Algorithm: Sort the distinct valued $\{a_i\}_{i=1}^{n}$ into increasing order, where n is finite.

There are many, many sorting algorithms.

procedure bubble_sort($\{a_i\}_{i=1}^{n}$: reals, n≥1)
  for i := 1 to n-1
    for j := 1 to n-i
      if a_j > a_{j+1} then swap a_j and a_{j+1}
{$\{a_i\}_{i=1}^{n}$ is in increasing order}

This is one of the simplest sorting algorithms. It is expensive, but quite easy to understand and implement. Only one temporary is needed for the swapping and two loop variables as extra storage. The worst case time is O(n^2).
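A Python sketch of bubble_sort follows. It sorts a copy of the input rather than sorting in place, which is a choice made here for easy testing and is not something the notes specify.

```python
def bubble_sort(a):
    """Return a new list with the elements of a in increasing order, O(n^2) time."""
    a = list(a)                       # work on a copy
    n = len(a)
    for i in range(1, n):             # for i := 1 to n-1
        for j in range(0, n - i):     # for j := 1 to n-i, shifted to 0-based
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]   # swap a_j and a_{j+1}
    return a


print(bubble_sort([3.5, -1.0, 2.0, 8.25, 0.0]))
```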

46

procedure insertion_sort($\{a_i\}_{i=1}^{n}$: reals, n≥1)
  for j := 2 to n
    i := 1
    while a_j > a_i
      i := i + 1
    t := a_j
    for k := 0 to j-i-1
      a_{j-k} := a_{j-k-1}
    a_i := t
{$\{a_i\}_{i=1}^{n}$ is in increasing order}

This is not a very efficient sorting algorithm either. However, it is easy to see that at the jth step the jth element is put into the correct spot. The worst case time is O(n^2). In fact, insertion_sort is trivially slower than bubble_sort.
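The insertion_sort pseudocode can be followed step by step in Python. As a sketch, the version below keeps the same three phases (find the slot, shift, insert) with 0-based indices and returns a sorted copy, which is an assumption of this illustration.

```python
def insertion_sort(a):
    """Return a new list with the elements of a in increasing order."""
    a = list(a)
    for j in range(1, len(a)):
        i = 0
        while a[j] > a[i]:            # stops at i == j at the latest
            i += 1
        t = a[j]
        for k in range(j - i):        # shift a[i..j-1] one place to the right
            a[j - k] = a[j - k - 1]
        a[i] = t                      # drop the saved element into its slot
    return a


print(insertion_sort([3, 1, 2, 5, 4]))
```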

47

Number theory is a rich field of mathematics. We will study four aspects briefly:

1. Integers and division
2. Primes and greatest common divisors
3. Integers and algorithms
4. Applications of number theory

Most of the theorems quoted in this part of the textbook require knowledge of mathematical induction to rigorously prove, a topic covered in detail in the next chapter.

48

Theorem (Division Algorithm): Let a,d∈Z (d > 0). Then ∃!q,r∈Z (a = dq+r and 0 ≤ r < d).

Definition: In the division algorithm, a is the dividend, d is the divisor, q is the quotient, and r is the remainder. We write q = a div d and r = a mod d.

Examples:

Consider 101 divided by 9: 101 = 11·9 + 2.
Consider -11 divided by 3: -11 = 3(-4) + 1.

Definition: Let a,b,m∈Z (m > 0). Then a is congruent to b modulo m if m | (a-b), denoted a ≡ b (mod m). The set of integers congruent to an integer a modulo m is called the congruence class of a modulo m.

Theorem: Let a,b,m∈Z (m > 0). Then a ≡ b (mod m) if and only if a mod m = b mod m.

50

Examples:

Does 17 ≡ 5 (mod 6)? Yes, since 17 – 5 = 12 and 6 | 12.
Does 24 ≡ 14 (mod 6)? No, since 24 – 14 = 10, which is not divisible by 6.

Theorem: Let a,b,m∈Z (m > 0). Then a ≡ b (mod m) if and only if ∃k∈Z (a = b+km).

Proof: If a ≡ b (mod m), then m | (a-b). So, there is a k such that a-b = km, or a = b+km. Conversely, if there is a k such that a = b + km, then km = a-b. Hence, m | (a-b), or a ≡ b (mod m).

Theorem: Let a,b,c,d,m∈Z (m > 0). If a ≡ b (mod m) and c ≡ d (mod m), then a+c ≡ b+d (mod m) and ac ≡ bd (mod m).

Corollary: Let a,b,m∈Z (m > 0). Then (a+b) mod m = ((a mod m)+(b mod m)) mod m and (ab) mod m = ((a mod m)(b mod m)) mod m.

Some applications involving congruence include

51

Hashing functions: h(k) = k mod m.
Pseudorandom numbers: x_{n+1} = (a·x_n + c) mod m.
  o c = 0 is known as a pure multiplicative generator.
  o c ≠ 0 is known as a linear congruential generator.
Cryptography

Definition: A positive integer a > 1 is a prime if it is divisible only by 1 and a; otherwise it is a composite.

Fundamental Theorem of Arithmetic: Every positive integer greater than 1 can be written uniquely as a prime or the product of two or more primes where the prime factors are written in nondecreasing order.

Theorem: If a is a composite number, then a has a prime divisor less than or equal to √a.

Theorem: There are infinitely many primes.

52

Prime Number Theorem: The ratio of the number of primes not exceeding a to a/ln(a) approaches 1 as a→∞.

Example: The odds of a randomly chosen positive integer n being prime are given by (n/ln(n))/n = 1/ln(n) asymptotically.

There are still a number of open questions regarding the distribution of primes.

Definition: Let a,b∈Z (a and b not both 0). The largest integer d such that d | a and d | b is the greatest common divisor of a and b, denoted by gcd(a,b).

Example: gcd(24,36) = 12.

Definition: The integers a and b are relatively prime if gcd(a,b) = 1.

53

Definition: The integers $\{a_i\}_{i=1}^{n}$ are pairwise relatively prime if gcd(a_i,a_j) = 1 whenever 1≤i<j≤n.

Examples:

{10, 17, 121} are pairwise relatively prime.
{10, 19, 124} are not pairwise relatively prime.

Definition: The least common multiple of positive integers a and b is the smallest positive integer that is divisible by both a and b, denoted lcm(a,b).

Theorem: Let a and b be positive integers. Then ab = gcd(a,b)·lcm(a,b).

54

Integers can be expressed uniquely in any base.

Theorem: Let b∈Z (b>1). If n∈N, then there is a unique expression such that n = a_k b^k + a_{k-1} b^{k-1} + … + a_1 b + a_0, where {a_i},k∈N0, a_k≠0, and 0≤a_i<b. n is written as n = (a_k a_{k-1} … a_1 a_0)_b.

Examples:

(123)_5 = 1·5^2 + 2·5 + 3 = (38)_10,
  o the base 5 digits are {0-4}.
(1011)_2 = (11)_10,
  o the binary digits are {0, 1}.
(F)_16 = (15)_10,
  o the hexadecimal digits are {0-9, A-F}.

Note: Common bases are 2 (binary), 8 (octal), 10 (decimal), and 16 (hexadecimal).

55

Algorithm: Constructing base b expansions.

procedure base_b_expansion(n: integer)
  q := n
  k := 0
  while q≠0
    a_k := q mod b
    q := ⌊q/b⌋
    k := k+1
{the base b expansion of n is (a_{k-1}a_{k-2}…a_1a_0)_b}
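As a sketch, the base b expansion procedure can be written in Python as below. The function returns the digits most significant first as a list of integers; that representation, and the handling of n = 0, are assumptions made for this illustration.

```python
def base_b_expansion(n, b):
    """Return the base b digits of a nonnegative integer n, most significant first."""
    if n < 0 or b < 2:
        raise ValueError("need n >= 0 and b >= 2")
    digits = []
    q = n
    while q != 0:
        digits.append(q % b)      # a_k := q mod b
        q //= b                   # q := floor(q / b)
    return digits[::-1] or [0]    # the loop produces the least significant digit first


print(base_b_expansion(38, 5))
print(base_b_expansion(11, 2))
```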

Examples: Converting between some bases is easier than others.

Base 2 to any base 2^k, k>1, is really easy. Just group k bits together and convert to the base 2^k symbol.

Base 10 to any base 2^k is a pain. Base 2^k to base 10 is also a pain.

56

Algorithm: Addition of integers

procedure add(a, b: integers)
  (a_{n-1}a_{n-2}…a_1a_0)_2 := base_2_expansion(a)
  (b_{n-1}b_{n-2}…b_1b_0)_2 := base_2_expansion(b)
  c := 0
  for j := 0 to n-1
    d := ⌊(a_j+b_j+c)/2⌋
    s_j := a_j+b_j+c – 2d
    c := d
  s_n := c
{the binary expansion of the sum is (s_n s_{n-1}…s_1s_0)_2}

Questions:

What is the complexity of this algorithm? Is this the fastest way to compute the sum?

57

Algorithm: Multiplication of integers

procedure multiply(a, b: integers)
  (a_{n-1}a_{n-2}…a_1a_0)_2 := base_2_expansion(a)
  (b_{n-1}b_{n-2}…b_1b_0)_2 := base_2_expansion(b)
  for j := 0 to n-1
    if b_j = 1 then c_j := a shifted j places else c_j := 0
  {c_0,c_1,…,c_{n-1} are the partial products}
  p := 0
  for j := 0 to n-1
    p := p + c_j
{p is the value of ab}

Examples:

(10)_2·(11)_2 = (110)_2. Note that there are more bits than the original integers.
(11)_2·(11)_2 = (1001)_2. Twice as many binary digits!

58

Algorithm: Compute div and mod

procedure division(a: integer, d: positive integer)
  q := 0
  r := |a|
  while r ≥ d
    r := r – d
    q := q + 1
  if a < 0 and r > 0 then
    r := d – r
    q := -(q + 1)
  else if a < 0 then
    q := -q
{q = a div d is the quotient and r = a mod d is the remainder}
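The repeated-subtraction procedure is sketched in Python below. One detail is made explicit as an assumption of this sketch: when a is negative and the remainder is 0, the quotient still has to be negated. The result should agree with Python's built-in divmod for positive d.

```python
def division(a, d):
    """Return (a div d, a mod d) with 0 <= remainder < d, by repeated subtraction."""
    if d <= 0:
        raise ValueError("the divisor must be positive")
    q, r = 0, abs(a)
    while r >= d:
        r -= d
        q += 1
    if a < 0:
        if r > 0:
            r = d - r          # move the remainder into the range 0..d-1
            q = -(q + 1)
        else:
            q = -q             # exact division of a negative dividend
    return q, r


print(division(101, 9), division(-11, 3))
```

This takes about |a|/d subtractions, so it illustrates the definition rather than an efficient method.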

Notes: The complexity of the multiplication algorithm is O(n^2). Much more efficient algorithms exist, including one that is O(n^1.585) using a divide and conquer technique we will see later in the course.

59

There are O(log(a)log(d)) complexity algorithms for division.

60

Modular exponentiation, b^k mod m, where b, k, and m are large integers, is important to compute efficiently in the field of cryptology.

Algorithm: Modular exponentiation

procedure modular_exponentiation(b: integer, k,m: positive integers)
  (a_{n-1}a_{n-2}…a_1a_0)_2 := base_2_expansion(k)
  y := 1
  power := b mod m
  for i := 0 to n-1
    if a_i = 1 then y := (y · power) mod m
    power := (power · power) mod m
{y = b^k mod m}

Note: The complexity is O((log(m))^2 log(k)) bit operations, which is fast.
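A Python sketch of modular exponentiation by repeated squaring is given below. Instead of building the binary expansion of k first, it reads the bits of k from least significant to most significant with shifts, which is an equivalent formulation chosen for brevity. Python's built-in three-argument pow computes the same value and is used here only as a cross-check.

```python
def modular_exponentiation(b, k, m):
    """Return b**k mod m for positive integers k and m using repeated squaring."""
    y = 1
    power = b % m
    while k > 0:
        if k & 1:                       # current binary digit a_i of k is 1
            y = (y * power) % m
        power = (power * power) % m     # b^(2^i) becomes b^(2^(i+1))
        k >>= 1                         # move to the next binary digit
    return y


print(modular_exponentiation(3, 644, 645) == pow(3, 644, 645))
```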

61

Euclidean Algorithm: Compute gcd(a,b)

procedure gcd(a,b: positive integers)
  x := a
  y := b
  while y≠0
    r := x mod y
    x := y
    y := r
{gcd(a,b) is x}

Correctness of this algorithm is based on

Lemma: Let a = bq+r, where a,b,q,r∈Z. Then gcd(a,b) = gcd(b,r).

The complexity will be studied after we master mathematical induction.
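The Euclidean algorithm is short enough to sketch directly in Python. The lcm helper is an addition for illustration; it relies on the theorem ab = gcd(a,b)·lcm(a,b) for positive integers stated earlier.

```python
def gcd(a, b):
    """Greatest common divisor of two positive integers by the Euclidean algorithm."""
    x, y = a, b
    while y != 0:
        x, y = y, x % y     # gcd(x, y) = gcd(y, x mod y) by the lemma
    return x


def lcm(a, b):
    """Least common multiple via ab = gcd(a,b) * lcm(a,b)."""
    return a * b // gcd(a, b)


print(gcd(24, 36), lcm(24, 36))
```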

62

Number theory useful results

Theorem: If a,b∈N then ∃s,t∈Z (gcd(a,b) = sa+tb).

Lemma: If a,b,c∈N, gcd(a,b) = 1, and a | bc, then a | c.

Note: This lemma makes proving the prime factorization theorem doable.

Lemma: If p is a prime and p | a_1a_2…a_n where each a_i∈Z, then p | a_i for some i.

Theorem: Let m∈N and let a,b,c∈Z. If ac ≡ bc (mod m) and gcd(c,m) = 1, then a ≡ b (mod m).

Definition: A linear congruence is a congruence of the form ax ≡ b (mod m), where m∈N, a,b∈Z, and x is a variable.

Definition: An inverse of a modulo m is an ā such that āa ≡ 1 (mod m).

63

Theorem: If a and m are relatively prime integers and m>1, then an inverse of a modulo m exists and is unique modulo m.

Proof: Since gcd(a,m) = 1, ∃s,t∈Z (1 = sa+tm). Hence, sa+tm ≡ 1 (mod m). Since tm ≡ 0 (mod m), it follows that sa ≡ 1 (mod m). Thus, s is an inverse of a modulo m. The uniqueness argument is made by assuming there are two inverses and proving this is a contradiction.

Systems of linear congruences are used in large integer arithmetic. The basis for the arithmetic goes back to China 1700 years ago.

Puzzle (Sun Tzu or Sun Zi): There are certain things whose number is unknown.

When divided by 3, the remainder is 2,
when divided by 5, the remainder is 3, and
when divided by 7, the remainder is 2.

What will be the number of things? (Answer: 23… stay tuned why).

64

Chinese Remainder Theorem: Let m_1, m_2, …, m_n∈N be pairwise relatively prime. Then the system x ≡ a_i (mod m_i) has a unique solution modulo $m = \prod_{i=1}^{n} m_i$.

Existence Proof: The proof is by construction. Let M_k = m / m_k, 1≤k≤n. Then gcd(M_k, m_k) = 1 (from the pairwise relatively prime condition). By the previous theorem we know that there is a y_k which is an inverse of M_k modulo m_k, i.e., M_k y_k ≡ 1 (mod m_k). To construct the solution, form the sum

x = a_1M_1y_1 + a_2M_2y_2 + … + a_nM_ny_n.

Note that M_j ≡ 0 (mod m_k) whenever j≠k. Hence,

x ≡ a_kM_ky_k ≡ a_k (mod m_k), 1≤k≤n.

We have shown that x is a simultaneous solution to the n congruences. qed

65

Sun Tzu’s Puzzle: The a_k∈{2, 3, 2} from 2 pages earlier. Next

m_k∈{3, 5, 7}, m = 3·5·7 = 105, and M_k = m/m_k∈{35, 21, 15}.

The inverses y_k are

1. y_1 = 2 (M_1 = 35 modulo 3).
2. y_2 = 1 (M_2 = 21 modulo 5).
3. y_3 = 1 (M_3 = 15 modulo 7).

The solutions to this system are those x such that

x ≡ a_1M_1y_1 + a_2M_2y_2 + a_3M_3y_3 = 2·35·2 + 3·21·1 + 2·15·1 = 233

Finally, 233 ≡ 23 (mod 105).
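The constructive proof doubles as an algorithm. The Python sketch below follows it term by term; the function name chinese_remainder is an assumption, and the sketch relies on math.prod and on pow with exponent -1 for modular inverses, both of which require Python 3.8 or later.

```python
from math import prod


def chinese_remainder(a, m):
    """Solve x = a[k] (mod m[k]) for pairwise relatively prime moduli m."""
    M = prod(m)                       # m = m_1 * m_2 * ... * m_n
    x = 0
    for a_k, m_k in zip(a, m):
        M_k = M // m_k                # M_k = m / m_k
        y_k = pow(M_k, -1, m_k)       # inverse of M_k modulo m_k
        x += a_k * M_k * y_k
    return x % M                      # unique solution modulo m


print(chinese_remainder([2, 3, 2], [3, 5, 7]))
```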

66

Definition: An m×n matrix is a rectangular array of numbers with m rows and n columns. The elements of a matrix A are denoted by A_ij or a_ij. A matrix with m = n is a square matrix. If two matrices A and B have the same number of rows and columns and all of the elements A_ij = B_ij, then A = B.

Definition: The transpose of an m×n matrix A = [A_ij], denoted A^T, is A^T = [A_ji]. A matrix is symmetric if A = A^T and skew symmetric if A = -A^T.

Definition: The ith row of an m×n matrix A is [A_i1, A_i2, …, A_in]. The jth column is [A_1j, A_2j, …, A_mj]^T.

Definition: Matrix arithmetic is not exactly the same as scalar arithmetic:

C = A + B: c_ij = a_ij + b_ij, where A and B are m×n.
C = A – B: c_ij = a_ij - b_ij, where A and B are m×n.
C = AB: $c_{ij} = \sum_{p=1}^{k} a_{ip} b_{pj}$, where A is m×k, B is k×n, and C is m×n.

Theorem: A+B = B+A, but AB ≠ BA in general.

67

Definition: The identity matrix I_n is n×n with I_ii = 1 and I_ij = 0 if i≠j.

Theorem: If A is n×n, then AI_n = I_nA = A.

Definition: A^r = AA⋯A (r times).

Definition: Zero-One matrices are matrices A = [a_ij] such that all a_ij∈{0, 1}. Boolean operations are defined on m×n zero-one matrices A = [a_ij] and B = [b_ij] by

Meet of A and B: A∧B = [a_ij∧b_ij], 1≤i≤m and 1≤j≤n.
Join of A and B: A∨B = [a_ij∨b_ij], 1≤i≤m and 1≤j≤n.
The Boolean product of A and B is C = A⊙B, where A is m×k, B is k×n, and C is m×n, defined by c_ij = (a_i1∧b_1j)∨(a_i2∧b_2j)∨…∨(a_ik∧b_kj).

Definition: The Boolean power of an n×n matrix A is defined by A^[r] = A⊙A⊙…⊙A (r times), where A^[0] = I_n.
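The Boolean product has the same shape as the ordinary matrix product with ∧ in place of multiplication and ∨ in place of addition. A minimal Python sketch using nested lists of 0s and 1s follows; the list-of-lists representation and the function name are assumptions for illustration.

```python
def boolean_product(A, B):
    """Boolean product of an m-by-k and a k-by-n zero-one matrix (lists of lists)."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions must agree")
    n = len(B[0])
    # c_ij is the OR over p of (a_ip AND b_pj).
    return [[int(any(row[p] and B[p][j] for p in range(k))) for j in range(n)]
            for row in A]


A = [[1, 0], [0, 1], [1, 0]]
B = [[1, 1, 0], [0, 1, 1]]
print(boolean_product(A, B))
```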

Induction and Recursion

68

Principle of Mathematical Induction: Given a propositional function P(n), n∈N, we prove that P(n) is true for all n∈N by verifying
1. (Basis) P(1) is true.
2. (Induction) P(k)→P(k+1), ∀k∈N.

Notes:

Equivalent to [P(1) ∧ ∀k∈N (P(k)→P(k+1))] → ∀n∈N P(n).
We do not actually assume P(k) is true. It is shown that if it is assumed that P(k) is true, then P(k+1) is also true. This is a subtle grammatical point with mathematical implications.
Mathematical induction is a form of deductive reasoning, not inductive reasoning. The latter tries to make conclusions based on observations and rules that may lead to false conclusions.
Sometimes P(1) is not the basis, but some other P(k), k∈Z.
Sometimes P(k) is for a (possibly infinite) subset of N or Z.
Sometimes P(k-1)→P(k) is easier to prove than P(k)→P(k+1).
Being flexible, but staying within the guiding principle usually works.

69

There are many ways of proving false results using subtly wrong induction arguments. Usually there is a disconnect between the basis and induction parts of the proof.

Examples 10, 11, and 12 in your textbook are worth studying until you really understand each.

Lemma: $\sum_{i=1}^{n} (2i-1) = n^2$ (sum of odd numbers).

Proof: (Basis) Take k = 1, so 1 = 1.
(Induction) Assume 1+3+5+…+(2k-1) = k^2 for an arbitrary k ≥ 1. Add 2k+1 to both sides. Then (1+3+5+…+(2k-1))+(2k+1) = k^2+(2k+1) = (k+1)^2.

70

Lemma: $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$.

Proof: (Basis) Take k = 0, so $2^0 = 1 = 2^1 - 1$.
(Induction) Assume $\sum_{i=0}^{k} 2^i = 2^{k+1} - 1$ for an arbitrary k ≥ 0. Add $2^{k+1}$ to both sides. Then

$\sum_{i=0}^{k} 2^i + 2^{k+1} = 2^{k+1} - 1 + 2^{k+1}$,

which simplifies to $\sum_{i=0}^{k+1} 2^i = 2^{k+2} - 1$.

Principle of Strong Induction: Given a propositional function P(n), n∈N, we prove that P(n) is true for all n∈N by verifying
1. (Basis) P(1) is true.
2. (Induction) [P(1)∧P(2)∧…∧P(k)]→P(k+1) is true ∀k∈N.

71

Example: Infinite ladder with reachable rungs. For mathematical or strong induction, we need to verify the following:

Step | Mathematical | Strong
Basis | We can reach the first rung. | We can reach the first rung.
Induction | If we can reach an arbitrary rung k, then we can reach rung k+1. | ∀k∈N, if we can reach all of the first k rungs, then we can reach rung k+1.

We cannot prove that you can climb an infinite ladder using mathematical induction. Using strong induction, however, you can prove this result using a trick: since you can prove that you can climb to rungs 1, 2, …, k, it follows that you can climb 2 rungs arbitrarily, which gets you from rung k-1 to rung k+1.

Rule of thumb: Always use mathematical induction if P(k)→P(k+1) can be shown ∀k∈N. Only resort to strong induction when that fails.

72

Fundamental Theorem of Arithmetic: Every n∈N (n>1) is the product of primes.

Proof: Let P(n) be the proposition that n can be written as the product of primes.
(Basis) P(2) is true: 2 = 2, the product of 1 prime.
(Induction) Assume P(j) is true ∀j with 2≤j≤k. We must verify that P(k+1) is true.

Case 1: k+1 is a prime. Hence, P(k+1) is true.
Case 2: k+1 is a composite. Hence k+1 = a•b, where 2≤a≤b<k+1. By the inductive hypothesis, P(a) and P(b) are both true. Hence, $a = \prod p_\alpha$ and $b = \prod p_\beta$, where the p’s are primes. It follows then that $k+1 = \prod p_\alpha \prod p_\beta$, so P(k+1) is true.

Principle of Modified Strong Induction: Given a propositional function P(n), n∈N, we prove that P(n) is true for all n∈N with n≥b by verifying
1. (Basis) P(b), P(b+1), …, P(b+j) are all true.
2. (Induction) [P(b)∧P(b+1)∧…∧P(k)]→P(k+1) is true ∀k∈N with k≥b+j.

73

Example: Every postage amount ≥ $.12 can be formed using $.04 and $.05 stamp combinations only. We can prove this using modified strong induction.
(Basis) Consider 4 specific cases:

Postage | Number of $.04’s | Number of $.05’s
$.12 | 3 | 0
$.13 | 2 | 1
$.14 | 1 | 2
$.15 | 0 | 3

Hence, P(j) is true for 12≤j≤15.
(Induction) Assume P(j) is true for 12≤j≤k and k≥15. By the inductive hypothesis, P(k-3) is true since k-3≥12. Hence, we can just add another $.04 stamp to the stamps for k-3 to form postage k+1.
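The induction argument is constructive, so it translates directly into a recursive procedure. The sketch below is only an illustration (the name stamps is hypothetical, and amounts are in cents): it uses the four basis cases and otherwise adds one 4-cent stamp to a solution for n-4.

```python
def stamps(n):
    """Return (fours, fives) with 4*fours + 5*fives == n, for n >= 12 cents."""
    basis = {12: (3, 0), 13: (2, 1), 14: (1, 2), 15: (0, 3)}
    if n < 12:
        raise ValueError("the result only holds for postage of 12 cents or more")
    if n in basis:
        return basis[n]
    fours, fives = stamps(n - 4)   # inductive hypothesis applied to n - 4
    return fours + 1, fives        # add another 4-cent stamp

print(stamps(23))
```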

Well Ordering Property: Every nonempty subset of N has a least element.

The validity of mathematical and strong induction is based on the well ordering property.

74

Definition: A recursive function is defined from
1. (Basis) Initial value f(0).
2. (Recursion) f(k), k>0, in terms of {f(j) | 0≤j<k} and other terms.

Examples:

f(0) = 1, f(n) = 2f(n-1)+4, n>0.
g(0) = 12, g(1) = 1, g(n) = 2g(n-1) – g(n-2), n>1.
h(0) = 1, h(n) = nh(n-1) = n!, n>0.
Fibonacci numbers: f0 = 0, f1 = 1, fn = fn-1 + fn-2, n>1.

n | 0 | 1 | 2 | 3 | 4
f(n) | 1 | 6 | 16 | 36 | 76
g(n) | 12 | 1 | -10 | -21 | -32
h(n) | 1 | 1 | 2 | 6 | 24
fn | 0 | 1 | 1 | 2 | 3

75

Theorem: Whenever n≥3, fn > α^(n-2), where α = (1+√5)/2.
The proof is by modified strong induction.

Lamé’s Theorem: Let a,b∈N (a≥b). Then the number of divisions used by the Euclidean algorithm to find gcd(a,b) is ≤ 5•(number of decimal digits in b).

We can recursively define sets, too, not just functions. There is a basis step and a recursion step with the possibility of an exclusion step.

Definition: The set Σ* of strings over an alphabet Σ is defined by
(Basis) λ∈Σ*, where λ is the empty string.
(Recursion) If w∈Σ* and x∈Σ, then wx∈Σ*.

Example: Σ = {0,1}. Then Σ* is the set of bit strings, which contains the binary representation of every element of N0.

76

Principle of Structured Induction:
1. (Basis) Show the result holds for all elements specified in the basis step of the recursive definition of the set.
2. (Induction) Show that if the statement is true for each element used to construct new elements in the recursive step of the definition, then the result holds for these new elements.

The validity of this approach comes from mathematical induction over N. First state that P(n) is true whenever n or fewer elements are used to generate an element. We must show that P(0) is true (i.e., the basis element). Now assume that P(k) is true for an arbitrary k. Hence, P(k+1) must be true, too, due to the recursion involving k or fewer elements.

77

Definition: A recursive algorithm solves a problem by reducing it to an instance of the same problem with smaller input(s).

Note: Recursive algorithms can be proven correct using mathematical induction or modified strong induction.

Examples:

n! = n•(n-1)!
a^n = a•(a^(n-1))
gcd(a,b) with a,b∈N (a<b).

procedure gcd(a,b: integers and a<b)
  if a = 0 then gcd(a,b) := b
  else gcd(a,b) := gcd(b mod a, a)
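The gcd pseudocode maps almost line for line onto Python. A minimal sketch, assuming nonnegative integer inputs with a ≤ b:

```python
def gcd(a, b):
    """Recursive Euclidean algorithm, assuming 0 <= a <= b."""
    if a == 0:
        return b
    return gcd(b % a, a)   # b mod a < a, so the input shrinks on every call

print(gcd(12, 18))  # prints 6
```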

78

linear search

procedure search(i,j,x: integers and 1≤i≤n, 1≤j≤n)
  if ai = x then location := i
  else if i = j then location := 0
  else search(i+1,j,x)

binary search

procedure binary_search(i,j,x: integers and 1≤i≤n, 1≤j≤n)
  m := ⌊(i+j)/2⌋
  if x = am then location := m
  else if x < am and i<m then binary_search(i,m-1,x)
  else if x > am and j>m then binary_search(m+1,j,x)
  else location := 0
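Both searches can be tried out with the following Python sketch. It keeps the 1-based positions of the pseudocode (a location of 0 means "not found") and passes the list a explicitly, which is an assumption of this sketch since the pseudocode treats the list as global. Binary search additionally assumes the list is sorted in increasing order.

```python
def linear_search(a, i, j, x):
    """Search positions i..j (1-based, inclusive) of a for x; 0 if absent."""
    if a[i - 1] == x:
        return i
    if i == j:
        return 0
    return linear_search(a, i + 1, j, x)

def binary_search(a, i, j, x):
    """Search positions i..j of the sorted list a for x; 0 if absent."""
    m = (i + j) // 2                      # floor of the midpoint
    if x == a[m - 1]:
        return m
    if x < a[m - 1] and i < m:
        return binary_search(a, i, m - 1, x)
    if x > a[m - 1] and j > m:
        return binary_search(a, m + 1, j, x)
    return 0

data = [1, 3, 5, 7, 9, 11]
print(linear_search(data, 1, len(data), 9), binary_search(data, 1, len(data), 7))
```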

79

Fibonacci numbers

procedure fib(n: n∈N0)
  if n = 0 then fib(0) := 0
  else if n = 1 then fib(1) := 1
  else fib(n) := fib(n-1) + fib(n-2)

or it can be defined iteratively:

procedure fib(n: n∈N0)
  if n = 0 then y := 0
  else
    x := 0, y := 1
    for i := 1 to n-1
      z := x+y
      x := y
      y := z
{y is fn}
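For comparison, here are the two definitions as a Python sketch (function names are hypothetical). The recursive version recomputes the same values many times, while the iterative one needs only n-1 additions.

```python
def fib_recursive(n):
    """Direct translation of the recursive definition."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    """Iterative version: x and y hold two consecutive Fibonacci numbers."""
    if n == 0:
        return 0
    x, y = 0, 1
    for _ in range(n - 1):
        x, y = y, x + y
    return y

print([fib_iterative(n) for n in range(10)])
```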

80

Graphs and trees are important concepts that we will spend a lot of time considering later in the course.

A graph is made up of vertices and edges that connect some of the vertices.
A tree is a special form of a graph, namely it is a connected undirected graph with no simple circuits.
A rooted tree is a tree with one vertex that is the root and every edge is directed away from the root.
An m-ary tree is a rooted tree such that every internal vertex has no more than m children. If m = 2, it is a binary tree.
The height of a rooted tree T, denoted h(T), is the maximum level of its vertices, i.e., the length of the longest path from the root to a leaf.
A balanced rooted tree T has all of its leaves at levels h(T) or h(T)-1.

Let T1, T2, …, Tm be rooted trees with roots r1, r2, …, rm. Let r be another root. Connecting r to the roots r1, r2, …, rm constructs another rooted tree T. We can reformulate this concept using the recursive set methodology.

81

Merge sort is a balanced binary tree method that first breaks a list up recursively into two lists until each sublist has only one element. Then the sublists are recombined, two at a time and in sorted order, until only one sorted list remains.

Note: The height of the tree formed in merge sort is O(log2 n) for n elements.

10, 4, 7, 1
10, 4 | 7, 1
10 | 4 | 7 | 1
4, 10 | 1, 7
1, 4, 7, 10

Notes:

First three rows do the sublist splitting.
Last two rows do the merging.
There are two distinct algorithms at work.

82

procedure merge_sort(L = {a1, a2, …, an})
  if n > 1 then
    m := ⌊n/2⌋
    L1 := {a1, a2, …, am}
    L2 := {am+1, am+2, …, an}
    L := merge(merge_sort(L1), merge_sort(L2))
{L is now the sorted {a1, a2, …, an}}

procedure merge(L1, L2: sorted lists)
  L := ∅
  while L1 and L2 are both nonempty
    remove the smaller of the first elements of L1 and L2 and append it to the end of L
    if either L1 or L2 is empty, append the other list to the end of L
{L is the merged, sorted list}
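A runnable Python sketch of the two procedures follows. It returns new lists rather than modifying L in place, which is a choice made here for brevity and not something the pseudocode prescribes.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two tails is nonempty
    merged.extend(right[j:])
    return merged

def merge_sort(items):
    """Split the list in half, sort each half recursively, then merge."""
    if len(items) <= 1:
        return list(items)
    m = len(items) // 2
    return merge(merge_sort(items[:m]), merge_sort(items[m:]))

print(merge_sort([10, 4, 7, 1]))  # prints [1, 4, 7, 10]
```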

83

Theorem: If ni = |Li|, i=1,2, then merge requires at most n1+n2-1 comparisons. If n = |L|, then merge_sort requires O(n log2 n) comparisons.

Quick sort is another sorting algorithm that breaks an initial list into many sublists, but using a different heuristic than merge sort. If L = {a1, a2, …, an} with distinct elements, then quick sort recursively constructs two lists: L1 for all ai < a1 and L2 for all ai > a1 with a1 appended to the end of L1. This continues recursively until each sublist has only one element. Then the sublists are recombined in order to get a sorted list.

Note: On average, the number of comparisons is O(n log2 n) for n elements, but can be O(n^2) in the worst case. Quick sort is one of the most popular sorting algorithms used in academia.

Exercise: Google “quick sort, C++” to see many implementations or look in many of the 200+ C++ primers. Defining quick sort is in Rosen’s exercises.
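As a starting point for the exercise, here is a short Python sketch of the idea described above, with the first element as the pivot. It assumes distinct elements, as in the description, and builds new lists instead of partitioning in place, so it is an illustration and not a tuned implementation.

```python
def quick_sort(items):
    """Quick sort with the first element as pivot; assumes distinct elements."""
    if len(items) <= 1:
        return list(items)
    pivot = items[0]
    smaller = [a for a in items[1:] if a < pivot]   # L1 without the pivot
    larger = [a for a in items[1:] if a > pivot]    # L2
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

print(quick_sort([3, 5, 7, 8, 1, 9, 2, 4, 6]))
```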

84

Counting, Permutations, and Combinations

Product Rule Principle: Suppose a procedure can be broken down into a sequence of k tasks. If there are ni, 1≤i≤k, ways to do the ith task, then there are $\prod_{i=1}^{k} n_i$ ways to do the procedure.

Sum Rule Principle: Suppose a procedure can be done by performing exactly one of k tasks. If there are ni, 1≤i≤k, ways to do the ith task, with each way unique, then there are $\sum_{i=1}^{k} n_i$ ways to do the procedure.

Exclusion (Inclusion) Principle: If the sum rule cannot be applied because the ways are not unique, we use the sum rule and subtract the number of duplicate ways.

Note: Mapping the individual ways onto a rooted tree and counting the leaves is another method for summing. The trees are not unique, however.

85

Examples:

Consider 3 students in a classroom with 10 seats. There are 10•9•8 = 720 ways to assign the students to the seats.

We want to appoint 1 person to fill out many, many forms that the administration wants filled in by today. There are 3 students and 2 faculty members who can fill out the forms. There are 3+2 = 5 ways to choose 1 person. (Duck fast.)

How many variables are legal in the original Dartmouth BASIC computer language? Variables are 1 or 2 alphanumeric characters long, begin with A-Z, case independent, and are not one of the 5 two character reserved words in BASIC. We use a combination of the three counting principles:
o 1 character variables: V1 = 26
o 2 character variables: V2 = 26•36 - 5 = 931
o Total: V = V1 + V2 = 957

86

Pigeonhole Principle: If there are k∈N boxes and at least k+1 objects placed in the boxes, then there is at least one box with more than one object in it.

Theorem: If f: D→E is a function such that |D| > k and |E| = k, then f is not 1-1.
The proof is by the pigeonhole principle.

Theorem (Generalized Pigeonhole Principle): If N objects are placed in k boxes, then at least one box contains at least ⌈N/k⌉ objects.

Proof: First recall that ⌈N/k⌉ < (N/k)+1. Now suppose that none of the boxes contains more than ⌈N/k⌉ - 1 objects. Hence, the total number of objects is at most

k(⌈N/k⌉ - 1) < k(((N/k)+1)-1) = N,

which contradicts having N objects. Hence, the theorem must be true (proof by contradiction).

Theorem: Every sequence of n^2+1 distinct real numbers contains a subsequence of length n+1 that is either strictly increasing or strictly decreasing.

Examples: From a standard 52 card playing deck.

87

How many cards must be dealt to guarantee that 4 cards from the same suit are dealt?
o The boxes are the k = 4 suits. The GPP Theorem guarantees 4 cards in some suit once ⌈N/4⌉ ≥ 4, which first happens at N = 13.
o N = 12 is not enough, since 3 cards of each suit could be dealt.

How many cards must be dealt to guarantee that 4 clubs are dealt?
o GPP Theorem does not apply.
o The product rule and inclusion principles apply: 3•13+4 = 43 since all of the hearts, spades, and diamonds could be dealt before any clubs.

Definition: A permutation of a set of distinct objects is an ordered arrangement of these objects. An r-permutation is an ordered arrangement of r of these objects.

Example: Given S = {0,1,2}, then {2,1,0} is a permutation and {0,2} is a 2-permutation of S.

88

Theorem: If n,r∈N with r≤n, then there are P(n,r) = n(n-1)(n-2)…(n-r+1) = n!/(n-r)! r-permutations of a set of n distinct elements. Further, P(n,0) = 1.

The proof is by the product rule for r≥1. For r=0, there is only one way to order 0 objects.

Example: You want to visit 10 cities in China on a vacation. You will arrive in Hong Kong as your first city and you want to maximize the number of frequent flier miles you will accumulate by flying to 9 more cities. You have 9! different paths to check. Good luck since 9! = 362,880.

Definition: An r-combination is an unordered subset with r elements from the original set.

Definition: The binomial coefficient is defined by $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$.

89

Theorem: The number of r-combinations of a set with n elements with n,r∈N0 and r≤n is $C(n,r) = \binom{n}{r}$.

Proof: The r-permutations can be formed using C(n,r) r-combinations and then ordering each r-combination, which can be done in P(r,r) ways. So,

P(n,r) = C(n,r)P(r,r)

or

$C(n,r) = \frac{P(n,r)}{P(r,r)} = \frac{n!}{(n-r)!} \cdot \frac{(r-r)!}{r!} = \frac{n!}{r!\,(n-r)!}$.

Theorem: C(n,r) = C(n,n-r) for 0≤r≤n.

Definition: A combinatorial proof of an identity is a proof that uses counting arguments to prove that both sides of the identity count the same objects, but in different ways.

90

Binomial Theorem: Let x and y be variables. Then for n∈N,

$(x+y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^j$.

Proof: Expanding the product, all of the terms are of the form $x^{n-j}y^j$ for j=0,1,…,n. To count the number of terms for $x^{n-j}y^j$, note that we have to choose n-j x’s from the n sums so that the other j terms in the product are y’s. Hence, the coefficient for $x^{n-j}y^j$ is $\binom{n}{n-j} = \binom{n}{j}$.

Example: What is the coefficient of $x^{12}y^{13}$ in $(x+y)^{25}$? $\binom{25}{13} = 5{,}200{,}300$.

Corollary: Let n∈N0. Then $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.

Proof: $2^n = (1+1)^n = \sum_{k=0}^{n} \binom{n}{k} 1^k 1^{n-k} = \sum_{k=0}^{n} \binom{n}{k}$.

91

Corollary: Let n∈N. Then $\sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0$.

Proof: $0 = 0^n = ((-1)+1)^n = \sum_{k=0}^{n} \binom{n}{k} (-1)^k 1^{n-k} = \sum_{k=0}^{n} (-1)^k \binom{n}{k}$.

Corollary: $\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots$

Corollary: Let n∈N0. Then $\sum_{k=0}^{n} 2^k \binom{n}{k} = 3^n$.

Theorem (Pascal’s Identity): Let n,k∈N with n≥k. Then $\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}$.

92

Note: Using $\binom{n}{0} = \binom{n}{n} = 1$ as a basis, we can define $\binom{n}{k}$ recursively using Pascal’s Identity. It is normally written as a triangular table, denoted Pascal’s Triangle.
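The recursive definition can be sketched in Python by building the triangle one row at a time. The helper name pascal_rows is hypothetical; each interior entry is the sum of the two entries above it, which is Pascal's Identity with n shifted by one.

```python
def pascal_rows(n_max):
    """Return rows 0..n_max of Pascal's Triangle; rows[n][k] is C(n,k)."""
    rows = [[1]]                                   # C(0,0) = 1
    for n in range(1, n_max + 1):
        prev = rows[-1]
        interior = [prev[k - 1] + prev[k] for k in range(1, n)]
        rows.append([1] + interior + [1])          # C(n,0) = C(n,n) = 1
    return rows

for row in pascal_rows(4):
    print(row)
```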

Theorem (Vandermonde’s Identity): Let m,n,r∈N with r≤m and r≤n. Then $\binom{m+n}{r} = \sum_{k=0}^{r} \binom{m}{r-k} \binom{n}{k}$.

Corollary: If n∈N0, then $\binom{2n}{n} = \sum_{k=0}^{n} \binom{n}{k}^2$.

Proof: $\binom{2n}{n} = \sum_{k=0}^{n} \binom{n}{n-k} \binom{n}{k} = \sum_{k=0}^{n} \binom{n}{k}^2$.

Theorem: Let n,r∈N0 such that r≤n. Then $\binom{n+1}{r+1} = \sum_{j=r}^{n} \binom{j}{r}$.

93

If we allow repetitions in the permutations, then all of the previous theorems and corollaries no longer apply. We have to start over.

Theorem: The number of r-permutations of a set with n objects and repetition is n^r.

Proof: There are n ways to select an element of the set for each of the r positions in the r-permutation. Using the product principle completes the proof.

Theorem: There are C(n+r-1,r) = C(n+r-1,n-1) r-combinations from a set with n elements when repetition is allowed.

Example: How many solutions are there to x1+x2+x3 = 9 for xi∈N0? C(3+9-1,9) = C(11,9) = C(11,2) = 55. Only when constraints are placed on the xi can we possibly find a unique solution.

Definition: The multinomial coefficient is $C(n; n_1, n_2, \ldots, n_k) = \frac{n!}{\prod_{i=1}^{k} n_i!}$.

94

Theorem: The number of different permutations of n objects, where there are ni, 1≤i≤k, indistinguishable objects of type i, is C(n; n1, n2, …, nk).

Theorem: The number of ways to distribute n distinguishable objects in k distinguishable boxes so that ni objects are placed into box i, 1≤i≤k, is C(n; n1, n2, …, nk).

Theorem: The number of ways to distribute n distinguishable objects into k indistinguishable boxes is

$\sum_{j=1}^{k} \frac{1}{j!} \sum_{i=0}^{j-1} (-1)^i \binom{j}{i} (j-i)^n$.

Multinomial Theorem: If n∈N, then

$\left(\sum_{i=1}^{k} x_i\right)^n = \sum_{n_1+n_2+\cdots+n_k=n} C(n; n_1, n_2, \ldots, n_k)\, x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$.

95

Generating permutations and combinations is useful and sometimes important.

Note: We can place any n-set into a 1-1 correspondence with the first n natural numbers. All permutations can be listed using {1, 2, …, n} instead of the actual set elements. There are n! possible permutations.

Definition: In the lexicographic (or dictionary) ordering, the permutation a1a2…an of {1,2,…,n} precedes b1b2…bn if and only if, for some k with 1≤k≤n, a1 = b1, a2 = b2, …, ak-1 = bk-1, and ak < bk.

Examples:

5 elements. The permutation 21435 precedes 21543.
Given 362541, then 364125 is the next permutation lexicographically.

96

Algorithm: Generate the next permutation in lexicographic order.

procedure next_perm(a1a2…an: ai∈{1,2,…,n} and distinct)
  j := n – 1
  while aj > aj+1
    j := j – 1
  {j is the largest subscript with aj < aj+1}
  k := n
  while aj > ak
    k := k – 1
  {ak is the smallest integer greater than aj to the right of aj}
  Swap aj and ak
  r := n, s := j+1
  while r > s
    Swap ar and as
    r := r – 1, s := s + 1
  {This puts the tail end of the permutation after the jth position in increasing order}
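The procedure can be checked against the earlier example with a Python sketch. It uses 0-based list indices, and it returns None when the input is already the last permutation, a case the pseudocode does not handle; that guard is an addition made here.

```python
def next_perm(a):
    """Return the next permutation in lexicographic order, or None if none exists."""
    a = list(a)
    n = len(a)
    j = n - 2                        # 0-based version of j := n - 1
    while j >= 0 and a[j] > a[j + 1]:
        j -= 1
    if j < 0:
        return None                  # a is in decreasing order: the last permutation
    k = n - 1
    while a[j] > a[k]:
        k -= 1                       # a[k] is the smallest entry right of j exceeding a[j]
    a[j], a[k] = a[k], a[j]
    a[j + 1:] = reversed(a[j + 1:])  # put the tail in increasing order
    return a

print(next_perm([3, 6, 2, 5, 4, 1]))  # prints [3, 6, 4, 1, 2, 5]
```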

97

Algorithm: Generating the next r-combination in lexicographic order.

procedure next_r_combination(a1a2…ar: proper subset of {1,2,…,n} with a1 < a2 < … < ar)
  i := r
  while ai = n-r+i
    i := i – 1
  ai := ai + 1
  for j := i+1 to r
    aj := ai + j - i

Example: Let S = {1, 2, …, 6}. Given the 4-combination {1, 2, 5, 6}, the next 4-combination is {1, 3, 4, 5}.
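A Python sketch of this procedure is given below. It uses the stopping test ai = n-r+i and the reset aj = ai + j - i, which are what reproduce the example from {1, 2, 5, 6} to {1, 3, 4, 5}. Returning None for the last combination is an addition made here.

```python
def next_r_combination(a, n):
    """Next r-combination of {1,...,n} in lexicographic order; a is increasing."""
    a = list(a)
    r = len(a)
    i = r                                   # 1-based position, as in the pseudocode
    while i >= 1 and a[i - 1] == n - r + i:
        i -= 1                              # position i already holds its largest value
    if i == 0:
        return None                         # a = {n-r+1, ..., n} is the last combination
    a[i - 1] += 1
    for j in range(i + 1, r + 1):
        a[j - 1] = a[i - 1] + j - i         # smallest increasing tail
    return a

print(next_r_combination([1, 2, 5, 6], 6))  # prints [1, 3, 4, 5]
```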

98

Discrete Probability

Definition: An experiment is a procedure that yields one of a given set of possible outcomes.

Definition: The sample space of the experiment is the set of (all) possible outcomes.

Definition: An event is a subset of the sample space.

First Assumption: We begin by only considering finitely many possible outcomes.

Definition: If S is a finite sample space of equally likely outcomes and E⊆S is an event, then the probability of E is p(E) = |E| / |S|.

99

Examples:

I randomly chose an exam1 to grade. What is the probability that it is one of the Davids? Thirty one students took exam1 of which five were Davids. So, p(David) = 5 / 31 ≈ 0.16.

Suppose you are allowed to choose 6 numbers from the first 50 natural numbers. The probability of picking the correct 6 numbers in a lottery drawing is 1/C(50,6) = (44!6!) / 50! = 1/15,890,700 ≈ 6.29•10^-8. This lottery is just a regressive tax designed for suckers and starry eyed dreamers.

Definition: When sampling, there are two possible methods: with and without replacement. In the former, the full sample space is always available. In the latter, the sample space shrinks with each sampling.

100

Example: Let S = {1, 2, …, 50}. What is the probability of sampling {1, 14, 23, 32, 49}?

Without replacement: p({1,14,23,32,49}) = 1 / (50•49•48•47•46) = 3.93•10^-9.

With replacement: p({1,14,23,32,49}) = 1 / (50•50•50•50•50) = 3.20•10^-9.

Definition: If E is an event, then Ē is the complementary event.

Theorem: p(Ē) = 1 – p(E) for a sample space S.

Proof: p(Ē) = (|S| – |E|) / |S| = 1 – |E| / |S| = 1 – p(E).

Example: Suppose we generate n random bits. What is the probability that at least one of the bits is 0? Let E be the event that a bit string has at least one 0 bit. Then Ē is the event that all n bits are 1. p(E) = 1 – p(Ē) = 1 – 2^-n = (2^n – 1) / 2^n.

101

Note: Proving the example directly for p(E) is extremely difficult.

Theorem: Let E and F be events in a sample space S. Then
p(E∪F) = p(E) + p(F) – p(E∩F).

Proof: Recall that |E∪F| = |E| + |F| – |E∩F|. Hence,
p(E∪F) = |E∪F| / |S| = (|E| + |F| – |E∩F|) / |S| = p(E) + p(F) – p(E∩F).

Example: What is the probability in the set {1, 2, …, 100} of an element being divisible by 2 or 3? Let E and F represent elements divisible by 2 and 3, respectively. Then |E| = 50, |F| = 33, and |E∩F| = 16. Hence, p(E∪F) = 0.67.

102

Second Assumption: Now suppose that the probability of an outcome is not 1 / |S|. In this case we must assign probabilities for each possible outcome, either by setting a specific value or defining a function.

Definition: For a sample space S with a finite or countable number of outcomes, we assign a probability p(s) to each outcome s∈S such that

(1) 0 ≤ p(s) ≤ 1 ∀s∈S, and
(2) $\sum_{s \in S} p(s) = 1$.

Notes:

1. When |S| = n, the formulas (1) and (2) can be rewritten using n.
2. When |S| = ∞ and is uncountable, integral calculus is required for (2).
3. When |S| = ∞ and is countable, the sum in (2) is true in the limit.

103

Example: Coin flipping with outcomes H and T.
S = {H, T} for a fair coin. Hence, p(H) = p(T) = 0.5.
S = {H, H, T} for a weighted coin. Then p(H) = 0.67 and p(T) = 0.33.

Definition: Suppose that S is a set with n elements. The uniform distribution assigns the probability 1/n to each element in S.

Definition: The probability of the event E is the sum of the probabilities of the outcomes in E, i.e., $p(E) = \sum_{s \in E} p(s)$.

Note: When |E| = ∞, the sum $\sum_{s \in E} p(s)$ must be convergent in the limit.

Definition: The experiment of selecting an element from a sample space S with a uniform distribution is known as selecting an element from S at random.

We can prove that (1) p(E) = 1 – p(Ē) and (2) p(E∪F) = p(E) + p(F) – p(E∩F) using the more general probability definitions.

104

Definition: Let E and F be events with p(F) > 0. The conditional probability of E given F is defined by p(E|F) = p(E∩F) / p(F).

Example: A bit string of length 3 is generated at random. What is the probability that there are two 0 bits in a row given that the first bit is 0? Let F be the event that the first bit is 0. Let E be the event that there are two 0 bits in a row. Note that E∩F = {000, 001} and p(F) = 0.5. Hence, p(E|F) = 0.25 / 0.5 = 0.5.

Definition: The events E and F are independent if p(E∩F) = p(E)p(F).

Note: When p(F) > 0, independence is equivalent to having p(E|F) = p(E).

Example: Suppose E is the event that a bit string begins with a 1 and F is the event that there is an even number of 1’s. Suppose the bit strings are of length 3. There are 4 bit strings beginning with 1: {100, 101, 110, 111}. There are 4 strings with an even number of 1’s, since zero is even: {000, 011, 101, 110}. Hence, p(E) = 0.5 and p(F) = 0.5. E∩F = {101, 110}, so p(E∩F) = 0.25. Thus, p(E∩F) = p(E)p(F). Hence, E and F are independent.

105

Note: For bit strings of length 4, 0.25 = p(E∩F) = (0.5)(0.5) = p(E)p(F), so the events are independent there, too. The even/odd length of the bit strings plays no part in the independence characteristic: for any length n≥2, exactly half of the strings beginning with 1 have an even number of 1’s.
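Independence claims like this one are easy to check by brute force. The Python sketch below enumerates all bit strings of a given length; note that it counts 000 as having an even number (zero) of 1’s, so 000 belongs to F. The function name is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

def check_independence(length):
    """Return p(E), p(F), p(E∩F) and whether p(E∩F) == p(E)p(F)."""
    S = ["".join(bits) for bits in product("01", repeat=length)]
    E = {s for s in S if s[0] == "1"}               # begins with a 1
    F = {s for s in S if s.count("1") % 2 == 0}     # even number of 1's
    def p(event):
        return Fraction(len(event), len(S))
    return p(E), p(F), p(E & F), p(E & F) == p(E) * p(F)

for n in (3, 4):
    print(n, check_independence(n))
```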

Definition: Each performance of an experiment with exactly two outcomes, denoted success (S) and failure (F), is a Bernoulli trial.

Definition: The binomial distribution is denoted b(k; n,p) = C(n,k)p^k q^(n-k), where q = 1 – p.

Theorem: The probability of exactly k successes in n independent Bernoulli trials, with probability of success p and failure q = 1 – p is b(k; n,p).

Proof: When n Bernoulli trials are carried out, the outcome is an n-tuple (t1, t2, …, tn), with each ti∈{S, F}. Due to the trials’ independence, the probability of each outcome having k successes and n-k failures is p^k q^(n-k). There are C(n,k) possible tuples that contain exactly k successes and n-k failures.

106

Example: Suppose we generate bit strings of length 10 such that p(0) = 0.7 and p(1) = 0.3 and the bits are generated independently. Then

b(8; 10,0.7) = C(10,8)(0.7)^8(0.3)^2 = 45•0.05764801•0.09 = 0.2335
b(7; 10,0.7) = C(10,7)(0.7)^7(0.3)^3 = 120•0.0823543•0.027 = 0.2668
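These values can be reproduced with a few lines of Python. The function b below is a direct transcription of b(k; n,p) = C(n,k)p^k q^(n-k), using math.comb from the standard library (Python 3.8+).

```python
from math import comb

def b(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(b(8, 10, 0.7), 4), round(b(7, 10, 0.7), 4))
print(sum(b(k, 10, 0.7) for k in range(11)))   # the probabilities sum to 1, up to rounding
```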

Theorem: $\sum_{k=0}^{n} b(k; n, p) = 1$.

Proof: $\sum_{k=0}^{n} b(k; n, p) = \sum_{k=0}^{n} C(n,k)\, p^k q^{n-k} = (p+q)^n = 1$.

Definition: A random variable is a function from the sample space of an experiment to the set of reals.

Notes:

A random variable assigns a real number to each possible outcome.
A random variable is a function; it is neither a variable nor random.

107

Example: Flip a fair coin twice. Let X(t) be the random variable that equals the number of tails that appear when t is the outcome. Then

X(HH) = 0, X(HT) = X(TH) = 1, and X(TT) = 2.

Definition: The distribution of a random variable X on a sample space S is the set of pairs (r, p(X=r)) ∀r∈X(S), where p(X=r) is the probability that X takes the value r.

Note: A distribution is usually described by specifying p(X=r) ∀r∈X(S).

Example: For our coin flip example above, each outcome has probability 0.25. Hence,

p(X=0) = 0.25, p(X=1) = 0.5, and p(X=2) = 0.25.

108

Definition: The expected value (or expectation) of the random variable X(s) in the sample space S is $E(X) = \sum_{s \in S} p(s) X(s)$.

Note: If $S = \{x_i\}_{i=1}^{n}$, then $E(X) = \sum_{i=1}^{n} p(x_i) X(x_i)$.

Example: Roll a die. Let the random variable X take the values 1, 2, …, 6 with probability 1/6 each. Then $E(X) = \sum_{i=1}^{6} i \cdot \frac{1}{6} = 3.5$. This is not really what you would like to see since the die does not have a 3.5 face.

Theorem: If X is a random variable and p(X=r) is the probability that X=r so that $p(X=r) = \sum_{s \in S,\, X(s)=r} p(s)$, then $E(X) = \sum_{r \in X(S)} p(X=r)\, r$.

Proof: Suppose X is a random variable with range X(S). Let p(X=r) be the probability that X takes the value r. Hence, p(X=r) is the sum of probabilities of outcomes s such that X(s)=r. Grouping the terms of $\sum_{s \in S} p(s) X(s)$ by the value r = X(s) gives $E(X) = \sum_{r \in X(S)} p(X=r)\, r$.

109

Theorem: If Xi, 1≤i≤n, are random variables on S and if a,b∈R, then

1. E(X1+X2+…+Xn) = E(X1)+E(X2)+…+E(Xn)
2. E(aXi+b) = aE(Xi) + b

Proof: Use mathematical induction (base case is n=2) for 1 and use the definitions for 2.

Note: The linearity of E is extremely convenient and useful.

Theorem: The expected number of successes when n Bernoulli trials are performed, where p is the probability of success on each trial, is np.

Proof: Apply 1 from the previous theorem.

110

Notes:

The average case complexity of an algorithm can be interpreted as the expected value of a random variable. Let S={ai}, where each possible input is an ai. Let X be the random variable such that X(ai) = bi, the number of operations for the algorithm with input ai. We assign a probability p(ai) to each input ai. Then the average case complexity is $E(X) = \sum_{a_i \in S} p(a_i) X(a_i)$.
Estimating the average complexity of an algorithm tends to be quite difficult to do directly. Even if the best and worst cases can be estimated easily, there is no guarantee that the average case can be estimated without a great deal of work. Frankly, the average case is sometimes too difficult to estimate. Using the expected value of a random variable sometimes simplifies the process enough to make it doable.

111

Example of linear search average complexity: See page 44 in the class notes for the algorithm and worst case complexity bound. We want to find x in a distinct set $\{a_i\}_{i=1}^{n}$. If x = ai, then there are 2i+1 comparisons. If $x \notin \{a_i\}_{i=1}^{n}$, then there are 2n+2 comparisons. There are n+1 input types: x = ai for some 1≤i≤n, or $x \notin \{a_i\}_{i=1}^{n}$. Assuming x is equally likely to be any of the ai, p(ai) = p/n, where p is the probability that $x \in \{a_i\}_{i=1}^{n}$. Let q = 1 – p. So,

$E = \frac{p}{n} \sum_{i=1}^{n} (2i+1) + (2n+2)q = \frac{p}{n}\left((n+1)^2 - 1\right) + (2n+2)q = p(n+2) + (2n+2)q$.

There are three cases of interest, namely,
p = 1, q = 0: E = n + 2
p = q = 0.5: E = (3n + 4) / 2
p = 0, q = 1: E = 2n + 2
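The closed form can be sanity checked by summing the expectation directly. The Python sketch below assumes, as stated in the example, 2i+1 comparisons when x = ai and 2n+2 comparisons when x is absent, with each ai equally likely; the function name is hypothetical and exact fractions avoid rounding.

```python
from fractions import Fraction

def expected_comparisons(n, p):
    """Expected comparisons of linear search; p = probability that x is in the list."""
    p = Fraction(p)
    q = 1 - p
    found = sum((p / n) * (2 * i + 1) for i in range(1, n + 1))
    return found + q * (2 * n + 2)

n = 10
print(expected_comparisons(n, 1))                # n + 2
print(expected_comparisons(n, Fraction(1, 2)))   # (3n + 4) / 2
print(expected_comparisons(n, 0))                # 2n + 2
```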

112

Definition: A random variable X has a geometric distribution with parameter p if p(X=k) = (1 – p)^(k-1) p for k = 1, 2, …

Note: Geometric distributions occur in studies about the time required before an event happens (e.g., time to finding a particular item or a defective item, etc.).

Theorem: If the random variable X has a geometric distribution with parameter p, then E(X) = 1/p.

Proof: Using $\sum_{i=1}^{\infty} i x^{i-1} = 1/(1-x)^2$ for |x| < 1 with x = 1 – p,

$E(X) = \sum_{i=1}^{\infty} i\, p(X=i) = \sum_{i=1}^{\infty} i (1-p)^{i-1} p = p \sum_{i=1}^{\infty} i (1-p)^{i-1} = p \cdot p^{-2} = 1/p$.

113

Definition: The random variables X and Y on a sample space S are independent if p(X(s)=r1 and Y(s)=r2) = p(X(s)=r1)p(Y(s)=r2) for all r1, r2∈R.

Theorem: If X and Y are independent random variables on a space S, then
E(XY) = E(X)E(Y).

Proof: From the definition of expected value and since X and Y are independent random variables,

$$\begin{aligned}
E(XY) &= \sum_{s \in S} X(s) Y(s) p(s) \\
&= \sum_{r \in X(S),\, t \in Y(S)} r t \, p(X(s)=r \text{ and } Y(s)=t) \\
&= \sum_{r \in X(S),\, t \in Y(S)} r t \, p(X(s)=r)\, p(Y(s)=t) \\
&= \left( \sum_{r \in X(S)} r \, p(X(s)=r) \right) \left( \sum_{t \in Y(S)} t \, p(Y(s)=t) \right) \\
&= E(X)E(Y).
\end{aligned}$$

114

Third Assumption: Not all problems can be solved using deterministic algorithms. We want to assess the probability of an event based on partial evidence.

Note: Some algorithms need to make random choices and produce an answer that might be wrong with a probability associated with its likelihood of correctness or an error estimate. Monte Carlo algorithms are examples of probabilistic algorithms.

Example: Consider a city with a lattice of streets. A drunk walks home from a bar. At each intersection, the drunk must choose between continuing or turning left or right. Hopefully, the drunk gets home eventually. However, there is no absolute guarantee.

115

Example: You receive n items. Sometimes all n items are guaranteed to be good. However, not all shipments have been checked. The probability that an item is bad in an unchecked batch is 0.1. We want to determine whether or not a shipment has been checked, but are not willing to check all items. So we test items at random until we find a bad item or the probability that the shipment is unchecked, despite every tested item being good, drops to 0.001. How many items do we need to check? The probability that an item from an unchecked batch is good is 1 – 0.1 = 0.9. Hence, after the kth check without finding a bad item, the probability that the items come from an unchecked shipment is (0.9)^k. Since (0.9)^66 ≈ 0.001, we must check only 66 items per shipment.
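The count of 66 can be recovered by multiplying out the powers of 0.9 until they fall to the threshold. A small Python sketch, with hypothetical parameter names:

```python
def checks_needed(p_bad=0.1, threshold=0.001):
    """Smallest k with (1 - p_bad)**k <= threshold."""
    k = 0
    prob_all_good = 1.0
    while prob_all_good > threshold:
        k += 1
        prob_all_good *= 1 - p_bad   # one more good item from an unchecked batch
    return k

print(checks_needed())  # prints 66
```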

Theorem: If the probability that an element of a set S does have a particular property is in (0,1), then there exists an element in S with this property.

116

Bayes Theorem: Suppose that E and F are events from a sample space S such that p(E) ≠ 0 and p(F) ≠ 0. Then

p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|F̄)p(F̄)).

Generalized Bayes Theorem: Suppose that E is an event from a sample space S and that F1, F2, …, Fn are mutually exclusive events such that $\bigcup_{i=1}^{n} F_i = S$. Assume that p(E) ≠ 0 and p(Fi) ≠ 0, 1≤i≤n. Then

$p(F_j|E) = \frac{p(E|F_j)\, p(F_j)}{\sum_{i=1}^{n} p(E|F_i)\, p(F_i)}$.

117

Example: We have 2 boxes. The first box contains 2 green and 7 red balls. The second box contains 4 green and 3 red balls. We select a box at random, then a ball at random. If we picked a red ball, what is the probability that it came from the first box?

Let E be the event that we chose a red ball. Thus, Ē is the event that we chose a green ball.
Let F be the event that we chose a ball from the first box. Thus, F̄ is the event that we chose a ball from the second box. p(F) = p(F̄) = 0.5 since we pick a box at random.

We want to calculate p(F|E) = p(E∩F) / p(E), which we will do in stages.

p(E|F) = 7/9 since there are 7 red balls out of 9 total in box 1. p(E|F̄) = 3/7 since there are 3 red balls out of a total of 7 in box 2.

p(E∩F) = p(E|F)p(F) = 7/18 = 0.389 and p(E∩F̄) = p(E|F̄)p(F̄) = 3/14 = 0.214.

118

We need to find p(E). We do this by observing that E = (E∩F)∪(E∩F̄), where E∩F and E∩F̄ are disjoint sets. So, p(E) = p(E∩F)+p(E∩F̄) = 0.603.

p(F|E) = p(E∩F) / p(E) = 0.389 / 0.603 = 0.645, which is greater than the 0.5 from the second bullet above. We have improved our estimate!

Example: Suppose one person in 100,000 has a particular rare disease and that there is an accurate diagnostic test for this disease. The test is 99% accurate when given to someone with the disease and is 99.5% accurate when given to someone who does not have the disease. We can calculate
(a) the probability that someone who tests positive has the disease, and
(b) the probability that someone who tests negative does not have the disease.
Let F be the event that a person has the disease and let E be

119

the event that this person tests positive. We will use Bayes theorem to calculate (a) and (b), so we have to calculate $p(F)$, $p(\bar{F})$, $p(E|F)$, and $p(E|\bar{F})$.

$p(F) = 1 / 100000 = 10^{-5}$ and $p(\bar{F}) = 1 - p(F) = 0.99999$. $p(E|F) = 0.99$ since someone who has the disease tests positive 99% of the time. Similarly, we know that the probability of a false negative is $p(\bar{E}|F) = 0.01$. Further, $p(\bar{E}|\bar{F}) = 0.995$ since the test is 99.5% accurate for someone who does not have the disease.

$p(E|\bar{F}) = 0.005$, which is the probability of a false positive (100% − 99.5%).

120

Now we calculate (a):

$p(F|E) = \dfrac{p(E|F)\,p(F)}{p(E|F)\,p(F) + p(E|\bar{F})\,p(\bar{F})} = \dfrac{0.99 \cdot 10^{-5}}{0.99 \cdot 10^{-5} + 0.005 \cdot 0.99999} \approx 0.002$.

Roughly 0.2% of people who test positive actually have the disease. Getting a positive should not be an immediate cause for alarm (famous last words).

Now we calculate (b):

$p(\bar{F}|\bar{E}) = \dfrac{p(\bar{E}|\bar{F})\,p(\bar{F})}{p(\bar{E}|\bar{F})\,p(\bar{F}) + p(\bar{E}|F)\,p(F)} = \dfrac{0.995 \cdot 0.99999}{0.995 \cdot 0.99999 + 0.01 \cdot 10^{-5}} \approx 0.9999999$.

Thus, 99.99999% of people who test negative really do not have the disease.
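Both parts of the diagnostic test example follow the same pattern, so they can be sketched with one helper. The function and variable names below are illustrative assumptions; the numbers are the ones given in the example.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes theorem for a hypothesis H and its complement."""
    numerator = p_obs_given_h * prior
    return numerator / (numerator + p_obs_given_not_h * (1 - prior))

p_disease = 1e-5             # p(F): one person in 100,000
p_pos_given_disease = 0.99   # p(E|F)
p_neg_given_healthy = 0.995  # p(not E | not F)

# (a) p(F|E): has the disease given a positive test.
a = posterior(p_disease, p_pos_given_disease, 1 - p_neg_given_healthy)
# (b) p(not F | not E): does not have the disease given a negative test.
b = posterior(1 - p_disease, p_neg_given_healthy, 1 - p_pos_given_disease)
print(f"(a) {a:.4f}   (b) {b:.7f}")
```

The tiny prior p(F) dominates part (a): even a 99% accurate test mostly flags healthy people when the disease is this rare.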

121

Bayesian Spam Filters used to be the first line of defense for email programs. Like many good things, the spammers ran right over the process in about two years. However, it is an interesting example of useful discrete mathematics.

The filtering involves a training period. Email messages need to be marked as Good or Bad messages, which we will denote as being the G or B sets. Eventually the filter will mark messages for you, hopefully accurately.

The filter finds all of the words in both sets and keeps a running total of each word per set. We construct two functions nG(w) and nB(w) that return the number of messages containing the word w in the G and B sets, respectively.

We use a uniform distribution. The empirical probability that a spam message contains the word w is p(w) = nB(w) / |B|. The empirical probability that a non-spam message contains the word w is q(w) = nG(w) / |G|.

We can use p and q to estimate if an incoming message is or is not spam based on a set of words that we build dynamically over time.

122

Let E be the event that an incoming message contains the word w. Let S be the event that an incoming message is spam. Bayes theorem tells us that the probability that an incoming message containing the word w is spam is

$p(S|E) = \dfrac{p(E|S)\,p(S)}{p(E|S)\,p(S) + p(E|\bar{S})\,p(\bar{S})}$.

If we assume that $p(S) = p(\bar{S}) = 0.5$, i.e., that any incoming message is equally likely to be spam or not, then we get the simplified formula

$p(S|E) = \dfrac{p(E|S)}{p(E|S) + p(E|\bar{S})}$.

We estimate $p(E|S) = p(w)$ and $p(E|\bar{S}) = q(w)$. So, we estimate p(S|E) by

r(w) = p(w) / (p(w) + q(w)).

If r(w) is greater than some preset threshold, then we classify the incoming message as spam. We can consider a threshold of 0.9 to begin with.

123

Example: Let w = Rolex. Suppose it occurs in 250 / 2000 spam messages and in 5 / 1000 good messages. We will estimate the probability that an incoming message with Rolex in it is spam assuming that it is equally likely that the incoming message is spam or not. We know that p(Rolex) = 250 / 2000 = 0.125 and q(Rolex) = 5 / 1000 = 0.005. So,

r(Rolex) = 0.125 / (0.125 + 0.005) = 0.962 > 0.9.

Hence, we would reject the message as spam. (Note that some of us would reject all messages with the word Rolex in it as spam, but that is another case entirely.)

124

Using just one word to determine if a message is spam or not leads to excessive numbers of false positives and negatives. We actually have to use the generalized Bayes theorem with a large set of words.

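$p\left(S \,\middle|\, \bigcap_{i=1}^{k} E_i\right) = \dfrac{\prod_{i=1}^{k} p(E_i|S)}{\prod_{i=1}^{k} p(E_i|S) + \prod_{i=1}^{k} p(E_i|\bar{S})}$,

which we estimate assuming equal probability that an incoming message is spam or not by

$r(w_1, w_2, \dots, w_k) = \dfrac{\prod_{i=1}^{k} p(w_i)}{\prod_{i=1}^{k} p(w_i) + \prod_{i=1}^{k} q(w_i)}$.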

125

Example: The word w1 = stock appears in 400 / 2000 spam messages and in just 60 / 1000 good messages. The word w2 = undervalued appears in 200 / 2000 spam messages and in just 25 / 1000 good messages. Estimate the likelihood that an incoming message with both words in it is spam. We know p(stock) = 0.2 and q(stock) = 0.06. Similarly, p(undervalued) = 0.1 and q(undervalued) = .025. So,

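$r(\text{stock}, \text{undervalued}) = \dfrac{p(\text{stock})\,p(\text{undervalued})}{p(\text{stock})\,p(\text{undervalued}) + q(\text{stock})\,q(\text{undervalued})} = \dfrac{0.2 \times 0.1}{0.2 \times 0.1 + 0.06 \times 0.025} = 0.930 > 0.9$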
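The estimate r can be sketched in Python for one word or several. The function name `spam_score` and the dictionary layout are assumptions made for illustration; the counts are the ones from the Rolex and stock/undervalued examples, and the score assumes a message is equally likely to be spam or not.

```python
from math import prod

def spam_score(words, n_bad, n_good, total_bad, total_good):
    """Estimate r(w1, ..., wk) from per-word message counts."""
    p = prod(n_bad[w] / total_bad for w in words)     # product of p(w_i)
    q = prod(n_good[w] / total_good for w in words)   # product of q(w_i)
    return p / (p + q)

# Number of spam (B) and good (G) messages that contain each word.
n_bad = {"Rolex": 250, "stock": 400, "undervalued": 200}
n_good = {"Rolex": 5, "stock": 60, "undervalued": 25}

r_rolex = spam_score(["Rolex"], n_bad, n_good, 2000, 1000)
r_pair = spam_score(["stock", "undervalued"], n_bad, n_good, 2000, 1000)
print(round(r_rolex, 3), round(r_pair, 3))
```

With a threshold of 0.9, both scores would classify the message as spam.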

Note: Looking for particular pairs or triplets of words and treating each as a single entity is another method for filtering. For example, enhance performance probably indicates spam to almost anyone, but high performance computing probably does not indicate spam to someone in computational sciences (but probably will for someone working in, say, Maytag repair).

126

Advanced Counting Principles

Definition: A recurrence relation for the sequence $\{a_n\}$ is an equation that expresses $a_n$ in terms of one or more of the previous terms in the sequence. A sequence is called a solution to a recurrence relation if its terms satisfy the recurrence relation. The initial conditions specify the values of the sequence before the first term where the recurrence relation takes effect.

Note: Recursion and recurrence relations have a connection. A recursive algorithm provides a solution to a problem of size n in terms of one or more instances of the same problem, but of smaller size. Complexity analysis of the recursive algorithm gives a recurrence relation on the number of operations.

Example: Suppose we have $\{a_n\}$ with $a_n = 3n$, $n \in \mathbb{N}$. Is this a solution for $a_n = 2a_{n-1} - a_{n-2}$ for n ≥ 2? Yes, since for n ≥ 2,

$2a_{n-1} - a_{n-2} = 2(3(n-1)) - 3(n-2) = 3n = a_n$.

127

Example: Suppose in 1977 you invested $100,000 into a tax free, 30 year municipal bond that paid 15% per year. What is it worth at maturity? Did it beat inflation and if so, by how much?

P_0 = 100000
P_1 = 1.15 P_0
P_2 = 1.15 P_1 = (1.15)^2 P_0
P_i = (1.15)^i P_0, which can be rigorously proven using mathematical induction.

P_30 = (1.15)^30 P_0 = $6,621,180. This is a big number. What about inflation? We can find the consumer price index (CPI) monthly and yearly on the Internet, e.g., http://inflationdata.com. Consider just the yearly CPI to make the comparison fairer. Let {I_j} be the yearly CPI factors. Compounding them over the 30 years, B_30 = P_0 × ∏_{j=1}^{30} I_j = $354,580.

Investing your money in a bank that just beat inflation would have been a huge investing error. 15% seems high, but that existed back then due to high inflation.

128

Fibonacci Example: A young pair of rabbits (1 male, 1 female) arrives on a deserted island. They can breed after they are two months old and produce another pair. Thereafter each pair at least two months old can breed once a month. How many pairs $f_n$ of rabbits are there after n months?

n = 1: $f_1 = 1$ (initial condition)
n = 2: $f_2 = 1$ (initial condition)
n > 2: $f_n = f_{n-1} + f_{n-2}$ (recurrence relation)

The n > 2 formula is true since each new pair comes from a pair at least 2 months old.

Example: For bit strings of length n ≥ 3, find the recurrence relation and initial conditions for the number $a_n$ of bit strings that do not have two consecutive 0's.

n = 1: $a_1 = 2$ (initial condition; the strings are {0, 1})
n = 2: $a_2 = 3$ (initial condition; the strings are {01, 10, 11})
n > 2: $a_n = a_{n-1} + a_{n-2}$ (recurrence relation)

For n > 2, there are two cases: strings ending in 1 (thus, examine the n−1 case) and strings ending in 10 (thus, examine the n−2 case).
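The recurrence can be compared against a brute force enumeration for small n. This is a sketch; both function names are made up for illustration.

```python
from itertools import product

def count_brute_force(n):
    """Count bit strings of length n that do not contain '00'."""
    return sum("00" not in "".join(bits) for bits in product("01", repeat=n))

def count_recurrence(n):
    """Use a_1 = 2, a_2 = 3, and a_n = a_{n-1} + a_{n-2} for n > 2."""
    a_prev, a_curr = 2, 3
    if n == 1:
        return a_prev
    for _ in range(n - 2):
        a_prev, a_curr = a_curr, a_prev + a_curr
    return a_curr

for n in range(1, 8):
    print(n, count_brute_force(n), count_recurrence(n))
```

The brute force count grows like 2^n in cost, while the recurrence needs only n − 2 additions.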

129

Definition: A linear homogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form

$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \dots + c_k a_{n-k}$,

where $c_1, \dots, c_k \in \mathbb{R}$ and $c_k \neq 0$.

Motivation for study: This type of recurrence relation occurs often and can be systematically solved. Slightly more general ones can be, too. The solution methods are related to solving certain classes of ordinary differential equations.

Notes:

Linear because the right hand side is a sum of multiples of previous terms.
Homogeneous because no terms occur that are not multiples of the $a_j$'s.
Constant because no coefficient is a function of n.
Degree k because $a_n$ is defined in terms of the previous k sequential terms.

130

Examples: Typical ones include

$P_n = 1.15 P_{n-1}$ is degree 1.
$f_n = f_{n-1} + f_{n-2}$ is degree 2.
$a_n = a_{n-5}$ is degree 5.

Examples: Ones that fail the definition include

$a_n = a_{n-1} + a_{n-2}^2$ is nonlinear.
$H_n = 2H_{n-1} + 1$ is nonhomogeneous.
$B_n = n B_{n-1}$ is variable coefficient.

We will get to nonhomogeneous recurrence relations shortly.

131

Solving a linear homogeneous recurrence relation usually starts by assuming that the solution has the form

$a_n = r^n$,

where $r \in \mathbb{C}$. This is a solution if and only if

$r^n = c_1 r^{n-1} + c_2 r^{n-2} + \dots + c_k r^{n-k}$.

Dividing both sides by $r^{n-k}$ and moving every term to the left hand side, we get the characteristic equation.

Definition: The characteristic equation is

$r^k - c_1 r^{k-1} - c_2 r^{k-2} - \dots - c_k = 0$.

Then $\{a_n\}$ with $a_n = r^n$ is a solution if and only if r is a solution to the characteristic equation. The proof is quite involved.

The k = 2 case is much easier to understand, yet still has multiple cases.

132

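Theorem: Assume $c_1, c_2, \alpha_1, \alpha_2 \in \mathbb{R}$ and $r_1, r_2 \in \mathbb{C}$. Suppose that $r^2 - c_1 r - c_2 = 0$ has two distinct roots $r_1$ and $r_2$. Then the sequence $\{a_n\}$ is a solution to the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2}$ if and only if $a_n = \alpha_1 r_1^n + \alpha_2 r_2^n$ for $n \in \mathbb{N}_0$.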

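Example: $a_0 = 2$, $a_1 = 7$, and $a_n = a_{n-1} + 2a_{n-2}$ for n ≥ 2. Then

Characteristic equation: $r^2 - r - 2 = 0$ or $(r-2)(r+1) = 0$.
Roots: $r_1 = 2$ and $r_2 = -1$.
Constants: $a_0 = 2 = \alpha_1 + \alpha_2$ and $a_1 = 7 = 2\alpha_1 - \alpha_2$.

Solve $\begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 7 \end{bmatrix}$ or $\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$.

Solution: $a_n = 3 \cdot 2^n - (-1)^n$.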

Matlab or Maple is essential to solving recurrence relations quickly and accurately.
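For the distinct root case the whole procedure fits in a few lines of plain Python. This is a sketch under stated assumptions: the function names are illustrative, the 2×2 system is solved by elimination by hand, and complex arithmetic is used so that complex roots would also work.

```python
import cmath

def solve_second_order(c1, c2, a0, a1):
    """Closed form of a_n = c1*a_{n-1} + c2*a_{n-2} for two distinct roots."""
    disc = cmath.sqrt(c1 * c1 + 4 * c2)       # roots of r^2 - c1*r - c2 = 0
    r1, r2 = (c1 + disc) / 2, (c1 - disc) / 2
    if r1 == r2:
        raise ValueError("double root: use the second case instead")
    # Solve alpha1 + alpha2 = a0 and alpha1*r1 + alpha2*r2 = a1.
    alpha1 = (a1 - a0 * r2) / (r1 - r2)
    alpha2 = a0 - alpha1
    return lambda n: alpha1 * r1**n + alpha2 * r2**n

def iterate(c1, c2, a0, a1, n):
    """Compute a_n directly from the recurrence relation."""
    prev, curr = a0, a1
    for _ in range(n):
        prev, curr = curr, c1 * curr + c2 * prev
    return prev

closed = solve_second_order(1, 2, 2, 7)   # the example above
print([round(closed(n).real) for n in range(6)])
print([iterate(1, 2, 2, 7, n) for n in range(6)])
```

Both printed lists should agree, since the closed form here is $3 \cdot 2^n - (-1)^n$.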

133

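Fibonacci Example: $f_0 = 0$, $f_1 = 1$, and $f_n = f_{n-1} + f_{n-2}$, n ≥ 2.

Characteristic equation: $r^2 - r - 1 = 0$.
Roots: $r_1 = \dfrac{1+\sqrt{5}}{2}$ and $r_2 = \dfrac{1-\sqrt{5}}{2}$.

Set up a 2×2 matrix problem to solve for $\alpha_1$ and $\alpha_2$, which are $\alpha_1 = \dfrac{1}{\sqrt{5}}$ and $\alpha_2 = -\dfrac{1}{\sqrt{5}}$.

Solution: $f_n = \dfrac{1}{\sqrt{5}}\left(\dfrac{1+\sqrt{5}}{2}\right)^n - \dfrac{1}{\sqrt{5}}\left(\dfrac{1-\sqrt{5}}{2}\right)^n$.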

134

Now comes the second case for k = 2.

Theorem: Assume $c_1, c_2, \alpha_1, \alpha_2 \in \mathbb{R}$ and $r_0 \in \mathbb{C}$. Suppose that $r^2 - c_1 r - c_2 = 0$ has one root $r_0$ with multiplicity 2. Then the sequence $\{a_n\}$ is a solution to the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2}$ if and only if $a_n = \alpha_1 r_0^n + \alpha_2 n r_0^n$ for $n \in \mathbb{N}_0$.

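Example: $a_0 = 1$, $a_1 = 6$, and $a_n = 6a_{n-1} - 9a_{n-2}$ for n ≥ 2. Then

Characteristic equation: $r^2 - 6r + 9 = 0$ or $(r-3)^2 = 0$.
Double root: $r_0 = 3$.
Constants: $a_0 = 1 = \alpha_1$ and $a_1 = 6 = 3\alpha_1 + 3\alpha_2$.

Solve $\begin{bmatrix} 1 & 0 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$ or $\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Solution: $a_n = (n+1)3^n$.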

135

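Theorem: Let $\{c_i\}_{i=1}^{k}, \{\alpha_i\}_{i=1}^{k} \subset \mathbb{R}$ and $\{r_i\}_{i=1}^{k} \subset \mathbb{C}$. Suppose the characteristic equation $r^k - c_1 r^{k-1} - \dots - c_k = 0$ has k distinct roots $r_i$, 1 ≤ i ≤ k. Then the sequence $\{a_n\}$ is a solution of the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \dots + c_k a_{n-k}$ if and only if $a_n = \alpha_1 r_1^n + \alpha_2 r_2^n + \dots + \alpha_k r_k^n$ for $n \in \mathbb{N}_0$.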

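Example: $a_0 = 2$, $a_1 = 5$, $a_2 = 15$, and $a_n = 6a_{n-1} - 11a_{n-2} + 6a_{n-3}$, n ≥ 3.

Characteristic equation: $r^3 - 6r^2 + 11r - 6 = 0$ or $(r-1)(r-2)(r-3) = 0$.
Roots: $r_1 = 1$, $r_2 = 2$, and $r_3 = 3$.
Constants: $a_0 = 2 = \alpha_1 + \alpha_2 + \alpha_3$, $a_1 = 5 = \alpha_1 + 2\alpha_2 + 3\alpha_3$, and $a_2 = 15 = \alpha_1 + 4\alpha_2 + 9\alpha_3$.

Solve $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \\ 15 \end{bmatrix}$ or $\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$.

Solution: $a_n = 1 - 2^n + 2 \cdot 3^n$.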

136

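Theorem: Let $\{c_i\}_{i=1}^{k}, \{\alpha_{i,j}\} \subset \mathbb{R}$ and $\{r_i\}_{i=1}^{t} \subset \mathbb{C}$. Suppose the characteristic equation $r^k - c_1 r^{k-1} - \dots - c_k = 0$ has t distinct roots $r_i$, 1 ≤ i ≤ t, with multiplicities $m_i \in \mathbb{N}$ such that $\sum_{i=1}^{t} m_i = k$. Then the sequence $\{a_n\}$ is a solution of the recurrence relation $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \dots + c_k a_{n-k}$ if and only if

$a_n = (\alpha_{1,0} + \alpha_{1,1} n + \dots + \alpha_{1,m_1-1} n^{m_1-1})\, r_1^n + \dots + (\alpha_{t,0} + \alpha_{t,1} n + \dots + \alpha_{t,m_t-1} n^{m_t-1})\, r_t^n$

for $n \in \mathbb{N}_0$ and all $\alpha_{i,j}$, 1 ≤ i ≤ t and 0 ≤ j ≤ $m_i - 1$.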

Example: Suppose the roots of the characteristic equation are 2, 2, 3, 3, 3, 5. Then the general solution form is

$(\alpha_{1,0} + \alpha_{1,1} n)\, 2^n + (\alpha_{2,0} + \alpha_{2,1} n + \alpha_{2,2} n^2)\, 3^n + \alpha_{3,0}\, 5^n$.

With given initial conditions, we can even compute the α's.

137

Definition: A linear nonhomogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form

$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \dots + c_k a_{n-k} + F(n)$,

where $\{c_i\} \subset \mathbb{R}$.

Theorem: If $\{a_n^{(p)}\}$ is a particular solution of the recurrence relation with constant coefficients $a_n = c_1 a_{n-1} + c_2 a_{n-2} + \dots + c_k a_{n-k} + F(n)$, then every solution is of the form $\{a_n^{(p)} + a_n^{(h)}\}$, where $\{a_n^{(h)}\}$ is a solution of the associated homogeneous recurrence relation (i.e., F(n) = 0).

Note: Finding particular solutions for given F(n)’s is loads of fun unless F(n) is rather simple. Usually you solve the homogeneous form first, then try to find a particular solution from that.

138

Theorem: Assume $\{b_i\}, \{c_i\} \subset \mathbb{R}$. Suppose that $\{a_n\}$ satisfies the nonhomogeneous recurrence relation

$a_n = c_1 a_{n-1} + c_2 a_{n-2} + \dots + c_k a_{n-k} + F(n)$ and

$F(n) = (b_t n^t + b_{t-1} n^{t-1} + \dots + b_1 n + b_0)\, s^n$.

When s is not a root of the characteristic equation of the associated homogeneous recurrence relation, there is a particular solution of the form

$(p_t n^t + p_{t-1} n^{t-1} + \dots + p_1 n + p_0)\, s^n$.

When s is a root of multiplicity m of the characteristic equation, there is a particular solution of the form

$n^m (p_t n^t + p_{t-1} n^{t-1} + \dots + p_1 n + p_0)\, s^n$.

Note: If s = 1, then things get even more complicated.

139

Example: Let $a_n = 6a_{n-1} - 9a_{n-2} + F(n)$. When F(n) = 0, the characteristic equation is $(r-3)^2 = 0$. Thus, $r_0 = 3$ with multiplicity 2.

$F(n) = 3^n$: particular solution is $n^2 p_0 3^n$.
$F(n) = n 3^n$: particular solution is $n^2 (p_1 n + p_0) 3^n$.
$F(n) = n^2 2^n$: particular solution is $(p_2 n^2 + p_1 n + p_0) 2^n$.
$F(n) = (n^2+1) 3^n$: particular solution is $n^2 (p_2 n^2 + p_1 n + p_0) 3^n$.

Definition: Suppose a recursive algorithm divides a problem of size n into a subproblems of size n/b each. Also suppose that g(n) extra operations are required to combine the a subproblems into a solution of the problem of size n. If f(n) is the cost of solving a problem of size n, then the divide and conquer recurrence relation is f(n) = af(n/b) + g(n).

We can easily work out a general cost for the divide and conquer recurrence relation using Big-Oh notation.

140

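Divide and Conquer Theorem: Let $a, b, c, d \in \mathbb{R}$ be nonnegative with b > 1. The solution to the recurrence relation

$f(n) = \begin{cases} c, & \text{for } n = 1, \\ a f(n/b) + c n^d, & \text{for } n > 1, \end{cases}$

for n a power of b is

$f(n) = \begin{cases} O(n^d), & \text{for } a < b^d, \\ O(n^d \log n), & \text{for } a = b^d, \\ O(n^{\log_b a}), & \text{for } a > b^d. \end{cases}$

Proof: If n is a power of b, then for $r = a/b^d$, $f(n) = c n^d \sum_{i=0}^{\log_b n} r^i$. There are 3 cases:

$a < b^d$: Then $\sum_{i=0}^{\infty} r^i$ converges, so $f(n) = O(n^d)$.
$a = b^d$: Then each term in the sum is 1, so $f(n) = O(n^d \log n)$.
$a > b^d$: Then $c n^d \sum_{i=0}^{\log_b n} r^i = c n^d \cdot \dfrac{r^{1+\log_b n} - 1}{r - 1}$, which is $O(a^{\log_b n})$ or $O(n^{\log_b a})$.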
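The sum used in the proof can be checked numerically against the recurrence itself. The sketch below assumes n is an exact power of an integer b; the function names and the default c = 1 are illustrative choices.

```python
def f(n, a, b, d, c=1):
    """Evaluate f(n) = a*f(n/b) + c*n**d with f(1) = c, for n a power of b."""
    if n == 1:
        return c
    return a * f(n // b, a, b, d, c) + c * n**d

def closed_form(n, a, b, d, c=1):
    """Evaluate c * n**d * sum_{i=0}^{log_b n} r**i with r = a / b**d."""
    levels, size = 0, 1
    while size < n:          # levels = log_b n when n is a power of b
        size *= b
        levels += 1
    r = a / b**d
    return c * n**d * sum(r**i for i in range(levels + 1))

# One (a, b, d) triple for each case, plus the matrix multiply recurrence.
for a, b, d in [(1, 2, 1), (2, 2, 1), (3, 2, 1), (7, 2, 2)]:
    n = b**10
    print((a, b, d), f(n, a, b, d), closed_form(n, a, b, d))
```

Each printed pair should match up to floating point rounding in the closed form.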

141

Example: Recall binary search (see page 45 in the class notes). Searching for an element in a set requires 2 comparisons to determine which half of the set to search further. The search keeps halving the size of the set until at most 1 element is left. Hence, f(n) = f(n/2) + 2. Using the Divide and Conquer theorem, we see that the cost is O(logn) comparisons.

Example: Recall merge sort (see pages 81-83 in the class notes). This sorts halves of sets of elements and requires less than n comparisons to put the two sorted sublists into a sorted list of size n. Hence, f(n) = 2f(n/2) + n. Using the Divide and Conquer theorem, we see that the cost is O(nlogn) comparisons.

Multiplying integers can be done recursively based on a binary decomposition of the two numbers to get a fast algorithm. The patent on this technique, implemented in hardware, made a computer company several billion dollars back when a billion dollars was real money (cf. a trillion dollars today).

Why stop with integers? The technique extends to multiplying matrices, too, with real, complex, or integer entries.

142

Example (funny integer multiplication): Suppose a and b have 2n length binary representations $a = (a_{2n-1} a_{2n-2} \dots a_1 a_0)_2$ and $b = (b_{2n-1} b_{2n-2} \dots b_1 b_0)_2$. We will divide a and b into left and right halves:

$a = 2^n A_1 + A_0$ and $b = 2^n B_1 + B_0$, where
$A_1 = (a_{2n-1} a_{2n-2} \dots a_{n+1} a_n)_2$ and $A_0 = (a_{n-1} a_{n-2} \dots a_1 a_0)_2$,
$B_1 = (b_{2n-1} b_{2n-2} \dots b_{n+1} b_n)_2$ and $B_0 = (b_{n-1} b_{n-2} \dots b_1 b_0)_2$.

The trick is to notice that

$ab = (2^{2n} + 2^n) A_1 B_1 + 2^n (A_1 - A_0)(B_0 - B_1) + (2^n + 1) A_0 B_0$.

Only 3 multiplies plus adds, subtracts, and shifts are required. So, f(2n) = 3f(n) + Cn, where Cn is the cost of the adds, subtracts, and shifts. The Divide and Conquer theorem tells us this is $O(n^{\log_2 3})$, which is about $O(n^{1.6})$. The standard algorithm is $O(n^2)$. It might not seem like much of an improvement, but it actually is when lots of integers are multiplied together. The trick can be applied recursively on the three multiplies in the ab line (halving 2n in the recursion).
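The identity can be turned into a short recursive routine. This is a sketch, not a tuned implementation: the function name, the cutoff of 8 bits, and the handling of the sign of the middle product are choices made here for illustration.

```python
def multiply(a, b, bits):
    """Multiply nonnegative integers a, b < 2**bits with 3 half-size products."""
    if bits <= 8:                       # small operands: built-in multiply
        return a * b
    n = (bits + 1) // 2                 # split position
    mask = (1 << n) - 1
    a1, a0 = a >> n, a & mask           # a = 2^n * A1 + A0
    b1, b0 = b >> n, b & mask           # b = 2^n * B1 + B0
    high = multiply(a1, b1, n)          # A1 * B1
    low = multiply(a0, b0, n)           # A0 * B0
    da, db = a1 - a0, b0 - b1           # either difference may be negative
    sign = -1 if (da < 0) != (db < 0) else 1
    mid = sign * multiply(abs(da), abs(db), n)   # (A1 - A0)(B0 - B1)
    # ab = (2^(2n) + 2^n) A1B1 + 2^n (A1 - A0)(B0 - B1) + (2^n + 1) A0B0
    return (high << (2 * n)) + (high << n) + (mid << n) + (low << n) + low

x, y = 0xDEADBEEFCAFEBABE, 0x123456789ABCDEF0
print(multiply(x, y, 64) == x * y)
```

Multiplications by powers of 2 are done with shifts, so the only true multiplies are the three recursive calls.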

143

Example (Strassen-Winograd Matrix-Matrix multiplication): We want to multiply A: m×k by B: k×n to get C: m×n. The matrix elements can be reals, complex numbers, or integers. When m = k = n, this takes $O(n^3)$ operations using the standard matrix-matrix multiplication algorithm. However, Strassen first proposed a divide and conquer algorithm that reduced the exponent. The belief is that someday, someone will devise an $O(n^2)$ algorithm. Some hope it will even be plausible to use such an algorithm. The variation of Strassen's algorithm that is most commonly implemented by computer vendors in high performance math libraries is the Winograd variant. It computes the product as

$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}$.

C is computed in 22 steps involving the submatrices of A, B, and intermediate temporary submatrices. An interesting question for many years was how little extra memory was needed to implement the Strassen-Winograd algorithm (see C. C. Douglas, M. Heroux, G. Slishman, and R. M. Smith, GEMMW: A portable Level 3 BLAS Winograd variant of Strassen's matrix-matrix multiply

144

algorithm, Journal of Computational Physics, 110 (1994), pp. 1-10 for an answer).

The 22 steps are the following:

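Each result is stored in one of the work areas Wmk or Wkn or in a block of C (C11, C12, C21, or C22).

Step  Result  Operation
1     S7      B22 − B12
2     S3      A11 − A21
3     M4      S3 S7
4     S1      A21 + A22
5     S5      B12 − B11
6     M5      S1 S5
7     S6      B22 − S5
8     S2      S1 − A11
9     M1      S2 S6
10    S4      A12 − S2
11    M6      S4 B22
12    T3      M5 + M6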

145

Step  Result  Operation
13    M2      A11 B11
14    T1      M1 + M2
15    C12     T1 + T3
16    T2      T1 + M4
17    S8      S6 − B21
18    M7      A22 S8
19    C21     T2 − M7
20    C22     T2 + M5
21    M3      A12 B21
22    C11     M2 + M3
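The 22 steps can be checked on small matrices. The sketch below applies one level of the schedule to square matrices of even order stored as nested lists; the helper names are illustrative, the sub-block products use the standard algorithm instead of recursing, and none of the memory saving tricks of GEMMW are attempted.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mul(X, Y):
    """Standard matrix-matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def split(X):
    """Return the blocks X11, X12, X21, X22 of a square, even order matrix."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])

def winograd_step(A, B):
    """One level of the 22 step schedule (7 multiplies, 15 adds/subtracts)."""
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    S7 = sub(B22, B12)
    S3 = sub(A11, A21)
    M4 = mul(S3, S7)
    S1 = add(A21, A22)
    S5 = sub(B12, B11)
    M5 = mul(S1, S5)
    S6 = sub(B22, S5)
    S2 = sub(S1, A11)
    M1 = mul(S2, S6)
    S4 = sub(A12, S2)
    M6 = mul(S4, B22)
    T3 = add(M5, M6)
    M2 = mul(A11, B11)
    T1 = add(M1, M2)
    C12 = add(T1, T3)
    T2 = add(T1, M4)
    S8 = sub(S6, B21)
    M7 = mul(A22, S8)
    C21 = sub(T2, M7)
    C22 = add(T2, M5)
    M3 = mul(A12, B21)
    C11 = add(M2, M3)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 3], [1, 1, 0, 2], [4, 2, 2, 1], [0, 3, 1, 5]]
print(winograd_step(A, B) == mul(A, B))
```

A full implementation would call itself on the 7 block products until the blocks are small enough for the standard algorithm.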

There are four tricky steps in the table above, depending on whether or not k is even or odd. Each step makes certain that we do not use more memory than is allocated for a submatrix or temporary. For example,

In step 4, we have to take care with S1. (a) If k is odd, then copy the first column of A21 into Wmk. (b) Complete S1.

146

In step 10, we have to take care with S4. (a) If k is odd, then pretend the first column of A21 = 0 in Wmk. (b) Complete S4.

In step 11, we have to take care with M6. (a) If m is odd, then save the first row of M5. (b) Calculate most of M6. (c) Complete M6 using (a) based on whether or not m is odd.

In step 21, we have to take care with M3. (a) Calculate M3 using an index shift.

This all sounds very complicated. However, the code GEMMW that is readily available on the Web effectively is implemented in 27 calls to subroutines that do the matrix operations and actually implements

C = α×op(A)op(B) + β×C,

where op(X) is either X, X transpose, X conjugate, or X conjugate transpose.

What is the total cost?

147

There are 7 submatrix-submatrix multiplies and 15 submatrix-submatrix adds or subtracts. So the cost is $f(n) = 7f(n/2) + 15n^2/4$ when m = k = n. By the Divide and Conquer theorem (a = 7 > $b^d$ = 4), this is an $O(n^{\log_2 7})$ algorithm, where $\log_2 7 \approx 2.807$.

The work area Wmk needs ((m+1)max(k,n)+m+4)/4 space.
The work area Wkn needs ((k+1)n+n+4)/4 space.
If C overlaps A or B in memory, an additional mn space is needed to save C before calculating β×C when β ≠ 0.
The maximum amount of extra memory is bounded by (m×max(k,n)+kn)/3+(m+max(k,n)+k+3n)/2+32+mn.
Hence, the overall extra storage is $cN^2/3$, where c ∈ {2,5}.

Typical memory usage when m=k=n is
o β ≠ 0 or A or B overlap with C: $1.67N^2$.
o β = 0 and A and B do not overlap with C: $0.67N^2$.

148

Definition: The (ordinary) generating function for a sequence $a_0, a_1, \dots, a_k, \dots$ of real numbers is the infinite series $G(x) = \sum_{k=0}^{\infty} a_k x^k$. For a finite sequence $\{a_k\}_{k=0}^{n}$, the generating function is $G(x) = \sum_{k=0}^{n} a_k x^k$.

Examples:

1. $a_k = 3$, $G(x) = 3\sum_{k=0}^{\infty} x^k$.
2. $a_k = k+1$, $G(x) = \sum_{k=0}^{\infty} (k+1) x^k$.
3. $a_k = 2^k$, $G(x) = \sum_{k=0}^{\infty} (2x)^k$.
4. $a_k = 1$, 0 ≤ k ≤ 2, $G(x) = \sum_{k=0}^{2} x^k = \dfrac{x^3 - 1}{x - 1}$.

Notes:

x is a placeholder, so it does not matter that G(1) in example 4 above is undefined.

We do not have to worry about convergence of the series, either.

149

When solving a series using calculus, knowing the ball of convergence for the x’s is required.

Lemma: $f(x) = (1-ax)^{-1}$ is the generating function for the sequence $1, a, a^2, \dots, a^k, \dots$ since for a ≠ 0 and |ax| < 1, $(1-ax)^{-1} = \sum_{k=0}^{\infty} (ax)^k$.

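Theorem: If $f(x) = \sum_{k=0}^{\infty} a_k x^k$ and $g(x) = \sum_{k=0}^{\infty} b_k x^k$ and f and g share the same ball of convergence, then

$f(x) + g(x) = \sum_{k=0}^{\infty} (a_k + b_k) x^k$ and $f(x)g(x) = \sum_{k=0}^{\infty} \left( \sum_{j=0}^{k} a_j b_{k-j} \right) x^k$.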

Example: Let $f(x) = (1-x)^{-2}$ be the generating function. What is the sequence? Consider the sequence 1, 1, …, 1, …, which has a generating function of $g(x) = (1-x)^{-1}$. We can use the previous theorem to answer our question:

$(1-x)^{-2} = \sum_{k=0}^{\infty} \left( \sum_{j=0}^{k} 1 \right) x^k = \sum_{k=0}^{\infty} (k+1) x^k$ or $a_k = k+1$.
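The product formula in the theorem is easy to try on truncated coefficient lists. The function name below is illustrative, and only the first 10 coefficients of each series are kept.

```python
def series_product(a, b):
    """Coefficients of f(x)g(x), using sum_{j=0}^{k} a_j * b_{k-j}."""
    n = min(len(a), len(b))          # only this many coefficients are exact
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(n)]

ones = [1] * 10                      # first coefficients of (1 - x)^(-1)
coeffs = series_product(ones, ones)  # first coefficients of (1 - x)^(-2)
print(coeffs)
```

The result is the sequence k + 1, as derived above.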

150

Definition: The extended binomial coefficient $\binom{u}{k}$ for $u \in \mathbb{R}$ and $k \in \mathbb{N}_0$ is defined by

$\binom{u}{k} = \begin{cases} u(u-1)\cdots(u-k+1)/k! & \text{if } k > 0, \\ 1 & \text{if } k = 0. \end{cases}$

Extended Binomial Theorem: If $u, x \in \mathbb{R}$ such that |x| < 1, then

$(1+x)^u = \sum_{k=0}^{\infty} \binom{u}{k} x^k$.

Examples:

1. $\binom{.5}{2} = (.5)(-.5)/2! = -.125$.

2. $\binom{-n}{r} = (-1)^r \binom{n+r-1}{r} = (-1)^r C(n+r-1, r)$ for $n \in \mathbb{N}$.

151

3. If $u \in \mathbb{N}$, then the extended binomial theorem is equivalent to the binomial theorem since $\binom{u}{k} = 0$ when k > u.

4. $(1-x)^{-n} = \sum_{k=0}^{\infty} C(n+k-1, k)\, x^k$ (uses examples 2 and 3).

Other Useful Generating Functions:

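$\dfrac{1 - x^{n+1}}{1 - x} = \sum_{k=0}^{n} x^k$.

$(1 - (ax)^r)^{-1} = \sum_{k=0}^{\infty} (ax)^{rk}$.

$(1 - (ax)^r)^{-n} = \sum_{k=0}^{\infty} C(n+k-1, k)\, (ax)^{rk}$.

$(1 + (ax)^r)^{n} = \sum_{k=0}^{\infty} C(n, k)\, (ax)^{rk}$, where C(n,k) = 0 for k > n.

$(1 + (ax)^r)^{-n} = \sum_{k=0}^{\infty} (-1)^k C(n+k-1, k)\, (ax)^{rk}$.

$e^x = \sum_{k=0}^{\infty} \dfrac{x^k}{k!}$.

$\ln(x+1) = \sum_{k=1}^{\infty} \dfrac{(-1)^{k+1} x^k}{k}$.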

152

Note: Generating functions can be used to solve many counting problems.

Examples:

How many solutions are there to the constrained problem a+b = 9 for 3 ≤ a ≤ 5 and 4 ≤ b ≤ 6? There are 3 total. The number of solutions with the constraints is the coefficient of $x^9$ in $(x^3+x^4+x^5)(x^4+x^5+x^6)$. We choose $x^a$ and $x^b$ from the two factors, respectively, so that a+b = 9. By inspection, there are only 3 choices for a and b.

How many ways can 8 CPUs be distributed in 3 servers if each server gets 2-4 CPUs? The generating function is $f(x) = (x^2+x^3+x^4)^3$. We need the coefficient of $x^8$ in f(x). Expansion of f(x) gives us 6 ways.

Note: Maple or Mathematica is really useful in the examples above.
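Without a computer algebra system, the same coefficients can be extracted by multiplying coefficient lists. This is a small sketch; the helper names are illustrative.

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = exponent)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_from_exponents(exponents):
    """Build x^e1 + x^e2 + ... as a coefficient list."""
    coeffs = [0] * (max(exponents) + 1)
    for e in exponents:
        coeffs[e] = 1
    return coeffs

# a + b = 9 with 3 <= a <= 5 and 4 <= b <= 6: coefficient of x^9.
pairs = poly_mul(poly_from_exponents([3, 4, 5]), poly_from_exponents([4, 5, 6]))[9]

# 8 CPUs in 3 servers, 2 to 4 CPUs each: coefficient of x^8.
server = poly_from_exponents([2, 3, 4])
cpus = poly_mul(poly_mul(server, server), server)[8]
print(pairs, cpus)
```

The two coefficients reproduce the counts 3 and 6 from the examples.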

153

Note: Generating functions are useful in solving recurrence relations, too.

Example: $a_k = 3a_{k-1}$, k > 0 with $a_0 = 2$. Let $f(x) = \sum_{k=0}^{\infty} a_k x^k$ be the generating function for $\{a_k\}$. Then $x f(x) = \sum_{k=1}^{\infty} a_{k-1} x^k$. Using the recurrence relation directly, we have

$f(x) - 3x f(x) = \sum_{k=0}^{\infty} a_k x^k - 3 \sum_{k=1}^{\infty} a_{k-1} x^k = a_0 + \sum_{k=1}^{\infty} (a_k - 3a_{k-1}) x^k = a_0 = 2$.

Hence, $f(x) - 3x f(x) = (1-3x) f(x) = 2$ or $f(x) = 2/(1-3x)$. Using the identity for $(1-ax)^{-1}$, we see that

$f(x) = \sum_{k=0}^{\infty} 2 \cdot 3^k x^k$ or $a_k = 2 \cdot 3^k$.

154

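Example: $a_n = 8a_{n-1} + 10^{n-1}$ with $a_0 = 1$, which gives us $a_1 = 9$. Find $a_n$ in closed form. First multiply the recurrence relation by $x^n$ to give us $a_n x^n = 8a_{n-1} x^n + 10^{n-1} x^n$. If $f(x) = \sum_{k=0}^{\infty} a_k x^k$, then

$f(x) - 1 = \sum_{k=1}^{\infty} a_k x^k = \sum_{k=1}^{\infty} \left( 8a_{k-1} x^k + 10^{k-1} x^k \right) = 8x f(x) + \dfrac{x}{1-10x}$.

Hence,

$f(x) = \dfrac{1-9x}{(1-8x)(1-10x)} = \dfrac{1}{2}\left( \dfrac{1}{1-8x} + \dfrac{1}{1-10x} \right) = \sum_{k=0}^{\infty} \dfrac{1}{2}\left( 8^k + 10^k \right) x^k$

or $a_n = .5(8^n + 10^n)$.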
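A quick numerical check of this closed form against the recurrence, as a sketch with illustrative function names:

```python
def a_recurrence(n):
    """Iterate a_k = 8*a_{k-1} + 10**(k-1) starting from a_0 = 1."""
    a = 1
    for k in range(1, n + 1):
        a = 8 * a + 10 ** (k - 1)
    return a

def a_closed(n):
    """Closed form (8**n + 10**n) / 2; the sum is always even."""
    return (8**n + 10**n) // 2

print([a_recurrence(n) for n in range(5)])
print([a_closed(n) for n in range(5)])
```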

155

Note: It is possible to prove many identities using generating functions.

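Exclusion-Inclusion Theorem: Given sets $A_i$, 1 ≤ i ≤ n, the number of elements in the union is

$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{i=1}^{n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| - \dots + (-1)^{n+1} \left| \bigcap_{i=1}^{n} A_i \right|$

and there are $2^n - 1$ terms in the formula.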

Note: Venn diagrams motivate the above theorem.

156

Example: A factory produces vehicles that are car or truck based: 2000 could be cars, 4000 could be trucks, and 3200 are SUV's, which can be car or truck based (depending on the frames). How many vehicles were produced? Let $A_1$ be the set of car based vehicles and $A_2$ be the set of truck based vehicles. There are

$|A_1 \cup A_2| = |A_1| + |A_2| - |A_1 \cap A_2| = 2000 + 4000 - 3200 = 2800$.

Theorem: The number of onto functions from a set of m elements to a set of n elements with $m, n \in \mathbb{N}$ and m ≥ n is

$n^m - C(n,1)(n-1)^m + C(n,2)(n-2)^m - \dots + (-1)^{n-1} C(n,n-1) \cdot 1^m$.
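For small m and n the formula can be compared with a direct enumeration of all functions. The sketch below uses illustrative function names and represents a function as a tuple of images.

```python
from itertools import product
from math import comb

def onto_formula(m, n):
    """Sum of (-1)^k C(n,k) (n-k)^m for k = 0, ..., n-1."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** m for k in range(n))

def onto_brute_force(m, n):
    """Count the tuples of m images that hit all n targets."""
    return sum(len(set(images)) == n for images in product(range(n), repeat=m))

print(onto_formula(6, 3), onto_brute_force(6, 3))
```

For m = 6 and n = 3 the formula gives 729 − 192 + 3 = 540.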

157

Definition: A derangement is a permutation of objects such that no object is in its original position.

Theorem: The number of derangements of a set of n elements is

$D_n = \left( 1 + \sum_{k=1}^{n} (-1)^k \dfrac{1}{k!} \right) n!$

Example: I hand back graded exams randomly. What is the probability that no student gets his or her own exam? It is $P_n = D_n / n!$ since there are n! possible permutations. As n → ∞, $P_n \to e^{-1}$.
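The derangement count and the limit can both be checked for small n. This is a sketch with illustrative names; the formula is written as the equivalent sum from k = 0 to n of (−1)^k n!/k!.

```python
from itertools import permutations
from math import exp, factorial

def derangements(n):
    """D_n = n! * sum_{k=0}^{n} (-1)^k / k!, computed in exact integers."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

def derangements_brute_force(n):
    """Count permutations with no element left in its original position."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

for n in range(1, 8):
    print(n, derangements(n), derangements(n) / factorial(n))
print(exp(-1))
```

The ratio settles near 0.3679 very quickly, so even a small class has about a 37% chance that nobody gets their own exam.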

158

Relations

Definition: A relation on a set A is a subset of A×A.

Definition: A binary relation between two sets A and B is a subset of A×B. It is a set R of ordered pairs, denoted aRb when (a,b) ∈ R and $a \not R\, b$ when (a,b) ∉ R.

Definition: An n-ary relation on n sets $A_1, \dots, A_n$ is a subset of $A_1 \times \dots \times A_n$. Each $A_i$ is a domain of the relation and n is the degree of the relation.

Examples:

Let f: A → B be a function. Then the ordered pairs (a,f(a)), a ∈ A, form a binary relation.

Let A = {Springfield} and B = {U.S. states that contain a Springfield}. Then the pairs (Springfield, state), state ∈ B, form a relation with about 44 elements (the so-called Simpsons relation).

Theorem: Let A be a set with n elements. There are $2^{n^2}$ unique relations on A.

159

Proof: We know there are $n^2$ elements in A×A and that there are $2^m$ possible subsets of a set with m elements. Hence, the result.

Definitions: Consider a relation R on a set A. Then
R is reflexive if (a,a) ∈ R, ∀a ∈ A.
R is symmetric if (a,b) ∈ R implies (b,a) ∈ R, ∀a,b ∈ A.
R is antisymmetric if (a,b) ∈ R and (b,a) ∈ R imply a = b, ∀a,b ∈ A.
R is transitive if (a,b) ∈ R and (b,c) ∈ R imply (a,c) ∈ R, ∀a,b,c ∈ A.

Theorem: Let A be a set with n elements. There are $2^{n(n-1)}$ unique reflexive relations on A.

Proof: Each of the n pairs (a,a) ∈ R. The remaining n(n−1) pairs may or may not be in R. The product rule and previous theorem give the result.

160

Examples: Let A = {1, 2, 3, 4}.

R1 = {(1,1), (1,2), (2,1), (2,2), (3,4), (4,1), (4,4)} is
o just a relation

R2 = {(1,1), (1,2), (2,1)} is
o symmetric

R3 = {(1,1), (1,2), (1,4), (2,1), (2,2), (3,3), (4,1), (4,4)} is
o reflexive and symmetric

R4 = {(2,1), (3,1), (3,2), (4,1), (4,2), (4,3)} is
o antisymmetric and transitive

R5 = {(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), (3,3), (3,4), (4,1), (4,4)} is
o reflexive (it is not antisymmetric since it contains both (1,2) and (2,1), and not transitive since it contains (3,4) and (4,1) but not (3,1))

R6 = {(3,4)} is
o antisymmetric

Note: We will come back to these examples when we get around to representations of relations that work in a computer.
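One simple computer representation is a Python set of ordered pairs, for which the four definitions translate almost word for word. The function names are illustrative; R3 and R4 are copied from the examples above.

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3, 4}
R3 = {(1, 1), (1, 2), (1, 4), (2, 1), (2, 2), (3, 3), (4, 1), (4, 4)}
R4 = {(2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3)}

for name, R in [("R3", R3), ("R4", R4)]:
    print(name, is_reflexive(R, A), is_symmetric(R),
          is_antisymmetric(R), is_transitive(R))
```

The same four functions can be run on the other examples to confirm or correct the stated properties.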

161

Note: We can combine two or more relations to get another relation. We use standard set operations (e.g., ∪, ∩, −, ⊕, …).

Definition: Let R be a relation from a set A to a set B and S a relation from B to a set C. Then the composite of R and S is the relation S∘R such that if (a,b) ∈ R and (b,c) ∈ S, then (a,c) ∈ S∘R, where a ∈ A, b ∈ B, and c ∈ C.

Definition: Let R be a relation on a set A. Then $R^n$ is defined recursively: $R^1 = R$ and $R^n = R^{n-1} \circ R$, n > 1.

Theorem: The relation R is transitive if and only if $R^n \subseteq R$, n ≥ 1.

162

Representation: The relation R from a set A to a set B can be represented by a zero-one matrix MR = [mij], where

$m_{ij} = \begin{cases} 1 & \text{if } (a_i, b_j) \in R, \\ 0 & \text{if } (a_i, b_j) \notin R. \end{cases}$

Notes:

This is particularly useful on computers, especially ones with hardware bit operations for packed words.
$M_R$ contains I (the diagonal is all ones) for reflexive relations.
$M_R = M_R^T$ for symmetric relations.
$m_{ij} = 0$ or $m_{ji} = 0$ when i ≠ j for antisymmetric relations.

163

Examples:

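$M_R = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$ is reflexive and symmetric (it is not transitive, since $m_{12} = m_{23} = 1$ but $m_{13} = 0$).

$M_R = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ is antisymmetric.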

164

Representation: A relation can be represented as a directed graph (or digraph). For (a,b) ∈ R, a and b are vertices (or nodes) in the graph and a directional edge runs from a to b.

Example: The following digraph represents {(a,b), (b,c), (c,a), (c,b)}.

[Figure: digraph on the vertices a, b, c with directed edges a→b, b→c, c→a, and c→b]

What about all of those examples on page 159 of the class notes? We can do all of them over in either representation.

165

Examples (from page 159):

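$M_{R_1} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}$

$M_{R_2} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ or a digraph

[Figure: digraph on the vertices a1 and a2 with a loop at a1 and directed edges a1→a2 and a2→a1]

$M_{R_3} = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}$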

166

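$M_{R_4} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}$

$M_{R_5} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}$

$M_{R_6} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$ or the digraph

[Figure: digraph with a single directed edge a3→a4]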

167

Definition: A relation on a set A is an equivalence relation if it is reflexive, symmetric, and transitive. Two elements a and b that are related by an equivalence relation are called equivalent and denoted a~b.

Examples:

Let A = Z. Define aRb if and only if either a = b or a = −b.
o reflexive: aRa since a = a.
o symmetric: aRb ⇒ bRa since a = ±b ⇒ b = ±a.
o transitive: aRb and bRc ⇒ aRc since a = ±b = ±c.

Let A = R. Define aRb if and only if a−b ∈ Z.
o reflexive: aRa since a−a = 0 ∈ Z.
o symmetric: aRb ⇒ bRa since a−b ∈ Z ⇒ −(a−b) = b−a ∈ Z.
o transitive: aRb and bRc ⇒ aRc since (a−b)+(b−c) ∈ Z ⇒ a−c ∈ Z.

168

Definition: Let R be an equivalence relation on a set A. The set of all elements that are related to an element a ∈ A is called the equivalence class of a and is denoted by $[a]_R$. When R is obvious, it is just [a]. If b ∈ $[a]_R$, b is called a representative of this equivalence class.

Example: Let A = Z. Define aRb if and only if either a = b or a = −b. There are two cases for the equivalence class:
[0] = {0}
[a] = {a, −a} if a ≠ 0.

169

Theorem: Let R be an equivalence relation on a set A. For a,b ∈ A, the following are equivalent:

1. aRb
2. [a] = [b]
3. [a] ∩ [b] ≠ ∅.

Proof: 1 ⇒ 2 ⇒ 3 ⇒ 1.

1 ⇒ 2: Assume aRb. Suppose c ∈ [a]. Then aRc. Due to symmetry, we know that bRa. Knowing that bRa and aRc, by transitivity, bRc. Hence, c ∈ [b]. A similar argument shows that if c ∈ [b], then c ∈ [a]. Hence, [a] = [b].

2 ⇒ 3: Assume that [a] = [b]. Since a ∈ A and R is reflexive, a ∈ [a] = [b], so [a] ∩ [b] ≠ ∅.

3 ⇒ 1: Assume [a] ∩ [b] ≠ ∅. So there is a c ∈ [a] and c ∈ [b], too. So, aRc and bRc. By symmetry, cRb. By transitivity, aRc and cRb, so aRb.

Lemma: For any equivalence relation R on a set A, $\bigcup_{a \in A} [a]_R = A$.
Proof: For all a ∈ A, a ∈ $[a]_R$.

170

Definition: A partition of a set S is a collection of disjoint nonempty sets whose union is S.

Theorem: Let R be an equivalence relation on a set S. Then the equivalence classes of R form a partition of S. Conversely, given a partition $\{A_i \mid i \in I\}$ of the set S, there is an equivalence relation R that has the sets $A_i$, i ∈ I, as its equivalence classes.

171

Graphs

Definition: A graph G = (V,E) consists of a nonempty set of vertices V and a set of edges E. Each edge has either one or two vertices as endpoints. An edge connects its endpoints.

Note: We will only study finite graphs (|V| < ∞).

Categorizations:

A simple graph has edges that each connect two different vertices, and no two edges connect the same pair of vertices.
A multigraph has multiple edges connecting the same vertices.
A loop is an edge from a vertex back to itself.
A pseudograph is a graph that may include loops and multiple edges.
An undirected graph is a graph in which the edges do not have direction.
A mixed graph has both directed and undirected edges.

172

Definition: Two vertices u and v in an undirected graph G are adjacent (or neighbors) in G if u and v are endpoints of an edge e in G. Edge e is incident to {u,v} and e connects u and v.

Definition: The degree of a vertex v, denoted deg(v), in an undirected graph is the number of edges incident with it except that loops contribute twice to the degree of that vertex. If deg(v) = 0, then it is isolated. If deg(v) = 1, then it is a pendant.

Handshaking Theorem: If G = (V,E) is an undirected graph with e edges, then $e = \left( \sum_{v \in V} \deg(v) \right) / 2$.

Proof: Each edge contributes 2 to the sum since it is incident to 2 vertices.

Example: Let G = (V,E). Suppose |V| = 100,000 and deg(v) = 4 for all v ∈ V. Then there are (4 × 100,000)/2 = 200,000 edges.

173

Theorem: An undirected graph has an even number of vertices of odd degree.

Definition: Let (u,v) ∈ E in a directed graph G(V,E). Then u and v are the initial and terminal vertices of (u,v), respectively. The initial and terminal vertices of a loop (u,u) are both u.

Definition: The in-degree of a vertex, denoted deg⁻(v), is the number of edges with v as their terminal vertex. The out-degree of a vertex, denoted deg⁺(v), is the number of edges with v as their initial vertex.

Theorem: For a directed graph G(V,E), $\sum_{v \in V} \deg^-(v) = \sum_{v \in V} \deg^+(v) = |E|$.

174

Examples of Simple Graphs:

A complete graph has an edge between every pair of distinct vertices.

A cycle Cn is a graph with |V| = n ≥ 3 whose n edges are {v1,v2}, {v2,v3}, …, {vn,v1}.

A wheel Wn is a cycle Cn with an extra vertex with an edge connecting to each vertex in Cn.

175

Definition: A simple graph G = (V,E) is bipartite if V = V1 ∪ V2 with V1 ∩ V2 = ∅ and every edge in the graph connects a vertex in V1 to a vertex in V2. The pair (V1,V2) is a bipartition of V in G.

Theorem: A simple graph is bipartite if and only if it is possible to assign one of two colors to each vertex of the graph so that no two adjacent vertices are assigned the same color.
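The two-coloring theorem suggests a direct test for bipartiteness. The sketch below assumes the graph is given as a dictionary mapping each vertex to its set of neighbors; the name `two_color` and the sample graphs are illustrative choices, not from the notes.

```python
from collections import deque

def two_color(adj):
    """Try to assign colors 0/1 so that adjacent vertices differ.

    adj maps each vertex to the set of its neighbors (undirected graph).
    Returns a dict vertex -> color, or None if the graph is not bipartite.
    """
    color = {}
    for start in adj:                 # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return None       # two adjacent vertices share a color
    return color

c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}   # cycle C4: bipartite
c3 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}             # cycle C3: not bipartite
coloring = two_color(c4)
```

When a coloring is returned, the vertices colored 0 and the vertices colored 1 form a bipartition (V1,V2).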

Definition: The union of two simple graphs G = (V,E) and H = (W,F) is the simple graph G ∪ H = (V ∪ W, E ∪ F).

176

Representation: For graphs without multiple edges we can use adjacency lists or matrices. For general graphs we can use incidence matrices.

Definition: Let G(V,E) have no multiple edges. The adjacency list L_G = {a_v}_{v ∈ V}, where a_v = adj(v) = {w ∈ V | w is adjacent to v}.

Definition: Let G(V,E) have no multiple edges. The adjacency matrix A_G = [a_ij] is

$a_{ij} = \begin{cases} 1 & \text{if } \{v_i, v_j\} \text{ is an edge of } G, \\ 0 & \text{otherwise.} \end{cases}$

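Example: The graph with vertices v1, v2 (top row) and v4, v3 (bottom row) [figure omitted] results in

$A_G = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$

and L_G =

v1: v2, v3
v2: v1, v4
v3: v1, v4
v4: v2, v3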
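Both representations can be built from a list of edges. This is a minimal sketch: the function name `adjacency` is illustrative, and the edge list is read off the matrix in the example, since the original drawing is not available.

```python
def adjacency(n, edges):
    """Build the adjacency matrix and adjacency list of an undirected
    graph with vertices v1..vn (numbered 1..n) and no multiple edges."""
    A = [[0] * n for _ in range(n)]
    L = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1          # undirected: the matrix is symmetric
        L[i].append(j)
        L[j].append(i)
    return A, {v: sorted(ws) for v, ws in L.items()}

# Edges inferred from the nonzero entries of the example matrix.
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
A, L = adjacency(4, edges)
```

The matrix takes |V|² entries regardless of the number of edges, while the list takes space proportional to |V| + |E|.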

177

Note: For an undirected graph, $A_G = A_G^T$. However, this is not necessarily true for a directed graph.

Definition: The incidence matrix M = [m_ij] for G(V,E) is

$m_{ij} = \begin{cases} 1 & \text{when edge } e_i \text{ is incident with } v_j, \\ 0 & \text{otherwise.} \end{cases}$

Definition: The simple graphs G = (V,E) and H = (W,F) are isomorphic if there is an isomorphism f: V → W, a one to one, onto function, such that a and b are adjacent in G if and only if f(a) and f(b) are adjacent in H for all a, b ∈ V.

178

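Examples:

[Figure: two graphs, drawn with vertices v1, v2, v4, v3 and v1, v2, v3, v4; edges not recoverable.] These are not isomorphic.

[Figure: two graphs, drawn with vertices v1, v2, v3, v4 and v1, v2, v4, v3; edges not recoverable.] These are isomorphic.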

Note: Isomorphic simple graphs have the same number of vertices and edges.

Definition: A property preserved by graph isomorphism is called a graph invariant.

Note: Determining whether or not two graphs are isomorphic has exponential worst case complexity, but linear average case complexity, using the best algorithms known.

179

Definition: Let G = (V,E) be an undirected graph and n ∈ N. A path of length n for u, v ∈ V is a sequence of edges e1, e2, …, en ∈ E with associated vertices in V of u = x0, x1, …, xn = v. A circuit is a path with u = v. A path or circuit is simple if all of the edges are distinct.

Notes:

We already defined these terms for directed graphs.

The terminal vertex of the first edge in a path is the initial vertex of the second edge.

We can define a path using a recursive definition.

Definition: An undirected graph is connected if there is a path between every pair of distinct vertices in the graph.

180

Theorem: There is a simple path between every distinct pair of vertices of a connected undirected graph G = (V,E).

Proof: Let u, v ∈ V such that u ≠ v. Since G is connected, there is a path from u to v that has minimum length n. Suppose this path is not simple. Then in this minimum length path, there is some pair of vertices x_i = x_j ∈ V for some 0 ≤ i < j ≤ n. Deleting the edges between x_i and x_j gives a shorter path from u to v, which is a contradiction.

Definition: A connected component of a graph G is a connected subgraph of G that is not a proper subgraph of another connected subgraph of G.

Note: A connected component is a maximal connected subgraph.
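Connected components can be found by repeatedly exploring outward from a vertex that has not yet been reached. The following is a sketch under the assumption that the graph is stored as a dictionary of neighbor sets; the function name and the sample graph are illustrative.

```python
def connected_components(adj):
    """Return the vertex sets of the connected components of an
    undirected graph given as {vertex: set of neighbors}."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:                  # explore everything reachable from start
            u = stack.pop()
            component.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components

# Hypothetical graph: a path 1-2-3, an edge 4-5, and an isolated vertex 6.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
components = connected_components(adj)
```

A graph is connected exactly when this function returns a single component.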

181

Example: Telecoms analyze call graphs routinely in order to provide better, less expensive services. The old AT&T used to publish information routinely (typically by Bell Labs researchers). One of their recent published graphs G = (V,E) had |V| ~ 54,000,000 with |E| ~ 170,000,000. G had approximately 3,700,000 connected components. Most of the components were of size 2 or just slightly larger. However, one was of size approximately 45,000,000, with any two of its vertices linked by a chain of at most 20 calls.

Note: Sometimes removing a vertex v and all of the edges incident to v produces a subgraph with more connected components than the original graph. Such a vertex v is called a cut vertex or an articulation point.

Definition: A directed graph G = (V,E) is strongly connected if there are paths from both u to v and v to u for all distinct u, v ∈ V. G is weakly connected if there is a path between any two distinct vertices in the underlying undirected graph. The maximal strongly connected subgraphs of G are strongly connected components.

182

Theorem: Let G = (V,E) be a graph with adjacency matrix A. The number of different paths of length n from vi to vj, where vi, vj ∈ V and n ∈ N, is the (i,j) entry in A^n.

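Example: For the graph with vertices v1, v2, v3, v4 [figure omitted],

$A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$ and $A^4 = \begin{bmatrix} 8 & 0 & 0 & 8 \\ 0 & 8 & 8 & 0 \\ 0 & 8 & 8 & 0 \\ 8 & 0 & 0 & 8 \end{bmatrix}$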

Note: The theorem can be used to find the length of the shortest path between any two vertices (the smallest n for which the (i,j) entry of A^n is nonzero) and also to determine if a graph is connected.
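The path-counting theorem is easy to try out with plain integer matrix multiplication. This is a small sketch with illustrative helper names (`mat_mul`, `mat_pow`, `shortest_length`); it uses 0-based indices, so vertex v1 is index 0.

```python
def mat_mul(X, Y):
    """Product of two square integer matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, n):
    """A^n for n >= 1 by repeated multiplication."""
    result = A
    for _ in range(n - 1):
        result = mat_mul(result, A)
    return result

def shortest_length(A, i, j):
    """Smallest n >= 1 with a nonzero (i, j) entry in A^n, or None.
    For distinct vertices it suffices to check n up to |V| - 1."""
    power = A
    for n in range(1, len(A)):
        if power[i][j] > 0:
            return n
        power = mat_mul(power, A)
    return None

A = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]
A4 = mat_pow(A, 4)
```

For this matrix, `A4` reproduces the A^4 of the example, and `shortest_length(A, 0, 3)` reports that v1 and v4 are two edges apart.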

183

Definition: Let G = (V,E) have an associated weighting function w: V × V → R. G is called a weighted graph. The weighted length of a path in G is the sum of the weights for the edges in the path.

Example: Let G = (V,E) be a weighted graph where V represents airports. Then some interesting weighting functions include the following between pairs of distinct airports:

Distance

Flight times

Airfares

Frequent flier miles

Frequent flier qualification miles

Note: Weighted graphs are extremely important in analyzing transportation of goods and people and trying to minimize time and expenses.

184

Dijkstra’s Algorithm (Shortest Path) – [published in 1959]

Procedure Dijkstra( G = (V,E) with w: V × V → R+. G is a weighted connected simple graph; a, z ∈ V: initial and terminal vertices )

    for i := 1 to n
        L(vi) := ∞
    L(a) := 0
    S := ∅
    while z ∉ S
        u := a vertex not in S with L(u) minimal
        S := S ∪ {u}
        for all v ∈ V such that v ∉ S
            if L(u) + w(u,v) < L(v) then L(v) := L(u) + w(u,v)
    { L(z) = length of shortest path from a to z. }

185

Theorem: Dijkstra’s algorithm finds the length of the shortest path between two vertices in a connected simple undirected weighted graph. The algorithm uses O(n²) comparison and addition operations.
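The procedure translates almost line for line into code. The sketch below follows the O(n²) formulation (a linear scan for the minimal label rather than a priority queue). The weight table keyed by unordered vertex pairs and the sample graph are assumptions made for illustration.

```python
import math

def dijkstra(vertices, weight, a, z):
    """Length of a shortest path from a to z.

    weight maps frozenset({u, v}) to the positive weight of edge {u, v};
    pairs that are absent are treated as non-adjacent.
    The graph is assumed to be connected.
    """
    L = {v: math.inf for v in vertices}   # tentative labels
    L[a] = 0
    S = set()                             # vertices with final labels
    while z not in S:
        u = min((v for v in vertices if v not in S), key=L.get)
        S.add(u)
        for v in vertices:
            edge = frozenset((u, v))
            if v not in S and edge in weight and L[u] + weight[edge] < L[v]:
                L[v] = L[u] + weight[edge]
    return L[z]

# Hypothetical weighted graph on the vertices a, b, c, d, z.
EDGES = {("a", "b"): 4, ("a", "c"): 2, ("b", "c"): 1,
         ("b", "d"): 5, ("c", "d"): 8, ("d", "z"): 3}
WEIGHT = {frozenset(e): w for e, w in EDGES.items()}
VERTICES = ["a", "b", "c", "d", "z"]
length = dijkstra(VERTICES, WEIGHT, "a", "z")
```

In this sample graph the shortest route is a, c, b, d, z, with weighted length 2 + 1 + 5 + 3.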

Traveling Salesman Problem: Find the circuit of minimum total weight in a weighted complete undirected graph that visits every vertex exactly once and returns to its starting vertex.

Note: A brute force search must consider on the order of n! possible circuits ((n−1)!/2 distinct ones), which is intractable when n is sufficiently large. A tremendous amount of research has been devoted to finding fast approximate solution algorithms. The best ones can produce a circuit through 1,000 vertices in a few seconds and still be within 2% of the optimum circuit.

186

Definition: A coloring of a simple graph is the assignment of a color to each vertex of the graph so that no adjacent vertices are assigned the same color.

Definition: The chromatic number χ(G) is the least number of colors needed for a coloring of the graph G = (V,E).

Definition: A planar graph is a graph that can be drawn in a plane with no edges crossing in the picture.

Four Color Theorem: If G is a planar graph, then χ(G) ≤ 4.

Note: The Four Color Conjecture was made in the 1850’s and not proven until 1976. Like Fermat’s last theorem, this theorem became famous partly for how many wrong proofs (some quite ingenious) were either published or submitted for publication.

187

Trees

Definition: A tree is a connected undirected graph with no simple circuits. A weighted tree is a tree with weights associated with the edges.

Uses:

An efficient data structure for searching a list.
    o Useful in encoding data for transmission.
    o Computational complexity easily determined for algorithms using trees.

Weighted trees have edges with weights.
    o Useful in decision making.
    o Used by telecoms to dynamically connect calls cheaply.

Historical Note: Trees, in the sense used in this course, were first developed to describe molecules in chemistry, where atoms were the vertices and bonds were the edges.

188

Theorem: An undirected graph T = (V,E) is a tree if and only if there is a unique simple path between any two of its distinct vertices.

Proof:

1. Assume T is a tree, so it has no simple circuits. Since T is connected, for all distinct u, v ∈ V, there is exactly one simple path between u and v. Otherwise, there is another simple path. Combining the two simple paths gives a circuit, which contradicts that T is a tree.

2. Assume that there is a unique simple path between any two distinct vertices u, v ∈ V. Then T is connected. T has no simple circuits, since a simple circuit through u and v would give two simple paths between u and v, which is a contradiction.

Definition: A rooted tree is a tree with one vertex designated as the root and every edge is directed away from the root.

Note: Any tree can become a rooted tree by picking any one of its vertices as the root. Different choices of root give different rooted trees.

189

Terminology/Definitions: Let T = (V,E) be a rooted tree. Then

If v ∈ V is not the root, the parent u ∈ V of v is the vertex with an edge directed at v, and v is a child of u.

If vi ∈ V are children of the same u ∈ V, they are siblings.

The ancestors vi ∈ V of u ∈ V are the vertices other than u that are in the path from the root to u, including the root.

The descendants vi ∈ V of u ∈ V are all vertices with u as an ancestor.

A leaf v ∈ V is a vertex with no children.

An internal vertex v ∈ V has children.

A subtree is the subgraph formed from a ∈ V and all of its descendants and the edges incident to these descendants.

The height of a rooted tree T, denoted h(T), is the maximum level of a vertex, i.e., the length of the longest path from the root to a leaf.

A balanced rooted tree T has all of its leaves at levels h(T) or h(T)−1.

190

Definition: An m-ary tree is a rooted tree such that every internal vertex has no more than m children. A full m-ary tree is a rooted tree such that every internal vertex has exactly m children. If m = 2, it is a (full) binary tree.

Definition: An ordered rooted tree is a rooted tree with an ordering applied to the children of the root and of each internal vertex.

Examples:

Management charts

Directory based file or memory systems

Theorem: A tree with n vertices has n − 1 edges.

The proof is by mathematical induction.

Theorem: A full m-ary tree with i internal vertices contains n = mi+1 vertices.

Proof: There are mi children plus the root.

191

Theorem: A full m-ary tree with

n vertices has i = (n − 1)/m internal vertices and q = [(m − 1)n + 1]/m leaves.

i internal vertices has n = mi + 1 vertices and q = (m − 1)i + 1 leaves.

q leaves has n = (mq − 1)/(m − 1) vertices and i = (q − 1)/(m − 1) internal vertices.
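The three parts of the theorem are mutually consistent, which can be checked with a few lines of arithmetic. The function names below are illustrative; the integer divisions are exact whenever the inputs describe an actual full m-ary tree.

```python
def from_internal(m, i):
    """Given i internal vertices, return (n, q) = (vertices, leaves)."""
    return m * i + 1, (m - 1) * i + 1

def from_vertices(m, n):
    """Given n vertices, return (i, q) = (internal vertices, leaves)."""
    return (n - 1) // m, ((m - 1) * n + 1) // m

def from_leaves(m, q):
    """Given q leaves, return (n, i) = (vertices, internal vertices)."""
    return (m * q - 1) // (m - 1), (q - 1) // (m - 1)

# Hypothetical example: a full 3-ary tree with 4 internal vertices.
n, q = from_internal(3, 4)
```

Starting from i = 4 with m = 3 gives n = 13 and q = 9, and the other two formulas recover the same triple.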

Theorem: There are at most m^h leaves in an m-ary tree of height h.

The proof uses mathematical induction.

Corollary: If an m-ary tree of height h has q leaves, then h ≥ ⌈log_m q⌉. For a full and balanced m-ary tree, h = ⌈log_m q⌉.

192

Definition: A binary search tree T = (V,E) is a binary tree with a key for each vertex. The keys are ordered such that a key for a vertex is greater in value than all keys associated with its left subtree and less in value than all keys associated with its right subtree. The key for vertex v ∈ V is denoted by label(v).

Note: Recursive algorithms search a binary search tree of height h for a key in O(h) operations. For a balanced tree with n vertices this is O(log n) operations.

Notation: Let T = (V,E) be a binary tree.

Let root(T) be the root vertex in T.

Let left_child(v) and right_child(v) refer to the left or right child of a root or internal vertex v in a binary tree.

Let add_new_vertex(parent, value) add a new left or right vertex to the parent vertex with a key of value. The details are left intentionally fuzzy.

Note: One of the most common operations with a binary tree is to search it. Another is to search a binary tree for a key and add it if it is missing.

193

procedure insertion( T = (V,E): binary tree, x: item )
    v := root(T)
    while v ≠ ∅ and label(v) ≠ x
        if x < label(v) then
            if left_child(v) ≠ ∅ then
                v := left_child(v)
            else
                add_new_vertex(left_child(v), x) and v := ∅
        else
            if right_child(v) ≠ ∅ then
                v := right_child(v)
            else
                add_new_vertex(right_child(v), x) and v := ∅
    if root(T) = ∅ then
        add_new_vertex(T, x)
    else if v = ∅ or label(v) ≠ x then
        label the new vertex x and set v := the new vertex
    { v = location of x. }
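A compact version of the locate-or-add procedure is sketched below. The class names `Vertex` and `BinarySearchTree` are illustrative, and the sketch fills in the "intentionally fuzzy" add_new_vertex step by creating the child directly, which is one possible choice rather than the one intended in the notes.

```python
class Vertex:
    def __init__(self, label):
        self.label = label
        self.left = None
        self.right = None

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insertion(self, x):
        """Locate x, adding it if it is missing; return the vertex labeled x."""
        if self.root is None:
            self.root = Vertex(x)
            return self.root
        v = self.root
        while v.label != x:
            if x < v.label:
                if v.left is None:
                    v.left = Vertex(x)    # x was missing: add it on the left
                v = v.left
            else:
                if v.right is None:
                    v.right = Vertex(x)   # x was missing: add it on the right
                v = v.right
        return v

def inorder(v):
    """Labels of the subtree rooted at v in increasing order."""
    return [] if v is None else inorder(v.left) + [v.label] + inorder(v.right)

tree = BinarySearchTree()
for key in [5, 3, 8, 1, 4, 8]:            # the second 8 is found, not re-added
    tree.insertion(key)
```

Reading the tree in inorder lists the keys in sorted order, which is a quick sanity check that the ordering property holds.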

194

Definition: A decision tree is a rooted tree in which each internal vertex corresponds to a decision and its children are the possible outcomes of that decision.

Note: There is usually a weighting associated with a decision tree. The keys may not be unique.

Definition: A prefix code is an encoding based on bit strings representing symbols such that a symbol, as a bit string, never occurs as the first part of another symbol’s bit string.

Example: We can normally represent a-z in 5 bits per letter and a-zA-Z in 6 bits per letter. Suppose we only have 3 letters: a = 0, c = 10, and t = 11. Then cat = 10011, which is 5 bits instead of the 6 bits a fixed 2 bit code would use. Wowee! We saved one whole bit!!!

Representation: Prefix codes form a binary tree.
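Because no code word is the first part of another, a bit string can be decoded greedily from left to right. A minimal sketch using the three-letter code from the example follows; the function names are illustrative.

```python
CODE = {"a": "0", "c": "10", "t": "11"}

def is_prefix_code(code):
    """True if no code word is the first part of a different code word."""
    words = list(code.values())
    return not any(i != j and y.startswith(x)
                   for i, x in enumerate(words)
                   for j, y in enumerate(words))

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Read bits left to right, emitting a symbol as soon as one matches."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)

encoded = encode("cat", CODE)
```

The greedy decoder corresponds to walking down the binary tree from the root and restarting at the root each time a leaf is reached.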

195

Example: The prefix code for a = 0, c = 10, and t = 11 is stored as

      •
   0 / \ 1
    a   •
     0 / \ 1
      c   t

Definition: A Huffman coding takes the frequencies of the symbols as input and produces a prefix code that encodes the text with the smallest number of bits among all prefix codes.

Note: Huffman coding was a course project by a graduate student at MIT in the 1950’s. Needless to say, his professor was stunned.

196

procedure Huffman( ai: symbols, wi: frequencies, 1 ≤ i ≤ n )
    F := forest of n rooted trees, each with a single vertex ai with weight wi
    while F is not a tree
        Replace the rooted trees T and T’ of least weights from F with w(T) ≤ w(T’) with a tree T’’ having a new root that has T and T’ as its left and right children. Label the edge to T as 0 and the edge to T’ as 1.
        Assign w(T) + w(T’) to the new tree T’’
    { Huffman encoding tree is complete. }

197

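Example: Given {(a,1), (c,2), (t,3)} as (symbol,frequency). What is the Huffman coding?

Initial forest: (a,1) (c,2) (t,3)

Step 1: The trees (a,1) and (c,2) are merged into a tree of weight 3, with the edge labeled 0 leading to a and the edge labeled 1 leading to c. The tree (t,3) is unchanged.

Step 2: [Figure: a single tree of weight 6 with edges labeled 0 and 1 and leaves a, c, and t.]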
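The forest-merging loop maps naturally onto a priority queue. The sketch below is one possible realization, not the notes' own code: it assumes at least two symbols, gives the lighter of the two merged trees the edge label 0, and breaks ties by insertion order (both are choices made here for illustration). Symbols must not themselves be tuples, since tuples represent internal vertices.

```python
import heapq
from itertools import count

def huffman(freq):
    """Return {symbol: bit string} for a dict of symbol frequencies."""
    tie = count()                      # tie-breaker so trees are never compared
    heap = [(w, next(tie), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, t0 = heapq.heappop(heap)    # lightest tree: edge label 0
        w1, _, t1 = heapq.heappop(heap)    # next lightest: edge label 1
        heapq.heappush(heap, (w0 + w1, next(tie), (t0, t1)))
    code = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal vertex: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                              # leaf: a symbol
            code[tree] = prefix

    walk(heap[0][2], "")
    return code

FREQ = {"a": 1, "c": 2, "t": 3}
code = huffman(FREQ)
total_bits = sum(FREQ[s] * len(code[s]) for s in FREQ)
```

Different tie-breaking rules can produce different bit strings, but every Huffman code for these frequencies uses the same total number of bits.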

198

Note: Game trees are another highly studied tree.

Definition ( Minimax Strategy ) : The value of a vertex in a game tree is defined recursively as:

1. The value of a leaf is the payoff to the first player when the game terminates in the position represented by this leaf.

2. The value of an internal vertex at an even level is the maximum of the values of its children. The value of an internal vertex at an odd level is the minimum of the values of its children.

Theorem: The value of a vertex v of a game tree tells us the payoff to the first player if both players follow the Minimax strategy and play starts from the position represented by vertex v.

Notes: Game trees

Are enormous (not just slightly, but really, really enormous)

Lead to optimal solutions (if you can compute them)

Are basically intractable using standard computers
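The recursive definition of the value of a vertex can be written down directly. In this sketch a game tree is a nested list whose leaves are payoffs to the first player; that representation and the sample tree are assumptions made for illustration.

```python
def minimax(vertex, level=0):
    """Value of a game tree vertex.

    A leaf is a number (the payoff to the first player); an internal
    vertex is a list of its children. Even levels maximize, odd levels
    minimize, with the root at level 0.
    """
    if not isinstance(vertex, list):
        return vertex
    values = [minimax(child, level + 1) for child in vertex]
    return max(values) if level % 2 == 0 else min(values)

# Hypothetical two-move game: the first player picks a branch,
# then the second player picks a leaf within it.
tree = [[3, 5], [2, 9], [0, 7]]
value = minimax(tree)
```

The second player would answer the three branches with 3, 2, and 0, so the first player's best choice is the first branch.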

199

Note: Tree traversal is extremely important to accessing data. There are many algorithms, each with a plus and a minus. We will study three traversal algorithms:

Preorder

Inorder

Postorder

These traversal methods are used not only for data storage, but also for representing arithmetic expressions, which is useful for compilers.

Definition: The universal addressing system is defined recursively for an ordered rooted tree T = (V,E). The root r ∈ V is labeled 0 and its k children are labeled 1, …, k. For each vertex v ∈ V, labeled Av, its n children are labeled Av.1, Av.2, …, Av.n.

200

Example: Given a tree T = (V,E) with keys ordered 0 < 1 < 1.1 < 2 < 2.1 < 2.2 < 2.2.1 < 2.3, we represent it as

              0
            /   \
           1     2
           |   / | \
         1.1 2.1 2.2 2.3
                  |
                2.2.1

We will use this example for quite some time.

201

Definition ( Preorder Traversal ) : Let T be an ordered rooted tree with root r. If T consists only of r, then r is the preorder traversal of T. Otherwise, suppose T1, T2, …, Tn are subtrees at r from left to right in T. Then the preorder traversal begins at r and continues by traversing T1 in preorder, T2 in preorder, …, and Tn in preorder.

Example: In the universal addressing tree example above, the preorder traversal order is 0, 1, 1.1, 2, 2.1, 2.2, 2.2.1, and 2.3.

Definition ( Inorder Traversal ) : Let T be an ordered rooted tree with root r. If T consists only of r, then r is the inorder traversal of T. Otherwise, suppose T1, T2, …, Tn are subtrees at r from left to right in T. Then the inorder traversal begins by traversing T1 in inorder, then r, and continues with T2 in inorder, …, and Tn in inorder.

Example: In the universal addressing tree example above, the inorder traversal order is 1.1, 1, 0, 2.1, 2, 2.2.1, 2.2, and 2.3.

202

Definition ( Postorder Traversal ) : Let T be an ordered rooted tree with root r. If T consists only of r, then r is the postorder traversal of T. Otherwise, suppose T1, T2, …, Tn are subtrees at r from left to right in T. Then the postorder traversal begins by traversing T1 in postorder, T2 in postorder, …, Tn in postorder, and r.

Example: In the universal addressing tree example above, the postorder traversal order is 1.1, 1, 2.1, 2.2.1, 2.2, 2.3, 2, and 0.

Notation: Let add_to_list(v) be a global function to append a vertex v to a list. The list must be initialized to empty at some point before use.

Note: The tree traversal algorithms are all easily defined recursively using a global list that must be initialized first.

203

procedure preorder_traversal( T: ordered rooted tree )
    r := root(T)
    add_to_list(r)
    for each child c of r from left to right
        T(c) := subtree with c as its root
        preorder_traversal( T(c) )

procedure inorder_traversal( T: ordered rooted tree )
    r := root(T)
    if r is a leaf then add_to_list(r)
    else
        q := first child of r from left to right
        T(q) := subtree with q as its root
        inorder_traversal( T(q) )
        add_to_list(r)
        for each remaining child c of r from left to right
            T(c) := subtree with c as its root
            inorder_traversal( T(c) )

204

procedure postorder_traversal( T: ordered rooted tree )
    r := root(T)
    for each child c of r from left to right
        T(c) := subtree with c as its root
        postorder_traversal( T(c) )
    add_to_list(r)
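The three traversals can be run on the universal addressing example tree. In this sketch the tree is stored as a dictionary from a vertex to its ordered list of children (an assumed representation), and each function returns a list instead of appending to a global list.

```python
# Children of each internal vertex, left to right; leaves are omitted.
CHILDREN = {
    "0": ["1", "2"],
    "1": ["1.1"],
    "2": ["2.1", "2.2", "2.3"],
    "2.2": ["2.2.1"],
}

def preorder(v, children):
    out = [v]                                  # root first
    for c in children.get(v, []):
        out += preorder(c, children)
    return out

def inorder(v, children):
    kids = children.get(v, [])
    if not kids:
        return [v]
    out = inorder(kids[0], children) + [v]     # first subtree, then root
    for c in kids[1:]:
        out += inorder(c, children)            # then the remaining subtrees
    return out

def postorder(v, children):
    out = []
    for c in children.get(v, []):
        out += postorder(c, children)
    return out + [v]                           # root last

pre = preorder("0", CHILDREN)
ino = inorder("0", CHILDREN)
post = postorder("0", CHILDREN)
```

The three resulting lists agree with the traversal orders given in the examples for this tree.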

Definition: Logic and arithmetic expressions can be represented using binary trees. Traversing the binary tree in inorder, preorder, or postorder produces infix, prefix, or postfix notation, respectively.

Note: The best known is postfix notation, otherwise known as reverse Polish notation (RPN). This was used in the first pocket sized scientific electronic calculator, the HP-35 (1972). This notation is valuable in writing compilers, too. See

http://glow.sourceforge.net/tutorial/lesson7/side_rpn.html

http://www.hpmuseum.org/rpn.htm

205


Examples: Parentheses disappear completely. It is best to think of an RPN calculator as a stack machine where data is in the stack and arithmetic operates on the top elements of the stack.

The expression 2+3 is written as 2 3 + in RPN.

The expression [(9+3) * (4/2)] - [(3*x) + (2-y)] is written as 9 3 + 4 2 / * 3 x * 2 y - + - in RPN, where x and y are numbers.
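The stack machine view of RPN fits in a few lines. The following is a sketch that assumes tokens are separated by spaces and that variables are supplied in a dictionary; the function name `eval_rpn` is illustrative.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(expression, variables=None):
    """Evaluate a space-separated RPN expression with a stack."""
    variables = variables or {}
    stack = []
    for token in expression.split():
        if token in OPS:
            right = stack.pop()            # top of stack is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

small = eval_rpn("2 3 +")
big = eval_rpn("9 3 + 4 2 / * 3 x * 2 y - + -", {"x": 2, "y": 1})
```

With x = 2 and y = 1 the long expression is (12 * 2) - (6 + 1), and no parentheses or precedence rules are needed to evaluate it.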

Tree representation: Labels are the operations on internal vertices or the root and values (constants or simple variables) on the leaves.

Example: 4 * 3 + 2 in RPN is 4 3 * 2 +, or

        +
       / \
      *   2
     / \
    4   3

206

Definition: Let G = (V,E) be a simple graph. A spanning tree of G is a subgraph of G that is a tree containing every vertex in G.

Example: Your instructor wants his town, the states of Connecticut and New York, and New York City to keep the roads and highways connecting his house and LaGuardia airport cleared of ice and snow. A graph connecting each of the relevant endpoints and connecting points can be made. The relevant agencies can use this graph when deciding how to keep roads open after a storm.

[Figure: three drawings of a graph on the vertices G, PC, RB, S, WB, and LGA; edges not recoverable.]

207

Theorem: A simple graph G is connected if and only if it has a spanning tree T.

Example: Multicasting over networks.

Note: Constructing a spanning tree can be done in many different ways, including some very inefficient ones. Two common ways are depth first and breadth first searches.

Notation: Let visit(v) mean that we keep track of when we first go to vertex v until we return to v using a backtrack.

procedure visit( v: vertex of the connected graph G = (V,E), T: tree )
    for each w ∈ V adjacent to v and not yet in T
        add w and edge {v,w} to T
        visit( w, T )

208

procedure depth_first( G = (V,E): connected graph )
    T := tree with only some single v ∈ V
    visit( v, T )
    { T is a spanning tree. }

procedure breadth_first( G = (V,E): connected graph )
    T := tree with only some single v ∈ V
    L := v
    while L ≠ ∅
        Remove first vertex v ∈ L
        for each neighbor w ∈ V of v
            if w ∉ L and w ∉ T then
                Add w to the end of L
                Add w and edge {v,w} to T
    { T is a spanning tree. }
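Both searches can be sketched in a few lines each. The code below assumes a connected graph stored as a dictionary of ordered neighbor lists and returns the tree as a list of edges; these representation choices and the sample graph are illustrative.

```python
from collections import deque

def depth_first(adj, start):
    """Edges of a depth first spanning tree of a connected graph."""
    tree_edges, in_tree = [], {start}

    def visit(v):
        for w in adj[v]:
            if w not in in_tree:
                in_tree.add(w)
                tree_edges.append((v, w))
                visit(w)                   # go deeper before trying siblings

    visit(start)
    return tree_edges

def breadth_first(adj, start):
    """Edges of a breadth first spanning tree of a connected graph."""
    tree_edges, in_tree = [], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in in_tree:           # w is in neither the list nor the tree
                in_tree.add(w)
                tree_edges.append((v, w))
                queue.append(w)
    return tree_edges

# Hypothetical connected graph on four vertices.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
dfs_tree = depth_first(adj, 1)
bfs_tree = breadth_first(adj, 1)
```

Both trees contain |V| − 1 = 3 edges, but the depth first tree is a single path while the breadth first tree branches at the starting vertex.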

209

Theorem: Let G = (V,E) be a connected graph with |V| = n and |E| = e. Then either depth first or breadth first search takes O(e), and hence O(n²), steps to construct a spanning tree.

Proof: For a simple graph, |E| ≤ n(n−1)/2.

Backtracking applications:

Graph coloring: can a graph be colored in n colors?

n-Queens problem: find places on an n × n board so that n queens are toothless (no queen can attack another).

Sums of subsets: Given $\{x_i\}_{i=1}^{n}$, where xi ∈ N, find a subset whose sum is M.

Web crawlers: search all hyperlinks on a network efficiently.
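As one illustration of backtracking, the n-Queens problem can be solved by placing one queen per row and undoing a placement as soon as it leads nowhere. This is a minimal sketch with illustrative names, not an efficient solver.

```python
def n_queens(n):
    """All placements of n non-attacking queens on an n x n board.

    Each solution is a tuple cols where cols[r] is the column of the
    queen in row r.
    """
    solutions = []
    cols = []                                  # queens placed so far, one per row

    def safe(c):
        r = len(cols)                          # row about to be filled
        return all(c != pc and abs(c - pc) != r - pr
                   for pr, pc in enumerate(cols))

    def place():
        if len(cols) == n:
            solutions.append(tuple(cols))
            return
        for c in range(n):
            if safe(c):
                cols.append(c)
                place()
                cols.pop()                     # backtrack and try the next column

    place()
    return solutions

four = n_queens(4)
```

The search tree here is exactly a depth first search in which a branch is abandoned as soon as two queens attack each other.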

210

Definition: A minimum spanning tree in a connected weighted graph is a spanning tree that has the smallest possible sum of weights on its edges.

procedure Prim( G = (V,E): weighted connected undirected graph )
    T := minimum weighted edge
    for i := 1 to |V| − 2
        e := an edge of minimum weight incident to a vertex in T not forming a simple circuit in T if it is added to T
        T := T with e added
    { T is a minimum spanning tree. }

procedure Kruskal( G = (V,E): weighted connected undirected graph )
    T := empty graph
    for i := 1 to |V| − 1
        e := an edge in G of minimum weight that does not form a simple circuit in T if it is added to T
        T := T with e added
    { T is a minimum spanning tree. }
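Both procedures can be sketched with standard data structures. The code below is an illustration under stated assumptions: vertices are numbered 0..n−1, edges are (weight, u, v) triples, Kruskal's circuit test uses a union-find structure, and Prim's version grows the tree from vertex 0 with a heap instead of starting from the globally lightest edge as the pseudocode does.

```python
import heapq

def kruskal(n, edges):
    """Minimum spanning tree edges, scanning edges by increasing weight."""
    parent = list(range(n))

    def find(x):                           # representative of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # adding {u,v} forms no circuit
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def prim(n, edges):
    """Minimum spanning tree edges, growing one tree from vertex 0."""
    adj = {v: [] for v in range(n)}
    for w, u, v in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree, tree = {0}, []
    heap = list(adj[0])
    heapq.heapify(heap)
    while heap and len(tree) < n - 1:
        w, u, v = heapq.heappop(heap)      # lightest edge leaving the tree
        if v in in_tree:
            continue
        in_tree.add(v)
        tree.append((w, u, v))
        for e in adj[v]:
            if e[2] not in in_tree:
                heapq.heappush(heap, e)
    return tree

# Hypothetical weighted graph on 4 vertices.
EDGES = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
k_tree = kruskal(4, EDGES)
p_tree = prim(4, EDGES)
```

The two algorithms may pick edges in a different order, but on this graph both return a spanning tree of total weight 6.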

211

Theorem: The cost of Prim’s algorithm is O(|E| log |V|). The cost of Kruskal’s algorithm is O(|E| log |E|).

Definition: A graph G = (V,E) is sparse if |E| is very small with respect to |V|².

Comment: Sparse is ill defined intentionally. There are different degrees of sparseness, too (highly sparse, very sparse, somewhat sparse, hardly sparse, not sparse, and the Scottish favorite, a wee bit sparse). Matrices can also be categorized as (fill in the blank type) sparse based on their graphs.

Note: When G is sparse, Kruskal’s algorithm is much less expensive than Prim’s algorithm.

212

Boolean Algebra

Definition: Let B = { 0, 1 } and B^n = B × B × … × B (n times). A Boolean variable is a variable x ∈ B. A Boolean function of degree n is a function f: B^n → B.

Notation: For x,yB, define

x + y = x ∨ y

x × y = x ∧ y

x̄ = ¬x

using the logic predicate notation from the class notes (circa pages 5-6).

Definition: A Boolean algebra is a set B with binary operators ∨ and ∧, the unary operator ¬, elements 0 and 1, and the following laws holding for all elements of B: identity, complement, associative, commutative, and distributive.

213

Logic gates: Boolean algebra is used to model electronic logic gates, such as AND, OR, NOT, XAND, XOR, … We design functions with Boolean algebras and operators. Then we build them using the right gates and wiring patterns. Typical symbols for AND, OR, and NOT are the following:

[Figure: the standard gate symbols for AND, OR, and NOT.]

These are two input AND and OR gates. Versions of these gates exist for more than two inputs and perform the expected operation on all of the inputs to get one output.

Definition: A simple output circuit takes the input(s) and has one output. A multiple output circuit takes input(s) and has multiple outputs.

Example: The gates above are simple output circuits.

214

Examples: Most circuits are of the multiple output variety. A half adder adds two bits producing a single bit sum plus a single bit carry:

S := (x ∨ y) ∧ (¬(x ∧ y)) = x ⊕ y and Cout := x ∧ y. A half adder has two AND, one OR, and one NOT gates.

A full adder computes the complete two bit sum and carry out: S := (x ⊕ y) ⊕ Cin, where Cin is the incoming carry. The carry is quite complicated: Cout := (x×y) + (y×Cin) + (Cin×x). A full adder has two half adders and an OR gate.

Ripple adders, lookahead adders, and lookahead carry circuits use many bits as input to implement integer adders.
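The adder formulas can be simulated on bits represented as the integers 0 and 1. This is a sketch with illustrative function names; the ripple adder takes its bit lists least significant bit first, which is a convention chosen here.

```python
def half_adder(x, y):
    """Return (sum, carry) for two bits using AND, OR, and NOT."""
    s = (x | y) & (1 - (x & y))        # (x OR y) AND NOT (x AND y) = x XOR y
    carry = x & y
    return s, carry

def full_adder(x, y, c_in):
    """Two half adders and an OR gate."""
    s1, c1 = half_adder(x, y)
    s, c2 = half_adder(s1, c_in)
    return s, c1 | c2

def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists, least significant bit first."""
    carry, out = 0, []
    for x, y in zip(a_bits, b_bits):
        s, carry = full_adder(x, y, carry)
        out.append(s)
    return out + [carry]               # the final carry is the top bit

# 3 + 5 with 3-bit inputs (least significant bit first).
result = ripple_add([1, 1, 0], [1, 0, 1])
```

Chaining full adders this way passes each carry to the next bit position, which is why the design is called a ripple adder.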

[Figures: circuit diagrams of the half adder and the full adder.]

215

Note: Minimizing the Boolean algebra function means a less complicated circuit. Simpler circuits are cheaper to make, take up less space, and are usually faster. Add in how many devices are made and there is potentially a lot of money involved in saving even a small amount of circuitry.

There are two basic methods for simplifying Boolean algebra functions:

Karnaugh maps (or K-maps) provide a graphical or table driven technique that works up to about 6 variables before it becomes too complicated.

The Quine-McCluskey algorithm works with any number of variables.

Going to Google and searching on Karnaugh map software leads to a number of programs to do some of the work for you.

Definition: A literal of a Boolean variable is its value or its complement. A minterm of Boolean variables x1, x2, …, xn is a Boolean product y1 y2 … yn, where each yi ∈ {xi, ¬xi}.

Note: A minterm is just the product of n literals.

216

Karnaugh maps: The area of a K-map rectangle is determined by the number of variables (n) and how many (k) are used in a Boolean expression: 2^(n−k). Common arrangements are

2 variables: 2 × 2, 3 variables: 4 × 2, and 4 variables: 4 × 4.

Each variable contributes two possibilities to each possibility of every other variable in the system. K-maps are organized so that all the possibilities of the system are arranged in a grid form and between two adjacent boxes only one variable can change value. Each square in a K-map corresponds to a minterm.

Cover the ones on the map by rectangles that contain a number of boxes equal to a power of 2 (e.g., 4 boxes in a line, 4 boxes in a square, 8 boxes in a rectangle, etc.). Once the ones are covered, a term of a sum of products is produced by finding the variables that do not change throughout the entire covering, and taking a 1 to mean that variable and a 0 as the complement of that variable. Doing this for every covering produces a matching function.

217

Given a Boolean function f with inputs x1, …, xn, make a table with all possible inputs and outputs. Then create a K-map with the variables on the left and top sides of the rectangle. Look for 1’s. The rectangle is a torus, so look for wrap arounds, too.

Example: f: B^4 → B with a corresponding K-map of

                    x1, x2
               00   01   11   10
          00    0    0    1    1
x3, x4    01    0    0    1    1
          11    0    0    0    1
          10    0    1    1    1

The K-map is colored to try to find patterns in the Boolean expression that can be simplified. It is quite common to eliminate some of the Boolean variables using this approach. Use high quality software if you use the K-map approach.

218

Definition: An implicant is a sum term or product term of one or more minterms in a sum of products. A prime implicant of a function is an implicant that cannot be covered by a more reduced (i.e., one with fewer literals) implicant.

Note: Suppose f is a Boolean function and P is a product term. Then P is an implicant of f if f takes the value 1 whenever P takes the value 1. This is sometimes written as P ≤ f in the natural ordering of the Boolean algebra.

Quine-McCluskey: This algorithm has two steps:

1. Find all prime implicants of the function.

2. Use those prime implicants in a prime implicant chart to find the essential prime implicants of the function as well as other prime implicants that are necessary to cover the function.

The algorithm constructs a table and then simplifies the table. The method leads to computer implementations for large numbers of variables. Use high quality software if you use the Quine-McCluskey approach.
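The first step of Quine-McCluskey (finding all prime implicants) can be sketched by repeatedly merging implicants that differ in exactly one variable. The sketch below covers only that step, not the prime implicant chart. Implicants are written as strings over 0, 1, and - (a dash marks an eliminated variable), and the minterm numbers are read from the K-map example by treating x1 x2 x3 x4 as a 4 bit number with x1 most significant, which is an assumption about that table's layout.

```python
from itertools import combinations

def merge(a, b):
    """Combine two implicants that differ in exactly one 0/1 position."""
    diff = [k for k in range(len(a)) if a[k] != b[k]]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        k = diff[0]
        return a[:k] + "-" + a[k + 1:]
    return None

def prime_implicants(minterms, n):
    """Step 1 of Quine-McCluskey for a function of n variables."""
    current = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = merge(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used       # implicants that merged with nothing
        current = merged
    return primes

# Ones of the K-map example, as minterm numbers x1 x2 x3 x4.
MINTERMS = [6, 8, 9, 10, 11, 12, 13, 14]
primes = prime_implicants(MINTERMS, 4)
```

Each resulting string corresponds to one rectangle on the K-map; for instance 10-- is the product x1 × ¬x2, which covers the entire column labeled 10.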

219
