Page 1: Algorithms and Complexity Bioinformatics Spring 2008 Hiram College Algorithm definition slides taken from CPSC 171, courtesy of Obertia Slotterbeck

Algorithms and Complexity

Bioinformatics

Spring 2008

Hiram College

Algorithm definition slides taken from CPSC 171, courtesy of Obertia Slotterbeck

What is an algorithm?

• An algorithm is a

• well-ordered collection of

• unambiguous and

• effectively computable operations that, when executed,

• produces a result and

• halts in a finite amount of time.

AN EXAMPLE OF A VERY SIMPLE ALGORITHM

1. Wet your hair.

2. Lather your hair.

3. Rinse your hair.

4. Stop.

We assume that

The algorithm begins executing at the top of the list of operations.

The "Stop" can be omitted if we assume the last line is an implied "Stop" operation.

Observe:

Operations need not be executed by a computer, only by an entity capable of carrying out the operations listed.

A well-ordered collection of operations

The question that must be answered is:

At any point in the execution of the algorithm, do you know what operation is to be performed next?

Well-ordered operations:

1. Wet your hair.

2. Lather your hair.

3. Rinse your hair.

Not well-ordered operations:

1. Either wet your hair or lather your hair.

2. Rinse your hair.

Choices are allowed:

Well-ordered operations:

1. If your hair is dirty, then

   a. Wet your hair.

   b. Lather your hair.

   c. Rinse your hair.

2. Else

   a. Go to bed.

Note: We will often omit the numbers and the letters and assume a "top-down" reading of the operations.

Well-ordered operations:

If your hair is dirty, then

   Wet your hair.

   Lather your hair.

   Rinse your hair.

Else

   Go to bed.

Unambiguous operations

The question that must be answered is:

Does the computing entity understand what the operation is to do?

This implies that the knowledge of the computing entity must be considered.

For example, is the following ambiguous?

Make the pie crusts.

To an experienced cook,

Make the pie crusts.

is not ambiguous.

But a less experienced cook may need:

Take 1 1/3 cups of flour.

Sift the flour.

Mix the sifted flour with 1/2 cup of butter and 1/4 cup of water to make dough.

Roll the dough into two 9-inch pie crusts.

or even more detail!

Definition: An operation that is unambiguous is called a primitive operation (or just a primitive).

One question we will be exploring in the course is what the primitives of a computer are.

Note that a given collection of operations may be an algorithm with respect to one computing agent, but not with respect to another computing agent!!

Primitives for Computer Algorithms (e.g. PERL)

• Mathematical operations: add, subtract, multiply, divide, log, sqrt, …

• String operations: append, substring, reverse, …

• File operations: read, write, append

• Other I/O: print, scan

Primitives for Biological Algorithms

• Bind (a molecule binds to a site)

• Separate (strands of DNA)

• Polymerize (add base to strand)

• Repair gaps

• These primitives make up an algorithm for DNA replication (A, pp. 14-16)

Effectively computable operations

The question that must be answered is:

Is the computing entity capable of doing the operation?

This assumes that the operation is first unambiguous, i.e., the computing agent understands what is to be done.

Not effectively computable operations:

Write all the fractions between 0 and 1.

Create matter from nothing.

that, when executed, produces a result

The result need not be a number or piece of text viewed as "an answer".

It could be an alarm, signaling something is wrong.

It could be an approximation to an answer.

It could be an error message.

The question that must be answered is:

Can the user of the algorithm observe a result produced by the algorithm?

halts in a finite amount of time

The question that must be answered is:

Will the computing entity complete the operations in a finite number of steps and stop?

Do not confuse "not finite" with "very, very large". A failure to halt usually implies there is an infinite loop in the collection of operations:

1. Write the number 1 on a piece of paper.

2. Add 1 to the number you just wrote and write it on a piece of paper.

3. Repeat 2.

4. Stop.
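As a minimal sketch of what is missing from the steps above, the Python below performs the same counting but adds a stopping condition. The `limit` parameter is an assumption introduced here for illustration; the steps on the slide have no such bound, so step 4 is never reached.

```python
def write_numbers(limit):
    """Write 1, 2, 3, ... and halt once `limit` has been written.

    `limit` is a hypothetical bound added for illustration; without it
    the loop below would repeat forever.
    """
    written = []
    number = 1
    while number <= limit:   # the stopping condition the original steps lack
        written.append(number)
        number += 1
    return written

print(write_numbers(5))  # [1, 2, 3, 4, 5]
```

Note that a very large `limit` still halts; only the absence of any bound makes the process infinite.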

Definition of an algorithm:

• An algorithm is a well-ordered collection of unambiguous and effectively computable operations that, when executed, produces a result and halts in a finite amount of time.

Note: Although I have tried to give clear-cut examples to illustrate what these new words mean, in some cases a collection of operations can fail for more than one reason.

A Language for Algorithms

• Natural Language (English)?
  – Too ambiguous

• Programming Language (Perl)?
  – Too much new syntax to learn

• Pseudocode
  – A compromise. (… just right)

What is Pseudocode?

• Structured like a programming language, but ignores many syntactical details (like $ and ;)

• Complex operations can be written in natural language

• Still, we need to agree on some standard operations…

Our Pseudocode (pp. 8-11)

• Assignment

• Arithmetic

• Conditional execution

• Repeated execution

• Array access

• Functions

Assignment and Variables

• A variable has a name (which can be anything in pseudocode) and a value.

• Assignment changes the value

• Examples:
  – myName <- "Ellen Walker"
  – number <- 17
  – copy <- number
  – number <- 98765
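The four assignments above can be traced in Python as a small sketch (Python is used here only for illustration; the course language named on the slides is Perl). The point to notice is that `copy` keeps the value it was given, even after `number` is reassigned.

```python
my_name = "Ellen Walker"
number = 17
copy = number      # copy receives the current value of number, 17
number = 98765     # reassigning number afterwards does not change copy

print(copy)    # 17
print(number)  # 98765
```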

Arithmetic

• In pseudocode, mathematical symbols are allowed

$\text{dist} \leftarrow \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

• But, programming language style math is easier to type

dist <- sqrt((x[2]-x[1])^2 + (y[2]-y[1])^2)
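A minimal Python sketch of the distance calculation follows. The function name `distance` and its four separate parameters are choices made here for illustration. One detail differs from the pseudocode: Python writes exponentiation as `**`, because `^` means bitwise XOR in Python.

```python
import math

def distance(x1, y1, x2, y2):
    """Straight-line distance between the points (x1, y1) and (x2, y2)."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

print(distance(0, 0, 3, 4))  # 5.0
```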

Conditional (If Statement)

• Allows a choice to be made, given:
  – a condition,
  – something to do if the condition is true
  – something to do if the condition is false.

• Example:

if today is a weekday
    go to work
else
    stay home

Repeated Execution (Loops)

• Need to know:
  – Which instructions to repeat
  – When to stop repeating

• Two kinds of loops
  – While loop: repeat while a condition is true; stop when it becomes false
  – For loop: repeat a specific number of times

For Loop

• Loop controlled by a variable

• Executes once for each value of the variable, from a given starting value to a given ending value

• Example:

sum <- 0
for num <- 1 to 10
    sum <- sum + num
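The same loop can be sketched in Python. The variable is named `total` here (rather than `sum`) only to avoid hiding Python's built-in `sum` function; that renaming is a choice made for this sketch.

```python
total = 0
for num in range(1, 11):   # range stops before 11, so num takes the values 1..10
    total = total + num

print(total)  # 55
```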

While Loop

• Loop is controlled by a condition.

• When the condition is false, the loop no longer executes.

• Example:

while (you are cold)
    raise the thermostat temperature 1 degree
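To make the thermostat loop executable, the sketch below assumes that "you are cold" means the current temperature is below some comfortable temperature. Both parameters (`current` and `comfortable`) are hypothetical names introduced for illustration.

```python
def raise_until_warm(current, comfortable):
    """Raise the temperature 1 degree at a time while it is below `comfortable`.

    Returns the final temperature and the number of times the loop body ran.
    """
    steps = 0
    while current < comfortable:   # stands in for "you are cold"
        current += 1               # raise the thermostat temperature 1 degree
        steps += 1
    return current, steps

print(raise_until_warm(65, 68))  # (68, 3)
print(raise_until_warm(70, 68))  # (70, 0): condition false at the start, body never runs
```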

Array Access

• An array is a sequence of values of the same type.

• We access each item, or element, by its numeric index.

• Computers start counting at 0!

Array Example

Value:  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Index:  0    1    2    3    4    5    6    7    8    9    10   11

months <- {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

print(months[5]) prints Jun

for m <- 0 to 11
    print(months[m])
    go to the next line
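A short Python sketch of the months example: Python lists are also indexed from 0, so index 5 is the sixth element.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

print(months[5])  # Jun

for m in range(0, 12):   # m takes the values 0..11
    print(months[m])     # print ends each line, so no separate "go to the next line"
```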

Assignment to an Array

• An array element is really a variable, so you can assign to it.

• Example:

for n <- 0 to 99
    squares[n] <- n*n
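In Python the same idea needs one extra step that pseudocode lets us ignore: the list must already have 100 slots before we can assign to them by index. The sketch below creates them first.

```python
squares = [0] * 100      # create 100 elements so that squares[0..99] exist
for n in range(100):     # n takes the values 0..99
    squares[n] = n * n

print(squares[7])   # 49
print(squares[99])  # 9801
```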

Function

• A function has a name, parameters (inputs), code to execute, and a return value.

• Example:

fibonacci (n)
    f[0] <- 1
    f[1] <- 1
    for i <- 2 to n-1
        f[i] <- f[i-1]+f[i-2]
    return f[n-1]

Calling a Function

• Write the function name, and the actual values for the parameters

• When the function is complete, the return value replaces the name of the function in any expression.

• Example:

print(fibonacci(8)) prints 21
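The `fibonacci` pseudocode translates almost line for line into Python. This sketch assumes n is at least 1, which the pseudocode also assumes implicitly.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, counting the first one as n = 1."""
    f = [1, 1]                          # f[0] and f[1]
    for i in range(2, n):               # i takes the values 2..n-1
        f.append(f[i - 1] + f[i - 2])   # f[i] <- f[i-1] + f[i-2]
    return f[n - 1]

print(fibonacci(8))  # 21
```

In the call `print(fibonacci(8))`, the value 8 is the actual parameter and the returned 21 takes the place of `fibonacci(8)` in the `print` expression.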

Exercise

• Find a set of instructions, written in English, on the Internet, preferably for a non-mathematical task.

• There should be at least 5 steps, and a condition or a loop, preferably both.

• Rewrite the instructions in pseudocode.

• Possibilities:
  – Instruction manuals
  – Government sites
  – Game descriptions

Comparing Algorithms

• There can be many different algorithms to solve the same problem

• Better algorithms…
  – Get the correct answer (if possible)
  – Get “better” answers (otherwise)
  – Use fewer resources (time and space)

Algorithm Complexity

• Time
  – How long does the algorithm take?
  – Measured abstractly: we don’t want the answer to depend on which machine runs it!

• Space
  – How much space (arrays, variables) does the algorithm need?

Time Complexity of an Algorithm

What we want to do is relate

1. the amount of work performed by an algorithm

2. and the algorithm's input size

by a fairly simple formula.

STEPS FOR DETERMINING THE TIME COMPLEXITY OF AN ALGORITHM

• 1. Determine how you will measure input size. Ex:
  – N items in a list
  – N x M table (with N rows and M columns)
  – Two numbers of length N

• 2. Choose an operation (or perhaps two operations) to count as a gauge of the amount of work performed. Ex:
  – Comparisons
  – Swaps
  – Copies
  – Additions

Normally we don't count operations in input/output.

STEPS FOR DETERMINING THE TIME COMPLEXITY OF AN ALGORITHM

• 3. Decide whether you wish to count operations in the
  – Best case? - the fewest possible operations
  – Worst case? - the most possible operations
  – Average case?

• The average case is harder, as it is not always clear what is meant by an "average case". Normally calculating this case requires some higher mathematics such as probability theory.

• 4. For the algorithm and the chosen case (best, worst, average), express the count as a function of the input size of the problem.

For example, we determine, by counting, statements such as ...

EXAMPLES:

• For n items in a list, counting the operation swap, we find the algorithm performs 10n + 5 swaps in the worst case.

• For an n X m table, counting additions, we find the algorithm performs nm additions in the best case.

• For two numbers of length n, there are 3n + 20 multiplications in the best case.
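To show how such a count is obtained in practice, here is a minimal sketch that applies steps 1 to 4 to linear search. Linear search is chosen here only for illustration; the slides do not name a specific algorithm. The input size is the number of items n, and the operation counted is the comparison.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is not in items."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1              # the one operation we chose to count
        if item == target:
            return index, comparisons
    return -1, comparisons

data = [7, 3, 9, 4]                   # n = 4 items
print(linear_search(data, 7))         # best case, target is first: (0, 1)
print(linear_search(data, 5))         # worst case, target is absent: (-1, 4)
```

So for n items, counting comparisons, this search performs 1 comparison in the best case and n comparisons in the worst case.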

STEPS FOR DETERMINING THE TIME COMPLEXITY OF AN ALGORITHM

5. Given the formula that you have determined, decide the complexity class of the algorithm.

What is the complexity class of an algorithm?

Question: Is there really much difference between

3n

5n + 20

and 6n - 3

especially when n is large?

But, there is a huge difference, for n large, between

n

n^2

and n^3

So we try to classify algorithms into classes, based on their counts and simple formulas such as n, n^2, n^3, and others.

Why does this matter?

For large inputs, it is the complexity of an algorithm that most affects its running time, not the machine or its speed.

ORDER WINS OUT

The TRS-80

Main language support: BASIC - typically a slow running language

For more details on TRS-80 see:

http://en.wikipedia.org/wiki/TRS-80

The CRAY-YMP

Language used in example: FORTRAN - a fast running language

For more details on CRAY-YMP see:

http://en.wikipedia.org/wiki/Cray_Y-MP

             CRAY YMP with FORTRAN     TRS-80 with BASIC
             complexity is 3n^3        complexity is 19,500,000n

n is:
10           3 microsec                200 millisec
100          3 millisec                2 sec
1000         3 sec                     20 sec
2500         50 sec                    50 sec
10000        49 min                    3.2 min
1000000      95 years                  5.4 hours
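The table can be approximately reproduced with a short Python sketch. The slide does not state the time unit behind the two formulas; the entries are consistent with each counted unit taking one nanosecond, so that is assumed here (the constant `NANOSECOND` and both function names are introduced for illustration).

```python
NANOSECOND = 1e-9   # assumed time per counted unit, inferred from the table entries

def cray_seconds(n):
    """Running time of the 3n^3 algorithm on the fast machine, in seconds."""
    return 3 * n**3 * NANOSECOND

def trs80_seconds(n):
    """Running time of the 19,500,000n algorithm on the slow machine, in seconds."""
    return 19_500_000 * n * NANOSECOND

for n in (10, 100, 1000, 2500, 10_000, 1_000_000):
    print(f"n = {n:>9}: CRAY {cray_seconds(n):.3g} s, TRS-80 {trs80_seconds(n):.3g} s")
```

Under this assumption the two running times are equal when 3n^3 = 19,500,000n, that is, when n^2 = 6,500,000, or n of roughly 2,550. This matches the table, where both machines take about 50 seconds at n = 2500 and the slower machine wins for every larger n.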

Trying to maintain an exact count for an operation isn't too useful.

Thus, we group algorithms that have counts such as

n

3n + 20

1000n - 12

0.00001n + 2

together. We say algorithms with these types of counts are in the class Θ(n),

read as the class of theta-of-n, or

all algorithms of magnitude n, or

all order-n algorithms.
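The slides use Θ informally. For reference, the standard textbook definition (which is not given on the slides) is shown below, followed by a worked check that the count 3n + 20 from the list above really belongs to the class of theta-of-n. The constants c_1, c_2, and n_0 are the usual names from that definition.

```latex
f(n) \in \Theta(g(n)) \iff \exists\, c_1 > 0,\ c_2 > 0,\ n_0 \ \text{such that}\
c_1\, g(n) \le f(n) \le c_2\, g(n) \ \text{for all } n \ge n_0 .

\text{For } f(n) = 3n + 20 \text{ and } g(n) = n:\quad
3n \;\le\; 3n + 20 \;\le\; 23n \quad \text{for all } n \ge 1,

\text{so } c_1 = 3,\ c_2 = 23,\ n_0 = 1 \ \text{works, and } 3n + 20 \in \Theta(n).
```

The right-hand inequality holds because 3n + 20 ≤ 23n is the same as 20 ≤ 20n, which is true whenever n ≥ 1.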

Similarly, algorithms with counts such as

n^2 + 3n

(1/2)n^2 + 4n - 5

1000n^2 + 2.54n + 11

are in the class Θ(n^2).

Other typical classes are those with easy formulas in n such as

1

n^3

2^n

lg n, where k = lg n if and only if 2^k = n

lg n: k = lg n if and only if 2^k = n

lg 4 = ?

lg 8 = ?

lg 16 = ?

lg 10 = ?

Note that all of these are base 2 logarithms. You don't need a logarithm table, as we don't need exact values (except for integer powers of 2).
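If you want to check your answers, Python's standard `math.log2` computes base 2 logarithms. The sketch below is only a checking aid; for a number that is not a power of 2, such as 10, it is enough to know which two integers the logarithm lies between.

```python
import math

for n in (4, 8, 16):
    k = math.log2(n)
    print(n, k, 2 ** k == n)   # confirms that 2^k = n for each power of 2

# lg 10 is not an integer; it lies between lg 8 and lg 16.
lg10 = math.log2(10)
print(math.floor(lg10), math.ceil(lg10))  # 3 4
```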

Look at the curves showing the growth for algorithms in

Θ(1), Θ(n), Θ(n^2), Θ(n^3), Θ(lg n), Θ(n lg n), Θ(2^n)

These are the major ones we'll use.

Figure 3.4: Work = cn for Various Values of c

Figure 3.10: Work = cn^2 for Various Values of c

Figure 3.11: A Comparison of n and n^2

Figure 3.21: A Comparison of n and lg n

Figure 3.25: Comparisons of lg n, n, n^2, and 2^n

Making Change

• US Change Problem
  – Input: amount of money, M, in cents
  – Output: smallest number of coins (quarters, dimes, nickels, and pennies) that add up to M

US Change Algorithm

While M>0
    c <- value of largest coin with value <= M
    give c coin to customer
    M <- M-c

• (See mathematical version, p. 19)

US Change Examples:

• 77c
  – Quarter (77-25 = 52)
  – Quarter (52-25 = 27)
  – Quarter (27-25 = 2)
  – Penny (2-1 = 1)
  – Penny (1-1 = 0)
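A minimal Python sketch of the US change algorithm, run on the 77-cent example. The names `US_COINS` and `us_change` are introduced here for illustration, and the amount is assumed to be a non-negative whole number of cents.

```python
US_COINS = [25, 10, 5, 1]   # quarter, dime, nickel, penny (values in cents)

def us_change(m):
    """Repeatedly give the largest coin whose value is <= the remaining amount."""
    given = []
    while m > 0:
        c = max(coin for coin in US_COINS if coin <= m)  # largest coin that fits
        given.append(c)                                  # give c coin to customer
        m = m - c
    return given

print(us_change(77))  # [25, 25, 25, 1, 1]
```

Because a 1-cent coin is always available, some coin fits whenever the remaining amount is positive, so the loop always makes progress and halts.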

Same Algorithm using Mathematics

• Calculate and give the max # of quarters (25c):
  – Q = floor (M / 25)
  – Give customer Q quarters
  – M = M - 25*Q

• Calculate and give the max # of dimes (10c).

• Calculate and give the max # of nickels (5c).

• Give the rest in pennies.
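The mathematical version can be sketched in Python with the built-in `divmod`, which returns the floor of the quotient and the remainder in one step. The function name `us_change_counts` and the dictionary result are choices made here for illustration.

```python
def us_change_counts(m):
    """Return how many of each US coin to give for m cents."""
    counts = {}
    for coin in (25, 10, 5, 1):
        # quotient = number of this coin to give, remainder = amount still owed
        counts[coin], m = divmod(m, coin)
    return counts

print(us_change_counts(77))  # {25: 3, 10: 0, 5: 0, 1: 2}
```

This does one division per denomination instead of one subtraction per coin, so the amount of work no longer grows with M.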

Generalizing the Algorithm

• Suppose a non-US money system has d (different) coins of denominations: c[0], c[1], … c[d-1]

• And, c[0] > c[1] > … > c[d-1]

• Then, we can generalize the algorithm:

Generalizing, continued

for i <- 0 to d-1
    Calculate and give the max number of this coin (c[i])

Generalized Algorithm is Not Correct!

• c = {25, 20, 10, 5, 1}

• M = 40

• Result is 3 coins (25 + 10 + 5), not the optimal 2 (20 + 20)
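The failure is easy to reproduce. The sketch below implements the generalized algorithm (the function name `greedy_change` is introduced here) and runs it on the counterexample; it assumes the coin list is given in decreasing order, as the generalization requires.

```python
def greedy_change(m, coins):
    """Generalized greedy change; `coins` must be listed in decreasing order."""
    given = []
    for c in coins:                  # for i <- 0 to d-1
        count = m // c               # max number of this coin that fits
        given.extend([c] * count)
        m = m - c * count
    return given

print(greedy_change(40, [25, 20, 10, 5, 1]))  # [25, 10, 5]
```

The algorithm commits to the 25-cent coin first and can never reconsider, so it misses the two 20-cent coins that would have been enough.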

Brute Force: Correct but Slow

N <- 1
While (not done)
    construct a combination of N coins
    if it adds up to M
        return it (done=true)
    else if no more combinations of N coins
        N <- N+1
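One way to sketch the brute-force search in Python uses the standard library's `itertools.combinations_with_replacement` to construct every combination of N coins. The function name `brute_force_change` is introduced here, and the sketch assumes all coin values are positive whole numbers, which means no answer can use more than M coins; that gives the loop a bound so it halts even when no combination exists.

```python
from itertools import combinations_with_replacement

def brute_force_change(m, coins):
    """Return a smallest combination of coins adding up to m, or None if none exists."""
    for n in range(1, m + 1):                          # N <- 1, 2, 3, ...
        for combo in combinations_with_replacement(coins, n):
            if sum(combo) == m:                        # it adds up to M
                return combo                           # fewest coins, since N grows from 1
    return None                                        # no combination adds up to m

print(brute_force_change(40, [25, 20, 10, 5, 1]))  # (20, 20)
```

The answer is correct because every combination of fewer coins is tried first, but the number of combinations grows quickly with N, which is why this approach is slow.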

Improving the Change Algorithm

• Try combinations in a useful order
  – We’ve already done this, looking at fewer coins first.

• Generate as few combinations as possible
  – Use knowledge to bound the combinations tried.