CSC 231, Devon M. Simmonds, University of North Carolina, Wilmington. CSC231 Data Structures: Analysis of Algorithms

Upload: claire-mccoy

Post on 21-Jan-2016


TRANSCRIPT

Page 1: CSC 231 1 Devon M. Simmonds University of North Carolina, Wilmington CSC231 Data Structures

CSC231 Data Structures
Devon M. Simmonds
University of North Carolina, Wilmington

Analysis of Algorithms

Page 2:

Outline

Analysis
Running time
Worst-case and average-case analysis
Asymptotic notation: Big O, etc.

Page 3:

Expressing good solutions to complex software-related problems requires a good grasp of algorithms and data structures.

Program = Data Structures + Algorithms

Computer Science = solving problems using computer programs

Page 4:

What are algorithms?

Instructions for accessing and manipulating our data structures:
order of access, how data should be modified, efficiency of access, etc.

Page 5:

Properties of Algorithms
Finite input. Finite output. Correctness.

A correct algorithm halts with the correct output for every input.

Efficiency/Complexity: how many resources the algorithm requires:
time, space, bandwidth.

Page 6:

Algorithm Analysis
We only analyze correct algorithms.
Incorrect algorithms might not halt at all on some input instances.
Incorrect algorithms might halt with other than the desired answer.
Analyzing an algorithm involves predicting the resources that the algorithm requires:
Time – computational time (usually the most important)
Space – memory
Bandwidth

Page 7:

Running Time?

The running time of an algorithm on a particular input is the number of primitive operations executed.


Page 8:

Algorithm Analysis… Factors affecting the running time:
computer, compiler, algorithm used, input to the algorithm.
The content of the input affects the running time; typically, the input size (number of items in the input) is the main consideration.
E.g. sorting problem: the number of items to be sorted.
E.g. multiplying two matrices: the total number of elements in the two matrices.
Machine model assumed: instructions are executed one after another, with no concurrent operations (not parallel computers).

Page 9:

Example: calculate ∑_{i=1}^{N} i³

    def sum(N):
        partialSum = 0
        for i in range(1, N + 1):
            partialSum += i * i * i
        return partialSum

    def main():
        print(sum(3))

    main()

Lines 1, 4 and 5 count for one unit each.
Line 3: executed N times, each time four units (2 mult, 1 add, 1 asmt).
Line 2: 1 for initialization, N + 1 for all the tests, N for all the increments; total 2N + 2.
Total cost: 6N + 4.
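The slide's example can be run directly; here is a lightly renamed version (sum_cubes is my name, chosen to avoid shadowing Python's built-in sum):

```python
def sum_cubes(N):
    """Sum of i**3 for i = 1..N; costs 6N + 4 units by the slide's count."""
    partial_sum = 0                  # 1 unit
    for i in range(1, N + 1):        # 2N + 2 units
        partial_sum += i * i * i     # 4N units per full loop
    return partial_sum               # 1 unit

# Sanity check against the closed form (N(N+1)/2)^2.
assert sum_cubes(3) == (3 * 4 // 2) ** 2
print(sum_cubes(3))  # 36
```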

Page 10:

Worst- / average- / best-case
Worst-case running time of an algorithm:
The longest running time for any input of size n. An upper bound on the running time for any input: a guarantee that the algorithm will never take longer. The worst case can occur fairly often, e.g. in searching a database for a particular piece of information that is at the end of the list.
Best-case running time:
E.g. in searching a database for a particular piece of information that is at the beginning of the list.
Average-case running time:
May be difficult to define what “average” means.

Page 11:

Why we do not use Experimental Studies

Write a program implementing the algorithm

Run the program with inputs of varying size and composition

Use a method like time.perf_counter() (time.clock() in older Python) to get an accurate measure of the actual running time

Plot the results.

[Chart: running time in ms (0–9000) vs. input size (0–100).]
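Such an experiment might look like the following sketch (time.perf_counter is the modern replacement for time.clock, which was removed in Python 3.8; the test function and input sizes are my choices):

```python
import time

def array_max(A):
    """Linear scan for the maximum element."""
    current_max = A[0]
    for x in A[1:]:
        if x > current_max:
            current_max = x
    return current_max

# Time the algorithm on inputs of varying size and report milliseconds.
for n in (10_000, 20_000, 40_000):
    data = list(range(n))
    t0 = time.perf_counter()
    array_max(data)
    print(f"n = {n}: {(time.perf_counter() - t0) * 1000:.3f} ms")
```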

Page 12:

Limitations of Experiments
It is necessary to implement the algorithm, which may be difficult.
Results may not be indicative of the running time on other inputs not included in the experiment.
In order to compare two algorithms, the same hardware and software environments must be used.
Many factors may affect the results: poor memory use (i.e. cache misses, page faults) can give misleading results; platform dependence.

Page 13:

Theoretical Analysis
Uses a high-level description of the algorithm instead of an implementation.
Characterizes running time as a function of the input size, n.
Takes into account all possible inputs.
Allows us to evaluate the speed of an algorithm independent of the hardware/software environment.

Page 14:

Pseudocode
High-level description of an algorithm.
More structured than English prose, less detailed than a program.
Preferred notation for describing algorithms.
Hides many program design issues.

Example: find the max element of an array.

    Algorithm arrayMax(A, n)
      Input: array A of n integers
      Output: maximum element of A
      currentMax ← A[0]
      for i ← 1 to n − 1 do
        if A[i] > currentMax then
          currentMax ← A[i]
      return currentMax
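The pseudocode translates line for line into Python (a sketch; in practice the built-in max does the same job):

```python
def array_max(A):
    """Return the maximum element of a non-empty list, per arrayMax(A, n)."""
    current_max = A[0]
    for i in range(1, len(A)):
        if A[i] > current_max:
            current_max = A[i]
    return current_max

print(array_max([31, 41, 26, 59, 53]))  # 59
```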

Page 15:

Primitive Operations
Basic computations performed by an algorithm.
Identifiable in pseudocode.
Largely independent from the programming language.
Assumed to take a constant amount of time in the RAM model.
Examples: evaluating an expression; assigning a value to a variable; indexing into an array; calling a method; returning from a method.

Page 16:

The Random Access Machine (RAM) Model
A CPU.
A potentially unbounded bank of memory cells (numbered 0, 1, 2, …), each of which can hold an arbitrary number or character.
Instructions are executed one after another, with no concurrent operations. Memory cells are numbered, and accessing any cell in memory takes unit time.

Page 17:

Running Time?

The running time of an algorithm on a particular input is the number of primitive operations executed.


Page 18:

Counting Primitive Operations
By inspecting the pseudocode/program, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.

    Algorithm arrayMax(A, n)             # operations
      currentMax ← A[0]                  2
      for( i ← 1; i <= n − 1; i++ )      loop 2n times
        if A[i] > currentMax then        2 ops (index, comp)
          currentMax ← A[i]              2 ops (index, asmt)
      return currentMax                  1

    4n + 1 ≤ Total ≤ 6n − 1

Page 19:

Counting Primitive Operations

    Algorithm arrayMax(A, n)             # operations
      currentMax ← A[0]                  2
      for( i ← 1; i <= n − 1; i++ )      loop 2n times
        if A[i] > currentMax then        2 ops (index, comp)
          currentMax ← A[i]              2 ops (index, asmt)
      return currentMax                  1

1st line = 2 ops, last line = 1 op.
2nd line = 2n ops (init = 1, incr = n − 1, cond = n).
The if-statement costs either 2 ops or 4 ops, executed (n − 1) times.

    2 + 2n + 2(n − 1) + 1 ≤ Total ≤ 2 + 2n + 4(n − 1) + 1
    4n + 1 ≤ Total ≤ 6n − 1
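The bounds can be checked empirically with an instrumented version of arrayMax; the counting scheme below follows the slide's tallies, while the Python function itself is my sketch:

```python
def array_max_opcount(A):
    """arrayMax instrumented with the slide's primitive-operation counts."""
    n = len(A)
    ops = 2                       # currentMax <- A[0]: index + assignment
    ops += 1 + n + (n - 1)        # loop overhead: init + n tests + (n-1) increments = 2n
    current_max = A[0]
    for i in range(1, n):
        ops += 2                  # if A[i] > currentMax: index + comparison
        if A[i] > current_max:
            current_max = A[i]
            ops += 2              # currentMax <- A[i]: index + assignment
    ops += 1                      # return
    return current_max, ops

n = 10
_, worst = array_max_opcount(list(range(n)))        # ascending: branch always taken
_, best = array_max_opcount(list(range(n, 0, -1)))  # descending: branch never taken
print(worst, best)  # 59 41, i.e. exactly 6n - 1 and 4n + 1
```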

Page 20:

Estimating Running Time
Algorithm arrayMax executes at most 6n − 1 primitive operations in the worst case. Define:
a = time taken by the fastest primitive operation
b = time taken by the slowest primitive operation
Let T(n) be the worst-case time of arrayMax. Then T(n) ≤ b(6n − 1).
Hence, the running time T(n) is bounded by a linear function.
The actual running time can be between a(4n + 1) and b(6n − 1).

Page 21: CSC 231 1 Devon M. Simmonds University of North Carolina, Wilmington CSC231 Data Structures

CSC231 Rules for Counting Primitive Operations Rule-1 loops

running time of loop <= #iterations * running time of statements in loop

21

    Algorithm arrayMax(A, n)             # operations
      currentMax ← A[0]                  2
      for i ← 1 to n − 1 do              loop n − 1 times
        if A[i] > currentMax then        2 ops (index, comp)
          currentMax ← A[i]              2 ops (index, asmt)
        { increment counter i }          2 ops (cond, incr)
      return currentMax                  1 op

    loop ≤ 4(n − 1)

Page 22: CSC 231 1 Devon M. Simmonds University of North Carolina, Wilmington CSC231 Data Structures

CSC231 Rules for Counting Primitive Operations Rule-2 nested loops

Evaluate nested loops inside out. running time of nested loop <= product

of sizes of loops * running time of statements

22

    for i ← 1 to n − 1 do                loop n − 1 times
      for j ← 1 to m − 1 do              loop m − 1 times
        if A[i] > currentMax then        2 ops (index, comp)
          currentMax ← A[i]              2 ops (index, asmt)

    worst-case running time = 4(n − 1)(m − 1)

Page 23: CSC 231 1 Devon M. Simmonds University of North Carolina, Wilmington CSC231 Data Structures

CSC231 Rules for Counting Primitive Operations Rule-3 consecutive statements

Add consecutive statements.

23

    for i ← 1 to n − 1 do                loop n − 1 times
      for j ← 1 to m − 1 do              loop m − 1 times
        A[i] ← currentMax                2 ops (index, asmt)
        currentMax++                     1 op (incr)

    worst-case running time of the two statements = (2 + 1)
    worst-case running time = (n − 1)(m − 1)(2 + 1)

Page 24: CSC 231 1 Devon M. Simmonds University of North Carolina, Wilmington CSC231 Data Structures

CSC231 Rules for Counting Primitive Operations Rule-4 if/else statements

Running time <= running time of condition + larger of two choices.

24

    for i ← 1 to n − 1 do                        loop n − 1 times
      for j ← 1 to m − 1 do                      loop m − 1 times
        if (A[i] > currentMax && i > n) then     4 ops (index, 3 comps)
          currentMax ← A[i]                      2 ops (index, asmt)
        else
          currentMax ← A[i] + 2                  3 ops (index, add, asmt)

    worst-case running time ≤ (4 + 3)(n − 1)(m − 1)

Page 25: CSC 231 1 Devon M. Simmonds University of North Carolina, Wilmington CSC231 Data Structures

CSC231 Rules for Counting Primitive Operations Add consecutive statements. Running time of if-statement <=

running time of condition + larger of two choices.

Running time of loop <= #iterations * running time of statements in loop

Evaluate nested loops inside out. running time of nested loop <= product of

sizes of loops * running time of statements

25
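As a quick empirical check of the nested-loop rule (hypothetical code, not from the slides), one can count how often the innermost statement actually executes:

```python
def count_inner_executions(n, m):
    """A loop of n-1 iterations nesting a loop of m-1 iterations runs
    its innermost statement (n-1)*(m-1) times."""
    count = 0
    for i in range(1, n):        # loop n - 1 times
        for j in range(1, m):    # loop m - 1 times
            count += 1           # stand-in for the constant-cost loop body
    return count

assert count_inner_executions(5, 4) == (5 - 1) * (4 - 1)
print(count_inner_executions(5, 4))  # 12
```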

Page 26:

Running Time of Algorithms

Bounds are for algorithms, rather than programs: programs are just implementations of an algorithm, and the details of the program almost never affect the bounds.
Bounds are for algorithms, rather than problems: a problem can be solved with several algorithms, some more efficient than others.

Page 27:

Growth Rate of Running Time
How does the running time T(n) change as the input size increases?
Changing the hardware/software environment affects T(n) by a constant factor, but does not alter the growth rate of T(n).

Page 28:

Growth Rates
Growth rates of functions: linear n, quadratic n², cubic n³.
In a log-log chart, the slope of the line corresponds to the growth rate of the function.

[Log-log chart of T(n) vs. n for cubic, quadratic, and linear functions.]

Page 29:

Constant Factors
The growth rate is not affected by constant factors or lower-order terms.
Examples: 10²n + 10⁵ is a linear function; 10⁵n² + 10⁸n is a quadratic function.

[Log-log chart: each pair of quadratic and linear functions has the same slope.]

Page 30:

Asymptotic Algorithm Analysis
Analysis for input sizes large enough that only the order of growth of running times is relevant.
The asymptotic analysis of an algorithm determines the running time in big-Oh notation.
To perform the asymptotic analysis:
We find the worst-case number of primitive operations executed as a function of the input size.
We express this function with big-Oh notation.

Page 31:

Asymptotic Notation
big-O: f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n).
big-Omega: f(n) is Ω(g(n)) if f(n) is asymptotically greater than or equal to g(n).
big-Theta: f(n) is Θ(g(n)) if f(n) is asymptotically equal to g(n).
little-o: f(n) is o(g(n)) if f(n) is asymptotically strictly less than g(n).
little-omega: f(n) is ω(g(n)) if f(n) is asymptotically strictly greater than g(n).

Page 32:

Asymptotic notation: Big-O
f(N) = O(g(N)) iff there are positive constants c and n₀ such that f(N) ≤ c·g(N) when N ≥ n₀.
The growth rate of f(N) is less than or equal to the growth rate of g(N).
g(N) is an upper bound on f(N).
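As a concrete illustration of the definition (my choice of f, c, and n₀): for f(N) = 7N² + 10N + 3, the constants c = 20 and n₀ = 1 are a valid witness, since 7N² + 10N + 3 ≤ 7N² + 10N² + 3N² = 20N² for N ≥ 1. A minimal check in code:

```python
def f(N):
    return 7 * N * N + 10 * N + 3

c, n0 = 20, 1  # witness constants showing f(N) = O(N^2)
# Verify f(N) <= c * g(N) with g(N) = N^2 over a range of N >= n0.
assert all(f(N) <= c * N * N for N in range(n0, 10_000))
print("f(N) <= 20 * N^2 holds for all tested N >= 1")
```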

Page 33:

Big-O Growth Rate
f(N) = O(g(N)) iff there are positive constants c and n₀ such that f(N) ≤ c·g(N) when N ≥ n₀.
The growth rate of f(N) is less than or equal to the growth rate of g(N).
g(N) is an upper bound on f(N).
f(N) grows no faster than g(N) for “large” N.

Page 34:

Big-O: example
Let f(N) = 2N². Then:
f(N) = O(N⁴)
f(N) = O(N³)
f(N) = O(N²) (best answer, asymptotically tight)
O(N²) reads “order N-squared” or “big-O N-squared”.

Page 35:

Big O: more examples
N²/2 − 3N = O(N²)
1 + 4N = O(N)
7N² + 10N + 3 = O(N²) = O(N³)
log₁₀ N = log₂ N / log₂ 10 = O(log₂ N) = O(log N)
sin N = O(1); 10 = O(1); 10¹⁰ = O(1)
log N + N = O(?)
logᵏ N = O(N) for any constant k
N = O(2ᴺ), but 2ᴺ is not O(N); 2¹⁰ᴺ is not O(2ᴺ)
∑_{i=1}^{N} i² ≤ N·N² = O(N³)
∑_{i=1}^{N} i ≤ N·N = O(N²)
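A few of these claims can be spot-checked numerically with the standard library:

```python
import math

N = 1000
# Change of base: log10 N = log2 N / log2 10, so log10 N = O(log N).
assert math.isclose(math.log10(N), math.log2(N) / math.log2(10))
# Sum bounds: sum of i <= N*N and sum of i^2 <= N*N^2,
# giving the O(N^2) and O(N^3) claims above.
assert sum(range(1, N + 1)) <= N * N
assert sum(i * i for i in range(1, N + 1)) <= N ** 3
print("all checks passed")
```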

Page 36:

Big-O and Growth Rate
The big-Oh notation gives an upper bound on the growth rate of a function.
The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n).
We can use the big-Oh notation to rank functions according to their growth rate:

                     f(n) is O(g(n))    g(n) is O(f(n))
    g(n) grows more  Yes                No
    f(n) grows more  No                 Yes
    Same growth      Yes                Yes

Page 37:

Big-O Rules
If f(n) is a polynomial of degree d, then f(n) is O(nᵈ), i.e.:
1. Drop lower-order terms.
2. Drop constant factors.
Use the smallest possible class of functions: say “2n is O(n)” instead of “2n is O(n²)”.
Use the simplest expression of the class: say “3n + 5 is O(n)” instead of “3n + 5 is O(3n)”.

Page 38:

Some rules
Ignore the lower-order terms and the coefficient of the highest-order term.
No need to specify the base of a logarithm: changing the base from one constant to another changes the value of the logarithm by only a constant factor.
If T₁(N) = O(f(N)) and T₂(N) = O(g(N)), then
T₁(N) + T₂(N) = max(O(f(N)), O(g(N))),
T₁(N) * T₂(N) = O(f(N) * g(N)).

Page 39:

Some typical growth rates
c – constant; log n – logarithmic; log² n – log-squared; n – linear; n log n; n² – quadratic; n³ – cubic; aⁿ – exponential.

Page 40:

Math you need to Review
Summations
Logarithms and exponents
Proof techniques: proof by contradiction, proof by induction

Page 41:

Math Review: logarithmic functions

    log_b x = a  iff  b^a = x
    log(ab) = log a + log b
    log(a/b) = log a − log b
    log(a^n) = n log a
    log_b a = log_m a / log_m b
    d(ln x)/dx = 1/x

Page 42:

Math you need to Review

Properties of exponentials:
    a^(b+c) = a^b · a^c
    (a^b)^c = a^(bc)
    a^b / a^c = a^(b−c)
    b = a^(log_a b)
    b^c = a^(c·log_a b)

Properties of logarithms:
    log_b(xy) = log_b x + log_b y
    log_b(x/y) = log_b x − log_b y
    log_b(x^a) = a·log_b x
    log_b a = log_x a / log_x b
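These identities can be verified numerically for sample values (any positive a, b, c, x, y with bases not equal to 1):

```python
import math

a, b, c, x, y = 2.0, 3.0, 4.0, 5.0, 7.0
assert math.isclose(a ** (b + c), a ** b * a ** c)            # a^(b+c) = a^b a^c
assert math.isclose((a ** b) ** c, a ** (b * c))              # (a^b)^c = a^(bc)
assert math.isclose(a ** b / a ** c, a ** (b - c))            # a^b / a^c = a^(b-c)
assert math.isclose(math.log(x * y, a),
                    math.log(x, a) + math.log(y, a))          # log(xy) = log x + log y
assert math.isclose(math.log(x ** b, a), b * math.log(x, a))  # log(x^a) = a log x
assert math.isclose(math.log(x, a),
                    math.log(x, y) / math.log(a, y))          # change of base
print("identities verified")
```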

Page 43:

Computing Prefix Averages
We illustrate asymptotic analysis with two algorithms for prefix averages.
The i-th prefix average of an array X is the average of the first (i + 1) elements of X:
A[i] = (X[0] + X[1] + … + X[i]) / (i + 1)
Computing the array A of prefix averages of another array X has applications to financial analysis.

[Bar chart of an array X and its prefix averages A, values 0–35, indices 1–7.]

Page 44:

A Prefix Averages Algorithm

    Algorithm prefixAverages1(X, n)             appx. #operations
      Input: array X of n integers
      Output: array A of prefix averages of X
      A ← new array of n integers
      for i ← 0 to n − 1 do
        s ← X[0]
        for j ← 1 to i do
          s ← s + X[j]
        A[i] ← s / (i + 1)
      return A

Algorithm prefixAverages1 runs in O(?) time.
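In runnable Python (a direct transcription of the pseudocode; the function name is my choice), the quadratic prefix-averages algorithm is:

```python
def prefix_averages_1(X):
    """Quadratic time: recompute each prefix sum from scratch."""
    n = len(X)
    A = [0.0] * n
    for i in range(n):
        s = X[0]
        for j in range(1, i + 1):
            s += X[j]
        A[i] = s / (i + 1)
    return A

print(prefix_averages_1([1, 2, 3, 4]))  # [1.0, 1.5, 2.0, 2.5]
```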

Page 45:

One Way To Think About This
You go through the outer loop n times.
Each time through the loop you have to perform an assignment, and then another loop.
The inner loop first does no iterations, then 1, then 2, etc.
Each iteration of the inner loop costs a constant amount.
So the real cost of this is like 1 + 2 + 3 + … + (n − 1) = n(n − 1)/2.

Page 46:

Another Prefix Averages Algorithm

    Algorithm prefixAverages2(X, n)             #operations
      Input: array X of n integers
      Output: array A of prefix averages of X
      A ← new array of n integers
      s ← 0
      for i ← 0 to n − 1 do
        s ← s + X[i]
        A[i] ← s / (i + 1)
      return A

Algorithm prefixAverages2 runs in O(?) time.
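The second algorithm in Python (again a direct transcription with my naming); carrying the running sum s across iterations is what removes the inner loop:

```python
def prefix_averages_2(X):
    """Linear time: carry the running sum s across iterations."""
    n = len(X)
    A = [0.0] * n
    s = 0
    for i in range(n):
        s += X[i]
        A[i] = s / (i + 1)
    return A

print(prefix_averages_2([1, 2, 3, 4]))  # [1.0, 1.5, 2.0, 2.5]
```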

Page 47:

Summary
big-O: f(n) is O(g(n)) if f(n) is asymptotically less than or equal to g(n).
big-Omega: f(n) is Ω(g(n)) if f(n) is asymptotically greater than or equal to g(n).
big-Theta: f(n) is Θ(g(n)) if f(n) is asymptotically equal to g(n).
little-o: f(n) is o(g(n)) if f(n) is asymptotically strictly less than g(n).
little-omega: f(n) is ω(g(n)) if f(n) is asymptotically strictly greater than g(n).

Page 48:

______________________
Devon M. Simmonds
Computer Science Department
University of North Carolina Wilmington
_____________________________________________________________

Questions?

Reading from course text:

Page 49:

Relatives of Big-Oh
big-Omega: f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n₀ ≥ 1 such that f(n) ≥ c·g(n) for n ≥ n₀.
big-Theta: f(n) is Θ(g(n)) if there are constants c′ > 0 and c′′ > 0 and an integer constant n₀ ≥ 1 such that c′·g(n) ≤ f(n) ≤ c′′·g(n) for n ≥ n₀.
little-o: f(n) is o(g(n)) if, for any constant c > 0, there is an integer constant n₀ ≥ 0 such that f(n) ≤ c·g(n) for n ≥ n₀.
little-omega: f(n) is ω(g(n)) if, for any constant c > 0, there is an integer constant n₀ ≥ 0 such that f(n) ≥ c·g(n) for n ≥ n₀.

Page 50:

Devon M. Simmonds
University of North Carolina, Wilmington

TIME: Tuesday/Thursday 11–11:50am in 1012 & Thursday 3:30–5:10pm in 2006. Office hours: TR 1–2pm or by appointment. Office location: CI2046. Email: simmondsd[@]uncw.edu

Course Overview