advanced topics


Upload: ugo

Post on 12-Jan-2016


TRANSCRIPT

Page 1: Advanced Topics

Advanced Topics

NP-complete reports. Continue on NP, parallelism

Page 2: Advanced Topics

Reprise: Non-determinism

• Informal: add to any algorithm
– taking a guess at one or more places
– forking and pursuing one or more possibilities

• If there is a non-deterministic algorithm, then there is a regular/standard (deterministic) algorithm
– just try all the possibilities
– may take a long time
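
The "just try all the possibilities" idea can be sketched in a few lines. The example problem (subset sum) and the function name are illustrative assumptions, not taken from the slides. Each "guess" is the choice to include or skip one number, and the standard algorithm simply enumerates every combination of guesses.

```python
from itertools import product


def subset_sum_brute_force(numbers, target):
    """Deterministically simulate the non-deterministic guesses.

    A non-deterministic algorithm would guess, for each number, whether to
    include it. Here every combination of guesses is tried in turn, which
    takes up to 2**len(numbers) iterations.
    """
    for choices in product([False, True], repeat=len(numbers)):
        chosen = [n for n, keep in zip(numbers, choices) if keep]
        if sum(chosen) == target:
            return chosen  # one sequence of "right guesses"
    return None  # no sequence of guesses works


print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9))
```

With n numbers there are 2^n combinations, which is why this "may take a long time".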

Page 3: Advanced Topics

Reprise: the class P

• … is all problems for which there exists an algorithm with running time bounded by a polynomial in the size of the input.

Page 4: Advanced Topics

Reprise: the class NP

• all problems for which there is an algorithm, possibly non-deterministic, whose running time, assuming you take the right paths, is bounded by a polynomial

• Alternative definition: given a proposed answer, you can check that it is correct in polynomial time.
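
As a rough illustration of the "check the answer" definition, here is a checker for graph coloring, one of the report topics. The function name and the input format (an edge list plus a dictionary that gives every vertex a color) are assumptions made for this sketch. Finding a coloring may be hard, but checking a proposed one takes a single pass over the edges.

```python
def is_valid_coloring(edges, coloring, max_colors):
    """Check a proposed coloring in time proportional to the input size.

    edges: list of (u, v) pairs.
    coloring: dict mapping every vertex to a color (assumed complete).
    """
    if len(set(coloring.values())) > max_colors:
        return False
    # Every edge must join two differently colored vertices.
    return all(coloring[u] != coloring[v] for u, v in edges)


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_valid_coloring(triangle, {"a": 0, "b": 1, "c": 2}, 3))  # True
print(is_valid_coloring(triangle, {"a": 0, "b": 1, "c": 0}, 3))  # False
```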

Page 5: Advanced Topics

Reprise: does P = NP?

• Is it possible to find actual standard (deterministic) polynomial-time algorithms for these NP problems?

• THE great problem of computer science.

• Proving that P does not equal NP would also be significant.

• Theoretical problem with considerable practical value.

Page 6: Advanced Topics

NP-complete

• A set of NP problems that can be translated into each other in polynomial time (and into which every other NP problem can be translated), so…

• If one of the problems can be solved in polynomial time
– aka tractable

• … they all can.
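
A small sketch of what "translated in polynomial time" can look like, using two problems chosen here for illustration (they are not named on the slide): Partition (can a list of integers be split into two groups with equal sums?) translates into Subset Sum with target equal to half the total. Only one direction is shown, and the brute-force solver is just a stand-in for whatever Subset Sum algorithm is available.

```python
from itertools import combinations


def subset_sum(numbers, target):
    """Brute-force Subset Sum solver (exponential time), used as a stand-in."""
    return any(
        sum(combo) == target
        for size in range(len(numbers) + 1)
        for combo in combinations(numbers, size)
    )


def partition_via_subset_sum(numbers):
    """Translate a Partition instance into a Subset Sum instance.

    The translation itself is cheap (one pass to add up the numbers);
    only the solver it hands off to is slow.
    """
    total = sum(numbers)
    if total % 2 != 0:
        return False  # an odd total can never split into equal halves
    return subset_sum(numbers, total // 2)


print(partition_via_subset_sum([1, 5, 11, 5]))  # True: 11 = 1 + 5 + 5
print(partition_via_subset_sum([1, 2, 5]))      # False
```

If a polynomial-time Subset Sum solver existed, plugging it in here would give a polynomial-time Partition solver as well.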

Page 7: Advanced Topics

NP-hard

• A problem is NP-hard if there is an NP-complete problem that can be translated into it in polynomial time.
– but not necessarily the other way.

• NP-hard problems are at least as hard as NP-complete problems.

Page 8: Advanced Topics

NP-hard example

• Robot path planning in a dynamic environment

Page 9: Advanced Topics

Reports on NP-complete problems

• Tetris

• Knapsack problem

• Steiner Tree problem

• Graph coloring

• Minesweeper

• Subset problem

Page 10: Advanced Topics

Note

• There are methods for getting answers to NP problems, but they aren't guaranteed to be optimal.

• Called heuristics or approximations
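
A minimal sketch of a heuristic, using the knapsack problem from the report list. The greedy rule (take items in order of value per unit of weight) and the function name are assumptions chosen for illustration. It runs quickly but is not guaranteed to find the best answer.

```python
def greedy_knapsack(items, capacity):
    """Pick items by best value-per-weight ratio until nothing more fits.

    items: list of (value, weight) pairs with positive weights.
    Returns (total_value, chosen_items). Fast, but not always optimal.
    """
    chosen, total_value, remaining = [], 0, capacity
    by_ratio = sorted(items, key=lambda item: item[0] / item[1], reverse=True)
    for value, weight in by_ratio:
        if weight <= remaining:
            chosen.append((value, weight))
            total_value += value
            remaining -= weight
    return total_value, chosen


items = [(60, 10), (100, 20), (120, 30)]
print(greedy_knapsack(items, 50))
```

On this input the greedy choice reaches a value of 160, while taking the second and third items (total weight 50) would give 220.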

Page 11: Advanced Topics

Distributed computing

• Approach to NP problems: fork a new process

• That is, use distributed computing to investigate the different choices

• Some problems may be embarrassingly parallelizable.

Page 12: Advanced Topics

Sources

• Many

• Google: http://code.google.com/edu/parallel/mapreduce-tutorial.html

• Note: there is controversy re: MapReduce
– may be issue of patent
– Is it the right framework
– ??

Page 13: Advanced Topics

Concepts

• key/value pair

• Master / Worker

• nodes on network
– may be one Master and many Workers

• hashing: quick way to find data (key/value data)

• piece / partition / split / shard
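
To show how hashing and shards can fit together, the sketch below assigns key/value pairs to shards by hashing the key. The function names and the choice of SHA-256 are assumptions for illustration; real frameworks choose their own partitioning functions.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard number in the range 0..num_shards-1."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(pairs, num_shards):
    """Split key/value pairs into shards; equal keys land in the same shard."""
    shards = [[] for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


pairs = [("cat", 1), ("dog", 1), ("cat", 1), ("bird", 1)]
for number, shard in enumerate(partition(pairs, 3)):
    print(number, shard)
```

Because the shard depends only on the key, any node can work out where a key's data lives without asking the Master.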

Page 14: Advanced Topics

Example from Google tutorial

• Compute pi using many workers, each doing a calculation using pseudo-random function.
– no data (NOT typical MapReduce problem)

• Worker picks a random point in the square. If it is in the circle, worker increments a counter.

• http://faculty.purchase.edu/jeanine.meyer/processing/piEstimate/applet/

Page 15: Advanced Topics

Formulas

• Area_of_circle = pi * r^2

• Area_of_square containing circle = 4 * r^2

• So r^2 = Area_of_square / 4

• Let Ac be Area_of_circle and As be Area_of_square

• Then pi = 4 * Ac / As

• Estimate for pi is 4 * counter / Number_of_points_tried
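
The estimate can be tried on a single machine with a short script. This is a sketch under stated assumptions: the circle has radius 1 and is centered in the square from -1 to 1, and the function name and `seed` parameter are made up for the example.

```python
import random


def estimate_pi(number_of_points_tried, seed=None):
    """Estimate pi as 4 * counter / number_of_points_tried."""
    rng = random.Random(seed)
    counter = 0
    for _ in range(number_of_points_tried):
        # Random point in the square [-1, 1] x [-1, 1]; circle radius r = 1.
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # the point is inside the circle
            counter += 1
    return 4 * counter / number_of_points_tried


print(estimate_pi(200_000, seed=1))
```

More points generally give a closer estimate, at the cost of more time.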

Page 16: Advanced Topics

Informal proof

• The chance of a randomly chosen point in the square being in the circle equals the ratio of the areas, Ac / As.

• Choosing many points randomly carries out this test.

• We could [simply] use for-loops and do the calculation for every point.

Page 17: Advanced Topics

MapReduce

• Model for distributed (aka parallel) computing

• There are different products that implement MapReduce. From a Google search:
– Google
– Apache Hadoop: open source
– Teradata
– Amazon
– Greenplum
– Platform

Page 18: Advanced Topics

MapReduce

• The programmer sets up programs for the Master and for the Workers. Typically, the Master program sets up and partitions the input array(s).

• Typically, data is key/value pairs.

• Programmers write
– Map functions that process data, possibly making use of functions in the MapReduce library
– Reduce functions that combine the results

• Workers work on Map tasks and/or Reduce tasks. The Map task is applied to the worker's piece (aka shard) of the input array.
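
The division of labor can be imitated on one machine with plain Python. This is only a sketch: `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names, the word-count task is chosen for illustration, and a real MapReduce product would run the map and reduce tasks on separate worker nodes.

```python
from collections import defaultdict


def map_fn(shard):
    """Map task: emit a (word, 1) pair for every word in this worker's shard."""
    return [(word.lower(), 1) for line in shard for word in line.split()]


def reduce_fn(key, values):
    """Reduce task: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(lines, num_workers):
    # Master: partition the input array into one shard per worker.
    shards = [lines[i::num_workers] for i in range(num_workers)]
    # Workers: each applies the Map function to its own shard.
    mapped = [map_fn(shard) for shard in shards]
    # Group the intermediate key/value pairs by key before reducing.
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


print(run_mapreduce(["the cat", "the dog", "a cat"], num_workers=2))
```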

Page 19: Advanced Topics

MapReduce for pi estimate

• Not typical in that there is no data

• The map function does the calculation

• When all done, the reduce function adds up all the individual counters and calculates the estimate for pi
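
A sketch of how the pi estimate splits into a map step and a reduce step. The function names, the per-worker seeds and the default sizes are assumptions for illustration, and the workers are simulated one after another with Python's built-in `map` rather than run on separate nodes.

```python
import random


def map_count_hits(task):
    """Map task: one worker tries its share of points and returns its counter."""
    seed, points = task
    rng = random.Random(seed)  # a separate pseudo-random stream per worker
    counter = 0
    for _ in range(points):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # inside the circle of radius 1
            counter += 1
    return counter


def reduce_to_pi(counters, points_per_worker):
    """Reduce task: add up the individual counters and compute the estimate."""
    total_points = points_per_worker * len(counters)
    return 4 * sum(counters) / total_points


def estimate_pi(num_workers=10, points_per_worker=20_000):
    tasks = [(seed, points_per_worker) for seed in range(num_workers)]
    counters = list(map(map_count_hits, tasks))  # stand-in for remote workers
    return reduce_to_pi(counters, points_per_worker)


print(estimate_pi())
```

Since each map task needs only a seed and a point count, the tasks are independent of each other and could be handed to separate processes or machines.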

Page 20: Advanced Topics

Speed up for pi estimate

• Suppose
– each trial (getting the 2 random values and determining if in circle) takes K time units
– 1000 workers calculate 1000000 values all together (1000 each)
– adding 2 numbers takes 1 time unit

• Time without distributed computing: 1000000*K

• Time with distributed computing: 1000*K + 1000

• Speed up is 1000000*K / (1000*K + 1000) = 1000*K / (K + 1), which is slightly less than 1000 when K is large
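
The arithmetic can be checked with a small helper. The function name and parameters are made up for this sketch, and the model follows the slide: each worker does its share of trials, then the counters are added at a cost of about one time unit per worker.

```python
def speedup(total_points, workers, k, add_cost=1):
    """Ratio of single-machine time to distributed time in the slide's model."""
    sequential = total_points * k
    distributed = (total_points // workers) * k + workers * add_cost
    return sequential / distributed


for k in (1, 10, 100):
    print(k, speedup(1_000_000, 1000, k))
```

The ratio is 500 when K is 1 and approaches 1000 as K grows, so the gain depends on each trial costing much more than one addition.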

Page 21: Advanced Topics

Follow-up

• Look up examples using MapReduce

• Note: one example is Google maintaining its keyword index by scanning (crawling) the web

Page 22: Advanced Topics

Speaker

Twitter: @kmwinterfield

• IBM Smarter Cities

• Social media for political campaigns

• World Community Grid

Page 23: Advanced Topics

Homework

• Prepare question for Kevin
– follow on Twitter and send message OR
– post on Moodle

• Continue with postings

• Research a unique NP-complete problem and post a summary and source!