can you trust the internet? an introduction to graph theory, computational complexity, and a little...


Upload: denise-gosnell-phd

Post on 08-Aug-2015


Category:

Data & Analytics



TRANSCRIPT

Confidential 1

Can you connect the dots, like the graph below, without lifting up your pencil?

Can you connect each dot on the top to each dot on the bottom, like above, without crossing lines?

There are pens at the sign-in table if you want to try these out

Silicon Valley | Silicon Harbor

Can You Trust The Internet? An intro to graph invariants, computational complexity, and graph algorithms (with a little bit of cryptography)

Denise K. Gosnell, Ph.D. Data Scientist, PokitDok


Graph Theory?

G = (V,E)


The Plan

1.  Graph Properties and Reddit
2.  Computational Complexity
    P: Bipartite Graph Matching
    NP: Graph Coloring
3.  A little bit of RSA


Why should you care?

1.  Everything is graph theoretic in nature.
2.  Much easier to code.
3.  or… you just want to be here to look cool.


The Seven Bridges of Königsberg

"Konigsberg bridges" by Bogdan Giuşcă - Public domain (PD) http://commons.wikimedia.org/wiki/File:Konigsberg_bridges.png#/media/File:Konigsberg_bridges.png


The Seven Bridges of Königsberg: Solved by Euler in 1735

Thank you Wikipedia: http://en.wikipedia.org/wiki/Seven_Bridges_of_Konigsberg


Eulerian Paths

[Figure: two graphs compared ("vs."), with vertex degrees labeled: 3, 3, 5, 3 and 2, 4, 4, 3, 3]
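The vertex degrees on this slide matter because of Euler's criterion: a connected graph has an Eulerian path exactly when it has zero or two vertices of odd degree. The sketch below checks that condition. The two edge lists are assumptions reconstructed to match the degree labels (the Königsberg multigraph with degrees 5, 3, 3, 3, and a "house with an X" drawing with degrees 2, 4, 4, 3, 3); the vertex names are hypothetical.

```python
from collections import defaultdict


def has_eulerian_path(edges):
    """Return True if the multigraph given as (u, v) pairs has an Eulerian path."""
    degree = defaultdict(int)
    adjacency = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacency[u].add(v)
        adjacency[v].add(u)
    if not degree:
        return True  # no edges: trivially traversable

    # All vertices that carry edges must sit in one connected component.
    start = next(iter(degree))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    if len(seen) != len(degree):
        return False

    # Euler's criterion: zero or two odd-degree vertices.
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return odd in (0, 2)


# Königsberg: land masses A, B, C, D joined by seven bridges (assumed labels).
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

# "House with an X": roof tip T, upper corners L and R, lower corners BL and BR.
house = [("T", "L"), ("T", "R"), ("L", "R"), ("L", "BL"),
         ("R", "BR"), ("BL", "BR"), ("L", "BR"), ("R", "BL")]

print(has_eulerian_path(konigsberg))  # four odd-degree vertices
print(has_eulerian_path(house))       # two odd-degree vertices
```

Königsberg fails the test because all four land masses have odd degree, which is why no walk crosses every bridge exactly once.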


Screenshot from vizit: http://redditstuff.github.io/sna/vizit/


P


P: the set of all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computational time.


Classes of Algorithm Complexity

O(1)
•  Accessing an element of an array
•  Determining if a number is even or odd

O(log(n))
•  Binary search [e.g., intelligently searching through a dictionary]

O(n)
•  Linear search: finding a min or max in an unsorted list
•  Graph Search

O(n^2)
•  Looking for a word in a word search
•  Bad sorting algorithms like bubble sort, insertion sort
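To make the gap between O(n) and O(log(n)) concrete, here is a minimal sketch that counts the comparisons made by a linear search and a binary search over the same sorted list. The function names and the step counters are illustrative additions, not something from the slides.

```python
def linear_search(items, target):
    """O(n): scan left to right. Returns (index, comparisons made)."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """O(log n): halve the search window each step. Returns (index, steps)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


data = list(range(1000))          # already sorted
print(linear_search(data, 999))   # worst case: touches every element
print(binary_search(data, 999))   # needs at most 10 steps for 1000 items
```

Binary search only works because the list is sorted, the same way a dictionary lookup relies on alphabetical order.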


Graph Search: O(|V| + |E|)

Depth vs. Breadth
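Both searches visit every vertex once and look at every edge a constant number of times, which is where O(|V| + |E|) comes from. The sketch below shows the only real difference: breadth-first uses a queue, depth-first uses a stack. The adjacency-list dictionary and the tiny example graph are assumptions made for illustration.

```python
from collections import deque


def bfs_order(graph, start):
    """Breadth-first: visit vertices level by level using a FIFO queue."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def dfs_order(graph, start):
    """Depth-first: follow one branch as far as possible using a LIFO stack."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbors are explored in their listed order.
        for neighbor in reversed(graph[node]):
            if neighbor not in seen:
                stack.append(neighbor)
    return order


# Hypothetical undirected graph stored as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "e"],
    "d": ["b"],
    "e": ["c"],
}

print(bfs_order(graph, "a"))  # ['a', 'b', 'c', 'd', 'e']
print(dfs_order(graph, "a"))  # ['a', 'b', 'd', 'c', 'e']
```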


Graph Matching
Given a bipartite graph G = (V,E), a matching M in G is a set of edges in which no two edges share a common vertex.
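The definition translates directly into a check: walk the edge set and fail as soon as any vertex shows up twice. This is a small sketch that assumes edges are (u, v) pairs and that vertex names are unique across both sides of the bipartite graph; the example names are hypothetical.

```python
def is_matching(edges):
    """Return True if no two edges in `edges` share a vertex."""
    used = set()
    for u, v in edges:
        if u in used or v in used:
            return False
        used.update((u, v))
    return True


print(is_matching([("ann", "dev"), ("bob", "ops")]))  # True
print(is_matching([("ann", "dev"), ("ann", "ops")]))  # False: "ann" is used twice
```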


The User Preferences Problem

Candidates → Jobs


The User Preferences Problem
Goal: assign candidates to jobs to fill as many jobs as possible


The User Preferences Problem Greedy Algorithm:

Keep adding edges until no more edges can be added
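A minimal sketch of the greedy rule, assuming the edges are (candidate, job) pairs scanned in the order given. The candidate and job names are hypothetical. The example is chosen to show the weakness of the approach: the result cannot be extended, yet it is not the largest possible matching.

```python
def greedy_matching(edges):
    """Scan edges once, keeping each edge whose endpoints are both still free."""
    used_candidates = set()
    used_jobs = set()
    matching = []
    for candidate, job in edges:
        if candidate not in used_candidates and job not in used_jobs:
            matching.append((candidate, job))
            used_candidates.add(candidate)
            used_jobs.add(job)
    return matching


edges = [("ann", "dev"), ("ann", "ops"), ("bob", "dev")]
print(greedy_matching(edges))  # [('ann', 'dev')]
```

Greedy fills one job here, while assigning ann to ops and bob to dev would fill two.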


The User Preferences Problem Greedy Approach


The User Preferences Problem A better solution: via augmenting paths


Augmenting Paths

•  A path P is M-alternating if the edges of the path alternate between being in the matching M and not in the matching M.

•  A path P is M-augmenting if it is M-alternating and it starts and ends at vertices that are not matched by M (so the first and last edges are not in the matching M).


Augmenting Paths

Given an M-augmenting path P, you can improve the matching M by:

1. Removing from the matching M the edges of the path P that are in the matching M

2. Adding to the matching M the edges of the path P that are not in the matching M

The new matching has one more edge than M.
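The two steps amount to taking the symmetric difference of the matching and the path's edges. Below is a small sketch of that operation, assuming the matching is a set of `frozenset` edges and the path is a list of vertices; the names in the example are hypothetical.

```python
def augment(matching, path):
    """Flip the path's edges: matched edges leave, unmatched edges enter."""
    path_edges = {frozenset(pair) for pair in zip(path, path[1:])}
    return matching ^ path_edges  # symmetric difference


# Current matching: ann has the dev job.
matching = {frozenset(("ann", "dev"))}

# Augmenting path: bob - dev (unmatched), dev - ann (matched), ann - ops (unmatched).
path = ["bob", "dev", "ann", "ops"]

improved = augment(matching, path)
print(len(matching), len(improved))  # 1 2
```

An augmenting path always has one more unmatched edge than matched edges, which is why the flip gains exactly one edge.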


The User Preferences Problem Augmenting Path Example:


The User Preferences Problem Augmenting Path Example:


Berge’s Theorem:

A matching M in a graph G is maximum if and only if there is no M-augmenting path in G.


Maximum Matching by Constructing the Auxiliary Graph


Maximum Matching by Constructing the Auxiliary Graph:
Theorem: G has an M-augmenting path if and only if the auxiliary graph has a directed path from the source to the sink.


Maximum Matching by Constructing the Auxiliary Graph:


Maximum Matching Final Solution:

EdmondsAlgorithm(G):
    M = empty matching
    while there is an augmenting path P for M:
        M = M Δ P    (symmetric difference: drop P's matched edges, add its unmatched edges)
    output M

AugmentingPath(G, M):
    G' = auxiliary graph for G, M
    P = path from source s to sink t in G' (via BFS)
    if P is null:
        return false
    else:
        delete s and t from P and return P
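The pseudocode can be sketched in runnable form as follows. The slides' figure of the auxiliary graph is not in the transcript, so this assumes the standard construction: the source points at every unmatched left vertex, unmatched edges point left to right, matched edges point right to left, and every unmatched right vertex points at the sink. All function names, the `("L", x)` / `("R", y)` vertex tags, and the example data are hypothetical.

```python
from collections import deque

SOURCE, SINK = "source", "sink"


def build_auxiliary(left, right, edges, matching):
    """Directed auxiliary graph for the matching (a dict: left vertex -> right vertex)."""
    aux = {SOURCE: [], SINK: []}
    for u in left:
        aux[("L", u)] = []
    for v in right:
        aux[("R", v)] = []
    matched_right = set(matching.values())
    for u in left:
        if u not in matching:
            aux[SOURCE].append(("L", u))
    for v in right:
        if v not in matched_right:
            aux[("R", v)].append(SINK)
    for u, v in edges:
        if matching.get(u) == v:
            aux[("R", v)].append(("L", u))  # matched edge: right -> left
        else:
            aux[("L", u)].append(("R", v))  # unmatched edge: left -> right
    return aux


def augmenting_path(aux):
    """BFS from source to sink; returns the path without s and t, or None."""
    parent = {SOURCE: None}
    queue = deque([SOURCE])
    while queue:
        node = queue.popleft()
        for nxt in aux[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if SINK not in parent:
        return None
    path = []
    node = parent[SINK]
    while node != SOURCE:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


def maximum_matching(left, right, edges):
    matching = {}
    while True:
        path = augmenting_path(build_auxiliary(left, right, edges, matching))
        if path is None:
            return matching
        # The path alternates L, R, L, R, ...; pairing each L with the R that
        # follows it is exactly the symmetric difference M = M Δ P.
        for i in range(0, len(path), 2):
            matching[path[i][1]] = path[i + 1][1]


left = ["ann", "bob", "cat"]
right = ["dev", "ops", "qa"]
edges = [("ann", "dev"), ("ann", "ops"), ("bob", "dev"),
         ("cat", "ops"), ("cat", "qa")]
result = maximum_matching(left, right, edges)
print(len(result))  # 3
```

Each round adds one edge to the matching, so the loop runs at most |V| times, and each round is one BFS, which keeps the whole procedure polynomial.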


Graph Algorithms in P
•  Maximum (minimum) degree
•  Finding connected components [BFS, DFS]
•  Pairwise shortest path algorithms [Dijkstra’s, Bellman-Ford, Floyd-Warshall]
•  Diameter
•  Girth [shortest cycle]
•  Edge Covering Number
•  …


NP


NP (Nondeterministic Polynomial): the set of all decision problems where a “yes” instance can be verified by a deterministic Turing machine in polynomial time (equivalently, solved by a non-deterministic Turing machine in polynomial time).


P → solvable in polynomial time

NP → verifiable in polynomial time


Classic NP Problems:

Integer (prime?) Factorization

Graph Coloring (this isn’t what you think)

The Knapsack Problem

The Traveling Salesman


Integer (prime?) Factorization


Integer (prime?) Factorization

Factoring a 232-digit number took over two years and utilized hundreds of machines. Paper: “Factorization of a 768-bit RSA modulus”, Kleinjung et al., 2010.
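To see why factoring is treated as hard while checking a factorization is easy, here is a toy sketch using trial division. It is only an illustration: the record above used far more sophisticated methods, and the small number 3233 is an assumed example, not taken from the slides.

```python
def trial_division(n):
    """Return (p, q) with p the smallest factor of n above 1, or None if none exists."""
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return divisor, n // divisor
        divisor += 1
    return None


n = 3233
factors = trial_division(n)
print(factors)                      # (53, 61)
print(factors[0] * factors[1] == n)  # verifying takes a single multiplication
```

The loop can run up to about the square root of n, so every two extra digits in n multiply the worst-case work by roughly ten. Verification stays a single multiplication no matter how the factors were found.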


Graph Coloring
The chromatic number of G is the minimum number of colors required to color the vertices of G such that no two adjacent vertices are the same color.
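Checking a proposed coloring takes one pass over the edges, but finding the minimum number of colors by brute force means trying up to k^|V| assignments for each k. The sketch below makes that contrast visible on tiny graphs. The function names and the example graphs are assumptions for illustration, and the exhaustive search is only usable on a handful of vertices.

```python
from itertools import product


def is_proper_coloring(edges, coloring):
    """Polynomial-time check: every edge must join two different colors."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(vertices, edges):
    """Exhaustive search: try every assignment of k colors for k = 1, 2, ..."""
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, assignment))
            if is_proper_coloring(edges, coloring):
                return k
    return 0  # graph with no vertices


five_cycle = [(i, (i + 1) % 5) for i in range(5)]
four_cycle = [(i, (i + 1) % 4) for i in range(4)]

print(chromatic_number(list(range(5)), five_cycle))  # 3: odd cycles need a third color
print(chromatic_number(list(range(4)), four_cycle))  # 2: even cycles are bipartite
```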


Graph Coloring: Greedy approach


Graph Coloring: Greedy approach


Greedy Graph Coloring via a BFS:

ColorGraph(G, v):
    colors = []
    let Q be a queue
    Q.enqueue(v)
    label v as discovered
    v.color = new color, update colors
    while Q is not empty:
        v ← Q.dequeue()
        for all edges from v to w:
            if w is not labeled as discovered:
                Q.enqueue(w)
                label w as discovered
                neighbor_colors = set of color assignments of the neighbors of w
                if neighbor_colors == colors:
                    w.color = new color, update colors
                else:
                    w.color = a color from colors - neighbor_colors
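A runnable sketch of the same greedy idea, with two assumptions worth flagging: colors are represented as integers 0, 1, 2, …, and each newly discovered vertex takes the smallest color not used by its already-colored neighbors (which opens a new color only when all existing ones are blocked). The function name and the example graph are hypothetical, and only the component containing the start vertex gets colored.

```python
from collections import deque


def greedy_bfs_coloring(graph, start):
    """Color vertices in BFS discovery order; returns {vertex: color index}."""
    color = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in color:  # first time w is discovered
                neighbor_colors = {color[n] for n in graph[w] if n in color}
                c = 0
                while c in neighbor_colors:
                    c += 1
                color[w] = c
                queue.append(w)
    return color


# A 5-cycle stored as adjacency lists.
cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_bfs_coloring(cycle, 0)
print(coloring)
print(len(set(coloring.values())))  # number of colors used
```

The result is always a proper coloring, but greedy gives no guarantee of using the minimum number of colors; the visiting order decides how many it ends up needing.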


The Four Color Theorem
Every planar graph is four colorable.
First proved in 1976 by Appel and Haken; 1997: simplified proof by Robertson, Sanders, Seymour, and Thomas.


Non-Planar Graphs

Can you connect each dot on the top to each dot on the bottom (like above), without crossing lines?


Finding non-planar graphs

K5 K3,3
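One quick way to see that K5 and K3,3 cannot be drawn without crossings is an edge-count test that follows from Euler's formula: a simple planar graph with at least three vertices has at most 3|V| − 6 edges, and at most 2|V| − 4 if it contains no triangles (as any bipartite graph does). The sketch below applies those bounds. The function name is hypothetical, and the test is only a necessary condition: passing it does not prove a graph is planar.

```python
def violates_planar_edge_bound(num_vertices, num_edges, triangle_free=False):
    """True if the graph has too many edges to be planar."""
    if num_vertices < 3:
        return False
    if triangle_free:
        bound = 2 * num_vertices - 4
    else:
        bound = 3 * num_vertices - 6
    return num_edges > bound


# K5: 5 vertices, 10 edges; the general bound allows at most 9.
print(violates_planar_edge_bound(5, 10))                     # True

# K3,3: 6 vertices, 9 edges; bipartite, so the triangle-free bound of 8 applies.
print(violates_planar_edge_bound(6, 9, triangle_free=True))  # True
```

K3,3 is the "each dot on the top to each dot on the bottom" puzzle, so this count is one way to justify that the puzzle has no solution.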


Graph Coloring
Constrained optimization versions of the chromatic number problem are impossible to approximate within any constant factor unless P = NP. (Zuckerman, 1996)


About this P =? NP.


The P versus NP Problem:
Essentially: can every problem whose solution can be checked by a computer in polynomial time also be solved by a computer in polynomial time?
Formally posed in 1971 by Stephen Cook.


Why should you care?

Integer (prime?) Factorization

Graph Coloring (this isn’t what you think)

The Knapsack Problem

The Traveling Salesman


RSA:
1. Pick two extremely large prime numbers p and q
2. Public key: (e, n) where:
   n = p × q
   e in [3, (p – 1)(q – 1)), with gcd(e, (p – 1)(q – 1)) = 1
3. Private key: (d, n) where:
   n = p × q
   e × d ≡ 1 mod ((p – 1)(q – 1))
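A toy walk-through of these steps, using tiny primes so the numbers stay readable. The specific values (p = 61, q = 53, e = 17, message 65) are assumptions picked for illustration, and the function names are hypothetical. Real RSA uses primes hundreds of digits long plus a padding scheme, neither of which is shown here. The modular inverse via `pow(e, -1, phi)` needs Python 3.8 or newer.

```python
from math import gcd


def make_keys(p, q, e):
    """Return ((e, n), (d, n)) for primes p, q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must share no factor with (p - 1)(q - 1)")
    d = pow(e, -1, phi)  # d satisfies e * d = 1 mod (p - 1)(q - 1)
    return (e, n), (d, n)


def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)


def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)


public_key, private_key = make_keys(61, 53, 17)
print(public_key)    # (17, 3233)
print(private_key)   # (2753, 3233)

message = 65
ciphertext = encrypt(message, public_key)
print(decrypt(ciphertext, private_key) == message)  # True
```

Anyone who can factor n = 3233 back into 61 and 53 can recompute d, which is why the size of the primes is what the scheme's security rests on.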


RSA: The security of RSA rests on the assumption that, given a large composite number, determining its prime factors is a hard problem. A problem in NP, in fact.


What just happened?

1.  Graph Properties and Reddit
2.  Computational Complexity
    P: Bipartite Graph Matching
    NP: Graph Coloring
3.  A little bit of RSA


Graph Theory Resources:
•  Introduction to Graph Theory 2nd Ed (West)
•  Introduction to Graph Theory (Chartrand)
•  Social Network Analysis (Wasserman)
•  Introduction to Algorithms 3rd Edition (Cormen, …, Stein)
•  … the Wikipedia pages aren’t too shabby.


Links in the Presentation Notes:
•  Reddit Viz: http://redditstuff.github.io/sna/vizit/#
•  YouTube Lecture on Graph Matching: https://www.youtube.com/watch?v=NlQqmEXuiC8
•  Graph Matching Code: http://www.geeksforgeeks.org/maximum-bipartite-matching/
•  RSA Detailed Example: http://doctrina.org/How-RSA-Works-With-Examples.html
•  Factorization of a 768-bit RSA modulus: http://eprint.iacr.org/2010/006.pdf


Graph Tech Stack:

Databases:
•  Titan
•  Neo4j
•  OrientDB

Visualization:
•  Gephi
•  sigma.js
•  GraphViz

Algorithm Libraries:
•  Spark
•  Gremlin (Gremthon)
•  Boost (C++)
•  JGraphT (Java)
•  NetworkX
•  Python-Graph

… there are plenty more to dive into. This is just a start.


Can You Trust The Internet? An intro to graph invariants, computational complexity, and graph algorithms (with a little bit of cryptography)

Denise K. Gosnell, Ph.D. Data Scientist, PokitDok

T: @DeniseKGosnell


Can you connect the dots, like the graph below, without lifting up your pencil?

yes.

Can you connect each dot on the top to each dot on the bottom (like above), without crossing lines?

no.