can you trust the internet? an introduction to graph theory, computational complexity, and a little...
Confidential 1
Can you connect the dots, like the graph below, without lifting your pencil?
Can you connect each dot on the top to each dot on the bottom, as in the figure above, without crossing lines?
There are pens at the sign-in table if you want to try these out.
Silicon Valley | Silicon Harbor
Can You Trust The Internet? An intro to graph invariants, computational
complexity, and graph algorithms (with a little bit of cryptography)
Denise K. Gosnell, Ph.D. Data Scientist, PokitDok
The Plan
1. Graph Properties and Reddit
2. Computational Complexity
   • P: Bipartite Graph Matching
   • NP: Graph Coloring
3. A little bit of RSA
Why should you care?
1. Everything is graph theoretic in nature.
2. Much easier to code.
3. Or… you just want to be here to look cool.
The Seven Bridges of Königsberg
"Konigsberg bridges" by Bogdan Giuşcă - Public domain (PD) http://commons.wikimedia.org/wiki/File:Konigsberg_bridges.png#/media/File:Konigsberg_bridges.png
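The Seven Bridges of Königsberg: Solved by Euler in 1735. A walk that crosses every bridge exactly once requires that zero or two land masses touch an odd number of bridges; in Königsberg all four do, so no such walk exists.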
Thank you Wikipedia: http://en.wikipedia.org/wiki/Seven_Bridges_of_Konigsberg
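Euler's argument reduces to counting degrees, and the same test answers the "without lifting your pencil" puzzle. A minimal Python sketch, assuming the graph is connected (not checked) and using hypothetical labels A (the island), B and C (the two banks) and D (the eastern land mass):

```python
from collections import Counter

def odd_degree_vertices(edges):
    """Return the vertices touching an odd number of edges (parallel edges allowed)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def has_euler_walk(edges):
    """Euler's criterion for a *connected* multigraph: a walk using every edge
    exactly once exists iff 0 or 2 vertices have odd degree.
    Connectivity is assumed here, not checked."""
    return len(odd_degree_vertices(edges)) in (0, 2)

# Hypothetical labels for the four land masses; one tuple per bridge.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

print(odd_degree_vertices(konigsberg))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(konigsberg))       # False
```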
P: The set of all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computational time.
Classes of Algorithm Complexity
O(1)
• Accessing an element of an array
• Determining whether a number is even or odd
O(log n)
• Binary search [e.g., intelligently searching through a dictionary]
O(n)
• Linear search: finding the min or max of an unsorted list
• Graph search [BFS and DFS run in time linear in the number of vertices plus edges]
O(n^2)
• Looking for a word in a word search
• Slow sorting algorithms such as bubble sort and insertion sort [worst case]
…
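To make the O(log n) entry concrete, here is a minimal binary search sketch in Python; the function name and the word list are illustrative only:

```python
def binary_search(sorted_items, target):
    """Return an index of target in sorted_items, or -1 if it is absent.
    Each step halves the remaining range, so at most O(log n) comparisons."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1

# Illustrative "dictionary": must already be sorted.
words = ["apple", "banana", "cherry", "grape", "mango", "peach"]
print(binary_search(words, "grape"))  # 3
print(binary_search(words, "kiwi"))   # -1
```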
Graph Matching: Given a bipartite graph G = (V, E), a matching M in G is a set of edges in which no two edges share a common vertex.
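The definition translates directly into a check. A minimal Python sketch in which the candidate and job names are hypothetical:

```python
def is_matching(edges):
    """True if no two edges in `edges` share an endpoint."""
    seen = set()  # vertices already covered by an earlier edge
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

# Hypothetical candidate/job pairs.
print(is_matching([("alice", "job1"), ("bob", "job2")]))  # True
print(is_matching([("alice", "job1"), ("bob", "job1")]))  # False: job1 used twice
```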
The User Preferences Problem
Goal: assign candidates to jobs so as to fill as many jobs as possible.
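The User Preferences Problem
Greedy Algorithm: keep adding edges that share no vertex with an edge already chosen, until no more edges can be added. The result is a maximal matching (no edge can be added to it), but not necessarily a maximum one (the largest possible).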
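A minimal sketch of the greedy strategy, assuming edges are scanned in a fixed order (the slide does not specify one). The hypothetical preferences show how it can get stuck: giving job1 to alice first leaves bob with nothing, even though pairing alice with job2 and bob with job1 would fill both jobs.

```python
def greedy_matching(edges):
    """Scan edges in order and keep each one whose endpoints are both still free.
    The result is maximal (no edge can be added) but not necessarily maximum."""
    matched = set()
    matching = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# Hypothetical preferences: alice can take job1 or job2, bob can only take job1.
prefs = [("alice", "job1"), ("alice", "job2"), ("bob", "job1")]
print(greedy_matching(prefs))  # [('alice', 'job1')]
```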
Augmenting Paths
• A path P is M-alternating if its edges alternate between being in the matching M and not in M.
• A path P is M-augmenting if it is M-alternating and both of its endpoints are unmatched by M (so its first and last edges are not in M).
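A small checker for the definition, assuming paths are given as vertex lists whose consecutive vertices are adjacent in G (this is not verified) and the matching as a set of two-element frozensets; the function and data names are hypothetical:

```python
def is_augmenting(path, matching):
    """path: vertices [v0, v1, ..., vk]; matching: set of frozenset({u, v}).
    Checks that the path is M-alternating and both endpoints are unmatched.
    Assumes consecutive path vertices are adjacent in G."""
    if len(path) < 2 or len(set(path)) != len(path):
        return False  # need at least one edge and no repeated vertices
    matched_vertices = {x for edge in matching for x in edge}
    if path[0] in matched_vertices or path[-1] in matched_vertices:
        return False  # endpoints must be unmatched by M
    for i in range(len(path) - 1):
        in_m = frozenset((path[i], path[i + 1])) in matching
        # Edges 0, 2, 4, ... must lie outside M; edges 1, 3, 5, ... inside M.
        if in_m != (i % 2 == 1):
            return False
    return True

# Hypothetical matching: alice currently holds job1.
M = {frozenset(("alice", "job1"))}
print(is_augmenting(["bob", "job1", "alice", "job2"], M))  # True
print(is_augmenting(["bob", "job1", "alice"], M))          # False: alice is matched
```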
Augmenting Paths
You can improve the matching M using an M-augmenting path P:
1. Remove from M the edges of P that are in M.
2. Add to M the edges of P that are not in M.
The result is a matching with one more edge than M.
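Steps 1 and 2 together amount to the symmetric difference of M with the edge set of P, which Python sets express with `^`. A minimal sketch, assuming the matching is a set of two-element frozensets and the path a vertex list (names are hypothetical):

```python
def augment(matching, path):
    """Flip the edges of an M-augmenting path: returns M symmetric-difference E(P)."""
    path_edges = {frozenset((path[i], path[i + 1])) for i in range(len(path) - 1)}
    # Edges of P already in M are dropped; edges of P not in M are added.
    return matching ^ path_edges

# Hypothetical data: alice holds job1; bob -> job1 -> alice -> job2 is augmenting.
M = {frozenset(("alice", "job1"))}
M2 = augment(M, ["bob", "job1", "alice", "job2"])
print(sorted(sorted(e) for e in M2))  # [['alice', 'job2'], ['bob', 'job1']]
```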
Berge’s Theorem:
A matching M in a graph G is maximum if and only if G has no M-augmenting path.
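Maximum Matching by Constructing the Auxiliary Graph: For a bipartite graph G with parts X and Y and a matching M, the auxiliary graph directs every edge not in M from X to Y and every edge in M from Y to X, then adds a source s with arcs to the unmatched vertices of X and a sink t with arcs from the unmatched vertices of Y. Theorem: G has an M-augmenting path if and only if the auxiliary graph has a directed path from s to t.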
Maximum Matching Final Solution:

EdmondsAlgorithm(G):
    M = empty matching
    while there is an augmenting path P for M:
        M = M ⊕ P    (symmetric difference: flip the edges of P)
    output M

AugmentingPath(G, M):
    G' = auxiliary graph for G, M
    P = path from source s to sink t in G' (via BFS)
    if P is null:
        return false
    else:
        delete s and t from P and return P
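A runnable sketch of the augmenting-path loop above for bipartite graphs, in Python 3.8+ (it uses `:=`). This is one possible implementation, not necessarily the author's: the BFS walks the auxiliary graph implicitly, without materializing s and t as vertices, and the names `max_bipartite_matching`, `left` and `adj` are hypothetical. The slide's name EdmondsAlgorithm is kept above; note that Edmonds' blossom algorithm is what extends augmenting paths to non-bipartite graphs, which this sketch does not handle.

```python
from collections import deque

def max_bipartite_matching(left, adj):
    """left: list of left-side vertices (e.g. candidates);
    adj[x]: right-side neighbours of x (e.g. jobs x would accept).
    Returns a dict mapping each matched left vertex to its partner."""
    match_left, match_right = {}, {}  # the matching M, indexed from both sides

    def augmenting_path():
        # BFS over the auxiliary digraph without building it explicitly:
        # source -> unmatched left vertices, left -> right along edges not in M,
        # right -> left along edges in M, unmatched right vertices -> sink.
        parent = {}  # right vertex -> left vertex it was reached from
        queue = deque(x for x in left if x not in match_left)
        visited_left = set(queue)
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y in parent or match_left.get(x) == y:
                    continue  # already reached, or this is x's own matched edge
                parent[y] = x
                if y not in match_right:
                    # Reached the sink: walk back to an unmatched left vertex.
                    path = [y]
                    while True:
                        x_prev = parent[path[-1]]
                        path.append(x_prev)
                        if x_prev not in match_left:
                            return path[::-1]  # [x0, y0, x1, y1, ..., xk, yk]
                        path.append(match_left[x_prev])
                x_next = match_right[y]  # follow y's matching edge back to the left
                if x_next not in visited_left:
                    visited_left.add(x_next)
                    queue.append(x_next)
        return None  # no source-to-sink path, so M is maximum (Berge)

    while (path := augmenting_path()) is not None:
        # M = M xor P: each (x_i, y_i) pair becomes matched, overwriting
        # the old matching edges along the path.
        for i in range(0, len(path), 2):
            x, y = path[i], path[i + 1]
            match_left[x] = y
            match_right[y] = x
    return match_left

# Hypothetical preferences where a greedy choice of alice -> job1 would block bob.
left = ["alice", "bob"]
adj = {"alice": ["job1", "job2"], "bob": ["job1"]}
print(max_bipartite_matching(left, adj))  # {'alice': 'job2', 'bob': 'job1'}
```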
Graph Algorithms in P
• Maximum (minimum) degree
• Finding connected components [BFS, DFS]
• Shortest paths [Dijkstra’s and Bellman-Ford (single source), Floyd-Warshall (all pairs)]
• Diameter
• Girth [length of the shortest cycle]
• Edge covering number
• …
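As one example from this list, connected components fall out of repeated BFS in O(V + E) time. A minimal sketch, assuming an undirected adjacency dict in which every neighbour also appears as a key (the function name and graph are illustrative):

```python
from collections import deque

def connected_components(adj):
    """adj: dict vertex -> list of neighbours (undirected). Runs in O(V + E)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # BFS from an unvisited vertex collects one whole component.
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components

graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(connected_components(graph))  # [[1, 2, 3], [4, 5], [6]]
```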
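NP (Nondeterministic Polynomial): the set of all decision problems for which a “yes” instance can be verified in polynomial time by a deterministic Turing machine, given a suitable certificate (equivalently, the problems solvable in polynomial time by a nondeterministic Turing machine).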
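To illustrate "verifiable in polynomial time": one common decision version of factoring asks whether n has a nontrivial factor no larger than k, and a claimed factor serves as the certificate. The function name and this particular phrasing of the problem are illustrative assumptions:

```python
def verify_factor_certificate(n, k, p):
    """Decision problem (assumed phrasing): does n have a factor p with 1 < p <= k?
    Certificate: the claimed factor p. Checking it takes a comparison and one
    modulus, polynomial in the number of digits of n."""
    return 1 < p <= k and p < n and n % p == 0

# 3233 = 53 * 61
print(verify_factor_certificate(3233, 60, 53))  # True
print(verify_factor_certificate(3233, 60, 61))  # False: 61 > k
print(verify_factor_certificate(3233, 60, 7))   # False: 7 does not divide 3233
```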
Classic NP Problems:
• Integer (prime?) Factorization
• Graph Coloring (this isn’t what you think)
• The Knapsack Problem
• The Traveling Salesman
Integer (prime?) Factorization
Factoring a 232-digit number (the 768-bit RSA modulus RSA-768) took over two years and used hundreds of machines. Paper: “Factorization of a 768-bit RSA modulus”, Kleinjung et al., 2010.
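For contrast with the cheap check of a certificate, the naive way to find factors is trial division, which needs on the order of √n divisions: exponential in the number of digits (for a 232-digit modulus, roughly 10^116 divisions). The record above used the number field sieve, not this method; the sketch is only meant to show the scaling:

```python
def trial_division(n):
    """Return the prime factorization of n (n >= 2) by trying every d <= sqrt(n).
    Worst case about sqrt(n) divisions: for a d-digit n that is roughly
    10**(d/2) steps, i.e. exponential in the input size."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # divide out each prime factor completely
            factors.append(d)
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors.append(n)
    return factors

print(trial_division(3233))  # [53, 61]
print(trial_division(336))   # [2, 2, 2, 2, 3, 7]
```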
Graph Coloring: The chromatic number χ(G) is the minimum number of colors required to color the vertices of G so that no two adjacent vertices have the same color.
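Checking a proposed coloring is fast, but the obvious way to find the chromatic number tries every assignment. A brute-force Python sketch (function names are hypothetical) that is only practical for tiny graphs:

```python
from itertools import product

def is_proper_coloring(edges, color):
    """Polynomial-time check: no edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u, v in edges)

def chromatic_number(vertices, edges):
    """Brute force: try k = 1, 2, ... and every assignment of k colors.
    Up to k**|V| assignments per k, so exponential in the number of vertices."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            if is_proper_coloring(edges, dict(zip(vertices, assignment))):
                return k
    return 0  # only reached for a graph with no vertices

five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(chromatic_number(range(5), five_cycle))                          # 3
print(is_proper_coloring(five_cycle, {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}))  # True
```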
Greedy Graph Coloring via a BFS:

ColorGraph(G, v):
    colors = []
    let Q be a queue
    Q.enqueue(v)
    label v as discovered
    v.color = new color, update colors
    while Q is not empty:
        v ← Q.dequeue()
        for all edges from v to w:
            if w is not labeled as discovered:
                Q.enqueue(w)
                label w as discovered
                neighbor_colors = set of colors already assigned to the neighbors of w
                if neighbor_colors == colors:
                    w.color = new color, update colors
                else:
                    w.color = a color from colors - neighbor_colors
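A runnable Python version of the ColorGraph pseudocode, assuming an adjacency dict and integer colors 0, 1, 2, …; here, coloring a vertex doubles as marking it discovered, and the function name is hypothetical. Like any greedy coloring, it can use more colors than the chromatic number depending on the visiting order, and it only colors the component containing `start`:

```python
from collections import deque

def color_graph(adj, start):
    """Greedy coloring in BFS order. adj: dict vertex -> list of neighbours
    (undirected). Returns dict vertex -> color for start's component."""
    color = {start: 0}  # colored == discovered
    num_colors = 1
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in color:
                continue  # already discovered
            queue.append(w)
            neighbor_colors = {color[u] for u in adj[w] if u in color}
            free = [c for c in range(num_colors) if c not in neighbor_colors]
            if free:
                color[w] = free[0]     # reuse the smallest available color
            else:
                color[w] = num_colors  # every existing color is taken: add one
                num_colors += 1
    return color

c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}  # 5-cycle
print(color_graph(c5, 0))  # {0: 0, 1: 1, 4: 1, 2: 0, 3: 2}
```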
Non-Planar Graphs
Can you connect each dot on the top to each dot on the bottom (as in the figure above) without crossing lines?
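Assuming the figure shows three dots on top and three on the bottom (the complete bipartite graph K_{3,3}; the figure itself is not in this transcript), Euler's formula shows why the answer is no:

```latex
\begin{aligned}
&\text{Euler's formula for a connected planar graph: } V - E + F = 2.\\
&\text{A simple bipartite graph with a cycle has no odd cycles, so every face is bounded by at least 4 edges,}\\
&\text{and each edge borders at most 2 faces: } 4F \le 2E \;\Rightarrow\; F \le E/2.\\
&\text{Hence } 2 = V - E + F \le V - E/2 \;\Rightarrow\; E \le 2V - 4.\\
&\text{For } K_{3,3}:\; V = 6,\; E = 9 > 2 \cdot 6 - 4 = 8, \text{ so } K_{3,3} \text{ is not planar.}
\end{aligned}
```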
Graph Coloring: The chromatic number of a graph has a constrained optimization version that cannot be approximated within any constant factor unless P = NP (Zuckerman, 1996).
The P versus NP Problem: Essentially, can every problem whose solution can be checked by a computer in polynomial time also be solved by a computer in polynomial time? Formally stated by Stephen Cook in 1971.
Why should you care?
• Integer (prime?) Factorization
• Graph Coloring (this isn’t what you think)
• The Knapsack Problem
• The Traveling Salesman
RSA:
1. Pick two extremely large prime numbers p and q.
2. Public key: (e, n), where $n = p \cdot q$ and $e$ satisfies $3 \le e < (p-1)(q-1)$ and $\gcd(e, (p-1)(q-1)) = 1$.
3. Private key: (d, n), where $n = p \cdot q$ and $e \cdot d \equiv 1 \pmod{(p-1)(q-1)}$.
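A toy walk-through with tiny primes (p = 61, q = 53, e = 17, a commonly used textbook example) shows the key equations in action. This is textbook RSA without padding, for illustration only; real keys use primes hundreds of digits long plus a padding scheme such as OAEP. The helper names are hypothetical, and `pow(e, -1, phi)` requires Python 3.8 or later:

```python
from math import gcd

def make_keys(p, q, e):
    """Toy RSA key generation from primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1)(q - 1)")
    d = pow(e, -1, phi)  # modular inverse: e * d = 1 mod (p - 1)(q - 1)
    return (e, n), (d, n)

def encrypt(message, public_key):
    e, n = public_key
    return pow(message, e, n)  # message must be an integer with 0 <= message < n

def decrypt(ciphertext, private_key):
    d, n = private_key
    return pow(ciphertext, d, n)

public, private = make_keys(61, 53, 17)
print(public, private)         # (17, 3233) (2753, 3233)
c = encrypt(65, public)
print(c, decrypt(c, private))  # 2790 65
```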
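RSA: The security of RSA rests on the fact that, given a large composite number, finding its prime factors is believed to be hard. Factoring (as a decision problem) is in NP, although it is not known to be NP-complete, and no polynomial-time classical algorithm for it is known.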
What just happened?
1. Graph Properties and Reddit
2. Computational Complexity
   • P: Bipartite Graph Matching
   • NP: Graph Coloring
3. A little bit of RSA
Graph Theory Resources:
• Introduction to Graph Theory, 2nd Ed. (West)
• Introduction to Graph Theory (Chartrand)
• Social Network Analysis (Wasserman)
• Introduction to Algorithms, 3rd Edition (Cormen, …, Stein)
• … the Wikipedia pages aren’t too shabby.
Links in the Presentation Notes:
• Reddit Viz: http://redditstuff.github.io/sna/vizit/#
• YouTube Lecture on Graph Matching: https://www.youtube.com/watch?v=NlQqmEXuiC8
• Graph Matching Code: http://www.geeksforgeeks.org/maximum-bipartite-matching/
• RSA Detailed Example: http://doctrina.org/How-RSA-Works-With-Examples.html
• Factorization of a 768-bit RSA modulus: http://eprint.iacr.org/2010/006.pdf
Graph Tech Stack:
• Databases: Titan, Neo4J, OrientDB
• Visualization: Gephi, sigma.js, GraphViz
• Algorithm Libraries: Spark, Gremlin (Gremthon), Boost (C++), JGraphT (Java), NetworkX, Python-Graph
… there are plenty more to dive into. This is just a start.
Twitter: @DeniseKGosnell