2012 09 gdg san francisco hackday at parisoma

Upload: peter-neubauer

Post on 09-May-2015

Category:

Technology


DESCRIPTION

Presentation on Neo4j, Federal Campaign Data at http://www.gtugsf.com/events/52972282/

TRANSCRIPT

Page 1: 2012 09 GDG San Francisco Hackday at Parisoma

The Neo4j Election Data @GDG SF

Andreas Kollegger @akollegger
Peter Neubauer @peterneubauer
Michael Hunger @mesirii

#neo4j

Saturday, September 29, 12

Follow the Data

FEC Campaign Data

Follow the Plan

1. Graph Database Primer
   1. Why graphs?
   2. What's a graph database?
2. FEC Campaign Data
   1. Data Model
   2. Import Strategy
   3. Queries

Follow the Plan - Part 2

1. Intro to Google Apps Script by Alex
2. Register at Heroku and install the heroku gem
3. Create and install a Heroku app (heroku apps:create)
4. Add a Neo4j addon instance to it (heroku addons:add neo4j)
5. Upload existing data to the graph
6. Create a custom Ruby proxy app on Heroku
7. Connect to the app using a Google Spreadsheet
8. Build a small bar chart from a Cypher query

Graph Database Primer

Why graphs, why now?

1. Big Data is the trend
2. NOSQL is the answer
3. Large in volume, and in

A Graph?

Yes, a graph

A graph database...

๏ no: not for charts & diagrams, or vector artwork
๏ yes: for storing data that is structured as a graph
  • remember linked lists, trees?
  • graphs are the general-purpose data structure
๏ "A relational database may tell you the average age of everyone in the USA, but a graph database will tell you who is most likely to buy you a beer."

A Graph Database

You know relational

[Diagram: tables foo, bar and foo_bar]

now consider relationships...

We're talking about a Property Graph

[Diagram: people nodes (Emil, Andrés, Lars, Johan, Allison, Peter, Michael, Tobias, Andreas, Ian, Mica, Delia) connected by "knows" relationships]

Nodes
Relationships
Properties (each a key+value)
+ Indexes (for easy look-ups)
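
The four ingredients listed above can be sketched in a few lines of Python. This is only a toy in-memory model, not how Neo4j stores data: the class name `PropertyGraph`, its methods, the single `name` index, and the `since` property are all assumptions made up for illustration, and the one "knows" relationship is invented rather than read off the slide's diagram.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: nodes, relationships, properties, and one index."""

    def __init__(self):
        self.nodes = {}                      # node id -> property dict
        self.rels = []                       # (start id, type, end id, property dict)
        self.name_index = defaultdict(list)  # value of 'name' -> node ids

    def add_node(self, **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = props
        if "name" in props:                  # keep the look-up index current
            self.name_index[props["name"]].append(node_id)
        return node_id

    def add_rel(self, start, rel_type, end, **props):
        self.rels.append((start, rel_type, end, props))

    def neighbours(self, node_id, rel_type):
        """Nodes reached over one outgoing relationship of the given type."""
        return [end for start, t, end, _ in self.rels
                if start == node_id and t == rel_type]


graph = PropertyGraph()
emil = graph.add_node(name="Emil")
johan = graph.add_node(name="Johan")
graph.add_rel(emil, "knows", johan, since=2001)  # hypothetical relationship

emil_id = graph.name_index["Emil"][0]            # index look-up by name
known = [graph.nodes[n]["name"] for n in graph.neighbours(emil_id, "knows")]
print(known)
```

The index is what makes "start from Emil" cheap; everything after that is following relationships rather than joining tables.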

And, but, so how do you query this "graph" database?

Cypher - a graph query language

๏ a pattern-matching query language
๏ declarative grammar with clauses (like SQL)
๏ aggregation, ordering, limits
๏ create, read, update, delete

// get node 1, traverse 2 steps away
start a=node(1) match (a)--()--(c) return c

// create a node with a 'name' property
CREATE (me {name: 'Andreas'}) return me

๏ more on this later...
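
To make the first query concrete, here is a rough Python equivalent of "start at node 1 and go two steps away, ignoring direction". It is a sketch under assumptions: the function name `two_steps_away` and the edge list are invented, and the rule that one relationship is not walked twice within a pattern reflects Cypher's matching behaviour as I understand it.

```python
from collections import defaultdict


def two_steps_away(edges, start):
    """Return every node c such that start--middle--c, ignoring direction."""
    adjacent = defaultdict(list)             # node -> [(edge index, neighbour)]
    for i, (x, y) in enumerate(edges):
        adjacent[x].append((i, y))
        adjacent[y].append((i, x))

    found = []
    for first, middle in adjacent[start]:
        for second, c in adjacent[middle]:
            if second != first:              # do not walk back over the same relationship
                found.append(c)
    return found


# Hypothetical graph: 1-2, 2-3, 2-4, 4-5
edges = [(1, 2), (2, 3), (2, 4), (4, 5)]
result = two_steps_away(edges, 1)
print(result)
```

The Cypher version states only the shape `(a)--()--(c)`; the loops above are the work the database does on the query's behalf.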

Cypher - pattern matching

Cypher - pattern matching syntax

() --> ()
(A) --> (B)
(A) -- (B)
A -[:LOVES]-> B
A --> B --> C
A --> B --> C, A --> C
A --> B --> C <-- A
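
The last two patterns describe the same triangle shape. As an illustration of what matching such a pattern means, the Python sketch below finds every (A, B, C) in a set of directed edges. The function name and sample edges are hypothetical, and requiring three distinct nodes is a simplification of my own; Cypher itself constrains relationships, not nodes.

```python
def match_triangles(edges):
    """Find (a, b, c) such that a-->b, b-->c and a-->c all exist."""
    edge_set = set(edges)
    matches = []
    for a, b in sorted(edge_set):
        for b2, c in sorted(edge_set):
            # The second edge must start where the first one ended,
            # and the closing edge a-->c must also be present.
            if b2 == b and (a, c) in edge_set and len({a, b, c}) == 3:
                matches.append((a, b, c))
    return matches


# Hypothetical directed edges
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
triangles = match_triangles(edges)
print(triangles)
```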

Cypher - common clauses

// get node 1, traverse 2 steps away
START a=node(1) MATCH (a)--()--(c) RETURN c

// get node from an index, return it
START a=node:people(name='Andreas')
RETURN a

// get node from an index, match, filter
// with where, then return results
START a=node:people(name='Andreas')
MATCH (a)-[r]-(b) WHERE b.last='Sparrow'
RETURN r,b
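
The third query combines three steps: an index look-up (START), a one-hop undirected match (MATCH), and a property filter (WHERE). A rough Python rendering of those steps follows. All data here is invented for illustration, and `people_index` is a plain dictionary standing in for the `people` index named in the query.

```python
# Hypothetical nodes and relationships
nodes = {
    1: {"name": "Andreas"},
    2: {"name": "Jack", "last": "Sparrow"},
    3: {"name": "Peter", "last": "Neubauer"},
}
rels = [(1, "KNOWS", 2), (3, "KNOWS", 1)]

# START: a name -> node id index
people_index = {props["name"]: node_id for node_id, props in nodes.items()}


def related_with_last_name(start_name, last):
    a = people_index[start_name]
    rows = []
    for rel in rels:
        start, _, end = rel
        # MATCH (a)-[r]-(b): accept the relationship in either direction
        if start == a:
            b = end
        elif end == a:
            b = start
        else:
            continue
        # WHERE b.last = ...
        if nodes[b].get("last") == last:
            rows.append((rel, b))            # RETURN r, b
    return rows


rows = related_with_last_name("Andreas", "Sparrow")
print(rows)
```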

FEC Campaign Data

yeah, this is the good stuff..

and now, it's time for

FEC Campaign Data

๏ In 1975, Congress created the Federal Election Commission (FEC) to administer and enforce the Federal Election Campaign Act (FECA), the statute that governs the financing of federal elections.
๏ The duties of the FEC, which is an independent regulatory agency, are to disclose campaign finance information

๏ Detailed files about...
  • Candidates
  • Committees
  • Individual Contributions
๏ 10 years of data
๏ Updated every Sunday

[Diagram: Committee, Candidate, Individual Contributions]

FEC Campaign Data - Committees

๏ Committees
  • one record for each committee registered with the Federal Election Commission.

Committee - cm12.txt

CMTE_ID: String
CMTE_NM: String
TRES_NM: String
CMTE_ST1: String
CMTE_ST2: String
CMTE_CITY: String
CMTE_ST: String
CMTE_ZIP: String
CMTE_DSGN: String
CMTE_TP: String
CMTE_PTY_AFFILIATION: String
CMTE_FILING_FREQ: String
ORG_TP: String
CONNECTED_ORG_NM: String
CAND_ID: String

FEC Campaign Data - Candidates

๏ Candidates
  • one record for each candidate who has either registered with the FEC or appeared on a ballot list prepared by a state elections office.

Candidate - cn12.txt

CAND_ID: String
CAND_NAME: String
CAND_PTY_AFFILIATION: String
CAND_ELECTION_YR: String
CAND_OFFICE_ST: String
CAND_OFFICE: String
CAND_OFFICE_DISTRICT: String
CAND_ICI: String
CAND_STATUS: String
CAND_PCC: String
CAND_ST1: String
CAND_ST2: String
CAND_CITY: String
CAND_ST: String
CAND_ZIP: String

FEC Campaign Data - Individual Contributions

๏ Individual Contributions
  • each contribution from an individual to a federal committee if the contribution was at least $200.

Individual Contrib - itcont.txt

CMTE_ID: String
AMNDT_IND: String
RPT_TP: String
TRANSACTION_PGI: String
IMAGE_NUM: String
TRANSACTION_TP: String
ENTITY_TP: String
NAME: String
CITY: String
STATE: String
ZIP_CODE: String
EMPLOYER: String
OCCUPATION: String
TRANSACTION_DT: String
TRANSACTION_AMT: Double
OTHER_ID: String
TRAN_ID: String
FILE_NUM: Integer
MEMO_CD: String
MEMO_TEXT: String
SUB_ID: Integer
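
A field list like this maps directly onto a small typed parser. The sketch below assumes the file is pipe-delimited with one record per line and the fields in the order shown; that delimiter is my assumption about the FEC bulk files, not something stated on the slide. The function name and the sample record are invented.

```python
# (field name, converter) pairs in the order given on the slide
IC_FIELDS = [
    ("CMTE_ID", str), ("AMNDT_IND", str), ("RPT_TP", str),
    ("TRANSACTION_PGI", str), ("IMAGE_NUM", str), ("TRANSACTION_TP", str),
    ("ENTITY_TP", str), ("NAME", str), ("CITY", str), ("STATE", str),
    ("ZIP_CODE", str), ("EMPLOYER", str), ("OCCUPATION", str),
    ("TRANSACTION_DT", str), ("TRANSACTION_AMT", float), ("OTHER_ID", str),
    ("TRAN_ID", str), ("FILE_NUM", int), ("MEMO_CD", str),
    ("MEMO_TEXT", str), ("SUB_ID", int),
]


def parse_contribution(line, delimiter="|"):
    """Turn one delimited line into a dict with typed values."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(IC_FIELDS):
        raise ValueError(f"expected {len(IC_FIELDS)} fields, got {len(values)}")
    record = {}
    for (name, convert), raw in zip(IC_FIELDS, values):
        # Empty numeric fields become None instead of raising.
        record[name] = convert(raw) if (raw or convert is str) else None
    return record


# Made-up record, for illustration only
sample = ("C00000001|N|Q1|P|12345678901|15|IND|DOE, JOHN|SPRINGFIELD|IL|62701|"
          "ACME|ENGINEER|03152012|250|||123456|||4000000000000000001")
record = parse_contribution(sample)
print(record["NAME"], record["TRANSACTION_AMT"])
```

The committee and candidate files could be handled the same way with their own field lists; all of their fields are strings.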

FEC Campaign Data - Extra Records

๏ Candidate to Committee Linkage
  • registered candidate to committee linkage
๏ Transactions between Committees
  • inter-committee contribution or independent expenditure during the two-year election cycle
๏ Contribution to Candidate
  • contribution or independent expenditure from committee to candidate during the two-year election cycle

Import Strategy

Raw Data Import

[Diagram: Committee, Candidate, Candidate to Committee, Inter Committee Contributions, Candidate Contributions and Individual Contributions records, each labeled with its CMTE_ID and CAND_ID key fields]

Connected Data Import

[Diagram: the same record types, connected through their CMTE_ID and CAND_ID fields, with contribution records pointing from OTHER_ID (a CAND_ID or CMTE_ID) to CMTE_ID]

Related Data Import

[Diagram: Committee, Candidate and Individual Contributions nodes connected by the relationship types below]

CAMPAIGNS_FOR
SUPPORTS
INTER_COMMITTEE_CONTRIBUTION
CANDIDATE_CONTRIBUTION
INDIVIDUAL_CONTRIBUTION
EARMARKED_BY
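
The step from raw records to related data amounts to turning shared IDs into relationships. Here is a hedged Python sketch of that idea for two of the relationship types. It is not the importer from the FEC_GRAPH repository: the function name is invented, the directions follow the queries later in this deck (committee SUPPORTS candidate, contribution INDIVIDUAL_CONTRIBUTION to committee), and deriving SUPPORTS from the committee record's CAND_ID field is an assumption.

```python
def link_records(candidates, committees, contributions):
    """Return (start, relationship type, end) triples built from ID references."""
    known_candidates = {c["CAND_ID"] for c in candidates}
    known_committees = {c["CMTE_ID"] for c in committees}

    rels = []
    for committee in committees:
        # Assumption: a committee supports the candidate named in its CAND_ID.
        if committee.get("CAND_ID") in known_candidates:
            rels.append((committee["CMTE_ID"], "SUPPORTS", committee["CAND_ID"]))
    for contribution in contributions:
        # Skip contributions that point at a committee we never imported.
        if contribution["CMTE_ID"] in known_committees:
            rels.append((contribution["SUB_ID"], "INDIVIDUAL_CONTRIBUTION",
                         contribution["CMTE_ID"]))
    return rels


# Made-up records, for illustration only
candidates = [{"CAND_ID": "P00000001"}]
committees = [{"CMTE_ID": "C00000001", "CAND_ID": "P00000001"},
              {"CMTE_ID": "C00000002", "CAND_ID": ""}]
contributions = [{"SUB_ID": 1, "CMTE_ID": "C00000001"},
                 {"SUB_ID": 2, "CMTE_ID": "C99999999"}]

rels = link_records(candidates, committees, contributions)
print(rels)
```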

Dave Fauth's Approach

Advanced Import - Dave Fauth

๏ includes SuperPAC data
๏ custom transform, then import
๏ model then looks like this...

[Diagram: Committee, Candidate, Individual, Contribution, Expenditures and superPac Contributions nodes, with SUPPORTS, FUNDS and GIVES relationships]

Advanced Import - Dave Fauth

๏ Extract and Transform
  • Stored files on S3
  • Used MortarData to run Hadoop jobs to prepare data (@MortarData)
๏ Load
  • Used Neo4j BatchInserter to load
  • Thanks to Michael Hunger (@mesirii)
  • Loaded 2M+ nodes in <5 minutes

Download data -> Use S3 Storage -> Process with Hadoop/Pig -> Java BatchInsert -> Created Neo4j DB

Wanna learn more?

๏ Come hear Dave Fauth present at...

Next... Your Turn

From scratch

๏ git clone https://github.com/akollegger/FEC_GRAPH.git
๏ cd FEC_GRAPH
๏ ant initialize
  • (need Apache ant? install from http://ant.apache.org)
๏ ant
  • ant will build the importers and create a script
๏ ./bin/fec2graph --force --importer=RELATED
๏ ant neo4j-start
  • will download and unpack neo4j, then start it

Investigate with Neo4j's Web UI

๏ open http://localhost:7474
๏ Dashboard - overview of data records
๏ Data browser - examine data records, with visualization options
๏ Console - query the database using Cypher

Querying FEC with Cypher

๏ For Cypher documentation
  • http://docs.neo4j.org/
๏ FEC Data Definitions
  • http://www.fec.gov/finance/disclosure/ftpdet.shtml
๏ Ready for a challenge?

Cypher Challenges

// All presidential candidates for 2012
// Top 10 Presidential candidates according to number of campaign committees
// find President Barack Obama
// lookup Obama by his candidate ID
// find Presidential Candidate Mitt Romney
// lookup Romney by his candidate ID
// find the shortest path of funding between Obama and Romney
// 10 top individual contributions to Obama
// 10 top individual contributions to Romney

http://1.usa.gov/uIGzZ

// All presidential candidates for 2012
start candidate=node:candidates('CAND_ID:*')
where candidate.CAND_OFFICE='P' AND candidate.CAND_ELECTION_YR='2012'
return candidate.CAND_NAME;

// Top 10 Presidential candidates according to
// number of campaign committees
start candidate=node:candidates('CAND_ID:*')
match candidate<-[r:SUPPORTS]-(campaign)
where candidate.CAND_OFFICE='P' AND candidate.CAND_ELECTION_YR='2012'
return candidate.CAND_NAME, COUNT(campaign) as count
ORDER BY count desc LIMIT 10;

// find President Barack Obama
start obama=node:candidates('CAND_ID:*')
WHERE obama.CAND_NAME =~ '.*OBAMA.*'
return obama.CAND_NAME, obama.CAND_ID;
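
For readers more at home in a general-purpose language, the first two challenges come down to a filter followed by a count per candidate. A rough Python equivalent is sketched below. The function names and records are invented, and `supports` is assumed to be a list of (committee ID, candidate ID) pairs standing in for the SUPPORTS relationships.

```python
from collections import Counter


def presidential_candidates(candidates, year="2012"):
    """Candidates running for President ('P') in the given election year."""
    return [c for c in candidates
            if c["CAND_OFFICE"] == "P" and c["CAND_ELECTION_YR"] == year]


def top_by_committees(candidates, supports, limit=10):
    """(name, committee count) pairs, highest count first."""
    names = {c["CAND_ID"]: c["CAND_NAME"]
             for c in presidential_candidates(candidates)}
    counts = Counter(cand for _, cand in supports if cand in names)
    return [(names[cand_id], n) for cand_id, n in counts.most_common(limit)]


# Made-up records, for illustration only
candidates = [
    {"CAND_ID": "P1", "CAND_NAME": "ALPHA", "CAND_OFFICE": "P", "CAND_ELECTION_YR": "2012"},
    {"CAND_ID": "P2", "CAND_NAME": "BETA", "CAND_OFFICE": "P", "CAND_ELECTION_YR": "2012"},
    {"CAND_ID": "H1", "CAND_NAME": "GAMMA", "CAND_OFFICE": "H", "CAND_ELECTION_YR": "2012"},
]
supports = [("C1", "P1"), ("C2", "P1"), ("C3", "P2"), ("C4", "H1")]

ranking = top_by_committees(candidates, supports)
print(ranking)
```

As with the Cypher MATCH, candidates with no supporting committee do not appear in the ranking at all.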

Saturday, September 29, 12

Page 135: 2012 09 GDG San Francisco Hackday at Parisoma

Cypher Challenges

// lookup Obama by his candidate ID
start obama=node:candidates(CAND_ID='P80003338')
return obama;

// find Presidential Candidate Mitt Romney
start romney=node:candidates('CAND_ID:*')
WHERE romney.CAND_NAME =~ '.*ROMNEY.*'
return romney.CAND_NAME, romney.CAND_ID;

// lookup Romney by his candidate ID
start romney=node:candidates(CAND_ID='P80003353')
return romney;

// find the shortest path of funding between Obama and Romney
start romney=node:candidates(CAND_ID='P80003353'),
      obama=node:candidates(CAND_ID='P80003338')
MATCH p=shortestPath(romney-[*..10]-obama)
return p;

Page 138: 2012 09 GDG San Francisco Hackday at Parisoma

Cypher Challenges

// 10 top individual contributions to Obama
start obama=node:candidates(CAND_ID='P80003338')
match obama<-[:SUPPORTS]-(campaign)<-[:INDIVIDUAL_CONTRIBUTION]-(contribution)
return contribution.NAME, contribution.TRANSACTION_AMT
order by contribution.TRANSACTION_AMT desc limit 10;

// 10 top individual contributions to Romney
start romney=node:candidates(CAND_ID='P80003353')
match romney<-[:SUPPORTS]-(campaign)<-[:INDIVIDUAL_CONTRIBUTION]-(contribution)
return contribution.NAME, contribution.TRANSACTION_AMT
order by contribution.TRANSACTION_AMT desc limit 10;
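
The two contribution queries differ only in the candidate ID, so they could be generated from one template. The following is a minimal sketch in JavaScript, not part of the original slides: the function name topContributionsQuery is hypothetical, and the ID check (nine upper-case alphanumeric characters) is an assumption based on the two IDs shown in the deck.

```javascript
// Hypothetical helper: builds the "top individual contributions" Cypher query
// from the slides for an arbitrary candidate ID.
function topContributionsQuery(candidateId, limit) {
  // Assumption: candidate IDs are 9 upper-case alphanumeric characters,
  // like P80003338. Rejecting anything else keeps quotes out of the query.
  if (!/^[A-Z0-9]{9}$/.test(candidateId)) {
    throw new Error("unexpected candidate ID: " + candidateId);
  }
  // Fall back to the slide's limit of 10 for missing or invalid values.
  var n = Math.floor(Number(limit)) || 10;
  if (n < 1) { n = 10; }
  return [
    "start cand=node:candidates(CAND_ID='" + candidateId + "')",
    "match cand<-[:SUPPORTS]-(campaign)<-[:INDIVIDUAL_CONTRIBUTION]-(contribution)",
    "return contribution.NAME, contribution.TRANSACTION_AMT",
    "order by contribution.TRANSACTION_AMT desc limit " + n
  ].join("\n");
}

// Prints the Obama query from the slide, rebuilt from the template.
console.log(topContributionsQuery("P80003338", 10));
```

Validating the ID before concatenating it is a deliberate choice here, since the query text is assembled by plain string concatenation.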

Page 139: 2012 09 GDG San Francisco Hackday at Parisoma

Customize the Data Importer

๏Java-savvy and feeling brave?

๏make a copy of

• CODE/fecGraph/src/importer/fec/RelatedFecImporter.java

๏add your class to

• CODE/fecGraph/src/importer/Tool.java

๏read docs about batch insertion

• http://docs.neo4j.org/chunked/milestone/batchinsert.html

๏Ideas:

• extract States and Zip Codes into "location index"

• extract individual contributors from contribution list

Page 141: 2012 09 GDG San Francisco Hackday at Parisoma

Let's have some Fun!

:)

#neo4j

Page 143: 2012 09 GDG San Francisco Hackday at Parisoma

Follow the Plan - Part 2

Neo4j, Heroku, Google, REST, Cypher, Ruby

#neo4j

Page 152: 2012 09 GDG San Francisco Hackday at Parisoma

Follow the Plan - Part 2

1. Register at Heroku and install the heroku gem

2. Create and install a Heroku app (heroku apps:create)

3. Add a Neo4j addon (http://addons.heroku.com/neo4j) instance to it (heroku addons:add neo4j)

4. Create a custom Ruby app (code below, GitHub) https://github.com/neo4j-examples/heroku-neo4j-proxy

5. Upload the data from example-data.neo4j.org

6. Connect to the app using a Google Spreadsheet, http://bit.ly/GDG-GCALC

7. Build a small bar chart from a Cypher query

Page 155: 2012 09 GDG San Francisco Hackday at Parisoma

Heroku Challenges

// Point the Database Instance to FEC http://bit.ly/SmkwUx/db/data

// Build a Google Data table endpoint

// https://developers.google.com/chart/interactive/docs/php_example

Page 157: 2012 09 GDG San Francisco Hackday at Parisoma

The Heroku Neo4j proxy App

> gem install heroku
> git remote add github git@github.com:neo4j-examples/heroku-neo4j-proxy.git
> heroku apps:create <app-name>
> heroku addons:add neo4j
> # add the project files
> git add *; git commit -m "neo4j demo"
> git push heroku master

Page 159: 2012 09 GDG San Francisco Hackday at Parisoma

The Google Spreadsheet Cypher driver

// POSTs a JSON payload to the given URL with HTTP Basic auth and returns the response body as text.
function cypherUrlREST(payload, url, user, pwd) {
  var auth = Utilities.base64Encode(user + ":" + pwd);
  var response = UrlFetchApp.fetch(url, {
    "method": "POST",
    "payload": payload,
    "contentType": "application/json",
    "headers": {
      "Authorization": "Basic " + auth,
      "accept": "application/json"
    }
  });
  return response.getContentText();
}

https://docs.google.com/spreadsheet/ccc?key=0AsSBFHSo5OaPdGhzT1RTbDVaR0R3NW5iNUFpejVuSHc#gid=0
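
The driver expects a ready-made payload and URL. Here is a minimal sketch of how those two arguments might be prepared, assuming the server exposes Neo4j's REST Cypher endpoint at <base>/db/data/cypher and accepts a JSON body with "query" and "params" fields. The helper names cypherEndpoint and buildCypherPayload are hypothetical, and the host name is a placeholder.

```javascript
// Hypothetical helpers for preparing the arguments of cypherUrlREST.
// Assumption: the Neo4j REST Cypher endpoint lives at <base>/db/data/cypher
// and takes a JSON body of the form {"query": "...", "params": {...}}.

function cypherEndpoint(baseUrl) {
  // Strip trailing slashes so the result never contains "//db/data/cypher".
  return baseUrl.replace(/\/+$/, "") + "/db/data/cypher";
}

function buildCypherPayload(query, params) {
  // An empty params object is sent when the query takes no parameters.
  return JSON.stringify({ query: query, params: params || {} });
}

var payload = buildCypherPayload(
  "start obama=node:candidates(CAND_ID='P80003338') return obama.CAND_NAME"
);
console.log(cypherEndpoint("http://example.invalid/"));
console.log(payload);

// Inside Apps Script the two values would then be handed to the driver, e.g.:
// var json = cypherUrlREST(payload, cypherEndpoint(baseUrl), user, pwd);
```

Keeping these helpers free of Apps Script services means they can be tried outside a spreadsheet before wiring them to the driver.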

Page 162: 2012 09 GDG San Francisco Hackday at Parisoma

Google Challenges

// Build a Cypher parser in Google Apps Script

// Build a Cypher query Google Widget

// Visualize Cypher Results with Google Data Table

// Geographic data viz
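
As a possible starting point for the Data Table challenge, here is a rough sketch that converts a Cypher REST result into the object-literal form accepted by the google.visualization.DataTable constructor. It assumes the result has the shape {"columns": [...], "data": [[...], ...]} with scalar cells and one consistent type per column. The function name cypherToDataTable and the sample rows are made up for illustration.

```javascript
// Hypothetical converter from a Cypher REST result to a Google Visualization
// DataTable object literal ({cols: [...], rows: [...]}).
// Assumption: every cell is a string, number, boolean or null.
function cypherToDataTable(result) {
  var cols = result.columns.map(function (name, i) {
    // Infer the column type from the first non-null value in that column.
    var type = "string";
    for (var r = 0; r < result.data.length; r++) {
      var v = result.data[r][i];
      if (v !== null && v !== undefined) {
        if (typeof v === "number") { type = "number"; }
        else if (typeof v === "boolean") { type = "boolean"; }
        break;
      }
    }
    return { id: name, label: name, type: type };
  });
  var rows = result.data.map(function (row) {
    return {
      c: row.map(function (v, i) {
        if (v === null || v === undefined) { return { v: null }; }
        // Cells in string columns are coerced to strings.
        return { v: cols[i].type === "string" ? String(v) : v };
      })
    };
  });
  return { cols: cols, rows: rows };
}

// Made-up sample data, only to show the shape of the output.
var sample = {
  columns: ["name", "amount"],
  data: [["DONOR A", 2500], ["DONOR B", 1000]]
};
console.log(JSON.stringify(cypherToDataTable(sample)));
```

The type inference is deliberately simple; a column whose values are stored as strings in the graph stays a string column and would need explicit conversion before charting.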

Page 163: 2012 09 GDG San Francisco Hackday at Parisoma

The heatmap from Cypher to Google

Page 164: 2012 09 GDG San Francisco Hackday at Parisoma

Wanna learn more?
