2012 09 gdg san francisco hackday at parisoma
DESCRIPTION
Presentation on Neo4j, Federal Campaign Data at http://www.gtugsf.com/events/52972282/
TRANSCRIPT
The Neo4j Election Data @GDG SF
Andreas Kollegger (@akollegger)
Peter Neubauer (@peterneubauer)
Michael Hunger (@mesirii)
#neo4j
Saturday, September 29, 2012
Follow the Data: FEC Campaign Data
Follow the Plan
1. Graph Database Primer
   1. Why graphs?
   2. What's a graph database?
2. FEC Campaign Data
   1. Data Model
   2. Import Strategy
   3. Queries
Follow the Plan - Part 2
1. Intro to Google Apps Script by Alex
2. Register at Heroku and install the heroku gem
3. Create and install a Heroku app (heroku apps:create)
4. Add a Neo4j addon instance to it (heroku addons:add neo4j)
5. Upload existing data to the graph
6. Create a custom Ruby proxy app on Heroku
7. Connect to the app using a Google Spreadsheet
8. Build a small bar chart from a Cypher query
Graph Database Primer
Why graphs, why now?
1. Big Data is the trend
2. NOSQL is the answer
3. Large in volume, and in
A Graph?
Yes, a graph
A graph database...
๏ no: not for charts & diagrams, or vector artwork
๏ yes: for storing data that is structured as a graph
  • remember linked lists, trees?
  • graphs are the general-purpose data structure
๏ "A relational database may tell you the average age of everyone in the USA, but a graph database will tell you who is most likely to buy you a beer."
A Graph Database
You know relational
foo   bar   foo_bar
now consider relationships...
We're talking about a Property Graph
Nodes
Relationships
Properties (each a key+value)
+ Indexes (for easy look-ups)
(Diagram: nodes Emil, Andrés, Lars, Johan, Allison, Peter, Michael, Tobias, Andreas, Ian, Mica, and Delia, connected by "knows" relationships.)
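As a rough illustration of the four ingredients of a property graph (nodes, relationships, properties, indexes), here is a minimal in-memory sketch in JavaScript. It is not how Neo4j stores data; the class `TinyGraph` and its methods `addNode`, `relate`, and `lookup` are hypothetical names invented for this example.

```javascript
// Minimal in-memory property graph (illustration only, not Neo4j's storage model).
// TinyGraph, addNode, relate, and lookup are hypothetical names.
class TinyGraph {
  constructor() {
    this.nodes = [];          // each node: { id, props }
    this.relationships = [];  // each relationship: { from, to, type, props }
    this.index = new Map();   // "key=value" -> array of node ids
  }

  // Create a node with key+value properties and index every property.
  addNode(props) {
    const node = { id: this.nodes.length, props };
    this.nodes.push(node);
    for (const [key, value] of Object.entries(props)) {
      const entry = key + "=" + value;
      if (!this.index.has(entry)) this.index.set(entry, []);
      this.index.get(entry).push(node.id);
    }
    return node;
  }

  // Create a typed, directed relationship, optionally with its own properties.
  relate(from, type, to, props = {}) {
    const rel = { from: from.id, to: to.id, type, props };
    this.relationships.push(rel);
    return rel;
  }

  // Index look-up: find nodes by an exact property value.
  lookup(key, value) {
    const ids = this.index.get(key + "=" + value) || [];
    return ids.map((id) => this.nodes[id]);
  }
}

const graph = new TinyGraph();
const andreas = graph.addNode({ name: "Andreas" });
const peter = graph.addNode({ name: "Peter" });
graph.relate(andreas, "knows", peter);

const found = graph.lookup("name", "Peter");
console.log(found[0].props.name);
```

The index is what makes "start from a known node" cheap: a query first finds its starting nodes by property value, then follows relationships from there.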
And, but, so how do you query this "graph" database?
Cypher - a graph query language
๏ a pattern-matching query language
๏ declarative grammar with clauses (like SQL)
๏ aggregation, ordering, limits
๏ create, read, update, delete

// get node 1, traverse 2 steps away
start a=node(1) match (a)--()--(c) return c

// create a node with a 'name' property
CREATE (me {name: 'Andreas'}) return me

๏ more on this later...
Cypher - pattern matching
Cypher - pattern matching syntax
() --> ()
(A) --> (B)
(A) -- (B)
A -[:LOVES]-> B
A --> B --> C
A --> B --> C, A --> C
A --> B --> C <-- A
Cypher - common clauses

// get node 1, traverse 2 steps away
START a=node(1)
MATCH (a)--()--(c)
RETURN c

// get node from an index, return it
START a=node:people(name='Andreas')
RETURN a

// get node from an index, match, filter
// with where, then return results
START a=node:people(name='Andreas')
MATCH (a)-[r]-(b)
WHERE b.last='Sparrow'
RETURN r,b
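To make the first query concrete, the sketch below does by hand roughly what `START a=node(1) MATCH (a)--()--(c) RETURN c` asks for: start at one node and collect the nodes two undirected steps away, one result per matched path. The edge list and the function names `otherEnd` and `twoStepsAway` are invented for illustration; this is a simplified model, not Neo4j's matcher.

```javascript
// Undirected two-step traversal, roughly what
//   START a=node(1) MATCH (a)--()--(c) RETURN c
// computes. The edge list below is made-up sample data.
const edges = [
  [1, 2],
  [2, 3],
  [2, 4],
  [1, 5],
  [5, 3],
];

// Return the node at the other end of an edge, or null if the edge does not touch `node`.
function otherEnd(edge, node) {
  if (edge[0] === node) return edge[1];
  if (edge[1] === node) return edge[0];
  return null;
}

// One result per matched path; a relationship is never used twice within a path.
function twoStepsAway(start) {
  const results = [];
  for (const first of edges) {
    const middle = otherEnd(first, start);
    if (middle === null) continue;
    for (const second of edges) {
      if (second === first) continue;
      const end = otherEnd(second, middle);
      if (end !== null) results.push(end);
    }
  }
  return results;
}

const reached = twoStepsAway(1);
console.log(reached);
```

Node 3 shows up twice in the result because two different paths lead to it (via node 2 and via node 5), which mirrors how a pattern match yields one row per path rather than one row per distinct node.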
FEC Campaign Data
yeah, this is the good stuff...
and now, it's time for
FEC Campaign Data
๏ In 1975, Congress created the Federal Election Commission (FEC) to administer and enforce the Federal Election Campaign Act (FECA), the statute that governs the financing of federal elections.
๏ The FEC is an independent regulatory agency whose duties include disclosing campaign finance information.
FEC Campaign Data
๏ Detailed files about...
  • Candidates
  • Committees
  • Individual Contributions
๏ 10 years of data
๏ Updated every Sunday
(Diagram: Committee, Candidate, Individual Contributions)
FEC Campaign Data - Committees
๏ Committees
  • one record for each committee registered with the Federal Election Commission.
Committee - cm12.txt
  CMTE_ID: String
  CMTE_NM: String
  TRES_NM: String
  CMTE_ST1: String
  CMTE_ST2: String
  CMTE_CITY: String
  CMTE_ST: String
  CMTE_ZIP: String
  CMTE_DSGN: String
  CMTE_TP: String
  CMTE_PTY_AFFILIATION: String
  CMTE_FILING_FREQ: String
  ORG_TP: String
  CONNECTED_ORG_NM: String
  CAND_ID: String
FEC Campaign Data - Candidates
๏ Candidates
  • one record for each candidate who has either registered with the FEC or appeared on a ballot list prepared by a state elections office.
Candidate - cn12.txt
  CAND_ID: String
  CAND_NAME: String
  CAND_PTY_AFFILIATION: String
  CAND_ELECTION_YR: String
  CAND_OFFICE_ST: String
  CAND_OFFICE: String
  CAND_OFFICE_DISTRICT: String
  CAND_ICI: String
  CAND_STATUS: String
  CAND_PCC: String
  CAND_ST1: String
  CAND_ST2: String
  CAND_CITY: String
  CAND_ST: String
  CAND_ZIP: String
FEC Campaign Data - Individual Contributions
๏ Individual Contributions
  • each contribution from an individual to a federal committee if the contribution was at least $200.
Individual Contrib - itcont.txt
  CMTE_ID: String
  AMNDT_IND: String
  RPT_TP: String
  TRANSACTION_PGI: String
  IMAGE_NUM: String
  TRANSACTION_TP: String
  ENTITY_TP: String
  NAME: String
  CITY: String
  STATE: String
  ZIP_CODE: String
  EMPLOYER: String
  OCCUPATION: String
  TRANSACTION_DT: String
  TRANSACTION_AMT: Double
  OTHER_ID: String
  TRAN_ID: String
  FILE_NUM: Integer
  MEMO_CD: String
  MEMO_TEXT: String
  SUB_ID: Integer
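The field lists above map naturally onto a small record parser. The sketch below assumes the files are pipe-delimited with no header row and that fields appear in the order listed for cn12.txt; both are assumptions to check against the FEC data definitions. The function name `parseRecord` and the sample line are invented, and the sample is not real FEC data.

```javascript
// Turn one line of a delimited FEC file into an object keyed by field name.
// Assumptions: fields are pipe-delimited and appear in the order listed
// in the Candidate (cn12.txt) layout.
const CANDIDATE_FIELDS = [
  "CAND_ID", "CAND_NAME", "CAND_PTY_AFFILIATION", "CAND_ELECTION_YR",
  "CAND_OFFICE_ST", "CAND_OFFICE", "CAND_OFFICE_DISTRICT", "CAND_ICI",
  "CAND_STATUS", "CAND_PCC", "CAND_ST1", "CAND_ST2", "CAND_CITY",
  "CAND_ST", "CAND_ZIP",
];

// parseRecord is a hypothetical helper; missing trailing fields become "".
function parseRecord(line, fieldNames, delimiter = "|") {
  const values = line.split(delimiter);
  const record = {};
  fieldNames.forEach((name, i) => {
    record[name] = values[i] !== undefined ? values[i] : "";
  });
  return record;
}

// Made-up sample line, not real FEC data.
const sampleLine =
  "P00000001|DOE, JANE|IND|2012|US|P|00|C|N|C00000001|1 MAIN ST||SPRINGFIELD|IL|62701";
const candidate = parseRecord(sampleLine, CANDIDATE_FIELDS);
console.log(candidate.CAND_NAME, candidate.CAND_OFFICE);
```

The same helper would work for the committee and contribution layouts given their field lists; numeric fields such as TRANSACTION_AMT would still need an explicit conversion (for example with `Number(...)`).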
FEC Campaign Data - Extra Records
๏ Candidate to Committee Linkage
  • registered candidate to committee linkage
๏ Transactions between Committees
  • inter-committee contribution or independent expenditure during the two-year election cycle
๏ Contribution to Candidate
  • contribution or independent expenditure from committee to candidate during the two-year election cycle
Import Strategy
Raw Data Import
(Diagram: six record types imported as-is: Committee, Candidate, Candidate to Committee, Inter Committee Contributions, Candidate Contributions, Individual Contributions. Key fields shown: CAND_ID, CMTE_ID.)
Connected Data Import
(Diagram: the same six record types, now connected through their key fields: CAND_ID, CMTE_ID, CMTE_ID (to), OTHER_ID:CAND_ID (from), OTHER_ID:CMTE_ID (from).)
Related Data Import
(Diagram: Committee, Candidate, and Individual Contributions linked by typed relationships: CAMPAIGNS_FOR, SUPPORTS, INTER_COMMITTEE_CONTRIBUTION, CANDIDATE_CONTRIBUTION, INDIVIDUAL_CONTRIBUTION, EARMARKED_BY.)
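The step from key fields to typed relationships can be sketched as follows: records that share a key become the two ends of a relationship. The relationship names SUPPORTS and INDIVIDUAL_CONTRIBUTION come from the slides, and their directions follow the Cypher queries in this deck (committee SUPPORTS candidate, contribution INDIVIDUAL_CONTRIBUTION committee). The sample records and the function `buildRelationships` are invented; this is not the FEC_GRAPH importer's actual code, which is written in Java.

```javascript
// Made-up sample records using the FEC field names.
const committees = [
  { CMTE_ID: "C00000001", CMTE_NM: "FRIENDS OF JANE DOE", CAND_ID: "P00000001" },
];
const candidates = [{ CAND_ID: "P00000001", CAND_NAME: "DOE, JANE" }];
const contributions = [
  { CMTE_ID: "C00000001", NAME: "SMITH, JOHN", TRANSACTION_AMT: 250.0 },
];

// buildRelationships is a hypothetical helper that joins records on shared keys.
function buildRelationships(committeeList, candidateList, contributionList) {
  const candidateById = new Map(candidateList.map((c) => [c.CAND_ID, c]));
  const committeeById = new Map(committeeList.map((c) => [c.CMTE_ID, c]));
  const relationships = [];

  // Committee -[:SUPPORTS]-> Candidate, joined on CAND_ID.
  for (const committee of committeeList) {
    const candidate = candidateById.get(committee.CAND_ID);
    if (candidate) {
      relationships.push({ from: committee, type: "SUPPORTS", to: candidate });
    }
  }

  // Contribution -[:INDIVIDUAL_CONTRIBUTION]-> Committee, joined on CMTE_ID.
  for (const contribution of contributionList) {
    const committee = committeeById.get(contribution.CMTE_ID);
    if (committee) {
      relationships.push({ from: contribution, type: "INDIVIDUAL_CONTRIBUTION", to: committee });
    }
  }
  return relationships;
}

const related = buildRelationships(committees, candidates, contributions);
console.log(related.map((r) => r.type));
```

Records whose key has no match are simply left unconnected here; a real importer would have to decide whether to skip, log, or create placeholder nodes for them.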
Dave Fauth's Approach
Advanced Import - Dave Fauth
๏ includes SuperPAC data
๏ custom transform, then import
๏ model then looks like this...
(Diagram labels: Committee, Candidate, Contribution, Individual, Expenditures, superPac Contributions; relationships SUPPORTS, FUNDS, GIVES.)
Advanced Import - Dave Fauth
๏ Extract and Transform
  • Stored files on S3
  • Used MortarData to run Hadoop jobs to prepare data (@MortarData)
๏ Load
  • Used Neo4j BatchInserter to load
  • Thanks to Michael Hunger (@mesirii)
  • Loaded 2M+ nodes in <5 minutes
Advanced Import - Dave Fauth
1. Download data
2. Use S3 Storage
3. Process with Hadoop/Pig
4. Java BatchInsert
5. Created Neo4j DB
Wanna learn more?
๏ Come hear Dave Fauth present at...
Next... Your Turn
From scratch
๏ git clone https://github.com/akollegger/FEC_GRAPH.git
๏ cd FEC_GRAPH
๏ ant initialize
  • (need Apache ant? install from http://ant.apache.org)
๏ ant
  • ant will build the importers and create a script
๏ ./bin/fec2graph --force --importer=RELATED
๏ ant neo4j-start
  • will download and unpack neo4j, then start it
Investigate with Neo4j's Web UI
๏ open http://localhost:7474
๏ Dashboard - overview of data records
๏ Data browser - examine data records, with visualization options
๏ Console - query the database using Cypher
Querying FEC with Cypher
๏ For Cypher documentation
  • http://docs.neo4j.org/
๏ FEC Data Definitions
  • http://www.fec.gov/finance/disclosure/ftpdet.shtml
๏ Ready for a challenge?
Cypher Challenges
// All presidential candidates for 2012
// Top 10 Presidential candidates according to number of campaign committees
// find President Barack Obama
// lookup Obama by his candidate ID
// find Presidential Candidate Mitt Romney
// lookup Romney by his candidate ID
// find the shortest path of funding between Obama and Romney
// 10 top individual contributions to Obama
// 10 top individual contributions to Romney
http://1.usa.gov/uIGzZ
Cypher Challenges
// All presidential candidates for 2012
start candidate=node:candidates('CAND_ID:*')
where candidate.CAND_OFFICE='P' AND candidate.CAND_ELECTION_YR='2012'
return candidate.CAND_NAME;

// Top 10 Presidential candidates according to
// number of campaign committees
start candidate=node:candidates('CAND_ID:*')
match candidate<-[r:SUPPORTS]-(campaign)
where candidate.CAND_OFFICE='P' AND candidate.CAND_ELECTION_YR='2012'
return candidate.CAND_NAME, COUNT(campaign) as count
ORDER BY count desc LIMIT 10;

// find President Barack Obama
start obama=node:candidates('CAND_ID:*')
WHERE obama.CAND_NAME =~ '.*OBAMA.*'
return obama.CAND_NAME, obama.CAND_ID;
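The "Top 10 by number of campaign committees" query combines grouping, counting, ordering, and a limit. A plain JavaScript sketch of the same steps is shown below; the rows stand in for matched `candidate<-[:SUPPORTS]-(campaign)` pairs and are made up, and `committeeCounts` is a hypothetical name.

```javascript
// Each row stands for one matched (campaign)-[:SUPPORTS]->(candidate) pair.
// Made-up sample data.
const supportRows = [
  { CMTE_ID: "C1", CAND_NAME: "CANDIDATE X" },
  { CMTE_ID: "C2", CAND_NAME: "CANDIDATE Y" },
  { CMTE_ID: "C3", CAND_NAME: "CANDIDATE X" },
  { CMTE_ID: "C4", CAND_NAME: "CANDIDATE Z" },
  { CMTE_ID: "C5", CAND_NAME: "CANDIDATE X" },
  { CMTE_ID: "C6", CAND_NAME: "CANDIDATE Z" },
];

// Group by candidate name, count, order by count descending, keep `limit` rows.
function committeeCounts(rows, limit) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row.CAND_NAME, (counts.get(row.CAND_NAME) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

const topCandidates = committeeCounts(supportRows, 2);
console.log(topCandidates);
```

In Cypher the grouping is implicit: every non-aggregated column in the return clause (here `candidate.CAND_NAME`) acts as the grouping key for `COUNT(campaign)`.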
// lookup Obama by his candidate ID
start obama=node:candidates(CAND_ID='P80003338')
return obama;

// find Presidential Candidate Mitt Romney
start romney=node:candidates('CAND_ID:*')
WHERE romney.CAND_NAME =~ '.*ROMNEY.*'
return romney.CAND_NAME, romney.CAND_ID;

// lookup Romney by his candidate ID
start romney=node:candidates(CAND_ID='P80003353')
return romney;

// find the shortest path of funding between Obama and Romney
start romney=node:candidates(CAND_ID='P80003353'),
      obama=node:candidates(CAND_ID='P80003338')
MATCH p=shortestPath(romney-[*..10]-obama)
return p;
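The `shortestPath(romney-[*..10]-obama)` call asks for a shortest undirected path of at most 10 hops. The idea can be sketched with a breadth-first search and a hop limit; the node names, the `links` list, and the function `shortestPath` below are illustrative stand-ins, not Neo4j's implementation.

```javascript
// Breadth-first search for a shortest undirected path with a hop limit,
// similar in spirit to shortestPath(romney-[*..10]-obama).
// Node names and links are made-up sample data.
const links = [
  ["romney", "committeeA"],
  ["committeeA", "committeeB"],
  ["committeeB", "obama"],
  ["committeeA", "committeeC"],
];

function shortestPath(from, to, maxHops) {
  // Build an undirected adjacency list.
  const neighbors = new Map();
  for (const [a, b] of links) {
    if (!neighbors.has(a)) neighbors.set(a, []);
    if (!neighbors.has(b)) neighbors.set(b, []);
    neighbors.get(a).push(b);
    neighbors.get(b).push(a);
  }

  const queue = [[from]];          // queue of partial paths
  const visited = new Set([from]);
  while (queue.length > 0) {
    const path = queue.shift();
    const last = path[path.length - 1];
    if (last === to) return path;
    if (path.length - 1 >= maxHops) continue; // hop limit reached
    for (const next of neighbors.get(last) || []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null; // no path within the hop limit
}

const fundingPath = shortestPath("romney", "obama", 10);
console.log(fundingPath);
```

Because breadth-first search explores paths in order of length, the first path that reaches the target is a shortest one; lowering `maxHops` below the true distance makes the function return `null`.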
// 10 top individual contributions to Obama
start obama=node:candidates(CAND_ID='P80003338')
match obama<-[:SUPPORTS]-(campaign)<-[:INDIVIDUAL_CONTRIBUTION]-(contribution)
return contribution.NAME, contribution.TRANSACTION_AMT
order by contribution.TRANSACTION_AMT desc limit 10;

// 10 top individual contributions to Romney
start romney=node:candidates(CAND_ID='P80003353')
match romney<-[:SUPPORTS]-(campaign)<-[:INDIVIDUAL_CONTRIBUTION]-(contribution)
return contribution.NAME, contribution.TRANSACTION_AMT
order by contribution.TRANSACTION_AMT desc limit 10;
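The two "top contributions" queries walk from a candidate to its supporting committees, then to the contributions those committees received, and finally order and limit the rows. The same walk over plain arrays might look like this; all records are made up and `topContributions` is a hypothetical name.

```javascript
// Made-up sample data using the FEC field names.
const supports = [
  { CMTE_ID: "C1", CAND_ID: "P1" },
  { CMTE_ID: "C2", CAND_ID: "P1" },
  { CMTE_ID: "C3", CAND_ID: "P2" },
];
const individualContributions = [
  { CMTE_ID: "C1", NAME: "DONOR A", TRANSACTION_AMT: 500 },
  { CMTE_ID: "C2", NAME: "DONOR B", TRANSACTION_AMT: 2500 },
  { CMTE_ID: "C3", NAME: "DONOR C", TRANSACTION_AMT: 9000 },
  { CMTE_ID: "C1", NAME: "DONOR D", TRANSACTION_AMT: 1000 },
];

function topContributions(candidateId, limit) {
  // Step 1: committees that SUPPORT the candidate.
  const committeeIds = new Set(
    supports.filter((s) => s.CAND_ID === candidateId).map((s) => s.CMTE_ID)
  );
  // Step 2: contributions to those committees, largest first, limited.
  return individualContributions
    .filter((c) => committeeIds.has(c.CMTE_ID))
    .sort((a, b) => b.TRANSACTION_AMT - a.TRANSACTION_AMT)
    .slice(0, limit);
}

const top = topContributions("P1", 2);
console.log(top.map((c) => c.NAME + " " + c.TRANSACTION_AMT));
```

The largest contribution in the sample (DONOR C) is excluded because its committee supports a different candidate, which is the filtering the two-hop pattern in the query performs.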
Customize the Data Importer
๏ Java-savvy and feeling brave?
๏ make a copy of
  • CODE/fecGraph/src/importer/fec/RelatedFecImporter.java
๏ add your class to
  • CODE/fecGraph/src/importer/Tool.java
๏ read docs about batch insertion
  • http://docs.neo4j.org/chunked/milestone/batchinsert.html
๏ Ideas:
  • extract States and Zip Codes into "location index"
  • extract individual contributors from contribution list
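For the "location index" idea, the core of the work is grouping contribution records by state and zip code. The real importer is Java; the JavaScript sketch below only shows the grouping step, using the STATE and ZIP_CODE fields from the itcont.txt layout. The sample rows and the function `buildLocationIndex` are invented for illustration.

```javascript
// Group contribution records by state and zip code (made-up sample rows).
const contributionRows = [
  { NAME: "DONOR A", STATE: "CA", ZIP_CODE: "94103" },
  { NAME: "DONOR B", STATE: "CA", ZIP_CODE: "94103" },
  { NAME: "DONOR C", STATE: "NY", ZIP_CODE: "10001" },
];

// Returns Map(STATE -> Map(ZIP_CODE -> array of records)).
function buildLocationIndex(rows) {
  const index = new Map();
  for (const row of rows) {
    if (!index.has(row.STATE)) index.set(row.STATE, new Map());
    const zips = index.get(row.STATE);
    if (!zips.has(row.ZIP_CODE)) zips.set(row.ZIP_CODE, []);
    zips.get(row.ZIP_CODE).push(row);
  }
  return index;
}

const locationIndex = buildLocationIndex(contributionRows);
console.log(locationIndex.get("CA").get("94103").length);
```

In a graph model each state and zip code could then become its own node, with contributions related to the zip code they came from.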
Let's have some Fun!
:)
#neo4j
Neo4j
Heroku
REST Cypher
Ruby
Follow the Plan - Part 2
1. Register at Heroku and install the heroku gem
2. Create and install a Heroku app (heroku apps:create)
3. Add a Neo4j addon (http://addons.heroku.com/neo4j) instance to it (heroku addons:add neo4j)
4. Create a custom Ruby app (code below, GitHub) https://github.com/neo4j-examples/heroku-neo4j-proxy
5. Upload the data from example-data.neo4j.org
6. Connect to the app using a Google Spreadsheet, http://bit.ly/GDG-GCALC
7. Build a small bar chart from a Cypher query
Heroku Challenges
// Point the Database Instance to FEC http://bit.ly/SmkwUx/db/data
// Build a Google Data table endpoint
// https://developers.google.com/chart/interactive/docs/php_example
http://1.usa.gov/uIGzZ
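For the "Google Data table endpoint" challenge, the main task is reshaping a Cypher result into the object-literal form that the Google Visualization `DataTable` constructor accepts (`cols` with `id`, `label`, and `type`; `rows` with cells `c` holding values `v`). The sketch below assumes the result has the `{columns, data}` shape returned by Neo4j's REST Cypher endpoint. The function `toDataTable` and the sample response are invented, and the type detection is deliberately naive.

```javascript
// Convert a Cypher REST result ({columns, data}) into the object-literal
// shape accepted by the google.visualization.DataTable constructor.
// Column types are guessed from the first row only (naive on purpose).
function toDataTable(cypherResult) {
  const firstRow = cypherResult.data[0] || [];
  const cols = cypherResult.columns.map((name, i) => ({
    id: name,
    label: name,
    type: typeof firstRow[i] === "number" ? "number" : "string",
  }));
  const rows = cypherResult.data.map((row) => ({
    c: row.map((value) => ({ v: value })),
  }));
  return { cols, rows };
}

// Made-up sample response, not real query output.
const cypherResponse = {
  columns: ["candidate.CAND_NAME", "count"],
  data: [
    ["CANDIDATE X", 3],
    ["CANDIDATE Z", 2],
  ],
};

const table = toDataTable(cypherResponse);
console.log(JSON.stringify(table));
```

In the deck's setup this conversion would live in the Ruby proxy app; it is shown in JavaScript here only to illustrate the shape of the data on both sides.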
The Heroku Neo4j proxy App
> gem install heroku
> git remote add github git@github.com:neo4j-examples/heroku-neo4j-proxy.git
> heroku apps:create <app-name>
> heroku addons:add neo4j
> # add the project files
> git add *; git commit -m "neo4j demo"
> git push heroku master
The Google Spreadsheet Cypher driver
https://docs.google.com/spreadsheet/ccc?key=0AsSBFHSo5OaPdGhzT1RTbDVaR0R3NW5iNUFpejVuSHc#gid=0

// POST a JSON payload to a Cypher REST URL using HTTP Basic auth
// and return the raw response body as text.
function cypherUrlREST(payload, url, user, pwd) {
  var auth = Utilities.base64Encode(user + ":" + pwd);
  var response = UrlFetchApp.fetch(url, {
    "method": "POST",
    "payload": payload,
    "contentType": "application/json",
    "headers": {
      "Authorization": "Basic " + auth,
      "accept": "application/json"
    }
  });
  return response.getContentText();
}
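A function like `cypherUrlREST` needs a JSON payload to send. The sketch below builds one, assuming the target is Neo4j's REST Cypher endpoint, which takes a JSON object with a `query` string and a `params` map. The helper name `buildCypherPayload` and the `{office}` parameter are invented for this example; the query itself reuses the index and property names from the challenges.

```javascript
// Build the JSON body for a Cypher REST request.
// Assumption: the endpoint accepts {"query": "...", "params": {...}}.
// buildCypherPayload is a hypothetical helper name.
function buildCypherPayload(query, params) {
  return JSON.stringify({ query: query, params: params || {} });
}

const payload = buildCypherPayload(
  "start c=node:candidates('CAND_ID:*') " +
    "where c.CAND_OFFICE={office} " +
    "return c.CAND_NAME",
  { office: "P" }
);

console.log(payload);
```

Passing values through `params` instead of concatenating them into the query string avoids quoting problems when a spreadsheet cell contains apostrophes or other special characters.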
Google Challenges
// Build a Cypher parser in Google Apps Script
// Build a Cypher query Google Widget
// Visualize Cypher Results with Google Data Table
// Geographic data viz
http://1.usa.gov/uIGzZ
The heatmap from Cypher to Google
Wanna learn more?