DataEngConf SF16 - Bridging the Gap Between Data Science and Data Engineering
Data Engineering and Data Science: Bridging the Gap
About Me
● Slack’s Head of Data Engineering
● Used to work at Cloudera, Google, other places
● Wrote popular tweet that I’m sort of tired of talking about
● Only owns one hat
First Question: Why Bother?
In The Beginning...
What Do We Both Want?
Infinite Loop of Sadness
Data Eng
Ops
Data Science
Business
Alone Together
Back To First Principles
Deploying Kafka
Infinite Loop of Empathy
Data Eng
Ops
Data Science
Business
Rage In Support Of The Machine
Everybody ETLs
Option 1: SQL-Centric ETL
Option 2: JVM-Centric ETL
A Third Way
#1: The Rise of Spark
#2: Too Many Streaming Engines
#3: Streaming Design Patterns
#4: A Focus On The Real Problem
Inspiration From Deep Learning
Not Quite Static Typing
Glitch
Table Declarations
Scripting >> UDFs...
...but SQL when you need it
A Brave New World
Thanks! (oh and we’re hiring)