Data Science, what even?!

Upload: david-coallier

Posted on 27-Jan-2015

Category: Technology

DESCRIPTION

Presented an abridged version of my "What is data science" talk at #websummit 2013. The talk covers the skill set a data scientist needs, as defined by Drew Conway's well-known Venn diagram, and outlines the Data Scientific Method described by Dr. Patil. It has two parts; the second goes over some of the packages and technologies we use, minus the storage part.

TRANSCRIPT

Page 1: Data Science, what even?!

Data Science?! what even...

Page 2: Data Science, what even?!

David Coallier

@davidcoallier

Page 3: Data Science, what even?!

Data Scientist

Engine Yard

Page 4: Data Science, what even?!
Page 5: Data Science, what even?!

And I cook... A lot.

Page 6: Data Science, what even?!

(n-1) items

Page 7: Data Science, what even?!

Adapting.

Page 8: Data Science, what even?!

Feedback.

Page 9: Data Science, what even?!

Indifference.

Page 10: Data Science, what even?!

Young mathematically inclined minds

Page 11: Data Science, what even?!

Young mathematically inclined minds

We knew everything.

Page 12: Data Science, what even?!

First Bad Assumption.

Page 13: Data Science, what even?!

So we asked “experts”.

Page 14: Data Science, what even?!

Wrong Ingredients

Page 15: Data Science, what even?!

Bad Data

Page 16: Data Science, what even?!

Tasted like sh*t

Page 17: Data Science, what even?!
Page 18: Data Science, what even?!
Page 19: Data Science, what even?!
Page 20: Data Science, what even?!

From Our Results

We had questions.

Page 21: Data Science, what even?!

Found Expertise

Not Online.

Page 22: Data Science, what even?!

Data Scientific Method

Page 23: Data Science, what even?!

Find a Question

Your Hypothesis

Page 24: Data Science, what even?!

Current Data

What do you have?

Page 25: Data Science, what even?!

Features & Tests

Try it.

Page 26: Data Science, what even?!

Analyse Results

Won’t be pretty.

Page 27: Data Science, what even?!

Conversation

Framed. By. Data.

Page 28: Data Science, what even?!

But....

Page 29: Data Science, what even?!

Good Discussions

Imply good data scientists

Page 30: Data Science, what even?!

Hacking Skills

Page 31: Data Science, what even?!

Hacking Skills

Maths & Stats

Page 32: Data Science, what even?!

Hacking Skills

Maths & Stats

Expertise

Page 33: Data Science, what even?!

Hacking Skills

Maths & Stats

Expertise

Machine Learning

Research

Danger Zone!!!

Page 34: Data Science, what even?!

Hacking Skills

Maths & Stats

Expertise

Data Science

Page 35: Data Science, what even?!

Hacking Skills

Maths & Stats

Expertise

Machine Learning

Research

Danger Zone!!!

Data Science

Page 36: Data Science, what even?!

Business

Don’t need an MBA

Page 37: Data Science, what even?!

In other words.

Page 38: Data Science, what even?!

1. Hacking
2. Maths & Stats
3. Expertise

Page 39: Data Science, what even?!

Apply Method

Data Scientific

Page 40: Data Science, what even?!

1. Question
2. Current Data
3. Features/Tests
4. Analyse
5. Converse

Page 41: Data Science, what even?!

Find a Question

Let’s imagine GitHub

Page 42: Data Science, what even?!

Upgrade Repos

Affect users as little as possible

Page 43: Data Science, what even?!

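import csv

# The csv module has no read(); open the file and iterate a csv.reader.
with open('repo1.csv', newline='') as f:
    content = list(csv.reader(f))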

Page 44: Data Science, what even?!

$$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k \ge 0$$
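
One way to make the GitHub example concrete, as a minimal sketch: treat the number of commits a repository receives in a given hour as Poisson-distributed, estimate λ as the average count for that hour, and use P(0 commits) = e^(-λ) as the chance that an upgrade in that hour interrupts nobody. The hourly counts below are made-up numbers for illustration, not data from the talk, and `poisson_pmf` and `quietest_hour` are hypothetical helper names.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam: lam^k * e^(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical commit counts per hour of day, observed over four days.
commits_by_hour = {
    3: [0, 1, 0, 0],
    10: [4, 6, 5, 5],
    15: [7, 9, 8, 8],
}

def quietest_hour(counts):
    """Return (hour, rate, P(no commits)) for the hour with the lowest mean rate."""
    rates = {hour: sum(c) / len(c) for hour, c in counts.items()}
    hour = min(rates, key=rates.get)
    return hour, rates[hour], poisson_pmf(0, rates[hour])

hour, rate, p_idle = quietest_hour(commits_by_hour)
print(f"hour={hour} rate={rate:.2f} P(0 commits)={p_idle:.3f}")
```

Whether commits really follow a Poisson process is itself a hypothesis to test, which is the point of the method.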

Page 45: Data Science, what even?!
Page 46: Data Science, what even?!

Converse

Present Findings

Page 47: Data Science, what even?!

Iterate

Commits aren’t key.

Page 48: Data Science, what even?!

KPIs are key

Indicators from experience

Page 49: Data Science, what even?!

Questions

Super Important.

Page 50: Data Science, what even?!

Just test it..

Page 51: Data Science, what even?!
Page 52: Data Science, what even?!
Page 53: Data Science, what even?!
Page 54: Data Science, what even?!

We are Human.

Emotional Connection

Page 55: Data Science, what even?!

What next?

Second Hypothesis.

Page 56: Data Science, what even?!
Page 57: Data Science, what even?!

Focus on Data

Relevant to your KPIs.

Page 58: Data Science, what even?!
Page 59: Data Science, what even?!

Data gives you the what

Humans give you the why

Page 60: Data Science, what even?!

Turn Information

Page 61: Data Science, what even?!

Into

Actionable Insight

Page 62: Data Science, what even?!

Create Discussions

Introspection Engines

Page 63: Data Science, what even?!

Seeing, Feeling it

The brain sees.

Page 64: Data Science, what even?!

Not regressions

Page 65: Data Science, what even?!

Not p-values

Page 66: Data Science, what even?!

Not slopes

Page 67: Data Science, what even?!

Not F-statistics

Page 68: Data Science, what even?!

Not coefficients

Page 69: Data Science, what even?!
Page 70: Data Science, what even?!
Page 71: Data Science, what even?!

Question Data

Not Visualisations.

Page 72: Data Science, what even?!

Toolbox

What do we use?

Page 73: Data Science, what even?!

R

Modeling, Testing, Prototyping

Page 74: Data Science, what even?!

RStudio

The IDE

Page 75: Data Science, what even?!

lubridate and zoo

Dealing with Dates...

Page 76: Data Science, what even?!

yy/mm/dd mm/dd/yy
YYYY-mm-dd HH:MM:ss TZ
yy-mm-dd 1363784094.513425
yy/mm different timezone
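
The formats on this slide are the kind of mess lubridate and zoo handle in R. As a rough Python illustration of the same chore (the talk itself uses R here), the sketch below tries a list of known formats in order and falls back to Unix timestamps. The `parse_any` helper and the format list are assumptions made for this example. Ambiguous inputs such as 01/02/03 resolve to whichever format is listed first, which is exactly why mixed date formats hurt.

```python
from datetime import datetime, timezone

# Candidate formats, tried in order. Order matters: "01/02/03" matches both
# yy/mm/dd and mm/dd/yy, and the first match wins.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%y/%m/%d",
    "%m/%d/%y",
    "%y-%m-%d",
]

def parse_any(text):
    """Parse a date string in one of FORMATS, or a Unix timestamp in seconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    try:
        # e.g. "1363784094.513425" (seconds since the epoch, read as UTC)
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    except ValueError:
        raise ValueError(f"unrecognised date: {text!r}")

for raw in ["13/03/20", "03/20/13", "2013-03-20 12:54:54", "1363784094.513425"]:
    print(raw, "->", parse_any(raw).isoformat())
```

Note that only the timestamp branch returns a timezone-aware value; the string formats carry no zone information, so the caller still has to decide which zone they were recorded in.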

Page 77: Data Science, what even?!

reshape2

Reshape your Data
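
For readers who have not met reshaping before, here is a minimal Python sketch of the wide-to-long operation that reshape2 calls melt. It is an illustration of the idea only, not reshape2's implementation: the `melt` function below is a hypothetical stand-in, the repository data is made up, and rows come out grouped by input row rather than by variable.

```python
def melt(rows, id_vars):
    """Turn wide records into long (id..., variable, value) records."""
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_vars}
        for key, value in row.items():
            if key not in id_vars:
                long_rows.append({**ids, "variable": key, "value": value})
    return long_rows

# Made-up wide data: one row per repository, one column per measure.
wide = [
    {"repo": "a", "commits": 10, "issues": 2},
    {"repo": "b", "commits": 3, "issues": 7},
]

long_rows = melt(wide, id_vars=["repo"])
for record in long_rows:
    print(record)
```

The long form has one observation per row, which is the shape most plotting and modelling functions expect.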

Page 78: Data Science, what even?!

ggplot2

Visualise your Data

Page 79: Data Science, what even?!

RCurl, RJSONIO

Find more Data

Page 80: Data Science, what even?!

Hmisc

Miscellaneous useful functions

Page 81: Data Science, what even?!

forecast

Can you guess?

Page 82: Data Science, what even?!

garch

Generalized Autoregressive Conditional Heteroskedasticity

Page 83: Data Science, what even?!

quantmod

Statistical Financial Trading

Page 84: Data Science, what even?!
Page 85: Data Science, what even?!

library(quantmod)
getSymbols('AAPL')
barChart(AAPL)
addMACD()
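
The last call on this slide overlays the MACD indicator on the chart. For readers unfamiliar with it, the sketch below shows the usual definition in plain Python: the MACD line is a fast exponential moving average minus a slow one, and the signal line is an exponential moving average of the MACD line. The 12/26/9 spans are the conventional defaults, the price series is made up, and seeding each average with the first value is an assumption (libraries differ on seeding, so the numbers will not match quantmod exactly).

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1),
    seeded with the first value (seeding conventions vary by library)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as lists."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram

# Made-up closing prices: flat, then a steady climb.
prices = [100.0] * 10 + [100.0 + i for i in range(1, 21)]
macd_line, signal_line, histogram = macd(prices)
print(round(macd_line[-1], 3), round(histogram[-1], 3))
```

While prices are flat the MACD line stays at zero; once they climb, the fast average pulls ahead of the slow one and the line turns positive.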

Page 86: Data Science, what even?!

xts

Extensible Time Series

Page 87: Data Science, what even?!

igraph

Study Networks

Page 88: Data Science, what even?!

maptools

Read & View Maps

Page 89: Data Science, what even?!

map('state', region = c(row.names(USArrests)), col=cm.colors(16, 1)[floor(USArrests$Rape/max(USArrests$Rape)*28)], fill=T)

Page 90: Data Science, what even?!

Python

Scientific Computing

Page 91: Data Science, what even?!

SciPy

http://www.scipy.org

Page 92: Data Science, what even?!

scipy.stats

Page 93: Data Science, what even?!

scipy.stats

Descriptive Statistics

Page 94: Data Science, what even?!

from scipy.stats import describe

s = [1,2,1,3,4,5]

print(describe(s))
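
The result returned by `describe` includes the number of observations, the minimum and maximum, the mean and the sample variance, along with skewness and kurtosis. As a cross-check that needs only the standard library, the sketch below computes the first four for the same sample; the `summary` dictionary is just an illustrative container, and skewness and kurtosis are left out.

```python
import statistics

s = [1, 2, 1, 3, 4, 5]

summary = {
    "nobs": len(s),
    "minmax": (min(s), max(s)),
    "mean": statistics.mean(s),
    # statistics.variance is the sample variance (divides by n - 1)
    "variance": statistics.variance(s),
}
for name, value in summary.items():
    print(name, value)
```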

Page 95: Data Science, what even?!

scipy.stats

Probability Distributions

Page 96: Data Science, what even?!

Example

Poisson Distribution

Page 97: Data Science, what even?!

$$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{for } k \ge 0$$

Page 98: Data Science, what even?!

from scipy.stats import poisson

# Probability of each listed count k under a Poisson distribution with rate 2
p = poisson.pmf([1, 2, 3, 4, 1, 2, 3], 2)

Page 99: Data Science, what even?!

print(p.mean())
print(p.sum())
...
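
A point worth spelling out: `p` holds the probabilities evaluated at the listed counts, so its mean and sum summarise those seven numbers and are not the mean of the distribution. The sketch below, using only the standard library, contrasts the two. `poisson_pmf` is a hypothetical helper written for this example, and truncating the support at k = 100 is an assumption that is safe for a rate of 2.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 2
ks = [1, 2, 3, 4, 1, 2, 3]
p = [poisson_pmf(k, lam) for k in ks]

# Summaries of the seven evaluated probabilities.
print("sum of listed pmf values:", sum(p))
print("mean of listed pmf values:", sum(p) / len(p))

# Expected value of the distribution: sum of k * P(X = k) over the support,
# truncated at k = 100, where the remaining tail is negligible for lam = 2.
support = range(101)
total = sum(poisson_pmf(k, lam) for k in support)
expected = sum(k * poisson_pmf(k, lam) for k in support)
print("total probability:", total)
print("expected value:", expected)
```

Because some counts are listed twice, the first sum exceeds 1, while the probabilities over the whole support add up to 1 and the expected value comes out at the rate.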

Page 100: Data Science, what even?!

NumPy

http://www.numpy.org/

Page 101: Data Science, what even?!

NumPy

Linear Algebra

Page 102: Data Science, what even?!

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Page 103: Data Science, what even?!

import numpy as np
x = np.array([[1, 0], [0, 1]])
# eig returns the eigenvalues first, then the eigenvectors (as columns)
val, vec = np.linalg.eig(x)
np.linalg.eigvals(x)

Page 104: Data Science, what even?!

>>> np.linalg.eig(x)
(array([ 1.,  1.]),
 array([[ 1.,  0.],
        [ 0.,  1.]]))
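
The identity matrix is a degenerate case, since every vector is an eigenvector. A slightly less trivial sketch follows, using a symmetric matrix chosen purely for illustration (it is not from the talk). It uses `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, then checks the defining property A v = λ v.

```python
import numpy as np

# A symmetric 2x2 matrix chosen for illustration (not from the talk).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric/Hermitian matrices and returns the
# eigenvalues in ascending order; the eigenvectors are the columns of vecs.
vals, vecs = np.linalg.eigh(A)
print(vals)

# Each column v of vecs should satisfy A @ v == val * v.
for val, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, val * v)
```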

Page 105: Data Science, what even?!

Matplotlib

Python Plotting

Page 106: Data Science, what even?!

statsmodels

Advanced Statistics Modeling

Page 107: Data Science, what even?!

NLTK

Natural Language Toolkit

Page 108: Data Science, what even?!

scikit-learn

Machine Learning

Page 109: Data Science, what even?!

from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

clf.predict([[2., 2.]])
>>> array([1])
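
To give a feel for what fitting a decision tree involves, here is a minimal pure-Python sketch of a single split (a "stump") chosen by Gini impurity, applied to the same X and Y. It is a simplified illustration and not scikit-learn's implementation; `gini`, `best_split` and `predict` are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature, threshold, weighted_impurity) of the best single split."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def predict(X, y, split, row):
    """Predict the majority label on the side of the split that row falls on."""
    j, t, _ = split
    side = [label for r, label in zip(X, y) if (r[j] <= t) == (row[j] <= t)]
    return Counter(side).most_common(1)[0][0]

X = [[0, 0], [1, 1]]
Y = [0, 1]
split = best_split(X, Y)
print(split, predict(X, Y, split, [2.0, 2.0]))
```

A full tree repeats this search recursively on each side of the split until the leaves are pure or a stopping rule kicks in.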

Page 110: Data Science, what even?!

PyBrain... Machine Learning

Page 111: Data Science, what even?!

PyMC

Bayesian Inference

Page 112: Data Science, what even?!

Pattern

Web Mining for Python

Page 113: Data Science, what even?!

NetworkX

Study Networks

Page 114: Data Science, what even?!

MILK: Machine Learning

Page 115: Data Science, what even?!

Pandas

easy-to-use data structures

Page 116: Data Science, what even?!

from pandas import DataFrame
x = DataFrame([{"age": 26}, {"age": 19}, {"age": 21}, {"age": 18}])

# Keep only the rows where age > 20, then count and average them
print(x[x['age'] > 20].count())
print(x[x['age'] > 20].mean())

Page 117: Data Science, what even?!

Python vs R?

Different Purposes

Page 118: Data Science, what even?!

Dogfooding

Data Scientific Method

Page 119: Data Science, what even?!

Original Question

What is Data Science?

Page 120: Data Science, what even?!

Back to you

For questioning