demystifying data science with an introduction to machine learning


Upload: julian-bright

Post on 15-Jan-2015


DESCRIPTION

This slide deck accompanies @brightsparc's presentation to SEEK.

TRANSCRIPT

Page 1: Demystifying Data Science with an introduction to Machine Learning

Demystifying Data Science

with an Intro to Machine Learning

Page 2: Demystifying Data Science with an introduction to Machine Learning

Data science is everywhere

Page 3: Demystifying Data Science with an introduction to Machine Learning

Sexiest job of the 21st century*

A McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions”

Source: Harvard Business Review, Oct 2012

 

Page 4: Demystifying Data Science with an introduction to Machine Learning

So what is Data Science?

Page 5: Demystifying Data Science with an introduction to Machine Learning

Source: Hilary Mason, ex-chief data scientist at bit.ly

Page 6: Demystifying Data Science with an introduction to Machine Learning

Who are these unicorns?

Page 7: Demystifying Data Science with an introduction to Machine Learning

A bit about me

@brightsparc  

Page 8: Demystifying Data Science with an introduction to Machine Learning

I thought it was all about stats?

Page 9: Demystifying Data Science with an introduction to Machine Learning

It’s a broader skillset

Source: http://blogs.wsj.com/cio/2014/02/14/it-takes-teams-to-solve-the-data-scientist-shortage/

Page 10: Demystifying Data Science with an introduction to Machine Learning

Data science pipeline

Source: http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext

Page 11: Demystifying Data Science with an introduction to Machine Learning

Where does Kaggle fit in?

Charts: degree breakdown in top 100; areas of study

Page 12: Demystifying Data Science with an introduction to Machine Learning

What’s the deal with big data?

Page 13: Demystifying Data Science with an introduction to Machine Learning

Apache Hadoop Ecosystem

Page 14: Demystifying Data Science with an introduction to Machine Learning

It’s like MapReduce, you know
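For readers who have not seen the MapReduce pattern, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) using the classic word-count task. The function names and input documents are made up for illustration; this is not Hadoop's API, and a real cluster would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input documents
counts = word_count(["big data", "Big deal"])
```

The map and reduce functions never share state, which is what lets a framework run them in parallel on separate chunks of data.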

Page 15: Demystifying Data Science with an introduction to Machine Learning

So what about machine learning?

A pioneer in machine learning, he created a checkers program that played against itself

“Give machines the ability to learn without explicitly programming them.” Arthur L. Samuel (1959)

Page 16: Demystifying Data Science with an introduction to Machine Learning

Types of algorithms

Page 17: Demystifying Data Science with an introduction to Machine Learning

Some examples

Page 18: Demystifying Data Science with an introduction to Machine Learning

Machine learning process

Page 19: Demystifying Data Science with an introduction to Machine Learning

Build a model

Underfit vs. overfit

Linear regression: solve for the values of θ in the hypothesis function hθ(x)

Page 20: Demystifying Data Science with an introduction to Machine Learning

Gradient descent algorithm

Minimize the cost function, which is ½ of the average squared error of the predictions vs. the training data.
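As a rough illustration of the two slides above, the following sketch fits a one-feature linear hypothesis hθ(x) = θ0 + θ1·x by batch gradient descent on the half-mean-squared-error cost. The data, the learning rate `alpha`, and the iteration count are assumptions chosen for the example, not values from the talk or its demo.

```python
def hypothesis(theta, x):
    """h_theta(x) = theta0 + theta1 * x for a single input feature."""
    return theta[0] + theta[1] * x


def cost(theta, xs, ys):
    """Half of the average squared prediction error over the training data."""
    m = len(xs)
    return sum((hypothesis(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Repeatedly step both parameters against the gradient of the cost."""
    theta = [0.0, 0.0]
    m = len(xs)
    for _ in range(iterations):
        errors = [hypothesis(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # Update both parameters simultaneously
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
    return theta


# Made-up training data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta = gradient_descent(xs, ys)
```

With a learning rate that is too large the updates diverge instead of settling, so `alpha` usually needs tuning or the features need scaling.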

Page 21: Demystifying Data Science with an introduction to Machine Learning

Demo: House prices

Page 22: Demystifying Data Science with an introduction to Machine Learning

Cross validation – split training/test
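One way to picture the split is the sketch below, which shows a single hold-out split and a simple k-fold variant. The helper names, the 25% default test fraction, and the fixed seed are assumptions for illustration; libraries such as scikit-learn ship their own utilities for this.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(rows, k):
    """Yield (train, test) pairs so that each row is tested exactly once."""
    for i in range(k):
        test = rows[i::k]
        train = [row for j, row in enumerate(rows) if j % k != i]
        yield train, test


# Hypothetical data set of ten examples
rows = list(range(10))
train, test = train_test_split(rows, test_fraction=0.3)
folds = list(k_folds(rows, 5))
```

The model is fitted on the training part only and scored on the held-out part, which gives an estimate of how it behaves on data it has not seen.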

Page 23: Demystifying Data Science with an introduction to Machine Learning

Supervised learning model

Page 24: Demystifying Data Science with an introduction to Machine Learning

Recommender systems

Collaborative filtering – predict ratings for similar items given other users’ behavior
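A minimal user-based sketch of this idea: predict a missing rating as the similarity-weighted average of the ratings other users gave that item. The ratings, user names, and the particular similarity function (the inverse of one plus the summed absolute rating difference) are assumptions for the example and are not taken from the talk's demo.

```python
def similarity(a, b):
    """Score two users between 0 and 1 from the items they have both rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    distance = sum(abs(a[item] - b[item]) for item in shared)
    return 1.0 / (1.0 + distance)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    if weight_total == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / weight_total


# Hypothetical ratings on a 1-5 scale
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 2},
}
prediction = predict_rating(ratings, "ann", "c")
```

Here "ann" agrees with "bob" on every shared item, so the prediction for item "c" lands much closer to bob's rating than to cat's.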

Page 25: Demystifying Data Science with an introduction to Machine Learning

Collaborative filtering method

Source: http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf

Page 26: Demystifying Data Science with an introduction to Machine Learning

Similar users based on distance

Manhattan distance vs. Euclidean distance
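The two distances named above can be sketched as follows, comparing two users over the items both have rated. Representing each user as a dictionary of item ratings is an assumption made for the example, and the ratings are invented.

```python
import math


def manhattan(a, b):
    """Sum of absolute rating differences over the shared items."""
    shared = a.keys() & b.keys()
    return sum(abs(a[item] - b[item]) for item in shared)


def euclidean(a, b):
    """Straight-line distance between the two users' shared ratings."""
    shared = a.keys() & b.keys()
    return math.sqrt(sum((a[item] - b[item]) ** 2 for item in shared))


# Hypothetical ratings for two users
user_a = {"x": 5, "y": 1, "only_a": 3}
user_b = {"x": 2, "y": 5, "only_b": 4}
d_manhattan = manhattan(user_a, user_b)
d_euclidean = euclidean(user_a, user_b)
```

A smaller distance means more similar tastes; items rated by only one of the two users are ignored in this sketch.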

Page 27: Demystifying Data Science with an introduction to Machine Learning

Demo: Music recommender system

Pearson Correlation Coefficient
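For reference, the Pearson correlation coefficient between two users' ratings can be computed as in the sketch below. The dictionary layout and the sample ratings are assumptions for illustration, not the demo's data. Returning 0.0 when a user's shared ratings have no variance is also a choice made here, since the coefficient is undefined in that case.

```python
import math


def pearson(a, b):
    """Pearson correlation of two users' ratings over their shared items."""
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [a[item] for item in shared]
    ys = [b[item] for item in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined when one user rates everything the same
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical ratings: "harsh" scores everything lower than "generous"
generous = {"song1": 3, "song2": 4, "song3": 5}
harsh = {"song1": 1, "song2": 2, "song3": 3}
opposite = {"song1": 5, "song2": 4, "song3": 3}
r_same_taste = pearson(generous, harsh)
r_opposite = pearson(generous, opposite)
```

Unlike a raw distance, the coefficient ignores a constant offset between users, so a harsh rater and a generous rater with the same preferences still correlate strongly.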

Page 28: Demystifying Data Science with an introduction to Machine Learning

Visualization frameworks

Tableau

D3.js

Processing

Raphaël.js

Page 29: Demystifying Data Science with an introduction to Machine Learning

What about online experimentation?

Page 30: Demystifying Data Science with an introduction to Machine Learning

What will the future look like?

• Online collaboration

• Open Data

Page 31: Demystifying Data Science with an introduction to Machine Learning

Next gen distributed computing

100x faster in memory, and 10x faster even when running on disk.

Page 32: Demystifying Data Science with an introduction to Machine Learning

Deep learning, a new frontier?

Geoffrey Hinton @Google

Page 33: Demystifying Data Science with an introduction to Machine Learning

How can I get started?

• MOOCs
– Coursera Machine Learning (Andrew Ng - Stanford)
– Learning from Data (Abu-Mostafa - Caltech)

• Other references
– Collective Intelligence
– Mining of Massive Datasets
– Open-Source Data Science Masters

• Frameworks
– Python: scikit-learn
– Java: WEKA and Cascading

Page 34: Demystifying Data Science with an introduction to Machine Learning

Questions