10 mistakes to avoid in data science

Post on 11-Apr-2017


Xavier Prudent

Enough? Depends on what you want to achieve (train a neural network, A/B testing…). Back to statistics (mathematical conditions) + rule of thumb.
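As one illustration of "back to statistics", the base R function power.prop.test can estimate how many observations per group an A/B test needs. This is only a sketch: the conversion rates (10% and 12%), the 5% significance level and the 80% power are assumptions chosen for the example, not values from the talk.

```r
# Sketch: sample size per group for an A/B test on conversion rates.
# Every numeric input below is an illustrative assumption.
ab_power <- power.prop.test(
  p1 = 0.10,         # assumed baseline conversion rate
  p2 = 0.12,         # assumed conversion rate of the variant
  sig.level = 0.05,  # two-sided significance level
  power = 0.80       # probability of detecting the difference if it exists
)

# Round up: a fraction of an observation cannot be collected
n_per_group <- ceiling(ab_power$n)
print(n_per_group)
```

Halving the difference between p1 and p2 raises the required sample size sharply, which is why "enough" depends on the effect you want to detect.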

Many formats can be easily read with R: text, CSV, Excel, protobuffer, JSON, XML, HTML, SQL…

Many sources already available: Kaggle, website mining, open data, government agencies

library(XML)
Web.page <- htmlTreeParse("http://lapresse.ca")

More details in coming lectures

Montréal Big Data Meetup, 22nd March 2017 – Xavier Prudent – www.xavierprudent.com

Mistake 10 Not Having (enough) Data

Where does your data come from? Stand up and get out, talk with people, read the docs

library(ggplot2)
library(tabplot)
tableplot(diamonds)

Do not underestimate naïve tests…

Mistake 9 Do not check data quality

R package: datacheck
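The "naïve tests" mentioned on this slide can be sketched with base R alone, without the datacheck package. The data frame `toy`, its columns and the plausibility ranges are all made up for the example.

```r
# Sketch: naive data-quality checks in base R.
# `toy` is a hypothetical data frame with deliberate problems.
toy <- data.frame(
  id    = c(1, 2, 2, 4, 5),
  age   = c(34, NA, 27, 150, 41),
  price = c(10.5, 12.0, 12.0, -3.0, 9.9)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))

# Identifiers that appear more than once
duplicated_ids <- toy$id[duplicated(toy$id)]

# Rows with values outside a plausible range (the ranges are assumptions)
implausible_age <- which(!is.na(toy$age) & (toy$age < 0 | toy$age > 120))
negative_price  <- which(toy$price < 0)

print(missing_per_column)
print(duplicated_ids)
print(implausible_age)
print(negative_price)
```

Checks like these are cheap to run before any modelling and often reveal problems that a sophisticated method would silently absorb.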

What kind of data? What do I want to know? Geographic? Time series? Correlation? Which visualization? Histogram, boxplot, mosaic, heatmap, hexbinning, scatterplot, line chart, 3D. Many R packages available:

ggplot2, leaflet, plot_ly, corrplot
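A first look at the data does not require any of these packages. The sketch below uses only base R graphics and the built-in `airquality` dataset, which is an assumption made for the example and not data from the talk.

```r
# Sketch: three quick looks at a built-in dataset with base R graphics.
data(airquality)

# Distribution of a single variable (missing values are dropped by hist)
ozone_hist <- hist(airquality$Ozone, main = "Ozone", xlab = "Ozone (ppb)")

# Distribution of the same variable per group
boxplot(Ozone ~ Month, data = airquality, main = "Ozone by month")

# Relationship between two variables
plot(airquality$Temp, airquality$Ozone,
     xlab = "Temperature (F)", ylab = "Ozone (ppb)")
```

The same three views (histogram, boxplot, scatterplot) answer different questions: shape of a distribution, differences between groups, and correlation.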

Mistake 8 Do not look at your data

Look at your data

More details in coming lectures


Choose the right color: color blindness, printing, meaning…

Level of interactivity?

Set up deadlines. Make to-do lists. Project management tool (Asana). Plan & monitor your time.

Mistake 7 Not having a plan

Have a plan and focus on it

Do not forget the big picture by getting lost in technical tools

What is the question you want to answer?


R package caret: model evaluation, model choice, performance estimation (regression & classification). Statistical tests: goodness of fit, R², Hosmer-Lemeshow test (MKmisc), Wald test, k-fold validation. Retrain “often”.
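To make k-fold validation concrete, here is a minimal base R sketch. It does not use caret, which automates this kind of loop; the dataset (`mtcars`), the model formula, the number of folds and the seed are all assumptions made for the example.

```r
# Sketch: 5-fold cross-validation of a linear model in base R.
set.seed(42)  # arbitrary seed, only for reproducibility
k <- 5

# Assign each row of the data to one of k folds at random
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

fold_rmse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)  # train on the other k - 1 folds
  pred  <- predict(fit, newdata = test)     # predict the held-out fold
  sqrt(mean((test$mpg - pred)^2))           # root mean squared error
})

# Average held-out error across folds
cv_rmse <- mean(fold_rmse)
print(fold_rmse)
print(cv_rmse)
```

The spread of `fold_rmse` matters as much as its mean: a model whose error varies widely between folds deserves less trust than its average suggests.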

Observe > Clean > Understand > Train > Predict

Mistake 6 Focus on training

Question: How much snow will fall on Montréal during the next 5 years? Data: snowfall and temperature of the last 80 years

?


Mistake 5 Keep it complex

Do not jump straight to the fashionable, complicated method

Keep your method as simple as possible (focus on the question). Know the limits of this method. Compare the methods (caret, ROC).

Boosted decision tree coupled to a neural network

Linear regression

Complexity comes at a price (speed, error-prone, expertise, amount of data)

Can you afford it?
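One way to compare a simple and a complex method is to fit both on the same training set and measure their error on held-out data. In the sketch below a degree-10 polynomial stands in for the "complex" method; it is not the boosted decision tree or neural network named on the slide. The simulated data, the train/test split and the seed are assumptions made for the example.

```r
# Sketch: simple vs. flexible model, compared on held-out data.
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 60
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)  # assumed truth: a straight line plus noise
dat <- data.frame(x = x, y = y)

# Hold out 20 of the 60 points for testing
train_idx <- sample(n, 40)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

simple_fit  <- lm(y ~ x, data = train)            # linear regression
complex_fit <- lm(y ~ poly(x, 10), data = train)  # degree-10 polynomial

# Root mean squared error of a fitted model on a given data set
rmse <- function(fit, data) {
  sqrt(mean((data$y - predict(fit, newdata = data))^2))
}

results <- data.frame(
  model      = c("linear", "degree-10 polynomial"),
  train_rmse = c(rmse(simple_fit, train), rmse(complex_fit, train)),
  test_rmse  = c(rmse(simple_fit, test),  rmse(complex_fit, test))
)
print(results)
```

The flexible model always fits the training set at least as well as the linear one, because it contains it as a special case. The held-out column is the one that tells you whether the extra complexity was worth its price.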


R standard function: p.adjust (Bonferroni, Benjamini-Hochberg)

You are a miserable shooter: probability to hit, 1%. You shoot 10,000 lasers and hit at the 10,001st shot. Does that make you a shooting genius?

Mistake 4 Do not correct for multiple tests

Multiplication of sensors, data gathering protocols → era of Big Data. The more data you analyze, the more weird cases will pop up regularly.

Are they significant?
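The effect can be simulated in a few lines. The sketch below draws 10,000 p-values from tests where nothing is going on, then applies p.adjust with the two corrections named on the slide. The number of tests, the 5% threshold and the seed are assumptions made for the example.

```r
# Sketch: many tests with no real effect, before and after correction.
set.seed(2017)  # arbitrary seed, only for reproducibility
n_tests <- 10000

# When the null hypothesis is true, p-values are uniform on [0, 1]
p_raw <- runif(n_tests)

# Adjust for multiple testing
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bh   <- p.adjust(p_raw, method = "BH")  # Benjamini-Hochberg

# Number of "discoveries" at the 5% level under each approach
counts <- c(
  raw        = sum(p_raw  < 0.05),
  bonferroni = sum(p_bonf < 0.05),
  BH         = sum(p_bh   < 0.05)
)
print(counts)
```

Without correction, roughly 5% of the tests look significant by chance alone. Bonferroni controls the chance of any false positive, while Benjamini-Hochberg controls the expected share of false positives among the discoveries, so it is the less conservative of the two.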


Document your work

RMarkdown, Shiny

Create HTML, PDF, Word, slides, web pages, CV, journal, book. Automatically include & update the result of your analysis.

More interactive? Dashboards, interactive maps…

http://rmarkdown.rstudio.com/gallery.html
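The idea of automatically including results can be sketched as follows: an R Markdown file mixes text with code chunks, and inline expressions are replaced by their values each time the report is rendered. The report title, the file name and the numbers are hypothetical, and the render step only runs if the rmarkdown package and pandoc happen to be installed.

```r
# Sketch: write a minimal R Markdown report whose numbers update with the data.
# The chunk fence is built programmatically so it can sit inside this example.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Example report (hypothetical)\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "x <- c(12, 30, 25)  # made-up values",
  "mean_x <- mean(x)",
  fence,
  "",
  "The mean is `r mean_x`."  # inline code: replaced by its value when rendered
)

# Write the report source to a temporary location
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)

# Render only when rmarkdown and pandoc are available (not assumed here)
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

Changing the values of `x` in the source and rendering again updates the sentence in the output, so the text can never drift away from the analysis.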


Mistake 3 Do not communicate or document

for the others as well as for yourself

More details in coming lectures

RMarkdown


Café, meetups, colleagues, board game, jogging club. Publish online (blog). Ask for an external view of your work.

Mistake 2 Stay alone

Do not stay alone, do not work alone


Data Science Association: code of conduct http://www.datascienceassn.org/code-of-conduct.html

Mistake 1 Ethics is a useless luxury

What are you doing? For whom?
What is the impact of your work?
- Company, society, yourself
- Short – long term
What type of data are you analyzing?
- Law & regulation
- Privacy
Do you have any conflict of interest?


Tendency to focus on the technique, on the challenge

“Yes, but” answers?

CAST!

Xavier Prudent: XAVIER PRUDENT
Organizer: MICHAEL ALBO
The Audience: ALL OF YOU

Technical Support: OVH
Design - Photography: CHRISTINE NAULLEAU

Special thanks to George Lucas and to the audience for their attention

Question? Comment? Feel free to contact me:

Xavier Prudent, prudentxavier@gmail.com
