[databeers] 18-09-2014 models: pets and herds. carlos j. gil bellosta
Models: pets and herds
Carlos J. Gil Bellosta
September 2014
Carlos J. Gil Bellosta – datanalytics Models: pets and herds September 2014 1 / 21
This is a pet...
Source: http://jessfalcone.wordpress.com
... and this is a herd
Source: http://bonfirehealth.com/negative-influences-comparisons-social-cues-herd/
Some people treat computers as pets...
Source: aliexpress.com
... and others as herds
Source: Failure Trends in a Large Disk Drive Population, Pinheiro et al.
This is a statistical model treated as a pet
Source: http://www.ats.ucla.edu/stat/stata/seminars/interaction_sem/interaction_sem.htm
Pets are very demanding and require...
1 variable selection,
2 checks for outliers,
3 assessment of the goodness of fit,
4 finding confidence intervals,
5 calculating p-values,
6 interpreting the results,
7 discussing how the results generalize,
8 ...
Models... as herds?
Source: http://www.gotmedieval.com
Model construction: population
Source: http://timyeo.wordpress.com/
Model construction: data enrichment (aka left join)
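A minimal sketch of that enrichment step, assuming one row per subject and a lookup table with extra attributes. The table contents, the column names and the `left_join` helper are made up for illustration; none of them come from the talk.

```python
# Hypothetical data: one row per subject, plus a lookup table of extra attributes.
subjects = [
    {"subject_id": 1, "age": 34},
    {"subject_id": 2, "age": 51},
    {"subject_id": 3, "age": 27},
]
extra = [
    {"subject_id": 1, "region": "north"},
    {"subject_id": 3, "region": "south"},
]


def left_join(left, right, key):
    """Keep every row of `left`; add the columns of `right` where the key matches."""
    right_by_key = {row[key]: row for row in right}  # assumes unique keys in `right`
    right_columns = {col for row in right for col in row if col != key}
    joined = []
    for row in left:
        match = right_by_key.get(row[key], {})
        enriched_row = dict(row)
        for col in right_columns:
            enriched_row[col] = match.get(col)  # None when there is no match
        joined.append(enriched_row)
    return joined


enriched = left_join(subjects, extra, "subject_id")
```

Every subject survives the join. Subject 2 has no match in `extra`, so its `region` is `None` rather than the row being dropped.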
But subject data is often messy...
Source: http://arquitectolegista.com.ar/
... and contains temporal data...
Source: http://thirdorderscientist.org/
... that is difficult to fit in a box (table)
Source: http://cutestcatpics.com/cat-trying-to-fit-into-tiny-box/
We have a whole dataset per subject!
... and a model per subject?
Source: http://www.unc.edu/
(Most) models are sophisticated summaries of data
Source: http://xkcd.r-forge.r-project.org/
Do you seek α? Build a model per stock!
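A sketch of the "model per stock" idea, assuming α means the intercept of a regression of a stock's returns on the market's returns (the usual market-model reading; the slide does not spell it out). The tickers, the numbers and the `fit_line` helper are invented for illustration.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up (ticker, market_return, stock_return) observations.
observations = [
    ("AAA", 0.01, 0.025), ("AAA", -0.02, -0.035), ("AAA", 0.03, 0.065),
    ("BBB", 0.01, 0.004), ("BBB", -0.02, -0.011), ("BBB", 0.03, 0.014),
]

# Split the single table into one small dataset per stock...
by_ticker = defaultdict(list)
for ticker, market, stock in observations:
    by_ticker[ticker].append((market, stock))

# ...and fit one model per dataset.
models = {}
for ticker, rows in by_ticker.items():
    market_returns = [m for m, _ in rows]
    stock_returns = [s for _, s in rows]
    models[ticker] = fit_line(market_returns, stock_returns)
```

The loop body never looks at the ticker itself, which is what makes this a herd: the same recipe is applied to every group without individual care.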
Fitting a million models in the nineties was quite an achievement (for some)
Beyond recency and frequency: self-exciting processes
Source: Bursting transition in a linear self-exciting point process, Onaga, T. et al
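As a rough illustration of what "self-exciting" means, here is the conditional intensity of a linear self-exciting (Hawkes-type) process. The exponential kernel and the parameter values are assumptions chosen for simplicity, not taken from the talk or the cited paper.

```python
import math


def hawkes_intensity(t, past_events, mu, alpha, beta):
    """Event rate at time t: mu + sum over earlier events of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in past_events if t_i < t
    )


# Hypothetical event times for one subject (e.g. purchases or logins).
events = [1.0, 1.2, 1.3, 5.0]

busy = hawkes_intensity(1.4, events, mu=0.1, alpha=0.5, beta=2.0)   # right after a burst
quiet = hawkes_intensity(4.9, events, mu=0.1, alpha=0.5, beta=2.0)  # long after the burst
```

Each past event raises the rate by `alpha` and that boost decays at speed `beta`, so the intensity depends on both how many events there were and how recent they are. That is one sense in which such a process goes beyond separate recency and frequency summaries.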
One logistic regression per Gmail user...
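A toy version of "one logistic regression per user", assuming a single numeric feature per message and a 0/1 label saying whether that user flagged it. The user names, the feature, the data and the plain gradient-descent fit are all illustrative assumptions; this says nothing about how Gmail actually works.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, steps=2000, learning_rate=0.5):
    """Fit p(y=1 | x) = sigmoid(w0 + w1 * x) by gradient descent on the mean log-loss."""
    w0, w1 = 0.0, 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in rows:
            error = sigmoid(w0 + w1 * x) - y
            g0 += error
            g1 += error * x
        w0 -= learning_rate * g0 / n
        w1 -= learning_rate * g1 / n
    return w0, w1


# Hypothetical per-user data: (feature value, flagged as unwanted?).
messages_by_user = {
    "ana": [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0), (0.7, 1), (0.8, 1), (0.9, 1)],
    "ben": [(0.1, 1), (0.2, 1), (0.3, 0), (0.4, 1), (0.7, 0), (0.8, 0), (0.9, 0)],
}

# One small model per user instead of one big model for everybody.
user_models = {user: fit_logistic(rows) for user, rows in messages_by_user.items()}
```

The two users react to the same feature in opposite ways, so their fitted slopes have opposite signs. A single pooled model would average that difference away.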
Challenges: statistical, computational,... and more!
This approach faces many challenges:
1 Computational: how do you fit so many models? (but Spark rocks!)
2 Statistical: how do you...
   1 perform variable selection?
   2 evaluate the fit?
   3 deal with outliers?
   4 ...
3 And finally, how do you sell these approaches to business people (ex-Google)?
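One possible reading of the "evaluate the fit?" question, sketched under assumptions: nobody can inspect a million diagnostic plots, so each model gets a single summary score and only the worst ones are flagged for a closer look. The choice of R², the 0.8 threshold and the data are arbitrary illustrations, not a recommendation from the talk.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line through (xs, ys), i.e. the squared correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical herd: one small dataset per subject.
herd = {
    "s1": ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    "s2": ([1, 2, 3, 4], [1.0, 3.1, 4.9, 7.2]),
    "s3": ([1, 2, 3, 4], [5.0, 1.0, 6.0, 2.0]),  # does not follow a line at all
}

THRESHOLD = 0.8  # arbitrary cut-off, chosen only for this example

scores = {name: r_squared(xs, ys) for name, (xs, ys) in herd.items()}
needs_attention = sorted(name for name, score in scores.items() if score < THRESHOLD)
```

Only the flagged subjects would then get pet-style attention; the rest of the herd is accepted on the strength of the summary score alone.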