Kaggle - global Data Science community
Slides from the Lviv IT Arena talk
![Page 1: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/1.jpg)
Kaggle – the global community of Data Science professionals
Anastasiia Kornilova
![Page 2: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/2.jpg)
Who am I?
![Page 3: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/3.jpg)
- MS in Applied Mathematics
- 3 years as a Data Scientist
![Page 4: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/4.jpg)
![Page 5: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/5.jpg)
What is Data Science?
![Page 6: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/6.jpg)
Scientific Method
Math
Statistics
Data Engineering
Domain Expertise
Advanced Computing
Visualization
Hacker Mindset
What matters?
![Page 7: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/7.jpg)
![Page 8: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/8.jpg)
What is Kaggle?
![Page 9: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/9.jpg)
2010 - founded in Melbourne, Australia by Anthony Goldbloom
![Page 10: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/10.jpg)
What problem do they solve?
Data problems
Data solvers
![Page 11: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/11.jpg)
![Page 12: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/12.jpg)
![Page 13: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/13.jpg)
In fact, a McKinsey Global Institute report estimates that by 2018, “the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” !!!
![Page 14: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/14.jpg)
Between 2010 and 2020, the data scientist career path is projected to grow by 18.7 percent, beaten only by video game designers. The big data industry is expected to be a $53.4 billion industry by 2016.
![Page 15: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/15.jpg)
Anyone with "data science" in his or her job title on a LinkedIn page is going to get "100 recruiter emails a day," said Josh Sullivan, who leads a 500-person data-science group at the consulting firm Booz Allen Hamilton Holding
![Page 16: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/16.jpg)
Are you good enough?
![Page 17: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/17.jpg)
First Competition: Forecast Eurovision Song Contest Voting
- $1,000 prize
- 22 teams
Outperformed prediction markets: predicted 7 of the top 10 countries, versus only 5 for prediction markets.
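One way to read the "7 vs. 5" comparison is as a count of how many predicted countries actually finished in the top 10. The sketch below assumes that reading; the function name `top_n_hits` and all country labels are hypothetical placeholders, not the competition's real metric or results.

```python
def top_n_hits(predicted, actual, n=10):
    """Count how many of the first n predicted entries appear in the actual top n."""
    return len(set(predicted[:n]) & set(actual[:n]))


# Hypothetical placeholder labels, not real Eurovision results.
actual_top10 = [f"country_{i}" for i in range(1, 11)]

# A forecast that gets 7 of the top 10 right, and one that gets 5 right.
model_pick = actual_top10[:7] + ["country_11", "country_12", "country_13"]
market_pick = actual_top10[:5] + [f"country_{i}" for i in range(11, 16)]

print(top_n_hits(model_pick, actual_top10))   # 7
print(top_n_hits(market_pick, actual_top10))  # 5
```

This counts membership only and ignores the order within the top 10; a rank-aware score would be a different design choice.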
![Page 18: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/18.jpg)
Short story of success
- 2011 - relocated to San Francisco
- November 2011 - raised $11M in funding
- July 2013 - 100,000 data scientists involved
- February 2014 - more than 140,000 data scientists
![Page 19: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/19.jpg)
![Page 20: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/20.jpg)
How can you use Kaggle?
![Page 21: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/21.jpg)
Types of rewards
- Knowledge
- Money
- Job interview
![Page 22: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/22.jpg)
Competitions for knowledge (always open)
- Digit Recognizer, CIFAR-10, First Steps with Julia
- Titanic: Machine Learning from Disaster
- Bike Sharing Demand
- Learning Social Circles in Networks
![Page 23: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/23.jpg)
Competitions with prizes
Open:
- American Epilepsy Society Seizure Prediction Challenge: $25,000 prize
- Africa Soil Property Prediction Challenge: $8,000 prize
- Tradeshift Text Classification: $5,000 prize
![Page 24: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/24.jpg)
Completed competitions (170+)
- Heritage Health Prize: $500,000
- GE Flight Quest: $250,000
- GE Hospital Quest: $100,000
- Higgs Boson ML Challenge: $13,000 + invitation to CERN
- Galaxy Zoo: $16,000
- KDD Author Paper Identification Challenge
- Job Recommendation Challenge
![Page 25: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/25.jpg)
Job competitions (completed):
Facebook:
- recommend missing links in social graph (who to follow)
- optimal graph path
- predict text tags
Yelp:
- estimate the number of useful votes a review will receive
Walmart:
- predict store sales
+ Job Board
![Page 26: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/26.jpg)
How to win?
![Page 27: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/27.jpg)
Dig into the data
![Page 28: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/28.jpg)
![Page 29: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/29.jpg)
Stay on track
![Page 30: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/30.jpg)
Kaggle competition == Data science?
![Page 31: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/31.jpg)
1. Understand
2. Collect
3. Data exploration
4. Clean and transform
5. Model
6. Validate
7. Communicate results
Deploy
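The steps listed above can be sketched end to end on a toy problem. This is a minimal illustration using only the Python standard library and synthetic data; the 80/20 holdout split, the least-squares line, and the mean absolute error metric are assumptions chosen for brevity, not something the talk prescribes. Step 1 (understanding the problem) and deployment have no code counterpart here.

```python
import random
import statistics

random.seed(0)

# 2. Collect: synthetic (x, y) rows with a few missing targets (illustrative only).
rows = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.5)) for x in range(50)]
rows += [(x, None) for x in range(50, 55)]

# 3-4. Explore, then clean: count and drop rows with a missing target.
missing = sum(1 for _, y in rows if y is None)
clean = [(x, y) for x, y in rows if y is not None]

# Hold out 20% of the clean rows for validation.
random.shuffle(clean)
split = int(0.8 * len(clean))
train, holdout = clean[:split], clean[split:]


# 5. Model: ordinary least squares fit of y = a*x + b on the training rows.
def fit_line(data):
    xs, ys = zip(*data)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


a, b = fit_line(train)

# 6. Validate: mean absolute error on the held-out rows.
mae = statistics.fmean(abs(y - (a * x + b)) for x, y in holdout)

# 7. Communicate: report the fitted line and the holdout error.
print(f"dropped {missing} rows; y = {a:.2f}*x + {b:.2f}; holdout MAE = {mae:.2f}")
```

In a Kaggle competition the data is already collected and the metric is fixed by the organizers, so most of the effort goes into steps 3 to 6.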
![Page 32: Kaggle - global Data Science community](https://reader035.vdocuments.net/reader035/viewer/2022081802/5538650a550346722e8b47ea/html5/thumbnails/32.jpg)
?