![Page 1: Michael Grenon CS378 Data Mining Spring 2018 Optimizing ...lxiong/cs378/share/project/11_late_24558… · Optimizing Baseball Performance and Player Salary Michael Grenon CS378 Data](https://reader033.vdocuments.net/reader033/viewer/2022043010/5fa017805fc16c4d1d684d04/html5/thumbnails/1.jpg)
Optimizing Baseball Performance and Player Salary
Michael Grenon
CS378 Data Mining
Spring 2018
Baseball ♥ Stats
North America ♥ Baseball
Franchise
Entertainment organization
money
data
wins
2017*
Optimization
Salary optimization
How did teams optimize their player salaries?
● No salary cap!
● Similarity problem
  ○ Linear correlation
  ○ Pearson correlation coefficient
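To make the similarity framing concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson_r` and the payroll and win values are hypothetical placeholders chosen for illustration; they are not the data or code used in this project.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values (NOT real team data): payroll in $M and season wins.
payroll_musd = [60.0, 95.0, 120.0, 150.0, 200.0]
wins = [70, 85, 78, 92, 88]
print(f"r = {pearson_r(payroll_musd, wins):.3f}")
```

A value near +1 or -1 indicates a strong linear association, and a value near 0 indicates a weak one. The coefficient only measures linear association and says nothing about causation.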
How to Play to Win?
Which aspects of play most strongly correlate with winning?
● Similarity problem
  ○ (Linear) association
  ○ Pearson correlation coefficient
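For reference, the standard definition of the Pearson coefficient is given here. The symbols are assumptions introduced for illustration: $x_i$ is team $i$'s value of some play statistic, $y_i$ is that team's win total, and $n$ is the number of teams.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

Computing $r$ once per statistic and sorting by $|r|$ is one simple way to ask which aspects of play are most strongly associated with winning.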
How do the best teams use their players?
How frequently are certain players used in games?
● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?
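The frequent item set framing can be sketched as follows, treating each game's lineup as one transaction. This is a minimal Apriori sketch under stated assumptions: the function name `apriori`, the player labels, and the 0.5 support threshold are all hypothetical, and the right threshold is exactly the open question raised on the slide.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    if not transactions:
        raise ValueError("need at least one transaction")
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Hypothetical lineups (NOT real data): the players who appeared in each game.
games = [
    {"player_a", "player_b", "player_c"},
    {"player_a", "player_b"},
    {"player_a", "player_c"},
    {"player_b", "player_c"},
    {"player_a", "player_b", "player_c"},
]
for itemset, s in sorted(apriori(games, 0.5).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.2f}")
```

Lowering the support threshold surfaces rarer player combinations at the cost of many more candidates, so the threshold trades coverage against runtime.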
R = 0.253
Price Per Win
E = win% * (ppw rank + win rank)
R = -0.555
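One possible reading of the efficiency score E is sketched below. The slides do not define the rank direction or the price-per-win formula, so the following are assumptions: price per win is payroll divided by wins, a higher rank number is better (the cheapest price per win and the most wins get the highest ranks), and win% is wins divided by games played. The function names and input values are hypothetical.

```python
def rank_ascending(values):
    """Rank 1 goes to the smallest value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def efficiency_scores(payrolls, wins, losses):
    """E = win% * (ppw rank + win rank), under the assumptions stated above."""
    # Assumption: price per win (ppw) = payroll / wins; requires wins > 0.
    ppw = [p / w for p, w in zip(payrolls, wins)]
    win_pct = [w / (w + l) for w, l in zip(wins, losses)]
    # Assumption: higher rank number = better, so the cheapest ppw must rank
    # highest; ranking the negated ppw in ascending order achieves that.
    ppw_rank = rank_ascending([-x for x in ppw])
    win_rank = rank_ascending(wins)
    return [wp * (pr + wr) for wp, pr, wr in zip(win_pct, ppw_rank, win_rank)]


# Hypothetical two-team example (NOT real data): payroll in $M, wins, losses.
scores = efficiency_scores([100.0, 200.0], [100, 62], [62, 100])
print([round(s, 3) for s in scores])
```

Under these assumptions a larger E means a team wins more often and pays less per win. If the original ranks run the other way (rank 1 is best), the ordering of E changes and the sketch would need to be adjusted.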
BsR ≈ Baserunning WAR
Taken from team_batting(2017)
R = 0.548
wSL ≈ weighted Slider
Taken from team_batting(2017)
R = 0.500
¯\_(ツ)_/¯
What’s next?
How frequently are certain players used in games?
● Frequent item set problem
  ○ Apriori algorithm
  ○ Support threshold?
What’s next?
To what extent are win-loss record and attendance related?
● Extending Pearson correlation analysis
Preliminary Conclusions
● Hitting coaches: teach how to hit a slider
  ○ Pitch most “cost-efficient” to excel at hitting
  ○ ...but not by much
● Fielding coaches: emphasize speed and skill on baserunning
  ○ More closely associated with salary efficiency than any other performance
    ■ Even batting, pitching
Preliminary Conclusions
● Chess match: all pieces are important
● 2-sided game
● Predictive (vs. descriptive) statistics
  ○ Time-series analysis
Preliminary Conclusions
● Too much data
  ○ batting_stats(): 287 attributes?
● Statcast data
  ○ More complex mining techniques
  ○ Neural nets
● Data warehouses incomplete, disorganized
  ○ Private sector