final case study powerpoint

Final Case Study Predictive Modelling for Equestrian Sports N RAMACHANDRAN

Upload: ramachandran-n

Post on 18-Jul-2015


Category:

Data & Analytics



TRANSCRIPT

Page 1: Final case study powerpoint

Final Case Study: Predictive Modelling for Equestrian Sports

N RAMACHANDRAN

Page 2: Final case study powerpoint

Average by Stake Indicator

[Chart: Handle by Stake Indicator. Average handle (axis 0 to 500,000) for All, AP, CRC and FG; series: stake indicator Y and N.]

Page 3: Final case study powerpoint

Average Handle by Day of Week

[Chart: Handle vs Day of week. Average handle (axis 0 to 350,000) by day of week (Sun to Sat); series: All, AP, CRC, FG.]
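The averages behind charts like this one can be sketched as a simple group-and-average step. The snippet below is a minimal illustration, not the author's workbook: the record layout, the function name `average_handle_by` and all handle values are assumptions made up for the example. It produces one series per track plus a pooled "All" series, matching the chart legend.

```python
from collections import defaultdict

# Hypothetical race records: (track_id, day_of_week, handle).
# The values are invented for illustration; they are not from the case study data.
races = [
    ("AP", "Sat", 300000.0),
    ("AP", "Sat", 200000.0),
    ("AP", "Wed", 100000.0),
    ("CRC", "Sat", 150000.0),
    ("FG", "Wed", 50000.0),
]


def average_handle_by(records, key_index):
    """Average handle per (series, key), where series is a track id or 'All'."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        track, handle = record[0], record[-1]
        key = record[key_index]
        # Each race counts toward its own track and toward the pooled series.
        for series in (track, "All"):
            sums[(series, key)] += handle
            counts[(series, key)] += 1
    return {group: sums[group] / counts[group] for group in sums}


averages = average_handle_by(races, key_index=1)
print(averages[("AP", "Sat")])  # 250000.0
```

The same function would cover the other breakdowns (hour of day, number of runners, race number, month) if the grouping column sits at a different `key_index`.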

Page 4: Final case study powerpoint

Average Handle by Hour of day

[Chart: Handle vs Hour of day. Average handle (axis 0 to 400,000) by hour_of_day (1 to 9); series: All, AP, CRC, FG.]

Page 5: Final case study powerpoint

Average Handle by No of runners

[Chart: Handle vs No of runners. Average handle (axis 0 to 800,000) by number of runners (3 to 14); series: All, AP, CRC, FG.]

Page 6: Final case study powerpoint

Average Handle vs Race Number

[Chart: Handle by Race Number. Average handle (axis 0 to 1,200,000) by race number (1 to 14); series: All, AP, CRC, FG.]

Page 7: Final case study powerpoint

Average Handle by Month

[Chart: Average Handle by Month. Average handle (axis 0 to 350,000) by month (Jan to Dec); series: All, AP, CRC, FG.]

Page 8: Final case study powerpoint

Variables and their influence on the handle

Variables influencing Handle   All    AP     CRC    FG
Purse_USA                      +ve    +ve    +ve    +ve
Number of runners              +ve    +ve    +ve    +ve
Holiday                        +ve    +ve    +ve    -ve
Weekend                        +ve    NA     +ve    +ve
Race Type                      -ve    +ve    -ve    +ve
Age Restriction                -ve    -ve    NA     +ve
Sex Restriction                -ve    -ve    -ve    -ve
Race Number                    +ve    -ve    -ve    +ve
Hour of day                    +ve    -ve    +ve    +ve
Track_Condition                -ve    -ve    NA     NA
Wager Type                     +ve    +ve    +ve    +ve

Page 9: Final case study powerpoint

Linear Regression

• The modelling technique used to predict the handle values is linear regression. Because the handle is a continuous variable, regression is a suitable method to understand and predict its values.

• The charts that follow show the predicted values, and the errors relative to the original handle values.

• (The details of the variables used in the regression are in the Excel files.)
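As a rough illustration of the technique named on this slide, the sketch below fits an ordinary least squares model in plain Python by solving the normal equations. It is not the author's Excel model: the two predictors (purse in thousands of USD and number of runners), the helper names and every number are assumptions chosen so that the target is exactly linear in the inputs.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, target):
    """Return [intercept, coef_1, ...] minimizing the sum of squared errors."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    """Predicted handle for one race given fitted coefficients."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


# Hypothetical data: (purse in thousands of USD, number of runners) per race.
features = [(20, 6), (50, 8), (30, 10), (100, 12), (40, 7)]
# Made-up handle values built as 10000 + 2000 * purse + 15000 * runners.
handle = [140000.0, 230000.0, 220000.0, 390000.0, 195000.0]

coefs = fit_ols(features, handle)
predicted_handle = [predict(coefs, f) for f in features]
```

Both fitted slopes come out positive here, which corresponds to the "+ve" entries for Purse_USA and Number of runners in the influence table. The sketch assumes the predictors are not perfectly collinear; if they were, the elimination step would divide by zero.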

Page 10: Final case study powerpoint

Predicted Handle vs Handle with All Track Ids

Page 11: Final case study powerpoint

Original Handle vs Errors for all Track Ids

Page 12: Final case study powerpoint

Predicted Handle vs Original Handle for track AP

[Chart: predicted_handle (axis 0 to 1,400,000) against original handle (axis 0 to 3,400,000) for track AP.]

Page 13: Final case study powerpoint

Original Handle value vs Error for Track AP

[Chart: difference (axis -600,000 to 2,800,000) against original handle (axis 0 to 3,400,000) for track AP.]

Page 14: Final case study powerpoint

Predicted Handle vs Original Handle for track CRC

[Chart: predicted_handle (axis 0 to 800,000) against original handle (axis 0 to 1,300,000) for track CRC.]

Page 15: Final case study powerpoint

Original Handle value vs Error for Track CRC

[Chart: difference (axis -400,000 to 1,200,000) against original handle (axis 0 to 1,400,000) for track CRC.]

Page 16: Final case study powerpoint

Predicted Handle vs Original Handle for track FG

[Chart: predicted_handle (axis 0 to 600,000) against original handle (axis 0 to 1,800,000) for track FG.]

Page 17: Final case study powerpoint

Original Handle value vs Error for Track FG

[Chart: difference (axis -400,000 to 1,400,000) against original handle (axis 0 to 1,800,000) for track FG.]

Page 18: Final case study powerpoint

Important Points

• Handle values up to 700,000 are predicted with good accuracy.

• The model does not do a good job of predicting higher values of handle.

• The handle vs. error graphs show most of the values placed symmetrically about the x-axis; the errors appear random, so the residuals give no indication of a collinearity issue.

• Adjusted R-squared is in the range 0.60 to 0.75 across all the analyses (All, AP, CRC and FG).
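The two diagnostics mentioned on this slide, adjusted R-squared and the size of the error in different handle ranges, can be computed as in the sketch below. The function names and sample values are hypothetical, and the 700,000 split simply reuses the threshold quoted above; how the author computed these figures in Excel is not stated.

```python
def adjusted_r_squared(actual, predicted, n_predictors):
    """Adjusted R-squared: R-squared penalized for the number of predictors."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_predictors - 1)


def mean_abs_error_by_range(actual, predicted, threshold):
    """Mean absolute error for races below and at-or-above a handle threshold."""
    low = [abs(a - p) for a, p in zip(actual, predicted) if a < threshold]
    high = [abs(a - p) for a, p in zip(actual, predicted) if a >= threshold]
    # Assumes both ranges contain at least one race.
    return sum(low) / len(low), sum(high) / len(high)


# Hypothetical original and predicted handle values, invented for illustration.
original = [100000.0, 300000.0, 500000.0, 900000.0, 1500000.0]
predicted = [110000.0, 290000.0, 520000.0, 700000.0, 900000.0]

low_error, high_error = mean_abs_error_by_range(original, predicted, 700000.0)
print(high_error)  # 400000.0
```

In this invented sample the error above the threshold is far larger than below it, the same pattern the slide describes for the fitted models.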

Page 19: Final case study powerpoint

Ideal Variable Values to Maximize Handle

Ideal values for the maximization of Handle

Variable            All              AP               CRC              FG
Number of runners   14               14               13               13
Holiday             1                1                1                0
Weekend             1                0                1                1
Race Type           STK              STK              STK              STK
Age Restriction     4U               34               35               3
Sex Restriction     No Restriction   No Restriction   No Restriction   No Restriction
Race Number         3                9                6                2
Hour of day         7                1                2                2
Track_Condition     FT               GD               FT               FT
Wager Type          E                E                E                E
Month               Jan              Aug              Jan              Jan
Day of Week         Wed              Wed              Mon              Thu