![Page 1: Prediction of wrong way bike violators at USC using binary logistic regression by recording sample size of 2800+ on campus, Statistics for Engineers, USC Spring 2013](https://reader036.vdocuments.net/reader036/viewer/2022070302/54825b4db479590f0d8b4728/html5/thumbnails/1.jpg)
EE517 Project Presentation
Maisam Shahid Wasti and Dennis Hartono
Biking wrong way at Trousdale
We found that it is possible to predict wrong-way violations at Trousdale Parkway on the USC campus
Data Collection
Collected 14 hours of data with a total sample size of 2837
Decision Rule to classify violators
Established a consistent decision rule for sample validation
Overview of the observation site
Five-minute slot following class ending times
Observed a higher proportion of violators in the few minutes after classes end
Interpretation of important variables
Five-minute slot following class ending times
Name: '5min_after'
Type: Binary
Description: Coded '1' if the sample was observed within the five-minute slot following a class ending time
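The coding rule for '5min_after' can be sketched as follows. This is a minimal illustration only: the class ending times listed are hypothetical (the slides do not give the schedule), the function name is made up for this example, and treating the slot as half-open (start included, end excluded) is an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical class ending times; the slides do not list the real schedule.
CLASS_END_TIMES = ["09:50", "10:50", "11:50", "12:50"]
SLOT = timedelta(minutes=5)


def five_min_after(observed_at, end_times=CLASS_END_TIMES):
    """Return 1 if observed_at (HH:MM:SS) falls in the 5-minute slot after any class end, else 0."""
    t = datetime.strptime(observed_at, "%H:%M:%S")
    for end in end_times:
        e = datetime.strptime(end, "%H:%M")
        # Assumed half-open slot: [class end, class end + 5 minutes)
        if e <= t < e + SLOT:
            return 1
    return 0


print(five_min_after("09:52:30"))  # inside the slot after the 09:50 ending
print(five_min_after("10:10:00"))  # well outside any slot
```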
Used ‘bodyweight’ as a binary measure for speed
Interpretation of important variables
Approach to model selection
Refined our model in three stages
Logistic regression, with filtration between stages:
- All non-interaction terms
- Initial model: selected non-interaction terms + C(n, 2) second-order terms
- Intermediate model: after Backward-LR
- Final model: after removing terms causing multicollinearity
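As a rough illustration of the "C(n, 2) second-order terms" step, the sketch below builds every pairwise product of a set of binary predictors. The predictor list is a hypothetical subset chosen from names that appear in the slides, and the helper names are invented for this example; the actual model was presumably built in a statistics package.

```python
from itertools import combinations
from math import comb

# Hypothetical subset of binary predictors; the full list is not given in the slides.
predictors = ["gender", "bag", "sportswear", "bodyweight", "5min_after"]


def second_order_terms(names):
    """Names of all pairwise interaction terms, C(n, 2) in total."""
    return [f"{a} * {b}" for a, b in combinations(names, 2)]


def add_interactions(row, names):
    """Return a copy of row extended with the product of every predictor pair."""
    out = dict(row)
    for a, b in combinations(names, 2):
        out[f"{a} * {b}"] = row[a] * row[b]
    return out


terms = second_order_terms(predictors)
print(len(terms), comb(len(predictors), 2))

# One made-up observation, for illustration only.
sample = {"gender": 1, "bag": 1, "sportswear": 0, "bodyweight": 1, "5min_after": 1}
expanded = add_interactions(sample, predictors)
print(expanded["bag * 5min_after"], expanded["bag * sportswear"])
```

With binary predictors, each interaction term is itself binary: it equals 1 only when both of its factors equal 1.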
Variables in final model
Found significant independent variables
We interpret bag as an indicator of a student on campus

| Variable | Term type | Significance |
| --- | --- | --- |
| Gender | Non-interaction | .003 |
| Bag * Sportswear | Interaction | .010 |
| Bag * Bodyweight | Interaction | .002 |
| Bag * 5min_after | Interaction | .001 |

Model evaluation

| Test | Statistic | Significance |
| --- | --- | --- |
| Omnibus | 33.518 | 0.000 |
| Cox and Snell R² | 0.015 | |
| Nagelkerke R² | 0.024 | |

• Observed significant improvement in Log-Likelihood through Omnibus test
• Model suffered from low R² values
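The two pseudo-R² values can be related to the Omnibus statistic with the standard Cox and Snell and Nagelkerke formulas. The sketch below is a consistency check, not the authors' computation. It assumes the model was fitted on the observations left after holding out the 572 rows that appear in the classification tables (an inference; the slides do not state the split), and it uses an intercept-only model for the null log-likelihood.

```python
import math

# Assumed training counts: overall totals minus the 572 rows in the classification tables.
n_violators = 522 - 107
n_non_violators = 2315 - 465
n = n_violators + n_non_violators

omnibus_chi2 = 33.518  # reported Omnibus statistic, i.e. 2 * (LL_model - LL_null)

# Log-likelihood of the intercept-only (null) model.
p = n_violators / n
ll_null = n_violators * math.log(p) + n_non_violators * math.log(1 - p)

# Cox and Snell: 1 - (L_null / L_model)^(2/n) = 1 - exp(-chi2 / n)
cox_snell = 1 - math.exp(-omnibus_chi2 / n)

# Nagelkerke rescales Cox and Snell by its maximum attainable value, 1 - L_null^(2/n).
max_cox_snell = 1 - math.exp(2 * ll_null / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Under these assumptions the values round to the reported 0.015 and 0.024, which suggests the low R² follows directly from the small Omnibus statistic relative to the sample size.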
Multicollinearity Test 1
Found no serious multicollinearity issues (no correlation above 0.3); the highest correlation coefficient has magnitude 0.186

Correlation Matrix

| | Gender | Bag * Sportswear | Bag * Body_weight | Bag * 5min_after |
| --- | --- | --- | --- | --- |
| Gender | 1.000 | -.085 | -.186 | .041 |
| Bag * Sportswear | -.085 | 1.000 | -.028 | -.015 |
| Bag * Body_weight | -.186 | -.028 | 1.000 | -.007 |
| Bag * 5min_after | .041 | -.015 | -.007 | 1.000 |

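The check described on this slide amounts to scanning the off-diagonal entries of the correlation matrix against the 0.3 threshold. A minimal sketch using the values from the matrix above (the variable and constant names are chosen for this example):

```python
names = ["Gender", "Bag * Sportswear", "Bag * Body_weight", "Bag * 5min_after"]
corr = [
    [1.000, -0.085, -0.186, 0.041],
    [-0.085, 1.000, -0.028, -0.015],
    [-0.186, -0.028, 1.000, -0.007],
    [0.041, -0.015, -0.007, 1.000],
]
THRESHOLD = 0.3  # cut-off for a "serious" correlation, as stated on the slide

# Collect the absolute correlation of every distinct pair (upper triangle only).
pairs = [
    (abs(corr[i][j]), names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
]

worst = max(pairs)
flagged = [pair for pair in pairs if pair[0] > THRESHOLD]

print("largest |r|:", worst)
print("pairs above threshold:", flagged)
```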
Multicollinearity Test 2
Observed standard errors bounded by a maximum of 0.258

| Variable | B | S.E. | Wald | Sig. | Exp(B) |
| --- | --- | --- | --- | --- | --- |
| Gender | -.335 | .112 | 8.936 | .003 | .715 |
| Bag * Sportswear | .667 | .258 | 6.671 | .010 | 1.949 |
| Bag * Bodyweight | .798 | .258 | 9.605 | .002 | 2.222 |
| Bag * 5min_after | .409 | .120 | 11.533 | .001 | 1.506 |

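The columns of this table are linked by two standard relations: Exp(B) is the odds ratio e^B, and the Wald statistic is (B / S.E.)². The sketch below recomputes both from the rounded B and S.E. values, so small differences from the reported figures are expected.

```python
import math

# (variable, B, S.E., reported Wald, reported Exp(B)) copied from the table above.
rows = [
    ("Gender", -0.335, 0.112, 8.936, 0.715),
    ("Bag * Sportswear", 0.667, 0.258, 6.671, 1.949),
    ("Bag * Bodyweight", 0.798, 0.258, 9.605, 2.222),
    ("Bag * 5min_after", 0.409, 0.120, 11.533, 1.506),
]


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds of a violation when the term goes from 0 to 1."""
    return math.exp(b)


def wald(b, se):
    """Wald chi-square statistic for a single coefficient."""
    return (b / se) ** 2


for name, b, se, reported_wald, reported_exp_b in rows:
    print(f"{name}: Exp(B) = {odds_ratio(b):.3f} (reported {reported_exp_b}), "
          f"Wald = {wald(b, se):.3f} (reported {reported_wald})")
```

Read this way, an Exp(B) of 1.506 for Bag * 5min_after means the odds of a violation are about 1.5 times higher when both indicators are 1, holding the other terms fixed.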
Residual Analysis
Observed no residuals lying beyond 2 standard deviations
Challenges with Classification Accuracy
- Have a skewed class distribution
- Resulting in high baseline accuracy
- Difficult to improve much from the high baseline accuracy
[Bar chart: Violators 522 (18.3 %), Non-Violators 2315 (81.7 %)]
Predicted Probabilities Histograms
- Observed significant overlap
- The default 0.5 gave a bad cut-off threshold
[Histograms of predicted probabilities: Violators, Non-Violators]
Classification Tables
The overall classification accuracy remains the same with increased prediction power for violations

Baseline Classification Table (rows: ground truth, columns: prediction)

| Wrong Way Violation | Predicted Not Violating | Predicted Violating | Percentage Correct |
| --- | --- | --- | --- |
| Not Violating | 465 | 0 | 100.0 |
| Violating | 107 | 0 | 0.0 |
| Total | 572 | 0 | 81.3 |

Classification Table with 0.35 Cut-off (rows: ground truth, columns: prediction)

| Wrong Way Violation | Predicted Not Violating | Predicted Violating | Percentage Correct |
| --- | --- | --- | --- |
| Not Violating | 463 | 2 | 99.6 |
| Violating | 105 | 2 | 1.9 |
| Total | 568 | 4 | 81.3 |

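The percentages in the two classification tables follow from the four cell counts. A short sketch, treating "Violating" as the positive class (the function and key names are chosen for this example):

```python
def metrics(tn, fp, fn, tp):
    """Accuracy, true positive rate and false positive rate from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "tpr": tp / (tp + fn),  # share of actual violators predicted as violating
        "fpr": fp / (fp + tn),  # share of non-violators predicted as violating
    }


# Counts copied from the tables above.
baseline = metrics(tn=465, fp=0, fn=107, tp=0)
cutoff_035 = metrics(tn=463, fp=2, fn=105, tp=2)

for label, m in [("baseline", baseline), ("cut-off 0.35", cutoff_035)]:
    print(label, {k: round(100 * v, 1) for k, v in m.items()})
```

Both tables classify 465 of the 572 rows correctly, which is why the overall accuracy is unchanged even though the 0.35 cut-off identifies two violators.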
The ROC Graph
Observed the model to be better at predicting violations than the baseline at cut-off = 0.35
Multiway Cross-tabulation tests
Lacked a sufficient number of violators in a few categories

Wrong_Way_Violation * Food_or_Beverages Crosstabulation (Count)

| Wrong_Way_Violation | Without food or beverage | With food or beverage | Total |
| --- | --- | --- | --- |
| Not Violating | 2278 | 37 | 2315 |
| Violating | 511 | 11 | 522 |
| Total | 2789 | 48 | 2837 |

Wrong_Way_Violation * Formal_Dressing Crosstabulation (Count)

| Wrong_Way_Violation | Not in formal dress | In formal dress | Total |
| --- | --- | --- | --- |
| Not Violating | 2268 | 47 | 2315 |
| Violating | 510 | 12 | 522 |
| Total | 2778 | 59 | 2837 |

Wrong_Way_Violation * Helmet Crosstabulation (Count)

| Wrong_Way_Violation | Not wearing helmet | Wearing helmet | Total |
| --- | --- | --- | --- |
| Not Violating | 2299 | 16 | 2315 |
| Violating | 520 | 2 | 522 |
| Total | 2819 | 18 | 2837 |

Wrong_Way_Violation * Hoodie Crosstabulation (Count)

| Wrong_Way_Violation | Not wearing hoodie | Wearing hoodie | Total |
| --- | --- | --- | --- |
| Not Violating | 2286 | 29 | 2315 |
| Violating | 517 | 5 | 522 |
| Total | 2803 | 34 | 2837 |

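One way to make the sparse-cell concern concrete is to compute the expected cell counts under independence for each crosstabulation. The sketch below uses the counts from the tables above; the "expected count of at least 5" rule of thumb for chi-square tests is an added assumption, not a criterion stated in the slides.

```python
# Each table: rows = [Not Violating, Violating], columns = [without attribute, with attribute].
tables = {
    "Food_or_Beverages": [[2278, 37], [511, 11]],
    "Formal_Dressing": [[2268, 47], [510, 12]],
    "Helmet": [[2299, 16], [520, 2]],
    "Hoodie": [[2286, 29], [517, 5]],
}
MIN_EXPECTED = 5  # common rule of thumb (an assumption here, not from the slides)


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


smallest = {}
for name, table in tables.items():
    smallest[name] = min(min(row) for row in expected_counts(table))
    status = "sparse" if smallest[name] < MIN_EXPECTED else "ok"
    print(f"{name}: smallest expected count = {smallest[name]:.2f} ({status})")
```

In every table the smallest expected count belongs to the cell for violators who have the attribute, since both that row and that column are the rarer ones.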
Other classifiers
Experimented with other classifiers and achieved a slight increase in overall accuracy

| Classifier | Accuracy % |
| --- | --- |
| Baseline | 81.3 |
| Logistic Regression | 81.3 |
| Parzen windows | 81.64 |
| Linear Perceptron | 81.29 |
| K-Nearest Neighbors | 81.29 |

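For readers unfamiliar with Parzen windows, the sketch below shows the general idea: estimate each class's density at a query point by summing kernel values over that class's training samples, then pick the class with the larger prior-weighted density. The Gaussian kernel, the bandwidth `h`, the feature choice, and the toy data are all assumptions for illustration; the slides do not say which kernel, bandwidth, or features were used.

```python
import math


def gaussian_kernel(x, xi, h):
    """Unnormalised Gaussian window of width h centred on training point xi."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-d2 / (2 * h * h))


def parzen_predict(x, train_X, train_y, h=0.5):
    """Return the class with the largest prior-weighted Parzen density estimate at x.

    Summing kernel values per class is proportional to prior * class density,
    because (n_c / n) * (1 / n_c) * sum(K) = (1 / n) * sum(K).
    """
    scores = {}
    for xi, yi in zip(train_X, train_y):
        scores[yi] = scores.get(yi, 0.0) + gaussian_kernel(x, xi, h)
    return max(scores, key=scores.get)


# Toy data: two hypothetical binary features per rider, label 1 = violating.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
train_y = [0, 0, 0, 1, 1]

print(parzen_predict((1, 1), train_X, train_y))
print(parzen_predict((0, 0), train_X, train_y))
```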
Logistic Regression vs. Parzen Windows
Achieved slightly improved TPR/FPR and overall classification accuracy using Parzen Windows
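
Parzen Window (rows: ground truth, columns: prediction)

| Wrong Way Violation | Predicted Not Violating | Predicted Violating | Percentage Correct |
| --- | --- | --- | --- |
| Not Violating | 465 | 0 | 100 |
| Violating | 105 | 2 | 1.9 |
| Total | 570 | 2 | 81.64 |

Logistic Regression (0.35 cut-off) (rows: ground truth, columns: prediction)

| Wrong Way Violation | Predicted Not Violating | Predicted Violating | Percentage Correct |
| --- | --- | --- | --- |
| Not Violating | 463 | 2 | 99.6 |
| Violating | 105 | 2 | 1.9 |
| Total | 568 | 4 | 81.30 |
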
Questions…