Metis Project 2: Predicting Box Office Gross
![Page 1: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/1.jpg)
Analysis of features most influential in the success of
Jamie Fradkin
January 29, 2016
PREDICTING SUCCESS FOR MOVIES
![Page 2: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/2.jpg)
Problem Statement/Motivation
Create a linear regression model that can predict the Worldwide Gross of movies Based on a True Story by determining the features most influential to their success.
![Page 3: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/3.jpg)
All Features
1) MPAA Rating (G/PG/PG-13/R)
2) Runtime
3) IMDB Score
4) Opening # Theaters (Domestic)
5) Opening Gross (Domestic)
6) Total # Theaters (Domestic)
7) Total Gross (Domestic)
8) Peak movie season*
9) Budget
10) Genre: Action, Adventure, Biography, Comedy, Crime, Documentary, Drama, Family, History, Horror, Music, Mystery, Romance, Sport, Thriller, War, Western
*May, June, July, November, December are highest grossing months (BoxOfficeMojo.com)
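The two categorical features above (peak movie season and genre) need a numeric encoding before they can enter a regression. The slides do not show how this was done, so the following is only a minimal sketch, assuming 0/1 indicator columns. The helper names `is_peak_season` and `genre_indicators` are hypothetical, and the example movie is made up.

```python
# Peak months as listed in the slide footnote: May, June, July, November, December.
PEAK_MONTHS = {5, 6, 7, 11, 12}

# Genre list as given in feature 10.
GENRES = [
    "Action", "Adventure", "Biography", "Comedy", "Crime", "Documentary",
    "Drama", "Family", "History", "Horror", "Music", "Mystery", "Romance",
    "Sport", "Thriller", "War", "Western",
]


def is_peak_season(release_month: int) -> int:
    """Return 1 if the release month (1-12) is a peak month, else 0."""
    return int(release_month in PEAK_MONTHS)


def genre_indicators(movie_genres):
    """Return one 0/1 indicator per genre in GENRES (assumed encoding)."""
    tagged = set(movie_genres)
    return {genre: int(genre in tagged) for genre in GENRES}


# Hypothetical movie released in July and tagged Biography and Drama.
peak_flag = is_peak_season(7)
row = genre_indicators(["Biography", "Drama"])
```

With indicators like these, each genre can later be added to the model as its own column, which is what the genre step near the end of the deck relies on.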
![Page 4: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/4.jpg)
Procedure
• Scrape all relevant data from various sources*, merge into data frame by Title
• Perform OLS regression on training set (70% of data), beginning with all features
• Evaluate model based on p-values for each feature and R², remove features as needed
• Apply new model to remainder of data set
*Boxofficemojo.com, TheNumbers.com, IMDB.com
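The fit-and-hold-out part of this procedure can be sketched as follows. This is not the author's code: the slides do not show the implementation, so the sketch uses plain NumPy least squares and synthetic stand-in data rather than the scraped movie data. The names `fit_ols`, `predict`, and `r_squared` are hypothetical helpers, and the per-feature p-values used in the evaluation step are not computed here.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) via least squares."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply fitted coefficients to a feature matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the project's data): three numeric features
# and a response that is linear in them plus noise.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 100, size=(n, 3))
true_beta = np.array([5.0, 1.5, 2.0, 0.8])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 5, size=n)

# 70/30 split: fit on the training rows, evaluate on the remainder.
idx = rng.permutation(n)
cut = int(0.7 * n)
train, test = idx[:cut], idx[cut:]

beta = fit_ols(X[train], y[train])
r2_train = r_squared(y[train], predict(beta, X[train]))
r2_test = r_squared(y[test], predict(beta, X[test]))
```

Comparing `r2_train` with `r2_test` is one simple way to see whether the model fitted on 70% of the data holds up on the remaining 30%.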
![Page 5: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/5.jpg)
Training Set: Results
R²: 0.925
Adjusted R²: 0.918
Feature (P>|t|):
MPAA Rating: 0.123
Runtime: 0.300
IMDB Score: 0.155
Opening Theaters: 0.002
Opening Gross: 0.005
Total Theaters (Domestic): 0.000
Total Gross (Domestic): 0.000
Peak Movie Season: 0.067
Budget: 0.000
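For reference, the standard relationship between the two fit statistics reported above is given below. The symbols are assumptions, since the slides do not define them: $n$ is the number of observations in the training set and $p$ is the number of predictors excluding the intercept.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

Because the correction factor grows with $p$, adjusted R² penalizes extra features, which makes it the more useful statistic to watch while features are being removed.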
![Page 6: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/6.jpg)
Training Set: Results
Rule: remove feature if p-value > 0.100
R²: 0.925
Adjusted R²: 0.918
Feature (P>|t|):
MPAA Rating: 0.123
Runtime: 0.300
IMDB Score: 0.155
Opening Theaters: 0.002
Opening Gross: 0.005
Total Theaters (Domestic): 0.000
Total Gross (Domestic): 0.000
Peak Movie Season: 0.067
Budget: 0.000
![Page 7: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/7.jpg)
Training Set: Results
R²: 0.920
Adjusted R²: 0.915
Feature (P>|t|):
Opening Theaters: 0.007
Opening Gross: 0.011
Total Theaters (Domestic): 0.000
Total Gross (Domestic): 0.000
Peak Movie Season: 0.069
Budget: 0.000
![Page 8: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/8.jpg)
Training Set: Results
Rule: remove feature if p-value > 0.005
R²: 0.920
Adjusted R²: 0.915
Feature (P>|t|):
Opening Theaters: 0.007
Opening Gross: 0.011
Total Theaters (Domestic): 0.000
Total Gross (Domestic): 0.000
Peak Movie Season: 0.069
Budget: 0.000
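The two removal rules can be checked against the p-values reported on the slides. The sketch below only applies each threshold to the reported numbers. It does not refit the model, so the change in p-values between the first and second fit is taken from the slides, not computed. The function name `apply_p_value_rule` is hypothetical.

```python
def apply_p_value_rule(p_values, threshold):
    """Keep features whose p-value does not exceed the threshold."""
    return [name for name, p in p_values.items() if p <= threshold]


# P>|t| values as reported for the first fit (all features).
first_fit = {
    "MPAA Rating": 0.123,
    "Runtime": 0.300,
    "IMDB Score": 0.155,
    "Opening Theaters": 0.002,
    "Opening Gross": 0.005,
    "Total Theaters (Domestic)": 0.000,
    "Total Gross (Domestic)": 0.000,
    "Peak Movie Season": 0.067,
    "Budget": 0.000,
}

# P>|t| values as reported after refitting without the removed features.
second_fit = {
    "Opening Theaters": 0.007,
    "Opening Gross": 0.011,
    "Total Theaters (Domestic)": 0.000,
    "Total Gross (Domestic)": 0.000,
    "Peak Movie Season": 0.069,
    "Budget": 0.000,
}

kept_after_first = apply_p_value_rule(first_fit, 0.100)
kept_after_second = apply_p_value_rule(second_fit, 0.005)
```

The first pass drops MPAA Rating, Runtime, and IMDB Score. The second pass leaves Total Theaters (Domestic), Total Gross (Domestic), and Budget.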
![Page 9: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/9.jpg)
Final Model
R²: 0.904
Adjusted R²: 0.902
Feature (P>|t|):
Total Theaters (Domestic): 0.000
Total Gross (Domestic): 0.000
Budget: 0.000
Next step: Add in genre categories to determine which one(s) have the lowest p-values and how they affect the model overall
![Page 10: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/10.jpg)
Final Model
R²: 0.905
Adjusted R²: 0.902
Feature (P>|t|):
Total Theaters (Domestic): 0.000
Total Gross (Domestic): 0.000
Budget: 0.000
Romance ❤*: 0.053
*Runners-up: Thriller, Sport, Family
![Page 11: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/11.jpg)
Key Features: Trends in Raw Data
![Page 12: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/12.jpg)
Model Fit: Actual Worldwide Gross vs. Predicted by Feature
[Figure legend: model predictions, raw data, 95% confidence interval]
![Page 13: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/13.jpg)
Model Fit: Prediction Residuals
Residuals increase in magnitude as Worldwide Gross increases: model predictions are not as accurate with extreme cases
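One simple way to check for this pattern is to compare the average residual magnitude in the lower-grossing and higher-grossing halves of the data. The sketch below uses made-up numbers (in millions of dollars) purely for illustration. They are not the project's data, and the helper names `residuals` and `mean_abs` are hypothetical.

```python
def residuals(actual, predicted):
    """Residual = actual value minus model prediction."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_abs(values):
    """Mean absolute value of a list of numbers."""
    return sum(abs(v) for v in values) / len(values)


# Made-up illustration values (millions of dollars), sorted by actual gross.
actual = [10, 25, 40, 80, 150, 300, 600, 900]
predicted = [12, 22, 44, 75, 160, 280, 650, 820]

res = residuals(actual, predicted)
half = len(res) // 2
low_spread = mean_abs(res[:half])    # lower-grossing half
high_spread = mean_abs(res[half:])   # higher-grossing half
```

A `high_spread` much larger than `low_spread` corresponds to the widening residuals described on the slide.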
![Page 14: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/14.jpg)
Model Fit: Prediction Residuals
Accuracy could be improved with more data points in the extreme high-grossing group
![Page 15: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/15.jpg)
Conclusion: Predicting Success of Movies Based on a True Story
• Domestic release (measured by Total Theaters and Total Domestic Gross) is a key indicator of Worldwide Gross
• Total Domestic Gross and Budget are linearly related to Worldwide Gross
• Romance is the genre most strongly associated with Worldwide Gross (p-value 0.053)
![Page 16: Metis Project 2: Predicting Box Office Gross](https://reader035.vdocuments.net/reader035/viewer/2022062401/58f198ec1a28ab15228b45c1/html5/thumbnails/16.jpg)
Next Steps
• Revisit larger data set knowing key features
• Additional features to explore:
  • Actors/actresses
  • Award nominations or wins
• Analyze model with more features, allow more lenient p-values