Predict the Oscars with Data Science
![Page 2: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/2.jpg)
Predicting the Oscars with data science
![Page 3: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/3.jpg)
Data Science Process
• Frame the question.
• Collect the raw data.
• Process the data.
• Explore the data.
• Communicate results.
![Page 4: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/4.jpg)
Frame the question
• Who will win the Oscar for Best Picture?
![Page 5: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/5.jpg)
Collect the Data
• What kind of data do we need?
• Financial data (Budget, box office…)
• Reviews, ratings and scores.
• Awards and nominations.
![Page 6: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/6.jpg)
Process the data
• How’s the data “dirty” and how can we fix it?
• User input, redundancies, missing data…
• Formatting: adapt the data to meet certain specifications.
• Cleaning: detecting and correcting corrupt or inaccurate records.
![Page 7: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/7.jpg)
Explore the data
• What are the meaningful patterns in the data?
• How meaningful is each data point for our predictions?
![Page 8: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/8.jpg)
Goals
• Introduction to a data scientist's tools and methods:
• Jupyter notebooks, numpy, pandas, sklearn…
• Overview of basic machine learning concepts:
• Data formatting and cleaning, Decision trees, Overfitting, Random Forests…
![Page 9: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/9.jpg)
Jupyter Notebooks
• One of a data scientist’s everyday tools.
• Find the links in our classroom tool.
• Contains cells with code.
![Page 10: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/10.jpg)
NumPy
• The fundamental package for scientific computing with Python.
• Provides powerful multi-dimensional array objects.
• Many methods for fast operations on arrays.
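A minimal sketch of what "fast operations on arrays" looks like in practice. The budget and gross figures are invented for illustration and are not taken from the workshop data.

```python
import numpy as np

# Hypothetical figures in millions of dollars for three films.
budget = np.array([30.0, 1.5, 18.0])
gross = np.array([446.0, 27.9, 65.0])

# Element-wise arithmetic runs over the whole array without an explicit loop.
return_ratio = gross / budget

# A two-dimensional array: one row per film, one column per measurement.
table = np.column_stack([budget, gross])

print(table.shape)
print(return_ratio.round(1))
```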
![Page 11: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/11.jpg)
Pandas
• Fundamental high-level building block for doing practical, real world data analysis in Python.
• Built on top of NumPy.
• Offers data structures and operations for manipulating numerical tables and time series.
![Page 12: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/12.jpg)
Scikit-learn
• Python module for machine learning.
• Built on top of NumPy and SciPy; provides tools for classification, regression, clustering, preprocessing, and model evaluation.
![Page 13: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/13.jpg)
Initial imports and loading data with Pandas
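The code on this slide is not part of the transcript, so the following is only a sketch of a typical first notebook cell. The column names and rows are placeholders; in the workshop the data would come from a file or URL rather than an inline string.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the workshop's data file; columns and values are invented.
csv_text = """title,year,rate,metascore,nominations,won
Film A,2001,PG-13,72,8,1
Film B,2001,R,85,5,0
Film C,2002,PG,64,,0
Film D,2002,R,91,11,1
"""

# pd.read_csv accepts a path, a URL, or any file-like object.
train = pd.read_csv(io.StringIO(csv_text))

print(train.shape)
print(train.dtypes)
```

Note that the empty `nominations` cell for Film C is loaded as a missing value (NaN), which is what the cleaning step later has to deal with.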
![Page 14: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/14.jpg)
Understanding your data
• .head(n) method: Returns the first n rows (5 by default).
• .value_counts() method: Returns how many times each unique value appears in a column.
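Both methods can be tried on a small table. The DataFrame below is made up for the example; `rate` and `won` are assumed column names.

```python
import pandas as pd

# Toy data; the column names are assumptions, not the workshop's schema.
train = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "rate": ["R", "PG-13", "R", "PG", "R"],
    "won": [1, 0, 0, 0, 1],
})

# First two rows of the table.
first_rows = train.head(2)

# How often each rating appears in the "rate" column.
rate_counts = train["rate"].value_counts()

# normalize=True returns proportions instead of raw counts.
win_share = train["won"].value_counts(normalize=True)

print(first_rows)
print(rate_counts)
print(win_share)
```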
![Page 15: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/15.jpg)
Formatting your Data
![Page 16: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/16.jpg)
Formatting your Data
• Rate values are stored in a non-numeric format. We therefore assign each rate a unique integer so that scikit-learn can handle the information.
• With the .loc indexer (the replacement for .ix, which was removed in pandas 1.0) you select a subset of rows and assign a value to a chosen column for those rows.
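A sketch of this encoding step, written with `.loc` because `.ix` no longer exists in current pandas. The rating labels and the integer codes are assumptions chosen for the example.

```python
import pandas as pd

# Toy column of film ratings; the labels are illustrative.
train = pd.DataFrame({"rate": ["G", "PG", "PG-13", "R", "PG"]})

# Hypothetical mapping from each label to a unique integer.
rate_codes = {"G": 0, "PG": 1, "PG-13": 2, "R": 3}

# Start with a sentinel value, then fill one label at a time.
train["rate_code"] = -1
for label, code in rate_codes.items():
    # Boolean mask selects the rows; the second argument selects the column.
    train.loc[train["rate"] == label, "rate_code"] = code

print(train)
```

The same result can be had in one line with `train["rate"].map(rate_codes)`; the loop is shown because it mirrors the row-subset assignment described on the slide.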
![Page 17: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/17.jpg)
Cleaning your Data
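This slide is an image only, so the exact cleaning steps used in the workshop are not known. The sketch below shows three common ones on invented data: dropping duplicate rows, converting a currency string to a number, and filling a missing score with the median.

```python
import numpy as np
import pandas as pd

# Invented rows with typical problems: a duplicate, text amounts, a missing score.
train = pd.DataFrame({
    "title": ["Film A", "Film B", "Film B", "Film C"],
    "metascore": [72.0, 85.0, 85.0, np.nan],
    "gross": ["$1,200,000", "$850,000", "$850,000", "$3,400,000"],
})

# Remove exact duplicate rows and renumber the index.
train = train.drop_duplicates().reset_index(drop=True)

# Strip "$" and "," so the amounts can be stored as integers.
train["gross"] = train["gross"].str.replace(r"[$,]", "", regex=True).astype(int)

# Replace missing scores with the median of the scores that are present.
train["metascore"] = train["metascore"].fillna(train["metascore"].median())

print(train)
```

Filling with the median is only one option; dropping the incomplete rows is another, and which is better depends on how much data is missing.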
![Page 18: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/18.jpg)
Decision Trees
• It breaks down a dataset into smaller and smaller subsets.
• The final result is a model with a tree structure that has:
• Decision nodes: ask a question and have two or more branches.
• Leaf nodes: represent a classification or decision.
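To see decision nodes and leaf nodes in a fitted model, scikit-learn can print a tree as text. This is a small sketch on invented data; the feature names are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Six made-up films: [metascore, nominations] and whether each one won.
feature_names = ["metascore", "nominations"]
features = np.array([[72, 8], [85, 5], [64, 2], [91, 11], [58, 1], [88, 9]])
target = np.array([0, 0, 0, 1, 0, 1])

model = DecisionTreeClassifier(random_state=0).fit(features, target)

# Lines containing "<=" or ">" are decision nodes; lines with "class:" are leaves.
rules = export_text(model, feature_names=feature_names)
print(rules)
```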
![Page 19: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/19.jpg)
![Page 20: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/20.jpg)
Classification vs Regression
• Classification: predict categories.
• Identifying group membership.
• Regression: predict values.
• Involves estimating or predicting a response.
![Page 21: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/21.jpg)
Classification
![Page 22: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/22.jpg)
Classification
?
![Page 23: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/23.jpg)
Creating your first Decision Tree
You will use the scikit-learn and NumPy libraries to build your first decision tree. To build it, we need the following:
• target: A one-dimensional NumPy array containing the target from the training data.
• features: A multidimensional NumPy array containing the features/predictors from the training data.
![Page 24: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/24.jpg)
Creating your first Decision Tree
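The slide's code is not in the transcript. A minimal sketch of building the `target` and `features` arrays and fitting a tree could look like this; the DataFrame, its column names, and the variable name `my_tree` are assumptions.

```python
import pandas as pd
from sklearn import tree

# Invented training data; "won" is the assumed name of the target column.
train = pd.DataFrame({
    "metascore":   [72, 85, 64, 91, 58, 88, 79, 95],
    "nominations": [8, 5, 2, 11, 1, 9, 4, 12],
    "rate_code":   [2, 3, 1, 3, 2, 3, 1, 2],
    "won":         [0, 0, 0, 1, 0, 1, 0, 1],
})

# target: one-dimensional array with one label per film.
target = train["won"].values

# features: two-dimensional array, one row per film and one column per predictor.
features = train[["metascore", "nominations", "rate_code"]].values

# random_state fixes the tie-breaking so the result is reproducible.
my_tree = tree.DecisionTreeClassifier(random_state=1)
my_tree = my_tree.fit(features, target)

print(my_tree.get_depth(), my_tree.get_n_leaves())
```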
![Page 25: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/25.jpg)
Importances and Score
• .feature_importances_ attribute: tells us how important the features are for the final result.
• .score() method: returns the mean accuracy of the model's predictions on the data and labels you pass in.
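A short sketch of reading both values from a fitted tree, using invented data and assumed feature names. Scoring on the same rows the tree was trained on, as done here, tends to give an optimistic number.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: [metascore, nominations, rate_code] per film.
feature_names = ["metascore", "nominations", "rate_code"]
features = np.array([
    [72, 8, 2], [85, 5, 3], [64, 2, 1], [91, 11, 3],
    [58, 1, 2], [88, 9, 3], [79, 4, 1], [95, 12, 2],
])
target = np.array([0, 0, 0, 1, 0, 1, 0, 1])

my_tree = DecisionTreeClassifier(random_state=1).fit(features, target)

# One importance per feature; the values are normalized to sum to 1.
importances = my_tree.feature_importances_
for name, value in zip(feature_names, importances):
    print(f"{name}: {value:.2f}")

# Mean accuracy on the rows passed in (here, the training rows).
train_accuracy = my_tree.score(features, target)
print(train_accuracy)
```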
![Page 26: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/26.jpg)
Importances and Score
![Page 27: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/27.jpg)
Predicting
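The prediction code on this slide is not in the transcript. As a rough sketch, predicting means passing the same feature columns for unseen films to `.predict()`. The nominee rows below are invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: [metascore, nominations] per film, and whether it won.
train_features = np.array([
    [72, 8], [85, 5], [64, 2], [91, 11],
    [58, 1], [88, 9], [79, 4], [95, 12],
])
train_target = np.array([0, 0, 0, 1, 0, 1, 0, 1])

my_tree = DecisionTreeClassifier(random_state=1).fit(train_features, train_target)

# Hypothetical nominees for a new year, with the same columns in the same order.
test_features = np.array([[96, 13], [60, 2], [70, 3]])

# One predicted label per nominee: 1 for "wins", 0 for "does not win".
prediction = my_tree.predict(test_features)

# Estimated probability of the "wins" class for each nominee.
win_probability = my_tree.predict_proba(test_features)[:, 1]

print(prediction)
print(win_probability)
```

Nothing forces the model to pick exactly one winner per year, so ranking the nominees by `win_probability` is one way to turn the output into a single pick.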
![Page 28: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/28.jpg)
Pretty bad results :(
Let’s improve it!
![Page 29: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/29.jpg)
Let’s improve it!
![Page 30: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/30.jpg)
Modify the feature list
![Page 31: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/31.jpg)
Run the prediction again
![Page 32: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/32.jpg)
Overfitting
• The resulting model is too closely tied to the training set.
• It doesn’t generalize to new data, which is the point of prediction.
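One common way to spot overfitting is to hold out part of the data and compare accuracy on the training rows with accuracy on the held-out rows. The sketch below uses synthetic data; limiting `max_depth` and `min_samples_split` is one possible remedy, not necessarily the one used in the workshop.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: only the first column carries signal, and the labels are noisy.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unconstrained tree keeps splitting until it has memorized the training rows.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A constrained tree is forced to stop early and keep only the broad patterns.
shallow_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_split=10, random_state=0
).fit(X_train, y_train)

for name, model in [("deep", deep_tree), ("shallow", shallow_tree)]:
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))
```

A large gap between the training score and the held-out score is the usual sign that a model is overfitting.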
![Page 33: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/33.jpg)
Random Forest Classifier
• Random Forest Classifiers use many Decision Trees to build a classifier.
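• We introduce a bit of randomness: each tree is trained on a random sample of the rows and considers a random subset of the features at each split.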
• Each Tree can give a different answer (a vote). The final classification is the most common amongst the Trees.
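A minimal sketch of a random forest on invented data. The parameter values are illustrative assumptions rather than the settings from the workshop.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up training data: [metascore, nominations, rate_code] per film.
features = np.array([
    [72, 8, 2], [85, 5, 3], [64, 2, 1], [91, 11, 3],
    [58, 1, 2], [88, 9, 3], [79, 4, 1], [95, 12, 2],
])
target = np.array([0, 0, 0, 1, 0, 1, 0, 1])

# n_estimators is the number of trees; max_depth keeps each tree small.
forest = RandomForestClassifier(
    n_estimators=100, max_depth=3, random_state=1
)
forest = forest.fit(features, target)

# Each fitted tree is available individually; the forest combines their votes.
print(len(forest.estimators_))
print(forest.feature_importances_)

# Hypothetical nominees to classify.
test_features = np.array([[96, 13, 3], [60, 2, 1]])
prediction = forest.predict(test_features)
print(prediction)
```

The forest exposes the same `.fit()`, `.predict()`, `.score()` and `.feature_importances_` interface as a single decision tree, so the earlier steps can be rerun with only the model line changed.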
![Page 34: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/34.jpg)
Random Forest Classifier
![Page 35: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/35.jpg)
Importances and Score
![Page 36: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/36.jpg)
Predicting with Random Forest Classifiers
![Page 37: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/37.jpg)
Results
![Page 38: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/38.jpg)
1976
Rocky
![Page 39: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/39.jpg)
1984
Amadeus
![Page 40: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/40.jpg)
1996
The English Patient
![Page 41: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/41.jpg)
2009
The Hurt Locker
![Page 42: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/42.jpg)
And the Oscar goes to…
![Page 43: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/43.jpg)
La La Land!!
![Page 44: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/44.jpg)
![Page 45: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/45.jpg)
![Page 46: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/46.jpg)
The End
Nothing happened after that.
Right?? RIGHT??
![Page 47: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/47.jpg)
We can predict the Oscars
Except for 2017 ¯\_(ツ)_/¯
![Page 48: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/48.jpg)
![Page 49: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/49.jpg)
More about Thinkful
• Anyone who’s committed can learn programming or data science
• 1-on-1 mentorship is the best way to learn
• Flexibility matters — learn anywhere, anytime
![Page 50: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/50.jpg)
Our Program
You’ll learn concepts, practice with drills, and build capstone projects — all guided by a personal mentor
![Page 51: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/51.jpg)
Our Mentors
Mentors have, on average, 10+ years of experience
![Page 52: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/52.jpg)
Data Science Syllabus
• Managing data with SQL and Python
• Modeling with both supervised and unsupervised models
• Data visualization and communicating with data
• Technical interviews + Career prep
![Page 53: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/53.jpg)
Web Development Syllabus
• Frontend Development (HTML, CSS, JavaScript)
• Backend Development (Node.js)
• Frontend Frameworks (React.js)
• Computer Science Fundamentals
• Technical interviews + Career prep
![Page 54: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/54.jpg)
Our Results
Job Titles after Graduation
Months until Employed
![Page 55: Predict the Oscars with Data Science](https://reader031.vdocuments.net/reader031/viewer/2022030313/58ece61f1a28abb1118b4841/html5/thumbnails/55.jpg)
Special Prep Course Offer
• Three-week program, includes nine mentor sessions for $250 (regular price $500)
• Introduction to Programming in Python, Data Visualization, and Statistics
• Option to continue into full data science bootcamp
• Talk to me (or email me) if you’re interested