Feature Engineering Studio
September 9, 2013
Welcome to Problem Proposal Day
• Rules for Presenters
• Rules for the Rest of the Class
Rules for Presenters
• Talk for 3 minutes on:
  – Data set
  – What variable will you predict?
  – What kind of variables will you use to predict it?
  – Why is this worth doing?
• Remember to send me your slides (if any)
Rules for Audience
• After the presentation
  – Ask quick questions
  – Give quick suggestions
Criteria
• Everyone
  – Is the problem genuinely important? (usable or publishable)
  – Is there a good measure of ground truth?
• Only if you know what you’re talking about
  – Is there rich enough data to distill meaningful features?
  – Is there enough data to be able to take advantage of data mining?
Rules for Audience
• Be polite!
• No interrupting
• No rambling
• No being mean
First Step
• Get into the right collaborative spirit
• You are officially encouraged (though not required) to sing along
• http://www.youtube.com/watch?v=pd_5-2kCzfs
  – 0:25
Presentations
• Alphabetical Order Based on Last Name
  – Tie-Breaker: First Name
For next week
• Think about how to improve your problem proposal
• Rewrite your problem proposal based on the feedback you got today
• Then email it to me for further feedback and a “thumbs-up” before the next class
Assignment 2
• Data Familiarization: “Mucking Around”
• Get your data set
• Open it in Excel
• Look at your ground truth label (if you have one)
• Look at other key variables
• What does each variable mean semantically?
• If numerical, what are its max, min, average, stdev? Create histograms of key variables.
• If categorical, what is the distribution of each value?
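The numerical summary asked for above can also be sketched outside Excel. Below is a minimal Python version, assuming the column has already been read into a list and that blank cells show up as `None` (both assumptions for illustration, not part of the assignment; the function name `summarize_numeric` and the scores are hypothetical).

```python
import statistics

def summarize_numeric(values):
    """Return the max, min, average, and sample standard deviation of a column."""
    # Skip blank cells (None here), as Excel's MAX/MIN/AVERAGE/STDEV skip empty cells.
    nums = [float(v) for v in values if v is not None]
    return {
        "max": max(nums),
        "min": min(nums),
        "average": statistics.mean(nums),
        # Sample (n - 1) standard deviation, like Excel's STDEV; needs at least two values.
        "stdev": statistics.stdev(nums),
    }

if __name__ == "__main__":
    # Hypothetical column of scores with one blank cell.
    scores = [2, 4, 4, 4, 5, 5, 7, 9, None]
    print(summarize_numeric(scores))
```

For the example column the average is 5.0 and the sample standard deviation is about 2.14; the population version (Excel's STDEVP, Python's `statistics.pstdev`) would give 2.0, so it is worth knowing which one a report uses.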
Assignment 2
• Data Familiarization: “Mucking Around”
• Write a brief report for me
• You don’t need to prepare a presentation
• But be ready to discuss what you learn about your data
What if you don’t have data yet?
1. Get your data
2. If you can’t get your data before class, email me at least 48 hours before class and I’ll send you a practice data set
How to compute in Excel
• If numerical, what are its max, min, average, stdev?
• If categorical, what is the distribution of each value?
• Using Class2Data
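In Excel, a COUNTIF per category or a PivotTable gives the distribution of a categorical variable. For anyone working outside Excel, here is a minimal Python sketch of the same count-and-share table; the function name `value_distribution` and the labels are hypothetical and are not drawn from Class2Data.

```python
from collections import Counter

def value_distribution(values):
    """Count each category and its share of the non-blank cells."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    # most_common() orders categories from most to least frequent.
    return [(value, n, n / total) for value, n in counts.most_common()]

if __name__ == "__main__":
    # Hypothetical labels for six rows.
    labels = ["yes", "no", "yes", "maybe", "yes", "no"]
    for value, n, share in value_distribution(labels):
        print(f"{value}: {n} ({share:.1%})")
```

Reporting shares alongside raw counts makes a skewed ground-truth label easy to spot, which matters later when judging how well a model beats simply guessing the most common value.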
How to do a histogram in Excel
• Using Class2Data
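Excel provides the FREQUENCY function and the Analysis ToolPak's Histogram tool for this. The Python sketch below is a hypothetical stand-in, not the course's procedure: it uses equal-width bins where each bin includes its left edge and the last bin also includes the maximum. Excel's FREQUENCY instead counts values up to and including each bin's upper bound, so values sitting exactly on an edge may be counted differently. The function name and the response times are made up for illustration.

```python
def histogram(values, num_bins=5):
    """Count values into equal-width bins spanning [min, max]."""
    nums = [float(v) for v in values if v is not None]
    lo, hi = min(nums), max(nums)
    # Fall back to width 1.0 when every value is identical (zero range).
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for x in nums:
        # Clamp so the maximum value lands in the last bin instead of one past it.
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts

if __name__ == "__main__":
    # Hypothetical response times in seconds.
    times = [1.2, 2.5, 2.7, 3.1, 3.3, 4.8, 5.0, 7.9, 9.6, 10.0]
    edges, counts = histogram(times, num_bins=3)
    for k, n in enumerate(counts):
        print(f"{edges[k]:5.1f} to {edges[k + 1]:5.1f}: {'#' * n}")
```

Trying a few values of `num_bins` is a cheap way to see whether a variable's shape (skew, multiple peaks, outliers) depends on the binning choice.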
Next Class
• 9/23 Feature distillation in Excel (Asgn. 2 due)
  – Do the assignment
  – Read the readings
Upcoming Classes
• 9/23 Feature distillation in Excel (Asgn. 2 due)
• 9/25 Special session on prediction models
  – Come to this if you don’t know why student-level cross-validation is important, or if you don’t know what J48 is
• 9/30 Advanced feature distillation in Excel (Asgn. 3 due)
• 10/2 Special session on RapidMiner
  – Come to this if you’ve never built a classifier or regressor in RapidMiner (or a similar tool)
  – Statistical significance tests using linear regression don’t count…
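The core idea behind student-level cross-validation is that all rows from a given student go into the same fold, so a model is never tested on a student whose data it was trained on. Below is a minimal Python sketch of that fold assignment, assuming each row carries a student ID; the function names and data are hypothetical, and dealing students round-robin into folds is a simple choice made here for illustration that does not balance row counts across folds. scikit-learn's `GroupKFold` enforces the same grouping constraint if you prefer a library.

```python
from collections import defaultdict

def student_level_folds(rows, student_of, k=3):
    """Split rows into k folds so each student's rows fall in exactly one fold.

    rows: list of records; student_of: function returning a record's student ID.
    """
    by_student = defaultdict(list)
    for r in rows:
        by_student[student_of(r)].append(r)
    # Deal whole students round-robin into folds (sorted IDs keep this deterministic).
    folds = [[] for _ in range(k)]
    for i, sid in enumerate(sorted(by_student)):
        folds[i % k].extend(by_student[sid])
    return folds

def train_test_splits(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        yield train, test

if __name__ == "__main__":
    # Hypothetical (student_id, feature, label) rows: several rows per student.
    data = [("s1", 0.2, 1), ("s1", 0.4, 0), ("s2", 0.9, 1),
            ("s3", 0.1, 0), ("s3", 0.3, 0), ("s4", 0.7, 1)]
    for train, test in train_test_splits(student_level_folds(data, lambda r: r[0], k=2)):
        train_ids = {r[0] for r in train}
        test_ids = {r[0] for r in test}
        assert not (train_ids & test_ids)  # no student appears on both sides
        print(sorted(test_ids), "held out")
```

Splitting by row instead would let the same student's actions appear in both training and test sets, which tends to make performance estimates look better than they will be on new students.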