Predict house prices in Taichung to create an online service for the real estate market
TRANSCRIPT
Team 7: Andi, Guerman, Raju
Business Problem
Stakeholders: Customers, Real Estate Agents
Challenge: Estimate the price accurately
Opportunity: Become an essential tool for the current real estate market in Taichung
Humanity consideration: Help the population of Taichung get fair prices
Data Mining Problem
Supervised Task: Find a relationship between the predictor variables and the price of new houses.
Predictive: Provide predictive analysis for pricing of new houses.
How to be deployed: Try different subsets of predictors, including external data and derived variables.
Data Description
Data Preparation - Translate Chinese characters to English
Data Preparation - Data cleanup and missing value handling: 2000 rows reduced to 995
Data Analysis - Visual representation of the data through scatter plots
Data Binning - Binned the data into the useful variables shown on the following slide
Data Partitioning - Training: 60%, Validation: 40%
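The 60/40 partitioning step can be sketched as follows. This is a minimal illustration, not the team's actual tool or procedure: the function name `partition`, the seed, and the use of a random shuffle are assumptions, and the list of integers only stands in for the 995 cleaned records.

```python
import random

def partition(rows, train_fraction=0.6, seed=0):
    """Shuffle rows reproducibly, then split into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(995))  # stand-in for the 995 cleaned records (assumption)
train, valid = partition(rows)
print(len(train), len(valid))  # 597 398
```

With 995 rows, a 60/40 split gives 597 training and 398 validation records.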
Data Description
District | Transaction type | latitude | longitude | distance_mrt | area/avg | EGR | floorbin | Pattern | age | total building area | number of rooms | number of bathrooms | total price | Price per square meter
zhong1qu1 | labu | 24.14226 | 120.6796 | 0.822103 | 0.326183 | 2.83 | 2 | ResBuild | 32.35616 | 29.65 | 1 | 1 | 500000 | 16863
zhong1qu1 | labu | 24.14589 | 120.6802 | 0.898648 | 0.348405 | 2.83 | 1 | ResBuild | 32.85479 | 31.67 | 1 | 1 | 1000000 | 31576
zhong1qu1 | labu | 24.14407 | 120.6752 | 0.425695 | 0.520132 | 2.83 | 1 | Suite | 32.68493 | 47.28 | 2 | 2 | 1100000 | 23266
zhong1qu1 | labu | 24.14138 | 120.6771 | 0.738148 | 0.245655 | 2.83 | 1 | Suite | 21.6 | 22.33 | 1 | 1 | 1100000 | 49261
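In the sample rows above, the price per square meter is consistent with total price divided by total building area. The check below is a small sketch under the assumption that the area column is in square meters; the function name `price_per_square_meter` is illustrative only.

```python
# (total price, total building area, reported price per square meter)
# Values copied from the four sample rows; the area unit is assumed to be square meters.
samples = [
    (500000, 29.65, 16863),
    (1000000, 31.67, 31576),
    (1100000, 47.28, 23266),
    (1100000, 22.33, 49261),
]

def price_per_square_meter(total_price, area):
    """Derived variable: total price divided by total building area."""
    return total_price / area

for total_price, area, reported in samples:
    derived = price_per_square_meter(total_price, area)
    print(round(derived), reported)  # the two numbers agree for each sample row
```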
• Transaction_land building
• Longitude_latitude (Ext)
• Distance_to_MRT (Ext)
• Area/average (Ext)
• Floor_bin (D)
• Building_pattern
• Age_of_the_building (D)
• Total_building_area
• Number_of_rooms
• Number_of_bathrooms
• Price_persquare_meter
Output:
Data Visualization
Method and Evaluation
KNN Algorithm
Multiple Linear Regression
Training Data Scoring - Summary Report (for k = 2)
Total sum of squared errors: 8116666667
RMS Error: 3,687.242406
Average Error: -1.56E-12
Validation Data Scoring - Summary Report (for k = 2)
Total sum of squared errors: 1.91469E+15
RMS Error: 2,193,350.823
Average Error: -2,123.90 (-70.7963 USD)
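For reference, k-nearest-neighbors regression with k = 2 can be sketched as below. This is a generic illustration, not the software that produced the reports above: the predictors, the prices, and the choice of Euclidean distance on already-normalized values are all assumptions.

```python
import math

def knn_predict(train_x, train_y, query, k=2):
    """Average the target of the k training points closest to `query`."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical normalized predictors and prices (not taken from the data set).
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [100.0, 200.0, 300.0, 900.0]
print(knn_predict(train_x, train_y, (0.9, 0.1)))  # 150.0
```

When a training record is scored, it is its own nearest neighbor, so a small k tends to give a much lower error on training data than on validation data.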
Training Data Scoring - Summary Report
Total sum of squared errors: 9.25995E+14
RMS Error: 1,245,423.591
Average Error: -3.03642E-05
Validation Data Scoring - Summary Report
Total sum of squared errors: 5.93783E+14
RMS Error: 1,221,440.97
Average Error: -747.06 (-24.902 USD)
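A multiple linear regression fit can be sketched with ordinary least squares via the normal equations. This is a minimal, dependency-free illustration and not the team's actual tool; the function names and the toy data (which follow y = 2 + 3*x1 - x2 exactly) are assumptions made for the example.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares: solve (X^T X) b = X^T y for the coefficients b."""
    rows = [[1.0, *x] for x in xs]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def predict(coefs, x):
    """Intercept plus the weighted sum of the predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))

# Toy data generated from y = 2 + 3*x1 - x2 (hypothetical, not house prices).
xs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
ys = [3, 7, 6, 11, 9]
coefs = fit_linear_regression(xs, ys)
print([round(c, 6) for c in coefs])
```

On this toy data the fit recovers an intercept near 2 and slopes near 3 and -1.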
Naïve Benchmark
Total sum of squared errors: 1.3053E+16
RMS Error: 114,249,854.8
Average Error: 7,616,564 (253,885.5 USD)
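The three reported measures and the naive benchmark can be sketched as below. The error sign convention (actual minus predicted), the use of the training mean as the naive prediction, and all function names are assumptions, and the prices are toy values.

```python
import math

def score(actual, predicted):
    """Return (total sum of squared errors, RMS error, average error)."""
    errors = [a - p for a, p in zip(actual, predicted)]  # assumed sign convention
    sse = sum(e * e for e in errors)
    return sse, math.sqrt(sse / len(errors)), sum(errors) / len(errors)

def naive_predictions(train_prices, n):
    """Naive benchmark: predict the training-set mean price for every record."""
    mean_price = sum(train_prices) / len(train_prices)
    return [mean_price] * n

def rms_from_sse(sse, n):
    """RMS error implied by a sum of squared errors over n records."""
    return math.sqrt(sse / n)

# Toy prices (hypothetical).
train_prices = [100.0, 200.0, 300.0]
valid_prices = [150.0, 250.0]
result = score(valid_prices, naive_predictions(train_prices, len(valid_prices)))
print(result)  # (5000.0, 50.0, 0.0)

# Consistency check on a reported figure, assuming 398 validation records.
print(rms_from_sse(5.93783e14, 398))
```

With 597 training and 398 validation records, the reported sums of squared errors reproduce the reported RMS errors of both models up to rounding. The naive benchmark's reported RMS error instead matches the square root of its sum of squared errors with no division by the number of records, so it may not be on the same scale as the other RMS values.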
Method
[Chart: Validation MLR. Series: Predicted Value, Actual Value. Vertical axis from 0 to 40,000,000.]
Recommendations
• Run the model monthly with updated data
• Create an alternative source of data by giving customers the option to upload their home information
• Split the data according to the transaction type
• Try external data to increase accuracy
• Automate the system through the online page
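Splitting the data by transaction type, so that a separate model could be fitted per group, might look like the sketch below. The field name `transaction_type` and the value `other` are hypothetical; only `labu` appears in the sample rows.

```python
from collections import defaultdict

def split_by_transaction_type(records):
    """Group records by transaction type so each group can be modeled on its own."""
    groups = defaultdict(list)
    for record in records:
        groups[record["transaction_type"]].append(record)
    return dict(groups)

# Hypothetical records; "other" is a made-up transaction type for illustration.
records = [
    {"transaction_type": "labu", "total_price": 500000},
    {"transaction_type": "other", "total_price": 750000},
    {"transaction_type": "labu", "total_price": 1000000},
]
groups = split_by_transaction_type(records)
print({name: len(rows) for name, rows in groups.items()})  # {'labu': 2, 'other': 1}
```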