Google Confidential and Proprietary 1
Predicting the Present With Google Trends
Hyunyoung Choi
Hal Varian
June 2009
Problem statement
Government agencies and other organizations produce monthly reports on economic activity
• Retail Sales
• House Sales
• Automotive Sales
• Unemployment
Problems with reports
• Compilation delay of several weeks
• Subsequent revisions
• Sample size may be small
• Not available at all geographic levels
Google Trends releases daily and weekly index of search queries by industry vertical
• Real time data
• No revisions (but some sampling variation)
• Large samples
• Available by country, state and city
Can Google Trends data help predict current economic activity?
• Before release of preliminary statistics
• Before release of final revision
Categories in Google Trends by Query Shares
Note: Queries from 2009-01-01 to 2009-04-30 & Growth Comparison w/ the same time window
Real Estate
Geography
Category
Time window
Subcategories under Real Estate by Query Shares
• Real Estate Agencies
• Rental Listings & Referrals
• Home Insurance
• Home Inspections & Appraisal
• Property Management
• Home Financing
Search on Real Estate Agencies
Searches on Rental Listings & Referrals
Depicting trends
Google Trends measures the normalized query share of a particular category of queries, which controls for overall growth.
Often useful to look at year-on-year changes to eliminate seasonality.
Illustrate correlations and covariates.
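The year-on-year transformation mentioned above can be sketched as follows. This is a minimal illustration, assuming a monthly index (so the comparison lag is 12); the function name `yoy_growth` and the sample numbers are hypothetical and not taken from the slides.

```python
def yoy_growth(series, period=12):
    """Percent change of each value versus the value `period` steps earlier."""
    return [
        100.0 * (series[i] - series[i - period]) / series[i - period]
        for i in range(period, len(series))
    ]

# Hypothetical monthly index: the same seasonal shape in both years,
# with the second year scaled up by 10%.
year1 = [100, 105, 120, 130, 140, 150, 145, 135, 120, 110, 100, 95]
year2 = [v * 1.10 for v in year1]

growth = yoy_growth(year1 + year2)
# Every entry is about 10: the seasonal shape cancels and only the
# underlying growth remains.
```

For a weekly index the same function could be called with `period=52`, which is again an assumption about how the comparison would be aligned.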
Improving predictions
Forecast time series using its own lagged values and add Trends data as a predictor.
• Statistical significance?
• Improved fit?
• Improved forecasts?
• Identify turning points?
[Figure: Real Estate Agencies Query Index and Real Estate Agencies YOY Growth Index, 2006 to 2008]
15 yr Mortgage Rate vs. Home Financing
Forecasting primer
Basic forecasting models
Autoregressive: value at time t depends on
• Value at time t-1
Seasonal adjustment: value at time t depends on
• Value at time t-12
• For monthly data
Transfer function: value at time t depends on
• Other contemporaneous or lagging variables
Seasonal autoregressive transfer model: Value at time t depends on
• Value at time t-12 (seasonality)
• Value at time t-1 (recent behavior)
• Other lagging or contemporaneous variables (such as Google Trends data)
Typical question of interest
• How much more accurate forecasts can you get from additional variables over and above the accuracy
you get with the history of the time series itself?
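A seasonal autoregressive transfer model of the kind listed above can be sketched with ordinary least squares. This is only an illustration under stated assumptions: the functions `solve` and `fit_seasonal_ar` are hypothetical helpers, the data are synthetic and noise-free, and the slides do not say which software or estimator was actually used.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_seasonal_ar(y, x):
    """Least-squares fit of y[t] = b0 + b1*y[t-1] + b2*y[t-12] + b3*x[t]."""
    rows = [[1.0, y[t - 1], y[t - 12], x[t]] for t in range(12, len(y))]
    target = [y[t] for t in range(12, len(y))]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic data generated from known (hypothetical) coefficients.
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(132)]      # stand-in for a Trends index
y = [60.0 + rng.uniform(0.0, 10.0) for _ in range(12)]
for t in range(12, 132):
    y.append(5.0 + 0.5 * y[t - 1] + 0.3 * y[t - 12] + 2.0 * x[t])

coef = fit_seasonal_ar(y, x)  # [intercept, lag-1, lag-12, contemporaneous x]
```

Dropping the `x[t]` column gives the baseline model that uses only the history of the series, which is the comparison the "typical question of interest" asks about.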
New Home Sales
Model
Time Series
• Recent Trend with New Home Sales at t-1
• Seasonality with New Home Sales at t-12
Google Trends
• Recent Search Activity on Real Estate Agencies, Rental Listings & Referrals, Home Inspections & Appraisal, Property Management, Home Insurance, Home Financing
Exogenous Variables
• Housing affordability with Average/Median Home Price
Predicting the present
New Residential Sales from US Census
• Monthly release 24 – 28 days after the month
• Seasonally adjusted
• National and Regional aggregate
Google Trends Real Estate by Category
• Home Inspections & Appraisal
• Home Insurance
• Home Financing
• Property Management
• Rental Listings & Referrals
• Real Estate Agencies
New House Sales vs. Real Estate Google Trends
Model:
$Y_t = 446.1 + 0.864\,Y_{t-1} - 4.340\,\mathrm{us378.1}_t + 4.198\,\mathrm{us96.2}_t - 0.001\,\mathrm{AvgP}_{t-1}$
$Y_t$: New houses sold in month t
$\mathrm{AvgP}_{t-1}$: Average sales price of new one-family houses sold in month t-1
$\mathrm{us378.1}_t$: Google Trends index for vertical id 378 (Rental Listings & Referrals), 1st week of month t
$\mathrm{us96.2}_t$: Google Trends index for vertical id 96 (Real Estate Agencies), 2nd week of month t
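The fitted equation can be evaluated directly. The sketch below uses the coefficients from the slide; the function name, the argument names and the input values are hypothetical, and the units (sales in thousands, price in dollars) are an assumption based on the "515K" figure quoted later.

```python
def predict_new_home_sales(prev_sales, rental_idx_week1, agencies_idx_week2, prev_avg_price):
    """One-step prediction from the slide's fitted equation (coefficients as printed)."""
    return (446.1
            + 0.864 * prev_sales
            - 4.340 * rental_idx_week1
            + 4.198 * agencies_idx_week2
            - 0.001 * prev_avg_price)

# Hypothetical inputs, chosen only to exercise the formula.
baseline = predict_new_home_sales(500.0, 10.0, 5.0, 300000.0)
more_rental_search = predict_new_home_sales(500.0, 11.0, 5.0, 300000.0)

# A one-point rise in the rental-listings index lowers the prediction by 4.340.
effect = more_rental_search - baseline
```

The sign pattern matches the observations on the following slide: more agency searches raise the prediction, while more rental searches and a higher average price lower it.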
Analysis and Forecasting
July 2008
Actual = 515K
Predicted = 442.98K
Z-score = 2.53
August 2008 Prediction = 417.52K
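The July 2008 figures can be related to one another with a short calculation. This assumes the z-score is the prediction error divided by a standard deviation of the prediction error, which the slide does not state; under that assumption the implied standard deviation follows from the three reported numbers.

```python
actual = 515.0       # July 2008 actual, in thousands (from the slide)
predicted = 442.98   # July 2008 prediction, in thousands (from the slide)
z_score = 2.53       # from the slide

error = actual - predicted       # about 72 thousand houses
implied_sd = error / z_score     # standard deviation implied by the assumed definition
```

Under this reading, the July actual sits roughly two and a half standard deviations above the model's prediction.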
Analysis and Forecasting
Observations
Since 2005 new house sales have been decreasing, with little seasonality
Google Trends captures seasonality & recent trends
Positive association with Real Estate Agencies (96)
Negative association with Rental Listings & Referrals (378) and Average Price
Travel
Subcategories under Travel by Query Shares
• Hotels & Accommodations
• Attractions & Activities
• Air Travel
• Bus & Rail
• Cruises & Charters
• Adventure Travel
• Car Rental & Taxi Services
• Vacation Destinations
Travel to Hong Kong
Visitors Arrival Statistics from Hong Kong Tourism Board
• Monthly summaries release with 1 month lag
• Reports Country/Territory of Residence of visitors
• Data available 2004-2008
Google Trends Travel by Category
• Hotels & Accommodations
• Air Travel
• Car Rental & Taxi Services
• Cruises & Charters
• Attractions & Activities
• Vacation Destinations: Australia, Caribbean Islands, Hawaii, Hong Kong, Las Vegas, Mexico, New York City, Orlando
• Adventure Travel
• Bus & Rail
Visitors Arrival Statistics vs. Google Trends
Analysis and Forecasting
Model:
$\log(Y_{i,t}) = 0.664 + 0.113 \log(Y_{i,t-1}) + 0.828 \log(Y_{i,t-12}) + 0.001\,X_{i,t,2} + 0.001\,X_{i,t,3} + 0.005\,\mathrm{FXrate}_{i,t} + \eta_i + e_{i,t}$
$e_{i,t} \sim N(0, 0.0938^2)$, $\eta_i \sim N(0, 0.0228^2)$
$Y_{i,t}$ = Arrivals to Hong Kong from the i-th country in month t
$X_{i,t,1}$ = Google Trends search index from the i-th country, 1st week of month t
$X_{i,t,2}$ = Google Trends search index from the i-th country, 2nd week of month t
$X_{i,t,3}$ = Google Trends search index from the i-th country, 3rd week of month t
$\mathrm{FXrate}_{i,t}$ = Hong Kong Dollars per one unit of the i-th country's local currency in month t. The average of the first week's FX rate is used as a proxy for each month's FX rate.
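Because the arrivals model is linear in logs, its coefficients act multiplicatively on the predicted level. The sketch below evaluates the fitted equation with the slide's coefficients; the function name, argument names and input values are hypothetical, and the country effect is set to zero for illustration.

```python
import math

def predict_arrivals(prev_month, same_month_last_year, trend_week2, trend_week3,
                     fx_rate, country_effect=0.0):
    """Predicted arrivals from the slide's log-linear equation (error term omitted)."""
    log_y = (0.664
             + 0.113 * math.log(prev_month)
             + 0.828 * math.log(same_month_last_year)
             + 0.001 * trend_week2
             + 0.001 * trend_week3
             + 0.005 * fx_rate
             + country_effect)
    return math.exp(log_y)

# Hypothetical inputs for one country.
base = predict_arrivals(50000.0, 48000.0, 10.0, 12.0, 6.0)
stronger_currency = predict_arrivals(50000.0, 48000.0, 10.0, 12.0, 7.0)

# Raising FXrate by one unit multiplies the prediction by exp(0.005).
fx_ratio = stronger_currency / base
```

The large coefficient on the 12-month lag relative to the 1-month lag is what drives the strong seasonality noted in the conclusion.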
Visitor Arrival Statistics - Actual & Fitted
Analysis and Forecasting
Conclusion
Arrival at time t is positively associated with arrival at time t-1 and arrival at time t-12.
• It shows strong seasonality and autocorrelation
Arrival at time t is positively associated with searches on [Hong Kong].
Arrival at time t is positively associated with FX rates.
• When the local currency appreciates relative to Hong Kong Dollar, visitors to Hong Kong increase.
Automobiles
US Auto Sales by Make
• Monthly summaries released 1 week after end of month
• Data available by Car Sales, Truck Sales and Total Sales for each make
• Data available from 2003-2008
• Source: Automotive News Data Center
Google Trends under Vehicle Brands Category
• Google Trends subcategory Vehicle Brands
• Weekly Search query index
• Total 31 verticals in this subcategory
• 27 verticals matching to Monthly Sales available
Google Categories under Vehicle Brands
NOTE: Area represents query volume in the first half of 2008; color represents the yearly growth rate of queries.
Auto Sales by Make (Top 9 Make by Sales)
Monthly Sales vs. Google Trends at Second Week of each month
Analysis and Forecasting
Fixed effects model:
$\log(Y_{i,t}) = 2.4276 + 0.2552 \log(Y_{i,t-1}) + 0.4930 \log(Y_{i,t-12}) + 0.0005\,X_{i,t,1} + 0.0014\,X_{i,t,2} + a_i\,\mathrm{Make}_i + e_{i,t}$
$e_{i,t} \sim N(0, 0.1347^2)$, Adjusted $R^2$ = 0.9829
$Y_{i,t}$ = Auto sales of the i-th make in month t
$X_{i,t,1}$ = Google Trends search index for the i-th make, 1st week of month t
$X_{i,t,2}$ = Google Trends search index for the i-th make, 2nd week of month t
$\mathrm{Make}_i$ = Dummy variable for auto make
$a_i$ = Coefficient capturing the mean level of auto sales by make
ANOVA Table
                    Df    Sum Sq   Mean Sq      F value    Pr(>F)
trends1              1     12.89     12.89     710.3542   < 2e-16 ***
trends2              1      0.05      0.05       2.7987   0.09455 .
log(s1)              1   1532.95   1532.95   84452.7530   < 2e-16 ***
log(s12)             1     24.07     24.07    1325.9741   < 2e-16 ***
as.factor(brand)    26      3.34      0.13       7.0696   < 2e-16 ***
Residuals         1480     26.86      0.02
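The F values in the ANOVA table are each term's mean square divided by the residual mean square. The sketch below recomputes them from the rounded sums of squares printed on the slide, so the results agree with the reported F values only approximately; the variable names are hypothetical.

```python
# (degrees of freedom, sum of squares) as printed in the ANOVA table
anova = {
    "trends1": (1, 12.89),
    "trends2": (1, 0.05),
    "log(s1)": (1, 1532.95),
    "log(s12)": (1, 24.07),
    "as.factor(brand)": (26, 3.34),
}
resid_df, resid_ss = 1480, 26.86
resid_ms = resid_ss / resid_df  # residual mean square

# F = (term sum of squares / term df) / residual mean square
f_values = {term: (ss / df) / resid_ms for term, (df, ss) in anova.items()}

# F values as reported on the slide, for comparison
reported = {
    "trends1": 710.3542,
    "trends2": 2.7987,
    "log(s1)": 84452.7530,
    "log(s12)": 1325.9741,
    "as.factor(brand)": 7.0696,
}
```

The lagged sales term `log(s1)` dominates the table, which is consistent with the strong autocorrelation noted in the conclusion.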
Actual vs. Fitted Sales (Top 9 Make by Sales)
Analysis and Forecasting
Conclusion
Sales at time t are positively associated with Sales at time t-1 and Sales at time t-12.
• Sales show strong seasonality and autocorrelation
Monthly sales are positively correlated with search volume in the first and second weeks of each month.
• If search volume increases by 1%, sales volume increases by an average of 0.19%.
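The 0.19% figure is consistent with adding the two Google Trends coefficients from the fixed effects model (0.0005 and 0.0014). The calculation below assumes that is how the figure was obtained and that a one-unit move in both weekly indexes corresponds to the "1%" on the slide; neither assumption is stated in the source.

```python
import math

b_week1 = 0.0005  # coefficient on the first-week index
b_week2 = 0.0014  # coefficient on the second-week index

combined = b_week1 + b_week2
# In a log-linear model, a change of `combined` in log sales is a
# multiplicative change of exp(combined) in the level of sales.
pct_change = 100.0 * (math.exp(combined) - 1.0)
```

For coefficients this small, the exact percentage change and the summed coefficient times 100 are nearly identical.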
Unemployment
YoY Growth in Initial Claims & Google Search
According to the NBER, the current recession started December 2007.
The national unemployment rate passed 5% in mid-2008, and search queries on [Welfare and Unemployment] increased at the same time.
Initial claims is an important leading indicator
Google Trends data [Search Insights screenshot]
Initial Claims and Google Trends
California (March 2009, April 2009, May 2009)
US Dept of Labor
Week                 Initial Claims   Continued Claims   Covered Employment   Insured Unemployment Rate
3/15/09 - 3/21/09            81,236            859,561           15,395,215                        5.58
3/22/09 - 3/28/09            74,179            826,924           15,395,215                        5.37
3/29/09 - 4/4/09             69,471            866,734           15,395,215                        5.63
4/5/09 - 4/11/09             75,875            834,569           15,356,117                        5.43
4/12/09 - 4/18/09            84,410            846,477           15,356,117                        5.51
4/19/09 - 4/25/09    Release at 5/7/09
4/26/09 - 5/2/09     Release at 5/14/09
Google Trends
Week                 Jobs   Welfare & Unemployment
3/15/09 - 3/21/09      9%                      -2%
3/22/09 - 3/28/09      6%                      -9%
3/29/09 - 4/4/09       2%                     -13%
4/5/09 - 4/11/09       0%                     -12%
4/12/09 - 4/18/09      1%                      -6%
4/19/09 - 4/25/09     -9%                      -9%
4/26/09 - 5/2/09     -11%                     -10%
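The Department of Labor columns in the table are internally consistent: the insured unemployment rate equals continued claims divided by covered employment, expressed as a percentage. The check below uses the five weeks of figures from the slide; the variable names are hypothetical.

```python
continued_claims = [859_561, 826_924, 866_734, 834_569, 846_477]
covered_employment = [15_395_215, 15_395_215, 15_395_215, 15_356_117, 15_356_117]
reported_rate = [5.58, 5.37, 5.63, 5.43, 5.51]

# Insured unemployment rate (%) = continued claims / covered employment * 100
computed_rate = [
    round(100.0 * claims / employment, 2)
    for claims, employment in zip(continued_claims, covered_employment)
]
```

The recomputed rates match the reported column to two decimal places.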
Strong Autocorrelation in Initial Claims
Time Series Autocorrelation Function
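A sample autocorrelation function of the kind plotted on this slide can be computed as below. This is a minimal sketch: the function name is hypothetical, and the input is a made-up steadily rising series rather than the actual initial claims data.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag, using the overall mean and variance."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]

# Hypothetical weekly series with a steady upward drift.
claims = [70.0 + 0.5 * t for t in range(50)]
acf = autocorrelation(claims, 5)
# acf[0] is 1 by construction; the slow decay at higher lags is the
# signature of a strongly autocorrelated series.
```

A slowly decaying autocorrelation function like this is the usual motivation for differencing the series, as in the ARIMA(0,1,1) specification used later.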
Initial Claims Before/After Recession Started
California New York
Time Window for Analysis
Window For Long Term Model
Window For Short Term Model
Recession Starts
Model
Reference ARIMA(0,1,1) X (1,0,0)12 Model
ARIMA(0,1,1) X (1,0,0)12 Model With Google Trends
Model fit improved significantly: smaller standard deviation, higher log likelihood and smaller AIC.
Initial Claims are positively correlated with searches on Jobs and Welfare.
Reference Model
            Theta        Phi         Sigma   Log likelihood       AIC
LT Model    -0.755 ***   0.619 ***   0.086           268.85   -531.69
ST Model    -0.691 ***   0.463 ***   0.098            99.04   -192.08
Model with Google Trends
            Theta        Phi         Jobs       Welfare     Sigma   Log likelihood       AIC
LT Model    -0.725 ***   0.565 ***   0.004 **   0.003 **    0.083           285.96   -561.91
ST Model    -0.657 ***   0.359 **    0.002      0.007 ***   0.088           114.19   -218.38
Signif. codes: 0.001 ‘***’ 0.05 ‘**’ 0.01 ‘*’
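The AIC values in the table can be reproduced from the log likelihoods with the usual definition AIC = 2k - 2 log L. The parameter counts below are an assumption: k = 3 for the reference model (Theta, Phi and the error variance) and k = 5 once the Jobs and Welfare coefficients are added. The slide does not state them, but they match the reported values to within rounding.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood

# (log likelihood from the slide, assumed parameter count, AIC from the slide)
fits = {
    "LT reference": (268.85, 3, -531.69),
    "LT with Trends": (285.96, 5, -561.91),
    "ST reference": (99.04, 3, -192.08),
    "ST with Trends": (114.19, 5, -218.38),
}

recomputed = {name: aic(loglik, k) for name, (loglik, k, _) in fits.items()}
```

In both windows the AIC falls when the Trends variables are added, so the gain in likelihood more than offsets the penalty for two extra parameters.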
Long Term Model: Prediction Comparison with MAE
With Google Trends, the out-of-sample prediction MAE decreases by 16.84%.
Prediction with rolling window from 1/11/2009 to 4/12/2009
Prediction Error at t:
Mean Absolute Error:
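The formulas on this slide did not survive extraction. As a sketch of the comparison being described, the code below assumes the prediction error at t is the actual value minus the predicted value and that MAE is the average of the absolute errors over the evaluation window; the original may have defined the error in logs or percentages. All numbers are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def pct_reduction(baseline_mae, new_mae):
    """Percentage by which new_mae improves on baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae

# Hypothetical out-of-sample window of four weeks.
actual = [100.0, 110.0, 105.0, 120.0]
reference_pred = [90.0, 100.0, 115.0, 110.0]   # reference model predictions
trends_pred = [92.0, 102.0, 113.0, 112.0]      # predictions with Trends added

mae_reference = mean_absolute_error(actual, reference_pred)
mae_trends = mean_absolute_error(actual, trends_pred)
reduction = pct_reduction(mae_reference, mae_trends)
```

In a rolling-window evaluation, each prediction would come from a model refit on data available up to that week, so the errors are genuinely out of sample.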
Short Term Model: Prediction Comparison with MAE
With Google Trends, the out-of-sample prediction MAE decreases by 19.23%. Prediction errors are within the same range as LT Model.
Fit improvement is better with ST Model.
Summary
Google Trends significantly improves out-of-sample prediction of state unemployment, up to 18 days in advance of data release.
Mean absolute error for out-of-sample predictions declines by 16.84% for LT Model and 19.23% for ST Model.
Further work
• Can examine metro level data
• Other local data (real estate)
• Combine with other predictors
• Detect turning points?