Quantifying Trading Behavior in Financial Markets Using Google Trends



Tobias Preis¹*, Helen Susannah Moat²,³* & H. Eugene Stanley²*

¹Warwick Business School, University of Warwick, Scarman Road, Coventry, CV4 7AL, UK, ²Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, USA, ³Department of Civil, Environmental and Geomatic Engineering, UCL, Gower Street, London, WC1E 6BT, UK.

Crises in financial markets affect humans worldwide. Detailed market data on trading decisions reflect some of the complex human behavior that has led to these crises. We suggest that massive new data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. By analyzing changes in Google query volumes for search terms related to finance, we find patterns that may be interpreted as “early warning signs” of stock market moves. Our results illustrate the potential that combining extensive behavioral data sets offers for a better understanding of collective human behavior.

The increasing volumes of ‘big data’ reflecting various aspects of our everyday activities represent a vital new opportunity for scientists to address fundamental questions about the complex world we inhabit [1–7]. Financial markets are a prime target for such quantitative investigations [8,9]. Movements in the markets exert immense impacts on personal fortunes and geopolitical events, generating considerable scientific attention to this subject [10–19]. For example, a range of recent studies have focused on modeling financial markets [20–25] and on performing network analyses [26–29].

At their core, financial trading data sets reflect the myriad of decisions taken by market participants. According to Herbert Simon, actors begin their decision making processes by attempting to gather information [30]. In today’s world, information gathering often consists of searching online sources. Recently, the search engine Google has begun to provide access to aggregated information on the volume of queries for different search terms and how these volumes change over time, via the publicly available service Google Trends. In the present study, we investigate the intriguing possibility of analyzing search query data from Google Trends to provide new insights into the information gathering process that precedes the trading decisions recorded in the stock market data.

A recent investigation has shown that the number of clicks on search results stemming from a given country correlates with the amount of investment in that country [31]. Further studies exploiting the temporal dimension of Google Trends data have demonstrated that changes in query volumes for selected search terms mirror changes in current numbers of influenza cases [32] and current volumes of stock market transactions [33]. This demonstration of a link between stock market transaction volume and search volume has also been replicated using Yahoo! data [34]. Choi and Varian [35] have shown that data from Google Trends can be linked to current values of various economic indicators, including automobile sales, unemployment claims, travel destination planning and consumer confidence. A very recent study has shown that Internet users from countries with a higher per capita GDP are more likely to search for information about years in the future than years in the past [36].

Here, we suggest that within the time period we investigate, Google Trends data did not only reflect the current state of the stock markets [33] but may have also been able to anticipate certain future trends. Our findings are consistent with the intriguing proposal that notable drops in the financial market are preceded by periods of investor concern. In such periods, investors may search for more information about the market, before eventually deciding to buy or sell. Our results suggest that, following this logic, during the period 2004 to 2011 Google Trends search query volumes for certain terms could have been used in the construction of profitable trading strategies.

SUBJECT AREAS: STATISTICAL PHYSICS, THERMODYNAMICS AND NONLINEAR DYNAMICS; APPLIED PHYSICS; COMPUTATIONAL SCIENCE; INFORMATION THEORY AND COMPUTATION

Received 25 February 2013. Accepted 3 April 2013. Published 25 April 2013.

Correspondence and requests for materials should be addressed to T.P. (Tobias.Preis@wbs.ac.uk).

* These authors contributed equally to this work.

SCIENTIFIC REPORTS | 3 : 1684 | DOI: 10.1038/srep01684

Results

We analyze the performance of a set of 98 search terms. We included terms related to the concept of stock markets, with some terms suggested by the Google Sets service, a tool which identifies semantically related keywords. The set of terms used was therefore not arbitrarily chosen, as we intentionally introduced some financial bias. We explain our strategy based on changes in search volume with reference to the term debt, a keyword with an obvious semantic connection to the most recent financial crisis, and overall the term which performed best in our analyses.

To uncover the relationship between the volume of search queries for a specific term and the overall direction of trader decisions, we analyze closing prices p(t) of the Dow Jones Industrial Average (DJIA) on the first trading day of week t. We use Google Trends to determine how many searches n(t − 1) have been carried out for a specific search term such as debt in week t − 1, where Google defines weeks as ending on a Sunday, relative to the total number of searches carried out on Google during that time. We find that search volume data change slightly over time due to Google’s extraction procedure. For each search term, we therefore average over three realizations of its search volume time series, based on three independent data requests in consecutive weeks. The variability of Google Trends data across different dates of access is irrelevant for our results, and it can be shown that the data are consistent with reported real world events (see Fig. S1 in the Supplementary Information).
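The averaging over three data requests can be illustrated with a minimal sketch. It assumes each request has already been parsed into a list of weekly values covering the same weeks; the function name `average_realizations` and the numbers are hypothetical and are not taken from the study.

```python
from statistics import fmean
from typing import List, Sequence


def average_realizations(realizations: Sequence[Sequence[float]]) -> List[float]:
    """Average several downloads of the same weekly series, week by week."""
    lengths = {len(series) for series in realizations}
    if len(lengths) != 1:
        raise ValueError("all realizations must cover the same weeks")
    return [fmean(week) for week in zip(*realizations)]


# Three hypothetical downloads of one search term (illustrative values only).
downloads = [
    [40.0, 42.0, 47.0],
    [41.0, 42.0, 45.0],
    [39.0, 42.0, 46.0],
]
print(average_realizations(downloads))  # [40.0, 42.0, 46.0]
```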

To quantify changes in information gathering behavior, we use the relative change in search volume: $\Delta n(t, \Delta t) = n(t) - N(t-1, \Delta t)$ with $N(t-1, \Delta t) = (n(t-1) + n(t-2) + \dots + n(t-\Delta t))/\Delta t$, where $t$ is measured in units of weeks. In Fig. 1, we depict relative search volume changes for the term debt, and their relationship to DJIA closing prices.
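As a rough illustration of this definition, the sketch below computes the change in search volume relative to the mean of the preceding weeks. The function name `relative_change`, the zero-based list indexing, and the sample values are assumptions made for the example only.

```python
from typing import Optional, Sequence


def relative_change(n: Sequence[float], t: int, dt: int) -> Optional[float]:
    """Return n[t] minus the mean of n[t - dt], ..., n[t - 1].

    Returns None when fewer than dt earlier weeks are available.
    """
    if dt < 1:
        raise ValueError("dt must be a positive number of weeks")
    if t < dt:
        return None
    window = n[t - dt:t]  # n(t - dt), ..., n(t - 1)
    return n[t] - sum(window) / dt


# Hypothetical weekly search volumes, not data from the study.
volumes = [40.0, 42.0, 44.0, 50.0, 41.0]
print(relative_change(volumes, t=3, dt=3))  # 50 - (40 + 42 + 44) / 3 = 8.0
```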

To investigate whether changes in information gathering behavior as captured by Google Trends data were related to later changes in stock price in the period 2004–2011, we implement a hypothetical investment strategy for a portfolio using search volume data, called the ‘Google Trends strategy’ in the following. Profit can only be made in a trading strategy if at least some future changes in the stock price are correctly anticipated, in particular around large market movements. We implement this strategy by selling the DJIA at the closing price $p(t)$ on the first trading day of week $t$ if $\Delta n(t-1, \Delta t) > 0$, and buying the DJIA at price $p(t+1)$ at the end of the first trading day of the following week. Note that mechanisms exist which make it possible to sell assets in financial markets without first owning them. If instead $\Delta n(t-1, \Delta t) < 0$, then we buy the DJIA at the closing price $p(t)$ on the first trading day of week $t$ and sell the DJIA at price $p(t+1)$ at the end of the first trading day of the coming week. At the beginning of trading, we set the value of all portfolios to an arbitrary value of 1. If we take a ‘short position’ (selling at the closing price $p(t)$ and buying back at price $p(t+1)$), then the cumulative return $R$ changes by $\log(p(t)) - \log(p(t+1))$. If we take a ‘long position’ (buying at the closing price $p(t)$ and selling at price $p(t+1)$), then the cumulative return $R$ changes by $\log(p(t+1)) - \log(p(t))$. In this way, buy and sell actions have symmetric impacts on the cumulative return $R$ of a strategy’s portfolio. In using this approach to analyze the relationship between Google search volume and stock market movements, we neglect transaction fees, since the maximum number of transactions per year when using our strategy is only 104, allowing a closing and an opening transaction per week. We of course do not dispute that such transaction fees would impact profit in a real world implementation.
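The trading rule described above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes `prices[t]` and `volumes[t]` are aligned weekly lists, and that the strategy stays out of the market when the change in search volume is exactly zero, a case the text does not specify.

```python
import math
from typing import Sequence


def google_trends_return(prices: Sequence[float],
                         volumes: Sequence[float],
                         dt: int) -> float:
    """Cumulative log return R of the search-volume strategy.

    prices[t]  : closing price p(t) on the first trading day of week t
    volumes[t] : relative search volume n(t) for week t
    """
    if len(prices) != len(volumes):
        raise ValueError("prices and volumes must cover the same weeks")
    cumulative = 0.0
    # The signal for week t is n(t - 1) minus the mean of the dt weeks before it.
    for t in range(dt + 1, len(prices) - 1):
        baseline = sum(volumes[t - 1 - dt:t - 1]) / dt
        delta = volumes[t - 1] - baseline
        weekly = math.log(prices[t + 1]) - math.log(prices[t])
        if delta > 0:
            cumulative -= weekly  # short: sell at p(t), buy back at p(t + 1)
        elif delta < 0:
            cumulative += weekly  # long: buy at p(t), sell at p(t + 1)
        # delta == 0: no position (assumption, not stated in the text)
    return cumulative


# Hypothetical data for illustration only.
prices = [100.0, 100.0, 100.0, 90.0, 99.0, 90.0]
volumes = [10.0, 12.0, 11.0, 15.0, 0.0, 0.0]
R = google_trends_return(prices, volumes, dt=1)
print(f"cumulative log return: {R:.4f}")
```

In this toy series the rule is short before the fall from 100 to 90, long before the rise to 99, and short again before the fall back to 90, so all three weekly moves add to R.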

In Fig. 2, the performance of the Google Trends strategy based on the search term debt is depicted by a blue line, whereas dashed lines indicate the standard deviation of the cumulative return from a strategy in which we buy and sell the market index in an uncorrelated, random manner (‘random investment strategy’). The standard deviation is derived from simulations of 10,000 independent realizations of the random investment strategy. Fig. 2 shows that the use of the Google Trends strategy, based on the search term debt and $\Delta t = 3$ weeks, would have increased the value of a portfolio by 326%. The performance of Google Trends strategies based on all other search terms that we analyze is depicted in Figures S3–S100 in the Supplementary Information.

Figure 1 | Search volume data and stock market moves. Time series of closing prices $p(t)$ of the Dow Jones Industrial Average (DJIA) on the first day of trading in each week $t$ covering the period from 5 January 2004 until 22 February 2011. The color code corresponds to the relative search volume changes for the search term debt, with $\Delta t = 3$ weeks. Search volume data are restricted to requests of users localized in the United States of America.
[Plot: index value p(t) in points against time t in years, 2004–2011; color scale for relative search volume change from −50% to +50%.]

Figure 2 | Cumulative performance of an investment strategy based on Google Trends data. Profit and loss for an investment strategy based on the volume of the search term debt, the best performing keyword in our analysis, with $\Delta t = 3$ weeks, plotted as a function of time (blue line). This is compared to the “buy and hold” strategy (red line) and the standard deviation of 10,000 simulations using a purely random investment strategy (dashed lines). The Google Trends strategy using the search volume of the term debt would have yielded a profit of 326%.
[Plot: profit and loss in % against time t in years, 2004–2011; curves for the ‘debt’ Google Trends strategy (final value 326), the buy and hold strategy (final value 16), and μ + σ, μ, μ − σ of the random strategy.]

Figure 3 | Performances of investment strategies based on search volume data. (A) Cumulative returns of 98 investment strategies based on search volumes restricted to search requests of users located in the United States for different search terms, displayed for the entire time period of our study from 5 January 2004 until 22 February 2011, the time period for which Google Trends provides data. We use two shades of blue for positive returns and two shades of red for negative returns to improve the readability of the search terms. The cumulative performance for the “buy and hold strategy” is also shown, as is a “Dow Jones strategy”, which uses weekly closing prices of the Dow Jones Industrial Average (DJIA) rather than Google Trends data (see gray bars). Figures provided next to the bars indicate the returns of a strategy, $R$, in standard deviations from the mean return of uncorrelated random investment strategies, $\langle R \rangle_{\mathrm{RandomStrategy}} = 0$. Dashed lines correspond to −3, −2, −1, 0, +1, +2, and +3 standard deviations of random strategies. We find that returns from the Google Trends strategies tested are significantly higher overall than returns from the random strategies ($\langle R \rangle_{\mathrm{US}} = 0.60$; t = 8.65, df = 97, p < 0.001, one sample t-test). (B) A parallel analysis shows that extending the range of the search volume analysis to global users reduces the overall return achieved by Google Trends trading strategies on the U.S. market ($\langle R \rangle_{\mathrm{US}} = 0.60$, $\langle R \rangle_{\mathrm{Global}} = 0.43$; t = 2.69, df = 97, p < 0.01, two-sided paired t-test). However, returns are still significantly higher than the mean return of random investment strategies ($\langle R \rangle_{\mathrm{Global}} = 0.43$; t = 6.40, df = 97, p < 0.001, one sample t-test).
[Bar charts, panel A: US Search Volume Data, panel B: Global Search Volume Data. One bar per search term plus the DOW JONES and BUY AND HOLD benchmarks; horizontal axis: cumulative return R in standard deviations of random strategy returns, from −3 to 3.]

Figure 4 | Analysis using strategies in which we take long or short positions only, using U.S. search volume data. (A) We implement Google Trends strategies in which we take long positions following a decrease in search volume, and never take short positions. We find that returns from these long position Google Trends strategies are significantly higher overall than returns from the random investment strategies ($\langle R \rangle_{\mathrm{USLong}} = 0.41$; t = 11.42, df = 97, p < 0.001, one sample t-test). Again, we find a positive correlation between our indicator of financial relevance and returns from these strategies (Kendall’s tau = 0.242, z = 3.53, N = 98, p < 0.001). (B) We also implement Google Trends strategies in which we take short positions following an increase in search volume, and never take long positions. In line with our results from the long position Google Trends strategies, we find that returns from the short position Google Trends strategies are significantly higher overall than returns from the random investment strategies ($\langle R \rangle_{\mathrm{USShort}} = 0.19$; t = 5.28, df = 97, p < 0.001, one sample t-test), and that there is a positive correlation between our indicator of financial relevance and short position Google Trends returns (Kendall’s tau = 0.275, z = 4.01, N = 98, p < 0.001).
[Bar charts, panel A: Long Positions Only, panel B: Short Positions Only. One bar per search term plus the DOW JONES and BUY AND HOLD benchmarks; horizontal axis: cumulative return R in standard deviations of random strategy returns, from −3 to 3.]

We rank the full list of the 98 investigated search terms by their trading performance when using search data for U.S. users only (Fig. 3A) and when using globally generated search volume (Fig. 3B). In order to ensure the robustness of our results, the overall performance of a strategy based on a given search term is determined as the mean value over the six returns obtained for $\Delta t = 1, \dots, 6$ weeks. Returns of the strategies are calculated as the logarithm of relative portfolio changes, following the usual definition of returns. The distribution of final portfolio values resulting from the random investment strategies is close to log-normal. Cumulative returns from the random investment strategy, derived from the logarithm of these portfolio values, therefore follow a normal distribution, with a mean value of $\langle R \rangle_{\mathrm{RandomStrategy}} = 0$. Here we report $R$, the cumulative returns of a strategy, in standard deviations of the cumulative returns of these uncorrelated random investment strategies.
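One way to picture this normalization is the sketch below. It is an assumption-laden illustration: the text says only that the random strategy buys and sells in an uncorrelated, random manner, so the equal-probability weekly choice between long and short, the function names, and the fixed seed are all choices made for the example.

```python
import math
import random
import statistics
from typing import List, Sequence


def random_strategy_returns(prices: Sequence[float],
                            n_sim: int = 10_000,
                            seed: int = 0) -> List[float]:
    """Cumulative log returns of n_sim simulated random strategies.

    Assumption: each week is long or short with probability 1/2.
    """
    rng = random.Random(seed)
    weekly = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return [
        sum(r if rng.random() < 0.5 else -r for r in weekly)
        for _ in range(n_sim)
    ]


def in_random_sigmas(strategy_return: float, simulated: Sequence[float]) -> float:
    """Express a cumulative return in standard deviations of the random strategy."""
    mu = statistics.fmean(simulated)
    sigma = statistics.pstdev(simulated)
    return (strategy_return - mu) / sigma


def overall_performance(returns_by_dt: Sequence[float],
                        simulated: Sequence[float]) -> float:
    """Mean standardized return over the tested window lengths (dt = 1, ..., 6)."""
    return statistics.fmean([in_random_sigmas(r, simulated) for r in returns_by_dt])
```

A call such as `overall_performance(returns_for_dt_1_to_6, random_strategy_returns(prices))` would then give one number per search term, comparable to the values reported next to the bars in Fig. 3, under the assumptions stated above.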

We find that returns from the Google Trends strategies we tested are significantly higher overall than returns from the random strategies ($\langle R \rangle_{\mathrm{US}} = 0.60$; t = 8.65, df = 97, p < 0.001, one sample t-test).

We compare the performance of these search terms with two benchmark strategies. The ‘buy and hold’ strategy is implemented by buying the index at the beginning and selling it at the end of the hold period. This strategy yields 16% profit, equal to the overall increase in value of the DJIA in the time period from January 2004 until February 2011. We further implement a ‘Dow Jones strategy’ by using changes in $p(t)$ in place of changes in search volume data as the basis of buy and sell decisions. We find that this strategy also yields only 33% profit with $\Delta t = 3$ weeks, or, when determined as the mean value over the six returns obtained for $\Delta t = 1, \dots, 6$ weeks, 0.45 standard deviations of cumulative returns of uncorrelated random investment strategies (Figs. 3A and 3B; see also Fig. S101 in the Supplementary Information).
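For orientation, the sketch below shows the buy and hold benchmark as a log return and converts a cumulative log return into a percentage profit. The conversion assumes that a portfolio starting at value 1 ends at exp(R), which is our reading of the return definition given earlier; the function names are hypothetical.

```python
import math
from typing import Sequence


def buy_and_hold_return(prices: Sequence[float]) -> float:
    """Log return from buying at the first price and selling at the last."""
    return math.log(prices[-1] / prices[0])


def percent_profit(log_return: float) -> float:
    """Percentage gain of a portfolio that starts at value 1 (assumed exp(R) - 1)."""
    return (math.exp(log_return) - 1.0) * 100.0


# A portfolio that grows from 1 to 4.26 corresponds to a 326% profit,
# and one that grows from 1 to 1.16 corresponds to a 16% profit.
print(round(percent_profit(math.log(4.26))))  # 326
print(round(percent_profit(math.log(1.16))))  # 16
```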

Our results show that performance of the Google Trends strategy differs with the search term chosen. We investigate whether these differences in performance can be partially explained using an indicator of the extent to which different terms are of financial relevance, a concept we quantify by calculating the frequency of each search term in the online edition of the Financial Times from August 2004 to June 2011, normalized by the number of Google hits for each search term (see Fig. S2 in the Supplementary Information). We find that the return associated with a given search term is correlated with this indicator of financial relevance (Kendall’s tau = 0.275, z = 4.01, N = 98, p < 0.001) using Kendall’s tau rank correlation coefficient [37].
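The relevance indicator and the rank correlation can be sketched in a few lines. The tau-b variant (which corrects for ties) is an assumption, since the text does not say which variant of Kendall's tau was used; the function names are hypothetical.

```python
import math
from typing import Sequence


def financial_relevance(ft_count: int, google_hits: int) -> float:
    """Frequency of a term in the Financial Times, normalized by Google hits."""
    return ft_count / google_hits


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation (O(n^2) pair counting)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom
```

Applied to the study's setting, `x` would hold the relevance indicator for each of the 98 terms and `y` the corresponding strategy returns.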

It is widely recognized that investors prefer to trade on their domestic market, suggesting that search data for U.S. users only, as used in analyses so far, should better capture the information gathering behavior of U.S. stock market participants than data for Google users worldwide. Indeed, we find that strategies based on global search volume data are less successful than strategies based on U.S. search volume data in anticipating movements of the U.S. market ($\langle R \rangle_{\mathrm{US}} = 0.60$, $\langle R \rangle_{\mathrm{Global}} = 0.43$; t = 2.69, df = 97, p < 0.01, two-sided paired t-test).
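The test statistics quoted here follow the standard one-sample and paired t-test formulas, sketched below with hypothetical function names. With 98 search terms the degrees of freedom are 98 − 1 = 97, matching the reported df. Converting t into a p-value needs the Student t distribution, which is left out of this stdlib-only sketch.

```python
import math
import statistics
from typing import Sequence


def one_sample_t(values: Sequence[float], mu0: float = 0.0) -> float:
    """t statistic for testing whether the mean of values differs from mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu0) / standard_error


def paired_t(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of the paired differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    return one_sample_t([x - y for x, y in zip(a, b)])
```

In the comparison above, `a` and `b` would be the per-term returns from U.S. and global search volume data, paired by search term.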

Our empirical results so far are consistent with a two part hypothesis: namely that key increases in the price of the DJIA were preceded by a decrease in search volume for certain financially related terms, and conversely, that key decreases in the price of the DJIA were preceded by an increase in search volume for certain financially related terms. However, our trading strategy can be decomposed into two strategy components: one in which a decrease in search volume prompts us to buy (or take a long position) and one in which an increase in search volume prompts us to sell (or take a short position).

In order to verify that both strategy components play a significant role in our results, such that we have evidence for both parts of this hypothesis, we implement and test one strategy in which we take long positions following a decrease in search volume but never take short positions (Fig. 4A), and another strategy in which we take short positions following an increase in search volume but never take long positions (Fig. 4B). We find that returns from both Google Trends strategy components are significantly higher overall than returns from a random investment strategy (long position strategies: $\langle R \rangle_{\mathrm{USLong}} = 0.41$; t = 11.42, df = 97, p < 0.001, one sample t-test; short position strategies: $\langle R \rangle_{\mathrm{USShort}} = 0.19$; t = 5.28, df = 97, p < 0.001, one sample t-test).
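The decomposition can be illustrated by accumulating the long and short legs separately. This is a sketch under the same assumptions as the trading rule itself (aligned weekly lists, no position when the change in search volume is zero); the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def component_returns(prices: Sequence[float],
                      volumes: Sequence[float],
                      dt: int) -> Tuple[float, float]:
    """Return (long-only, short-only) cumulative log returns."""
    long_return = 0.0
    short_return = 0.0
    for t in range(dt + 1, len(prices) - 1):
        baseline = sum(volumes[t - 1 - dt:t - 1]) / dt
        delta = volumes[t - 1] - baseline
        weekly = math.log(prices[t + 1] / prices[t])
        if delta < 0:
            long_return += weekly    # long leg: search volume fell
        elif delta > 0:
            short_return -= weekly   # short leg: search volume rose
    return long_return, short_return
```

Because every week falls into at most one of the two legs, the raw log returns of the two components sum to the return of the combined strategy in this sketch.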

Discussion

In summary, our results are consistent with the suggestion that during the period we investigate, Google Trends data did not only reflect aspects of the current state of the economy, but may have also provided some insight into future trends in the behavior of economic actors. Using historic data from the period between January 2004 and February 2011, we detect increases in Google search volumes for keywords relating to financial markets before stock market falls. Our results suggest that these warning signs in search volume data could have been exploited in the construction of profitable trading strategies.

We offer one possible interpretation of our results within the context of Herbert Simon’s model of decision making [28]. We suggest that Google Trends data and stock market data may reflect two subsequent stages in the decision making process of investors. Trends to sell on the financial market at lower prices may be preceded by periods of concern. During such periods of concern, people may tend to gather more information about the state of the market. It is conceivable that such behavior may have historically been reflected by increased Google Trends search volumes for terms of higher financial relevance.

We find that strategies based on search volume data for U.S. users are more successful for the U.S. market than strategies using global search volume data. Given the assumption that the population of U.S. Internet users contains a higher proportion of traders on the U.S. markets than the worldwide population of Internet users contains, this finding is in line with the intriguing suggestion that these datasets may provide insights into different stages of decision making within the same population.

In this work, we provide a quantification of the relationship between changes in search volume and changes in stock market prices. Future work will be needed to provide a thorough explanation of the underlying psychological mechanisms which lead people to search for terms like debt before selling stocks at a lower price. It is clear that many opportunities also remain to extend our analyses to further financial data sets.

The results of our investigation suggest that combining large behavioral data sets such as financial trading data with data on search query volumes may open up new insights into different stages of large-scale collective decision making. We conclude that these results further illustrate the exciting possibilities offered by new big data sets to advance our understanding of complex collective behavior in our society.

Methods

How related are search terms to the topic of finance? We quantify financial relevance by calculating the frequency of each search term in the online edition of the Financial Times (http://www.ft.com) from August 2004 to June 2011, normalized by the number of Google hits (http://www.google.com) for each search term. Details are given in the Supplementary Information.
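The normalization described above can be sketched in a few lines of Python. The sketch assumes that financial relevance is the simple ratio of Financial Times hits to Google hits for each term; the exact procedure is specified in the Supplementary Information and may differ. The function names and the counts in the example are hypothetical.

```python
def financial_relevance(ft_hits, google_hits):
    """Ratio of Financial Times hits to Google hits per search term.

    Both arguments map a search term to a hit count. Terms without a
    positive Google hit count are skipped to avoid division by zero.
    """
    relevance = {}
    for term, ft_count in ft_hits.items():
        total = google_hits.get(term, 0)
        if total > 0:
            relevance[term] = ft_count / total
    return relevance


def rank_by_relevance(relevance):
    """Search terms ordered from most to least financially relevant."""
    return sorted(relevance, key=relevance.get, reverse=True)


# Hypothetical counts, for illustration only.
example_ft = {"debt": 50, "garden": 2}
example_google = {"debt": 1000, "garden": 4000}
example_relevance = financial_relevance(example_ft, example_google)
example_ranking = rank_by_relevance(example_relevance)
```

Dividing by the Google hit count keeps terms that are simply common everywhere on the web from appearing financially relevant only because they are frequent.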

Data retrieval. We retrieved search volume data by accessing the Google Trends website (http://www.google.com/trends) on 10 April 2011, 17 April 2011, and 24 April 2011. The data on the number of hits for search terms in the online edition of the Financial Times was retrieved on 7 June 2011. The numbers of Google hits for these terms were obtained on 8 June 2011.

1. Axtell, R. L. Zipf distribution of US firm sizes. Science 293, 1818–1820 (2001).

2. King, G. Ensuring the Data-Rich Future of the Social Sciences. Science 331, 719–721 (2011).

3. Vespignani, A. Predicting the Behavior of Techno-Social Systems. Science 325, 425–428 (2009).

4. Lazer, D. et al. Computational Social Science. Science 323, 721–723 (2009).

5. Perc, M. Evolution of the most common English words and phrases over the centuries. J. R. Soc. Interface 9, 3323–3328 (2012).

6. Petersen, A. M., Tenenbaum, J. N., Havlin, S., Stanley, H. E. & Perc, M. Languages cool as they expand: Allometric scaling and the decreasing need for new words. Scientific Reports 2, 943 (2012).

7. Christakis, N. A. & Fowler, J. H. Connected: The surprising power of our social networks and how they shape our lives (Little, Brown and Company, 2009).

8. Fehr, E. Behavioural science - The economics of impatience. Nature 415, 269–272 (2002).

9. Shleifer, A. Inefficient Markets: An Introduction to Behavioral Finance (Oxford University Press, Oxford, 2000).

10. Lillo, F., Farmer, J. D. & Mantegna, R. N. Econophysics - Master curve for price-impact function. Nature 421, 129–130 (2003).

11. Gabaix, X., Gopikrishnan, P., Plerou, V. & Stanley, H. E. A theory of power-law distributions in financial market fluctuations. Nature 423, 267–270 (2003).

12. Preis, T., Kenett, D. Y., Stanley, H. E., Helbing, D. & Ben-Jacob, E. Quantifying the Behavior of Stock Correlations Under Market Stress. Scientific Reports 2, 752 (2012).

13. Preis, T., Schneider, J. J. & Stanley, H. E. Switching processes in financial markets. PNAS 108, 7674–7678 (2011).

14. Preis, T. Econophysics - complex correlations and trend switchings in financial time series. European Physical Journal Special Topics 194, 5–86 (2011).

15. Bunde, A., Schellnhuber, H. J. & Kropp, J., eds. The Science of Disasters: Climate Disruptions, Heart Attacks, and Market Crashes (Springer, Berlin, 2002).

16. Vandewalle, N. & Ausloos, M. Coherent and random sequences in financial fluctuations. Physica A 246, 454–459 (1997).

17. Podobnik, B., Horvatic, D., Petersen, A. M. & Stanley, H. E. Cross-correlations between volume change and price change. PNAS 106, 22079–22084 (2009).

18. Sornette, D., Woodard, R. & Zhou, W. X. The 2006-2008 oil bubble: Evidence of speculation, and prediction. Physica A 388, 1571–1576 (2009).

19. Watanabe, K., Takayasu, H. & Takayasu, M. A mathematical definition of the financial bubbles and crashes. Physica A 383, 120–124 (2007).

20. Bouchaud, J. P., Matacz, A. & Potters, M. Leverage effect in financial markets: the retarded volatility model. Physical Review Letters 87, 228701 (2001).

21. Hommes, C. H. Modeling the stylized facts in finance through simple nonlinear adaptive systems. PNAS 99, 7221–7228 (2002).

22. Haldane, A. G. & May, R. M. Systemic risk in banking ecosystems. Nature 469, 351–355 (2011).

23. Lux, T. & Marchesi, M. Scaling and criticality in a stochastic multi-agent model of a financial market. Nature 397, 498–500 (1999).

24. Krugman, P. The Self-Organizing Economy (Blackwell, Cambridge, Massachusetts, 1996).

25. Sornette, D. & von der Becke, S. Complexity clouds finance-risk models. Nature 471, 166 (2011).

26. Schweitzer, F. et al. Economic Networks: The New Challenges. Science 325, 422–425 (2009).

27. Garlaschelli, D., Caldarelli, G. & Pietronero, L. Universal scaling relations in food webs. Nature 423, 165–168 (2003).

28. Onnela, J. P., Arbesman, S., Gonzalez, M. C., Barabasi, A. L. & Christakis, N. A. Geographic Constraints on Social Network Groups. PLoS One 6, e16939 (2011).

29. Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. Nature 464, 1025–1028 (2010).

30. Simon, H. A. A behavioral model of rational choice. Quarterly Journal of Economics 69, 99–118 (1955).

31. Mondria, J., Wu, T. & Zhang, Y. The determinants of international investment and attention allocation: Using internet search query data. Journal of International Economics 82, 85–95 (2010).

32. Ginsberg, J. et al. Detecting influenza epidemics using search engine query data. Nature 457, 1012–1014 (2009).

33. Preis, T., Reith, D. & Stanley, H. E. Complex dynamics of our economic life on different scales: insights from search engine query data. Phil. Trans. R. Soc. A 368, 5707–5719 (2010).

34. Bordino, I. et al. Web Search Queries Can Predict Stock Market Volumes. PLoS One 7, e40014 (2012).

35. Choi, H. & Varian, H. Predicting the Present with Google Trends. The Economic Record 88, 2–9 (2012).

36. Preis, T., Moat, H. S., Stanley, H. E. & Bishop, S. R. Quantifying the Advantage of Looking Forward. Scientific Reports 2, 350 (2012).

37. Kendall, M. A New Measure of Rank Correlation. Biometrika 30, 81–89 (1938).

Acknowledgements

We thank Didier Sornette, Dirk Helbing, and Steven R. Bishop for comments. This work was partially supported by the German Research Foundation Grant PR 1305/1-1 (to T.P.). This work was also supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC00285 and by the National Science Foundation (NSF), the Office of Naval Research (ONR), and the Defense Threat Reduction Agency (DTRA). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.

Author contributions

T.P., H.S.M. and H.E.S. developed the design of the study, performed analyses, discussed the results, and contributed to the text of the manuscript.

Additional information

Supplementary information accompanies this paper at http://www.nature.com/scientificreports

Competing financial interests: The authors declare no competing financial interests.

License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/

How to cite this article: Preis, T., Moat, H.S. & Stanley, H.E. Quantifying Trading Behavior in Financial Markets Using Google Trends. Sci. Rep. 3, 1684; DOI:10.1038/srep01684 (2013).
