wsdm2016勉強会 資料

12
Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior F. Kooti, K. Lerman, L. M. Aiello, M. Grbovic, and N. Djuric http://www.kamishima.net/ WSDM206 2016/03/19 1

Upload: toshihiro-kamishima

Post on 08-Jan-2017

790 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Portrait of an Online Shopper: Understanding and Predicting Consumer Behavior

F. Kooti, K. Lerman, L. M. Aiello, M. Grbovic, and N. Djuric

http://www.kamishima.net/

WSDM206 2016/03/19

1

2

3

4

most shoppers have a finite budget and need to wait longer betweenpurchases to buy more expensive items.

In addition to temporal and demographic factors, social networksare believed to play an important role in shaping consumer be-havior (e.g., by spreading information about products through theword-of-mouth [33]). Previous studies examined how consumersuse their online social networks to gather product information andrecommendations [19, 20], although the direct effect of recommen-dations on purchases was found to be weak [27]. In addition, peo-ple who are socially connected are generally more similar to oneanother than unconnected people [29], and hence are more likelyto be interested in similar products. Our analysis confirmed thatshoppers who are socially connected and e-mail each other tend topurchase more similar products than unconnected shoppers.

Once we understand the factors affecting consumer behavior, wecan use this knowledge to predict future purchases. Given users’purchase history and demographic data, we address a problem ofpredicting the time and price of their next purchase. Our methodattains a relative improvement of 108.7% over the random baselinefor predicting the price of the next purchase, and 36.4% relative im-provement over the random baseline for predicting the time of thenext purchase. Interestingly, demographic features were shown tobe the least useful in these prediction tasks, while temporal featurescarried the most discriminative information.

The contributions of the paper are summarized below:

• In-depth analysis of a unique and very rich data set describingconsumer behavior, extracted from purchase confirmations mer-chants send to shoppers (Section 2);

• A quantitative analysis of an impact of demographic, temporal,and social factors on consumer behavior (Section 3);

• Prediction of consumer behavior, where we predict the time ofthe next purchase and how much money will be spent (Sec-tion 4).

Better understanding of consumer behavior can benefit both con-sumers and advertisers. Knowing when consumers are ready tomake a purchase and how much they are willing to spend can im-prove the effectiveness of advertising campaigns, in terms of op-timizing ad impressions and budget spend. Understanding thesepatterns can also make online shopping experience more efficientfor consumers. Considering that consumer spending presents sucha large portion of the economy, even a small efficiency gain canhave significant impact on the overall economic activity.

2. DATA SETMost online purchases result in a confirmation email being sent bythe merchant to the shopper. These emails provide a unique op-portunity to study the shopping behavior of people across differentonline retail stores, such as Amazon, eBay, or Walmart.

Yahoo Mail is one of the world’s largest email providers withmore than 300M users3, and many online shoppers use this emailservice for receiving purchase confirmations. We selected theseemails by using a precompiled list of email addresses of popularmerchants. Applying a set of manually specified extraction rules tothe email body, we extracted the list of purchased item names andthe price of each item. In case of multiple items purchased in asingle order, we considered them as individual purchases occurringat the same time. Therefore, throughout the paper the expression“purchase” will refer to a purchase of a single item. In order tobe able to analyze purchases by category (e.g., electronics, books,3http://www.comscore.com/

0.0

0.1

0.2

100 101 102 103

# of purchases

PDF

(a) PDF

0.4

0.6

0.8

1.0

100 101 102

# of purchases

CD

F

(b) CDFFigure 1: Distribution of number of purchases made by users

0.000

0.003

0.006

0.009

100 101 102 103

Total money spent ($)

PDF

(a) PDF

0.00

0.25

0.50

0.75

1.00

100 101 102 103 104

Total money spent ($)

CD

F

(b) CDFFigure 2: Distribution of total money spent by users

handbags), we developed a categorization module based on the pur-chased item names. Specifically, an item name was used as an inputto a classifier that predicts the item category. We used a three levelsdeep, 1,733 node Pricegrabber taxonomy4 to categorize the items.The details of categorization are beyond the scope of this paper.

We limited our study to a random subset of Yahoo Mail users inthe US. Our data set contains information on 20.1M users, who col-lectively made 121M purchases from several top retailers betweenFebruary and September 2014, amounting to total spending of 5.1Bdollars. For each user we extracted age, gender, and zip code in-formation from the Yahoo user database. We excluded users whomade more than 1,000 purchases (amounting to less than 0.01%of the sample), as these accounts are likely to belong to stores andnot individual users. The analysis of the data set was performed inaggregate and on anonymized data.

In order to examine social aspects of the shopping behavior, weused the Yahoo email network data set in addition to the data setof purchases. The email network can be represented as a directedgraph G, with edges denoted by (i, j,Nij) signifying that user isent Nij emails to user j. For our analysis we retained only edgeswith a minimum of 5 exchanged messages, and considered only asubgraph C of G induced by the two-hop neighborhood of the userswho made purchases (i.e., their immediate contacts and contactsof their contacts). The subgraph C was used to construct a list of1st-level contacts and 2nd-level contacts for each online shopper.

Let us take a closer look at the data set characteristics. In Fig-ure 1(a) we show the distribution of number of purchases per user,which reveals the expected heavy-tailed characteristic. Figure 1(b)shows that only 5% of the users made more than 20 purchases.In contrast, the distribution of total money spent per user peaksat around 10 dollars, sharply decreasing for smaller amounts (Fig-ure 2(a)). Also, there is a non-negligible minority of people whospend a substantial amount of money for online shopping, such as5% of the users spent more than 1,000 dollars (Figure 2(b)).

Items being purchased also have drastically different levels ofpopularity, as shown by the distribution in Figure 3. Disney’s Frozen4http://www.pricegrabber.com

206

10−6

10−4

10−2

100

100 101 102 103 104

# of times being purchased

PDF

Figure 3: Number of times different items have been purchased

101

102

103

104

105

106

107

100 101 102 103

Item price

# of

tim

es p

urch

ased

Figure 4: Number of item purchases as a function of price

DVD, the most popular item in the data set, has been purchasedmore than 200,000 times, whereas the vast majority of the items hasbeen purchased less than 10 times. Table 1 lists the top 5 most fre-quently purchased items. Intuitively, the set of items that users havespent the most money on is a different set, because a single pur-chase of an expensive item would account for the same amount ofmoney of several purchases of cheaper items (Table 2). In fact, thenumber of times an item is purchased negatively correlates with theprice of that item (Figure 4). This is in line with previous survey-based studies [7] that found the vast majority of items purchasedonline are worth at most few tens of dollars.

3. PURCHASE PATTERN ANALYSISIn this section we present a quantitative analysis of factors affectingonline purchases. We examine the role of demographic, temporal,and social factors that include gender and age, daily and weeklypatterns, frequency of shopping, tendency towards recurring pur-chases, and budget constraints.

3.1 Demographic FactorsLet us consider how gender, age, and location (i.e., zip code)

affect purchasing behavior of customers. First, we measured frac-tion of all email users that made an online purchase. We foundthat higher fraction of women make online purchases compared tomen (Figure 5(a)), albeit men make slightly more purchases (Fig-ure 5(b)) and spend more money on average (Figure 5(c)). It isinteresting that, as a result, men spend much more money in total(Figure 5(d)). The same patterns hold across different age groups.The presented results back up findings from earlier consumer sur-veys which revealed that man have a higher perceived advantage ofonline shopping [2], while women havie a higher concern for neg-ative consequences of online purchasing [17], resulting in a highernumber of purchases made by men.

Table 1: Top 5 most purchased products

Rank Product name # of purchases1 Frozen (DVD) 202,1032 Cards Against Humanity (Cards) 110,0323 Google Chromecast 59,5484 HDMI cable 54,4025 Pampers 51,044

Table 2: Top 5 products with the most money spent on them

Rank Product name Money spent on product1 Play Station 4 $ 7.0M2 Frozen (DVD) $ 3.8M3 Kindle $ 3.2M4 Samsung Galaxy Tab $ 2.7M5 Cards Against Humanity $ 2.5M

Table 3: Top product categories purchased by women and men

Rank Top categories Distinctive women Distinctive men1 Android Books Games2 Accessories Dresses Flash memory3 Books Diapering Light bulbs4 Vitamins Wallets Accessories5 Shirts Bracelets Batteries

With respect to the age, spending ability increases as peopleget older, peaking for the population between age 30 to 50 andthen declining afterwards. The same pattern holds for number ofpurchases made, average item price, and total money spent (Fig-ures 5(b), 5(d), 5(c)). Different generations also purchase differenttypes of products online (Table 5). Younger shoppers (18-22 yearsold) purchase more phone accessories and games, whereas oldershoppers (60-70 years old) are much more interested in buying TVshows. Also, blood sugar medicine is purchased more by the olderusers, which is expected.

Differences exist across genders as well. Table 3 shows the topfive categories of purchased products for male and female cus-tomers. Even though the ranking of the top products is the same,each product accounts for different fraction of all purchases withinthe same gender. To find the most distinctive categories, we com-pare the fraction of all the items bought by both genders, and con-sider the categories that have the largest differences. Books, dresses,and diapering are the categories that were more disproportionatelybought by women, whereas games, flash memory sticks, and ac-cessories (e.g., headphones) are the categories purchased more bymen. The largest differences range from only 0.5% to 1%, butare statistically significant. This result is aligned with previousresearch on offline shopping that found men more keen in buy-ing electronics and entertainment products, while women more in-clined to buy clothes [11, 22]. We repeated the same gender analy-sis at a product level (Table 4). Consistently, there is a high overlapbetween most purchased products.

In the following, we measure the impact of economic factorson online shopping behavior. We use the US census data to re-trieve median income associated with each zip code, making aninferred income for a user an aggregated estimate. Nevertheless,given the large size of this data set this coarse approach was enoughto observe clear trends. The number of purchases, average product

207

5

10−6

10−4

10−2

100

100 101 102 103 104

# of times being purchased

PDF

Figure 3: Number of times different items have been purchased

101

102

103

104

105

106

107

100 101 102 103

Item price

# of

tim

es p

urch

ased

Figure 4: Number of item purchases as a function of price

DVD, the most popular item in the data set, has been purchasedmore than 200,000 times, whereas the vast majority of the items hasbeen purchased less than 10 times. Table 1 lists the top 5 most fre-quently purchased items. Intuitively, the set of items that users havespent the most money on is a different set, because a single pur-chase of an expensive item would account for the same amount ofmoney of several purchases of cheaper items (Table 2). In fact, thenumber of times an item is purchased negatively correlates with theprice of that item (Figure 4). This is in line with previous survey-based studies [7] that found the vast majority of items purchasedonline are worth at most few tens of dollars.

3. PURCHASE PATTERN ANALYSISIn this section we present a quantitative analysis of factors affectingonline purchases. We examine the role of demographic, temporal,and social factors that include gender and age, daily and weeklypatterns, frequency of shopping, tendency towards recurring pur-chases, and budget constraints.

3.1 Demographic FactorsLet us consider how gender, age, and location (i.e., zip code)

affect purchasing behavior of customers. First, we measured frac-tion of all email users that made an online purchase. We foundthat higher fraction of women make online purchases compared tomen (Figure 5(a)), albeit men make slightly more purchases (Fig-ure 5(b)) and spend more money on average (Figure 5(c)). It isinteresting that, as a result, men spend much more money in total(Figure 5(d)). The same patterns hold across different age groups.The presented results back up findings from earlier consumer sur-veys which revealed that man have a higher perceived advantage ofonline shopping [2], while women havie a higher concern for neg-ative consequences of online purchasing [17], resulting in a highernumber of purchases made by men.

Table 1: Top 5 most purchased products

Rank Product name # of purchases1 Frozen (DVD) 202,1032 Cards Against Humanity (Cards) 110,0323 Google Chromecast 59,5484 HDMI cable 54,4025 Pampers 51,044

Table 2: Top 5 products with the most money spent on them

Rank Product name Money spent on product1 Play Station 4 $ 7.0M2 Frozen (DVD) $ 3.8M3 Kindle $ 3.2M4 Samsung Galaxy Tab $ 2.7M5 Cards Against Humanity $ 2.5M

Table 3: Top product categories purchased by women and men

Rank Top categories Distinctive women Distinctive men1 Android Books Games2 Accessories Dresses Flash memory3 Books Diapering Light bulbs4 Vitamins Wallets Accessories5 Shirts Bracelets Batteries

With respect to the age, spending ability increases as peopleget older, peaking for the population between age 30 to 50 andthen declining afterwards. The same pattern holds for number ofpurchases made, average item price, and total money spent (Fig-ures 5(b), 5(d), 5(c)). Different generations also purchase differenttypes of products online (Table 5). Younger shoppers (18-22 yearsold) purchase more phone accessories and games, whereas oldershoppers (60-70 years old) are much more interested in buying TVshows. Also, blood sugar medicine is purchased more by the olderusers, which is expected.

Differences exist across genders as well. Table 3 shows the topfive categories of purchased products for male and female cus-tomers. Even though the ranking of the top products is the same,each product accounts for different fraction of all purchases withinthe same gender. To find the most distinctive categories, we com-pare the fraction of all the items bought by both genders, and con-sider the categories that have the largest differences. Books, dresses,and diapering are the categories that were more disproportionatelybought by women, whereas games, flash memory sticks, and ac-cessories (e.g., headphones) are the categories purchased more bymen. The largest differences range from only 0.5% to 1%, butare statistically significant. This result is aligned with previousresearch on offline shopping that found men more keen in buy-ing electronics and entertainment products, while women more in-clined to buy clothes [11, 22]. We repeated the same gender analy-sis at a product level (Table 4). Consistently, there is a high overlapbetween most purchased products.

In the following, we measure the impact of economic factorson online shopping behavior. We use the US census data to re-trieve median income associated with each zip code, making aninferred income for a user an aggregated estimate. Nevertheless,given the large size of this data set this coarse approach was enoughto observe clear trends. The number of purchases, average product

207

20

30

40

50

20 30 40 50 60Age

% o

f onl

ine

shop

pers Men

Women

(a) Percentage of online shoppers

3

4

5

6

7

20 40 60 80Age

Avg

# of

item

s pu

rcha

sed

MenWomen

(b) Number of purchases

20

30

20 40 60 80Age

Med

ian

of p

rodu

ct p

rice

MenWomen

(c) Average price

25

50

75

100

125

20 40 60 80Age

Med

ian

tota

l $ s

pent

MenWomen

(d) Total money spent

Figure 5: Demographic analysis broken down by age: (a) Percentage of online shoppers; (b) number of items purchased; (c) averageprice of products purchased; and (d) total spent by men and women

Table 4: Top products purchased by women and men

Rank Top women Top men Distinctive women Distinctive men1 Frozen Frozen Frozen Chromecast2 iPhone screen protector Game of Thrones iPhone screen protector Game of Thrones3 Cards Against Humanity Chromecast iPhone screen protector Titanfall Xbox One4 iPhone screen protector (another brand) Cards Against Humanity iPhone case Playstation 45 Game of Thrones iPhone screen protector iPhone case Godfather collection

Table 5: Differences in the products purchased by younger (18-22 yo) and older (60-70 yo) users

Rank Top younger users Top older users Distinctive younger users Distinctive older users1 iPhone screen protector Frozen iPhone screen protector Frozen2 iPhone screen protector Game of Thrones iPhone screen protector (another brand) Game of Thrones3 Cards Against Humanity Chromecast Cards Against Humanity Downton Abbey4 iPhone case Downton Abbey iPhone case Blood sugar medicine5 Frozen Hunger Games iPhone case TurboTax Package

0.0

2.5

5.0

7.5

50k 100k 150kIncome

Aver

age

# of

pur

chas

es

(a) Number of purchases

0

20

40

60

50k 100k 150kIncome

Aver

age

item

pric

e

(b) Average item price

0

100

200

300

400

500

50k 100k 150kIncome

Aver

age

tota

l spe

nt

(c) Total money spentFigure 6: Effect of income on purchasing behavior

price, and total money spent are all positively correlated with in-come (Figures 6(a), 6(b), and 6(c) respectively). While users livingin high-income zip codes do not buy substantially more expensiveproducts, they do make more purchases, spending more money intotal than users from lower-income zip codes. Although the fac-tors that lead to lower-income households spending less online aremultiple and complex, part of this effect can be explained by thereluctance of people who are concerned with their financial safetyto trust and make full use of online shopping, as pointed out byprevious studies [23].

3.2 Temporal FactorsOur data set spans a period of eight months, giving us opportu-

nity to investigate temporal dynamics of purchasing behavior and

factors affecting it. Besides daily and weekly cycles and periodicpurchasing, we observed temporal variations that we associate withfinancial depletion: users wait longer to buy more expensive items,waiting for the budget to recover from the previous purchases.

3.2.1 Daily and Weekly CyclesFigure 7 shows the daily number of purchases over a period of

two months. The figure indicates a clear weekly shopping patternwith more purchases taking place in the first days of the week andfewer purchases on the weekends. On average there are 32.6%more purchases on Mondays than Sundays. There also exists strongdiurnal patterns in shopping behavior (Figure 8). Interestingly,most of the purchases occur during the working hours (i.e., in themorning and early afternoon). Note that for this analysis we in-

208

20

30

40

50

20 30 40 50 60Age

% o

f onl

ine

shop

pers Men

Women

(a) Percentage of online shoppers

3

4

5

6

7

20 40 60 80Age

Avg

# of

item

s pu

rcha

sed

MenWomen

(b) Number of purchases

20

30

20 40 60 80Age

Med

ian

of p

rodu

ct p

rice

MenWomen

(c) Average price

25

50

75

100

125

20 40 60 80Age

Med

ian

tota

l $ s

pent

MenWomen

(d) Total money spent

Figure 5: Demographic analysis broken down by age: (a) Percentage of online shoppers; (b) number of items purchased; (c) averageprice of products purchased; and (d) total spent by men and women

Table 4: Top products purchased by women and men

Rank Top women Top men Distinctive women Distinctive men1 Frozen Frozen Frozen Chromecast2 iPhone screen protector Game of Thrones iPhone screen protector Game of Thrones3 Cards Against Humanity Chromecast iPhone screen protector Titanfall Xbox One4 iPhone screen protector (another brand) Cards Against Humanity iPhone case Playstation 45 Game of Thrones iPhone screen protector iPhone case Godfather collection

Table 5: Differences in the products purchased by younger (18-22 yo) and older (60-70 yo) users

Rank Top younger users Top older users Distinctive younger users Distinctive older users1 iPhone screen protector Frozen iPhone screen protector Frozen2 iPhone screen protector Game of Thrones iPhone screen protector (another brand) Game of Thrones3 Cards Against Humanity Chromecast Cards Against Humanity Downton Abbey4 iPhone case Downton Abbey iPhone case Blood sugar medicine5 Frozen Hunger Games iPhone case TurboTax Package

0.0

2.5

5.0

7.5

50k 100k 150kIncome

Aver

age

# of

pur

chas

es

(a) Number of purchases

0

20

40

60

50k 100k 150kIncome

Aver

age

item

pric

e

(b) Average item price

0

100

200

300

400

500

50k 100k 150kIncome

Aver

age

tota

l spe

nt

(c) Total money spentFigure 6: Effect of income on purchasing behavior

price, and total money spent are all positively correlated with in-come (Figures 6(a), 6(b), and 6(c) respectively). While users livingin high-income zip codes do not buy substantially more expensiveproducts, they do make more purchases, spending more money intotal than users from lower-income zip codes. Although the fac-tors that lead to lower-income households spending less online aremultiple and complex, part of this effect can be explained by thereluctance of people who are concerned with their financial safetyto trust and make full use of online shopping, as pointed out byprevious studies [23].

3.2 Temporal FactorsOur data set spans a period of eight months, giving us opportu-

nity to investigate temporal dynamics of purchasing behavior and

factors affecting it. Besides daily and weekly cycles and periodicpurchasing, we observed temporal variations that we associate withfinancial depletion: users wait longer to buy more expensive items,waiting for the budget to recover from the previous purchases.

3.2.1 Daily and Weekly CyclesFigure 7 shows the daily number of purchases over a period of

two months. The figure indicates a clear weekly shopping patternwith more purchases taking place in the first days of the week andfewer purchases on the weekends. On average there are 32.6%more purchases on Mondays than Sundays. There also exists strongdiurnal patterns in shopping behavior (Figure 8). Interestingly,most of the purchases occur during the working hours (i.e., in themorning and early afternoon). Note that for this analysis we in-

208

6

20

30

40

50

20 30 40 50 60Age

% o

f onl

ine

shop

pers Men

Women

(a) Percentage of online shoppers

3

4

5

6

7

20 40 60 80Age

Avg

# of

item

s pu

rcha

sed

MenWomen

(b) Number of purchases

20

30

20 40 60 80Age

Med

ian

of p

rodu

ct p

rice

MenWomen

(c) Average price

25

50

75

100

125

20 40 60 80Age

Med

ian

tota

l $ s

pent

MenWomen

(d) Total money spent

Figure 5: Demographic analysis broken down by age: (a) Percentage of online shoppers; (b) number of items purchased; (c) averageprice of products purchased; and (d) total spent by men and women

Table 4: Top products purchased by women and men

Rank Top women Top men Distinctive women Distinctive men1 Frozen Frozen Frozen Chromecast2 iPhone screen protector Game of Thrones iPhone screen protector Game of Thrones3 Cards Against Humanity Chromecast iPhone screen protector Titanfall Xbox One4 iPhone screen protector (another brand) Cards Against Humanity iPhone case Playstation 45 Game of Thrones iPhone screen protector iPhone case Godfather collection

Table 5: Differences in the products purchased by younger (18-22 yo) and older (60-70 yo) users

Rank Top younger users Top older users Distinctive younger users Distinctive older users1 iPhone screen protector Frozen iPhone screen protector Frozen2 iPhone screen protector Game of Thrones iPhone screen protector (another brand) Game of Thrones3 Cards Against Humanity Chromecast Cards Against Humanity Downton Abbey4 iPhone case Downton Abbey iPhone case Blood sugar medicine5 Frozen Hunger Games iPhone case TurboTax Package

0.0

2.5

5.0

7.5

50k 100k 150kIncome

Aver

age

# of

pur

chas

es

(a) Number of purchases

0

20

40

60

50k 100k 150kIncome

Aver

age

item

pric

e

(b) Average item price

0

100

200

300

400

500

50k 100k 150kIncome

Aver

age

tota

l spe

nt

(c) Total money spentFigure 6: Effect of income on purchasing behavior

price, and total money spent are all positively correlated with in-come (Figures 6(a), 6(b), and 6(c) respectively). While users livingin high-income zip codes do not buy substantially more expensiveproducts, they do make more purchases, spending more money intotal than users from lower-income zip codes. Although the fac-tors that lead to lower-income households spending less online aremultiple and complex, part of this effect can be explained by thereluctance of people who are concerned with their financial safetyto trust and make full use of online shopping, as pointed out byprevious studies [23].

3.2 Temporal FactorsOur data set spans a period of eight months, giving us opportu-

nity to investigate temporal dynamics of purchasing behavior and

factors affecting it. Besides daily and weekly cycles and periodicpurchasing, we observed temporal variations that we associate withfinancial depletion: users wait longer to buy more expensive items,waiting for the budget to recover from the previous purchases.

3.2.1 Daily and Weekly CyclesFigure 7 shows the daily number of purchases over a period of

two months. The figure indicates a clear weekly shopping patternwith more purchases taking place in the first days of the week andfewer purchases on the weekends. On average there are 32.6%more purchases on Mondays than Sundays. There also exists strongdiurnal patterns in shopping behavior (Figure 8). Interestingly,most of the purchases occur during the working hours (i.e., in themorning and early afternoon). Note that for this analysis we in-

208

20

30

40

50

20 30 40 50 60Age

% o

f onl

ine

shop

pers Men

Women

(a) Percentage of online shoppers

3

4

5

6

7

20 40 60 80Age

Avg

# of

item

s pu

rcha

sed

MenWomen

(b) Number of purchases

20

30

20 40 60 80Age

Med

ian

of p

rodu

ct p

rice

MenWomen

(c) Average price

25

50

75

100

125

20 40 60 80Age

Med

ian

tota

l $ s

pent

MenWomen

(d) Total money spent

Figure 5: Demographic analysis broken down by age: (a) Percentage of online shoppers; (b) number of items purchased; (c) averageprice of products purchased; and (d) total spent by men and women

Table 4: Top products purchased by women and men

Rank Top women Top men Distinctive women Distinctive men1 Frozen Frozen Frozen Chromecast2 iPhone screen protector Game of Thrones iPhone screen protector Game of Thrones3 Cards Against Humanity Chromecast iPhone screen protector Titanfall Xbox One4 iPhone screen protector (another brand) Cards Against Humanity iPhone case Playstation 45 Game of Thrones iPhone screen protector iPhone case Godfather collection

Table 5: Differences in the products purchased by younger (18-22 yo) and older (60-70 yo) users

Rank Top younger users Top older users Distinctive younger users Distinctive older users1 iPhone screen protector Frozen iPhone screen protector Frozen2 iPhone screen protector Game of Thrones iPhone screen protector (another brand) Game of Thrones3 Cards Against Humanity Chromecast Cards Against Humanity Downton Abbey4 iPhone case Downton Abbey iPhone case Blood sugar medicine5 Frozen Hunger Games iPhone case TurboTax Package

0.0

2.5

5.0

7.5

50k 100k 150kIncome

Aver

age

# of

pur

chas

es

(a) Number of purchases

0

20

40

60

50k 100k 150kIncome

Aver

age

item

pric

e

(b) Average item price

0

100

200

300

400

500

50k 100k 150kIncome

Aver

age

tota

l spe

nt

(c) Total money spentFigure 6: Effect of income on purchasing behavior

price, and total money spent are all positively correlated with in-come (Figures 6(a), 6(b), and 6(c) respectively). While users livingin high-income zip codes do not buy substantially more expensiveproducts, they do make more purchases, spending more money intotal than users from lower-income zip codes. Although the fac-tors that lead to lower-income households spending less online aremultiple and complex, part of this effect can be explained by thereluctance of people who are concerned with their financial safetyto trust and make full use of online shopping, as pointed out byprevious studies [23].

3.2 Temporal FactorsOur data set spans a period of eight months, giving us opportu-

nity to investigate temporal dynamics of purchasing behavior and

factors affecting it. Besides daily and weekly cycles and periodicpurchasing, we observed temporal variations that we associate withfinancial depletion: users wait longer to buy more expensive items,waiting for the budget to recover from the previous purchases.

3.2.1 Daily and Weekly CyclesFigure 7 shows the daily number of purchases over a period of

two months. The figure indicates a clear weekly shopping patternwith more purchases taking place in the first days of the week andfewer purchases on the weekends. On average there are 32.6%more purchases on Mondays than Sundays. There also exists strongdiurnal patterns in shopping behavior (Figure 8). Interestingly,most of the purchases occur during the working hours (i.e., in themorning and early afternoon). Note that for this analysis we in-

208

7

8 × 10+5

1.2 × 10+6

1.6 × 10+6

2 × 10+6

Jun15 Jul15 Aug15Date

Num

ber o

f pur

chas

es

DailyWeek avg

Figure 7: Number of purchases in a day and average weekly

2 × 10+6

4 × 10+6

6 × 10+6

8 × 10+6

0 5 10 15 20Time of the day

Num

ber o

f pur

chas

es

Figure 8: Number of purchases in each hour of the day

Table 6: Top 5 items with the most number of repurchasesRank Item name Median purchase delay

1 Pampers 448 count 422 Bath tissue 623 Pampers 162 count 304 Pampers 152 count 315 Frozen 12

ferred the time zone from the user’s zip code, which might be dif-ferent from a shipping zip code for a purchase.

Researchers have also reported monthly effects, where peoplespend more money at the beginning of the month when they re-ceive their paychecks, compared to the end of the month [14]. Totest the first-of-the-month phenomenon, we compared spending inthe first Monday of the month with the last Monday of the month.We considered the first and the last Mondays and not the first andthe last days, because the strong weekly patterns would result inan unfair comparison if the first and the last day of the month arenot the same day of the week. Our data does not support the ear-lier findings and there are months in which the last Monday of themonth includes more activity compared to the first Monday of themonth.

3.2.2 Recurring PurchasesSome products are purchased periodically, such as printer car-

tridges, water filters, or toilet paper. Finding these items and theirtypical cycle would help predicting purchasing behavior. We dothis by counting the number of times each item has been purchasedby each user, then from each user’s count we eliminate those prod-ucts purchased only once, and lastly we aggregate the number of

10−5

10−3

10−1

100 101 102

Days from last purchase

PDF

Figure 9: Distribution of number of days between purchases

0.10

0.15

0.20

0.25

0.30

0 50 100 150Days from last purchase

Aver

age

norm

alize

d pr

ice

Figure 10: Relationship between purchase price and time tonext purchase (0.95 confidence interval are shown yet too smallto be observed)

purchases per each item. Table 6 shows the top five such products,along with the median number of days between purchases. Outof the top 20 products only four are neither toilet paper nor tissue(Frozen, Amazon gift card, chocolate chip cookie dough, and sin-gle serve coffees). In the top 20 list, the only unexpected item isthe Frozen DVD, which probably made the list due to users buyingadditional copies as gifts or due to purchases by small stores thatwere not eliminated by our removal criterion of maximum 1,000purchases. Interestingly, the number of days between purchases formost of the top 20 items is close to 1 or 2 months, which might bedue to automatic purchasing that users can set up.

3.2.3 Finite Budget EffectsFinally, we study the dynamics of individual purchasing behav-

ior. Figure 9 shows the distribution of number of days betweenpurchases. The distribution is heavy-tailed, indicating bursty dy-namics. The most likely time between purchases is one day andthere are local maxima around multiples of 7 days, consistent withweekly cycles we observed.

An individual’s purchasing decisions are not independent, butconstrained by their finances. Budgetary constraints introduce tem-poral dependencies between purchases made by the user: Afterbuying a product, the user has to wait some time to accumulatemoney to make another purchase. Previous work in economicsstudied models of budget allocation of households across differenttypes of goods to maximize an utility function [13] and analyzed

209

8 × 10+5

1.2 × 10+6

1.6 × 10+6

2 × 10+6

Jun15 Jul15 Aug15Date

Num

ber o

f pur

chas

es

DailyWeek avg

Figure 7: Number of purchases in a day and average weekly

2 × 10+6

4 × 10+6

6 × 10+6

8 × 10+6

0 5 10 15 20Time of the day

Num

ber o

f pur

chas

es

Figure 8: Number of purchases in each hour of the day

Table 6: Top 5 items with the most number of repurchasesRank Item name Median purchase delay

1 Pampers 448 count 422 Bath tissue 623 Pampers 162 count 304 Pampers 152 count 315 Frozen 12

ferred the time zone from the user’s zip code, which might be dif-ferent from a shipping zip code for a purchase.

Researchers have also reported monthly effects, where peoplespend more money at the beginning of the month when they re-ceive their paychecks, compared to the end of the month [14]. Totest the first-of-the-month phenomenon, we compared spending inthe first Monday of the month with the last Monday of the month.We considered the first and the last Mondays and not the first andthe last days, because the strong weekly patterns would result inan unfair comparison if the first and the last day of the month arenot the same day of the week. Our data does not support the ear-lier findings and there are months in which the last Monday of themonth includes more activity compared to the first Monday of themonth.

3.2.2 Recurring PurchasesSome products are purchased periodically, such as printer car-

tridges, water filters, or toilet paper. Finding these items and theirtypical cycle would help predicting purchasing behavior. We dothis by counting the number of times each item has been purchasedby each user, then from each user’s count we eliminate those prod-ucts purchased only once, and lastly we aggregate the number of

10−5

10−3

10−1

100 101 102

Days from last purchase

PDF

Figure 9: Distribution of number of days between purchases

0.10

0.15

0.20

0.25

0.30

0 50 100 150Days from last purchase

Aver

age

norm

alize

d pr

ice

Figure 10: Relationship between purchase price and time tonext purchase (0.95 confidence interval are shown yet too smallto be observed)

purchases per each item. Table 6 shows the top five such products,along with the median number of days between purchases. Outof the top 20 products only four are neither toilet paper nor tissue(Frozen, Amazon gift card, chocolate chip cookie dough, and sin-gle serve coffees). In the top 20 list, the only unexpected item isthe Frozen DVD, which probably made the list due to users buyingadditional copies as gifts or due to purchases by small stores thatwere not eliminated by our removal criterion of maximum 1,000purchases. Interestingly, the number of days between purchases formost of the top 20 items is close to 1 or 2 months, which might bedue to automatic purchasing that users can set up.

3.2.3 Finite Budget EffectsFinally, we study the dynamics of individual purchasing behav-

ior. Figure 9 shows the distribution of number of days betweenpurchases. The distribution is heavy-tailed, indicating bursty dy-namics. The most likely time between purchases is one day andthere are local maxima around multiples of 7 days, consistent withweekly cycles we observed.

An individual’s purchasing decisions are not independent, butconstrained by their finances. Budgetary constraints introduce tem-poral dependencies between purchases made by the user: Afterbuying a product, the user has to wait some time to accumulatemoney to make another purchase. Previous work in economicsstudied models of budget allocation of households across differenttypes of goods to maximize an utility function [13] and analyzed

209

8 × 10+5

1.2 × 10+6

1.6 × 10+6

2 × 10+6

Jun15 Jul15 Aug15Date

Num

ber o

f pur

chas

es

DailyWeek avg

Figure 7: Number of purchases in a day and average weekly

2 × 10+6

4 × 10+6

6 × 10+6

8 × 10+6

0 5 10 15 20Time of the day

Num

ber o

f pur

chas

es

Figure 8: Number of purchases in each hour of the day

Table 6: Top 5 items with the most number of repurchasesRank Item name Median purchase delay

1 Pampers 448 count 422 Bath tissue 623 Pampers 162 count 304 Pampers 152 count 315 Frozen 12

ferred the time zone from the user’s zip code, which might be dif-ferent from a shipping zip code for a purchase.

Researchers have also reported monthly effects, where peoplespend more money at the beginning of the month when they re-ceive their paychecks, compared to the end of the month [14]. Totest the first-of-the-month phenomenon, we compared spending inthe first Monday of the month with the last Monday of the month.We considered the first and the last Mondays and not the first andthe last days, because the strong weekly patterns would result inan unfair comparison if the first and the last day of the month arenot the same day of the week. Our data does not support the ear-lier findings and there are months in which the last Monday of themonth includes more activity compared to the first Monday of themonth.

3.2.2 Recurring PurchasesSome products are purchased periodically, such as printer car-

tridges, water filters, or toilet paper. Finding these items and theirtypical cycle would help predicting purchasing behavior. We dothis by counting the number of times each item has been purchasedby each user, then from each user’s count we eliminate those prod-ucts purchased only once, and lastly we aggregate the number of

10−5

10−3

10−1

100 101 102

Days from last purchase

PDF

Figure 9: Distribution of number of days between purchases

0.10

0.15

0.20

0.25

0.30

0 50 100 150Days from last purchase

Aver

age

norm

alize

d pr

ice

Figure 10: Relationship between purchase price and time tonext purchase (0.95 confidence interval are shown yet too smallto be observed)

purchases per each item. Table 6 shows the top five such products,along with the median number of days between purchases. Outof the top 20 products only four are neither toilet paper nor tissue(Frozen, Amazon gift card, chocolate chip cookie dough, and sin-gle serve coffees). In the top 20 list, the only unexpected item isthe Frozen DVD, which probably made the list due to users buyingadditional copies as gifts or due to purchases by small stores thatwere not eliminated by our removal criterion of maximum 1,000purchases. Interestingly, the number of days between purchases formost of the top 20 items is close to 1 or 2 months, which might bedue to automatic purchasing that users can set up.

3.2.3 Finite Budget EffectsFinally, we study the dynamics of individual purchasing behav-

ior. Figure 9 shows the distribution of number of days betweenpurchases. The distribution is heavy-tailed, indicating bursty dy-namics. The most likely time between purchases is one day andthere are local maxima around multiples of 7 days, consistent withweekly cycles we observed.

An individual’s purchasing decisions are not independent, butconstrained by their finances. Budgetary constraints introduce tem-poral dependencies between purchases made by the user: Afterbuying a product, the user has to wait some time to accumulatemoney to make another purchase. Previous work in economicsstudied models of budget allocation of households across differenttypes of goods to maximize an utility function [13] and analyzed

209

8

8 × 10+5

1.2 × 10+6

1.6 × 10+6

2 × 10+6

Jun15 Jul15 Aug15Date

Num

ber o

f pur

chas

es

DailyWeek avg

Figure 7: Number of purchases in a day and average weekly

2 × 10+6

4 × 10+6

6 × 10+6

8 × 10+6

0 5 10 15 20Time of the day

Num

ber o

f pur

chas

es

Figure 8: Number of purchases in each hour of the day

Table 6: Top 5 items with the most number of repurchasesRank Item name Median purchase delay

1 Pampers 448 count 422 Bath tissue 623 Pampers 162 count 304 Pampers 152 count 315 Frozen 12

ferred the time zone from the user’s zip code, which might be dif-ferent from a shipping zip code for a purchase.

Researchers have also reported monthly effects, where peoplespend more money at the beginning of the month when they re-ceive their paychecks, compared to the end of the month [14]. Totest the first-of-the-month phenomenon, we compared spending inthe first Monday of the month with the last Monday of the month.We considered the first and the last Mondays and not the first andthe last days, because the strong weekly patterns would result inan unfair comparison if the first and the last day of the month arenot the same day of the week. Our data does not support the ear-lier findings and there are months in which the last Monday of themonth includes more activity compared to the first Monday of themonth.

3.2.2 Recurring PurchasesSome products are purchased periodically, such as printer car-

tridges, water filters, or toilet paper. Finding these items and theirtypical cycle would help predicting purchasing behavior. We dothis by counting the number of times each item has been purchasedby each user, then from each user’s count we eliminate those prod-ucts purchased only once, and lastly we aggregate the number of

10−5

10−3

10−1

100 101 102

Days from last purchase

PDF

Figure 9: Distribution of number of days between purchases

0.10

0.15

0.20

0.25

0.30

0 50 100 150Days from last purchase

Aver

age

norm

alize

d pr

ice

Figure 10: Relationship between purchase price and time tonext purchase (0.95 confidence interval are shown yet too smallto be observed)

purchases per each item. Table 6 shows the top five such products,along with the median number of days between purchases. Outof the top 20 products only four are neither toilet paper nor tissue(Frozen, Amazon gift card, chocolate chip cookie dough, and sin-gle serve coffees). In the top 20 list, the only unexpected item isthe Frozen DVD, which probably made the list due to users buyingadditional copies as gifts or due to purchases by small stores thatwere not eliminated by our removal criterion of maximum 1,000purchases. Interestingly, the number of days between purchases formost of the top 20 items is close to 1 or 2 months, which might bedue to automatic purchasing that users can set up.

3.2.3 Finite Budget EffectsFinally, we study the dynamics of individual purchasing behav-

ior. Figure 9 shows the distribution of number of days betweenpurchases. The distribution is heavy-tailed, indicating bursty dy-namics. The most likely time between purchases is one day andthere are local maxima around multiples of 7 days, consistent withweekly cycles we observed.

An individual’s purchasing decisions are not independent, butconstrained by their finances. Budgetary constraints introduce tem-poral dependencies between purchases made by the user: Afterbuying a product, the user has to wait some time to accumulatemoney to make another purchase. Previous work in economicsstudied models of budget allocation of households across differenttypes of goods to maximize an utility function [13] and analyzed

209

0.19

0.20

0.21

0.22

0 50 100 150Days from last purchase

Aver

age

norm

alize

d pr

ice

NormalShuffled

(a) Users with 5 purchases

0.100

0.105

0.110

0.115

0.120

0 50 100 150Days from last purchase

Aver

age

norm

alize

d pr

ice

NormalShuffled

(b) Users with 9-11 purchases

0.030

0.035

0.040

0.045

0.050

0 50 100 150Days from last purchase

Aver

age

norm

alize

d pr

ice

NormalShuffled

(c) Users with 28-32 purchases

Figure 11: Relationship between purchase price and time to next purchase with 0.95 confidence interval

conditions under which people are willing to break their budgetcap [8]. However, we are not aware of any study aimed to supportthe hypothesis of the time of purchase being partly driven by anunderlying cyclic process of budget depletion and replenishment.

To test this hypothesis, we examined the relationship betweenpurchase price and the time period since last purchase. Since differ-ent users have different spending power, we considered the normal-ized change in the price given the number of days from the last pur-chase. In other words, we computed how users divide their personalspending across different purchases, given the time delay betweenpurchases. We then averaged the normalized values for all users,and report the change for each time delay. Figure 10 shows that asthe time delay gets longer, users spend higher fraction of their bud-gets, which supports our hypothesis. To test that our analysis doesnot have any bias in the way the users are grouped, we perform ashuffle test by randomly swapping the prices of products purchasedby users. This destroys the correlation between the time delay andproduct price. We then do the same analysis with the shuffled dataand expect to see a flat line. However, the same increase also existsin the shuffled data, indicating a bias in the methodology. This isdue to the heterogeneity of the underlying population: we are mix-ing users with different number of purchases. Users making morepurchases have lower normalized prices and also shorter time de-lays, and are systematically overrepresented in the left side of theplot, even in the shuffled data.

To partially account for heterogeneity, we grouped users by thenumber of purchases (i.e., those who made exactly 5 purchases,those with 9-11 purchases, and 28-32 purchases). Even withineach group there is variation as the total spending differs signifi-cantly across users, which we address by normalizing the productprice by the total amount of money spent by the user, as explainedabove. If our hypothesis is correct, there should be a refractory pe-riod after a purchase, with users waiting longer to make a largerpurchase. We clearly observe a positive relationship between (nor-malized) purchase price and the time (in days) since last purchase(Figure 11), but not in the shuffled data, which produces a horizon-tal line. We conclude that the relationship between time delay andpurchase price arises due to behavioral factors, stemming from thelimited budget of customers.

3.3 Social FactorsAn individual’s behavior is often correlated with that of his or hersocial contacts (or friends). In online shopping, this would resultin users purchasing products that are similar to those purchased bytheir friends. Distinct social mechanisms give rise to this corre-lation [5]. First, a friend could influence the user to buy the sameproduct by highly recommending it. This is the basis for social con-

tagions in general, and “word-of-mouth” marketing in particular,although empirical evidence suggests that influence has a limitedeffect on shopping behavior [27]. Alternatively, users could havebought the same product as their friends purchased, because peo-ple tend to be similar to their friends, and therefore, have similarneeds. The tendency of socially connected individuals to be similaris called homophily, and it is a strong organizing principle of so-cial networks. Studies have found that people tend to interact withothers who belong to a similar socio-economic class [16, 29] orshare similar interests [25, 4]. Finally, a user’s and their friend’s be-havior may be spuriously correlated because both depend on someother external factor, such as geographic proximity. In reality, allthese effects are interconnected [9, 3] and are difficult to disentan-gle from observational data [35]. For example, homophily oftenresults in selective exposure that may amplify social influence andword-of-mouth effects.

We investigate whether social correlations exist, although we donot resolve the source of the correlation. Specifically, we studywhether users who are connected to each other via email interac-tions tend to purchase similar products in contrast to users whoare not connected. To measure similarity of purchases betweentwo users, we first describe the purchases made by each user witha vector of products, each entry containing the frequency of pur-chase. This approach results in large and sparse vectors due to thelarge number of unique products in our data set. To address thischallenge, we use vectors of product categories, instead of prod-uct names. There are three levels of product categories, and weperform our experiments at all levels.

We compare similarity of category vectors of pairs of users whoare directly connected in the email network (104K pairs of users)with the same number of pairs of randomly chosen users (whoare not directly connected). We use cosine similarity to measuresimilarity of two vectors. Using top-level categories to describeuser purchases gives average similarity of 0.420 between connectedpairs of users, whereas random pairs have similarity of 0.377 onaverage (+11% relative change as compared to connected pairs).Using the more detailed level-2 categories to describe purchasesgives average similarity of 0.215 for connected versus 0.170 forrandom pairs of users (+26% relative increase). Finally using themost detailed, level 3, categories results in average similarity of0.188 for connected vs. 0.145 for random pairs of users (+30%relative increase). Although the absolute similarity decreases as amore detailed product vector is used, shoppers who communicateby email are always more similar than random shoppers who arenot directly connected.

Gender also plays an important role when measuring purchasesimilarity between user pairs. To quantify this effect, we calculate

210

9

10

11

12