
The Interplay of Information from Friends and from the Crowd to Search and to Purchase Experience Goods *

Baojiang Yang, Department of Engineering and Public Policy, Carnegie Mellon University, [email protected]

Miguel Godinho de Matos, Católica Lisbon School of Business and Economics, [email protected]

Pedro Ferreira, Heinz College and Department of Engineering and Public Policy, Carnegie Mellon University, [email protected]

    Consumers use information from both friends and the crowd to estimate the quality of products prior to

    search and purchase. However, these signals may carry information of different nature and value and thus

    consumers are likely to combine them in different ways throughout the consumption funnel. We study how

    they do so using an optimal stopping framework to model search behavior and a multinomial choice model

    to describe purchase decisions. We show results from an observational study using clickstream data from

    a large provider of video-on-demand and from a randomized control trial using an online video-on-demand

    system created and operated by us for the purpose of this study. In the latter case, we randomize the friends

    that buy movies, the movies’ number of likes and their prices, thus obtaining identification by design for

    the effects of interest. We find consistent evidence that the relative value of a like increases from search to

    purchase although less so for more expensive movies. In particular, for the most browsed movies, additional

    likes do not change the likelihood of searching for a movie but increase the likelihood of purchasing the movie

    whereas additional friends’ rentals increase the likelihood of searching for a movie but do not further increase

    the likelihood of purchasing the movie. In our setting, consumers seem to primarily start by browsing movies

that their friends bought to form a consideration set and use likes for decision-making purposes only when
they are closer to committing. Our results show how highlighting different signals throughout the consumers'

    shopping journey may help improve recommender systems.

    Key words : Peer Influence, Likes, Clickstream Data, Randomized Experiment

    1. Introduction

    The shopping journey of a consumer is a dynamic process whereby she acquires information about

    products from different sources and decides which ones to buy (Chen et al. 2011). Information from

friends has long been sought, usually in person and by discussing product features face to face.

    Information from strangers is now increasingly available over the Internet in the form of ratings

    * Authors in reverse alphabetical order



    and reviews. However, the information from friends and from the crowd is likely to convey different

    signals, not only of a different nature but also of different value (Bikhchandani et al. 1998). For

example, information from friends may be seen as more trustworthy and useful because friends
usually exhibit similar preferences (Manski 1993) and explain to each other the reasons why they like

    or dislike certain products. On the other hand, the information from the crowd may average out

    extreme opinions providing more reliable (average) evaluations (Lorenz et al. 2011, Godinho de

    Matos et al. 2016). Still, the latter has also been shown to sometimes suffer from herding (Salganik

    et al. 2006, Salganik and Watts 2008, Muchnik et al. 2013, Lee et al. 2015), which leads to biased

    estimates of product quality. The prior literature has also shown that while average ratings and

number of votes convey summary information that is easy to read and use, they do not capture all

    the diversity that reviews convey (Archak et al. 2011).

    So far, three papers have used historical data to characterize how consumers use information

    from friends and from the crowd. (Chen et al. 2011) study the effect of Word of Mouth (WoM)

    and of Observational Learning (OL) on sales at Amazon.com. The former is modeled by how many

    people bought a product and wrote reviews while the latter is modeled by how many people bought

    the product from those that browse it. The authors find that negative WoM affects sales more than

    positive WoM and that the reverse seems to hold for OL. These authors also find that the volume

    (but not the valence) of WoM and of OL complement each other. However, in an online music

    community, (Dewan et al. 2017) find that popularity, measured by the aggregate number of votes,

    and proximity, measured by the number of votes from friends, are substitutes for each other and

    that the effect of proximity dominates the effect of popularity when both are present. Using data

    from an online movie service, (Lee et al. 2015) find evidence of both herding and differentiation

    with respect to the effect of ratings from the crowd on subsequent ratings. Namely, they find that

    users issue fewer negative ratings in response to more positive prior ratings by the crowd as movie

popularity increases and that the herding effect of friends' prior ratings always becomes stronger

    with more prior ratings from friends.

    These studies look at how consumers use information from friends and from the crowd to purchase

    and evaluate products. They focus on the purchase and post-purchase behavior of consumers and

do not observe how they use these signals in the core of the consumption funnel, where awareness
and search develop. However, knowing how consumers use these signals at these stages of the

    decision making process is fundamental to shape and improve consumption. For example, one may

    imagine improving recommender systems by highlighting a different signal at each stage of the

    consumption funnel in line with what source of information matters most to consumers at each step

    of their thought process. In addition, product characteristics, most notably price, may also shape

    the way consumers combine information from these two sources. Therefore, one may also consider


    developing a recommender system that changes the information displayed to consumers based

    on the products searched. However, and at the outset, it is unclear which source of information

    (from friends or from the crowd) is more important when and how price mediates their relative

    importance. (Shukla et al. 2018) use clickstream data from a major doctor booking platform and

    find that patients pay less attention to price when the number of recommendations given by patients

    to doctors, which they call word of mouth in their setting, is available. These authors characterize

the tradeoff between price and number of recommendations while patients decide which doctor to
choose, but their setting does not differentiate between information from the crowd and information
from close friends, which is the focus of our study.

    In a world where obtaining and interpreting information is costly (Kim et al. 2010) and where

    excess of information has been shown to hinder the consumers’ search process (Iyengar and Lepper

    2000, Ghose et al. 2018a), consumers need to decide whether to get information about the products

    in the first place, which reinforces the importance of understanding how they use and combine

    information from friends and from the crowd prior to purchase. However, measuring the effect of

    information from friends and from the crowd, either separately or simultaneously, is empirically

    difficult because of the unobserved effects that may drive the organic behavior of consumers (Rohilla

    Shalizi and Thomas 2010). In particular, homophily in social networks has been shown to lead

    researchers to overestimate peer effects (Aral et al. 2009) unless some exogenous source of variation

orthogonal to the process of network formation can be explored to identify them (Taylor and

    Eckles 2017). In recent times, peer effects have been identified using randomized control trials

    across many fields (marketing, education, labor, development) and, in general, researchers find

    that their magnitude is smaller than previously thought. Still, votes and reviews have been shown

    to significantly affect consumer behavior, in particular in the case of experience goods. Our work

    provides results from both an observational study and a randomized control trial, which come in

    line with each other thus lending robustness to our findings.

    We study how information from friends and information from the crowd affect how consumers

    decide to search and to purchase movies in Video-on-Demand (VoD). We study movies because

    they are an experience good (Nelson 1970) and it is thus likely that in a VoD system consumers use

    both these signals to make decisions. We apply a structural model to describe both the decision of

    consumers to search for information about movies and their decision to rent a movie. We model the

    former type of decision using an optimal stopping rule whereby consumers search for an additional

    movie if the expected benefit from doing so supersedes the search cost (Stigler 1961, Weitzman

    1979). We model the latter type of decision using a multinomial choice model where consumers pick

the movie that yields the highest utility from the ones that they have previously searched (Chen

    et al. 2011). In other words, search is used to build a consideration set and purchase decisions are


    performed over such sets (Ke et al. 2016). This model is consistent with the primary role of digital

    recommender systems, which is to help consumers narrow down the set of interesting products to

    consider from the ever larger catalogs of options offered online (Zhang et al. 2015).

    Our paper studies two empirical contexts. In our first study, we use clickstream data from the

    VoD system of a large telecommunications provider, hereinafter called TELCO. In our second

    study, we use data from an online VoD system that we created and operated for the purpose of

    this study and we use Amazon Mechanical Turk (AMT) to recruit consumers for it. In both cases,

    we proxy the information from friends by whether they purchase movies, about which they can

talk, and we measure the wisdom of the crowd by the number of likes. In our first study, we

    proxy the social network using detailed records for the cellphone calls served by TELCO. In our

    second study, we embed participants in a virtual social network and allow them to browse their

    friends to learn about the movies that they have previously rented. In this study, we follow the

literature in experimental economics (Ding 2007) to design an incentive-compatible task at AMT,

    thus inducing as much realism as possible to improve the truthfulness of the participants’ choices.

    We randomize the number of friends’ rentals, the number of likes and the price to rent movies in

    our second study, thus obtaining identification by design for the effects of interest.

    In our first study, we find that more friends’ rentals and more likes increase the likelihood of

    clicking the page of a movie to obtain more information about it and the likelihood of renting the

    movie. We also find that, on average, the value of likes compared to that of friends’ rentals tends to

    increase from the search stage to the purchase stage but less so for more expensive movies. We find

even stronger results when we restrict the analysis to the most “popular” movies in TELCO's

    VoD system (proxied by most browsed by consumers). Across this set of movies, additional likes do

    not increase the likelihood of clicking the movie page but do increase the likelihood of purchasing

    the movie whereas friends’ rentals increase the likelihood of clicking the movie page but do not

    further increase the likelihood of renting the movie. Therefore, in this case, the increase in the

    value of likes relative to that of friends’ rentals from the search to the purchase stage is even more

    pronounced. We also find that friends’ rentals reduce the likelihood of clicking the movie page when

friends watch less than half of the movie, which is consistent with the idea that friends' rentals
proxy the potential transmission of information among friends about movie quality. In our second

    study, we also find that only friends’ rentals increase the likelihood of clicking the movie page and

    that only likes increase further the likelihood of purchase. The consistency of our results across the

    two studies increases our confidence in our findings and, in particular, the randomized nature of

    our second study is reassuring of proper econometric identification.

    Our results provide new insights for how consumers combine information from friends and from

    the crowd throughout the search-purchase process for the case of an experience good (movies).


    These insights can be valuable to managers and practitioners. In particular, recent studies show that

    vendors design online search engines to strategically shape consumer choice (Dukes and Liu 2015)

    by dynamically directing their attention to specific sources of information (Zhu and Dukes 2017).

    Our findings show that consumers seem to use information from friends to form a consideration

    set and consider only the number of likes when deciding what to buy. Therefore, a potential way

    to improve recommender systems is to highlight information from friends early in the consumers’

    digital shopping journey and show assessments from the crowd only later when consumers may

be closer to committing. This would be a departure from the current practice at most e-commerce

    websites that show all signals at all times, which may hinder, instead of guide, the search process.

    The remainder of this paper is organized as follows. Section 2 describes the relevant prior work.

Section 3 describes our modeling framework. Section 4 covers our first study and Section 5 covers
our second study. Finally, Section 6 concludes.

2. Related Work
2.1. The Effect of Ratings and Reviews on Consumer Behavior

    The prior literature on the effect of online reviews on sales provides mixed findings. For example,

    (Chevalier and Mayzlin 2006, Li and Hitt 2008) find that user reviews increase the sales of books

    at online retailers such as Amazon.com and Barnes-and-Noble, but (Sharon et al. 2004) find no

    effect using another dataset from the former retailer. In line with the latter finding, (Duan et al.

    2009) show that user ratings do not affect the adoption of popular software. Closer to our setting,

(Liu 2006) and (Duan et al. 2008) find that the box office sales of movies do not change with the
valence of ratings but increase with the volume of reviews. However, (Chintagunta et al. 2010) find

    that it is the former that drives box office sales and not the latter. In sum, the effect of reviews on

    sales is complex and context specific. For example, using data from Amazon.com, (Forman et al.

    2008) show that the identity of reviewers shapes future reviews and subsequent sales.

    It has also been shown that ratings may suffer from herding as well as induce herding behavior.

    Herding describes the phenomenon whereby individuals converge in their decisions when they take

    into account the decisions of those that had to decide before them (Banerjee 1992, Bikhchandani

    et al. 1998). Empirically, and using an artificial market for music, (Salganik et al. 2006, Salganik

    and Watts 2008) find that the popularity of songs is self-reinforcing for medium quality songs.

    Using data from a social news aggregator, (Muchnik et al. 2013) find that the likelihood of an up-

    vote is much higher when users initially see a positive vote. These authors also find no significant

    effect when users see a negative vote. In a setting closer to ours, (Godinho de Matos et al. 2016)

    find that users correct both positive and negative exogenous manipulations to the number of likes

    of movies in a VoD system. Taken as a whole, these studies show that self-fulfilling prophecies


    arise in markets for experience goods but also that they are constrained by the individuals’ private

    preferences. Individuals take the opinions from their friends into account to form and evolve their

    private preferences (Godes et al. 2005) and, therefore, it seems necessary to study how consumers

    combine information from friends and from ratings to obtain a fuller picture of how they behave.

    Other authors find evidence that ratings tend to deteriorate over time. (Li and Hitt 2008)

    attribute this effect to the differences in preferences between consumers that buy early versus those

    that buy later. (Moe and Trusov 2011) and (Godes and Silva 2012) confirm this finding showing

    that ratings worsen over time because the preferences of reviewers widens with more reviewers.

    Finally, (Moe and Schweidel 2012) find that less frequent reviewers imitate prior reviewers while

    more active reviewers tend to issue more negative reviews as a way to differentiate.

    2.2. The Effect of Peer Influence on Consumer Behavior

    Leenders (2002) defines peer influence as the dyadic process by which people “...shape their behav-

    ior, beliefs and attitudes according to what other people in the social system think, express and

    do”. Peer influence can be intentional or unintentional. It is not limited to direct communication

    but, one way or another, information about the behavior and attitudes of friends needs to be avail-

    able and shared. Sociology offers a number of theories that explain how peer influence arises. Most

    of them look at how the behavior and attitudes of friends change one’s assessment of a situation. In

    particular, the opinions of friends are often seen as standards – frames of reference – against which

    people evaluate their own opinions and options (Coleman et al. 1966). Friends offer signals, which

    reduce uncertainty, increasing agreement in opinions, attitudes and beliefs. The classical works of

    (Festinger 1950, Festinger and Thibaut 1951, Berelson 1954, Katz and Lazarsfeld 1955, Lazarsfeld

    et al. 1968) show empirically that in fact people use personal contacts to obtain more information

    and to better support their arguments.

    Peres et al. (2010) provide a framework to classify the factors that drive product adoption.

    The authors distinguish factors that stem only from heterogeneity among individuals and factors

    that involve social interactions. The former includes only individual characteristics that determine

    whether and when adoption occurs, while the latter includes all forms of communication across

    individuals – i.e., peer influence. In this framework, diffusion is the process that leads new products

    to spread across markets that is driven by “social influences”. These influences “include all of

    the interdependencies among consumers that affect various market players with or without their

    explicit knowledge.” Some studies focus on the effect of communication over specific channels, such

    as messages in an online social network (Aral and Walker 2012) or offline word-of-mouth (Mobius

    et al. 2015). In this paper, we proxy the transmission of information from friends by their previous

    movie rentals and, therefore, we remain agnostic to the specific form or channel for information

    transmission.


    Different streams of research have operationalized differently the mechanisms by which individ-

    uals influence each other. In threshold models (e.g., Granovetter 1978) adoption occurs when a

    given fraction of one’s friends adopts the product. In hub models (e.g., Watts and Dodds 2007), a

    number of well-informed central agents adopt the product leading their friends to adopt. All these

    models explore the idea that, under the right conditions, a small number of initial adopters may

    potentially lead to significant adoption. However, this might not always be the case. Watts and

    Dodds (2007) run a set of computer simulations to test this hypothesis finding that in most cases

    large cascades of peer influence might be driven not by influential individuals (opinion leaders) but

    rather by a large number of easily influenced people. Goel et al. (2012) analyze diffusion patterns

    in seven online domains, such as Twitter and Yahoo. They find similarities across all domains,

    namely that most adoption is part of very simple cascades of only one hop, and that only a very

small fraction of adoptions are associated with longer cascades. One reason why some studies report
large estimates of the effect of peer influence is that they fail to appropriately control for latent

    homophily, as discussed in (Rohilla Shalizi and Thomas 2010). For example, Aral et al. (2009)

    show that failing to control for homophily can inflate the estimates of peer influence by 300-700%.

    These authors look at peer influence in an instant messaging network and use a matched-sample

    estimation strategy to distinguish homophily from peer influence. They conclude that the latter

    is responsible for at least 50% of the observed correlation. These results speak to the importance

    of using appropriate empirical strategies to avoid overestimating peer influence. In this paper, we

    add results from a randomized control trial to results obtained using historical data, which allows

    us to better identify the effects of interest.

    2.3. Interplay between Information from Friends and from the Crowd

    Three papers have looked at the interplay between information from friends and information from

    the crowd using observational data. (Dewan et al. 2017) works with a website devoted to songs

    where users can befriend each other and thus form social circles. The authors explore the fact that

    on a certain date this website started showing the number of users who “favorited” each song.

    The authors find that this measure of aggregate popularity affects consumption and relatively

more so for narrow-appeal songs. In their setting, consumption refers to listening to songs (for free)

    on the website. The authors find that proximity – i.e., the effect of peers in the social network

    “favoriting” songs – also affects consumption and that popularity and proximity are substitutes.

    Namely, they find that the effect of popularity is stronger when proximity is not present and the

    effect of proximity, when present, dominates the effect of popularity.

    (Chen et al. 2011) study the effect of Word of Mouth (WoM) and of Observational Learning (OL)

    at Amazon.com. The former refers to the dissemination of information through communication


    among people (Arndt 1967) whereas the latter refers to consumers learning what others do by

    observing aggregate behavior but not necessarily the reasons behind their choices (Boone et al.

    1977, Bikhchandani et al. 1998). Taking advantage of changes in the information shown by this

retailer to consumers on what percentage of them bought a product that they had previously

    considered, the authors find that negative WoM affects sales more than positive WoM. However,

    the opposite seems to hold in the case of OL. Consequently, the authors conclude that offering

    information about aggregate sales may help popular products without hurting niche ones. More

    important for our study, the authors also find significant complementarity between the effect of

    the volume of WoM and of OL for sales but no significant interaction effect for their valence. Not

surprisingly, the volume and the valence of WoM and of OL may exhibit different effects because they

    provide different information. While valence is believed to affect consumers’ valuations (Chevalier

    and Mayzlin 2006, Mizerski 1982) volume is more likely to affect consumer awareness (Liu 2006).

    In a setting similar to ours, (Lee et al. 2015) use data from a social movie website where users

can befriend each other and rate movies. Using the identification strategy suggested in (Bramoullé

    et al. 2009) to identify peer effects, the authors find evidence of both herding and differentiation

    with respect to the effect of ratings from the crowd, while ratings from friends seem to always

    induce herding. Namely, the authors find that users issue fewer negative ratings in response to

    more positive prior ratings by the crowd as movie popularity increases. However, the herding effect

of friends' prior ratings always becomes stronger with more prior ratings from friends. The authors

    also find that social networking reduces the herding from prior ratings by the crowd. Overall, the

    authors conclude that ratings may provide biased indicators of movie quality and that combining

    movie ratings from the crowd and those from friends is unlikely to address this concern.

    Our paper differs from the ones described above in several important dimensions. First, none

    of these papers looks at the interplay between information from friends and information from the

    crowd for the decision to search information about products. Our work closes this gap by char-

    acterizing how consumers use both sources of information at different stages of their shopping

    journey, thus enriching our knowledge of how best to display information to them at the core of

    the consumption funnel. Second, these papers have not studied how price affects the interplay

    between information from friends and information from the crowd. Yet, it may be the case that

    consumers combine these two sources of information differently depending on whether the prod-

    ucts considered are cheap or expensive. Our paper studies how price mediates consumer behavior

    showing heterogeneous effects in this respect, thus pointing to the idea that improving the con-

    sumers’ shopping experience may require exposing them to different information sets at each stage

    of the search-purchase process based on product characteristics. Finally, the three papers referred

to above rely only on observational datasets to estimate the effects of interest. Our work adds results


    from a randomized control trial to increase the confidence in our results. Randomizing number of

    likes, friends’ rentals and movie prices offers us an improved handle on identification allowing us

    to better claim causal effects.

    2.4. Modeling the Consumption Funnel and Search-Purchase Decisions

    The idea of a conversion funnel has been central to the marketing literature for a long time (Howard

    and Sheth 1973, Barry 1987) as well as widely accepted by marketers (Mulpuru 2011, Court et al.

    2009). This approach to modeling consumer behavior considers that consumers move through

    several stages, from awareness to post-purchase, and that they may change stages as a function

    of the information that they obtain about products. The most often used approach to model the

    conversion funnel is to include the following stages - awareness, consideration and purchase (Bruce

    et al. 2012, Mulpuru 2011, Court et al. 2009). At the start, consumers receive some trigger, in

    many instances advertising, and become aware of the products in the market. Subsequently, they

search for additional information about the products that may be of interest to them, leading
them to create a consideration set with the products that they are potentially interested in. Finally,

    consumers decide whether to purchase a product among the ones they collected information for. In

    this paper, we study only consumers that are already aware of the product (movies available from

    a VoD system) and thus we focus our analysis on the latter two stages of the conversion funnel,

    namely consideration and purchase.

    Measuring the movement of consumers throughout the conversion funnel has been historically

    difficult due to the lack of detailed individual level data allowing researchers to observe the con-

    sumers’ stage. This has now been circumvented in online settings with the pervasiveness of click-

    stream data. For example, (Abhishek et al. 2018) use detailed data from several web touch-points

    to address the problem of multi-stage attribution in advertising. (Shukla et al. 2018) use click-

    stream data from a major doctor booking platform to find that online WoM reduces consumer

    search costs and that patients pay less attention to price when WoM is available. In a setting more

    similar to ours, (Ghose et al. 2018a) uses the optimal sequential search framework put forward in

    (Weitzman 1979) to model the decisions of consumers to search for hotels and to purchase rooms.

    The sequential search strategy to study consumer search behavior in online settings has been also

    used in (Kim et al. 2010, Koulayev 2014, Chen and Yao 2016). In this modeling framework, which

    we also use in this paper, consumers search for an additional product if the expected benefit from

doing so exceeds the search cost. Consumers search for additional products as long as the for-

    mer condition holds. Once done with search, consumers choose the product in their consideration

    set that maximizes expected utility or abandon the market without purchasing one if the outside

    option holds higher utility.


As thoroughly discussed by these authors, an empirical challenge in estimating search models is
to simultaneously identify the heterogeneity in consumers' preferences and search costs (Hortaçsu

    and Syverson 2004). In short, consumers may stop searching for additional products because of

    high search costs or because they value highly the products already searched. Fortunately, the

    optimal sequential search framework offers an identification strategy akin to selection models,

    where estimating the search decision mimics the first stage of such models and estimating the

    purchase decision mimics the second stage. Search decisions depend on both consumer preferences

    and search costs. However, search costs do not enter the purchase decision because consumers

    choose products in the consideration set (i.e., search costs are sunk for the purchase decision) and

thus act as exclusion restrictions. Still, the setting that we study in our paper is far more complex

    than those previously considered in the literature because besides simultaneously estimating search

    and purchase decisions we are also interested in estimating both the effect of the crowd and the

    effect of peers. For this purpose, in the second part of our study we resort to a randomized control

    trial that provides identification by design for these effects.

3. Modeling Approach
3.1. Consumer Utility Functions and Search Costs

    We index consumers by i∈ I= {1, . . . ,N} and movies by j ∈ J= {1, . . . , J}. Consumer i has specific

    knowledge about movie j, represented by Xij. This includes all the information revealed to this

    consumer about this movie before she clicks on the movie cover. When she clicks on the cover of

    movie j additional information is revealed to her, represented below by zj, which is the same for all

    users. After this information is revealed, the utility from renting movie j to consumer i, represented

below by $u^a_{ij}$, is given by

$u^a_{ij} = X_{ij}\beta + z_j\gamma + \epsilon_{ij}$   (1)

where $\epsilon_{ij} \sim$ Type I EV(0,1), as is customary in models where consumers choose among a pre-

    defined finite set of alternatives, represents the unknown idiosyncratic stochastic error term, which

we assume is i.i.d. across consumers and movies. In our setting, searching for a movie to watch
is associated with clicking movie covers to learn more about the movies. Following the most used

    approaches in the literature to model search behavior, we assume that prior to clicking on the

    movie cover consumers do not know zj but know its distribution (Kim et al. 2010, Koulayev 2014,

    Ghose et al. 2018b). Accordingly, the consumer’s utility function before clicking on the movie cover

    can be written as

$u^b_{ij} = V_{ij} + \epsilon_{ij}; \qquad V_{ij} = X_{ij}\beta + E(z_j)\gamma + \xi_{z_j}$   (2)


    The expected value of zj represents the consumer’s best guess for the true value of zj. This is

computed by each consumer as a function of the information available to her in Xij. Before clicking
on the movie cover, consumers use the expected value of zj in lieu of its true value, introducing an
additional error term, namely $(z_j - E(z_j))\gamma$, represented by $\xi_{z_j}$. Clicking the movie page realizes
$\xi_{z_j} + \epsilon_{ij}$ for the consumer (while $\epsilon_{ij}$ is still unobserved to the researcher). Finally, searching for
movies is costly. Following (Kim et al. 2010), we model the marginal search cost of movie j for
consumer i, represented below by cij, using

$c_{ij} = \exp\{w_{ij}\eta\}$   (3)

    where wij contains several covariates that describe the position of movie j to consumer i in the

    VoD system used by this consumer. This cost is sunk for the purchase decision and enters only the

    consumer’s decision to search, as explained below in more detail.

    3.2. The Consumer Search and Purchase Decisions

    Consider a stage k in the search process of consumer i when the k movies in Sik ⊂ J have been

    searched. The set Sik ∪ {0} represents the consideration set of this consumer at this stage, where

    0 represents not buying a movie. At this stage, this consumer can either continue searching for

    another movie or stop searching and purchase the alternative from her consideration set with the

    highest utility. Let u∗ki represent the utility of this alternative, that is, u∗ki =maxj∈Sik{uij,0}. The

    expected benefit from searching an additional movie j′ in J−Sik given the current highest utility,

    represented below by Bij′(u∗ki), is given by

$B_{ij'}(u^*_{ki}) = \int_{u^*_{ki}}^{\infty} \left(u^b_{ij'} - u^*_{ki}\right) f(u^b_{ij'})\, du^b_{ij'}$   (4)

    where f(x) is the probability density function of movie utility x. Consumer i keeps searching

for movies as long as there is at least one movie j′ in J − Sik with $c_{ij'} < B_{ij'}(u^*_{ki})$. The reservation
utility $R_{ij'}$ of movie j′ is defined as the value that solves $c_{ij'} = B_{ij'}(R_{ij'})$. Consumer i thus searches
movie j′ if $R_{ij'} > u^*_{ki}$, that is, if its reservation utility exceeds the (highest) utility that the

    consumer can enjoy at the current search stage. Consequently, the search decisions of consumer

    i can be modeled as follows: (1) compute Rij for all movies j ∈ J and rank them in descending

order, that is, let rij represent the index of the movie with the j-th highest reservation utility for
consumer i, so that $R_{ir_{i1}} > R_{ir_{i2}} > \cdots > R_{ir_{iJ}}$; (2) at stage k, if $R_{ir_{i(k+1)}} > u^*_{ki}$ then search movie
$r_{i(k+1)}$, otherwise stop searching, choose the movie with index $\arg\max_{j' \in S_{ik} \cup \{0\}} u_{ij'}$ and accept $u^*_{ki}$.

    As a consequence, the search session for consumer i when she stops searching at stage k, which

    we represent by the corresponding consideration set Sik, includes the ordered sequence of movies

    indexed by ri1, ri2, . . . , rik.
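To make the stopping rule above concrete, the following is a minimal sketch, in Python, of how one consumer's search session could be simulated under this framework. It assumes the expected utilities Vij and the reservation utilities Rij are already available, and it collapses the post-click shock into a single Type I EV draw; it illustrates the decision rule only and is not the estimation code used in the paper.

```python
import numpy as np

def simulate_search_session(V, R, u_outside=0.0, rng=None):
    """Sketch of the optimal sequential search rule: rank movies by reservation
    utility, search in that order, stop when no remaining reservation utility
    beats the best realized utility, then pick the best searched alternative.

    V : expected pre-click utilities V_ij for one consumer (one entry per movie).
    R : reservation utilities R_ij solving c_ij = B_ij(R_ij).
    Returns (ordered list of searched movie indices, chosen index or -1 for no rental).
    """
    rng = rng or np.random.default_rng(0)
    order = np.argsort(-np.asarray(R))       # descending reservation utility
    searched, realized = [], {}
    u_star = u_outside                       # current best utility (outside option = 0)

    for j in order:
        if R[j] <= u_star:                   # stopping rule of the model
            break
        searched.append(j)
        realized[j] = V[j] + rng.gumbel()    # simplified post-click draw (Type I EV error)
        u_star = max(u_star, realized[j])

    best = max(realized, key=realized.get, default=-1)
    choice = best if best != -1 and realized[best] > u_outside else -1
    return searched, choice
```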

Let πi(rij) represent the probability that consumer i clicks the page of the movie with the j-th
highest reservation utility. This is equal to the probability that all previous j − 1 draws of utility are
lower than $R_{ir_{ij}}$, which is given by $\prod_{k=1}^{j-1} F(R_{ir_{ij}} - V_{ik})$ for j > 1 and equal to 1 if j = 1 (given that

    our data are conditional on search having occurred), where F (·) is the CDF for the distribution

    of movie utility. The probability that a search session with sequence Sik occurs for consumer i,

    represented below by πi(Sik), is given by the probability that this consumer searches movie rik

    but does not search movie ri(k+1), which is πi(rik)− πi(ri(k+1)). The probability that the movie

with the j-th (j ≤ k) highest reservation utility for consumer i is purchased from such a search session,
represented below by πi(rij|Sik), is defined by $P(u_{ir_{ij}} > u_{ij'},\ \forall j' \in S_{ik} \cup \{0\} \setminus \{r_{ij}\})$.
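As an illustration of how the click and session probabilities just defined translate into code, the sketch below computes πi(rij) and πi(Sik) for one consumer whose movies are already ranked by reservation utility. The purchase probability πi(rij|Sik) depends on the joint distribution of utilities and is typically simulated, so it is omitted; the normal CDF default for F is purely illustrative.

```python
import numpy as np
from scipy.stats import norm

def click_probability(j, V_ranked, R_ranked, F=norm.cdf):
    """pi_i(r_ij): probability that the j-th ranked movie (1-indexed) is clicked,
    i.e., that the j-1 previously searched movies all realized utilities below
    its reservation utility. F is the CDF of the utility shock."""
    if j == 1:
        return 1.0  # the first-ranked movie is always searched, conditional on any search
    return float(np.prod([F(R_ranked[j - 1] - V_ranked[m]) for m in range(j - 1)]))

def session_probability(k, V_ranked, R_ranked, F=norm.cdf):
    """pi_i(S_ik): the consumer searches the k-th ranked movie but not the (k+1)-th."""
    p_k = click_probability(k, V_ranked, R_ranked, F)
    p_next = click_probability(k + 1, V_ranked, R_ranked, F) if k < len(R_ranked) else 0.0
    return p_k - p_next
```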

    3.3. Estimating the Search-Purchase Model

    Estimating the model introduced above requires computing the reservation utilities. From equation

    4 and from the definition of reservation utility, we have:

$c_{ij} = \int_{R_{ij}}^{\infty} (u^b_{ij} - R_{ij}) f(u^b_{ij})\, du^b_{ij} = (1 - F(R_{ij})) \int_{R_{ij}}^{\infty} (u^b_{ij} - R_{ij}) \frac{f(u^b_{ij})}{1 - F(R_{ij})}\, du^b_{ij}$   (5)

$\quad\;\; = (1 - \Phi(\zeta_{ij})) \left( u^b_{ij} - R_{ij} + \frac{\phi(\zeta_{ij})}{1 - \Phi(\zeta_{ij})}\, \sigma_{u^b_{ij}} \right)$   (6)

where $\zeta_{ij} = (R_{ij} - u^b_{ij})/\sigma_{u^b_{ij}}$ and the last equality arises using the expectation of the truncated
normal distribution and thus is only valid when $u^b_{ij}$ follows such a distribution. However, this
is not the case in our setting given that $\epsilon_{ij} \sim$ Type I EV(0,1). In practice, we transform this
logit error term into standard normal disturbances, with standard deviation represented by $\tilde{\sigma}_{\epsilon_{ij}}$, using an inverse standard
normal CDF function, as proposed and widely used by previous studies to compute the inverse
Mills ratio for the logit distribution (Ghose et al. 2018a). We also assume $z_j \sim N(0, \Sigma_{z_j})$ and thus
$z_j\gamma \sim N(0, \sigma^2_{z_j})$, from where

$\sigma^2_{u^b_{ij}} = \tilde{\sigma}^2_{\epsilon_{ij}} + \sigma^2_{z_j}, \quad \text{with} \quad \sigma^2_{z_j} = \sum_{n=1}^{k} \gamma_{(n)}^2 \Sigma_{z_j}(n,n) + 2 \sum_{m=2}^{k} \sum_{n=1}^{m-1} \gamma_{(m)}\gamma_{(n)} \Sigma_{z_j}(m,n).$

    Therefore, given a pair {cij, Vij}, we can compute the corresponding reservation utility Rij by

solving the equation above. In practice, and for the sake of performance, we follow the literature and
compute a grid of {cij, Vij, Rij} triples ahead of time and look it up when needed. We also use
polynomial interpolation to approximate the reservation utility when a given {cij, Vij} pair is not
explicitly present in this look-up table.
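As an illustration of this look-up strategy, the sketch below pre-computes reservation utilities on a grid of {c, V} pairs and interpolates between grid points. It assumes a normal approximation for ubij with mean V and standard deviation sigma, in the spirit of equation (6); for simplicity it uses linear rather than polynomial interpolation, and the function names are ours, not the authors'.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq
from scipy.interpolate import RegularGridInterpolator

def expected_benefit(R, V, sigma):
    """B(R) = E[(u - R)^+] for u ~ N(V, sigma^2), the expected benefit of one more search."""
    z = (R - V) / sigma
    return (V - R) * (1.0 - norm.cdf(z)) + sigma * norm.pdf(z)

def reservation_utility(c, V, sigma):
    """Reservation utility: the R solving c = B(R). B is strictly decreasing in R,
    so a bracketing root-finder suffices."""
    return brentq(lambda R: expected_benefit(R, V, sigma) - c, V - 10 * sigma, V + 10 * sigma)

def build_reservation_grid(c_grid, V_grid, sigma):
    """Pre-compute the {c, V} -> R table once, then interpolate at run time."""
    R = np.array([[reservation_utility(c, V, sigma) for V in V_grid] for c in c_grid])
    return RegularGridInterpolator((c_grid, V_grid), R)

lookup = build_reservation_grid(np.linspace(0.01, 2.0, 50), np.linspace(-3, 3, 50), sigma=1.0)
R_hat = lookup([[0.35, 0.8]])  # approximate reservation utility for c = 0.35, V = 0.8
```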


    Armed with the reservation utilities we can immediately compute πi(Sik) and πi(rij|Sik) from

    where we can estimate our model parameters θ = {β,γ, η} using maximum simulated likelihood,

    with our log-likelihood function given by:

$\mathrm{Log\,Likelihood}(\theta \mid \mathbf{S}, \mathbf{Y}) = \sum_{i \in I} \sum_{s \in S_i} \Big\{ \ln(\pi_i(s)) + \sum_{k \in s} y_{iks} \ln(\pi_i(k \mid s)) \Big\}$   (7)

where $S_i$ is the set of sessions of consumer i in our data and $\mathbf{S} = \bigcup_{i \in I} S_i$ is the set of all search
sessions in our data. Finally, $y_{iks}$ is an indicator for whether consumer i purchased product k in
session s and $\mathbf{Y} = \bigcup_{i \in I, k \in J, s \in S_i} y_{iks}$.

    There is no closed-form solution for the model parameters that maximize our log-likelihood

    function given by expression 7. As is customary in the literature (Honka et al. 2017), we use

    heuristics to maximize this expression. Newton-based methods can result in local optima or fail to

converge, which has been addressed in the literature by using a downhill simplex method, such as

    Nelder-Mead (Ghose et al. 2018b), or by applying a kernel smoother to approximate the original

    problem and solve it using a Newton-based method (Honka et al. 2017). However, Nelder-Mead can

    converge to non-stationary points (McKinnon 1998) and the kernel smoothing approach requires

    finding the appropriate smoothing parameters, which may be computationally costly and limit

    generalizability. In our particular case, we compare the performance of several algorithms and use

    the one that performs best on our data, which turns out to be the Broyden-Fletcher-Goldfarb-

Shanno (BFGS) algorithm (it converges faster and is more robust to starting values). We also tested
BFGS with L1 regularization as a robustness check to avoid potential overfitting, minimizing
−LL(Θ) + λ‖Θ‖₁. We varied λ between 1 and 100 and did not observe significant changes to the estimates
that we report in our results section.
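A bare-bones version of this estimation step is sketched below. It assumes a user-supplied function neg_log_likelihood(theta, data) that returns the negative of expression (7); the optional lambda term reproduces the L1-penalized robustness check. This illustrates the optimization strategy rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def estimate(neg_log_likelihood, data, theta0, lam=0.0):
    """Maximize the simulated log-likelihood (7) by minimizing its negative with BFGS.
    lam > 0 adds the L1 penalty lambda * ||theta||_1 used as a robustness check."""
    def objective(theta):
        return neg_log_likelihood(theta, data) + lam * np.sum(np.abs(theta))

    # BFGS with numerical gradients; starting values matter, so in practice one
    # would try several theta0 vectors and keep the best solution found.
    result = minimize(objective, theta0, method="BFGS")
    return result.x, -result.fun

# Example call (neg_log_likelihood, data and the dimension K are placeholders):
# theta_hat, loglik = estimate(neg_log_likelihood, data, theta0=np.zeros(K))
```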

4. Observational Study
4.1. Empirical Context and Data

    We use clickstream and purchase data from a large telecommunications provider, hereinafter called

    TELCO, to study the role of information from friends and from the crowd on the decisions of

    consumers to search and purchase movies. TELCO is a major provider of Pay-TV services in

    the country that we analyze, serving more than 1.5 million households. In addition to triple play

    service (TV, Internet and telephony), TELCO offers Video-on-Demand (VoD), from where our

    data come from. Our dataset covers the period June - December 2015. During this period we have

    timestamped data for all relevant events initiated in this VoD system, namely movie page views,

    movie purchases, streaming sessions and likes issued. Each event indexes both a movie in TELCO’s

    VoD system and a consumer (more precisely, a set-top-box in a household with TELCO service).

    A clickstream session is defined as a set of events without an idle time longer than 1 hour. Within


    one hour, consumers usually either purchase a movie from the VoD catalog or exit the VoD system

    without buying one.
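For illustration, the one-hour idle rule can be implemented over a table of timestamped events as in the sketch below; the column names household_id and timestamp are assumptions, not the actual TELCO schema.

```python
import pandas as pd

def assign_sessions(events: pd.DataFrame, idle_gap: str = "1h") -> pd.DataFrame:
    """Label each clickstream event with a session id: a new session starts at a
    household's first event or whenever the idle time exceeds idle_gap (1 hour)."""
    events = events.sort_values(["household_id", "timestamp"]).copy()
    gap = events.groupby("household_id")["timestamp"].diff()
    new_session = gap.isna() | (gap > pd.Timedelta(idle_gap))
    events["session_id"] = new_session.groupby(events["household_id"]).cumsum()
    return events
```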

We use data from the month of December 2015, hereinafter referred to as the period of analysis,
to estimate our model and, at any point in time, we use data from the previous 6 months to compute

    cumulative signals as explained in detail below. Our dataset contains 261,407 clickstream sessions

    from 117,455 unique households that browsed TELCO’s VoD system at least once in December

    2015. Table 1 shows the relevant descriptive statistics. On average, households engage in 1.7 sessions

    per month, browse 2.8 movies per session and end up buying 23.9% of the time. TELCO’s VoD

    system included 1,771 movies during our period of analysis. Their average price was $4.43. Usually

    consumers have 48 hours to stream the movie that they purchase from TELCO’s VoD system

    except for Perpetual VoD titles (14% of the movies in this system), which could be streamed any

    time after purchase. Movies in this VoD system had on average 392 likes when searched by TELCO

    consumers.

    Table 1 Descriptive Statistics from TELCO VoD sessions and TELCO VoD movies.

Household Covariates              min     max        median   mean     sd
Sessions / Month                  1       10         1        1.719    1.175
Movies Searched / Session         1       62         2        2.822    3.517
Movies Purchased / Session        0       1          0        0.239    0.427
Movies with Friends' Purchases    0       239        5        10.940   15.820
Friends with VoD Service          0       311        12       16.4     20.9

Movie Covariates                  min     max        median   mean     sd
Price                             1.49    19.99      2.99     4.43     3.34
Likes When Searched               1       6,929      110      391.60   698.30
IMDb Rating                       1.10    9          6.40     6.27     1.21
IMDb Votes                        5       1,529,615  27,207   94,864   164,877
Years since Release               0       74         5        6.51     6.84
Days in VoD System                0       3100       375      561      606.6

    Figure 1 illustrates the VoD interface at TELCO, which consumers could reach with one click

    of a button in the TV remote control. The entry page of TELCO’s VoD system contains multiple

    menus stacked vertically, each with its corresponding name, such as “Suggested Movies” or “Recent

    Movies”. Immediately under each menu movie covers appear side by side. Consumers use a cursor

    to highlight the movie that they are interested in. They can scroll up and down across menus and

    left and right across movies. Scrolling right past the last movie cover in a menu unveils additional

    movies under that menu. The only information about the movies that is revealed to consumers

    at this stage, besides that in the movie covers, is the cumulative number of likes of the movie

    highlighted by the cursor, which shows at the bottom of the screen, as displayed in Figure 1.

[Figure 1 schematic: vertically stacked menus such as "SUGGESTED MOVIES" and "RECENT MOVIES", each followed by a row of movie covers; the bottom of the screen shows "MOVIE TITLE (### LIKES)" for the title highlighted by the cursor.]

    Figure 1 Illustration of the VoD interface at TELCO. Menus appear vertically on the TV screen and movies

    underneath each menu side by side. The cursor highlights one video at a time whose number of likes is shown at

    the bottom of the page. Arrows allow scrolling across menus and across movies.

    Clicking on the movie cover highlighted by the cursor displays the movie landing page, which

    reveals the following additional information to the consumer: price to rent the movie, length of

    the movie, cast, directors, year of release, IMDb rating and IMDb votes. The movie landing page

    allows the consumer to choose one of the following actions: (i) watch the movie trailer; (ii) pay and

rent the movie; (iii) issue a like for the movie; (iv) redeem a voucher to watch the movie; (v) get

    back to the menu page. Figure 2 illustrates the movie landing page at TELCO.

    Finally, we also have access to Call Detailed Records (CDRs) from cell phone calls served by

    TELCO. Each CDR contains the anonymized phone number of both the caller and the callee and

    a timestamp for the call. This dataset contains more than 193 million records for all calls placed by

    TELCO consumers between August and October 2015. We match the anonymized phone numbers

    in this dataset to household accounts at TELCO and keep the CDRs in which both the caller and

    the callee are TELCO consumers with VoD service. From here, we create a graph of social proximity

    across households. We introduce an edge in this graph between two households when they call each

    other in our dataset. Requiring two-way communication to add an edge to this graph is likely to

    rule out a number of otherwise spurious connections, such as those to commercial service providers

and/or call centers. For the sake of space, two households connected in this graph will be called friends

    in the remainder of this paper. The resulting social graph includes 474 thousand households and

    3.8 million edges. The median and average degree are 11 and 16, respectively. Once the social

    network is defined we can define friends’ covariates. Table 1 shows that, on average, each TELCO

household has 16.4 friends with VoD service from TELCO and every time a TELCO household
browses the VoD catalog there are an average of 11 movies that her friends have purchased.

[Figure 2 schematic: the movie landing page, showing the movie cover, "MOVIE TITLE (### LIKES)", year, duration, SD/HD, director, cast, synopsis, the price to rent for 48h, and buttons to go Back, Watch Trailer, Rent Movie, Like Movie and Redeem Voucher.]

Figure 2  Illustration of the movie landing page at TELCO's VoD system. This page reveals additional
information to consumers, including the price. From this page consumers can, among other actions, watch the
movie trailer, rent the movie or go back to the menus page.
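The construction of the friendship graph from CDRs, with the two-way communication requirement described above, can be sketched as follows. This is an illustrative reconstruction rather than the pipeline actually used, and the column names are placeholders for the matched household identifiers.

```python
import pandas as pd
import networkx as nx

def build_social_graph(cdrs: pd.DataFrame) -> nx.Graph:
    """Add an edge between two households only if each called the other at least once,
    which rules out one-way contacts such as call centers or service providers."""
    pairs = set(zip(cdrs["caller_household"], cdrs["callee_household"]))
    graph = nx.Graph()
    for a, b in pairs:
        if a != b and (b, a) in pairs:
            graph.add_edge(a, b)
    return graph
```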

    4.2. Modeling Details

Table 2 describes the covariates that we include in our model. The number of friends' rentals and

    the number of likes are known before consumers click on movie covers. Likewise for the number of

    times that the consumer searched each particular movie in the past (previous searchesij), which

    controls for variation in the consumers’ prior knowledge. Other movie characteristics included in

this table are revealed to the consumer only when she clicks on the movie cover. More importantly,

    in our setting, the price to rent the movie is also revealed to the consumer only after she clicks on

    the movie cover. We estimate the following consumer utility model:

$u_{ij} = \alpha - \beta\, \mathit{price}_j + \mathit{other\ movie\ controls}_{ij}\,\gamma + \delta\, \mathit{previous\ searches}_{ij}$   (8)

$\qquad + \delta^b_1 \log(\mathit{likes}_{ij}) + \delta^b_2\, \mathit{frd\ rentals}_{ij}$   (9)

$\qquad + \delta^a_1\, \mathit{post\ click} \times \log(\mathit{likes}_{ij}) + \delta^a_2\, \mathit{post\ click} \times \mathit{frd\ rentals}_{ij} + \epsilon_{ij}$   (10)

where post click indicates whether an observation is pre-click, in which case the dependent variable pertains
to the utility of a search decision (ub in our model), or post-click, in which case it
pertains to the utility of a purchase decision (ua in our model). We expect β to be

    positive given that a higher price lowers utility. We also note that empirically, and to improve the


    performance of our estimation technique, we normalize the price and log-transform the number of

    likes. In this specification, δb1 measures, in percentage terms, how the number of likes affects the

    utility of clicking on the movie cover, and δb2 measures, in unit terms, how the number of friends’

    rentals affects the utility of doing so. We expect both δb1 and δb2 to be positive given that these

signals are likely correlated with movie quality. Furthermore, the ratio δb1/δb2 measures the marginal

    rate of substitution between friends’ rentals and the logarithm of the number of likes. Likewise,

    δa1 and δa2 measure how these effects change from the decision of clicking on the movie cover to

    the decision of purchasing the movie. The ratio δb1/δb2 indicates how consumers trade-off likes for

    friends’ rentals for their decision to search, thus allowing us to understand the relative strength of

    these signals at this stage of the consumers’ decision making process. Likewise, (δb1+δa1)/(δb2+δa2)

    measures this trade-off for the consumers’ decision to purchase. Comparing these two ratios allows

    us to understand if this trade-off changes from search to purchase, thus learning more about how

consumers use these signals in their decision-making process and, in particular, how this
trade-off varies as a function of how close to committing consumers may be.

    Finally, we model the search cost with an intercept, which captures the baseline search cost in

    TELCO’s VoD system, the number of times that a movie appears in this system (for example, the

    same movie can be cataloged under several genres) and the average horizontal displacement within

    menu from the left of the screen. Therefore, we consider the following cost model:

$c_{ij} = \exp\{\eta_0 + \eta_1\, \mathit{Frequency}_{ij} + \eta_2\, \mathit{Displacement}_{ij}\}$   (11)

    where we keep the subscript i because movies change location in TELCO’s VoD system over

    time and thus different consumers visiting this system at different times may find the same movie

    with different search costs.

    Table 2 Description of covariates used in our empirical model to study TELCO’s VoD system.

Covariate                 Description

Revealed to consumer i before she clicks the movie page
  Likes_ij                Number of likes of movie j
  Frd rentals_ij          Number of friends' rentals for movie j
  Previous searches_ij    Number of prior clicks on the cover of movie j

Revealed to consumer i after she clicks the movie page
  Price_j                 Price to pay to rent movie j
  IMDb rating_j           IMDb rating of movie j
  IMDb votes_j            Number of IMDb votes displayed for movie j
  Years released_j        Number of years since movie j was released

Cost model parameters when consumer i browses the VoD system
  Frequency_ij            Number of times that movie j appears in the VoD interface
  Displacement_ij         Horizontal displacement within menu of movie j


    4.3. Results and Discussion

Table 3 shows the results obtained from estimating the models in the previous sub-section using

    TELCO’s data. Column (1) shows results using all 1,771 movies in TELCO’s VoD system. Column

    (2) shows results only for the 200 “most popular” movies in this VoD system, that is, the latter

    results are obtained for the subset of the data indexed by (i, j) where i is a TELCO VoD consumer

    and j is a movie among the 200 most browsed movies in this VoD system. We also report results for

    this subset of movies because they are different from the average movie in TELCO’s VoD system and

    thus studying them allows us to learn whether popularity affects our findings. Later in section 5, it

    will also be useful to compare the results obtained for this subset of movies to the ones obtained from

    our randomized experiment given the similarity between the former and the movies used there. The

    coefficients without the interaction with post click provide estimates for the effects on the utility of

    clicking the movie page whereas the coefficients with the interaction with post click indicate how

these effects change from the search decision to the purchase decision. As expected, the coefficient

    on price is negative in both columns, meaning that the higher the price of a movie the lower the

    consumer’s utility from browsing its landing page or renting it. Column (1) shows that more likes

and more friends' rentals increase the likelihood of clicking the movie page. The ratio between the

coefficients associated with these effects and the coefficient on price, adjusting for the appropriate
transformations (recall that price is normalized and the number of likes is log-transformed), allows us

    to interpret these effects in dollar terms. From this column, and on average for the search stage, one

more friends' rental is worth $2.20 ((δb2/β) · σprice = (0.071/0.107) · 3.34) to consumers and one more like
(from the mean) is worth ¢0.44 ((δb1/β) · [σprice/(exp{µlikes + 0.5σlikes} − exp{µlikes − 0.5σlikes})] =
(0.028/0.107) · (3.34/195)) to them. For reference, the average movie price in our data is $4.43.

    Taking the ratio of the former two statistics shows us that at the search stage, on average, one

    friends’ rental is worth roughly 494 likes in our setting.
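The back-of-the-envelope conversion above can be reproduced with the coefficients of Table 3, column (1), as in the sketch below; the value 195 stands in for exp{µlikes + 0.5σlikes} − exp{µlikes − 0.5σlikes}, as used in the text.

```python
# Dollar values of one friends' rental and one like, from Table 3, column (1).
beta_price, sigma_price = 0.107, 3.34   # price is normalized, so rescale by its s.d.
delta_b1, delta_b2 = 0.028, 0.071       # log(likes) and friends' rentals, search stage
delta_a1, delta_a2 = 0.101, 0.026       # post-click interaction terms
likes_scale = 195                        # approx. exp{mu+0.5*sd} - exp{mu-0.5*sd} for likes

# Search stage
friend_search = (delta_b2 / beta_price) * sigma_price                  # ~ $2.20
like_search = (delta_b1 / beta_price) * (sigma_price / likes_scale)    # ~ $0.0044 (0.44 cents)
print(round(friend_search / like_search))                              # ~ 494 likes per friends' rental

# Purchase stage (add the post-click interaction coefficients)
friend_buy = ((delta_b2 + delta_a2) / beta_price) * sigma_price                 # ~ $3.00
like_buy = ((delta_b1 + delta_a1) / beta_price) * (sigma_price / likes_scale)   # ~ $0.0205 (2.05 cents)
print(round(friend_buy / like_buy))                                             # ~ 147 likes per friends' rental
```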

The coefficients associated with the interactions with post click show us that both the number of
likes and friends' rentals are worth more to consumers for the purchase decision, compared to their

    value for the search decision. However, the relative value of these signals is very different at the

    purchase stage when compared to the search stage. Using the results in column (1), on average

    at the purchase stage, one more like (from the mean) is worth ¢2.05 to consumers and one more

    friends’ rental is worth $3.00, making a friends’ rental worth only 147 likes at this stage of the

    consumer’s decision process (likes are worth more at the purchase stage thus one needs fewer of

    them to substitute for a friends’ rental). The results in column (2), obtained when we consider only

the 200 “most popular” movies at TELCO, are similar to the ones reported above. In particular, the value of the number of likes relative to that of friends’ rentals also increases from the search to the purchase stage, in fact much more so in this case. Across this subset of 200 movies (which account


Table 3 Baseline results for the effects of friends’ rentals and of number of likes on consumer behavior (observational).

                               (1) – all movies          (2) – top 200 movies
                               estimate     s.e.         estimate     s.e.
Utility
  Constant                     -1.405***    (0.041)      -1.632***    (0.041)
  Price                        -0.107***    (0.004)      -0.262***    (0.009)
  Likes                         0.028***    (0.002)       0.004       (0.004)
  Frd rentals                   0.071***    (0.012)       0.085***    (0.016)
  Likes × post click            0.101***    (0.003)       0.154***    (0.006)
  Frd rentals × post click      0.026*      (0.014)       0.021       (0.018)
  Other movie controls          Yes                       Yes
Search Cost
  Constant                     -0.472***    (0.023)      -0.452***    (0.020)
  Frequency                    -0.358***    (0.016)      -0.564***    (0.018)
  Displacement                  0.093***    (0.002)       0.253***    (0.016)
  Menu dummies                  Yes                       Yes
Number of observations          8,912,497                 800,994
Log Likelihood                 -130547.6                 -42859.7
*** p < 0.01, ** p < 0.05, * p < 0.1.


Table 4 Heterogeneous effects on rental price and on how much friends watch.

                                         (1)                       (2)
                                         estimate     s.e.         estimate     s.e.
Utility
  Intercept                              -1.400***    (0.041)      -1.402***    (0.041)
  Price                                  -0.107***    (0.004)      -0.105***    (0.005)
  Likes                                   0.028***    (0.002)       0.029***    (0.002)
  Frd rentals                             0.074***    (0.012)       0.079***    (0.013)
  Likes × post click                      0.095***    (0.003)       0.095***    (0.003)
  Frd rentals × post click                0.024*      (0.014)       0.029**     (0.015)
  Likes × post click × Price             -0.074***    (0.004)      -0.074***    (0.004)
  Frd rentals × post click × Price        0.162***    (0.013)       0.161***    (0.013)
  Frd rentals × Watched


Figure 3 The mediating role of price on the effect of friends’ rentals. This signal remains relatively “important” in the purchase stage for pricier movies. (Panels: Pre-click and Post-click; y-axis: percentage change in the probability of click/purchase for the average movie.)

yields less utility compared to when they watch more than half of it, which is in line with the idea that friends convey a less enthusiastic opinion about the movie in such cases. Figure 4 illustrates these results. The line for the average effect of one more friends’ rental lies above the corresponding line for when friends watch less than 50% of the movie. One more friends’ rental increases the utility of clicking the movie page and of purchasing the movie, but not when friends watch less than 50% of the movie. For robustness purposes, in another analysis we interact the effect of friends’ rentals with a dummy variable indicating whether friends watch (on average) more than 85% of the movie (which occurs 83% of the time in our data). As expected, the results obtained are the opposite of the ones described before, namely the utility from searching a movie that friends watch (on average) more than 85% of is higher than that from a movie that friends watch less of. These results are available upon request.

5. Experimental Study

5.1. Experimental Design

    5.1.1. Identification Identification in our observational study is difficult not only because

    of the interdependence between search costs and consumer preferences but also because of the

    correlation between one consumer’s preferences and her friends’ preferences. A consumer may click

    on the cover of a movie because she likes it but also because it has a low search cost, that is,

    the distribution of observed consumer preferences is truncated to researchers because consumers

    purchase movies from their consideration sets, and the latter are determined by search costs. As


Figure 4 The mediating role of how much friends watch movies on the effect of friends’ rentals. This signal lowers the likelihood of search when friends watch less than 50% of the movie. (Panels: Pre-click and Post-click; y-axis: percentage change in the probability of click/purchase for the average movie.)

    discussed in the prior literature, this setting may offer an identification strategy similar to that

    employed in classical selection models. Estimating the utility of purchase is similar to estimating

    the “outcome equation” in such models. This utility is estimated using purchase data and movie

    attributes other than search costs. Estimating the utility of search is similar to estimating the

    “selection equation” in selection models. This utility is estimated using search data, which are

    correlated to purchase data, and data on the movies’ search costs. Therefore, the latter enter only

    the search cost equation. They do not enter the purchase utility equation and thus, as discussed

in detail in Ghose et al. (2018a), the fact that different movies have different search costs may be seen as “cost shifters” that help set up the required exclusion restrictions for identification. Also, as Ghose et al. (2018a) and Kim et al. (2010) point out, the non-linearity of the search equation

    (namely, the integral over the distribution of movie utility used to compute reservation utilities)

    may further help with identification in these settings.

    However, in our observational study the purchase equation includes movie price, number of likes

    and friends’ rentals as right hand-side covariates, which raises additional threats to identification.

    For example, unobserved movie quality may simultaneously determine these three covariates ren-

    dering them endogenous in our setting, which would lead us to obtain biased estimates for the

    effects of interest. In fact, homophily in social networks is known to prevent the identification of

    peer effects using only observational data (Aral et al. 2009, Manski 1993). Furthermore, placing

    movies in TELCO’s VoD interface is done at the discretion of TELCO’s editorial team, which may


    introduce correlation between prices, number of likes, friends’ rentals, search costs, and potential

    unobserved movie quality. Therefore, and in our case, we design, implement and study outcomes

    from a randomized control trial to obtain identification by design. Namely, we operate our own

    online VoD system where we randomize prices, number of likes, friends’ rentals and the position of

the movies’ covers on the VoD interface. Our goal in operating this VoD system is to expose consumers to an environment similar to TELCO’s in order to find out whether we can recover the main results reported in our observational study while now randomizing the covariates of interest, thus achieving a much better handle on identification. We discuss the details of this

    experiment below.

    5.1.2. MoviePlatform: An Online-based Video-on-Demand System We implement

    MoviePlatform whose VoD interface is depicted in figure 5 and resembles the one at TELCO.

    Four movie genres can be accessed from the top menu, namely “Drama”, “Action”, “Comedy” and

    “Family”. These menus are stacked vertically on the VoD interface and thus can also be accessed

    by scrolling up and down the webpage. Twelve movies are included in each genre in two horizontal

    lines of six movies each displayed side by side. MoviePlatform offers access to 48 movies, all of

    which were recently made available as “Featured Videos” at Amazon Video. The number of likes

    and the number of friends’ rentals are displayed under each movie cover. Clicking a movie cover

    carries the user to the movie landing page, which conveys additional information about the movie

    as illustrated in figure 6. The price to rent the movie is shown on the movie landing page just

    as in TELCO’s case. On the movie’s landing page consumers can like the movie (and the like

    count adjusts accordingly), watch the movie trailer, rent the movie or go back to the full catalog.

    MoviePlatform was developed to resemble TELCO’s VoD system in all aspects possible. At any

    time consumers can also abandon the platform without renting a movie using the button on the

    bottom-right of the screen.

    We randomize prices, number of likes and friends’ rentals at MoviePlatform using a block design

    to increase our statistical power (Gerber and Green 2012). Movies are matched on observed char-

    acteristics, namely IMDb rating, Amazon rating count (similar to IMDb votes), year of release,

    genres, and rental price. Two movies with most similar characteristics are included in the same

block. Within each block of two movies, one is assigned to the “High-Likes” (HL) group and the other one to the “Low-Likes” (LL) group. Which movie is HL and which is LL is selected at random

    for each consumer visiting MoviePlatform. The number of likes of a HL movie is randomly drawn

    from a lognormal with mean 6 and standard deviation 1. For a LL movie we use a lognormal with

    mean 4 and standard deviation 1. Therefore, the distribution of the number of likes at MoviePlat-

form follows a lognormal distribution with average 5 and standard deviation √2, thus similar to


    Figure 5 Snapshot of the VoD interface at MoviePlatform. Forty-eight movies of four different genres are

    available from this platform. Menus of movies by genre are stacked vertically. Twelve movies per genre are

    displayed in two horizontal lines of six movies each. The number of likes and of friends’ rentals is displayed

    underneath each movie cover.

    what we empirically observe in TELCO’s case (there the number of likes follows a lognormal dis-

    tribution with mean 5 and standard deviation 1.5). Independent of the number of likes, and again

    for each consumer visiting MoviePlatform, two movies within a block are randomly assigned to the

    “High-Friends’ Rentals” (HFR) group or to the “Low-Friends’ Rentals” (LFR) group. The number

of friends’ rentals is set to zero for LFR movies. For HFR movies we draw a number from {0, 1, 2, 3} with equal probability. Therefore, and on average across the 48 movies at MoviePlatform, there are

    12 movies with at least one friends’ rental, which mimics the corresponding statistic from TELCO’s

    VoD system. Similarly, and again for each consumer visiting MoviePlatform, two movies in the

    same block are randomly placed in the “High-Price” (HP) group and in the “Low-Price” (LP)

group. The price of a HP movie is drawn randomly from {4.99, 5.49, 5.99, 6.49, 6.99} with equal probability. Likewise for a LP movie using the set {2.49, 2.99, 3.49, 3.99, 4.49} instead. This ensures an average movie price close to the average price at TELCO’s VoD system ($4.43). In addition,

    we (fully) randomized for each consumer visiting MoviePlatform the vertical order in which menus

    were shown on the webpage and the horizontal slot of each movie under each menu. Two movies

    in the same block appear in the same menu – the one for their genre – but in different positions

    under that menu.
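To make the assignment mechanism concrete, the sketch below (Python with NumPy; the list of matched pairs, the dictionary layout and all names are illustrative, not the code we actually ran) implements the per-consumer draws described above.

    import numpy as np

    HIGH_PRICES = [4.99, 5.49, 5.99, 6.49, 6.99]   # High-Price set
    LOW_PRICES  = [2.49, 2.99, 3.49, 3.99, 4.49]   # Low-Price set

    def randomize_for_consumer(blocks, rng=None):
        """Draw likes, friends' rentals and prices for one visiting consumer.
        `blocks` is assumed to be a pre-computed list of 24 matched pairs of movie ids."""
        rng = rng or np.random.default_rng()
        assignment = {movie: {} for pair in blocks for movie in pair}
        for pair in blocks:
            hl, ll = rng.permutation(pair)      # High-Likes vs Low-Likes movie
            hfr, lfr = rng.permutation(pair)    # High- vs Low-Friends' Rentals
            hp, lp = rng.permutation(pair)      # High- vs Low-Price
            assignment[hl]["likes"] = int(rng.lognormal(mean=6, sigma=1))
            assignment[ll]["likes"] = int(rng.lognormal(mean=4, sigma=1))
            assignment[hfr]["frd_rentals"] = int(rng.integers(0, 4))  # uniform on {0,1,2,3}
            assignment[lfr]["frd_rentals"] = 0
            assignment[hp]["price"] = float(rng.choice(HIGH_PRICES))
            assignment[lp]["price"] = float(rng.choice(LOW_PRICES))
        # (The vertical order of the genre menus and the slots within each menu
        #  are also shuffled per consumer; omitted here for brevity.)
        return assignment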


    Figure 6 Snapshot of the movie landing page at MoviePlatform. This page reveals additional movie covariates,

    such as duration, IMDb rating, IMDb votes, year of release, synopsis, cast and price. From this page, consumers

    can like the movie, watch the movie trailer (which shows within MoviePlatform), rent the movie or go back to the

    full catalog.

    The concepts of friends and of friends’ rentals are introduced using a social network page dis-

    played to consumers before the VoD entry page. This mimics the consumers’ experience at TELCO,

    in which case they come to the VoD system with (partial) knowledge of whether their friends

    watched some of the movies available there. Figure 7 depicts this social network page. The focal

    user is in the center of the page (represented by the “Gru” avatar) and 12 anonymous friends (with

“Minion” avatars) are connected to her. Consumers can browse over their social network to learn who rented which movies. The list of movies rented by each friend that they click on shows up on the right of the webpage. Consumers can also click an exit button that leads them to the VoD entry page (they can also come back to the social network page at any time). The list of movies rented by each friend in the social network is randomized in a way that equates the total number of friends’ rentals of each movie to the corresponding number shown later in the VoD interface and on the movie landing page.

    5.1.3. Recruiting Consumers for MoviePlatform We use Amazon Mechanical Turk

    (AMT) to recruit “consumers” for MoviePlatform. All of them are based in the US and each of

    them could only participate once in our experiment. Once given the instructions to participate in

    this experiment, users are led to the social network page, then to the MoviePlatform page and

    then to a survey that we use to assess their perceptions about MoviePlatform. After the survey,


    Figure 7 Snapshot of social network page displayed to participants at MoviePlatform. The focal user is in the

center and twelve anonymous friends are connected to her. The user can browse over the social network and click

    her friends to learn about which movies they rented. The list of movies that they rented shows up towards the

    right of the webpage.

    users obtain a task completion code and return to the AMT page to redeem their reward for their

    participation in our experiment. We followed the literature in experimental economics to design

this AMT task in an incentive-compatible way, thus inducing as much realism as possible in order to

    improve the truthfulness of the participants’ choices (Ding 2007). To this end, we applied a random

    lottery procedure. The reward that a user receives from participating in our experiment includes

    two parts: a guaranteed participation fee of $1 and an individual-specific lottery reward. Figure 8,

    which is used as part of the instructions at AMT to explain the rewards process, illustrates how

    rewards are computed. There is a 10% chance to win the lottery. The result of the lottery is only

    realized and shown to consumers after they answer the final survey. A user that loses the lottery

    collects only the participation fee. A user that wins the lottery gets a voucher to watch the movie

    that she selects (at MoviePlatform) from Amazon Video and the difference between its price (at

    MoviePlatform) and $10 in cash (we note that we did not follow users at Amazon Video and thus

we do not know whether they used the voucher to watch the movie that they selected, but whether this

    happens is immaterial to our analysis). For example, a user that wins the lottery and selects, at

    MoviePlatform, a movie priced at $5.99 gets a voucher to watch this movie from Amazon Video,


and $5.01 in cash ($1 participation fee plus $10 − $5.99). A consumer that wins the lottery but leaves MoviePlatform without renting a movie collects $11.
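A minimal sketch of this reward rule (Python; the function and constant names are ours, and the treatment of a winner who rents nothing follows the description above) is:

    import random

    PARTICIPATION_FEE = 1.00   # guaranteed to every participant
    LOTTERY_BUDGET    = 10.00  # winners get a voucher plus the budget minus the movie price
    WIN_PROBABILITY   = 0.10   # the lottery is resolved only after the final survey

    def play_lottery(rng=random):
        return rng.random() < WIN_PROBABILITY

    def reward(selected_price, won_lottery):
        """Return (cash, gets_voucher) for one participant."""
        if not won_lottery:
            return PARTICIPATION_FEE, False
        if selected_price is None:   # won but left without renting: $1 + $10 = $11
            return PARTICIPATION_FEE + LOTTERY_BUDGET, False
        # Won and rented: voucher for the selected movie plus the leftover cash.
        return PARTICIPATION_FEE + (LOTTERY_BUDGET - selected_price), True

    # Example from the text: a winner who selected a $5.99 movie gets a voucher and $5.01.
    cash, gets_voucher = reward(5.99, won_lottery=True)
    print(round(cash, 2), gets_voucher)   # -> 5.01 True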

Figure 8 Snapshot of the example given to participants to explain to them the calculation of rewards.

    5.1.4. Survey A final survey asked participants if they relied on the number of likes and/or on

    the number of friends’ rentals to make decisions at MoviePlatform. Our data show that 41% of them

    indicate that they relied on the number of likes and 30% indicate that they combined the number

    of likes with the number of friends’ rentals to make a decision. More than half of the participants

    report that they relied on at least one of these signals to make a decision. The top reason identified

    by users to rely on the number of likes was that “likes are a good indicator of movie quality”.

    The top reason identified by users to rely on the number of friends’ rentals was that they have

    “similar interests and preferences” to their friends – “if my friends enjoyed a movie it is likely that

    I will too”. Some participants also indicate that they ask their friends for movie recommendations

    because “they trust their friends on movie selections” and that asking them “helps narrow down

    the set of movies to consider”. Some participants indicate that watching the same movies as their

    friends gives them the benefit of having something to chat about. Several participants indicate that

    they trusted this signal at MoviePlatform because they saw that their hypothetical friends rented


    movies that they also like. Finally, 46% of the participants indicate that they did not rely on these

    signals to make decisions at MoviePlatform. Around 90% of them indicate that they “only trust

    their own tastes” when it comes to movies and many of them report that they know enough about

    movies from other sources of information to make their own decisions. In sum, we find that many

    participants in our experiment relied on the social signals displayed by MoviePlatform to make

    decisions. Yet, the statistics reported above are only descriptive and hard to compare to TELCO’s

    case. For example, we do not know how many VoD consumers at TELCO rely on the number of

    likes shown on TELCO’s VoD interface or on the information that they may collect from their

    friends when making decisions.

    This survey also asked participants if the movie renting process at MoviePlatform was clear,

    whether the reward calculation was clear and whether it was clear to them that it was in their

best interest to choose the movie that they would like to watch the most rather than pick a movie at random. Answers were collected on a Likert scale from 1 to 5. The average scores for these questions are 4.65, 4.68 and 4.67, respectively. Participants were also quizzed on the reward

    calculation above as part of the instructions, thus before they had access to MoviePlatform, and

    85.1% of them answered the quiz question correctly.

    5.2. Descriptive Statistics and Modeling Details

Our MoviePlatform experiment ran for 10 days at AMT starting on March 9th 2018. We launched

    several experimental batches at different times of the day (to avoid biases) and participants were

    required to use a laptop or a desktop in order to better mimic the experience at TELCO’s VoD

    system. We recruited 500 participants of which 483 completed the required task at AMT. From

    the latter, 95.9% clicked at least one movie cover. Therefore, the dataset that we analyze below

covers all the sessions from these 463 (483 × 0.959) participants. Table 5 shows descriptive statistics for these sessions and for the movies available at MoviePlatform. On average, participants browsed

    2.4 different movies during our experiment and this statistic is very similar to its counterpart in

    TELCO’s case. Also, on average, the 48 movies used at MoviePlatform are very similar to the 200

    ”most popular” movies available at TELCO’s VoD system. In particular, the average number of

    years since release for the former is 3.83. This statistic is 3.83 for the latter but 6.51 across all

    movies in TELCO’s VoD system. Similarly, the average IMDb rating for all movies in TELCO’s

    VoD system is 6.27 and 6.48 for the 200 “most popular” movies there. This statistic is 6.60 for the

48 movies available at MoviePlatform. In sum, it is appropriate to compare the results from

    this experiment to the ones obtained from our observational study across the 200 “most popular”

    movies at TELCO.

    Table 6 describes the covariates that we include in our model, which is very similar to the one

    that we used before in the case of TELCO’s VoD system. Therefore, we estimate this model using


Table 5 Descriptive Statistics from MoviePlatform sessions and for MoviePlatform movies.

Household Covariates                          min     max     median   mean     sd
  Movies Searched / Session                   1       21      1        2.37     2.71
  Movies Purchased / Session                  0       1       1        0.931    0.254
  Movies with Friends’ Purchases              11      16      11       12.47    2.01
  Time spent on social network page (sec)     0       59      18       21.73    20.10
  Time spent on MoviePlatform page (sec)      0       628     32       65.25    87.49
  Number of hypothetical friends browsed      0       12      8        7.52     4.30
Movie Covariates                              min     max     median   mean     sd
  Price                                       2.49    6.99    4.74     4.80     1.45
  IMDb Rating                                 5.10    8.20    6.60     6.64     0.78
  Amazon rating count                         8       19783   358      2122.0   3773.9
  Years since Release                         1       33      1        3.98     6.49
  Likes When Searched                         6       4941    125      403.50   642.82

    the same approach as before, described in detail in our modeling section. In this case, and for the

    utility equation, there is no covariate for previous searches (each consumer visits MoviePlatform

    at most once during our experiment), IMDb votes is substituted by Amazon rating count, and we

    add genre dummies. In the search cost equation, we include only the order of the menu where the

movie shows up (0 for the top menu, 1 for the next menu, 2 for the third menu and 3 for the last

    menu on the webpage) because at MoviePlatform consumers have to scroll up and down across

    menus but they do not need to scroll left and right to browse movies. All twelve movies under

    each menu are shown at once in two lines of six movies side by side and therefore they all appear

immediately before the eyes of consumers. For the sake of completeness, the models that we estimate in

    this case are as follows:

u_ij = α − β price_j + other movie controls_ij γ                                          (12)
       + δ_b1 log(likes_ij) + δ_b2 frd rentals_ij                                         (13)
       + δ_a1 post click × log(likes_ij) + δ_a2 post click × frd rentals_ij + ε_ij        (14)

c_ij = exp{η_0 + η_1 Menu Order_ij}                                                       (15)
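For clarity, a schematic implementation of equations (12)–(15) (Python with NumPy; the coefficient names mirror the notation above, and their values would come from estimation rather than from this sketch) could look as follows:

    import numpy as np

    def utility(price, likes, frd_rentals, post_click, controls, coef, eps=0.0):
        """Deterministic part of u_ij in equations (12)-(14), plus an optional draw of eps_ij."""
        u = coef["alpha"] - coef["beta"] * price + controls @ coef["gamma"]
        u += coef["delta_b1"] * np.log(likes) + coef["delta_b2"] * frd_rentals
        u += post_click * (coef["delta_a1"] * np.log(likes)
                           + coef["delta_a2"] * frd_rentals)
        return u + eps

    def search_cost(menu_order, coef):
        """Search cost c_ij in equation (15)."""
        return np.exp(coef["eta0"] + coef["eta1"] * menu_order)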

Finally, Table 7 shows that there are no statistically significant correlations in our setting between observed movie characteristics and the number of likes, the number of friends’ rentals and movie prices, and thus our randomization schedule worked as expected for this field experiment.
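As an illustration of this balance check, a table like Table 7 could be produced with something along the following lines (Python with pandas/SciPy; the data frame and column names are hypothetical):

    import pandas as pd
    from scipy.stats import pearsonr

    def balance_table(df, randomized=("log_likes", "frd_rentals", "price")):
        """Pearson correlations (p-values in brackets) between the randomized
        variables and every column of `df`, one row per displayed movie."""
        out = {}
        for col in df.columns:
            out[col] = {}
            for var in randomized:
                r, p = pearsonr(df[col], df[var])
                out[col][var] = f"{r:.3f} ({p:.2f})"
        return pd.DataFrame(out).T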

    5.3. Results and Discussion

    Our data cover the sessions from the 463 participants that clicked at least one movie cover at

MoviePlatform during our experiment. Table 8 shows the results obtained without and with


Table 6 Description of covariates used in our empirical model to study our MoviePlatform experiment.

Covariate                Description
Revealed to consumer i before she clicks the movie page
  Likes_ij               Number of likes of movie j
  Frd rentals_ij         Number of friends’ rentals for movie j
Revealed to consumer i after she clicks the movie page
  Price_j                Price to pay to rent movie j
  IMDb rating_j          IMDb rating of movie j
  Amazon rating count_j  Number of Amazon rating votes displayed for movie j
  Years released_j       Number of years since movie j was released
  Genre dummies_j        Dummies for the genres of movie j
Cost model parameters when consumer i browses the VoD system
  Menu Order_ij          Vertical displacement of the menu of movie j

Table 7 Correlation between movie characteristics, number of likes, number of friends’ rentals and movie prices at MoviePlatform (p-values for the Pearson correlation coefficients in brackets).

                       log(Likes)        Frd rentals       Price
log(Likes)              1       (NA)     -0.039   (0.89)    0.037   (0.91)
Frd Rentals            -0.009   (0.89)    1       (NA)      0.023   (0.72)
Price                   0.008   (0.91)    0.023   (0.72)    1       (NA)
IMDb Rating            -0.009   (0.88)    0.053   (0.42)   -0.009   (0.89)
Years Released         -0.062   (0.34)   -0.003   (0.97)   -0.018   (0.79)
Amazon Rating Count     0.003   (0.96)    0.040   (0.54)    0.013   (0.84)
Genre Drama            -0.025   (0.70)   -0.056   (0.39)    0.003   (0.96)
Genre Comedy            0.096   (0.14)    0.014   (0.84)    0.022   (0.73)
Genre Action            0.000   (0.99)   -0.007   (0.91)   -0.014   (0.83)
Genre Family            0.010   (0.88)    0.020   (0.76)    0.020   (0.76)

    additional controls, in columns (1) and (2), respectively. As expected, the coefficient on price

    is negative, meaning that the higher the price of a movie the lower the consumer’s utility from

browsing its page or renting it. The more friends’ rentals, the higher the utility from clicking the

    movie page. The coefficient on the number of likes at the search stage is also positive but not

    statistically significant. Therefore, the results that we obtain from our experiment at the search

    stage are qualitatively similar to the ones that we obtained from our observational study for the

    case of the 200 “most popular” movies at TELCO. The coefficient on the number of likes at the


Table 8 Baseline results for the effects of friends’ rentals and of number of likes on consumer behavior (experimental).

                               (1)                       (2)
                               estimate     s.e.         estimate     s.e.
Utility
  Constant                     -0.077*      (0.044)       0.001       (1.391)
  Price                        -0.070***    (0.007)      -0.076***    (0.008)
  Likes                         0.001       (0.010)       0.003       (0.011)
  Frd rentals                   0.069***    (0.013)       0.070***    (0.014)
  Likes × post click            0.034*      (0.015)       0.039**     (0.015)
  Frd rentals × post click     -0.023       (0.018)      -0.030       (0.018)
  Other movie controls          No                        Yes
  Genre dummies                 No                        Yes
  Block dummies                 No                        Yes
Search Cost
  Intercept                    -1.549***    (0.062)      -1.662***    (0.071)
  Menu Order                    0.028***    (0.007)       0.033***    (0.007)
Number of observations          22,224                    22,214
Log Likelihood                 -4417.19                  -4348.16
*** p < 0.01, ** p < 0.05, * p < 0.1.


Table 9 Comparing the effects of the number of likes and of friends’ rentals between the observational study and the experimental study.

                                          TELCO VoD            TELCO VoD             MoviePlatform
                                          (all 1,771 movies)   (200 most searched)   (48 movies)
                                          Before     After     Before     After      Before     After
                                          Search     Search    Search     Search     Search     Search
Monetary value of social signals
  Friend rental (dollars)                 2.30       2.79      1.15       1.44       0.92       0.54
  Like (cents)                            0.35       1.88      0.01       0.48       0.02       0.25
Substitution between social signals
  Log number of likes that a
    friend rental is worth                2.82       2.17      3.97       2.47       3.73       2.34
  Drop in the rate of substitution
    between log(likes) and
    friends’ rentals                            22.93%               37.76%                37.28%

    6. Conclusions

    We study how consumers combine signals from the crowd and from their friends to make search

    and purchase decisions. We develop a dynamic structural model that combines the ideas of optimal

    sequential search and discrete choice and test it using data from two empirical contexts. In our first

    study, we use clickstream data from a VoD platform operated by a large telecom service provider

    where consumers search and purchase movies using their home TV screen. In our second study,

    we analyze data from a randomized control trial in which consumers can search and rent movies

    online. The second study was implemented using a web-based VoD system created and operated

    by us for the purpose of this study.

    We find consistent results in both empirical cases. In our first study, we find that more friends’

    rentals and more likes increase the likelihood of clicking the page of a movie to obtain more

    information about it and the likelihood of renting the movie. We also find that, on average, the

    value of likes compared to that of friends’ rentals tends to increase from the search stage to the

    purchase stage but less so for more expensive movies. We find even stronger results in the same

    direction when we study only the 200 most browsed movies in TELCO’s VoD system. Across this

    set of movies, additional likes do not increase the likelihood of clicking the movie page but do

    increase the likelihood of purchasing the movie whereas friends’ rentals increase the likelihood of

clicking the movie page but do not further increase the likelihood of renting the movie. Therefore, for this subset of movies, the increase in the value of a like relative to that of a friends’ rental from

the search to the purchase stage is even more pronounced. We also find that friends’ rentals reduce the likelihood of clicking the movie page when friends watch less than half of the movie, which is consistent with the idea that friends’ rentals proxy the potential transmission of information

    among friends about movie quality. In our second study, we also find that only friends’ rentals

    increase the likelihood of clicking the movie page and that only likes increase further the likelihood


    of purchase. The second study provides proper econometric identification for these effects by design

    given our randomized assignment of number of likes, friends’ rentals and prices to each movie for

    each consumer visiting the VoD platform.

    Understanding the relative importance of information from the crowd and from friends at dif-

    ferent stages of the consumer shopping journey in the inner part of the conversion funnel may

    provide valuable implications for business practitioners. Our paper shows that in a world where

    “real estate” on the screen is scarce (e.g. mobile phones), search is costly and consumers exhibit

    limited attention and ability to process multiple signals at once, highlighting signals from friends

    early in the consumer’s shopping journey and signals from the crowd only later may be a produc-

    tive way to increase the consumers’ likelihood of both searching and purchasing products. Doing

    so would constitute a departure from the current practice of showing all signals at the same time

    to consumers, which may hinder, instead of help, the search process.

    Still, we acknowledge that our paper comes with several limitations. First, our model assumes

    that the distributions of product features are public knowledge prior to search. Future work may

    relax this assumption by considering consumers that dynamically update their beliefs on product

    characteristics as they collect more information. Second, friends’ rentals is only a proxy for potential

information that friends may convey about the movies that they watch. We use this proxy because in our setting we do not measure directly whether friends talk about movies; measuring this directly would likely reduce the noise associated with this covariate and help us measure the effects of interest more precisely.

Finally, our results pertain to consumers choosing to search and purchase movies and therefore they may not generalize to other types of products. In particular, movies are an experience good, and consumers may combine information from friends and from the crowd in a way that is particular to this type of good, which our analyses pick up.



    References

    Abhishek, V., Fader, P., and Hosanagar, K. (2018). Media exposure through the funnel: A model of multi-

    stage attribution. Working paper.

    Aral, S., Muchnik, L., and Sundararajan, A. (2009). Distinguishing influence-based contagion from

    homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences,

    106(51):21544–21549.

    Aral, S. and Walker, D. (2012). Identifying influential and susceptible members of social networks. Science,

    337(6092):337–341.

    Archak, N., Ghose, A., and Ipeirotis, P. G. (2011). Deriving the pricing power of product features by mining

    consumer reviews. Management Science, 57(8):1485–1509.

    Arndt, J. (1967). Role of product-related conversations in the diffusion of a new product. Journal of

    Marketing Research, 4(3):291–295.

Banerjee, A. V. (1992). A simple model of herd behavior. The Quarterly Journal of Economics, 107(3):797–817.

    Barry, T. E. (1987). The development of the hierarchy of effects: An historical perspective. Current Issues

    and Research in Advertising, 10(1-2):251–295.

    Berelson, B. (1954). Voting: A study of opinion formation in a presidential campaign. University of Chicago

    Press.

    Bikhchandani, S., Hirshleifer, D., and Welch, I. (1998). Learning from the behavior of others: Conformity,

    fads, and informational cascades. Journal of Economic Perspectives, 12(3):151–170.

Boone, T., Reilly, A., and Sashkin, M. (1977). Social Learning Theory, Albert Bandura. Englewood Cliffs, NJ: Prentice-Hall, 1977, 247 pp. Group & Organization Studies, 2(3):384–385.

Bramoullé, Y., Djebbari, H., and Fortin, B. (2009). Identification of peer effects through social networks.

    Journal of Econometrics, 150(1):41–55.

    Bruce, N. I., Peters, K., and Naik, P. A. (2012). Discovering how advertising grows sales and builds brands.

    Journal of Marketing Research, 49(6):793–806.

    Chen, Y., Wang, Q., and Xie, J. (2011). Online social interactions: A natural experiment on word of mouth

    versus observational learning. Journal of Marketing Research, 48(2):238–254.

    Chen, Y. and Yao, S. (2016). Sequential search with refinement: Model and application with click-stream

    data. Management Science.

    Chevalier