The 1st Doctoral Workshop on The Economics of Digitization
Munich, May 12–13, 2017
Estimating the Effects of File-sharing on Movie Box-office
Zhuang Liu∗
University of Western Ontario
This Draft: May 2, 2017
Abstract
File-sharing and online piracy have attracted great public attention, yet the
economics literature has reached no consensus on how file-sharing affects
industry revenue. Using a novel dataset of downloads from the BitTorrent
network, this paper quantifies the effects of file-sharing on movie box-office
revenue. I estimate a random coefficient demand model of movies to quantify
the effects of file-sharing. I also allow piracy consumption to have a spillover
effect on paid consumption and quantify its impact on box-office. The
estimates show that file-sharing reduces box-office revenue of the motion picture
industry by 1.4% over a 20-week period in 10 countries. In addition, the spillover
effect from pirated consumption contributes $9.4 million (0.14%) to movie
box-office revenue.
∗Preliminary and incomplete; please do not cite without permission. I thank Salvador Navarro, David Rivers, Tim Conley, Tai-Yeong Chung, Scott Orr, Nail Kashaev, Giulia Pavan and participants of the Tenth IDEI-TSE-IAST Conference on The Economics of Intellectual Property, Software and the Internet, the 50th Annual Conference of the Canadian Economics Association, the Western Economics 50th Anniversary Conference and the UWO labor lunch seminar for their help and useful comments. All errors are mine.
1 Introduction
One important development on the Internet is the emergence of peer-to-peer
(P2P) file-sharing. In less than 20 years, P2P file-sharing has grown
dramatically and has become one of the most common activities on the Internet.
The most widely used file-sharing protocol, BitTorrent, now has more than 170
million active users worldwide. It is claimed that BitTorrent moves as much as
40% of the world’s Internet traffic on a daily basis1. The wide use of file-sharing
has given Internet users free and easy access to unauthorized copies of digital
content such as movies and music, resulting in a surge in digital piracy2.
These facts have raised concerns among both policymakers and academic
researchers about the economic effects of file-sharing on the relevant industries.
However, there is as yet no consensus on the impact of file-sharing. On one side,
many people, especially copyright holders in the movie and music industries, treat
file-sharing as the major reason for declining sales. Several widely quoted industry
investigations report evidence of huge economic losses: for example, software
piracy cost the economy about 63.4 billion dollars in 2011 (Business Software
Alliance (BSA)’s 2011 Piracy Study), and digital piracy causes 58 billion dollars in
actual US economic losses and 373,000 lost jobs (IPI 2005 study)3. However, the
reliability of some of these estimates has been criticized because of the unrealistic
assumptions made in these studies4. This relatively “naive” methodology will
inevitably inflate the estimated loss. In addition, it is possible that piracy has
positive spillover effects on sales through channels such as the sampling effect
(Peitz and Waelbroeck, 2006; Kretschmer and Peukert, 2016), social learning from
word-of-mouth recommendation (Moul, 2007; Moretti, 2011; Peukert et al., 2016),
observational learning (Newberry, 2016) or network externalities in movie
consumption (Gilchrist and Sand, 2016)5, so it is hard to conclude what the true
effect of file-sharing is without
1BitTorrent Inc: http://www.bittorrent.com/company/about
2http://arstechnica.com/tech-policy/2015/08/riaa-says-bittorrent-software-accounts-for-75-of-piracy-demands-action/
3http://www.prnewswire.com/news-releases/58-billion-in-economic-damage-and-373000-jobs-lost-in-us-due-to-copyright-piracy-58354582.html
4For instance, BSA admits that they assume that every download counts as one lost sale in their study.
5Belleflamme and Peitz (2012) provide a more comprehensive survey of the literature on the positive effects of digital piracy.
knowing (1) the true substitutability between legitimate and pirated consumption
and (2) the magnitude of pirated consumption’s positive spillover effect on sales.
The goal of this paper is to answer these questions, which are at the center of
current debates. Specifically, this paper complements the empirical literature
on file-sharing by estimating a random coefficient demand model of movies using a
novel dataset I collected on actual downloads on BitTorrent. Using the estimated
parameters, I conduct a no-piracy counterfactual experiment to quantify the effects
of file-sharing. The main contributions of this paper are threefold. First, using
computer science techniques, I monitor the downloading of pirated movies on
BitTorrent over a 20-week period in 2015.
I construct a dataset of weekly movie downloads using information from 26,266
relevant movie torrent files collected via major torrent search engines. For lack
of data on actual downloads, researchers on file-sharing have mainly relied on
various proxies and events to study its impact, and these data limitations may
hamper identification of the true effects of file-sharing because of measurement
error and unrepresentative samples. With information on actual downloads,
we are free from some of the worries about measurement error and
representativeness that arise from employing proxies or individual survey data.
Second, to the best of my knowledge, this paper is the first attempt to assemble
aggregate download data on P2P file-sharing and apply them to a structural model
to study movie piracy. The use of a structural model brings several benefits. First,
we can use counterfactual experiments to test the efficacy of various anti-piracy
policies. With data on aggregate downloads, we can obtain more reasonable
estimates of losses and compare them with estimates from the widely cited industry
studies. Second, estimating a demand model allows calculation of consumer
welfare, so we can conduct welfare analysis of file-sharing. Third,
we can investigate the substitution pattern more thoroughly, specifically
how piracy of one movie displaces its own box-office and other movies’ box-office.
Third, this paper decomposes the effect of piracy on box-office into
cannibalization effects and spillover effects. Quantifying the spillover effect from
piracy via sampling or word of mouth has important managerial implications if
firms can correctly identify the magnitude of the spillover and use it as a
promotional tool at the right time. However, few papers have attempted to
decompose these effects:
Kretschmer and Peukert (2016) use a natural experiment on YouTube in Germany
and qualitatively study the promotional and displacement effects of sampling.
Another similar paper is Ma et al. (2016), who also decompose the two effects on
box-office using a Hidden Markov Model. I complement those studies by structurally
estimating a random coefficient demand model of consumers’ movie and piracy
choices, allowing a more thorough investigation of the substitution
pattern.
This paper’s findings are as follows. First, file-sharing reduces the motion
picture industry’s total box-office revenue by $90 million, or 1.4% of current
box-office6. The estimates are smaller than the widely cited industry
estimates constantly referenced in policy making: the “naive” methodology, which
assumes full sales displacement, would inflate the true cost 9.2 times. On average,
one movie suffers a monetary loss of 0.379 million dollars because of file-sharing.
Second, on average one download displaces 0.11 units of legitimate sales. Third,
the welfare analysis shows that file-sharing increases consumer welfare by a total
of $0.73 billion; banning file-sharing services would therefore result in a
deadweight loss of $0.64 billion (the $0.73 billion gain to consumers less the
$0.09 billion box-office loss). Fourth, I examine heterogeneity in revenue loss due
to piracy. I find that science fiction and action movies are more vulnerable to
piracy, and that wide-release movies benefit most from the removal of piracy. In
addition, the cross-substitution effect of piracy is potentially large, so
anti-piracy campaigns that remove piracy for an individual movie have limited
benefits for box-office revenue because most downloaders will substitute to other
pirated movies. Lastly, I examine the magnitude of the spillover effect of piracy
on box-office revenue. I find that the spillover effect contributes a total of 9.4
million dollars to box-office in 10 countries over the 20-week period.
The topic of this paper matters for resolving the current heated debate
over file-sharing and intellectual property. For policy makers, the results on the
effects of file-sharing on industry revenue and consumer welfare will inform
decisions on the legal treatment of file-sharing.
These results also have important managerial implications: proper estimates of the
effects of file-sharing and of substitutability will help managers in the motion
picture industry determine the optimal level of copyright protection given
supervision and litigation costs.
The paper is organized as follows. Section 2 provides an overview of relevant
6The number is the total across the 10 countries under study over a period of 20 weeks.
literature. Section 3 provides background information on the motion picture
industry and file-sharing. Section 4 describes the data and Section 5 presents the
model. The estimation procedure and results are presented in Section 6. Section 7
gives the results of the counterfactual experiments, and Section 8 concludes.
2 Literature Review
This paper adds to several strands of literature. First, it is related to
the empirical literature on file-sharing. Identifying the effects of file-sharing on
sales of digital products is empirically challenging because of data limitations
and the endogeneity of downloads. The displacement effect of file-sharing
on sales has been widely studied in the literature, but the evidence on the causal
effect of file-sharing on sales is mixed. The majority of papers find a negative
effect on sales (Liebowitz, 2004, 2005; Zentner, 2006; De Vany and Walls, 2007;
Rob and Waldfogel, 2004; Rob and Waldfogel, 2007; Hong, 2013; Danaher and
Waldfogel, 2015; Ma et al., 2016), but a number of papers find a moderate
negative effect, an insignificant effect or even a positive effect (Oberholzer-Gee
and Strumpf, 2007; Smith and Telang, 2010; Bai and Waldfogel, 2012; Hammond,
2015; Lee, 2016).
One reason for the conflicting empirical results could be data limitations.
Because actual downloads are difficult to observe, researchers have devised
different ways to overcome this empirical issue. Judged by methodology,
most research on file-sharing falls into three categories.
First, many researchers employ proxies such as geographic variation in
Internet penetration rates, broadband connection rates, etc. (Liebowitz, 2004, 2005;
Zentner, 2006). Second, some papers take advantage of quasi-experiments such
as the development of file-sharing technology, the closure of file-sharing sites or
variation in international movie release windows (Danaher and Smith, 2014; Hong,
2013; Kretschmer and Peukert, 2016; Peukert et al., 2016; Danaher and Waldfogel,
2015). Lastly, others use survey data collected from groups of consumers (Rob
and Waldfogel, 2004; Rob and Waldfogel, 2007; Bai and Waldfogel, 2012; Leung,
2013). Each of these research methods has its own merits, but in the absence of
data on actual file-sharing activities, questions arise as to what degree
these proxies and quasi-experiments capture the true variation in file-sharing
activities and to what degree the consumers sampled in surveys are representative
of the true population. These questions of measurement error and
representativeness may help explain the different results in those papers. Data
on actual downloads can be a good complement to those studies. The few studies that
use actual download data include Oberholzer-Gee and Strumpf (2007), Hammond
(2013) and Lee (2016). They use download data from Napster and private
BitTorrent trackers, and most of them find no significant effect or a very moderate
negative effect. Compared with the file-sharing data used in those papers, the
data employed in this paper were collected in a more recent period, 2015, by which
time the landscape of file-sharing had changed dramatically from 2007. Instead of
using data from one tracker, I attempt to estimate aggregate downloads using data
obtained from a more comprehensive list of 84 popular public BitTorrent trackers.
Besides data, a paper closely related to this one in econometric methodology
is Leung (2013), who also structurally estimates a random coefficient logit
model to study software piracy using a conjoint survey of 281 college students.
Our papers differ in several respects. Leung (2013) studies the software
industry, whereas this paper focuses on the motion picture industry. While Leung
(2013) focuses on the substitution pattern using the rich information in college
student survey data, this paper also tries to estimate the total impact at the
industry level using an aggregate measure of download activity. In addition, this
paper decomposes the pure substitution effect and the positive spillover effect of
piracy, which Leung (2013) does not consider.
In addition to the empirical literature on file-sharing, this paper is also related
to the growing literature on the motion picture industry7. Researchers have studied
different aspects of the industry, for example: spatial competition
among movie theaters (Davis, 2006), social spillovers and word of mouth (Moul,
2007; Moretti, 2011; Gilchrist and Sand, 2016), seasonality in the motion picture
industry (Einav, 2007), the uniform pricing practice (Einav, 2007), movie price
elasticity (Davis, 2002; De Roos and McKenzie, 2014), the effect of uncertainty in
the movie industry (De Vany and Walls, 1999; Elberse and Eliashberg, 2003) and the
influence of movie critics (Eliashberg and Shugan, 1997). This paper adds to the
literature on the effect of file-sharing on the motion picture industry.
The third strand of literature is the broad literature on intellectual property
7See McKenzie (2012) for a survey of the movie industry.
especially copyright. The emergence of file-sharing may require governments to
adjust the strength of copyright protection accordingly. However, there
is no consensus on the optimal degree of intellectual property protection. As
Boldrin and Levine (2002) point out, strong property rights, which include not
only the right to own and sell ideas but also the right to regulate their use after
sale, create a socially inefficient intellectual monopoly. Klein, Lerner and Murphy
(2002) argue that file-sharing restricts copyright holders’ ability to exercise price
discrimination and effectively control prices, so file-sharing services are likely to
reduce the value of copyrighted work. They argue that strong property
rights should be used to restrict piracy even if restricting consumers’ “fair use”
carries a substantial cost. Empirical evidence on the effects of file-sharing
will provide useful insights for the debate on optimal copyright protection.
3 File-Sharing and BitTorrent
Peer-to-peer file-sharing is a decentralized file-transfer technology. With
traditional downloading methods, files are downloaded from a centralized server
that stores the source file. Because bandwidth is limited, download speed
deteriorates as the number of clients requesting service from the server increases.
With P2P file-sharing, clients download the file from other clients who are also
downloading the file or who have already downloaded it. P2P file-sharing
efficiently uses the upload bandwidth of clients to facilitate downloading; it
therefore overcomes the bandwidth bottleneck of centralized servers and
significantly increases download speed. Because of these advantages, P2P
file-sharing quickly gained popularity among Internet users.
The history of file-sharing dates back to 1999, when an American computer
programmer named Shawn Fanning developed a peer-to-peer file-sharing platform
called Napster. Napster was used to share music files and quickly
became popular among Internet users. At its peak in 2001, Napster had about
80 million registered users around the world. In July 2001, after a series of
copyright lawsuits, Napster was forced to shut down by a US court. After
the shutdown of Napster, further file-sharing services were developed,
including Gnutella, Freenet, Kazaa, FastTrack, E-Mule and so on. Among these
successors, BitTorrent gradually became the dominant file-sharing service,
accounting for on average 40% of Internet upstream traffic according to the
broadband management company Sandvine8. Most files transferred on BitTorrent are
media files such as movies, TV shows and music, and most of these files are pirated.
According to research conducted by the RIAA, BitTorrent may account for about 70%
of piracy activities around the world.
Because BitTorrent dominates other file-sharing platforms, I focus
on BitTorrent in studying file-sharing in this paper. BitTorrent is representative
of the P2P file-sharing population for two reasons. First, although BitTorrent is
not the only P2P file-sharing service, the behavior of file-sharers is believed
not to differ systematically across platforms: Oberholzer-Gee and
Strumpf (2007) provide evidence that download patterns on different platforms
are very similar. Second, even if there are some differences across platforms,
BitTorrent is now so dominant that the share of its substitutes is negligible.
In 2007, according to research by Ipoque9, the traffic of the second most popular
P2P protocol, E-Donkey, was roughly 50% of BitTorrent’s. By 2011, however,
E-Donkey’s traffic was only 2.6%10 and is therefore negligible.
4 Data
4.1 Data Description
The data used in this analysis come from several sources. The main data consist
of box-office and downloads11, country-specific average admission prices
and movie characteristics for movies from 10 countries, including major box-office
markets such as the United States and the United Kingdom, over a 20-week period
in 2015.
I collected the file-sharing data during the 20-week period from March 27th to
August 7th, 2015. The data contain weekly downloads on BitTorrent collected
8TorrentFreak: https://torrentfreak.com/bittorrent-still-dominates-global-internet-traffic-101026/
9TorrentFreak, https://torrentfreak.com/p2p-traffic-still-booming-071128/
10ISPreview, http://www.ispreview.co.uk/story/2011/05/18/bittorrent-p2p-filesharing-dominates-eu-broadband-isp-internet-traffic.html
11Admission and downloads are at the country level. 144 of the movies are found to have pirated versions circulating on BitTorrent; most of the movies unavailable for download are small independent movies or non-English movies from other countries, all with limited box-office influence. The countries in our data are: United States, United Kingdom, Australia, Brazil, Japan, South Korea, Mexico, Greece, New Zealand, Poland.
from 26,266 torrent files for 144 movies released between March 27th and August
7th, 2015. The data were collected using computer science techniques following
several studies on BitTorrent (Erman 2005, Layton and Walters 2010); details of
the data collection methods are presented in Section 4.2.
The box-office data were collected from the box-office reporting websites
Boxofficemojo.com and The-numbers.com. I collected information on weekly
box-office and movie characteristics for all movies showing during the sampling
period (March 27th to August 7th, 2015). I include characteristics commonly used in
studies of the motion picture industry, such as movie rating, sequel status, cast
quality, director quality, genre, MPAA rating and weeks since release. Movie rating
data were collected from the Internet Movie Database (imdb.com). Cast and director
quality were collected from Powergird.com; they are on a scale of 0-100 and are
measured by previous box-office performance. Because movie theaters practice
uniform pricing and movie-level price data are hard to collect, only country-level
average admission prices are obtained12. The box-office of some independent movies
is so small that their market shares are indistinguishable from zero. Including
these “zero” market share movies would cause numerical problems in the estimation
procedure, so I drop all observations with a market share smaller than 0.01% from
my sample.
4.2 Collection of File-sharing Data
This section describes how downloading on BitTorrent works and my data
collection methodology.
It is very easy for BitTorrent users to download movie files online: they only
need to find the .torrent file associated with the requested file. The .torrent
file is a descriptor meta-file containing the information needed to facilitate file
transfer. Each .torrent file is indexed by a unique identifier of 40 hexadecimal
characters called the torrent info-hash. The .torrent file can usually be obtained
from popular torrent search engines such as Piratebay.com, Torrentz.com and so on.
Once the .torrent file is obtained, the BitTorrent client software installed on the
user’s computer will help download the file
12Admission price variation is very small. Although prices differ across screen types (IMAX/3D/Ordinary), this variation is perfectly correlated with movie characteristics and therefore offers little identifying power for the price elasticity. There is also price discrimination by age group and on selected days of the week (‘Cheap Tuesday’), but actual data on admissions by type and price are hard to obtain, so I do not attempt to estimate the price elasticity in this paper.
automatically. The information in the .torrent file guides the client to contact
BitTorrent trackers and get a list of clients (so-called ‘peers’) who are also
downloading the same file. The role of trackers is essentially to direct traffic in
the BitTorrent network: a tracker server does not keep the file content itself;
instead it keeps track of who is downloading the file and tells a client whom to
contact for file transfer. Tracker servers keep the current number of downloads for
each registered torrent file, and these numbers can be scraped by sending an HTTP
or UDP request with the info-hash of the torrent file13.
Now I describe my data collection methods. To obtain estimates of weekly
downloads on BitTorrent, I first collect the torrent files of each movie by
crawling the popular BitTorrent search engines. Every week the web crawler sends
search queries for each movie to the major BitTorrent search engines (Torrentz,
Kickass, Isohunt, Piratebay, Extratorrent) and extracts the identifier (info-hash)
of relevant movie torrent files from the torrent information page.
To ensure the extracted torrent files are truly relevant, I add several restrictions
to the search queries:
• The file size has to be bigger than 200 MB.
• The file format has to be a video format such as mp4, avi, wmv, mkv, rmvb, etc.
• The file cannot be older than the earliest release date of the movie.
• I filter out several keywords such as: trailer, featurette, soundtrack, OST,
xxx, etc.
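The four restrictions above can be sketched as a single predicate. This is only an illustration under stated assumptions, not the crawler used in the paper: the function name `is_relevant`, its arguments, and the idea that the file extension can be read from the torrent name are all hypothetical, and the extension and keyword sets contain only the examples listed above (the paper's lists end in "etc.").

```python
from datetime import date

# Only the examples named in the text; the real lists are longer ("etc.").
VIDEO_EXTENSIONS = {"mp4", "avi", "wmv", "mkv", "rmvb"}
BLOCKED_KEYWORDS = {"trailer", "featurette", "soundtrack", "ost", "xxx"}
MIN_SIZE_MB = 200


def is_relevant(name: str, size_mb: float, uploaded: date,
                earliest_release: date) -> bool:
    """Return True if a search result passes all four restrictions."""
    lowered = name.lower()
    # Assumption: the file extension is the text after the last dot.
    extension = lowered.rsplit(".", 1)[-1] if "." in lowered else ""
    # Split on common separators so a keyword only matches a whole token
    # (e.g. "ost" must not match inside "lost").
    for separator in ("-", "_", " "):
        lowered = lowered.replace(separator, ".")
    tokens = set(lowered.split("."))
    return (
        size_mb > MIN_SIZE_MB
        and extension in VIDEO_EXTENSIONS
        and uploaded >= earliest_release
        and not (tokens & BLOCKED_KEYWORDS)
    )
```

Matching keywords against whole tokens rather than substrings is a design choice of this sketch; a substring match would wrongly discard titles that merely contain "ost".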
After obtaining a collection of info-hashes (torrent identifiers) for each movie,
I collect a list of all working public BitTorrent trackers. There are currently 84
trackers in the list.
According to the BitTorrent protocol, trackers respond to HTTP GET or UDP
requests with information including the number of downloads, the current
number of seeders, the number of leechers and a list of peers. The procedure for
obtaining downloads for a movie is as follows:
13Though trackers coordinate most of the downloads on BitTorrent, they are not the only way to download files on BitTorrent: downloading can also happen in a decentralized way using DHT, without trackers. I do not currently count downloads in DHT because monitoring DHT traffic is difficult. I am working on estimating the scale of downloads in DHT for a possible correction to the download estimates.
• For each movie (e.g. Furious 7), the crawler searches for the name plus filters
in a torrent search engine, as shown in Figure 1.
• The web crawler collects the info-hashes for all search results, as shown in
Figure 2.
• Specifically, for each torrent file in the search results, for example
“Fast.and.Furious.7.HDRip.XviD.AC3-EVO”, the crawler accesses
the torrent information page and records the info-hash, as shown in Figure 3:
35a89cb57246dbdfdbf581403c33010d177a30dd
• The computer program then transforms the info-hash into a form that trackers
can understand, namely the URL-encoded (percent-encoded) raw bytes of the hash:
5%A8%9C%B5rF%DB%DF%DB%F5%81%40%3C3%01%0D%17z0%DD
• For each tracker in the tracker list (e.g. http://www.todotorrents.com:2710/announce),
the program sends an HTTP GET request14:
GET http://www.todotorrents.com:2710/scrape?info_hash=5%A8%9C%B5rF%DB%DF%DB%F5%81%40%3C3%01%0D%17z0%DD
• The tracker response contains information about the current number of seeders
(complete), leechers (incomplete) and the number of completed downloads
(downloaded) for the file:
{'files': {'5\xa8\x9c\xb5rF\xdb\xdf\xdb\xf5\x81@<3\x01\r\x17z0\xdd': {'downloaded': 659, 'complete': 3, 'incomplete': 4}}}
In the response, 'downloaded' is the stock of completed downloads, 'complete' is
the number of seeders and 'incomplete' is the number of leechers. The current
number of downloads registered with this tracker for this torrent is 659.
• The program records this number and repeats the previous steps for all trackers
and all torrents.
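The encoding and parsing steps in the procedure above can be sketched as follows. This is a minimal illustration, not the program used in the paper: the function names are hypothetical, the network request itself is left out, and the sketch assumes the tracker follows the common convention that the scrape URL is the announce URL with the final "announce" replaced by "scrape" and that the raw response is bencoded.

```python
from urllib.parse import quote


def encode_info_hash(hex_hash: str) -> str:
    """Percent-encode the 20 raw bytes of a 40-character hex info-hash."""
    raw = bytes.fromhex(hex_hash)
    if len(raw) != 20:
        raise ValueError("info-hash must be 40 hexadecimal characters")
    return quote(raw, safe="")


def scrape_url(announce_url: str, hex_hash: str) -> str:
    """Build the scrape URL from an announce URL (assumed convention)."""
    base, _, tail = announce_url.rpartition("/")
    if tail != "announce":
        raise ValueError("tracker does not follow the announce/scrape convention")
    return f"{base}/scrape?info_hash={encode_info_hash(hex_hash)}"


def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value at data[i]; return (value, next_index)."""
    kind = data[i:i + 1]
    if kind == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if kind == b"l":                      # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        return items, i + 1
    if kind == b"d":                      # dictionary: d<key><value>...e
        i += 1
        mapping = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            mapping[key] = value
        return mapping, i + 1
    colon = data.index(b":", i)           # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def completed_downloads(response: bytes, hex_hash: str) -> int:
    """Read the 'downloaded' stock for one torrent from a scrape response."""
    decoded, _ = bdecode(response)
    return decoded[b"files"][bytes.fromhex(hex_hash)][b"downloaded"]
```

Applied to the info-hash 35a89cb57246dbdfdbf581403c33010d177a30dd, `encode_info_hash` reproduces the encoded string shown in the steps above.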
Figure 1: Home Page of a Torrent Search Engine
Figure 2: Search Result
Figure 3: Torrent Information Page
Figure 4: File-sharing Activities in the World
Notes: Darker color denotes a higher number of file-sharers adjusted by country population. The frequency of file-sharing activities in each country is based on a sample of 1,698,846 movie downloaders’ IP addresses that I collected from public BitTorrent trackers during a 5-day period. The geographic information for each IP address is obtained using MaxMind’s GeoIP database.
I aggregate the download counts of all torrent files for a movie to get the current
stock of downloads for that movie. The weekly flow of downloads is
obtained by differencing the download counts of consecutive weeks. This number
can be treated as total global downloads because the trackers’ responses to
SCRAPE requests contain no geographical information. Additional HTTP and
UDP ‘announce’ requests are sent weekly to trackers to get a snapshot list
of the IP addresses of users currently downloading the files. I then use the IP
addresses to identify the source country of downloaders and the share of downloads
from each country. Country-specific weekly downloads are estimated using this
geographic share information.
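The aggregation just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the paper's code: the function names are hypothetical, each peer is assumed to have already been mapped from its IP address to a country code, and the sketch takes plain differences without correcting for tracker resets or torrents that disappear between weeks.

```python
from collections import Counter


def movie_stock(torrent_counts: dict[str, int]) -> int:
    """Sum the 'downloaded' stock over all torrents of one movie."""
    return sum(torrent_counts.values())


def weekly_flows(stocks: list[int]) -> list[int]:
    """Weekly downloads as the difference between consecutive weekly stocks."""
    return [later - earlier for earlier, later in zip(stocks, stocks[1:])]


def country_downloads(global_flow: float,
                      peer_countries: list[str]) -> dict[str, float]:
    """Split a global weekly flow by each country's share of sampled peers."""
    counts = Counter(peer_countries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: global_flow * n / total for country, n in counts.items()}
```

For example, stocks of 659, 700 and 760 in three consecutive weeks give weekly flows of 41 and 60, and a global flow of 100 with a peer sample of two US addresses, one British and one Brazilian address is allocated 50, 25 and 25.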
4.3 Descriptive Statistics
Figure 4 shows the intensity of file-sharing activities across the world. The
intensity is measured by the number of file-sharers found in the sample period,
adjusted by country population. File-sharing has indeed penetrated almost every
part of the world: of the 177 countries and regions in the study, file-sharing
activities are found in 170. In terms of the total number of file-sharers, the
United States ranks first, making up 13.7% of the
total, followed by Russia (6.3%) and France (5.4%). Not
14The UDP request is similar so I omit the description of UDP.
Table 1: Top-sellers and top downloaded movies

Top Selling Movies
Title                                        Admission (million)
Jurassic World                               184.09
Furious 7                                    167.97
Avengers: Age of Ultron                      155.83
Minions                                      120.15
The Hobbit: The Battle of the Five Armies    106.22
Inside Out                                    84.63
The Hunger Games: Mockingjay Part 1           83.57
Interstellar                                  75.00
Big Hero 6                                    73.09
Mission: Impossible - Rogue Nation            72.94

Top Downloaded Movies
Title                                        Download (million)
Furious 7                                     35.85
Interstellar                                  35.08
Fifty Shades of Grey                          30.54
Kingsman: The Secret Service                  27.18
Big Hero 6                                    23.41
The Hobbit: The Battle of the Five Armies     21.65
American Sniper                               21.28
Avengers: Age of Ultron                       18.84
Taken 3                                       18.57
Jupiter Ascending                             16.02

Notes: Box-office and download data are up to September 11th, 2015. Box-office and downloads are all global numbers.
surprisingly, file-sharing activity in a country is positively correlated with
the country’s GDP per capita and population size,15 but it is only mildly
correlated with Internet speed16.
I match the box-office data with the collected file-sharing data. Table 1 provides
statistics about top downloaded movies and top selling movies. Top downloaded
15The correlation coefficient of GDP and file-sharing is 0.7649; the correlation coefficient of population and file-sharing is 0.3262.
16The correlation coefficient of Internet speed and file-sharing is 0.082. Due to data limitations, I was only able to collect the average Internet speed of 59 countries. Most countries with low Internet speed are not represented in the data, and this selection problem may explain the low correlation found between Internet speed and file-sharing activities.
movies are generally blockbusters with big budgets and massive advertising
campaigns; most of the best-selling movies also appear among the most
downloaded.
As experience goods, movies exhibit short product life cycles. Consumers
have strong preferences for new movies, and demand is strongly influenced
by pre-release advertising campaigns. The typical theatrical run of an ordinary
movie is about 6-10 weeks. Most of a movie’s box-office revenue is concentrated
in the first few weeks after release. For blockbuster movies, the opening week
usually accounts for around 20% of total box-office revenue.
Figure 5 shows the pattern of average weekly audience (in 1,000,000’s) and
downloads (in 100,000’s) per movie by number of weeks after initial release. Weekly
theater attendance decays exponentially, dropping to almost 0
around 10 weeks after initial release. Downloads of pirated movies are more
persistent, partly because of the continuous supply of better-quality torrents
in later periods.
The most important point is that, on average, most downloads happen after the
theatrical window closes. Not only is the overlap between downloads and box-office
small, but the quality of downloaded movies is also not comparable to movie quality
in theaters. During the first few weeks after release,
most available pirated movies are “CAM” versions of very low quality17,
which are hardly comparable with the quality of movies in theaters. Around
5-10 weeks after release, many better-quality “TC” versions18 of pirated movies
come out and downloads start to increase. Downloads usually peak between
10 and 20 weeks after theatrical release, when “DVDRip/BlurayRip” versions
become available following the movie’s DVD/Blu-ray release. By then,
the movie’s theatrical window has long been closed.
Judging by these facts, one conjecture is that a movie’s own downloads might not
displace much of its own box-office. To verify the conjecture and quantify the
extent of displacement, counterfactual experiments are conducted in Section 7. The
fact that a movie’s own downloads do not overlap with its box-office does not mean
file-sharing is not hurting studio revenue. Though the effect on own box-office might
17CAM or CAMrip version are usually copy made in a cinema using a camcorder or mobilephone by audience.
18TC (telecine) versions are usually copy produced by transfering the movie from its analogreel to digital format.
16
be low, two other potential effects prevail. One is to displace its own sale revenue
on DVD/Blueray, another one is to displace the similar movies that release later.
Because a lack of data on DVD/Blueray sale, this paper focus on the second effect.
Examination of the cross elasticity of piracy is discussed in section 7.
Table 2 provides sample descriptive statistics for the movies. The average movie
budget is 43.12 million dollars and the median is 17 million, indicating
that the distribution is skewed to the right by the big-budget ‘Hits’.
The standard deviation is 55.7 million, which shows that the budget distribution
is quite dispersed. The highest-budget movie (Avengers: Age of Ultron)
has a budget of 250 million, while about 34.5% of movies spend less than 10 million.19
Compared to budget, movie rating exhibits less dispersion, with an average
rating of 6.7 and a standard deviation of 1.02. 65% of the movies have pirated
copies available on-line. The industry is dominated by the major studios (the so-called
“Big Six”20). The “Big Six” studios produced 44% of all movies in the sample
but account for 80.9% of all box-office revenue. Average admission per movie is about
0.53 million. As with budget, the distribution is skewed to the right by
blockbuster movies: the best-seller (Jurassic World) has an audience of 92.6 million,
which is more than 6 standard deviations above the mean. The distribution
of downloads resembles the admission distribution, but it has a much smaller scale
and is less dispersed. The three most common genres are Drama
(24.4%), Comedy (21.2%) and Action (14.2%). Table 3 provides information on
market shares across genres and sources (download or sale). The average market share
of the legal sale of a movie is about 0.47%, while the average market share of the
download of a movie is about 0.05%, so illegal downloads account for just under
10% of all movie-watching activity. Action, Animation and Science Fiction
movies usually have higher market shares for both sale and download. The variable
Share in the last column measures how often consumers choose the given
category conditional on watching a movie.
19Because budget is missing for a large number of observations, we did not include budget in the set of movie characteristics in estimation.
20“Big Six” refers to the six biggest studios in Hollywood: Disney, Warner Brothers, Sony/Columbia, Universal, 20th Century Fox and Paramount.
Table 2: Summary Statistics on Movie Characteristics
                          Mean      Std. Dev.   Median     Min       Max
Budget (million)          43.1156   55.7843     17         .1        250
Rating                    6.7850    1.0296      6.9        4         8.9
Cast score (0-100)        57.5166   20.2985     58.9733    0         87.36
Director score (0-100)    65.3064   24.3756     69.07      0         96.94
Pirated                   .6535     .4777       1          0         1
Genre
  Action                  .1417     .3501       0          0         1
  Animation               .0708     .2576       0          0         1
  Comedy                  .2125     .4107       0          0         1
  Drama                   .2440     .4312       0          0         1
  Horror                  .0629     .2439       0          0         1
  Science Fiction         .0708     .2576       0          0         1
MPAA Rating
  PG                      .1496     .3580       0          0         1
  PG13                    .3464     .4777       0          0         1
  R                       .3700     .4847       0          0         1
Market Share
  Sale (%)                .4788     .9809       0.1243     0.0102    11.4391
  Downloads (%)           .0514     .0705       0.0280     0.0100    1.0616
Note: Budget is in units of millions. Rating is on a scale of 0-10. Pirated is a dummy variable which equals 1 if the movie has a pirated version available online. Sale and Downloads in the movie characteristics section are measured in units. Action, Animation, Comedy, Drama, Horror, Science Fiction, PG, PG13 and R are all genre and MPAA rating dummy variables. In the market share section, the market share is an average of one movie’s market shares over all weeks and countries.
Table 3: Market Shares by Genre and Source
(Percent %)        Source      Mean    Std Deviation   Min     Max      Share
Action             Sale        0.84    1.38            0.01    9.10     4.06
                   Download    0.07    0.09            0.01    0.73     2.11
Animation          Sale        0.83    1.12            0.01    6.43     24.98
                   Download    0.04    0.05            0.01    0.45     5.80
Comedy             Sale        0.36    0.56            0.01    4.17     14.89
                   Download    0.06    0.08            0.01    1.06     1.70
Drama              Sale        0.13    0.17            0.01    0.93     15.45
                   Download    0.03    0.03            0.01    0.26     4.48
Horror             Sale        0.21    0.25            0.01    1.18     3.18
                   Download    0.02    0.01            0.01    0.07     1.29
Science Fiction    Sale        0.87    1.65            0.01    11.44    1.95
                   Download    0.06    0.07            0.01    0.54     0.28
Other              Sale        0.16    0.32            0.01    2.08     16.06
                   Download    0.03    0.03            0.01    0.25     3.74
Note: This table provides information on the weekly market share of movies by Genre and Source (Download/Sale). The Share variable is measured as the sum of all consumption in one category divided by the sum of total consumption of movies. It can be used to examine the distribution of consumers’ choices across categories.
5 Model
Models of movie demand with realistic substitution patterns that take
consumer heterogeneity into account are pivotal in examining the effect of file-sharing.
In this section, I present a static random coefficient demand model of movies from
both legal sources and file-sharing, based on Berry, Levinsohn and Pakes (1995). It is
well acknowledged that random coefficient models generate more realistic substitution
patterns and avoid the unrealistic IIA assumption of Multinomial Logit
demand models. In the model, I treat a paid movie in cinema and its pirated counterpart
as different goods with very similar product characteristics; their
difference is captured by the dummy variable Pirated.
In the model, time is discrete and indexed by t; the decision period is one week
in length. In each time period we observe a number of markets indexed by m
and a number of products in each market. A product is defined as a movie that
is currently showing in cinemas or available to download on the Internet in a
given period of time. A product is differentiated by title, source and time and can
be indexed by jbct, where j denotes movie title, c denotes country and b denotes
source (i.e. b=1 denotes download and b=0 denotes in cinemas). In each market,
there are a number of consumers indexed by i. The market size is set to be the total
population of the country.
Consumer i’s utility from movie j at time t via source b in country c is:
$$u_{ijbct} = X_{jbct}\beta_i + \alpha_i \mathit{Pirated}_b + \phi\,\mathit{TotalView}_{jc,t-1} + \xi_j + \Delta\xi_{jbct} + \varepsilon_{ijbct} \tag{1}$$
where $X_{jbct}$ is a vector of observed movie characteristics such as movie ratings on
IMDB, cast and director quality, genres, MPAA rating and weeks after release. I also
add movie brand dummies and country dummies. $\beta_i$ is a vector of individual-specific
taste parameters associated with observed movie characteristics21. $\mathit{Pirated}_b$ is a
dummy variable which equals 1 if the consumer chooses to download (b=1)22, so $\alpha_i$ is
the individual-specific difference in the mean valuation of legal movies and pirated
movies23. $\varepsilon_{ijbct}$ is the idiosyncratic consumer taste shock, which follows a Type-I Extreme
Value distribution.
Spillover and Complementarity Variable $\mathit{TotalView}_{jct}$ measures the total
viewership of movie j in country c at time t. It is defined as the sum of movie
j’s pirated views and paid box-office views at time t in country c:
$$\mathit{TotalView}_{jct} = M_c \, (s_{j0ct} + s_{j1ct})$$
where $M_c$ is the market size in country c24. The purpose of including lagged total
viewership in the demand equation is to capture and test for a potential spillover
of demand from pirated movie consumption to paid movie consumption. In the
setting of the BLP model, pirated movies and paid movies are, by construction,
substitutes. A number of recent studies have pointed out the possibility of
complementarity through a spillover of demand from pirated movie consumption to paid
movie consumption. This could arise from a sampling effect (Peitz and Waelbroeck,
2006; Kretschmer and Peukert, 2016), a peer effect/word-of-mouth effect (Moretti,
2011; Peukert, Claussen and Kretschmer, 2016; Lee, 2016), observational learning
(Newberry, 2016), pure network externalities in movie consumption (Gilchrist and
Sand, 2016), or backward spillover on product discovery (Hendricks and Sorensen,
2009). In all of these cases, current demand is affected by the previous number of
users. Therefore, I model a movie as a network good and add the total views of the last
period to the demand equation, allowing previous downloads to have a spillover
effect on current demand. I then empirically examine the magnitude of the spillover
effect through a counterfactual experiment.
21I add random coefficients on movie genres and Pirated.
22Here I treat the pirated and paid versions of the same title as different goods in the model. Although pirated movies differ in quality, this is not currently captured in the model.
23Notice that I did not include a price coefficient in this specification, so one should treat $\alpha_i$ as a combination of a taste effect and a price effect.
24The market size is defined as the population of country c.
Movie Specific Dummy and Other Controls $\xi_j$ is the movie dummy variable
used to control for time-invariant, movie-specific unobservable characteristics.
The coefficients on time-invariant movie characteristics can be obtained by regressing
the estimated movie fixed effects on those variables. Besides movie fixed effects, I
also include country fixed effects and interactions of country dummies with the variable
Pirated, which help control for all country-specific unobservable components that affect
the demand for movies and the valuation difference between pirated and paid movies.
Including these fixed effects helps improve the fit of the model and serves to correct the
potential correlation between observable movie characteristics and unobservable
characteristics, as implemented in Nevo (2001). The market-specific deviation
from mean valuation $\Delta\xi_{jbct}$ now serves as the “error term”. It is plausible to assume that
movie characteristics are predetermined and do not respond to market-specific taste
deviations from the mean.
Interaction of Movie Characteristics with Pirated An important question
is: which movie characteristics are more amenable to piracy? Does consumers’ taste
for Pirated differ by movie characteristics? To answer this question I add interaction
terms of selected movie characteristics with the variable Pirated to the vector
of movie characteristics $X_{jbct}$. The coefficients of these interaction terms capture
how consumer taste for these movie characteristics differs between pirated and paid
movies.
Following Nevo (2000), I model the distribution of consumer taste parameters
for movie characteristics (movie genres) and piracy as multivariate normal with a
mean that is a function of demographics25:
$$\begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \Pi D_i + \Sigma v_i \tag{2}$$
where $D_i$ is a vector of demographic variables, $\Pi$ is a matrix of parameters measuring
how consumer tastes change with demographics, $v_i$ is a vector of unobservable
consumer characteristics following a multivariate standard normal distribution, and $\Sigma$
is a diagonal scaling matrix. We can then decompose utility into the shock
term $\varepsilon_{ijbct}$, a mean utility component $\delta_{jbct}$:
$$\delta_{jbct}(X_{jbct}, \mathit{Pirated}_b, \xi_{jbct}; \theta_1) = X_{jbct}\beta + \alpha \mathit{Pirated}_b + \xi_{jbct} = X_{jbct}\beta + \alpha \mathit{Pirated}_b + \xi_j + \Delta\xi_{jbct} \tag{3}$$
and an individual-specific deviation from the mean, $\mu_{ijbct}$:
$$\mu_{ijbct}(X_{jbct}, \mathit{Pirated}_b, D_i, v_i; \theta_2) = [\mathit{Pirated}_b, X_{jbct}]' (\Pi D_i + \Sigma v_i) \tag{4}$$
where $\theta = (\theta_1, \theta_2)$ is the vector of parameters to be estimated. $\theta_1 = (\alpha, \beta)$ are the
linear parameters and $\theta_2 = (\Pi, \Sigma)$ are the nonlinear parameters.
Consumer i can also choose the outside option of neither watching nor downloading
any movie. The outside option gives consumers the flexibility to
turn to non-movie activities and therefore rules out the unrealistic assumption
that each download must turn into one sale if file-sharing is disabled. The utility
of the outside option is defined as:
$$u_{i0bct} = \varepsilon_{i0bct} \tag{5}$$
Consumer i chooses the option that maximizes his utility. Since the
error term $\varepsilon_{ijbct}$ follows an extreme value distribution, consumer i’s probability
of choosing product jbct can be written as:
$$Pr_{ijbct}(X_{jbct}, \mathit{Pirated}_b, D_i, v_i; \theta_1, \theta_2) = \frac{\exp(\delta_{jbct} + \mu_{ijbct})}{1 + \sum_{j'} \exp(\delta_{j'bct} + \mu_{ij'bct})} \tag{6}$$
25For simplicity, I abuse notation slightly and collapse the variable TotalView and the movie dummies into the movie characteristics $X_{jbct}$.
And the market share of product jbct is then:
$$s_{jbct} = \int Pr_{ijbct}(X_{jbct}, \mathit{Pirated}_b, D_i, v_i; \theta_1, \theta_2) \, dP(D, v, \varepsilon) \tag{7}$$
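As a rough illustration of equations (6) and (7), the integral over consumers can be approximated by averaging logit choice probabilities over simulated draws. The sketch below is a minimal example under stated assumptions, not the code used in this paper: the function name `simulate_shares`, the array shapes and every numeric value are hypothetical, and the demographic term $\Pi D_i$ is omitted so that $\mu_{ijbct}$ depends only on $\Sigma v_i$.

```python
import numpy as np

def simulate_shares(delta, mu):
    """Approximate eq. (7) by averaging eq. (6) over simulated consumers.

    delta: (J,) mean utilities of the J products in one market-week.
    mu:    (n_ind, J) individual deviations from the mean utility.
    """
    v = delta[None, :] + mu                         # utilities net of the logit shock
    expv = np.exp(v)
    # The outside option has utility 0, hence the 1 in the denominator.
    probs = expv / (1.0 + expv.sum(axis=1, keepdims=True))
    return probs.mean(axis=0)                       # average over consumers

# Hypothetical market: two titles, each with a cinema (b=0) and a pirated (b=1) version.
rng = np.random.default_rng(0)
n_ind = 500
pirated = np.array([0.0, 1.0, 0.0, 1.0])
action = np.array([1.0, 1.0, 0.0, 0.0])            # one made-up genre dummy
chars = np.column_stack([pirated, action])         # rows are [Pirated_b, X_jbct]
sigma = np.array([1.5, 0.5])                       # made-up diagonal of Sigma
v_draws = rng.standard_normal((n_ind, 2))          # v_i ~ N(0, I)
mu = (v_draws * sigma) @ chars.T                   # mu_ij = [Pirated_b, X_jbct]' Sigma v_i
delta = np.array([-3.0, -4.0, -2.5, -4.5])         # made-up mean utilities

shares = simulate_shares(delta, mu)
outside_share = 1.0 - shares.sum()
```

With `mu` set to zero the function reduces to the plain multinomial logit share formula, which is a convenient sanity check.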
6 Estimation and Results
6.1 Estimation
Following the estimation procedure of Berry, Levinsohn and Pakes (1995), I use
GMM to estimate the model’s parameters. The estimation procedure is
a nested fixed point algorithm: in the inner loop I solve a contraction
mapping to recover the mean utilities δ from the market shares. In the outer loop the
unobserved characteristics ξ are obtained via 2SLS and interacted with instruments
to form the GMM objective function. I use the BFGS method with a self-supplied
analytical gradient for the optimization.
To be specific, the data I have are the movie characteristics $\{X_{jbct}, \mathit{Pirated}_b\}$ and
market shares $\{s_{jbct}\}$. The parameters to be estimated are $\{\theta_1, \theta_2\}$. Given the data,
I can solve the contraction mapping in the inner loop of the estimation algorithm:
$$\delta^{n+1}_{jbct} = \delta^{n}_{jbct} + \ln(s_{jbct}) - \ln(S(X_{jbct}, \mathit{Pirated}_b, \delta^{n}_{jbct}, \theta_2)) \tag{8}$$
where $S(X_{jbct}, \mathit{Pirated}_b, \delta^{n}_{jbct}, \theta_2)$ is the simulated market share:
$$S(X_{jbct}, \mathit{Pirated}_b, \delta^{n}_{jbct}, \theta_2) = \frac{1}{n_{ind}} \sum_i Pr_{ijbct}(X_{jbct}, \mathit{Pirated}_b, D_i, v_i; \theta_1, \theta_2) \tag{9}$$
Following Dube, Fox and Su (2012), I set the convergence tolerance to $10^{-8}$ to
avoid the propagation of simulation error into the parameter estimates. After I
obtain the mean utilities δ, I run 2SLS to get the error term: the market-specific deviation
from mean valuation Δξ. I then apply GMM to the set of moment conditions:
$$E[Z \Delta\xi(\theta)] = 0 \tag{10}$$
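The inner loop in equation (8) can be sketched as a simple fixed-point iteration. The code below is an illustrative sketch only, not the implementation used for the estimates: `simulate_shares` and `invert_shares` are hypothetical names, the individual deviations `mu` are drawn arbitrarily rather than built from $\theta_2$, and the check recovers made-up mean utilities from shares generated by the same model.

```python
import numpy as np

def simulate_shares(delta, mu):
    """Simulated market shares S(.) as in eq. (9); mu has shape (n_ind, J)."""
    expv = np.exp(delta[None, :] + mu)
    return (expv / (1.0 + expv.sum(axis=1, keepdims=True))).mean(axis=0)

def invert_shares(s_obs, mu, tol=1e-8, max_iter=5000):
    """Iterate eq. (8) until successive mean utilities differ by less than tol."""
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())   # plain-logit starting value
    for _ in range(max_iter):
        delta_new = delta + np.log(s_obs) - np.log(simulate_shares(delta, mu))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction mapping did not converge")

# Made-up check: generate shares from known mean utilities, then recover them.
rng = np.random.default_rng(1)
mu = rng.normal(scale=1.0, size=(200, 3))
delta_true = np.array([-2.0, -3.0, -2.5])
s_obs = simulate_shares(delta_true, mu)
delta_hat = invert_shares(s_obs, mu)
```

In a full estimation routine this inversion would be repeated for every market and every trial value of $\theta_2$ in the outer loop.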
Instruments For identification of the random coefficients, I maintain the assumption
that own-product characteristics (except weeks after release and last
week’s total views) are uncorrelated with the market-specific deviation from mean
valuation Δξ. Given this assumption, I choose a set of differentiation instruments
in line with Gandhi and Houde (2016), which approximate the Chamberlain (1987)
optimal instruments. The instruments are:
• own product characteristics
• $\sum_{j'} \| X^k_{jt} - X^k_{j't} \|^2$ for each characteristic k
• the number of rival products whose characteristics differ from the own product’s
characteristics by less than one standard deviation of the product characteristics,
$\sum_{j'} 1\{ \| X^k_{jt} - X^k_{j't} \| < sd(X^k) \}$, for each characteristic k
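The two differentiation instruments in the list above can be computed with a few array operations. This is a minimal sketch assuming a single market and a characteristics matrix with one row per product; the function name `differentiation_ivs` is hypothetical, and using the within-market sample standard deviation as the closeness threshold is an assumption made here for illustration.

```python
import numpy as np

def differentiation_ivs(X, sd=None):
    """Differentiation instruments for the J products of one market.

    X:  (J, K) array of product characteristics.
    sd: (K,) thresholds; defaults to the within-market sample standard
        deviation of each characteristic (an assumption of this sketch).
    Returns (quad, local), each of shape (J, K).
    """
    X = np.asarray(X, dtype=float)
    n_products = X.shape[0]
    if sd is None:
        sd = X.std(axis=0, ddof=1)
    diff = X[:, None, :] - X[None, :, :]            # diff[j, j', k] = X[j, k] - X[j', k]
    rival = ~np.eye(n_products, dtype=bool)         # True where j' != j
    quad = (diff ** 2).sum(axis=1)                  # own term contributes zero
    local = ((np.abs(diff) < sd) & rival[:, :, None]).sum(axis=1)
    return quad, local

# Made-up example: one characteristic, three products with values 0, 1 and 3.
quad, local = differentiation_ivs(np.array([[0.0], [1.0], [3.0]]))
```

In this toy market the sample standard deviation is about 1.53, so only the products at 0 and 1 count as close rivals of each other.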
6.2 Identification
In this subsection I provide a discussion of identification in this model.
Mean coefficients on movie characteristics (β’s) Characteristics coeffi-
cients (movie genres, rating, weeks after release) are identified from variation in
sales as such characteristics change across different products.
Spillover Coefficient Identification of the spillover coefficient on lagged
total views might run into problems in a fixed effects model, as pointed out by
the dynamic panel data literature (Nickell 1981; Arellano and Bond 1991). To
account for the potential bias, I include a group of variables on the previous
week’s weather conditions in the country (Gilchrist and Sand 2016) as instruments
for the lagged total views26. The intuition is that weather conditions affect
movie-watching activities but are orthogonal to the time-varying component of
unobservable movie quality after we control for time fixed effects.
26For each country, I collect daily weather information for the top 5 cities by population, including temperature (Celsius), precipitation (inches), and dummy variables for Hail, Snow, Thunder, Fog and Tornado. I separate temperature and precipitation into 5-degree bins and 0.25-inch bins respectively; these city-level dummies are then aggregated, weighted by population, to the country-week level. The data are obtained from WeatherUnderground.com.
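The construction of the weather instruments described in the footnote can be sketched as follows. This is one possible reading, stated as an assumption: daily readings are turned into bin dummies, averaged over the days of the week within each city, and then combined across cities with population weights. The function name `weekly_bin_shares` and all numbers are hypothetical.

```python
import numpy as np

def weekly_bin_shares(daily_values, populations, edges):
    """Population-weighted share of city-days falling in each bin.

    daily_values: (n_cities, n_days) readings for one country-week.
    populations:  (n_cities,) city populations used as weights.
    edges:        increasing bin edges; bin b covers [edges[b], edges[b+1]).
    """
    values = np.asarray(daily_values, dtype=float)
    weights = np.asarray(populations, dtype=float)
    weights = weights / weights.sum()
    bin_index = np.digitize(values, edges) - 1      # out-of-range readings match no bin
    n_bins = len(edges) - 1
    dummies = bin_index[:, :, None] == np.arange(n_bins)   # (n_cities, n_days, n_bins)
    city_share = dummies.mean(axis=1)               # share of days per city and bin
    return weights @ city_share                     # (n_bins,)

# Made-up example: two cities, two days, 5-degree temperature bins.
temp_edges = np.arange(0.0, 20.0, 5.0)              # edges at 0, 5, 10, 15
bin_shares = weekly_bin_shares([[3.0, 7.0], [12.0, 12.0]], [3.0, 1.0], temp_edges)
```

The same function could be applied to precipitation with 0.25-inch edges.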
Distributions of random coefficients (σ’s) The distributions of the random coefficients
are identified from variation in choice sets and the corresponding changes
in market shares. For example, suppose three movies A, B and C are offered, where A and C
have the same budget but very different ratings, while B and C have the same rating but
very different budgets. If we observe that movie C exits the market, then the
extent to which consumers of C shift to movie A and movie B helps determine
the distributions of the random coefficients on budget and rating respectively.
Interaction terms with demographics To identify the interaction terms with
demographics, ideally we should have several markets with variation in the distribution
of demographics. A comparison of how market shares change following changes
in choice sets across different markets then helps identify the interaction terms. Here
I have two sources of variation: the first is variation in demographics across
the 10 countries; the second is variation in choice sets over time across those markets.
This variation helps identify the interaction terms.
7 Estimation Results
This section reports the estimation results of the model. Besides the full model,
I also report results for several alternative specifications for comparison: a simple
multinomial logit (MNL) model as a benchmark, a nested logit model and a random
coefficient model without interactions with demographics. For the MNL
and nested logit models, the mean utility can be easily obtained by inverting
the market shares. For the random coefficient logit model there is no closed-form
solution for the mean utility, so I implement the numerical procedure described in
the previous section. The results are shown in Table 4. Before discussing the full
model, I first present the results for the MNL benchmark.
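For the two benchmark models, the share inversion has a closed form. The sketch below shows the standard calculation under the assumption that the nested logit is estimated in its usual linear form, where the log share relative to the outside option is regressed on characteristics plus the log within-nest share (whose coefficient is the nesting parameter). Function names and the example shares are hypothetical.

```python
import numpy as np

def logit_mean_utility(shares):
    """Plain logit inversion: delta_j = ln(s_j) - ln(s_0), s_0 the outside share."""
    shares = np.asarray(shares, dtype=float)
    return np.log(shares) - np.log(1.0 - shares.sum())

def log_within_nest_share(shares, nest_ids):
    """ln(s_{j|g}): log of each product's share within its nest (here, a title)."""
    shares = np.asarray(shares, dtype=float)
    nest_ids = np.asarray(nest_ids)
    nest_totals = np.bincount(nest_ids, weights=shares)
    return np.log(shares / nest_totals[nest_ids])

# Made-up market: title 0 has a cinema and a pirated version, title 1 is cinema only.
shares = np.array([0.10, 0.30, 0.20])
nest_ids = np.array([0, 0, 1])
y = logit_mean_utility(shares)                      # dependent variable in both models
within = log_within_nest_share(shares, nest_ids)    # extra regressor for the nested logit
```

Because the within-nest share is endogenous, it would need to be instrumented in the 2SLS step.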
Multinomial Logit model Column 1 of Table 4 shows the demand estimates
for the MNL model. As the results indicate, there is a significant difference between
the mean valuation of pirated movies and paid movies in theaters. One caveat is that,
since I did not include a price coefficient in the demand specification of any model
because of the identification problem mentioned earlier, the disutility from price is not
controlled for when we interpret the taste for Pirated. One should treat the coefficient
as the combination of a negative quality differential effect and a positive
price differential effect.
On average, the mean utility of pirated movies is lower than that of legal movies by
0.54. This utility difference suggests that the quality of a pirated movie at the early
stage of release is quite low compared with its counterpart in cinema, and that there
are certain costs of downloading movies via file-sharing (waiting time, search cost, risk
of being blocked by ISPs).
The spillover coefficient on lagged total views is positive and significant, indicating
that current demand is influenced by previous demand, both pirated and paid.
Movies with better casts and directors are valued more highly by consumers.
The coefficient on weeks after release is negative and significant, indicating consumers’
preference for ‘fresh’ movies. A counterintuitive result is that the coefficient on rating
is negative and statistically significant. This may be because consumers have
high expectations of, and are more critical of, big-budget movies, which usually have
wider releases and higher box-office revenue, while consumer ratings are relatively lenient
for independent movies, which are therefore more likely to get high ratings. Estimates
on the genre dummies indicate that action, animation and science fiction are generally
more popular genres than drama and horror. The negative coefficient
on the constant term shows that, compared with watching movies, people usually
have a better outside option.
Nested Logit Column 2 shows the results for the nested logit model. A priori,
I impose a nested choice structure in which the consumer first chooses which movie to
watch and then chooses how to watch it (theater/download). The
nested choice structure helps relax the IIA assumption of the
MNL model. The model can be described as follows:
$$u_{ijbct} = X_{jbct}\beta + \phi\,\mathit{TotalView}_{jc,t-1} + \xi_{jbct} + \varepsilon_{ijbct} \tag{11}$$
where
$$\varepsilon_{ijbct} = \lambda_j + \alpha \mathit{Pirated}_b + (1-\rho)\,\epsilon_{ijbct} \tag{12}$$
Here the term $\varepsilon_{ijbct}$ is a combination of the group-specific shock $\lambda_j$ and the
idiosyncratic shock $\epsilon_{ijbct}$. ρ is the nesting parameter, which represents the degree of preference
Table 4: Demand Estimation Results
                          (1)          (2)           (3) Random Coefficient Logit   (4) RC Logit with Demographics
                          MNL          Nested Logit  Mean Utility  Random Coeff.    Mean Utility  Random Coeff.
Pirated                   -0.54***     -0.36***      -1.221*       1.580***         -1.946*       4.533***
                          (0.08)       (0.08)        (0.615)       (0.057)          (0.716)       (0.022)
TotalViews (in millions)  0.16***      0.20***       0.182***                       0.122***
                          (0.01)       (0.01)        (0.017)                        (0.020)
Weeks after Release       -0.04***     -0.05***      -0.021**                       -0.033**
                          (0.00)       (0.00)        (0.009)                        (0.010)
Rating                    -0.02        0.00          -0.109                         -0.064
                          (0.02)       (0.02)        (0.093)                        (0.099)
Action                    1.01***      0.95***       -1.705***     3.096***         -1.217***     2.783***
                          (0.09)       (0.09)        (0.397)       (0.127)          (0.421)       (0.344)
Comedy                    0.25**       0.25**        -0.3611       1.0753           -0.437        1.057***
                          (0.09)       (0.08)        (0.3381)      (1.2231)         (0.358)       (0.180)
Drama                     -0.08        -0.08         -3.394***     2.890***         -5.051***     3.761***
                          (0.09)       (0.08)        (0.309)       (0.157)          (0.328)       (0.253)
Science Fiction           0.76***      0.79***       -0.935        2.280***         -1.142*       2.449***
                          (0.09)       (0.09)        (0.515)       (0.335)          (0.545)       (0.843)
Horror                    0.25*        0.23*         -5.328***     4.006**          -2.901***     2.748***
                          (0.10)       (0.10)        (0.389)       (0.877)          (0.412)       (0.326)
Cartoon                   1.43***      1.44***       -1.541***     2.653***         -2.628***     3.224***
                          (0.09)       (0.09)        (0.457)       (0.335)          (0.484)       (1.059)
PG                        -0.68***     -0.74***      -4.091***     2.915***         -3.131***     2.521***
                          (0.10)       (0.10)        (0.429)       (0.110)          (0.454)       (0.479)
PG-13                     -0.26**      -0.32***      -4.274***     3.432***         -3.523***     2.915***
                          (0.09)       (0.09)        (0.3631)      (0.078)          (0.385)       (0.418)
R                         -0.29**      -0.37***      -4.518***     3.199***         -5.551***     3.732***
                          (0.09)       (0.09)        (0.343)       (0.571)          (0.364)       (0.584)
Cast                      0.00***      0.01***       0.008                          0.006
                          (0.00)       (0.00)        (0.0058)                       (0.006)
Director                  0.00***      0.01***       0.005                          0.005
                          (0.00)       (0.00)        (0.005)                        (0.006)
Sequel                    0.17***      0.21***       0.766*                         0.739*
                          (0.04)       (0.04)        (0.343)                        (0.363)
Nesting Parameter                      0.38***
                                       (0.02)
Constant                  -6.43***     -6.47***      -6.338***     1.873***         -10.350***    1.787***
                          (0.14)       (0.14)        (0.617)       (0.212)          (0.653)       (0.2141)

Interaction with Demographics
Age*Pirated                                                                         -4.533***
                                                                                    (1.185)
Income*Pirated                                                                      -0.707***
                                                                                    (0.030)
Internet Speed*Pirated                                                              2.007***
                                                                                    (0.044)
Age*Constant                                                                        1.216*
                                                                                    (0.549)
Income*Constant                                                                     0.448***
                                                                                    (0.034)

Movie Fixed Effect        X X
Time Fixed Effect         X X X X
Observations              5625 5625 5625 5625
Adjusted R2               0.4593 0.4955
Notes: Standard errors in parentheses. ***, **, and * denote statistical significance at the 0.005, 0.01, and 0.05 levels respectively. Based on 5625 observations. Cast and Director are variables ranging from 0 to 100 measuring the strength of cast and director in terms of previous box-office performance. Variable Age is a binary variable indicating whether or not the individual is older than 40. Variable Income is the log of annual income and variable Internet Speed is the log of the speed of Internet. For the full model, movie dummies, country dummies and interaction terms of Pirated with country dummies are included. Coefficients of time-invariant movie characteristics are obtained from regressing movie fixed effects on time-invariant movie characteristics.
Table 5: Substitutability of Piracy across Movie Genres
Genre               MNL          Nested Logit   RC Logit       RC Logit with Demographics
Action              -1.15***     -1.03***       -1.3538***     -1.0558***
                    (0.10)       (0.10)         (0.2159)       (0.2515)
Comedy              -0.42***     -0.41***       -0.8121***     -0.6911***
                    (0.10)       (0.09)         (0.2084)       (0.2428)
Drama               -0.13        -0.10          0.2244         0.4715
                    (0.10)       (0.10)         (0.2336)       (0.2722)
Science Fiction     -0.62***     -0.57***       -0.8024***     -0.5514*
                    (0.10)       (0.10)         (0.2139)       (0.2492)
Horror              -0.83***     -0.84***       -1.8048***     -0.5695
                    (0.13)       (0.12)         (0.2539)       (0.2958)
Cartoon             -1.07***     -0.88***       -1.1734***     -1.1161***
                    (0.10)       (0.10)         (0.2362)       (0.2751)
Notes: Standard errors in parentheses. ***, **, and * denote statistical significance at the 0.005, 0.01, and 0.05 levels respectively. Based on 5625 observations. The reported estimates denote coefficients of interaction terms between Pirated and movie genre dummies. The coefficient can be interpreted as a measure of how the disutility of piracy differs across movie genres. Larger coefficients indicate higher disutility of the pirated version and therefore lower substitutability.
correlation between products in the same group (title).
From column 2 of Table 4, the coefficient for the taste for piracy shrinks in magnitude to -0.36,
and most of the other coefficients are not significantly different from those of the MNL model.
The nesting parameter is 0.38 and statistically significant.
Random Coefficient Logit I estimate two versions of the random coefficient logit
model. The first version, shown in column 3, adds random coefficients on Pirated,
movie genres, MPAA rating and the constant term. To explore the role of
demographics in explaining heterogeneity in the preference for piracy, I include
demographics in the second version of the RC logit model. Specifically, I include three
demographic variables: Age is a binary variable indicating whether or not the individual
is older than 40, Log(Income) is the log of annual income and Log(Internet
Speed) is the log of Internet speed. The distributions of Age and Log(Income)
are obtained from the Luxembourg Income Study, and the distribution of Internet speed
data27 is obtained from Testmy.net. I interact the demographics (Age, log(Income),
log(Internet Speed)) with the variable Pirated and the constant term.
I mainly focus on the full model with demographics. The mean coefficient
on Pirated, which represents the taste for piracy, is -1.94, and the standard deviation of
the random coefficient is 4.25, which shows that people’s preference for piracy is quite
dispersed. Figure 6 shows the frequency distribution of consumer taste for Pirated.
Again, since there is no price coefficient, the price effect is absorbed in this coefficient.
Presumably, once price is controlled for, the whole distribution will shift to
the left. Currently about 4.76% of individuals’ tastes for Pirated are positive. Part
of the heterogeneity in the taste for piracy can be explained by demographics. The estimate
on the interaction of Pirated with income is negative and significant, suggesting that
the taste for piracy is higher for people with lower income. Pirated movies are in some
sense “inferior goods”, as the marginal valuation of pirated movies decreases with
income. The interaction of Pirated with Internet speed is positive and statistically
significant, indicating that the preference for piracy is higher when Internet
speed is higher. The estimated coefficient on the interaction of Age with Pirated is significant
and negative, indicating that the taste for piracy is higher for younger people.
The full model also interacts the demographic variables Age and Income with the constant
term. The estimate on the interaction of the constant term with income is positive,
showing that movies are generally normal goods. The positive coefficient
on the interaction of Age and the constant term indicates generally younger people
are more willing to watch movies. The standard deviations of most of the other random
coefficients are significant, which shows that adding random coefficients can explain
a significant amount of heterogeneity that cannot be explained by observed
variation in demographics.
For the spillover effect, the estimated coefficient on lagged total viewership is 0.1219
in the current version of the results, which means that an increase of 1 million views of the
movie in the last week increases consumer valuation of this movie by 0.1219.
This indicates that, controlling for observable movie characteristics, there is some
evidence that consumer demand is influenced by past box-office and downloads,
potentially through spillover effects from word-of-mouth communication and
recommendation by peers who previously consumed the movie.
27Unfortunately I don’t have the joint distribution of internet speed with income and age in my data.
Figure 6: Frequency Distribution of Consumer Taste for Pirated
Note: Frequency distribution of consumer taste for Pirated. Since I did not include price coefficient in my demand equation,dis-utility from price is not controlled for when we interpret the taste for Pirated. Presumably when we take into consideration
the price effect, the whole distribution will shift to the left. About 4.76% of individuals’ tastes on Pirated are positive.
Table 5 reports the estimated coefficients on the interaction terms between movie
genres and Pirated. The coefficient can be interpreted as a measure of how the disutility
of piracy differs across movie genres. Larger coefficients indicate higher
disutility of the pirated version and therefore lower substitutability. The results show
that there are significant differences in piracy’s substitutability. For example, action
and cartoon movies have consistently more negative coefficients, indicating that
these genres are more suited to the theater experience. Other genres such as
science fiction also show a significant but smaller difference, whereas for drama
the coefficient is positive but not significant.
8 Counterfactual Experiments
The most important task in this paper is to estimate the true cost of file-sharing
for movie box-office revenue and its welfare implications. In this section, I conduct
several counterfactual experiments to estimate this cost.
First, I conduct a “No-Piracy” experiment that eradicates all
pirated movie products in the model and compare the counterfactual box-office
revenue and consumer welfare with the benchmark. Second, I consider a firm-level
“Anti-Piracy Campaign” for each movie by removing only the pirated versions of that
movie while leaving other movies’ pirated versions untouched. Third, I shut down
the spillover channel to measure the magnitude of the spillover effect of piracy
on box-office revenue.
Table 6: Result of No-Piracy Counterfactual Experiment
                              With Piracy   No Piracy   Change
Industry Revenue (billion)    6.82          6.91        +0.09
Consumer Welfare (billion)    8.74          8.01        -0.73
8.1 Eradicate All Piracy
I remove all pirated movies from the model and recalculate counterfactual market
shares using the estimated full-model parameters in Table 4. Assuming price is
unchanged under the no-piracy policy, I can then calculate counterfactual industry
revenue as the product of market share, market size and price. Following
Train (2003), consumer welfare in market c at time t is calculated as the market
size times the average expected maximum indirect utility of simulated
individuals:
$$CS_{ct} = M_c \, \frac{1}{\alpha} \, \frac{1}{n_{ind}} \sum_i E[\max u_{ijbct}] \tag{13}$$
where α is the mean price coefficient used to translate utility into monetary
value28 and $M_c$ denotes the market size of country c.
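Under the Type-I extreme value assumption, the expected maximum utility in equation (13) has the familiar log-sum form, up to an additive constant that cancels when two scenarios are compared. The sketch below illustrates the calculation for a single market-week; it is not the paper's code. The function name `consumer_surplus`, the simulated draws and all numbers except the price coefficient of 0.16 are hypothetical, and the feedback of removed downloads on lagged total views is ignored.

```python
import numpy as np

def consumer_surplus(delta, mu, market_size, alpha_price=0.16):
    """Eq. (13) for one market-week, using the logit log-sum for E[max u].

    The additive Euler constant in E[max u] is dropped because it cancels
    when the with-piracy and no-piracy scenarios are compared.
    """
    v = delta[None, :] + mu                              # (n_ind, J)
    inclusive_value = np.log1p(np.exp(v).sum(axis=1))    # ln(1 + sum_j exp(v_ij))
    return market_size / alpha_price * inclusive_value.mean()

# Made-up example: one title with a cinema version and a pirated version.
rng = np.random.default_rng(2)
delta = np.array([-3.0, -4.0])
mu = rng.normal(scale=0.5, size=(300, 2))
pirated = np.array([False, True])

cs_with = consumer_surplus(delta, mu, market_size=1_000_000)
cs_without = consumer_surplus(delta[~pirated], mu[:, ~pirated], market_size=1_000_000)
welfare_change = cs_without - cs_with                    # negative: the choice set shrank
```

Removing products can only lower the log-sum, so the welfare change from a pure removal is never positive in this setup.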
The result of the counterfactual experiment is shown in Table 6. Eliminating
pirated movies from file-sharing networks would result in an increase in industry revenue of
$90 million over the 20-week period in those 10 countries. This represents a
1.4% increase in total box-office revenue. It translates into an annual figure
of $0.243 billion, which is lower than the widely cited estimate of $3 billion from
the MPAA in 2005.29 On the other hand, consumer welfare decreases by $0.73 billion
when we ban piracy, which is 3 times higher than the increase in motion picture
industry revenue. There is a deadweight loss of $0.64 billion if we ban movie
piracy. In general, the counterfactual result suggests that piracy does indeed “rob”
firms of revenue, but it also increases consumer welfare by more than the revenue
lost. A policy that eradicates movie file-sharing may therefore convert a large
reduction in consumer welfare into a small increase in industry revenue, resulting
in a socially inefficient outcome from the point of view of social welfare alone.
28Because I did not attempt to estimate the price elasticity in this paper, I set α to 0.16 following Davis (2002).
29In 2005, the Motion Picture Association of America (MPAA) estimated that it was losing $3 billion in box-office sales due to piracy, according to De Vany and Walls (2007).
If we use a “naive” way to estimate the revenue loss, assuming that one download
equals one lost sale of a paid movie, then the estimated revenue loss amounts to
0.828 billion dollars for the same time period and countries, which is 9.2 times
the revenue loss calculated in the counterfactual experiment. Many widely cited
industry studies have employed this “naive” method in estimating piracy’s
cost. The result shows that such a methodology substantially inflates the
true loss from piracy.
I also calculate the average displacement rate of pirated movies on legitimate
movie sales in theaters. On average, one download displaces 0.11 units of
legitimate sales.
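The relationship between the “naive” estimate and the counterfactual estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the text. The implied displacement rate is a back-of-the-envelope quantity that assumes both estimates value a lost sale at the same ticket price; it is not the model-based rate itself.

```python
# Figures reported in the text, in billions of dollars.
naive_loss_bn = 0.828           # one download = one lost sale
counterfactual_loss_bn = 0.090  # no-piracy counterfactual ($90 million)

# How much the naive method inflates the estimated loss.
inflation_factor = naive_loss_bn / counterfactual_loss_bn

# Share of downloads that would have to displace a sale for the two
# numbers to agree (assumes the same price per lost sale in both).
implied_displacement = counterfactual_loss_bn / naive_loss_bn

print(round(inflation_factor, 1))      # 9.2
print(round(implied_displacement, 2))  # 0.11
```

The implied figure of roughly 0.11 is in line with the average displacement rate reported above.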
To assess the heterogeneity in responses to the removal of piracy, I calculate
the displacement rate and recovered revenue for each movie. Table 8 shows
descriptive statistics on the distribution of recovered revenue. There is
substantial heterogeneity in movies’ recovered revenue from piracy eradication
because of differences in their position in characteristics space and in the
level of competition they face. On average, a movie’s revenue increases by 0.379
million dollars in the no-piracy counterfactual experiment; the distribution is
quite dispersed, with a standard deviation of 1.390 million.
To further understand how the response differs with movie characteristics, I
run OLS regressions of each movie’s recovered revenue and displacement rate on a
number of movie characteristics; the results are shown in Table 7. Wide-release
movies have a significantly higher displacement rate. In terms of recovered
revenue, action and science fiction movies benefit significantly more from the
removal of piracy, and wide-release movies again have significantly higher
recovered revenue. These results indicate that the removal of piracy mainly
benefits the “blockbuster” movies that lie in the top tail of the distribution.
Table 7: OLS Regression of Recovered Revenue and Displacement Rate on Movie Characteristics

                          (1)                     (2)
                   Recovered Revenue        Displacement Rate
                  in Full Eradication
Wide Release           1.46***                  0.22***
                      (0.22)                   (0.04)
cast                   0.00                     0.00
                      (0.00)                   (0.00)
director               0.00                    -0.00
                      (0.00)                   (0.00)
sequel                 1.14***                 -0.05
                      (0.27)                   (0.05)
rating                 0.14                     0.00
                      (0.07)                   (0.02)
action                 0.68*                    0.08
                      (0.32)                   (0.07)
comedy                -0.14                     0.04
                      (0.26)                   (0.06)
drama                 -0.07                     0.02
                      (0.24)                   (0.06)
sci                    1.30**                   0.02
                      (0.41)                   (0.08)
horror                -0.43                     0.07
                      (0.31)                   (0.08)
animation             -0.00                     0.03
                      (0.36)                   (0.08)
pg                    -0.50                    -0.03
                      (0.33)                   (0.12)
pg13                  -0.16                    -0.06
                      (0.28)                   (0.11)
r                     -0.09                    -0.03
                      (0.27)                   (0.11)
Constant              -0.99*                   -0.01
                      (0.48)                   (0.17)
Adjusted R2            0.40                     0.26
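A regression of the kind reported in Table 7 can be sketched as follows. This is an illustration on synthetic data, not the estimation code or data of this paper. The function `ols`, the variable names, and the coefficients used to generate the data are all hypothetical, and only two of the regressors are included.

```python
import numpy as np

def ols(y, X, names):
    """OLS with an intercept; returns {name: (coefficient, std. error)}."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    n, k = X1.shape
    sigma2 = resid @ resid / (n - k)  # homoskedastic error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    return dict(zip(["const"] + names, zip(beta, se)))

# Synthetic movie-level data (hypothetical): two dummy characteristics
# and a recovered-revenue outcome generated from known coefficients.
rng = np.random.default_rng(0)
n_movies = 500
wide_release = rng.integers(0, 2, n_movies)
sequel = rng.integers(0, 2, n_movies)
recovered = -1.0 + 1.5 * wide_release + 1.1 * sequel + rng.normal(0, 1, n_movies)

fit = ols(recovered, np.column_stack([wide_release, sequel]),
          ["wide_release", "sequel"])
for name, (coef, se) in fit.items():
    print(f"{name:13s} {coef:6.2f} ({se:.2f})")
```

The printout mirrors the layout of Table 7, with each coefficient followed by its standard error in parentheses.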
8.2 Partial Eradication
The previous counterfactual experiment resembles copyright protection at the
public and legislative level, where policies tend to affect the whole industry.
But copyright protection is not always initiated by government or legislation;
in recent years, private copyright protection initiated by firms and targeted
at individual copyrighted works has become more and more prevalent. As Reimers
(2016) points out, such private copyright protection is effective in the book
publishing industry. In the motion picture industry, studios also hire internet
surveillance companies to monitor file-sharing websites and send DMCA notices
to take down torrent files. How effective are these private copyright
protection efforts that target piracy of an individual movie? Will downloaders
substitute into its paid version, other pirated movies, or simply the outside
option? To answer this question I conduct a partial-removal counterfactual
experiment. In this experiment, for each movie I simulate a firm-level private
copyright protection campaign that eliminates all of its pirated versions in
all countries and all time periods, but leaves pirated versions of other movies
untouched. I then calculate the counterfactual market shares and counterfactual
revenue increase for that movie.
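The mechanics of removing a single product and recomputing shares can be sketched with a plain multinomial logit. This is a deliberate simplification: the paper estimates a random coefficient model, whereas the sketch below has no random coefficients, and the mean utilities in `delta` are hypothetical. The output is expressed as in a substitution table: -100 for the removed product and, for every other option, the percentage of the removed product's consumers who switch to it.

```python
import numpy as np

def logit_shares(delta, available):
    """Logit shares over [outside option, products]; outside utility is 0."""
    expd = np.where(available, np.exp(delta), 0.0)
    denom = 1.0 + expd.sum()
    return np.concatenate([[1.0 / denom], expd / denom])

def diversion(delta, removed):
    """Percent of the removed product's consumers moving to each option.

    Entry 0 is the outside option; entry j + 1 is product j. The removed
    product itself shows -100.
    """
    available = np.ones(len(delta), dtype=bool)
    base = logit_shares(delta, available)
    available[removed] = False
    counterfactual = logit_shares(delta, available)
    return 100.0 * (counterfactual - base) / base[removed + 1]

# Hypothetical mean utilities: [paid A, pirated A, pirated B].
delta = np.array([0.5, -1.0, 0.2])
d = diversion(delta, removed=1)  # remove the pirated version of A
print(np.round(d, 2))
```

Under plain logit the diverted consumers spread across the remaining options in proportion to their baseline shares, regardless of how similar the products are. Allowing substitution to depend on movie characteristics is the reason for using random coefficients.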
Table 8 compares the average movie’s revenue increase in this partial
eradication counterfactual experiment with that in the full eradication
experiment. Not surprisingly, the average revenue increase drops to 0.045
million, only 12% of the average revenue recovered by eradicating all piracy.
In this counterfactual, most downloaders choose another available pirated movie
or other similar movies instead, because in many cases the availability of the
original movie in theaters is low.
Table 9 shows how consumers substitute into other products when their initial
choice is eliminated, for a selected number of movies in the US in one
particular time period. An examination of Table 9 reveals that substitution is
not restricted to substitution within a movie title. There is also a notable
cross-substitution effect of piracy across movie titles with similar
characteristics. For example, after the elimination of pirated versions of
Minions, about 27% of Minions downloaders choose to watch Minions in theaters,
and 4.5% choose to watch another animated movie, Inside Out. In general, except
for blockbuster movies like Avengers: Age of Ultron or Jurassic World that have
few concurrent competitors
Table 8: Comparison of Revenue Increase from Two Counterfactual Experiments

(in millions)        Mean    Std Dev     Min       Max    Effects on other movies
Full removal        0.379      1.390   -0.007    10.711              -
Partial removal     0.045      0.359   -0.008     4.607            0.22
and lots of concurrent downloads during release, most movies can only reclaim
a small fraction of the revenue recovered under full eradication of piracy, due
both to the timing of downloads and to cross-substitution into other pirated or
paid movies.30 Despite the limited influence on a movie’s own revenue, the
externalities from private copyright protection to other movies are large. On
average, other movies gain 0.22 million dollars in total, roughly 5 times the
gain for the protected movie itself. The result to some extent indicates that
the biggest threat to a movie’s box-office revenue is not piracy of that movie,
but rather piracy of other movies whose downloads overlap with its box-office
window. For private copyright protection to secure box-office revenue, other
studios’ copyright protection efforts are equally important, so coordination
and cooperation in copyright protection efforts may be beneficial to studios.
8.3 How Big is the Spillover Effect?
In the third counterfactual experiment, I quantify the magnitude of the
spillover effect from pirated consumption. In the model, demand is influenced by
total viewership in the last two periods to account for all possible channels of
spillover effects. A higher previous market share for a pirated movie can
therefore benefit demand for the paid movie in the next period. Based solely on
the estimates, the spillover effect is not statistically significant, but it is
difficult to know whether its magnitude is economically significant. So in this
counterfactual experiment, I shut down the channel for the spillover effect from
piracy by redefining previous total views as only the previous views from the
box office, and compare the counterfactual revenue with the benchmark to
quantify the magnitude of the spillover effect on industry revenue.
30Here the cross-substitution effect might be overestimated because of the model assumption of i.i.d. taste shocks between a movie and its pirated version; it would be useful to verify the result using a different specification of the choice structure, for example by imposing a nested choice structure.
Table 9: Substitution Patterns upon Eradication of a Particular Movie’s Piracy for US in Week 15
Percent (%)

                              outside       Paid       Paid       Paid       Paid       Paid       Paid    Pirated    Pirated    Pirated    Pirated
                               option       Home     Inside   Jurassic Terminator    Minions        Spy       Home   Jurassic    Minions        Spy
                                                        Out      World                                                  World
outside option                      -          -          -          -          -          -          -          -          -          -          -
Paid Movies
  Home                        33.4069       -100     9.1391     0.5217     0.1418     54.009     0.0058     0.2265     0.0291      0.093     0.0011
  Inside Out                  35.2809     0.1467       -100     0.5684     0.1522    60.7835     0.0061     0.2481     0.0309     0.1019     0.0011
  Jurassic World              77.0618     0.0085     0.5938       -100     8.0485     3.5092     0.0043      0.001     0.8649     0.0004     0.0005
  Terminator                  80.4048      0.003     0.2091     9.9687       -100     1.2357     0.0495     0.0006     0.1373     0.0003     0.0006
  Minions                     55.5423     0.4934    34.5919     1.1557      0.279       -100     0.0102     0.7127     0.0493     0.2927     0.0013
  Spy                         30.3505     0.0005     0.0355     0.0255     0.2027     0.2096       -100     0.0001      0.001          0     9.4217
Pirated Movies
  Home                        59.8264      0.067     4.6948     0.0191      0.009     27.745     0.0002       -100     0.5691      3.639     0.0012
  Jurassic World              90.7006     0.0017     0.1201     3.2379     0.3738     0.7099     0.0006     0.1426       -100     0.0586     0.0103
  Minions                     56.3862      0.065     4.5548      0.019     0.0089    26.9177     0.0002     8.2414     0.5626       -100     0.0012
  Spy                         17.5318     0.0002     0.0114     0.0053     0.0044     0.0676     15.822     0.0006     0.0241     0.0002       -100
Table 10: Comparison of Counter-factual Revenue: With Spillover vs No Spillover
(in millions) Industry revenue Consumer surplus
No Spillover 6807.37 8734.13
Benchmark 6816.78 8746.08
Contribution of Spillover Effects from Piracy 9.40 11.95
The results are shown in Table 10. In the current version of the results, the
contribution of the spillover effect to industry revenue is relatively moderate.
It increases total industry revenue by 9.4 million dollars in these 10
countries. This represents 0.14% of total box-office revenue, so it is unlikely
that the spillover effect from piracy has a large benefit for the box office.
Spillover effects also increase consumer welfare by 11.95 million dollars. The
small benefit to the box office may be attributed to the fast decay of movie
attendance in theaters: as most downloads take place late in a movie’s
theatrical life cycle, the spillover effect arrives too late to affect sales,
because movies’ availability in theaters drops quickly. However, this does not
rule out the important role spillover effects might play in other distribution
channels such as the home video and video-on-demand markets.
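The spillover contribution is simply the difference between the benchmark and the no-spillover counterfactual. The short check below uses the figures from the spillover comparison table (in millions of dollars); the variable names are illustrative only.

```python
# Figures from the spillover comparison table, in millions of dollars.
revenue_no_spillover, revenue_benchmark = 6807.37, 6816.78
surplus_no_spillover, surplus_benchmark = 8734.13, 8746.08

# Contribution of spillover = benchmark minus no-spillover counterfactual.
revenue_gain = revenue_benchmark - revenue_no_spillover
surplus_gain = surplus_benchmark - surplus_no_spillover
revenue_gain_pct = 100.0 * revenue_gain / revenue_benchmark

print(round(revenue_gain, 2))      # 9.41 (reported as 9.40, a rounding difference)
print(round(surplus_gain, 2))      # 11.95
print(round(revenue_gain_pct, 2))  # 0.14
```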
9 Conclusion
This paper examines the effect of file-sharing on movie box-office revenue. To
allow for more flexible substitution patterns, I estimate a random coefficient
demand model of movies that allows demand to be influenced by spillovers from
pirated consumption, and use a no-piracy counterfactual experiment to quantify
the effect of file-sharing. Using a representative sample of download data from
BitTorrent networks, I have several findings. First, file-sharing reduces the
motion picture industry’s total box-office revenue by $90 million, or 1.4% of
current box-office revenue.31 This estimate is smaller than the widely cited
industry estimates constantly referenced in policy making; the “naive”
methodology, which assumes full sales displacement, inflates the true cost 9.2
times. On average one movie
31This number is the total for the 10 countries under study over a 20-week period.
suffers a monetary loss of 0.259 million because of file-sharing. Second, on
average one download displaces 0.11 units of legitimate sales. Third, the
welfare analysis shows that file-sharing increases consumer welfare by a total
of $0.73 billion; banning file-sharing services would therefore result in a
deadweight loss of $0.64 billion. Fourth, I examine factors that affect the
revenue loss due to piracy. I find that wide-release, science fiction, and
action movies are more vulnerable to piracy. In addition, anti-piracy campaigns
that remove piracy for an individual movie have limited benefits for that
movie’s box-office revenue because most downloaders simply substitute into other
pirated movies. Lastly, I examine the magnitude of the spillover effect of
piracy on box-office revenue. I find that the spillover effect contributes a
total of 9.4 million dollars to box-office revenue in the 10 countries over the
20-week period.
The findings of this paper provide additional evidence to inform the current
heated debate on controversial issues regarding intellectual property. For
policy makers, the findings highlight the importance of considering the outside
option and substitution when evaluating the effect of file-sharing; research
omitting these factors will substantially overestimate the negative effects of
file-sharing and should be treated with caution in policy making. For industry,
the findings can be used by motion picture studios to determine the optimal
level of copyright protection given the high cost of supervision and litigation.
An interesting question I do not answer in this paper is how the supply of
movies is affected by file-sharing, since I take movie releases as exogenous in
my model. An interesting extension would be to model the movie release decision
as an entry game given the estimated demand system. This would help identify the
effect of file-sharing on producers’ incentives to supply new products, which is
an important question worth exploring in the future.
References
[1] Manuel Arellano and Stephen Bond. Some Tests of Specification for Panel
Data: Monte Carlo Evidence and an Application to Employment Equations.
The Review of Economic Studies, 58(2):277–297, 1991.
[2] Paul Belleflamme and Martin Peitz. Digital Piracy: Theory. The Oxford
Handbook of the Digital Economy, Oxford University Press, 2012.
[3] Paul Belleflamme and Martin Peitz. Industrial organization: markets and
strategies. Cambridge University Press, 2010.
[4] Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market
equilibrium. Econometrica, 63(4):841–890, 1995.
[5] David Blackburn. Does file sharing affect record sales. PhD diss. Harvard
University, 2004.
[6] Michele Boldrin and David Levine. The case against intellectual property.
American Economic Review, pages 209–212, 2002.
[7] Bram Cohen. The bittorrent protocol specification. 2015.
[8] Brett Danaher and Joel Waldfogel. Reel Piracy: The Effect of Internet Film
Piracy on International Box Office Sales. Working Paper, 2015.
[9] Brett Danaher and Michael D Smith. Gone in 60 Seconds: The Impact of
the Megaupload Shutdown on Movie Sales. International Journal of Industrial
Organization, 33:1–8, 2014.
[10] Peter Davis. Estimating multi-way error components models with unbalanced
data structures. Journal of Econometrics, 106(1):67–95, 2002.
[11] Peter Davis. Spatial competition in retail markets: movie theaters. RAND
Journal of Economics, pages 964–982, 2006.
[12] Nicolas De Roos and Jordi McKenzie. Cheap tuesdays and the demand for
cinema. International Journal of Industrial Organization, 33:93–109, 2014.
[13] Arthur De Vany and W David Walls. Uncertainty in the movie industry: Does
star power reduce the terror of the box office? Journal of Cultural Economics,
23(4):285–318, 1999.
[14] Arthur S De Vany and W David Walls. Estimating the effects of movie piracy
on box-office revenue. Review of Industrial Organization, 30(4):291–301, 2007.
[15] Jean-Pierre Dube, Jeremy T Fox, and Che-Lin Su. Improving the numerical
performance of static and dynamic aggregate discrete choice random coeffi-
cients demand estimation. Econometrica, 80(5):2231–2267, 2012.
[16] Liran Einav. Seasonality in the us motion picture industry. RAND Journal
of Economics, pages 127–145, 2007.
[17] David Erman. Bittorrent Traffic Measurements and Models, 2005.
[18] Amit Gayer and Oz Shy. Internet and peer-to-peer distributions in markets
for digital products. Economics Letters, 81(2):197–203, 2003.
[19] Benjamin Klein, Andres V Lerner, and Kevin M Murphy. The economics of
copyright “fair use” in a networked world. American Economic Review, pages
205–208, 2002.
[20] Jonathan Lee. Purchase, Pirate, Publicize: The Effect of Private-Network
File Sharing on Album Sales. Working paper, 2016.
[21] Tin Cheuk Leung. What is the true loss due to piracy? evidence from mi-
crosoft office in hong kong. Review of Economics and Statistics, 95(3):1018–
1029, 2013.
[22] Robert Layton and Paul Watters. Investigation into the extent of infringing
content on BitTorrent networks. Internet Commerce Security Laboratory, 8–10,
2010.
[23] Stan Liebowitz. Will mp3 downloads annihilate the record industry? the
evidence so far. Advances in the Study of Entrepreneurship, Innovation, and
Economic Growth, 15:229–260, 2004.
[24] Stan J Liebowitz. Pitfalls in measuring the impact of file-sharing on the sound
recording market. CESifo Economic Studies, 51(2-3):435–473, 2005.
[25] Stan J Liebowitz. File sharing: creative destruction or just plain destruction?
Journal of Law and Economics, 49(1):1, 2006.
[26] Liye Ma, Alan Montgomery and Michael D. The Dual Impact of Movie Piracy
on Box-Office Revenue: Cannibalization and Promotion. Available at SSRN:
https://ssrn.com/abstract=2736946
[27] Jordi McKenzie. The economics of movies: A literature survey. Journal of
Economic Surveys, 26(1):42–70, 2012.
[28] Aviv Nevo. Mergers with differentiated products: The case of the ready-to-eat
cereal industry. The RAND Journal of Economics, pages 395–421, 2000.
[29] Aviv Nevo. A practitioner’s guide to estimation of random-coefficients logit
models of demand. Journal of Economics and Management Strategy, 9(4):513–
548, 2000.
[30] Aviv Nevo. Measuring market power in the ready-to-eat cereal industry.
Econometrica, 69(2):307–342, 2001.
[31] Stephen Nickell. Biases in Dynamic Models with Fixed Effects. Econometrica,
49(6):1417–1426, 1981.
[32] Imke Reimers. Can Private Copyright Protection be Effective? Evidence
from Book Publishing. Journal of Law and Economics, 59, no.2 (May 2016):
411–440.
[33] Felix Oberholzer-Gee and Koleman Strumpf. The effect of file sharing on
record sales: An empirical analysis. Journal of Political Economy, 115(1):1–
42, 2007.
[34] Motion Picture Association of America. Theatrical market statistics 2014.
2014.
[35] Barak Y Orbach and Liran Einav. Uniform prices for differentiated goods:
The case of the movie-theater industry. International Review of Law and
Economics, 27(2):129–153, 2007.
[36] Christian Peukert, Jorg Claussen, and Tobias Kretschmer. Piracy and Box
Office Movie Revenues: Evidence from Megaupload. International Journal of
Industrial Organization, forthcoming, 2016.
[37] Tobias Kretschmer and Christian Peukert. Video killed the radio star? Online
music videos and digital music sales. Working Paper, 2016.
[38] Martin Peitz and Patrick Waelbroeck. Piracy of digital products: A criti-
cal review of the theoretical literature. Information Economics and Policy,
18(4):449–476, 2006.
[39] Kathleen Reavis Conner and Richard P Rumelt. Software piracy: an analysis
of protection strategies. Management Science, 37(2):125–139, 1991.
[40] Rafael Rob and Joel Waldfogel. Piracy on the high c’s: Music download-
ing, sales displacement, and social welfare in a sample of college students.
Technical report, National Bureau of Economic Research, 2004.
[41] Rafael Rob and Joel Waldfogel. Piracy on the Silver Screen. Journal of
Industrial Economics, 55(3), 379–395, 2007.
[42] Joshua Slive and Dan Bernhardt. Pirated for profit. Canadian Journal of
Economics, pages 886–899, 1998.
[43] Olaf van der Spek. Udp tracker protocol for bittorrent. 2015.
[44] Kenneth E Train. Discrete choice methods with simulation. Cambridge uni-
versity press, 2009.
[45] Joel Waldfogel. Music file sharing and sales displacement in the itunes era.
Information economics and policy, 22:306–314, 2010.
[46] Alejandro Zentner. Measuring the effect of file sharing on music purchases.
Journal of Law and Economics, 49(1):63–90, 2006.
Appendix A Reliability of the Download Estimates
Given the difficulty of estimating traffic on BitTorrent, concerns might be
raised regarding the precision of the data collected in this paper, as certain
types of BitTorrent activity are indeed omitted by our data collection
procedure. For example, the data collection process is unable to track download
activity that happens through the trackerless protocol (DHT) or private
trackers. It would be ideal to compare our data with more reliable statistics
from sources such as Internet surveillance companies to further assess its
quality. Although data on movie downloads via BitTorrent are scarce, I managed
to find yearly download statistics for a limited number of movies in 2015,
estimated by the professional piracy tracking company Explico.32 Table 11
compares the download estimates in this paper with Explico’s estimates.
As the table shows, there are some differences between the two columns. Our
data generally underestimate downloads relative to Explico’s: our average is
28,155,435 compared with their average of 33,221,557. The correlation
coefficient is 0.88. The high correlation suggests that the variation in our
data matches the variation in the file-sharing network well. Because the
estimates in this paper are usually smaller than Explico’s, to assess the
robustness of our results to the download counts we can multiply our download
counts by a factor chosen to minimize the distance to Explico’s estimates and
re-estimate the model.
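One way to implement the rescaling described above is sketched below. The text does not specify the distance measure, so the sketch assumes squared Euclidean distance, for which the minimizing factor has a closed form. The function name `scale_to_reference` and the toy numbers are hypothetical and are not taken from the comparison table.

```python
import numpy as np

def scale_to_reference(own, ref):
    """Return (k, corr): the factor k minimizing sum((k * own - ref)**2)
    and the Pearson correlation between the two series."""
    own = np.asarray(own, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Least squares through the origin: k = <own, ref> / <own, own>.
    k = (own @ ref) / (own @ own)
    corr = np.corrcoef(own, ref)[0, 1]
    return k, corr

# Toy illustration (hypothetical counts, in millions of downloads).
own_counts = [10.0, 20.0, 30.0]
reference_counts = [12.0, 24.0, 36.0]
k, corr = scale_to_reference(own_counts, reference_counts)
rescaled = k * np.asarray(own_counts)
print(k, corr)
```

Multiplying every count by the same factor leaves the correlation unchanged, so the rescaling only affects the level of the download counts used in re-estimation.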
[INCOMPLETE]
Appendix B Illegal Streaming
With the emergence of pirated streaming websites like Popcorntime, Putlocker and
Movie4k, many file-sharing users have switched from downloading to streaming.
In 2015, streaming already accounted for a significant proportion of total
piracy activity. To take into account the increasing popularity of illegal
streaming services, the volume of illegal streaming needs to be estimated.
Unfortunately, it is technically very difficult to monitor movie streaming
traffic.
32http://variety.com/2015/digital/news/top-10-pirated-movies-of-2015-see-alarming-increase-in-downloads-1201667982/
Table 11: Comparison between Download Estimates from Explico and this paper

Movie Title                                    Explico’s Estimates   Estimates in this paper
Interstellar (2014)                                     46,762,310                37,615,912
Furious 7 (2015)                                        44,794,877                37,961,921
Avengers: Age of Ultron (2015)                          41,594,159                36,418,665
Mad Max: Fury Road (2015)                               36,443,244                29,645,492
Terminator: Genisys (2015)                              31,001,480                30,399,370
San Andreas (2015)                                      25,883,469                20,376,013
The Minions (2015)                                      23,495,140                22,071,636
Inside Out (2015)                                       22,734,070                22,135,244
Jurassic World (2015)                                   36,881,763                27,094,954
American Sniper (2014)                                  33,953,737                24,423,823
Fifty Shades of Grey (2015)                             32,126,827                34,442,676
The Hobbit: Battle Of The Five Armies (2014)            31,574,872                24,179,608
Mean                                                    33,211,557                28,155,435
Correlation Coefficient: 0.88
To overcome the difficulty of estimating streaming traffic directly, we use
Google Trends search traffic data for streaming and downloading as proxies for
actual downloading and streaming activity. Given our estimated downloads on
BitTorrent, we can produce an estimate of the volume of streaming activity using
the ratio of the Google Trends search traffic index between streaming and
downloading.
The procedure is as follows. First, we compile a list of the most common search
queries about streaming and downloading that appear in the top related search
queries for the movie category. Second, we divide those queries into
streaming-related and downloading-related groups and retrieve their weekly
search traffic index values for each movie in our sample. Third, we calculate
the ratio between aggregate download-related and streaming-related traffic for
each movie. The ratio for each movie is then used to adjust the total piracy
views estimates. We can then re-estimate the model using the new piracy views
estimates.
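The adjustment step can be sketched as follows. This is a minimal illustration under the assumption that the ratio of streaming to downloading activity equals the ratio of the corresponding search traffic indices. The function name `adjusted_piracy_views` and the example numbers are hypothetical.

```python
def adjusted_piracy_views(downloads, download_index, streaming_index):
    """Scale observed downloads up to total piracy views (downloads + streams).

    downloads       : estimated BitTorrent downloads for a movie.
    download_index  : aggregate search index of download-related queries.
    streaming_index : aggregate search index of streaming-related queries.
    Assumes streaming / downloading activity equals the ratio of the indices.
    """
    if download_index <= 0:
        raise ValueError("download_index must be positive")
    implied_streams = downloads * streaming_index / download_index
    return downloads + implied_streams

# Hypothetical movie: 1,000 downloads, search indices of 50 and 25.
total_views = adjusted_piracy_views(1000, download_index=50, streaming_index=25)
print(total_views)  # 1500.0
```

Because the search index is only informative as a ratio, the sketch never uses its absolute level.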
[INCOMPLETE]
Appendix C List of Trackers
udp://open.demonii.com:1337/announce
udp://9.rarbg.com:2710/announce
udp://tracker.leechers-paradise.org:6969/announce
udp://glotorrents.pw:6969/announce
http://bttracker.crunchbanglinux.org:6969/announce
http://i.bandito.org/announce
udp://www.eddie4.nl:6969/announce
udp://coppersurfer.tk:6969/announce
udp://shadowshq.eddie4.nl:6969/announce
http://tracker.dutchtracking.nl/announce
http://tracker.flashtorrents.org:6969/announce
udp://tracker.internetwarriors.net:1337/announce
http://www.todotorrents.com:2710/announce
http://pow7.com/announce
udp://inferno.demonoid.ph:3389/announce
http://torrent.gresille.org/announce
udp://tracker4.piratux.com:6969/announce
http://opensharing.org:2710/announce
http://anisaishuu.de:2710/announce
http://tracker.tvunderground.org.ru:3218/announce
http://tracker2.wasabii.com.tw:6969/announce
udp://mgtracker.org:2710/announce
udp://shadowshq.yi.org:6969/announce
http://bt.careland.com.cn:6969/announce
http://teentorrent.com:7070/announce
http://tracker.dler.org:6969/announce
http://bigfoot1942.sektori.org:6969/announce
udp://sugoi.pomf.se:80/announce
http://tracker.blazing.de:6969/announce
udp://exodus.desync.com:6969/announce
udp://open.nyaatorrents.info:6544/announce
http://tracker.tricitytorrents.com:2710/announce
udp://tracker.blackunicorn.xyz:6969/announce
http://tracker.ex.ua/announce
udp://bt.rutor.org:2710/announce
http://announce.torrentsmd.com:6969/announce
http://tracker.aletorrenty.pl:2710/announce
http://210.244.71.11:6969/announce
udp://tracker.torrenty.org:6969/announce
http://pubt.net:2710/announce
http://tracker.best-torrents.net:6969/announce
http://tracker.files.fm:6969/announce
http://retracker.uln-ix.ru/announce
http://bulkpeers.com:2710/announce
http://tracker3.infohash.org/announce
http://bt.mp4ba.com:2710/announce
udp://tracker.opentrackr.org:1337/announce
udp://p4p.arenabg.ch:1337/announce
http://retracker.telecom.kz/announce
http://tracker.mg64.net:6881/announce
http://tracker.trackerfix.com/announce
udp://zer0day.ch:1337/announce
udp://tracker.piratepublic.com:1337/announce
udp://tracker.sktorrent.net:6969/announce
http://xbtrutor.com:2710/announce
http://85.17.19.180/announce
http://tracker.bittorrent.am/announce
http://siambit.org/announce.php
http://retracker.krs-ix.ru/announce
http://tracker.baravik.org:6970/announce
http://tracker.tntvillage.scambioetico.org:2710/announce
http://tracker.mininova.org/announce
http://tracker.frozen-layer.com:6969/announce
http://www.mvgroup.org:2710/announce
http://bt.edwardk.info:6969/announce
http://share.camoe.cn:8080/announce
http://tracker.otaku-irc.fr/bt/announce.php
http://tracker.anirena.com:81/announce
http://tracker.dm258.cn:7070/announce
http://tracker.minglong.org:8080/announce
http://www.smartorrent.com:2710/announce
http://tracker.zaerc.com/announce.php
http://www.spanishtracker.com:2710/announce
http://www.todotorrents.com:2710/announce
http://www.tribalmixes.com/announce.php
http://funfile.org:2710/announce
http://mixfiend.com/announce.php
http://firesharing.altervista.org/announce.php
http://tracker.desitorrents.com:6969/announce
http://fafs.fansubanime.net/announce.php
http://all4nothin.net/announce.php
http://www.crnaberza.com/announce.php
http://www.gameupdates.org/announce.php