
Do the rich get richer? An empirical analysis of the BitCoin transaction network

Dániel Kondor, Márton Pósfai, István Csabai, and Gábor Vattay

Department of Physics of Complex Systems, Eötvös Loránd University, Hungary
H-1117 Budapest, Pázmány Péter sétány 1/A

(Dated: August 16, 2013)

Manuscript submitted to PLOS ONE on Aug 14, 2013.

ABSTRACT

The possibility to analyze everyday monetary transactions is limited by the scarcity of available data, as this kind of information is usually considered highly sensitive. Present econophysics models are usually employed on presumed random networks of interacting agents, and only macroscopic properties (e.g. the resulting wealth distribution) are compared to real-world data. In this paper, we analyze BitCoin, which is a novel digital currency system, where the complete list of transactions is publicly available. Using this dataset, we reconstruct the network of transactions, and extract the time and amount of each payment. We analyze the structure of the transaction network by measuring network characteristics over time, such as the degree distribution, degree correlations and clustering. We find that linear preferential attachment drives the growth of the network. We also study the dynamics taking place on the transaction network, i.e. the flow of money. We measure temporal patterns and the wealth accumulation. Investigating the microscopic statistics of money movement, we find that sublinear preferential attachment governs the evolution of the wealth distribution. We report a scaling relation between the degree and wealth associated to individual nodes.

INTRODUCTION

In the past two decades network science has been successful in many diverse scientific fields. Indeed, many complex systems can be represented as networks, ranging from biochemical systems, through the Internet and the World Wide Web, to various social systems [1–7]. Economics also made use of the concepts of network science, gaining additional insight to the more traditional approach [8–13]. Although a large volume of financial data is available for research, information about the everyday transactions of individuals is usually considered sensitive and is kept private. In this paper we analyze BitCoin, a novel currency system, where the complete list of transactions is accessible. We believe that this is the first opportunity to investigate the movement of currency in such detail.

BitCoin is a decentralized digital cash system with no single overseeing authority [14]. The system operates as an online peer-to-peer network; anyone can join by installing a client application and connecting it to the network. The unit of the currency is one bitcoin (abbreviated as BTC), and the smallest transferable amount is $10^{-8}$ BTC. Instead of having a bank account maintained by a central authority, each user has a BitCoin address which consists of a pair of public and private keys. Existing bitcoins are associated with the public key of their owner, and outgoing payments have to be signed by the owner using the corresponding private key. To maintain privacy a single user may use multiple addresses. Each participating node stores the complete list of previous transactions. Every new payment is announced on the network, and the payment is validated by checking consistency with the entire transaction history. To avoid fraud it is necessary that the participants agree on a single valid transaction history. This process is designed to be computationally difficult, so an attacker can only hijack the system by controlling the majority of the computational power. Therefore the system is more secure if more resources are devoted to the validation process. To provide incentive, new bitcoins are created periodically and distributed among the nodes participating in these computations. Another way to obtain bitcoins is to purchase them with traditional currency from someone who already has bitcoins; the price of bitcoins is completely determined by the market.

The BitCoin system was proposed in 2008 by Satoshi Nakamoto, and the system went online in January 2009 [14–17]. For over a year, it was only used by a few enthusiasts, and bitcoins did not have any real-world value. The MtGox trading site was started in 2010, making the exchange of bitcoins and conventional money significantly easier. More people and services joined the system, resulting in a steadily growing exchange rate. Starting from 2011, appearances in the mainstream media drew wider public attention, which led to skyrocketing prices accompanied by large fluctuations (see Fig. 1). Since the inception of BitCoin over 17 million transactions have taken place, and currently the market value of all bitcoins in circulation exceeds 1 billion dollars. See the Methods section for more details of the system and the data used in our analysis.

We download the complete list of transactions, and reconstruct the transaction network: each node represents a BitCoin address, and we draw a directed link between two nodes if there was at least one transaction between the corresponding addresses. In addition to the topology we also obtain the time and amount of every payment. Therefore we are able to analyze both the evolution of the network and the dynamical process taking place on it (i.e. the flow and accumulation of bitcoins). To characterize the underlying network we investigate the evolution of basic network characteristics over time, such as the degree distribution, degree correlations and clustering. Concerning the dynamics, we measure the wealth statistics and the temporal patterns of transactions. To explain the observed degree and wealth distribution we measure the microscopic growth statistics of the system. We provide evidence that preferential attachment is an important factor shaping these distributions. Preferential attachment is often referred to as the “rich get richer” scheme, meaning that hubs grow faster than low degree nodes. In the case of BitCoin this is more than an analogy: we find that the wealth of rich users increases faster than the wealth of users with low balance; furthermore, we find positive correlation between the wealth and the degree of a node.
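The network construction described above can be sketched in a few lines. This is only an illustration, not the pipeline used for the analysis: the flat `(sender, receiver, time, amount)` record layout and the function name `build_network` are assumptions, and real BitCoin transactions with several inputs and outputs would first have to be expanded into such pairs.

```python
from collections import defaultdict


def build_network(transactions):
    """Return the directed links and the in/out degree of every address.

    `transactions` is an iterable of (sender, receiver, time, amount)
    tuples; this record layout is a hypothetical simplification.
    """
    links = set()
    for sender, receiver, _time, _amount in transactions:
        # Repeated payments between the same pair collapse into one link.
        links.add((sender, receiver))

    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for sender, receiver in links:
        outdeg[sender] += 1
        indeg[receiver] += 1
    return links, dict(indeg), dict(outdeg)


# Toy data: "a" pays "b" twice, which still yields a single a -> b link.
txs = [
    ("a", "b", 1, 5.0),
    ("a", "b", 2, 1.0),
    ("b", "c", 3, 2.0),
    ("a", "c", 4, 0.5),
]
links, indeg, outdeg = build_network(txs)
```

Keeping the links in a set makes the "at least one transaction" rule explicit: the degree of a node counts distinct partners, not payments.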

RESULTS

Evolution of the transaction network

BitCoin is an evolving network: new nodes are added by creating new BitCoin addresses, and links are created if there is a transaction between two previously unconnected addresses. The number of nodes steadily grows over time with some fluctuations; especially noticeable is the large peak which coincides with the first boom in the exchange rate in 2011 (Fig. 1). After five years BitCoin now has N = 13,086,528 nodes and L = 44,032,115 links. To study the evolution of the network we measure the change of network characteristics as a function of time. We identify two distinct phases of growth: (i) The initial phase lasted until the fall of 2010; in this period the system had low activity and was mostly used as an experiment. The network measures are characterized by large fluctuations. (ii) After the initial phase BitCoin started to function as a real currency, and bitcoins gained real value. The network measures converged to their typical values by mid-2011 and did not change significantly afterwards. We call this period the trading phase.

We first measure the degree distribution of the network. We find that both the in- and the outdegree distributions are highly heterogeneous, and can be modeled with a power-law [18]. Figures 2 and 3 show the distribution of indegrees and outdegrees at different points of time during the evolution of the BitCoin network. In the initial phase the number of nodes is low, and thus fitting the data is prone to large error. In the trading phase, the exponents of the distributions do not change significantly, and the distributions can be approximated by the power-laws $p_{\mathrm{in}}(k_{\mathrm{in}}) \sim k_{\mathrm{in}}^{-2.18}$ and $p_{\mathrm{out}}(k_{\mathrm{out}}) \sim k_{\mathrm{out}}^{-2.06}$.

Figure 1. The growth of the BitCoin network. Number of addresses with nonzero balance (green), addresses participating in at least one transaction in one-week intervals (red) and the exchange price of bitcoins in US dollars according to MtGox, the largest BitCoin exchange site (blue). The two black lines are exponential functions bounding the growth of the network with characteristic times 188 and 386 days. (Axes: time, 01/09 to 01/13; number of addresses, logarithmic; price [USD/BTC], logarithmic.)

Figure 2. Evolution of the indegree distribution. Since the beginning of 2011, the shape of the distribution does not change significantly. The black line shows a fitted power-law for the final network; the exponent is 2.18. (Axes: indegree; normalized number of nodes; both logarithmic. Curves: Jan 2010, Jan 2011, Jul 2011, Jul 2012, May 2013.)

Figure 3. Evolution of the outdegree distribution. The black line shows a fitted power-law for the final network; the exponent is 2.06. (Axes: outdegree; normalized number of nodes; both logarithmic. Curves: Jan 2010, Jan 2011, Jul 2011, Jul 2012, May 2013.)

Figure 4. Evolution of the Gini-coefficient of the degree and the balance distributions. We observe the distinct initial phase lasting until mid-2011. The trading phase is characterized by approximately constant coefficients. (Axes: time; Gini coefficient, 0 to 1. Curves: indegrees, outdegrees, balances.)

To further characterize the evolution of the degree distributions we calculate the corresponding Gini-coefficients as a function of time (Fig. 4). The Gini-coefficient is mainly used in economics to characterize the inequality present in the distribution of wealth, but it can be used to measure the heterogeneity of any empirical distribution. In general the Gini-coefficient is defined as

$$G = \frac{2\sum_{i=1}^{n} i\,x_i}{n\sum_{i=1}^{n} x_i} - \frac{n+1}{n}, \tag{1}$$

where $\{x_i\}$ is a sample of size $n$, and the $x_i$ are monotonically ordered, i.e. $x_i \le x_{i+1}$. $G = 0$ indicates perfect equality, meaning that every $x_i$ has the same value; and $G = 1$ corresponds to complete inequality, e.g. the complete wealth in the system is owned by a single individual.
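Eq. 1 translates directly into code. The following is a minimal sketch; the function name `gini` and the error handling are our own choices for illustration.

```python
def gini(values):
    """Gini-coefficient of a sample, following Eq. 1.

    The sample is sorted in increasing order and ranks start at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with a positive sum")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


equal = gini([1, 1, 1, 1])      # every value identical
unequal = gini([0, 0, 0, 1])    # one individual owns everything
```

For a finite sample in which a single individual owns everything, Eq. 1 gives $(n-1)/n$ rather than exactly 1; the value 1 is reached only in the limit of large $n$.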

In the BitCoin network we find that in the initial phase the Gini-coefficient of the indegree distribution is close to 1, and for the outdegree distribution it is much lower. We speculate that in this phase a few users collected bitcoins, and without the possibility to trade, they stored them on a single address. In the second phase the coefficients quickly converge to $G_{\mathrm{in}} \approx 0.629$ and $G_{\mathrm{out}} \approx 0.521$, indicating that normal trade is characterized by both highly heterogeneous in- and outdegree distributions.

To characterize the degree correlations we measure the Pearson correlation coefficient of the out- and indegrees of connected node pairs:

$$r = \frac{\sum_e \left(j_e^{\mathrm{out}} - \bar{j}^{\mathrm{out}}\right)\left(k_e^{\mathrm{in}} - \bar{k}^{\mathrm{in}}\right)}{L\,\sigma_{\mathrm{out}}\sigma_{\mathrm{in}}}. \tag{2}$$

Here $j_e^{\mathrm{out}}$ is the outdegree of the node at the beginning of link $e$, and $k_e^{\mathrm{in}}$ is the indegree of the node at the end of link $e$. The summation $\sum_e$ runs over all links, $\bar{k}^{\mathrm{in}} = \sum_e k_e^{\mathrm{in}} / L$ and $\sigma_{\mathrm{in}}^2 = \sum_e \left(k_e^{\mathrm{in}} - \bar{k}^{\mathrm{in}}\right)^2 / L$. We calculate $\sigma_{\mathrm{out}}$ and $\bar{j}^{\mathrm{out}}$ similarly.

We find that the correlation coefficient is negative, except for a brief period in the initial phase. After mid-2010, the degree correlation coefficient stays between −0.01 and −0.05, reaching a value of $r \approx -0.014$ by 2013, suggesting that the network is disassortative (Fig. 5). However, small values of $r$ are hard to interpret: it was shown that for large purely scale-free networks $r$ vanishes as the network size increases [19]. Therefore we compute the average nearest neighbor degree function $k_{nn}^{\mathrm{in}}(k_{\mathrm{out}})$ for the final network; $k_{nn}^{\mathrm{in}}(k_{\mathrm{out}})$ measures the average indegree of the neighbors of nodes with outdegree $k_{\mathrm{out}}$. We find clear disassortative behavior (Fig. 6).

Figure 5. Evolution of the clustering coefficient and the out-in degree correlation coefficient. After an initial phase both measures reached a stationary value by the middle of 2011. (Axes: time; coefficient value, −0.3 to 0.3. Curves: degree correlation, clustering.)

Figure 6. The average indegree of neighbors as a function of the outdegree. A clear disassortative behavior can be observed. (Axes: outdegree; average indegree of neighbors; both logarithmic.)
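Both degree-correlation measures can be sketched for a small edge list. The code below is illustrative only: it assumes the links are given as `(source, target)` pairs together with degree dictionaries, it normalizes the covariance by the number of links so that $r$ stays within $[-1, 1]$, and it averages the neighbor indegree over links, which is one possible reading of $k_{nn}^{\mathrm{in}}(k_{\mathrm{out}})$. The function names are hypothetical.

```python
import math
from collections import defaultdict


def out_in_correlation(links, outdeg, indeg):
    """Pearson correlation of source outdegree and target indegree over links."""
    links = list(links)
    n_links = len(links)
    js = [outdeg[s] for s, _ in links]
    ks = [indeg[t] for _, t in links]
    j_mean = sum(js) / n_links
    k_mean = sum(ks) / n_links
    cov = sum((j - j_mean) * (k - k_mean) for j, k in zip(js, ks)) / n_links
    s_out = math.sqrt(sum((j - j_mean) ** 2 for j in js) / n_links)
    s_in = math.sqrt(sum((k - k_mean) ** 2 for k in ks) / n_links)
    return cov / (s_out * s_in)


def avg_neighbor_indegree(links, outdeg, indeg):
    """Average target indegree, grouped by the outdegree of the source."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s, t in links:
        sums[outdeg[s]] += indeg[t]
        counts[outdeg[s]] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Toy network: a node with two outgoing links points at low-indegree nodes,
# while two single-link nodes point at a shared, higher-indegree target.
links = [("h", "a"), ("h", "b"), ("x", "c"), ("y", "c")]
outdeg = {"h": 2, "x": 1, "y": 1}
indeg = {"a": 1, "b": 1, "c": 2}
r = out_in_correlation(links, outdeg, indeg)
knn = avg_neighbor_indegree(links, outdeg, indeg)
```

In this toy case the correlation is perfectly negative, the extreme form of the disassortative pattern discussed above.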

We also measure the average clustering coefficient

$$C = \frac{1}{N}\sum_v \frac{\Delta_v}{d_v(d_v - 1)/2}, \tag{3}$$

which measures the density of triangles in the network. Here the sum $\sum_v$ runs over all nodes, and $\Delta_v$ is the number of triangles containing node $v$. To calculate $\Delta_v$ we ignored the directionality of the links; $d_v$ is the degree of node $v$ in the undirected network.
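A direct, unoptimized evaluation of Eq. 3 might look as follows. This is a sketch under stated assumptions: nodes with fewer than two neighbors are taken to contribute zero (the text does not specify this case), self-links are ignored, $N$ counts only nodes that appear in at least one link, and the name `average_clustering` is our own.

```python
from itertools import combinations


def average_clustering(links):
    """Average clustering coefficient (Eq. 3) of the undirected network."""
    adj = {}
    for u, v in links:
        if u == v:
            continue  # ignore self-links
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        raise ValueError("network has no links")

    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue  # assumed convention: such nodes contribute zero
        # A triangle through this node is a connected pair of its neighbors.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += triangles / (d * (d - 1) / 2)
    return total / len(adj)


# A directed triangle a -> b -> c -> a plus a pendant node d attached to a.
C = average_clustering([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")])
```

The pairwise neighbor check is quadratic in the degree, so a network with hubs of the size reported here would need a more careful triangle-counting scheme.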

In the initial phase $C$ is high, fluctuating around 0.15 (see Fig. 5), possibly a result of transactions taking place between addresses belonging to a few enthusiasts trying out the BitCoin system by moving money between their addresses. In the trading phase, the clustering coefficient reaches a stationary value around $C \approx 0.05$, which is still higher than the clustering coefficient for random networks with the same degree sequence ($C_{\mathrm{rand}} \approx 0.0037(9)$).

To explain the observed broad degree distribution we turn to the microscopic statistics of link formation. Most real complex networks exhibit distributions that can be approximated by power-laws. Preferential attachment was introduced as a possible mechanism to explain the prevalence of this property [20]. Indeed, direct measurements confirmed that preferential attachment governs the evolution of many real systems, e.g. scientific citation networks [21, 22], collaboration networks [23] or social networks [24, 25]. In its original form preferential attachment describes the process when the probability of forming a new link is proportional to the degree of the target node [26]. In the past decade, several generalizations and modifications of the original model were proposed, aiming to reproduce further structural characteristics of real systems [27–30]. Here we investigate the nonlinear preferential attachment model [27], where the probability that a new link connects to node $v$ is

$$\pi(k_v) = \frac{k_v^{\alpha}}{\sum_w k_w^{\alpha}}, \tag{4}$$

where $k_v$ is the indegree of node $v$ and $\alpha > 0$. The probability that the new link connects to any node with degree $k$ is $\Pi(k) \sim n_k(t)\,\pi(k)$, where $n_k(t)$ is the number of nodes with degree $k$ at the time of the link formation. We cannot test our assumption directly, because $\Pi(k)$ changes over time. To proceed we transform $\Pi(k)$ to a uniform distribution by calculating the rank function $R(k, t)$ for each new link given $\pi(k)$ and $n_k(t)$:

$$R(k, t) = \frac{\sum_{j=0}^{k} n_j(t)\, j^{\alpha}}{\sum_{j=0}^{k_{\max}} n_j(t)\, j^{\alpha}} = \frac{\sum_{k_v < k} k_v^{\alpha}}{\sum_v k_v^{\alpha}}. \tag{5}$$

If Eq. 4 holds, $R(k, t)$ is uniformly distributed in the interval $[0, 1]$, independently of $t$. Therefore if we plot the cumulative distribution function, we get a straight line for the correct exponent $\alpha$. To determine the best exponent, we compare the empirical distribution of the $R$ values to the uniform distribution for different exponents by computing the Kolmogorov-Smirnov distance between the two distributions.
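The procedure can be sketched as follows. This is a hedged illustration rather than the implementation behind Fig. 7: each event is assumed to be a pair (snapshot of degree counts $n_j(t)$, indegree $k$ of the node that received the link), the rank uses the strict form $k_v < k$ of Eq. 5, and all function names are hypothetical.

```python
def rank_value(degree_counts, k, alpha):
    """R(k, t) of Eq. 5 for one new link, using the strict form k_v < k.

    `degree_counts` maps a degree j to the number of nodes n_j(t) that
    had this degree just before the link was formed.
    """
    total = sum(n * j ** alpha for j, n in degree_counts.items())
    if total == 0:
        raise ValueError("all attachment weights are zero")
    below = sum(n * j ** alpha for j, n in degree_counts.items() if j < k)
    return below / total


def ks_distance_to_uniform(samples):
    """Kolmogorov-Smirnov distance between a sample and U[0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted value.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def best_exponent(events, alphas):
    """Pick the exponent whose R values are closest to uniform."""
    def distance(alpha):
        return ks_distance_to_uniform(
            [rank_value(counts, k, alpha) for counts, k in events]
        )
    return min(alphas, key=distance)


r_example = rank_value({1: 2, 2: 1}, k=2, alpha=1.0)
d_example = ks_distance_to_uniform([0.25, 0.75])
```

Because a node of degree zero has zero weight for any $\alpha > 0$, a full analysis would have to decide how to treat links arriving at previously unused addresses; the sketch leaves that choice open.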

Evaluating our method for the indegree distribution of the BitCoin network, we find good correspondence between the empirical data and the presumed conditional probability function; the exponent giving the best fit is $\alpha \approx 1$ (Fig. 7). This shows that the overall growth statistics agree well with the preferential attachment process. Of course preferential attachment itself cannot explain the disassortative degree correlations and the high clustering observed in the network. We argue that preferential attachment is a key factor shaping the degree distribution; however, a more detailed investigation of the growth process is necessary to explain the higher order correlations.

Figure 7. Rank function for new links. The CDF of the $R$ values (see Eq. 5), for exponents 0.7, 0.85, 1, 1.1, 1.2 and 1.35. The inset shows the maximum absolute error for the considered exponents. (Axes: $R$; CDF. Inset axes: $\alpha$; error.)

Figure 8. Evolution of the distribution of balances of individual BitCoin addresses. The y-values are shifted by arbitrary factors for the better visibility of the separate lines. The black lines are stretched exponential and power-law fits for the data. The tail of the distribution can be well approximated by a power-law, with an exponent of −1.984. On the other hand the bulk of the distribution can be fitted with the stretched exponential $P(b) \sim b^{-\gamma} e^{-(ab)^{1-\gamma}}$, where $\gamma = 0.873$ and $a = 8014\ \mathrm{BTC}^{-1}$. (Axes: balance [BTC]; probability in arbitrary units; both logarithmic. Curves: Jan 2010, Jan 2011, Jul 2011, Jul 2012, May 2013.)

Dynamics of transactions

In this section we analyze the detailed dynamics of money flow on the transaction network. The increasing availability of digital traces of human behavior revealed that various human activities, e.g. mobility patterns, phone calls or email communication, are often characterized by heterogeneity [31–34]. Here we show that the handling of money is not an exception: we find heterogeneity in both balance distribution and temporal patterns. We also investigate the microscopic statistics of transactions.

The state of node $v$ at time $t$ is given by the balance of the corresponding address $b_v(t)$, i.e. the number of bitcoins associated to node $v$. The transactions are directly available, and we can infer the balance of each node based on the transaction list. Note that the overall quantity of bitcoins increases over time: BitCoin rewards users devoting computational power to sustain the system.

Figure 9. Distribution of delays between transactions initiated from a single BitCoin address. We observe a power-law distribution close to the widely observed $P(T) \sim T^{-1}$; the exponential cutoff corresponds to the finite lifetime of the BitCoin system. (Axes: delay [s]; probability; both logarithmic.)

We first investigate the temporal patterns of the system by measuring the distribution of inactivity times $T$. The inactivity time is defined as the time elapsed between two consecutive outgoing transactions from a node. We find a broad distribution that can be approximated by the power-law $P(T) \sim 1/T$ (Fig. 9), in agreement with the behavior widely observed in various complex systems [31, 35–37].
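As a small illustration of the definition above, inactivity times can be collected per address from a list of outgoing transactions. The `(address, timestamp)` record layout and the function name are assumptions made for this sketch.

```python
from collections import defaultdict


def inactivity_times(outgoing):
    """Delays between consecutive outgoing transactions of each address.

    `outgoing` is an iterable of (address, timestamp_in_seconds) pairs,
    a hypothetical layout; the records may arrive in any order.
    """
    by_address = defaultdict(list)
    for address, timestamp in outgoing:
        by_address[address].append(timestamp)

    delays = []
    for times in by_address.values():
        times.sort()
        delays.extend(later - earlier for earlier, later in zip(times, times[1:]))
    return delays


records = [("a", 10), ("b", 5), ("a", 40), ("a", 25), ("b", 105)]
delays = inactivity_times(records)
```

Addresses with a single outgoing transaction contribute no delay, so they drop out of the distribution automatically.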

It is well known that the wealth distribution of society is heterogeneous; the often cited (and quantitatively not precise) 80-20 rule of Pareto states that the top 20% of the population controls 80% of the total wealth. In line with this, we find that the wealth distribution in the BitCoin system is also highly heterogeneous. We measure the distribution of balances at different points of time, and we find a stable distribution. The tail of the wealth distribution is generally modeled with a power-law [38–40]; following this practice we find a power-law tail $\sim x^{-1.984}$ for balances $\gtrsim 50$ BTC (see Fig. 8). However, visual inspection of the fit is not convincing: the scaling regime spans only the last few orders of magnitude, and fails to reproduce the majority of the distribution. Instead we find that the overall behavior is much better approximated by the stretched exponential distribution $P(b) \sim b^{-\gamma} e^{-(ab)^{1-\gamma}}$, where $\gamma = 0.873$ and $a = 8014\ \mathrm{BTC}^{-1}$.

To further investigate the evolution of the wealth distribution we measure the Gini-coefficient over time. We find that the distribution is characterized by high values throughout the whole lifetime of the network, reaching a stationary value around $G \approx 0.985$ in the trading phase (see Fig. 4).

To understand the origin of this heterogeneity we turn to the microscopic statistics of acquiring bitcoins. Similarly to the case of degree distributions, the observed heterogeneous wealth distributions are often explained by preferential attachment. Moreover, preferential attachment was proposed significantly earlier in the context of wealth distributions than complex networks [41]. In economics preferential attachment is traditionally called the Matthew effect or the “rich get richer phenomenon” [42]. It states that the growth of the wealth of each individual is proportional to the wealth of that individual. In line with this principle, several statistical models were proposed to account for the heterogeneous wealth distribution [38, 43–45].

Figure 10. Rank function for the growth of balances. The CDF values on the x-axis are with respect to the total sum transferred (refer to the text for description). The inset shows the Kolmogorov-Smirnov error for the exponents considered. Here, the results are not as obvious as in the case when considering the node degrees (Fig. 7); a simple power-law form like in Eq. 4 is not sufficient to accurately model the statistics of money flow. On the other hand, the “average” behavior shows a correlation between the balance and the gained money: the uncorrelated case ($\alpha = 0$) clearly gives a much worse approximation than the exponents that presume preferential attachment ($\alpha > 0$). (Axes: $R$; CDF. Curves: $\alpha$ = 0.00, 0.20, 0.40, 0.50, 0.60, 0.70, 0.85, 1.00.)

First we investigate the change of balances in fixed time windows. We calculate the difference between the balance of each address at the end and at the start of each month. We plot the differences as a function of the starting balances (Fig. 11). When the balance increases, we observe a positive correlation: the average growth increases as a function of the starting balance. This indicates that the “rich get richer” phenomenon is indeed present in the system. For decreasing balances we find that a significant number of addresses lose all their wealth in the time frame of one month. This phenomenon is specific to BitCoin, due to the privacy concerns of users: it is generally considered a good practice to move unspent bitcoins to a new address when carrying out a transaction [49].
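The window analysis can be illustrated with a short sketch that groups balance increases by the decade of the starting balance. The dictionary inputs, the decade-wide bins and the function name are assumptions made here; the binning actually used for Fig. 11 is not specified beyond being logarithmic.

```python
import math
from collections import defaultdict
from statistics import median


def median_increase_by_balance(start, end):
    """Median balance increase per decade of the starting balance.

    `start` and `end` map addresses to balances (in BTC) at the beginning
    and at the end of one window; both layouts are hypothetical.
    Addresses with zero starting balance or a non-positive change are skipped.
    """
    bins = defaultdict(list)
    for address, b0 in start.items():
        delta = end.get(address, 0.0) - b0
        if b0 > 0 and delta > 0:
            decade = math.floor(math.log10(b0))  # e.g. 0 for 1 <= b0 < 10
            bins[decade].append(delta)
    return {decade: median(values) for decade, values in sorted(bins.items())}


start = {"a": 1.0, "b": 2.0, "c": 150.0, "d": 0.5}
end = {"a": 1.5, "b": 5.0, "c": 250.0, "d": 0.0}
increase = median_increase_by_balance(start, end)
```

Address "d" loses its entire balance within the window, so it would belong to the separate analysis of decreases.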

To better quantify the preferential attachment we carry out a similar analysis to the previous section. However, there is a technical difference: in the case of the evolution of the transaction network, for each event the degree of a node increased by exactly one. In the case of the wealth distribution there is no such constraint. To overcome this difficulty we consider the increment of a node’s balance by one unit as an event, e.g. if after a transaction $b_v$ increased by $\Delta b_v$, we consider it as $\Delta b_v$ separate and simultaneous events. We only consider events when the balance associated to an address increases, i.e. the address receives a payment. We now calculate the rank function $R(b, t)$ defined in Eq. 5, and plot the cumulative distribution function of the $R$ values observed throughout the whole time evolution of the BitCoin network (Fig. 10). Visual inspection shows that no single exponent provides a satisfying result, meaning that $\pi(b_v)$ cannot be modeled by a simple power-law relationship like in Eq. 4. However, we do find that the “average” behavior is best approximated by exponents around $\alpha \approx 0.8$, suggesting that $\pi(b_v)$ is a sublinear function. In the context of network evolution, previous theoretical work found that sublinear preferential attachment leads to a stationary stretched exponential distribution [27], in line with our observations.

Figure 11. Change of balances in one month windows. Increase (top) and decrease (bottom, y axis inverted) of node balances in one month windows as a function of their current balance. Also displayed are the average and median values in logarithmically sized bins. A clear growing trend can be observed. The distribution of changes of balances is also highly heterogeneous; choosing a fixed window on the x-axis, we find a fat-tailed distribution of changes on the y-axis. Examining these distributions, we find that their typical values tend to increase as the balance of nodes increases; the median value of balance increases follows an approximately power-law relationship for several orders of magnitude. (Axes: balance [BTC]; increase or decrease in one month [BTC]; both logarithmic.)
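Treating a payment of size $\Delta b_v$ as $\Delta b_v$ simultaneous unit events is equivalent to weighting the corresponding $R$ value by the amount received. A weighted Kolmogorov-Smirnov distance to the uniform distribution can then be sketched as below; the `(R, weight)` input layout and the function name are assumptions for illustration.

```python
def weighted_ks_distance_to_uniform(samples):
    """KS distance to U[0, 1] for weighted samples.

    `samples` is an iterable of (R, weight) pairs, where the weight is the
    amount received in the event (hypothetical layout).
    """
    ordered = sorted(samples)
    total = sum(weight for _, weight in ordered)
    if total <= 0:
        raise ValueError("total weight must be positive")

    cumulative = 0.0
    distance = 0.0
    for r, weight in ordered:
        # Compare the uniform CDF (equal to r) with the weighted empirical
        # CDF just before and just after the jump at r.
        distance = max(distance, abs(r - cumulative / total))
        cumulative += weight
        distance = max(distance, abs(cumulative / total - r))
    return distance


d_equal = weighted_ks_distance_to_uniform([(0.25, 1.0), (0.75, 1.0)])
d_heavy = weighted_ks_distance_to_uniform([(0.25, 1.0), (0.75, 3.0)])
```

With equal weights the result reduces to the ordinary Kolmogorov-Smirnov distance; a single large payment can dominate the statistic, which is one reason the wealth analysis is noisier than the degree analysis.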

We have investigated the evolution of both the transaction network and the wealth distribution separately. However, it is clear that the two processes are not independent. To study the connection between the two, we measure the correlation between the indegree and balance associated to the individual nodes. We plot the average balance of addresses as a function of their degrees in Fig. 12. For degrees in the range of 1–3000 (over 99.99% of all nodes with nonzero balance), the average balance is a monotonically increasing function of the degree and can be approximated with the power-law $b \sim k_{\mathrm{in}}^{0.617}$, indicating that the accumulated wealth and the number of distinct transaction partners an individual has are inherently related. Similar scaling was reported by Tseng et al., who conducted an online experiment where volunteers traded on a virtual market [46].

Figure 12. Average node balances as a function of the indegrees. The averages are calculated in logarithmically sized bins. For degrees up to $k_{\mathrm{in}} \approx 3000$, a strong correlation can be observed. The straight line is the power-law fit in this range, giving $b \sim k_{\mathrm{in}}^{0.617}$. The inset shows the same dataset for the full range of degrees. There are only 75 nodes (0.0063%) that have higher indegree than $k_{\mathrm{in}} = 3000$, resulting in high fluctuations. The Pearson correlation coefficient of the full dataset is 0.00185041, while the Spearman rank correlation coefficient is 0.275881, which indicates significant correlation. For 1000 realizations of the same data with random shuffling, the average Spearman correlation coefficient is $10^{-4}$ with a standard deviation of $9.5 \cdot 10^{-4}$; the maximum absolute value is $3.6 \cdot 10^{-3}$. (Axes: indegree; balance [BTC]; both logarithmic.)
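A scaling exponent such as the one in $b \sim k_{\mathrm{in}}^{0.617}$ can be estimated from binned averages by a straight-line fit in log-log space. The text does not state how the fit was performed, so ordinary least squares on the logarithms is an assumption of this sketch, as is the function name.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    For data following y = c * x**beta the slope equals beta.
    All inputs must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx


# Synthetic check: exact power-law data with a known exponent.
degrees = [1, 10, 100, 1000]
balances = [2.0 * k ** 0.617 for k in degrees]
slope = loglog_slope(degrees, balances)
```

On real data the fit should be restricted to the range where the binned averages are reliable, here degrees up to about 3000.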

METHODS

The BitCoin network

BitCoin is based on a peer-to-peer network of users connected through the Internet, where each node stores the list of previous transactions and validates new transactions based on a proof-of-work system. Users announce new transactions on this network, which are formed into blocks at an approximately constant rate of one block per 10 minutes; blocks contain a varying number of transactions. These blocks form the block-chain, where each block references the previous block; changing a previous transaction (e.g. double spending) would require the recomputation of all blocks since then, which becomes practically infeasible after a few blocks. To send or receive bitcoins, each user needs at least one address, which is a pair of private and public keys. The public key can be used for receiving bitcoins (users can send money to each other referencing the recipient’s public key), while sending bitcoins is achieved by signing the transaction with the private key. Each transaction consists of one or more inputs and outputs. In Fig. 13 we show a schematic view of a typical BitCoin transaction. Readers interested in the technical details of the system can consult the original paper of Satoshi Nakamoto [14] or the various resources available on the Internet [50, 51].
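The statement that altering an old transaction invalidates every later block can be made concrete with a toy hash chain. The sketch below is a simplification under stated assumptions: blocks are plain dictionaries serialized as JSON and hashed once with SHA-256, and the names `block_hash`, `build_chain` and `is_consistent` are hypothetical. The real system hashes a binary block header and additionally requires proof-of-work for each block.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block):
    """SHA-256 digest of a canonical JSON serialization of the block."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def build_chain(tx_batches):
    """Create blocks in which each block stores the hash of its predecessor."""
    chain, prev = [], GENESIS_PREV
    for txs in tx_batches:
        block = {"prev_hash": prev, "transactions": list(txs)}
        chain.append(block)
        prev = block_hash(block)
    return chain

def is_consistent(chain):
    """Check that every block references the hash of the block before it."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        prev = block_hash(block)
    return True

chain = build_chain([["A pays B 1"], ["B pays C 1"], ["C pays D 1"]])
intact = is_consistent(chain)              # chain as originally built
chain[0]["transactions"][0] = "A pays B 100"
tampered = is_consistent(chain)            # block 1 no longer matches block 0
```

After the modification the stored reference in the second block no longer matches, so all subsequent blocks would have to be regenerated to restore consistency.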

An important aspect of BitCoin is how new bitcoins are generated, and how new users can acquire bitcoins. New bitcoins are generated when a new block is formed as a reward to the users participating in block generation. The generation of a valid new block involves solving a reverse hash problem, whose difficulty can be set in a wide range. Participating in block generation is referred to as mining bitcoins. The nodes in the network regulate the block generation process by adjusting the difficulty to match the processing power currently available. As interest in the BitCoin system grew, the effort required to generate new blocks and thus receive the newly available bitcoins has increased by a factor of over 10 million; most miners nowadays use specialized hardware, requiring significant investments. Consequently, an average BitCoin user typically acquires bitcoins by either buying them at an exchange site or receiving them as compensation for goods or services.
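A reverse hash problem with adjustable difficulty can be sketched as a search for a nonce whose hash falls below a target. The following toy example is an illustration under simplifying assumptions: it hashes a text string once with SHA-256 and expresses the difficulty as a number of leading zero bits (`difficulty_bits`), whereas the actual protocol uses a different block format and target encoding. The function names are hypothetical.

```python
import hashlib

def meets_target(data, nonce, difficulty_bits):
    """True if SHA-256(data:nonce) has at least difficulty_bits leading zero bits."""
    digest = hashlib.sha256(f"{data}:{nonce}".encode("utf-8")).digest()
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(digest, "big") < target

def mine(data, difficulty_bits, max_nonce=1 << 24):
    """Brute-force search for a valid nonce; returns None if none is found."""
    for nonce in range(max_nonce):
        if meets_target(data, nonce, difficulty_bits):
            return nonce
    return None

# Each additional difficulty bit doubles the expected number of trials,
# while checking a proposed solution always costs a single hash.
nonce = mine("example block", difficulty_bits=10)
```

The asymmetry between the cost of finding a nonce and the cost of verifying it is what allows the network to tune the block generation rate by changing the target.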

Due to the nature of the system, the record of all previous transactions since its beginning is publicly available to anyone participating in the BitCoin network. From these records, the sending and receiving addresses, the sum involved and the approximate time of the transaction can be recovered. Such detailed information is rarely available in financial systems, making the BitCoin network a valuable source of empirical data involving monetary transactions. Of course, there are several shortcomings. One of these is that only the addresses involved in the transactions are revealed, not the users themselves. While providing complete anonymity is not among the stated goals of the BitCoin project [52], identifying addresses belonging to the same user can be difficult [16], especially on a large scale. Each user can have an unlimited number of BitCoin addresses, which appear as separate nodes in the transaction records. When constructing the network of users, these addresses would need to be joined to a single entity.
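One way to join addresses into entities can be sketched with a union-find structure. The grouping rule used here, that all addresses appearing together as inputs of one transaction belong to the same user, is an assumption introduced for illustration and is not stated in this section; it is a commonly used heuristic and can both miss and wrongly merge users. The data layout and function names are hypothetical.

```python
def find(parent, a):
    """Return the representative of a's group, with path halving."""
    parent.setdefault(a, a)
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def union(parent, a, b):
    """Merge the groups containing a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a

def group_addresses(transactions):
    """Group addresses under the assumed shared-input heuristic.

    Each transaction is a dict with "inputs" and "outputs" address lists.
    """
    parent = {}
    for tx in transactions:
        for addr in tx["inputs"] + tx["outputs"]:
            find(parent, addr)  # register every address that appears
        for other in tx["inputs"][1:]:
            union(parent, tx["inputs"][0], other)
    groups = {}
    for addr in list(parent):
        groups.setdefault(find(parent, addr), set()).add(addr)
    return list(groups.values())

txs = [
    {"inputs": ["A", "B"], "outputs": ["C"]},
    {"inputs": ["B", "D"], "outputs": ["E"]},
]
entities = group_addresses(txs)  # A, B and D end up in one group
```

Because grouping is transitive, address B links the two transactions and places A, B and D in a single entity, while C and E remain separate.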

Another issue arises not only for BitCoin, but for most online social datasets: it is hard to determine which observed phenomena are specific to the system, and which results are general. We do not know to what extent the group of people using the system can be considered a representative sample of society. In the case of BitCoin, for example, due to the perceived anonymity of the system, it is widely used for commerce in illegal items and substances [47]; these types of transactions are probably overrepresented among BitCoin transactions. Ultimately, the validity of our results will be tested if data becomes available from other sources, and comparison becomes possible.

Figure 13. Schematic view of a BitCoin transaction. Here we have four input (I1–I4) and three output (O1–O3) addresses.

Data

We installed the open-source bitcoind client and downloaded the blockchain from the peer-to-peer network. We modified the client to extract the list of all transactions in a human-readable format. We downloaded more precise timestamps of transactions from the blockchain.info website’s archive. We extracted the transactions from the blockchain on May 7th, 2013. The data and the source code of the modified client program are available at the project’s website [53] or through the Casjobs web database interface [48, 54].

The data includes 235,000 blocks, which contain a total of 17,354,797 transactions. This dataset includes 13,086,528 addresses (i.e. addresses appearing in at least one transaction); of these, 1,616,317 addresses were active in the last month. The BitCoin network itself does not store balances associated with addresses; these can be calculated from the sum of received and sent bitcoins for the given address [55]. Using this method, we identified that approximately one million addresses had nonzero balance at the time of our analysis.
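The balance calculation described above amounts to summing, for every address, all amounts received and subtracting all amounts sent. A minimal sketch follows; the transaction layout (lists of `(address, amount)` pairs), the use of integer amounts and the function name `address_balances` are assumptions for illustration and do not reflect the format of the published dataset.

```python
from collections import defaultdict

def address_balances(transactions):
    """Balance of each address: total received minus total sent.

    Each transaction is a dict with "inputs" and "outputs", both lists of
    (address, amount) pairs. Amounts are integers to avoid rounding errors.
    """
    received = defaultdict(int)
    sent = defaultdict(int)
    for tx in transactions:
        for addr, amount in tx["inputs"]:
            sent[addr] += amount
        for addr, amount in tx["outputs"]:
            received[addr] += amount
    addresses = set(received) | set(sent)
    return {addr: received[addr] - sent[addr] for addr in addresses}

txs = [
    {"inputs": [], "outputs": [("A", 50)]},                    # newly generated coins
    {"inputs": [("A", 50)], "outputs": [("B", 20), ("A", 30)]},  # payment with change
]
balances = address_balances(txs)
nonzero = sum(1 for value in balances.values() if value != 0)
```

In this example address A ends with a balance of 30 and address B with 20, so both count towards the number of addresses with nonzero balance.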

DISCUSSION

We have performed a detailed analysis of BitCoin, a novel digital currency system. A key difference from traditional currencies handled by banks is the open nature of BitCoin: here each transaction is publicly announced, providing an unprecedented opportunity to study monetary transactions of individuals. We have downloaded and compiled the complete list of transactions, and we have extracted the time and amount of each payment. We have studied the structure and evolution of the transaction network, and we have investigated the dynamics taking place on the network, i.e. the flow of bitcoins.

Measuring basic network characteristics as a function of time, we have identified two distinct phases in the lifetime of the system: (i) When the system was new, no businesses accepted bitcoins as a form of payment, therefore BitCoin was more of an experiment than a real currency. This initial phase is characterized by large fluctuations in network characteristics, and by heterogeneous indegree and homogeneous outdegree distributions. (ii) Later BitCoin received wider public attention, the increasing number of users attracted services, and the system started to function as a real currency. This trading phase is characterized by stable network measures, disassortative degree correlations and power-law in- and outdegree distributions. We have measured the microscopic link formation statistics, finding that linear preferential attachment drives the growth of the network.

To study the accumulation of bitcoins we have measured the wealth distribution at different points in time. We have found that this distribution is highly heterogeneous throughout the lifetime of the system, and it converges to a stable stretched exponential distribution in the trading phase. We have found that sublinear preferential attachment drives the accumulation of wealth. Investigating the correlation between the wealth distribution and network topology, we have identified a scaling relation between the degree and wealth associated with individual nodes, implying that the ability to attract new connections and the ability to gain wealth are fundamentally related.

We believe that the data presented in this paper has great potential to be used for evaluating and refining econophysics models, as not only the bulk properties but also the microscopic statistics can be readily tested. To this end, we make all the data used in this paper available online to the scientific community in easily accessible formats [48, 53, 54].

ACKNOWLEDGMENTS

The authors thank Andras Bodor for many useful discussions on the subject. This work has been supported by the European Union under grant agreement No. FP7-ICT-255987-FOC-II Project. The authors acknowledge the partial support of the European Union and the European Social Fund through project FuturICT.hu (grant no.: TAMOP-4.2.2.C-11/1/KONV-2012-0013), the OTKA 7779 and the NAP 2005/KCKHA005 grants. The EITKIC 12-1-2012-0001 project was partially supported by the Hungarian Government, managed by the National Development Agency, and financed by the Research and Technology Innovation Fund and the MAKOG Foundation.

[1] Albert, R. and Barabasi, A.-L. (2002). Statistical mechanics of complex networks. Reviews of modern physics 74 (1), 47

[2] Newman, M. E. J. (2003). The structure and function of complex networks. SIAM review 45 (2), 167–256

[3] Barabasi A.-L., Oltvai Z. N. (2004). Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5 101–113.

[4] Pastor-Satorras, R., and Vespignani, A. (2007). Evolution and structure of the Internet: A statistical physics approach. Cambridge University Press

[5] Barrat, A., Barthelemy, M., Vespignani, A. (2008). Dynamical Processes on Complex Networks. Cambridge University Press

[6] Cohen, R., and Havlin, S. (2010). Complex Networks. Structure, Robustness and Function. Cambridge University Press

[7] Dorogovtsev, S. N., Goltsev, A. V. and Mendes, J. F. F. (2008). Critical phenomena in complex networks. Rev. Mod. Phys. 80 (4) 1275–1335

[8] Catanzaro, M., and Buchanan, M. (2013). Network opportunity. Nature Physics, 9(3), 121–123.

[9] Caldarelli, G., Chessa, A., Pammolli, F., Gabrielli, A., and Puliga, M. (2013). Reconstructing a credit network. Nature Physics, 9(3), 125–126.

[10] Bargigli, L., and Gallegati, M. (2013). Finding Communities in Credit Networks. Economics, 7.

[11] Caldarelli, G. (2007). Scale-Free Networks. Oxford University Press

[12] Schweitzer, F., Fagiolo, G., Sornette, D., Vega-Redondo, F., Vespignani, A., and White, D. R. (2009). Economic networks: The new challenges. Science, 325 (5939) 422

[13] Palla, G., Farkas, I., Derenyi, I., Barabasi, A.-L., and Vicsek, T. (2004). Reverse engineering of linking preferences from network restructuring. Phys. Rev. E, 70(4), 046115.

[14] Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system. http://bitcoin.org/bitcoin.pdf

[15] Ron, D. and Shamir, A. (2012). Quantitative Analysis of the Full Bitcoin Transaction Graph. http://eprint.iacr.org/2012/584

[16] Reid, F. and Harrigan, M. (2011). An Analysis of Anonymity in the Bitcoin System. arXiv:1107.4524

[17] Note that the name Satoshi Nakamoto is widely believed to be a pseudonym. See e.g. https://en.bitcoin.it/wiki/Satoshi_Nakamoto

[18] Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review 51 (4), 661–703.

[19] Menche, J., Valleriani, A., and Lipowsky, R. (2010). Asymptotic properties of degree-correlated scale-free networks. Phys. Rev. E, 81(4), 046103.

[20] Barabasi, A.-L. (2012). Network science: Luck or reason. Nature 489, 1–2

[21] Barabasi, A.-L., Jeong, H., Neda, Z., Ravasz, E., Schubert, A., and Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A 311 590–614

[22] Newman, M. E. J. (2001). Clustering and preferential attachment in growing networks. Phys. Rev. E 64 025102

[23] Jeong, H., Neda, Z., and Barabasi, A.-L. (2003). Measuring preferential attachment in evolving networks. Europhysics Letters, 567.

[24] Kunegis, J., Blattner, M., and Moser, C. (2013). Preferential attachment in online networks: Measurement and explanations. arXiv:1303.6271.

[25] Mislove, A. (2009). Online Social Networks: Measurement, Analysis, and Applications to Distributed Information Systems. PhD thesis, Rice University.

[26] Barabasi, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512

[27] Krapivsky, P. L., Redner, S., and Leyvraz, F. (2000). Connectivity of growing random networks. Physical review letters, 85 (21), 4629.

[28] Dorogovtsev, S. N., Mendes, J. F. F., and Samukhin, A. N. (2000). Structure of growing networks with preferential linking. Physical review letters, 85 (21), 4633.

[29] Dorogovtsev, S. N., and Mendes, J. F. F. (2000). Evolution of networks with aging of sites. Physical Review E, 62 (2), 1842.

[30] Vazquez, A. (2003). Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Phys. Rev. E, 67 (5), 056104.

[31] Vazquez, A., Oliveira, J. G., Dezso, Z., Goh, K., Kondor, I., and Barabasi, A.-L. (2006). Modeling bursts and heavy tails in human dynamics. Phys. Rev. E, 73, 036127.

[32] Jo, H., Karsai, M., Kertesz, J., and Kaski, K. (2012). Circadian pattern and burstiness in mobile phone communication. New J. Phys., 14, 013055.

[33] Song, C., Qu, Z., Blumm, N., and Barabasi, A.-L. (2010). Limits of predictability in human mobility. Science, 327, 1018–21.

[34] Lambiotte, R., Blondel, V. D., Kerchove, C., Huens, E., Prieur, C., Smoreda, Z., and Van Dooren, P. (2008). Geographical dispersal of mobile communication networks. Physica A, 387 (21), 5317–5325.

[35] Barabasi, A.-L. (2005). The origin of bursts and heavy tails in human dynamics. Nature, 435 (7039), 207–211.

[36] Malmgren, R. D., Stouffer, D. B., Motter, A. E., and Amaral, L. A. (2008). A Poissonian explanation for heavy tails in e-mail communication. Proceedings of the National Academy of Sciences, 105 (47), 18153–18158.

[37] Holme, P., and Saramaki, J. (2012). Temporal networks. Physics reports, 519 (3), 97–125.

[38] Yakovenko, V. M. and Rosser, J. B. (2009). Colloquium: Statistical mechanics of money, wealth, and income. Rev. Mod. Phys. 81 (4) 1703–1725

[39] Ning, D., and You-Gui, W. (2007). Power-law tail in the Chinese wealth distribution. Chinese Physics Letters, 24 (8), 2434.

[40] Klass, O. S., Biham, O., Levy, M., Malcai, O., and Solomon, S. (2006). The Forbes 400, the Pareto power-law and efficient markets. Eur. Phys. J. B, 55 (2), 143–147.

[41] Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42 (3/4) 425–440.

[42] “For whosoever hath, to him shall be given, and he shall have more abundance: but whosoever hath not, from him shall be taken away even that he hath.” The Bible, Matthew 13:12, King James translation

[43] Ispolatov, S., Krapivsky, P. L., and Redner, S. (1998). Wealth Distributions in Asset Exchange Models. Eur. Phys. J. B, 2 (2), 267–276.

[44] Garlaschelli, D., Battiston, S., Castri, M., Servedio, V. D. P., and Caldarelli, G. (2005). The scale-free topology of market investments. Physica A, 350, 491–499.

[45] Tseng, J., Li, S., and Wang, S. (2010). Experimental evidence for the interplay between individual wealth and transaction network. Eur. Phys. J. B, 73 69–74.

[46] Tseng, J. J., Li, S. P., and Wang, S. C. (2010). Experimental evidence for the interplay between individual wealth and transaction network. The European Physical Journal B, 73 (1) 69–74.

[47] Christin, N. (2012). Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. arXiv:1207.7139

[48] Matray, P., Csabai, I., Haga, P., Steger, J., Dobos, L. and Vattay, G. (2007). Building a prototype for Network Measurement Virtual Observatory. Proc. ACM SIGMETRICS 2007 MineNet Workshop. San Diego, CA, USA. DOI: 10.1145/1269880.1269887

[49] “Most Bitcoin software and websites will help with this by generating a brand new address each time you perform a transaction.” https://en.bitcoin.it/wiki/Address

[50] http://bitcoin.org/about.html

[51] http://bitcoin.org/en/choose-your-wallet

[52] See e.g. https://en.bitcoin.it/wiki/Anonymity.

[53] http://www.vo.elte.hu/bitcoin

[54] http://nm.vo.elte.hu/casjobs

[55] Checking against spending more than a given address has is achieved by requiring that the input of a transaction corresponds to the output of a previous transaction.