

Page 1: Supplementary Materials for - HuMNet Lab - MIThumnetlab.mit.edu/wordpress/wp-content/uploads/2016… ·  · 2016-12-27Supplementary Materials for Title: Revealing patterns in human


Supplementary Materials for

Title: Revealing patterns in human spending habits

Authors: Riccardo Di Clemente¹, Miguel Luengo-Oroz², Bapu Vaitla³, Marta C. González¹,⁴*

Affiliations:
¹ Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Massachusetts Avenue 77, MA 02139 Cambridge (USA)
² United Nations Global Pulse, 46th St & 1st Ave, New York, NY 10017 USA
³ Department of Environmental Health, Harvard University, 677 Huntington Avenue Boston, MA 02115 USA
⁴ Center for Advanced Urbanism, Massachusetts Institute of Technology, Massachusetts Avenue 77, MA 02139 Cambridge (USA)

*Correspondence to: [email protected]

This PDF file includes:
Figs. S1 to S10
References


Supplementary Figure

Fig. S1 (A) Histogram of the number of transactions per user in the CCRS. The 150,000 users selected for the analysis are those with more than 10 and fewer than 300 transactions. (B) Relation between the district population (census data) and the number of users in our dataset, using the same color map as Fig. 1 in the paper. (C) Distribution of users’ overall monthly expenditure in USD.


Fig. S2 Complementary cumulative distribution and probability density (inset) of the MCC transaction codes. The probability distribution of a transaction code $x_i$ follows a Zipf distribution, $p(x_i) \propto x_i^{-\alpha}$ (the numerical value of the exponent is illegible in this transcript), with a Kolmogorov-Smirnov (KS) statistic of 0.04 and with the cutoff identified as the right-most point in the distribution before the fitted power law (33,34).
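As a rough illustration of the kind of fit described in (33), the sketch below estimates a continuous power-law exponent by maximum likelihood and computes the KS distance between the empirical and fitted distributions. The function name `fit_power_law`, the synthetic sample, and the fixed `x_min` are assumptions made for this example; they are not the authors' code or data.

```python
import math
import random

def fit_power_law(samples, x_min):
    """Continuous power-law MLE for the tail x >= x_min.

    Returns (alpha, ks), where ks is the largest distance between the
    empirical CDF and the fitted CDF of the tail.
    """
    tail = sorted(x for x in samples if x >= x_min)
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two samples above x_min")
    # Maximum-likelihood estimate of the exponent in p(x) ~ x^(-alpha).
    alpha = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    ks = 0.0
    for k, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (1.0 - alpha)
        # Compare the model with the empirical CDF just before and at x.
        ks = max(ks, abs(model_cdf - k / n), abs(model_cdf - (k + 1) / n))
    return alpha, ks

# Synthetic data (assumption): inverse-transform samples with exponent 2.5.
rng = random.Random(0)
true_alpha, x_min = 2.5, 1.0
data = [x_min * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
        for _ in range(5000)]
alpha_hat, ks_stat = fit_power_law(data, x_min)
```

A small KS distance indicates that the fitted power law tracks the empirical tail closely; in practice `x_min` is usually chosen to minimise this distance rather than fixed in advance.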


Fig. S3 (A) Z-score of each word as a function of its occurrence. The z-score of each word is calculated by comparing the code sequence with 100 randomized code sequences for each user, preserving the number of transactions per type. (B) Probability density function of the occurrence of the significant words $\{W_i\}$ and its complementary cumulative distribution. The distribution of words follows a power law $p(W_i) \propto x_i^{-1.73}$, with $x_i$ the frequency of $W_i$ and a KS statistic of 0.02. (C) Z-score of word occurrence for a sample user. We highlight in orange the set of words associated with the user that have a z-score greater than 2, which characterize the user’s shopping routines.
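The randomization test in panel (A) can be sketched as follows. This is a minimal illustration, assuming that a "word" is a fixed tuple of consecutive transaction codes and that shuffling the user's sequence is an acceptable null model (it preserves the number of transactions per type). The helper names and the toy sequence are hypothetical.

```python
import math
import random
import statistics

def count_word(sequence, word):
    """Number of (possibly overlapping) occurrences of `word` in `sequence`."""
    k = len(word)
    return sum(1 for i in range(len(sequence) - k + 1)
               if tuple(sequence[i:i + k]) == tuple(word))

def word_z_score(sequence, word, n_random=100, seed=0):
    """Z-score of the observed word count against shuffled sequences."""
    rng = random.Random(seed)
    observed = count_word(sequence, word)
    null_counts = []
    for _ in range(n_random):
        shuffled = list(sequence)
        rng.shuffle(shuffled)  # keeps the number of transactions per type
        null_counts.append(count_word(shuffled, word))
    mu = statistics.mean(null_counts)
    sigma = statistics.pstdev(null_counts)
    if sigma == 0:
        # Degenerate null distribution: no spread to normalise by.
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sigma

# Toy user (assumption): a repeated toll -> grocery routine plus other codes.
toy_sequence = ["toll", "grocery"] * 10 + ["fuel", "restaurant", "pharmacy"] * 5
z = word_z_score(toy_sequence, ("toll", "grocery"))
is_significant = z > 2  # threshold used in panel (C)
```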


Fig. S4 Clustering results depending on the selection of users. For each threshold $x$ we select all the users with more than $x$ significant links. Using the Louvain algorithm (26), we cluster the users’ similarity matrix of the selected users at each threshold. For each threshold, we show the proportion of users that belong to each cluster, the core transaction for that cluster (as defined in the main text, Fig. 3) and the conditional probability for a user to belong to a given cluster depending on its number of significant links, $P(\mathrm{Cluster} \mid \#\mathrm{links})$. By applying a lower threshold it is possible to increase the number of users analyzed. In particular, selecting users with more than 3 significant links improves the identification of clusters 4 and 6, which were misidentified when using higher thresholds. At the same time, lowering the threshold increases the number of users that we are not able to categorize effectively (user percentage of cluster 5).
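The conditional probability $P(\mathrm{Cluster} \mid \#\mathrm{links})$ can be estimated by simple counting once each user has a cluster label. The sketch below assumes a hypothetical input of `(n_significant_links, cluster_label)` pairs; the function name and the toy values are illustrative only.

```python
from collections import Counter, defaultdict

def cluster_given_links(users, threshold):
    """Estimate P(cluster | number of significant links).

    users: iterable of (n_significant_links, cluster_label) pairs.
    Only users with more than `threshold` links are kept.
    Returns {n_links: {cluster: probability}}.
    """
    counts = defaultdict(Counter)
    for n_links, cluster in users:
        if n_links > threshold:
            counts[n_links][cluster] += 1
    return {n: {c: k / sum(ctr.values()) for c, k in ctr.items()}
            for n, ctr in counts.items()}

# Toy labels (assumption): (number of significant links, cluster id).
toy_users = [(4, 1), (4, 1), (4, 5), (6, 2), (6, 2), (3, 5), (8, 3)]
probs = cluster_given_links(toy_users, threshold=3)
```

With these toy values the user with exactly 3 links is excluded, which mirrors the "more than $x$ significant links" selection rule in the caption.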

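[Fig. S4 panels, in order of appearance, with clusters labeled 1 to 6 and "Toll" marked as a core transaction. User shares per cluster, listed in extraction order: #USERS = 2.4K: 15%, 25%, 16%, 23%, 20%. #USERS = 4.1K: 18%, 23%, 15%, 17%, 27%. #USERS = 7.2K: 15%, 17%, 13%, 8%, 21%, 26%. #USERS = 13.0K: 13%, 17%, 11%, 10%, 39%, 10%.]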


Fig. S5 (A) Cluster analysis of the users’ similarity matrix for threshold 3 using the Walktrap (28) and Leading Eigenvector (27) algorithms; both algorithms detect six different clusters, as Louvain (25) does (Fig. 3 and Fig. S4). (B) Network modularity analysis for the three clustering algorithms considered. The Louvain algorithm always performs better in terms of modularity (35). (C) Analysis of the similarity between the three clustering methods, using the Normalized Mutual Information (NMI) (36,37) and the Rand index (38). Values near one indicate a higher similarity between the clusters identified by the Louvain algorithm and those identified by the other two algorithms.
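The two similarity measures in panel (C) can be computed directly from two lists of cluster labels. The sketch below is a plain-Python illustration, assuming the normalization $\mathrm{NMI} = 2I(A;B)/(H(A)+H(B))$; the label vectors are made-up toy data, not results from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def rand_index(a, b):
    """Fraction of item pairs on which partitions a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def nmi(a, b):
    """Normalized mutual information, 2 * I(A;B) / (H(A) + H(B))."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * math.log(c / n) for c in pa.values())
    h_b = -sum(c / n * math.log(c / n) for c in pb.values())
    mi = sum(c / n * math.log(c * n / (pa[x] * pb[y]))
             for (x, y), c in pab.items())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every item in a single cluster
    return 2 * mi / (h_a + h_b)

# Toy partitions of eight users (assumption).
louvain_labels = [0, 0, 0, 1, 1, 1, 2, 2]
relabelled = [1, 1, 1, 0, 0, 0, 2, 2]      # same partition, renamed clusters
other_labels = [0, 0, 1, 1, 1, 1, 2, 2]    # one user moved to another cluster

ri_same = rand_index(louvain_labels, relabelled)
ri_other = rand_index(louvain_labels, other_labels)
nmi_same = nmi(louvain_labels, relabelled)
nmi_other = nmi(louvain_labels, other_labels)
```

Both measures are invariant to how clusters are named, which is why they are suitable for comparing the output of different community-detection algorithms.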

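[Fig. S5 panels A to C. Panel A shows two clusterings with clusters labeled 1 to 6 and "Toll" marked as a core transaction. User shares per cluster, listed in extraction order: 12%, 17%, 20%, 16%, 22%, 13% for the first panel and 12%, 13%, 11%, 7%, 47%, 10% for the second.]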


Fig. S6 (A) Distributions of the number of users’ transactions per cluster (left) and confidence intervals (right). (B) Distribution of the users’ transaction diversity per cluster. We measure the transaction diversity $D(i)$ of a user $i$ as the Shannon entropy of the user’s transactions divided by the number of transactions $N$, hence $D(i) = -\left[\sum_{t_i \in T_i} p(t_i) \log p(t_i)\right] / N$, with $T_i$ the set of the user’s transactions. The users identified as “commuters” (see main text) are those with low transaction diversity and a higher frequency of transactions. Conversely, the users in cluster 5 show a higher transaction diversity with a low number of transactions. These two factors combined mean that identifying the users’ routines in this cluster is more challenging.
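A minimal sketch of the diversity measure $D(i)$ follows. It assumes natural logarithms (the caption does not state the base) and that $p(t_i)$ is the empirical frequency of each transaction code in the user's sequence; the two toy users are invented for illustration.

```python
import math
from collections import Counter

def shannon_entropy(transactions):
    """Shannon entropy of the empirical distribution of transaction codes."""
    n = len(transactions)
    if n == 0:
        raise ValueError("empty transaction sequence")
    return -sum(c / n * math.log(c / n)
                for c in Counter(transactions).values())

def transaction_diversity(transactions):
    """D(i): Shannon entropy divided by the number of transactions N."""
    return shannon_entropy(transactions) / len(transactions)

# Toy users (assumption): a frequent, repetitive user and a sparse, varied one.
commuter = ["toll"] * 18 + ["grocery"] * 2
varied = ["toll", "grocery", "fuel", "restaurant"]

d_commuter = transaction_diversity(commuter)
d_varied = transaction_diversity(varied)
```

The repetitive user with many transactions gets a much lower $D(i)$ than the user with few, varied transactions, matching the contrast the caption draws between the commuters and cluster 5.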


Fig. S7 (A) Sequitur compression ratio, defined as the ratio between the length of the original transaction sequence and the length of the Sequitur (24) output sequence. The compression ratio for the clustered users is 1.50. (B) Shannon entropy of the transaction sequence. We define the Shannon entropy for a user $i$ as $S(i) = -\sum_{t_i \in T_i} p(t_i) \log p(t_i)$, where $p(t_i)$ is the probability of observing the transaction $t_i$ in the user’s transaction sequence $T_i$. Our proposed algorithm achieves its best results when analyzing users characterized by higher values of Shannon entropy in transaction types.


Fig. S8 Distributions of socio-demographic characteristics of individuals in each cluster.


Fig. S9 Confidence intervals of socio-demographic characteristics of individuals in each cluster detected by our framework. The solid red line represents the median values across all clustered users.


Fig. S10 Median expenditure by transaction code (in USD, for the 10 weeks considered). The clusters’ overall expenditure is in agreement with the core transactions identified by our framework.


References

(References 33 onward are cited only in the supplementary materials.)

1. D. Lazer et al., Science 323, 721–723 (2009).

2. J. Giles, Nature 488, 448–450 (2012).

3. A. Vespignani, Nat. Phys. 8, 32–39 (2011).

4. N. Eagle, M. Macy, R. Claxton, Science 328, 1029–1031 (2010).

5. A. Pentland, Sci. Am. 309, 78–83 (2013).

6. J. Mervis, Science 336, 22 (2012).

7. J. L. Toole et al., J. R. Soc. Interface 12, 10.1098/rsif.2014.1128 (2015).

8. M. C. González, C. A. Hidalgo, A.-L. Barabási, Nature 453, 779 (2008).

9. S. Jiang et al., P. Natl. Acad. Sci. USA 113, 10.1073/pnas.1524261113 (2016).

10. C. Song et al., Science 327, 1018–1021 (2010).

11. J. Blumenstock, G. Cadamuro, R. On, Science 350, 1073 (2015).

12. M. Lenormand, Sci. Rep. 5, 10.1038/srep10075 (2015).

13. T. Louail et al., Sci. Rep. 5, 10.1038/srep05276 (2015).

14. S. Çolak et al., Nat. Commun. 7, 10.1038/ncomms10793 (2016).

15. M. R. Solomon, Consumer Behavior: Buying, Having, and Being (Prentice Hall, Englewood Cliffs, NJ, 2014).

16. D. Pennacchioli et al., EPJ Data Sci. 3, 1–27 (2014).

17. Y. Yoshimura et al., arXiv:1610.00187 (2016).

18. C. Krumme et al., Sci. Rep. 3, 10.1038/srep01645 (2013).

19. X. Dong et al., Workshop on Information in Networks (NY, USA, 2015).

20. V. Singh et al., PLoS ONE 10, 10.1371/journal.pone.0136628 (2016).

21. R. Milo et al., Science 298, 824–827 (2002).

22. V. C. Solutions, Visa USA Inc. (2004).

23. S. Sobolevsky et al., PLoS ONE 11, 10.1371/journal.pone.0146291 (2016).

24. C. G. Nevill-Manning, I. H. Witten, J. Artif. Intell. Res. 7, 66–82 (1997).

25. V. Blondel et al., J. Stat. Mech.-Theory E. 10, P10008, 10.1088/1742-5468/2008/10/P10008 (2008).

26. C. L. Staudt, H. Meyerhenke, IEEE T. Parall. Distr. 27, 171–184 (2016).


27. M. E. J. Newman, Phys. Rev. E 74, 10.1103/PhysRevE.74.036104 (2006).

28. P. Pons, M. Latapy, International Symposium on Computer and Information Sciences, 284–293 (2005).

29. L. Pappalardo et al., IEEE International Conference on Big Data, 871–878 (2015).

30. C. Hidalgo, N. Blumm, A.-L. Barabási, PLoS Comput. Biol. 5, 10.1371/journal.pcbi.1000353 (2009).

31. L. Schuerman, S. Kobrin, JSTOR, 67–100 (1986).

32. A. Cavallo, Scraped data and sticky prices, NBER (2015).

33. A. Clauset, C. R. Shalizi, M. E. J. Newman, Power-law distributions in empirical data, SIAM Rev. 51 (2009).

34. J. Alstott, E. Bullmore, D. Plenz, PLoS ONE 9, 10.1371/journal.pone.0085777 (2014).

35. M. E. J. Newman, P. Natl. Acad. Sci. USA 103, 8577–8582 (2006).

36. L. Danon, A. Diaz-Guilera, J. Duch, A. Arenas, J. Stat. Mech.-Theory E. 09, P09008 (2005).

37. A. L. N. Fred, A. K. Jain, Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2, II-128 (2003).

38. W. M. Rand, J. Am. Stat. Assoc. 66, 846–850 (1971).