tuni baltakys arkisto1

218
Investor Networks and Information Transfer in Stock Markets
KĘSTUTIS BALTAKYS
Tampere University Dissertations


Responsible supervisor and Custos

Pre-examiners

Opponent


PREFACE

In this thesis, I am studying investor networks, where the links are based on some trading similarity. Before going into more details, I want to explore a different network, the social sub-network of my ego. In such a network, all the nodes have a connection with me but are not necessarily linked to each other. Not unlike in the thesis, the relationships built and maintained during the writing of this thesis are observable, but non-measurable, at least not emotionally. As I traverse this ego network, I will attempt to express my gratitude to each connecting node.

With my subjective community detection algorithm, I can identify several overlapping communities and define them as my colleagues, my co-authors, my family and my friends. In the same order, I will attempt to describe the importance of the relationships with the members of these communities.

First and foremost, I want to thank my supervisor, professor Juho Kanniainen, for all the trust you put in me. You always had time for every single one of us, putting so much effort into our success. You granted full freedom, yet the best guidance. You have shown interest not only in helping us in our research but also in our personal lives.

I want to thank professor Jari Saramäki for serving as an opponent in the public defence of my thesis. I appreciate the time you have dedicated to reading my work. Unbeknownst to you, you have been guiding me into becoming a better researcher through your research papers, blog and especially the book on scientific writing. Moreover, I want to express my gratitude to the pre-examiners of this thesis: professors Damien Challet and Michele Tumminello. Thank you for the time you spared to read this piece of work, and for your too kind feedback.

I am happy to have been able to work with all of my co-authors and especially professors Frank Emmert-Streib and Hannu Kärkkäinen. Without you, this thesis would not have been possible. I want to thank professor Mikko Kivelä for hosting me during the visiting period at Aalto University. I enjoyed all the inspiring meetings and discussions I had through the years with professor Jyrki Piilo. I particularly want to express my gratitude to professor Fabrizio Lillo. Your lectures on Complex Networks introduced me to this wonderful research area. The reason I am interested in research is to work with and learn from such great minds as all of you are. You have all inspired me.

Figure 1 My social ego network. Links shown in the network are observed from my perspective and might not represent the true underlying relationships. Node colours reflect the assignment to communities obtained by modularity maximization.

Words are not enough to express my gratitude to my parents, Ina and Faustas, for all the sacrifices you made in raising my brother and me. The safe harbour you provided allowed me to embark on all of my journeys, knowing that there is a place I will always be welcome. Your support and your advice have guided me and will forever guide me. I want to thank my brother, Vaidotas. Your strength and confidence inspire me, and I am sure that I would not have finished this thesis if it were not for you. I want to thank my wife, Margarita. You are my soul-mate, my friend, my companion and, as documented by our papers and this thesis, my co-author. You are a true fighter and a perfectionist. I would never be where I am now without your support. You make everything I do better, and you make me better. Most of the nodes in the network (see Figure 1) are connected to me because of you. Occasionally, I refer to them as “your friends”, since I have no clue why they would be interested in interacting with me. I want to thank my parents-in-law Olga and Jurijus. Your optimism and support allowed Margarita and me to be where we are today.

I want to thank my dear friend Rokas, for his support and visits to the furthest places that my journey so far has taken me. The friendship we share has grown despite the increasing geographical distance. I want to thank my thrice colleague, Rytis. We first met back in 2010, working in a small asset management company. You are one of the people who inspired me to continue my scientific journey. Our friendship has lasted two companies and a doctoral program, and I believe that there is more to come. I want to thank all of my previous colleagues and especially Gyte, Egle, Dovile, Algirdas and Stig. There were some tough moments, but for the most part, I feel that we were just a group of friends enjoying our time together. I want to thank my friends and colleagues at the university Sindhuja, Jimmy, Jaakko, Matias, Toni, Karan, Jayesh, Adam, Natalia. I know I have been pushing the limits with my sense of humour, and I thank you for your laughter. Because of you, my time at the university never felt like work.

A special thanks to my colleagues and friends Milla and Ville, Eija and Lassi. You made Finland feel like home. Milla was one of the first connections I made in Finland. The benefits of this friendship started with her furnishing my then empty studio and continued with her helping to choose our own home, here in Finland. Just as Milla and Ville have welcomed us at their home, we have welcomed them to our family. Not surprisingly, Milla’s node in the network has the third-highest degree (more about that in the Methods section). I am especially glad to have shared my PhD time here in Finland with Martin. Our friendship served us well when we needed late-night or early-morning transportation to or from airports. The adventures we had when travelling together will be the ones that I will most cherish from this period.

I want to thank all the rest of my family and friends, especially Borisas, Natalija, Monika, Emilija, Ye, Jun, Lukas, Auste, Mindaugas, Vytautas, Rimante, Egle, Tadas, Lasivydas, Jeremy, Federica, Chiara and of course my best friend, Sergio. So many people influenced the path that led me here that I cannot mention everyone, but all of you make my life special.


ABSTRACT

In the last five decades, financial research has considerably changed. It has moved from the efficient market hypothesis and the rational investor being the only leitmotif to the acknowledgement of the real human investor with a flawed nature, giving ground for the fields of behavioural finance and economics. Similarly, in the area of research methodologies, old school linear regressions were enriched by non-linear approaches adopted from other disciplines of research. For example, the symbiosis between the studies of financial markets, statistical mechanics and theoretical physics refined into the new field of econophysics, while the union of mathematical finance and numerical methods gave way to the field of computational finance. These changes did not come about by chance. The disagreement between the conventional financial models and the empirical evidence has become more acknowledged and requires new approaches and theories to explain the empirical observations.

Since the financial crisis of 2007-2008, interdisciplinary methods have gained substantial importance in modelling the financial systems and especially in the investigation of systemic risk. More and more network scientists contribute to the analysis of financial systems, and network methods are slowly gaining the position as one of the standard approaches when dealing with the complexity of financial markets. Traditional methods are not able to capture the full spectrum of individual investor behaviour, much less the impact of investor interconnectedness on their behaviour and the effect they have on markets. This dissertation portrays the increasing importance and adoption of the view that economic and financial questions should be investigated as complex systems with heterogeneous agents and significant interconnections. Nevertheless, the study of investor behaviour in the stock markets using the methodologies of complex networks is still relatively sparse. The main reason for the lack of research in this area is limited data availability.

The main objective of this thesis is to study investor interconnectedness in terms of their trading behaviour and information transfer in the financial markets. This dissertation consists of an introductory part and five research papers. It is intended as an introduction into the research of investor networks, particularly elaborating on the topics studied with my coauthors. The concept of investor networks is in the intersection between the studies of investor behaviour, information transfer and complex networks. Investor networks aim to capture the synchronization between investor trading behaviours, which might come about because of information transfer channels existing between investors.

With this in mind, the thesis provides a survey of the general complex network concepts and measures that are used in economic and financial studies. The other objectives of the dissertation are to address some of the challenges in investor network studies, point out some gaps and provide empirical evidence. In particular, a multilayer aggregation framework based on statistical validation is proposed, allowing for constructing networks from information about multiple securities and periods, socioeconomically meaningful investor categorization and transaction bootstrapping for investor network link validation. Moreover, investor networks and their characteristics are investigated for a set of 69 Initial Public Offering (IPO) securities. Besides our introduced multilayer aggregation procedure, this is the first study observing investor clusters over multiple securities. As a contribution to the investigation of word-of-mouth information transfer between household investors, the association between trade timing similarity and the geographical distance is tested. Alternatively, investors might react to public news announcements, and in this respect, the effect of social media releases on investor trading decisions is tested.

By using the multilayer aggregation framework, household investors in Helsinki were identified as the most central investors in terms of trading synchronization links with other investor categories. At the same time, the biggest clusters in IPO securities are formed by institutional investors. Remarkably, the same investor clusters are observed to persist in time and exist over different IPO securities as well as five mature securities. This leads to the conclusion that investors use market-wide instead of security-specific strategies. As for the information transfer in stock markets, geographical distance is found to have a negative association with household trade timing similarity, which suggests the existence of word-of-mouth based information transfer channels. Even further, company announcements on a social media platform are found to have an effect on inactive household trading decisions.


CONTENTS

1 Introduction

1.1 General background

1.2 Motivation and research objectives

1.3 Linkages between the publications

1.4 Dissertation structure

2 Key concepts and related research

2.1 Investor behaviour

2.2 Information transfer in stock markets

2.3 Investor Networks

3 Research Data

3.1 Shareholding registration data

3.2 Facebook data

3.3 Postal codes

3.4 Nokia announcement data

3.5 Nokia weekly return data

3.6 Data processing

4 Methods

4.1 Network notation, concepts and measures

4.2 Network inference

4.3 Network filtering and validation

4.4 Regression Analysis

5 Results

5.1 Investor networks in stock markets

5.2 Information transfer in investor networks

6 Discussion and Conclusion

6.1 Contribution

6.2 Reliability and validity of research

6.3 Limitations and suggestions for future research

References

Publication I

Publication II

Publication III

Publication IV

Publication V

Appendix to Publication III

Appendix to Publication IV

Appendix to Publication V

ABBREVIATIONS

C3NET Conservative Causal Core Network

e.g. for example, from Latin exempli gratia

EIN Empirical Investor Network

et al. and others, from Latin et alii

FDR False Discovery Rate

i.e. in other words, from Latin id est

IPO Initial Public Offering

MST Minimum Spanning Tree

MTC Multiple Test Correction

PMFG Planar Maximally Filtered Graph


ORIGINAL PUBLICATIONS

Publication I Emmert-Streib, F., Musa, A., Baltakys, K., Kanniainen, J., Tripathi, S., Yli-Harja, O., Jodlbauer, H. and Dehmer, M. (2018a). Computational Analysis of the structural properties of Economic and Financial Networks. Journal of Network Theory in Finance 4.3, 1–32. DOI: 10.21314/JNTF.2018.043.

Publication II Baltakys, K., Kanniainen, J. and Emmert-Streib, F. (2018a). Multilayer aggregation with statistical validation: Application to investor networks. Scientific reports 8.1, 8198. DOI: 10.1038/s41598-018-26575-2.

Publication III Baltakiene, M., Baltakys, K., Kanniainen, J., Pedreschi, D. and Lillo, F. (2019a). Clusters of Investors Around Initial Public Offering. arXiv preprint arXiv:1905.13508v2. Accepted for publication in the Palgrave Communications journal.

Publication IV Baltakys, K., Baltakiene, M., Kärkkäinen, H. and Kanniainen, J. (2018a). Neighbors matter: Geographical distance and trade timing in the stock market. Finance Research Letters. DOI: 10.1016/j.frl.2018.11.013.

Publication V Siikanen, M., Baltakys, K., Kanniainen, J., Vatrapu, R., Mukkamala, R. and Hussain, A. (2018a). Facebook drives behavior of passive households in stock markets. Finance Research Letters 27, 208–213. DOI: 10.1016/j.frl.2018.03.020.


Author’s contribution

This section describes my role in each of the publications. The dissertation comprises five publications, of which four are journal articles (Publications I, II, IV, V) and one is accepted for publication in the Palgrave Communications journal (Publication III). I was the first author in Publications II and IV, the second author in Publications III and V, and the third author in Publication I.

In Publication I, together with coauthors, I did the literature review of the use of network methodologies in financial and economic research. Besides reviewing the whole manuscript, I wrote the chapter on future directions and discussion.

In Publication II, I had the main responsibility in designing the experiment, doing the literature review, writing and reviewing the manuscript, preparing the figures and conducting the empirical analysis.

In Publication III, together with coauthors, I wrote, reviewed and edited the manuscript, planned the research design and methodology, and helped to code the empirical analysis and to make the figures. This publication will also be part of the compendium dissertation of my colleague Margarita Baltakiene.

In Publication IV, I had the main responsibility in designing the experiment, doing the literature review, writing and reviewing the manuscript, preparing the figures and conducting the empirical analysis. This publication will also be part of the compendium dissertation of my colleague Margarita Baltakiene.

In Publication V, in collaboration with coauthors, I wrote, reviewed and edited the manuscript, planned the research design and prepared the shareholding registration data. This publication is part of the compendium dissertation of my colleague Milla Siikanen.


1 INTRODUCTION

This chapter introduces the topic explored in this dissertation, as well as the motivation for research objectives that are investigated in Publications I–V. Section 1.1 introduces the broad study of complex networks, investor behaviour and information transfer, while Section 1.2 focuses on the study of investors in stock markets through complex network methods, and provides the motivation for the research objectives in Publications I–V. Section 1.3 discusses the linkages between Publications I–V and, finally, Section 1.4 outlines the structure of the dissertation.

1.1 General background

Complex network research has always been inspired by the study of real-world systems. The documented origin of network research dates to 1736, when Leonhard Euler solved the Königsberg bridge problem (Euler, 1741). The solution to the popular question of whether there exists a path such that all edges in a network are traversed exactly once leads to the definition of an Euler path on a graph. This was the first published work showing the importance of interconnectedness. Roughly a hundred years later, in 1858, the so-called Hamiltonian circuit was defined (Hamilton, 1858; Kirkman, 1858). It is a similar problem; in this case, however, each node of the network is traversed exactly once. Since then, graph theory has been developed to solve numerous problems (Appel et al., 1989; Nešetril et al., 2001; Bondy et al., 1976).
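The Euler path question can be illustrated with a short sketch. It relies on the classical result that a connected multigraph admits such a path exactly when zero or two of its nodes have odd degree. The node labels A to D for the four land masses are an illustrative assumption, and connectivity of the graph is assumed rather than checked.

```python
from collections import Counter


def has_euler_path(edges):
    """Return True if a connected multigraph has a walk that uses every
    edge exactly once. Connectivity is assumed, not verified."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd_nodes = sum(1 for d in degree.values() if d % 2 == 1)
    return odd_nodes in (0, 2)


# Seven bridges of Königsberg: four land masses, seven bridges.
# The labels A-D are illustrative.
konigsberg = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

print(has_euler_path(konigsberg))  # False: all four nodes have odd degree
```

A Hamiltonian circuit has no comparably simple degree test, which is one reason the two problems, although similar in statement, differ greatly in difficulty.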

The beauty of research in the complex network domain is that, not uncommonly, methods and concepts developed for a particular research area find their way to applications in other areas. Following the mathematical development of network related methods came their application in various fields, with examples including the Internet and WWW (Faloutsos et al., 1999; Albert, Jeong et al., 1999), mobile communication networks (Onnela et al., 2007), citation networks (Radicchi et al., 2008), global corporate control (Vitali et al., 2011) and many others.


In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can involve multiple types of relationships, change in time and include various sub-systems spanning over multiple layers of connectivity. In order to gain an understanding of such complex systems, it is important to take such features into account. Complex network methods can be used to answer new questions that previously could not have been answered with existing methodologies, or to improve the previous evidence. A side product of network methodologies is the stunning visualizations that can be made to illustrate the data and the investigated phenomena (Bastian et al., 2009; Smoot et al., 2010).

Complex network research owes its current state mainly to two strands of research, namely, the analysis of social networks and statistical mechanics (Hidalgo, 2015). Social network analysis originated from sociology, where the spread of ideas, habits, behaviours, and importance of nodes is investigated (Scott, 1988), while statistical mechanics studies the statistical properties of networks, the resilience of different structures and processes as well as the network growth (Albert and Barabási, 2002).

Social network studies have also contributed to the study of the economy. Unlike economists, sociologists have not derived the behaviour of the whole society as consisting of "entirely rational" individuals. The rational society idea is also in conflict with the network approach, where nodes are located not in a regular, lattice type structure, but rather in a highly heterogeneous network. Sociologists have further contributed to the idea of network importance in economic studies, as they suggested that asymmetric information in markets originates specifically from one’s location in the information networks, providing benefits to some and leaving others at a disadvantage (W. E. Baker, 1984; Abolafia, 2001).

Likewise, network theory applications did not overlook the domain of finance (Allen et al., 2000; D’Arcangelis et al., 2016; Battiston, Farmer et al., 2016), from simple correlation coefficients between daily log-returns of securities, used to form security networks (Mantegna, 1999), to network filtering techniques, used to extract the most essential information (Mantegna and Stanley, 2000; Tumminello, Aste et al., 2005). The network approach has been used to build a diversified portfolio that reduces investment risk (Pozzi et al., 2013), while analysis of abnormal motifs in complex trading networks has been used to detect possible price manipulation (Jiang et al., 2013), identified as significantly positive cumulative excess returns after buyer-initiated suspicious trades.
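As a rough illustration of a correlation-based security network with a simple filter, the sketch below computes log-returns, converts pairwise Pearson correlations into distances, and keeps only a minimum spanning tree (MST). The tickers and prices are hypothetical, and the transform d = sqrt(2(1 - rho)) is one commonly used choice for turning correlations into distances; it is assumed here for illustration and is not taken from this text.

```python
import math
from itertools import combinations


def log_returns(prices):
    """Daily log-returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm with a simple union-find structure."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge does not close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical closing prices for three made-up tickers.
prices = {
    "AAA": [100, 101, 103, 102, 105, 107],
    "BBB": [50, 50.6, 51.4, 51.0, 52.6, 53.5],
    "CCC": [20, 19.8, 20.1, 20.4, 20.0, 19.7],
}
returns = {k: log_returns(v) for k, v in prices.items()}

# Assumed correlation-to-distance transform; max() guards against rounding.
edges = [
    (math.sqrt(max(0.0, 2 * (1 - pearson(returns[a], returns[b])))), a, b)
    for a, b in combinations(prices, 2)
]
mst = minimum_spanning_tree(list(prices), edges)
for u, v, d in mst:
    print(f"{u} - {v}: distance {d:.3f}")
```

With N securities the full correlation network has N(N-1)/2 links, while the MST keeps only N-1 of them, which is the sense in which filtering extracts the most essential information.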

The financial crisis of 2007-2009 showed the importance of interconnectedness. At that time, existing risk valuation methods were insufficient in their ability to account for the linkages between market entities as well as to predict how risk propagates through the markets. Specific network-based centrality measures have been developed to measure the systemic risk inherent in the financial networks (D’Errico et al., 2009; Battiston, Puliga et al., 2012; Markose et al., 2012). With the help of new methodologies, the complexity of the financial networks has been studied as a factor increasing systemic risk in the markets (Battiston, Caldarelli et al., 2016; Bardoscia et al., 2017).

The models of rational economic behaviour have tried to explain the empirical observations, but with mounting evidence of sub-optimal decisions in the areas of retirement, mortgage and stock markets, the bubble of old-school finance began to burst (Thaler et al., 2015). What further called the validity of rational models into question was the endless series of extreme market fluctuations. Ergo, the field of behavioural economics was born. Already in the 1950s, Simon (1957) introduced the concept of bounded rationality, but it took many years for the field of behavioural sciences to be taken seriously. Eventually, behavioural biases became recognized in the studies of economics and finance, with the understanding that individuals have limited time and cognitive abilities and therefore take shortcuts by using rules of thumb to make decisions, even if these lead to sub-optimal decisions (Tversky et al., 1974).

A significant shortcoming of earlier studies on stock markets is the lack of consideration of the structure between individual investors and the assumption of rational investors. From the structural perspective, it is difficult to observe the interconnectedness between investors, as links between investors are neither directly observable nor declared in the data set. Instead, connections are estimated using particular inference methods to link investors who trade similarly (Tumminello, Micciche, Lillo, Piilo et al., 2011). Even though complex network methods have been applied to study financial markets for more than two decades now, only a few studies have been conducted investigating the complexity of investor behavioural interrelationships (Tumminello, Micciche, Lillo, Piilo et al., 2011; Tumminello, Micciche, Lillo, Varho et al., 2011; Tumminello, Lillo et al., 2012; Ozsoylev, Walden, Yavuz et al., 2013; Musciotto, Marotta, Miccichè et al., 2016; Ranganathan et al., 2018; Musciotto, Marotta, Piilo et al., 2018). Mostly, this can be explained by the scarcity and sensitive nature of investor-level data.
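To make the idea of linking investors who trade similarly more concrete, the following sketch tests whether two investors buy on the same days more often than chance would suggest. It assumes a hypergeometric null model in which each investor's buying days are placed at random among the observed trading days; the function name, the trading days and the choice of null model are illustrative assumptions, not a description of the specific methods in the cited studies.

```python
from math import comb


def cooccurrence_p_value(n_days, n_a, n_b, n_both):
    """Probability of observing at least n_both common days when investor A
    is active on n_a days and investor B on n_b days out of n_days, under a
    hypergeometric null model (an assumed choice)."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, k) * comb(n_days - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return favourable / comb(n_days, n_b)


# Hypothetical buying days of two investors within 20 trading days.
buy_days_a = {1, 2, 3, 5, 8}
buy_days_b = {2, 3, 5, 8, 13}

p = cooccurrence_p_value(
    n_days=20,
    n_a=len(buy_days_a),
    n_b=len(buy_days_b),
    n_both=len(buy_days_a & buy_days_b),
)
print(f"p-value for the observed overlap: {p:.4f}")
```

A small p-value would suggest a link between the two investors, although with many investor pairs tested at once the threshold has to be adjusted for multiple comparisons.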

Actions of millions of individual investors, through their indirect interactions when trading and direct interactions when exchanging information, constitute financial markets, where price discovery is facilitated. From the behavioural perspective, it is not sufficient to investigate how an individual investor trades in the stock markets and then assume that the market behaviour is the aggregate of such behaviours. First of all, for each trade, one needs a counterparty, i.e. some other investor to assume a completely opposing position. Second, one’s decisions are always influenced by the surrounding environment. Taking these points into account, we cannot say that we understand markets if we investigate individual investors in isolation. Indeed, studies have shown that individuals’ decisions are affected by their peers and neighbourhood (Brown et al., 2008; Ivkovic et al., 2007; Ahern, 2017). Understanding investor behaviour with respect to their environments can shed light on the importance of information transfer and the stability of markets, and can support better risk management tools for regulators and policymakers.

1.2 Motivation and research objectives

The previous section introduced the broad study of complex networks and touched upon its applications in various research fields, including finance, as well as the importance of information transfer and investor behaviour in understanding financial markets. This section continues expanding on the combined study of complex networks, investor behaviour and information transfer and, in particular, the overarching topic of investor networks. The current research gaps are pointed out, and the research questions and objectives set out in Publications I–V are justified. This dissertation combines three main concepts: investor behaviour, information transfer and complex networks. Figure 1.1 positions all five publications according to how closely they relate to each of these concepts.

The main objectives of this thesis are to study investor interconnectedness in terms of (i) trading behaviour and (ii) information transfer in the stock markets. In order to analyze the synchronizations between investor trading behaviours, appropriate complex network methods should be developed, and the susceptibility of investor behaviour to information channels understood. Publications I–IV tackle the questions concerning investor interconnectedness, and Publications II–V are related to the questions of information transfer in stock markets. Therefore, the research questions and objectives are formulated to address different sub-questions and sub-objectives, and altogether they shape the narrative of the thesis. Research objectives and questions are numbered with respect to publications that investigate them¹. More specifically, research objectives (RO 1, RO 2.1 and RO 2.2) aim to survey, develop and design complex network methodologies for stock market studies; research questions (RQ 2, RQ 3.1 and RQ 3.2) investigate the synchronization of investor behaviour in the stock market; and research questions (RQ 4.1, RQ 4.2 and RQ 5) explicitly tackle the questions related to information transfer in the stock markets.

Figure 1.1 Concept map and publication location with respect to the concepts. [Diagram labels: Investor Behaviour, Information Transfer, Complex Networks, Investor Networks; Publications I–V.]

Motivation and research objective for Publication I. A network, simply put, is a collection of points, called nodes, joined together in pairs by lines, called links (M. Newman, 2010). There are three primary levels of analysis (Battiston, Glattfelder et al., 2010): the first is purely topological, where the binary adjacency matrix describes only the existence or absence of ties, while the second level expands on the first by allowing links to have weights and the third level assigns the so-called fitness to the nodes themselves.
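The three levels can be illustrated on a toy network. In this minimal sketch the node names, link weights and fitness values are hypothetical; degree is computed from the binary adjacency matrix and strength from the weighted one.

```python
nodes = ["A", "B", "C"]

# Level 1: binary adjacency matrix (1 = link exists, 0 = no link).
adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]

# Level 2: the same links, now carrying weights (hypothetical values).
weights = [
    [0.0, 2.5, 0.5],
    [2.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]

# Level 3: a fitness attribute attached to the nodes themselves
# (hypothetical values, e.g. some measure of node size).
fitness = {"A": 10.0, "B": 3.0, "C": 1.5}

# Degree counts links; strength sums link weights.
degree = {n: sum(row) for n, row in zip(nodes, adjacency)}
strength = {n: sum(row) for n, row in zip(nodes, weights)}

for n in nodes:
    print(n, degree[n], strength[n], fitness[n])
```

Node A has degree 2 at the topological level, but its two links differ fivefold in weight, which is information the first level cannot express.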

With an increasing amount of data being generated in all areas of life and especially the financial sector, the demand rises for quantitative methods that can facilitate faster analysis and decision making. The reduced cost and increased availability of storage space, combined with rising computing power, allow for developing network theory based methods and tools for "big data" analysis.

¹ Research questions and objectives are annotated as R[O/Q]#(.#), where [O/Q] stands for "O" if it is an objective and "Q" if it is a question; # stands for the Arabic publication number and (.#) enumerates the questions and objectives if there is more than one in publication #.

Recent decades have provided a considerable expansion in the research literature on economic and financial networks. A comprehensive overview of structural network properties and their use cases in the study of financial and economic problems is long overdue. In order to fill this gap, the first objective is:
RO 1: To survey the use of networks and network-based methods to investigate finance and economics related questions.

Publication I centres around the concept of complex networks in finance and is not explicitly related to either investor behaviour or information transfer. Therefore, in the concept map in Figure 1.1, Publication I is positioned in the centre of the Complex Networks concept.

Motivation and research objectives and question for Publication II. Using network methods to investigate the financial markets emphasizes the relationships among participants when trying to explain how the system functions. In order to understand the relationships in investor networks, it is not sufficient to compare the trading behaviours in a single security. However, the available network inference methods are mainly designed to deal with data on single security trading in a single observed window. Among the recent advances in network simplification attempts, researchers provided tools for reducing the number of layers by merging the most similar layers (De Domenico et al., 2015) and keeping dissimilar layers apart, or using product-rank based aggregation (Zhong et al., 2014). However, a computationally efficient framework to integrate information, resulting in the most active relationships over different securities and periods, has not been proposed. Therefore, the second research objective is:
RO 2.1: To develop a network aggregation method for simplified analysis of multilayer investor networks.

Investor network inference suffers from the high-dimension, low-sample-size problem, meaning that there are many more investor pairs for whom links are estimated than there are trading observations. This setting either motivates reducing the number of investigated investors in the network or leads to unreliable link estimation. This observation points to the following research objective:
RO 2.2: To design an investor network link inference and validation approach for transaction-level data sets.

If the choice is made to reduce the number of investors, the network inference leads to a description of a sub-system, and drawing conclusions about the whole market is not entirely straightforward. In Publication II, this issue is circumvented by assigning investors into categories and performing the network inference for those categories instead of individual investors. Having an inferred network for a particular time window and security gives an idea about possible information transfer channels between those investor categories. Integrating this information over multiple time windows and securities, using the methods developed for RO 2.1, provides information about the most robust observed information channels. This leads to the question:
RQ 2: How are different investor categories connected over multiple securities and periods?

Research question RQ 2 is undoubtedly related to investor behaviour, as investor behaviour leads to specific trading decisions. However, the overall objective of this paper is to address several methodological issues in the inference of investor networks, which in turn are meant to represent information channels. Publication II falls right between the concepts of complex networks and information transfer in Figure 1.1.

Motivation and research questions for Publication III. IPOs have a special place in the investment universe. They provide new investment opportunities for existing investors and possibly attract new investors to the markets. IPOs give more freedom for founding stockholders to cash in and for new investors to participate in further share appreciation. This gives reason to believe that investor behaviour and the information transfer channels employed might differ from the ones observed for mature stocks. Numerous financial studies have investigated behavioural biases in relation to IPOs (Kaustia et al., 2008; Karhunen et al., 2001; Keloharju, 1993; Ljungqvist et al., 2003; Ljungqvist et al., 2005). However, information transfer for IPO securities has not been investigated in terms of investor networks. Recent literature on investor networks has expanded the investigation of investor communities, using the statistically validated network method (Tumminello, Micciche, Lillo, Piilo et al., 2011). In particular, time-persistent communities have been uncovered for a mature company on the Helsinki stock exchange (Musciotto, Marotta, Piilo et al., 2018).

Existing studies on investor networks have not investigated IPO securities. An additional drawback of previous investor network studies was the focus on a single security. While Baltakys, Kanniainen et al., 2018 did investigate an investor network inferred over multiple securities, it was an aggregate network. The current gap in the literature is the lack of consideration for a wider variety of securities. A broader consideration of securities would help to answer whether the same investor communities form when trading different securities and whether the communities formed in newly issued securities differ from the ones observed in mature securities. Additionally, we lack information about the characteristics of those communities. These observations lead to the questions:
RQ 3.1: What kind of investor communities can be observed for recently issued securities?
RQ 3.2: What are the characteristics of investor cluster evolution for IPO securities, and how do they compare to mature securities?

Publication III relies heavily on complex network methods for investor network inference, community detection, community evolution and community attribute investigation. At the same time, complex network methods are employed in order to understand investor behaviour in IPO securities and to compare the information transfer channels present. That is why Publication III takes the central position in terms of concepts in Figure 1.1.

Motivation and research question for Publication IV. By analyzing the interconnectedness of investors in terms of their trading similarities, one can shed light on the underlying information transfer networks embedded in the markets, as well as on the types of investment strategies present. In the investor network, nodes located next to each other share similar trade timing strategies. These, in turn, may be related to an underlying information network, through which investors share or mutually observe information and make decisions accordingly. The structure of this information network is the principal determinant of how fast and how widely information will spread in the market. Privately held information will be shared with the local neighbourhood, and the connectedness of this neighbourhood, as well as its location in the overall network, will determine the impact of this information on the reaction in the market. It has been found that well-connected investors in such information networks also have higher returns (Ozsoylev and Walden, 2011; Ozsoylev, Walden, Yavuz et al., 2014). At the same time, investors live in the physical world, and shorter distances between them can also facilitate information transfer, thereby leading to trading behaviour similar to that of neighbouring investors (Brown et al., 2008; Feng et al., 2004). Social networks are the backbone of our civilization. Through these networks, individuals share their opinions, beliefs and information. Word-of-mouth communication has been suggested to have a significant impact on individual behaviour (Hong et al., 2005; Baltakys, Baltakiene et al., 2018a). The information flow through the underlying information network undoubtedly influences investor behaviour. Investor behaviour studies have investigated the social impact that investor neighbourhoods have on the investor's decision to participate in stock markets or on the portfolio composition itself (Brown et al., 2008). However, it has not been asked whether investor trade timing can also be affected by word-of-mouth communication with investors in the neighbourhood. This leads to the research questions:
RQ 4.1: How is geographical distance between investors, which measures the possibility of word-of-mouth communication, related to trade timing similarity?
RQ 4.2: In addition to geographical distance, how do differences in other investor attributes, such as gender, language, or age, influence trade timing similarity between different investors?

The goal of Publication IV is to look for evidence of information transfer between household investors, based on their trade timing behaviour. Therefore, in Figure 1.1, it is placed right between the two concepts, though it is closer to the concept of complex networks than Publication V.

Motivation and research objective for Publication V. The previous research objectives and questions focus on direct information transfer channels between individual investors. However, in addition to private channels, public information channels, e.g., financial news on TV, radio or the Internet, are important in financial decision making. In order to understand how investors react to public Facebook posts, Publication V tests whether different investors are more likely to change their positions in a given security if the company has released an announcement through its Facebook account. The research question in this regard is the following:
RQ 5: How does Facebook drive the investment decisions of different investor categories?

Publication V investigates how information transfer from a public source affects investor decisions. As this study has less to do with the concepts of network research, it is placed outside the complex network concept in Figure 1.1.


1.3 Linkages between the publications

This dissertation is comprised of five publications that range in their scope from discussing and extending the relevant network definitions and methods for financial and investor network analysis to empirically analyzing investor behaviour and information transfer using network methods. This section discusses the linkages between the publications more closely. Since there are five publications, there are ten undirected links between them, and they are briefly addressed here.

Relation between Publication I and II. Publication I serves as a survey of structural network properties used in the study of financial and economic problems. Naturally, this publication relates to the other publications in terms of complex network methodologies. Publication I serves as a foundation, defining the necessary network terminology for Publications II-IV, whereas Publication II extends the complex network methodologies by proposing a network aggregation methodology.

Relation between Publication I and III. Publication III builds on the foundation of network methodologies introduced in Publication I, incorporating investor network inference and community detection methodologies. Using these network methodologies, Publication III goes on to discuss the empirical results about investor clusters for newly issued securities on the Helsinki stock exchange. The observed synchronization between institutional investors is considered to be related to the adoption of similar trading strategies and the existence of information transfer channels between them.

Relation between Publication I and IV. Publication IV adopts a network perspective on what would typically be seen as a regular regression problem. Investors are placed in a spatial network, and links between them are weighted, measuring the actual geographical distance between them as well as the trade timing similarity.

Relation between Publication I and V. Publication V does not directly use complex network methodologies. Instead, it investigates the impact that public news releases on a popular social network have on investor trading behaviour.

Relation between Publication II and III. Both publications are related to complex network methodologies and their use in investigating investor networks. Both publications contribute to the investigation of investor networks over multiple securities: Publication II aggregates the information over multiple securities into a single network, while Publication III observes investor networks for each security separately.

Relation between Publication II and IV. Both publications investigate information transfer in stock markets over multiple securities. Similarly, both publications leverage the bootstrapping approach. Publication II uses bootstrapping to validate links in the investor network, while Publication IV uses it to test the robustness of regression coefficients. Whereas Publication II investigates the information transfer among different investor categories through the relationship in net traded volumes, Publication IV looks for evidence of local information transfer among individual investors based on trade timing similarity.

Relation between Publication II and V. Both publications share the same definitions of investor categories. Whereas Publication II looks for investor categories that have similar trading behaviour, Publication V looks into how these investor categories react to public information.

Relation between Publication III and IV. Both publications investigate information transfer channels between investors. Publication IV focuses on the local information transfer channels between household investors, while Publication III does not limit the investor set. Both publications take advantage of trading state categories, i.e. categories based on whether investors were primarily buying, selling or day-trading. In the two publications, trade timing similarities are inferred using different approaches.

Relation between Publication III and V. Publication V focuses on how news arrival on social networks impacts investor trading decisions for a mature security (Nokia), while Publication III directs attention to newly issued securities during the first two years after the IPO. However, Publication III draws conclusions about its empirical observations from a comparison with the same mature security as in Publication V.

Relation between Publication IV and V. Both publications investigate information transfer. Publication IV aims to capture the existence of local word-of-mouth communication, while Publication V aims to observe the impact of public news on investor trading behaviour.


1.4 Dissertation structure

This thesis is divided into two parts. The first part covers the introductory dissertation chapters, and the second part consists of the original Publications I–V. An appendix section follows the publications. Chapter 2 introduces the reader to the key concepts of this dissertation and expands on the research literature cited so far. Chapter 3 provides the necessary information about the data sets, and Chapter 4 specifies the necessary mathematical definitions and the main methods used in Publications I–V. Chapter 5 summarizes the findings of Publications I–V, and Chapter 6 completes the introductory part by focusing on the contributions of the publications, addressing their validity, reliability and limitations, as well as suggesting directions for future research.


2 KEY CONCEPTS AND RELATED RESEARCH

This chapter introduces the fundamental concepts of this dissertation and discusses the related literature. The concepts are grouped under three topics. First, Section 2.1 overviews some of the relevant concepts in behavioural finance studies. Then, Section 2.2 discusses the research on information transfer. Finally, Section 2.3 overviews the concepts in investor network studies. The purpose of this chapter is to present a wider research area encompassing Publications I–V.

2.1 Investor behaviour

A financial market consists of millions of investors indirectly interacting with each other by buying and selling exchange-traded securities through a central counter-party. The trading is affected by the transactions themselves, information flow, as well as the state of the overall economy. Understanding the social, information and spatial networks in which these investors are embedded provides insight into their collective behaviour in the markets. Investors behave in a non-completely-rational way, satisficing rather than optimizing when making decisions (Simon, 1957; Thaler et al., 2015). Non-textbook investors live in a world where they do not have timely access to all possible information, have limited cognitive abilities to process it, and can make biased decisions based on it.

The specifics of human behaviour have been observed not only in the domain of finance but also in human mobility (Alessandretti et al., 2018), communication (Saramäki et al., 2014; Miritello et al., 2013), scientific collaboration patterns (Pan et al., 2012; Abbasi et al., 2011) and even the design of transportation systems (Barrat et al., 2004). It is important to understand the characteristics of these underlying behaviour patterns, as these combined behaviours shape the dynamics of the whole market, and thus are essential factors in explaining booms and bubbles in the financial markets (Ranganathan et al., 2018). Properties similar to the ones observed in human communication (Saramäki et al., 2014) emerge in investor trading patterns: investors concentrate their attention mainly on a few securities and direct a small proportion of their attention to a broader set of securities. Additionally, a few investors are very active in the stock markets while the majority are rather passive.

As investors execute individual trading strategies in the stock markets, non-observable indirect relationships form between them in terms of behavioural similarities. The resulting structure is called an investor network (Ozsoylev, Walden, Yavuz et al., 2013). The structure of these networks can influence the dynamics of interaction, communication, and the spreading of information and panic; therefore, it is relevant not only from the perspective of individual members but also from that of the whole society. Market forces cannot be investigated at the individual investor level, as for each investor’s action there is a counter-action: for each purchase, there is a sale.

It has become clear that existing models are no longer sufficient for explaining investor and market behaviour (Battiston, Farmer et al., 2016). Investor actions in the stock markets differ from the ones prescribed to rational investors by traditional finance hypotheses. The research field of behavioural finance aims to explain this difference between the observed investor behaviour in the markets and the one that should be followed by the rational investor (Thaler et al., 2015; R. C. Shiller, 2000). By investigating real investor behaviour, researchers can observe how the behaviour differs from that of the rational investor, hypothesize why it differs and observe to what outcomes and implications this behaviour leads. Understanding individual investor behaviour at the microscopic level, how it influences the behaviour of investor clusters at the mesoscopic level, and how this in turn forms market behaviour at the macroscopic level (Schelling, 2006) might lead to tools to observe and modify investor behaviour in order to increase the economic stability and well-being of the markets.

Making financial decisions is a complicated process, layered with different choices that might be hard to optimize. Understanding the existing biases and deviations from optimal or rational behaviour might help in developing strategies to prevent sub-optimal choices. Traditional finance describes the rational investor as selfish and risk-averse: someone who tries to maximize his utility function, possesses near-perfect information and immediately updates his beliefs with the arrival of new information. For real investors, unlike for rational investors, Bayesian belief updating does not come easy. They do not have perfect and timely information in the first place, and moreover, they lack the cognitive abilities to process that information into rational and consistent decisions. At the same time, a rational investor would use only logic and, unlike real investors, refrain from using emotions in decision making.

The widespread use of the rational investor concept is due to the fact that it is easier to model such behaviour. A more realistic model of investor behaviour is based on bounded rationality (Simon, 1957), i.e. the assumption that investors’ decisions are rational, but limited by the investors’ knowledge and cognitive abilities. An investor with bounded rationality attempts to gather as much information as is available and uses heuristics to simplify the decision-making process until a satisfactory decision can be made.

Household investors are especially prone to cognitive and emotional biases (H. K. Baker et al., 2002). It is possible to correct for cognitive biases once they are identified, though it is more difficult to change behaviour caused by emotional biases. One of the cognitive biases is confirmation bias, where investors overweight the information that confirms their views and ignore the information that contradicts them. Another bias, the illusion of control, occurs when investors believe they have control over the outcomes. Availability bias occurs when readily available information and outcomes are thought of as being more probable or more important. All three of these biases, together with selective communication, where individuals tend to report only the decisions that led to positive outcomes, might lead to sub-optimal behaviour in a neighbourhood setting. As a matter of fact, researchers have identified behaviours that are influenced by the investors’ neighbourhood (Hong et al., 2004; Brown et al., 2008; Heimer, 2014; Shive, 2010).

Markets, on the other hand, are deemed efficient if all available relevant information is already accounted for in the price due to the actions of many rational investors, while the portfolios constructed under efficient market assumptions are called optimal. In order to have the correct prices in the market, no single individual needs to be correct in estimating the price; it is sufficient if the investor population is correct in aggregate. If the market is efficient, it is next to impossible to consistently outperform it, as investors would exploit any existing inefficiency until it is eliminated. Fama, 1970 proposed three forms of market efficiency: weak, semi-strong and strong¹. Various studies have confirmed the efficiency of markets, while others have found anomalies that could lead to excess returns. No trade would be executed if either of the counter-parties suspected that they were trading at the wrong price, unless the two parties have different information about the true value of the asset. Indeed, for the markets to be efficient, information needs to be collected and analyzed. Grossman et al., 1980 argued that efficient markets could generate positive returns as long as they do not exceed the costs of information acquisition. Therefore, more sophisticated investors who are aware of these inefficiencies can outperform less savvy investors.

2.2 Information transfer in stock markets

As R. C. Shiller, 2000 suggests, stock markets build up bubbles because large populations of investors start basing their purchases not on research into the true intrinsic value of the assets, but rather on the emotional hype created by news media, false heuristics and investor herding. Therefore, it is necessary to take into account how investors attain information, how they pass it on to other investors, and how this information transfer can affect financial markets.

Every transaction in the stock exchange is a record of two counter-parties exchanging assets at an agreed price. The two counter-parties in the transaction have different expectations for the assets exchanged, which implies that they hold different information. Besides public sources, investors may use private information sources available only to them, for example, their social network, where private opinions, valuations and other non-public information can be exchanged. Having superior information is crucial in order to succeed in the stock markets (Grossman et al., 1980). One way to acquire such information is through the so-called information network (Colla et al., 2009; Ozsoylev and Walden, 2011), where information is passed on from one investor to another. An investor’s location in the information network and the structure of the network are crucial in defining the investor’s success in financial markets (Ozsoylev, Walden, Yavuz et al., 2014). The information network structure leads to a non-uniform concentration of information among the investors. Therefore, the decisions made by investors are heterogeneous and can be influenced, at least partially, by social interaction.

¹ The weak form of market efficiency assumes that all historical price and volume information is fully priced in, i.e. technical analysis will not generate excess returns. The semi-strong form of market efficiency assumes, on top of weak form efficiency, that all available public information is also incorporated in the price, i.e. neither technical nor fundamental analysis will generate excess returns. The strong form of market efficiency assumes that all information, both public and private, is incorporated into security prices; therefore, not even insider information can generate excess returns.

Ozsoylev, Walden, Yavuz et al., 2014 construct an Empirical Investor Network (EIN) by linking investors together if they have traded the same security in the same direction during at least a single five-minute time window. The assumption is that investors react to the information simultaneously, and therefore links in the EIN can be thought of as proxies for information channels. Such links could also be explained by investor reaction to public news announcements. Nevertheless, the authors argue that the information networks are not sufficiently dense to be exclusively explained by public news diffusion.
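
The linking rule described above can be illustrated with a minimal sketch. This is not the cited authors' implementation: the function name `build_ein`, the tuple layout of a trade record, and the use of fixed, non-overlapping time bins are all assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_ein(trades, window_seconds=300):
    """Return the set of undirected investor links.

    `trades` is an iterable of hypothetical records
    (investor_id, security, direction, timestamp_in_seconds).
    Two investors are linked if they traded the same security in the
    same direction within at least one common time window. Windows are
    assumed to be fixed, non-overlapping bins of `window_seconds`.
    """
    # Group investors by (security, direction, window index).
    buckets = defaultdict(set)
    for investor, security, direction, timestamp in trades:
        window = int(timestamp // window_seconds)
        buckets[(security, direction, window)].add(investor)

    # Every pair of investors sharing a bucket receives a link.
    links = set()
    for investors in buckets.values():
        for a, b in combinations(sorted(investors), 2):
            links.add((a, b))
    return links


# Toy example with made-up trades.
example_trades = [
    ("A", "NOK", "buy", 10),
    ("B", "NOK", "buy", 200),   # same security, direction and window as A
    ("C", "NOK", "sell", 100),  # opposite direction: no link
    ("D", "NOK", "buy", 400),   # later window: no link with A or B
    ("A", "XYZ", "buy", 400),
    ("D", "XYZ", "buy", 450),   # links A and D through another security
]
print(build_ein(example_trades))
```

In this toy data set, only the pairs (A, B) and (A, D) are linked. A single shared window is enough to create a link, which is why such links remain open to the public-news explanation mentioned above.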

From questionnaire surveys of institutional and individual investors, R. J. Shiller et al., 1989 found that direct interpersonal communication is important when making investment decisions, and that some investors are influenced by word-of-mouth communication. This holds not only for household investors but also for professional investors, as they learn from each other in a professional setting. The authors define two types of investors, "diffusion" and "systematic" investors. Diffusion investors do not use systematic trading rules, but instead receive, pass on and react to the information received through word-of-mouth communication, while systematic investors are the ones responding to the underlying asset values. Another important observation from the investor reports in the survey is that the information transfer does not seem to be concentrated around news events. Hong et al., 2005 investigated the spread of information through word-of-mouth communication among mutual fund managers. The authors found that, even for non-local stocks, mutual fund managers are more likely to trade a security if other mutual fund managers in the same city are trading that particular security.

A recent study of legal documents in 183 insider trading cases further contributed to the concept of information spread by word of mouth. Ahern, 2017 investigated 1139 peer-to-peer exchanges of material non-public information between 622 individuals. These individuals shared information about events that had a significant impact on stock prices. Insiders shared sensitive information with individuals they trusted, as in the case of exposure they would have suffered legal consequences. The motivation to share the sensitive information was to gain favour with family, friends and employers. Roughly 90% of the information transfer links were between family, friends and business associates, and most of them lived close by. Many of the individual pairs shared an educational background; around 80% met during or before college. Moreover, insiders were more likely to share information with individuals of the same age and gender. An important suggestion from the study is that individuals might share the information in order to gain favour with the recipients, and that the information usually flows from lower to higher positions in the social hierarchy, i.e. from subordinates to bosses, from younger to older, and from children to parents. The author also found that central investors in the insider trading network had more profitable trades and possibly received more valuable information, confirming the observations of Ozsoylev and Walden, 2011; Walden, 2014; and Ozsoylev, Walden, Yavuz et al., 2014.

The structure of communication networks is a key determinant of the way information spreads across the population. Stein, 2008 developed a model to investigate the exchange of ideas in a social network. The model suggests that underdeveloped ideas at the level of gossip or rumour can spread far through the network, while valuable ideas remain in the local neighbourhood. Andrei et al., 2017 found that word-of-mouth communication can spread information across the social network at an increasing rate, even generating short-term momentum and long-term reversal in asset prices. Similarly, Banerjee et al., 2009 suggested that price momentum can arise when there are large differences in investor opinions.

Social networks are the backbone of our lives, connecting us with our loved ones, family and friends, with our colleagues from work and other professionals, and aiding trade and business. The links in this network facilitate information transfer, either face-to-face or using other means of communication. However, information is not the only thing that can spread across these links; e.g. disease spreading can also be investigated through social networks using epidemic models (Pastor-Satorras et al., 2001; Jones et al., 2003). The study of social networks started with sociologists and moved in parallel with the mathematical study of random networks. It has since gained interest in the fields of computer science and statistical physics, though these fields have kept different methodologies and approaches (Jackson, 2005). Social networks can be used not only for the transfer of information but also for the transfer of beliefs, behaviours and investment heuristics. The use of the same beliefs and behaviours can lead to similar trading even when no information is shared. For instance, Smith et al., 2014 provide evidence that people in the same social network tend to have the same gender, race, religion, age, and education, and Guiso et al., 2009 show that similarity in language and ancestry is related to greater trust. Goldenberg et al., 2009 argue that even though, with the advent of the internet, long-distance communication became much more accessible through social network platforms and the overall level of communication dramatically increased, the increase affected local social ties more significantly. The authors show that the volume of electronic communication is inversely proportional to geographical distance.

Milgram, 1967 performed a significant experiment in the study of social networks. He measured the distance between two members who were not directly connected in the social network and found that it takes only a few connections to pass information from one to the other. He named this phenomenon the small-world effect. The experiment has been reproduced in many different network studies, with similar findings. This is an important aspect to keep in mind when considering information networks: investors who never knew each other could potentially share the same information through just a few common connections.

Information networks are directed: in a given information channel, one party acts as the sender and the other as the receiver of information. The roles can change, depending on which side of the pair has excess information. In order for the information channel to function, both sides have to engage in the exchange.

Communication networks are inseparable from social learning. Bala et al., 1997 developed a communication network formation model, where agents individually form and discard links based on the costs of forming and maintaining them and the benefits in terms of rewards. One of the inputs when determining the cost of link creation and maintenance can be the spatial distance between the two individuals (Johnson et al., 2003). The model suggests that, no matter the initial network structure, if the information "decays" during transmission, networks form into locally connected neighbourhoods, with some agents bridging these neighbourhoods together.

It is also important to understand how information networks impact asset valuations. For example, Han et al., 2013 investigated a rational expectations equilibrium model where traders can learn about the asset payoff from the market price, information acquisition or communication with other traders through the social network. The authors proposed that when information is exogenous, information sharing reduces the variance of the stock’s payoff and affects the price, improving market efficiency. At the same time, when information is endogenously created, the possibility of communicating with informed investors reduces the incentive for investors to acquire information on their own. In this situation, the fraction of investors who put effort into acquiring information is inversely related to network density, i.e. better social networks lead to fewer informed traders.

Research literature has also identified behavioural biases caused by local neighbourhoods. Kaustia et al., 2012 found that positive stock returns of investors have an impact on the stock market entry probabilities of their neighbourhood peers, while negative returns do not seem to have a correspondingly negative effect. This observation most likely arises from selective communication, where investors do not share decisions that led to negative performance. Similarly, Brown et al., 2008 found that investors are more likely to participate in the stock market if the participation rate among their peers is high. Ivkovic et al., 2007 also found that an investor's decision to purchase securities in a particular industry is related to the preferred industry in the neighbourhood. These observations give grounds to view investors in the stock market as interconnected units whose actions are influenced by their local communities. This setting is well suited to the application of network theory methodologies.

2.3 Investor Networks

In this dissertation, a network in which nodes represent investors and the links between them represent some type of trading similarity is referred to as an investor network. The study of investor networks requires a particularly detailed, investor-level data set that allows tracking individual investor transactions in the stock market. Using such a data set and an appropriate inference method, investor networks can be estimated. Tumminello, Micciche, Lillo, Piilo et al., 2011 proposed an inference method based on the hypergeometric test, specifically with investor transaction data sets in mind. Other network inference methods include the use of Pearson correlation or mutual information estimates, with some network filtration or validation subsequently applied to them (Ranganathan et al., 2018; Baltakys, Kanniainen et al., 2018; Gutiérrez-Roig et al., 2019). For example, the Conservative Causal Core Network (C3NET) extracts the significant maximum mutual information network, and the Minimum Spanning Tree (MST) or Planar Maximally Filtered Graph (PMFG) extract the strongest relationships, taking into account some topological properties of the resulting networks.

Using the method proposed by Tumminello, Micciche, Lillo, Piilo et al., 2011, scaled net trading volumes $r_i(t)$, which range between $-1$ and $1$, are calculated for each investor. Then, an appropriate positive threshold $\theta$ is chosen that defines three trading states. An investor $i$ is said to be primarily buying (selling) on day $t$ if $r_i(t) > \theta$ ($r_i(t) < -\theta$), and to be both buying and selling if $-\theta \le r_i(t) \le \theta$. For each pair of investors, the nine possible trading state combinations can be tested against a null hypothesis of random co-occurrence. Later, Challet et al., 2018 introduced a lead-lag network inference method applicable to investor network inference. Essentially, the method closely follows the approach of Tumminello, Micciche, Lillo, Piilo et al., 2011, testing the null hypothesis of random state co-occurrence between the trading states of investor $i$ and the lagged states of investor $j$.
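The state assignment and the co-occurrence test described above can be illustrated with a minimal sketch. It assumes that the scaled net volume is $r = (V_b - V_s)/(V_b + V_s)$, a scaling that is not spelled out in this section, and that the null hypothesis is evaluated with a one-sided hypergeometric tail. The function names and state labels are hypothetical and do not come from the cited papers.

```python
from math import comb

def scaled_net_volume(bought, sold):
    """Assumed scaling: r = (Vb - Vs) / (Vb + Vs), which lies in [-1, 1]."""
    total = bought + sold
    if total == 0:
        return None  # investor did not trade on that day
    return (bought - sold) / total

def trading_state(r, theta):
    """Map a scaled net volume to one of the three trading states."""
    if r > theta:
        return "buy"
    if r < -theta:
        return "sell"
    return "buy_sell"

def hypergeom_pvalue(n_days, n_i, n_j, n_both):
    """P(X >= n_both) if investor i's n_i state-days and investor j's
    n_j state-days were placed independently at random among n_days."""
    upper = min(n_i, n_j)
    tail = sum(comb(n_i, k) * comb(n_days - n_i, n_j - k)
               for k in range(n_both, upper + 1))
    return tail / comb(n_days, n_j)
```

For one of the nine state combinations, `n_i` and `n_j` would count the days on which each investor is in the respective state and `n_both` the days on which both states occur together. A small p-value then suggests that the co-occurrence is unlikely to be random.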

In a later publication, Tumminello, Lillo et al., 2012 used this method to infer investor networks in the Helsinki Stock Exchange. The authors investigated investor clustering in the most liquid security on the exchange, Nokia, in the period from 9 October 1998 to 29 December 2003, and used the Infomap community detection algorithm to find distinct investor clusters. Household investors dominate most of the identified clusters; however, there are clusters with over-represented groups of financial institutions, general government and non-profit organizations. A more recent study extended the analysis of investor clusters in the Nokia security by focusing on the cluster dynamics (Musciotto, Marotta, Piilo et al., 2018). By investigating annual investor networks over 15 years from 1995 to 2009, the authors observed that around 20% of clusters with more than five investors are connected with at least one cluster in the subsequent year, and some clusters persist for several years. Additionally, over-expression of certain attributes is observed in subsequent periods. The authors observe a negative relationship between the log density of the validated network and the average daily volatility of Nokia's stock, empirically confirming that higher asset volatility is associated with a higher number of distinct trading strategies.

When performing multiple statistical tests simultaneously, the Type I error accumulates unless a proper controlling procedure is applied. For example, if the existence of each of the $M$ links in a network is tested with significance level $\alpha$, then the probability that at least one of the links is inferred spuriously is $1-(1-\alpha)^M$ (Streiner et al., 2011). For example, if every pairwise link in an undirected network is tested with significance level $\alpha = 0.01$, then in a network with 458 tested links, that is, a network with approximately 30 nodes, there is a 99% probability that at least one of the links is inferred spuriously. Though some researchers have argued against the use of Multiple Test Correction (MTC) (Rothman, 1990), the consensus in the networks literature is to correct for the family-wise error. The two most popular approaches are the False Discovery Rate (FDR) and the Bonferroni correction (for more details, see Chapter 4). The downside of using MTC is that it is rather strict, especially Bonferroni, and might lead to networks with few to no links. At the same time, for two networks with a different number of investors, links may be tested against the null hypothesis at different significance levels, which raises the question of their comparability.
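The arithmetic behind the 99% figure can be checked with a few lines of code. This is only an illustrative sketch, and the function names are hypothetical. The Bonferroni level shown is the standard $\alpha/M$ adjustment.

```python
def n_pairs(n_nodes):
    """Number of node pairs (candidate links) in an undirected network."""
    return n_nodes * (n_nodes - 1) // 2

def family_wise_error(alpha, n_tests):
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_level(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

print(round(family_wise_error(0.01, 458), 4))
print(n_pairs(30), n_pairs(31))
```

With the Bonferroni level, the family-wise error stays below the nominal $\alpha$, but the per-test level depends on the number of tested links, which is why networks of different sizes are validated at different effective significance levels.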

Ranganathan et al., 2018 have also investigated the dynamics of Nokia's investor networks. The change in networks was captured by inferring overlapping networks over six-month windows, shifting the window by one month to obtain each subsequent network snapshot. Maximum spanning trees were extracted from pairwise net-volume correlation estimates, resulting in a network with all investors connected in a tree-like structure that retains the strongest trading strategy similarities. The authors argued that, through the use of the maximum spanning tree, they were able to observe household investor herding leading up to the dot-com bubble.

While in Publication II mutual information for investor network inference was estimated from net-volume data, Gutiérrez-Roig et al., 2019 have suggested an alternative approach. Instead of using the net-volume data, the authors proposed to use mutual information and transfer of entropy measures over symbolized trading actions of investor $i$, based on the end-of-day changes in holding positions $\Delta N_i(t)$ on day $t$. A chosen embedding dimension $m$ determines the symbols and their number. For the simple case of $m = 2$, when only two subsequent days are taken into account, there are only three possible symbols (similar to the states defined in Tumminello, Lillo et al., 2012). If $\Delta N_i(t) > 0$ ($\Delta N_i(t) < 0$), investor $i$'s trading volume is said to be dominated by buying (selling) orders, and if $\Delta N_i(t) = 0$, the trader is either inactive on day $t$ or acting like a day trader. The constructed symbolic time series allow the symbolic mutual information and symbolic transfer of entropy measures to be estimated. Similarly to Publication II, a bootstrapping approach has been used to validate links between investors. The network inferred using symbolic mutual information describes the synchronization between individual investors, while the one inferred using the symbolic transfer of entropy describes how investors follow each other's trading behaviour. The authors applied the proposed approach to a data set of 566 non-professional investors trading eight different assets in the Spanish IBEX market in the period between 2000 and 2008.

Unlike the previously mentioned studies, Gualdi et al., 2016 investigated a financial institution network in which the links between institutional investors are determined by common asset holdings. The authors proposed the use of the Bipartite Configuration Model (Saracco et al., 2015) to test against the random allocation of assets to owner portfolios. The result is a network of statistically significant portfolio overlaps, which is a valuable tool when considering systemic risk, due to the possibility of fire-sale spillovers.

As previously mentioned, Ozsoylev, Walden, Yavuz et al., 2014 investigated investor networks in the Istanbul Stock Exchange during 2005. The networks were constructed by simply linking investors if they executed a trade in the same direction in the same security during some time window. Central investors are found to earn higher profits and trade earlier than peripheral investors.

Even though the research on investor networks to date is somewhat limited, the development of investor network inference methodologies for transaction-level data sets, as well as the wider availability of such data sets, will inevitably lead this research area to expand and help explain the heterogeneous behaviour of stock market investors.

3 RESEARCH DATA

This chapter introduces the data sets used in Publications II-V. Section 3.1 introduces the investor-level transaction data set from the Helsinki Stock Exchange used in Publications II-V. Section 3.2 introduces the Facebook data set covering the activity related to Nokia's official company account, used in Publication V. Section 3.3 describes the data set of all Finnish postal codes used in Publication IV. Sections 3.4 and 3.5 detail the data related to Nokia's announcements and daily price data. Finally, Section 3.6 describes some of the data processing.

3.1 Shareholding registration data

The object of empirical study in this dissertation is the data set obtained from the Finnish Central Securities Depository, maintained by Euroclear Finland. The data set contains information about all shareholding changes in securities listed on the Nasdaq OMX Helsinki Exchange. It is possible to track individual investor transactions for all Finnish households and institutions, while information on foreign investors can sometimes be aggregated. The data set covers the period from 1995-01-01 to 2016-12-31. Each record in the data set contains information about the executed transaction as well as metadata about the investor, such as gender, language, and year of birth. The field owner_id helps to keep track of individual investor transactions. The following fields describe the transaction itself: isin, trading_date, registration_date, volume, transaction_type (purchase or sale), holding_type (whether the transaction belongs to the investor's own portfolio or to an aggregate foreigner account), transaction_basis (whether the transaction took place in the market place or is of a different type), price, and currency. The following fields describe the investor: owner_id, year_of_birth, gender, sector_code (determines whether the transaction belongs to a household or an institutional investor), juridical_form (an additional field to determine the type of investor), postal_code and language. There are five broad groups of investors according to their sector codes: non-financial corporations, financial and insurance corporations, general governmental organizations, non-profit organizations, and households. Each investor may have more than one unique value in each of the attribute fields. This does not mean that the attributes are dynamic; rather, the attributes represent the state of the data at the time of data extraction. If the attributes were corrected or changed because of changed investor circumstances since the previous data extraction date, they appear changed in the later data periods, but not in the earlier ones. Therefore, the last observed attributes should be the mistake-free observations.

Some considerations need to be taken into account when working with this data set. First, in the year 2000, Finland adopted the common European currency; therefore, prices of all trades prior to the adoption should be converted to euros. Second, the trade settlement cycle changed from $T+3$ to $T+2$ on 6 October 2014. Transactions with a trade date up to 3 October 2014 were settled according to the $T+3$ settlement cycle, and trades with a trade date from 6 October 2014 onwards are settled with the new $T+2$ cycle. Moreover, starting from November 2009, not all transactions may be reported individually in the data set; instead, an investor-specific aggregated transaction record may be reported, yet still on a daily basis.
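The settlement rule can be expressed as a small helper. The sketch below is illustrative only: it skips weekends but ignores exchange holidays, and the function names are hypothetical. Comparing the result against the registration_date field would be one possible consistency check, which is an assumption rather than a procedure described in this thesis.

```python
from datetime import date, timedelta

# First trade date settled under T+2, as stated in the text.
T2_START = date(2014, 10, 6)

def settlement_lag(trade_date):
    """Settlement lag in business days for a given trade date."""
    return 2 if trade_date >= T2_START else 3

def expected_settlement_date(trade_date):
    """Add the settlement lag in business days.
    Simplification: only weekends are skipped, holidays are ignored."""
    remaining = settlement_lag(trade_date)
    day = trade_date
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return day
```

Under these simplifications, trades made on Friday 3 October 2014 and on Monday 6 October 2014 both settle on Wednesday 8 October 2014.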

When investigating the data consistency, mismatches between total volume bought and sold were observed for dozens of securities on most days. Euroclear addressed this issue and, in the spring of 2017, provided a replacement section for the period from 2000-10-01 to 2007-06-30. Previous studies using this data set should be read with the caveat that some transactions might be missing from their analyses if this particular period was investigated. For more information about the data set, see (Grinblatt et al., 2000; Grinblatt et al., 2001; Lillo et al., 2015; Tumminello, Lillo et al., 2012).

Publications II–V used different periods in their analyses, and Figure 3.1 shows which sections of the shareholding registration data were used. Publication II investigated the period from 2004 to 2009, as the data set was identified as the most error-free during this period. Publication III uses the first two years of data for 14 IPOs, irrespective of when the IPO took place. Publication IV uses the data set up to 2010. Finally, Publication V uses the period overlapping with the Facebook data, i.e. from 2010 to 2016.

Figure 3.1 Chart illustrating the periods of the shareholding registration data set (1995-01-01 to 2016-12-31) that are used in Publications II-V.

3.2 Facebook data

The information from Nokia's Facebook wall about the daily numbers of posts and post-related comments, likes, and shares is collected using the Social Data Analytics Tool (Hussain, Vatrapu et al., 2014; Hussain and Vatrapu, 2014). The data set covers the period between June 2010 and December 2016. The main object in the data set is the post. Supplementary information about comments, likes, and shares is related to a specific post. Therefore, the numbers of comments, likes, and shares are assigned to the date of the target post. This way, the numbers of comments, likes, and shares quantify the importance of and the overall attention received by the posts.

The daily Facebook data is aggregated to weekly windows by summing the numbers of posts, comments, likes, and shares during a given week. The week is defined to begin on Saturday and end on Friday, since trading does not occur on weekends. This way, the Facebook activity on weekends is assigned to the week in which it can affect investors' trading decisions. In total, the sample contains 342 weekly observations for posts, comments, likes, and shares. On average, more than one post is made per day, and a post received, on average, 274 comments, 4379 likes, and seven shares.
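The Saturday-to-Friday aggregation can be sketched as follows. This is a minimal illustration rather than the code used in Publication V. The function names and the input format (a mapping from calendar dates to daily counts) are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_ending_friday(day):
    """Return the Friday that closes the Saturday-to-Friday week
    containing `day`."""
    # date.weekday(): Monday=0, ..., Friday=4, Saturday=5, Sunday=6
    days_to_friday = (4 - day.weekday()) % 7
    return day + timedelta(days=days_to_friday)

def aggregate_weekly(daily_counts):
    """Sum daily counts (e.g. posts or likes) into weekly windows,
    keyed by the Friday that ends each week."""
    weekly = defaultdict(int)
    for day, count in daily_counts.items():
        weekly[week_ending_friday(day)] += count
    return dict(weekly)
```

Keying each week by its closing Friday places weekend activity in the same window as the following five trading days.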

3.3 Postal codes

In Publication IV, household investors are embedded in geographical space using postal codes as a proxy for investor location, and the links between investors are the actual geographical distances between postal code locations. Postal code coordinate data was obtained by combining the information from the services aggdata.com¹ and geonames.org². The combined data set contained coordinates for 3621 postal codes. Still, nine postal codes in the shareholding registration data set could not be matched to coordinates and were therefore eliminated from the study.

Figure 3.2 Postal code distribution in Finland.

Figure 3.2 shows the distribution of Finnish household investors' postal codes across the country. Postal code areas vary from sparse to moderately dense in terms of investors, with roughly 64% of postal codes having fewer than 600 investors. Postal code coordinates are used to calculate the distances between each pair of investors using Vincenty's formula (Vincenty, 1975). For investors in the same postal code, or in cases where two or more postal codes share the same coordinates, the distance is set to 1/4 of the distance to the closest postal code at a non-zero distance.
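The distance rule, including the 1/4 adjustment for zero distances, can be sketched as below. Note that the sketch substitutes the simpler haversine (great-circle) formula on a spherical Earth for Vincenty's ellipsoidal formula, so its distances differ slightly from those used in Publication IV. The function names, the `coords` mapping and the postal code labels in the usage example are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (not Vincenty's formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def investor_distance(code_a, code_b, coords):
    """Distance between two postal codes; a zero distance is replaced
    by 1/4 of the distance to the closest postal code at a non-zero
    distance. `coords` maps postal codes to (latitude, longitude)."""
    d = haversine_km(*coords[code_a], *coords[code_b])
    if d > 0:
        return d
    nonzero = [x for x in (haversine_km(*coords[code_a], *c)
                           for c in coords.values()) if x > 0]
    return min(nonzero) / 4

# Hypothetical postal code labels and coordinates, for illustration only.
coords = {"A": (60.0, 25.0), "B": (61.0, 25.0), "C": (60.0, 25.0)}
print(investor_distance("A", "B", coords), investor_distance("A", "C", coords))
```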

¹ https://www.aggdata.com
² http://www.geonames.org

3.4 Nokia announcement data

Nokia announcements filed with Nasdaq are collected from the official NASDAQ OMX Nordic website³. The data set contains 507 public announcements in the period from June 2010 to December 2016. To conform to the other data in Publication V, the data is aggregated to weekly resolution (Saturday to Friday, as in the Facebook data).

3.5 Nokia weekly return data

For Publication V, split- and dividend-adjusted daily closing price data for Nokia is provided by NASDAQ OMX Nordic. Using this price information, log returns are calculated for each week by

$$\mathrm{Ret}_t = \ln \frac{P_t}{P_{t-1}},$$

where $P_t$ is the closing price on the last trading day of week $t$ (Friday), and $P_{t-1}$ is the closing price on the last trading day of the previous week. The data set is collected for the period from June 2010 to December 2016.
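As a small illustration of the return definition, the sketch below computes weekly log returns from a list of end-of-week closing prices. The function name and the input format are assumptions, and the prices in the usage line are made up.

```python
from math import log

def weekly_log_returns(weekly_closes):
    """Log returns ln(P_t / P_{t-1}) from consecutive end-of-week
    closing prices, given in chronological order."""
    return [log(curr / prev)
            for prev, curr in zip(weekly_closes, weekly_closes[1:])]

# Made-up prices, for illustration only.
print(weekly_log_returns([100.0, 110.0, 99.0]))
```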

3.6 Data processing

The data processing required significant effort, especially in the attempt to identify and fix the discrepancies in the shareholding registration data set. Once the main data issues were solved, using the data was quite straightforward. For Publications II and V, investors were categorized into groups based on their attributes. Each investor in the data set is assigned to a sector group: Financial and Insurance, Government, Non-Financial, and Non-Profit companies, and Finnish Households. In Publication II, households are further divided into five age groups: Under-Aged (0,18], Young (18,30], Middle-Aged (30,50], Mature (50,64], and Retired (64,+∞). Here, the age attributes are derived for each transaction separately, taking into account the difference between the transaction date and the year of birth of the corresponding investor.

³ http://www.nasdaqomxnordic.com/News/companynews/

In Publication V, household investors are divided into four groups based on their trading activity. The activity group is defined by the number of days the investor traded during the past eight weeks, including the analyzed week. If the number of active days in the past eight weeks is equal to 1, the investor is considered inactive; if it is between 2 and 5, the investor is passive; 6–20 means moderate; and 21–40 means active. This is a dynamic attribute, as one investor might have different activity levels throughout the analysis period.
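The activity thresholds translate directly into a lookup. The sketch below is illustrative, and the function name is hypothetical. It assumes that the count covers at most 40 trading days (eight weeks of five days) and that at least one active day falls in the window.

```python
def activity_group(active_days):
    """Activity group from the number of active trading days in the
    past eight weeks (1 to 40), including the analyzed week."""
    if not 1 <= active_days <= 40:
        raise ValueError("active_days must be between 1 and 40")
    if active_days == 1:
        return "inactive"
    if active_days <= 5:
        return "passive"
    if active_days <= 20:
        return "moderate"
    return "active"
```

Because the window moves with the analyzed week, the function would be re-evaluated for every investor-week pair.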

Returning to the investor categorization in Publication II, all of the investor groups are also distributed geographically by assigning investor postal codes to 11 regions: Helsinki, Rest-Uusimaa, Eastern-Tavastia, South-West, Western-Tavastia, Central-Finland, South-East, Ostrobothnia, Northern-Savonia, Eastern-Finland and Northern-Finland. Together, these assignment rules form 99 investor categories: the four institutional sector groups and the five household age groups give nine groups, each of which is split into 11 regions. Each of the transactions in the data set is assigned to one of these categories.
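The category construction can be illustrated with a short sketch. It assumes that age is approximated by the difference between the transaction year and the year of birth, and that the nine groups are the four institutional sector groups plus the five household age groups. The identifiers are hypothetical.

```python
from itertools import product

INSTITUTIONAL = ["Financial and Insurance", "Government",
                 "Non-Financial", "Non-Profit"]
# Upper (inclusive) age bounds of the household age groups.
AGE_BOUNDS = [(18, "Under-Aged"), (30, "Young"),
              (50, "Middle-Aged"), (64, "Mature")]
REGIONS = ["Helsinki", "Rest-Uusimaa", "Eastern-Tavastia", "South-West",
           "Western-Tavastia", "Central-Finland", "South-East",
           "Ostrobothnia", "Northern-Savonia", "Eastern-Finland",
           "Northern-Finland"]

def age_group(transaction_year, year_of_birth):
    """Household age group at the time of the transaction
    (assumption: age = transaction year - year of birth)."""
    age = transaction_year - year_of_birth
    for upper, name in AGE_BOUNDS:
        if age <= upper:
            return name
    return "Retired"

GROUPS = INSTITUTIONAL + [name for _, name in AGE_BOUNDS] + ["Retired"]
CATEGORIES = list(product(GROUPS, REGIONS))  # (group, region) pairs
print(len(CATEGORIES))
```

Since the age group is derived per transaction, the same household can contribute transactions to different categories over time.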

4 METHODS

This chapter describes the methods used in Publications I–V. Since Publication I is itself a survey of structural properties of networks, Section 4.1 briefly introduces the general network notation, concepts and some of the measures from Publication I that are relevant to Publications II–V. Section 4.2 describes the network inference, while Section 4.3 describes the network validation and filtering methods used in Publications II and III. Finally, Section 4.4 briefly introduces the linear and logistic regressions used in Publications IV and V.

In research philosophy, epistemology is what we perceive and describe reality to be. Due to the empirical nature of this thesis, a positivist epistemology is adopted, whereby quantitative methods are used, the numerical data are assumed to be representative of the underlying behavioural characteristics of investors, and the inferred synchronisation between these variables is assumed to be representative of their relationships. The quantitative approach leads to hypothesis testing about the significance of the observed relationships based on the systematic measurement of empirical data and allows generalisable findings to be obtained.

4.1 Network notation, concepts and measures

A network, simply put, is a set of nodes connected via links. Nodes can represent any agent in a complex system, for example, an airport or city in transportation networks, a website in a WWW network, a human in a social network, and so on. In this dissertation, the nodes represent investors, both household and institutional. Links represent relationships between the nodes. In transportation networks these would be roads between cities or flight routes between airports, in WWW networks they stand for links from one website to another, and in social networks they are information channels between friends and acquaintances. In this dissertation, links primarily measure some trading similarity between two investors.

Notation. Formally, a network (graph) is defined by a pair of sets $(V, E)$. $V$ is a finite set of $N$ nodes (vertices), and $E \subseteq V \times V$ is a set of $M$ links (edges) connecting pairs of nodes. The terms vertex and edge are usually used in graph theory, while the terms node and link are frequently used in the computer science literature. A network can be represented by an adjacency matrix $A$. Its elements are defined as follows:

$$A_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are connected by a link } ((i,j) \in E), \\ 0 & \text{otherwise.} \end{cases} \tag{4.1}$$

A network is said to be undirected if the adjacency matrix is symmetric and directed otherwise.

Network links can carry more than just the binary information of their existence: a real number can also be assigned to each link, in which case the network is called a weighted network. This can be achieved by setting the adjacency matrix elements equal to the weights of the corresponding links. Another way to assign link weights is by defining a function $w$ mapping links to real numbers, $w : E \to \mathbb{R}$, so that the weight between nodes $i$ and $j$ is denoted as $w(i,j) := w_{ij}$. In this dissertation, network link weights are used to represent the strength of different trading similarity measures or the geographical distance between individual investors or their categories.

A slightly different type of network is the bipartite network. Such a network contains two sets of nodes, $U$ and $V$. Links in bipartite networks exist only between the two sets of nodes, i.e. the set of links satisfies $E \subseteq U \times V$. In the investor network setting, the set of nodes $U$ can stand for the investors, while the set of nodes $V$ can stand for the securities traded or the trading days. If $V$ stands for securities, then the link $(i,j)$ represents investor $i$ holding or trading security $j$, while if $V$ stands for trading days, then the link $(i,j)$ stands for investor $i$'s activity on trading day $j$.

Centrality measures. Many network measures exist for understanding and comparing the structural properties of networks. One essential family is the centrality measures, which quantify node importance in the network from different perspectives. The most basic and popular centrality measure across fields of research is the degree centrality $k_i$. It corresponds to the number of links that are connected to node $i$ and is defined by

$$k_i = \sum_j A_{ij}. \tag{4.2}$$

The higher $k_i$, the more connections node $i$ has to other nodes. The degree distribution can be obtained by calculating the degrees of all nodes and calculating the frequencies

$$\mathbb{P}(k) := \frac{1}{N} \sum_i \mathbb{1}_{\{k = k_i\}}, \tag{4.3}$$

where $\mathbb{1}_{\{k = k_i\}}$ is equal to 1 if $k = k_i$ and 0 otherwise.
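Equations 4.1 to 4.3 can be illustrated with a minimal sketch for an undirected, unweighted network. Nodes are assumed to be labelled by the integers 0 to N−1, and the function names are hypothetical.

```python
from collections import Counter

def adjacency_matrix(n_nodes, links):
    """Symmetric 0/1 adjacency matrix of an undirected network."""
    a = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in links:
        a[i][j] = 1
        a[j][i] = 1
    return a

def degrees(a):
    """Degree k_i of each node: the row sums of the adjacency matrix."""
    return [sum(row) for row in a]

def degree_distribution(a):
    """Fraction of nodes having each observed degree k."""
    n = len(a)
    counts = Counter(degrees(a))
    return {k: c / n for k, c in sorted(counts.items())}

# A star network: node 0 is linked to nodes 1, 2 and 3.
star = adjacency_matrix(4, [(0, 1), (0, 2), (0, 3)])
print(degrees(star), degree_distribution(star))
```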

Another measure of centrality is the closeness centrality, denoted $C_i$. It is the inverse of the average distance from node $i$ to all other nodes in the network. If $d(i,j)$ is the shortest distance between nodes $i$ and $j$, then the closeness centrality is defined as

$$C_i = \frac{N-1}{\sum_j d(i,j)}. \tag{4.4}$$

A slightly different type of centrality measure is the so-called betweenness centrality. It is based on the number of shortest paths that pass through a node (M. E. Newman, 2001). The betweenness centrality is given by

$$B_i = \sum_{s \neq i \neq t \in V} \frac{\sigma_{st}(i)}{\sigma_{st}}, \tag{4.5}$$

where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(i)$ is the number of those shortest paths going through node $i$.
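Both measures can be computed with breadth-first searches on an unweighted network. The sketch below assumes a connected, undirected network with at least two nodes, stored as a dictionary of neighbour lists. Betweenness is accumulated with Brandes' dependency recursion, a standard technique that is not described in this thesis, and the sum runs over ordered pairs $(s, t)$ as in Equation 4.5. All function names are hypothetical.

```python
from collections import deque

def bfs_paths(adj, source):
    """Breadth-first search from `source`. Returns the visiting order,
    distances, shortest-path counts and shortest-path predecessors."""
    dist = {source: 0}
    sigma = {v: 0 for v in adj}
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds

def closeness(adj, i):
    """Closeness centrality (N - 1) / sum_j d(i, j)."""
    _, dist, _, _ = bfs_paths(adj, i)
    return (len(adj) - 1) / sum(dist.values())

def betweenness(adj):
    """Betweenness centrality summed over ordered pairs (s, t)."""
    b = {v: 0.0 for v in adj}
    for s in adj:
        order, _, sigma, preds = bfs_paths(adj, s)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    return b

# A path network a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(path, "b"), betweenness(path))
```

In the path example, the middle node lies on the only shortest path between the two end nodes in both directions, which gives it a betweenness of 2 under the ordered-pair convention.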

Random networks. There are different random network models, in which certain parameters are fixed and others remain random. The simplest of these models is the so-called Erdős–Rényi random network. It is defined by just two parameters: the number of nodes and the number of links. To generate this random network, it is sufficient to have the network density $\rho$, which is calculated as the ratio between the number of links in the network and the total possible number of links, i.e. $\rho = M/(N(N-1)/2)$ in the undirected case. Then, for each pair of nodes, a link is placed between them with probability $\rho$. Publication II uses the notion of such random networks to test for the random occurrence of links in an ensemble of networks, while in future research, other random network models, like the exponential random graph, can be used to account for more sophisticated network features.
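The generation procedure described above can be sketched in a few lines. The function names and the optional `seed` parameter are illustrative choices.

```python
import random
from itertools import combinations

def density(n_nodes, n_links):
    """Density of an undirected network: M / (N (N - 1) / 2)."""
    return n_links / (n_nodes * (n_nodes - 1) / 2)

def erdos_renyi(n_nodes, rho, seed=None):
    """Place a link between each pair of nodes independently with
    probability rho; returns the list of links."""
    rng = random.Random(seed)
    return [(i, j) for i, j in combinations(range(n_nodes), 2)
            if rng.random() < rho]

rho = density(30, 87)  # example: 30 nodes and 87 links
sample = erdos_renyi(30, rho, seed=42)
```

The number of links in a generated network matches the original only in expectation, so a comparison against an observed network would typically rely on an ensemble of such samples.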

Community detection. Assigning nodes to communities allows for a better understanding of the network structure at the mesoscopic scale. Various methods have been developed to find communities with high modularity, i.e. communities of nodes that are densely connected internally and sparsely connected with nodes outside their community. The existing methods include modularity maximization, spectral clustering and clique percolation, among many others (Fortunato, 2010). Publication III takes advantage of the Infomap algorithm (Rosvall et al., 2008), which defines communities by minimizing the description length of a random walk on the network. The Infomap algorithm is chosen as it is the dominant community detection algorithm used in the investor network literature.

4.2 Network inference

Besides the many ways different networks can be classified, they can also be split into observable and non-observable networks. An observable network has identifiable links between its nodes. For example, social networks in which individuals self-declare their relationships and ties are mostly observable. Other examples of observable networks are road, World Wide Web, co-citation and power grid networks, among many others. Investor networks, on the other hand, are an example of a non-observable network, since the links between investors must be estimated from their trading behaviour and cannot be observed directly from the transaction data. Indeed, the links in the investor network do not even exist in the real world; they are statistical concepts based on trading similarities. To estimate the presence of links, some similarity measure has to be used, and an inferred link between two investors means that they share some feature, e.g. strategy, behaviour or portfolio composition.

Similarity-based networks. Similarity networks are undirected ($A_{ij} = A_{ji}$), and the choice of the similarity measure is arbitrary and depends on the particular situation. In the case of investor networks with investor transaction data, each investor can have several time series representing their trading behaviour, e.g. a net volume time series for some specific security. The most straightforward similarity measure in this situation is the Pearson correlation. In such a case, the similarity between two investors (nodes) is quantified by the linear correlation. In a system with $N$ nodes, an $N \times N$ matrix $C$ can be estimated, which measures the pairwise correlations between investor net volume time series. The matrix $C$ completely describes a weighted network, where every link connecting nodes $i$ and $j$ is associated with a weight $C_{ij}$.

Estimating the network structure with similarity measures yields fully connected networks; therefore, an appropriate network filtering or validation method is required to extract meaningful information. Different filtering methods give rise to different network structures. Moreover, the resulting link weights are only statistical estimates of the true coefficients and are therefore prone to measurement noise. Accordingly, the set of links can be trimmed to a subset of statistically significant estimates. For a discussion of network filtering and validation methods, see Section 4.3.

In what follows, we assume that we have $N$ investors with their trading time series. For a selected security, $V(i,t)$ is defined as the net traded volume of investor $i$ on day $t$. It can be further split into the volume bought $V_b(i,t)$ and sold $V_s(i,t)$ during the day. We will use these time series in the following sections to describe approaches to estimating trading similarity links between investors.
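As an illustration of a similarity-based network, the sketch below builds the pairwise Pearson correlation matrix from net volume series. It assumes that the net volume is $V = V_b - V_s$, that all series cover the same days, and that no series is constant (otherwise the correlation is undefined). The function names and the example volumes are made up.

```python
from math import sqrt

def net_volume(bought, sold):
    """Assumed net traded volume per day: V = Vb - Vs."""
    return [b - s for b, s in zip(bought, sold)]

def pearson(x, y):
    """Sample Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(series):
    """N x N matrix C of pairwise correlations between the series."""
    n = len(series)
    c = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c[i][j] = c[j][i] = pearson(series[i], series[j])
    return c

# Made-up net volumes of three investors over four days.
volumes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
C = correlation_matrix(volumes)
```

The resulting matrix is dense, which is why a filtering or validation step is needed before the network can be interpreted.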

Information theoretic measures. This section briefly introduces the information theoretic measures (Cover et al., 2012) that are used for network inference in Publication II. The key definition here is the measure of entropy (Shannon, 1948). Entropy measures the uncertainty of a random variable. If $p(x)$ is the probability mass function of a discrete random variable $X$, then the entropy $H$ of $X$ is defined by

$$H(X) = -\sum_x p(x) \log p(x). \tag{4.6}$$

Entropy takes non-negative values ($H(X) \ge 0$). It is highest when $X$ is uniformly distributed (most uncertainty about the outcome of $X$), while $H(X)$ is lowest and equal to 0 when $X$ is concentrated on one value, i.e. there is no uncertainty about the outcome of $X$.

If $Y$ is another discrete random variable, then $H(X|Y)$ is the conditional entropy of $X$ given $Y$,

$$H(X|Y) = \sum_y p(y) \left[ -\sum_x p(x|y) \log p(x|y) \right], \tag{4.7}$$

where $p(x|y)$ is the conditional probability. Finally, the mutual information between variables $X$ and $Y$ is the difference between the entropy of $X$ and the conditional entropy $H(X|Y)$. Mutual information is denoted as $I(X,Y)$ and is the reduction of uncertainty about the outcomes of the first variable based on information provided by the second variable, so it is a measure of dependence between two random variables. Intuitively, it is the amount of information one random variable carries about the other (see Figure 4.1 for illustration):

$$I(X,Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)} = I(Y,X) \ge 0. \tag{4.8}$$

Figure 4.1 Relationship between entropy, conditional entropy and mutual information in a Venn diagram.
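For discrete observations, Equations 4.6 to 4.8 can be evaluated by replacing the probabilities with empirical frequencies. The sketch below uses this simple plug-in estimate with natural logarithms. It is meant only to illustrate the definitions, not as a recommended estimator, and the function names are hypothetical.

```python
from collections import Counter
from math import log

def entropy(xs):
    """Plug-in estimate of H(X) from a list of observations."""
    n = len(xs)
    return -sum(c / n * log(c / n) for c in Counter(xs).values())

def conditional_entropy(xs, ys):
    """Plug-in estimate of H(X|Y): the entropy of X within each
    value of Y, weighted by the frequency of that value."""
    n = len(xs)
    total = 0.0
    for y, count_y in Counter(ys).items():
        xs_given_y = [x for x, yy in zip(xs, ys) if yy == y]
        total += count_y / n * entropy(xs_given_y)
    return total

def mutual_information(xs, ys):
    """Plug-in estimate of I(X,Y) from paired observations."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

On any sample, the three estimates satisfy $I(X,Y) = H(X) - H(X|Y)$ up to floating-point error, which mirrors the relationship shown in Figure 4.1.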

The advantage mutual information holds over, for example, correlation is its ability to capture non-linear dependencies. The difficulty of using mutual information as a measure of dependence lies in its estimation (Kraskov et al., 2004; Walters-Williams et al., 2009). However, if the variables are jointly normally distributed, or such an assumption can be made, the mutual information can be calculated analytically from the Pearson correlation $\rho_{XY}$ by the following equality:

$$I(X,Y) = -\frac{1}{2} \log\left(1 - \rho_{XY}^{2}\right). \tag{4.9}$$
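Equation 4.9 can be sketched in a few lines. The helper names and the two short series below are hypothetical; the sketch assumes joint normality and a correlation strictly between −1 and 1 (the logarithm is undefined at exactly ±1).

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def gaussian_mi(rho):
    """Mutual information of jointly normal variables, Eq. 4.9 (requires |rho| < 1)."""
    return -0.5 * math.log(1.0 - rho ** 2)

# Hypothetical net volume series of two investors
v_i = [10.0, -5.0, 3.0, 8.0, -2.0]
v_j = [7.0, -1.0, 1.0, 9.0, -4.0]
rho = pearson(v_i, v_j)
print(rho, gaussian_mi(rho))
```

Because the mapping depends only on $\rho_{XY}^{2}$, positively and negatively correlated pairs of equal strength receive the same mutual information.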

Network inference using mutual information. Given the net trading volumes $V_i$ and $V_j$ of investors $i$ and $j$, we can calculate the mutual information $I(i,j)$ between the investors' net trading volume time series by

$$I(i,j) := I(V_i, V_j) = \sum_{V_i, V_j} p(V_i, V_j) \log \frac{p(V_i, V_j)}{p(V_i)\, p(V_j)}. \tag{4.10}$$


In Publication II the investors' net traded volumes are assumed to be jointly normally distributed. Therefore, correlation estimates between the time series are used to obtain the mutual information (see Equation 4.9). The framework developed in Publication II can handle networks inferred using any of the available similarity measures, as long as they provide an ensemble of binary networks. The mutual information measure is chosen as an example since, despite the normality assumption, it is better equipped to capture non-linear relationships. In fact, Publication II is among the first investor network studies leveraging the mutual information measure.

Recently, a new approach based on symbolic mutual information has been proposed to measure synchronization among financial traders (Gutiérrez-Roig et al., 2019). The authors also proposed using symbolic transfer entropy to estimate lead-lag relationships between trader behaviours.

Jaccard similarity index. Another way to assign weights to network links is to measure the elements shared between the sets of attributes/properties of the connected nodes. One of the most popular measures for this purpose is the Jaccard similarity index. It is defined as the number of elements in the intersection over the number of elements in the union:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}, \tag{4.11}$$

where $A$ and $B$ are the sets of elements of the two investors for which the similarity is measured. $M_{11}$ is the number of elements common to both sets, $M_{01}$ is the number of elements that exist in set $B$ but not in $A$, and $M_{10}$ is the number of elements that exist in set $A$ but not in $B$. In set notation, $M_{11} = |A \cap B|$, $M_{01} = |B \setminus A|$ and $M_{10} = |A \setminus B|$. The Jaccard similarity index ranges from 0 to 1: it is 0 when there are no shared elements and 1 when all elements in both sets are the same.

Network inference using Jaccard similarity index. In order to measure the trading similarity between two investors $i$ and $j$ for a particular security using the Jaccard index, two investor-specific sets have to be defined. For this purpose, the trading state variable is defined next. The following notation has been used in Publications III and IV.

First, the scaled net volume $r$ is calculated from the trading data. For a given security, for each investor $i$ and each trading date $t$, the total volume sold is denoted as $V_s(i,t)$ and the total volume bought as $V_b(i,t)$. Then the scaled net volume ratio is calculated as

$$r(i,t) = \frac{V_b(i,t) - V_s(i,t)}{V_b(i,t) + V_s(i,t)}. \tag{4.12}$$

Here $r(i,t)$ ranges from $-1$, which is the case if investor $i$ was only selling during day $t$, to $1$, which is the case if investor $i$ was only buying. Next, the trading state for day $t$ can be assigned:

$$\begin{cases} b, \text{ primarily buying state,} & \text{when } r(i,t) > \theta, \\ s, \text{ primarily selling state,} & \text{when } r(i,t) < -\theta, \end{cases} \tag{4.13}$$

where $\theta$ is a chosen threshold.
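A minimal sketch of Equations 4.12 and 4.13 is given below. The function names are hypothetical, and returning `None` for days without trades or with a ratio inside $[-\theta, \theta]$ is an assumption made here, since Equation 4.13 assigns only the states $b$ and $s$.

```python
def scaled_net_volume(bought, sold):
    """Scaled net volume r(i, t) of Eq. 4.12; None if the investor did not trade."""
    total = bought + sold
    if total == 0:
        return None  # assumption: inactive days carry no ratio
    return (bought - sold) / total

def trading_state(r, theta):
    """Trading state of Eq. 4.13: 'b', 's', or None when neither condition holds."""
    if r is None:
        return None
    if r > theta:
        return "b"
    if r < -theta:
        return "s"
    return None  # assumption: no state is assigned inside [-theta, theta]

# Hypothetical daily volumes (bought, sold) and threshold
theta = 0.25
for bought, sold in [(100, 0), (20, 80), (50, 50)]:
    r = scaled_net_volume(bought, sold)
    print(bought, sold, r, trading_state(r, theta))
```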

The following Jaccard similarity index based measures relate to the ones used in Publication IV when measuring trade timing similarity between household investors. Each day when a household investor is active in the stock market, the investor either increases or decreases their positions. Over a period, an investor's activity thus yields two sets of dates. In order to measure the trade timing similarity between two investors, these sets have to be compared. The goal is to have a similarity of 0 if the investors have timed their trades differently, a similarity of 1 if their same-type trades were executed on precisely the same dates, and a value between 0 and 1 when there is some but not complete overlap in trade timing. The Jaccard similarity index is ideal for this purpose.

For each investor $i$, two sets of trading days are collected, $B_i$ and $S_i$. $B_i$ contains all the days when investor $i$ was in trading state $b$ and $S_i$ contains all the days when investor $i$ was in trading state $s$, i.e. $B_i = \{t : r(i,t) > \theta\}$ and $S_i = \{t : r(i,t) < -\theta\}$. Then the buying similarity can be defined as

$$J_B(i,j) = \frac{|B_i \cap B_j|}{|B_i \cup B_j|} \tag{4.14}$$

and the selling similarity can be defined as

$$J_S(i,j) = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}. \tag{4.15}$$


Further, the trading similarity over both behaviours can be defined as

$$J_{SB}(i,j) = \frac{|S_i \cap S_j| + |B_i \cap B_j|}{|S_i \cup S_j| + |B_i \cup B_j|}. \tag{4.16}$$

Similarly, the trading similarity can be extended to take into account multiple securities by

$$J^{z}_{SB}(i,j) = \frac{|B_i^{z_1} \cap B_j^{z_1}| + |S_i^{z_1} \cap S_j^{z_1}| + \ldots + |B_i^{z_N} \cap B_j^{z_N}| + |S_i^{z_N} \cap S_j^{z_N}|}{|B_i^{z_1} \cup B_j^{z_1}| + |S_i^{z_1} \cup S_j^{z_1}| + \ldots + |B_i^{z_N} \cup B_j^{z_N}| + |S_i^{z_N} \cup S_j^{z_N}|}, \tag{4.17}$$

where $z = \{z_1, \ldots, z_N\}$ is a set of securities. The following equation generalizes the Jaccard index calculation for measuring trade timing similarities:

$$J_{ij} = \frac{\sum_{z,d} M_{11}^{(z,d;i,j)}}{\sum_{z,d} \left( M_{01}^{(z,d;i,j)} + M_{10}^{(z,d;i,j)} + M_{11}^{(z,d;i,j)} \right)}, \tag{4.18}$$

where $M_{11}^{(z,d;i,j)}$ is the total number of trading days on which both $i$ and $j$ are in the state $d \in \{b, s\}$ in security $z$, $M_{01}^{(z,d;i,j)}$ is the total number of trading days on which $j$ is in the state $d$ in security $z$ and $i$ is not, and $M_{10}^{(z,d;i,j)}$ is the total number of trading days on which $i$ is in the state $d$ in security $z$ and $j$ is not.
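The pooled similarity of Equations 4.16 to 4.18 can be sketched with plain Python sets. The data layout (a dictionary keyed by security and state), the names and the example days are hypothetical; returning 0 when both investors have no trading days at all is a convention assumed here.

```python
def jaccard(a, b):
    """Jaccard index of two sets, Eq. 4.11; 0.0 by convention if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def trade_timing_similarity(days_i, days_j):
    """Pooled Jaccard similarity in the spirit of Eq. 4.18.

    days_i, days_j: dict mapping (security, state) -> set of trading days,
    with state in {'b', 's'}.
    """
    keys = set(days_i) | set(days_j)
    shared = sum(len(days_i.get(k, set()) & days_j.get(k, set())) for k in keys)
    total = sum(len(days_i.get(k, set()) | days_j.get(k, set())) for k in keys)
    return shared / total if total else 0.0

# Hypothetical investors trading one security "z1"
inv_i = {("z1", "b"): {1, 2, 3}, ("z1", "s"): {5}}
inv_j = {("z1", "b"): {2, 3, 4}, ("z1", "s"): {6}}
print(jaccard(inv_i[("z1", "b")], inv_j[("z1", "b")]))  # buying similarity, Eq. 4.14
print(trade_timing_similarity(inv_i, inv_j))            # pooled similarity
```

Note that the pooled measure sums intersections and unions before dividing, so it generally differs from the average of the per-state Jaccard indices.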

Hypergeometric distribution. It is possible to form a bipartite network from investors' trading data. Links in such a network can be based on the securities an investor owns or trades, or on the days when an investor was active in a particular security. A projection of such a bipartite network yields an investor network. In order to achieve this projection, an appropriate approach should be chosen to determine the relevant links. One such approach is the statistically validated network approach, which leverages the information about investor trading states and their co-occurrences (Tumminello, Lillo et al., 2012). Similarly to Equation 4.13, trading states dependent on the value of the scaled net volume (Eq. 4.12) are defined. In this situation, a state $bs$ related to day trading is included:

$$\begin{cases} b, \text{ primarily buying state,} & \text{when } r(i,t) > \theta, \\ s, \text{ primarily selling state,} & \text{when } r(i,t) < -\theta, \\ bs, \text{ buying and selling state,} & \text{when } -\theta \leq r(i,t) \leq \theta. \end{cases} \tag{4.19}$$


In total there are nine possible trading state co-occurrence combinations for two investors $i$ and $j$. Let $N_i^P$ be the number of occurrences of trading state $P$ for investor $i$ and $N_j^Q$ be the number of occurrences of trading state $Q$ for investor $j$, while $N_{ij}^{PQ}$ is the number of trading state co-occurrences when investor $i$ is in trading state $P$ and investor $j$ is in trading state $Q$. Then if $N_{ij}^{PQ} > 0$, we can establish a link between investors $i$ and $j$.

The probability of observing $N_{ij}^{PQ}$ trading state co-occurrences under the null hypothesis of randomness is given by (Tumminello, Micciche, Lillo, Piilo et al., 2011):

$$p\left(N_{ij}^{PQ}\right) = 1 - \sum_{X=0}^{N_{ij}^{PQ}-1} H\left(X \mid T, N_i^P, N_j^Q\right). \tag{4.20}$$
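The p-value of Equation 4.20 can be sketched with the standard library alone. The sketch assumes that $H(X \mid T, N_i^P, N_j^Q)$ is the hypergeometric probability mass function and that $T$ is the number of trading days in the investigated period; both the reading of $T$ and the function names are assumptions made for illustration.

```python
from math import comb

def hypergeom_pmf(x, T, n_p, n_q):
    """P(X = x) co-occurrences when n_p and n_q state-days fall at random among T days."""
    if x < max(0, n_p + n_q - T) or x > min(n_p, n_q):
        return 0.0  # outside the support of the distribution
    return comb(n_p, x) * comb(T - n_p, n_q - x) / comb(T, n_q)

def cooccurrence_p_value(n_pq, T, n_p, n_q):
    """Right-tail p-value of Eq. 4.20: probability of at least n_pq co-occurrences."""
    return 1.0 - sum(hypergeom_pmf(x, T, n_p, n_q) for x in range(n_pq))

# Hypothetical counts: 250 trading days, investor i buys on 30, investor j on 40,
# and both buy on 15 common days
print(cooccurrence_p_value(15, T=250, n_p=30, n_q=40))
```

In practice the resulting p-values are compared with a significance level corrected for multiple testing, since one test is carried out for every investor pair and state combination.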

The statistically validated network approach has become one of the leading investor network inference methods in the literature. Therefore, this method was leveraged in Publication III. Moreover, the method was chosen in order to obtain investor clusters that are comparable in nature to those observed in (Tumminello, Lillo et al., 2012; Musciotto, Marotta, Piilo et al., 2018). This allowed us to coherently extend the existing results about investor clusters over multiple IPO securities.

4.3 Network filtering and validation

Network filtering. The most straightforward filtered network is a threshold network. Given the set of all link weights between nodes, only those above or below a determined level are kept, depending on whether higher or lower values mean stronger relationships. A different, more sophisticated approach is to extract the most important links based on some topological network property. One of the most common methods to detect a hierarchical structure is the Minimum Spanning Tree (MST) algorithm (Mantegna, 1999). MST yields a network that has a tree structure, connecting all nodes with the smallest possible link weights. MST is used in Publication II to obtain a filtered mutual information investor network¹. The filtering method was chosen for its computational efficiency, which is very important when dealing with bootstrap ensembles. Also, it is frequently used in financial network studies and has already been used in several investor network studies (Ranganathan et al., 2018; Musciotto, Marotta, Piilo et al., 2018).

¹ Since MST extracts the network with the smallest link weights, mutual information values in Publication II are multiplied by −1, as higher mutual information values mean a stronger relationship.

MST is a strict filtering approach, yet it does not involve statistical testing. When network links are estimated using the mutual information measure, the mutual information values should be transformed before MST filtering so that higher mutual information values become lower. Alternatively, if the mutual information values are not transformed, a maximum spanning tree approach yields the same result. The MST algorithm consists of the following steps:

1. Sort the pairwise (transformed) mutual information values in an ascending queue.

2. Take the node pair with the lowest value and add a link between them in the filtered network if it does not create a cycle.

3. Remove the mutual information value from the queue.

4. Repeat steps 2 and 3 until all nodes are connected or no unused mutual information values remain in the queue.
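The steps above correspond to Kruskal's algorithm and can be sketched as follows. The union-find bookkeeping used for cycle detection, the function name and the example weights are illustrative choices made here, not the implementation used in Publication II.

```python
def minimum_spanning_tree(n_nodes, weights):
    """Kruskal-style MST.

    weights: dict {(i, j): weight} with i < j.
    Returns the list of kept links as (i, j, weight).
    """
    parent = list(range(n_nodes))  # union-find forest used to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Step 1: ascending queue of link weights
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:           # Step 2: the link does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j, w))
            if len(tree) == n_nodes - 1:
                break                  # Step 4: all nodes are connected
    return tree

# Hypothetical mutual information values, negated so that strong links come first
mi = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.5, (2, 3): 0.7, (1, 3): 0.1}
tree = minimum_spanning_tree(4, {pair: -value for pair, value in mi.items()})
print(tree)
```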

Another, more complex structure based on network topology is the Planar Maximally Filtered Graph (PMFG) (Tumminello, Aste et al., 2005). The main difference between the network structures obtained with the PMFG and the MST is the number of links: the MST has $N-1$ links, while the PMFG has $3(N-2)$. The PMFG extracts the strongest links from the original network, even if they form loops, as long as the extracted links do not cross each other, hence the name planar. The MST structure is present within the PMFG structure. It has been shown (Tumminello, Coronnello et al., 2007) that correlation-based network filtering techniques like MST or PMFG can distinguish economic sectors and sub-sectors as communities in networks.

Conservative Causal Core Network (C3NET). The following algorithm was developed by one of the co-authors of Publication II (Altay et al., 2010) and used to infer investor category networks for different securities. The algorithm was chosen as an example to illustrate the network aggregation framework and could be substituted by another network inference method. It is designed to extract, for each node in the network, the most important statistically significant relationships, estimated using the mutual information measure. The algorithm is computationally efficient and is suitable for large networks or network ensembles. It consists of three steps:


1. Pairwise mutual information values are estimated (this is easily extended to other similarity measures).

2. Each mutual information estimate is tested against the null hypothesis $H_0 : I(i,j) = 0$. If the null hypothesis is not rejected at a chosen significance level, the mutual information value for the tested pair, and therefore the link weight, is set to 0; otherwise it is set to 1.

3. From the remaining statistically significant mutual information values, each node in the network is allowed to keep a single strongest link.

The resulting network is a binary network with at most $N$ links in a network of $N$ nodes.
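The selection in step 3 can be sketched as follows. The sketch assumes that the significance decisions of step 2 are already available as a boolean matrix and that the original mutual information values are retained for ranking the significant links; the fixed cut-off used to build that matrix in the example is only a stand-in for a proper hypothesis test, and all names are hypothetical.

```python
def c3net_core(mi, significant):
    """Keep, for every node, its single strongest significant link.

    mi: symmetric matrix (list of lists) of mutual information estimates.
    significant: same-shaped matrix of booleans (outcome of step 2).
    Returns a set of undirected links (i, j) with i < j.
    """
    n = len(mi)
    links = set()
    for i in range(n):
        candidates = [(mi[i][j], j) for j in range(n)
                      if j != i and significant[i][j]]
        if candidates:
            _, j = max(candidates)               # strongest neighbour of node i
            links.add((min(i, j), max(i, j)))    # store the link once
    return links

# Hypothetical mutual information matrix for four investor categories
mi = [[0.0, 0.8, 0.3, 0.1],
      [0.8, 0.0, 0.5, 0.2],
      [0.3, 0.5, 0.0, 0.05],
      [0.1, 0.2, 0.05, 0.0]]
# Stand-in for step 2: a fixed cut-off instead of a statistical test
significant = [[value >= 0.15 for value in row] for row in mi]
links = c3net_core(mi, significant)
print(sorted(links))
```

Since each node contributes at most one link and mutual choices coincide, the number of links never exceeds the number of nodes.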

Network aggregation. The aggregation procedure is meant to summarize the information about links in an ensemble of networks $\{G_k\}_{k=1}^{E}$, yielding a single network with the most frequently reoccurring links in the ensemble. Here $E$ stands for the number of networks in the ensemble. The network ensemble might be created by transaction bootstrapping (discussed later), by observing the trading similarity relationships in different periods, or by observing the relationships over different securities. The ensemble may cover several or all of these options, and multiple aggregation steps can be used to integrate this information. The aggregation procedure consists of the following steps:

1. The network ensemble is aggregated into a weighted network, $\{G_k\}_{k=1}^{E} \rightarrow G_w$, where the link weights in the network $G_w$ correspond to the number of occurrences of a particular link in the ensemble. For example, the weight of a link between investor groups $i$ and $j$ is defined as

$$n_{ij} = G_w(i,j) = \sum_{k=1}^{E} G_k(i,j), \tag{4.21}$$

where $n_{ij}$ may assume integer values between 0 and $E$.

2. A statistical hypothesis test is conducted to remove the need for an arbitrary link threshold parameter. The null hypothesis, $H_0^{n_{ij}}$, states that the number of networks $n_{ij}$ in the ensemble with a link between $i$ and $j$ is less than $n_0(\alpha)$, where $\alpha$ is the significance level. The probability $p$ for two groups to be connected by chance in the network ensemble is estimated as the ratio of the actual number of links in the ensemble, $\sum_{i>j,k} G_k(i,j)$, to the number of all possible links in the ensemble, $E \times (N(N-1)/2)$, where $N$ is the number of nodes in the network. Then $n_{ij}$ follows a binomial distribution, $\mathcal{B}(p, E)$, and

$$p_{ij} = P(n \geq n_{ij}) = \sum_{n=n_{ij}}^{E} \binom{E}{n} p^{n} (1-p)^{E-n} \tag{4.22}$$

is the probability of observing by chance the link between nodes $i$ and $j$ at least $n_{ij}$ times.

3. In order to control the family-wise error rate, a multiple hypothesis test correction can be applied here. In the case of the Bonferroni correction procedure, the chosen significance level $\alpha$ is adjusted by the number of tests ($n_{\text{tests}}$): $\alpha_{\text{adjusted}} = \alpha / n_{\text{tests}}$. Nodes in the aggregated network $G$ are then connected if $p_{ij} < \alpha_{\text{adjusted}}$.
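The three aggregation steps can be sketched as below. Representing each network as a set of links, taking the number of tests to be the number of node pairs, and the default significance level are all assumptions made for this illustration.

```python
from math import comb

def binomial_tail(n_obs, E, p):
    """P(n >= n_obs) for n ~ Binomial(E, p), Eq. 4.22."""
    return sum(comb(E, n) * p ** n * (1 - p) ** (E - n)
               for n in range(n_obs, E + 1))

def aggregate(ensemble, n_nodes, alpha=0.01):
    """Aggregate an ensemble of binary networks into one validated network.

    ensemble: list of sets of links (i, j) with i < j.
    Returns (link counts n_ij, set of validated links).
    """
    E = len(ensemble)
    counts = {}
    for links in ensemble:                        # Step 1, Eq. 4.21
        for link in links:
            counts[link] = counts.get(link, 0) + 1
    n_pairs = n_nodes * (n_nodes - 1) // 2
    p = sum(counts.values()) / (E * n_pairs)      # chance of a link, Step 2
    alpha_adjusted = alpha / n_pairs              # Bonferroni, Step 3 (assumed n_tests)
    validated = {link for link, n in counts.items()
                 if binomial_tail(n, E, p) < alpha_adjusted}
    return counts, validated

# Hypothetical ensemble: link (0, 1) appears in all 20 networks, (2, 3) only once
ensemble = [{(0, 1)} for _ in range(20)]
ensemble[0] = {(0, 1), (2, 3)}
counts, validated = aggregate(ensemble, n_nodes=5)
print(counts, validated)
```

In this toy ensemble only the persistent link survives validation, while the link observed once is indistinguishable from chance.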

Network validation by transaction bootstrapping. As mentioned previously, a bootstrap network ensemble can be generated by bootstrapping the transaction data set, calculating net volumes and inferring networks using any method. After estimating the investor trading network for each bootstrap sample, links can be validated based on the number of times they are observed in the ensemble. This makes it possible to statistically validate even the links obtained from methods without statistical validation, for example, MST.

Multilayer aggregation procedure. The following steps describe a framework in which the previously introduced network aggregation procedure is applied multiple times to integrate information about links inferred for different securities and periods. This framework was introduced in Publication II.

1. For a set of securities $S$ and a number of non-overlapping inference periods $T$, transaction bootstrapping, network inference and bootstrap network ensemble aggregation are performed, yielding the $\{G_{st}\}_{(S \times T)}$ ensemble. Each network represents significant relationships between investor groups for a specific security in a specific period.

2. Then the network aggregation procedure over securities is applied for each period $t$,

$$\forall t \in T : \{G_{st}\}_{s=1}^{S} \xrightarrow{\text{network aggregation}} G_t,$$

resulting in an ensemble of networks $\{G_t\}_{t=1}^{T}$. Each network in the $\{G_t\}$ ensemble represents significant relationships between investor groups that occur over multiple securities during period $t$. Similarly, if we apply the network aggregation procedure over time for each security $s$, we end up with an ensemble of networks $\{G_s\}_{s=1}^{S}$, where each network $G_s$ represents the most important relationships between investor groups that reoccur over time in security $s$.

3. Finally, the second layer of information is aggregated:

$$\{G_t\}_{t=1}^{T} \xrightarrow{\text{network aggregation}} \tilde{G}.$$

Depending on the aggregation order, the aggregation sequence leads to different networks:

$$\begin{aligned} \{G_{st}\}_{(S \times T)} &\Longrightarrow \forall t\ \{G_{st}\}_{s=1}^{S} \xrightarrow{\text{n. agg.}} \{G_t\} \Longrightarrow \{G_t\}_{t=1}^{T} \xrightarrow{\text{n. agg.}} \tilde{G}, \\ \{G_{st}\}_{(S \times T)} &\Longrightarrow \forall s\ \{G_{st}\}_{t=1}^{T} \xrightarrow{\text{n. agg.}} \{G_s\} \Longrightarrow \{G_s\}_{s=1}^{S} \xrightarrow{\text{n. agg.}} \hat{G}, \end{aligned} \tag{4.23}$$

where $\tilde{G} \neq \hat{G}$.

Network community attribute analysis. The hypergeometric test can be leveraged again in order to describe network communities based on their dominating attributes (Tumminello, Miccichè et al., 2013). Network community detection provides a set of communities that partitions all $N$ investors, e.g. community $C$ has $N_C$ investors. Each member of a community has some value for each of the possible attributes. By performing a hypergeometric test, the community can be characterized in terms of the attributes that are statistically over-expressed in it. The probability of observing $X$ investors with attribute $A$ in community $C$ is $H(X|N, N_C, N_A)$, where $N_A$ is the number of investors with attribute $A$ among all $N$ investors in the system. Then the probability of observing by chance at least $N_{CA}$ investors in community $C$ with attribute $A$ is obtained by

$$p_o(N_{CA}) = 1 - \sum_{X=0}^{N_{CA}-1} H(X|N, N_C, N_A). \tag{4.24}$$

The null hypothesis of random occurrence is rejected if the over-expression p-value $p_o$ is lower than a selected significance level. In this case, it is said that attribute $A$ is over-expressed in community $C$. Alternatively, the under-expression of an attribute in a given community can be tested by looking at the left tail of the same hypergeometric distribution, i.e. $p_u = 1 - p_o$.

Network community evolution analysis. Network community evolution can be identified by leveraging the hypergeometric test again over two sets of investor communities and their compositions from different years. Reusing the notation from the attribute over/under-expression analysis, $N_A$ denotes the number of investors in community $A$ in the first year, $N_B$ denotes the number of investors in community $B$ in the second year, and $N_{AB}$ denotes the number of investors common to communities $A$ and $B$. Then, if $N$ is the number of distinct investors over the two years, the probability of observing by chance $N_{AB}$ common investors is obtained by Equation 4.24. More details about the procedure can be found in (Marotta et al., 2015).

4.4 Regression Analysis

The method of linear regression allows for modelling a linear relationship between a dependent variable and explanatory (independent) variables. When there is more than one explanatory variable, the method is called multiple linear regression (Greene, 2003). Publication IV uses a multiple linear regression model to study the relationship between investors' trade timing similarities and the geographical distance between them, controlling for attributes such as language, age and gender.

Logistic regression is the most widely used regression model when dealing with binary response data (Wooldridge, 2015). In Publication V a logistic regression model was used, since the dependent variable was an investor's binary decision to increase or decrease a holding position. Logistic regression utilizes the logit link function and provides a simple interpretation of the regression results in terms of odds ratios.
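For illustration, the logit link and the odds-ratio interpretation can be written in generic notation as follows. The symbols $y$ (the binary decision), $x_k$ (the explanatory variables) and $\beta_k$ (the coefficients) are assumptions introduced here and do not reproduce the exact specification of Publication V.

```latex
\log \frac{\Pr(y = 1 \mid x)}{1 - \Pr(y = 1 \mid x)} = \beta_0 + \sum_{k} \beta_k x_k,
\qquad
\mathrm{OR}_k = e^{\beta_k}
```

Under this model, a one-unit increase in $x_k$ multiplies the odds of $y = 1$ by $e^{\beta_k}$ when the other explanatory variables are held fixed.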


5 RESULTS

This chapter reports the summarized results of Publications I–V. Results are discussed in the light of the research objectives and questions declared in Section 1.2.

5.1 Investor networks in stock markets

The first research objective, RO 1¹, is directly addressed in Publication I (Emmert-Streib et al., 2018a). The publication serves as a survey of network methods applicable to the analysis of the economics and finance domains. Besides reviewing the standard network concepts and definitions, the publication discusses the current state of network methodology adoption and use in these areas, as well as the possibilities that these methods can bring to the field in the future.

The paper calls for a further comparative study of network inference methods for all possible types of economic and financial networks, with a consideration of the weaknesses and strengths of the methods. Inference methods capturing meaningful information would lead to a better interpretation of network structures.

The publication expresses a need to combine "traditional" financial studies with data-driven multidisciplinary research. There is potential for a fruitful symbiosis between the two strands of research, which often live in entirely different worlds. Data-driven research can raise questions, based on exploratory research of actual complex systems, to be answered by theoretical models. Conversely, theoretical models should be backed up by data-driven research. However, not only should researchers be more open to the union of these two worlds, but journals should also allow a wider variety of methodologies to be used.

¹ RO 1: To survey the use of networks and network-based methods to investigate finance and economics related questions.


The research objective RO 2.1² is addressed in Publication II (Baltakys, Kanniainen et al., 2018), which introduces a framework for integrating different network information layers. In the paper, investor category networks are inferred for different periods and different securities, as well as for different re-sampled transaction data sets. Research objective RO 2.2³ is met by employing the proposed network aggregation approach. That is, for re-sampled transaction data sets the network aggregation procedure works as network link validation, while performing the same procedure on a network ensemble inferred for different periods yields a network with the most frequently reoccurring links. The advantage of the proposed network aggregation procedure over the standard thresholding approach is the use of statistical significance instead of an arbitrary threshold when filtering out the least significant relationships. The use of the multilayer aggregation procedure also changes the topology of the aggregated network ensemble. For example, if the network ensemble was created using the MST approach (see Sec. 4.3), the aggregated network will not necessarily preserve a tree structure. Similarly, in the case of C3NET (see Sec. 4.3), the aggregated network with $N$ nodes will not necessarily have $N$ links.

The proposed methodology allows for inferring networks over multiple securities, thereby allowing a market-wide analysis of investor networks instead of drawing conclusions based on single securities. A comparison of link overlaps for networks aggregated over the same six-year period, but using windows of different sizes, shows that there is a significant difference in the number of links inferred. Links that are inferred when aggregating longer windows are also likely to be inferred when aggregating shorter windows, but not the other way around. Thus, some of the relationships last for shorter periods and may reappear in different periods, most likely because investors follow some specific market-determined strategies, and they cannot be inferred when looking at longer periods.

This approach allows us to answer the research question RQ 2.2⁴ by observing the associations between investor trading behaviours over different periods and different securities. Investor categories with higher centralities in the network structure behave in a mutually predictable way with more groups than categories with low centralities. The application to the Helsinki stock exchange, for a network ensemble inferred over 100 securities and 12 half-year periods, shows that the most central investors across the degree, load and closeness centrality measures are middle-aged and mature household investors in the Helsinki area. In the light of (Walden, 2014), who shows that higher centrality in a network can suggest better access to private information channels, these investors should be the best-informed investors in Finland.

² RO 2.1: To develop a network aggregation method for simplified analysis of multilayer investor networks.

³ RO 2.2: To design an investor network link inference and validation approach for transaction level data sets.

⁴ RQ 2: How are different investor categories connected over multiple securities and periods?

In Publication III (Baltakiene et al., 2019), investor networks are investigated for 69 companies for the first two years following their initial public offering (IPO). In order to answer the research question RQ 3.1⁵, investor networks are inferred separately for the first and second year after the IPO using the statistically validated network approach (see Section 4.2). After that, the Infomap community detection algorithm is used to assign investors into groups of locally densely connected sub-graphs. Members of a community represent investors that time their trades in a synchronized manner. In the cluster attribute analysis, we observed that the largest communities are associated with an over-expression of institutional investors, namely general government and non-profit institutions, while at the same time the biggest communities significantly under-represent household investors. These observations suggest that institutional investors have more similar trade timing strategies across the set of IPO securities than household investors. These conclusions agree with the literature that provides evidence of institutional herding (Nofsinger et al., 1999; Sias, 2004). Three possible explanations for such observations are: (i) similar portfolio restrictions apply to regulated institutional portfolios; (ii) private information channels exist between these institutions, leading to synchronized trading; or (iii) all institutions react to public information in the same way.

To answer research question RQ 3.2⁶, investor overlaps in the first- and second-year clusters are statistically validated. The results show that some clusters persist into the second year, though the majority do not. The analysis shows that the networks themselves change considerably from the first year to the following year, so even the persisting clusters are somewhat different in the second year compared with the first. Similarly, some cluster splitting is observed, where one cluster in the first year is statistically associated with more than one cluster in the second year. We find that the majority of clusters appear in more than one security; however, there are also some security-specific clusters. This is a new finding, empirically confirming that groups of investors employ the same strategies when investing in several securities. Investor clusters found in the first and second years after an IPO were also observed in mature securities. Therefore, these clusters do not originate from IPOs per se. The reoccurring investor clusters might be due to market-wide investment strategies followed by those investor groups.

⁵ RQ 3.1: What kind of investor communities can be observed for recently issued securities?

⁶ RQ 3.2: What are the characteristics of investor cluster evolution for IPO securities, and how do they compare to a mature security?

5.2 Information transfer in investor networks

Research question RQ 4.1⁷ is the primary motivation behind Publication IV (Baltakys, Baltakiene et al., 2018a). The results in Publication IV show a consistent negative relationship between trade timing similarity and the physical distance separating investors. The observed results suggest the existence of local social communication between investors that affects their investment decisions.

In response to research question RQ 4.2⁸, the results suggest that age difference also has an impact on trade timing similarity, i.e. investors with an age difference larger than ten years have, on average, smaller trading similarity than pairs of investors who are of more similar age. This result, that age similarity matters in improving communication, is in line with the literature (Liebowitz et al., 2007; Leskovec et al., 2008). A similarly intuitive result is the importance of language. Publication IV finds that investors who use different languages as their primary languages have, on average, smaller trade timing similarities. There is some disagreeing evidence from regions with different language distributions, namely Finnish and Swedish. Further, gender does not provide any particularly strong evidence of importance in trade timing similarity.

Finally, the research question RQ 5⁹ is answered in Publication V (Siikanen et al., 2018a), which examines the relationship between investors' trading decisions and company announcements on the company's official Facebook account. Similarly to Publication II, investors are grouped into five categories: companies, financial (and insurance) institutions, governmental organizations, non-profit organizations, and households, and the analysis is performed for each category separately. Households are further divided into four categories based on their trading activity. The main analysis investigates whether, for a set of investors who traded in a given week, Facebook data is related to an investor's decision to increase or decrease his/her position in Nokia stock. Publication V provides the first empirical evidence that social network activity on a company's Facebook page affects the trading of different investors differently. The analysis shows an association between Facebook data and the decisions of mainly passive household investors and non-profit organizations. In contrast, no association between the decisions of financial institutions and Facebook data has been observed. Unlike professional investors, less sophisticated investors benefit the most from company disclosures made through social media, as the information is delivered to them, making access to information easier (Snow et al., 2017).

⁷ RQ 4.1: How is the geographical distance between investors related to trade timing similarity?

⁸ RQ 4.2: In addition to the geographical distance, how do differences in other investor attributes, such as gender, language, or age, affect trade timing similarity between different investors?

⁹ RQ 5: How does Facebook drive the investment decisions of different investor categories?


6 DISCUSSION AND CONCLUSION

This chapter concludes the first part of the dissertation. The chapter starts with a section discussing the contributions made to the research area, then proceeds to discuss the reliability and validity of the conducted research. Finally, limitations and suggestions for future research are discussed.

6.1 Contribution

This dissertation gives a brief introduction to the research investigating investor net-works and concentrates on contributions made in Publications I–V. This disserta-tion gives an idea (i) why it is interesting to investigate investor networks and (ii)what are the challenges when working with investor networks. Additionally, basicmethodologies and techniques when working in this area are introduced.

This dissertation with papers therein contributes to the methodological as well asempirical literature of financial networks. Notably, it contributes to the analysis ofinformation transfer networks in stock market analysis both with methodologicalsuggestions as well as empirical observations. The research community would bene-fit from a better characterization of the real individual investor behaviour as well asunderstanding how their interaction in the stock markets facilitates price discoveryand economic cycles. Overall, this dissertation contributes to the analysis of investorbehaviour and their interaction in the stock markets by adopting the complex net-work methodologies.

Publication I (Emmert-Streib et al., 2018a) contributes to the field of networks in finance by surveying the use of networks and network-related methods in the study of finance and economics. Various existing network measures and their use in quantifying the structural properties of economic networks are discussed.

Publication II (Baltakys, Kanniainen et al., 2018) addresses the increasingly important topic of multilayer networks (Kivelä et al., 2014). In particular, it contributes a framework to aggregate and validate networks of possible information transfer channels observed in different periods and over different securities. There are many methodological challenges in the analysis of investor networks, and this article addresses several of them. The contributions of the publication are manifold: investor data categorization based on socioeconomic attributes, transaction bootstrapping, which improves the network inference, and a multilayer aggregation procedure. The publication also adopts C3NET, a known method from systems biology used to infer gene regulatory networks, for investor network inference.

The aggregation procedure comprises several individual steps: first, structuring the transaction data into network layers and, second, aggregating these layers. The network inference methods (C3NET and MST applied to a mutual information network) are used as examples and can be replaced by any other method, i.e. the network aggregation framework is applicable to different existing inference methods. Conceptually, the proposed ensemble aggregation framework extends the aggregation procedure introduced in (Matos Simoes et al., 2012) and aims for a similar goal as (Zhong et al., 2014), where the rank-product method is used to improve the accuracy of gene network reconstruction. The proposed framework is meant to substitute trivial network ensemble aggregation procedures such as the maximum and mean rules (Polikar, 2006). De Domenico et al. (2015) proposed a related concept of complexity reduction in multilayer networks, whereby similar layers are combined, leaving only the most dissimilar layers apart. Additionally, empirical evidence is provided that households in Helsinki are the most central investor category, which could indicate that they are well-informed investors.
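For orientation, the trivial maximum and mean rules that the proposed framework is meant to replace can be sketched as follows. This is only a minimal illustration of those baseline rules, not of the framework in Publication II; the function name `aggregate_layers` and the representation of each layer as a weighted adjacency matrix over the same node set are assumptions made for the example.

```python
import numpy as np

def aggregate_layers(layers, rule="mean"):
    """Combine network layers edge by edge with a trivial ensemble rule.

    `layers` is a sequence of equally sized weighted adjacency matrices
    (hypothetical input format, one matrix per period or security).
    """
    stack = np.stack([np.asarray(layer, dtype=float) for layer in layers])
    if rule == "mean":
        return stack.mean(axis=0)   # average edge weight over layers
    if rule == "max":
        return stack.max(axis=0)    # strongest edge weight over layers
    raise ValueError(f"unknown rule: {rule}")
```

Neither rule involves any statistical validation of the aggregated links, which is the gap that a validated aggregation procedure is intended to close.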

Publication III (Baltakiene et al., 2019) contributes to the field of investor networks by investigating investor interconnectedness and possible information transfer for newly issued shares in the first two years after their initial public offerings (IPOs). Earlier studies, such as (Tumminello, Lillo et al., 2012; Musciotto, Marotta, Piilo et al., 2018), have mainly concentrated on mature stocks and the investor network structures inferred from their data, while Publication III is the first contribution to the analysis of the network structures of newly issued securities and their dynamics.

Publication III extends the findings of (Musciotto, Marotta, Piilo et al., 2018) beyond the most actively traded security on the Helsinki Stock Exchange to 69 IPO securities and investigates their investor cluster dynamics. The empirical results show that investor clusters in IPO securities are rather dynamic: they form in both the first and the second year after the IPO; however, most of the clusters rearrange in the second year, having only a few statistically significant overlaps with a cluster from the first year. Additionally, statistically significant similarity is found between investor clusters in different security networks, i.e. investor clusters not only persist in time, which was found in (Musciotto, Marotta, Piilo et al., 2018) and confirmed for IPO securities in (Baltakiene et al., 2019), but also appear in different securities.

This is also the first study to investigate investor clusters for IPO securities in terms of socioeconomic attributes. Institutional investors, namely general-government and non-profit organization investors, are over-represented in the largest clusters in IPO securities. This provides further evidence for institutional herding (Sias, 2004), which may stem from similar trading strategies, existing information exchange channels or reactions to public news.

In Publication IV (Baltakys, Baltakiene et al., 2018a), Finnish retail investor activity in the stock market is used to investigate trade timing similarity among geographically related neighbours. The article contributes to the field of network studies in finance by providing evidence that the neighbourhood matters not only for market participation and portfolio composition (Kaustia et al., 2012) but also for trade timing. A pairwise trade timing similarity measure is constructed between two individuals, and the effect of geographical distance on trade timing similarity is examined with a number of control variables. The results show that the coefficient on geographical distance is generally negative, i.e. the farther people live from each other, the less similar their trade timing is. This suggests the existence of local information diffusion channels between household investors, particularly for those who are of similar age and speak the same language.

Publication V (Siikanen et al., 2018a) provides the first empirical evidence of the impact that company news releases in social networks, in our case Facebook, have on individual investors’ decisions to trade securities.

6.2 Reliability and validity of research

Research reliability refers to the consistency of the scientific study together with its conclusions, i.e. whether things are done correctly and the results are robust, trustworthy and repeatable. First of all, the main data set used in Publications II–V is a well-established and frequently used database of investor-level transactions on the Helsinki Stock Exchange. The data set is obtained from the official distributor, Euroclear Ltd. Even though the data set is supposed to be an exact replica of the investor holdings, the co-authors of the publications have put significant effort into improving the data quality for Publications II–V and the broader scientific community. Several inconsistencies in the data set have been identified and solved in collaboration with Euroclear; therefore, the data set itself should be more reliable than ever before. All further data manipulations were carefully programmed and checked multiple times before confirming the final results.

Secondly, in order to ensure the reliability of the results, multiple data sets are used and/or various robustness checks are performed in Publications II–V. Publication II compares the network aggregation procedure applied to two network inference methods. Additionally, in Publication II, transaction bootstrapping for investor categories is performed in order to make sure that the observed synchronization is not due to a specific realization of the transaction data or to the over-representation of some particular investor. As for reliability in Publication III, the network inference is applied to 69 different IPO securities as well as five mature securities. The results show internal consistency, with similar network structures and properties observed across the IPO securities as well as the mature securities. Similar observations regarding the composition and properties of investor clusters have been made in other research (Tumminello, Lillo et al., 2012; Musciotto, Marotta, Piilo et al., 2018). Next, Publication IV ensures the reliability of the regression results by performing the regression analysis with several different control variables and by additionally checking the consistency of the results by means of bootstrapping. More specifically, 1000 bootstrap iterations are carried out for each investigated region, and in each iteration 5% of all pairwise observations are sampled. The results show that most of the significant coefficients from the principal analysis are also significant in more than 90% of the bootstrap iterations, confirming the reliability of the regression results. Finally, Publication V checks the robustness of the results by comparing the association between trading decisions and social media data for different investor categories as well as for different specifications based on the trading activity of household investors. Additionally, the results are tested for different specifications of social media activity, e.g. the number of comments, likes and shares per post.
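The bootstrap check described for Publication IV can be illustrated with a small sketch. It is a simplified stand-in, not the regression specification of the publication: it assumes a single regressor (distance), ordinary least squares, sampling without replacement and a normal-approximation critical value of 1.96, and the function name `bootstrap_significance_share` is hypothetical.

```python
import numpy as np

def bootstrap_significance_share(x, y, n_iter=1000, frac=0.05, t_crit=1.96, seed=0):
    """Share of subsamples in which the OLS slope of y on x is significant.

    Each iteration draws a fraction `frac` of the observations (without
    replacement, which is an assumption) and refits the regression.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    m = max(3, int(frac * n))            # subsample size, at least 3 points
    hits = 0
    for _ in range(n_iter):
        idx = rng.choice(n, size=m, replace=False)
        X = np.column_stack([np.ones(m), x[idx]])     # intercept + regressor
        beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        resid = y[idx] - X @ beta
        sigma2 = resid @ resid / (m - 2)              # residual variance
        cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance
        t_stat = beta[1] / np.sqrt(cov[1, 1])
        if abs(t_stat) > t_crit:
            hits += 1
    return hits / n_iter
```

A coefficient would then be regarded as stable in the sense used above if the returned share exceeds 0.9.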

To address the issue of accruing false positives (Type I errors) when multiple tests are performed, multiple-test correction is applied to the p-values where appropriate. In Publication II, Bonferroni correction is used, while in Publication III FDR correction is used.
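The difference between the two corrections can be sketched as follows. The text does not state which FDR procedure is used in Publication III; the Benjamini-Hochberg step-up procedure is assumed here as the most common choice, and both function names are illustrative.

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def bh_fdr_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m.
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest rank that passes
        reject[order[:k + 1]] = True      # reject everything up to that rank
    return reject
```

With the p-values 0.01, 0.02, 0.03 and 0.04 at a 5% level, Bonferroni rejects only the first hypothesis, whereas the step-up procedure rejects all four, which illustrates why Bonferroni is the more conservative of the two.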

Research validity refers to whether the data, methods and overall design of the research document, measure and report what they are intended to. In terms of research validity, all articles use or are based on widely established statistical methods. The measures and methodologies used have been selected after a careful investigation of the extant literature.

For example, an important consideration is the choice of the network inference method and its appropriateness. As discussed in Section 2.3, there are several ways to infer networks from investor-level transaction data sets. In Publication II, the mutual information measure is chosen to estimate linkages between investor categories from net-volume time series. Then, the observed network structure is suggested to be a proxy for information transfer networks. It can be questioned whether the mutual information measure is appropriate for identifying information transfer channels, as following either identical or completely opposite trading strategies would yield high mutual information. Nevertheless, in both situations, a high mutual information between two net-volume time series suggests that the decisions are likely based on the same information, even if the investment strategies resulted in opposing actions. Similarly to Publication II, Gutiérrez-Roig et al. (2019) have also used information-theoretic measures to investigate investor trading synchronization and anticipation.
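The point that identical and opposite trading produce the same mutual information can be checked with a small sketch. It uses a plug-in estimator on discretized series, where reducing net volumes to buy/sell signs is an assumption made for illustration and not necessarily the estimator used in Publication II.

```python
import numpy as np

def mutual_information(a, b):
    """Plug-in mutual information (in bits) between two discrete 1-D sequences."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)              # joint counts
    joint /= joint.sum()                         # joint probabilities
    pa = joint.sum(axis=1, keepdims=True)        # marginal of a
    pb = joint.sum(axis=0, keepdims=True)        # marginal of b
    nz = joint > 0                               # skip empty cells (0 * log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

rng = np.random.default_rng(1)
signs = rng.choice([-1, 1], size=500)            # hypothetical daily buy/sell signs
unrelated = rng.choice([-1, 1], size=500)

mi_same = mutual_information(signs, signs)       # identical strategy
mi_opposite = mutual_information(signs, -signs)  # exactly opposite strategy
mi_unrelated = mutual_information(signs, unrelated)
```

Here `mi_same` and `mi_opposite` coincide, while `mi_unrelated` is close to zero, so the measure detects dependence but not its sign.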

Publication III takes advantage of the hypergeometric test to infer synchronization links between investors’ trading times, which is documented as a valid method by a number of studies (Tumminello, Lillo et al., 2012; Musciotto, Marotta, Piilo et al., 2018; Challet et al., 2018). Similarly, the Infomap method (Rosvall et al., 2010) used for community detection is one of the most frequently used algorithms in the recent literature.
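As a rough illustration of how a hypergeometric test can validate a synchronization link, consider two investors observed over T trading days. The sketch below computes the probability of seeing at least the observed number of common active days if the days were chosen independently; the parameter names and the reduction of trading to a single "active day" state are simplifying assumptions, not the exact setup of Publication III.

```python
from math import comb

def cooccurrence_pvalue(T, n_i, n_j, n_ij):
    """Upper-tail hypergeometric p-value for co-occurring trading days.

    T    : number of trading days observed
    n_i  : days on which investor i is active
    n_j  : days on which investor j is active
    n_ij : days on which both are active
    """
    upper = min(n_i, n_j)
    total = comb(T, n_j)
    # Sum P(X = k) for k = n_ij .. upper under the null of independent timing.
    tail = sum(comb(n_i, k) * comb(T - n_i, n_j - k) for k in range(n_ij, upper + 1))
    return tail / total
```

For example, with 10 trading days and two investors who are each active on the same 3 days, the p-value is 1/120. In practice such p-values are compared against a threshold adjusted for the number of investor pairs tested.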

The Euroclear transaction data set contains information about the postal codes associated with investor trading accounts. Publication IV uses this information, together with the geographical coordinates of all postal code centres, to estimate a proxy for the pairwise distances between household investors. The resulting distances are only approximate, as investors are not necessarily concentrated in the centre of postal code areas. However, earlier studies (Brown et al., 2008; Ivkovic et al., 2007) have found that even a broader definition of the neighbourhood is sufficient to observe investor behavioural biases. Therefore, the approximation should be a valid measure for the possibility of word-of-mouth communication. Additionally, the local bias observed in investor decisions, whereby investors prefer to invest in companies whose headquarters are located in their area, should not affect the results. Local bias affects the investor’s decision to own local companies; it should not make investors consistently time their transactions in these companies in a similar way on the same days unless they share an information transfer channel.
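A distance proxy of this kind can be sketched as follows. The text does not specify the distance formula; the haversine great-circle distance is assumed here, and the lookup table `POSTAL_CENTRES` with its two entries (approximate city-centre coordinates) is purely illustrative.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical lookup: postal code -> (latitude, longitude) of its centre.
POSTAL_CENTRES = {
    "00100": (60.1699, 24.9384),   # approximate centre of Helsinki
    "33100": (61.4978, 23.7610),   # approximate centre of Tampere
}

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def investor_distance_km(code_a, code_b, centres=POSTAL_CENTRES):
    """Distance proxy for two investors identified only by postal code."""
    return haversine_km(centres[code_a], centres[code_b])
```

Two investors sharing a postal code get a distance of zero under this proxy, which is one concrete form of the approximation error discussed above.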

The fact that all articles except Publication III have undergone the peer-review process in high-quality journals further confirms the reliability and validity of the data sets, methods and research design used.

6.3 Limitations and suggestions for future research

This section addresses the research limitations encountered in Publications I–V and possible future research directions based on these limitations.

Publication I focuses broadly on economic and financial networks and only briefly reports on the developments in investor network studies. This is partially due to the fact that, at the time of writing Publication I, investor network research had not produced sufficiently many publications to warrant a survey. However, with the increasing number of investor network inference methodologies and different data sets becoming available to researchers, a survey of the current state of the art in investor network studies is becoming more relevant than ever.

The most significant limiting factor of Publications II–IV is that the empirical analysis uses data from a single stock exchange. Methodologically, the same approaches should be applicable to any investor-level data set. However, it remains open whether the findings about different investor groups would also hold in other markets. This is difficult to verify in practice because such shareholder registration data is unique and is not generally available from many other markets. Unfortunately, other similarly detailed data sets were not available for this study.

A further limitation of the investigated data set is the lack of time stamps. Having only the transaction date restricts the studies to daily-resolution synchronization between investors’ trading behaviour, and the analysis cannot be extended to intra-day information transfer. On the other hand, according to (Ozsoylev, Walden, Yavuz et al., 2014), results on investor networks obtained with intra-day data from the Istanbul Stock Exchange are consistent with those at daily resolution, and therefore this may not be a significant limitation. Besides, word-of-mouth communication most likely does not happen at the resolution of seconds, especially for household investors.

The inference methods used in this dissertation yield undirected networks. However, we know from insider trading studies (Ahern, 2017) that information channels might be directed. Unfortunately, it is difficult to determine the direction of the information channels with the data set used in this dissertation. Therefore, lagged information transfer (Gutiérrez-Roig et al., 2019; Challet et al., 2018) is not investigated in this dissertation and is left for future work. With higher-resolution data sets that might become available in the near future, the question of directed information networks can be revisited.

Similarly, in Publication V the social media data set covers only a single security, Nokia. The reliability of the study results would benefit from an analysis of securities of different maturities and economic sectors as well as of different markets.

We refer to the inferred links in Publications II, III and IV as potential pathways for information transfer. However, the links are inferred based on significant trading synchronization, which in turn need not occur due to information transfer but could instead be the outcome of the same investment strategies and their reaction to market fluctuations or public information arrivals. That is, information transfer can, but does not always, lead to synchronized trading, while synchronized trading can be explained by factors other than information transfer.

In order to improve the reliability of the results, portfolio composition similarities could be used instead of trade synchronization as a way to find investor interconnectedness (Cimini et al., 2015). This alternative has not been considered in the dissertation, but some work is in progress for future research.

The data set used covers full information about investor trading on the Helsinki Stock Exchange. However, the full picture of investment portfolios is lacking, as the data set does not have any information about holdings outside the Helsinki Stock Exchange. The addition of such data would significantly improve the reliability and validity of any investor-level studies, but such detailed data sets, covering all investors in a given country, are not directly available.

A better investigation of individual investor behaviour is long overdue. Even though human social behaviour has been widely investigated, it has remained an open question how individuals manage their capital and how much time and how uniformly they allocate it to different financial instruments. Results from such studies would be a worthwhile input to creating more realistic agent-based models that could better explain the processes in financial systems (Battiston, Farmer et al., 2016). Further, research on investor networks could benefit the study of investor herding. Namely, the links in an investor network are usually defined using some similarity measure, so naturally, the topological properties of these similarity networks are directly influenced by herding in stock markets.

REFERENCES

Abbasi, A., Hossain, L., Uddin, S. and Rasmussen, K. J. (2011). Evolutionary dynamics of scientific collaboration networks: multi-levels and cross-time analysis. Scientometrics 89.2, 687–710.

Abolafia, M. Y. (2001). Making markets: Opportunism and restraint on Wall Street. Harvard University Press.

Ahern, K. R. (2017). Information networks: Evidence from illegal insider trading tips. Journal of Financial Economics 125.1, 26–47.

Albert, R. and Barabási, A.-L. (2002). Statistical mechanics of complex networks. Reviews of modern physics 74.1, 47.

Albert, R., Jeong, H. and Barabási, A.-L. (1999). Internet: Diameter of the world-wide web. Nature 401.6749, 130–131.

Alessandretti, L., Sapiezynski, P., Sekara, V., Lehmann, S. and Baronchelli, A. (2018). Evidence for a conserved quantity in human mobility. Nature Human Behaviour, 1.

Allen, F. and Gale, D. (2000). Financial contagion. Journal of political economy 108.1, 1–33.

Altay, G. and Emmert-Streib, F. (2010). Inferring the conservative causal core of gene regulatory networks. BMC Systems Biology 4.1, 132.

Andrei, D. and Cujean, J. (2017). Information percolation, momentum and reversal. Journal of Financial Economics 123.3, 617–645.

Appel, K. I. and Haken, W. (1989). Every planar map is four colorable. Vol. 98. American Mathematical Soc.

Baker, H. K. and Nofsinger, J. R. (2002). Psychological biases of investors. Financial services review 11.2, 97.

Baker, W. E. (1984). The social structure of a national securities market. American journal of sociology 89.4, 775–811.

Bala, V. and Goyal, S. (1997). Self-organization in communication networks. Tech. rep.

Baltakiene, M., Baltakys, K., Kanniainen, J., Pedreschi, D. and Lillo, F. (2019). Investor Clusters in Initial Public Offerings. Palgrave Communications.

Baltakiene, M., Baltakys, K., Kanniainen, J., Pedreschi, D. and Lillo, F. (2019a). Clusters of Investors Around Initial Public Offering. arXiv preprint arXiv:1905.13508v2. Accepted for publication in the Palgrave Communications journal.

— (2019b). Clusters of Investors Around Initial Public Offering. arXiv preprint arXiv:1905.13508v2. Accepted for publication in the Palgrave Communications journal.

Baltakys, K., Baltakiene, M., Kärkkäinen, H. and Kanniainen, J. (2018a). Neighbors matter: Geographical distance and trade timing in the stock market. Finance Research Letters. DOI: 10.1016/j.frl.2018.11.013.

— (2018b). Neighbors matter: Geographical distance and trade timing in the stock market. Finance Research Letters. DOI: 10.1016/j.frl.2018.11.013.

Baltakys, K., Kanniainen, J. and Emmert-Streib, F. (2018). Multilayer aggregation with statistical validation: Application to investor networks. Scientific reports 8.1, 8198. DOI: 10.1038/s41598-018-26575-2.

Banerjee, S., Kaniel, R. and Kremer, I. (2009). Price drift as an outcome of differences in higher-order beliefs. The Review of Financial Studies 22.9, 3707–3734.

Bardoscia, M., Battiston, S., Caccioli, F. and Caldarelli, G. (2017). Pathways towards instability in financial networks. Nature Communications 8, 14416.

Barrat, A., Barthélemy, M., Pastor-Satorras, R. and Vespignani, A. (2004). The architecture of complex weighted networks. Proceedings of the National Academy of Sciences 101.11, 3747–3752.

Bastian, M., Heymann, S. and Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. Third international AAAI conference on weblogs and social media.

Battiston, S., Caldarelli, G., May, R. M., Roukny, T. and Stiglitz, J. E. (2016). The price of complexity in financial networks. Proceedings of the National Academy of Sciences 113.36, 10031–10036.

Battiston, S., Farmer, J. D., Flache, A., Garlaschelli, D., Haldane, A. G., Heesterbeek, H., Hommes, C., Jaeger, C., May, R. and Scheffer, M. (2016). Complexity theory and financial regulation. Science 351.6275, 818–819.

Battiston, S., Glattfelder, J. B., Garlaschelli, D., Lillo, F. and Caldarelli, G. (2010). The structure of financial networks. Network Science. Springer, 131–163.

Battiston, S., Puliga, M., Kaushik, R., Tasca, P. and Caldarelli, G. (2012). Debtrank: Too central to fail? financial networks, the fed and systemic risk. Scientific reports 2, 541.

Bondy, J. A., Murty, U. S. R. et al. (1976). Graph theory with applications. Vol. 290. Citeseer.

Brown, J. R., Ivkovic, Z., Smith, P. A. and Weisbenner, S. (2008). Neighbors matter: Causal community effects and stock market participation. The Journal of Finance 63.3, 1509–1531.

Challet, D., Chicheportiche, R., Lallouache, M. and Kassibrakis, S. (2018). Statistically validated lead-lag networks and inventory prediction in the foreign exchange market. Advances in Complex Systems 21.08, 1850019.

Cimini, G., Squartini, T., Garlaschelli, D. and Gabrielli, A. (2015). Systemic Risk Analysis on Reconstructed Economic and Financial Networks. Scientific reports 5.

Colla, P. and Mele, A. (2009). Information linkages and correlated trading. The Review of Financial Studies 23.1, 203–246.

Cover, T. M. and Thomas, J. A. (2012). Elements of information theory. John Wiley & Sons.

D’Arcangelis, A. M. and Rotundo, G. (2016). Complex networks in finance. Complex Networks and Dynamics. Springer, 209–235.

De Domenico, M., Nicosia, V., Arenas, A. and Latora, V. (2015). Structural reducibility of multilayer networks. Nature communications 6, 6864.

D’Errico, M., Grassi, R., Stefani, S. and Torriero, A. (2009). Shareholding networks and centrality: an application to the Italian financial market. Networks, Topology and Dynamics. Springer, 215–228.

Emmert-Streib, F., Musa, A., Baltakys, K., Kanniainen, J., Tripathi, S., Yli-Harja, O., Jodlbauer, H. and Dehmer, M. (2018a). Computational Analysis of the structural properties of Economic and Financial Networks. Journal of Network Theory in Finance 4.3, 1–32.

— (2018b). Computational Analysis of the structural properties of Economic and Financial Networks. Journal of Network Theory in Finance 4.3, 1–32. DOI: 10.21314/JNTF.2018.043.

Euler, L. (1741). Solutio problematis ad geometriam situs pertinentis. Commentarii academiae scientiarum Petropolitanae, 128–140.

Faloutsos, M., Faloutsos, P. and Faloutsos, C. (1999). On power-law relationships of the internet topology. ACM SIGCOMM computer communication review. Vol. 29.4. ACM, 251–262.

Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The journal of Finance 25.2, 383–417.

Feng, L. and Seasholes, M. S. (2004). Correlated trading and location. The Journal of Finance 59.5, 2117–2144.

Fortunato, S. (2010). Community detection in graphs. Physics Reports 486, 75–174. ISSN: 0370-1573. DOI: http://dx.doi.org/10.1016/j.physrep.2009.11.002. URL: http://www.sciencedirect.com/science/article/pii/S0370157309002841.

Goldenberg, J. and Levy, M. (2009). Distance is not dead: Social interaction and geographical distance in the internet era. arXiv preprint arXiv:0906.3202.

Greene, W. H. (2003). Econometric analysis. Pearson Education India.

Grinblatt, M. and Keloharju, M. (2000). The investment behavior and performance of various investor types: a study of Finland’s unique data set. Journal of financial economics 55.1, 43–67.

— (2001). What makes investors trade? The Journal of Finance 56.2, 589–616.

Grossman, S. J. and Stiglitz, J. E. (1980). On the impossibility of informationally efficient markets. The American economic review 70.3, 393–408.

Gualdi, S., Cimini, G., Primicerio, K., Di Clemente, R. and Challet, D. (2016). Statistically validated network of portfolio overlaps and systemic risk. Scientific reports 6, 39467.

Guiso, L., Sapienza, P. and Zingales, L. (2009). Cultural biases in economic exchange? The Quarterly Journal of Economics 124.3, 1095–1131.

Gutiérrez-Roig, M., Borge-Holthoefer, J., Arenas, A. and Perelló, J. (2019). Mapping individual behavior in financial markets: synchronization and anticipation. EPJ Data Science 8.1, 10.

Hamilton, W. R. (1858). Account of the icosian calculus. Proceedings of the Royal Irish Academy. Vol. 6, 415–416.

Han, B. and Yang, L. (2013). Social networks, information acquisition, and asset prices. Management Science 59.6, 1444–1457.

Heimer, R. Z. (2014). Friends do let friends buy stocks actively. Journal of Economic Behavior & Organization 107, 527–540.

Hidalgo, C. A. (2015). Disconnected! the parallel streams of network literature in the natural and social sciences. Tech. rep.

Hong, H., Kubik, J. D. and Stein, J. C. (2004). Social interaction and stock-market participation. The journal of finance 59.1, 137–163.

— (2005). Thy neighbor’s portfolio: Word-of-mouth effects in the holdings and trades of money managers. The Journal of Finance 60.6, 2801–2824.

Hussain, A. and Vatrapu, R. (2014). Social data analytics tool (sodato). International Conference on Design Science Research in Information Systems. Springer, 368–372.

Hussain, A., Vatrapu, R., Hardt, D. and Jaffari, Z. A. (2014). Social data analytics tool: A demonstrative case study of methodology and software. Analyzing Social Media Data and Web Networks. Springer, 99–118.

Ivkovic, Z. and Weisbenner, S. (2007). Information diffusion effects in individual investors’ common stock purchases: Covet thy neighbors’ investment choices. The Review of Financial Studies 20.4, 1327–1357.

Jackson, M. O. (2005). The economics of social networks.

Jiang, Z.-Q., Xie, W.-J., Xiong, X., Zhang, W., Zhang, Y.-J. and Zhou, W.-X. (2013). Trading networks, abnormal motifs and stock manipulation. Quantitative Finance Letters 1.1, 1–8.

Johnson, C. and Gilles, R. P. (2003). Spatial social networks. Networks and Groups. Springer, 51–77.

Jones, J. H. and Handcock, M. S. (2003). Social networks (communication arising): Sexual contacts and epidemic thresholds. Nature 423.6940, 605.

Karhunen, J. and Keloharju, M. (2001). Shareownership in Finland 2000. Liiketaloudellinen aikakauskirja, 188–226.

Kaustia, M. and Knüpfer, S. (2008). Do investors overweight personal experience? Evidence from IPO subscriptions. The Journal of Finance 63.6, 2679–2702.

— (2012). Peer performance and stock market entry. Journal of Financial Economics 104.2, 321–338.

Keloharju, M. (1993). The winner’s curse, legal liability, and the long-run price performance of initial public offerings in Finland. Journal of Financial Economics 34.2, 251–277.

Kirkman, T. P. (1858). VII. On the partitions of the R-pyramid, being the first class of R-gonous X-edra. Philosophical Transactions of the Royal Society of London 148, 145–161.

Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y. and Porter, M. A. (2014). Multilayer networks. Journal of Complex Networks 2.3, 203–271.

Kraskov, A., Stögbauer, H. and Grassberger, P. (2004). Estimating mutual information. Physical review E 69.6, 066138.

Leskovec, J. and Horvitz, E. (2008). Planetary-scale views on a large instant-messaging network. Proceedings of the 17th international conference on World Wide Web. ACM, 915–924.

Liebowitz, J., Ayyavoo, N., Nguyen, H., Carran, D. and Simien, J. (2007). Cross-generational knowledge flows in edge organizations. Industrial Management & Data Systems 107.8, 1123–1153.

Lillo, F., Miccichè, S., Tumminello, M., Piilo, J. and Mantegna, R. N. (2015). How news affects the trading behaviour of different categories of investors in a financial market. Quantitative Finance 15.2, 213–229.

Ljungqvist, A. and Wilhelm Jr, W. J. (2005). Does prospect theory explain IPO market behavior? The Journal of Finance 60.4, 1759–1790.

— (2003). IPO pricing in the dot-com bubble. The Journal of Finance 58.2, 723–752.

Mantegna, R. N. (1999). Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems 11.1, 193–197.

Mantegna, R. N. and Stanley, H. E. (2000). An introduction to econophysics: correlation and complexity in finance. Cambridge, UK: Cambridge University.

Markose, S., Giansante, S. and Shaghaghi, A. R. (2012). ‘Too interconnected to fail’ financial network of US CDS market: Topological fragility and systemic risk. Journal of Economic Behavior & Organization 83.3, 627–646.

Marotta, L., Miccichè, S., Fujiwara, Y., Iyetomi, H., Aoyama, H., Gallegati, M. and Mantegna, R. N. (2015). Bank-firm credit network in Japan: an analysis of a bipartite network. PloS one 10.5, e0123079.

Matos Simoes, R. de and Emmert-Streib, F. (2012). Bagging statistical network inference from large-scale gene expression data. PLoS One 7.3, e33624.

Milgram, S. (1967). The small world problem. Psychology today 2.1, 60–67.

Miritello, G., Moro, E., Lara, R., Martínez-López, R., Belchamber, J., Roberts, S. G. and Dunbar, R. I. (2013). Time as a limited resource: Communication strategy in mobile phone networks. Social Networks 35.1, 89–95.

Musciotto, F., Marotta, L., Miccichè, S., Piilo, J. and Mantegna, R. N. (2016). Patterns of trading profiles at the Nordic Stock Exchange. A correlation-based approach. Chaos, Solitons & Fractals 88, 267–278.

Musciotto, F., Marotta, L., Piilo, J. and Mantegna, R. N. (2018). Long-term ecology of investors in a financial market. Palgrave Communications 4.1, 92.

Nešetřil, J., Milková, E. and Nešetřilová, H. (2001). Otakar Borůvka on minimum spanning tree problem translation of both the 1926 papers, comments, history. Discrete mathematics 233.1-3, 3–36.

Newman, M. (2010). Networks: an introduction. OUP Oxford.

Newman, M. E. (2001). Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical review E 64.1, 016132.

Nofsinger, J. R. and Sias, R. W. (1999). Herding and feedback trading by institutional and individual investors. The Journal of finance 54.6, 2263–2295.

Onnela, J.-P., Saramäki, J., Hyvönen, J., Szabó, G., Lazer, D., Kaski, K., Kertész, J. and Barabási, A.-L. (2007). Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences 104.18, 7332–7336.

Ozsoylev, H. N. and Walden, J. (2011). Asset pricing in large information networks. Journal of Economic Theory 146.6, 2252–2280.

Ozsoylev, H. N., Walden, J., Yavuz, M. D. and Bildik, R. (2013). Investor networks in the stock market. The Review of Financial Studies 27.5, 1323–1366.

— (2014). Investor networks in the stock market. Review of Financial Studies 27.5, 1323–1366.

Pan, R. K. and Saramäki, J. (2012). The strength of strong ties in scientific collaboration networks. EPL (Europhysics Letters) 97.1, 18007.

Pastor-Satorras, R. and Vespignani, A. (2001). Epidemic spreading in scale-free networks. Physical review letters 86.14, 3200.

Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and systems magazine 6.3, 21–45.

Pozzi, F., Di Matteo, T. and Aste, T. (2013). Spread of risk across financial markets: better to invest in the peripheries. Scientific reports 3.

Radicchi, F., Fortunato, S. and Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences 105.45, 17268–17272.


Ranganathan, S., Kivelä, M. and Kanniainen, J. (2018). Dynamics of investor spanning trees around dot-com bubble. PloS one 13.6, e0198807.

Rosvall, M. and Bergstrom, C. T. (2010). Mapping change in large networks. PloS one 5.1, e8694.

Rosvall, M. and Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105.4, 1118–1123.

Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology, 43–46.

Saracco, F., Di Clemente, R., Gabrielli, A. and Squartini, T. (2015). Randomizing bipartite networks: the case of the World Trade Web. Scientific Reports 5.

Saramäki, J., Leicht, E. A., López, E., Roberts, S. G., Reed-Tsochas, F. and Dunbar, R. I. (2014). Persistence of social signatures in human communication. Proceedings of the National Academy of Sciences 111.3, 942–947.

Schelling, T. C. (2006). Micromotives and macrobehavior. WW Norton & Company.

Scott, J. (1988). Social network analysis. Sociology 22.1, 109–127.

Shannon, C. E. (1948). A mathematical theory of communication. Bell system technical journal 27.3, 379–423.

Shiller, R. J. (2000). Irrational exuberance. Philosophy and Public Policy Quarterly 20.1, 18–23.

Shiller, R. J. and Pound, J. (1989). Survey evidence on diffusion of interest and information among investors. Journal of Economic Behavior & Organization 12.1, 47–66.

Shive, S. (2010). An epidemic model of investor behavior. Journal of Financial and Quantitative Analysis 45.1, 169–198.

Sias, R. W. (2004). Institutional herding. The Review of Financial Studies 17.1, 165–206.

Siikanen, M., Baltakys, K., Kanniainen, J., Vatrapu, R., Mukkamala, R. and Hussain, A. (2018a). Facebook drives behavior of passive households in stock markets. Finance Research Letters 27, 208–213. DOI: 10.1016/j.frl.2018.03.020.

Siikanen, M., Baltakys, K., Kanniainen, J., Vatrapu, R., Mukkamala, R. and Hussain, A. (2018b). Facebook drives behavior of passive households in stock markets. Finance Research Letters 27, 208–213. DOI: 10.1016/j.frl.2018.03.020.

Simon, H. A. (1957). Models of man; social and rational.


Smith, J. A., McPherson, M. and Smith-Lovin, L. (2014). Social distance in the United States: Sex, race, religion, age, and education homophily among confidants, 1985 to 2004. American Sociological Review 79.3, 432–456.

Smoot, M. E., Ono, K., Ruscheinski, J., Wang, P.-L. and Ideker, T. (2010). Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27.3, 431–432.

Snow, N. M. and Rasso, J. (2017). If the tweet fits: How investors process financial information received via social media.

Stein, J. C. (2008). Conversations among competitors. American Economic Review 98.5, 2150–62.

Streiner, D. L. and Norman, G. R. (2011). Correction for multiple testing: is there a resolution? Chest 140.1, 16–18.

Thaler, R. H. and Ganser, L. (2015). Misbehaving: The making of behavioral economics. WW Norton New York.

Tumminello, M., Aste, T., Di Matteo, T. and Mantegna, R. N. (2005). A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences of the United States of America 102.30, 10421–10426.

Tumminello, M., Coronnello, C., Lillo, F., Miccichè, S. and Mantegna, R. N. (2007). Spanning trees and bootstrap reliability estimation in correlation-based networks. International Journal of Bifurcation and Chaos 17.07, 2319–2329.

Tumminello, M., Lillo, F., Piilo, J. and Mantegna, R. N. (2012). Identification of clusters of investors from their real trading activity in a financial market. New Journal of Physics 14.1, 013041.

Tumminello, M., Miccichè, S., Lillo, F., Piilo, J. and Mantegna, R. N. (2011). Statistically validated networks in bipartite complex systems. PloS one 6.3, e17994.

Tumminello, M., Miccichè, S., Lillo, F., Varho, J., Piilo, J. and Mantegna, R. N. (2011). Community characterization of heterogeneous complex systems. Journal of Statistical Mechanics: Theory and Experiment 2011.01, P01019.

Tumminello, M., Miccichè, S., Varho, J., Piilo, J. and Mantegna, R. N. (2013). Quantitative Analysis of Gender Stereotypes and Information Aggregation in a National election. PloS one 8.3, e58910.

Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. science 185.4157, 1124–1131.


Vincenty, T. (1975). Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey review 23.176, 88–93.

Vitali, S., Glattfelder, J. B. and Battiston, S. (2011). The network of global corporate control. PloS one 6.10, e25995.

Walden, J. (2014). Trading, profits, and volatility in a dynamic information network model. Available at SSRN 2561055.

Walters-Williams, J. and Li, Y. (2009). Estimation of mutual information: A survey. International Conference on Rough Sets and Knowledge Technology. Springer, 389–396.

Wooldridge, J. M. (2015). Introductory econometrics: A modern approach. Nelson Education.

Zhong, R., Allen, J. D., Xiao, G. and Xie, Y. (2014). Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PloS one 9.11, e106319.


PUBLICATIONS


PUBLICATION I

Computational Analysis of the structural properties of Economic and Financial Networks

Emmert-Streib, F., Musa, A., Baltakys, K., Kanniainen, J., Tripathi, S., Yli-Harja, O., Jodlbauer, H. and Dehmer, M.

Journal of Network Theory in Finance 4.3 (2018), 1–32
DOI: 10.21314/JNTF.2018.043

Publication reprinted with the permission of the copyright holders


Journal of Network Theory in Finance 4(3), 1–32
DOI: 10.21314/JNTF.2018.043

Copyright Infopro Digital Limited 2018. All rights reserved. You may share using our article tools. This article may be printed for the sole use of the Authorised User (named subscriber), as outlined in our terms and conditions. https://www.infopro-insight.com/termsconditions/insight-subscriptions

Research Paper

Computational analysis of structural properties of economic and financial networks

Frank Emmert-Streib (1,2), Aliyu Musa (1,2), Kestutis Baltakys (3), Juho Kanniainen (3), Shailesh Tripathi (1), Olli Yli-Harja (2,4), Herbert Jodlbauer (5) and Matthias Dehmer (5,6,7)

(1) Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Korkeakoulunkatu 3, 33720 Tampere, Finland; emails: [email protected], [email protected]
[…] of Biosciences and Medical Technology, Tampere University of Technology, Korkeakoulunkatu 3, 33720 Tampere, Finland; emails: [email protected], [email protected]
[…] and Information Management, Tampere University of Technology, Korkeakoulunkatu 3, 33720 Tampere, Finland; emails: [email protected], [email protected]
[…] for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109, USA
(5) Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr Campus, Wehrgrabengasse 1–3, 4400 Steyr, Austria; emails: [email protected], [email protected]
[…] for Biomedical Computer Science and Mechatronics, UMIT – The Health and Lifesciences University, Eduard-Wallnöfer-Zentrum 1, 6060 Hall in Tirol, Austria
(7) College of Computer and Control Engineering, Nankai University, 38 Tongyan Road, Tianjin 300350, People’s Republic of China

(Received October 12, 2017; revised April 23, 2018; accepted May 15, 2018)

ABSTRACT

In recent years, methods from network science have been rapidly gaining traction in economics and finance. One reason for this is that, in a globalized world, it is crucial that we understand the interconnections between economic and financial entities; networks provide a natural framework for representing and studying such systems. In this paper, we survey the use of networks and network-based methods to study economy-related questions. We start with a brief overview of graph theory and some basic definitions. Then, we discuss descriptive network measures and network complexity measures for quantifying the structural properties of economic networks. Finally, we discuss different network and tree structures as well as their relevance to certain applications.

Corresponding author: F. Emmert-Streib. Print ISSN 2055-7795 | Online ISSN 2055-7809. © 2018 Infopro Digital Risk (IP) Limited

Keywords: economic networks; network science; social networks; economy; econometrics; computational social science; financial networks.

1 INTRODUCTION

In the field of economics, there has been increasing interest in recent years in investigating financial, economic, production and investment markets by means of networks. One reason for this interest is that a network, also called a graph, allows for the convenient mathematical representation and analysis of a system with many interacting entities. This flexibility is considerable enough to accommodate all different types of economic networks in existence, eg, interbank, investment, director, ownership, financial, product and trade networks (Arnold et al 2006; Boss et al 2004; Degryse and Nguyen 2004; Dhar et al 2014; Hochberg et al 2007; Qiu et al 2010; Roukny et al 2014; Vitali et al 2011).

The study of graphs and networks is a long-standing tradition, beginning with Euler (1736) and Cayley (1857), whose theories were formalized by König in the 1930s (König 1936), and later finding utility in interdisciplinary applications to mathematics (Brandstädt et al 1999; Diestel 2000; Erdős and Rényi 1959; Harary 1967), computer science (Cormen et al 2001; Even 1979), physics (Harary 1967), biology (Emmert-Streib and Dehmer 2008; Kauffman 1969; Palsson 2006; Roberts 1989) and sociology (Harary 1959; Scott 2001; Wasserman and Faust 1994). Despite this, the study of economic networks lags behind these other fields. One reason for this might be the difficulty in constructing economic networks. For instance, while it is relatively easy to observe the acquaintanceships among people or the molecular composition of chemicals that lead to social or chemical networks, pinpointing the effect of one stock on another in order to construct a financial network is considerably more difficult; an appropriate statistical method and a data set are necessary to accomplish this. Fortunately, technological progress and the emergence of a digital society are allowing us to tackle this problem.

Before we start with our review, we want to mention a couple of possible applications for such a network. According to Hopp and Spearman (2011), a production system is a network of interacting parts for which managing those interactions is as important as, if not more important than, managing individual processes and entities. Graph theory is a powerful approach to modeling these interactions.

Journal of Network Theory in Finance www.risk.net/journals


Historically well-known methods are the program evaluation and review technique (PERT), developed by Kelly and Walker for the US Navy, and the critical path method (CPM), developed by Kelley et al (1959). Both methods help managers to schedule, monitor and control large, complex projects. In operations, several complex projects may occur: for instance, optimal resource allocation for the different flight rates of a space shuttle (Heileman et al 1992), or shutdown management and scheduling maintenance (Roberts and Mask 1992). Good surveys of the resource-constrained project scheduling problem are given in Hartmann and Briskorn (2010), Pinedo (2012), Kalinowski et al (2014) and Morinaga et al (2016).

Similarly to project schedules in manufacturing, graphs can be used to describe complex precedence constraints and production schedules (Kalinowski et al 2014; Pinedo 2012). Gantt charts, widely used in production planning, have an equivalent graph representation. For flow shops both with and without unlimited intermediate storage, a directed graph can be applied for the computation of different objectives such as makespan, tardiness or number of tardy jobs. For job shops, disjunctive graphs or bipartite graphs are suitable to model the minimization of the completion time. Assembly line balancing is a specific problem in production planning. Kilbridge and Wester (1961) developed a heuristic diagram of work elements based on precedence.

Network location models (Amin and Zhang 2013) support the task of locating one or more new facilities in an existing network in order to minimize multiple objectives, for instance, some function of the distance separating the new and existing facilities. Systematic layout planning tries to find an optimal plant layout that balances technological limitations, organizational policies, safety considerations, space requirements, availability, and product and process constraints by finding a maximally planar-weighted graph (Morinaga et al 2016).

The purpose of this paper is to highlight the potential of structural analyses of economic networks by reviewing approaches from graph theory and network science. We start in Section 2 by providing the necessary preliminaries from graph theory. In Section 3, we review local and global descriptive network measures, while in Section 4 we discuss measures that quantify the structural complexity of networks. Section 5 gives an overview of important network and tree classes that are useful for the study of economic networks. This paper finishes in Sections 6 and 7 with a discussion of potential future directions and conclusions.

2 SETTING THE FRAMEWORK FROM GRAPH THEORY

Before we begin surveying the analysis methods of economic networks, we need to provide the necessary preliminaries from graph theory. We start with basic definitions for undirected and directed graphs (Bang-Jensen and Gutin 2002; Harary 1969).


Definition 2.1 The pair $G = (V, E)$, where $V$ represents a finite set of vertexes and $E \subseteq \binom{V}{2}$ is the set of edges, is called a finite undirected graph.

Throughout this paper, we denote the cardinality of the vertex set by $|V| := N$. The cardinality of the edge set is denoted by $|E|$. In the following, we write $N(G)$ and $|E(G)|$ instead of $N$ and $|E|$ when it is necessary to emphasize that we are referring to a specific graph $G$.

Definition 2.2 $\mathcal{G}(N)$ denotes the set of undirected graphs having $N$ vertexes.

Definition 2.3 The pair $G = (V, E)$, where $V$ represents a finite set of vertexes and $E \subseteq V \times V$ is the set of edges, is called a finite directed graph.

We emphasize that, in this paper, we are only considering graphs with finite vertex sets. Hence, their edge sets are also finite. For this reason, these graphs are called finite graphs (Harary 1969). In contrast, infinite graphs possess both infinite vertex sets and infinite edge sets. They have been investigated, for example, when studying growth models for the world wide web, birth and death processes, random graph models, and mathematical symmetry using Cayley graphs (Bollobás 1998; Chakrabarti 2002; Erdős and Rényi 1960; Harary 1969).

We remark that if $G = (V, E)$ is allowed to have loops (reflexive edges) and parallel edges, then $G$ is called a multigraph (Gross and Yellen 2006; Harary 1969). In contrast, hypergraphs (Berge 1989) are generalizations of the ordinary notion of a graph, which we just introduced. Specifically, for an ordinary graph (see Definitions 2.1 and 2.3), an edge connects exactly two vertexes, whereas a hyperedge can connect any number of vertexes (see Berge 1989). Graphs that possess directed hyperedges are called directed hypergraphs and have been defined by Gallo et al (1993).

A very important graph class is labeled graphs (Harary 1969). These have been used to model complex structures in various scientific disciplines, such as biology (Felsenstein 2003; Foulds 1992; Semple and Steel 2003), chemistry (Devillers and Balaban 1999; Trinajstić 1992), sociology (Wasserman and Faust 1994) and mathematical psychology (Sommerfeld 1994; Sommerfeld and Sobik 1994).

Definition 2.4 Let

$$A_V^G := \{l_v^1, l_v^2, \ldots, l_v^{|A_V^G|}\} \tag{2.1}$$

and

$$A_E^G := \{l_e^1, l_e^2, \ldots, l_e^{|A_E^G|}\} \tag{2.2}$$

be unique (finite) vertex and edge alphabets, respectively. $l_V : V \to A_V^G$ and $l_E : E \to A_E^G$ are the corresponding vertex and edge labeling functions. Then

$$G := (V, E, l_V, l_E) \tag{2.3}$$

is called a finite, labeled graph.


FIGURE 1 Representation of a network G by an adjacency matrix A.

(a) Adjacency matrix of G. (b) Network G. Due to the undirectedness of G, the matrix A is symmetrical.

To represent a graph or a network practically, the so-called adjacency matrix can be used (Harary 1969).

Definition 2.5 The adjacency matrix of a finite graph $G = (V, E)$ is defined by

$$A_{ij} := \begin{cases} 1 & \text{if } (v_i, v_j) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{2.4}$$

In Figure 1, we show the adjacency matrix $A$ (part (a)) of an undirected network $G$ (part (b)). In this case, the matrix $A$ is symmetric.
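As a minimal sketch of Definition 2.5, the following Python snippet builds the adjacency matrix of a small undirected graph from an edge list and checks its symmetry. The function names and the 4-vertex edge list are illustrative assumptions; they do not reproduce the network of Figure 1.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix of an undirected graph.

    Vertexes are numbered 0..n-1; each edge is an unordered pair (i, j).
    """
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # undirected: mirror every entry
    return a


def is_symmetric(a):
    """Check A[i][j] == A[j][i] for all i, j."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))


# Hypothetical 4-vertex example (an assumption, not the graph of Figure 1).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(4, EDGES)
for row in A:
    print(row)
print("symmetric:", is_symmetric(A))
```

For a directed graph in the sense of Definition 2.3, one would drop the mirrored assignment, and the matrix would in general no longer be symmetric.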

Based on the adjacency matrix of a network, eigenvalues (Cvetković et al 1997; Harary 1969) can be defined.

Definition 2.6 The spectrum of $G$ consists of the sets $M_\lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_k\}$ and $M = \{n_1, n_2, \ldots, n_k\}$. Here, $n_i$ denotes the multiplicity of the zero $\lambda_i$ of the characteristic polynomial of $G$, ie, of the equation $\det(A - \lambda U) = 0$, where $A$ is the adjacency matrix and $U$ is the unit matrix.

Examples of eigenvalue-based models that have been applied in the context of economic networks can be found in König and Battiston (2009).

3 DESCRIPTIVE NETWORK MEASURES

In the following sections, we present quantitative network measures that allow us to perform a descriptive network analysis. Many of the measures have their origin in the social, chemical or information sciences (Allen 2002; Bonchev 1983; Bonchev and Rouvray 2005; Wasserman and Faust 1994). If not otherwise stated, we are assuming that the networks have undirected edges.

3.1 Node degree and degree distribution

The degree $k_i$ of a vertex $i$ is the number of edges that are incident with vertex $i$. This is given by

$$k_i = \sum_j A_{ij}. \tag{3.1}$$

From this, the degree distribution (Bornholdt and Schuster 2003; Mason and Verwoerd 2007) is obtained by

$$P(k) := \frac{\delta_k}{N}, \tag{3.2}$$

where $\delta_k$ denotes the number of vertexes in the network $G$ having a degree of $k$, and $N$ is the total number of nodes. Equation (3.2) corresponds to the proportion of vertexes in $G$ having a degree of $k$. Formally, $\delta_k$ can be written as

$$\delta_k = \sum_{i=1}^{N} I(k_i = k), \tag{3.3}$$

where $I(\cdot)$ is the indicator function giving 1 for a true argument and 0 otherwise. Another meaning of (3.2) is that a randomly chosen vertex in the network has a probability $P(k)$ of being linked with $k$ other vertexes.

It is an interesting and important finding that the degrees of many real world networks such as the world wide web, the internet, social networks, citation networks and food webs (Adamic and Huberman 2000; Bornholdt and Schuster 2003; Brandes and Erlebach 2005) are not Poisson distributed like those of random networks (see Section 5.1 for a detailed discussion of random networks). Rather, they follow a power law distribution, ie,

$$P(k) \sim k^{-\gamma}, \quad \gamma > 1. \tag{3.4}$$

In contrast to the above measures that characterize the properties of individual nodes, there are measures that characterize the whole network. For instance, the average degree for the entire network is

$$\bar{k} = \bar{k}(G) := \sum_{v \in V} \frac{k_v}{N}. \tag{3.5}$$

Finally, the edge density of $G$ is defined by

$$\beta(G) := \frac{|E|}{\binom{N}{2}}. \tag{3.6}$$


FIGURE 2 An example for a clustering coefficient.

The node $v_i$ has $n_i = 5$ connections and $e_i = 3$. This results in $C_i = 3/10$.

Here, the denominator gives the total number of possible edges for a network with $N$ nodes, which corresponds to a fully connected network. Further network statistics and advanced aspects can be found in, for example, Brinkmeier and Schank (2005) and Skorobogatov and Dobrynin (1988).
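The degree-based quantities of this section can be sketched in a few lines of Python. The snippet below computes the degrees (3.1), the degree distribution (3.2), the average degree (3.5) and the edge density (3.6) from an adjacency matrix. The function names and the small example matrix are assumptions made for illustration only; exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction


def degrees(a):
    """Degree k_i of every vertex: the row sums of the adjacency matrix (3.1)."""
    return [sum(row) for row in a]


def degree_distribution(a):
    """P(k) = delta_k / N as in (3.2), returned as a dict {k: P(k)}."""
    n = len(a)
    delta = Counter(degrees(a))  # delta[k] = number of vertexes of degree k (3.3)
    return {k: Fraction(count, n) for k, count in delta.items()}


def average_degree(a):
    """Average degree of the whole network (3.5)."""
    return Fraction(sum(degrees(a)), len(a))


def edge_density(a):
    """|E| divided by the number of possible edges N choose 2 (3.6)."""
    n = len(a)
    num_edges = sum(degrees(a)) // 2  # every undirected edge is counted twice
    return Fraction(num_edges, n * (n - 1) // 2)


# Hypothetical undirected 4-vertex network (an assumption, not from the paper).
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(degrees(A), degree_distribution(A), average_degree(A), edge_density(A))
```

In this example the four edges out of six possible ones give an edge density of 2/3, and the probabilities P(k) sum to one, as they must for a distribution.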

3.2 Clustering coefficient

The clustering coefficient $C_i$ is a local measure (Watts and Strogatz 1998) defined for every vertex $i$. For an undirected network, it is defined by

$$C_i = \frac{2 e_i}{n_i (n_i - 1)}, \tag{3.7}$$

where $n_i$ is the number of neighbors of vertex $i$, and $e_i$ is the number of adjacent pairs between all neighbors of $i$ (Watts and Strogatz 1998). In Figure 2, we give an example.
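To make (3.7) concrete, the sketch below evaluates the clustering coefficient on a hypothetical network in which vertex 0 has $n_i = 5$ neighbors with $e_i = 3$ edges among them, matching the numbers quoted for Figure 2. The concrete edges are our own assumption, since the figure itself is not reproduced here, and returning 0 for vertexes with fewer than two neighbors is a convention we chose for the sketch.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """Clustering coefficient C_i = 2 e_i / (n_i (n_i - 1)) of vertex i.

    adj maps every vertex to the set of its neighbors (undirected graph).
    """
    neighbors = adj[i]
    n_i = len(neighbors)
    if n_i < 2:
        return 0.0  # assumed convention: undefined cases are mapped to 0
    # e_i: number of neighbor pairs that are themselves adjacent
    e_i = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * e_i / (n_i * (n_i - 1))


# Hypothetical network: vertex 0 has five neighbors, and exactly three
# edges, (1, 2), (2, 3) and (4, 5), connect pairs of those neighbors.
ADJ = {
    0: {1, 2, 3, 4, 5},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
    4: {0, 5},
    5: {0, 4},
}
print(clustering_coefficient(ADJ, 0))
```

Of the ten possible pairs among the five neighbors of vertex 0, three are connected, which gives the value 3/10 stated in the caption of Figure 2.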

3.3 Path-based measures

The next type of measure involves more than one node for its calculation, being based on the path between nodes (Brinkmeier and Schank 2005; Buckley and Harary 1990; Halin 1989; Skorobogatov and Dobrynin 1988).

Let $G = (V, E)$ be a connected graph and let

$$(d(v_i, v_j))_{v_i, v_j \in V} \tag{3.8}$$

be the distance matrix, where $d(v_i, v_j)$ denotes the distance (length of the shortest path) between the nodes $v_i$ and $v_j$. From this, the average distance of a network follows:

$$\bar{d}(G) := \frac{1}{\binom{N}{2}} \sum_{1 \le i < j \le N} d(v_i, v_j). \tag{3.9}$$

Additional graph metrics (Skorobogatov and Dobrynin 1988) based on the distance matrix are

$$\sigma(v) = \max_{u \in V} d(u, v), \tag{3.10}$$

$$\rho(G) = \max_{v \in V} \sigma(v) \tag{3.11}$$

and

$$r(G) = \min_{v \in V} \sigma(v). \tag{3.12}$$

The above entity $\sigma(v)$ is called the eccentricity of $v \in V$; $\rho(G)$ is the diameter of $G$, and $r(G)$ is the radius of the graph.
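The path-based quantities (3.9) to (3.12) can be illustrated with a breadth-first search, which yields shortest-path lengths in unweighted graphs. The following sketch assumes a connected, undirected graph given as a dictionary of neighbor sets; the function names and the path graph used as an example are our own choices.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (number of edges) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distance_summary(adj):
    """Eccentricities, diameter, radius and average distance of a connected graph."""
    nodes = sorted(adj)
    dist = {v: bfs_distances(adj, v) for v in nodes}
    ecc = {v: max(dist[v].values()) for v in nodes}  # (3.10)
    n = len(nodes)
    # Sum d(v_i, v_j) over unordered pairs i < j, as in (3.9)
    total = sum(dist[u][v] for idx, u in enumerate(nodes) for v in nodes[idx + 1:])
    return {
        "eccentricity": ecc,
        "diameter": max(ecc.values()),  # (3.11)
        "radius": min(ecc.values()),    # (3.12)
        "average_distance": total / (n * (n - 1) / 2),
    }


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
PATH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
SUMMARY = distance_summary(PATH)
print(SUMMARY)
```

For this path graph the end vertexes have eccentricity 3 and the inner vertexes eccentricity 2, so the diameter is 3 and the radius is 2.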

3.4 Network centrality measures: identifying important nodes

There is a large family of measures called centrality measures (Freeman 1977) that have their origin in the social sciences (Hage and Harary 1995; Wasserman and Faust 1994). The goal of these measures is to identify nodes in networks that are important in terms of communication.

Conceptually, one distinguishes between two fundamentally different types of centrality measure (Freeman 1977, 1979). The first are point centrality measures, while the second are graph centrality measures. The difference is that the former characterize the local properties of a graph, whereas the latter characterize global properties.

For an undirected graph $G = (V, E)$, the degree centrality of a vertex $v \in V$ is defined as its degree, ie,

$$C_D(v) = k_v. \tag{3.13}$$

The next measure $C_B(v_k)$ is called betweenness centrality:

$$C_B(v_k) = \sum_{v_i, v_j \in V,\; v_i \neq v_j} \frac{\sigma_{v_i v_j}(v_k)}{\sigma_{v_i v_j}}. \tag{3.14}$$

$C_B(v_k)$ is based on distances (see, for example, Freeman 1977; Sabidussi 1966; Scott 2001). Here, $\sigma_{v_i v_j}$ stands for the number of shortest paths from $v_i$ to $v_j$, and $\sigma_{v_i v_j}(v_k)$ is the number of shortest paths from $v_i$ to $v_j$ that include $v_k$. Thus, the quantity

$$\frac{\sigma_{v_i v_j}(v_k)}{\sigma_{v_i v_j}} \tag{3.15}$$

can be interpreted as the probability that $v_k$ lies on a shortest path connecting $v_i$ with $v_j$. Consequently, $C_B(v_k)$ quantifies how often $v_k$ appears on the shortest paths in the corresponding network. In Figure 3, we show a visualization of $C_B(v_k)$.

FIGURE 3 Visualization of the betweenness centrality measure.

In the depicted example, $\sigma_{v_i, v_j} = 13$ and $\sigma_{v_i, v_j}(v_k) = 4$. The gray nodes are further nodes in the network.
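A minimal sketch of (3.14) is given below. It counts shortest paths with a breadth-first search and uses the fact that a shortest path from $v_i$ to $v_j$ passes through $v_k$ exactly when $d(v_i, v_k) + d(v_k, v_j) = d(v_i, v_j)$, in which case the number of such paths is the product of the path counts of the two segments. The sketch assumes a connected, undirected graph and, as a convention of our own, sums over unordered pairs that do not contain $v_k$ itself; other conventions differ by a constant factor.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """Return (dist, count): distances and numbers of shortest paths from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w on a shortest path
                count[w] += count[u]
    return dist, count


def betweenness(adj, k):
    """Betweenness centrality of vertex k in a connected, undirected graph."""
    nodes = sorted(adj)
    info = {v: shortest_path_counts(adj, v) for v in nodes}
    dist_k, count_k = info[k]
    total = 0.0
    for idx, i in enumerate(nodes):
        for j in nodes[idx + 1:]:
            if k in (i, j):
                continue  # assumed convention: k is not an endpoint
            dist_i, count_i = info[i]
            if dist_i[k] + dist_k[j] == dist_i[j]:
                # sigma_ij(k) / sigma_ij, see (3.15)
                total += count_i[k] * count_k[j] / count_i[j]
    return total


# Hypothetical examples: a star with center 0 and a cycle of four vertexes.
STAR = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
CYCLE = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(betweenness(STAR, 0), betweenness(CYCLE, 1))
```

In the star, every shortest path between two leaves runs through the center, so the center obtains the value 3 and each leaf the value 0. In the cycle, vertex 1 lies on one of the two shortest paths between vertexes 0 and 2, which contributes 1/2.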

Another well-known measure is the centrality index called closeness:

$$C_C(v_k) = \frac{1}{\sum_{i=1}^{N} d(v_k, v_i)}. \tag{3.16}$$

Here, $d(v_k, v_i)$ denotes the number of edges on the shortest path between $v_k$ and $v_i$. If there are multiple shortest paths connecting $v_k$ with $v_i$, $d(v_k, v_i)$ remains unchanged. Note that $C_C(v_k)$ can be used to evaluate how close a vertex is to other vertexes in a given network.
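Closeness as defined in (3.16) only requires the distances from one vertex to all others. The sketch below, again assuming a connected, undirected graph stored as a dictionary of neighbor sets (names and example are our own), obtains them by breadth-first search.

```python
from collections import deque


def closeness(adj, k):
    """Closeness C_C(v_k) = 1 / sum_i d(v_k, v_i) for a connected graph."""
    dist = {k: 0}
    queue = deque([k])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return 1 / sum(dist.values())  # d(v_k, v_k) = 0 does not contribute


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
PATH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(closeness(PATH, 0), closeness(PATH, 1))
```

The inner vertex 1 has distances 1, 1 and 2 to the other vertexes and therefore a closeness of 1/4, whereas the end vertex 0 has distances 1, 2 and 3 and a closeness of 1/6; the more central vertex receives the larger value.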

The previously mentioned measures are local centrality measures because they determine the centrality of a single vertex within a network. In contrast, we now present the definition of a global measure called graph centrality. Here, the crucial idea is to use these individual measures to obtain an average characteristic for the whole network:

$$C_x = \frac{\sum_{i=1}^{N} (C_x(v_m) - C_x(v_i))}{C_x^{\max}}. \tag{3.17}$$

Here, $x$ denotes any of the three point (local) centrality measures,

$$C_x(v_m) = \max_i \{C_x(v_i)\} \tag{3.18}$$

is the maximum of $C_x(v_i)$ determined for the given network, and $C_x^{\max}$ denotes the maximal value possible for $G \in \mathcal{G}(N)$ (see Definition 2.2):

$$C_x^{\max} = \max_{G \in \mathcal{G}(N)} \sum_{i=1}^{N} (C_x(v_m) - C_x(v_i)). \tag{3.19}$$

As special graph centrality measures, we obtain (Brandes and Erlebach 2005; Freeman 1977; Wasserman and Faust 1994)

$$C_d = \frac{\sum_{i=1}^{N} (C_d(v_m) - C_d(v_i))}{N^2 - 3N + 2}, \tag{3.20}$$

$$C_b = \frac{\sum_{i=1}^{N} (C_b(v_m) - C_b(v_i))}{N^3 - 4N^2 + 5N - 2} \tag{3.21}$$

and

$$C_c = \frac{2N - 3}{N^3 - 4N^2 + 5N - 2} \sum_{i=1}^{N} (C_c(v_m) - C_c(v_i)). \tag{3.22}$$

Further details and applications of these measures can be found in Brandes and Erlebach (2005), Freeman (1977) and Wasserman and Faust (1994).
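As an illustration of the degree-based case (3.20), the following sketch computes the graph centrality $C_d$ for two hypothetical networks. The function name and the examples are assumptions; the normalization $N^2 - 3N + 2 = (N - 1)(N - 2)$ requires at least three vertexes.

```python
from fractions import Fraction


def degree_centralization(adj):
    """Graph centrality C_d of (3.20) for an undirected graph with N >= 3 vertexes."""
    n = len(adj)
    degs = [len(neighbors) for neighbors in adj.values()]
    k_max = max(degs)  # C_d(v_m), the largest degree centrality in the network
    return Fraction(sum(k_max - k for k in degs), n * n - 3 * n + 2)


# Hypothetical examples: a star and a cycle, each with five vertexes.
STAR = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
CYCLE = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
print(degree_centralization(STAR), degree_centralization(CYCLE))
```

The star attains the maximal value 1, because one vertex dominates all others, while the cycle, in which every vertex has the same degree, yields 0. This matches the interpretation of $C_d$ as a measure of how unevenly centrality is spread over the network.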

Aside from the measures presented so far, which are classic centrality measures, there are several extended measures. For instance, Bonacich (1972) introduced the eigenvector centrality:

$$C_e = x_{\max} = \frac{1}{\lambda_{\max}} A x_{\max}. \tag{3.23}$$

The purpose of this measure is to express that an important vertex is connected to important neighbors. To calculate $C_e$, one needs to determine the eigenvector of the underlying adjacency matrix $A$ of a graph $G$ corresponding to the largest eigenvalue. Let us assume $\lambda_{\max}$ denotes this largest eigenvalue and $x_{\max}$ denotes the corresponding eigenvector. It is important to note that $C_e$ is a point centrality measure, because each vertex in the network obtains the value of its own component of $C_e$. Further eigenvector centrality measures have been investigated in Koschützki et al (2005).
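One common way to obtain $x_{\max}$ numerically is power iteration; this is a technique we substitute here for illustration, not one prescribed by the text. The sketch below assumes a connected, undirected graph and iterates with $A + U$ instead of $A$, which has the same eigenvectors but avoids the oscillation that plain power iteration shows on bipartite graphs. The function name, the tolerance and the example are our own assumptions.

```python
import math


def eigenvector_centrality(a, iterations=1000, tol=1e-12):
    """Return (lambda_max, x_max) of a symmetric adjacency matrix by power iteration.

    The vector x_max is normalized to Euclidean length 1.
    """
    n = len(a)
    x = [1.0] * n
    for _ in range(iterations):
        # One multiplication with A + U (adjacency matrix plus unit matrix)
        y = [x[i] + sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(y[i] - x[i]) for i in range(n)) < tol
        x = y
        if converged:
            break
    # Rayleigh quotient x^T A x recovers the largest eigenvalue of A
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(x[i] * ax[i] for i in range(n))
    return lam, x


# Hypothetical example: a star with center 0 and three leaves.
STAR = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
LAM, X = eigenvector_centrality(STAR)
print(LAM, X)
```

For this star the largest eigenvalue is $\sqrt{3}$, and the center receives a larger component than the leaves, in line with the idea that a vertex is important when its neighbors are.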

A conceptual extension of betweenness centrality (Shuja 2016) has been provided by joint betweenness centrality (JBC) (Emmert-Streib 2007). JBC is a nonlocal measure because it quantifies the number of paths that flow through pairs of nodes in a network. This centrality measure is defined by

$$C_{jb}(v_m, v_n) = \sum_{v_i, v_j \in V,\; v_i \neq v_j} \frac{\sigma_{v_i, v_j}(v_m, v_n)}{\sigma_{v_i, v_j}}. \tag{3.24}$$

JBC evaluates the joint occurrence of two vertexes on the shortest communication paths in the network. Here, $\sigma_{v_i, v_j}$ gives the number of shortest paths connecting $v_i$ with $v_j$, and $\sigma_{v_i, v_j}(v_m, v_n)$ gives the number of shortest paths connecting $v_i$ with $v_j$ that contain the vertexes $v_m$ and $v_n$. In Figure 4, we show a visualization of the JBC measure. Further application examples of betweenness centrality and other variants in the context of the economy can be found in Shuja (2016).

FIGURE 4 Visualization of the JBC measure.

In the depicted example, $\sigma_{v_i, v_j} = 14$ and $\sigma_{v_i, v_j}(v_m, v_n) = 4$. The gray nodes are further nodes in the network that are different to $v_m$ and $v_n$.

For the general application of centrality measures, normalizations have been found to be useful. For instance, the analysis in Emmert-Streib (2007) used the following normalization:

$$C_{jb}(v_m, v_n) = \sum_{v_i, v_j \in V,\; v_i \neq v_j} \frac{\sigma_{v_i, v_j}(v_m, v_n)}{\sigma_{\max}}, \tag{3.25}$$

where

$$\sigma_{\max} = \max_{v_i, v_j} \{\sigma_{v_i, v_j}\}. \tag{3.26}$$

Examples of economic networks that have been analyzed by applying network centrality measures are prevalent. For instance, network centrality measures have been studied to identify systemically important financial institutions in the Turkish interbank market (Kuzubas et al 2014). Specifically, those authors investigated the main borrower role of Demirbank in the crash of the banking system in 2000. A similar study of interbank networks can be found in Temizsoy et al (2016), where data from the eMID market in the euro area and the United States has been analyzed. Another study using centrality measures can be found in Sharma et al (2017). That study aimed to identify the core economic sectors of twenty countries worldwide, providing a link between financial networks and the underlying economic fundamentals. For their analysis, Sharma et al (2017) utilized eigenvector centrality. For further examples of applications to economic networks, please see Giudici and Spelta (2016), Hakeem and Suzuki (2017) and Vitali and Battiston (2014).

4 NETWORK COMPLEXITY

In contrast to the quantitative measures described so far, network complexity measures, which will be discussed in this section, evaluate the network as a whole. Here, the term “complexity” is in general broadly defined, but it often refers to the well-known Kolmogorov complexity (Kolmogorov 1965; Li and Vitányi 1997). The underlying idea of network complexity measures is to assess the structural complexity as expressed by the intricate linking or branching structure of a graph.

In general, economic networks can be represented by undirected or directed networks. However, both are topological networks, which are amenable to structural analysis in the form of network complexity. The network complexity measures we are going to discuss in this section are quantitative measures. That means an economic network will be mapped to a real number in order to determine the complexity thereof. This value can be seen as an index characterizing the network.

4.1 Network complexity based on information theory

An important class of network complexity measures that is of relevance for practical applications is based on information theory. Information-theoretic complexity measures have been applied to many scientific areas, such as biology, computer science and chemistry (Bonchev 1983; Dehmer and Emmert-Streib 2008; Emmert-Streib and Dehmer 2007; Mehler 2009). In the following, we review some of the most important measures from this area.

We start with a network $G$, where $X$ is a graph invariant and $\tau$ is an equivalence criterion. This leads to distributions such as that in (Bonchev 1983):

$$\begin{pmatrix} 1 & 2 & \cdots & k \\ |X_1| & |X_2| & \cdots & |X_k| \\ p_1 & p_2 & \cdots & p_k \end{pmatrix}. \tag{4.1}$$

The first row stands for the equivalence classes, while the second row contains the cardinalities of the obtained partitions. From this, we calculate probabilities by

$$p_i = \frac{|X_i|}{|X|}$$

for each partition corresponding to the third row of the matrix. That means $P_G = (p_1, \ldots, p_k)$ represents a probability distribution of $G$. Using the well-known Shannon entropy (Shannon and Weaver 1997), one obtains

$$I(G, \tau) = |X| \log(|X|) - \sum_{i=1}^{k} |X_i| \log(|X_i|), \tag{4.2}$$

$$\bar{I}(G, \tau) = -\sum_{i=1}^{k} \frac{|X_i|}{|X|} \log\left(\frac{|X_i|}{|X|}\right). \tag{4.3}$$

Equation (4.2) stands for the so-called total information content of $G$, whereas (4.3) is the mean information content. This means that, once we have a given economic network, these two measures can be computed in a straightforward manner.
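The two measures (4.2) and (4.3) can be sketched as follows. As an assumption for illustration, we take the vertex degree as the graph invariant, treat vertexes of equal degree as equivalent, and use logarithms to base 2; the text fixes none of these choices. The function name and the example degrees are likewise our own.

```python
import math
from collections import Counter


def information_content(values):
    """Total (4.2) and mean (4.3) information content of a partition.

    values lists the invariant of every element of X; equal values
    fall into the same equivalence class X_i.
    """
    sizes = list(Counter(values).values())  # the cardinalities |X_i|
    total = sum(sizes)                      # |X|
    total_info = total * math.log2(total) - sum(s * math.log2(s) for s in sizes)
    mean_info = -sum((s / total) * math.log2(s / total) for s in sizes)
    return total_info, mean_info


# Hypothetical example: degrees of a 4-vertex network, giving the
# classes {two vertexes of degree 2}, {degree 3} and {degree 1}.
DEGREES = [2, 2, 3, 1]
TOTAL_INFO, MEAN_INFO = information_content(DEGREES)
print(TOTAL_INFO, MEAN_INFO)
```

Under these assumptions the example yields a total information content of 6 bits and a mean information content of 1.5 bits. The two values differ by the factor $|X| = 4$, which follows from rewriting (4.3) as $\log(|X|) - \sum_i (|X_i| / |X|) \log(|X_i|)$.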

Another method to determine the entropy of economic networks is due to Dehmer (2008). Instead of determining partitions using a graph invariant $X$, we assign a probability value to each vertex of a network. We do this by using an information functional $f$ that captures the structural information of $G$. So, if we apply Shannon’s entropy again, we obtain

$$
I_f(G) := -\sum_{i=1}^{N} \frac{f(v_i)}{\sum_{j=1}^{N} f(v_j)} \log\left(\frac{f(v_i)}{\sum_{j=1}^{N} f(v_j)}\right). \tag{4.4}
$$

Once we choose concrete information functionals, we obtain concrete graph entropy measures. Examples are information functionals based on the metrical properties of graphs, namely (Dehmer 2008; Dehmer et al 2009),

$$
f^{V_1}(v_i) := \alpha^{c_1 |S_1(v_i, G)| + c_2 |S_2(v_i, G)| + \cdots + c_{\rho(G)} |S_{\rho(G)}(v_i, G)|}, \quad c_k > 0,\ 1 \le k \le \rho(G),\ \alpha > 0. \tag{4.5}
$$
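Equations (4.4) and (4.5) can be sketched in a few lines of code. The sketch below assumes that $S_j(v_i, G)$ denotes the set of vertexes at distance $j$ from $v_i$ and that $\rho(G)$ is the diameter of a connected network; the base-2 logarithm, the value of $\alpha$, the weights and the two toy networks are hypothetical choices for illustration.

```python
import math
from collections import deque

def sphere_sizes(adj, source):
    """Return {j: |S_j(source, G)|}: number of vertexes at distance j >= 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    sizes = {}
    for d in dist.values():
        if d > 0:
            sizes[d] = sizes.get(d, 0) + 1
    return sizes

def functional_entropy(adj, c, alpha=2.0):
    """Entropy (4.4) with the sphere-based functional (4.5).

    c[j - 1] is the weight c_j; it must cover every occurring distance.
    """
    f = {}
    for v in adj:
        exponent = sum(c[j - 1] * size for j, size in sphere_sizes(adj, v).items())
        f[v] = alpha ** exponent
    total = sum(f.values())
    return -sum((x / total) * math.log2(x / total) for x in f.values())

# Hypothetical path network 0 - 1 - 2 - 3 (diameter 3), weights c_1 > c_2 > c_3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
entropy_path = functional_entropy(path, c=[3.0, 2.0, 1.0])

# In a 4-cycle every vertex sees the same spheres, so the entropy is maximal.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
entropy_cycle = functional_entropy(cycle, c=[3.0, 2.0, 1.0])
```

Under these assumptions the symmetric cycle attains the maximal value $\log_2 4$, while the path, whose end vertexes differ structurally from its inner vertexes, yields a smaller entropy.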

Note that the parameters $c_k > 0$ can be used to weight structural characteristics or differences of $G$ in each sphere. They need to be chosen such that they are all different, eg, $c_1 > c_2 > \cdots > c_{\rho(G)}$ (see Dehmer and Emmert-Streib 2008). It is evident that the choice of $c_k > 0$ has an impact on the resulting measured value. For special economic networks possessing special topological properties such as pathness or a large number of cycles, these parameters could be learned systematically. When applying graph entropy measures to hierarchical economic networks representing hierarchical business group graphs, Altomonte and Rungi (2013) generalized the work of Emmert-Streib and Dehmer (2007). They defined a new measure called “Group Index of Complexity” (GIC), which is given by

$$
\mathrm{GIC}(G) = \sum_{l}^{L} l \, \frac{n_l}{N} \log\left(\frac{N}{n_l}\right). \tag{4.6}
$$

Here, $L$ is the number of hierarchy levels, $n_l$ is the number of affiliates on hierarchical level $l$ and $N$ is the total number of affiliates in that network. Altomonte and Rungi (2013) found a negative correlation between vertical integration and the hierarchical complexity of business groups. Further, they determined a positive correlation between the hierarchical complexity of business groups and their productivity levels.

The last contribution we mention in this section is that of Bekiros et al (2017). They used information-theoretic quantities to measure the centrality of economic networks. An important result of this paper is that the authors found evidence of disparity in correlation and entropy-based centrality measurements for all markets between the pre- and post-crisis periods.
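To make (4.6) concrete, here is a minimal sketch that evaluates the GIC for two hypothetical business groups. It assumes that the hierarchy levels are numbered $l = 1, \ldots, L$ and that the natural logarithm is used; the function name and the affiliate counts are invented for illustration.

```python
import math

def group_index_of_complexity(affiliates_per_level):
    """GIC of (4.6); affiliates_per_level[l - 1] is n_l for level l = 1..L."""
    n_total = sum(affiliates_per_level)        # N
    return sum(
        level * (n_l / n_total) * math.log(n_total / n_l)
        for level, n_l in enumerate(affiliates_per_level, start=1)
        if n_l > 0                               # empty levels contribute nothing
    )

flat_group = group_index_of_complexity([6])          # all affiliates on level 1
deep_group = group_index_of_complexity([3, 2, 1])    # three hierarchy levels
```

With these assumptions a group whose affiliates all sit on one level has a GIC of zero, whereas spreading the same number of affiliates over deeper levels increases the index.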

5 DIFFERENT TYPES OF NETWORKS AND TREES

Network models have, in general, been useful for many reasons. In fact, networks enable an immediate visualization of the complex interrelations among important players of a system under consideration. Also, networks constitute a mathematical representation that can be analyzed rigorously. We start this section by discussing important network classes.

5.1 Random networks

Random networks are the first type of networks that have been studied extensively. For instance, the seminal works of Erdős and Rényi (1959) catalyzed this development. To put it simply, a random graph with $N$ vertexes can be obtained by connecting every pair of vertexes with a probability of $p$. The expected number of edges for an (undirected) network constructed in this way is

$$
E(n) = p \, \frac{N(N-1)}{2}. \tag{5.1}
$$

The degree distribution of a vertex $i$ in a random network follows a binomial distribution,

$$
P(k_i = k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}, \tag{5.2}
$$


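because the maximal degree of vertex $i$ is at most $N-1$, the probability that a given set of $k$ edges is present while the remaining $N-1-k$ possible edges are absent is $p^k (1-p)^{N-1-k}$, and there are $\binom{N-1}{k}$ possibilities to choose the $k$ neighbors among the other $N-1$ vertexes. By going to the limit $N \to \infty$, (5.2) gives

$$
P(k_i = k) \approx \frac{z^k \exp(-z)}{k!}. \tag{5.3}
$$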

Here, $z = p(N-1)$ is the expected number of edges for a vertex. This means that, for large $N$, the degree distribution of a vertex in a random network can be approximated by the Poisson distribution. For this reason, random networks are also called Poisson random networks (Newman 2003).
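The quality of the Poisson approximation can be checked numerically. The sketch below compares (5.2) with (5.3) for a fixed mean degree $z$ and two network sizes; the helper names, the value $z = 4$ and the cut-off `k_max` are assumptions chosen only for illustration.

```python
import math

def binomial_degree_pmf(k, n_vertices, p):
    """P(k_i = k) from (5.2)."""
    return math.comb(n_vertices - 1, k) * p**k * (1 - p) ** (n_vertices - 1 - k)

def poisson_degree_pmf(k, z):
    """Poisson approximation from (5.3)."""
    return z**k * math.exp(-z) / math.factorial(k)

def max_abs_gap(n_vertices, z, k_max=20):
    """Largest pointwise gap between (5.2) and (5.3) for k = 0..k_max."""
    p = z / (n_vertices - 1)
    return max(
        abs(binomial_degree_pmf(k, n_vertices, p) - poisson_degree_pmf(k, z))
        for k in range(k_max + 1)
    )

gap_small = max_abs_gap(n_vertices=50, z=4.0)
gap_large = max_abs_gap(n_vertices=5000, z=4.0)
```

Holding $z$ fixed while increasing $N$ shrinks the gap between the two distributions, which is the sense in which the limit $N \to \infty$ is taken.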

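Further, one can show that the number $X_k$ of vertexes with degree $k$ in a random network (rather than the degree of a single vertex) also approximately follows a Poisson distribution:

$$
P(X_k = r) \approx \frac{\lambda_k^r \exp(-\lambda_k)}{r!}, \qquad \lambda_k = N \, P(k_i = k). \tag{5.4}
$$

This gives the probability that there are exactly $X_k = r$ vertexes in the network that have a degree of $k$; the Poisson parameter here is the expected number $\lambda_k$ of such vertexes, not the mean degree $z$ (Albert and Barabási 2002).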

For random networks, the clustering coefficient $C_i$ of a vertex $i$ (see (3.7)) assumes a very simple value. Specifically, because the average degree of a vertex can be approximated by $z = p(N-1) \approx pN$, it follows that

$$
C_i \approx \frac{z}{N} = p. \tag{5.5}
$$
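A small simulation can illustrate (5.5). The sketch assumes that the clustering coefficient referred to in (3.7) is the usual local clustering coefficient, ie, the fraction of pairs of neighbors of a vertex that are themselves connected; the network size, the value of $p$ and the random seed are arbitrary choices.

```python
import random
from itertools import combinations

def random_graph(n_vertices, p, rng):
    """Connect every pair of vertexes independently with probability p."""
    adj = {v: set() for v in range(n_vertices)}
    for u, v in combinations(range(n_vertices), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def mean_clustering(adj):
    """Average local clustering coefficient over vertexes with degree >= 2."""
    values = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)

rng = random.Random(0)          # fixed seed so the run is reproducible
p = 0.1
adj = random_graph(300, p, rng)
mean_c = mean_clustering(adj)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
```

For a network of this size the average clustering coefficient should lie close to $p$, and the number of edges close to the expectation in (5.1).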

5.2 Trees

A tree is a simple but nontrivial network that, in general, is connected and acyclic (Harary 1969). Below, we state a theorem which shows that a tree can be characterized by various properties (see Ihringer 1994).

Theorem 5.1 Let $G$ be a graph having $N$ vertexes. Then, the following assertions are equivalent:

(1) $G = (V, E)$ is a tree;

(2) every two vertexes of $G$ are connected by a unique path;

(3) $G$ is connected, but for every edge $e \in E$, $G \setminus \{e\}$ is disconnected;

(4) $G$ is connected and has exactly $N - 1$ edges;

(5) $G$ is cycle free and has exactly $N - 1$ edges;

(6) $G$ is cycle free, but for every two nonadjacent vertexes $v, w$, $G \cup \{v, w\}$ contains exactly one cycle.
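Assertion (4) of Theorem 5.1 lends itself to a simple computational test. The following sketch, with a hypothetical function name and toy edge lists, checks the edge count and then connectivity by a depth-first search; vertexes are assumed to be labeled $0, \ldots, N-1$.

```python
def is_tree(n_vertices, edges):
    """Check assertion (4): connected with exactly N - 1 edges."""
    if len(edges) != n_vertices - 1:
        return False
    adj = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    stack = [0]
    while stack:                      # depth-first search from vertex 0
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n_vertices

star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# Also N - 1 = 4 edges, but a triangle plus a separate edge: not connected.
triangle_plus_edge = [(0, 1), (1, 2), (2, 0), (3, 4)]

star_is_tree = is_tree(5, star)
other_is_tree = is_tree(5, triangle_plus_edge)
```

The second example shows why the edge count alone is not sufficient: it has $N - 1$ edges but is neither connected nor cycle free.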


FIGURE 5 An ordinary rooted tree representing the relations between the directorates of firms. Nodes: D1–D5 (Di: directorate of firm i). A connection indicates that two company boards have some common directors. This view allows us to easily identify the distance between two directorates Di of two firms.

The first studies of trees that can be found in the literature are due to Arthur Cayley (see Cayley 1857, 1875).

In Figure 5, we show an example of an ordinary rooted tree. This ordinary rooted tree represents the relations between the directorates of firms. Specifically, a connection indicates that two company boards have some directors in common. Because such a relation does not imply a natural direction, the connections are undirected. This view allows us to easily identify the distance between two directorates $D_i$ of two firms.

In contrast, in Figure 6 we show an example of trees connecting stocks. We want to emphasize that a tree does not possess a hierarchy; that means there is no top or bottom in the graph of a tree. For this reason, the minimum spanning tree (MST) shown in Figure 6 can be rearranged arbitrarily. In contrast, rooted trees have a root that is a distinct vertex with all paths pointing away from it (Harary 1969). An extension of minimum spanning trees (Mantegna 1999) obtained from the correlation networks of financial markets can be found in Tumminello et al (2005), which introduces a method, the planar maximally filtered graph (PMFG), for extracting planar graphs from such correlation networks. Applications to risk diversification can be found in Musmeci et al (2015) and Musmeci et al (2016); more advanced methods have been studied in Massara et al (2016).
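A minimum spanning tree of stocks can be sketched as follows. The sketch assumes that pairwise correlations $\rho_{ij}$ are turned into distances via $d_{ij} = \sqrt{2(1 - \rho_{ij})}$, which is the transformation commonly attributed to Mantegna (1999) but is not spelled out in this section, and it uses Kruskal's algorithm. The correlation values are invented for illustration and are not estimates from market data.

```python
import math
from itertools import combinations

def minimum_spanning_tree(names, corr):
    """Kruskal's algorithm on distances d_ij = sqrt(2 * (1 - corr_ij))."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    candidates = sorted(
        (math.sqrt(2.0 * (1.0 - corr[a][b])), a, b)
        for a, b in combinations(names, 2)
    )
    tree = []
    for dist, a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:                # edge joins two components
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree

# Hypothetical correlations, made up for illustration only.
names = ["IBM", "Intel", "Cisco", "Apple"]
corr = {
    "IBM":   {"Intel": 0.7, "Cisco": 0.6, "Apple": 0.2},
    "Intel": {"Cisco": 0.8, "Apple": 0.3},
    "Cisco": {"Apple": 0.5},
}
mst = minimum_spanning_tree(names, corr)
```

The resulting tree keeps $N - 1$ of the strongest correlations that do not close a cycle, in line with assertion (5) of Theorem 5.1.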

5.3 Generalized trees

In this section, we introduce an important extension of trees called generalized trees (GTs) (Dehmer 2006; Mehler et al 2004). Here, we only introduce undirected GTs.


FIGURE 6 Minimum spanning tree of stocks. Nodes: IBM, Apple, Intel, Cisco, Amazon, HP, Yahoo.

The idea of introducing directed GTs was first broached in Mehler et al (2004). When introducing GTs, we require that they be hierarchical and possess a distinct vertex called a root, as is usual for ordinary rooted trees. A GT has edge types in addition to the edges of an ordinary rooted tree, and this leads to a richer connectivity among vertexes.

Definition 5.2 (Generalized tree) A generalized tree $GT_i$ is defined by a vertex set $V$, an edge set $E$, a level set $L$ and a multilevel function $\mathcal{L}_i$. The vertex and edge set define the adjacencies, while the level set and the multilevel function define a hierarchy between the vertexes of $GT_i$. The index $i \in V$ indicates the root.

The multilevel function is defined as follows.

Definition 5.3 (Multilevel function) The function $\mathcal{L}_i : V \to L \subseteq \mathbb{N}$ is called the multilevel function with $\mathcal{L}_i(i) = 0$.

The multilevel function $\mathcal{L}_i$ assigns to every vertex an element $l \in L$ that corresponds to the level to which the vertex is assigned. The index $i$ refers to the root node, which is assigned to level $l = 0$. From these definitions, it is immediately clear that a GT is similar to a graph but is additionally equipped with a level set $L$ and a multilevel function $\mathcal{L}_i$, introducing a vertex grouping corresponding to the introduction of a hierarchy between vertexes and sets thereof.


FIGURE 7 Two GTs representing an investment network between firms. Nodes: Business group (F1, F2), F3, F4, F5, F6, F7, F8.

Definition 5.4 (Edge types) A generalized tree $GT_i$ has three edge types.

• Edges with $|\mathcal{L}_i(m) - \mathcal{L}_i(n)| = 1$ are called kernel edges ($E_1$).

• Edges with $|\mathcal{L}_i(m) - \mathcal{L}_i(n)| = 0$ are called across edges ($E_2$).

• Edges with $|\mathcal{L}_i(m) - \mathcal{L}_i(n)| > 1$ are called jump edges ($E_3$).

Here, $m, n \in V$.
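The three edge types of Definition 5.4 can be determined directly from the level difference of the two end vertexes. The sketch below uses a hypothetical dictionary `levels` in place of the multilevel function and a small invented GT, which is not the graph shown in Figure 7.

```python
def classify_edges(levels, edges):
    """Split edges into kernel (E1), across (E2) and jump (E3) edges.

    levels[v] plays the role of the multilevel function L_i(v).
    """
    kinds = {"kernel": [], "across": [], "jump": []}
    for m, n in edges:
        gap = abs(levels[m] - levels[n])
        if gap == 1:
            kinds["kernel"].append((m, n))
        elif gap == 0:
            kinds["across"].append((m, n))
        else:
            kinds["jump"].append((m, n))
    return kinds

# Hypothetical GT with root "R" on level 0 (not the graph of Figure 7).
levels = {"R": 0, "A": 1, "B": 1, "C": 2}
edges = [("R", "A"), ("R", "B"), ("A", "C"), ("A", "B"), ("R", "C")]
kinds = classify_edges(levels, edges)
```

In this toy example the across edge between A and B and the jump edge between R and C close cycles, which illustrates how a GT can be hierarchical without being acyclic.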

Figure 7 shows a GT. The edge types are highlighted by color: kernel edges forming the hierarchy are red, across edges, which stay within a level, are green, and jump edges are blue. Here, it is important to emphasize that the two orange nodes representing two firms F1 and F2 are combined into one node representing a business group. This means the shown GT has only one root node. Further, we note that a GT is a tree-like graph that may possess cycles (Dehmer et al 2007). However, a usual graph containing cycles is not hierarchical (Harary 1969).

Here, the firms F1 and F2 are forming a business group. The direction of an edgeindicates the directionality of the investment. Kernel edges are in red, across edgesare in green and jump edges are in blue.

If one does not collapse the two firms F1 and F2 into a business group but instead leaves them as individual nodes, the graph structure shown in Figure 7 forms a more complex structure than a GT because it has two root nodes. Such a tree structure has been termed a universal graph (see Emmert-Streib and Dehmer 2007).


FIGURE 8 Bipartite networks connecting countries with economic sectors. Nodes: UK, USA, GER, JPN (countries); Service, Agriculture, Industry (sectors). The width of the links is proportional to the strength of the economic sector.

5.4 Bipartite networks

Another network structure important in the representation of economic networks is a bipartite network. A bipartite network consists of two types of nodes. Let us call the first node type $U$ and the second node type $V$. Edges can only occur between nodes of different types, ie,

$$
E_{ij} = 1 \quad \text{only if } v_i \in U \text{ and } v_j \in V. \tag{5.6}
$$

In order to distinguish such a network, one often writes $G = (U, V, E)$. In the case of $|U| = |V|$, the network is called a balanced bipartite network. If the connections $E_{ij}$ carry a weight, the graph is called a weighted bipartite network.

In Figure 8, we show an example of a weighted bipartite network. This network connects four countries (United Kingdom (UK), United States (USA), Germany (GER) and Japan (JPN)) to three economic sectors (industry, agriculture and service). The width of the links is proportional to the strength of the corresponding economic sector. In this way, one can express the contribution of different economic sectors to the gross domestic product (GDP) of a country or the number of people that are working in the corresponding fields.

We would like to note that there are many different graphical ways to visualize bipartite networks and, in most cases (as seen in Figure 8), the nodes of the bipartite networks are not shown as circles; instead, the visualization is more artistic. Nevertheless, one should not forget that the underlying graph is defined in terms of graph theory with a strict meaning.
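The definitions above translate into a few simple checks. The sketch below stores a weighted bipartite network as a dictionary from (country, sector) pairs to weights; the function names and all weights are hypothetical and serve only to illustrate the bipartite and balance conditions.

```python
def is_bipartite(U, V, edges):
    """True if every edge links one node in U with one node in V."""
    U, V = set(U), set(V)
    if U & V:                          # the two node types must be disjoint
        return False
    return all((a in U and b in V) or (a in V and b in U) for a, b in edges)

def is_balanced(U, V):
    """True if both node types contain the same number of nodes."""
    return len(set(U)) == len(set(V))

countries = ["UK", "USA", "GER", "JPN"]
sectors = ["industry", "agriculture", "service"]
# Hypothetical link weights, invented for illustration only.
weights = {
    ("UK", "service"): 0.7, ("UK", "industry"): 0.2,
    ("USA", "service"): 0.8, ("GER", "industry"): 0.3,
    ("JPN", "industry"): 0.3, ("JPN", "agriculture"): 0.01,
}
valid = is_bipartite(countries, sectors, weights)       # iterates over edge keys
broken = is_bipartite(countries, sectors, list(weights) + [("UK", "USA")])
balanced = is_balanced(countries, sectors)
```

Adding a link between two countries violates the bipartite condition, and with four countries and three sectors the network is not balanced.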


5.5 Complex network topologies

Toward the end of the 1990s, two new types of networks were added to the literature, namely small-world networks (Watts and Strogatz 1998) and scale-free networks (Albert and Barabási 2002).

Specifically, Watts and Strogatz (1998) found that networks generated according to specific rules have a high clustering coefficient (like regular networks) as well as, on average, short distances between vertexes (similar to random networks). Hence, these networks (small-world networks) combine different features from different network classes. With respect to biological networks, small-world networks have been found in, for example, coexpression, protein and metabolic networks (van Noort et al 2004; Wagner and Fell 2001; Wilhelm et al 2003).

One economic network with small-world characteristics is the directorate interlock network. Davis et al (2003) studied several hundred US firms and banks as well as thousands of directors on their boards of directors during the 1980s and 1990s. They found that the director network, where nodes correspond to directors and links correspond to the common board positions of two directors, is consistent with high clustering coefficients and low average path lengths.

Complementary to this, Albert and Barabási (2002) found that many real-world networks show scale-free behavior in terms of node degrees:

$$
P(k) \sim k^{-\gamma}. \tag{5.7}
$$

To explain this common feature, Barabási and Albert introduced a model (Barabási and Albert 1999), now known as the Barabási–Albert (BA) or preferential attachment model (Newman 2003), which results in so-called scale-free networks; these have a degree distribution following a power law (Barabási and Albert 1999). A major difference between the preferential attachment model and the other algorithms described above for generating random or small-world networks is that the BA model does not start from a fixed number of vertexes $N$ that are then connected or rewired with a fixed probability; instead, $N$ grows. Each newly added vertex is connected, with a certain probability (which is not constant), to other vertexes already present in the network. This attachment probability,

$$
p_i = \frac{k_i}{\sum_j k_j}, \tag{5.8}
$$

is proportional to the degree $k_i$ of the existing vertex $i$, which explains the name of the model. This way, each new vertex is connected to $e \in \mathbb{N}$ existing vertexes in the network.
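The growth process described above can be sketched as follows. This is a minimal illustration and not the reference implementation of the BA model: the fully connected seed network of $e + 1$ vertexes, the function name and the parameter values are assumptions, and rejecting duplicate targets means the draws are only approximately proportional to (5.8).

```python
import random

def preferential_attachment(n_vertices, e, seed=0):
    """Grow a network in which each new vertex links to e existing vertexes.

    Targets are drawn with probability proportional to degree, as in (5.8).
    The fully connected seed of e + 1 vertexes is an assumption of this sketch.
    """
    rng = random.Random(seed)
    edges = [(u, v) for u in range(e + 1) for v in range(u)]
    # Each vertex appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw.
    endpoints = [x for edge in edges for x in edge]
    for new in range(e + 1, n_vertices):
        targets = set()
        while len(targets) < e:        # duplicates are rejected and redrawn
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(n_vertices=200, e=2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
```

Keeping a list of edge endpoints is a common design choice here because it avoids recomputing the normalizing sum in (5.8) after every added edge.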

Garlaschelli et al (2005) found that the network of investment markets has a scale-free degree distribution. Further examples of economic networks exhibiting a scale-free degree distribution, eg, for interbank networks or world trade networks, can be found in Boss et al (2004), Souma et al (2003), Arnold et al (2006) and Serrano and Boguñá (2003).

6 FUTURE DIRECTIONS AND DISCUSSION

A natural prerequisite for any economic network analysis is the creation or inference of networks. For this reason, we think it would be useful to conduct some comparative analyses to identify the best method for different types of economic networks. In order to perform such analyses, one would need to define what “best” means in this context. The problem is that, in most cases, the true economic network is not known. For instance, the true financial network connecting stocks in the New York Stock Exchange is unknown. For this reason, one needs to define context-specific measures that allow, potentially in an indirect way, the assessment of the inferred network structure. In the case of financial networks, this could be accomplished by utilizing financial networks in predictive forecasting models to estimate, eg, future stock prices. That means one can compare predictive models that take the network structure into consideration with those that do not. Better prediction results in the former case could indicate that the inferred financial network captures meaningful information that can be translated into beneficial forecasts.

As a further example, we want to look at investor networks. MacLeod (2009) investigated the social properties of stakeholders by using these networks. In addition, they compared them with the so-called advocacy networks of social activists. It is worth mentioning that these network classes can be compared using comparative graph measures (Emmert-Streib et al 2016). This would extend the work in MacLeod (2009) considerably, as it would lead to a thorough graph-theoretical treatment of this problem. Another example of analyzing investor networks using quantitative network measures can be found in Ozsoylev et al (2014).

Research on networks in economics and finance has exploded for many reasons. Financial and economic systems are often complex, and it is necessary to pay attention to their patterns of interaction in order to understand the behavior of systems and the agents therein. The increasing quantity of available data from various sources for different systems is enabling researchers to test and apply theories more easily (Hagströmer and Menkveld 2016; Jackson and Wolinsky 1996; Kirman 1997; Ozsoylev et al 2014; Walden 2017) as well as conduct explanatory research (Baltakys et al 2017; Boss et al 2004; Fagiolo et al 2009; Iori et al 2008; Onnela et al 2003a,b; Tumminello et al 2012) to gain a better empirical understanding of financial and economic systems. At the same time, network research in economics and finance is quite fragmented. This is not only in terms of research topics and applications, but also in terms of research approaches, academic disciplines and their journals.


In economics, there have been several application areas for networks, including network games (Ballester et al 2006; Bramoullé and Kranton 2015), labor markets (Adamic et al 2017; Beaman 2016; Calvó-Armengol and Jackson 2007), international trade (Chaney 2016; Rauch 1999) and social networks more broadly (Morelli et al 2017). This type of research is typically published in well-established journals in economics, but research on international trade has also been published in cross-disciplinary journals (see, for example, Bhattacharya 2008; Jackson and Nei 2015; Jiang and Zhou 2010; Saracco et al 2016).

In finance, complex systems and networks offer the potential for better analysis and monitoring of research into systemic risk in financial systems. Research on this topic has been published in both financial journals (Acemoglu et al 2015; Billio et al 2012; Diebold and Yılmaz 2014; Hautsch et al 2014; Markose et al 2012) and multidisciplinary journals (Battiston et al 2012a, 2016; Cimini et al 2015; Haldane and May 2011). As emphasized in Jackson (2016), in the area of systemic risk, the gap between theory and applications still needs to be closed, which is important because, in this area, network theory can have an immediate as well as a lasting impact. In addition, research papers on networks in financial markets have mainly been published in multidisciplinary, complexity and physics journals, rather than in finance ones (see, for example, Battiston et al 2012b; Emmert-Streib 2010a,b; Gualdi et al 2016; Musciotto et al 2016; Onnela et al 2003a,b; Tumminello et al 2012), although recent exceptions do exist (Ahern 2017; Han and Yang 2013; Ozsoylev et al 2014). Network methods can be used to investigate investors’ joint behavior and interaction, and given that investor network structure is important for stock price dynamics (Walden 2017), it is quite surprising that finance journals have been so slow on the uptake, only marginally adopting network methods in the research of investor behavior.

In econophysics, papers are often data driven and exploratory, whereas papers published in finance journals rely on models, typically under the neoclassical economics paradigm of rational individual choice. The research questions and approaches can be very different between different journals, which can partially be explained by researchers’ different interests and backgrounds. Nevertheless, different research approaches (eg, exploratory versus confirmatory research) can benefit from each other, and one of the most important possibilities in network research in finance is to foster actual interaction between research published in different journal categories, ie, multidisciplinary, complexity, (econo)physics and finance journals. In particular, data-driven research can feed “traditional” finance research by raising observations that should be theoretically explained; however, theoretical models should be carefully reflected upon and evaluated by large data sets with the alternative methods used in nonfinancial journals. Also, there would be possibilities for using methods developed in exploratory research in financial modeling. For example, Tumminello et al (2012) provide a method to estimate links between investors and detect investor communities, which could be used to verify financial theories in the interaction of individual investors and information transformation. As there is a theoretical relation between investor networks and volatility dynamics (Walden 2017), methods developed to estimate links between investors (Baltakys et al 2017; Tumminello et al 2012) could be used to produce state variables that, in turn, could be augmented into volatility models for better risk management and option-pricing accuracy. Addressing this question requires experience of both network inference methods, which are typically published in multidisciplinary journals (see, for example, Tumminello et al 2012), and established time series models.

7 CONCLUSION

The purpose of this paper was to showcase the use of graph-theoretic methods for studying economic problems. We hope this will help in making the study of economic networks more popular and accessible in the economics and finance literature because of the tremendous potential that such approaches possess for shedding light on our global, interconnected world.

DECLARATION OF INTEREST

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

ACKNOWLEDGEMENTS

Matthias Dehmer thanks the Austrian Science Fund for supporting this work (project P26142). Juho Kanniainen received funding from the EU Horizon 2020 research and innovation programme, under the Marie Skłodowska-Curie grant agreement No 675044 “BigDataFinance”.

REFERENCES

Acemoglu, D., Ozdaglar, A., and Tahbaz-Salehi, A. (2015). Systemic risk and stability in financial networks. American Economic Review 105(2), 564–608 (https://doi.org/10.1257/aer.20130456).

Adamic, L., and Huberman, B. (2000). Power-law distribution of the world wide web. Science 287(2115a), 1–2 (https://doi.org/10.1126/science.287.5461.2115a).

Adamic, L., Brunetti, C., Harris, J. H., and Kirilenko, A. (2017). Trading networks. Econometrics Journal 20(3), 126–149 (https://doi.org/10.1111/ectj.12090).

Ahern, K. R. (2017). Information networks: evidence from illegal insider trading tips. Journal of Financial Economics 125(1), 26–47 (https://doi.org/10.1016/j.jfineco.2017.03.009).


Albert, R., and Barabási, A. L. (2002). Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47–97 (https://doi.org/10.1103/RevModPhys.74.47).

Allen, E. B. (2002). Measuring graph abstractions of software: an information-theory approach. In Proceedings of the 8th International Symposium on Software Metrics. IEEE Computer Society (https://doi.org/10.1109/METRIC.2002.1011337).

Altomonte, C., and Rungi, A. (2013). Business groups as hierarchies of firms: determinants of vertical integration and performance. Working Paper Series 1554, European Central Bank (https://doi.org/10.2139/ssrn.2253222).

Amin, S. H., and Zhang, G. (2013). A multi-objective facility location model for closed-loop supply chain network under uncertain demand and return. Applied Mathematical Modelling 37(6), 4165–4176 (https://doi.org/10.1016/j.apm.2012.09.039).

Arnold, J., Bech, M. L., Beyeler, W. E., Glass, R. J., and Soramäki, K. (2006). The topology of interbank payment flows. Staff Report 243, Federal Reserve Bank of New York.

Ballester, C., Calvó-Armengol, A., and Zenou, Y. (2006). Who’s who in networks. Wanted: the key player. Econometrica 74(5), 1403–1417 (https://doi.org/10.1111/j.1468-0262.2006.00709.x).

Baltakys, K., Kanniainen, J., and Emmert-Streib, F. (2017). Multilayer aggregation with statistical validation: application to investor networks. Scientific Reports 8(1), 8198 (https://doi.org/10.1038/s41598-018-26575-2).

Bang-Jensen, J., and Gutin, G. (2002). Digraphs: Theory, Algorithms and Applications. Springer (https://doi.org/10.1007/978-1-4471-3886-0).

Barabási, A. L., and Albert, R. (1999). Emergence of scaling in random networks. Science 286, 509–512.

Battiston, S., Delli Gatti, D., Gallegati, M., Greenwald, B., and Stiglitz, J. E. (2012a). Liaisons dangereuses: increasing connectivity, risk sharing, and systemic risk. Journal of Economic Dynamics and Control 36(8), 1121–1141 (https://doi.org/10.1016/j.jedc.2012.04.001).

Battiston, S., Puliga, M., Kaushik, R., Tasca, P., and Caldarelli, G. (2012b). Debtrank: too central to fail? Financial networks, the Fed and systemic risk. Scientific Reports 2(541) (https://doi.org/10.1038/srep00541).

Battiston, S., Farmer, J. D., Flache, A., Garlaschelli, D., Haldane, A. G., Heesterbeek, H., Hommes, C., Jaeger, C., May, R., and Scheffer, M. (2016). Complexity theory and financial regulation. Science 351(6275), 818–819 (https://doi.org/10.1126/science.aad0299).

Beaman, L. (2016). Social networks and the labor market. In The Oxford Handbook on the Economics of Networks, Bramoullé, Y., Galeotti, A., and Rogers, B. (eds), pp. 649–674. Oxford University Press.

Bekiros, S., Nguyen, D. K., Sandoval, L., Jr., and Uddin, G. S. (2017). Information diffusion, cluster formation and entropy-based network dynamics in equity and commodity markets. European Journal of Operational Research 256(3), 945–961 (https://doi.org/10.1016/j.ejor.2016.06.052).

Berge, C. (1989). Hypergraphs: Combinatorics of Finite Sets. North Holland, Amsterdam.

Bhattacharya, K., Mukherjee, G., Saramäki, J., Kaski, K., and Manna, S. S. (2008). The international trade network: weighted network analysis and modelling. Journal of Statistical Mechanics: Theory and Experiment 2008(02), P02002 (https://doi.org/10.1088/1742-5468/2008/02/P02002).


Billio, M., Getmansky, M., Lo, A. W., and Pelizzon, L. (2012). Econometric measures of connectedness and systemic risk in the finance and insurance sectors. Journal of Financial Economics 104(3), 535–559 (https://doi.org/10.1016/j.jfineco.2011.12.010).

Bollobás, B. (1998). Modern Graph Theory. Graduate Texts in Mathematics. Springer (https://doi.org/10.1007/978-1-4612-0619-4).

Bonacich, P. (1972). Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology 2, 113–120 (https://doi.org/10.1080/0022250X.1972.9989806).

Bonchev, D. (1983). Information Theoretic Indices for Characterization of Chemical Structures. Research Studies Press, Chichester, UK.

Bonchev, D., and Rouvray, D. H. (2005). Complexity in Chemistry, Biology, and Ecology. Mathematical and Computational Chemistry. Springer.

Bornholdt, S., and Schuster, H. G. (2003). Handbook of Graphs and Networks: From the Genome to the Internet. Wiley.

Boss, M., Elsinger, H., Summer, M., and Thurner, S. (2004). Network topology of the interbank market. Quantitative Finance 4(6), 677–684 (https://doi.org/10.1080/14697680400020325).

Bramoullé, Y., and Kranton, R. (2015). Games played on networks. Working Paper, Aix-Marseille School of Economics. URL: https://halshs.archives-ouvertes.fr/halshs-01180657/.

Brandes, U., and Erlebach, T. (2005). Network Analysis. Lecture Notes in Computer Science. Springer (https://doi.org/10.1007/b106453).

Brandstädt, A., Le, V. B., and Spinrad, J. P. (1999). Graph Classes: A Survey. SIAM Monographs on Discrete Mathematics and Applications. SIAM, Philadelphia, PA (https://doi.org/10.1137/1.9780898719796).

Brinkmeier, M., and Schank, T. (2005). Network statistics. In Network Analysis, Brandes, U., and Erlebach, T. (eds), pp. 293–317. Lecture Notes in Computer Science. Springer (https://doi.org/10.1007/978-3-540-31955-9_11).

Buckley, F., and Harary, F. (1990). Distance in Graphs. Addison-Wesley, Boston, MA.

Calvó-Armengol, A., and Jackson, M. O. (2007). Networks in labor markets: wage and employment dynamics and inequality. Journal of Economic Theory 132(1), 27–46 (https://doi.org/10.1016/j.jet.2005.07.007).

Cayley, A. (1857). On the theory of analytic forms called trees. Philosophical Magazine 13, 19–30 (https://doi.org/10.1080/14786445708642238).

Cayley, A. (1875). On the analytical forms called trees, with application to the theory of chemical combinatorics. Report, British Association for the Advancement of Science, pp. 257–305.

Chakrabarti, S. (2002). Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, San Francisco, CA.

Chaney, T. (2016). Networks in international trade. In The Oxford Handbook on the Economics of Networks, Bramoullé, Y., Galeotti, A., and Rogers, B. (eds), pp. 754–775. Oxford University Press.

Cimini, G., Squartini, T., Garlaschelli, D., and Gabrielli, A. (2015). Systemic risk analysis on reconstructed economic and financial networks. Scientific Reports 5, 15758 (https://doi.org/10.1038/srep15758).


Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2001). Introduction to Algorithms. MIT Press, Cambridge, MA.

Cvetkovic, D. M., Doob, M., and Sachs, H. (1997). Spectra of Graphs: Theory and Application. Academic Press.

Davis, G. F., Yoo, M., and Baker, W. E. (2003). The small world of the American corporate elite, 1982–2001. Strategic Organization 1(3), 301–326 (https://doi.org/10.1177/14761270030013002).

Degryse, H., and Nguyen, G. (2004). Interbank exposure: an empirical examination of systemic risk in the Belgian banking system. Working Paper Research 43, National Bank of Belgium.

Dehmer, M. (2006). Strukturelle Analyse web-basierter Dokumente. Multimedia und Telekooperation. Deutscher Universitäts Verlag, Wiesbaden.

Dehmer, M. (2008). A novel method for measuring the structural information content of networks. Cybernetics and Systems 39, 825–843 (https://doi.org/10.1080/01969720802435925).

Dehmer, M., and Emmert-Streib, F. (2008). Structural information content of networks: graph entropy based on local vertex functionals. Computational Biology and Chemistry 32, 131–138 (https://doi.org/10.1016/j.compbiolchem.2007.09.007).

Dehmer, M., Mehler, A., and Emmert-Streib, F. (2007). Graph-theoretical characterizations of generalized trees. In Proceedings of the International Conference on Machine Learning: Models, Technologies and Applications (MLMTA’07), Las Vegas, 2007, pp. 113–117. CSREA Press.

Dehmer, M., Varmuza, K., Borgert, S., and Emmert-Streib, F. (2009). On entropy-based molecular descriptors: statistical analysis of real and synthetic chemical structures. Journal of Chemical Information and Modeling 49, 1655–1663 (https://doi.org/10.1021/ci900060x).

Devillers, J., and Balaban, A. T. (1999). Topological Indices and Related Descriptors in QSAR and QSPR. Gordon and Breach Science Publishers, Amsterdam, The Netherlands.

Dhar, V., Geva, T., Oestreicher-Singer, G., and Sundararajan, A. (2014). Prediction in economic networks. Information Systems Research 25(2), 264–284 (https://doi.org/10.1287/isre.2013.0510).

Diebold, F. X., and Yılmaz, K. (2014). On the network topology of variance decompositions: measuring the connectedness of financial firms. Journal of Econometrics 182(1), 119–134 (https://doi.org/10.1016/j.jeconom.2014.04.012).

Diestel, R. (2000). Graph Theory. Springer.

Emmert-Streib, F., Chen, L., and Storey, J. (2007). Functional annotation of genes insaccharomyces cerevisiae based on joint betweenness. Preprint (arXiv:0709.3291).

Emmert-Streib, F., and Dehmer, M. (2010a). Identifying critical financial networks of theDjia: toward a network-based index. Complexity 16(1), 24–33 (https://doi.org/10.1002/cplx.20315).

Emmert-Streib, F., and Dehmer, M. (2010b). Influence of the time scale on the construc-tion of financial networks. PLOS One 5(9), e12884 (https://doi.org/10.1371/journal.pone.0012884).

Emmert-Streib, F., and Dehmer, M. (2007). Information theoretic measures of UHG graphs with low computational complexity. Applied Mathematics and Computation 190(2), 1783–1794 (https://doi.org/10.1016/j.amc.2007.02.095).

Emmert-Streib, F., and Dehmer, M. (2008). Analysis of Microarray Data: A Network-Based Approach. Wiley–VCH, Weinheim, Germany (https://doi.org/10.1002/9783527622818).

Emmert-Streib, F., Dehmer, M., and Shi, Y. (2016). Fifty years of graph matching, network alignment and network comparison. Information Sciences 346–347, 180–197 (https://doi.org/10.1016/j.ins.2016.01.074).

Erdos, P., and Rényi, A. (1959). On random graphs. I. Publicationes Mathematicae 6, 290–297.

Erdos, P., and Rényi, A. (1960). On the evolution of random graphs. A MTA Matematikai Kutató Intézetének Közleményei 5, 17–61.

Euler, L. (1736). Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae 8, 128–140.

Even, S. (1979). Graph Algorithms. Computer Science Press.

Fagiolo, G., Reyes, J., and Schiavo, S. (2009). World-trade web: topological properties, dynamics, and evolution. Physical Review E 79(3), 036115 (https://doi.org/10.1103/PhysRevE.79.036115).

Felsenstein, J. (2003). Inferring Phylogenies. Sinauer Associates, Sunderland, MA.

Foulds, L. R. (1992). Graph Theory Applications. Springer (https://doi.org/10.1007/978-1-4612-0933-1).

Freeman, L. C. (1977). A set of measures of centrality based on betweenness. Sociometry 40(1), 35–41 (https://doi.org/10.2307/3033543).

Freeman, L. C. (1979). Centrality in social networks: conceptual clarification. Social Networks 1, 215–239 (https://doi.org/10.1016/0378-8733(78)90021-7).

Gallo, G., Longo, G., and Pallottino, S. (1993). Directed hypergraphs and applications. Discrete Applied Mathematics 42(2), 177–201 (https://doi.org/10.1016/0166-218X(93)90045-P).

Garlaschelli, D., Battiston, S., Castri, M., Servedio, V. D. P., and Caldarelli, G. (2005). The scale-free topology of market investments. Physica A 350(2), 491–499 (https://doi.org/10.1016/j.physa.2004.11.040).

Giudici, P., and Spelta, A. (2016). Graphical network models for international financial flows. Journal of Business and Economic Statistics 34(1), 128–138 (https://doi.org/10.1080/07350015.2015.1017643).

Gross, J. L., and Yellen, J. (2006). Graph Theory and Its Applications, 2nd edn. Discrete Mathematics and Its Applications. Chapman & Hall, Boca Raton.

Gualdi, S., Cimini, G., Primicerio, K., Di Clemente, R., and Challet, D. (2016). Statistically validated network of portfolio overlaps and systemic risk. Scientific Reports 6, 39467 (https://doi.org/10.1038/srep39467).

Hage, P., and Harary, F. (1995). Eccentricity and centrality in networks. Social Networks 17, 57–63 (https://doi.org/10.1016/0378-8733(94)00248-9).

Hagströmer, B., and Menkveld, A. J. (2016). A network map of information percolation. Working Paper, Social Science Research Network (https://doi.org/10.2139/ssrn.2770313).

Hakeem, M. M., and Suzuki, K. (2017). Centrality measures for trade and investment networks. Australian Academy of Accounting and Finance Review 1(2), 103–118.

Haldane, A. G., and May, R. M. (2011). Systemic risk in banking ecosystems. Nature 469(7330), 351–355 (https://doi.org/10.1038/nature09659).

Halin, R. (1989). Graphentheorie. Akademie, Berlin.

Han, B., and Yang, L. (2013). Social networks, information acquisition, and asset prices. Management Science 59(6), 1444–1457 (https://doi.org/10.1287/mnsc.1120.1678).

Harary, F. (1959). Status and contrastatus. Sociometry 22, 23–43 (https://doi.org/10.2307/2785610).

Harary, F. (1967). Graph Theory and Theoretical Physics. Academic Press.

Harary, F. (1969). Graph Theory. Addison Wesley, Reading, MA (https://doi.org/10.21236/AD0705364).

Hartmann, S., and Briskorn, D. (2010). A survey of variants and extensions of the resource-constrained project scheduling problem. European Journal of Operational Research 207(1), 1–14 (https://doi.org/10.1016/j.ejor.2009.11.005).

Hautsch, N., Schaumburg, J., and Schienle, M. (2014). Financial network systemic risk contributions. Review of Finance 19(2), 685–738 (https://doi.org/10.1093/rof/rfu010).

Heileman, M. D., Linton, D. G., and Khajenoori, S. (1992). Simulation study aids space-shuttle flight rate planning. Industrial Engineering 24(3), 58–59.

Hochberg, Y. V., Ljungqvist, A., and Lu, Y. (2007). Whom you know matters: venture capital networks and investment performance. Journal of Finance 62(1), 251–301 (https://doi.org/10.1111/j.1540-6261.2007.01207.x).

Hopp, W. J., and Spearman, M. L. (2011). Factory Physics. Waveland Press, Long Grove, IL.

Ihringer, T. (1994). Diskrete Mathematik. Teubner, Stuttgart.

Iori, G., De Masi, G., Precup, O. V., Gabbi, G., and Caldarelli, G. (2008). A network analysis of the Italian overnight money market. Journal of Economic Dynamics and Control 32(1), 259–278 (https://doi.org/10.1016/j.jedc.2007.01.032).

Jackson, M. O. (2016). The past and future of network analysis in economics. In The Oxford Handbook on the Economics of Networks, Bramoullé, Y., Galeotti, A., and Rogers, B. (eds), pp. 71–79. Oxford University Press.

Jackson, M. O., and Nei, S. (2015). Networks of military alliances, wars, and international trade. Proceedings of the National Academy of Sciences of the USA 112(50), 15 277–15 284 (https://doi.org/10.1073/pnas.1520970112).

Jackson, M. O., and Wolinsky, A. (1996). A strategic model of social and economic networks. Journal of Economic Theory 71(1), 44–74 (https://doi.org/10.1006/jeth.1996.0108).

Jiang, Z.-Q., and Zhou, W. X. (2010). Complex stock trading network among investors. Physica A 389(21), 4929–4941 (https://doi.org/10.1016/j.physa.2010.07.024).

Kalinowski, K., Grabowik, C., Kempa, W. M., and Paprocka, I. (2014). The graph representation of multivariant and complex processes for production scheduling. In Advanced Materials Research, Volume 837, pp. 422–427. Trans Tech Publications.

Kauffman, S. A. (1969). Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology 22, 437–467 (https://doi.org/10.1016/0022-5193(69)90015-0).

Kelley, J. E., Jr., and Walker, M. R. (1959). Critical-path planning and scheduling. In Papers Presented at the December 1–3, 1959, Eastern Joint IRE–AIEE–ACM Computer Conference, pp. 160–173. ACM (https://doi.org/10.1145/1460299.1460318).

Kilbridge, M. D., and Wester, L. (1961). A heuristic method of assembly line balancing. Journal of Industrial Engineering 12(4), 292–298.

Kirman, A. (1997). The economy as an evolving network. Journal of Evolutionary Economics 7(4), 339–353 (https://doi.org/10.1007/s001910050047).

Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of “information”. Problems of Information Transmission 1, 1–7.

König, M. D. (1936). Theorie der endlichen und unendlichen Graphen. Chelsea Publishing.

König, M. D., and Battiston, S. (2009). From graph theory to models of economic networks. A tutorial. In Networks, Topology and Dynamics, Naimzada, A. K., Stefani, S., and Torriero, A. (eds), pp. 23–63. Lecture Notes in Economics and Mathematical Systems, Volume 613. Springer.

Koschützki, D., Lehmann, K. A., Peters, L., Richter, S., Tenfelde-Podehl, D., and Zlotkowski, O. (2005). Clustering. In Centrality Indices, Brandes, U., and Erlebach, T. (eds), pp. 16–61. Lecture Notes in Computer Science. Springer.

Kuzubas, T. U., Omercikoglu, I., and Saltoglu, B. (2014). Network centrality measures and systemic risk: an application to the Turkish financial crisis. Physica A 405, 203–215 (https://doi.org/10.1016/j.physa.2014.03.006).

Li, M., and Vitányi, P. M. B. (1997). An Introduction to Kolmogorov Complexity and Its Applications. Springer (https://doi.org/10.1007/978-1-4757-2606-0).

MacLeod, M. R. (2009). Emerging investor networks and the construction of corporate social responsibility. Journal of Corporate Citizenship 34, 69–98 (https://doi.org/10.9774/GLEAF.4700.2009.su.00010).

Mantegna, R. N. (1999). Hierarchical structure in financial markets. European Physical Journal B 11(1), 193–197 (https://doi.org/10.1007/s100510050929).

Markose, S., Giansante, S., and Shaghaghi, A. R. (2012). “Too interconnected to fail” financial network of US CDS market: topological fragility and systemic risk. Journal of Economic Behavior and Organization 83(3), 627–646 (https://doi.org/10.1016/j.jebo.2012.05.016).

Mason, O., and Verwoerd, M. (2007). Graph theory and networks in biology. IET Systems Biology 1(2), 89–119 (https://doi.org/10.1049/iet-syb:20060038).

Massara, G. P., Di Matteo, T., and Aste, T. (2016). Network filtering for big data: triangulated maximally filtered graph. Journal of Complex Networks 5(2), 161–178 (https://doi.org/10.1093/comnet/cnw015).

Mehler, A. (2009). A quantitative graph model of social ontologies by example of Wikipedia. In Genres on the Web: Computational Models and Empirical Studies, Mehler, A., Sharoff, S., Rehm, G., and Santini, M. (eds). Springer.

Mehler, A., Dehmer, M., and Gleim, R. (2004). Towards logical hypertext structure: a graph-theoretic perspective. In Proceedings of the Fourth International Workshop on Innovative Internet Computing Systems (I2CS ’04), Böhme, T., and Heyer, G. (eds), pp. 136–150. Lecture Notes in Computer Science, Volume 3473. Springer.

Morelli, S. A., Ong, D. C., Makati, R., Jackson, M. O., and Zaki, J. (2017). Empathy and well-being correlate with centrality in different social networks. Proceedings of the National Academy of Sciences of the USA, 201702155 (https://doi.org/10.1073/pnas.1702155114).

Morinaga, E., Shintome, Y., Wakamatsu, H., and Arai, E. (2016). Facility layout planning with continuous representation considering temporal efficiency. Transactions of the Institute of Systems, Control and Information Engineers 29(9), 408–413 (https://doi.org/10.5687/iscie.29.408).

Musciotto, F., Marotta, L., Miccichè, S., Piilo, J., and Mantegna, R. N. (2016). Patterns of trading profiles at the Nordic stock exchange: a correlation-based approach. Chaos, Solitons and Fractals 88, 267–278 (https://doi.org/10.1016/j.chaos.2016.02.027).

Musmeci, N., Aste, T., and Di Matteo, T. Risk diversification: a study of persistence with a filtered correlation-network approach. The Journal of Network Theory in Finance 1(1), 77–98 (https://doi.org/10.21314/JNTF.2015.005).

Musmeci, N., Aste, T., and Di Matteo, T. (2016). Interplay between past market correlation structure changes and future volatility outbursts. Scientific Reports 6, 36320 (https://doi.org/10.1038/srep36320).

Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review 45, 167–256 (https://doi.org/10.1137/S003614450342480).

Onnela, J.-P., Chakraborti, A., Kaski, K., and Kertesz, J. (2003a). Dynamic asset trees and Black Monday. Physica A 324(1), 247–252 (https://doi.org/10.1016/S0378-4371(02)01882-4).

Onnela, J.-P., Chakraborti, A., Kaski, K., Kertesz, J., and Kanto, A. (2003b). Dynamics of market correlations: taxonomy and portfolio analysis. Physical Review E 68(5), 056110 (https://doi.org/10.1103/PhysRevE.68.056110).

Ozsoylev, H. N., Walden, J., Deniz Yavuz, M., and Bildik, R. (2014). Investor networks in the stock market. Review of Financial Studies 27(5), 1323–1366 (https://doi.org/10.1093/rfs/hht065).

Palsson, B. O. (2006). Systems Biology. Cambridge University Press (https://doi.org/10.1017/CBO9780511790515).

Pinedo, M. (2012). Scheduling. Springer (https://doi.org/10.1007/978-1-4614-2361-4).

Qiu, T., Zheng, B., and Chen, G. (2010). Financial networks with static and dynamic thresholds. New Journal of Physics 12(4), 043057 (https://doi.org/10.1088/1367-2630/12/4/043057).

Rauch, J. E. (1999). Networks versus markets in international trade. Journal of International Economics 48(1), 7–35 (https://doi.org/10.1016/S0022-1996(98)00009-9).

Roberts, F. (1989). Applications of Combinatorics and Graph Theory to the Biological and Social Sciences Series. IMA Volumes in Mathematics and Its Applications. Springer (https://doi.org/10.1007/978-1-4684-6381-1_1).

Roberts, S. M., and Mask, K. J. (1992). Shutdown management: five years of learning. AACE International Transactions 2, U–3.

Roukny, T., Georg, C.-P., and Battiston, S. (2014). A network analysis of the evolution of the German interbank market. Discussion Paper, Deutsche Bundesbank.

Sabidussi, G. (1966). The centrality index of a graph. Psychometrika 31, 581–603 (https://doi.org/10.1007/BF02289527).

Saracco, F., Di Clemente, R., Gabrielli, A., and Squartini, T. (2016). Detecting early signs of the 2007–2008 crisis in the world trade. Scientific Reports 6, 30286 (https://doi.org/10.1038/srep30286).

Scott, F. (2001). Social Network Analysis. Sage Publications.

Semple, C., and Steel, M. (2003). Phylogenetics. Graduate Series in Mathematics and Its Applications. Oxford University Press.

Serrano, M. A., and Boguná, M. (2003). Topology of the world trade web. Physical Review E 68(1), 015101 (https://doi.org/10.1103/PhysRevE.68.015101).

Shannon, C. E., and Weaver, W. (1997). The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL.

Sharma, K., Gopalakrishnan, B., Chakrabarti, A. S., and Chakraborti, A. (2017). Financial fluctuations anchored to economic fundamentals: a mesoscopic network approach. Scientific Reports 7, 8055 (https://doi.org/10.1038/s41598-017-07758-9).

Shuja, A. (2016). Social and economic networks: an investigation of retailer networks in Lahore and their impact on enterprise performance. PhD Thesis, Lahore School of Economics.

Skorobogatov, V. A., and Dobrynin, A. A. (1988). Metrical analysis of graphs. Communications in Mathematical and in Computer Chemistry 23, 105–155.

Sommerfeld, E. (1994). Kognitive Strukturen. Mathematisch-psychologische Elementaranalysen der Wissensstrukturierung und Informationsverarbeitung. Waxmann Publishing.

Sommerfeld, E., and Sobik, F. (1994). Operations on cognitive structures – their modeling on the basis of graph theory. In Knowledge Structures, Albert, D. (ed), pp. 146–190. Springer (https://doi.org/10.1007/978-3-642-52064-8_5).

Souma, W., Fujiwara, Y., and Aoyama, H. (2003). Complex networks and economics. Physica A 324(1), 396–401 (https://doi.org/10.1016/S0378-4371(02)01858-7).

Temizsoy, A., Iori, G., and Montes-Rojas, G. (2016). Network centrality and funding rates in the e-MID interbank market. Journal of Financial Stability 33, 346–365 (https://doi.org/10.1016/j.jfs.2016.11.003).

Trinajstic, N. (1992). Chemical Graph Theory. CRC Press, Boca Raton, FL.

Tumminello, M., Aste, T., Di Matteo, T., and Mantegna, R. N. (2005). A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences of the USA 102(30), 10 421–10 426 (https://doi.org/10.1073/pnas.0500298102).

Tumminello, M., Lillo, F., Piilo, J., and Mantegna, R. N. (2012). Identification of clusters of investors from their real trading activity in a financial market. New Journal of Physics 14(1), 013041 (https://doi.org/10.1088/1367-2630/14/1/013041).

van Noort, V., Snel, B., and Huymen, M. A. (2004). The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Reports 5(3), 280–284 (https://doi.org/10.1038/sj.embor.7400090).

Vitali, S., and Battiston, S. (2014). The community structure of the global corporate network. PLOS One 9(8), 1–13 (https://doi.org/10.1371/journal.pone.0104655).

Vitali, S., Glattfelder, J. B., and Battiston, S. (2011). The network of global corporate control. PLOS One 6(10), e25995 (https://doi.org/10.1371/journal.pone.0025995).

Wagner, A., and Fell, D. A. (2001). The small world inside large metabolic networks. Proceedings of the Royal Society of London B 268(1478), 1803–1810 (https://doi.org/10.1098/rspb.2001.1711).

Walden, J. (2017). Trading, profits, and volatility in a dynamic information network model. Working Paper, Social Science Research Network.

Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences. Cambridge University Press (https://doi.org/10.1017/CBO9780511815478).

Watts, D. J., and Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature 393, 440–442 (https://doi.org/10.1038/30918).

Wilhelm, T., Nasheuer, H.-P., and Huang, S. (2003). Physical and functional modularity of the protein network in yeast. Molecular and Cellular Proteomics 2(5), 292–298 (https://doi.org/10.1074/mcp.M300005-MCP200).

PUBLICATION

II

Multilayer aggregation with statistical validation: Application to investor networks

Baltakys, K., Kanniainen, J. and Emmert-Streib, F.

Scientific Reports 8.1 (2018), 8198. DOI: 10.1038/s41598-018-26575-2

Publication reprinted with the permission of the copyright holders

1SCIENTIFIC REPORTS | (2018) 8:8198

Multilayer Aggregation with Statistical Validation: Application to Investor Networks

Kęstutis Baltakys, Juho Kanniainen & Frank Emmert-Streib

we develop a new tractable procedure for multilayer aggregation based on statistical validation, which

integrate security-wise and time-wise information about investor trading networks, but it is not

constant size networks and having more observations for each node, which is important in the inference

capital have high centrality in investor networks, which, under the theory of information channels in

Scientific literature on multilayer networks has recently started to gain more attention1–3, having important applications in the financial area4,5. Recent empirical evidence has triggered a disagreement over conventional tractable financial models6. In finance, complex network theory7,8 has mainly been applied to gauge the systemic risk posed by interconnected banks6,9–12, but recently, multiple network inference methods have been developed to investigate trading behavior13,14 and portfolios15 in investor network research.

An investor network is a representation of a real-world complex system where institutional and private investors indirectly interact with each other by trading or owning securities. In general, network science methods allow for analyzing and gaining a clearer understanding of the intricate relationships between the components of this system, and a key advantage of such an approach is that it allows for visualizing the resulting networks16,17. However, estimating investor networks is not straightforward, as links between investors are not directly observable. Instead, a link represents the abstract distance of a pair of investors in terms of trading behavior or portfolios. Therefore, the analysis requires investor-level transaction or portfolio data and an appropriate statistical inference method for inferring such networks from the data. Even though complex network methods have begun attracting attention to investor-level data18, many methodological challenges remain, several of which we aim to address in this paper. For our analysis, we use data from a large shareholder registry to investigate the trading networks of different investor categories.

First, the main challenge in investor trading networks is considering multiple securities, which leads to a multilayer network representation. What if we wanted a simple network representation, which would have statistically significant relationships over multiple securities? Ever-changing investor behavior poses difficulties for correctly inferring their relationships. Most likely, performing network inference for a whole period will not reveal the whole picture, as localized relationships between investor categories occurring at different periods might be diluted when we look at longer horizons. At the same time, static networks inferred over a whole period do not provide information on how node relationships evolve over time. In order to analyze the varying associations between investor categories, we use a simple, window-based analysis to recover the time-evolving networks of investor category interactions. Moreover, having a sequence of network snapshots, one might want to summarize the most important reoccurring relationships over the whole period. Therefore, we propose a multilayer aggregation approach that can address this challenge by integrating an ensemble of networks, resulting in a network that captures the most significant consistencies in investor relationships over multiple estimation periods and many securities. We also consider the influence of window size on the resulting aggregated networks19. As we show, this approach allows for producing robust network structures using all of the transaction data over multiple securities and estimation periods without discarding a single transaction.

Laboratory of Industrial and Information Management, Tampere University of Technology, Tampere, Finland. Predictive Medicine and Data Analytics Lab, Faculty of Biomedical Sciences and Engineering, Tampere University of Technology, Tampere, Finland. Institute of Biosciences and Medical Technology, Tampere, Finland. Correspondence and requests for materials should be addressed to K.B. (email: )

Received: 3 November 2017

Accepted: 16 May 2018

Published: xx xx xxxx

Second, we can think of investor trading as a data generation process that produces observations (transaction data) based on unobservable trading mechanisms. For example, trading algorithms have specific trading rules, and household investors with more or less intuitive trading strategies can have certain (stochastic) mechanisms, which are impossible to observe directly. The point is that the data set of observable transactions is just one realization of the underlying data generation process driven by certain mechanisms. Therefore, one might wonder which data sample to use for the network inference: all the transaction data together, or one or more sub-samples of the full set of trading data. In addition, in our case, each investor category consists of many investors, and we want to prevent cases where a couple of active investors, or investors who trade large volumes, overshadow the behaviors of other investors in the category. In our approach, we address these problems by performing bootstrapping at the lowest level of resolution, that of individual investor transactions. An empirical demonstration shows that the results clearly differ between the conventional approach of using the full data set directly and our data bootstrapping approach.
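The transaction-level resampling described above can be illustrated with a minimal sketch. It assumes that resampling is done uniformly with replacement and that each transaction is a tuple of (investor category, trading day, signed volume); the record layout and the function names `bootstrap_transactions` and `net_volume_by_category` are hypothetical and are not taken from the paper.

```python
import random

def bootstrap_transactions(transactions, n_resamples, seed=0):
    """Draw n_resamples data sets, each sampled uniformly with replacement
    and of the same size as the original list of transactions."""
    rng = random.Random(seed)
    n = len(transactions)
    return [[transactions[rng.randrange(n)] for _ in range(n)]
            for _ in range(n_resamples)]

def net_volume_by_category(sample):
    """Sum signed volumes per (category, day), the kind of series that a
    network inference method could take as input."""
    totals = {}
    for category, day, signed_volume in sample:
        key = (category, day)
        totals[key] = totals.get(key, 0) + signed_volume
    return totals

# Hypothetical records: (investor category, trading day, signed volume).
transactions = [("A", 1, 100), ("A", 1, -40), ("B", 1, 30),
                ("B", 2, -10), ("A", 2, 5)]
resampled = bootstrap_transactions(transactions, n_resamples=3)
original_totals = net_volume_by_category(transactions)
```

Because every transaction has the same chance of being drawn, a single large or very active investor influences each resampled series only in proportion to the number of records it contributes.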

Third, the transaction data for network inference suffers from a high-dimension, low-sample size problem20, as the number of investors exceeds the number of trading days. Estimating investor networks based on trading patterns requires long observation periods and sufficient data for each investor13. Since the majority of household investors are rather inactive, only a fraction of investors (the active ones) can be included in the analysis. The exclusion of inactive investors leads to the description of a sub-system; therefore, the conclusions can be difficult to generalize at the market level. In this paper, we solve this problem by assigning investors to categories according to investor attributes that are available in the data set. Such a categorization allows us to reduce the number of variables in the system significantly, but we do not exclude data, as the categories contain aggregated data from the whole system. Importantly, this approach allows for considering inactive investors and less liquid stocks with fewer trading events. The size of such a categorized network remains the same over time, whereas the size of a network of individual investors can change over time, depending on the activeness of the investors. Since investor categories are based on real attributes, we can characterize the nature of each category.

We contribute to the methodological research literature by introducing three building blocks that can be used together or separately: investor categorization, transaction data bootstrapping and network aggregation. The main contribution of this paper is a tractable multilayer and multistep aggregation procedure, by which we aggregate information from multiple layers using statistical validation. Methodologically, this approach can be used for different non-financial applications, with various network estimation methods, and even for observable networks. The multilayer aggregation procedure can be used standalone in cases when the underlying networks are directly observable or the input is already a network. In that case, no data bootstrapping is necessary.

In contrast to our paper, none of the existing procedures provides a tractable procedure for the aggregation of binary network layers using statistical validation. The paper that most closely resembles ours in its topic proposes an ensemble-based network aggregation21 method that leverages the rank-product method22 to improve the accuracy of gene network reconstruction. However, that algorithm is intended to integrate gene networks inferred using different methods and genomics data sets. Other trivial network ensemble aggregation procedures include maximum and mean rules23. Another recent paper24 proposes a method for reducing the complexity of multilayer networks by aggregating the redundant layers while retaining the pertinent information about the whole system. In practice, the goal of their method is to combine similar layers and keep dissimilar layers apart. The objective of our research is different from ref.24; we are looking for the most important relationships that span multiple layers, rather than keeping information about different layers. The principle of our procedure is consistent with that of ref.25; however, we aggregate layers, whereas ref.25 addresses the inference of unlayered data.

Secondly, we contribute to the literature on investor networks by introducing transaction bootstrapping, which improves the inference of the networks. Additionally, in the investor network inference, we consider investor categories instead of individual investors. This makes the system interpretable in economic and sociological terms. As a fourth building block, we use already existing network inference methods to identify links between investors. In addition to the methodological contribution, this is the first paper that provides investor network graphs over multiple securities and time-windows. We provide empirical evidence that households in Helsinki are the most central investor category; under the theory of information channels in investor networks14, this centrality indicates that the category represents well-informed investors.

Overall, our framework can be summarized as follows (in chronological order):

Investor categorization, where all investors in the analyzed data set are assigned to only 99 categories based on their economic and social attributes. Categorization keeps the number of nodes in the network constant, with sufficient transaction data for each one of them. (Investors → Investor Categories).

Transaction data bootstrapping, where the analyzed data is uniformly resampled into multiple data sets for better statistical validation in the network inference. The advantage of the bootstrap is that it does not require any assumptions about the data distribution and it addresses the issue of finite observations. (Dataset → Resampled Datasets).

Network inference, where a selected inference method is used to identify edges between investor categories for each resampled data set to produce an ensemble of networks. In this regard, we use two existing methods: Conservative Causal Core Network (C3NET)26 and Minimum Spanning Trees (MST). In fact, any network inference method27–31 that produces, or can be converted into, binary, non-weighted networks could be applied. (Dataset → Network).

Network aggregation, where by using statistical validation, a network ensemble is aggregated to identify significant relationships that appear across the set of networks. In our demonstration, we employ the aggregation procedures in three different ways revealing the most important relationships: during bootstrapping, where relationships arise for the same security during the same time period; in the time-wise aggregation, where the relationships are observed for the same security in different time periods; and in the security-wise aggregation, where the relationships are observed in the same time period over multiple securities. (Network Ensemble → Aggregated Network).
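As an illustration of the network inference step, the following minimal sketch builds a minimum spanning tree from pairwise correlations between category-level time series. It assumes the correlation-based distance d = sqrt(2(1 − ρ)), a common convention in the financial network literature; this section does not state which distance the authors use, and the function names and toy series are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def mst_edges(series):
    """Prim's algorithm on the distance d = sqrt(2 * (1 - rho)).
    `series` maps a node name to its time series; the result is a set of
    undirected edges, each stored as a frozenset of two node names."""
    names = sorted(series)
    dist = {(a, b): math.sqrt(max(0.0, 2 * (1 - pearson(series[a], series[b]))))
            for a in names for b in names if a != b}
    in_tree = {names[0]}
    edges = set()
    while len(in_tree) < len(names):
        # Pick the shortest edge that connects the tree to a new node.
        a, b = min(((u, v) for u in in_tree for v in names if v not in in_tree),
                   key=lambda e: dist[e])
        edges.add(frozenset((a, b)))
        in_tree.add(b)
    return edges

# Toy net-volume series for four hypothetical investor categories.
series = {"A": [1, 2, 3, 4], "B": [1.1, 2.0, 3.2, 3.9],
          "C": [4, 3, 2, 1], "D": [1, 3, 2, 4]}
tree = mst_edges(series)
```

The output is a binary, unweighted edge set, which is the form the aggregation step requires of any inference method.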

To demonstrate our multilevel aggregation approach, we use an investor-level transaction data set obtained from Euroclear Finland Ltd for our analysis. It includes transactions from 2004-01-01 to 2009-12-31 of all domestic investors that traded stocks listed on the Nasdaq OMX Helsinki Exchange. Each transaction also contains meta-data about the investors (the same data set is used, for example, in refs13,32,33, while ref.14 uses a similar data set of trades on the Istanbul Stock Exchange). In this data set, the attributes used to categorize investors include gender, year of birth, and postal code for households and sector code for institutions. These attributes allow us to define 99 investor categories on which our analyses are performed.
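The mapping from investor attributes to categories might look like the following sketch. The attributes (gender, year of birth, postal code, sector code) come from the text, but the age bands, the two-digit postal prefix, the label format and the function name `categorize` are illustrative assumptions; they do not reproduce the paper's 99 categories.

```python
def categorize(investor, reference_year=2009):
    """Map an investor record to a category label.
    Institutions are grouped by sector code; households by gender,
    an assumed age band, and an assumed two-digit postal prefix."""
    if investor["type"] == "institution":
        return "sector:" + investor["sector_code"]
    age = reference_year - investor["birth_year"]
    # Hypothetical age bands, chosen only for illustration.
    age_band = "young" if age < 35 else "middle" if age < 60 else "senior"
    region = "postal:" + investor["postal_code"][:2]
    return "|".join(("household", investor["gender"], age_band, region))

# Hypothetical investor records.
household = {"type": "household", "gender": "F",
             "birth_year": 1970, "postal_code": "00100"}
institution = {"type": "institution", "sector_code": "S11"}
labels = [categorize(household), categorize(institution)]
```

Because every investor receives a label, inactive investors still contribute their transactions to a category, and the number of nodes stays fixed regardless of who trades in a given window.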

Before providing results on our data set, let us elaborate on the main principle of aggregation, while leaving the technical details of statistical validation to the Methods section. Figure 1 demonstrates how the ensemble of individual networks is aggregated. Overall, we have two different layers in investor networks. The first layer indicates securities and the second one indicates time. Interestingly, there are two different ways to integrate over these variables, indicated by the blue and red arrows. We show in the following that the results highlight different characteristics of the data.

For each time step and each security, we want to extract a network. These networks cannot be directly observed, but they are estimated using the transaction data set. By bootstrapping this data set, we generate B bootstrap data sets. Network inference is applied to each of these B data sets, resulting in an ensemble of B networks. The aggregation of the B networks results in one network, indicated by (1) (see Fig. 1). Adjacent networks in the main matrix are similarly inferred for other time steps and securities. Each column is an ensemble of networks that contains information about trading relationships for different securities during the same time step, while each row is a network ensemble that contains information about trading relationships in individual securities over different time steps. In the following, we first describe the security-wise integration and then the time-wise integration.

Initially, we integrate the security-wise information for each time step contained in the columns. Network (2) represents an aggregated network for time step 1 over all securities. Repeating a similar analysis for each of the T different time steps results in further networks for the corresponding cases. To combine these T networks, we perform aggregation again, resulting in one final network, indicated by (3) in the figure. The blue arrows in the figure represent the aforementioned steps. Alternatively, one can perform a time-wise integration first in a similar way. This type of integration follows the red arrows in the figure and applies the aggregation method twice because two integration steps are required. This leads to the final network indicated by (5). Interestingly, even though the

Figure 1. Aggregation of ensemble of individual networks. A set of networks in the main matrix containing two information layers–time and securities (networks adjacent to network (1)). Each of these networks is a result of applying a chosen network inference algorithm for the corresponding transaction data sets. Two information integration approaches are possible: one-layer integration of either securities (2) or time (4), for each period and security, respectively, or a multilayer approach, where both information layers are fully integrated, leading to networks (3) and (5).


final networks (3) and (5) summarize the same information, because of the different aggregation order, the captured relationships might be different, as shown in the results section.

In the next section, we present the results from the method application to our data set. We begin by applying our proposed techniques to single security networks. We investigate the impact of transaction bootstrapping on the network inference problem and compare a network inferred over the whole period to an aggregated network from a set of network snapshots. Next, we investigate multiple security networks. First, we use the aggregation technique to summarize information about trading in multiple securities and then we perform a two-layer aggregation, summarizing the information given by a series of network snapshots for a set of securities.

Results

In this section, we describe the network inference and aggregation process over single and multiple securities by performing the analysis over the whole period of analysis and multiple non-overlapping sub-periods. Mutual information (MI) values are estimated from daily net volume time series for each investor group pair. Any method that produces or can be converted into binary, non-weighted networks can be used, and in this paper we employ two existing algorithms. As a main algorithm, we employ Conservative Causal Core (C3NET) (see Methods for details) for network inference from the MI estimates in order to demonstrate our aggregation method. Moreover, we compare the results for single security trading networks obtained using C3NET and the minimum spanning tree (MST).

We investigate single security networks using data on the most liquid security in the Helsinki stock exchange–namely, Nokia. In this section, we aim to demonstrate the impact of transaction bootstrapping on network inference and the aggregation of multiple networks inferred from different time periods. Also, we compare results obtained using two inference algorithms, C3NET and MST.

Network inference. We begin our results section by comparing networks inferred using the C3NET algorithm with and without transaction bootstrapping. By definition, C3NET allows for establishing as many links as there are nodes in the network, if each investor group has at least one statistically significant MI estimate with some other group. In our data set for Nokia from 2004-01-01 to 2009-12-31, using C3NET, we infer 90 links. Interestingly, even after completing the categorization, some investor categories do not have a sufficient number of Nokia transactions to estimate relationships. For the bootstrapped version of network inference, we perform 100 transaction sampling iterations and form a network for each of them using the C3NET algorithm. The resulting ensemble of 100 networks contains 8853 links with 1195 different relationships. As a statistical null model for our ensembles, we choose the canonical Erdős–Rényi G(n, p) model, with a fixed number of nodes and an ensemble probability of a random link (see the Methods section for more details). A fully connected ensemble would have n × (n − 1)/2 × B = 4851 × 100 links; therefore, the probability of having a random link in the ensemble is estimated to be p = 8853/485100 = 1.82 × 10^{−2}. By choosing a significance level of α = 0.01 and adjusting it by the number of tests we perform (1195), we conclude that a relationship must be observed in at least 10 networks for it to be considered non-randomly occurring. The bootstrapped version identifies a total of 197 relationships that are statistically significant. Hence, the topology is no longer limited to one link per node. Almost all relationships from the non-sampled C3NET network are also found in the bootstrapped version–that is, 77 out of 90.
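The threshold calculation described above can be sketched as follows. This is a minimal illustration, assuming a one-sided binomial tail P(X ≥ k) under the G(n, p) null model and a Bonferroni-adjusted significance level; the function names are hypothetical, and because the exact test convention is specified in the Methods rather than here, the threshold it returns need not coincide with the reported one.

```python
from math import comb


def binomial_tail(n_networks: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n_networks, p)."""
    return sum(
        comb(n_networks, x) * p**x * (1 - p) ** (n_networks - x)
        for x in range(k, n_networks + 1)
    )


def min_occurrences(n_networks, total_links, n_nodes, n_tests, alpha=0.01):
    """Smallest k whose tail probability falls below alpha / n_tests.

    Returns (k, p), where p is the estimated probability of a random link.
    k is None if even appearing in every network is not significant.
    """
    pairs = n_nodes * (n_nodes - 1) // 2        # possible links per network
    p = total_links / (pairs * n_networks)      # ensemble link probability
    alpha_adj = alpha / n_tests                 # Bonferroni adjustment
    for k in range(n_networks + 1):
        if binomial_tail(n_networks, p, k) < alpha_adj:
            return k, p
    return None, p


# Nokia bootstrap ensemble: 100 networks, 8853 links, 99 nodes, 1195 tests.
k, p = min_occurrences(100, 8853, 99, 1195)
print(f"p = {p:.4f}, threshold under this convention = {k}")
```

The same function applies to the time-wise and security-wise ensembles by changing the number of networks, links, and tests.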

The two networks are depicted in sub-plots (a) and (b) of Fig. 2. Both networks identify the same nodes as most connected, and the four most connected nodes represent households. Specifically, the most connected node represents mature Helsinki households, followed by the same age group of western Tavastians, then middle-aged western Tavastians, and finally, mature northern Finnish households. The most connected non-household groups in the bootstrapped version are non-profit organizations from eastern Tavastia, non-financial companies from Ostrobothnia, and financial and insurance companies from northern Savonia, with six relationships each.

Time-wise network aggregation. The third and fourth networks for Nokia security in Fig. 2(c,d) are obtained by aggregating two 12-network ensembles inferred from non-overlapping 6-month periods covering the whole 6-year period analyzed. As in the previous section, we compare Nokia networks inferred with and without transaction bootstrapping. The number of relationships in the non-bootstrapped version ensemble varies from 63 to 80 and from 182 to 216 in the bootstrapped version. A total of 1420 different relationships are observed throughout the 12 networks in the transaction bootstrapped network ensemble and the total number of links in the ensemble is 2361, while in the non-bootstrapped version, the numbers of relationships and links are 673 and 858, respectively. Each network contains 99 nodes, and therefore, the total possible number of links in the ensemble is equal to 12 × 4851 = 58212 and the probabilities of having a random link are estimated to be p = 2361/58212 ≈ 4.05 × 10^{−2} and p = 858/58212 ≈ 1.47 × 10^{−2}. Again, by choosing the statistical significance of α = 0.01 and adjusting for the number of tests performed, a link must appear at least 5 times in order to be aggregated into the final network for the bootstrapped version and 4 times for the non-bootstrapped version. From Table 1, we see that in the bootstrapped version, 54 links appear at least 5 times in the 12 networks, and Fig. 3 shows the link occurrence in the ensemble. In the latter figure, we can see that some relationships are accumulated in consecutive periods while others are more scattered over time.

From Fig. 4 we can see that 43 links overlap with the bootstrapped version of C3NET for the whole period under analysis. Further, for the non-bootstrapped version, 14 relationships are inferred after time-wise aggregation. Of those 14 relationships, 14 also appear in the bootstrapped version. All nodes that have relationships, except two non-financial investor groups, are households. A visual inspection of all four networks in Fig. 2 reveals that the most important set of nodes in both networks inferred from the whole transaction data set is also identified as central in networks aggregated from various time window analyses.


Choice of network inference method. The multilayer aggregation framework works independently of the choice of network inference method, because it is applied to an ensemble of networks and not the underlying data. Here we present Table 2 to compare the aggregated Nokia networks obtained using the C3NET and MST network inference algorithms. In the table, we can see that the two similarly strict network inference algorithms yield highly similar results. We can also see that once we use network aggregation, the resulting aggregated network does not comply with the definitions of the respective algorithms. For example, in the aggregated networks MST no longer has (n − 1) links, where n is the number of nodes in the network (here n = 95). Also, C3NET is no longer limited to at most n links.

In the extant literature, investor trading networks have been estimated for single securities (see ref.13). In this paper, we show how to aggregate the networks of 100 securities into one. The security-specific networks are inferred using the whole 6-year period with transaction bootstrapping. Next, we examine the aggregation of networks with respect to securities and time periods. In particular, we explore how different the resulting aggregated networks are if we first integrate security-wise and then time-wise information versus the other way around. For simplicity, we present results for the bootstrapped version of C3NET; results for MST are available upon request.

Figure 2. Four networks of investor group trading relationships in Nokia security. Investor group positions are fixed in all four plots. Node sizes depend on node degrees in each network. The first network (a) is inferred using the C3NET algorithm on the original data set. The second network (b) is inferred by bagging C3NET. For the third (c) and fourth (d) networks, the whole six-year period is divided into 12 six-month sub-periods. For each of those 12 sub-periods, a C3NET and a bagged C3NET network are inferred. Then those 12 networks are aggregated into a final network that covers the whole 6-year period. Legend: households, non-profit organizations, other companies, financial and insurance companies, government institutions.

Number of occurrences   12   11   10    9    8    7    6    5    4     3     2      1
Links                    3    0    2    6    6   11   12   14   38    96   312    920
Cumulative               3    3    5   11   17   28   40   54   92   188   500   1420

Table 1. Number of link occurrences in the Nokia ensemble inferred over non-overlapping six-month periods using the bootstrapped version of C3NET. We can see that only three links appear in all 12 networks, while 920 links appear only once.
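As a quick consistency check, the counts in Table 1 can be tallied against the figures quoted in the text. The short sketch below uses only the numbers from the table; the function name is illustrative.

```python
def tally(occurrences, links, min_count):
    """Summarize a link-occurrence table.

    Returns (distinct relationships, total links in the ensemble,
    relationships appearing at least min_count times).
    """
    distinct = sum(links)
    total = sum(o * l for o, l in zip(occurrences, links))
    kept = sum(l for o, l in zip(occurrences, links) if o >= min_count)
    return distinct, total, kept


occurrences = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
links = [3, 0, 2, 6, 6, 11, 12, 14, 38, 96, 312, 920]

print(tally(occurrences, links, min_count=5))  # (1420, 2361, 54)
```

The three values match the 1420 different relationships, the 2361 links in the ensemble, and the 54 links that appear at least 5 times.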


Security-wise aggregation. Here, we aim to incorporate information about investor group trading relationships in 100 securities over the whole 6-year period. The number of inferred relationships across different securities ranges from 88 to 261 while the total number of detected relationships in the ensemble is 3218. Subsequently, for the ensemble of 100 security networks, we apply the same aggregation procedure as before. From the observed number of links in the ensemble and total possible number of links in a fully connected ensemble of this size, we estimate the probability of random links to be p = 20229/485100 = 4.17 × 10^{−2}. Then, for a significance level of

Figure 3. 54 most re-occurring links in 12 Nokia networks estimated over non-overlapping 6-month periods. - inferred relationship, □ - no relationship.

Figure 4. Link overlap in variously inferred networks for Nokia. 6y. is the network inferred using transaction bootstrapping over the whole period. All the other networks are inferred on shorter windows (1, 2, 3, 4, 6, 12, 24 months) and then aggregated into a network that covers the whole period under analysis. From the figure, we can observe that most of the relationships inferred over the longer observation windows are also found in the shorter window analyses.

                          Without transaction   With transaction bootstrapping
                          bootstrapping
Length of (sub-)periods   6 y.                  6 y.   1 m.   2 m.   3 m.   4 m.   6 m.   12 m.  24 m.
Panel A: Links
MST                       94                    215    333    137    97     84     59     39     28
C3NET                     90                    197    247    93     78     60     54     41     29
MST ∩ C3NET               90                    177    219    83     67     53     49     32     25
Panel B: Nodes
MST                       95                    96     51     42     38     40     38     35     28
C3NET                     95                    96     48     41     36     33     37     36     29
MST ∩ C3NET               95                    96     48     37     32     32     36     33     26

Table 2. Comparison of the number of estimated links and of nodes having a link in the aggregated networks for Nokia security. Under the length of (sub-)periods, “6 y.” denotes the network inference algorithms applied to the whole 6-year period. “1 m.”, “2 m.”,… denote the bootstrapped versions of the network inference algorithms applied over 1, 2,… month periods and then aggregated over all respective periods to cover the whole 6 years.


α = 0.01, we apply a Bonferroni adjustment for 3218 tests and end up with a threshold of 16 link occurrences in the ensemble, which leaves 236 links in the aggregated network. Households represent the majority of groups with relationships over multiple securities. Furthermore, two of the most central nodes are mature and middle-aged household investor groups from Helsinki, with 49 and 30 relationships, respectively. The two most central non-household investor groups are financial and non-financial companies in Helsinki, both with 9 relationships to other investor groups.

Here we leverage the previously introduced time-wise and security-wise network aggregation procedures. Our goal is to produce a single network that can summarize the trading relationship information inferred for 100 securities over multiple time windows of various sizes. We investigate networks inferred over seven different non-overlapping time windows–that is, 1, 2, 3, 4, 6, 12, and 24 months. Each security respectively has 72, 36, 24, 18, 12, 6, and 3 such networks, covering the whole 6-year period under analysis. Our starting point is a set of network ensembles inferred using the bootstrapped C3NET algorithm for 100 securities for all analyzed time window sizes. For instance, in the case of the 6-month window, we have 12 networks for each of the 100 securities–that is, an ensemble of 12 × 100 = 1200 networks (corresponding to the networks in the main matrix of Fig. 1). We must also keep in mind that the aggregated network will differ depending on the order of information aggregation–that is, on whether time-wise or security-wise relationship information is summarized first. Accordingly, we describe the results of using both approaches and compare the final results. By performing the time-wise aggregation first, we end up with a 100-network ensemble, with one network for each security. Links in each network represent the most important reoccurring relationships in the corresponding securities. Conversely, if we start with security-wise aggregation, we end up with an ensemble of 12 networks. Each of the 12 networks contains the most important relationships that are present over multiple securities, but this might be a different set of securities in each period. Next, for the two ensembles stemming from the first aggregation procedure, we perform the final aggregation, yielding a network summarizing the relationships of investor groups in their trading behavior over 100 securities for the whole period under analysis.
However, the two final networks are not the same (see the networks in Fig. 5). Table 3 compares the links and nodes in the final networks for various window sizes. For each of the seven time windows, we obtain two networks, depending on the order of the aggregation procedure; thus, together with the security-wise aggregated network for the whole period from the previous section, we compare 15 networks. Figure 6 summarizes the node degrees in all 15 final networks. Node degree sequences are highly correlated, with Spearman’s correlation ranging from 0.65 to 0.99. Similar to the whole period security-wise aggregated network, networks in Fig. 5 identify mature and middle-aged household investor groups from Helsinki as the most central groups, while financial and non-financial company investor groups from Helsinki are the most central non-household investor groups.
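To see why the aggregation order matters, consider the following toy sketch. It is an illustration only: fixed occurrence thresholds (`min_t`, `min_s`) stand in for the statistically validated thresholds of the actual procedure, and all names are hypothetical.

```python
from collections import Counter


def aggregate(networks, min_count):
    """Keep edges that occur in at least min_count networks of the ensemble."""
    counts = Counter(edge for net in networks for edge in net)
    return {edge for edge, c in counts.items() if c >= min_count}


def two_layer(grid, min_t, min_s, securities_first):
    """Aggregate a grid[t][s] of edge sets over both layers in a given order."""
    n_times, n_secs = len(grid), len(grid[0])
    if securities_first:  # security-wise first, then time-wise (ST)
        per_time = [aggregate(grid[t], min_s) for t in range(n_times)]
        return aggregate(per_time, min_t)
    # time-wise first, then security-wise (TS)
    per_sec = [
        aggregate([grid[t][s] for t in range(n_times)], min_t)
        for s in range(n_secs)
    ]
    return aggregate(per_sec, min_s)


AB = ("A", "B")
# Two time steps (rows) and three securities (columns).
grid = [
    [{AB}, {AB}, set()],
    [set(), {AB}, {AB}],
]
st = two_layer(grid, min_t=2, min_s=2, securities_first=True)
ts = two_layer(grid, min_t=2, min_s=2, securities_first=False)
print(st, ts)  # {('A', 'B')} set()
```

In this toy grid the link A–B appears in two securities in every period, but in different securities each time, so it survives the security-first order and is dropped by the time-first order.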

The networks are designed to reflect associations in investors’ trading behavior, and hence high centrality reflects that there are many other groups of investors that behave in a mutually predictable way. Conversely, if an investor category is measured to have low centrality, then investors in this category are trading differently compared with other investor categories. From the point of view of information networks in financial markets14, high centrality of a node indicates that investors in the node have many private information channels from other investors. Therefore, given that association between investors’ trading patterns reflects information transfer, high centrality reflects good access to private information. In this paper we consider both positive and negative linkages to represent information transfer; that is, two investors can employ the mutual information channel to trade in the same or opposite directions. Technically, we have used three centrality

Figure 5. Networks summarizing investor group trading similarities in 100 securities over 6 years. The starting point for both networks is a set of 1200 networks inferred for each security over 12 six-month, non-overlapping periods. The difference between networks comes from the aggregation order. The first network is first aggregated security-wise and then time-wise, while the second network is aggregated in the reverse order. Network (3) in Fig. 1 represents network (a), while network (5) in the same figure represents network (b). Legend: households, non-profit organizations, other companies, financial and insurance companies, government institutions.


measures: Degree (fraction of nodes it is connected to), load centrality (fraction of shortest paths that include the given node), and closeness (1 divided by the average shortest-path distance for a given node).
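Two of these measures can be sketched in a few lines for an undirected, unweighted network stored as an adjacency dictionary. This is a minimal sketch under stated assumptions: closeness is computed over the nodes reachable from each node (one possible convention for disconnected networks, not necessarily the one used here), and load centrality is omitted for brevity.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is connected to."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def closeness_centrality(adj):
    """1 / (average shortest-path distance), over reachable nodes only."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reachable = len(dist) - 1
        result[source] = reachable / sum(dist.values()) if reachable else 0.0
    return result


# Star network: a hub connected to three leaves.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(degree_centrality(star)["hub"], closeness_centrality(star)["a"])  # 1.0 0.6
```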

Regarding the networks of investors over all the securities and 12 half-year periods (see Fig. 5), the two most central investor categories are Middle-Age and Mature Household investors located in Helsinki. This result is robust across all the measures, i.e. degree, load centrality and closeness. A deeper investigation of the results shows that household traders trade in a rather mutually predictable way with each other, though links to financial and non-financial companies exist, too. From the point of view of information propagation, these investors, i.e. households in the capital, are well-connected to other investor groups across the country. Potentially, this can be explained by rather strong and long-lasting internal migration and industrial changes in Finland to urban centres and especially to the capital of Helsinki (see, for example, ref.34). Particularly, such work-related migration can strengthen the social ties between different regions of the country. In addition, the top-10 central nodes include not only household investors of different age groups and regions, but also Financial and Insurance Companies and Non-Financial Companies, which are among the 3–7 most central nodes (the rank depends on the centrality measure). This can also be seen from Fig. 5, where the node of Financial and Insurance Companies is shown in blue and that of Non-Financial Companies in green. This means that the investment behavior of financial institutions is not very different from the behavior of other investor groups. In fact, this result does not support some earlier findings that institutional investors and households (all households collected together) exhibit different trading strategies (see ref.32). Rather, based on our findings, there are large groups of households whose trading patterns are strongly associated with financial institutions, which we can observe from validated networks, aggregated over multiple stocks.

Discussion

The advantage of our aggregation approach is that no arbitrary link-filtering threshold is needed. Instead, the algorithm adjusts this itself depending on a chosen significance level and the properties of the investigated network ensemble. We found that networks aggregated over multiple time periods and inferred over the whole period significantly differed in the number of relationships inferred and the number of nodes having relationships. However, a similar set of nodes was identified as central in both cases. Moreover, when the networks are aggregated over multiple securities and time windows, two-layer aggregation yields different network descriptions depending on the order of information aggregation. It is worth mentioning that the aggregation of time-wise and security-wise trading relationships could be performed in a single step, in which case there would be no confusion about the aggregation order. However, in that case, the meaning of network relationships would be obscure. We would be neither certain that investor categories were similarly trading over a significant number of the same securities nor that they were trading similarly over a significantly large number of the same periods; further, the definition of a single step aggregation would be somewhere in-between, in some cases perhaps failing to meet both criteria. This question will be addressed in future research.

Second, to the best of our knowledge, we are the first to propose the use of lowest resolution–that is, transaction-level–bootstrapping as the means for statistically validating investor network relationships. The advantage of transaction bootstrapping is that it enables network inference over shorter time windows to provide insight into the dynamics of these relationships. Most of the research has been focused on inferring static or time-invariant investor networks, and much less has been done to infer the dynamic relationships that are constantly evolving over time. Indeed, over the course of time, multiple interchanging processes may determine the behavior of investor categories, and such processes can be dynamic and stochastic. Therefore, investor behavior at each time point is dependent on these processes, and investor networks can undergo significant topological changes, rather than being invariant over time. Transaction bootstrapping is a viable strategy for network inference because it not only allows for assigning statistical significance to link existence but also enhances the robustness of the relationships to specific realizations of the trading outcome.

Finally, we introduced investor grouping into categories based on their attributes. This approach allows for performing any analysis while discarding less information. Also, investor category networks based on investor attributes have not been investigated previously in the literature. The vulnerability of the investor categorization approach is that the ensuing analysis is ultimately dependent on the category definition.

In the results section, we observed that Helsinki households represented the most connected investor category. Under the theory of information channels in investor networks14, this category is shown to represent

Window size   Nodes                                  Links
(months)      ST\TS   ST ∩ TS   TS\ST   Jaccard      ST\TS   ST ∩ TS   TS\ST   Jaccard
1             0       34        4       0.8947       1       264       85      0.7543
2             0       40        1       0.9756       19      282       15      0.8924
3             1       39        0       0.9750       40      263       7       0.8484
4             0       42        2       0.9545       46      259       8       0.8275
6             3       42        0       0.9333       86      233       3       0.7236
12            13      38        0       0.7451       132     176       4       0.5641
24            5       42        0       0.8936       62      93        20      0.5314

Table 3. Summary of node and link overlap in various window size final networks. ST stands for the network where the first aggregation layer is security-wise (network (3) in Fig. 1) and TS is the network where the first aggregated layer is time-wise (network (5) in Fig. 1).
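The overlap columns in Table 3 follow directly from set operations on the node (or link) sets of the two final networks. A minimal sketch, with illustrative names and a synthetic pair of sets sized like the 1-month node row:

```python
def overlap_summary(st, ts):
    """Set differences, intersection size, and Jaccard index of two sets."""
    st, ts = set(st), set(ts)
    both = st & ts
    union = st | ts
    return {
        "st_only": len(st - ts),      # ST\TS
        "both": len(both),            # ST ∩ TS
        "ts_only": len(ts - st),      # TS\ST
        "jaccard": len(both) / len(union) if union else 1.0,
    }


# Synthetic sets: 34 shared nodes, 4 nodes only in TS.
st_nodes = set(range(34))
ts_nodes = set(range(34)) | {100, 101, 102, 103}
summary = overlap_summary(st_nodes, ts_nodes)
print(summary["st_only"], summary["both"], summary["ts_only"],
      round(summary["jaccard"], 4))  # 0 34 4 0.8947
```

The Jaccard index is the size of the intersection divided by the size of the union, so 34/(0 + 34 + 4) reproduces the 0.8947 reported for nodes in the 1-month window.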


well-informed investors. In fact, the important role of household investors has been identified in the literature35–38. For example, according to ref.38, households are contrarian traders, leading them to serve as liquidity providers to institutional investors.

In our future research, we will employ this framework to analyze the relationships between buyers and sellers under different market conditions. Additionally, we expect that in future research this framework will also be applied to non-financial settings, such as social networks39, different communication channels40–42, transportation43, and co-authorship44 networks.

Methods

We use an investor-level transaction data set covering the period from 2004-01-01 to 2009-12-31 of all trades executed by domestic investors on the Helsinki Stock Exchange. The data set is composed of transactions belonging to 443556 investors trading in 100 securities over 6 years. The analyzed security list includes the top 100 securities ranked by number of investors and transactions. Each investor in the data set is assigned to a sector group: Financial and Insurance, Government, Non-Financial, and Non-Profit companies, and Finnish Households. Households are further divided into five age groups: Under-Aged (0, 18], Young (18, 30], Middle-Aged (30, 50], Mature (50, 64], and Retired (64, +∞). Age attributes are derived for each transaction separately, taking into account the difference between the transaction date and the year of birth of the corresponding investor. All of these groups are also distributed geographically by assigning investor postal codes to 11 regions: Helsinki, Rest-Uusimaa, Eastern-Tavastia, South-West, Western-Tavastia, Central-Finland, South-East, Ostrobothnia, Northern-Savonia, Eastern-Finland, Northern-Finland. Together, these assignment rules form 99 investor categories. Each transaction in the dataset is assigned to one of these categories.
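The assignment rules can be sketched as follows. This is an illustration under stated assumptions: the postal-code-to-region mapping is not given here, so the region is taken as an input; age is approximated as transaction year minus birth year; and all identifiers are hypothetical.

```python
REGIONS = [
    "Helsinki", "Rest-Uusimaa", "Eastern-Tavastia", "South-West",
    "Western-Tavastia", "Central-Finland", "South-East", "Ostrobothnia",
    "Northern-Savonia", "Eastern-Finland", "Northern-Finland",
]
SECTORS = ["Financial and Insurance", "Government", "Non-Financial", "Non-Profit"]
# (group name, inclusive upper age bound)
AGE_GROUPS = [
    ("Under-Aged", 18), ("Young", 30), ("Middle-Aged", 50),
    ("Mature", 64), ("Retired", float("inf")),
]


def age_group(transaction_year, birth_year):
    """Age group at the time of the transaction (assumed: year difference)."""
    age = transaction_year - birth_year
    for name, upper in AGE_GROUPS:
        if age <= upper:
            return name


def category(region, sector=None, transaction_year=None, birth_year=None):
    """Institutions are keyed by sector, households by age group."""
    if sector is not None:
        return (region, sector)
    return (region, "Household " + age_group(transaction_year, birth_year))


def all_categories():
    groups = SECTORS + ["Household " + name for name, _ in AGE_GROUPS]
    return [(region, group) for region in REGIONS for group in groups]


print(len(all_categories()))  # 99
```

The count follows from 11 regions × (4 institutional sectors + 5 household age groups) = 99 categories.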

In terms of the number of investors, the largest categories are households in various regions and of different ages. For example, the category “Helsinki households, Middle-Age” includes 58 124 individual household investors. In contrast, there are 343 members in the category “Helsinki financial-insurance companies”. The smallest categories are those for General Government, having 6–84 members, depending on the location.

In terms of the average number of trading days, the most active investor category is middle-age individuals (households) in the Helsinki area, with approx. 1 229 trading days/security over this data period. This is a high number, because this category trades a security on average in ∼80% of all the unique trading days. Mature individuals (households), Financial and Insurance Institutions, and Non-Financial Companies, all in the Helsinki area, followed with 1 097, 974, and 929 average trading days/security, respectively. In contrast, general government investor categories located elsewhere than in Helsinki traded on just a few days over the whole period. On the other hand, if activity is measured by the number of transactions, then the most active category is “Helsinki financial-insurance companies”, which traded over 11 million times over the six-year period. Non-financial Companies and mature household investors, both in Helsinki, followed, having around 3.5 million and 2.3 million transactions over the same period. Again, the categories for General Government located elsewhere than in Helsinki were inactive in terms of the number of transactions, having just a few transactions over the whole period.

The data that support the findings of this study are available from Euroclear Finland Ltd. but are not available from the authors, owing to the non-disclosure agreement signed with the data provider.

For network inference, we perform B bootstrap iterations. For each bootstrap iteration, we uniformly re-sample with replacement the whole transaction data set under investigation, yielding a replica dataset of the same size as the original, but with randomly chosen transactions. More specifically, some transactions from the investigated dataset might appear more than once, while others might not be included at all, but the total number of transactions in the bootstrapped dataset will equal the number in the original dataset. Then, for each sampled transaction set b, we aggregate daily transaction records for each category, resulting in a net traded volume matrix $W^{b}$, where $b \in \{1, \dots, B\}$ and the columns $w_i^{b}(t)$ of the matrix are the net traded volume time series of investor group $i$.
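The resampling and daily aggregation steps can be sketched as follows. This is a minimal illustration, assuming each transaction is a `(day, category, signed_volume)` tuple with positive volume for purchases and negative for sales; that representation and the function names are assumptions made for the example.

```python
import random
from collections import defaultdict


def bootstrap_transactions(transactions, rng):
    """Uniformly resample with replacement, keeping the original size."""
    return rng.choices(transactions, k=len(transactions))


def net_volume_matrix(transactions, days, categories):
    """Rows are days, columns are categories, entries are net traded volume."""
    net = defaultdict(float)
    for day, cat, volume in transactions:
        net[(day, cat)] += volume
    return [[net[(day, cat)] for cat in categories] for day in days]


transactions = [
    (1, "A", 10.0), (1, "A", -4.0), (1, "B", 3.0), (2, "B", -2.0),
]
days, categories = [1, 2], ["A", "B"]

print(net_volume_matrix(transactions, days, categories))
# [[6.0, 3.0], [0.0, -2.0]]

rng = random.Random(0)  # seeded only to make the example repeatable
replica = bootstrap_transactions(transactions, rng)
replica_matrix = net_volume_matrix(replica, days, categories)
```

Repeating the last two lines B times yields the B net volume matrices on which network inference is run.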

We estimate the MI values using the net volume matrix. For simplicity, we assume that the joint distribution of net traded volumes is normal. Then we can calculate the MI analytically

Figure 6. Node degree comparison in final networks aggregated over 100 securities and non-overlapping time windows covering the whole 6-year period. T{W}_{O}, where W stands for the window size and O stands for aggregation order, either security-wise or time-wise first. The row with 6 y. inference strategy stands for the network aggregated from networks that were inferred using the whole six-year data set for each security.


from Pearson’s correlation. If $w_j(t)$ is the net traded volume time series in some security of investor group $j$, then the correlation coefficient between investor categories $i$ and $j$ is defined as $\rho_{ij} = [\langle w_i(t)\, w_j(t) \rangle - \langle w_i(t) \rangle \langle w_j(t) \rangle] / (\sigma_i \sigma_j)$, where $\langle \cdot \rangle$ and $\sigma$ denote the mean and standard deviation. Then MI is defined as $I(i, j) = -\frac{1}{2} \log(1 - \rho_{ij}^{2})$.
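The two formulas translate directly into code. The sketch below uses population moments, as in the definition of the correlation coefficient, and natural logarithms (the base of the logarithm is an assumption); function names are illustrative.

```python
from math import log, sqrt


def pearson(x, y):
    """Correlation from means of products, as in the definition of rho_ij."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    sd_x = sqrt(sum(a * a for a in x) / n - mean_x**2)
    sd_y = sqrt(sum(b * b for b in y) / n - mean_y**2)
    return cov / (sd_x * sd_y)


def gaussian_mi(rho):
    """Mutual information of a bivariate normal with correlation rho."""
    return -0.5 * log(1 - rho**2)


w_i = [1.0, 2.0, 3.0, 4.0]
w_j = [1.0, -1.0, -1.0, 1.0]   # uncorrelated with w_i
print(pearson(w_i, w_j), round(gaussian_mi(0.5), 4))  # 0.0 0.1438
```

Note that the MI depends only on the squared correlation, so strongly negative and strongly positive correlations yield the same MI, and it diverges as |ρ| approaches 1.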

We apply a chosen network inference method to MI estimates obtained from the net traded volume matrix $W^{b}$. A specific requirement for the inference method is that it is computationally efficient for handling a large bootstrap ensemble. For this reason, we have chosen the C3NET26 inference method as our main network inference method, and MST as a comparison example. C3NET is intended to infer a significant maximum MI network. This algorithm comprises three basic steps:

1. MI values are estimated for each investor category pair.

2. Each MI value estimate is tested against the null hypothesis of vanishing MI, $H_0: I(i, j) = 0$, i.e. the MI between investor groups i and j is zero. In order to test the statistical significance of the MI estimates, we need to procure an appropriate null distribution. To do that, we independently resample with replacement dates, traded volumes, and categories, completely eliminating any relationship between them. Then we aggregate daily transaction records for each category, resulting in a net traded volume matrix $\tilde{b}$. We do this multiple times, and each time we estimate MI values between pairs of investor groups. These values result in an estimate of the null distribution, which we use to find statistically significant MI values.

3. From the statistically significant MI values, each investor group is allowed to keep a single link, the one with the strongest statistically significant MI value. The resulting binary network has at most M relationships in a system of M nodes.
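Step 3 of the list above (each group keeps only its strongest significant link) can be illustrated as follows. This is a minimal sketch, not the reference C3NET implementation: the function name `c3net_select` is hypothetical, and the significance mask is assumed to have been computed beforehand from the resampled null distribution.

```python
def c3net_select(mi, significant):
    """Keep, for every node, the single significant link with the largest MI.

    mi          -- symmetric square matrix (list of lists) of MI estimates
    significant -- same-shape matrix of booleans from the significance test
    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(mi)
    edges = set()
    for i in range(n):
        best_j = None
        for j in range(n):
            if j == i or not significant[i][j]:
                continue
            if best_j is None or mi[i][j] > mi[i][best_j]:
                best_j = j
        if best_j is not None:
            # Two nodes choosing each other contribute one undirected edge.
            edges.add((min(i, best_j), max(i, best_j)))
    return edges

mi = [[0.0, 0.9, 0.2],
      [0.9, 0.0, 0.5],
      [0.2, 0.5, 0.0]]
all_significant = [[i != j for j in range(3)] for i in range(3)]
print(sorted(c3net_select(mi, all_significant)))
```

Since every node contributes at most one link, the result can never contain more than M edges for M nodes, which matches the bound stated in step 3.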

MST is another strict network filtering algorithm. It keeps the subset of edges, that connect all nodes, while at the same time having the smallest possible total sum of edge weights. In order to use MST together with MI estimates, we transform MI values by multiplication with −1, then the network construction is straightforward:

1. Sort the transformed pairwise MI estimates (−1 × MI) in ascending order.

2. Take the node pair with the lowest −1 × MI value and add a link between them in the network if it doesn’t create a cycle in the network.

3. Remove the previously chosen MI value from the queue.

4. Repeat steps 2 and 3 until all nodes are connected or no unused MI values remain in the queue.
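The four steps above amount to Kruskal's algorithm applied to the −1 × MI weights. The sketch below is one way to write it, assuming a symmetric MI matrix given as a list of lists; the function name `mst_from_mi` and the union-find helper are illustrative choices, not part of the original method description.

```python
def mst_from_mi(mi):
    """Spanning tree over -1 * MI weights (Kruskal's algorithm with union-find)."""
    n = len(mi)
    parent = list(range(n))

    def find(a):
        # Follow parent pointers to the component root, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Step 1: queue of transformed values, sorted in ascending order.
    queue = sorted((-mi[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for _, i, j in queue:            # steps 2-4: consume the queue in order
        root_i, root_j = find(i), find(j)
        if root_i != root_j:         # adding the link creates no cycle
            parent[root_i] = root_j
            tree.append((i, j))
            if len(tree) == n - 1:   # all nodes are connected
                break
    return tree

mi = [[0.0, 0.9, 0.2],
      [0.9, 0.0, 0.5],
      [0.2, 0.5, 0.0]]
print(mst_from_mi(mi))
```

Unlike the C3NET step, this filter uses no significance test, so it always returns M − 1 links whenever a value is available for every pair.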

The aggregation procedure takes an ensemble of N independent undirected binary networks $\{G_k\}_{k=1}^{N}$ as an input and gives a single network G as an output. Our procedure is methodologically similar to that of ref. 25, where the inference was improved by aggregating a bootstrap ensemble of gene regulatory networks, though we use the principle to aggregate layers, not a bootstrap ensemble. In our demonstration, the number of networks in the ensemble N takes the values of B when aggregating the bootstrap ensemble, T when aggregating over non-overlapping time periods and S when aggregating over securities.

1. The network ensemble is aggregated into a weighted network, $\{G_k\}_{k=1}^{N} \rightarrow G_w$, where the edge weights in the network $G_w$ correspond to the number of occurrences of the particular edge in the ensemble. For example, the weight of an edge between investor groups i and j is defined as

$$n_{ij} = G_w(i,j) = \sum_{k=1}^{N} G_k(i,j), \qquad (1)$$

where $n_{ij}$ may assume integer values between 0 and N.

2. We conduct a statistical hypothesis test to remove the need for an arbitrary link threshold parameter:

$H_0^{n_{ij}}$: The number of networks $n_{ij}$ in the ensemble with an edge between i and j is less than $n_0(\alpha)$, where $\alpha$ is the significance level.

We define p as the probability of two investor groups being randomly connected. We estimate the probability p, for two groups to be connected by chance in an N network ensemble, as the fraction of the actual number of edges in the ensemble, $\sum_{i>j,\,k} G_k(i,j)$, to the number of all possible links in the ensemble, $N \times n(n-1)/2$, where n is the number of investor groups. Then $n_{ij}$ follows a binomial distribution $\mathcal{B}(p, N)$, and

$$p_{ij} = p(n \geq n_{ij}) = \sum_{n=n_{ij}}^{N} \binom{N}{n} p^{n} (1-p)^{N-n} \qquad (2)$$

is the probability of observing by chance the link between investor groups i and j at least $n_{ij}$ times.

3. In order to control the family-wise error rate, we leverage the strict Bonferroni multiple hypothesis test correction (MTC) procedure. Following the Bonferroni procedure, we adjust the chosen significance level $\alpha$ by the number of tests ($n_{tests}$) we perform: $\alpha_{adjusted} = \alpha/n_{tests}$. Therefore, nodes in the aggregated network G are connected if $p_{ij} < \alpha_{adjusted}$.
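The three aggregation steps can be put together in a short sketch. It represents each binary network as a set of edges (i, j) with i < j, and it assumes that the number of tests equals the number of possible node pairs; the text does not state how $n_{tests}$ is counted, so that choice, like the function names and the default α, is an assumption made only for illustration.

```python
from math import comb

def binomial_tail(n_ij, n_networks, p):
    """P(n >= n_ij) for n ~ Binomial(n_networks, p), as in Eq. (2)."""
    return sum(
        comb(n_networks, n) * p ** n * (1 - p) ** (n_networks - n)
        for n in range(n_ij, n_networks + 1)
    )

def aggregate(networks, n_nodes, alpha=0.01):
    """Aggregate an ensemble of edge sets into one binary network."""
    n_networks = len(networks)
    # Step 1: edge weights n_ij of the weighted network G_w (Eq. 1).
    counts = {}
    for edges in networks:
        for edge in edges:
            counts[edge] = counts.get(edge, 0) + 1
    # Step 2: chance probability p = observed edges / possible edges.
    n_pairs = n_nodes * (n_nodes - 1) // 2
    p = sum(counts.values()) / (n_networks * n_pairs)
    # Step 3: Bonferroni correction. ASSUMPTION: one test per node pair.
    alpha_adjusted = alpha / n_pairs
    return {
        edge for edge, n_ij in counts.items()
        if binomial_tail(n_ij, n_networks, p) < alpha_adjusted
    }

# Edge (0, 1) appears in all 10 networks, edge (2, 3) in only one.
ensemble = [{(0, 1)} for _ in range(9)] + [{(0, 1), (2, 3)}]
print(aggregate(ensemble, n_nodes=4))
```

In this toy ensemble only the recurring edge survives, because a single occurrence of (2, 3) is well within what the estimated chance probability p would produce.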


1. For a set of securities S and a number of non-overlapping inference periods T, we infer S × T networks $\{G_{st}\}_{(S\times T)}$. For all security × inference period combinations, we perform transaction bootstrapping, network inference and bootstrap network ensemble aggregation, yielding the $\{G_{st}\}_{(S\times T)}$ ensemble. Each network represents significant relationships between investor groups for a specific security in a specific period.

2. Then we apply the network aggregation procedure over securities for each period t, $\forall t \in T: \{G_{st}\}_{s=1}^{S} \xrightarrow{\text{network aggregation}} G_t$, and we end up with an ensemble of networks $\{G_t\}_{t=1}^{T}$. Each of the $\{G_t\}$ networks represents significant relationships between investor groups that occur over multiple securities during period t. Similarly, if we apply the network aggregation procedure over time for each security s, $\forall s \in S: \{G_{st}\}_{t=1}^{T} \xrightarrow{\text{network aggregation}} G_s$, we end up with an ensemble of networks $\{G_s\}_{s=1}^{S}$, where each of the networks $\{G_s\}$ represents the most important relationships between investor groups in security s that reoccur over time.

3. Finally, we aggregate the second layer of information, $\{G_t\}_{t=1}^{T} \xrightarrow{\text{network aggregation}} G$ and $\{G_s\}_{s=1}^{S} \xrightarrow{\text{network aggregation}} \hat{G}$, respectively. Both aggregation sequences lead to unique networks.

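$$
\begin{aligned}
\{G_{st}\}_{(S\times T)} &\Rightarrow \forall t:\ \{G_{st}\}_{s=1}^{S} \xrightarrow{\text{n. agg.}} G_t \Rightarrow \{G_t\}_{t=1}^{T} \xrightarrow{\text{n. agg.}} G \\
\{G_{st}\}_{(S\times T)} &\Rightarrow \forall s:\ \{G_{st}\}_{t=1}^{T} \xrightarrow{\text{n. agg.}} G_s \Rightarrow \{G_s\}_{s=1}^{S} \xrightarrow{\text{n. agg.}} \hat{G}, \qquad G \neq \hat{G}
\end{aligned}
\qquad (3)
$$

Both $G$ and $\hat{G}$ are accordingly equivalent to the illustrated networks (3) and (5) in Fig. 1.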
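A toy example may help to see why the two aggregation orders in Eq. (3) need not coincide. The sketch below deliberately swaps the binomial test for a simple majority vote (an edge is kept if it occurs in more than half of the layers), purely as a stand-in so the example stays short; the function names and the 3 × 3 grid are hypothetical.

```python
def majority(layers):
    """Stand-in aggregation: keep edges present in more than half of the layers."""
    counts = {}
    for edges in layers:
        for edge in edges:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, c in counts.items() if 2 * c > len(layers)}

def securities_then_time(grid, agg):
    """{G_st} -> G_t for every period t, then {G_t} -> G."""
    n_sec, n_per = len(grid), len(grid[0])
    per_period = [agg([grid[s][t] for s in range(n_sec)]) for t in range(n_per)]
    return agg(per_period)

def time_then_securities(grid, agg):
    """{G_st} -> G_s for every security s, then {G_s} -> G-hat."""
    per_security = [agg(row) for row in grid]
    return agg(per_security)

e = (0, 1)
# grid[s][t]: edge set inferred for security s in period t.
grid = [
    [{e}, {e}, set()],
    [{e}, set(), {e}],
    [set(), set(), set()],
]
G = securities_then_time(grid, majority)
G_hat = time_then_securities(grid, majority)
print(G, G_hat)
```

Here the edge recurs within two securities over time, so it survives in $\hat{G}$, but in two of the three periods it is present in only one security, so it is dropped from $G$.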

References

1. Kivelä, M. et al. Multilayer networks. Journal of Complex Networks 2, 203–271 (2014).
2. De Domenico, M. et al. Mathematical formulation of multilayer networks. Physical Review X 3, 041022 (2013).
3. Boccaletti, S. et al. The structure and dynamics of multilayer networks. Physics Reports 544, 1–122 (2014).
4. Bargigli, L., Di Iasio, G., Infante, L., Lillo, F. & Pierobon, F. The multiplex structure of interbank networks. Quantitative Finance 15, 673–691 (2015).
5. Musmeci, N., Nicosia, V., Aste, T., Di Matteo, T. & Latora, V. The multiplex dependency structure of financial markets. arXiv preprint arXiv:1606.04872 (2016).
6. Battiston, S. et al. Complexity theory and financial regulation. Science 351, 818–819 (2016).
7. Newman, M. E. J. The structure and function of complex networks. SIAM Review 45, 167–256 (2003).
8. Dehmer, M. & Emmert-Streib, F. (eds) Analysis of Complex Networks: From Biology to Linguistics (Wiley-VCH, Weinheim, 2009).
9. Cimini, G., Squartini, T., Garlaschelli, D. & Gabrielli, A. Systemic risk analysis on reconstructed economic and financial networks. Scientific reports 5, 15758 (2015).
10. Barucca, P. et al. Network valuation in financial systems. arXiv preprint arXiv:1606.05164 (2016).
11. Haldane, A. G. & May, R. M. Systemic risk in banking ecosystems. Nature 469, 351 (2011).
12. Cont, R., Moussa, A. & Santos, E. B. e. Network structure and systemic risk in banking systems. Available at SSRN: https://ssrn.com/abstract=1733528 (2010).
13. Tumminello, M., Lillo, F., Piilo, J. & Mantegna, R. N. Identification of clusters of investors from their real trading activity in a financial market. New Journal of Physics 14, 013041 (2012).
14. Ozsoylev, H. N., Walden, J., Yavuz, M. D. & Bildik, R. Investor networks in the stock market. The Review of Financial Studies 27, 1323–1366 (2013).
15. Gualdi, S., Cimini, G., Primicerio, K., Di Clemente, R. & Challet, D. Statistically validated network of portfolio overlaps and systemic risk. Scientific reports 6 (2016).
16. Bastian, M., Heymann, S. & Jacomy, M. Gephi: an open source software for exploring and manipulating networks. Icwsm 8, 361–362 (2009).
17. Jacomy, M., Venturini, T., Heymann, S. & Bastian, M. Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. PloS one 9, e98679 (2014).
18. Ranganathan, S., Kivelä, M. & Kanniainen, J. Dynamics of investor spanning trees around dot-com bubble. arXiv preprint arXiv:1708.04430 (2017).
19. Emmert-Streib, F. & Dehmer, M. Influence of the time scale on the construction of financial networks. PloS one 5, e12884 (2010).
20. Bernardo, J. et al. Bayesian factor regression models in the “large p, small n” paradigm. Bayesian statistics 7, 733–742 (2003).
21. Zhong, R., Allen, J. D., Xiao, G. & Xie, Y. Ensemble-based network aggregation improves the accuracy of gene network reconstruction. PloS one 9, e106319 (2014).
22. Breitling, R., Armengaud, P., Amtmann, A. & Herzyk, P. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS letters 573, 83–92 (2004).
23. Polikar, R. Ensemble based systems in decision making. IEEE Circuits and systems magazine 6, 21–45 (2006).
24. De Domenico, M., Nicosia, V., Arenas, A. & Latora, V. Structural reducibility of multilayer networks. Nature communications 6, 6864 (2015).
25. de Matos Simoes, R. & Emmert-Streib, F. Bagging statistical network inference from large-scale gene expression data. PLoS One 7, e33624 (2012).
26. Altay, G. & Emmert-Streib, F. Inferring the conservative causal core of gene regulatory networks. BMC Systems Biology 4, 132 (2010).
27. Mantegna, R. N. Hierarchical structure in financial markets. The European Physical Journal B-Condensed Matter and Complex Systems 11, 193–197 (1999).
28. Tumminello, M., Aste, T., Di Matteo, T. & Mantegna, R. N. A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences of the United States of America 102, 10421–10426 (2005).
29. Peng, J., Wang, P., Zhou, N. & Zhu, J. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association 104, 735–746 (2009).
30. Boginski, V., Butenko, S. & Pardalos, P. M. Statistical analysis of financial networks. Computational statistics & data analysis 48, 431–443 (2005).
31. Onnela, J.-P., Kaski, K. & Kertész, J. Clustering and information in correlation based financial networks. The European Physical Journal B-Condensed Matter and Complex Systems 38, 353–362 (2004).


32. Grinblatt, M. & Keloharju, M. The investment behavior and performance of various investor types: a study of finland’s unique data set. Journal of financial economics 55, 43–67 (2000).
33. Berkman, H., Koch, P. D. & Westerholm, P. J. Informed trading through the accounts of children. The Journal of Finance 69, 363–404 (2014).
34. Olli, K. Internal migration and specialising labour markets in finland. Finnish Yearbook of Population Research 103–125 (2001).
35. Grinblatt, M. & Keloharju, M. How distance, language, and culture influence stockholdings and trades. The Journal of Finance 56, 1053–1073 (2001).
36. Grinblatt, M. & Keloharju, M. What makes investors trade? The Journal of Finance 56, 589–616 (2001).
37. Kaniel, R., Liu, S., Saar, G. & Titman, S. Individual investor trading and return patterns around earnings announcements. The Journal of Finance 67, 639–680 (2012).
38. Kaniel, R., Saar, G. & Titman, S. Individual investor trading and stock returns. The Journal of Finance 63, 273–310 (2008).
39. Scott, J. Social network analysis (Sage, 2017).
40. Onnela, J.-P. et al. Structure and tie strengths in mobile communication networks. Proceedings of the national academy of sciences 104, 7332–7336 (2007).
41. Newman, M. E., Forrest, S. & Balthrop, J. Email networks and the spread of computer viruses. Physical Review E 66, 035101 (2002).
42. Isella, L. et al. What’s in a crowd? analysis of face-to-face behavioral networks. Journal of theoretical biology 271, 166–180 (2011).
43. Guimera, R., Mossa, S., Turtschi, A. & Amaral, L. N. The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proceedings of the National Academy of Sciences 102, 7794–7799 (2005).
44. Liu, X., Bollen, J., Nelson, M. L. & Van de Sompel, H. Co-authorship networks in the digital library research community. Information processing & management 41, 1462–1480 (2005).

Acknowledgements
The research project leading to these results received funding from the EU Research and Innovation Programme Horizon 2020 under grant agreement No. 675044 (BigDataFinance).

Author Contributions
All authors designed the experiment, wrote and reviewed the main manuscript text. K.B. prepared all figures (Figure 1 together with F.E.) and conducted the empirical analysis.

Additional Information
Competing Interests: The authors declare no competing interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2018


PUBLICATION

III

Clusters of Investors Around Initial Public Offering
Baltakiene, M., Baltakys, K., Kanniainen, J., Pedreschi, D. and Lillo, F.

Palgrave Communications 5.129 (2019)
DOI: 10.1057/s41599-019-0342-6

Publication reprinted with the permission of the copyright holders


ARTICLE

Clusters of investors around initial public offering

Margarita Baltakienė 1*, Kęstutis Baltakys 1, Juho Kanniainen 1, Dino Pedreschi 2 & Fabrizio Lillo 3

ABSTRACT The complex networks approach has been gaining popularity in analysing investor behaviour and stock markets, but within this approach, initial public offerings (IPOs) have barely been explored. We fill this gap in the literature by analysing investor clusters in the first two years after the IPO filing in the Helsinki Stock Exchange by using a statistically validated network method to infer investor links based on the co-occurrences of investors’ trade timing for 69 IPO stocks. Our findings show that a rather large part of statistically similar network structures form in different securities and persist in time for mature and IPO companies. We also find evidence of institutional herding.

https://doi.org/10.1057/s41599-019-0342-6 OPEN

1 Unit of Computational Sciences, Tampere University, Korkeakoulunkatu 1, 33720 Tampere, Finland. 2 Department of Computer Science, University of Pisa, Largo B. Pontecorvo 3, 56127 Pisa, Italy. 3 Department of Mathematics, University of Bologna, Piazza di Porta San Donato 5, 40126 Bologna, Italy. *email: [email protected]

PALGRAVE COMMUNICATIONS | (2019) 5:129 | https://doi.org/10.1057/s41599-019-0342-6 | www.nature.com/palcomms 1


Introduction

Initial public offerings (IPOs) play an important role in financial markets because they open new investment opportunities, redistribute funds’ allocations and attract new investors to the market. An IPO is usually a long-awaited event in the life of a privately held company, both for the current stockholders and the public exchange investors, giving the owners the opportunity to cash in and giving the investors a chance to gain from potential underpricing and future returns. Numerous financial studies have addressed various behavioural biases in relation to IPOs: Ljungqvist and Wilhelm Jr (2005) analysed the satisfaction with an IPO underwriter’s performance, Ljungqvist and Wilhelm Jr (2003) indicated a unique pricing behaviour around the dot-com bubble, while Kaustia and Knüpfer (2008) found that investors’ personal experiences and previous IPO returns have a significant impact on future IPO subscriptions. Other studies have analysed IPO investments (Karhunen and Keloharju, 2001), IPO earnings (Spohr, 2004) and IPO underpricing (Keloharju, 1993) in financial markets on an aggregated level.

Financial markets, in turn, are complex systems comprised of financial decisions, information flows and direct and indirect investor interactions. A typical aspect of a financial market is multidimensionality and agent heterogeneity (Lakonishok and Maberly, 1990; Musciotto et al., 2016). Making an investment decision is a complex procedure because it is layered with different choices that are influenced by various market factors, investors’ experiences, wealth and investors’ stage of life. It is crucial to understand the characteristics of the underlying investor behaviour patterns because these, when combined with their behaviours, shape the dynamics of the whole market and thus are important factors in explaining the booms and bubbles in the financial markets (Ranganathan et al., 2018). Because investors seek higher returns, one possibility is to use social networks and other private information channels to follow other investors’ strategies and to exploit privately channelled information in stock markets. Recently, Baltakys et al. (2018a) provided evidence of the negative relationship between distance and trade timing similarity for household investors, indicating that face-to-face communication is still important in financial decision making. According to Ozsoylev et al. (2013), information links can be identified from realised trades because investors who are directly linked in the information network tend to time their transactions similarly. We follow this idea and use observations on investor-level transactions from shareholder registration data to identify the links between investors, here with a special focus on identifying investor clusters. Prior studies have investigated the structures of investor networks in different contexts (Ozsoylev et al., 2013; Tumminello et al., 2012; Gualdi et al., 2016; Musciotto et al., 2018; Ranganathan et al., 2018; Baltakys et al., 2018b), but investor clusters around IPOs have barely been explored.

We address this research gap by performing a broad multistock exploratory analysis of investor clusters over 69 stocks in the first two years after their IPO. In particular, we seek to establish whether the identified investor clusters are persistent over the first two years of the IPOs and appear across multiple IPO securities, as well as with existing, mature stocks in the market. Our analysis unveils statistically robust investor clusters that form simultaneously in various securities, and that persist over time.

Most of the earlier papers perform analyses on an aggregated category level (Karhunen and Keloharju, 2001; Grinblatt and Keloharju, 2001; Lillo et al., 2015; Siikanen et al., 2018) or concentrate on a single highly liquid stock (Tumminello et al., 2012; Musciotto et al., 2018). Even though earlier studies might have included nearly all market participants (Tumminello et al., 2011a; Musciotto et al., 2018), due to the focus on a single most liquid security, the results were limited and insufficient to conclude what strategies investors employ when trading over multiple securities. In contrast to previous research in the IPO literature, the current study is the first one on early-stage trading behaviour patterns on an individual investor account level. On the other hand, in opposition to the existing research on investor networks, in the current paper, instead of focusing on heavily capitalised stocks, we analyse collective investor trading strategies that emerge after IPOs in the Helsinki Stock Exchange (HSE).

With the growing amounts of data and the availability of new datasets, the network theory has become a popular approach in analysing financial complex systems (e.g., Emmert-Streib et al., 2018). Notwithstanding the high interest in the market structure, investor networks and the complexity of investor behavioural interrelationships remain weakly explored. Indeed, high precision financial investor-level datasets covering years of historical data and containing information about the social links are very rare and expensive because of their sensitive nature. Moreover, transactional data often have no explicit or implicit links between investors. As a consequence, the network inference methodologies have gained much interest in recent research (Ozsoylev et al., 2013; Gualdi et al., 2016). Similar to Musciotto et al. (2018), we use the statistical validation method proposed by Tumminello et al. (2011a), which best suits our objectives and the available dataset.

In the current paper, we infer investor networks based on the investors’ trading co-occurrences for 69 securities that had their IPOs between the years 1995 and 2007, and we obtain multilink networks covering two years after their IPOs. Further, by applying the Infomap algorithm (Rosvall and Bergstrom, 2008) on the investor networks, we obtain clusters of investors that share high trade-timing synchronisation. With the obtained network partitioned into clusters, we detect statistically robust clusters that persist in the networks between the first and the second years after the IPO. We also find clusters that form and re-occur over multiple securities. Finally, by cross-validating investor clusters on IPO securities with the investor clusters of more mature stocks, we conclude that the phenomenon of persistent clusters observed in earlier studies (see e.g. Musciotto et al., 2018) is not limited to mature companies but is also observable in young securities during the first years after their IPO.

Dataset and methodology

Dataset. In this paper, we use a unique database provided by Euroclear Finland. The dataset contains all transactions executed in the HSE by shareholders of Finnish stocks between 1995 and 2009 on a daily basis. The data records represent the official certificates of ownership and include all the transactions executed in the HSE that change the ownership of assets. Each transaction in the dataset has a rich set of attributes (such as investor sector code, investor birth year, gender and postal code) that we make use of in our analysis to identify and characterise the investor groups. The dataset classifies investors into six main categories: households; nonfinancial corporations; financial and insurance corporations; government; nonprofit institutions; and the rest of the world. Finnish domestic investors correspond to a separate account ID, while foreign investors can choose the nominee registration for the trades. However, the analysis cannot be conducted for nominee-registered transactions because individual nominee investors cannot be uniquely identified. Rather, the nominee investors are pooled together under the custodian’s nominee trading account. Therefore, a single nominee-registered investor’s account holdings may correspond to a large aggregated ownership of several foreign investors. So to avoid inconsistencies in the results, we eliminated nominee transactions from our analysis. This dataset has also been analysed and described in previous research (e.g., Ilmanen and Keloharju, 1999; Baltakys et al., 2018a, 2018b; Ranganathan et al., 2018; Siikanen et al., 2018).

The analysed data are restricted to marketplace transactions for securities that had their IPO listing in the HSE between 1995 and 2009. The official listing dates were provided by NASDAQ OMX Nordic explicitly for the current research. We analyse 69 stocks (notes 1, 2) in total that were listed in Finland on the Main Exchange or First North in the given time period (Table 1). Some companies (e.g. Oriola) have two share classes with different voting rights. Class A shares give the owner more voting rights than Class B shares and hence potentially fall under a separate group of investors. Therefore, the comparison or a direct substitution of shares with one another seems improper, and we consider the securities with different voting classes as separate stocks.

Table 1 Summary of IPO stocks

ISIN | Company name | Industry | Total # of transactions | # of unique investors | IPO date
FI0009004881 | Aspoyhtymä | Industrials | 13,157 | 2070 | 1995-01-12
FI0009800346 | Orion B | Basic materials | 399,268 | 45,588 | 1995-05-11
FI0009800320 | Orion A | Basic materials | 116,334 | 18,132 | 1995-05-11
FI0009900336 | Lemminkäinen | Industrials | 94,849 | 13,269 | 1995-06-01
FI0009005318 | Nokian Renkaat | Consumer goods | 1,152,852 | 60,476 | 1995-06-01
FI0009800643 | YIT | Industrials | 896,718 | 54,808 | 1995-09-04
FI0009005870 | Konecranes | Industrials | 715,306 | 26,940 | 1996-03-27
FI0009005953 | Stora Enso A | Basic materials | 73,993 | 14,816 | 1996-05-02
FI0009005961 | Stora Enso R | Basic materials | 1,514,604 | 52,567 | 1996-05-02
FI0009005987 | UPM-Kymmene | Basic materials | 2,323,897 | 118,769 | 1996-05-02
FI0009006381 | PKC Group | Industrials | 194,480 | 24,624 | 1997-04-03
FI0009006415 | Nordic Aluminium | Basic materials | 19,012 | 4291 | 1997-04-24
FI0009005805 | Kyro | Consumer services | 44,418 | 9100 | 1997-06-09
FI0009006589 | Rocla | Basic materials | 15,415 | 3918 | 1997-06-17
FI0009006621 | Helsingin Puhelin | Telecommunications | 116,532 | 32,367 | 1997-11-25
FI0009006738 | Elcoteq | Technology | 503,265 | 43,323 | 1997-11-26
FI0009006696 | Pöyry | Industrials | 125,202 | 14,135 | 1997-12-02
FI0009006761 | Metsä Tissue | Basic materials | 11,286 | 3725 | 1997-12-09
FI0009007017 | Alma Media I | Consumer services | 10,673 | 2472 | 1998-04-01
FI0009007025 | Alma Media II | Consumer services | 30,500 | 5383 | 1998-04-01
FI0009007066 | Ramirent | Industrials | 295,726 | 21,662 | 1998-04-30
FI0009006829 | Sponda | Financials | 213,977 | 19,500 | 1998-06-01
FI0009007215 | Mandatum Pankki | Financials | 25,732 | 6430 | 1998-08-03
FI0009007264 | Elektrobit | Technology | 681,542 | 74,839 | 1998-09-15
FI0009007371 | Sonera | Telecommunications | 1,504,103 | 140,253 | 1998-11-17
FI0009007355 | Rapala VMC | Consumer goods | 30,739 | 5202 | 1998-12-04
FI0009007132 | Fortum | Utilities | 2,068,556 | 120,902 | 1998-12-18
FI0009007629 | Conventum | Financials | 13,395 | 2736 | 1999-03-01
FI0009801286 | Janton | Consumer services | 22,946 | 5418 | 1999-03-15
FI0009007553 | Eimo | Telecommunications | 187,912 | 24,664 | 1999-03-23
FI0009007728 | Teleste | Technology | 209,132 | 22,537 | 1999-04-06
FI0009007546 | Keskisuomalainen | Consumer services | 11,019 | 2046 | 1999-04-19
FI0009007686 | SanomaWSOY A | Consumer services | 10,784 | 2438 | 1999-05-03
FI0009007694 | Sanoma | Consumer services | 458,541 | 33,242 | 1999-05-03
FI0009006886 | Technopolis | Financials | 85,510 | 8892 | 1999-06-08
FI0009007819 | Perlos | Telecommunications | 520,835 | 44,281 | 1999-06-28
FI0009007835 | Metso | Industrials | 1,528,914 | 69,361 | 1999-07-01
FI0009007884 | Elisa | Telecommunications | 1,209,330 | 199,530 | 1999-07-01
FI0009008080 | Aspocomp Group | Industrials | 99,023 | 10,948 | 1999-10-01
FI0009007918 | Aldata Solution | Technology | 253,021 | 22,840 | 1999-10-27
FI0009801310 | F-Secure | Technology | 578,978 | 70,994 | 1999-11-09
FI0009008221 | Comptel | Telecommunications | 529,255 | 65,050 | 1999-12-13
FI0009902530 | Nordea Bank | Financials | 1,081,900 | 149,790 | 2000-01-31
FI0009008924 | Sievi Capital | Financials | 91,541 | 12,109 | 2000-05-24
FI0009008833 | Tekla | Telecommunications | 73,328 | 8581 | 2000-05-24
FI0009009146 | Tecnomen | Telecommunications | 19,745 | 4532 | 2000-07-04
FI0009009054 | Okmetic | Telecommunications | 75,944 | 10,430 | 2000-07-05
FI0009009633 | Evox Rifa Group | Telecommunications | 51,493 | 10,203 | 2000-11-01
FI0009009567 | Vacon | Telecommunications | 80,081 | 10,770 | 2000-12-19
FI0009008270 | SSH Comm. Security | Technology | 112,633 | 16,433 | 2000-12-22
FI0009009674 | AvestaPolarit | Basic materials | 24,752 | 4299 | 2001-01-30
FI0009009377 | CapMan | Financials | 74,153 | 11,279 | 2001-04-02
FI0009010219 | Glaston | Industrials | 47,748 | 8174 | 2001-04-02
FI0009010854 | Lassila & Tikanoja | Industrials | 120,822 | 13,385 | 2001-10-01
FI0009010862 | Suominen | Consumer goods | 51,734 | 7052 | 2001-10-01
SE0000667925 | Telia | Telecommunications | 870,709 | 107,088 | 2002-12-09
SE0000110165 | OMX | Financials | 8721 | 1851 | 2003-09-04
FI0009012843 | Kemira GrowHow | Basic materials | 142,417 | 25,253 | 2004-10-18
FI0009013296 | Neste Oil | Oil & gas | 1,387,293 | 81,750 | 2005-04-21
FI0009013429 | Cargotec | Industrials | 474,949 | 29,210 | 2005-06-01
FI0009013312 | Affecto | Technology | 40,635 | 5726 | 2005-06-01
FI0009013403 | Kone | Industrials | 618,717 | 30,192 | 2005-06-01
FI0009013924 | Salcomp | Industrials | 28,721 | 3688 | 2006-03-17
FI0009010391 | Ahlstrom | Basic materials | 87,853 | 16,594 | 2006-03-17
FI0009013593 | FIM Group | Financials | 11,379 | 3084 | 2006-04-21
FI0009014344 | Oriola A | Health care | 25,922 | 5595 | 2006-07-03
FI0009014351 | Oriola B | Health care | 116,890 | 19,279 | 2006-07-03
FI0009012413 | Terveystalo Health | Health care | 35,203 | 8946 | 2007-04-10
FI0009015309 | SRV Yhtiöt | Industrials | 56,384 | 9579 | 2007-06-15

International Securities Identification Number (ISIN), company, industry, total number of transactions, total number of unique investors and the IPO day of the security. ISINs from the error-free set are marked in bold

Table 2 gives the number of investors, the number of transactions and the traded volume for the entire set of 69 IPO stocks. The total number of investors who traded an IPO security is 570,039, and the total number of transactions is 76,505,089. The table also shows the number of nominee and non-nominee-registered investors. As shown, a few nominee accounts perform roughly twice as many trades as the non-nominee accounts.

Methodology. The given dataset is composed of transaction data where investors’ social links are not explicitly given, nor can they be directly obtained from other sources because of data anonymisation. However, given that investors must individually react and adapt to a quickly changing environment, they should identify and follow the best trading strategies. To detect investors with similar trading strategies or, more precisely, trade timing similarity, we take a look at the pairwise investors’ trading co-occurrences. In the current paper, we use a statistically validated network (SVN) method first introduced by Tumminello et al. (2011a). This method, briefly presented below, has been demonstrated to be effective in investigating financial, biological and social systems (Tumminello et al., 2011a, 2012).

To compare the trading position taken by an investor on a given day, irrespective of the absolute volume traded, a categorical variable is introduced that describes the investor’s trading activity. For each investor i and each trading day t, given the volume sold of a security $V_s(i,t)$ and the volume bought of a security $V_b(i,t)$, we calculate the scaled net volume ratio as follows:

$$r(i,t) = \frac{V_b(i,t) - V_s(i,t)}{V_b(i,t) + V_s(i,t)} \qquad (1)$$

Then, a daily trading state can be assigned for an investor after having selected a threshold $\theta$, as follows:

$$
\begin{cases}
b \ \text{(primarily buying state)}, & \text{when } r(i,t) > \theta \\
s \ \text{(primarily selling state)}, & \text{when } r(i,t) < -\theta \\
bs \ \text{(buying and selling state)}, & \text{when } -\theta \leq r(i,t) \leq \theta
\end{cases}
$$

Note that r(i, t) is not defined for a day t that had no trading activity, and therefore, no trading state is assigned. In our analysis, much like in Musciotto et al. (2016), we set θ = 0.25. We have verified that the calculations are not sensitive to θ selection: the results do not vary significantly for the θ threshold ranging from 0.01 to 0.25. With this categorisation, the system can be mapped into a bipartite network. We will take one set of nodes composed of investors and the other set composed of the trading days.
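The categorisation can be written as a small function. The sketch below assumes that bought and sold volumes per investor and day are already available as non-negative numbers; the function name `trading_state` and the use of `None` for days without trading are illustrative choices, not taken from the paper.

```python
def trading_state(v_buy, v_sell, theta=0.25):
    """Return 'b', 's' or 'bs' from the scaled net volume ratio r(i, t).

    Returns None when the investor did not trade, because r(i, t) is
    undefined for such days and no state is assigned.
    """
    total = v_buy + v_sell
    if total == 0:
        return None
    r = (v_buy - v_sell) / total   # Eq. (1), always within [-1, 1]
    if r > theta:
        return "b"                 # primarily buying
    if r < -theta:
        return "s"                 # primarily selling
    return "bs"                    # buying and selling

for volumes in [(100, 0), (0, 50), (60, 40), (0, 0)]:
    print(volumes, trading_state(*volumes))
```

Dividing by the total traded volume is what makes the state independent of trade size: an investor buying 10 shares and one buying 10,000 shares are both in state b if neither sells.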

The states b, s and bs of investor i are indicated as $i_b$, $i_s$ and $i_{bs}$, respectively. There are nine possible combinations of the three trading states between investors i and j: $(i_b, j_b)$, $(i_b, j_s)$, $(i_b, j_{bs})$, $(i_s, j_b)$, $(i_s, j_s)$, $(i_s, j_{bs})$, $(i_{bs}, j_b)$, $(i_{bs}, j_s)$ and $(i_{bs}, j_{bs})$. Because we are focusing on the positive relationship between investors’ trading strategies, we further analyse only the situations where both investors have been in the buy state $(i_b, j_b)$, both investors have been in the sell state $(i_s, j_s)$, and both investors have been day traders $(i_{bs}, j_{bs})$, thus excluding the other six trading state co-occurrences.
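Counting the three retained co-occurrence types is straightforward once each investor's daily states are stored by day. In this sketch the dictionary layout (day mapped to state) and the function name `same_state_cooccurrences` are assumptions made for illustration.

```python
def same_state_cooccurrences(states_i, states_j):
    """Count days on which both investors are in the same state.

    states_i, states_j -- dicts mapping a trading day to 'b', 's' or 'bs';
    days without trading are simply absent.
    Mixed pairs such as (b, s) are ignored, as in the text.
    """
    counts = {"b": 0, "s": 0, "bs": 0}
    for day, state in states_i.items():
        if states_j.get(day) == state:
            counts[state] += 1
    return counts

investor_i = {"1999-05-03": "b", "1999-05-04": "s", "1999-05-05": "bs"}
investor_j = {"1999-05-03": "b", "1999-05-04": "b", "1999-05-06": "bs"}
print(same_state_cooccurrences(investor_i, investor_j))
```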

Statistically validated networks. With the categorical variables on the trading states, the co-occurrence of the trading states of investors i and j can be identified and statistically validated. First, for each investor, her or his activity period is identified. Second, for an investor pair, the length of a joint trading period, T, is determined, which is equal to the number of trading days in an annual data sample for a given security (≈250). Then, in the intersection periods of a trader’s activity, $N_i^P$ ($N_j^P$) denotes the number of days when investor i (j) is in a given state P ∈ {b, s, bs}. Moreover, $N_{i,j}^P$ denotes the number of days when we observe the co-occurrence of the given states for investors i and j. Under the null hypothesis of the random co-occurrences of a state for investors i and j, the probability of observing X co-occurrences of the investigated states for two investors in T observations can be expressed by the hypergeometric distribution $H(X \mid T, N_i^P, N_j^P)$ (Tumminello et al., 2011a). For each trading state P ∈ {b, s, bs}, a p-value can be associated as follows:

$$p\left(N_{i,j}^P\right) = 1 - \sum_{X=0}^{N_{i,j}^P - 1} H\left(X \mid T, N_i^P, N_j^P\right) \tag{2}$$
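As an illustration of Eq. (2), the following sketch computes the co-occurrence p-value with the standard library only. The helper names are hypothetical, and the hypergeometric probability is written out from binomial coefficients instead of relying on a statistics package.

```python
from math import comb


def hypergeom_pmf(x: int, total_days: int, n_i: int, n_j: int) -> float:
    """H(X | T, N_i, N_j): probability of exactly x joint days under the null."""
    if x < max(0, n_i + n_j - total_days) or x > min(n_i, n_j):
        return 0.0  # outside the support of the distribution
    return comb(n_i, x) * comb(total_days - n_i, n_j - x) / comb(total_days, n_j)


def cooccurrence_p_value(n_ij: int, total_days: int, n_i: int, n_j: int) -> float:
    """Eq. (2): probability of observing at least n_ij co-occurrences."""
    return 1.0 - sum(hypergeom_pmf(x, total_days, n_i, n_j) for x in range(n_ij))
```

With T = 250 trading days, two investors who are each in the buy state on 20 days and share 10 of those days obtain a p-value far below any conventional threshold, whereas sharing a single day is unremarkable.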

Using the SVN method, for each security we construct networks for two subsequent years. The analysis for each security spans from the initial listing day up to the second year after the IPO. We assign the categorical variables that define the investor’s daily trading state, and we select only domestic Finnish investors who have traded an IPO stock on at least five days during the first or second year. For each analysed security, we take two consecutive one-year periods of categorised trading states for investors. Taking the projection of the investor set in a year, we obtain an annual monopartite investor network, and two investor networks for consecutive years are obtained for each security.

Table 2 Summary of the number of investors, absolute exchanged shares volume and the number of transactions

Investor category | # ids | Volume | # transactions
Non-financial corporations | 29,008 | 10,492,715,279 | 3,678,419
Financial and insurance corporations | 827 | 350,594,504,886 | 55,735,780
Government | 277 | 7,279,324,503 | 298,434
Households | 532,387 | 8,984,345,323 | 12,965,717
Non-profit institutions | 3407 | 937,609,174 | 291,922
Rest of the world | 4133 | 12,505,262,104 | 3,534,817
Total | 570,039 | 390,793,761,269 | 76,505,089
Nominee registered | 89 | 331,154,383,799 | 51,782,691
Non-nominee registered | 569,993 | 59,639,377,470 | 24,722,398

Note that the total volume in the table is counted twice, both for the selling and buying transactions. Here, 43 out of 89 investors with a nominee-registered holding type also made transactions with a non-nominee-registered holding type

ARTICLE PALGRAVE COMMUNICATIONS | https://doi.org/10.1057/s41599-019-0342-6

4 PALGRAVE COMMUNICATIONS | (2019) 5:129 | https://doi.org/10.1057/s41599-019-0342-6 | www.nature.com/palcomms


We adjust the p-thresholds using a false discovery rate (FDR) correction (Benjamini and Hochberg, 1995) by taking the sorted p-values $p_1 < p_2 < \dots < p_{n_{tests}}$ in increasing order and retaining those that satisfy $p_i < \alpha \cdot i / n_{tests}$, i = 1, …, $n_{tests}$. Here, we apply α = 0.05, and $n_{tests}$ equals the total number of observed relationships in a year. All networks are essentially multilink networks, where each link describes the type of trading co-occurrence between an investor pair. This adjustment is needed because there are multiple links and thus multiple tests within a given network. The link between investors i and j is considered to be statistically significant and thus existing if the corresponding p-value, $p(N_{i,j}^P)$, is below the FDR-adjusted p-threshold. In this way, we obtain validated networks for the first and second years. As an example, Fig. C.1 in Appendix C shows the first year sorted p-values and the FDR thresholds for Kemira GrowHow links.
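A possible reading of this correction is the usual Benjamini-Hochberg step-up rule, sketched below. It is an assumption that the step-up variant is meant: the code finds the largest rank i with $p_i < \alpha \cdot i / n_{tests}$ and validates every link up to that rank. The function name `fdr_validated` is hypothetical.

```python
from typing import List


def fdr_validated(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether its link passes the FDR threshold."""
    n_tests = len(p_values)
    # Indices of the tests sorted by increasing p-value.
    order = sorted(range(n_tests), key=lambda k: p_values[k])
    # Largest 1-based rank whose p-value lies below alpha * rank / n_tests.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] < alpha * rank / n_tests:
            cutoff = rank
    validated = [False] * n_tests
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            validated[idx] = True  # step-up: everything up to the cutoff rank
    return validated
```

Under this rule a p-value may be retained even if it misses its own rank threshold, as long as a larger p-value further down the sorted list passes its threshold.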

Statistically validated clusters: persistence in time. We are interested in the investors’ cluster evolution over time. In other words, we want to verify whether investors systematically synchronise their trading strategies with other investors and if such behaviour can be detected in the subsequent year networks. With the community partition for each network, we identify persistent clusters (i.e., clusters that share the same statistically significant component of investors in both the first and the second years after the IPO). Below, we briefly present the method from Marotta et al. (2015).

We are interested in identifying statistically similar clusters that emerged in both years (i.e., clusters with the overexpression of the same investor composition in both clusters, which share nonrandom elements). The probability that X elements in the cluster $C_1$ of the first year network composed of $N_{C_1}$ elements also appear in the cluster $C_2$ of the second year composed of $N_{C_2}$ elements under the null hypothesis that the elements in each cluster are randomly selected is given by the hypergeometric distribution $H(X \mid N, N_{C_1}, N_{C_2})$, where N is the total number of unique elements over 2 years. By using this distribution, a p-value can be associated with the observed number $N_{C_1 C_2}$ of elements of the cluster $C_1$ reoccurring in $C_2$ according to the following equation:

$$p\left(N_{C_1 C_2}\right) = 1 - \sum_{X=0}^{N_{C_1 C_2} - 1} H\left(X \mid N, N_{C_1}, N_{C_2}\right) \tag{3}$$

We reject the null hypothesis if $p(N_{C_1 C_2})$ is smaller than a given adjusted threshold, in which case we say that the cluster $C_1$ is statistically similar to the cluster $C_2$. We adjust the statistical threshold using the FDR correction with α = 0.05 and the number of tests being equal to the total number of cluster pairs over 2 years that shared at least one common element.
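The persistence test can be sketched as follows, assuming each yearly partition is available as a mapping from a cluster label to a set of investor identifiers (a hypothetical data layout). The sketch enumerates only cluster pairs with at least one shared investor, so the number of returned p-values equals the number of tests used in the FDR correction.

```python
from math import comb
from typing import Dict, Set, Tuple


def upper_tail(shared: int, n_total: int, n1: int, n2: int) -> float:
    """Eq. (3): probability of at least `shared` common elements under the null."""
    def pmf(x: int) -> float:
        if x < max(0, n1 + n2 - n_total) or x > min(n1, n2):
            return 0.0
        return comb(n1, x) * comb(n_total - n1, n2 - x) / comb(n_total, n2)
    return 1.0 - sum(pmf(x) for x in range(shared))


def cluster_pair_p_values(
    year1: Dict[str, Set[str]], year2: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """p-values for all (C1, C2) pairs that share at least one investor."""
    # N: unique investors appearing in any cluster over the two years.
    n_total = len(set().union(*year1.values(), *year2.values()))
    result = {}
    for label1, c1 in year1.items():
        for label2, c2 in year2.items():
            shared = len(c1 & c2)
            if shared >= 1:
                result[(label1, label2)] = upper_tail(shared, n_total, len(c1), len(c2))
    return result
```

The resulting p-values would then be compared against the FDR-adjusted threshold with the number of tests set to the size of the returned dictionary.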

Statistically validated clusters: similarity across securities. Additionally, to check if the same cluster exists over multiple securities, we expand the analysis and further look for statistically significant overlapping clusters across all investigated securities. Because the IPO event is the alignment point in our analysis, we look for the overlapping clusters in the set of first-year networks and the set of second-year networks separately. We again use the method (Eq. (3)) for the cluster overlaps to detect clusters with nonrandomly overlapping elements (investors). To calculate the p-values, we take N equal to the total number of unique investors across all investigated securities in the same year, where $N_{C_1}$ is the number of investors in the cluster $C_1$, $N_{C_2}$ is the number of investors in the cluster $C_2$, and $N_{C_1 C_2}$ is the number of common investors in both $C_1$ and $C_2$. Again, we adjust the statistical threshold using the FDR correction, where α = 0.05 and the number of tests is equal to the total number of cluster pairs within the same year that shared at least one common element.

Overexpression and underexpression of the characterising investor attributes. To describe the investor clusters from the perspective of the attributes, such as postal code, age, gender or the type of organisation, we again use the hypergeometric test for identifying nonrandom overlap (Tumminello et al., 2011b). Once we obtain a system of N elements partitioned into clusters (communities), we want to characterise each cluster C of $N_C$ elements. Each element of the system has a certain number of attributes from a specific class. Here, we want to see if the number of elements in the cluster with a specific attribute value is significantly larger than expected when the elements are selected at random from the whole system. For each attribute Q of the system, we test if Q is overexpressed in the cluster C. The probability that X elements in cluster C have the attribute Q under the null hypothesis that the elements in the cluster are randomly selected is given by the hypergeometric distribution $H(X \mid N, N_C, N_Q)$, where $N_Q$ is the total number of elements in the system with attribute Q. By using this distribution, a p-value can be associated with the observed number $N_{C,Q}$ of elements in cluster C that have the attribute Q analogously with Eq. (3). We reject the null hypothesis if the p-value is smaller than a given FDR-adjusted p-threshold, and we then say that the attribute Q is overexpressed in cluster C. In the FDR-adjustment, the number of tests is equal to the total number of unique attribute values over all attribute classes and all clusters in a network.

Alternatively, the underexpression of an attribute Q can also be tested. Here, we want to see if the number of elements in the cluster with a specific attribute value is significantly lower than expected when the elements are selected at random from the whole system. Under the null hypothesis, the probability of observing at most $N_{C,Q}$ elements with attribute Q in a cluster C can be obtained from the left tail of the hypergeometric distribution, as follows:

$$p_u\left(N_{C,Q}\right) = \sum_{X=0}^{N_{C,Q}} H\left(X \mid N, N_C, N_Q\right) \tag{4}$$

Again, if $p_u(N_{C,Q})$ is smaller than a given FDR-adjusted p-threshold, we say that the attribute Q is underexpressed in cluster C. We used the same setting for the FDR correction.
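The two attribute tests differ only in which tail of the hypergeometric distribution is summed. The sketch below places them side by side; the function names are hypothetical and the arguments follow the symbols N, $N_C$, $N_Q$ and $N_{C,Q}$ used above.

```python
from math import comb


def _pmf(x: int, n: int, n_c: int, n_q: int) -> float:
    """H(X | N, N_C, N_Q): exactly x cluster members carry attribute Q."""
    if x < max(0, n_c + n_q - n) or x > min(n_c, n_q):
        return 0.0
    return comb(n_c, x) * comb(n - n_c, n_q - x) / comb(n, n_q)


def overexpression_p_value(n_cq: int, n: int, n_c: int, n_q: int) -> float:
    """Right tail, analogous to Eq. (3): at least n_cq members with Q."""
    return 1.0 - sum(_pmf(x, n, n_c, n_q) for x in range(n_cq))


def underexpression_p_value(n_cq: int, n: int, n_c: int, n_q: int) -> float:
    """Left tail, Eq. (4): at most n_cq members with Q."""
    return sum(_pmf(x, n, n_c, n_q) for x in range(n_cq + 1))
```

By construction the left tail up to k and the right tail from k + 1 sum to one, which gives a quick sanity check on any implementation.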

Results

Using the SVN methodology, for each of the 69 securities we infer b, s and bs trading state networks for the first and the second year after their IPO dates. In order to identify investor clusters, we start by aggregating the networks for all three possible joint-trading states into one weighted network. Each link in the network is given the weight w ∈ {1, 2, 3} depending on how many validated trading states have been observed for a given investor pair³. Finally, for each weighted network we identify clusters using the Infomap community detection algorithm⁴ (Rosvall and Bergstrom, 2008). Identified communities are locally dense connected subgraphs in a network that play an important role in understanding a system’s topology. In the current paper, communities represent investor clusters that are timing their trades synchronously throughout the year. Table 3 summarises the number of observed clusters during the first and the second year. For example, during the first year, 54 investor clusters were identified in the networks of the security Kemira GrowHow (FI0009012843), while during the second year 64 clusters were formed. Figure 1a, b visualise these Infomap clusters for the first-year and second-year networks.
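The aggregation step can be illustrated as follows, assuming each validated state network is given as a set of investor pairs keyed by its state label (a hypothetical layout). The community detection step itself is not shown; the resulting weights would be passed to an Infomap implementation.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple


def aggregate_state_networks(
    networks: Dict[str, Set[Tuple[str, str]]]
) -> Dict[FrozenSet[str], int]:
    """Weight each investor pair by the number of states in which it is validated."""
    weights: Counter = Counter()
    for edges in networks.values():
        # Undirected links: (u, v) and (v, u) count once per state.
        undirected = {frozenset(edge) for edge in edges}
        for link in undirected:
            weights[link] += 1
    return dict(weights)


example = {
    "b": {("A", "B")},
    "s": {("B", "A"), ("A", "C")},
    "bs": {("A", "B")},
}
weighted = aggregate_state_networks(example)
```

In this toy input the pair A and B is validated in all three states and receives weight 3, while the pair A and C receives weight 1.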


Table 3 Investor network clusters’ statistics

ISIN | Unique clusters Y1 | Unique clusters Y2 | Persisting clusters Y1→Y2 | Unique investors Y1 | Active investors Y1 | Unique investors Y2 | Active investors Y2 | Active investors Y1∩Y2 | Median cluster size
FI0009004881 | 1/14 (7%) | 1/10 (10%) | - | 714 | 107 | 875 | 111 | 38 | 3(2)
FI0009800346 | 0/28 (0%) | 4/40 (10%) | 10→10 | 3747 | 251 | 4410 | 316 | 117 | 3(4)
FI0009800320 | 1/8 (13%) | 0/21 (0%) | 3→3 | 1867 | 106 | 2365 | 146 | 48 | 4(3)
FI0009900336 | 0/0 (0%) | 0/11 (0%) | - | 441 | 22 | 1465 | 90 | 10 | 0(4)
FI0009005318 | 0/5 (0%) | 0/13 (0%) | 1→1 | 545 | 51 | 653 | 74 | 19 | 2(2)
FI0009800643 | 0/5 (0%) | 0/32 (0%) | 1→1 | 536 | 50 | 2730 | 261 | 23 | 3(3)
FI0009005870 | 0/6 (0%) | 0/14 (0%) | 1→1 | 947 | 68 | 734 | 67 | 22 | 2(2)
FI0009005953 | 1/14 (7%) | 0/10 (0%) | 1→1 | 2108 | 131 | 2509 | 104 | 46 | 2(2)
FI0009005961 | 0/30 (0%) | 4/65 (6%) | 11→13 | 3570 | 280 | 5555 | 501 | 159 | 3(4)
FI0009005987 | 8/82 (10%) | 11/110 (10%) | 29→32 | 11,093 | 678 | 15,139 | 906 | 314 | 3(4)
FI0009006381 | 0/32 (0%) | 0/38 (0%) | 2→2 | 5277 | 226 | 4085 | 316 | 74 | 3(3)
FI0009006415 | 0/11 (0%) | 0/5 (0%) | - | 1258 | 85 | 601 | 35 | 9 | 3(2)
FI0009005805 | 1/39 (3%) | 1/12 (8%) | - | 4835 | 294 | 1560 | 102 | 42 | 3(3)
FI0009006589 | 0/11 (0%) | 0/1 (0%) | - | 1853 | 92 | 274 | 20 | 7 | 3(2)
FI0009006621 | 2/66 (3%) | 1/63 (2%) | 6→7 | 14,372 | 469 | 11,033 | 565 | 155 | 4(5)
FI0009006738 | 0/38 (0%) | 3/71 (4%) | 2→2 | 5789 | 305 | 7261 | 542 | 84 | 2(3)
FI0009006696 | 0/5 (0%) | 0/5 (0%) | 1→1 | 1073 | 72 | 672 | 39 | 15 | 3(3)
FI0009006761 | 0/8 (0%) | 0/7 (0%) | - | 1000 | 52 | 1252 | 55 | 17 | 4(3)
FI0009007017 | 0/0 (0%) | 1/4 (25%) | - | 534 | 29 | 888 | 38 | 8 | 0(2)
FI0009007025 | 0/9 (0%) | 0/20 (0%) | 1→1 | 1025 | 68 | 1951 | 125 | 29 | 2(2)
FI0009007066 | 0/8 (0%) | 0/3 (0%) | - | 1984 | 60 | 341 | 28 | 14 | 3(2)
FI0009006829 | 0/5 (0%) | 1/15 (7%) | 1→1 | 1902 | 56 | 3212 | 136 | 27 | 2(3)
FI0009007215 | 3/10 (30%) | 1/25 (4%) | 1→1 | 1673 | 83 | 2952 | 156 | 21 | 5(2)
FI0009007264 | 7/113 (6%) | 52/475 (11%) | 68→98 | 8067 | 854 | 43,745 | 4288 | 482 | 3(4)
FI0009007371 | 20/272 (7%) | 136/818 (17%) | 227→389 | 33,419 | 2633 | 82,702 | 10,050 | 1467 | 6(7)
FI0009007355 | 0/1 (0%) | 0/0 (0%) | - | 747 | 13 | 774 | 32 | 6 | 2(0)
FI0009007132 | 8/111 (7%) | 0/54 (0%) | 5→6 | 22,617 | 943 | 18,156 | 514 | 218 | 5(6)
FI0009007629 | 0/2 (0%) | 0/12 (0%) | - | 596 | 53 | 1426 | 91 | 14 | 2(3)
FI0009801286 | 0/15 (0%) | 0/1 (0%) | - | 3191 | 115 | 968 | 38 | 16 | 4(2)
FI0009007553 | 6/81 (7%) | 2/99 (2%) | 17→16 | 9492 | 657 | 9449 | 997 | 182 | 4(3)
FI0009007728 | 1/44 (2%) | 1/31 (3%) | 1→1 | 7219 | 303 | 3355 | 322 | 76 | 4(2)
FI0009007546 | 0/1 (0%) | 0/0 (0%) | - | 232 | 26 | 97 | 7 | 5 | 2(0)
FI0009007686 | 2/5 (40%) | 0/1 (0%) | - | 753 | 45 | 417 | 22 | 3 | 2(2)
FI0009007694 | 4/27 (15%) | 0/2 (0%) | - | 2774 | 176 | 1909 | 88 | 28 | 3(2)
FI0009006886 | 2/11 (18%) | 0/2 (0%) | - | 1849 | 103 | 819 | 39 | 11 | 2(2)
FI0009007819 | 4/135 (3%) | 2/123 (2%) | 13→14 | 16,608 | 1223 | 8287 | 1117 | 329 | 6(5)
FI0009007835 | 2/41 (5%) | 0/34 (0%) | 4→4 | 6320 | 283 | 3910 | 235 | 85 | 3(3)
FI0009007884 | 11/136 (8%) | 2/100 (2%) | 4→3 | 58,326 | 1049 | 20,940 | 934 | 277 | 3(5)
FI0009008080 | 1/11 (9%) | 0/9 (0%) | 1→1 | 1296 | 91 | 1094 | 102 | 27 | 2(3)
FI0009007918 | 1/87 (1%) | 1/79 (1%) | 9→10 | 7136 | 802 | 7199 | 1051 | 256 | 6(6)
FI0009801310 | 15/169 (9%) | 4/218 (2%) | 79→84 | 30,706 | 2328 | 20,898 | 2497 | 672 | 2(7)
FI0009008221 | 38/337 (11%) | 8/252 (3%) | 198→172 | 35,617 | 3454 | 17,235 | 2541 | 985 | 7(6)
FI0009902530 | 6/62 (10%) | 2/65 (3%) | 8→8 | 25,808 | 572 | 12,223 | 614 | 200 | 6(5)
FI0009008924 | 0/12 (0%) | 1/2 (50%) | - | 2644 | 151 | 1070 | 74 | 27 | 4(2)
FI0009008833 | 0/2 (0%) | 0/3 (0%) | - | 674 | 35 | 615 | 45 | 14 | 2(2)
FI0009009146 | 2/28 (7%) | 0/1 (0%) | 1→1 | 3444 | 285 | 1188 | 70 | 34 | 2(2)
FI0009009054 | 1/8 (13%) | 1/8 (13%) | - | 1832 | 135 | 1120 | 112 | 34 | 2(2)
FI0009009633 | 4/19 (21%) | 1/14 (7%) | - | 2847 | 224 | 1771 | 107 | 33 | 2(3)
FI0009009567 | 4/12 (33%) | 0/8 (0%) | - | 1614 | 112 | 1380 | 96 | 28 | 2(2)
FI0009008270 | 6/53 (11%) | 0/12 (0%) | - | 5743 | 442 | 2558 | 137 | 64 | 3(2)
FI0009009674 | 1/24 (4%) | 1/13 (8%) | 4→4 | 2033 | 226 | 3746 | 161 | 88 | 4(6)
FI0009009377 | 1/5 (20%) | 1/6 (17%) | - | 2779 | 151 | 2329 | 133 | 26 | 2(2)
FI0009010219 | 1/8 (13%) | 1/5 (20%) | - | 1438 | 84 | 1078 | 66 | 19 | 4(2)
FI0009010854 | 0/2 (0%) | 0/14 (0%) | - | 573 | 41 | 1164 | 114 | 19 | 4(4)
FI0009010862 | 0/5 (0%) | 1/10 (10%) | - | 879 | 66 | 1604 | 99 | 16 | 2(4)
SE0000667925 | 22/120 (18%) | 7/129 (5%) | 8→9 | 17,759 | 1186 | 21,725 | 1580 | 476 | 4(7)
SE0000110165 | 1/4 (25%) | 0/0 (0%) | - | 576 | 43 | 176 | 9 | 3 | 2(0)
FI0009012843 | 5/54 (9%) | 2/64 (3%) | 5→5 | 8047 | 464 | 9609 | 818 | 183 | 5(6)
FI0009013296 | 33/262 (13%) | 42/336 (13%) | 180→221 | 24,350 | 3518 | 22,421 | 3603 | 1555 | 7(7)
FI0009013429 | 12/133 (9%) | 5/89 (6%) | 26→24 | 9945 | 1016 | 6012 | 691 | 326 | 3(3)
FI0009013312 | 3/26 (12%) | 0/13 (0%) | 2→1 | 2667 | 224 | 1204 | 125 | 52 | 4(4)


Next, for each security, we detect clusters with a statistically significant investor overlap between the first and second year. The summary of statistically validated cluster time persistence for all 69 securities is presented in the fourth column of Table 3. For example, in the Kemira GrowHow networks, only 5 of the 54, i.e. 9% of clusters identified in the first year were observed in the second year. Figure 1c, d display those five clusters that persisted over the first two years after the IPO. The observation in the example that only a small number of clusters persist into the second year is consistent for the majority of the analysed IPO securities. However, there are several securities for which more than a half of the first year clusters persist into the following year. A sample of time persistent clusters and their composition in terms of investor attributes is visualised in Appendix Figs. A.1 and A.2.

By calculating the fraction of clusters that do not persist into the second year, we observe that over all 69 securities on average 88% of the first-year clusters are not observed in the following year, while the same number falls to 78% for mature company networks inferred during the same periods (more details about the comparison to mature companies are provided in the following section). This observation may suggest the existence of IPO trading strategy-related clusters that form exclusively during the first year after the IPO date and break up in the following year.

Additionally, we analyse cluster overlap across multiple securities, separately for the first-year and second-year networks. The second and third columns in Table 3 show the number of asset-specific clusters over the total number of communities in the first and second year. Here, by asset-specific clusters, we refer to the clusters that are not observable within investor networks of the same year for other IPO securities in our investigated 69 security universe. The number of observed asset-specific clusters is rather small and is around 15% (9%) during the first (second) year averaged over all 69 securities. This means that the majority of investor clusters are found to be present in multiple securities, i.e. they execute synchronised trading strategies over multiple IPOs. Note that this cluster synchronisation is observed even though the network inference periods are not aligned in time. The observed decrease in the overall percentage of asset-specific clusters hints that during the second year after IPO more clusters use non-IPO related trading strategies. This is later supported by the mature security analysis (see the next section and Tables 4 and 5). Figure A.3 in Appendix A shows a sample of clusters with statistically significant investor overlap across multiple securities.

Combining the previous results, we observe persistent clusters that emerge in investor networks over multiple securities. Figure 2 explains the visualisation of a cluster in this study and Fig. 3 shows a sample of clusters that overlap both over time and over multiple securities. In the figure, the top (bottom) row of the group refers to the first- (second-) year clusters. Moreover, the downward arrows associate statistically similar clusters in the first-year and second-year networks. The arrows between the clusters in the same year after IPO are omitted to simplify the visualisation. Notably, even if some of the clusters are not persistent over time, quite often they appear over different securities.

Next, we analyse the overexpression and underexpression of the investor attributes in the identified investor clusters. We say that a cluster is overexpressing (underexpressing) an attribute if the number of investors in the cluster with that particular attribute is significantly higher (lower) than could be expected under the null model defined in the “Dataset and Methodology” section. We are primarily interested in the sector code attribute analysis, where investors can be assigned households, nonfinancial corporations, financial and insurance corporations, government, nonprofit institutions, and the rest of the world attribute. Additionally, we test whether or not attributes related to gender, age or geographical location are overexpressed or underexpressed⁵.

Table 3 (continued)

ISIN | Unique clusters Y1 | Unique clusters Y2 | Persisting clusters Y1→Y2 | Unique investors Y1 | Active investors Y1 | Unique investors Y2 | Active investors Y2 | Active investors Y1∩Y2 | Median cluster size
FI0009013403 | 11/112 (10%) | 4/92 (4%) | 45→44 | 10,234 | 1084 | 7952 | 769 | 409 | 2(3)
FI0009013924 | 3/29 (10%) | 1/11 (9%) | - | 1804 | 192 | 2104 | 235 | 43 | 2(2)
FI0009010391 | 3/37 (8%) | 6/50 (12%) | 3→3 | 8822 | 306 | 5915 | 434 | 114 | 3(5)
FI0009013593 | 3/17 (18%) | 0/0 (0%) | - | 2345 | 162 | 870 | 9 | 4 | 3(0)
FI0009014344 | 2/3 (67%) | 0/0 (0%) | - | 2815 | 83 | 1128 | 34 | 15 | 2(0)
FI0009014351 | 11/56 (20%) | 0/6 (0%) | - | 10,338 | 399 | 3267 | 135 | 69 | 2(2)
FI0009012413 | 11/22 (50%) | 5/22 (23%) | 1→1 | 4788 | 243 | 6627 | 237 | 62 | 2(5)
FI0009015309 | 9/64 (14%) | 3/29 (10%) | 1→1 | 6748 | 521 | 2208 | 187 | 95 | 3(4)

Columns ‘Unique clusters Y1 (Y2)’ show the number of asset-specific investor clusters over all clusters observed in the first (second) year networks. Here, asset-specific investor clusters are defined as those that were not observed in other IPO networks. The number in the brackets () shows the ratio in percentage. ‘Persisting clusters Y1→Y2’ shows the number of clusters with statistically significant overlaps in the first and the second years. Note that clusters split and merge, and thus the number of persisted clusters is not necessarily the same for both years. Columns ‘Unique investors Y1 (Y2)’ show the total number of investors per ISIN in a year. Columns ‘Active investors Y1 (Y2)’ show the total number of investors who traded at least 5 days per ISIN in a year. The column ‘Active investors Y1∩Y2’ shows the total number of investors who traded at least 5 days per ISIN in both first and second year after IPO. The column ‘Median cluster size’ shows the number of investors in a median-sized cluster in the first (second) year network. ISINs from the error-free set are marked in bold

Over all 69 securities, we identify 115 (28) investor clusters with 182 (40) overexpressed (underexpressed) attributes during the first year after the IPO, and 130 (44) investor clusters with 236 (70) overexpressed (underexpressed) attributes during the second year. The number of overexpressed (underexpressed) attributes is larger than the number of investor clusters, because each cluster can overexpress (underexpress) more than one attribute. The overexpressed clusters are observed over 28 different securities during the first year after IPO and for 27 different securities during the second year after IPO. As for the underexpressed clusters, they are observed over 16 securities during the first year and 20 securities during the second year after IPO.

In order to present the attribute analysis in a concise way, we use the fact that the same clusters appear over multiple securities and assign overexpressed (underexpressed) investor clusters into groups if they are statistically similar. Figure 4 presents the resulting sector code attribute overexpressing investor cluster networks for the first and second years after respective IPOs. In the figure, nodes on the left (right) hand side of the vertical dashed line represent investor clusters observed in the first (second) year after IPO. Statistically similar cluster nodes are connected with links and dotted lines circle network components. Each connected component in the network relates to a group of clusters with a statistically similar investor composition. The dashed lines crossing from the left to the right-hand side indicate that there is a statistical similarity for some of the clusters in the components between the first and the second year.

Tables B.1 and B.2 in the Appendix summarise the overexpressed and underexpressed cluster attributes for each investor cluster component in Figs. 4 and 5. The largest first and second year components in Fig. 4 are over-represented by finance-insurance and general government institutions, as well as non-profit organisations. Moreover, the same components underexpress the Household sector (see Fig. 5), further supporting their institutional profile. In addition, the same components overexpress location attributes, in particular Helsinki and South-West regions (see Fig. B.1 in the Appendix). Investor clusters with an overexpression of a geographical attribute could be observed because of some locally present investment strategy, for example an investor club, or some other means of local information transfer. Overall, the results show that the largest cluster components mainly contain institutions that are timing their trades similarly in a year. Compared with household investors, institutional traders form robust clusters that execute similar trade-timing strategies over multiple IPOs, both during the first and the second year after the IPO date. Our findings thus support the studies that provide evidence of institutional herding (Nofsinger and Sias, 1999; Sias, 2004). Several explanations are possible. First, some of the financial institutions, such as pension insurance companies, are driven by the same legislation and portfolio restrictions, which can lead to the same trading strategies. Second, traders working for financial institutions may have mutual and/or joint private information channels, leading to similar trade timing. The third explanation is that they react to public news in similar ways.

Do clusters of IPO investors exist with mature companies? To verify if our identified clusters are just IPO-related or if they exist with mature companies⁶ as well, we compare the clusters of the new-to-the-market stocks with five mature companies (see Table 4). For each mature security, just like previously for IPOs, we construct SVNs and identify investor clusters with the Infomap algorithm. When constructing the first-year and second-year networks, the periods are aligned with respective IPO dates. This way we construct 345 (69 × 5) networks for each year. Next, we analyse the overlaps between mature security investor clusters and the investor clusters inferred with the data from IPOs, to answer the question of whether the investor clusters identified with IPO securities exist with a mature company. When statistically validating overlaps between mature and IPO security investor network clusters, we use the total number of cluster pairs with at least one investor in common between an IPO and all five mature securities as the number of tests for the FDR correction. Table 5 shows the number of statistically similar clusters between the IPO and mature securities, as well as the total number of clusters observed in the IPO and the mature security during exactly the same period. Here we observe that on average over all investigated IPO securities only 16% of IPO clusters are not observed in one of the five investigated mature securities during the first year after IPO, and 13% during the second year. By looking at the same table, we can see that only a fraction of total clusters observed in mature securities are also observed in IPO security networks. It can be because not all investors who trade mature

Fig. 1 Infomap clusters and their evolution for Kemira GrowHow (FI0009012843). Community detection is used with weighted links based on the total number of buy state, sell state, and day trade link types between two investors. a FDR: 54 clusters, first year after IPO, b FDR: 64 clusters, second year after IPO, c, d show five statistically significant overlapping clusters in both years. Node position is fixed. The colours of reoccurring clusters in all graphs coincide. In a, b, each cluster has a unique colour, with the exception of those with fewer than four elements, which are coloured in grey

Table 4 Five mature companies with the highest number of transactions in HSE

ISIN | Company name | IPO date
FI0009000681 | Nokia | 1981-04-01
FI0009000277 | Tieto | 1984-06-01
FI0009000665 | Metsä Board B | 1987-01-02
FI0009002943 | Raisio V | 1989-04-25
FI0009003727 | Wärtsilä | 1991-01-17


Tab

le5IPO

andmaturecompa

nies

inve

stor

clusters

overlap

ISIN

IPO

date

Year

FI0009000681

FI00090037

27FI0009000665

FI000900027

7FI00090029

43

#Uniqu

ecl.

FI0009004881

1995-01-12

Y1

2(2){118}

1(1){17}

2(2){11}

{8}

3(2){11}

9/14(64%)

Y2

9(11)

{142}

6(8){33}

7(9){34}

1(1){5}

6(6){57}

1/10

(10%)

FI000980034

61995-05-11

Y1

21(25)

{146}

13(12)

{19}

12(12)

{20}

7(7){15}

16(14){29}

3/28

(11%

)Y2

26(32)

{133

}18

(16){30}

23(23)

{46}

1(1){2}

15(19){6

7}6/4

0(15%

)FI000980032

01995-05-11

Y1

5(7){146}

3(3){19}

2(3){20}

1(2){15}

4(5){29}

3/8(38%)

Y2

10(14){133

}12

(14){30}

7(8){4

6}

1(1){2}

11(10){6

7}5/

21(24%)

FI00090053

181995-06-01

Y1

3(3){152

}{21}

2(3){21}

2(3){15}

1(1){29}

2/5(40%)

Y2

9(10){123

}7(8){34}

5(6){38}

1(1){3}

8(9){6

2}1/13

(8%)

FI000990033

61995-06-01

Y1

{152

}{21}

{21}

{15}

{29}

0Y2

6(8){123

}6(7){34}

5(8){38}

{3}

6(6){6

2}1/11

(9%)

FI0009800643

1995-09-04

Y1

5(8){176

}1(2){21}

2(3){27}

{10}

3(3){4

5}0/5

(0%)

Y2

19(31)

{133

}17

(19){38}

21(22)

{38}

1(1){2}

14(15)

{67}

6/3

2(19%)

FI00090058

701996-03-27

Y1

4(4){136

}2(3){32}

2(2){4

1}1(1){2}

2(3){6

5}1/6(17%

)Y2

13(15)

{190}

6(7){26}

10(9){30}

1(1){4

}9(10){56}

0/14(0

%)

FI00090059

87

1996-05-02

Y1

56(54){128

}23

(24){30}

26(24){4

3}{2}

33(28){72}

18/8

2(22%

)Y2

82(91)

{225

}27

(22)

{34}

40(30){33}

6(4){9

}43(37)

{58}

20/110

(18%)

FI00090059

61

1996-05-02

Y1

21(29){128

}13

(18){30}

19(19){4

3}1(1){2}

19(18){72}

2/30

(7%)

Y2

51(73)

{225

}29

(26){34}

36(26){33}

3(3){9

}31

(33)

{58}

7/65(11%

)FI00090059

531996-05-02

Y1

11(12)

{128

}7(8){30}

9(10){4

3}{2}

8(9){72}

2/14

(14%)

Y2

9(10){225

}4(6){34}

8(7){33}

1(1){9

}6(8){58}

0/10(0

%)

FI000900638

11997-04-03

Y1

26(34){179

}5(5){27}

14(13)

{33}

2(2){4

}19

(18){54}

4/3

2(12%

)Y2

31(40){312}

13(11)

{25}

13(15)

{61}

19(19){8

0}

20(28){118}

2/38

(5%)

FI0009006415

1997-04-24

Y1

6(8){210}

3(4){32}  5(6){32}  {4}  8(9){56}  3/11 (27%)
                          Y2  4(6){333}  3(3){24}  2(2){54}  {90}  4(4){130}  0/5 (0%)
FI0009005805  1997-06-09  Y1  28(44){240}  16(16){35}  22(19){39}  8(7){14}  19(22){53}  4/39 (10%)
                          Y2  9(13){356}  3(3){29}  4(7){62}  5(7){93}  5(9){160}  2/12 (17%)
FI0009006589  1997-06-17  Y1  9(14){245}  7(6){39}  6(7){37}  4(4){14}  6(11){51}  2/11 (18%)
                          Y2  1(1){377}  {29}  {62}  {91}  {162}  0/1 (0%)
FI0009006621  1997-11-25  Y1  60(81){273}  23(19){27}  24(20){46}  31(24){43}  44(39){62}  3/66 (5%)
                          Y2  59(123){547}  26(20){27}  24(21){46}  50(55){90}  53(101){236}  2/63 (3%)
FI0009006738  1997-11-26  Y1  30(46){275}  11(10){28}  18(16){48}  16(15){45}  20(21){62}  3/38 (8%)
                          Y2  54(107){557}  12(12){27}  14(14){47}  43(48){97}  57(86){241}  5/71 (7%)
FI0009006696  1997-12-02  Y1  4(5){274}  1(1){28}  1(1){48}  2(3){46}  4(5){65}  0/5 (0%)
                          Y2  5(8){602}  3(3){25}  4(4){50}  3(3){88}  5(9){210}  0/5 (0%)
FI0009006761  1997-12-09  Y1  8(14){291}  6(8){26}  8(10){49}  7(9){45}  6(8){68}  0/8 (0%)
                          Y2  7(15){606}  6(5){25}  4(4){41}  6(10){90}  7(18){229}  0/7 (0%)
FI0009007017  1998-04-01  Y1  {316}  {28}  {62}  {81}  {117}  0
                          Y2  3(4){765}  2(2){32}  {44}  1(1){87}  3(4){232}  1/4 (25%)
FI0009007025  1998-04-01  Y1  7(9){316}  6(6){28}  5(6){62}  6(6){81}  6(7){117}  2/9 (22%)
                          Y2  15(24){765}  10(11){32}  10(9){44}  10(16){87}  14(22){232}  3/20 (15%)
FI0009007066  1998-04-30  Y1  6(9){326}  5(7){29}  4(5){56}  5(8){87}  6(11){142}  0/8 (0%)
                          Y2  3(2){836}  2(1){35}  2(3){42}  2(2){102}  2(2){223}  0/3 (0%)
FI0009006829  1998-06-01  Y1  4(5){341}  3(3){30}  2(2){60}  2(2){89}  1(1){153}  0/5 (0%)
                          Y2  9(14){906}  4(4){38}  7(12){46}  5(6){96}  7(7){210}  5/15 (33%)
FI0009007215  1998-08-03  Y1  6(12){437}  1(2){30}  5(4){57}  4(6){93}  6(11){179}  3/10 (30%)
                          Y2  15(24){615}  6(6){39}  5(6){45}  13(14){103}  11(12){185}  6/25 (24%)
FI0009007264  1998-09-15  Y1  94(134){469}  16(13){25}  15(15){48}  53(45){94}  75(81){204}  9/113 (8%)
                          Y2  376(361){587}  37(31){41}  32(23){41}  163(105){125}  237(144){200}  64/475 (13%)
FI0009007371  1998-11-17  Y1  248(350){515}  41(24){26}  49(35){45}  148(83){90}  206(187){233}  13/272 (5%)


Table 5 (continued)

ISIN  IPO date  Year  FI0009000681  FI0009003727  FI0009000665  FI0009000277  FI0009002943  #Unique cl.
                          Y2  683(550){645}  47(31){36}  49(39){48}  252(108){112}  300(156){183}  99/818 (12%)
FI0009007355  1998-12-04  Y1  1(1){612}  {26}  {47}  {91}  1(1){226}  0/1 (0%)
                          Y2  {611}  {33}  {44}  {115}  {175}  0
FI0009007132  1998-12-18  Y1  91(153){629}  26(16){24}  32(23){40}  67(60){87}  85(125){250}  9/111 (8%)
                          Y2  35(45){523}  18(19){33}  29(25){49}  31(34){112}  38(58){166}  4/54 (7%)
FI0009007629  1999-03-01  Y1  2(3){689}  1(1){33}  {41}  {81}  1(1){261}  0/2 (0%)
                          Y2  6(7){553}  3(3){34}  4(4){56}  8(9){97}  4(4){130}  3/12 (25%)
FI0009801286  1999-03-15  Y1  14(23){731}  6(5){33}  5(5){41}  10(13){81}  15(28){262}  0/15 (0%)
                          Y2  1(1){580}  1(1){32}  {54}  1(1){96}  1(1){124}  0/1 (0%)
FI0009007553  1999-03-23  Y1  65(108){745}  28(19){35}  20(17){48}  44(44){83}  61(77){242}  8/81 (10%)
                          Y2  65(70){551}  12(10){36}  14(13){52}  56(48){100}  42(26){128}  11/99 (11%)
FI0009007728  1999-04-06  Y1  36(78){811}  16(20){33}  8(10){46}  27(32){92}  36(59){230}  5/44 (11%)
                          Y2  21(22){530}  10(11){32}  10(10){53}  18(22){90}  11(11){125}  5/31 (16%)
FI0009007546  1999-04-19  Y1  1(1){850}  1(1){35}  1(1){46}  {100}  1(1){225}  0/1 (0%)
                          Y2  {447}  {29}  {53}  {84}  {124}  0
FI0009007694  1999-05-03  Y1  23(43){855}  11(13){37}  4(6){44}  18(28){99}  19(29){228}  4/27 (15%)
                          Y2  1(2){454}  1(2){34}  {51}  2(3){93}  {131}  0/2 (0%)
FI0009007686  1999-05-03  Y1  3(7){855}  {37}  {44}  3(3){99}  4(7){228}  1/5 (20%)
                          Y2  {454}  {34}  {51}  {93}  {131}  1/1 (100%)
FI0009006886  1999-06-08  Y1  6(6){898}  2(2){33}  3(3){47}  5(6){97}  8(10){210}  2/11 (18%)
                          Y2  2(2){433}  1(1){34}  1(1){53}  1(1){90}  1(1){130}  0/2 (0%)
FI0009007819  1999-06-28  Y1  122(277){859}  38(23){36}  29(28){47}  86(63){92}  98(115){203}  5/135 (4%)
                          Y2  101(124){513}  31(20){34}  35(33){51}  81(67){98}  63(58){123}  6/123 (5%)
FI0009007835  1999-07-01  Y1  35(64){791}  22(22){34}  16(18){48}  30(38){95}  30(44){186}  4/41 (10%)
                          Y2  30(42){596}  19(17){34}  16(16){51}  23(31){103}  18(27){124}  1/34 (3%)
FI0009007884  1999-07-01  Y1  104(171){791}  31(23){34}  27(26){48}  74(57){95}  82(75){186}  13/136 (10%)
                          Y2  82(145){596}  32(24){34}  36(29){51}  73(73){103}  57(61){124}  6/100 (6%)
FI0009008080  1999-10-01  Y1  9(16){597}  3(4){41}  2(1){42}  6(7){120}  7(8){197}  2/11 (18%)
                          Y2  9(15){775}  5(7){34}  5(6){53}  7(10){89}  7(7){111}  0/9 (0%)
FI0009007918  1999-10-27  Y1  81(167){631}  20(21){41}  16(16){47}  58(64){120}  58(64){192}  2/87 (2%)
                          Y2  67(150){763}  21(22){37}  29(21){51}  51(61){91}  33(44){101}  6/79 (8%)
FI0009801310  1999-11-09  Y1  136(149){630}  22(18){38}  20(16){50}  85(58){113}  69(67){195}  14/169 (8%)
                          Y2  183(318){765}  34(21){38}  35(30){45}  90(67){86}  93(70){108}  22/218 (10%)
FI0009008221  1999-12-13  Y1  280(292){536}  37(23){34}  39(29){46}  151(91){112}  146(108){171}  32/337 (9%)
                          Y2  215(356){843}  50(27){39}  52(30){42}  115(74){89}  78(60){104}  22/252 (9%)
FI0009902530  2000-01-31  Y1  46(61){489}  17(22){37}  21(21){55}  39(49){115}  29(46){158}  5/62 (8%)
                          Y2  55(118){915}  34(29){49}  37(31){46}  40(47){90}  20(19){83}  2/65 (3%)
FI0009008924  2000-05-24  Y1  10(15){439}  4(4){33}  3(3){58}  6(7){86}  7(8){130}  0/12 (0%)
                          Y2  1(1){856}  {61}  {47}  {80}  {70}  1/2 (50%)
FI0009008833  2000-05-24  Y1  {439}  {33}  1(1){58}  1(2){86}  {130}  1/2 (50%)
                          Y2  2(2){856}  2(2){61}  2(3){47}  2(2){80}  {70}  0/3 (0%)
FI0009009146  2000-07-04  Y1  18(28){586}  4(4){35}  7(8){54}  15(19){98}  13(15){123}  5/28 (18%)
                          Y2  1(1){849}  1(1){61}  1(1){46}  1(1){78}  {66}  0/1 (0%)
FI0009009054  2000-07-05  Y1  5(8){600}  2(4){35}  1(1){54}  4(6){102}  3(3){128}  1/8 (12%)
                          Y2  3(4){849}  3(4){60}  3(4){47}  2(5){78}  4(4){69}  0/8 (0%)
FI0009009633  2000-11-01  Y1  12(16){762}  3(4){37}  3(4){47}  6(6){92}  7(6){107}  5/19 (26%)
                          Y2  7(9){548}  2(2){64}  1(1){50}  4(4){93}  2(2){53}  5/14 (36%)
FI0009009567  2000-12-19  Y1  8(9){818}  5(5){40}  4(2){47}  8(8){93}  1(1){101}  3/12 (25%)
                          Y2  2(4){468}  4(4){58}  3(3){59}  5(5){101}  2(2){52}  0/8 (0%)
FI0009008270  2000-12-22  Y1  45(68){858}  16(12){39}  17(11){48}  23(26){91}  17(18){91}  5/53 (9%)


Table 5 (continued)

ISIN  IPO date  Year  FI0009000681  FI0009003727  FI0009000665  FI0009000277  FI0009002943  #Unique cl.
                          Y2  4(4){477}  5(5){59}  2(3){58}  6(5){95}  3(3){52}  2/12 (17%)
FI0009009674  2001-01-30  Y1  18(36){900}  19(24){47}  19(25){52}  19(31){91}  11(11){81}  1/24 (4%)
                          Y2  11(24){499}  11(19){58}  11(22){52}  12(35){105}  4(6){41}  1/13 (8%)
FI0009009377  2001-04-02  Y1  2(2){795}  2(2){54}  1(1){49}  2(2){88}  1(2){102}  2/5 (40%)
                          Y2  1(1){526}  {42}  2(2){53}  1(1){118}  {14}  3/6 (50%)
FI0009010219  2001-04-02  Y1  5(11){795}  4(4){54}  4(5){49}  5(5){88}  4(6){102}  1/8 (12%)
                          Y2  2(2){526}  1(1){42}  {53}  1(1){118}  {14}  2/5 (40%)
FI0009010862  2001-10-01  Y1  1(1){639}  5(6){62}  4(5){55}  3(4){101}  1(1){51}  0/5 (0%)
                          Y2  {164}  3(3){39}  5(8){67}  7(8){94}  {14}  2/10 (20%)
FI0009010854  2001-10-01  Y1  2(2){639}  2(3){62}  2(2){55}  2(3){101}  {51}  0/2 (0%)
                          Y2  3(3){164}  5(5){39}  12(15){67}  13(20){94}  {14}  0/14 (0%)
SE0000667925  2002-12-09  Y1  42(44){248}  18(18){47}  29(27){57}  57(51){91}  20(18){35}  33/120 (28%)
                          Y2  96(125){327}  52(45){58}  60(49){70}  66(63){86}  45(37){73}  11/129 (9%)
SE0000110165  2003-09-04  Y1  1(1){339}  {44}  2(3){66}  1(1){93}  1(1){77}  2/4 (50%)
                          Y2  {173}  {56}  {105}  {100}  {73}  0
FI0009012843  2004-10-18  Y1  23(41){187}  37(42){71}  34(44){107}  36(53){99}  20(27){71}  7/54 (13%)
                          Y2  41(65){438}  36(42){159}  42(54){133}  31(33){113}  6(5){25}  6/64 (9%)
FI0009013296  2005-04-21  Y1  153(138){211}  144(87){102}  76(60){95}  109(65){81}  31(28){63}  36/262 (14%)
                          Y2  237(207){273}  165(134){185}  106(76){107}  148(89){110}  34(30){60}  41/336 (12%)
FI0009013312  2005-06-01  Y1  19(28){323}  13(14){129}  18(17){107}  15(13){82}  9(12){68}  1/26 (4%)
                          Y2  6(9){266}  6(7){171}  7(8){95}  8(10){94}  5(4){50}  1/13 (8%)
FI0009013403  2005-06-01  Y1  75(85){323}  81(75){129}  39(37){107}  56(45){82}  22(15){68}  10/112 (9%)
                          Y2  64(64){266}  66(80){171}  34(31){95}  55(49){94}  7(8){50}  10/92 (11%)
FI0009013429  2005-06-01  Y1  88(105){323}  85(77){129}  51(44){107}  71(54){82}  21(16){68}  18/133 (14%)
                          Y2  61(73){266}  64(69){171}  25(26){95}  48(39){94}  9(10){50}  9/89 (10%)
FI0009010391  2006-03-17  Y1  21(25){243}  16(17){182}  22(26){108}  20(21){119}  12(14){68}  4/37 (11%)
                          Y2  22(26){370}  17(19){121}  28(37){118}  26(34){101}  9(8){36}  6/50 (12%)
FI0009013924  2006-03-17  Y1  19(22){243}  13(12){182}  17(22){108}  16(17){119}  5(5){68}  3/29 (10%)
                          Y2  4(4){370}  4(4){121}  2(2){118}  4(3){101}  1(1){36}  2/11 (18%)
FI0009013593  2006-04-21  Y1  10(14){268}  6(7){184}  5(9){102}  10(14){112}  2(2){60}  2/17 (12%)
                          Y2  {337}  {127}  {124}  {90}  {32}  0
FI0009014351  2006-07-03  Y1  28(32){263}  21(24){163}  25(27){93}  26(25){90}  6(6){50}  16/56 (29%)
                          Y2  3(4){336}  4(4){147}  5(5){142}  4(4){94}  {34}  0/6 (0%)
FI0009014344  2006-07-03  Y1  {263}  {163}  {93}  {90}  {50}  3/3 (100%)
                          Y2  {336}  {147}  {142}  {94}  {34}  0
FI0009012413  2007-04-10  Y1  4(4){360}  2(3){122}  9(9){122}  6(4){95}  3(3){38}  8/22 (36%)
                          Y2  6(7){315}  11(13){143}  9(16){161}  10(13){62}  {7}  6/22 (27%)
FI0009015309  2007-06-15  Y1  32(37){329}  26(26){139}  31(35){136}  34(32){92}  6(5){37}  12/64 (19%)
                          Y2  19(26){380}  23(30){211}  16(24){198}  14(17){67}  2(2){16}  2/29 (7%)
Median (#Unique cl.)      Y1  11%
                          Y2  9%
Average (#Unique cl.)     Y1  16%
                          Y2  13%

The overlap is given as A(B){C}, where A is the number of the overlapping clusters in the IPO stock, (B) is the number of overlapping clusters in the mature stock, and {C} is the total number of clusters of a mature stock. A does not necessarily equal B because any cluster can be statistically similar to more than one cluster in another security. ‘#Unique cl.’ is the total number of unique IPO investor clusters over the total number of clusters, where the percentage ratio is given in brackets (). Median and average percentages of the unique clusters over all IPO stocks are given at the bottom of the table.
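The A(B){C} notation described in the table note can be read mechanically. The following is a minimal sketch of such a reader, not part of the original study; the names `OverlapCell` and `parse_cell` are hypothetical, and treating a bare `{C}` cell as "no overlapping clusters" is an assumption based on how such cells appear in the table.

```python
import re
from typing import NamedTuple


class OverlapCell(NamedTuple):
    """One Table 5 cell (hypothetical container, not from the paper)."""
    ipo_overlapping: int     # A: overlapping clusters in the IPO stock
    mature_overlapping: int  # B: overlapping clusters in the mature stock
    mature_total: int        # C: total clusters of the mature stock


# "A(B){C}"; the "A(B)" part is optional because some cells show only "{C}".
_CELL = re.compile(r"^(?:(\d+)\((\d+)\))?\{(\d+)\}$")


def parse_cell(text: str) -> OverlapCell:
    """Parse a cell such as '28(44){240}' or '{62}'.

    Assumption: a bare '{C}' means zero overlapping clusters on both sides.
    """
    match = _CELL.match(text.strip())
    if match is None:
        raise ValueError(f"not an A(B){{C}} cell: {text!r}")
    a, b, c = match.groups()
    return OverlapCell(int(a or 0), int(b or 0), int(c))
```

For instance, `parse_cell("28(44){240}")` separates the 28 IPO-stock clusters that overlap with 44 of the 240 clusters of the mature stock.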


securities trade recently issued securities, and if they do, not all of them might apply the same trading strategies and, therefore, not form similar synchronised clusters as in mature securities.

Conclusions

In the current paper, we analysed investor interactions and behaviours using a unique dataset of all Finnish investors’ transactions in the HSE. Our selected set of 69 securities is aligned to an IPO event, which occurs when a company first starts publicly trading its securities. We performed an analysis for multiple securities on an individual investor account level by constructing the networks from the statistically validated trading co-occurrences. Our main focus was on the newly emerging market networks and their common and persistent market-driven structures with the other mature and new stocks.

Applying a community detection algorithm, we found statistically similar investor clusters with synchronised trading strategies that were forming repeatedly over several years and for multiple securities. We detected statistically robust clusters between the first and second year after an IPO. We also found clusters that could be found within other securities. By investigating cluster attribute overexpression and underexpression, we find a highly persistent institutional investor cluster. This finding provides further evidence about institutional herding. Comparing the findings with the clusters on mature securities, we observe that the majority of clusters can also be observed with a mature security.

Our results show that some synchronised trading strategies in financial markets span across multiple stocks, are persistent over time and occur with both newly issued and mature stocks. However, this analysis applies to the HSE only and does not generalise to all markets. Further research should check if this phenomenon also exists in other stock exchanges with a larger amount of IPOs; however, to the best of our knowledge, these investor-level data are not available, for example, from the U.S. markets.

Fig. 2 Graphical representation of the clusters. A single cluster is visualised as a rectangle block, where a row represents one investor with four attributes: sector code, location, gender and birth year decade. Sector code: Households, Non-financial, Financial-insurance, General-government, Non-profit, Rest-world. Geographic location: Helsinki, South-West, Western-Tavastia, Central-Finland, Northern-Finland, Ostrobothnia, Rest-Uusimaa, Eastern-Tavastia, Eastern-Finland, South-East, Northern-Savonia. Gender: Male, Female, No-Gender. Decade: No-age, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000

Fig. 3 Statistically significant cluster overlaps across multiple securities and over time. The figure contains many subfigures separated by borders. Each subfigure presents a cluster of investors that spans over multiple securities and persists in time. The row alignment shows statistically similar clusters in the same year: the top row is the first year after the IPO, and the bottom row is the second year after the IPO. The downward arrows show the cluster timewise evolution from the first to the second year for the same security. A cluster is represented by the rectangle. Each cluster is composed of investors with four attributes: sector code, geographic location, gender and decade. See the attribute colour mapping in Fig. 2


Traditional financial research assumes that investors are rational and hold optimal portfolios. However, actual investors have information, intellectual and computational limitations, and they satisfice⁷ when making decisions. The systematic reoccurrence of the clusters gives a notion of possible stronger information connections that the investors share. For example, they may be consistently following the same public information sources or have mutual private information channels. However, with the current research, we do not try to explain the direction or the publicity of the information transfer. On the other hand, according to Ozsoylev et al. (2013), investor networks can be considered proxies of information networks if they are fairly stable over time. In light of this argument, the persistent and security-wide investor clusters can represent the mutual information channels that exist for both new IPO securities and mature stocks (e.g., Nokia).

Data availability

The dataset analysed in the current study is not publicly available and cannot be distributed by the authors because it is a proprietary database of Euroclear Finland. The database can be accessed for research purposes under the nondisclosure agreement by asking permission from Euroclear Finland.

Received: 3 June 2019; Accepted: 2 October 2019;

Notes

1 In total, 75 securities had their IPOs during our analysis period. In this study we estimate investor networks during a 2-year period after their IPO date; therefore, we discarded ISIN FI0009014716 because its 2-year period falls out of our analysis period. Additionally, five ISINs (FI0009015580, FI0009015291, FI0009015713, FI0009005250 and FI0009902514) were discarded from the analysis, because no networks were estimated for them.

2 Unfortunately, the data appear to have issues with the trading date attribute for some securities, particularly for the transactions between 1998 and 2004. The net trading volumes on a daily resolution do not reconcile to 0 for all trading dates, while the volume sold should be equal to the volume bought per each stock during each day across all investors. This suggests that some transactions in the dataset were misplaced timewise because of incorrectly recorded trading dates. Only 14 of 69 securities fall into the completely error-free data period, and are marked in bold in Table 1.

3 For example, if the given investors were timing their buy transactions similarly so that they have a statistically validated link in the buying state but there were no statistical associations with the sell and buy–sell states, then the weight of the link between the investors would be 1.

4 We use igraph implementation of the Infomap algorithm with 100 as the parameter for the number of trials.

5 No-Gender and No-Age attributes refer to the institutional investors, but also to the individual investors who had no gender and/or birth year indicated in the data.

6 Recently, the long-term evolution of the clusters of the most capitalised stock in the HSE, Nokia, has been analysed in Musciotto et al. (2018).

7 The term satisfice refers to making optimal decisions under the limited resources. It was first defined in Simon and Barnard (1947).
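The consistency check described in Note 2 (bought and sold volumes should cancel for each stock on each day) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the record layout `(isin, date, side, volume)` and the function name `unbalanced_days` are hypothetical.

```python
from collections import defaultdict


def unbalanced_days(transactions):
    """Return {(isin, date): net_volume} for stock-days that do not reconcile to 0.

    `transactions` is an iterable of (isin, date, side, volume) tuples, where
    side is "buy" or "sell" (hypothetical layout). Across all investors, the
    volume bought should equal the volume sold for every stock on every day,
    so any non-zero net volume hints at a misrecorded trading date.
    """
    net = defaultdict(int)
    for isin, date, side, volume in transactions:
        net[(isin, date)] += volume if side == "buy" else -volume
    return {key: value for key, value in net.items() if value != 0}
```

A stock whose two legs of one trade carry different dates would show up as a pair of stock-days with opposite non-zero net volumes.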

References

Baltakys K, Baltakienė M, Kärkkäinen H, Kanniainen J (2018a) Neighbors matter: Geographical distance and trade timing in the stock market. Finance Res Lett https://doi.org/10.1016/j.frl.2018.11.013

Fig. 4 Network of investor clusters with overexpressed attributes (panels: first year, second year). On the left-hand side are the clusters observed in the first year after respective IPOs and on the right-hand side, in the second year. Investor cluster nodes are connected with continuous links if they share a statistically significant number of individual investors. Dashed links represent statistical similarity between some of the connected cluster components in the first and the second year after the IPOs. Node colours identify overexpressed sector codes within clusters. For overexpressed geographical location see Appendix Fig. B.1, for underexpressed attributes see Fig. 5 and for all overexpressed and underexpressed attributes see Appendix Tables B.1 and B.2. Sector code: Households, Non-financial, Financial-insurance, General-government, Non-profit, Rest-world

Fig. 5 Network of investor clusters with underexpressed attributes (panels: first year, second year). On the left-hand side are the clusters observed in the first year after respective IPOs and on the right-hand side, in the second year. Investor cluster nodes are connected with continuous links if they share a statistically significant number of individual investors. Node colours identify underexpressed sector code and geographical location attributes within clusters. Sector code: Households, Financial-insurance. Geographic location: Helsinki, South-West


Baltakys K, Kanniainen J, Emmert-Streib F (2018b) Multilayer aggregation with statistical validation: application to investor networks. Sci Rep 8(1):8198

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol) 57(1):289–300

Emmert-Streib F, Musa A, Baltakys K, Kanniainen J, Tripathi S, Yli-Harja O, Jodlbauer H, Dehmer M (2018) Computational analysis of the structural properties of economic and financial networks. J Netw Theory Financ 4(3):1–32

Grinblatt M, Keloharju M (2001) How distance, language, and culture influence stockholdings and trades. J Financ 56(3):1053–1073

Gualdi S, Cimini G, Primicerio K, Di Clemente R, Challet D (2016) Statistically validated network of portfolio overlaps and systemic risk. Sci Rep 6:39467

Ilmanen M, Keloharju M (1999) Shareownership in Finland. Finn J Bus Econ 48(1):257–285

Karhunen J, Keloharju M (2001) Shareownership in Finland 2000. Finn J Bus Econ 50(2):188–226

Kaustia M, Knüpfer S (2008) Do investors overweight personal experience? Evidence from IPO subscriptions. J Financ 63(6):2679–2702

Keloharju M (1993) The winner’s curse, legal liability, and the long-run price performance of initial public offerings in Finland. J Financ Econ 34(2):251–277

Lakonishok J, Maberly E (1990) The weekend effect: trading patterns of individual and institutional investors. J Financ 45(1):231–243

Lillo F, Miccichè S, Tumminello M, Piilo J, Mantegna RN (2015) How news affects the trading behaviour of different categories of investors in a financial market. Quant Financ 15(2):213–229

Ljungqvist A, Wilhelm Jr WJ (2003) IPO pricing in the dot-com bubble. J Financ 58(2):723–752

Ljungqvist A, Wilhelm Jr WJ (2005) Does prospect theory explain IPO market behavior? J Financ 60(4):1759–1790

Marotta L, Miccichè S, Fujiwara Y, Iyetomi H, Aoyama H, Gallegati M, Mantegna RN (2015) Bank-firm credit network in Japan: an analysis of a bipartite network. PLoS One 10(5):e0123079

Musciotto F, Marotta L, Miccichè S, Piilo J, Mantegna RN (2016) Patterns of trading profiles at the Nordic stock exchange. A correlation-based approach. Chaos Solitons Fractals 88:267–278

Musciotto F, Marotta L, Piilo J, Mantegna RN (2018) Long-term ecology of investors in a financial market. Palgrave Commun 4(1):92

Nofsinger JR, Sias RW (1999) Herding and feedback trading by institutional and individual investors. J Financ 54(6):2263–2295

Ozsoylev HN, Walden J, Yavuz MD, Bildik R (2013) Investor networks in the stock market. Rev Financ Stud 27(5):1323–1366

Ranganathan S, Kivelä M, Kanniainen J (2018) Dynamics of investor spanning trees around dot-com bubble. PLoS One 13(6):e0198807

Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105(4):1118–1123

Sias RW (2004) Institutional herding. Rev Financ Stud 17(1):165–206

Siikanen M, Baltakys K, Kanniainen J, Vatrapu R, Mukkamala R, Hussain A (2018) Facebook drives behavior of passive households in stock markets. Finance Res Lett 27:208–213

Simon HA, Barnard CI (1947) Administrative behavior: a study of decision-making processes in administrative organization. Macmillan

Spohr J (2004) Earnings management and IPOs—evidence from Finland. Finn J Bus Econ 53(2):157–172

Tumminello M, Lillo F, Piilo J, Mantegna RN (2012) Identification of clusters of investors from their real trading activity in a financial market. New J Phys 14(1):013041

Tumminello M, Miccichè S, Lillo F, Piilo J, Mantegna RN (2011a) Statistically validated networks in bipartite complex systems. PLoS One 6(3):e17994

Tumminello M, Miccichè S, Lillo F, Varho J, Piilo J, Mantegna RN (2011b) Community characterization of heterogeneous complex systems. J Stat Mech: Theory Exp 2011(01):P01019

Acknowledgements

MB and KB are grateful for the grants received from the Finnish Foundation for Share Promotion, The Foundation for Advancement of Finnish Securities Market and Finnish Foundation for Technology Promotion. KB received funding from the EU Research and Innovation Programme Horizon 2020 under grant agreement no. 675044 (BigDataFinance) and from the doctoral school of Tampere University. FL and DP acknowledge partial support from the European Community H2020 Programme under the scheme INFRAIA-1-2014-2015: Research Infrastructures, grant agreement No. 654024 SoBigData: Social Mining and Big Data Ecosystem (http://www.sobigdata.eu). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Competing interests

The authors declare no competing interests.

Additional information

Supplementary information is available for this paper at https://doi.org/10.1057/s41599-019-0342-6.

Correspondence and requests for materials should be addressed to M.B.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2019


PUBLICATION IV

Neighbors matter: Geographical distance and trade timing in the stock market

Baltakys, K., Baltakiene, M., Kärkkäinen, H. and Kanniainen, J.

Finance Research Letters (2018)
DOI: 10.1016/j.frl.2018.11.013

Publication reprinted with the permission of the copyright holders


Contents lists available at ScienceDirect

Finance Research Letters

journal homepage: www.elsevier.com/locate/frl

Neighbors matter: Geographical distance and trade timing in the stock market

Kęstutis Baltakys⁎, Margarita Baltakienė, Hannu Kärkkäinen, Juho Kanniainen

DARE Business Data Research Group, Laboratory of Industrial and Information Management, Tampere University of Technology, Finland

ARTICLE INFO

Keywords: Investor trading; Geographical distance; Information transfer; Private information; Investor network; Social interactions; Behavioral finance; Behavioral economics; Social networks; Individual investors

JEL classification: D8; G10

ABSTRACT

The starting point of this paper is that neighboring investors may talk to each other, sharing information about their transactions in stock markets, which leads to similar trading behavior. We find that pairwise trade timing similarities between investor pairs are negatively associated with the geographical distance between the corresponding investor pairs. This suggests that local information transfer channels between neighboring individual investors are used in decision making. We also observe that differences in age and language moderate this association. The analysis is conducted using investor-level data from different regions of Finland.

1. Introduction

The financial research literature, through numerous empirical studies, has provided evidence of how various behavioral biases affect investors’ decisions. One of the less investigated examples is social interaction, which is an integral part of the investing process and has an important effect on investor decisions. Specifically, the studies on market participation have found that “social” investors are more likely to invest in the market when the participation rate among their peers is high (Hong et al., 2004; Brown et al., 2008; Heimer, 2014) or when a family member has recently started investing (Li, 2014). Social influence extends beyond the mere choice of whether or not to participate in the stock market. Ivković and Weisbenner (2007) find that an investor’s decision to purchase securities in an industry is related to neighborhood purchases of securities in that industry, especially for local stocks. It is also possible to predict individual investor trading using a proxy measure of social contact possibilities based on epidemic models (Shive, 2010).

Direct social interaction is the key underlying construct for social influence, and household investors can use it as a primary channel for acquiring knowledge and information about investments. However, most of the research has investigated the aggregate group behavior influence on individual investors’ decisions, while direct investor-to-investor (i2i) communication has not been widely explored. A welcome exception is Ozsoylev et al. (2013), who estimate the empirical investor network to identify information transfer between investors from investor-level transaction data, but they do not use any data or proxies about social links between investors. The aim of this paper is to address this research gap.

https://doi.org/10.1016/j.frl.2018.11.013

Received 3 July 2018; Received in revised form 6 November 2018; Accepted 17 November 2018

⁎ Corresponding author.

E-mail address: [email protected] (K. Baltakys).


This paper is also related to investor network analysis, which has generated interest in financial studies since the recent financial crisis (Tumminello et al., 2012; Ozsoylev et al., 2013; Emmert-Streib et al., 2018; Baltakys et al., 2018; Ranganathan et al., 2018). Our focus is on the relationship between pairwise trade timing similarity and geographical distance between investors in Finland. In the following, we will loosely refer to a negative relationship between the two as “local information exchange or transfer”. We seek to understand whether households that are located closer to each other tend to time their trades in a more similar manner than those that are farther apart. We expect households, as opposed to institutions, to be less sophisticated investors, lacking expertise and seeking to reduce the cost of information search and decision making by word-of-mouth communication (Ivković and Weisbenner, 2007). In this paper, we use a unique account-level transaction data set that has been explored in multiple studies (Grinblatt and Keloharju, 2001; Linnainmaa, 2011; Tumminello et al., 2012; Siikanen et al., 2018).

Our main goal is to seek evidence that local information transfer channels exist and are used between investors living near each other. Our starting point is that geographical proximity enables face-to-face interactions and grants access to the same local sources of information. In this regard, it has been observed that the likelihood and dynamics of friendship decrease as the distance between individuals increases (Backstrom et al., 2010; Preciado et al., 2012).

In this paper, we test if the geographical distance of a pair of investors is associated with their synchronization in trade timing. This research strategy automatically avoids capturing the local bias (Grinblatt and Keloharju, 2001; Zhu, 2003; Ivković and Weisbenner, 2005; Seasholes and Zhu, 2010), because local bias is about the geographical compositions of portfolios, but not about how investors time their transactions. Investors can seek mutual connections; consequently, trade execution timing can be partially explained by investor reactions to received information, which in turn can be private or public. A stronger trade timing similarity at shorter distances could indicate a reaction to private information exchange between neighboring investors. An alternative explanation could be local public information arrivals, but business news on publicly listed companies is widely spread and thus is not supposed to have local effects only. At the same time, due to the small-world effect (Travers and Milgram, 1967), the dynamics of such exchange could lead to a ripple effect of further information dissemination (Ivković and Weisbenner, 2007). Therefore, we do not expect to find extraordinarily strong associations in terms of economic significance. To the best of our knowledge, this is the first investor-level, market-wide study where the geographical distance between investors is linked to their pairwise trade timing similarity.

2. Data

2.1. Postal codes

In this paper, we investigate the pairwise relationships of household investors between two network layers: a geographical distance layer (Barthélemy, 2011) and a trade timing similarity layer aggregated over multiple securities. In the geographical distance layer, nodes are embedded in space using the postal code as a proxy for investor location, and the links between investors are the actual geographical distances between investor locations. Fig. 1 shows the distribution of Finnish household investors’ postal codes in the country. Postal code areas vary from sparsely to moderately populated, with roughly 64% of postal codes having fewer than 600 investors. We use the postal code locality coordinates to calculate the distance matrix for each pair of investors¹ using Vincenty’s formula (Vincenty, 1975).
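The distance computation can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps Vincenty's ellipsoidal formula for the simpler haversine (spherical) approximation, and the function names and the `coords` mapping from postal code to (latitude, longitude) are hypothetical. The zero-distance rule follows footnote 1.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical simplification, not Vincenty


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def postal_distance(coords, p, q):
    """Distance between postal codes p and q, applying the zero-distance rule."""
    d = haversine_km(coords[p], coords[q])
    if d > 0:
        return d
    # Same postal code or shared coordinates: use 1/4 of the distance
    # from p to its closest postal code at non-zero distance.
    nonzero = [x for x in (haversine_km(coords[p], c) for c in coords.values()) if x > 0]
    return min(nonzero) / 4
```

With hypothetical coordinates such as `{"A": (60.17, 24.94), "B": (61.50, 23.76)}`, `postal_distance(coords, "A", "A")` returns one quarter of the A–B distance instead of zero, so that pairs living in the same postal code still receive a positive distance.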

2.2. Investor transaction data set

We use a unique investor-level transaction data set obtained from Euroclear Finland Ltd for our analysis.² It includes transactions from January 1995 to December 2009 of all domestic investors that traded stocks listed on the Nasdaq OMX Helsinki Exchange. Each transaction also contains metadata about the investor, such as gender, language, and year of birth. We focus on the 20 most actively traded securities from each market capitalization segment, making a total of 60 securities analyzed.³ Finnish households typically have very low trading activity, with 47% of investors being active only once throughout the whole observation window of 1995–2009. To conduct a market-wide analysis with a data set of manageable size, we limit the selection of household investors to those who were active in the exchange at least 20 times.⁴ We need a sufficient number of observations to estimate the trade timing similarity; at the same time, we expect the inactive investors to be less sophisticated and to seek easy channels with a lower information acquisition cost (Siikanen et al., 2018). In addition, we remove the investors with multiple postal code entries.

¹ For investors in the same postal code, or in cases where two or more postal codes share the same coordinates in our database, we measure the distance as 1/4 of the distance to the closest non-zero distance postal code.

² The data that support the findings of this study are available from Euroclear Finland Ltd.; however, they cannot be obtained from the authors under the non-disclosure agreement signed with the data provider.

³ Table A.1 (see Appendix) shows the selected companies, ISINs, number of days the companies’ securities were traded, number of investors, total number of transactions, and the market capitalization. We have decided to exclude Nokia, as it is the most widely traded and owned security in Finland.

⁴ For example, an investor who traded 20 securities on one day, an investor who traded one security on 20 days, or any combination of both.


3. Methods

3.1. Measuring trading similarity

To compare the trading position taken by an investor on a given day, irrespective of the absolute volume traded, a categorical

variable is calculated as in Tumminello et al. (2012) that describes the investor’s trading activity. For each investor i and each trading

day t we take the volume sold Vs(i, t) and the volume bought Vb(i, t) for a certain security. Then the scaled net volume ratio (Eq. (1))

can be calculated:

$$ r(i,t) = \frac{V_b(i,t) - V_s(i,t)}{V_b(i,t) + V_s(i,t)} \tag{1} $$

The trading state on day t can then be assigned to an investor for a chosen threshold θ (we set θ = 0.01):

b: primarily buying state, when r(i, t) > θ,
s: primarily selling state, when r(i, t) < −θ.

In fact, the use of θ≠0 automatically excludes day traders (investors who close their positions at the end of the day). In this paper,

we focus on the investor pairs trading simultaneously in the same direction.
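As a minimal sketch of this classification (the function name and the handling of days without any trading are assumptions, not taken from the paper), the scaled net volume ratio and the resulting state can be computed as follows:

```python
def trading_state(v_bought, v_sold, theta=0.01):
    """Return 'b' (primarily buying), 's' (primarily selling), or None.

    v_bought and v_sold are the volumes an investor bought and sold
    in one security on one trading day.
    """
    total = v_bought + v_sold
    if total == 0:
        return None  # assumption: no trading on that day means no state
    r = (v_bought - v_sold) / total  # scaled net volume ratio, in [-1, 1]
    if r > theta:
        return "b"
    if r < -theta:
        return "s"
    return None  # bought and sold nearly equal volumes, e.g. a day trader
```

An investor who buys and sells the same volume within the day has r = 0 and is assigned no state, which is how a non-zero threshold leaves day traders out of the similarity measure.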

Subsequently, to measure the trade timing similarities over all 60 securities and both buying and selling behaviors, Jaccard

coefficients for each investor pair i, j are calculated:

$$ J_{ij} = \frac{\sum_{z,d} M_{11}(z,d;i,j)}{\sum_{z,d} \left[ M_{01}(z,d;i,j) + M_{10}(z,d;i,j) + M_{11}(z,d;i,j) \right]} \tag{2} $$

where M_{11}(z, d; i, j) is the total number of trading days where both i and j are in the state d ∈ {b, s} in security z, M_{01}(z, d; i, j) is the total number of trading days where i is not and j is in the state d ∈ {b, s} in security z, and M_{10}(z, d; i, j) is the total number of trading days where i is in the state d ∈ {b, s} in security z and j is not. The Jaccard coefficient is chosen for its straightforward interpretation as the fraction of common choices of trading activity.

Fig. 1. The distribution of 3621 household postal codes over the map of Finland.
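The aggregated Jaccard coefficient can be sketched with plain sets. The representation is an assumption made for illustration: each investor is described by a set of (security, day, state) triples. Summing the day counts over all securities and both states then reduces to the sizes of the intersection and the union of the two sets.

```python
def jaccard(states_i, states_j):
    """Jaccard similarity of two investors' trading states.

    Each argument is a set of (security, day, state) triples, with state
    'b' or 's'. The intersection counts the days where both investors are
    in the same state in the same security; the union additionally counts
    the days where only one of them is in that state.
    """
    both = len(states_i & states_j)
    either = len(states_i | states_j)
    return both / either if either else 0.0


# Hypothetical example: the investors share one buying day in security "AAA".
investor_i = {("AAA", 1, "b"), ("AAA", 2, "s"), ("BBB", 1, "b")}
investor_j = {("AAA", 1, "b"), ("AAA", 2, "b"), ("BBB", 3, "s")}
similarity = jaccard(investor_i, investor_j)  # 1 shared triple out of 5 distinct
```

Note that opposite states on the same day (here day 2 in "AAA") add two entries to the denominator and none to the numerator, so trading in opposite directions lowers the similarity.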

3.2. Regression model

Using a linear regression model, we seek to explain how the distance between Finnish households is associated with trade timing. The dependent variable in our model is the Jaccard similarity Jij, which ranges from 0 (no similarity) to 1 (identical choice of trade timing). The explanatory variable is the distance Dij (in km) between investors i and j.

We expect the age, language, and gender of an investor pair to affect friendship and communication and possibly to have an effect on the local information exchange. Thus, the following variables obtained from the data set attributes are used to examine the moderating effects:

• Dummy age variable Aij, equal to 1 if the absolute age difference between i and j is more than 10 years, 0 otherwise.
• Language variable Lij, equal to 1 if investors i and j speak different languages, 0 otherwise.
• Female pair dummy variable FFij, equal to 1 if investors i and j are both female, 0 otherwise.
• Mixed gender pair dummy variable MFij, equal to 1 if investor i is male (female) and j is female (male), 0 otherwise.

In addition, these dummies are used as control variables. To summarize, we define the regression model as follows:

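$$ J_{ij} = \alpha_0 + \alpha_1 \cdot D_{ij} + (\beta_{12} \cdot A_{ij} + \beta_{13} \cdot L_{ij} + \beta_{14} \cdot FF_{ij} + \beta_{15} \cdot MF_{ij}) \cdot D_{ij} + \alpha_2 \cdot A_{ij} + \alpha_3 \cdot L_{ij} + \alpha_4 \cdot FF_{ij} + \alpha_5 \cdot MF_{ij}. \tag{3} $$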

The baseline in the regression is that in a pair, investors are approximately of the same age, they speak the same language, and

both of them are male. Thus, we expect that α1 is negative (longer distance, less communication, and thus fewer synchronized

transactions). Differences from the baseline settings are handled by the dummy variables described above.
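To make the role of the moderators concrete, the following sketch evaluates such a model for one investor pair. It assumes the model consists of an intercept, a distance term, the four dummy controls, and the four dummy-by-distance interactions described above. The dictionary keys and the coefficient values are hypothetical labels chosen for illustration, not estimates from the paper.

```python
def distance_slope(coef, age_gap, diff_lang, both_female, mixed):
    """Effective association between distance and similarity for a pair.

    The baseline slope (same age group, same language, both male) is shifted
    by the interaction coefficient of every dummy that equals 1.
    """
    return (coef["alpha1"]
            + coef["beta12"] * age_gap
            + coef["beta13"] * diff_lang
            + coef["beta14"] * both_female
            + coef["beta15"] * mixed)


def predict_similarity(coef, d_km, age_gap, diff_lang, both_female, mixed):
    """Fitted Jaccard similarity for a pair at distance d_km (in km)."""
    level = (coef["alpha0"]
             + coef["alpha2"] * age_gap
             + coef["alpha3"] * diff_lang
             + coef["alpha4"] * both_female
             + coef["alpha5"] * mixed)
    return level + distance_slope(coef, age_gap, diff_lang, both_female, mixed) * d_km


# Illustrative values only.
coef = {"alpha0": 0.003, "alpha1": -6e-06, "beta12": 3e-06, "beta13": 0.0,
        "beta14": 0.0, "beta15": 0.0, "alpha2": -1e-04, "alpha3": 0.0,
        "alpha4": 0.0, "alpha5": 0.0}
baseline_slope = distance_slope(coef, 0, 0, 0, 0)  # slope for the baseline pair
age_gap_slope = distance_slope(coef, 1, 0, 0, 0)   # slope with a large age gap
```

With these illustrative values, a positive interaction coefficient for the age dummy makes the slope for pairs with a large age gap less negative than the baseline slope, which is the pattern a weakening moderator would produce.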

Age has been found to impact communication and communication patterns in many ways. For instance, according to Leskovec and

Horvitz (2008), especially in the online context, people tend to communicate more with each other when they are similar with respect

to age, language, and location. Moreover, generation-related differences and biases have been found to negatively affect tacit

knowledge transfer (Liebowitz et al., 2007). In addition, compared to the older generations, generation Y (usually defined as those

born between the early 1980s and mid-late 1990s) has been found to rely heavily on technology to communicate with each other

(Bolton et al., 2013), while the earlier generations commonly value more face-to-face communication (Venter, 2017). Consequently,

we expect β12 and β13 to be positive, making the association between distance and trading similarity less negative.

With the remaining two variables, we want to contrast different gender combinations. Studies show that females are more risk

averse (Estes and Hosseini, 1988; Embrey and Fox, 1997; Fehr-Duda et al., 2006) and show a stronger desire to use a financial adviser

compared to men (Stinerock et al., 1991). Therefore, we would expect the coefficients β14 and β15 to be negative.

Geographical distances can have very different meanings in rural and urban areas. For example, in Helsinki, the population density is over 3000/km², but in Lapland, it is around 1.8/km² and neighbors can be kilometers apart from each other; thus, people are used to traveling long distances on a daily basis. Therefore, we analyze the association between trade timing similarity and the distance between investors separately for different metropolitan and rural areas. The separate analyses of different regions can also be justified by the finding of Gilbert et al. (2008) that people in urban areas, as opposed to people in rural areas, have more friends, on average, and these friends are situated closer, and yet they tend to have more friends scattered throughout the country than those in rural areas do. In total, we run 14 regression analyses for different geographical areas. This also acts as a robustness check for the results.⁵

4. Main results

The results in Tables 1 and 2 confirm our expectations and show a consistently significant, negative relationship between distance and trade timing similarity across all regions, except for two insignificant relationships, one positive (Turku) and one negative (Uusimaa).

In 13 regions, an investor pair with an age difference larger than 10 years, on average, has a smaller trade timing similarity than a

pair of investors who are similar in age. Taking into account the age difference moderator, in most cases, we observe a reduced

negative relationship between distance and trade timing similarity for investor pairs with a larger age difference. These findings are in

line with the literature (Liebowitz et al., 2007; Leskovec and Horvitz, 2008) stating that similarity in age is important for better

communication.

We also observe a smaller trade timing similarity for investors who speak different languages. In nine regions, investors who share the same main language are more similar in trade timing than investor pairs who speak different languages. The language moderator also weakens the negative association between distance and trade timing, except for Turku, Southeastern Finland, and Northern Finland, which have a strengthening language moderating effect. This might be related to the different language distributions (especially Finnish and Swedish) across regions.

⁵ The set of metropolises is {Helsinki, Tampere, Turku, Oulu}, where in our definition the Helsinki area also includes investors from the neighboring Espoo and Vantaa municipalities. The set of rural regions is {Uusimaa, Eastern Tavastia, Southwestern Finland, Western Tavastia, Central Finland, Southeastern Finland, Ostrobothnia, Northern Savonia, Eastern Finland, Northern Finland}.

The gender control variables do not suggest a universal rule of effects on the trade timing similarity. However, the moderators

hint at a stronger negative relationship between trade timing similarity and distance in the rural areas if one investor is female, which

is in line with gender research studies.

The main results show high statistical significance. This can partially be explained by the very high number of observations. For a

robustness check, we perform a bootstrap analysis. For each of the 1000 iterations, we sample 10% of the relationships, run the same

regression analysis, and observe the signs of the coefficients. These results also confirm that our findings are robust (see Tables C.2

and C.3 in the Appendix). At the same time, the economic significance is quite limited, but this is expected, as we are analyzing all the

relationships in the regions, while only a fraction of them may have actual social connections.
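The bootstrap sign check can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it fits only a single-regressor slope of similarity on distance instead of the full model, and it assumes that each iteration samples pairs without replacement, which the text does not specify. All names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x with an intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def share_negative_slopes(distances, similarities, iterations=1000,
                          fraction=0.10, seed=0):
    """Fraction of bootstrap iterations in which the distance slope is negative."""
    rng = random.Random(seed)
    k = max(2, int(len(distances) * fraction))
    negative = 0
    for _ in range(iterations):
        idx = rng.sample(range(len(distances)), k)  # 10% of the pairs by default
        slope = ols_slope([distances[i] for i in idx],
                          [similarities[i] for i in idx])
        negative += slope < 0
    return negative / iterations
```

A share close to 1 means that the sign of the coefficient is stable across subsamples, which is the kind of evidence the robustness tables report.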

5. Additional robustness checks

In addition to the main analysis, we run a number of additional robustness checks. First, we conduct three analyses: an analysis for aggregated urban and rural areas (see Table C.4), an analysis without moderators, and an analysis with the logarithmic distance transformation ln(Dij). All results confirm the negative association between distance and trade timing, and are available on request. Fourth, we reason that investors might not converse about all of their traded securities but about a subset of special interest companies. To capture this relationship, instead of measuring the similarity over all securities, we estimate the trade timing similarity as the maximum of the similarities observed for different securities (see Eq. (B.1) in the online Appendix). The regression results are consistent with the main findings and can be found in the online Appendix (see Tables C.5, C.6, and C.7).

Table 1
Linear regression estimates for four metropolitan areas with control variables and moderators. Pairwise relationships are estimated over 60 securities and buying and selling behaviors. Standard errors are given in parentheses () and economic significance is given in curly brackets {}. Economic significance is normed by the Jaccard coefficient standard deviation.

| | Helsinki | Tampere | Turku | Oulu |
|---|---|---|---|---|
| Panel A: Distance | | | | |
| Distance | −6.23e−06*** | −8.24e−06* | 8.02e−07 | −1.64e−05*** |
| | (4.47e−07) | (3.52e−06) | (6.01e−06) | (3.02e−06) |
| | {−0.006} | {−0.002} | {1.82e−04} | {−0.008} |
| Panel B: Moderators | | | | |
| Age diff. ≥ 10 | 2.72e−06*** | 1.49e−05*** | 1.49e−05* | 1.53e−05*** |
| | (4.84e−07) | (4.16e−06) | (6.73e−06) | (3.59e−06) |
| | {0.002} | {0.005} | {0.004} | {0.008} |
| Different language | 1.49e−06* | 2.23e−05 | −3.90e−05** | 1.18e−04*** |
| | (5.81e−07) | (1.53e−05) | (1.18e−05) | (2.04e−05) |
| | {6.40e−04} | {0.002} | {−0.005} | {0.009} |
| Female-female | 3.88e−06*** | 1.15e−05 | −4.01e−05** | −5.64e−06 |
| | (7.80e−07) | (1.23e−05) | (1.51e−05) | (1.06e−05) |
| | {0.001} | {0.001} | {−0.004} | {−7.91e−04} |
| Male-female | 3.43e−06*** | −7.84e−06 | −2.52e−05*** | −1.36e−05*** |
| | (4.68e−07) | (4.56e−06) | (6.97e−06) | (3.93e−06) |
| | {0.002} | {−0.002} | {−0.006} | {−0.006} |
| Panel C: Constant and control variables | | | | |
| Constant | 0.003*** | 0.003*** | 0.003*** | 0.003*** |
| | (5.60e−06) | (3.10e−05) | (3.77e−05) | (4.03e−05) |
| | {0} | {0} | {0} | {0} |
| Age diff. ≥ 10 | −9.38e−05*** | −3.69e−04*** | −2.65e−04*** | −3.08e−04*** |
| | (6.17e−06) | (3.62e−05) | (4.21e−05) | (4.81e−05) |
| | {−0.004} | {−0.012} | {−0.009} | {−0.010} |
| Different language | −3.13e−04*** | −6.92e−04*** | −5.15e−05 | −0.001*** |
| | (8.30e−06) | (1.25e−04) | (6.87e−05) | (2.40e−04) |
| | {−0.009} | {−0.006} | {−0.001} | {−0.008} |
| Female-female | −1.20e−04*** | 6.95e−05 | 1.85e−04 | 8.27e−06 |
| | (1.23e−05) | (9.73e−05) | (9.58e−05) | (1.44e−04) |
| | {−0.002} | {8.12e−04} | {0.003} | {8.52e−05} |
| Male-female | −1.40e−04*** | 3.39e−05 | 6.60e−05 | 9.02e−05 |
| | (6.24e−06) | (3.87e−05) | (4.35e−05) | (5.29e−05) |
| | {−0.005} | {0.001} | {0.002} | {0.003} |
| Panel D: Sample size and Jaccard standard deviation | | | | |
| N | 26,953,337 | 3,039,121 | 1,805,007 | 1,253,352 |
| Jaccard std. dev. | 0.013 | 0.015 | 0.014 | 0.015 |

*** p < 0.001; ** p < 0.01; * p < 0.05


Table 2
Linear regression estimates for 10 rural regions with control variables and moderators. Pairwise relationships are estimated over 60 securities and buying and selling behaviors. Standard errors are given in parentheses () and economic significance is given in curly brackets {}. Economic significance is normed by the Jaccard coefficient standard deviation.

| | Uusimaa | Eastern Tavastia | South-Western Finland | Western Tavastia | Central Finland | South-Eastern Finland | Ostrobothnia | Northern Savonia | Eastern Finland | Northern Finland |
|---|---|---|---|---|---|---|---|---|---|---|
| Panel A: Distance | | | | | | | | | | |
| Distance | −3.84e−07 | −6.64e−06*** | −8.45e−06*** | −3.00e−06*** | −2.46e−06*** | −4.54e−06*** | −9.05e−07*** | −2.67e−06*** | −2.82e−06*** | −5.71e−07*** |
| | (2.66e−07) | (4.67e−07) | (1.51e−07) | (3.29e−07) | (1.25e−07) | (4.86e−07) | (1.68e−07) | (4.76e−07) | (1.58e−07) | (1.71e−07) |
| | {−0.001} | {−0.014} | {−0.032} | {−0.008} | {−0.014} | {−0.011} | {−0.003} | {−0.007} | {−0.020} | {−0.004} |
| Panel B: Moderators | | | | | | | | | | |
| Age diff. ≥ 10 | 5.11e−07 | 1.50e−06** | 2.57e−06*** | 9.83e−07* | 1.22e−06*** | −2.17e−07 | 2.51e−07 | 7.39e−07 | 6.81e−07*** | 4.24e−07* |
| | (2.97e−07) | (5.54e−07) | (1.57e−07) | (3.94e−07) | (1.52e−07) | (5.88e−07) | (1.83e−07) | (5.72e−07) | (1.94e−07) | (2.09e−07) |
| | {0.001} | {0.003} | {0.011} | {0.003} | {0.007} | {−6.21e−04} | {9.55e−04} | {0.002} | {0.005} | {0.003} |
| Different language | 7.64e−07* | 2.43e−06 | 7.85e−06*** | 1.48e−05*** | 6.77e−08 | −4.09e−07 | 3.90e−07* | 8.49e−06 | 1.53e−06 | −7.23e−06** |
| | (3.49e−07) | (2.81e−06) | (2.17e−07) | (3.04e−06) | (9.13e−07) | (3.54e−06) | (1.90e−07) | (4.66e−06) | (1.77e−06) | (2.50e−06) |
| | {0.002} | {9.89e−04} | {0.036} | {0.005} | {5.55e−05} | {−1.55e−04} | {0.001} | {0.002} | {9.73e−04} | {−0.004} |
| Female-female | −3.60e−06*** | −8.36e−06*** | −6.26e−06*** | −7.97e−07 | 6.13e−07 | −2.44e−06 | −1.34e−06* | 1.28e−07 | −1.61e−06* | 3.23e−07 |
| | (8.60e−07) | (1.56e−06) | (4.03e−07) | (1.20e−06) | (4.73e−07) | (1.83e−06) | (5.29e−07) | (1.66e−06) | (6.29e−07) | (6.69e−07) |
| | {−0.003} | {−0.006} | {−0.009} | {−7.08e−04} | {9.49e−04} | {−0.002} | {−0.002} | {1.04e−04} | {−0.003} | {6.18e−04} |
| Male-female | −1.63e−06*** | −4.63e−06*** | −2.26e−06*** | −1.56e−06*** | −6.44e−08 | 3.79e−07 | −8.83e−07*** | −1.46e−06* | −9.24e−07*** | −4.71e−07* |
| | (3.27e−07) | (6.01e−07) | (1.68e−07) | (4.36e−07) | (1.69e−07) | (6.57e−07) | (2.01e−07) | (6.29e−07) | (2.19e−07) | (2.36e−07) |
| | {−0.004} | {−0.009} | {−0.008} | {−0.004} | {−2.95e−04} | {8.91e−04} | {−0.003} | {−0.003} | {−0.005} | {−0.003} |
| Panel C: Constant and control variables | | | | | | | | | | |
| Constant | 0.003*** | 0.003*** | 0.003*** | 0.003*** | 0.003*** | 0.003*** | 0.003*** | 0.003*** | 0.004*** | 0.003*** |
| | (1.60e−05) | (2.56e−05) | (1.45e−05) | (2.45e−05) | (1.77e−05) | (3.84e−05) | (1.51e−05) | (3.73e−05) | (3.15e−05) | (2.98e−05) |
| | {0} | {0} | {0} | {0} | {0} | {0} | {0} | {0} | {0} | {0} |
| Age diff. ≥ 10 | −7.88e−05*** | −2.08e−04*** | −4.24e−04*** | −2.91e−04*** | −3.18e−04*** | 2.32e−05 | −1.90e−04*** | −2.88e−04*** | −3.13e−04*** | −2.96e−04*** |
| | (1.88e−05) | (3.03e−05) | (1.65e−05) | (2.93e−05) | (2.14e−05) | (4.63e−05) | (1.66e−05) | (4.45e−05) | (3.87e−05) | (3.63e−05) |
| | {−0.003} | {−0.007} | {−0.014} | {−0.010} | {−0.011} | {7.38e−04} | {−0.007} | {−0.009} | {−0.010} | {−0.010} |
| Different language | −1.90e−04*** | −0.002*** | −9.69e−04*** | −0.002*** | −8.74e−04*** | 2.33e−04 | −1.94e−04*** | −5.43e−04 | 8.19e−05 | 9.05e−04* |
| | (2.73e−05) | (1.75e−04) | (2.94e−05) | (2.52e−04) | (1.43e−04) | (2.57e−04) | (1.75e−05) | (3.07e−04) | (3.31e−04) | (3.91e−04) |
| | {−0.006} | {−0.011} | {−0.029} | {−0.007} | {−0.005} | {0.001} | {−0.007} | {−0.002} | {2.78e−04} | {0.003} |
| Female-female | −1.63e−04** | 4.57e−04*** | 6.64e−04*** | −2.72e−04** | −1.63e−04* | −2.62e−04 | 6.92e−05 | 2.87e−04* | 2.33e−04 | −5.35e−05 |
| | (5.51e−05) | (8.36e−05) | (4.43e−05) | (9.00e−05) | (6.78e−05) | (1.44e−04) | (4.78e−05) | (1.29e−04) | (1.24e−04) | (1.11e−04) |
| | {−0.002} | {0.006} | {0.009} | {−0.003} | {−0.002} | {−0.003} | {8.81e−04} | {0.003} | {0.002} | {−6.18e−04} |
| Male-female | −1.05e−04*** | 2.17e−04*** | 1.29e−04*** | −2.14e−05 | −1.12e−04*** | −1.82e−04*** | −4.59e−05* | 2.14e−04*** | 1.26e−04** | −1.62e−05 |
| | (2.07e−05) | (3.28e−05) | (1.78e−05) | (3.25e−05) | (2.41e−05) | (5.16e−05) | (1.82e−05) | (4.88e−05) | (4.37e−05) | (4.05e−05) |
| | {−0.004} | {0.007} | {0.004} | {−6.94e−04} | {−0.003} | {−0.005} | {−0.002} | {0.006} | {0.004} | {−5.13e−04} |
| Panel D: Sample size and Jaccard standard deviation | | | | | | | | | | |
| N | 5,935,341 | 3,154,892 | 11,648,346 | 3,983,213 | 5,301,202 | 1,922,045 | 12,072,544 | 1,858,117 | 2,246,499 | 2,128,835 |
| Jaccard std. dev. | 0.013 | 0.014 | 0.014 | 0.014 | 0.015 | 0.016 | 0.013 | 0.016 | 0.016 | 0.014 |

*** p < 0.001; ** p < 0.01; * p < 0.05


Fifth, to control for the increase in information flows due to the internet and online trading in the 2000s, we estimated the trade timing similarities over different years and five-year periods (see Tables C.8 and C.9 in the Appendix) and ran the regression analysis with period dummies. The results concerning the distance coefficients, age difference, and language are consistent with the main findings.

Sixth, to verify that the results are not driven only by active investors, we ran the main regression analysis excluding the 5% most active investors. The results confirm our main findings and can be found in the Appendix (see Table C.10).

Finally, as a reference for our main within-regional analysis, we performed a regression analysis for across-regional investor pairs. We ran 1000 bootstrap iterations for each urban region, sampling 500,000 investor pairs within the urban region and the same number of investor pairs in which one investor originates from the urban region and the other comes from anywhere outside it. For both the within-regional and the across-regional samples, we ran the regression models and observed the resulting coefficient distributions (see the distributions of the α1 coefficients in Fig. C.1 in the Appendix). We find that the parameter estimates of the within-regional regressions are systematically more negative than the corresponding estimates of the across-regional regressions, which suggests that the distance between investors is more important at closer distances.

6. Conclusions

This study contributes to the literature on financial information transfer channels in that the findings confirm the expected

negative association between the trade timing similarities and distances between investors. The results are robust and show that the

closer investors are to each other, the higher their trade timing similarity is. The observed negative association suggests the existence

of local information diffusion channels between household investors. Investor age and language are important attributes that systematically moderate the observed association.

Acknowledgments

The research project leading to these results received funding from the EU Research and Innovation Programme Horizon 2020

under grant agreement No. 675044 (BigDataFinance) and Tampere University of Technology doctoral school. M.B. is grateful for the

grants received from The Finnish Foundation for Technology Promotion, The Foundation for Advancement of Finnish Securities

Market and The Finnish Foundation for Share Promotion. The funders had no role in study design, data collection and analysis,

decision to publish, or preparation of the manuscript.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.frl.2018.11.013

References

Backstrom, L., Sun, E., Marlow, C., 2010. Find me if you can: improving geographical prediction with social and spatial proximity. Proceedings of the 19th

International Conference on World Wide Web. ACM, pp. 61–70.

Baltakys, K., Kanniainen, J., Emmert-Streib, F., 2018. Multilayer aggregation with statistical validation: application to investor networks. Sci. Rep. 8 (1), 8198.

Barthélemy, M., 2011. Spatial networks. Phys. Rep. 499 (1–3), 1–101.

Bolton, R.N., Parasuraman, A., Hoefnagels, A., Migchels, N., Kabadayi, S., Gruber, T., Komarova Loureiro, Y., Solnet, D., 2013. Understanding generation y and their

use of social media: a review and research agenda. J. Serv. Manag. 24 (3), 245–267.

Brown, J.R., Ivković, Z., Smith, P.A., Weisbenner, S., 2008. Neighbors matter: causal community effects and stock market participation. J. Finance 63 (3), 1509–1531.

Embrey, L.L., Fox, J.J., 1997. Gender differences in the investment decision-making process. J. Financ. Counsel. Plan. 8 (2), 33.

Emmert-Streib, F., Musa, A., Baltakys, K., Kanniainen, J., Tripathi, S., Yli-Harja, O., Jodlbauer, H., Dehmer, M., 2018. Computational analysis of the structural

properties of economic and financial networks. Journal of Network Theory in Finance 4 (3), 1–32.

Estes, R., Hosseini, J., 1988. The gender gap on wall street: an empirical analysis of confidence in investment decision making. J. Psychol. 122 (6), 577–590.

Fehr-Duda, H., De Gennaro, M., Schubert, R., 2006. Gender, financial risk, and probability weights. Theory Decis. 60 (2–3), 283–313.

Gilbert, E., Karahalios, K., Sandvig, C., 2008. The network in the garden: an empirical analysis of social media in rural life. Proceedings of the SIGCHI Conference on

Human Factors in Computing Systems. ACM, pp. 1603–1612.

Grinblatt, M., Keloharju, M., 2001. How distance, language, and culture influence stockholdings and trades. J. Finance 56 (3), 1053–1073.

Heimer, R.Z., 2014. Friends do let friends buy stocks actively. J. Econ. Behav. Organ. 107, 527–540.

Hong, H., Kubik, J.D., Stein, J.C., 2004. Social interaction and stock-market participation. J. Finance 59 (1), 137–163.

Ivković, Z., Weisbenner, S., 2005. Local does as local is: information content of the geography of individual investors’ common stock investments. J. Finance 60 (1),

267–306.

Ivković, Z., Weisbenner, S., 2007. Information diffusion effects in individual investors’ common stock purchases: covet thy neighbors’ investment choices. Rev. Financ.

Stud. 20 (4), 1327–1357.

Leskovec, J., Horvitz, E., 2008. Planetary-scale views on a large instant-messaging network. Proceedings of the 17th International Conference on World Wide Web.

ACM, pp. 915–924.

Li, G., 2014. Information sharing and stock market participation: evidence from extended families. Rev. Econ. Stat. 96 (1), 151–160.

Liebowitz, J., Ayyavoo, N., Nguyen, H., Carran, D., Simien, J., 2007. Cross-generational knowledge flows in edge organizations. Ind. Manag. Data Syst. 107 (8),

1123–1153.

Linnainmaa, J.T., 2011. Why do (some) households trade so much? Rev. Financ. Stud. 24 (5), 1630–1666.

Ozsoylev, H.N., Walden, J., Yavuz, M.D., Bildik, R., 2013. Investor networks in the stock market. Rev. Financ. Stud. 27 (5), 1323–1366.

Preciado, P., Snijders, T.A., Burk, W.J., Stattin, H., Kerr, M., 2012. Does proximity matter? distance dependence of adolescent friendships. Soc. Netw. 34 (1), 18–31.

Ranganathan, S., Kivelä, M., Kanniainen, J., 2018. Dynamics of investor spanning trees around dot-com bubble. PLoS ONE 13 (6), e0198807.

Seasholes, M.S., Zhu, N., 2010. Individual investors and local bias. J. Finance 65 (5), 1987–2010.


Shive, S., 2010. An epidemic model of investor behavior. J. Financ. Quant. Anal. 45 (1), 169–198.

Siikanen, M., Baltakys, K., Kanniainen, J., Vatrapu, R., Mukkamala, R., Hussain, A., 2018. Facebook drives behavior of passive households in stock markets. Finance

Res. Lett.

Stinerock, R.N., Stern, B.B., Solomon, M.R., 1991. Gender differences in the use of surrogate consumers for financial decision-making. J. Prof. Serv. Mark. 7 (2),

167–182.

Travers, J., Milgram, S., 1967. The small world problem. Psychol. Today 1 (1), 61–67.

Tumminello, M., Lillo, F., Piilo, J., Mantegna, R.N., 2012. Identification of clusters of investors from their real trading activity in a financial market. New J. Phys. 14

(1), 013041.

Venter, E., 2017. Bridging the communication gap between generation y and the baby Boomer generation. Int. J. Adolesc. Youth 22 (4), 497–507.

Vincenty, T., 1975. Direct and inverse solutions of geodesics on the ellipsoid with application of nested equations. Survey Rev. 23 (176), 88–93.

Zhu, N., 2003. The local bias of individual investors. Working paper, Yale University.


PUBLICATION

V

Facebook drives behavior of passive households in stock markets

Siikanen, M., Baltakys, K., Kanniainen, J., Vatrapu, R., Mukkamala, R. and Hussain, A.

Finance Research Letters 27 (2018), 208–213
DOI: 10.1016/j.frl.2018.03.020

Publication reprinted with the permission of the copyright holders


Finance Research Letters

Facebook drives behavior of passive households in stock markets

Milla Siikanen (a,⁎), Kęstutis Baltakys (a), Juho Kanniainen (a), Ravi Vatrapu (b,c), Raghava Mukkamala (b,c), Abid Hussain (b)

a DARE Business Data Research Group, Laboratory of Industrial and Information Management, Tampere University of Technology, Finland
b Centre for Business Data Analytics, Copenhagen Business School, Denmark
c Westerdals Oslo School of Arts, Communication and Technology, Norway

A R T I C L E I N F O

Keywords: Investor behavior; Social media; Stock markets; Investor sophistication; Decision making

JEL classification: G10; G11

A B S T R A C T

Recent studies using data on social media and stock markets have mainly focused on predicting stock returns. Instead of predicting stock price movements, we examine the relation between Facebook data and investors’ decision making in stock markets with a unique data set on investors’ transactions in Nokia. We find that the decisions to buy versus sell are associated with Facebook data especially for passive households and for nonprofit organizations. At the same time, it seems that more sophisticated investors (financial and insurance institutions) behave independently of Facebook activities.

1. Introduction

Social media sites, such as Facebook and Twitter, create various opportunities for companies to improve their internal and external communications and to collaborate and communicate with their customers, partners, and other stakeholders, such as investors. Given the importance of social media in external communications, it is not surprising that social media data have been used recently to predict real-world outcomes (see e.g. Asur and Huberman, 2010). In financial market research, numerous scholars have used Facebook data (Karabulut, 2013; Siganos et al., 2014; Bukovina et al., 2015) and data from other social media sites (Bollen et al., 2011; Zhang et al., 2011; Zheludev et al., 2014; Chen et al., 2014; Nofer and Hinz, 2015; Zhang et al., 2017; You et al., 2017).¹ The primary aim of such research has been to predict market-wide stock movements, yet there is scant research on how social media data relate to the behavior of individual investors, perhaps because of the lack of availability of investor account level data.

In this paper, we examine the extent to which investors’ trading decisions are driven by Facebook posts and activity. To this end, we use a unique investor-level shareholding registration data set that includes the trading of all Finnish investors over multiple years. In particular, given that an investor trades, we study how Facebook data relate to investors’ decisions to increase or decrease their positions. This question is addressed for different investor groups, including financial institutions, nonprofit organizations, and households, and their trades in Nokia stock. As Nokia was one of the most liquid stocks on the Finnish stock market, this unique data set has been studied in several articles,² and here we combine it with social media data. The paper by Lillo et al. (2015) is the study most closely related to ours. It also investigates the trading behavior of different investor groups in Nokia stock, but with Thomson Reuters news articles, which are not social media data per se.

https://doi.org/10.1016/j.frl.2018.03.020
Received 5 March 2018; Accepted 16 March 2018

⁎ Corresponding author. E-mail address: [email protected] (M. Siikanen).

¹ See also Bukovina (2016) for an overview of research related to a link between social media and capital markets.

² See, for example, Westerholm (2009), Tumminello et al. (2012), Lillo et al. (2015), and Ranganathan et al. (2017).


Currently, Facebook is clearly the most widely used social media platform, with 2.2 billion monthly active users worldwide (Statista, 2018). As of January 2013, social media sites such as Facebook and Twitter were used by about 45% of S&P 1500 firms to communicate formal and informal information about their business externally (Jung et al., 2017). Specifically, companies communicate both corporate disclosures and other information via social media (Zhou et al., 2014). Yang et al. (2017) show that social media, and mass media in general, influence investors’ trading decisions. Snow and Rasso (2017) argue that less sophisticated investors potentially benefit most from disclosures communicated via social media, because, on social media platforms, the information is essentially “pushed” to them, which makes this information easier to access. In addition, Snow and Rasso (2017) show that less sophisticated investors process financial information received from social media differently from information received via a company’s investor relations website.

It is important to remember that companies typically use official exchange-routed company announcements as a primary communication channel (see e.g. Jung et al., 2017), followed by other channels, including newspapers and social media.³ Additionally, communicating information via social media is voluntary, while some company announcement releases are mandatory. Furthermore, Jung et al. (2017) show that companies disseminate strategically, i.e. companies are less likely to disseminate information on Twitter when the news is bad. In this regard, we wish to determine how the investment decisions of, for example, less sophisticated and professional investors, among other investor groups, correlate with potentially biased Facebook information. We note that the relationship between Facebook data and trading can also be related to the attention-grabbing behavior of investors, especially households (see Barber and Odean, 2007).

2. Data

2.1. Shareholding registration record data

To identify the trading of different investor categories, we use shareholding registration record data including all domestic investors from June 7, 2010 to the end of 2016, obtained from Euroclear Ltd.4 Each record in the data contains detailed information about the investor and the change in his/her holdings. During our analysis period, 282,269 distinct Finnish investors traded Nokia stock. We divide them into five groups according to their sector codes: nonfinancial corporations, financial and insurance corporations, general governmental organizations, nonprofit organizations, and households. Household investors are further divided into four investor activity groups. An investor’s activity group is defined by the number of days on which the investor traded during the past eight weeks, including the analyzed week. If the number of active days in the past eight weeks is equal to 1, the investor is considered inactive; if it is between 2 and 5, the investor is passive; 6–20 means moderate; and 21–40 means active. Notably, the grouping is dynamic, as one investor might appear in several groups throughout the analysis period.

For the purposes of our analysis, we calculate the number of investors in each group who changed their holdings during a week and the number of investors who increased their holdings (bought more than sold) during that week. Table 1 gives the descriptive statistics of the investor groups and their weekly trading in our data sample. We see that financial and governmental institutions are on average the most active sector groups, whereas households and nonprofit organizations are the least active.
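The activity grouping described above can be sketched as a small function. This is only an illustration of the thresholds stated in the text; the function name and the string labels are hypothetical and not taken from the original study.

```python
def activity_group(active_days: int) -> str:
    """Map the number of trading days in the trailing 8-week window
    (40 trading days, including the analyzed week) to an activity group.

    Thresholds follow the text: 1 -> inactive, 2-5 -> passive,
    6-20 -> moderate, 21-40 -> active.
    """
    if not 1 <= active_days <= 40:
        # An investor enters the weekly sample only if he/she traded at least once.
        raise ValueError("expected between 1 and 40 active days")
    if active_days == 1:
        return "inactive"
    if active_days <= 5:
        return "passive"
    if active_days <= 20:
        return "moderate"
    return "active"
```

Because the window moves forward every week, calling the function week by week for the same investor can return different groups, which is what makes the grouping dynamic.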

2.2. Facebook data

We collect daily numbers of posts and related comments, likes, and shares from Nokia’s Facebook wall5 between June 2010 and December 2016 using the Social Data Analytics Tool (SODATO) (see Hussain et al., 2014; Hussain and Vatrapu, 2014a, 2014b). The comments, likes, and shares are always related to a specific post, i.e. the post is the main action. Therefore, we assign the numbers of comments, likes, and shares to the date of the original post, not to the date when the actual comment, like, or share was made. In effect, the numbers of comments, likes, and shares quantify the attention received by the posts released on a particular day.

We aggregate the daily Facebook data to weekly data by summing the numbers of posts, comments, likes, and shares during a week. We take the week as beginning on Saturday and ending on Friday, since trading does not occur on weekends. This way, we relate the Facebook activity on weekends to the week in which it can actually affect investors’ trading decisions. In total, our sample comprises 342 weekly observations for posts, comments, likes, and shares. Table 2 gives descriptive statistics of these time series. We can see that, on average, more than one post is made per day, and we calculate that one post received on average 274 comments, 4379 likes, and 7 shares.
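The Saturday-to-Friday aggregation can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the dictionary-based input format are assumptions made for the example, not the authors' implementation.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending_friday(day: date) -> date:
    """Return the Friday that closes the Saturday-to-Friday week containing `day`."""
    # date.weekday(): Monday=0, ..., Friday=4, Saturday=5, Sunday=6
    days_until_friday = (4 - day.weekday()) % 7
    return day + timedelta(days=days_until_friday)


def aggregate_weekly(daily_counts: dict) -> dict:
    """Sum daily counts (keyed by date) into weeks keyed by their closing Friday."""
    weekly = defaultdict(int)
    for day, count in daily_counts.items():
        weekly[week_ending_friday(day)] += count
    return dict(weekly)
```

With this keying, activity on a Saturday or Sunday is added to the week that ends on the following Friday, so weekend posts are matched with the first trading days on which investors could react to them.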

3 See Siikanen et al. (2017b,a), and references therein, for effects of company announcements in stock markets.

4 Grinblatt and Keloharju (2000, 2001); Tumminello et al. (2012); Lillo et al. (2015) and Baltakys et al. (2018) use data sets from the same source and provide descriptions of the data. However, they use data from before 2009, when all transactions were reported separately with exact trading dates. After the move to Central Counterparty Clearing in late 2009, the Euroclear research data set contains only aggregated daily trades without the actual trading dates; instead, a registration date is reported for each record. Thus, we reverse engineer the trading dates from the registration dates. We use the official T+3 settlement convention for data up to and including October 8, 2014, and T+2 afterwards (see Euroclear, 2014). Using the derived trading dates, we aggregate transactions on a weekly basis, which reduces the possible noise from inaccurate trading date derivation.

5 https://www.facebook.com/nokia.
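The trading-date derivation described in footnote 4 can be sketched as below. This is a simplified illustration under two stated assumptions: the October 8, 2014 cutoff is applied to the registration date, and business days are taken to be Monday to Friday with exchange holidays ignored. All names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the T+3 convention applies to registration dates up to and
# including this date, and T+2 applies afterwards.
T3_LAST_REGISTRATION_DATE = date(2014, 10, 8)


def subtract_business_days(day: date, n: int) -> date:
    """Step back `n` business days (Mon-Fri). Exchange holidays are ignored here."""
    current = day
    while n > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            n -= 1
    return current


def derive_trading_date(registration_date: date) -> date:
    """Estimate the trading date from the registration (settlement) date."""
    lag = 3 if registration_date <= T3_LAST_REGISTRATION_DATE else 2
    return subtract_business_days(registration_date, lag)
```

A production version would need a holiday calendar for the exchange; the weekly aggregation used in the paper limits the impact of any remaining date errors.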


2.3. Company announcement data

The announcement data is collected from NASDAQ OMX Nordic’s website.6 The data set includes all the announcements that Nokia filed with Nasdaq between June 2010 and December 2016. Altogether, we have 507 company announcements in the sample. We aggregate the announcement data to weekly data by summing the number of announcements from Saturday to Friday, i.e. in the same way as the Facebook data. In the regressions, we use a dummy variable to indicate whether there was at least one announcement release during a week. Our sample includes 187 weeks with at least one announcement release (out of 342 weeks in total).

2.4. Weekly return data

The daily adjusted closing price data used to calculate the returns is collected from NASDAQ OMX Nordic’s website.7 For each week, we calculate the log return as $\mathrm{Ret}_t = \ln[P_t / P_{t-1}]$, where $P_t$ is the closing price on the last trading day of week $t$ (usually Friday), and $P_{t-1}$ is the closing price on the last trading day of the previous week $t-1$ (usually the previous week’s Friday). The average weekly return for Nokia during the sample period was −0.16%.
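As a small worked illustration of the return definition, the sketch below computes weekly log returns from a list of week-end closing prices. The function name and the input format are assumptions for the example, and the prices in the usage note are made-up numbers, not Nokia data.

```python
import math


def weekly_log_returns(closing_prices):
    """Log returns ln(P_t / P_{t-1}) from consecutive week-end closing prices."""
    return [
        math.log(current / previous)
        for previous, current in zip(closing_prices, closing_prices[1:])
    ]
```

For example, `weekly_log_returns([100.0, 110.0, 99.0])` yields two returns whose sum equals `math.log(99.0 / 100.0)`, since log returns are additive over time.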

3. Framework of the empirical analysis

Our analysis is based on logistic regressions to explain how Facebook activity relates to an increase versus a decrease in Nokia shares in investors’ portfolios.8 To identify the groups of investors whose trading behavior is related to Facebook data, we run separate regressions for each investor group with each Facebook variable.

Table 1. Descriptive statistics on investor groups. N gives the total number of investors per group. Mean, median and standard deviation (st.Dev) relate to the weekly observations on numbers of investors in each group that changed their net holdings during a week. In Panel B, household investors are categorized into activeness groups on the basis of their trading in the past eight weeks (40 trading days).

Panel A: Investor categories

Sector          N          Mean (% of all)   Median   st.Dev
Companies       12,213     271 (2.2%)        230      166
Financial       427        28 (6.6%)         27       9
Governmental    89         7 (7.9%)          7        4
Nonprofit       1177       18 (1.5%)         16       12
Households      268,363    4640 (1.7%)       3694     3179
Total           282,269

Panel B: Activity groups of household investors

Activeness; # of active days   N          Mean (% of all)   Median   st.Dev
Active; (20, 40]               1228       54 (4.4%)         51       22
Moderate; (5, 20]              16,019     502 (3.1%)        450      227
Passive; (1, 5]                120,906    1856 (1.5%)       1402     1422
Inactive; 1                    264,942    2228 (0.8%)       1670     1897

Table 2. Descriptive statistics on Facebook data. N gives the total number of each Facebook activity in our sample. Mean, median and standard deviation (st.Dev) relate to the weekly observations on numbers of each Facebook activity.

Activity   N            Mean     Median   st.Dev
Post       2906         8        8        6
Comment    797,586      2332     1585     2808
Like       12,725,171   37,208   11,977   43,500
Share      919,380      2688     461      4525

6 http://www.nasdaqomxnordic.com/news/companynews, see the page also for detailed information.

7 http://www.nasdaqomxnordic.com/shares/microsite?Instrument=HEX24311.

8 Another option would be (instead of restricting the analysis to a binary outcome) to use linear regressions with a continuous dependent variable (i.e. how much an investor changed the position). However, in order to use a continuous dependent variable, proportional changes in investors’ positions would have to be calculated, which, in turn, requires information on investors’ holdings. In contrast to changes in holdings, the levels of holdings were not accurately available. The use of “changes in holdings” as a non-proportional variable is problematic, because investors trade very different amounts of shares. These problems are addressed by using logistic regression.


The dependent variable in our regressions is a dummy variable ($D_t^{\mathrm{increased}}$) with value 1 if an investor increased his/her holdings in Nokia stock during a given week (bought more than sold) and 0 if the investor decreased the holdings. In a given week, only investors whose net position in Nokia changed are included. The explanatory variable of main interest is the number of posts, comments, likes, or shares, depending on the regression (FB). We control for company announcement releases with a company announcement dummy ($\mathrm{NEWS}_t$), which is 1 if there was an announcement released during week $t$ and 0 otherwise. Additionally, we use the number of investors in the group who increased their holdings during the previous week, scaled by the total number of investors who changed their holdings during the previous week. This is defined as follows:

$$\mathrm{scD}_{t-1}^{\mathrm{increased}} = \frac{1}{n_{t-1}} \sum_{i=1}^{n_{t-1}} D_{i,t-1}^{\mathrm{increased}}$$

where $n_{t-1}$ is the number of investors who changed (increased or decreased) their holdings in Nokia during week $t-1$. We also add control variables for the return in the present week ($\mathrm{Ret}_t$) and the previous week ($\mathrm{Ret}_{t-1}$). Lastly, we include monthly ($M_t$) and yearly ($Y_t$) dummy variables. The monthly dummies control for potential yearly seasonality in trading (for example, realizing losses in December for tax purposes, see e.g. Grinblatt and Keloharju, 2001), and the yearly dummies accommodate, for example, possible changes due to the abandonment of Nokia’s mobile business (in 2014, Nokia’s mobile business was acquired by Microsoft, changing the focus of the company to a telecommunications infrastructure business). To summarize, the regressions we run are of the following form:

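$$g\left(D_t^{\mathrm{increased}}\right) = \alpha_1 + \alpha_2 \cdot \mathrm{FB}_t + \alpha_3 \cdot \mathrm{NEWS}_t + \alpha_4 \cdot \mathrm{scD}_{t-1}^{\mathrm{increased}} + \alpha_5 \cdot \mathrm{Ret}_t + \alpha_6 \cdot \mathrm{Ret}_{t-1} + \sum_{j=1}^{11} \alpha_{6+j} \cdot M_j + \sum_{j=1}^{6} \alpha_{17+j} \cdot Y_j \qquad (1)$$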

where g is the logit function.
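The building blocks of Eq. (1) can be illustrated with a short sketch: the scaled share of investors who increased their holdings, the logit link, and the probability implied by a linear predictor. The function names and input formats are hypothetical, and the sketch does not estimate the coefficients; it only shows how the quantities defined above fit together.

```python
import math


def scaled_increased(net_changes):
    """Share of position-changing investors who increased their holdings.

    `net_changes` holds each investor's net change in shares over one week.
    Investors with zero net change are excluded, as in the text.
    """
    changed = [c for c in net_changes if c != 0]
    if not changed:
        raise ValueError("no investor changed position this week")
    increased = sum(1 for c in changed if c > 0)  # sum of the dummies D_i
    return increased / len(changed)


def logit(p):
    """Logit link g(p) = ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Inverse of the logit link, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def buy_probability(alphas, regressors):
    """Probability of an increase implied by a linear predictor as in Eq. (1).

    alphas[0] is the intercept; `regressors` align with alphas[1:].
    """
    linear = alphas[0] + sum(a * x for a, x in zip(alphas[1:], regressors))
    return inverse_logit(linear)
```

In practice the coefficients would be estimated with a statistics package by maximum likelihood; the sketch only makes explicit that a positive coefficient raises the modeled probability of buying rather than selling.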

4. Results

Panel A in Table 3 shows that for households and nonprofit institutions, all the regression estimates are statistically significant. The results indicate that the decisions of investors in these groups to buy vs. sell have a clear association with the Facebook data. For nonprofit institutions, the economic significance is relatively high: the odds ratios of a nonprofit institution buying rather than selling range from 1.111 to 1.212 for a one standard deviation increase in Facebook activity. For financial institutions, Panel A in Table 3 shows no association between the buy vs. sell decisions and the Facebook data. The results for companies and governmental institutions fall between those of financial institutions and those of households and nonprofit institutions, as half or fewer of the estimates are statistically significant.

To take a closer look at the effect of Facebook on the trading of households, Panel B in Table 3 presents the estimated regression results for individual investors in different activity groups. We observe that, in general, the more active a household is, the weaker the association between Facebook data and buying/selling behavior. The odds ratios for passive and inactive investors are more modest than those of nonprofit institutions, though for posts they are still relatively high (1.088 and 1.072). For brevity, we do not report the regression estimates for the intercept and control variables here, but they are available in the Online appendix. In general, most of the estimates for control variables are statistically significant.

Grinblatt and Keloharju (2000) argue that, roughly speaking, finance and insurance institutions, as well as companies, can be viewed as the most sophisticated investor groups, as they generally take larger positions, have more resources to spend on research, and in many cases view investment as a full-time career. In light of this, our findings indicate that more sophisticated investors are more independent of Facebook activities, as there is clearly no association between Facebook activities and the decisions of financial institutions. Assuming that an investor’s activeness is related to his/her sophistication, our findings on household activity groups support the result that more sophisticated investors behave more independently of Facebook data.

Facebook can be seen as a secondary information channel compared to first-hand official company announcements published on the exchange, and companies are likely to strategically select the information disseminated on Facebook (Jung et al., 2017). Nonprofit organizations and households, as arguably less sophisticated investors (Grinblatt and Keloharju, 2000), may allow their trading decisions to be affected by Facebook posts and activity, especially if they have no access to professional data sources. In line with this view, Ammann and Schaub (2017) find that the trading decisions of unsophisticated investors are affected by postings that do not contain value-relevant information on a social trading platform.

As our question is whether the decisions of different investors are associated with the Facebook data, we are mostly interested in whether the regression estimates for the Facebook variables are statistically and economically significant, while the signs of the coefficients are not the main focus.9 However, a few remarks on the signs of the estimates in Table 3 are in order. In Panel A, the signs for posts, comments, and likes are consistently positive, except comments for households. The signs for shares are both positive (governmental and nonprofit) and negative (companies, financial, households), though not all of them are statistically significant, which can explain
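As a worked check of how the reported odds ratios relate to the coefficients, the sketch below assumes that an odds ratio for a one standard deviation change is computed as exp(coefficient × standard deviation), using the weekly standard deviations from Table 2. This formula is an assumption about the computation rather than something stated in the text, but it matches several reported values up to rounding.

```python
import math


def odds_ratio_per_sd(coefficient: float, std_dev: float) -> float:
    """Odds ratio implied by a logit coefficient for a one-standard-deviation
    increase in the explanatory variable (assumed formula: exp(beta * sd))."""
    return math.exp(coefficient * std_dev)


# Inputs taken from Tables 2 and 3 of the text.
governmental_comments = odds_ratio_per_sd(5.45e-05, 2808)  # reported as {1.165}
nonprofit_shares = odds_ratio_per_sd(3.03e-05, 4525)       # reported as {1.147}
households_shares = odds_ratio_per_sd(-9.04e-06, 4525)     # reported as {0.960}
```

An odds ratio above 1 means that higher Facebook activity is associated with higher odds of buying rather than selling, and a value below 1 means the opposite.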

9 The number of data points in the regression analysis is 332–341, which does not automatically lead to significant estimates as very large data samples do.


the variation. Panel B with activity groups reports positive estimates for posts, but there is more variation for comments, likes, and shares, as passive and inactive investors have negative estimates. Looking deeper into the reasons for these findings is outside the scope of this paper and is left for future research, as it would require semantic analysis.10

5. Summary and conclusion

This paper gives the first empirical evidence that Facebook activities affect the trading of different investors differently. We provide evidence that the decisions of arguably less sophisticated investors (that is, households and nonprofit organizations) to increase or decrease shareholdings are clearly associated with Facebook data. At the same time, the decisions of financial institutions, which are likely to be among the most sophisticated investors in the market, are not associated with Facebook activity. Moreover, less active households’ decisions are related to Facebook, while the decisions of more active ones are not, which gives additional evidence that the less sophisticated the investor, the more closely related the behavior is to Facebook. Given that Facebook is not a regulated information channel compared to first-hand official exchange releases, companies are likely to strategically select what information to disseminate on Facebook (Jung et al., 2017). This suggests that less sophisticated investors, who may not have access to professional sources for financial data and news, may be driven by biased information.

In future research, we plan to perform sentiment analysis on the posts and comments to give a more comprehensive

Table 3. Regression estimates: Trading of investor groups and Facebook data. The estimates related to the Facebook variables of the logistic regressions described in Section 3 (Eq. (1)) for all the investor categories. The dependent variable is a dummy variable taking the value 1 if an investor increased his/her holdings during the week, and 0 if the investor decreased the holdings. In addition to the Facebook-related variables (for which we report the estimates here), we control for company announcement releases, the number of investors who changed their position during the previous week (scaled), and the current and previous weeks’ returns, and we include monthly and yearly dummies. The regression estimates for control variables (omitted here) are available in the Online appendix. In Panel B, household investors are categorized into activeness groups on the basis of their trading in the past eight weeks (40 trading days). The number of observations (weeks in the analysis) is 341 for all regressions except the governmental group, for which it is 332. p-values are given in parentheses (), and odds ratios (ORs) are given in curly brackets {}. ORs are calculated on the basis of a one standard deviation change in the explanatory variable.

Panel A: Investor categories

                Posts          Comments        Likes           Shares
Companies       0.011***       5.55E−06        7.61E−07**      −6.40E−07
                (3.71E−12)     (0.098)         (6.44E−03)      (0.775)
                {1.064}        {1.016}         {1.034}         {0.997}

Financial       5.92E−03       1.25E−05        9.42E−07        −1.94E−06
                (0.185)        (0.197)         (0.226)         (0.758)
                {1.034}        {1.036}         {1.042}         {0.991}

Governmental    0.015          5.45E−05**      2.62E−06        2.19E−05
                (0.091)        (7.20E−03)      (0.112)         (0.087)
                {1.086}        {1.165}         {1.121}         {1.104}

Nonprofit       0.033***       3.76E−05**      4.43E−06***     3.03E−05**
                (2.27E−07)     (4.79E−03)      (2.23E−04)      (1.84E−03)
                {1.203}        {1.111}         {1.212}         {1.147}

Households      0.011***       −4.46E−06***    1.92E−07**      −9.04E−06***
                (5.74E−83)     (7.64E−07)      (6.89E−03)      (1.23E−41)
                {1.064}        {0.988}         {1.008}         {0.960}

Panel B: Activity groups of household investors

                Posts          Comments        Likes           Shares
Active          2.41E−04       −5.82E−06       −9.74E−07       −1.36E−05**
                (0.942)        (0.386)         (0.065)         (1.32E−03)
                {1.001}        {0.984}         {0.959}         {0.940}

Moderate        1.85E−03       4.11E−06        2.67E−07        −1.71E−06
                (0.083)        (0.071)         (0.140)         (0.225)
                {1.011}        {1.012}         {1.012}         {0.992}

Passive         0.012***       −9.33E−06***    −7.33E−07***    −1.60E−05***
                (8.55E−55)     (2.37E−10)      (4.53E−10)      (1.73E−49)
                {1.072}        {0.974}         {0.969}         {0.930}

Inactive        0.015***       1.01E−05***     1.84E−06***     1.09E−07
                (5.12E−67)     (5.19E−12)      (1.24E−42)      (0.911)
                {1.088}        {1.029}         {1.083}         {1.000}

***p < 0.001; **p < 0.01; *p < 0.05

10 Additionally, one could consider whether the observed associations represent reverse causality, so that investors are not reacting to social media posts but companies are posting on Facebook in response to changes in investment behavior. However, reverse causality seems unlikely, because the information about the numbers of traders changing their position is not public.


picture of the reactions of different investors to Facebook activities. Concentrating only on Nokia may introduce some investor clientele bias, since the investors interested in Nokia may in general be more social media and technology savvy and may follow the posts because of their inclination towards technology. At this point, we were only able to collect the data for Nokia, but in future research we plan to extend the sample to a wider variety of stocks.

Acknowledgments

We want to thank Hannu Kärkkäinen and Jari Jussila for their valuable efforts to enable this project. The research leading to these results received funding from Tampere University of Technology through the Networked Big Data Science and Engineering Center and TUT’s doctoral school. This project also received funding from the European Union’s Horizon 2020 research and innovation program under Marie Sklodowska-Curie grant agreement no. 675044. The first author is grateful for the grants received from the KAUTE Foundation and the Nordea Bank Foundation sr. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Supplementary material

Online appendix associated with this article can be found, in the online version, at doi:10.1016/j.frl.2018.03.020.

References

Ammann, M., Schaub, N., 2017. The impact of internet postings on individual investors. Proceedings American Finance Association 2018 Annual Meeting.
Asur, S., Huberman, B.A., 2010. Predicting the future with social media. Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. 1. IEEE, pp. 492–499.
Baltakys, K., Kanniainen, J., Emmert-Streib, F., 2018. Multilayer aggregation with statistical validation: application to investor networks. arXiv preprint arXiv:1708.09850.
Barber, B.M., Odean, T., 2007. All that glitters: the effect of attention and news on the buying behavior of individual and institutional investors. Rev. Financ. Stud. 21 (2), 785–818.
Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. J. Comput. Sci. 2 (1), 1–8.
Bukovina, J., 2016. Social media big data and capital markets: an overview. J. Behav. Exp. Finance 11, 18–26.
Bukovina, J., et al., 2015. Sentiment and blue-chip returns. Firm level evidence from a dynamic threshold model. Technical Report. Mendel University in Brno, Faculty of Business and Economics.
Chen, H., De, P., Hu, Y., Hwang, B.-H., 2014. Wisdom of crowds: the value of stock opinions transmitted through social media. Rev. Financ. Stud. 27 (5), 1367–1403.
Euroclear, 2014. T+2 Implementation Questions & Answers. https://www.euroclear.com/dam/EFi/Campaigns/T-2_cycleQuestionsAndAnswers.pdf.
Grinblatt, M., Keloharju, M., 2000. The investment behavior and performance of various investor types: a study of Finland’s unique data set. J. Financ. Econ. 55 (1), 43–67.
Grinblatt, M., Keloharju, M., 2001. What makes investors trade? J. Finance 56 (2), 589–616.
Hussain, A., Vatrapu, R., 2014a. Social data analytics tool: design, development, and demonstrative case studies. Enterprise Distributed Object Computing Conference Workshops and Demonstrations (EDOCW), 2014 IEEE 18th International. IEEE, pp. 414–417.
Hussain, A., Vatrapu, R., 2014b. Social data analytics tool (SODATO). International Conference on Design Science Research in Information Systems. Springer, pp. 368–372.
Hussain, A., Vatrapu, R., Hardt, D., Jaffari, Z.A., 2014. Social Data Analytics Tool: A Demonstrative Case Study of Methodology and Software. Analyzing Social Media Data and Web Networks. Springer, pp. 99–118.
Jung, M.J., Naughton, J.P., Tahoun, A., Wang, C., 2017. Do firms strategically disseminate? Evidence from corporate use of social media. Acc. Rev. in press https://doi.org/10.2308/accr-51906.
Karabulut, Y., 2013. Can Facebook predict stock market activity? AFA 2013 San Diego Meetings Paper. Available at SSRN.
Lillo, F., Miccichè, S., Tumminello, M., Piilo, J., Mantegna, R.N., 2015. How news affects the trading behaviour of different categories of investors in a financial market. Quant. Finance 15 (2), 213–229.
Nofer, M., Hinz, O., 2015. Using Twitter to predict the stock market. Bus. Inf. Syst. Eng. 57 (4), 229.
Ranganathan, S., Kivelä, M., Kanniainen, J., 2017. Dynamics of investor spanning trees around dot-com bubble. arXiv preprint arXiv:1708.04430.
Siganos, A., Vagenas-Nanos, E., Verwijmeren, P., 2014. Facebook’s daily sentiment and international stock markets. J. Econ. Behav. Organ. 107, 730–743.
Siikanen, M., Kanniainen, J., Luoma, A., 2017a. What drives the sensitivity of limit order books to company announcement arrivals? Econ. Lett. 159, 65–68.
Siikanen, M., Kanniainen, J., Valli, J., 2017b. Limit order books and liquidity around scheduled and non-scheduled announcements: empirical evidence from Nasdaq Nordic. Finance Res. Lett. 21, 264–271.
Snow, N.M., Rasso, J., 2017. If the Tweet Fits: How Investors Process Financial Information Received via Social Media. SSRN working paper.
Statista, 2018. Most Famous Social Network Sites Worldwide as of January 2018. https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/.
Tumminello, M., Lillo, F., Piilo, J., Mantegna, R.N., 2012. Identification of clusters of investors from their real trading activity in a financial market. New J. Phys. 14 (1), 013041.
Westerholm, P.J., 2009. Do uninformed crossed and internalized trades tap into unexpressed liquidity? The case of Nokia. Acc. Finance 49 (2), 407–424.
Yang, W., Lin, D., Yi, Z., 2017. Impacts of the mass media effect on investor sentiment. Finance Res. Lett. 22, 1–4.
You, W., Guo, Y., Peng, C., 2017. Twitter’s daily happiness sentiment and the predictability of stock returns. Finance Res. Lett. 23, 58–64.
Zhang, X., Fuehres, H., Gloor, P.A., 2011. Predicting stock market indicators through Twitter “I hope it is not as bad as I fear”. Procedia-Soc. Behav. Sci. 26, 55–62.
Zhang, Y., An, Y., Feng, X., Jin, X., 2017. Celebrities and ordinaries in social networks: who knows more information? Finance Res. Lett. 20, 153–161.
Zheludev, I., Smith, R., Aste, T., 2014. When can social media lead financial markets? Sci. Rep. 4, 4213.
Zhou, M., Lei, L., Wang, J., Fan, W., Wang, A.G., 2014. Social media adoption and corporate disclosure. J. Inf. Syst. 29 (2), 23–50.


APPENDIX TO PUBLICATION

III

Clusters of Investors Around Initial Public Offering

Baltakiene, M., Baltakys, K., Kanniainen, J., Pedreschi, D. and Lillo, F.

arXiv preprint arXiv:1905.13508v2 (2019). Accepted for publication in the Palgrave Communications journal

Publication reprinted with the permission of the copyright holders


Appendix to Clusters of investors around Initial Public Offering

Margarita Baltakiene††, Tampere University
Kęstutis Baltakys, Tampere University
Juho Kanniainen, Tampere University
Dino Pedreschi, University of Pisa
Fabrizio Lillo, University of Bologna

September 5, 2019

The supplementary figures and tables are as follows:

• Fig. A.1 and Fig. A.2 show the investor clusters that persisted from the first into the second year after the IPO for eight securities. The clusters are shown as rectangular blocks that are composed of investors with four attributes: sector code, geographic location, gender and decade of birth (for an explanation see Fig. 2 in the Article). Statistically significant overlapping cluster pairs are connected by an arrow from the first-year to the second-year cluster.

• Fig. A.3 shows examples of security-wide investor cluster overlaps in the first (second) year after the IPO. In the figure, each cluster has a statistically significant overlap with at least one cluster in a group; however, the arrows between the clusters are omitted to simplify the visualisation.

• Fig. B.1 shows the cluster groups with overexpressed geographical location attributes.

• Tables B.1 and B.2 show the grouped clusters of overexpressed (underexpressed) attributes.

• Fig. C.1 illustrates the link validation procedure and shows the sorted p-values and the FDR correction threshold for Kemira GrowHow (FI0009012843).

∗Corresponding author
††Corresponding author


A The evolution of investor clusters on IPOs

Figure A.1. Cluster evolution for networks with the FDR validation. Partial results for ISIN FI0009013296. A cluster is represented by a rectangle. Cluster evolution is represented by the rectangles connected by downward arrows. The top rectangle is the cluster in the first year after the IPO, and the bottom rectangle is the cluster in the second year after the IPO in the same network.
Sector code: Households, Non-financial, Financial-Insurance, General-Government, Non-Profit, Rest-World.
Geographic location: Helsinki, South-West, Western-Tavastia, Central-Finland, Northern-Finland, Ostrobothnia, Rest-Uusimaa, Eastern-Tavastia, Eastern-Finland, South-East, Northern-Savonia.
Gender: Male, Female, No-Gender.
Decade: No-Age, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000.


Figure A.2. Cluster evolution for FDR networks (continued from Fig. A.1). (a) FI0009013403. (b) FI0009013429. (c) FI0009012843. (d) FI0009015309. (e) FI0009013312. (f) FI0009012413. (g) FI0009010391.


Figure A.3. Statistically significant cluster overlaps across multiple securities. (a) Overlapping clusters across multiple securities during the first year after an IPO. (b) Overlapping clusters across multiple securities during the second year after an IPO. A cluster is represented by a rectangle. Statistically overlapping cluster groups are separated by horizontal lines. Each cluster is composed of investors with four attributes: sector code, geographic location, gender and decade. See the attribute colour mapping in Fig. 2.


B Over- and underexpressed attributes in groups of statistically similar clusters

Figure B.1. Network of investor clusters with overexpressed attributes. On the left-hand side are the clusters observed in the first year after the respective IPOs and on the right-hand side those observed in the second year. Investor cluster nodes are connected with continuous links if they share a statistically significant number of individual investors. Node colours identify the overexpressed geographic location within clusters. Geographic location: Helsinki, South-West, Western-Tavastia, Central-Finland, Northern-Finland, Ostrobothnia, Rest-Uusimaa, Eastern-Tavastia, Eastern-Finland, South-East.

Table B.1. Overexpressed attributes in clusters’ groups. Here, the column 'Group' is a conventional name for a group of statistically similar clusters with overexpressed attributes, where the left (right) side 'Y1' ('Y2') corresponds to the first (second) year after the IPO. 'O-expr.' is the number of ISINs with overexpressed attributes in the same group, where the number of clusters with overexpressed attributes is given in parentheses (). 'Attribute' is the name of the overexpressed attribute and '# Attr.' is the number of ISINs in a group that overexpress this attribute, and the number of clusters that overexpress the attribute is given in parentheses ().

Group | Y1: O-expr., Attribute, # Attr. | Y2: O-expr., Attribute, # Attr.

1 24 (75)

General-Government 20 (32)

25 (83)

General-Government 23 (42)No-Age 16 (30) No-Gender 19 (39)No-Gender 15 (29) No-Age 18 (39)South-West 10 (15) Financial-Insurance 13 (22)Financial-Insurance 8 (12) Non-Profit 13 (15)Non-Profit 8 (9) South-West 8 (11)Ostrobothnia 2 (2) Helsinki 5 (6)Helsinki 2 (2) Rest-World 3 (3)Rest-Uusimaa 2 (2) Households 2 (2)Central-Finland 1 (1) Non-Financial 1 (1)South-East 1 (1) 1930 1 (1)Eastern-Tavastia 1 (1) 1940 1 (1)Non-Financial 1 (1) 1970 1 (1)Female 1 (1)1990 1 (1)

2 6 (7) Central-Finland 4 (4) 1 (1) Central-Finland 1 (1)


Non-Profit 3 (3)3 3 (3) South-West 3 (3)

4 2 (2) Financial-Insurance 1 (1)Households 1 (1)

5 2 (2)Financial-Insurance 1 (1)No-Age 1 (1)1990 1 (1)

6 2 (2) Eastern-Tavastia 2 (2)7 2 (2) Financial-Insurance 2 (2)8 1 (1) Rest-World 1 (1) 1 (1) Rest-World 1 (1)9 1 (1) South-West 1 (1) 1 (1) South-West 1 (1)10 1 (1) 1990 1 (1) 1 (1) Northern-Finland 1 (1)

11 1 (1) Households 1 (1)1980 1 (1)

12 1 (1) General-Government 1 (1)No-Age 1 (1)

13 1 (1) Non-Financial 1 (1)No-Gender 1 (1)

14 1 (1) Financial-Insurance 1 (1)15 1 (1) Rest-World 1 (1)16 1 (1) Helsinki 1 (1)17 1 (1) Northern-Finland 1 (1)18 1 (1) Northern-Finland 1 (1)19 1 (1) Northern-Finland 1 (1)20 1 (1) Northern-Finland 1 (1)21 1 (1) Eastern-Finland 1 (1)22 1 (1) Central-Finland 1 (1)23 1 (1) Eastern-Tavastia 1 (1)24 1 (1) Eastern-Tavastia 1 (1)25 1 (1) Eastern-Tavastia 1 (1)26 1 (1) Western-Tavastia 1 (1)27 1 (1) Ostrobothnia 1 (1)28 1 (1) Ostrobothnia 1 (1)29 1 (1) Rest-Uusimaa 1 (1)30 1 (1) 1930 1 (1)31 1 (1) 1950 1 (1)32 1 (1) 1980 1 (1)33 1 (1) 1990 1 (1)34 1 (1) 1990 1 (1)35 1 (1) 1990 1 (1)

36 1 (1) No-Region 1 (1)Rest-World 1 (1)

37 1 (1) Eastern-Finland 1 (1)Female 1 (1)

38 1 (1) Male 1 (1)1940 1 (1)

39 1 (1) Western-Tavastia 1 (1)1970 1 (1)

40 1 (1) No-Region 1 (1)Rest-World 1 (1)

41 1 (1) Households 1 (1)42 1 (1) Households 1 (1)43 1 (1) Households 1 (1)


44 1 (1) Non-Profit 1 (1)45 1 (1) Financial-Insurance 1 (1)46 1 (1) Rest-World 1 (1)47 1 (1) Helsinki 1 (1)48 1 (1) South-West 1 (1)49 1 (1) South-West 1 (1)50 1 (1) South-East 1 (1)51 1 (1) Ostrobothnia 1 (1)52 1 (1) Central-Finland 1 (1)53 1 (1) Central-Finland 1 (1)54 1 (1) Central-Finland 1 (1)55 1 (1) Eastern-Finland 1 (1)56 1 (1) Eastern-Tavastia 1 (1)57 1 (1) No-Region 1 (1)58 1 (1) 1910 1 (1)59 1 (1) 1920 1 (1)60 1 (1) 1970 1 (1)61 1 (1) 1970 1 (1)62 1 (1) 1980 1 (1)63 1 (1) 1980 1 (1)64 1 (1) 1980 1 (1)65 1 (1) 1980 1 (1)66 1 (1) 1990 1 (1)67 1 (1) 1990 1 (1)68 1 (1) 1990 1 (1)69 1 (1) 1990 1 (1)70 1 (1) 1990 1 (1)71 1 (1) 2000 1 (1)72 1 (1) 2000 1 (1)


Table B.2. Underexpressed attributes in clusters’ groups. Here, the column ’Group’ is a conventional name for a group of the statistically similar clusters with underexpressed attributes, where the left (right) side ’Y1’ (’Y2’) corresponds to the first (second) year after the IPO. ’U-expr.’ is the number of ISINs with underexpressed attributes in the same group, where the number of clusters with underexpressed attributes is given in parentheses (). ’Attribute’ is the name of the underexpressed attribute and ’# Attr.’ is the number of ISINs in a group that underexpress this attribute, and the number of clusters that underexpress the attribute is given in parentheses ().

Group Y1 Y2U-expr. Attribute # Attr. U-expr. Attribute # Attr.

1

14 (22) Households 11 (19)

19 (41)

Households 17 (34)Male 11 (14) Male 14 (27)

Financial-Insurance 1 (1)Helsinki 1 (1)1940 1 (1)1950 1 (1)No-Age 1 (1)

2 2 (2) Helsinki 2 (2)3 2 (2) Helsinki 2 (2)4 1 (1) Households 1 (1)5 1 (1) No-Age 1 (1)6 1 (1) Helsinki 1 (1)

7 1 (1) Helsinki 1 (1)South-West 1 (1)

8 1 (1) No-Age 1 (1)No-Gender 1 (1)

C Link validation with the FDR correction

Figure C.1. Example of link validation for Kemira GrowHow (FI0009012843), first year after IPO, log-log scale. The number of observed synchronous trade co-occurrences: 33,595. The number of statistically validated links with the FDR correction: 1,481.
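The link validation step can be illustrated with a minimal sketch of a false discovery rate filter. It assumes a Benjamini-Hochberg style step-up rule; this appendix does not state which FDR variant or significance level was used, and the function name `fdr_validate` and the p-values below are hypothetical.

```python
def fdr_validate(p_values, alpha=0.01):
    """Return indices of tests kept by a Benjamini-Hochberg step-up rule.

    Assumption: the exact FDR variant and alpha used for the figure are not
    given in this appendix; this is the textbook procedure.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda k: p_values[k])
    # Largest rank k (1-based) whose p-value satisfies p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Every test ranked at or below the cutoff is validated.
    return sorted(order[:cutoff])


# Hypothetical p-values for five candidate links between investor pairs.
links = fdr_validate([0.0001, 0.045, 0.003, 0.5, 0.004], alpha=0.05)
```

In this reading, each observed co-occurrence yields one p-value, and only the links that survive the step-up rule are counted as statistically validated.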


APPENDIX TO PUBLICATION

IV

Neighbors matter: Geographical distance and trade timing in the stock market

Baltakys, K., Baltakiene, M., Kärkkäinen, H. and Kanniainen, J.

Finance Research Letters (2018)
DOI: 10.1016/j.frl.2018.11.013

Publication reprinted with the permission of the copyright holders


Appendix to Neighbors Matter: Geographical Distance and Trade Timing in the Stock Market

Kęstutis Baltakys a,*, Margarita Baltakiene a, Hannu Kärkkäinen a, Juho Kanniainen a

a DARE Business Data Research Group, Laboratory of Industrial and Information Management, Tampere University of Technology, Finland

Abstract

The starting point of this paper is that neighboring investors may talk to each other, sharing information about their transactions in stock markets, which leads to similar trading behavior. We find that pairwise trade timing similarities between investor pairs are negatively associated with the geographical distance between the corresponding investor pairs. This suggests that local information transfer channels between neighboring individual investors are used in decision making. We also observe that differences in age and language moderate this association. The analysis is conducted using investor-level data from different regions of Finland.

Appendix A. Security info

Table A.1 summarizes trading information for the 60 securities investigated in the main analysis, together with statistics for the Nokia security, which is the most widely traded security in Finland. The table indicates the number of days each security was traded in the stock market, the number of investors that traded the security, and the number of transactions executed in each security. The last column indicates the market capitalization segment of each security.

Appendix B. Jaccard of maximum

It is very likely that an investor pair that talks about investing does not converse about all of their traded securities but about a subset of special

* Corresponding author. Email address: [email protected] (Kęstutis Baltakys)


Table A.1: Summary statistics for 61 securities. Total number of unique trading days, total number of unique investors and total number of households’ transactions per security, market capitalization. Sorted by the descending number of transactions.

Company name   ISIN   N. of days   N. of investors   N. of transactions   Market CAP

Nokia   FI0009000681   2050   228376   3536396   L
Sonera   FI0009007371   472   131547   917969   S
Fortum   FI0009007132   1053   111198   906507   L
UPM-Kymmene   FI0009005987   1713   107760   871744   L
Outokumpu   FI0009002422   2041   59437   780667   L
Metso   FI0009007835   926   62485   765665   L
Sampo A   FI0009003305   1404   87242   726077   L
Rautaruukki   FI0009003552   2046   69759   719318   L
Neste Oil   FI0009013296   605   74806   694585   L
Nordea Bank   FI0009902530   775   141033   672013   L
Elisa   FI0009007884   920   186809   628931   L
Nokian Renkaat   FI0009005318   1738   54635   584164   L
TeliaSonera   SE0000667925   605   99013   579741   L
Wartsila   FI0009003727   2039   58019   560238   L
Elektrobit   FI0009007264   1144   69344   501277   M
Raisio V   FI0009002943   2020   73390   464434   M
YIT   FI0009800643   1813   48568   449190   L
F-Secure   FI0009801310   828   65441   422062   M
Stora Enso R   FI0009005961   1685   46645   419219   L
Metsa Board B   FI0009000665   2041   51764   416560   L
Kesko B   FI0009000202   2048   51654   413323   L
Comptel   FI0009008221   806   59629   371631   M
Tieto   FI0009000277   1986   37377   353656   L
Konecranes   FI0009005870   1596   23203   345690   M
Pohjola Pankki A   FI0009003222   1839   46322   337004   L
Elcoteq   FI0009006738   1341   38832   323153   M
Perlos   FI0009007819   503   39257   322043   M
Outotec   FI0009014575   604   25337   314747   M
Uponor   FI0009002158   1928   32871   272479   L
Innofactor   FI0009007637   998   38877   260470   S
Kemira   FI0009004824   1998   39994   256723   L
Orion B   FI0009800346   1439   40628   230284   S
Huhtamaki   FI0009000459   2018   35251   222599   M
Stonesoft   FI0009801302   983   25053   185230   S
Pohjola-Yhtyma   FI0009000145   1425   35360   182511   S
GeoSentric   FI0009004204   1999   19626   182506   S
Saunalahti Group   FI0009008569   118   33931   173883   S
Aldata Solution   FI0009007918   854   20162   164891   M
Ramirent   FI0009007066   960   19006   163133   M
Cramo   FI0009900476   1113   20731   156256   M
Sponda   FI0009006829   1170   17427   140819   M
Cencorp   FI0009006951   1179   19684   138648   S
Teleste   FI0009007728   1001   19795   133379   M
PKC Group   FI0009006381   1502   21451   130566   M
Eimo   FI0009007553   386   21856   130071   S
Soon Communications   FI0009006787   639   52622   126486   S
Finnair   FI0009003230   2029   23038   118105   M
Basware   FI0009008403   770   41462   116630   M
Tecnotree   FI0009010227   631   14595   113816   S
HKScan   FI0009006308   1289   18798   106197   M
Biotie Therapies   FI0009011571   631   11827   103756   S
Dovre Group   FI0009008098   854   11511   98824   S
eQ   FI0009008676   225   18577   94052   S
Afarak Group   FI0009800098   858   12680   92602   S
SSH Comm. Security   FI0009008270   609   14899   89332   S
Kemira GrowHow   FI0009012843   179   23018   88931   M
Talvivaara   FI0009014716   387   18576   88037   S
Efore   FI0009900054   1751   11834   87801   S
Trainers’ House   FI0009008122   760   20169   83877   S
Fiskars   FI0009000400   1745   17223   81552   M
Marimekko   FI0009007660   983   15441   76760   S

interest companies. To capture this relationship, instead of measuring the similarity over all securities, we estimate the trade timing similarity to be


the maximum of the similarities observed for different securities:

\[
J_{ij} = \max_{z} \frac{\sum_{d} M_{11}^{(z,d;i,j)}}{\sum_{d} \left[ M_{01}^{(z,d;i,j)} + M_{10}^{(z,d;i,j)} + M_{11}^{(z,d;i,j)} \right]} \tag{B.1}
\]

where $M_{11}^{(z,d;i,j)}$ is the total number of trading days on which both $i$ and $j$ are in the state $d \in \{b, s\}$ in security $z$, $M_{01}^{(z,d;i,j)}$ is the total number of trading days on which $j$ is in the state $d \in \{b, s\}$ in security $z$ and $i$ is not, and $M_{10}^{(z,d;i,j)}$ is the total number of trading days on which $i$ is in the state $d \in \{b, s\}$ in security $z$ and $j$ is not.
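The maximum-Jaccard similarity of equation (B.1) can be sketched in a few lines. This is a minimal illustration, assuming each investor's activity is stored as a mapping from (security, state) to the set of trading days on which the investor was in that state; the data layout, the function name `max_jaccard` and the example securities `Z1` and `Z2` are hypothetical, not taken from the paper.

```python
def max_jaccard(days_i, days_j):
    """Maximum over securities of the Jaccard similarity pooled over buy/sell.

    days_i, days_j: dict mapping (security, state) -> set of trading days,
    with state in {"b", "s"} (assumed layout, for illustration only).
    """
    securities = {z for z, _ in days_i} | {z for z, _ in days_j}
    best = 0.0
    for z in securities:
        both = 0    # sum over d of M11
        either = 0  # sum over d of (M01 + M10 + M11)
        for d in ("b", "s"):
            a = days_i.get((z, d), set())
            b = days_j.get((z, d), set())
            both += len(a & b)
            either += len(a | b)
        if either:  # skip securities neither investor traded
            best = max(best, both / either)
    return best


# Hypothetical investors; days are integer day indices.
inv_i = {("Z1", "b"): {1, 2, 3}, ("Z1", "s"): {5}, ("Z2", "b"): {1}}
inv_j = {("Z1", "b"): {2, 3, 4}, ("Z2", "b"): {1}, ("Z2", "s"): {9}}
similarity = max_jaccard(inv_i, inv_j)
```

Here security `Z1` gives 2/5 and `Z2` gives 1/2, so the pair is assigned the larger value; taking the maximum rather than pooling all securities keeps a single strongly shared security from being diluted by unrelated trading.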

Regression results for urban areas are found in table C.5, for rural areas in table C.6, and for aggregated urban and rural areas in table C.7.

Appendix C. Tables and Figures for Robustness Checks

The tables and figure for robustness checks are as follows:

• table C.2 presents the bootstrap results for four metropolitan areas, where 10% of the investor pairs are sampled 1,000 times.

• table C.3 presents the bootstrap results for 10 rural areas, where 10% of the investor pairs are sampled 1,000 times.

• table C.4 presents the results for aggregate urban and rural areas.

• table C.5 presents the regression results for four metropolitan areas where the trade timing similarity between two investors is the maximum Jaccard coefficient between those investors over 60 securities.

• table C.6 presents the regression results for 10 rural areas where the trade timing similarity between two investors is the maximum Jaccard coefficient between those investors over 60 securities.

• table C.7 presents the regression results for aggregated urban and rural areas where the trade timing similarity between two investors is the maximum Jaccard coefficient between those investors over 60 securities.

• table C.8 presents the regression results for four metropolitan areas with yearly dummies, where trade timing similarities are inferred for different years.


• table C.9 presents the regression results for four metropolitan areas with five-year period dummies, where trade timing similarities are inferred for different five-year periods.

• table C.10 presents the regression results where the top 5% of the most active investors were excluded from the analysis.

• figure C.1 presents the α1 coefficient distributions from bootstrapped across- and within-regional regression analyses for four metropolitan areas.


Table C.2: Bootstrap results for four metropolitan area linear regression estimates with control variables and moderators. Pairwise relationships are bootstrapped 1,000 times without replacement taking a sample of 10%. The coefficients come from the main regression analysis, while the percentages underneath indicate in what fraction of iterations the same sign coefficients have been observed. Coefficients are in bold if in at least 90% of the bootstrap iterations they had the same sign.

Panel A: Distance

Helsinki Tampere Turku Oulu

distance -6.23e-06 *** -8.24e-06 * 8.02e-07 -1.64e-05 ***
100.0 % 78.3 % 51.4 % 96.6 %

Panel B: Moderators

age diff. ≥ 10 2.72e-06 *** 1.49e-05 *** 1.49e-05 * 1.53e-05 ***
96.5 % 88.8 % 76.0 % 91.9 %

different language 1.49e-06 * 2.23e-05 -3.90e-05 ** 1.18e-04 ***
66.9 % 69.6 % 87.4 % 98.1 %

female-female 3.88e-06 *** 1.15e-05 -4.01e-05 ** -5.64e-06
95.9 % 61.8 % 78.5 % 56.8 %

male-female 3.43e-06 *** -7.84e-06 -2.52e-05 *** -1.36e-05 ***
98.3 % 71.6 % 87.1 % 86.6 %

Panel C: Constant and control variables

constant 0.003 *** 0.003 *** 0.003 *** 0.003 ***
100.0 % 100.0 % 100.0 % 100.0 %

age diff. ≥ 10 -9.38e-05 *** -3.69e-04 *** -2.65e-04 *** -3.08e-04 ***
100.0 % 100.0 % 97.9 % 97.7 %

different language -3.13e-04 *** -6.92e-04 *** -5.15e-05 -0.001 ***
100.0 % 98.8 % 63.6 % 98.1 %

female-female -1.20e-04 *** 6.95e-05 1.85e-04 8.27e-06
99.9 % 56.9 % 69.9 % 46.1 %

male-female -1.40e-04 *** 3.39e-05 6.60e-05 9.02e-05
100.0 % 63.7 % 67.6 % 68.7 %

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.3: Linear regression estimates for 10 rural regions with control variables and moderators. Pairwise relationships are bootstrapped 1,000 times without replacement taking a sample of 10%. The coefficients come from the main regression analysis, while the percentages underneath indicate in what fraction of iterations the same sign coefficients have been observed. Coefficients are in bold if in at least 90% of the bootstrap iterations they had the same sign.

Regions (column order): Uusimaa; Eastern Tavastia; South-Western Finland; Western Tavastia; Central Finland; South-Eastern Finland; Ostrobothnia; Northern Savonia; Eastern Finland; Northern Finland

Panel A: Distance

distance -3.84e-07 -6.64e-06 *** -8.45e-06 *** -3.00e-06 *** -2.46e-06 *** -4.54e-06 *** -9.05e-07 *** -2.67e-06 *** -2.82e-06 *** -5.71e-07 ***
67.0 % 100.0 % 100.0 % 99.8 % 100.0 % 99.9 % 95.9 % 97.1 % 100.0 % 87.8 %

Panel B: Moderators

age diff. ≥ 10 5.11e-07 1.50e-06 ** 2.57e-06 *** 9.83e-07 * 1.22e-06 *** -2.17e-07 2.51e-07 7.39e-07 6.81e-07 *** 4.24e-07 *
69.9 % 79.8 % 100.0 % 79.1 % 99.6 % 53.7 % 66.1 % 64.3 % 85.5 % 73.3 %

different language 7.64e-07 * 2.43e-06 7.85e-06 *** 1.48e-05 *** 6.77e-08 -4.09e-07 3.90e-07 * 8.49e-06 1.53e-06 -7.23e-06 **
77.8 % 71.9 % 100.0 % 99.4 % 52.5 % 54.3 % 73.7 % 80.4 % 65.6 % 90.1 %

female-female -3.60e-06 *** -8.36e-06 *** -6.26e-06 *** -7.97e-07 6.13e-07 -2.44e-06 -1.34e-06 * 1.28e-07 -1.61e-06 * 3.23e-07
92.5 % 94.0 % 100.0 % 59.6 % 66.2 % 69.3 % 75.1 % 53.3 % 74.2 % 52.0 %

male-female -1.63e-06 *** -4.63e-06 *** -2.26e-06 *** -1.56e-06 *** -6.44e-08 3.79e-07 -8.83e-07 *** -1.46e-06 * -9.24e-07 *** -4.71e-07 *
94.3 % 99.6 % 100.0 % 87.9 % 55.5 % 57.5 % 91.3 % 76.7 % 91.8 % 74.2 %

Panel C: Constant and control variables

constant 0.003 *** 0.003 *** 0.003 *** 0.003 *** 0.003 *** 0.003 *** 0.003 *** 0.003 *** 0.004 *** 0.003 ***
100.0 % 100.0 % 100.0 % 100.0 % 100.0 % 100.0 % 100.0 % 100.0 % 100.0 % 100.0 %

age diff. ≥ 10 -7.88e-05 *** -2.08e-04 *** -4.24e-04 *** -2.91e-04 *** -3.18e-04 *** 2.32e-05 -1.90e-04 *** -2.88e-04 *** -3.13e-04 *** -2.96e-04 ***
91.8 % 97.2 % 100.0 % 100.0 % 100.0 % 57.1 % 100.0 % 98.0 % 99.1 % 99.6 %

different language -1.90e-04 *** -0.002 *** -9.69e-04 *** -0.002 *** -8.74e-04 *** 2.33e-04 -1.94e-04 *** -5.43e-04 8.19e-05 9.05e-04 *
99.2 % 100.0 % 100.0 % 99.9 % 96.5 % 62.3 % 100.0 % 77.9 % 57.6 % 77.5 %

female-female -1.63e-04 ** 4.57e-04 *** 6.64e-04 *** -2.72e-04 ** -1.63e-04 * -2.62e-04 6.92e-05 2.87e-04 * 2.33e-04 -5.35e-05
85.3 % 93.4 % 99.9 % 83.7 % 76.5 % 74.4 % 64.2 % 71.5 % 67.3 % 53.8 %

male-female -1.05e-04 *** 2.17e-04 *** 1.29e-04 *** -2.14e-05 -1.12e-04 *** -1.82e-04 *** -4.59e-05 * 2.14e-04 *** 1.26e-04 ** -1.62e-05
94.4 % 97.8 % 96.9 % 57.9 % 90.7 % 84.4 % 78.7 % 89.0 % 83.6 % 55.3 %

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.4: Linear regression estimates for aggregated urban and rural regions with control variables and moderators. Pairwise relationships are estimated over 60 securities and buying and selling behaviors. Standard errors are given in parentheses () and economic significance is given in curly brackets {}. Economic significance is normed by Jaccard coefficient standard deviation.

Panel A: Distance

Urban Rural

distance -5.44e-06 *** -1.28e-06 ***
(4.47e-07) (5.37e-08)
{-0.004} {-0.006}

Panel B: Moderators

age diff. ≥ 10 3.37e-06 *** 5.89e-07 ***
(4.86e-07) (6.34e-08)
{0.002} {0.003}

different language 6.11e-07 6.06e-08
(5.95e-07) (1.00e-07)
{2.31e-04} {1.89e-04}

female-female 2.43e-06 ** -1.80e-06 ***
(7.96e-07) (1.88e-07)
{6.66e-04} {-0.002}

male-female 1.99e-06 *** -9.08e-07 ***
(4.71e-07) (7.02e-08)
{0.001} {-0.003}

Panel C: Constant and control variables

constant 0.003 *** 0.003 ***
(5.36e-06) (5.71e-06)
{0} {0}

age diff. ≥ 10 -1.24e-04 *** -2.17e-04 ***
(5.93e-06) (6.74e-06)
{-0.005} {-0.008}

different language -3.73e-04 *** -2.53e-04 ***
(8.28e-06) (1.13e-05)
{-0.009} {-0.007}

female-female -1.24e-04 *** 8.33e-05 ***
(1.20e-05) (1.98e-05)
{-0.002} {0.001}

male-female -1.36e-04 *** -2.37e-05 **
(6.01e-06) (7.43e-06)
{-0.005} {-7.57e-04}

Panel D: Sample size and Jaccard standard deviation

N 33,050,817 50,251,034
Jaccard std. dev. 0.013 0.014

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.5: Linear regression estimates for four metropolitan areas with control variables and moderators. Pairwise trade timing similarities over buying and selling behaviors are estimated to be the strongest Jaccard coefficients over one of the 60 securities. Standard errors are given in parentheses () and economic significance is given in curly brackets {}. Economic significance is normed by Jaccard coefficient standard deviation.

Panel A: Distance

Helsinki Tampere Turku Oulu

distance -5.64e-05 *** -1.44e-04 *** -1.10e-04 *** -9.68e-05 ***
(2.53e-06) (1.75e-05) (3.12e-05) (1.31e-05)
{-6.27e-04} {-6.23e-04} {-3.46e-04} {-7.39e-04}

Panel B: Moderators

age diff. ≥ 10 4.51e-05 *** 8.99e-05 *** 7.97e-05 * 9.03e-05 ***
(2.74e-06) (2.07e-05) (3.50e-05) (1.55e-05)
{4.49e-04} {4.44e-04} {2.85e-04} {7.24e-04}

different language -2.32e-05 *** 6.68e-05 -1.83e-04 ** 1.70e-04
(3.29e-06) (7.62e-05) (6.15e-05) (8.82e-05)
{-1.25e-04} {7.68e-05} {-3.19e-04} {1.93e-04}

female-female 9.12e-05 *** 1.41e-04 * -1.18e-04 -2.67e-05
(4.41e-06) (6.14e-05) (7.85e-05) (4.61e-05)
{3.53e-04} {1.99e-04} {-1.57e-04} {-5.62e-05}

male-female 6.30e-05 *** 1.94e-05 -4.99e-05 -6.61e-05 ***
(2.65e-06) (2.27e-05) (3.62e-05) (1.70e-05)
{5.25e-04} {7.78e-05} {-1.57e-04} {-4.17e-04}

Panel C: Constant and control variables

constant 0.017 *** 0.020 *** 0.019 *** 0.018 ***
(3.17e-05) (1.54e-04) (1.96e-04) (1.74e-04)
{0} {0} {0} {0}

age diff. ≥ 10 -1.85e-03 *** -3.54e-03 *** -2.68e-03 *** -2.42e-03 ***
(3.49e-05) (1.80e-04) (2.19e-04) (2.08e-04)
{-8.98e-04} {-1.72e-03} {-1.30e-03} {-1.19e-03}

different language -3.71e-04 *** -6.85e-03 *** -1.38e-03 *** -2.21e-03 *
(4.70e-05) (6.22e-04) (3.57e-04) (0.001)
{-1.27e-04} {-9.64e-04} {-4.11e-04} {-2.14e-04}

female-female -1.58e-03 *** 6.38e-04 -1.06e-03 * -1.73e-03 **
(6.98e-05) (4.85e-04) (4.98e-04) (6.24e-04)
{-3.60e-04} {1.14e-04} {-2.20e-04} {-2.68e-04}

male-female -1.29e-03 *** 1.94e-04 -9.97e-04 *** -6.64e-04 **
(3.53e-05) (1.93e-04) (2.26e-04) (2.29e-04)
{-6.17e-04} {8.89e-05} {-4.71e-04} {-2.93e-04}

Panel D: Sample size and Jaccard standard deviation

N 26,953,337 3,039,121 1,805,007 1,253,352
Jaccard std. dev. 0.071 0.076 0.072 0.065

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.6: Linear regression estimates for 10 rural regions with control variables and moderators. Pairwise trade timing similarities over buying and selling behaviors are estimated to be the strongest Jaccard coefficients over one of the 60 securities. Standard errors are given in parentheses () and economic significance is given in curly brackets {}. Economic significance is normed by Jaccard coefficient standard deviation.

Regions (column order): Uusimaa; Eastern Tavastia; South-Western Finland; Western Tavastia; Central Finland; South-Eastern Finland; Ostrobothnia; Northern Savonia; Eastern Finland; Northern Finland

Panel A: Distance

distance -1.26e-06 -4.23e-05 *** -3.67e-05 *** -1.07e-05 *** -1.52e-05 *** -9.86e-06 *** -7.84e-06 *** 2.30e-07 -1.03e-05 *** 3.55e-07
(1.24e-06) (2.19e-06) (7.85e-07) (1.64e-06) (5.89e-07) (2.15e-06) (9.15e-07) (1.98e-06) (6.79e-07) (8.04e-07)
{-4.68e-05} {-1.25e-03} {-2.00e-03} {-3.81e-04} {-1.30e-03} {-3.80e-04} {-3.36e-04} {9.69e-06} {-1.11e-03} {3.31e-05}

Panel B: Moderators

age diff. ≥ 10 7.69e-07 1.09e-05 *** 1.11e-05 *** 4.52e-07 6.43e-06 *** -2.47e-06 -1.54e-06 -7.23e-06 ** 2.67e-06 ** 4.31e-08
(1.38e-06) (2.59e-06) (8.18e-07) (1.96e-06) (7.12e-07) (2.60e-06) (9.98e-07) (2.38e-06) (8.34e-07) (9.83e-07)
{2.90e-05} {3.50e-04} {6.79e-04} {1.89e-05} {5.48e-04} {-1.10e-04} {-7.91e-05} {-3.26e-04} {3.11e-04} {4.37e-06}

different language 1.02e-05 *** -2.87e-05 * 4.62e-05 *** 8.62e-05 *** 1.64e-05 *** -7.83e-06 7.65e-06 *** 7.63e-05 *** 6.30e-06 -3.68e-05 **
(1.62e-06) (1.32e-05) (1.13e-06) (1.51e-05) (4.29e-06) (1.56e-05) (1.03e-06) (1.94e-05) (7.62e-06) (1.18e-05)
{3.57e-04} {-1.66e-04} {0.003} {4.31e-04} {1.97e-04} {-4.59e-05} {3.68e-04} {3.23e-04} {6.24e-05} {-2.52e-04}

female-female -3.11e-05 *** -4.46e-05 *** -3.07e-05 *** -6.04e-06 2.46e-06 -3.01e-05 *** -1.59e-05 *** -2.04e-05 ** -8.25e-06 ** -4.26e-06
(4.00e-06) (7.31e-06) (2.10e-06) (5.97e-06) (2.22e-06) (8.09e-06) (2.88e-06) (6.91e-06) (2.71e-06) (3.14e-06)
{-3.43e-04} {-4.26e-04} {-6.31e-04} {-7.38e-05} {5.58e-05} {-3.80e-04} {-2.48e-04} {-2.70e-04} {-2.56e-04} {-1.15e-04}

male-female -1.35e-05 *** -2.06e-05 *** -1.03e-05 *** -7.20e-06 *** -9.85e-07 -1.19e-05 *** -7.20e-06 *** -1.29e-05 *** -5.14e-06 *** -4.19e-06 ***
(1.52e-06) (2.81e-06) (8.74e-07) (2.17e-06) (7.95e-07) (2.90e-06) (1.09e-06) (2.62e-06) (9.44e-07) (1.11e-06)
{-4.15e-04} {-5.43e-04} {-5.40e-04} {-2.50e-04} {-6.61e-05} {-4.34e-04} {-3.08e-04} {-4.78e-04} {-4.76e-04} {-3.36e-04}

Panel C: Constant and control variables

constant 0.015 *** 0.019 *** 0.022 *** 0.018 *** 0.019 *** 0.018 *** 0.020 *** 0.018 *** 0.021 *** 0.017 ***
(7.43e-05) (1.20e-04) (7.54e-05) (1.22e-04) (8.32e-05) (1.70e-04) (8.21e-05) (1.55e-04) (1.36e-04) (1.40e-04)
{0} {0} {0} {0} {0} {0} {0} {0} {0} {0}

age diff. ≥ 10 -1.22e-03 *** -2.28e-03 *** -2.96e-03 *** -2.27e-03 *** -2.80e-03 *** -9.36e-04 *** -2.06e-03 *** -2.00e-03 *** -3.03e-03 *** -2.21e-03 ***
(8.73e-05) (1.42e-04) (8.60e-05) (1.46e-04) (1.01e-04) (2.05e-04) (9.06e-05) (1.85e-04) (1.67e-04) (1.71e-04)
{-6.05e-04} {-1.12e-03} {-1.46e-03} {-1.11e-03} {-1.38e-03} {-4.63e-04} {-1.01e-03} {-9.86e-04} {-1.51e-03} {-1.10e-03}

different language -8.42e-04 *** -6.35e-03 *** -5.58e-03 *** -8.85e-03 *** -9.63e-03 *** -2.55e-03 * -1.61e-03 *** -4.24e-03 *** -7.93e-04 0.007 ***
(1.27e-04) (8.20e-04) (1.53e-04) (0.001) (6.74e-04) (0.001) (9.56e-05) (0.001) (0.001) (0.002)
{-3.38e-04} {-5.87e-04} {-2.43e-03} {-5.33e-04} {-7.36e-04} {-2.06e-04} {-7.80e-04} {-2.73e-04} {-4.19e-05} {2.96e-04}

female-female -4.83e-04 8.40e-04 * 0.002 *** -2.08e-03 *** -1.92e-03 *** 5.38e-04 4.96e-04 -2.17e-04 -1.23e-03 * 0.002 **
(2.56e-04) (3.91e-04) (2.31e-04) (4.48e-04) (3.19e-04) (6.36e-04) (2.61e-04) (5.36e-04) (5.33e-04) (5.19e-04)
{-8.24e-05} {1.50e-04} {3.90e-04} {-3.38e-04} {-3.03e-04} {8.61e-05} {8.50e-05} {-3.67e-05} {-1.93e-04} {2.52e-04}

male-female -5.00e-04 *** 3.37e-04 * 5.96e-05 -5.88e-04 *** -1.19e-03 *** 2.05e-04 -3.77e-04 *** -4.18e-06 -8.63e-04 *** 5.53e-04 **
(9.64e-05) (1.53e-04) (9.28e-05) (1.62e-04) (1.13e-04) (2.28e-04) (9.90e-05) (2.03e-04) (1.88e-04) (1.90e-04)
{-2.26e-04} {1.54e-04} {2.75e-05} {-2.62e-04} {-5.25e-04} {9.13e-05} {-1.71e-04} {-1.89e-06} {-3.82e-04} {2.47e-04}

Panel D: Sample size and Jaccard standard deviation

N 5,935,341 3,154,892 11,648,346 3,983,213 5,301,202 1,922,045 12,072,544 1,858,117 2,246,499 2,128,835
Jaccard std. dev. 0.062 0.066 0.075 0.068 0.069 0.069 0.073 0.067 0.067 0.066

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.7: Linear regression estimates for aggregated urban and rural regions with control variables and moderators. Pairwise trade timing similarities over buying and selling behaviors are estimated to be the strongest Jaccard coefficients over one of the 60 securities. Standard errors are given in parentheses () and economic significance is given in curly brackets {}. Economic significance is normed by Jaccard coefficient standard deviation.

Panel A: Distance

Urban Rural

distance -6.25e-05 *** -6.46e-06 ***
(2.45e-06) (2.66e-07)
{-6.43e-04} {-4.12e-04}

Panel B: Moderators

age diff. ≥ 10 4.87e-05 *** 1.34e-06 ***
(2.67e-06) (3.14e-07)
{4.53e-04} {8.58e-05}

different language -1.88e-05 *** 1.19e-05 ***
(3.26e-06) (4.95e-07)
{-9.25e-05} {5.26e-04}

female-female 9.12e-05 *** -1.19e-05 ***
(4.36e-06) (9.32e-07)
{3.25e-04} {-2.16e-04}

male-female 6.11e-05 *** -5.97e-06 ***
(2.58e-06) (3.47e-07)
{4.74e-04} {-3.06e-04}

Panel C: Constant and control variables

constant 0.018 *** 0.018 ***
(2.94e-05) (2.82e-05)
{0} {0}

age diff. ≥ 10 -2.02e-03 *** -2.07e-03 ***
(3.25e-05) (3.34e-05)
{-9.86e-04} {-1.02e-03}

different language -7.07e-04 *** -1.47e-03 ***
(4.54e-05) (5.61e-05)
{-2.28e-04} {-5.56e-04}

female-female -1.53e-03 *** -1.74e-04
(6.61e-05) (9.81e-05)
{-3.36e-04} {-2.98e-05}

male-female -1.24e-03 *** -4.44e-04 ***
(3.29e-05) (3.68e-05)
{-5.92e-04} {-2.01e-04}

Panel D: Sample size and Jaccard standard deviation

N 33,050,817 50,251,034
Jaccard std. dev. 0.071 0.070

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.8: Linear regression estimates for four metropolitan areas with control variables and moderators when pairwise variables are formed for different years. Pairwise relationships are estimated over 60 securities and buying and selling behaviors. Standard errors are given in parentheses ().

Helsinki Tampere Turku Oulu

Panel A: Distance

distance -2.96e-05 *** -5.61e-05 *** -8.49e-05 *** -5.22e-05 ***
(2.43e-06) (1.18e-05) (2.27e-05) (9.78e-06)

Panel B: Moderators

age diff. ≥ 10 1.54e-05 *** 5.20e-05 *** 5.38e-05 * 3.58e-05 **
(2.66e-06) (1.43e-05) (2.60e-05) (1.19e-05)

different language 1.59e-05 *** 2.58e-04 *** -2.38e-04 *** 1.73e-04 *
(3.20e-06) (6.45e-05) (4.66e-05) (7.63e-05)

female-female -3.35e-06 1.13e-04 ** -1.53e-04 * -3.99e-05
(3.90e-06) (4.37e-05) (6.18e-05) (3.77e-05)

male-female 1.92e-06 -3.30e-05 * -7.99e-05 ** -7.58e-05 ***
(2.51e-06) (1.60e-05) (2.75e-05) (1.34e-05)

Panel C: Constant and control variables

constant 0.006 *** 0.009 *** 0.005 *** 0.006 **
(3.29e-04) (0.001) (0.001) (0.002)

age diff. ≥ 10 -3.60e-04 *** -8.90e-04 *** -8.27e-04 *** -9.12e-04 ***
(2.90e-05) (1.24e-04) (1.63e-04) (1.61e-04)

different language -5.03e-04 *** -0.005 *** 8.85e-04 ** -0.003 **
(3.82e-05) (5.07e-04) (2.73e-04) (8.66e-04)

female-female 0.002 *** 0.002 *** 0.002 *** 0.002 ***
(5.65e-05) (3.44e-04) (3.91e-04) (5.18e-04)

male-female 6.61e-04 *** 0.001 *** 5.51e-04 ** 0.001 ***
(2.91e-05) (1.35e-04) (1.72e-04) (1.82e-04)

year 1996 0.002 *** -0.001 0.003 * -4.17e-05
(3.81e-04) (0.002) (0.001) (0.003)

year 1997 4.34e-04 -2.96e-04 0.003 * -6.61e-05
(3.49e-04) (0.001) (0.001) (0.002)

year 1998 0.008 *** 0.004 ** 0.012 *** 0.007 **
(3.38e-04) (0.001) (0.001) (0.002)

year 1999 0.029 *** 0.018 *** 0.021 *** 0.013 ***
(3.31e-04) (0.001) (0.001) (0.002)

year 2000 0.001 *** -0.001 0.004 *** 0.003
(3.29e-04) (0.001) (0.001) (0.002)

year 2001 6.76e-04 * -1.87e-04 0.003 ** 0.003
(3.30e-04) (0.001) (0.001) (0.002)

year 2002 4.29e-04 -0.002 0.003 * 0.002
(3.31e-04) (0.001) (0.001) (0.002)

year 2003 0.007 *** 0.011 *** 0.018 *** 0.012 ***
(3.31e-04) (0.001) (0.001) (0.002)

year 2004 0.003 *** -1.63e-04 0.005 *** 6.29e-04
(3.30e-04) (0.001) (0.001) (0.002)

year 2005 0.012 *** 0.008 *** 0.014 *** 0.008 ***
(3.30e-04) (0.001) (0.001) (0.002)

year 2006 -9.89e-04 ** -0.002 0.002 * -2.80e-04
(3.30e-04) (0.001) (0.001) (0.002)

year 2007 0.015 *** 0.011 *** 0.013 *** 0.011 ***
(3.30e-04) (0.001) (0.001) (0.002)

year 2008 0.003 *** 8.57e-04 0.005 *** 0.005 *
(3.29e-04) (0.001) (0.001) (0.002)

year 2009 0.002 *** 8.02e-04 0.004 *** 0.003
(3.29e-04) (0.001) (0.001) (0.002)

Panel D: Sample size and Jaccard standard deviation

N 57,202,834 6,455,540 3,738,679 2,464,713
Jaccard std. dev. 0.080 0.078 0.079 0.072

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.9: Linear regression estimates for four metropolitan areas with control variables and moderators when pairwise variables are formed over three five-year periods. Pairwise relationships are estimated over 60 securities and buying and selling behaviors. Standard errors are given in parentheses (). Analysis is done forming pairwise variables across different 5 year periods [1994-12-31; 1999-12-31; 2004-12-31; 2009-12-31].

Helsinki Tampere Turku Oulu

Panel A: Distance

distance -1.80e-05 *** -2.69e-05 *** -3.45e-05 ** -3.44e-05 ***
(1.49e-06) (7.30e-06) (1.28e-05) (5.19e-06)

Panel B: Moderators

age diff. ≥ 10 1.05e-05 *** 3.42e-05 *** 2.87e-05 * 3.26e-05 ***
(1.63e-06) (8.73e-06) (1.45e-05) (6.24e-06)

different language 8.82e-06 *** 9.07e-05 * -8.98e-05 *** 1.33e-04 ***
(1.94e-06) (3.65e-05) (2.55e-05) (4.02e-05)

female-female -1.28e-05 *** 6.70e-05 ** 3.09e-05 -1.20e-05
(2.47e-06) (2.58e-05) (3.32e-05) (1.87e-05)

male-female -4.84e-06 ** 1.45e-05 6.31e-07 -2.79e-05 ***
(1.55e-06) (9.64e-06) (1.51e-05) (6.89e-06)

Panel C: Constant and control variables

constant 0.018 *** 0.014 *** 0.012 *** 0.012 ***
(2.44e-05) (8.20e-05) (1.01e-04) (1.01e-04)

age diff. ≥ 10 -1.46e-04 *** -5.08e-04 *** -5.14e-04 *** -7.22e-04 ***
(1.83e-05) (7.58e-05) (9.02e-05) (8.40e-05)

different language -4.80e-04 *** -0.002 *** -1.01e-04 -0.002 ***
(2.40e-05) (2.93e-04) (1.48e-04) (4.56e-04)

female-female 0.001 *** 3.69e-04 4.62e-04 * 6.24e-04 *
(3.52e-05) (2.04e-04) (2.07e-04) (2.55e-04)

male-female 4.12e-04 *** 2.54e-04 ** 2.21e-04 * 5.52e-04 ***
(1.83e-05) (8.17e-05) (9.41e-05) (9.31e-05)

period (1999; 2004] -0.015 *** -0.010 *** -0.009 *** -0.009 ***
(2.20e-05) (6.26e-05) (7.53e-05) (8.62e-05)

period (2004; 2009] -0.014 *** -0.009 *** -0.008 *** -0.008 ***
(2.16e-05) (6.04e-05) (7.30e-05) (8.42e-05)

Panel D: Sample size and Jaccard standard deviation

N 41,667,266 4,536,915 2,695,486 1,790,429
Jaccard std. dev. 0.044 0.040 0.037 0.032

*** p < 0.001; ** p < 0.01; * p < 0.05


Table C.10: Linear regression estimates for four metropolitan areas with control variables and moderators excluding 5% of most active investors. Pairwise relationships are estimated over 60 securities and buying and selling behaviors. Standard errors are given in parentheses (). The analysis excludes the top 5% of most active investors (those active on more than 300 days, counted over different securities) from the total data set. Some regions might contain exactly the same set of observations if they had no such highly active investors.

Helsinki Tampere Turku Oulu

Panel A: Distance

distance -6.73e-06 *** -9.34e-06 * -4.33e-06 -1.56e-05 ***
(4.77e-07) (3.88e-06) (6.66e-06) (3.26e-06)
-0.006 -0.003 -9.44e-04 -0.008

Panel B: Moderators

age diff. ≥ 10 2.82e-06 *** 1.68e-05 *** 2.00e-05 ** 1.48e-05 ***
(5.15e-07) (4.56e-06) (7.41e-06) (3.87e-06)
0.002 0.005 0.005 0.008

different language 2.03e-06 ** 2.39e-05 -4.03e-05 ** 1.17e-04 ***
(6.16e-07) (1.66e-05) (1.29e-05) (2.14e-05)
8.62e-04 0.002 -0.005 0.009

female-female 4.08e-06 *** 1.23e-05 -4.98e-05 ** -6.13e-06
(8.22e-07) (1.31e-05) (1.63e-05) (1.12e-05)
0.001 0.001 -0.005 -8.52e-04

male-female 3.30e-06 *** -7.58e-06 -3.12e-05 *** -1.36e-05 **
(4.97e-07) (4.96e-06) (7.63e-06) (4.21e-06)
0.002 -0.002 -0.007 -0.006

Panel C: Constant and control variables

constant 0.002 *** 0.003 *** 0.003 *** 0.003 ***
(6.06e-06) (3.42e-05) (4.14e-05) (4.34e-05)
0.00e+00 0.00e+00 0.00e+00 0.00e+00

age diff. ≥ 10 -1.13e-04 *** -4.09e-04 *** -2.95e-04 *** -3.30e-04 ***
(6.66e-06) (3.96e-05) (4.59e-05) (5.17e-05)
-0.004 -0.012 -0.010 -0.011

different language -2.89e-04 *** -8.15e-04 *** -6.50e-06 -0.001 ***
(8.93e-06) (1.36e-04) (7.40e-05) (2.52e-04)
-0.008 -0.007 -1.38e-04 -0.008

female-female -7.21e-05 *** 1.18e-04 3.33e-04 ** 1.00e-04
(1.32e-05) (1.04e-04) (1.02e-04) (1.51e-04)
-0.001 0.001 0.005 0.001

male-female -1.07e-04 *** 6.50e-05 1.55e-04 ** 1.20e-04 *
(6.73e-06) (4.22e-05) (4.72e-05) (5.65e-05)
-0.004 0.002 0.005 0.003

Panel D: Sample size and Jaccard standard deviation

N 24,388,166 2,740,940 1,629,458 1,146,145
Jaccard std. dev. 0.013 0.016 0.014 0.015

*** p < 0.001; ** p < 0.01; * p < 0.05


Figure C.1: α1 coefficient distribution from bootstrapped across- and within-regional regressions using normalized distances. 1,000 bootstrap iterations for each urban region, sampling 500,000 investor pairs within an urban region, and the same number of investor pairs where one of the investors in the pair originates from the urban area and the other comes from anywhere outside the analyzed urban region. In order for the α1 coefficients to be comparable, we normalized the distances before running the regression analyses. Kolmogorov-Smirnov two-sample test statistics and p-values: Helsinki (statistic = 0.934, p-value = 0.0), Tampere (statistic = 0.611, p-value = 2.255 × 10^-164), Turku (statistic = 0.528, p-value = 7.415 × 10^-123), Oulu (statistic = 0.531, p-value = 2.985 × 10^-124).
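The two-sample Kolmogorov-Smirnov statistic quoted in the caption is the largest vertical gap between the empirical distribution functions of the two coefficient samples. The sketch below computes only the statistic, in plain Python; it does not compute the p-value, for which a library routine such as SciPy's `scipy.stats.ks_2samp` would normally be used. The function name and the sample values are illustrative assumptions.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max gap between empirical CDFs)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every observation <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical bootstrapped coefficients: within-region vs. across-region.
within = [-6.1, -5.9, -6.4, -6.0]
across = [-0.9, -1.2, -6.05, -1.0]
gap = ks_statistic(within, across)
```

A statistic close to 1, as reported for Helsinki, means the two bootstrap distributions barely overlap.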


APPENDIX TO PUBLICATION

V

Facebook drives behavior of passive households in stock markets

Siikanen, M., Baltakys, K., Kanniainen, J., Vatrapu, R., Mukkamala, R. and Hussain, A.

Finance Research Letters 27 (2018), 208–213
DOI: 10.1016/j.frl.2018.03.020

Publication reprinted with the permission of the copyright holders


Facebook drives behavior of passive households in stock markets

Milla Siikanen*,1, Kęstutis Baltakys1, Juho Kanniainen1, Ravi Vatrapu2,3, Raghava Mukkamala2,3, and Abid Hussain2

1 DARE Business Data Research Group, Laboratory of Industrial and Information Management, Tampere University of Technology, Finland

2 Centre for Business Data Analytics, Copenhagen Business School, Denmark

3 Westerdals Oslo School of Arts, Communication and Technology, Norway

Online Appendix: Complete regression tables

For descriptions of the variables and regression equations, see Section 3, “Framework of empirical analysis”, in the paper.

Investor sectors Nonfinancial companies

1. Companies, posts
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.34255 0.055783 -6.1408 318 2.4477e-09 -0.4523 -0.2328
FB_posts 0.010967 0.0015177 7.226 318 3.7113e-12 0.0079811 0.013953
scD(t-1) 1.8742 0.057405 32.649 318 1.4251e-103 1.7613 1.9871
NEWS(T) -0.10647 0.016281 -6.5397 318 2.4586e-10 -0.13851 -0.074441
Ret(t) -5.5532 0.10637 -52.207 318 5.0366e-158 -5.7624 -5.3439
Ret(t-1) 0.17885 0.13126 1.3626 318 0.17396 -0.079387 0.43709
Y(2011) -0.30798 0.030747 -10.017 318 1.0467e-20 -0.36847 -0.24749
Y(2012) -0.32176 0.031833 -10.108 318 5.1894e-21 -0.38439 -0.25913
Y(2013) -0.73415 0.036955 -19.866 318 1.1378e-57 -0.80686 -0.66144
Y(2014) -0.64433 0.03684 -17.49 318 1.8486e-48 -0.71681 -0.57185
Y(2015) -0.61576 0.038191 -16.123 318 3.6959e-43 -0.69089 -0.54062
Y(2016) -0.025703 0.032439 -0.79233 318 0.42876 -0.089525 0.03812
M(Feb) 0.18572 0.034169 5.4354 318 1.0905e-07 0.11849 0.25294
M(Mar) 0.13361 0.036486 3.662 318 0.00029294 0.061829 0.2054
M(Apr) 0.27877 0.036435 7.6511 318 2.3957e-13 0.20709 0.35046
M(May) -0.15644 0.035718 -4.3798 318 1.6145e-05 -0.22671 -0.086166
M(Jun) 0.15413 0.036492 4.2237 318 3.1417e-05 0.082336 0.22593
M(Jul) 0.12814 0.036428 3.5177 318 0.00049872 0.056472 0.19981
M(Aug) -0.083501 0.034778 -2.401 318 0.016925 -0.15192 -0.015077
M(Sep) -0.079554 0.035298 -2.2538 318 0.024892 -0.149 -0.010106
M(Oct) 0.22586 0.033608 6.7204 318 8.388e-11 0.15974 0.29198
M(Nov) 0.065511 0.034227 1.914 318 0.056512 -0.0018281 0.13285
M(Dec) -0.14215 0.034438 -4.1277 318 4.6851e-05 -0.20991 -0.074397
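The columns of these tables are linked by simple arithmetic: tStat is Estimate divided by SE, and Lower and Upper are Estimate minus and plus a critical value times SE. The sketch below checks this for the FB_posts row of table 1. The function names are hypothetical, and the critical value 1.9675 is an assumed approximation of the two-sided 95% Student's t quantile with 318 degrees of freedom, not a number stated in the publication.

```python
def t_statistic(estimate, se):
    """t statistic of a coefficient: estimate divided by its standard error."""
    return estimate / se


def confidence_interval(estimate, se, t_crit):
    """Symmetric confidence interval: estimate -/+ t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


# Assumption: approximate 97.5% quantile of Student's t with DF = 318.
T_CRIT_318 = 1.9675

# FB_posts row of table 1 (Companies, posts): Estimate and SE.
estimate, se = 0.010967, 0.0015177

t_value = t_statistic(estimate, se)
lower, upper = confidence_interval(estimate, se, T_CRIT_318)
```

Up to rounding of the printed values, the results agree with the tStat, Lower, and Upper entries of that row. The same check applies to any other row, with DF = 309 calling for a slightly different critical value in the governmental tables.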

2. Companies, comments
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.30743 0.05575 -5.5144 318 7.253e-08 -0.41711 -0.19774
FB_comments 5.5534e-06 3.3498e-06 1.6578 318 0.098342 -1.0373e-06 1.2144e-05
scD(t-1) 1.9194 0.057063 33.636 318 9.1555e-107 1.8071 2.0317
NEWS(T) -0.11705 0.016207 -7.2219 318 3.8087e-12 -0.14893 -0.085159
Ret(t) -5.5641 0.10647 -52.259 318 3.7858e-158 -5.7735 -5.3546
Ret(t-1) 0.31624 0.12987 2.435 318 0.015441 0.060722 0.57176
Y(2011) -0.28513 0.030771 -9.266 318 2.9999e-18 -0.34567 -0.22459
Y(2012) -0.27476 0.032301 -8.5065 318 7.135e-16 -0.33831 -0.21122
Y(2013) -0.67357 0.036362 -18.524 318 1.7792e-52 -0.74511 -0.60203
Y(2014) -0.56256 0.034976 -16.084 318 5.2165e-43 -0.63137 -0.49374
Y(2015) -0.6157 0.038325 -16.065 318 6.1832e-43 -0.6911 -0.54029
Y(2016) 0.018798 0.032304 0.5819 318 0.56105 -0.044759 0.082355
M(Feb) 0.13989 0.033566 4.1677 318 3.9711e-05 0.073851 0.20593
M(Mar) 0.1065 0.036373 2.928 318 0.0036579 0.034937 0.17806
M(Apr) 0.23245 0.036001 6.4568 318 3.9981e-10 0.16162 0.30328
M(May) -0.17939 0.035568 -5.0436 318 7.6917e-07 -0.24937 -0.10941
M(Jun) 0.13576 0.036441 3.7255 318 0.00023063 0.064064 0.20746
M(Jul) 0.12108 0.036377 3.3285 318 0.00097549 0.049512 0.19265
M(Aug) -0.095455 0.0347 -2.7509 318 0.0062838 -0.16372 -0.027185
M(Sep) -0.083545 0.035273 -2.3685 318 0.018458 -0.15294 -0.014146
M(Oct) 0.2267 0.033603 6.7466 318 7.1619e-11 0.16059 0.29281
M(Nov) 0.071577 0.034108 2.0985 318 0.036647 0.0044706 0.13868
M(Dec) -0.14686 0.034319 -4.2791 318 2.4857e-05 -0.21438 -0.079335

* Corresponding author, [email protected]

3. Companies, likes
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.28819 0.055549 -5.1881 318 3.7924e-07 -0.39748 -0.1789
FB_likes 7.606e-07 2.7732e-07 2.7427 318 0.0064393 2.1499e-07 1.3062e-06
scD(t-1) 1.9119 0.057135 33.463 318 3.289e-106 1.7995 2.0243
NEWS(T) -0.11035 0.016403 -6.7272 318 8.0522e-11 -0.14262 -0.078074
Ret(t) -5.5392 0.10702 -51.761 318 5.8002e-157 -5.7498 -5.3287
Ret(t-1) 0.30788 0.12997 2.3688 318 0.018442 0.052166 0.56359
Y(2011) -0.28845 0.030686 -9.4001 318 1.1111e-18 -0.34882 -0.22807
Y(2012) -0.30968 0.035972 -8.6088 318 3.4696e-16 -0.38045 -0.2389
Y(2013) -0.73109 0.04381 -16.688 318 2.4001e-45 -0.81728 -0.64489
Y(2014) -0.61224 0.039751 -15.402 318 2.2342e-40 -0.69045 -0.53404
Y(2015) -0.63232 0.038399 -16.467 318 1.7183e-44 -0.70787 -0.55677
Y(2016) 0.0041003 0.032173 0.12745 318 0.89867 -0.059198 0.067399
M(Feb) 0.13855 0.033481 4.1383 318 4.4855e-05 0.072681 0.20442
M(Mar) 0.10029 0.036225 2.7687 318 0.0059587 0.029024 0.17157
M(Apr) 0.23194 0.035749 6.488 318 3.33e-10 0.1616 0.30227
M(May) -0.18324 0.035476 -5.1651 318 4.248e-07 -0.25303 -0.11344
M(Jun) 0.13145 0.03636 3.6153 318 0.00034863 0.059917 0.20299
M(Jul) 0.10745 0.036616 2.9344 318 0.003585 0.035406 0.17949
M(Aug) -0.10812 0.035 -3.089 318 0.0021853 -0.17698 -0.039256
M(Sep) -0.093253 0.035574 -2.6214 318 0.0091787 -0.16324 -0.023263
M(Oct) 0.21122 0.034196 6.1766 318 1.9995e-09 0.14394 0.2785
M(Nov) 0.063598 0.034218 1.8586 318 0.064005 -0.0037239 0.13092
M(Dec) -0.15842 0.034489 -4.5934 318 6.2939e-06 -0.22628 -0.090566

4. Companies, shares
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.29924 0.055651 -5.3771 318 1.4686e-07 -0.40873 -0.18975
FB_shares -6.4e-07 2.2416e-06 -0.28551 318 0.77544 -5.0502e-06 3.7702e-06
scD(t-1) 1.9216 0.057164 33.615 318 1.0717e-106 1.8091 2.034
NEWS(T) -0.11814 0.016425 -7.1925 318 4.584e-12 -0.15045 -0.085823
Ret(t) -5.5749 0.1073 -51.957 318 1.9808e-157 -5.786 -5.3638
Ret(t-1) 0.31521 0.12987 2.4271 318 0.015774 0.059697 0.57072
Y(2011) -0.27668 0.030626 -9.0342 318 1.6401e-17 -0.33693 -0.21642
Y(2012) -0.25257 0.035525 -7.1097 318 7.7103e-12 -0.32247 -0.18268
Y(2013) -0.65575 0.039594 -16.562 318 7.3876e-45 -0.73365 -0.57785
Y(2014) -0.55774 0.036032 -15.479 318 1.1299e-40 -0.62863 -0.48684
Y(2015) -0.61977 0.038476 -16.108 318 4.2226e-43 -0.69546 -0.54407
Y(2016) 0.012991 0.03221 0.40332 318 0.68698 -0.05038 0.076362
M(Feb) 0.13545 0.033473 4.0466 318 6.5306e-05 0.069596 0.20131
M(Mar) 0.10118 0.036242 2.7918 318 0.005558 0.029878 0.17249
M(Apr) 0.2233 0.035751 6.2459 318 1.3495e-09 0.15296 0.29364
M(May) -0.1834 0.0355 -5.1661 318 4.2267e-07 -0.25325 -0.11355
M(Jun) 0.1317 0.036376 3.6206 318 0.00034191 0.060134 0.20327
M(Jul) 0.11993 0.036404 3.2943 318 0.0010979 0.048303 0.19155
M(Aug) -0.094141 0.035099 -2.6821 318 0.0076977 -0.1632 -0.025085
M(Sep) -0.079099 0.035439 -2.232 318 0.026313 -0.14882 -0.0093743
M(Oct) 0.23129 0.034139 6.775 318 6.0335e-11 0.16413 0.29846
M(Nov) 0.071537 0.034159 2.0942 318 0.037031 0.0043306 0.13874
M(Dec) -0.14772 0.034442 -4.2889 318 2.3848e-05 -0.21548 -0.079954


Financial and insurance institutions

5. Financial, posts
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.27359 0.14378 -1.9028 318 0.05797 -0.55647 0.0092961
FB_posts 0.0059153 0.0044514 1.3289 318 0.18485 -0.0028426 0.014673
scD(t-1) 1.4857 0.14501 10.246 318 1.7822e-21 1.2004 1.771
NEWS(T) -0.10078 0.046347 -2.1744 318 0.03041 -0.19196 -0.0095925
Ret(t) -1.741 0.32126 -5.4192 318 1.1845e-07 -2.373 -1.1089
Ret(t-1) -1.014 0.34216 -2.9636 318 0.0032702 -1.6872 -0.34084
Y(2011) -0.26161 0.090425 -2.8931 318 0.004078 -0.43952 -0.083701
Y(2012) -0.37656 0.094684 -3.977 318 8.6461e-05 -0.56284 -0.19027
Y(2013) -0.63504 0.10044 -6.3226 318 8.7001e-10 -0.83265 -0.43743
Y(2014) -0.45838 0.098697 -4.6443 318 5.0016e-06 -0.65256 -0.2642
Y(2015) -0.37838 0.096714 -3.9123 318 0.00011182 -0.56866 -0.1881
Y(2016) -0.086654 0.096335 -0.89951 318 0.36906 -0.27619 0.10288
M(Feb) 0.040755 0.1021 0.39918 318 0.69003 -0.16012 0.24163
M(Mar) -0.021464 0.10422 -0.20594 318 0.83697 -0.22651 0.18358
M(Apr) 0.27561 0.10545 2.6137 318 0.0093828 0.068146 0.48308
M(May) -0.084239 0.10172 -0.82814 318 0.40821 -0.28437 0.11589
M(Jun) -0.11166 0.10319 -1.0821 318 0.28001 -0.31467 0.091351
M(Jul) 0.085172 0.10749 0.79238 318 0.42873 -0.12631 0.29665
M(Aug) -0.14365 0.10012 -1.4347 318 0.15235 -0.34063 0.05334
M(Sep) -0.22464 0.098681 -2.2764 318 0.023486 -0.41879 -0.030488
M(Oct) 0.20171 0.098066 2.0568 318 0.040516 0.0087666 0.39465
M(Nov) -0.0461 0.099805 -0.4619 318 0.64447 -0.24246 0.15026
M(Dec) 0.0050582 0.10184 0.049668 318 0.96042 -0.19531 0.20542

6. Financial, comments
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.25559 0.14224 -1.7969 318 0.073298 -0.53543 0.024257
FB_comments 1.2522e-05 9.6864e-06 1.2927 318 0.19705 -6.5359e-06 3.1579e-05
scD(t-1) 1.4851 0.14514 10.232 318 1.9797e-21 1.1995 1.7706
NEWS(T) -0.10288 0.046279 -2.223 318 0.026917 -0.19393 -0.011828
Ret(t) -1.7315 0.3215 -5.3857 318 1.4056e-07 -2.364 -1.099
Ret(t-1) -0.96583 0.34042 -2.8371 318 0.0048441 -1.6356 -0.29606
Y(2011) -0.2659 0.090877 -2.9259 318 0.0036817 -0.4447 -0.087102
Y(2012) -0.3836 0.096472 -3.9763 318 8.6685e-05 -0.57341 -0.1938
Y(2013) -0.63089 0.099859 -6.3178 318 8.9449e-10 -0.82735 -0.43442
Y(2014) -0.42411 0.094102 -4.5069 318 9.2562e-06 -0.60925 -0.23897
Y(2015) -0.37767 0.096808 -3.9012 318 0.00011683 -0.56814 -0.1872
Y(2016) -0.054148 0.096408 -0.56165 318 0.57475 -0.24382 0.13553
M(Feb) 0.023447 0.10081 0.23258 318 0.81624 -0.1749 0.22179
M(Mar) -0.026627 0.10389 -0.2563 318 0.79789 -0.23103 0.17778
M(Apr) 0.26566 0.10421 2.5492 318 0.011267 0.060625 0.4707
M(May) -0.089213 0.10142 -0.87963 318 0.37972 -0.28875 0.11033
M(Jun) -0.11135 0.10323 -1.0787 318 0.28152 -0.31444 0.091738
M(Jul) 0.08613 0.10748 0.80137 318 0.42352 -0.12533 0.29759
M(Aug) -0.1475 0.10002 -1.4746 318 0.1413 -0.34429 0.049295
M(Sep) -0.23391 0.098936 -2.3643 318 0.018666 -0.42856 -0.03926
M(Oct) 0.1947 0.098145 1.9838 318 0.048138 0.0016043 0.3878
M(Nov) -0.038349 0.099645 -0.38485 318 0.7006 -0.2344 0.1577
M(Dec) 0.0061782 0.10184 0.060664 318 0.95166 -0.19419 0.20655

7. Financial, likes
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.2329 0.14207 -1.6394 318 0.10212 -0.51241 0.046609
FB_likes 9.4247e-07 7.7724e-07 1.2126 318 0.22619 -5.8672e-07 2.4717e-06
scD(t-1) 1.4932 0.14464 10.323 318 9.737e-22 1.2086 1.7777
NEWS(T) -0.095804 0.04683 -2.0458 318 0.041602 -0.18794 -0.0036675
Ret(t) -1.7111 0.32258 -5.3044 318 2.1219e-07 -2.3457 -1.0764
Ret(t-1) -0.96512 0.34064 -2.8333 318 0.0049018 -1.6353 -0.29493
Y(2011) -0.25983 0.090342 -2.876 318 0.0042988 -0.43757 -0.082084
Y(2012) -0.40819 0.10564 -3.8639 318 0.00013526 -0.61604 -0.20034
Y(2013) -0.68726 0.12076 -5.6912 318 2.8659e-08 -0.92485 -0.44967
Y(2014) -0.48133 0.1074 -4.4819 318 1.0339e-05 -0.69262 -0.27003
Y(2015) -0.39996 0.096603 -4.1403 318 4.4486e-05 -0.59002 -0.2099
Y(2016) -0.07959 0.095878 -0.83012 318 0.40709 -0.26823 0.10905
M(Feb) 0.021093 0.10076 0.20934 318 0.83432 -0.17715 0.21934
M(Mar) -0.040063 0.10364 -0.38654 318 0.69935 -0.24398 0.16385
M(Apr) 0.2576 0.10352 2.4883 318 0.013347 0.053919 0.46127
M(May) -0.098709 0.10124 -0.97502 318 0.33029 -0.29789 0.10047
M(Jun) -0.12185 0.10286 -1.1846 318 0.23708 -0.32422 0.080529
M(Jul) 0.068195 0.10801 0.63137 318 0.52825 -0.14431 0.2807
M(Aug) -0.1627 0.1007 -1.6158 318 0.10714 -0.36082 0.035414
M(Sep) -0.23992 0.099478 -2.4118 318 0.016441 -0.43564 -0.044202
M(Oct) 0.17868 0.099592 1.7941 318 0.073741 -0.017261 0.37462
M(Nov) -0.048569 0.099917 -0.48609 318 0.62724 -0.24515 0.14801
M(Dec) -0.0063766 0.10175 -0.062669 318 0.95007 -0.20657 0.19381


8. Financial, shares
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.24623 0.14238 -1.7294 318 0.084709 -0.52636 0.033895
FB_shares -1.9448e-06 6.3045e-06 -0.30847 318 0.75792 -1.4348e-05 1.0459e-05
scD(t-1) 1.5077 0.14473 10.418 318 4.6489e-22 1.223 1.7925
NEWS(T) -0.10724 0.046981 -2.2826 318 0.023115 -0.19967 -0.014805
Ret(t) -1.7555 0.32294 -5.4359 318 1.0876e-07 -2.3908 -1.1201
Ret(t-1) -0.96966 0.34029 -2.8495 318 0.0046643 -1.6392 -0.30016
Y(2011) -0.24577 0.09027 -2.7226 318 0.0068347 -0.42337 -0.068166
Y(2012) -0.32812 0.10514 -3.1209 318 0.0019686 -0.53496 -0.12127
Y(2013) -0.58476 0.10875 -5.3769 318 1.4702e-07 -0.79873 -0.37079
Y(2014) -0.41073 0.097118 -4.2292 318 3.0698e-05 -0.60181 -0.21966
Y(2015) -0.38722 0.096884 -3.9967 318 7.9894e-05 -0.57783 -0.1966
Y(2016) -0.067442 0.096061 -0.70207 318 0.48315 -0.25644 0.12155
M(Feb) 0.018691 0.10077 0.18549 318 0.85297 -0.17956 0.21694
M(Mar) -0.036118 0.10365 -0.34846 318 0.72773 -0.24005 0.16781
M(Apr) 0.24376 0.10344 2.3564 318 0.019059 0.040235 0.44728
M(May) -0.096697 0.10124 -0.95513 318 0.34024 -0.29588 0.10249
M(Jun) -0.12341 0.10293 -1.199 318 0.23142 -0.32593 0.079099
M(Jul) 0.083551 0.10756 0.77675 318 0.43788 -0.12808 0.29518
M(Aug) -0.14459 0.1009 -1.433 318 0.15283 -0.3431 0.05392
M(Sep) -0.22135 0.099184 -2.2317 318 0.026331 -0.41649 -0.02621
M(Oct) 0.20587 0.099729 2.0643 318 0.039805 0.0096537 0.40208
M(Nov) -0.037633 0.099786 -0.37714 318 0.70632 -0.23396 0.15869
M(Dec) 0.00043414 0.10175 0.0042666 318 0.9966 -0.19976 0.20063


General governmental organizations

9. Governmental, posts
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.052584 0.24366 -0.21581 309 0.82928 -0.53202 0.42685
FB_posts 0.014641 0.0086466 1.6933 309 0.091412 -0.0023726 0.031655
scD(t-1) 0.70828 0.18245 3.882 309 0.00012669 0.34927 1.0673
NEWS(T) 0.20171 0.095884 2.1037 309 0.036215 0.013041 0.39038
Ret(t) -2.0893 0.67428 -3.0986 309 0.0021229 -3.4161 -0.76259
Ret(t-1) -1.9302 0.69156 -2.7911 309 0.0055802 -3.2909 -0.56943
Y(2011) -0.50268 0.16858 -2.9819 309 0.0030928 -0.83439 -0.17097
Y(2012) -0.48266 0.18034 -2.6765 309 0.0078376 -0.8375 -0.12782
Y(2013) -0.97142 0.19832 -4.8982 309 1.563e-06 -1.3617 -0.58118
Y(2014) -0.8833 0.19908 -4.4369 309 1.2709e-05 -1.275 -0.49158
Y(2015) -1.0279 0.20705 -4.9644 309 1.1414e-06 -1.4353 -0.62048
Y(2016) -0.45068 0.19325 -2.3322 309 0.020335 -0.83093 -0.070435
M(Feb) 0.062391 0.20193 0.30897 309 0.75755 -0.33494 0.45973
M(Mar) 0.26514 0.21088 1.2573 309 0.20958 -0.14979 0.68008
M(Apr) 0.46221 0.20898 2.2117 309 0.027718 0.051002 0.87343
M(May) 0.28204 0.21426 1.3164 309 0.18903 -0.13955 0.70363
M(Jun) 0.1684 0.21964 0.7667 309 0.44384 -0.26378 0.60058
M(Jul) 0.24541 0.22395 1.0958 309 0.27401 -0.19525 0.68607
M(Aug) -0.19373 0.20465 -0.94663 309 0.34457 -0.59641 0.20896
M(Sep) 0.085418 0.20959 0.40756 309 0.68388 -0.32698 0.49781
M(Oct) 0.5011 0.20099 2.4932 309 0.013184 0.10562 0.89659
M(Nov) 0.28888 0.20528 1.4072 309 0.16036 -0.11504 0.6928
M(Dec) 0.074541 0.21514 0.34648 309 0.72922 -0.34879 0.49787

10. Governmental, comments
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept -0.039507 0.23797 -0.16602 309 0.86825 -0.50774 0.42873
FB_comments 5.4546e-05 2.0163e-05 2.7053 309 0.0072031 1.4872e-05 9.4219e-05
scD(t-1) 0.67547 0.18322 3.6867 309 0.00026823 0.31496 1.036
NEWS(T) 0.19591 0.095952 2.0417 309 0.042029 0.0071042 0.38471
Ret(t) -2.0751 0.67443 -3.0768 309 0.0022793 -3.4022 -0.74806
Ret(t-1) -1.7414 0.68892 -2.5278 309 0.011977 -3.097 -0.38586
Y(2011) -0.55124 0.1702 -3.2387 309 0.0013316 -0.88614 -0.21633
Y(2012) -0.5752 0.1853 -3.1041 309 0.0020855 -0.93982 -0.21058
Y(2013) -1.0181 0.19868 -5.124 309 5.2799e-07 -1.409 -0.62711
Y(2014) -0.81809 0.19193 -4.2624 309 2.6895e-05 -1.1958 -0.44043
Y(2015) -1.0116 0.20691 -4.8892 309 1.6304e-06 -1.4188 -0.60449
Y(2016) -0.35213 0.19359 -1.8189 309 0.069895 -0.73305 0.028801
M(Feb) 0.042501 0.20016 0.21234 309 0.83198 -0.35135 0.43635
M(Mar) 0.29009 0.21112 1.374 309 0.17043 -0.12533 0.7055
M(Apr) 0.48291 0.20768 2.3252 309 0.020707 0.074261 0.89157
M(May) 0.30657 0.2145 1.4292 309 0.15395 -0.11549 0.72863
M(Jun) 0.18274 0.21982 0.83128 309 0.40646 -0.2498 0.61527
M(Jul) 0.26122 0.22415 1.1654 309 0.24477 -0.17984 0.70228
M(Aug) -0.18473 0.20467 -0.90255 309 0.36747 -0.58746 0.218
M(Sep) 0.039457 0.21048 0.18746 309 0.85142 -0.3747 0.45361
M(Oct) 0.48516 0.20125 2.4108 309 0.016503 0.089169 0.88115
M(Nov) 0.3232 0.20481 1.578 309 0.11559 -0.079809 0.72621
M(Dec) 0.1013 0.21568 0.46968 309 0.63891 -0.32309 0.5257

11. Governmental, likes
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.079835 0.2354 0.33914 309 0.73473 -0.38336 0.54303
FB_likes 2.6212e-06 1.643e-06 1.5953 309 0.11166 -6.1174e-07 5.8542e-06
scD(t-1) 0.6898 0.18289 3.7717 309 0.00019427 0.32994 1.0497
NEWS(T) 0.21615 0.096781 2.2334 309 0.026235 0.025722 0.40659
Ret(t) -2.0272 0.67495 -3.0035 309 0.002887 -3.3553 -0.69914
Ret(t-1) -1.8004 0.68907 -2.6128 309 0.0094199 -3.1562 -0.44453
Y(2011) -0.50681 0.16892 -3.0004 309 0.002916 -0.83918 -0.17444
Y(2012) -0.57798 0.20504 -2.8189 309 0.0051301 -0.98142 -0.17453
Y(2013) -1.1363 0.24587 -4.6216 309 5.5995e-06 -1.6201 -0.65254
Y(2014) -0.97138 0.22205 -4.3745 309 1.6659e-05 -1.4083 -0.53445
Y(2015) -1.0995 0.20723 -5.3056 309 2.1465e-07 -1.5072 -0.69172
Y(2016) -0.44867 0.19323 -2.322 309 0.020885 -0.82888 -0.068459
M(Feb) 0.010471 0.19905 0.052606 309 0.95808 -0.38119 0.40213
M(Mar) 0.23358 0.20918 1.1166 309 0.26502 -0.17802 0.64518
M(Apr) 0.41772 0.20497 2.0379 309 0.042408 0.014402 0.82103
M(May) 0.25218 0.21259 1.1862 309 0.23645 -0.16613 0.67048
M(Jun) 0.12845 0.21834 0.5883 309 0.55676 -0.30117 0.55807
M(Jul) 0.18769 0.22485 0.83476 309 0.4045 -0.25473 0.63012
M(Aug) -0.26145 0.20544 -1.2726 309 0.20411 -0.66569 0.14279
M(Sep) 0.026845 0.21177 0.12676 309 0.89921 -0.38985 0.44354
M(Oct) 0.43846 0.2035 2.1546 309 0.031963 0.038047 0.83887
M(Nov) 0.28485 0.20495 1.3898 309 0.16558 -0.11843 0.68813
M(Dec) 0.043126 0.21542 0.20019 309 0.84146 -0.38076 0.46701


12. Governmental, shares
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.097605 0.23615 0.41332 309 0.67966 -0.36706 0.56227
FB_shares 2.1855e-05 1.2734e-05 1.7163 309 0.087118 -3.2015e-06 4.6911e-05
scD(t-1) 0.68168 0.18309 3.7232 309 0.00023377 0.32141 1.0419
NEWS(T) 0.21933 0.096932 2.2627 309 0.024349 0.028597 0.41006
Ret(t) -1.982 0.67637 -2.9303 309 0.0036389 -3.3128 -0.65108
Ret(t-1) -1.8061 0.68895 -2.6215 309 0.0091879 -3.1617 -0.45045
Y(2011) -0.50449 0.16857 -2.9927 309 0.0029879 -0.83619 -0.1728
Y(2012) -0.57892 0.20125 -2.8766 309 0.0042994 -0.97491 -0.18292
Y(2013) -1.0697 0.21872 -4.8907 309 1.6188e-06 -1.5001 -0.63933
Y(2014) -0.88097 0.19843 -4.4396 309 1.2558e-05 -1.2714 -0.49052
Y(2015) -1.1096 0.20778 -5.3405 309 1.8001e-07 -1.5185 -0.7008
Y(2016) -0.45485 0.19347 -2.351 309 0.019349 -0.83554 -0.07417
M(Feb) 0.0039606 0.199 0.019903 309 0.98413 -0.3876 0.39552
M(Mar) 0.23816 0.20917 1.1386 309 0.25575 -0.17342 0.64974
M(Apr) 0.41816 0.2048 2.0418 309 0.042022 0.015178 0.82113
M(May) 0.25279 0.21266 1.1887 309 0.23547 -0.16565 0.67123
M(Jun) 0.13848 0.21841 0.63404 309 0.52652 -0.29128 0.56824
M(Jul) 0.21092 0.22373 0.94274 309 0.34655 -0.22931 0.65115
M(Aug) -0.2736 0.20633 -1.326 309 0.18581 -0.67958 0.13239
M(Sep) 0.029716 0.21121 0.1407 309 0.8882 -0.38587 0.4453
M(Oct) 0.43169 0.20364 2.1199 309 0.034812 0.030997 0.83238
M(Nov) 0.28837 0.20468 1.4089 309 0.15988 -0.11438 0.69112
M(Dec) 0.042815 0.21519 0.19896 309 0.84242 -0.3806 0.46623


Nonprofit institutions

13. Nonprofit, posts
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.41785 0.19479 2.1451 318 0.032701 0.034608 0.8011
FB_posts 0.032679 0.0061768 5.2907 318 2.2734e-07 0.020527 0.044832
scD(t-1) 1.3724 0.13864 9.899 318 2.5805e-20 1.0997 1.6452
NEWS(T) 0.041756 0.062358 0.66961 318 0.50359 -0.080931 0.16444
Ret(t) -0.63331 0.40943 -1.5468 318 0.1229 -1.4388 0.17222
Ret(t-1) -1.6034 0.42745 -3.751 318 0.00020925 -2.4444 -0.76239
Y(2011) -0.83456 0.12535 -6.6577 318 1.2212e-10 -1.0812 -0.58793
Y(2012) -0.94372 0.13529 -6.9758 318 1.7704e-11 -1.2099 -0.67756
Y(2013) -1.7647 0.14827 -11.902 318 2.8925e-27 -2.0565 -1.473
Y(2014) -1.4929 0.14882 -10.032 318 9.3177e-21 -1.7857 -1.2001
Y(2015) -1.3249 0.14122 -9.3822 318 1.2692e-18 -1.6028 -1.0471
Y(2016) -0.45549 0.13052 -3.4899 318 0.00055141 -0.71227 -0.1987
M(Feb) 0.0066968 0.12964 0.051658 318 0.95883 -0.24836 0.26175
M(Mar) -0.2825 0.13962 -2.0233 318 0.043875 -0.5572 -0.0078023
M(Apr) 0.31919 0.13637 2.3406 318 0.019873 0.050882 0.58749
M(May) -0.28778 0.12792 -2.2496 318 0.025157 -0.53946 -0.036095
M(Jun) -0.34654 0.14148 -2.4494 318 0.014848 -0.62489 -0.068189
M(Jul) -0.45161 0.15709 -2.8749 318 0.0043139 -0.76067 -0.14255
M(Aug) 0.0015287 0.14423 0.010599 318 0.99155 -0.28224 0.2853
M(Sep) -0.34669 0.13489 -2.5702 318 0.010619 -0.61207 -0.081301
M(Oct) 0.18621 0.13462 1.3832 318 0.16756 -0.078645 0.45106
M(Nov) -0.11959 0.12943 -0.924 318 0.35618 -0.37424 0.13505
M(Dec) -0.15748 0.13742 -1.146 318 0.25267 -0.42784 0.11289

14. Nonprofit, comments
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.50168 0.19403 2.5856 318 0.010167 0.11993 0.88342
FB_comments 3.7564e-05 1.3222e-05 2.841 318 0.0047877 1.155e-05 6.3578e-05
scD(t-1) 1.4567 0.13752 10.593 318 1.1745e-22 1.1862 1.7273
NEWS(T) 0.018372 0.062107 0.29582 318 0.76756 -0.10382 0.14056
Ret(t) -0.56132 0.40944 -1.3709 318 0.17137 -1.3669 0.24424
Ret(t-1) -1.3599 0.42627 -3.1901 318 0.0015639 -2.1985 -0.52119
Y(2011) -0.81209 0.12569 -6.461 318 3.8995e-10 -1.0594 -0.5648
Y(2012) -0.87963 0.13787 -6.3801 318 6.2433e-10 -1.1509 -0.60837
Y(2013) -1.6334 0.14549 -11.227 318 7.3382e-25 -1.9196 -1.3471
Y(2014) -1.2501 0.14 -8.9296 318 3.5002e-17 -1.5256 -0.97469
Y(2015) -1.3221 0.14157 -9.3393 318 1.7449e-18 -1.6006 -1.0436
Y(2016) -0.31492 0.13006 -2.4213 318 0.016022 -0.57081 -0.059034
M(Feb) -0.08647 0.12838 -0.67356 318 0.50108 -0.33905 0.16611
M(Mar) -0.31869 0.1397 -2.2812 318 0.023195 -0.59354 -0.043836
M(Apr) 0.24295 0.13544 1.7938 318 0.073787 -0.023512 0.50941
M(May) -0.32371 0.12677 -2.5536 318 0.011129 -0.57312 -0.074301
M(Jun) -0.35988 0.14142 -2.5448 318 0.011406 -0.63812 -0.081648
M(Jul) -0.43714 0.15673 -2.7892 318 0.0056022 -0.7455 -0.12879
M(Aug) -0.0097459 0.14385 -0.067749 318 0.94603 -0.29277 0.27328
M(Sep) -0.37549 0.13432 -2.7955 318 0.0054963 -0.63976 -0.11123
M(Oct) 0.17766 0.13405 1.3253 318 0.18603 -0.086085 0.44141
M(Nov) -0.074263 0.12806 -0.57989 318 0.5624 -0.32622 0.1777
M(Dec) -0.13406 0.13649 -0.98217 318 0.32676 -0.40259 0.13448

15. Nonprofit, likes
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.61566 0.1935 3.1817 318 0.0016086 0.23496 0.99637
FB_likes 4.4277e-06 1.1856e-06 3.7345 318 0.00022288 2.095e-06 6.7603e-06
scD(t-1) 1.4074 0.13848 10.163 318 3.3903e-21 1.1349 1.6798
NEWS(T) 0.053152 0.062844 0.84578 318 0.39831 -0.070491 0.1768
Ret(t) -0.44153 0.41229 -1.0709 318 0.28501 -1.2527 0.36962
Ret(t-1) -1.35 0.42895 -3.1471 318 0.0018048 -2.1939 -0.50603
Y(2011) -0.81176 0.12476 -6.5063 318 2.9909e-10 -1.0572 -0.56629
Y(2012) -1.0462 0.15171 -6.8961 318 2.8892e-11 -1.3447 -0.74774
Y(2013) -1.949 0.17783 -10.96 318 6.3236e-24 -2.2989 -1.5991
Y(2014) -1.5478 0.16373 -9.4532 318 7.4784e-19 -1.87 -1.2257
Y(2015) -1.4197 0.14184 -10.009 318 1.1066e-20 -1.6988 -1.1407
Y(2016) -0.40512 0.12968 -3.124 318 0.0019482 -0.66025 -0.14998
M(Feb) -0.098622 0.12755 -0.77321 318 0.43997 -0.34957 0.15232
M(Mar) -0.33273 0.13869 -2.3991 318 0.01701 -0.60559 -0.059867
M(Apr) 0.23834 0.13466 1.7699 318 0.077697 -0.026599 0.50328
M(May) -0.34163 0.12644 -2.7019 318 0.0072652 -0.59041 -0.092861
M(Jun) -0.38387 0.14074 -2.7275 318 0.006736 -0.66077 -0.10697
M(Jul) -0.51121 0.15695 -3.2572 318 0.0012467 -0.82 -0.20242
M(Aug) -0.078559 0.14468 -0.543 318 0.58751 -0.36321 0.20609
M(Sep) -0.40045 0.13514 -2.9632 318 0.0032745 -0.66634 -0.13457
M(Oct) 0.10546 0.13552 0.77817 318 0.43705 -0.16118 0.3721
M(Nov) -0.10993 0.12838 -0.85626 318 0.3925 -0.3625 0.14265
M(Dec) -0.18347 0.13684 -1.3408 318 0.18095 -0.4527 0.085753


16. Nonprofit, shares
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.63207 0.19427 3.2535 318 0.0012624 0.24985 1.0143
FB_shares 3.0261e-05 9.6311e-06 3.142 318 0.001836 1.1312e-05 4.921e-05
scD(t-1) 1.4068 0.13881 10.135 318 4.2169e-21 1.1337 1.6799
NEWS(T) 0.049813 0.062948 0.79133 318 0.42934 -0.074035 0.17366
Ret(t) -0.40964 0.41327 -0.9912 318 0.32234 -1.2227 0.40346
Ret(t-1) -1.371 0.42754 -3.2068 318 0.0014788 -2.2122 -0.52986
Y(2011) -0.79589 0.12451 -6.3924 318 5.8153e-10 -1.0408 -0.55093
Y(2012) -0.98982 0.1498 -6.6076 318 1.6453e-10 -1.2846 -0.6951
Y(2013) -1.7965 0.16222 -11.075 318 2.51e-24 -2.1157 -1.4774
Y(2014) -1.3584 0.14566 -9.3263 318 1.921e-18 -1.645 -1.0719
Y(2015) -1.4226 0.1423 -9.9974 318 1.2146e-20 -1.7026 -1.1427
Y(2016) -0.40508 0.1299 -3.1184 318 0.0019849 -0.66066 -0.14951
M(Feb) -0.11333 0.12745 -0.88919 318 0.37457 -0.36409 0.13743
M(Mar) -0.33647 0.13888 -2.4228 318 0.01596 -0.60971 -0.063235
M(Apr) 0.23024 0.13454 1.7113 318 0.087994 -0.034457 0.49494
M(May) -0.33348 0.12646 -2.637 318 0.0087739 -0.58228 -0.084676
M(Jun) -0.37494 0.14092 -2.6606 318 0.0081952 -0.6522 -0.097684
M(Jul) -0.47441 0.15667 -3.028 318 0.0026628 -0.78266 -0.16617
M(Aug) -0.076577 0.14507 -0.52786 318 0.59796 -0.36199 0.20884
M(Sep) -0.39276 0.13482 -2.9131 318 0.0038314 -0.65802 -0.1275
M(Oct) 0.11811 0.13554 0.87137 318 0.38421 -0.14856 0.38478
M(Nov) -0.09836 0.12812 -0.7677 318 0.44324 -0.35044 0.15372
M(Dec) -0.17922 0.13638 -1.3142 318 0.18974 -0.44754 0.089095


Households

17. Households, posts
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.036811 0.014795 2.4881 318 0.013354 0.007703 0.065918
FB_posts 0.010896 0.00040914 26.632 318 5.7387e-83 0.010091 0.011701
scD(t-1) 1.5001 0.014255 105.24 318 3.4599e-249 1.4721 1.5281
NEWS(T) -0.039957 0.0041561 -9.6142 318 2.2392e-19 -0.048134 -0.03178
Ret(t) -5.2764 0.026473 -199.31 318 0 -5.3285 -5.2243
Ret(t-1) -0.48546 0.031587 -15.369 318 2.9861e-40 -0.54761 -0.42332
Y(2011) -0.21663 0.0081581 -26.554 318 1.0907e-82 -0.23268 -0.20058
Y(2012) -0.20003 0.0083398 -23.984 318 2.6671e-73 -0.21643 -0.18362
Y(2013) -0.75546 0.0094403 -80.025 318 9.414e-213 -0.77404 -0.73689
Y(2014) -0.7962 0.0096552 -82.463 318 1.0453e-216 -0.8152 -0.77721
Y(2015) -0.63079 0.0098027 -64.349 318 2.1219e-184 -0.65008 -0.6115
Y(2016) 0.07737 0.0084833 9.1202 318 8.7552e-18 0.060679 0.09406
M(Feb) 0.2758 0.0090565 30.454 318 2.7852e-96 0.25799 0.29362
M(Mar) 0.21536 0.0096964 22.21 318 1.214e-66 0.19628 0.23444
M(Apr) 0.22415 0.009544 23.486 318 1.9248e-71 0.20537 0.24292
M(May) 0.070463 0.0095385 7.3872 318 1.329e-12 0.051697 0.08923
M(Jun) 0.2067 0.0095061 21.744 318 7.1728e-65 0.188 0.2254
M(Jul) 0.19668 0.0092698 21.217 318 7.3224e-63 0.17844 0.21491
M(Aug) 0.014564 0.0089101 1.6346 318 0.10312 -0.0029658 0.032095
M(Sep) -0.0020314 0.0090309 -0.22494 318 0.82217 -0.019799 0.015736
M(Oct) 0.17061 0.0087869 19.417 318 6.198e-56 0.15332 0.1879
M(Nov) 0.066463 0.0089276 7.4446 318 9.1892e-13 0.048898 0.084027
M(Dec) -0.18137 0.0087435 -20.743 318 4.7857e-61 -0.19857 -0.16416

18. Households, comments
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.11745 0.014779 7.9473 318 3.342e-14 0.088375 0.14653
FB_comments -4.4579e-06 8.8363e-07 -5.045 318 7.6393e-07 -6.1964e-06 -2.7194e-06
scD(t-1) 1.5072 0.014279 105.55 318 1.3585e-249 1.4791 1.5353
NEWS(T) -0.050044 0.0041388 -12.091 318 5.9927e-28 -0.058186 -0.041901
Ret(t) -5.3052 0.026515 -200.08 318 0 -5.3573 -5.253
Ret(t-1) -0.40271 0.031479 -12.793 318 1.6273e-30 -0.46465 -0.34078
Y(2011) -0.18399 0.0081646 -22.535 318 7.1688e-68 -0.20005 -0.16793
Y(2012) -0.13096 0.0084913 -15.423 318 1.8518e-40 -0.14767 -0.11426
Y(2013) -0.682 0.0093033 -73.308 318 2.8931e-201 -0.7003 -0.6637
Y(2014) -0.71917 0.0092151 -78.042 318 1.8711e-209 -0.7373 -0.70104
Y(2015) -0.65546 0.0098601 -66.477 318 1.3941e-188 -0.67486 -0.63606
Y(2016) 0.10514 0.0084697 12.413 318 4.0421e-29 0.088474 0.1218
M(Feb) 0.22511 0.0089298 25.209 318 8.1937e-78 0.20754 0.24268
M(Mar) 0.181 0.0096687 18.72 318 3.1017e-53 0.16197 0.20002
M(Apr) 0.1657 0.0094557 17.524 318 1.3638e-48 0.1471 0.1843
M(May) 0.047583 0.0095291 4.9934 318 9.7965e-07 0.028835 0.066331
M(Jun) 0.18251 0.0094991 19.214 318 3.7718e-55 0.16383 0.2012
M(Jul) 0.19061 0.0092574 20.59 318 1.847e-60 0.1724 0.20882
M(Aug) 0.0051563 0.0088948 0.5797 318 0.56253 -0.012344 0.022656
M(Sep) -0.0022091 0.0090052 -0.24531 318 0.80637 -0.019926 0.015508
M(Oct) 0.17591 0.0087766 20.043 318 2.3599e-58 0.15864 0.19318
M(Nov) 0.069467 0.0089007 7.8046 318 8.6843e-14 0.051955 0.086979
M(Dec) -0.19131 0.0087185 -21.943 318 1.2489e-65 -0.20847 -0.17416

19. Households, likes
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.1051 0.014589 7.2044 318 4.2541e-12 0.076402 0.13381
FB_likes 1.9186e-07 7.0535e-08 2.7201 318 0.0068852 5.3088e-08 3.3064e-07
scD(t-1) 1.5121 0.014262 106.02 318 3.4962e-250 1.484 1.5401
NEWS(T) -0.047872 0.0041922 -11.419 318 1.5368e-25 -0.05612 -0.039624
Ret(t) -5.2895 0.026641 -198.55 318 0 -5.3419 -5.2371
Ret(t-1) -0.39393 0.031462 -12.521 318 1.6351e-29 -0.45583 -0.33203
Y(2011) -0.19163 0.0081284 -23.575 318 8.8996e-72 -0.20762 -0.17564
Y(2012) -0.15729 0.0094328 -16.675 318 2.6956e-45 -0.17585 -0.13873
Y(2013) -0.70775 0.010964 -64.55 318 8.4517e-185 -0.72933 -0.68618
Y(2014) -0.73225 0.010262 -71.355 318 9.4631e-198 -0.75244 -0.71206
Y(2015) -0.65063 0.0097949 -66.426 318 1.7502e-188 -0.6699 -0.63136
Y(2016) 0.10926 0.0084132 12.987 318 3.1326e-31 0.092707 0.12581
M(Feb) 0.22968 0.0088952 25.821 318 4.8215e-80 0.21218 0.24718
M(Mar) 0.18462 0.009634 19.164 318 5.8991e-55 0.16567 0.20358
M(Apr) 0.17505 0.0093885 18.645 318 6.0312e-53 0.15658 0.19352
M(May) 0.050653 0.009505 5.3292 318 1.8729e-07 0.031953 0.069354
M(Jun) 0.18617 0.009474 19.651 318 7.7064e-57 0.16753 0.20481
M(Jul) 0.18845 0.0093244 20.21 318 5.3366e-59 0.17011 0.2068
M(Aug) 0.0025782 0.0089501 0.28807 318 0.77348 -0.015031 0.020187
M(Sep) -0.0055631 0.0090473 -0.61489 318 0.53907 -0.023363 0.012237
M(Oct) 0.17034 0.0089165 19.104 318 1.0087e-54 0.15279 0.18788
M(Nov) 0.06879 0.0089202 7.7118 318 1.6071e-13 0.05124 0.08634
M(Dec) -0.19057 0.0087307 -21.827 318 3.4476e-65 -0.20774 -0.17339


20. Households, shares
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.10485 0.014605 7.1791 318 4.9886e-12 0.076114 0.13358
FB_shares -9.0402e-06 5.7473e-07 -15.729 318 1.2251e-41 -1.0171e-05 -7.9094e-06
scD(t-1) 1.5051 0.014274 105.44 318 1.902e-249 1.477 1.5331
NEWS(T) -0.060349 0.0041961 -14.382 318 1.787e-36 -0.068604 -0.052093
Ret(t) -5.3519 0.026697 -200.47 318 0 -5.4044 -5.2994
Ret(t-1) -0.41371 0.031478 -13.143 318 8.2429e-32 -0.47564 -0.35178
Y(2011) -0.18012 0.0081142 -22.198 318 1.3478e-66 -0.19609 -0.16416
Y(2012) -0.071317 0.0093383 -7.6371 318 2.6275e-13 -0.08969 -0.052944
Y(2013) -0.62961 0.0099336 -63.381 318 1.8538e-182 -0.64915 -0.61006
Y(2014) -0.69029 0.0094072 -73.379 318 2.1576e-201 -0.7088 -0.67179
Y(2015) -0.63949 0.0098074 -65.205 318 4.2695e-186 -0.65879 -0.62019
Y(2016) 0.1204 0.0084168 14.305 318 3.4979e-36 0.10384 0.13696
M(Feb) 0.22823 0.0089021 25.638 318 2.2229e-79 0.21072 0.24575
M(Mar) 0.1856 0.0096429 19.247 318 2.7954e-55 0.16663 0.20457
M(Apr) 0.16046 0.0093935 17.082 318 7.1174e-47 0.14197 0.17894
M(May) 0.049881 0.0095182 5.2406 318 2.921e-07 0.031155 0.068608
M(Jun) 0.18145 0.0094837 19.133 318 7.7371e-55 0.1628 0.20011
M(Jul) 0.19695 0.009266 21.256 318 5.2127e-63 0.17872 0.21519
M(Aug) 0.022949 0.0089673 2.5592 318 0.010953 0.0053065 0.040592
M(Sep) 0.008407 0.0090222 0.93182 318 0.35214 -0.0093437 0.026158
M(Oct) 0.19793 0.0089067 22.222 318 1.0912e-66 0.18041 0.21545
M(Nov) 0.076644 0.0089086 8.6033 318 3.6065e-16 0.059117 0.094171
M(Dec) -0.17974 0.0087284 -20.593 318 1.8101e-60 -0.19691 -0.16257


Household activity groups

Active investors

21. Active investors, posts
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
Intercept 0.60538 0.092679 6.532 318 2.5715e-10 0.42304 0.78772
FB_posts 0.00024134 0.0033095 0.072923 318 0.94191 -0.00627 0.0067527
scD(t-1) -0.82969 0.097284 -8.5285 318 6.1119e-16 -1.0211 -0.63829
NEWS(T) -0.13305 0.033665 -3.9523 318 9.5428e-05 -0.19929 -0.066819
Ret(t) -7.9946 0.27513 -29.058 318 1.6452e-91 -8.5359 -7.4533
Ret(t-1) 1.6007 0.30539 5.2413 318 2.911e-07 0.99981 2.2015
Y(2011) -0.12886 0.062632 -2.0574 318 0.040459 -0.25209 -0.0056358
Y(2012) -0.088595 0.059998 -1.4766 318 0.14077 -0.20664 0.029449
Y(2013) -0.079748 0.06202 -1.2859 318 0.19943 -0.20177 0.042273
Y(2014) -0.020625 0.066816 -0.30868 318 0.75777 -0.15208 0.11083
Y(2015) -0.028919 0.068288 -0.42349 318 0.67223 -0.16327 0.10543
Y(2016) 0.10083 0.063944 1.5768 318 0.11584 -0.024982 0.22663
M(Feb) 0.046021 0.078498 0.58627 318 0.55811 -0.10842 0.20046
M(Mar) 0.087959 0.077226 1.139 318 0.25557 -0.06398 0.2399
M(Apr) -0.098081 0.083243 -1.1783 318 0.23958 -0.26186 0.065695
M(May) -0.11265 0.084266 -1.3368 318 0.18224 -0.27844 0.053142
M(Jun) -0.17391 0.083183 -2.0906 318 0.037354 -0.33757 -0.010247
M(Jul) -0.032485 0.080163 -0.40523 318 0.68558 -0.1902 0.12523
M(Aug) -0.16885 0.080643 -2.0938 318 0.037068 -0.32751 -0.010191
M(Sep) -0.11495 0.078531 -1.4637 318 0.14425 -0.26946 0.039557
M(Oct) 0.032025 0.075289 0.42536 318 0.67086 -0.1161 0.18015
M(Nov) -0.03916 0.075414 -0.51927 318 0.60393 -0.18753 0.10921
M(Dec) -0.050813 0.07676 -0.66197 318 0.50847 -0.20183 0.10021

22. Active investors, comments
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.61757  0.091376  6.7586  318  6.6641e-11  0.43779  0.79735
FB_comments  -5.8237e-06  6.7028e-06  -0.86884  318  0.38559  -1.9011e-05  7.3637e-06
scD(t-1)     -0.82853  0.096677  -8.5701  318  4.5609e-16  -1.0187  -0.63833
NEWS(T)      -0.13361  0.033577  -3.9791  318  8.5729e-05  -0.19967  -0.067545
Ret(t)       -8.0021  0.27526  -29.071  318  1.4796e-91  -8.5436  -7.4605
Ret(t-1)     1.6028  0.30324  5.2856  318  2.3318e-07  1.0062  2.1994
Y(2011)      -0.1195  0.062149  -1.9228  318  0.055399  -0.24178  0.0027757
Y(2012)      -0.066811  0.061938  -1.0787  318  0.28155  -0.18867  0.05505
Y(2013)      -0.063958  0.061507  -1.0399  318  0.29919  -0.18497  0.057053
Y(2014)      -0.017393  0.063602  -0.27347  318  0.78467  -0.14253  0.10774
Y(2015)      -0.036331  0.068405  -0.53112  318  0.59571  -0.17092  0.098253
Y(2016)      0.094549  0.063514  1.4886  318  0.13758  -0.030412  0.21951
M(Feb)       0.044578  0.077387  0.57604  318  0.56499  -0.10768  0.19683
M(Mar)       0.08213  0.076383  1.0752  318  0.28308  -0.06815  0.23241
M(Apr)       -0.10733  0.081747  -1.3129  318  0.19016  -0.26816  0.053507
M(May)       -0.11733  0.083998  -1.3968  318  0.16345  -0.28259  0.047935
M(Jun)       -0.17977  0.083093  -2.1634  318  0.031252  -0.34325  -0.016285
M(Jul)       -0.034668  0.080139  -0.4326  318  0.6656  -0.19234  0.123
M(Aug)       -0.16769  0.080547  -2.0819  318  0.038151  -0.32616  -0.0092181
M(Sep)       -0.10968  0.078731  -1.3931  318  0.16457  -0.26458  0.045222
M(Oct)       0.035482  0.07534  0.47097  318  0.63799  -0.11274  0.18371
M(Nov)       -0.039115  0.075418  -0.51865  318  0.60437  -0.1875  0.10927
M(Dec)       -0.05407  0.076615  -0.70573  318  0.48087  -0.20481  0.096667

23. Active investors, likes
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.5954  0.090741  6.5616  318  2.16e-10  0.41688  0.77393
FB_likes     -9.7367e-07  5.253e-07  -1.8536  318  0.064728  -2.0072e-06  5.9827e-08
scD(t-1)     -0.82842  0.096672  -8.5693  318  4.5855e-16  -1.0186  -0.63822
NEWS(T)      -0.14222  0.033928  -4.1918  318  3.5905e-05  -0.20897  -0.075468
Ret(t)       -8.0472  0.277  -29.052  318  1.7262e-91  -8.5922  -7.5022
Ret(t-1)     1.6012  0.303  5.2843  318  2.3473e-07  1.005  2.1973
Y(2011)      -0.11403  0.061849  -1.8437  318  0.066153  -0.23572  0.007652
Y(2012)      -0.011414  0.070421  -0.16209  318  0.87134  -0.14996  0.12714
Y(2013)      0.016191  0.078214  0.20701  318  0.83613  -0.13769  0.17007
Y(2014)      0.043365  0.07196  0.60262  318  0.54719  -0.098213  0.18494
Y(2015)      -0.018897  0.068181  -0.27716  318  0.78183  -0.15304  0.11524
Y(2016)      0.11316  0.063309  1.7874  318  0.074824  -0.011398  0.23772
M(Feb)       0.046537  0.077373  0.60146  318  0.54796  -0.10569  0.19876
M(Mar)       0.096434  0.076282  1.2642  318  0.20709  -0.053647  0.24651
M(Apr)       -0.10167  0.081279  -1.2508  318  0.21191  -0.26158  0.058246
M(May)       -0.10686  0.083961  -1.2728  318  0.20402  -0.27205  0.058325
M(Jun)       -0.1692  0.082921  -2.0405  318  0.042129  -0.33234  -0.0060549
M(Jul)       -0.014418  0.080717  -0.17862  318  0.85835  -0.17322  0.14439
M(Aug)       -0.14715  0.081367  -1.8084  318  0.071481  -0.30723  0.012938
M(Sep)       -0.083781  0.080274  -1.0437  318  0.29742  -0.24172  0.074155
M(Oct)       0.062521  0.077039  0.81155  318  0.41766  -0.08905  0.21409
M(Nov)       -0.023606  0.075872  -0.31113  318  0.75591  -0.17288  0.12567
M(Dec)       -0.040007  0.076815  -0.52083  318  0.60285  -0.19114  0.11112


24. Active investors, shares
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.5885  0.0907  6.4884  318  3.3229e-10  0.41005  0.76695
FB_shares    -1.3561e-05  4.1848e-06  -3.2406  318  0.0013189  -2.1795e-05  -5.328e-06
scD(t-1)     -0.83397  0.096712  -8.6233  318  3.1319e-16  -1.0242  -0.6437
NEWS(T)      -0.15116  0.03404  -4.4407  318  1.2387e-05  -0.21813  -0.084189
Ret(t)       -8.1049  0.27788  -29.167  318  6.9221e-92  -8.6517  -7.5582
Ret(t-1)     1.5762  0.30292  5.2034  318  3.5148e-07  0.98025  2.1722
Y(2011)      -0.10616  0.061774  -1.7185  318  0.086672  -0.2277  0.015376
Y(2012)      0.040188  0.069535  0.57795  318  0.56371  -0.096619  0.17699
Y(2013)      0.034905  0.068783  0.50746  318  0.61219  -0.10042  0.17023
Y(2014)      0.033569  0.065631  0.51148  318  0.60937  -0.095558  0.1627
Y(2015)      -0.0053148  0.068365  -0.077742  318  0.93808  -0.13982  0.12919
Y(2016)      0.12596  0.06347  1.9846  318  0.048044  0.0010908  0.25084
M(Feb)       0.048186  0.0774  0.62255  318  0.53403  -0.1041  0.20047
M(Mar)       0.090362  0.076126  1.187  318  0.23611  -0.059412  0.24014
M(Apr)       -0.1087  0.081363  -1.336  318  0.1825  -0.26878  0.051375
M(May)       -0.11027  0.083908  -1.3142  318  0.18974  -0.27535  0.054816
M(Jun)       -0.17829  0.082859  -2.1517  318  0.03217  -0.34131  -0.01527
M(Jul)       -0.020393  0.080197  -0.25428  318  0.79944  -0.17818  0.13739
M(Aug)       -0.13075  0.081349  -1.6073  318  0.10898  -0.2908  0.029297
M(Sep)       -0.070921  0.079628  -0.89066  318  0.37379  -0.22758  0.085742
M(Oct)       0.081037  0.076761  1.0557  318  0.29191  -0.069987  0.23206
M(Nov)       -0.02133  0.075604  -0.28213  318  0.77803  -0.17008  0.12742
M(Dec)       -0.035848  0.076734  -0.46718  318  0.64069  -0.18682  0.11512


Moderate investors

25. Moderate investors, posts
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.54365  0.03572  15.22  318  1.1216e-39  0.47337  0.61392
FB_posts     0.0018524  0.0010636  1.7416  318  0.082539  -0.00024018  0.003945
scD(t-1)     -0.024271  0.04043  -0.60033  318  0.54872  -0.10382  0.055273
NEWS(T)      -0.14771  0.011301  -13.071  318  1.5195e-31  -0.16995  -0.12548
Ret(t)       -7.6544  0.084024  -91.098  318  6.6606e-230  -7.8198  -7.4891
Ret(t-1)     0.47688  0.10513  4.5361  318  8.1314e-06  0.27004  0.68372
Y(2011)      -0.27199  0.022059  -12.33  318  8.1202e-29  -0.31539  -0.22859
Y(2012)      -0.18958  0.02179  -8.7002  318  1.8137e-16  -0.23245  -0.14671
Y(2013)      -0.36673  0.023078  -15.891  318  2.9146e-42  -0.41214  -0.32133
Y(2014)      -0.39681  0.024365  -16.287  318  8.6073e-44  -0.44475  -0.34888
Y(2015)      -0.44763  0.02471  -18.116  318  6.8641e-51  -0.49625  -0.39902
Y(2016)      -0.039814  0.022435  -1.7746  318  0.076917  -0.083955  0.0043262
M(Feb)       0.075762  0.0242  3.1307  318  0.001906  0.02815  0.12337
M(Mar)       0.32328  0.025054  12.903  318  6.3674e-31  0.27399  0.37258
M(Apr)       0.068406  0.025804  2.651  318  0.0084269  0.017638  0.11917
M(May)       0.080219  0.025941  3.0923  318  0.0021618  0.029181  0.13126
M(Jun)       0.12228  0.026217  4.6642  318  4.5678e-06  0.070702  0.17386
M(Jul)       0.15454  0.025664  6.0217  318  4.7619e-09  0.10405  0.20504
M(Aug)       -0.037919  0.024942  -1.5203  318  0.12943  -0.086991  0.011153
M(Sep)       -0.036723  0.024634  -1.4908  318  0.13702  -0.08519  0.011743
M(Oct)       0.17037  0.023294  7.3141  318  2.1218e-12  0.12454  0.2162
M(Nov)       0.17819  0.023731  7.5086  318  6.0725e-13  0.1315  0.22488
M(Dec)       0.094594  0.02377  3.9796  318  8.5561e-05  0.047828  0.14136

26. Moderate investors, comments
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.54767  0.035417  15.464  318  1.2935e-40  0.47799  0.61735
FB_comments  4.1075e-06  2.2706e-06  1.809  318  0.071392  -3.5973e-07  8.5747e-06
scD(t-1)     -0.020608  0.040281  -0.51161  318  0.60928  -0.099859  0.058643
NEWS(T)      -0.14904  0.011266  -13.229  318  3.9456e-32  -0.1712  -0.12687
Ret(t)       -7.655  0.084026  -91.103  318  6.5597e-230  -7.8203  -7.4897
Ret(t-1)     0.49747  0.10429  4.77  318  2.8112e-06  0.29228  0.70266
Y(2011)      -0.27176  0.022024  -12.34  318  7.5044e-29  -0.31509  -0.22843
Y(2012)      -0.19268  0.022251  -8.6595  318  2.4223e-16  -0.23646  -0.14891
Y(2013)      -0.36604  0.022918  -15.972  318  1.4158e-42  -0.41113  -0.32095
Y(2014)      -0.38596  0.023311  -16.557  318  7.7032e-45  -0.43182  -0.34009
Y(2015)      -0.44628  0.024765  -18.02  318  1.6074e-50  -0.495  -0.39755
Y(2016)      -0.028937  0.022358  -1.2943  318  0.19651  -0.072925  0.015051
M(Feb)       0.069433  0.023792  2.9183  318  0.0037706  0.022622  0.11624
M(Mar)       0.32006  0.02482  12.895  318  6.83e-31  0.27123  0.36889
M(Apr)       0.064499  0.025392  2.5401  318  0.011556  0.014542  0.11446
M(May)       0.079295  0.025894  3.0623  318  0.0023838  0.02835  0.13024
M(Jun)       0.12126  0.026166  4.6342  318  5.2348e-06  0.069778  0.17274
M(Jul)       0.15392  0.025646  6.0015  318  5.3233e-09  0.10346  0.20438
M(Aug)       -0.041399  0.024898  -1.6627  318  0.09735  -0.090386  0.0075869
M(Sep)       -0.042049  0.024727  -1.7005  318  0.090006  -0.090698  0.0065997
M(Oct)       0.1669  0.023311  7.1599  318  5.6277e-12  0.12104  0.21276
M(Nov)       0.17874  0.023727  7.5334  318  5.1717e-13  0.13206  0.22543
M(Dec)       0.093743  0.023739  3.9489  318  9.6704e-05  0.047038  0.14045

27. Moderate investors, likes
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.55861  0.03544  15.762  318  9.1807e-42  0.48888  0.62833
FB_likes     2.6701e-07  1.807e-07  1.4776  318  0.14049  -8.8506e-08  6.2252e-07
scD(t-1)     -0.022048  0.040364  -0.54624  318  0.58528  -0.10146  0.057365
NEWS(T)      -0.14671  0.011396  -12.874  318  8.1571e-31  -0.16913  -0.12429
Ret(t)       -7.6488  0.084253  -90.784  318  1.9214e-229  -7.8146  -7.4831
Ret(t-1)     0.49417  0.10439  4.7339  318  3.3207e-06  0.28879  0.69954
Y(2011)      -0.27044  0.021993  -12.296  318  1.0796e-28  -0.31371  -0.22717
Y(2012)      -0.19955  0.025112  -7.9465  318  3.3594e-14  -0.24896  -0.15015
Y(2013)      -0.38131  0.028115  -13.563  318  2.2289e-33  -0.43663  -0.326
Y(2014)      -0.40231  0.026268  -15.316  318  4.7991e-40  -0.454  -0.35063
Y(2015)      -0.45439  0.024799  -18.323  318  1.0757e-51  -0.50318  -0.4056
Y(2016)      -0.037196  0.02229  -1.6687  318  0.096154  -0.081051  0.0066585
M(Feb)       0.067934  0.023782  2.8566  318  0.0045644  0.021145  0.11472
M(Mar)       0.31548  0.024739  12.752  318  2.304e-30  0.26681  0.36415
M(Apr)       0.060401  0.025212  2.3957  318  0.017166  0.010797  0.11001
M(May)       0.075376  0.025827  2.9185  318  0.0037675  0.024563  0.12619
M(Jun)       0.11705  0.026116  4.4819  318  1.0336e-05  0.065669  0.16843
M(Jul)       0.14806  0.025807  5.7374  318  2.2398e-08  0.097291  0.19884
M(Aug)       -0.046126  0.025193  -1.8309  318  0.068048  -0.095692  0.0034395
M(Sep)       -0.045323  0.025121  -1.8042  318  0.072154  -0.094748  0.0041023
M(Oct)       0.16146  0.023835  6.774  318  6.0714e-11  0.11456  0.20835
M(Nov)       0.17508  0.023824  7.3489  318  1.699e-12  0.12821  0.22196
M(Dec)       0.088485  0.023845  3.7108  318  0.0002438  0.041571  0.1354


28. Moderate investors, shares
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.54964  0.035417  15.519  318  7.9077e-41  0.47996  0.61932
FB_shares    -1.7066e-06  1.4039e-06  -1.2157  318  0.22502  -4.4686e-06  1.0554e-06
scD(t-1)     -0.015233  0.040303  -0.37795  318  0.70572  -0.094527  0.064062
NEWS(T)      -0.15146  0.011409  -13.275  318  2.6545e-32  -0.17391  -0.12901
Ret(t)       -7.6699  0.084504  -90.764  318  2.0554e-229  -7.8361  -7.5036
Ret(t-1)     0.50227  0.10429  4.816  318  2.2694e-06  0.29708  0.70745
Y(2011)      -0.26343  0.021934  -12.01  318  1.1817e-27  -0.30658  -0.22027
Y(2012)      -0.16346  0.024558  -6.6562  318  1.2321e-10  -0.21178  -0.11515
Y(2013)      -0.34208  0.024928  -13.723  318  5.5599e-34  -0.39112  -0.29304
Y(2014)      -0.37771  0.02393  -15.784  318  7.5416e-42  -0.42479  -0.33063
Y(2015)      -0.44708  0.024814  -18.017  318  1.6524e-50  -0.4959  -0.39826
Y(2016)      -0.031188  0.022303  -1.3984  318  0.16298  -0.075068  0.012692
M(Feb)       0.068283  0.023785  2.8708  318  0.0043688  0.021487  0.11508
M(Mar)       0.3162  0.024729  12.787  318  1.7152e-30  0.26755  0.36486
M(Apr)       0.057004  0.025229  2.2595  318  0.024532  0.0073669  0.10664
M(May)       0.075899  0.025831  2.9383  318  0.0035418  0.025077  0.12672
M(Jun)       0.1179  0.026107  4.5159  318  8.8939e-06  0.066533  0.16926
M(Jul)       0.15394  0.025659  5.9994  318  5.3871e-09  0.10345  0.20442
M(Aug)       -0.035038  0.025287  -1.3856  318  0.16684  -0.084789  0.014713
M(Sep)       -0.033083  0.024937  -1.3267  318  0.18557  -0.082146  0.015979
M(Oct)       0.17466  0.023734  7.3591  318  1.5921e-12  0.12796  0.22135
M(Nov)       0.18011  0.023773  7.5764  318  3.9067e-13  0.13334  0.22689
M(Dec)       0.094619  0.02381  3.974  318  8.7497e-05  0.047775  0.14146


Passive investors

29. Passive investors, posts
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.37889  0.022039  17.191  318  2.667e-47  0.33553  0.42225
FB_posts     0.012314  0.00064394  19.122  318  8.5505e-55  0.011047  0.01358
scD(t-1)     0.68562  0.020412  33.589  318  1.2973e-106  0.64546  0.72578
NEWS(T)      -0.053805  0.006542  -8.2246  318  5.0651e-15  -0.066676  -0.040934
Ret(t)       -5.2018  0.043553  -119.43  318  3.0976e-266  -5.2875  -5.1161
Ret(t-1)     -1.5922  0.049179  -32.375  318  1.1223e-102  -1.6889  -1.4954
Y(2011)      -0.27167  0.012836  -21.165  318  1.1575e-62  -0.29692  -0.24641
Y(2012)      -0.24963  0.013122  -19.023  318  2.0704e-54  -0.27544  -0.22381
Y(2013)      -0.5247  0.014359  -36.542  318  7.3271e-116  -0.55295  -0.49645
Y(2014)      -0.55782  0.014777  -37.75  318  1.6201e-119  -0.5869  -0.52875
Y(2015)      -0.31908  0.01499  -21.286  318  3.9728e-63  -0.34857  -0.28959
Y(2016)      0.29242  0.013184  22.18  318  1.5779e-66  0.26648  0.31836
M(Feb)       0.31605  0.014244  22.188  318  1.4704e-66  0.28803  0.34408
M(Mar)       0.50699  0.015252  33.24  318  1.7231e-105  0.47698  0.537
M(Apr)       0.31124  0.01526  20.396  318  1.034e-59  0.28122  0.34126
M(May)       0.26069  0.01494  17.449  318  2.6701e-48  0.2313  0.29008
M(Jun)       0.29087  0.014876  19.552  318  1.8533e-56  0.2616  0.32013
M(Jul)       0.15161  0.014626  10.365  318  7.0052e-22  0.12283  0.18039
M(Aug)       0.16516  0.014004  11.794  318  7.078e-27  0.13761  0.19271
M(Sep)       0.27598  0.014267  19.344  318  1.1847e-55  0.24791  0.30405
M(Oct)       0.35994  0.013815  26.055  318  6.828e-81  0.33277  0.38712
M(Nov)       0.4006  0.014293  28.028  318  6.3925e-88  0.37248  0.42872
M(Dec)       0.22895  0.013904  16.466  318  1.7299e-44  0.20159  0.2563

30. Passive investors, comments
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.49717  0.021758  22.85  318  4.6447e-69  0.45436  0.53998
FB_comments  -9.3322e-06  1.4257e-06  -6.5458  318  2.3706e-10  -1.2137e-05  -6.5273e-06
scD(t-1)     0.67761  0.02045  33.134  318  3.7865e-105  0.63737  0.71784
NEWS(T)      -0.063599  0.0065187  -9.7563  318  7.6502e-20  -0.076424  -0.050773
Ret(t)       -5.2286  0.043597  -119.93  318  8.5882e-267  -5.3143  -5.1428
Ret(t-1)     -1.5282  0.049135  -31.102  318  1.8423e-98  -1.6249  -1.4315
Y(2011)      -0.23317  0.012855  -18.139  318  5.549e-51  -0.25846  -0.20788
Y(2012)      -0.1569  0.013342  -11.76  318  9.3498e-27  -0.18315  -0.13065
Y(2013)      -0.4354  0.014158  -30.752  318  2.7348e-97  -0.46326  -0.40755
Y(2014)      -0.46554  0.014024  -33.195  318  2.4023e-105  -0.49314  -0.43795
Y(2015)      -0.35915  0.01505  -23.863  318  7.5196e-73  -0.38876  -0.32954
Y(2016)      0.31943  0.013154  24.283  318  2.0766e-74  0.29355  0.34531
M(Feb)       0.24945  0.013989  17.832  318  8.6512e-50  0.22193  0.27697
M(Mar)       0.45547  0.015167  30.029  318  7.6659e-95  0.42562  0.48531
M(Apr)       0.22674  0.015019  15.096  318  3.3329e-39  0.19719  0.25629
M(May)       0.22648  0.014936  15.163  318  1.8462e-39  0.1971  0.25587
M(Jun)       0.25065  0.014826  16.906  318  3.4171e-46  0.22148  0.27982
M(Jul)       0.13114  0.014593  8.986  318  2.3266e-17  0.10242  0.15985
M(Aug)       0.14713  0.013962  10.538  318  1.8098e-22  0.11966  0.1746
M(Sep)       0.26898  0.014231  18.901  318  6.1528e-54  0.24098  0.29698
M(Oct)       0.3561  0.013813  25.78  318  6.818e-80  0.32892  0.38328
M(Nov)       0.39636  0.014268  27.779  318  4.7742e-87  0.36829  0.42443
M(Dec)       0.20496  0.013859  14.789  318  5.0282e-38  0.17769  0.23223

31. Passive investors, likes
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.47594  0.021504  22.133  318  2.3752e-66  0.43364  0.51825
FB_likes     -7.3294e-07  1.1389e-07  -6.4354  318  4.5294e-10  -9.5701e-07  -5.0886e-07
scD(t-1)     0.67793  0.02045  33.15  318  3.3615e-105  0.63769  0.71816
NEWS(T)      -0.070091  0.0066076  -10.608  318  1.0435e-22  -0.083091  -0.057091
Ret(t)       -5.24  0.043677  -119.97  318  7.6843e-267  -5.326  -5.1541
Ret(t-1)     -1.5343  0.04917  -31.204  318  8.4075e-99  -1.631  -1.4375
Y(2011)      -0.23608  0.012809  -18.431  318  4.1062e-52  -0.26128  -0.21088
Y(2012)      -0.13478  0.014819  -9.0948  318  1.0542e-17  -0.16394  -0.10562
Y(2013)      -0.3915  0.016962  -23.081  318  6.3343e-70  -0.42487  -0.35813
Y(2014)      -0.41947  0.015979  -26.251  318  1.3434e-81  -0.45091  -0.38803
Y(2015)      -0.34191  0.014943  -22.88  318  3.5876e-69  -0.37131  -0.31251
Y(2016)      0.33763  0.013066  25.841  318  4.0721e-80  0.31192  0.36334
M(Feb)       0.25578  0.013934  18.357  318  7.9445e-52  0.22837  0.2832
M(Mar)       0.46588  0.015102  30.849  318  1.2965e-97  0.43616  0.49559
M(Apr)       0.23341  0.014898  15.667  318  2.1255e-41  0.2041  0.26272
M(May)       0.23511  0.014891  15.789  318  7.2128e-42  0.20581  0.26441
M(Jun)       0.25832  0.01478  17.478  318  2.0569e-48  0.22924  0.2874
M(Jul)       0.14593  0.014676  9.943  318  1.8436e-20  0.11705  0.1748
M(Aug)       0.15996  0.014084  11.358  318  2.5405e-25  0.13225  0.18767
M(Sep)       0.27357  0.014266  19.176  318  5.3046e-55  0.2455  0.30164
M(Oct)       0.37173  0.014097  26.369  318  5.0368e-82  0.344  0.39947
M(Nov)       0.40597  0.014305  28.379  318  3.7642e-89  0.37782  0.43411
M(Dec)       0.21751  0.013881  15.67  318  2.0738e-41  0.1902  0.24482


32. Passive investors, shares
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    0.47085  0.021534  21.866  318  2.4579e-65  0.42849  0.51322
FB_shares    -1.6005e-05  9.0144e-07  -17.755  318  1.7318e-49  -1.7778e-05  -1.4231e-05
scD(t-1)     0.6733  0.020476  32.883  318  2.4733e-104  0.63302  0.71359
NEWS(T)      -0.081345  0.0066067  -12.312  318  9.4425e-29  -0.094343  -0.068346
Ret(t)       -5.2983  0.043818  -120.92  318  6.6858e-268  -5.3845  -5.2121
Ret(t-1)     -1.5621  0.04918  -31.762  318  1.1702e-100  -1.6588  -1.4653
Y(2011)      -0.22793  0.012782  -17.833  318  8.6269e-50  -0.25308  -0.20278
Y(2012)      -0.05932  0.014556  -4.0753  318  5.8097e-05  -0.087958  -0.030682
Y(2013)      -0.34785  0.015125  -22.997  318  1.2994e-69  -0.37761  -0.31809
Y(2014)      -0.41334  0.014371  -28.762  318  1.7361e-90  -0.44162  -0.38507
Y(2015)      -0.33096  0.014962  -22.12  318  2.6645e-66  -0.3604  -0.30152
Y(2016)      0.3475  0.013068  26.592  318  7.9519e-83  0.32179  0.37321
M(Feb)       0.25625  0.013939  18.384  318  6.2507e-52  0.22883  0.28368
M(Mar)       0.46576  0.015109  30.826  318  1.5485e-97  0.43603  0.49549
M(Apr)       0.22142  0.014896  14.864  318  2.5817e-38  0.19211  0.25073
M(May)       0.23267  0.014903  15.613  318  3.4544e-41  0.20335  0.26199
M(Jun)       0.25306  0.014788  17.112  318  5.44e-47  0.22396  0.28215
M(Jul)       0.14507  0.014595  9.9396  318  1.8913e-20  0.11635  0.17379
M(Aug)       0.1835  0.014116  12.999  318  2.814e-31  0.15573  0.21127
M(Sep)       0.2856  0.014242  20.053  318  2.1558e-58  0.25758  0.31362
M(Oct)       0.39828  0.014062  28.324  318  5.8487e-89  0.37061  0.42594
M(Nov)       0.41146  0.01428  28.814  318  1.1454e-90  0.38336  0.43955
M(Dec)       0.22877  0.013883  16.478  318  1.5644e-44  0.20145  0.25608


Inactive investors

33. Inactive investors, posts
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    -0.53309  0.023454  -22.729  318  1.3256e-68  -0.57923  -0.48694
FB_posts     0.014977  0.00067134  22.309  318  5.118e-67  0.013656  0.016298
scD(t-1)     2.7061  0.021064  128.47  318  4.2012e-276  2.6647  2.7476
NEWS(T)      0.014338  0.0066059  2.1705  318  0.03071  0.0013412  0.027335
Ret(t)       -4.6532  0.039041  -119.19  318  5.8857e-266  -4.7301  -4.5764
Ret(t-1)     -0.44413  0.047307  -9.3882  318  1.2133e-18  -0.5372  -0.35105
Y(2011)      -0.2492  0.012984  -19.193  318  4.5602e-55  -0.27474  -0.22365
Y(2012)      -0.15783  0.01348  -11.708  318  1.4353e-26  -0.18435  -0.13131
Y(2013)      -0.89546  0.016024  -55.883  318  1.6839e-166  -0.92699  -0.86393
Y(2014)      -0.95598  0.01612  -59.305  318  5.4177e-174  -0.9877  -0.92427
Y(2015)      -0.80527  0.016245  -49.57  318  1.1995e-151  -0.83723  -0.77331
Y(2016)      -0.093399  0.01375  -6.7925  318  5.4283e-11  -0.12045  -0.066346
M(Feb)       0.28113  0.014101  19.936  318  6.0939e-58  0.25338  0.30887
M(Mar)       0.0071384  0.015488  0.4609  318  0.64519  -0.023333  0.03761
M(Apr)       0.22834  0.01472  15.513  318  8.3886e-41  0.19938  0.2573
M(May)       -0.14367  0.014649  -9.8077  318  5.1775e-20  -0.17249  -0.11485
M(Jun)       0.17724  0.014879  11.912  318  2.672e-27  0.14796  0.20651
M(Jul)       0.25704  0.014413  17.834  318  8.5142e-50  0.22869  0.2854
M(Aug)       -0.13043  0.01374  -9.4925  318  5.5763e-19  -0.15746  -0.10339
M(Sep)       -0.22186  0.014273  -15.544  318  6.354e-41  -0.24994  -0.19378
M(Oct)       -0.0094852  0.013844  -0.68515  318  0.49375  -0.036723  0.017752
M(Nov)       -0.1521  0.01382  -11.006  318  4.3775e-24  -0.17929  -0.12491
M(Dec)       -0.58371  0.013784  -42.346  318  9.3673e-133  -0.61083  -0.55659

34. Inactive investors, comments
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    -0.49684  0.023623  -21.032  318  3.7415e-62  -0.54332  -0.45036
FB_comments  1.0111e-05  1.4096e-06  7.1729  318  5.1878e-12  7.3373e-06  1.2884e-05
scD(t-1)     2.7456  0.021074  130.29  318  5.2664e-278  2.7041  2.787
NEWS(T)      0.002696  0.0065825  0.40957  318  0.6824  -0.010255  0.015647
Ret(t)       -4.6551  0.03919  -118.78  318  1.7018e-265  -4.7322  -4.5779
Ret(t-1)     -0.29915  0.047047  -6.3585  318  7.0753e-10  -0.39171  -0.20659
Y(2011)      -0.22246  0.012996  -17.118  318  5.1191e-47  -0.24803  -0.1969
Y(2012)      -0.11503  0.013715  -8.387  318  1.646e-15  -0.14202  -0.088047
Y(2013)      -0.82229  0.01575  -52.21  318  4.9675e-158  -0.85328  -0.79131
Y(2014)      -0.84671  0.015325  -55.249  318  4.5234e-165  -0.87687  -0.81656
Y(2015)      -0.80669  0.016353  -49.33  318  4.7025e-151  -0.83886  -0.77451
Y(2016)      -0.034296  0.013727  -2.4984  318  0.012979  -0.061303  -0.0072887
M(Feb)       0.24096  0.014016  17.191  318  2.6781e-47  0.21338  0.26853
M(Mar)       -0.0068534  0.015519  -0.4416  318  0.65908  -0.037387  0.02368
M(Apr)       0.19463  0.014702  13.238  318  3.6338e-32  0.1657  0.22355
M(May)       -0.15241  0.014634  -10.415  318  4.754e-22  -0.1812  -0.12362
M(Jun)       0.17608  0.014915  11.805  318  6.4442e-27  0.14673  0.20542
M(Jul)       0.27194  0.01437  18.924  318  5.0125e-54  0.24366  0.30021
M(Aug)       -0.12676  0.013718  -9.2408  318  3.6138e-18  -0.15375  -0.099774
M(Sep)       -0.20609  0.014215  -14.498  318  6.4797e-37  -0.23406  -0.17812
M(Oct)       0.015157  0.013759  1.1017  318  0.27144  -0.011912  0.042227
M(Nov)       -0.12793  0.013718  -9.3261  318  1.9235e-18  -0.15492  -0.10094
M(Dec)       -0.56432  0.0137  -41.192  318  1.5938e-129  -0.59128  -0.53737

35. Inactive investors, likes
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    -0.48393  0.023321  -20.751  318  4.4679e-61  -0.52981  -0.43805
FB_likes     1.8368e-06  1.149e-07  15.987  318  1.2431e-42  1.6108e-06  2.0629e-06
scD(t-1)     2.7522  0.021051  130.74  318  1.7823e-278  2.7108  2.7936
NEWS(T)      0.0173  0.0066498  2.6016  318  0.0097143  0.0042166  0.030383
Ret(t)       -4.5906  0.0395  -116.22  318  1.5019e-262  -4.6683  -4.5129
Ret(t-1)     -0.30593  0.047069  -6.4996  318  3.1122e-10  -0.39853  -0.21332
Y(2011)      -0.22639  0.012898  -17.552  318  1.062e-48  -0.25176  -0.20101
Y(2012)      -0.21376  0.015283  -13.987  318  5.602e-35  -0.24383  -0.18369
Y(2013)      -0.95545  0.018229  -52.415  318  1.6231e-158  -0.99131  -0.91958
Y(2014)      -0.9579  0.016858  -56.822  318  1.3592e-168  -0.99107  -0.92474
Y(2015)      -0.82965  0.016244  -51.076  318  2.5489e-155  -0.86161  -0.7977
Y(2016)      -0.059337  0.013624  -4.3555  318  1.7934e-05  -0.086141  -0.032533
M(Feb)       0.23785  0.013914  17.095  318  6.3287e-47  0.21047  0.26522
M(Mar)       -0.020017  0.015438  -1.2967  318  0.19568  -0.05039  0.010355
M(Apr)       0.20427  0.014622  13.97  318  6.5037e-35  0.1755  0.23304
M(May)       -0.15798  0.014578  -10.837  318  1.6882e-23  -0.18666  -0.1293
M(Jun)       0.17349  0.014865  11.671  318  1.9495e-26  0.14425  0.20274
M(Jul)       0.24147  0.014482  16.674  318  2.7129e-45  0.21297  0.26996
M(Aug)       -0.14518  0.013767  -10.546  318  1.7006e-22  -0.17227  -0.1181
M(Sep)       -0.21678  0.01429  -15.17  318  1.7371e-39  -0.24489  -0.18866
M(Oct)       -0.014625  0.013887  -1.0531  318  0.2931  -0.041948  0.012698
M(Nov)       -0.14076  0.013736  -10.248  318  1.754e-21  -0.16778  -0.11373
M(Dec)       -0.5834  0.013735  -42.475  318  4.1205e-133  -0.61042  -0.55637


36. Inactive investors, shares
Fixed effects coefficients (95% CIs):
Name         Estimate  SE  tStat  DF  pValue  Lower  Upper
Intercept    -0.46994  0.023328  -20.145  318  9.5339e-59  -0.51584  -0.42405
FB_shares    1.0911e-07  9.7565e-07  0.11184  318  0.91102  -1.8104e-06  2.0286e-06
scD(t-1)     2.737  0.021045  130.06  318  9.0966e-278  2.6956  2.7784
NEWS(T)      0.0015177  0.0066619  0.22781  318  0.81994  -0.011589  0.014625
Ret(t)       -4.6787  0.039507  -118.43  318  4.3212e-265  -4.7565  -4.601
Ret(t-1)     -0.30962  0.047036  -6.5825  318  1.9086e-10  -0.40216  -0.21708
Y(2011)      -0.20882  0.012887  -16.204  318  1.7989e-43  -0.23417  -0.18346
Y(2012)      -0.086963  0.015318  -5.6772  318  3.087e-08  -0.1171  -0.056826
Y(2013)      -0.80129  0.016866  -47.508  318  1.7795e-146  -0.83448  -0.76811
Y(2014)      -0.84592  0.01565  -54.053  318  2.4467e-162  -0.87671  -0.81513
Y(2015)      -0.82065  0.016284  -50.396  318  1.132e-153  -0.85269  -0.78861
Y(2016)      -0.047446  0.013637  -3.4791  318  0.00057329  -0.074276  -0.020615
M(Feb)       0.22938  0.013925  16.473  318  1.6367e-44  0.20198  0.25677
M(Mar)       -0.016097  0.015474  -1.0403  318  0.29901  -0.04654  0.014347
M(Apr)       0.18046  0.014639  12.327  318  8.3442e-29  0.15165  0.20926
M(May)       -0.15891  0.014614  -10.874  318  1.2599e-23  -0.18766  -0.13016
M(Jun)       0.16915  0.014893  11.358  318  2.5379e-25  0.13985  0.19846
M(Jul)       0.27327  0.014384  18.999  318  2.5698e-54  0.24497  0.30157
M(Aug)       -0.12712  0.013776  -9.2275  318  3.9857e-18  -0.15422  -0.10001
M(Sep)       -0.20558  0.014211  -14.467  318  8.4953e-37  -0.23354  -0.17762
M(Oct)       0.017963  0.013893  1.2929  318  0.19697  -0.0093709  0.045296
M(Nov)       -0.13051  0.013712  -9.5176  318  4.6213e-19  -0.15748  -0.10353
M(Dec)       -0.56859  0.0137  -41.502  318  2.1318e-130  -0.59554  -0.54163
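
As a reading aid for the tables in this appendix, the sketch below shows how the tStat, Lower and Upper columns appear to relate to Estimate, SE and DF. It assumes, as the printed numbers suggest but the tables do not state, that tStat is Estimate divided by SE and that the 95% bounds are Estimate plus or minus a Student-t critical value times SE. The helper names `t_critical` and `summarize` are hypothetical, and the critical value is only approximated with a standard Cornish-Fisher expansion, so the result agrees with the tables up to rounding. The input values are those of the Intercept row of table 36.

```python
from statistics import NormalDist


def t_critical(df: int, level: float = 0.95) -> float:
    """Approximate two-sided Student-t critical value (Cornish-Fisher expansion).

    This is an approximation, not an exact quantile; it is accurate to several
    decimals for large df such as the 318 reported in the tables.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # standard normal quantile
    return (
        z
        + (z**3 + z) / (4.0 * df)
        + (5.0 * z**5 + 16.0 * z**3 + 3.0 * z) / (96.0 * df**2)
    )


def summarize(estimate: float, se: float, df: int, level: float = 0.95):
    """Return (t statistic, lower bound, upper bound) under the stated assumptions."""
    t_stat = estimate / se
    half_width = t_critical(df, level) * se
    return t_stat, estimate - half_width, estimate + half_width


# Intercept row of table 36 (Inactive investors, shares): Estimate and SE as printed.
t_stat, lower, upper = summarize(-0.46994, 0.023328, 318)
print(round(t_stat, 3), round(lower, 5), round(upper, 5))
```

Because the printed Estimate and SE are themselves rounded, the recomputed bounds can differ from the tabulated ones in the last digit.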
