
A Pathway and Network Oriented Approach to Enlighten Molecular Mechanisms of Type 2 Diabetes Using Multiple Association Studies

Burcu Bakir-Gungor1*, Miray Unlu Yazici2, Gokhan Goy1, Mustafa Temiz1

1 Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey.

2 Department of Bioengineering, Abdullah Gul University, Kayseri, Turkey.

* Correspondence:

Burcu Bakir-Gungor, [email protected]

Keywords: Genome-wide association study (GWAS), multiple association studies, single nucleotide polymorphism (SNP), subnetwork identification, pathway subnetwork, pathway clustering analysis, normalized mutual information (NMI), type 2 diabetes.

Abstract

Diabetes Mellitus (DM) is a group of metabolic disorders characterized by pancreatic dysfunction in insulin-producing beta cells and glucagon-secreting alpha cells, and by hyperglycemia related to insulin resistance or insulin dysfunction. Type 2 Diabetes Mellitus (T2D), which constitutes 90% of diabetes cases, is a complex multifactorial disease. In the last decade, genome-wide association studies (GWASs) for T2D have successfully pinpointed genetic variants (typically single nucleotide polymorphisms, SNPs) that associate with disease risk. However, traditional GWASs focus on the 'tip of the iceberg' SNPs, and SNPs with mild effects are discarded. To diminish the burden of multiple testing in GWAS, researchers have attempted to evaluate the collective effects of interesting variants. In this regard, pathway-based analyses of GWAS have become popular for discovering novel multi-genic functional associations. Still, to reveal the unaccounted 85 to 90% of T2D variation, which lies hidden in GWAS datasets, new post-GWAS strategies need to be developed. In this respect, we reanalyze three GWAS meta-analysis datasets of T2D, using the methodology that we have developed to identify disease-associated pathways by combining nominally significant evidence of genetic association with known biochemical pathways, protein-protein interaction (PPI) networks, and the functional information of selected SNPs. To enlighten the molecular mechanisms underlying T2D development and progression, we integrated different in-silico approaches that proceed in top-down and bottom-up manners, and hence present a comprehensive analysis at the protein subnetwork, pathway, and pathway subnetwork levels. Our network and pathway-oriented approach is based on both the significance level of an affected pathway and its topological relationship with its neighbor pathways. The identified protein subnetworks and the affected pathways of each dataset were compared using mutual information based on the shared genes. While most of the identified pathways recapitulate the pathophysiology of T2D, our results show that incorporating SNP functional properties and protein-protein interaction networks into GWAS can dissect leading molecular pathways that cannot be picked up using traditional analyses. We hope to bridge the knowledge gap from sequence to consequence.

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.14.905547; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.


https://doi.org/10.1101/2020.01.14.905547

Post-GWAS Methodology to Enlighten T2D


This is a provisional file, not the final typeset article

1 Introduction

More than 400 million adults struggle with Diabetes Mellitus, and this number is expected to reach 600 million by 2040 (International Diabetes Federation, 2017). Type 1 and Type 2 Diabetes Mellitus (T1DM, T2DM) are the two main types of diabetes; in both, the body does not properly use blood glucose for energy, which makes them a worldwide health care problem. While T1DM is mostly related to pancreatic beta cell damage, T2DM is associated with both beta cell functionality and insulin resistance (DeFronzo et al., 2015; Zheng et al., 2018). Recently, with the help of antidiabetic agents, significant progress has been made in maintaining glycemic control (blood sugar level) in T2D patients. Still, the targeted glycated hemoglobin levels could not be maintained for 40% of the adults with diabetes in the USA. The decrease in pancreatic beta cell functionality and the increase in insulin resistance of T2D patients over time eventually give rise to an imbalance of the A1C level and an antidiabetic treatment gap (Freeman, 2013). This kind of imbalance and dysfunction emerges as a result of the complex interactions among environmental and genetic risk factors. In this respect, the etiology, driving factors and genetic predispositions responsible for the increased susceptibility to T2D need to be well understood in order to develop new drugs and treatments for this disorder. In complex diseases of this kind, investigating different mechanisms of action may benefit therapeutic approaches. Therefore, post-analysis of high-throughput studies conducted at different molecular levels and the elucidation of targeted genes and pathways associated with T2D are crucial.

The widespread introduction of large-scale genetic studies has enabled researchers to investigate the genetic frameworks of complex disorders. During the last decade, genome-wide association studies (GWAS) have been widely used to identify the risk factors of complex diseases, to better understand the biological mechanisms of these diseases, and hence to help the discovery of novel therapeutic targets (Claussnitzer et al., 2020). Although GWASs have led to a remarkable range of discoveries in human genetics (Visscher et al., 2017), they have some shortcomings. One important shortcoming stems from testing each marker one at a time for association with disease. Since these studies evaluate the significance of the variants individually, they probably miss SNPs that contribute little to disease individually but might be important when acting collectively. Moreover, in traditional GWASs, the functional effects of significant SNPs, predicted at the splicing, transcriptional, translational, and post-translational levels, are usually neglected. Although GWAS have identified more than 140 independent loci influencing the risk of T2D (Bonàs-Guarch et al., 2018; Mahajan et al., 2018b, 2018a; Mercader and Florez, 2017; Scott et al., 2017; Xue et al., 2018; Zhao et al., 2017), most of these loci are driven by common variants, and a mechanistic understanding has been achieved for only a couple of the underlying genes. In this respect, post-GWAS strategies need to be developed to enlighten the molecular mechanisms underlying T2D development and progression (White et al., 2019).

Recent studies indicated that methods focusing on pathways rather than individual genes can detect significant coordinated changes, since genes act in a synergistic mode in a biological pathway (Nguyen et al., 2019). Pathway analysis can hypothetically improve the power to uncover genetic factors relevant to disease mechanisms, because identifying the accumulation of small genetic effects acting in a common pathway is often easier than mapping the individual genes within the pathway that contribute remarkably to disease susceptibility (Kao et al., 2017; Lamparter et al., 2016; Thrash et al., 2019). The profound discovery that T2D is genetically heterogeneous suggested that the genetic defects might converge on common pathways that build up the final, similar phenotype. Besides providing the opportunity to investigate additional therapies that reverse the effects of a particular genetic defect, these findings may also encourage scientists to understand the aberrant networks at genetic, cellular and physiological levels and to devise pharmacological and nonpharmacological intervention strategies.

Inspired by these findings, in this study we reanalyzed three meta-GWAS datasets of T2D, using the methodology that we have developed to identify disease-associated pathways by combining nominally significant evidence of genetic association with known biochemical pathways, protein-protein interaction (PPI) networks, and the functional information of selected SNPs (Bakir-Gungor et al., 2014).

2 Materials and Methods

2.1 Datasets

2.1.1 70K for T2D Meta-analysis data (T2D1)

Bonàs-Guarch et al. collected T2D genome-wide association study (GWAS) data representing 12,931 cases and 57,196 controls of European ancestry from the EGA and dbGaP databases (Bonàs-Guarch et al., 2018). In the 70KforT2D meta-analysis data, each dataset was quality controlled and each cohort was imputed to reference panels (1000G and UK10K). Variants that passed the filters of IMPUTE2 info score ≥ 0.7, MAF ≥ 0.001, and Hardy-Weinberg equilibrium (HWE) p > 1×10⁻⁶ in controls were meta-analyzed. For more details about the quality control procedure and the association analysis of the 70KforT2D dataset, please see (Bonàs-Guarch et al., 2018).

2.1.2 Meta-analysis of DIAGRAM, GERA, UKB GWAS datasets (T2D2)

Xue et al. performed a meta-analysis of GWAS in T2D by gathering the DIAGRAM, GERA and UKB GWAS datasets (Xue et al., 2018). In total, 62,892 cases and 596,424 controls of European ancestry were obtained after quality control and imputed to the 1000 Genomes Project reference panel. Linkage disequilibrium (LD) score regression analysis was also carried out. Variants were filtered for GERA and UKB using IMPUTE2 info score ≥ 0.3, MAF ≥ 0.01, and HWE p > 1×10⁻⁶ in controls. Further details about the DIAGRAM imputed data in stages 1 and 2, genotyping, quality control and association analysis for each dataset can be found in (Xue et al., 2018).

2.1.3 Type 2 Diabetes GWAS Meta-analysis Dataset #3 (T2D3)

Mahajan et al. collected T2D GWAS datasets from 32 studies, including 74,124 cases and 824,006 controls of European ancestry, and aggregated the data after initial analyses (Mahajan et al., 2018a). Following quality control checks, the studies were imputed using the Haplotype Reference Consortium reference panel, except for the deCODE GWAS, where a population-specific reference panel was used for imputation. For detailed information, please refer to (Mahajan et al., 2018a).

2.1.4 Protein-protein interaction (PPI) dataset

A human protein-protein interaction (PPI) network (interactome data) containing 13,460 proteins and 141,296 protein-protein interactions was derived from (Ghiassian et al., 2015) and used in the subnetwork identification steps of this study.

2.2 Methods

To enlighten the molecular mechanisms underlying T2D development and progression, we integrated different in-silico approaches that proceed in top-down and bottom-up manners, as summarized in Figure 1. By combining nominally significant evidence of genetic association with known biochemical pathways, PPI networks, and the functional information of selected SNPs, our proposed approach identifies disease-associated pathways.

2.2.1 Preprocessing

Association summary statistics for the T2D1, T2D2 and T2D3 datasets were downloaded from each project's website. These summary statistics include i) marker name as chromosome and position, ii) effect allele, iii) non-effect allele, and iv) p-value of association. To assess the collective effect of the variants with mild effects detected in GWAS, all variants were filtered using a p < 0.05 cutoff, as suggested in previous studies (Bakir-Gungor et al., 2013, 2015; Bakir-Gungor and Sezerman, 2011, 2013; Baranzini et al., 2009).
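As an illustration, the nominal-significance filter can be sketched in a few lines of Python. This is a minimal sketch and not the pipeline used in the study: the column names (`marker`, `effect_allele`, `other_allele`, `p_value`) and the toy rows are assumptions made for the example, since the actual file layouts differ between the three projects.

```python
import csv
import io

def filter_nominal(rows, cutoff=0.05):
    """Keep variants whose association p-value is below the cutoff."""
    return [r for r in rows if float(r["p_value"]) < cutoff]

# Toy summary statistics with the four fields listed above (hypothetical values).
toy = io.StringIO(
    "marker,effect_allele,other_allele,p_value\n"
    "1:1000,A,G,0.20\n"
    "1:2000,T,C,0.03\n"
    "2:3000,G,A,4e-9\n"
)
rows = list(csv.DictReader(toy))
kept = filter_nominal(rows)
print([r["marker"] for r in kept])
```

Both the mildly associated variant and the genome-wide significant one pass the filter, which is the point of the relaxed cutoff: variants with mild effects stay available for the downstream gene- and pathway-level aggregation.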

2.2.2 Assigning rsIDs to identified SNPs

While the T2D2 dataset provides the rsIDs of the identified SNPs in its summary statistics, the T2D1 and T2D3 datasets provide only chromosome and position information as the marker name of each variant. Therefore, the fast and easy variant annotation protocol introduced by (Yang and Wang, 2015) is used to assign rsIDs to the identified SNPs, using the hg19 or hg38 reference genome depending on the genomic coordinates provided in the T2D1 and T2D3 datasets.

2.2.3 Assessing the Functional Impacts of Genetic Variants

Numerous computational methods have been developed to assess the functional impact of a non-synonymous change on proteins, as reviewed in (Zeng and Bromberg, 2019). These methods can be classified as follows: i) methods that score mutations on the basis of biological principles, and ii) methods that use existing knowledge about the functional effects of mutations in the form of a training set for supervised machine learning (Carter et al., 2013). Most of these methods assign a numeric score to the non-synonymous change, indicating the predicted functional impact of an amino acid substitution. To identify likely functional missense mutations, Douville et al. developed the Variant Effect Scoring Tool (VEST), which uses Random Forest as a supervised machine learning algorithm (Douville et al., 2016). Douville et al. represent each mutation with a set of 86 quantitative features; in their training set, they used missense variants from the Human Gene Mutation Database as the positive class and common missense variants detected in the Exome Sequencing Project (ESP) as the negative class (Douville et al., 2016). Since VEST scores yield sensitivity and specificity values of 0.9, these scores are used to assess the functional impacts of genetic variants in our study.

2.2.4 Assigning SNPs to genes

Several post-GWAS studies map disease-associated SNPs to genes based on physical distance (Segrè et al., 2010), linkage disequilibrium (LD) (Pers et al., 2015), or a combination of both (Wood et al., 2014). Several methods have been proposed to aggregate SNP summary statistics into gene scores (Li et al., 2011; Liu et al., 2010; Segrè et al., 2010). Most of these methods first calculate chi-squared values by applying an inverse chi-squared quantile transformation to the SNP p-values. Then, within a window encompassing the gene of interest, some methods focus only on the most significant SNP and assign the maximum-of-chi-squares as the gene score statistic (Lee et al., 2011; Segrè et al., 2010), while others combine the results for all SNPs in the gene region using the sum-of-chi-squares statistic (Wang et al., 2011). Correcting for gene size and LD structure is also critical for computing a well-calibrated p-value for the statistic. (Lamparter et al., 2016) rigorously analyzed the effects of using the sum and the maximum of chi-squared statistics, which correspond to the average and the strongest association signals per gene, respectively. They proposed a fast and efficient methodology, Pascal, that calculates gene scores by aggregating SNP p-values from a GWAS meta-analysis while correcting for LD structure. Pascal requires only SNP-phenotype association summary statistics and does not require individual genotype data. Hence, we used this tool in our study to map SNPs to genes.
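To make the two raw statistics concrete, the following minimal sketch computes the maximum and the sum of chi-squared values from the SNP p-values that fall in one gene window. It is not Pascal: it assumes two-sided p-values and one degree of freedom per SNP, and it omits the gene-size and LD corrections that are needed to turn either statistic into a calibrated gene p-value. The function names are hypothetical.

```python
from statistics import NormalDist

def chi2_from_p(p):
    """Inverse chi-squared (1 d.o.f.) quantile transform of a two-sided p-value.

    A chi-squared variable with 1 d.o.f. is a squared standard normal, so the
    quantile is the square of the normal quantile at p/2 (the lower tail is
    used to avoid precision loss for very small p). Requires 0 < p <= 1.
    """
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z

def gene_statistics(snp_pvalues):
    """Return (max-of-chi-squares, sum-of-chi-squares) for one gene window."""
    chi2 = [chi2_from_p(p) for p in snp_pvalues]
    return max(chi2), sum(chi2)

# Hypothetical p-values of the SNPs assigned to one gene.
max_stat, sum_stat = gene_statistics([0.05, 0.5, 0.8])
print(max_stat, sum_stat)
```

The maximum reflects only the strongest SNP in the window, whereas the sum grows with every SNP that carries some signal, which is why the sum is described as the average association signal per gene.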

2.2.5 The Identification of Dysregulated Modules

High-throughput experiments enable us to gain a better understanding of the functions of the biological molecules in the cell. In addition to the individual activities of these molecules, their interactions are essential to elucidate molecular mechanisms. In this regard, human protein-protein interaction (PPI) networks represent the interactions between human proteins. By analyzing PPI networks, specific sets of proteins (modules) associated with a disease phenotype can be detected. This idea is exploited in several post-GWAS analyses (Bakir-Gungor et al., 2013, 2014, 2015; Bakir-Gungor and Sezerman, 2013; Bakir-Gungor and Sezerman, 2011; Chang et al., 2018). A PPI network can be defined as an undirected graph G = (V, E), in which the vertices or nodes (V) represent proteins and the edges (E) represent the physical interactions among proteins. A group of proteins in a PPI network that work together to carry out a specific set of functions can be defined as a subnetwork. With the idea of proteins working as a team, disease-related protein subnetwork detection has been widely investigated. Active subnetwork search algorithms were originally proposed to identify dysregulated modules in a PPI network using the gene expression values measured in a microarray study (Ideker et al., 2002). The p-values of the genes, which indicate the significance of the expression change of a gene across conditions, are mapped to the PPI network, and a search algorithm identifies dysregulated modules. Our group and several others later extended this idea to post-GWAS analyses, where the SNPs are first mapped to genes and the p-value of a gene (genotypic p-value) then indicates the significance of the gene in the genetic association study. In this study, to detect dysregulated modules, we use the following two approaches, which proceed in top-down and bottom-up manners.

2.2.5.1 Using Subnetwork Identification Algorithms (Top-down approach)

The methodology proposed by (Ideker et al., 2002) to identify active modules in PPI networks was a pioneering study in this field. While this method brings together the nodes that are highly affected by the condition under study, it also gives a chance to the neighbors of these highly affected nodes, even if they are not highly affected themselves. In this method, a scoring function is first defined for each subnetwork, and the problem then becomes a search for the subnetwork that maximizes this score. More specifically, to score a subnetwork, the genotypic p-value $p_i$ of each gene $i$ is converted to a z-score using the equation below, where $\Phi^{-1}$ denotes the inverse of the standard normal cumulative distribution function.

$$z_i = \Phi^{-1}(1 - p_i)$$

The total z-score $z_A$ of a subnetwork $A$ that includes $k$ genes is calculated as follows:

$$z_A = \frac{1}{\sqrt{k}} \sum_{i \in A} z_i$$

This score is normalized using the following equation, where $\mu_k$ and $\sigma_k$ indicate the mean and the standard deviation of the scores of random gene sets of size $k$, respectively; the subnetwork scores are thus calibrated by the Monte Carlo method.

$$s_A = \frac{z_A - \mu_k}{\sigma_k}$$

Once the subnetwork score is defined, a search strategy is needed; greedy search, genetic algorithms, and simulated annealing are popular choices in active subnetwork identification methodologies. In this study, the greedy approach is used in the search steps of the algorithm, and the subnetwork score cutoff is chosen as 3, as suggested in the original paper (Ideker et al., 2002), to select biologically meaningful subnetworks.
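The scoring step can be sketched as follows. This is a minimal illustration of the three formulas rather than the implementation used in the study: the background mean and standard deviation are estimated here by drawing random gene sets of size k, the number of samples and the seed are arbitrary choices, the gene names are hypothetical, and the network search itself (greedy expansion) is not shown.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def z_score(p):
    """z_i = Phi^{-1}(1 - p_i); requires 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)

def aggregate_z(z_values):
    """z_A = (1 / sqrt(k)) * sum of z_i over the k genes of subnetwork A."""
    return sum(z_values) / math.sqrt(len(z_values))

def calibrated_score(sub_genes, gene_p, n_samples=1000, seed=0):
    """s_A = (z_A - mu_k) / sigma_k with a Monte Carlo background.

    mu_k and sigma_k are estimated from the aggregate z-scores of randomly
    drawn gene sets that have the same size k as the subnetwork.
    """
    rng = random.Random(seed)
    all_z = [z_score(p) for p in gene_p.values()]
    k = len(sub_genes)
    z_a = aggregate_z([z_score(gene_p[g]) for g in sub_genes])
    background = [aggregate_z(rng.sample(all_z, k)) for _ in range(n_samples)]
    return (z_a - mean(background)) / stdev(background)

# Hypothetical genotypic p-values for 100 genes, evenly spread over (0, 1).
gene_p = {f"g{i}": (i + 1) / 101 for i in range(100)}
score = calibrated_score(["g0", "g1", "g2"], gene_p)
print(score)
```

A candidate subnetwork made of the three most significant genes scores well above the background of random triplets, so it would pass a cutoff of 3 in this toy setting.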

2.2.5.2 Using Network Propagation (Bottom-up approach)

Based on the idea that disease-related proteins do not concentrate in a specific region, some studies estimate dysregulated modules using the degrees of the affected nodes and the edges (protein interactions) between them. (Ghiassian et al., 2015) proposed the DIseAse MOdule Detection (DIAMOnD) algorithm, which finds dysregulated modules by adding other candidate proteins around the known disease protein clusters. Based on random walks, a walker starts from a random seed protein and moves through other nodes along the connections of the network. It is hypothesized that more frequently visited proteins are closer to seed proteins (proteins that are known to be associated with the disease). The probability that a random protein with $k$ interactions has exactly $k_s$ interactions with seed proteins is given by the hypergeometric distribution:

$$p(k, k_s) = \frac{\binom{s_0}{k_s} \binom{N - s_0}{k - k_s}}{\binom{N}{k}}$$

Here, $N$ denotes the number of proteins and $s_0$ denotes the number of seed proteins associated with a particular disease. Whether a protein in the PPI network interacts with the seed proteins merely by chance is assessed with the p-value in the equation below. In this way, starting from the seed proteins, other candidate proteins associated with the disease can be identified.

$$p\text{-value}(k, k_s) = \sum_{k_i = k_s}^{k} p(k, k_i)$$
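The two formulas translate directly into code. The sketch below evaluates the connectivity p-value of a single candidate protein; it is only an illustration of the formulas, not the DIAMOnD implementation, which additionally ranks all candidates and grows the module iteratively. The function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, ks, n_proteins, n_seeds):
    """p(k, ks): probability that exactly ks of a protein's k links reach seeds."""
    return (comb(n_seeds, ks) * comb(n_proteins - n_seeds, k - ks)
            / comb(n_proteins, k))

def connectivity_pvalue(k, ks, n_proteins, n_seeds):
    """Probability that at least ks of the k links reach seed proteins."""
    return sum(hypergeom_pmf(k, ki, n_proteins, n_seeds)
               for ki in range(ks, k + 1))

# Toy network: 10 proteins, 3 seeds; a candidate with 2 links, both to seeds.
print(connectivity_pvalue(k=2, ks=2, n_proteins=10, n_seeds=3))
```

A small p-value means that the candidate is connected to the seed proteins more often than expected for a protein of the same degree, which is the criterion used to add it to the module.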

2.2.6 Functional Enrichment

In multifactorial complex disorders, a single factor is unlikely to explain the disease mechanism. Within this scope, functional enrichment analysis focuses on the interconnection of terms and functional groups in networks to predict the pathways affected in the disease of interest. The hypergeometric test and multiple testing correction methods such as Bonferroni and Benjamini-Hochberg are used for the analyses. The hypergeometric p-value determines the significance of the enrichment of genes, above a certain threshold, from predefined functional terms.

$$P_{value} = \sum_{k=n}^{\min(g,d)} \frac{\binom{g}{k} \binom{f-g}{d-k}}{\binom{f}{d}}$$

Accordingly, important pathways in the disease and the upregulated and downregulated target genes in each pathway are predicted and given as output. In this study, ClueGO (Bindea et al., 2009) is used for the enrichment analysis, with KEGG biological pathways as reference pathways.
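The enrichment test and the Benjamini-Hochberg correction can be sketched as below. The text does not define the symbols of the formula, so the sketch assumes the usual reading: f is the number of background genes, g the number of background genes annotated to the pathway, d the size of the query gene list, and n the observed overlap. This is an illustration only and does not reproduce ClueGO's implementation or its options.

```python
from math import comb

def enrichment_pvalue(n, g, d, f):
    """Upper-tail hypergeometric p-value: P(overlap >= n).

    Assumed meaning of the symbols: f background genes, g of them annotated
    to the pathway, d genes in the query list, n genes in the overlap.
    """
    upper = min(g, d)
    tail = sum(comb(g, k) * comb(f - g, d - k) for k in range(n, upper + 1))
    return tail / comb(f, d)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example: 20 background genes, 5 in the pathway, 4 in the query, 2 shared.
print(enrichment_pvalue(n=2, g=5, d=4, f=20))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Bonferroni correction, the other method named above, would simply multiply each p-value by the number of tested pathways and cap the result at 1.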

2.2.7 Construction of Pathway Network

Figure 2 summarizes our steps for pathway-pathway biological network generation and pathway subnetwork identification. To establish a pathway network, first, the relationships between the genes and 288 KEGG biological pathways need to be analyzed. This relationship is determined by checking whether the gene of interest is found in a specific pathway. For example, if pathway i includes gene j, a value of 1 is assigned to entry (i, j) of the gene-term matrix; otherwise, a value of 0 is assigned. Hence, the gene-term matrix is a binary matrix, as shown in Figure 2. Second, the relationships between pathways need to be analyzed. For this purpose, the term-term matrix is formed from the gene-term matrix, as illustrated in Figure 2. The Kappa score is used to quantify the relationships among the pathways. The Kappa score for any two pathways A and B is given as follows:

$$G_{A,B} = \frac{CN_{1,1} + CN_{0,0}}{CN_{1,1} + CN_{0,0} + CN_{0,1} + CN_{1,0}}$$

$$C_{A,B} = \frac{(CN_{0,1} + CN_{1,1})(CN_{1,0} + CN_{1,1}) + (CN_{0,0} + CN_{1,0})(CN_{0,0} + CN_{0,1})}{(CN_{1,1} + CN_{0,0} + CN_{0,1} + CN_{1,0})^2}$$

$$K_{A,B} = \frac{G_{A,B} - C_{A,B}}{1 - C_{A,B}}$$

where $G_{A,B}$ represents the observed contingency, $C_{A,B}$ represents the random contingency and $K_{A,B}$ represents the Kappa score between pathways A and B. The counters $CN_{1,1}$, $CN_{0,0}$, $CN_{1,0}$ and $CN_{0,1}$ are calculated as follows. If the gene of interest is present in both compared pathways, the counter $CN_{1,1}$ is increased by 1. The values of the other counters are calculated in the same way. Kappa scores, which express the relationships between pairs of pathways, are obtained from the observed contingency (G) and random contingency (C) values and stored in the term-term matrix. By applying a threshold to the Kappa scores, the human KEGG pathway network is created. The pathway network generation steps are implemented in Java.
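For illustration, the Kappa computation and the thresholding step can be sketched as follows. The study's implementation is in Java; this Python sketch only mirrors the formulas above. The gene universe, the pathway names and the threshold value are hypothetical, and returning 0 when the random contingency equals 1 (where the score is undefined) is a convention chosen for this example.

```python
from itertools import combinations

def kappa_score(genes_a, genes_b, universe):
    """Kappa score between two pathways given as collections of genes."""
    a, b = set(genes_a), set(genes_b)
    c11 = c00 = c10 = c01 = 0
    for gene in universe:
        in_a, in_b = gene in a, gene in b
        if in_a and in_b:
            c11 += 1          # gene present in both pathways
        elif in_a:
            c10 += 1          # present in A only
        elif in_b:
            c01 += 1          # present in B only
        else:
            c00 += 1          # absent from both
    total = c11 + c00 + c10 + c01
    observed = (c11 + c00) / total
    chance = ((c01 + c11) * (c10 + c11)
              + (c00 + c10) * (c00 + c01)) / (total * total)
    if chance == 1.0:
        return 0.0            # undefined case; treated as 0 here by assumption
    return (observed - chance) / (1.0 - chance)

def pathway_edges(pathways, universe, threshold):
    """Connect every pair of pathways whose Kappa score reaches the threshold."""
    edges = []
    for a, b in combinations(sorted(pathways), 2):
        k = kappa_score(pathways[a], pathways[b], universe)
        if k >= threshold:
            edges.append((a, b, k))
    return edges

universe = [f"g{i}" for i in range(1, 11)]
pathways = {"P1": {"g1", "g2", "g3"}, "P2": {"g1", "g2", "g3"}, "P3": {"g9"}}
print(pathway_edges(pathways, universe, threshold=0.4))
```

Counting genes that are absent from both pathways makes the score depend on the chosen gene universe, so the same universe should be used for every pair when building the term-term matrix.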

2.2.8 The Identification of Affected Pathway Subnetworks and Pathway Clusters

To exploit the interrelated structure of the pathways, we propose applying subnetwork identification methodologies to the generated pathway network, so that disease-related affected pathway subnetworks can be identified. A classical subnetwork identification algorithm requires two inputs: i) the biological network file, and ii) the significance values of the nodes. In the regular subnetwork identification problem, (i) refers to a PPI network and (ii) refers to the significance values of the genes obtained in a microarray experiment. Here, for (i), we used the pathway network generated as described in Section 2.2.7. For (ii), we used the output of the functional enrichment step explained in Section 2.2.6, which lists the affected pathways with their p-values, indicating the importance of each pathway for the phenotype under study. Hence, to obtain the affected pathway subnetworks, the methodology described in Section 2.2.5.1 is followed, with the pathway network taking the place of the protein-protein interaction network and the significance values of the pathways taking the place of the significance values of the proteins. To select biologically meaningful subnetworks among all generated subnetworks, the subnetwork score cutoff is chosen as 3, as suggested in the original paper (Ideker et al., 2002). If the size of an identified subnetwork exceeds 50, this pathway subnetwork is further subdivided to find disease-related pathway clusters. At this step, we used a graph-theoretic clustering algorithm, Molecular Complex Detection (MCODE), to discover densely connected pathway clusters in the T2D affected pathway subnetwork (Bader and Hogue, 2003). To delineate the dense regions in a PPI network, MCODE exploits vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein. In our problem setting, the PPI network corresponds to the generated pathway network and proteins correspond to pathways. The advantage of MCODE over other graph clustering methods is that it allows i) fine-tuning of clusters of interest without considering the rest of the network and ii) inspection of cluster interconnectivity, which is relevant for pathway networks (Bader and Hogue, 2003). It uses four parameters to find clusters: the cutoff value, the K-core value, and the haircut and fluff parameters. The cutoff value sets the density of the cluster to be estimated. The K-core parameter allows weights to be assigned to the nodes, which MCODE later uses to reduce the running time. The haircut parameter, which is binary, allows the elimination of nodes considered to be topologically irrelevant. The fluff parameter allows the user to set the size of the cluster, which is estimated topologically in the default mode (Bader and Hogue, 2003). In our analyses, the default values of these parameters are used. In the last step, the identified T2D affected pathway subnetworks and pathway clusters are evaluated.

2.2.9 Pathway Scoring Algorithm

Integrating SNPs across genes and pathways in GWASs has the potential to significantly advance statistical power and to enlighten relevant biological mechanisms. However, this process is challenging because genes play multi-functional roles in several biological processes and because information about all phenotype-process pairs is inadequate. In this regard, Pascal (Pathway scoring algorithm) is a robust tool for calculating gene and pathway scores from SNP-phenotype association summary statistics (Lamparter et al., 2016). It does not require genotype data. First, it calculates gene scores by aggregating SNP p-values from a GWAS meta-analysis while correcting for LD structure. For the gene scores, the authors compared the effect of using the sum of chi-squared statistics (average association signal per gene) with the effect of using the maximum of chi-squared statistics (strongest association signal per gene) (Lamparter et al., 2016). Second, it calculates pathway scores by aggregating the scores of genes that belong to the same pathway using a modified Fisher method (Lamparter et al., 2016).

2.2.10 Comparison of the Identified Subnetworks and Pathways from Different Datasets Using Normalized Mutual Information (NMI)

To evaluate the similarity between the outputs of two different community detection algorithms, (Xuan Vinh et al., 2010) and (Tripathi et al., 2016) proposed using Normalized Mutual Information. Let U and V be the sets of subnetworks identified using different datasets. Let $U = \{U_1, \ldots, U_R\}$ denote the set of R different subnetworks identified using dataset x, and let $V = \{V_1, \ldots, V_S\}$ denote the set of S different subnetworks identified using dataset y. The contingency table (Table 1) shows the numbers of shared genes between pairs of subnetworks. In other words, $n_{ij}$ indicates the number of common genes between subnetworks $U_i$ and $V_j$; $a_i$ and $b_j$ are the row and column sums of the contingency table, and $N$ is its grand total. The entropies of the communities, $H(U)$ and $H(V)$, and the mutual information $I(U, V)$ are calculated as follows:

$$H(U) = -\sum_{i=1}^{R} \frac{a_i}{N} \log \frac{a_i}{N}$$

$$H(V) = -\sum_{j=1}^{S} \frac{b_j}{N} \log \frac{b_j}{N}$$

$$I(U, V) = \sum_{i=1}^{R} \sum_{j=1}^{S} \frac{n_{ij}}{N} \log \frac{n_{ij} / N}{a_i b_j / N^2}$$

$$NMI_{SUM} = \frac{2 \times I(U, V)}{H(U) + H(V)}$$

Here, $I(U, V)$ indicates the amount of information shared between the communities U and V. $NMI_{SUM}$ is used to compare the clusters on the range [0, 1], where the value 0 indicates no similarity between clusters (Vinh et al., 2010).
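The comparison can be sketched as below, starting from two lists of subnetworks given as gene sets. This is a minimal illustration that assumes a_i and b_j are the row and column sums of the contingency table and N is its grand total; cells with no shared genes are skipped because they contribute nothing to the sums, and returning 0 when both entropies are zero (where the ratio is undefined) is a convention chosen for this example. The function names and gene sets are hypothetical.

```python
import math

def contingency(subnets_u, subnets_v):
    """n_ij = number of genes shared by subnetworks U_i and V_j."""
    return [[len(set(u) & set(v)) for v in subnets_v] for u in subnets_u]

def nmi_sum(table):
    """NMI_SUM = 2 * I(U, V) / (H(U) + H(V)) for a contingency table."""
    n = sum(sum(row) for row in table)
    a = [sum(row) for row in table]               # row sums a_i
    b = [sum(col) for col in zip(*table)]         # column sums b_j
    h_u = -sum(x / n * math.log(x / n) for x in a if x > 0)
    h_v = -sum(x / n * math.log(x / n) for x in b if x > 0)
    mi = sum(
        nij / n * math.log((nij / n) / (a[i] * b[j] / n ** 2))
        for i, row in enumerate(table)
        for j, nij in enumerate(row)
        if nij > 0
    )
    if h_u + h_v == 0:
        return 0.0  # undefined case; treated as 0 here by assumption
    return 2 * mi / (h_u + h_v)

# Two hypothetical datasets that yield the same two subnetworks.
u = [{"g1", "g2", "g3"}, {"g4", "g5", "g6"}]
v = [{"g4", "g5", "g6"}, {"g1", "g2", "g3"}]
print(nmi_sum(contingency(u, v)))
```

Because the score depends only on the pattern of shared genes, it is unaffected by the order in which the subnetworks of either dataset are listed.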

3 Results

Based on the idea that genes and proteins perform cellular functions in a coordinated fashion, understanding the cooperation of proteins in interaction networks may help to identify candidate biomarkers. In this study, we propose an integrative approach that concurrently analyzes multiple association studies and the functional impacts of the associated variants, incorporates the interaction partners of susceptibility genes, detects a pathway network of functionally enriched pathways, and finally determines the clusters and subnetworks of affected pathways. The methodology outlined in Figure 1 is applied to three meta-analyses of GWAS data, which are introduced in Section 2.1. As summarized in Table 2, the T2D1, T2D2 and T2D3 datasets include 14,683,492, 5,053,015 and 21,635,866 SNPs, respectively. After filtering the three GWAS datasets with a p < 0.05 cutoff, the SNPs with mild effects are retained and the numbers of genetic variants are reduced to 762,111, 557,564 and 1,525,650 for the T2D1, T2D2 and T2D3 datasets, respectively. The chromosomal position, reference allele and altered allele of each genetic variant are used to assign rsIDs. In this way, 335,212 and 639,622 rsIDs are assigned for the T2D1 and T2D3 datasets, respectively, as explained in Section 2.2.2 (reference genome: hg19). The 557,564 rsIDs provided as part of the T2D2 dataset are used for further analyses. In the next step, a functional score is assigned to each SNP using VEST (Douville et al., 2016), as explained in Section 2.2.3. Weighted p-values ($p_W$) are calculated for the SNPs by combining the genetic association p-values with the functional scores (FS) as $p_W = p_{GWAS} / 10^{FS}$, as proposed by Saccone et al. (2008). Then, SNPs are mapped to 15,806, 15,460 and 17,200 genes for the T2D1, T2D2 and T2D3 datasets, respectively. Combined p-values of the 10,298 genes common to the three datasets are calculated using Fisher's combined test (Fisher, 1934); the resulting dataset is called T2D-combined (T2DC) in the rest of this paper. For the detection of dysregulated modules, top-down and bottom-up approaches are followed, as explained in Section 2.2.5.
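The two p-value operations used in this step, the functional weighting and Fisher's combined test, can be illustrated with a short standard-library sketch. The function names `weighted_p` and `fisher_combined` and the example values are hypothetical; the chi-square tail probability uses the closed form that holds for an even number of degrees of freedom (2k for k combined p-values).

```python
import math


def weighted_p(p_gwas, fs):
    """Weighted p-value p_W = p_GWAS / 10**FS (weighting as in Saccone et al., 2008)."""
    return p_gwas / 10 ** fs


def fisher_combined(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis. For even degrees of
    freedom the survival function is exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    if not pvalues or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)   # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i                        # (X/2)**i / i!
        total += term
    return math.exp(-half) * total


# Hypothetical gene-level p-values from three datasets.
print(weighted_p(0.04, 0.8))
print(fisher_combined([0.01, 0.02, 0.03]))
```

With a single p-value the combined value equals the input, which is a convenient sanity check for the closed form.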

3.1 Affected subnetworks that are identified using meta GWAS datasets

For all datasets, the genes and their significance levels are mapped to the protein-protein interaction (PPI) network, and 983, 903, 940 and 813 active protein subnetworks are identified for the T2D1, T2D2, T2D3 and T2DC datasets, respectively. The numbers of genes included in these subnetworks are depicted in Figure 3A for the 70KforT2D meta-analysis dataset (T2D1), in Figure 3B for the meta-analysis of the DIAGRAM, GERA and UKB GWAS datasets (T2D2), in Figure 3C for the T2D3 dataset, and in Figure 3D for the T2DC dataset. While most of the subnetworks include 175-250 genes in the T2D1 and T2D2 datasets, most of the subnetworks detected for the T2DC dataset include 200-250 genes. Around two thirds of the subnetworks identified for the T2D3 dataset include 150-175 genes. For each identified subnetwork, functional enrichment analysis is carried out to determine the affected pathways.

3.2 Dysregulated modules of T2D that are identified using network propagation

Known T2D genes, collected in the study of Ghiassian et al. (2015), are used as seed genes to find dysregulated modules by expanding the known disease gene clusters with additional candidate genes. That study indicated that seed proteins display unusual interaction patterns among each other, which supports the idea that disease-specific modules do not arise by chance. Connectivity significance values are calculated for all neighbors of the 73 known T2D-associated seed genes. Afterwards, the node with the most significant connectivity is added to the module, and this iteration is repeated until 200 and 500 genes are included in a module. Then, functional enrichment analysis is performed on each of these two dysregulated modules (T2D_D200, T2D_D500).
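The iterative expansion described above can be sketched as a greedy loop. This is a simplified illustration, assuming that the connectivity significance of a candidate is the hypergeometric probability of having at least its observed number of links to the current module; the names `hypergeom_sf` and `expand_module`, the toy graph, and the choice to count the seeds toward the target size are assumptions, not details taken from the study.

```python
from math import comb


def hypergeom_sf(k_s, n_nodes, n_module, degree):
    """P(X >= k_s) when `degree` links are drawn from `n_nodes` nodes,
    `n_module` of which currently belong to the module."""
    upper = min(degree, n_module)
    hits = sum(comb(n_module, x) * comb(n_nodes - n_module, degree - x)
               for x in range(k_s, upper + 1))
    return hits / comb(n_nodes, degree)


def expand_module(adjacency, seeds, target_size):
    """Greedily add the neighbor with the most significant connectivity."""
    module = set(seeds)
    added = []
    n_nodes = len(adjacency)
    while len(module) < target_size:
        candidates = {nb for node in module for nb in adjacency[node]} - module
        if not candidates:
            break  # no neighbor left to add

        def score(node):
            k_s = len(adjacency[node] & module)
            p = hypergeom_sf(k_s, n_nodes, len(module), len(adjacency[node]))
            return (p, node)  # node name breaks ties deterministically

        best = min(candidates, key=score)
        module.add(best)
        added.append(best)
    return added


# Toy undirected network (hypothetical node names).
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"A", "B"},
    "E": {"C", "F"},
    "F": {"E"},
}
print(expand_module(toy, {"A", "B"}, 4))
```

In the toy network, node D is added before node C: both have two links to the seeds, but D has a lower degree, so its two seed links are less likely under the null model.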

3.3 Affected Pathways of T2D

Based on the observation that genes almost always act cooperatively rather than independently, many different methods have been proposed to identify the biological pathways associated with a particular clinical condition and thereby facilitate the biological interpretation of high-throughput data. Here, to characterize this cooperative nature of genes and to elucidate the molecular mechanisms of T2D, we investigate the affected pathways of T2D and search for potential failures in these wiring diagrams.

3.3.1 Overrepresented Pathways of T2D Dysregulated Modules

To detect possible pathogenic pathways related to T2D, the genes listed in each dysregulated module are compared with the genes included in KEGG pathways, and the proportion of the module genes over all pathway-associated genes is calculated. Significantly affected KEGG pathways (pathways with corrected p-values < 0.05) for our dysregulated modules are appended to the list of potentially significant pathways of T2D. Table 3 presents the top 10 affected pathways that are found to be overrepresented in the dysregulated modules of the T2DC dataset. Five of these pathways are also identified in all other T2D datasets. These shared pathways are Spliceosome, Focal adhesion, soluble N-ethylmaleimide-sensitive factor attachment protein receptor (SNARE) interactions in vesicular transport, transforming growth factor-β (TGF-β) signaling, and ErbB signaling. Figures 4A and 4B depict the commonalities among the top 50 and top 100 affected pathways enriched for the dysregulated modules of the T2D1, T2D2, T2D3 and T2DC datasets, and the gold standard T2D pathways. As illustrated in Figure 4A, when the top 50 affected pathways are overlapped among all four datasets, 24 KEGG pathways are commonly observed. These pathways are Valine, leucine and isoleucine degradation, SNARE interactions in vesicular transport, Cholinergic synapse, TGF-beta signaling pathway, ErbB signaling pathway, Ubiquitin mediated proteolysis, Focal adhesion, ECM-receptor interaction, Gap junction, Spliceosome, Serotonergic synapse, Pathways in cancer, Retrograde endocannabinoid signaling, beta-Alanine metabolism, Neurotrophin signaling pathway, GABAergic synapse, Chemokine signaling pathway, Glioma, Dopaminergic synapse, Glutamatergic synapse, Endocytosis, GnRH signaling pathway, T cell receptor signaling pathway, and Fc gamma R-mediated phagocytosis. When we compare these top 50 affected pathways of the four datasets with the gold standard T2D pathway set (Yoon et al., 2018), the Valine, leucine and isoleucine degradation pathway is commonly identified (as shown in Figure 4A). The comparison of the top 100 affected pathways of these datasets with the gold standard T2D pathway set results in 8 common KEGG pathways, which are Valine, leucine and isoleucine degradation, Jak-STAT signaling pathway, Cell cycle, Glycolysis / Gluconeogenesis, Calcium signaling pathway, Insulin signaling pathway, Fatty acid metabolism, and Wnt signaling pathway (as shown in Figure 4B).
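The overrepresentation step can be illustrated with a small hypergeometric enrichment sketch. The test family, the Bonferroni correction, the background size, and all names below (`hypergeom_sf`, `enrich`, `P1`, `P2`, the `g0`...`g99` gene labels) are assumptions chosen for illustration; the correction method used in the study may differ.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable."""
    upper = min(draws, successes)
    hits = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(k, upper + 1))
    return hits / comb(population, draws)


def enrich(module, pathways, background_size, alpha=0.05):
    """Return (pathway, fraction of pathway genes in module, corrected p)
    for pathways whose Bonferroni-corrected p-value is below alpha."""
    n_tests = len(pathways)
    results = []
    for name, genes in pathways.items():
        overlap = len(module & genes)
        p = hypergeom_sf(overlap, background_size, len(genes), len(module))
        corrected = min(1.0, p * n_tests)    # Bonferroni (assumed here)
        if corrected < alpha:
            results.append((name, overlap / len(genes), corrected))
    return sorted(results, key=lambda r: r[2])


# Toy data: 100 background genes, a 10-gene module, two pathways.
module = {f"g{i}" for i in range(10)}
pathways = {
    "P1": {f"g{i}" for i in range(8)},        # fully covered by the module
    "P2": {f"g{i}" for i in range(50, 60)},   # no overlap with the module
}
significant = enrich(module, pathways, background_size=100)
print(significant)
```

The second element of each result corresponds to the proportion of identified genes within a pathway, the quantity reported in the last column of Table 3.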

3.3.2 Enriched Pathways for the Expanded Modules of T2D Seed Genes

Overrepresented pathways for the expanded modules of the 73 T2D seed genes, which include 200 and 500 genes, are identified with functional enrichment analysis. As shown in Table 4, the enrichment analysis of the T2D_D200 and T2D_D500 dysregulated modules resulted in 41 and 84 significant pathways, respectively.

3.3.3 The Pathways that are Identified Using Pathway Scoring Algorithm on T2D GWAS meta data

The pathway scoring algorithm, as explained in Section 2.2.10, is used to find potentially affected pathways for the T2D1, T2D2 and T2D3 datasets. Firstly, gene and pathway scores are computed from SNP-phenotype association summary statistics. Secondly, the calculated scores of the affected pathways for each dataset are combined with Fisher's method; consequently, 38 KEGG and 46 Reactome pathways are detected for this combined data (T2D_PC).

Table 4 lists the KEGG pathways that are commonly identified by the T2DC, T2D_D500 and T2D_PC analyses, which are described in Sections 3.3.1, 3.3.2 and 3.3.3, respectively. The affected pathways highlighted in bold refer to the gold standard KEGG pathways reported by Yoon et al. (2018). The affected pathways highlighted in italic refer to pathways that are known in the literature to be related to T2D development mechanisms, as discussed in detail in Section 4. Among the 17 gold standard KEGG pathways of T2D, the Type II diabetes mellitus, Calcium signaling, Insulin signaling, Wnt signaling, Adipocytokine signaling, and Jak-STAT signaling pathways are found with our methodology.

3.4 Shared T2D Subnetworks and Pathways Among Different GWAS meta data

3.4.1 Comparative Evaluation of Identified T2D Subnetworks for Each Dataset

The identified T2D1, T2D2, T2D3 and T2DC subnetworks (as explained in Section 3.1 and summarized in Figure 3) are compared in a pairwise manner to assess the information shared among them. Firstly, for each pair x, y of the T2D1, T2D2, T2D3 and T2DC datasets, each identified subnetwork of the T2Dx dataset is compared with each identified subnetwork of the T2Dy dataset at the gene level, and a T2Dx/T2Dy contingency table is created, as shown in Table 1. In this contingency table, each value n_ij represents the number of genes shared between the ith subnetwork of the T2Dx dataset and the jth subnetwork of the T2Dy dataset. Secondly, based on this table, the entropy values H(T2Dx) and H(T2Dy) and the mutual information value I(T2Dx, T2Dy) are computed for each x, y dataset pair. Thirdly, the normalized MI is calculated as explained in Section 2.2.10. This procedure is repeated for all pairwise combinations of the T2D datasets. Hence, similarity scores (NMI_SUM) are calculated between all pairs of datasets. The heatmap in Figure 5 illustrates the similarities of the datasets according to the strength of the NMI_SUM score. As illustrated in Figure 5A, the subnetwork similarities of the T2D1, T2D2, T2D3 and T2DC datasets fall in the range [0, 0.01]. While the highest similarity score of 0.0073 is obtained for the T2D2-T2D3 dataset pair, the lowest score of 0.0060 is obtained for the T2D1-T2DC dataset pair. In the heatmap of Figure 5, darker colors indicate higher similarity and lighter colors indicate lower similarity. The NMI_SUM scores on the diagonal of the heatmap are "whitened" for clearer visibility of the other NMI_SUM values.
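The first step of this comparison, building the contingency table of shared gene counts, can be sketched as follows. The function name `contingency_table` and the toy gene groupings are hypothetical and serve only to show the shape of the data; they are not subnetworks from the study.

```python
def contingency_table(subnets_x, subnets_y):
    """Build the table of shared gene counts between two sets of subnetworks.

    Returns (table, a, b, n), where table[i][j] = n_ij is the number of genes
    shared by the ith subnetwork of x and the jth subnetwork of y, a holds the
    row sums, b holds the column sums, and n is the grand total.
    """
    table = [[len(u & v) for v in subnets_y] for u in subnets_x]
    a = [sum(row) for row in table]
    b = [sum(col) for col in zip(*table)]
    return table, a, b, sum(a)


# Toy subnetworks (arbitrary groupings of gene symbols, for illustration only).
subnets_x = [{"INS", "GCK", "TCF7L2"}, {"PPARG", "KCNJ11"}]
subnets_y = [{"INS", "GCK"}, {"TCF7L2", "PPARG"}, {"SLC30A8"}]

table, a, b, n = contingency_table(subnets_x, subnets_y)
print(table, a, b, n)
```

Because subnetworks of the same dataset may overlap, a gene can be counted in several cells, so the grand total is not necessarily the number of distinct genes.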




3.4.2 Comparative Evaluation of Identified T2D Pathways for Each Dataset

The information shared among different methodologies (subnetwork identification, as presented in Section 2.2.5.1, and the bottom-up approach, as presented in Section 2.2.5.2) and different T2D meta-datasets is also evaluated in terms of the identified T2D pathways. The same functional enrichment analysis is applied to the subnetworks and the dysregulated modules, as explained in Section 2.2.6. In addition to the identified pathways of the T2D1, T2D2, T2D3 and T2DC datasets, the pathways identified from the T2D_D200 and T2D_D500 gene sets are also evaluated here. Firstly, for each pair x, y of T2D1, T2D2, T2D3, T2DC, T2D_D200 and T2D_D500, each identified pathway of the T2Dx dataset is compared with each identified pathway of the T2Dy dataset in terms of their common genes, and a T2Dx/T2Dy contingency table is created, as shown in Table 1. In this contingency table, each value n_ij represents the number of genes shared between the ith identified pathway of the T2Dx dataset and the jth identified pathway of the T2Dy dataset. Secondly, based on this table, the entropy values H(T2Dx) and H(T2Dy) and the mutual information value I(T2Dx, T2Dy) are computed for each x, y dataset pair. Thirdly, the normalized MI is calculated as explained in Section 2.2.10. This procedure is repeated for all pairwise combinations of the T2D datasets. Hence, similarity scores (NMI_SUM) are calculated between all pairs of datasets in terms of overrepresented pathways. Figure 5B illustrates the pathway-level similarities of T2D1, T2D2, T2D3, T2DC, T2D_D200 and T2D_D500, which fall in the range [0, 0.1]. While a maximum NMI_SUM score of 0.0658 is achieved for the T2D1-T2D3 pair, a minimum NMI_SUM score of 0.016 is obtained for the T2DC-T2D_D200 pair.

3.5 Affected Pathway Subnetworks and Pathway Clusters of T2D

We hypothesized that, similar to the dysregulated modules of proteins, dysregulated modules of pathways have a role in disease development mechanisms. To identify the affected pathway subnetworks of a disease, we proposed the methodology shown in Figure 2. Instead of a PPI network, this method requires a pathway network as the baseline. Here, we used the 288 human KEGG pathways as a reference for the generation of this biological network. To establish a pathway network, firstly, we examined the relationships between the genes and the biological pathways, as explained in Section 2.2.7. We stored these relationships in a gene-term matrix, which is a binary matrix with dimensions 6881 × 288, where 6881 is the number of distinct genes in all pathways and 288 is the number of pathways. Secondly, the relationships between the pathways are analyzed, as explained in Section 2.2.7. For this purpose, kappa statistics are used to quantify the relationships between pathways, and a term-term matrix (of size 288 × 288) is formed from the previously obtained gene-term matrix. Thirdly, to identify interrelated pathways, we experimented with different cutoff values of the kappa score. The sizes of the networks created with different threshold values are presented in Table 5. Since the node-to-edge ratio in the human PPI network is approximately 1 to 10, the kappa score threshold is set to 0.15 in this study, which yields a comparable ratio: the resulting human pathway network includes 288 pathways (nodes) and 2976 interrelations (edges).
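The construction of the pathway network from gene memberships can be sketched as follows. This is a minimal illustration, assuming that the kappa statistic is Cohen's kappa computed over binary gene membership vectors and that an edge is kept when kappa is at least the threshold; the names `kappa` and `pathway_network` and the toy pathways are hypothetical.

```python
from itertools import combinations


def kappa(genes_a, genes_b, n_genes):
    """Cohen's kappa between two pathways over a universe of n_genes genes."""
    both = len(genes_a & genes_b)
    only_a = len(genes_a) - both
    only_b = len(genes_b) - both
    neither = n_genes - both - only_a - only_b
    observed = (both + neither) / n_genes
    expected = (len(genes_a) * len(genes_b)
                + (n_genes - len(genes_a)) * (n_genes - len(genes_b))) / n_genes ** 2
    if expected == 1:
        return 0.0  # degenerate case: agreement is fully explained by chance
    return (observed - expected) / (1 - expected)


def pathway_network(pathways, threshold=0.15):
    """Return edges (p, q, kappa) for pathway pairs with kappa >= threshold."""
    universe = set().union(*pathways.values())
    edges = []
    for p, q in combinations(sorted(pathways), 2):
        score = kappa(pathways[p], pathways[q], len(universe))
        if score >= threshold:
            edges.append((p, q, score))
    return edges


# Toy pathways over eight hypothetical genes.
toy_pathways = {
    "P1": {"g1", "g2", "g3", "g4"},
    "P2": {"g3", "g4", "g5"},
    "P3": {"g6", "g7", "g8"},
}
edges = pathway_network(toy_pathways)
print(edges)
```

Raising the threshold removes edges and makes the network sparser, which is the trade-off explored when different cutoff values are compared.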

Active subnetwork identification algorithms require a biological network and significance values for its nodes, e.g. the p-values of genes obtained from microarray studies, which indicate how significantly the expression level of a gene differs between two experimental conditions. Here, our generated pathway network serves as the biological network, and the corrected hypergeometric test p-values, which indicate the importance of each pathway for T2D, serve as the significance values of the nodes. Following the methodology proposed in Figure 2, for each T2D dataset only one affected pathway subnetwork exceeded the predefined subnetwork score, as summarized in Table 5. The node and edge numbers of these identified pathway subnetworks, which are listed in Table 5, show that the nodes are densely connected to each other. Therefore, these four identified pathway subnetworks (one for each of the four datasets) were further grouped into subcategories as explained in Section 2.2.8, and the affected pathway clusters of T2D were obtained for each dataset. As shown in Table 6, 7, 9, 7 and 8 affected pathway clusters are identified for the T2D1, T2D2, T2D3 and T2DC datasets, respectively. The number of nodes (pathways) included in each cluster and the score of each pathway cluster can be found in Table 6. These results show that the initial pathway subnetwork, which is densely connected and has more than 50 nodes, is divided into smaller disease-related subnetworks, which supports the effectiveness of the developed method. The highest scoring pathway clusters of the T2D1, T2D2, T2D3 and T2DC datasets included 38, 34, 35 and 35 pathways, respectively. For each dataset, the representative networks of the identified pathway clusters are shown in Figure 6. When we analyzed the commonalities among these pathways, we observed that 29 of them were commonly identified in the T2D1, T2D2, T2D3 and T2DC datasets (Figure 7). The details of these commonly identified pathways are given in Table 7.

4 Discussion

GWASs of T2D have significantly accelerated the discovery of T2D-associated loci (Adeyemo et al., 2015; Bonnefond and Froguel, 2015; Liu et al., 2017; Meyre, 2017; Scott et al., 2017). Although the identified T2D-risk variants, including 243 loci and 403 distinct association signals, exhibit potential for clinical translation, the genome-wide chip heritability explains only 18% of T2D risk (Bonàs-Guarch et al., 2018; Mahajan et al., 2018a; Xue et al., 2018). Traditional GWASs focus on the top-ranked 'tip of the iceberg' SNPs and discard all others. Such GWAS approaches are only capable of revealing a small number of associated functions. In this regard, even though GWAS is a compelling method to detect disease-associated variants, it does not directly address the biological mechanisms underlying genetic association signals, and hence the development of novel post-GWAS analysis methodologies is needed (Lin et al., 2017; Gallagher and Chen-Plotkin, 2018; Erdmann and Zeller, 2019). In this respect, to enlighten the molecular mechanisms of T2D development, here we proposed a method that performs protein subnetwork, pathway subnetwork and pathway cluster level analyses of the SNPs that are found to be mildly associated with T2D in multiple association studies. In other words, to achieve a coherent understanding of T2D molecular mechanisms, the proposed network and pathway-based solution jointly analyzes three meta-analyses of GWAS conducted on T2D.

The baseline of our study is built on the interactions of T2D-related proteins, since proteins act as the functional base units of cells and construct the frameworks of cellular mechanisms. The protein network structure helps us to gain collective insight into biological systems. At the mesoscopic level of these protein networks, active modules are the potential intermediate building blocks between individual proteins and the global interaction network. Dysregulation of these modules is considered to have a role in disease development mechanisms. Hence, the identification of dysregulated modules of T2D helps us to understand the fundamental molecular characteristics of T2D and to discover new candidate disease genes that have a role in the regulation of T2D-related pathways. In this context, for each analyzed T2D GWAS meta-analysis dataset (the characteristics of each dataset are summarized in Table 2), 800 to 1000 dysregulated modules, each including 150 to 250 genes, are detected using a top-down approach, as explained in Section 2.2.5.1. As outlined in Figure 1, these modules are functionally enriched and the pathways that have a potential effect on T2D development are identified. As presented in Table 3, among the top 10 affected T2D pathways of the T2DC dataset, 5 pathways are commonly overrepresented for the dysregulated modules of the T2D1, T2D2, T2D3 and T2DC datasets. These five shared pathways are Spliceosome, Focal adhesion, SNARE interactions in vesicular transport, TGF-β signaling, and ErbB signaling. All of these pathways are known to have a role in T2D development mechanisms. The Spliceosome pathway has a role in the regulation of alternative splicing in insulin resistance through aberrantly spliced genes such as ANO1, GCK, SUR1 and VEGF (Costantini et al., 2011; Dlamini et al., 2017; Schmid et al., 2012). The Focal adhesion pathway is complementary to the regulation of the insulin signaling pathway. By controlling adipocyte survival, focal adhesion kinases (FAK) regulate insulin sensitivity (Luk et al., 2017). SNARE proteins contribute to the fusion mechanism of insulin secretory vesicles (Xiong et al., 2017). The study conducted by Boström et al. demonstrated that the total skeletal muscle levels of the SNARE protein SNAP23 and the SNARE-related protein Munc18C are higher in patients with type 2 diabetes, and that these levels are also correlated with markers of insulin resistance (Boström et al., 2010). The TGF-β signaling pathway has a role in inflammation through cytokines such as interleukins, tumor necrosis factors, chemokines, interferons and transforming growth factors (TGF). Insulin enhances TGF-β receptors in fibroblasts and epithelial cells. Herder et al. documented that high levels of the anti-inflammatory immune mediator TGF-β1 are correlated with T2D (Herder et al., 2009). The TGF-β signaling pathway is also shown to have a crucial role in extracellular matrix accumulation in diabetic nephropathy (Kajdaniuk et al., 2013). Akhtar et al. showed that the dysregulation of the epidermal growth factor receptor family (ErbB) triggers vascular dysfunction stimulated by hyperglycemia in T2D (Akhtar et al., 2015). The ErbB protein family is also reported to have a dual role in diabetes-triggered cardiac dysfunction (Akhtar and Benter, 2013).

While identifying active subnetworks of T2D, in addition to the top-down approach (as discussed above), we also applied a bottom-up approach, as explained in Section 2.2.5.2. The overrepresented pathways of i) the top-down approach (T2DC), ii) the bottom-up approach (T2D_D200, T2D_D500), and iii) the pathway scoring algorithm (T2D_P) are comparatively evaluated. Among these pathways, the Type II diabetes mellitus, Calcium, Insulin, Wnt, Adipocytokine and Jak-STAT signaling pathways (shown in bold in Table 4) overlap with the gold standard pathways of T2D (Yoon et al., 2018). Additionally, the pathways shown in italic in Table 4 have support from the literature, as follows. Berntorp et al. (2013) reported that T2D patients express antibodies against gonadotropin-releasing hormone (GnRH) in serum. De Souza et al. (2016) described T2D as a prognostic and risk factor for pancreatic cancer. Houtz et al. (2016) reported that paracrine neurotrophin signaling between the pancreatic vascular system and beta cells, which is triggered by glucose, has a role in insulin secretion. Ono et al. (2001) stated that the phosphatidylinositol signaling system, including the PTEN (phosphatase and tensin homologue deleted on chromosome 10) and PI3K (phosphoinositide 3-kinase) proteins, regulates glucose homeostasis and insulin metabolism. In a study performed by Dissanayake et al. (2018), cadherin-mediated adherens junction proteins are shown to have a potential regulatory role in the insulin secretion mechanism by controlling vesicle traffic in the cell. By studying different GWAS meta-analyses, Schierding and O'Sullivan (2015) indicated a spatial connection of the CELSR2-PSRC1 locus with BCAR3, which is part of the insulin signaling pathway. The post-GWAS study conducted by Liu et al. (2017) identified T2D risk pathways. Among these pathways, Type II diabetes mellitus, Calcium signaling pathway, Pancreatic cancer, MAPK signaling pathway, Chemokine signaling pathway and Tight junction were also identified in our study (p < 0.05). Another study, performed by Perry et al. (2009), analyzed T2D GWAS data and reported that the Wnt signaling pathway, Olfactory transduction, Galactose metabolism, Pyruvate metabolism, Type II diabetes and TGF-β signaling pathways are associated with T2D. Of these, the Wnt signaling and Type II diabetes pathways overlap with our findings, as shown in Table 4. The analysis of the WTCCC T2D GWAS dataset by Zhong et al. (2010) indicated 22 affected pathways in T2D. Among these pathways, Tight junction, Phosphatidylinositol signaling system, Pancreatic cancer, Adherens junction and Calcium signaling pathway are replicated in our study, as shown in Table 4.



Using the mutual information based on shared genes, the identified protein subnetworks and the affected pathways of each dataset were compared. While the NMI_SUM subnetwork scores range from 0 to 0.01, the NMI_SUM pathway scores range from 0 to 0.1 (as shown in Figure 5). Hence, we show that while subnetwork-level analyses increase the degree of irregularity, the pathway-level evaluation of different T2D GWAS meta-data and different methodologies (top-down vs. bottom-up approach) results in higher levels of conservation and yields more interpretable outcomes.

While the Type II diabetes mellitus pathway is ranked lower for the T2D1, T2D2, T2D3 and T2DC GWAS datasets (as shown in Table 7), incorporating the generated pathway network information helped us to prioritize this pathway: it is found in the highest scoring pathway cluster of each dataset. Since pathways are strongly interrelated, our proposed approach created a pathway network and identified affected pathway subnetworks and pathway clusters using multiple association studies conducted on T2D. Our approach is based on both the significance level of an affected pathway and its topological relationship with its neighboring pathways.

In conclusion, the availability of T2D GWAS meta-data and new analytical methods has provided opportunities to bridge the knowledge gap from sequence to consequence. In this study, the collective effects of T2D-associated variants are inspected using network and pathway-based approaches, and the prominent genetic association signals related to T2D biological mechanisms are revealed. We presented a comprehensive analysis of three different T2D GWAS meta-data at the protein subnetwork, pathway, and pathway subnetwork levels. To explore whether our results recapitulate the pathophysiology of T2D, we performed functional enrichment analysis on the dysregulated modules of T2D. In addition to analyzing the information shared among different datasets in terms of subnetworks, we also analyzed the shared information in terms of the identified T2D pathways. The identified pathway subnetworks, pathway clusters and affected genes within these pathways helped us to illuminate T2D development mechanisms. We hope that the affected genes and variants within these identified pathway clusters will help geneticists to generate mechanistic hypotheses, which can be targeted for large-scale empirical validation through massively parallel reporter assays at the variant level, and through CRISPR screens in appropriate cellular models and manipulation in in vivo models at the gene level.

5 Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

6 Author Contributions

BBG and MUY conceived the ideas and designed the study. BBG, MUY, GG, and MT conducted the experiments and analyzed the results. BBG, MUY, GG, and MT participated in the discussion of the results and the writing of the article. All authors read and approved the final version of the manuscript.

7 Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper. We are also very grateful to Prof. David Torrents from Barcelona Supercomputing Center for helping us with the 70KforT2D meta-analysis data. We would also like to thank Prof. Albert-László Barabási at University of Notre Dame and Dr. Michael Cusick at the Center for Cancer Systems Biology for providing us with the PPI dataset, and Dr. Gabriela Bindea from the Integrative Cancer Immunology Team of Cordeliers Research Center for her help with the ClueGO tool.

8 Figures

Figure 1. Summary of our pathway and network oriented approach to enlighten T2D mechanisms using multiple association studies.

Figure 2. Flowchart of pathway network generation and pathway subnetwork identification.

Figure 3. Numbers of genes included in the identified (A) 983 subnetworks for T2D1, (B) 903 subnetworks for T2D2, (C) 940 subnetworks for T2D3, and (D) 813 subnetworks for T2DC datasets.

Figure 4. Commonalities between (A) top 50, and (B) top 100 affected pathways identified from T2D1, T2D2, T2D3, and T2DC datasets.

Figure 5. Shared information comparison among different datasets in terms of (A) identified T2D subnetworks, and (B) identified pathways via normalized mutual information (NMI_SUM). Darker colors indicate higher similarity and lighter colors indicate lower similarity. NMI_SUM scores on the diagonal of the heatmap are "whitened" for clearer visibility of the other NMI_SUM values.

Figure 6. The representative networks of the highest scoring pathway clusters of (A) T2D1, (B) T2D2, (C) T2D3, (D) T2DC datasets, including 38, 34, 35 and 35 pathways, respectively.

Figure 7. Commonalities between the highest scoring pathway clusters of T2D1, T2D2, T2D3, and T2DC datasets.

9 Tables

Table 1. Contingency table of overlapping genes (n_ij) between subnetworks U_i and V_j, where U and V indicate the sets of subnetworks identified using datasets X and Y, respectively.

| U \ V | V_1 | V_2 | … | V_S | Sum |
|-------|------|------|---|------|-----|
| U_1 | n_11 | n_12 | … | n_1S | a_1 |
| U_2 | n_21 | n_22 | … | n_2S | a_2 |
| … | … | … | … | … | … |
| U_R | n_R1 | n_R2 | … | n_RS | a_R |
| Sum | b_1 | b_2 | … | b_S | N |


Table 2. Summary of T2D1, T2D2, T2D3, T2DC datasets, and the numbers of identified SNPs, genes, subnetworks for each dataset.

| Dataset | # of Cases | # of Controls | # of SNPs | # of SNPs (p-value < 0.05) | # of rsIDs | # of Genes | # of Subnetworks |
|---------|-----------|---------------|-----------|----------------------------|-----------|-----------|------------------|
| T2D1 | 12,931 | 57,196 | 14,683,492 | 762,111 | 335,212 | 15,806 | 984 |
| T2D2 | 62,892 | 596,424 | 5,053,015 | 557,564 | 557,564 | 15,460 | 904 |
| T2D3 | 74,124 | 824,006 | 21,635,866 | 1,525,650 | 639,622 | 17,800 | 941 |
| T2DC | - | - | - | - | - | 10,298 | 813 |

Table 3. Top 10 affected T2D pathways of T2DC dataset. Among these pathways, 5 pathways are commonly overrepresented for the dysregulated modules of T2D1, T2D2, T2D3, T2DC datasets.

| KEGG term | p-value (T2DC) | p-value (T2D1) | p-value (T2D2) | p-value (T2D3) | Rank (T2DC) | Rank (T2D1) | Rank (T2D2) | Rank (T2D3) | Number of genes (T2D_Union) | Number of genes in pathway | Fraction of identified genes in associated pathway |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Spliceosome | 8.55E-39 | 3.26E-27 | 6.95E-30 | 3.10E-41 | 1 | 15 | 8 | 5 | 104 | 127 | 0.81 |
| Focal adhesion | 7.032E-38 | 1.80E-30 | 3.82E-42 | 1.97E-54 | 2 | 10 | 1 | 1 | 172 | 200 | 0.86 |
| SNARE interactions in vesicular transport | 1.98E-35 | 1.37E-37 | 8.16E-33 | 5.41E-44 | 3 | 3 | 5 | 4 | 34 | 36 | 0.94 |
| Valine leucine and isoleucine degradation | 5.97E-35 | 3.26E-43 | 6.39E-20 | 3.34E-29 | 4 | 1 | 34 | 13 | 41 | 44 | 0.93 |
| Purine metabolism | 7.60E-34 | 5.35E-43 | 4.92E-12 | 1.29E-45 | 5 | 2 | 83 | 3 | 99 | 166 | 0.59 |
| Dopaminergic synapse | 3.26E-33 | 1.04E-20 | 9.48E-32 | 6.80E-34 | 6 | 37 | 7 | 9 | 119 | 130 | 0.91 |
| TGF-beta signaling pathway | 5.03E-29 | 8.70E-32 | 5.61E-34 | 3.23E-28 | 7 | 6 | 3 | 15 | 75 | 84 | 0.89 |
| ErbB signaling pathway | 1.59E-28 | 4.64E-31 | 1.00E-29 | 1.46E-37 | 8 | 8 | 9 | 7 | 85 | 87 | 0.97 |
| Chemokine signaling pathway | 5.23E-28 | 1.47E-21 | 1.01E-23 | 2.97E-19 | 9 | 33 | 20 | 39 | 163 | 189 | 0.86 |
| Glutamatergic synapse | 3.47E-27 | 1.97E-20 | 1.94E-29 | 3.03E-28 | 10 | 38 | 10 | 14 | 101 | 126 | 0.80 |

Table 4. Comparison of the overrepresented pathways of T2D dysregulated modules (T2DC), expanded modules of T2D seed genes (T2D_D500), and the affected pathways identified using pathway scoring algorithm (T2DP).

| KEGG term | p-value (T2DP) | p-value (T2DC) | p-value (T2D_D500) | Rank (T2DP) | Rank (T2DC) | Rank (T2D_D500) |
|---|---|---|---|---|---|---|
| Pathways in cancer | 1.42E-15 | 2.52E-20 | 1.86E-33 | 2 | 24 | 79 |
| Focal adhesion | 4.39E-14 | 7.03E-38 | 1.48E-33 | 3 | 2 | 80 |
| Type II diabetes mellitus | 4.72E-14 | 1.84E-08 | 1.81E-10 | 4 | 127 | 43 |
| Prostate cancer | 4.28E-10 | 1.19E-19 | 2.94E-29 | 7 | 27 | 73 |
| Calcium signaling pathway | 9.66E-10 | 3.71E-13 | 2.18E-08 | 9 | 77 | 33 |
| MAPK signaling pathway | 3.48E-08 | 8.59E-24 | 5.25E-27 | 10 | 14 | 71 |
| Small cell lung cancer | 7.44E-08 | 5.10E-10 | 1.79E-07 | 11 | 110 | 26 |
| Chronic myeloid leukemia | 7.78E-08 | 5.65E-19 | 1.09E-31 | 12 | 33 | 77 |
| Insulin signaling pathway | 2.12E-07 | 2.67E-14 | 2.21E-30 | 13 | 63 | 76 |
| Glioma | 3.01E-07 | 7.22E-18 | 6.81E-32 | 14 | 36 | 78 |
| Non-small cell lung cancer | 7.16E-07 | 6.51E-12 | 3.38E-26 | 15 | 87 | 70 |
| GnRH signaling pathway | 1.93E-06 | 1.81E-19 | 8.73E-20 | 17 | 29 | 62 |
| Pancreatic cancer | 2.41E-06 | 4.22E-15 | 4.55E-21 | 18 | 56 | 65 |
| Vascular smooth muscle contraction | 2.80E-06 | 1.21E-19 | 1.41E-05 | 19 | 28 | 19 |
| Leukocyte transendothelial migration | 6.45E-06 | 2.82E-13 | 2.35E-16 | 20 | 76 | 53 |
| Chemokine signaling pathway | 8.94E-06 | 5.24E-28 | 1.70E-29 | 21 | 9 | 74 |
| Gap junction | 3.33E-05 | 1.17E-20 | 5.05E-08 | 23 | 23 | 31 |
| Tight junction | 9.78E-05 | 6.68E-14 | 1.35E-09 | 25 | 67 | 39 |
| Wnt signaling pathway | 1.16E-04 | 5.63E-22 | 3.97E-06 | 26 | 21 | 22 |
| Adipocytokine signaling pathway | 1.35E-04 | 5.40E-11 | 1.35E-05 | 27 | 95 | 20 |
| Acute myeloid leukemia | 1.55E-04 | 1.08E-13 | 4.62E-21 | 29 | 72 | 63 |
| Adherens junction | 1.61E-04 | 2.81E-24 | 7.02E-24 | 30 | 12 | 67 |
| ErbB signaling pathway | 2.81E-04 | 1.60E-28 | 2.74E-54 | 32 | 8 | 83 |
| Phosphatidylinositol signaling system | 3.49E-04 | 1.91E-23 | 1.05E-02 | 33 | 16 | 2 |
| Neurotrophin signaling pathway | 3.91E-04 | 3.03E-22 | 2.08E-58 | 34 | 20 | 84 |
| Melanogenesis | 4.38E-04 | 1.81E-19 | 1.57E-07 | 36 | 30 | 27 |
| Jak-STAT signaling pathway | 4.57E-04 | 7.54E-14 | 6.66E-19 | 37 | 68 | 60 |


Table 6. Identified pathway clusters that are affected in T2D for each dataset.

                T2D1                           T2D2                           T2D3                           T2DC
# of Clusters   7                              9                              7                              8
                # of Nodes  Score of Cluster   # of Nodes  Score of Cluster   # of Nodes  Score of Cluster   # of Nodes  Score of Cluster
                38          32.919             34          30.182             35          31.412             35          31.118
                14          8.462              19          13.111             21          14.3               16          8.8
                9           4.75               15          5.286              11          5.2                16          8.533
                4           3.333              5           5                  5           4.5                11          5
                3           3                  5           4.5                4           4                  5           5
                3           3                  4           4                  4           3.333              8           4.286
                3           3                  3           3                  3           3                  4           3.333

*Cut Off Value: 0.2, Haircut: TRUE, Fluff: FALSE, K-Core: 2
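The parameter names in the footnote (Haircut, Fluff, K-Core) match those of the MCODE clustering tool, which scores a cluster as its density multiplied by its number of nodes. The scores in Table 6 are consistent with that definition when density is computed without self-loops, i.e. score = 2E / (n - 1) for a cluster with n nodes and E edges. The sketch below illustrates this; the edge counts are hypothetical values chosen to reproduce the tabulated scores, since the table does not report them.

```python
def cluster_score(num_nodes: int, num_edges: int) -> float:
    """Density-times-size score of a cluster.

    Density is taken as 2E / (n(n - 1)), i.e. without self-loops.
    This choice is an assumption that happens to match Table 6.
    """
    if num_nodes < 2:
        raise ValueError("a cluster needs at least two nodes")
    density = 2 * num_edges / (num_nodes * (num_nodes - 1))
    return density * num_nodes


# (nodes, hypothetical edge count) pairs; node counts come from Table 6,
# edge counts are inferred for illustration only.
examples = [(3, 3), (4, 5), (5, 9), (14, 55), (38, 609)]
for nodes, edges in examples:
    print(nodes, edges, round(cluster_score(nodes, edges), 3))
```

A fully connected cluster therefore scores exactly its node count, which is why several small clusters in the table have equal node and score values (3 and 3, 4 and 4, 5 and 5).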


Table 5. Node – Edge relationships in the generated pathway networks and affected pathway subnetworks.

Sizes of the generated pathway networks for different threshold values

Threshold Values (≥)   # of Nodes   # of Edges
0                      288          82944
1.21E-5                288          10904
0.05                   288          6806
0.1                    288          4617
0.15                   288          2976
0.2                    288          1866
0.25                   288          1321

Sizes of the generated highest scoring pathway subnetworks for different T2D datasets

Dataset   # of Nodes   # of Edges
T2D1      119          1356
T2D2      134          1383
T2D3      135          1441
T2DC      158          1709
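The first part of Table 5 shows how the pathway network thins out as the threshold on the pathway-pair score is raised. With 288 nodes, the 82944 edges at threshold 0 equal 288 × 288, which suggests that every ordered pair of pathways, self-pairs included, is counted. A minimal sketch of that counting, assuming the network is held as a square score matrix; the 3 × 3 matrix and the name `count_edges` are hypothetical and only illustrate the thresholding step.

```python
def count_edges(scores, threshold):
    """Count matrix entries with score >= threshold.

    Every ordered pair is counted, self-pairs included, which is the
    assumption that reproduces 288 * 288 = 82944 at threshold 0.
    """
    return sum(1 for row in scores for value in row if value >= threshold)


# Hypothetical symmetric score matrix for three pathways.
toy_scores = [
    [1.00, 0.30, 0.02],
    [0.30, 1.00, 0.12],
    [0.02, 0.12, 1.00],
]

for threshold in (0, 0.05, 0.1, 0.15, 0.2, 0.25):
    print(threshold, count_edges(toy_scores, threshold))
```

As in the table, the node count stays fixed while the edge count can only stay the same or drop as the threshold grows.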



Table 7. Common pathways of the highest scoring pathway clusters identified for the different T2D GWAS meta-analysis datasets.

                                            p-values                                    Rank
Pathway Name                                T2D1       T2D2       T2D3       T2DC       T2D1  T2D2  T2D3  T2DC
Renal cell carcinoma                        7.12E-15   1.95E-15   7.23E-13   8.14E-15   68    55    90    57
Colorectal cancer                           1.52E-12   7.53E-10   1.82E-14   3.51E-17   97    115   77    41
Hepatitis C                                 2.99E-14   1.29E-14   1.35E-18   1.59E-16   77    62    47    43
VEGF signaling pathway                      1.05E-11   1.20E-10   4.18E-12   4.15E-13   104   99    99    78
Toxoplasmosis                               2.38E-12   2.24E-12   1.30E-18   4.39E-13   99    78    48    80
Chagas disease (American trypanosomiasis)   2.10E-18   1.62E-12   3.85E-19   3.57E-15   48    76    42    54
Type II diabetes mellitus                   1.32E-12   2.68E-09   6.18E-19   1.84E-08   96    124   44    127
Chemokine signaling pathway                 1.47E-21   1.01E-23   2.97E-19   5.23E-28   33    20    39    9
Progesterone-mediated oocyte maturation     2.67E-16   3.57E-12   4.95E-16   7.25E-18   62    81    68    37
Insulin signaling pathway                   2.16E-16   1.67E-16   2.96E-18   2.67E-14   60    48    49    63
Toll-like receptor signaling pathway        1.70E-29   2.63E-11   3.20E-13   1.27E-14   13    91    85    62
Cholinergic synapse                         6.32E-35   1.17E-25   1.61E-31   4.37E-27   4     16    11    11
Neurotrophin signaling pathway              4.20E-22   3.68E-23   3.03E-31   3.02E-22   30    22    12    20
Fc gamma R-mediated phagocytosis            3.57E-19   2.88E-18   1.01E-19   1.75E-16   44    37    35    47
Osteoclast differentiation                  5.24E-22   1.28E-14   3.60E-19   3.16E-17   31    61    41    40
T cell receptor signaling pathway           3.32E-19   3.69E-21   4.49E-20   2.14E-18   43    32    33    34
Fc epsilon RI signaling pathway             3.75E-18   9.42E-16   5.92E-18   2.33E-23   52    53    52    17
Natural killer cell mediated cytotoxicity   2.61E-13   1.53E-13   2.12E-09   5.47E-12   90    69    131   86
B cell receptor signaling pathway           3.28E-19   3.39E-17   2.41E-14   1.96E-19   42    43    78    31
mTOR signaling pathway                      1.28E-12   4.34E-10   1.72E-08   1.60E-10   95    108   141   102
Non-small cell lung cancer                  7.60E-16   3.04E-11   1.86E-13   6.51E-12   65    92    82    87
ErbB signaling pathway                      4.64E-31   1.09E-29   1.46E-37   1.59E-28   8     9     7     8
Acute myeloid leukemia                      5.42E-14   1.40E-10   1.03E-11   1.08E-13   80    102   105   72
Chronic myeloid leukemia                    7.27E-20   8.58E-17   2.48E-16   5.65E-19   41    45    65    33
Melanoma                                    4.79E-14   8.51E-17   6.46E-15   1.05E-14   78    44    74    59
Prostate cancer                             1.13E-17   1.82E-13   1.12E-12   1.18E-19   53    70    93    27
Glioma                                      3.33E-21   1.67E-16   7.34E-19   7.21E-18   35    47    45    36
Endometrial cancer                          3.47E-16   1.67E-14   4.80E-13   1.62E-16   63    63    88    45
Pancreatic cancer                           6.15E-13   4.15E-14   8.21E-15   4.21E-15   94    65    75    56





