![Page 1: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/1.jpg)
L8: Part 1 Hierarchical trees Representing time
Kirill Bessonov
Nov 10th 2015
![Page 2: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/2.jpg)
Talk Plan
• Trees – Similarity assessment via trees – Phylogenetic trees vocabulary and types
• Practical on phylogenetic trees and sequence alignment – Identifying source viral sequences
• Networks – examples – main definitions – biological examples
• Practical on WGCNA package – main protocol steps – interpretation of network modules – WGCNA demo
![Page 3: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/3.jpg)
Decision Trees (DTs)
• A data structure type used in CS
• A data model
– Purpose 1: recursively partition data
• cut the data space with hyperplanes (w), each perpendicular to one feature axis
– Purpose 2: classify data
• DTs with class label at the leaf node
• E.g. a decision tree that estimates whether or not a potential customer will respond to a direct mailing
– predicted binary class: YES or NO
Source: DECISION TREES by Lior Rokach
![Page 4: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/4.jpg)
DT growth and splitting
• In top-down approach – assign all data to root node
• Select attribute(s)/feature(s) to split the node
• Splitting based on – 1 feature: univariate split
– ≥2 features: multivariate split
• Stop tree growth when
– Max depth is reached
– Splitting criterion is not met
[Figure: tree that splits on the selected feature(s), with branches X>x / X<x and Y>y / Y<y, ending in leaves (terminal nodes)]
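As a rough illustration of a univariate split, the sketch below searches for the threshold on a single feature that minimises the weighted Gini impurity of the two child nodes. The slides do not name a splitting criterion, so the use of Gini impurity, the helper names `gini` and `best_split`, and the toy mailing data are all assumptions made for this example.

```r
# Gini impurity of a vector of class labels (0 for a pure node)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Univariate split: find the threshold on one feature that minimises
# the weighted Gini impurity of the two child nodes
best_split <- function(x, y) {
  xs <- sort(unique(x))
  # candidate thresholds: midpoints between consecutive distinct values
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  impurity <- sapply(thresholds, function(thr) {
    left <- y[x < thr]
    right <- y[x >= thr]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = thresholds[which.min(impurity)],
       impurity = min(impurity))
}

# Toy data (hypothetical): customer age and response to a direct mailing
age <- c(22, 25, 31, 38, 45, 52, 60, 67)
response <- c("NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES")

stump <- best_split(age, response)
print(stump)
```

Growing a full tree would repeat this search recursively inside each child node until a stopping rule such as maximum depth applies.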
![Page 5: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/5.jpg)
Hierarchical Trees
• Trees can also be used for
– Clustering
– Hierarchy determination
• E.g. phylogenetic trees
• Convenient visualization
– effective visual condensation of the clustering results
• Gene Ontology
– Directed acyclic graph (DAG)
– Example of functional hierarchy
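To make the clustering use of trees concrete, here is a minimal sketch with base R's `hclust()`. The five one-dimensional measurements and the choice of average linkage are made up for illustration and do not come from the slides.

```r
# Toy 1-D measurements for five hypothetical samples
x <- c(a = 1, b = 2, c = 6, d = 7, e = 15)

# Pairwise Euclidean distances, then agglomerative (bottom-up) clustering
d <- dist(x)
hc <- hclust(d, method = "average")

# Cut the tree into three clusters
groups <- cutree(hc, k = 3)
print(groups)

# plot(hc) would draw the dendrogram, i.e. the condensed view of the clustering
```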
![Page 6: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/6.jpg)
GO tree example
![Page 7: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/7.jpg)
Phylogenetic trees
• Show evolutionary relationships
• Taxon (plural: taxa)
– Group of organisms
• Clade
– A group of organisms consisting of a common ancestor and all of its descendants
• Common ancestor
– an ancestor that the given organisms have in common
[Figure: tree with one clade highlighted]
![Page 8: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/8.jpg)
Building a phylo tree using ape
• Ape - Analyses of Phylogenetics and Evolution
– Functions to create and manipulate phylo trees
– Graphical exploration of phylogenetic data
• To build a phylogenetic tree
1. Download protein sequences from DB
2. Align sequences
3. Calculate pairwise distance using ape
4. Visualize a phylogenetic tree
![Page 9: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/9.jpg)
Building an unrooted phylogenetic tree (1)
#install req. libraries
install.packages("seqinr");
#biocLite() has been retired; Bioconductor packages are now installed via BiocManager
install.packages("BiocManager");
BiocManager::install("muscle");
install.packages("ape");
library("seqinr");
library("muscle");
library("ape");
multipleSeqAlignment <- function (seqnames, seqs){
  #collapse each sequence (a vector of residues) into a single string
  seq_strings = vapply(seqs, paste, character(1), collapse="");
  fasta_seqs_Object = Biostrings::AAStringSet(seq_strings);
  names(fasta_seqs_Object) = seqnames;
  #multiple sequence alignment
  alignment = muscle::muscle(fasta_seqs_Object); #muscle format
  #convert to the "alignment" class expected by dist.alignment()
  alignment_ape = ape::as.alignment(as.matrix(alignment));
  return (alignment_ape)
}
![Page 10: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/10.jpg)
Building an unrooted phylogenetic tree (2)
#main part of the code
choosebank("swissprot") #selects database for query
seqnames <- c("P06747", "P0C569", "O56773", "Q5VKP1");
seqs=list();
for(i in 1:length(seqnames)){
query <- query(paste("AC=",seqnames[i],sep=""));
seqs[i]=getSequence(query);
}
#multipleSeqAlignment() is defined on previous slide
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape);
#nj() performs the neighbor-joining tree estimation by Saitou and Nei
mytree <- nj(mydist);
#relabel the tips; the order must match the current mytree$tip.label
mytree$tip.label=c("Q5VKP1-\nWestern Caucasian bat virus\nphosphoprotein","P06747-\nrabies virus\nphosphoprotein","P0C569-\nMokola virus\nphosphoprotein","O56773-\nLagos bat virus\nphosphoprotein");
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=0.8, no.margin=T, srt=50);
![Page 11: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/11.jpg)
Unrooted Phylogenetic Tree
• Phylogenetic tree showing the distances between 4 viral protein sequences
• The genetic distance between O56773 and P0C569 is the smallest
![Page 12: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/12.jpg)
Unrooted phylogenetic tree (1)
• The lengths of the branches
– proportional to the amount of evolutionary change
• estimated by number of mutations
• This is an unrooted phylogenetic tree
– it does not contain an outgroup sequence
• an outgroup is a sequence of a protein that is known to be more distantly related to the other proteins in the tree than they are to each other
• without an outgroup, the root (i.e. the common ancestor of all taxa) cannot be placed
![Page 13: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/13.jpg)
Unrooted phylogenetic tree (2)
• We cannot tell in which direction evolutionary time ran along the internal branches of the tree
• We cannot tell whether
– the node representing the common ancestor of (O56773, P0C569) was an ancestor of the node representing the common ancestor of (Q5VKP1, P06747),
– or the other way around
![Page 14: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/14.jpg)
Distance matrix
• Inspecting the calculated distance matrix between the aligned sequences confirms the results seen in the phylogenetic tree
• The closest pair is the O56773 and P0C569 proteins (distance 0.41)
         Q5VKP1  P06747  P0C569
P06747   0.49
P0C569   0.48    0.45
O56773   0.50    0.46    0.41
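The closest pair can also be read off programmatically. The sketch below re-enters the distances from the slide by hand (it does not rerun the alignment) and looks up the smallest off-diagonal entry; the variable names are illustrative.

```r
# Distance values copied from the slide (lower triangle, column by column)
ids <- c("Q5VKP1", "P06747", "P0C569", "O56773")
m <- matrix(0, 4, 4, dimnames = list(ids, ids))
m[lower.tri(m)] <- c(0.49, 0.48, 0.50,   # distances to Q5VKP1
                     0.45, 0.46,         # distances to P06747
                     0.41)               # O56773 vs P0C569
m <- m + t(m)   # mirror to get a symmetric matrix

# Ignore the zero diagonal when searching for the minimum
diag(m) <- NA
idx <- which(m == min(m, na.rm = TRUE), arr.ind = TRUE)[1, ]
closest <- sort(c(rownames(m)[idx["row"]], colnames(m)[idx["col"]]))
print(closest)
```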
![Page 15: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/15.jpg)
Rooted phylogenetic tree
• To convert the unrooted tree into a rooted tree, we need to add an outgroup sequence
– Outgroup
• a taxon outside the group of interest
• will branch off at the base of the phylogeny
• Represented here by
– Caenorhabditis elegans (UniProt accession Q10572) and
– Caenorhabditis remanei (UniProt accession E3M2K8)
• If we were to build a phylogenetic tree of the Fox-1 homologues in vertebrates, the distantly related sequence from worms would probably be a good choice of outgroup
– this protein is from a different taxon/group (worms)
![Page 16: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/16.jpg)
Building a rooted phylogenetic tree (1)
#BUILDING ROOTED TREE OF PROTEIN SEQUENCES (FOX1)
#Q9NWB1 - Human
#Q17QD3 - Cow
#Q95KI0 - Monkey
#A1A5R1 - Rat
#Q10572 - Worm C.elegans(Root)
#E1G4K8 - Eye worm
seqnames <- c("Q9NWB1","Q17QD3","Q95KI0","A1A5R1","Q10572","E1G4K8")
choosebank("swissprot") #selects database for query
seqs=list()
for(i in 1:length(seqnames)){
query <- query(paste("AC=",seqnames[i],sep=""))
seqs[i]=getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape)
![Page 17: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/17.jpg)
Building a rooted phylogenetic tree (2)
library("ape")
mytree <- nj(mydist)
mytree$tip.label=c("E1G4K8-Eye worm ", "Q10572-C.elegans(Root)",
"A1A5R1-Rat", "Q9NWB1-Human", "Q17QD3-Cow", "Q95KI0-Monkey")
myrootedtree <- root(mytree, outgroup="Q10572-C.elegans(Root)",
resolve.root=TRUE)
#Phylogenetic tree with 6 tips and 5 internal nodes.
#Tip labels:
#[1] "E1G4K8" "Q8WS01" "Q9VT99" "A8NSK3" "Q10572" "E3M2K8"
#Rooted; includes branch lengths.
plot.phylo(myrootedtree, edge.color = "blue", edge.width = 3 ,
type="p")
![Page 18: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/18.jpg)
Rooted tree of FOX1 proteins
• The invertebrates are grouped together
• Worms form a distinct group yet with large genetic distance
• Human FOX1 is closest to monkey and cow sequences
outgroup (worms)
![Page 19: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/19.jpg)
Distance matrix
         E1G4K8  Q10572  A1A5R1  Q9NWB1  Q17QD3
Q10572   0.72
A1A5R1   0.75    0.63
Q9NWB1   0.72    0.62    0.44
Q17QD3   0.73    0.62    0.50    0.28
Q95KI0   0.73    0.61    0.49    0.28    0.14
• As expected, eye worms are the most distantly related species to the vertebrates
• Cow and monkey have the closest relationship, i.e. the lowest genetic distance (0.14)
Table legend:
Q9NWB1 – Human
Q95KI0 – Monkey
Q10572 – Worm C. elegans (Root)
Q17QD3 – Cow
A1A5R1 – Rat
E1G4K8 – Eye worm
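The rooting step can be tried without database access by typing in the distance matrix above. This is only a sketch: it skips the download and alignment steps, reuses the rounded distances from the slide, and assumes the `ape` package is installed. `nj()`, `root()` and `is.rooted()` are ape functions; the variable names are illustrative.

```r
library(ape)

# Distances copied from the slide (lower triangle, column by column)
ids <- c("E1G4K8", "Q10572", "A1A5R1", "Q9NWB1", "Q17QD3", "Q95KI0")
m <- matrix(0, 6, 6, dimnames = list(ids, ids))
m[lower.tri(m)] <- c(0.72, 0.75, 0.72, 0.73, 0.73,   # distances to E1G4K8
                     0.63, 0.62, 0.62, 0.61,         # distances to Q10572
                     0.44, 0.50, 0.49,               # distances to A1A5R1
                     0.28, 0.28,                     # distances to Q9NWB1
                     0.14)                           # Q95KI0 vs Q17QD3
m <- m + t(m)

# Neighbour-joining tree, then rooting on the C. elegans outgroup
tree <- nj(as.dist(m))
rooted <- root(tree, outgroup = "Q10572", resolve.root = TRUE)
print(is.rooted(rooted))

# plot(rooted) would draw the rooted tree
```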
![Page 20: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/20.jpg)
Rooted tree
• Time runs from left to right
• Monkey, Cow and Human have common ancestor 3
• Ancestor 1 is common to ancestors 2 and 3
TIME
![Page 21: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/21.jpg)
Exercises on phylogenetic tree building
• Q1. Calculate the genetic distances between the following NS1 proteins from different Dengue virus strains: Dengue virus 1 NS1 protein (UniProt: Q9YRR4), Dengue virus 2 NS1 protein (UniProt: Q9YP96), Dengue virus 3 NS1 protein (UniProt: B0LSS3), and Dengue virus 4 NS1 protein (UniProt: Q6TFL5). Which viruses are the most closely related, and which are the least closely related, based on the genetic distances?
Note: Dengue virus causes Dengue fever, which is classified by the WHO as a neglected tropical disease. There are four main types of Dengue virus: Dengue virus 1, Dengue virus 2, Dengue virus 3, and Dengue virus 4.
• Q2. Build an unrooted phylogenetic tree of the NS1 proteins from Dengue virus 1, Dengue virus 2, Dengue virus 3 and Dengue virus 4, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?
![Page 22: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/22.jpg)
Exercises on phylogenetic tree building
• Q3. The Zika virus is related to Dengue viruses, but is not a Dengue virus, and can therefore be used as an outgroup in phylogenetic trees of Dengue virus sequences. UniProt accession Q32ZE1 consists of a sequence with similarity to the Dengue NS1 protein, so it seems to be a related protein from Zika virus. Build a rooted phylogenetic tree of the Dengue NS1 proteins based on an alignment, using the Zika virus protein as the outgroup. Which are the most closely related Dengue virus proteins, based on the tree? What extra information does this tree tell you, compared to the unrooted tree in Q2?
![Page 23: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/23.jpg)
Answers
Question 1: Summary of viral proteins and UniProt accession numbers:
Q9YRR4 – Dengue virus 1 NS1 protein
Q9YP96 – Dengue virus 2 NS1 protein
B0LSS3 – Dengue virus 3 NS1 protein
Q6TFL5 – Dengue virus 4 NS1 protein
seqnames <- c("Q9YRR4","Q9YP96","B0LSS3","Q6TFL5")
choosebank("swissprot") #selects database for query
seqs=list()
for(i in 1:length(seqnames)){
query <- query(paste("AC=",seqnames[i],sep=""))
seqs[i]=getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape);
mydist
![Page 24: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/24.jpg)
Answers
• Q1. The distance matrix is as follows
         Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4   0.306
Q9YP96   0.333   0.254
B0LSS3   0.297   0.230   0.227
The most distant are Q9YP96 (V2) and Q6TFL5 (V4) with a genetic distance of 0.333, while the most closely related are Q9YP96 (V2) and B0LSS3 (V3) with a genetic distance of 0.227
![Page 25: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/25.jpg)
Answers
Question 2:
library("ape")
mytree <- nj(mydist)
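#plotting unrooted tree based on alignment of whole protein sequences
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=1.2,
no.margin=T, srt=0)
#trim each sequence to the region between the DMGY and GEDG motifs to drop the gap-rich ends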
seqs_trim=seqs
for(i in 1:length(seqs)){
start=regexpr("DMGY", paste(seqs_trim[[i]],collapse="") ) [1]
stop=regexpr("GEDG", paste(seqs_trim[[i]],collapse="") ) [1]
seqs_trim[[i]]=seqs_trim[[i]][start:stop]
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim);
mydist <- dist.alignment(alignment_ape);mydist
library("ape")
mytree <- nj(mydist)
#plotting unrooted tree based on the trimmed (best aligned) portion of the sequences
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=1.2,
no.margin=T, srt=0)
![Page 26: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/26.jpg)
Answers
Question 2 (continued):
alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim);
mydist <- dist.alignment(alignment_ape);mydist
library("ape")
mytree <- nj(mydist)
#tree based on the best aligned portion
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=1.2,
no.margin=T, srt=0)
![Page 27: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/27.jpg)
Answers
• The resulting Q2 unrooted tree
This unrooted tree agrees with the genetic distance matrix calculated in Q1. The tree suggests that B0LSS3 and Q9YP96 are the most closely related proteins. To improve the quality of the tree, it is best to select a region that has a minimal number of gaps between the protein sequences. For how gap cleaning affects phylogenetic tree performance, see reference [2].
Below you can see that there are regions with lots of gaps. Let's build another tree based on the most conserved region (bolded on the original slide) to see if it is the same
Q6TFL5 DMGCVVSWNGKELKC…KDQKAVHADMGYWIESSKNQTWQIEKASLIEVKTCLWPKTHTL…GMEIRPLSEKEENMVKSQVTA
Q9YRR4 ------------------------DMGYWIESEKNETWKLARASFIEVKTCIWPKSHTL…GMEI-----------------
Q9YP96 DSGCVVSWKNKELKC…KDNRAVHADMGYWIESALNDTWKIEKASFIEVKNCHWPKSHTL…GMEIRPLKEKEENLVNSLVTA
B0LSS3 --------------------ASHADMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTL…------------------------
Alignment of proteins: Built using the full lengths of proteins
![Page 28: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/28.jpg)
Answers
• The resulting tree looks the same, but we achieved better overall resolution between the proteins
Whole protein sequences used:
         Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4   0.306
Q9YP96   0.332   0.254
B0LSS3   0.297   0.230   0.227
Best aligned portion of protein sequences used (built using the bolded region):
         Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4   0.317
Q9YP96   0.317   0.264
B0LSS3   0.292   0.233   0.216
![Page 29: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/29.jpg)
Answers
Question 3:
#Q3 building rooted tree based on Q89277 (yellow fever virus) as out group
library("seqinr")
library("muscle")
library("ape")
seqnames <- c("Q9YRR4","Q9YP96","B0LSS3","Q6TFL5", "Q89277")
choosebank("swissprot") #selects database for query
seqs=list()
for(i in 1:length(seqnames)){
query <- query(paste("AC=",seqnames[i],sep=""))
seqs[i]=getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape);mydist
library("ape")
mytree <- nj(mydist)
myrootedtree <- root(mytree, outgroup="Q89277", r=TRUE)
plot.phylo(myrootedtree ,type="p", edge.color = "blue", edge.width = 3,
cex=1.2, no.margin=T, srt=0)
![Page 30: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/30.jpg)
Answers
• The Q3 answer builds a rooted tree using yellow fever virus (Q89277) as the outgroup
• Most closely related viruses:
– B0LSS3 and Q9YP96
• This rooted tree tells you which of the Dengue virus NS1 proteins branched off earliest from the ancestors. An unrooted tree does not provide ancestry information (i.e. time sequence)
         Q89277  Q6TFL5  Q9YRR4  Q9YP96
Q6TFL5   0.523
Q9YRR4   0.511   0.306
Q9YP96   0.486   0.333   0.254
B0LSS3   0.487   0.297   0.230   0.227
[Figure label: outgroup]
![Page 31: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/31.jpg)
References
1. ape library for phylogenetic trees and ancestry with bootstrap methods: http://cran.r-project.org/web/packages/ape/ape.pdf
2. Gerard Talavera and Jose Castresana (2007). Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments. Systematic Biology 56(4): 564-577
![Page 32: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/32.jpg)
L8: Part 2 Networks of Biological interactions
Kirill Bessonov
Nov 10th 2015
![Page 33: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/33.jpg)
We are surrounded by networks
![Page 34: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/34.jpg)
![Page 35: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/35.jpg)
Transportation Networks
![Page 36: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/36.jpg)
Computer Networks
![Page 37: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/37.jpg)
Social networks
![Page 38: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/38.jpg)
Internet submarine cable map
![Page 39: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/39.jpg)
From describing to engineering
• In 1950
– Alex Bavelas founds the Networks Laboratory Group at M.I.T. to study effectiveness of different communication patterns
![Page 40: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/40.jpg)
Social interaction patterns
![Page 41: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/41.jpg)
PPI (Protein-Protein Interaction) Networks
• Nodes – protein names
• Links – physical binding event
![Page 42: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/42.jpg)
Network Definitions
![Page 43: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/43.jpg)
Network components
• Networks also called graphs
– Graph (G) contains
• Nodes (N): genes, SNPs, cities, PCs, etc.
• Edges (E): links connecting two nodes
![Page 44: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/44.jpg)
Some characteristics
• Networks are
– Complex
– Dynamic
– Can be used to reduce data dimensionality
[Figure: the same network shown at time = t0 and at time = t]
![Page 45: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/45.jpg)
Topology
• Refers to connection pattern
– The pattern of links
![Page 46: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/46.jpg)
Small-world networks
• Six degrees of separation – everyone is 6 or fewer steps away from each other
• Reference: Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of ‘small-world’ networks." Nature 393.6684 (1998): 440-442.
![Page 47: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/47.jpg)
Scale-free networks
• Biological processes are characterized by this topology
– Few hubs (highly connected nodes)
– Predominance of poorly connected nodes
– New vertices attach preferentially to highly connected ones
• Barabási, Albert-László, and Réka Albert. "Emergence of scaling in random networks." Science 286.5439 (1999): 509-512.
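The preferential attachment rule can be simulated in a few lines of base R. The sketch below is a simplified variant in which every new node brings exactly one link; the network size, the seed and the variable names are arbitrary choices for illustration, not values from the cited paper.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

n <- 200                      # final number of nodes (arbitrary)
degree <- integer(n)
edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)

# Start with two connected nodes
edges[1, ] <- c(1L, 2L)
degree[1:2] <- 1L

# Each new node v attaches to one existing node, chosen with
# probability proportional to that node's current degree
for (v in 3:n) {
  old <- sample.int(v - 1, size = 1, prob = degree[1:(v - 1)])
  edges[v - 1, ] <- c(old, v)
  degree[old] <- degree[old] + 1L
  degree[v] <- 1L
}

# Many nodes with few links, few nodes with many links
print(table(degree))
```

Printing `table(degree)` typically shows the pattern described above: a large number of poorly connected nodes and a handful of hubs.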
![Page 48: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/48.jpg)
Modules
• Sub-networks with
– Specific topology
– Function
• Biological context
– Protein complex
– Common function
• E.g. energy production
[Figure label: clique]
![Page 49: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/49.jpg)
Edge Types
• Graph: N nodes, E edges
• Edges are either directed or undirected
![Page 50: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/50.jpg)
Network types
• Directed
– Edges have directionality
– Some links are unidirectional
– Direction matters
• Going A → B is not the same as B → A
– Analogous to chemical reactions
• Forward rate might not be the same as reverse
– E.g. directed gene regulatory networks (TF → gene)
• Undirected
– Edges have no directionality
– Simpler to describe and work with
– E.g. co-expression networks
![Page 51: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/51.jpg)
Neighbours of node(s)
• Neighbours(node, order) = {node1 … nodep}
• Neighbours(3,1) = {2,4}
• Neighbours(2,2) = {1,3,5,4}
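A neighbourhood of a given order can be computed by expanding outwards one step at a time. The figure itself is not part of this transcript, so the edge list below is an assumption: a 6-node undirected graph chosen to be consistent with the values stated on the slides. The function name `neighbours` mirrors the slide notation but is otherwise hypothetical.

```r
# Edge list ASSUMED to match the slide's 6-node example graph
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5),
               c(3, 4), c(4, 5), c(4, 6))

# Nodes reachable from `node` in at most `order` steps (excluding itself)
neighbours <- function(edges, node, order = 1) {
  frontier <- node
  seen <- node
  for (step in seq_len(order)) {
    # nodes linked to any node in the current frontier (undirected edges)
    nxt <- c(edges[edges[, 1] %in% frontier, 2],
             edges[edges[, 2] %in% frontier, 1])
    frontier <- setdiff(nxt, seen)
    seen <- union(seen, frontier)
  }
  sort(setdiff(seen, node))
}

print(neighbours(edges, 3, 1))
print(neighbours(edges, 2, 2))
```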
![Page 52: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/52.jpg)
Reachability of two nodes i and j
• Walk
– Sequence of visited nodes on the way from node i to node j
– e.g. nodes(1,2) = {5,2,1,2,3,4,5,2} (the visited nodes)
• Trail
– a walk with no repeated edges
– e.g. nodes(1,4) = {5,4}
• Path
– a walk with no repeated nodes
– e.g. nodes(1,6) = {5,4,6}
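The three definitions differ only in what may repeat, which a short check makes explicit. The edge list is again an assumed 6-node undirected graph consistent with the slide examples, and `linked` and `classify` are hypothetical helper names. Node sequences here include the start node, unlike the slide notation.

```r
# Edge list ASSUMED to match the slide's 6-node example graph
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5),
               c(3, 4), c(4, 5), c(4, 6))

# TRUE if nodes a and b are joined by an (undirected) edge
linked <- function(a, b) {
  any((edges[, 1] == a & edges[, 2] == b) |
      (edges[, 1] == b & edges[, 2] == a))
}

# Return the most specific label that applies to a node sequence
classify <- function(nodes) {
  from <- head(nodes, -1)
  to <- tail(nodes, -1)
  if (!all(mapply(linked, from, to))) return("not a walk")
  # undirected edge key: smaller node id first
  keys <- paste(pmin(from, to), pmax(from, to))
  if (anyDuplicated(nodes) == 0) return("path")    # no repeated nodes
  if (anyDuplicated(keys) == 0) return("trail")    # no repeated edges
  "walk"
}

print(classify(c(1, 5, 2, 1, 2, 3, 4, 5, 2)))   # slide's walk example
print(classify(c(1, 5, 4, 6)))                  # slide's path example
print(classify(c(1, 2, 5, 4, 3, 2)))            # node 2 repeats, no edge repeats
```

Every path is also a trail and every trail is also a walk; the function simply reports the strictest category.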
![Page 53: L8: Part 1 Hierarchical trees Representing timekbessonov/present_data/GBIO... · 2015-11-10 · The Zika virus is related to Dengue viruses, but is not a Dengue virus, ... (V2) and](https://reader033.vdocuments.net/reader033/viewer/2022050116/5f4d50bad0d9a233267ffa2d/html5/thumbnails/53.jpg)
Connectivity
• Line (edge) connectivity (λ)
– Minimum number of lines (edges) that need to be removed to disconnect graph G
• i.e. after their removal at least one node can no longer be reached from the rest of the graph
• Node connectivity (κ)
– Minimum number of nodes that need to be removed to disconnect graph G
λs = 3 and κs = 2
λt = 3 and κt = 2
Connectivity matrix (also known as adjacency matrix)
• A = [aij], of size n × n for a graph with n nodes
• Entries are binary or weighted
Node degree (k)
• the number of edges connected to the node
• k(6) = 1
• k(4) = 3
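Node degree follows directly from the adjacency matrix as a row sum. The sketch below assumes a six-node undirected graph that is consistent with the degrees quoted above (the original figure is not reproduced), and also tabulates the proportion of nodes with each degree.

```r
# Assumed example graph (undirected), consistent with k(6) = 1 and k(4) = 3.
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))

# Binary adjacency matrix A = [a_ij] of size n x n.
A <- matrix(0, 6, 6, dimnames = list(1:6, 1:6))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Degree of each node = number of edges attached to it = row sum of A.
k <- rowSums(A)
k[["6"]]  # 1
k[["4"]]  # 3

# Proportion of nodes having each degree value.
degree_distribution <- table(k) / length(k)
degree_distribution
```

For a weighted network the same row sum gives the weighted connectivity instead of an edge count.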
Degree distribution (P(k))
• Determines the statistical properties of uncorrelated networks
source: http://www.network-science.org/powerlaw_scalefree_node_degree_distribution.html
Topologies: scale-free
Most real networks have a degree distribution that follows a power law
Power laws also describe:
• the sizes of earthquakes
• craters on the moon
• solar flares
• the sizes of activity patterns of neuronal populations
• the frequencies of words in most languages
• frequencies of family names
• sizes of power outages
• criminal charges per convict
• and many more
Topology: random
Degree distribution of nodes is statistically independent
Shortest path (p)
• Indicates the distance between i and j in terms of geodesics (unweighted)
• Candidate paths from node 1 to node 3:
– {1-5-4-3}
– {1-5-2-3}
– {1-2-5-4-3}
– {1-2-3}
• p(1,3) = {1-2-3}, the candidate with the fewest edges (2)
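For an unweighted graph the geodesic distance can be found with a breadth-first search. The following base-R sketch assumes the six-node graph implied by the paths listed above; `shortest_path_length` is an illustrative name, not a function from any package used in this lecture.

```r
# Assumed example graph (undirected), consistent with the listed paths.
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Breadth-first search: number of edges on a shortest path from `from` to `to`.
# Returns NA when `to` cannot be reached.
shortest_path_length <- function(A, from, to) {
  dist <- rep(NA_integer_, nrow(A))
  dist[from] <- 0L
  queue <- from
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    for (nb in which(A[current, ] > 0)) {
      if (is.na(dist[nb])) {          # first visit gives the shortest distance
        dist[nb] <- dist[current] + 1L
        queue <- c(queue, nb)
      }
    }
  }
  dist[to]
}

shortest_path_length(A, 1, 3)  # 2, i.e. the path 1-2-3
shortest_path_length(A, 1, 6)  # 3, i.e. the path 1-5-4-6
```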
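Betweenness centrality
• the ratio between all shortest paths (SPs) that pass through node i and all shortest paths existing in graph G, summed over the node pairs (j, k):
$$b_i = \sum_{j<k;\ j,k \neq i} \frac{b_{jk}(i)}{b_{jk}}, \qquad b_{jk}(i) = \#\,\text{SPs from } j \text{ to } k \text{ via } i, \quad b_{jk} = \#\,\text{SPs from } j \text{ to } k$$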
Facebook academic network
Blue is low and red is high betweenness
Betweenness centrality
• reflects the amount of control a node has over the interactions of other nodes in the network
• Possible node combinations: {AB, AD, AE, AC, BD, BE, BC, CD, CE, DE}
• For node c, only the pairs that exclude c contribute:
$$b_c = \frac{b_{ab}(c)}{b_{ab}} + \frac{b_{ae}(c)}{b_{ae}} + \frac{b_{ad}(c)}{b_{ad}} + \frac{b_{be}(c)}{b_{be}} + \frac{b_{bd}(c)}{b_{bd}} + \frac{b_{de}(c)}{b_{de}} = \frac{0}{1} + \frac{1}{2} + \frac{0}{1} + \frac{1}{2} + \frac{0}{1} + \frac{0}{1}$$
• bc = 1
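The same pair-by-pair sum can be computed for every node at once. The sketch below is a brute-force illustration in base R, not an optimized algorithm. The five-node a-e graph of the worked example is not reproduced in the text, so a six-node graph is assumed instead, and the helper names `bfs_counts` and `betweenness` are hypothetical.

```r
# Assumed example graph (undirected); not the a-e graph of the worked example.
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# BFS from one source: distance and number of shortest paths to every node.
bfs_counts <- function(A, source) {
  n <- nrow(A)
  dist <- rep(Inf, n)
  sigma <- rep(0, n)
  dist[source] <- 0
  sigma[source] <- 1
  queue <- source
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    for (w in which(A[v, ] > 0)) {
      if (is.infinite(dist[w])) {     # first visit fixes the distance
        dist[w] <- dist[v] + 1
        queue <- c(queue, w)
      }
      if (dist[w] == dist[v] + 1) {   # v precedes w on a shortest path
        sigma[w] <- sigma[w] + sigma[v]
      }
    }
  }
  list(dist = dist, sigma = sigma)
}

# b_i = sum over pairs j < k (both different from i) of b_jk(i) / b_jk.
betweenness <- function(A) {
  n <- nrow(A)
  res <- lapply(seq_len(n), function(s) bfs_counts(A, s))
  D <- t(sapply(res, `[[`, "dist"))    # D[j, k] = shortest distance
  S <- t(sapply(res, `[[`, "sigma"))   # S[j, k] = number of shortest paths
  b <- numeric(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n - 1)) {
      for (k in (j + 1):n) {
        if (i == j || i == k) next
        # i lies on a shortest j-k path when the distances add up
        if (is.finite(D[j, k]) && D[j, i] + D[i, k] == D[j, k]) {
          b[i] <- b[i] + S[j, i] * S[i, k] / S[j, k]
        }
      }
    }
  }
  b
}

b <- betweenness(A)
b                                            # 0.0 1.5 1.0 4.5 3.0 0.0
b / ((nrow(A) - 1) * (nrow(A) - 2) / 2)      # divided by (n-1)(n-2)/2
```

In the assumed graph node 4 scores highest because every shortest path to node 6 has to pass through it.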
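Betweenness centrality standardized
• For standardization
– the denominator is (n-1)(n-2)/2, i.e. 15 for n = 7
– this is the maximum possible number of edges among the other n-1 nodes, i.e. the number of node pairs that exclude the node itself

| Node | b | b standardized |
|---|---|---|
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 9 | 9/15 |
| 4 | 9 | 9/15 |
| 5 | 8 | 9/15 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |

Possible node pairs (21):
12 23 34 45 56 67
13 24 35 46 57
14 25 36 47
15 26 37
16 27
17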
Cliques
• A clique of a graph G is a complete subgraph of G
– i.e. a subgraph in which every pair of nodes is connected by an edge
• The highlighted clique is the maximal clique of size 4 (nodes)
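Whether a set of nodes forms a clique can be tested on the adjacency matrix. This is a small base-R sketch on an assumed six-node graph (the highlighted clique of the figure is not reproduced), with `is_clique` as a hypothetical helper.

```r
# Assumed example graph (undirected), used for illustration only.
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# A node set is a clique when the induced subgraph is complete,
# i.e. all off-diagonal entries of the sub-matrix are non-zero.
is_clique <- function(A, nodes) {
  sub <- A[nodes, nodes, drop = FALSE]
  all(sub[row(sub) != col(sub)] > 0)
}

is_clique(A, c(1, 2, 5))  # TRUE: edges 1-2, 1-5 and 2-5 all exist
is_clique(A, c(2, 3, 4))  # FALSE: there is no edge 2-4
```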
“The richest people in the world look for and build networks. Everyone else looks for work.”
– Robert Kiyosaki
Biological context
Biological Networks
Biological examples
• Co-expression
– For genes that have a similar expression profile
• Directed gene regulatory networks (GRNs)
– Show directionality between gene interactions
• Transcription factor → target gene expression
– Show direction of information flow
– E.g. transcription factor activating target gene
• Protein-Protein Interaction Networks (PPI)
– Show physical interaction between proteins
– Concentrate on binding events
• Others
– Metabolic, differential, Bayesian, etc.
Biological networks
• Three main classes

| Type | Name | Nodes | Edges | Resource |
|---|---|---|---|---|
| molecular interactions | PPI | proteins | physical bonds | BioGRID |
| molecular interactions | DTI | drugs/targets | physical bonds | PubChem |
| functional associations | GI | genes | genetic interactions | BioGRID |
| functional associations | ON | Gene Ontology | functional relations | GO |
| functional associations | GDA | genes/diseases | associations | OMIM |
| functional/structural similarities | Co-Ex | genes | expression profile similarity | GEO, ArrayExpress |
| functional/structural similarities | PStrS | proteins | structural similarities | PDB |

Source: Gligorijević, Vladimir, and Nataša Pržulj. "Methods for biological data integration: perspectives and challenges." Journal of The Royal Society Interface 12.112 (2015): 20150571.
Inferring co-expression networks in R
WGCNA package (Weighted Gene Co-expression Network Analysis)
Main features
• Builds correlation networks
• Correlations are
– simple to calculate
– fast on large-scale data
• Supports the sign of association (not direction)
• Lots of network metrics (e.g. connectivity)
• Easy identification of modules
– Good for reducing dataset dimensionality
• Construct a network
– Search for genes with similar expression profile
• Identify modules in predicted network
– Reduce data into gene sets / groups
• Relate modules to external information to find biologically interesting modules
– E.g.: clinical data, biological function (gene ontology, pathways)
• Find the key drivers in interesting modules
– Experimental validation, therapeutics, biomarkers
• Study module preservation across different data
– Check robustness of module definition
Steps for constructing a co-expression network
A) Obtain gene expression data
B) Measure co-expression between genes via a correlation coefficient
C) Build correlation matrix = network (adjacency matrix)
D) Transform correlation matrix with the power adjacency function → new adjacency matrix → weighted network
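Steps A to D can be sketched with base R alone. The expression data below are simulated purely for illustration (they are not from the lecture), the variable names are assumptions, and β = 6 is simply the value used later in the demo.

```r
set.seed(42)
n_samples <- 30
n_genes <- 8

# A) Toy expression matrix: rows = samples, columns = genes (simulated data).
driver <- rnorm(n_samples)
expr <- matrix(rnorm(n_samples * n_genes), n_samples, n_genes,
               dimnames = list(NULL, paste0("Gene", 1:n_genes)))
expr[, 1:4] <- expr[, 1:4] + 2 * driver   # genes 1-4 share a common signal

# B) + C) Pairwise Pearson correlation between genes.
cor_mat <- cor(expr)

# D) Power adjacency function: a_ij = |cor_ij|^beta gives a weighted network.
beta <- 6
adj <- abs(cor_mat)^beta

round(adj[1:4, 1:4], 2)   # co-expressed genes keep comparatively large weights
round(adj[5:8, 5:8], 2)   # unrelated genes are pushed towards zero
```

Raising the absolute correlation to a power keeps strong correlations relatively high while shrinking weak ones, so no hard cut-off is needed.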
Network=Adjacency Matrix
• Adjacency matrix, A=[aij], encodes how a pair of nodes is connected (if at all)
– Weighted networks = aij is edge value (weight)
– Unweighted networks = aij presence or absence of edge
Scale Free Network Topology
• Scale free topology means
– presence of hub nodes highly connected to other nodes
– metabolic networks exhibit scale free topology at least approximately
– Node connectivity (k), degree, follows power law
– p(k)=proportion of nodes that have connectivity k
[Figure: frequency distribution of connectivity; x-axis: connectivity k, y-axis: frequency]
How to check Scale Free Topology?
• Only a few nodes display high connectivity
• Check if the obtained network follows scale free topology
– Idea: log-transform p(k) and k and look at scatter plots
– Answer: R^2 can be used to quantify goodness of fit
– R^2 > 0.6 means that the network follows scale free topology
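The log-log check can be sketched in a few lines of base R. This is an illustration of the idea only, not the WGCNA implementation: connectivities are binned, p(k) is estimated per bin, and R^2 comes from a straight-line fit of log10 p(k) against log10 k. The function name `scale_free_r2`, the number of bins and the toy connectivity vector are all assumptions.

```r
# R^2 of the straight-line fit log10(p(k)) ~ log10(k).
scale_free_r2 <- function(k, n_bins = 10) {
  bins <- cut(k, breaks = n_bins)
  mean_k <- as.vector(tapply(k, bins, mean))    # mean connectivity per bin
  p_k <- as.vector(table(bins)) / length(k)     # proportion of nodes per bin
  keep <- p_k > 0                               # empty bins cannot be logged
  fit <- lm(log10(p_k[keep]) ~ log10(mean_k[keep]))
  summary(fit)$r.squared
}

# Toy connectivities: the number of nodes with connectivity k falls off as 1/k^2.
k <- rep(1:20, times = round(1000 / (1:20)^2))
r2 <- scale_free_r2(k)
r2   # close to 1 for this power-law-like toy vector
```

With the rule of thumb quoted above, the toy vector would count as scale free because its R^2 is well above 0.6.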
Power function transformation
• Idea:
– transform correlation matrix via power function
– Impose scale free topology
– Select the best beta (β)
• Pick the largest beta
• Corresponds to largest R^2
Power function: $a_{ij} = |\mathrm{cor}(x_i, x_j)|^{\beta}$
[Figure: R^2 of the scale-free fit plotted against the power (β)]
Defining modules
• based on a hierarchical cluster tree
– Build a tree and cut it
– Dynamic tree cutting at optimal height [1]
• Module = branch of a cluster tree
Analysis of modules
• Perform gene ontology analysis on genes from each module (e.g. yellow = “genes 1”)
• Link modules to clinical data (e.g. weight)
– Via module eigengene, e.g. cor(trait, eigengene)
[Figure: modules and their gene sets (genes 1, genes 2, genes 3, genes 4)]
Heatmap view of module
[Figure: heatmap of a module of co-expressed genes; rows: genes grouped into modules, columns: tissue samples]
• Vertical bands indicate tight co-expression of module genes
Modules as eigengenes
• All genes in a module can be summarized by one eigengene (i.e. virtual gene)
• Eigengenes allow one
– to relate modules to each other
• e.g. to calculate the distance between modules
– to relate modules to clinical traits and SNPs
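The eigengene idea can be illustrated with base R as the first principal component of the standardized module expression. The demo later uses WGCNA's `moduleEigengenes()`; the sketch below only mirrors the concept, so its scaling may differ from the package output. The data and the `trait` vector are simulated assumptions.

```r
set.seed(1)
n_samples <- 20
signal <- rnorm(n_samples)

# Toy module: 5 genes that follow a shared signal plus noise (simulated data).
module_expr <- sapply(1:5, function(g) signal + rnorm(n_samples, sd = 0.3))
colnames(module_expr) <- paste0("Gene", 1:5)

# Eigengene = first principal component of the standardized expression.
pc <- prcomp(module_expr, center = TRUE, scale. = TRUE)
eigengene <- pc$x[, 1]

# The sign of a principal component is arbitrary, so orient the eigengene
# to correlate positively with the average expression of the module.
if (cor(eigengene, rowMeans(module_expr)) < 0) eigengene <- -eigengene

# Relate the module to a hypothetical clinical trait.
trait <- signal + rnorm(n_samples)
cor(trait, eigengene)
```

One value per sample replaces the whole module, which is what makes module-to-module and module-to-trait correlations cheap to compute.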
[Figure: heatmap of the brown module (rows = genes, columns = microarrays) and bar plot of the brown module eigengene across samples]
Module eigengene = measure of over-expression = average redness
Analysis of modules
• Relate modules to traits
• Interested in modules with correlation > 0.75 (red)
WGCNA Demo Simulated data - 5 modules
Simulating expression data (1)
# Note: install the Hmisc library first, otherwise the WGCNA installation fails
install.packages("Hmisc");
# Bioconductor dependencies of WGCNA; BiocManager replaces the retired biocLite.R script
install.packages("BiocManager");
BiocManager::install(c("GO.db", "preprocessCore", "impute"));
install.packages("WGCNA");
#Simulate data
# Load WGCNA package
library(WGCNA)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE);
# Here are input parameters of the simulation model
# number of samples or microarrays in the training data
no.obs=50
# now we specify the true measures of eigengene significance
# recall that ESturquoise=cor(y,MEturquoise)
ESturquoise=0; ESbrown= -.6;
ESgreen=.6;ESyellow=0
# Note that we dont specify the eigengene significance of the blue module
# since it is highly correlated with the turquoise module.
ESvector=c(ESturquoise,ESbrown,ESgreen,ESyellow)
# number of genes
nGenes1=3000
# proportion of genes in the turquoise, blue, brown, green, and yellow module, respectively.
simulateProportions1=c(0.2,0.15, 0.08, 0.06, 0.04)
# Note that the proportions don't add up to 1. The remaining genes will be colored grey,
# i.e. the grey genes are non-module genes.
# set the seed of the random number generator. As a homework exercise change this seed.
set.seed(1)
Simulating expression data (2)
#Step 1: simulate a module eigengene network.
# Training Data Set I
MEgreen=rnorm(no.obs)
scaledy=MEgreen*ESgreen+sqrt(1-ESgreen^2)*rnorm(no.obs)
y=ifelse( scaledy>median(scaledy),2,1)
MEturquoise= ESturquoise*scaledy+sqrt(1-ESturquoise^2)*rnorm(no.obs)
# we simulate a strong dependence between MEblue and MEturquoise
MEblue= 0.6*MEturquoise+ sqrt(1-.6^2) *rnorm(no.obs)
MEbrown= ESbrown*scaledy+sqrt(1-ESbrown^2)*rnorm(no.obs)
MEyellow= ESyellow*scaledy+sqrt(1-ESyellow^2)*rnorm(no.obs)
ModuleEigengeneNetwork1=data.frame(y,MEturquoise,MEblue,MEbrown,MEgreen, MEyellow)
Simulating expression data (3)
dat1=simulateDatExpr5Modules(MEturquoise=ModuleEigengeneNetwork1$MEturquoise,
MEblue=ModuleEigengeneNetwork1$MEblue,
MEbrown=ModuleEigengeneNetwork1$MEbrown,
MEyellow=ModuleEigengeneNetwork1$MEyellow,
MEgreen=ModuleEigengeneNetwork1$MEgreen,
nGenes=nGenes1,
simulateProportions=simulateProportions1)
datExpr = dat1$datExpr;
truemodules = dat1$truemodule;
datME = dat1$datME;
attach(ModuleEigengeneNetwork1)
datExpr=data.frame(datExpr)
ArrayName=paste("Sample",1:dim(datExpr)[[1]], sep="" )
# The following code is useful for outputting the simulated data
GeneName=paste("Gene",1:dim(datExpr)[[2]], sep="" )
dimnames(datExpr)[[1]]=ArrayName
dimnames(datExpr)[[2]]=GeneName
rm(dat1); collectGarbage();
# The following command will save all variables defined in the current session.
save.image("Simulated-dataSimulation.RData");
cat("Note: *.RData file written in ",getwd(), "\n")
Construction of a weighted gene co-expression network (1)
# Load WGCNA package
library(WGCNA)
# Load additional necessary packages
library(cluster)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE);
# Load the previously saved data (the file written by save.image() in the simulation step)
load("Simulated-dataSimulation.RData");
attach(ModuleEigengeneNetwork1)
sft=pickSoftThreshold(datExpr,powerVector=1:20)
plot(sft$fitIndices[,1],-sign(sft$fitIndices[,3])*sft$fitIndices[,2], xlab="Soft Threshold (power)",ylab="SFT, signed R^2", type="o")
abline(h=0.90,col="red")
Construction of a weighted gene co-expression network (2)
# here we define the adjacency matrix using soft thresholding with beta=6
ADJ1=abs(cor(datExpr,use="p"))^6
# When you have relatively few genes (<5000) use the following code
k=as.vector(apply(ADJ1,2,sum, na.rm=T))
# When you have a lot of genes use the following code
#k=softConnectivity(datE=datExpr,power=6)
# Plot a histogram of k and a scale free topology plot
sizeGrWindow(10,5)
par(mfrow=c(1,2))
hist(k)
scaleFreePlot(k, main="Check scale free topology\n")
Definition of co-expression modules (1)
#Many clustering procedures require a dissimilarity matrix as input. We define a dissimilarity based on adjacency
# Turn adjacency into a measure of dissimilarity
dissADJ=1-ADJ1
hierADJ=hclust(as.dist(dissADJ), method="average" )
# Plot the resulting clustering tree together with the true color assignment
sizeGrWindow(10,5);
plotDendroAndColors(hierADJ, colors = data.frame(truemodules), dendroLabels = FALSE, hang = 0.03,
main = "Gene hierarchical clustering dendrogram and simulated module colors" )
Definition of co-expression modules (2)
#static tree cutting
colorStaticADJ=as.character(cutreeStaticColor(hierADJ, cutHeight=.99, minSize=20))
# Plot the dendrogram with module colors
sizeGrWindow(10,5);
plotDendroAndColors(hierADJ, colors = data.frame(truemodules, colorStaticADJ),
dendroLabels = FALSE, abHeight = 0.99,
main = "Gene dendrogram and module colors")
#dynamic tree cutting
branch.number=cutreeDynamic(hierADJ,method="tree")
# This function transforms the branch numbers into colors
colorDynamicADJ=labels2colors(branch.number)
sizeGrWindow(10,5)
plotDendroAndColors(dendro = hierADJ,
colors=data.frame(truemodules, colorStaticADJ, colorDynamicADJ),
dendroLabels = FALSE, marAll = c(0.2, 8, 2.7, 0.2),
main = "Gene dendrogram and module colors")
Calculating module eigengenes
#calculate eigengenes for each module
datME=moduleEigengenes(datExpr,colorStaticADJ)$eigengenes
#correlation between modules based on their eigengenes
signif(cor(datME, use="p"), 2)
#dendrogram
dissimME=(1-t(cor(datME, method="p")))/2
hclustdatME=hclust(as.dist(dissimME), method="average" )
# Plot the eigengene dendrogram
par(mfrow=c(1,1))
plot(hclustdatME, main="Clustering tree based on the module eigengenes")
#see expression profiles - diagnostic plots
#show available modules
levels(as.factor(colorStaticADJ))
sizeGrWindow(8,9)
par(mfrow=c(3,1), mar=c(1, 2, 4, 1))
which.module="blue";
plotMat(t(scale(datExpr[,colorStaticADJ==which.module ]) ),nrgcols=30,rlabels=T,
clabels=T,rcols=which.module,
title=which.module )
ME=datME[, paste("ME",which.module, sep="")]
barplot(ME, col=which.module, main="", cex.main=2,
ylab="eigengene expression",xlab="array sample")
Relating modules to trait
#all modules (green and brown modules look interesting)
signif(cor(y,datME, use="p"),2)
#get statistical significance of module association to trait
cor.test(y, datME$MEbrown)
cor.test(y, datME$MEgreen)
References
[1] Langfelder P, Zhang B, Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24(5):719-720
[2] Steve Horvath, Tutorials for the WGCNA package