Posted on 18-Jul-2020


Networks in Biology

Gene Regulatory Networks (GRNs)

Dr. Katja Nowick nowick@bioinf.uni-leipzig.de

www.nowick-lab.info

Networks in Biology

Networks in cells (molecular networks):
• Metabolic networks
• Gene regulatory networks
• Protein-protein interaction networks

Networks between cells:
• Neural networks
• Immune system

Networks in ecosystems:
• Food networks
• Cooperation/symbiosis

Social networks:
• Friendships
• Epidemiology

The identity of the nodes (vertices) and the meaning of the links (edges) depend on the network being studied.

Characteristics of biological networks

• Node degree distribution follows a power law

• Small world characteristics

• Hierarchical and modular organization

• Overrepresentation of certain network motifs

• Preferential attachment

• Dynamic (they change over time)

Typical parameters analyzed in a network

• Node degree (hubness)

• Neighborhood

• Centralization

• Clustering coefficient

• Centrality (Betweenness Centrality, Closeness Centrality)
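The parameters above can be illustrated on a small graph. Below is a minimal plain-Python sketch for an unweighted, undirected network that does not depend on any graph library. The toy edge list is hypothetical. Betweenness is computed with Brandes' algorithm and is not normalized. For real networks, a library such as NetworkX provides ready-made functions for these measures.

```python
from collections import deque

# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of links of a node (high degree = hub)."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours that are themselves linked."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked_pairs = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * linked_pairs / (k * (k - 1))


def distances_from(graph, source):
    """Breadth-first search: shortest path lengths from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(graph, node):
    """(number of other reachable nodes) / (sum of distances to them)."""
    dist = distances_from(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def betweenness_centrality(graph):
    """Brandes' algorithm, unweighted and undirected, not normalized."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each end.
    return {v: score / 2 for v, score in bc.items()}


graph = build_graph(EDGES)
for node in sorted(graph):
    print(node, degree(graph, node), round(clustering_coefficient(graph, node), 2),
          round(closeness_centrality(graph, node), 2))
print(betweenness_centrality(graph))
```

In this toy graph, node C connects the triangle A-B-C to the tail C-D-E. As a result it has the highest degree and the highest betweenness, while its clustering coefficient is low (1/3).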

Why are cells different from each other?

MacArthur et al., PLoS ONE 3: e3086 (2008)

• Stem cell differentiation regulation


Nodes: genes, including transcription factors (TFs)

Links: interactions, i.e. who regulates the expression of whom
• Directional or bidirectional
• Activating or repressing
• Feedback and other loops
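These nodes and links can be stored as a directed graph whose edges carry a sign. The minimal sketch below uses a hypothetical toy GRN (all names are made up), with +1 for activation and -1 for repression. It identifies two of the loop types mentioned above: autoregulation, and two-node feedback loops (bidirectional links).

```python
# Hypothetical toy GRN: (regulator, target, sign) with +1 = activation, -1 = repression.
INTERACTIONS = [
    ("TF1", "TF2", +1),
    ("TF2", "TF1", -1),    # TF1 and TF2 regulate each other (bidirectional link)
    ("TF2", "geneA", +1),
    ("TF3", "TF3", +1),    # TF3 activates its own expression (autoregulation)
    ("TF3", "geneA", -1),
]


def build_grn(interactions):
    """Directed, signed graph stored as {regulator: {target: sign}}."""
    grn = {}
    for regulator, target, sign in interactions:
        grn.setdefault(regulator, {})[target] = sign
        grn.setdefault(target, {})  # make sure pure targets appear as nodes too
    return grn


def autoregulated(grn):
    """Nodes that regulate themselves."""
    return sorted(node for node, targets in grn.items() if node in targets)


def two_node_feedback_loops(grn):
    """Pairs (a, b) with a -> b and b -> a, plus the loop sign (product of edge signs)."""
    loops = []
    for a, targets in grn.items():
        for b, sign_ab in targets.items():
            if a < b and a in grn[b]:  # a < b reports each pair once and skips self-loops
                loops.append((a, b, sign_ab * grn[b][a]))
    return sorted(loops)


grn = build_grn(INTERACTIONS)
print(autoregulated(grn))            # ['TF3']
print(two_node_feedback_loops(grn))  # [('TF1', 'TF2', -1)]
```

A loop sign of -1 (one activating and one repressing edge) indicates a negative feedback loop. A sign of +1 indicates a positive loop.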

Examples of Gene Regulation Networks

• TF network of E. coli

Ca. 20% of all interactions in E. coli. Here nodes are operons (genes transcribed on the same mRNA); links: TF X regulates operon Y.

• TF network of Drosophila embryonic development

Note: TFs are themselves proteins. Some of the proteins generated by gene expression (transcription + translation) therefore regulate further genes, which creates a network.


1. Nodes: genes, including transcription factors (TFs)

Links: interactions, i.e. who regulates the expression of whom
• Directional or bidirectional
• Activating or repressing
• Feedback and other loops


TFs regulate expression of other genes

[Figure: TF binding at a gene promoter]

Many TFs have to come together to start/stop transcription of a target

Transcription factors (TFs)

~1500 TFs in the human genome (modified after Messina et al., 2004)

TF families: RFX, ZNF, HOX, bHLH, β-scaffold, bZIP, NHR, Trp cluster, FOX, Bromodomain, T-box, Jumonji, E2F, Dwarfin, Paired box, Heat shock, Tubby, AF-4, Methyl-CpG-binding, AP-2, TEA, Pocket domain, GCM, Other, Structural

Family sizes shown: ZNF 762, HOX 199, bHLH 117

Some TFs bind DNA as dimers

bHLH: basic helix-loop-helix TFs; bZIP: basic leucine zipper TFs; NR: nuclear receptors

Homodimers or heterodimers add complexity
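A rough estimate shows how quickly this complexity grows. Suppose every member of a family of n dimerizing TFs could pair with itself and with every other member. That gives n homodimers plus n(n-1)/2 heterodimers. This is only an upper bound under that assumption, because real partner preferences are much more restricted. A minimal sketch:

```python
from math import comb


def max_dimer_count(n_members):
    """Upper bound on distinct dimers, assuming any two members (or one member twice) can pair."""
    homodimers = n_members             # A+A, B+B, ...
    heterodimers = comb(n_members, 2)  # unordered pairs A+B, A+C, ...
    return homodimers + heterodimers


# The bHLH family size from the slide (117), used purely as an illustration.
print(max_dimer_count(117))  # 6903
```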


Environmental signals trigger the GRN

Environmental signals trigger the GRN - Activators -

Environmental signals trigger the GRN - Repressors -

TFs are often hubs in GRNs

[Figure: GRN in which TF nodes are hubs]

• TFs and their target genes

TF binding sites (TFBS): short, degenerate sequence motifs

Enhancers are DNA sites bound by activators. DNA looping brings the bound activators to a specific promoter and its initiation complex. Enhancers are much more common in eukaryotes than in prokaryotes, where only a few examples are known (to date).

Silencers are DNA regions that, when bound by particular transcription factors, silence expression of the gene.

TF binding to DNA

TFs recognize specific sites/motifs in DNA

• TFs bind short sequence motifs
• Motifs are degenerate
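Degenerate motifs are often written as a consensus in IUPAC nucleotide codes, where one letter can stand for several bases. The minimal sketch below turns such a consensus into a regular expression and scans a sequence with it. The example consensus TGASTCA (an AP-1-like site, S = C or G) and the sequence are illustrative assumptions. For brevity, only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes: one letter per set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Turn a degenerate IUPAC consensus into a regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_sites(consensus, sequence):
    """0-based start positions of all (also overlapping) matches on the given strand."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# TGASTCA matches both TGACTCA and TGAGTCA.
print(consensus_to_regex("TGASTCA"))                  # TGA[CG]TCA
print(find_sites("TGASTCA", "ccTGACTCAggTGAGTCAtt"))  # [2, 11]
```

The lookahead `(?=...)` allows overlapping matches to be reported. A consensus cannot express how strongly each variant is bound; the probability-based scoring discussed later addresses that.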

TFs interact to regulate their targets

• TFs cooperate to regulate their targets

[Figure: several TFs binding together at a target]

• Co-occurrence of TF binding sites in the genome (ENCODE 2012)
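Co-occurrence of binding sites can be quantified in several ways. The minimal sketch below uses the Jaccard index (shared regions divided by regions bound by either TF) on hypothetical binding data. This is a simple choice for illustration only. Genome-wide analyses such as ENCODE test co-occurrence statistically against a background expectation.

```python
from itertools import combinations

# Hypothetical binding data: genomic region -> TFs with a binding site in that region.
BOUND = {
    "region1": {"TF_A", "TF_B"},
    "region2": {"TF_A", "TF_B", "TF_C"},
    "region3": {"TF_C"},
    "region4": {"TF_A"},
}


def regions_per_tf(bound):
    """Invert the map: TF -> set of regions it binds."""
    by_tf = {}
    for region, tfs in bound.items():
        for tf in tfs:
            by_tf.setdefault(tf, set()).add(region)
    return by_tf


def cooccurrence(bound):
    """Jaccard index (shared regions / regions bound by either TF) for every TF pair."""
    by_tf = regions_per_tf(bound)
    scores = {}
    for a, b in combinations(sorted(by_tf), 2):
        shared = by_tf[a] & by_tf[b]
        either = by_tf[a] | by_tf[b]
        scores[(a, b)] = len(shared) / len(either)
    return scores


for pair, score in cooccurrence(BOUND).items():
    print(pair, round(score, 2))
```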

Complex TF interactions

• TFs bind as monomers, homo-dimers, or hetero-dimers

• Multiple TFs (~7-10) cooperate to regulate gene expression

• TFs regulate the expression of other TFs

• Feedback loops, autoregulation …

• It makes sense to represent this complexity in a network

• Summary

TFs: what is known and what is not

Not only TFs regulate gene expression

GRFs*:
• General TFs: RNA polymerase II transcription-initiation complex
• Specific TFs: activate or repress expression of particular genes
• miRNAs: bind to mRNAs to degrade them or block their translation
• Cofactors: bridge between specific and general TFs; activate or repress
• Chromatin remodelers: make DNA accessible or inaccessible

*GRF = Gene Regulatory Factor

Epigenetic control of gene expression

• Chromatin remodeler

Examples of epigenetic/histone modifications


Temporal changes of the epigenome

Interactions between TFs and histone modifications

• Histone modifications influence chromatin states
• Chromatin states influence binding of TFs
• TFs interact with enzymes that modify histones


A primary miRNA (pri-miRNA) transcript is encoded in the cell's DNA and transcribed in the nucleus. It is processed by the enzyme Drosha and exported into the cytoplasm, where Dicer processes it further. After strand separation, the mature miRNA represses protein production either by blocking translation or by causing transcript degradation.

miRNAs
• Small non-coding RNA molecules (ca. 22 nucleotides)
• > 1000 miRNAs in the human genome

Interactions between TFs, miRNAs, other ncRNAs, and histone modifications

• Neurogenesis

Interactions between TFs, miRNAs, other ncRNAs, and histone modifications

• TFs bind as monomers, homo-dimers, or hetero-dimers

• Multiple TFs (~7-10) cooperate to regulate gene expression

• TFs regulate the expression of other TFs

• Feedback loops, autoregulation …

• Network: ~5000 ncRNAs, ~375 million interactions

• Add epigenetic modifications

• Add ncRNAs

• Even more complex networks

Why are tissues different from each other?

Cell states are defined by gene expression

How is a gene activated or repressed (at a certain time and location)? So let’s talk about the links now.


Nodes: genes, including transcription factors (TFs)

2. Links: interactions, i.e. who regulates the expression of whom
• Directional or bidirectional
• Activating or repressing
• Feedback and other loops


Cell states are defined by gene expression

How is a gene activated or repressed (at a certain time and location)? Goal: discover which gene is regulated by which TF. How do we get the information for the links?

• Manual
• Semi-automated (e.g. preBIND)
• Natural language processing (NLP) (e.g. PathwayStudio)

Donaldson I, et al. BMC Bioinformatics. 4:11 (2003)

preBIND


Network construction based on literature

Is the network encoded in the DNA?

TFs bind to specific motifs, so it should be possible to predict TF target genes by reading the DNA.

http://fasta.bioch.virginia.edu/cshl/

Experimental approaches


• Expensive

• Time consuming

• For one research group only feasible for a few TFs

A collection of TFBS can be found in databases: JASPAR, TRANSFAC

Motif databases
• JASPAR: http://jaspar.genereg.net/
• http://www.gene-regulation.com/pub/databases.html

To score how well a single site s matches a motif W, we use Pr(s | W).

How good is a motif?

• Pr(s | W) is the key idea, but it needs some statistical adjustment. Consider a genome that is very A/T-rich: Pr(A) = 0.45, Pr(T) = 0.45, Pr(C) = 0.05, Pr(G) = 0.05. We saw that Pr(ACACGTT | W) = 0.048. In fact, Pr(ACATGTT | W) = 0.048 too.

• Scoring motif matches

• Compute the probability of each site under the above “background model”: Pr(ACACGTT) = 0.45 × 0.05 × 0.45 × 0.05 × 0.05 × 0.45 × 0.45 ≈ 0.0000051. So Pr(ACACGTT | W) = 0.048 is 9364 times Pr(ACACGTT). Similarly, Pr(ACATGTT) ≈ 0.0000461, so Pr(ACATGTT | W) = 0.048 is 1040 times Pr(ACATGTT).

• Pr(ACACGTT | W) is 9364 times Pr(ACACGTT), whereas Pr(ACATGTT | W) is only 1040 times Pr(ACATGTT). In other words, if we compare how well W explains the site with how well the random background explains it, ACACGTT stands out.
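The arithmetic above can be checked in a few lines. Like the example, this sketch assumes that positions are independent under the background model. The value 0.048 for Pr(s | W) is taken from the text.

```python
from math import prod

# A/T-rich background model from the example above.
BACKGROUND = {"A": 0.45, "T": 0.45, "C": 0.05, "G": 0.05}


def background_probability(site, background=BACKGROUND):
    """Pr(site) if every position is drawn independently from the background."""
    return prod(background[base] for base in site)


def ratio_to_background(p_site_given_motif, site, background=BACKGROUND):
    """How many times more likely the site is under the motif than under the background."""
    return p_site_given_motif / background_probability(site, background)


for site in ("ACACGTT", "ACATGTT"):
    print(site, background_probability(site), round(ratio_to_background(0.048, site)))
```

ACACGTT contains three rare C/G bases, while ACATGTT contains only two. The same Pr(s | W) is therefore about nine times more surprising for ACACGTT.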

How good is a motif?

• The log-likelihood ratio (LLR) score

Given a motif W, background nucleotide frequencies W_b, and a site s:

$$\mathrm{LLR}(s) = \log \frac{\Pr(s \mid W)}{\Pr(s \mid W_b)}$$

Good scores > 0.

Bad scores < 0.
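A minimal sketch of LLR scoring, and of scanning a sequence with it, under several assumptions. The 3-position motif matrix is hypothetical, since the text does not give W. The background is the A/T-rich model from the example. The logarithm is base 2 (any base gives the same sign). Only the given strand is scanned. Because positions are treated as independent, the log of a ratio of products becomes a sum of per-position log ratios.

```python
from math import log2

# Hypothetical motif W: probability of each base at each of 3 positions (rows sum to 1).
MOTIF_W = [
    {"A": 0.80, "C": 0.10, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
]
# Background nucleotide frequencies W_b (the A/T-rich example from the text).
BACKGROUND_WB = {"A": 0.45, "T": 0.45, "C": 0.05, "G": 0.05}


def llr_score(site, motif=MOTIF_W, background=BACKGROUND_WB):
    """log2(Pr(s | W) / Pr(s | W_b)), summed position by position."""
    return sum(log2(column[base] / background[base]) for column, base in zip(motif, site))


def scan(sequence, motif=MOTIF_W, background=BACKGROUND_WB, threshold=0.0):
    """Return (position, site, score) for every window scoring above threshold."""
    width = len(motif)
    hits = []
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width].upper()
        score = llr_score(site, motif, background)
        if score > threshold:
            hits.append((i, site, score))
    return hits


print(scan("TTACGTT"))  # one hit, at position 2 ("ACG")
```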


Find motif matches in DNA

Typically, the gene closest to the motif match is designated as the TF’s target.

Finding the TF target gene

• So, what to do with the motif now?
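The common practice of assigning the nearest gene can be sketched as follows. The gene names and transcription start site (TSS) coordinates are hypothetical, and all positions are assumed to lie on one chromosome. The optional distance cutoff is an illustrative parameter. As discussed further below, the nearest gene is not necessarily the real target.

```python
# Hypothetical annotation: gene -> transcription start site (TSS) on the same chromosome.
GENE_TSS = {"geneA": 1_000, "geneB": 5_000, "geneC": 12_000}


def nearest_gene(site_position, gene_tss=GENE_TSS, max_distance=None):
    """Gene whose TSS is closest to the motif match; None if it is farther than max_distance."""
    gene, tss = min(gene_tss.items(), key=lambda item: abs(item[1] - site_position))
    if max_distance is not None and abs(tss - site_position) > max_distance:
        return None
    return gene


print(nearest_gene(4_200))                       # geneB
print(nearest_gene(20_000, max_distance=5_000))  # None
```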

We assumed that we have an experimental characterization of a TF’s binding specificity (the motif). What if we don’t? We can try computational motif discovery.

Motif discovery

Motif discovery – Option 1

Try to find the motif given the promoter regions of the five genes G1, G2, … G5

Motif discovery – Option 2

Idea: Find a motif with many (significantly more than expected by chance) matches in the given sequences
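The idea of Option 2 can be illustrated with a naive exhaustive search. The sketch counts every k-mer across the promoters and ranks the k-mers by observed versus expected number of sequences containing them. It is not one of the published motif-discovery algorithms. It uses exact k-mers rather than degenerate motifs, and its expectation is a rough approximation. The five promoter sequences for G1 to G5 are hypothetical.

```python
from collections import Counter

# Hypothetical promoter sequences of five co-regulated genes G1..G5.
PROMOTERS = {
    "G1": "AATGACTCAGG",
    "G2": "CCTGACTCATT",
    "G3": "GTGACTCAAC",
    "G4": "TTGACTCACG",
    "G5": "ACTGACTCAG",
}
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def count_kmers(sequences, k):
    """Number of sequences that contain each k-mer (counted once per sequence)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def prob_in_random_sequence(kmer, seq_length, background):
    """Approximate chance that a random sequence contains kmer at least once.

    Treats the windows as independent, which ignores overlaps (a simplification).
    """
    p_site = 1.0
    for base in kmer:
        p_site *= background[base]
    windows = max(seq_length - len(kmer) + 1, 1)
    return 1.0 - (1.0 - p_site) ** windows


def enriched_kmers(sequences, k, background, top=3):
    """Rank k-mers by observed / expected number of sequences containing them."""
    sequences = list(sequences)
    counts = count_kmers(sequences, k)
    mean_length = sum(len(s) for s in sequences) / len(sequences)
    ranked = []
    for kmer, observed in counts.items():
        expected = len(sequences) * prob_in_random_sequence(kmer, mean_length, background)
        ranked.append((observed / expected, kmer, observed))
    ranked.sort(reverse=True)
    return ranked[:top]


for ratio, kmer, observed in enriched_kmers(PROMOTERS.values(), 7, UNIFORM):
    print(kmer, observed, round(ratio, 1))
```

Real tools search over degenerate motifs (for example PWMs), use realistic background models, and assess significance properly. This sketch only shows the core idea of "more matches than expected by chance".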

Motif discovery – some algorithms

Motif discovery – some tools

Is the network encoded in the DNA?

TFs bind to specific motifs, so it should be possible to predict TF target genes by reading the DNA. Is it really that simple?

• For most TFs, the binding site is not known
• Since TFBS are degenerate, it is hard to predict how efficiently the TF really binds
• How far away can the binding site be from the promoter?
• Multiple TFs might compete for the same binding site
• Is the nearest gene really the target gene?
• Does the binding event have an effect at all?
• …

Does the TF binding really have an effect?

• Chromatin immunoprecipitation followed by sequencing (ChIP-seq)

• Overexpression or knock-down of TFs in cell lines, followed by RNA-seq

Problem: TFs bind at many places.

But is a gene indeed regulated by the binding event?

Combine motif-finding experiments with experiments that change the TF’s expression (perturbation experiments).
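A minimal sketch of combining the two kinds of evidence. It keeps only genes that are bound by the TF (for example, a ChIP-seq peak assigned to the gene) and that also respond when the TF is knocked down. The gene names, fold changes and the cutoff of |log2 fold change| ≥ 1 are hypothetical. A real analysis would use statistical tests on replicates instead of a fixed cutoff.

```python
def supported_targets(bound_genes, knockdown_log2fc, min_abs_log2fc=1.0):
    """Genes that are bound by the TF AND change expression after the TF is knocked down.

    bound_genes: genes with a ChIP-seq peak assigned to them.
    knockdown_log2fc: gene -> log2 fold change (knock-down vs. control) from RNA-seq.
    """
    targets = {}
    for gene in sorted(bound_genes):
        change = knockdown_log2fc.get(gene)
        if change is None or abs(change) < min_abs_log2fc:
            continue  # bound, but no clear expression response
        # Less TF -> less target suggests activation; less TF -> more target suggests repression.
        targets[gene] = "activated" if change < 0 else "repressed"
    return targets


# Hypothetical data: g4 responds but is not bound, so it may be an indirect target.
bound = {"g1", "g2", "g3"}
log2fc = {"g1": -2.0, "g2": 0.3, "g3": 1.5, "g4": -3.0}
print(supported_targets(bound, log2fc))  # {'g1': 'activated', 'g3': 'repressed'}
```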

Inferring networks from perturbations

Sachs et al. Science. 2005 308:523-9


The topology of regulatory molecular networks can be reverse-engineered by analyzing a set of perturbations. Picture: reverse engineering of the hierarchy of a cell signaling network using multiple perturbations and a statistical method called Bayesian network inference.

Inferring Networks from Time Series Microarrays


Zou M, Conzen SD. Bioinformatics. 2005 21(1):71-9.

Regulatory interactions can also be inferred directly from data (reverse engineering of biological pathways/networks from data). In the example above, time-series expression data are used to infer a directed, signed graph based on delayed correlations.
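The delayed-correlation idea can be sketched briefly. An edge X -> Y is proposed when X at time t correlates strongly with Y at time t + lag, and the sign of the correlation gives the edge sign. This is a simple illustration, not the method of the cited study. The time series, the lag of 1 and the threshold of 0.95 are hypothetical choices.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equally long lists (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def lagged_edges(series, lag=1, threshold=0.95):
    """Directed, signed edges X -> Y where X(t) correlates with Y(t + lag).

    series maps gene -> expression values at equally spaced time points.
    """
    edges = []
    for x, xs in series.items():
        for y, ys in series.items():
            if x == y:
                continue
            r = pearson(xs[:-lag], ys[lag:])
            if abs(r) >= threshold:
                edges.append((x, y, "+" if r > 0 else "-", round(r, 3)))
    return edges


# Hypothetical time series: Y follows X one step later, Z mirrors X with the opposite sign.
SERIES = {
    "X": [1, 3, 2, 5, 4, 6],
    "Y": [0, 1, 3, 2, 5, 4],
    "Z": [0, -1, -3, -2, -5, -4],
}
print(lagged_edges(SERIES))
```

With a threshold of 0.95, only X -> Y (+) and X -> Z (-) are reported. Lowering the threshold to 0.9 would also admit spurious reverse edges such as Y -> X (r ≈ 0.90). Short, noisy time series are one reason correlation-based inference needs careful validation.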

Why are tissues different from each other?

• Summary

Hierarchy
• Top layer: kernels, initial TFs
• Core layer
• Bottom layer: differentiation batteries, terminal TFs

GRNs are hierarchical

GRNs are hierarchical - Yeast

The model, based on experimental evidence in yeast, organizes TFs into three distinct layers: the top, core, and bottom layers. TFs within a layer are highly interconnected and share similar properties. TFs of the different layers regulate distinct sets of target genes. The three layers are also connected by a central skeleton. This is a feed-forward structure in which TFs of the top layer regulate TFs of the core layer, and TFs of the core layer regulate TFs of the bottom layer. The core layer contains the highest number of TFs and hubs and is important for signal propagation in the regulation of almost all targets.

Yeast regulatory network of 13385 regulatory interactions among 4503 genes, which includes 158 TFs and 4369 target genes.
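One simple way to see a top-to-bottom ordering in TF-TF regulation is to assign each TF a level. TFs regulated by no other TF get level 0, and every other TF sits one level below its highest regulator. This longest-path layering is a simplified illustration, not the method used in the yeast study. It assumes the TF-TF graph has no cycles apart from autoregulation. Real GRNs contain feedback loops, which would first have to be collapsed (for example, into strongly connected components). The TF names are hypothetical.

```python
def hierarchy_levels(tf_edges):
    """Level 0 = TFs regulated by no other TF; otherwise 1 + highest level among regulators.

    tf_edges are (regulator, target) pairs between TFs. Self-loops (autoregulation) are
    ignored; any other cycle raises ValueError, because levels are then undefined.
    """
    regulators = {}
    for src, dst in tf_edges:
        regulators.setdefault(src, set())
        regulators.setdefault(dst, set())
        if src != dst:
            regulators[dst].add(src)

    levels, visiting = {}, set()

    def level(tf):
        if tf in levels:
            return levels[tf]
        if tf in visiting:
            raise ValueError(f"cycle through {tf}")
        visiting.add(tf)
        regs = regulators[tf]
        levels[tf] = 1 + max(level(r) for r in regs) if regs else 0
        visiting.discard(tf)
        return levels[tf]

    for tf in regulators:
        level(tf)
    return levels


# Hypothetical TF-TF edges: T = top, C = core, B = bottom.
EDGES = [("T1", "C1"), ("T1", "C2"), ("C1", "B1"), ("C2", "B1"), ("C2", "B2"), ("B1", "B1")]
print(hierarchy_levels(EDGES))  # {'T1': 0, 'C1': 1, 'C2': 1, 'B1': 2, 'B2': 2}
```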


GRNs are hierarchical - Development


Developmental biologists have proposed a concept that focuses on the temporal order of events in developmental pathways. In this system, modules are classified as kernels, plug-ins, input-output switches, and differentiation batteries. Each module can be thought of as fulfilling one specific function. Kernels are the initial modules of the network and affect most other parts of the network; they are, for instance, involved in initiating the development of certain body parts. Differentiation batteries may play a role in the terminal steps of the differentiation of body parts and generally do not affect other parts of the network.

e.g. Drosophila development

GRNs are hierarchical - cell fate

Hobert O PNAS 2008;105:20067-20071

Terminal selector TFs (acting either alone or in synergistic combination) activate downstream target genes directly via terminal selector motifs and also autoregulate their own expression via those motifs. Autoregulated expression of a terminal selector is critical to maintain the differentiated features of the cell. Downstream targets of terminal selectors (X) define differentiated properties of a neuron, such as neurotransmitter receptor, ion channels, adhesion proteins etc. Targets may also include TFs that regulate specific “subroutines.” TFs that are induced by terminal selectors may also cooperate with terminal selector proteins in a feed-forward loop configuration to jointly control specific terminal genes.
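The feed-forward configuration described above, in which a terminal selector X induces a TF Y and both jointly control a terminal gene Z, is one of the network motifs mentioned at the start. It can be found by enumerating triangles X -> Y, X -> Z, Y -> Z. The minimal sketch below ignores edge signs, and its gene names are hypothetical labels modeled on the description.

```python
def feed_forward_loops(edges):
    """All (X, Y, Z) with X -> Y, X -> Z and Y -> Z, where X, Y, Z are distinct."""
    targets = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # skip autoregulation
            for z in targets.get(y, ()):
                if z not in (x, y) and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical edges modeled on the terminal selector description.
EDGES = [
    ("selector", "selector"),  # autoregulation maintains the terminal selector
    ("selector", "TF_sub"),    # induced TF controlling a "subroutine"
    ("selector", "receptor"),
    ("selector", "ion_channel"),
    ("TF_sub", "receptor"),    # TF_sub cooperates with the selector on this target
]
print(feed_forward_loops(EDGES))  # [('selector', 'TF_sub', 'receptor')]
```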
