Identification of Transcription Factor Binding Sites

Post on 14-Jan-2016

DESCRIPTION

Identification of Transcription Factor Binding Sites. Lior Harpaz, Ofer Shany, 09/05/2004. Goal: find TFBSs. Importance: TFs regulate gene expression. Identification of TFs can teach us about the mapping of regulatory pathways and the potential functions of genes. Experimental methods. (PowerPoint presentation)

TRANSCRIPT

1

Identification of Transcription Factor Binding Sites

Lior Harpaz, Ofer Shany

09/05/2004

2

Goal - find TFBS!

input output

3

Importance

TFs regulate gene expression. Identification of TFs can teach us about:

• Mapping of regulatory pathways

• Potential functions of genes

4

Experimental Methods

• Footprinting

• EMSA (electrophoretic mobility shift assay)

Problems:

• Time consuming

• Do not scale up to whole genomes

5

Computational Methods - Goals

• Identifying known TFBSs in previously unknown locations.

• Identifying unknown TFBSs.

6

Computational Methods

Basic idea: locate TFBSs using sequence searching.

Problems:

• Short sequences (5-15 bp)

• Degenerate sequences

• Location

• Biological reality
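
The basic sequence-searching idea, and the degeneracy problem, can be illustrated with a minimal sketch. It assumes the binding site is written as an IUPAC degenerate consensus; the function name `find_sites`, the consensus and the sequence are hypothetical examples, not taken from the slides.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions where the degenerate consensus matches."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # The lookahead reports overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]

# Hypothetical consensus and sequence, for illustration only.
print(find_sites("TGANTCA", "CCTGACTCAGGTGAGTCATT"))  # [2, 11]
```

Because the pattern is short and contains a wildcard, it will also match many positions by chance in a long genome, which is why plain sequence searching needs the additional evidence discussed on the following slides.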

7

Computational Methods

Possible solutions:

• Conservation = functional importance

• mRNA expression pattern

• Phylogenetic footprinting

• Network-level conservation

8

Phylogenetic footprinting

• Identify orthologous genes.

• Concentrate on conserved non-coding regions (possible regulatory regions).

• Look for conserved motifs.
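
The three steps above can be sketched in a deliberately simplified form: given the upstream (non-coding) regions of one gene and its orthologs, report the k-mers that occur in every region. This is only an exact-match toy; real phylogenetic footprinting tolerates mismatches and uses alignment or probabilistic motif search. The function names and the sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def conserved_kmers(upstream_regions, k):
    """k-mers that occur in every orthologous upstream region."""
    sets = [kmers(s.upper(), k) for s in upstream_regions]
    return set.intersection(*sets)

# Hypothetical upstream regions of one gene in three species.
regions = [
    "ACGTTGACAGCTAGGC",
    "TTTGACAGCCGATCGA",
    "GGCATTGACAGCTTAA",
]
print(sorted(conserved_kmers(regions, 8)))  # ['TTGACAGC']
```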

9

Why should it work?

• 40% alignment between the human and mouse genomes

• 80% of mouse genes have orthologs in the human genome

• Only 1%-5% of the human genome encodes proteins.

10

Things to consider…

• Choosing genomes.

• Locating the transcriptional start site.

• Alignment method.

11

More things to consider…

• Different evolution rates for different regions in the genome.

• PSSM score cut-off

Note - TFBSs within ORFs are not detected.
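
To make the PSSM cut-off concrete, here is a minimal sketch of building a log-odds position-specific scoring matrix from aligned sites and scanning a sequence with it. The uniform background of 0.25, the pseudocount of 1, the example sites and both cut-off values are assumptions chosen for illustration, not values from the slides.

```python
import math

def build_pssm(sites, background=0.25, pseudocount=1.0):
    """Log-odds matrix (one dict per column) from aligned, equal-length sites."""
    n = len(sites)
    pssm = []
    for col in zip(*sites):
        column = {}
        for base in "ACGT":
            freq = (col.count(base) + pseudocount) / (n + 4 * pseudocount)
            column[base] = math.log2(freq / background)
        pssm.append(column)
    return pssm

def scan(pssm, sequence, cutoff):
    """Return (position, score) for every window scoring at or above cutoff."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if score >= cutoff:
            hits.append((i, round(score, 2)))
    return hits

# Hypothetical aligned binding sites and target sequence.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
pssm = build_pssm(sites)
target = "CCTGACTCAGGTTACTCATT"
print(scan(pssm, target, cutoff=5.0))  # two hits, at positions 2 and 11
print(scan(pssm, target, cutoff=7.0))  # only the stronger hit at position 2
```

Lowering the cut-off admits the weaker, one-mismatch site at position 11; raising it keeps only the exact site. That trade-off between missed sites and false hits is what the choice of cut-off controls.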

12

Phylogenetic footprinting in proteobacterial genomes

• Study set of 190 E. coli genes with known TFBSs.

• Orthologs were searched for in eight other bacteria.

• Motif search by Bayesian Gibbs sampling.

13

Bayesian Gibbs sampling

• Algorithm for motif search.

• Each motif is assigned a MAP (maximum a posteriori probability) value.

14

Bayesian Gibbs sampling

Parameters and extensions:

• Model sequence

• Palindromic patterns

• Background pattern

• Distribution of spacing between TFBSs and translation start site
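
For orientation, the core of a Gibbs motif sampler can be sketched as follows. This is a basic site sampler (one site per sequence, uniform background, sequences containing only A, C, G and T), not the Bayesian sampler with the extensions listed above, and the score it reports is a log-likelihood ratio rather than the MAP value used in the study. All names and default parameters are assumptions.

```python
import math
import random

def column_probs(sites, pseudocount=0.5):
    """Per-column base probabilities estimated from the given sites."""
    n = len(sites)
    return [
        {b: (col.count(b) + pseudocount) / (n + 4 * pseudocount) for b in "ACGT"}
        for col in zip(*sites)
    ]

def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Return (best start positions, best score) found by a simple site sampler."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best_starts, best_score = list(starts), -math.inf
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except the i-th.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
            probs = column_probs(others)
            # Weight each candidate start by its motif/background likelihood ratio.
            weights = []
            for p in range(len(seq) - width + 1):
                ratio = 1.0
                for col, base in zip(probs, seq[p:p + width]):
                    ratio *= col[base] / 0.25
                weights.append(ratio)
            # Sample the new site position in proportion to those weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
        # Score the current alignment by its total log-likelihood ratio.
        sites = [s[p:p + width] for s, p in zip(seqs, starts)]
        probs = column_probs(sites)
        score = sum(math.log(col[b] / 0.25)
                    for site in sites for col, b in zip(probs, site))
        if score > best_score:
            best_starts, best_score = list(starts), score
    return best_starts, best_score

# Hypothetical upstream regions sharing a planted 8-bp motif.
seqs = ["ACGTTGACAGCTAGGC", "TTTGACAGCCGATCGA",
        "GGCATTGACAGCTTAA", "CCGATAGTTGACAGCA"]
starts, score = gibbs_motif_search(seqs, width=8)
print(starts, round(score, 2))
```

The sampler is stochastic, so different seeds can converge to different alignments; in practice it is run several times and the best-scoring result is kept.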

15

Results

• Overall: in 146/184 sets, motifs matched known regulatory sequences.

• For the 18 genes with only 1 ortholog, only 67% of known sites were matched, and with low MAP values.

• In the 166 sets with >=2 orthologs, 81% of motifs matched known regulatory sequences.

16

Results

Out of the 166 sets (with >=2 orthologs):

• 131 corresponded to known TFBSs.

• 3 corresponded to known stem-loop structures.

• 32 data sets contained predictions with large MAP values: these could be undocumented sites!

Documented sites were found in 138 sites without using palindromic models.
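
The counts quoted on the two results slides can be cross-checked with a few lines of arithmetic. The sketch assumes that "matched known regulatory sequences" means the known TFBSs plus the known stem-loop structures; that reading is not stated on the slides, but it reproduces the quoted percentages.

```python
# Counts quoted on the results slides.
total_sets = 184
matched_total = 146
single_ortholog_sets = 18
multi_ortholog_sets = 166
known_tfbs, stem_loops, novel = 131, 3, 32

# The two groups of sets, and the three outcome classes, add up.
assert single_ortholog_sets + multi_ortholog_sets == total_sets
assert known_tfbs + stem_loops + novel == multi_ortholog_sets

# Assumption: "matched" = known TFBSs + known stem-loops.
matched_multi = known_tfbs + stem_loops          # 134
matched_single = matched_total - matched_multi   # 12

print(f"{matched_multi / multi_ortholog_sets:.0%}")    # 81%
print(f"{matched_single / single_ortholog_sets:.0%}")  # 67%
print(f"{matched_total / total_sets:.0%}")             # 79%
```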

17

Identification of a new TF

• New site found near fabA, fabB & yqfA.

• YijC binds to these sites.

• Site location, protein structure & previous experimental results suggest that YijC is a repressor of the fab genes.

• This indicates that yqfA is involved in fatty-acid metabolism.

18

Genomic scale phylogenetic footprinting

• 2113 ORFs of E. coli were used.

• 187 new sites were identified as probable sites for 46 known TFs.

• The remaining sites are expected to represent unknown TFBSs.

• MAP values of predicted sites were lower.

19

MAP values left-shift

20

Ortholog Distribution

Full set

Study set

21

Conclusions

• New sites for known TFs were found.

• Conservation of regulatory stem-loops.

• New sites for unknown TFs are predicted.

• New TF identified (YijC).

• Predicted gene function (yqfA).

22

Break

23

Network-level conservation

• Each TF regulates the expression of many genes (20-400).

• Conservation of global gene expression requires the conservation of regulatory mechanisms.
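
One way to quantify this idea is to ask whether the genes carrying a candidate motif in one species, and the orthologs carrying it in a second species, overlap more than expected by chance. The sketch below uses a hypergeometric tail probability for that overlap; the choice of test, the function name and all the numbers are assumptions for illustration and are not taken from the slides.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical numbers: 4000 ortholog pairs; the motif occurs upstream of
# 100 genes in species 1 and 120 in species 2; 30 ortholog pairs have it in both.
p = hypergeom_tail(N=4000, K=100, n=120, k=30)
print(f"expected overlap by chance: {100 * 120 / 4000:.0f}, p-value: {p:.2e}")
```

A motif whose target sets overlap far beyond the chance expectation (here 3 genes) is a candidate for a conserved regulatory element; a random k-mer would typically show an overlap close to that expectation.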

24

25

Data analysis

Total motifs: 80,000

P-value filter: 12,000

Low-complexity filter: 7,673

Hierarchical clustering: 1,269

26

Validation

34/48 known sites discovered.

Large fraction of matches for significant p-values.

27

Identification of known binding sites

28

Biological Significance

Functional coherence

Expression coherence

29

Characteristic Features

Conservation of binding affinity

Conservation of position & orientation

30

References

Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201.

McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782.

Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108.

31

Sensitivity vs. Specificity

$\text{specificity} = \frac{TP}{TP + FP}$

$\text{sensitivity} = \frac{TP}{TP + FN}$
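
As a small worked example of these two measures, the sketch below applies them to counts of true positives (TP), false positives (FP) and false negatives (FN). The sensitivity example reuses the 34/48 figure from the Validation slide; the specificity example uses hypothetical counts. Note that specificity as defined on this slide, TP/(TP+FP), is what other fields usually call precision or positive predictive value.

```python
def sensitivity(tp, fn):
    """Fraction of real sites that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tp, fp):
    """Fraction of predictions that are real sites, as defined on the slide: TP / (TP + FP)."""
    return tp / (tp + fp)

# Validation slide: 34 of 48 known sites were discovered, so 14 were missed.
print(f"{sensitivity(tp=34, fn=14):.3f}")  # 0.708

# Hypothetical counts: 34 correct predictions and 6 false ones.
print(f"{specificity(tp=34, fp=6):.2f}")   # 0.85
```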
