Identification of Transcription Factor Binding Sites

Lior Harpaz, Ofer Shany, 09/05/2004


Page 1: Identification of Transcription Factor Binding Sites

Identification of Transcription Factor Binding Sites

Lior Harpaz, Ofer Shany

09/05/2004

Page 2: Identification of Transcription Factor Binding Sites

Goal: find TFBSs (transcription factor binding sites)!

[Figure: input, output]

Page 3: Identification of Transcription Factor Binding Sites

Importance
• TFs regulate gene expression.
• Identifying TFs can teach us about:
  • Mapping of regulatory pathways
  • Potential functions of genes

Page 4: Identification of Transcription Factor Binding Sites

Experimental Methods
• Footprinting
• EMSA (electrophoretic mobility shift assay)

Problems:
• Time consuming
• Do not scale up to whole genomes

Page 5: Identification of Transcription Factor Binding Sites

Computational Methods: Goals
• Identifying known TFBSs in previously unknown locations.
• Identifying unknown TFBSs.

Page 6: Identification of Transcription Factor Binding Sites

Computational Methods
• Basic idea: locate TFBSs using sequence searching

Problems:
• Short sequences (5-15 bp)
• Degenerate sequences
• Location
• Biological reality
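The first two problems can be made concrete with a minimal sketch of sequence searching. It assumes a degenerate site written as an IUPAC consensus and a uniform random background; the consensus `TGACNTCA`, the toy sequence, and all function names are made up for illustration and do not come from the slides.

```python
import re

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Turn an IUPAC consensus into a regular expression."""
    return "".join("[" + IUPAC[c] + "]" for c in consensus)

def find_sites(sequence, consensus):
    """Return 0-based start positions of all matches, overlaps included."""
    # A lookahead matches without consuming, so overlapping sites are found.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence)]

def expected_chance_hits(consensus, genome_length):
    """Expected matches on one strand of a uniform random sequence."""
    p = 1.0
    for c in consensus:
        p *= len(IUPAC[c]) / 4.0
    return p * (genome_length - len(consensus) + 1)

# Hypothetical consensus and toy sequence (illustration only).
hits = find_sites("ACGTGACGTCATTGACCTCA", "TGACNTCA")
print(hits)  # [3, 12]
```

Under the same uniform-background assumption, `expected_chance_hits("TGACGT", 4_600_000)` gives on the order of a thousand chance matches per strand, which is why a bare sequence search for a short site produces many false positives.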

Page 7: Identification of Transcription Factor Binding Sites

Computational Methods

Possible solutions:
• Conservation = functional importance
• mRNA expression pattern
• Phylogenetic footprinting
• Network-level conservation

Page 8: Identification of Transcription Factor Binding Sites

Phylogenetic footprinting
• Identify orthologous genes.
• Concentrate on conserved non-coding regions (possible regulatory regions).
• Look for conserved motifs.
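The last step can be sketched in a few lines, assuming the non-coding regions of the orthologs have already been aligned. The function name, the toy sequences, and the identity threshold are hypothetical; a real analysis would use a proper alignment or a motif finder rather than exact column identity.

```python
def conserved_windows(aligned, width, min_identity=1.0):
    """Return start columns of windows that are conserved across sequences.

    aligned: equal-length, pre-aligned sequences ('-' marks a gap).
    A column counts as conserved when every sequence has the same base
    and none has a gap.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    conserved = [
        len(set(col)) == 1 and col[0] != "-"
        for col in zip(*aligned)
    ]
    starts = []
    for i in range(length - width + 1):
        fraction = sum(conserved[i:i + width]) / width
        if fraction >= min_identity:
            starts.append(i)
    return starts

# Toy upstream regions from three hypothetical orthologs.
upstream = [
    "ACGTTGACATTT",
    "TCGATGACAGTA",
    "GCGCTGACACTC",
]
print(conserved_windows(upstream, width=5))  # [4]
```

The single window reported starts at column 4 and covers `TGACA`, the only 5-column stretch on which all three toy sequences agree.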

Page 9: Identification of Transcription Factor Binding Sites

Why should it work?
• 40% alignment between the human and mouse genomes.
• 80% of mouse genes have orthologs in the human genome.
• Only 1%-5% of the human genome encodes proteins.

Page 10: Identification of Transcription Factor Binding Sites

Things to consider…
• Choosing genomes.
• Locating the transcriptional start site.
• Alignment method.

Page 11: Identification of Transcription Factor Binding Sites

More things to consider…
• Different evolution rates for different regions in the genome.
• PSSM score cut-off.

Note: TFBSs within ORFs are not detected.
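The role of the PSSM score cut-off can be illustrated with a minimal sketch. It assumes a log-odds position-specific scoring matrix built from a handful of aligned sites, a uniform background, and a pseudocount of 1; the sites, the sequence, and both cut-off values are invented for the example.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per column) from aligned sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for j in range(width):
        column = [s[j] for s in sites]
        pssm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pssm

def score(pssm, window):
    """Sum the per-column log-odds scores of one window."""
    return sum(col[b] for col, b in zip(pssm, window))

def scan(pssm, sequence, cutoff):
    """Return (position, score) for every window scoring at least cutoff."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        s = score(pssm, sequence[i:i + width])
        if s >= cutoff:
            hits.append((i, s))
    return hits

# Hypothetical known sites and a toy sequence to scan.
pssm = build_pssm(["TGACA", "TGACA", "TGATA", "TGACT"])
sequence = "CCTGACAGGTTACAGG"
strict = [pos for pos, _ in scan(pssm, sequence, cutoff=5.0)]
lenient = [pos for pos, _ in scan(pssm, sequence, cutoff=3.0)]
print(strict, lenient)  # [2] [2, 9]
```

Lowering the cut-off from 5.0 to 3.0 adds the weaker match `TTACA` at position 9: a lenient cut-off recovers more degenerate sites but also admits more false positives.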

Page 12: Identification of Transcription Factor Binding Sites

Phylogenetic footprinting in proteobacterial genomes
• Study set of 190 E. coli genes with known TFBSs.
• Orthologs were searched for in eight other bacteria.
• Motif search by Bayesian Gibbs sampling.

Page 13: Identification of Transcription Factor Binding Sites

Bayesian Gibbs sampling
• Algorithm for motif search.
• Each motif is assigned a MAP (maximum a posteriori probability) value.
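The core idea of Gibbs sampling for motif search can be sketched as follows. This is a simplified site sampler that assumes exactly one site per sequence and a uniform background; it is not the Bayesian sampler from the study and it does not compute MAP values. The function names, parameters, and toy sequences are hypothetical.

```python
import random

BASES = "ACGT"

def column_probs(sites, width, pseudocount=0.5):
    """Per-column base probabilities estimated from the current sites."""
    probs = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        probs.append({b: counts[b] / total for b in BASES})
    return probs

def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Sample one site start per sequence; return (starts, sites)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from every sequence except sequence i.
            others = [s[p:p + width]
                      for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
            probs = column_probs(others, width)
            # Weight each candidate start by its likelihood ratio against
            # a uniform background (0.25 per base).
            weights = []
            for p in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= probs[j][seq[p + j]] / 0.25
                weights.append(w)
            # Draw the new start in proportion to the weights.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    sites = [s[p:p + width] for s, p in zip(seqs, starts)]
    return starts, sites

# Toy sequences with a made-up 6-bp site (TGACGT) planted in each.
toy = [
    "CCATTGACGTAGGC",
    "TGACGTCCGATTAC",
    "GGCATCTGACGTAA",
    "ACCTGACGTGGATC",
]
starts, sites = gibbs_motif_search(toy, width=6)
print(starts, sites)
```

Because the procedure is stochastic, a single run can settle on a shifted or spurious motif; in practice several runs with different seeds are compared and the best-scoring motif is kept.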

Page 14: Identification of Transcription Factor Binding Sites

Bayesian Gibbs sampling

Parameters and extensions:
• Model sequence
• Palindromic patterns
• Background pattern
• Distribution of spacing between TFBSs and translation start site
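In DNA, a palindromic pattern is one that reads the same as its own reverse complement. A minimal check is sketched below; the helper names and example sites are illustrative only.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(site):
    """Return the reverse complement of a DNA string."""
    return site.translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """True when the site equals its own reverse complement."""
    return site == reverse_complement(site)

print(is_palindromic("TGACGTCA"))  # True
print(is_palindromic("TGACA"))     # False
```

A palindromic motif model ties each column to its mirror column on the opposite strand, which roughly halves the number of free parameters the sampler has to estimate.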

Page 15: Identification of Transcription Factor Binding Sites

Results
• Overall: in 146 of 184 data sets, the motifs matched known regulatory sequences.
• In the 18 sets with only 1 ortholog, only 67% of known sites were matched, and with low MAP values.
• In the 166 sets with ≥ 2 orthologs, 81% of motifs matched known regulatory sequences.

Page 16: Identification of Transcription Factor Binding Sites

Results

Out of the 166 sets (with ≥ 2 orthologs):
• 131 corresponded to known TFBSs.
• 3 corresponded to known stem-and-loop structures.
• 32 data sets contained predictions with large MAP values: these could be undocumented sites!

Documented sites were found in 138 cases without using palindromic models.
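The counts on the two results slides can be cross-checked with a little arithmetic. The split of the 146 matched sets into 12 + 134 is inferred here from the quoted percentages; it is not stated on the slides.

```python
# Counts quoted on the results slides.
sets_one_ortholog = 18
sets_multi_ortholog = 166
total_sets = sets_one_ortholog + sets_multi_ortholog      # 184

known_tfbs = 131
stem_loops = 3
large_map_predictions = 32
assert known_tfbs + stem_loops + large_map_predictions == sets_multi_ortholog

# Inferred (assumption): matched sets in each group.
matched_multi = known_tfbs + stem_loops                   # 134
matched_one = round(0.67 * sets_one_ortholog)             # 12

print(total_sets)                                         # 184
print(matched_one + matched_multi)                        # 146
print(round(100 * matched_multi / sets_multi_ortholog))   # 81
print(round(100 * matched_one / sets_one_ortholog))       # 67
```

Under this reading, the 81% figure counts both the 131 known TFBSs and the 3 known stem-and-loop structures as matches to known regulatory sequences.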

Page 17: Identification of Transcription Factor Binding Sites

Identification of a new TF
• A new site was found near fabA, fabB and yqfA.
• YijC binds to these sites.
• Site location, protein structure and previous experimental results suggest that YijC is a repressor of the fab genes.
• This indicates that yqfA is involved in fatty-acid metabolism.

Page 18: Identification of Transcription Factor Binding Sites

Genomic scale phylogenetic footprinting
• 2113 ORFs of E. coli were used.
• 187 new sites were identified as probable sites for 46 known TFs.
• The remaining sites are expected to represent unknown TFBSs.
• MAP values of predicted sites were lower.

Page 19: Identification of Transcription Factor Binding Sites

[Figure: MAP values left-shift]

Page 20: Identification of Transcription Factor Binding Sites

[Figure: Ortholog distribution. Panels: full set, study set]

Page 21: Identification of Transcription Factor Binding Sites

Conclusions
• New sites for known TFs were found.
• Regulatory stem-loops are conserved.
• New sites for unknown TFs are predicted.
• A new TF was identified (YijC).
• A gene function was predicted (yqfA).

Page 22: Identification of Transcription Factor Binding Sites

Break

Page 23: Identification of Transcription Factor Binding Sites

Network level conservation
• Each TF regulates the expression of many genes (20-400).
• Conservation of global gene expression requires the conservation of regulatory mechanisms.
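One plausible way to quantify network-level conservation is sketched below. It assumes that, for a candidate motif, we know which ortholog pairs carry the motif upstream in each of two species, and that the overlap is scored with a hypergeometric tail probability. The slides do not name the statistic, so the choice of test, the function names, and the toy numbers are all assumptions.

```python
from math import comb

def hypergeom_tail(n_total, n_a, n_b, k):
    """P(overlap >= k) when sets of size n_a and n_b are drawn at random
    from n_total items."""
    upper = min(n_a, n_b)
    total = comb(n_total, n_b)
    favourable = sum(comb(n_a, x) * comb(n_total - n_a, n_b - x)
                     for x in range(k, upper + 1))
    return favourable / total

def conservation_pvalue(genes_a, genes_b, n_orthologs):
    """P-value for the overlap of two sets of ortholog-pair identifiers.

    genes_a / genes_b: ortholog pairs with the motif upstream in
    species A / species B (hypothetical inputs).
    """
    overlap = len(set(genes_a) & set(genes_b))
    return hypergeom_tail(n_orthologs, len(set(genes_a)), len(set(genes_b)), overlap)

# Toy example: 1000 ortholog pairs, motif upstream of 30 genes in each
# species, 12 of them shared.
species_a = set(range(0, 30))
species_b = set(range(18, 48))
print(conservation_pvalue(species_a, species_b, n_orthologs=1000))
```

A small p-value means the same orthologous genes carry the motif in both species far more often than chance would predict, which is the signal expected when a regulatory mechanism is conserved.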

Page 24: Identification of Transcription Factor Binding Sites


Page 25: Identification of Transcription Factor Binding Sites

Data analysis
• Total motifs: 80,000
• P-value filter: 12,000
• Low-complexity filter: 7,673
• Hierarchical clustering: 1,269

Page 26: Identification of Transcription Factor Binding Sites

Validation
• 34 of 48 known sites discovered.
• Large fraction of matches for significant p-values.

Page 27: Identification of Transcription Factor Binding Sites

[Figure: Identification of known binding sites]

Page 28: Identification of Transcription Factor Binding Sites

Biological Significance
• Functional coherence
• Expression coherence

Page 29: Identification of Transcription Factor Binding Sites

Characteristic Features
• Conservation of binding affinity
• Conservation of position & orientation

Page 30: Identification of Transcription Factor Binding Sites

References
• Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003 5:201.
• McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001 29:774-782.
• Pritsker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004 14:99-108.

Page 31: Identification of Transcription Factor Binding Sites

Sensitivity vs. Specificity

$$\text{specificity} = \frac{TP}{TP + FP}$$

$$\text{sensitivity} = \frac{TP}{TP + FN}$$
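As a small worked example, the two measures can be computed directly from counts, using this slide's definitions: specificity as TP / (TP + FP) and sensitivity as TP / (TP + FN). The sensitivity call uses the validation figure of 34 of 48 known sites discovered; the specificity call uses made-up counts, since the slides give no false-positive number.

```python
def sensitivity(tp, fn):
    """Fraction of true sites that were found: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tp, fp):
    """Fraction of predictions that are true sites: TP / (TP + FP).

    This follows the slide's definition, which is often called
    positive predictive value elsewhere.
    """
    return tp / (tp + fp)

# Validation slide: 34 of 48 known sites discovered, so 14 were missed.
print(round(sensitivity(tp=34, fn=14), 3))  # 0.708

# Hypothetical counts, for illustration only.
print(specificity(tp=3, fp=1))  # 0.75
```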