1
Identification of Transcription Factor
Binding Sites
Lior Harpaz, Ofer Shany, 09/05/2004
2
Goal - find TFBS!
[Figure: input / output]
3
Importance
TFs regulate gene expression. Identifying TFBSs can teach us:
Mapping of regulatory pathways
Potential functions of genes
4
Experimental Methods
Footprinting
EMSA - electrophoretic mobility shift assay
Problems:
• Time consuming
• Do not scale up to whole genomes
5
Computational Methods - Goals
Identifying known TFBSs in previously unknown locations.
Identifying unknown TFBSs.
6
Computational Methods
Basic idea - locate TFBSs by sequence searching
Problems:
• Short sequences (5-15 bp)
• Degenerate sequences
• Location
• Biological reality
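The first two problems can be quantified with a back-of-the-envelope count of chance matches. A minimal sketch, assuming a random genome with independent, equally frequent bases and a length of about 4.6 Mb (roughly the E. coli genome); the two consensus strings, written in IUPAC codes, are illustrative and not taken from the slides.

```python
# Expected number of chance matches of a consensus motif in random DNA.
# Assumptions (illustrative only): i.i.d. background with equal base
# frequencies, both strands scanned, no correction for overlapping hits.

# Number of bases accepted by each IUPAC nucleotide code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def match_probability(consensus: str) -> float:
    """Probability that one random position matches the consensus on one strand."""
    p = 1.0
    for code in consensus.upper():
        p *= IUPAC_SIZE[code] / 4.0
    return p


def expected_hits(consensus: str, genome_length: int, both_strands: bool = True) -> float:
    """Expected number of chance occurrences in a random genome of this length."""
    positions = genome_length - len(consensus) + 1
    strands = 2 if both_strands else 1
    return strands * positions * match_probability(consensus)


if __name__ == "__main__":
    genome = 4_600_000  # assumed genome length, roughly E. coli
    for motif in ["TTGACA", "TGTGANNNNNNTCACA"]:  # illustrative consensuses
        print(motif, round(expected_hits(motif, genome)))
```

Even a fully specified 6-mer is expected over two thousand times by chance, which is why sequence search alone yields many false positives.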
7
Computational Methods
Possible solutions:
Conservation = functional importance
mRNA expression patterns
Phylogenetic footprinting
Network-level conservation
8
Phylogenetic footprinting
Identify orthologous genes.
Concentrate on conserved non-coding regions (possible regulatory regions).
Look for conserved motifs.
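The footprinting idea can be sketched in its crudest form: keep only the words that occur, on either strand, upstream of every ortholog. This assumes exact k-mer matches and pre-extracted upstream regions (the species names and sequences below are hypothetical); real tools use alignments or motif samplers that tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All k-mers of seq in canonical form (lexicographically smaller strand)."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        found.add(min(word, revcomp(word)))
    return found


def conserved_kmers(upstream_by_species: dict, k: int) -> set:
    """k-mers shared by the upstream (non-coding) regions of every ortholog."""
    sets = [kmers(s, k) for s in upstream_by_species.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    upstream = {  # hypothetical upstream regions of one orthologous gene
        "species_1": "AAATTGACATTTTT",
        "species_2": "GGGTGTCAAGGG",  # site present on the opposite strand
        "species_3": "CCTTGACACC",
    }
    print(conserved_kmers(upstream, 6))  # {'TGTCAA'}, i.e. TTGACA on either strand
```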
9
Why should it work?
About 40% of the human and mouse genomes can be aligned.
80% of mouse genes have orthologs in the human genome.
Only 1%-5% of the human genome encodes proteins.
10
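Things to consider…
Choosing genomes: distant enough that non-functional sequence has diverged, close enough that regulatory sites are still shared.
Locating the transcription start site.
Alignment method.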
11
More things to consider…
Different evolution rates for different regions of the genome.
PSSM score cut-off.
Note: TFBSs within ORFs are not detected, since only non-coding regions are searched.
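A PSSM scores each window as a sum of per-position log-odds values, and the cut-off decides which windows count as predicted sites. A minimal sketch, assuming a uniform 0.25 background, a pseudocount of 1 and scanning of the forward strand only; the training sites, sequence and cut-off value are hypothetical.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix from equal-length aligned sites (uniform background assumed)."""
    pssm = []
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        total = len(column) + 4 * pseudocount
        pssm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pssm


def score(pssm, word):
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pssm, word))


def scan(pssm, sequence, cutoff):
    """Return (position, score) for every forward-strand window scoring >= cutoff."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        word = sequence[i:i + width]
        if set(word) <= set(BASES):  # skip windows containing N or other symbols
            s = score(pssm, word)
            if s >= cutoff:
                hits.append((i, s))
    return hits


if __name__ == "__main__":
    sites = ["TTGACA", "TTGACT", "TTGACA", "TTTACA"]  # hypothetical aligned sites
    pssm = build_pssm(sites)
    print(scan(pssm, "GGTTGACAGG", cutoff=5.0))  # a single hit at position 2
```

Raising the cut-off removes weak matches (fewer false positives) at the cost of missing degenerate true sites.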
12
Phylogenetic footprinting in proteobacterial genomes
Study set: 190 E. coli genes with known TFBSs.
Orthologs were searched for in eight other bacteria.
Motif search by Bayesian Gibbs sampling.
13
Bayesian Gibbs sampling
An algorithm for motif search. Each motif is assigned a MAP (maximum a posteriori) value.
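The core loop of Gibbs sampling for motifs can be sketched as the classic one-site-per-sequence site sampler: hold all sites but one fixed, build a profile from them, and resample the remaining site in proportion to its motif-versus-background likelihood ratio. This is a simplified illustration, not the Bayesian sampler used in the study: it assumes exactly one site per sequence, a fixed motif width and a uniform background, and it does not compute MAP values. The example sequences are hypothetical.

```python
import random

BASES = "ACGT"


def profile_from(sequences, positions, width, skip, pseudocount=0.5):
    """Base frequencies per motif column, built from every site except sequence `skip`."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for idx, (seq, pos) in enumerate(zip(sequences, positions)):
        if idx == skip:
            continue
        for j, base in enumerate(seq[pos:pos + width]):
            counts[j][base] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def gibbs_site_sampler(sequences, width, iterations=1000, seed=0, background=0.25):
    """Simplified one-site-per-sequence Gibbs sampler (uppercase ACGT input only).

    Returns one motif start position per sequence.
    """
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        i = rng.randrange(len(sequences))  # pick one sequence to update
        profile = profile_from(sequences, positions, width, skip=i)
        seq = sequences[i]
        # Weight each candidate start by its motif-vs-background likelihood ratio.
        weights = []
        for start in range(len(seq) - width + 1):
            ratio = 1.0
            for j, base in enumerate(seq[start:start + width]):
                ratio *= profile[j][base] / background
            weights.append(ratio)
        # Resample this sequence's site in proportion to the weights.
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions


if __name__ == "__main__":
    seqs = ["GCGTTGACATAGCT", "ATTGACAGCGCATG", "CCGATTTGACAGGA"]
    print(gibbs_site_sampler(seqs, width=6, seed=1))
```

Because the sampler is stochastic, practical use runs it from several seeds and keeps the best-scoring alignment.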
14
Bayesian Gibbs sampling
Parameters and extensions:
Model sequence
Palindromic patterns
Background pattern
Distribution of spacing between TFBSs and the translation start site
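A palindromic model reflects the fact that many bacterial TFs bind as dimers to inverted repeats, so motif column j should mirror the complement of column L-1-j. One simple way to impose this, shown here as an assumption rather than the study's exact model, is to average each column of a count matrix with its complemented mirror column.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def is_palindromic(seq):
    """True if seq equals its own reverse complement (an inverted repeat)."""
    return seq.upper() == reverse_complement(seq)


def symmetrize(counts):
    """Tie motif column j to the complement of column L-1-j.

    counts: list of {base: count} dicts, one per motif column.
    """
    length = len(counts)
    return [
        {b: (counts[j][b] + counts[length - 1 - j][COMPLEMENT[b]]) / 2 for b in "ACGT"}
        for j in range(length)
    ]


if __name__ == "__main__":
    print(is_palindromic("GAATTC"), is_palindromic("TTGACA"))  # True False
```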
15
Results
Overall: in 146/184 sets, motifs matched known regulatory sequences.
In 18 genes (with 1 ortholog), only 67% of known sites were matched, and with low MAP values.
In 166 sets (with >= 2 orthologs), 81% of motifs matched known regulatory sequences.
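The three result lines are consistent with each other. Assuming the percentages are rounded from whole counts (the counts 12 and 134 are inferred here, not stated on the slide), the overall figure decomposes as:

```latex
\begin{aligned}
18 + 166 &= 184 \text{ sets} \\
0.67 \times 18 &\approx 12, \qquad 0.81 \times 166 \approx 134 \\
12 + 134 &= 146, \qquad \tfrac{146}{184} \approx 79\%
\end{aligned}
```

The inferred 134 also equals 131 + 3, the known TFBSs and stem-loops listed for the same 166 sets.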
16
Results
Out of the 166 sets (with >= 2 orthologs):
131 corresponded to known TFBSs.
3 corresponded to known stem-loop structures.
32 contained predictions with large MAP values: these could be undocumented sites!
Without palindromic models, documented sites were still found in 138 cases.
17
Identification of a new TF
A new site was found near fabA, fabB & yqfA.
YijC binds to these sites.
Site location, protein structure & previous experimental results suggest that YijC is a repressor of the fab genes.
This indicates yqfA’s involvement in fatty-acid metabolism.
18
Genomic-scale phylogenetic footprinting
2113 E. coli ORFs were used.
187 new sites were identified as probable sites for 46 known TFs.
The remaining sites are expected to represent unknown TFBSs.
MAP values of predicted sites were lower.
19
MAP values: left shift
[Figure: MAP value distributions]
20
Ortholog Distribution
[Figure panels: Full set; Study set]
21
Conclusions
New sites for known TFs were found.
Conservation of regulatory stem-loops.
New sites for unknown TFs are predicted.
New TF identified (YijC).
Predicted gene function (yqfA).
22
Break
23
Network-level conservation
Each TF regulates the expression of many genes (20-400).
Conservation of global gene expression requires the conservation of regulatory mechanisms.
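Network-level conservation can be tested by asking whether the genes carrying a motif in one species are, through their orthologs, also the genes carrying it in the other. A minimal sketch, assuming a one-to-one ortholog table and a hypergeometric test on the overlap of the two target sets; this is one plausible statistic, not necessarily the one used by Pritzker et al., and the input format and gene names are hypothetical.

```python
from math import comb


def hypergeom_sf(overlap, total, targets_a, targets_b):
    """P(X >= overlap): draw targets_b of total genes, targets_a of which are 'successes'."""
    upper = min(targets_a, targets_b)
    return sum(
        comb(targets_a, k) * comb(total - targets_a, targets_b - k)
        for k in range(overlap, upper + 1)
    ) / comb(total, targets_b)


def target_conservation_pvalue(targets_species_a, targets_species_b, ortholog_pairs):
    """Overlap p-value for a motif's target sets, compared through ortholog pairs.

    ortholog_pairs: list of (gene_in_a, gene_in_b) tuples (hypothetical format).
    """
    # Map both target sets onto a shared gene index (the species-a name).
    a_hits = {ga for ga, gb in ortholog_pairs if ga in targets_species_a}
    b_hits = {ga for ga, gb in ortholog_pairs if gb in targets_species_b}
    overlap = len(a_hits & b_hits)
    return hypergeom_sf(overlap, len(ortholog_pairs), len(a_hits), len(b_hits))


if __name__ == "__main__":
    pairs = [(f"a{i}", f"b{i}") for i in range(10)]  # hypothetical orthologs
    p = target_conservation_pvalue({"a0", "a1", "a2"}, {"b0", "b1", "b2", "b7"}, pairs)
    print(round(p, 4))  # 0.0333
```

A motif whose targets overlap far more than chance across species is a candidate functional TFBS even if no TF is yet known for it.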
24
25
Data analysis
Total motifs: 80,000
P-value filter: 12,000
Low-complexity filter: 7,673
Hierarchical clustering: 1,269
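The slide does not say how low-complexity motifs were recognized. One simple heuristic, used here purely as an assumption, is the Shannon entropy of a motif's base composition: runs such as AAAAAA or ATATAT score low and are discarded. The 1.5-bit threshold is hypothetical.

```python
import math
from collections import Counter


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the base composition of seq (0 to 2 for DNA)."""
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_filter(motifs, min_entropy=1.5):
    """Keep motifs whose composition is not dominated by one or two bases."""
    return [m for m in motifs if composition_entropy(m) >= min_entropy]


if __name__ == "__main__":
    print(low_complexity_filter(["AAAAAAAA", "ATATATAT", "TTGACA"]))  # ['TTGACA']
```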
26
Validation
34/48 known sites discovered.
A large fraction of motifs with significant p-values matched known sites.
27
Identification of known binding sites
28
Biological Significance
Functional coherence
Expression coherence
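Expression coherence asks whether genes sharing a motif also share an expression pattern. A minimal sketch, assuming Pearson correlation between expression profiles and an arbitrary 0.7 threshold (both hypothetical choices, not taken from the slides): the score is the fraction of gene pairs in the set whose profiles correlate at or above the threshold.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (fails on constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def expression_coherence(profiles, threshold=0.7):
    """Fraction of gene pairs whose expression profiles correlate >= threshold."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    coherent = sum(1 for x, y in pairs if pearson(x, y) >= threshold)
    return coherent / len(pairs)


if __name__ == "__main__":
    # Hypothetical profiles of three genes sharing a predicted site.
    profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    print(round(expression_coherence(profiles), 3))  # 0.333
```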
29
Characteristic Features
Conservation of binding affinity
Conservation of position & orientation
30
References
Bulyk M. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003;5:201.
McCue L, Thompson W, Carmack C, Ryan MP, Liu JS, Derbyshire V, Lawrence CE. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Nucleic Acids Res. 2001;29:774-782.
Pritzker M, Liu YC, Beer MA, Tavazoie S. Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res. 2004;14:99-108.
31
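Sensitivity vs. Specificity
$$\text{specificity} = \frac{TP}{TP + FP}$$
$$\text{sensitivity} = \frac{TP}{TP + FN}$$
(The specificity defined here, TP/(TP+FP), is also called precision or positive predictive value.)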
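The two definitions translate directly into code. As an assumed example, reading the validation slide's "34/48 known sites discovered" as TP = 34 and FN = 14 gives a sensitivity of 34/48 ≈ 0.71; no false-positive count is given, so specificity is not computed for it.

```python
def sensitivity(tp, fn):
    """Fraction of true sites that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """Slide's definition: fraction of predictions that are true, TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    print(round(sensitivity(34, 48 - 34), 2))  # 0.71
```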