![Page 1: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/1.jpg)
How do we represent position-specific preferences?
BID_MOUSE   I A R H L A Q I G D E M
BAD_MOUSE   Y G R E L R R M S D E F
BAK_MOUSE   V G R Q L A L I G D D I
BAXB_HUMAN  L S E C L K R I G D E L
BimS        I A Q E L R R I G D E F
HRK_HUMAN   T A A R L K A L G D E L
Egl-1       I G S K L A A M C D D F
Statistical representation (alignment column 9, 7 sequences)
G: 5 -> 71%
S: 1 -> 14%
C: 1 -> 14%
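The counts above can be reproduced with a few lines of Python. This is a minimal sketch: the sequences are the seven aligned peptides from the slide, and the function name `column_frequencies` is a hypothetical helper, not part of any tool mentioned here.

```python
from collections import Counter

# The seven aligned sequences from the slide (12 columns each).
ALIGNMENT = {
    "BID_MOUSE":  "IARHLAQIGDEM",
    "BAD_MOUSE":  "YGRELRRMSDEF",
    "BAK_MOUSE":  "VGRQLALIGDDI",
    "BAXB_HUMAN": "LSECLKRIGDEL",
    "BimS":       "IAQELRRIGDEF",
    "HRK_HUMAN":  "TAARLKALGDEL",
    "Egl-1":      "IGSKLAAMCDDF",
}


def column_frequencies(sequences, column):
    """Return {residue: (count, percent)} for a 1-based alignment column."""
    counts = Counter(seq[column - 1] for seq in sequences)
    total = sum(counts.values())
    return {res: (n, 100.0 * n / total) for res, n in counts.items()}


if __name__ == "__main__":
    freqs = column_frequencies(list(ALIGNMENT.values()), 9)
    # Print residues from most to least frequent.
    for residue, (n, pct) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
        print(f"{residue}: {n} -> {pct:.0f}%")
```

Repeating this for every column gives the full position-specific frequency table for the motif.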
Basic concept of motif identification 2.
![Page 2: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/2.jpg)
Practice: Identify potential transcription factor binding sites in a promoter sequence.
Using TESS: Transcription Element Search System
http://www.cbil.upenn.edu/cgi-bin/tess/tess33?RQ=WELCOME
![Page 3: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/3.jpg)
TESS result
![Page 4: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/4.jpg)
Why are there so many false positives in a TF binding site scan?
Contextual dependency is not considered.
Stringency of the matrices.
![Page 5: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/5.jpg)
Stringency of the matrices
P53_02 (consensus: 10 bp)
A    C    G    T    Consensus
40   13   23   23   N
20   3    70   5    G
55   3    40   0    R
0    93   0    5    C
53   8    8    30   W
15   0    3    82   T
0    0    100  0    G
0    50   0    50   Y
0    68   0    30   C
12   35   3    48   Y

P53_01 (consensus: 20 bp)
A    C    G    T    Consensus
4    0    13   0    G
5    0    12   0    G
15   0    2    0    A
0    17   0    0    C
17   0    0    0    A
0    0    0    17   T
0    0    17   0    G
0    13   0    4    C
0    17   0    0    C
0    17   0    0    C
0    0    17   0    G
0    0    17   0    G
2    0    15   0    G
0    17   0    0    C
17   0    0    0    A
0    0    0    17   T
0    0    17   0    G
0    2    0    15   T
0    13   0    4    C
0    7    2    7    Y
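One way to see why the shorter, more ambiguous matrix is less stringent is to estimate how often each consensus would match by chance. The sketch below is an illustration under strong assumptions that are not from the slides: bases are independent and equally likely, only one strand is scanned, and a "hit" means an exact match to the IUPAC consensus rather than a matrix score above a threshold. The function names are hypothetical.

```python
from fractions import Fraction

# Number of bases (out of A, C, G, T) that each IUPAC code matches.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "W": 2, "S": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def chance_match_probability(consensus):
    """Probability that a random window matches the consensus exactly.

    Assumes independent, uniformly distributed bases (an idealisation).
    """
    p = Fraction(1)
    for code in consensus:
        p *= Fraction(IUPAC_DEGENERACY[code], 4)
    return p


def expected_chance_hits(consensus, sequence_length):
    """Expected number of chance matches on one strand of a random sequence."""
    windows = max(sequence_length - len(consensus) + 1, 0)
    return float(windows * chance_match_probability(consensus))


if __name__ == "__main__":
    # Consensus strings read off the two matrices on the slide.
    for consensus in ("NGRCWTGYCY", "GGACATGCCCGGGCATGTCY"):
        p = chance_match_probability(consensus)
        hits = expected_chance_hits(consensus, 1_000_000)
        print(f"{consensus}: p = {p}, expected hits per Mb = {hits:.3g}")
```

Under these assumptions the 10 bp consensus matches a random window with probability 1/16384, while the 20 bp consensus is many orders of magnitude rarer, which is the intuition behind "stringency".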
![Page 6: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/6.jpg)
DNA Pattern – Transcription factor binding site
• Pattern strings / matrices are extracted from known binding sequences.
• Core vs whole.
• Some short and/or ambiguous patterns will have many hits.
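The effect of pattern length and ambiguity on the number of hits can be checked with a small pattern-string scanner. This is a sketch only: the test sequence is made up for illustration, and the helper `find_pattern_hits` is hypothetical (it is not how TESS or any other tool named in these slides works internally).

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pattern_hits(pattern, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    body = "".join(IUPAC_TO_REGEX[code] for code in pattern.upper())
    # A lookahead lets the regex engine report overlapping matches.
    regex = re.compile(f"(?=({body}))")
    return [m.start() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence, invented for this example.
    sequence = "TTGGACATGCCCGGGCATGTCCAA"
    for pattern in ("GR", "NGRCWTGYCY", "GGACATGCCCGGGCATGTCY"):
        hits = find_pattern_hits(pattern, sequence)
        print(f"{pattern}: {len(hits)} hit(s) at {hits}")
```

Even on this short sequence, the two-letter ambiguous pattern matches more often than the 10 bp pattern, which in turn matches more often than the 20 bp one.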
![Page 7: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/7.jpg)
Sequence logo
Pos  Info   N   A   C   G   T   Consensus
1    0.679  27  0   5   17  5   G
2    0.883  27  6   2   19  0   G
3    1.771  27  1   0   26  0   G
4    1.619  27  25  2   0   0   A
5    2      27  0   0   0   27  T
6    1.771  27  0   0   1   26  T
7    1.771  27  26  0   0   1   A
8    0.192  27  8   2   11  6   R
[Sequence logo figure: y-axis shows information content, with ticks at 1.0 and 2.0]
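The "Info" column is consistent with the usual per-position information content for DNA, 2 minus the Shannon entropy of the base frequencies, in bits. The sketch below recomputes it from the counts in the table. It assumes no small-sample correction is applied, which appears to match the slide's values but is not stated there.

```python
import math

# (A, C, G, T) counts for positions 1..8, taken from the table on the slide.
COUNT_MATRIX = [
    (0, 5, 17, 5),
    (6, 2, 19, 0),
    (1, 0, 26, 0),
    (25, 2, 0, 0),
    (0, 0, 0, 27),
    (0, 0, 1, 26),
    (26, 0, 0, 1),
    (8, 2, 11, 6),
]


def information_content(counts):
    """Return 2 - H in bits, where H is the Shannon entropy of the counts.

    Zero counts contribute nothing to the entropy. No small-sample
    correction is applied (an assumption, see the text above).
    """
    total = sum(counts)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts if n > 0)
    return 2.0 - entropy


if __name__ == "__main__":
    for position, counts in enumerate(COUNT_MATRIX, start=1):
        print(f"{position}\t{information_content(counts):.3f}")
```

A fully conserved position (position 5, all T) reaches the maximum of 2 bits, while a nearly uniform position (position 8) is close to 0; the letter heights in the logo are scaled by these values.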
![Page 8: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/8.jpg)
Comparing genomes
For understanding genome organization.
For identifying functionally conserved regions / sequences:
• 3’ and 5’ UTRs (e.g. microRNA binding sites)
• Transcription factor binding sites / regulatory modules
![Page 9: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/9.jpg)
Vista Genome Browser
Practice & Observe: Cross-genome comparison using the Vista browser
![Page 10: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/10.jpg)
Identifying conserved regulatory modules
• Regulatory module: a set of TF binding sites that controls a particular aspect of transcriptional regulation.
• Functional requirements lead to conservation at the binding site (sequence) level.
![Page 11: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/11.jpg)
Ways to identify conserved regulatory modules
• Based on sequence similarity: MEME, rVista, Whole genome rVista for model organisms…
• Based on binding site identity: BLISS
![Page 12: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/12.jpg)
Practice: Identifying conserved TF binding sites using rVista
1.) Search for your gene in Whole genome rVista.
Or
2.) Compile corresponding genomic region from different species (can be >2). Load to rVista. This can be used for identifying shared regulatory modules in related genes in the same organism as well.
![Page 13: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/13.jpg)
rVista
Practice & Observe: Load genomic sequences from Human, Rat, and Opossum to rVista. Choose TF matrices (e.g. E2F, P53, ATF).
![Page 14: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/14.jpg)
Representation of Deep Seq data
Chrom.  Start     End       Name  Score  Strand
chr2L   10000192  10000217  U0    0      +
chr2L   10000227  10000252  U1    0      -
chr2R   10000310  10000335  U2    0      +
chr3L   10000496  10000521  U1    0      -
chr21   10000556  10000581  U2    0      +
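The six columns shown here are the first six fields of the BED format. As a minimal sketch, each line can be parsed into a typed record. The class and function names below are hypothetical. In BED, the start coordinate is 0-based and the end coordinate is exclusive, so `end - start` gives the feature length.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: int
    strand: str


def parse_bed_line(line):
    """Parse the first six whitespace-separated BED fields of one line."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)


# The example lines from the slide.
BED_TEXT = """\
chr2L 10000192 10000217 U0 0 +
chr2L 10000227 10000252 U1 0 -
chr2R 10000310 10000335 U2 0 +
chr3L 10000496 10000521 U1 0 -
chr21 10000556 10000581 U2 0 +
"""

if __name__ == "__main__":
    records = [parse_bed_line(l) for l in BED_TEXT.splitlines() if l.strip()]
    for rec in records:
        print(rec.chrom, rec.start, rec.end, rec.name, rec.strand,
              "length:", rec.end - rec.start)
```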
![Page 15: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/15.jpg)
Representation of Deep Seq data
The importance of the reference genome
• All coordinates are only meaningful for a given genome assembly.
• One assembly may have multiple releases (annotations).
![Page 16: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/16.jpg)
Manipulating Deep Seq data with Galaxy
Practice & Observe:
1. Load the PolII.H99.Bed file to Galaxy with the Get Data tool.
2. Sort the data by chromosome location (column c2).
3. Filter out lines with U2 using the expression c4!='U2'
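For readers who want to see what the sort and filter steps do, the same operations can be mimicked in plain Python. This is a sketch, not Galaxy's implementation: the rows are the example BED lines from the earlier slide (not the contents of PolII.H99.Bed), the function names are hypothetical, and the filter mirrors the expression c4!='U2', which keeps every line whose fourth column is not U2.

```python
# Example rows: (chrom, start, end, name, score, strand).
# Columns are numbered c1..c6 in the exercise, so c2 is start and c4 is name.
ROWS = [
    ("chr2R", 10000310, 10000335, "U2", 0, "+"),
    ("chr2L", 10000192, 10000217, "U0", 0, "+"),
    ("chr21", 10000556, 10000581, "U2", 0, "+"),
    ("chr2L", 10000227, 10000252, "U1", 0, "-"),
    ("chr3L", 10000496, 10000521, "U1", 0, "-"),
]


def sort_by_c2(rows):
    """Sort rows numerically by column c2 (the start coordinate)."""
    return sorted(rows, key=lambda row: row[1])


def keep_where_c4_is_not(rows, excluded_name="U2"):
    """Keep rows whose column c4 differs from excluded_name (c4 != 'U2')."""
    return [row for row in rows if row[3] != excluded_name]


if __name__ == "__main__":
    for row in keep_where_c4_is_not(sort_by_c2(ROWS)):
        print("\t".join(str(field) for field in row))
```

Changing `excluded_name` shows how the same expression pattern removes a different class of reads.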
![Page 17: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/17.jpg)
Visualizing Deep Seq data with UCSC genome browser
Practice & Observe I:
1. Load the PolII.H99.Bed file as a custom track in the browser by copying and pasting the URL.
2. View the ‘dense’ and then the ‘full’ presentation of the track.
![Page 18: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/18.jpg)
Visualizing Deep Seq data with UCSC genome browser
Practice & Observe II:
1. Save the landmark.bed file to your local computer. View the contents with Notepad.
2. Load the local file to UCSC browser.
3. Edit the color value, save, resubmit, and observe the differences.
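In a UCSC custom track, the display color is typically set in the `track` line at the top of the file with a `color=R,G,B` attribute. The sketch below builds such a file as text. The track name, description, and feature coordinates are invented for illustration and are not the contents of landmark.bed, and `build_custom_track` is a hypothetical helper.

```python
def build_custom_track(name, description, color, features):
    """Return custom-track text: one track line followed by BED lines.

    color is an (R, G, B) tuple of integers in the range 0..255.
    features is a list of tuples holding BED fields in order.
    """
    if len(color) != 3 or not all(0 <= c <= 255 for c in color):
        raise ValueError("color must be three integers in 0..255")
    rgb = ",".join(str(c) for c in color)
    header = f'track name="{name}" description="{description}" color={rgb}'
    body = ["\t".join(str(field) for field in feature) for feature in features]
    return "\n".join([header] + body) + "\n"


if __name__ == "__main__":
    # Hypothetical features, invented for this example.
    features = [
        ("chr2L", 10000192, 10000217, "landmark1"),
        ("chr2L", 10000227, 10000252, "landmark2"),
    ]
    # Same features rendered in red, then in blue.
    print(build_custom_track("landmarks", "Example landmarks", (255, 0, 0), features))
    print(build_custom_track("landmarks", "Example landmarks", (0, 0, 255), features))
```

Saving either output to a file and uploading it as a custom track should show the same features in a different color, which is the effect the exercise asks you to observe.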
![Page 19: How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G](https://reader035.vdocuments.net/reader035/viewer/2022081514/5697bf861a28abf838c88503/html5/thumbnails/19.jpg)
Apollo Genome annotation tools
Observe: Using Apollo to organize information for studying complex genomic regions.