from genome sequences to regulatory network phenotypes
![Page 1: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/1.jpg)
From Genome Sequences to Regulatory Network Phenotypes
(bioinformatic functional genomics:)
• Study the systematic operation of genes and their products in whole genome, whole cell contexts.
• Discover the effect of every gene on growth, expression, & interaction.
• Test quantitative network models.
![Page 2: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/2.jpg)
Growth, Expression, & Interaction
Harvard Center for Computational Genetics: John Aach, Tim Chen, George Church, Jason Hughes, Jason Johnson, Abby McGuire, Jong Park, Fritz Roth
Affymetrix: David Lockhart, Eric Gentalen
NCBI: Andrew Neuwald
HMS Genetics: Andy Link, Doug Selinger, Pete Estep, Michael Ching, Martha Bulyk, Sonali Bose, Martin Steffen, Saeed Tavazoie, Annie Chan, Dereth Phillips, Chris Harbison
UCSD: Bernhard Palsson
DOE, DARPA, Lipper, NIST, HMR
![Page 3: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/3.jpg)
Sequenced genomes

| Organism | # Genes | % Unknown function |
|---|---|---|
| S. cerevisiae | 6034 | 49% |
| E. coli | 4288 | 38% |
| B. subtilis | 4000 | 42% |
| Synechocystis sp. | 3168 | 56% |
| A. fulgidus | 2471 | 52% |
| H. influenzae | 1740 | 42% |
| M. thermoautotrophicum | 1855 | 56% |
| H. pylori | 1590 | 43% |
| M. jannaschii | 1692 | 54% |
| B. burgdorferi | 863 | 42% |
| M. pneumoniae | 677 | 51% |
| M. genitalium | 470 | 31% |
| Total | 28848 | 47% |

FUNs: Science 277: 1433 (1997)
![Page 4: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/4.jpg)
Choice of Cells
Small genome size: Mycoplasma, Haemophilus, Methanococcus
Energy relevance: Methanobacterium, Synechocystis
Major Pathogens: Mycobacterium, Escherichia, Helicobacter
Biotech Production: Escherichia, Saccharomyces, Homo
Recombinant protein production, in vivo combinatorial chemistry, BACs, gene delivery, etc.
15 going on 40 complete genomes. 30,000 going on 150,000 complete genes (& intergenic regions).
Smith, et al. (1997) J. Bacteriol. 179:7135-55. Methanobacterium
Blattner, et al. (1997) Science 277, 1453-74. Escherichia
Goffeau, et al. (1996) Science 274, 563-7. Saccharomyces
![Page 5: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/5.jpg)
Metabolic & regulatory databases
4288 / 4909 E. coli orfs / genes
587 - 804 enzymes
720 - 988 metabolic reactions
436 / 1303 metabolites / compounds
Varma & Palsson (1994) Appl. Env. Micro. 60:3724.
Karp et al. (1998) NAR 26:50. EcoCyc
Selkov, et al. (1997) NAR 25:37. WIT
Robison and Church http://arep.med.harvard.edu
![Page 6: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/6.jpg)
BIGED: Biomolecule Interaction, Growth, & Expression Database
John Aach, Harvard Center for Computational Genetics
Conceptual Data Model. Project: TBEID1. Model: TBEID. Author: John Aach. Version: 1.04, 7/7/97.
[Figure: entity-relationship diagram. Relationship labels: has, exhibits, used in, described by, input to. Entities and their attributes are listed below.]
• Experiment Info: Experiment Number, Experiment Type, Experimenter Name, Description, Comment, Start Time, End Time, Outcome Comment, Success Code, Sample Size, OpenInd
• Experiment Measures Set: Expt Measures Set No, Time of Measurement, Expt Measures Set Type, Description, Comment, Raw Data Sets Descrip, Data Transform Descrip, Outcome Comment, Success Code, Date Recorded, Sample Size, OpenInd
• Results Selection: Results Selection Code, Expt Measures Set Type, Results Selection Description
• Condition Set (C): Condition Set Number, Description, Comment
• Strain (S,N): Strain Number, ProgenitorInd, Description, Comment
• Strain Mix (S): Strain Mix Number, Strain Mix Name, Description, Preparation Comments
• Strain Phenotype Expt: Starting Cell Count, Starting Cell Density
• Competition Phenotype Expt: Starting Cell Count, Starting Cell Density
• Growth: Rel Growth Mutant, Std dev Rel Growth Mutant, Winner Mutant Ind, Rel Growth All, Std dev Rel Growth All, Winner All Ind
• mRNA Expression: mRNA Expression Level, Std dev Express Level
• Protein Expression: Cell Fraction, Protein State Exp Level, Std Dev Prot State Level
• Protein Preparation Set (P): Prot Prep Set Number, Description, Comment
• DNA Protein Binding Expt
• DNA Seq Binding: DNA Seq Bind Const Num, DNA Sequence, Binding Constant, Std Dev Binding Constant
• Non Specific DNA Binding: Non Specific Binding Const, Std Dev Non Spec Bind Const
• Footprint: Fraction Occupancy, St Dev Frac Occupancy
• Protein Protein Binding Expt
• Protein Protein Binding: Binding Level, Std Dev Binding Level
Submodel cross-references: * = main model, C = Condition Set Entities, D = DNA and Protein Elements, N = Names, P = Protein Preparation Entities, S = Strain and Strain Mix Entities
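A few of the entities in the data model can be sketched as plain record types. This is only an illustrative sketch: the attribute names follow the slide, but the Python field names, the types, and the way records are nested (which entity holds which) are assumptions, since the diagram's relationship lines did not survive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConditionSet:
    number: int
    description: str = ""
    comment: str = ""


@dataclass
class Strain:
    number: int
    progenitor_ind: bool = False
    description: str = ""
    comment: str = ""


@dataclass
class Growth:
    rel_growth_mutant: float
    sd_rel_growth_mutant: float
    winner_mutant_ind: bool = False


@dataclass
class ExperimentMeasuresSet:
    number: int
    time_of_measurement: float
    measures_type: str
    # Assumed nesting: a measures set "exhibits" growth results.
    growth: List[Growth] = field(default_factory=list)


@dataclass
class ExperimentInfo:
    number: int
    experiment_type: str
    experimenter_name: str
    # Assumed nesting: an experiment is "described by" one condition set.
    condition_set: ConditionSet
    strains: List[Strain] = field(default_factory=list)
    measures: List[ExperimentMeasuresSet] = field(default_factory=list)


# Hypothetical record, for illustration only.
expt = ExperimentInfo(
    number=1,
    experiment_type="competition",
    experimenter_name="(hypothetical)",
    condition_set=ConditionSet(number=7, description="rich media, pH 5"),
    strains=[Strain(number=1, progenitor_ind=True), Strain(number=2)],
)
expt.measures.append(
    ExperimentMeasuresSet(
        number=1,
        time_of_measurement=0.0,
        measures_type="growth",
        growth=[Growth(rel_growth_mutant=1.0, sd_rel_growth_mutant=0.05)],
    )
)
```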
![Page 7: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/7.jpg)
Functional Genomics: Growth, Expression, & Interaction
Why?
Sampled sequence vs. Completed genomes
Random vs. Engineered mutations & environments
Evolutionary models vs. High-throughput assays
Pure comparative genomics challenge:
15% amino acid identity: Globins retain heme & oxygen binding functions
100% amino acid identity: Enolase functions vary from enzymatic to major vertebrate lens structural component.
![Page 8: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/8.jpg)
Escherichia coli & Saccharomyces cerevisiae Regulatory and Metabolic Networks
[Figure: network schematic linking Environments, Metabolites, Growth rate, DNA, RNA, Protein, Expression, and Interactions, with rate constants kD, kR, kP, kI, kc.]
kD, kD, kD: Initiate, Elongate, Terminate, Fold, Modify, Localize, Degrade
![Page 9: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/9.jpg)
Translating successful strategies: Metrics (physics envy & killer applications)

| Automate | Data quality | Model quality | Similarity search |
|---|---|---|---|
| X-ray diffraction (1960) | resolution < 0.2 nm | \|o-c\|/o, R < 0.2 | DALI |
| Sequence (1988) | discrepancy < 0.01% bp | conserved proteins | BLAST |
| Function (1999): growth, expression, & interaction | completion | DNAgibbs | CorFun; CorEnvironment |
![Page 10: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/10.jpg)
Ratio of strains over environments, e, times, t_e, selection coefficients, s_e:
$R = R_0 \exp[-s_e t_e]$
80% of 34 random yeast insertions have s<0.3% or s>0.3% at t=160 generations, e=1 (rich media); ~50% for t=15, e=7.
Should allow comparisons with population allele models.
Other multiplex competitive growth experiments:
Thatcher, et al. (1998) PNAS 95:253.
Link AJ (1994) thesis; (1997) J Bacteriol 179:6228.
Smith V, et al. (1995) PNAS 92:6479.
Shoemaker D, et al. (1996) Nat Genet 14:450.
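The strain-ratio relation R = Ro exp[-s t] can be inverted to estimate a selection coefficient from two ratio measurements. The sketch below is a minimal illustration of that arithmetic; the function names and the example numbers (s = 0.3% per generation, 160 generations) are chosen for illustration and are not measurements from the talk.

```python
from math import exp, log


def ratio_after(r0, s, t):
    """Strain ratio after t generations under selection coefficient s."""
    return r0 * exp(-s * t)


def estimate_s(r0, r, t):
    """Invert R = R0 * exp(-s * t) to recover s from two ratio readings."""
    if r0 <= 0 or r <= 0 or t <= 0:
        raise ValueError("ratios and time must be positive")
    return -log(r / r0) / t


# Illustrative values only: a 0.3% per-generation disadvantage over
# 160 generations reduces the mutant:wild-type ratio by roughly 38%.
r_end = ratio_after(1.0, 0.003, 160)
s_hat = estimate_s(1.0, r_end, 160)
```

With long competitions (large t), even small coefficients produce a measurable change in R, which is why the number of generations matters for the detection limit.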
![Page 11: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/11.jpg)
Multiplex DNA sequencing.Church GM. Kieffer-Higgins S. (1988) Science. 240:185.
Physical mapping of complex genomes by cosmid multiplex analysis. Evans GA. Lewis KA. (1989) PNAS 86: 5030.
Multiplexed biochemical assays with biological chips.Fodor SP, et al. (1993) Nature 364:555.
Lashkari DA, et al. (1995) An automated multiplex oligonucleotide synthesizer. PNAS 92(17):7912.
Multiplex: Tag(Mix) > Process > Decode
Internal standards, identical conditions, microscale
![Page 12: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/12.jpg)
Multiplex Competitive Growth Experiments
[Figure: workflow. In-frame mutants + wild-type -> Pool -> Select (conditions shown: 40°, pH5, NaCl, Complex; t=0 reference) -> Multiplex PCR size-tag or chip readout.]
![Page 13: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/13.jpg)
107 Environments (so far)
Media and additives: minimal media, yeast extract, synthetic rich, Low N, Low P, NaCl, urine, pancreatin, Bile Cholate, triton X-100, 2 acetate, 4 butyrate, 6 hexanoate, homoserine lactone
Combinatorial pools:
• a,H,F,Q,t
• g,L,Y,N,S
• C,I,W,u,E
• M,K,T,D,dap
• V,P,R,G,thiamine
• a,g,C,M,thiamine
• H,L,I,K,V
• F,Y,W,T,P
• Q,N,u,D,R
• t,S,E,dap,G
pH: 5, 6, 7, 8, 9
Temperature: 25, 30, 37, 45
pyridoxin, nicotinate, biotin, pantothenate, A
![Page 14: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/14.jpg)
Genome Engineering
Challenges: Construct any mutant in any background, multiple mutants, minimizing hitchhiking mutants.
Avoid undesired residual activities and neomorphic effects on adjacent genes in most deletion, insertion, nonsense, or antisense alleles.
Full in-frame replacements; computationally track gene overlaps, primer & genomic repeats.
Link, et al. (1997) J. Bacteriol. 179: 6228-6237. (pKO3)
http://arep.med.harvard.edu
![Page 15: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/15.jpg)
Crossover PCR in-frame deletions / tag substitutions
[Figure: schematic of the gene of interest and a nearby gene, each drawn from ATG to TAA. Labels: primer with NotI site, primer with Bam site, c-tag, tag replacing the sequence between ATG and TAA.]
![Page 16: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/16.jpg)
pKO3: in-frame tagged deletions
[Figure: plasmid and recombination scheme. Plasmid features: repAts, camR, sacB, M13 ori, tag. Selection labels: 43° Cam; 30° sucrose. Resolving the cointegrant: 1 = wild type, 2 = mutant.]
![Page 17: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/17.jpg)
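Primer design for size-tagged PCR
[Figure: 3% agarose gel of size-tagged PCR products. Deleted Orfs: yiaU, yhcS, ydhB, yfiE, pssR, ygfX, ygoX. Size-tagged primer product lengths: 789, 518, 348, 266, 194, 141, 106. Each product uses the universal tag primer plus a size-tagged primer.]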
![Page 18: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/18.jpg)
Competitive Growth Rate Tag Readout
[Figure: gel readout of tags for ygfX, yiaU, ydhB, yfiE, ygoX, pssR, yhcS in rich, P- minimal, and N- minimal media; lanes labeled 1 and 2.]
![Page 19: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/19.jpg)
![Page 20: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/20.jpg)
Effects of pH in rich media
[Figure: bar chart. X axis: pssR, farR, nhaR, ydhB, yhcS, yidP, yhiF, yidL, uw6519. Y axis: % change from inoculate (-200 to 700). Series: r' pH5, r' pH6, r' pH7, r' pH8, r' pH9.]
![Page 21: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/21.jpg)
Genome Engineering: Current status
5 Highly Expressed Genes (Link)
46 Putative regulatory FUNs (Phillips)
24 Highly conserved FUNs (Loferer)
20 Flux Balance Predictions (in prep.)
![Page 22: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/22.jpg)
Flux balance model with max growth objective:
$S \cdot v = b$
S = stoichiometric matrix (m x n)
v = vector of n fluxes
b = I/O rate vector
n = 720 metabolic fluxes
m = 436 metabolites
Predict major flux changes: zwf-; zwf- pnt-
& synthetic lethals: zwf- pgi-
[Figure: central metabolism flux map. Node labels: Glucose, G6P, F6P, FDP, DHAP, GA3P, DPG, 3PG, 2PG, PEP, Pyr, AcCoA, Ac, For, 6PGA, 6PG, Ru5P, X5P, R5P, S7P, E4P, Cit, Icit, KG, SuccCoA, Succ, Fum, Mal, OAA, QH2, FADH, NADH, NADPH, ATP, H+. Each reaction is annotated with three flux values, e.g. 3.92 / 9.27 / 9.36.]
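The steady-state constraint S · v = b with a growth objective can be illustrated on a toy network. The sketch below is not the 436 x 720 E. coli model: the four-reaction network, the capacity bounds, and the function names are hypothetical, and an exhaustive search over integer fluxes stands in for the linear-programming solver that a real flux balance analysis would use. Two parallel routes from A to B give a small analogue of a synthetic lethal pair.

```python
from itertools import product

# Toy network (hypothetical): columns are reactions
#   v0: uptake -> A,  v1: A -> B,  v2: A -> B (alternate route),  v3: B -> biomass
S = [
    [1, -1, -1, 0],   # metabolite A
    [0, 1, 1, -1],    # metabolite B
]
B_VEC = [0, 0]            # no net accumulation at steady state
UPPER = [10, 6, 6, 10]    # assumed integer capacity for each flux
GROWTH = 3                # index of the biomass flux


def mass_balance_residual(S, v, b):
    """Return S.v - b, one entry per metabolite."""
    return [sum(s * x for s, x in zip(row, v)) - rhs for row, rhs in zip(S, b)]


def max_growth(S, b, upper, growth_index):
    """Brute-force the largest growth flux among integer fluxes with S.v = b."""
    best = 0
    for v in product(*(range(u + 1) for u in upper)):
        if all(r == 0 for r in mass_balance_residual(S, v, b)):
            best = max(best, v[growth_index])
    return best


def knockout(upper, reactions):
    """Copy the bounds with the listed reactions forced to zero flux."""
    return [0 if i in reactions else u for i, u in enumerate(upper)]


wild_type = max_growth(S, B_VEC, UPPER, GROWTH)
single = max_growth(S, B_VEC, knockout(UPPER, {1}), GROWTH)
double = max_growth(S, B_VEC, knockout(UPPER, {1, 2}), GROWTH)
```

In this toy case either single knockout only lowers the achievable growth flux, while removing both routes drives it to zero.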
![Page 23: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/23.jpg)
Non-coding regions:
E. coli: 11%
Yeast: 25%
Human: 95%
Similarity searching for environments, growth, expression, & interaction data, and then the challenges of DNA sequence motifs: short motifs & limited alphabet (4)
![Page 24: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/24.jpg)
[Figure: gene-by-gene correlation matrix for Yggn, pspA, o85, YiaK, carAB, f214, hrsA, f105, ppiA, o184, mtlA 5', mtlA 3', rspA, YidX, kdgT, with regions labeled A-F. Shading marks positive and negative correlation. Annotated groups: catabolite repression (glucose & Crp regulated); log vs. stationary-phase regulated.]
$\mathrm{CorFun} = Z_g Z_g^{T} / n$
n = # environments + genotypes
g = gene sites
(switching n & g gives CorEnv)
Data: growth, expression, &/or interaction
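The CorFun formula can be read as a Pearson correlation between gene profiles: standardize each gene's row of measurements, then take the matrix product with its transpose and divide by n. The sketch below assumes that reading (z-scores computed per row with the population standard deviation) and uses made-up numbers; transposing the data first gives the CorEnv analogue.

```python
from math import sqrt


def zscore_rows(X):
    """Standardize each row to mean 0 and (population) standard deviation 1."""
    Z = []
    for row in X:
        n = len(row)
        mean = sum(row) / n
        sd = sqrt(sum((x - mean) ** 2 for x in row) / n)
        if sd == 0:
            raise ValueError("a constant row has no defined z-score")
        Z.append([(x - mean) / sd for x in row])
    return Z


def cor_matrix(X):
    """Z Z^T / n for row-standardized Z: one correlation per pair of rows."""
    Z = zscore_rows(X)
    n = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in Z] for zi in Z]


def transpose(X):
    return [list(col) for col in zip(*X)]


# Made-up data: 3 genes (rows) measured in 4 environments/genotypes (columns).
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
cor_fun = cor_matrix(data)             # gene x gene
cor_env = cor_matrix(transpose(data))  # environment x environment
```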
![Page 25: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/25.jpg)
Expression data from four cultures allow three comparisons:
• glucose, 30°C, mating type a
• galactose, 30°C, mating type a
• glucose, 30°C, mating type α
• glucose, 30°C -> 39°C shock, mating type a
![Page 26: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/26.jpg)
Expression Quantitation Options
1) n-dimensional cDNA or protein displays
2) Computer selected oligomer-arrays: photolithographic or piezoelectric deposition
3) Gridded microarrays from clones
4) Counting 13-bp cDNA tags (SAGE) (20,000 tags means <800 RNAs have S/N>4)
Lockhart, et al. (1997) Nature Biotechnology 15:1359.
DeRisi, et al. (1997) Science 278:680.
Velculescu, et al. (1997) Cell 88:243.
![Page 27: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/27.jpg)
Galactose Regulatory Network
[Figure: network diagram. Input: GALACTOSE. Regulators: GAL4, GAL80, GAL3 (Gal3p), Gal1p. States: Gal4p-Gal80p inactive complex, Gal4p-Gal80p active complex. Structural Genes For Galactose Metabolism: GAL1, MEL1, GAL7, PGM2, GAL2, GAL10. Other labels: GCY1, "?".]
![Page 28: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/28.jpg)
Fold Change in GAL3 in Galactose vs. Glucose (Median Fold Change is 3.1)
[Figure: bar chart, "GAL3: Fold Change in Expression between Growth in Galactose and Growth in Glucose". X axis: Probe Number (ticks 1 to 19). Y axis: Fold Change (0 to 25).]
![Page 29: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/29.jpg)
| orfID/gene:chip | #probes | medFC | consFC | thrshld / missingMM? | expr ratio | log expr ratio | BINS log expr ratio | FREQ |
|---|---|---|---|---|---|---|---|---|
| YBR020w/GAL1:A | 21 | 64.81 | 24.57 | 2 | 64.81 | 1.81164202 | -2 | 0 |
| YBR018c/GAL7:A | 21 | 41.91 | 10.58 | 2 | 41.91 | 1.62231766 | -1.95 | 0 |
| YBR019c/GAL10:A | 20 | 37.8 | 13.03 | 2 | 37.8 | 1.5774918 | -1.9 | 0 |
| YDR345c/HXT3:A | 20 | -25.05 | -13.58 | | 0.03992016 | -1.39880773 | -1.85 | 0 |
| YOR120W/GCY1:D | 20 | 12.31 | 7.81 | 2 | 12.31 | 1.09025805 | -1.8 | 0 |
| YLR081w/GAL2:C | 21 | 8.19 | 3.56 | 2 | 8.19 | 0.9132839 | -1.75 | 0 |
| YGL189C/RPS26A:B | 19 | -7.82 | -0.45 | | 0.12787724 | -0.89320675 | -1.7 | 0 |
| YPL066W/VPS28:D | 20 | 6.35 | 2.75 | 2 | 6.35 | 0.80277373 | -1.65 | 0 |
| YHR094c/HXT1:B | 20 | -6.26 | -2.38 | 1 | 0.15974441 | -0.79657433 | -1.6 | 0 |
| YOL154W/:D | 21 | -6.04 | -3.27 | | 0.16556291 | -0.78103694 | -1.55 | 0 |
| YPL067C/:D | 21 | 5.95 | 3.13 | 2 | 5.95 | 0.77451697 | -1.5 | 0 |
| YGL030W/RPL32_ex1:B | 21 | -5.32 | -3.11 | | 0.18796992 | -0.72591163 | -1.45 | 0 |
| YFL045C/SEC53:B | 21 | -5.17 | -2.73 | | 0.1934236 | -0.71349054 | -1.4 | 0 |
| YBR106w/:A | 21 | -5.03 | -2.66 | 1 | 0.19880716 | -0.70156799 | -1.35 | 1 |
| YER190w/_f:B | 20 | -4.9 | -2.48 | 1 | 0.20408163 | -0.69019608 | -1.3 | 0 |
| YMR318C/:D | 20 | 4.02 | 2.36 | | 4.02 | 0.60422605 | -1.25 | 0 |
| YNL015W/PBI2:D | 20 | 3.89 | 2.3 | 2 | 3.89 | 0.5899496 | -1.2 | 0 |
| YBR011c/IPP1:A | 20 | -3.73 | -1.75 | | 0.26809651 | -0.57170883 | -1.15 | 0 |
| YER178w/PDA1:B | 20 | -3.46 | -2.22 | | 0.28901734 | -0.5390761 | -1.1 | 0 |
| YOL058W/ARG1:D | 20 | 3.36 | 2.24 | | 3.36 | 0.52633928 | -1.05 | 0 |
| YCR005c/CIT2:A | 20 | -3.3 | -2.15 | | 0.3030303 | -0.51851394 | -1 | 0 |
| YHR092c/HXT4:B | 20 | -3.27 | -1.52 | 1 | 0.3058104 | -0.51454775 | -0.95 | 0 |
| 25srRnaa:A::25srRnaa:B::25srRnaa:C::25srRnaa:D | 84 | -3.27 | -1.49 | | 0.3058104 | -0.51454775 | -0.9 | 0 |
| YGL055W/OLE1:B | 20 | 3.21 | 1.98 | | 3.21 | 0.50650503 | -0.85 | 1 |
| YFR024C/_r:B | 20 | -3.21 | -1.43 | 1 | 0.31152648 | -0.50650503 | -0.8 | 0 |
| YHR033W/:B | 20 | 3.15 | 1.52 | | 3.15 | 0.49831055 | -0.75 | 2 |
| YDR009W/GAL3:A | 20 | 3.08 | 1.38 | 2 | 3.08 | 0.48855072 | -0.7 | 3 |
| YGR244C/:B | 20 | 2.99 | 1.55 | 2 | 2.99 | 0.47567119 | -0.65 | 1 |
| YKL096W/CWP1:C | 21 | -2.97 | -1.78 | | 0.33670034 | -0.47275645 | -0.6 | 0 |
| YNL052W/COX5A:D | 20 | 2.94 | 1.96 | | 2.94 | 0.46834733 | -0.55 | 1 |
| YJR073C/OPI3:C | 20 | -2.92 | -1.52 | | 0.34246575 | -0.46538285 | -0.5 | 5 |
| YMR256c/COX7:D | 21 | 2.84 | 1.64 | | 2.84 | 0.45331834 | -0.45 | 3 |
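The expr ratio and log expr ratio columns above appear to follow directly from the median fold change: a positive fold change is used as the ratio, a negative one is inverted, and the logarithm is base 10. The sketch below encodes that reading; the function names are hypothetical, and the rule is inferred from the tabulated values rather than stated in the slides.

```python
from math import log10


def expression_ratio(fold_change):
    """Map a signed fold change to a ratio: +x -> x, -x -> 1/x (inferred rule)."""
    if fold_change == 0:
        raise ValueError("fold change of 0 is undefined")
    return fold_change if fold_change > 0 else 1.0 / abs(fold_change)


def log_expression_ratio(fold_change):
    """Base-10 log of the expression ratio, as binned for the histogram."""
    return log10(expression_ratio(fold_change))


gal1 = log_expression_ratio(64.81)    # GAL1, induced in galactose
hxt3 = log_expression_ratio(-25.05)   # HXT3, repressed in galactose
```

Working with the log ratio makes induction and repression symmetric around zero.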
Relative expression of all genes: Galactose vs. Glucose
[Figure: histogram. X axis: Log of Fold Change (-2.0 to 2.0). Y axis: Number of Genes (log scale, 0.1 to 10000).]
![Page 30: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/30.jpg)
To analyze the most induced genes, we...
• Extracted the intergenic DNA sequence upstream of each translation start using the Saccharomyces Genome Database.
• Used an algorithm for multiple sequence alignment to look for sequence motifs conserved among the most induced (or repressed) genes.
• Looked at the intersection of genes which both matched a conserved motif and were induced (or repressed).
![Page 31: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/31.jpg)
Gibbs Motif Sampling Strategy
1 Initialize the alignment by choosing a random subset of all possible sites as the ‘site’ alignment, and use all remaining sequences to give a ‘non-site’ alignment.
2 Select a potential site from among all possible sites.
3 If the site is in the alignment, take it out.
4 Calculate the relative likelihood that the potential site belongs with the site alignment rather than the ‘non-site’ alignment, based on a Bayesian multinomial distribution model.
5 Randomly choose whether or not to add the site, weighted by this relative likelihood.
6 Repeat from Step 2.
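The sampling loop above can be sketched in a few lines. This is a simplified illustration, not the published sampler: it assumes a fixed uniform background in place of a running ‘non-site’ alignment, uses simple pseudocounts for the multinomial site model, allows overlapping sites, and introduces hypothetical names and parameters (`prior_odds`, `pseudo`, `n_iter`).

```python
import random

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}  # assumed background model


def site_counts(seqs, sites, width, pseudo=0.5):
    """Per-position base counts (plus pseudocounts) over the aligned sites."""
    counts = [{b: pseudo for b in BASES} for _ in range(width)]
    for i, p in sites:
        for k in range(width):
            counts[k][seqs[i][p + k]] += 1
    return counts


def likelihood_ratio(word, counts, background):
    """P(word | site model) / P(word | background)."""
    ratio = 1.0
    for k, base in enumerate(word):
        total = sum(counts[k].values())
        ratio *= (counts[k][base] / total) / background[base]
    return ratio


def gibbs_site_sampler(seqs, width, n_iter=2000, prior_odds=0.05, seed=0):
    rng = random.Random(seed)
    all_sites = [(i, p) for i, s in enumerate(seqs)
                 for p in range(len(s) - width + 1)]
    # Step 1: random initial 'site' alignment.
    sites = set(rng.sample(all_sites, max(1, len(all_sites) // 10)))
    for _ in range(n_iter):
        cand = rng.choice(all_sites)          # Step 2: pick a potential site
        sites.discard(cand)                   # Step 3: take it out if present
        counts = site_counts(seqs, sites, width)
        word = seqs[cand[0]][cand[1]:cand[1] + width]
        odds = prior_odds * likelihood_ratio(word, counts, UNIFORM)  # Step 4
        if rng.random() < odds / (1.0 + odds):                       # Step 5
            sites.add(cand)
    return sites                              # Step 6 is the loop itself


# Made-up sequences for a smoke run.
demo_seqs = ["ACGTACGTTTGACGT", "TTGACGTACCA", "GGTTGACGTAA"]
demo_sites = gibbs_site_sampler(demo_seqs, width=5, n_iter=200)
```

Because the sampler is stochastic, repeated runs with different seeds are normally compared, keeping the alignment with the best score.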
![Page 32: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/32.jpg)
‘DNAGibbs’: A Modified Gibbs Motif Sampler Optimized for DNA searches.
• Either forward or reverse strand of a potential site -- but not both -- may be added to the alignment.
• Near-optimum sampling method was improved so that it is faster and tends to result in higher scoring alignments.
• Simultaneous multiple motif searching was replaced with a more efficient iterative masking approach.
• The model for base frequencies of non-site sequence was fixed using the average nucleotide frequencies of S. cerevisiae.
• Now runs on DEC Unix and Windows platforms, in addition to the formerly supported SGI and Sun Unix platforms.
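The first modification, allowing either strand of a potential site but never both, can be illustrated with a small helper. The sketch is an assumption about how such bookkeeping could be done, not the DNAGibbs source: sites are represented as hypothetical `(sequence index, position, strand)` tuples, and adding one orientation evicts the other.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse strand of a DNA word, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def site_word(seq, pos, width, strand):
    """Word scored for a site: forward text or its reverse complement."""
    word = seq[pos:pos + width]
    return word if strand == "+" else reverse_complement(word)


def add_site(alignment, seq_index, pos, strand):
    """Add one orientation of a site, removing the opposite orientation."""
    other = "-" if strand == "+" else "+"
    alignment.discard((seq_index, pos, other))
    alignment.add((seq_index, pos, strand))


alignment = set()
add_site(alignment, 0, 3, "+")
add_site(alignment, 0, 3, "-")   # replaces the forward-strand entry
```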
![Page 33: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/33.jpg)
Finally, exclude motifs with:
• DNAGibbs scores (maximum log a posteriori likelihood ratio) of less than 5.
• Good matches (Z < 3 sd below the mean of the aligned positive motifs) with greater than 10% of all yeast genes (ORFs).
*O.G. Berg & P.H. von Hippel, J. Mol. Biol., 193: 723-750 (1987)
![Page 34: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/34.jpg)
Using the top 10 genes induced in galactose, DNAGibbs found UASG, the site recognized by Gal4p
[Figure: sequence logo of the motif found by DNAGibbs. Y axis: Information (Bits).]
Sequence logos were developed by T.D. Schneider & R.M. Stephens, Nucleic Acids Res., 18: 6097-6100 (1990).
CGYTCGGA-GA-AGT---CCGA Previous UASG consensus
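The height of each position in a sequence logo is its information content in bits, which for DNA is 2 minus the Shannon entropy of the base frequencies at that position. The sketch below computes that quantity for a set of aligned sites; it omits the small-sample correction used in the original logo method, and the aligned words are made up for illustration.

```python
from math import log2


def position_information(column):
    """Information (bits) of one alignment column: 2 + sum(p * log2 p)."""
    n = len(column)
    info = 2.0
    for base in "ACGT":
        p = column.count(base) / n
        if p > 0:
            info += p * log2(p)
    return info


def logo_information(sites):
    """Information content at each position of equal-length aligned sites."""
    return [position_information(col) for col in zip(*sites)]


# Made-up aligned words: two invariant positions, two 50/50 positions.
bits = logo_information(["CGGA", "CGGT", "CGCA", "CGCT"])
```

A fully conserved position scores 2 bits and a position with all four bases equally likely scores 0.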
![Page 35: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/35.jpg)
Genes that changed between galactose and glucose by more than 2-fold and have strong matches to the UASG motif

| Gene | Fold Change | Best Z-Score | # of Sites |
|---|---|---|---|
| GAL1 | >65 | -1.4 | 5 |
| GAL7 | >42 | -0.7 | 2 |
| GAL10 | >38 | -1.4 | 5 |
| GCY1 | >12 | 0.5 | 1 |
| GAL2 | >8 | 0.4 | 4 |
| YPL066W | >6 | -1.1 | 1 |
| YPL067C | >6 | -1.1 | 1 |
| YMR318C | 4 | 1.1 | 1 |
| GAL3 | >3 | 2 | 2 |
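The selection behind this list is an intersection of two filters: a fold-change cutoff and a motif-match cutoff. The sketch below applies both to a few genes, using median fold changes and best Z-scores that appear in the slides. The Z cutoff of 3 is borrowed from the "good match" definition on the exclusion slide, the function name is hypothetical, and treating genes without a listed Z-score as having no strong match is an assumption.

```python
# Median fold changes (galactose vs. glucose) taken from the expression table.
fold_change = {
    "GAL1": 64.81, "GAL7": 41.91, "GAL10": 37.8, "HXT3": -25.05,
    "GCY1": 12.31, "GAL2": 8.19, "HXT1": -6.26, "GAL3": 3.08,
}
# Best UASG Z-scores from the list above; absent genes are assumed unmatched.
best_z = {
    "GAL1": -1.4, "GAL7": -0.7, "GAL10": -1.4,
    "GCY1": 0.5, "GAL2": 0.4, "GAL3": 2.0,
}


def changed_with_motif(fold_change, best_z, min_fold=2.0, max_z=3.0):
    """Genes changed by more than min_fold that also have a motif match."""
    changed = {g for g, fc in fold_change.items() if abs(fc) > min_fold}
    matched = {g for g, z in best_z.items() if z < max_z}
    hits = changed & matched
    return sorted(hits, key=lambda g: abs(fold_change[g]), reverse=True)


selected = changed_with_motif(fold_change, best_z)
```

Strongly repressed genes such as HXT3 pass the fold-change filter but drop out at the motif filter.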
![Page 36: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/36.jpg)
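Galactose Regulatory Network
[Figure: network diagram, as before, with new candidate members added. Input: GALACTOSE. Regulators: GAL4, GAL80, GAL3 (Gal3p), Gal1p. States: Gal4p-Gal80p inactive complex, Gal4p-Gal80p active complex. Structural Genes For Galactose Metabolism: GAL1, MEL1, GAL7, PGM2, GAL2, GAL10. Other labels: GCY1, YPL067C, YPL066W, YMR318C, "?" (twice).]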
![Page 37: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/37.jpg)
DNAGibbs and mating type

| Motif | Score | %ORF | Consensus | Similarity |
|---|---|---|---|---|
| mt-1 (A) | 8.9 | 0.11 | ttcctarttng | P Box |
| mta-1 (B) | 8.5 | 0.05 | anwncwnkmaananantcwtbwtnw | - |
| mta-2 (C) | 5.0 | 0.10 | aaaycawmawnanwa | - |
| mta-3 (D) | 28.1 | 0.31 | grnawktacayg | 2-bind, mt-mta-1 |
| mt-mta-1 (E) | 20.7 | 0.34 | crtgtanntwyc | 2-bind mta-3 |
| mt-mta-2 (F) | 5.3 | 0.13 | kwtnywnnnknnntgtttsa | PRE, mt-mta-2 |
| mt-mta-3 (G) | 8.6 | 0.27 | tgamaywwtnaama | PRE, mt-mta-1 |
| mt-mta-4 (H) | 5.3 | 0.31 | rmtgmcngcma | Q Box |

| Expect | DNABP | Consensus |
|---|---|---|
| P Box | Mcm1p | tttcctaattaggnan |
| Q Box | Mat1p | tcaatgacag |
| 2-bind | Mat2p | crtgtaawt |
| PRE | Ste12p | tgaaaca |

Ref: Herskowitz, et al., in Gene Expression, E. W. Jones, et al., Eds. (CSHL Press, NY, 1992), vol. 2: pp. 583-656
![Page 38: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/38.jpg)
Calibration of 60 E. coli binding site matrices
[Figure: chart of Z-score (axis 0 to 10) for each binding site matrix. Matrices, in the order listed: rpoD15, rpoD17, rpoD16, rpoD18, ompR, hns, lrp, rpoD19, malT, rpoS, crp, dnaA, fis, narL, farR, glpR, trpR, soxS, ihf, oxyR, metR, tyrR, argR, cytR, fur, metJ, phoB, fruR, cspA, torR, nagC, fadR, purR, arcA, pdhR, lexA, gcvA, fnr, galR, ntrC, rhaS, iclR, fhlA, cynR, ada, deoR, carP, lacI, marR, rpoH14, ilvY, rpoH13, araC, tus, hipB, flhCD, rpoE, melR, cysB, rpoN.]
![Page 39: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/39.jpg)
Interaction Quantitation Options
Over-expression:
• Yeast two-hybrid screens (in vivo complexity)
• In vitro chip assays (Martha Bulyk, David Lockhart, Erik Gentalen)
Natural levels, environmental regulation:
• Subcellular fractionation (unstable)
• In vivo footprinting (partners unknown)
• In vivo crosslinking
![Page 40: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/40.jpg)
Combinatorial ds-DNA Chips (chemical, photo & enzymatic synthesis)
[Figure: schematic of chip synthesis on an SiO2 surface. Labels: mask, 3' and 5' ends, primer, spacer, n-mer, specific 16-mer, Polymerase; example base columns A A C C G G and A C A C A C.]
![Page 41: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/41.jpg)
RsaI Digestion of a Fixed Density Double-Stranded DNA Chip with a Variable Spacer Length of 0 to 14 bp Between the Half-Sites
[Figure: chip images BEFORE and AFTER RsaI digestion (zoomed in views). Axes: 2nd strand sequence at half-sites (GTA, GTAA, GTAC, GTAG, GTAT, GTC, GTCA, GTCC, GTCG, GTCT, GTG, GTGA, GTGC, GTGG, GTGT) and length of spacer between half-sites (0 to 14). Inset: RsaI acting on the duplex site 5' GTAC / CA*TG.]
Conclusion: Loss of Signal Intensity Corresponds to Cleavage of dsDNA by RsaI
Significance:
1) Double-Stranded DNA is Created by Primer Extension of ssDNA Chips
2) Double-Stranded DNA on the Surface of the Chip is Accessible for Interaction with a DNA-Binding Protein
![Page 42: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/42.jpg)
Interaction Quantitation Options
Over-expression:
• Yeast two-hybrid screens (in vivo complexity)
• In vitro chip assays
Natural levels, environmental regulation:
• Subcellular fractionation (unstable)
• In vivo footprinting (partners unknown)
• In vivo crosslinking (Martin Steffen, Andy Link)
![Page 43: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/43.jpg)
Isolate in vivo crosslinked complexes
• by nucleic acid: CsCl (or hybridization)
• by protein: epitope tag
Analyze protein by DNase, 2D gel (axes: pH, kdal), trypsin-LC-ESI-MS/MS
Analyze DNA/RNA by chip
Link et al. (1997) Electrophoresis 18:1259 & 1314
![Page 44: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/44.jpg)
Rich media log-phase, in vivo crosslink, DNaseI digest
[Figure: 2D gel. Axes: pH (4 to 7) and kdal (10 to 100). Labeled spots: lacI, fur, grpE, dps, hns, efp, purE, dps, sspA, ihfB, ssb.]
![Page 45: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/45.jpg)
In vivo crosslinking & footprinting summary
11% of the E. coli genome is non-coding.
About 340 / 4328 proteins are likely DNA-binding proteins (2 of the top 380 proteins).
24/25 footprinted GATC sites are non-coding. Odds = $10^{-27}$.
2/3 crosslinked DNA molecules are likely regulatory binding sites. Odds = 0.04.
8/11 top DNA-crosslinked proteins are known DNA-binding proteins. Odds = $10^{-16}$.
![Page 46: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/46.jpg)
Thoughts on chips for crosslinked epitope selections (& generally).
An easy 10-fold enrichment but with 40,000 fragments means an expensive 1:4000 Signal:Noise, if sequencing (or SAGE) were used.
However, spread over a chip, 1:10.
![Page 47: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/47.jpg)
E. coli oligonucleotide chip challenges:
#1) Closely spaced transcripts, e.g. carAB: (Intergenic 25-mers overlap, start 6 bp apart on average)
P1(pyrimidine) ... 48 bp ... P2(arginine)
gggtaagcaaatttgcattgcttcatactgactgaatgaattaatatgcaaataaagtg
#2) Repeats, e.g. tufA & tufB DNA. Mismatches are sparse: the alignment marks 13 mismatched positions (*) among otherwise identical bases (.), clustered near the two ends.
![Page 48: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/48.jpg)
From Genome Sequences to Regulatory Network Phenotypes
Summary
Growth: Multiplex competitive growth of in-frame replacements for novel E. coli regulatory genes defines cellular system integration & environments.
Expression: Cell-type & condition clustering plus the DNAGibbs algorithm extract intergenic binding motifs for yeast Gal-Glc, Matα-Mata, & 30°C-39°C comparisons.
Interaction: Strong enrichment for low abundance wild-type & mutant in vivo E. coli DNA-protein contacts establishes mechanistically anchored intergenic elements.
![Page 49: From Genome Sequences to Regulatory Network Phenotypes](https://reader035.vdocuments.net/reader035/viewer/2022062518/5681440d550346895db0a93a/html5/thumbnails/49.jpg)
Escherichia coli & Saccharomyces cerevisiae Regulatory and Metabolic Networks
Population Selection, Flux Balance, & Gibbs
[Figure: network schematic linking Environments, Metabolites, Growth rate, DNA, RNA, Protein, Expression, and Interactions, with rate constants kD, kR, kP, kI, kc.]
kD, kD, kD: Initiate, Elongate, Terminate, Fold, Modify, Localize, Degrade