Post on 15-Jan-2016
Power and weakness of data
Power: data + software + bioinformatician = answer.
Weakness: Data errors. Data poorly understood. Poor software. Never enough data. Few bioinformaticians available.
Laerte about structures:
“Use the Force, Luke” sequence, Gert
Signals in Sequences
The number of sequences available for analysis rapidly approaches infinity.
We need new ways to look at all this information.
The First Law:
First law of sequence analysis:
A conserved residue is important.
With thousands of aligned sequences:
Second law of sequence analysis:
A very conserved residue is very important.
Signals in sequences: conserved, CMA, variable
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
Black = conserved
White = variable
Green = correlated mutations (CMA)
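The three signal types can be illustrated on the toy alignment above. The sketch below is only a teaching heuristic, not the CMA method used in the talk: it calls a column conserved when all sequences share one residue, correlated when at least one other column splits the sequences into exactly the same subsets, and variable otherwise. The function names `column_pattern` and `classify_columns` are made up for this example, and which columns were coloured green on the original slide cannot be recovered from the transcript.

```python
from collections import defaultdict

# Toy alignment from the slide (four sequences, fourteen columns).
ALIGNMENT = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]


def column_pattern(column):
    """Relabel residues by order of first appearance, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    labels = {}
    return tuple(labels.setdefault(residue, len(labels)) for residue in column)


def classify_columns(alignment):
    """Label each column 'conserved', 'correlated' or 'variable' (toy heuristic)."""
    n_columns = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(n_columns)]
    patterns = [column_pattern(col) for col in columns]

    # Group columns that split the sequences into the same subsets.
    groups = defaultdict(list)
    for index, pattern in enumerate(patterns):
        groups[pattern].append(index)

    labels = []
    for pattern in patterns:
        n_types = len(set(pattern))
        if n_types == 1:
            labels.append("conserved")
        elif n_types < len(alignment) and len(groups[pattern]) > 1:
            # Another column shows the same grouping of sequences.
            # Columns where every sequence differs are excluded, because
            # they carry no grouping information.
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels


if __name__ == "__main__":
    for position, label in enumerate(classify_columns(ALIGNMENT), start=1):
        print(position, label)
```

With only four sequences any such pattern could arise by chance, which is why the talk insists on many sequences before reading signals from an alignment.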
Sequence Signals
Three types of information from multiple sequence alignments:
1) Conservation
2) Correlation
3) Variability
Artefacts
Wrong sequence signals can result from:
Not enough sequences
Too conserved sequences
Too variable sequences
Over-alignment
Over-interpretation
Recalcitrant residues
Sequence Entropy
$$E = -\sum_{i=1}^{20} p_i \ln(p_i)$$
where $p_i$ is the fraction of sequences that have residue type $i$ at the alignment position.
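A minimal sketch of the entropy calculation for one alignment column, assuming the column is given as a string with one residue letter per sequence and contains no gap characters. The function name `column_entropy` is chosen for this example only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residue types in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # Residue types that do not occur contribute nothing to the sum,
    # so only the observed types are visited.
    return -sum((n / total) * math.log(n / total) for n in counts.values())
```

A fully conserved column gives an entropy of zero, and a column in which all 20 residue types are equally frequent gives the maximum, ln(20), which is about 3.0.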
Sequence Variability
Sequence variability is the number of residue types that are present in more than 0.5% of the sequences.
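The definition above translates almost directly into code. This is a sketch under the same assumptions as before (one residue letter per sequence, no gaps); `column_variability` and its `min_fraction` parameter are names invented for the example, with the 0.5% cut-off taken from the slide.

```python
from collections import Counter


def column_variability(column, min_fraction=0.005):
    """Count residue types present in more than min_fraction of the sequences."""
    counts = Counter(column)
    total = len(column)
    return sum(1 for n in counts.values() if n / total > min_fraction)
```

For a column of 1000 sequences with 990 Ala, 6 Gly and 4 Ser, the Ser count (0.4%) falls below the cut-off, so the variability is 2 rather than 3. The cut-off keeps rare residues, which may be sequencing or alignment errors, from inflating the count.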
Entropy - Variability
Evolution = try everything (and keep what works well)
Variability = Chaos (try everything)
Entropy = Information (keep what works well)
Entropy - Variability
Variability is the result of DNA trying everything.
Entropy is the protein’s brake on evolutionary speed.
Ras Entropy - Variability
[Figure: entropy-variability plot for Ras. Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]
Ras Location
[Figure: location of the boxes in the Ras structure. Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]
Protease Entropy - Variability
[Figure: entropy-variability plot for protease. Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]
Protease Location
[Figure: location of the boxes in the protease structure. Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]
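Globin Entropy - Variability
[Figure: entropy-variability plot for globin, with a panel labelled "GPCR". Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]
Globin Location
[Figure: location of the boxes in the globin structure. Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]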
And now for drug design: GPCR
[Figure: entropy-variability plot for GPCRs. Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]
GPCRs: (Membrane facing amino acids left out)
[Figure: location of the boxes in the GPCR structure. Colour key: 11 Red, 12 Orange, 22 Yellow, 23 Green, 33 Blue]
Summary
Given many sequences:
Every residue’s role known.
Signaling paths detectable.
Two-step evolutionary model: first the main site, soon after the modulator site.
Beyond the summary
Sequence -> structure -> function is wrong. It should be: Structure -> sequence -> function.
And, because active sites are at the surface, conserved residues are at or near the surface.
Beyond the summary
Why do all TIM-barrel enzymes have the functional residues at the C-terminal side of the strands?
Beyond the summary
11 Red: main site
12 Orange: around main site
22 Yellow: core
23 Green: modulator
[Figure: entropy-variability plot showing boxes 11, 12, 22, 23 and 33, with labels "Up to 4 residue types", "Up to 8 residue types", "Up to 14 residue types" and "Up to 18 residue types"]
The weakness of data
Data errors. Poor software. Data poorly understood. Never enough data. Few bioinformaticians around.
The weakness of data
Rob Hooft
WHAT_CHECK
www.cmbi.kun.nl/gv/servers/
www.cmbi.kun.nl/gv/pdbreport/
Structure validation
Everything that can go wrong, will go wrong, especially with things as complicated as protein structures.
Why?
Why does a sane (?) human being spend fourteen years searching for twelve million errors in the PDB?
Because:
All we know about proteins is derived from PDB files.
If a template is wrong the model will be wrong.
Errors become smaller when you know about them.
What do we check?
Administrative errors.
Crystal-specific errors.
NMR-specific errors.
Really wrong things.
Improbable things.
Things worth looking at.
Ad hoc things.
Error detection
Detecting errors is one thing, fixing them another…
We try not to say that the structure is wrong, but to say what is wrong with the structure, and to give hints on how to fix things.
How difficult can it be?
Your best check:
Planarity
Little things hurt big
Improbable things
How wrong is wrong?
Our errors
Four sigma: 12,000 false positives.
Administrative errors misunderstood.
Improbable is not wrong.
Poor data makes errors unavoidable.
Bugs.
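The four-sigma point can be made concrete with a little arithmetic. The sketch below assumes that the checked quantities are normally distributed and that the cut-off is two-sided; neither assumption is stated on the slide. The number of checks is not given in the talk either, so `n_checks` is a free parameter, and both function names are made up for the example.

```python
import math


def two_sided_tail(n_sigma):
    """Probability that a normally distributed value lies more than n_sigma from the mean."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def expected_false_positives(n_checks, n_sigma=4.0):
    """Expected number of correct values that are flagged purely by chance."""
    return n_checks * two_sided_tail(n_sigma)
```

Under these assumptions roughly 63 out of every million correct values still fall outside four sigma, so a validation run over a whole database flags thousands of values that are merely improbable, not wrong.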
Contact Probability
DACA
Contact probability box
Using contact probability
His, Asn, Gln ‘flips’
Where are the protons?
Hydrogen bond network
Hydrogen bond force field
15% should be flipped
Summary
Everything that could go wrong has gone wrong.
Errors are on a ‘sliding scale’.
Error detection can detect a lot, but surely not everything (yet).
Beyond the summary, for drug design:
Forget: High throughput.
Forget: Docking.
Forget: Structure in absence of many, many sequences.
First gather and digest all experimental data.
Beyond the summary, for drug design:
First know your enemy,
then defeat it.
Thanks to:
Laerte Oliveira, Sao Paulo
Florence Horn, San Francisco
Rob Hooft, Delft
Wilma Kuipers, Weesp
Bob Bywater, Copenhagen
Nora vd Wenden, The Hague
Mike Singer, Boston
Ad IJzerman, Leiden
Margot Beukers, Leiden
Amos Bairoch, Geneva
Fabien Campagne, San Diego