bioinformatics of disease: immune epitope prediction

Shoba Ranganathan
Professor and Chair – Bioinformatics, Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia ([email protected])
Adjunct Professor, Dept. of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore ([email protected])
Visiting scientist @ Institute for Infocomm Research (I2R), Singapore


Post on 25-Feb-2016


Page 1: Bioinformatics of Disease: immune epitope prediction

Bioinformatics of Disease: immune epitope prediction

Shoba Ranganathan
Professor and Chair – Bioinformatics, Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia ([email protected])
Adjunct Professor, Dept. of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore ([email protected])

Visiting scientist @ Institute for Infocomm Research (I2R), Singapore

Page 2: Bioinformatics of Disease: immune epitope prediction

Bioinformatics is …
Bioinformatics is the study of living systems through computation

Page 3: Bioinformatics of Disease: immune epitope prediction

Data in Bioinformatics (in the main) and their management and analysis
Networks, pathways and systems
Sequences
Genomes
Transcriptomes
Databases, ontologies
Data & text mining
Evolution and phylogenetics
Maths/Stats
Algorithms
Physics/Chemistry
Genetics and populations
Structures

Page 4: Bioinformatics of Disease: immune epitope prediction

Overview of my research
1. Genome analysis
2. Transcriptome analysis
3. Protein/Proteome analysis
4. Systems Biology
5. Immunoinformatics
6. Genome-phenome mapping
7. Biodiversity Informatics

Page 5: Bioinformatics of Disease: immune epitope prediction

5. What is Immunoinformatics?
Using Bioinformatics to address problems in Immunology
Application of bioinformatics to accelerate immune system research has the potential to deliver vaccines and address immunotherapeutics.
Computational systems biology of immune response

Page 6: Bioinformatics of Disease: immune epitope prediction

Immunoinformatics

Immunology

Computer Science

Biology

Page 7: Bioinformatics of Disease: immune epitope prediction

Networks, pathways, and systems

Maths/Stats

Databases

Artificial intelligence

Algorithms

Cell biology

-omics

Basic immunology

Clinical immunology

IMMUNOINFORMATICS

Physics/Chemistry

Page 8: Bioinformatics of Disease: immune epitope prediction

Summary
Introduction
Structural Immunoinformatics
Database development
Data Analysis
Computational models
Applications

Networks, pathways and systems

Genetics and populations

-omics

Basic immunology

Clinical immunology

Page 9: Bioinformatics of Disease: immune epitope prediction

The immune system
Composed of many interdependent cell types, organs, and tissues to protect the body from infections (bacterial, parasitic, fungal, or viral) and arrest abnormal growth and differentiation
Inappropriate immune responses lead to allergies and autoimmunity
2nd most complex system in the human body

Page 10: Bioinformatics of Disease: immune epitope prediction

Genomics vs. Immunomics
Genomics: solving the genome puzzle
  10^4 genes coding for 10^6 products
Immunomics: understanding immune response
  10^2-10^3 genes leading to >10^12 products
Enormous diversity in immunomics has implications for immune function and modulation

Page 11: Bioinformatics of Disease: immune epitope prediction

It is a numbers game…
>10^13 MHC class I haplotypes (IMGT-HLA)
10^7-10^15 T cell receptors (Arstila et al., 1999)
>10^9 combinatorial antibodies (Jerne, 1993)
10^12 B cell clonotypes (Jerne, 1993)
10^11 linear epitopes composed of nine amino acids
>>10^11 conformational epitopes
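
The order of magnitude quoted for linear nine-residue epitopes can be checked by counting every possible sequence of nine residues. The sketch below assumes only the 20 standard amino acids and applies no biological filtering; the function name is illustrative.

```python
import math

# Assumption: only the 20 standard amino acids, no filtering of any kind.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def count_linear_peptides(length: int) -> int:
    """Number of distinct linear peptides of the given length."""
    return len(AMINO_ACIDS) ** length

nonamers = count_linear_peptides(9)
order_of_magnitude = math.floor(math.log10(nonamers))
print(nonamers, order_of_magnitude)
```

The count is 20^9, which is of order 10^11, in line with the figure on the slide.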

Page 12: Bioinformatics of Disease: immune epitope prediction

T cell mediated adaptive immune response
Specific peptide residues critical for stimulating cellular immune responses
Major histocompatibility complex (MHC) molecules (Human Leukocyte Antigen or HLA in humans) bind and present short antigenic peptides to T cell receptors, for inspection
Antigen presentation is by two classes of MHC (class I and class II)
Those peptides that bind to specific MHC and trigger T cell recognition (T cell epitopes) are targets for vaccine and immunotherapy development

Page 13: Bioinformatics of Disease: immune epitope prediction

How to generate a T cell-mediated immune response
1. Epitope
2. MHC
3. T cell receptor

Page 14: Bioinformatics of Disease: immune epitope prediction

Major histocompatibility complex

MHC Class II

Gene structure of the human MHC

MHC Class I

3D structure of the human MHC

Page 15: Bioinformatics of Disease: immune epitope prediction

MHC Class I for endogenous peptides

Figure by Eric A.J. Reits

Page 16: Bioinformatics of Disease: immune epitope prediction

MHC class II for exogenous peptides

Figure by Eric A.J. Reits

Page 17: Bioinformatics of Disease: immune epitope prediction

Antigen processing pathway: peptides, MHC, T-cells
1. Degradation of antigen: 20% processed
2. Peptide binding to MHC: 0.5% bind MHC
3. Recognition of peptide-MHC complex by T-cells: 50% CTL response
Overall: 0.05% chance of immunogenicity
Yewdell et al. Ann. Rev Immunol (1999)
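
The overall 0.05% figure is the product of the three stage fractions. A minimal sketch of that arithmetic, assuming the three stages act as independent, successive filters (the constant and function names are illustrative):

```python
# Stage fractions quoted on the slide (Yewdell et al., 1999).
P_PROCESSED = 0.20      # peptides generated by antigen processing
P_BINDS_MHC = 0.005     # processed peptides that bind MHC
P_CTL_RESPONSE = 0.50   # MHC-bound peptides that elicit a CTL response

def immunogenicity_chance(p_processed: float, p_binds: float, p_ctl: float) -> float:
    """Chance that a peptide passes all three stages (assumed independent)."""
    return p_processed * p_binds * p_ctl

overall = immunogenicity_chance(P_PROCESSED, P_BINDS_MHC, P_CTL_RESPONSE)
print(f"{overall:.2%}")
```

Roughly one peptide in 2000 survives all three filters, which is why computational pre-screening is attractive.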

Page 18: Bioinformatics of Disease: immune epitope prediction

Physico-chemical properties affect MHC-peptide binding

Page 19: Bioinformatics of Disease: immune epitope prediction

Epitope prediction ≡ “Fishing”

Page 20: Bioinformatics of Disease: immune epitope prediction

Computational models can help identify T cell epitopes
Suggest candidate epitopes by in silico screening of entire proteins and even proteomes with specificity at:
  the allele level
  the supertype level
  disease-implicated alleles alone.
Minimize the number of wet-lab experiments
Cut down the lead time involved in epitope discovery and vaccine design

Page 21: Bioinformatics of Disease: immune epitope prediction

Predicting MHC-binding peptides
Tong, Tan and Ranganathan (2007) Briefings in Bioinformatics 8: 96-108
1. Sequence-based approach
  Pattern recognition techniques: binding motif, matrices, ANN, HMM, SVM
  Main limitations:
    Require large amount of data for training
    Preclude data with limited sequence conservation
2. Structure-based approach
  Rigid backbone modeling techniques
  Flexible docking techniques
  Main advantage: large training datasets unnecessary
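
To make the simplest sequence-based technique concrete, the sketch below applies a binding motif as a strict filter. The motif shown is a hypothetical, simplified anchor pattern chosen for illustration; it is not taken from the slides and should not be read as the motif of any particular allele.

```python
# Hypothetical anchor motif: 1-based peptide position -> allowed residues.
ILLUSTRATIVE_MOTIF = {2: "LM", 9: "VL"}
MOTIF_LENGTH = 9

def matches_motif(peptide: str,
                  motif: dict = ILLUSTRATIVE_MOTIF,
                  length: int = MOTIF_LENGTH) -> bool:
    """True if the peptide has the expected length and every anchor
    position carries one of the allowed residues."""
    if len(peptide) != length:
        return False
    return all(peptide[pos - 1] in allowed for pos, allowed in motif.items())

# Peptides that also appear in the benchmarking table of this talk.
candidates = ["LLFGYPVYV", "GILGFVFTL", "RGYVYQGL"]
predicted = [p for p in candidates if matches_motif(p)]
print(predicted)
```

A strict motif of this kind rejects any peptide that deviates at an anchor position or in length, which illustrates the limitation with weakly conserved binders listed above.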

Page 22: Bioinformatics of Disease: immune epitope prediction

Our aim: Structure-based prediction of MHC-binding peptides

Page 23: Bioinformatics of Disease: immune epitope prediction

Why structure?
Great potential to:
  generate biologically meaningful data for analysis
  predict candidate peptides for alleles that have not been widely studied, where sequence-based approaches fail or are not attempted
  predict binding affinity of peptides
  predict non-contiguous epitopes
Structure determination through experimental methods is both expensive and time-consuming
Structure-based prediction has not been extensively studied due to high computational costs and development complexity

Page 24: Bioinformatics of Disease: immune epitope prediction

Existing Structure-based Prediction Techniques
Protein Threading [Altuvia et al. 1995; Schueler-Furman et al. 2000]
Homology Modeling [Michielin et al. 2000]
Rigid/Flexible Docking [Rosenfeld et al. 1993; Sezerman et al. 1996; Rognan et al. 1999; Desmet et al. 2000; Michielin et al. 2003]

Page 25: Bioinformatics of Disease: immune epitope prediction

Hypothesis for epitope selection
Peptides bound to MHC alleles are similar to substrates bound to enzymes
“Lock-and-key” mechanism for peptide selection
  Shape
  Size
  Electrostatic characteristics

Page 26: Bioinformatics of Disease: immune epitope prediction

Introduction
Structural Immunoinformatics
Database development
Data Analysis
Computational models
Applications

Sequences

Databases, ontologies

Basic immunology

Genetics and populations

Structures

Page 27: Bioinformatics of Disease: immune epitope prediction

MPID: MHC-Peptide Interaction Database
Govindarajan et al. (2003) Bioinformatics, 19: 309-310
Relational database (RDB) of 82 curated pMHC complexes (Class I: 64 & Class II: 18)
[Figure: Distribution based on MHC allele specificity. x-axis: MHC allele (A*0201, A*6801, B*0801, B*2705, B*3501, B*5101, B*5301, DQ8, DR1, DR2, DR3, DR4, H2-Db, H2-Dd, H2-Kb, H2-Ld, HLA-Cw3, HLA-Cw4, I-Ad, I-Ak, RT1.Aa); y-axis ticks: 0 to 25]

Page 28: Bioinformatics of Disease: immune epitope prediction

Peptide/MHC interaction characteristics
Interface area
Intermolecular hydrogen bonds
Gap volume
Gap index = Gap volume / Interface area
Interacting residues
Peptide length

Page 29: Bioinformatics of Disease: immune epitope prediction

MPID-T: MHC-Peptide-T Cell Receptor Interaction Database
Tong et al. (2006) Applied Bioinformatics, 5: 111-114
187 curated pMHC complexes, 16 with TCR
Human: 110, Murine: 74 and Rat: 3
Alleles: 40
(interface area, H bonds, gap volume and gap index)

Page 30: Bioinformatics of Disease: immune epitope prediction

Distribution of MHC by allele
[Figure: distribution of MHC by allele; y-axis ticks: 0 to 40. x-axis alleles: A*0101, A*0201, A*6801, B*1501, B*3501, B*0801, B*2705, B*2709, B*4402, B*4403, B*4405, B*5101, B*5301, Cw*0304, Cw*0401, E*0103, E*0101, G*0101, DRB1*0101, DRB1*0301, DRB1*1501, DRB5*0101, DQB1*0302, DQB1*0602, DRB1*0401, DQB1*0201, RT1.Aa, RT1-A1C, H2-Db, H2-Dd, H2-Kb, H2-Ld, H2-M3, H2-Qa-2, I-Ak, I-Ab, I-Ad, I-Au, I-Ek, I-Ag7]
101 new entries
187 entries (Human: 110; Murine: 74; Rat: 3)
134 non-redundant entries (class I: 100; class II: 34)
121 class I and 41 class II entries
26 HLA alleles (class I: 18; class II: 8)
14 rodent alleles (class I: 8; class II: 6)
16 TCR/peptide/MHC complexes

Page 31: Bioinformatics of Disease: immune epitope prediction

Peptide/MHC binding motifs
Conserved peptide properties in solution structures
Classified according to
  Alleles
  Peptide length
Polar, Amide, Basic, Acidic, Hydrophobic

Page 32: Bioinformatics of Disease: immune epitope prediction

How to obtain structures of experimentally unsolved alleles?
1. There were only 36 crystal structures of unique MHC alleles (2006) vs. 1765 unique MHC alleles identified in the IMGT/HLA database
2. Structure determination through experimental methods is both expensive and time-consuming
3. Homology model building for alleles with no structural data!

Page 33: Bioinformatics of Disease: immune epitope prediction

Introduction
Structural Immunoinformatics
Database development
Data Analysis of pMHC Class I complexes
Computational models
Applications

Data & text mining

Maths/Stats

Structures

Page 34: Bioinformatics of Disease: immune epitope prediction

MHC Class I superfamilies have different interaction characteristics
Superfamily | HLA-A2 (36 entries) | HLA-B7 (12 entries) | HLA-B27 (18 entries)
Interface area (Å^2) | 846.3±48.9 | 876.7±72.4 | 934.0±136.0
Gap volume (Å^3) | 799.8±195.2 | 870.2±198.0 | 985.1±101.5
Gap index | 0.9±0.2 | 1.0±0.1 | 1.0±0.3
Hydrogen bonds | 11.1±1.9, concentrated at pockets A, B, F | 14.3±2.3, well distributed | 17.9±2.8, concentrated at pockets A, B, F
Single linkage cluster analysis of 68 pMHC Class I complexes from 13 alleles (all available A and B)
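
The gap index used in this table is the ratio of gap volume to interface area. As a rough consistency check, the sketch below computes that ratio from the mean values in the table. The result only approximates the tabulated gap index, presumably because the table averages per-complex ratios rather than dividing the two means.

```python
# Mean values copied from the table above (gap volume in Å^3, interface area in Å^2).
SUPERFAMILIES = {
    "HLA-A2": {"interface_area": 846.3, "gap_volume": 799.8},
    "HLA-B7": {"interface_area": 876.7, "gap_volume": 870.2},
    "HLA-B27": {"interface_area": 934.0, "gap_volume": 985.1},
}

def gap_index(gap_volume: float, interface_area: float) -> float:
    """Gap index = gap volume / interface area (units of Å)."""
    return gap_volume / interface_area

gap_indices = {
    name: gap_index(v["gap_volume"], v["interface_area"])
    for name, v in SUPERFAMILIES.items()
}
for name, value in gap_indices.items():
    print(f"{name}: {value:.2f}")
```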

Page 35: Bioinformatics of Disease: immune epitope prediction

Details
Data
  68 peptide–HLA complexes spanning 13 class I alleles from MPID-T
Hierarchical clustering
  Hierarchical clustering using the agglomerative algorithm.
  Distance between structures computed by the single-linkage method (MATLAB version 7.0), based on the separation between each pair of data points.
  Nearest neighbors merged into clusters.
  Smaller clusters were then merged into larger clusters based on inter-cluster distances, until all structures are combined.
  Last 3 levels considered for defining HLA class I supertypes.
Interaction parameters
  Significant for the characterization of peptide/MHC interface:
    Intermolecular hydrogen bonds
    pMHC interface area
    Gap volume
    Gap index
Binding characteristics of HLA supertypes analyzed
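
The agglomerative single-linkage procedure described above can be sketched in a few lines. This is a pure-Python illustration, not the MATLAB routine used in the study. The points are hypothetical two-dimensional stand-ins for the interaction descriptors, which in practice would probably need to be scaled to comparable ranges first.

```python
import math
from itertools import combinations

def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest to each other (single linkage)."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps a valid
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical descriptor vectors, for illustration only.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
groups = single_linkage(toy_points, n_clusters=3)
print(groups)
```

Stopping at a chosen number of clusters plays the role of cutting the dendrogram at one of its last levels.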

Page 36: Bioinformatics of Disease: immune epitope prediction

B27

B44

B7

B62

B8

Legend

Do the Class I alleles aggregate into “superfamilies” using receptor-ligand interaction patterns?

Page 37: Bioinformatics of Disease: immune epitope prediction

MHC Class I superfamilies from receptor-ligand interactions
80 HLA class I complexes
13 class I alleles
Five descriptors
Hierarchical clustering using nearest neighbor algorithm
77% consensus with data from other groups
Supertype definition: receptor structure, ligand binding motifs, or receptor-ligand interaction patterns
Legend: B27, B44, B7, B62, B8
Tong, Tan and Ranganathan (2007) Bioinformatics, 23: 177-183

Page 38: Bioinformatics of Disease: immune epitope prediction

Introduction
Structural Immunoinformatics
Database development
Data Analysis
Computational models
Applications

Maths/Stats

Structures

Sequences

Physics/Chemistry

Page 39: Bioinformatics of Disease: immune epitope prediction

Two-step approach to predict MHC-binding peptides
1. Finding the best fit conformation (docking) of peptides within the MHC binding groove
2. Screening potential binders from the background

Page 40: Bioinformatics of Disease: immune epitope prediction

Docking is a computationally exhaustive procedure
Large number of possible peptide conformations
  3 global translational degrees of freedom
  3 global rotational degrees of freedom
  1 conformational degree of freedom for each rotatable bond
>10^10 possible conformations for a 10-residue peptide
[Figure: peptide backbone atoms (N, Cα, C, O) and side chain R, with x, y, z axes]

Page 41: Bioinformatics of Disease: immune epitope prediction

Conservation of nonamer peptide backbone conformation
Class I peptides
  N-termini residues: 0.02 – 0.29 Å
  C-termini residues: 0.00 – 0.25 Å
Class II binding registers
  Only 9 residues fit in the binding groove
  N-termini residues: 0.01 – 0.22 Å
  C-termini residues: 0.02 – 0.27 Å

Page 42: Bioinformatics of Disease: immune epitope prediction

Rapid docking of peptide to MHC
Tong, Tan & Ranganathan (2004) Protein Sci. 13:2523-2532
1. Anchoring root fragments to reduce search space (pseudo-Brownian rigid body docking)
2. Loop modeling (loop closure of central backbone by satisfaction of spatial restraints)
3. Ligand backbone and side-chain refinement (entire backbone and interacting side-chains)

Page 43: Bioinformatics of Disease: immune epitope prediction

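Benchmarking with existing techniques
Author | Technique | Peptide | RMSD(a) | RMSD(b)
Rognan et al. | Simulated Annealing | TLTSCNTSV | 1.04 | 0.46
Rognan et al. | Simulated Annealing | FLPSDFFPSV | 1.59 | 1.10
Rognan et al. | Simulated Annealing | GILGFVFTL | 0.46 | 0.32
Rognan et al. | Simulated Annealing | ILKEPVHGV | 0.87 | 0.87
Rognan et al. | Simulated Annealing | LLFGYPVYV | 0.78 | 0.33
Desmet et al. | Combinatorial Buildup Algorithm | RGYVYQGL | 0.56 | 0.32
Rosenfeld et al. | Multiple Copy Algorithm | FAPGNYPAL | 2.70 | 0.40
Rosenfeld et al. | Multiple Copy Algorithm | GILGFVFTL | 1.40 | 0.32
Sezerman et al. | Combinatorial Buildup Algorithm | LLFGYPVYV | 1.40 | 0.33
Sezerman et al. | Combinatorial Buildup Algorithm | ILKGPVHGV | 1.30 | 0.87
Sezerman et al. | Combinatorial Buildup Algorithm | GILGFVFTL | 1.60 | 0.32
Sezerman et al. | Combinatorial Buildup Algorithm | TLTSCNTSV | 2.20 | 0.46
(a) RMSD of peptide backbone obtained from respective authors.
(b) RMSD of peptide backbone obtained in our work from redocking bound complexes and single template respectively.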
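
The RMSD values in this table compare a docked peptide backbone with its crystallographic counterpart. A minimal sketch of the calculation follows, assuming the two coordinate sets are already in the same frame and list equivalent atoms in the same order. No superposition step is included, and the coordinates are hypothetical.

```python
import math

def backbone_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, paired atom by atom."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates: every atom displaced by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
rmsd = backbone_rmsd(reference, docked)
print(f"{rmsd:.2f} Å")
```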

Page 44: Bioinformatics of Disease: immune epitope prediction

Quantitative separation of binders from non-binders: empirical free energy scoring function
DQ3.2β involved in several autoimmune diseases:
  Celiac disease
  insulin-dependent diabetes mellitus
  IDDM-associated periodontal disease
  autoimmune polyendocrine syndrome type II

Page 45: Bioinformatics of Disease: immune epitope prediction

Quantitative separation of binders from non-binders: empirical free energy scoring function
$\Delta G_{bind} = \alpha \Delta G_{H} + \beta \Delta G_{S} + \gamma \Delta G_{EL} + C$
$\Delta G_{bind}$ = binding free energy
$\Delta G_{H}$ = hydrophobic term
$\Delta G_{S}$ = decrease in side chain entropy
$\Delta G_{EL}$ = electrostatic term
$C$ = entropy change in system due to external factors
$\alpha$, $\beta$, $\gamma$ optimized by least-square multivariate regression with experimental binding affinities (IC50) of MHC-peptides in training dataset (Rognan et al., 1999)
$\Delta G_{bind} \approx -RT \ln(IC_{50})$ (Rognan et al., 1999).
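
The coefficients of the scoring function are obtained by least-squares regression against experimental values. The sketch below shows one way to do such a fit in pure Python by solving the normal equations. It is not the authors' implementation, and the energy terms and "true" coefficients are hypothetical numbers used only to confirm that the fit recovers them.

```python
def fit_scoring_weights(terms, targets):
    """Least-squares fit of dG_bind = a*dG_H + b*dG_S + g*dG_EL + C.

    terms   : list of (dG_H, dG_S, dG_EL) tuples, one per peptide
    targets : list of experimental free energies, one per peptide
    Returns [alpha, beta, gamma, C] from the normal equations.
    """
    rows = [list(t) + [1.0] for t in terms]  # constant column for C
    n = len(rows[0])
    # Augmented normal-equation system [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("terms are linearly dependent; cannot fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

# Hypothetical training data generated from known coefficients.
TRUE_WEIGHTS = [0.5, -0.2, 1.5, -3.0]  # alpha, beta, gamma, C
terms = [
    (-10.0, 2.0, -1.0), (-8.0, 3.5, -2.5), (-12.0, 1.0, -0.5),
    (-6.0, 2.0, -3.0), (-9.0, 4.0, -1.5), (-11.0, 3.0, -2.0),
]
targets = [sum(w * x for w, x in zip(TRUE_WEIGHTS, list(t) + [1.0])) for t in terms]
fitted = fit_scoring_weights(terms, targets)
print([round(w, 3) for w in fitted])
```

With real data the targets would be free energies derived from measured IC50 values, and the fit would not be exact.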

Page 46: Bioinformatics of Disease: immune epitope prediction

Test case: MHC Class II DQ8
DQ3.2β (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases:
  Celiac disease
  insulin-dependent diabetes mellitus
  IDDM-associated periodontal disease
  autoimmune polyendocrine syndrome type II

Page 47: Bioinformatics of Disease: immune epitope prediction

Data used
Structure: 1JK8 - DQ3.2β–insulin B9-23 complex
Dataset I: 127 peptides with experimentally determined IC50 values [70 high-affinity (IC50 < 500 nM), 13 medium-affinity (500 nM < IC50 < 1500 nM) and 23 low-affinity (1500 nM < IC50 < 5000 nM) binders and 21 non-binders (IC50 > 5000 nM)] derived from biochemical studies. 87 with known binding registers.
Dataset II: 12 Dermatophagoides pteronyssinus (Der p 2) peptides with experimental T-cell proliferation values from functional studies, with 7 peptides eliciting DQ3.2β-restricted T-cell proliferation.
$\Delta G_{bind} \approx -RT \ln(IC_{50})$ (Rognan et al., 1999).

Page 48: Bioinformatics of Disease: immune epitope prediction

Scoring: Training & testing datasets
Training
  56 binding conformations with known registers
  30 non-binding conformations from 3 non-binders
Testing
  Test set 1 – 68 peptides from biochemical studies: 16 strong; 13 medium; 21 weak; 18 non-binders
  Test set 2 – 12 peptides from functional studies: 7 elicit T-cell proliferation

Page 49: Bioinformatics of Disease: immune epitope prediction

Screening class II binding register: a sliding window approach
E285B 112-126 peptide: Y Q T I E E N I K I F E E D A
Core sequence | Binding Energy
YQTIEENIK | -23.12
QTIEENIKI | -21.34
TIEENIKIF | -25.32
IEENIKIFE | -29.53
EENIKIFEE | -32.27
ENIKIFEED | -21.72
NIKIFEEDA | -22.95
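
The sliding-window scan above can be reproduced with a short script. The binding energies are the values listed on the slide and are simply looked up here, since in the actual workflow each one comes from docking the core. Taking the lowest energy as the selected register is an assumption of this sketch.

```python
PEPTIDE = "YQTIEENIKIFEEDA"  # E285B 112-126
CORE_LENGTH = 9

# Binding energies per 9-mer core, copied from the slide.
ENERGIES = {
    "YQTIEENIK": -23.12, "QTIEENIKI": -21.34, "TIEENIKIF": -25.32,
    "IEENIKIFE": -29.53, "EENIKIFEE": -32.27, "ENIKIFEED": -21.72,
    "NIKIFEEDA": -22.95,
}

def sliding_cores(peptide: str, core_length: int = CORE_LENGTH) -> list:
    """All overlapping windows of core_length residues, N to C terminus."""
    return [peptide[i:i + core_length]
            for i in range(len(peptide) - core_length + 1)]

cores = sliding_cores(PEPTIDE)
best_core = min(cores, key=ENERGIES.__getitem__)  # most favourable energy
print(len(cores), best_core, ENERGIES[best_core])
```

A 15-residue peptide yields seven candidate registers, and EENIKIFEE has the most favourable energy of the seven.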

Page 50: Bioinformatics of Disease: immune epitope prediction

4-step protocol used
Docking
A. Anchoring root fragments (probes) to reduce search space
B. Loop modeling
C. Refinement of binding register
D. Extension of flanking residues for MHC Class II

Page 51: Bioinformatics of Disease: immune epitope prediction

Accuracy estimates
Sensitivity (SE) = fraction of binders correctly predicted = TP/AP, where AP = TP + FN
Specificity (SP) = fraction of non-binders correctly predicted = TN/AN, where AN = TN + FP
Area under ROC (receiver operating characteristic) curve:
  >90% excellent
  >80% good
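
These accuracy measures can be computed directly from predicted binding energies. In the sketch below, a peptide is called a binder when its energy is at or below a threshold. The area under the ROC curve is computed as the probability that a randomly chosen binder scores lower than a randomly chosen non-binder. The scores and threshold are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """SE = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """SP = TN / (TN + FP)."""
    return tn / (tn + fp)

def area_under_roc(binder_scores, nonbinder_scores) -> float:
    """Fraction of (binder, non-binder) pairs in which the binder has the
    lower (more favourable) energy; ties count as half."""
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))

# Hypothetical predicted energies (kJ/mol) and threshold.
binders = [-35.0, -33.0, -31.0, -28.0]
nonbinders = [-30.0, -27.0, -25.0]
THRESHOLD = -29.0

tp = sum(score <= THRESHOLD for score in binders)
fn = len(binders) - tp
fp = sum(score <= THRESHOLD for score in nonbinders)
tn = len(nonbinders) - fp

se, sp = sensitivity(tp, fn), specificity(tn, fp)
aroc = area_under_roc(binders, nonbinders)
print(f"SE={se:.2f} SP={sp:.2f} AROC={aroc:.2f}")
```

Moving the threshold trades sensitivity against specificity, while the AROC summarizes performance over all thresholds.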

Page 52: Bioinformatics of Disease: immune epitope prediction

Results for Training set
Specificity (SP) Level | Group | Sensitivity (SE) | Binding Energy Threshold (kJ/mol)
SP = 0.80 | LMH | 0.90 | -28.70
SP = 0.80 | MH | 0.85 | -29.10
SP = 0.80 | H | 0.75 | -30.82
SP = 0.90 | LMH | 0.84 | -29.10
SP = 0.90 | MH | 0.77 | -30.50
SP = 0.90 | H | 0.75 | -32.74
SP = 0.95 | LMH | 0.81 | -29.93
SP = 0.95 | MH | 0.73 | -32.12
SP = 0.95 | H | 0.63 | -33.59
At SP = 0.80: High SE (good for most predictions)
At SP = 0.95: Very few FPs, but also fewer predictions

Page 53: Bioinformatics of Disease: immune epitope prediction

Screening class II binding register: HLA-DQ8 prediction accuracy for Test Set I
Group | LMH | MH | H
AROC | 0.88 | 0.93 | 0.93
Classification of binding peptides
  High-affinity binders (H): IC50 ≤ 500 nM
  Medium-affinity binders (M): 500 nM < IC50 ≤ 1500 nM
  Low-affinity binders (L): 1500 nM < IC50 ≤ 5000 nM
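
The affinity classes above translate into a simple threshold function. The sketch uses the boundaries as written on this slide and treats anything above 5000 nM as a non-binder, as in the dataset description earlier in the talk. The function name is illustrative.

```python
def classify_binder(ic50_nm: float) -> str:
    """Affinity class from an experimental IC50 value in nM."""
    if ic50_nm <= 500:
        return "H"   # high-affinity binder
    if ic50_nm <= 1500:
        return "M"   # medium-affinity binder
    if ic50_nm <= 5000:
        return "L"   # low-affinity binder
    return "non-binder"

for ic50 in (20, 541, 1156, 5000, 8000):
    print(ic50, classify_binder(ic50))
```

The grouped labels used in the accuracy tables (LMH, MH, H) then correspond to pooling these classes.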

Page 54: Bioinformatics of Disease: immune epitope prediction

Test Set 1: Improved detection of binders lacking position specific binding motifs
Binding motif (positions 1, 4, 6, 7, 9): T D R R Q S V V V N W M D D G K A A A D E I I I P D Y Y R Q E F L M
Ligands / Epitopes | Source | BE (kJ/mol) | IC50 (nM)
LQLQPFPQPQPFPPL | A-gliadin 56-70 | -41.01 | 20
DMTPADALDDFDL | HSV | -40.53 | 173
AAAAAVAAEAY | Artificial sequence | -39.98 | 48
GVAGLLVALAV | IA-2 499-509 | -36.16 | 95
DSNIMNSINNVMDEIDFFEK | Pf ABRA 487–506 | -36.01 | 171
FESTGNLIAPEYGFKISY | HA 255–271Y | -35.70 | 62
YPFIEQEGPEFFDQE | MHC Ia 51–63 analog | -35.34 | 1156
LLDILDTAGLEEYSAMRD | p21 51–66; C out | -35.27 | 202
QPYPQPQPFPSQQPY | A-gliadin 41-55 | -35.26 | 1120
FPSQQPYLQLQPFPQ | A-gliadin 49-63 | -33.93 | 20
CDGERPTLAFLQDVM | GAD 101–115 | -33.57 | 69
SFPPQQPYPQPQPQY | A-gliadin 77-91 | -33.35 | 370
SQDLELSWNLNGLQADLSS | FceR 104–122 | -32.89 | 123
EPRAPWIEQEGPEYW | MHC Ia 46-63 | -32.89 | 519
PPLYATGRLSQAQLMPSPPM | VP16 | -32.59 | 538
SQDLELSWNLNGLQAY | FceR 104–122 analog | -32.49 | 118
IARAKMFPAVAEK | 34P3A | -31.91 | 541

Page 55: Bioinformatics of Disease: immune epitope prediction

Binding registers
  20/23 (87%) binding registers
  Only register (aa 4-12) from Test Set 2 (Der p 2: 1-20) (SP = 0.80; SE(LMH) = 0.90)
T-cell proliferation
  Top 5 predictions are experimental positives at very stringent threshold criteria (SP = 0.95; SE(H) = 0.63)

Page 56: Bioinformatics of Disease: immune epitope prediction

Multiple registers (SP = 0.95, SE(LMH) = 0.81): 58% of Test Set 1
[Figure: No. of peptides (y-axis, 0 to 14) versus No. of binding registers (x-axis, 1 to 7) for weak, medium and strong binders]
Mainly for medium and high binders
Experimental support: Sinha et al. for DRB1*0402
Is this why binding motifs are unsuccessful?

Page 57: Bioinformatics of Disease: immune epitope prediction

Introduction
Structural Immunoinformatics
Database development
Data Analysis
Computational models developed
Applications

Page 58: Bioinformatics of Disease: immune epitope prediction

Pemphigus vulgaris (PV)
Autoimmune blistering skin disorder
Characterized by autoantibodies targeting desmoglein-3 (Dsg3)
Strong association with DR4 and DR6 alleles

http://www.medscape.com

adam.about.com

www.aafp.org

Page 59: Bioinformatics of Disease: immune epitope prediction

Who are the major players in PV?
DR4 PV implicated alleles (for Semitic)
  DRB1*0401, DRB1*0402, DRB1*0404, DRB1*0406
DR6 PV implicated alleles (for Caucasians)
  DRB1*1401, DRB1*1404, DRB1*1405, DQB1*0503

Page 60: Bioinformatics of Disease: immune epitope prediction

What is known about DR4?
DR4 PV implicated alleles (DRB1*0401, *0402, *0404, *0406)
High sequence conservation
  97.9 – 99.0% identity
  98.4 – 99.5% similarity
High structural conservation
  Cα RMSD <0.22 Å for all key binding pockets
7 polymorphic residues within binding cleft
  Pocket 1 (β86)
  Pocket 4 (β70, 71, 74)
  Pocket 6 (β11)
  Pocket 7 (β71)
  Pocket 9 (β37)

Page 61: Bioinformatics of Disease: immune epitope prediction

What is known about DR6?
DR6 PV implicated alleles (DRB1*1401, *1404, *1405, DQB1*0503)
High sequence conservation
  85.8 – 94.1% identity
  83.2 – 97.3% similarity
High structural conservation
  Cα RMSD <0.22 Å for all key binding pockets
14 polymorphic residues within binding clefts
  Pocket 1 (β86)
  Pocket 4 (β13, 70, 71, 74, 78)
  Pocket 6 (β11)
  Pocket 7 (β28, 30, 67, 71)
  Pocket 9 (β9, 37, 57, 60)

Page 62: Bioinformatics of Disease: immune epitope prediction

Clues…
9 stimulatory Dsg3 peptides tested on PV patients possessing DR4 and DR6 PV implicated alleles
1. Dsg3 96-112 (DR4, DR6)
2. Dsg3 191-205 (DR4, DR6)
3. Dsg3 206-220 (DR4, DR6)
4. Dsg3 252-266 (DR4, DR6)
5. Dsg3 342-356 (DR4, DR6)
6. Dsg3 380-394 (DR4, DR6)
7. Dsg3 763-777 (DR4, DR6)
8. Dsg3 810-824 (DR4)
9. Dsg3 963-977 (DR4)

Page 63: Bioinformatics of Disease: immune epitope prediction

Disease associated alleles vs. innocent bystanders
DR4 PV
  8/9 investigated Dsg3 peptides fit perfectly into DRB1*0402
  Atomic clashes with all other investigated DR4 subtypes
DR6 PV
  6/9 investigated Dsg3 peptides fit perfectly into DQB1*0503
  Atomic clashes with all other investigated DR6 subtypes
HLA association in DR6 PV more likely to be at DQ than DR locus
Consistent with experimental work done by Sinha et al. (2002, 2005, 2006)
Tong et al. (2006) Immunome Research, 2: 1

Page 64: Bioinformatics of Disease: immune epitope prediction

Whither sequence motifs (again!)?
1/9 investigated Dsg3 peptides fits existing binding motifs
Flanking residues – clashes in fitting binding register
Register-shift for Peptide V (Dsg3 342-356)
  Detected binding register: Dsg3 346-354
  Binding motifs: Dsg3 347-355 (Veldman et al., 2003); Dsg3 345-353 (Sinha et al., 2006)

Page 65: Bioinformatics of Disease: immune epitope prediction

Large-scale screening of Dsg3 peptides
Docking of 936 15mer Dsg3 peptides generated using a sliding window of size 15 across the entire Dsg3 glycoprotein
[Figure: Dsg3 peptide (sliding window width 15, N to C terminus) containing a binding register (sliding window width 9) and flanking residues]
Training set: 8 peptides each, with exp. IC50 values and known binding registers (5 binders and 3 non-binders)
Tong et al. (2006) BMC Bioinformatics, 7(Suppl 5): S7
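
The number of peptides produced by a sliding window follows from the sequence length and window width. The sketch below works backwards from the 936 15-mers quoted above to the implied length of the scanned sequence, and counts the 9-mer registers inside each 15-mer. The helper names are illustrative.

```python
def window_count(sequence_length: int, window: int) -> int:
    """Number of overlapping windows of the given width."""
    return max(sequence_length - window + 1, 0)

def implied_sequence_length(n_windows: int, window: int) -> int:
    """Sequence length that yields n_windows overlapping windows."""
    return n_windows + window - 1

scanned_length = implied_sequence_length(936, 15)
registers_per_peptide = window_count(15, 9)
print(scanned_length, registers_per_peptide)
```

So 936 overlapping 15-mers correspond to a scanned sequence of 950 residues, and each 15-mer offers seven candidate 9-mer binding registers.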

Page 66: Bioinformatics of Disease: immune epitope prediction

Large-scale screening of Dsg3 peptides
[Figure: Binding energy (y-axis, -40.00 to -15.00) versus 15-mer start position (x-axis, panels covering positions 50-250, 250-450, 450-650, 650-850 and 850-1050) for DQB1*0503 and DRB1*0402, with the extracellular, transmembrane and intracellular regions and the immunoreactive region marked]

Page 67: Bioinformatics of Disease: immune epitope prediction

Common epitopes possibly responsible for inducing disease in DR4 & DR6 patients
Significant level of cross reactivity observed between DRB1*0402 and DQB1*0503 (AROC = 0.93)
57% of peptides investigated in this study predicted to bind to both alleles with high affinity
90% of known Dsg3 peptides predicted to bind to both alleles
12/20 top predicted DQB1*0503-specific Dsg3 peptides from transmembrane region
All top predicted DRB1*0402-specific Dsg3 peptides from extracellular regions
Disease initiation implications: DR4 from ECD; DR6 from TM

Page 68: Bioinformatics of Disease: immune epitope prediction

Multiple binding registers revisited
76% (410/539) predicted high-affinity binders to DRB1*0402 possess > 2 binding registers
57% (384/673) predicted high-affinity binders to DQB1*0503 possess > 2 binding registers
66% (354/539) bind both alleles at different registers
Similar proportion (70%) detected in known binders to both alleles
Both alleles bind similar peptides via different binding registers
[Figure: No. of peptides (y-axis, 0 to 350) versus No. of binding registers (x-axis, 0 to 6) for DQB1*0503 and DRB1*0402]

Page 69: Bioinformatics of Disease: immune epitope prediction

What next?
We have developed a predictive model for HLA-C (Cw*0401) with very limited (only six) experimental binding values.
The model yields excellent results for test data (AROC=0.93).
Application to determine immunological hot spots for HIV-1 p24gag and gp160gag glycoproteins shows binding energies similar to HLA-A and -B.

Page 70: Bioinformatics of Disease: immune epitope prediction

Conclusions
Computational models for immunogenic epitope prediction can be successfully developed, even for alleles with limited experimental data.
While computations can never completely replace “wet-lab” experiments, in silico predictions can significantly cut down the development time of therapeutic vaccines.

Page 71: Bioinformatics of Disease: immune epitope prediction

1. Genome analysis
Approaches
  EST analysis
  Annotation pipeline using workflow strategies
Applications
  Parasitic nematodes
  Cancer EST data
Outcomes
  Comprehensive annotation at the gene and protein levels
  Novel &/or pathogen-specific genes
  Immune response evasion strategies

Page 72: Bioinformatics of Disease: immune epitope prediction

2. Transcriptome analysis
Approaches
  Graph formalism for alternative splicing
  Genome-wide analysis
Applications
  Drosophila genome
  Chicken compared to human and mouse
  Kallikrein variants as markers
Outcomes
  New mRNA-gDNA alignment method, MGAlign & MGAlignIt
  First splicing graph database, DEDB
  Web server for splicing graphs, ASGS
  Sub-graph elements for alternative splicing
  Multi-species splicing graph database, GraphDB

Page 73: Bioinformatics of Disease: immune epitope prediction

3. Protein/Proteome research: Origin and evolution of structural domains
Approaches
  Intron mapping to domain boundary
  All eukaryotic proteins analyzed
Applications
  Domain prediction in EST/genome data
  Effect of splice variants on domains
Outcomes
  New database of protein coding genes, XPro
  Visualization of intronic locations on protein structural domains, XDomView
  Analysis tool, Go Module Viewer

Page 74: Bioinformatics of Disease: immune epitope prediction

3. Protein/Proteome research: Small disulfide-rich proteins
<100 aa per domain; ≥ 2 SS bonds
Approaches
  Multiple structure alignment and hierarchical classification
  Comparative modeling rules
  Sequence, structure and evolutionary analysis of Potato II inhibitor family
Outcomes
  New database, DSFD
  Server for model building, SDPMOD
  Understanding of wound-induced protease inhibitor folding
Applications
  Design of protease inhibitors, channel modulators, growth regulators

Page 75: Bioinformatics of Disease: immune epitope prediction

3. Protein/Proteome research: Protease cleavage site prediction
Approaches
  Detailed structural modeling and docking of signal peptide moiety to signal peptidase I
  SVM for caspases
Applications
  Enhanced production of therapeutic and commercial heterologous proteins
  Apoptosis initiation
Outcomes
  New databases, SPdb, CasBase
  Server for caspase cleavage prediction, CASVM
  Signal peptide cleavage prediction (under development)

Page 76: Bioinformatics of Disease: immune epitope prediction

4. Systems Biology
Approaches
  Holistic computational, molecular biology and FRET study to locate secretion roadblocks
  EST analysis of host-parasite interactions
Applications
  Trichoderma reesei as fungal bioreactor
  Parasites that lead to: liver cancer - food borne trematode (Opisthorchis viverrini) and bladder cancer (Schistosoma haematobium).
Outcomes
  Improved heterologous protein production using filamentous fungi
  Understanding of how parasites evade host immune activation

Page 77: Bioinformatics of Disease: immune epitope prediction

6. Genome-Phenome mapping
Approaches
  Mutation data for non-laboratory animals
  Mapping to OMIM
  Mapping to structure
Applications
  OMIA-OMIM mapping to structure
  Correlation between genotype and disease phenotype
Outcomes
  OMIA database, with links to OMIM (courtesy NCBI)
  Mutations linked to severity of disease for α-D-mannosidosis
  Predictions of new human disease mutations from known mutation sites in cow, cat and guinea pig

Page 78: Bioinformatics of Disease: immune epitope prediction

7. Biodiversity Informatics: Customary medicinal plants
Approaches
  Integrating, visualizing and analyzing ethnobotanical, phytochemical and pharmacological data on customary medicinal plants
  Data from Australian aboriginal elders and Indian Siddha doctors
Applications
  Novel antimicrobial, anti-inflammatory and anti-cancer lead compounds
Outcomes
  CMkb, an integrated knowledgebase

Page 79: Bioinformatics of Disease: immune epitope prediction

Dedications Prof. Bernard Pullman

Mme. Alberte Pullman

My brother, a CML survivor

Page 80: Bioinformatics of Disease: immune epitope prediction

Acknowledgements
Dr. (Victor) J.C. Tong, NUS & I2R, Singapore
A/Prof. Tin Wee Tan, NUS
Dr. Animesh Sinha, Weill Medical College of Cornell University & Michigan State University, USA
Drs. J. Tom August (JHU) and Vladimir Brusic (DFCI) (NIAID-NIH Grant #5 U19 AI56541 & Contract #HHSN266200400085C).
All of you!