immunological bioinformatics - cbs · prediction of the ctl presenting pathway and integration of...

29
IMMUNOLOGICAL BIOINFORMATICS June - 2011 27685 Immunological Bioinformatics MEHTEIDHWLEFSATKLSSCDSFTSTINELNHCLSLRT YLVGNSLSLADLCVWATLKGNAAWQEQLKQKKAPVH VKRWFGFLEAQQAFQSVGTKWDVSTTKARVAPEKKQ DVGKFVELPGAEMGKVTVRFPPEASGYLHIGHAKAA LLNQHYQVNFKGKLIMRFDDTNPEKEKEDFEKVILE DVAMLHIKPDQFTYTSDHFETIMKYAEKLIQEGKAYV DDTPAEQMKAEREQRIESKHRKNPIEKNLQMWEEM KKGSQFGHSCCLRAKIDMSSNNGCMRDPTLYRCKIQP HPRTGNKYNVYPTYDFACPIVDSIEGVTHALRTTEYH DRDEQFYWIIEALGIRKPYIWEYSRLNLNNTVLSKRK LTWFVNEGLVDGWDDPRFPTVRGVLRRGMTVEGLK QFIAAQGSSRSVVNMEWDKIWAFNKKVIDPVAPRYVA LLKKEVIPVNVPEAQEEMKEVAKHPKNPEVGLKPVW YSPKVFIEGADAETFSEGEMVTFINWGNLNITKIHKN

Upload: ngoduong

Post on 08-Apr-2019

216 views

Category:

Documents


0 download

TRANSCRIPT

IMMUNOLOGICAL BIOINFORMATICS

June - 201127685 Immunological Bioinformatics

MEHTEIDHWLEFSATKLSSCDSFTSTINELNHCLSLRTYLVGNSLSLADLCVWATLKGNAAWQEQLKQKKAPVHVKRWFGFLEAQQAFQSVGTKWDVSTTKARVAPEKKQDVGKFVELPGAEMGKVTVRFPPEASGYLHIGHAKAALLNQHYQVNFKGKLIMRFDDTNPEKEKEDFEKVILEDVAMLHIKPDQFTYTSDHFETIMKYAEKLIQEGKAYVDDTPAEQMKAEREQRIESKHRKNPIEKNLQMWEEMKKGSQFGHSCCLRAKIDMSSNNGCMRDPTLYRCKIQPHPRTGNKYNVYPTYDFACPIVDSIEGVTHALRTTEYHDRDEQFYWIIEALGIRKPYIWEYSRLNLNNTVLSKRKLTWFVNEGLVDGWDDPRFPTVRGVLRRGMTVEGLKQFIAAQGSSRSVVNMEWDKIWAFNKKVIDPVAPRYVALLKKEVIPVNVPEAQEEMKEVAKHPKNPEVGLKPVWYSPKVFIEGADAETFSEGEMVTFINWGNLNITKIHKN

TABLE OF CONTENTS

Abstract! 3

Outline! 3

Introduction to the immune system! 4

Performance measures in prediction systems! 6

B ce" epitope identification and prediction! 8

Prediction of MHC binding.! 11

MHC class I binding predictions! 11

MHC class II binding predictions! 13

MHC polymorphism, supertype clustering, and panPredictions.! 14

Prediction of the CTL presenting pathway and integration of prediction systems! 16

Integrated T-ce" epitope predictions! 17

Epitope discovery and rational vaccine design.! 19

SARS! 19

Influenza! 20

HIV! 21

The new age of vaccine development! 22

Bibliography! 24

IMMUNOLOGICAL BIOINFORMATICS

Morten NielsenClaus Lundegaard

2011

AbstractImmunological bioinformatics is a relatively novel research field, highly relevant in investiga-tions of the complex behavior of the immune system. The immune system reacts to foreign protein segments called epitopes, and a major task is to develop methods identifying epitopes in pathogen proteomes. Predicted epitopes can eventually be experimentally verified and used in vaccine design and diagnosis. The course will demonstrate how bioinformatics can effec-tively be used in the search of vaccine candidates against pathogens like for instance SARS, Flu, and HIV. The methods used include artificial neural networks, Gibbs sampling, weight matrices and hidden Markov models as well as integrative models and protein structural analy-sis.

Outline1. Introduction to the immune system

2. Performance measures in prediction systems

3. B cell epitope identification and prediction

4. Prediction of MHC binding.

5. MHC polymorphism, supertype clustering, and panPredictions.

6. Prediction of the CTL presenting pathway and integration of prediction systems.

7. Epitope discovery and rational vaccine design.

Immunological Bioinformatics

June 2011! 3

Introduction to the immune systemAdapted from (Lundegaard et al., 2007)

Adaptive immunity is induced by lymphocytes and can be classified into two types: humoral immunity, mediated by antibodies, which are secreted by B lymphocytes and can neutralize pathogens outside the cells; and cellular immunity, mediated by T lymphocytes that eliminate infected or malfunctioning cells, and provide help to other immune responses. Diversity is the hallmark of the adaptive immune systems. Both the B and T lymphocyte-specific receptors for antigen recognition are assembled from variable (V), diversity (D), and joining (J) gene seg-ments early in the lymphocyte development. There are multiple copies of V, D and J segments, and a huge repertoire of T and B cells is generated by the recombination of these segments, reviewed by (Li et al., 2004). Another task faced by the immune system is the tolerance to self, which is handled by continuously removing receptors that react to self-epitopes.

Special immunoglobulin molecules (antibodies) mediate the humoral response. As mentioned above, the antibodies are produced by B lymphocytes that bind to antigens by their immuno-globulin receptors, which is a membrane bound form of the antibodies. When the B lympho-cytes become activated, they start to secrete the soluble form of this receptor in large amounts. The antibody is Y-shaped, and each of the two branches functions independently and can be recombinantly produced and is then known as Fabs. The highly variable tip of the Fab, which can bind to epitopes is called the paratope and is made up of the so-called complementary de-termining regions (CDRs). Antibodies can coat the surface of an antigen such as a virus, so that it cannot function or infect cells, reviewed by (Burton, 2002). Antibody-covered viruses or bacteria are easily phagocytosed and destroyed by scavenger cells of the immune system, e.g. the macrophages. Antigenic proteins can be recognized by the antibodies in their native form without any cleavage or interactions with other molecules. Thus the humoral immune response reacts to extracellular pathogens, and the response is crucial in the defense against most pathogens.

The cellular arm of the immune system consists of two parts; cytotoxic T lymphocytes (CTL), and helper T lymphocytes (HTLs). CTLs destroy cells that present non-self peptides (epi-topes). HTLs are needed for B cells activation and proliferation to produce antibodies against a given antigen. CTLs on the other hand perform surveillance of the host cells, and recognize and kill infected cells, generally explained in (Janeway, 2005). Both CTL and HTL are raised against peptides that are presented to the immune cells by major histocompatibility complex (MHC) molecules, which are the most polymorphic of mammalian proteins. The human ver-sions of MHCs are referred to as the human leucocyte antigen (HLA). The cells of an individ-ual are constantly screened for such peptides by the cellular arm of the immune system. In the MHC class I pathway, class I MHCs presents endogenous antigens to T cells carrying the CD8 receptor (CD8+ T cells). To be presented, a precursor peptide is normally first generated by the large cytosomal protease complex called the proteasome (Loureiroa and Ploegha, 2006). Gen-

Immunological Bioinformatics

June 2011! 4

erally, it then binds to the transporter associated with antigen processing (TAP) for transloca-tion into the endoplasmic reticulum (ER), reviewed by (Abele and Tampé, 2004), but some peptides can enter the ER independently of TAP (Anderson et al., 1991; Bacik et al., 1994; Eis-enlohr et al., 1992; Larsen et al., 2006). During or after the transport into the ER the peptide must bind to the MHC class I molecule (Stoltze et al., 2000; Zhang et al., 2006) before it can be transported to the cell surface through the golgi system. The most selective step in this pathway is binding of a peptide to the MHC class I molecule as on average only approximately one out of 200 peptides will bind to a given MHC molecule with enough strength to be de-tected by a CD8+ T cell and induce a CTL response.

Immunological Bioinformatics

June 2011! 5

Performance measures in prediction systems

As an evaluation of the general quality of a prediction method a measure describing the quality is needed. However, no single measure can capture all qualities of a prediction, and not all types of data and predictions can be reasonably described by the same measure. So to be able to compare different systems, it is often needed to present several measures of quality. Most measures need the data to be classified into two groups, i.e. positives and negatives.

The fraction correct predicted or accuracy is the fraction of the total predictions that falls into the correct group. This measure is intuitively easily captured, but has the weakness that if a large fraction of the total evaluation data falls into a single group one will get high performance by just blindly predicting most or even everything to belong to this category.

The number of classified (experimentally measured) positives is often designated as actual positives (AP), and the number of negatives, actual negatives (AN), the number of predicted positives (PP), predicted negatives (PN), truly predicted positives (TP), falsely predicted posi-tives (FP), truly predicted negatives (TN), and falsely predicted negatives (FN). Some of the most often used measures are briefly described here. The equations for the mentioned meas-ures are given at the end of the section.

The positive predicted value (PPV) is the fraction of the positive predictions that actually falls into the positive class.

The sensitivity is the fraction of the AP that is predicted as positives using a given threshold.

The specificity is the fraction of the AN that is predicted as negatives.

The three latter measures are also easily grasped, however they are all dependent on the cho-sen prediction cutoff classifying the data into positive and negative predictions. A high sensi-tivity can be obtained by setting your prediction cutoff so that most of your evaluation data will fall into the positive group, but this will then be at the expense of the specificity and the PPV. Which cutoff to use is determined by the purpose of the prediction, i.e. how many veri-fied epitopes is needed versus the resources available for experimental validation.

A plot of the sensitivity against the false positive rate (1-specificity) is called a receiver operating characteristic (ROC) curve (Swets, 1988). Such a plot can be a help to set the best prediction cutoff. One of the best ways of measuring the predictive power of a method is to calculate the

Sensitivity =TP

AP

Specificity =TN

AN

Immunological Bioinformatics

June 2011! 6

area under the ROC curve (AUC) since this is a threshold-independent measure. Another ro-bust measure is the Pearson’s correlation coefficient (PCC), which is a measure of how well the prediction scores correlate with the actual value on a linear scale.

In situations where the correlation is not necessarily linear, the Spearman's rank correlation coefficient (SRC) is more appropriate. In this measure each prediction is ranked on the basis of the prediction score and the PCC is calculated on the basis of this rank rather than the predic-tion score. The SRC, like the AUC, is a threshold-independent measure of how well the pre-dictor ranks the data when compared with the actual ranking.

When comparing different methods, the threshold-independent measures are to be preferred. Otherwise a threshold has to be set under the same assumptions for all predictors. As an ex-ample one can estimate the specificity for each predictor by setting the threshold for the given predictor to a value where the sensitivity will be 0.5 (i.e. half of the total available positives is over the threshold), or estimate the sensitivity at a threshold where the specificity will be 0.8 (i.e. 80% of the AN are predicted as negatives).

Immunological Bioinformatics

June 2011! 7

B cell epitope identification and predictionAdapted from (Lundegaard et al., 2007)

The humoral response is mediated by antibodies (immunoglobulin molecules), which are pro-duced by B lymphocytes. B lymphocytes can bind to antigens by their immunoglobulin recep-tors. When they become activated, they start to secrete a soluble form of this receptor (anti-bodies) in large amounts. Overall the antibody is Y-shaped (or T-shaped) with two antigen binding (Fab) fragments at the top left and right of the antibody and the constant fragment (Fc) below. The linker regions connecting the constant with the variable regions allow great flexibility in the relative orientation of the domains. The highly variable tip of a Fab fragment that can bind to epitopes is called a paratope and is made up of the so-called complementary determining regions (CDRs) in the antibody sequence. The binding of antibodies to, e.g., a vi-rus, can coat the surface so that it cannot infect cells (Burton, 2002). Viruses or bacteria cov-ered by antibodies are also more easily taken up (phagocytosed) and destroyed by scavenger cells of the immune system such as macrophages.

B-cell epitope prediction is a highly challenging field due to the fact that the vast majority of antibodies raised against a specific protein interact with discontinuous fragments (van Re-genmortel, 1996). The prediction of continuous, or linear, epitopes, however, is a somewhat simpler problem, and may be still useful for synthetic vaccines or as diagnostic tools (Regen-mortel and Muller, 1999). Moreover, the determination of continuous epitopes can be inte-grated into determination of discontinuous epitopes, as these often contain linear stretches (Hopp, 1994).

In the early 1980s, Hopp and Woods (Hopp and Woods, 1981; Hopp and Woods, 1983) devel-oped the first linear epitope prediction method. This method takes the assumption that the regions of proteins that have a high degree of exposure to solvent contain the antigenic deter-minants. According to the hydrophilicity scale generated by Levitt (Levitt, 1976), Hopp and Woods (Hopp and Woods, 1981) assigned the hydrophilicity propensity to each amino acid in a sequence and looked at groups of six residues. This gave promising results and a number of methods have since been developed with the aim of predicting linear epitopes using a combi-nation of different amino acid propensities (Alix, 1999; Debelle et al., 1992; Jameson and Wolf, 1988; Maksyutov and Zagrebelnaya, 1993; Odorico and Pellequer, 2003; Parker et al., 1986). In 1993, Pellequer et al. (Pellequer et al., 1993) proposed an evaluation set containing 85 continu-ous epitopes in 14 proteins and found that the method based on turn propensity (i.e. the pro-pensity of an amino acid to occur within a turn structure) had the highest sensitivity using this set. Seventy percent of the residues predicted to be in epitopes by this method were actually part of epitopes. The sensitivity for methods based on other propensities was in the range of

Immunological Bioinformatics

June 2011! 8

36–61% (Pellequer et al., 1991). Analyzing the epitope regions in the Pellequer dataset reveals that almost all the hydrophobic amino acids are underrepresented, supporting the assumption that linear B-cell epitopes will occur in hydrophilic regions of the proteins.

An extensive study of linear B-cell epitope prediction methods was published by Blythe and Flower (Blythe and Flower, 2005). To test how well peaks in single amino acid scale propensity profiles are (significantly) associated with known linear epitope locations, 484 amino acid pro-pensities from the AAindex database (http://www.genome.ad.jp) (Kawashima and Kanehisa, 2000) were used. As test set they used 50 epitope-mapped proteins defined by polyclonal anti-bodies, which were the best non-redundant test set available. Blythe and Flower (Blythe and Flower, 2005) found, however, that even the predictions based on the most accurate amino acid scales were only marginally better than random, suggesting that more sophisticated ap-proaches is needed to predict the linear epitopes. BepiPred (Larsen et al., 2006), an algorithm that combines scores from the Parker hydrophilicity scale (Parker et al., 1986) and a PSSM trained on linear epitopes, shows a small, but significant, increase in AUC over earlier scale-based methods. The sequence parametrizer algorithm (Sollner and Mayer, 2006; Sollner, 2006), along with its associated machine-learning methods uses the common single amino acid propensity scales, but also incorporates neighborhood parameters reflecting the probability that a given stretch of amino acids exists within a predefined proximity of a specific amino acid residue. Training and testing on epitope sequences pulled from a high-quality proprietary data-base, as well as several publicly accessible databases, yields a degree of accuracy that is greatly increased over single-parameter methods.

Different experimental techniques can be used to define conformational epitopes. Probably the most accurate, and easily defined is using the solved structures of antibody–antigen com-plexes (Fleury et al., 2000; Mirza et al., 2000). The amount of this kind of data is unfortu-nately still scarce, compared to linear epitopes. Furthermore, very few antigens have been stud-ied in a way where all possible epitopes on a given antigen has been identified. Unidentified epitopes within the dataset will lower the apparent performance of an accurate prediction method by increasing the apparent false positive rate.

The simplest way to predict the possible epitopes in a protein of known 3D structure is to use the knowledge of surface accessibility (Novotny et al., 1986; Thornton et al., 1986). Two newer methods using protein structure and surface exposure for prediction of B-cell epitopes have been developed. The CEP method (Kulkarni-Kale et al., 2005) calculates the relative accessi-ble surface area for each residue in the structure. Then it is determined which parts of the pro-tein that are exposed enough to be antigenic determinants. Regions that are distant in the primary sequence, but close in three-dimensional space are considered as one epitope. The tool was tested on a dataset of 63 antigen–antibody complexes and the algorithm correctly identi-fied 76% of the epitope residues. DiscoTope (Haste Andersen et al., 2006) uses a combination of amino acid statistics, spatial information and surface exposure. It is trained on a compiled

Immunological Bioinformatics

June 2011! 9

dataset of discontinuous epitopes from 76 X-ray structures of antibody–antigen protein com-plexes. This method outperforms methods that predict linear epitopes but needs an experi-mentally solved structure. A more recent approach, PEPITO (now BEpro). (Sweredoski and Baldi, 2008) uses both contact numbers (i.e. the number of C atoms within a certain distance threshold) and an amino-acid propensity scale. BEpro, attempts to overcome some of the limi-tations of previous predictors by incorporating an aminoacid propensity scale along with side chain orientation and solvent accessibility information using half sphere exposure values (Hamelryck, 2005). BEpro uses propensity scales and half sphere exposure values at multiple distance thresholds from the target residue, which seems to increase the robustness and reaches the to date best prediction performances (Sweredoski and Baldi, 2008).

Recently a workshop was held on the subject of B-cell epitope predictions attended by a broad range of the current method developers. The workshop resulted in a published review contain-ing conclusions on the present common ground, and suggestions for the future especially con-cerning coordination and evaluation (Greenbaum et al., 2007). Different ways of measuring the accuracy of B-cell epitope predictions have been suggested (Hopp, 1994; van Regenmortel and Pellequer, 1994). Pellequer suggested using the specificity as a measure of accuracy, while Hopp suggested using the PPV, but, as described earlier, neither measure will alone give a good description of the performance. In accordance to this the recent workshop concluded that the AUC measure is to be preferred (Greenbaum et al., 2007). Another issue is whether to make the statistics on a per-residue or on a per-epitope basis. However, as the latter have the addi-tional complications of defining how much of an epitope that must be included in a prediction to be considered correct, and how much extra included residues is allowed, the per residue measure is to be preferred.

Immunological Bioinformatics

June 2011! 10

Prediction of MHC binding.

M H C C L A S S I B I N D I N G P R E D I C T I O N SThe identification of T cell epitopes in a set of peptides can be facilitated by predicting their capability to bind MHC molecules, as MHC binding of a peptide is a necessary requirement for its recognition by T cells. The majority of peptides binding to MHC class I molecules have a length of 8–10 amino acids. Position 2 and the C-terminal position of the peptide are typi-cally most important for the binding, and these positions are referred to as anchor positions (Rammensee et al., 1999). Only few different amino acids are tolerated for MHC binding at the anchor positions. For some alleles, the binding motifs further have auxiliary anchor positions. For example, peptides binding to the human HLA-A*0101 allele also have positions 3 as anchor (Kondo et al., 1997; Kubo et al., 1994; Rammensee et al., 1999). The importance of anchor posi-tions for peptide binding and the allele-specific amino acid preference at the anchor positions was first described by (Falk et al., 1990). The discovery of such allele-specific motifs led to the development of the first algorithms (Pamer et al., 1991; Rotzschke et al., 1991; Sette et al., 1989), which essentially determined if a peptide did or did not match the binding ‘motif ’ of the MHC molecule.

Matrix prediction algorithms go beyond the match/mismatch classification of a motif predic-tion by assigning a score to each possible amino acid at each position in a peptide. It is as-sumed that the amino acids at each position along the peptide sequence contribute a given binding energy, which can independently be added up to yield the overall binding energy of the peptide Meister 1995; Parker 1994; Stryhn 1996}. This type of approach is used by the EpiMa-trix (Schafer et al., 1998), BIMAS (Parker et al., 1994), SYFPEITHI (Rammensee et al., 1997), RANKPEP (Reche et al., 2002), Gibbs sampler (Nielsen et al., 2004), SMM (Peters and Sette, 2005) and ARB methods (Bui et al., 2005). These methods differ in how they determine the matrix coefficients. Some are trained by statistical methods that analyze how often a given amino acid is seen at a given position in binding versus non-binding peptides. Matrix coeffi-cients can also be determined by a machine learning procedure that aims at finding coefficients that best explains the observed binding data. This can be done by interpreting the matrix scores for a peptide as predicted binding affinities, and minimizing the distance between pre-dicted and measured values. This is the approach utilized by SMM, which is presently the best performing matrix method (Lin et al., 2008; Peters et al., 2006).

Matrix-based methods cannot take correlated effects into account, i.e. where the contribution to the binding affinity of an amino acid at one position depends on amino acids at other posi-tions in the peptide. Higher order methods like Artificial Neural networks (ANNs) and Sup-port Vector Machines (SVMs) are ideally suited to take such correlations into account (Adams and Koziol, 1995; Brusic et al., 1994; Buus et al., 2003; Gulukota et al., 1997; Nielsen et al., 2003). These methods can be trained with data either in the format of binder/non-binder clas-

Immunological Bioinformatics

June 2011! 11

sification, or as real affinity data. In an ANN, information about the amino acid sequence is converted into a numerical representation. Twenty input neurons representing the 20 different amino acids typically found in proteins are normally used for each position in the peptide. The neuron defined to represent the amino acid found on that position in the given peptide is as-signed the value 1 and the other 19 neurons are assigned the value zero.

An alternative approach to represent amino acids as vectors is to take the 20 scores from a substitution matrices such as PAM or BLOSUM (Henikoff and Henikoff, 1992; Henikoff and Henikoff, 1993; Jones et al., 1992). The parameters of the neural network are optimized to minimize the difference between the predicted output and the expected output which can be “0” or “1” in the case of a binder/non-binder classification, or the actual affinity if training is done on real value affinity data. In the ANNs typically used in bioinformatics, a so called “hid-den layer” is inserted between the input and the output layer. The input to the hidden layer is a weighted sum of the output from the input layer, and the input to the final output layer is a weighted sum of the outputs from the hidden layer. The parameters of an ANN are the weights used to calculate these sums (Baldi and Brunak, 2001). The NetMHC ANN imple-mentation based on (Nielsen et al., 2003) and (Buus et al., 2003) was the most successful higher order method in two recent comparisons (Lin et al., 2008; Peters et al., 2006).

Beside the sequence based machine-learning techniques, empirical molecular force field model-ing techniques (Logean et al., 2001) and 3D Quantitative Structure–Activity Relationship (3D-QSAR) (Doytchinova and Flower, 2002; Zhihua et al., 2004) analysis have been applied to predict peptide:MHC binding. These approaches are based on small molecule docking proce-dures, taking advantage of 3d structural data for MHC molecules. They directly calculate en-ergy terms for peptide-MHC interactions. As these calculations are very resource intensive, none of them is directly available online. This makes it difficult to compare their performance with the peptide sequence binding data based models. However in-house benchmark experi-ments performed by our group and the group hosting the IEDB database strongly suggest that the structure-based prediction approach is not suitable for MHC.peptide binding predictions and that the predictive performance of structure-based method is significantly below that of the data-driven methods (unpublished data).

To summarize: of the prediction methods publicly available online, the neural network based NetMHC performs best, followed by the matrix based SMM (Lin et al., 2008; Peters et al., 2006). Factors apart from performance that justify using SMM are a) that matrix predictions are inherently easier to interpret compared to an ANN which is essentially a black box, and b) that the SMM training and prediction code is freely available (Peters and Sette, 2005). The implementation of online consensus prediction tools that combine different prediction meth-ods (Moutaftsi et al., 2006) is currently in progress at the IEDB site.

The prediction performance of a given method is largely dependent on the size of the available training data set. The performance differences due to training set size are much larger than

Immunological Bioinformatics

June 2011! 12

those due to the choice of prediction method. (Peters et al., 2006; Yu et al., 2002). Unfortu-nately, for MHC molecules with little binding data available, it is not possible to assess how good the current binding predictions are. The sizes of training sets and their corresponding prediction performance are listed in (Peters et al., 2006).

The output of any prediction method will be a list of peptides associated with a binding affin-ity prediction. There are two approaches to select a reasonable set of T cell epitope candidates from this peptide set. First, one can select based on the actual affinities, and set a cutoff of 50, 500 or 5000 nM which are generally considered sensible for high, intermediate and low affinity. Alternatively, one can rank the predicted score, and set a cutoff of the top 0.1%, 1% or 10% of the predicted peptides. The amount of resources available to test the predicted candidates will often in practice determine the cutoffs chosen.

M H C C L A S S I I B I N D I N G P R E D I C T I O N SUnlike MHC class I molecules, the binding cleft of MHC class II molecules is open-ended, which allows for the bound peptide to have significant overhangs at both ends. As a result MHC class II binding peptides have a broader length distribution, and the typical binding peptides have a length of eleven to twenty residues. However, the majority of the binding in-teraction is mediated by a 9 amino acid residue core sequence of the bound peptide. This complicates binding predictions, as the identification of the correct alignment of the binding core is a crucial part of identifying the MHC class II binding motif (Nielsen et al., 2004; Wang et al., 2008). Identifying this core is difficult, as the MHC class II binding motifs have relatively weak and often degenerate sequence signals.

The majority of MHC class II binding prediction methods are based on the assumption that the peptide–MHC binding affinity is determined solely from a nine amino acids in binding core motif. This is clearly an oversimplification, since it is known that peptide flanking resi-dues on both sides of the binding core may contribute to the binding affinity and stability (Godkin et al., 2001). Some methods for MHC class II binding have attempted to include pep-tide flanking residues indirectly, in terms of the peptide length, in the prediction of binding affinities (Chang et al., 2006). Later it was demonstrated that including peptide flanking resi-dues in MHC class II predictions does improve the prediction accuracy (Nielsen et al., 2007).

Most of the methods for MHC class II binding predictions have been trained and evaluated on very limited datasets covering only a single or a few different MHC class II alleles, making it very difficult to compare the different performance values and establish generality of the methods. A recent large scale comparison of prediction methods for MHC class II binding (Wang et al., 2008) covered 14 HLA-DR (human MHC) and two mouse class II alleles. The best performing methods are described in (Nielsen et al., 2007) and (Sturniolo et al., 1999).

Immunological Bioinformatics

June 2011! 13

MHC polymorphism, supertype clustering, and panPredic-tions.

One potential drawback of epitope-based vaccines is that the HLA genes are extremely poly-morphic. Each of the corresponding molecules has a different specificity. If a vaccine needs to contain a unique peptide for each of these molecules it will need to comprise hundreds of pep-tides. One way to counter this is to select sets of a few HLA molecules that together have a broad distribution in the human population. A factor that may reduce the number of epitopes necessary to include in a vaccine is that many of the different HLA molecules have similar specificities. The different HLA molecules have been grouped together in so-called supertypes (del Guercio, 1995; Sette and Sidney, 1999; Sidney et al., 1995). This means ideally that if a pep-tide can bind to one allele within a supertype, it can bind to all alleles within that supertype. In practice, however, only some peptides that bind to one allele in a supertype will bind to all al-leles within that supertype. A number of different criteria have been used to define these su-pertypes, including structural similarities, shared peptide binding motifs, identification of cross reacting peptides, and ability to generate methods that can predict cross binding pep-tides (Sidney et al., 1996). For HLA class I molecules, (Sette and Sidney, 1999) defined nine supertypes (A1, A2, A3, A24, B7, B27, B44, B58, B62), which were reported to cover most of the HLA-A and -B polymorphism. They argued that the different alleles within each of these su-pertypes have almost identical peptide binding specificity. They found that while the frequen-cies at which the different alleles were found in different ethnic groups were very different, the frequencies of the supertypes were quite constant. Assuming Hardy-Weinberg equilibrium (i.e., infinitely large, random mating populations free from outside evolutionary forces), they found that >99.6% of persons in all ethnic groups surveyed possessed at least one allele within at least one of these supertypes. They also showed that the smaller collections of supertypes (A2, A3, B7) and (A1, A2, A3, B7, A24, B44) covered in the range of 83.0– 88.5% and 98.1–100.0% of persons in different ethnic groups respectively. Three alleles, A29, B8 and B46, were found to be outliers with a different binding specificity than any of the supertypes. These may define supertypes themselves when the specificity of more HLA molecules is known.

(Lund et al., 2004) presented a measure for the difference in the specificities of different HLA molecules and used it to cluster HLA molecules. In order to define a clustering of HLA mole-cules, we first calculated difference in specificities (the distance) between each pair of HLA molecules. The distance dij between two HLA molecules (i, j) was calculated as the sum over each position in the two motifs of one minus the normalized vector products of the amino ac-ids frequency vectors (Lyngsø et al., 1999):

dji =�

p

�1−

pip · pj

p

|pip||pj

p|

Immunological Bioinformatics

June 2011! 14

The result of this analysis was used to revise the class I supertypes and add additional 3 super-types (A26, B8, and B39). Recently (Sidney et al., 2008) analyzed the complete list of alleles available through IMGT (release 2.9) using a simple approach largely based on compilation of published motifs, binding data, and analyses of shared repertoires of binding peptides, in con-junction with clustering based on the primary sequence of the B and F peptide binding pock-ets. On this basis they updated the supertype assignments, with new assignments for about 700 different HLA-A and -B alleles.

(Gulukota and DeLisi, 1996) compiled lists with three, four, and five alleles that give the maximal coverage of different ethnic groups. One complication they had to deal with is that HLA alleles are in linkage disequilibrium; i.e., that the joint probability of an allelic pair may not be equal to the product of their individual frequencies [P (a)P (b) �= P (ab)]. This means that it is not necessarily optimal to choose the alleles with the highest individual frequencies. Moreover, (Gulukota and DeLisi, 1996) find that populations like the Japanese, Chinese and Thais can be covered by fewer alleles than the North American Black population, which turns out to be very diverse. Thus, different alleles should be targeted in order to make vaccines for different ethnic groups or geographic regions. This observation combined with the fact that a binder to a given supertype representative does not necessary bind to a majority of the super-type representatives strongly reduce the assumed population coverage of a given set of epi-topes.

Lately, the number of available binding data has increased significantly both in terms of amount of different peptides and the number of different MHC alleles. This fact has in itself influenced the MHC prediction systems that now cover a large number of different HLA al-leles (Peters et al., 2006; Zhang et al., 2008). More interestingly, however, it has enabled the development of new so called pan-specific algorithms that can predict peptide binding to HLA alleles for which limited or even no experimental data are available (Jacob and Vert, 2007; Jojic et al., 2006; Nielsen et al., 2007; Nielsen et al., 2008; Zhang et al., 2005). A common feature of these pan-specific methods is that they go beyond the conventional single allele approach, and take both the peptide sequence and the MHC contact environment into account. Large-scale benchmark evaluation of the pan-specific methods has demonstrated their power not only for predicting peptide-binding affinities to previously un-characterized MHC molecules but also to boost the predictive performance for characterized alleles by leveraging information from neighboring MHC molecules (Zhang et al., 2009).

Immunological Bioinformatics

June 2011! 15

Prediction of the CTL presenting pathway and integration of prediction systems

The below text is adapted from (Lundegaard et al., 2007).

MHC class I binding is a necessary but not sufficient requirement for a peptide to be recog-nized by CD8 T cells. Another major requirement is the efficient generation of the peptide from its source antigen, which includes proteasomal cleavage in the cytosol and transport into the Endoplasmic reticulum (ER) by the Transporter associated with Antigen Processing (TAP). Successful prediction of these processes should therefore improve the prediction of naturally processed MHC ligands and T cell epitopes.

The proteasome has a complex enzymatic specificity, which makes the prediction of its cleav-ages challenging. It comprises multiple catalytically active sites, each with a distinct specificity (Kloetzel, 2004). It is thus expected that the accuracy for prediction of proteasomal activity will be relatively low compared to that of methods for MHC peptide binding. A further com-plication is that two versions of the proteasome exist. The “normal” proteasome is called the constitutively expressed and participates in the constant turnover of proteins within cells. When a cell is alerted that an infection is present some of the newly synthesized so called ‘immunoproteasome’ have different subunits, which are thought to cleave fragments that are more suited for producing epitopes.

Several prediction algorithms for proteasomal cleavage are available. FragPredict, which is pub-licly available as a part of MAPPP service (http://www.mpiib-berlin.mpg.de/MAPPP/), consists of two algorithms. The first algorithm uses a statistical analysis of cleavage-enhancing and -in-hibiting amino acid motifs to predict potential proteasomal cleavage sites (Holzhutter et al., 1999). The second algorithm, which uses the results of the first algorithm as an input, predicts which fragments are most likely to be generated. This model takes the time-dependent degra-dation into account based on a kinetic model of the 20S proteasome (Holzhutter and Kloet-zel, 2000).

PAProC (http://www.paproc.de) is a method for predicting cleavages by human as well as wild type and mutant yeast proteasomes. The influence of different amino acids at different posi-tions is determined by using a stochastic hill-climbing algorithm (Kuttler et al., 2000) based on the experimentally in vitro verified cleavage and non-cleavage sites (Nussbaum et al., 2001). (Tenzer et al., 2004) have also published a weight matrix based method for prediction of both constitutive- and immunoproteasomal cleavage specificity. Both matrices are trained on in the very limited in vitro proteasomal digest data available.

The NetChop (Kesmir et al., 2002) method uses the C terminus of naturally processed MHC class I ligands, which are thought to be created by proteasomal cleavage. Since some of these

Immunological Bioinformatics

June 2011! 16

ligands are generated by the immunoproteasome, and some by the constitutive proteasome, such a method should predict the combined specificity of both forms of proteasomes. In 2003, NetChop-2.0 was evaluated to be the best-performing predictor on an independent evaluation set (Saxová et al., 2003). The SVM based Pcleavage proteasomal cleavage predictor, which is available online, has a published performance comparable to that of NetChop-2.0 (Bhasin and Raghava, 2005). An update of the NetChop method to version 3.0 (Nielsen et al., 2005) con-sists of a combination of several ANNs, each trained using a different sequence-encoding scheme of the data. NetChop 3.0 has an increase in the prediction sensitivity as compared to NetChop-2.0, without lowering the specificity, and is probably still the best predictor of pro-teasomal cleavage.

Relatively few methods have been developed to predict the specificity of TAP. (Daniel et al., 1998) have developed ANNs using 9-mer peptides for which the TAP affinity was determined experimentally. Surprisingly, they found that some MHC alleles such as alleles blonging to the HLA-A*02 family have some natural ligands with very low TAP affinities. It has been shown that TAP ligands can be trimmed in the ER before binding to MHC molecules (Fruci et al., 2001), and the TAP ligand thus often will be a precursor to the MHC binding peptide. (Peters et al., 2003) used an SMM based matrix to predict TAP affinity of peptides. This method can be generalized to peptides that are longer than 9 amino acids by assuming that only the first three positions in the N-terminal and the last position at the C-terminal influence TAP bind-ing. The method had been tested on a large data set and been shown to have a high accuracy. The selective influence of TAP binding in the epitope presentation pathway is much lower than MHC binding and the AUC value when this method is used alone as an epitope predictor of 0.79 is thus significantly lower than most MHC-binding prediction methods.

With increasing numbers of TAP ligands being publically available (e.g. Jen-Pep database, http://www.jenner.ac.uk/antijen/aj_tap.htm) (Blythe et al., 2002), it is likely that more accurate TAP predictions will be available in the future.

I N T E G R A T E D T - C E L L E P I T O P E P R E D I C T I O N SReliable predictions of immunogenic peptides can reduce the experimental effort needed to identify new epitopes. Although predictions of MHC binding alone can be used to rank the possible epitopes very accurately, even better predictions should be possible if molecular events other than peptide:MHC binding in the antigen presentation pathway are modeled. Accordingly, many attempts have been made to predict the outcome of the steps involved in antigen presentation: MAPP (Hakenberg et al., 2003), NetCTL (Larsen et al., 2005), MHCpathway (Tenzer et al., 2005), EpiJen (Doytchinova et al., 2006), and WAPP (Donnes and Kohlbacher, 2005). All of these methods attempt to predict antigen presentation by inte-grating peptide:MHC binding predictions with one or more of the other events involved in the antigen presentation pathway.

Immunological Bioinformatics

June 2011! 17

To benchmark these prediction methods, a set of verified epitopes can be used as the positive data set. On the other hand, negative data set (peptides that cannot induce an immunologic response) is difficult to define because it is impossible to guarantee that a peptide will never be an epitope in any persons with a given HLA haplotype. Instead, epitopes from well-studied pathogens (e. g. HIV) are often used as the positive set, and all other peptides from the ge-nome of the same pathogen that have never been shown to be an epitope are assumed negative as they have a very low probability of being an epitope. Running a large-scale benchmark calcu-lation comparing the predictive performance of several publicly available MHC-I presentation prediction methods evaluated on a large set of known HIV epitopes (http://www.cbs.dtu.dk/suppl/immunology/CTL-1.2/HIV_dataset) reveals that the NetCTL and MHCpathway methods have the highest predictive performance with >75% of the epitopes being within the top 5% peptides with the highest prediction scores (Larsen et al., 2007).

Immunological Bioinformatics

June 2011! 18

Epitope discovery and rational vaccine design.

To exemplify the use of bioinformatics for epitope discovery in specific cases we here present a number of published works concerning selected pathogens of special concern.

S A R SThe following text have been adapted from (Sylvester-Hvid et al., 2004)

Severe Acute Respiratory Syndrome (SARS) during 2003 infected more than 8400 patients in over 30 countries and caused more than 800 deaths. Coordinated by the WHO, classical measures including case detection, isolation, infection control, contact tracing, and follow-up surveillance successfully contained the disease. However, ideally, the SARS-coronavirus (SARS-CoV) should be possible detect fast and contained by vaccination programs. To develop both vaccines and diagnostic tools in the timeframe of a new aggressive and previously unknown pathogen, the use of bioinformatical tools to shortcut and speed up experimental will be abso-lutely neccesary.

The SARS-CoV infects epithelial cells in the respiratory tract causing interstitial pneumonia (Holmes, 2003). One would therefore expect that an effective vaccine should induce mucosal immunity such as that effected by secretory immunoglobulin A (IgA), which specifically pre-vents an infectious agent from penetrating the mucosal epithelium, and by cytotoxic T lym-phocytes (CTLs), which specifically eradicate infected cells (Eriksson and Holmgren, 2002). IgA responses are generally considered the major protective mechanism; however, there are examples of CTLs, not antibodies, being responsible for early control of mucosal infection (Eriksson and Holmgren, 2002). Particularly noteworthy, this is the case for the infectious bronchitis virus of chicks, a prototype of the Coronaviridae family, where primary effector CD8+ CTLs play a critical role in the elimination of virus during acute infection and subse-quent control of the infection (Collisson et al., 2000; Pei et al., 2003; Seo and Collisson, 1997; Seo and Collisson, 1998; Seo et al., 2000).

The complete SARS genome/proteome was obtained from Gen-Bank (NC004718) and virtu-ally digested into all 9862 unique nonamer peptides (Marra et al., 2003). Thus, close to 10,000 binding predictions were made for each of the nine HLA supertypes. Artificial neural networks (ANNs) were used to predict the binding affinity quantitatively when the corresponding data were available [e.g. for A*0201 (Buus et al., 2003; Nielsen et al., 2003)]. The performance of the ANNs is high, as the correlation coefficient between predicted and measured binding is 0.85. The remaining HLA bindings were predicted using weight matrices derived from Gibbs sampling sequence-weighting methods with pseudocount correction for low counts as well as differential position-specific anchor weighting (Nielsen et al., 2004). These weight matrices were calculated from available nonamer data from the SYFPEITHI and MHCPEP databases

Immunological Bioinformatics

June 2011! 19

with the peptides clustered into the nine supertypes (A1, A2, A3, A24, B7, B27, B44, B58, and B62). The positive predictive value of the matrix-driven prediction has been found to be around 66%, whereas the negative predictive value has been found to be around 97% (Lam-berth et al. unpublished observation). Proteasomal processing was predicted using NETCHOP 2.0 (Kesmir et al., 2002). NETCHOP 2.0 has been found to be superior to other proteasomal prediction algorithms (Saxová et al., 2003). Peptides with a NETCHOP 2.0 score below 0.5 (i.e. poorly predicted proteasomal processing) were excluded from further analysis. Finally, we excluded all peptides that did not represent epitopes conserved in all SARS isolates. Figure 1 shows a representative example for a member of the HLA-A3 supertype, the HLA-A*1101 (this haplotype is particularly common in Southeast Asia).

The selected peptides were biochemically tested for binding and more than 70% of the se-lected peptides turned out to actually bind to the specific HLA molecule. These peptides were never tested for immunogenicity because of lack of patient material.

I N F L U E N Z AAs an example the use of combined knowledge and bioinformatics we have adapted the pub-lished results of (Wang et al., 2007):

Influenza is a highly contagious, airborne respiratory tract infection associated with a signifi-cant disease burden. The annual “mild” influenza epidemics caused by antigenic drift of the virus affects 10–20% of the world’s population with up to 5 million cases of serious illness and 500,000 deaths (http://www.who.int/vaccineresearch/diseases/ari/en). At various intervals, new influenza subtypes emerge against which no immunity exists in the human population and these may cause global pandemics with an even higher disease toll. The current outbreak of a new influenza subtype A (H5N1), which can be directly, although at this time rarely, transmit-ted from birds to humans (Chan, 2002), is an example of a potential pandemic flu threat, which has currently reached phase 3 of 6 in theWHOdelineation of a flu pandemic (http://www.who.int/csr/disease/avian influenza/phase/en/index.html). It is highly pathogenic in birds, and when transmitted to humans it carries a mortality of over 50% (Beigel et al., 2005). It is the largest and most severe influenza epidemic ever registered among birds; since 2003, it has spread rapidly to poultry in many countries It is known that CD8+ T cell responses also play a major role in the control of primary influenza virus infection (Bender et al., 1992; McMichael et al., 1983). In mice, CTLs against conserved epitopes contribute to protective immunity against influenza viruses of various subtypes (Kuwano et al., 1988; Ulmer et al., 1993). Identification of CTL epitopes, especially conserved epitopes shared by multiple viral strains, might therefore be a robust vaccine strategy against emerging influenza epidemics.

The NetCTL algorithm were used to select among highly conserved epitopes encompassing 12 different HLA class I supertypes with >99% population coverage, and searched for conserved epitopes from available influenza A H1N1 viral protein sequences. Peptides corresponding to

Immunological Bioinformatics

June 2011! 20

167 predicted peptide–HLA-I interactions were synthesized, tested for peptide–HLA-I inter-actions in a biochemical assay and for influenza-specific, HLA-I-restricted CTL responses in an IFN- ELISPOT assay. Eighty-nine peptides could be confirmed as HLA-I binders, and 13 could be confirmed as CTL targets. The 13 epitopes, were highly conserved among human in-fluenza A pathogens, and 12 of these epitopes were present in the emerging H5N1 bird flu iso-lates (Wang et al., 2007).

H I VThe below text is an abstract of (Lund et al., 2008).

The immense genetic variation of the host cellular immune system, HLA class I, forms another major obstacle for CTL-based vaccine approaches. Few human beings will share the same set of HLA alleles and most humans will react to a pathogen infection in a unique manner. Re-cently, combining HLA class I molecules with significant overlap among the peptide-binding specificities has allowed a functional classification of HLA supertypes. The HLA supertypes together cover most of the HLA-A and HLA-B alleles in the population, thus allowing for large cohort studies investigating the efficacy of specific immune responses in relation to disease progression. The design of a vaccine based on such knowledge would increase the probability of eliciting broad CTL responses. A method to optimally cover a large majority of a highly di-verse population were developed and used for selection of potential HIV epitopes (Perez et al., 2008) . The developed algorithm, EpiSelect, were used to select a number of epitopes, which together constitute a broad coverage of the HIV-1 strains in subjects of diverse ethnical background. The EpiSelect algorithm aims at selecting a given number of predicted epitopes so that most strains are covered with the largest number of epitopes, while maximizing the cover-age of the viral strain with the smallest number of epitopes. The NetCTL method (Larsen et al., 2005) was used to select a set of potential HIV epitopes restricted to the seven major HLA supertypes A1, A2, A3, A24, B7, B44, and B58. The CTL epitopes were selected from a database of HIV-1 genomic strains covering the subtypes A (A1 and A2), B, C, D, and CRF01_AE. EpiSelect was then applied to select, for each HLA supertype, a set of potential epitopes with highest possible genomic coverage. A total of 174 predicted HLA class I supertype restricted epitopes were selected from the Gag, Pol, Env, Nef, and Tat regions with maximal genomic coverage of the major HIV-1 subtypes (Figure 1) and combined with 10 experimentally verified HLA class I restricted epitopes.

Immunological Bioinformatics

June 2011! 21

HIV-1genome

3' LTRol

vifgag5' LTR

vpr

gag

vpu env

polgag5' LTR 3' LTR

tatrev

vif

nef

Tat

11

11022

Supertype Gag Pol Env Nef Total

86666511

910978910

37610885

4431312

25302525252530

HLA-A1HLA-A2HLA-A3HLA-A24HLA- B7HLA- B44HLA- B58

31312

A

B

Gag Pol Nef TatEnv0

25

50

75

100

% r

eco

gn

ized

pep

tid

es

A1 A2 A3 A24 B7 B44 B580

10

20

30

40

50

60

70

80

% r

eco

gn

ized

pep

tid

es

Predicted HLA supertype restriction

C

Figure 2 P Immunogenicity of HLA class I supertype-restricted epitopes. The selected epitopes within the HIV regions Gag, Pol, Env, Nef, and Tatwere restricted to the most common HLA class I supertypes to give optimal coverage in the human population (A). The majority of the epitopes were

recognized by HIV-specific T cells (B). Recognized epitopes were equally distributed among the seven HLA class I supertypes (C).

selected peptides or to the positive HIV control of overlappingGag peptides. We subsequently ran the EpiSelect algorithm(illustrated in Figure 1) a second time to select a set of experi-mentally-verified epitopes that gave a good coverage of thecohort of patients (Figure 3). First the epitope FLGKIWPSHKwas selected since this induced a response in the most patients,namely 10 patients (5, 7, 8, 10, 13, 18, 21, 22, 30, and 31).Then the epitope AFDLSFFLK was selected because this epi-tope gave an optimal coverage in the patients not covered bythe first epitope. This second epitope covered the five patients:3, 6, 9, 19, and 23 (in addition to the two patients: 13 and 30,which were already covered by the first epitope). After select-ing only nine peptides this way, all 30 HIV responsive patientswere included. That so few epitopes can cover both the vari-

ability in what pathogens look like (strains) as well as whatindividual immune system looks for (HLA variability) is verypromising for an attempt to develop vaccines as well as diag-nostic and research tools that have to deal with this variability.

Future DirectionsThe potency of an HIV-specific T cell response is dependent onthe complex interactions between viral evolution causingimmune escape, the fitness cost of the mutation, and recogni-tion of escape variants. To facilitate detailed characterizationsof these interactions, we have successfully identified a numberof CTL epitopes that induce a response in subjects with diverse

ASHI Quarterly Third Quarter 200864

S C I E N T I F I C C O M M U N I C A T I O N S

Figure 1: Immunogenicity of HLA class I supertype-restricted epitopes. The selected epitopes within the HIV regions Gag, Pol, Env, Nef, and Tat were restricted to the most common HLA class I supertypes to

give optimal coverage in the human population (Lund et al., 2008).

The test subjects were a genetically diverse population composed of 16 men and 15 women originating from 13 countries and infected with several different HIV-1 subtypes and CRFs. Detection of IFN-γ production in the Enzyme Linked Immunospot Assay (ELISPOT) Assay using peripheral blood mononuclear cells were used to identify HIV-specific T cell responses against the 184 selected HLA class I supertype restricted epitopes. Responses against 114 (62 percent) of the tested epitopes were detected in at least one study subject, of which 45 were novel epitopes not previously defined. The majority of the epitopes in HIV-Gag, -Pol, -Env, and –Nef were immunogenic in at least one of the study subjects, while only 38 percent of the tested epitopes in HIV-Tat induced a T cell response.

T H E N E W A G E O F V A C C I N E D E V E L O P M E N TAfter the Second World War the ability to make cell cultures, i.e., the ability to grow cells from higher organisms such as vertebrates in the laboratory, made it easier to develop new vaccines, and the number of pathogens for which vaccines can be made have almost doubled. Before that time many vaccines, were grown in chicken embryo cells (from eggs), and even today many vaccines such as the influenza vaccine, are still produced in eggs, but alternatives are being in-vestigated (Mabrouk and Ellis, 2002). Vaccines have been made for only 34 of the more than 400 known pathogens that are harmful to man. Some of the vaccines, however, are not used to their full potential. It is estimated that immunization saves the lives of 3 million children each year, but that 2 million more lives could be saved if existing vaccines were applied on a full-

Immunological Bioinformatics

June 2011! 22

scale worldwide (André, 2003). To solve the problems of the slow and many times unreliable traditional vaccine development protocols, recombinant vaccines using selected proteins (subunit vaccines ) are now emerging e.g., for TB (Andersen and Doherty, 2005), and HPV (Lowy and Schiller, 2006; Lorenzen et al., 1998).

For veterinary vaccines DNA constructs have been developed that produces, often a single, strongly immunogenic protein intracellularly, and this way co induces the CTLs (Davis et al., 2001) . Subunit vaccines uses naturally protein-based parts of one or more pathogenic strains. These methods will often only induce an antibody response and no CTLs. However, a totally artificial construct of modified proteins inducing cross strain protective antibodies and a CTL response using e.g., DNA vaccines at the same time have not yet appeared but will be the next logical step. For this purpose rational selection of CTL multi epitope encoding constructs (polytopes) will be an additional bioinformatical challenge as further described in (Lund et al., 2005).

Immunological Bioinformatics

June 2011! 23

Bibliography

Abele, R., and Tampé, R. (2004). The ABCs of Immunol-ogy: Structure and Function of TAP, the Transporter Asso-ciated with Antigen Processing. Physiology 19, 216-224.

Adams, H.P., and Koziol, J.A. (1995). Prediction of binding to MHC class I molecules. J Immunol Methods 185, 181-190.

Alix, A.J. (1999). Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine 18, 311-14.

Andersen, P., and Doherty, T.M. (2005). The success and failure of BCG - implications for a novel tuberculosis vac-cine. Nat Rev Microbiol 3, 656-662.

Anderson, K., Cresswell, P., Gammon, M., Hermes, J., Wil-liamson, A., and Zweerink, H. (1991). Endogenously synthe-sized peptide with an endoplasmic reticulum signal se-quence sensitizes antigen processing mutant cells to class I-restricted cell-mediated lysis. J Exp Med 174, 489-492.

André, F.E. (2003). Vaccinology: past achievements, present roadblocks and future promises. Vaccine 21, 593-95.

Bacik, I., Cox, J.H., Anderson, R., Yewdell, J.W., and Ben-nink, J.R. (1994). TAP (transporter associated with antigen processing)-independent presentation of endogenously syn-thesized peptides is enhanced by endoplasmic reticulum insertion sequences located at the amino- but not carboxyl-terminus of the peptide. J Immunol 152, 381-87.

Baldi, P., and Brunak, S. (2001). Bioinformatics: The Ma-chine Learning Approach, 2nd edition (Cambridge, Mass.: MIT Press).

Beigel, J.H., Farrar, J., Han, A.M., Hayden, F.G., Hyer, R., de Jong, M.D., Lochindarat, S., Nguyen, T.K., Nguyen, T.H., et al. (2005). Avian influenza A (H5N1) infection in humans. N Engl J Med 353, 1374-385.

Bender, B.S., Croghan, T., Zhang, L., and Small, P.A. (1992). Transgenic mice lacking class I major histocompatibility complex-restricted T cells have delayed viral clearance and increased mortality after influenza virus challenge. J Exp Med 175, 1143-45.

Bhasin, M., and Raghava, G.P.S. (2005). Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic se-quences. Nucleic Acids Research 33, W202-07.

Blythe, M.J., and Flower, D.R. (2005). Benchmarking B cell epitope prediction: underperformance of existing methods. Protein Sci 14, 246-48.

Blythe, M.J., Doytchinova, I.A., and Flower, D.R. (2002). JenPep: a database of quantitative functional peptide data for immunology. Bioinformatics 18, 434-39.

Brusic, V., Rudy, G., and Harrison, L.C. (1994). Prediction of MHC binding peptides using artificial neural networks. In Complex systems: mechanism of adaptation, A.Y.X. Stonier RJ, eds. (Amsterdam: IOS Press).

Bui, H.H., Sidney, J., Peters, B., Sathiamurthy, M., Sinichi, A., Purton, K.A., Mothe, B.R., Chisari, F.V., Watkins, D.I., and Sette, A. (2005). Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix ap-plications. Immunogenetics 57, 304-314.

Burton, D.R. (2002). Antibodies, Viruses and Vaccines. Na-ture Reviews Immunology 2, 706-713.

Buus, S., Lauemoller, S.L., Worning, P., Kesmir, C., Frimurer, T., Corbet, S., Fomsgaard, A., Hilden, J., Holm, A., and Brunak, S. (2003). Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach. Tissue Antigens 62, 378-384.

Chan, P.K.S. (2002). Outbreak of avian influenza A (H5N1) virus infection in Hong Kong in 1997. Clinical Infectious Diseases 34, 58-64.

Chang, S.T., Ghosh, D., Kirschner, D.E., and Linderman, J.J. (2006). Peptide length-based prediction of peptide-MHC class II binding. Bioinformatics 22, 2761-67.

Collisson, E.W., Pei, J., Dzielawa, J., and Seo, S.H. (2000). Cytotoxic T lymphocytes are critical in the control of infec-tious bronchitis virus in poultry. Dev Comp Immunol 24, 187-200.

Daniel, S., Brusic, V., Caillat-Zucman, S., Petrovsky, N., Harrison, L., Riganelli, D., Sinigaglia, F., Gallazzi, F., Ham-mer, J., and van Endert, P.M. (1998). Relationship between peptide selectivities of human transporters associated with antigen processing and HLA class I molecules. J Immunol 161, 617-624.

Davis, B.S., Chang, G.J., Cropp, B., Roehrig, J.T., Martin, D.A., Mitchell, C.J., Bowen, R., and Bunning, M.L. (2001). West Nile virus recombinant DNA vaccine protects mouse and horse from virus challenge and expresses in vitro a non-infectious recombinant antigen that can be used in enzyme-linked immunosorbent assays. J Virol 75, 4040-47.

Debelle, L., Wei, S.M., Jacob, M.P., Hornebeck, W., and Alix, A.J. (1992). Predictions of the secondary structure and antigenicity of human and bovine tropoelastins. Eur Bio-phys J 21, 321-29.

del Guercio, M.F. (1995). Binding of a peptide antigen to multiple HLA alleles allows definition of an A2-like super-type. The Journal of Immunology 154, 685-693.

Donnes, P., and Kohlbacher, O. (2005). Integrated modeling of the major events in the MHC class I antigen processing pathway. Protein Sci 14, 2132-140.

Immunological Bioinformatics

June 2011! 24

Doytchinova, I.A., and Flower, D.R. (2002). Physicochemi-cal explanation of peptide binding to HLA-A*0201 major histocompatibility complex: a three-dimensional quantita-tive structure-activity relationship study. Proteins 48, 505-518.

Doytchinova, I.A., Guan, P., and Flower, D.R. (2006). EpiJen: a server for multistep T cell epitope prediction. BMC Bioinformatics 7, 131.

Eisenlohr, L.C., Bacik, I., Bennink, J.R., Bernstein, K., and Yewdell, J.W. (1992). Expression of a membrane protease enhances presentation of endogenous antigens to MHC class I-restricted T lymphocytes. Cell 71, 963-972.

Eriksson, K., and Holmgren, J. (2002). Recent advances in mucosal vaccines and adjuvants. Curr Opin Immunol 14, 666-672.

Falk, K., Rotzschke, O., and Rammensee, H.G. (1990). Cel-lular peptide composition governed by major histocompati-bility complex class I molecules. Nature 348, 248-251.

Fleury, D., Daniels, R.S., Skehel, J.J., Knossow, M., and Bizebard, T. (2000). Structural evidence for recognition of a single epitope by two distinct antibodies. Proteins 40, 572-78.

Fruci, D., Niedermann, G., Butler, R.H., and van Endert, P.M. (2001). Efficient MHC class I-independent amino-terminal trimming of epitope precursor peptides in the endoplasmic reticulum. Immunity 15, 467-476.

Godkin, A.J., Smith, K.J., Willis, A., Tejada-Simon, M.V., Zhang, J., Elliott, T., and Hill, A.V. (2001). Naturally proc-essed HLA class II peptides reveal highly conserved immu-nogenic flanking region sequence preferences that reflect antigen processing rather than peptide-MHC interactions. J Immunol 166, 6720-27.

Greenbaum, J.A., Andersen, P.H., Blythe, M., Bui, H., Cachau, R.E., Crowe, J., Davies, M., Kolaskar, A.S., Lund, O., et al. (2007). Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. Journal of Molecular Recognition 20, 75-82.

Gulukota, K., and DeLisi, C. (1996). HLA allele selection for designing peptide vaccines. Genet Anal 13, 81-86.

Gulukota, K., Sidney, J., Sette, A., and DeLisi, C. (1997). Two complementary methods for predicting peptides binding major histocompatibility complex molecules. J. Mol.r Biol. 267, 1258--1267.

Hakenberg, J., Nussbaum, A.K., Schild, H., Rammensee, H.G., Kuttler, C., Holzhutter, H.G., Kloetzel, P.M., Kauf-mann, S.H., and Mollenkopf, H.J. (2003). MAPPP: MHC class I antigenic peptide processing prediction. Appl Bioin-formatics 2, 155-58.

Hamelryck, T. (2005). An amino acid has two sides: a new 2D measure provides a different view of solvent exposure. Proteins 59, 38-48.

Haste Andersen, P., Nielsen, M., and Lund, O. (2006). Pre-diction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci 15, 2558-567.

Henikoff, S., and Henikoff, J.G. (1992). Amino acid substitu-tion matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915-19.

Henikoff, S., and Henikoff, J.G. (1993). Performance evalua-tion of amino acid substitution matrices. Proteins 17, 49-61.

Holmes, K.V. (2003). SARS coronavirus: a new challenge for prevention and therapy. J Clin Invest 111, 1605-09.

Holzhutter, H.G., and Kloetzel, P.M. (2000). A kinetic model of vertebrate 20S proteasome accounting for the generation of major proteolytic fragments from oligomeric peptide substrates. Biophysical Journal 79, 1196-1205.

Holzhutter, H.G., Frommel, C., and Kloetzel, P.M. (1999). A theoretical approach towards the identification of cleavage-determining amino acid motifs of the 20 S proteasome. J. Mol. Biol. 286, 1251-265.

Hopp, T.P. (1994). Different views of protein antigenicity. Pept Res 7, 229-231.

Hopp, T.P., and Woods, K.R. (1981). Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci U S A 78, 3824-28.

Hopp, T.P., and Woods, K.R. (1983). A computer program for predicting protein antigenic determinants. Mol Immu-nol 20, 483-89.

Jacob, L., and Vert, J.P. (2007). Efficient peptide-MHC-I binding prediction for alleles with few known binders. Bio-informatics

Jameson, B.A., and Wolf, H. (1988). The antigenic index: a novel algorithm for predicting antigenic determinants. Comput Appl Biosci 4, 181-86.

Janeway, C. (2005). Immunobiology : the immune system in health and disease (New York: Garland Science).

Jojic, N., Reyes-Gomez, M., Heckerman, D., Kadie, C., and Schueler-Furman, O. (2006). Learning MHC I--peptide binding. Bioinformatics 22, e227-235.

Jones, D.T., Taylor, W.R., and Thornton, J.M. (1992). The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences 8, 275-282.

Kawashima, S., and Kanehisa, M. (2000). AAindex: amino acid index database. Nucleic Acids Res 28, 374.

Kesmir, C., Nussbaum, A.K., Schild, H., Detours, V., and Brunak, S. (2002). Prediction of proteasome cleavage motifs by neural networks. Protein Eng 15, 287-296.

Kloetzel, P.M. (2004). The proteasome and MHC class I antigen processing. Biochim Biophys Acta 1695, 225-233.

Immunological Bioinformatics

June 2011! 25

Kondo, A., Sidney, J., Southwood, S., del Guercio, M.F., Appella, E., Sakamoto, H., Grey, H.M., Celis, E., Chesnut, R.W., et al. (1997). Two distinct HLA-A*0101-specific sub-motifs illustrate alternative peptide binding modes. Immu-nogenetics 45, 249-258.

Kubo, R.T., Sette, A., Grey, H.M., Appella, E., Sakaguchi, K., Zhu, N.Z., Arnott, D., Sherman, N., Shabanowitz, J., and Michel, H. (1994). Definition of specific peptide motifs for four major HLA-A alleles. J Immunol 152, 3913-924.

Kulkarni-Kale, U., Bhosle, S., and Kolaskar, A.S. (2005). CEP: a conformational epitope prediction server. Nucleic Acids Res 33, W168-171.

Kuttler, C., Nussbaum, A.K., Dick, T.P., Rammensee, H.G., Schild, H., and Hadeler, K.P. (2000). An Algorithm for the Prediction of Proteasomal Cleavages. J. Mol. Biol. 298, 417-429.

Kuwano, K., Scott, M., Young, J.F., and Ennis, F.A. (1988). HA2 subunit of influenza A H1 and H2 subtype viruses in-duces a protective cross-reactive cytotoxic T lymphocyte response. J Immunol 140, 1264-68.

Larsen, J.E., Lund, O., and Nielsen, M. (2006). Improved method for predicting linear B-cell epitopes. Immunome Res 2, 2.

Larsen, M.V., Lundegaard, C., Lamberth, K., Buus, S., Brunak, S., Lund, O., and Nielsen, M. (2005). An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. Eur J Immunol 35, 2295-2303.

Larsen, M.V., Lundegaard, C., Lamberth, K., Buus, S., Lund, O., and Nielsen, M. (2007). Large-scale validation of meth-ods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics 8, 424.

Larsen, M.V., Nielsen, M., Weinzierl, A., and Lund, O. (2006). TAP-Independent MHC Class I Presentation. Cur-rent Immunology Reviews 2, 233-245.

Levitt, M. (1976). A simplified representation of protein conformations for rapid simulation of protein folding. J Mol Biol 104, 59-107.

Li, Z., Woo, C.J., Iglesias-Ussel, M.D., Ronai, D., and Scharff, M.D. (2004). The generation of antibody diversity through somatic hypermutation and class switch recombi-nation. Genes and Development 18, 1-11.

Lin, H.H., Ray, S., Tongchusak, S., Reinherz, E.L., and Bru-sic, V. (2008). Evaluation of MHC class I peptide binding prediction servers: applications for vaccine research. BMC Immunol 9, 8.

Logean, A., Sette, A., and Rognan, D. (2001). Customized versus universal scoring functions: application to class I MHC-peptide binding free energy predictions. Bioorg Med Chem Lett 11, 675-79.

Lorenzen, N., Lorenzen, E., Einer-Jensen, K., Heppell, J., Wu, T., and Davis, H. (1998). Protective immunity to VHS in rainbow trout (Oncorhynchus mykiss, Walbaum) follow-ing DNA vaccination. Fish and Shellfish Immunology 8, 261-270.

Loureiroa, J., and Ploegha, H.L. (2006). Antigen Presenta-tion and the Ubiquitin-Proteasome System in Host–Patho-gen Interactions. Advances in Immunology 92, 225-305.

Lowy, D.R., and Schiller, J.T. (2006). Prophylactic human papillomavirus vaccines. J Clin Invest 116, 1167-173.

Lund, O., Nielsen, M., Kesmir, C., Petersen, A.G., Lunde-gaard, C., Worning, P., Sylvester-Hvid, C., Lamberth, K., Røder, G., et al. (2004). Definition of supertypes for HLA molecules using clustering of specificity matrices. Immuno-genetics 55, 797-810.

Lund, O., Nielsen, M., Lundegaard, C., and Karlsson, A.C. (2008). Successful Use of Bioinformatics to Identify Broadly Immunogenic HLA Class I Restricted T-cell Responses against HIV. ASHI Quarterly 32, 62-66.

Lund, O., Nielsen, M., Lundegaard, C., KeŞmir, C., and Brunak, S. (2005). Immunological Bioinformatics (The MIT Press, Cambridge, Massachusetts, London, England).

Lundegaard, C., Lund, O., Kesmir, C., Brunak, S., and Niel-sen, M. (2007). Modeling the adaptive immune system: pre-dictions and simulations. Bioinformatics 23, 3265-275.

Lyngsø, R.B., Pedersen, C.N., and Nielsen, H. (1999). Met-rics and similarity measures for hidden Markov models. Proc Int Conf Intell Syst Mol Biol , 178-186.

Mabrouk, T., and Ellis, R.W. (2002). Influenza vaccine tech-nologies and the use of the cell-culture process (cell-culture influenza vaccine). Dev Biol (Basel) 110, 125-134.

Maksyutov, A.Z., and Zagrebelnaya, E.S. (1993). ADEPT: a computer program for prediction of protein antigenic de-terminants. Comput Appl Biosci 9, 291-97.

Marra, M.A., Jones, S.J., Astell, C.R., Holt, R.A., Brooks-Wilson, A., Butterfield, Y.S., Khattra, J., Asano, J.K., Barber, S.A., et al. (2003). The Genome sequence of the SARS-associated coronavirus. Science 300, 1399-1404.

McMichael, A.J., Gotch, F.M., Noble, G.R., and Beare, P.A. (1983). Cytotoxic T-cell immunity to influenza. N Engl J Med 309, 13-17.

Mirza, O., Henriksen, A., Ipsen, H., Larsen, J.N., Wissen-bach, M., Spangfort, M.D., and Gajhede, M. (2000). Domi-nant epitopes and allergic cross-reactivity: complex forma-tion between a Fab fragment of a monoclonal murine IgG antibody and the major allergen from birch pollen Bet v 1. J Immunol 165, 331-38.

Moutaftsi, M., Peters, B., Pasquetto, V., Tscharke, D.C., Sidney, J., Bui, H.H., Grey, H., and Sette, A. (2006). A con-sensus epitope prediction approach identifies the breadth

Immunological Bioinformatics

June 2011! 26

of murine T(CD8+)-cell responses to vaccinia virus. Nat Biotechnol 24, 817-19.

Nielsen, M., Lundegaard, C., and Lund, O. (2007). Predic-tion of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioin-formatics 8, 238.

Nielsen, M., Lundegaard, C., Blicher, T., Lamberth, K., Harndahl, M., Justesen, S., Røder, G., Peters, B., Sette, A., et al. (2007). NetMHCpan, a method for quantitative predic-tions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS ONE 2, e796.

Nielsen, M., Lundegaard, C., Blicher, T., Peters, B., Sette, A., Justesen, S., Buus, S., and Lund, O. (2008). Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan. PLoS Comput Biol 4, e1000107.

Nielsen, M., Lundegaard, C., Lund, O., and Kesmir, C. (2005). The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predic-tions of proteasomal cleavage. Immunogenetics 57, 33-41.

Nielsen, M., Lundegaard, C., Worning, P., Hvid, C.S., Lam-berth, K., Buus, S., Brunak, S., and Lund, O. (2004). Im-proved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics 20, 1388-397.

Nielsen, M., Lundegaard, C., Worning, P., Lauemoller, S.L., Lamberth, K., Buus, S., Brunak, S., and Lund, O. (2003). Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci 12, 1007-017.

Novotny, J., Handschumacher, M., Haber, E., Bruccoleri, R.E., Carlson, W.B., Fanning, D.W., Smith, J.A., and Rose, G.D. (1986). Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains). Proc Natl Acad Sci U S A 83, 226-230.

Nussbaum, A.K., Kuttler, C., Hadeler, K.P., Rammensee, H.G., and Schild, H. (2001). PAProC: a prediction algo-rithm for proteasomal cleavages available on the WWW. Immunogenetics 53, 87-94.

Odorico, M., and Pellequer, J.L. (2003). BEPITOPE: pre-dicting the location of continuous epitopes and patterns in proteins. J Mol Recognit 16, 20-22.

Pamer, E.G., Davis, C.E., and So, M. (1991). Expression and deletion analysis of the Trypanosoma brucei rhodesiense cysteine protease in Escherichia coli. Infect Immun 59, 1074-78.

Parker, J.M., Guo, D., and Hodges, R.S. (1986). New hydro-philicity scale derived from high-performance liquid chro-matography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived acces-sible sites. Biochemistry 25, 5425-432.

Parker, K.C., Bednarek, M.A., and Coligan, J.E. (1994). Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol 152, 163-175.

Pei, J., Briles, W.E., and Collisson, E.W. (2003). Memory T cells protect chicks from acute infectious bronchitis virus infection. Virology 306, 376-384.

Pellequer, J.L., Westhof, E., and Van Regenmortel, M.H. (1991). Predicting location of continuous epitopes in pro-teins from their primary structures. Methods Enzymol 203, 176-201.

Pellequer, J.L., Westhof, E., and Van Regenmortel, M.H. (1993). Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunol Lett 36, 83-99.

Perez, C.L., Larsen, M.V., Gustafsson, R., Norstrom, M.M., Atlas, A., Nixon, D.F., Nielsen, M., Lund, O., and Karlsson, A.C. (2008). Broadly immunogenic HLA class I supertype-restricted elite CTL epitopes recognized in a diverse popu-lation infected with different HIV-1 subtypes. J Immunol 180, 5092-5100.

Peters, B., and Sette, A. (2005). Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioin-formatics 6

Peters, B., Bui, H.H., Frankild, S., Nielson, M., Lundegaard, C., Kostem, E., Basch, D., Lamberth, K., Harndahl, M., et al. (2006). A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol 2, e65.

Peters, B., Bulik, S., Tampe, R., Endert, P.M.V., and Holz-hutter, H.G. (2003). Identifying MHC class I epitopes by predicting the TAP transport efficiency of epitope precur-sors. J. Immunol. 171, 1741-49.

Rammensee, H., Bachmann, J., Emmerich, N.P., Bachor, O.A., and Stevanovic, S. (1999). SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50, 213-19.

Rammensee, H.G., Bachmann, J., and Stevanovic, S. (1997). MHC ligands and Peptide Motifs (New York: Chapman & Hall).

Reche, P.A., Glutting, J.P., and Reinherz, E.L. (2002). Pre-diction of MHC class I binding peptides using profile mo-tifs. Hum Immunol 63, 701-09.

Regenmortel, M.H.V.V., and Muller, S. (1999). Synthetic peptides as antigens, P.C.V.D. Vliet, eds. (Amsterdam: El-sevier).

Rotzschke, O., Falk, K., Stevanovic, S., Jung, G., Walden, P., and Rammensee, H.G. (1991). Exact prediction of a natural T cell epitope. Eur J Immunol 21, 2891-94.

Immunological Bioinformatics

June 2011! 27

Saxová, P., Buus, S., Brunak, S., and Kesmir, C. (2003). Pre-dicting proteasomal cleavage sites: a comparison of available methods. Int. Immunol. 15, 781-87.

Schafer, J.R., Jesdale, B.M., George, J.A., Kouttab, N.M., and De Groot, A.S. (1998). Prediction of well-conserved HIV-1 ligands using a matrix-based algorithm, EpiMatrix. Vaccine 16, 1880-84.

Seo, S.H., and Collisson, E.W. (1997). Specific cytotoxic T lymphocytes are involved in in vivo clearance of infectious bronchitis virus. J Virol 71, 5173-77.

Seo, S.H., and Collisson, E.W. (1998). Cytotoxic T lympho-cyte responses to infectious bronchitis virus infection. Adv Exp Med Biol 440, 455-460.

Seo, S.H., Pei, J., Briles, W.E., Dzielawa, J., and Collisson, E.W. (2000). Adoptive transfer of infectious bronchitis virus primed alphabeta T cells bearing CD8 antigen protects chicks from acute infection. Virology 269, 183-89.

Sette, A., and Sidney, J. (1999). Nine major HLA class I su-pertypes account for the vast preponderance of HLA-A and –B polymorphism. Immunogenetics 50, 201-212.

Sette, A., Buus, S., Appella, E., Smith, J.A., Chesnut, R., Miles, C., Colon, S.M., and Grey, H.M. (1989). Prediction of major histocompatibility complex binding regions of pro-tein antigens by sequence pattern analysis. Proc. Natl. Acad. Sci. U S A 86, 3296-2300.

Sidney, J., del Guercio, M.F., Southwood, S., Engelhard, V.H., Appella, E., Rammensee, H.G., Falk, K., Rötzschke, O., Takiguchi, M., and Kubo, R.T. (1995). Several HLA al-leles share overlapping peptide specificities. J Immunol 154, 247-259.

Sidney, J., Grey, H.M., Southwood, S., Celis, E., Wentworth, P.A., del Guercio, M.F., Kubo, R.T., Chesnut, R.W., and Sette, A. (1996). Definition of an HLA-A3-like supermotif demonstrates the overlapping peptide-binding repertoires of common HLA molecules. Hum Immunol 45, 79-93.

Sidney, J., Peters, B., Frahm, N., Brander, C., and Sette, A. (2008). HLA class I supertypes: a revised and updated clas-sification. BMC Immunol 9, 1.

Sollner, J. (2006). Selection and combination of machine learning classifiers for prediction of linear B-cell epitopes on proteins. J Mol Recognit 19, 209-214.

Sollner, J., and Mayer, B. (2006). Machine learning ap-proaches for prediction of linear B-cell epitopes on pro-teins. J Mol Recognit 19, 200-08.

Stoltze, L., Schirle, M., Schwarz, G., Schroter, C., Thomp-son, M.W., Hersh, L.B., Kalbacher, H., Stevanovic, S., Rammensee, H.G., and Schild, H. (2000). Two new prote-ases in the MHC class I processing pathway. Nat. Immunol. 1, 413--418.

Sturniolo, T., Bono, E., Ding, J., Raddrizzani, L., Tuereci, O., Sahin, U., Braxenthaler, M., Gallazzi, F., Protti, M.P., et al. (1999). Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol 17, 555-561.

Sweredoski, M.J., and Baldi, P. (2008). PEPITO: improved discontinuous B-cell epitope prediction using multiple dis-tance thresholds and half sphere exposure. Bioinformatics 24, 1459-460.

Swets, J.A. (1988). Measuring the accuracy of diagnostic systems. Science 240, 1285-293.

Sylvester-Hvid, C., Nielsen, M., Lamberth, K., Roder, G., Justesen, S., Lundegaard, C., Worning, P., Thomadsen, H., Lund, O., et al. (2004). SARS CTL vaccine candidates; HLA supertype-, genome-wide scanning and biochemical valida-tion. Tissue Antigens 63, 395.

Tenzer, S., Peters, B., Bulik, S., Schoor, O., Lemmel, C., Schatz, M.M., Kloetzel, P.M., Rammensee, H.G., Schild, H., and Holzhutter, H.G. (2005). Modeling the MHC class I pathway by combining predictions of proteasomal cleav-age, TAP transport and MHC class I binding. Cell Mol Life Sci 62, 1025-037.

Tenzer, S., Stoltze, L., Schonfisch, B., Dengjel, J., Muller, M., Stevanovic, S., Rammensee, H.G., and Schild, H. (2004). Quantitative analysis of prion-protein degradation by con-stitutive and immuno-20S proteasomes indicates differences correlated with disease susceptibility. J Immunol 172, 1083-091.

Thornton, J.M., Edwards, M.S., Taylor, W.R., and Barlow, D.J. (1986). Location of 'continuous' antigenic determinants in the protruding regions of proteins. Embo J 5, 409-413.

Ulmer, J.B., Donnelly, J.J., Parker, S.E., Rhodes, G.H., Felg-ner, P.L., Dwarki, V.J., Gromkowski, S.H., Deck, R.R., De-Witt, C.M., and Friedman, A. (1993). Heterologous protec-tion against influenza by injection of DNA encoding a viral protein. Science 259, 1745-49.

van Regenmortel, M.H., and Pellequer, J.L. (1994). Predict-ing antigenic determinants in proteins: looking for unidi-mensional solutions to a three-dimensional problem? Pept Res 7, 224-28.

van Regenmortel, M.H.V. (1996). Mapping Epitope Struc-ture and Activity: From One-Dimensional Prediction to Four-Dimensional Description of Antigenic Specificity. Methods 9, 465-472.

Wang, M., Lamberth, K., Harndahl, M., Røder, G., Stryhn, A., Larsen, M.V., Nielsen, M., Lundegaard, C., Tang, S.T., et al. (2007). CTL epitopes for influenza A including the H5N1 bird flu; genome-, pathogen-, and HLA-wide screening. Vaccine 25, 2823-831.

Wang, P., Sidney, J., Dow, C., Mothé, B., Sette, A., and Pe-ters, B. (2008). A systematic assessment of MHC class II

Immunological Bioinformatics

June 2011! 28

peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol 4, e1000048.

Yu, K., Petrovsky, N., Schonbach, C., Koh, J.Y., and Brusic, V. (2002). Methods for prediction of peptide binding to MHC molecules: a comparative study. Mol Med 8, 137-148.

Zhang, G.L., Petrovsky, N., Kwoh, C.K., August, J.T., and Brusic, V. (2006). PRED(TAP): a system for prediction of peptide binding to the human transporter associated with antigen processing. Immunome Res 2, 3.

Zhang, H., Lundegaard, C., and Nielsen, M. (2009). Pan-specific MHC class I predictors: a benchmark of HLA class I pan-specific prediction methods. Bioinformatics 25, 83-89.

Zhang, Q., Wang, P., Kim, Y., Haste-Andersen, P., Beaver, J., Bourne, P.E., Bui, H.H., Buus, S., Frankild, S., et al. (2008). Immune epitope database analysis resource (IEDB-AR). Nucleic Acids Res 36, W513-18.

Zhang, Q., Yoon, S., and Welsh, W.J. (2005). Improved method for predicting ß-turn using support vector machine. Bioinformatics 21, 2370-74.

Zhihua, L., Yuzhang, W., Bo, Z., Bing, N., and Li, W. (2004). Toward the quantitative prediction of T-cell epitopes: QSAR studies on peptides having affinity with the class I MHC molecular HLA-A*0201. J Comput Biol 11, 683-694.

Immunological Bioinformatics

June 2011! 29