Profiles for the analysis of immunoglobulin sequences: Comparison of V gene subgroups


Protein Science (1995), 4:306-310. Cambridge University Press. Printed in the USA. Copyright © 1995 The Protein Society


FOR THE RECORD

Profiles for the analysis of immunoglobulin sequences: Comparison of V gene subgroups

LINDA HARRIS AND JURGEN BAJORATH
Bristol-Myers Squibb Pharmaceutical Research Institute, 3005 First Avenue, Seattle, Washington 98121
(RECEIVED November 4, 1994; ACCEPTED November 18, 1994)

Abstract: A format for the structure-oriented analysis of immunoglobulin (Ig) variable region sequences is presented and applied to generate sequence profiles for comparison of heavy- and light-chain subgroups. The profile allows simultaneous evaluation of sequences and structural information and can be used for a number of different applications.

Keywords: immunoglobulin V genes; sequence analysis; sequence-structure correlation; subgroups


Immunoglobulin variable regions display high sequence similarity (50% or more) and are thus amenable to standard sequence comparison methods. However, applications such as antibody model building (Bajorath, 1994) and humanization (Hsiao et al., 1994; Studnicka et al., 1994) make it necessary to include structural criteria in sequence comparisons. For example, modeling requires separate assessment of framework and complementarity determining region (CDR) loop sequence similarities and the identification of residues that determine CDR loop conformations (Chothia et al., 1989). Humanization of murine antibody fragments requires not only the assessment of global sequence similarities between human and murine antibodies, but also the comparison of structural determinant residues (Chothia et al., 1989) and of residues that are highly relevant for successful humanization of murine antibody combining sites (Studnicka et al., 1994).

Reprint requests to: Linda Harris or Jurgen Bajorath, Bristol-Myers Squibb Pharmaceutical Research Institute, 3005 First Avenue, Seattle, Washington 98121; e-mail: [email protected].

For the analysis of variable region sequences of the type outlined above, we have designed a format for structure-oriented analysis of single or multiple sequences, called IgSS (immunoglobulin sequence-structure) profiles. The IgSS profiles allow the identification of a number of structurally important residues in immunoglobulin sequences. The following residues are identified in the profiles: (1) hypervariable and framework regions (Kabat et al., 1991); (2) CDR loops and structural determinants for CDR loop conformations (Chothia et al., 1989); (3) invariant buried polar residues in antibody Fv structures (Novotny & Haber, 1985); (4) residues involved in interdomain hydrogen bonding (Novotny & Haber, 1985); (5) solvent-exposed residues involved in Fv domain-domain association (Novotny & Haber, 1985); (6) structurally most conserved regions (Novotny & Sharp, 1992); (7) low-, medium-, and high-risk positions for antibody humanization (for example, a “high risk” position may be critical for the conformation of a CDR loop and may thus require the presence of the original murine residue) (Studnicka et al., 1994); and (8) immunoglobulin superfamily (IgSF) consensus residues of the V-set (Williams & Barclay, 1988).
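The eight kinds of annotation listed above map naturally onto one record per residue position. The Python sketch below is a hypothetical model of such a profile column, plus a plain-text renderer that stacks the annotation rows above the consensus rows in the top-to-bottom order given in the legend to Figure 1 (position number; L/M/H risk; a-f; asterisk). The class and field names are illustrative assumptions, not the layout of the authors' Excel template, and the example values are placeholders rather than data read from the figures.

```python
from dataclasses import dataclass, field

# Annotation codes a-f in the order of the Figure 1 legend; asterisks are kept separately.
FLAG_CODES = "abcdef"


@dataclass
class ProfilePosition:
    """One column of a hypothetical IgSS-style profile."""
    label: str                                    # Kabat/Chothia position label, e.g. "L27a"
    risk: str = ""                                # humanization risk: "L", "M", "H" or ""
    flags: set = field(default_factory=set)       # subset of FLAG_CODES
    loop_determinant: bool = False                # True -> marked with an asterisk
    residues: dict = field(default_factory=dict)  # subgroup name -> consensus residue


def render(profile, subgroups):
    """Return text rows: position labels, annotation rows, then one row per subgroup."""
    width = max(len(p.label) for p in profile)
    label_width = max(len(name) for name in ["risk", *subgroups])

    def row(name, cells):
        # Left-justified row name, then right-justified fixed-width cells.
        return name.ljust(label_width) + " " + " ".join(c.rjust(width) for c in cells)

    lines = [row("pos", [p.label for p in profile]),
             row("risk", [p.risk for p in profile])]
    for code in FLAG_CODES:
        lines.append(row(code, [code if code in p.flags else "" for p in profile]))
    lines.append(row("*", ["*" if p.loop_determinant else "" for p in profile]))
    for name in subgroups:
        lines.append(row(name, [p.residues.get(name, "") for p in profile]))
    return lines


# Placeholder values for illustration only (not read from the figures).
example = [
    ProfilePosition("L23", risk="H", flags={"e", "f"}, residues={"hVK I": "C"}),
    ProfilePosition("L24", risk="M", loop_determinant=True, residues={"hVK I": "R"}),
]
for line in render(example, ["hVK I"]):
    print(line)
```

Keeping each annotation as its own row mirrors the figures, so a reader can scan one property (for example, all high-risk positions) across the whole V region.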

Figures 1, 2, 3, and 4 present the IgSS profiles of consensus sequences of human and murine VH and VK subgroups. The residues listed represent the most common amino acid in the sequence compilation of Kabat et al. (1991). If colored, the listed residue occurs in at least 95% of the sequences in that subgroup. If no residue is shown, the majority of the sequences do not have a residue at this position. The standard Kabat numbering scheme (Kabat et al., 1991) is used, except in the L1 and H1 regions, where it is modified according to Chothia and colleagues (Chothia et al., 1989). Further details are given in the legend to Figure 1.
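The consensus rules just described (most common residue, a conservation flag at 95%, and a blank where most sequences lack a residue), together with the “X” convention from the legend to Figure 1, can be written as a small per-column function. The Python sketch below assumes the subgroup's sequences are already aligned to a common numbering, with "-" marking a missing residue; the function names are hypothetical. The text says “at least 95%” here and “>95%” elsewhere, so the cutoff is a parameter, and ties between equally common residues are broken arbitrarily.

```python
from collections import Counter


def column_consensus(column, conserved_at=0.95, majority=0.5, gap="-"):
    """Summarize one alignment column (one character per sequence).

    Returns (symbol, conserved):
      symbol is "" if most sequences have no residue at this position,
             "X" if no amino acid occurs in more than `majority` of the sequences,
             otherwise the most common amino acid;
      conserved is True if that amino acid occurs in at least `conserved_at`
      of the sequences.
    """
    n = len(column)
    if n == 0:
        raise ValueError("empty column")
    if sum(1 for c in column if c == gap) > n / 2:
        return "", False
    residue, count = Counter(c for c in column if c != gap).most_common(1)[0]
    if count / n <= majority:
        return "X", False
    return residue, count / n >= conserved_at


def profile_row(aligned_sequences, **kwargs):
    """Apply column_consensus to every column of equal-length aligned sequences."""
    return [column_consensus(col, **kwargs) for col in zip(*aligned_sequences)]
```

For example, `profile_row(["CA", "CA", "C-"])` gives a conserved C in the first column and an A that is the consensus but not conserved in the second.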

Comparison of the four sets of profiles in Figures 1, 2, 3, and 4 shows different levels of amino acid sequence conservation: V gene invariant, class specific (different in VH and VL), species and class specific, and subgroup specific. Only residues that are found in >95% of the specified sequences are considered conserved; these therefore provide meaningful “anchors” for alignment. V gene invariance is most noticeable at positions H22, H36, H92, H104, H106, and H107 in VH and at L23, L35, L88, L99, L101, and L102 in VL. Using these residues for alignment facilitates the appropriate placement of gaps in regions that are variable in length. These regions include the CDRs and/or hypervariable regions, which show different levels of sequence variability in the profile. Some residues outside the known structural determinants (Chothia et al., 1989) but within CDRs and hypervariable regions are highly conserved. For example, residues Tyr 59H and Ser 26L are highly conserved among murine and human sequences. Such remarkably conserved residues in hypervariable regions may have a previously unrecognized significance.
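The role of invariant anchors in gap placement can be illustrated with a deliberately simple Python sketch. It assumes the caller has already located the same anchor residues (for example, those at the V gene invariant positions listed above) in two ungapped sequences and passes their 0-based indices; the function names are hypothetical. Each stretch between consecutive anchors is padded only at its own end, so gaps can never push an anchor out of register. Putting every gap at the end of its stretch is a simplification: a real alignment would place insertions where the numbering scheme or a pairwise alignment of that stretch puts them.

```python
def _split_at_anchors(seq, anchors):
    """Split seq into [stretch, anchor, stretch, anchor, ..., tail]."""
    parts, start = [], 0
    for i in anchors:
        parts.append(seq[start:i])  # variable-length stretch before this anchor
        parts.append(seq[i])        # the anchor residue itself
        start = i + 1
    parts.append(seq[start:])       # stretch after the last anchor
    return parts


def align_by_anchors(seq_a, anchors_a, seq_b, anchors_b, gap="-"):
    """Align two ungapped sequences so that corresponding anchors share a column."""
    if len(anchors_a) != len(anchors_b):
        raise ValueError("both sequences must have the same number of anchors")
    for anchors in (anchors_a, anchors_b):
        if any(j <= i for i, j in zip(anchors, anchors[1:])):
            raise ValueError("anchor indices must be strictly increasing")
    out_a, out_b = [], []
    for part_a, part_b in zip(_split_at_anchors(seq_a, anchors_a),
                              _split_at_anchors(seq_b, anchors_b)):
        width = max(len(part_a), len(part_b))
        out_a.append(part_a.ljust(width, gap))  # pad the shorter stretch with gaps
        out_b.append(part_b.ljust(width, gap))
    return "".join(out_a), "".join(out_b)


# Toy sequences with anchors C and W; returns ("QVCAAW", "Q-CA-W").
print(align_by_anchors("QVCAAW", [2, 5], "QCAW", [1, 3]))
```

Because every stretch is aligned independently, a length difference in one variable region cannot shift the alignment of the framework residues that follow it.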


[Figure 1: IgSS profile of human VK subgroups hVK I (50), hVK II (11), hVK III (56), and hVK IV (5); graphic not reproduced.]

Fig. 1. IgSS profile of human VK subgroups. Subgroup designations are listed in the left-hand column (yellow) and are followed by the number of available sequences (Kabat et al., 1991) in parentheses. Residues are listed in single-letter code and represent the most common amino acids within that subgroup; "X" means that no single amino acid occurs at this position more than 50% of the time. Gaps show that the majority of sequences lack a residue at this position. Residues are annotated above the sequences as follows (from the top to the bottom row): residue numbers according to Kabat et al. (1991); L, M, H, low-, medium-, and high-risk positions for antibody humanization (Studnicka et al., 1994); a, structurally most conserved segments in immunoglobulin variable domains (Novotny & Sharp, 1992); b, solvent-exposed residues involved in Fv domain-domain association; c, buried polar residues; d, interdomain hydrogen bonding residues (b-d [Novotny & Haber, 1985]); e, IgSF V-set consensus residues (Williams & Barclay, 1988); f, "anchor" residues for alignment (see text); asterisks, structural determinants for CDR loop conformations (Chothia et al., 1989). Hypervariable regions (Kabat et al., 1991) and CDR loops (Chothia et al., 1989) are boxed in magenta and white, respectively. Consensus residues are color-coded as follows: magenta, invariant (>95%) across subgroups within a species; green, orange, blue, purple, invariant within a subgroup but different from subgroups shown in another color. In subgroups with fewer than 10 sequences (hVKIV, mVKVII, and mVH VB), only magenta coloring is used. The heavy chain D regions lack consensus sequences, are of variable length, and can be recombined with any subgroup sequence. Within a species and class, the J regions can be recombined with any subgroup sequence. Double slashes mark the positions of recombination.
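The color rules in this legend amount to a per-position classification within one species. The Python sketch below is one possible reading of them, stated as an assumption: a position is "invariant" (magenta) when every subgroup carries the same residue at or above the cutoff, "subgroup" (one of the other colors) when only that subgroup is conserved, and subgroups with fewer than 10 sequences never receive a subgroup-specific label. The function name and input layout are hypothetical, and the example numbers in the usage note are placeholders.

```python
def classify_position(stats, cutoff=0.95, min_sequences=10):
    """Classify one position across the subgroups of a single species.

    stats maps subgroup name -> (consensus residue, fraction of sequences
    carrying it, number of sequences in the subgroup).
    Returns subgroup name -> "invariant", "subgroup" or "".
    """
    conserved = {name: frac >= cutoff for name, (_, frac, _) in stats.items()}
    residues = {res for res, _, _ in stats.values()}
    if all(conserved.values()) and len(residues) == 1:
        # Same residue conserved in every subgroup: magenta in the figures.
        return {name: "invariant" for name in stats}
    # Otherwise only subgroup-level conservation remains; small subgroups get no color.
    return {name: "subgroup" if conserved[name] and n >= min_sequences else ""
            for name, (_, _, n) in stats.items()}
```

With placeholder input `{"hVK I": ("S", 0.97, 50), "hVK II": ("T", 0.99, 11), "hVK III": ("S", 0.60, 56), "hVK IV": ("S", 1.0, 5)}`, hVK I and hVK II are labeled "subgroup", while hVK III (not conserved) and hVK IV (too few sequences) stay unlabeled.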


[Figure 2: IgSS profile of murine VK subgroups mVK I (63), mVK II (220), mVK III (70), mVK IV (43), mVK V (220), mVK VI (153), and mVK VII (2); graphic not reproduced.]

Fig. 2. IgSS profile of murine VK subgroups. See Figure 1 for explanation of notation.

The IgSS profiles have a number of different applications. For humanization, the alignment of the murine and human sequences is annotated with the classification of Studnicka et al. (1994), as well as with CDR loops and structural determinants. For antibody modeling, a sequence of unknown three-dimensional structure can easily be aligned with sequences of antibody structures available in the Brookhaven Protein Data Bank (Bernstein et al., 1977), and homologous regions can be assigned and selected. By aligning an expressed V gene with its germline counterpart, somatic mutations can be evaluated in the context of structural information. Although only VK and VH IgSS profiles have been presented here, the profiles can easily be adapted for the analysis of λ sequences and of other members of the V-set (e.g., T-cell receptors) using consensus residues for alignment.
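The germline comparison mentioned above reduces to listing the columns where an expressed V region differs from its germline counterpart and attaching whatever profile annotation exists at each position. The Python sketch below assumes the two sequences are already aligned and labeled with position numbers, and that annotations are supplied as a plain dictionary; the function name and the example annotation text are hypothetical. Columns with a gap in either sequence are skipped, so insertions and deletions are not reported as point mutations.

```python
def somatic_mutations(germline, expressed, labels, annotations, gap="-"):
    """List point differences between an aligned germline and expressed V region.

    germline, expressed: equal-length aligned sequences.
    labels: position label (e.g. Kabat number) for each column.
    annotations: position label -> annotation text, e.g. risk level or CDR.
    Returns a list of (label, germline residue, expressed residue, annotation).
    """
    if not len(germline) == len(expressed) == len(labels):
        raise ValueError("sequences and labels must have the same length")
    return [
        (pos, g, e, annotations.get(pos, ""))
        for g, e, pos in zip(germline, expressed, labels)
        if g != e and gap not in (g, e)
    ]


# Hypothetical example with one substitution at the third position.
print(somatic_mutations("QVQLV", "QVKLV", ["H1", "H2", "H3", "H4", "H5"],
                        {"H3": "example annotation"}))
# [('H3', 'Q', 'K', 'example annotation')]
```

In practice the annotation dictionary would be filled from the profile rows (risk level, CDR membership, structural determinants), so each mutation is reported together with its structural context.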

The IgSS profiles shown in Figures 1, 2, 3, and 4 are provided in the Electronic Appendix (SUPLEMNT directory) in Excel (version 4.0) format and can be used as a template. The Excel format allows the inclusion of many sequences, the display of only selected information, and the use of macros to manipulate the profiles.


[Figure 3: IgSS profile of human VH subgroups hVH I (46), hVH II (47), and hVH III (79), with D and J regions; graphic not reproduced.]

Fig. 3. IgSS profile of human VH subgroups. See Figure 1 for explanation of notation.

[Figure 4: IgSS profile of murine VH subgroups mVH IA (50), mVH IB (75), mVH IIA (150), mVH IIB (count illegible), mVH IIC (32), mVH IIIA (121), mVH IIIB (33), mVH IIIC (35), mVH IIID (64), mVH VA (47), and mVH VB (1); graphic not reproduced.]

Fig. 4. IgSS profile of murine VH subgroups. See Figure 1 for explanation of notation.

Acknowledgment

We are grateful to Debby Baxter for help in preparation of the manuscript.

References

Bajorath J. 1994. Three-dimensional model of the BR96 monoclonal antibody variable fragment. Bioconjugate Chem 5:213-219.

Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. 1977. The Protein Data Bank: A computer-based archival file for macromolecular structures. J Mol Biol 112:535-542.

Chothia C, Lesk AM, Tramontano A, Levitt M, Smith-Gill SJ, Air G, Sheriff S, Padlan EA, Davies D, Tulip WR. 1989. Conformations of immunoglobulin hypervariable regions. Nature 342:877-883.

Hsiao K, Bajorath J, Harris LJ. 1994. Humanization of 60.3, an anti-CD18 antibody; importance of the L2 loop. Protein Eng 7:815-822.

Kabat EA, Wu TT, Perry HM, Gottesman KS, Foeller C. 1991. Sequences of proteins of immunological interest. DHHS Publication Number (NIH) 91-3242.

Novotny J, Haber E. 1985. Structural invariants of antigen binding: Comparison of VL-VH and VL-VL dimers. Proc Natl Acad Sci USA 82:4592-4596.

Novotny J, Sharp KA. 1992. Electrostatic fields in antibodies and antibody/antigen complexes. Prog Biophys Mol Biol 58:203-224.

Studnicka GM, Soares S, Better M, Williams RE, Nadell R, Horwitz AH. 1994. Human-engineered monoclonal antibodies retain full specific binding activity by preserving non-CDR complementarity-modulating residues. Protein Eng 7:805-814.

Williams AF, Barclay AN. 1988. The immunoglobulin superfamily: Domains for cell surface recognition. Annu Rev Immunol 6:381-406.