questions to be addressed
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Questions to be addressed
• Can multiple D genes be inserted?
  – Violation of 12/23 rule
• Can D genes be inserted backwards?
• Is there a D gene preference?
• Is there a reading frame preference for D genes?
  – If yes, is it part of the gene rearrangement?
• Who is doing the end trimming?
Data sets
• 6329 clonally unrelated rearrangements.
  – 1968 un-mutated functional
  – 3707 mutated functional
  – 274 un-mutated non-functional
  – 380 mutated non-functional
P nucleotides
Distance = distance from heptamer to gene end

                    Sequences                           Permutated sequences
Distance  No. of seq  No. with P  % with P    No. of seq  No. with P  % with P    p-value
VH gene
1         1448        474         32.7        1635        103         6.3         <10^-5
2         1027        48          4.7         1068        65          6.1         0.091
3         762         53          7.0         612         36          5.9         0.245
JH gene
1         324         60          18.5        350         23          6.6         <10^-5
2         184         2           1.0         209         3           1.4         0.560
3         219         8           3.7         250         14          5.6         0.220
5' end of D gene
1         519         128         24.7        619         54          8.7         <10^-5
2         343         31          9.0         347         26          7.5         0.275
3         474         25          5.3         454         17          3.7         0.168
3' end of D gene
1         616         86          14.0        684         58          8.5         0.001
2         266         30          11.3        276         24          8.7         0.195
3         460         5           1.1         485         9           1.9         0.241
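The slides do not say which test produced the p-values in the table above. As a rough illustration only, the sketch below compares the fraction of sequences with P nucleotides in real and permutated sequences using a Pearson chi-square test on a 2x2 table (an assumed choice, so the values need not reproduce the slide's p-values). The function name `two_proportion_chi2` is hypothetical.

```python
import math

def two_proportion_chi2(hits_a, n_a, hits_b, n_b):
    """Pearson chi-square test (1 d.f., no continuity correction) on a 2x2 table.

    Assumed test, not necessarily the one used for the slide.
    Returns (chi2, two-sided p-value).
    """
    a, b = hits_a, n_a - hits_a          # group A: with P / without P
    c, d = hits_b, n_b - hits_b          # group B: with P / without P
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# VH gene, distance 1: 474 of 1448 sequences vs 103 of 1635 permutated sequences.
chi2_vh1, p_vh1 = two_proportion_chi2(474, 1448, 103, 1635)
# VH gene, distance 2: 48 of 1027 sequences vs 65 of 1068 permutated sequences.
chi2_vh2, p_vh2 = two_proportion_chi2(48, 1027, 65, 1068)

print(f"VH distance 1: chi2={chi2_vh1:.1f}, p={p_vh1:.3g}")
print(f"VH distance 2: chi2={chi2_vh2:.2f}, p={p_vh2:.3g}")
```

Under this assumed test, only the distance 1 row shows a clear excess of P nucleotides over the permutated control, which agrees qualitatively with the table.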
How many types of D genes?
• Conventional D genes
  – Identified in 81% of unmutated sequences and 64% of mutated sequences
• Inverted D genes
  – Long inverted D genes cannot be excluded
• Two D genes
• D genes with irregular RSS (DIR)
• Chromosome 15 OR
D gene usage
27 conventional D genes, 34 known alleles
[Figure: D-Gene Usage and Lengths. x-axis: D Gene (IGHD1-1, IGHD2-2, IGHD3-3, IGHD4-11/IGHD4-4, IGHD5-18/IGHD5-5, IGHD6-6, IGHD1-7, IGHD2-8, IGHD3-9, IGHD3-10, IGHD5-12, IGHD6-13, IGHD1-14, IGHD2-15, IGHD3-16, IGHD4-17, IGHD6-19, IGHD1-20, IGHD2-21, IGHD3-22, IGHD4-23, IGHD5-24, IGHD6-25, IGHD1-26, IGHD7-27). Left y-axis: Number of Sequences (bars), 0 to 800. Right y-axis: Average Length (triangles) and Germline Length (diamonds), 0 to 40.]
D-gene usage and JH gene
• JH-proximal D genes are more often recombined to JH4 than to JH6, whereas JH-distal D genes are more often recombined to JH6
Inverted (palindromic) D genes
Inverted D genes are not used! (or are used extremely infrequently)
D genes with irregular RSS (recombination signal sequence) (DIR)
• Very long, >180 bp
• Contain a family 1 D gene
>DIR1 (in between D6-6 and D1-7)
GGTGTTCCGCTAGCTGGGGCTCACAGTGCTCACCCCACACCTAAAACGAGCCACAGCCTCCGGAGCCCCTGAAGGAGACCCCGCCCACAAGCCCAGCCCCCACCCAGGAGGCCCCAGAGCACAGGGCGCCCCGTCGGATTCTGAACAGCCCCGAGTCACAGTGGGTATAACTGGAACTAC
>IGHD1-7-01|X13972|IGHD1-7-01|Homo sapiens|F|D-REGION
GGTATAACTGGAACTAC
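The claim that a DIR contains a family 1 D gene can be checked directly on the two sequences shown above. The following is a minimal sketch: the helper names `reverse_complement` and `locate_gene` are hypothetical, and exact string matching is a simplification (a real analysis would allow mismatches).

```python
DIR1 = (
    "GGTGTTCCGCTAGCTGGGGCTCACAGTGCTCACCCCACACCTAAAACGAGCCACAGCCTCCGGAGCCCC"
    "TGAAGGAGACCCCGCCCACAAGCCCAGCCCCCACCCAGGAGGCCCCAGAGCACAGGGCGCCCCGTCGGA"
    "TTCTGAACAGCCCCGAGTCACAGTGGGTATAACTGGAACTAC"
)
IGHD1_7 = "GGTATAACTGGAACTAC"

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def locate_gene(segment, gene):
    """Find an exact copy of `gene` in `segment`.

    Returns (0-based offset, orientation), or None if there is no exact match.
    """
    offset = segment.find(gene)
    if offset != -1:
        return offset, "forward"
    offset = segment.find(reverse_complement(gene))
    if offset != -1:
        return offset, "inverted"
    return None

hit = locate_gene(DIR1, IGHD1_7)
print(hit)                      # offset and orientation of IGHD1-7 within DIR1
print(DIR1.endswith(IGHD1_7))   # True if the D gene sits at the 3' end of DIR1
```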
D genes with irregular RSS (DIR)
• Very long, >180 bp
• Contain a family 1 D gene
• Found in 1% of sequences, inverted in 1.2%
• Some explained as family 1 gene plus N additions
• Median length of the remaining hits is not different from that in permutated sequences
=> No evidence for use of DIR
Two D genes
• 2 D genes found in 1% of sequences
• Frequency not different from permutated sequences
• Some explained as one long D gene with a deletion
• Some not possible due to the location of the D genes
• Median length of the longest gene resembles normal D genes; that of the shortest resembles permutated sequences
Multiple D genes
• 65 sequences with two D genes
• Average length of shortest D genes: 11.6 bp
• Average length of longest D genes: 18.8 bp
• Average length of D genes in permuted sequences: 11.3 bp
• Average length of D genes in normal sequences: 17.8 bp
• => multiple D genes are not present!!!
[Diagram labels: Longest-D, Shortest-D, V-gene, J-gene]
Chromosome 15 OR (open reading frames)
• 10 OR resembling D genes on chromosome 15
• High homology to conventional D genes
>IGHD5-12-01|X13972|IGHD5-12-01|Homo sapiens|F|D- 275 aa vs.
>IGHD5-OR15-5 |X55583 and X55584 253 aa
91.3% identity; Global alignment score: 1563

              10        20         30        40        50
      FINDFASTADTEMPLATESDATIGHD-SATSEPMNIELHMEPRECTSMNIELANTIBDYH
      :::::::::::::::::::::::::: :::::::::::::::::::::::::::::::::
      FINDFASTADTEMPLATESDATIGHDRSATSEPMNIELHMEPRECTSMNIELANTIBDYH
              10        20        30        40        50        60

     60        70        80        90       100       110
      MMDGENANALYSISIRIXRGANISMIPCMMANDLINEPARAMETERSSETTLARGVISAN
      ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
      MMDGENANALYSISIRIXRGANISMIPCMMANDLINEPARAMETERSSETTLARGVISAN
              70        80        90       100       110       120

    120       130       140       150       160       170
      AMELISTDEFALTISNAMEXEXCLDENAMESMFINDENTRYBYCMMNSBMATCHAFINDA
      ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
      AMELISTDEFALTISNAMEXEXCLDENAMESMFINDENTRYBYCMMNSBMATCHAFINDA
             130       140       150       160       170       180

    180       190       200       210       220       230
      LLENTRIESSNAMEMSTBEINSTARTFNAMENMBERFFASTAENTRIESREADFRMFILE
      ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
      LLENTRIESSNAMEMSTBEINSTARTFNAMENMBERFFASTAENTRIESREADFRMFILE
             190       200       210       220       230       240

    240       250       260       270
      DTEMPLATESDATGTGGATATAGTGGCTACGATTAC
      :::::::::::::
      DTEMPLATESDAT-----------------------
             250
Chromosome 15 OR (open reading frames)
• 10 OR resembling D genes on chromosome 15
• High homology to conventional D genes
• Very few OR15 in un-mutated sequences
• Median length not different from hits in permutated sequences
=> No evidence for use of OR15 genes
D gene reading frames
• The recombination mechanism utilises each D gene reading frame at the same frequency

Reading frame (frame number in parentheses), with P and NP values for each frame:

            Stop                          Hydrophilic                   Hydrophobic
Gene        Sequence         P     NP     Sequence         P     NP     Sequence        P     NP
D2-2*01     RIL**YQLLC (1)   6.5   34.7   GYCSSTSCYA (2)   61.2  32.6   DIVVVPAAM (3)   32.2  32.6
D2-2*02     RIL**YQLLY (1)   11.3  46.7   GYCSSTSCYT (2)   55.0  20.0   DIVVVPAAI (3)   33.8  33.3
D2-2*03     WIL**YQLLC (1)   0.0   50.0   GYCSSTSCYA (2)   66.7  50.0   DIVVVPAAM (3)   33.3  0.0
D2-8*01     RILY*WCMLY (1)   2.4   42.9   GYCTNGVCYT (2)   68.3  28.6   DIVLMVYAI (3)   29.3  28.6
D2-8*02     RILYWWCMLY (1)   0.0   0.0    GYCTGGVCYT (2)   88.9  0.0    DIVLVVYAI (3)   11.1  100
D2-15*01    RIL*WW*LLL (1)   1.5   32.5   GYCSGGSCYS (2)   70.8  37.5   DIVVVVAAT (3)   27.7  30.0
D2-21*01    SILWW*LLF (1)    8.3   50.0   AYCGGDCYS (2)    58.3  25.0   HIVVVIAI (3)    33.3  25.0
D2-21*02    SILWW*LLF (1)    0.0   54.5   AYCGGDCYS (2)    78.0  18.2   HIVVVTAI (3)    22.0  27.3
Total       -                10.8  33.6   -                62.2  32.4   -               26.9  34.0
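The three peptide columns in the table above are the translations of one D gene in its three forward reading frames. The sketch below illustrates this for D2-2*01. The nucleotide sequence `D2_2_01` is an assumption: it does not appear in the slides and was chosen only because it reproduces the three translations listed for D2-2*01. The function name `translate_frames` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_frames(dna):
    """Translate a DNA sequence in forward reading frames 1, 2 and 3.

    Trailing nucleotides that do not fill a codon are ignored.
    """
    dna = dna.upper()
    frames = {}
    for frame in (1, 2, 3):
        sub = dna[frame - 1:]
        codons = (sub[i:i + 3] for i in range(0, len(sub) - 2, 3))
        frames[frame] = "".join(CODON_TABLE[c] for c in codons)
    return frames

# Assumed germline sequence for D2-2*01 (not given in the slides).
D2_2_01 = "AGGATATTGTAGTAGTACCAGCTGCTATGCC"

for frame, peptide in translate_frames(D2_2_01).items():
    print(frame, peptide)
```

Frame 1 contains two stop codons, frame 2 gives the hydrophilic peptide and frame 3 the hydrophobic one, matching the row for D2-2*01.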
N nucleotide dependence on end nucleotide
                 Position X+1
Position X    A       T       G       C       P-value
A             0.292   0.146   0.292   0.271   0.04
T             0.260   0.290   0.207   0.243   0.016
G             0.204   0.172   0.453   0.172   0.0004
C             0.136   0.204   0.231   0.430   <0.0001
Expected      0.210   0.201   0.292   0.298   -

N addition is not random but dependent on end nucleotide
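A table of this kind can be built by counting, for each nucleotide at position X, which nucleotide follows at position X+1, and normalising each row. The sketch below is a minimal illustration under stated assumptions: the input strings are toy data, each string is taken to be a gene-end nucleotide followed by its N additions, and `transition_frequencies` is a hypothetical name. It is not the procedure used for the slide.

```python
from collections import Counter

BASES = "ATGC"

def transition_frequencies(regions):
    """Row-normalised frequencies of nucleotide X+1 given nucleotide X.

    `regions` is an iterable of DNA strings. Returns a nested dict
    freqs[x][y] = fraction of times y follows x; rows with no
    observations are all zeros.
    """
    counts = {x: Counter() for x in BASES}
    for region in regions:
        region = region.upper()
        for x, nxt in zip(region, region[1:]):
            if x in counts and nxt in BASES:
                counts[x][nxt] += 1
    freqs = {}
    for x in BASES:
        total = sum(counts[x].values())
        freqs[x] = {y: (counts[x][y] / total if total else 0.0) for y in BASES}
    return freqs

# Toy input only, for illustration.
toy_regions = ["GGG", "GGC", "CC"]
freqs = transition_frequencies(toy_regions)
for x in BASES:
    print(x, [round(freqs[x][y], 3) for y in BASES])
```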
Trimming of gene ends
[Figure: Trimming of VH. x-axis: End Position (15, 14, 13, ..., 2, 1, P1, P2); y-axis: Number of Sequences (0 to 140); series: Observed and Predicted. Avg. 3.8 bp.]
Trimming depends on the gene end and cannot be described solely by a simple removal of one nucleotide at a time
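The slides do not state how the "Predicted" series was computed. One simple way to formalise "removal of one nucleotide at a time" is a geometric model, in which each further nucleotide is removed with a fixed probability q. The sketch below assumes that model and picks q so that the mean equals the 3.8 bp average reported for VH; the function name `geometric_trimming` is hypothetical.

```python
def geometric_trimming(mean_trim, max_trim):
    """Probability of trimming exactly k nucleotides, for k = 0..max_trim.

    Assumed null model: after each removal, one more nucleotide is removed
    with fixed probability q, so P(k) = (1 - q) * q**k and the mean is
    q / (1 - q).
    """
    q = mean_trim / (1.0 + mean_trim)
    return [(1.0 - q) * q ** k for k in range(max_trim + 1)]

probs = geometric_trimming(mean_trim=3.8, max_trim=200)
model_mean = sum(k * p for k, p in enumerate(probs))

# Expected share of sequences at the first few trimming lengths.
for k, p in enumerate(probs[:6]):
    print(k, round(p, 3))
print("model mean:", round(model_mean, 2))
```

Under this model the number of sequences falls steadily with every additional trimmed nucleotide, so gene-specific peaks in the observed counts would indicate that trimming depends on the gene end.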
VDJsolver performance
[Figure panels: Unmutated sequences; Mutated sequences. Significance markers: #: p<0.01; §: p<0.001]
Results regarding recombination and diversity and open questions
• DIR, OR15, multiple D genes and VH replacements are not used at a significant rate
• Inverted D genes are used rarely
• Not all D genes are used at the same frequency
  What determines if a D gene is used?
• D gene usage is somewhat dependent on JH gene
  Do multiple D-J recombination steps take place?
• All D gene reading frames are used at an equal rate at the recombination step
  At what step in development does the selection for the hydrophilic reading frame happen?
Results regarding recombination and diversity and open questions (cont.)
• N addition not random but dependent on end nucleotide
Does nucleotide availability or the specificity of TdT determine the N addition?
• Trimming not random but dependent on gene and sequence
Which enzyme(s) are responsible for the trimming?
Numbering Schemes
The Kabat numbering scheme is a widely adopted standard for numbering the residues in an antibody in a consistent manner. However the scheme has problems!
The Chothia numbering scheme is identical to the Kabat scheme, but places the insertions in CDR-L1 and CDR-H1 at the structurally correct positions. This means that topologically equivalent residues in these loops do get the same label (unlike the Kabat scheme).
The IMGT unique numbering for all IG and TR V-REGIONs of all species relies on the high conservation of the structure of the variable region. This numbering, set up after aligning more than 5 000 sequences, takes into account and combines the definition of the framework (FR) and complementarity determining regions (CDR), structural data from X-ray diffraction studies, and the characterization of the hypervariable loops.
http://www.bioinf.org.uk/abs/#kabatnum http://imgt.cines.fr/
Identification of CDR regions
Identifying the CDRs
CDR-L1
• Start: approx. residue 24
• Residue before: always C
• Residue after: always W. Typically WYQ, but also WLQ, WFQ, WYL
• Length: 10 to 17 residues
CDR-L2
• Start: always 16 residues after the end of CDR-L1
• Residues before: generally IY, but also VY, IK, IF
• Length: always 7 residues
CDR-L3
• Start: always 33 residues after end of CDR-L2
• Residue before: always C
• Residues after: always FGXG
• Length: 7 to 11 residues
CDR-H1
• Start: approximately residue 31 (always 9 after a C) (Chothia/AbM definition starts 5 residues earlier)
• Residues before: always CXXXXXXXX
• Residues after: always W. Typically WV, but also WI, WA
• Length: 5 to 7 residues (Kabat definition); 7 to 9 residues (Chothia definition); 10 to 12 residues (AbM definition)
CDR-H2
• Start: always 15 residues after the end of Kabat/AbM definition of CDR-H1
• Residues before: typically LEWIG, but a number of variations
• Residues after: K[RL]IVFT[AT]SIA (where residues in square brackets are alternatives at that position)
• Length: Kabat definition 16 to 19 residues (AbM definition and most recent Chothia definition ends 7 residues earlier; earlier Chothia definition starts 2 residues later and ends 9 earlier)
CDR-H3
• Start: always 33 residues after end of CDR-H2 (always 3 after a C)
• Residues before: always CXX (typically CAR)
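The CDR-L1 rules listed above (residue before is C, residues after start with W, length 10 to 17, start near residue 24) translate into a small search routine. The following is a minimal sketch, not a validated annotation tool: `find_cdr_l1`, the `tolerance` parameter and the example fragment `TOY_LIGHT_CHAIN` are assumptions introduced for illustration and do not come from the slides.

```python
# Residues allowed directly after CDR-L1, from the rules above.
FLANK_AFTER = ("WYQ", "WLQ", "WFQ", "WYL")

def find_cdr_l1(seq, approx_start=24, tolerance=4):
    """Locate CDR-L1 in a light-chain variable region sequence.

    Returns (start, end, loop) using 1-based inclusive positions,
    or None if no candidate satisfies the rules.
    """
    for start in range(approx_start - tolerance, approx_start + tolerance + 1):
        idx = start - 1                      # 0-based index of first loop residue
        if idx < 1 or seq[idx - 1] != "C":   # residue before must be C
            continue
        for length in range(10, 18):         # loop length 10 to 17 residues
            end = idx + length
            if seq[end:end + 3] in FLANK_AFTER:
                return start, start + length - 1, seq[idx:end]
    return None

# Illustrative light-chain fragment (assumption, for demonstration only).
TOY_LIGHT_CHAIN = "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIY"

result = find_cdr_l1(TOY_LIGHT_CHAIN)
print(result)
```

The shortest loop that satisfies the flanking rule is returned first; a production tool would instead rely on a full numbering scheme rather than these pattern rules alone.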