Sequence motifs, information content, logos, and HMM’s
Morten Nielsen, CBS, BioCentrum, DTU
Center for Biological Sequence Analysis, Technical University of Denmark (DTU)
Outline
• Multiple alignments and sequence motifs
• Weight matrices and consensus sequence
  – Sequence weighting
  – Low (pseudo) counts
• Information content
  – Sequence logos
  – Mutual information
• Example from the real world
• HMM’s and profile HMM’s
  – TMHMM (trans-membrane protein)
  – Gene finding
• Links to HMM packages
Multiple alignment and sequence motifs
• Core
• Consensus sequence
• Weight matrices
• Problems
  – Sequence weights
  – Low counts

----------MLEFVVEADLPGIKA--------
----------MLEFVVEFALPGIKA--------
----------MLEFVVEFDLPGIAA--------
-------------YLQDSDPDSFQD--------
---GSDTITLPCRMKQFINMWQE----------
---RNQEERLLADLMQNYDPNLR----------
-------YDPNLRPAERDSDVVNVSLK------
----------NVSLKLTLTNLISLNEREEA---
----EREEALTTNVWIEMQWCDYR---------
----------WCDYRLRWDPRDYEGLWVLR---
--LWVLRVPSTMVWRPDIVLEN-----------
------------IVLENNVDGVFEVALYCNVL-
-------------YCNVLVSPDGCIYWLPPAIF
---------PPAIFRSACSISVTYFPFDW----
             *********
Consensus:   FVVEFDLPG
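The consensus can be checked with a few lines of code. This is only a sketch: it takes the nine core columns (marked with *) of the 14 aligned sequences, counts residues per column without any weighting, and breaks ties in favour of the residue seen first. That tie rule is an assumption, not something stated on the slide.

```python
from collections import Counter

# Core 9-mers of the 14 aligned sequences (the columns marked with *).
CORES = [
    "FVVEADLPG", "FVVEFALPG", "FVVEFDLPG", "YLQDSDPDS", "MKQFINMWQ",
    "LMQNYDPNL", "PAERDSDVV", "LKLTLTNLI", "VWIEMQWCD", "YRLRWDPRD",
    "WRPDIVLEN", "VLENNVDGV", "YCNVLVSPD", "FRSACSISV",
]

def consensus(seqs):
    """Most frequent residue per column; ties go to the residue seen first."""
    out = []
    for column in zip(*seqs):
        counts = Counter(column)              # keeps first-seen order
        out.append(max(counts, key=counts.get))
    return "".join(out)

print(consensus(CORES))  # FVVEFDLPG
```

Several columns are decided by ties (for example V and R both occur three times in column 2), which is one reason the unweighted consensus is fragile.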
Sequence weighting 1 - Clustering
----------MLEFVVEADLPGIKA--------  }
----------MLEFVVEFALPGIKA--------  } Homologous sequences: weight = 1/n (here 1/3 each)
----------MLEFVVEFDLPGIAA--------  }
-------------YLQDSDPDSFQD--------
---GSDTITLPCRMKQFINMWQE----------
---RNQEERLLADLMQNYDPNLR----------
-------YDPNLRPAERDSDVVNVSLK------
----------NVSLKLTLTNLISLNEREEA---
----EREEALTTNVWIEMQWCDYR---------
----------WCDYRLRWDPRDYEGLWVLR---
--LWVLRVPSTMVWRPDIVLEN-----------
------------IVLENNVDGVFEVALYCNVL-
-------------YCNVLVSPDGCIYWLPPAIF
---------PPAIFRSACSISVTYFPFDW----
             *********
Consensus sequence (with clustering):  YRQELDPLV
Previous consensus (no weighting):     FVVEFDLPG
Sequence weighting 2 - (Henikoff & Henikoff)
• $w'_{aa} = 1/(r s)$
• r: number of different amino acids in a column
• s: number of occurrences of the amino acid in that column
• Normalize so that $\sum_{aa} w_{aa} = 1$ for each column
• Sequence weight is the sum of $w_{aa}$ over the columns

First column:
  F: r = 7 (FYMLPVW), s = 4, w' = 1/28, w = 0.055
  Y: s = 3, w' = 1/21, w = 0.073
  M, P, W: s = 1, w' = 1/7, w = 0.218
  L, V: s = 2, w' = 1/14, w = 0.109

Sequence    w
FVVEADLPG   0.37
FVVEFALPG   0.43
FVVEFDLPG   0.32
YLQDSDPDS   0.59
MKQFINMWQ   0.90
LMQNYDPNL   0.68
PAERDSDVV   0.75
LKLTLTNLI   0.85
VWIEMQWCD   0.84
YRLRWDPRD   0.51
WRPDIVLEN   0.71
VLENNVDGV   0.59
YCNVLVSPD   0.71
FRSACSISV   0.75
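A minimal sketch of the position-based weighting follows. It sums the unnormalized w' = 1/(r·s) over the nine core columns, which appears to be what the listed sequence weights correspond to (0.37, 0.43 and 0.32 for the first three sequences); the per-column normalization mentioned in the bullets is not applied here. The function name is illustrative.

```python
from collections import Counter

CORES = [
    "FVVEADLPG", "FVVEFALPG", "FVVEFDLPG", "YLQDSDPDS", "MKQFINMWQ",
    "LMQNYDPNL", "PAERDSDVV", "LKLTLTNLI", "VWIEMQWCD", "YRLRWDPRD",
    "WRPDIVLEN", "VLENNVDGV", "YCNVLVSPD", "FRSACSISV",
]

def henikoff_weights(seqs):
    """Sum over columns of 1/(r*s) for the residue each sequence carries."""
    weights = [0.0] * len(seqs)
    for column in zip(*seqs):
        counts = Counter(column)
        r = len(counts)                       # distinct residues in the column
        for k, aa in enumerate(column):
            weights[k] += 1.0 / (r * counts[aa])   # s = counts[aa]
    return weights

for seq, w in zip(CORES, henikoff_weights(CORES)):
    print(seq, round(w, 2))
```

With this definition each column contributes a total weight of 1, so the weights of all sequences add up to the number of columns (9 here).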
Low count correction
--------MLEFVVEADLPGIKA--------
--------MLEFVVEFALPGIKA--------
--------MLEFVVEFDLPGIAA--------
-----------YLQDSDPDSFQD--------
-GSDTITLPCRMKQFINMWQE----------
-RNQEERLLADLMQNYDPNLR----------
-----YDPNLRPAERDSDVVNVSLK------
--------NVSLKLTLTNLISLNEREEA---
--EREEALTTNVWIEMQWCDYR---------
--------WCDYRLRWDPRDYEGLWVLR---
LWVLRVPSTMVWRPDIVLEN-----------
----------IVLENNVDGVFEVALYCNVL-
-----------YCNVLVSPDGCIYWLPPAIF
-------PPAIFRSACSISVTYFPFDW----
           *********
           P1
• Limited number of data
• Poor sampling of sequence space
• I is not found at position P1. Does this mean that I is forbidden?
• No! Use Blosum matrix to estimate pseudo frequency of I
Low count correction using Blosum matrices

Blosum62 substitution frequencies
#    I       L       V
L    0.1154  0.3755  0.0962
V    0.1646  0.1303  0.2689

• Every time, for instance, L/V is observed, I is also likely to occur
• Estimate low (pseudo) count correction using this approach
• As more data are included, the pseudo count correction becomes less important

$N_L = 2$, $N_V = 2$, $N_{eff} = 12$ =>
$f_I = (2 \cdot 0.1154 + 2 \cdot 0.1646)/12 = 0.05$
$p_I^* = (N_{eff} \cdot p_I + \beta \cdot f_I)/(N_{eff} + \beta) = (12 \cdot 0 + 10 \cdot 0.05)/(12 + 10) = 0.02$
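The arithmetic above can be replayed in code. This is a sketch under the slide's own numbers: two L and two V observed, an effective sequence count of 12, and a weight of 10 on the pseudo frequency (called beta here). The function names are illustrative only.

```python
# Blosum62 substitution frequencies quoted above: frequency of I given L or V.
P_I_GIVEN = {"L": 0.1154, "V": 0.1646}

def pseudo_frequency(observed_counts, cond_freq, n_eff):
    """Pseudo frequency of the target residue implied by the observed residues."""
    return sum(n * cond_freq[aa] for aa, n in observed_counts.items()) / n_eff

def corrected_frequency(p_obs, f_pseudo, n_eff, beta):
    """Mix observed and pseudo frequency, weighted by n_eff and beta."""
    return (n_eff * p_obs + beta * f_pseudo) / (n_eff + beta)

f_I = pseudo_frequency({"L": 2, "V": 2}, P_I_GIVEN, n_eff=12)
p_I = corrected_frequency(p_obs=0.0, f_pseudo=f_I, n_eff=12, beta=10)
print(round(f_I, 2), round(p_I, 2))  # 0.05 0.02
```

As n_eff grows relative to beta, the corrected value approaches the observed frequency, which is the sense in which the correction becomes less important with more data.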
Information content
• Information and entropy
  – Conserved amino acid regions contain a high degree of information (high order == low entropy)
  – Variable amino acid regions contain a low degree of information (low order == high entropy)
• Shannon information: $D = \log_2 N + \sum_i p_i \log_2 p_i$ (for proteins N = 20, for DNA N = 4)
• Conserved residue: $p_A = 1$, $p_{i \neq A} = 0$, so $D = \log_2 N$ (= 4.3 for proteins)
• Variable region: $p_A = 0.05$, $p_C = 0.05$, ..., so $D = 0$
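The two limiting cases can be verified numerically. A small sketch, assuming the usual convention that a term with p = 0 contributes nothing to the sum:

```python
import math

def information_content(probs, n_symbols=20):
    """D = log2(N) + sum_i p_i log2 p_i, with 0 * log2(0) taken as 0."""
    return math.log2(n_symbols) + sum(p * math.log2(p) for p in probs if p > 0)

conserved = [1.0] + [0.0] * 19   # one residue only
variable = [0.05] * 20           # all 20 residues equally likely

print(round(information_content(conserved), 1))  # 4.3
print(abs(information_content(variable)) < 1e-9)  # True
```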
Sequence logo
• Height of a column equal to D
• Relative height of a letter is pA
• Highly useful tool to visualize sequence motifs
[Figure: sequence logo for MHC class II, built from 10 sequences, with a high information position marked]
http://www.cbs.dtu.dk/~gorodkin/appl/plogo.html
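As an illustration of the two rules (column height D, letter height proportional to its frequency), the sketch below computes letter heights for one column. It uses the raw residues of the first core position of the 14-sequence alignment shown earlier, with no sequence weighting or pseudo counts, so the numbers are for illustration only.

```python
import math
from collections import Counter

def logo_column(residues, n_symbols=20):
    """Return (D, {residue: letter height}) for one alignment column."""
    counts = Counter(residues)
    total = sum(counts.values())
    probs = {aa: n / total for aa, n in counts.items()}
    d = math.log2(n_symbols) + sum(p * math.log2(p) for p in probs.values())
    # Each letter takes a share of the column height equal to its frequency.
    return d, {aa: p * d for aa, p in probs.items()}

d, heights = logo_column("FFFYMLPLVYWVYF")   # first core position, 14 sequences
for aa, h in sorted(heights.items(), key=lambda item: -item[1]):
    print(aa, round(h, 2))
```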
More on logos
• Information content: $D = \sum_i p_i \log_2 (p_i/q_i)$
• Shannon, $q_i = 1/N = 0.05$:
  $D = \sum_i p_i \log_2 p_i - \sum_i p_i \log_2 (1/N) = \log_2 N + \sum_i p_i \log_2 p_i$
• Kullback-Leibler, $q_i$ = background frequency
  – V/L/A are more frequent than, for instance, C/H/W

Frequency matrix (values in percent, one row per motif position)
    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
P1  2  1  1  1  1  1  1  1  1  4 16  1  6 15  7  1  2  7 18 13
P2  8 19  1  1  7  2  2  2  1  3 15 13  6  2  1  2  2  7  1  8
P3  3  2  7  2  1 17 13  2  1  8 14  3  1  1  7  7  2  0  1  8
P4  8 13 13 14  1  2 13  2  1  2  3  3  1  7  1  3  7  0  1  7
P5  4  1  7  7  7  1  2  2  1 13 15  2  6  6  1  7  2  7  7  4
P6  5  2  8 23  1  6  3  2  1  3  3  2  1  1  1 13  8  0  1 18
P7  2  1  7 13  1  1  2  2  1  8 14  2  6  1 20  7  2  7  1  3
P8  3  7  7  8  7  1  7  8  1  2  8  2  1  1 13  7  2  7  1  7
P9  3  2  7 19  1  6  2  8  1  9  9  2  1  1  1  7  2  0  1 18
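The relation between the two definitions can be checked with a short sketch: with a uniform background the Kullback-Leibler form gives the same value as the Shannon form, and a residue that is common in the background carries less information. The non-uniform background used here is hypothetical, chosen only to show the effect.

```python
import math

def kl_information(p, q):
    """D = sum_i p_i log2(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

N = 20
conserved = [1.0] + [0.0] * (N - 1)
uniform = [1.0 / N] * N
# Hypothetical background in which the conserved residue is twice as common.
skewed = [0.10] + [0.90 / (N - 1)] * (N - 1)

shannon = math.log2(N) + sum(pi * math.log2(pi) for pi in conserved if pi > 0)
print(round(kl_information(conserved, uniform), 2), round(shannon, 2))  # 4.32 4.32
print(round(kl_information(conserved, skewed), 2))                      # 3.32
```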
Mutual information
$I(i,j) = \sum_{aa_i} \sum_{aa_j} P(aa_i, aa_j) \log \frac{P(aa_i, aa_j)}{P(aa_i) P(aa_j)}$

P(G1) = 2/9 = 0.22, ..
P(V6) = 4/9 = 0.44, ..
P(G1,V6) = 2/9 = 0.22, P(G1)*P(V6) = 8/81 = 0.10
log(0.22/0.10) > 0

P1   P6
ALWGFFPVA
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
YMNGTMSQV
GILGFVFTL
WLSLLVPFV
FLPSDFFPS
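A minimal sketch of the mutual information between two positions, using the nine peptides listed above and plain frequency counts. The slide does not state the base of the logarithm; base 2 (bits) is assumed here.

```python
import math
from collections import Counter

PEPTIDES = [
    "ALWGFFPVA", "ILKEPVHGV", "ILGFVFTLT", "LLFGYPVYV", "GLSPTVWLS",
    "YMNGTMSQV", "GILGFVFTL", "WLSLLVPFV", "FLPSDFFPS",
]

def mutual_information(peptides, i, j):
    """I(i, j) in bits between the 0-based columns i and j."""
    n = len(peptides)
    p_i = Counter(p[i] for p in peptides)
    p_j = Counter(p[j] for p in peptides)
    p_ij = Counter((p[i], p[j]) for p in peptides)
    # Only observed pairs contribute; unobserved pairs have P = 0.
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )

print(mutual_information(PEPTIDES, 0, 5))   # P1 versus P6
```

With only nine peptides the estimate is dominated by sampling noise, so it is best read as an illustration of the formula rather than evidence of a real correlation.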
Mutual information
[Figure: mutual information computed for 313 binding peptides and for 313 random peptides]
Mutual information at anchor position is low
• Mutual information between the anchor positions 2 and 9 and other residues is low
  – At position 2, L, M, T, V and I are the most frequent amino acids
  – At position 9, V, L, I and A are the most frequent
  – 313 peptides (Rammensee + Buus)
• P(L2) = 0.51, P(V9) = 0.48, P(L2,V9) = 0.23
• P(L2,V9)/(P(L2)*P(V9)) = 0.23/0.24 = 1.0, so the log term is 0
• Knowing that we have L at position 2 does not tell us which one of V, L or I is placed at position 9 => NO mutual information
Weight matrices
• Estimate amino acid frequencies from the alignment, including sequence weighting and pseudo counts
• A weight matrix is then given as
  $W_{ij} = \log(p_{ij}/q_j)$
• Here i is a position in the motif and j an amino acid; $q_j$ is the background frequency of amino acid j
• W is an L x 20 matrix, where L is the motif length
• Score a sequence against the weight matrix by looking up and adding L values from the matrix
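The look-up-and-add scoring can be sketched as follows. Everything numeric here is made up for illustration: a toy motif of length 2, a flat background of 0.05, and base-2 logarithms (the slide does not fix the base).

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def weight_matrix(freqs, background):
    """W[i][a] = log2(p_ia / q_a) for every motif position i and amino acid a."""
    return [{a: math.log2(p[a] / background[a]) for a in AMINO_ACIDS} for p in freqs]

def score(peptide, W):
    """Add one matrix value per position of the peptide."""
    return sum(W[i][a] for i, a in enumerate(peptide))

# Toy example (hypothetical numbers): position 1 prefers L, position 2 is uninformative.
q = {a: 0.05 for a in AMINO_ACIDS}
p1 = {a: 0.02 for a in AMINO_ACIDS}
p1["L"] = 0.62                       # 19 * 0.02 + 0.62 = 1
p2 = dict(q)
W = weight_matrix([p1, p2], q)

print(round(score("LA", W), 2), round(score("AA", W), 2))  # 3.63 -1.32
```

Positive scores mean the peptide looks more like the motif than like background; pseudo counts matter because a frequency of exactly 0 would make the logarithm undefined.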
Example from real life
• 10 peptides from MHCpep database
• Bind to the MHC complex
• Relevant for immune system recognition
• Estimate sequence motif and weight matrix
• Evaluate on 528 peptides
ALAKAAAAM
ALAKAAAAN
ALAKAAAAR
ALAKAAAAT
ALAKAAAAV
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV
Example (cont.)
• Raw sequence counting
  – No sequence weighting
  – No pseudo count
  – Prediction accuracy 0.45
• Sequence weighting
  – No pseudo count
  – Prediction accuracy 0.5
Example (cont.)
• Sequence weighting and pseudo count
  – Prediction accuracy 0.60
• Motif found on all data (485)
  – Prediction accuracy 0.79
Hidden Markov Models
• Weight matrices do not deal with insertions and deletions
• In alignments, this is done in an ad hoc manner by optimizing two gap penalties, one for opening a gap and one for extending it
• An HMM is a natural framework in which insertions and deletions are dealt with explicitly
HMM (a simple example)
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC
• Example from A. Krogh
• Core region defines the number of states in the HMM (red)
• Insertion and deletion statistics are derived from the non-core part of the alignment (blue)
Core of alignment
HMM construction
[Figure: state diagram of the HMM built from the alignment. Each state carries an emission distribution over A, C, G and T, and the arrows carry transition probabilities (1.0, 0.6 and 0.4).]

ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC

• 5 matches: A, 2xC, T, G
• 5 transitions in gap region
  – C out, G out
  – A-C, C-T, T out
  – Out transition 3/5
  – Stay transition 2/5

ACA---ATG  0.8x1x0.8x1x0.8x0.4x1x0.8x1x0.2 = 3.3 x 10^-2
Align sequence to HMM
ACA---ATG  0.8x1x0.8x1x0.8x0.4x1x0.8x1x0.2 = 3.3 x 10^-2
TCAACTATC  0.2x1x0.8x1x0.8x0.6x0.2x0.4x0.4x0.4x0.2x0.6x1x1x0.8x1x0.8 = 0.0075 x 10^-2
ACAC--AGC  = 1.2 x 10^-2
AGA---ATC  = 3.3 x 10^-2
ACCG--ATC  = 0.59 x 10^-2
Consensus:
ACAC--ATC  = 4.7 x 10^-2
ACA---ATC  = 13.1 x 10^-2
Exceptional:
TGCT--AGG  = 0.0023 x 10^-2
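The listed probabilities can be reproduced with a small sketch of the model. The emission and transition values below are derived here by counting in the five-sequence alignment (an assumption about how the figure was built, although the products agree with the numbers above). The function expects the 9-column aligned form, with columns 4 to 6 being the insert region.

```python
# Match-state emissions, one dict per core column of the five-sequence alignment.
MATCH = [
    {"A": 0.8, "T": 0.2},
    {"C": 0.8, "G": 0.2},
    {"A": 0.8, "C": 0.2},
    {"A": 1.0},
    {"T": 0.8, "G": 0.2},
    {"C": 0.8, "G": 0.2},
]
INSERT = {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2}   # 5 inserted residues: A, 2xC, T, G
TO_INSERT, SKIP_INSERT = 0.6, 0.4    # leaving the third match state
STAY, LEAVE = 0.4, 0.6               # inside the insert state

def path_probability(aligned):
    """Probability of an aligned 9-column sequence such as 'ACA---ATG'."""
    head, gap, tail = aligned[:3], aligned[3:6].replace("-", ""), aligned[6:]
    p = 1.0
    for state, base in zip(MATCH, head + tail):
        p *= state.get(base, 0.0)     # match emissions; other transitions are 1
    if gap:
        p *= TO_INSERT * STAY ** (len(gap) - 1) * LEAVE
        for base in gap:
            p *= INSERT[base]
    else:
        p *= SKIP_INSERT
    return p

for seq in ["ACA---ATG", "TCAACTATC", "ACAC--AGC", "ACA---ATC", "TGCT--AGG"]:
    print(seq, f"{path_probability(seq) * 100:.4f} x 10^-2")
```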
Align sequence to HMM - Null model
• Score depends strongly on length
• Null model is a random model. For length L the score is 0.25^L
• Log-odds score for sequence S: log(P(S)/0.25^L)

ACA---ATG = 4.9
TCAACTATC = 3.0
ACAC--AGC = 5.3
AGA---ATC = 4.9
ACCG--ATC = 4.6
Consensus:
ACAC--ATC = 6.7
ACA---ATC = 6.3
Exceptional:
TGCT--AGG = -0.97
Note! With the null model, ACAC--ATC (6.7) scores higher than ACA---ATC (6.3), the reverse of the raw probabilities.
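The length correction can be replayed from the probabilities quoted on the previous slide. This sketch assumes the natural logarithm, which is what the listed scores appear to correspond to, and uses the rounded probabilities as printed, so results agree with the slide only to within that rounding.

```python
import math

def log_odds(p_model, length, p_null=0.25):
    """Natural-log odds of the HMM probability against a uniform null model."""
    return math.log(p_model / p_null ** length)

# (probability as quoted on the previous slide, number of residues)
EXAMPLES = {
    "ACA---ATG": (3.3e-2, 6),
    "TCAACTATC": (0.0075e-2, 9),
    "ACAC--ATC": (4.7e-2, 7),
    "ACA---ATC": (13.1e-2, 6),
    "TGCT--AGG": (0.0023e-2, 7),
}

for seq, (p, n) in EXAMPLES.items():
    print(seq, round(log_odds(p, n), 2))
```

A score below zero, as for TGCT--AGG, means the sequence is better explained by the random model than by the HMM.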
HMM’s and weight matrices
• In the case of un-gapped alignments, HMM’s become simple weight matrices
• It still might be useful to use an HMM tool package to estimate a weight matrix
  – Sequence weighting
  – Pseudo counts
Profile HMM’s
• Alignments based on conventional scoring matrices (BLOSUM62) score all positions in a sequence in an equal manner
• Some positions are highly conserved, some are highly flexible (more than what is described in the BLOSUM matrix)
• Profile HMM’s are ideally suited to describe such position specific variations
Example: Sequence profiles
• Alignment of 1PLC._ to 1GYC.A
• Blast e-value > 1000
• Profile alignment
  – Align 1PLC._ against Swiss-prot
  – Make position specific weight matrix from alignment
  – Use this matrix to align 1PLC._ against 1GYC.A
• E-value > 10^-22. Rmsd=3.3
Example continued
 Score = 97.1 bits (241), Expect = 9e-22
 Identities = 13/107 (12%), Positives = 27/107 (25%), Gaps = 17/107 (15%)

Query: 3  VLLGADDGSLAFVPSEFSISPGEKI------VFKNNAGFPHNIVFDEDSIPSGVDASKIS 56
          V+ G F + G++ N+ + +G + +
Sbjct: 26 VVNG------VFPSPLITGKKGDRFQLNVVDTLTNHTMLKSTSIHWHGFFQAGTNWADGP 79

Query: 57 MSEEDLLNAKGETFEVAL---SNKGEYSFYCSP--HQGAGMVGKVTV 98
          A G +F G + ++ G+ G V
Sbjct: 80 AFVNQCPIASGHSFLYDFHVPDQAGTFWYHSHLSTQYCDGLRGPFVV 126

Rmsd=3.3
Profile HMM’s
EM55_HUMAN WWQGRVEGSSKESAGLIPSPELQEWRVASMAQSAP--SEAPSCSPFGKKKK-YKDKYLAK
CSKP_HUMAN WWQGKLENSKNGTAGLIPSPELQEWRVACIAMEKTKQEQQASCTWFGKKKKQYKDKYLAK
KAPB_MOUSE -----PENLLIDHQGYIQVTDFGFAKRVKG------------------------------
NRC2_NEUCR -----PENILLHQSGHIMLSDFDLSKQSDPGGKPTMIIGKNGTSTSSLPTIDTKSCIANF

EM55_HUMAN HSSIFDQLDVVSYEEVVRLPAFKRKTLVLIGASGVGRSHIKNALLSQNPEKFVYPVPYTT
CSKP_HUMAN HNAVFDQLDLVTYEEVVKLPAFKRKTLVLLGAHGVGRRHIKNTLITKHPDRFAYPIPHTT
KAPB_MOUSE RTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSG
NRC2_NEUCR RTNSFVGTEEYIAPEVIKGSGHTSAVDWWTLGILIYEMLYGTTPFKGKNRNATFANILRE

EM55_HUMAN RPPRKSEEDGKEYHFISTEEMTRNISANEFLEFGSYQGNMFGTKFETVHQIHKQNKIAIL
CSKP_HUMAN RPPKKDEENGKNYYFVSHDQMMQDISNNEYLEYGSHEDAMYGTKLETIRKIHEQGLIAIL
KAPB_MOUSE KVRFPSHF-----SSDLKDLLRNLLQVDLTKRFGNLKNGVSDIKTHKWFATTDWIAIYQR
NRC2_NEUCR DIPFPDHAGAPQISNLCKSLIRKLLIKDENRRLG-ARAGASDIKTHPFFRTTQWALI--R

EM55_HUMAN NNGVDETLKKLQEAFDQACSSPQWVPVSWVY
CSKP_HUMAN NNEIDETIRHLEEAVELVCTAPQWVPVSWVY
KAPB_MOUSE EKCGKEFCEF---------------------
NRC2_NEUCR ENAVDPFEEFNSVTLHHDGDEEYHSDAYEKR

[Labels in the figure mark regions of the alignment as: Insertion, Deletion, Conserved]
Profile HMM’s
All M/D pairs must be visited once
TMHMM (trans-membrane HMM)
(Sonnhammer, von Heijne, and Krogh)
Model TM length distribution. Power of HMM. Difficult in alignment.
Combination of HMM’s - Gene finding
[Figure: gene-finding HMM combining sub-models; sequence pattern: x ccc xxxxxxxxATGccc cccTAAxxxxxxxx]
Labels: inter-genic region, region around start codon, start codon, coding region, region around stop codon, stop codon
HMM packages
• HMMER (http://hmmer.wustl.edu/)
  – S.R. Eddy, WashU St. Louis. Freely available.
• SAM (http://www.cse.ucsc.edu/research/compbio/sam.html)
  – R. Hughey, K. Karplus, A. Krogh, D. Haussler and others, UC Santa Cruz. Freely available to academia, nominal license fee for commercial users.
• META-MEME (http://metameme.sdsc.edu/)
  – William Noble Grundy, UC San Diego. Freely available. Combines features of PSSM search and profile HMM search.
• NET-ID, HMMpro (http://www.netid.com/html/hmmpro.html)
  – Freely available to academia, nominal license fee for commercial users.
  – Allows HMM architecture construction.
Simple Hmmer command
hmmbuild --gapmax 0.0 --fast A2.hmmer A2.fsa

hmmbuild - build a hidden Markov model from an alignment
HMMER 2.2g (August 2001)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Alignment file:                     A2.fsa
File format:                        a2m
Search algorithm configuration:     Multiple domain (hmmls)
Model construction strategy:        Fast/ad hoc (gapmax 0.0)
Null model used:                    (default)
Sequence weighting method:          G/S/C tree weights
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Alignment:           #1
Number of sequences: 232
Number of columns:   9
Determining effective sequence number ... done. [192]
Weighting sequences heuristically     ... done.
Constructing model architecture       ... done.
Converting counts to probabilities    ... done.
Setting model name, etc.              ... done. [A2.fasta]
Constructed a profile HMM (length 9)
Average score:     -6.42 bits
Minimum score:    -15.47 bits
Maximum score:     -0.84 bits
Std. deviation:     2.72 bits

>HLA-A.0201 16 Example_for_Ligand
SLLPAIVEL
>HLA-A.0201 16 Example_for_Ligand
YLLPAIVHI
>HLA-A.0201 16 Example_for_Ligand
TLWVDPYEV
>HLA-A.0201 16 Example_for_Ligand
SXPSGGXGV
>HLA-A.0201 16 Example_for_Ligand
GLVPFLVSV