TRANSCRIPT
Using CRISPRs in micro-evolution studies
Algorithmique, combinatoire du texte et applications en bio-informatique
Institut de Génétique et Microbiologie
GPMS: Génomes Polymorphisme et Minisatellites
http://minisatellites.u-psud.fr/
Supervised by: Christine POURCEL, Gilles VERGNAUD
Prepared by: Ibtissem GRISSA
28/09/2007
Outline of the talk
• Background
- CRISPR properties
- Bacterial defense system
• Results
- Bioinformatics tools: CRISPRFinder, CRISPRdb
- Micro-evolution studies
Clustered Regularly Interspaced Short Palindromic Repeats
CASS: CRISPR + cas
• Structure:
[Figure: CRISPR locus showing the cas genes, the leader (AT-rich), the direct repeats (DR, 24 – 47 bp) separated by spacers, and the degenerated DR]
TTTGATTATTGCCTGTGCGGCAGTGAACTCAGGGGACTGGCGAACAATGTCTTTCATGATTTTCTAAGCTGCCTGTGCGGCAGTGAACGAAAAGGTAAGATGGGCAAGCTTCTAGTAGTTTTTCTAAGCTGCCTGTGCAGCAGTGAACATTATCTGAATGGCATTTTCTTTGGCGCAGATTTTCTAAGCTGCCTGTGCGGCAGTGAACAGTAAGATAATACGATAACATCCTGTTTGTAAAATACTTAT
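The DR/spacer structure of the sequence above can be made explicit with a few lines of Python. This is only a sketch: the function name `split_array` is hypothetical, and it assumes that the first (degenerated) DR has the same length as the consensus DR and that all other DRs match the consensus exactly.

```python
# Consensus direct repeat (DR) of the Y. pestis array shown above.
DR = "TTTCTAAGCTGCCTGTGCGGCAGTGAAC"


def split_array(locus, dr=DR):
    """Split a CRISPR locus into its spacers and the trailing flank.

    Assumption: the locus starts with a degenerated DR of the same length
    as the consensus, and every later DR is an exact consensus copy.
    """
    parts = locus.split(dr)
    if len(parts) < 2:
        raise ValueError("consensus DR not found in locus")
    first_spacer = parts[0][len(dr):]  # drop the degenerated first DR
    spacers = [first_spacer] + parts[1:-1]
    return spacers, parts[-1]


# The slide sequence, rebuilt piece by piece so each element is visible.
FIRST_DR = "TTTGATTATTGCCTGTGCGGCAGTGAAC"  # degenerated copy of the DR
SPACERS = [
    "TCAGGGGACTGGCGAACAATGTCTTTCATGAT",
    "GAAAAGGTAAGATGGGCAAGCTTCTAGTAGTT",
    "ATTATCTGAATGGCATTTTCTTTGGCGCAGAT",
]
FLANK = "AGTAAGATAATACGATAACATCCTGTTTGTAAAATACTTAT"
locus = FIRST_DR + DR.join(SPACERS) + DR + FLANK

spacers, flank = split_array(locus)
for spacer in spacers:
    print(len(spacer), spacer)
```

Splitting on the exact consensus is the simplest possible choice; a real tool would have to tolerate mismatches inside the repeats.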
• Observed in prokaryotic genomes:
- almost all archaea (29/31)
- 40% of eubacteria (156/391)
Examples of CRISPRs
Shewanella sp. ANA-3 (CRISPR_2), positions 4626121 to 4626448
1 AGGTTTTGCTGCCTTTTCGGCGGGTATC TCAAAGTCAACTTGTAAATGACGATTTTCACG 32
2 ATTTTCAGCTGCCTATTCGGCAGGTCAC AGTTTGGGGCTGAGTTTGCCATTTTCCTAAAT 32
3 ATTTTCAGCTGCCTATTCGGCAGGTCAC GATGAAGCAGACCACCTCGATTACCCCACGCT 32
4 ATTTTCAGCTGCCTATTCGGCAGGTCAC ACTATTTATCAAGACCTTCTTTAAAATCAAAC 32
5 ATTTTCAGCTGCCTATTCGGCAGGTCAC AGTTTGGGGCTGAGTTTGCCATTTTCCTAAAC 32
6 ATTTTCAGCTGCCTATTCGGCAGGTCAC
   **  **       *      *   **

Yersinia pestis KIM (CRISPR_4), positions 2875721 to 2875928
1 TTATTGGGCTGCCTGTGCGGCAGTGAAC GTTATACCCCGCGCAGGGAGTGAAGCGTTGAC 32
2 TTTCTAAGCTGCCTGTGCGGCAGTGAAC TTAAGTTCTTTTTGTCAGCATCTTTAATAAAT 32
3 TTTCTAAGCTGCCTGTGCGGCAGTGAAC CTGAAATACAAATAAAATAAATCGTCGAACAT 32
4 TTTCTAAGCTGCCTGTGCGGCAGTGAAC
    ** **

Sulfolobus tokodaii str. 7 (CRISPR_2), positions 32702 to 39896
  7 GATGAATCCCAAAAGGAATTGAAAG TGATTGATCACAATGAGAAGACTGTAAAGCTGATAAAC 38
  8 GATGAATCCCAAAAGGAATTGAAAG TGTTGAGGCATAAATTAATCTATCCTTAATGAAAAAT 37
  9 GATGAATCCCAAAAGGAATTGAAAG TTCTTCCTCAGCCTCCATTTTGTTTATGATTTGTAGTGCC 40
 10 GATGAATCCCAAAAGGAATTGAAAG TTCAATAATCTCTATCTTTCCAAAATCTGTAAATGAAGAC 40
...
109 GATGAATCCCAAAAGGAATTGAAAG AAAGCACAGTCAATAACGTTATCTGGTATCATATTATCAAA 41
110 GATGAATCCCAAAAGGAATTGAAAG CTTTCTCCTTCCCTCTGATCTCTCGCTGAATTGAAAAGA 39
111 GATGAATCCCAAAAGGAATTGAAAG GTAAGTATTGATGCTAACATTGACTTCGCTGTCCCAGGGGC 41
112 GATGAATCCCAAAAGGAATTGAAGG AAGTATAATAACGATAGTACTAAAATTAATTGATCC 36
113 GATGATTCTCAAAAGGAATTGATAA
         *  *             ***
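The asterisks in these examples flag DR positions that deviate from the consensus repeat. A minimal sketch of how such marks can be computed is given below; the function names are hypothetical and the two repeats are assumed to have equal length (no insertions or deletions).

```python
def mismatch_positions(consensus, dr):
    """Return the 1-based positions where a DR differs from the consensus."""
    if len(consensus) != len(dr):
        raise ValueError("this sketch only compares repeats of equal length")
    return [i for i, (a, b) in enumerate(zip(consensus, dr), start=1) if a != b]


def mismatch_markers(consensus, dr):
    """Return a line with '*' under every deviating position."""
    positions = set(mismatch_positions(consensus, dr))
    return "".join("*" if i in positions else " "
                   for i in range(1, len(consensus) + 1))


# First (degenerated) DR of each array versus the repeated DR.
yp_consensus = "TTTCTAAGCTGCCTGTGCGGCAGTGAAC"   # Yersinia pestis KIM
yp_first_dr = "TTATTGGGCTGCCTGTGCGGCAGTGAAC"
sh_consensus = "ATTTTCAGCTGCCTATTCGGCAGGTCAC"   # Shewanella sp. ANA-3
sh_first_dr = "AGGTTTTGCTGCCTTTTCGGCGGGTATC"

print(yp_first_dr)
print(mismatch_markers(yp_consensus, yp_first_dr))
print(mismatch_positions(sh_consensus, sh_first_dr))
```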
1 to 16 CRISPRs
1 to 248 motifs
CRISPRs: a Bacterial Defense system
• CRISPR spacers generally originate from mobile elements (plasmids, phages) (Y. pestis, S. thermophilus, S. solfataricus, S. pyogenes…)
• CRISPRs are transcribed and subsequently processed as micro RNAs (owing to the cas gene machinery): RNA interference (RNAi) system to block phage reproduction.
Cas proteins and CRISPR spacer sequences constitute a bacterial immune system that works by a mechanism similar to that of RNAi in higher organisms
Bacterial Defense against phage invaders
"CRISPR Provides Acquired Resistance Against Viruses in Prokaryotes", Barrangou, Horvath et al, Science 2007
System: Streptococcus thermophilus (used to make yogurt and cheese)
- Infection with phage leads to incorporation of phage-related spacers within CRISPR1
- Such bacteria become resistant to further infection by similar phage strains
- If the spacer is taken out, the resistance is lost
- At least one cas gene is necessary for resistance to phage
- At least one cas gene is necessary to generate phage-resistant bacteria
Producing more phage-resistant bacterial strains for industrial use?
CRISPRFinder tool
- CRISPRs can be found relatively easily using existing software tools
BUT
- Output not appropriate for this purpose
- Background (tandem repeats, ...) calls for further postprocessing and manual curation
- Difficulty in defining the DR consensus endpoints + degenerated DR
- Sensitivity (short repeats are generally neglected)
- Absence of Web tool (easy and intuitive)
Dedicated software tool for the identification and preliminary analysis of CRISPRs
- Precision
- Intuitive and easily used
- Web service
CRISPRFinder Workflow
[Workflow diagram: Sequence(s); maximal repeats (found with Vmatch (Reputer)); CRISPR possible localizations; identification of candidate DRs; CRISPR structure check; tandem repeats elimination; output as confirmed CRISPRs or questionable CRISPRs]
Size constraints shown in the diagram: DR 23bp - 55bp; spacer 25bp - 60bp; spacer length within [0.6DR - 2.5DR]
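The size constraints of the structure check can be written down as a small filter. The sketch below is an illustration only: the thresholds are the ones shown on the slide, but the function name and the way the three tests are combined are assumptions, not CRISPRFinder's actual code.

```python
# Thresholds read from the workflow slide.
DR_MIN, DR_MAX = 23, 55            # DR length in bp
SPACER_MIN, SPACER_MAX = 25, 60    # spacer length in bp
RATIO_MIN, RATIO_MAX = 0.6, 2.5    # spacer length relative to DR length


def passes_structure_check(dr_len, spacer_lens):
    """Return True if a candidate array respects all size constraints.

    Assumption: every spacer must satisfy both the absolute range and
    the range relative to the DR length.
    """
    if not spacer_lens:
        return False
    if not DR_MIN <= dr_len <= DR_MAX:
        return False
    for spacer_len in spacer_lens:
        if not SPACER_MIN <= spacer_len <= SPACER_MAX:
            return False
        if not RATIO_MIN * dr_len <= spacer_len <= RATIO_MAX * dr_len:
            return False
    return True


# Y. pestis-like candidate: 28 bp DR, 32 bp spacers.
print(passes_structure_check(28, [32, 32, 32]))
# Spacer far too long for the repeat: looks more like unrelated repeats.
print(passes_structure_check(28, [32, 80]))
```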
CRISPRFinder Output
http://crispr.u-psud.fr/Server/CRISPRfinder.php
Spacer dictionary creator
Example of use: the micro-evolution of the Y. pestis species
Evolution of the CRISPR structure
• Spacer gain: polarized insertion adjacent to the leader sequence
• Interstitial loss of spacers by recombination between 2 DRs
• Conservation of spacer order
[Figure: array "a DR b c d Leader"; spacer acquisition of e with last DR duplication]
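The three rules above can be mimicked with list operations. In this sketch an array is a Python list of spacer names written in the orientation of the figure (oldest spacer first, leader at the right end); the function names are hypothetical.

```python
def acquire(array, spacer):
    """Polarized gain: the new spacer is added next to the leader."""
    return array + [spacer]


def interstitial_loss(array, start, stop):
    """Recombination between two DRs removes the spacers array[start:stop]."""
    if not 0 <= start < stop <= len(array):
        raise ValueError("invalid deletion interval")
    return array[:start] + array[stop:]


def keeps_order(derived, ancestral):
    """True if the spacers shared by both arrays appear in the same order."""
    shared_in_derived = [s for s in derived if s in ancestral]
    shared_in_ancestral = [s for s in ancestral if s in derived]
    return shared_in_derived == shared_in_ancestral


array = list("abcd")
after_gain = acquire(array, "e")                  # a b c d e
after_loss = interstitial_loss(after_gain, 1, 3)  # a d e
print(after_gain, after_loss, keeps_order(after_loss, after_gain))
```

Because gains happen only at one end and losses never reorder the remaining spacers, the order check holds for every array produced by these two operations.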
CRISPR YP1 in three different strains

Strain Java9
tttgattatTGCCTGTGCGGCAGTGAACATATTCTCGAGCGATAGCAATAGCCATTCCAC e
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCGGTCAAACAAATTTAGGCGACGATTTAACA f
TTTCTAAGCTGCCTGTGCGGCAGTGAACAAAAAGAATTTGGGATTAAAGTTACCCATCAG g
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCAATGCCTGAATCTCTGGCGTGATAGCTGCGG h
TTTCTAAGCTGCCTGTGCGGCAGTGAACAGTAAGATAATACGATAACATCCTGTTTGTAA

Strain 02-449
tttgattatTGCCTGTGCGGCAGTGAACTCAGGGGACTGGCGAACAATGTCTTTCATGAT a
TTTCTAAGCTGCCTGTGCGGCAGTGAACGAAAAGGTAAGATGGGCAAGCTTCTAGTAGTT b
TTTCTAAGCTGCCTGTGCGGCAGTGAACATTATCTGAATGGCATTTTCTTTGGCGCAGAT c
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCGCCATTCCGTGAACCTGAGCGCGTTCGCGA d
TTTCTAAGCTGCCTGTGCGGCAGTGAACATATTCTCGAGCGATAGCAATAGCCATTCCAC e
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCGGTCAAACAAATTTAGGCGACGATTTAACA f
TTTCTAAGCTGCCTGTGCGGCAGTGAACAAAAAGAATTTGGGATTAAAGTTACCCATCAG g
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCAATGCCTGAATCTCTGGCGTGATAGCTGCGG h
TTTCTAAGCTGCCTGTGCGGCAGTGAACACGTCATCCTGAAGGCTAGGCAGCTCGGCTTC o
TTTCTAAGCTGCCTGTGCGGCAGTGAACAGTAAGATAATACGATAACATCCTGTTTGTAA

Strain 195P
tttgattatTGCCTGTGCGGCAGTGAACTCAGGGGACTGGCGAACAATGTCTTTCATGAT a
TTTCTAAGCTGCCTGTGCGGCAGTGAACGAAAAGGTAAGATGGGCAAGCTTCTAGTAGTT b
TTTCTAAGCTGCCTGTGCGGCAGTGAACATTATCTGAATGGCATTTTCTTTGGCGCAGAT c
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCGCCATTCCGTGAACCTGAGCGCGTTCGCGA d
TTTCTAAGCTGCCTGTGCGGCAGTGAACATATTCTCGAGCGATAGCAATAGCCATTCCAC e
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCGGTCAAACAAATTTAGGCGACGATTTAACA f
TTTCTAAGCTGCCTGTGCGGCAGTGAACAAAAAGAATTTGGGATTAAAGTTACCCATCAG g
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCAATGCCTGAATCTCTGGCGTGATAGCTGCGG h
TTTCTAAGCTGCCTGTGCGGCAGTGAACACGTCATCCTGAAGGCTAGGCAGCTCGGCTTC o
TTTCTAAGCTGCCTGTGCGGCAGTGAACGAAATTGTGGGTGTAGATGTTGCAGACGCCTC v
TTTCTAAGCTGCCTGTGCGGCAGTGAACTCTGACGTTGCCTGTGTTGCCGCTCTCGTATT w
TTTCTAAGCTGCCTGTGCGGCAGTGAACAGTAAGATAATACGATAACATCCTGTTTGTAA
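[Figure: map of the Yersinia pestis CO92 genome (4,653,728 bp) marking Ori and Ter, the CRISPR loci YP1 (2769), YP2 (2895) and YP3 (1773), a prophage (2363-2409), and the labels metA, aceB, atpA, AspC, nuoM, phosphotransferase and Usher protein]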
http://crispr.u-psud.fr/crispr/MultipleAnalysis/CRISPRdetector1.php
A phylogenetic tool?
Identical spacers point to a common ancestor
- Spoligotyping in M. tuberculosis (inactive CRISPR)
strain s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16 s17 s18 s19 s20 s21 s22 s23 s24 s25 s26 s27 s28 s29 s30 s31 s32 s33 s34 s35 s36 s37 s38 s39 s40 s41 s42 s43
199 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
18 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
314 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
310 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
312 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
207 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
307 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
171 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
318 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
304 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
306 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
303 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
334 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
57 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
Spoligotyping of M. tuberculosis
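Binary spoligotype patterns such as the ones in this table are easy to compare position by position. The sketch below encodes three rows of the table (strains 199, 312 and 307, as transcribed here) and counts the spacers at which two patterns differ; the function names are hypothetical, and a plain Hamming distance is only one possible way to compare profiles.

```python
# Patterns for s1..s43, written as runs of 1 (present) and 0 (absent).
SPOLIGOTYPES = {
    "199": "1" * 6 + "0" * 3 + "1" * 12 + "0" * 3 + "1" * 14 + "0" + "1" * 4,
    "312": "1" * 6 + "0" * 3 + "1" * 12 + "0" * 3 + "1" * 14 + "00" + "1" * 3,
    "307": "1" * 6 + "0" * 3 + "1" * 29 + "0" + "1" * 4,
}


def differing_spacers(p, q):
    """Return the names (s1, s2, ...) of the spacers where two patterns differ."""
    if len(p) != len(q):
        raise ValueError("patterns must cover the same spacers")
    return [f"s{i}" for i, (a, b) in enumerate(zip(p, q), start=1) if a != b]


def hamming(p, q):
    """Number of spacers whose presence/absence differs."""
    return len(differing_spacers(p, q))


print(differing_spacers(SPOLIGOTYPES["199"], SPOLIGOTYPES["312"]))
print(hamming(SPOLIGOTYPES["199"], SPOLIGOTYPES["307"]))
```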
Evolution of the Y. pestis CRISPR
CRISPR YP1 evolution (Pourcel et al 2005)
Y. pestis:
- 109 sequenced alleles
- 29 spacers
Tentative evolutionary scenario for YP1 CRISPR
geno 1: abcdemn
pestoides Georgia
a2j2b2k2l2m2
geno 10, 26, 33, 34, 35, 38, 45: abcdefghl/q/r/s/u/x/y/z
geno 11 to 13: efgh
geno 37: abcdfgh
geno 47: abcdefghovw
geno 49: abcdefghop
geno 44, 48, 50: abcdefgho
geno 9, 14 to 25, 27 to 32, 36, 39 to 43, 46, 51: abcdefgh
orientalis
a2b2c2d2e2
geno 3: abcdjk
antiqua (Africa)
geno 2: abcdj
antiqua (Africa)
intermediate: abcd
antiqua
geno 54: abci
medievalis
geno 61: abct
medievalis
geno 6, 7, 8, 52, 53, 55 to 60: abc
antiqua (Asia), medievalis (Iran)
a2b2c2d2
prediction for Y. pestis ancestor: abcdef
"91001": adf
pestoides China
a2b2c2d2
ancestor: abcde
CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies
POURCEL et al, Microbiology 2005
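A scenario like this one rests on a simple test: can one allele be derived from another by losing spacers and by gaining new ones at the leader end only? The sketch below applies that test to alleles written as on the slide (oldest spacer first). The function name `explain` is hypothetical, each spacer is assumed to occur once per allele, and the number of separate deletion events is not inferred.

```python
def explain(parent, child):
    """Return (lost, gained) spacers if child can derive from parent, else None.

    Model assumed here: spacers may be lost anywhere, new spacers are only
    added after the last parental spacer, and the order is never changed.
    """
    parent, child = list(parent), list(child)
    kept = [s for s in parent if s in child]
    gained = [s for s in child if s not in parent]
    lost = [s for s in parent if s not in child]
    if child != kept + gained:
        return None  # order changed, or a gain is not at the leader end
    return lost, gained


print(explain("abcdefgh", "abcdefgho"))    # gain of o
print(explain("abcdefgh", "abcdfgh"))      # loss of e
print(explain("abcdefgh", "efgh"))         # loss of a, b, c, d
print(explain("abcdefgho", "abcdefghovw")) # gain of v then w
print(explain("abcd", "abdc"))             # not compatible with the model
```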
Spacers dictionary creator
Y. pestis evolution (Antiqua -> Medievalis -> Orientalis)?
Create a binary file of all the spacers entered (0: absent, 1: present)
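Such a binary file can be sketched as a presence/absence table with one row per strain and one column per spacer. The example below is an illustration, not the tool's implementation: the function name is hypothetical, and the spacer content assigned to the three strains is a reading of the YP1 slide shown earlier.

```python
def presence_table(strains):
    """Build a 0/1 table: one row per strain, one column per known spacer."""
    columns = sorted({s for spacers in strains.values() for s in spacers})
    rows = {
        name: [1 if col in set(spacers) else 0 for col in columns]
        for name, spacers in strains.items()
    }
    return columns, rows


# Spacer content of YP1 in three strains (assumed reading of the slide).
STRAINS = {
    "Java9": list("efgh"),
    "02-449": list("abcdefgho"),
    "195P": list("abcdefghovw"),
}

columns, rows = presence_table(STRAINS)
print("strain", " ".join(columns))
for name, values in rows.items():
    print(name, " ".join(str(v) for v in values))
```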
[Figure: spacer presence/absence chart labeled CRISPR_DR29_2; axis labels s135 down to s31; strains FSLF2-515, FSLJ2-003, FSLN1-017, J2818]
str1 1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.69.70.72.73.74.75.76.77.79.80.81.82.83.84.85.86.87.88.89.90.97.98.99.100.101.104.
str2 34.35.39.40.41.42.43.44.45.46.47.48.49.50.59.60.61.63.64.65.66.69.72.77.91.92.93.94.97.98.101.102.103.104.
str3 51.52.53.54.55.56.57.59.60.61.62.63.65.66.67.68.69.71.72.76.78.79.80.95.96.98.99.102.103.104.105.
str4 36.37.38.39.40.41.42.49.50.58.59.60.61.63.64.65.67.72.77.103.
[Figure: tree with leaves str2, str1, str4, str3]
SpacerPHYL
Comparison of two methods: parsimony and distance
INPUT (oriented sequences: leader on the left):
>Yersinia_pestis_CO92
b.c.e.h.i.j.k.l.m
>Yersinia_pestis_KIM
a.b.f.g.h.i.j.l
>Yersinia_pestis_Antiqua
a.b.c.d.j.k
>Yersinia_pestis_Microtus
a.d.f
>Yersinia_pestis_Nepal516
a
1) Pairwise alignment
a.b.-.-.f.g.h.i.&
-.b.c.d.f.-.h.i.j
-.*.-.-.*.-.*.*.&
Legend: & ending gap, - gap, * match
2) Distance matrix
3) Tree
4) Alignment and construction of a phylogenetic tree
• valueOpeningGap = 20
• valueEndingGap = -10
• valueManyGaps = 20
• valueFirstMatch = 100
• valueNextMatch (i) = 100 + i
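Two of the steps above lend themselves to a short illustration: annotating an existing pairwise alignment with the legend symbols, and filling a distance matrix. The sketch below does not reproduce SpacerPHYL's scoring (the value* parameters are not used). The mismatch symbol "x" is an assumption, since the legend does not define one, and the Jaccard distance on spacer sets is a stand-in chosen for simplicity.

```python
from fractions import Fraction


def marker_row(row1, row2):
    """Annotate two aligned rows: & ending gap, - gap, * match, x mismatch."""
    cols1, cols2 = row1.split("."), row2.split(".")
    if len(cols1) != len(cols2):
        raise ValueError("aligned rows must have the same number of columns")
    marks = []
    for x, y in zip(cols1, cols2):
        if "&" in (x, y):
            marks.append("&")
        elif "-" in (x, y):
            marks.append("-")
        elif x == y:
            marks.append("*")
        else:
            marks.append("x")  # assumed symbol, not part of the slide legend
    return ".".join(marks)


def jaccard_distance(a, b):
    """1 - |shared spacers| / |all spacers|, as an exact fraction."""
    sa, sb = set(a.split(".")), set(b.split("."))
    union = sa | sb
    return Fraction(len(union) - len(sa & sb), len(union))


STRAINS = {
    "CO92": "b.c.e.h.i.j.k.l.m",
    "KIM": "a.b.f.g.h.i.j.l",
    "Antiqua": "a.b.c.d.j.k",
    "Microtus": "a.d.f",
    "Nepal516": "a",
}
matrix = {(p, q): jaccard_distance(STRAINS[p], STRAINS[q])
          for p in STRAINS for q in STRAINS}

print(marker_row("a.b.-.-.f.g.h.i.&", "-.b.c.d.f.-.h.i.j"))
print(matrix[("CO92", "KIM")])
```

A set-based distance ignores spacer order, which is exactly the information the alignment step is meant to exploit, so it should be read as a baseline only.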
CRISPR, a tool for micro-evolution analysis
- Intra-species analysis (evolutionary history)
- Strain identification
- Epidemiological studies
A good phylogenetic tool?
Ancient species:
- highly polymorphic in spacer composition
- absence of CRISPRs
- Strain differentiation: ok
- Phylogenetic relations: not sufficient