Repetitive Elements May Comprise Over Two-Thirds of the Human Genome
![Page 1: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/1.jpg)
Repetitive Elements May Comprise Over Two-Thirds of
the Human Genome
2012-03-05
![Page 2: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/2.jpg)
Abstract
Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different-sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (~25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using them we identified ~100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.
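The oligo "cloud" idea in the abstract can be illustrated with a toy sketch. This is not the P-clouds software or its algorithm as published; it only assumes the general idea stated above: count oligonucleotides, treat the most abundant ones as cores, and pull in related oligos that are close in sequence space. The function names, the one-mismatch neighborhood, and the `core_min` and `outer_min` thresholds are all hypothetical choices made for illustration, and the sketch merges everything into a single cloud rather than building separate ones.

```python
from collections import Counter


def count_oligos(seq, k):
    """Count every overlapping oligo of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(oligo):
    """Yield all oligos that differ from `oligo` at exactly one position."""
    for i, base in enumerate(oligo):
        for alt in "ACGT":
            if alt != base:
                yield oligo[:i] + alt + oligo[i + 1:]


def build_cloud(counts, core_min, outer_min):
    """Toy cloud: abundant 'core' oligos plus their one-mismatch
    neighbors that are themselves observed at least outer_min times.
    (Thresholds are illustrative assumptions, not published values.)"""
    cores = {o for o, c in counts.items() if c >= core_min}
    cloud = set(cores)
    for core in cores:
        for n in neighbors(core):
            if counts.get(n, 0) >= outer_min:
                cloud.add(n)
    return cloud


def annotate(seq, cloud, k):
    """Return a per-base mask that is True wherever a cloud oligo covers the base."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in cloud:
            for j in range(i, i + k):
                mask[j] = True
    return mask


if __name__ == "__main__":
    # Made-up sequence: an abundant ACGT repeat, a diverged copy (ACGA), and filler.
    genome = "ACGTACGTACGTACGTTTGCACGAACGAGGCT"
    k = 4
    counts = count_oligos(genome, k)
    cloud = build_cloud(counts, core_min=3, outer_min=2)
    mask = annotate(genome, cloud, k)
    print(f"{sum(mask)} of {len(mask)} bases flagged as repeat-like")
```

The point of the neighborhood step is that a diverged copy such as `ACGA` can be too rare to qualify as a core on its own, yet still be annotated because it sits next to an abundant oligo in sequence space.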
![Page 3: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/3.jpg)
Repeated sequences
• Tandem repeats
  • Satellite DNA, minisatellite, microsatellite
• Interspersed repeats (transposons)
  • Retrotransposons (copy and paste)
    • SINEs (Alu, MIR)
    • LINEs (LINE1, LINE2)
    • LTRs (HERV, MER4, retroposon)
  • DNA transposons (cut and paste)
![Page 4: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/4.jpg)
What is the human genome sequence made of?
![Page 5: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/5.jpg)
Motivation
• Evolution would have heavily altered substantial amounts of TE-derived sequence
• The relations among large clusters of sequences may make them detectable
• The commonly used approach, RepeatMasker, relies on the Repbase library of consensus sequences
![Page 6: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/6.jpg)
Outline
• The P-clouds method
• De novo repeat annotation of the human genome with P-clouds
• P-clouds and RepeatMasker detection capability for fragments of known elements
• Element-specific P-clouds (ESPs): performance for annotation of novel Alu and MIR elements
![Page 7: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/7.jpg)
Outline
• The P-clouds method
• De novo repeat annotation of the human genome with P-clouds
• P-clouds and RepeatMasker detection capability for fragments of known elements
• Element-specific P-clouds (ESPs): performance for annotation of novel Alu and MIR elements
![Page 8: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/8.jpg)
![Page 9: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/9.jpg)
Outline
• The P-clouds method
• De novo repeat annotation of the human genome with P-clouds
• P-clouds and RepeatMasker detection capability for fragments of known elements
• Element-specific P-clouds (ESPs): performance for annotation of novel Alu and MIR elements
![Page 10: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/10.jpg)
![Page 11: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/11.jpg)
Outline
• The P-clouds method
• De novo repeat annotation of the human genome with P-clouds
• P-clouds and RepeatMasker detection capability for fragments of known elements
• Element-specific P-clouds (ESPs): performance for annotation of novel Alu and MIR elements
![Page 12: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/12.jpg)
![Page 13: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/13.jpg)
![Page 14: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/14.jpg)
Outline
• The P-clouds method
• De novo repeat annotation of the human genome with P-clouds
• P-clouds and RepeatMasker detection capability for fragments of known elements
• Element-specific P-clouds (ESPs): performance for annotation of novel Alu and MIR elements
![Page 15: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/15.jpg)
![Page 16: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/16.jpg)
![Page 17: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/17.jpg)
![Page 18: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome](https://reader035.vdocuments.net/reader035/viewer/2022062720/568134dc550346895d9c0e31/html5/thumbnails/18.jpg)
Potential applications
• C-value paradox
• Next-generation sequencing
• Population structure based on TEs