Repeated Elements
Computational Method for finding Repeated Elements
Dr. Stephan Steigele
Bioinf, University of Leipzig
Leipzig WS06/07
Repeated sequences
- sequence instances that occur more than once (> 1) in a genome
- they have a broad definition, ranging from
  - simple sequence repeats
  - to very long repeats with full coding capacity for their own replication (e.g. related to retroviruses)
formal description
- A DNA sequence is a string $S$ of length $n$ over the alphabet $\Sigma = \{A, T, G, C\}$
- $S_i$ denotes the $i$-th character of $S$, for $i \in [1, n]$
- $S^{-1}$ is the reversed string of $S$
- $S_{i,j}$ is the substring of $S$ that starts at $S_i$ and ends at $S_j$ (for $i < j$)
formal description
- An exact repeat $R$ is represented by the positions of both substrings $S_{i_1,j_1}$ and $S_{i_2,j_2}$, hence $R = f((i_1, j_1), (i_2, j_2))$ and $S_{i_1,j_1} = S_{i_2,j_2}$
- Additionally, the positions of both instances have to be different, thus $(i_1, j_1) \neq (i_2, j_2)$. $R$ is maximal if and only if it cannot be extended in either direction, i.e. $S_{i_1-1} \neq S_{i_2-1}$ and $S_{j_1+1} \neq S_{j_2+1}$
- Example:

     i1    j1       i2    j2
.. G A C C T G .. C A C C T A ..

Maximal repeat $R = ((i_1, j_1), (i_2, j_2))$ = ACCT
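The definition can be checked directly on a toy sequence. The following is a minimal sketch, assuming 1-based inclusive positions as on the slide and the usual reading that a repeat is maximal when neither side can be extended; the function names and the example sequence are hypothetical.

```python
def is_exact_repeat(s, i1, j1, i2, j2):
    """True if S[i1..j1] and S[i2..j2] (1-based, inclusive) form an exact repeat."""
    return (i1, j1) != (i2, j2) and s[i1 - 1:j1] == s[i2 - 1:j2]


def is_maximal_repeat(s, i1, j1, i2, j2):
    """True if the exact repeat cannot be extended to the left or to the right."""
    if not is_exact_repeat(s, i1, j1, i2, j2):
        return False
    # Flanking characters; None marks the start or end of the sequence.
    left1 = s[i1 - 2] if i1 > 1 else None
    left2 = s[i2 - 2] if i2 > 1 else None
    right1 = s[j1] if j1 < len(s) else None
    right2 = s[j2] if j2 < len(s) else None
    left_blocked = left1 is None or left2 is None or left1 != left2
    right_blocked = right1 is None or right2 is None or right1 != right2
    return left_blocked and right_blocked


s = "GACCTGTTCACCTA"  # hypothetical toy sequence containing ACCT twice
print(is_maximal_repeat(s, 2, 5, 10, 13))  # the two ACCT instances
print(is_maximal_repeat(s, 3, 5, 11, 13))  # CCT / CCT can be extended to the left
```

The second call shows why maximality matters: every substring of a repeat is itself a repeat, so reporting only maximal ones keeps the output small.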
formal description
- $\bar{S}$ denotes the complement of $S$, according to the complementary strand of DNA.
- The complement follows the Watson-Crick pairs of DNA (C-G and T-A). An inverted repeat is then defined as: $S_{i_1,j_1} = (\bar{S}_{i_2,j_2})^{-1}$
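As an illustration, the inverted-repeat condition amounts to comparing one substring with the reverse complement of the other. This is a small sketch with hypothetical helper names and a made-up example sequence; positions are 1-based and inclusive, as in the formal description.

```python
# Watson-Crick complement table (A-T, C-G)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(s, i1, j1, i2, j2):
    """True if S[i1..j1] equals the reverse complement of S[i2..j2]."""
    return s[i1 - 1:j1] == reverse_complement(s[i2 - 1:j2])


s = "ACCTGTTTTCAGGT"  # hypothetical sequence: ACCTG ... CAGGT
print(reverse_complement("CAGGT"))
print(is_inverted_repeat(s, 1, 5, 10, 14))
```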
‘C-value paradox’
- genome size (eukaryotic) displays considerable variability between species without any direct link to complexity
- this ‘C-value paradox’ results from a differential abundance of numerous repeated sequences
- many genomes contain a large amount of such sequences (about 45% of the human genome, up to 99% of the DNA in some plants [Biémont and Vieira, 2004])
- this variability is in strong contrast to a nearly constant number of proteins (or generally of genes) found in the different phylogenetic clades
Biological Insights
- although often described as selfish (junk) DNA with no apparent function for its host genome [Orgel and Crick, 1980], many classes of repetitive elements are known for their beneficial effects
- repeated sequences ensure the large-scale integrity of genomes
- retroelements serve as boundaries for heterochromatin domains [Volpe et al., 2002] and provide a significant fraction of scaffolding/matrix attachment regions (S/MARs)
- retro-transcribed components in the genome play a major architectonic role in higher-order physical structuring [von Sternberg and Shapiro, 2005]
- the evolution of thousands of human proteins is directly shaped by repetitive sequences [Britten, 2006]
Questions that need Bioinformatics to be answered
- knowledge of the amount of repeats in a genome allows a rough estimate of the complexity (of that genome)
- this information is necessary in upcoming genome assembly steps
- repeated elements are useful for phylogenetic inference, especially when closely related species are compared
- for instance, ‘Short Interspersed Nuclear Elements’ (SINEs) may provide the most valuable phylogenetic information [Bannikova, 2004] in the phylogenetic reconstruction of mammals
Repeats need to be masked prior to performing most single-species or multi-species analyses
“Every time we compare two species that are closer to each other than either is to humans, we get nearly killed by unmasked repeats.” (Webb Miller)
Computational Approaches: REPEATMASKER
REPEATMASKER
- REPEATMASKER¹ is the most commonly used computational tool to detect and annotate repeats
- it is superior, in both sensitivity and specificity, to most other in-silico techniques
- BIG caveat: REPEATMASKER is limited to the repetitive elements given in a database (one frequently used repeat database is RepBase²)
- hence, REPEATMASKER is not a program for the de-novo identification of repetitive elements

¹ http://www.repeatmasker.org/
² http://www.girinst.org/repbase/update/
Advantages of REPEATMASKER
- provides individual substitution matrices for repeat families, one reason for the high sensitivity and specificity
- is very fast with the extension to WU-BLAST → MASKERAID
- is capable of dissecting composite elements → allows reconstruction of evolutionary scenarios
For widely studied genomes such as human and mouse, libraries of repeat families have been manually curated:
- Repbase Update library (http://www.girinst.org)
- RepeatMasker library (http://www.repeatmasker.org)
- REPEATMASKER fails the “platypus test”:
  - repeat families are largely species-specific, so if one were to analyze a new genome (like the platypus), a new repeat library would first need to be manually compiled
Pure Algorithmic Approach
Suffix Trees
Problem: Given a long text $t$ and many short queries $q_1, \dots, q_k$. For each query sequence $q_i$, find all its occurrences in $t$.
- a suffix tree provides a data structure that allows us to search for repeats efficiently
- it allows searching for maximal repeats in linear time
- Consider the text abab$
- It has the following suffixes:
  - abab$, bab$, ab$, b$, and $.
- (a) The suffixes abab$ and ab$ both share the prefix ab.
- (b) The suffixes bab$ and b$ both share the prefix b.
- (c) The suffix $ doesn’t share a prefix.
- to determine whether a given query q is contained in the text, we check whether q is the prefix of one of the suffixes.
  - e.g., the query ab is the prefix of both abab$ and ab$.
- to speed up the search for all suffixes that have the query as a prefix, we use a tree structure to share common prefixes between the suffixes.
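Before any tree is built, the idea that a query occurs wherever it is the prefix of a suffix can be written down naively. The sketch below enumerates all suffixes explicitly (quadratic space, so only suitable for toy inputs); the function names are hypothetical.

```python
def suffixes(text):
    """All suffixes of text with their 1-based start positions."""
    return [(i + 1, text[i:]) for i in range(len(text))]


def occurrences(text, query):
    """Start positions of every suffix that has query as a prefix."""
    return [pos for pos, suffix in suffixes(text) if suffix.startswith(query)]


print(suffixes("abab$"))
print(occurrences("abab$", "ab"))
print(occurrences("abab$", "b"))
```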
The suffix tree for abab$ is obtained by sharing prefixes wherever possible. The leaves are annotated with the positions of the corresponding suffixes in the text.
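The prefix sharing can be sketched with nested dictionaries. Note that this is a simplified stand-in: it builds an uncompressed suffix trie (one node per character) by naive insertion, which needs quadratic time and space, whereas a real suffix tree compresses unbranched paths into single edges. The key name "pos" and both function names are assumptions made for this example.

```python
def build_suffix_trie(text):
    """Insert every suffix of text; each leaf stores its 1-based start position."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})  # shared prefixes reuse existing nodes
        node["pos"] = start + 1             # leaf annotation
    return root


def find(trie, query):
    """Walk down along query, then collect all leaf positions below that node."""
    node = trie
    for ch in query:
        if ch not in node:
            return []
        node = node[ch]
    positions, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "pos":
                positions.append(child)
            else:
                stack.append(child)
    return sorted(positions)


trie = build_suffix_trie("abab$")
print(sorted(trie))      # first characters of all suffixes
print(find(trie, "ab"))  # positions of the suffixes abab$ and ab$
```

The terminal character $ guarantees that no suffix is a prefix of another, so every suffix ends in its own leaf.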
- the Ukkonen algorithm builds the suffix tree in linear time, O(n)
- maximal repeats can then be detected in O(n + z) time and space, i.e. linear in the text length n and the number z of reported repeats
- suffix trees provide a fast way to identify nearly identical repeats
- however, their performance suffers (massively) as the repeat instances become more diverse
- compound instances of repeats are hard to detect
- better methods exist that are explicitly designed to find repetitive elements
Successful algorithms
The most successful algorithms for detecting repeats in genomes take the differing characteristics of repeat families into account
Classification of Repeats
Classification by Cot-curves
- Based on the reassociation rate, DNA sequences are divided into three classes:
  - Highly repetitive: About 10-15% of mammalian DNA reassociates very rapidly. This class includes tandem repeats.
  - Moderately repetitive: Roughly 25-40% of mammalian DNA reassociates at an intermediate rate. This class includes interspersed repeats.
  - Single copy genes (or very low copy number genes): This class accounts for 50-60% of mammalian DNA.
Tandem Repeats
- satellite DNA ranges from 100 kb to over 1 Mb. In humans, a well-known example is the alphoid DNA located at the centromere of all chromosomes. Its repeat unit is 171 bp and the repetitive region accounts for 3-5% of the DNA in each chromosome
- minisatellites may differ between individuals. Hence, this feature is ideally suited for DNA fingerprinting. An example of a known minisatellite is the telomere. In a human germ cell, the size of a telomere is about 15 kb (however, in an aging somatic cell, the telomere is shorter). The telomere contains the tandemly repeated sequence GGGTTA
- microsatellites are characterized by the shortest repeat units (e.g. two base pairs, as in (CA)n)
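Perfect tandem repeats of a known unit, such as (CA)n or the telomeric GGGTTA, can be located with a regular expression. The sketch below is only an illustration for exact, uninterrupted copies; the function name, the minimum copy number and the example sequences are assumptions, and real tandem repeat finders also tolerate mismatches and indels.

```python
import re


def find_tandem_repeats(seq, unit, min_copies=3):
    """Return (start, end, copies) for each run of >= min_copies exact copies of unit.

    Positions are 1-based and inclusive.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (m.start() + 1, m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq)
    ]


print(find_tandem_repeats("GGCACACACATTCAGG", "CA"))
print(find_tandem_repeats("AC" + "GGGTTA" * 3 + "T", "GGGTTA"))
```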
(Retro-)Transposons
- Interspersed repeats are repeated DNA sequences located at dispersed regions in a genome.
- Transposons are segments of DNA that can move between different positions in the genome of a single cell. They were first discovered by Barbara McClintock in maize [McClintock, 1950]. These mobile segments of DNA are sometimes called ‘jumping genes’. There are two distinct types:
  - Class-I: Retrotransposons that
    - first transcribe the DNA into RNA, then
    - use reverse transcriptase to make a DNA copy of the RNA to insert in a new location
  - Class-II: Transposons consisting only of DNA that moves directly from place to place
Retrotransposons
- retrotransposons (Class-I) are related to (retro-)viruses
- the most important protein is the reverse transcriptase. This key protein catalyses a unique reaction, the synthesis of DNA from an RNA template
- a retrotransposed element is duplicated, with one version of the element in its original place and a copy of the retrotranscribed element at a second place in the genome.
- the mechanism is called the copy-and-paste mechanism.
- functional retrotransposons are independent of their host, with their own internal promoter and coding regions for both integrase proteins (endonucleolytic) and the reverse transcriptase.
Transposons
- Class-II transposons move mainly by cut-and-reinsertion operations (cut-and-paste mechanism)
- each cycle of transposition is initiated by single- or double-strand breaks. The exposed ends of the excised elements are then reinserted at other parts of the genome [Shapiro, 1999]
- the key enzyme of this reaction is the transposase, which is distantly related to the integrase proteins of retrotransposons

[Figure labels: element length 1308 bp; transposase 344 aa; ITRs of 26 bp at both ends; DNA-binding domain; catalytic domain; catalytic triad]
Figure: Sketch of Class-II transposons. The coding region of the transposase is flanked by repeats. The lower part shows the proposed junction of DNA upon binding of transposase as the initial step of transposition.
SINEs (short interspersed elements)
- originate from small RNAs like 7SL-RNA or tRNAs
- often with an internal promoter for RNA polymerase III
- a spectacular example is the ALU family in human, with roughly 1,000,000 members
- the 3′ and 5′ ends are duplicated
- the 5′ end contains the A-Box and B-Box (RNA polymerase III promoter), homologous to 7SL-RNA
- the 3′ end lacks a promoter region
Inverted Tandem Repeat
Inverted Tandem Repeat, ITR
- Class-I and Class-II elements are characterized by inverted (or direct) repeats (Inverted Tandem Repeat, ITR), flanking the coding regions of the element
- these sequences are used for recognition by enzymes [Lampe et al., 2001] and differ in length, from six to some hundred base pairs
- in many cases, short target site duplications are observed
  - e.g. Tc1 solely jumps to the dinucleotide TA, duplicating the dinucleotide at each flanking site of the ITRs
                      10        20
CemaT1/1-26   -CAGGGTGAGTCAAAATTATGGTAAGT-
PpmaT1/1-26   -CAGGGTGCGCCAAAATTACCCGCACT-
CbmaT1/1-25   -CAGGGTGGCCGATAAGTATAGGACG--
CbmaT4/1-25   -CAGGGTGGCCGATAAGTATAGGACG--
Ppmar1/1-17   ---GGGTGTCCCACATATTT--------
Bmmar6/1-27   ACTAGTCAGGTCATAAGTATTGTCACA-
Bmmar1/1-28   CTTAGTCTGGCCATAAATACTGTTACAA
Consensus     -CAGGGTGGGCCATAAGTAT+G+AA++-
Inverted Tandem Repeats leave footprints in genomic DNA
Figure (Ppmat1): ITRs observed in C. briggsae based on homology searches with BLASTN. The ITRs are the only conserved entities of the maT family (in fact, they are 90% identical). The open reading frame is not detectable at the nucleotide level.
Figure (Ppmat1): Protein-coding sequences of maT elements in C. briggsae, observed by homology searches with TBLASTN. Only by searching at the protein level (protein sequences vs. translated genomic sequence) was a homologous region coding for the transposase protein found.
Computational Approaches: RECON
RECON
- RECON is a recent and widely used approach to discover new repetitive elements in unknown genomes
- the approach was published by Bao and Eddy [Bao and Eddy, 2002] in Genome Research, “Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes”
- the method is based on an extension of the usual single linkage clustering of local pairwise alignments between genomic sequences
- the method is intended to fill the gap by providing a program that generates libraries for REPEATMASKER
DEFINITIONS
- given a set of genomic sequences $\{S_n\}$
- identify all repeat families $\{F_\alpha\}$ therein
- each repeat is a subsequence $S_n(s_k, e_k)$, where $s_k$ and $e_k$ are the start and end positions in sequence $S_n$
- therefore, the output of the algorithm is $F_\alpha = \{S_n(s_k, e_k)\}$
- element: an individual copy of a repeat $S_n(s_k, e_k)$
- image: an observation of an element from a pairwise comparison
- syntopic: two images of the same element are syntopic

Syntopy is the problem mainly addressed by the work of Bao and Eddy
OVERVIEW
Single Linkage Clustering
- obtain pairwise alignments between sequences in $\{S_n\}$
- define elements $\{S_n(s_k, e_k)\}$
  - construct a graph $G(V, E)$ in which $V$ represents all sequences and $E$ all significant alignments
  - find all connected components
- group elements on the basis of sequence similarity
  - construct a graph $H(V', E')$ in which $V'$ represents all elements and $E'$ the similarity between them
  - find all connected components
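Single linkage clustering reduces to finding the connected components of such a graph. The following minimal sketch uses a union-find structure; the element names and the list of significant alignments are hypothetical, and the alignments themselves are assumed to have been computed beforehand.

```python
def connected_components(vertices, edges):
    """Group vertices that are linked, directly or indirectly, by an edge."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the representative, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(group) for group in groups.values())


elements = ["e1", "e2", "e3", "e4", "e5", "e6"]
alignments = [("e1", "e2"), ("e2", "e3"), ("e4", "e5")]  # significant pairs only
print(connected_components(elements, alignments))
```

Here e1 and e3 end up in the same family although they were never aligned to each other directly; a single link through e2 is enough, which is exactly the property that makes plain single linkage sensitive to misleading alignments.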
Single Linkage Clustering leads to spurious reconstruction of repeats
Extending the Single Linkage Clustering approach
- decompose elements: Element Reevaluation and Update Rule
- filter misleading alignments: Image End Selection Rule
- filter partial elements: Family Graph Construction Procedure with Edge Reevaluation
decompose elements: Element Reevaluation and Update Rule
- slide a window over the aligned images
  - seed a cluster if the leftmost end is not clustered
  - pull an end into the existing cluster when it lies within a certain distance
- for each cluster
  - let n denote the number of ends in the cluster, c the mean position of these n ends, and m the total number of elements spanning position c
  - if n/m is greater than a given threshold, c is considered an aggregation point
- split the element at aggregation points if necessary
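The rule above can be sketched as follows. This is one possible reading, not RECON's actual implementation: images are taken as (start, end) intervals on one element, an end joins a cluster when it lies within `max_dist` of the cluster's seed, and clusters that no interval spans (m = 0) are skipped. The parameter values and the example intervals are hypothetical.

```python
def aggregation_points(images, max_dist=5, threshold=0.5):
    """Return the mean positions c of end clusters with n / m > threshold.

    images: list of (start, end) intervals observed on one element.
    """
    ends = sorted(pos for image in images for pos in image)

    # Greedy clustering: the leftmost unclustered end seeds a cluster, and
    # following ends within max_dist of that seed are pulled in.
    clusters = []
    for pos in ends:
        if clusters and pos - clusters[-1][0] <= max_dist:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    points = []
    for cluster in clusters:
        n = len(cluster)                 # number of ends in the cluster
        c = sum(cluster) / n             # mean position of these ends
        m = sum(1 for start, end in images if start <= c <= end)  # intervals spanning c
        if m > 0 and n / m > threshold:
            points.append(c)
    return points


# Toy element of length 300 with a suspected breakpoint near position 100.
images = [(1, 300), (1, 300), (1, 100), (2, 101), (1, 99),
          (100, 300), (102, 300), (1, 200)]
points = aggregation_points(images)
print([round(c, 1) for c in points])
```

In this toy input, many image ends pile up around position 100 while a single stray end at 200 does not reach the threshold, so only the former would be a candidate for splitting the element (besides the element's own boundaries).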
filter misleading alignments: Image End Selection Rule
filter partial elements: Family Graph Construction Procedure with Edge Reevaluation
- construct a graph $G(V, E)$ in which $V$ represents all elements and each edge in $E$ is either a primary edge $e$ (significant alignment) or a secondary edge $e'$ (no significant alignment)
- for each vertex $v$, inspect its primary edges
  - let $N(v)$ denote the set of vertices directly connected to $v$ via primary edges
  - if any pair in $N(v)$ is connected by a secondary edge $e'$, then
    - $\forall v' \in N(v)$ remove the primary edge $e$ between $v$ and $v'$
      - unless $v'$ is the closest related element to $v$ in $N(v)$
      - or $v$ is the closest related element to $v'$ in $N(v')$
- remove secondary edges
- update family assignment
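The edge reevaluation can be illustrated on a small graph. The sketch below rests on several assumptions that the slide does not specify: "closest related" is read as the primary neighbour with the highest similarity score, all decisions are taken on the original graph before any edge is removed, and the element names and scores are invented. It is not RECON's actual code.

```python
from itertools import combinations


def neighbours(v, primary):
    """Vertices joined to v by a primary edge."""
    return [next(iter(edge - {v})) for edge in primary if v in edge]


def closest(v, primary):
    """Primary neighbour of v with the highest similarity score."""
    return max(neighbours(v, primary), key=lambda u: primary[frozenset((v, u))])


def reevaluate_edges(vertices, primary, secondary):
    """primary: {frozenset({u, v}): score}; secondary: set of frozenset({u, v})."""
    doomed = set()
    for v in vertices:
        nbrs = neighbours(v, primary)
        conflict = any(frozenset(pair) in secondary for pair in combinations(nbrs, 2))
        if not conflict:
            continue
        for u in nbrs:
            # Keep the edge only if one endpoint is the other's closest relative.
            if closest(v, primary) != u and closest(u, primary) != v:
                doomed.add(frozenset((v, u)))
    return {edge: score for edge, score in primary.items() if edge not in doomed}


def families(vertices, edges):
    """Connected components over the remaining primary edges."""
    adjacent = {v: set() for v in vertices}
    for edge in edges:
        u, w = tuple(edge)
        adjacent[u].add(w)
        adjacent[w].add(u)
    seen, result = set(), []
    for v in vertices:
        if v in seen:
            continue
        seen.add(v)
        stack, component = [v], []
        while stack:
            x = stack.pop()
            component.append(x)
            for y in adjacent[x] - seen:
                seen.add(y)
                stack.append(y)
        result.append(sorted(component))
    return sorted(result)


# Partial element p aligns to a1 and b1, which do not align to each other.
vertices = ["a1", "a2", "b1", "b2", "p"]
primary = {
    frozenset(("a1", "a2")): 90,
    frozenset(("b1", "b2")): 90,
    frozenset(("p", "a1")): 80,
    frozenset(("p", "b1")): 60,
}
secondary = {frozenset(("a1", "b1"))}
kept = reevaluate_edges(vertices, primary, secondary)
print(families(vertices, primary))  # plain single linkage merges everything
print(families(vertices, kept))     # after edge reevaluation
```

Without reevaluation, the partial element p glues the two families together; after removing the weaker edge between p and b1, the families separate again.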
RESULTS
Computational Approaches: REPEATSCOUT
RepeatScout
RepeatScout: “De novo identification of repeat families in large genomes” by Pevzner and colleagues [Price et al., 2005]
local alignment score
$$f(i, 0) = 0 \tag{1}$$

$$f(0, j) = 0 \tag{2}$$

$$f(i, j) = \max \begin{cases} f(i-1, j-1) + \mu_{ij} \\ f(i, j-1) - \gamma \\ f(i-1, j) - \gamma \\ 0 \end{cases} \tag{3}$$

$$\alpha(Q, S) = \max_{i,j} f(i, j) \tag{4}$$
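Equations (1) to (4) translate directly into a dynamic program. The sketch below assumes that μij is a simple match/mismatch score and that γ is a linear gap penalty; the default values are illustrative placeholders, not RepeatScout's parameters.

```python
def local_alignment_score(q, s, match=1, mismatch=-1, gap=2):
    """alpha(Q, S) from equations (1)-(4), keeping only two rows of f."""
    prev = [0] * (len(s) + 1)              # f(0, j) = 0
    best = 0
    for i in range(1, len(q) + 1):
        cur = [0] * (len(s) + 1)           # cur[0] is f(i, 0) = 0
        for j in range(1, len(s) + 1):
            mu = match if q[i - 1] == s[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + mu,          # f(i-1, j-1) + mu_ij
                cur[j - 1] - gap,          # f(i, j-1) - gamma
                prev[j] - gap,             # f(i-1, j) - gamma
                0,                         # restart the alignment
            )
            best = max(best, cur[j])
        prev = cur
    return best
```

With these placeholder scores, an exact occurrence of Q inside S scores |Q|, and two sequences with no matching character score 0.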
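The fit-preferred alignment score

$$f(i, 0) = \max(-\gamma i, -p) \tag{5}$$

$$f(0, j) = 0 \tag{6}$$

$$f(i, j) = \max \begin{cases} f(i-1, j-1) + \mu_{ij} \\ f(i, j-1) - \gamma \\ f(i-1, j) - \gamma \\ -p \end{cases} \tag{7}$$

$$\alpha(Q, S) = \max_{i,j} \begin{cases} f(i, j) & \text{if } i = |Q| \\ f(i, j) - p & \text{if } i < |Q| \end{cases} \tag{8}$$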
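The fit-preferred recurrence, equations (5) to (8), can be sketched the same way. As read from the equations, p acts as a penalty whenever the alignment does not start at the beginning of Q or does not reach the end of Q. The match/mismatch form of μij and the default values of γ (`gap`) and `p` are assumptions chosen for illustration.

```python
def fit_preferred_score(q, s, match=1, mismatch=-1, gap=2, p=3):
    """alpha(Q, S) from equations (5)-(8), using the full matrix for clarity."""
    n, m = len(q), len(s)
    f = [[0] * (m + 1) for _ in range(n + 1)]   # f(0, j) = 0
    for i in range(1, n + 1):
        f[i][0] = max(-gap * i, -p)             # equation (5)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mu = match if q[i - 1] == s[j - 1] else mismatch
            f[i][j] = max(
                f[i - 1][j - 1] + mu,           # f(i-1, j-1) + mu_ij
                f[i][j - 1] - gap,              # f(i, j-1) - gamma
                f[i - 1][j] - gap,              # f(i-1, j) - gamma
                -p,                             # restart, paying the penalty p
            )
    # Equation (8): ending before the end of Q costs an extra p
    return max(
        f[i][j] - (0 if i == n else p)
        for i in range(n + 1)
        for j in range(m + 1)
    )
```

Setting `p=0` reduces this sketch to the local alignment score, which is a convenient sanity check on the boundary conditions.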
Computational Approaches: REAS
ReAS
“ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun” by Li et al. [Li et al., 2005]
Unique features
- ReAS is similar to RepeatScout: it first searches for k-mer occurrences
- however, it is tuned to reconstruct repeat elements from shotgun sequence data
- thus, ReAS is useful in pre-assembly steps
algorithm in short
- compute the K-mer depth, i.e. the number of times that a K-mer appears in the shotgun data
- seed the process with a randomly chosen high-depth K-mer
- retrieve all shotgun reads containing this K-mer and trim them into 100-bp segments centered at that K-mer
- if the sequence identity between the segments exceeds a preset threshold, assemble them into an initial consensus sequence (ICS) using ClustalW
- extend the ICS iteratively by selecting high-depth K-mers at both ends and repeating the above procedure
- after all such extensions are done, use clone-end pairing information to resolve ambiguous joins and to break misassemblies, but not to join fragmented assemblies
- the final consensus is the ReAS repeat element
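The first three steps (K-mer depth, seed selection, read retrieval and trimming) can be sketched as follows. This is a simplified illustration under stated assumptions, not the ReAS code: the function names and parameters are hypothetical, only the forward strand and the first occurrence of the seed in each read are considered, and the identity check, the ClustalW consensus and the iterative extension are left out.

```python
import random
from collections import Counter


def kmer_depth(reads, k):
    """Number of times each K-mer appears in the shotgun reads."""
    depth = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            depth[read[i:i + k]] += 1
    return depth


def pick_seed(depth, min_depth, rng):
    """Randomly choose one K-mer whose depth is at least min_depth."""
    candidates = sorted(kmer for kmer, d in depth.items() if d >= min_depth)
    return rng.choice(candidates) if candidates else None


def centered_segments(reads, seed, segment_len=100):
    """Trim every read containing the seed to a segment centered on it.

    Segments are shorter than segment_len when the seed lies near a read end.
    """
    flank = (segment_len - len(seed)) // 2
    segments = []
    for read in reads:
        pos = read.find(seed)              # first forward-strand occurrence only
        if pos < 0:
            continue
        start = max(0, pos - flank)
        end = min(len(read), pos + len(seed) + flank)
        segments.append(read[start:end])
    return segments


# Toy usage with very short reads; real reads and K-mers are much longer.
reads = ["TTACGTAA", "GGACGTCC", "ACGTTTTT", "CATGCATG"]
depth = kmer_depth(reads, k=4)
seed = pick_seed(depth, min_depth=3, rng=random.Random(0))
segments = centered_segments(reads, seed, segment_len=6)
```

Passing an explicit `random.Random` instance keeps the "randomly chosen" seed reproducible, which helps when comparing runs.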
Overview of algorithm
General difficulties
The idealized algorithm described above is a simplification. There are three problems:
- ambiguity/misassembly: the fork problem
- fragmentation
- segmental duplication
the fork problem
resolved either by
- overlapping reads
- clone-end data

- if a-e-c and b-e-d are both supported, the other paths are discarded
- if only a-e-c is supported, b-e-d is the most likely remaining path and is kept
- if a-e-c and a-e-d are both supported, no decision is possible and all paths are kept
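The three cases can be written down as a small decision function. The sketch assumes a fork in which incoming branches (a, b) and outgoing branches (c, d) are joined through a shared segment e, and that "supported" means a path is confirmed by overlapping reads or clone-end data. The function name is hypothetical, and the handling of a fork without any supported path (keep everything) is an assumption, since that case is not covered above.

```python
def resolve_fork(incoming, outgoing, supported):
    """Return the (incoming, outgoing) paths through e that are kept."""
    all_paths = {(i, o) for i in incoming for o in outgoing}
    supported = set(supported) & all_paths
    if not supported:
        return all_paths                   # assumption: nothing to decide on

    ins = [i for i, _ in supported]
    outs = [o for _, o in supported]
    # One branch supported in two directions (e.g. a-e-c and a-e-d):
    # no decision is possible, so all paths are kept.
    if len(set(ins)) < len(ins) or len(set(outs)) < len(outs):
        return all_paths

    kept = set(supported)
    free_in = set(incoming) - set(ins)
    free_out = set(outgoing) - set(outs)
    # Exactly one unused branch on each side: the complementary path
    # (e.g. b-e-d when only a-e-c is supported) is the likely one and is kept.
    if len(free_in) == 1 and len(free_out) == 1:
        kept.add((free_in.pop(), free_out.pop()))
    return kept
```

For example, `resolve_fork(["a", "b"], ["c", "d"], {("a", "c")})` keeps a-e-c and b-e-d and drops the two crossing paths.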
the duplication problem
greedily solve the duplication problem
- repeat boundaries are detected by sudden changes in k-mer depth
- search for aggregation of endpoints (similar to RECON)
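Detecting a sudden change in k-mer depth can be illustrated with a simple ratio test along a consensus sequence. The function name and the `min_ratio` threshold are assumptions for this sketch; the criterion actually used by ReAS is not given here.

```python
def depth_boundaries(depths, min_ratio=2.0):
    """Positions i where the depth jumps or drops by at least min_ratio
    between position i-1 and position i."""
    boundaries = []
    for i in range(1, len(depths)):
        lo, hi = sorted((depths[i - 1], depths[i]))
        if hi >= min_ratio * max(lo, 1):   # max(lo, 1) guards against zero depth
            boundaries.append(i)
    return boundaries


# Depth profile with a high-depth stretch at positions 3 to 5
profile = [3, 3, 4, 40, 42, 41, 5, 4]
candidates = depth_boundaries(profile)
```

Candidate boundaries found this way would then be checked for aggregation of endpoints, as in the second bullet.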
Overview of Results
Some results in detail
Example for fragmentation problem
References
Bannikova, A. A. (2004). [Molecular markers and modern phylogenetics of mammals]. Zh Obshch Biol, 65(4):278–305.
Bao, Z. and Eddy, S. R. (2002). Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res, 12(8):1269–1276.
Biémont, C. and Vieira, C. (2004). [The influence of transposable elements on genome size]. J Soc Biol, 198(4):413–417.
Britten, R. (2006). Transposable elements have contributed to thousands of human proteins. Proc Natl Acad Sci U S A.
Lampe, D. J., Walden, K. K., and Robertson, H. M. (2001). Loss of transposase-DNA interaction may underlie the divergence of mariner family transposable elements and the ability of more than one mariner to occupy the same genome. Mol Biol Evol, 18(6):954–961.
Li, R., Ye, J., Li, S., Wang, J., Han, Y., Ye, C., Wang, J., Yang, H., Yu, J., Wong, G. K.-S., and Wang, J. (2005). ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun. PLoS Comput Biol, 1(4):e43.
McClintock, B. (1950). The origin and behavior of mutable loci in maize. Proc Natl Acad Sci U S A, 36(6):344–355.
Orgel, L. E. and Crick, F. H. (1980). Selfish DNA: the ultimate parasite. Nature, 284(5757):604–607.
Price, A. L., Jones, N. C., and Pevzner, P. A. (2005). De novo identification of repeat families in large genomes. Bioinformatics, 21 Suppl 1:i351–i358.
Shapiro, J. A. (1999). Transposable elements as the key to a 21st century view of evolution. Genetica, 107(1-3):171–179.
Volpe, T., Kidner, C., Hall, I., Teng, G., Grewal, S., and Martienssen, R. (2002). Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science, 297(5588):1833–1837.
von Sternberg, R. and Shapiro, J. A. (2005). How repeated retroelements format genome function. Cytogenet Genome Res, 110(1-4):108–116.