evolution of tandemly repeated sequences: what happens ... papers/1999...sequences, thus it appears...

13
Evolution of Tandemly Repeated Sequences: What Happens at the End of an Array? Bryant F. McAllister, John H. Werren Department of Biology, University of Rochester, Rochester, New York, USA Received: 18 August 1997 / Accepted: 18 September 1998 Abstract. Tandemly repeated sequences are a major component of the eukaryotic genome. Although the gen- eral characteristics of tandem repeats have been well documented, the processes involved in their origin and maintenance remain unknown. In this study, a region on the paternal sex ratio (PSR) chromosome was analyzed to investigate the mechanisms of tandem repeat evolu- tion. The region contains a junction between a tandem array of PSR2 repeats and a copy of the retrotransposon NATE, with other dispersed repeats (putative mobile el- ements) on the other side of the element. Little similarity was detected between the sequence of PSR2 and the region of NATE flanking the array, indicating that the PSR2 repeat did not originate from the underlying NATE sequence. However, a short region of sequence similarity (11/15 bp) and an inverted region of sequence identity (8 bp) are present on either side of the junction. These short sequences may have facilitated nonhomologous recom- bination between NATE and PSR2, resulting in the for- mation of the junction. Adjacent to the junction, the three most terminal repeats in the PSR2 array exhibited a higher sequence divergence relative to internal repeats, which is consistent with a theoretical prediction of the unequal exchange model for tandem repeat evolution. Other NATE insertion sites were characterized which show proximity to both tandem repeats and complex DNAs containing additional dispersed repeats. An ‘‘ac- cretion model’’ is proposed to account for this associa- tion by the accumulation of mobile elements at the ends of tandem arrays and into ‘‘islands’’ within arrays. Mo- bile elements inserting into arrays will tend to migrate into islands and to array ends, due to the turnover in the number of intervening repeats. Key words: Tandemly repetitive DNA — Satellite DNA — Retrotransposons — Junction site — Unequal sister chromatid exchange — Recombination Introduction Tandemly repeated sequences, or satellite DNAs, are present in a wide range of organisms, and in certain cases these sequences may constitute a large percentage of the total genome. Because tandemly repeated sequences are so abundant and appear to have little or no functional significance, they are commonly regarded as ‘‘selfish’’ or ‘‘junk’’ DNA (Doolittle and Sapienza 1980; Orgel and Crick 1980). Studies of tandem repeats have gener- ally focused on determining the nucleotide sequence, abundance, genomic organization, and/or species distri- bution of the repeats (Willard 1989). Tandem repeats are routinely divided into distinct classes based on the size of the repeating sequence; such as microsatellite, minisat- ellite, and satellite DNA (Charlesworth et al. 1994). It is the ‘‘classic’’ satellite sequences (approximately 200 bases in length) which are the focus of this investigation. Two important questions regarding the evolution of sat- ellite sequences remain unresolved: What leads to the origins of these repeats? and What are the dynamics of Correspondence to: Bryant McAllister at Department of Biology, Uni- versity of Texas at Arlington, Box 19498, Arlington, TX 76019-0498, USA J Mol Evol (1999) 48:469–481 © Springer-Verlag New York Inc. 1999

Upload: others

Post on 20-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

  • Evolution of Tandemly Repeated Sequences: What Happens at the End ofan Array?

    Bryant F. McAllister, John H. Werren

    Department of Biology, University of Rochester, Rochester, New York, USA

    Received: 18 August 1997 / Accepted: 18 September 1998

    Abstract. Tandemly repeated sequences are a majorcomponent of the eukaryotic genome. Although the gen-eral characteristics of tandem repeats have been welldocumented, the processes involved in their origin andmaintenance remain unknown. In this study, a region onthe paternal sex ratio (PSR) chromosome was analyzedto investigate the mechanisms of tandem repeat evolu-tion. The region contains a junction between a tandemarray of PSR2 repeats and a copy of the retrotransposonNATE,with other dispersed repeats (putative mobile el-ements) on the other side of the element. Little similaritywas detected between the sequence of PSR2 and theregion of NATE flanking the array, indicating that thePSR2 repeat did not originate from the underlyingNATEsequence. However, a short region of sequence similarity(11/15 bp) and an inverted region of sequence identity (8bp) are present on either side of the junction. These shortsequences may have facilitated nonhomologous recom-bination betweenNATEand PSR2, resulting in the for-mation of the junction. Adjacent to the junction, the threemost terminal repeats in the PSR2 array exhibited ahigher sequence divergence relative to internal repeats,which is consistent with a theoretical prediction of theunequal exchange model for tandem repeat evolution.Other NATE insertion sites were characterized whichshow proximity to both tandem repeats and complexDNAs containing additional dispersed repeats. An ‘‘ac-cretion model’’ is proposed to account for this associa-

    tion by the accumulation of mobile elements at the endsof tandem arrays and into ‘‘islands’’ within arrays. Mo-bile elements inserting into arrays will tend to migrateinto islands and to array ends, due to the turnover in thenumber of intervening repeats.

    Key words: Tandemly repetitive DNA — SatelliteDNA — Retrotransposons — Junction site — Unequalsister chromatid exchange — Recombination

    Introduction

    Tandemly repeated sequences, or satellite DNAs, arepresent in a wide range of organisms, and in certain casesthese sequences may constitute a large percentage of thetotal genome. Because tandemly repeated sequences areso abundant and appear to have little or no functionalsignificance, they are commonly regarded as ‘‘selfish’’or ‘‘junk’’ DNA (Doolittle and Sapienza 1980; Orgeland Crick 1980). Studies of tandem repeats have gener-ally focused on determining the nucleotide sequence,abundance, genomic organization, and/or species distri-bution of the repeats (Willard 1989). Tandem repeats areroutinely divided into distinct classes based on the size ofthe repeating sequence; such as microsatellite, minisat-ellite, and satellite DNA (Charlesworth et al. 1994). It isthe ‘‘classic’’ satellite sequences (approximately 200bases in length) which are the focus of this investigation.Two important questions regarding the evolution of sat-ellite sequences remain unresolved: What leads to theorigins of these repeats? and What are the dynamics of

    Correspondence to:Bryant McAllister at Department of Biology, Uni-versity of Texas at Arlington, Box 19498, Arlington, TX 76019-0498,USA

    J Mol Evol (1999) 48:469–481

    © Springer-Verlag New York Inc. 1999

  • repetitive arrays? Many theoretical models have pro-vided a framework in which to address these issues, al-though the empirical data have been inadequate for dis-tinguishing among the proposed mechanisms underlyingrepeat evolution.

    The first issue is the origins of repetitive arrays. Se-quence duplication by an aberrant recombination(Krüger and Vogel 1975; Ohta 1980) or replication slip-page (Levinson and Gutman 1987; Walsh 1987; Stephan1989) may represent the primary event which creates arepeat. Both of these processes involve duplicating a pre-viously single-copy sequence, thus initializing a tandemrepeat. After amplification of the sequence into a largearray, the final sequence of the repeating unit shouldresemble the original nonrepetitive sequence. Smith(1976) envisioned a different process of repeat origin. Hepostulated that a nonfunctional sequence will experienceunequal exchange events (mismatches out of registerfrom complete identity), and the recombinant productswould not be eliminated by purifying selection. Follow-ing a large number of these unequal exchanges, a peri-odicity evolves in the sequence (Smith 1976; Stephan1989). This model predicts that repeats originate from anexisting sequence, although the relationship between theoriginal sequence and the final repeated unit is indeter-minate. The duplication, replication slippage, and un-equal exchange models all propose a de novo origin ofeach repetitive array, but an alternative is that new re-petitive arrays evolve from existing arrays at other loca-tions in the genome. Excision and circularization of afew repeats from a tandem array may provide a templatefor rolling circle replication (Walsh 1987), and followingamplification into a linear array, the repeats may insert ata new location in the genome. By examining the se-quences immediately flanking repetitive arrays, or byinferring the ancestral state of the underlying single-copysequence prior to repeat origin, it may be possible toidentify patterns that are consistent with one of thesemechanisms.

    A second issue concerns the dynamics and mainte-nance of existing arrays. The most widely acceptedmechanisms underlying maintenance of arrays of satel-lite DNA are unequal exchange and intrastrand exchange(Charlesworth et al. 1994). Successive unequal ex-changes between paralogous units of repetitive arrays onseparate DNA strands prevents individual repeats fromevolving independently, instead the arrays evolve in con-cert (Smith 1976; Ohta 1980; Stephan 1989). Unequalexchanges increase and decrease the number of repeatsin an array, which over successive generations reducessequence variation and causes turnover among repeats bystochastic fixation of a single repeat type. According tothis model, homogenization by unequal exchange occursin regions of the array that are free to recombine; how-ever, unique sequences flanking a repetitive array inhibit

    exchanges involving repeats located near the termini(Stephan 1989). These terminal repeats are also influ-enced by overall rates of unequal exchange, whichcauses identity among repeats to decrease as spatialdistance increases (Ohta 1980). Therefore, evolution byunequal exchange should lead to terminal repeatscontaining the highest sequence divergence relative tothe common repeated sequence. Models of unequalexchange also indicate that higher order organiza-tion of repeats can occur (Stephan 1989). This pro-cess may be facilitated by the insertion of mobileelements into tandem arrays, although there have beenfew studies of the pattern and distribution of elementswithin tandem arrays and little is known about the struc-ture of junctions between tandem arrays and otherDNAs.

    The Paternal Sex Ratio (PSR) chromosome in theparasitic waspNasonia vitripennisprovides a uniquesystem for studying the origin, maintenance, and struc-ture of tandemly repeated sequences. A key feature ofPSR is that it never experiences meiosis and is main-tained strictly as a haploid through mitotic cell cycles.PSR is continually maintained in a haploid cell lineagebecause it is always carried by (haploid) male wasps. Thechromosome is transmitted in sperm, and after fertiliza-tion PSR causes loss of the paternal autosomes in thefirst zygotic division (Werren et al. 1987; Nur et al.1988). The result is development of a male wasp con-taining the haploid maternal autosomes and PSR.Since spermatogenesis is entirely mitotic in male wasps,PSR never undergoes meiotic recombination. Therefore,all processes involved in the maintenance of tandemrepeats on PSR act exclusively during mitotic cell divi-sions.

    The PSR chromosome contains four major families oftandemly repeated sequences (Eickbush et al. 1992).Three of these tandem repeats (PSR2, PSR18, andPSR22) are classic ‘‘satellite’’ DNAs and they appar-ently evolved specifically on PSR, because they are notpresent on the autosomes ofN. vitripennis or relatedspecies (Nur et al. 1988; Eickbush et al. 1992). The threerepeated sequences are approximately 171–214 bp inlength (Eickbush et al. 1992) and are present in largelocalized arrays on the PSR chromosome (Beukeboomand Werren 1993). Overall sequence similarity is evidentand two palindromes are highly conserved among thesequences, thus it appears that these repetitive sequencesare evolutionarily related (Eickbush et al. 1992; Reed etal. 1994). In this study, junction regions between tandemarrays and regions of nonarray DNA are analyzed toevaluate mechanisms of tandem array evolution.Prompted by the frequent association between tandemrepeats and dispersed repeats on the PSR chromosome, amodel is also proposed for the accumulation of mobileDNA in ‘‘islands’’ and at the ends of arrays.

    470

  • Materials and Methods

    The methods for isolating and restriction mapping thel-clones used inthis study have been presented previously (McAllister 1995). Cloneswere isolated from a genomic library (EMBL 3) constructed fromSau3AI partially digested DNA from a standard PSR lab strain (PSR A)crossed to the high fertilizing MI strain ofN. vitripennis(Reed et al.1994). Regions of thel-clones were subcloned into plasmid (Blue-script) vectors for use as probes in Southern blots and as sequencingtemplate. Southern blotting and hybridization followed standard pro-cedures (McAllister 1995; McAllister and Werren 1997).

    Characterization of theNATE P5/PSR2 Junction.Two l cloneswere used for a detailed analysis of a junction region (Fig. 1A);lP8contains the junction between a truncated copy of the retroelementNATEand an array of PSR2 repeats, andlP16 contains a full-lengthcopy of NATE and flanking sequences at both ends of the element.Because the PSR2 sequences are organized in tandem, a cloning andsequencing strategy was used whereby the location of each PSR2 re-peat inlP8 was identified relative to the junction site (Fig. 1B). Theentire array of PSR2 repeats and the adjacent region ofNATE weresubcloned fromlP8 by digesting the clone withSacI and SalI andligating fragments into Bluescript (Stratagene) plasmid, digested withthe same enzymes and dephosphorylated with calf intestinal alkalinephosphatase. A clone with an insert of 1.7 kb containing the junctionsite was obtained and designated p8-SS13 (Fig. 1B). This clone wasfurther subcloned usingHindIII. A 300-bpHindIII fragment containingtheNATEP5/PSR2 junction andHindIII fragments of PSR2 repeat size(171 bp) were subcloned from p8-SS13 into Bluescript (Fig. 1B).

    The junction site, including 63 bases of the first PSR2 repeat, wassequenced using the 300-bpHindIII fragment as template and sequenc-ing both strands. Sequencing of the PSR2 repeats at the other end of thecloned region was initiated by reading one strand of p8-SS13 from thecloning site (oflP8) into the PSR2 array about 370 bases (Fig. 1B).This sequence spanned 135 bases of the PSR2 array before encounter-ing the firstHindIII site, one complete stretch of 171 bases flanked byHindIII sites, and 65 bases into the adjacentHindIII fragment. Sixclones containingHindIII fragments of PSR2 repeat size were fullysequenced on both strands. Three clones were identical to the 171-baseHindIII fragment sequenced from p8-SS13, two shared unique nucleo-tide substitutions with the 65-base incompleteHindIII fragment, andone contained unique nucleotide substitutions that had not been iden-tified in otherHindIII fragments. Upon organization of these sequencesinto a linear array, threeHindIII fragments of PSR2 repeat size were

    present; this was consistent with mapping estimates oflP8 and p8-SS13 indicating the presence of threeHindIII fragments containingPSR2 repeats.

    Sequences from the reverse transcriptase (RT) domain from mul-tiple copies ofNATE cloned from the PSR chromosome revealed aclose relationship among the elements (McAllister 1995). The RT se-quence of the element at this junction (NATEP5) shares 99.0% (597 bpcompared) identity with the same region of the full-length elementNATEP16. Because these elements appear to have recently replicatedfrom a common ancestor, the region corresponding to the junction inNATEP5 was subcloned fromNATEP16 to obtain the sequence of thisregion in a ‘‘normal’’ copy ofNATE.The enzymesEcoRI andHindIIIwere used to subclonelP16 into Bluescript (Fig. 1A). A clone with a1.7-kb insert homologous to the junction region inNATEP5 was ob-tained and this clone was used for sequencing.

    All sequencing reactions were performed on chemically (NaOH)denatured double-stranded plasmid DNA (Ausubel et al. 1992), usingthe Sequenase (USB) kit and labeling with [a35S] dATP. Reactionproducts were electrophoresed through buffer gradient gels. Once se-quences were obtained, manipulations were performed using ESEE(Cabot and Beckenbach 1989). Dot-plot comparisons were performedusing the algorithm of Maizel and Lenk (1981) available in GCG(Genetics Computer Group, Madison, WI, Version 8.0, 1994).

    Amplification of theNATE P5/PSR2 Junction.To verify the pres-ence of the junction between PSR2 andNATEP5 on the PSR chromo-some and to determine if the junction site is variable among chromo-somes, primers flanking the junction were used to amplify the regionfrom five different isolates of PSR using the polymerase chain reaction(PCR). Template DNA was extracted from males carrying PSR A, thestandard laboratory strain that was used to make thel library, and fromfour additional isolates of the PSR chromosome. All of these strainswere collected from natural populations ofN. vitripennisin northeast-ern Utah. The region was amplified using a primer which anneals to thePSR2 repeat (PSR2R, GCT TCT TCA TTT AAG ACG) and anotherthat anneals toNATE (N5, CCA TCC TGT GCA GCC TAG). Reac-tions volumes were 50ml; containing 1/10 reaction buffer (BRL), 2.5mM MgCl2, a 200 nM concentration of each primer, a 100 nM con-centration of each dNTP, 2 U of Taqpolymerase (BRL), and approxi-mately 5 ng of template DNA. A Perkin Elmer Cetus DNA thermalcycler was used with the following parameters: 2× (95°C, 2 min; 54°C,1 min; 72°C, 4 min) linked to 35× (95°C, 0.5 min, 57°C, 1 min, 72°C,2 min). Reaction products were digested withHindIII and electropho-resed through a 1.5% agarose gel to determine the size of theHindIIIfragment containing the junction. The products were also cloned into

    Fig. 1. Structure of clones examined forthe junction analysis.A The clonelP16contains a full-length copy of theretroelementNATEP16. BothlP5 andlP8contain the truncated copy of theretroelementNATEP5 adjacent to thePSR2 array.Horizontal linesbelow thephage represent subcloned regions, withheavy linesindicating sequenced regions.B The subcloning and sequencing strategyemployed to analyze the PSR2 array.Restriction sites areSau3AI (s) andHindIII(h) and thehorizontal linesrepresent theregions sequenced.

    471

  • Bluescript usingHindIII, and the sequence was obtained from bothstrands of one clone containing the junction site from each chromo-some.

    Results

    Several regions containing different copies of the retro-transposonNATEhave been cloned from the PSR chro-mosome (McAllister 1995). These are described later.First, we describe a detailed analysis of a junction be-tween a copy ofNATEand an array of PSR2 repeats. Theregion reveals patterns of repeat evolution and nonho-mologous recombination.

    Analysis of theNATE/PSR2 Junction

    General features of theNATE/PSR2 junction are shownin Fig. 1, based on twol clones containing this region.Restriction digestion patterns are identical in the 11-kbregion shared by the two clones, thus the inserts appar-ently represent overlapping frames of the same region.Hybridization to restriction-digested phage DNA re-vealed that the PSR2 array localizes to one end ofNATEP5, and upon comparisons to intact copies ofNATE,truncation of this end (58) of the element is evident (Fig.1A). The other end (38) of NATEP5 appears intact (Fig.1A). To confirm the hybridization data, a 3.0-kb frag-ment encompassing the LTR, insertion site, and flankingsequence was subcloned fromlP5 (p5-ES3 in Fig. 1A).A primer homologous to a site within the LTR was usedto read the terminal sequence of the LTR and 100 basesflanking the element. Consistent with the hybridizationdata, the 38 end of the element is intact compared to LTRsequences of other elements (data not shown).

    Neither the sequence immediately flanking the intactend ofNATEP5 or a 150-bp sequence located about 1 kbaway from the insertion site exhibited any similarity withthe standard PSR2-1 sequence upon visual comparisonand by dot-plot analysis. Within these flanking se-quences, no repetition was identified. The sequenceflanking this element was very AT rich. Overall, thesequence was 72% AT, and many polynucleotidestretches of A and T were observed. Southern hybridiza-tion revealed that this region contains a middle-repetitivesequence present on both the PSR chromosome and thestandard chromosomal complement (data not shown).Thus, the region flankingNATE to one side containsPSR2 repeats, whereas the other flanking region doesnot.

    In both clones containing the junction, one cloningsite fell within the PSR2 array (Fig. 1A). A restrictionmap indicated that one clone (lP8) contains approxi-mately four PSR2 repeats. This small number of repeatsmadelP8 amenable for a detailed study of the junctionregion, and it was subcloned for further analysis (Fig.1B).

    An approximately 300-bp region of the PSR chromo-some encompassing theNATE P5/PSR2 junction wassubcloned and sequenced. Sequence was obtained for227 bases ofNATEP5 leading up to the junction site andfor 63 bases of the first PSR2 repeat. The homologousregion, and an additional 100 bases representing whatwould be on the other side of the junction site (ifNATEP5 was not truncated at the junction), was sequencedfrom NATEP16. Figure 2A shows the junction sequencecompared to sequences fromNATEP16 and a referencePSR2 repeat [PSR2-1 (Eickbush et al. 1992)]. This com-parison reveals an abrupt transition between theNATEand the PSR2 sequences, with only a ‘‘CT’’ being shared

    Fig. 2. A Sequence of the junction betweenNATEP5 and PSR2. Ineachlarge boxed region,the junction sequence is compared to eitherreference sequence ofNATEP16 or PSR2-1, withdots indicating thatthe same nucleotide is present. Two stretches of sequence similarity,region A and region B, and the palindrome I of PSR2, are indicated by

    shaded boxes.B Proposed pairing configuration involving the regionsof similarity that could have promoted nonhomologous recombinationbetweenNATE and PSR2. Theheavier linesindicate the observedjunction resulting from nonhomologous recombination.

    472

  • at the junction site. The copy ofNATE at the junctionexhibits 96.9% sequence identity with the reference ele-ment throughout 227 bp leading up to the junction site,where there is an immediate transition to a PSR2 se-quence with 92.1% (over 63 bp excluding a 3-bp indel)identity with the reference repeat.

    Origin of theNATE/PSR2 Junction

    As shown in Fig. 2A, two short stretches of sequencesimilarity betweenNATEand PSR2 are present on eitherside of the junction site (designated A and B). Region Ais a stretch of 11 bases of identity shared between a13-base sequence inNATE and a 15-base sequence inPSR2. Region A is 12 bp upstream of the junction inNATE and immediately upstream of the junction inPSR2. Region B is 1 base downstream of the junction inPSR2 and 5 bases downstream inNATE. It consists ofeight bases inNATE,with the exact complement in re-verse orientation present in PSR2. Short regions of se-quence similarity are known to be associated with re-combination events (Metzenberg et al. 1991; Warburtonet al. 1993; Sakagami et al. 1994). This region is unusualin having two short stretches with one a compliment inreverse orientation. It seems unlikely that these two re-gions of notable identity occur in such close proximity ofthe junction by chance.

    A possible configuration leading to recombination be-tween NATE and PSR2 is presented in Fig. 2B. Thisconfiguration involves illegitimate homologous pairingbetweenNATEand PSR2, either within the same chro-mosome or between sister chromatids, and would haveoccurred mitotically (due to the paternal inheritance pat-tern of PSR). Assuming some sort of DNA damagingevent, such as a single-strand or double-strand break,illegitimate homologous pairing betweenNATE andPSR2 could have occurred in region A, with this shortregion of homology being further stabilized byNATEregion B pairing with the complementary strand of PSR2region B (Fig. 2B). The formation of the hybrid structurewith region B is made possible by the looping of the17-base stretch separating A and B withinNATE.In ad-dition, a short palindromic region (AGGCCT) may havefurther facilitated the DNA bending that was required.Resolution of the configuration presumably led to abreak between A and B, with recombination betweenNATE and PSR2 at this site causing truncation of theNATE element. It is also interesting to note that thisregion of the PSR2 sequence has been involved in re-combination with another tandemly repetitive sequence(Reed et al. 1994) and is adjacent to a 14-bp palindromicregion (palin I) present in the PSR2 repeat (Fig. 2A).Palindromes are known to facilitate DNA melting(Chalker et al. 1993) and to be associated with recom-bination events (Bigot et al. 1990; Reed et al. 1994).

    We investigated several alternatives to nonhomolo-

    gous recombination that may have led to the formation ofthis junction. To determine if there is evidence that thePSR2 array is encroaching onNATE, the junction sitewas determined on different chromosomes. The junctionsite was amplified by PCR, cloned, and sequenced fromPSR A (lab standard used to constructl library) and fouradditional PSR chromosomes that were collected inde-pendently from natural populations. Identical junctionsites betweenNATEP5 and PSR2 were found on all fivePSR chromosomes, confirming that the junction site isnot a cloning artifact and indicating its stability duringthe time interval since the divergence of these chromo-somes.

    Another alternative to nonhomologous recombinationis that the PSR2 repeat family evolved fromNATE.Todetermine if there were large-scale similarities betweenthe PSR2 sequence and the region ofNATEunderlyingthe junction, dot-plot comparisons (Maizel and Lenk1981) were performed. A 229-bp region ofNATE P5leading up to the junction site, the four PSR2 repeats, and100 bp ofNATEP16 on the other side of the junction sitewere joined into a composite sequence and this was com-pared to the PSR2-1 sequence. Window sizes of 8 and 21bases were used, with stringencies of 50, 75, and 100%.Similarity among the four PSR2 repeats was evident inall of the dot-plot analyses, although no remarkable simi-larity was detected between the PSR2 repeat andNATE.Therefore, except for the two short regions of similaritydirectly adjacent to the junction, there is no significantsimilarity betweenNATEand PSR2. The observed pat-tern is more consistent with a recombination event be-tween NATE and PSR2 than with PSR2 arising fromNATE.

    Another possible explanation for the formation of thisjunction is an insertion of PSR2 repeats intoNATE.If thePSR2 array inserted intoNATE, it is expected that theremaining piece ofNATEP5 should be flanking the otherend of this PSR2 array. Both PCR assays and a PSRlibrary screen were used in attempts at isolating the ex-pected junction at the other end of the array. A combi-nation of primers were used in an attempt to amplify theexpected junction. Two primers were used that were lo-cated within the LTR ofNATEand oriented in the samedirection (N8, 58 ACG ATT ACA TAA AGC ATA G;N7, 58 CAC GAG GCA TCA TCT GCG), in combina-tion with primers homologous to the PSR2 and PSR18sequences (2R, 58 GGC CTC AGG ACG GCG TTC; 2F,58 GAA CGC CGT CCT GAG GCC; 18R, CGA CATCTT TTC CTC AGC G; 18F, 58 GAG GAA AAG ATGTCG CTC C). Based on the organization of the PSR2sequence at theNATE P5/PSR2 junction, eitherNATEprimer in combination with the 2R primer should haveproduced the expected product. Attempts at amplifyingthe junction with all of these primer combinations wereunsuccessful (data not shown). In addition, the PSR ge-nomic library was screened to determine if clones rep-

    473

  • resenting the expected structure were present. The regionof NATEP16 (p16-EH27) that encompasses the junctionin NATE P5 was used as a probe to screen about fivegenome equivalents of the library. Approximately 150clones hybridized strongly to the probe. Thirty-twoclones were isolated, regrown, and screened for hybrid-ization to the PSR2 repeat. No clones were identifiedwhich hybridized to both probes. Therefore, these ex-periments failed to identify the expected organization atthe other end of the array.

    Sequences of PSR2 Repeats at an Array Terminus

    To investigate the evolution of repeats at the junctionregion, the number and linear structure of PSR2 repeatsin this region was determined. The observed pattern isconsistent with unequal exchange being inhibited at the

    junction with NATE. After the sequence of the PSR2array present inlP8 was obtained, the presence of thethreeHindIII restriction fragments of PSR2 was verified.Using the initiation of the first PSR2 repeat to define theperiodicity in the array, four complete PSR2 repeats (ap-prox 171 bp each) and one partial PSR2 repeat (27 bp)were identified in this sequence (Fig. 3). Even though theentire sequence of the array was not obtained in a linearfashion, by sequencing overlapping clones the relativepositioning of the PSR-2 repeats is this region was de-termined (see Materials and Methods). The repeat whichforms the junction withNATE P5 is referred to as 8-1,and the others are numbered consecutively (8-2, etc.)into the array (Fig. 3).

    For comparative purposes, five ‘‘random’’ PSR2 re-peat sequences were available (Eickbush et al. 1992).These published PSR2 repeats presumably represent a

    Fig. 3. Sequences of five previously published PSR2 sequences that represent internal repeats and four complete and one partial repeats obtainedfrom the junction region. Recognition sites forSau3AI and HindIII are underlined,and the palindromic regions are indicated.

    474

  • random sample of repeats located internally within thearray, because they were subcloned and sequenced fromfour l clones, each with 15- to 20-kb inserts consistingentirely of PSR2 repeats. PSR2 sequences on the PSRchromosome are remarkably homogeneous in restrictionenzyme digestion patterns, there is no evidence of ahigher-order structure in the array, and they are localizedin one or a few arrays (Eickbush et al. 1992; Beukeboomand Werren 1993; J.H.W., unpublished data). As is evi-dent in the visual comparison of repeats in Fig. 3, thePSR2 repeats adjacent to the junction site are similar tothe internal PSR2 repeats. Nevertheless, there are a num-ber of unique nucleotides in the PSR2 repeats at the endof this array which have not been identified in the otherPSR2 repeats (Fig. 3).

    Sequence comparisons were performed to quantifythe differences between the PSR2 repeats at the junctionrelative to the internal repeats. The mean pairwise se-quence difference among the five internal repeats was0.044 ± 0.019 (SD). Each of the four repeats at the junc-tion was compared to the five internal repeats, and meansand standard deviations of these pairwise comparisonswere calculated. The sequence difference between thefirst, second, and third repeats was 0.068 ± 0.021, 0.056± 0.012, and 0.075 ± 0.015, respectively, which washigher than the calculated difference among the internalrepeats. The fourth repeat in the array 8-4 had a meandifference of 0.034 ± 0.019, lower than the value ob-served among the internal repeats. These sequence com-parisons reveal a trend where the first three repeats in thearray are more divergent from the internal repeats thanthe internal repeats are to each other, and the fourthrepeat is very similar to the internal repeats.

    The distance estimates indicate that the three mostterminal repeats are distinct from other PSR2 repeats,although visual comparisons reveal many of the nucleo-tide positions which distinguish these first three repeatsare clustered and shared among them (Fig. 3). In contrastto the first three repeats, the fourth repeat does not con-tain these distinct features and it is almost identical to theinternal PSR2 repeats. Although the first three repeats

    are similar, different regions of each repeat may repre-sent units with their own history, and unequal exchangeevents have brought these units together into the extantrepeats which were observed. One apparent example ofthis pattern is in repeat 8-2 from the PSR2 array termi-nus. This repeat is a perfect mosaic containing partialregions of the first and third repeats. By postulating anarray containing the first and third repeats, a single un-equal exchange between these two repeats would gener-ate the second repeat (Fig. 4). This unequal exchangeevent appears to be recent, because no nucleotide sub-stitutions are apparent in the 348 bases representing themosaic second repeat compared to the progenitor seg-ments. In addition to the apparent mosaic nature of thesecond repeat, a block of distinctive nucleotides thathave not been identified in other PSR2 repeats are sharedby the first, second, and third repeats (Fig. 3 and 4). Thepresence of these nucleotides in the second repeat isconsistent with the previously mentioned unequal ex-change, but their presence in the third repeat indicatesanother unequal exchange. These distinctive nucleotidesmay have been present at the formation of the junction,with subsequent divergence of the more interior array, orthe terminal repeats may have diverged from the internalarray by mutations following formation of the junction.In either case, it appears that unequal exchanges havemoved this segment into the array. Furthermore, at leastthree nucleotide substitutions have occurred in both thefirst and the third repeats during the time separating thesetwo unequal exchanges. The pattern observed indicatesthat the repeats adjacent toNATE are divergent fromconsensus PSR2 repeats but converge on the consensussequence by the fourth repeat.

    General Structure ofNATE-Repeat Junctions

    The structure of five additional regions containingNATEinserts are shown in Fig. 5. The results are based oncloning, restriction digestion analyses, and Southern hy-bridizations using subclones from these regions as

    Fig. 4. Inferred history of unequal exchangebetween repeats adjacent to the junction withNATE.Nucleotide sites that are present in at least two PSR2repeats (but not all) are represented assolid lines,whereas the sites only present in a single repeat arerepresented bydotted lines.Recombination events aredepicted as one chromatid with a double-strand breakusing the other chromatid as template for repair. Theshaded boxesindicate the window where theexchange event is localized for the strand containingthe PSR 2 repeats.A Oldest inferred unequalexchange shifting the block of unique nucleotidesinto the array.B Unequal exchange creating themosaic second repeat.C Structure of the array in itspresent form.

    475

  • probes to restriction-digested genomic DNA. As a gen-eral pattern,NATE inserts are found near tandem arraysand near other dispersed repetitive sequences. Three ofthe five junctions contain nearby (or immediately flank-ing) regions of known tandemly repetitive DNA. As de-scribed previously,NATEP5 has a junction with an arrayof PSR2, with apparently dispersed repetitive DNAflanking the other end of the element.NATE P18 is atruncated retroelement with PSR79 on one side and ashort (about 1.5-kb) stretch of PSR79 on the other, fol-lowed by a complex nonarray region.NATE P41b/41aare adjacent truncated elements, with approximately 5 kbof complex DNA followed by a region of PSR22 repeats.The other two clones containNATEelements adjacent tocomplex non-array DNA. The regions of complex DNAflanking these elements are primarily composed of a di-verse set of dispersed repetitive DNA (for an example,see McAllister and Werren 1997). A high frequency oftruncation is also observed among these copies ofNATE;of six elements, at least four are truncated.

    Several specific features are noteworthy (Fig. 5).NATE P41a and b are two back-to-back truncated ele-ment in the vicinity of a PSR22 array. It is estimated thatthere are nine copies ofNATEon the 40-Mb PSR chro-mosome. The probability of these two elements beingassociated by chance is very low. They must have either(a) preferentially inserted next to each other or (b) in-serted at more distant location and subsequently‘‘moved’’ into close proximity. This may have been fa-cilitated by nonhomologous recombination.NATE P18also shows interesting features. It is very close to the end

    of a PSR79 array. PSR79 is a complex tandemly repeatedsequence (Eickbush et al. 1992), and current deletionstudies indicate that it occurs in a localized block on thePSR chromosome (Beukeboom and Werren 1993; Mc-Allister, Beukeboom, and Werren, in preparation). Theprobability thatNATEP18 inserted at the end of a PSR79array is thus low, suggesting migration of this elementtoward the end of an array after insertion.NATEP5 is atthe end of a PSR2 array and near other mobile elements.BothNATEP17 andNATEP16 occur in complex geneticregions containing other dispersed repetitive DNA (pre-sumed mobile elements). The general pattern suggestsaccumulation of mobile elements into complex regions ator near junctions with tandemly repetitive DNA. Belowwe present a general model to account for this observa-tion.

    Discussion

    Although the general characteristics of ‘‘satellite’’ se-quences are well documented, the mechanisms govern-ing their origin and maintenance still remain elusive. Oneof the reasons for not being able to resolve these funda-mental processes is the inherent difficulties involved instudying long tandem arrays of repeated sequences.Analyses of junction sites at the ends of repetitive arraysprovide an opportunity to characterize repetitive se-quences with a unique level of resolution. Several studiesof junctions between different tandemly repeated se-

    Fig. 5. Cloned regions from the PSR chromosome. Positions of the retroelementNATE are indicated by thelarge rectangles,with the blackarrowheadsrepresenting the LTRs. Theblocks of vertical linesrepresent arrays of the tandemly repetitive sequences PSR2, PSR22, and PSR79.Regions flanking the copies ofNATEwhich have middle-repetitive hybridization patterns are represented bycross-hatched boxes.

    476

  • quences reveal that these junctions are generally com-plex and not defined by a specific site of transition(Maresca and Singer 1983; Lohe and Brutlag 1987; Reedet al. 1994). This complexity arises because the differentrepeated sequences interdigitate throughout a junctionregion, suggesting that multiple recombination eventshave occurred between established repetitive arrays.Single junctions between repetitive and nonrepetitive se-quences provide a better means for examining the prop-erties underlying the evolution of tandem arrays, such asthe studies of arrays of tandem repeats within codingregions (Eckert and Green 1986; Hogan et al. 1995),although it can be difficult to identify where the arraybegins. When junctions are not localized to a specificsite, it is difficult to determine the potential role theseregions may have played in the evolution of the repeti-tive array. The junction characterized in this study in-volves the tandem repeat PSR2 adjacent to a truncatedend of the retroelementNATE.One benefit of this par-ticular organization was the ability to resolve the transi-tion site betweenNATEP5 and PSR2 by comparing thejunction region to a full-length copy ofNATE and areference PSR2 repeat. This comparison revealed an ex-tremely abrupt transition from theNATEsequence to thebeginning of the PSR2 array, where only short regions ofsimilarity were identified around the junction.

    Origin of the Junction Site

    By specifically identifying the junction site, this studywas able to distinguish between sequences contained inthe array and sequences flanking the array. Thus, thesequence of the full-length retroelementNATEP16 rep-resents the ancestral state ofNATEP5 prior to the originof the junction. This study addresses several mechanismsthat could have led to the origin of this junction betweenNATEP5 and PSR2. (1) The PSR2 sequence may haveoriginated from the underlyingNATEsequence. (2) Dur-ing array amplification, tandemly repeated sequencesmay expand into nonrepetitive sequences that are flank-ing arrays. Therefore, the PSR2 array may be expandinginto the sequence ofNATEP5. (3)NATEP5 may haveinserted into the PSR2 array or (4) alternatively the PSR2array may have inserted intoNATE P5. (5) The twosequences may have become juxtaposed following astructural rearrangement of the chromosome.

    Several proposed mechanisms of tandem repeat originmake specific predictions about the relationship betweenthe repeated sequence within an array and the sequencefrom which the array originated. Similarity betweenPSR2 andNATEis expected if the PSR2 repeat originat-ed from this region ofNATEby duplication (Krüger andVogel 1975; Ohta 1980) or replication slippage (Levin-son and Gutman 1987; Walsh 1987; Stephan 1989).Upon comparison of these sequences, no extensive simi-larity was detected between the PSR2 repeat and the

    region of NATE underlying the junction (except for ashort 11-bp region). Therefore, the PSR2 repeat was un-likely to have been generated from this region ofNATEby sequence duplication or an aberrant replication error.Under the model of successive unequal exchanges caus-ing the origin of a repetitive array, there is an unpredict-able relationship between the final repeated sequenceand the progenitor sequence (Smith 1976; Stephan1989). Even though the relationship is not direct, a tran-sition from the repeat sequence to the flanking sequenceis expected, and this transition was not observed. Fur-thermore, instability of the junction site is expected if thearray originates by successive unequal exchange, be-cause chance unequal exchange events are expected tooccur between repeats in the array and sequence flankingthe array (Smith 1976). Inconsistent with this expectationis the apparent stability of the junction site among thefive independent natural isolates of the PSR chromo-some. Since this junction does not represent the expectedstructure following the primary origin of the PSR2 arrayby either duplication, slippage replication, or unequalexchange, another mechanism is necessary to explain thesecondary formation of the junction following the estab-lishment of the PSR2 repeat.

    Secondary processes may have led to formation ofthis junction after the origin of the PSR2 array on thischromosome. One possibility is thatNATE P5 insertedinto the PSR2 array in its current form; however, twoobservations are inconsistent with this hypothesis. SinceNATEP5 is truncated at one end, the long terminal repeatis absent from this end of the element making the currentform of the element incapable of insertion (Varmus andBrown 1992). Upon initial insertion,NATEP5 must havecontained long terminal repeats at both ends of the ele-ment and the truncation and formation of the currentjunction with PSR2 must have followed the element’sinsertion. Additionally, the PSR2 repeat is located onlyat the truncated end ofNATE P5. Had this element in-serted into a PSR2 array, the presence of the repeat isexpected in the flanking regions at both ends of the el-ement. However, it is possible that the PSR2 repeat wasonce located at both ends of the element, and the se-quence has subsequently been lost on the intact end dueto element accretion (see model below).

    The immediate transition from theNATEP5 sequenceto the PSR2 array is curiously similar to an insertion, andan alternative mode of insertion is that the PSR2 arrayinserted into this region ofNATEP5. Although mobilityof tandemly repeated sequences has never been demon-strated, the rolling-circle-replication model (Walsh 1987)provides a mechanism whereby this process could occur.The presence of extrachromosomal ‘‘plasmids’’ contain-ing tandemly repeated sequences does provide empiricalsupport that template for rolling circle replication is pro-duced by intrastand exchange within repetitive arrays(Kiyama et al. 1986, 1987; Okumura et al. 1987). How-

    477

  • ever, replication of these plasmids and insertion of alinear array of repeats has not been demonstrated. Thestructure of the junction in this study is consistent withan insertion of the PSR2 array intoNATE P5, but at-tempts at verifying this hypothesis by identifying themissing piece ofNATEP5 at the other end of the PSR2array were unsuccessful. Although a PSR2 insertion wasnot ruled out by the failure to detect the expected struc-ture at the other end of the PSR2 array, it is unlikely thatthis structure is currently present on the PSR chromo-some.

    The absence of evidence supporting any of the previ-ously mentioned mechanisms, and the presence of no-table regions of similarity in this relatively short junctionregion, suggests that the junction was formed by nonho-mologous recombination which joined an establishedPSR2 array with an internal region ofNATEP5. Whenshort regions of similarity are involved in nonhomolo-gous recombination, resolution may result in productswhich are nonconservative (Sakagami et al. 1994), asappears in this junction betweenNATEP5 and PSR2. Itis worth noting that one of the regions involved was inreverse orientation, therefore establishing a complex in-teraction among the DNA strands. This nonhomologousrecombination would have truncatedNATE.Since a largeamount of this chromosome appears to be functionallyinert (Beukeboom and Werren 1993; McAllister, Beuke-boom, and Werren, in preparation), it is reasonable toassume that the rearrangement was not deleterious.

    Maintenance and Evolution of Repetitive Arrays

    This junction site is also useful for examining processesgoverning the maintenance and evolution of repetitivearrays. Unequal exchange is currently the favoredmechanism to explain the observed sequence homogene-ity among repeats in repetitive arrays (Kru¨ger and Vogel1975; Smith 1976; Stephan 1989; Charlesworth et al.1994; Elder and Turner 1995). One prediction of themodel is that repeats at the end of an array should exhibitthe highest level of sequence divergence (Ohta 1980;Stephan 1989). This pattern has been documented in ar-rays of 200-bp tandem repeats within Balbiani ring genesin Chironomus tentans(Höög et al. 1988) and in alphasatellite arrays in the human genome (Wevrick et al.1992; Cooper et al. 1993). This study indicates that thesame phenomenon occurs in an array of PSR2 repeats onthe PSR chromosome. The three most terminal repeats ofthis PSR2 array exhibited greater overall sequence di-vergence from a set of internal PSR2 repeats. Further-more, these end repeats contained many unique nucleo-tide substitutions that have not been identified in otherPSR2 repeats. Effects of the array terminus apparentlyact over a very short distance, because increased diver-gence was only observed in the first three repeats,whereas the fourth repeat was very similar to the internal

    repeats. This observation is consistent with Stephan’s(1989) simulation of repetitive sequence evolution whereunequal exchange at the array terminus was suppressedby the presence of nonrepetitive sequences (i.e., presenceof the nonhomologousNATE element), and this effectshould operate over very short distances.

    The observed pattern is consistent with unequal ex-changes being inhibited at the array end. However, someunequal exchanges are apparent. Unequal exchange inthese end repeats may be rare relative to exchange eventswithin the array, and these exchange events may be lo-calized (Ohta 1980). The data are consistent with both ofthese assumptions; only two unequal exchanges wereevident and both of these apparently resulted from dis-placement of the array by a single repeat. However, whyis there no evidence of gene conversion? How do the endrepeats diverge in the presence of unequal exchange?

    An alternative explanation that encompasses the ob-servations is based on a revised model of unequal ex-change proposed by Fletcher (1994). In his model, un-equal exchange is initiated by a double-strand break in arepetitive array. To repair the break, the two strandssearch independently for homology and ultimately pairwith different repeats. If the strands pair into a Hollidaycomplex with the 58 and 38 ends in the correct orientation(facing each other), although with a number of repeatsseparating the ends, resolution of the complex causesexpansion of the array. This junction region provides aspecial case for this process as illustrated by the inferredunequal exchange events in Fig. 4. When a double-strandbreak occurs near a junction site, homology searchingand pairing of the strand containing the junction may bedominated by the flanking sequence, whereas the otherstrand contains repetitive sequence and can pair with anyrepeat in the array. Studies have demonstrated that re-combination is influenced by long stretches of sequenceidentity (Waldman and Liskay 1987, 1988; Metzenberget al. 1991), so when a break occurs near a junction thestrand containing the array terminus and flanking se-quence should preferentially anneal to its homologoussite. Because the junction region and terminal repeatswould always be repaired from identical sequence on thetemplate DNA strand, the sequence of these repeatswould be protected from gene conversion. Nucleotidesubstitutions occurring in the end repeats would be re-tained, even in the presence of recombination. Depend-ing on where the break occurs, nucleotide substitutionspresent in the end repeats could shift into the array dur-ing this process, and the pattern exhibited by the lastthree repeats in this PSR2 array suggests that this hasoccurred.

    The data on the windows in which recombinationevents have occurred also implicate the double-strandbreak model. Very short stretches of identity were ob-served in the regions where recombination was evident.The most recent inferred unequal exchange was localized

    478

  • in a stretch of 26 bases of complete sequence identity,whereas the older exchange occurred in a window of 15bases of complete identity between the junction withNATEand the distinct nucleotides that were shifted intothe array by the exchange. Other studies have also re-vealed that unequal exchange in repetitive arrays occursin very short tracts of identity (Cabot et al. 1993; War-burton et al. 1993; Reed et al. 1994). Because the double-strand break model allows the two broken ends to inde-pendently anneal at different locations on the templatestrand, the short window of identity where recombinationevents are localized may be misleading with regard to thesequence identity that was present between the brokenends and the template.

    Findings of this study are consistent with unequalexchange being the mechanism maintaining homogene-ity throughout the array. A previous report also docu-mented unequal exchange on the PSR chromosome, al-though the exchanges were between different families ofrepetitive sequences (Reed et al. 1994). Because of theunique transmission pattern of PSR, unequal exchangescan only occur between sister-chromatids during mitotic(haploid) cell divisions, thus these studies provide sup-port that unequal exchange can be limited to sister-chromatids. This is consistent with a sequence analysisof the Responderarray in Drosophila melanogasterwhich indicated that unequal exchange was primarily aninterchromatid process (Cabot et al. 1993).

    Accretion Model Spatial Structure of Tandem Arrays

    We have demonstrated an association among tandemlyrepeated arrays and truncated dispersed repeats on thePSR chromosome, and here we propose a model thatmay provide a general explanation for this frequentlyobserved association. It is recognized that mobile ele-ments will accumulate within noncoding regions such astandem DNA, because they are less likely to cause harm-ful mutations (Charlesworth et al 1994). However, therehas been little discussion of how mobile elements areexpected to be distributed spatially within tandem arrays.Here we present an ‘‘accretion model’’ that predicts theaccumulation of mobile elements at the ends of arraysand into ‘‘islands’’ within arrays. The model is based onthe prediction that, due to the stochastic processes ofduplication and deletion from unequal exchanges, theultimate fate of any particular array is loss (Charlesworthet al. 1986, Stephan 1987). This drift in array size hasimplications for macrostructural features of tandem ar-rays and associated DNAs. In particular, we propose thatmobile elements inserting into tandem arrays will sub-sequently ‘‘migrate’’ to the ends of arrays, and into is-lands within arrays, due to the turnover of tandem re-peats.

    Consider an insertion of a mobile element into a tan-dem array (see Fig. 6). Due to the process of unequal

    exchange, the portion of tandem array located to one side(or the other) of the element will eventually be lost bychance drift in repeat numbers, causing migration of theelement to the end or the array (Fig. 6A). Following theinsertion of more elements and repetition of the process,unequal exchange will ultimately cause the accumulationof many mobile elements at the ends of the tandem array.Similarly, if two elements are inserted into a tandemarray (Fig. 6B), there would be an intervening array be-tween them. Random loss of this intervening array wouldmove the elements together where they would accumu-late into an ‘‘island’’ within the tandem array by thisaccretion process. Accretion of additional elements andislands will lead to the further growth of non-array is-lands composed of dead, dying, and functional mobileelements.

    The accretion model, therefore, predicts both islandsof mobile elements within tandem arrays and large num-

    Fig. 6. Large-scale effects of mobile element insertion and unequalexchange on the structure of tandem arrays. Transposable elements(TE) insert into a large tandem array. A single product of unequalexchange is indicated by theheavy line,and this causes migration of thetransposable element. Ultimately, accumulation of mobile elements oc-curs.A Terminal accretion of mobile elements at the end of an array.B Island formation by mobile element accumulation within repetitivearrays.

    479

  • bers of mobile elements at the ends of tandem arrays.Features in the cloned regions of the PSR chromosomeare consistent with this prediction. In all regions contain-ing the terminal region of a tandem array, many differentmiddle repetitive sequences are present flanking thesearrays. Retroelements should be more susceptible to ac-cretion, because once inserted into a tandem array, theyshould rarely excise (Nuzhdin and Mackay 1994). Sev-eral recent studies have characterized complex repetitiveDNAs involving retroelements (Wevrick et al. 1992;Hochstenback et al. 1994; Nurminsky et al. 1994; Le etal. 1995). Consistent with the accretion model, Wevricket al. (1992) describe the presence of retroelementsflanking alpha satellite arrays, and Le et al. (1995) foundthat Drosophila heterochromatin is composed of tan-demly repetitive DNA with alternating regions of com-plex DNA. Analysis of one of the ‘‘islands’’ of complexDNA revealed presence of a retroposon.

    Local accumulation of different mobile elements byaccretion, coupled with truncations (as observed withcopies ofNATEon PSR), could result in the juxtaposi-tion of different coding sequences. This may have im-plications for the evolution of novel genetic sequences,because it is conceivable that such accretions will bebreeding grounds for new mobile elements, viruses, ornovel coding sequences.

    Acknowledgments. The authors thank B. Charlesworth, T. Eickbush,I. Kobayashi, H. Ochman, and A. Ray for discussions of these dataand/or comments on previous versions of the manuscript. Duringpreparation of the manuscript, B. McAllister was supported by an NSF/Alfred P. Sloan Postdoctoral Research Fellowship in Molecular Evo-lution. This research was supported by a grant from the National Sci-ence Foundation to J. Werren.

    References

    Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, SmithJA, Struhl K (1992) Current protocols in molecular biology, Suppl21. John Wiley & Sons, New York, pp. 7.3.7

    Beukeboom LW, Werren JH (1993) Deletion analysis of the selfishBchromosome,Paternal Sex Ratio(PSR), in the parasitic waspNa-sonia vitripennis.Genetics 133:637–648

    Bigot Y, Hamelin M-H, Periquet G (1990) Heterochromatin conden-sation and evolution of unique satellite-DNA families in two para-sitic wasp species:Diadromus pulchellusand Eupelmus vuilleti(Hymenoptera). Mol Biol Evol 7:351–364

    Cabot EL, Beckenbach AT (1989) Simultaneous editing of multiplenucleic acid and protein sequences using ESEE. Comp Appl Biosci5:233–234

    Cabot EL, Doshi P, Wu M-L, Wu C-I (1993) Population genetics oftandem repeats in centromeric heterochromatin: unequal crossingover and chromosomal divergence at theResponderlocus ofDro-sophila melanogaster.Genetics 135:477–487

    Chalker AF, Okeley EA, Davison A, Leach DRF (1993) The effects ofcentral asymmetry on the propagation of palindromic DNA in bac-teriophagel are consistent with cruciform extrusion in vivo. Ge-netics 133:143–148

    Charlesworth B, Langley CH, Stephan W (1986) The evolution ofrestricted recombination and the accumulation of repeated DNAsequences. Genetics 112:947–962

    Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionarydynamics of repetitive DNA in eukaryotes. Nature 371:215–220

    Cooper KF, Fisher RB, Tyler-Smith C (1993) Structure of the se-quences adjacent to the centromeric alphoid satellite DNA array onthe human Y chromosome. J Mol Biol 230:787–799

    Doolittle WF, Sapienza C (1980) Selfish genes, the phenotype para-digm and genome evolution. Nature 284:601–603

    Eckert RL, Green H (1986) Structure and evolution of the humaninvolucrin gene. Cell 46:511–523

    Eickbush DG, Eickbush TH, Werren JH (1992) Molecular character-ization of repetitive DNA sequences from a B chromosome. Chro-mosoma 101:575–583

    Elder JF Jr, Turner BJ (1995) Concerted evolution of repetitive DNAsequences in eukaryotes. Q Rev Biol 70:297–320

    Fletcher HL (1994) Possible loss of length conservation and reciprocityduring recombination or conversion in tandem arrays. Genetics138:511–518

    Hochstenback R, Harhangi H, Schouren K, Hennig W (1994) Degen-eratingGypsyretrotransposons in a male fertility gene onY chro-mosome ofDrosophila hydei.J Mol Evol 39:452–465

    Hogan NC, Slot F, Traverse KL, Garbe JC, Bendena WG, Pardue M-L(1995) Stability of tandem repeats in theDrosophila melanogasterHsr-omeganuclear RNA. Genetics 139:1611–1621

    Höög C, Daneholt B, Wieslander L (1988) Terminal repeats in longrepeat arrays are likely to reflect the early evolution of Balbiani ringgenes. J Mol Biol 200:655–664

    Kiyama R, Hideki M, Oishi M (1986) A repetitive DNA family (Sau3Afamily) in human chromosomes: extrachromosomal DNA andDNA polymorphism. Proc Natl Acad Sci USA 83:4665–4669

    Kiyama R, Okumura K, Matsui H, Bruns GAP, Kanda N, Oishi M(1987) Nature of recombination involved in excision and rearrange-ment of human repetitive DNA. J Mol Biol 198:589–598

    Krüger J, Vogel F (1975) Population genetics of unequal crossing over.J Mol Evol 4:201–247

    Le MH, Duricka D, Karpen GH (1995) Islands of complex DNA arewidespread in Drosophila centric heterochromatin. Genetics 141:283–303

    Levinson G, Gutman GA (1987) Slipped-strand mispairing: a majormechanism for DNA sequence evolution. Mol Biol Evol 4:203–221

    Lohe AR, Brutlag DL (1987) Adjacent satellite DNA segments inDrosophila,structure of junctions. J Mol Biol 194:171–179

    Maizel JV, Lenk RP (1981) Enhanced graphic matrix analysis ofnucleic acid and protein sequences. Proc Natl Acad Sci USA 78:7665–7669

    Maresca A, Singer MF (1983) Deca-satellite: a highly polymorphicsatellite that joinsa-satellite in the African green monkey genome.Mol Biol 164:493–511

    McAllister BF (1995) Isolation and characterization of a retroelementfrom a B chromosome (PSR) in the parasitic waspNasonia vitrip-ennis.Insect Mol Biol 4:253–262

    McAllister BF, Werren JH (1997) Hybrid origin of a B chromosome(PSR) in the parasitic waspNasonia vitripennis.Chromosoma 106:243–253

    Metzenberg AB, Wurzer G, Huisman THJ, Smithies O (1991) Homol-ogy requirements for unequal crossing over in humans. Genetics128:143–161

    Nur U, Werren JH, Eickbush DG, Burke WD, Eickbush TH (1988) A‘‘selfish’’ B chromosome that enhances its transmission by elimi-nating the paternal genome. Science 240:512–514

    Nurminsky DI, Shevelyov YY, Nuzhdin SV, Gvozdev VA (1994)Structure, molecular evolution and maintenance of copy number ofextended repeated structures in theX-heterochromatin ofDro-sophila melanogaster.Chromosoma 103:277–285

    Nuzhdin SV, Mackay TFC (1994) Direct determination of retrotrans-poson transposition rates inDrosophila melanogaster.Genet Res63:139–144

    Ohta T (1980) Evolution and variation of multigene families. Springer-Verlag, New York

    480

  • Okumura K, Kiyama R, Oishi M (1987) Sequence analyses of extra-chromosomalSau3A and related family DNA: analysis of recom-bination in the excision event. Nucleic Acids Res 15:7477–7489

    Orgel LE, Crick FHC (1980) Selfish DNA: the ultimate parasite. Na-ture 284:604–607

    Reed KM, Beukeboom LW, Eickbush DG, Werren JH (1994) Junctionsbetween repetitive DNAs on the PSR chromosome ofNasonia vit-ripennis: association of palindromes with recombination. J MolEvol 38:352–362

    Sakagami K, Tokinaga Y, Yoshikura H, Kobayashi I (1994) Homol-ogy-associated nonhomologous recombination in mammalian genetargeting. Proc Natl Acad Sci USA 91:8527–8531

    Smith GP (1976) Evolution of repeated DNA sequences by unequalcrossover. Science 191:528–535

    Stephan W (1987) Quantitative variation and chromosomal location ofsatellite DNAs. Genet Res 50:41–52

    Stephan W (1989) Tandem-repetitive noncoding DNA: forms andforces. Mol Biol Evol 6:198–212

    Varmus H, Brown P (1989) Retroviruses. In: Berg DE, Howe MM(eds) Mobile DNA. Am Soc Microbiol, Washington, DC, pp. 53–108

    Waldman AS, Liskay RM (1987) Differential effects of base-pair mis-match on intrachromosomal versus extrachromosomal recombina-tion in mouse cells. Proc Natl Acad Sci USA 84:5340–5344

    Waldman AS, Liskay RM (1988) Dependence of intrachromosomalrecombination in mammalian cells on uninterrupted homology.Mol Cell Biol 8:5350–5357

    Walsh JB (1987) Persistence of tandem arrays: implications for satelliteand simple-sequence DNAs. Genetics 115:553–567

    Warburton PE, Waye JS, Willard HF (1993) Nonrandom localizationof recombination events in human alpha satellite repeat unit vari-ants: Implications for higher-order structural characteristics withincentromeric heterochromatin. Mol Cell Biol 13:6520–6529

    Werren JH, Nur U, Eickbush DG (1987) An extrachromosomal factorcausing loss of paternal chromosomes. Nature 327:75–76

    Wevrick R, Willard VP, Willard HF (1992) Structure of DNA nearlong terminal arrays of alpha satellite DNA at the centromere ofhuman chromosome 7. Genomics 14:912–923

    Willard, HF (1989) The genomics of long tandem arrays of satelliteDNA in the human genome. Genome 31:737–744

    481