hi kathy, i’ve had a look at the remapped version of chr7 (mal7.remapped this is the cons file you...

Upload: dominick-powell

Post on 11-Jan-2016

TRANSCRIPT

Page 1

Hi Kathy,

I’ve had a look at the remapped version of chr7 (MAL7.remapped; this is the cons file you gave me) and the old version (MAL7.embl) in order to get some clues as to the true assembly. Currently the two telomeres appear to be fused back to back at coordinates c. 1.38 Mb.

I’ve used ACT to compare the remapped version with the old version. This can be misleading, as it assumes that the previous version was correct.

I’ve also annotated some repeat units that commonly appear right at the end of the telomere (a 7-mer tandem repeat) and within the subtelomeric region (Rep20, also known as TARE-6, which has a characteristic 21 bp tandem repeat). Rep20 can be used as a reference point to see the general orientation.

I think that reads belonging to the right Rep20 and the left Rep20 have been fused; as a result, the subtelomeric regions and telomeres have become joined over a larger region. The gap that is present is probably unbridgeable because it in fact marks the right and left ends of the chromosome.

It’s almost certain that repeats are causing the problem with the assembly, so if they occur very close to regions that are mispositioned, this may be some explanation. The assembler joins contigs that in truth shouldn’t be joined if they both have repeat elements with large overlaps.
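To illustrate the point about repeats, here is a minimal Python sketch of a naive overlap-based join. It is not the algorithm of any particular assembler; the sequences, the repeat, and the helper name `suffix_prefix_overlap` are all made up for illustration. Two contigs from different loci share the same repeat at their ends, so an exact suffix/prefix overlap is found and the contigs are merged even though they do not belong together.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Hypothetical 24 bp repeat that occurs at two separate loci.
REPEAT = "ACGTTGCA" * 3

# Unique sequence from locus A that happens to end in the repeat.
contig_a = "GGATCCGATTACA" + REPEAT
# The same repeat followed by unique sequence from an unrelated locus B.
contig_b = REPEAT + "TTGACCTAGGCAT"

# The overlap consists entirely of repeat sequence, yet it is long enough
# to pass the minimum-overlap threshold, so the naive join goes ahead.
overlap = suffix_prefix_overlap(contig_a, contig_b, min_len=20)
merged = contig_a + contig_b[overlap:]

print(overlap)  # 24
print(merged)
```

The merged sequence looks perfectly consistent, which is why this kind of false join is hard to spot without an independent reference point such as Rep20 orientation.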

Finally, I’ve compared the layout of the telomeres in chr6 and chr13 just to get some idea of how these chromosomes look in terms of general layout in the telomeres and subtelomeres. This could be misleading, but it is a good starting point.

I hope that this provides assistance in forming a working hypothesis to sort out the assembly.

The info is in the following pages. I hope that it is helpful.

Cheers,

Andy

PS. Give me a buzz on 4955 if you want to discuss it.

Page 2

MAL7

MAL7.remapped

No telomere present at the left-end.

A GC plateau (arrowed) is characteristic due to the terminal 7 bp repeat (not shown).
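A GC plateau of this kind can be reproduced with a simple sliding-window calculation. The sketch below is only an illustration: the 7 bp unit `GGGTTTA` is an assumed stand-in (the actual repeat is not shown in this note), and the window and step sizes are arbitrary choices, not the settings used in the viewer.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int = 70, step: int = 35):
    """Return (start, GC fraction) for each full window along `seq`."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Assumed 7 bp terminal repeat unit, used here only as a stand-in.
TELOMERE_UNIT = "GGGTTTA"

telomere_like = TELOMERE_UNIT * 20   # 140 bp of tandem repeat
at_rich = "ATATTTAAAT" * 14          # 140 bp of AT-only filler sequence

profile = gc_windows(telomere_like + at_rich)
for start, gc in profile:
    print(f"{start:4d}  {gc:.2f}")
```

Windows that lie entirely within the tandem repeat all give the same GC fraction (3/7 for this unit), which is what appears as a flat plateau; the value then drops once the window moves into the AT-rich sequence.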

Files: MAL7.embl; MAL7.embl.remapped; MAL7.remapped.fasta.V.MAL7.fasta.crunch

Directory: /nfs/disk222/yeastpub/analysis/pathogen/malaria/annotation/Plasmodium/falciparum/geneDB/chr7

Missing left hand telomere

Page 3

gap

MAL7

The gap is probably where the two telomere ends meet back to back.

MAL7.remapped

Page 4

The regions that probably belong to right and left telomeres are marked up on the gene line in green in MAL7.remapped.

Good hits to the right telomere of MAL7.embl

Hits the right telomere but inverted
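An "inverted" hit means the region matches the reference on the opposite strand. As a rough sketch of that idea (ACT displays similarity-search hits, whereas this toy version only checks for exact substrings; the sequences and the function name `hit_orientation` are invented for illustration):

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hit_orientation(query: str, reference: str) -> str:
    """Classify an exact hit of `query` in `reference`.

    Returns "forward" if the query occurs as-is, "inverted" if only its
    reverse complement occurs, and "none" otherwise.
    """
    if query in reference:
        return "forward"
    if reverse_complement(query) in reference:
        return "inverted"
    return "none"


# Made-up reference and queries, for illustration only.
reference = "TTTAGGGCCATGACGTTAGC"

print(hit_orientation("CCATGACG", reference))  # forward
print(hit_orientation("CGTCATGG", reference))  # inverted
print(hit_orientation("AAAAAAAA", reference))  # none
```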

Page 5

Good hits to the right telomere, left section inverted. Hits overlap, suggesting that repeats are causing problems.

Page 6

These two regions have hits in both the right and the left telomeres, but the hits are strongest to the right telomere. Again, these are probably highly repetitive regions. Both are inverted.

Page 7

The positioning of the 7 bp tandem repeats, which are characteristic of the terminal part of the telomere, supports this hypothesis.

To view them, read MAL7.remapped.7bp.repeats.Sco65 as an entry into MAL7.remapped. (Sco65 indicates a score cutoff of 65.) This cutoff will affect the percentage identity within the repeat consensus.
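The link between a score cutoff and percentage identity can be seen with a toy scoring scheme. This is only a sketch under stated assumptions: the repeat-finding program and its scoring are not named in this note, so the match/mismatch values, the ungapped in-phase comparison, the unit `GGGTTTA`, and the function names are all hypothetical.

```python
def tandem_score(region: str, unit: str, match: int = 2, mismatch: int = -7):
    """Score `region` against `unit` tiled end to end (ungapped, in phase).

    Returns (score, identity). The scoring values are illustrative only.
    """
    tiled = (unit * (len(region) // len(unit) + 1))[:len(region)]
    matches = sum(a == b for a, b in zip(region, tiled))
    mismatches = len(region) - matches
    score = match * matches + mismatch * mismatches
    identity = matches / len(region) if region else 0.0
    return score, identity


def filter_by_score(regions, unit, cutoff):
    """Keep only the regions whose score reaches the cutoff."""
    return [r for r in regions if tandem_score(r, unit)[0] >= cutoff]


UNIT = "GGGTTTA"                              # assumed 7 bp unit
perfect = UNIT * 6                            # 42 bp, no mismatches
one_mismatch = UNIT * 5 + "GGGTTCA"           # 1 mismatch in 42 bp
three_mismatch = "GGGTTCA" * 3 + UNIT * 3     # 3 mismatches in 42 bp
regions = [perfect, one_mismatch, three_mismatch]

for r in regions:
    print(tandem_score(r, UNIT))

print(len(filter_by_score(regions, UNIT, cutoff=65)))  # 2
print(len(filter_by_score(regions, UNIT, cutoff=80)))  # 1
```

With these numbers the three regions score 84, 75, and 57, so a cutoff of 65 keeps the first two and a cutoff of 80 keeps only the perfect array. Raising the cutoff therefore raises the minimum identity of the repeats that are reported.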

Page 8

There are 21 bp repeats in this region. The file is MAL7.remapped.21bp.repeats.Sco200 and can be read as an entry into ACT. MAL7.remapped.21bp.repeats.Sco800 gives a better idea, as it selects only the better-conserved repeats.

Page 9

The layout of telomeres and subtelomeres in Plasmodium falciparum, characterised elements

Repeat            Other names                        Type             Unit size (bp)  Presence
Terminal repeat   -                                  Tandem           7               All
14 bp repeat      TARE-1, SB-1                       Tandem           14              Most
TARE-2            -                                  Tandem           135             Most
TARE-3            692-bp, 0.5Kb repeat               Tandem           692             Almost all
TARE-4            -                                  Tandem/inverted  -               Most
TARE-5            12-bp repeat                       Tandem           12              Most
17-bp repeat      -                                  Tandem           17              ?
23/28 bp repeat   -                                  Tandem           23/28           ?
Rep11             -                                  Tandem           11              ?
Rep20             Rep2, 21-bp repeat, TARE-6, SB-3   Tandem           21              All

This is not to scale; it is just to give an approximate idea of the layout of the telomere. These repeat elements are not always present; those with a thick outline are always present. The layout can be compared with that of other finished chromosomes. MAL13 has repeat units annotated.