-1- module 3: comparative genomics module 3 comparative genomics introduction the artemis comparison...

21
-1- Module 3: Comparative Genomic Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed to extract the additional information that can only be gained by comparing the growing number of sequences from closely related organisms (Carver et al. 2005). ACT is based on Artemis, and so you will already be familiar with many of its core functions, and is essentially composed of three layers or windows. The top and bottom layers are mini Artemis windows (with their inherited functionality), showing the linear representations of the DNA sequences with their associated features. The middle window shows red and blue blocks, which span this middle layer and link conserved regions within the two sequences, in the forward and reverse orientation respectively. Consequently, if you were comparing two identical sequences in the same orientation you would see a solid red block extending over the length of the two sequences in this middle layer. If one of the sequences was reversed, and therefore present in the opposite orientation, there would be a blue ‘hour glass’ shape linking the two sequences. Unique regions in either of the sequences, such as insertions or deletions, would show up as breaks (white spaces) between the solid red or blue blocks. In order to use ACT to investigate your own sequences of interest you will have to generate your own pairwise comparison files. Data used to draw the red or blue blocks that link conserved regions is generated by running pairwise BLASTN or TBLASTX comparisons of the sequences. ACT is written so that it will read the output of several different comparison file formats; these are outlined in Appendix II. Two of the formats can be generated using BLAST software freely downloadable from the NCBI, which can be loaded and run a PC or Mac. Appendix X shows you how to generate comparison files from BLAST. Whilst having a local copy BLAST to generate ACT comparison files can be very useful, it means that you are tied to a particular computer. Another way of generating comparison files for ACT is to use the WebACT web resource (see page 27). This site allows you to cut and paste or upload your own sequences, and generate ACT readable BLASTN or TBLASTX comparison files. Aims The aim of this Module is for you to become familiar with the basic functions of ACT by using a series of worked examples. Some of these examples will touch on exercises that were used in previous Modules, this is intentional. Hopefully, as well as introducing you to the basics of ACT this Module will also show you how ACT can be used for not only looking at genome evolution but also to backup, or question, gene models and so on. In this module you will also use a web resource, WebACT to generate your own comparison files

Upload: clinton-bates

Post on 17-Dec-2015

230 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-1-

Module 3: Comparative Genomics

Module 3Comparative Genomics

IntroductionThe Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed to extract the additional information that can only be gained by comparing the growing number of sequences from closely related organisms (Carver et al. 2005). ACT is based on Artemis, and so you will already be familiar with many of its core functions, and is essentially composed of three layers or windows. The top and bottom layers are mini Artemis windows (with their inherited functionality), showing the linear representations of the DNA sequences with their associated features. The middle window shows red and blue blocks, which span this middle layer and link conserved regions within the two sequences, in the forward and reverse orientation respectively. Consequently, if you were comparing two identical sequences in the same orientation you would see a solid red block extending over the length of the two sequences in this middle layer. If one of the sequences was reversed, and therefore present in the opposite orientation, there would be a blue ‘hour glass’ shape linking the two sequences. Unique regions in either of the sequences, such as insertions or deletions, would show up as breaks (white spaces) between the solid red or blue blocks.

In order to use ACT to investigate your own sequences of interest you will have to generate your own pairwise comparison files. Data used to draw the red or blue blocks that link conserved regions is generated by running pairwise BLASTN or TBLASTX comparisons of the sequences. ACT is written so that it will read the output of several different comparison file formats; these are outlined in Appendix II. Two of the formats can be generated using BLAST software freely downloadable from the NCBI, which can be loaded and run a PC or Mac. Appendix X shows you how to generate comparison files from BLAST. Whilst having a local copy BLAST to generate ACT comparison files can be very useful, it means that you are tied to a particular computer. Another way of generating comparison files for ACT is to use the WebACT web resource (see page 27). This site allows you to cut and paste or upload your own sequences, and generate ACT readable BLASTN or TBLASTX comparison files.

AimsThe aim of this Module is for you to become familiar with the basic functions of ACT by using a series of worked examples. Some of these examples will touch on exercises that were used in previous Modules, this is intentional. Hopefully, as well as introducing you to the basics of ACT this Module will also show you how ACT can be used for not only looking at genome evolution but also to backup, or question, gene models and so on. In this module you will also use a web resource, WebACT to generate your own comparison files and view them in ACT.

Page 2: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-2-

Module 3: Comparative Genomics

1. Starting up the ACT software

Make sure you’re in the Module_3_Comparative_Genomics directory. Then click on the ACT icon.

The files you will need for this exercise are: Lmjchr11.fastaLmjchr11fastaLin.chr21.fasta.crunchLin.chr11.fasta

Click ‘File’ then ‘Open

1

2

Click ‘Apply’ and wait……6

For comparing more than two genomes!

3Use the File manager to drag and drop files or see 4

Click and select appropriate files

4, 5 & 6

Comparison files end with ‘.crunch’.

Page 3: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-3-

Module 3: Comparative Genomics

2. The basics of ACT

You should now have a window like this so lets see what’s there.

1. Drop-down menus. These are mostly the same as in Artemis. The major difference you’ll find is that after clicking on a menu header you will then need to select a DNA sequence before going to the full drop-down menu.

2. This is the Sequence view panel for ‘Lmjchr11.fasta’ (Subject Sequence) you selected earlier. It’s a slightly compressed version of the Artemis main view panel. The panel retains the sliders for scrolling along the genome and for zooming in and out.

3. The Comparison View. This panel displays the regions of similarity between two sequences. Red blocks link similar regions of DNA with the intensity of red colour directly proportional to the level of similarity. Double clicking on a red block will centralise it. Blue blocks link regions that are inverted with respect to each other.

4. Artemis-style Sequence View panel for ‘Lin.chr11.fasta’ (Query Sequence).5. Right button click in the Comparison View panel brings up this important ACT-

specific menu which we will use later.

1

2

3

4

5

Page 4: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-4-

Module 3: Comparative Genomics

Introduction & AimsIn this first exercise we are going to explore the basic features of ACT. Using the ACT session you have just opened we firstly are going to zoom outwards until we can see the entire Leishmania major chromosome compared against the Leishmania infantum chromosome. As for the Artemis exercises we should turn off the stop codons to clear the view and speed up the process of zooming out. The only difference between ACT and Artemis when applying changes to the sequence views is that in ACT you must click the right mouse button over the specific sequence that you wish to change, as shown above.Now turn the stop codons off in the other sequence too. Your ACT window should look something like the one below:

Use the vertical sliders to zoom out. Drag or click the slider downwards from one of the genomes. The other genome will stay in synch.

3. Exercise 1

1

Right button click here

2

De-select stop codons

Page 5: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-5-

Module 3: Comparative Genomics

Once zoomed out your ACT window should look similar to the one shown above. If the genomes in view fall out of view to the right of the screen, use the horizontal sliders to scroll the image and bring the whole sequence into view, as shown below. You may have to play around with the level of zoom to get the whole genomes shown in the same screen as shown above.

Page 6: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-6-

Module 3: Comparative Genomics

Notice that when you scroll along with either slider both genomes move together. This is because they are ‘locked’ together. Right click over the middle comparison view panel. A small menu will appear, select Unlock sequences and then scroll one of the horizontal sliders. Notice that ‘LOCKED’ has disappeared from the comparison view panel and the genomes will now move independently

You can optimise your image by either removing ‘low scoring’ (or percentage ID) hits from view, as shown below 1-3 or by using the slider on the the comparison view panel (4). The slider allows you to filter the regions of similarity based on the length of sequence over which the similarity occurs, sometimes described as the “footprint”. In the example shown below, the minimum alignment size has been increased to 166.

LOCKED

1. Right button click in the Comparison View panel

4

1

3 Move the sliders to manipulate the comparison view image

2. Select either Set Score Cutoffs or Set Percent ID Cutoffs

Page 7: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-7-

Module 3: Comparative Genomics

4. Things to try out in ACT1. Double click red boxes to move the alignment to this position.2. Zoom right in to view the base pairs and amino acids of each sequence.3. Load annotation files into the sequence view panels - try LmjF11.embl and

Lin.chr11.embl4. Use some of the other Artemis features eg. graphs etc.5. Navigate to an ID name, base number, etc on either sequence using the

Artemis navigator.6. Find an inversion in one genome relative to the other then flip one of the

sequences. To reverse and complement a sequence right click the sequence window, then select either flip query sequence or flip subject sequence. This can also be accomplished by clicking on the sequence window and then Edit>Bases>Reverse And Complement

7. Find a region containing duplicated genes in one genome.

Once you have finished this exercise remember to close this ACT session down completely before starting the next exercise

Page 8: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-8-

Module 3: Comparative Genomics

Exercise 2

Comparison of Trypanosoma brucei chromosome 1 to Leishmania major chromosomes 12 and 20.

Introduction

The structural annotation and analysis of the whole genome of Leishmania major and Trypanosoma brucei is mostly completed. This allows us to perform comparative analysis of the genomes of trypanosomatid parasites and understand the basic biology of their parasitism, based on the similarities / dissimilarities between the parasites at DNA / protein level.

Aim

You will be performing a three-way comparison between chromosome 1 of Trypanosoma brucei, and it’s homologous chromosomes - 12 and 20 in Leishmania major. This exercise will provide some insight into the difference in chromosome organization and gene content between these too genera with some reference to differences in their biology.

The files that you are going to need are:Tb927_01_v4.embl - T. brucei sequence and annotation fileLmjchr12.embl - L. major sequence and annotation file

Lmjchr20.embl - L. major sequence and annotation fileLmjchr12.fasta-txTb927_01_v4.fasta -Comparison file1Tb927_01_v4.fasta-txLmjchr20.fasta.crunch.gz -Comparison file 2

Leishmania majorChromosome 12

Trypanosoma bruceiChromosome 1

TBLASTXcomparison

Leishmania majorChromosome 20

TBLASTXcomparison

Page 9: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-9-

Module 3: Comparative Genomics

Lmjchr20.embl

Lmjchr12.embl

Lmjchr12.fastaTb927_01_v4.fasta.crunch.gz

Tb927_01_v4.embl

Tb927_01_v4.fasta-txLmjchr20.fasta.crunch.gz

Drag and dropFiles from File manager

Add more than two sequences

After a few seconds a window like that on page 8 should appear.

Use some of the techniques previously described to answer the questions onthe next page.

Page 10: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-10-

Module 3: Comparative Genomics

Exercise 2

Things to try:

1. Adjust the alignment size and/or score cut-off and/or percent ID cutoff to filter out undesired blast hits.

2. Find a break in synteny between T. brucei chromosome 1 and either of the Leishmania chromosomes.

3. Find a gene deletion/insertion event in T. brucei relative to either Leishmania chromosome.

4. Find an inversion in T. brucei chromosome 1 relative to either Leishmania chromosome.

5. Find a gene that has been duplicated in T. brucei relative to Leishmania.

Questions to Address using this data:

2. What is the relationship between these three chromosomes?3. There are tblastx hits along the entire length of both Leishmania

chromosomes when compared to T. brucei before any adjustment is made to alignment size or score cut-off. What do these hits represent?

4. Are there large chromosomal regions unique to T. brucei? What is the significance of these regions?

Page 11: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-11-

Module 3: Comparative Genomics

Exercise 3Prediction of gene models:

In the previous Artemis exercise we annotated gene models on chromosome 12 of L. major based on codon usage and NCBI blast alignments. This exercise demonstrates how ACT can complement structural annotation by comparing a newly generated sequence to an existing, annotated orthologous chromosome.

Here we will be comparing the EMBL file generated for Leishmania major chromosome 12 to a fully annotated EMBL file of Leishmania infantum.

The first part of this exercise will involve generating ‘crunch’ files containing the blast results for comparison in an ACT window using the on-line resource Webact at the URL shown below

www.webact.org

Select Generate

ACT will accept tabular (-m 8) formatted BLAST results between two sequences. These can be generated by BLAST searching a local database, or though on-line tools like webACT.

Page 12: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-12-

Module 3: Comparative Genomics

Select Lmjchr12.fasta from the file menu

Select Lin.chr12.fasta from the file menu

Select BlastN

Click submit when done. After several seconds you will have the option to download the output files. The most important is the crunch file - ‘blast_out1’

Page 13: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-13-

Module 3: Comparative Genomics

Exercise 3 Part

Using ACT as an Annotation tool:1. Load fasta files for L. major and L. infantum chromosome 122. Load the ‘crunch file’ generated by WebACT - default name is blast_out13. Read in EMBL files for both sequences, use the EMBL file generated from

Module 1 for L. major (NOTE! There is another file called Lmj12.embl in this directory corresponding to the fully annotated database version of this chromosome. For this exercise use the LmjF12new.embl file generated in Module 1)

Lmj12.fasta

Lin.12.fasta

blast_out1

Select the sequence to read an entry into

Read Lmj12new.embl into Lmj12.fastaRead Lin.chr12.embl into Lin.chr12.fasta

Page 14: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-14-

Module 3: Comparative Genomics

EMBL files can be annotated directly using ACT through a combination of positional orthology and sequence similarity. This exercise is meant to demonstrate how we can use sequence alignment information in combination with techniques described in module 1 to add new gene models.

Missing L. major orthologues?

Load codon usage tables for L. major by selecting the graph menu, then Lmjchr12.fasta>Add User Plot>Lmajorcodon, the same can be done for Lin.chr12. You may have to resize the window to see both codon plots in the same window. You might also try right clicking on the sequence and de-selecting ‘Revere Frame Lines’ to display only sequence in the forward frame.

In the example above it is immediately obvious that a gene model is missing in the partially annotated Leishmania major chromosome 12. Further downstream of this we can see several large ORFs aligned against annotated gene models in the L. infantum chromosome.

Load codon usage plots for both sequences.

Page 15: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-15-

Module 3: Comparative Genomics

1. Double click on ORF with mouse scroll button.

2. Select ‘create feature from base range’.

3. Zoom in on new gene model using scroll bar.

4. Use the trim shortcut <ctrl-t> to trim gene model to same Met as L. infantum.

Select the new gene model. By moving the zoom scroll bar on one of the sequence panels it is possible to compare two sequences at the sequence, or peptide level. In some cases it may be necessary to unlock the sequence to move it independently of the other.

In ACT, keyboard shortcuts using <ctrl> work on the query sequence only. To use the same shortcut for a subject sequence use <Alt>.

Page 16: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-16-

Module 3: Comparative Genomics

Exercise 2 Part IV

ACT Annotation of Leishmania major gene models

1. Using the techniques described, look for disagreements between the annotated L. infantum chromosome 12 and your annotations on the L. major chromosome 12 from module 1.

2. Try adding missing gene models to L. major, informed by the ACT alignment and codon usage.

3. Look for examples of the following between these two chromosomes:

• Repeated genes/regions

• Chromosomal rearrangements, translocations, etc.

Example location: 389400..422400, in Lin.chr12.embl.

Page 17: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-17-

Module 3: Comparative Genomics

17

Exercise 4

AimUsing ACT, you will be viewing the apicoplast genomes from five apicomplexan parasites including P. falciparum and do basic comparitive analysis of the sequences. You will identify key features and spot differences between the apicoplast genomes of all these parasites.

Out of the five apicoplast genomes, the Babesia bigemina apicoplast genome has been recently sequenced at the PSU (WTSI) and is unpublished. It contains ‘first pass’ manual annotation at present. The other four apicoplast genomes are all published previously and the sequences have been retrieved from the EMBL database. The file containing P. falciparum apicoplast genome (PF_apicoplast_genome.embl) was created by joining two separate entries (EMBL accession numbers X95275 & X95276) in the EMBL database. The files that you are going to need are:1) BG_apicoplast_genome.embl- sequence & annotated file for Babesia bigemina2) TP_apicoplast_genome.embl- sequence & annotation file for Theileria parva3) PF_apicoplast_genome.embl- sequence & annotation file for Plasmodium falciparum4) TG_apicoplast_genome.embl- sequence & gene model file for Toxoplasma gondii5) ET_apicoplast_genome.embl- sequence & gene model file for Eimeria tenella

6) TP_BG_comparison - tblastx comparison file of T. parva & B. bigemina7) TP_PF_comparison - tblastx comparison file of T. parva & P. falciparum8) PF_TG_comparison - tblastx comparison file of P. falciparum & T. gondii9) TG_ET_comparison - tblastx comparison file of T. gondii & E. tenella.

First, open an ACT window and then open the annotation and the appropriate comparison files in the order of 1 – 6 – 2 – 7 – 3 – 8 – 4 – 9 – 5 (the numbers are designated above).

You will need to click on ‘more files’ to upload more than 2 sequences and the comparison flies.

Click on ‘apply’ after you have uploaded all the files.

Page 18: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-18-

Module 3: Comparative Genomics

Upload the files in sequential order as described in the previous page

Click on here to load more files and select the appropriate file.

Remember you can also use the file manager to drag in the files

Click on here to read all the files that you have selected.

Click on ‘no’ if any small dialogue window appears while reading / opening the files.

Page 19: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-19-

Module 3: Comparative Genomics

Can you see any conserved gene order (in the apicoplast genome) between the B. bigemina & T. parva or between P. falciparum, T. gondii and E. tenella ?Can you obtain a clearer picture of the ACT 5-way comparison figure by filtering out the low scoring segments, using the blast score cut off feature?Zoom in and look at some of the genes encoded within theses regions. View the details by clicking on the feature, and then select `Edit selected feature’ from the ‘Edit’ menu after selecting the appropriate CDS feature.

You may need to ‘lock’ or ‘unlock’ sequences to get all of them visible within the same window.

You may also need to ‘flip’ one or more sequences for the comparative analysis.

Use the right click on your mouse and select score cutoff window to appear. Scroll along the bar to screen out low scoring hits

Page 20: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-20-

Module 3: Comparative Genomics

After filtering out the low-scoring blast matches, you should be able to see a figure like the image below.

B. bigemina (BG)

T. parva (TP)

P. falciparum (PF)

T. gondii (TG)

E. tenella (ET)

Page 21: -1- Module 3: Comparative Genomics Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed

-21-

Module 3: Comparative Genomics

Can you fill in the table, based on your analysis?

You may need to use some of the features of Artemis that you have learned in earlier Artemis modules to extract some of the numbers.

Can you make a schematic diagram showing major features of P. falciparum apicoplast genome?

Can you show how P. falciparum apicoplast genome compares with the apicoplast genomes of other four related parasites?

Features BG TP PF TG ET

Size

No. of CDS features

No. of tRNA genes

Arrangement of genes on both DNA strands?

Unique gene(s) in TP

Unique gene(s) in BG

Identify a few genes that are present in all 5 organisms

Other comments