Comparative Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster Evgeny M. Zdobnov, Christian von Mering, Ivica Letunic, David Torrents, Mikita Suyama, Richard R. Copley, George K. Christophides, Dana Thomasova, Robert A. Holt, G. Mani Subramanian, Hans-Michael Mueller, George Dimopoulos, John H. Law, Michael A. Wells, Ewan Birney, Rosane Charlab, Aaron L. Halpern, Elena Kokoza, Cheryl L. Kraft, Zhongwu Lai, Suzanna Lewis, Christos Louis, Carolina Barillas-Mury, Deborah Nusskern, Gerald M. Rubin, Steven L. Salzberg, Granger G. Sutton, Pantelis Topalis, Ron Wides, Patrick Wincker, Mark Yandell, Frank H. Collins, Jose Ribeiro, William M. Gelbart, Fotis C. Kafatos, Peer Bork SCIENCE VOL 298 4 OCTOBER 2002 Presented by Leon G Xing

Page 1: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Comparative Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster

Evgeny M. Zdobnov, Christian von Mering, Ivica Letunic, David Torrents, Mikita Suyama, Richard R. Copley, George K. Christophides, Dana Thomasova, Robert A. Holt, G. Mani Subramanian, Hans-Michael Mueller, George Dimopoulos, John H. Law, Michael A. Wells, Ewan Birney, Rosane Charlab, Aaron L. Halpern, Elena Kokoza, Cheryl L. Kraft, Zhongwu Lai, Suzanna Lewis, Christos Louis, Carolina Barillas-Mury, Deborah Nusskern, Gerald M. Rubin, Steven L. Salzberg, Granger G. Sutton, Pantelis Topalis, Ron Wides, Patrick Wincker, Mark Yandell, Frank H. Collins, Jose Ribeiro, William M. Gelbart, Fotis C. Kafatos, Peer Bork

SCIENCE VOL 298 4 OCTOBER 2002

Presented by Leon G Xing

Page 2: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Why Anopheles gambiae?

• It is the principal vector of malaria

• It carries many other infectious diseases

• Malaria afflicts more than 500 million people

• More than 1 million people die each year from malaria

Page 3: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

The Culprit

Page 4: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Why Drosophila melanogaster?

• One of the most intensively studied organisms in biology

• Serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes

• Modest genome size of about 180 Mb

• Its genome was sequenced in 2000

Page 5: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Mosquito vs. Fruit Fly

• They diverged about 250 million years ago

• (Human and pufferfish diverged about 450 million years ago)

• Share considerable similarities

• Half of the genes in both genomes are interpreted as orthologs

• Average sequence identity about 56%

Page 6: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Mosquito vs. Fruit Fly

• Anopheles genome is twice the size of Drosophila

• Female Anopheles feeds on blood (Hematophagy), which is essential for egg development and propagation

• Viruses and parasites use Anopheles as a vehicle for transmission

Page 7: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Orthologs

• Genes in different species that evolved from a common ancestral gene by speciation

• Typically retain the same function in the course of evolution

Page 8: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Paralogs

• Genes related by duplication within an organism, which have often evolved related but different functions

Page 9: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Predict the function of a new protein

• A powerful approach is to use bioinformatics and domain database searches to find its characterized orthologs

• We know a lot about Drosophila but don’t know much about Anopheles

• Comparing their genomes may reveal a lot of information

Page 10: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Drosophila melanogaster Genome

• The assembled and annotated genome sequence of 5 Drosophila melanogaster chromosomes is in GenBank

• It is the result of a collaboration between Celera and the Berkeley Drosophila Genome Project

• Published in the March 24, 2000 issue of Science.

Page 12: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Anopheles vs Drosophila: Gene Comparison at Protein Level

• The proteins are classified into 4 categories, based on:

– 12,981 deduced Anopheles proteins out of 15,189 annotated transcripts

– Transposon-derived or bacterial-like sequences and alternative transcripts are omitted

Page 13: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Classification of Anopheles proteins

• 1:1 orthologs:

– Anopheles proteins with one clearly identifiable counterpart in Drosophila and vice versa

– 47% of the Anopheles proteins

– 44% of the Drosophila proteins
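One common way to approximate 1:1 orthologs is the reciprocal best hit rule: two proteins are paired when each is the other's top-scoring match. The sketch below assumes the best-hit tables have already been computed (for example, from an all-against-all similarity search). The function name and the toy identifiers are hypothetical, and this is not necessarily the procedure used in the paper.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    pairs = []
    for a, b in best_a_to_b.items():
        if best_b_to_a.get(b) == a:
            pairs.append((a, b))
    return sorted(pairs)


# Toy best-hit tables with made-up identifiers.
anopheles_best = {"ag1": "dm1", "ag2": "dm2", "ag3": "dm2"}
drosophila_best = {"dm1": "ag1", "dm2": "ag2", "dm3": "ag1"}

one_to_one = reciprocal_best_hits(anopheles_best, drosophila_best)
print(one_to_one)
```

Here ag3 and dm3 each have a best hit, but the hit is not returned, so neither ends up in a 1:1 pair.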

Page 14: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Classification of Anopheles proteins

• “Many-to-many” orthologs:

– Gene duplication has occurred in one or both species after divergence

– Includes 1779 Anopheles proteins

Page 15: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Classification of Anopheles proteins

• The third category:

– Have homologs in Drosophila and/or other species but without easily discernable orthologous relationships

– 3590 Anopheles predicted proteins

Page 16: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Classification of Anopheles proteins

• The fourth category:

– Have little or no homology in Drosophila but instead have best matches to other species

– 1283 proteins

Page 17: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Classification of Anopheles proteins

• Remaining proteins:

– No detectable homologs in any other species with a fully sequenced genome

– 1437 in Anopheles

– 2570 in Drosophila

– Might be new or quickly evolving genes

Page 18: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Classification of proteins

Page 19: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Some Notes

• The numbers and derived estimates are approximations

• Annotation of genomes is an ongoing effort

• Some Anopheles genes have not been sequenced yet

• Genes in highly polymorphic regions or in highly repetitive contexts are prone to errors

• > 70% accuracy

Page 20: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

The core of conserved proteins

• The 1:1 orthologs (6089 pairs) can be considered the conserved core

• The average sequence identity is 56%

• Humans and pufferfish share 61%

• Indicates that insect proteins diverge at a higher rate
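To make the percentages on these slides concrete, the sketch below shows one simple way to compute percent identity for an aligned pair, and checks that 6089 pairs out of 12,981 Anopheles proteins is about 47%. The identity definition (matches divided by gap-free columns) is an assumption, since several conventions exist, and the sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment (made-up sequences, not real proteins).
identity = percent_identity("MKT-LLVA", "MRTALLIA")
print(f"{identity:.1f}% identity")

# Share of Anopheles proteins in the 1:1 core, using counts from the slides.
core_share = 100 * 6089 / 12981
print(f"{core_share:.0f}% of Anopheles proteins are 1:1 orthologs")
```

An average over all ortholog pairs would apply the same function to each alignment and take the mean.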

Page 21: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Properties of 1:1 orthologs.

Page 22: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Orthologous proteins constitute a core of conserved functions

• Early embryogenesis is conserved between Drosophila and Anopheles

• Of 315 early developmental genes in Drosophila, 251 showed a clear single ortholog in Anopheles

Page 23: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Orthologous proteins

• 85% of the developmental genes have single orthologs

• 47% for the genome as a whole

Page 24: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Protein family expansions and reductions

• Due to adaptations to environment and life strategies

• Leads to changes in cellular and phenotypic features

• Implies duplications after speciation

Page 25: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Protein family expansions and reductions example

• Epsilon subunit of the adenosine triphosphate-synthase complex

• Encoded by two genes in both Anopheles and Drosophila

• They might share a single-copy ancestral gene

• They were then duplicated independently after speciation

Page 26: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Expansions of proteins with FBN-like domains in Anopheles.

• Fibrinogen (FBN) domains were originally found in human blood coagulation proteins

• A greatly expanded family of mosquito proteins contains a domain resembling the COOH-terminus of the beta and gamma chains of FBN

Page 27: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Expansions of proteins with FBN-like domains in Anopheles.

• Phylogenetic tree of 58 Anopheles and 13 Drosophila FBN genes

• They largely belong to two distinct species-specific clades

• Identified only two 1:1 orthologous relationships

Page 28: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

The significant implication of FBN gene expansion

• The massive expansion of the Anopheles FBN gene family might be associated with particular aspects of the mosquito's biology

• That is, hematophagy and exposure to Plasmodium

• A blood meal poses challenges associated with microbial flora in the gut and with blood coagulation

Page 29: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

The implication of FBN gene expansion

• The bacteria-binding properties of FBNs might be important in controlling or aggregating bacteria in the midgut

• These proteins might be used as competitive inhibitors i.e. anticoagulants

• Some mosquito FBN proteins are up-regulated by invading malaria parasites

Page 30: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Expansion of FBN-like proteins in Anopheles

Page 31: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Gene losses in insects

• Some genes are absent in both Anopheles and Drosophila but are present in other eukaryotes

• Criteria: genes must be present in at least one animal but also in fungi or plants
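The criteria above amount to a filter over a presence/absence table. The following is a minimal sketch, assuming each gene family has already been assigned to the lineages where it is found. The family names, the choice of "human" and "worm" as the non-insect animals, and the function name are all hypothetical.

```python
def candidate_insect_losses(presence):
    """presence maps a gene family to the set of lineages where it is found."""
    insects = {"Anopheles", "Drosophila"}
    other_animals = {"human", "worm"}  # hypothetical non-insect animals
    non_animals = {"fungi", "plants"}
    lost = []
    for family, lineages in presence.items():
        absent_in_insects = not (lineages & insects)
        in_an_animal = bool(lineages & other_animals)
        in_fungi_or_plants = bool(lineages & non_animals)
        if absent_in_insects and in_an_animal and in_fungi_or_plants:
            lost.append(family)
    return sorted(lost)


# Toy presence table (made-up families).
presence = {
    "famA": {"human", "fungi"},                 # candidate insect loss
    "famB": {"human", "Drosophila", "plants"},  # still present in one insect
    "famC": {"human", "worm"},                  # animal-only, fails the criteria
    "famD": {"worm", "plants", "fungi"},        # candidate insect loss
}
losses = candidate_insect_losses(presence)
print(losses)
```

Requiring presence outside the animals guards against calling a gene "lost" when it may simply have arisen in another animal lineage.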

Page 32: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Gene losses in insects.

Page 33: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Gene genesis and gene loss

• 1437 predicted genes in Anopheles have no detectable homology with genes of other species

• 522 of these have putative paralogs only within Anopheles

• At least 26 of these genes are expressed in the adult female salivary glands

Page 34: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Strategy for identifying gene losses

• Search for genes that are present in only one of the two insects but that do have orthologs in other species

Page 35: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Gene Losses

• Widespread orthologs missing from both Anopheles and Drosophila are putative insect-specific gene losses

• Example:

– Insects are known to be unable to synthesize sterols

– Absence of several enzymes involved in sterol metabolism

Page 36: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Gene Losses example

• Absence of the DNA repair enzyme uracil-DNA glycosylase in insects

• DNA methylation can lead to spontaneous deamination of cytosine to uracil

• Drosophila has long been known to have no or only very little DNA methylation

Page 37: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Cladogram based on Orthologs

Page 38: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Intron gain and loss

• Drosophila is known to have undergone a reduction of noncoding regions

• 11,007 out of 20,161 Anopheles introns in 1:1 orthologs have equivalent positions in Drosophila

• Almost 10,000 introns have either been lost or gained
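The intron figures can be checked with simple arithmetic. This sketch assumes that "almost 10,000" refers to the Anopheles introns in 1:1 orthologs that lack an equivalent position in Drosophila. Both input counts come from the slide and the variable names are illustrative.

```python
total_introns = 20_161     # Anopheles introns in 1:1 orthologs (from the slide)
shared_positions = 11_007  # introns with an equivalent position in Drosophila

not_shared = total_introns - shared_positions
shared_percent = 100 * shared_positions / total_introns

print(not_shared)                 # introns lost or gained in one lineage
print(f"{shared_percent:.1f}%")   # share of introns at conserved positions
```

So roughly 55% of these introns sit at conserved positions, and a little over 9,000 do not.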

Page 39: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

The Drosophila Dscam gene

• Able to encode up to 38,000 proteins through extensive alternative splicing

• Three different cassettes of duplicated exons that can generate exponential combinations of splice variants

• The numbers of exons within the cassettes are at least similar in Anopheles
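The figure of about 38,000 is a product of the number of alternatives in each cassette, since one exon is chosen per cassette. The counts below (12, 48 and 33 for the three cassettes, plus 2 alternatives for a further transmembrane exon) are the values commonly reported for Drosophila Dscam in the literature. They are not given on the slide, so treat them as an assumption.

```python
from math import prod

# Alternative exons per cassette, as commonly reported for Drosophila Dscam
# (assumed here; not stated on the slide).
cassette_alternatives = {"exon 4": 12, "exon 6": 48, "exon 9": 33}
transmembrane_alternatives = 2  # exon 17, also an assumption from the literature

# One exon is picked from each cassette, so the counts multiply.
ectodomain_variants = prod(cassette_alternatives.values())
total_isoforms = ectodomain_variants * transmembrane_alternatives

print(ectodomain_variants)
print(total_isoforms)
```

Because the counts multiply, even a small change in the number of exons in one cassette would change the total substantially, which is why similar exon numbers in Anopheles imply a comparable coding potential.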

Page 40: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster
Page 41: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Microsynteny

• Through evolution genome structure may vary greatly, but small regions of conserved gene order are retained

• Microsynteny analysis examines localized regions with high sequence similarity
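One simple way to look for such regions is to chain ortholog pairs that are near neighbours in both genomes. The sketch below is a greedy simplification under stated assumptions: each ortholog pair carries its chromosome arm and gene rank in both species, `max_gap` (the number of intervening genes tolerated) and `min_size` are hypothetical parameters, and the thresholds are not taken from the paper.

```python
def microsynteny_blocks(orthologs, max_gap=5, min_size=2):
    """Group ortholog pairs into blocks of neighbouring genes.

    orthologs: list of (arm_a, pos_a, arm_b, pos_b), where pos_* is the
    gene's rank along its chromosome arm in species A and B.
    """
    blocks = []
    current = []
    for pair in sorted(orthologs):  # walk along species A's arms in order
        if current:
            arm_a, pos_a, arm_b, pos_b = current[-1]
            same_arms = pair[0] == arm_a and pair[2] == arm_b
            close_a = pair[1] - pos_a <= max_gap + 1
            close_b = abs(pair[3] - pos_b) <= max_gap + 1  # local order may differ
            if not (same_arms and close_a and close_b):
                if len(current) >= min_size:
                    blocks.append(current)
                current = []
        current.append(pair)
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Toy data: (Drosophila arm, rank, Anopheles arm, rank), all made up.
pairs = [
    ("2L", 10, "3R", 200),
    ("2L", 11, "3R", 203),
    ("2L", 13, "3R", 201),
    ("2L", 40, "3R", 900),  # isolated ortholog, forms no block
    ("X", 5, "X", 50),
    ("X", 6, "X", 51),
]
blocks = microsynteny_blocks(pairs)
print([len(b) for b in blocks])
```

Using the absolute difference in the second genome lets a block survive local inversions and shuffling, which matters when gene order within a neighbourhood is only loosely conserved.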

Page 42: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Microsynteny blocks

Page 43: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Mapping of orthologs and microsynteny blocks to chromosomal arms in Anopheles and Drosophila.

Page 44: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Chromosome mapping

• Both Anopheles and Drosophila have five major chromosomal arms (X, 2L, 2R, 3L, and 3R, and a small chromosome 4 in Drosophila melanogaster).

• In Drosophila, reassortment of recognizable chromosomal arms occurs by fission and fusion at the centromeres

Page 45: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Chromosome mapping

• The most conserved pair of chromosomal arms is Dm2L and Ag3R

• 76% of the orthologs and 95% of the microsynteny blocks in Dm2L map to Ag3R
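Percentages like these can be obtained by tallying, for every ortholog on one arm, which arm its partner lies on in the other species. The sketch below uses made-up pairs, so its numbers are illustrative only, and the function name is hypothetical.

```python
from collections import Counter


def arm_mapping_percent(ortholog_arms, source_arm):
    """Percent of orthologs on source_arm that map to each arm of the other species."""
    targets = Counter(t for s, t in ortholog_arms if s == source_arm)
    total = sum(targets.values())
    if total == 0:
        return {}
    return {arm: 100.0 * n / total for arm, n in targets.items()}


# Toy data: (Drosophila arm, Anopheles arm) for each 1:1 ortholog pair.
toy_pairs = [("2L", "3R")] * 3 + [("2L", "2L")] + [("X", "X")] * 2
shares = arm_mapping_percent(toy_pairs, "2L")
print(shares)
```

The same tally over microsynteny blocks instead of single genes would give the block-level percentage.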

Page 46: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Chromosome mapping.

Page 47: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Chromosome mapping surprise

• Significant portions of the Anopheles X chromosome appear to have been derived from what are presently autosomal Drosophila chromosome segments

• 11% of Dm3R and 33% of Dm4

Page 48: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Homology of chromosomal arms

Page 49: Comparative Genome and Proteome Analysis of  Anopheles gambiae  and  Drosophila melanogaster

Thank you!