How to make a monkey: functional adaptation in the primate genome

Rutger Vos, Marie Curie Research Fellow

DESCRIPTION

Presentation to the "Workshop on Parallel and Distributed Processing of Large Genome Data", 22 February 2011, DBCLS, Tokyo (http://mlab.cb.k.u-tokyo.ac.jp/en/events/lgd/). The presentation describes the methodological issues surrounding the design of a workflow for assigning orthology among primate genomes, testing them for evidence of selection and interpreting the results using the Gene Ontology.

TRANSCRIPT

Page 1: How to make a monkey: functional adaptation in the primate genome

How to make a monkey: functional adaptation in the primate genome

Rutger Vos

Marie Curie Research Fellow

Page 2: How to make a monkey: functional adaptation in the primate genome

Outline

• Introduction
– The question
– Primate genomes
– Homology across genomes
– Finding evidence for natural selection
– Characterizing gene function
• Methods
– Computational infrastructure
– Basic workflow steps
– Workflow design
• Results
– Preliminary findings
• Conclusions
• Acknowledgements

Page 3: How to make a monkey: functional adaptation in the primate genome

The question

Which gene functions were under directional selection in primate evolutionary history?

Page 4: How to make a monkey: functional adaptation in the primate genome

Primate genomes

Homo sapiens (Human)
Pongo pygmaeus (Orangutan)
Tarsius syrichta (Philippine tarsier)
Pan troglodytes (Chimpanzee)
Macaca mulatta (Rhesus monkey)
Otolemur garnettii (Greater galago)
Gorilla gorilla (Gorilla)
Callithrix jacchus (Common marmoset)
Microcebus murinus (Gray mouse lemur)

Page 5: How to make a monkey: functional adaptation in the primate genome

Primate genomes

[Figure: primate phylogeny annotated with ~65 MYA (K/T boundary); groups shown: Apes, Old World monkeys, New World monkeys, Tarsiers, Lemurs, Bush babies]

Page 6: How to make a monkey: functional adaptation in the primate genome

Homology: Orthologs and paralogs

Page 7: How to make a monkey: functional adaptation in the primate genome

Evidence of selection: dN/dS ratio

Page 8: How to make a monkey: functional adaptation in the primate genome

Evidence of selection: dN/dS ratio

• Or Ka/Ks or ω, the ratio of non-synonymous to synonymous substitution rates
– dN/dS > 1: positive selection
– dN/dS ≈ 1: neutral evolution?
– dN/dS < 1: stabilizing (purifying) selection
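To make the two kinds of substitution concrete, here is a minimal Python sketch that classifies codon differences between two aligned, in-frame coding sequences as synonymous or non-synonymous under the standard genetic code. It only produces raw counts; a real dN/dS estimate also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple hits, which this sketch does not attempt. The function name `count_differences` is an assumption made for illustration and is not taken from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        # Skip identical codons and anything gapped or ambiguous.
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

For example, `count_differences("ATGTTTGGA", "ATGTTCGCA")` sees one silent change (TTT and TTC both encode phenylalanine) and one amino acid change (GGA glycine versus GCA alanine).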

Page 9: How to make a monkey: functional adaptation in the primate genome

Gene function: the Gene Ontology

• GO is a hierarchical database of terms for genes

• Terms are structured in a directed acyclic graph

• Terms are organized in three domains: biological process, cellular component and molecular function
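Because the terms form a directed acyclic graph rather than a tree, a term can have several parents, and a gene annotated with a specific term is implicitly annotated with all of that term's ancestors. The Python sketch below shows the upward traversal on a toy graph. The term names and edges in `PARENTS` are illustrative assumptions, not copied from the Gene Ontology.

```python
# Toy ontology: child term -> list of parent terms (a term may have several).
# These names and edges are illustrative only, not real GO structure.
PARENTS = {
    "forebrain development": ["brain development"],
    "brain development": ["organ development", "nervous system development"],
    "organ development": ["developmental process"],
    "nervous system development": ["developmental process"],
    "developmental process": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

The `seen` set matters because shared ancestors (here "developmental process") are reachable along more than one path and should be reported once.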

Page 10: How to make a monkey: functional adaptation in the primate genome

Gene function: the Gene Ontology

Page 11: How to make a monkey: functional adaptation in the primate genome

Methods: Basic workflow steps

1. Protein BLAST all vs. all
2. Find Reciprocal Best protein Hit clusters
3. Protein align RBH clusters
4. Backtranslate protein alignments to cDNAs
5. Perform dN/dS ratio tests on all branches
6. Lookup GO terms for sequence GIs
7. Interpret results
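Step 2 can be sketched in a few lines of Python. The sketch assumes the BLAST results have already been parsed into `(query, subject, bit score)` tuples and that sequence identifiers carry a species prefix such as `human|A`; both the tuple layout and the identifier convention are assumptions made for illustration, not the format used in the actual workflow.

```python
def species(seq_id):
    """Extract the species prefix from a hypothetical 'species|gene' ID."""
    return seq_id.split("|", 1)[0]


def best_hits(hits):
    """Map (query, target species) to the highest-scoring subject."""
    best = {}  # (query, target species) -> (score, subject)
    for query, subject, score in hits:
        if species(query) == species(subject):
            continue  # ignore within-genome hits (self-hits and paralogs)
        key = (query, species(subject))
        # Ties keep the first hit seen.
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)
    return {key: subject for key, (_, subject) in best.items()}


def reciprocal_best_hits(hits):
    """Return the set of sequence pairs that are each other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for (query, _), subject in best.items():
        if best.get((subject, species(query))) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs
```

Pairs found this way can then be grouped by their shared reference sequence (for instance the human one) to form the clusters that go into alignment.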

Page 12: How to make a monkey: functional adaptation in the primate genome

Methods: Basic workflow design

• Build a single BLAST database of all genomes
• Then, to parallelize the analysis:
– Split the data into nine sets (for nine species)
– Split each of nine genomes into files for each gene (~20k files per species)
– Process files in parallel
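The split-then-process pattern can be sketched in Python as follows. The sketch splits FASTA text into one record per gene and maps a per-gene function over the records with a worker pool. `process_gene` is a hypothetical placeholder for the real per-gene work, and the thread pool and the default of four workers are assumptions chosen to keep the example self-contained; CPU-bound steps would normally run as separate processes or batch jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text):
    """Split well-formed FASTA text into {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def process_gene(item):
    """Hypothetical per-gene step; here it only reports sequence length."""
    name, sequence = item
    return name, len(sequence)


def process_all(text, workers=4):
    """Process every gene independently, since no gene depends on another."""
    genes = split_fasta(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_gene, genes.items()))
```

Because each gene is handled independently, a failed or changed step can be rerun for just the affected records.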

Page 13: How to make a monkey: functional adaptation in the primate genome

Methods: File processing

[Diagram: per-species job scripts (Homo_sapiens.sh, Pan_troglodytes.sh, …) and a Makefile; labels: qsub, setenv, make -j 4 all]

Page 14: How to make a monkey: functional adaptation in the primate genome

Methods: Software used

• NCBI standalone BLAST (formatdb, blastp, fastacmd)

• Muscle
• GeneWise
• HyPhy
• BioPerl/Bio::Phylo (for parsing, logging and wrapping, all scripts under svn)

Page 15: How to make a monkey: functional adaptation in the primate genome

Methods: Project organization

From: Noble, W.S., 2009. A Quick Guide to Organizing Computational Biology Projects. PLoS Comput. Biol. 5(7).

Page 16: How to make a monkey: functional adaptation in the primate genome

Methods: ThamesBlue hardware

• One of the 100 fastest supercomputers in the world

• IBM BladeCenter cluster
• JS21 and JS20 Blade servers with 60TB of storage connected via a Myrinet 2G network
• SuSE Linux Enterprise Server
• General Parallel File System
• Batch jobs managed with Torque

Page 17: How to make a monkey: functional adaptation in the primate genome

Results

• 5952 loci with >= 2 RBHs relative to humans
• 2346 loci with dN/dS deviation somewhere (p<0.05)

[Figure labels: Homo sapiens, Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Macaca mulatta, Callithrix jacchus, Tarsius syrichta, Microcebus murinus, Otolemur garnettii]

Page 18: How to make a monkey: functional adaptation in the primate genome

Results: some interesting terms

• Forebrain development, lifespan (and apoptosis), learning and social behavior in apes, including “deep” nodes

• Eye development in “higher” monkeys
• Terms to do with pregnancy
• Terms to do with male-male competition
• Etc. Etc. (…lots of hard to interpret molecular processes, of course…)

Page 19: How to make a monkey: functional adaptation in the primate genome

“Brain genes”

Page 20: How to make a monkey: functional adaptation in the primate genome

Visual system

• Primates have a highly variable visual system:
– Old World monkeys: three types of cones (unique among mammals)
– New World monkeys: females can be trichromatic, males dichromatic

Page 21: How to make a monkey: functional adaptation in the primate genome

Biological conclusions

• Very, very, very, very preliminary: highest dN/dS ratios in functions for which there are multiple “optima” among primates:
– Different placentation systems
– Different mating systems
– Different visual systems
– Different life histories and brain mass investments

Page 22: How to make a monkey: functional adaptation in the primate genome

Methodological conclusions

• Nine genomes is not that much. As FASTA files, it’s a 14Gb zipped archive (AA+cDNA).

• The problem was trivially parallelizable, so I didn’t use MPI versions of any of the software.

• Simple, consistent workflow and project design conventions are a lifesaver.

• Make each step small enough so you can rerun it, because you will.

Page 23: How to make a monkey: functional adaptation in the primate genome

Summary

• I discussed:
– Primate evolution and adaptation
– Ortholog-finding
– Alignment (multiple proteins, cDNA to protein)
– Tree-based dN/dS ratio tests
– Gene Ontology term enrichment
– Methodological challenges
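The slides mention Gene Ontology term enrichment without naming the statistical test. One common choice is a one-sided hypergeometric test, sketched below in Python under that assumption: given a background of genes, a subset annotated with a term, and a sample of genes of interest, it returns the probability of seeing at least the observed number of annotated genes in the sample by chance. The function and parameter names are hypothetical, and no multiple-testing correction is included.

```python
from math import comb


def hypergeom_tail(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: number of genes in the background set
    annotated:  background genes carrying the term
    sample:     genes of interest (e.g. those showing a dN/dS deviation)
    hits:       genes of interest carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total
```

Exact integer arithmetic via `math.comb` avoids an external dependency and stays accurate for background sets of a few thousand genes, at the cost of speed when many terms are tested.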

Page 24: How to make a monkey: functional adaptation in the primate genome

Acknowledgements

• Funding: FP7-PEOPLE-IEF-2008/N°237046
• DBCLS for their kind invitation
• Mark Pagel, Andrew Meade for discussion and help designing the workflow