Annotating genomes using MAKER-P and iPlant

Upload: elwin-lester

Post on 05-Jan-2016



What Are Annotations?

• Annotations are descriptions of features of the genome
  – Structural: exons, introns, UTRs, splice forms etc.
  – Coding & non-coding genes
  – Expression, repeats, transposons

• Annotations should include evidence trail
  – Assists in quality control of genome annotations

• Examples of evidence supporting a structural annotation:
  – Ab initio gene predictions
  – ESTs
  – Protein homology


Secondary Annotation

• Protein Domains
  – InterProScan: combines many HMM databases
• GO and other ontologies
• Pathway mapping
  – E.g. BioCyc Pathway Tools


Challenges in Plant Genome Annotation

• Genomes are BIG
• Highly repetitive
• Many pseudogenes
• Assembly contamination
• Incomplete evidence
• No method is 100% accurate


Options for Protein-coding Gene Annotation

Yandell & Ence. Nature Reviews Genetics 13, 329-342 (May 2012) | doi:10.1038/nrg3174


Typical Annotation Pipeline

• Contamination screening
• Repeat/TE masking
• Ab initio prediction
• Evidence alignment (cDNA, EST, RNA-seq, protein)
• Evidence-driven prediction
• Chooser/combiner
• Evaluation/filtering
• Manual curation


MAKER-P Automated Pipeline

[Pipeline diagram; labeled inputs: Repeat Library, Ab initio prediction, Evidence]

• MPI-enabled to allow parallel operation on large compute clusters
• Collaboration with Yandell Lab
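
Annotation parallelizes well because contigs can be processed independently. The sketch below is not MAKER-P's MPI scheduler; it only illustrates, with made-up contig names and lengths, one simple way to divide contigs among a fixed number of workers so that each receives a similar number of bases (greedy longest-first assignment). The function name `balance_contigs` is hypothetical.

```python
import heapq
from typing import Dict, List


def balance_contigs(lengths: Dict[str, int], n_workers: int) -> List[List[str]]:
    """Assign contigs to workers, longest first, always to the least-loaded worker."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Min-heap of (bases assigned so far, worker index).
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    bins: List[List[str]] = [[] for _ in range(n_workers)]
    # Sort by descending length; break ties by name for a stable result.
    for name, length in sorted(lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        bins[worker].append(name)
        heapq.heappush(heap, (load + length, worker))
    return bins


# Hypothetical contig lengths in bases (illustrative only).
contigs = {"ctg1": 900, "ctg2": 500, "ctg3": 400, "ctg4": 300, "ctg5": 100}
for worker, names in enumerate(balance_contigs(contigs, 2)):
    print(worker, names, sum(contigs[n] for n in names))
```

Greedy assignment does not guarantee an optimal split, but it is cheap and usually keeps the per-worker load close when there are many more contigs than workers.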


What is a GFF File?

Generic Feature Format
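
A GFF file is tab-delimited text with nine columns per feature: seqid, source, type, start, end, score, strand, phase, and attributes. The following is a minimal sketch of reading one such line in Python, assuming GFF3-style `key=value;` attributes. The example record and the names `GffFeature` and `parse_gff3_line` are made up for illustration and are not taken from MAKER-P output.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import unquote


@dataclass
class GffFeature:
    seqid: str
    source: str
    feature_type: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str
    phase: Optional[int]     # None when the column is "."
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Parse one feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes: Dict[str, str] = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attributes[key.strip()] = unquote(value.strip())
    return GffFeature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


# Hypothetical record, for illustration only.
example = "chr1\tmaker\tgene\t1000\t2500\t.\t+\t.\tID=gene0001;Name=example_gene"
feature = parse_gff3_line(example)
print(feature.feature_type, feature.start, feature.end, feature.attributes["ID"])
```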


MAKER-P at iPlant

TACC Lonestar Supercomputer: 22,656 CPU cores on 1,888 nodes

Genome                Assembly    Size (Mb)  CPU   Run Time
Arabidopsis thaliana  TAIR10      120        600   2:44
Arabidopsis thaliana  TAIR10      120        1500  1:27
Zea mays              RefGen_v2   2067       2172  2:53

Campbell et al. Plant Physiology. December 4, 2013, DOI:10.1104/pp.113.230144

PAG 2014:

• W559 - Annotation of the Loblolly Pine Megagenome (Jill Wegrzyn)
  – 20.15 Gb assembly, split into 40 jobs, 216 CPU/job (8640 CPU total), 17 hours

• P157 - Disease Resistance Gene Analysis on Chromosome 11 Across Ten Oryza Species
  – 10 rice species (each w/12 chromosome pseudomolecules)
  – 96 CPU per chromosome (1152 CPU total) ~ 2 hr per genome
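
The two Arabidopsis runs on Lonestar give a rough sense of how the pipeline scales. The sketch below works through that arithmetic, assuming the run times are written as hours:minutes (the slide does not state the unit); the helper names are hypothetical. It also checks the CPU totals quoted for the two PAG 2014 runs.

```python
from typing import Tuple


def to_minutes(run_time: str) -> int:
    """Convert an 'h:mm' string to minutes (assumed format)."""
    hours, minutes = run_time.split(":")
    return int(hours) * 60 + int(minutes)


def scaling(cpus_a: int, time_a: str, cpus_b: int, time_b: str) -> Tuple[float, float]:
    """Return (speedup, parallel efficiency) going from run A to run B."""
    speedup = to_minutes(time_a) / to_minutes(time_b)
    cpu_ratio = cpus_b / cpus_a
    return speedup, speedup / cpu_ratio


# Arabidopsis thaliana TAIR10: 600 CPUs in 2:44 versus 1500 CPUs in 1:27.
speedup, efficiency = scaling(600, "2:44", 1500, "1:27")
print(f"speedup {speedup:.2f}x on {1500 / 600:.1f}x the CPUs, efficiency {efficiency:.0%}")

# CPU totals for the PAG 2014 runs: 40 jobs x 216 CPU, 12 chromosomes x 96 CPU.
print(40 * 216, 12 * 96)
```

Under that assumption, 2.5 times the CPUs cut the run time by a little less than half, so the scaling is good but not linear.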


MAKER-P at iPlant

• Virtual image
• MPI-enabled for parallel computing
• Check out with up to 16 CPU
• Tested with 4 CPU instance
  – Completed rice chr 1 in 8 hr 45 min

Atmosphere: MAKER_2.28 (emi-F13821D0)


MAKER-P Tutorial

https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial


Documentation and Help