Fungal ITS meeting presentation

Metagenomic tools for the fungal community Holly Bik, UC Davis 19 October 2012


DESCRIPTION

My talk from the Fungal ITS meeting in Boulder, Colorado (sponsored by the Sloan Foundation). Discussing metagenomic tools for fungal studies, and how we can increase support for fungal researchers within our computational pipelines being developed at UC Davis.

TRANSCRIPT

Page 1: Fungal ITS meeting presentation

Metagenomic tools for the fungal community

Holly Bik, UC Davis
19 October 2012

Page 2: Fungal ITS meeting presentation

http://phylosift.wordpress.com

Page 3: Fungal ITS meeting presentation

Explicitly Phylogenetic Approaches

Aligned environmental sequences

Guide Tree

Evolutionary Placement of short reads

Page 4: Fungal ITS meeting presentation

We provide:
•  Support for Paired End (raw) Illumina data
•  Marker gene data for Bacteria, Archaea, Eukaryotes, Viruses
•  Taxonomy assignments based on probability distributions over a reference phylogeny
•  Complement to existing tools – QIIME/VAMPS
   –  Inputs/outputs will be compatible for use with other software tools
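As an illustration of taxonomy assignment from a probability distribution over a reference phylogeny, here is a minimal sketch. It is not PhyloSift's implementation: the toy lineages, the `assign_taxon` function, and the `min_mass` threshold are all hypothetical names chosen for this example. It assumes each read comes with a list of candidate placements and their probabilities, sums the probability mass at every level of the lineage, and reports the most specific taxon whose mass reaches the threshold.

```python
from collections import defaultdict

# Hypothetical toy lineages (root to tip) for three reference taxa.
LINEAGES = {
    "Saccharomyces": ["Eukaryota", "Fungi", "Ascomycota", "Saccharomyces"],
    "Candida": ["Eukaryota", "Fungi", "Ascomycota", "Candida"],
    "Ustilago": ["Eukaryota", "Fungi", "Basidiomycota", "Ustilago"],
}


def assign_taxon(placements, lineages, min_mass=0.9):
    """Return the most specific taxon whose summed probability mass
    reaches min_mass, or None if no taxon does.

    placements: list of (reference_taxon, probability) pairs for one read.
    min_mass:   assumed threshold; values above 0.5 avoid ties between siblings.
    """
    mass = defaultdict(float)
    depth = {}
    for taxon, prob in placements:
        # Every ancestor of the placed taxon receives the placement's mass.
        for level, name in enumerate(lineages[taxon]):
            mass[name] += prob
            depth[name] = level
    # Small tolerance guards against floating-point rounding in the sums.
    supported = [name for name in mass if mass[name] >= min_mass - 1e-12]
    if not supported:
        return None
    return max(supported, key=lambda name: depth[name])


# One read whose placements are split between two Ascomycota genera.
read_placements = [("Saccharomyces", 0.55), ("Candida", 0.40), ("Ustilago", 0.05)]
print(assign_taxon(read_placements, LINEAGES))
```

With these toy numbers neither genus alone reaches the threshold, so the read is assigned at the phylum level rather than forced onto a single genus.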

Page 5: Fungal ITS meeting presentation

Markers

•  PMPROK – Dongying Wu’s Bac/Arch markers
•  Eukaryotic Orthologs – Parfrey 2011 paper
•  16S/18S rRNA
•  Mitochondria - protein-coding genes
•  Viral Markers – Markov clustering on genomes
•  Codon Subtrees – finer scale taxonomy
•  Extended Markers – plastids, gene families

Page 6: Fungal ITS meeting presentation

Reference Marker Genes

Page 7: Fungal ITS meeting presentation
Page 8: Fungal ITS meeting presentation

The Monkey – Build Marker Packages

Inputs:
•  Alignment File (marker sequences in FASTA format)
•  Mapping File (sequence name, NCBI taxon ID)

Execute build_marker mode:
•  Generate unique IDs for input sequences
•  hmmbuild (ssu-build): Create profile HMMs (or CMs for rRNA data) using input sequences
•  FastTree: Build tree and collapse topology according to a user-specified PD cutoff (e.g. 99%)
•  Tree Reconciliation: Reconcile NCBI taxonomy IDs with phylogenetic topology. Quantitative metric (minimum Hamming distance) used to match edges between NCBI taxon tree and molecular phylogeny

Built Marker Packages:
•  Clean and package new marker genes
•  Built PhyloSift marker package: tree, HMM profile (CMs for rRNA), taxon map, representative sequences, alignment
•  New marker gene packages placed into shared PhyloSift marker directory

Index Marker Database:
•  Execute index mode
•  Indexes the marker databases needed for LAST and Bowtie
•  Locally indexed marker packages will not interfere with automatic updates to PhyloSift core markers

NOTE: New marker packages are named according to input filenames (e.g. MarkerAlignment.fasta). Core marker data will be overwritten during new marker builds if input files do not have unique names compared to existing PhyloSift markers.
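The tree reconciliation step on this slide matches edges between the NCBI taxon tree and the molecular phylogeny by minimum Hamming distance. A minimal sketch of that idea follows, assuming each edge is represented as a split (the set of leaves on one side of the edge). The function names, edge names, and toy data are hypothetical and are not taken from PhyloSift.

```python
def hamming(split_a, split_b, leaves):
    """Count leaves that fall on different sides of two splits.

    A split has no preferred orientation, so the complement of split_b
    is also considered and the smaller count is returned.
    """
    diff = sum((leaf in split_a) != (leaf in split_b) for leaf in leaves)
    return min(diff, len(leaves) - diff)


def match_edges(taxon_splits, tree_splits, leaves):
    """For each taxonomy edge, find the phylogeny edge at minimum Hamming distance.

    Returns {taxon_name: (best_tree_edge, distance)}.
    """
    matches = {}
    for name, taxon_split in taxon_splits.items():
        best = min(
            tree_splits,
            key=lambda edge: hamming(taxon_split, tree_splits[edge], leaves),
        )
        matches[name] = (best, hamming(taxon_split, tree_splits[best], leaves))
    return matches


# Toy data: five sequences, two named taxa, and the two internal edges
# of a hypothetical molecular phylogeny ((A,B),D,(C,E)).
leaves = ["A", "B", "C", "D", "E"]
taxon_splits = {"GenusX": {"A", "B"}, "FamilyY": {"A", "B", "C"}}
tree_splits = {"edge_AB": {"A", "B"}, "edge_CE": {"C", "E"}}

for taxon, (edge, dist) in match_edges(taxon_splits, tree_splits, leaves).items():
    print(taxon, edge, dist)
```

A distance of zero means the taxon corresponds exactly to a branch of the phylogeny; a nonzero distance flags a taxon that the molecular tree does not recover as a clean group.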


Page 9: Fungal ITS meeting presentation

The Kangaroo – Simulation Data

Execute sim mode

Select Taxa (PD on concatenated tree):
•  Determines PD contributions for taxa present in concatenated guide tree in PhyloSift marker directory
•  Two separate approaches used:
   1.  Select some number of taxa that contribute to PD (user input, default = 10 taxa)
   2.  Sample taxa uniformly without replacement

Generated Simulated Reads:
•  Genome Directory: define the number of genomes to pick (default = 10) and number of reads to generate per file (default = 100,000)
•  Grinder algorithm randomly generates reads from selected genomes, outputs simulated PE-Illumina and 454 datasets

Knockout Swaths of Taxa:
•  Workflow plugs into updateDB to remove genomes which have been used to simulate metagenome data, as well as a swath of related taxa.

Simulation Marker Directory:
•  A new marker directory is created, where simulated genomes have been knocked out from marker packages.

Compute metrics between target and remaining taxa:
•  Calculated metrics include: the distance to nearest neighbors, connecting branch lengths, and the number of sampled nodes within various PD units of connecting nodes.
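The two taxon-selection approaches on this slide can be sketched as follows. This is an illustration under stated assumptions, not the PhyloSift code: the toy tree, the function names, and the greedy strategy for the PD-based approach are all hypothetical, and PD is taken here as the total branch length connecting the chosen taxa to the root.

```python
import random

# Hypothetical rooted guide tree: child -> (parent, branch length).
TREE = {
    "n1": ("root", 0.1), "n2": ("root", 0.3),
    "A": ("n1", 0.2), "B": ("n1", 0.25),
    "C": ("n2", 0.05), "D": ("n2", 0.6),
}
TAXA = ["A", "B", "C", "D"]


def path_edges(taxon, tree):
    """Edges (named by their child node) on the path from a taxon to the root."""
    edges = []
    node = taxon
    while node in tree:
        edges.append(node)
        node = tree[node][0]
    return edges


def select_by_pd(taxa, tree, k=10):
    """Approach 1 (assumed greedy): repeatedly add the taxon that contributes
    the most branch length not already covered by the chosen taxa."""
    chosen, covered = [], set()
    remaining = list(taxa)
    for _ in range(min(k, len(remaining))):
        def gain(taxon):
            return sum(tree[e][1] for e in path_edges(taxon, tree) if e not in covered)
        best = max(remaining, key=gain)
        chosen.append(best)
        covered.update(path_edges(best, tree))
        remaining.remove(best)
    return chosen


def select_uniform(taxa, k=10, seed=None):
    """Approach 2: sample taxa uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(taxa), min(k, len(taxa)))


print(select_by_pd(TAXA, TREE, k=2))
print(select_uniform(TAXA, k=2, seed=1))
```

The default of 10 taxa mirrors the default stated on the slide; the toy tree has only four taxa, so the example calls ask for two.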

Page 10: Fungal ITS meeting presentation

DBupdate – Mining new genomes

Execute phylosift_dbupdate.pl

Genome sources: EBI genomes, JGI genomes, NCBI genomes, private genomes

Workflow boxes:
•  Run PhyloSift (search + align)
•  Add new sequences to marker packages
•  Update reference sequences with new data
•  Infer Updated Tree (Amino Acid Tree, Nucleotide Tree)
•  Prune Tree
•  Codon Subtrees
•  Tree Reconciliation
•  Package Markers
•  Automated Download to PhyloSift Users

Annotations:
•  A taxa set is selected with a maxPD cutoff of 0.02 and a new tree is inferred
•  PD metric used to split guide tree into smaller subtrees; subsets of taxa are selected such that no branch connecting them has length >0.X for some value of X
•  New sequences added at 0.25 PD for amino acid tree; higher PD threshold enables more aggressive searches of reference database, since LAST searching is faster with fewer sequences.
•  Reconcile NCBI taxonomy IDs with phylogenetic topologies, for both amino acid tree and codon subtrees
•  Users’ local marker databases are automatically scanned each time PhyloSift is run and any new updates are automatically downloaded if available
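The rule on this slide for splitting the guide tree (subsets of taxa such that no branch connecting them exceeds some length) can be sketched as below. The slide leaves the threshold unspecified, so `max_len` is a free parameter here, and the function name, edge-list format, and toy tree are hypothetical. The sketch treats the tree as a graph, ignores every branch longer than the threshold, and groups the leaves that remain connected.

```python
def split_by_branch_length(edges, leaves, max_len):
    """Group leaves that are connected through branches no longer than max_len.

    edges:  list of (node_a, node_b, branch_length) describing a tree.
    leaves: names of the leaf nodes (taxa) to group.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for node_a, node_b, length in edges:
        root_a, root_b = find(node_a), find(node_b)
        if length <= max_len:
            parent[root_a] = root_b  # keep short branches, cut long ones

    groups = {}
    for leaf in leaves:
        groups.setdefault(find(leaf), []).append(leaf)
    return sorted(sorted(group) for group in groups.values())


# Toy tree: two tight pairs of taxa joined by one long internal branch.
EDGES = [
    ("A", "n1", 0.05), ("B", "n1", 0.08),
    ("n1", "n2", 0.4),
    ("C", "n2", 0.1), ("D", "n2", 0.12),
]
LEAVES = ["A", "B", "C", "D"]

# 0.2 is an arbitrary illustrative threshold, not a value from the slide.
print(split_by_branch_length(EDGES, LEAVES, max_len=0.2))
```

Lowering the threshold yields more, smaller subsets; raising it above the longest branch returns all taxa as a single group.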

Page 11: Fungal ITS meeting presentation

Tree Reconciliation in PhyloSift

Environmental Sequences

Named Taxa

Page 12: Fungal ITS meeting presentation
Page 13: Fungal ITS meeting presentation

Great!

Not Bad

Getting Tricky…

Page 14: Fungal ITS meeting presentation

Tree Placement

Fat Tree - Guppy

Page 15: Fungal ITS meeting presentation

Marine Metagenome

Chemoautotrophic bacteria – oxidize ammonia into nitrite

Alveolate Protists

Common seawater Archaea

Page 16: Fungal ITS meeting presentation

Tree Placement

Tog Tree - Guppy

Page 17: Fungal ITS meeting presentation

Marine Metagenome

Page 18: Fungal ITS meeting presentation

Marine Metagenome

Tree Placement

Sing Tree - Guppy

Page 19: Fungal ITS meeting presentation

Linking with the Fungal ITS community

•  How does fungal ITS sequence data relate to your project?
   –  PhyloSift has the capability to add any marker gene reference packages that are relevant for specific taxonomic communities
•  What fungal ITS data does your project currently provide?
   –  None – but we do mine other marker genes from fungal genomes
•  What fungal ITS data is your project hoping to provide?
   –  We wouldn’t provide data, but can work with users to increase support for fungal analyses

Page 20: Fungal ITS meeting presentation

Linking with the Fungal ITS community

•  Is your project involved with curating fungal ITS sequences?
   –  No, but we would curate alignments and marker packages of ITS sequences mined from public databases
•  If so, what curation strategies are being implemented for your project?
   –  Alignment filtering and masking, pruning reference trees
•  What tools for working with fungal ITS sequences does your project currently provide?
   –  None so far – but can be implemented if given a reference dataset (e.g. alignment)

Page 21: Fungal ITS meeting presentation

Linking with the Fungal ITS community

•  What tools are you developing / planning to develop?
   –  Current focus is on multisample comparisons
   –  Gene tree reconciliation
   –  Probability distribution over tree topology to delimit OTUs (Phylogenetic OTUs)
•  What framework of fungal taxonomy does your project use?
   –  NCBI-derived taxonomy (because of tree mapping/reconciliation issues)

Page 22: Fungal ITS meeting presentation

SATELLITE MEETING

Eukaryotic Metagenomics

March/April 2013 UC Davis

Page 23: Fungal ITS meeting presentation

Acknowledgements

UC Davis
•  Jonathan Eisen
•  Aaron Darling
•  Guillaume Jospin
•  Dongying Wu
•  David Coil

PhyloSift
Software Development on Github: https://github.com/gjospin/PhyloSift
Google Group for user support: https://groups.google.com/d/forum/phylosift
Twitter: @PhyloSift