IRIDA: A federated bioinformatics platform enabling richer genomic epidemiology analysis in public health. William Hsiao, Ph.D. (@wlhsiao), BC Centre for Disease Control Public Health Laboratory and University of British Columbia. March 21 2016, UT San Antonio.


Page 1: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

IRIDA: A federated bioinformatics platform enabling richer genomic epidemiology analysis in public health

William Hsiao, Ph.D.

@wlhsiao

BC Centre for Disease Control Public Health Laboratory and University of British Columbia

March 21 2016, UT San Antonio

Page 2: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health
Page 3: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Roles of Public Health Agencies

• Public Health (PH) agencies around the world track and intervene in the spread of diseases to improve the health of the population
• PH agencies also come up with policies and strategies to prevent diseases from occurring
• PH laboratories test patient and environmental samples and determine the cause of diseases
• At the BC Public Health Lab, we process on average 3,000 samples a day, or about 1 million samples a year

Page 4: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Dual Arms of a Public Health Agency

What did you eat? Where did you eat that? When?

What strain of Salmonella Enteritidis is it?

Epidemiological Investigation

Laboratory Investigation

Identify common exposure

Identify the culprit pathogen

Confirmed by Epi

Confirmed by Lab

Page 5: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Current State of Clinical Microbiology Laboratory

Didelot et al. 2012. doi:10.1038/nrg3226.

• Culture to isolate organisms using different media

• Different diagnostic tests and typing and subtyping methods

• Different drug sensitivity tests

Page 6: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Current Methods of Characterizing Foodborne Pathogens in a Public Health Laboratory

• Growth characteristics
• Phenotypic panels
• Agglutination reactions
• Enzyme immunoassays (EIAs)
• PCR
• DNA arrays (hybridization)
• Sanger sequencing of marker genes
• DNA restriction
• Electrophoresis (PFGE, capillary)

Each pathogen is characterized by methods specific to that pathogen, with a separate workflow for each pathogen.

Turnaround time (TAT): 5 min – weeks (months)

Source: Rebecca Lindsey

Page 7: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Current State of Public Health Epidemiology

• Many incompatible systems
• Paper and fax communication common
• Rich case information conveyed verbally or in free text
• Requires data re-entry and re-coding

[Diagram: information flow among Cases, Physicians, Local laboratory, Local public health dept., Provincial / State laboratory, Provincial / State public health dept., National laboratory, and National Ministry of Health. Links are labelled Fax/Electronic, Fax, Phone/Fax, Electronic/Paper, Electronic/Fax/Phone, and Mailing of Samples/Fax/Electronic.]

Source: M. Taylor, BCCDC

Page 8: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

The Era of Molecular Epidemiology

• Molecular test results are often more specific and sensitive than traditional phenotypical or biochemical tests

• These biomarkers can be correlated to epidemiological investigations (People, Place, Time)

• Provides linkage based on common exposure to the same pathogen at the molecular level

BUT…

• Most tests detect one or a few specific biomarkers, representing a fraction of the pathogen's genetic information
• As pathogens evolve, targeted tests can lose their specificity

Page 9: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Era of Whole Genome Sequencing (WGS) = lots of High Quality Data

• Captures the pathogen's entire genetic makeup
• Unbiased (~97-99+% of the genome captured using common sequencing approaches)
• Significantly more data than traditional methods
• Allows higher resolution and higher sensitivity analysis to be applied
• Allows value-added evolutionary & functional study of the pathogens
  • Virulence factors
  • AMR genes
• These genomics data can be useful for downstream research (e.g. comparative genomics)

Page 10: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

NGS Reduces Sequencing Cost allowing PHM Sequencing

$10K per human genome or $10 per bacterial genome

$100M per human genome

Page 11: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Whole Genome Sequencing of Foodborne Pathogens

• UK Public Health England committed to sequence all the Salmonella isolates submitted to PH Lab

• US FDA and CDC (supported by National Center for Biotechnology Information) created a distributed network of labs to utilize WGS for pathogen identification

https://publichealthmatters.blog.gov.uk/2014/01/20/innovations-in-genomic-sequencing/

http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm

Page 12: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

PulseNet Canada

• Part of PulseNet International
  • A global laboratory surveillance network for enteric pathogens
  • Based on pulsed-field gel electrophoresis (PFGE) fingerprint technology
  • Originally developed at CDC Atlanta for E. coli O157:H7 outbreak investigation in 1993
• PulseNet Canada formed in 2000 and shares fingerprint data with other PulseNet partners, including direct database linkage with the CDC
• PulseNet is transitioning from PFGE to WGS within 3 years
• Sequencing facilities are being set up in PH labs across Canada this year

Page 13: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Whole Genome Sequencing-Based Workflow

Didelot et al. 2012. doi:10.1038/nrg3226.

Page 14: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Each year, one in eight Canadians (or 4 million people) get sick with a domestically acquired food-borne illness.

http://www.phac-aspc.gc.ca/efwd-emoha/efbi-emoa-eng.php

Page 15: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Each year, one in six Americans (or 48 million people) get sick with a domestically acquired food-borne illness.

http://www.cdc.gov/foodborneburden/

Page 16: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Improve Public Health Microbiology using Genomic Epidemiology

• Genomic epidemiology definition: using whole genome sequencing data from pathogens, together with epidemiological investigations, to track the spread of an infectious disease
• Leads to a faster and simpler test menu and more actionable information (virulence factors, AMR, source tracking)
• However, there are a few hurdles to overcome…

Page 17: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Many Players in surveillance and outbreak – ineffective information sharing

Provincial public health dept.

National laboratory

Local public health dept.

Provincial laboratory

Cases

Physicians Frontline lab

Information

Bioinformatics and Analytical Capacities

Page 18: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Sequencing Improvement outpaces Computing Improvements

Cloud Computing

Cluster Computing

Algorithm improvement

Page 19: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

IRIDA Platform Overview

• IRIDA = Integrated Rapid Infectious Disease Analysis

• A free, open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time disease outbreak investigations

Core Functions:

• Management of strain and genomic sequence data

• Rapid processing and analysis of genomic data

• Informative display of genomic results

• Sample, Case, and aggregate data (“metadata”) Management

Target audience:

• Public health agencies who need a platform to manage and process genomic data

• Public health agencies who need a platform to use genomics for outbreak investigations

[Diagram: IRIDA and its connected components: Sequencing Instruments, Web Application, Data Management, Built-in Analytical Tools, External Galaxy, Command-line Tools]

Page 20: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

IRIDA is a Partnership

- Project Team has direct access to state of the art research in academia

- Project Team is directly embedded in user organization

Page 21: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

IRIDA Has A Simple User Interface

Line List View (under testing)

Timeline View (Conceptualization)

Selectable fields

Travel

Symptoms and Onset

Exposure Types

Hospitalization

Launch a pipeline

Page 22: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

IRIDA is a Robust, Extensible Platform

• IRIDA uses Galaxy to manage workflows

• Adding additional pipelines is relatively easy

• A standard API allows 3rd party tools to obtain data from IRIDA (e.g. IslandViewer and GenGIS)

[Architecture diagram: IRIDA (Servlet Container, Web Interface, REST API, Application Logic, Central File Storage) connected to Galaxy on a Compute Cluster, with command-line access to Galaxy]

Page 23: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

IRIDA is Built to Enable Collaboration

• Be able to compare pipelines
  • Pipelines implemented using Galaxy: transparent and shareable
  • Define QC criteria using ontology to compare different pipelines that serve the same purpose

• Be able to share data to minimize data re-entry from one platform to another
  • Federation of platforms using a standard API to share data and analysis results

Page 24: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Distributed in Multiple, Flexible Access Options

• IRIDA is available in several different flavours.
• Download the latest version at https://github.com/phac-nml/irida

Local Install
• Advantages: full control of the system; your data never leaves your centre
• Disadvantages: computing infrastructure and IT support needed to maintain the resource

Virtual Machine
• Advantages: full control of the system; easy to set up
• Disadvantages: not really scalable if run on your own desktop; some performance loss

Cloud Instance
• Advantages: full control of the system; does not require local computing infrastructure
• Disadvantages: data goes into a cloud environment; uploading to a cloud environment can be slow

Public Version
• Advantages: no setup required; upload your data and have it processed using Compute Canada resources
• Disadvantages: data goes into a public instance (data remain private to your account); upload can be slow

Page 25: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Contextual Information is Crucial for Interpreting Genomics Data.

Sequence + Contextual Info = Find the Pathogenic Culprit!

Source: Emma Griffiths

Page 26: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Contextual Information Needs to be Shared…..So Keep the Next User in Mind.

International Partners Intervention Partners

Source: Emma Griffiths

Page 27: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

The

of Contextual Information Isn’t

STANDARDIZED

Source: Emma Griffiths

Page 28: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

When Words Can Mean Different Things.

Semantic Ambiguity.

http://www.neurolang.com/wp-content/uploads/2013/05/RhymesAmbiguity.png

Page 29: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

“Ontologies are for the digital age what dictionaries were in the age of print.”

Ontology, A Way of Structuring Information.

• Standardized, well-defined hierarchy of terms
• Interconnected with logical relationships
• “Knowledge-generation engine”

[Diagram: Vocabulary, Hierarchy, and Logic make up an Ontology, which supports Knowledge Extraction]

Source: Emma Griffiths

Page 30: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Ontologies Standardize Vocabulary and Enable Complex Querying

Simple Food Ontology Hierarchy

Animal Feed Poultry Water

Pellets Nuggets Deli Meats Bottled Well

Produce

Spinach Sprouts Whole Mice

Transmission through_ingestion or contact

Treated by_filtration

Taxonomy_Spinacia oleracea

Preparation_Ready-to-Eat

Animal (Consumer)_Snake

Synonym_Cold Cuts

Source: Emma Griffiths
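The hierarchy and synonym on this slide can be sketched as a small lookup structure. This is only an illustration of how a standardized vocabulary enables queries such as "is this exposure a kind of poultry?". It is not how GenEpiO or FoodOn is implemented. The parent assigned to each term is an assumption recovered from the flattened diagram, and the root term "Food" and the function names are hypothetical.

```python
from collections import deque

# Parent -> children, transcribed from the slide. Which parent each child
# belongs to is an assumption; the root "Food" is hypothetical.
FOOD_HIERARCHY = {
    "Food": ["Animal Feed", "Poultry", "Water", "Produce"],
    "Animal Feed": ["Pellets", "Whole Mice"],
    "Poultry": ["Nuggets", "Deli Meats"],
    "Water": ["Bottled", "Well"],
    "Produce": ["Spinach", "Sprouts"],
}

# Synonyms map free-text labels onto the preferred term (from the slide).
SYNONYMS = {"Cold Cuts": "Deli Meats"}


def normalize(term):
    """Replace a known synonym with its preferred term."""
    return SYNONYMS.get(term, term)


def descendants(term):
    """Return every term below `term` in the hierarchy, breadth-first."""
    found = []
    queue = deque(FOOD_HIERARCHY.get(normalize(term), []))
    while queue:
        current = queue.popleft()
        found.append(current)
        queue.extend(FOOD_HIERARCHY.get(current, []))
    return found


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or sits anywhere beneath it."""
    term, ancestor = normalize(term), normalize(ancestor)
    return term == ancestor or term in descendants(ancestor)
```

With this structure, a case record that says "Cold Cuts" and another that says "Deli Meats" both answer yes to `is_a(..., "Poultry")`, which is the kind of query that free-text fields cannot support.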

Page 31: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Case Studies: Ontology Can Help Resolve Issues of Taxonomy, Granularity and Specificity.

a) Taxonomy & Granularity

Leafy Greens
• Spinach
  • Spinacia oleracea (Taxonomy_species found in N. America)
  • Amaranthus hybridus (Taxonomy_species found in S. Africa)
• Lettuce
  • Endive, Iceberg (equivalent subtypes of lettuce)

Poultry

Chicken Nuggets

b) Specificity

Breast

Processing_Ready-to-Eat

Composition_breading, spices, chicken breast

Location of Purchase_Retail (Grocery Store vs Butcher)

Preparation_marinated

Source: Emma Griffiths

Page 32: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Ontology Acts Like A Rosetta Stone.

• Need a common language

• Humans AND computers need to read it

• Mapping allows interoperability AND customization

* Ontologies can be translated into different human languages as well

Rosetta Stone – Egypt, 196 BC
• Stone tablet translating the same text into different ancient languages

Source: Emma Griffiths

Page 33: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

GenEpiO: Combining Different Epi, Lab, Genomics and Clinical Data Fields.

• Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR
• Sample Metadata: Isolation Source (Food, Host Body Product, Environmental), BioSample
• Epidemiology Investigation: Exposures
• Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status
• Reporting: Case/Investigation Status

GenEpiO (Genomic Epidemiology Application Ontology)

Source: Emma Griffiths

Page 34: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

GenEpiO Will Help Integrate Genomics and Epidemiological Data

Use computers to identify common exposures, symptoms, etc. among genomics clusters

Example: Automating case definition generation

Correlate Genomics Salmonella Cluster A cases between 01 Mar 2015 and 15 Mar 2015 with High-Risk Food Types (Spinach, Leafy Greens) and Geographical Location of Vancouver

Source: Emma Griffiths
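The case definition example on this slide can be read as a filter over a line list. The sketch below is a minimal illustration under stated assumptions: every case record is invented, the field names are hypothetical, and the `LEAFY_GREENS` set stands in for what an ontology lookup would return.

```python
from datetime import date

# Hypothetical line list; every record is invented for illustration.
CASES = [
    {"id": "C1", "cluster": "A", "onset": date(2015, 3, 2), "foods": {"Spinach"}, "city": "Vancouver"},
    {"id": "C2", "cluster": "A", "onset": date(2015, 3, 20), "foods": {"Spinach"}, "city": "Vancouver"},
    {"id": "C3", "cluster": "B", "onset": date(2015, 3, 5), "foods": {"Lettuce"}, "city": "Vancouver"},
    {"id": "C4", "cluster": "A", "onset": date(2015, 3, 10), "foods": {"Lettuce"}, "city": "Vancouver"},
    {"id": "C5", "cluster": "A", "onset": date(2015, 3, 12), "foods": {"Nuggets"}, "city": "Vancouver"},
    {"id": "C6", "cluster": "A", "onset": date(2015, 3, 8), "foods": {"Spinach"}, "city": "Kamloops"},
]

# Toy stand-in for an ontology query: terms that count as Leafy Greens.
LEAFY_GREENS = {"Leafy Greens", "Spinach", "Lettuce"}


def matches(case, cluster, start, end, food_terms, city):
    """Apply the four criteria: genomic cluster, onset window, exposure, place."""
    return (
        case["cluster"] == cluster
        and start <= case["onset"] <= end
        and bool(case["foods"] & food_terms)
        and case["city"] == city
    )


def select_cases(cases, cluster, start, end, food_terms, city):
    """Return the IDs of cases that satisfy the case definition."""
    return [c["id"] for c in cases if matches(c, cluster, start, end, food_terms, city)]


SELECTED = select_cases(
    CASES, "A", date(2015, 3, 1), date(2015, 3, 15), LEAFY_GREENS, "Vancouver"
)
```

The point of the ontology is in the food criterion: a case reporting "Lettuce" matches a definition written in terms of "Leafy Greens" without anyone re-coding the record by hand.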

Page 35: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Public Health Surveillance

Case Cluster Analysis

Result Reporting

Infectious Disease Epidemiology (from case to intervention)

Lab Surveillance (from sample to strain typing results)

Evidence Collection & Outbreak Investigation

Sample Collection & Processing

Sequence Data Generation & Processing

Bioinformatics Analysis

Result Reporting

Whole Genome Sequencing (SO, ERO, OBI etc)

Quality Control (OBI, ERO)

Legend: GenEpiO, OBO, Other

Anatomy (FMA)

Environment (Envo)

Food (FoodOn)

Clinical Sampling (OBI)

Custom LIMS

Quality Control (OBI, ERO)

AMR (ARO)

Virulence (PATO)

Phylogenetic Clustering (EDAM)

Mobile Elements (MobiO)

Quality Control (OBI, ERO)

Nomenclature & Taxonomy (NCBItaxon)

AMR (ARO) LOINC

Surveillance (SurvO)

Demographics (SIO)

Patient History (SIO)

Symptoms (SYMP)

Exposures (ExO)

Source Attribution (IDO)

Travel (IDO)

Transmission (TRANS)

Food (FoodOn)

Geography (OMRSE)

Outbreak Protocols

Surveillance (SurvO)

Food (FoodOn)

Surveillance (SurvO)

Mobile Elements (MobiO)

Infectious Disease (IDO)

Typing (TypON)

Page 36: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Genomic Epidemiology Ontology: Using a Common Language to Get Ahead of the Epidemiological Curve

Fewer cases…faster resolution!

Source: Emma Griffiths

Page 37: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Whole Genome Sequencing: Salmonella Enteritidis

Page 38: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Higher Salmonellosis Incidence in BC

• Higher salmonellosis rate than the Canadian national rate since 2007
• S. Enteritidis has been the most commonly isolated serotype since 2006 (accounts for 30-50% of all Salmonella isolates in BC)

[Chart legend: BC, Canada]

Source: http://www.bccdc.ca/NR/rdonlyres/B24C1DFD-3996-493F-BEC7-0C9316E57721/0/2011_CD_Annual_Report_Final.pdf

Page 39: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

S. Enteritidis subtyping in BC

• PFGE: over half of isolates tested are 1 of 2 XbaI patterns
• Phage typing (PT): about half of isolates are 1 PT
• So a better subtyping method is needed to discriminate between cases of Enteritidis…
  – OR this is one very large outbreak (no supporting data for this)

[Chart: Enteritidis XbaI patterns 1998-2012. Patterns shown: SENXAI.0003, SENXAI.0001, SENXAI.0038, SENXAI.0006, SENXAI.0036, SENXAI.0004, SENXAI.0007, SENXAI.0008, SENXAI.0062, SENXAI.0041, SENXAI.0077, SENXAI.0002, SENXAI.0025, SENXAI.0060, SENXAI.0009]

[Chart: Enteritidis PT distribution 1998-2012. Phage types shown: 8, 13a, 13, Atypical, 6a, 1, 4, 51, 5b, 41, Untypable, 1b, Untypeable, 21, 14b, 6, 2]

All have been PFGE’d but not all PT’d

Source: Kim Macdonald
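One common way to quantify the lack of discrimination described on this slide is Simpson's index of diversity, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of subtype j and N is the total. The talk does not say this index was used. The sketch below is an assumption-labelled illustration, and the isolate counts in it are invented rather than taken from the BC data.

```python
from collections import Counter


def simpsons_index_of_diversity(subtypes):
    """Probability that two isolates drawn at random have different subtypes."""
    counts = Counter(subtypes).values()
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1 - same_type_pairs / (total * (total - 1))


# Invented example: 10 isolates dominated by one pattern.
HYPOTHETICAL_PFGE = ["P1"] * 6 + ["P2"] * 2 + ["P3"] * 2
D_PFGE = simpsons_index_of_diversity(HYPOTHETICAL_PFGE)
```

A value near 1 means the method separates almost every pair of isolates. When one or two patterns account for most isolates, as with PFGE and PT here, the index drops and unrelated cases become hard to tell apart.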

Page 40: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Isolates and Methods

• 36 isolates from 9 confirmed food-borne outbreaks
• Collected over 9 years; many more isolates in the freezer waiting to be organized
• Subtyping data by PFGE and PT available
• Isolates from epi-linked sources available for 2 of the outbreaks

• Isolate picking criteria:
  • Believed to be a single-source outbreak (common food, common food handler or common ingredients)
  • Clear epidemiological linkage through enhanced interviews
  • Majority of the clusters have the same PT and/or PFGE pattern; some have one PFGE band difference

• Sequencing libraries prepared using Nextera or Nextera XT
• Sequenced on Illumina MiSeq, 150 bp or 250 bp paired-end
• Minimum depth of coverage 30x per genome (average coverage 50x)
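The coverage figures above follow from simple arithmetic: mean depth is the total number of sequenced bases divided by the genome size. The sketch below works this through for paired-end reads. The genome size of about 4.7 Mb is an assumed round figure for S. Enteritidis, not a number from the talk, and the function names are hypothetical.

```python
import math

# Assumption: approximate S. Enteritidis genome size in base pairs.
GENOME_SIZE = 4_700_000


def mean_coverage(read_pairs, read_length, genome_size=GENOME_SIZE):
    """Expected mean depth: two reads per pair times read length, over genome size."""
    return 2 * read_pairs * read_length / genome_size


def read_pairs_needed(target_coverage, read_length, genome_size=GENOME_SIZE):
    """Smallest number of read pairs that reaches the target mean depth."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


PAIRS_FOR_30X_AT_250BP = read_pairs_needed(30, 250)
PAIRS_FOR_30X_AT_150BP = read_pairs_needed(30, 150)
```

This ignores reads lost to quality trimming and uneven coverage across the genome, so a real run would need a margin above these numbers.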

Page 41: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

SNP Analysis

• What is a SNP?
  • A SNP (single nucleotide polymorphism) is a DNA sequence variation that occurs when a single nucleotide differs between two or more genomes

ATCGCGATATCATACGG
ATCGCAATATCATACGG
ATCGCGATATCATACGG
ATCGCGATATCATACGG
ATCGCAATATCATACGG

• A SNP can be created by a point mutation, but can also be created by the insertion or deletion of one nucleotide
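The definition can be made concrete with the five aligned sequences from this slide. The sketch below reports every alignment column where the sequences disagree. It assumes the sequences are already aligned to equal length, which real SNP pipelines achieve by mapping reads to a reference, and the function name is hypothetical.

```python
# The five example sequences shown on the slide.
SEQUENCES = [
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCGATATCATACGG",
    "ATCGCAATATCATACGG",
]


def snp_positions(aligned):
    """Return 0-based columns where the aligned sequences are not all identical."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


POSITIONS = snp_positions(SEQUENCES)
```

For the slide's example the only variable column is the sixth base (index 5), where some genomes carry G and others carry A.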

Page 42: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Why are SNPs useful?

• Silent mutations that do not change protein sequences happen quite frequently due to DNA replication errors => High resolution
• SNPs occur across the whole genome and can be detected by whole genome sequencing => Unbiased markers
• SNPs can be used to infer the phylogeny of organisms
  • More shared SNPs = more closely related
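Relatedness is often summarised as a pairwise SNP distance: the number of variable sites at which two isolates differ, so that a smaller distance means more shared alleles and a closer relationship. The sketch below is a minimal illustration. The isolate names and SNP profiles are invented, and real analyses first filter low-quality and recombinant sites.

```python
from itertools import combinations

# Invented SNP profiles (alleles at the same variable sites in each isolate).
ISOLATES = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "TCGAACGA",
}


def snp_distance(a, b):
    """Count the sites at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(isolates):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (m, n): snp_distance(isolates[m], isolates[n])
        for m, n in combinations(sorted(isolates), 2)
    }


DISTANCES = pairwise_distances(ISOLATES)
```

A distance matrix like this is the usual input to tree-building and clustering methods.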

Page 43: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Minimum Spanning Tree – Coloured by PT

PT8

PT4

PT13a

PT52

Note: for PT13a, 3 isolates have identical SNVs and collapsed into a single node; edges are not drawn to scale
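A minimum spanning tree of this kind can be sketched in two steps: collapse isolates with identical SNV profiles into one node, as the note describes, then connect the distinct profiles with the smallest total SNV distance. The sketch below uses Prim's algorithm on invented profiles. It is not the PhyloViz implementation, which applies its own tie-breaking rules.

```python
# Invented SNV profiles; three isolates are identical and collapse to one node.
PROFILES = {
    "i1": "AAAA",
    "i2": "AAAA",
    "i3": "AAAA",
    "i4": "AAAT",
    "i5": "CCAT",
}


def hamming(a, b):
    """Number of sites at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def collapse_identical(profiles):
    """Group isolate names by profile, giving one node per distinct profile."""
    groups = {}
    for name, profile in profiles.items():
        groups.setdefault(profile, []).append(name)
    return groups


def minimum_spanning_tree(profiles):
    """Prim's algorithm over distinct profiles; returns (node_a, node_b, distance)."""
    nodes = sorted(collapse_identical(profiles))
    if not nodes:
        return []
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # Pick the cheapest edge leaving the tree; ties broken by node names.
        a, b = min(
            ((u, v) for u in in_tree for v in nodes if v not in in_tree),
            key=lambda pair: (hamming(*pair), pair),
        )
        edges.append((a, b, hamming(a, b)))
        in_tree.add(b)
    return edges


GROUPS = collapse_identical(PROFILES)
EDGES = minimum_spanning_tree(PROFILES)
```

Colouring the resulting nodes by phage type or by outbreak, as on these slides, then shows whether the genomic clusters agree with the epidemiological groupings.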

Page 44: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Minimum Spanning Tree – Coloured by outbreak

Created using PhyloViz Online: http://online.phyloviz.net/

Page 45: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Whole Genome Sequencing: Giardia lamblia (duodenalis)

Page 46: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Giardia

• Giardia is a primitive eukaryotic protozoan belonging to the diplomonads
• Its representatives are differentiated into 8 lineages (A-H), with 2 lineages (A & B) infecting humans. Genomes of 3 lineages (A, B, E) are available.
• G. duodenalis (lineages A & B) causes gastrointestinal disease (giardiasis) in humans and is spread by drinking water.
• There are over 1 billion cases per year worldwide.
• In BC, various waterborne outbreaks have been reported (Isaac-Renton et al. 1992, Safaris and Isaac-Renton 1992).
• The infection may be transmitted by drinking water or food.
• Giardia is often associated with an animal host (beaver, Castor canadensis), and giardiasis is called “beaver fever”.

Page 47: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Study Overview

• For the present study, 89 samples from 4 major outbreaks (Creston, Kitimat, Revelstoke and Barriere), as well as other events, were included.
• Trophozoites were retrieved from a -80 °C freezer, and DNA was extracted from Giardia strains from surface water, humans and beavers using a QIAamp DNA mini kit.
• The identity of isolates was confirmed by 18S rRNA, but 18S doesn’t differentiate subtypes.
• Paired-end (PE) DNA libraries were constructed with the Nextera® XT DNA kit, and whole genome re-sequencing was conducted on an Illumina MiSeq.

Aldergrove

Dawson Creek

Kamloops

Terrace

Mission Creek

Source: Clement Tsui

Page 48: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Bioinformatics Pipelines

• Genome sequencing (MiSeq)
• Quality checking (FastQC, Trim Galore)
• Reference mapping (Bowtie)
• Variant calling (GATK or DiscoSNP)
• SNP analysis
• De novo assembly (SPAdes)
• Gene calling (MAKER)
• Comparative genomics

Source: Clement Tsui
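The pipeline on this slide can be modelled as a dependency graph so that steps run only after their inputs exist. The sketch below orders the steps with Python's standard `graphlib` module (Python 3.9 or later). It does not invoke any of the named tools. The split into a mapping branch and an assembly branch after quality checking is an assumption read from the flowchart, and the step names are hypothetical identifiers.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on. The two branches after
# quality checking are an assumption about the flowchart's layout.
PIPELINE = {
    "quality_checking": {"genome_sequencing"},
    "reference_mapping": {"quality_checking"},
    "variant_calling": {"reference_mapping"},
    "snp_analysis": {"variant_calling"},
    "de_novo_assembly": {"quality_checking"},
    "gene_calling": {"de_novo_assembly"},
    "comparative_genomics": {"gene_calling"},
}


def execution_order(pipeline):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


ORDER = execution_order(PIPELINE)
```

Workflow managers such as Galaxy resolve dependencies in a similar spirit, which is what lets the two branches run independently once quality checking has finished.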

Page 49: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Both A and B are present in outbreaks

[Bar chart: counts (axis 0 to 9) of assemblages B, A2 and A1 in the Barriere, Kitimat and Creston outbreaks]

Outbreaks could have multiple sources.

Source: Clement Tsui

Page 50: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

VANC/89/UBC/33, Vancouver, Canada

VANC/87/UBC/28, Aldergrove, Canada

B5/19, Calgary, Canada

VANC/90/UBC/43, Creston, Canada

VANC/87/UBC/29, Aldergrove, Canada

VANC/85/UBC/5, Coquitlam, Canada

HAMILTON84/76, Hamilton, New Zealand

VANC/91/UBC/73, Kamloops, Canada

VANC/90/UBC/64, Barriere, Canada Δ

BE/1/IP/0482/1/15, Banff, Canada

VANC/89/UBC/37, Kitimat, Canada

ATCC50170/93, Madison, USA

BTW/109, Botwood, Canada

VANC/92/UBC/101, Mission Creek, Canada

VANC/93/UBC/70, Barriere, Canada

VANC/92/UBC/104, Mission Creek, Canada

VANC/89/UBC/36, Oliver, Canada

VANC/90/UBC/71, Creston, Canada

VANC/96/UBC/126/Major, Revelstoke, Canada

HAMILTON7/75, Hamilton, New Zealand

CB2/108, Cornerbrook, Canada

VANC/85/UBC/1, Hornby Island, Canada

VANC/93/UBC/106/major, Mission Creek, Canada

VANC/88/UBC/35, Vancouver, Canada

VANC/88/UBC/34, Vancouver, Canada

SI/16, Strathmore, Canada

BE/2/IPO583/1/14, Banff, Canada

VANC/94/UBC/121, Chilliwack, Canada

MONASTASHE/6, Monastashe River, Canada

D3/18, Calgary, Canada

VANC/87/UBC/27/major, Aldergrove, Canada

WHANGAREI8/79, Whangarei, New Zealand

VANC/90/UBC/52, Creston, Canada

A1

Panglobal, zoonotic

[Tree legend: outbreaks Creston, Revelstoke, Barriere Δ, Kitimat; sources Surface Water, Humans, Veterinary. Scale bar 0.09; branch support values shown range from 0.793 to 1.]

VANC/90/UBC/55/minor, Goat River beaver lodge, Canada

VANC/92/UBC/107, Vancouver, Canada

VANC/90/UBC/57, Bella Coola, Canada

VANC/86/UBC/3, Ashcroft, Canada (Mexico)

VANC/87/UBC/22, North Vancouver, Canada

VANC/85/UBC/2, Smithers, Canada

VANC/93/UBC/39, Campbell River, Canada (Kenya/Sudan)

VANC/87/UBC/23, Prince George, Canada

VANC/90/UBC/42, Creston, Canada

A2

ATCC50803, Bethesda, USA (Afghanistan)

ATCC30888/13, Portland, USA

VANC/90/UBC/62, Barriere, Canada Δ

ATCC50163/89, Philadelphia, USA

VANC/85/UBC/7, Quesnel, Canada

Source: Clement Tsui

Page 51: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

0.06

VANC/96/UBC/126/minor, Revelstoke, Canada

VANC/91/UBC/68/2, Terrace, Canada

VANC/94/UBC/122, Mission Creek, Canada

VANC/92/UBC/102, Mission Creek, Canada

VANC/87/UBC/25, Kelowna, Canada

VANC/91/UBC/74, Mission Creek, Canada

VANC/90/UBC/63, Barriere, Canada

VANC/87/UBC/26, Slocan River, Canada

VANC/90/UBC/54, Goat River beaver lodge, Canada

VANC/92/UBC/103, Mission Creek, Canada

VANC/94/UBC/125, Mission Creek, Canada

VANC/90/UBC/47, Kitimat, Canada

VANC/94/UBC/124, Mission Creek, Canada

VANC/89/UBC/48, Kitimat, Canada

VANC/90/UBC/41, Creston, Canada

VANC/90/UBC/45, Creston, Canada

VANC/90/UBC/44, Creston, Canada

VANC/91/UBC/85, Mission Creek, Canada

VANC/91/UBC/72, Thompson River, Kamloops, Canada

VANC/90/UBC/49, Creston, Canada

VANC/89/UBC/59, Nanaimo, Canada

VANC/91/UBC/67, Terrace, Canada

VANC/91/UBC/68/1, Terrace, Canada

VANC/93/UBC/105, Mission Creek, Canada

VANC/87/UBC/27/minor, Aldergrove, BC

VANC/90/UBC/55/major, Goat River beaver lodge, Canada

VANC/90/UBC/46, Creston, Canada

VANC/92/UBC/84, Mission Creek, Canada

VANC/90/UBC/53, Goat River beaver lodge, Canada

VANC/92/UBC/99, Mission Creek, Canada

VANC/91/UBC/65, Barriere, Canada

VANC/90/UBC/56, Goat River beaver lodge, Canada

VANC/87/UBC/8, North Vancouver, Canada

VANC/92/UBC/98, Mission Creek, Canada

VANC/96/UBC/127, Revelstoke, Canada

VANC/90/UBC/60, Creston, Canada

VANC/90/UBC/51, Kitimat, Canada

VANC/90/UBC/61, Barriere, Canada

VANC/93/UBC/106/minor, Mission Creek, Canada

VANC/96/UBC/129, Revelstoke, Canada

VANC/90/UBC/58, Mission Creek, Canada

VANC/91/UBC/69, Muskwa River, Dawson Creek, Canada

VANC/85/UBC/9, Terrace, Canada

VANC/90/UBC/40, Creston, Canada

VANC/90/UBC/50/2, Creston, Canada

VANC/96/UBC/128, Revelstoke, Canada

[Tree legend: Creston Outbreak, Revelstoke Outbreak, Barriere Outbreak, Kitimat Outbreak, Kelowna/Mission Creek; sources Surface Water, Humans, Veterinary. Branch support values shown range from 0.818 to 1.]

Source: Clement Tsui

Page 52: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Microbial genomics has been a valuable research tool

• Helps us to:
  • understand microbial evolution
  • understand pathogenesis
  • create novel industrial processes
  • create new laboratory tests
• Uses historical isolates: not real time
• Uses laboratory strains: no associated rich clinical and epidemiological metadata

Page 53: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Cultural and Practical Differences

Genomics Research Laboratory | Genomics Diagnostic Laboratory
Curiosity driven | Production / case driven
Exploratory analysis tolerated | Exploratory analysis discouraged
Reproducibility = other labs’ problem | Reproducibility critical
Tweaking protocols desirable | Stability in protocols desirable
Protocols don’t need to be validated | Protocols need to be validated
Novelty justifies the high cost of experiment | Conscious of cost per unit test; tests need to be scalable

By working together, we can bridge the cultural differences

Page 54: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health
Page 55: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health

Acknowledgements

IRIDA Project Principal Investigators: Fiona Brinkman – SFU; Will Hsiao – PHMRL; Gary Van Domselaar – NML; Rob Beiko – Dalhousie U.; João Carriço – U. of Lisboa; Morag Graham – NML; Eduardo Taboada – NML; Lynn Schriml – U. of Maryland

National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall

Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo

BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella

Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert

Dalhousie University: Alex Keddy

McMaster University: Andrew McArthur, Daim Sardar

European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid

European Food Safety Agency: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina

Sidra Medical Center: Patrick Tang

Salmonella Project: Kim Macdonald, Matthew Croxen, Linda Hoang, Ana Paccagnella, Mark McCabe, Diane Eisler, Brian Auk, Natalie Prystajecky, Marsha Taylor, Eleni Galanis

Giardia Project: Clement Tsui, Ruth Miller, Anamaria Crisan, Damion Dooley, Kirby Cronin, Sara Tan, Justin Dirk, Mark McCabe, Sunny Mak, Brian Auk, Anna Li, C.P. Fung, Lorraine McIntyre, Renata Zanchettin, Natalie Prystajecky, Judy Isaac-Renton

Page 56: IRIDA: A Federated Bioinformatics Platform Enabling Richer Genomic Epidemiology Analysis in Public Health


IRIDA Annual General Meeting

Winnipeg, April 8-9, 2015