Overview of the Pathway Tools Software and Pathway/Genome Databases

Upload: haden-byrom

Post on 28-Mar-2015


Page 1: Overview of the Pathway Tools Software and Pathway/Genome Databases

Page 2: Introductions

SRI International Bioinformatics

BRG Staff: Peter Karp, Tomer Altman, Joe Dale, Fred Gilham, John Myers, Suzanne Paley, Markus Krummenacker, Ingrid Keseler, Ron Caspi, Alex Shearer, Carol Fulcher

Attendees
– Where from, what genome?
– What do you hope to get out of the tutorial?

Page 3: SRI International

Private nonprofit research institute

No permanent funding sources

1300 staff in Menlo Park

– Founded in 1946 as Stanford Research Institute

– Separated from Stanford University in 1970

– Name changed to SRI International in 1977

Page 4: SRI Organization

Information and Computing Sciences

Engineering Systems and Sciences

Physical Sciences

Biopharmaceuticals and Pharmaceutical Discovery

Education and Policy

Bioinformatics Research Group

Page 5: Research in the SRI Bioinformatics Research Group

BioCyc Database Collection
– EcoCyc
– MetaCyc

Pathway Tools

BioWarehouse

Page 6: Outline for Tutorial

Monday
– Introduction
– Pathway/Genome Navigator
– Introduction to Pathway/Genome Editors

Tuesday
– PathoLogic tutorial
– PathoLogic lab session – Build initial version of PGDB
– Pathway hole filler lecture+lab

Wednesday
– PathoLogic: Creating protein complexes, operon predictor, transport inference parser
– Pathway Tools Schema
– Model organism database projects

Thursday
– Advanced Pathway/Genome Editors

Friday
– Overviews and Omics Viewers
– Comparative analysis
– Structured Advanced Query Form
– Metabolite Tracing
– Regulation

Page 7: Tutorial Goals

General familiarity with Pathway Tools goals and functionality

Ability to create, edit, and navigate a new PGDB

Create new PGDB for genome(s) you brought with you

Familiarity with information resources available about Pathway Tools to continue your work

Page 8: SRI’s Support for Pathway Tools

NIH grant finances software development and user support

Additional grants finance other software development

Email us bug reports, suggestions, questions

Comprehensive bug reports are required for us to fix the problem you reported

Keep us posted regarding your progress

Page 9: Administrative Details

Please wear badge at all times

Escort required outside this room/hallway

Let us know when you are leaving

Use E-Bldg Entrance
– Phone numbers to call from entrance

Meals

Restrooms

Page 10: Tutorial Format

Questions welcome during presentations

Lab sessions will take different amounts of time for different people
– Refine your PGDB
– Read Pathway Tools manuals

Computer logins

Internet connectivity

Page 11: Pathway/Genome Database

[Diagram: object types in a Pathway/Genome Database, drawn around a CELL]
– Chromosomes, Plasmids
– Genes
– Proteins, RNAs
– Reactions
– Pathways
– Compounds
– Operons, Promoters
– DNA Binding Sites, Regulatory Interactions
– Sequence Features

Page 12: BioCyc Collection of Pathway/Genome Databases

Pathway/Genome Database (PGDB) – combines information about
– Pathways, reactions, substrates
– Enzymes, transporters
– Genes, replicons
– Transcription factors/sites, promoters, operons

Tier 1: Literature-Derived PGDBs
– MetaCyc
– EcoCyc -- Escherichia coli K-12

Tier 2: Computationally-derived DBs, Some Curation -- 20 PGDBs
– HumanCyc
– Mycobacterium tuberculosis

Tier 3: Computationally-derived DBs, No Curation -- 349 DBs

Page 13: Terminology – Pathway Tools Software

PathoLogic
– Predicts operons, metabolic network, pathway hole fillers, from genome
– Computational creation of new Pathway/Genome Databases

Pathway/Genome Editors
– Distributed curation of PGDBs
– Distributed object database system, interactive editing tools

Pathway/Genome Navigator
– WWW publishing of PGDBs
– Querying, visualization of pathways, chromosomes, operons
– Analysis operations
  – Pathway visualization of gene-expression data
  – Global comparisons of metabolic networks

Bioinformatics 18:S225 2002

Page 14: Pathway Tools Software: PGDBs Created Outside SRI

1000+ licensees: 75+ groups applying software to 150+ organisms

Saccharomyces cerevisiae, SGD project, Stanford University (pathway.yeastgenome.org/biocyc/)

Mouse, MGD, Jackson Laboratory

dictyBase, Northwestern University

Under development:
– CGD (Candida albicans), Stanford University
– Drosophila, P. Ebert in collaboration with FlyBase
– C. elegans, P. Ebert in collaboration with WormBase

Planned:
– RGD (Rat), Medical College of Wisconsin

Arabidopsis thaliana, TAIR, Carnegie Institution of Washington

Tomato and Potato, Cornell University

GrameneDB, Cold Spring Harbor Laboratory

Medicago truncatula, Samuel Roberts Noble Foundation

Page 15: Pathway Tools Software: PGDBs Created Outside SRI (continued)

NIAID BRCs: BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB (Cryptosporidium)

F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa

V. Schachter, Genoscope, Acinetobacter

M. Bibb, John Innes Centre, Streptomyces coelicolor

G. Church, Harvard, Prochlorococcus marinus, multiple strains

E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis

R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579

Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major

Herbert Chiang, Washington University, Bacteroides thetaiotaomicron

Sergio Encarnacion, UNAM, Sinorhizobium meliloti

Gregory Fournier, MIT, Mesoplasma florum

Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis

Michael Gottfert, Technische Universität Dresden, Bradyrhizobium japonicum

Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472

Kenneth J. Kauffman, University of California, Riverside, Desulfovibrio vulgaris

Page 16: Pathway Tools Software: PGDBs Created Outside SRI (continued)

Mike McLeod, University of British Columbia, Rhodococcus sp. RHA1

Robert S. Munson, Children's Research Institute, Ohio, Haemophilus ducreyi, Haemophilus influenzae 86-026NP

John Nash, Canadian NRC, Campylobacter jejuni

Christopher S. Reigstad, Washington University, Escherichia coli UTI89

Haluk Resat, Pacific Northwest Lab, Rhodobacter sphaeroides

Gary Xie, Los Alamos Lab, Bacillus cereus

Large scale users:
– C. Medigue, Genoscope, 107 PGDBs
– G. Burger, U Montreal, 48 PGDBs
– Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes

Partial listing of outside PGDBs at BioCyc.org

Page 17: Terminology

“Database” = “DB” = “Knowledge Base” = “KB” = “Pathway/Genome Database” = “PGDB”

Page 18: Why Create PGDBs?

Extract more information from your genome

Create an up-to-date computable information repository about an organism

Perform analyses on the genome and pathway complement of the organism

– Analyses of omics data
– Analyses of cellular systems (dead-end metabolites)
– Reports generated by Pathway Tools
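
The dead-end metabolite analysis named above can be pictured with a small sketch. It assumes a simplified definition (a metabolite that the network only produces or only consumes) and treats every reaction as irreversible; the function name and the toy network are hypothetical and are not the Pathway Tools implementation.

```python
def dead_end_metabolites(reactions):
    """Return metabolites that are only produced or only consumed.

    `reactions` maps a reaction name to a (substrates, products) pair.
    Assumption: every reaction is irreversible and transport is ignored.
    """
    produced = set()
    consumed = set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    # A metabolite in exactly one of the two sets has no way in or no way out.
    return sorted(produced ^ consumed)


# Hypothetical toy network: A -> B -> C -> A is a cycle, D is only produced.
toy_network = {
    "r1": (["A"], ["B"]),
    "r2": (["B"], ["C"]),
    "r3": (["C"], ["A"]),
    "r4": (["B"], ["D"]),
}
print(dead_end_metabolites(toy_network))
```

A real analysis would also need to account for reversible reactions and for transporters that move a metabolite across the cell boundary.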

Perform comparative analyses with other organisms

Generate a genome poster and metabolic wall chart

Page 19: Sequence Project Workflow

[Workflow diagram; stages listed in slide order]
– Raw Sequence
– Phred
– Phrap
– BLAST, BLOCKS
– GeneMark/Glimmer
– PathoLogic
– P/G Navigator
– P/G Editors
– WWW Publishing, Analyses

Diagram label: Pathway Tools

Page 20: EcoCyc Project – EcoCyc.org

E. coli Encyclopedia
– Review-level Model-Organism Database for E. coli
– Tracks evolving annotation of the E. coli genome and cellular networks
– The two paradigms of EcoCyc

“Multi-dimensional annotation of the E. coli K-12 genome”
– Positions of genes; functions of gene products – 76% / 66% exp
– Gene Ontology terms; MultiFun terms
– Gene product summaries and literature citations
– Evidence codes
– Multimeric complexes
– Metabolic pathways
– Regulation of transcription initiation

Nuc. Acids Res. 35:7577 2007; ASM News 70:25 2004; Science 293:2040

Karp, Gunsalus, Collado-Vides, Paulsen

Page 21: Paradigm 1: EcoCyc as Textual Review Article

All gene products for which experimental literature exists are curated with a minireview summary
– Found on protein and RNA pages, not gene pages!
– 3257 gene products contain summaries

Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more

Additional summaries found in pages for operons, pathways

EcoCyc cites 15,880 publications

Page 22: Paradigm 2: EcoCyc as Computational Symbolic Theory

Highly structured, high-fidelity knowledge representation provides computable information

Each molecular species defined as a DB object
– Genes, proteins, small molecules

Each molecular interaction defined as a DB object
– Metabolic reactions
– Transport reactions
– Transcriptional regulation of gene expression

220 database fields capture extensive properties and relationships
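
One way to picture "each species and each interaction is a DB object" is the sketch below. The class and field names are illustrative assumptions only, not the actual Pathway Tools schema or its 220 fields; the lacZ example is used because the gene, its enzyme, and the reaction are textbook facts.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    name: str
    product: str  # name of the protein this gene encodes


@dataclass
class Protein:
    name: str
    catalyzes: list = field(default_factory=list)  # names of reactions


@dataclass
class Reaction:
    name: str
    substrates: list
    products: list


def genes_for_reaction(reaction_name, genes, proteins):
    """Follow the reaction -> protein -> gene links and return gene names."""
    enzymes = {p.name for p in proteins if reaction_name in p.catalyzes}
    return sorted(g.name for g in genes if g.product in enzymes)


# Illustrative objects (identifiers are made up for this sketch).
reactions = [Reaction("lactose-hydrolysis", ["lactose", "H2O"], ["glucose", "galactose"])]
proteins = [Protein("beta-galactosidase", catalyzes=["lactose-hydrolysis"])]
genes = [Gene("lacZ", product="beta-galactosidase")]

print(genes_for_reaction("lactose-hydrolysis", genes, proteins))
```

Because the relationships are stored as structured fields rather than free text, a question such as "which genes encode enzymes for this reaction?" becomes a simple traversal.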

Page 23: EcoCyc Procedures

DB updates performed by 5 staff curators
– Information gathered from biomedical literature
  – Enter data into structured database fields
  – Author extensive summaries
  – Update evidence codes
– Corrections submitted by E. coli researchers

Four releases per year

Quality assurance of data and software
– Evaluate database consistency constraints
– Perform element balancing of reactions
– Run other checking programs
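
The element-balancing check can be sketched as follows. This is a minimal illustration, assuming simple formulas without parentheses or charges and one molecule per list entry; the helper names are hypothetical and this is not the checker that EcoCyc runs.

```python
import re
from collections import Counter


def count_elements(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(substrates, products):
    """Return True if both sides of the reaction contain the same atoms."""
    left = Counter()
    for formula in substrates:
        left.update(count_elements(formula))
    right = Counter()
    for formula in products:
        right.update(count_elements(formula))
    return left == right


# Lactose + water -> glucose + galactose (both hexoses are C6H12O6).
print(is_balanced(["C12H22O11", "H2O"], ["C6H12O6", "C6H12O6"]))
# The same reaction with the water omitted is flagged as unbalanced.
print(is_balanced(["C12H22O11"], ["C6H12O6", "C6H12O6"]))
```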

Page 24: MetaCyc: Metabolic Encyclopedia

Describe a representative sample of every experimentally determined metabolic pathway

Describe properties of metabolic enzymes

Literature-based DB with extensive references and commentary

Pathways, reactions, enzymes, substrates

Jointly developed by
– P. Karp, R. Caspi, C. Fulcher, SRI International
– L. Mueller, A. Pujar, Cornell Univ
– S. Rhee, P. Zhang, Carnegie Institution

Nucleic Acids Research 2008

Page 25: MetaCyc Data -- Version 11.6

Pathways 1,010

Reactions 6,576

Enzymes 4,582

Small Molecules 6,561

Organisms 1,077

Citations 15,875

Page 26: Taxonomic Distribution of MetaCyc Pathways

Bacteria 517

Green Plants 372

Mammals 90

Fungi 89

Archaea 65

Page 27: Family of Pathway/Genome Databases

MetaCyc

EcoCyc, CauloCyc, AraCyc, MtbRvCyc, HumanCyc

Page 28: Comparison of BioCyc to KEGG: The Data

KEGG approach: Static collection of pathway diagrams that are color-coded to produce organism-specific views

KEGG vs MetaCyc: Resource on literature-derived pathways
– KEGG pathway maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms
– KEGG pathway maps encompass multiple biological pathways; they are 2-4 times the size of MetaCyc pathways
– KEGG has no literature citations, no summaries, less enzyme detail

KEGG vs BioCyc organism-specific PGDBs
– KEGG re-annotates entire genome for each organism
– KEGG does not curate or customize pathway networks for each organism

Page 29: Comparison of Pathway Tools to KEGG: The Software

KEGG has no pathway hole filler or transport inference parser or operon predictor

KEGG has no interactive editing tools – you cannot refine a KEGG pathway DB

KEGG has no algorithmic visualization tools – pathway diagrams are pre-drawn
– May become out of date
– Cannot show pathways at multiple detail levels

KEGG genome browser has very limited functionality

KEGG has one overview diagram with limited functionality

KEGG has no metabolite tracing tool

KEGG has no Structured Advanced Query Tool

Page 30: Overviews and Omics Viewers

Genome-scale Visualizations
– Metabolic map
– Transcriptional regulatory network
– Genome map

Overlay gene expression, proteomics, metabolomics data
– Obtain pathway based visualizations of omics data
– Numerical spectrum of expression values mapped to a color spectrum
– Steps of overview painted with color corresponding to expression level(s) of genes that encode enzyme(s) for that step
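
The mapping from an expression value to a color can be sketched as a linear interpolation. The blue-to-red scale, the clamping, and all names below are assumptions for illustration; the slides do not specify the color scale the Omics Viewer actually uses.

```python
def expression_to_color(value, low, high):
    """Map an expression value linearly onto a blue (low) to red (high) RGB color.

    Values outside [low, high] are clamped to the ends of the scale.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    fraction = (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))
    red = round(255 * fraction)
    blue = round(255 * (1 - fraction))
    return (red, 0, blue)


# Hypothetical data: each overview step lists the genes encoding its enzymes.
step_genes = {"step-1": ["geneA"], "step-2": ["geneB", "geneC"]}
expression = {"geneA": -2.0, "geneB": 2.0, "geneC": 5.0}

# One color per gene, so a step with several genes carries several colors.
step_colors = {
    step: [expression_to_color(expression[gene], -2.0, 2.0) for gene in gene_names]
    for step, gene_names in step_genes.items()
}
print(step_colors)
```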

Page 31: Environment for Computational Exploration of Genomes

Powerful ontology opens many facets of the biology to computational exploration
– Global characterization of metabolic network
– Analysis of interface between transport and metabolism
– Nutrient analysis of metabolic network

Page 32: Pathway Tools Implementation Details

Allegro Common Lisp

Sun, Linux, Windows, Macintosh platforms

Ocelot object database

370,000+ lines of code

Lisp-based WWW server at BioCyc.org
– Manages 370+ PGDBs

Page 33: The Common Lisp Programming Environment

Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)

Page 34: Survey

Please complete survey at end of each day

Page 35: PGDB(s) That You Build

Before you leave
– Tar up your PGDB directory and FTP it home, email it home, or copy it to flash disk
– We will create a backup copy of your PGDB directory if the directory is still there at the end of the tutorial
  – Delete the PGDB directory if you don’t want us to back it up
  – We will not give the backed up data to anyone else
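
Tarring up a PGDB directory can be done with the `tar` command or, as in this minimal sketch, with Python's standard `tarfile` module. The function name and the example paths are hypothetical; the location of your PGDB directory depends on your installation.

```python
import tarfile
from pathlib import Path


def archive_pgdb(pgdb_dir, archive_path):
    """Write a gzip-compressed tar archive of a PGDB directory.

    The archive stores paths relative to the directory's own name, so it
    unpacks into a single folder on the destination machine.
    """
    pgdb_dir = Path(pgdb_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(pgdb_dir, arcname=pgdb_dir.name)
    return Path(archive_path)


# Example call with hypothetical paths:
# archive_pgdb("/path/to/your/pgdb-directory", "my-pgdb.tar.gz")
```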

Page 36: Information Sources

Pathway Tools User’s Guide
– /root/aic-export/pathway-tools/ptools/11.5/doc/manuals/userguide.pdf
– NOTE: Location of the aic-export directory can vary across different computers

Pathway Tools Web Site
– Publications, FAQ, programming examples, etc.
– http://bioinformatics.ai.sri.com/ptools/

BioCyc Publications Page
– http://biocyc.org/publications.shtml

MetaCyc Guide
– http://metacyc.org/MetaCycUserGuide.shtml

Slides from this tutorial
– http://bioinformatics.ai.sri.com/ptools/tutorial/

BioCyc Webinars
– http://biocyc.org/webinar.shtml

Page 37: Reporting Pathway Tools Problems

[email protected]

Tell us:
– What platform you are running on
– What version of Pathway Tools you are running
– The error message
– Result of [1] EC(2) :zoom :count :all
– What operation were you performing when the error occurred?

New patches are automatically downloaded and loaded when PTools starts up

Auto-Patch Tools -> Instant Patch -> Download and Activate All Patches

Page 38: Summary

Pathway Tools and Pathway/Genome Databases
– Not just for pathways!
– Computational inferences
  – Operons, metabolic pathways, pathway hole fillers
– Editing tools
– Analysis tools: Omics data on pathways
– Web publishing of PGDBs

Main classes of users:
– Develop PGDB to extract more information from genome for genome paper
– Develop a model-organism DB for the organism that is updated regularly and published on the web