
Page 1: Steven L. Salzberg The Institute for Genomic Research and Johns Hopkins University

Data Management in a High-Throughput, Science-based Genome Center

NIGMS Protein Structure Initiative Workshop on Data Management

Steven L. Salzberg

The Institute for Genomic Research and Johns Hopkins University

Page 2

• How can you run 50 projects in parallel and:
– Maintain production
– Generate consistent, high-quality data
– Share data and software with the scientific community
– Publish research of the highest quality
– Adapt quickly to new technologies

Page 3

Genomes completed and published by TIGR and our collaborators, 1995-present

Organism: Reference
Arabidopsis thaliana: Lin et al., Nature 402: 761-8 (2000)
Archaeoglobus fulgidus: Klenk et al., Nature 390: 364-370 (1997)
Bacillus anthracis Ames: Read et al., Nature 423: 81-86 (2003)
Bacillus anthracis Florida: Read et al., Science 296: 2028-33 (2002)
Borrelia burgdorferi: Fraser et al., Nature 390: 580-586 (1997)
Brucella suis: Paulsen et al., PNAS 99 (2002)
Caulobacter crescentus: Nierman et al., PNAS 98 (2001)
Chlamydia pneumoniae: Read et al., Nucl. Acids Res. 28 (2000)
Chlamydia muridarum: Read et al., Nucl. Acids Res. 28 (2000)
Chlamydophila caviae: Read et al., Nucl. Acids Res. 31 (2003)
Chlorobium tepidum: Eisen et al., PNAS 99: 9509-9514 (2002)
Coxiella burnetii RSA 493: Seshadri et al., PNAS 100: 5455-60 (2003)
Deinococcus radiodurans: White et al., Science 286 (1999)
Enterococcus faecalis: Paulsen et al., Science 299: 2071-2074 (2003)
Haemophilus influenzae: Fleischmann et al., Science 269 (1995)
Helicobacter pylori: Tomb et al., Nature 388: 539-547 (1997)
Methanococcus jannaschii: Bult et al., Science 273: 1058-1073 (1996)
Mycobacterium tuberculosis: Fleischmann et al., J. Bact. 184 (2002)
Mycoplasma genitalium: Fraser et al., Science 270: 397-403 (1995)
Neisseria meningitidis: Tettelin et al., Science 287 (2000)
Oryza sativa (rice) chr 10: Wing et al., Science 300: 1566-1569 (2003)
Plasmodium falciparum: Gardner et al., Nature 419: 531-534 (2002)
Plasmodium yoelii: Carlton et al., Nature 419: 512-519 (2002)
Porphyromonas gingivalis: Nelson et al., J. Bact., in revision
Pseudomonas putida: Nelson et al., Envir. Microbiol. (2002)
Shewanella oneidensis: Heidelberg et al., Nat. Biotech. 20 (2002)
Streptococcus agalactiae: Tettelin et al., PNAS 99 (2002)
Streptococcus pneumoniae: Tettelin et al., Science 293 (2001)
Sulfolobus islandicus virus: Arnold et al., Virology 15: 252-66 (2000)
Thermotoga maritima: Nelson et al., Nature 399: 323-329 (1999)
Treponema pallidum: Fraser et al., Science 281: 375-388 (1998)
Vibrio cholerae: Heidelberg et al., Nature 406 (2000)

Page 4

Genomes in progress or recently completed

Fibrobacter succinogenes, Prevotella intermedia, Pseudomonas fluorescens, Silicibacter pomeroyi DSS-3, Streptococcus agalactiae A909, Streptococcus gordonii, Streptococcus mitis, Streptococcus pneumoniae 670, Acidobacterium capsulatum, Bacillus anthracis A01055, Bacillus anthracis A0402, Bacillus anthracis Ames 0581, Burkholderia thailandensis, Campylobacter coli RM2228, Campylobacter upsaliensis RM3195, Clostridium perfringens SM101, Epulopiscium fishelsoni, Hyphomonas neptunium, Listeria monocytogenes F6854, Listeria monocytogenes H7858, Mycoplasma arthritidis, Mycoplasma capricolum, Myxococcus xanthus, Prevotella ruminicola, Pyrococcus furiosus, Verrucomicrobium spinosum, Actinomyces naeslundii

Bacillus anthracis A0071, Bacillus anthracis Kruger B, Erwinia chrysanthemi, Gemmata obscuriglobus, Mycobacterium tuberculosis, Ruminococcus albus, Streptococcus sobrinus, Aspergillus fumigatus, Brugia malayi, Coccidioides immitis, Cryptococcus neoformans, Entamoeba histolytica, Oryza sativa chromosomes 3 & 10, Plasmodium vivax, Schistosoma mansoni, Solanum spp., Tetrahymena thermophila, Toxoplasma gondii, Theileria parva, Trichomonas vaginalis, Trypanosoma brucei, Trypanosoma cruzi

Acidithiobacillus ferrooxidans, Bacillus anthracis Kruger B, Burkholderia mallei, Clostridium perfringens ATCC13124, Dehalococcoides ethenogenes, Desulfovibrio vulgaris, Ehrlichia chaffeensis, Ehrlichia sennetsu, Geobacter sulfurreducens, Listeria monocytogenes, Methylococcus capsulatus, Mycobacterium avium 104, Mycobacterium smegmatis, Pseudomonas syringae, Staphylococcus aureus, Staphylococcus epidermidis, Treponema denticola, Wolbachia sp., Anaplasma phagocytophila, Bacillus cereus 10987, Bacteroides forsythus, Brucella ovis, Baumannia cicadellinicola, Campylobacter jejuni, Carboxydothermus hydrogenoformans, Colwellia sp. 34H, Dichelobacter nodosus

Page 5

A Whole-Genome Shotgun Sequencing Project

Shotgun sequencing -> Genome assembly -> Annotation -> Data release -> Downstream research

• Shotgun sequencing (LIMS entry point): library construction, colony picking, template preparation, sequencing reactions, base calling, sequence files
• Genome assembly: assembler -> genome scaffold -> ordered contig set -> gap closure and sequence editing (combinatorial PCR, POMP) -> re-assembly -> ONE ASSEMBLY! (per molecule)
• Annotation: gene finding, homology searches, function assignments, metabolic pathways, gene families, comparative genomics, transcriptional/translational regulatory elements, repetitive sequences
• Data release: publication, www.tigr.org
• Downstream research: microarray studies, vaccine and drug development, human disease studies

Page 6

Sequence Data Management

• Professional software engineers
– Continual contact with lab staff

• Separate research staff
– Computational research, separate from the production "pipeline": genome assembly, gene finding, sequence alignment

• Biology/genomics research staff

Page 7

Joint Technology Center

• TIGR doubled its sequencing capacity in a 2-month period, December 2002 to January 2003

• We moved our entire facility to a new building and tripled its capacity in June-July 2003

• All databases, network connections, LIMS software continued operating smoothly throughout

Page 8

Sequence LIMS Processes at TIGR

Colony plate -> Culture plate -> DNA plate -> Reaction plate -> DNA sequencer (ABI 3730xl) -> Chromatogram files
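The plate flow above means every chromatogram file has to be traceable back through the DNA, culture, and colony plates it came from. A minimal sketch of that lineage tracking, assuming a hypothetical in-memory model (the `Plate` and `Chromatogram` classes, barcodes, and file name are invented for illustration) in which each plate records its parent and wells map one-to-one between plates; a real LIMS would also have to handle re-arraying and failed wells.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Plate:
    """One physical plate; `parent` is the plate it was derived from (hypothetical model)."""
    barcode: str
    kind: str                        # "colony", "culture", "DNA" or "reaction"
    parent: Optional[Plate] = None


@dataclass
class Chromatogram:
    """A trace file produced by the sequencer from one well of a reaction plate."""
    filename: str
    reaction_plate: Plate
    well: str                        # e.g. "A01"


def lineage(plate: Plate) -> list[str]:
    """Walk parent links back to the originating colony plate, oldest first."""
    chain = []
    current: Optional[Plate] = plate
    while current is not None:
        chain.append(f"{current.kind}:{current.barcode}")
        current = current.parent
    return list(reversed(chain))


def trace_origin(chrom: Chromatogram) -> str:
    """Assumes wells map one-to-one between plates (no re-arraying)."""
    return f"{lineage(chrom.reaction_plate)[0]}/{chrom.well}"


colony = Plate("C0001", "colony")
culture = Plate("U0001", "culture", colony)
dna = Plate("D0001", "DNA", culture)
rxn = Plate("R0001", "reaction", dna)
trace = Chromatogram("R0001_A01.ab1", rxn, "A01")

print(lineage(rxn))         # ['colony:C0001', 'culture:U0001', 'DNA:D0001', 'reaction:R0001']
print(trace_origin(trace))  # colony:C0001/A01
```

Keeping only a parent link per plate makes each step of the flow a single insert, and any trace can be walked back without a separate history table.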

Page 9

LIMS-Database Interactions at TIGR (circa 2001)

[Diagram: LIMS applications (Uploader, Tracker, Create Gel Sheet, GelSheetMaker, Map, Ricky, Tracker Create/Edit Rxn Sheet) connected to the database tables they use (library, template, sample, reaction, gel); the full project database also holds sequence, feature, and bases tables]

One database per sequencing project....
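The table names in the diagram suggest a chain of one-to-many relationships: a library yields many templates, each template many sequencing reactions, each reaction a read. A sketch of part of that chain in SQLite, assuming hypothetical column names and modeling "one database per sequencing project" as one database file per project (in memory here); the sample, gel, feature, and bases tables are left out.

```python
import sqlite3

# Hypothetical subset of a per-project LIMS schema (column names are assumptions).
SCHEMA = """
CREATE TABLE library  (library_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE template (template_id INTEGER PRIMARY KEY,
                       library_id  INTEGER NOT NULL REFERENCES library(library_id),
                       name        TEXT NOT NULL);
CREATE TABLE reaction (reaction_id INTEGER PRIMARY KEY,
                       template_id INTEGER NOT NULL REFERENCES template(template_id),
                       primer      TEXT NOT NULL);
CREATE TABLE sequence (sequence_id INTEGER PRIMARY KEY,
                       reaction_id INTEGER NOT NULL REFERENCES reaction(reaction_id),
                       bases       TEXT NOT NULL);
"""


def open_project_db(path: str = ":memory:") -> sqlite3.Connection:
    """One database per sequencing project: each project gets its own file."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript(SCHEMA)
    return conn


conn = open_project_db()
conn.execute("INSERT INTO library VALUES (1, 'small-insert')")
conn.execute("INSERT INTO template VALUES (1, 1, 'T0001')")
conn.executemany("INSERT INTO reaction VALUES (?, 1, ?)", [(1, "M13F"), (2, "M13R")])
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", [(1, 1, "ACGT"), (2, 2, "TTGA")])

# Reads per library, found by joining down the library -> template -> reaction -> sequence chain.
reads_per_library = conn.execute("""
    SELECT l.name, COUNT(s.sequence_id)
    FROM library l
    JOIN template t ON t.library_id  = l.library_id
    JOIN reaction r ON r.template_id = t.template_id
    JOIN sequence s ON s.reaction_id = r.reaction_id
    GROUP BY l.name
""").fetchall()
print(reads_per_library)  # [('small-insert', 2)]
```

Splitting databases by project keeps each one small and independently movable, at the cost of cross-project queries needing to open several databases.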

Page 10

Finishing Center – Sequencing Center Data Interchange (mid-2003)

Sequencing Center (SC) to Finishing Center (FC): DNA and data
• Reads: bases, quality, chromatograms + positions, revision, trimming info, insert mapping/pairing, chemistry, read end, etc.
• Library info (size estimators)
• Vectors used
• QC: yield, randomness, percent good quality, percent contaminant

Finishing Center (FC) to Sequencing Center (SC): reaction lists on existing clones (insert ID, primer)
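Several of the QC items listed above reduce to simple counts over a batch of reads. A sketch assuming definitions the slide does not give: yield as the total number of bases with Phred quality of at least 20, a "good quality" read as one with at least 100 such bases, and contamination as a per-read flag set by some earlier screen (the `contaminant` field is hypothetical). Randomness of coverage is not attempted here.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    bases: str
    quality: list[int]          # one Phred score per base
    contaminant: bool = False   # hypothetical flag set by an upstream contamination screen

    def __post_init__(self) -> None:
        if len(self.bases) != len(self.quality):
            raise ValueError(f"{self.name}: bases and quality lengths differ")


def high_quality_bases(read: Read, min_q: int = 20) -> int:
    """Count bases at or above the quality threshold."""
    return sum(1 for q in read.quality if q >= min_q)


def qc_summary(reads: list[Read], min_q: int = 20, min_good_bases: int = 100) -> dict[str, float]:
    """Batch QC metrics; the definitions and thresholds are illustrative assumptions."""
    n = len(reads)
    if n == 0:
        return {"yield_hq_bases": 0, "pct_good_quality": 0.0, "pct_contaminant": 0.0}
    hq = [high_quality_bases(r, min_q) for r in reads]
    good = sum(1 for count in hq if count >= min_good_bases)
    contaminated = sum(1 for r in reads if r.contaminant)
    return {
        "yield_hq_bases": sum(hq),
        "pct_good_quality": 100.0 * good / n,
        "pct_contaminant": 100.0 * contaminated / n,
    }


reads = [
    Read("r1", "A" * 150, [30] * 150),
    Read("r2", "C" * 150, [10] * 150),
    Read("r3", "G" * 150, [30] * 150, contaminant=True),
    Read("r4", "T" * 50, [40] * 50),
]
print(qc_summary(reads))
# {'yield_hq_bases': 350, 'pct_good_quality': 50.0, 'pct_contaminant': 25.0}
```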

Page 11

IT Support

• High-quality computers and systems support is absolutely critical

• At the same time, IT support should (ideally) be invisible

• TIGR has 15 full-time, professional IT staff:
– Systems administrators
– Database administrators
– Web administrators
– Network administrators
– Desktop support

Page 12

IT Infrastructure

• 10 Compaq Alpha ES40s, max 32 GB RAM
– High-end computing

• 15 UltraSPARC and SunFire servers
– Database and web services

• 400 Pentium-based Linux computers
– Grid computing

• Gigabit backbone network

• Network-attached file storage
– NetApp, EMC

Page 13

IT example: Grid computing facility

[Chart: pool compute cycles vs. owner compute cycles; January 2001, 1-week snapshot]

Page 14

High-throughput, automated annotation

• 10 bioinformatics engineers maintain software pipeline

• Can completely process a bacterial genome in one day

• Manage all data uploads to GenBank

• Specialized analyses for publications

Page 15

Manual annotation: ~10 genes/day

• Eight bacterial genome annotators

• Inspection of:
– Search results
– TIGRFAM matches
– Experimentally characterized genes
– Literature references: abstracts and more

• Assignment of:
– Common name
– Role category
– Genetic name
– EC number
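The four fields an annotator assigns could be held in a simple per-gene record. A sketch assuming a hypothetical `GeneAnnotation` class with a basic sanity check on EC number shape (four dot-separated fields, with "-" allowed for unassigned trailing positions, e.g. "1.1.1.-"); the locus ID and role-category string are illustrative, and the check does not cover preliminary EC numbers.

```python
from dataclasses import dataclass
from typing import Optional


def is_valid_ec(ec: str) -> bool:
    """Check the four-field EC number shape, e.g. "1.1.1.37" or partial "1.1.1.-".

    Simplified: once a field is "-", all later fields must be "-" too.
    Preliminary EC numbers (with an "n" prefix) are not handled.
    """
    parts = ec.split(".")
    if len(parts) != 4 or not parts[0].isdecimal():
        return False
    seen_dash = False
    for part in parts[1:]:
        if part == "-":
            seen_dash = True
        elif not (part.isdecimal() and not seen_dash):
            return False
    return True


@dataclass
class GeneAnnotation:
    locus: str
    common_name: str
    role_category: str
    gene_name: Optional[str] = None     # the slide's "genetic name", e.g. a gene symbol
    ec_number: Optional[str] = None

    def __post_init__(self) -> None:
        if self.ec_number is not None and not is_valid_ec(self.ec_number):
            raise ValueError(f"{self.locus}: malformed EC number {self.ec_number!r}")


mdh = GeneAnnotation("ORF00001", "malate dehydrogenase", "Energy metabolism: TCA cycle",
                     gene_name="mdh", ec_number="1.1.1.37")
print(mdh.ec_number, is_valid_ec("1.1.1.-"), is_valid_ec("1.-.1.1"))  # 1.1.1.37 True False
```

Rejecting malformed values at the moment an annotator saves a record is cheaper than finding them later during a GenBank submission.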

Page 16

Genome Annotation Processes

• Owen White, Director of Bioinformatics
• Charles Lu, Bioinformatics Engineer II
• William C. Nelson, Bioinformatics Analyst
• Todd Creasy, Bioinformatics Engineer II
• Jaideep P. Sundaram, Bioinformatics Engineer III
• Christopher R. Hauser, Bioinformatics Engineer II
• Kelly S. Moffat, Laboratory Data Specialist
• Hean L. Koo, Laboratory Data Specialist
• Sean Daugherty, Bioinformatics Analyst I
• Lauren Brinkac, Bioinformatics Analyst I
• Robert Dodson, Bioinformatics Analyst III
• Robert DeBoy, Bioinformatics Analyst II
• Michelle Giglio, Staff Scientist
• TBA, Bioinformatics Engineer II
• TBA, Bioinformatics Engineer II
• Tanja Davidsen, Bioinformatics Engineer II
• Nikhat Zafar, Bioinformatics Engineer I
• TBA, Bioinformatics Engineer II
• Steven L. Salzberg, Senior Director, Bioinformatics
• Martin Shumway, Software Engineering Manager
• Arthur Delcher, Sr. Bioinformatics Scientist
• Corina Antonescu, Bioinformatics Engineer
• Anup Mahurkar, Software Engineering Manager
• Michael Schatz, Software Engineer
• Daniel Kosack, Software Engineer
• Samuel Angiuoli, Bioinformatics Supervisor
• Ian Paulsen, Associate Investigator
• Qinghu Ren, Post-doctoral Researcher
• Jonathan Eisen, Investigator
• Karen Nelson, Associate Investigator
• Jeremy Peterson, Bioinformatics Manager
• TBA, Bioinformatics Engineer II
• Pawel Gajer, Staff Scientist
• ...plus 10 more annotators

Functional areas shown on the chart: software development, web content, data curation, software maintenance, website database, assembly and SNPs, phylogenetic analyses, metabolism, transporters

Page 17

Manatee: a collaborative tool

• Manual Annotation Tool, Etc. Etc.
• Open source: manatee.sourceforge.net
• Based on the Chado relational schema
• Several installations
– One week to install
• Fully documented
– API
– User manual
– Installation
• Testing
– Unit and integration testing
– Deployment
• Quarterly training classes
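Manatee stores its data in the Chado schema, whose central idea is that every sequence entity (contig, gene, exon, and so on) is a row in a `feature` table and its position is a `featureloc` row on some source feature, in interbase (zero-based, end-exclusive) coordinates `fmin`/`fmax`. A heavily simplified sketch of that idea in SQLite: only a few columns are kept, a plain text `type` column stands in for Chado's controlled-vocabulary `type_id`, and the strand check is narrower than Chado's, so this is not the real schema. Feature names and sequence are made up.

```python
import sqlite3

# Simplified subset of Chado's feature/featureloc tables (not the full schema).
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type       TEXT NOT NULL,          -- real Chado uses type_id -> cvterm
    residues   TEXT
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin   INTEGER NOT NULL,           -- interbase start (0-based)
    fmax   INTEGER NOT NULL,           -- interbase end (exclusive)
    strand INTEGER NOT NULL CHECK (strand IN (-1, 1)),
    CHECK (fmin <= fmax)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO feature VALUES (1, 'contig_1', 'contig', 'CCATGAAACCCGGGTTTTAAGG')")
conn.execute("INSERT INTO feature VALUES (2, 'gene_1', 'gene', NULL)")
conn.execute("INSERT INTO featureloc VALUES (1, 2, 1, 2, 20, 1)")


def genes_in_range(conn: sqlite3.Connection, contig: str, start: int, end: int):
    """Genes overlapping the interbase interval [start, end) on a contig."""
    return conn.execute("""
        SELECT g.uniquename, fl.fmin, fl.fmax, fl.strand
        FROM featureloc fl
        JOIN feature g ON g.feature_id = fl.feature_id
        JOIN feature c ON c.feature_id = fl.srcfeature_id
        WHERE c.uniquename = ? AND g.type = 'gene'
          AND fl.fmin < ? AND fl.fmax > ?
    """, (contig, end, start)).fetchall()


def feature_residues(conn: sqlite3.Connection, name: str) -> str:
    """Interbase coordinates map directly onto a Python slice (strand ignored here)."""
    fmin, fmax, src = conn.execute("""
        SELECT fl.fmin, fl.fmax, c.residues
        FROM featureloc fl
        JOIN feature f ON f.feature_id = fl.feature_id
        JOIN feature c ON c.feature_id = fl.srcfeature_id
        WHERE f.uniquename = ?
    """, (name,)).fetchone()
    return src[fmin:fmax]


print(genes_in_range(conn, "contig_1", 5, 10))  # [('gene_1', 2, 20, 1)]
print(feature_residues(conn, "gene_1"))         # ATGAAACCCGGGTTTTAA
```

Because positions live in their own table rather than on the feature, the same gene can be located on a contig, a scaffold, and a chromosome at once, which is what makes the schema suitable for multi-genome tools.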

Page 18

Gene Information Page

[Screenshot panels: gene identification information; Gene Ontology and cellular role; graphical display of analyses; textual display of analyses]

Page 19

Pair-wise Alignment Summary

Experimentally characterized proteins indicated by color

Page 20

Summary of Genome Information

Page 21

Gene Information Page: online help system

Page 22

• Published: 33
• Completed: 18
• Closure: 20
• High-throughput sequencing: 22
• Library construction: 19
• Trend: more closely related genomes

Annotation pipeline: TigrDB {gene coords, seq/pep files, search results, families} -> Chugga Chugga -> GenBank
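The slide shows annotation leaving TigrDB and reaching GenBank through a step labeled "Chugga Chugga", without saying what that step produces. One plausible target, assumed here rather than taken from the slide, is NCBI's five-column feature table (the `.tbl` input read by tbl2asn). A minimal sketch that renders gene and CDS features in that layout; the `Gene` record, locus tags, and products are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    locus_tag: str
    start: int    # 1-based, inclusive, start <= end
    end: int      # 1-based, inclusive
    strand: int   # +1 or -1
    product: str

    def __post_init__(self) -> None:
        if self.strand not in (1, -1) or self.start > self.end:
            raise ValueError(f"{self.locus_tag}: bad coordinates or strand")


def feature_table(seq_id: str, genes: list[Gene]) -> str:
    """Render genes as a five-column, tab-separated feature table.

    Minus-strand features list the larger coordinate first, which is how the
    format encodes reverse orientation. Qualifier lines start with three tabs.
    """
    lines = [f">Feature {seq_id}"]
    for gene in sorted(genes, key=lambda x: x.start):
        a, b = (gene.start, gene.end) if gene.strand == 1 else (gene.end, gene.start)
        lines.append(f"{a}\t{b}\tgene")
        lines.append(f"\t\t\tlocus_tag\t{gene.locus_tag}")
        lines.append(f"{a}\t{b}\tCDS")
        lines.append(f"\t\t\tproduct\t{gene.product}")
    return "\n".join(lines) + "\n"


genes = [
    Gene("ABC_0002", 1200, 2100, -1, "hypothetical protein"),
    Gene("ABC_0001", 1, 900, 1, "chromosomal replication initiator protein DnaA"),
]
print(feature_table("contig_1", genes), end="")
```

A real submission would also need protein IDs, partial-end markers, and other qualifiers; the point here is only that gene coordinates and names exported from a database map directly onto this tabular format.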

Page 23

Annotation research example: position effect

Page 24

[Figure: metabolic and transport reconstruction for a bacterial genome: glycolysis, the ED pathway and PPP, the TCA cycle and glyoxylate bypass, amino-acid and other carbon-source catabolism, and transporter classes (PTS, ABC transporters, F-type and P-type ATPases, TonB-dependent receptors, BenF/PhaK/OprD, OprB-like and OprP porins), each labeled with gene locus numbers]

Page 25

TIGRFAMs

• Heavily curated multiple alignments based on protein families of the same function
• Proposed "cure" for transitive annotation
• Based on hidden Markov models (HMMs)
• Approaching 2,000 families
• Complete assignments to Gene Ontology
• Cutoff scores for each family:
– Trusted (automated name assignment)
– Noise (manual inspection required)
• Downloadable
• Fully integrated into the InterPro database
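The two per-family cutoffs turn each HMM hit into one of three outcomes. A sketch of that decision, assuming the hit's bit score has already been produced by an HMM search (for example with HMMER), and treating scores at or below the noise cutoff as noise and scores between the two cutoffs as the manual-review zone; the family name and cutoff values below are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Call(Enum):
    TRUSTED = "assign name automatically"
    REVIEW = "manual inspection required"
    NOISE = "below noise cutoff; ignore"


@dataclass(frozen=True)
class FamilyCutoffs:
    family: str
    trusted: float  # bit score at or above which a hit is trusted
    noise: float    # bit score at or below which a hit is treated as noise

    def __post_init__(self) -> None:
        if self.noise > self.trusted:
            raise ValueError(f"{self.family}: noise cutoff exceeds trusted cutoff")


def classify(score: float, cutoffs: FamilyCutoffs) -> Call:
    """Three-way decision for one hit against one family."""
    if score >= cutoffs.trusted:
        return Call.TRUSTED
    if score > cutoffs.noise:
        return Call.REVIEW
    return Call.NOISE


family = FamilyCutoffs("EXAMPLE_FAMILY", trusted=250.0, noise=150.0)  # made-up values
for score in (300.0, 200.0, 120.0):
    print(score, classify(score, family).value)
```

Storing cutoffs per family, rather than using one global threshold, is what lets a curated family be trusted for automatic naming while only its borderline hits go to an annotator.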

Page 26

TIGRFAMs: Genome coverage

[Chart: TIGRFAMs and identified genes per genome; y-axis 0 to 2,000]

TIGRFAMs: 174, 190, 247, 0, 256, 216, 189, 332, 302, 386, 298, 148, 338, 370, 186, 450

Identified genes: 522, 393, 1894, 381, 768, 899, 559, 1338, 947, 1024, 889, 263, 1210, 961, 504, 1676

Genomes on the chart axis: M. j., Chl tr, Strep, Chl tr, Therm, A. f., B. b., Caul, D. r., H. i., H. p., M. g., TB, Neis., Trep, Vib.

Page 27

Multiple Genome Annotation

• Genes go through the usual pipeline: BLAST, Pfam, COG, TIGRFAM, InterPro, etc.

• Cluster genes based on common properties using hierarchical clustering

• Display

• Can annotators select subsets of genes for reliable assignment?
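The clustering step could look like the following sketch, which assumes each gene is summarized by the set of family and domain hits from the pipeline above and uses Jaccard distance with average-linkage agglomerative clustering, stopping at a distance threshold. The slide does not say which distance or linkage was used; the gene names, hit IDs, and thresholds are placeholders, and the naive loop is only meant for small inputs.

```python
from itertools import combinations


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(genes: dict[str, frozenset[str]], threshold: float) -> list[list[str]]:
    """Agglomerative clustering: repeatedly merge the closest pair of clusters
    (mean pairwise distance) until the closest pair is farther apart than threshold."""
    clusters = [[name] for name in sorted(genes)]

    def cluster_distance(c1: list[str], c2: list[str]) -> float:
        dists = [jaccard_distance(genes[x], genes[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), cluster_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


# Placeholder hit sets: two genes with identical hits, one differing in a single family, one unrelated.
genes = {
    "g1": frozenset({"dom1", "dom2", "famA"}),
    "g2": frozenset({"dom1", "dom2", "famA"}),
    "g3": frozenset({"dom1", "dom2", "famB"}),
    "g4": frozenset({"dom3"}),
}
print(average_linkage(genes, threshold=0.4))  # [['g1', 'g2'], ['g3'], ['g4']]
print(average_linkage(genes, threshold=0.6))  # [['g1', 'g2', 'g3'], ['g4']]
```

The threshold decides whether g3, which shares its domains but not its family-level hit with g1 and g2, lands in their cluster; that borderline case is the kind of subset an annotator would want to inspect rather than name in bulk.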

Page 28

[Figure: panels labeled Malate dehydrogenase and Lactate dehydrogenase]

Page 29

Sybil Comparative System

• Open-source software

• Complete, portable data management system for genome annotation

• Chado relational database (developed by FlyBase and other collaborators)

• Extensive graphical interface

• Priority "use case": management of genes and genomes for identification of pathogen-related genes

Page 30

Sybil Overview

Page 31

[Figure panels A, B, C]

Tabular views of conserved synteny, orthologs, and BLAST matches across multiple genomes
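In its simplest form, conserved synteny between two genomes is a run of ortholog pairs that are adjacent in both. A sketch under that strict-adjacency assumption (real comparative tools usually tolerate gaps and small rearrangements, and the slide does not say how Sybil computes its views): ortholog pairs are given as gene-order indices in genome A and genome B, and a block continues while the B index moves by +1 (same orientation) or -1 (inverted) at each step. The input pairs are made up.

```python
def synteny_blocks(pairs: list[tuple[int, int]], min_len: int = 3) -> list[list[tuple[int, int]]]:
    """Return maximal runs of ortholog pairs that are adjacent in both genomes.

    pairs: (gene index in genome A, gene index in genome B), one per ortholog pair.
    Within a block, A advances by 1 and B by +1 (same orientation) or -1 (inverted).
    """
    blocks: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    direction = 0  # 0 = undecided, +1 = same orientation, -1 = inverted

    def close() -> None:
        if len(current) >= min_len:
            blocks.append(list(current))

    for pair in sorted(pairs):
        if current:
            da = pair[0] - current[-1][0]
            db = pair[1] - current[-1][1]
            if da == 1 and db in (1, -1):
                if direction in (0, db):
                    direction = db
                    current.append(pair)
                    continue
                # Orientation flipped: close this block; the last pair seeds the next one.
                close()
                current = [current[-1], pair]
                direction = db
                continue
            close()
        current = [pair]
        direction = 0
    close()
    return blocks


example = [(0, 10), (1, 11), (2, 12), (3, 13), (4, 40), (5, 30), (6, 29), (7, 28), (8, 99)]
print(synteny_blocks(example))
# [[(0, 10), (1, 11), (2, 12), (3, 13)], [(5, 30), (6, 29), (7, 28)]]
```

The second block is an inversion: genes 5 to 7 in genome A appear in reverse order in genome B, which a tabular view would show as a conserved but flipped region.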

Page 32

Open Source Software at TIGR

• Manatee
• Sybil (prototypes)
• Annotation Engine
• MUMmer (large-scale genome alignment)
• BAMBUS (assembly/scaffolding)
• Glimmer, GlimmerM, Exonomy (gene finding)
• TM4 (microarray tools)
• Chado/BSML

Perl Artistic License

Page 33

Conclusions

• Professional software and support staff are critical to a large, high-throughput research project

• Scientists benefit from frequent interactions with production line staff

• High-quality support allows scientific staff to devote more effort to scientific discovery