Presented by John Quackenbush, Ph.D. at the June 10, 2003 meeting of the Pharmacology Toxicology...

Page 1: Presented by John Quackenbush, Ph.D. at the June 10, 2003 meeting of the Pharmacology Toxicology Subcommittee of the Advisory Committee for Pharmaceutical

Presented by John Quackenbush, Ph.D.

at the June 10, 2003 meeting of the

Pharmacology Toxicology Subcommittee of the

Advisory Committee for Pharmaceutical Science

Page 2:

Challenges in Data Management and Analysis for Microarrays

FDA

10 June 2003

Page 3:

Selecting the Appropriate Platform

Page 4:

Affymetrix GeneChip™ Expression Analysis

Generate DNA Sequence

[Figure: a DNA sequence shown with a series of overlapping subsequences]

Design and synthesize chips

Page 5:

Affymetrix GeneChip™ Expression Analysis

Obtain RNA Samples (Control, Test)

Prepare Fluorescently Labeled Probes

Hybridize and wash chips

Scan chips

Analyze (PM, MM)

Page 6:

Microarray Overview I

Microbial ORFs: Design PCR Primers, PCR Products

Eukaryotic Genes: Select cDNA clones, PCR Products

Microtiter Plate: Many different plates containing different genes

Microarray Slide (with 60,000 or more spotted genes): For each plate set, many identical replicas

Page 7:

Microarray Gene Chip Overview II

Obtain RNA Samples (Control, Test)

Prepare Fluorescently Labeled Probes

Hybridize, Wash

Measure Fluorescence in 2 channels (red/green)

Analyze the data to identify patterns of gene expression

Page 8:

Microarray Expression Analysis

Tissue Selection

Differential State/Stage Selection

RNA Preparation and Labeling

Competitive Hybridization

Gene Spots on an Array

Fluorescence Intensity

Expression Measurement

Page 9:

Platform-related issues

Lack of standardization makes direct comparison of results a challenge

Lot-to-lot variation in arrays can introduce artifacts – are the results dependent on the biology or on the arrays (or technician or reagent lots or ....)?

Commercial arrays provide a standard and remove some design considerations (one sample, one array), but can cost 10x (or more) as much as in-house arrays

Arrays demand a good LIMS for sample tracking

Page 10:

Microarray Analysis

Page 11:

General Microarray Strategy

Choose an experimentally interesting and tractable model system

Design an experiment with comparisons between related variants

Include sufficient biological replication to make good estimates

Hybridize and collect data

Normalize and filter

Mine data for biological patterns of expression

Integrate expression data with other ancillary data, including genotype, phenotype, the genome, and its annotation

Page 12:

Annotating and Comparing Arrays

Page 13:

TIGR Gene Indices home page

www.tigr.org/tdb/tgi

~60 species

>16,000,000 sequences

Page 14:

The Mouse Gene Index <http://www.tigr.org/tdb/mgi>

Page 15:

A TC Example

Page 16:

GO Terms and EC Numbers

Babak Parvizi

Page 17:

The TIGR Gene Indices <http://www.tigr.org/tdb/tgi>

Dan Lee, Ingeborg Holt

Page 18:

Tentative Orthologues and Paralogues

Building TOGs: Reflexive, Transitive Closure

Thanks to Woytek Makałowski and Mark Boguski
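As a rough illustration of what a transitive closure over pairwise matches does, the sketch below groups sequences into clusters with a small union-find structure. It is only a sketch under stated assumptions: the function name `build_togs`, the identifiers such as `human|TC1`, and the idea that the input links have already passed whatever match criterion is used are all hypothetical, and this is not the TIGR implementation.

```python
from collections import defaultdict


def build_togs(pairs):
    """Group sequences by transitive closure over pairwise links.

    `pairs` is an iterable of (a, b) tuples; each tuple is a hypothetical,
    already-accepted match between two sequences from different species.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)            # first sighting: node is its own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b        # merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort members and groups so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical links: A-B and B-C imply that A, B and C share one group.
links = [
    ("human|TC1", "mouse|TC7"),
    ("mouse|TC7", "rat|TC3"),
    ("human|TC9", "mouse|TC2"),
]
togs = build_togs(links)
```

Here `human|TC1` and `rat|TC3` end up in the same group even though they were never linked directly, which is the point of taking the closure.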

Page 19:

TOGA: A Sample Alignment: bithoraxoid-like protein

Page 20:

Page 21:

Gene Finding in Humans is easy!

Razvan Sultana

Page 22:

Gene Finding in Humans is easy?

Razvan Sultana

Page 23:

Gene Finding in Humans is difficult?

Razvan Sultana

Page 24:

Gene Finding in Humans is difficult?

Razvan Sultana

A genome and its annotation is only a hypothesis that must be tested.

Page 25:

RESOURCERER

http://pga.tigr.org/tools.shtml

Jennifer Tsai

Page 26:

RESOURCERER: An Example

Page 27:

RESOURCERER: Using Genetic Markers

Next step: Integrate QTLs

Page 28:

Annotation Issues

The “complete” genome is incomplete

Gene names are not yet well defined
  One gene may have many names
  One gene may have many sequences
  One sequence may have many names

Analysis and interpretation depend on well-annotated gene sets
  Gene names, Gene Ontology assignments, and pathway information

Cross-species comparisons require good knowledge of orthologues and paralogues

Page 29:

Tools and Techniques for Array Analysis

Page 30:

Analysis steps

Design the experiment

Perform the hybridizations and generate images

Analyze images to identify genes and expression levels (hybridization intensities)

Normalize expression levels to facilitate comparisons

Analyze expression data to find biologically relevant patterns

Page 31:

MADAM: Microarray Data Manager

Available with OSI source and MySQL

MAGE-ML export by June

Joseph White, Jerry Li, Alexander Saeed, Vasily Sharov, Syntek Inc.

Page 32:

Why Normalize Data?

Goal is to measure ratios of gene expression levels:

$(\text{ratio})_i = R_i / G_i$

where $R_i$ and $G_i$ are, respectively, the measured red and green intensities for the $i$-th spot.

In a self-self hybridization, we would expect all ratios to be equal to one:

$R_i / G_i = 1$ for all $i$. But they may not be.

Why not?
  Unequal labeling efficiencies for Cy3/Cy5
  Noise in the system
  Differential expression

Normalization brings (appropriate) ratios back to one.
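A minimal sketch of the idea, assuming the simplest possible scheme: take log2 of each R/G ratio and subtract the median, so that the typical spot ends up at a ratio of one (log ratio zero). This global median centering is a stand-in chosen for brevity, not the intensity-dependent LOWESS method, and the intensity values are made up for illustration.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Return log2(R_i / G_i) for each spot i."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values):
    """Shift log ratios so that their median is 0 (median ratio of 1)."""
    m = median(values)
    return [v - m for v in values]


# Made-up intensities: a uniform 2x dye bias in the red channel,
# plus one spot (the last) that really differs between channels.
red = [200.0, 400.0, 800.0, 1600.0, 3200.0]
green = [100.0, 200.0, 400.0, 800.0, 400.0]

raw = log_ratios(red, green)        # bias shows up as a shift of +1 in log2 units
normalized = median_center(raw)     # unchanged spots return to 0
```

After centering, the four unchanged spots sit at a log ratio of 0, and the last spot keeps a log ratio of 2 (a fourfold difference) rather than the inflated raw value of 3.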

Page 33:

LOWESS Results

Page 34:

MIDAS: Data Analysis

Available with source

Variance Stabilization, Adding Error Models, MAANOVA, Automated Reporting

Wei Liang

Page 35:

MeV: Data Mining Tools

Available with OSI source

Alexander Saeed, Alexander Sturn, Nirmal Bhagabati, John Braisted, Syntek Inc., Datanaut, Inc.

Page 36:

Analysis Issues

There is no standard method for data analysis

The same algorithm with a small change in parameters (such as distance metric) can produce very different results

Data normalization plays a big role in identifying “differentially expressed” genes

Much of the apparent disparity in microarray datasets can be attributed to differences in data analysis methods, from image processing to normalization to data mining
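To make the distance-metric point concrete, here is a small sketch with made-up expression profiles (the names `gene_a`, `gene_b`, `gene_c` are hypothetical). It asks which profile is "closest" to `gene_a` under Euclidean distance and under a Pearson-correlation distance (1 minus r); the two metrics disagree, so any clustering built on them would differ as well.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 when the profiles have identical shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def nearest(target, others, dist):
    """Name of the profile in `others` closest to `target` under `dist`."""
    return min(others, key=lambda name: dist(target, others[name]))


gene_a = [1.0, 2.0, 3.0, 4.0]
others = {
    "gene_b": [10.0, 20.0, 30.0, 40.0],  # same shape, much higher level
    "gene_c": [1.0, 2.0, 2.0, 1.0],      # similar level, different shape
}

by_euclidean = nearest(gene_a, others, euclidean)
by_pearson = nearest(gene_a, others, pearson_distance)
```

Euclidean distance pairs `gene_a` with `gene_c` because their magnitudes are similar, while the Pearson distance pairs it with `gene_b` because their profiles rise together.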

Page 37:

Data Reporting Standards

Page 38:

What data should we collect?

EVERYTHING

Nature Genetics 29, December 2001

<http://www.mged.org>

MAGE-ML – XML-based data exchange format

Page 39:

Publications on Microarray Data Exchange Standards

MIAME Standards:
Nature family, Cell family, EMBO reports, Bioinformatics, Genome Research, Genome Biology, Science, The Lancet, and others….

Page 40:

Standardization Issues

MIAME Standards are a start, but still evolving

Implementation will require further development of ontologies to create standard descriptors

MIAME-Tox <http://www.mged.org/MIAME1.1-DenverDraft.DOC> represents an attempt to extend this to toxicology

Software must be developed to read/write MAGE-ML

Public databases need to be extended to meet Tox needs

Page 41:

Science

Page 42:

Integrating Expression with other data

Page 43:

[Figure: signaling pathway diagram. Labels: LPS, LBP, BPI, CD14, MD-2, TLR Proteins, MyD88, IRAK2, TRAF-6, NIK, IκB, Degradation, NF-κB, Immunomodulatory Genes, Cytokines and Adhesion Proteins, Inflammatory Cell Recruitment, Antigen Presentation, Innate Immunity, Adaptive Immunity, Pathophysiologic Conditions (Sepsis, ARDS, Asthma)]

Adapted from Godowski. NEJM 1999; 340:1835

David Schwartz

Page 44:

[Figure: Lavage PMNs x 10^3/ml (axis 200 to 1000) for C57BL/6, DBA/2, and the BXD Recombinant Inbred Strains (n=32). Examples: BXD5, BXD29, BXD39, BXD42]

[Figure: hybridization design. Samples P1, P2, L1, L2, L3, H1, H2, H3, each also shown with a "+" suffix, and R (P1+P2)]

53 Hybridizations

Result: ~425 “significant” genes

Page 45:

IDEA: Build QTL Maps and use those to filter expression data

Goal: Find differentially expressed genes genetically linked to response

[Figure: Lavage PMNs x 10^3/ml (axis 200 to 1000) for C57BL/6, DBA/2, and the BXD Recombinant Inbred Strains (n=32). Examples: BXD5, BXD29, BXD39, BXD42]

Page 46:

Microarray Expression-QTL Consensus Candidate Genes

Genes in QTL: 525

Genes by Microarray: 426

Candidate genes for follow-up and validation: 46
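The consensus step amounts to a set intersection: keep only genes that appear both in a QTL interval and in the microarray "significant" list. The sketch below shows that filter with hypothetical gene identifiers; how genes are assigned to QTL intervals or called significant is outside its scope.

```python
def consensus_candidates(qtl_genes, array_genes):
    """Genes present in both lists, sorted for stable output."""
    return sorted(set(qtl_genes) & set(array_genes))


# Hypothetical identifiers, for illustration only.
qtl_genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
array_genes = ["gene_c", "gene_a", "gene_x"]

candidates = consensus_candidates(qtl_genes, array_genes)
```

With these inputs only `gene_a` and `gene_c` survive, since the other genes appear in just one of the two lists.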

Page 47:

Candidate Gene Set for LPS response

Accession   Description
BG076932    annexin A1 (Anxa1)
BG085317    arginase type II (Arg2)
BG064781    cytidine 5'-triphosphate synthase (Ctps)
BG085740    ets-related transcription factor
BG063515    ferritin heavy chain (Fth)
BG078398    MARCKS-like protein (Mlp)
AW556835    protein tyrosine phosphatase, non-receptor type 2 (Ptpn2)
BG077485    ring finger protein (C3HC4 type) 19 (Rnf19)
BG085186    surfactant protein-D gene
AW550270    tenascin C (Tnc)
BG065761    tumor necrosis factor, alpha-induced protein 2 (Tnfaip2)
BG074379    co-chaperone mt-GrpE#2 precursor putative
BG080688    CSF-1
BG067349    C-type lectin Mincle
BG073439    DKFZp564O1763
AW551388    E2F-like transcriptional repressor protein
BG076460    glutamate-cysteine ligase catalytic subunit (GLCLC)
BG080666    gly96
BG067921    GTP binding protein
BG072974    DKFZp547B146
BG070296    DKFZp566F164
BG074109    Hsp86-1
BG077487    hypoxia inducible factor 1
BG078274    I kappa B alpha gene
BG084405    IAP-1
BG069214    inhibitor of apoptosis protein 1
BG067127    interferon regulatory factor 1
BG080268    KC
BG070106    lipocalin
BG064651    MAIL
BG063925    metallothionein II
BG077818    metallothionein-I
BG073108    MHC class III region RD
BG064928    mitogen-responsive 96
BG072801    S100A9
BG086320    SDF-1-beta
BG072793    T-cell activating protein
BG073446    TH1 protein
BG072227    TNFa
BG068491
BG071081
BG067341
BG067620
BG067670
BG066678
BG071169

Page 48:

Sleep Deprivation Studies in Mouse

[Figure: timeline with time points 0, 3, 6, 9, 12 and rows of “zz” sleep symbols]

Page 49:

Experimental Paradigm

Compare gene expression between sleeping and sleep-deprived mice in cortex and hypothalamus

Perform 3 biological replicates

Normalize and filter data and use data mining techniques to select distinct patterns of gene expression

Use Gene Ontology (GO) assignments to classify genes by cellular localization, molecular function, biological process

Use GO analysis to develop an understanding of response

Page 50:

Differential Expression in Cortex

Energy Metabolism

Transcription; Mitochondrial and Ribosomal Proteins

Stress Response

Metabolism and Signal Transduction

Page 51:

Differential Expression in Hypothalamus

Sleep signaling

Page 52:

Predicting Outcome

Page 53:

The problem

Patients present with tumors, many of which are indistinguishable.

Histology can provide some information, but it has little predictive power.

Microarrays provide a “fingerprint” that can serve as a phenotypic measure that may be linked to outcome.

This is a huge problem in data mining.

Page 54:

The problem in pictures: Adenocarcinomas

Page 55:

32k Human Arrays

Page 56:

cDNA Multi-Organ Cancer Classifier

77 tumor samples; 144 hybridization assays (breast, ovary, lung)

Normalization and flip-dye replica consistency check

Statistical filtering of genes (Kruskal-Wallis H-test), p < 0.05

685 genes

UNSUPERVISED CLASSIFICATION
  hierarchical clustering (Pearson correlation)

SUPERVISED CLASSIFICATION
  Divide experiments into training and validation sets (Training 75%, Validation 25%)
  Artificial neural network training and validation
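As an illustration of the gene-filtering step, the sketch below computes the Kruskal-Wallis H statistic for one gene across three tumor groups and applies a p < 0.05 cutoff. It is a simplified sketch with several assumptions: no tie correction is applied, the p-value uses the chi-square approximation (which for exactly three groups, two degrees of freedom, reduces to exp(-H/2) and is rough for small samples), and the expression values are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic for a list of groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def p_value_three_groups(h):
    """Chi-square survival function with 2 degrees of freedom."""
    return math.exp(-h / 2)


# Made-up expression values for one gene in three tumor types.
breast = [1.0, 2.0, 3.0]
ovary = [4.0, 5.0, 6.0]
lung = [7.0, 8.0, 9.0]

h = kruskal_wallis_h([breast, ovary, lung])
p = p_value_three_groups(h)
keep_gene = p < 0.05
```

Running the same test on every gene and keeping those below the cutoff gives a reduced gene list; with thousands of genes tested, some multiple-testing adjustment is normally also considered.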

Page 57:

Neural Networks and Cancer

Input data: A list of genes with expression levels

“hidden layers” allow complex connections

Output data: A tumor type call

Page 58:

Neural Networks and Cancer

Training: Adjusts weights and connections

Breast Tumor

Page 59:

Tumors in the Universal Classifier

Tumor Type                                      Number of Samples   Array Platform
Bladder                                         19                  U95, HU6800
Breast                                          42                  U95, HU6800, TIGR 32k
Central Nervous – Atypical Teratoid/Rhabdoid    10                  HU6800
Central Nervous – Glioma                        10                  HU6800
Central Nervous – Medulloblastoma               70                  HU6800
Colon                                           41                  U95, HU6800, TIGR 32k
Stomach/EG Junction                             30                  U95, TIGR 32k
Kidney                                          31                  U95, HU6800, TIGR 32k
Leukemia – Acute Lymphocytic B Cell             10                  HU6800
Leukemia – Acute Lymphocytic T Cell             10                  HU6800
Leukemia – Acute Myelogenous                    10                  HU6800
Lung – Adenocarcinoma                           71                  U95, HU6800, TIGR 32k
Lung – Squamous Cell Carcinoma                  21                  U95
Lymphoma – Follicular                           11                  HU6800
Lymphoma – Large B Cell                         11                  HU6800
Melanoma                                        10                  HU6800
Mesothelioma                                    10                  HU6800
Ovary                                           44                  U95, HU6800, TIGR 32k
Pancreas                                        26                  U95, HU6800, TIGR 32k
Prostate                                        42                  U95, HU6800
Uterus                                          10                  HU6800

543 tumor samples
21 tumor types
95% of all cancers


[Figure: classifier workflow]

Data Acquisition -> Normalization and Scaling -> Statistical Screening -> Neural Network Training and Validation

Microarray Database: U95A, Hu6800, and TIGR arrays, each with per-gene values (Gene 1 2.2, Gene 2 0.5, Gene 3 1.2, ...)
Normalization and Scaling: average across chips using reference; gene-by-gene using reference
Statistical Screening: all normalized and scaled genes -> Kruskal-Wallis, Bonferroni f(x) -> correlative gene subset (U95A = 124, Hu6800 = 136)
Neural Network Training and Validation: Training Set (Tumor 1 ... Tumor n) and Test Set (Tumor 1 ... Tumor n) -> Classifier
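The statistical screening step (Kruskal-Wallis followed by a Bonferroni correction) could look roughly like the sketch below. The slide does not give details, so the data layout, the 0.05 significance level, the chi-square approximation for the p-value, and every function name here are assumptions for illustration only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2:   # odd df: start from the closed form for df = 1
        q, nu = math.erfc(math.sqrt(half)), 1
    else:        # even df: start from the closed form for df = 2
        q, nu = math.exp(-half), 2
    while nu < df:
        # Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return min(q, 1.0)


def kruskal_wallis(groups):
    """Return (H, p) for a list of value lists, with the usual tie correction."""
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = (12.0 / (n * (n + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3.0 * (n + 1))
    correction = 1.0 - tie_term / float(n ** 3 - n)
    if correction == 0:   # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def screen_genes(expression, labels, alpha=0.05):
    """Keep genes whose Kruskal-Wallis p-value passes a Bonferroni threshold.

    expression: {gene name: [value per sample]}; labels: tumor type per sample.
    """
    classes = sorted(set(labels))
    threshold = alpha / len(expression)   # Bonferroni: divide by number of tests
    selected = []
    for gene, values in expression.items():
        groups = [[v for v, lab in zip(values, labels) if lab == c]
                  for c in classes]
        _, p = kruskal_wallis(groups)
        if p < threshold:
            selected.append(gene)
    return selected
```

With thousands of genes the Bonferroni threshold becomes very small, which is consistent with the screen leaving only on the order of a hundred genes per platform.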


Summary

We collected 540 expression profiles
  21 tumor types
  95% of all cancers
10 Independent Classifiers
  75% of data for training, 25% for test
  Average ~88% accuracy
Web-based classifier available
  So far, 7 of 8* in classification
  84% accuracy in classifying primary source of mets

* Bad RNA
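One way to read "10 independent classifiers, 75% for training, 25% for test" is as ten random 75/25 splits, each training its own classifier, with the test accuracies averaged. The sketch below follows that reading, which is an assumption. It also swaps in a nearest-centroid classifier as a stand-in for the neural network so the example stays short; all names are hypothetical.

```python
import random


def nearest_centroid_fit(samples, labels):
    """Return {label: mean vector} computed from the training samples."""
    sums, counts = {}, {}
    for x, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(centroids[lab], x)))


def repeated_holdout(samples, labels, n_repeats=10, train_fraction=0.75, seed=0):
    """Train on a random 75% and test on the rest, n_repeats times.

    Returns (mean accuracy, list of per-split accuracies).
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        cut = int(round(train_fraction * len(indices)))
        train_idx, test_idx = indices[:cut], indices[cut:]
        model = nearest_centroid_fit([samples[i] for i in train_idx],
                                     [labels[i] for i in train_idx])
        correct = sum(nearest_centroid_predict(model, samples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies), accuracies
```

A plain random split like this does not guarantee that every tumor type appears in every training set; with classes as small as 10 samples, a stratified split would be the safer choice.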


Further challenges in analysis?

Statistical significance is not the same as biological significance
If you perturb a system, many genes change their expression levels
Multiple pathways and features in the data can be revealed through different analysis methods
Genes which are good for classification or prognostics may not be biologically relevant
Extracting meaning from microarrays will require new software and tools
The most important thing we need is more data collected and stored in a standard fashion


Barriers to Toxicology Applications

The "complete" genomes are incomplete
Many of the signatures we see on arrays do not have immediate biological implications
Most often genes are included on the arrays that are used solely for normalization
Larger datasets may reveal diagnostic or prognostic patterns that are not obvious at present
Reported "variation" in the assays must be understood
  Differences in laboratory and analysis protocols are likely sources
There is a need to define QC and analysis standards
There is clearly a need for a large database of expression profiles linked to other relevant ancillary information


Science is built with facts as a house is with stones, but a collection of facts is no more a science than a heap of stones is a house.
  - Jules Henri Poincaré


Acknowledgments

<[email protected]>

The TIGR Gene Index Team
Foo Cheung, Svetlana Karamycheva, Yudan Lee, Babak Parvizi, Geo Pertea, Razvan Sultana, Jennifer Tsai, John Quackenbush, Joseph White
Funding provided by the Department of Energy and the National Science Foundation

TIGR Human/Mouse/Arabidopsis Expression Team
Emily Chen, Bryan Frank, Renee Gaspard, Jeremy Hasseman, Heenam Kim, Lara Linford, Simon Kwong, John Quackenbush, Shuibang Wang, Yonghong Wang, Ivana Yang, Yan Yu

Array Software Hit Team
Nirmal Bhagabati, John Braisted, Tracey Currier, Jerry Li, Wei Liang, John Quackenbush, Alexander I. Saeed, Vasily Sharov, Mathangi Thaiagarjian, Joseph White

Assistant
Sue Mineo

Funding provided by the National Cancer Institute, the National Heart, Lung, Blood Institute, and the National Science Foundation

H. Lee Moffitt Center/USF
Timothy J. Yeatman, Greg Bloom

TIGR PGA Collaborators
Norman Lee, Renae Malek, Hong-Ying Wang, Truong Luu, Bobby Behbahani

TIGR Faculty, IT Group, and Staff

PGA Collaborators
Gary Churchill (TJL), Greg Evans (NHLBI), Harry Gavaras (BU), Howard Jacob (MCW), Anne Kwitek (MCW), Allan Pack (Penn), Beverly Paigen (TJL), Luanne Peters (TJL), David Schwartz (Duke)

Emeritus
Jennifer Cho (TGI), Ingeborg Holt (TGI), Feng Liang (TGI), Kristie Abernathy (mA), Sonia Dharap (mA), Julie Earle-Hughes (mA), Cheryl Gay (mA), Priti Hegde (mA), Rong Qi (mA), Erik Snesrud (mA)