
Post on 16-Jan-2016


VectorBase

Gene expression data in VectorBase

Fotis Kafatos, George Christophides, Bob MacCallum & Seth Redmond

Imperial College London

(thanks also to EBI, Sanger and ND)


Outline

1. Project goals

2. What’s currently available

3. Current challenges and future plans


Project goals

• For vector biologists:
  – Easy access to gene expression data
  – Consistent data processing
• For array specialists:
  – ArrayExpress submission
  – Advanced analysis tools
  – Array annotation


[Diagram: expression data loaded via the bulk loader into storage & analysis]

• BASE: BioArray Software Environment
  – http://base.thep.lu.se/
• Open source, with an active development and user community
• LIMS, data storage, export and analysis
• Web-based, with user/group access control
• BASE 2.x adoption will bring Affymetrix support

Data submission

• Community submission guidelines available
• First batch of experiments loaded by us
• Bulk data loader
• Sample/experiment annotation requires intervention from curators
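The last bullet implies a triage step between loading and curation. As a minimal sketch (Python, standard library only), a loader could accept fully annotated samples and queue the rest for a curator. The tab-delimited input, the column names in `REQUIRED_FIELDS` and the function `triage_samples` are hypothetical illustrations, not the actual VectorBase bulk loader or its submission guidelines.

```python
import csv
import io

# Hypothetical minimum annotation a curator would want before a sample is loaded.
REQUIRED_FIELDS = ("sample_id", "organism", "developmental_stage", "tissue")


def triage_samples(tsv_text):
    """Split tab-delimited sample rows into loadable rows and rows needing curation.

    Returns (ready, needs_curation); each needs_curation entry is a
    (row, missing_fields) pair so the curator sees exactly what is absent.
    """
    ready, needs_curation = [], []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Treat absent columns (None) and blank cells alike as missing.
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            needs_curation.append((row, missing))
        else:
            ready.append(row)
    return ready, needs_curation


example = (
    "sample_id\torganism\tdevelopmental_stage\ttissue\n"
    "S1\tAnopheles gambiae\tadult female\tmidgut\n"
    "S2\tAnopheles gambiae\t\tmidgut\n"
)
ready, todo = triage_samples(example)
print(len(ready), [missing for _, missing in todo])  # 1 [['developmental_stage']]
```

Keeping the list of missing fields with each held-back row means the curator's intervention can be targeted rather than a full re-review of the submission.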


[Diagram: expression data → bulk loader → storage & analysis → ArrayExpress (‘public’ storage)]

• Data held in BASE is largely MIAME compliant

• Script for semi-automated export in TAB2MAGE format

• One experiment submitted so far



[Diagram: expression data → bulk loader → storage & analysis → ArrayExpress (‘public’ storage), with storage & analysis also feeding data summaries]

• The BASE web interface offers a powerful, extensible analysis environment
• Can be used for multi-site collaborations on pre-publication data
• Steep learning curve; not 100% intuitive
• Not easily linked to
• We provide simpler views so the casual user can quickly draw biological inferences


Standardised data

All displayed data is processed in the same way:

1. Poor-quality spots removed
   – Currently using submitted spot flags
2. Normalisation
   – “lowess” for two-colour experiments
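The two processing steps above can be sketched as follows. This is an illustrative pure-Python version, assuming spots arrive as hypothetical `(red, green, flagged)` tuples; it computes the usual M = log2(R/G) and A = ½·log2(R·G), fits a single, non-robust lowess pass of M on A and subtracts the trend. Standard lowess implementations (e.g. R's `lowess` or statsmodels) add robustness iterations by default, which this sketch omits.

```python
import math


def lowess_fit(x, y, frac=0.4):
    """Single-pass (non-robust) lowess: local linear fit with tricube weights.

    For each x[i], fit a weighted straight line to the ceil(frac * n) nearest
    points (by distance along x) and return its value at x[i].
    O(n^2 log n): fine for a sketch, too slow for a whole array at scale.
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for xi in x:
        nearest = sorted(range(n), key=lambda j: abs(x[j] - xi))[:k]
        dmax = max(abs(x[j] - xi) for j in nearest) or 1.0
        # Tricube weights; xi itself has weight 1, so the weight sum is never 0.
        w = [(1 - (abs(x[j] - xi) / dmax) ** 3) ** 3 for j in nearest]
        sw = sum(w)
        mx = sum(wj * x[j] for wj, j in zip(w, nearest)) / sw
        my = sum(wj * y[j] for wj, j in zip(w, nearest)) / sw
        sxx = sum(wj * (x[j] - mx) ** 2 for wj, j in zip(w, nearest))
        sxy = sum(wj * (x[j] - mx) * (y[j] - my) for wj, j in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def normalise_two_colour(spots, frac=0.4):
    """Step 1: drop flagged or non-positive spots. Step 2: lowess-normalise M on A.

    Returns a list of (A, normalised M) pairs for the retained spots.
    """
    kept = [(r, g) for r, g, flagged in spots if not flagged and r > 0 and g > 0]
    m = [math.log2(r / g) for r, g in kept]
    a = [0.5 * math.log2(r * g) for r, g in kept]
    trend = lowess_fit(a, m, frac)
    return [(ai, mi - ti) for ai, mi, ti in zip(a, m, trend)]


spots = [(1200.0, 600.0, False), (300.0, 140.0, False), (5000.0, 2600.0, False),
         (90.0, 40.0, False), (700.0, 90.0, True)]  # last spot flagged by submitter
for a, m in normalise_two_colour(spots, frac=0.8):
    print(f"A={a:.2f}  M={m:+.3f}")
```

Fitting M against A, rather than simply centring M, is what lets lowess remove intensity-dependent dye bias.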


[Diagram: expression data → bulk loader → storage & analysis → ArrayExpress (‘public’ storage), plus data summaries and probe mapping]

• 3 probe types
• 6 array designs
• Mapping handled via Ensembl pipeline:
  – Oligo: exonerate
  – PCR: e-PCR
  – cDNA: exonerate2genes
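After the alignment step (exonerate or e-PCR, depending on probe type), each probe hit still has to be linked to gene models. The sketch below shows one simple way to do that linking by coordinate overlap. It is not the Ensembl pipeline's code. It assumes alignments have already been reduced to hypothetical `(probe_id, seqid, start, end)` blocks, so a probe whose alignment is split (for example across exons) simply contributes several blocks. It also uses a linear scan where a real pipeline would use an interval index.

```python
def assign_probes(blocks, genes):
    """Map probes to genes by genomic overlap.

    blocks: (probe_id, seqid, start, end) alignment blocks; a split alignment
            contributes several blocks with the same probe_id.
    genes:  (gene_id, seqid, start, end) gene spans.
    Coordinates are 1-based and inclusive, as in GFF.
    Returns {probe_id: sorted list of overlapping gene_ids}.
    """
    genes_by_seq = {}
    for gene_id, seqid, start, end in genes:
        genes_by_seq.setdefault(seqid, []).append((start, end, gene_id))
    hits = {}
    for probe_id, seqid, start, end in blocks:
        probe_genes = hits.setdefault(probe_id, set())  # keep probes with no gene hit
        for g_start, g_end, gene_id in genes_by_seq.get(seqid, []):
            if start <= g_end and g_start <= end:  # closed intervals overlap
                probe_genes.add(gene_id)
    return {probe_id: sorted(ids) for probe_id, ids in hits.items()}


def classify(assignments):
    """Bucket aligned probes: one gene, several genes, or no overlapping gene."""
    buckets = {"unique": [], "ambiguous": [], "no_gene": []}
    for probe_id, gene_ids in sorted(assignments.items()):
        if not gene_ids:
            buckets["no_gene"].append(probe_id)
        elif len(gene_ids) == 1:
            buckets["unique"].append(probe_id)
        else:
            buckets["ambiguous"].append(probe_id)
    return buckets


genes = [("geneA", "2L", 100, 500), ("geneB", "2L", 450, 900)]
blocks = [("p1", "2L", 120, 150), ("p1", "2L", 300, 320),  # split alignment
          ("p2", "2L", 460, 480),                          # overlaps both genes
          ("p3", "3R", 10, 40)]                            # intergenic hit
print(classify(assign_probes(blocks, genes)))
```

Keeping ambiguous and gene-less probes visible, rather than silently dropping them, matters later when expression is summarised per gene.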


[Diagram: genomic data → automatic annotation → genome browser]
[Diagram: expression data → bulk loader → storage & analysis → ArrayExpress (‘public’ storage), plus data summaries and probe mapping, with probe mapping output as GFF3]
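In the diagram, probe mapping output is passed on as GFF3. Assuming that means one feature per aligned probe, a minimal exporter could look like the following. The `source` and `feature_type` defaults (the generic Sequence Ontology term `match`) are assumptions. Two conventions come from the GFF3 specification: a shared `ID` across lines represents a split (discontinuous) alignment, and reserved characters in column 9 are percent-encoded. Sequence IDs are assumed to be valid already and are not escaped.

```python
# Characters with reserved meaning in GFF3 column 9, and their percent-encodings.
RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
            "\t": "%09", "\n": "%0A"}


def escape(value):
    """Percent-encode characters that GFF3 reserves in attribute values."""
    return "".join(RESERVED.get(ch, ch) for ch in value)


def probe_blocks_to_gff3(blocks, source="probe_mapping", feature_type="match"):
    """Render probe alignment blocks as GFF3 text.

    blocks: (probe_id, seqid, start, end, strand) tuples, 1-based inclusive.
    Blocks from one split alignment share a probe_id and therefore an ID,
    which GFF3 reads as a single discontinuous feature.
    """
    lines = ["##gff-version 3"]
    for probe_id, seqid, start, end, strand in blocks:
        if start > end:
            raise ValueError(f"start > end for {probe_id}")
        attributes = f"ID={escape(probe_id)};Name={escape(probe_id)}"
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes
        columns = [seqid, source, feature_type, str(start), str(end),
                   ".", strand, ".", attributes]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


blocks = [("probeA", "2L", 1200, 1224, "+"), ("probeA", "2L", 1310, 1344, "+"),
          ("probe;B", "3R", 500, 559, "-")]
print(probe_blocks_to_gff3(blocks), end="")
```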

[Screenshot: Ensembl contigview]

[Screenshot: Ensembl featureview]

[Diagram: the expression-data pipeline (bulk loader, storage & analysis, ArrayExpress ‘public’ storage, data summaries, probe mapping) and the genomic-data pipeline (automatic annotation, genome browser), together with data mining, serving vector biologists, array biologists and genome biologists]

BioMart

• Beta version currently available
  – http://base.vectorbase.org:9999/biomart/martview
• Improvements still needed:
  – Experiment annotations
  – Alignments (i.e. handle split alignments)
• Federation with current marts
• Integration with new data?


Current challenges and future plans

• How do you want to query?

• CVs & ontologies

• APIs

• Community submission

• Manual annotation


Querying strategy

• What do you want to query on?
  – Fetch all genes upregulated under condition X
  – Fetch all experiments with gene X and condition Y
  – Fetch all probes with expression similar to probe X
• All essentially boil down to:
  – Define probe (genes, etc.)
  – Define significant expression
    • ANOVA?
    • Up/down-regulation with respect to what?
  – Define experimental conditions
    • Sample annotation
    • Experimental design
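The three “define” steps map naturally onto the parameters of a query function. In this hypothetical sketch, the probe definition is a `probe_to_gene` lookup. Significant expression is a plain threshold (mean log2 ratio against the reference of at least 1, i.e. two-fold, over at least two replicates) rather than ANOVA or a moderated test. The experimental condition is a label assumed to come from sample annotation. All names and cut-offs are placeholders.

```python
from statistics import mean


def upregulated_genes(measurements, probe_to_gene, condition,
                      min_log2_ratio=1.0, min_reps=2):
    """Return genes with at least one probe up-regulated under `condition`.

    measurements: (probe_id, condition, log2_ratio) rows; log2_ratio is
                  relative to the experiment's reference sample.
    probe_to_gene: {probe_id: gene_id}; unmapped probes are skipped.
    A probe counts as up-regulated if it has >= min_reps replicate values
    under `condition` and their mean is >= min_log2_ratio.
    """
    values = {}
    for probe_id, cond, ratio in measurements:
        if cond == condition:
            values.setdefault(probe_id, []).append(ratio)
    genes = set()
    for probe_id, ratios in values.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None and len(ratios) >= min_reps and mean(ratios) >= min_log2_ratio:
            genes.add(gene)
    return sorted(genes)


measurements = [
    ("p1", "X", 1.4), ("p1", "X", 1.1),
    ("p2", "X", 2.0),                     # only one replicate
    ("p3", "X", 0.3), ("p3", "X", 0.2),   # below threshold
    ("p4", "Y", 3.0), ("p4", "Y", 2.5),   # different condition
    ("p5", "X", 1.5), ("p5", "X", 1.2),   # probe not mapped to a gene
]
probe_to_gene = {"p1": "geneA", "p2": "geneB", "p3": "geneC", "p4": "geneD"}
print(upregulated_genes(measurements, probe_to_gene, "X"))  # ['geneA']
```

Each of the slide's open questions (which probes count for a gene, what “significant” means, what the reference is) becomes an explicit argument, which is arguably the main point of framing queries this way.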

[Diagram: the same overall architecture, with a CV / ontology layer added]

[Diagram: the same architecture annotated with programmatic access points: Array API?, AE API?, e! API, and MartJ / MQL]

Array API

• Perl / Java objects for retrieval / handling of array data
  – Dual purpose:
    • Consistency & efficiency of the VB expression website
    • Computational access to VB data for all
  – Objects must be:
    • General, DB-independent
    • Compatible with pre-existing Bio APIs (BioPerl / BioJava)
  – NB: there may be a pre-existing solution:
    • ArrayExpress API?
    • BioPerl-Expression?
    • MAGE-OM-stk
• http://neuron.cse.nd.edu/vectorbase/index.php/Array_API_proposal
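The “general, DB-independent” requirement can be illustrated with an abstract interface that both the website and user scripts code against, with interchangeable storage backends behind it. The proposal targets Perl and Java; Python is used here only for consistency with the other examples. Every class and method name is hypothetical, not taken from the Array API proposal, BioPerl or BioJava.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Probe:
    probe_id: str
    array_design: str
    gene_ids: tuple = ()  # genes the probe maps to; empty if unmapped


@dataclass
class Experiment:
    experiment_id: str
    description: str
    # probe_id -> normalised log2 ratios, one per hybridisation
    values: dict = field(default_factory=dict)


class ExpressionStore(ABC):
    """Storage-independent access: callers never see SQL or file formats."""

    @abstractmethod
    def get_experiment(self, experiment_id): ...

    @abstractmethod
    def probes_for_gene(self, gene_id): ...

    def expression_for_gene(self, experiment_id, gene_id):
        """Generic helper written only against the abstract methods above."""
        experiment = self.get_experiment(experiment_id)
        return {p.probe_id: experiment.values.get(p.probe_id, [])
                for p in self.probes_for_gene(gene_id)}


class InMemoryStore(ExpressionStore):
    """One possible backend; a database-backed class would implement the same two methods."""

    def __init__(self, probes, experiments):
        self._probes = list(probes)
        self._experiments = {e.experiment_id: e for e in experiments}

    def get_experiment(self, experiment_id):
        return self._experiments[experiment_id]

    def probes_for_gene(self, gene_id):
        return [p for p in self._probes if gene_id in p.gene_ids]


store = InMemoryStore(
    [Probe("p1", "array1", ("geneA",)), Probe("p2", "array1", ("geneA", "geneB")),
     Probe("p3", "array1")],
    [Experiment("E1", "example experiment", {"p1": [1.2, 0.9], "p2": [0.1]})],
)
print(store.expression_for_gene("E1", "geneA"))
```

Because `expression_for_gene` depends only on the abstract methods, swapping the backend does not change any calling code, which is the property the slide asks for.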


Community data submission

• Carrot?
  – Help with ArrayExpress submission
  – Analysis tools
  – Dissemination
• Stick?
  – Outreach (courses, conferences)
  – Networking


GE data manual annotators

• Gene-build designed arrays
  – Negative evidence less compelling
• EST clone-based arrays
  – http://tinyurl.com/vlkwo


Longer term plans

• Host-parasite GE data integration & analysis
• GE clusters → “upstream” regions → regulatory elements, upstream TFs
• RNAi phenotypes, images


CVs & ontologies

• Integrate MGED and specialist ontologies for:
  – Body parts
  – Developmental stages
  – Disease processes
  – …

• Allows comparison across experiments with similar experimental conditions
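One way ontologies allow such comparisons is by expanding a query term to all of its more specific terms before matching experiment annotations. The hierarchy and term names below are a toy, hypothetical example. Real annotations would carry MGED Ontology and specialist-ontology terms and identifiers, and the function names are illustrative only.

```python
def descendants(term, children):
    """All terms at or below `term` in a parent -> children hierarchy (iterative DFS)."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, ()))
    return seen


def experiments_matching(term, children, annotations):
    """Experiments annotated with `term` or any more specific term."""
    wanted = descendants(term, children)
    return sorted(e for e, terms in annotations.items() if wanted & set(terms))


# Toy body-part hierarchy (hypothetical, for illustration only).
children = {
    "alimentary canal": ["foregut", "midgut", "hindgut"],
    "midgut": ["anterior midgut", "posterior midgut"],
}
annotations = {
    "E1": ["posterior midgut"],
    "E2": ["salivary gland"],
    "E3": ["hindgut", "adult female"],
}
print(experiments_matching("alimentary canal", children, annotations))  # ['E1', 'E3']
```

Without the expansion step, a query for “alimentary canal” would miss both experiments, because neither is annotated with that exact term.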

BioMart

Most BioMarts:

• Gene-based
• Mostly ‘binary’ data
  – e.g. a gene either has a signal domain or doesn’t
• Easily linked with other (gene-based) BioMarts

VB BioMart:

• Probe-based
  – Many probes not aligned
• Expression data less clear-cut
  – e.g. how to define ‘differential expression’
• Exports gene/transcript IDs for linking to other marts


Clustering

• A priority?
• Easy to do at reporter level within experiments
• Harder to do at gene level across all experiments
  – Binary gene profile: “yes/no differentially expressed in experiment”?
• Amazon-style links to “genes which may have similar expression profiles”?
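The binary gene profile idea can be sketched by representing each gene as the set of experiments in which it was called differentially expressed, then ranking other genes by Jaccard similarity. Jaccard is a measure chosen here for illustration; the slide does not specify one. Gene and experiment IDs are hypothetical, and how each yes/no call is made is left open, as on the slide.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_genes(gene, profiles, top_n=3, min_score=0.0):
    """Rank other genes by overlap of their binary 'DE in experiment' profiles.

    profiles: {gene_id: set of experiment_ids where the gene was called
               differentially expressed}.
    Returns up to top_n (gene_id, score) pairs with score > min_score.
    """
    target = profiles[gene]
    scored = [(jaccard(target, exps), other)
              for other, exps in profiles.items() if other != gene]
    scored = [s for s in scored if s[0] > min_score]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best first, ties by gene ID
    return [(other, round(score, 3)) for score, other in scored[:top_n]]


profiles = {
    "geneA": {"E1", "E2", "E3"},
    "geneB": {"E1", "E2"},
    "geneC": {"E3", "E4"},
    "geneD": {"E5"},
}
print(similar_genes("geneA", profiles))  # [('geneB', 0.667), ('geneC', 0.25)]
```

Set-based profiles sidestep the cross-platform scaling problem of comparing raw ratios across experiments, at the cost of discarding effect size.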


BASE 2.x

• Adoption delayed, now in progress

• Brings Affymetrix support

• Cleaner/modern interface

• Better API (Java)
