
Support for MAGE-TAB in caArray 2.0: Overview and feedback

MAGE-TAB Workshop

January 24, 2008

Agenda

• Brief overview of caArray 2.0

• caArray 2.0 and MAGE-TAB

• MAGE-TAB feedback

What is caArray?

• caArray is a caBIG™-compliant microarray data repository at the NCICB

• Developed to support a federated model of microarray data sharing

• Developed in line with MIAME and MAGE guidelines

caArray 1.6 caArray 2.0

Goals of caArray 2.0

• Address Adopter feedback gained from our 1.x experience

• Improve the user experience for storing and retrieving data produced

• Simplify and improve the performance of data access through the API and grid service, for analytical applications

• Harmonize with caBIG™ tissue repository (caTissue) and annotation repository (caBIO)

• Support additional array platforms, including SNP arrays

• Organize the application around workflow between investigators and the labs that serve them

• Use an agile software development approach that will allow more frequent feature additions and better responsiveness to the user community

Features of caArray 2.0

• Store array data associated with experiment and sample annotations

• Data entry through graphical user interface or MAGE-TAB

• Parse Affymetrix, Illumina and GenePix formats for expression and SNP arrays

• Role-based permissions for data access

• Programmatic access via a Java API and grid service

• Manage protocols and controlled vocabularies

• MGED Ontology 1.3.1 comes pre-loaded

• Basic Browse and Search Functionality

caArray 2.0 Annotations

• Capture information for

• Experiment information

• Contacts

• Publications

• Sample Annotations: Source, Sample, Extract, Labeled Extracts, Hybridizations

caArray 2.0 supported formats

Parsable file formats

• Annotation: MAGE-TAB (.ADF, .IDF, .SDRF)

• Array data (parsed)

• Affymetrix Expression and SNP: .CDF, .CEL, .CHP

• Illumina Expression and SNP: .CSV

• GenePix: .GAL, .GPR

Unparsed formats

• Affymetrix: .dat, .exp, .rpt, .txt

• Illumina: .txt, .idat

• Agilent: .txt, .tsv

• ImaGene: .txt, .tiv

• Nimblegen: .txt, .gff
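The parsed/unparsed split above can be pictured as a lookup on file extension. The following is a minimal sketch only: the function name `classify` and the idea of deciding by extension alone are assumptions for illustration, not a description of how caArray itself recognizes files.

```python
from pathlib import Path

# Extensions listed above as parsed, mapped to the format family.
PARSED = {
    ".adf": "MAGE-TAB", ".idf": "MAGE-TAB", ".sdrf": "MAGE-TAB",
    ".cdf": "Affymetrix", ".cel": "Affymetrix", ".chp": "Affymetrix",
    ".csv": "Illumina",
    ".gal": "GenePix", ".gpr": "GenePix",
}

# Extensions listed above as stored but not parsed. Several vendors share
# ".txt", so the vendor cannot be inferred from the extension alone.
UNPARSED = {".dat", ".exp", ".rpt", ".txt", ".idat", ".tsv", ".tiv", ".gff"}


def classify(filename):
    """Return (status, family) for a file name, using its extension only.

    Hypothetical helper: deciding by extension is an assumption of this sketch.
    """
    ext = Path(filename).suffix.lower()
    if ext in PARSED:
        return ("parsed", PARSED[ext])
    if ext in UNPARSED:
        return ("unparsed", None)
    return ("unknown", None)
```

A call such as `classify("sample1.CEL")` would report the file as parsed Affymetrix data, while `classify("scan.idat")` would report it as stored unparsed.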

caArray 2.0 permissions

• Role-based permissions for each Installation

• Anonymous user

• System Administration

• Principal Investigator/Biostatistician/Lab Administrator/Lab Scientist

• Data is Private until made Public

• Experiment title, PI, # samples are visible but experiment content is not available to the anonymous user

• Collaboration groups can be managed by the PI for pre-public collaboration

• CSM 4.0

• Experiment-level and samples-level security
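The visibility rule described above (summary fields always visible, content only after publication or through a collaboration group) can be sketched as follows. All class, field, and function names here are hypothetical; the sketch does not use CSM and is not caArray's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    # Hypothetical model of the fields named on the slide.
    title: str
    pi: str
    sample_count: int
    public: bool = False                              # private until made public
    collaborators: set = field(default_factory=set)   # managed by the PI
    content: dict = field(default_factory=dict)


def view(exp, user=None):
    """Return what `user` may see; `user=None` stands for the anonymous user."""
    # Title, PI and number of samples are visible to everyone.
    visible = {"title": exp.title, "pi": exp.pi, "samples": exp.sample_count}
    is_member = user is not None and (user == exp.pi or user in exp.collaborators)
    if exp.public or is_member:
        visible["content"] = exp.content
    return visible
```

With a private experiment, `view(exp)` returns only the three summary fields, while `view(exp, "some-collaborator")` also includes the content if that user is in the collaboration group.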

caArray 2.0 API and Grid Service

• Support for MAGE-TAB level of annotation – Simplified implementation of MAGE

• API provides a data service and analytical services

• Data service allows users to use CQL to issue queries that traverse the domain model

• Analytical services provide convenience methods for data access
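To give a feel for what a CQL query against the data service might look like, the sketch below builds a small CQL document as XML. Everything specific in it is an assumption to be checked against the caGrid and caArray documentation: the namespace, the `Target`/`Attribute` element layout, the `EQUAL_TO` predicate, and the domain class name are not taken from these slides.

```python
import xml.etree.ElementTree as ET

# Assumed CQL 1 namespace; verify against the caGrid schema before use.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def build_cql(target_class, attribute, value, predicate="EQUAL_TO"):
    """Build a one-attribute CQL query as an XML string (illustrative only)."""
    query = ET.Element("CQLQuery", {"xmlns": CQL_NS})
    target = ET.SubElement(query, "Target", {"name": target_class})
    ET.SubElement(
        target,
        "Attribute",
        {"name": attribute, "predicate": predicate, "value": value},
    )
    return ET.tostring(query, encoding="unicode")


# Hypothetical domain class and attribute names.
cql_xml = build_cql(
    "gov.nih.nci.caarray.domain.project.Experiment", "title", "My experiment"
)
```

The resulting string would then be handed to whatever client the API or grid service provides; that client call is deliberately left out because its signature is not given here.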

caArray 2.0 browse and search

• Browse by: Experiments, Organism, Provider, Array design

• Search by specifying: Keyword, Category

MAGE-TAB in caArray 2.0

• Support MAGE-TAB v1.0 – ADF, IDF, SDRF

• Term Source providers and associated Terms are captured as Controlled Vocabularies (Manage Vocabularies)

• Protocols imported and viewable in Manage Protocols

• Characteristics displayed on the relevant detail pages

• Original files are stored in association with the Experiment

• Edits made to the information in the UI are not reflected in these files

• Future feature – MAGE-TAB export based on current database values
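As an illustration of where the Term Sources and Protocols mentioned above come from, here is a minimal sketch that reads an IDF as tab-delimited text, one tag per row. The sample content and the helper name `parse_idf` are made up for the example; the tag names follow MAGE-TAB v1.0, and this is not caArray's importer.

```python
import csv
import io


def parse_idf(text):
    """Map each IDF tag (first column) to its list of values.

    Simplification: empty cells are dropped, so positional alignment
    between related rows is not preserved.
    """
    fields = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        fields[row[0].strip()] = [v.strip() for v in row[1:] if v.strip()]
    return fields


# Made-up IDF fragment for illustration.
EXAMPLE_IDF = (
    "Investigation Title\tExample experiment\n"
    "Protocol Name\tP-EXTRACT\tP-LABEL\n"
    "Term Source Name\tMO\n"
    "Term Source Version\t1.3.1\n"
)

idf = parse_idf(EXAMPLE_IDF)
term_sources = idf.get("Term Source Name", [])  # would feed Manage Vocabularies
protocols = idf.get("Protocol Name", [])        # would feed Manage Protocols
```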

MAGE-TAB for data migration

caArray 1.6 >> caArray 2.0

• Experiments in caArray 1.6 being migrated to 2.0 are being exported in MAGE-TAB format along with the associated native array data files

• Challenges included

• MAGE-OM >> MAGE-TAB mapping

• Most challenges due to validation that all data “made it” over (not really a MAGE-TAB issue)

• Manual checking still needed

Jackson Labs internal MAD database >> caArray 2.0

MAGE-TAB Feedback

• Initial experience with end-user-type customers is that there is a learning curve associated with using the SDRF, especially with regard to applying controlled vocabularies

• Need tools to facilitate this

• Source vs. Sample vs. Extract vs. Labeled Extract

• Often confusion over “what goes where”

• From Jackson Labs:

• Documentation is good for a biologist-type end-user, but a software engineer would like more detail

• More real-life examples would be helpful
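One small example of "what goes where": in an SDRF row, the biomaterial columns read left to right as Source, Sample, Extract, Labeled Extract, Hybridization. The sketch below uses a made-up single-row SDRF and a hypothetical helper, `material_chains`, to print that chain; the names and values are illustrative only.

```python
import csv
import io

# Node columns in the order they appear across an SDRF row.
CHAIN = [
    "Source Name",
    "Sample Name",
    "Extract Name",
    "Labeled Extract Name",
    "Hybridization Name",
]

# Made-up SDRF: one header line and one data row.
EXAMPLE_SDRF = (
    "Source Name\tCharacteristics[OrganismPart]\tTerm Source REF\t"
    "Sample Name\tExtract Name\tLabeled Extract Name\tLabel\t"
    "Hybridization Name\n"
    "mouse-1\tliver\tMO\t"
    "liver-biopsy-1\trna-1\trna-1-biotin\tbiotin\t"
    "hyb-1\n"
)


def material_chains(sdrf_text):
    """Return, per row, the (column, name) pairs along the material chain."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    return [[(col, row[col]) for col in CHAIN if row.get(col)] for row in reader]


for chain in material_chains(EXAMPLE_SDRF):
    print(" -> ".join(f"{col}: {name}" for col, name in chain))
```

In this row, the controlled-vocabulary term ("liver", from the term source "MO") is attached to the Source through the `Characteristics[OrganismPart]` column that follows it.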

Specific requests to consider

• Need a way to specify required fields for particular implementations

• caArray UI has certain required fields – need to be able to specify these in a MAGE-TAB template

• Associate “Supplemental” files with an experiment

• In IDF, recommend adding a field to specify the type of array experiment (Gene Expression, SNP, aCGH, etc.)
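The first and last requests above could be combined into a simple template check. This is a sketch under stated assumptions: the required-tag list is hypothetical (it is not caArray's actual set of required fields), and "Experiment Type" is an invented tag name standing in for the proposed IDF field, which MAGE-TAB v1.0 does not define.

```python
# Hypothetical per-implementation list of required IDF tags.
REQUIRED_IDF_TAGS = ["Investigation Title", "Person Last Name", "Experiment Type"]

# Example values taken from the request above; the real list would be longer.
ALLOWED_EXPERIMENT_TYPES = {"Gene Expression", "SNP", "aCGH"}


def missing_required(idf_fields, required=REQUIRED_IDF_TAGS):
    """Return the required tags that are absent or have no values."""
    return [tag for tag in required if not idf_fields.get(tag)]


def check_experiment_type(idf_fields):
    """True if the (hypothetical) Experiment Type tag holds only allowed values."""
    values = idf_fields.get("Experiment Type", [])
    return bool(values) and all(v in ALLOWED_EXPERIMENT_TYPES for v in values)
```

Here `idf_fields` is assumed to be a mapping from IDF tag to a list of values; an importer could report the result of `missing_required` back to the submitter before accepting the files.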
