Support for MAGE-TAB in caArray 2.0: Overview and Feedback
MAGE-TAB Workshop, January 24, 2008
What is caArray?
• caArray is a caBIG™-compliant microarray data repository at the NCICB
• Developed to support a federated model of microarray data sharing
• Developed in line with MIAME and MAGE guidelines
caArray 1.6 caArray 2.0
Goals of caArray 2.0
• Address Adopter feedback gained from our 1.x experience
• Improve the user experience for storing and retrieving data produced
• Simplify and improve the performance of data access through the API and grid service, for analytical applications
• Harmonize with caBIG™ tissue repository (caTissue) and annotation repository (caBIO)
• Support additional array platforms, including SNP arrays
• Organize the application around workflow between investigators and the labs that serve them
• Use an agile software development approach that will allow more frequent feature additions and better responsiveness to the user community
Features of caArray 2.0
• Store array data associated with experiment and sample annotations
• Data entry through graphical user interface or MAGE-TAB
• Parse Affymetrix, Illumina and GenePix formats for expression and SNP arrays
• Role-based permissions for data access
• Programmatic access via a Java API and grid service
• Manage protocols and controlled vocabularies
• MGED Ontology 1.3.1 comes pre-loaded
• Basic Browse and Search Functionality
caArray 2.0 Annotations
• Capture information for
• Experiment information
• Contacts
• Publications
• Sample Annotations
• Source
• Sample
• Extract
• Labeled Extracts
• Hybridizations
caArray 2.0 supported formats
Parsable file formats
• Annotation
• MAGE-TAB: ADF, IDF, SDRF
• Array data (parsed)
• Affymetrix Expression and SNP: .CDF, .CEL, .CHP
• Illumina Expression and SNP: .CSV
• GenePix: .GAL, .GPR
Unparsed formats
• Affymetrix: .dat, .exp, .rpt, .txt
• Illumina: .txt, .idat
• Agilent: .txt, .tsv
• ImaGene: .txt, .tiv
• Nimblegen: .txt, .gff
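The parsed/unparsed split above can be pictured as a simple extension lookup. The following is a minimal sketch, not caArray's actual import logic: the extension sets are copied from the slide, while the `classify` function and the table layout are assumptions made for illustration.

```python
from pathlib import PurePath

# Extension sets copied from the slide; the lookup itself is illustrative only.
PARSED = {
    "Affymetrix": {".cdf", ".cel", ".chp"},
    "Illumina": {".csv"},
    "GenePix": {".gal", ".gpr"},
}
UNPARSED = {
    "Affymetrix": {".dat", ".exp", ".rpt", ".txt"},
    "Illumina": {".txt", ".idat"},
    "Agilent": {".txt", ".tsv"},
    "ImaGene": {".txt", ".tiv"},
    "Nimblegen": {".txt", ".gff"},
}


def classify(filename):
    """Return (status, candidate_vendors) for a file name (hypothetical helper)."""
    ext = PurePath(filename).suffix.lower()
    # Parsed formats take priority over unparsed ones.
    vendors = sorted(v for v, exts in PARSED.items() if ext in exts)
    if vendors:
        return ("parsed", vendors)
    vendors = sorted(v for v, exts in UNPARSED.items() if ext in exts)
    if vendors:
        return ("unparsed", vendors)
    return ("unknown", [])
```

Note that an extension such as .txt is listed for several vendors, so an extension alone cannot identify the platform; the sketch returns every candidate vendor in that case.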
caArray 2.0 permissions
• Role-based permissions for each Installation
• Anonymous user
• System Administration
• Principal Investigator/Biostatistician/Lab Administrator/Lab Scientist
• Data is Private until made Public
• Experiment title, PI, # samples are visible but experiment content is not available to the anonymous user
• Collaboration groups can be managed by the PI for pre-public collaboration
• CSM 4.0
• Experiment-level and sample-level security
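The visibility rule described above (summary fields always visible, content hidden until the experiment is public or shared with a collaborator) can be sketched as follows. This is an illustration only and does not use CSM; the `Experiment` class, its field names, and `visible_fields` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    # Hypothetical model; caArray's real domain classes are not shown in the slides.
    title: str
    pi: str
    sample_count: int
    is_public: bool = False
    collaborators: set = field(default_factory=set)
    content: str = "array data and annotations"


def visible_fields(exp, user=None):
    """Return what `user` may see; user=None stands for the anonymous user."""
    # Title, PI and number of samples are visible to everyone.
    view = {"title": exp.title, "pi": exp.pi, "sample_count": exp.sample_count}
    # Content is exposed once public, or before that to the PI and collaborators.
    if exp.is_public or user == exp.pi or user in exp.collaborators:
        view["content"] = exp.content
    return view
```

For example, an anonymous call on a private experiment returns only the three summary fields, while a user in the collaboration group also receives the content.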
caArray 2.0 API and Grid Service
• Support for MAGE-TAB level of annotation – Simplified implementation of MAGE
• API provides a data service and analytical services
• Data service allows users to use CQL to issue queries that traverse the domain model
• Analytical services provide convenience methods for data access
caArray 2.0 browse and search
• Browse by
• Experiments
• Organism
• Provider
• Array design
• Search by specifying
• Keyword
• Category
MAGE-TAB in caArray 2.0
• Support MAGE-TAB v1.0 – ADF, IDF, SDRF
• Term Source providers and associated Terms are captured as Controlled Vocabularies (Manage Vocabularies)
• Protocols imported and viewable in Manage Protocols
• Characteristics displayed on the relevant detail pages
• Original files are stored in association with the Experiment
• Edits made to the information in the UI are not reflected in these files
• Future feature – MAGE-TAB export based on current database values
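To make the IDF side of this concrete: an IDF is a tab-delimited file in which each row starts with a tag followed by one or more values, and rows such as Term Source Name and Protocol Name carry the information that ends up under Manage Vocabularies and Manage Protocols. The reader below is a minimal sketch, not caArray's importer; `parse_idf` and the sample values are assumptions.

```python
import csv
import io


def parse_idf(text):
    """Read tab-delimited IDF text into {tag: [values]} (hypothetical helper)."""
    idf = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and rows without a tag in the first column.
        if not row or not row[0].strip():
            continue
        idf[row[0].strip()] = [v.strip() for v in row[1:] if v.strip()]
    return idf


# Made-up sample content using standard IDF tags.
SAMPLE_IDF = (
    "Investigation Title\tExample experiment\n"
    "Term Source Name\tMO\tNCBITaxon\n"
    "Protocol Name\tP-EXTRACT\tP-LABEL\n"
    "SDRF File\texample.sdrf.txt\n"
)

idf = parse_idf(SAMPLE_IDF)
print(idf["Term Source Name"])  # the term sources declared in this IDF
print(idf["Protocol Name"])     # the protocols declared in this IDF
```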
MAGE-TAB for data migration
caArray 1.6 >> caArray 2.0
• Experiments in caArray 1.6 being migrated to 2.0 are being exported in MAGE-TAB format along with the associated native array data files
• Challenges included
• MAGE-OM >> MAGE-TAB mapping
• Most challenges due to validation that all data “made it” over (not really a MAGE-TAB issue)
• Manual checking still needed
Jackson Labs internal MAD database >> caArray 2.0
MAGE-TAB Feedback
• Initial experience with end-user-type customers is that there is a learning curve associated with using the SDRF, especially with regard to applying controlled vocabularies
• Need tools to facilitate this
• Source vs. Sample vs. Extract vs. Labeled Extract
• Often confusion over “what goes where”
• From Jackson Labs:
• Documentation is good for a biologist-type end-user, but a software engineer would like more detail
• More real-life examples would be helpful
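On the "what goes where" question, it may help to read each SDRF row as one path from the biological source to the hybridization, with one column per node: Source Name, Sample Name, Extract Name, Labeled Extract Name, Hybridization Name. The sketch below prints that path for each row. The sample rows are made up, and a real SDRF usually has more columns (Characteristics, Protocol REF, and so on), some of them repeated, which `csv.DictReader` would not handle as-is.

```python
import csv
import io

# Node columns in the order material flows through an SDRF row.
NODE_COLUMNS = [
    "Source Name",
    "Sample Name",
    "Extract Name",
    "Labeled Extract Name",
    "Hybridization Name",
]


def material_chains(sdrf_text):
    """Return one 'source -> ... -> hybridization' string per SDRF row."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    chains = []
    for row in reader:
        # Keep only the node columns that are present and non-empty.
        chains.append(" -> ".join(row[c] for c in NODE_COLUMNS if row.get(c)))
    return chains


# Made-up example: one mouse, two tissue samples, one hybridization each.
SAMPLE_SDRF = (
    "\t".join(NODE_COLUMNS) + "\n"
    "mouse1\tliver1\trna1\trna1_biotin\thyb1\n"
    "mouse1\tkidney1\trna2\trna2_biotin\thyb2\n"
)

for chain in material_chains(SAMPLE_SDRF):
    print(chain)
```

In this reading, the source is the organism or specimen as obtained, the sample is a portion taken from it, the extract is the nucleic acid prepared from the sample, and the labeled extract is what is actually hybridized.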
Specific requests to consider
• Need a way to specify required fields for particular implementations
• caArray UI has certain required fields – need to be able to specify these in a MAGE-TAB template
• Associate “Supplemental” files with an experiment
• In IDF, recommend adding a field to specify the type of array experiment (Gene Expression, SNP, aCGH, etc.)
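One way to picture the "required fields" request is a per-installation list of IDF tags that must be present and non-empty before import. The check below is a sketch under that assumption; the required list is invented, and "Array Experiment Type" is a placeholder name for the proposed new field, not an existing MAGE-TAB tag.

```python
# Hypothetical per-installation requirements; not taken from caArray or the MAGE-TAB spec.
REQUIRED_TAGS = [
    "Investigation Title",
    "Person Last Name",
    "Array Experiment Type",  # placeholder for the proposed experiment-type field
]


def missing_required(idf, required=REQUIRED_TAGS):
    """Return the required tags that are absent or empty in a parsed IDF.

    `idf` is assumed to be a dict mapping each IDF tag to its list of values.
    """
    return [tag for tag in required if not idf.get(tag)]


example_idf = {
    "Investigation Title": ["Example experiment"],
    "Person Last Name": [],
}
print(missing_required(example_idf))
```

A template could carry such a list alongside the IDF so that a submitter sees which fields a given installation will insist on before uploading.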