247th ACS Meeting: The Eureka Research Workbench

Post on 01-Dec-2014

DESCRIPTION

Academic scientists need a tool to capture the science they do so that it can be shared as open science, integrated with linked data, and searched. Eureka is an evolving platform for doing this.

TRANSCRIPT

Eureka Research Workbench: An Open Source eScience Laboratory Notebook

Stuart J. Chalk
Department of Chemistry
University of North Florida

schalk@unf.edu

2014 Spring ACS Meeting – CINF Paper 38

Big Data
Electronic Notebooks
The Eureka Research Workbench
Experiment Markup Language
ExptML Schema and Files
Semantic Data and Ontologies
File Storage
Eureka Interface
Web Interface
Conclusion

Outline

Current buzzword for “bring together lots of data and build tools on top to extract knowledge”

This is great, except… …how do we do that for science?

Platform, data structures, and exchange protocols to capture, identify, and disseminate scientific information

Research Data Alliance (https://rd-alliance.org/): “Research Data Sharing without barriers”
Fran Berman at RPI is the NSF-funded co-chair of RDA

Big Data

Scientists need to move to digital notebooks…

...and record not just the data but the flow and context

How science is done is important for searching, aggregation, meta-analysis

We need more than an electronic version of a notebook

We need a science version of “Second Life” (SciLife?)

Electronic Notebooks

Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project

Store all research notes/data in a digital format
Capture the workflow of scientists
Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world
How to capture information? Many data types! (ExptML)
How to store files “online”? (Fedora-Commons)
How to access files in the browser? (CakePHP)
How to represent laboratory resources? (ExptML)
How to link data together? RDF (in Fedora-Commons)

Eureka Research Workbench

A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)

Experiment Markup Language (ExptML)

Annotation, Api, Calculation, Chemical, Citation, Customer, Data, Dataset, Definition, Element,
Equipment, Event, Experiment, Group, Message, Project, Protocol, Quote, Report, Result,
Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor
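
Each of the information types listed above is described in XML, so a record can be read with any standard XML parser. The following is a minimal sketch of that idea only: the element and attribute names (`chemical`, `name`, `formula`, `vendor`, `ref`) and the helper `read_chemical` are invented for illustration and are not taken from the published ExptML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: these element and attribute names are invented for
# illustration and are NOT taken from the published ExptML schema.
CHEMICAL_XML = """\
<chemical id="exptml:chm1">
  <name>Sodium chloride</name>
  <formula>NaCl</formula>
  <vendor ref="exptml:vnd1"/>
</chemical>
"""


def read_chemical(xml_text):
    """Return the fields of one chemical record as a plain dictionary."""
    root = ET.fromstring(xml_text)
    vendor = root.find("vendor")
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "formula": root.findtext("formula"),
        # A reference to another record (here a Vendor), not a copy of it.
        "vendor_ref": vendor.get("ref") if vendor is not None else None,
    }


chemical = read_chemical(CHEMICAL_XML)
print(chemical)
```

Storing a reference to the vendor record rather than repeating its details is one way the separate types could point at each other; whether ExptML does it this way is an assumption of the sketch.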

ExptML Chemical Schema

ExptML Chemical (Instance)

Data are connected to other data – ‘Linked Data’ (http://www.w3.org/standards/semanticweb/data)
The ‘Semantic Web’ approach to contextualize data
Proposed storage of ‘relationships’ between data is the Resource Description Framework (RDF - http://www.w3.org/RDF/)

Semantic Data
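
In RDF a relationship between two pieces of data is written as a subject, predicate, object triple. The sketch below shows the idea with plain Python tuples and serialises them in N-Triples syntax. The `http://example.com/` identifiers and the predicate names (`usesChemical`, `usesEquipment`, `suppliedBy`) are hypothetical, chosen only to illustrate how an experiment might be linked to its resources.

```python
# Hypothetical identifiers and predicates, used only for illustration.
BASE = "http://example.com/"
VOCAB = "http://example.com/vocab#"

triples = [
    (BASE + "experiments/exp1", VOCAB + "usesChemical", BASE + "chemicals/chm1"),
    (BASE + "experiments/exp1", VOCAB + "usesEquipment", BASE + "equipment/eqp1"),
    (BASE + "chemicals/chm1", VOCAB + "suppliedBy", BASE + "vendors/vnd1"),
]


def to_ntriples(rows):
    """Serialise (subject, predicate, object) IRIs in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


def related_to(rows, subject):
    """Return the (predicate, object) pairs recorded for one subject."""
    return [(p, o) for s, p, o in rows if s == subject]


print(to_ntriples(triples))
links = related_to(triples, BASE + "experiments/exp1")
```

Because every relationship has the same three-part shape, following links from one record to the next is a simple filter over the triples, which is what makes this form convenient for search and aggregation.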

Digital repository software (http://fedora-commons.org/)
Creation and management of online digital libraries
Fedora ‘Digital Object’ consists of metadata + streams
Metadata stored as Dublin Core (DC stream)
ExptML file stored as EXPTML stream
Other files (PDFs, images, Word etc.) stored as streams
Relationships stored as RDF (RELS-EXT stream)
Features: version control, checksumming, archiving
Built-in search of objects and relationships
Add-on for file content search (Fedora GSearch)

Fedora Commons
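
The "metadata + streams" layout described above can be pictured as one identifier with a set of named streams attached, each of which can be checksummed. The sketch below is an in-memory model only, not the Fedora-Commons API: the class names `Stream` and `DigitalObject`, the sample contents, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Stream:
    stream_id: str   # e.g. "DC", "EXPTML", "RELS-EXT"
    mime_type: str
    content: bytes

    def checksum(self):
        """SHA-256 digest of the content, used to detect later changes."""
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class DigitalObject:
    pid: str
    streams: dict = field(default_factory=dict)

    def add(self, stream):
        """Attach a stream, replacing any existing one with the same id."""
        self.streams[stream.stream_id] = stream

    def verify(self, stream_id, expected):
        """True if the stored content still matches a recorded checksum."""
        return self.streams[stream_id].checksum() == expected


obj = DigitalObject(pid="exptml:chm1")
obj.add(Stream("DC", "text/xml", b"<dc><title>Sodium chloride</title></dc>"))
obj.add(Stream("EXPTML", "text/xml", b"<chemical id='exptml:chm1'/>"))
obj.add(Stream("RELS-EXT", "application/rdf+xml", b"<rdf/>"))

recorded = obj.streams["EXPTML"].checksum()
print(sorted(obj.streams), obj.verify("EXPTML", recorded))
```

Recording the digest when a stream is stored and comparing it later is the basic idea behind checksumming: any change to the bytes produces a different digest.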

Fedora-Commons defines and works on digital objects

In the definition of a Fedora object, an ExptML file is just one stream of many. By default each object also has a “DC” stream of metadata and a “RELS-EXT” stream of relationships

Each Fedora object can have any number of additional streams for
Paper PDFs, product/sample pictures, binary file formats (if a conversion has been done)
Video, audio, RDF, anything…

You can export individual streams or the whole Fedora object with streams binary encoded (Sharing/archiving)

Fedora for File Storage
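
Exporting a whole object "with streams binary encoded" means turning arbitrary bytes into text that can sit safely inside one file. The sketch below illustrates that with Base64 and a round trip back to the original bytes. The XML element names (`object`, `stream`) and the functions `export_object` and `import_object` are hypothetical and do not reproduce Fedora's own export format.

```python
import base64
import xml.etree.ElementTree as ET


def export_object(pid, streams):
    """Pack every stream of one object into a single XML string.

    Element names are hypothetical; they are not Fedora's export format.
    """
    root = ET.Element("object", {"pid": pid})
    for stream_id, content in streams.items():
        node = ET.SubElement(root, "stream", {"id": stream_id})
        # Base64 makes arbitrary bytes (PDFs, images) safe to embed in XML.
        node.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def import_object(xml_text):
    """Reverse export_object: return (pid, {stream_id: bytes})."""
    root = ET.fromstring(xml_text)
    streams = {
        node.get("id"): base64.b64decode(node.text or "")
        for node in root.findall("stream")
    }
    return root.get("pid"), streams


original = {
    "EXPTML": b"<chemical/>",
    "PHOTO": bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF]),  # non-text bytes
}
packed = export_object("exptml:chm1", original)
pid, restored = import_object(packed)
print(pid, restored == original)
```

A single self-describing file of this kind is convenient for sharing or archiving, at the cost of roughly a one-third size increase from the Base64 encoding.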

Fedora Object Storage

Web interface written in PHP using the CakePHP Framework

Communicates with Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files

Representational State Transfer (REST) format for URLs, e.g. http://example.com/chemicals/view/exptml:chm1
Creation of ExptML via interface
Provides search via Fedora and GSearch
Can extract data out of XML files
Can gather data from other websites (via API controller) and integrate into ExptML files

Eureka Web Application
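
The example URL above reads naturally as three path segments: the kind of record, the action, and the identifier of the stored object. The sketch below splits a URL on that assumption. The function name `parse_rest_url` and the three-segment rule are illustrative guesses at the routing, not a description of Eureka's actual code.

```python
from urllib.parse import urlparse


def parse_rest_url(url):
    """Split a /controller/action/identifier URL into its three parts."""
    path = urlparse(url).path
    parts = [p for p in path.split("/") if p]
    if len(parts) != 3:
        raise ValueError(f"expected /controller/action/id, got {path!r}")
    controller, action, identifier = parts
    return {"controller": controller, "action": action, "id": identifier}


route = parse_rest_url("http://example.com/chemicals/view/exptml:chm1")
print(route)
```

Under this reading, `chemicals` selects the data type, `view` selects the operation, and `exptml:chm1` names the object to fetch from the repository.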

Eureka Website – Group View

Only data types related to the research group show up on the left

Eureka Website – Bench View

Clicking on the “Add” menu on the right allows you to add a comment or link to data

Eureka Website – Notebook View

Eureka Website – Laboratory View

The “Rel” menu shows you the information related to this instrument

Eureka Website – Library View

You can add the PDF of the paper to the citation. The contents of the PDF are searchable in the system

Eureka Website – Stockroom View

Web Application
  Server: Fedora 4, JSON-LD, ElasticSearch
  Client: CakePHP 3/HTML5, Recline.js, Annotator, jQuery

Standards
  Linked Data Platform (http://www.w3.org/TR/ldp/)
  Datapackage/Simple Data Format (http://dataprotocols.org/)
  Markup Languages: AnIML, UnitsML, CML
  Other Molecular File Formats: MOL/SDF/CDX/CIF/PDB etc.
  Open Framework for Laboratory Data (Allotrope Foundation)

Datasources
  ChemSpider, CIR, PubChem, Google Scholar, CrossRef, VIVO
  ExchangeNetwork (EPA), NIST, SDBS (no APIs yet)

Tools
  Marvin for JS, JSXGraph, JSpecView, Chemicalize.org

Eureka Technology Stack
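
Several of the data sources above are reached through simple web requests. As one illustration, the sketch below builds a lookup URL for CIR on the assumption that the service takes a chemical identifier and a requested representation as two path segments after the base URL listed in the references. The helper name `cir_url` and the `smiles` representation are assumptions of this sketch; no request is actually sent.

```python
from urllib.parse import quote

# Base URL as given in the references for CIR.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a lookup URL of the form <base>/<identifier>/<representation>.

    The two-segment layout is an assumption of this sketch.
    """
    # Percent-encode each segment so spaces and slashes cannot break the path.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{quote(representation, safe='')}"


url = cir_url("sodium chloride", "smiles")
print(url)
```

A web application could fetch such a URL and copy the returned value into the matching field of a chemical record, which is the kind of integration the API controller is described as providing.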

Implement ingest of all data types, file (if appropriate) and web based
In-browser processing of data -> dataset -> result, report writing
Extraction of file-based legacy data -> ExptML format data
Open access to data/spectra, ‘available data’ page (browser only)
Access to data/spectra via linked data server (discovery/indexing)
Publishing of packaged datasets with authenticated download option
Automated ingestion of data from instruments/sensors
Collaborative research: authentication and data exchange

Timeframe? Depends on securing funding

Eureka Roadmap

Eureka: Web application to create ExptML files
Built on ExptML to capture data/resources/workflows
Reliable storage/archiving system for ExptML files (Fedora)
Storage of relationships between data (RDF)

TODO
Provide mechanism for sharing of data (different levels)
Add tools to find, visualize and work on science data
Integration into the RDA model for sharing research data
Get the word out and test system with many users

Conclusion

References
Eureka – http://sourceforge.net/projects/eureka
Fedora-Commons – http://fedora-commons.org
XML – http://www.w3.org/standards/xml
AnIML – http://animl.sourceforge.net
ExptML – http://exptml.sourceforge.net/
UnitsML – http://unitsml.nist.gov/
CML – http://www.xml-cml.org/
JSON-LD – http://www.w3.org/TR/json-ld/
RDF – http://www.w3.org/RDF/
CIR – http://cactus.nci.nih.gov/chemical/structure
RDA – http://rd-alliance.org
ChemSpider – http://www.chemspider.com/
Allotrope Foundation – http://allotrope.org
