247th ACS Meeting: The Eureka Research Workbench

Eureka Research Workbench: An Open Source eScience Laboratory Notebook. Stuart J. Chalk, Department of Chemistry, University of North Florida. 2014 Spring ACS Meeting – CINF Paper

Upload: stuchalk

Post on 01-Dec-2014

DESCRIPTION

Academic scientists need a tool to capture the science they do so that it can be shared as open science, integrated with linked data, and searched. Eureka is an evolving platform for doing this.

TRANSCRIPT

Page 1: 247th ACS Meeting: The Eureka Research Workbench

Eureka Research Workbench: An Open Source eScience Laboratory Notebook

Stuart J. Chalk
Department of Chemistry
University of North Florida

2014 Spring ACS Meeting – CINF Paper 38

Page 2: 247th ACS Meeting: The Eureka Research Workbench

Big Data
Electronic Notebooks
The Eureka Research Workbench
Experiment Markup Language
ExptML Schema and Files
Semantic Data and Ontologies
File Storage
Eureka Interface
Web Interface
Conclusion

Outline

Page 3: 247th ACS Meeting: The Eureka Research Workbench

Current buzzword for “bring together lots of data and build tools on top to extract knowledge”

This is great, except... how do we do that for science?

Platform, data structures, and exchange protocols to capture, identify, and disseminate scientific information

Research Data Alliance (https://rd-alliance.org/): “Research Data Sharing without barriers”
Fran Berman at RPI is the NSF-funded co-chair of RDA

Big Data

Page 4: 247th ACS Meeting: The Eureka Research Workbench

Scientists need to move to digital notebooks...

...and record not just the data but the flow and context

How science is done is important for searching, aggregation, meta-analysis

We need more than an electronic version of a notebook

We need a science version of “Second Life” (SciLife?)

Electronic Notebooks

Page 5: 247th ACS Meeting: The Eureka Research Workbench

Started in 2006 after getting involved in the Analytical Information Markup Language (AnIML) project

Store all research notes/data in a digital format
Capture the workflow of scientists
Writing in a lab notebook is equivalent to “multi-type” blogging in the digital world
How to capture information? Many data types! (ExptML)
How to store files “online”? (Fedora-Commons)
How to access files in the browser? (CakePHP)
How to represent laboratory resources? (ExptML)
How to link data together? RDF (in Fedora-Commons)

Eureka Research Workbench

Page 6: 247th ACS Meeting: The Eureka Research Workbench

A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)

Experiment Markup Language (ExptML)

Annotation, Api, Calculation, Chemical, Citation, Customer, Data, Dataset, Definition, Element,
Equipment, Event, Experiment, Group, Message, Project, Protocol, Quote, Report, Result,
Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor

Page 7: 247th ACS Meeting: The Eureka Research Workbench

ExptML Chemical Schema

Page 8: 247th ACS Meeting: The Eureka Research Workbench

ExptML Chemical Schema

Page 9: 247th ACS Meeting: The Eureka Research Workbench

ExptML Chemical (Instance)

Page 10: 247th ACS Meeting: The Eureka Research Workbench

Data are connected to other data: ‘Linked Data’ (http://www.w3.org/standards/semanticweb/data)

The ‘Semantic Web’ approach to contextualize data
Proposed storage of ‘relationships’ between data is the Resource Description Framework (RDF, http://www.w3.org/RDF/)

Semantic Data
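
As an illustration of how relationships between data can be expressed as RDF, here is a minimal sketch that stores (subject, predicate, object) triples and serializes them as N-Triples. The namespaces, the predicate names (`usesChemical`, `usesEquipment`), and the identifiers other than the `exptml:chm1` form used in the talk's URL example are assumptions made for illustration; they are not taken from ExptML or Eureka.

```python
# Hypothetical namespaces and predicates, invented for this sketch.
BASE = "http://example.com/objects/"
REL = "http://example.com/relations#"

# Each relationship is one (subject, predicate, object) triple of URIs.
triples = [
    (BASE + "exptml:exp1", REL + "usesChemical", BASE + "exptml:chm1"),
    (BASE + "exptml:exp1", REL + "usesEquipment", BASE + "exptml:eqp1"),
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def objects_of(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(to_ntriples(triples))
print(objects_of(triples, BASE + "exptml:exp1", REL + "usesChemical"))
```

Following a predicate from one object to another, as `objects_of` does, is the kind of lookup that linked data makes possible once relationships are stored explicitly.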

Page 11: 247th ACS Meeting: The Eureka Research Workbench

Digital repository software (http://fedora-commons.org/)
Creation and management of online digital libraries

Fedora ‘Digital Object’ consists of metadata + streams
Metadata stored as Dublin Core (DC stream)
ExptML file stored as EXPTML stream
Other files (PDFs, images, Word etc.) stored as streams
Relationships stored as RDF (RELS-EXT stream)

Features: version control, checksumming, archiving
Built-in search of objects and relationships
Add-on for file content search (Fedora GSearch)

Fedora Commons

Page 12: 247th ACS Meeting: The Eureka Research Workbench

Fedora-Commons defines and works on digital objects

In the definition of a Fedora object, an ExptML file is just one stream of many. By default each object also has a “DC” stream of metadata and a “RELS-EXT” stream of relationships

Each Fedora object can have any number of additional streams for paper PDFs, product/sample pictures, binary file formats (if a conversion has been done), video, audio, RDF, anything...

You can export individual streams or the whole Fedora object with streams binary encoded (sharing/archiving)

Fedora for File Storage
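
The object-plus-streams idea can be sketched in a few lines of Python. This is only a conceptual model, assuming an in-memory object; the class names, fields, and export layout are hypothetical and are not the Fedora-Commons API or its export format. Base64 stands in for "binary encoded", and SHA-256 stands in for whatever checksum the repository is configured to use.

```python
import base64
import hashlib
from dataclasses import dataclass, field


@dataclass
class Stream:
    stream_id: str   # e.g. "DC", "RELS-EXT", "EXPTML"
    mime_type: str
    content: bytes

    def checksum(self):
        """Hex digest used to detect corruption of the stored bytes."""
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class DigitalObject:
    pid: str                                   # identifier such as "exptml:chm1"
    streams: dict = field(default_factory=dict)

    def add_stream(self, stream):
        self.streams[stream.stream_id] = stream

    def export(self):
        """Return the whole object with every stream base64 encoded."""
        return {
            "pid": self.pid,
            "streams": {
                sid: {
                    "mime_type": s.mime_type,
                    "checksum": s.checksum(),
                    "content_b64": base64.b64encode(s.content).decode("ascii"),
                }
                for sid, s in self.streams.items()
            },
        }


obj = DigitalObject("exptml:chm1")
obj.add_stream(Stream("DC", "text/xml", b"<dc/>"))              # placeholder metadata
obj.add_stream(Stream("RELS-EXT", "application/rdf+xml", b"<rdf/>"))
obj.add_stream(Stream("EXPTML", "text/xml", b"<chemical/>"))    # placeholder ExptML
exported = obj.export()
print(sorted(exported["streams"]))
```

Because every stream travels with the object, a single export carries the metadata, the relationships, and the ExptML file together.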

Page 13: 247th ACS Meeting: The Eureka Research Workbench

Fedora Object Storage

Page 14: 247th ACS Meeting: The Eureka Research Workbench

Web interface written in PHP using the CakePHP Framework

Communicates with Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files

Representational State Transfer (REST) format for URLs, e.g. http://example.com/chemicals/view/exptml:chm1

Creation of ExptML via interface
Provides search via Fedora and GSearch
Can extract data out of XML files
Can gather data from other websites (via API controller) and integrate into ExptML files

Eureka Web Application
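
To make the URL format concrete, the sketch below splits the example URL into the controller/action/identifier pattern that CakePHP uses by default. The function name and the action-to-CRUD mapping are assumptions for illustration (the mapping uses CakePHP's conventional action names); Eureka's actual routes may differ.

```python
from urllib.parse import urlparse

# Assumed mapping from conventional CakePHP action names to CRUD operations.
CRUD_FOR_ACTION = {
    "add": "create",
    "view": "retrieve",
    "edit": "update",
    "delete": "delete",
}


def parse_eureka_url(url):
    """Split /controller/action/id into its three parts."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /controller/action/id, got {url!r}")
    controller, action, object_id = parts
    return {
        "controller": controller,
        "action": action,
        "id": object_id,
        "crud": CRUD_FOR_ACTION.get(action),  # None for unknown actions
    }


route = parse_eureka_url("http://example.com/chemicals/view/exptml:chm1")
print(route)
```

Here the controller names the data type (`chemicals`), the action names the operation (`view`), and the final segment is the repository identifier of the object.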

Page 15: 247th ACS Meeting: The Eureka Research Workbench

Eureka Website – Group View

Only data types related to the research group show up on the left

Page 16: 247th ACS Meeting: The Eureka Research Workbench

Eureka Website – Bench View

Clicking on the “Add” menu on the right allows you to add a comment or link to data

Page 17: 247th ACS Meeting: The Eureka Research Workbench

Eureka Website – Notebook View

Page 18: 247th ACS Meeting: The Eureka Research Workbench

Eureka Website – Laboratory View

The “Rel” menu shows you the information related to this instrument

Page 19: 247th ACS Meeting: The Eureka Research Workbench

Eureka Website – Library View

You can add the PDF of the paper to the citation. The contents of the PDF are searchable in the system

Page 20: 247th ACS Meeting: The Eureka Research Workbench

Eureka Website – Stockroom View

Page 21: 247th ACS Meeting: The Eureka Research Workbench

Web Application
Server: Fedora 4, JSON-LD, ElasticSearch
Client: CakePHP 3/HTML5, Recline.js, Annotator, jQuery

Standards
Linked Data Platform (http://www.w3.org/TR/ldp/)
Datapackage/Simple Data Format (http://dataprotocols.org/)
Markup Languages: AnIML, UnitsML, CML
Other Molecular File Formats: MOL/SDF/CDX/CIF/PDB etc.
Open Framework for Laboratory Data (Allotrope Foundation)

Datasources
ChemSpider, CIR, PubChem, Google Scholar, CrossRef, VIVO
ExchangeNetwork (EPA), NIST, SDBS (no APIs yet)

Tools
Marvin for JS, JSXGraph, JSpecView, Chemicalize.org

Eureka Technology Stack
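
As a small illustration of drawing on one of the listed data sources, the sketch below builds lookup URLs for CIR using the base address given in the references. The `identifier/representation` path layout and the representation names shown (`smiles`, `stdinchikey`) reflect the public service as commonly documented, but treat them as assumptions and check the CIR documentation before relying on them. The function name is hypothetical, and no network request is made.

```python
from urllib.parse import quote

# Base address as listed in the talk's references.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a CIR lookup URL; the identifier is percent-encoded."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


print(cir_url("aspirin", "smiles"))
print(cir_url("acetic acid", "stdinchikey"))
```

A web application could fetch such a URL and write the returned value into the corresponding chemical record, which is the sort of integration the data source list points to.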

Page 22: 247th ACS Meeting: The Eureka Research Workbench

Implement ingest of all data types, file (if appropriate) and web based
In-browser processing of data -> dataset -> result, report writing
Extraction of file-based legacy data -> ExptML format data
Open access to data/spectra, ‘available data’ page (browser only)
Access to data/spectra via linked data server (discovery/indexing)
Publishing of packaged datasets with authenticated download option
Automated ingestion of data from instruments/sensors
Collaborative research: authentication and data exchange

Timeframe? Depends on securing funding

Eureka Roadmap

Page 23: 247th ACS Meeting: The Eureka Research Workbench

Eureka: Web application to create ExptML files
Built on ExptML to capture data/resources/workflows
Reliable storage/archiving system for ExptML files (Fedora)
Storage of relationships between data (RDF)

TODO
Provide mechanism for sharing of data (different levels)
Add tools to find, visualize and work on science data
Integration into the RDA model for sharing research data
Get the word out and test system with many users

Conclusion

Page 24: 247th ACS Meeting: The Eureka Research Workbench

References
Eureka – http://sourceforge.net/projects/eureka
Fedora-Commons – http://fedora-commons.org
XML – http://www.w3.org/standards/xml
AnIML – http://animl.sourceforge.net
ExptML – http://exptml.sourceforge.net/
UnitsML – http://unitsml.nist.gov/
CML – http://www.xml-cml.org/
JSON-LD – http://www.w3.org/TR/json-ld/
RDF – http://www.w3.org/RDF/
CIR – http://cactus.nci.nih.gov/chemical/structure
RDA – http://rd-alliance.org
ChemSpider – http://www.chemspider.com/
Allotrope Foundation – http://allotrope.org