OeRC Seminar


Upload: seanb

Post on 10-May-2015


DESCRIPTION

Slides from a seminar at Oxford e-Research Centre, 23rd Feb, 2012.

TRANSCRIPT

Page 1: OeRC Seminar

Who am I?

Sean Bechhofer

University of Manchester [email protected]

@seanbechhofer http://humblyreport.wordpress.com

Page 10: OeRC Seminar

Sean Bechhofer

University of Manchester [email protected]

@seanbechhofer http://humblyreport.wordpress.com

Research Objects: Towards Exchange and Reuse of Digital Knowledge

Page 11: OeRC Seminar

Publication

•  Argumentation: Convince the reader of the validity of a position [Mesirov]
–  Reproducible Results System: facilitates enactment and publication of reproducible research.
•  Results are reinforced by reproducibility [De Roure]
–  Explicit representation of method.
•  Verifiability as a key factor in scientific discovery.

J. Mesirov. Accessible Reproducible Research. Science 327(5964), p.415-416, 2010. http://dx.doi.org/10.1126/science.1179653

D. De Roure and C. Goble. Anchors in Shifting Sand: the Primacy of Method in the Web of Data. Web Science Conference 2010, Raleigh NC, 2010. http://eprints.ecs.soton.ac.uk/20817/

Stodden et al. Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science. Computing in Science and Engineering 12(5), p.8-13, 2010. http://dx.doi.org/10.1109/MCSE.2010.113

Page 12: OeRC Seminar

Publication

•  Nano-publications. Explicit representation at the statement level.
•  Executable Papers
–  Collage
–  SHARE
–  Verifiable Computational Results

Groth et al. The Anatomy of a Nano-publication. Information Services and Use 30(1), p.51-56, 2010. http://iospress.metapress.com/index/FTKH21Q50T521WM2.pdf

Nowakowski et al. The Collage Authoring Environment. ICCS 2011, 2011. http://dx.doi.org/10.1016/j.procs.2011.04.064

Van Gorp et al. SHARE: a web portal for creating and sharing executable research papers. ICCS 2011, 2011. http://dx.doi.org/10.1016/j.procs.2011.04.062

Gavish et al. A Universal Identifier for Computational Results. ICCS 2011, 2011. http://dx.doi.org/10.1016/j.procs.2011.04.067

Page 13: OeRC Seminar

Knowledge Burying in paper publication

•  Publishing/mining cycle results in loss of knowledge
–  ≥ 40% of information lost
•  RIP – Rest in Paper
•  Need for mechanisms for publication of knowledge, preserving information about the process.

[Figure: Experiment, Paper and Knowledge, connected by Publication and Text Mining]

B. Mons. Which Gene Did You Mean? BMC Bioinformatics 6, p.142, 2005. http://dx.doi.org/10.1186/1471-2105-6-142

Page 14: OeRC Seminar

The Problem

•  Moving to digital environments
–  Workflows, protocols, algorithms
–  Consuming and producing data
–  Electronic publication methods
•  From (linear) paper publications to… ???
•  Need for frameworks for facilitating reuse and exchange of digital knowledge

Page 15: OeRC Seminar

Workflows

[Figure: BioAID_DiseaseDiscovery v3 workflow]

•  Central in experimental science
•  Enable automation
•  Make science repeatable (and sometimes reproducible)
•  Encourage best practices
•  Scientist-friendly
•  Aimed at (some types of) scientists, possibly even without strong computational skills
•  Communities: Need for scientific data preservation
•  Enhance scientific development by building on, sharing, and extending previous results within scientific communities
•  However, workflow preservation is especially complex
•  Workflows not only specified statically at design time but also interpreted through their execution
•  Complex models are required to describe workflows and related resources, including documents, data and services
•  Resources often beyond control of scientists

A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving
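The definition above can be made concrete with a small sketch. This is only an illustration, not how Taverna or any other workflow system represents workflows: the `Step` and `Workflow` classes and the two-step gene-name example are hypothetical names introduced here. It shows data and processes combined into an ordered, configurable set of steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One process in the workflow: reads named inputs, writes one named output."""
    name: str
    inputs: List[str]
    output: str
    run: Callable[..., object]


@dataclass
class Workflow:
    """A configurable, ordered set of steps over a shared pool of named data."""
    steps: List[Step] = field(default_factory=list)

    def execute(self, data: Dict[str, object]) -> Dict[str, object]:
        results = dict(data)  # copy, so the caller's inputs are not mutated
        for step in self.steps:
            args = [results[name] for name in step.inputs]
            results[step.output] = step.run(*args)
        return results


# Hypothetical two-step workflow: normalise gene names, then count distinct ones.
workflow = Workflow([
    Step("normalise", ["genes"], "clean",
         lambda genes: [g.strip().upper() for g in genes]),
    Step("count", ["clean"], "n_distinct",
         lambda clean: len(set(clean))),
])

outputs = workflow.execute({"genes": [" brca1", "BRCA1 ", "tp53"]})
print(outputs["clean"], outputs["n_distinct"])
```

Because every intermediate value stays in `outputs`, a rerun with the same inputs can be compared step by step, which is the sense in which a workflow makes an analysis repeatable.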

Page 16: OeRC Seminar

myExperiment

•  A repository of research methods
•  A community social network of people and things
•  A Social Virtual Research Environment
•  Web 2.0 “boutique” site
•  A probe into researcher behaviour
•  Open source (BSD) Ruby on Rails app
•  REST and SPARQL interfaces, supports Linked Data
•  Part of product family including BioCatalogue, MethodBox and SysmoDB

5550 members, 300 groups, 2300 workflows, 220 packs

Page 17: OeRC Seminar

Motivating Projects

•  myExperiment
–  Workflow sharing
•  Sysmo-DB
–  Assets catalogue supporting exchange of data, models, SOPs
•  Obesity e-Lab/MethodBox
–  Sharing survey data/analysis scripts
•  myExperiment packs
–  Packs supporting (simple) aggregations.
–  Links not just references
–  Packs as nascent ROs

Page 18: OeRC Seminar

Wf4Ever

…technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines.

FP7 Digital Libraries and Digital Preservation

iSOCO, University of Manchester, Universidad Politécnica de Madrid, University of Oxford, Poznan Supercomputing and Networking Centre, Instituto de Astrofísica de Andalucía, Leiden University Medical Centre

•  Architecture/implementation for workflow preservation, sharing and reuse
•  Research Object models
•  Workflow Decay, Integrity and Authenticity
•  Workflow Evolution and Recommendation
•  Provenance
•  Driven by Use Cases

Page 19: OeRC Seminar

Research Objects

Semantically rich aggregations of resources, supporting a research objective

Linking

Page 20: OeRC Seminar

Bio Scenario

Page 21: OeRC Seminar

Bio Scenario

Page 22: OeRC Seminar

Astronomers' Questions

When accessing a workflow
•  Can I use it for my purposes (in my words)?
•  If I can expect it to run, when was it last run, and by whom?
•  What it does quickly, by one of
–  example input / output (and trying it)
–  a description
–  ‘reading’ its key parts
–  what it was used for
–  related workflows, its creator
–  contacting the creator or last user
•  How do I need to cite the author and workflow?

When sharing a workflow
•  What rights do others have?
•  What makes a good workflow, to get a good score?
–  Make my workflow findable, reusable, and ready for review
–  Instructions to authors
–  Two types of contributions: serious science, preliminary/playing around
•  If my workflow may have issues
–  What the system or other users think it does
•  How it relates to other things
•  Share freely or anonymously upon request?

Image: http://www.flickr.com/photos/-bast-/349497988/

Page 23: OeRC Seminar

User Requirements

•  Study of user scenarios
•  Isolation of User Requirements
•  User review
•  Project Technical requirements
•  Classify Technical Requirements

[Figure: user roles: Reader, Contributor, Re-User, Publisher, Comparator, Evaluator/Reviewer, Finder/Searcher, Curator, Trainee, Creator]

As a Reader of ROs, I want to compare an RO with others so that I can determine whether the investigation is novel

As a Creator of ROs, I want to aggregate existing resources so that I can conveniently access related resources from a single place.

As a Comparator of ROs, I want to follow the steps taken so that I can understand the investigative process or method

Page 24: OeRC Seminar

User Roles

Creator. Collecting together resources as an RO for reuse or repurpose. May be for personal use.

Contributor. Providing materials to be used within an RO.

Collaborator. Providing materials to be used without necessarily being aware of the RO.

Reader. Looking for related works, state of the art.

Comparator. Looking for similar or previous work to a task in hand.

Re-User. Understands the underlying methods encapsulated (e.g. workflow) and how to extract/replace components.

Publisher. Disseminating results or methods. Upload to repository, publish via myExp, embed in blog post.

Evaluator/Reviewer. Evaluating/validating or reviewing content. Confirmation of results or validation of process.

Page 25: OeRC Seminar

Workflow Reproducibility

Stability, Completeness, Integrity, Authenticity, Quality

Workflow Decay
•  Component level
–  flux/decay/unavailability
•  Data level
–  formats/ids/standards
•  Infrastructure level
–  platform/resources

Experiment Decay
•  Methodological changes
•  New technologies
•  New resources/components
•  New data
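One way to read the decay levels above is as a check that groups unavailable resources by the level at which they fail. The sketch below is an assumption-laden illustration, not a Wf4Ever service: `Resource`, `find_decayed`, the example.org URIs and the availability probe are all hypothetical, and a real probe would dereference each URI rather than look it up in a set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Resource:
    uri: str
    level: str  # "component", "data" or "infrastructure"


def find_decayed(resources: List[Resource],
                 is_available: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Group the URIs of unavailable resources by decay level."""
    decayed: Dict[str, List[str]] = {}
    for res in resources:
        if not is_available(res.uri):
            decayed.setdefault(res.level, []).append(res.uri)
    return decayed


# Hypothetical resources that a workflow depends on.
resources = [
    Resource("http://example.org/service/blast", "component"),
    Resource("http://example.org/data/genes.csv", "data"),
    Resource("http://example.org/platform/cluster", "infrastructure"),
]

# Stand-in for a real availability probe: only the data file still resolves.
currently_up = {"http://example.org/data/genes.csv"}

report = find_decayed(resources, lambda uri: uri in currently_up)
print(report)
```

An empty report would mean every dependency is still reachable; it says nothing about experiment decay, where the resources exist but the method or data has moved on.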

Page 26: OeRC Seminar

Wf4Ever functionalities

[Figure: functionalities grouped as Lifecycle, Access & Usage, Data Management & Analysis, and Storage Functionalities. Items shown: Storage, Publication, Execution, Stability Evaluation, Completeness Evaluation, Recommendations, Visualization, Collaboration, Edit, Use, Annotate, Retrieval, Maintenance, Archival, …]

Page 27: OeRC Seminar

Wf4Ever Reference Implementation (By the end of 1st Year)

[Figure: architecture with Lifecycle Services, Storage Services, Access & Usage Clients, and Data Management & Analysis Services. Components shown: Stability Evaluation, Completeness Evaluation, Recommender, RO Portal, RO Manager Tool, RO Digital Library, ROBox, Dropbox Client, Taverna Workflow Mgmt System]

Page 28: OeRC Seminar

Linked Data

•  A set of best practices for publishing and connecting data on the Web

1.  Use URIs to name things
2.  Use dereferenceable HTTP URIs
3.  Provide useful content on lookup using standards
4.  Include links to other stuff
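Principles 2 and 3 amount to an ordinary HTTP lookup with content negotiation. The sketch below uses Python's standard `urllib.request`. The URI is a hypothetical example, and the choice of `text/turtle` as the requested media type is an assumption, since a given server may offer other RDF serialisations.

```python
from urllib.request import Request, urlopen


def linked_data_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build an HTTP GET request that asks for an RDF representation of `uri`."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("Linked Data names should be HTTP URIs")
    return Request(uri, headers={"Accept": media_type})


def fetch(uri: str) -> str:
    """Dereference the URI and return the body (needs network; not called here)."""
    with urlopen(linked_data_request(uri)) as response:
        return response.read().decode("utf-8")


# Hypothetical URI naming a workflow.
request = linked_data_request("http://example.org/workflows/42")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Principle 4 then applies to whatever comes back: the returned description should itself contain further HTTP URIs that can be looked up the same way.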

Page 29: OeRC Seminar

Linked Data

Page 30: OeRC Seminar

Linked Data is not Enough!

•  A set of best practices for publishing and connecting data on the Web

1.  Use URIs to name things
2.  Use dereferenceable HTTP URIs
3.  Provide useful content on lookup using standards
4.  Include links to other stuff

•  All very nice, lots of publishing going on, but no common models for lifecycle, aggregation, ownership, etc.
•  A platform for sharing and publishing, but more is needed

Note: The answer is not not Linked Data!*

*Logician joke

Bechhofer et al. Linked Data is not Enough for Scientists. Future Generation Computer Systems, 2011. http://dx.doi.org/10.1016/j.future.2011.08.004

Page 31: OeRC Seminar

ROs and Linked Data

•  Linked Data: Collection of best practices for publishing and connecting structured data on the web.

•  ROs should be independent of mechanisms for representation and delivery

•  ROs as non-information resources
–  “Named Graphs for LD”

[Figure: LD Cloud and RO]

Page 32: OeRC Seminar

Research Object Model
WP2 - Workflow Lifecycle Management

»  Research Object Model
›  Focus of work in M6-12
»  Version 0.1 released to project in November 2011
»  Use within developed RO services (RODL)
»  A suite of linked ontologies
›  Research Object Core - ro (aggregation and annotation)
•  Research Object
›  Workflow Description - wfdesc (content)
•  Abstract workflow
›  Workflow Provenance - wfprov (provenance)
•  Workflow provenance

Emphasis on Workflow-centric Research Objects

Container Structure

Minimal place holder

Page 33: OeRC Seminar

Research Object Core (ro)
WP2 - Workflow Lifecycle Management

»  Aggregation (OAI-ORE)
›  Use of OAI-ORE to support the description of collections of resources.
›  Established vocabulary
›  Usage in existing work (myExperiment)
›  Fit with Linked Data publication
»  Annotation (AO)
›  Survey of existing annotation vocabularies, Annotation Ontology (Clark et al) and Open Annotation Collaboration (Van de Sompel et al).
›  Liaison and discussion with both groups
•  Little to choose in technical terms
•  A catalyst and focus for collaboration between AO and OAC
›  Choice of AO
•  Existing collaboration/relationship (UNIMAN and AO)
»  Formation of W3C Open Annotation Community Group
›  Participation from Wf4Ever staff
›  Potential for impact/collaborations
»  Defines the core data model used by the RO Digital Library service and the Command Line Tool developed in WP1.
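The aggregation-plus-annotation structure described above can be sketched as a handful of triples. This is an illustration under stated assumptions, not the project's data model: the namespace URIs and the terms `ro:ResearchObject`, `ore:aggregates`, `ao:annotatesResource` and `ao:body` are assumptions drawn from the published OAI-ORE, AO and Wf4Ever vocabularies and should be checked against the ontology release in use. The example.org resources are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed namespace URIs; verify against the vocabulary versions actually used.
ORE = "http://www.openarchives.org/ore/terms/"
AO = "http://purl.org/ao/"
RO = "http://purl.org/wf4ever/ro#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


@dataclass
class Annotation:
    uri: str
    target: str  # the aggregated resource being annotated
    body: str    # the resource holding the annotation content


@dataclass
class ResearchObject:
    uri: str
    resources: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def triples(self) -> List[Triple]:
        """Describe the RO as an aggregation whose members may be annotated."""
        out: List[Triple] = [(self.uri, RDF_TYPE, RO + "ResearchObject")]
        out += [(self.uri, ORE + "aggregates", r) for r in self.resources]
        for ann in self.annotations:
            out.append((self.uri, ORE + "aggregates", ann.uri))
            out.append((ann.uri, AO + "annotatesResource", ann.target))
            out.append((ann.uri, AO + "body", ann.body))
        return out


# Hypothetical RO holding a workflow, its input data, and one annotation.
base = "http://example.org/ro/1/"
ro = ResearchObject(
    uri=base,
    resources=[base + "workflow.t2flow", base + "input.csv"],
    annotations=[Annotation(base + "ann1", base + "workflow.t2flow",
                            base + "ann1-body.rdf")],
)

for s, p, o in ro.triples():
    print(f"<{s}> <{p}> <{o}> .")
```

Keeping the annotation body as a separate resource means annotations can be added without touching the aggregated workflow or data files.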

Page 34: OeRC Seminar

Workflow Description (wfdesc)
WP2 - Workflow Lifecycle Management

»  Model providing initial descriptions of workflows
›  Process instances
›  Linked via input/output/parameters.
›  Support for the tasks of workflow abstraction, indexing, classifications, and general workflow analysis.
›  Generic technologies, adaptable to different domains using specific catalogues, e.g. SADI framework.
›  Reflects explicit focus on workflow-centric ROs
»  Evolved from the OPMW ontology by Wf4Ever staff member Daniel Garijo and Yolanda Gil.
»  Tooling generating wfdesc descriptions from aggregated Taverna workflows has been developed.
›  Descriptions already used by the Workflow Recommendation Service for inspecting workflow structures and service interconnections (WP3).
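A description of processes linked via inputs and outputs is essentially a small graph, and inspecting service interconnections means walking it. The sketch below is a plain-Python stand-in, not the wfdesc ontology itself: `Process`, `DataLink`, `downstream`, `dangling` and the fetch/align/render example are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Process:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass(frozen=True)
class DataLink:
    source: Tuple[str, str]  # (process name, output parameter)
    sink: Tuple[str, str]    # (process name, input parameter)


def downstream(links: List[DataLink]) -> Dict[str, List[str]]:
    """Map each process to the processes that consume its outputs."""
    graph: Dict[str, List[str]] = {}
    for link in links:
        consumers = graph.setdefault(link.source[0], [])
        if link.sink[0] not in consumers:
            consumers.append(link.sink[0])
    return graph


def dangling(processes: List[Process], links: List[DataLink]) -> List[DataLink]:
    """Return links whose endpoints are not declared by any process."""
    outputs = {(p.name, o) for p in processes for o in p.outputs}
    inputs = {(p.name, i) for p in processes for i in p.inputs}
    return [l for l in links if l.source not in outputs or l.sink not in inputs]


# Hypothetical three-process workflow description.
processes = [
    Process("fetch", inputs=("query",), outputs=("sequences",)),
    Process("align", inputs=("sequences",), outputs=("alignment",)),
    Process("render", inputs=("alignment", "raw"), outputs=("figure",)),
]
links = [
    DataLink(("fetch", "sequences"), ("align", "sequences")),
    DataLink(("align", "alignment"), ("render", "alignment")),
    DataLink(("fetch", "sequences"), ("render", "raw")),
]

print(downstream(links))
print(dangling(processes, links))
```

Because the description is static, this kind of analysis works without running the workflow, which is what makes it usable for indexing and recommendation.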

Page 35: OeRC Seminar

Workflow Provenance (wfprov)
WP2 - Workflow Lifecycle Management

»  A provenance convergence layer
›  Potential for links to OPM-V or PROV-O.
›  Mappings to OPM-V and PROV-O are under development
›  A placeholder for the v0.1 ontology suite
»  A Taverna plugin exporting Taverna provenance in PROV-O format has been developed in WP4.
»  A prototype conversion agent that generates wfprov descriptions from PROV-O has been developed. wfprov data will primarily be used by Integrity and Authenticity to inspect workflow executions (WP4).
»  More extended modeling and descriptions of provenance information will be reported in WP4.
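Inspecting a workflow execution typically means asking what a given output was derived from. The sketch below records which artifacts each process run used and generated, then walks back from an output. It is a minimal illustration, not the wfprov ontology or the Taverna export: the class names, field names and file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessRun:
    process: str          # name of the process that was executed
    used: List[str]       # artifacts consumed by this run
    generated: List[str]  # artifacts produced by this run


@dataclass
class WorkflowRun:
    workflow: str
    runs: List[ProcessRun] = field(default_factory=list)

    def lineage(self, artifact: str) -> List[str]:
        """Return every artifact that `artifact` was derived from, nearest first."""
        producers = {out: run for run in self.runs for out in run.generated}
        seen: List[str] = []
        queue = [artifact]
        while queue:
            current = queue.pop(0)
            run = producers.get(current)
            if run is None:
                continue  # an original input: nothing recorded produced it
            for source in run.used:
                if source not in seen:
                    seen.append(source)
                    queue.append(source)
        return seen


# Hypothetical execution trace of a two-step workflow.
trace = WorkflowRun("sequence-analysis", [
    ProcessRun("fetch", used=["query.txt"], generated=["hits.xml"]),
    ProcessRun("align", used=["hits.xml", "params.json"],
               generated=["alignment.fasta"]),
])

print(trace.lineage("alignment.fasta"))
```

An integrity check could then compare the recorded lineage of a result against the resources aggregated in its RO and flag anything missing.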

Page 36: OeRC Seminar

ROs are Technical and Social

•  An artefact to support preservation of the method, data etc.

•  Technical details of platform, services etc.

•  A record of an investigation or experiment

•  A mechanism for communication, packaging, sharing, publishing, finding

•  An object that connects people together

De Roure et al. Social Scientific Objects. 1st International Workshop on Social Object Networks, Boston, 2011. http://users.ox.ac.uk/~oerc0033/preprints/myExpSocialObjects.pdf

Page 37: OeRC Seminar

Where Next/Challenges

•  Prototype development
•  Models for Research Objects
–  Vocabularies
•  Refinement of lifecycle states
–  Versioning and Evolution
•  Provenance
–  RO components
–  The RO itself
•  Trust

Image: http://www.flickr.com/photos/marsdd/2986989396/

Page 38: OeRC Seminar

Music

Page 39: OeRC Seminar

Music

•  Music IR and Linked Data
–  Publication of collections
›  eTree
›  Million Song Dataset
›  Benefits?
•  Music IR and ROs
–  What are the Research Objects of Music IR?
–  Intermediate results/feature sets
•  Ontologies and vocabularies for describing results/feature sets

Page 40: OeRC Seminar

Thanks!

•  Manchester Information Management Group
–  http://img.cs.manchester.ac.uk
•  myGrid Team
–  http://www.mygrid.org.uk/
•  Wf4Ever Team
–  http://www.wf4ever-project.org/

Page 41: OeRC Seminar

Where Next?
