myGrid/Taverna Provenance
Daniele Turi
University of Manchester
OMII f2f Meeting, London, 19-20/4/06
Components
• Identifiers
– LSIDs
• Data
– JDBC data store
• Metadata
– RDF Provenance Plugin
• Browsing
– Provenance Browser Plugin
• Security
– Under development
LSID
LSID: Life Science Identifier
• URN specification in progress
• 5 part identifier (with optional version id)
– urn:lsid:www.mygrid.org.uk:lsdocument:X1234
– urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:7717376
• protocol for retrieving data and metadata about an object
• commitment by the provider to always return the same data for an ID
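The five-part structure described on this slide can be illustrated with a small parser. This is a minimal sketch and not part of any LSID library: the `Lsid` type and the `parse_lsid` function are hypothetical names, and the optional sixth field is assumed to carry the version id.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The variable parts of an LSID (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:version] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    # The two fixed prefixes are compared case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty authority, namespace or object id: {text!r}")
    return Lsid(*parts[2:])


lsid = parse_lsid("urn:lsid:www.mygrid.org.uk:lsdocument:X1234")
print(lsid.authority, lsid.namespace, lsid.object_id)
```

The parser only checks the shape of the identifier; resolving it to data or metadata is the job of an LSID resolver.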
LSID (ctd)
• Issue
– LSID Authorities
• Resolution
– LSID Resolvers
• Examples
– myGrid
– Long Term Ecological Research Network
– BioPathways Consortium
LSID (ctd 2)
• abstraction
• lightweight
• independent from actual storage implementation
– database
– file system
– application
• both for private and public data sources
Data
Data Storage (current)
• Taverna can persist inputs, outputs and intermediate results in an SQL database via JDBC
• Persistence is optional; it is enabled by configuring a Baclava Data Store
• Allows the LSIDs of data items to be resolved against the actual data
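The idea of resolving an LSID against stored data can be sketched as a table keyed by LSID. This is an illustration only: Python's `sqlite3` stands in for the JDBC-accessible SQL database, and the `data_item` table, `store` and `resolve` are hypothetical names, not the schema or API of the Baclava Data Store.

```python
import sqlite3

# In-memory database as a stand-in for the real SQL store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_item (lsid TEXT PRIMARY KEY, content BLOB NOT NULL)"
)


def store(lsid: str, content: bytes) -> None:
    """Persist one data item under its LSID."""
    # The primary key rejects a second, different value for the same LSID.
    conn.execute(
        "INSERT INTO data_item (lsid, content) VALUES (?, ?)", (lsid, content)
    )
    conn.commit()


def resolve(lsid: str):
    """Return the bytes stored for an LSID, or None if it is unknown."""
    row = conn.execute(
        "SELECT content FROM data_item WHERE lsid = ?", (lsid,)
    ).fetchone()
    return None if row is None else row[0]


store("urn:lsid:www.mygrid.org.uk:lsdocument:X1234", b"ACGT")
print(resolve("urn:lsid:www.mygrid.org.uk:lsdocument:X1234"))
```

Making the LSID the primary key is one simple way to reflect the commitment that an identifier always resolves to the same data.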
Data Storage (future)
• Domain-specific databases
– use outside myGrid
• Develop:
– Taverna processor for JDBC/OGSA-DAI
– associated interface (cf. BioMart)
• Users will be able to study the contents of an existing database and:
– write queries that extract data from the database, where the query may be parameterised with values passed in from the workflow;
– write requests that insert data from the workflow into a named table in the database.
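The two planned operations, a parameterised extraction query and an insert into a named table, might look roughly like this. The sketch uses Python's `sqlite3` in place of JDBC/OGSA-DAI, and the `sequence` table and all function names are hypothetical.

```python
import sqlite3
from typing import List


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `sequence` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence (accession TEXT PRIMARY KEY, length INTEGER)"
    )
    return conn


def insert_sequence(conn: sqlite3.Connection, accession: str, length: int) -> None:
    """Insert one value produced by the workflow into the named table."""
    conn.execute(
        "INSERT INTO sequence (accession, length) VALUES (?, ?)",
        (accession, length),
    )
    conn.commit()


def accessions_longer_than(conn: sqlite3.Connection, min_length: int) -> List[str]:
    """Extract data with a query parameterised by a workflow value."""
    # min_length is bound to the "?" placeholder, never pasted into the SQL.
    rows = conn.execute(
        "SELECT accession FROM sequence WHERE length >= ? ORDER BY accession",
        (min_length,),
    )
    return [row[0] for row in rows]


db = open_demo_db()
insert_sequence(db, "A1", 120)
insert_sequence(db, "B2", 900)
print(accessions_longer_than(db, 500))
```

Binding workflow values as parameters, rather than building SQL strings, keeps the query text fixed while the workflow supplies the values.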
Metadata
Metadata Generation
• Taverna Provenance Plugin
• Listen to Taverna Events
– WorkflowEventListener
• Faithfully record them as ontological instance data
– RDF graphs (one for each Taverna run)
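The listener approach can be sketched as an object that receives events and appends triples to one graph per run. This is not the Taverna `WorkflowEventListener` interface: the class, its method names and the `urn:example:` identifiers are hypothetical, and the direction of each property (run to workflow, run to user, run to process run) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


class ProvenanceRecorder:
    """Hypothetical listener that turns workflow events into triples."""

    def __init__(self) -> None:
        # One graph (a set of triples) per workflow run, keyed by run id.
        self.graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def workflow_started(self, run: str, workflow: str, user: str) -> None:
        self.graphs[run].add((run, "runs", workflow))
        self.graphs[run].add((run, "launchedBy", user))

    def process_completed(self, run: str, process_run: str) -> None:
        self.graphs[run].add((run, "executed", process_run))


recorder = ProvenanceRecorder()
recorder.workflow_started(
    "urn:example:wfInstance:8", "urn:example:workflow:6", "urn:example:person:4"
)
recorder.process_completed("urn:example:wfInstance:8", "urn:example:processRun:84")
print(len(recorder.graphs["urn:example:wfInstance:8"]))  # 3
```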
Metadata
• Representation
• Ontology (Schema)
• Storage
• Query
• Browsing
Representation
• RDF
– triples
• subject, predicate, object
– URIs (hence easy data integration)
– semantic web language
– XML serialization
– flexible, powerful
– sets of triples give rise to graphs
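To make the "URIs, hence easy data integration" point concrete, here is a minimal sketch in which a graph is just a set of (subject, predicate, object) tuples. The `urn:example:` identifiers and the `objects` helper are hypothetical; a real system would use an RDF library rather than plain tuples.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RUN = "urn:example:wfInstance:8"
PERSON = "urn:example:person:4"
ORG = "urn:example:org:HY7"

# Two graphs produced independently, e.g. by different tools.
run_graph: Set[Triple] = {(RUN, "launchedBy", PERSON)}
directory_graph: Set[Triple] = {(PERSON, "belongsTo", ORG)}


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Because both graphs use the same URI for the person, integration is set union.
merged = run_graph | directory_graph

# Follow two edges: which organisations are behind this run?
orgs = {
    org
    for person in objects(merged, RUN, "launchedBy")
    for org in objects(merged, person, "belongsTo")
}
print(orgs)
```

No schema mapping is needed for the merge: shared identifiers are what join the two graphs.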
Workflow Run
[Figure: RDF graph of a workflow run. Nodes: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51. Edge labels: runs, launchedBy, belongsTo, executed (twice).]
Schema
• Ontology
– RDF schema
• Taxonomic inferences
– also available as OWL
• opens it up to complex reasoning
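A taxonomic inference in the RDF Schema sense follows subclass links: an instance of a class is also an instance of every superclass. The sketch below shows that closure in plain Python. The class hierarchy is invented for illustration (the superclasses `Run` and `Thing` are hypothetical and not taken from the myGrid ontology).

```python
from typing import Dict, Set

# Hypothetical subclass links: class -> direct superclasses.
SUB_CLASS_OF: Dict[str, Set[str]] = {
    "WorkflowRun": {"Run"},
    "ProcessRun": {"Run"},
    "Run": {"Thing"},
}


def superclasses(cls: str, sub_class_of: Dict[str, Set[str]]) -> Set[str]:
    """All direct and indirect superclasses of cls (transitive closure)."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, set()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted: str, sub_class_of: Dict[str, Set[str]]) -> Set[str]:
    """The asserted type plus every type implied by the subclass links."""
    return {asserted} | superclasses(asserted, sub_class_of)


print(sorted(inferred_types("ProcessRun", SUB_CLASS_OF)))
```

With this closure, a query for all instances of `Run` would also find resources that were only asserted to be `ProcessRun` or `WorkflowRun`.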
Typed Workflow Run
[Figure: the workflow run graph annotated with the Provenance Ontology. Ontology classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization. Ontology properties: runs, launchedBy, belongsTo, executed. Instance nodes: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51.]
Storage
• Named RDF graphs
– retrieve whole graphs (eg workflows)
– implementation in
• NG4J (Jena + MySQL)
– scalability issues
• Sesame2 native store
– scalable
– Java 5
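The point of named graphs is that each triple carries the name of the graph it belongs to, so a whole graph (for example one workflow run) can be fetched in one step. The following in-memory sketch illustrates that idea only; `NamedGraphStore` is a hypothetical class and does not reflect the NG4J or Sesame APIs.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (graph name, subject, predicate, object)


class NamedGraphStore:
    """Hypothetical in-memory store of triples grouped by graph name."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, graph: str, triple: Triple) -> None:
        self._graphs[graph].add(triple)

    def graph(self, name: str) -> Set[Triple]:
        """Return a copy of one whole named graph (empty if unknown)."""
        return set(self._graphs.get(name, set()))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Quad]:
        """Yield matching quads from every graph; None acts as a wildcard."""
        for name, triples in self._graphs.items():
            for ts, tp, to in triples:
                if s in (None, ts) and p in (None, tp) and o in (None, to):
                    yield (name, ts, tp, to)


store = NamedGraphStore()
store.add("urn:example:graph:run8", ("urn:example:wfInstance:8", "executed", "urn:example:processRun:84"))
store.add("urn:example:graph:run9", ("urn:example:wfInstance:9", "executed", "urn:example:processRun:90"))
print(store.graph("urn:example:graph:run8"))
```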
Query
• RDF query languages
– TriQL, SeRQL, SPARQL
• query languages for named RDF graphs
• Ontology inspection/reasoning
• Canned Queries
– workflows with failed processes
– input/output of past process runs
– workflows with data changed by user
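As an illustration of a canned query, "workflows with failed processes" can be expressed as a two-step match over triples. In practice this would be written in one of the RDF query languages listed on this slide; the Python below is only a sketch, and the `hasStatus` predicate and `failed` value are hypothetical (the source does not say how failures are recorded).

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def runs_with_failed_processes(triples: Iterable[Triple]) -> Set[str]:
    """Workflow runs that executed at least one failed process run."""
    triples = list(triples)
    # Step 1: process runs marked as failed (hypothetical predicate).
    failed = {s for s, p, o in triples if p == "hasStatus" and o == "failed"}
    # Step 2: workflow runs linked to one of those by "executed".
    return {s for s, p, o in triples if p == "executed" and o in failed}


graph = [
    ("urn:example:wfInstance:8", "executed", "urn:example:processRun:84"),
    ("urn:example:wfInstance:8", "executed", "urn:example:processRun:51"),
    ("urn:example:wfInstance:9", "executed", "urn:example:processRun:90"),
    ("urn:example:processRun:51", "hasStatus", "failed"),
    ("urn:example:processRun:90", "hasStatus", "completed"),
]
print(runs_with_failed_processes(graph))
```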
Browsing
Provenance Browsing
• Provenance Browser Plugin
– reusing Taverna GUI components
• Matthew Gamble
Analysis
Provenance Analysis
• Comparison
• Aggregation
• etc
– see work by Jun Zhao
Security
• User sends LSID ref and credentials to the Access Point
• Access Point returns data and metadata or denies access as follows:
– credentials are passed to a User Directory
– User Directory passes the corresponding user to the Authorization Authority
– Authorization Authority returns the user attributes in the form of a (possibly signed) SAML assertion
– this assertion, together with the LSID and its corresponding metadata, is passed to the Policy Enforcement Point (PEP)
– PEP uses these three inputs to form an XACML request that is passed to a Policy Decision Point (PDP) that is preloaded with an XACML Policy Set
– PDP evaluates the request against its policy set and returns an XACML response to PEP
– PEP decodes the response and either allows data/metadata to be returned to the user or denies access
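The request flow on this slide can be sketched as a single function that wires the components together. Everything here is a simplification under stated assumptions: plain dictionaries stand in for the SAML assertion and the XACML request, the components are passed in as dictionaries or callables, and all names are hypothetical. Only the decision values "Permit" and "Deny" are taken from XACML.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def access_point(
    lsid: str,
    credentials: str,
    user_directory: Dict[str, str],
    authorization_authority: Callable[[str], Dict[str, Any]],
    metadata_store: Dict[str, Dict[str, Any]],
    data_store: Dict[str, bytes],
    pdp: Callable[[Dict[str, Any]], str],
) -> Optional[Tuple[Optional[bytes], Dict[str, Any]]]:
    """Return (data, metadata) if access is permitted, otherwise None."""
    # 1. Credentials are looked up in the User Directory.
    user = user_directory.get(credentials)
    if user is None:
        return None
    # 2. The Authorization Authority returns the user's attributes
    #    (a dict here, a SAML assertion in the described design).
    assertion = authorization_authority(user)
    # 3. The PEP combines assertion, LSID and metadata into one request
    #    (a dict here, an XACML request in the described design).
    metadata = metadata_store.get(lsid, {})
    request = {"subject": assertion, "resource": lsid, "metadata": metadata}
    # 4. The PDP evaluates the request; the PEP enforces the decision.
    if pdp(request) == "Permit":
        return data_store.get(lsid), metadata
    return None
```

Passing the PDP in as a callable keeps enforcement (the PEP's job) separate from the policy itself, which mirrors the split between PEP and PDP described above.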
myGrid XACML Policy
• Scenario
– supervisors can access all workflows in the organization
– students can access only their own workflows
– blacklisted users cannot access anything
• See policySet.xml on myGrid wiki
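The three rules of the scenario can be written down as a small decision function. This is not a transcription of policySet.xml: the attribute names (`role`, `organization`, `id`, `owner`, `blacklisted`) are hypothetical, and letting the blacklist override the other rules is an assumption about how the rules combine.

```python
from typing import Any, Dict


def decide(user: Dict[str, Any], workflow: Dict[str, Any]) -> str:
    """Return "Permit" or "Deny" for a user requesting a workflow."""
    # Rule 3: blacklisted users cannot access anything (checked first).
    if user.get("blacklisted", False):
        return "Deny"
    role = user.get("role")
    # Rule 1: supervisors can access all workflows in their organization.
    organization = user.get("organization")
    if (
        role == "supervisor"
        and organization is not None
        and organization == workflow.get("organization")
    ):
        return "Permit"
    # Rule 2: students can access only their own workflows.
    user_id = user.get("id")
    if role == "student" and user_id is not None and user_id == workflow.get("owner"):
        return "Permit"
    # Anything not explicitly permitted is denied.
    return "Deny"


workflow = {"owner": "s1", "organization": "HY7"}
print(decide({"id": "s1", "role": "student", "organization": "HY7"}, workflow))
print(decide({"id": "s2", "role": "student", "organization": "HY7"}, workflow))
```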