semantic mediation in seek/kepler: exploiting semantic annotation for discovery, analysis, and...



Post on 20-Jan-2016


  • Semantic Mediation in SEEK/Kepler: Exploiting Semantic Annotation for Discovery, Analysis, and Integration of Scientific Data and Workflows

    Bertram Ludäscher
    Dept. of Computer Science, UC Davis
    UC Davis Genome Center
    ludaesch @ ucdavis.edu

    Shawn Bowers
    UC Davis Genome Center
    sbowers @ ucdavis.edu

    seek.ecoinformatics.org | kepler-project.org | www.sdsc.edu | dbis.ucdavis.edu | genomics.ucdavis.edu


    Semantic Mediation System, SEEK/Kepler

    Science Environment for Ecological Knowledge

    SEEK is an NSF-funded, multidisciplinary research project to facilitate:

    - Access to distributed ecological, environmental, and biodiversity data
      - Enable data sharing & reuse
      - Enhance data discovery at global scales
    - Scalable analysis and synthesis
      - Taxonomic, spatial, temporal, conceptual integration of data, addressing data heterogeneity issues
      - Enable communication and collaboration for analysis
      - Enable reuse of analytical components
      - Support scientific workflow design and modeling


    SEEK data access, analysis, mediation

    - Data Access (EcoGrid)
      - Distributed data network for environmental, ecological, and systematics data
      - Interoperate diverse environmental data systems
    - Workflow Tools (Kepler)
      - Problem-solving environment for scientific data analysis and visualization: scientific workflows
    - Semantic Mediation (SMS)
      - Leverage ontologies for smart data/component discovery and integration


    Managing Data Heterogeneity

    - Data comes from heterogeneous sources
      - Real-world observations
      - Spatial-temporal contexts
      - Collection/measurement protocols and procedures
      - Many representations for the same information (count, area, density)
      - Data, syntax, schema, and semantic heterogeneity
    - Discovery and synthesis (integration) performed manually
      - Discovery often based on intuitive notion of what is out there
      - Synthesis of data is very time consuming, and limits use
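
    As a small illustration of "many representations for the same information", the sketch below normalizes two hypothetical records to a common density value: one stores a count over a sampled area, the other stores density directly. The record layout, field names, units, and sample values are assumptions made for this example and are not taken from SEEK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical record; field names are illustrative, not a SEEK schema."""
    species: str
    count: Optional[int] = None              # individuals counted
    area_m2: Optional[float] = None          # sampled area in square metres
    density_per_m2: Optional[float] = None   # individuals per square metre


def to_density(obs: Observation) -> float:
    """Return individuals per square metre, whichever representation was stored."""
    if obs.density_per_m2 is not None:
        return obs.density_per_m2
    if obs.count is not None and obs.area_m2:
        return obs.count / obs.area_m2
    raise ValueError(f"cannot derive density for {obs.species!r}")


# Made-up sample values: the same information in two representations.
site_a = Observation("Quercus agrifolia", count=12, area_m2=400.0)
site_b = Observation("Quercus agrifolia", density_per_m2=0.03)
densities = [to_density(site_a), to_density(site_b)]
```

    Once both records are expressed as densities they can be compared or combined directly. Doing this conversion by hand for every dataset is the manual synthesis effort the slide refers to.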


    Scientific workflow systems support data analysis

    KEPLER


    A simple Kepler workflow

    - Composite Component (Sub-workflow)
    - Loops often used in SWFs; e.g., in genomics and bioinformatics (collections of data, nested data, statistical regressions, ...)

    (T. McPhillips)


    A simple Kepler workflow

    - Workflow runs PhylipPars iteratively to discover all of the most parsimonious trees.
    - UniqueTrees discards redundant trees in each collection.
    - Lists Nexus files to process (project)
    - Reads text files
    - Parses Nexus format
    - Draws phylogenetic trees
    - PhylipPars infers trees from discrete, multi-state characters.

    (T. McPhillips)


    A simple Kepler workflow

    An example workflow run, executed as a Dataflow Process Network


    SMS motivation

    Scientific Workflow Life-cycle
    - Resource Discovery
      - discover relevant datasets
      - discover relevant actors or workflow templates
    - Workflow Design and Configuration
      - data-to-actor (data binding)
      - data-to-data (data integration / merging / interlinking)
      - actor-to-actor (actor / workflow composition)

    Challenge: do all this in the presence of
    - 100s of workflows and templates
    - 1000s of actors (e.g., actors for web services, data analytics, ...)
    - 10,000s of datasets
    - 1,000,000s of data items
    - highly complex, heterogeneous data

    Price to pay for these resources: $$$ (lots)
    Scientists' time wasted: priceless!


    Approach & SMS capabilities

    [Diagram: Ontologies, Semantic Annotation, Iterative Development, Resource Discovery, Workflow Validation, Resource Integration, Workflow Elaboration]


    Approach & SMS capabilities

    SEEK KR group is developing OWL-DL ontologies:
    - Various workflow-component ontologies (for categorizing by function, project, scientific discipline, ...)
    - Scientific observation ontology (OBOE), an upper ontology for defining and relating observations, measurements, and units
    - Domain-specific ontologies that extend OBOE (standard and derived units, ecology and biodiversity concepts, ...)


    Approach & SMS capabilities

    - Annotations connect resources to ontologies
      - Conceptually describe a resource and/or its data schema
    - Annotations provide the means for ontology-based discovery, integration, ...


    Hybrid types: Semantic + Structural Typing


    Semantic Type Annotation in Kepler

    - Component input and output port annotation
      - Each port can be annotated with multiple classes from multiple ontologies
      - Annotations are stored within the component metadata


    Component Annotation and Indexing

    - Component Annotations
      - New components can be annotated and indexed into the component library (e.g., specializing generic actors)
      - Existing components can also be revised, annotated, and indexed (hiding previous versions)


    Approach & SMS capabilities

    - Ontology-based smart search
      - Find components by semantic types
      - Find components by input/output semantic types
    - Ontology-based query rewriting for discovery/integration
      - Joint work with GEON project (see SSDBM-04, SWDB-04)


    Smart Search

    Find a component (here: an actor) in different locations (categories) based on the semantic annotation of the component (or its ports)


    Searching in context

    - Search for components with compatible input/output semantic types
      - searches over actor library
      - applies subsumption checking on port annotations
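
    The subsumption check on port annotations can be pictured with a minimal sketch. It assumes a toy single-inheritance class hierarchy and a hypothetical actor library keyed by the semantic type of each actor's input port. The class and actor names are invented for illustration, and a real OWL-DL reasoner does considerably more than walk a parent chain.

```python
# Toy ontology: child class -> parent class (hypothetical names, single inheritance).
SUBCLASS_OF = {
    "SpeciesAbundance": "Abundance",
    "Abundance": "Measurement",
    "Biomass": "Measurement",
    "Measurement": "Thing",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = SUBCLASS_OF.get(current)
    return False


# Hypothetical actor library: actor name -> semantic type of its input port.
ACTOR_INPUTS = {
    "PlotAbundance": "Abundance",
    "SummarizeMeasurements": "Measurement",
    "BiomassRegression": "Biomass",
}


def compatible_consumers(output_type: str) -> list[str]:
    """Actors whose input annotation subsumes the given output annotation."""
    return sorted(
        name for name, input_type in ACTOR_INPUTS.items()
        if subsumes(input_type, output_type)
    )


matches = compatible_consumers("SpeciesAbundance")
```

    In this sketch an output annotated as `SpeciesAbundance` can feed any actor whose input accepts `Abundance` or `Measurement`, but not one that expects `Biomass`.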


    Approach & SMS capabilities

    - Workflow validation and analysis
      - Check that workflows are semantically & structurally well-typed
      - Infer semantic type annotations of derived data (i.e., type inference)
      - An initial approach and prototype based on mapping composition (see QLQP-05)
    - User-oriented provenance
      - Collect & query data-lineage of WF runs (see IPAW-06)


    Workflow validation in Kepler

    - Navigate errors and warnings within the workflow
    - Search for and insert adapters to fix (structural and semantic) errors
    - Statically perform semantic and structural type checking
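
    A static check of one connection against a hybrid (structural plus semantic) type might look like the sketch below. It is not Kepler's implementation: the `HybridType` tuple, the toy ontology, and the rule that structural types must match exactly are all simplifying assumptions made for illustration.

```python
from typing import NamedTuple

# Hypothetical ontology fragment: child class -> parent class.
PARENT = {"SpeciesAbundance": "Abundance", "Abundance": "Measurement"}


class HybridType(NamedTuple):
    structural: str   # e.g. "double", "string"
    semantic: str     # ontology class name


def is_a(specific: str, general: str) -> bool:
    """Walk up the toy hierarchy from `specific` looking for `general`."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def check_connection(out_t: HybridType, in_t: HybridType) -> list[str]:
    """Return the list of problems for one output-to-input connection."""
    problems = []
    # Simplification: structural types must be identical (no structural subtyping).
    if out_t.structural != in_t.structural:
        problems.append(f"structural: {out_t.structural} != {in_t.structural}")
    # The producer's semantic type must be the same as, or narrower than, the consumer's.
    if not is_a(out_t.semantic, in_t.semantic):
        problems.append(f"semantic: {out_t.semantic} is not a {in_t.semantic}")
    return problems


ok = check_connection(HybridType("double", "SpeciesAbundance"),
                      HybridType("double", "Abundance"))
bad = check_connection(HybridType("string", "Measurement"),
                       HybridType("double", "Abundance"))
```

    An empty list means the connection is well-typed under both views. Each reported problem marks a place where an adapter (structural or semantic) would be needed.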


    Approach & SMS capabilities

    - Integrating and transforming data
      - Merge (smart union) datasets
      - Find mappings between data schemas for transformation: data binding, component connections (see DILS-04)


    Smart (Data) Integration: Merge

    - Discover data of interest
    - Connect to merge actor
    - Compute merge
      - align attributes via annotations
      - open dialog for user refinement
      - store merge mapping in MOML
    - Enjoy your merged dataset! (Almost: it can be much more complicated.)


    Under the hood of Smart Merge

    - Exploits semantic type annotations and ontology definitions to find mappings between sources
    - Executing the merge actor results in an integrated data product (via outer union)

    [Figure: Merge actor with input attributes a3, a6, a1, a8, a4 and output attributes a1, a8, a3, a6, a4]
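
    To make the outer-union step concrete, the following sketch aligns the columns of two made-up sources through their semantic annotations and then stacks the rows, padding absent attributes with `None`. The column names, annotations, and sample rows are hypothetical. Matching here is by identical ontology class only, whereas the merge described on the slide also draws on ontology definitions.

```python
# Two hypothetical sources with different column names for the same concepts.
source_a = [{"spp": "A. rubrum", "cnt": 4}]
source_b = [{"species_name": "Q. alba", "biomass_g": 12.5}]

# Semantic annotations: column name -> ontology class (illustrative only).
annotations_a = {"spp": "TaxonName", "cnt": "Count"}
annotations_b = {"species_name": "TaxonName", "biomass_g": "Biomass"}


def relabel(rows, annotations):
    """Rename each column to the ontology class it is annotated with."""
    return [{annotations[col]: val for col, val in row.items()} for row in rows]


def outer_union(*tables):
    """Stack tables, keeping every column and padding absent values with None."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns}
            for table in tables for row in table]


merged = outer_union(relabel(source_a, annotations_a),
                     relabel(source_b, annotations_b))
```

    Columns annotated with the same class (`spp` and `species_name`) collapse into one attribute, while attributes present in only one source are kept and left empty for the other.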


    Approach & SMS capabilities

    - Workflow design support
      - (Semi-)automatically combine resource discovery, integration, and validation
      - Abstract-to-executable WF: ongoing work!

    [Diagram: Ontologies, Semantic Annotation, Iterative Development, Resource Discovery, Workflow Validation, Resource Integration, Workflow Elaboration, Automated SWF Refinement]


    Summary

    Outlook: Ontologies and semantic annotations for WF design & reuse
    - Put ontologies to actual use in Kepler
    - Continue to develop Kepler tools for annotation (KR observation ontology), discovery, integration, design, ...

    Issues & Challenges:
    - Tools/approaches for ontology (OWL) management, organization, reasoning
    - Open source (distributed) ontology (OWL) storage and reasoning
    - Tools and techniques for robust ontology versioning and extension

    Acknowledgements
    - Timothy McPhillips, Dave Thau (UC Davis)
    - Mark Schildhauer, Josh Madin, Matt Jones (UCSB)
    - Deana Pennington (UNM)
    - Rich Williams (Microsoft Research)
    - Ferdinando Villa, Sergey Krivov (UVM)

