

  • Project No: FP7-318338

    Project Acronym: Optique

    Project Title: Scalable End-user Access to Big Data

    Instrument: Integrated Project

    Scheme: Information & Communication Technologies

    Deliverable D2.3: First Prototype of the Core Platform

    Due date of deliverable: (T0+12)

    Actual submission date: October 31, 2013

    Start date of the project: 1st November 2012

    Duration: 48 months

    Lead contractor for this deliverable: FOP

    Dissemination level: PU – Public

    Final version

  • Executive Summary: First Prototype of the Core Platform

    This document summarises deliverable D2.3 of project FP7-318338 (Optique), an Integrated Project supported by the 7th Framework Programme of the EC. Full information on this project, including the contents of this deliverable, is available online at http://www.optique-project.eu/.

    The deliverable presents a first software prototype of the Optique system, which integrates an initial set of software components from the technical work packages into a central platform. The deliverable provides detailed documentation of shared platform interfaces, which have been designed and implemented during the first project phase in order to enable a tight integration of the Optique components. In particular, it describes the integration of modules for ontology and mapping management, query transformation and visual query formulation into the initial prototype. Finally, the document provides instructions for setting up the platform and explains the basic functionality of the visual query building process for both end-users and IT-experts.

    List of Authors: Johannes Trame (FOP), Peter Haase (FOP), Ernesto Jiménez-Ruiz (UOXF), Evgeny Kharlamov (UOXF), Christoph Pinkel (FOP), Martín Rezk (FUB), Ahmet Soylu (UiO)

    Contributors: Timea Bagosi (FUB), Marco Console (UNIROMA1), Martin Giese (UiO), Mariano Rodríguez-Muro (FUB), Marco Ruzzi (UNIROMA1), Domenico Fabio Savo (UNIROMA1), Michael Schmidt (FOP), Martin G. Skjæveland (UiO), Marius Strandhaug (UiO), Evgenij Thorstensen (UOXF)


  • Contents

    1 Introduction

    2 Optique Platform Architecture & Implementation
      2.1 Architectural Overview
      2.2 The Information Workbench as a Platform for Integration
      2.3 Shared Platform Interfaces
        2.3.1 Ontology Management API
        2.3.2 Relational Schema API
        2.3.3 R2RML Mapping Management API
        2.3.4 RDF Data Management API
      2.4 Integration of Components
        2.4.1 Ontology & Mapping Management
        2.4.2 Query Transformation
        2.4.3 Query Formulation

    3 Documentation and Prototype
      3.1 Installation Instructions
        3.1.1 Obtaining the platform bundle
        3.1.2 Installation Requirements
        3.1.3 Installation
        3.1.4 Opening the Optique platform
        3.1.5 Shutting down the Optique platform
      3.2 Administration Guide
        3.2.1 Setup and Configuration Steps
        3.2.2 Full Administrator Guide
      3.3 End User Documentation
        3.3.1 General Platform Features for Exploration, Search, Authoring and Visualisation
        3.3.2 Visual Query Formulation

    4 Conclusion and Outlook

    Bibliography

    Glossary

  • Chapter 1

    Introduction

    A typical problem that end-users face when dealing with Big Data is that of data access. Due to the increasing volume of data, the velocity of data growth, and the variety of different data formats, accessing the relevant information becomes a difficult task. In situations where an end-user needs data that predefined queries do not provide, the help of IT-experts (e.g., database managers) is required to translate the information need of end-users into specialised queries and optimise them for efficient execution. Since this process may require several iterations and may take several days, IT-experts spend 30–70% of their time gathering and assessing the quality of data [2]. Especially in domains such as the oil and gas industry this process is the main bottleneck for data access. The Optique project aims at solutions that dramatically reduce this cost in time and money.

    Figure 1.1: The Optique OBDA system

    The key idea of the semantic approach known as “Ontology-Based Data Access” (OBDA) [4, 1] is to use an ontology, which presents to users a semantically rich conceptual model of the problem domain. The users formulate their information requirements (that is, queries) in terms of the ontology, and then receive the answers in the same intelligible form. These requests should be executed over the data automatically, without an IT-expert’s intervention. To this end, a set of mappings is maintained which describes the relationship between the terms in the ontology and the corresponding terminology in the data source specifications, e.g., table and column names in relational database schemas.
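    A query posed at the ontology level can be sketched as follows; the class and property names here are purely illustrative and not taken from an actual Optique ontology. The end-user never sees the underlying tables – the mappings translate such a query into SQL over the data sources:

    ```sparql
    # Hypothetical end-user query, phrased purely in ontology terms
    PREFIX ex: <http://example.org/ontology#>

    SELECT ?wellbore ?name
    WHERE {
      ?wellbore a ex:Wellbore ;
                ex:name ?name .
    }
    ```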

    Beyond scalability issues and the ability to handle (temporal) data streams, state-of-the-art OBDA systems place particular constraints on usability, both for end-users (e.g., syntactic and conceptual mismatches during query formulation) and for IT-experts (e.g., suitable ontologies and mappings are expensive to obtain). The Optique project aims at developing the next-generation OBDA system (cf. Figure 1.1) that


    overcomes these limitations. A broader goal of the project is to provide a platform with a generic architecture that can be adapted to any domain that requires scalable data access and efficient query execution.

    In this deliverable we present an initial prototype of the Optique platform, describing the individual system components and their integration and interaction via shared interfaces, system-wide conventions and standards that have been implemented during the first project phase. The first prototype is strongly aligned with the initial Optique architecture and requirements as specified by all technical and scientific partners in Optique Deliverable 2.1.

    The remainder of the deliverable is organized as follows: In Chapter 2 we recall the agreed architecture as specified in Deliverable 2.1, before describing in detail the design and implementation of the shared platform interfaces and their role in integrating the different components. Chapter 3 provides documentation for installation and configuration of the platform as well as end-user instructions. Chapter 4 summarizes the main achievements and platform features and gives an overview of ongoing development and engineering activities for the second year.


  • Chapter 2

    Optique Platform Architecture & Implementation

    2.1 Architectural Overview

    The Optique architecture is designed as a tiered architecture with three layers (cf. Figure 2.1). While the presentation layer specifically addresses end-users' needs such as query formulation and query answer visualisation, it also provides functionality for IT-experts to set up and (re)configure the platform.

    The core components, such as those for query transformation, query answering, and ontology and mapping management, are integrated on the application layer and interact through specialized application programming interfaces. The data and resource layer is responsible for managing access to different kinds of (re)sources, for example relational databases, data streams, or even computational resources in the cloud.

    Figure 2.1: The general architecture of the Optique OBDA system


    2.2 The Information Workbench as a Platform for Integration

    The Optique platform is built on the Information Workbench, an industrial-strength and mature software platform developed and maintained by fluid Operations.1 The Information Workbench is an open, data-centric development platform that has been specifically designed to support the whole lifecycle of interacting with semantic data – from integration to access, visualization, exploration, and data interaction. Exploiting the features that are already present in the Information Workbench thus provides us with a stable foundation for the implementation of the Optique platform and enables us to make a first version of the platform available already during this project phase.

    Based on the architecture and language standards agreed in Deliverable 2.1, the Optique platform is supposed to provide interfaces for data access and querying, as well as plugin mechanisms for query language extensions and frontend components. While the Information Workbench already provides a number of extension points for automatically plugging in new modules, special programming interfaces have been designed and implemented according to the requirements specified in Deliverable 2.1, in order to enable a tight integration and seamless interaction of the Optique software components.

    2.3 Shared Platform Interfaces

    We distinguish between (i) the shared interfaces among the Optique components and (ii) the interfaces provided by each component itself (e.g. APIs for query formulation). The concrete interfaces for the components and their interplay with the platform are described in the deliverables of the components themselves. In the initial architecture specification, we agreed on a number of shared interfaces:

    (1) API for Ontology Management

    (2) API for Mapping Management

    (3) API for Relational Data and Metadata Management

    (4) API for RDF Data Management

    (5) API for Streaming Data Management

    (6) API for Cloud Automation

    Within the first year we designed and implemented interfaces for (1), (2), (3) and (4), as those have been prioritized by the partners during the requirement specification in Deliverable 2.1. Table 2.1 provides an overview of which APIs are used by the Ontology and Mapping Management (O&M), the Query Transformation (QT) and the Query Formulation (QF) components within the initial prototype of the platform.

                                                    O&M   QT    QF
    Ontology Management API                          X     X     X
    Mapping Management API                           X     X
    Relational Data and Metadata Management API      X     X
    RDF Data Management API                          X

    Table 2.1: Usage of shared APIs by the different components.

    The following subsections describe in detail the signatures and functionalities of these four interfaces. The implementation of the interfaces is tested by a set of JUnit tests, which are provided in the respective packages.

    1 http://www.fluidops.com/information-workbench/


    2.3.1 Ontology Management API

    The Ontology Management API exposes functionality for loading, storing, manipulating and reasoning over OWL ontologies. Ontologies are stored natively in the Optique repository as RDF triples, each ontology in its own context (named graph). Through the ontology management API, ontologies can be accessed through the object model of the OWLAPI.2 In this way, much of the basic ontology management functionality can be directly reused from the OWLAPI and seamlessly interacts with reasoners such as HermiT or Pellet.3

    Ontologies in the Optique repository are uniquely identified via their Ontology URI.4

    Note on ontology identifiers: The OWL 2 specification uses IRIs instead of URIs. We use URIs for consistency with the rest of the Optique APIs; the API has convenience methods to convert between URI and IRI. Moreover, while the OWL 2 specification states that an ontology *may* have an ontology IRI, we require that it *must* have one in order to identify it.

    At this point ontology identifiers are supposed to be unique. In the future, we may want to extend the APIto support multiple versions of an ontology with the same ontology IRI/URI, but different version identifiers.

    Location

    • Interface and implementation class in package com.fluidops.iwb.api

    • Interface OntologyManager: the ontology manager interface

    • Class OntologyManagerImpl: the corresponding implementation class

    Using the API

    The API comes with three basic methods, as documented below. In order to obtain a runtime object of the ontology manager, it is required to call:

    > OntologyManager om = EndpointImpl.api().getOntologyManager();

    On top of this object, the API itself provides the following methods:

    Load an ontology from the Optique repository
    OWLOntology loadOntology(URI ontologyURI) throws OWLOntologyCreationException;

    The method loads an ontology with the URI ontologyURI. It returns an object of type OWLOntology, which is defined in the OWLAPI. Therefore, one can now use all the existing functionality from the OWLAPI. For the implementation of the load method, we have implemented a new OWLOntologyDocumentSource that allows to read ontologies directly from the Optique triple store in an efficient manner.

    Store an ontology in the Optique repository
    boolean storeOntology(OWLOntology ontology, URI ontologyURI, boolean overwrite)
        throws OWLOntologyStorageException, OWLOntologyCreationException;

    This method takes an ontology and stores it under the given ontologyURI in the Optique repository. With overwrite one can specify whether an existing ontology with the ontologyURI should be overwritten. The method returns true if the ontology has been successfully stored. Of course it is possible to store an ontology that has been created via the OWLAPI from some other source.

    Remove an ontology from the repository
    boolean removeOntology(URI ontologyURI) throws OWLOntologyStorageException;

    2 http://owlapi.sourceforge.net
    3 See http://clarkparsia.com/pellet and http://hermit-reasoner.com
    4 http://www.w3.org/TR/owl2-syntax/#Ontology_IRI_and_Version_IRI


    This method takes a given ontologyURI and removes the corresponding ontology from the Optique repository. The method returns true if the ontology has been successfully removed.

    Code Example

    Listing 2.1: Example of programmatic access to the OntologyAPI

    /*
     * create an OWLAPI ontology manager and obtain the platform's ontology manager
     */
    OWLOntologyManager m = OWLManager.createOWLOntologyManager();
    OntologyManager om = EndpointImpl.api().getOntologyManager();

    /*
     * Load an ontology from the web via the OWLAPI and store it in the Optique repository
     */
    // create the URI
    URI testOntologyURI = ValueFactoryImpl.getInstance().createURI(
        "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine");

    // first use the OWLAPI to load the wine ontology from the web
    OWLOntology o = m.loadOntologyFromOntologyDocument(IRI.create(testOntologyURI.stringValue()));

    // store the ontology using the platform ontology manager
    om.storeOntology(o, testOntologyURI, false);

    /*
     * Load the ontology from the platform repository
     */
    o = om.loadOntology(testOntologyURI);

    // use the OWLAPI to count the axioms
    int axiomCount = o.getAxiomCount();

    // check that the ontology is in OWL 2 DL
    boolean isInDL = (new OWL2DLProfile()).checkOntology(o).isInProfile();

    /*
     * Remove the ontology from the platform repository
     */
    om.removeOntology(testOntologyURI);

    2.3.2 Relational Schema API

    The Relational Data & Metadata API provides a standardized method for accessing relational metadata such as table and column information, datatypes, constraints, and indices. In order to use the API, a so-called Relational Metadata Provider, RDBMetadataProvider for short, needs to be created within the platform (see Section 3.2.1), which is responsible for extracting the required information from a relational database. While the API serves primarily as a common abstraction layer over different relational endpoints, it also acts as a caching layer which allows for much faster access to the metadata than fetching the metadata individually every time from within the different components. Once the information has been collected, the API can be used to obtain an in-memory representation (PoJo, Plain Old Java Object) of the database and its schema information.

    Location

    • The Relational Data & Metadata API is implemented as part of the core platform.

    • The RDBMetadataProvider is implemented in package com.fluidops.iwb.provider

    • The API itself is implemented in the package com.fluidops.iwb.api

    • The package com.fluidops.iwb.api.datacatalog contains the interfaces

    • The class DataCatalogService in package com.fluidops.iwb.api.catalog is the main entry point


    • The class RelationalDataEndpoint in package com.fluidops.iwb.api.DataCatalog represents the PoJo serving as an entry point for accessing meta information about relational schemas

    • The package com.fluidops.iwb.api.DataCatalog.impl contains the implementation

    Access Extracted Metadata

    The API comes with three methods, as documented below. In order to obtain a runtime object from the API, one needs to call:

    > DataCatalogService dcs = EndpointImpl.api().getDataCatalog();

    At this point, it is assumed that an RDBMetadataProvider has already been created before (see Section 3.2.1). On top of this object, the API itself provides the following methods:

    Check Data Endpoint Existence
    public boolean dataEndpointExists(URI dataEndpointId)

    The method checks whether the data endpoint with the given ID exists. The endpoint ID is composed from the default namespace of the platform followed by the ID of the RDBMetaInformationProvider.
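    Composing such an endpoint ID amounts to plain string concatenation; the sketch below mirrors the "npd/npd" pattern used in Listing 2.2, where the provider ID appears twice. Both the helper name and the example namespace value are illustrative assumptions, not part of the platform API:

    ```java
    public class EndpointIdSketch {
        // Hypothetical helper mirroring how Listing 2.2 builds the endpoint URI:
        // default namespace + providerId + "/" + providerId
        static String composeEndpointId(String defaultNamespace, String providerId) {
            return defaultNamespace + providerId + "/" + providerId;
        }

        public static void main(String[] args) {
            // "http://example.org/resource/" stands in for the platform's default namespace
            System.out.println(composeEndpointId("http://example.org/resource/", "npd"));
        }
    }
    ```
    
    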

    Load Data Endpoint by ID
    public DataEndpoint loadDataEndpoint(URI dataEndpointId) throws IllegalArgumentException

    Returns the DataEndpoint PoJo object associated with the specified data catalog entry, which allows browsing through the database's meta information and schema. If no catalog entry for the specified id exists, an IllegalArgumentException is thrown. For now, the returned object will always be of the more specific type RelationalDatabaseEndpoint. Hence, to access the relational schema, it might be required to cast the endpoint to the latter.

    Load All Data Endpoints
    public List<DataEndpoint> loadDataEndpoints()

    Returns a list of the DataEndpoint PoJos for all DataEndpoints that have been extracted. As for the method loadDataEndpoint(URI dataEndpointId), it might be required to cast the DataEndpoints in the list to type RelationalDatabaseEndpoint in order to access the relational schema contained within.

    Working With Extracted Data

    RelationalDatabaseEndpoint objects can be retrieved through the API as described above. An endpoint corresponds to a database in RDBMS terms, i.e., each RelationalDatabaseEndpoint represents one schema within a specific instance of a database server. RelationalDatabaseEndpoint contains two methods:

    • Schema getSchema(): returns an object representing metadata on the schema/database.

    • DatabaseInfo getDatabaseInfo(): returns an object representing information about the database server instance that hosts the schema/database.

    From a Schema object, the schema's name and list of Tables can be accessed. Tables, in turn, contain the following methods to access relevant meta information:

    • String getName(): returns the table’s local name, e.g., “mytable”

    • String getFullName(): returns the fully qualified name of a table, e.g., “mydb.mytable”

    • List<Column> getColumns(): returns the columns (i.e., attributes) in the table

    • PrimaryKey getPrimaryKey(): returns the primary key of the table


    • List<ForeignKey> getForeignKeys(): returns a list of foreign keys, if any

    • TableType getTableType(): returns the type of the table (can be TABLE for normal tables, but could also be a VIEW or other types supported by the SQL standard)

    • Schema getSchema(): returns the schema that contains the table

    • List<Index> getIndices(): lists the indices of the table

    RDF Representation of Relational Schemas

    Internally, the RelationalSchemaOntology is used to represent relational schemas in RDF. The ontology defines a set of classes and properties that allow to interlink all information that belongs to a schema: schemas are related to tables; tables consist of columns, primary keys, foreign keys and indices; a column has an associated datatype and a position at which it occurs in the table. The ontology is bundled with the Optique platform and will be loaded automatically during the start-up of the platform.

    Code Examples

    The following code snippet exemplifies how to output the table names and columns of such an endpoint (assuming there is an RDBMetaInformationProvider with id “npd”, as illustrated above).

    Listing 2.2: Example of programmatic access to the RelationalDatabaseEndpoint

    // compose the URI of the data catalog through the providerId of the RDBMetaInformationProvider
    URI dataEndpointId = ProviderUtils.objectAsUri(
        EndpointImpl.api().getNamespaceService().defaultNamespace() + "npd/npd");

    if (!dcs.dataEndpointExists(dataEndpointId))
        System.out.println("Data endpoint with ID " + dataEndpointId + " does not exist");
    else
    {
        DataEndpoint de = dcs.loadDataEndpoint(dataEndpointId);
        if (de instanceof RelationalDatabaseEndpoint)
        {
            RelationalDatabaseEndpoint rde = (RelationalDatabaseEndpoint) de;
            Schema s = rde.getSchema();
            System.out.println("Schema = " + s.getFullName());
            for (Table t : s.getTables())
            {
                System.out.println("-> Table: " + t.getName());
                for (Column c : t.getColumns())
                {
                    System.out.println("---> Column: " + c.getName()
                        + " with datatype " + c.getColumnDataType().getName());
                }
            }
        }
    }

    2.3.3 R2RML Mapping Management API

    The R2RML Mapping Management API has been designed for managing and manipulating collections of mappings according to the R2RML standard.5
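    For orientation, a minimal R2RML mapping of the kind such a collection would contain might look as follows; the table, column and ontology names are purely illustrative:

    ```turtle
    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix ex: <http://example.org/ontology#> .

    # Maps each row of the (hypothetical) WELLBORE table to an instance of ex:Wellbore
    ex:WellboreMap a rr:TriplesMap ;
        rr:logicalTable [ rr:tableName "WELLBORE" ] ;
        rr:subjectMap [
            rr:template "http://example.org/wellbore/{ID}" ;
            rr:class ex:Wellbore
        ] ;
        rr:predicateObjectMap [
            rr:predicate ex:name ;
            rr:objectMap [ rr:column "NAME" ]
        ] .
    ```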

    Location

    • Interface and factory class in package eu.optique.api.r2rml

    • Class R2RMLMappingManager: the mapping manager interface

    • Class R2RMLMappingManagerFactory: the associated factory class

    • Implementation in package eu.optique.api.r2rml

    • Class R2RMLMappingManagerImpl: the implementation class

    • The package eu.optique.api.r2rml also contains various test cases

    5 http://www.w3.org/TR/r2rml/

    Accessing Mappings from the API

    The API comes with methods for mapping collection management, supporting the deserialization of mappings from an input stream, serialization to file, and maintenance (add, delete, replace) of mappings loaded into the central metadata repository. At its core, the API provides in-memory classes for mapping exploration and manipulation. In order to obtain the API object, the factory class needs to be used as follows:

    > R2RMLMappingManager mm = R2RMLMappingManagerFactory.getApi();

    The mapping manager relies on the concept of mapping collections, where a mapping collection contains a set of mappings, thus serving as a logical grouping of R2RML mappings (e.g. containing all mappings obtained from importing an R2RML file, or all mappings related to a given relational database). At API level, every API method requires the specification of the so-called mapping collection ID. Mapping collections with different IDs are thus treated as isolated sets of mappings. On top of the R2RMLMappingManager object, the API itself provides the following functionality:
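    The collection semantics just described (isolated sets keyed by ID, with import failing on an existing ID and deletion failing on an unknown one) can be sketched with a toy in-memory model. This is an illustration of the contract only, not the actual R2RMLMappingManagerImpl; it stores mappings as plain strings rather than R2RML graphs:

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model of the mapping-collection contract: collections are isolated
    // sets of mappings identified by a collection ID.
    public class MappingCollectionSketch {
        private final Map<String, List<String>> collections = new HashMap<>();

        // mirrors importMappings: fails if the collection ID already exists
        public void importMappings(String collectionId, List<String> mappings) {
            if (collections.containsKey(collectionId))
                throw new IllegalArgumentException("Collection exists: " + collectionId);
            collections.put(collectionId, new ArrayList<>(mappings));
        }

        // mirrors mappingCollectionExists
        public boolean mappingCollectionExists(String collectionId) {
            return collections.containsKey(collectionId);
        }

        // mirrors deleteMappings: fails if the collection ID is unknown
        public void deleteMappings(String collectionId) {
            if (collections.remove(collectionId) == null)
                throw new IllegalArgumentException("No such collection: " + collectionId);
        }
    }
    ```
    
    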

    Import R2RML Mappings From an Input Stream
    public void importMappings(String mappingCollectionId, InputStream is, RDFFormat rdfFormat)
        throws IllegalArgumentException;

    The method imports an R2RML mapping from an input stream. The input stream is passed in parameter is. The mappingCollectionId provides an internal ID for the collection of mappings specified in the input stream. The rdfFormat specifies the RDF format in which the mapping is serialized (if not provided, Turtle format is assumed). If the mappingCollectionId already exists, an IllegalArgumentException is thrown.

    Export R2RML Mappings as String (in Turtle)
    public String exportMappings(String mappingCollectionId)
        throws IllegalArgumentException;

    This method allows to export an existing mapping collection with the specified mappingCollectionId as a string. The output format is Turtle. An IllegalArgumentException is thrown if the specified mapping collection does not exist.

    Export R2RML Mappings as String
    public String exportMappings(String mappingCollectionId, RDFFormat rdfFormat)
        throws IllegalArgumentException;

    This method allows to export an existing mapping collection with the specified mappingCollectionId as a string. The rdfFormat specifies the RDF format in which the mapping is to be serialized (if not provided, Turtle format is assumed). An IllegalArgumentException is thrown if the specified mapping collection does not exist.

    Export R2RML Mappings to File
    public void exportMappings(String mappingCollectionId, String fileName, RDFFormat rdfFormat)
        throws IllegalArgumentException;

    Allows to export an existing mapping collection with the specified mappingCollectionId to the file specified in fileName (to be stored on the server). The rdfFormat specifies the RDF format in which the mapping is to be serialized (if not provided, Turtle format is assumed). An IllegalArgumentException is thrown if the specified mapping collection does not exist.


    In-memory Mapping Retrieval & Access

    The API provides methods to retrieve a list of identifiers for existing mapping collections as well as to check whether a collection with a specific ID is already known by the platform:

    • public List<String> listMappingCollections();: Lists the IDs of all mapping collections that are available in the system (i.e., have been previously imported).

    • public boolean mappingCollectionExists(String mappingCollectionId);: Checks whether the mapping collection with the specified mappingCollectionId exists.

    Once a mapping collection exists (i.e., mappings have been imported), it is possible to access, add, remove and replace the available collections and mappings through a set of methods, as documented in the following:

    Check Existence of a Mapping in a Collectionpublic boolean mappingExists(TriplesMap mapping, String mappingCollectionId)

    throws IllegalArgumentException, InvalidR2RMLMappingException;

Checks whether the specified mapping (passed as an in-memory object) exists in a given mapping collection. Throws an IllegalArgumentException if the collection does not exist. The existence check does not semantically or syntactically match the mappings, but instead just checks whether there is a mapping in the specified mapping collection that has a blank node or URI identical to the one that is stored in the mapping in-memory object. In the very unlikely case that the specified mapping collection cannot be processed, an InvalidR2RMLMappingException is thrown.

Load a Collection of Mappings as Main-memory Java Objects
public R2RMLMappingCollection loadMappings(String mappingCollectionId)

    throws IllegalArgumentException;

The method loads a collection of (previously imported) mappings into main memory and returns an object that can be used for further exploration and manipulation. The mappingCollectionId identifies which mappings are to be loaded. If the specified mappingCollectionId does not exist in the central store, an IllegalArgumentException is thrown. In the very unlikely case that the specified mapping collection cannot be processed, an InvalidR2RMLMappingException is thrown.

Deleting a Mapping Collection
public void deleteMappings(String mappingCollectionId)

    throws IllegalArgumentException;

The method deletes a collection of (previously imported) mappings. The mappingCollectionId identifies which mappings are to be deleted. If the specified mappingCollectionId does not exist in the central store, an IllegalArgumentException is thrown.

Delete a Mapping From a Collection
public boolean deleteMapping(TriplesMap mapping, String mappingCollectionId)

    throws IllegalArgumentException, InvalidR2RMLMappingException;

Deletes the given mapping from the collection. Throws an IllegalArgumentException if the mapping collection ID does not exist. Deletion is performed by considering the mapping’s resource ID. If no mapping with the specified resource ID is found, the method returns false. If deletion succeeds, the method returns true. In the very unlikely case that the specified mapping collection cannot be processed, an InvalidR2RMLMappingException is thrown.

Add a Mapping to a Collection
public void addMapping(TriplesMap mapping, String mappingCollectionId)

    throws IllegalArgumentException, InvalidR2RMLMappingException;

    Adds the given mapping to the collection; if the collection does not exist, a new collection is created. The


method throws an IllegalArgumentException if a mapping with the mapping’s resource ID already exists in the repository.

Replace a Mapping in a Collection
public boolean replaceMapping(TriplesMap toRemove, TriplesMap toAdd, String mappingCollectionId)

    throws IllegalArgumentException, InvalidR2RMLMappingException;

Replaces a mapping in the collection with the specified mappingCollectionId. Throws an IllegalArgumentException if the mapping collection ID does not exist. Deletion is performed by matching the mapping’s resource ID. The returned flag indicates whether the specified mapping has been successfully deleted (adding will take place independently, i.e., if the mapping to be removed does not exist, this method is equivalent to calling addMappingToCollection). In the very unlikely case that the specified mapping collection cannot be processed, an InvalidR2RMLMappingException is thrown.

    In-memory Mapping Manipulation

Creating a TriplesMap
To create a new TriplesMap, the MappingFactory class provides three methods:

    • public static TriplesMap createTriplesMap(LogicalTable lt, SubjectMap sm);

    • public static TriplesMap createTriplesMap(LogicalTable lt, SubjectMap sm, PredicateObjectMap pom);

• public static TriplesMap createTriplesMap(LogicalTable lt, SubjectMap sm, List<PredicateObjectMap> listOfPom);

A TriplesMap must have a LogicalTable and a SubjectMap. It can also have any number of PredicateObjectMaps.6 These can also be created with methods in the MappingFactory class. A LogicalTable must either be a SQLBaseTableOrView or an R2RMLView. These can be created with the methods:

    • public static SQLTable createSQLBaseTableOrView(String tableName);

    • public static R2RMLView createR2RMLView(String query);

A SubjectMap is a subclass of TermMap. TermMaps must have a TermMapType, which must be set to either TermMapType.COLUMN_VALUED, TermMapType.CONSTANT_VALUED, or TermMapType.TEMPLATE_VALUED. A SubjectMap can be created with the methods:

    • public static SubjectMap createSubjectMap(Template template);

    • public static SubjectMap createSubjectMap(TermMapType type, String columnOrConst);

When using the first method, the TermMapType will be set to TermMapType.TEMPLATE_VALUED. A Template can be created with the methods:

    • public static Template createTemplate();

    • public static Template createTemplate(String template);

    A PredicateObjectMap can be created with one of the following methods:

    • public static PredicateObjectMap createPredicateObjectMap(PredicateMap pm, ObjectMap om);

• public static PredicateObjectMap createPredicateObjectMap(PredicateMap pm, RefObjectMap rom);

6http://www.w3.org/TR/r2rml/#property-index


• public static PredicateObjectMap createPredicateObjectMap(List<PredicateMap> pms, List<ObjectMap> oms, List<RefObjectMap> roms);

The PredicateObjectMap must have at least one PredicateMap and at least one ObjectMap or RefObjectMap. When using the third method, the lists must contain elements that meet this requirement. PredicateMap and ObjectMap are subclasses of TermMap (as SubjectMap). These can be created with the methods createPredicateMap and createObjectMap. A RefObjectMap can be created with the method:

    • public static RefObjectMap createRefObjectMap(Resource parentMap);

    The resource node points to the parent triples map of the RefObjectMap.
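To see how these factory calls compose, the following self-contained sketch mimics the creation flow with toy stand-in classes. The class and method names mirror the ones above, but the real Optique API types and signatures are richer (for brevity, the predicate and object of a PredicateObjectMap are plain strings here), so this is an illustration of the composition pattern only, not the actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for the mapping classes; the real API types are richer.
class LogicalTable { final String def; LogicalTable(String def) { this.def = def; } }
class Template { final String pattern; Template(String pattern) { this.pattern = pattern; } }
class SubjectMap { final Template template; SubjectMap(Template t) { this.template = t; } }
class PredicateObjectMap {
    final String predicate, objectColumn;
    PredicateObjectMap(String p, String o) { predicate = p; objectColumn = o; }
}
class TriplesMap {
    final LogicalTable table; final SubjectMap subject;
    final List<PredicateObjectMap> poms = new ArrayList<>();
    TriplesMap(LogicalTable t, SubjectMap s) { table = t; subject = s; }
    void addPredicateObjectMap(PredicateObjectMap pom) { poms.add(pom); }
}
class ToyMappingFactory {
    static LogicalTable createSQLBaseTableOrView(String tableName) { return new LogicalTable(tableName); }
    static Template createTemplate(String template) { return new Template(template); }
    static SubjectMap createSubjectMap(Template template) { return new SubjectMap(template); }
    static PredicateObjectMap createPredicateObjectMap(String pm, String om) { return new PredicateObjectMap(pm, om); }
    static TriplesMap createTriplesMap(LogicalTable lt, SubjectMap sm) { return new TriplesMap(lt, sm); }
}
public class TriplesMapDemo {
    public static void main(String[] args) {
        // a logical table and a subject map are mandatory; predicate-object maps are optional
        TriplesMap tm = ToyMappingFactory.createTriplesMap(
                ToyMappingFactory.createSQLBaseTableOrView("wellbore"),
                ToyMappingFactory.createSubjectMap(
                        ToyMappingFactory.createTemplate("http://example.org/wellbore/{id}")));
        tm.addPredicateObjectMap(ToyMappingFactory.createPredicateObjectMap("rdfs:label", "name"));
        System.out.println(tm.table.def + " -> " + tm.subject.template.pattern
                + " (" + tm.poms.size() + " predicate-object map)");
    }
}
```

The same bottom-up order applies with the real MappingFactory: build the LogicalTable and the term maps first, then assemble the TriplesMap from them.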

Modifying TriplesMaps
A TriplesMap can be modified using the following methods:

    • public void setLogicalTable(LogicalTable lt);

    • public void setSubjectMap(SubjectMap sm);

    • public void addPredicateObjectMap(PredicateObjectMap pom);

    • public void removePredicateObjectMap(PredicateObjectMap pom);

The methods setLogicalTable and setSubjectMap will overwrite the previously set LogicalTable or SubjectMap in the TriplesMap. The method addPredicateObjectMap will add the PredicateObjectMap to the end of the list of PredicateObjectMaps, and removePredicateObjectMap will remove the specified PredicateObjectMap from the list. In general, set-methods will overwrite the previously set object and add-methods will add the object to the end of a list. The remove methods can throw an IllegalStateException if removing the object would make the TriplesMap invalid (for example, removing the last PredicateMap from a PredicateObjectMap).

    Using the API via REST/JSON

Some of the methods are exposed publicly via the platform’s REST endpoint. In particular, the following methods can be used via REST/JSON:

    • public String exportMappings(String mappingCollectionId) throws IllegalArgumentException;

    • public void deleteMappings(String mappingCollectionId) throws IllegalArgumentException;

    • public boolean mappingCollectionExists(String mappingCollectionId);

• public List<String> listMappingCollections();

As an example, we illustrate how to read a mapping collection from the REST/JSON API using the exportMappings functionality. Assume we know there is a mapping collection with id test. The REST call to obtain a serialized version of the mappings in this collection (as Turtle) is as follows:
http://<server>:<port>/REST/JSON/getMappingManager/?method=exportMappings&params=[%22test%22]&id=<id>

The <server> and <port> need to be replaced by the respective server name (or IP) and port. The <id> is just an ID that needs to be passed to the request, an integer such as 1, which will be reported in the answer. The answer may look like this:

{"id":"1","result":"
@prefix dc: <...> .\n
@prefix Settings: <...> .\n
@prefix rso: <...> .\n
...
","jsonrpc":"2.0"}


Thus, the result is returned as a JSON object, which can be further processed with a JSON library; the serialized mappings are stored in the field result. In order to automate the request, the username and password (HTTP basic authentication) need to be passed as POST parameters. In a standard setting, these will be admin/iwb (users can be created and modified on the User Management page).
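As an illustration, the sketch below (a hypothetical helper class, not part of the platform API) shows how such a request URL can be assembled with the params array URL-encoded, and how the result field can be pulled out of the JSON-RPC answer without a full JSON library. Note that URL-encoding the whole params argument also encodes the square brackets, which is equally valid.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper illustrating the exportMappings REST call described above.
public class RestExportExample {

    // Builds the REST URL; server, port and id correspond to the
    // <server>, <port> and <id> placeholders described above.
    public static String buildExportUrl(String server, int port, String collectionId, int id) {
        try {
            // the params argument is a JSON array, URL-encoded: ["test"] -> %5B%22test%22%5D
            String params = URLEncoder.encode("[\"" + collectionId + "\"]", "UTF-8");
            return "http://" + server + ":" + port
                    + "/REST/JSON/getMappingManager/?method=exportMappings"
                    + "&params=" + params + "&id=" + id;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Extracts the "result" string from a JSON-RPC answer of the shape
    // {"id":"1","result":"...","jsonrpc":"2.0"} by plain string scanning.
    public static String extractResult(String json) {
        String marker = "\"result\":\"";
        int start = json.indexOf(marker) + marker.length();
        int end = json.indexOf("\",\"jsonrpc\"", start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(buildExportUrl("localhost", 8888, "test", 1));
        System.out.println(extractResult("{\"id\":\"1\",\"result\":\"@prefix dc: ...\",\"jsonrpc\":\"2.0\"}"));
    }
}
```

In practice, a proper JSON library should be preferred over the string scanning shown here; the point is only to make the shape of the request and answer concrete.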

    2.3.4 RDF Data Management API

A variety of assets in Optique (ontology, mappings, relational database metadata, . . . ) will be stored in the central store as RDF. Using the Sesame RDF API, it will be possible to access these entities directly, e.g., by means of SPARQL queries. In addition, the RDF data management API allows one to store, query, and manipulate other kinds of RDF data in the central store directly.

    Location

    • Interface and factory class in package eu.optique.api.rdf

    • Class RDFDataManager: the data manager interface

    • Class RDFDataManagerFactory: associated factory class

    • Implementation in package eu.optique.api.rdf.impl

    • Class RDFDataManagerImpl: implementation

    General Documentation

The API comes with five methods, as documented below. In order to obtain a runtime object of the API, one needs to call

    > RDFDataManager rdfDm = RDFDataManagerFactory.getApi();

    On top of this object, the API itself provides the following methods:

Load RDF data from a file
public void load(String filename, String format, String context, String source, Boolean userEditable)

The method loads data from the file filename. The other parameters are optional: The parameter format may be used to specify the file format (e.g. N3); if not specified, the file type is guessed from the file suffix. The parameter context determines the context into which the data is written (either as a prefixed URI or as a full URI enclosed in angle brackets). The parameter source is a user-defined string representing the source from which the data is loaded. The parameter userEditable defines whether the loaded data can be edited (i.e., modified & deleted) by the user. If not set, the system default configuration is used.

Execute a SPARQL 1.0/1.1 SELECT query
public TupleQueryResult sparqlSelect(String query) throws Exception

The query result is returned in the form of Sesame’s TupleQueryResult.7

Execute a SPARQL 1.0/1.1 CONSTRUCT query
public GraphQueryResult sparqlConstruct(String query) throws MalformedQueryException, QueryEvaluationException

The query result is returned in the form of Sesame’s GraphQueryResult.8

Execute a SPARQL 1.0/1.1 ASK query
public boolean sparqlAsk(String query) throws MalformedQueryException, QueryEvaluationException

The query result is returned in the form of a boolean.

7http://www.openrdf.org/doc/sesame2/api/org/openrdf/query/TupleQueryResult.html
8http://www.openrdf.org/doc/sesame2/api/org/openrdf/query/GraphQueryResult.html


Execute a SPARQL 1.0/1.1 INSERT/DELETE query
public void sparqlUpdate(String query) throws MalformedQueryException, UpdateExecutionException

    Executes the SPARQL UPDATE query.

    Code Example

Listing 2.3: Example of programmatic access to the RDF Data Management API

// get an instance of the RDFDataManager
RDFDataManager rdfDM = RDFDataManagerFactory.getApi();

// load an example RDF file (source "npd-v2-db-vocab", not user-editable)
rdfDM.load("example.rdf", null, "", "npd-v2-db-vocab", false);

// SPARQL ASK
if (rdfDM.sparqlAsk("ASK { ?s rdfs:label ?o }"))
    System.out.println("Triple exists!");

// SPARQL SELECT
TupleQueryResult res = rdfDM.sparqlSelect("SELECT ?s ?o WHERE { ?s rdfs:label ?o }");

// iterate over the results and extract the variable bindings
int i = 0;
while (res.hasNext()) {
    BindingSet bs = res.next();
    Binding s = bs.getBinding("s");
    Binding o = bs.getBinding("o");
    System.out.print("Result tuple #" + i++ + ": ");
    if (s != null && s.getValue() != null)
        System.out.print("?s -> " + s.getValue());
    if (o != null && o.getValue() != null)
        System.out.println(", ?o -> " + o.getValue());
}

// SPARQL CONSTRUCT
GraphQueryResult gres = rdfDM.sparqlConstruct(
    "CONSTRUCT { ?s rdfs:label ?o } WHERE { ?s rdfs:label ?o }");
while (gres.hasNext())
    System.out.println(gres.next().getSubject());

// SPARQL UPDATE: insert a label triple for an example subject
rdfDM.sparqlUpdate("INSERT DATA { <http://example.org/s> rdfs:label \"Optique Project\" }");

    2.4 Integration of Components

    2.4.1 Ontology & Mapping Management

OBDA systems crucially depend on the existence of suitable ontologies and mappings. Developing them from scratch is likely to be expensive, and a practical OBDA system should support a (semi-)automatic bootstrapping of an initial ontology and set of mappings.

    The Ontology and Mapping (O&M) management component is in charge of creating and evolving the


ontology and mappings, as well as providing an interface to feed the Query Formulation component with the ontology vocabulary in order to guide the formulation of queries.

The current implementation of the Optique O&M component is equipped with an O&M bootstrapper, a routine that takes database schemata and possibly instances over these schemata as input, and returns an ontology and a set of mappings connecting the ontology entities to the elements of the input schemata. For this purpose, the O&M component retrieves the required metadata from the platform’s shared metadata repository using the Relational Schema API (see Section 2.3.2).

The Optique O&M component also integrates an ontology matching system to align the bootstrapped ontology with state-of-the-art domain ontologies, and an ontology approximation module to transform the resulting ontology if it is outside the desired OWL 2 profile.

Currently, the O&M bootstrapper includes two installation wizards (basic and advanced) for the ontology and the mappings. Figure 2.2 depicts the workflows of these wizards. Please refer to Deliverable D4.1 for details about the O&M bootstrapper and its installation wizards. The O&M bootstrapper is compliant with

    Figure 2.2: O&M installation wizards

the Ontology API (see Section 2.3.1) and the Mapping Management API (see Section 2.3.2) provided by the Optique platform and is able to store the resulting assets back to the platform.

    2.4.2 Query Transformation

The Query Transformation (QT) component Ontop allows querying virtual RDF graphs defined by a relational database, an ontology, and a set of R2RML mappings. The general architecture is illustrated in Figure 2.3. In a first step, the set of mappings, the ontology, and the SPARQL query are translated into a set of Datalog rules that represent these objects. Second, the program is optimized using query-containment-based techniques and Semantic Query Optimization. In particular, we:

    • use SLD-resolution to compute a partial evaluation of the program; and

    • optimize the query(ies) with respect to Primary/Foreign Keys to avoid redundant self-joins.

The optimized program is translated into an equivalent relational algebra expression, from which the SQL query is generated and executed by the DBMS.

In order to integrate Ontop into the Optique platform, we agreed on the languages and libraries we were going to support. In particular, we agreed on using R2RML as mapping language, on the OWLAPI for



    Figure 2.3: Query answering with mappings in Ontop

handling ontologies, as well as on the Sesame API to generate the SPARQL algebra and to handle RDF graphs and respective result sets. The database metadata, together with the R2RML mappings and the OWL ontology, are fetched from the platform’s shared metadata repository using the respective APIs and passed to the Ontop (Quest) repository during initialization (cf. Figure 2.4). As Ontop internally makes use of specialized data structures for handling metadata, we created a transformer between Optique’s database metadata object and the Ontop-specific metadata object. Similarly, Ontop supports R2RML mappings by transforming them internally into its own type of mappings. The Ontop component is registered within the

    Figure 2.4: OBDA assets are passed from the platform’s APIs to the Ontop component

platform as a Sesame repository. Consequently, native platform functionalities and widgets, such as for search and visualization, are able to seamlessly operate on this repository.

    2.4.3 Query Formulation

We complete the cycle with an intuitive visual query formulation system (OptiqueVQS) for the ontology-based data access framework. The details of OptiqueVQS are described in deliverable D3.1; here we only provide a brief overview and focus on the aspects relevant for the integration. OptiqueVQS allows end-users to formulate queries by directly manipulating visual representations of domain elements. The design mantra of OptiqueVQS relies on high usability rather than high expressiveness; that is, OptiqueVQS accommodates reasonably complex and frequently occurring queries (i.e., from an end-user perspective), and leaves out highly complex and less frequently used queries, which might hinder the overall usability without much gain. However, we also plan to provide support for complex queries; this is possible either by hiding such functionality behind layers so that advanced users can access them, or by introducing a textual editor where domain experts and IT experts collaborate on queries from different perspectives. Concerning the actual design of the interface, we employ a multi-paradigm approach where different representation and interaction paradigms are combined to benefit from their individual strengths.

OptiqueVQS is designed as a user-interface (UI) mashup built on widgets. A UI mashup aggregates different applications into a common graphical space and orchestrates them for common goals. Widgets9

are the building blocks of our VQS and refer to portable, self-contained, full-fledged, and mostly client-side

9http://www.w3.org/TR/widgets/


applications with limited functionality and complexity. Widgets in our system communicate with each other by delivering events, generated by user actions, through a client-side communication channel. Each widget reacts to events either in a preprogrammed way or by considering the semantic and syntactic signatures of events.

OptiqueVQS consists of a client-side component, which is the presentation layer, and a server-side component that uses platform APIs to serve the ontology and data to the client-side component. The client-side component is purely HTML and JavaScript and uses existing JavaScript libraries for realising its functionality. jQuery Mobile10 is used to generate widgets, which can also run on mobile devices, InfoVis11 is used to visualise query graphs, and jQuery12 is used for cross-browser compliance. The communication channel is built on HTML 5’s13 message-passing support. The server-side component is built in IWB and integrated with the other APIs. The communication between the client-side and the server-side components is handled via a set of REST calls, which at every step return a fragment of the ontology in JSON format.

Currently, OptiqueVQS uses the following REST methods from the OntologyAccess4QueryFormulation class in the eu.optique.api.component.qf package:

    • getAvailableOntologies(): gets the list of identifiers of the available ontologies in the triple store.

• loadOntology(String ontologyURI): loads an ontology given its URI.

• getCoreConcepts(): gets the core concepts of the active (i.e. loaded) ontology to be listed in OptiqueVQS.

    • getConceptFacets(String conceptURI): given a concept URI/Id, retrieves its associated facets.

• getNeighbourConcepts(String conceptURI): given a concept URI/Id, retrieves the associated concept neighbours.

Each REST call returns the ontology-related information serialised as JSON objects, which will populate the OptiqueVQS interface. For example, the REST call
http://<server>:<port>/REST/JSON/getQFOntologyAccess/?method=getCoreConcepts&id=1
will return a set of JSON objects corresponding to the core concepts in the ontology.

10http://jquerymobile.com/
11http://philogb.github.io/jit/
12http://jquery.com/
13http://www.w3.org/TR/html5/


Chapter 3

    Documentation and Prototype

This chapter provides detailed instructions for system administrators (IT-Experts) in order to deploy and configure the Optique platform. Besides guidance for the technical system installation of the platform on different operating systems, it explains the basic steps that are required to initialize the platform. Particularly, it shows how to connect the system to a relational datasource (also called a “data endpoint”) and how to bootstrap the system with the initial configurations that are required for the initialization of the query transformation component. Finally, basic functionality of the novel query formulation interface is explained to both the IT-Expert and the End-User.

    3.1 Installation Instructions

    3.1.1 Obtaining the platform bundle

The Optique platform prototype is available in the restricted download area of the project website. Please contact the project coordinator if you would like to obtain a copy for review or evaluation purposes.

    3.1.2 Installation Requirements

Server - Operating System
Windows (64-bit only): Windows 7, Windows Server 2008
Linux (64-bit only): openSUSE 12.1
Java Runtime Environment (JRE >= 1.7.0_25, 64-bit)
32-bit systems, other Linux distributions, different versions of Windows, or OS X systems may also work, but are not officially supported.

Client - Browsers
Firefox >= 17.x (ESR)
Internet Explorer >= 8
Safari >= 5.1.7
Other browsers may also work, but are not officially supported.

    3.1.3 Installation

    The Optique platform supports both Windows and Linux based operating systems.

    Windows

Installation from the zip-distribution
It is recommended to use a 64-bit Windows operating system with a 64-bit Java SE Runtime Environment in version 1.7 (taken from the JDK). The reference version shipped with the installer is JRE SE 1.7.0_25 64-bit. This is also the version used in steps a) and b) below. Unpack the distribution into a directory of any choice (e.g. C:\OPTIQUE). In the following, we will refer to the absolute pathname of this directory by <installation-dir>.

Running the Optique platform as executable
Execute <installation-dir>/start.cmd

Running the Optique platform on a 32-bit Windows operating system
The Optique platform can be run on a 32-bit Windows operating system by following the steps below:

    1) Download and install Java SE 32-Bit JDK version 1.7.1

2) Set the path of java.exe in the file <installation-dir>/fiwb/backend.conf. Examples:
wrapper.java.command=C:\Program Files\Java\jdk1.7.0_25\bin\java (absolute path)
wrapper.java.command=java (if the java command is in the Path environment)
3) Execute the Optique platform as described above.

    Linux

To run the Optique platform under Linux, a Java SE Runtime Environment version 1.7 (taken from the JDK) must be installed. Note that the Optique platform does not ship a reference version bundled with the release. Unpack the distribution into a directory of any choice (e.g. /opt/optique). In the following, we will refer to the absolute pathname of this directory by <installation-dir>. Download and install the Java SE 64-bit JDK version 1.7. Make sure the java command is added to the command path of the user root.

a) Running as a service
Create a user under which the Optique platform shall run, e.g. “fluid” (in the following we will refer to this user as <user>). If “fluid” has not been chosen as the user, the script <installation-dir>/fiwb/iwb.sh has to be adapted accordingly: search for RUN_AS_USER=fluid and replace fluid by <user>. Execute the script linux-install.sh in <installation-dir> as user root, as follows:
bash -eu linux-install.sh

This installs an init-script as /etc/init.d/iwb and starts the application. To make sure this script is executed on reboot, create corresponding links in the run-level-specific directories. Depending on the Unix distribution, this can be done with chkconfig -a iwb or with insserv iwb.

b) Running the Optique platform as executable
Make sure all scripts are executable by executing in <installation-dir>:
chmod +x *.sh fiwb/*.sh fiwb/wrapper-linux*

If the Optique platform needs to be executed as a user different from “fluid”, the script <installation-dir>/fiwb/iwb.sh has to be adapted accordingly: search for RUN_AS_USER=fluid and replace “fluid” by any preferred user (this user must exist on the system). Execute start.sh in <installation-dir>.

    Mac OS X

Please note that, while we have successfully installed and run the Optique platform on Mac OS X, this platform is not officially supported. To run the Optique platform on Mac OS X, a compatible version of the Java runtime is required (ideally, version 1.7). OS X may ask whether to install a Java runtime automatically if it detects that one is needed but missing. Download the Optique platform distribution (.zip file). To get started, proceed as follows:

• Unpack the Optique platform zip distribution.

1http://www.oracle.com/technetwork/java/javase/downloads/index.html


• Open the terminal application and cd into the unpacked distribution (e.g. cd Desktop/IWB [ENTER])

• Make scripts executable by typing chmod +x *.sh fiwb/*.sh fiwb/wrapper* [ENTER]

• In the file fiwb/iwb.sh, modify the value for RUN_AS_USER=fluid to the user name that is intended to run the Optique platform. (Alternatively, create a user account with the name fluid.)

• To start the Optique platform, now type ./start.sh [ENTER]

• After a few seconds, the Optique platform should be accessible locally from any browser under the address http://localhost:8888

    3.1.4 Opening the Optique platform

Once the Optique platform is running (startup may take up to a few minutes), it can be accessed at http://localhost:8888. The start page (cf. Figure 3.1) provides some links to important pages as well as to the user documentation. By default, an administrator account with credentials “admin/iwb” is created. It is highly recommended to

    Figure 3.1: Start page of the Optique Platform, describing the basic steps for initial configuration.

    change the password in the Admin area after the first login.

    3.1.5 Shutting down the Optique platform

Windows Service: Go to the Services configuration tool and stop the service.
Linux Daemon: Invoke /etc/init.d/iwb stop
Command line (all OS): Exit the Optique platform by clicking in the command line window and pressing Ctrl + c.
IMPORTANT: Never exit the Optique platform by closing the command line window without proper shutdown. This can result in corrupting the data store and loss of data.

    3.2 Administration Guide

The initial set-up of the platform for a specific domain requires going through several pre-configuration steps (Section 3.2.1) before the system can be fully customized (Section 3.2.2). We assume that this is to be done by an IT-Expert rather than the End-User (cf. Figure 1.1).


    3.2.1 Setup and Configuration Steps

    Registration of a Relational Database Endpoint and Extraction of Relational Metadata

In order to extract data from a Relational Database, a so-called RDBMetainformationProvider needs to be set up. The provider securely stores the relevant connection parameters and connects to the relational database in order to extract information from it. To do so, start your Optique platform, go to Admin -> Data Provider Setup, and choose “Add Provider”. Then choose the RDBMetainformationProvider from the popup and insert the connection information. You need to provide the following information:

• The Poll Interval can be set to refresh database information periodically (a value of 0 indicates that the automatic refresh is disabled, i.e. only manually triggered runs are possible)

    • The username is the name of the database user used to authenticate with the Relational Database

• The password is the password for the previous database user used to authenticate with the Relational Database

• The schemaName is the name of the schema that needs to be extracted from the Relational Database (in order to extract multiple schemata from the same database, one needs to create multiple providers)

• The connectionString is the JDBC connection string used to connect to the database, e.g. jdbc:mysql://helloworld:3306/npd (see standard JDBC documentation for connection strings to different database types)

• The driverClass specifies the Java class of the JDBC driver, e.g. com.mysql.jdbc.Driver. Note that the platform does NOT ship JDBC drivers (typically for licensing reasons), so the .jar file of the respective driver needs to be downloaded and added to the classpath.2

Figure 3.2 illustrates how to set up an RDBMetadataProvider for the relational NPD dataset,3 which is hosted within Optique’s development infrastructure (in order for this example to work, a VPN connection to the Optique network is required).
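Before registering the provider, it can be useful to verify that the driver class is actually loadable from the classpath. A minimal sketch (a hypothetical helper, not part of the platform; the driver class name matches the MySQL example above):

```java
// Minimal check that a JDBC driver class is on the classpath before
// registering the provider; substitute the driver class for your database.
public class DriverCheck {
    public static boolean driverAvailable(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "com.mysql.jdbc.Driver";
        if (driverAvailable(driver)) {
            System.out.println(driver + " found on the classpath");
        } else {
            System.out.println(driver + " missing: copy the driver .jar to fiwb/libs/extensions and restart");
        }
    }
}
```

Running this from the platform's installation directory (with fiwb/libs/extensions on the classpath) tells you immediately whether the driver .jar was picked up.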

    Bootstrapping of Initial System Configurations: Ontologies and Mappings

    Once the metadata has been extracted successfully, the O&M component widget can be utilized in order to:

• bootstrap an initial ontology directly from the relational metadata

    • bootstrap a set of direct mappings

    • automatically align the bootstrapped ontology to an existing domain ontology

    • if needed, approximate the ontology language profile to the OWL 2 QL profile

    • store the assets in the platform’s shared metadata repository

The O&M user interface is implemented as a wizard, which guides you stepwise through this process. While it is linked from the system’s start page, you can also open it directly, for example, by going to the URL http://localhost:8888/resource/Bootstrapping on your local instance.

2For adding the JDBC driver to the classpath, it is required to download the JDBC driver for MySQL from http://dev.mysql.com/downloads/connector/j/5.0.html, copy the respective .jar file into the folder <installation-dir>/fiwb/libs/extensions, and restart the platform

    3http://sws.ifi.uio.no/project/npd-v2


Figure 3.2: Example configuration of an RDBMetadataProvider

    Configuration and Initialization of the Query Transformation Component

After bootstrapping, alignment, and approximation of the ontology and respective mappings, the query transformation component needs to be initialized. For this purpose, the platform provides a specific configuration widget (cf. Figure 3.3), which makes it easy to select and combine different relational endpoints with different collections of mappings and ontologies. It automatically requests all registered relational data endpoints, mappings, and ontologies from the shared metadata repository. By selecting the respective identifiers from the drop-down elements, you can easily define the input parameters in order to configure the Ontop component, which is responsible for query transformation. Once the required configuration parameters have been set and stored, the query transformation component will be initialized and registered as a Sesame repository within the platform.4,5 After initialization, the platform will automatically offer to operate on the virtualised repository, for example, from within widgets or the SPARQL interface.

    3.2.2 Full Administrator Guide

A full documentation of the features for administering the platform core, as well as for developing customized domain extensions, ships with the platform bundle and is also available online.6 The documentation covers in particular topics such as system settings, wiki management, access to the platform's APIs via REST and CLI, and the management of user rights. The developer's guide explains the basic plugin and extension mechanisms for developing customized solutions, for example, domain-specific query answer visualisation widgets.

    3.3 End User Documentation

    3.3.1 General Platform Features for Exploration, Search, Authoring and Visualisation

Platform features that specifically target End-User needs, such as the different kinds of widgets for query result visualization and for browsing through the data, are documented as well. The help is included in the platform bundle and is also available through the online reference (see above). It documents the underlying key concepts

4 http://openrdf.callimachus.net/sesame/2.7/apidocs/org/openrdf/repository/Repository.html
5 Depending on the size of the database, ontology and mappings, this may take some time.
6 http://optique.fluidops.net/platform/help.pdf




    Figure 3.3: Widget for configuration and initialization of the Query Transformation component.

of the template mechanism for browsing, searching and semantic authoring of resources using a semantic wiki approach.

    3.3.2 Visual Query Formulation

    The OptiqueVQS initially includes three widgets as depicted in Figure 3.4, with examples from [3]:

• The first widget (W1 - see the bottom-left part of Figure 3.4) is a menu-based query-by-navigation widget that allows users to navigate between concepts by pursuing the relationships between them, analogous to joining relations in a database.

• The second widget (W2 - see the bottom-right part of Figure 3.4) is a form-based widget, which presents the attributes of a selected concept for selection and projection operations.

• The third widget (W3 - see the top part of Figure 3.4) is a diagram-based widget that provides an overview of the constructed query and affordances for its manipulation.

These three widgets are orchestrated by the system, which harvests the event notifications generated by each widget as the user interacts, in order to jointly extract and represent the user's information need.

In a typical query construction scenario, the user first selects a kernel concept, i.e., the starting concept, from W1, which initially lists all domain concepts accompanied by icons, descriptions, and the approximate number of results. The selected concept becomes the focus/pivot concept (i.e., the node coloured in orange or highlighted) and appears on the graph (i.e., in W3) as a variable-node; W2 displays its attributes, and W1 displays all concept-relationship pairs pertaining to this concept.
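The orchestration can be thought of as a simple publish/subscribe loop: each widget emits events, and a central controller updates a shared query representation and notifies the other widgets. The following Python sketch illustrates the idea only; the class and event names are invented, and the actual OptiqueVQS implementation differs:

```python
# Minimal sketch of event-based widget orchestration (illustrative only).

class QueryModel:
    """Shared representation of the query under construction."""
    def __init__(self):
        self.nodes = []   # variable-nodes, one per selected concept
        self.pivot = None # index of the current focus/pivot node

    def add_concept(self, concept):
        self.nodes.append({"concept": concept, "outputs": [], "constraints": []})
        self.pivot = len(self.nodes) - 1

class Controller:
    """Harvests widget events and keeps all widgets in sync."""
    def __init__(self):
        self.model = QueryModel()
        self.listeners = []  # widgets interested in model changes

    def subscribe(self, callback):
        self.listeners.append(callback)

    def on_event(self, event):
        # W1 emits concept selections; W2 emits attribute outputs/constraints.
        if event["type"] == "concept_selected":         # from W1
            self.model.add_concept(event["concept"])
        elif event["type"] == "attribute_output":       # from W2 ("eye" button)
            self.model.nodes[self.model.pivot]["outputs"].append(event["attribute"])
        elif event["type"] == "attribute_constrained":  # from W2 form elements
            self.model.nodes[self.model.pivot]["constraints"].append(
                (event["attribute"], event["value"]))
        for notify in self.listeners:                   # e.g. W3 redraws its diagram
            notify(self.model)

ctrl = Controller()
ctrl.subscribe(lambda model: None)  # a real widget would redraw here
ctrl.on_event({"type": "concept_selected", "concept": "Field"})
ctrl.on_event({"type": "attribute_output", "attribute": "name"})
ctrl.on_event({"type": "concept_selected", "concept": "Facility"})
```

Selecting a concept moves the pivot, so subsequent attribute events from W2 always apply to the current focus node, mirroring the behaviour described above.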

The user can select attributes to be included in the result list (i.e., using the "eye" button) and/or impose constraints on them through form elements (i.e., in W2). Currently, attributes selected for output appear on the corresponding variable-node in black with the letter "o", while constrained attributes appear in blue with the letter "c". Note that W1 does not present relationships alone, but combines relationship and concept pairs (i.e., relationship and range) into one selection; this helps to reduce the number of navigational



    Figure 3.4: OptiqueVQS – an example query is depicted for the Statoil use case.

levels that a user has to pass through. The user can select any available option from the list, which results in a join between two variable-nodes over the specified relationship and moves the focus to the selected concept (i.e., the new pivot). The user follows the same steps to involve new concepts in the query and can always jump to a specific part of the query by clicking on the corresponding variable-node. The arcs that connect variable-nodes do not have any direction, since for each active node only outgoing relationships, including inverse relationships, are presented for selection in W1; this allows queries to always be read from left to right.

In W3, we employ a node-duplication approach for cyclic queries in order to keep tree-shaped representations for queries, thereby avoiding general graph representations, which might be too complex for end users to comprehend.
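The effect of node duplication can be sketched as follows: when the user asserts that two variable-nodes denote the same variable, the display keeps both tree nodes but merges their underlying variables, so the rendered structure remains a tree while the underlying query is cyclic. A minimal Python illustration (the data structures are invented, not the OptiqueVQS code):

```python
# Tree-shaped display of a cyclic query via node duplication (illustrative).

class DisplayNode:
    def __init__(self, concept, var):
        self.concept = concept  # concept label shown to the user
        self.var = var          # underlying query variable
        self.children = []      # child display nodes (a tree, never a graph)

def assert_same_variable(node_a, node_b):
    """Merge the variables of two display nodes without merging the tree nodes."""
    node_b.var = node_a.var     # both duplicates now denote one variable

# Field -> Facility -> Field: a cycle in the query, duplicated in the display.
root = DisplayNode("Field", "?f")
facility = DisplayNode("Facility", "?x")
field_again = DisplayNode("Field", "?g")  # duplicate display node for the cycle
root.children.append(facility)
facility.children.append(field_again)

assert_same_variable(root, field_again)   # user asserts the cycle
```

After the assertion, the display is still a tree of three nodes, but the first and last node share one variable, which is exactly the cyclic query.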

An example query is depicted in Figure 3.4 for the Statoil use case. The query asks for all fields that contain an oil-producing facility and are operated by the Statoil company. In the output, we would like to see the name of the field and the name of the facility. The user can delete nodes by switching to delete mode, or assert that two variable-nodes indeed refer to the same variable (i.e., form a cyclic query). Affordances for both are provided by the buttons at the bottom-left part of W3.
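In SPARQL terms, this example query has roughly the following shape; note that the prefix and all class and property names below are invented for illustration, while the actual query generated by the platform (shown in Figure 3.5) uses the real NPD vocabulary:

```sparql
# Illustrative only; the prefix and all names are hypothetical.
PREFIX ex: <http://example.org/npd#>
SELECT ?fieldName ?facilityName WHERE {
  ?field    a ex:Field ;
            ex:name ?fieldName ;
            ex:operatedBy ?company .
  ?company  ex:name "Statoil" .
  ?facility a ex:Facility ;
            ex:name ?facilityName ;
            ex:belongsToField ?field ;
            ex:purpose "oil production" .
}
```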

The user can also switch to SPARQL mode and see the textual form of the query by clicking the "SPARQL Query" button at the bottom-right part of W3, as depicted in Figure 3.5. The user can keep the system in textual form and continue the formulation process by interacting with the widgets. For this purpose, the pivot/focus node is highlighted and every variable-node is made clickable to allow users to change focus. Currently, the textual SPARQL query is non-editable and serves didactical purposes, so that advanced end users who are eager to learn the textual query language can switch between the two modes and see the new query fragments added after each interaction.



    Figure 3.5: OptiqueVQS – the example query is depicted in SPARQL form.


  • Chapter 4

    Conclusion and Outlook

In this deliverable we presented a first prototype of the Optique platform, with a specific focus on the integration of existing components from the technical workpackages. At its core, the Optique platform rests on the Information Workbench, which serves as a semantic platform for integration. During the first project phase, the platform has been extended with specialized programming interfaces according to the requirements specification and overall architecture defined in Deliverable 2.1. The shared interfaces enable a seamless integration and interaction of components with the platform and among each other. More specifically, the platform allows components such as those for Ontology and Mapping Management, Query Transformation and Visual Query Formulation to access the required OBDA assets (ontologies, mappings, metadata and configurations) from the platform's shared metadata repository in a standardized and convenient way. On the user interface side, a number of platform widgets and wizards have been created that support IT-Experts in setting up and deploying the Optique system for specific use-case domains. So far, the platform has been successfully deployed for the Siemens and Statoil use-cases.1 Furthermore, relevant user interface components, such as those for visual query formulation and browsing, have been presented to a group of End-Users during the respective workshops.

Ongoing work with respect to the second year focuses on the design and implementation of interfaces for cloud automation. In particular, native support for managing computational cloud resources will enable a tight integration of the component for Distributed Query Answering and Processing (ADP). While the current platform bundle already includes a special JDBC driver for accessing ADP, the integration with the component for query transformation is an ongoing activity. For details, please refer to Deliverable 7.1. Moreover, some refinements to the Relational Schema API are needed in order to enable access to metadata across multiple schemas and user rights. This will be of particular importance for the Statoil use-case. Furthermore, we are planning to provide a dedicated infrastructure for storing and managing queries as first-class objects in the platform. While this will primarily support the task of building and managing domain-specific query catalogs, it also contributes to the challenge of establishing a proper framework for continuous performance and integration testing. Finally, we are discussing improvements to usability (e.g., exception handling) as well as to performance (e.g., the initialization and reconfiguration of the query transformation component).

    1See http://fact-pages.fluidops.net and http://optique-siemens.fluidops.net



  • Bibliography

[1] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez-Muro, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio Savo. The MASTRO system for ontology-based data access. Semantic Web, 2(1):43–53, 2011.

[2] Jim Crompton. Keynote talk at the W3C Workshop on Semantic Web in Oil & Gas Industry, Houston, TX, USA, 9–10 December 2008. Available from http://www.w3.org/2008/12/ogws-slides/Crompton.pdf.

[3] Evgeny Kharlamov, Martin Giese, Ernesto Jiménez-Ruiz, Martin G. Skjæveland, Ahmet Soylu, Dmitriy Zheleznyakov, Timea Bagosi, Marco Console, Peter Haase, Ian Horrocks, Sarunas Marciuska, Christoph Pinkel, Mariano Rodriguez-Muro, Marco Ruzzi, Valerio Santarelli, Domenico Fabio Savo, Kunal Sengupta, Michael Schmidt, Evgenij Thorstensen, Johannes Trame, and Arild Waaler. Optique 1.0: Semantic access to big data: The case of Norwegian Petroleum Directorate's FactPages. In International Semantic Web Conference (Posters & Demos), pages 65–68, 2013.

[4] Mariano Rodriguez-Muro and Diego Calvanese. High Performance Query Answering over DL-Lite Ontologies. In KR, 2012.



  • Glossary

ADP Athena Distributed Processing
API Application Programming Interface
CLI Command Line Interface
DL Description Logic
IWB FOP Information Workbench
JDBC Java Database Connectivity
NPD Norwegian Petroleum Directorate
OBDA Ontology-based Data Access
OS Operating System
OWL Web Ontology Language
O&M Ontology and Mapping
QA Query Answering
PoJo Plain Old Java Object
QF Query Formulation
QT Query Transformation
RDB Relational Data Base
RDBMS Relational Data Base Management System
RDF Resource Description Framework
REST Representational State Transfer
R2RML RDB to RDF Mapping Language
SPARQL SPARQL Protocol and RDF Query Language
SQL Structured Query Language
URI Uniform Resource Identifier
VQS Visual Query Formulation System
W3C World Wide Web Consortium

