Apache Marmotta - Introduction
TRANSCRIPT
Apache Marmotta: Introduction
Sebastian Schaffert
Who Am I
Dr. Sebastian Schaffert
Senior Researcher at Salzburg Research
Chief Technology Officer at Redlink GmbH
Committer at the Apache Software Foundation
and, starting 12/2014, Software Engineering Manager (SRE) @ Google
http://www.schaffert.eu
http://linkedin.com/in/sebastianschaffert
Agenda
Introduction to Apache Marmotta (Sebastian)
Overview
Installation
Development
Linked Data Platform (Sergio & Jakob)
Overview
Practical Usage
Semantic Media Management (Thomas)
Media Use Case
SPARQL-MM
Overview
What is Apache Marmotta?
Linked Data Server
(implements Content Negotiation and LDP)
SPARQL Server
(public SPARQL 1.1 query and update endpoint)
Linked Data Development Environment
(collection of modules and libraries for building custom Linked
Data applications)
Community of Open Source Linked Data Developers
all under the business-friendly Apache open source license
Linked Data Server
easily offer your data as Linked Data on the Web
human-readable and machine-readable read-write data access based on HTTP content negotiation
reference implementation of the Linked Data Platform (see next presentation block)
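The read-write access via content negotiation can be sketched as plain HTTP requests: the same resource URL returns different representations depending on the Accept header. A minimal sketch in Python; the endpoint path, host, and resource URI below are illustrative assumptions, not guaranteed to match your Marmotta installation:

```python
# Sketch of HTTP content negotiation against a Linked Data server.
# BASE and the resource URI are assumptions for illustration; the
# actual endpoint path depends on how Marmotta is deployed.
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8080/resource"   # assumed local endpoint
resource = "http://example.org/alice"     # hypothetical resource

url = BASE + "?" + urlencode({"uri": resource})

# Same URL, different Accept header -> different representation:
machine = Request(url, headers={"Accept": "text/turtle"})  # RDF for machines
human = Request(url, headers={"Accept": "text/html"})      # HTML for humans

print(machine.get_header("Accept"))  # text/turtle
print(human.get_header("Accept"))    # text/html
```

A server honoring content negotiation would answer the first request with Turtle triples and the second with an HTML view of the same data.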
SPARQL Server
full support of SPARQL 1.1 through HTTP web services
SPARQL 1.1 query and update endpoints
implements the SPARQL 1.1 protocol
(supports any standard SPARQL client)
fast native implementation of SPARQL in KiWi triple store
lightweight Squebi SPARQL explorer UI
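The SPARQL 1.1 protocol requests a standard client sends can be sketched as follows. The endpoint paths /sparql/select and /sparql/update are the ones named on a later slide; the host and the example data are assumptions for illustration:

```python
# Sketch of SPARQL 1.1 protocol requests: query via GET with the
# query as a URL parameter, update via POST with the update text as
# the request body. Host and data are illustrative assumptions.
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8080"  # assumed local Marmotta instance

# Query operation:
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
query_req = Request(
    BASE + "/sparql/select?" + urlencode({"query": query}),
    headers={"Accept": "application/sparql-results+json"},
)

# Update operation:
update = 'INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }'
update_req = Request(
    BASE + "/sparql/update",
    data=update.encode("utf-8"),
    headers={"Content-Type": "application/sparql-update"},
)

print(query_req.get_method())   # GET
print(update_req.get_method())  # POST
```

Because these are the standard protocol operations, any off-the-shelf SPARQL client library can talk to the endpoints directly.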
Linked Data Development
modular server architecture allows combining exactly the functionality needed for a use case
(no need for reasoning? exclude the reasoner ...)
collection of independent libraries for common Linked Data problems
access Linked Data resources (and even some that are not Linked Data)
simplified Linked Data query language (LDPath)
use only the triple store without the server
Community of Developers
discuss with people interested in getting things done in the Linked Data world
build applications that are useful without reimplementing the whole stack
thorough software engineering process under the roof of the Apache Software Foundation
Installation / Setup
(we help you)
https://github.com/wikier/apache-marmotta-tutorial-iswc2014
Sample Project
Requirements:
JDK 7/8 (https://java.com/de/download/)
Maven 3.x (http://maven.apache.org)
git (http://git-scm.com/)
curl (http://curl.haxx.se/)
https://github.com/wikier/apache-marmotta-tutorial-iswc2014
Sample Project
https://github.com/wikier/apache-marmotta-tutorial-iswc2014
$ git clone git@github.com:wikier/apache-marmotta-tutorial-iswc2014.git
$ cd apache-marmotta-tutorial-iswc2014
$ mvn clean tomcat7:run
then point your browser to http://localhost:8080
Apache Marmotta Platform
Apache Marmotta Platform
implemented as Java web application
(deployed as marmotta.war file)
service oriented architecture using CDI (Java EE 6)
REST web services using JAX-RS (RestEasy)
CDI services found on the classpath are automatically added to the system
Architecture
Marmotta Core (required)
core platform functionalities:
Linked Data access
RDF import and export
Admin UI
platform glue code:
service and dependency injection
triple store
system configuration
logging
Marmotta Backends (one required)
choice of different triple store backends
KiWi (Marmotta default)
based on a relational database (PostgreSQL, MySQL, H2)
highly scalable
Sesame Native
based on the Sesame Native RDF backend
BigData
based on the BigData clustered triple store
Titan
based on the Titan graph database (backed by HBase, Cassandra, or BerkeleyDB)
Marmotta SPARQL (optional)
SPARQL HTTP endpoint
supports the SPARQL 1.1 protocol
query: /sparql/select
update: /sparql/update
SPARQL explorer UI (Squebi)
Marmotta LDCache (optional)
transparently access Linked Data resources from other servers as if they were local
support for wrapping some legacy data sources (e.g. Facebook Graph)
local triple cache, honors HTTP expiry and cache headers
Note: SPARQL does NOT work well with LDCache; use LDPath instead!
Marmotta LDPath (optional)
query language specifically designed for querying the Linked Data Cloud
regular path based navigation starting at a resource and then following links
limited expressivity (compared to SPARQL) but full Linked Data support
@prefix local: ;
@prefix foaf: ;
@prefix mao: ;
likes = local:likes / (foaf:primaryTopic / mao:title | foaf:name) :: xsd:string;
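The navigation idea behind such a path expression (start at a resource, follow a property, with | as alternatives) can be illustrated on a toy graph. A conceptual sketch in plain Python over an in-memory triple set; the data and prefixes are made up for illustration, and this is not Marmotta's actual LDPath engine:

```python
# Conceptual sketch of LDPath-style navigation: start at a resource,
# follow a property path, with "|" as alternatives. Toy in-memory
# triples, not Marmotta's LDPath implementation.
triples = {
    ("ex:sebastian", "local:likes", "ex:page1"),
    ("ex:sebastian", "local:likes", "ex:anna"),
    ("ex:page1", "foaf:primaryTopic", "ex:topic1"),
    ("ex:topic1", "mao:title", "Linked Data"),
    ("ex:anna", "foaf:name", "Anna"),
}

def follow(nodes, prop):
    """One path step: all objects reachable via prop from any node."""
    return {o for (s, p, o) in triples if s in nodes and p == prop}

# likes = local:likes / (foaf:primaryTopic / mao:title | foaf:name)
liked = follow({"ex:sebastian"}, "local:likes")
titles = follow(follow(liked, "foaf:primaryTopic"), "mao:title")
names = follow(liked, "foaf:name")
print(sorted(titles | names))  # ['Anna', 'Linked Data']
```

On a real Linked Data server, each follow step may transparently dereference a remote resource, which is why this style of query suits the Linked Data Cloud better than SPARQL.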
Marmotta Reasoner (optional)
implementation of rule-based sKWRL reasoner
Datalog-style rules over RDF triples, evaluated in forward-chaining procedure
@prefix skos:
($1 skos:broaderTransitive $2) -> ($1 skos:broader $2)
($1 skos:narrowerTransitive $2) -> ($1 skos:narrower $2)
($1 skos:broaderTransitive $2), ($2 skos:broaderTransitive $3) -> ($1 skos:broaderTransitive $3)
($1 skos:narrowerTransitive $2), ($2 skos:narrowerTransitive $3) -> ($1 skos:narrowerTransitive $3)
($1 skos:broader $2) -> ($2 skos:narrower $1)
($1 skos:narrower $2) -> ($2 skos:broader $1)
($1 skos:broader $2) -> ($1 skos:related $2)
($1 skos:narrower $2) -> ($1 skos:related $2)
($1 skos:related $2) -> ($2 skos:related $1)
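Forward chaining means applying such rules repeatedly until no new triples can be derived (a fixpoint). A conceptual sketch with a small subset of the SKOS rules, on toy data, in plain Python; this is not the sKWRL engine itself:

```python
# Conceptual sketch of forward-chaining rule evaluation: apply rules
# to the triple set until no new triples are derived (fixpoint).
# Toy data and hand-coded rules, not the actual sKWRL reasoner.
triples = {
    ("ex:a", "skos:broaderTransitive", "ex:b"),
    ("ex:b", "skos:broaderTransitive", "ex:c"),
}

def apply_rules(ts):
    derived = set()
    for (s, p, o) in ts:
        if p == "skos:broaderTransitive":
            derived.add((s, "skos:broader", o))   # bT -> broader
        if p == "skos:broader":
            derived.add((o, "skos:narrower", s))  # inverse rule
    # transitivity: ($1 bT $2), ($2 bT $3) -> ($1 bT $3)
    for (s1, p1, o1) in ts:
        for (s2, p2, o2) in ts:
            if p1 == p2 == "skos:broaderTransitive" and o1 == s2:
                derived.add((s1, "skos:broaderTransitive", o2))
    return derived

# forward chaining to fixpoint
while True:
    new = apply_rules(triples) - triples
    if not new:
        break
    triples |= new

print(("ex:a", "skos:broaderTransitive", "ex:c") in triples)  # True
print(("ex:c", "skos:narrower", "ex:a") in triples)           # True
```

Truth maintenance, mentioned later for KiWi, records which rule applications produced each derived triple so that deletions can retract exactly the inferences that no longer hold.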
Marmotta Versioning (optional)
transaction-based versioning of all changes to the triple store
implementation of Memento protocol for exploring changes over time
snapshot/wayback functionality (i.e. possibility to query the state of the triple store at a given time in history)
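In the Memento protocol (RFC 7089), a client asks for a past state of a resource by sending the standard Accept-Datetime header to a TimeGate. A minimal sketch; the endpoint path below is a hypothetical placeholder, so check your Marmotta instance for the actual Memento webservice URL:

```python
# Sketch of a Memento "time travel" request: the Accept-Datetime
# header (RFC 7089) asks for the state of a resource at a given
# instant. The endpoint path is a hypothetical placeholder.
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

when = datetime(2014, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
req = Request(
    "http://localhost:8080/memento/timegate",  # hypothetical path
    headers={"Accept-Datetime": format_datetime(when, usegmt=True)},
)
# note: urllib capitalizes header names as "Accept-datetime"
print(req.get_header("Accept-datetime"))  # Wed, 01 Oct 2014 12:00:00 GMT
```

A Memento-aware server would redirect such a request to the version of the resource that was current at the requested datetime.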
Apache Marmotta Walkthrough (Demo)
Apache Marmotta Libraries
Apache Marmotta Libraries
provide implementations for common Linked Data problems (e.g. accessing resources)
standalone lightweight Java libraries that can be used outside the Marmotta platform
LDClient
library for accessing and retrieving Linked Data resources
includes all the standard code written again and again (HTTP retrieval, content negotiation, ...)
extensible (Java ServiceLoader) with custom wrappers for legacy data sources
(included are RDF, RDFa, Facebook, YouTube, Freebase, Wikipedia, as well as base classes for mapping other formats like XML and JSON)
LDCache
library providing local caching functionality for (remote) Linked Data resources
builds on top of LDClient, so offers the same extensibility
Sesame Sail with transparent Linked Data access (i.e. Sesame API for Linked Data Cloud)
LDPath
library offering a standalone implementation of the LDPath query language
large function library for various scenarios (e.g. string, math, ...)
can be used with LDCache and LDClient
can be integrated in your own applications
supports different backends (Sesame, Jena, Clerezza)
Marmotta Loader
command line infrastructure for bulk-loading RDF data in various formats to different triple stores
supports most RDF serializations, directory imports, split-file imports, compressed files (.gz, .bzip2, .xz), archives (tar, zip)
provides progress indicator, statistics
KiWi Triplestore
KiWi Triplestore
Sesame SAIL: can be plugged into any Sesame application
based on relational database (supported: PostgreSQL, MySQL, H2)
integrates easily into existing enterprise infrastructure (database servers, backups, clustering, ...)
reliable transaction management (at the cost of performance)
supports very large datasets (e.g. Freebase with more than 2 billion triples)
KiWi Triplestore: SPARQL
translation of SPARQL queries into native SQL
generally very good performance for typical queries, even on big datasets
query performance can be optimized by proper index and memory configuration in the database
almost complete support for SPARQL 1.1 (except some constructs exceeding the expressivity of SQL and some bugs)
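The core of such a translation can be illustrated on a single basic graph pattern: each triple pattern becomes one alias of a triples table, a constant becomes a filter, and a variable shared between patterns becomes a join condition. A conceptual sketch; the table and column names are illustrative, not KiWi's actual database schema or translator:

```python
# Conceptual sketch of SPARQL-to-SQL translation over a generic
# "triples(subject, predicate, object)" table. Illustrative only,
# not KiWi's actual schema or query generator.

def bgp_to_sql(patterns):
    """patterns: list of (s, p, o); strings starting with '?' are variables."""
    select, where, seen = [], [], {}
    for i, triple in enumerate(patterns):
        alias = f"t{i}"
        for col, term in zip(("subject", "predicate", "object"), triple):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in seen:                  # shared variable -> join
                    where.append(f"{ref} = {seen[term]}")
                else:                             # first occurrence -> projection
                    seen[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:                                 # constant -> filter
                where.append(f"{ref} = '{term}'")
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    return f"SELECT {', '.join(select)} FROM {tables} WHERE {' AND '.join(where)}"

# ?person foaf:knows ?friend . ?friend foaf:name ?name .
sql = bgp_to_sql([("?person", "foaf:knows", "?friend"),
                  ("?friend", "foaf:name", "?name")])
print(sql)
```

This also shows why index and memory tuning in the database pays off: every additional triple pattern adds another self-join over the triples table.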
KiWi Triplestore: Reasoner
rule-based sKWRL reasoner (see demo before)
fast forward chaining implementation of rule evaluation
truth maintenance for easy deletes/updates
future: might be implemented as stored procedures in database
KiWi Triplestore: Clustering
cluster-wide caching and synchronization based on Hazelcast or Infinispan
useful for load balancing of several instances of the same application (e.g. Marmotta Platform)
KiWi Triplestore: Versioning
transaction-based versioning of triple updates
undo transactions (applied in reverse order)
get a Sesame repository connection for any point in the triple store's history
Thank You!
Sebastian Schaffert
[email protected]
supported by the European Commission FP7 project MICO (grant no. 610480)