Stardog 1.1: Easier, Smarter, Faster RDF Database

Post on 13-Dec-2014

DESCRIPTION

A talk from Semtech NYC 2012 about Stardog 1.1, the forthcoming release that adds SPARQL 1.1 and user-defined rules.

TRANSCRIPT

stardog.com

Stardog 1.1

An Easier, Smarter, Faster RDF Database

Michael Grove, Clark & Parsia LLC

mike@clarkparsia.com

@mikegrovesoft, @stardog_db, @candp

About C&P

•We build semantic technology tools for enterprise solutions

•Proud bootstrappers since 2005

•Offices in DC and Cambridge, MA

•Government & enterprise customers

What is Stardog?

•a pure Java RDF database

• full-service, feature rich

• focus on query performance

•standards compliant

•scalable (up first, out next)

History

•Development started summer 2010

•Stardog 0.5 alpha - 2 May 2011

•Stardog 1.0 final - 19 June 2012

•Total of 32 releases, ~500 tickets, 100s of emails on the mailing list

•Stardog 1.0.7 presently

•Stardog 1.1 real soon now...

Easier.

What is easy?

•What’s “easy” in an RDF database?

•Configuration

•Maintenance

•User Experience

• i.e., rationally predictable

•Easier for whom? Not a simple question.

Configuration

• Convention, not configuration

• “Quick Start” is shortest page in the docs

• 4 steps to querying

• Predictable, sane defaults throughout

• Adapted to Java, Unix, Semtech cultures

• Culture is key to convention

• Very good (!) documentation

Maintenance

•Nothing is easier than doing nothing

•RDF & OWL are ideally schema flexible

• Job scheduler: search, indexes, etc.

•Data migration tools since before 1.0

•Multi-tenancy, online & offline DBs

• Just add data... Automatic data quality*

•NoSQL == Anti-jobs program for DBAs

Except that...

•Every DB has to be admin’d & maintained

•Matter of degree, not kind

•Stardog Enterprise Server Management

• audit logging

• JMX monitoring

•web console

•online backups (coming soon!)

User Experience

•Client-server & Embeddable

• Jena, Sesame, SNARL, HTTP

•SPARQL query simplifications

•ACID transactions

• Idiomatic Java & Unix interfaces

•Great CLI & shell…

•Windows has gotten much better! :>

•Rich security model

Smarter.

Okay...that’s BS.

•“Smarter” is market speak

•But Stardog 1.1 has a rich feature set

•Reasoning, including UDR

•Integrity Constraint Validation (ICV)

•Semantic Search

•Security

•Spring

•Linked Data Platform

Reasoning

•OWL 2 DL, QL, EL, and RL

•Query-time, no materialization

•Only pay for what you eat

•Embarrassingly parallel in part

•Pellet 3 embedded for OWL 2 DL schema reasoning only

•Very flexible re: NGs & schemas
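
The "query-time, no materialization" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Stardog's implementation: the class hierarchy, the individuals, and the helper names `subclasses_of` and `instances_of` are all hypothetical. The store keeps asserted facts only, and a query for a class is expanded to cover its subclasses at the moment it runs.

```python
# Hypothetical schema: subclass -> superclass edges (rdfs:subClassOf).
SUBCLASS_OF = {
    "Manager": "Employee",
    "Employee": "Person",
    "Contractor": "Person",
}

# Asserted instance data only; inferred types are never written back.
TYPES = {
    "alice": "Manager",
    "bob": "Contractor",
    "carol": "Employee",
}


def subclasses_of(cls):
    """Return cls plus every class that is transitively a subclass of it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result


def instances_of(cls):
    """Answer 'who is a cls?' by expanding the query, not the data."""
    wanted = subclasses_of(cls)
    return sorted(ind for ind, t in TYPES.items() if t in wanted)


print(instances_of("Employee"))  # alice (a Manager) and carol
print(instances_of("Person"))    # all three individuals
```

Because nothing inferred is stored, a schema change takes effect on the next query, and queries that need no reasoning pay nothing for it.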

User-defined Rules

• New in 1.1!

• Using SWRL syntax

• Including all SWRL builtins

• Which are also available to SPARQL

• Recently added new individual builtin

• Create new individuals in your rules

• Beware of non-termination!

• Executed at query time like everything else
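
The non-termination warning is easy to reproduce. The Python sketch below naively applies a hypothetical rule, Person(?x) -> hasParent(?x, ?new), Person(?new), where ?new is a freshly created individual. It illustrates the hazard only; it is not how Stardog evaluates rules, and the rule, the names, and the round limit are assumptions made for the example.

```python
import itertools


def apply_parent_rule(people, max_rounds):
    """Naively apply: Person(?x) -> hasParent(?x, ?new), Person(?new).

    Returns (people, reached_fixpoint).
    """
    fresh = itertools.count(1)
    people = set(people)
    has_parent = {}
    for _ in range(max_rounds):
        # Only people without a parent trigger the rule.
        pending = [p for p in sorted(people) if p not in has_parent]
        if not pending:
            return people, True  # fixpoint: nothing left to derive
        for p in pending:
            new = f"_:parent{next(fresh)}"
            has_parent[p] = new
            people.add(new)  # the new individual is itself a Person
    return people, False  # gave up: the rule would keep firing


people, done = apply_parent_rule({"alice"}, max_rounds=5)
print(len(people), done)
```

Each round creates one more Person, who in turn needs a parent, so the loop stops only because of the round limit.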

ICV?

• Integrity Constraint Validation

•Automated data quality

•Closed world semantics

•Transactional

•High-level & declarative

• ICs can be OWL, SWRL, or SPARQL

Example...

Only employees who are US citizens can work on projects that receive funding from a US government agency.

Class: Project and (receivesFundsFrom some USGovAgency)
  SubClassOf: inverse(worksOn) only (Employee and nationality value "US")

More examples: http://stardog.com/docs/
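
To make the closed-world reading of this constraint concrete, here is a minimal Python sketch that checks it over a handful of triples. The data, the shortened predicate names, and the helpers `objects` and `violations` are hypothetical; Stardog validates constraints declaratively, so this shows only what the constraint means.

```python
# Hypothetical data as (subject, predicate, object) triples.
TRIPLES = {
    ("nasa", "type", "USGovAgency"),
    ("apollo", "type", "Project"),
    ("apollo", "receivesFundsFrom", "nasa"),
    ("website", "type", "Project"),
    ("alice", "type", "Employee"),
    ("alice", "nationality", "US"),
    ("alice", "worksOn", "apollo"),
    ("bob", "type", "Employee"),
    ("bob", "worksOn", "apollo"),   # no nationality recorded for bob
    ("bob", "worksOn", "website"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def violations():
    """Return (person, project) pairs that break the constraint."""
    bad = []
    for person, predicate, project in sorted(TRIPLES):
        if predicate != "worksOn":
            continue
        funded_by_us = "Project" in objects(project, "type") and any(
            "USGovAgency" in objects(funder, "type")
            for funder in objects(project, "receivesFundsFrom")
        )
        if not funded_by_us:
            continue
        # Closed world: a missing fact counts as a violation.
        ok = ("Employee" in objects(person, "type")
              and "US" in objects(person, "nationality"))
        if not ok:
            bad.append((person, project))
    return bad


print(violations())  # [('bob', 'apollo')]
```

Under open-world OWL semantics bob's missing nationality would simply be unknown; under the closed-world semantics used for validation it is reported as a violation.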

Semantic Search

•Uses Waldo, our deep adaptation of Lucene

•Text index from RDF literals

•Search for resources or literals

• Integrated with SPARQL query evaluation

•Auto-managed search indexes
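
As a rough picture of "text index from RDF literals", the Python sketch below builds a plain inverted index over literal objects only and looks up resources by keyword. It does not reflect how Waldo or Lucene work internally; the triples, the `is_literal` flag, and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical triples: (subject, predicate, object, object_is_literal).
TRIPLES = [
    ("ex:stardog", "rdfs:label", "Stardog RDF database", True),
    ("ex:stardog", "ex:vendor", "ex:candp", False),
    ("ex:pellet", "rdfs:comment", "OWL 2 DL reasoner", True),
    ("ex:candp", "rdfs:label", "Clark & Parsia", True),
]


def build_index(triples):
    """Map each lowercase token to the (resource, literal) pairs containing it."""
    index = defaultdict(set)
    for subject, _predicate, obj, is_literal in triples:
        if not is_literal:
            continue  # IRIs are not text-indexed
        for token in re.findall(r"\w+", obj.lower()):
            index[token].add((subject, obj))
    return index


def search(index, word):
    """Return matching (resource, literal) pairs, sorted for stable output."""
    return sorted(index.get(word.lower(), set()))


index = build_index(TRIPLES)
print(search(index, "RDF"))
print(search(index, "candp"))  # no hit: "ex:candp" is an IRI, not a literal
```

Returning both the resource and the matching literal mirrors the slide's "search for resources or literals".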

Security

•Rich security model

•Based on standard RBAC model

•Applies at the database level

•Will extend to Named Graphs in 1.x

•Easy CLI admin tools (& Java API)
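
A database-level RBAC check reduces to "does any of the user's roles grant this action on this database?". The Python sketch below shows that shape; the role names, users, database name, and `is_allowed` are hypothetical and are not Stardog's security API.

```python
# Hypothetical role -> permissions map; a permission is (action, database).
ROLE_PERMS = {
    "reader": {("read", "db:catalog")},
    "writer": {("read", "db:catalog"), ("write", "db:catalog")},
}

# Hypothetical user -> roles assignment.
USER_ROLES = {"alice": {"writer"}, "bob": {"reader"}}


def is_allowed(user, action, database):
    """True if any role held by the user grants the action on the database."""
    return any(
        (action, database) in ROLE_PERMS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "write", "db:catalog"))
print(is_allowed("bob", "write", "db:catalog"))
```

Extending permissions to named graphs, as the slide anticipates, would amount to making the resource in each permission finer grained than a whole database.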

Spring

•Love it or not, Spring isn’t going away

•Support Batch, Data Import, etc.

•Open Source: http://github.com/clark-parsia/spring-stardog

•Developed by an early adopter who needed it; supported/maintained by C&P

Linked Data

•Stardog fills a hole in our Linked Data Platform

•HTML5, pure JS, client side web framework (based on backbone.js)

•Linked Data publishing suite

•Stardog Linked Data Catalog... Enterprise Linked Data management app

Faster.

Finally...

•Now we can talk about something that’s objective, context-free, and measurable

•Yes!

•But no… #include <std_disclaim.h>

•Your data & your queries are the only things that really matter

That said...

•Two de facto benchmarks for SPARQL:

•BSBM, OLTP-style, query mixes per hour (QMpH · 25)

•SP2B, OLAP-style (torture test), set of queries within a timeout, T, at a data size D

SP2B

•Stardog completes SP2B at 5M, 10M, and 25M (except q5a)

•No other RDF database completes > 5M. (As of the most recent report. Things change.)

•Considerable performance differential

•Pushing this out to 100M+ in 1.x

BSBM

• A throughput test, primarily. Not necessarily simple queries

• On a modest machine, 255 clients, 10M triples, we sustain 7M queries per hour (277k QMpH)

• At 100M, 255 clients, we sustain 3M queries per hour (125k QMpH)

• Among the top 2 or 3 RDF DBs for BSBM performance

• We will tackle BSBM BI next...
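
The two ways of quoting BSBM throughput are related by the size of a query mix. The Python sketch below assumes 25 queries per mix, which is how the "QMpH · 25" note on the benchmarks slide is read here; the per-second figure is derived for convenience and does not appear on the slides.

```python
QUERIES_PER_MIX = 25  # assumption: one BSBM query mix contains 25 queries


def queries_per_hour(qmph):
    """Convert query mixes per hour (QMpH) into individual queries per hour."""
    return qmph * QUERIES_PER_MIX


for label, qmph in [("10M triples", 277_000), ("100M triples", 125_000)]:
    qph = queries_per_hour(qmph)
    print(f"{label}: {qmph:,} QMpH = {qph:,} queries/hour "
          f"(about {qph / 3600:,.0f} queries/second)")
```

277k QMpH works out to 6,925,000 queries per hour and 125k QMpH to 3,125,000, which match the rounded per-hour figures quoted on the slide.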

Data Loading

• Two indexing modes

  • Triples only indexing

    • Faster loading, slower NG query

    • Up to 250,000 triples per second

  • Quads indexing

    • Slower loading, faster NG query

    • Up to 150,000 triples per second

• More improvements coming in the future

  • Customized RDF parser

  • Will look at user-defined index subsets
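
The peak loading rates give a quick lower bound on bulk-load time. The Python sketch below uses the two rates from the slide; the 100M-triple dataset size and the function name are assumptions, and since the rates are "up to" figures, real loads can only be slower.

```python
# Peak loading rates from the slide, in triples per second.
PEAK_RATES = {"triples-only": 250_000, "quads": 150_000}


def min_load_seconds(num_triples, mode):
    """Best-case load time in seconds, assuming the peak rate is sustained."""
    return num_triples / PEAK_RATES[mode]


dataset = 100_000_000  # hypothetical dataset size
for mode in PEAK_RATES:
    seconds = min_load_seconds(dataset, mode)
    print(f"{mode}: at least {seconds / 60:.1f} minutes")
```

The difference is the price of the quads index, which buys faster named-graph queries later.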

What’s new in 1.1

•Aforementioned user-defined rules

•But most notably, SPARQL 1.1

•Our most requested feature in a survey

•Oh, we also made it faster

SPARQL 1.1

• Latest revision of the SPARQL query language

• Put off implementing until spec finalized

• It’s still in flux, but we decided to go for it

• Adds useful new features to SPARQL

• Aggregates, grouping, sub-query, negation

• Oh, and the entailment regimes
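
To show what aggregates, grouping, and negation add, the Python sketch below pairs a SPARQL 1.1 query (held as a string for comparison, not executed) with a plain-Python equivalent over a few triples. The vocabulary, the data, and the function name are hypothetical.

```python
from collections import Counter

# Shown for comparison only; this sketch does not execute the query.
QUERY = """
PREFIX : <http://example.org/>
SELECT ?project (COUNT(?person) AS ?staff)
WHERE {
  ?person :worksOn ?project .
  FILTER NOT EXISTS { ?person :nationality "US" }
}
GROUP BY ?project
"""

# Hypothetical data as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "worksOn", "apollo"),
    ("alice", "nationality", "US"),
    ("bob", "worksOn", "apollo"),
    ("carol", "worksOn", "apollo"),
    ("carol", "worksOn", "website"),
]


def staff_without_us_nationality(triples):
    """Per project, count workers with no recorded US nationality."""
    # Negation: people for whom the NOT EXISTS pattern would match.
    us = {s for s, p, o in triples if p == "nationality" and o == "US"}
    # Grouping and aggregation: COUNT(?person) GROUP BY ?project.
    counts = Counter(o for s, p, o in triples
                     if p == "worksOn" and s not in us)
    return dict(counts)


print(staff_without_us_nationality(TRIPLES))
```

In SPARQL 1.0 the negation would need the OPTIONAL plus !bound() idiom, and the counting would have to happen in client code.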

SPARQL 1.1

• Rewrite of query planner & engine for 1.0.5

• Changes needed to support SPARQL 1.1

• Tested by users for the past 3 releases

• With great power comes great responsibility...

• New features are not without cost

• Query planning & optimization more crucial than ever

• Majority of development time

Roadmap

1. Transitivity & equality

2. GeoSPARQL

3. Web Console

4. Statement identifiers

5. Stored procedures & database triggers

6. “Stardocs”: doc/blob storage & NLP analytics

7. Graph Traversals, Algorithms & query langs

8. Statistical inference & machine learning

9. Stardog 2.0: Distributed Cluster Super Cloud Thingie!

Summary

Pick all three!

Easier. Smarter. Faster.

Thanks!

Licensing

Feature Rich

• Support for RDFS, OWL2 profiles (EL, RL, QL) & OWL2 DL via schema only queries

• Semantic Search

• ICV

• Transactions

• Rich security model

• Support for major APIs

• Jena & Sesame, and our own SNARL

• SPARQL HTTP protocol, Graph Store protocol

• Also includes a CLI & Shell environment
