SemTech 2010: Pelorus Platform
DESCRIPTION
Pelorus Platform is a suite of tools for building Linked Data and Semantic Web applications.
TRANSCRIPT
Pelorus: A Semantic Web Application Platform
2010 Semantic Technology Conference
Michael Grove, Director of Software Development
Clark & Parsia, [email protected]
http://clarkparsia.com -- http://www.twitter.com/candp
Who are we?
Clark & Parsia is a semantic software startup founded in 2005
Offices in DC and Cambridge, MA
Software products for end-user and OEM use
Provides software development and integration services
Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers.
Where do we start?
No, literally, where do we start?
Enterprises increasingly want to use semweb tech to manage information
Lack of in-house SemWeb expertise
So what's the first step in these cases?
It's hard to get a project off the ground without expertise
In many cases, you just want to get a prototype running ASAP to evaluate the approach
An integrated platform to rapidly prototype and assess semweb tech, which also scales to production, is crucial
The Pelorus Platform
Pelorus Platform aims to ease this situation
It's a standards-based application development stack geared toward enterprise information integration via RDF, SPARQL, and OWL.
Provides a collection of software designed to take you from ontology (or data) to application
Based on years of customer engagements: learning which parts are the same for everyone and which parts everyone customizes, and facilitating both.
Minimal or no human-in-the-loop steps are required to get a barebones application running
From there, it's just UI customization
Ingredients
PelletServer
RESTful server-side component powered by Pellet
Provides:
Reasoning
Semantic search
Integrity constraints
Query services
Machine learning... and planning too!
Semantic ETL
Toolkit for transforming existing data into RDF
Support for the most common formats: XML, CSV, Excel, relational, etc.
Conversion driven from a domain ontology
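To make the ontology-driven conversion concrete, here is a minimal sketch of the CSV-to-RDF step. The `example.org` namespace, the `csv_to_ntriples` helper, and the direct column-to-property mapping are illustrative assumptions, not Pelorus's actual API; the real Semantic ETL derives its mappings from the domain ontology.

```python
import csv
import io

# Illustrative namespace; a real deployment would use the domain ontology's URIs.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def csv_to_ntriples(csv_text, type_name, key_column):
    """Convert CSV rows into N-Triples: one resource per row,
    one triple per non-empty column value."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}{type_name.lower()}/{row[key_column]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{EX}{type_name}> .")
        for column, value in row.items():
            if column != key_column and value:
                triples.append(f'{subject} <{EX}{column}> "{value}" .')
    return "\n".join(triples)

print(csv_to_ntriples("id,name,city\n1,Acme,Boston\n2,Globex,DC\n", "Company", "id"))
```

Each row becomes a typed resource, and each column becomes a data property; the ontology-driven version replaces the hard-coded URI scheme with mappings learned or declared against the ontology.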
More Ingredients
Annex - a linked data server
Publishes your RDF as Linked Data
Works in-place against any RDF database
No files to parse and no directory structure to fill out
Javascript module and pluggable template API for rendering resources
CRUD workflow support for maintaining your data
More Ingredients
Machine Learning Suite
Bootstrap ontologies from existing data
Provides capabilities for learning ETL transformations from existing data, decreasing the by-hand mapping burden
Automatically create Pelorus models for browsing
Analysis support: clustering, classification, and more.
Pelorus
Faceted browsing via SPARQL for RDF data.
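Faceted browsing over RDF typically reduces to SPARQL aggregate queries: for each facet property, count how many instances of a type fall under each value. Here's a sketch of that query-building step, assuming an engine with SPARQL 1.1 aggregate support; the function name and example URIs are illustrative, not part of Pelorus.

```python
def facet_count_query(type_uri, facet_property):
    """Build a SPARQL aggregate query that counts instances of a
    type grouped by the values of one facet property."""
    return (
        "SELECT ?value (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item a <{type_uri}> ;\n"
        f"        <{facet_property}> ?value .\n"
        "} GROUP BY ?value ORDER BY DESC(?count)"
    )

print(facet_count_query("http://example.org/Company", "http://example.org/city"))
```

Running one such query per facet gives the value-and-count lists a faceted UI renders; selecting a facet value just adds another triple pattern to the `WHERE` clause.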
So What Now?
The intent of the Platform is to take either your existing data or an existing ontology as input, and provide a working skeleton application as output.
This is the Staples Easy Button for the Semantic Web
Some minimal configuration and UI customization may be required
The goal is to Just Add Data and get back a working, full-service, modern app that's optimized for data integration and analysis.
Getting Started
Legacy data in a series of databases, XML files, etc.
This is a maintenance nightmare
How do you search this data, analyze it, or verify its correctness?
If we could get the data out of these legacy formats and integrate it, then we could do something useful...
1. Integrate Legacy Data
Ontology Bootstrapping via ML
We can learn the basic ontology from our existing data
Feed data to an ML process that will produce our ontology
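As a toy stand-in for that ML-driven bootstrapping, here's the simplest possible version: derive a skeletal RDFS ontology from tabular data, where the table becomes a class and each column becomes a property. The function and the `example.org` namespace are assumptions for illustration; the real process is statistical, not a one-to-one header mapping.

```python
import csv
import io

def bootstrap_ontology(table_name, csv_text):
    """Derive a minimal RDFS ontology from a table's header:
    the table becomes a class, each column a property with that
    class as its domain."""
    EX = "http://example.org/"
    RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    RDFS = "http://www.w3.org/2000/01/rdf-schema#"
    header = next(csv.reader(io.StringIO(csv_text)))
    triples = [f"<{EX}{table_name}> <{RDF}type> <{RDFS}Class> ."]
    for column in header:
        triples.append(f"<{EX}{column}> <{RDF}type> <{RDF}Property> .")
        triples.append(f"<{EX}{column}> <{RDFS}domain> <{EX}{table_name}> .")
    return triples

for t in bootstrap_ontology("Company", "id,name,city\n1,Acme,Boston\n"):
    print(t)
```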
Semantic ETL
Using our ontology, and some additional ML, we can generate mappings from the source data to the ontology
Automatically convert our legacy data into RDF
2. Publish Integrated Data
Now that we have RDF, we'd like to publish it as Linked Data
The Annex Linked Data server takes any RDF database and exposes its contents as Linked Data.
Customizable template framework
Javascript API to access the original RDF database
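The core mechanism behind serving Linked Data this way is HTTP content negotiation: the same resource URI returns RDF to machines and an HTML page to browsers, depending on the Accept header. Here's a simplified sketch of that decision (it ignores q-value ordering, and is not Annex's actual implementation):

```python
def choose_representation(accept_header):
    """Pick a media type for a Linked Data resource from the HTTP
    Accept header: RDF serializations for machine clients, HTML for
    browsers. Simplified: q-values are stripped, first match wins."""
    preferences = [p.split(";")[0].strip() for p in accept_header.split(",")]
    for media_type in preferences:
        if media_type in ("application/rdf+xml", "text/turtle"):
            return media_type
        if media_type in ("text/html", "application/xhtml+xml"):
            return "text/html"
    return "text/html"  # sensible default for browsers

print(choose_representation("text/turtle, text/html;q=0.9"))  # text/turtle
```

A template framework like Annex's plugs in at the HTML branch, rendering the resource's triples through a customizable view.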
We'd also like to maintain our data
Using Empire, we can generate Java beans to represent our domain ontology.
Annex provides generic CRUD templates driven from standard Java beans, using JPA as a persistence mechanism.
By virtue of simply having RDF in a database, we've got publication as Linked Data, and maintenance via simple CRUD pages for free.
3. Browse & Search & Query
We've published our RDF, but clicking around pages looking for a particular resource is not ideal
Having a simple interface to browse the data would be great.
Pelorus is served via Annex
The facet model is generated dynamically via more ML
Uses the same Javascript template framework for custom display of RDF content.
4. Analyze & Plan & Act
We can use OWL reasoning via Pellet to learn new things about the data; for example:
Which products should we sell to which customers?
Which products should we sell to which prospects?
Why do we make these recommendations?
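To give a flavor of that kind of inference, here's a toy version of the recommendation question: suggest products related to something a customer already bought but doesn't yet own. This is a hand-rolled illustration of the pattern only; in a real deployment the rule would be expressed in OWL and Pellet's classification would derive the answers (and, via explanation support, the "why").

```python
def recommend(purchases, related):
    """Toy recommendation inference: for each customer, suggest
    products related to their purchases, minus what they own."""
    suggestions = {}
    for customer, owned in purchases.items():
        recs = set()
        for product in owned:
            recs |= related.get(product, set())
        suggestions[customer] = recs - owned
    return suggestions

purchases = {"alice": {"widget"}, "bob": {"widget", "gadget"}}
related = {"widget": {"gadget"}}
print(recommend(purchases, related))  # alice gets 'gadget'; bob gets nothing new
```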
We can use Machine Learning to learn new things, too:
Which customers are like others? (similarity)
Which groups do our customers fall into? (clustering)
Which employees are liaisons between parts of the company? (social network analysis)
Which employees are most likely to retire in the next year? (classification)
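The similarity question above usually starts from something as simple as comparing attribute sets. As one common measure (an illustration, not the suite's actual algorithm), Jaccard similarity scores two customers by how much their attributes overlap:

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute sets:
    |intersection| / |union|, ranging from 0.0 to 1.0."""
    if not a and not b:
        return 1.0  # two empty profiles are trivially identical
    return len(a & b) / len(a | b)

alice = {"boston", "enterprise", "owl"}
bob = {"boston", "enterprise", "rdf"}
print(jaccard(alice, bob))  # 0.5
```

Pairwise scores like this also feed the clustering case: group customers whose mutual similarity exceeds a threshold.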
We can use Automated Planning to:
Build actionable plans/workflows based on these analyses
Interlude: Pelorus Demos
http://pelorus.clarkparsia.com/ -- American baseball
http://nasa.clarkparsia.com/ -- NASA Space Program
http://datagov.clarkparsia.com/ -- data.gov data catalog
What's the point?
Getting to step 4 (and beyond) is the point; that's where the real ROI lives...
You want to get there sooner & cheaper
But many times steps 1-3 are a hurdle
If you've got limited time and/or budget to prove value in step 4, you don't want to waste it on the drudgery of getting off the ground
This is the key to semantic technology's value proposition
Questions?