SemTech 2010: Pelorus Platform
DESCRIPTION
Pelorus Platform is a suite of tools for building Linked Data and Semantic Web applications.
TRANSCRIPT
Pelorus: A Semantic Web Application Platform
2010 Semantic Technology Conference
Michael Grove, Director of Software Development
Clark & Parsia, [email protected]
http://clarkparsia.com -- http://www.twitter.com/candp
Who are we?
Clark & Parsia is a semantic software startup founded in 2005
Offices in DC and Cambridge, MA
Software products for end-user and OEM use
Provides software development and integration services
Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers.
Where do we start?
No, literally, where do we start?
Enterprises increasingly want to use semweb tech to manage information
Lack of in-house SemWeb expertise
So what's the first step in these cases?
It's hard to get a project off the ground without expertise
In many cases, you just want to get a prototype running ASAP to evaluate the approach
An integrated platform to rapidly prototype and assess semweb tech, which also scales to production, is crucial
The Pelorus Platform
Pelorus Platform aims to ease this situation
It's a standards-based application development stack geared toward enterprise information integration via RDF, SPARQL, and OWL.
Provides a collection of software designed to take you from ontology (or data) to application
Based on years of customer engagements: learning which parts are the same for everyone and which parts everyone customizes, and facilitating both.
Minimal or no human-in-the-loop steps are required to get a barebones application running
From there, it's just UI customization
Ingredients
PelletServer
RESTful server-side component powered by Pellet
Provides:
Reasoning
Semantic search
Integrity constraints
Query services
Machine learning... and planning too!
Semantic ETL
Toolkit for transforming existing data into RDF
Support for the most common formats: XML, CSV, Excel, relational, etc.
Conversion driven from a domain ontology
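To make the ontology-driven conversion concrete, here is a minimal sketch of the CSV-to-RDF step. The `example.org` namespace, the `csv_to_ntriples` helper, and the direct column-to-property mapping are illustrative assumptions, not Pelorus's actual API; the real Semantic ETL derives its mappings from the domain ontology.

```python
import csv
import io

# Illustrative namespace; a real deployment would use the domain ontology's URIs.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def csv_to_ntriples(csv_text, type_name, key_column):
    """Convert CSV rows into N-Triples: one resource per row,
    one triple per non-empty column value."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}{type_name.lower()}/{row[key_column]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{EX}{type_name}> .")
        for column, value in row.items():
            if column != key_column and value:
                triples.append(f'{subject} <{EX}{column}> "{value}" .')
    return "\n".join(triples)

print(csv_to_ntriples("id,name,city\n1,Acme,Boston\n2,Globex,DC\n", "Company", "id"))
```

Each row becomes a typed resource, and each column becomes a data property; the ontology-driven version replaces the hard-coded URI scheme with mappings learned or declared against the ontology.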
More Ingredients
Annex - a linked data server
Publishes your RDF as Linked Data
Works in-place against any RDF database
No files to parse and no directory structure to fill out
Javascript module and pluggable template API for rendering resources
CRUD workflow support for maintaining your data
More Ingredients
Machine Learning Suite
Bootstrap ontologies from existing data
Provides capabilities for learning ETL transformations from existing data, decreasing the by-hand mapping burden
Automatically create Pelorus models for browsing
Analysis support: clustering, classification, and more.
Pelorus
Faceted browsing via SPARQL for RDF data.
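Faceted browsing over RDF typically reduces to SPARQL aggregate queries: for each facet property, count how many instances of a type fall under each value. Here's a sketch of that query-building step, assuming an engine with SPARQL 1.1 aggregate support; the function name and example URIs are illustrative, not part of Pelorus.

```python
def facet_count_query(type_uri, facet_property):
    """Build a SPARQL aggregate query that counts instances of a
    type grouped by the values of one facet property."""
    return (
        "SELECT ?value (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item a <{type_uri}> ;\n"
        f"        <{facet_property}> ?value .\n"
        "} GROUP BY ?value ORDER BY DESC(?count)"
    )

print(facet_count_query("http://example.org/Company", "http://example.org/city"))
```

Running one such query per facet gives the value-and-count lists a faceted UI renders; selecting a facet value just adds another triple pattern to the `WHERE` clause.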
So What Now?
The intent of the Platform is to take either your existing data or an existing ontology as input, and provide a working skeleton application as output.
This is the Staples Easy Button for the Semantic Web
Some minimal configuration and UI customization may be required
The goal is to Just Add Data and get back a working, full-service, modern app that's optimized for data integration and analysis.
Getting Started
Legacy data in a series of databases, XML files, etc.
This is a maintenance nightmare
How do you search this data, analyze it, or verify its correctness?
If we could get the data out of these legacy formats and integrate it, then we could do something useful...
1. Integrate Legacy Data
Ontology Bootstrapping via ML
We can learn the basic ontology from our existing data
Feed data to an ML process that will produce our ontology
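As a toy stand-in for that ML-driven bootstrapping, here's the simplest possible version: derive a skeletal RDFS ontology from tabular data, where the table becomes a class and each column becomes a property. The function and the `example.org` namespace are assumptions for illustration; the real process is statistical, not a one-to-one header mapping.

```python
import csv
import io

def bootstrap_ontology(table_name, csv_text):
    """Derive a minimal RDFS ontology from a table's header:
    the table becomes a class, each column a property with that
    class as its domain."""
    EX = "http://example.org/"
    RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    RDFS = "http://www.w3.org/2000/01/rdf-schema#"
    header = next(csv.reader(io.StringIO(csv_text)))
    triples = [f"<{EX}{table_name}> <{RDF}type> <{RDFS}Class> ."]
    for column in header:
        triples.append(f"<{EX}{column}> <{RDF}type> <{RDF}Property> .")
        triples.append(f"<{EX}{column}> <{RDFS}domain> <{EX}{table_name}> .")
    return triples

for t in bootstrap_ontology("Company", "id,name,city\n1,Acme,Boston\n"):
    print(t)
```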
Semantic ETL
Using our ontology, and some additional ML, we can generate mappings from the source data to the ontology
Automatically convert our legacy data into RDF
2. Publish Integrated Data
Now that we have RDF, we'd like to publish it as Linked Data
The Annex Linked Data server takes any RDF database and exposes its contents as Linked Data.
Customizable template framework
Javascript API to access the original RDF database
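The core mechanism behind serving Linked Data this way is HTTP content negotiation: the same resource URI returns RDF to machines and an HTML page to browsers, depending on the Accept header. Here's a simplified sketch of that decision (it ignores q-value ordering, and is not Annex's actual implementation):

```python
def choose_representation(accept_header):
    """Pick a media type for a Linked Data resource from the HTTP
    Accept header: RDF serializations for machine clients, HTML for
    browsers. Simplified: q-values are stripped, first match wins."""
    preferences = [p.split(";")[0].strip() for p in accept_header.split(",")]
    for media_type in preferences:
        if media_type in ("application/rdf+xml", "text/turtle"):
            return media_type
        if media_type in ("text/html", "application/xhtml+xml"):
            return "text/html"
    return "text/html"  # sensible default for browsers

print(choose_representation("text/turtle, text/html;q=0.9"))  # text/turtle
```

A template framework like Annex's plugs in at the HTML branch, rendering the resource's triples through a customizable view.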
We'd also like to maintain our data
Using Empire, we can generate Java beans to represent our domain ontology.
Annex provides generic CRUD templates driven from standard Java beans, using JPA as a persistence mechanism.
By virtue of simply having RDF in a database, we've got publication as Linked Data, and maintenance via simple CRUD pages for free.
3. Browse & Search & Query
We've published our RDF, but clicking around pages looking for a particular resource is not ideal
Having a simple interface to browse the data would be great.
Pelorus is served via Annex
The facet model is generated dynamically via more ML
Uses the same Javascript template framework for custom display of RDF content.
4. Analyze & Plan & Act
We can use OWL reasoning via Pellet to learn new things about the data; for example:
Which products should we sell to which customers?
Which products should we sell to which prospects?
Why do we make these recommendations?
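To give a flavor of that kind of inference, here's a toy version of the recommendation question: suggest products related to something a customer already bought but doesn't yet own. This is a hand-rolled illustration of the pattern only; in a real deployment the rule would be expressed in OWL and Pellet's classification would derive the answers (and, via explanation support, the "why").

```python
def recommend(purchases, related):
    """Toy recommendation inference: for each customer, suggest
    products related to their purchases, minus what they own."""
    suggestions = {}
    for customer, owned in purchases.items():
        recs = set()
        for product in owned:
            recs |= related.get(product, set())
        suggestions[customer] = recs - owned
    return suggestions

purchases = {"alice": {"widget"}, "bob": {"widget", "gadget"}}
related = {"widget": {"gadget"}}
print(recommend(purchases, related))  # alice gets 'gadget'; bob gets nothing new
```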
We can use Machine Learning to learn new things, too:
Which customers are like others? (similarity)
Which groups do our customers fall into? (clustering)
Which employees are liaisons between parts of the company? (social network analysis)
Which employees are most likely to retire in the next year? (classification)
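The similarity question above usually starts from something as simple as comparing attribute sets. As one common measure (an illustration, not the suite's actual algorithm), Jaccard similarity scores two customers by how much their attributes overlap:

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute sets:
    |intersection| / |union|, ranging from 0.0 to 1.0."""
    if not a and not b:
        return 1.0  # two empty profiles are trivially identical
    return len(a & b) / len(a | b)

alice = {"boston", "enterprise", "owl"}
bob = {"boston", "enterprise", "rdf"}
print(jaccard(alice, bob))  # 0.5
```

Pairwise scores like this also feed the clustering case: group customers whose mutual similarity exceeds a threshold.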
We can use Automated Planning to:
Build actionable plans/workflows based on these analyses
Interlude: Pelorus Demos
http://pelorus.clarkparsia.com/ -- American baseball
http://nasa.clarkparsia.com/ -- NASA Space Program
http://datagov.clarkparsia.com/ -- data.gov data catalog
What's the point?
Getting to step 4 (and beyond) is the point; that's where the real ROI lives...
You want to get there sooner & cheaper
But many times steps 1-3 are a hurdle
If you've got limited time and/or budget to prove value in step 4, you don't want to waste it on the drudgery of getting off the ground
This is the key to semantic technology's value proposition
Questions?